Benthic Habitat Mapping Model and Cross Validation Using Machine-Learning Classification Algorithms

Abstract: This research aimed to develop a benthic habitat mapping model using machine-learning classification algorithms and to test the applicability of the model in different areas. We integrated in situ benthic habitat data and image processing of a WorldView-2 (WV2) image to parameterise three machine-learning algorithms: Random Forest (RF), Classification Tree Analysis (CTA), and Support Vector Machine (SVM). The classification inputs were sunglint-free bands, water-column-corrected bands, Principal Component (PC) bands, bathymetry, and the slope of underwater topography. Kemujan Island was used to develop the model, while Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands served as test areas. The results indicated that RF was more accurate than the other classification algorithms, based on both the accuracy statistics and the spatial distribution of benthic habitats. The maximum accuracy of RF was 94.17% (4 classes) and 88.54% (14 classes). The accuracies from RF, CTA, and SVM were consistent across different input bands for each classification scheme. Applying the RF model to benthic habitat classification in other areas showed that the more general classification scheme is recommended, in order to avoid several issues regarding benthic habitat variations. The results also established the possibility of mapping benthic habitat without the use of training areas.


Introduction
Indonesia is a country with abundant coastal and marine natural resources, including coral reef resources. Along with several neighbouring countries in the Asia-Pacific region such as the Philippines, Papua New Guinea, Timor Leste, Malaysia, and the Solomon Islands, Indonesia is incorporated in the Coral Triangle Initiative (CTI), which is an association of countries belonging to the center of coral reef biodiversity in the world and was established with the aim of preserving the natural resources of coral reefs. This underwater ecosystem has a high strategic function in national development and turning the wheels of the national economy, including through the food security sector (supporting the sustainability of fish resource stocks), tourism (through snorkeling and diving), and coastal protection (from coastal erosion caused by currents and waves). Furthermore, the role of coral reefs is essential to President Joko Widodo's program in realizing Indonesia as the World Maritime Axis (PMD). The preservation of coral reefs will be of significant support to Indonesia's ideals of becoming a PMD as regards food security through marine fishery resources (pillar number 2), and the economy through maritime tourism (pillar number 3). Given the importance of coral reefs to the nation, information regarding the spatial distribution of its conditions is very important.
were previously used to map benthic habitat by [7-9,18]. Therefore, by using SVM and RF, our work can be compared with previous studies, be widely recognized, and be placed in the context of the broader benthic habitat mapping framework. Meanwhile, it is also necessary to propose the CTA algorithm for benthic habitat mapping, so as to enrich the selection of possible machine-learning algorithms for this purpose.
This research was conducted in the Karimunjawa Islands (Figure 1). These islands shelter a high biodiversity of benthic habitats [14,19]. The substrate is mainly dominated by carbonate sand. Rubble is also present in water adjacent to the shoreline, and red-colored volcanic sand dominates the substrate along the shoreline of Kemujan Island. The mapping model was developed on Kemujan Island, which was selected as the representative area due to its high variation in reef ecology, morphology, and water depth [14]. Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands were used to assess the applicability of the mapping model.

Image Data
The main remote-sensing image used in this research was a WorldView-2 (WV2) image, acquired on 24 May 2012, with 2 m spatial resolution, eight multispectral bands from the coastal to the near-infrared (NIR) band, and 11-bit radiometric resolution (Table 1). The classification process involved only the visible bands, while the NIR bands were used only for sunglint correction. Several derivative datasets were derived from this WV2 image as classification inputs, namely deglint (sunglint-free) bands, water-column-corrected bands, bands from Principal Component Analysis (PCA, hereafter PC bands), modelled bathymetry, and underwater slope.
The WV2 image was received at Level 3X Ortho; the pixels contain radiometrically calibrated Digital Number (DN) values and have already been geometrically orthorectified. The DN was converted to Top-of-Atmosphere (TOA) radiance and reflectance following the formula and parameters described in the handbook and image header (see Table 1). Atmospheric correction was then applied to the WV2 image to obtain the Bottom-of-Atmosphere (BOA) reflectance image. While BOA reflectance is not mandatory when mapping from a single-date image [20], it is required to ensure that the relationship between spectral bands during PCA transformation is consistent [14]. In addition, because the machine-learning mapping model will be applied to other areas, all images must be independent of variations in atmospheric conditions, which makes atmospheric correction necessary. The Dark-Object Subtraction (DOS) method [21] was applied to the TOA reflectance image, using optically deep water reflectance as the dark target [22]. Pixels of optically deep water were selected manually by visual interpretation of a true color composite of the WV2 image (RGB 532, Red-Green-Blue). Using the DOS formula described in [1], the values of the dark target were converted into the atmospheric offset caused by path radiance (Table 1). These values were subtracted from the TOA reflectance image to obtain the BOA reflectance image.
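The subtraction step can be illustrated with a minimal sketch. The exact DOS formula of [1] is not reproduced in this section, so the sketch simply assumes that the atmospheric offset of a band equals the mean TOA reflectance of the manually selected deep-water pixels; the function names and sample values are hypothetical.

```python
from statistics import mean

def dark_object_offset(deep_water_values):
    """Assumed offset: mean TOA reflectance of optically deep water pixels."""
    return mean(deep_water_values)

def apply_dos(toa_band, offset):
    """Subtract the offset from every pixel, clipping negatives to zero."""
    return [[max(value - offset, 0.0) for value in row] for row in toa_band]

# Hypothetical TOA reflectance for one band (2 x 2 pixels) and three
# deep-water pixels picked by visual interpretation.
toa_blue = [[0.12, 0.15], [0.09, 0.20]]
deep_water = [0.08, 0.09, 0.10]

offset = dark_object_offset(deep_water)
boa_blue = apply_dos(toa_blue, offset)
```

In practice the same offset estimation would be repeated per band, since path radiance differs between wavelengths.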

Sunglint Correction
Sunglint is visible across the WV2 image; thus, it is necessary to remove this specular reflection in order to minimize noise during image classification and to avoid misclassification. If not removed, bright sunglint pixels can easily be misclassified as carbonate sand. A number of sunglint correction methods were reviewed by [23]. Among those methods, we preferred the simple but robust method by [24]. It is a straightforward model, similar to the method developed by [25], but it produces less noise and is more suitable for benthic habitat mapping than for bathymetry mapping [23]. An NIR band is required to correct sunglint in the visible bands, and since the WV2 image has two NIR bands, it is essential to select the better one. Thus, regression analysis was conducted for each visible and NIR band pair. A summary of the sunglint correction process can be found in Table 2.
The Depth Invariant Index (DII) [26,27] was applied to the deglint image to remove benthic habitat reflectance variations due to the water-depth effect. With this technique, it is not necessary to obtain the actual water-column attenuation coefficient for each band (k) or the water depth for each pixel, as in more robust methods [28-31]. The use of DII prevents errors in predicting the water-column attenuation coefficients and water depth from propagating through the water-column correction algorithm and adversely affecting the classification result [14]. Instead, this technique only requires the ratio of water-column attenuation coefficients between each pair of visible bands, which can be statistically derived from the reflectance of similar objects located at different depths. Altogether, there are 15 combinations of DII from the six WV2 deglint bands (the statistics used to derive the water-column-corrected bands are not shown).
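As an illustration of the DII idea, the sketch below uses the commonly cited variance and covariance form of the attenuation ratio, k_i/k_j = a + sqrt(a^2 + 1) with a = (var_i - var_j) / (2 cov_ij), computed on log-transformed reflectance of one bottom type at several depths. It is a sketch under assumptions, not the processing chain used in this study: the synthetic reflectance values, attenuation coefficients, and function names are hypothetical, and no deep-water term is subtracted before the log transform.

```python
import math
from itertools import combinations

def _variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def _covariance(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def attenuation_ratio(refl_i, refl_j):
    """Estimate k_i / k_j from one bottom type sampled at different depths."""
    xi = [math.log(v) for v in refl_i]
    xj = [math.log(v) for v in refl_j]
    a = (_variance(xi) - _variance(xj)) / (2.0 * _covariance(xi, xj))
    return a + math.sqrt(a * a + 1.0)

def depth_invariant_index(r_i, r_j, ratio):
    """DII for one pixel and one band pair."""
    return math.log(r_i) - ratio * math.log(r_j)

# Synthetic sand pixels at 1-5 m depth, with assumed k_i = 0.1 and k_j = 0.2.
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
band_i = [0.40 * math.exp(-2 * 0.1 * z) for z in depths]
band_j = [0.30 * math.exp(-2 * 0.2 * z) for z in depths]

ratio = attenuation_ratio(band_i, band_j)
dii = [depth_invariant_index(a, b, ratio) for a, b in zip(band_i, band_j)]

# Six deglint bands give 15 unordered band pairs, matching the count above.
band_pairs = list(combinations(range(6), 2))
```

For the synthetic data the recovered ratio equals the assumed k_i/k_j, and the index is the same at every depth, which is the property that makes it usable as a depth-independent classification input.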

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) was applied to the WV2 deglint bands to reduce the dimensionality of the data set and to produce uncorrelated output bands, each containing a linear combination of spectral information from all deglint bands. Image transformation, especially PCA, has succeeded in improving the overall accuracy of benthic habitat mapping at different levels of classification scheme complexity [14]. Previous studies that employed noise-reducing transformations such as Minimum Noise Fraction (MNF) also reported accuracy improvements for benthic habitat mapping [7,32-34]. The resulting PC bands were used as classification input.
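To make the "linear combination of all bands" concrete, the following sketch computes the band covariance matrix and extracts only the first principal component by power iteration. This is an illustrative simplification with hypothetical two-band data; a full PCA would extract all components, typically with a linear algebra library.

```python
import math

def covariance_matrix(pixels):
    """pixels: list of per-pixel band vectors. Returns (centered, cov)."""
    n, p = len(pixels), len(pixels[0])
    means = [sum(px[b] for px in pixels) / n for b in range(p)]
    centered = [[px[b] - means[b] for b in range(p)] for px in pixels]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return centered, cov

def first_component(cov, iterations=200):
    """Dominant eigenvector of the covariance matrix via power iteration."""
    p = len(cov)
    vec = [1.0] * p
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec

# Hypothetical two-band pixels in which band 2 is exactly twice band 1.
pixels = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
centered, cov = covariance_matrix(pixels)
loadings = first_component(cov)

# PC1 score of each pixel: projection of the centered vector on the loadings.
pc1 = [sum(c * w for c, w in zip(row, loadings)) for row in centered]
```

Because the two hypothetical bands are perfectly correlated, the first component carries all of the variance and its loadings are proportional to (1, 2).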

Bathymetry Map
Water depth vertically controls the spatial distribution of benthic habitats, as most photosynthesizing biota are depth-limited. Macroalgae may live down to the depth where irradiance is only 1% of that at the surface, whereas seagrass and most zooxanthellae in reef-building corals require over 10% of the surface irradiance in order to perform photosynthesis [35,36]. Bathymetry is therefore a key factor to be considered when mapping the spatial distribution of benthic habitats. The bathymetry map for the study area was adapted from [37]. The modelled bathymetry map has a standard error of estimate (SE) of 1.01 m across the optically shallow water body up to a depth of 7 m. It was obtained from the ratio of the blue and yellow bands (R² = 0.776, significant at the 95% confidence level). This bathymetry data was included in the classification input and was also used to calculate the slope steepness of the underwater topography. The unit of the resulting slope map is percent (%), and the slope was categorized into ten classes with intervals of 10% from 0-100%.
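The slope derivation can be sketched as follows. The section does not state which slope algorithm was used, so this example assumes a simple finite-difference gradient on the 2 m bathymetry grid and clamps slopes above 100% into the last class; the grid values and function names are hypothetical.

```python
import math

def slope_percent(depth, cell=2.0):
    """Percent slope (rise over run x 100) from a depth grid (>= 2 x 2)."""
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Central differences inside the grid, one-sided at the edges.
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dzdy = (depth[r1][c] - depth[r0][c]) / ((r1 - r0) * cell)
            dzdx = (depth[r][c1] - depth[r][c0]) / ((c1 - c0) * cell)
            out[r][c] = 100.0 * math.hypot(dzdx, dzdy)
    return out

def slope_class(percent):
    """Class 1 = 0-10%, class 2 = 10-20%, ..., class 10 = 90-100% and above."""
    return min(int(percent // 10) + 1, 10)

# Hypothetical depth grid (m) deepening by 0.5 m per 2 m pixel eastward.
depth = [[0.0, 0.5, 1.0],
         [0.0, 0.5, 1.0],
         [0.0, 0.5, 1.0]]
slope = slope_percent(depth)
classes = [[slope_class(v) for v in row] for row in slope]
```

A drop of 0.5 m over 2 m corresponds to a 25% slope, which falls in the third class (20-30%).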

Field Data Collection
Benthic habitat in situ samples were obtained using the photo-transect survey method [38]. In short, benthic habitat photos were captured underwater at intervals of approximately 2 m by snorkeling along the transect. Coordinates of the surveyed transects were recorded every 2 seconds using a Garmin Map 76CSx Global Positioning System (GPS) receiver placed inside a waterproof dry bag, which floated on the water surface and was towed by the snorkeler. The photos were linked to GPS coordinates by matching the time in the photo metadata with the GPS readings using Garmin DNR software. The collected samples were recorded as points, and each photo was interpreted using CPCe 4.0 software. For each photo, 24 points were randomly placed across the photo. Each point was labeled with a benthic habitat class following the classification scheme (Table 3). The final class for each photo was determined by the most dominant class in the photo. The locations of photo-transect samples are shown in Figure 2.
Table 3 (excerpt).
Dead coral, 179 (1531): Dead coral-dominated area (>70%) that still maintains the coral reef structure but is already overgrown by macroalgae and associated with rubble. Since bleaching coral is rare in the study area, bleaching coral was also categorised in this class.
Sand, 398 (13,115): Calcium carbonate sand, bright white colour (>70%).
These point samples were upscaled to area samples using segments created by image segmentation, run in IDRISI Selva with similarity tolerance 10, weighted mean 0.5, weighted variance 0.5, and window width 3. This approach assumes that each segment represents and corresponds with the variation of benthic habitat within it, as indicated by the point samples it contains.
Remote Sens. 2019, 11, 1279
The resulting segments were exported to a vector file and then overlaid on the point samples. Each segment polygon was labeled according to the dominant benthic habitat class of the point samples located within its boundary. These polygon segments were converted back to raster format matching the WV2 spatial resolution (2 m). The conversion of point samples to area samples using this approach has been employed in benthic habitat mapping with OBIA [5,6], but rarely in per-pixel classification. These area samples were divided into two independent sample sets, one for image classification and the other for accuracy assessment.
The samples for machine-learning model training and classification accuracy assessment were randomly selected for each class. Furthermore, where the model training and accuracy assessment samples for a particular class were adjacent to each other, we purposively modified their locations whenever possible so that the two sets were not adjacent; hence they are statistically and spatially independent, allowing a full assessment of the performance of the machine-learning classification model. We argue that the use of area samples may assess the spatial dimension of benthic habitat classification accuracy and accommodate the spatial displacement between the GPS readings of the in situ benthic data and the geometric accuracy of the WV2 image.
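The two dominant-class rules described in this section (24 labeled points per photo, then photo samples per segment) amount to a majority vote. The sketch below shows one way to express it; the class names, segment identifiers, and the tie-breaking behaviour (first class encountered wins) are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def dominant_class(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]

# One hypothetical photo with 24 randomly placed, labeled points.
photo_points = ["sand"] * 14 + ["rubble"] * 6 + ["brown algae"] * 4
photo_class = dominant_class(photo_points)

# Hypothetical photo samples, each tagged with the segment that contains it.
photo_samples = [
    ("seg_01", "sand"), ("seg_01", "sand"), ("seg_01", "rubble"),
    ("seg_02", "healthy coral"), ("seg_02", "healthy coral"),
]

labels_by_segment = defaultdict(list)
for segment_id, label in photo_samples:
    labels_by_segment[segment_id].append(label)

# Each segment polygon takes the dominant class of the points inside it.
segment_class = {seg: dominant_class(lbls)
                 for seg, lbls in labels_by_segment.items()}
```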

Classification Scheme
Constructing a benthic habitat classification scheme is not easy. The widely recognized major-level scheme comprises the coral reef, seagrass, macroalgae, and bare substratum classes [1,2]. If we want to understand the dynamics, changes, and impacts of management, and how these benthic habitats provide ecological functions and serve as a natural resources inventory, more detailed information on benthic habitat spatial distribution is required. The mapping was therefore conducted using a more detailed classification scheme. In this research, instead of using benthic habitat compositions to seek balance and consistency of mapping at different levels of complexity [14], the major-level scheme was further detailed based on ecological function and on how the classes may be spectrally separated using remote-sensing reflectance. A classification scheme based on ecological purposes was also proposed by Hochberg and Atkinson [39].
The coral reef class was divided into healthy, intermediate, and dead. These three classes are important for ecological analysis, for monitoring the health of the benthic habitat environment, and for evaluating management impacts. The healthy coral reef class refers to an area dominated by healthy coral reef of various life-forms. Coral reef life-forms such as branching, tabular, massive, sub-massive, and encrusting corals are commonly found in the study area. The intermediate class refers to an area of healthy coral reef with some variations of rubble, macroalgae, and dead coral. The dead coral class refers to an area dominated by dead coral reef, including bleaching corals, corals overgrown by algae, or corals surrounded by rubble. Healthy coral reef may be present in this class, but only at a very low percentage.
The seagrass class was detailed by species composition, given that species information may represent their unique ability to provide shelter and food for marine biota, to sequester and bury carbon, to protect the coast, and to serve as a biodiversity measure. Moreover, mapping seagrass species is a difficult task [40-42], despite the spectral response variations [43-45]. Seagrass species commonly found in abundance in the study area were Enhalus acoroides (Ea), Thalassia hemprichii (Th), and Cymodocea rotundata (Cr). Other species such as Halodule uninervis (Hu), Cymodocea serrulata (Cs), Syringodium isoetifolium (Si), and Halophila ovalis (Ho) are rarely encountered in abundance and are commonly found amid the more dominant seagrass species such as Ea, Th, and Cr.
The macroalgae class was further classified based on pigment variations, given that brown, green, and red macroalgae are ecologically and economically important. The life-form of macroalgae, i.e., encrusting, calcareous, turf, and fleshy, also indicates unique functions. However, since the reflectance of benthic habitat is essentially measured in visible wavelengths, classifying macroalgae by pigment is more feasible, especially since different life-forms of macroalgae may contain similar pigmentation, e.g., calcareous green algae and turf green algae, which makes spectral differentiation difficult.
Bare substratum class was further divided into sand class and rubble class. In the former, the main material is calcium carbonate, which is important for determining the ideal boat route, aquaculture location, and safe places for anchoring. The rubble class can be used as an indicator of coral reef health and degradation. Volcanic sand is unique to the study area. It is red-colored, originated from the weathered and eroded iron-rich volcanic materials, and can be found mainly on Kemujan Island.
In this research, we developed the mapping model based on the two levels of classification scheme. To begin with, we developed the model using the most complex classification scheme and then used the major classification scheme as comparison afterwards. The details of the classification scheme used in this research are given in Table 3.

Image Classification
Three machine-learning algorithms were used in carrying out the benthic habitat classification. The input for classification is the combination of deglint bands, DII, and PCA with bathymetry and slope of underwater benthic habitat topography.

Classification Tree Analysis (CTA)
CTA is categorised as a non-parametric univariate method of image classification. It is a bottom-up approach to classify image, where the user allows the algorithm to learn the information from the input training area in order to classify the image by continuously splitting the data to obtain a homogenous cluster of pixels in a hierarchical tree of classification rules. The components of CTA are: (1) The Root, which is the starting point of the classification tree; (2) the Internode, which is the connection between the roots, leaves, and other internodes; and (3) the Leaf, which is the resulting class that contains pixels of the same class or those classified to that class.
Three CTA splitting algorithms are available: Gain Ratio, Entropy, and Gini. In this study, the Gain Ratio algorithm was preferred because it can reduce the over-splitting tendency of the Entropy algorithm; it is, in fact, a normalization of the Entropy algorithm. The Gini splitting algorithm was not used, as it was difficult to find a significantly homogeneous class within our training area [46]. The process starts from the root; then, using the information from the training areas, pixels are split and assigned based on a binary split rule. The tree continues to grow until certain conditions are met, which are usually related to the stability of the leaf (benthic habitat class). The result is considered a more precise classification, given that each training datum (ROI) is treated as important and its characteristics are learned by the algorithm. We experimented with auto-pruning thresholds of 1%, 5%, 10%, and 15% in order to understand their impact on the classification result. The auto-pruning threshold eliminates leaves whose pixel count is equal to or less than the specified percentage of the class proportion.
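The relation between Entropy and Gain Ratio can be shown with a small sketch of the textbook definitions: gain ratio is the information gain of a split divided by the split information (the entropy of the partition sizes). This is not the implementation of the software used in the study; the single-band values, class labels, and thresholds are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs. value > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # no real split, nothing gained
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    split_info = -(w_left * math.log2(w_left) + w_right * math.log2(w_right))
    return gain / split_info

# Hypothetical reflectance of four training pixels in one band.
values = [0.10, 0.20, 0.80, 0.90]
labels = ["seagrass", "seagrass", "sand", "sand"]

balanced_split = gain_ratio(values, labels, 0.50)
uneven_split = gain_ratio(values, labels, 0.15)
```

The balanced threshold separates the two classes perfectly, so it scores higher than the uneven threshold that isolates a single pixel.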

Random Forest (RF)
The RF algorithm employed in this research was based on [47]. Random forest is an ensemble classification method for supervised classification based on classification trees. RF algorithm can produce a good classification result even though there are many outliers in the training data and also if the data has a lot of noise [48]. To obtain the best RF model, we tuned the RF algorithm using (1) different number of trees, i.e., 25, 50, 75, 100, 500, (2) different functions to determine the number of randomly selected features to determine the optimum split point in a node, which is important to avoid model overfitting, i.e., Square root and Log function, and (3) impurity functions, i.e., Gini coefficient and Entropy.
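The tuning space described above can be enumerated as a simple grid. The sketch below only lists the candidate settings and shows how the two feature-count functions differ; it does not train a forest. The base of the "Log function" is not stated in the text, so base 2 is assumed here, and the feature count of 25 is a hypothetical value.

```python
import math
from itertools import product

TREE_COUNTS = [25, 50, 75, 100, 500]
FEATURE_FUNCTIONS = ["sqrt", "log"]
IMPURITY_FUNCTIONS = ["gini", "entropy"]

def features_per_split(n_features, function):
    """Number of randomly selected features tried at each node."""
    if function == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if function == "log":
        return max(1, int(math.log2(n_features)))  # assumed base 2
    raise ValueError(f"unknown function: {function}")

def gini(class_counts):
    """Gini impurity of a node given its per-class pixel counts."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# Every combination of the three tuned settings.
grid = list(product(TREE_COUNTS, FEATURE_FUNCTIONS, IMPURITY_FUNCTIONS))

n_input_bands = 25  # hypothetical number of stacked input bands
mtry_sqrt = features_per_split(n_input_bands, "sqrt")
mtry_log = features_per_split(n_input_bands, "log")
```

Each of the 20 grid entries would be trained and scored in turn, with the best-scoring entry kept as the final model.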

Support Vector Machine (SVM)
SVM is a powerful machine-learning technique for image classification. It creates boundaries, called hyperplanes, in the multi-dimensional feature space that separate and classify each pixel into classes [49,50]. SVM builds the model based on the margin maximization concept. Hence, it can work efficiently with poor sample distributions and does not require prior estimation of the class statistical distribution for the classification process [8,51,52]. SVM is increasingly being used in image classification for benthic habitat mapping and has produced better accuracy than parametric classification algorithms, including Maximum Likelihood (ML) [8,34,52-54]. The SVM was run using the Gaussian Radial Basis Function (RBF) kernel. We tuned the SVM algorithm using different ranges and multiplier values for C (the regularization parameter) and g (the width of the Gaussian kernel function). The range of C and g tested was 0.01-1000.00, and the multiplier was set to 2, 5, and 10. The number of folds for the internal cross validation was set to 3. The termination criterion for the grid search was set to 0.100, and the termination criterion for final training was set to 0.001. The performance surface matrix was used to determine the combination of C and g that produced the highest classification accuracy. Both RF and SVM were run using EnMAP-Box.
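One plausible reading of the range and multiplier settings is a geometric grid: start at the lower bound and multiply by the multiplier until the upper bound is exceeded. The sketch below follows that interpretation, which is an assumption rather than a documented behaviour of the software, and also shows the usual RBF kernel form K(x, y) = exp(-g * ||x - y||^2). The pixel vectors are hypothetical.

```python
import math
from itertools import product

def parameter_steps(low, high, multiplier):
    """Geometric sequence low, low*m, low*m^2, ... not exceeding high."""
    steps, value = [], low
    while value <= high * (1 + 1e-9):  # tolerance for float rounding
        steps.append(value)
        value *= multiplier
    return steps

def rbf_kernel(x, y, g):
    """Gaussian RBF kernel between two pixel vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)

# Candidate values for C and g under each multiplier setting.
steps_by_multiplier = {m: parameter_steps(0.01, 1000.0, m) for m in (2, 5, 10)}

# With multiplier 10, the (C, g) grid searched by cross validation.
coarse_grid = list(product(steps_by_multiplier[10], repeat=2))

# Kernel similarity between two hypothetical two-band pixels.
similarity = rbf_kernel((0.10, 0.20), (0.15, 0.25), g=10.0)
```

A smaller multiplier gives a denser grid over the same range, at the cost of more cross-validation runs.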

Accuracy Assessments
The classification accuracy assessment has two parts. The first is the confusion matrix analysis, which was used to assess the accuracy of the classifications produced by RF, CTA, and SVM. It calculates the overall accuracy of the classification result, along with the user's and producer's accuracy of each benthic habitat class in the classification scheme [55]. Since classification accuracies are being compared, the McNemar test [56] was applied to the confusion matrices to determine whether the accuracies of two classifications are significantly different. In the second part, the machine-learning model with the highest accuracy was applied to other benthic habitat locations, namely Karimunjawa Island, Menjangan Besar Island, and Menjangan Kecil Island. This procedure tested the ability of the machine-learning model to classify benthic habitats independently, without the use of training areas in the other locations.
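The accuracy measures named above can be sketched as follows. The example assumes the convention that matrix rows are the classified (map) classes and columns are the reference classes, and it uses the continuity-corrected chi-square form of the McNemar statistic, compared against the 95% critical value for one degree of freedom (3.841). All counts are hypothetical.

```python
def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def users_accuracy(matrix, k):
    """Diagonal over row total: reliability of class k on the map."""
    return matrix[k][k] / sum(matrix[k])

def producers_accuracy(matrix, k):
    """Diagonal over column total: how well reference class k was mapped."""
    return matrix[k][k] / sum(row[k] for row in matrix)

def mcnemar_chi2(only_first_correct, only_second_correct):
    """Continuity-corrected McNemar statistic from the discordant counts."""
    b, c = only_first_correct, only_second_correct
    return (abs(b - c) - 1) ** 2 / (b + c)

# Hypothetical two-class confusion matrix (rows: map, columns: reference).
matrix = [[50, 5],
          [10, 35]]
oa = overall_accuracy(matrix)
ua = users_accuracy(matrix, 0)
pa = producers_accuracy(matrix, 0)

# Hypothetical counts of validation samples that only one of two
# classifications labeled correctly.
chi2 = mcnemar_chi2(20, 5)
significantly_different = chi2 > 3.841
```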

Research Flowchart
The flowchart of this research is provided in Figure 3.

Classification Results
Benthic habitat classification had its highest accuracy obtained from RF with 88.54% overall accuracy. The mean overall accuracy is 88.05 ± 0.29%. These were obtained from RF model using Gini coefficient to determine impurities in a node and Square root of all features to determine the number of randomly selected features. The selected number of trees is 100. This accuracy is very high, given the number of classes involved and the complexity of the classification scheme. RF produced a better accuracy than other algorithms, not only based on the statistics of accuracy assessment result, but also on the spatial distribution of the benthic habitats across the scene. Seagrass and macroalgae classes were classified along the shoreline. Reef-flat on the southern part of the scene was classified as brown algae and sand. The lagoon located on the western part of the island was also classified as healthy, intermediate or dead coral. Sand was correctly classified in the lagoon, especially in the back-reef area. The reef-crest and fore-reef were mainly classified as healthy coral reef with some mixture of intermediate coral reef. The misclassification between coral reef and seagrass in the reef-crest area and in the boundary between optically shallow and optically deep water did not occur on RF result but was noticeable in CTA and SVM results (see red polygon in Figure 4). However, there are also areas in the Northern part of Kemujan Island where seagrass was misclassified as coral reef, and only CTA was able to produce the correct classification (see blue boxes in Figure 4).
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 25 and in the boundary between optically shallow and optically deep water did not occur on RF result but was noticeable in CTA and SVM results (see red polygon in Figure 4). However, there are also areas in the Northern part of Kemujan Island where seagrass was misclassified as coral reef, and only CTA was able to produce the correct classification (see blue boxes in Figure 4). In RF result, a misclassification occurred between healthy coral, intermediate, and dead coral as they share similar spectra and class descriptor. Healthy coral was also misclassified as EaTh and sand. Brown algae class was mainly misclassified as mixed-algae and sand, which was expected, given that mixed-algae also contains brown algae, while mixed-algae was misclassified as coral reef classes due to similar pigmentation, resulting in similar reflectance. Ea was misclassified as brown algae since its reflectance covered by epiphyte resembles that of brown algae [42]. EaTh was misclassified as mixedseagrass and Th, as the spectra of these classes are overlapping [42]. Ho was mainly misclassified as intermediate coral, Th as brown algae and EaTh, and ThCr as Th. These misclassifications can be attributed to the association of Th or ThCr with brown algae, especially Padina sp. and Dictyota sp. The misclassification of Th as EaTh and ThCr as Th was as a result of the overlapping spectra of Th in these classes. Mixed-seagrass was understandably misclassified as ThCr, since ThCr spectra are also mixed-seagrass spectra. Furthermore, mixed-seagrass was also misclassified as sand. Sand was misclassified as brown algae, Ea, healthy coral, and intermediate coral, as sand was the dominant substrate for these classes and almost all benthic classes in the study area. See the confusion matrix of RF result for the detailed information in Table 4.
The highest accuracy from CTA was 77.8%, obtained when using DII. The accuracy of CTA using other inputs was similarly high, with a mean overall accuracy of 75.39 ± 2.17%, which was higher than the mean classification accuracy of SVM. When all inputs were utilized, the accuracy of CTA was 77.17%. The Ho class had very low accuracy, and rubble was totally misclassified. Ho is rarely found in large beds at high density, and thus the resultant reflectance is still strongly affected by the sandy background. It was also found adjacent to the brown alga Padina sp., and thus was mostly misclassified as sand and brown algae. Rubble had the worst result, with zero classification accuracy. Given that rubble is commonly found between and adjacent to coral reef of various conditions, all the validation samples were classified as healthy coral reef, intermediate coral reef, or sand. The class descriptors of the healthy and intermediate coral reef classes also included rubble as a minor component. Rubble is made of the same material as carbonate sand, and was thus easily misclassified as sand, which had more dominant coverage than rubble. See Table 5 for the confusion matrix of the CTA result.
The best CTA results were obtained using a 1% auto-pruning threshold. In our experiments with 5%, 10%, 15%, and 20% auto-pruning thresholds, the accuracy decreased at the 5% threshold but increased at 10%, and then kept decreasing at the 15% and 20% thresholds. In addition to the declining accuracy, the analysis revealed that high auto-pruning thresholds are not recommended for mapping with a complex classification scheme, because they eliminate some classes when the pixels in the leaves composing these classes do not meet the auto-pruning threshold criteria. For instance, at the 10% threshold, the accuracy of CTA using DII increased to 89%. However, only eight benthic habitat classes remained.
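The loss of classes under aggressive pruning can be sketched as follows. This is a simplified reading that assumes a leaf is removed when its share of all training pixels falls below the threshold; the exact rule applied by the classification software is not stated here and may differ. The leaf list and pixel counts are hypothetical.

```python
def prune_leaves(leaves, threshold):
    """Keep only leaves holding at least `threshold` (a fraction) of all pixels."""
    total = sum(pixels for _, pixels in leaves)
    return [(label, pixels) for label, pixels in leaves if pixels / total >= threshold]


def surviving_classes(leaves, threshold):
    """Classes that still own at least one leaf after pruning."""
    return sorted({label for label, _ in prune_leaves(leaves, threshold)})


# Hypothetical (class label, training pixels in the leaf) pairs.
leaves = [
    ("healthy coral", 380),
    ("sand", 300),
    ("brown algae", 150),
    ("Th", 120),
    ("Ho", 30),
    ("rubble", 20),
]

for threshold in (0.01, 0.10, 0.20):
    kept = surviving_classes(leaves, threshold)
    print(f"{threshold:.0%} threshold: {len(kept)} classes remain -> {kept}")
```

In this toy case the rare classes (Ho and rubble) disappear first, which mirrors how a higher threshold can raise the apparent accuracy while reducing the number of mapped classes.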
SVM was the third-best algorithm, with a highest overall accuracy of 75.98%. This was obtained using DII input; the mean accuracy was 74.27 ± 1.04%, slightly lower than the CTA mean. These results were obtained with the C value set to 10.00 and the g value set to 0.001, while the multiplier for the C and g values during model tuning was set to 10. Despite this accuracy, SVM failed to classify the rubble class and several seagrass classes such as ThCr, CrHu, and Ho (Table 6). This contradicts the work of Reference [9], where seagrass variations were correctly classified using SVM. This difference can be attributed to differences in the environmental complexity of the benthic habitats in each study area. Reference [9] classified areas covered mostly by seagrass of different conditions, which lowered the rate of misclassification of seagrass classes as other benthic habitats.
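The tuning setup described above can be pictured as a search over candidate values spaced by the multiplier. The sketch below is illustrative only: the starting values and the number of steps are hypothetical, and it assumes that g is the width parameter of a radial basis function (RBF) kernel, which this section does not state explicitly.

```python
import math


def tuning_grid(start, multiplier, steps):
    """Candidate values start, start*m, start*m^2, ... (`steps` values in total)."""
    return [start * multiplier ** k for k in range(steps)]


def rbf_kernel(x, y, g):
    """RBF kernel exp(-g * ||x - y||^2); using RBF here is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


# Hypothetical search ranges that contain the reported C = 10 and g = 0.001.
c_candidates = tuning_grid(0.1, 10, 4)      # roughly 0.1, 1, 10, 100
g_candidates = tuning_grid(0.0001, 10, 4)   # roughly 0.0001, 0.001, 0.01, 0.1

# Every (C, g) pair is one model to train and validate.
pairs = [(c, g) for c in c_candidates for g in g_candidates]
print(len(pairs), "parameter combinations")

# Similarity of two hypothetical pixel spectra under g = 0.001.
print(rbf_kernel([0.10, 0.12, 0.08], [0.11, 0.15, 0.07], 0.001))
```

Because each combination requires a full training run, even a small grid multiplies the processing time of the classifier.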
The main drawback of running the SVM algorithm with a large number of training samples is the time required to perform the classification. Running the SVM classification using area samples took almost four hours per classification process. We tried running the SVM algorithm on the newest computer hardware, but there was no significant decrease in processing time compared to older hardware. Thus, although the accuracy of SVM is quite similar to that of CTA, its productivity is much lower, especially if we are to experiment with various SVM parameter settings. The overall accuracy of each classification result is summarized in Table 7. The results of the machine-learning classifications are provided in Figure 5.
A McNemar test was performed to select the ideal model to be applied to the other islands. RF produced the highest classification accuracy, and based on this test, the performance of the RF model using all datasets was not significantly different from that using deglint bands alone. For this reason, we selected the RF model built from deglint bands, which is more consistent and can be widely applied. Using DII, PC bands, and bathymetry would increase the complexity of the required input and lead to non-standard input values in exchange for an insignificant OA improvement. The values of PC bands and DII are highly dependent on the statistics of the input samples and the image, which makes them prone to subjectivity. Meanwhile, bathymetry varies greatly between areas and cannot be used as a standard parameter.
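As an illustration of how such a comparison works, the sketch below computes the McNemar z statistic for two classifications evaluated on the same validation samples. It uses the normal approximation without continuity correction, which is one common form of the test and not necessarily the exact variant applied in this study; the labels are hypothetical.

```python
import math


def mcnemar_z(reference, pred_a, pred_b):
    """McNemar z and two-sided p-value from samples where exactly one map is correct."""
    only_a = sum(1 for r, a, b in zip(reference, pred_a, pred_b) if a == r and b != r)
    only_b = sum(1 for r, a, b in zip(reference, pred_a, pred_b) if a != r and b == r)
    if only_a + only_b == 0:
        return 0.0, 1.0  # the two maps never disagree in correctness
    z = (only_a - only_b) / math.sqrt(only_a + only_b)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value


# Hypothetical validation labels and two hypothetical classification results.
reference = ["coral"] * 5 + ["seagrass"] * 5
pred_a = list(reference)
pred_a[9] = "coral"                 # map A: one error
pred_b = list(reference)
for i in range(4):
    pred_b[i] = "seagrass"          # map B: four errors that A gets right
pred_b[9] = "coral"                 # plus the error shared with A

z, p = mcnemar_z(reference, pred_a, pred_b)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Only the discordant samples enter the statistic, so two maps with very similar overall accuracy can still differ significantly, and vice versa. With so few samples the normal approximation is rough; real validation sets are far larger.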

Model Application
The application of the 14-class RF benthic habitat model from Kemujan Island to Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands (hereafter referred to as the test areas) was not very successful. The accuracy of RF in the test areas was 48.99%, with healthy coral reef as the most accurate class at 66.24% UA and 91.99% PA. The other classes followed: ThCr (UA 55.13%, PA 55.4%), Ea (UA 49.6%, PA 49.87%), rubble (UA 33.33%, PA 33.59%), intermediate (UA 28.61%, PA 28.85%), and sand (UA 21.38%, PA 27.8%). The rest of the benthic habitat classes had less than 10% UA and PA. Nevertheless, the result was spatially consistent in that the shoreline was dominated by seagrass and brown algae, and coral reef was located in the reef-crest or reef-cut. However, there were some inconsistencies, such as healthy coral occurring along the shoreline and the overestimation of seagrass extent in Menjangan Besar and Menjangan Kecil Islands. These results indicate that:
• The classification scheme of benthic habitat is too detailed and creates confusion when the model is applied in other areas.
• Since not all benthic classes may be present in all areas, it is unclear whether the model failed to classify a particular class, e.g., CrHu, or whether this class truly does not exist in the area. In our case, we can confirm that this was a failure of the model, since our field data indicated that CrHu occurs in the field.
• For general mapping, the scheme needs to be refined and made more universal, to ensure that all classes in the scheme are present in many areas.
To justify our statement, we simplified the scheme to the major benthic habitat classes: coral reef, seagrass, macroalgae, and "sand and rubble". All training areas were re-labelled based on these classes. We developed the model by means of RF using deglint bands, which produced 94.17% OA. The UAs are 98.58%, 83.13%, 66.69%, and 91.80% for the coral reef, seagrass, macroalgae, and "sand and rubble" classes respectively. The PAs are 97.44%, 87.76%, 58.57%, and 93.74% for the classes in the same order. Most classes produced very high accuracies with low misclassification. The macroalgae class had the lowest accuracy owing to a high misclassification rate with "sand and rubble" and seagrass.
Afterwards, we applied this RF model to the test areas, which yielded 70.93% OA. The UAs for the coral reef, seagrass, macroalgae, and "sand and rubble" classes are 86.24%, 44.30%, 5.94%, and 37.00% respectively, while the PAs are 91.75%, 20.40%, 13.48%, and 29.64% for the classes in the same order. The simplified RF model produced more accurate results, as expected. This accuracy is high, especially for rapid mapping, and it is within the acceptable limit for benthic habitat mapping using a scheme that comprises four benthic classes. The acceptable limit lies within the range of 40%-70% [1] and is >60% based on the Indonesian National Standard for Mapping [57]. Nevertheless, only the coral reef class produced an accurate result, while the other classes had low accuracies. The seagrass class was highly misclassified as macroalgae and coral reef, and macroalgae was highly misclassified as "sand and rubble" and seagrass. The "sand and rubble" class was highly misclassified as coral reef, which indicates that the spatial distribution of coral reef was overestimated. Many areas near the shoreline, which should have been classified as sand, seagrass, or macroalgae, were misclassified as coral reef.

Accuracy Comparison
The goal of this research is to perform benthic habitat mapping using machine-learning algorithms, develop a mapping model from the most accurate result, and lastly, adapt the model to other areas. The machine-learning approach differs from parametric classification algorithms, which utilize training-area statistics to generate the centroid of each class cluster and then classify the remaining pixels based on the range of boxes, shortest distance, Mahalanobis distance, maximum probabilities, or angle of spectral similarity [58]. Machine learning may resolve the issue of non-Gaussian distribution of training areas, which in this research is mainly due to the sub-pixel mixing of benthic habitats, the different numbers of training areas between classes, and possible inconsistencies in labeling the sample photos.
The use of CTA for benthic habitat mapping is limited, so a direct comparison could not be performed. SVM is a powerful classifier for benthic habitat mapping, as described in previous works [8,9,52]. In this research, the reported accuracy of SVM is higher than in other works that employed less complex classification schemes [8,9,34,53]. The setbacks are the failure to properly classify small life-form seagrass classes, in contrast to RF and CTA, and the required processing time. In SVM, four classes had zero accuracy, even after we experimented with different SVM parameter settings in order to obtain more effective classification results and minimize misclassification.
It is also rather difficult to compare our result with others, since the scheme used in each study is unique. The closest is [14] with 13 classes, which reached an accuracy of 40% using PC bands. However, that scheme does not contain any species differentiation of seagrass or variation of macroalgae pigments. Our accuracy is higher than that of other studies with the same or even lower scheme complexity. Even when using hyperspectral data, an accuracy of over 80% was only obtained for benthic habitat mapping with 3-12 classes [7]. Recent studies that made use of high spatial resolution images classified fewer than ten benthic habitat classes; examples are [8], with 73% accuracy (four classes), and Reference [59], which employed the Spectral Angle Mapper (SAM) method on a CASI hyperspectral image (7 classes, accuracy not reported).
The misclassification pattern for the SVM and CTA results is similar: on average, the user's and producer's accuracies of classes containing multiple benthic classes (mixed classes) are lower than those of classes containing a single benthic class. Meanwhile, in the RF result, the average producer's accuracy of mixed classes is lower, but the average user's accuracy of mixed classes is higher, than that of classes containing a single benthic class (see Tables 4-6). Therefore, the RF algorithm performs better than SVM and CTA at correctly classifying different benthic habitat compositions. However, strong misclassification did not only occur for mixed classes, but also for classes containing a single benthic object, due to the similarity of spectral response between individual benthic classes, overlapping class descriptors, and class association, as explained in Section 3.1. As a consequence, even a class containing a single benthic object can have a high misclassification rate.
The application of PCA did little to improve the accuracy of RF, CTA, and SVM, which is not in accordance with the result obtained by Reference [14], where the accuracy of PC bands outperformed other inputs such as deglint bands and DII in hierarchical benthic habitat mapping using ML. Another contrary result was obtained by Reference [34], where the application of SVM to PC bands of a Landsat 8 image obtained higher accuracy than that of DII. In fact, our accuracy is relatively constant for all inputs, with a very low standard deviation. Our results indicated that the incorporation of bathymetry and slope data in the classification input had no significant effect on the classification accuracy. Depending on the main input bands and algorithm, adding bathymetry and slope data may or may not improve the accuracy. This is similar to the result from Reference [60], where the addition of LIDAR data increased or decreased the classification accuracy depending on the associated input bands and classification algorithm. Reference [7] indicated that bathymetry is not necessary to improve the accuracy of benthic habitat mapping, but Reference [8] reported otherwise. This difference may result from the quality of the bathymetry data. The bathymetry model of [8] was generated from a more complex radiative transfer model, resulting in a more accurate bathymetry map that improved the accuracy of the benthic habitat classification.
Reference [8] highlighted the importance of water-column correction, so that the dominant spectral response from underwater pixels is sourced solely from benthic habitats. Meanwhile, [7] warned about the difficulty of removing the water-column effect in optically shallow water, due to the need to obtain unique parameters for each spectral band. Additionally, the correction of the water-column effect may not always be beneficial. Using ML, [14] did not manage to obtain an accuracy improvement just by applying DII. Since there are several methods for removing the water-column effect, it is important to assess the impact of these different methods on classification accuracy in future work.
In our case, the most accurate CTA and SVM results were from DII. Although the accuracy improvement for CTA was only 2% over deglint bands, DII made a significant impact in resolving the misclassification between coral reef and seagrass. Interestingly, when the results of RF, SVM, and CTA from deglint bands, water column corrected bands, and PC bands are compared, the accuracy does not vary much. Even when all data were used at once, the accuracy was still not significantly different. Hence, machine-learning algorithms may also reflect the maximum descriptive resolution of a remote-sensing image for mapping benthic habitats at this level of complexity.

Benefits and Setbacks of Machine-Learning Algorithms
Using machine-learning algorithms, it is possible to include various datasets as classification inputs, whether spectral bands, continuous datasets (e.g., bathymetry, slope, and distance from shoreline), or categorical datasets (e.g., a coral reef geomorphology map), in order to obtain an accurate benthic habitat map. The more data we use, the more information is available to the machine-learning process, and we do not have to repeatedly process and classify the image using different scenarios. Thus, a machine-learning algorithm may produce a classification result at the maximum image descriptive resolution. In this research, the maximum accuracy is 88.54% (14 classes) and 94.17% (4 classes), both derived from RF using deglint bands. When we consider the various combinations of inputs or all input bands from all algorithms, the difference is insignificant (<5% standard deviation), which means that we can use either all the available data or only limited data, such as deglint bands, to produce high accuracy. Machine-learning algorithms are powerful enough to exploit the input bands to their maximum descriptive resolution, as can be observed from the insignificant difference between the accuracies of different input bands. Consequently, further processing may not be necessary, given that deglint bands are capable of producing an accurate result. We just need to ensure the correctness of the spatial distribution of benthic habitats by carefully identifying areas with potential misclassifications.
We also maintained the details provided by WV2 spatial resolution (2 m) while obtaining higher classification accuracy. Depending on benthic habitat complexity of the study area, the intended details of benthic habitat map, and the complexity of the classification scheme, per-pixel classification or OBIA may produce more representative benthic habitat spatial distribution. In this research, we preferred to have information at pixel level for benthic habitat mapping, as the information contained in each pixel is unique and the variation of adjacent benthic habitat in the study area is not always constant.
There are many small patches of coral reef, seagrass, sand, or macroalgae in between and among one another. The change between coral reef and adjacent habitat may not be linear, and sometimes exhibits a distinct feature boundary [61]. This distinct boundary is essential in identifying the biodiversity and ecological composition. Generalizing this variation may not be ecologically sound and reduces information on intra-habitat variation. Performing majority analysis may provide a better object distribution and less noise. However, it also decreases the precision provided by the WV2 image, and thus negates the high cost of purchasing the image [61]. Finally, the choice between per-pixel classification and OBIA depends on the in situ object variations that we intend to explain, as well as on the aim of the benthic habitat mapping.
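The trade-off of majority analysis mentioned above can be seen in a minimal sketch of a 3x3 majority filter applied to a classified raster. The window size, the tie rule (a pixel changes only when another class strictly outnumbers its own), and the toy map are assumptions made for illustration.

```python
from collections import Counter


def majority_filter(grid):
    """Return a copy of `grid` smoothed with a 3x3 majority (mode) window."""
    rows, cols = len(grid), len(grid[0])
    result = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood; windows are truncated at the map edge.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            top_class, top_count = counts.most_common(1)[0]
            # Replace the pixel only when another class strictly outnumbers its own.
            if top_count > counts[grid[r][c]]:
                result[r][c] = top_class
    return result


# Hypothetical 5x5 classified map: a single coral pixel inside a sand flat.
classified = [["sand"] * 5 for _ in range(5)]
classified[2][2] = "coral"

smoothed = majority_filter(classified)
print(smoothed[2][2])  # sand
```

The isolated coral pixel, which may well be a real 2 m patch, is removed along with genuine noise, which is the loss of detail the per-pixel approach avoids.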

Machine-Learning Model Performance in Test Area
A model developed with a complex classification scheme is not effective when applied to other areas. Only a few classes were classified with good accuracy, especially the most dominant class, healthy coral reef. In addition, the detailed variation of benthic habitat may differ between areas, so the model may include a class that does not exist in the areas where it is applied. The use of a more general scheme yielded better accuracy, though some classes still failed to be mapped at high accuracy. Coral reef remains the class mapped with the highest accuracy, while macroalgae is mapped with the lowest.
As regards a model that can be widely applied, it is important not to use input bands with high variability. For instance, the quality of a water column corrected image is highly subjective and dependent on the range of depths and the water column attenuation coefficient. PC band results also rely strongly on the object variation in the scene, and bathymetry varies greatly between areas. As a result, the application of the model may fail. The most standard inputs would be surface reflectance and deglint bands. However, deglint bands may also introduce variability, depending on the strength of the sunglint.
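The dependence of water column corrected bands on local sample statistics can be made explicit. The expression below assumes that DII follows the widely used Lyzenga depth-invariant index formulation, which this section does not state; the symbols are introduced here for illustration only.

```latex
\mathrm{DII}_{ij} = \ln(L_i) - \frac{k_i}{k_j}\,\ln(L_j), \qquad
\frac{k_i}{k_j} = a + \sqrt{a^{2} + 1}, \qquad
a = \frac{\sigma_{ii} - \sigma_{jj}}{2\,\sigma_{ij}}
```

Here $L_i$ and $L_j$ are the deglinted, deep-water-corrected signals in bands $i$ and $j$, $k_i/k_j$ is the ratio of their attenuation coefficients, and $\sigma_{ii}$, $\sigma_{jj}$, and $\sigma_{ij}$ are the variances and covariance of $\ln(L_i)$ and $\ln(L_j)$ over samples of one bottom type taken at varying depths. Since these statistics come from analyst-selected samples in each scene, the resulting index values are not directly comparable between areas.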
The main goal of developing a benthic habitat mapping model is to establish an automatic, accurate, and consistent mapping model that can be widely applied. This research has shown that it is possible to perform such a task, although it is still premature to conclude so, since the areas used in testing the model performance have a relatively similar reef type and benthic habitat complexity. We plan to apply the RF model developed in this research to map benthic habitats in different reef types with different benthic habitat complexities. It is also important for us to develop similar models using only three visible bands, given that this is the most widely available spectral resolution, e.g., IKONOS, Quickbird, Geoeye-1, Skysat, PlanetScope, Rapideye, ALOS AVNIR-2, Sentinel-2, the Landsat series, and the SPOT series. This will ensure data continuity and the possibility of performing historical analysis of benthic habitats.
Finally, the classification scheme needs to be refined in order to understand whether it satisfactorily answers various management needs, and whether capturing the variations of benthic habitats across different geographical areas is necessary. For instance, the seagrass classes in our study did not include Thalassodendron ciliatum (Tc), which is uniquely abundant on Nusa Lembongan Island [42]. Furthermore, it is essential to identify the balance between the detail of the benthic habitat classification scheme and the ease of adapting the scheme to different areas. If these can be achieved, we can predict a well-detailed benthic habitat map across inaccessible areas, since in situ data to train the classification process would no longer be required, unless the benthic habitat composition is unique to the area. Obtaining in situ benthic habitat data for validation will then be subject to the management priority.

Conclusions
Machine-learning algorithms are powerful approaches for classifying benthic habitats under both detailed and major classification schemes, producing accurate benthic habitat maps. They also produced more stable classification results, in that varying the input bands caused no significant change in overall accuracy. The maximum accuracy was obtained from RF, with 94.17% (4 classes) and 88.54% (14 classes), which is higher than most classification results at similar levels of scheme complexity; RF also resolved various misclassification issues encountered by CTA and SVM. CTA and SVM also produced high accuracy, but there were several issues of misclassification between coral reef and seagrass classes in the CTA result, and SVM failed to classify 4 of the 14 benthic habitat classes. The application of the RF model to benthic habitat classification in other areas shows that a more general classification scheme is recommended, to avoid several issues associated with benthic habitat variations. We successfully applied the 4-class RF model developed on Kemujan Island to the test areas with acceptable accuracy. Therefore, the RF model developed in this project can be applied to other benthic habitat areas, provided that the input bands required by the model are available. However, it is important to test our RF model in other areas with either similar or different environmental settings to provide a more comprehensive assessment of its applicability. When the mapping model can be applied to other areas with relatively consistent accuracy, initial benthic habitat mapping can be conducted across many currently unmapped areas without performing a field survey.