Article

Benthic Habitat Mapping Model and Cross Validation Using Machine-Learning Classification Algorithms

1
Department of Geographic Information Science, Faculty of Geography, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia
2
Faculty of Engineering, Universitas Esa Unggul, Jakarta 11510, Indonesia
*
Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(11), 1279; https://doi.org/10.3390/rs11111279
Received: 15 April 2019 / Revised: 21 May 2019 / Accepted: 25 May 2019 / Published: 29 May 2019
(This article belongs to the Section Coral Reefs Remote Sensing)

Abstract

This research aimed to develop a benthic habitat mapping model using machine-learning classification algorithms and to test the applicability of the model in different areas. We integrated in situ benthic habitat data and image processing of a WorldView-2 (WV2) image to parameterise three machine-learning algorithms: Random Forest (RF), Classification Tree Analysis (CTA), and Support Vector Machine (SVM). The classification inputs are sunglint-free bands, water-column-corrected bands, Principal Component (PC) bands, bathymetry, and the slope of underwater topography. Kemujan Island was used in developing the model, while Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands served as test areas. The results indicated that RF was more accurate than the other classification algorithms, based on both the accuracy statistics and the spatial distribution of benthic habitats. The maximum accuracy of RF was 94.17% (4 classes) and 88.54% (14 classes). The accuracies from RF, CTA, and SVM were consistent across different input bands for each classification scheme. Applying the RF model to benthic habitat classification in other areas showed that the more general classification scheme is recommended, in order to avoid several issues regarding benthic habitat variations. The result also established the possibility of mapping benthic habitats without the use of training areas.
Keywords: classification; benthic habitat; machine-learning; mapping; random forest; support vector machine; classification tree analysis; accuracy assessment

1. Introduction

Indonesia is a country with abundant coastal and marine natural resources, including coral reef resources. Along with several neighbouring countries in the Asia-Pacific region such as the Philippines, Papua New Guinea, Timor Leste, Malaysia, and the Solomon Islands, Indonesia is incorporated in the Coral Triangle Initiative (CTI), which is an association of countries belonging to the center of coral reef biodiversity in the world and was established with the aim of preserving the natural resources of coral reefs. This underwater ecosystem has a high strategic function in national development and turning the wheels of the national economy, including through the food security sector (supporting the sustainability of fish resource stocks), tourism (through snorkeling and diving), and coastal protection (from coastal erosion caused by currents and waves). Furthermore, the role of coral reefs is essential to President Joko Widodo’s program in realizing Indonesia as the World Maritime Axis (PMD). The preservation of coral reefs will be of significant support to Indonesia’s ideals of becoming a PMD as regards food security through marine fishery resources (pillar number 2), and the economy through maritime tourism (pillar number 3). Given the importance of coral reefs to the nation, information regarding the spatial distribution of its conditions is very important.
Mapping benthic habitats, including coral reefs, with remote sensing is challenging, and this has led to the development of approaches aimed at higher classification accuracy [1,2]. Per-pixel classification algorithms were commonly used for the mapping process, especially at a general level of benthic habitat complexity [3,4]. Algorithm development was further enhanced by using information on object texture, shape, neighborhood, and other spatial aspects in Object-based Image Analysis (OBIA) [5,6]. Recently, non-parametric machine-learning classification algorithms such as Support Vector Machine (SVM) and Random Forest (RF) have also been adapted and have shown promising accuracy [7,8,9]. Improving accuracy is not limited to finding a suitable classification algorithm; it also involves finding the most suitable benthic habitat classification scheme for the specific remote-sensing data [10], integrating various datasets such as hyperspectral images, aerial photos, and bathymetry [7,9], incorporating active and passive remote-sensing systems [11], and conducting the mapping procedure using a hierarchical classification process [5].
Recent improvements in the spatial and spectral resolution of multispectral images challenge us to produce more accurate benthic habitat maps. However, the spectral bands able to penetrate the water body are limited to the visible bands and are hindered by the low signal-to-noise ratio (SNR) of the shorter-wavelength bands [8], which makes the mapping more challenging. Consequently, only benthic habitat mapping using a four-class scheme (i.e., coral reefs, seagrass, macro-algae, and sand) generally achieves high accuracy [2], while accuracy is relatively lower for mapping with a detailed scheme [12,13,14]. Furthermore, Reference [15] established that even at 0 m most benthic habitats with similar pigmentation are difficult to differentiate spectrally, and that, because of water-column energy attenuation, the difficulty of spectrally separating benthic habitats increases with water depth. This occurs especially when a detailed benthic habitat classification scheme is desired. In addition, the accuracies of detailed benthic habitat mapping vary greatly with the images and methods employed, as well as with the complexity of the benthic habitat environment and the composition of each study area [2,16].
A commonly used classification algorithm for benthic habitat mapping is Maximum Likelihood (ML) [3,4,12,13,14,17]. This classifier is considered ideal if the spectral response of benthic habitats has a Gaussian distribution and the training areas of the benthic habitat features in the dataset are normally distributed. In reality, these assumptions are not always met, especially for complex benthic habitat environments. Machine-learning techniques such as SVM, RF, and Neural Net (NN) are better suited to these conditions and have produced higher accuracy than ML [7,9,18], since they do not require such assumptions to work effectively.
Another issue in benthic habitat mapping using remote sensing is the need to obtain field data for training the classification algorithm each time a classification is carried out. As a consequence, benthic habitat maps for areas with limited access are scarce or may not exist at all, even though these hard-to-reach areas shelter a rich biodiversity of benthic habitats. Many of these areas remain unmapped mainly because of this requirement, which increases the time needed to conduct the mapping as well as the costs of mapping logistics, i.e., transportation, accommodation, surveyors, and device insurance. Therefore, it is also essential to develop a benthic habitat mapping model that can be applied to different areas with relatively consistent accuracy, and this became the main purpose of this research.
We tested three machine-learning classification algorithms (RF, CTA, and SVM) to classify benthic habitats and to obtain the parameters of the most accurate map for adaptation to other areas. A machine-learning algorithm can develop parameters and a set of decision rules that can be saved and applied to other images, provided they have similar inputs. The classification was performed at pixel level because this preserves the precision of detailed benthic habitat variations, which is to some extent sacrificed in OBIA. We used two levels of classification scheme complexity to assess the performance of the model. SVM and RF were previously used to map benthic habitats by [7,8,9,18]. Therefore, by using SVM and RF, our work can be compared with these studies, be more widely recognized, and be placed in the context of the broader benthic habitat mapping framework. Meanwhile, we also propose the CTA algorithm for benthic habitat mapping, to enrich the selection of possible machine-learning algorithms for this task.
This research was conducted in the Karimunjawa Islands (Figure 1). These islands shelter a high biodiversity of benthic habitats [14,19]. The substrate is mainly dominated by carbonate sand. Rubble is also present in water adjacent to the shoreline, and red-colored volcanic sand dominates the substrate along the shoreline of Kemujan Island. The mapping model was developed on Kemujan Island, which was selected as the representative area because of its high variation in reef ecology, morphology, and water depth [14]. Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands were used in assessing the applicability of the mapping model.

2. Materials and Methods

2.1. Image Data

The main remote-sensing image used in this research was a WorldView-2 (WV2) image acquired on 24 May 2012, with 2 m spatial resolution, eight multispectral bands from the coastal to the near-infrared (NIR) band, and 11-bit radiometric resolution (Table 1). The classification process involved only visible bands, while the NIR bands were used only for sunglint correction. Several derivative datasets used as classification inputs were derived from this WV2 image, including deglint (sunglint-free) bands, water-column-corrected bands, bands from Principal Component Analysis (PCA, hereafter PC bands), modelled bathymetry, and underwater slope.

2.2. Image Corrections

2.2.1. Atmospheric Correction

The WV2 image was received at Level 3X Ortho; the pixels contain radiometrically calibrated Digital Number (DN) values and have already been geometrically orthorectified. The DN was converted to Top-of-Atmosphere (TOA) radiance and reflectance following the formula and parameters described in the handbook and image header (see Table 1). Atmospheric correction was applied to the WV2 image to obtain the Bottom-of-Atmosphere (BOA) reflectance image. Although BOA reflectance is not mandatory when mapping with a single-date image [20], it is required to ensure that the relationship between spectral bands during PCA transformation is consistent [14]. In addition, because the machine-learning mapping model will be applied to other areas, all images must be independent of variations in atmospheric conditions, which makes atmospheric correction necessary. The Dark-Object Subtraction (DOS) method [21] was applied to the TOA reflectance image using optically deep water reflectance as the dark target [22]. Pixels of optically deep water were selected manually by visual interpretation of the true color composite of the WV2 image (RGB 532, Red-Green-Blue). Using the DOS formula described in [1], the values of the dark target were converted into the atmospheric offset caused by path radiance (Table 1). These values were subtracted from the TOA reflectance image to obtain the BOA reflectance image.
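The subtraction step can be illustrated with a minimal sketch. It assumes that the per-band offset is simply the mean TOA reflectance of the manually selected deep-water pixels; the exact offset formula of [1] is not reproduced here, and the function names, the zero clipping, and the sample values are all hypothetical.

```python
from statistics import mean


def dark_object_offsets(deep_water_pixels):
    """Per-band offset: mean TOA reflectance over deep-water pixels (assumed estimator)."""
    n_bands = len(deep_water_pixels[0])
    return [mean(px[b] for px in deep_water_pixels) for b in range(n_bands)]


def apply_dos(pixel, offsets):
    """Subtract the offsets from one TOA pixel; clipping at zero is an added safeguard."""
    return [max(r - o, 0.0) for r, o in zip(pixel, offsets)]


# Hypothetical three-band TOA reflectances of two optically deep water pixels.
deep_water = [[0.10, 0.08, 0.05], [0.12, 0.08, 0.07]]
offsets = dark_object_offsets(deep_water)

# Hypothetical shallow-water pixel converted from TOA to BOA reflectance.
boa_pixel = apply_dos([0.30, 0.25, 0.06], offsets)
```

In practice the same offsets would be applied to every pixel of every visible band before the later correction steps.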

2.2.2. Sunglint Correction

Sunglint is visible across the WV2 image, so this specular reflection must be removed to minimize noise during image classification and to avoid misclassification. If not removed, bright sunglint pixels can easily be misclassified as carbonate sand. A number of sunglint correction methods were reviewed by [23]. Among those methods, the simple but robust method by [24] was preferred. It is a straightforward model, similar to the method developed by [25], but it produces less noise and is more suitable for benthic habitat mapping than for bathymetry mapping [23]. An NIR band is required to correct sunglint in the visible bands, and since the WV2 image has two NIR bands, it is essential to select the better one. Thus, regression analysis was conducted for each visible and NIR band pair. A summary of the sunglint correction process can be found in Table 2.
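As an illustration of the regression-based selection and correction, the sketch below assumes a common deglinting form in which each visible band is regressed against an NIR band over a glint-affected deep-water sample and the corrected value is R_vis − b · (R_NIR − min R_NIR). This form may differ in detail from the method of [24]; the choice of the NIR band by highest R², the function names, and all reflectance values are assumptions for illustration only.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and R^2 of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def deglint(vis, nir, slope, nir_min):
    """Remove the glint contribution predicted from the NIR signal."""
    return vis - slope * (nir - nir_min)


# Hypothetical deep-water sample: one visible band and two candidate NIR bands.
vis_sample = [0.11, 0.15, 0.19, 0.23]
nir_candidates = {
    "NIR1": [0.02, 0.04, 0.06, 0.08],
    "NIR2": [0.03, 0.02, 0.07, 0.05],
}

# Fit each visible/NIR pair and keep the NIR band with the highest R^2.
fits = {name: linear_fit(nir, vis_sample) for name, nir in nir_candidates.items()}
best_nir = max(fits, key=lambda name: fits[name][1])
slope = fits[best_nir][0]

# Correct one glint-affected pixel using the selected NIR band.
corrected = deglint(0.23, 0.08, slope, min(nir_candidates[best_nir]))
```

The same fit would be repeated for every visible band, each with its own slope.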

2.2.3. Water-Column Correction

A simple water-column correction technique, the Depth-Invariant Bottom Index (DII) developed by References [26,27], was applied to the deglint image to remove variations in benthic habitat reflectance caused by water depth. With this technique, it is not necessary to obtain the actual water-column attenuation coefficient for each band (k) or the water depth for each pixel, as in more robust methods [28,29,30,31]. Using DII therefore prevents errors in the predicted attenuation coefficients and water depths from propagating through the water-column correction algorithm and adversely affecting the classification result [14]. Instead, the technique requires only the ratio of water-column attenuation coefficients between each pair of visible bands, which can be derived statistically from the reflectance of similar objects located at different depths. Altogether, there are 15 DII combinations from the six WV2 deglint bands (the statistics used to derive the water-column-corrected bands are not shown).
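The following sketch shows how the attenuation ratio and the index are commonly computed in the widely used formulation of this technique: with X = ln(R) for a uniform bottom type at varying depths, a = (σ_ii − σ_jj) / (2σ_ij), k_i/k_j = a + √(a² + 1), and DII_ij = ln(R_i) − (k_i/k_j) · ln(R_j). The band names, the synthetic sand reflectances, and the attenuation values are hypothetical and are not taken from this study.

```python
import math
from itertools import combinations


def attenuation_ratio(xi, xj):
    """k_i/k_j from log-reflectances of one bottom type sampled at different depths."""
    n = len(xi)
    mi, mj = sum(xi) / n, sum(xj) / n
    var_i = sum((v - mi) ** 2 for v in xi) / (n - 1)
    var_j = sum((v - mj) ** 2 for v in xj) / (n - 1)
    cov = sum((a - mi) * (b - mj) for a, b in zip(xi, xj)) / (n - 1)
    a = (var_i - var_j) / (2 * cov)
    return a + math.sqrt(a * a + 1)


def dii(ri, rj, ratio):
    """Depth-invariant index for one pixel and one band pair."""
    return math.log(ri) - ratio * math.log(rj)


# Synthetic sand reflectance decaying with depth (k_i = 0.1, k_j = 0.2 are made up).
depths = [1, 2, 3, 4, 5]
band_i = [0.4 * math.exp(-2 * 0.1 * z) for z in depths]
band_j = [0.3 * math.exp(-2 * 0.2 * z) for z in depths]

ratio = attenuation_ratio([math.log(r) for r in band_i],
                          [math.log(r) for r in band_j])
index_values = [dii(ri, rj, ratio) for ri, rj in zip(band_i, band_j)]

# Six deglint bands give 15 unordered band pairs (band names are illustrative).
bands = ["coastal", "blue", "green", "yellow", "red", "red_edge"]
band_pairs = list(combinations(bands, 2))
```

For the synthetic data the index is the same at every depth, which is the property that makes it usable as a depth-independent classification input.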

2.3. Principal Component Analysis (PCA)

Principal Component Analysis (PCA) was applied to the WV2 deglint bands to reduce the dimensionality of the dataset and produce uncorrelated output bands, each containing a linear combination of spectral information from all deglint bands. Image transformation, especially PCA, has improved the overall accuracy of benthic habitat mapping at different levels of classification scheme complexity [14]. Previous studies that employed noise-reducing transformations such as Minimum Noise Fraction (MNF) also reported accuracy improvements for benthic habitat mapping [7,32,33,34]. The resulting PC bands were used as classification input.

2.4. Bathymetry Map

Water depth vertically controls the spatial distribution of benthic habitats, as most photosynthesizing biota are depth-limited. Macroalgae may live down to the depth where irradiance is only 1% of the surface value, whereas seagrass and most zooxanthellae in reef-building corals require over 10% of the surface irradiance in order to perform photosynthesis [35,36]. Bathymetry is therefore a key factor to be considered when mapping the spatial distribution of benthic habitats. The bathymetry map for the study area was adapted from [37]. The modelled bathymetry map has a standard error of estimate (SE) of 1.01 m across an optically shallow water body up to a depth of 7 m. It was obtained from the ratio of the blue and yellow bands (R² = 0.776, sig. 95%CL). The bathymetry data were included in the classification input and were also used to calculate the slope steepness of the underwater topography. The unit of the resulting slope map is percent (%), and the map was categorized into ten classes with intervals of 10% from 0–100%.
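A minimal sketch of the slope derivation and its ten-class categorization is given below. The source does not state which slope algorithm was used, so a central-difference gradient on the 2 m grid is assumed here; assigning slopes of 100% or more to the last class is also an assumption, and the depth grid is made up.

```python
import math


def slope_percent(depth, row, col, cell_size=2.0):
    """Slope (%) at an interior cell from central differences (assumed algorithm)."""
    dz_dx = (depth[row][col + 1] - depth[row][col - 1]) / (2 * cell_size)
    dz_dy = (depth[row + 1][col] - depth[row - 1][col]) / (2 * cell_size)
    return 100.0 * math.hypot(dz_dx, dz_dy)


def slope_class(percent):
    """Class 1 = 0-10 %, class 2 = 10-20 %, ..., class 10 = 90-100 % and above."""
    return min(int(percent // 10) + 1, 10)


# Hypothetical 3 x 3 depth grid (m): depth increases by 1 m per 2 m cell eastward.
depth_grid = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 2.0],
]
center_slope = slope_percent(depth_grid, 1, 1)
center_class = slope_class(center_slope)
```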

2.5. Field Data Collection and Classification Scheme Construction

2.5.1. Field Data Collection

Benthic habitat in situ samples were obtained using the photo-transect survey method [38]. In short, benthic habitat photos were captured underwater at intervals of approximately 2 m by snorkeling along the transect. Coordinates of the surveyed transects were recorded every 2 seconds using a Garmin Map 76CSx Global Positioning System (GPS) receiver placed inside a waterproof dry bag that floated on the water surface and was towed by the snorkeler. The photos were linked to GPS coordinates by matching the time in the photo metadata with the GPS readings using Garmin DNR software. The collected samples were recorded as points, and each photo was interpreted using CPCe 4.0 software. For each photo, 24 points were randomly placed across the photo. Each point was labeled with a benthic habitat class following the classification scheme (Table 3). The final class for each photo was determined by the most dominant class in the photo. The locations of the photo-transect samples are shown in Figure 2.
These point samples were upscaled to area samples using segments created by image segmentation in IDRISI Selva, with similarity tolerance 10, weighted mean 0.5, weighted variance 0.5, and window width 3. We assumed that each segment represents and corresponds with the variation of benthic habitat within it, as indicated by the point samples it contains.
The resulting segments were exported to a vector file and then overlaid on the point samples. Each segment polygon was labeled according to the dominant benthic habitat class of the point samples located within its boundary. These polygon segments were converted back to raster format matching the WV2 spatial resolution (2 m). This approach to converting point samples to area samples has been employed in benthic habitat mapping with OBIA [5,6], but rarely in per-pixel classification. The area samples were divided into two independent sample sets, one for image classification and the other for accuracy assessment.
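The dominant-class rule is applied twice in the procedure above: once to the 24 interpreted points of each photo and once to the photo samples falling inside each segment. A minimal sketch follows; the segment identifiers and sample labels are hypothetical, and ties are resolved here by first occurrence, which is an assumption since the source does not describe tie handling.

```python
from collections import Counter, defaultdict


def dominant_class(labels):
    """Most frequent label; ties go to the label encountered first (assumption)."""
    return Counter(labels).most_common(1)[0][0]


# Step 1: one photo with 24 interpreted points (hypothetical labels).
photo_points = ["sand"] * 14 + ["Ea"] * 6 + ["rubble"] * 4
photo_class = dominant_class(photo_points)

# Step 2: photo samples as (segment id, photo class) pairs after the overlay.
point_samples = [
    (1, "sand"), (1, "sand"), (1, "Ea"),
    (2, "healthy coral"), (2, "healthy coral"),
]
classes_by_segment = defaultdict(list)
for segment_id, label in point_samples:
    classes_by_segment[segment_id].append(label)

segment_labels = {
    segment_id: dominant_class(labels)
    for segment_id, labels in classes_by_segment.items()
}
```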
The samples for machine-learning model training and classification accuracy assessment were randomly selected for each class. Furthermore, where the model training and accuracy assessment samples for a particular class were adjacent to each other, we purposively modified their locations, whenever possible, so that they were no longer adjacent; hence they are statistically and spatially independent, allowing the performance of the machine-learning classification model to be fully assessed. We argue that the use of area samples may assess the spatial dimension of benthic habitat classification accuracy and accommodate the spatial displacement between the GPS readings of the in situ benthic data and the geometric accuracy of the WV2 image.

2.5.2. Classification Scheme

Constructing a benthic habitat classification scheme is not easy. The major level scheme, which is widely recognized, comprises the coral reef, seagrass, macroalgae, and bare substratum classes [1,2]. If we want to understand the dynamics, changes, and impacts of management, and how these benthic habitats provide ecological functions and serve as a natural resources inventory, more detailed information on benthic habitat spatial distribution is required. The mapping was therefore conducted with a more detailed classification scheme. In this research, instead of using benthic habitat compositions to seek balance and consistency of mapping at different levels of complexity [14], the major level scheme was further detailed based on ecological function and on how the classes may be spectrally separated using remote-sensing reflectance. A classification scheme based on ecological purposes was also proposed by Hochberg and Atkinson [39].
The coral reef class was divided into healthy, intermediate, and dead. These three classes are important for ecological analysis, monitoring the health of the benthic habitat environment, and evaluating management impacts. The healthy coral reef class refers to an area dominated by healthy coral reef of various life-forms. Coral reef life-forms such as branching, tabular, massive, sub-massive, and encrusting corals are commonly found in the study area. The intermediate class refers to an area of healthy coral reef with some variations of rubble, macroalgae, and dead coral. The dead coral class refers to an area dominated by dead coral reef, and either bleaching corals, corals overgrown by algae, or corals surrounded by rubble. Healthy coral reef may be present in this class, but only at a very low percentage.
The seagrass class was detailed into species composition, given that species information may represent their unique ability to provide shelter and food for marine biota, sequester and bury carbon, protect the coast, and serve as a biodiversity measure. Moreover, mapping seagrass species is a difficult task [40,41,42], despite the spectral response variations [43,44,45]. Seagrass species commonly found in abundance in the study area were Enhalus acoroides (Ea), Thalassia hemprichii (Th), and Cymodocea rotundata (Cr). Other species such as Halodule uninervis (Hu), Cymodocea serrulata (Cs), Syringodium isoetifolium (Si), and Halophila ovalis (Ho) are rarely encountered in abundance and are commonly found amid the more dominant seagrass species such as Ea, Th, and Cr.
The macroalgae class was further classified based on pigment variations, given that brown, green, and red macroalgae are ecologically and economically important. The life-form of macroalgae, i.e., encrusting, calcareous, turf, and fleshy, also indicates unique functions. However, since the reflectance of benthic habitats is recorded mainly in visible wavelengths, classifying macroalgae by pigment is more feasible, especially since different life-forms of macroalgae may contain similar pigmentation (e.g., calcareous green algae and turf green algae), which makes spectral differentiation difficult.
The bare substratum class was further divided into a sand class and a rubble class. In the former, the main material is calcium carbonate, and this class is important for determining ideal boat routes, aquaculture locations, and safe places for anchoring. The rubble class can be used as an indicator of coral reef health and degradation. Volcanic sand is unique to the study area. It is red-colored, originates from weathered and eroded iron-rich volcanic materials, and is found mainly on Kemujan Island.
In this research, we developed the mapping model based on the two levels of classification scheme. To begin with, we developed the model using the most complex classification scheme and then used the major classification scheme as comparison afterwards. The details of the classification scheme used in this research are given in Table 3.

2.6. Image Classification

Three machine-learning algorithms were used in carrying out the benthic habitat classification. The input for classification is the combination of deglint bands, DII, and PCA with bathymetry and slope of underwater benthic habitat topography.

2.6.1. Classification Tree Analysis (CTA)

CTA is categorised as a non-parametric univariate method of image classification. It is a bottom-up approach to image classification, in which the algorithm learns from the input training areas and classifies the image by continuously splitting the data into homogeneous clusters of pixels, forming a hierarchical tree of classification rules. The components of CTA are: (1) the Root, which is the starting point of the classification tree; (2) the Internode, which is the connection between the root, leaves, and other internodes; and (3) the Leaf, which is the resulting class that contains pixels of the same class or those classified to that class.
Three CTA splitting algorithms are available: Gain Ratio, Entropy, and Gini. In this study, the Gain Ratio algorithm was preferred because it may reduce the over-splitting that can occur with the Entropy algorithm; it is in fact a normalization of the Entropy algorithm. The Gini splitting algorithm was not used, as it was difficult to find a significantly homogeneous class within our training area [46]. The process starts from the root; then, using the information from the training areas, pixels are split and assigned based on a binary split rule. The tree continues to grow until certain conditions are met, which are usually related to the stability of the leaf (benthic habitat class). The result is perceived to be a more precise classification, given that every training sample (ROI) is considered important and its characteristics are learned by the algorithm. We experimented with auto-pruning thresholds of 1%, 5%, 10%, and 15% to understand their impact on the classification result. The auto-pruning threshold is used to eliminate leaves whose pixel count is equal to or less than the specified percentage of the class proportion.
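To make the relation between the Entropy and Gain Ratio criteria concrete, the sketch below evaluates a binary split in the textbook way: information gain is the entropy reduction produced by the split, and the gain ratio divides that gain by the split information, which penalizes very unbalanced splits. This is a generic illustration and not the IDRISI implementation; the class labels and split masks are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, goes_left):
    """Information gain of a binary split, normalized by its split information."""
    left = [lab for lab, flag in zip(labels, goes_left) if flag]
    right = [lab for lab, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical training pixels: four sand and four coral samples.
labels = ["sand"] * 4 + ["coral"] * 4

# A split that separates the two classes perfectly.
balanced_split = [True] * 4 + [False] * 4
# A split that isolates a single sand pixel from everything else.
lopsided_split = [True] + [False] * 7

ratio_balanced = gain_ratio(labels, balanced_split)
ratio_lopsided = gain_ratio(labels, lopsided_split)
```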

2.6.2. Random Forest (RF)

The RF algorithm employed in this research was based on [47]. Random forest is an ensemble method for supervised classification based on classification trees. RF can produce a good classification result even when the training data contain many outliers and a lot of noise [48]. To obtain the best RF model, we tuned the algorithm using (1) different numbers of trees, i.e., 25, 50, 75, 100, and 500; (2) different functions, i.e., Square root and Log, for determining the number of randomly selected features used to find the optimum split point in a node, which is important to avoid model overfitting; and (3) different impurity functions, i.e., Gini coefficient and Entropy.
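The tuning space described above can be enumerated as in the following sketch. It only lists the candidate configurations and does not train any model. The base-2 logarithm, the rounding down, and the total of 28 input features are assumptions made for illustration; the source does not state the log base or the exact feature count.

```python
import math
from itertools import product

N_TREES = [25, 50, 75, 100, 500]
IMPURITY = ["gini", "entropy"]

# Rules for the number of features tried at each split (log base 2 is assumed).
FEATURE_RULES = {
    "sqrt": lambda n: max(1, int(math.sqrt(n))),
    "log": lambda n: max(1, int(math.log2(n))),
}


def rf_grid(n_features):
    """All combinations of tree count, feature rule, and impurity function."""
    return [
        {
            "n_trees": trees,
            "feature_rule": rule,
            "features_per_split": FEATURE_RULES[rule](n_features),
            "impurity": impurity,
        }
        for trees, rule, impurity in product(N_TREES, FEATURE_RULES, IMPURITY)
    ]


# Hypothetical number of stacked input layers.
grid = rf_grid(n_features=28)
```

Each entry would then be trained and scored on the held-out samples, and the configuration with the highest overall accuracy retained.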

2.6.3. Support Vector Machine (SVM)

SVM is a powerful machine-learning technique for image classification. It creates boundaries, called hyperplanes, in the multi-dimensional feature space that separate and classify each pixel into classes [49,50]. SVM builds the model on the margin maximization concept. Hence, it can work efficiently with poorly distributed samples and does not require prior estimation of the statistical distribution of the classes [8,51,52]. SVM is increasingly being used in image classification for benthic habitat mapping and has produced better accuracy than parametric classification algorithms, including ML [8,34,52,53,54]. The SVM was run using the Gaussian Radial Basis Function (RBF) kernel. We tuned the SVM algorithm using different ranges and multiplier values for C (the regularization parameter) and g (the width of the Gaussian kernel function). The range of C and g that we tested was 0.01–1000.00, and the multiplier was set to 2, 5, and 10. The number of folds for the internal cross validation was set to 3. The termination criterion for the grid search was set to 0.100, and the termination criterion for final training was set to 0.001. The performance surface matrix was used to determine the combination of C and g that produced the highest classification accuracy. Both RF and SVM were run using EnMAP-Box.
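The search grid implied by the range and multiplier settings can be sketched as follows, assuming that the candidate values are spaced geometrically (minimum, minimum × multiplier, and so on up to the maximum). How EnMAP-Box actually builds its grid is not described in the text, so this spacing is an assumption; the kernel is written in the common form exp(−g · ||x − y||²), and the function names are hypothetical.

```python
import math


def grid_values(lo=0.01, hi=1000.0, multiplier=10):
    """Candidate values lo, lo*m, lo*m^2, ... up to hi (assumed geometric spacing)."""
    values, k = [], 0
    while lo * multiplier ** k <= hi * (1 + 1e-9):  # small tolerance for float rounding
        values.append(lo * multiplier ** k)
        k += 1
    return values


def rbf_kernel(x, y, g):
    """Gaussian RBF kernel between two feature vectors."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, y)))


# One candidate list per multiplier setting mentioned in the text.
grids = {m: grid_values(multiplier=m) for m in (2, 5, 10)}

# Every (C, g) pair evaluated by cross validation for multiplier 10.
pairs_m10 = [(c, g) for c in grids[10] for g in grids[10]]
```

A smaller multiplier gives a denser grid and therefore more (C, g) pairs to cross-validate, which is relevant given the long SVM run times reported in the Results.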

2.7. Accuracy Assessments

The classification accuracy assessment has two parts. The first is the confusion matrix analysis, which was used to assess the accuracy of the classifications produced by RF, CTA, and SVM. It calculates the overall accuracy of the classification result, along with the user’s and producer’s accuracy of each benthic habitat class in the classification scheme [55]. Since classification accuracies are being compared, the McNemar Test [56] was applied to the confusion matrices to determine whether the accuracies of two classifications are significantly different. In the second part, the machine-learning model with the highest accuracy was applied to other benthic habitat locations, namely Karimunjawa Island, Menjangan Besar Island, and Menjangan Kecil Island. This procedure tested the ability of the machine-learning model to classify benthic habitats independently, without the use of training areas in the other locations.
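The accuracy measures named above can be computed as in this minimal sketch. It assumes a confusion matrix whose rows are map (classified) classes and whose columns are reference classes, and it uses the common z form of the McNemar test, z = (f12 − f21) / √(f12 + f21), where f12 and f21 count the validation samples that only one of the two classifiers labeled correctly. The matrix and counts are made up and are not results of this study.

```python
import math


def accuracies(cm):
    """Overall, user's, and producer's accuracy from a square confusion matrix.

    cm[i][j] = samples mapped as class i whose reference class is j (assumed layout).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    overall = sum(cm[i][i] for i in range(n)) / total
    users = [cm[i][i] / sum(cm[i]) for i in range(n)]
    producers = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    return overall, users, producers


def mcnemar_z(only_a_correct, only_b_correct):
    """z statistic for two classifications assessed on the same validation samples."""
    return (only_a_correct - only_b_correct) / math.sqrt(only_a_correct + only_b_correct)


# Hypothetical two-class confusion matrix.
cm = [
    [40, 10],
    [5, 45],
]
overall, users, producers = accuracies(cm)

# Hypothetical disagreement counts between two classifiers.
z = mcnemar_z(30, 10)
significant_at_95 = abs(z) > 1.96
```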

2.8. Research Flowchart

The flowchart of this research is provided in Figure 3.

3. Results

3.1. Classification Results

The highest benthic habitat classification accuracy was obtained from RF, with an overall accuracy of 88.54%. The mean overall accuracy is 88.05 ± 0.29%. These were obtained from the RF model that used the Gini coefficient to determine impurities in a node and the Square root of all features to determine the number of randomly selected features, with 100 trees. This accuracy is very high, given the number of classes involved and the complexity of the classification scheme. RF produced better accuracy than the other algorithms, based not only on the accuracy assessment statistics but also on the spatial distribution of the benthic habitats across the scene. Seagrass and macroalgae classes were classified along the shoreline. The reef flat in the southern part of the scene was classified as brown algae and sand. The lagoon located in the western part of the island was classified as healthy, intermediate, or dead coral. Sand was correctly classified in the lagoon, especially in the back-reef area. The reef crest and fore reef were mainly classified as healthy coral reef with some mixture of intermediate coral reef. The misclassification between coral reef and seagrass in the reef-crest area and at the boundary between optically shallow and optically deep water did not occur in the RF result but was noticeable in the CTA and SVM results (see red polygon in Figure 4). However, there are also areas in the northern part of Kemujan Island where seagrass was misclassified as coral reef, and only CTA produced the correct classification (see blue boxes in Figure 4).
In RF result, a misclassification occurred between healthy coral, intermediate, and dead coral as they share similar spectra and class descriptor. Healthy coral was also misclassified as EaTh and sand. Brown algae class was mainly misclassified as mixed-algae and sand, which was expected, given that mixed-algae also contains brown algae, while mixed-algae was misclassified as coral reef classes due to similar pigmentation, resulting in similar reflectance. Ea was misclassified as brown algae since its reflectance covered by epiphyte resembles that of brown algae [42]. EaTh was misclassified as mixed-seagrass and Th, as the spectra of these classes are overlapping [42]. Ho was mainly misclassified as intermediate coral, Th as brown algae and EaTh, and ThCr as Th. These misclassifications can be attributed to the association of Th or ThCr with brown algae, especially Padina sp. and Dictyota sp. The misclassification of Th as EaTh and ThCr as Th was as a result of the overlapping spectra of Th in these classes. Mixed-seagrass was understandably misclassified as ThCr, since ThCr spectra are also mixed-seagrass spectra. Furthermore, mixed-seagrass was also misclassified as sand. Sand was misclassified as brown algae, Ea, healthy coral, and intermediate coral, as sand was the dominant substrate for these classes and almost all benthic classes in the study area. See the confusion matrix of RF result for the detailed information in Table 4.
The highest accuracy from CTA was 77.8%, obtained when using DII. The accuracy of CTA using other inputs was similarly high, with a mean overall accuracy of 75.39 ± 2.17%, which was higher than the mean classification accuracy of SVM. When all inputs were used, the accuracy of CTA was 77.17%. The Ho class had a very low accuracy, and rubble was totally misclassified. Ho is rarely found in large, dense beds, and thus the resulting reflectance is still strongly affected by the sandy background. It was also found adjacent to the brown alga Padina sp., and thus was mostly misclassified as sand and brown algae. Rubble had the worst result, with zero classification accuracy. Given that rubble is commonly found in between and adjacent to coral reef of various conditions, all the validation samples were classified as healthy coral reef, intermediate coral reef, or sand. The class descriptors of the healthy and intermediate coral reef classes also include rubble as a minor component. Rubble is made of the same material as carbonate sand and was thus easily misclassified as sand, which had more dominant coverage than rubble. See Table 5 for the confusion matrix of the CTA result.
The best CTA results were obtained using the 1% auto-pruning threshold. In our experiments with 5%, 10%, 15%, and 20% auto-pruning thresholds, the accuracy decreased at the 5% threshold but increased at 10%. Afterwards, the accuracy kept decreasing at the 15% and 20% thresholds. In addition to the declining accuracy, the analysis revealed that high auto-pruning thresholds are not recommended for mapping with a complex classification scheme, because some classes are eliminated when the pixels in the leaves composing them do not meet the auto-pruning threshold criterion. For instance, at the 10% threshold, the accuracy of CTA using DII increased to 89%, but only eight benthic habitat classes remained.
SVM was the third-best algorithm, with an overall accuracy of 75.98%. This was obtained using DII input; the mean accuracy was 74.27 ± 1.04%, which is slightly lower than the CTA mean. These results were obtained with the C value set to 10.00 and the g value set to 0.001, while the multiplier for C and g during model tuning was set to 10. Despite this accuracy, SVM failed to classify the rubble class and several seagrass classes such as ThCr, CrHu, and Ho (Table 6). This contradicts the work of Reference [9], where seagrass variations were correctly classified using SVM. The difference can be attributed to differences in the benthic habitat environmental complexity of each study area. Reference [9] classified areas covered mostly by seagrass of different conditions, which lowered the misclassification rate of the seagrass class to other benthic habitats.
The main setback of running SVM algorithm using high number of training samples is the time required to perform the classification. The time required to run SVM classification using area samples adds up to almost four hours per classification process. We tried running the SVM algorithm using the newest computer processing hardware, but there was no significant decrease in the processing time compared to older computer hardware. Thus, although the accuracy of SVM is quite similar to CTA, the productivity is much lower compared to CTA, especially if we are to experiment with various scenarios of SVM parameter settings. The summary of overall accuracy for each classification result is illustrated in Table 7. The results of machine learning classification are provided in Figure 5.
McNemar test was performed to select the ideal model to be applied on other islands, and RF produced the highest classification accuracy. Based on this test, the performance of RF model using all dataset was not significantly different from deglint bands, and for this reason, we selected the model from RF deglint bands. The RF model was selected from deglint bands because it is more consistent and can be widely applied. The use of DII, PC bands, and bathymetry will improve the complexity of the required input and lead to the non-standard input values in exchange for an insignificant OA improvement. The value of PC bands and DII is highly dependent on the statistic of the input samples and image statistics, which is prone to subjectivity. Meanwhile, bathymetry between areas vastly varies, and cannot be used as standard parameter.
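As an illustration of the comparison described above, the following is a minimal sketch of a McNemar test between two classifiers evaluated on the same validation samples. It assumes the chi-square form with continuity correction; the article does not state which variant was used, and the function name `mcnemar_test` and its arguments are hypothetical.

```python
import math


def mcnemar_test(reference, pred_a, pred_b):
    """McNemar chi-square test (1 degree of freedom, continuity-corrected).

    Assumption: this is one common form of the test; the source article
    does not say which variant it applied.
    """
    # b: samples that model A labels correctly and model B labels wrongly.
    b = sum(1 for r, a, m in zip(reference, pred_a, pred_b) if a == r and m != r)
    # c: samples that model B labels correctly and model A labels wrongly.
    c = sum(1 for r, a, m in zip(reference, pred_a, pred_b) if a != r and m == r)
    if b + c == 0:
        # The two models never disagree about correctness.
        return 0.0, 1.0
    # Clip at zero so that b == c yields a statistic of exactly 0.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A p-value above the chosen significance level (commonly 0.05) would mean that the two classification results cannot be distinguished statistically, which is the situation the text reports for RF with all inputs versus RF with deglint bands.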

3.2. Model Application

The application of the 14-class RF benthic habitat model from Kemujan Island to Karimunjawa, Menjangan Besar, and Menjangan Kecil Islands (hereafter referred to as the test areas) was not very successful. The accuracy of RF in the test areas was 48.99%, with healthy coral reef as the most accurate class, at 66.24% UA and 91.99% PA. It was followed by ThCr (UA 55.13%, PA 55.4%), Ea (UA 49.6%, PA 49.87%), rubble (UA 33.33%, PA 33.59%), intermediate coral (UA 28.61%, PA 28.85%), and sand (UA 21.38%, PA 27.8%). The remaining benthic habitat classes had UA and PA of less than 10%. Nevertheless, the spatial pattern of the result was broadly consistent: the shoreline was dominated by seagrass and brown algae, and coral reef was located in the reef-crest or reef-cut. There were, however, some inconsistencies, such as healthy coral occurring along the shoreline and the overestimation of seagrass extent in Menjangan Besar and Menjangan Kecil Islands. These results indicate that:
  • The classification scheme of benthic habitat is too detailed and creates confusion in the application of models in other areas.
  • Since not all benthic classes may be present in all areas, it is unclear whether the model failed to classify a particular class, e.g., CrHu, or whether that class truly does not exist in the area. In our case, we can confirm that this was a failure of the model, since our field data indicated that CrHu is present in the field.
  • For general mapping, the scheme needs to be refined and be more universal, to ensure that all classes in the scheme are present in many areas.
To justify our statement, we simplified the scheme to the major benthic habitat classes: coral reef, seagrass, macroalgae, and “sand and rubble”. All training areas were re-labelled based on these classes. We developed the model by means of RF using deglint bands, which produced 94.17% OA. The UAs were 98.58%, 83.13%, 66.69%, and 91.80% for the coral reef, seagrass, macroalgae, and “sand and rubble” classes, respectively. The PAs were 97.44%, 87.76%, 58.57%, and 93.74% for the classes in the same order. Most classes had very high accuracies with few misclassifications. The macroalgae class had the lowest accuracy owing to a high rate of misclassification as “sand and rubble” and as seagrass.
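The re-labelling of training areas into the four major classes can be pictured as a simple lookup. The sketch below is illustrative only: the detailed class names are taken from the class names used in the text, and the assignment of each detailed class to a major class (for example, dead coral to coral reef) is an assumption rather than the authors' documented mapping.

```python
# Hypothetical lookup from detailed classes to the four major classes.
# The grouping of each detailed class is an assumption made for illustration.
DETAILED_TO_MAJOR = {
    "healthy coral": "coral reef",
    "intermediate coral": "coral reef",
    "dead coral": "coral reef",
    "Ea": "seagrass",
    "EaTh": "seagrass",
    "Th": "seagrass",
    "ThCr": "seagrass",
    "CrHu": "seagrass",
    "Ho": "seagrass",
    "mixed-seagrass": "seagrass",
    "brown algae": "macroalgae",
    "mixed-algae": "macroalgae",
    "sand": "sand and rubble",
    "rubble": "sand and rubble",
}


def relabel(labels, lookup=DETAILED_TO_MAJOR):
    """Map detailed training labels to major classes, failing on unknown labels."""
    unknown = sorted(set(labels) - set(lookup))
    if unknown:
        raise KeyError(f"labels without a major class: {unknown}")
    return [lookup[label] for label in labels]
```

Raising an error on unknown labels, rather than silently dropping them, keeps the training set from shrinking unnoticed when a detailed class is missing from the lookup.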
Afterwards, we applied this RF model to the test areas, which yielded 70.93% OA. The UAs for the coral reef, seagrass, macroalgae, and “sand and rubble” classes were 86.24%, 44.30%, 5.94%, and 37.00%, respectively, while the PAs were 91.75%, 20.40%, 13.48%, and 29.64% for the classes in the same order. As expected, the simplified RF model produced more accurate results than the 14-class model. This accuracy is high, especially for rapid mapping, and it is within the acceptable limit for benthic habitat mapping using a scheme that comprises four benthic classes. The acceptable limit lies within the range of 40%–70% [1] and is >60% based on the Indonesian National Standard for Mapping [57]. Nevertheless, only the coral reef class produced an accurate result, while the other classes had low accuracies. The seagrass class was highly misclassified as macroalgae and coral reef, and macroalgae was highly misclassified as “sand and rubble” and seagrass. The “sand and rubble” class was highly misclassified as coral reef, which indicates that the spatial distribution of coral reef was overestimated. Many areas near the shoreline, which should have been classified as sand, seagrass, or macroalgae, were misclassified as coral reef.
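The OA, UA, and PA values reported in this section are all derived from a confusion matrix. The following is a minimal sketch of that computation, assuming rows hold the mapped (classified) classes and columns hold the reference (field) classes; the function name and the matrix orientation are assumptions, and the tables in the article may be laid out differently.

```python
def accuracy_metrics(matrix, classes):
    """Return overall, user's, and producer's accuracy in percent.

    Assumption: matrix[i][j] counts validation samples mapped as class i
    (row) whose reference label is class j (column).
    """
    n = len(classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    overall = 100.0 * correct / total
    users, producers = {}, {}
    for i, name in enumerate(classes):
        mapped_total = sum(matrix[i])                            # row total
        reference_total = sum(matrix[r][i] for r in range(n))    # column total
        users[name] = 100.0 * matrix[i][i] / mapped_total if mapped_total else 0.0
        producers[name] = 100.0 * matrix[i][i] / reference_total if reference_total else 0.0
    return overall, users, producers
```

A class can combine a high PA with a much lower UA, as healthy coral reef does in the test areas: most reference coral samples are found, but many pixels mapped as coral belong to other classes, which matches the overestimation of coral reef extent noted above.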

4. Discussion

4.1. Accuracy Comparison

The goal of this research was to perform benthic habitat mapping using machine-learning algorithms, develop a mapping model from the most accurate result, and lastly, adapt the model to other areas. The machine-learning approach differs from parametric classification algorithms, which use training area statistics to generate the centroid of each class cluster and then classify the remaining pixels based on the range of boxes, shortest distance, Mahalanobis distance, maximum probability, or angle of spectral similarity [58]. Machine learning may resolve the issue of non-normally distributed training areas, which in this research is mainly due to the sub-pixel mixing of benthic habitats, the different numbers of training areas between classes, and possible inconsistencies in labeling the sample photos.
The use of CTA for benthic habitat mapping has been limited, so a direct comparison could not be performed. SVM is a powerful classifier for benthic habitat mapping, as described in previous works [8,9,52]. In this research, the accuracy of SVM is higher than that reported in other works that employed less complex classification schemes [8,9,34,53]. Its drawbacks are the failure to properly classify small life-form seagrass classes, in contrast to RF and CTA, and the required processing time. In SVM, four classes had zero accuracy, even after we experimented with different SVM parameter settings in order to obtain more effective classification results and minimize misclassification.
It is also rather difficult to compare our result with others, since the scheme used in each study is unique. The closest is Reference [14], with 13 classes, which reached an accuracy of 40% using PC bands. However, that scheme does not contain any species differentiation of seagrass or variation in macroalgae pigments. Our accuracy is higher than that of other studies with the same or even lower scheme complexity. Even when using hyperspectral data, an accuracy of over 80% was only obtained for benthic habitat mapping with 3–12 classes [7]. Recent studies that used high spatial resolution images classified fewer than ten benthic habitat classes; examples are Reference [8], with 73% accuracy (four classes), and Reference [59], which employed the Spectral Angle Mapper (SAM) method on a CASI hyperspectral image (7 classes, accuracy not reported).
The misclassification pattern of the SVM and CTA results is similar: on average, the user’s and producer’s accuracies of classes containing multiple benthic classes (mixed classes) are lower than those of classes containing a single benthic class. Meanwhile, in the RF result, the average producer’s accuracy of mixed classes is lower, but the average user’s accuracy of mixed classes is higher, than that of classes containing a single benthic class (see Table 4, Table 5 and Table 6). Therefore, the RF algorithm performs better than SVM and CTA at correctly classifying different benthic habitat compositions. However, strong misclassification occurred not only for mixed classes but also for classes containing a single benthic object, owing to the similarity of spectral response between individual benthic classes, overlapping class descriptors, and class association, as explained in Section 3.1.
The application of PCA did little to improve the accuracy of RF, CTA, and SVM, which does not agree with the result of Reference [14], where PC bands outperformed other inputs such as deglint bands and DII at hierarchical levels of benthic habitat mapping complexity using ML. Another contrary result was obtained by Reference [34], where the application of SVM to PC bands of a Landsat 8 image obtained higher accuracy than DII. In our results, the accuracy is relatively constant for all inputs, with a very low standard deviation. Our results also indicate that incorporating bathymetry and slope data in the classification input had no significant effect on the classification accuracy. Depending on the main input bands and the algorithm, adding bathymetry and slope data may or may not improve the accuracy, which is similar to the result of Reference [60], where the addition of LIDAR data increased or decreased the classification accuracy depending on the associated input bands and classification algorithm. Reference [7] indicated that bathymetry is not necessary to improve the accuracy of benthic habitat mapping, but Reference [8] reported otherwise. This difference may result from the quality of the bathymetry data. The bathymetry model of Reference [8] was generated from a more complex radiative transfer model, which resulted in a more accurate bathymetry map and improved the benthic habitat classification accuracy.
Reference [8] highlighted the importance of water-column correction, so that the dominant spectral response of underwater pixels originates solely from benthic habitats. Meanwhile, Reference [7] warned about the difficulty of removing the water-column effect in optically shallow water, due to the need to obtain unique parameters for each spectral band. Additionally, correcting the water-column effect may not always be beneficial. Using ML, Reference [14] did not obtain an accuracy improvement just by applying DII. Since there are several methods for removing the water-column effect, it is important for future work to assess the impact of these different methods on classification accuracy.
In our case, the most accurate CTA and SVM results were from DII. Although the accuracy improvement for CTA over deglint bands was only 2%, DII made a significant impact in resolving the misclassification between coral reef and seagrass. Interestingly, when the results of RF, SVM, and CTA from deglint bands, water column corrected bands, and PC bands are compared, the accuracy does not vary much. Even when all data were used at once, the accuracy was still not significantly different. Hence, machine-learning algorithms may also reflect the maximum descriptive resolution of the remote-sensing image for mapping benthic habitats at this level of complexity.
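For readers unfamiliar with the DII input discussed above, the following is a minimal sketch of a depth-invariant index for one band pair, assuming the Lyzenga formulation cited in the reference list [26,27]. It assumes the band values are positive and already corrected for sunglint and deep-water signal; the function names are hypothetical and the article's exact processing chain may differ.

```python
import math


def attenuation_ratio(log_band_i, log_band_j):
    """Estimate k_i / k_j from log-transformed values of one bottom type
    (typically sand) sampled over a range of depths."""
    n = len(log_band_i)
    mean_i = sum(log_band_i) / n
    mean_j = sum(log_band_j) / n
    var_i = sum((v - mean_i) ** 2 for v in log_band_i) / (n - 1)
    var_j = sum((v - mean_j) ** 2 for v in log_band_j) / (n - 1)
    cov_ij = sum((a - mean_i) * (b - mean_j)
                 for a, b in zip(log_band_i, log_band_j)) / (n - 1)
    a = (var_i - var_j) / (2.0 * cov_ij)
    return a + math.sqrt(a * a + 1.0)


def depth_invariant_index(band_i, band_j, ratio):
    """DII_ij = ln(L_i) - (k_i / k_j) * ln(L_j) for each pixel."""
    return [math.log(li) - ratio * math.log(lj) for li, lj in zip(band_i, band_j)]
```

Because the attenuation ratio is estimated from sand samples chosen in the scene, the resulting index depends on how those samples are selected, which is consistent with the concern in the text that DII values are sensitive to the statistics of the input samples.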

4.2. Benefits and Setbacks of Machine-Learning Algorithms

With machine-learning algorithms, it is possible to include various datasets as classification inputs, whether spectral bands, continuous datasets (e.g., bathymetry, slope, distance from shoreline), or categorical datasets (e.g., a coral reef geomorphology map), in order to obtain an accurate benthic habitat map. The more data are used, the more information is available to the machine-learning process, and the image does not have to be repeatedly processed and classified using different scenarios. Thus, a machine-learning algorithm may produce a classification result at the maximum descriptive resolution of the image. In this research, the maximum accuracy is 88.54% (14 classes) and 94.17% (4 classes), both derived from RF using deglint bands. Across the various combinations of inputs, or all input bands, from all algorithms, the difference is insignificant (<5% standard deviation), which means that we can either use all the available data or only limited data such as deglint bands to produce high accuracy. This insignificant difference between the accuracies of different input bands shows that machine-learning algorithms are powerful enough to exploit the input bands to their maximum descriptive resolution. Consequently, further processing may not be necessary, given that deglint bands are capable of producing an accurate result. We just need to ensure the correctness of the spatial distribution of benthic habitats by carefully identifying areas with potential misclassifications.
We also maintained the detail provided by the WV2 spatial resolution (2 m) while obtaining higher classification accuracy. Depending on the benthic habitat complexity of the study area, the intended detail of the benthic habitat map, and the complexity of the classification scheme, either per-pixel classification or OBIA may produce the more representative spatial distribution of benthic habitats. In this research, we preferred to have information at pixel level, as the information contained in each pixel is unique and the variation of adjacent benthic habitats in the study area is not always constant. There are many small patches of coral reef, seagrass, sand, or macroalgae in between and among one another. The change between coral reef and adjacent habitats may not be linear, and sometimes exhibits a distinct feature boundary [61]. This distinct boundary is essential for identifying biodiversity and ecological composition. Generalizing this variation may not be ecologically sound and reduces information on intra-habitat variation. Performing majority analysis may provide a better object distribution and less noise. However, it also decreases the precision provided by the WV2 image, and thus negates the high cost of purchasing the image [61]. Finally, the choice between per-pixel classification and OBIA depends on the in situ object variations that we intend to explain and on the purpose of the benthic habitat mapping.
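The trade-off of majority analysis mentioned above can be seen in a small sketch. This is a generic moving-window majority filter written for illustration; the window size, the tie handling, and the function name are assumptions and do not describe a procedure used in this research.

```python
from collections import Counter


def majority_filter(grid, size=3):
    """Replace each cell by the most frequent label in its size x size window.

    Windows are truncated at the edges of the grid. The original label is kept
    unless another label is strictly more frequent (an illustrative choice).
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so the input grid is not modified
    for r in range(rows):
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - half), min(rows, r + half + 1))
                      for cc in range(max(0, c - half), min(cols, c + half + 1))]
            counts = Counter(window)
            if counts[grid[r][c]] < max(counts.values()):
                out[r][c] = counts.most_common(1)[0][0]
    return out
```

Applied to a single coral pixel surrounded by sand, the filter relabels that pixel as sand: isolated noise disappears, but so does any genuine 2 m patch, which is the loss of detail the text warns about.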

4.3. Machine-Learning Model Performance in Test Area

A model developed with a complex classification scheme is not effective when applied to other areas. Only a few classes were classified with good accuracy, especially the most dominant class, healthy coral reef. In addition, the detailed variation of benthic habitats may differ between areas, so a particular class in the developed model may not be present in the areas where the model is applied. The use of a more general scheme yielded better accuracy, though some classes still failed to be mapped at high accuracy. Coral reef remains the class mapped with the highest accuracy, while macroalgae is mapped with the lowest.
For a model that can be widely applied, it is important not to use input bands with high variability. For instance, the quality of a water column corrected image is highly subjective and dependent on the range of depths and the water column attenuation coefficient. The PC bands also rely strongly on the object variation in the scene, and bathymetry varies widely between areas. As a result, the application of a model using these inputs may fail. The most standard inputs would be surface reflectance and deglint bands. However, deglint bands may also introduce variability, depending on the sunglint intensity.
The main goal of developing a benthic habitat mapping model is to establish an automatic, accurate, and consistent mapping model that can be widely applied. This research has shown that such a task is possible, although it is premature to draw a firm conclusion, since the areas used in testing the model performance have a relatively similar reef type and benthic habitat complexity. We plan to apply the RF model developed in this research to map benthic habitats in different reef types with different benthic habitat complexities. It is also important for us to develop similar models using the 3 visible bands, given that this is the most widely available spectral resolution, e.g., IKONOS, Quickbird, Geoeye-1, Skysat, PlanetScope, Rapideye, ALOS AVNIR-2, Sentinel-2, the Landsat series, and the SPOT series. This will ensure data continuity and make historical analysis of benthic habitats possible.
Finally, the classification scheme needs to be refined in order to understand whether the model scheme satisfactorily answers various management needs, and whether it is necessary to capture the variations of benthic habitats across different geographical areas. For instance, the seagrass classes in our study did not include Thalassodendron ciliatum (Tc), which is uniquely abundant on Nusa Lembongan Island [42]. Furthermore, it is essential to find the balance between the detail of the benthic habitat classification scheme and the ease of adapting the scheme to different areas. If these can be achieved, we can predict a well-detailed benthic habitat map across inaccessible areas, since in situ data to train the classification process is no longer required, unless the benthic habitat composition is unique to the area. Obtaining in situ benthic habitat data for validation will then be subject to management priorities.

5. Conclusions

Machine-learning algorithms are powerful approaches for classifying benthic habitats at both detailed and major classification schemes in order to produce an accurate benthic habitat map. Machine-learning algorithms also produced stable classification results, in which varying the input bands caused no significant change in overall accuracy. The maximum accuracy was obtained from RF, with 94.17% (4 classes) and 88.54% (14 classes), which is higher than most classification results at similar levels of scheme complexity; RF also resolved various misclassification issues encountered by CTA and SVM. CTA and SVM also produced high accuracy, but there were several issues of misclassification between coral reef and seagrass classes in the CTA result, and SVM failed to classify 4 of the 14 benthic habitat classes. The application of the RF model to benthic habitats in other areas shows that a more general classification scheme is recommended, to avoid several issues associated with benthic habitat variations. We applied the 4-class RF model developed on Kemujan Island to the test areas with acceptable accuracy. Therefore, the RF model that we developed in this project can be applied to other benthic habitat areas, provided that the input bands required by the model are available. However, it is important to test our RF model in other areas with either similar or different environmental settings to provide a more comprehensive assessment of its applicability. When the mapping model can be applied to other areas with relatively consistent accuracy, initial benthic habitat mapping can be conducted without field surveys across many currently unmapped areas.

Author Contributions

Conceptualization, P.W.; methodology, P.W.; software, P.W., P.A.A., W.L.; validation, P.W., P.A.A., W.L.; formal analysis, P.W.; investigation, P.W., W.L.; resources, P.W.; data curation, P.W., W.L.; writing—original draft preparation, P.W.; writing—review and editing, P.W.; visualization, P.W.; supervision, P.W.; project administration, P.W.; funding acquisition, P.W.

Funding

This research was funded by Direktorat Riset dan Pengabdian Masyarakat–Direktorat Jenderal Penguatan Riset dan Pengembangan–Kementerian Riset, Teknologi, dan Pendidikan Tinggi Republik Indonesia via the “Penelitian Berbasis Kompetensi” Research Scheme, Grant No. 1697/UN1/DITLIT/DIT-LIT/LT/2018.

Acknowledgments

We would like to thank DigitalGlobe, Inc. and Stuart Phinn from University of Queensland for providing us with WorldView-2 image of Karimunjawa Islands. We also want to thank Muhammad Hafizt from Indonesian Institute of Science (LIPI), Nur Hafizul Kalam, and Aisya Jaya Dhanahisvara from Universitas Gadjah Mada for assisting in collecting and preparing the field benthic habitat data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Green, E.P.; Mumby, P.J.; Edwards, A.J.; Clark, C.D. Remote Sensing Handbook for Tropical Coastal Management. In Coastal Management Sourcebooks 3; Edwards, A.J., Ed.; UNESCO: Paris, France, 2000.
  2. Goodman, J.A.; Purkis, S.J.; Phinn, S.R. Coral Reef Remote Sensing a Guide for Mapping, Monitoring and Management; Phinn, S.R., Ed.; Springer: Berlin/Heidelberg, Germany, 2013.
  3. Pu, R.; Bell, S.; Meyer, C.; Baggett, L.; Zhao, Y. Mapping and assessing seagrass along the western coast of Florida using Landsat TM and EO-1 ALI/Hyperion imagery. Estuar. Coast. Shelf Sci. 2012, 115, 234–245.
  4. Zapata-Ramirez, P.A.; Blanchon, P.; Olioso, A.; Hernandez-Nunez, H.; Sobrino, J.A. Accuracy of IKONOS for mapping benthic coral-reef habitats: A case study from the Puerto Morelos Reef National Park, Mexico. Int. J. Remote Sens. 2013, 34, 3671–3687.
  5. Phinn, S.R.; Roelfsema, C.M.; Mumby, P.J. Multi-scale, object-based image analysis for mapping geomorphic and ecological zones on coral reefs. Int. J. Remote Sens. 2012, 33, 3768–3797.
  6. Roelfsema, C.; Phinn, S.; Jupiter, S.; Comley, J.; Albert, S. Mapping coral reefs at reef to reef-system scales, 10 s–1000 s km2, using object-based image analysis. Int. J. Remote Sens. 2013, 34, 6367–6388.
  7. Zhang, C.; Selch, D.; Xie, Z.; Roberts, C.; Cooper, H.; Chen, G. Object-based benthic habitat mapping in the Florida Keys from hyperspectral imagery. Estuar. Coast. Shelf Sci. 2013, 134, 88–97.
  8. Eugenio, F.; Marcello, J.; Martin, J. High-Resolution Maps of Bathymetry and Benthic Habitats in Shallow-Water Environments Using Multispectral Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3539–3549.
  9. Zhang, C. Applying data fusion techniques for benthic habitat mapping and monitoring in a coral reef ecosystem. ISPRS J. Photogramm. Remote Sens. 2015, 104, 213–223.
  10. Mumby, P.J.; Harborne, A.R. Development of a systematic classification scheme of marine habitats to facilitate regional management and mapping of Caribbean coral reefs. Biol. Conserv. 1999, 88, 155–163.
  11. Sagawa, T.; Mikami, A.; Komatsu, T.; Kosaka, N.; Kosaki, A.; Miyazaki, S.; Takahashi, M. Mapping seagrass beds using IKONOS satellite image and side scan sonar measurements: A Japanese case study. Int. J. Remote Sens. 2008, 29, 281–291.
  12. Mumby, P.J.; Edwards, A.J. Mapping marine environments with IKONOS imagery: Enhanced spatial resolution does deliver greater thematic accuracy. Remote Sens. Environ. 2002, 82, 248–257.
  13. Benfield, S.L.; Guzman, H.M.; Mair, J.M.; Young, J.T. Mapping the distribution of coral reefs and associated sublittoral habitats in Pacific Panama: A comparison of optical satellite sensors and classification methodologies. Int. J. Remote Sens. 2007, 28, 5047–5070.
  14. Wicaksono, P. Improving the accuracy of Multispectral-based benthic habitats mapping using image rotations: The application of Principle Component Analysis and Independent Component Analysis. Eur. J. Remote Sens. 2016, 49, 433–463.
  15. Lucas, M.G.; Goodman, J. Linking Coral Reef Remote Sensing and Field Ecology: It’s a Matter of Scale. J. Mar. Sci. Eng. 2015, 3, 1–20.
  16. Hedley, J.D.; Roelfsema, C.M.; Phinn, S.R.; Mumby, P.J. Environmental and sensor limitations in optical remote sensing of coral reefs: Implications for monitoring and sensor design. Remote Sens. 2012, 4, 271–302.
  17. Andréfouët, S.; Kramer, P.; Torres-Pulliza, D.; Joyce, K.E.; Hochberg, E.J.; Garza-Perez, R.; Mumby, P.J.; Riegl, B.; Yamano, H.; White, W.H.; et al. Multi-sites evaluation of IKONOS data for classification of tropical coral reef environments. Remote Sens. Environ. 2002, 88, 128–143.
  18. Zhang, C.; Xie, Z. Combining object-based texture measures with a neural network for vegetation mapping in the Everglades from hyperspectral imagery. Remote Sens. Environ. 2012, 124, 310–320.
  19. Nababan, M.G.; Munasik, I.Y.; Kartawijaya, T.; Prasetia, R.; Ardiwijaya, R.L.; Pardede, S.T.; Sulisyati, R.; Mulyadi, Y.S. Status Ekosistem di Taman Nasional Karimunjawa: 2010; Wildlife Conservation Society-Indonesia Program: Bogor, Indonesia, 2010.
  20. Song, C.; Woodcock, C.E.; Seto, K.C.; Lenney, M.P.; Macomber, S.A. Classification and change detection using Landsat TM data: When and how to correct atmospheric effects? Remote Sens. Environ. 2001, 75, 230–244.
  21. Chavez, P.; Berlin, G.; Mitchell, W. Computer Enhancement Techniques of Landsat MSS Digital Images for Landuse/Landcover Assessments. Remote Sens. Earth Resour. 1977, 6, 259.
  22. Wicaksono, P.; Hafizt, M. Dark Target Effectiveness for Dark-Object Subtraction Atmospheric Correction Method on Mangrove Above-Ground Carbon Stock Mapping. IET Image Process. 2018, 12, 582–587.
  23. Kay, S.; Hedley, J.D.; Lavender, S. Sun Glint Correction of High and Low Spatial Resolution Images of Aquatic Scenes: A Review of Methods for Visible and Near-Infrared Wavelengths. Remote Sens. 2009, 1, 697–730.
  24. Hedley, J.D.; Harborne, A.R.; Mumby, P.J. Simple and Robust Removal of Sunglint for Mapping Shallow-Water Benthos. Int. J. Remote Sens. 2005, 26, 2107–2112.
  25. Lyzenga, D.R.; Malinas, N.P.; Tanis, F.J. Multispectral bathymetry using a simple physically based algorithm. IEEE Trans. Geosci. Remote Sens. 2006, 44, 2251–2259.
  26. Lyzenga, D.R. Passive Remote-Sensing Techniques for Mapping Water Depth and Bottom Features. Appl. Opt. 1978, 17, 379–383.
  27. Lyzenga, D.R. Remote sensing of bottom reflectance and water attenuation parameters in shallow water using aircraft and Landsat data. Int. J. Remote Sens. 1981, 2, 71–82.
  28. Bierwirth, P.N.; Lee, T.J.; Burne, R.V. Shallow Sea-Floor Reflectance and Water Depth Derived by Unmixing Multispectral Imagery. Photogramm. Eng. Remote Sens. 1993, 59, 331–338.
  29. Purkis, S.J.; Pasterkamp, R. Integrating in situ Reef-top Reflectance Spectra with Landsat Tm Imagery to Aid Shallow-Tropical Benthic Habitat Mapping. Coral Reefs 2004, 23, 5–20.
  30. Mishra, D.; Narumalani, S.; Rundquist, D.; Lawson, M. Benthic Habitat Mapping in Tropical Marine Environments Using QuickBird Multispectral Data. Photogramm. Eng. Remote Sens. 2006, 72, 1037–1048.
  31. Wicaksono, P. Integrated Model of Water Column Correction Technique for Improving Satellite-based Benthic Habitat Mapping, a Case Study on Part of Karimunjawa Islands, Indonesia. Master’s Thesis, Universitas Gadjah Mada, Yogyakarta, Indonesia, 2010.
  32. Mishra, D.; Narumalani, S.; Rundquist, D.; Lawson, M.; Perk, R. Enhancing the detection and classification of coral reef and associated benthic habitats: A hyperspectral remote sensing approach. J. Geophys. Res. 2007, 112.
  33. Bertels, L.; Vanderstraete, T.; Coillie, S.V.; Knaeps, E.; Sterckx, S.; Goossens, R. Mapping of coral reefs using hyperspectral CASI data; a case study: Fordata, Tanimbar, Indonesia. Int. J. Remote Sens. 2008, 29, 2359–2391.
  34. Manuputty, A.; Gaol, J.L.; Agus, S.B.; Nurjaya, I.W. The utilization of Depth Invariant Index and Principle Component Analysis for mapping seagrass ecosystem of Kotok Island and Karang Bongkok, Indonesia. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2017.
  35. Duarte, C.M. Seagrass depth limits. Aquat. Bot. 1991, 40, 363–377.
  36. Choice, Z.D.; Frazer, T.K.; Jacoby, C.A. Light requirements of seagrasses determined from historical records of light attenuation along the Gulf coast of peninsular Florida. Mar. Pollut. Bull. 2014, 81, 94–102.
  37. Wicaksono, P. Perbandingan Akurasi Metode Band Tunggal dan Band Rasio dalam Pemetaan Batimetri Pada Laut Dangkal Optis. In Prosiding Simposium Sains Geoinformasi IV–2015; PUSPICS: Yogyakarta, Indonesia, 2015.
  38. Roelfsema, C.M.; Phinn, S.R. A Manual for Conducting Georeferenced Photo Transects Surveys to Assess the Benthos of Coral Reef and Seagrass Habitats; Manual Document; University of Queensland: Queensland, Australia, 2009.
  39. Hochberg, E.J.; Atkinson, M.J. Capabilities of remote sensors to classify coral, algae, and sand as pure and mixed spectra. Remote Sens. Environ. 2003, 85, 174–189.
  40. Phinn, S.R.; Roelfsema, C.M.; Brando, V.; Anstee, J. Mapping seagrass species, cover and biomass in shallow waters: An assessment of satellite multi-spectral and airborne hyper-spectral imaging systems in Moreton Bay (Australia). Remote Sens. Environ. 2008, 112, 3413–3425.
  41. Wicaksono, P.; Hafizt, M. Mapping seagrass from space: Addressing the complexity of seagrass LAI mapping. Eur. J. Remote Sens. 2013, 46, 18–39.
  42. Wicaksono, P.; Kumara, I.S.W.; Kamal, M.; Fauzan, M.A.; Zhafarina, Z.; Nurswantoro, D.A.; Yogyantoro, R.N. Multispectral Resampling of Seagrass Species Spectra: WorldView-2, Quickbird, Sentinel-2A, ASTER VNIR, and Landsat 8 OLI. In IOP Conference Series: Earth and Environmental Science; IOP Publishing: Bristol, UK, 2017.
  43. Fyfe, S.K. Spatial and temporal variation in spectral reflectance: Are seagrass species spectrally distinct? Limnol. Oceanogr. 2003, 48, 464–479.
  44. Thorhaug, A.; Richardson, A.D.; Berlyn, G.P. Spectral reflectance of the seagrasses: Thalassia testudinum, Halodule wrightii, Syringodium filiforme and five marine algae. Int. J. Remote Sens. 2007, 28, 1487–1501.
  45. Wicaksono, P.; Kamal, M. Spectral response of healthy and damaged leaves of tropical seagrass Enhalus acoroides, Thalassia hemprichii, and Cymodocea rotundata. In Remote Sensing for Agriculture, Ecosystems, and Hydrology XIX; SPIE: Warsaw, Poland, 2017.
  46. Zambon, M.; Lawrence, R.; Bunn, A.; Powell, S. Effect of alternative splitting rules on image processing using classification tree analysis. Photogramm. Eng. Remote Sens. 2005, 72, 25–30.
  47. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  48. Pal, M. Random Forests for Land Cover Classification. In Proceedings of the International Geoscience and Remote Sensing Symposium, Toulouse, France, 21–25 July 2003.
  49. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
  50. Huang, C.; Davis, L.S.; Townshend, J.R.G. An assessment of support vector machines for land cover classification. Int. J. Remote Sens. 2002, 23, 725–749.
  51. Mather, P.; Tso, B. Classification Methods for Remotely Sensed Data, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2009; p. 376.
  52. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2010, 66, 247–259.
  53. Wahidin, N.; Siregar, V.P.; Nababan, B.; Jaya, I.; Wouthuyzen, S. Object-based image analysis for coral reef benthic habitat mapping with several classification algorithms. Procedia Environ. Sci. 2015, 24, 222–227.
  54. Cubillas, J.E.; Japitana, M. The Application of Support Vector Machine (SVM) Using CIELAB Color Model, Color Intensity and Color Constancy as Features for Ortho Image Classification of Benthic Habitats in Hinatuan, Surigao Del Sur, Philippines. In the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Proceedings of the XXIII ISPRS Congress, Prague, Czech Republic, 12–19 July 2016; ISPRS: Prague, Czech Republic, 2016.
  55. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46.
  56. de Leeuw, J.; Jia, H.; Yang, L.; Liu, X.; Schmidt, K.; Skidmore, A. Comparing accuracy assessments to infer superiority of image classification methods. Int. J. Remote Sens. 2006, 27, 223–232.
  57. Badan Informasi Geospasial. Peraturan Kepala Badan Informasi Geospasial No. 8/2014 Tentang Pedoman Teknis Pengumpulan dan Pengolahan Data Geospasial Habitat Dasar Perairan Laut Dangkal; BIG: Bogor, Indonesia, 2014.
  58. Richards, J.A. Remote Sensing Digital Image Analysis; Springer: Berlin, Germany, 2013.
  59. Leiper, I.A.; Phinn, S.R.; Roelfsema, C.M.; Joyce, K.E.; Dekker, A.G. Mapping Coral Reef Benthos, Substrates, and Bathymetry, Using Compact Airborne Spectrographic Imager (CASI) Data. Remote Sens. 2014, 6, 6423–6445.
  60. McCarthy, M.J.; Halls, J.N. Habitat Mapping and Change Assessment of Coastal Environments: An Examination of WorldView-2, QuickBird, and IKONOS Satellite Imagery and Airborne LiDAR for Mapping Barrier Island Habitats. ISPRS Int. J. Geo-Inf. 2014, 3, 297–325.
  61. Joyce, K.E.; Phinn, S.R.; Roelfsema, C.M. Live Coral Cover Index Testing and Application with Hyperspectral Airborne Image Data. Remote Sens. 2013, 5, 6116–6137.
Figure 1. The location of the research area.
Figure 2. Benthic habitat samples collected in the field using the photo-transect method. The white box indicates the location of the subset area shown in Figure 4.
Figure 3. Research flowchart for benthic habitat mapping model and cross validation using machine-learning classification algorithms.
Figure 4. Comparison of misclassification between RF (OA 88.54%), CTA (OA 77.80%), and SVM (OA 75.98%). Blue boxes mark areas where seagrass was misclassified as coral reef in the RF and SVM results but not in the CTA result. In the CTA result, however, many pixels in the reef-crest area (red polygon) were misclassified as seagrass classes, whereas RF and SVM correctly assigned them to the coral reef class.
Figure 5. Benthic habitat maps of Kemujan Island from RF (88.54% OA), CTA (77.80% OA), and SVM (75.98% OA).
Table 1. Specification and parameters of the WV2 image used in this research. The coefficient of calibration is in W m⁻² sr⁻¹ count⁻¹ and ESUN is in W m⁻² µm⁻¹.

| Parameter | Value |
|---|---|
| Name | WorldView-2 |
| Date of acquisition | 24 May 2012 |
| Radiometric resolution | 11-bit |
| Correction level | LV3X Ortho |
| Solar zenith | 32.6° |
| Off-nadir viewing | 14.5° |

| Multispectral band | Wavelength (µm) | Coefficient of calibration | ESUN |
|---|---|---|---|
| Coastal | 0.400–0.450 | 0.009295654 | 1580.8140 |
| Blue | 0.450–0.510 | 0.01783568 | 1758.2220 |
| Green | 0.510–0.580 | 0.01364197 | 1974.2416 |
| Yellow | 0.585–0.625 | 0.005829815 | 1738.4791 |
| Red | 0.630–0.690 | 0.01103623 | 1559.4555 |
| Red-edge | 0.705–0.745 | 0.005188136 | 1342.0695 |
| NIR1 | 0.770–0.895 | 0.01224380 | 1069.7302 |
| NIR2 | 0.860–1.040 | 0.009042234 | 861.2866 |
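The calibration coefficients, ESUN values, and solar zenith angle in Table 1 are the usual inputs for converting raw digital numbers to top-of-atmosphere reflectance. The sketch below assumes the standard conversion for WorldView-2 products (band-integrated radiance divided by an effective bandwidth, then normalised by ESUN and the solar geometry). This formula is not stated in the table, and the digital number, effective bandwidth, and Earth-Sun distance used here are hypothetical placeholders, not values from this study.

```python
import math


def toa_reflectance(dn, abs_cal_factor, effective_bandwidth_um, esun,
                    solar_zenith_deg, earth_sun_distance_au=1.0):
    """Convert a raw digital number to top-of-atmosphere reflectance.

    abs_cal_factor is the Table 1 coefficient of calibration
    (W m^-2 sr^-1 count^-1); esun is the Table 1 ESUN (W m^-2 um^-1).
    effective_bandwidth_um and earth_sun_distance_au are assumptions
    that must come from the image metadata and acquisition date.
    """
    # Band-averaged spectral radiance in W m^-2 sr^-1 um^-1
    radiance = abs_cal_factor * dn / effective_bandwidth_um
    cos_theta = math.cos(math.radians(solar_zenith_deg))
    return math.pi * radiance * earth_sun_distance_au ** 2 / (esun * cos_theta)


# Green band coefficients and solar zenith from Table 1;
# dn=300 and the 0.063 um bandwidth are hypothetical.
rho_green = toa_reflectance(dn=300, abs_cal_factor=0.01364197,
                            effective_bandwidth_um=0.063, esun=1974.2416,
                            solar_zenith_deg=32.6)
print(f"Green TOA reflectance: {rho_green:.4f}")
```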
Table 2. Summary of the atmospheric offset for DOS atmospheric correction and regression analysis for sunglint correction.

| Band | DOS atmospheric correction: atmospheric offset | Sunglint correction: regression with infrared band | Sunglint correction: slope | Sunglint correction: R² |
|---|---|---|---|---|
| Cyan | 0.153 | NIR1 | 0.383 | 0.263 |
| Blue | 0.121 | NIR1 | 0.590 | 0.382 |
| Green | 0.075 | NIR1 | 0.953 | 0.523 |
| Yellow | 0.050 | NIR2 | 1.481 | 0.778 |
| Red | 0.039 | NIR1 | 0.902 | 0.952 |
| Red-edge | 0.029 | NIR2 | 1.416 | 0.908 |
| NIR1 | 0.021 | - | - | - |
| NIR2 | 0.016 | - | - | - |
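As a rough illustration of how the offsets and slopes in Table 2 could be applied, the sketch below subtracts the DOS offset from each band and then removes sunglint with the common regression form, in which the visible band is reduced by the slope times the NIR excess above its minimum. That the offsets apply to reflectance values, that the NIR minimum is taken over the sample, and the pixel values themselves are all assumptions made for this example; only the offsets and the slope come from Table 2.

```python
def dos_correct(values, atmospheric_offset):
    """Dark-object subtraction: remove a constant per-band offset."""
    return [v - atmospheric_offset for v in values]


def deglint(band, nir, slope):
    """Regression-based sunglint removal (assumed form).

    Each pixel is reduced by slope * (NIR - minimum NIR), so the pixel
    with the lowest NIR signal is left unchanged.
    """
    nir_min = min(nir)
    return [b - slope * (n - nir_min) for b, n in zip(band, nir)]


# Hypothetical reflectance values for three pixels
green_raw = [0.20, 0.25, 0.30]
nir1_raw = [0.05, 0.09, 0.13]

# Offsets (0.075, 0.021) and slope (0.953) are the Green/NIR1 entries of Table 2
green = dos_correct(green_raw, 0.075)
nir1 = dos_correct(nir1_raw, 0.021)
green_deglinted = deglint(green, nir1, slope=0.953)
print([round(v, 5) for v in green_deglinted])
```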
Table 3. The classification scheme of benthic habitats used in this research along with the number of samples. The numbers in brackets are the number of pixels obtained from the extrapolation of point samples into area samples based on the image segmentation result resampled to WV2 pixel size (4 m²).

| No | Major Class (Level 1) | Detailed Class (Level 2) | Samples No. (Area) | Class Descriptor |
|---|---|---|---|---|
| 1 | Coral reef | Healthy coral reef | 289 (24,614) | Area dominated by healthy coral reef of any life-form (digitate, branching, tabular, foliose, massive, sub-massive, encrusting) (>70%). Corals with minor diseases also belong to this class. |
| 1 | Coral reef | Intermediate coral reef | 182 (3259) | Mostly healthy coral reef (30–70%), but starting to be overgrown by macroalgae or having medium-level disease, with some variations of rubble, macroalgae, and dead coral. |
| 1 | Coral reef | Dead coral | 179 (1531) | Area dominated by dead coral (>70%) that still maintains the coral reef structure but is already overgrown by macroalgae and associated with rubble. Since bleached coral is rare in the study area, it was also assigned to this class. |
| 2 | Seagrass | Ea | 48 (454) | Seagrass bed dominated by Ea (>70%); other species may be present in small quantity. |
| 2 | Seagrass | EaTh | 12 (657) | Seagrass bed of mixed Ea, Th, Cr, Hu, Si, Ho with Ea as the dominant species (50–70%). |
| 2 | Seagrass | Th | 40 (415) | Seagrass bed dominated by Th (>70%); other species may be present in small quantity. |
| 2 | Seagrass | ThCr | 30 (520) | Seagrass bed dominated by Th and/or Cr, or a mixture of both (>70%). Other species may be present in small quantity. |
| 2 | Seagrass | CrHu | 4 (28) | Seagrass bed dominated by Cr and/or Hu, or a mixture of both (>70%). Other species may be present in small quantity. |
| 2 | Seagrass | Ho | 8 (109) | Seagrass bed dominated by Ho (>70%). |
| 2 | Seagrass | Mixed Seagrass | 52 (2477) | Seagrass bed of mixed species with none being dominant (each species <50%). |
| 3 | Macro algae | Brown algae | 180 (1243) | Area dominated by brown algae, mainly fleshy type such as Padina sp., Sargassum sp., Dictyota sp., Caulerpa sp., also turf brown algae (>70%). |
| 3 | Macro algae | Green algae | 16 (merged with Mixed algae) | Area dominated by calcareous green algae such as Halimeda sp. (>70%). |
| 3 | Macro algae | Mixed algae | 34 (432) | Macroalgae bed of mixed species with none being dominant (each species <50%). |
| 4 | Bare substratum | Sand | 398 (13,115) | Calcium carbonate sand, bright white colour (>70%). |
| 4 | Bare substratum | Volcanic sand | 4 (merged with Sand) | Red-coloured sand from weathered and eroded volcanic materials (>70%). |
| 4 | Bare substratum | Rubble | 34 (41) | Area of mainly rubble (>70%), with a small portion of macroalgae, seagrass, or dead coral present. |
Table 4. Confusion matrix of RF result. HC: Healthy Coral, IC: Intermediate Coral, DC: Dead Coral, BA: Brown Algae, MA: Mixed-Algae, MS: Mixed-Seagrass, Rb: Rubble, Sd: Sand. OA: Overall Accuracy, UA: User Accuracy, PA: Producer Accuracy, EC: Error Commission, EO: Error Omission.
Rows: Map Class; columns: Reference Class
Map Class | BA | CrHu | DC | Ea | EaTh | HC | Ho | IC | MA | MS | Rb | Sd | Th | ThCr | Sum | UA | EC
BA574011112211241041801469192262.2637.74
CrHu02100000000000021100.000.00
DC16010560011702045145118142774.0026.00
Ea155011936201000155141546.5153.49
EaTh123216133001336014221074782.0617.94
HC220584314322,94805690001342123,92095.944.06
Ho170100061024630041018953.9746.03
IC612570151259021986106542381457.6342.37
MA702000230242191052130272.5227.48
MS1621221150106231204710104253791.138.87
Rb000000000021000211000.00
Sd31419818512193117083801612,649206313,88591.108.90
Th630117400011150172341340258.2141.79
ThCr41052105112130285530646965.2534.75
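Sum | 1243 | 28 | 1531 | 454 | 833 | 24,614 | 109 | 3259 | 432 | 2477 | 41 | 13,115 | 415 | 520 | 49,071
PA | 46.18 | 75 | 68.97 | 42.51 | 73.59 | 93.23 | 93.58 | 67.44 | 50.69 | 93.34 | 51.22 | 96.45 | 56.39 | 58.85 | OA | 88.54
EO | 53.82 | 25 | 31.03 | 57.49 | 26.41 | 6.77 | 6.42 | 32.56 | 49.31 | 6.66 | 48.78 | 3.55 | 43.61 | 41.15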
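The accuracy figures reported with the confusion matrix follow from three sets of numbers per class: the correctly classified pixels on the diagonal, the map (row) totals, and the reference (column) totals. The sketch below recomputes OA, UA, and PA for the RF result from those values as read from Table 4. The function and variable names are illustrative only, and returning 0 for a class with no mapped pixels is an assumption that mirrors how empty map classes are reported in Table 6.

```python
classes = ["BA", "CrHu", "DC", "Ea", "EaTh", "HC", "Ho",
           "IC", "MA", "MS", "Rb", "Sd", "Th", "ThCr"]
# Diagonal cells (map class == reference class) of the RF matrix
correct = [574, 21, 1056, 193, 613, 22948, 102,
           2198, 219, 2312, 21, 12649, 234, 306]
# Row totals (pixels assigned to each map class)
map_totals = [922, 21, 1427, 415, 747, 23920, 189,
              3814, 302, 2537, 21, 13885, 402, 469]
# Column totals (reference pixels of each class)
ref_totals = [1243, 28, 1531, 454, 833, 24614, 109,
              3259, 432, 2477, 41, 13115, 415, 520]


def accuracy_metrics(correct, map_totals, ref_totals):
    """Return overall, user's, and producer's accuracy in percent."""
    overall = 100.0 * sum(correct) / sum(ref_totals)
    # User's accuracy: correct pixels / pixels mapped as the class
    user = [100.0 * c / m if m else 0.0 for c, m in zip(correct, map_totals)]
    # Producer's accuracy: correct pixels / reference pixels of the class
    producer = [100.0 * c / r if r else 0.0 for c, r in zip(correct, ref_totals)]
    return overall, user, producer


oa, ua, pa = accuracy_metrics(correct, map_totals, ref_totals)
print(f"OA = {oa:.2f}%")
for name, u, p in zip(classes, ua, pa):
    # Commission and omission errors are the complements of UA and PA
    print(f"{name}: UA {u:.2f}, EC {100 - u:.2f}, PA {p:.2f}, EO {100 - p:.2f}")
```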
Table 5. Confusion matrix of CTA result. HC: Healthy Coral, IC: Intermediate Coral, DC: Dead Coral, BA: Brown Algae, MA: Mixed-Algae, MS: Mixed-Seagrass, Rb: Rubble, Sd: Sand. OA: Overall Accuracy, UA: User Accuracy, PA: Producer Accuracy, EC: Error Commission, EO: Error Omission.
Rows: Map Class; columns: Reference Class
Map Class | BA | CrHu | DC | Ea | EaTh | HC | Ho | IC | MA | MS | Rb | Sd | Th | ThCr | Sum | UA | EC
BA501026918323911210603703613123540.5759.43
CrHu1722009541084170131092369.3290.68
DC80494101050186206430785257.9842.02
Ea2360431824147067000610589147321.5978.41
EaTh3335656867004115360936535157036.1863.82
HC180180582621,132174413001988622,38494.415.59
Ho4804900267480193208451843517.0182.99
IC25044401520861159117106603424937.4462.56
MA85013727112141041349020263284415.8884.12
MS6017310219011145542046032836152302767.5932.41
Rb00400770290020510018111.0588.95
Sd92089373108117931181410,95354211,57294.655.35
Th5313950002131100151431241134.7965.21
ThCr671234617544181021894318260230.2369.77
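Sum | 1243 | 28 | 1531 | 454 | 833 | 24,614 | 109 | 3259 | 432 | 2477 | 41 | 13,115 | 415 | 520 | 49,071
PA | 40.31 | 78.57 | 32.27 | 70.04 | 68.19 | 85.85 | 67.89 | 48.82 | 31.02 | 82.6 | 48.78 | 83.52 | 34.46 | 35 | OA | 77.80
EO | 59.69 | 21.43 | 67.73 | 29.96 | 31.81 | 14.15 | 32.11 | 51.18 | 68.98 | 17.4 | 51.22 | 16.48 | 65.54 | 65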
Table 6. Confusion matrix of SVM result. HC: Healthy Coral, IC: Intermediate Coral, DC: Dead Coral, BA: Brown Algae, MA: Mixed-Algae, MS: Mixed-Seagrass, Rb: Rubble, Sd: Sand. OA: Overall Accuracy, UA: User Accuracy, PA: Producer Accuracy, EC: Error Commission, EO: Error Omission.
Rows: Map Class; columns: Reference Class
Map Class | BA | CrHu | DC | Ea | EaTh | HC | Ho | IC | MA | MS | Rb | Sd | Th | ThCr | Sum | UA | EC
BA249379616712504939033912134102224.3675.64
CrHu0000000000000000.00100
DC420238081022513446210243765036.6263.38
Ea3702154369460580001591561523286.6293.38
EaTh20004000000000666.6733.33
HC2006872921723,714222791317001050327,23987.0612.94
Ho0000000000000000.00100
IC9503926162856527088104242051778164516.4183.59
MA3182223940022173590135804995718.0881.92
MS57235259217242221959022694234280769.7930.21
Rb0000000000000000.00100
Sd4150168141144422401331181710,49098412,33485.0514.95
Th81003113870035168342.1757.83
ThCr0000000000000000.00100
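Sum | 1243 | 28 | 1531 | 454 | 833 | 24,614 | 109 | 3259 | 432 | 2477 | 41 | 13,115 | 415 | 520 | 49,071
PA | 20.03 | 0 | 15.55 | 33.92 | 0.48 | 96.34 | 0 | 8.28 | 40.05 | 79.09 | 0 | 79.98 | 8.43 | 0 | OA | 75.98
EO | 79.97 | 100 | 84.45 | 66.08 | 99.52 | 3.66 | 100 | 91.72 | 59.95 | 20.91 | 100 | 20.02 | 91.57 | 100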
Table 7. Overall accuracy (OA) of classification results (%) in Kemujan Island. * marks the highest classification accuracy for each classification algorithm. ** The SVM classification using all datasets as input could not be obtained due to computational issues. The RF model produced the highest and most consistent classification accuracy across the different inputs.

| Input | RF | CTA | SVM |
|---|---|---|---|
| Deglint bands | 88.01 | 75.58 | 75.04 |
| Deglint-Bathymetry-Slope | 87.68 | 75.25 | 73.99 |
| DII | 87.85 | 77.80 * | 75.98 * |
| DII-Bathymetry-Slope | 88.07 | 76.82 | 73.25 |
| PC bands | 87.88 | 73.28 | 73.55 |
| PC-Bathymetry-Slope | 88.29 | 71.84 | 73.78 |
| All dataset | 88.54 * | 77.17 | ** |
| Mean | 88.05 | 75.39 | 74.27 |
| Standard Deviation | 0.29 | 2.17 | 1.04 |
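The Mean and Standard Deviation rows of Table 7 can be reproduced from the per-input accuracies. The short sketch below does so with Python's standard library. The reported values match the sample standard deviation (n − 1 denominator), which is what `statistics.stdev` computes. The SVM list has six entries because no result was obtained for the all-dataset input. The dictionary layout is an illustrative choice.

```python
from statistics import mean, stdev

# Overall accuracies (%) per input combination, in the row order of Table 7
oa_by_algorithm = {
    "RF": [88.01, 87.68, 87.85, 88.07, 87.88, 88.29, 88.54],
    "CTA": [75.58, 75.25, 77.80, 76.82, 73.28, 71.84, 77.17],
    "SVM": [75.04, 73.99, 75.98, 73.25, 73.55, 73.78],  # no all-dataset result
}

# Mean and sample standard deviation for each algorithm
summary = {name: (mean(values), stdev(values))
           for name, values in oa_by_algorithm.items()}

for name, (avg, sd) in summary.items():
    print(f"{name}: mean {avg:.2f}, standard deviation {sd:.2f}")
```

The small standard deviation for RF relative to CTA and SVM is the numerical basis for describing its accuracy as the most consistent across inputs.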