Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review

Rapid and uncontrolled population growth, along with economic and industrial development, especially in developing countries during the late twentieth and early twenty-first centuries, has increased the rate of land-use/land-cover (LULC) change many times over. Since quantitative assessment of LULC change is one of the most efficient means to understand and manage land transformation, there is a need to examine the accuracy of different algorithms for LULC mapping in order to identify the best classifier for further applications of Earth observations. In this article, six machine-learning algorithms, namely random forest (RF), support vector machine (SVM), artificial neural network (ANN), fuzzy adaptive resonance theory-supervised predictive mapping (Fuzzy ARTMAP), spectral angle mapper (SAM) and Mahalanobis distance (MD), were examined. Accuracy assessment was performed using the Kappa coefficient, the receiver operating characteristic (ROC) curve, index-based validation and root mean square error (RMSE). The Kappa coefficients show that all the classifiers have a similar accuracy level with minor variation, but the RF algorithm has the highest accuracy (0.89) and the MD algorithm (a parametric classifier) has the lowest (0.82). In addition, the index-based LULC and visual cross-validation show that the RF algorithm (correlations between RF and the normalised difference water index, normalised difference vegetation index and normalised difference built-up index are 0.96, 0.99 and 1, respectively, at the 0.05 level of significance) has the highest accuracy level in comparison to the other classifiers adopted. Findings from the literature also indicate that ANN and RF algorithms are the best LULC classifiers, although a non-parametric classifier like SAM (Kappa coefficient 0.84; area under curve (AUC) 0.85) has a better and more consistent accuracy level than the other machine-learning algorithms.
Finally, this review concludes that the RF algorithm is the best machine-learning LULC classifier among the six examined algorithms, although it is necessary to test the RF algorithm further in different morphoclimatic conditions in the future.


Introduction
Knowledge of land-use/land-cover (LULC) change is essential in a number of fields based on the use of Earth observations, such as urban and regional planning [1,2], environmental vulnerability and impact assessment [3][4][5][6][7], natural disasters and hazards monitoring [8][9][10][11][12][13] and estimation of soil erosion and salinity [14][15][16][17]. Quantitative assessment and prediction of LULC dynamics are the most efficient means to manage and understand landscape transformation [18]. Mapping LULC change has been identified as an essential aspect of a wide range of activities and applications, such as land-use planning and global warming mitigation [19,20]. Consequently, assessment of LULC change is required for a variety of purposes related to human welfare in the context of rapid and uncontrolled population growth along with economic and industrial development, especially in developing countries with intensified LULC changes [20][21][22][23]. These changes affect both human society and the environment in many ways, such as increasing flood and drought vulnerability, environmental degradation, loss of ecosystem services, groundwater depletion, landslide hazards and soil erosion [14,15,[24][25][26][27].
Several techniques have been developed to map LULC patterns and dynamics, including traditional terrestrial mapping as well as satellite-based mapping. Terrestrial mapping, known as a field survey, is a direct way of mapping in which the map can be produced at various scales, incorporating information with different levels of precision, although it is a manpower-intensive, time- and money-consuming way to map large areas [28]. Moreover, field mapping is prone to subjectivity. On the other hand, satellite- and aerial photograph-based mapping of LULC is cost-effective, spatially extensive, multi-temporal and time-saving [29]. Earlier, the spatial resolution of satellite data was comparatively coarser than that of maps prepared through terrestrial surveys. With the advancement of remote-sensing (RS) techniques and microwave sensors, satellites now provide data at various spatial and temporal scales [30][31][32]. RS provides the opportunity for rapid acquisition of information on LULC at a much lower cost than other methods such as ground surveys [33,34]. Satellite images also offer multi-temporal availability and large spatial coverage for LULC mapping [35,36]. In the past few decades, studies on mapping, monitoring and forecasting of LULC dynamics have been carried out using medium- and low-resolution observations from satellites such as Landsat, Satellite Pour l'Observation de la Terre (SPOT, Satellite for Observation of Earth), the Indian Remote Sensing (IRS) satellite Resourcesat, the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), the Moderate Resolution Imaging Spectroradiometer (MODIS) and others [18,31,[37][38][39][40]. With the advancement of hyperspectral satellite sensors, the importance of RS has increased many times over, both in research and for planning purposes.
Over the last decade, more advanced methods, such as artificial neural networks (ANN), support vector machines (SVM), random forests (RF), decision trees and other models, have gained exceptional attention in remote sensing-based applications such as LULC classification. Thus, numerous studies on LULC modelling have been carried out using different machine-learning algorithms [14,[48][49][50][51], as well as comparing machine-learning algorithms [52][53][54]. Furthermore, a few studies have attempted to identify the best-suited and most accurate algorithm among machine-learning classifiers for LULC mapping [52][53][54][55]. Each machine-learning technique achieves a different level of accuracy. ANN, SVM and RF have generally been found to provide better accuracy than traditional classifier techniques [56], while SVM and RF have been reported as the best techniques for LULC classification among all machine-learning techniques [57,58]. However, the accuracy of LULC classification also depends on sensor characteristics and image data-related factors, such as spatial and temporal resolution, and on the processing software and hardware [59].
Several studies have found that LULC classification using medium- and low-resolution satellite observations has spectral and spatial limitations that affect its accuracy [24,[60][61][62]. Therefore, researchers have been applying machine-learning algorithms to reduce these limitations and obtain high-precision LULC maps. However, not all machine-learning techniques produce a high-precision LULC map, because good results depend on the model set-up, training samples and input parameters. To date, numerous studies have been conducted on land-use classification using machine-learning algorithms [20,63], but the relative performance of the models has not been well examined. In this article, we applied six machine-learning techniques to determine which method produces the most precise LULC map based on accuracy statistics.

Study Area
We selected a stretch of the riparian landscape of the river Ganga from Rajmahal to Farakka barrage in India, emphasizing three major dynamic river islands (locally, charland) dominated by patches. LULC classification in relatively stable areas is easier than in highly dynamic landscapes such as charland, and such work has been undertaken by many scholars. Here, emphasis was given to how useful the advanced methods are for delineating LULC units in such a dynamic area, using different approaches to accuracy assessment. Successful application of a method at several similar sites demonstrates its usability; hence, three such patches from the study stretch were used to test the precision of the applied methods. The study area covers parts of the Jharkhand and West Bengal states of India; more precisely, it covers parts of Sahibganj District of Jharkhand and the Malda and Murshidabad districts of West Bengal (Figure 1). The topography of the study area is dominated by an alluvial plain formed by sediments deposited by the Ganga, Mahananda and Kalindi rivers. The elevation of the region varies from approximately 12 m to 90 m. Regular flooding makes the region suitable for agriculture, although there is seasonal water scarcity. The climate of the region is of the sub-humid monsoon type (Köppen Cwg), with average annual rainfall of more than 1500 mm and temperatures ranging from 10 to 38 °C. The rapidly increasing population causes large-scale landscape transformation through the expansion of agricultural land and human settlement, along with well-defined riverbank erosion and flood hazards. Riverbank erosion has caused significant changes in the LULC pattern of the area for a long time. The frequent flooding and riverbank erosion caused large-scale displacement of human settlements in the region during the first decade of the 21st century [64].

Materials
In this work, a Landsat 8 Operational Land Imager (OLI) image (path/row 139/43), downloaded from the United States Geological Survey (USGS) website (https://earthexplorer.usgs.gov), has been used to map LULC with different machine-learning algorithms (Figure 2). The image was acquired on 3 October 2019. Six first-order LULC classes have been identified based on a comprehensive literature survey and expert knowledge of the study area (Table 1). In addition, Google Earth images and field-based observations have been used for the accuracy assessment of the LULC maps.

Methods for Land-Use/Land-Cover (LULC) Modelling
The LULC classification was performed using six widely used machine-learning classifiers. The parameters used to optimize the models and the software used to perform the LULC classification are described in Table 2.

Artificial Neural Network (ANN)
The ANN is one of the most widely applied machine-learning techniques and can be used efficiently for non-linear problems, such as parameter retrieval [65][66][67] and LULC change, with the ability to handle large datasets. It is currently one of the most used non-parametric classification techniques [68], as it does not depend on any assumption of normally distributed data [69].
The ANN is a feed-forward black-box model trained by the back-propagation algorithm (a supervised training algorithm). The ANN functions like a human brain or nervous system, in which nerve fibres have many interconnections through other axons [70]. It can learn from examples and produce meaningful results even when the input data contain errors or are complex and incomplete; in this sense, it loosely simulates the human nervous system. An ANN has one input layer, at least one hidden layer and one output layer, each formed by neurons (analogous to nerve cells) (Figure 3). These neurons are non-linear processing units. All the neurons in a layer are connected to all the neurons in the adjacent layers, forming a network, and the connections between neurons in successive layers are weighted. This process (transferring information from one neuron to another, or from one layer to the next) is called forward connection. Learning is accomplished automatically through dynamic adjustment of the connection weights associated with each neuron [66].
One of the most important algorithms used by ANNs is the back-propagation algorithm, which is a gradient-descent algorithm. Its main function is to minimize the error between the actual network outputs and the target outputs of the training input/output pairs [71]. The network repeatedly receives input/output pairs, and the error is propagated backward from the output layer to the input layer; the weights are then updated according to the learning rate and update rule [72,73]. Because the number of processing units, the training parameters and the learning rate are not uniquely determined in advance, trial-and-error adjustment of the model parameters is the most practical way to obtain better results. In this article, the multilayer perceptron (MLP) ANN used for LULC classification was implemented as a layered feed-forward model in the TerrSet Geospatial Modelling Software.
The MLP architecture can be described mathematically. The input layer comprises $n_0$ neurons, which receive a normalized set of input variables $x_i$ ($i = 1, 2, \ldots, n_0$). The second layer, known as the hidden layer, contains $n_1$ neurons whose values $y_j$ ($j = 1, 2, \ldots, n_1$) are computed from the outputs of the first layer. Each layer also receives a bias input of 1 in each of its neurons, which adjusts their outputs. The third, or output, layer consists of $n_2$ neurons, equal in number to the output variables $z_k$ ($k = 1, 2, \ldots, n_2$). A continuous non-linear mapping of the $x_i$ variables from the $n_0$ input neurons to the $y_j$ variables of the hidden layer is performed by summing the weighted inputs and applying an activation function. The parameters of this mapping are the weights connecting each input-layer neuron to each hidden-layer neuron [74]. One of the most common methods for ANN training is the back-propagation algorithm, which minimizes the cost function presented in Equation (1):
$$E = \frac{1}{2}\sum_{i=1}^{n}\left(a_i - b_i\right)^2 \quad (1)$$
where $n$ represents the number of classes, $a_i$ denotes the expected output, and $b_i$ is the response of the designed ANN from neuron $i$ of the $n$ neurons in the output layer.
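To make the notation concrete, the following is a minimal, pure-Python sketch of a one-hidden-layer MLP ($n_0$ inputs, $n_1$ hidden neurons, $n_2$ outputs, each layer fed a bias input of 1) trained by back-propagation to minimize the cost $E$ of Equation (1). It is not the TerrSet implementation used in the study: the sigmoid activation, learning rate, epoch count and toy two-band samples are illustrative assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class MLP:
    """One-hidden-layer perceptron: n0 inputs -> n1 hidden -> n2 outputs."""

    def __init__(self, n0, n1, n2, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for the constant bias input of 1.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n0 + 1)] for _ in range(n1)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n1 + 1)] for _ in range(n2)]

    def forward(self, x):
        xb = list(x) + [1.0]  # inputs x_i plus bias
        y = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        yb = y + [1.0]  # hidden values y_j plus bias
        z = [sigmoid(sum(w * v for w, v in zip(row, yb))) for row in self.w2]
        return xb, yb, z

    def train_step(self, x, target, lr=0.5):
        """One gradient-descent update on E = 1/2 * sum_i (a_i - b_i)^2."""
        xb, yb, z = self.forward(x)
        # Output-layer error terms dE/dnet_k (a = target output, b = network response).
        d2 = [(b - a) * b * (1.0 - b) for a, b in zip(target, z)]
        # Hidden-layer error terms, propagated back through w2 before it is updated.
        d1 = [yb[j] * (1.0 - yb[j]) * sum(d2[k] * self.w2[k][j] for k in range(len(d2)))
              for j in range(len(self.w1))]
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d2[k] * yb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d1[j] * xb[i]
        return 0.5 * sum((a - b) ** 2 for a, b in zip(target, z))

    def predict(self, x):
        z = self.forward(x)[2]
        return z.index(max(z))  # index of the winning output neuron (class)


# Hypothetical normalized two-band samples with one-hot class targets.
samples = [([0.1, 0.2], [1, 0]), ([0.2, 0.1], [1, 0]),
           ([0.9, 0.8], [0, 1]), ([0.8, 0.9], [0, 1])]
net = MLP(n0=2, n1=3, n2=2)
errors = [sum(net.train_step(x, t) for x, t in samples) for _ in range(3000)]
print(errors[0], errors[-1], [net.predict(x) for x, _ in samples])
```

The epoch-wise total of $E$ should fall as training proceeds; in practice the number of hidden neurons and the learning rate would be tuned by trial and error, as noted above.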

Support Vector Machine (SVM)
SVM is a non-parametric supervised machine-learning technique initially designed to solve binary classification problems [41,75]. It is based on the concept of structural risk minimization (SRM) and seeks the hyper-plane that maximizes the margin between the hyper-plane and the data points nearest to it. It separates data points into classes using this hyper-plane, and the support vectors ensure that the width of the margin is maximized [76]. SVM can handle continuous and categorical variables, as well as linearly and non-linearly separable classes. The bordering training samples that delineate the margin or hyper-plane of the SVM are known as support vectors [46]. In remote sensing, the polynomial and radial basis function (RBF) kernels have been used most commonly [41]; for LULC classification, the RBF kernel is the most popular and gives better accuracy than the other traditional methods.
The original SVM method starts from a set of training data, and its objective is to find the hyper-plane that separates the data into classes; that is, SVM seeks the optimal separating hyper-plane among all possible hyper-planes [77]. The SVM algorithm needs a proper kernel function to establish the hyper-planes accurately and minimize classification errors [78], so the kernel type is the essential part of the SVM technique. The behaviour of the SVM depends mainly on the kernel size: a larger kernel width produces a smoother decision surface. For simulated and real-world hyperspectral satellite data, a genetically optimized SVM shows the best performance [79]. In short, the primary function of SVM is to find the optimal boundary that maximizes the separation between the support vectors of the classes. In this study, SVMs with RBF and polynomial kernels were run in ENVI version 5.3 for LULC mapping.
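The two kernels named above can be written down directly. The sketch below uses the common parameterization found, for example, in LIBSVM (whose `gamma`, `coef0` and `degree` names it follows): RBF $K(x, z) = \exp(-\gamma\lVert x - z\rVert^2)$ and polynomial $K(x, z) = (\gamma\, x \cdot z + r)^d$, together with the kernel decision function $f(x) = \sum_i \alpha_i y_i K(s_i, x) + b$ of a trained binary SVM. The support vectors, coefficients and bias are hypothetical placeholders, not values from the ENVI 5.3 runs; a multiclass LULC map is assumed to combine several such binary decisions.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, gamma, coef0, degree):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    return (gamma * sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the sign of f gives the side of the hyper-plane."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained binary model with one support vector per class.
svs = [[0.0, 0.0], [1.0, 1.0]]
coefs = [1.0, -1.0]  # alpha_i * y_i for each support vector


def rbf(a, b):
    return rbf_kernel(a, b, gamma=1.0)


print(decision_function([0.1, 0.0], svs, coefs, 0.0, rbf))  # pixel near the first support vector
print(decision_function([0.9, 1.0], svs, coefs, 0.0, rbf))  # pixel near the second support vector
```

A larger `gamma` narrows the RBF kernel, which makes the decision surface less smooth; this is the kernel-size effect described in the text.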

Fuzzy ARTMAP (FA)
The fuzzy ARTMAP technique is based on fuzzy subset (similarity) calculations, together with adaptive and dynamic category selection through a search of the feature space. The structure of fuzzy ARTMAP includes two modules, ARTa and ARTb. Functionally, each module can be subdivided into two subsystems: an attentional subsystem and an orienting subsystem. The attentional subsystem has several functions; for example, it processes familiar input patterns, establishes precise internal representations and fine-tunes the learned categories. In contrast, the orienting subsystem deals with newly appearing (unfamiliar) patterns [80]. Each module of fuzzy ARTMAP consists of three layers, namely F0 as the input layer, F1 as the comparison layer and F2 as the recognition layer. In this layered structure, fuzzy ARTMAP resembles other artificial neural networks. Each layer has its respective neuron units M, M, N as well as control connections associated with the layers. F1 is used for feature detection and has adequate nodes for coding the input pattern, while the nodes of F2 represent the categories of the input.
Based on a comprehensive investigation of fuzzy ARTMAP and the characteristics of remote-sensing data, a simplified fuzzy ARTMAP algorithm has been applied using the TerrSet software. It comprises two layers: the first is used for feature data input and the second for the classification of the remote-sensing data. In the first layer, the number of neurons equals the feature dimension of the data, while in the second layer the number of neurons is decided by the user based on trial-and-error results [81]. The fuzzy ARTMAP first calculates the degree of match between the new pattern and each existing active pattern. All active values are then arranged in descending order of matching degree and compared with the vigilance value. If the matching degree exceeds the vigilance value, the training sample is assigned to the pattern of that output-layer neuron, and the fuzzy ARTMAP combines the new pattern with the pattern of that neuron by updating the weights between the output and input layers and the category radius. If no output-layer neuron meets the matching requirement, a new output-layer neuron is created to store the new pattern; thus, classification can become more accurate as more output-layer neurons are added.
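The search-and-resonance cycle described above can be sketched with the textbook simplified fuzzy ARTMAP equations: complement coding $I = (a, 1 - a)$, choice function $T_j = |I \wedge w_j| / (\alpha + |w_j|)$, vigilance test $|I \wedge w_j| / |I| \ge \rho$, and match tracking when a resonating category carries the wrong label. Here $\wedge$ is the component-wise minimum (fuzzy AND). This is a minimal Python illustration under those assumptions, not the TerrSet implementation; the parameter values ($\rho$, $\alpha$, $\beta$) and the two-band training pixels are hypothetical.

```python
def complement_code(a):
    """Complement coding: [a, 1 - a] for features scaled to [0, 1]."""
    return list(a) + [1.0 - v for v in a]


def fuzzy_and(x, y):
    return [min(p, q) for p, q in zip(x, y)]


class SimplifiedFuzzyARTMAP:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0, eps=1e-6):
        self.rho, self.alpha, self.beta, self.eps = rho, alpha, beta, eps
        self.w = []       # category (output-layer neuron) weight vectors
        self.labels = []  # class label attached to each category

    def _choice(self, I, w):
        return sum(fuzzy_and(I, w)) / (self.alpha + sum(w))

    def _match(self, I, w):
        return sum(fuzzy_and(I, w)) / sum(I)

    def train_one(self, a, label):
        I = complement_code(a)
        rho = self.rho  # vigilance, raised by match tracking after a wrong label
        order = sorted(range(len(self.w)), key=lambda j: self._choice(I, self.w[j]), reverse=True)
        for j in order:  # search categories from highest to lowest choice value
            m = self._match(I, self.w[j])
            if m < rho:
                continue  # fails vigilance: reset and try the next category
            if self.labels[j] == label:  # resonance: update this category's weights
                self.w[j] = [self.beta * p + (1 - self.beta) * q
                             for p, q in zip(fuzzy_and(I, self.w[j]), self.w[j])]
                return j
            rho = m + self.eps  # match tracking: demand a closer match
        self.w.append(I)  # no category qualifies: commit a new output neuron
        self.labels.append(label)
        return len(self.w) - 1

    def predict(self, a):
        I = complement_code(a)
        best = max(range(len(self.w)), key=lambda j: self._choice(I, self.w[j]))
        return self.labels[best]


# Hypothetical scaled two-band training pixels.
fam = SimplifiedFuzzyARTMAP()
for a, lab in [((0.1, 0.1), "water"), ((0.2, 0.15), "water"),
               ((0.9, 0.8), "vegetation"), ((0.8, 0.9), "vegetation")]:
    fam.train_one(a, lab)
print(len(fam.w), fam.predict((0.15, 0.12)), fam.predict((0.85, 0.9)))
```

Raising `rho` makes the network create more, tighter categories (more output-layer neurons); lowering it merges patterns into fewer, broader categories.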
Random Forest (RF)
RF is a non-parametric ensemble machine-learning algorithm developed by Breiman [82]. The RF algorithm has been widely applied to environmental problems such as water resource management and natural hazard management. It can handle a variety of data, such as satellite imagery and numerical data [83]. It is an ensemble learning method based on decision trees, which combines a large ensemble of classification or regression trees. Two parameters, which form the basis of the method, are needed to set up an RF model: (1) the number of trees ('n-tree') and (2) the number of features randomly sampled as candidates at each split ('m-try'). Each classification tree casts a vote for a class, and the final classification is determined by the majority vote of all the trees in the forest.
In recent times, several studies have shown satisfactory performance of RF for LULC classification in remote-sensing applications [42,52,57]. The large number of trees in this method provides better accuracy in image classification [84] and land-use modelling. Breiman [82] stated that using more trees than required is unnecessary and time-consuming, but it does not harm the model. Feng et al. [85] selected 200 decision trees in their study and found the performance of RF to be accurate. RF benefits from two powerful techniques, bagging and random feature selection, which are the powerhouse of the method. In our study, the 'randomForest' package in R has been used to produce the LULC map. As suggested by Feng et al. [85], 200 decision trees have been used, with 3 input features (m-try).
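The study used the R 'randomForest' package with 200 trees and m-try = 3. To show what those two parameters control, the following is a from-scratch Python sketch (not that package's code) of a random forest: each tree is grown on a bootstrap sample (bagging), only `m_try` randomly chosen bands are considered at each split (Gini impurity), and the forest predicts by majority vote. The four-band reflectance samples are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, features):
    """Return (weighted Gini, feature, threshold) of the best split, or None."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
    return best


def grow_tree(X, y, m_try, rng):
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    split = best_split(X, y, rng.sample(range(len(X[0])), m_try))  # random band subset
    if split is None:
        return majority(y)
    _, f, t = split
    L = [i for i, row in enumerate(X) if row[f] <= t]
    R = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in L], [y[i] for i in L], m_try, rng),
            grow_tree([X[i] for i in R], [y[i] for i in R], m_try, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_tree=200, m_try=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_tree):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m_try, rng))
    return forest


def predict_forest(forest, x):
    return majority([predict_tree(tree, x) for tree in forest])  # majority vote


# Hypothetical four-band reflectance samples (e.g. green, red, NIR, SWIR1).
X = [[0.08, 0.06, 0.03, 0.02], [0.09, 0.07, 0.04, 0.02], [0.07, 0.05, 0.03, 0.01], [0.10, 0.08, 0.05, 0.03],
     [0.05, 0.04, 0.40, 0.20], [0.06, 0.05, 0.45, 0.22], [0.04, 0.03, 0.38, 0.18], [0.05, 0.05, 0.42, 0.21],
     [0.15, 0.18, 0.25, 0.30], [0.16, 0.20, 0.27, 0.32], [0.14, 0.17, 0.24, 0.29], [0.17, 0.19, 0.26, 0.31]]
y = ["water"] * 4 + ["vegetation"] * 4 + ["built-up"] * 4
forest = fit_forest(X, y, n_tree=200, m_try=3)
print(predict_forest(forest, [0.085, 0.065, 0.035, 0.02]))
```

Because each tree sees a different bootstrap sample and a different band subset at each split, individual trees disagree; the majority vote averages out their errors, which is why adding trees beyond what is needed costs time but does not harm the model.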

Mahalanobis Distance (MD)
Supervised image classification is a quantitative approach to extracting land-cover information from remote-sensing images. The prime goal of supervised classification is to partition the spectral domain into regions that match the ground-cover classes of interest for a particular purpose. The Mahalanobis distance (MD), on which this supervised image classification algorithm is based, was introduced by the Indian applied statistician P. C. Mahalanobis in the 1930s [86]. In MD classification, training data are used to specify the spectral signatures of the user-defined classes. MD classification is similar to maximum likelihood classification in that statistics are computed for each class, but it treats all classes as equally weighted. MD measures distance in a space of two or more correlated variables, taking their covariance into account. In mathematical terms, MD is equal to the Euclidean distance (ED) when the covariance matrix is the identity matrix. A small MD value indicates that an observation lies close to the class centre, and each pixel is assigned to the class with the smallest distance. For each feature vector, the squared MD ($D_k^2$) to the mean of class $k$ is calculated as:
$$D_k^2 = \left(x_i - \bar{x}_k\right)^{T} S_k^{-1} \left(x_i - \bar{x}_k\right) \quad (2)$$
where $x_i$ is the vector of pixel $i$ of the image data, $\bar{x}_k$ is the sample mean vector of the $k$th class, $S_k$ is the variance/covariance matrix for class $k$ (with inverse $S_k^{-1}$), and $T$ denotes the matrix transpose.
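A minimal pure-Python sketch of Equation (2) and minimum-distance assignment follows. It estimates $\bar{x}_k$ and $S_k$ from hypothetical two-band training pixels and inverts $S_k$ by Gauss-Jordan elimination; it illustrates the formula and is not the software used in the study.

```python
def mean_vector(samples):
    n = len(samples)
    return [sum(s[b] for s in samples) / n for b in range(len(samples[0]))]


def covariance(samples):
    """Variance/covariance matrix S_k of one class (sample estimate, divisor n - 1)."""
    m = mean_vector(samples)
    n, p = len(samples), len(m)
    return [[sum((s[i] - m[i]) * (s[j] - m[j]) for s in samples) / (n - 1)
             for j in range(p)] for i in range(p)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small p x p matrices)."""
    n = len(matrix)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mahalanobis_sq(x, mean, inv_cov):
    """D_k^2 = (x - mean)^T S_k^-1 (x - mean), Equation (2)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    p = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(p) for j in range(p))


def fit(training):
    """training: {class: [pixel vectors]} -> {class: (mean vector, inverse covariance)}."""
    return {k: (mean_vector(v), invert(covariance(v))) for k, v in training.items()}


def classify(x, stats):
    """Assign the class with the smallest Mahalanobis distance."""
    return min(stats, key=lambda k: mahalanobis_sq(x, *stats[k]))


# Hypothetical two-band (red, NIR) training pixels.
stats = fit({
    "water": [(0.05, 0.02), (0.07, 0.03), (0.06, 0.05), (0.08, 0.04)],
    "vegetation": [(0.05, 0.40), (0.07, 0.45), (0.06, 0.38), (0.04, 0.42)],
})
print(classify((0.06, 0.04), stats), classify((0.06, 0.41), stats))
```

With an identity inverse covariance, `mahalanobis_sq` reduces to the squared Euclidean distance, matching the remark above.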

Spectral Angle Mapper (SAM)
SAM is a supervised spectral classification technique that determines the spectral similarity between image spectra and reference spectra in an n-dimensional space (where n is the number of spectral bands) by calculating the angle between the spectra [87]. In recent times, a large number of bands have been used in hyperspectral remote sensing to identify different objects accurately, and SAM is able to analyse all bands together. Reference spectra can be obtained either by field investigation or directly from satellite images; for LULC classification, reference spectra can be taken as signatures from the satellite image [88].
SAM uses only angular information to identify pixel spectra: it treats each observed reflectance spectrum as a vector in a multidimensional space whose number of dimensions equals the number of bands. The difference between an image spectrum and a reference spectrum is expressed as the angle between them, where a small angle indicates high similarity and a large angle indicates low similarity. Pixels whose angle exceeds the maximum tolerance angle are not classified; hence, it is better to define a threshold angle (in radians) beyond which a pixel remains unclassified. In our study, SAM has been applied in the ENVI 5.3 image processing environment for LULC classification. When applied to calibrated reflectance data, this technique is comparatively insensitive to illumination and albedo effects.
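The spectral angle between a pixel spectrum $t$ and a reference spectrum $r$ over $n$ bands is $\alpha = \cos^{-1}\left(\sum_i t_i r_i \big/ \left(\sqrt{\sum_i t_i^2}\,\sqrt{\sum_i r_i^2}\right)\right)$. A minimal Python sketch of SAM classification with a maximum-angle threshold follows; the reference spectra and the 0.1 rad threshold are hypothetical, not the values used in ENVI 5.3 for this study.

```python
import math


def spectral_angle(t, r):
    """Angle (radians) between pixel spectrum t and reference spectrum r over n bands."""
    dot = sum(a * b for a, b in zip(t, r))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in r))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp floating-point round-off
    return math.acos(cos_angle)


def sam_classify(pixel, references, max_angle):
    """Assign the class with the smallest angle, or None if that angle exceeds max_angle."""
    best_class, best_angle = None, math.inf
    for name, ref in references.items():
        angle = spectral_angle(pixel, ref)
        if angle < best_angle:
            best_class, best_angle = name, angle
    return best_class if best_angle <= max_angle else None


# Hypothetical reference spectra (e.g. green, NIR, SWIR1 reflectance) taken from the image.
refs = {"water": [0.06, 0.04, 0.02],
        "vegetation": [0.04, 0.40, 0.20],
        "built-up": [0.15, 0.25, 0.30]}
shaded_vegetation = [0.02, 0.20, 0.10]  # half as bright, same spectral shape
print(sam_classify(shaded_vegetation, refs, max_angle=0.1))
print(sam_classify([1.0, 0.0, 0.0], refs, max_angle=0.1))
```

Scaling a spectrum by a constant (a uniformly darker, shaded copy) leaves the angle unchanged, which is the geometric reason SAM is relatively insensitive to illumination; the second call shows a pixel left unclassified by the threshold.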

Similarity Test among the Classifiers
To represent the difference in performance of the algorithms for delineating LULC, a similarity ratio (SR) was computed. It is simply the ratio between the areal proportions of a given LULC class computed by two algorithms. SR = 1 signifies absolute similarity of the areal proportions computed by the two algorithms, and the further SR departs from 1 (above or below), the greater the dissimilarity.
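As a small worked example, the sketch below computes SR for the vegetation shares reported later in the text for the two extreme classifiers (44.07% by ANN and 58.62% by SAM). Note that $SR(A, B) = 1 / SR(B, A)$, so a value of about 0.75 and one of about 1.33 express the same dissimilarity; the function name is illustrative.

```python
from itertools import combinations


def similarity_ratio(share_a, share_b):
    """SR = share of a LULC class by algorithm A / share by algorithm B (1 = identical)."""
    return share_a / share_b


# Vegetation share (%) for ANN and SAM, as quoted in the LULC classification results.
vegetation = {"ANN": 44.07, "SAM": 58.62}
for a, b in combinations(vegetation, 2):
    print(a, b, round(similarity_ratio(vegetation[a], vegetation[b]), 3),
          round(similarity_ratio(vegetation[b], vegetation[a]), 3))
```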

Accuracy Assessment and Correlation among the Classifiers
Post-classification accuracy assessment is considered the most vital part of validating the LULC maps produced by the models [61,89], since a high-precision LULC map provides the foundation for successful planning and management. Accuracy assessment relies on statistics, and the Kappa coefficient is the statistical technique applied in the present study for this purpose. Monserud and Leemans [90] suggested five levels of agreement between images and ground reality: poor or very poor, fair, good, very good and excellent, corresponding to values lower than 0.4, from 0.4 to 0.55, from 0.55 to 0.70, from 0.70 to 0.85, and higher than 0.85, respectively. The Kappa coefficient has been calculated using 200 randomly selected ground control points in order to evaluate the accuracy of the LULC maps produced by the different algorithms (the random points are shown in Figure 1). The sample points have been selected from field observations, and using Google Earth Pro for remote and inaccessible areas.
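For reference, Cohen's Kappa is $K = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement (overall accuracy) and $p_e$ the agreement expected by chance from the row and column totals of the confusion matrix. The sketch below computes it and maps the result onto the Monserud and Leemans levels listed above; the confusion matrix is hypothetical, and how exact boundary values (e.g. 0.55) are assigned is an assumption, since the levels are given only as ranges.

```python
def kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: classified, columns: reference)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n  # observed agreement
    rows = [sum(confusion[i]) for i in range(k)]
    cols = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)


def agreement_level(k):
    """Monserud and Leemans agreement levels (boundary handling is an assumption)."""
    if k < 0.40:
        return "poor or very poor"
    if k < 0.55:
        return "fair"
    if k < 0.70:
        return "good"
    if k <= 0.85:
        return "very good"
    return "excellent"


# Hypothetical two-class confusion matrix from 100 reference points.
k = kappa([[45, 5], [5, 45]])
print(round(k, 3), agreement_level(k))
```

Applied to the Kappa values reported in the validation results, this mapping places RF (0.89), ANN (0.87) and SVM (0.86) in the "excellent" band and the remaining classifiers in "very good".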
The receiver operating characteristic (ROC) curve was plotted to validate the performance of the LULC classifiers in detecting the different LULC features. The curve plots sensitivity on the y-axis against 1 − specificity on the x-axis. The sensitivity of a model is the proportion of correctly predicted positive pixels (i.e., pixels belonging to a particular LULC class that were correctly identified), while the specificity is the proportion of correctly predicted negative pixels (i.e., pixels not belonging to a particular LULC class that were correctly identified). The sensitivity and specificity were calculated following Equations (3) and (4):
$$\text{Sensitivity} = \frac{a}{a + c} \quad (3)$$
$$\text{Specificity} = \frac{d}{b + d} \quad (4)$$
where $a$ represents true positives, $d$ true negatives, $b$ false positives and $c$ false negatives.
The area under the ROC curve (AUC) depicts the performance of the classifiers in predicting LULC. AUC values range from 0 to 1, and an AUC close to 1 represents a high degree of model performance.
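Equations (3) and (4) and the AUC can be computed as in the sketch below, which follows the letter convention used above ($a$ = true positives, $b$ = false positives, $c$ = false negatives, $d$ = true negatives). The ROC points are obtained by sweeping a threshold over per-pixel membership scores for one LULC class, and the AUC by the trapezoidal rule. How the study derived per-pixel scores is not stated here, so the scores and labels are hypothetical.

```python
def sensitivity_specificity(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return a / (a + c), d / (b + d)


def roc_auc(scores, labels):
    """ROC points (1 - specificity, sensitivity) and trapezoidal AUC.

    scores: membership score for one LULC class per reference pixel;
    labels: 1 if the pixel truly belongs to that class, else 0.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:  # tied scores share one threshold
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    auc = sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return points, auc


print(sensitivity_specificity(45, 10, 5, 40))
print(roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])[1])
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of class and non-class pixels.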
The root mean square error (RMSE) was computed to evaluate the performance of the machine-learning classifiers using the observed and predicted sample points. The RMSE was calculated using Equation (5); the lower the RMSE, the higher the accuracy of the LULC prediction:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2} \quad (5)$$
where $n$ represents the number of sample points, and $P_i$ and $O_i$ are the predicted and observed values at sample point $i$.
The "index-based technique" has been introduced to evaluate and select the best machine-learning technique for LULC mapping. For this purpose, three satellite data-derived indices, the normalized difference vegetation index (NDVI), normalized difference water index (NDWI) and normalized difference built-up index (NDBI), have been calculated, and each index has been classified using a manual threshold. For better visualization, the LULC classes (water, vegetation-agricultural land, built-up area) and the threshold-based NDWI, NDVI and NDBI have been masked out of the study area using the three selected windows. Close agreement between the index-derived area and the classifier-derived LULC area indicates a good result, and vice versa. We then used a correlation matrix among the areas of the land-use classes of the six LULC models and the satellite data-derived indices to validate the index-based method statistically. NDWI, for example, is computed as:
$$\mathrm{NDWI} = \frac{\text{Green} - \text{NIR}}{\text{Green} + \text{NIR}}$$
We also used a visual interpretation procedure to assess the accuracy of the LULC models. Furthermore, Karl Pearson's coefficient of correlation was applied to understand the association among the areas of the land-use classes obtained from the six LULC models; higher correlation coefficients indicate greater agreement between the models.
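The index-based check can be sketched as follows, assuming the standard definitions NDVI = (NIR − Red)/(NIR + Red), NDWI = (Green − NIR)/(Green + NIR) and NDBI = (SWIR1 − NIR)/(SWIR1 + NIR), which for Landsat 8 OLI use band 3 (green), band 4 (red), band 5 (NIR) and band 6 (SWIR 1). The study's manual thresholds are not given in this section, so the 0.3 threshold below is a hypothetical placeholder; the departure is taken as index-derived area minus classifier-derived area, the sign convention implied by the window results reported in the validation section (155.87 − 156.55 = −0.68 km²).

```python
def normalized_difference(b1, b2):
    """(b1 - b2) / (b1 + b2); returns 0.0 where the denominator is zero."""
    s = b1 + b2
    return (b1 - b2) / s if s != 0 else 0.0


def ndvi(nir, red):
    return normalized_difference(nir, red)


def ndwi(green, nir):
    return normalized_difference(green, nir)


def ndbi(swir1, nir):
    return normalized_difference(swir1, nir)


def thresholded_area_km2(index_values, threshold, pixel_area_km2=0.0009):
    """Area of pixels whose index exceeds a manual threshold (30 m pixel = 0.0009 km^2)."""
    return sum(1 for v in index_values if v > threshold) * pixel_area_km2


def departure(index_area, classifier_area):
    """Departure = index-derived area minus classifier-derived area."""
    return index_area - classifier_area


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical (NIR, red) reflectance for three pixels, thresholded at NDVI > 0.3.
pixels = [(0.40, 0.10), (0.20, 0.15), (0.45, 0.05)]
print(thresholded_area_km2([ndvi(nir, red) for nir, red in pixels], 0.3))
# Window totals quoted in the validation results (NDVI-based vs RF vegetation-agriculture, km^2).
print(round(departure(155.87, 156.55), 2))
```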

LULC Classification
The spatial analysis of the LULC maps shows that built-up areas, and rivers and wetlands, are more prominent and clearer in the outputs of the SVM and random forest classifiers, while they are least prominent in the output of SAM. On the other hand, fallow land and agricultural land are most prominent in the output of ANN, followed by the fuzzy ARTMAP and Mahalanobis distance classifiers. Vegetation cover and sand bars are fairly well classified by all classifiers, and excellently classified by RF and SVM (Figure 4). Overall, the maximum coverage of built-up land was classified by the SVM and fuzzy ARTMAP methods, whereas the least coverage of built-up land was classified by SAM and random forest. The highest coverage of vegetation is found with the SAM classifier, followed by the RF and SVM classifiers, while the least coverage is found with ANN. The coverage of fallow land is the reverse of that of vegetation cover: the ANN classifier yields the highest coverage, followed by the fuzzy ARTMAP and MD classifiers, while SAM yields the least (Table 3). Rivers and wetlands and sand bars are classified with almost equal coverage by all classifiers.
Table 3 shows the percentage share of each LULC class in the total land coverage of the study area for each classifier. Vegetation cover is the dominant land-use class in the region according to all classifiers, covering about half of the total land surface, while sand bars have the smallest share. The percentage share of vegetation cover varies from 44.07% (ANN) to 58.62% (SAM).
Built-up area (from 9.99% by SAM to 19.59% by fuzzy ARTMAP), fallow land (from 5.37% by RF to 18.29% by ANN) and agricultural land (from 6.72% by fuzzy ARTMAP to 18.20% by RF) occupy the second, third and fourth positions in terms of areal share, while sand bars have the smallest share of the total surface area with all the classifiers (from 0.94% by RF to 1.84% by SAM). The standard deviation (SD) and coefficient of variation (CV) of the percentage share of each LULC class across the classifiers are also displayed in Table 3. They show that vegetation, and rivers and water bodies, are classified more consistently, as all the classifiers yielded quite uniform areas with a very low coefficient of variation. In contrast, fallow land and agricultural land are less consistently classified, as the classifiers yielded considerably different areal extents. Based on the results of the similarity test, the fuzzy ARTMAP and MD methods, and the fuzzy ARTMAP and SVM methods, are quite similar in their performance, while the difference is greatest between the ANN and SAM algorithms (Table 4).

Validation of the LULC Classification
The overall accuracy (in percentage) and the Kappa coefficient (K) of all the classifiers are shown in Table 5. The RF classifier was the most accurate LULC model, with a Kappa coefficient of 0.89, followed by ANN (K = 0.87), SVM (K = 0.86), fuzzy ARTMAP (K = 0.85), SAM (K = 0.84) and MD (K = 0.82). The RF, ANN and SVM models exhibit excellent agreement, and the other models show very good agreement, between the classified LULC maps and ground reality. All the models can be treated as useful, but the RF algorithm can be recommended as the best-suited LULC classifier. Agricultural land, and rivers and wetlands, were classified better by the RF and ANN algorithms (user's accuracy of RF and ANN: 94% and 92%) than by the other algorithms. Similarly, most of the LULC classes were well classified by RF, fuzzy ARTMAP and ANN (see details in supplementary Table S1). The computed area under the ROC curve (AUC) and RMSE, stated in Table 5, yield the same result as the Kappa coefficient. The areas derived from the three spectral indices, the normalized difference water index (NDWI), normalized difference vegetation index (NDVI) and normalized difference built-up index (NDBI), have been computed and compared with the areas of water bodies, vegetation-agricultural land and built-up land in the LULC maps produced by the six classifiers. The results show that the RF classifier performs better than the other classifiers, because the areas derived from the three spectral indices correlate most strongly with the areas of the corresponding RF-derived LULC classes (Table 6). RF is thus considered the best-fit classifier for preparing LULC maps in the present study area. Table 6 shows the departure of the area between the spectral indices (NDWI, NDVI and NDBI) and the machine-learning (ML) algorithms for the three LULC units (Figure 5).
The total area of NDVI-based vegetation and agricultural land in the three windows is 155.87 km², a departure (index-based area minus classifier area) of only −0.68 km² from the area detected by the RF classifier (156.55 km²). The NDWI-based water body area in the selected window is 79.57 km², a departure of 0.78 km² from the water body area computed by the RF classifier (78.79 km²), which is the closest match among the classifiers (Table 6). A similar result is found for the NDBI-based built-up area.

Figure 5. Validation of LULC of different classifiers with satellite data-derived indices (normalized difference vegetation index (NDVI), normalized difference water index (NDWI) and normalized difference built-up index (NDBI)).
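The index-based areas and departures can be sketched as follows. The index formulas are the standard normalized-difference definitions, with NDWI in McFeeters' green/NIR form; that variant is an assumption, as the paper does not specify it. The Landsat 8 OLI band pairing in the comments (green B3, red B4, NIR B5, SWIR1 B6), the zero threshold and all helper names are illustrative assumptions, not the authors' settings. Departure is taken as index-based area minus classifier area, which reproduces the reported values (155.87 − 156.55 = −0.68 km² for NDVI; 79.57 − 78.79 = 0.78 km² for NDWI).

```python
import numpy as np

PIXEL_AREA_KM2 = 30 * 30 / 1e6  # one 30 m x 30 m Landsat 8 OLI pixel = 0.0009 km^2


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b), set to 0 where both bands sum to zero (e.g. no-data pixels)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    total = a + b
    out = np.zeros_like(total)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


def ndvi(nir, red):    # OLI: NIR = B5, Red = B4
    return normalized_difference(nir, red)


def ndwi(green, nir):  # McFeeters form assumed; OLI: Green = B3, NIR = B5
    return normalized_difference(green, nir)


def ndbi(swir1, nir):  # OLI: SWIR1 = B6, NIR = B5
    return normalized_difference(swir1, nir)


def index_area_km2(index: np.ndarray, threshold: float = 0.0) -> float:
    """Area of pixels whose index exceeds a (hypothetical) threshold."""
    return float(np.count_nonzero(index > threshold)) * PIXEL_AREA_KM2


def area_departure(index_area: float, classifier_area: float) -> float:
    """Departure = index-based area minus classifier-mapped area (km^2)."""
    return index_area - classifier_area


print(round(area_departure(155.87, 156.55), 2))  # NDVI vs RF, prints -0.68
print(round(area_departure(79.57, 78.79), 2))    # NDWI vs RF, prints 0.78
```

In practice the threshold for each index is scene-dependent and is usually tuned against reference data before areas are compared.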
The preceding comparison of the LULC models with satellite data-derived indices is based on absolute area values and does not guarantee robustness, because similar area coverage does not necessarily mean that a feature occupies the same geographical location. We therefore carried out a correlation matrix analysis between all the LULC models and the satellite data-derived indices. Figure 6a presents the correlation between the vegetation and agricultural land mapped by the six classifiers and NDVI in window 1: the highest correlation (0.99) is between the RF classifier and NDVI, at the 0.001% significance level, followed by ANN (0.98), fuzzy ARTMAP (0.97) and SVM (0.97), while the lowest is between Mahalanobis distance and NDVI (0.87). In window 2, the highest correlation (0.96) is between the RF water body and NDWI, at the 0.001% significance level, followed by fuzzy ARTMAP (0.95), SVM (0.94) and ANN (0.93), while the lowest (0.88) is between SAM and NDWI (Figure 6b). In window 3, the highest correlation (1) is between the RF built-up area and NDBI, at the 0.001% significance level, followed by fuzzy ARTMAP (1, at the 0.01% significance level), SVM (0.99) and ANN (0.93), while the lowest (0.91) is between SAM and NDBI (Figure 6c).
We also validated the LULC maps of the six classifiers against high-resolution Google Earth imagery, using the same three sites (Figure 5). Water bodies and wetlands are classified very well and match the Google Earth images at all three sites, and vegetation and agricultural land are clearly delineated by all classifiers. RF, fuzzy ARTMAP and SVM classified the land-use features particularly well, as confirmed by comparison with the Google Earth images.
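The window-wise correlations in Figure 6 can be approximated with Pearson's r. The sketch below assumes paired samples (for example, per-pixel or per-grid-cell values of a classifier's class map and of the corresponding index), which the paper does not spell out. It returns r together with the t statistic t = r√((n − 2)/(1 − r²)), whose two-sided p-value follows from Student's t distribution with n − 2 degrees of freedom (scipy.stats.pearsonr computes r and the p-value directly). Function names and inputs are hypothetical.

```python
import math

import numpy as np


def pearson_r_and_t(x, y) -> tuple[float, float]:
    """Pearson correlation r and its t statistic (df = n - 2) for paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = float(np.corrcoef(x, y)[0, 1])
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect correlation: t is unbounded
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t


def rank_by_correlation(index_values, classifier_values: dict) -> list[tuple[str, float]]:
    """Classifiers ordered from highest to lowest correlation with the index."""
    scores = {name: pearson_r_and_t(index_values, values)[0]
              for name, values in classifier_values.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical paired samples, not the study's data.
index = [1, 2, 3, 4]
maps = {"classifier_a": [1, 3, 2, 4], "classifier_b": [2, 4, 6, 8]}
print(rank_by_correlation(index, maps))
```

Ranking by r mirrors how the results above are reported ("RF, followed by ANN, ..."); a high r between two area series still does not by itself prove pixel-level agreement unless the paired samples are spatially matched.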
Although it is very difficult to map built-up area from 30 m resolution images, machine-learning algorithms trained on good training sites can still predict it well. Even so, in the present study some classifiers, such as MD and fuzzy ARTMAP, generated more built-up area than the other classifiers and more than actually exists on the ground. In contrast, ANN, RF and SAM classified built-up area in agreement with the Google Earth images and with our field survey.

Variation in LULC in the Output of Classifiers Used
Several studies have reported that the areas of LULC classes differ between classification techniques, whether machine-learning algorithms or traditional classification techniques are adopted [54,91,92]. Such variation is also found among the results of the six classifiers in this study (Figure 5): the area assigned to a land-use class by one classifier does not exactly match the area assigned to the same class by another. The area under each land-use class for the same region also varies between satellite datasets because of atmospheric, illumination and geometric variations [92]. Differences in class areas between classifiers, in turn, arise from differences in model parameter optimization, techniques and algorithm accuracy [42,92]. Conversely, a few studies have reported no significant difference between the results of machine-learning techniques [42,52]. Our results are only partly consistent with this: variation was small in a few land-use classes, but not in every case. The coefficient of variation revealed marked differences in the area computed for several LULC classes, and the chi-square value shows that the differences between the classifiers' outputs are statistically significant and therefore unlikely to be due to random chance. Hence, the suitability of one or two of the adopted models needs to be justified.
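The chi-square test mentioned above can be illustrated with Pearson's test of independence on a classifier × LULC class contingency table. This is a minimal sketch under assumptions the paper does not state: the cells are taken to be pixel counts (the test requires counts rather than percentages or km² values), the function name and the small table are hypothetical, and no continuity correction is applied, so the statistic matches scipy.stats.chi2_contingency called with correction=False.

```python
import numpy as np


def chi_square_independence(observed) -> tuple[float, int, np.ndarray]:
    """Pearson chi-square statistic, degrees of freedom and expected counts.

    Rows are classifiers, columns are LULC classes; cells are assumed pixel counts.
    """
    obs = np.asarray(observed, dtype=float)
    row_totals = obs.sum(axis=1, keepdims=True)
    col_totals = obs.sum(axis=0, keepdims=True)
    # Expected counts if class areas did not depend on the classifier used.
    expected = row_totals * col_totals / obs.sum()
    chi2 = float(((obs - expected) ** 2 / expected).sum())
    dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)
    return chi2, dof, expected


# Hypothetical counts for two classifiers and two classes.
chi2, dof, expected = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 4), dof)  # 6.6667 1
```

For one degree of freedom the 5% critical value is 3.841, so this hypothetical table would indicate that the two classifiers distribute area across the classes differently.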

Comparison of Accuracy Assessment of Different Classifiers with the Literature
Classification accuracy varies with methods, techniques, time and space [41,52,93,94], and several studies have reported minor to moderate fluctuations in the accuracy of LULC classification across classifiers [95][96][97][98]. The accuracy assessment in the present study likewise shows small variation among the classifiers. Variation in accuracy over space and time is possibly due to atmospheric, surface and illumination differences [53,61]. The Kappa coefficient is the most widely used measure of classification accuracy. In this study, the highest accuracy was obtained with the RF classifier (0.89); previous studies such as Adam et al. [42] and Ma et al. [57] reported accuracy levels of 0.93 and 0.90, respectively, for RF. The lowest accuracy was obtained with MD (0.82), whereas a previous LULC study using the MD classifier reported an accuracy level of 0.89 [99]. Only small differences are found between previous studies and the present study in the accuracy levels of ANN, SAM, fuzzy ARTMAP and SVM [75,100,101]. The validation of LULC models using satellite image-derived indices is a novel aspect of this study, and it shows that RF modelled LULC very well, followed by fuzzy ARTMAP, SVM and ANN. Overall, on the basis of the Kappa coefficient, the index-based accuracy assessment and empirical observations, it can be concluded that RF is the best classifier for LULC classification. A number of studies in the literature also found that SVM and RF have the highest accuracy in LULC classification (Table 7), while SAM and MD have the lowest. Furthermore, Li et al. [53] noted that the accuracies of SVM and RF differ very little, but that the difference is larger between SVM and ANN or between RF and ANN. In contrast, our results show that the difference between the accuracies of RF and SVM is larger than that between RF and ANN.

Conclusions
This study examined the accuracy of different machine-learning classifiers for LULC mapping from satellite observations, with the aim of identifying the best classifier. Six machine-learning algorithms were applied to Landsat 8 Operational Land Imager (OLI) data for LULC classification. Accuracy was assessed using the Kappa coefficient, an index-based technique and empirical observations. The results suggest that the area under each LULC class varies between classifiers, with the maximum variation for agricultural and fallow lands and the minimum for water bodies and wetlands. Such variation makes it necessary to identify the best-suited classifier.
The Kappa coefficient and index-based analysis also show variation in accuracy between the classifiers. This variation is minor, but it matters for LULC mapping and planning. Both the Kappa coefficient and the index-based analysis show that RF has the highest accuracy of all the classifiers applied in this study. To corroborate this result, previous literature was reviewed, and most studies concluded that either RF or ANN is the best classifier. Although previous studies reported higher accuracies for RF and ANN than those obtained here, this study concludes that RF is the best machine-learning classifier for LULC modelling in highly dynamic charland-dominated areas.
Numerous studies have also suggested that the accuracy of LULC mapping varies with time and location. Future research should therefore analyse the accuracy of the classifiers under different morphoclimatic and geomorphic conditions.

Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/7/1135/s1, Table S1: The accuracy assessment of LULC mapping of six classifiers using Kappa coefficient.

Conflicts of Interest:
The authors declare no conflict of interest.