Land Use / Land Cover Mapping Using Multitemporal Sentinel-2 Imagery and Four Classiﬁcation Methods—A Case Study from Dak Nong, Vietnam

: Information on land use and land cover (LULC) including forest cover is important for the development of strategies for land planning and management. Satellite remotely sensed data of varying resolutions have been an unmatched source of such information that can be used to produce estimates with a greater degree of conﬁdence than traditional inventory estimates. However, use of these data has always been a challenge in tropical regions owing to the complexity of the biophysical environment, clouds, and haze, and atmospheric moisture content, all of which impede accurate LULC classiﬁcation. We tested a parametric classiﬁer (logistic regression) and three non-parametric machine learning classiﬁers (improved k-nearest neighbors, random forests, and support vector machine) for classiﬁcation of multi-temporal Sentinel 2 satellite imagery into LULC categories in Dak Nong province, Vietnam. A total of 446 images, 235 from the year 2017 and 211 from the year 2018, were pre-processed to gain high quality images for mapping LULC in the 6516 km 2 study area. The Sentinel 2 images were tested and classiﬁed separately for four temporal periods: (i) dry season, (ii) rainy season, (iii) the entirety of the year 2017, and (iv) the combination of dry and rainy seasons. Eleven di ﬀ erent LULC classes were discriminated of which ﬁve were forest classes. For each combination of temporal image set and classiﬁer, a confusion matrix was constructed using independent reference data and pixel classiﬁcations, and the area on the ground of each class was estimated. For overall temporal periods and classiﬁers, overall accuracy ranged from 63.9% to 80.3%, and the Kappa coe ﬃ cient ranged from 0.611 to 0.813. Area estimates for individual classes ranged from 70 km 2 (1% of the study area) to 2200 km 2 (34% of the study area) with greater uncertainties for smaller classes. the of combining Sentinel-2, multi-spectral, dry and rainy season when mapping in accuracies for the composite 4 obtained by combining and rainy season image sets using


Motivation
Most Vietnamese forests are classified as tropical with natural forest accounting for more than 70% of the total forest area [1]. Dak Nong province has the most abundant natural forest resources in

Remotely Sensed Data
Remote sensing offers a unique environmental capability for monitoring extensive geographical areas in a cost-efficient manner, while simultaneously producing information related to the Earth's land, atmosphere, and oceans [8]. Land cover mapping represents one of the most common uses of remotely sensed data [9][10][11], with satellite imagery serving as one of the most important data sources [11].
As previously, Dak Nong presents unique challenges for the construction of accurate remote sensing-based land use land cover (LULC) maps [12]. The variation in vegetation owing to the rainy/dry seasonal variation affects the spectral reflectance properties of vegetation. For example, deciduous dipterocarp forests have spectral properties in the dry season that are similar to those of other cover types such as industrial coffee and rubber crops, whereas the respective spectral properties are quite different in the rainy season. Only a few studies have accommodated this kind of seasonal variation when constructing satellite image-based land cover classifications. Sothe et al. (2017) [13] combined multi-spectral fall and spring season images when mapping land cover with Landsat-8 data and found that the inclusion of additional band data considerably improved classifications when compared with the use of fall spectral bands alone. For both classifiers used by Sothe et al. (2017) [13], there were meaningful increases in classification accuracy, by 4.8% and 2.9% for the random forests and support vector machine classifiers, respectively, when the "spring" spectral bands were added.
Dak Nong's seasonal growth variation, varying vegetation spectral signatures, and varying topography suggest that Sentinel-2 satellite spectral data, with its fine spatial resolution (10-60 m), fine temporal resolution (five days), and fine spectral resolution (13 spectral bands), may be particularly well-suited for land cover classification purposes in the province. Although data from the Sentinel-2 sensor have been investigated for a variety of vegetation monitoring [14,15], terrestrial monitoring [16], and forest classification [16] applications, only a few studies have used Sentinel-2 for land cover mapping [17][18][19]. Therefore, additional studies that evaluate the utility of this imagery for land cover classification for regions with extremely diverse conditions such as those in Dak Nong are well-justified [20]. the beneficial effects of the multi-seasonal Sentinel-2 data. Google Earth Engine was used for collecting and pre-processing both training and accuracy assessment data. A second subordinate objective was rigorous statistical estimation of the ground area of each land cover class.

Overview
The structure of this section has multiple components. First, the Dak Nong study area is described in Section 2.2, the Sentinel-2 satellite imagery and its separation into temporal periods are described in Section 2.2, and the land cover data from multiple sources and their separation into training and validation subsets are described in Section 2.3. Next, brief descriptions of the four classifiers are provided in Section 2.4, including descriptions of their statistical properties, details on their required input parameters, and procedures for optimizing their performance. Finally, in Section 2.5, the two analytical components used to compare all combinations of the four temporal image periods and the four classifiers are described. The first component focuses on map accuracy assessment, while the second component focuses on estimating LULC class areas and their corresponding uncertainties. The overall research approach is summarized in Figure 1.
Remote Sens. 2020, 12, x FOR PEER REVIEW 4 of 28 the beneficial effects of the multi-seasonal Sentinel-2 data. Google Earth Engine was used for collecting and pre-processing both training and accuracy assessment data. A second subordinate objective was rigorous statistical estimation of the ground area of each land cover class.

Overview
The structure of this section has multiple components. First, the Dak Nong study area is described in Section 2.2, the Sentinel-2 satellite imagery and its separation into temporal periods are described in Section 2.2, and the land cover data from multiple sources and their separation into training and validation subsets are described in Section 2.3. Next, brief descriptions of the four classifiers are provided in Section 2.4, including descriptions of their statistical properties, details on their required input parameters, and procedures for optimizing their performance. Finally, in Section 2.5, the two analytical components used to compare all combinations of the four temporal image periods and the four classifiers are described. The first component focuses on map accuracy assessment, while the second component focuses on estimating LULC class areas and their corresponding uncertainties. The overall research approach is summarized in Figure 1.  Research approach as a flowchart. The Sentinel-2 2017 and 2018 data were collected for different seasons: dry, rainy, whole year, and rainy and dry composited images. The MLR, ik-NN, SVM, and RF classifiers were tested with the resulting uncertainty assessments used as criteria for comparing combinations of seasonal datasets and classifiers. OA, overall accuracy; K, Kappa coefficient; PA, producer's accuracy; UA, user's accuracy.

Study Area
The study was conducted in Dak Nong Province in the Central Highlands of Vietnam ( Figure 2). The average elevation is between 600 and 700 m above sea level. The mean temperature is 24 degrees Celsius. The climate conditions produce general characteristics of a subequatorial tropical monsoon climate. The province has characteristics of humid tropical highland climate and is affected by dry-hot southwest monsoons. There are two distinct annual seasons: the rainy season starts in April and ends in November, and the dry season starts in December and ends in March the following year. The average annual rainfall is 2500 mm, of which 90% occurs during the rainy season. The study area extends over 6516 km 2 and is characterized by substantial fragmentation, thereby making LULC classification particularly challenging. Natural forest consists of patches of natural evergreen broad-leaved, mixed bamboo, deciduous dipterocarp, and semi-deciduous forest with different levels of disturbance, mainly human in origin.

Study Area
The study was conducted in Dak Nong Province in the Central Highlands of Vietnam ( Figure  2). The average elevation is between 600 and 700 m above sea level. The mean temperature is 24 degrees Celsius. The climate conditions produce general characteristics of a subequatorial tropical monsoon climate. The province has characteristics of humid tropical highland climate and is affected by dry-hot southwest monsoons. There are two distinct annual seasons: the rainy season starts in April and ends in November, and the dry season starts in December and ends in March the following year. The average annual rainfall is 2500 mm, of which 90% occurs during the rainy season. The study area extends over 6516 km 2 and is characterized by substantial fragmentation, thereby making LULC classification particularly challenging. Natural forest consists of patches of natural evergreen broadleaved, mixed bamboo, deciduous dipterocarp, and semi-deciduous forest with different levels of disturbance, mainly human in origin. Sentinel-2 MSI (multi-spectral instrument, Level-1C) remotely sensed data were used for LULC classification. The Sentinel-2 mission was developed by the European Space Agency (ESA) as a part of the Copernicus Programme [55]. The mission's wide swath, fine spatial resolution (10 m-60 m), multi-spectral features (13 spectral bands), and frequent revisit time (10 days at the equator with one satellite, and 5 days with two satellites) support monitoring vegetation changes within a growing season, forest monitoring, land cover change detection, and natural disaster management [56]. The spectrum characteristics of the Sentinel 2 images are described in Table 1.

Sentinel-2 Imagery
Sentinel-2 MSI (multi-spectral instrument, Level-1C) remotely sensed data were used for LULC classification. The Sentinel-2 mission was developed by the European Space Agency (ESA) as a part of the Copernicus Programme [55]. The mission's wide swath, fine spatial resolution (10 m-60 m), multi-spectral features (13 spectral bands), and frequent revisit time (10 days at the equator with one satellite, and 5 days with two satellites) support monitoring vegetation changes within a growing season, forest monitoring, land cover change detection, and natural disaster management [56]. The spectrum characteristics of the Sentinel 2 images are described in Table 1. The difference in solar illumination geometry during image acquisition between the two seasons was considered in the present study. Although vegetation in the study area presents reduced climatic and phonological seasonality, the observed reflectance varies by season owing to changes in the solar illumination geometry caused by the Earth's translation movement [13]. Therefore, seasonal image datasets were separately classified to evaluate these influences. Accordingly, the scenes of interest included the following: (i) a collection of Sentinel-2 MSI scenes in the study area during the dry season  The different seasonal image datasets represented for each season were considered for the analyses based on the median value of the collection. The multi-spectral bands in the study included Blue (B2), Green (B3), Red (B4), Red Edge 1 (B5), Red Edge 2 (B6), Red Edge 3 (B7), near infrared (NIR) (B8), Red Edge 4 (B8A), short-wave infrared (SWIR) 1 (B11), and SWIR 2 (B12). In addition to these spectral bands, the normalized difference vegetation index (NDVI) and a digital elevation model (DEM) were added to the seasonal image data (IMG 1-4) with the objective of increasing classification accuracy, as reported from previous studies [22,57]. These bands, including NDVI [58,59] and DEM, were resampled at 10 m resolution. Image information is described in Table 2 below.
To conduct the analyses, the JavaScript API Code Editor in the Google Earth Engine (GEE) was used to collect data for a large number of images. GEE provides most freely available image data and an application programming interface (API) to analyze and visualize the data [60,61]. Surface reflectance (SR) images for 2017 were not available, and for 2018 images, approximately 50% of the study area was covered by clouds. Hence, a top of atmosphere (TOA) dataset acquired for 2017 and 2018 was used for the study. The set of collected images was pre-processed to reduce the effects of topography and bidirectional reflectance distribution function (BRDF). At the same time, cloud areas were masked out and shadows were removed during this process.
All images underwent pixel-wise cloud and cloud shadow masking using the Google cloudScore algorithm for cloud masking and temporal dark outlier mask (TDOM) for cloud shadows, both of which built on ideas from Landsat TDOM and cloudScore algorithm. The original concept was written by Carson Stam, adapted by Ian Housman, currently documented in [60], and described and evaluated in a forthcoming paper [62]. This study implemented the correction of reflectance spectral values by BRDF based on the method described by Roy et al. 2017a,b [63,64]. Topographic correction is the process to account for diffuse atmospheric irradiance caused by slope, aspect, and elevation effects. The sun-canopy-sensor + C (SCSc) correction method based on the C-correction, as described by Soenen et al. [65], was applied for topographic correction in this study. The median function was then applied to create an image object (single image) representing the median value of all images in the filtered collection [66,67]. The median lies closer to the majority of values and is insensitive to extreme values and has exactly half the values smaller and half the values greater than the median, as elaborately applied by [68]. The post-processed images were then resampled to a spatial resolution of 10 m using the nearest neighbor method [69], and subsetted to the study area. The entire pre-processing was implemented on GEE based on the script available on "Open Geo Blog -Tutorials, Code snippets and examples to handle spatial data" [61,70].
Acquiring adequate training and validation data is often challenging in tropical regions. Sothe et al., 2017 [13] and Teluguntla et al., 2018 [71] both obtained good results using sample data from a combination of sources including field investigations, very fine spatial resolution Google Earth imagery, current Landsat and Sentinel imagery, and other sources such as maps. A similar approach was used for this study for which three sets of sample data were acquired in 2017 and 2018: (1) field observations for a purposive sample of size 232; (2) visual interpretations of fine and very fine resolution imagery from sources that included Google Earth for a purposive sample size of 214; and (3) visual interpretations of fine and very fine resolution imagery from sources that included Google Earth and Sentinel-2A imagery for a simple random sample size of 800. For the latter sample dataset, field observations and data from the 2016 Dak Nong Forest Inventory Map were used to clarify and refine interpretations for the LULC classes such as semi-evergreen forest, plantation forest, and some perennial industrial crops that were difficult to distinguish in the imagery.
To obtain the probability sample necessary for validation, a systematic sample of the probability-based third dataset was selected. The plots in the third dataset were first sorted by their east and north coordinates, and then a systematic sample was selected from within each LULC class. For each class, the proportion selected was arbitrary, but was guided by the desire for a minimum sample size of 15, where possible, while yet retaining a sufficient sample size for training purposes.
For the eleven LULC classes, the proportions were, in order, as follows: 0.20, 0.20, 0.50, 0.67, 1.00, 0.50, 0.11, 0.50, 0.67, 0.50, 0.20. Because the third dataset was selected using a simple random sample, and it was systematically subsampled, each subsample can also be considered a simple random sample and, therefore, can be used for validation. The remaining portion of the third dataset was used for training purposes. The result was a sample size of 1036 for training and a sample size of 208 for validation. The number of validation plots by LULC category was considered sufficient and generally complied with the recommendation of Särndal et al. (1992) [72]. A summary of the training and validation datasets is shown in Table 3 with the spatial distribution of sample locations shown in Figure 2.

Classifiers
The MLR, ik-NN, RF, and SVM supervised classification algorithms were used to classify the satellite image data as described above. The training areas for each LULC type were selected based on Google Earth, field data, and prior knowledge, as well as available data. The models were used as supervised classifiers to classify pixels based on their spectral signatures.

Multinomial Logistic Regression (MLR)
With MLR, the probability of class c for the i th plot, c=1,..., C, is estimated as follows: and where C is the number of the LULC classes, x i is the vector of predictor variable observations for the i th population unit, and β c is the vector of regression coefficients associated with LULC class c. The class with the greatest probability is selected as the prediction for the i th population unit. Optimal estimates for β c : c = 1, . . . , C can be obtained using any of multiple statistical software packages.

Improved k-Nearest Neighbors (ik-NN)
In the terminology of nearest neighbors techniques, the auxiliary or predictor variables are designated feature variables and the space defined by the feature variables is designated the feature space; the set of sample units for which observations of both response and feature variables are available is designated the reference set; and the set of population units for which predictions of response variables are desired is designated the target set (Chirici et al., 2016) [73]. All population units for both the reference and target sets are assumed to have complete sets of observations for all feature variables.
For the i th target unit, all forms of nearest neighbors algorithms entail selecting the k-nearest or most similar neighbors, y i j : j = 1, 2, . . . , k , from the reference set with respect to a distance metric, d, formulated as a function of the feature variables. For categorical response variables such as land cover classes, the prediction,ŷ i , for the i th target unit is the most heavily weighted class among the k-nearest neighbors, a weighted median or mode in case of ordinal scale variables, or a mode in the case of nominal variables. Implementation of nearest neighbors techniques requires multiple selections: (i) the distance metric, d, to assess nearness or similarity in the feature space; (ii) a scheme for weighting the predictor variables in the distance metric; (iii) a scheme for weighting individual neighbors when calculating predictions; and (iv) a number, k, of nearest neighbors [73].
For this study, the distance metric was a simplified version of the metric proposed by Tomppo and Halme (2004) [44], as used in the operational Finnish multi-source national forest inventory (MS-NFI), where i denotes a target unit; j denotes a reference unit; d ij is the distance between the units i and j; m indexes the feature variables; x im and x jm are observations of the m th feature variable for the i th target unit and j th reference unit, respectively; and w m is a feature variable weight. Neighbor weights are typically formulated as powers, t ∈ [0, 2], of distances between target and reference units. Often, the necessary selections to implement a nearest neighbor algorithm are made in an arbitrary method, whereas improved k-NN (ik-NN) entails optimized selection of the weights, w m , using a technique such as genetic algorithms [44,45,74]

Support Vector Machine (SVM)
The principle behind the SVM classifier is a hyperplane that separates the data for different classes. The main focus is construction of the hyperplane by maximizing the distance from the hyperplane to the nearest data point of either class. These nearest data points are known as support vectors [75].
According to Huang et al. (2002) [35] (p. 734), by mapping the input data into a high-dimensional space, the kernel function converts non-linear boundaries in the original data space into linear boundaries in the high-dimensional space, which can then be located using an optimization algorithm. Therefore, selection of the kernel function and appropriate values for corresponding kernel parameters, referred to as the kernel configuration, can affect the performance of the SVM.
The radial basis function kernel (RBF kernel) is one of the most popular kernels used to implement the support vector machine algorithm and was used for this study. The squared Euclidean distance metric was used to construct completely non-linear hyperplanes. The RBF kernel of the SVM classifier is commonly used and has performed well. Polynomial kernels, especially high-order kernels, took far more time to train than RBF kernels [35]. Meyer et al. (2002) [76] stated that, for classification tasks, C-classification is most likely used with the RBF kernel because of its good general performance and the small number of parameters (only two, C and γ). Therefore, the two parameters that must be defined for this classification algorithm are the cost parameter (C) and the kernel width parameter (γ). According to Knorn et al. (2009) [77] (p.960), C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the training error. C is a preset penalty value for misclassification errors, while γ describes the kernel width, which affects the smoothing of the shape of the class-dividing hyperplane.
The authors of LIBSVM suggest trying small and large values for C, such as 1 to 1000, then using cross-validation to decide which is optimal for the data, and finally trying several γ's for the optimal C's. A small C-value tends to emphasize the margin while ignoring the outliers in the training data, while a large C-value may overfit the training data [77] (p.960). Optimal selection of C and γ parameters was done by testing C parameters in the range 2 −1 to 2 8 and γ parameters in the range 0.1 to 2.0.

Random Forests (RF)
The RF classifier developed by Breiman (2001) [78] requires selection of three parameters: ntree (number of trees to grow), mtry (the number of variables to split each node), and variable importance (the number of variables/bands influences model performance). Liaw & Wiener (2002) [79] recommend using the square root of the number of input variables as the default value for mtry. A large value for ntree produces a stable result for variable importance, which is estimated using two indicators: (i) mean decrease accuracy (MDA) and ii) mean decrease gini (MDG). MDA is the accuracy associated with each predictor variable based on the out-of-bag error rate (OOB). Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it is randomly labeled according to the distribution of labels in the subset. For this study, MDA values were investigated to determine the importance of variables. Nguyen et al. (2018) [6] indicated that within the range 1 ≤ ntree ≤ 500, ntree = 300 produced the best fit. In addition, Breiman (2001) [78] stated that using more than the required number of trees may be unnecessary, albeit not harmful, because the relationship between accuracy and ntree is asymptotic. The 'RandomForest' package in the R environment developed by Liaw and Wiener (version 4.6-14 in 2018) was used in present study. Optimal values of mtry, ntree, and variable importance were selected based on the smallest OOB error. The optimal variable importance depended on the MDA value and accuracy of the model.

Accuracy Assessment
Accuracy assessment is an important step before accepting a classification result [21]. The classification accuracy of a map product is estimated by constructing a confusion matrix between reference and classified pixels. Classification accuracy was assessed using criteria such as overall accuracy (OA), Kappa coefficient (K), producer's accuracy (PA), and user's accuracy (UA). Congalton and Green (1999) [80] assert that analysis of the causes of differences in the confusion matrix can be one of the most important and interesting steps in the construction of a map from remotely sensed data.
The objectives of the study included comparing the performance of classifiers as well as assessing the effects of Sentinel-2 satellite images for different seasons, as described in Table 2. The number of seasonal bands used with the four classifiers is reported in Table 4.

Land Cover Class Area Estimation
For each land cover class, an estimate of the class area and the corresponding standard error were calculated using a combination of confusion matrices and stratified estimators [81,82]. For each class, C, a confusion matrix was constructed for two classes: (i) class C and (ii) the aggregation of all other classes into a single class designated~C (Table 5). Using estimates of proportions and corresponding variances as indicated in Table 5, the stratified estimate of the area of class C was as follows: with standard error, SE Â C = A tot · wt 2 1 ·V ar p 1 + wt 2 2 ·V ar p 2 (5) where wt 1 is the proportion of the total map area in class C, wt 2 = 1 − wt 1 , and A tot is the total area of interest. Approximate 95% confidence intervals for the class areas can be estimated as follows: Total n ·1 = n 11 + n 21 n ·2 = n 12 + n 22 PA * pa 1 = n11 n·1 pa 2 = n22 n·2 * UA = user's accuracy, * PA = producer's accuracy.

Multinomial Logistic Regression (MLR)
The parameters of the multinomial logistic regression model (Equations (1) and (2)) were estimated using the multinom function of the R packages [83]. All variables (spectral values of all bands of the image set in the analysis) were included in the model. The log-likelihood stabilized after 100 iterations. The importance of the variables was quite similar among different Sentinel-2 datasets. For the dry season image (IMG 1) and for the all-month image (IMG 3), the most important variables were Blue and SWIR 2, otherwise, the variable importance values were approximately the same. For the rainy season image (IMG 2), the results were also similar, although the Blue and SWIR 2 importance values were slightly less than for IMG 1 and IMG 3. Differences among importance values were small for the two-season image (IMG 4).

Improved k-NN (ik-NN)
The improved k-NN (ik-NN) algorithm was applied as described in [45], except that only overall accuracy was used in the fitness function. The value of k = 10 was used. For the genetic algorithm, the number of the generations was 60, the population and medi-population sizes were 50, and the maximum number of the random iterations was 4000. Otherwise, the genetic algorithm parameters were as reported by Tomppo et al. (2009) [45]. Because genetic algorithms as a heuristic optimization method may select a local optimum, several trial runs were used to find a near optimal solution. Pixel-level estimates can be readily calculated with ik-NN when the weights of the variables have been optimized. The importance for the different variables was similar for ik-NN and MLR. However, in the case of predictor variables with large correlations, caution should be used when drawing conclusions with either method.

Support Vector Machine (SVM)
With the SVM algorithm using the RBF kernel, determination of the optimal cost (C) and Gamma parameters for the model is important. Following Qian et al. (2015) [84] and using our actual dataset, the R function 'tune()' was used to select these two SVM parameters. The optimal cost (C) value was determined from the values: 2 −1 , 2 0 , 2 1 , 2 2 , 2 3 , 2 4 , 2 5 , 2 6 , 2 7 , 2 8 , and the Gamma (γ) value was a free parameter set from 0.1 to 2. The optimal parameters were determined based on classification error. Figure 3 describes the performance of the SVM model using the different cost and Gamma parameters. The darker the blue area, the better the performance of the model presented.
Remote Sens. 2020, 12, x FOR PEER REVIEW 2 of 28 With the SVM algorithm using the RBF kernel, determination of the optimal cost (C) and Gamma parameters for the model is important. Following Qian et al. (2015) [84] and using our actual dataset, the R function 'tune()' was used to select these two SVM parameters. The optimal cost (C) value was determined from the values: 2 -1 , 2 0 , 2 1 , 2 2 , 2 3 , 2 4 , 2 5 , 2 6 , 2 7 , 2 8 , and the Gamma (γ) value was a free parameter set from 0.1 to 2. The optimal parameters were determined based on classification error. For the IMG 2-SVM combination and the IMG 4-SVM combination, the optimal C of 2 3 and Gamma of 0.1 produced classification errors of 0.2283 and 0.1525, respectively, while for the IMG 1-SVM and IMG 3-SVM combinations, the optimal C of 2 5 and 2 6 , both with Gamma of 0.1, produced classification errors of 0.2212 and 0.2145, respectively.

Random Forests (RF)
The three RF parameters, ntree, mtry, and variable importance, play important roles in classification. The algorithm assesses the importance of each variable in the classification process by means of a specific measure. 'Importance()' and 'varImplot()' functions were used to determine the MDA values and to select the potential variables that were actually needed for the optimal RF models. Figure 4 shows the variable importance ranked in the direction of decreasing MDA from right to left for the four seasonal images. The selection of variables was based on MDA using the backward selection method [85]. With this method, the algorithm starts with all predictor variables and then sequentially removes some variables until the greatest accuracy is achieved. Accordingly, the least MDAs were attributed to two bands of IMG 1, 2, and 3: specifically, B6 and B8 for IMG 1 and B6 and B7 for both IMG 2 and IMG 3. For IMG 4, the five bands were included: B5a, B2a, B2b, B7a, and B8b. In addition, the NIR band reduced the accuracy for all images. For the IMG 2-SVM combination and the IMG 4-SVM combination, the optimal C of 2 3 and Gamma of 0.1 produced classification errors of 0.2283 and 0.1525, respectively, while for the IMG 1-SVM and IMG 3-SVM combinations, the optimal C of 2 5 and 2 6 , both with Gamma of 0.1, produced classification errors of 0.2212 and 0.2145, respectively.

Random Forests (RF)
The three RF parameters, ntree, mtry, and variable importance, play important roles in classification. The algorithm assesses the importance of each variable in the classification process by means of a specific measure. 'Importance()' and 'varImplot()' functions were used to determine the MDA values and to select the potential variables that were actually needed for the optimal RF models. Figure 4 shows the variable importance ranked in the direction of decreasing MDA from right to left for the four seasonal images. The selection of variables was based on MDA using the backward selection method [85]. With this method, the algorithm starts with all predictor variables and then sequentially removes some variables until the greatest accuracy is achieved. Accordingly, the least MDAs were attributed to two bands of IMG 1, 2, and 3: specifically, B6 and B8 for IMG 1 and B6 and B7 for both IMG 2 and IMG 3. For IMG 4, the five bands were included: B5a, B2a, B2b, B7a, and B8b. In addition, the NIR band reduced the accuracy for all images.  The number of variables used for splitting at each node (mtry) was determined using the tuneRF function based on the variable importance and the number of trees (ntree). On the basis of the OOB error estimation, the optimum ntree and mtry parameters were chosen for the models. Figure 5 shows the OOB errors when the model was run with ntree ranging from 1 to 500 trees. The smallest OOB errors were obtained for ntree = 500 trees for all seasonal image combinations.  The number of variables used for splitting at each node (mtry) was determined using the tuneRF function based on the variable importance and the number of trees (ntree). On the basis of the OOB error estimation, the optimum ntree and mtry parameters were chosen for the models. Figure 5 shows the OOB errors when the model was run with ntree ranging from 1 to 500 trees. The smallest OOB errors were obtained for ntree = 500 trees for all seasonal image combinations.  The number of variables used for splitting at each node (mtry) was determined using the tuneRF function based on the variable importance and the number of trees (ntree). On the basis of the OOB error estimation, the optimum ntree and mtry parameters were chosen for the models. The OOB errors associated with different mtry values are shown in Figure 6. The smallest OOB error was obtained with mtry = 3 for the IMG 2/RF and IMG 3/RF combinations, with mtry = 6 for the IMG 1/RF combination, and with mtry = 4 for the IMG4/RF combination.

Accuracy of Classification Results
OA, K, PA, and UA for each class for the different combinations of image groups and classifiers are reported Table 6 and a comparison of the results is reported in Figure 7. In general, relatively large accuracies were found with OA >60% and K >0.6. Of interest, the IMG 4 composite of rainy and dry images produced the greatest accuracies for all classifiers. By contrast, the rainy season IMG 2 images produced the smallest accuracies. Classification accuracies for IMG 1 and IMG 3 were less than for IMG 4, but greater than for IMG 2.
The greatest accuracy was achieved for the composite two-season IMG 4 using the SVM classifier, specifically OA = 80.3% and Kappa = 0.813. The smallest accuracy was for IMG 2 with the MLR classifier. The differences between the greatest and smallest accuracies were 16.4% percentage points for OA and 0.202 for K. With respect to the classification algorithms, differences between the greatest and smallest accuracies for the four algorithms were 14.4% percentage points for OA and The OOB errors associated with different mtry values are shown in Figure 6. The smallest OOB error was obtained with mtry = 3 for the IMG 2/RF and IMG 3/RF combinations, with mtry = 6 for the IMG 1/RF combination, and with mtry = 4 for the IMG4/RF combination.

Accuracy of Classification Results
OA, K, PA, and UA for each class for the different combinations of image groups and classifiers are reported Table 6 and a comparison of the results is reported in Figure 7. In general, relatively large accuracies were found with OA >60% and K >0.6. Of interest, the IMG 4 composite of rainy and dry images produced the greatest accuracies for all classifiers. By contrast, the rainy season IMG 2 images produced the smallest accuracies. Classification accuracies for IMG 1 and IMG 3 were less than for IMG 4, but greater than for IMG 2.
The greatest accuracy was achieved for the composite two-season IMG 4 using the SVM classifier, specifically OA = 80.3% and Kappa = 0.813. The smallest accuracy was for IMG 2 with the MLR classifier. The differences between the greatest and smallest accuracies were 16.4% percentage points for OA and 0.202 for K. With respect to the classification algorithms, differences between the greatest and smallest accuracies for the four algorithms were 14.4% percentage points for OA and

Accuracy of Classification Results
OA, K, PA, and UA for each class for the different combinations of image groups and classifiers are reported Table 6 and a comparison of the results is reported in Figure 7. In general, relatively large accuracies were found with OA >60% and K >0.6. Of interest, the IMG 4 composite of rainy and dry images produced the greatest accuracies for all classifiers. By contrast, the rainy season IMG 2 images produced the smallest accuracies. Classification accuracies for IMG 1 and IMG 3 were less than for IMG 4, but greater than for IMG 2.       The greatest accuracy was achieved for the composite two-season IMG 4 using the SVM classifier, specifically OA = 80.3% and Kappa = 0.813. The smallest accuracy was for IMG 2 with the MLR classifier. The differences between the greatest and smallest accuracies were 16.4% percentage points for OA and 0.202 for K. With respect to the classification algorithms, differences between the greatest and smallest accuracies for the four algorithms were 14.4% percentage points for OA and 0.202 for K. For SVM, the differences were 11.9% percentage points and 0.096 for K; for ik-NN, the differences were 10% percentage points for OA and 0.108 for K. The final LULC map was constructed using the classification for the most accurate IMG 4/SVM combination and is shown in Figure 8.
Average UA and PA estimates were greater than 60%, apart from a few cases, but differed by LULC class. For the forest cover classes, dense forest (1) had the greatest accuracy, while open forest (2) had the smallest accuracies for the four methods for most seasons (Figure 9).
Remote Sens. 2020, 12, x FOR PEER REVIEW 2 of 28 Figure 9. User's and producer's accuracies by class for the four classifiers using the IMG 4 combination of the rainy and dry Sentinel 2 satellite imagery.

Land Cover Class Area Estimates
Land cover class area estimates with corresponding standard errors are shown by class for the 16 seasonal image and prediction technique combinations in Table 6. Class area estimates ranged from slightly greater than 70 km 2 for class 10 for the IMG 4 and MLR combination to slightly greater than 2200 km 2 for class 7 for the IMG 4 and SVM combination. Standard errors ranged from approximately 6 km 2 to approximately 230 km 2 , with larger standard errors associated with larger area estimates (Figure 10), although smaller ratios of standard errors to area estimates were associated with larger area estimates. Regardless of the seasonal image and prediction technique combination, the three classes with the greatest areas, in order, were class 7: perennial industrial plants, class 2: open evergreen broadleaved forest, and class 1: dense evergreen broadleaved forest. For IMG 1, IMG 2, and IMG 3, the sums of the estimated areas for these three classes as proportions of the total area ranged from slightly more than 0.50 to approximately 0.63, but with larger estimates for IMG 4.

Land Cover Class Area Estimates
Land cover class area estimates with corresponding standard errors are shown by class for the 16 seasonal image and prediction technique combinations in Table 6. Class area estimates ranged from slightly greater than 70 km 2 for class 10 for the IMG 4 and MLR combination to slightly greater than 2200 km 2 for class 7 for the IMG 4 and SVM combination. Standard errors ranged from approximately 6 km 2 to approximately 230 km 2 , with larger standard errors associated with larger area estimates ( Figure 10), although smaller ratios of standard errors to area estimates were associated with larger area estimates. Regardless of the seasonal image and prediction technique combination, the three classes with the greatest areas, in order, were class 7: perennial industrial plants, class 2: open evergreen broadleaved forest, and class 1: dense evergreen broadleaved forest. For IMG 1, IMG 2, and IMG 3, the sums of the estimated areas for these three classes as proportions of the total area ranged from slightly more than 0.50 to approximately 0.63, but with larger estimates for IMG 4.

Discussion
Errors are present in any classification, estimation, or prediction [21,[86][87][88]. Comparison of the results of this study and those of earlier studies is not straightforward because the numbers and definitions of the vegetation classes differ by study. Thus, optimality differs by study and user [21,[86][87][88]. There are also no generally accepted limits on how accurate a classification should be to be characterized as reliable, because different users may have different concerns about accuracy. They may, for example, be interested in the accuracy for a specific class or in accuracy for areal estimates [89]. In addition, multiple factors influence classification accuracy: image quality, classifier, image composition, number and details of classes, and sample size.
Andersen et al. (1976) [90] recommended that accuracies of 85% for mapping land cover are acceptable. However, as Foody (2008) [91] noted, for many contemporary mapping applications, the challenge may be more difficult than assessed by Anderson et al. (1976) [90], particularly when attempting to distinguish among a large number of relatively detailed classes at a relatively local, large cartographic scale. Consequently, in such applications, the use of the 85% target suggested by Anderson et al. (1976) [90] may be inappropriate, as it may be unrealistically large.
Indeed, many studies have been conducted to select the most accurate classifier, either among those simultaneously evaluated or with classifiers evaluated in other studies. Such works reach no consensus, because the performance of a classifier always depends on the specific site characteristics,

Discussion
Errors are present in any classification, estimation, or prediction [21,[86][87][88]. Comparison of the results of this study and those of earlier studies is not straightforward because the numbers and definitions of the vegetation classes differ by study. Thus, optimality differs by study and user [21,[86][87][88]. There are also no generally accepted limits on how accurate a classification should be to be characterized as reliable, because different users may have different concerns about accuracy. They may, for example, be interested in the accuracy for a specific class or in accuracy for areal estimates [89]. In addition, multiple factors influence classification accuracy: image quality, classifier, image composition, number and details of classes, and sample size.
Andersen et al. (1976) [90] recommended that accuracies of 85% for mapping land cover are acceptable. However, as Foody (2008) [91] noted, for many contemporary mapping applications, the challenge may be more difficult than assessed by Anderson et al. (1976) [90], particularly when attempting to distinguish among a large number of relatively detailed classes at a relatively local, large cartographic scale. Consequently, in such applications, the use of the 85% target suggested by Anderson et al. (1976) [90] may be inappropriate, as it may be unrealistically large.
Indeed, many studies have been conducted to select the most accurate classifier, either among those simultaneously evaluated or with classifiers evaluated in other studies. Such works reach no consensus, because the performance of a classifier always depends on the specific site characteristics, on the type and quality of the remotely sensed data, and also on the number and general aspects of the classes of interest [13]. Using the RF, SVM, maximum likelihood, and neural network classification algorithms to discriminate among four individual land cover classes based on two Landsat-8 OLI scenes, Lowe and Kulkarni (2015) [40] reported overall classification accuracies of 96.25%, 86.88%, 83.13%, and 76.87%, respectively. Kennedy et al. (2015) [41] used RF to classify Landsat time-series data from 1198 training patches for four classes (agriculture, forest, urbanization, and stream) and reported OA greater than 80%, but most successfully for the numerically well-represented forest management class. Meanwhile, Franco-Lopez (2001) [38] used k-NN to map 13 types of land cover using Landsat TM and achieved OA = 64%. Tomppo et al. (2008) [92] reported OA between 70% and 80% for classifying dominant tree species in one boreal forest test site in Finland when using two adjacent Landsat 7 ETM+ scenes and the ik-NN method. Pelletier et al. (2016) [18] used RF and SVM algorithms to classify SPOT-4 imagery and Landsat-8 HR-SITS images in southern France. The authors reported an OA of 83.3% for RF and 77.1% for SVM. Research by Phan and Kappas (2017) [20] showed different results among RF, SVM, and k-NN classifiers used to discriminate six types of LULC using Sentinel-2 image data in the Red River Delta of Vietnam. This research reported that SVM produced the greatest OA (95.29%) with the least sensitivity to the training sample sizes, followed consecutively by RF (94.44%) and k-NN (94.13%). These results indicate that no standard of accuracy is appropriate for all cases, because accuracy relevance depends on both the objective and the user.
Spatial information including remotely sensed data has been an excellent source of information for decision makers in forest management, albeit in conjunction with an understanding of classification uncertainties, whereby the probabilities of non-optimal and infeasible decisions are reduced. For this study, OA ranged from 63.9% to 80.3% (Figure 7) when using Sentinel 2 data to classify 11 LULC classes, with SVM producing the greatest accuracies. The difference between accuracies for the most accurate SVM classifier and the least accurate MLR classifier was approximately 14.4%. Although the results for SVM and RF were relatively similar, some authors recommend RF because training is less time-consuming and parameter selection is easier [18], a recommendation that was confirmed in our study.
Producer's and user's accuracies among the 11 LULC classes differed considerably ( Figure 9). In general, the open evergreen forest classes were confused more than the other forest cover classes. This result is attributed to the heterogeneous conditions of natural tropical forests. In addition, forests in the study area have been disturbed to different degrees [21]. Among the forest classes, deciduous dipterocarp and semi-evergreen forest are considered the most challenging for remote sensing classification because of the seasonal deciduous characteristics of these forest types in the dry season [93]. However, this problem may be solved by using the combination of dry and rainy season images, as investigated in the present study.
The Sentinel-2 images acquired for different seasons (plant growth stages) produced different results. The greatest accuracies were for the composite rainy and dry season IMG 4; by contrast, the lowest accuracies were for the rainy season IMG 2. The observed reflectance varied by season owing to changes in the solar illumination geometry caused by the Earth's translation movement. In addition, the vegetation in the study area varies depending on the season, owing to the substantial rainfall differences for the two seasons. Sothe et al. (2017) [13] assert that differences in classification accuracies for the dry and rainy seasons can be attributed to the differences in solar illumination geometry between the two seasons. For images acquired in the dry season, the incident sun radiation arrives in a more perpendicular direction to the Earth's surface, thus reducing the shadow effect caused by topography and variations in the forest canopy height, and leading to greater pixel illumination. For the current study, there was a substantial increase in classification accuracies when using a composite of dry and rainy Sentinel 2 images (IMG 4). For the ik-NN, RF, and SVM classifiers, the greatest accuracies were obtained for the combined rainy and dry IMG 4 relative to the rainy or dry season alone ( Table 6). The accuracy increase for the composite image may be explained by the fact that different seasons contain different information for the same kind of land cover (e.g., dipterocarp forest is deciduous in dry seasons and green in rainy seasons). Combining the two season's image bands captures additional information on land cover.
Among all combinations of images, classification algorithms, and land classes, the smallest SE for area estimates was for the water surface class owing to its stability, whereas the largest SE was for the industrial plant class. In fact, because cultivation characteristics of industrial plants in the study area are quite complex with a variety of species such as coffee, pepper, and cashew, all with uneven ages, large SEs are inevitable. This complexity also explains the large difference among area estimates for this class, ranging from 1643.45 km 2 to 2223.87 km 2 , or from 25% to 34% of the total area ( Figure 10).
Although classification accuracies for vegetation classes were not particularly large, the classifications are still useful for complex tropical rain forests that have been disturbed to different degrees such as in the Central Highlands of Vietnam. The area estimates and spatial distributions of the LULC classes produced from the current study will assist local authorities, managers, and other stakeholders in decision-making and planning regarding forest land cover and uses. The usual practice is for the Institution of Forest inventory and Planning (FIPI) to conduct a forest inventory and construct a forest map every five years. Local forest units such as Dak Nong receive the maps and update them manually. However, the accuracy of the map has usually not been announced, and inaccuracies and errors have been detected only by local forest staff when patrolling in the field. Moreover, LULC changes, particularly for industrial land, occur quickly and easily owing to factors such as unstable crop markets and increasing population resulting from migration. Thus, the results of this study will not only provide authorities with updated information on current conditions, but will also serve as a recommendation regarding methods for proactively updating LULC maps in a timely and costly manner. Specifically, timely and updated maps assist authorities by serving as a basis for formulating suitable solutions and policies for managing LULC including forest cover.

Conclusions
This research showed the utility of combining Sentinel-2, multi-spectral, and dry and rainy season band data when mapping LULCs in Dak Nong Province, Vietnam. The greatest accuracies were achieved for the composite IMG 4 obtained by combining dry and rainy season image sets using the SVM classifier.
Among the classifiers, SVM produced the greatest accuracies, although RF, which had similar accuracies, was simpler to train and apply, and was less computationally intensive. For IMG 4, the greatest accuracies with SVM were OA = 80.3% and Kappa index = 0.813; for RF, the greatest accuracies were OA = 80.0% and K = 0.802. Thus, the combination of dry and rainy season imagery used with the SVM or RF may contribute to potentially new ways for classifying the complex tropical forest of Vietnam and similar areas. The area estimates and spatial distributions of the LULC classes produced from the current study will assist local authorities, managers, and other stakeholders in decision-making and planning regarding forest land cover and uses.
In conclusion, the two-season, multi-spectral Sentinel-2 images provided useful data for classifying LULC classes in areas with substantial fragmentation, especially for natural forests that have been disturbed and degraded at different levels such as in Dak Nong, Vietnam. The SVM and RF machine learning algorithms were both accurate classifiers when used with the Sentinel 2 imagery. The methods developed for this study are applicable to boreal and temporal forests with different classes in addition to the tropical forests for the current study. However, the model parameters always need to be re-estimated for each application.