Errors are present in any classification, estimation, or prediction [21]. Comparison of the results of this study and those of earlier studies is not straightforward because the numbers and definitions of the vegetation classes differ by study. Thus, optimality differs by study and user [21]. There are also no generally accepted limits on how accurate a classification should be to be characterized as reliable, because different users may have different concerns about accuracy. They may, for example, be interested in the accuracy for a specific class or in accuracy for areal estimates [89]. In addition, multiple factors influence classification accuracy: image quality, classifier, image composition, number and details of classes, and sample size.
Anderson et al. (1976) [90] recommended an accuracy of 85% as acceptable for land cover mapping. However, as Foody (2008) [91] noted, for many contemporary mapping applications, the challenge may be more difficult than assessed by Anderson et al. (1976) [90], particularly when attempting to distinguish among a large number of relatively detailed classes at a relatively local, large cartographic scale. Consequently, in such applications, the use of the 85% target suggested by Anderson et al. (1976) [90] may be inappropriate, as it may be unrealistically large.
Indeed, many studies have sought to identify the most accurate classifier, either among those evaluated simultaneously or relative to classifiers evaluated in other studies. Such works reach no consensus, because the performance of a classifier always depends on the specific site characteristics, on the type and quality of the remotely sensed data, and also on the number and general aspects of the classes of interest [13]. Using the RF, SVM, maximum likelihood, and neural network classification algorithms to discriminate among four individual land cover classes based on two Landsat-8 OLI scenes, Lowe and Kulkarni (2015) [40] reported overall classification accuracies of 96.25%, 86.88%, 83.13%, and 76.87%, respectively. Kennedy et al. (2015) [41] used RF to classify Landsat time-series data from 1198 training patches for four classes (agriculture, forest, urbanization, and stream) and reported OA greater than 80%, with the greatest success for the numerically well-represented forest management class. Meanwhile, Franco-Lopez (2001) [38] used k-NN to map 13 types of land cover using Landsat TM and achieved OA = 64%. Tomppo et al. (2008) [92] reported OA between 70% and 80% for classifying dominant tree species in one boreal forest test site in Finland when using two adjacent Landsat 7 ETM+ scenes and the ik-NN method. Pelletier et al. (2016) [18] used RF and SVM algorithms to classify SPOT-4 imagery and Landsat-8 HR-SITS images in southern France. The authors reported an OA of 83.3% for RF and 77.1% for SVM. Research by Phan and Kappas (2017) [20] showed different results among RF, SVM, and k-NN classifiers used to discriminate six types of LULC using Sentinel-2 image data in the Red River Delta of Vietnam. This research reported that SVM produced the greatest OA (95.29%) with the least sensitivity to the training sample sizes, followed by RF (94.44%) and k-NN (94.13%). These results indicate that no standard of accuracy is appropriate for all cases, because the relevance of accuracy depends on both the objective and the user.
Spatial information, including remotely sensed data, is an excellent source of information for decision makers in forest management, provided that it is used with an understanding of classification uncertainties, which reduces the probabilities of non-optimal and infeasible decisions. For this study, OA ranged from 63.9% to 80.3% (Figure 7) when using Sentinel-2 data to classify 11 LULC classes, with SVM producing the greatest accuracies. The difference between accuracies for the most accurate SVM classifier and the least accurate MLR classifier was approximately 14.4 percentage points. Although the results for SVM and RF were relatively similar, some authors recommend RF because training is less time-consuming and parameter selection is easier [18], a recommendation that was confirmed in our study.
Producer’s and user’s accuracies among the 11 LULC classes differed considerably (Figure 9). In general, the open evergreen forest classes were confused more than the other forest cover classes. This result is attributed to the heterogeneous conditions of natural tropical forests. In addition, forests in the study area have been disturbed to different degrees [21]. Among the forest classes, deciduous dipterocarp and semi-evergreen forest are considered the most challenging for remote sensing classification because these forest types shed their leaves in the dry season [93]. However, this problem may be mitigated by combining dry and rainy season images, as investigated in the present study.
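The accuracy measures discussed here can all be derived from a single confusion matrix. The following is a minimal sketch, not the code used in this study: the function name `accuracy_metrics`, the matrix orientation (rows are map classes, columns are reference classes), and the three-class counts are illustrative assumptions.

```python
def accuracy_metrics(cm):
    """Return overall accuracy, producer's and user's accuracies, and kappa.

    Assumed orientation (not stated in the text): cm[i][j] is the number of
    samples mapped as class i whose reference class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                              # map totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # reference totals
    diag = [cm[i][i] for i in range(k)]

    oa = sum(diag) / total
    # User's accuracy: share of pixels mapped as class i that are truly class i.
    users = [diag[i] / row_tot[i] if row_tot[i] else float("nan") for i in range(k)]
    # Producer's accuracy: share of reference class j that was mapped as class j.
    producers = [diag[j] / col_tot[j] if col_tot[j] else float("nan") for j in range(k)]
    # Cohen's kappa: agreement beyond what the marginal totals give by chance.
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, producers, users, kappa


# Hypothetical three-class confusion matrix (counts are made up).
cm = [
    [40, 8, 2],
    [6, 30, 4],
    [1, 2, 7],
]
oa, producers, users, kappa = accuracy_metrics(cm)
print(f"OA = {oa:.3f}, kappa = {kappa:.3f}")
print("producer's:", [round(p, 3) for p in producers])
print("user's:    ", [round(u, 3) for u in users])
```

A class with a small producer's accuracy is often omitted from the map, whereas a class with a small user's accuracy is often mapped where it does not occur, which is why the two measures can differ considerably for the same class.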
The Sentinel-2 images acquired for different seasons (plant growth stages) produced different results. The greatest accuracies were for the composite rainy and dry season IMG 4; by contrast, the lowest accuracies were for the rainy season IMG 2. The observed reflectance varied by season owing to changes in the solar illumination geometry caused by the Earth’s orbital motion around the Sun. In addition, the vegetation in the study area varies depending on the season, owing to the substantial rainfall differences for the two seasons. Sothe et al. (2017) [13] assert that differences in classification accuracies for the dry and rainy seasons can be attributed to the differences in solar illumination geometry between the two seasons. For images acquired in the dry season, the incident solar radiation arrives in a more perpendicular direction to the Earth’s surface, thus reducing the shadow effect caused by topography and variations in the forest canopy height, and leading to greater pixel illumination. For the current study, there was a substantial increase in classification accuracies when using a composite of dry and rainy Sentinel-2 images (IMG 4). For the ik-NN, RF, and SVM classifiers, the greatest accuracies were obtained for the combined rainy and dry IMG 4 relative to the rainy or dry season alone (Table 6). The accuracy increase for the composite image may be explained by the fact that different seasons contain different information for the same kind of land cover (e.g., dipterocarp forest is deciduous in dry seasons and green in rainy seasons). Combining the image bands of the two seasons captures additional information on land cover.
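One way to see why a two-season composite can help is to treat it as a per-pixel concatenation of the band values from both dates. The sketch below assumes that the composite is a simple band stack of co-registered images, which the text does not state explicitly; `stack_seasons` and all reflectance values are hypothetical.

```python
import math


def stack_seasons(dry, rainy):
    """Concatenate per-pixel band vectors from two co-registered images.

    dry, rainy: lists of pixels, each pixel a list of band reflectances.
    """
    if len(dry) != len(rainy):
        raise ValueError("both images must cover the same pixels")
    return [d + r for d, r in zip(dry, rainy)]


# Hypothetical (red, NIR) reflectances for two pixels:
# pixel 0 is deciduous forest, pixel 1 is evergreen forest.
rainy = [[0.04, 0.45], [0.04, 0.44]]  # both canopies are green
dry = [[0.12, 0.25], [0.05, 0.42]]    # pixel 0 has shed its leaves

composite = stack_seasons(dry, rainy)

# Spectral separation between the two pixels, rainy season only vs. composite.
d_rainy = math.dist(rainy[0], rainy[1])
d_comp = math.dist(composite[0], composite[1])
print(f"rainy only: {d_rainy:.3f}, composite: {d_comp:.3f}")
```

In this made-up example the two pixels are nearly indistinguishable in the rainy season, and the dry season bands supply most of the separation in the stacked feature vector.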
Among all combinations of images, classification algorithms, and land classes, the smallest SE for area estimates was for the water surface class owing to its stability, whereas the largest SE was for the industrial plant class. In fact, because cultivation characteristics of industrial plants in the study area are quite complex with a variety of species such as coffee, pepper, and cashew, all with uneven ages, large SEs are inevitable. This complexity also explains the large difference among area estimates for this class, ranging from 1643.45 km² to 2223.87 km², or from 25% to 34% of the total area (Figure 10).
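To illustrate how an area estimate and its SE can be obtained from accuracy assessment data, the sketch below uses a widely applied stratified estimator in which the map classes serve as strata. This is an assumption made for illustration: the text does not state which estimator produced the reported SEs, and the function name `area_estimates`, the sample counts, and the mapped areas are all hypothetical.

```python
import math


def area_estimates(cm, mapped_area):
    """Estimate the area and standard error for each reference class.

    cm[i][j]: sample count with map class i and reference class j
              (each map class needs at least two samples).
    mapped_area[i]: area mapped as class i, in any consistent unit.
    Returns a list of (area, standard_error) tuples, one per reference class.
    """
    total_area = sum(mapped_area)
    weights = [a / total_area for a in mapped_area]  # mapped area proportions
    n_i = [sum(row) for row in cm]                   # samples per map class
    results = []
    for k in range(len(cm)):
        p_k = 0.0
        var_k = 0.0
        for i, row in enumerate(cm):
            p_ik = row[k] / n_i[i]  # share of stratum i that is truly class k
            p_k += weights[i] * p_ik
            var_k += weights[i] ** 2 * p_ik * (1 - p_ik) / (n_i[i] - 1)
        results.append((p_k * total_area, math.sqrt(var_k) * total_area))
    return results


# Hypothetical sample counts and mapped areas (km^2) for three classes.
cm = [
    [40, 8, 2],
    [6, 30, 4],
    [1, 2, 7],
]
mapped_area = [500.0, 400.0, 100.0]

for k, (area, se) in enumerate(area_estimates(cm, mapped_area)):
    print(f"class {k}: area = {area:.1f} km^2, SE = {se:.1f} km^2")
```

Under this estimator, a class that is rarely confused with others contributes little variance in every stratum, which is consistent with a stable class such as water having a small SE and a heterogeneous class having a large one.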
Although classification accuracies for vegetation classes were not particularly large, the classifications are still useful for complex tropical rain forests that have been disturbed to different degrees, such as those in the Central Highlands of Vietnam. The area estimates and spatial distributions of the LULC classes produced from the current study will assist local authorities, managers, and other stakeholders in decision-making and planning regarding forest land cover and uses. The usual practice is for the Forest Inventory and Planning Institute (FIPI) to conduct a forest inventory and construct a forest map every five years. Local forest units such as those in Dak Nong receive the maps and update them manually. However, the accuracy of the map has usually not been reported, and inaccuracies and errors have been detected only by local forest staff when patrolling in the field. Moreover, LULC changes, particularly for industrial land, occur quickly and easily owing to factors such as unstable crop markets and increasing population resulting from migration. Thus, the results of this study will not only provide authorities with updated information on current conditions, but will also serve as a recommendation regarding methods for proactively updating LULC maps in a timely and cost-effective manner. Specifically, timely and updated maps assist authorities by serving as a basis for formulating suitable solutions and policies for managing LULC, including forest cover.
This research showed the utility of combining multi-spectral Sentinel-2 band data from the dry and rainy seasons when mapping LULCs in Dak Nong Province, Vietnam. The greatest accuracies were achieved for the composite IMG 4 obtained by combining dry and rainy season image sets using the SVM classifier.
Among the classifiers, SVM produced the greatest accuracies, although RF, which had similar accuracies, was simpler to train and apply, and was less computationally intensive. For IMG 4, the greatest accuracies with SVM were OA = 80.3% and Kappa index = 0.813; for RF, the greatest accuracies were OA = 80.0% and Kappa index = 0.802. Thus, the combination of dry and rainy season imagery used with SVM or RF may offer new ways of classifying the complex tropical forest of Vietnam and similar areas. The area estimates and spatial distributions of the LULC classes produced from the current study will assist local authorities, managers, and other stakeholders in decision-making and planning regarding forest land cover and uses.
In conclusion, the two-season, multi-spectral Sentinel-2 images provided useful data for classifying LULC classes in areas with substantial fragmentation, especially for natural forests that have been disturbed and degraded at different levels, such as in Dak Nong, Vietnam. The SVM and RF machine learning algorithms were both accurate classifiers when used with the Sentinel-2 imagery. The methods developed for this study are applicable to boreal and temperate forests with different classes, in addition to the tropical forests of the current study. However, the model parameters always need to be re-estimated for each application.