New JAXA High-Resolution Land Use/Land Cover Map for Vietnam Aiming for Natural Forest and Plantation Forest Monitoring

Highly detailed and accurate forest maps are important for various applications including forest monitoring, forestry policy, climate change, and biodiversity loss. This study demonstrates a comprehensive and geographically transferable approach to produce a 12 category high-resolution land use/land cover (LULC) map over mainland Vietnam in 2016 by remote sensing data. The map included several natural forest categories (evergreen broadleaf, deciduous (mostly deciduous broadleaf), and coniferous (mostly evergreen coniferous)) and one category representing all popular plantation forests in Vietnam such as acacia (Acacia mangium, Acacia auriculiformis, Acacia hybrid), eucalyptus (Eucalyptus globulus), rubber (Hevea brasiliensis), and others. The approach combined the advantages of various sensor data by integrating their posterior probabilities resulting from applying a probabilistic classifier (comprised of kernel density estimation and Bayesian inference) to each datum individually. By using different synthetic aperture radar (SAR) images (PALSAR-2/ScanSAR, PALSAR-2 mosaic, Sentinel-1), optical images (Sentinel-2, Landsat-8) and topography data (AW3D30), the resultant map achieved 85.6% for the overall accuracy. The major forest classes including evergreen broadleaf forests and plantation forests had a user’s accuracy and producer’s accuracy ranging from 86.0% to 95.3%. Our map identified 9.55 × 106 ha (±0.16 × 106 ha) of natural forests and 3.89 × 106 ha (±0.11 × 106 ha) of plantation forests over mainland Vietnam, which were close to the Vietnamese government’s statistics (with differences of less than 8%). This study’s result provides a reliable input/reference to support forestry policy and land sciences in Vietnam.


Introduction
Globally, 4.7 million ha/year of net forest loss and a 3 million ha/year increase in planted forests have been reported between 2010 to 2020, according to Food and Agriculture Organization of the United Nations (FAO) [1]. The forest dynamics have been attributed to main drivers such as land use/land cover (LULC) conversion for commodity production, forestry, agriculture shifting, wildfire, and urbanization [2]. These land modifications have led to negative environmental impacts including the increase in greenhouse gas emission [3], disruption of the water cycle [4,5], increase in soil erosion [6], biodiversity loss [7,8], and disruption of local livelihoods [9]. To alleviate these issues, managements and policies have been proposed and implemented from local to global scales. In such frameworks, the importance of detailed and accurate measurements of forest types have been emphasized. The mapping of natural forests and plantation forests can provide a more accurate input for actual deforestation detection, carbon assessment, climate change modelling, and biodiversity loss detection.
Previous studies present various approaches to distinguish plantation forests and natural forests using remote sensing data. One approach is to make use of the phenological characteristics of specific plantation types based on time-series imagery. Typical studies using this approach have adopted the difference in spectral characteristics of the defoliation period of deciduous rubber to separate it from natural forests [10][11][12][13][14]. Another approach is to use image processing techniques for enhancing the separability of plantation forests and natural forests. Specifically, texture analysis [15] has been used to differentiate the unique spatial pattern of the targeted plantation, e.g., oil palm fields, from surrounding land covers [16,17]. This technique is usually applicable to high-resolution and cloudless images. Another technique is using remote sensing indices to amplify the differences in the spectral information of the two forest types. A number of optical vegetation indices have proved their effectiveness in mapping plantation forests and natural forests [12,[18][19][20][21][22] such as the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), soil-adjusted total vegetation index (SATVI), normalized difference tillage index (NDTI), and land surface water index (LSWI). In addition, the development of new radar satellites has facilitated the increasing involvement of radar data in LULC mapping and forest monitoring [23]. L-band synthetic aperture radar (SAR) images have been widely used in forest mapping at the global scale as well as at the regional scale [24][25][26] since it can provide cloud-free structural information sensitive to forest cover. Radar indices such as the polarization ratio, normalized difference index (NDI), and the NL index have also been used in forest type mapping [20,25,[27][28][29][30][31][32][33]. Another comprehensive approach that recent studies have frequently demonstrated is the combination of optical and SAR imagery [12,[19][20][21]27,28,34,35]. The synergy of structural information from SAR images and biophysical information from optical images has improved the accuracy and map detail. In recent studies, this integration of different data types has been carried out by data fusion at a feature level, where the optical and SAR images are stacked into a single dataset as a classification input.
One of the biggest persistent challenges of large-scale forest type mapping is distinguishing between plantation evergreen broadleaf forests (EBFs) and natural EBFs. The spectral characteristics of these two forest types are mostly identical. Several approaches have been attempted to address this challenge. For instance, the detection of acacia has been based on very high-resolution satellite images such as GeoEye [36], airborne photos [37], or complex radiative transfer models [38]. These approaches are suitable for a small scale. Additionally, several studies exploited the fluctuation in the spectral indices during short-rotation cycles to detect a short-rotation eucalyptus [39,40]. This method required inter-annual time-series data for at least 5 to 7 years to sufficiently cover at least one rotation.
Vietnam is a tropical country with about 42% forest cover (as of 2019), in which planted forests account for 29.5% (4.3 million ha) of the total forest area (14.6 million ha) [41]. Acacia is the most popular plantation tree in Vietnam with over 1.1 million ha (as of 2014 [42]) and it has been showing a substantial expansion in southeast Asia during the last three decades [43]. Besides, Vietnam has nearly 1 million ha of rubber and about 500 thousand ha of eucalyptus [44]. Therefore, there is a need for constructing a higher comprehensive mapping approach that is applicable for (1) different plantation forest types; (2) various ranges of geographic regions; (3) short time coverage (e.g., annual mapping).
The advantage of forest mapping studies nowadays is the development of data-rich sources.
The availability of open satellite data such as Global PALSAR-2/PALSAR (phased array type L-band synthetic aperture radar) mosaic [45], Sentinel constellation [46], and new Landsat satellites [47], with advances in their specifications, has offered mapping forest types at a large scale and in high detail. Besides, open cloud-computing platforms for remote sensing, such as Google Earth Engine, have supported data curation in high performance at any spatial scope.
Japan Aerospace Exploration Agency (JAXA) has released several LULC map products for Vietnam using remote sensing data [48]. The previous high-resolution LULC products of Vietnam (version 16.09, 18.07, 18.09) [49,50] with a 10 or 15 m resolution have presented the changes in land cover over about one decade (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017). The most recent LULC products (version 19.08) showed annual changes in land cover from 2015 to 2018 in a 50 m resolution [26]. In terms of forest mapping, these above-mentioned LULC products categorized forests in Vietnam as one class whereas the advantage of this research is to produce a 10 m resolution LULC map with many forest classes. Highly detailed forest maps for Vietnam would be of importance in supporting the REDD+ (reducing emissions from deforestation and forest degradation), in which Vietnam is one of the first countries to have participated.
The goal of this paper was to produce a high-resolution LULC map that distinguishes natural forests and plantation forests (acacia, eucalyptus, rubber, and others) across different geographic regions in mainland Vietnam in 2016 using remote sensing data. The specific objectives were to: (1) construct a comprehensive mapping approach that classifies various types of natural forests and plantation forests for the entirety of Vietnam; (2) evaluate the classification performance of satellite data including PALSAR-2/ScanSAR, PALSAR-2 mosaic, Sentinel-2, Sentinel-1 and Landsat 8; (3) [54]. The accurate measurement of natural forest and plantation forest dynamics would support an investigation about potential telecouplings in plantation forest lands in Vietnam such as the expansion of farm-based plantation of smallholders [55] or vulnerable households under the plantation forest expansion [56].

Study Area
Vietnam is located in the southeast Asia region and extends from 8 • 37 N to 23 • 23 N and from 102 • 11 E to 109 • 27 E (Figure 1a). This paper is focused on mainland Vietnam, which excluded isolated and insufficiently observed islands. Mainland Vietnam accounts for about 332,000 km 2 and consists of a diversity of ecological landscapes and climate regions. Northern Vietnam's climate is characterized by monsoonal features with four distinct seasons whereas, in the south, the climate is tropical monsoon with two seasons (rainy and dry). The topography of Vietnam is featured by mountains and hills (75% of the total area), deltas and coastal areas (25%). In Vietnam, forest ecosystems are considered abundant [57] and rich in biodiversity [58]. In terms of the foliage characteristics, natural forests in Vietnam can be categorized into three main types: (1) an evergreen broadleaf forest, which is major and widely distributed (occupies more than 85% of the total natural forest area); (2) a deciduous (mostly deciduous broadleaf) forest, mainly distributed in the Central Highlands and South Central Coast; (3) coniferous (mostly evergreen coniferous) forest, mainly distributed in the Central Highlands (Figure 1b-d). Vietnam is considered one of the few developing countries where forest transition, from the net forest loss to net forest gain, is recorded [59,60]. After a period of deforestation from the Vietnam War in the early 1980s, the forest cover in Vietnam has increased due to many reasons such as decollectivization in the Doi Moi economic reform (1986), the allocation of forestry land to households, development of timber markets [61] and national reforestation programs [62,63]. The increase in forest cover in Vietnam mainly comes from the expansion of plantation forests. Plantation forests in Vietnam are dominated by acacia, rubber, and eucalyptus (Figure 1e-g). Besides, other plantation trees, such as pine (Pinus), Manglietia conifera Dandy, Melaleuca cajuputi, etc., occupy minor areas.

Mapping Approach
To distinguish natural forests and plantation forests over many geographic regions in Vietnam, our mapping approach was designed based on the differences between the two forest types. These differences were supposed to be independent foliage characteristics since both plantation and natural forests in Vietnam contain several tree types (EBF, deciduous, coniferous). Here, we formed a hypothesis relating to the differences between natural forests and plantation forests as follows: (1) Vertical structure: plantation forests demonstrate uniform structures such as the lattice pattern of rubber in Figure 1e or the dense pattern of acacia in Figure 1f. Trees of plantation forests have the same height, same diameter at breast height (DBH), and same density. On the contrary, natural forests present nonuniform structures such as a random pattern of canopies as seen in Figure 1b. Natural forests are structurally very diverse with a high degree of variation in height classes, DBH and densities. This difference can be recognized by the combinations of L-band SAR polarizations of horizontal transmit-horizontal receive (HH) and horizontal transmit-vertical receive (HV); (2) Biophysical features and water content: the chlorophyll concentration, greenness, brightness, moisture, etc. of plantation forest canopies are different from those of natural forest canopies. This difference can be recognized using water and vegetation indices derived from optical images; (3) Topography: plantation forests are mostly cultivated in low slope lands while natural forests grow in higher slope lands. This difference can be recognized by topography data.
Based on the hypothesis, we constructed a comprehensive mapping approach that satisfied the following criteria:

•
Integrating information from various sensors to recognize all the differences between the two forest types; • Making use of the time-series data to capture information on the phenology, which are essential for the classification of deciduous forests, rice, and other agricultural crops; • Making use of the spectral indices and radar indices aside from the original bands and polarizations. As the indices are less sensitive to atmospheric noise and viewing geometry, they can support the geographical transferability.
To integrate various pieces of information from the multitemporal images of multiple sensors, our approach adopted a nonparametric probabilistic classifier for each of the sensor data, then the integration was implemented by multiplying the resultant probability of each sensor's data classification results. The classifier was based on the kernel density estimation (KDE) with a Bayesian inference [64]. In previous studies, this method was applied for mapping high-resolution land use and land cover products for the entirety of Japan [64,65] and Vietnam [26,49,50]. The resulting products were published as open land cover data by JAXA [66].
A brief explanation of the classification process is as follows. The classifier simulated the probability density function of each land cover category based on the KDE technique with the training data as the input (Equation (1)). The selected kernel type in this study was the Gaussian function (Equation (2)). The posterior probability values corresponding to each of the land cover types were then estimated at the pixel-wise level for the entire feature space based on the Bayes theorem (Equation (3)). For the integration of multitemporal images of the sensors, the joint posterior probability of each of the land cover categories was estimated by multiplying the component posterior probability values of each single-date image from each sensor's data (Equation (4)). Finally, the predicted land cover type of each pixel was assigned by choosing the one having the highest probability (Equation (5)). The engineering of the classification process was conducted using the Saclass software version 17.06 developed by JAXA and the University of Tsukuba (Hashimoto et al. (2014) [64]).
where C k is the k-th category (k = 1, 2, . . . , M, where M is the number of categories; here, M = 12); p(C k ) is the prior probability of category C k ; p(x | C k ) was estimated based on the training data using the KDE (Equation (1)); where 1 ≤ n ≤ N k ; N k is the number of training data in category C k ; h d is the bandwidth of the KDE, defined by Scott's rule in Equation (6); σ d is the standard deviation of the d-th component training data of category C k ; p(C k | x) is the posterior probability; p (C k ) is the joint posterior probability;Ĉ is the predicted land cover type. This study conducted mapping for each 1 • × 1 • tiles ( Figure 1a) individually, instead of mapping the Vietnam area as a whole. The overall workflow of establishing the high-resolution LULC map for Vietnam is illustrated in Figure 2. The preprocessing step for the input data is discussed in Section 2.3.

PALSAR-2/ScanSAR Time-Series Data and Single-Temporal PALSAR-2 Mosaic
The PALSAR-2 is a radar imaging sensor onboard the Advanced Land Observing Satellite 2 (ALOS-2) operated by JAXA. ScanSAR is the name of the image product acquired in the ScanSAR observation mode of PALSAR-2. The ScanSAR mode has a wide swath (350 km) which is suitable for LULC studies and forest monitoring at large scales. The sensitivity of the L-band (1270 MHz) PALSAR/PALSAR-2 images to the forest covers has been exploited in many forest monitoring studies [23,24,26]. The revisit time of the ALOS-2 satellite, which considers one "cycle" as 14 days. The ScanSAR data are provided in 1 • × 1 • tile mosaic images merged by images acquired in one cycle. Therefore, one ScanSAR image can contain the observation data of several days within a 14-day period. These time-mixed images are acceptable for LULC mapping since the phenology would not have a significant discrepancy in such 14-day intervals. The ScanSAR data are a high level-processing product with geometric corrections and terrain corrections including the application of radiometric terrain flattening [67]. ScanSAR images have a 50 m resolution with a georeference of the geographic latitude/longitude WGS84 coordinate system. Each of the images had two polarization data which are HH (horizontal transmit-horizontal receive) and HV (horizontal transmit-vertical receive). The number of ScanSAR images over the Vietnam area in one year depends on the PALSAR-2 Basic Observation Scenario [68]. For this study, the available ScanSAR images in 2016 for the entire Vietnam were 450 1 • × 1 • tile images. The Vietnam area was covered by 60 1 • × 1 • tiles ( Figure 1a). Therefore, the number of images in one coverage tile could be 7 or 8 scenes. The ScanSAR data used in this study were provided by JAXA under the research agreement of "Generation of the Precise Land Cover Map".
Another PALSAR-2 data used in this study was the single-temporal PALSAR-2 mosaic. This yearly product is open data with global coverage provided by JAXA. Originated from Fine Beam Dual Mode (FBD), the PALSAR-2 mosaic product has a spatial resolution of 25 m [69]. In this study, PALSAR-2 mosaic was used as a complement for ScanSAR data since its resolution is higher than the ScanSAR's resolution. This product was provided in 1 • × 1 • tiles with HH and HV polarizations.
For preprocessing, PALSAR-2/ScanSAR time-series data and single-temporal PALSAR-2 mosaic had the same procedure. First, the gamma-0 radar backscatter (unit in decibel (dB)) was derived from the digital number (DN) by Equation (7) [69] (CF is the calibration factor with a given value of −83.0 dB; is an ensemble averaging operator). The radar shadowing and layover pixels were masked by the enclosed mask files. A Lee filter [70] with a 5 × 5 moving window was applied to suppress the speckle noises in gamma-0 images. The radar indices were estimated and then stacked with the two original gamma-0 HH, HV images. The radar indices included ratio (RAT, Equation (8)), normalized difference index (NDI, Equation (9)), and NL index (NLI, Equation (10)). These indices have been proved to be effective in the classification of natural forests and plantation forests [27,28].

Sentinel-1 Time-Series Data
Sentinel-1 satellites provide C-band SAR images (at 5.045 GHz) with an incidence angle between 20 and 45 • . The revisit time of Sentinel-1 constellation (Sentinel-1A and Sentinel-1B) is 6 days (12 days for each individual Sentinel-1). This study used Sentinel-1 images in interferometric wide swath (IW) mode with a swath width of 250 km, and a resolution of 10 m. All the images were acquired in descending observation, with two polarizations including VV (vertical transmit-vertical receive) and VH (vertical transmit-horizontal receive). The Sentinel-1 data were collected from the Google Earth Engine (GEE) cloud platform. These data were provided in the ground range detected (GRD) level 1 product [71] with additional preprocessing including thermal noise removal, radiometric calibration, and a terrain correction using SRTM 30 or ASTER DEM [72]. Since the radiometric terrain flattening was not applied, the pixel value of the data presented a sigma-0 value (unit in decibel (dB)). We generated 8 composite images during 2016, and each composite image was created by taking the pixel-wise median values of all images during each 1.5-month interval using a median reducer function of the GEE [73]. These data were then trimmed into 1 • × 1 • tiles (Figure 1a). Similar to the PALSAR-2 data, a Lee filter with a 5 × 5 moving window was applied to remove the speckle noise in sigma-0 Sentinel-1 images. The radar indices of the Sentinel-1 data, which were analogous to those of the PALSAR-2 data, were then estimated (Equations (11)-(13)). The utilization of these radar indices was based on an assumption that the C-band SAR indices can support and improve the classification of a low biomass plantation and natural vegetation, e.g., crops and natural grass or shrubs [74].

Sentinel-2 and Landsat-8 Data
The Sentinel-2A MultiSpectral Instrument (MSI) data have been available from June 2015 with a high spatial resolution (10 m, 20 m, 60 m), high temporal resolution (10-day revisit time) and 290 km swath width. The Sentinel-2 data during 2016 used in this study were collected from the GEE. The data archived in the GEE were in level 1C, which included cloud masking flags for dense clouds and cirrus clouds [75]. After applying cloud masking, 8 median composite images were generated with 1.5-month intervals (same as the composite method applied for the Sentinel-1 images). Even after cloud masking by a quality assessment (QA) file and conducting the median composite, non-negligible cloud covers were still present in several images. To remove these cloud covers, we adjusted the too-bright threshold value of visible spectral bands and masked out bright pixels [76]. The images were then trimmed to 1 • × 1 • tiles ( Figure 1a). All 13 multispectral bands of the Sentinel-2 were used in this study (Table 1).

AW3D30 Topographic Data
AW3D30 is a 30 m resolution digital surface model (DSM) product generated from the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM), which is an optical sensor onboard the Advanced Land Observing Satellite (ALOS). The purpose of using this auxiliary in this study was to improve the separability of natural landscapes from human-impacted landscapes. For example, plantation vegetations are grown mostly in a specific range of altitudes and slopes due to the ecological requirements of plantation species, and to facilitate human accessibility. AW3D30 was provided open and freely by JAXA [90]. The DSM data were downloaded in 1 • × 1 • tiles, and the slope images were then estimated from the DSM images.
All the preprocessing steps were implemented by Python, Shell Script, Geospatial Data Abstraction Library (GDAL; available online: https://gdal.org/), and Quantum Geographic Information System 3.4 (QGIS; available online: https://qgis.org/en/site/). After preprocessing steps, all the data were organized as in Table 2.

Reference Data and Classification Scheme
The land use/land cover category system in this study was established following criteria from the Land Cover Classification System of the FAO [91] and systematically inherited from previous JAXA LULC products [26,49,50] (Table 3). All forest classes complied with the following conditions: areas must be at least 0.5 ha, the tree height must be higher than 5 m and the canopy cover must be at least 10% [92]. The training data were collected by visual interpretation using high-resolution Google Earth images, Sentinel-2, and Landsat 8 images, with the support of GPS photos taken from field surveys. The training data were created in point-sample form, with each training data point representing a homogeneous area of the targeted land cover type. We selected 179,970 training data points in total for 12 LULC categories (Figure 3a). The number of training data points for each category are shown in Figure 3a in square brackets. This study used GPS photos taken from many field surveys in Vietnam in 2015, 2016, 2018, 2019, and 2020 (Figure 3b), which were designed to serve not only this study but also the production of previous Vietnam LULC maps [26,49,50]. The ground-truth photos supported the identification of land cover types from remote sensing imagery. The GPS photos were taken using GPS cameras or automatic time-lapse GPS cameras (Gopro). The number of GPS photos in each of the field surveys are described in the square brackets of Figure 3b.
The validation data were designed and created following the method described by Olofsson et al. (2013) [93]. First, we used the stratified random sampling method to create sampling points. The land cover types of the resultant map were used as strata. Depending on the area values of each stratum, we allocated different numbers of sampling points (Supplementary Table S1). The strata having areas greater than 2 million ha were sampled by 300 points whereas the strata having areas smaller than 2 million ha were sampled by 150 points. The number of samples in each stratum is given in square brackets of Figure 3c. The total number of samples was 2700 points. The stratified random sampling process was conducted using the AcATaMa plugin of QGIS software. The sampling points were then labeled with land cover types by visual interpretation using Sentinel-2 and Landsat 8 images in 2016. The labeled points were then used as the validation dataset to create the error matrix of the resultant LULC map (Supplementary Table S1). The overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and their standard errors were estimated following the methods of Olofsson et al. (2013) and Olofsson et al. (2014) [93,94].
As the classification was conducted for each 1 • × 1 • tile individually (mentioned in Section 2.2), the training data used for the targeting tile was taken from all the training data sampling points located within that tile and its 8 surrounding tiles. This practice could avoid the edge mismatching issue which may occur in the resultant LULC map tiles after classification.

Evaluation of the Classification Performance of Satellite Data
The receiver operating characteristic (ROC) [95] was employed to evaluate the classification performance of input data. The ROC curves illustrated the graphs of true positive rates (TPR) (Equation (24)) versus the false positive rates (FPR) (Equation (25)) at different classification thresholds. The thresholds were determined by the predicted probabilities of positive classes. Generally, ROC curves indicate the trade-off between the TPRs and FPRs of a classification model. A model having ROC curves closer to the top-left corner would indicate a better performance. The area under the curve (AUC), which is estimated by the two-dimensional area underneath the ROC curve (Equation (26)), is the numerical measurement of the ROC curve. A higher value of AUC implies a better classification performance.
This study generated ROC curves for all 12 land cover classes for each input data and the model with all the input data integrated (Figure 4). A total of 2700 validation data points were used for establishing the ROC curves.
The ROC plots in Figure 4 showed that the integration of all the sensor data produced the best overall classification performance. This can be interpreted from the Figure that the ROC curves of the integration model ( Figure 4h) were closer to the top-left corner than those of other individual inputs (Figure 4a-g). The grass/shrub class was the most challenging since its AUC showed the lowest values in all models compared to that of other classes. The comparison of the ROC plots of PALSAR-2/ScanSAR and PALSAR-2 mosaic showed that the ScanSAR time-series data display a better classification performance on forest classes than PALSAR-2 mosaic single-temporal data. This was proved in Figure 4e,f, which shows that the AUC values of the EBF, coniferous, deciduous, plantation of ScanSAR data (0.87, 0.93, 0.70, 0.79, respectively) are mostly higher than those of PALSAR mosaic data (0.83, 0.90, 0.71, 0.76, respectively). On the other hand, the C-band Sentinel-1 ROC plot showed a lower classification performance in forest classes in comparison to all other sensor data. Figure 5 showed the AUC values of all the input data for each of the forest classes. As can be seen in Figure 5, the integration of all the input data showed the best performance, which was reflected by the highest AUC values in all the forest classes. As for the major forest classes, including EBFs and plantation forests (Figure 5a,b), the PALSAR-2 and Sentinel-2 data indicated higher AUC values than the Landsat 8 and Sentinel-1 data. In terms of deciduous forests (Figure 5c), the AUC values of optical data (Sentinel-2, Landsat 8) are higher than those of SAR data (Sentinel-1, PALSAR-2). A possible reason would be the high sensitivity of the time-series optical data to the phenological characteristics of deciduous forests (seasonal leaf drop). The trend of the AUC values of coniferous forests was mostly similar to that of EBFs, with higher AUC values for PALSAR-2 and lower AUC values for other data (Figure 5d). Another salient point is that in most of the cases, the time-series PALSAR-2/ScanSAR data have higher AUC values than the single-temporal PALSAR-2 mosaic data, and the Sentinel-2 data have higher AUC values than the Landsat 8 data. This proof emphasizes the advantages of the L-band time-series ScanSAR and high-resolution optical time-series Sentinel-2 data in forest type mapping in Vietnam.

The Resultant Vietnam LULC Map 2016 and Its Comparison to Other LULC Products
The resultant 10-m resolution LULC map of Vietnam in 2016 is shown in Figure 6a. The overall accuracy of the map was 85.6%. The evergreen broadleaf forest class, which accounts for more than 85% of the total natural forest area, showed a high UA and PA (95.3% and 89.6%, respectively). The other natural forest classes, including deciduous forests and coniferous forests, had accuracies lower than 80%. The plantation forest class also had a high UA and PA (86.0% and 88.0%, respectively). The error of forest classes mainly came from a misclassification between the forest types. Another source of confusion came from grass/shrubs versus EBFs and deciduous forests versus crops. The mangrove cover class demonstrated high classification accuracies with both the UA and PA reaching more than 92%. The detailed error matrix is provided in Supplementary Table S1. Figure 6b-g showed an acacia plantation forest area in this study's map, Google Earth imagery, and several open land cover products. The acacia plantation is a field site that was close to the site described in Figure 1f. As can be seen in the Figures, the acacia plantation forest areas were detected in this study's map while in the ESA-CCI map, MODIS land cover map (MCD12Q1), JAXA LULC map v19.08 and JAXA Forest/Non-Forest map, the acacia forest cover is mostly presented as cropland or nonforest areas. In the FROM-GLC 2017v1 map (Figure 6f), some of the acacia areas were detected as forests. However, the FROM-GLC map does not distinguish natural forests and plantation forests.  Figure 7 shows the comparison between this study and Vietnam national statistical data of the total forest area, the natural forest area, and the plantation forest area. For the natural forest area in this study, we merged the classes including EBFs, deciduous forests, and coniferous forests into one class which represented the natural forests in Vietnam and measured their area and the standard error. The total forest area in this study was measured by summing up all the forest classes in the resultant map. Overall, this study area's numbers were lower than those of the national statistics with minor differences (smaller than 8% in all the three forest areas). The total forest area, the natural forest area and the plantation forest area in this study were about 13.50 × 10 6 ha (±0.20 × 10 6 ha), 9.55 × 10 6 ha (±0.16 × 10 6 ha) and 3.89 × 10 6 ha (±0.11 × 10 6 ha), respectively, whereas those of the national statistical data were about 14.38 × 10 6 ha, 10.24 × 10 6 ha, and 4.14 × 10 6 ha, respectively [41]. The differences in forest areas between this map and the national statistics may come from the error of this map and the difference in the definition of land use/land cover used by each of the sources. The national statistics counted all the land areas assigned as forest land, even when there was no tree stands at that time. For this map, forest areas were estimated based on actual forest covers detected by the remote sensing data.

Comparison between This Study's Map and the Vietnam Forest Resource (VFR) Map 2016
We compared the resultant LULC map with the forest map of the government of Vietnam, namely the Vietnam Forest Resource Map (2016). The VFR Map was established under a Vietnam national forest inventory program and it has been opened to the public [54]. The category system of the VFR Map included 17 classes. The source of the VFR Map also provided a simplified forest map with three classes including natural forests (a merged class from many natural forest classes), plantation forests, and bare land. We estimated 10-km resolution fractional cover maps of the natural forests and plantation forests of this study's map and the simplified VFR Map (Figure 8a-d). The natural forest class in this study's map was merged from the EBFs, deciduous forests, and coniferous forests. We then estimated the fractional difference maps (absolute value of the subtraction) between this study and the VFR Map (Figure 8e,f) to examine the degree of consistency between our result and the official map. Figure 8 showed that this study's maps and the VFR Maps had a good consistency at a 10-km resolution. Most of the area over mainland Vietnam had a fractional cover difference of less than 10% in terms of both natural forests and plantation forests. However, several areas revealed a substantial fractional difference, shown as zoom-in sites in Figure 8e,f. We compared our LULC map with the VFR Map at the three sites to explore the causes of these differences (Figure 9).  In the following discussion, we assumed that the VFR Map properly reflected the reality of the forest status of Vietnam in 2016. As for Site 1 (Yen Bai, Ha Giang provinces), many plantation forest areas in this study's map (Figure 9a) were presented as mixed wood and bamboo forest in the VFR Map (Figure 9b). The natural bamboo forests have some similar characteristics to plantation forests such as a low biomass, low moisture content, and bamboo trees having similar trunk sizes. Therefore, the natural bamboo forests are likely to be confused with plantation forests in the classification process.
For Site 2 (Lang Son, Quang Ninh provinces), areas shown as natural forest (EBF) in this study map (Figure 9c) were presented as plantation forests in the VFR Map (Figure 9d). Interestingly, there was a nature-oriented reforestation project aided by the government of Germany from 1995 to 2005 conducted in this region (Bac Giang, Quang Ninh, Lang Son provinces) [96]. According to Sturm and Apel (2006) [96], the project aimed for planting near-natural forests with multiple functions of forest ecosystems and attempting to harmonize the ecological, economic, and social requirements of the region. Therefore, the structures and characteristics of these planted forests in Site 2 were identical to natural forests. This similarity caused the misclassification of the near-natural planted forests, in which they were detected as natural EBFs in this study's map.
For Site 3 (Binh Duong, Binh Phuoc provinces), the rubber plantation forests in this study's map ( Figure 9e) were not presented in the VFR Map (Figure 9f). The cause of this disagreement came from the fact that the Vietnamese government's forest data excluded rubber plantation. In Vietnam's national statistics, rubber plantation areas have been considered as agricultural lands (perennial industrial crops) [97]. However, according to the FAO [92] and the country report of Vietnam for the FAO Forest Resources Assessment 2015 [98], the definition of forest considered rubber to constitute plantation forests. This study followed the FAO definitions to consider rubber as a plantation forest.

Advantages and Potential Applications of the Resultant LULC Map
In terms of methodology, the advantage of this map was demonstrated by the comprehensiveness of the mapping approach. Based on the hypothesis on the ultimate differences between natural forests and plantation forests, the comprehensiveness of the approach consisted of (1) combining the advantages of various sensors using the probabilistic integration at the decision level; (2) using time-series data; (3) using remote sensing indices. The robustness of this approach was proved by its good performance in the ROC analysis.
In terms of product quality, the advantage of this study's LULC map was highlighted by a comparison with previous JAXA LULC maps of Vietnam and other LULC products. This map has a higher resolution (10 m), which is currently the best resolution among JAXA LULC products and other global LULC products. While all the previous JAXA LULC maps of Vietnam categorized forests as one class, this study's map have four forest classes, which can offer better support for forest monitoring initiatives such as REDD+, land use, land use change, and forestry (LULUCF), the national forest inventory, etc. In addition, this map was one of the first regional maps that distinguished natural forests and plantation forests, in which plantation forests were comprised of various foliage types including evergreen broadleaf, deciduous, and coniferous foliage. This salient point would open a potential direction for mapping natural forests and planted forests at a global scale to improve the accuracy of carbon emission assessments, the detection of deforestation, and assessments of biodiversity loss. As for the classification accuracy, the major forest classes, such as EBFs and plantation forests, had high accuracies (PA, UA) ranging from 86.0% to 95.3%. In terms of mapping the forest/nonforest cover, this study's map (merging all forest classes as one and merging all nonforest classes as the other), showed a very higher accuracy (95.7%, Supplementary Table S2). In addition, the higher-accuracy mangrove cover class in this map can facilitate studies associated with blue carbon assessments [99] or mangrove ecosystem services [100,101].

Limitations and Challenges of This Study's Map
The comparison between our map and the VFR Map revealed several limitations and challenges in distinguishing natural forests and plantation forests. The misclassification of the natural bamboo forests was one of the challenges as bamboo forests and plantation forests had similar features in our classification design. Bamboo forests have been considered one important carbon pool because of their strong carbon sequestration capacity [102]. According to Du et al. (2018) [102], bamboo forests in Vietnam occupied 1.018 × 10 6 ha, which accounted for about 3.33% of the total bamboo forest area over the world and about 10% of the total natural forest area of Vietnam. Therefore, bamboo forests should be considered as a future target for our LULC map's improvement and update. Potential solutions to overcome the misclassification of the bamboo forests would be using the leaf area index (LAI) information or adopting the approach of global bamboo forest mapping by Du et al. (2018).
Another challenge was the misclassification of the nature-like planted forests. The establishment of these planted forests involved the use of indigenous species and the incorporation of natural succession [96]. Hence, the characteristics of these forests and natural forests were mostly identical. Therefore, the detection of the nature-like planted forests would require human knowledge-embedded information as ancillary data along with Earth observation imagery.
The misclassification of the grass/shrub class indicated another limitation of this study's map. The error matrix (Supplementary Table S1) showed that the grass/shrub class was mostly confused with EBFs, orchard/crop mosaics, and barren land. The reason for this misclassification may come from the imperfect training data of the grass/shrub class, which were possibly created by an inaccurate visual interpretation. In some cases, the interpreters were not highly confident in identifying whether an area in a satellite image is a rich shrubland or a degraded forest. Similarly, sparse grasslands and bare lands were sometimes difficult to distinguish from each other by satellite image interpretation. The quality of the training data can be improved by increasing the ground-truth data or consulting various sources such as available open reference data.

Future Research Directions
In terms of time-series scalability, this study can be replicated for periods of historical high-resolution remote sensing observations such as ALOS/AVNIR2 and ALOS/PALSAR (2006-2011). Thus, such an expected high-resolution time-series LULC map would provide the long-term dynamics of natural forests and plantation forests in Vietnam. These forest dynamics, in turn, would offer various research directions for identifying the links between forest resources and social-economic issues. For example, previous studies on plantation forests in Vietnam indicated that the expansion of plantation forests in Vietnam had a strong relationship with the expansion of plantation farm-based smallholders [55] and the vulnerability of resource-poor local people [56].
Another future work would be to improve the detail level and accuracy of future LULC maps. Recent studies on carbon emissions from land cover changes in Vietnam, like Avitabile et al. (2016) [103], and the REDD+ readiness status [104] have called for highly reliable and detailed LULC and forest maps. Carbon emission assessments would be more accurate if the input LULC maps contained sub-categories of natural forests such as bamboo, rich forests, medium forests, poor forests, and regrowth forests. Similarly, separating plantation forests into single tree species sub-categories such as rubber, acacia, eucalyptus, pine, etc., would be of importance. Although the VFR Map has a high level of detail, it is expensive because its establishment relies on satellite imagery with a visual interpretation of numerous forestry staff and specialists, and it has been carried out at five-year intervals. Therefore, the expected improved LULC product would provide more timely and objective data to policymakers and the land science community.
Future research initiatives will have more opportunities for improvement since, along with current data archives, we will have the opportunity to use new data from future satellites such as Landsat 9 (optical, 2021), NISAR (L-band SAR, 2021), BIOMASS (P-band SAR, 2021), Tandem-L (L-band SAR, 2023), ALOS-3 (optical, future), ALOS-4 (L-band SAR, future) and a new Vietnamese satellite, LOTUSat-1 (X-band SAR, 2023). Moreover, the continued evolution of advanced deep learning algorithms would provide new classification methods for improving LULC map production.

Conclusions
In this study, we demonstrated a comprehensive approach to create a high-resolution LULC map which aimed at distinguishing natural forests and plantation forests (acacia, rubber, eucalyptus, and others) in Vietnam. This approach comprised of integrating various data products from multiple sensors (PALSAR-2/ScanSAR, PALSAR-2 mosaic, Sentinel-1, Sentinel-2, Landsat 8, AW3D30) at the decision level, after applying the probabilistic classifier for each data, taking advantage of time-series data and remote sensing spectral indices. A ROC analysis showed that the integration of all the sensor data displayed a better classification performance than any individual sensor's data. In addition, the PALSAR-2/ScanSAR and Sentinel-2 data showed better classification performances in forest classes compared to the data products from other sensors.
The high-resolution LULC map over mainland Vietnam in 2016 was produced with 12 classes and an overall accuracy of 85.6%. The major forest classes such as EBFs and plantation forests reached high accuracies of more than 86%. The comparison of the natural and plantation forest fractional covers between this study's map with Vietnam's national statistics and the Vietnam Forest Resources Maps 2016 showed good agreement except for the limitation of the bamboo forest misclassification (confused with plantation forests). This study confirmed the feasibility of producing highly detailed and accurate forest type maps in the forthcoming big data era of Earth observation. There is also a further need to reproduce the resultant map in historical periods to have spatially explicit insights into the constraints of plantation forest dynamics with specific socio-economic and policy backgrounds in Vietnam.
For applications and other interests, the high-resolution LULC map of mainland Vietnam in 2016 can be downloaded from the JAXA/EORC website as follows: https://www.eorc.jaxa.jp/ALOS/en/lulc/ lulc_vnm_v2006.htm.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-4292/12/17/2707/s1, Table S1: Error matrix of the resultant LULC map, Table S2: Error matrix of the resultant LULC map in terms of Forest/Non-Forest. The reader can also download the land use/land cover map of mainland Vietnam on the JAXA website (https://www.eorc.jaxa.jp/ALOS/en/lulc/lulc_vnm_v2006.htm).