High-Resolution Vegetation Mapping in Japan by Combining Sentinel-2 and Landsat 8 Based Multi-Temporal Datasets through Machine Learning and Cross-Validation Approach

This paper presents an evaluation of the multi-source satellite datasets such as Sentinel-2, Landsat-8, and Moderate Resolution Imaging Spectroradiometer (MODIS) with different spatial and temporal resolutions for nationwide vegetation mapping. The random forests based machine learning and cross-validation approach was applied for evaluating the performance of different datasets. Cross-validation with the rich-feature datasets—with a sample size of 390—showed that the MODIS datasets provided highest classification accuracy (Overall accuracy = 0.80, Kappa coefficient = 0.77) compared with Landsat 8 (Overall accuracy = 0.77, Kappa coefficient = 0.74) and Sentinel-2 (Overall accuracy = 0.66, Kappa coefficient = 0.61) datasets. As a result, temporally rich datasets were found to be crucial for the vegetation physiognomic classification. However, in the case of Landsat 8 or Sentinel-2 datasets, sample size could be increased excessively as around 9800 ground truth points could be prepared within 390 MODIS pixel-sized polygons. The increase in the sample size significantly enhanced the classification using Landsat-8 datasets (Overall accuracy = 0.86, Kappa coefficient = 0.84). However, Sentinel-2 datasets (Overall accuracy = 0.77, Kappa coefficient = 0.74) could not perform as much as the Landsat-8 datasets, possibly because of temporally limited datasets covered by the Sentinel-2 satellites so far. A combination of the Landsat-8 and Sentinel-2 datasets slightly improved the classification (Overall accuracy = 0.89, Kappa coefficient = 0.87) than using the Landsat 8 datasets separately. Regardless of the fact that Landsat 8 and Sentinel-2 datasets have lower temporal resolutions than MODIS datasets, they could enhance the classification of otherwise challenging vegetation physiognomic types due to possibility of training a wider variation of physiognomic types at 30 m resolution. Based on these findings, an up-to-date 30 m resolution vegetation map was generated by using Landsat 8 and Sentinel-2 datasets, which showed better accuracy than the existing map in Japan.


Introduction
Shifting of vegetation zones and changes in floristic compositions have been reported under the influence of climate change [1][2][3][4][5].Discrimination of vegetation physiognomic characteristics (structure-tree, shrub, herbaceous; or leaf-evergreen or deciduous, needle-leaved or broad-leaved) [6] using satellite remote sensing data is important for better understanding the vegetation responses to Land 2017, 6, 50 2 of 11 changes in environmental conditions with the possibility of tracking changes in vegetation structure and composition [7].
The distributions of Japanese vegetation are found in highly fragmented condition.In the case of moderate resolution (~500 m) satellite data such as Moderate Resolution Imaging Spectroradiometer (MODIS), most of the pixels are influenced by heterogeneous mixtures of the vegetation types.Therefore, due to mixed pixel effects, discrimination and mapping of the vegetation types is difficult using the MODIS data.Moreover, the resulted moderate resolution map misses many fragmented vegetation patches.On the contrary, higher resolution (~30 m) satellite data can represent many biophysical processes and characteristics of the land surface [36].Hence, the higher resolution mapping of vegetation types can play a tremendous role in the conservation and management of the vegetation.There has been some progress in the production of high-resolution land cover maps at national, regional, and global scales recently [37][38][39].Recent research using the MODIS data has reported that multi-temporal satellite datasets are crucial for discriminating the vegetation physiognomic types; whereas classification accuracy does not vary much with the classifier but the performance is very sensitive to input features and size of the ground truth data [7].
The main objectives of the research were to evaluate multi-source satellite data such as Sentinel-2, Landsat 8, and MODIS with different spatial and temporal resolutions for the purpose of nationwide vegetation mapping; and to generate an improved high-resolution (30 m) vegetation map in Japan through machine learning and cross-validation approach.The newly produced vegetation physiognomic map was compared to the existing high-resolution land cover map in Japan, and the improvements were discussed.

Preparation of Input Features
Standard terrain corrected (Level 1T) Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) scenes available from 2014 to 2016 over Japan were used.Quantized and calibrated scaled Digital Numbers (DNs) for each OLI and TIRS band delivered as 16-bit unsigned integers were converted into Top-Of-Atmosphere (TOA) spectral reflectance and brightness temperature (K) values using the rescaling coefficients found in the metadata file.Seven bands (blue, green, red, near infrared, mid infrared, shortwave infrared, and thermal infrared) datasets were extracted.The clouds were removed by using separate Quality Assessment (QA) band information available in the data.In addition, three spectral indices: Normalized Difference Vegetation Index (NDVI; [20]), Urban Built-up Index (UBI; [40]), and Superfine Water Index (SWI; [41]) were also calculated for each scene.The multi-temporal data consisting of spectral and spectral indices were composited by calculating multiple percentiles (0, 10,20,30,40,50,60,70,80,90,100) pixel by pixel.In this way, 110 layered datasets (features) were prepared for machine learning and cross-validation.This research deals not only with the classification of vegetation types but also with the discrimination of the vegetation types from non-vegetative cover types (urban, water, and barren).Therefore, the UBI and SWI were used for improving the discrimination between non-vegetation and vegetation types.
All eight-day cycle Nadir BRDF-Adjusted Reflectance (NBAR) data from the MODIS BRDF/Albedo (MCD43A4) product available at 500 m resolution over Japan from 2014 to 2016 were used.Six bands (red, near infrared, blue, green, mid infrared, and shortwave infrared) datasets were extracted which were cloud free.We calculated three spectral indices-NDVI, UBI, and SWI-using the NBAR for each scene.The eight-day datasets were composited using multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) method, and rich features (in total 99) were prepared for further analysis.
Nine spectral bands (blue, green, red, red edge1, red edge2, red edge3, near infrared, red edge4, and shortwave infrared) datasets from the Sentinel-2 Top-Of-Atmosphere (TOA) reflectance product were used.Cloudy pixels were masked out by using separate quality assessment band.Sentinel-2 data with spatial resolutions varying from 10 to 60 m were resampled into 30 m.All available scenes from 2015 to 2017 over Japan were used.Similar to the Landsat 8 and MODIS datasets, three spectral indices (NDVI, UBI, and SWI) were also calculated for each scene.Finally, the data were composited using multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) method.In total, 132 features were prepared for further analysis.

Machine Learning, Cross-Validation, and Mapping
The ground truth polygon data prepared in the previous studies [7,35] which were basically constructed through on-site field inspection were used in the research.This data comprise of 390 polygons for each class under the study.In the case of 30 m resolution Landsat 8 or Sentinel-2 datasets, sample size could be increased excessively as around 9800 ground truth points could be prepared within 390 polygons.This research deals with the classification of six vegetation physiognomic classes: evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, deciduous broadleaf forest, shrubs, and herbs; and two land cover types: arable, and non-vegetation (urban, water, and barren).The distribution of ground truth data (390 polygons and 9800 points) is demonstrated in Figure 1.
Random forests based supervised classification approach was adopted in the research because it can handle highly non-linear interactions and shows superior performance than other classifiers for the discrimination of vegetation physiognomic types [7,42].The random forests classifier uses bootstrap aggregating (bagging) to form an ensemble of trees by searching random subspaces from the given data (features) and the best splitting of the nodes by minimizing the correlation between the trees.The performance of different features was evaluated by using the 10-fold cross-validation method.The procedure of 10-fold cross-validation method is described as follows: First of all, the given features were divided into 10-fold of samples after shuffling them well.For each fold of samples, learning was carried out only for nine folds, whereas the remaining one fold was used for the validation.However, inside the cross-validation loop, the best scoring features (training) were scored using the random forests algorithm, and different sets of best features (5, 10, 25, 50, and 75) were obtained.Results up to 75 best features are presented because the performances were usually saturated after that.For each set of best features, the random forests model established with the training folds was used to predict the physiognomic classes with the validation fold.Finally, the predictions were collected from cross-validation loops.The same processing was conducted for each dataset (MODIS, Landsat 8, and Sentinel-2).The optimum parameters (no. of trees = 300, max.features = all) of the random forests classifier obtained from hit and trial method were used for each processing.The validation metrics: confusion matrix, overall accuracy, and kappa coefficient were used for assessing the performance.The overall accuracy-sum of true positives and true negatives divided by number of validation points-measures correctness of the classification.Kappa coefficient measures inter-rater agreement by counting the proportion of instances that predictions agreed with the validation data (observed agreement) after adjusting for the proportion of agreements taking place by chance (expected agreement) [43].The random forests based model by selecting best features from the combination of Landsat 8 and Sentinel-2 datasets was used for the production of seamless nationwide vegetation map.The resultant map was compared with the most recently available land use and cover map (Version 16.09, September 2016) in Japan (http://www.eorc.jaxa.jp/ALOS/lulc/jlulc_jpn.htm, accessed on May 05, 2017) using 9800 ground truth points data prepared in the research.For this comparison, 50 m resolution existing map was remapped according to the legends used in the research, and resampled into 30 m resolution.

Cross-Validation Results
The cross-validation results obtained from different datasets (MODIS, Landsat 8, and Sentinel-2) are summarized in Table 1.Different sets (5, 10, 25, 50, and 75) of important features given by the random forests classifier were examined.In the case of sample size of 390, performances of the Landsat 8 (Overall accuracy = 0.77, Kappa coefficient = 0.74) and Sentinel-2 (Overall accuracy = 0.66, Kappa coefficient = 0.61) datasets were significantly lower than the performance of MODIS (Overall accuracy = 0.80, Kappa coefficient = 0.77) datasets.However, availability of large size (9800) of samples could increase the classification accuracy (Overall accuracy = 0.86, Kappa coefficient = 0.84) significantly in the case of Landsat 8 datasets.The performance of the Sentinel-2 datasets was still lower (Overall accuracy = 0.77, Kappa coefficient = 0.74) than the performance of Landsat 8 datasets.Combination of Landsat 8 and Sentinel-2 datasets slightly improved the classification in the case of 9800 samples (Overall accuracy = 0.89, Kappa coefficient = 0.87) or 390 samples (Overall accuracy = 0.78, Kappa coefficient = 0.75) than using the Landsat 8 datasets separately.Confusion matrices computed with different datasets and ground truth sample sizes are plotted in Figures 2 and 3.The discrimination between inter-class physiognomic types are well demonstrated by the confusion matrices.

Cross-Validation Results
The cross-validation results obtained from different datasets (MODIS, Landsat 8, and Sentinel-2) are summarized in Table 1.Different sets (5, 10, 25, 50, and 75) of important features given by the random forests classifier were examined.In the case of sample size of 390, performances of the Landsat 8 (Overall accuracy = 0.77, Kappa coefficient = 0.74) and Sentinel-2 (Overall accuracy = 0.66, Kappa coefficient = 0.61) datasets were significantly lower than the performance of MODIS (Overall accuracy = 0.80, Kappa coefficient = 0.77) datasets.However, availability of large size (9800) of samples could increase the classification accuracy (Overall accuracy = 0.86, Kappa coefficient = 0.84) significantly in the case of Landsat 8 datasets.The performance of the Sentinel-2 datasets was still lower (Overall accuracy = 0.77, Kappa coefficient = 0.74) than the performance of Landsat 8 datasets.Combination of Landsat 8 and Sentinel-2 datasets slightly improved the classification in the case of 9800 samples (Overall accuracy = 0.89, Kappa coefficient = 0.87) or 390 samples (Overall accuracy = 0.78, Kappa coefficient = 0.75) than using the Landsat 8 datasets separately.

Production of Vegetation Map
The random forests model established by selecting best-performing features from the combination of Landsat 8 and Sentinel-2 datasets was used for the production of nationwide vegetation map.The resultant seamless 30 m resolution map is displayed in Figure 4.
The seamless vegetation map produced in the research was compared to the existing map with reference to the ground truth data prepared in the research.Due to variation in the definitions of corresponding legends between two maps, only four types of forests (evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, and deciduous broadleaf forest) and nonvegetation (merge of urban, water, and barren) type were used for the comparison.The resultant vegetation physiognomic map was superior to the existing map (Overall accuracy = 0.72, Kappa coefficient = 0.66).The low accuracy of the existing map may be due to limited temporal information carried out by the multi-spectral data, insufficient size of the ground truth samples, and satellite datasets of long time span (2006-2011).

Production of Vegetation Map
The random forests model established by selecting best-performing features from the combination of Landsat 8 and Sentinel-2 datasets was used for the production of nationwide vegetation map.The resultant seamless 30 m resolution map is displayed in Figure 4.
The seamless vegetation map produced in the research was compared to the existing map with reference to the ground truth data prepared in the research.Due to variation in the definitions of corresponding legends between two maps, only four types of forests (evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, and deciduous broadleaf forest) and non-vegetation (merge of urban, water, and barren) type were used for the comparison.The resultant vegetation physiognomic map was superior to the existing map (Overall accuracy = 0.72, Kappa coefficient = 0.66).The low accuracy of the existing map may be due to limited temporal information carried out by the multi-spectral data, insufficient size of the ground truth samples, and satellite datasets of long time span (2006-2011).

Discussion and Conclusions
The MODIS Land Cover Type product (MCD12Q1) is one of the most recently available global land cover product from which vegetation physiognomic information can be obtained.However, in terms of the mapping of vegetation physiognomic types, poor performance of the MCD12Q1 product has been reported in Japan [35].On the other hand, visual interpretation techniques have been used for the nationwide vegetation mappings.For example, Harada et al. [44] prepared the MODIS based vegetation map of year 2001 in Japan by manually labeling the clusters obtained from the Iterative Self-Organizing Data Analysis Technique.Roy et al. [45] used on-screen visual screen technique for the preparation of land use and land cover database in India using medium-resolution Indian remote sensing satellite data.More recently, Sharma et al. [35] employed machine learning and automated classification approach for the production of nationwide vegetation physiognomic map in Japan using MODIS data.However, mapping of the vegetation physiognomic types by using the 500 m resolution MODIS datasets are affected by mixed pixel effect, and the resulting map misses distribution of many vegetation types that occurred in smaller patches.
Ground truth data are inevitable assets of machine learning and cross-validation approach.Vegetation types are found in highly fragmented condition in Japan, and thus preparing ground truth data even from a homogenous area of at least 500 m pixel-size is difficult.In the case of limited size of the ground truth samples (390), MODIS datasets provided best performance (Overall accuracy = 0.80, Kappa coefficient = 0.77) for the classification of vegetation physiognomic types.It should be because of higher temporal resolution covered by the MODIS datasets than by Landsat and Sentinel-

Discussion and Conclusions
The MODIS Land Cover Type product (MCD12Q1) is one of the most recently available global land cover product from which vegetation physiognomic information can be obtained.However, in terms of the mapping of vegetation physiognomic types, poor performance of the MCD12Q1 product has been reported in Japan [35].On the other hand, visual interpretation techniques have been used for the nationwide vegetation mappings.For example, Harada et al. [44] prepared the MODIS based vegetation map of year 2001 in Japan by manually labeling the clusters obtained from the Iterative Self-Organizing Data Analysis Technique.Roy et al. [45] used on-screen visual screen technique for the preparation of land use and land cover database in India using medium-resolution Indian remote sensing satellite data.More recently, Sharma et al. [35] employed machine learning and automated classification approach for the production of nationwide vegetation physiognomic map in Japan using MODIS data.However, mapping of the vegetation physiognomic types by using the 500 m resolution MODIS datasets are affected by mixed pixel effect, and the resulting map misses distribution of many vegetation types that occurred in smaller patches.
Ground truth data are inevitable assets of machine learning and cross-validation approach.Vegetation types are found in highly fragmented condition in Japan, and thus preparing ground truth data even from a homogenous area of at least 500 m pixel-size is difficult.In the case of limited size of the ground truth samples (390), MODIS datasets provided best performance (Overall accuracy = 0.80, Kappa coefficient = 0.77) for the classification of vegetation physiognomic types.It should be because of higher temporal resolution covered by the MODIS datasets than by Landsat and Sentinel-2 datasets.Consequently, temporally rich datasets were found to be crucial for vegetation physiognomic mapping.On the contrary, sample size could be increased excessively as around Land 2017, 6, 50 9 of 11 9800 ground truth points could be prepared within 390 MODIS pixel-sized polygons in the case of 30 m resolution datasets.The increase in the sample size significantly enhanced the Landsat-8 datasets based classification (Overall accuracy = 0.86, Kappa coefficient = 0.84).However, Sentinel-2 datasets (Overall accuracy = 0.77, Kappa coefficient = 0.74) could not contribute as much as the Landsat-8 datasets.This is possibly due to temporally limited datasets covered by Sentinel-2 satellites so far.Regardless of the fact that Landsat-8 and Sentinel-2 datasets have lower temporal resolutions than MODIS datasets, they could enhance the classification of otherwise challenging vegetation physiognomic types due to possibility of training a wider variation of vegetation types at 30 m resolution.
In this research, classification accuracies were assessed by 10-fold cross-validation method using the random forests classifier.Random forests is a powerful algorithm, which is increasingly used in the classification of remote sensing images [46,47].Random forests can handle highly non-linear interactions and classification boundaries of the multi-temporal spectral data.Random forests consists of a large number of deep trees, where each tree is trained on the bagged data using the random selection of features, so gaining a full understanding of how the features interact non-linearly by examining each individual tree is difficult.However, the spectral indices (NDVI, UBI, SWI) used in the research were in the top list of important features as retrieved from the inbuilt feature importance function of the random forests algorithm.
Based on findings from the evaluation of multi-source satellite datasets, a nationwide 30 m resolution vegetation map was produced through machine learning and cross-validation approach.The resultant map showed higher accuracy than the existing map of Japan.Nevertheless, bottlenecks present in the discrimination of some classes especially between the coniferous and broadleaved forests require further improvements in future.With the additional temporal coverage by Sentinel-2 satellites in near future, further improvements in the classification and mapping of vegetation types are expected.The availability of temporally rich datasets from Sentinel-2 satellites would also be useful for much higher resolution (10 m) vegetation mapping activities on a large scale.

Figure 1 .
Figure 1.The distribution of ground truth data (polygons and points inside polygons) prepared in the research: (a) display of the national territory; (b) zoomed in over the black polygon region in (a) showing the density of the reference points.The national boundary is based on the Global Administrative Areas database (GADM) Version 2.8, November 2015.The random forests based model by selecting best features from the combination of Landsat 8 and Sentinel-2 datasets was used for the production of seamless nationwide vegetation map.The resultant map was compared with the most recently available land use and cover map (Version 16.09, September 2016) in Japan (http://www.eorc.jaxa.jp/ALOS/lulc/jlulc_jpn.htm, accessed on May 05, 2017) using 9800 ground truth points data prepared in the research.For this comparison, 50 m resolution existing map was remapped according to the legends used in the research, and resampled into 30 m resolution.

Figure 1 .
Figure 1.The distribution of ground truth data (polygons and points inside polygons) prepared in the research: (a) display of the national territory; (b) zoomed in over the black polygon region in (a) showing the density of the reference points.The national boundary is based on the Global Administrative Areas database (GADM) Version 2.8, November 2015.

Figure 4 .
Figure 4. Nationwide vegetation physiognomic map produced through the research: (a) Display over the national territory; (b) Zoomed in over the black polygon region in (a).The national boundary is based on Global Administrative Areas database (GADM) version 2.8, November 2015.

Figure 4 .
Figure 4. Nationwide vegetation physiognomic map produced through the research: (a) Display over the national territory; (b) Zoomed in over the black polygon region in (a).The national boundary is based on Global Administrative Areas database (GADM) version 2.8, November 2015.

Table 1 .
Cross-validation results with different datasets and ground truth sample size (s).The computed overall accuracy (Kappa coefficient) with different sets of important features (f) are shown.Highest accuracy results obtained among different sets of best features (f) are highlighted.

Table 1 .
Cross-validation results with different datasets and ground truth sample size (s).The computed overall accuracy (Kappa coefficient) with different sets of important features (f) are shown.Highest accuracy results obtained among different sets of best features (f) are highlighted.