Land 2017, 6(3), 50; doi:10.3390/land6030050

Letter
High-Resolution Vegetation Mapping in Japan by Combining Sentinel-2 and Landsat 8 Based Multi-Temporal Datasets through Machine Learning and Cross-Validation Approach
Ram C. Sharma 1,*, Keitarou Hara 1 and Ryutaro Tateishi 2
1 Department of Informatics, Tokyo University of Information Sciences, 4-1 Onaridai, Wakaba-ku, Chiba 265-8501, Japan
2 Center for Environmental Remote Sensing (CEReS), Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan
* Correspondence: Tel./Fax: +81-43-236-1408
Received: 30 May 2017 / Accepted: 20 July 2017 / Published: 26 July 2017

Abstract

This paper evaluates multi-source satellite datasets (Sentinel-2, Landsat 8, and Moderate Resolution Imaging Spectroradiometer (MODIS)) with different spatial and temporal resolutions for nationwide vegetation mapping. A random forests based machine learning and cross-validation approach was applied to evaluate the performance of the different datasets. Cross-validation with the feature-rich datasets and a sample size of 390 showed that the MODIS datasets provided the highest classification accuracy (Overall accuracy = 0.80, Kappa coefficient = 0.77) compared with the Landsat 8 (Overall accuracy = 0.77, Kappa coefficient = 0.74) and Sentinel-2 (Overall accuracy = 0.66, Kappa coefficient = 0.61) datasets. This result indicates that temporally rich datasets are crucial for vegetation physiognomic classification. However, in the case of the Landsat 8 or Sentinel-2 datasets, the sample size could be increased substantially, as around 9800 ground truth points could be prepared within the 390 MODIS pixel-sized polygons. The increase in sample size considerably enhanced the classification using the Landsat 8 datasets (Overall accuracy = 0.86, Kappa coefficient = 0.84). However, the Sentinel-2 datasets (Overall accuracy = 0.77, Kappa coefficient = 0.74) did not perform as well as the Landsat 8 datasets, possibly because of the limited temporal coverage of the Sentinel-2 satellites so far. A combination of the Landsat 8 and Sentinel-2 datasets slightly improved the classification (Overall accuracy = 0.89, Kappa coefficient = 0.87) compared with using the Landsat 8 datasets alone. Although the Landsat 8 and Sentinel-2 datasets have lower temporal resolutions than the MODIS datasets, they could enhance the classification of otherwise challenging vegetation physiognomic types because a wider variation of physiognomic types can be trained at 30 m resolution. Based on these findings, an up-to-date 30 m resolution vegetation map was generated using the Landsat 8 and Sentinel-2 datasets, which showed higher accuracy than the existing map of Japan.
Keywords:
vegetation mapping; physiognomy; Sentinel-2; Landsat 8; MODIS; machine learning; cross-validation; Japan

1. Introduction

Shifts in vegetation zones and changes in floristic composition have been reported under the influence of climate change [1,2,3,4,5]. Discrimination of vegetation physiognomic characteristics (structure: tree, shrub, or herbaceous; leaf: evergreen or deciduous, needle-leaved or broad-leaved) [6] using satellite remote sensing data is important for better understanding the responses of vegetation to changes in environmental conditions, as it makes it possible to track changes in vegetation structure and composition [7].
Different types of remote sensing data (multi-spectral, hyper-spectral, radar, or LiDAR) obtained from satellites or aircraft have been exploited for the detection and mapping of vegetation at local or large scales [8,9,10,11,12,13,14,15,16,17,18,19]. Major techniques used for the detection, classification, and mapping of vegetation using remote sensing imagery are vegetation indices [20,21], spectral mixture analysis [22], temporal image-fusion [23,24], texture based measures [25], and supervised classification using classifiers such as maximum likelihood [26], random forests [27,28], decision trees [29], support vector machines [30], fuzzy learning [31], and neural networks [32,33,34]. Nevertheless, the performance of existing large-scale land cover maps is limited in the discrimination of vegetation physiognomic types, which is still a challenging field [35].
Vegetation in Japan is distributed in a highly fragmented condition. In the case of moderate resolution (~500 m) satellite data such as Moderate Resolution Imaging Spectroradiometer (MODIS) data, most of the pixels are influenced by heterogeneous mixtures of vegetation types. Because of these mixed pixel effects, discrimination and mapping of the vegetation types is difficult using the MODIS data. Moreover, the resulting moderate resolution map misses many fragmented vegetation patches. In contrast, higher resolution (~30 m) satellite data can represent many biophysical processes and characteristics of the land surface [36]. Hence, higher resolution mapping of vegetation types can play an important role in the conservation and management of vegetation. There has been some progress recently in the production of high-resolution land cover maps at national, regional, and global scales [37,38,39]. Recent research using the MODIS data has reported that multi-temporal satellite datasets are crucial for discriminating the vegetation physiognomic types, and that classification accuracy does not vary much with the classifier but is very sensitive to the input features and the size of the ground truth data [7].
The main objectives of the research were to evaluate multi-source satellite data (Sentinel-2, Landsat 8, and MODIS) with different spatial and temporal resolutions for the purpose of nationwide vegetation mapping, and to generate an improved high-resolution (30 m) vegetation map of Japan through a machine learning and cross-validation approach. The newly produced vegetation physiognomic map was compared with the existing high-resolution land cover map of Japan, and the improvements are discussed.

2. Methodology

2.1. Preparation of Input Features

Standard terrain corrected (Level 1T) Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) scenes available from 2014 to 2016 over Japan were used. The quantized and calibrated scaled Digital Numbers (DNs) of each OLI and TIRS band, delivered as 16-bit unsigned integers, were converted into Top-Of-Atmosphere (TOA) spectral reflectance and brightness temperature (K) values using the rescaling coefficients found in the metadata file. Datasets for seven bands (blue, green, red, near infrared, mid infrared, shortwave infrared, and thermal infrared) were extracted. Clouds were removed using the separate Quality Assessment (QA) band available in the data. In addition, three spectral indices, the Normalized Difference Vegetation Index (NDVI; [20]), Urban Built-up Index (UBI; [40]), and Superfine Water Index (SWI; [41]), were calculated for each scene. The multi-temporal data, consisting of spectral bands and spectral indices, were composited by calculating multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) pixel by pixel. In this way, 110 layers (features), that is, 10 variables × 11 percentiles, were prepared for machine learning and cross-validation. This research deals not only with the classification of vegetation types but also with the discrimination of the vegetation types from non-vegetative cover types (urban, water, and barren). Therefore, the UBI and SWI were used to improve the discrimination between non-vegetation and vegetation types.
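The DN rescaling described above can be sketched as follows. This is a minimal illustration of the standard linear rescaling documented for Landsat 8 Level-1 products, not the authors' code. The coefficient values are illustrative stand-ins for those read from a scene's metadata file, and applying the sun-elevation correction is an assumption, since the text does not say whether it was used.

```python
import math


def toa_reflectance(dn, mult, add, sun_elevation_deg=None):
    """Rescale an OLI digital number to TOA reflectance.

    mult and add are the per-band reflectance rescaling coefficients from
    the metadata file. The optional sun-elevation correction is an
    assumption here, not something stated in the text.
    """
    rho = mult * dn + add
    if sun_elevation_deg is not None:
        rho /= math.sin(math.radians(sun_elevation_deg))
    return rho


def brightness_temperature(dn, mult, add, k1, k2):
    """Rescale a TIRS digital number to radiance, then to temperature (K)."""
    radiance = mult * dn + add
    return k2 / math.log(k1 / radiance + 1.0)


# Illustrative coefficients (assumed values; real ones come from each
# scene's metadata file).
REFL_MULT, REFL_ADD = 2.0e-5, -0.1
RAD_MULT, RAD_ADD = 3.342e-4, 0.1
K1, K2 = 774.8853, 1321.0789

rho = toa_reflectance(10000, REFL_MULT, REFL_ADD, sun_elevation_deg=30.0)
temp_k = brightness_temperature(25000, RAD_MULT, RAD_ADD, K1, K2)
print(f"TOA reflectance: {rho:.3f}, brightness temperature: {temp_k:.1f} K")
```

In practice the same functions would be applied to whole raster bands rather than to single values.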
All eight-day cycle Nadir BRDF-Adjusted Reflectance (NBAR) data from the MODIS BRDF/Albedo (MCD43A4) product, available at 500 m resolution over Japan from 2014 to 2016, were used. Cloud-free datasets for six bands (red, near infrared, blue, green, mid infrared, and shortwave infrared) were extracted. Three spectral indices (NDVI, UBI, and SWI) were calculated from the NBAR data for each scene. The eight-day datasets were composited using the multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) method, and 99 features in total were prepared for further analysis.
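The percentile compositing can be illustrated for a single pixel. The sketch below computes NDVI from hypothetical near infrared and red reflectance pairs, skips cloud-masked observations, and reduces the time series to the 11 percentile features. The linear-interpolation percentile and the sample values are assumptions, as the text does not specify the interpolation rule.

```python
PERCENTILES = range(0, 101, 10)  # 0, 10, ..., 100


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def composite(series):
    """Reduce one pixel's time series to 11 percentile features.

    None marks a cloud-masked observation and is ignored.
    """
    valid = [v for v in series if v is not None]
    return [percentile(valid, q) for q in PERCENTILES]


# Hypothetical (nir, red) reflectance pairs for one pixel; None = cloudy.
observations = [(0.40, 0.08), None, (0.35, 0.10), (0.50, 0.05), (0.30, 0.12)]
ndvi_series = [ndvi(*obs) if obs is not None else None for obs in observations]
features = composite(ndvi_series)
print([round(f, 3) for f in features])
```

Repeating this for every band and index of every pixel yields the layered feature stack described above.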
Datasets for nine spectral bands (blue, green, red, red edge1, red edge2, red edge3, near infrared, red edge4, and shortwave infrared) from the Sentinel-2 Top-Of-Atmosphere (TOA) reflectance product were used. Cloudy pixels were masked out using the separate quality assessment band. Sentinel-2 data, with spatial resolutions varying from 10 to 60 m, were resampled to 30 m. All available scenes from 2015 to 2017 over Japan were used. As with the Landsat 8 and MODIS datasets, three spectral indices (NDVI, UBI, and SWI) were calculated for each scene. Finally, the data were composited using the multiple percentiles (0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) method. In total, 132 features were prepared for further analysis.
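The feature counts reported for the three sensors follow from the number of bands, the three spectral indices, and the 11 percentiles. A short check, using only the figures given in the text:

```python
N_PERCENTILES = len(range(0, 101, 10))  # 0, 10, ..., 100 gives 11 layers


def feature_count(n_bands, n_indices=3):
    """Bands plus spectral indices, each composited into 11 percentiles."""
    return (n_bands + n_indices) * N_PERCENTILES


counts = {
    "Landsat 8": feature_count(7),
    "MODIS": feature_count(6),
    "Sentinel-2": feature_count(9),
}
print(counts)  # {'Landsat 8': 110, 'MODIS': 99, 'Sentinel-2': 132}
```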

2.2. Machine Learning, Cross-Validation, and Mapping

The ground truth polygon data prepared in previous studies [7,35], which were constructed mainly through on-site field inspection, were used in the research. These data comprise 390 polygons covering each class under study. In the case of the 30 m resolution Landsat 8 or Sentinel-2 datasets, the sample size could be increased substantially, as around 9800 ground truth points could be prepared within the 390 polygons. This research deals with the classification of six vegetation physiognomic classes (evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, deciduous broadleaf forest, shrubs, and herbs) and two land cover types (arable and non-vegetation, the latter comprising urban, water, and barren). The distribution of the ground truth data (390 polygons and 9800 points) is shown in Figure 1.
A random forests based supervised classification approach was adopted in the research because it can handle highly non-linear interactions and has shown better performance than other classifiers for the discrimination of vegetation physiognomic types [7,42]. The random forests classifier uses bootstrap aggregating (bagging) to form an ensemble of trees, searching random subspaces of the given features for the best split at each node, which reduces the correlation between the trees. The performance of different features was evaluated using the 10-fold cross-validation method, which is described as follows:
First, the samples were shuffled and divided into 10 folds. In each iteration, learning was carried out on nine folds, and the remaining fold was used for validation. Inside the cross-validation loop, the features were scored on the training folds using the random forests algorithm, and different sets of best-scoring features (5, 10, 25, 50, and 75) were obtained. Results for up to 75 best features are presented because performance usually saturated beyond that. For each set of best features, the random forests model established with the training folds was used to predict the physiognomic classes of the validation fold. Finally, the predictions were collected from all cross-validation loops. The same processing was conducted for each dataset (MODIS, Landsat 8, and Sentinel-2). The optimum parameters of the random forests classifier (no. of trees = 300, max. features = all), obtained by trial and error, were used for each processing. Three validation metrics were used for assessing the performance: confusion matrix, overall accuracy, and kappa coefficient. The overall accuracy, the number of correctly classified validation points divided by the total number of validation points, measures the correctness of the classification. The kappa coefficient measures inter-rater agreement as the proportion of instances in which the predictions agreed with the validation data (observed agreement), adjusted for the proportion of agreement expected by chance (expected agreement) [43].
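The structure of this procedure, with feature selection carried out inside each cross-validation loop, can be sketched in plain Python. To keep the example self-contained, a nearest-centroid classifier and a simple class-separation score stand in for the random forests model and its feature importance; both stand-ins, all function names, and the toy data are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def class_means(X, y, columns):
    """Per-class mean of the selected feature columns."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(columns))
        for j, c in enumerate(columns):
            acc[j] += row[c]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def rank_features(X, y):
    """Stand-in importance score: spread of class means over value range."""
    n_features = len(X[0])
    means = class_means(X, y, range(n_features))
    scores = []
    for c in range(n_features):
        column = [row[c] for row in X]
        spread = (max(column) - min(column)) or 1.0
        centers = [m[c] for m in means.values()]
        scores.append((max(centers) - min(centers)) / spread)
    return sorted(range(n_features), key=lambda c: scores[c], reverse=True)


def predict(means, columns, row):
    """Nearest-centroid prediction (stand-in for the random forests model)."""
    def dist(center):
        return sum((row[c] - center[j]) ** 2 for j, c in enumerate(columns))
    return min(means, key=lambda lab: dist(means[lab]))


def cross_validate(X, y, n_best, k=10, seed=0):
    """Return (truth, predicted) labels pooled over all validation folds."""
    folds = k_fold_indices(len(X), k, seed)
    truth, predicted = [], []
    for i, valid_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_train = [X[j] for j in train_idx]
        y_train = [y[j] for j in train_idx]
        # Features are ranked on the training folds only, so the
        # validation fold never influences the selection.
        best = rank_features(X_train, y_train)[:n_best]
        means = class_means(X_train, y_train, best)
        for j in valid_idx:
            truth.append(y[j])
            predicted.append(predict(means, best, X[j]))
    return truth, predicted


# Toy data: one informative feature followed by two noise features.
rng = random.Random(1)
X, y = [], []
for label, center in (("forest", 0.8), ("herb", 0.4), ("urban", 0.1)):
    for _ in range(40):
        X.append([rng.gauss(center, 0.05), rng.random(), rng.random()])
        y.append(label)

truth, predicted = cross_validate(X, y, n_best=1)
accuracy = sum(t == p for t, p in zip(truth, predicted)) / len(truth)
print(f"pooled cross-validation accuracy: {accuracy:.2f}")
```

Ranking the features inside the loop keeps the validation fold fully unseen, so the pooled accuracy is not inflated by selection on the validation data.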
The random forests model built by selecting the best features from the combination of the Landsat 8 and Sentinel-2 datasets was used for the production of a seamless nationwide vegetation map. The resultant map was compared with the most recently available land use and cover map of Japan (Version 16.09, September 2016; http://www.eorc.jaxa.jp/ALOS/lulc/jlulc_jpn.htm, accessed on 5 May 2017) using the 9800 ground truth points prepared in the research. For this comparison, the existing 50 m resolution map was remapped according to the legend used in the research and resampled to 30 m resolution.

3. Results

3.1. Cross-Validation Results

The cross-validation results obtained from the different datasets (MODIS, Landsat 8, and Sentinel-2) are summarized in Table 1. Different sets (5, 10, 25, 50, and 75) of important features given by the random forests classifier were examined. With a sample size of 390, the performances of the Landsat 8 (Overall accuracy = 0.77, Kappa coefficient = 0.74) and Sentinel-2 (Overall accuracy = 0.66, Kappa coefficient = 0.61) datasets were lower than the performance of the MODIS datasets (Overall accuracy = 0.80, Kappa coefficient = 0.77). However, the availability of a large number of samples (9800) considerably increased the classification accuracy of the Landsat 8 datasets (Overall accuracy = 0.86, Kappa coefficient = 0.84). The performance of the Sentinel-2 datasets was still lower (Overall accuracy = 0.77, Kappa coefficient = 0.74) than that of the Landsat 8 datasets. Combining the Landsat 8 and Sentinel-2 datasets slightly improved the classification, both with 9800 samples (Overall accuracy = 0.89, Kappa coefficient = 0.87) and with 390 samples (Overall accuracy = 0.78, Kappa coefficient = 0.75), compared with using the Landsat 8 datasets alone.
Confusion matrices computed with the different datasets and ground truth sample sizes are plotted in Figure 2 and Figure 3. The confusion matrices show how well the individual physiognomic types are discriminated from one another.
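For reference, overall accuracy and the kappa coefficient can both be derived from a confusion matrix. The following sketch uses a hypothetical two-class matrix, not values from this study, with rows as reference classes and columns as predicted classes.

```python
def confusion_matrix(truth, predicted, labels):
    """Rows are reference classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Correctly classified points (diagonal) divided by all points."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa(matrix):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix (not data from the study).
matrix = [[40, 10],
          [5, 45]]
print(f"overall accuracy = {overall_accuracy(matrix):.2f}")  # 0.85
print(f"kappa = {kappa(matrix):.2f}")  # 0.70
```

Here the expected agreement is 0.50, so the kappa coefficient (0.70) is lower than the overall accuracy (0.85).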

3.2. Production of Vegetation Map

The random forests model established by selecting the best-performing features from the combination of the Landsat 8 and Sentinel-2 datasets was used for the production of a nationwide vegetation map. The resultant seamless 30 m resolution map is displayed in Figure 4.
The seamless vegetation map produced in the research was compared with the existing map with reference to the ground truth data prepared in the research. Because the definitions of the corresponding legend classes differ between the two maps, only the four forest types (evergreen coniferous forest, evergreen broadleaf forest, deciduous coniferous forest, and deciduous broadleaf forest) and the non-vegetation type (urban, water, and barren merged) were used for the comparison. The resultant vegetation physiognomic map was superior to the existing map (Overall accuracy = 0.72, Kappa coefficient = 0.66). The low accuracy of the existing map may be due to the limited temporal information carried by the multi-spectral data, the insufficient size of the ground truth samples, and the long time span (2006–2011) of the satellite datasets used.
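The restriction to a common legend can be sketched as follows. The class names are taken from the text, but the label lists and the handling of map labels outside the common legend are assumptions: here a reference point is kept whenever its own class belongs to the common legend, and a map label outside that legend counts as a disagreement.

```python
FORESTS = {
    "evergreen coniferous forest",
    "evergreen broadleaf forest",
    "deciduous coniferous forest",
    "deciduous broadleaf forest",
}
NON_VEGETATION = {"urban", "water", "barren"}


def to_common_legend(label):
    """Map a class to the five-class comparison legend, or None if excluded."""
    if label in FORESTS:
        return label
    if label in NON_VEGETATION:
        return "non-vegetation"
    return None


def comparison_accuracy(reference, mapped):
    """Overall accuracy over reference points that fall in the common legend."""
    hits = total = 0
    for ref, m in zip(reference, mapped):
        ref_common = to_common_legend(ref)
        if ref_common is None:
            continue  # e.g. shrubs, herbs, arable: not compared
        total += 1
        hits += ref_common == to_common_legend(m)
    return hits / total


# Hypothetical reference labels and map labels at five ground truth points.
reference = ["urban", "water", "evergreen broadleaf forest",
             "shrubs", "deciduous broadleaf forest"]
mapped = ["water", "water", "evergreen broadleaf forest", "herbs", "herbs"]
print(comparison_accuracy(reference, mapped))  # 0.75
```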

4. Discussion and Conclusions

The MODIS Land Cover Type product (MCD12Q1) is one of the most recently available global land cover products from which vegetation physiognomic information can be obtained. However, poor performance of the MCD12Q1 product in mapping vegetation physiognomic types has been reported in Japan [35]. On the other hand, visual interpretation techniques have been used for nationwide vegetation mapping. For example, Harada et al. [44] prepared a MODIS based vegetation map of Japan for the year 2001 by manually labeling the clusters obtained from the Iterative Self-Organizing Data Analysis Technique. Roy et al. [45] used an on-screen visual interpretation technique to prepare a land use and land cover database for India using medium-resolution Indian remote sensing satellite data. More recently, Sharma et al. [35] employed a machine learning and automated classification approach to produce a nationwide vegetation physiognomic map of Japan using MODIS data. However, mapping of the vegetation physiognomic types using the 500 m resolution MODIS datasets is affected by the mixed pixel effect, and the resulting map misses many vegetation types that occur in smaller patches.
Ground truth data are indispensable to the machine learning and cross-validation approach. Vegetation types are highly fragmented in Japan, and thus preparing ground truth data from homogeneous areas of at least 500 m pixel size is difficult. With the limited number of ground truth samples (390), the MODIS datasets provided the best performance (Overall accuracy = 0.80, Kappa coefficient = 0.77) for the classification of vegetation physiognomic types. This is likely because the MODIS datasets have a higher temporal resolution than the Landsat 8 and Sentinel-2 datasets. Consequently, temporally rich datasets were found to be crucial for vegetation physiognomic mapping. In contrast, in the case of the 30 m resolution datasets, the sample size could be increased substantially, as around 9800 ground truth points could be prepared within the 390 MODIS pixel-sized polygons. The increase in sample size considerably enhanced the Landsat 8 based classification (Overall accuracy = 0.86, Kappa coefficient = 0.84). However, the Sentinel-2 datasets (Overall accuracy = 0.77, Kappa coefficient = 0.74) did not contribute as much as the Landsat 8 datasets. This is possibly due to the limited temporal coverage of the Sentinel-2 satellites so far. Although the Landsat 8 and Sentinel-2 datasets have lower temporal resolutions than the MODIS datasets, they could enhance the classification of otherwise challenging vegetation physiognomic types because a wider variation of vegetation types can be trained at 30 m resolution.
In this research, classification accuracies were assessed by the 10-fold cross-validation method using the random forests classifier. Random forests is a powerful algorithm that is increasingly used in the classification of remote sensing images [46,47]. It can handle highly non-linear interactions and classification boundaries in multi-temporal spectral data. A random forest consists of a large number of deep trees, each trained on bagged data using a random selection of features, so it is difficult to gain a full understanding of how the features interact non-linearly by examining each individual tree. However, the spectral indices (NDVI, UBI, SWI) used in the research ranked among the most important features according to the built-in feature importance function of the random forests algorithm.
Based on the findings from the evaluation of multi-source satellite datasets, a nationwide 30 m resolution vegetation map was produced through a machine learning and cross-validation approach. The resultant map showed higher accuracy than the existing map of Japan. Nevertheless, the discrimination of some classes, especially between coniferous and broadleaf forests, remains a bottleneck and requires further improvement. With the additional temporal coverage of the Sentinel-2 satellites in the near future, further improvements in the classification and mapping of vegetation types are expected. The availability of temporally rich datasets from the Sentinel-2 satellites would also be useful for much higher resolution (10 m) vegetation mapping on a large scale.

Acknowledgments

This research was supported by a JSPS (Japan Society for the Promotion of Science) grant-in-aid for scientific research (No. P17F17109). The MODIS data used in the research were available from the NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota. Landsat 8 data were available from the United States Geological Survey. Sentinel-2 data were available from the European Space Agency (ESA) Copernicus program.

Author Contributions

Ram C. Sharma designed the research, wrote the computer programs, performed the analyses, and wrote the manuscript. Keitarou Hara supervised the research, and Ryutaro Tateishi revised the manuscript. All authors contributed to and approved the final manuscript before submission.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ohba, H. The flora of Japan and the implication of global climatic change. J. Plant Res. 1994, 107, 85–89.
  2. Leonelli, G.; Pelfini, M.; di Cella, U.M.; Garavaglia, V. Climate warming and the recent treeline shift in the European Alps: The role of geomorphological factors in high-altitude sites. Ambio 2011, 40, 264–273.
  3. Kirdyanov, A.V.; Hagedorn, F.; Knorre, A.A.; Fedotova, E.V.; Vaganov, E.A.; Naurzbaev, M.M.; Moiseev, P.A.; Rigling, A. 20th century tree-line advance and vegetation changes along an altitudinal transect in the Putorana Mountains, northern Siberia. Boreas 2012, 41, 56–67.
  4. Büntgen, U.; Hellmann, L.; Tegel, W.; Normand, S.; Myers-Smith, I.; Kirdyanov, A.V.; Nievergelt, D.; Schweingruber, F.H. Temperature-induced recruitment pulses of Arctic dwarf shrub communities. J. Ecol. 2015, 103, 489–501.
  5. Seim, A.; Treydte, K.; Trouet, V.; Frank, D.; Fonti, P.; Tegel, W.; Panayotov, M.; Fernández-Donado, L.; Krusic, P.; Büntgen, U. Climate sensitivity of Mediterranean pine growth reveals distinct east-west dipole: East-West dipole in climate sensitivity of Mediterranean pines. Int. J. Clim. 2015, 35, 2503–2513.
  6. Beard, J.S. The Physiognomic Approach. In Classification of Plant Communities; Whittaker, R.H., Ed.; Springer: Dordrecht, The Netherlands, 1978; pp. 33–64.
  7. Sharma, R.C.; Hara, K.; Hirayama, H. A Machine Learning and Cross-Validation Approach for the Discrimination of Vegetation Physiognomic Types Using Satellite Based Multispectral and Multitemporal Data. Scientifica 2017, 2017, 8.
  8. Gitas, I.; Karydas, C.; Kazakis, G. Land cover mapping of Mediterranean landscapes, using SPOT4-Xi and IKONOS imagery-A preliminary investigation. Options Mediterr. Ser. B 2003, 2003, 27–41.
  9. Salovaara, K.J.; Thessler, S.; Malik, R.N.; Tuomisto, H. Classification of Amazonian primary rain forest vegetation using Landsat ETM+ satellite imagery. Remote Sens. Environ. 2005, 97, 39–51.
  10. Li, L.; Ustin, S.L.; Lay, M. Application of multiple endmember spectral mixture analysis (MESMA) to AVIRIS imagery for coastal salt marsh mapping: A case study in China Camp, CA, USA. Int. J. Remote Sens. 2005, 26, 5193–5207.
  11. Rosso, P.H.; Ustin, S.L.; Hastings, A. Mapping marshland vegetation of San Francisco Bay, California, using hyperspectral data. Int. J. Remote Sens. 2005, 26, 5169–5191.
  12. Helmer, E.H.; Ruzycki, T.S.; Benner, J.; Voggesser, S.M.; Scobie, B.P.; Park, C.; Fanning, D.W.; Ramnarine, S. Detailed maps of tropical forest types are within reach: Forest tree communities for Trinidad and Tobago mapped with multiseason Landsat and multiseason fine-resolution imagery. For. Ecol. Manag. 2012, 279, 147–166.
  13. Zweig, C.L.; Burgess, M.A.; Percival, H.F.; Kitchens, W.M. Use of Unmanned Aircraft Systems to Delineate Fine-Scale Wetland Vegetation Communities. Wetlands 2015, 35, 303–309.
  14. Su, Y.; Guo, Q.; Fry, D.L.; Collins, B.M.; Kelly, M.; Flanagan, J.P.; Battles, J.J. A Vegetation Mapping Strategy for Conifer Forests by Combining Airborne LiDAR Data and Aerial Imagery. Can. J. Remote Sens. 2016, 42, 1–15.
  15. Sankey, T.T.; McVay, J.; Swetnam, T.L.; McClaran, M.P.; Heilman, P.; Nichols, M. UAV hyperspectral and LiDAR data and their fusion for arid and semi-arid land vegetation monitoring. Remote Sens. Ecol. Conserv. 2017.
  16. Koch, M.; Schmid, T.; Reyes, M.; Gumuzzio, J. Evaluating Full Polarimetric C- and L-Band Data for Mapping Wetland Conditions in a Semi-Arid Environment in Central Spain. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1033–1044.
  17. Betbeder, J.; Rapinel, S.; Corpetti, T.; Pottier, E.; Corgne, S.; Hubert-Moy, L. Multitemporal classification of TerraSAR-X data for wetland vegetation mapping. J. Appl. Remote Sens. 2014, 8, 083648.
  18. Balzter, H.; Cole, B.; Thiel, C.; Schmullius, C. Mapping CORINE Land Cover from Sentinel-1A SAR and SRTM Digital Elevation Model Data using Random Forests. Remote Sens. 2015, 7, 14876–14898.
  19. Furtado, L.F.deA.; Silva, T.S.F.; Novo, E.M.L.deM. Dual-season and full-polarimetric C band SAR assessment for vegetation mapping in the Amazon várzea wetlands. Remote Sens. Environ. 2016, 174, 212–222.
  20. Rouse, J.; Haas, R.; Schell, J.; Deering, D. Monitoring Vegetation Systems in the Great Plains with ERTS. In Proceedings of the Third ERTS Symposium, Washington, DC, USA, 10–14 December 1974; Freden, S.C., Mercanti, E.P., Eds.; U.S. Govt. Printing Office: Washington DC, USA, 1974; Volume 351, p. 309.
  21. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150.
  22. Roberts, D.A.; Gardner, M.E.; Church, R.; Ustin, S.L.; Green, R.O. Optimum strategies for mapping vegetation using multiple-endmember spectral mixture models. Proc. SPIE 3118 1997, 108–119.
  23. Udelhoven, T. Long term data fusion for a dense time series analysis with MODIS and Landsat imagery in an Australian Savanna. J. Appl. Remote Sens. 2012, 6, 063512.
  24. Schmidt, M.; Lucas, R.; Bunting, P.; Verbesselt, J.; Armston, J. Multi-resolution time series imagery for forest disturbance and regrowth monitoring in Queensland, Australia. Remote Sens. Environ. 2015, 158, 156–168.
  25. Murray, H.; Lucieer, A.; Williams, R. Texture-based classification of sub-Antarctic vegetation communities on Heard Island. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 138–149.
  26. Stuart, N.; Barratt, T.; Place, C. Classifying the Neotropical savannas of Belize using remote sensing and ground survey. J. Biogeogr. 2006, 33, 476–490.
  27. Vanselow, K.; Samimi, C. Predictive Mapping of Dwarf Shrub Vegetation in an Arid High Mountain Ecosystem Using Remote Sensing and Random Forests. Remote Sens. 2014, 6, 6709–6726.
  28. Torbick, N.; Ledoux, L.; Salas, W.; Zhao, M. Regional Mapping of Plantation Extent Using Multisensor Imagery. Remote Sens. 2016, 8, 236.
  29. Wang, Z.; Wang, Q.; Zhao, L.; Wu, X.; Yue, G.; Zou, D.; Nan, Z.; Liu, G.; Pang, Q.; Fang, H.; et al. Mapping the vegetation distribution of the permafrost zone on the Qinghai-Tibet Plateau. J. Mt. Sci. 2016, 13, 1035–1046.
  30. Schwieder, M.; Leitão, P.J.; da Cunha Bustamante, M.M.; Ferreira, L.G.; Rabe, A.; Hostert, P. Mapping Brazilian savanna vegetation gradients with Landsat time series. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 361–370.
  31. Filippi, A.M.; Jensen, J.R. Fuzzy learning vector quantization for hyperspectral coastal vegetation classification. Remote Sens. Environ. 2006, 100, 512–530.
  32. Carpenter, G.A.; Gopal, S.; Macomber, S.; Martens, S.; Woodcock, C.E. A neural network method for mixture estimation for vegetation mapping. Remote Sens. Environ. 1999, 70, 138–152.
  33. Zhang, C.; Xie, Z. Combining object-based texture measures with a neural network for vegetation mapping in the Everglades from hyperspectral imagery. Remote Sens. Environ. 2012, 124, 310–320.
  34. Antropov, O.; Rauste, Y.; Astola, H.; Praks, J.; Häme, T.; Hallikainen, M.T. Land cover and soil type mapping from spaceborne PolSAR data at L-band with probabilistic neural network. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5256–5270.
  35. Sharma, R.C.; Hara, K.; Hirayama, H.; Harada, I.; Hasegawa, D.; Tomita, M.; Geol Park, J.; Asanuma, I.; Short, K.M.; Hara, M.; et al. Production of Multi-Features Driven Nationwide Vegetation Physiognomic Map and Comparison to MODIS Land Cover Type Product. Adv. Remote Sens. 2017, 6, 54–65.
  36. Sharma, R.; Tateishi, R.; Hara, K.; Iizuka, K. Production of the Japan 30-m Land Cover Map of 2013–2015 Using a Random Forests-Based Feature Optimization Approach. Remote Sens. 2016, 8, 429.
  37. Homer, C.G.; Dewitz, J.A.; Yang, L.; Jin, S.; Danielson, P.; Xian, G.; Coulston, J.; Herold, N.D.; Wickham, J.; Megown, K. Completion of the 2011 National Land Cover Database for the conterminous United States-Representing a decade of land cover change information. Photogram. Eng. Remote Sens. 2015, 81, 345–354.
  38. Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M.; et al. Global land cover mapping at 30m resolution: A POK-based operational approach. ISPRS J. Photogram. Remote Sens. 2015, 103, 7–27.
  39. Pelletier, C.; Valero, S.; Inglada, J.; Champion, N.; Marais Sicre, C.; Dedieu, G. Effect of Training Class Label Noise on Classification Performances for Land Cover Mapping with Satellite Image Time Series. Remote Sens. 2017, 9, 173.
  40. Sharma, R.C.; Tateishi, R.; Hara, K.; Gharechelou, S.; Iizuka, K. Global mapping of urban built-up areas of year 2014 by combining MODIS multispectral data with VIIRS nighttime light data. Int. J. Digit. Earth 2016, 9, 1004–1020.
  41. Sharma, R.; Tateishi, R.; Hara, K.; Nguyen, L. Developing Superfine Water Index (SWI) for Global Water Cover Mapping Using MODIS Data. Remote Sens. 2015, 7, 13807–13841.
  42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  43. Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 1960, 20, 37–46.
  44. Harada, I.; Hara, K.; Tomita, M.; Short, K.; Park, J. Monitoring Landscape Changes in Japan Using Classification of Modis Data Combined with a Landscape Transformation Sere (LTS) Model. J. Landsc. Ecol. 2015, 7.
  45. Roy, P.; Roy, A.; Joshi, P.; Kale, M.; Srivastava, V.; Srivastava, S.; Dwevidi, R.; Joshi, C.; Behera, M.; Meiyappan, P.; et al. Development of Decadal (1985–1995–2005) Land Use and Land Cover Database for India. Remote Sens. 2015, 7, 2401–2430.
  46. Hayes, M.M.; Miller, S.N.; Murphy, M.A. High-resolution landcover classification using Random Forest. Remote Sens. Lett. 2014, 5, 112–121.
  47. Sharma, R.; Tateishi, R.; Hara, K. A Biophysical Image Compositing Technique for the Global-Scale Extraction and Mapping of Barren Lands. ISPRS Int. J. Geo-Inf. 2016, 5, 225.
Figure 1. The distribution of ground truth data (polygons and points inside polygons) prepared in the research: (a) display of the national territory; (b) zoomed in over the black polygon region in (a) showing the density of the reference points. The national boundary is based on the Global Administrative Areas database (GADM) Version 2.8, November 2015.
Figure 2. Confusion matrices computed with different datasets in the cases when the ground truth sample sizes were 390. Only the highest accuracy results among different sets of best features (f) are plotted: (a) MODIS (f = 50); (b) Landsat 8 (f = 25); (c) Sentinel 2 (f = 50); (d) Landsat 8 + Sentinel 2 (f = 75).
Figure 3. Confusion matrices computed with different datasets in the cases when the ground truth sample sizes were 9800. Only the highest accuracy results among different sets of best features (f) are plotted: (a) Landsat 8 (f = 25); (b) Sentinel 2 (f = 50); (c) Landsat 8 + Sentinel 2 (f = 75).
Figure 4. Nationwide vegetation physiognomic map produced through the research: (a) Display over the national territory; (b) Zoomed in over the black polygon region in (a). The national boundary is based on Global Administrative Areas database (GADM) version 2.8, November 2015.
Table 1. Cross-validation results with different datasets and ground truth sample size (s). The computed overall accuracy (Kappa coefficient) with different sets of important features (f) are shown. Highest accuracy results obtained among different sets of best features (f) are highlighted.
Datasets                             f = 5        f = 10       f = 25       f = 50       f = 75
MODIS (s = 390)                      0.74 (0.71)  0.78 (0.75)  0.79 (0.76)  0.80 (0.77)  0.80 (0.77)
Landsat 8 (s = 390)                  0.68 (0.64)  0.76 (0.72)  0.77 (0.74)  0.77 (0.74)  0.77 (0.74)
Landsat 8 (s = 9800)                 0.73 (0.69)  0.83 (0.81)  0.86 (0.84)  0.86 (0.84)  0.86 (0.84)
Sentinel 2 (s = 390)                 0.54 (0.48)  0.61 (0.55)  0.63 (0.58)  0.66 (0.61)  0.66 (0.61)
Sentinel 2 (s = 9800)                0.61 (0.55)  0.71 (0.67)  0.76 (0.72)  0.77 (0.74)  0.77 (0.74)
Landsat 8 + Sentinel 2 (s = 390)     0.68 (0.64)  0.76 (0.72)  0.77 (0.73)  0.77 (0.73)  0.78 (0.75)
Landsat 8 + Sentinel 2 (s = 9800)    0.75 (0.71)  0.83 (0.81)  0.87 (0.85)  0.88 (0.86)  0.89 (0.87)
Land EISSN 2073-445X. Published by MDPI AG, Basel, Switzerland.