Production of the Japan 30-m Land Cover Map of 2013–2015 Using a Random Forests-Based Feature Optimization Approach

Achieving more timely, accurate and transparent information on the distribution and dynamics of the world’s land cover is essential to understanding the fundamental characteristics, processes and threats associated with human-nature-climate interactions. Higher resolution (~30–50 m) land cover mapping is expected to advance the understanding of the multi-dimensional interactions of the human-nature-climate system with the potentiality of representing most of the biophysical processes and characteristics of the land surface. However, mapping at 30-m resolution is complicated with existing manual techniques, due to the laborious procedures involved with the analysis and interpretation of huge volumes of satellite data. To cope with this problem, an automated technique was explored for the production of a high resolution land cover map at a national scale. The automated technique consists of the construction of a reference library by the optimum combination of the spectral, textural and topographic features and predicting the results using the optimum random forests model. The feature-rich reference library-driven automated technique was used to produce the Japan 30-m resolution land cover (JpLC-30) map of 2013–2015. The JpLC-30 map consists of seven major land cover types: water bodies, deciduous forests, evergreen forests, croplands, bare lands, built-up areas and herbaceous. The resultant JpLC-30 map was compared to the existing 50-m resolution JAXA High Resolution Land-Use and Land-Cover (JHR LULC) map with reference to Google EarthTM images. The JpLC-30 map provides more accurate and up-to-date land cover information than the JHR LULC map. This research recommends an effective utilization of the spectral, textural and topographic information to increase the accuracy of automated land cover mapping.


Introduction
Land cover and land use change is a paramount global environmental issue. Achieving more timely, accurate and transparent information on the distribution and dynamics of the world's land cover is essential to understand the fundamental characteristics, processes and threats associated with human-nature-climate interactions, including, among others, biogeochemical and hydrological cycles, biodiversity and genetic variations, land degradation and environmental pollution, climate

Processing of Satellite Data
All of the~5800 standard terrain corrected (Level 1T) Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) scenes of 2013-2015 available from the United States Geological Survey (USGS) over Japan were processed. Landsat-8 has been monitoring global land surface since its launch on 11 February 2013 with a standard 16-day repeat cycle [46,47]. Quantized and calibrated scaled Digital Numbers (DNs) for each OLI and TIRS band delivered as 16-bit unsigned integers were converted into Top-Of-Atmosphere (TOA) spectral reflectance and brightness temperature (K) values using the rescaling coefficients found in the metadata file. Out of nine OLS and two TIRS bands of the Landsat 8 data, seven bands (blue, green, red, near infrared, shortwave infrared and thermal infrared) were extracted. The clouds were removed by using separate Quality Assessment (QA) band information available in the Landsat 8 data. Void data are a great nuisance for pipelining the machine learning and prediction procedures. Hence, the percentile-based image compositing technique was preferred over time period-based (such as monthly) composting techniques to avoid the void data resulting from the latter technique. Using all multi-temporal data of 2013-2015, multiple percentiles, 0, 20, 40, 50, 60, 80 and 100, were calculated pixel by pixel for each band to capture the temporal information of the land cover. In addition, the Normalized Difference Vegetation Index (NDVI; [48]) was calculated for each scene, and similarly, multiple NDVI percentiles, 0, 20, 40, 50, 60, 80 and 100, were calculated pixel by pixel. The temporal distribution of the NDVI data ranging between´1 and 1 varies with the land cover type. Therefore, a percentile, for instance the 40th percentile NDVI (the statistical value below which 40 percent of the multi-temporal NDVI data are found), also varies with the cover types. Using maximum NDVI (100th percentile) images, 18 types of image textures [49,50] were also calculated. The gray-level co-occurrence matrix (GLCM) functions implemented in OTB (https://www.orfeo-toolbox.org/), an open-source C++ library for image processing, were used to calculate the image textures using a 3ˆ3 sliding window size with a single pixel offset. The maximum NDVI image representing the growing season signal of the vegetation was deliberately chosen to obtain drastic textural contrast between the vegetation and non-vegetation cover types. In this way, a stack of 67 feature images (~175 gigabytes) was prepared by exploiting a huge volume (~5 terabytes) of Landsat 8 data. In addition, land surface slope data were prepared from 30-m resolution SRTM-based Digital Terrain Elevation Data (DTED) available from the USGS. The satellite data were automatically processed through custom programming deployed for large-scale remote sensing applications. The processing steps involved in the preparation of feature images are shown in Figure 1. In machine learning, a feature is an individual measurable property of a phenomenon being observed [51].

Collection of Reference Data
The JpLC-30 map includes seven land cover types, namely water bodies, deciduous forests, evergreen forests, croplands, bare lands, built-up areas and herbaceous. The classification scheme is described as follows: Altogether, 68 feature images were used as an input dataset in the research. Feature images are also listed in Table 1.

Collection of Reference Data
The JpLC-30 map includes seven land cover types, namely water bodies, deciduous forests, evergreen forests, croplands, bare lands, built-up areas and herbaceous. The classification scheme is described as follows:

1.
Croplands land used for cultivated crops, such as paddy fields, irrigated or dry farmland and vegetables.

2.
Deciduous forests: land covered with trees, with vegetation cover over 30%, including broadleaf and coniferous forests, and sparse woodland, with cover of 10%-30%, that shed their leaves seasonally.

3.
Evergreen forests: land covered with trees, with vegetation cover over 30%, including broadleaf and coniferous forests, and sparse woodland with cover of 10%-30% that maintain leaves throughout the year. 4.
Herbaceous: land covered with vegetation, as grass or herbs, with cover over 10%.

5.
Water bodies: water bodies within the land area, including rivers, lakes, reservoirs and ponds. 6.
Built-up areas: land modified by human activities, including all kinds of impervious surfaces. 7.
Bare lands: land with vegetation cover lower than 10%, including sandy fields and bare rocks.
Approximately 100 polygons belonging to each land cover class as defined above were collected based on reference data and expert knowledge. The seasonal true-color (RGB) composite images prepared from Landsat 8 data and Google Earth ™-based time-lapse images were used to collect reference data. Reference polygons were collected by interactively viewing all seasonal RGB images and Google Earth™ images available for 2013-2015. Reference polygons were collected only from large homogenous areas that could be interpreted visually. Out of~100 polygons for each land cover class, 22,500 geo-location points distributed at least 50 m apart and located inside the reference polygons were extracted randomly and used as the training data. Similarly, another set of~50 polygons belonging to each class were also collected separately, and 7500 geo-location points for each class were used as the validation data. The distribution of 30,000 geo-location points belonging to each cover class are displayed in Figure 2.

Construction of the Reference Library
It should be noted that the reference (training and validation) data are geo-located points that are sensitive to the biophysical changes of the land surface. They cannot be used after a few months or years, since land cover may have changed considerably at that time. Therefore, altogether 210,000 reference points (30,000 points × 7 classes) prepared through visual interpretation and local knowledge were converted into a reference library with the support of 68 feature images (Table 1) prepared in the research. Both reference points and feature images were recorded during the same time period, i.e., 2013-2015. The input of a large number of feature images is expected to represent spectral, textural and topographic variation of the land cover types. The reference library constructed in this way was used to automate the mapping procedure. The spectral, textural and topographic information used for constructing the reference library is the standard measurements obtained from TOA reflectance, TOA brightness temperature and Digital Terrain Model (DTM) data, respectively. However, the spectral, textural and topographic characteristics of the land cover types in other geographical regions may vary significantly. Therefore, the constructed reference library is applicable

Construction of the Reference Library
It should be noted that the reference (training and validation) data are geo-located points that are sensitive to the biophysical changes of the land surface. They cannot be used after a few months or years, since land cover may have changed considerably at that time. Therefore, altogether 210,000 reference points (30,000 pointsˆ7 classes) prepared through visual interpretation and local knowledge were converted into a reference library with the support of 68 feature images (Table 1) prepared in the research. Both reference points and feature images were recorded during the same time period, i.e., 2013-2015. The input of a large number of feature images is expected to represent spectral, textural and topographic variation of the land cover types. The reference library constructed in this way was used to automate the mapping procedure. The spectral, textural and topographic information used for constructing the reference library is the standard measurements obtained from TOA reflectance, TOA brightness temperature and Digital Terrain Model (DTM) data, respectively. However, the spectral, textural and topographic characteristics of the land cover types in other geographical regions may vary significantly. Therefore, the constructed reference library is applicable to repeat monitoring of land cover dynamics of the same region it represents. The reference library includes seven major land cover types present in Japan.

Machine Learning and Prediction
The land cover exhibits wider spectral variation with respect to both space and time.
To discriminate highly dynamic land cover and land use types efficiently, not only temporal variability of the multi-spectral data, but a robust classification algorithm is also required. Random forests [52,53] is a powerful machine learning classifier. Accurate land cover classification and better performance of the random forests model have been described by many researchers [54][55][56][57]. The random forests classifier uses bootstrap aggregating (bagging) to form an ensemble of trees by searching random subspaces from the given data (features) and the best splitting of the nodes by minimizing the correlation between the trees.
The reference library constructed in the research was used to yield an optimum random forests model by choosing the best features. The reference library is a collection of two subsets of data: training and validation data. The hierarchy of best features provided by the random forests model while running with the training data was grouped into 1-68 sets of best features. For each set of best features, the confusion matrix was computed with the validation data. The first set of best features includes only one feature, whereas the last set of best features includes all 68 features. The optimum features defined as the set of the lowest number of input features yielding the highest overall accuracy was selected. The optimum set of features chosen in this way was used to produce the JpLC-30 map in the research. The retrieval of optimum features not only extracts the best features required for discriminating the variability of land cover types, but also reduces the computational cost required for the production of the classification result. The random forests algorithm implemented in OpenCV (http://opencv.org/), an optimized C/C++ programming library for computer vision, machine learning and robotics, was exploited for the production of the JpLC-30 map.

Selection of Optimum Features
The variation of confusion matrix-based overall accuracy (%) with respect to the number of features used during random forests-based classification of land cover types is demonstrated in Figure 3. The overall accuracy (%) started to elevate with the increase of input features until reaching an optimum point, after which it saturated.  Furthermore, a variation of the confusion matrix with respect to the number of features used is shown in Figure 4. The overall accuracy (%) achieved by using the maximum number of input features (i.e., 68 features) was 88.58% ( Figure 4f); whereas the overall accuracy (%) achieved by using 45 features was 88.62% (Figure 4e).   This analysis suggested that out of 68 features, 45 features are sufficient for explaining the variability of land cover types. A lower number of features (i.e., 1-5 features) has resulted in very low accuracy mainly due to the poor discriminability of some classes, which are spectrally indifferent with respect to other classes. For example, as shown in Figure 4a

Performance Analysis
The resulting JpLC-30 map was compared to the existing JAXA High Resolution Land-Use and Land-Cover (JHR LULC Ver. 14.02) map of 2006-2011 with reference to the validation data collected in the research. The accuracy comparison between two maps was performed in a statistically-rigorous manner using the kappa coefficient [58], which measures inter-rater agreement for categorical variables by counting the proportion of instances that predictions agreed with the validation data (observed accuracy), after adjusting for the proportion of agreements taking place by chance (expected accuracy), as defined in Equation (1).

Performance Analysis
The resulting JpLC-30 map was compared to the existing JAXA High Resolution Land-Use and Land-Cover (JHR LULC Ver. 14.02) map of 2006-2011 with reference to the validation data collected in the research. The accuracy comparison between two maps was performed in a statistically-rigorous manner using the kappa coefficient [58], which measures inter-rater agreement for categorical variables by counting the proportion of instances that predictions agreed with the validation data (observed accuracy), after adjusting for the proportion of agreements taking place by chance (expected accuracy), as defined in Equation (1).

Kappa "
Observed accuracy´Expected accuracy 1´Expected accuracy (1) Altogether 7500 validation points for each class distributed over the study area were used in the computation of the kappa coefficient. The validation data were the true ground location points visually verified with the Google Earth™ images and Landsat 8-based seasonal RGB images. Kappa coefficients calculated for the JpLC-30 and JHR LULC maps were 0.84 and 0.69, respectively. Based on the kappa coefficients, the JpLC-30 map is superior to the JHR LULC map. It should be noted that this analysis is solely based on validation points, which were not used for training the model for the production of the JpLC-30 map. The kappa coefficient was chosen over other accuracy metrics because it has been frequently used for assessing the accuracy of land cover maps.
The real-world performance of the JpLC-30 map was also evaluated in a number of places with reference to the Google Earth™ images of 2013-2015. The existing JHR LULC map was also used for comparison. For example, Figures 6-16 demonstrate the performance of the JpLC-30 map over the existing map with reference to the Google Earth™ image. The JpLC-30 map has far better represented the land cover information than the JHR LULC map. Since the difference between the two maps is also derived from the annual change of the land cover itself, only a broad-scale discrepancy between the two maps in areas that are unlikely to fluctuate from the land cover changes, for example the occurrence of large water bodies over natural forests, was taken into consideration. As described in Figures 6-16 the JHR LULC map has numerous classification errors pertinent to all land cover types. Major classification errors noticed in the JHR LULC map are: croplands in forests (Figure 6), water bodies in forests (Figure 7), water in croplands (Figures 8, 9 and 14) and herbaceous in croplands (Figures 13, 15 and 16). Moreover, the JHR LULC map has numerous unclassified/missing pixels (cf. the black pixels in Figures 10-12), not only in volcanic mountains, but also in forests, croplands and urban built-up areas.
Altogether 7500 validation points for each class distributed over the study area were used in the computation of the kappa coefficient. The validation data were the true ground location points visually verified with the Google Earth™ images and Landsat 8-based seasonal RGB images. Kappa coefficients calculated for the JpLC-30 and JHR LULC maps were 0.84 and 0.69, respectively. Based on the kappa coefficients, the JpLC-30 map is superior to the JHR LULC map. It should be noted that this analysis is solely based on validation points, which were not used for training the model for the production of the JpLC-30 map. The kappa coefficient was chosen over other accuracy metrics because it has been frequently used for assessing the accuracy of land cover maps.
The real-world performance of the JpLC-30 map was also evaluated in a number of places with reference to the Google Earth™ images of 2013-2015. The existing JHR LULC map was also used for comparison. For example, Figures 6-16 demonstrate the performance of the JpLC-30 map over the existing map with reference to the Google Earth™ image. The JpLC-30 map has far better represented the land cover information than the JHR LULC map. Since the difference between the two maps is also derived from the annual change of the land cover itself, only a broad-scale discrepancy between the two maps in areas that are unlikely to fluctuate from the land cover changes, for example the occurrence of large water bodies over natural forests, was taken into consideration. As described in Figures 6-16, the JHR LULC map has numerous classification errors pertinent to all land cover types. Major classification errors noticed in the JHR LULC map are: croplands in forests (Figure 6), water bodies in forests (Figure 7), water in croplands (Figures 8, 9 and 14) and herbaceous in croplands (Figures 13, 15 and 16). Moreover, the JHR LULC map has numerous unclassified/missing pixels (cf. the black pixels in Figures 10-12), not only in volcanic mountains, but also in forests, croplands and urban built-up areas.        The random forests is a non-linear classifier. It consists of a large number of deep trees, where each tree is trained on the bagged data using the random selection of features, so gaining a full understanding of how the features interact non-linearly by examining each individual tree is infeasible. Therefore, to provide insights into how features interact linearly, the optimum features were further examined by using the Linear Discriminant Analysis (LDA; [59,60]). The LDA is the most commonly-used supervised image classification technique, which projects a feature space onto a smaller subspace and computes the linear discriminants that will maximize the separation between the multiple classes. The LDA-based confusion matrix as computed with the validation data yielded 69% overall accuracy ( Figure 17). Hence, the linear classifier alone was able to provide more than two-thirds of the discrimination accuracy. Nevertheless, using the same features and validation data, the non-linear random forests classifier accelerated a recordable overall accuracy of up to 88.62% (Figure 4e).    The optimum features as selected from the random forests-based confusion matrix analysis are composed of the spectral, textural and topographic features of the land surface. Therefore, in a nutshell, the optimum features represent the bio-geo-physical phenomenon of the land surface. Different features are important in discriminating different land cover types pertaining to spatial-temporal dynamics. The random forests can handle highly non-linear interactions and classification boundaries of the bio-geo-physical features represented by the satellite data. The random forests classifier grows trees by searching random subspaces of the given data (features); the individual trees may change if the data are slightly changed, but the forest is relatively stable because it consists of many "random" trees. Therefore, the optimum set of features is unique to the given data, and the same set of optimum features may not be applied to other data, even for the same geographical location. Unlike theoretical models, supervised learning models are sensitive to features and the training data used for learning the model, because they analyze the training data and produce an optimum function to predict the class labels for unseen data by generalizing the training data. Due to the data-specific nature of the set of optimum features retrieved, they are not presented in the paper. However, models are to be learned with the new dataset before predicting results, and the optimum features can be chosen as described in this research. The selection of the optimum features is necessary to improve the prediction results and to boost the computing performance on very high-dimensional datasets.    Out of 45 feature images selected for optimum classification of land cover types, 43 features belonged to Landsat 8 spectral data; one feature belonged to the surface slope; and one feature belonged to the textural group (sum average). The sum average is the sum product of all neighboring pixels of a window (e.g., 3ˆ3 pixels) used. The addition of textural information to the spectral signatures has been described as a valuable method for improving the land cover classification accuracy [61]. The best feature obtained in this research belonged to the textural group (sum average). It alone could yield 46.11% overall accuracy (Figure 4a). Out of 43 spectral features, all seven percentile values of NDVI were selected. This implies the importance of NDVI temporal information for discriminating land cover types.  The significant errors found in the JHR LULC map were the misclassification of urban areas and water bodies with forests and the misclassification of croplands with water bodies. The presence of deep shadows and the variety of the phenological spectra of the forests can misclassify urban built-up areas and water bodies mainly when limited temporal spectra are used for classification. Similarly, when croplands are classified by using water-fed period images only, paddy fields can be misclassified as water bodies. The misclassification errors present in the JHR LULC map may have resulted from limited temporal information carried out by the multi-spectral data and an insufficient number of training data used. The innovative data processing and mapping techniques used in the research have resulted in a highly accurate and seamless land cover map with the absence of unclassified/missing data pixels.    The higher accuracy achieved in the research suggests that the lower accuracy as reported by the existing automated mapping research (e.g., [45]) could be due to the insufficient representation of the temporal information by the sole utilization of green-season images. An effective utilization of multi-spectral, multi-textural and topographic features, as deployed in this research, is suggested for the production of a highly accurate land cover map. Nevertheless, the confusion matrix-based analysis (Figure 4e) with the validation data has indicated two major bottlenecks of the JpLC-30 map: misclassification between urban built-up areas and barren lands and misclassification between croplands and herbaceous. Due to similar spectral characteristics of urban built-up areas and barren lands, nighttime light data (500-1000-m resolution) are used for discriminating urban built-up areas from barren lands. However, the unavailability of higher resolution nighttime light data limits its applicability for 30-m resolution land cover mapping. Furthermore, the utilization of atmospherically-corrected reflectance data would be more accurate for large-scale mapping. The addition of higher resolution radar data-based (e.g., Sentinel 1A) multi-polarization and multi-textural features is expected to further increase the accuracy of the resultant JpLC-30 map. The random forests is a non-linear classifier. It consists of a large number of deep trees, where each tree is trained on the bagged data using the random selection of features, so gaining a full understanding of how the features interact non-linearly by examining each individual tree is infeasible. Therefore, to provide insights into how features interact linearly, the optimum features were further examined by using the Linear Discriminant Analysis (LDA; [59,60]). The LDA is the most commonly-used supervised image classification technique, which projects a feature space onto the non-linear random forests classifier accelerated a recordable overall accuracy of up to 88.62% (Figure 4e). The optimum features as selected from the random forests-based confusion matrix analysis are composed of the spectral, textural and topographic features of the land surface. Therefore, in a nutshell, the optimum features represent the bio-geo-physical phenomenon of the land surface. Different features are important in discriminating different land cover types pertaining to spatial-temporal dynamics. The random forests can handle highly non-linear interactions and classification boundaries of the bio-geo-physical features represented by the satellite data. The random forests classifier grows trees by searching random subspaces of the given data (features); the individual trees may change if the data are slightly changed, but the forest is relatively stable because it consists of many "random" trees. Therefore, the optimum set of features is unique to the given data, and the same set of optimum features may not be applied to other data, even for the same geographical location. Unlike theoretical models, supervised learning models are sensitive to features and the training data used for learning the model, because they analyze the training data and produce an optimum function to predict the class labels for unseen data by generalizing the training data. Due to the data-specific nature of the set of optimum features retrieved, they are not presented in the paper. However, models are to be learned with the new dataset before predicting results, and the optimum features can be chosen as described in this research. The selection of the optimum features is necessary to improve the prediction results and to boost the computing performance on very high-dimensional datasets.
Out of 45 feature images selected for optimum classification of land cover types, 43 features belonged to Landsat 8 spectral data; one feature belonged to the surface slope; and one feature belonged to the textural group (sum average). The sum average is the sum product of all neighboring pixels of a window (e.g., 3 × 3 pixels) used. The addition of textural information to the spectral signatures has been described as a valuable method for improving the land cover classification accuracy [61]. The best feature obtained in this research belonged to the textural group (sum average). It alone could yield 46.11% overall accuracy (Figure 4a). Out of 43 spectral features, all seven percentile values of NDVI were selected. This implies the importance of NDVI temporal information for discriminating land cover types.
The significant errors found in the JHR LULC map were the misclassification of urban areas and water bodies with forests and the misclassification of croplands with water bodies. The presence of deep shadows and the variety of the phenological spectra of the forests can misclassify urban built-up areas and water bodies mainly when limited temporal spectra are used for classification. Similarly, when croplands are classified by using water-fed period images only, paddy fields can be misclassified as water bodies. The misclassification errors present in the JHR LULC map may have

Conclusions
Higher resolution (~30-50 m) land cover mapping at a large scale is complicated with the traditional manual processing, analysis and interpretation of huge volumes of data. Developing a technique that can automate the overall mapping procedures as much as possible is an inevitable solution for timely and reliable production of a higher resolution land cover map. The existing higher resolution land cover maps produced at national, regional and global scales have involved labor-intensive and time-consuming procedures. This research presented an automated technique for 30-m resolution land cover mapping at a national scale with high accuracy. The major attributes of the automated technique are the construction of a reference library by combining multi-spectral, multi-textural and topographic features and mapping by selecting the optimum number of features required for discriminating the land cover types. The reference library-driven feature-rich automated technique was used to produce the Japan 30-m resolution land cover (JpLC-30) map of 2013-2015. A comparative analysis of the JpLC-30 map with the existing 50-m resolution land cover map available in Japan (JHR LULC map) provided the superiority of the JpLC-30 map. The outcomes of this research highlight the feasibility of automated and repeat monitoring of land cover at the national, regional and even the global scale by the construction of a suitable reference library.