Automatic Land-Cover Mapping using Landsat Time-Series Data based on Google Earth Engine

The Google Earth Engine (GEE) has emerged as an essential cloud-based platform for land-cover classification as it provides massive amounts of multi-source satellite data and high-performance computation service. This paper proposed an automatic land-cover classification method using time-series Landsat data on the GEE cloud-based platform. The Moderate Resolution Imaging Spectroradiometer (MODIS) land-cover products (MCD12Q1.006) with the International Geosphere–Biosphere Program (IGBP) classification scheme were used to provide accurate training samples using the rules of pixel filtering and spectral filtering, which resulted in an overall accuracy (OA) of 99.2%. Two types of spectral–temporal features (percentile composited features and median composited monthly features) generated from all available Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) data from the year 2010 ± 1 were used as input features to a Random Forest (RF) classifier for land-cover classification. The results showed that the monthly features outperformed the percentile features, giving an average OA of 80% against 77%. In addition, the monthly features composited using the median outperformed those composited using the maximum Normalized Difference Vegetation Index (NDVI) with an average OA of 80% against 78%. Therefore, the proposed method is able to generate accurate land-cover mapping automatically based on the GEE cloud-based platform, which is promising for regional and global land-cover mapping.


Introduction
Land-cover mapping and monitoring is one of the main applications of Earth observation satellite data [1], and is essential for the development and management of natural resource [2], modelling of environmental variables [3] and understanding of habitat distribution [4]. Due to the availability of a long-term data record going back to 1972 and the spatial resolution which can capture fine-grained land-cover patterns [5], images from the Landsat satellite series are an important data source for studying land-cover types and changes [6]. Figure 1. The location of the study areas which were shown in Landsat TM true color composites. The numeric labels correspond to path/row designations. Noted that the three path/rows were referred to as "scene_123032", "scene_125033" and "scene_125034", respectively, for the reminder of the article.

Landsat TM/ETM+ Data
The Landsat-5 and 7 satellites have the same sun-synchronous orbits and pass over the same location on the Earth's surface every 16 days. Therefore, images acquired by the two different satellites are offset from each other by 8 days [28]. The Landsat-5 TM and Landsat-7 ETM+ sensors have a 15° field of view and 30 m spatial resolution, and their data are available as approximately 185 km × 180 km scenes which are defined in a Worldwide Reference System (WRS) of path and row coordinates [29]. The two sensors have similar reflectance bands and all TM/ETM+ band pairs also have similar spectral response functions [30]. Therefore, composites of Landsat TM and ETM+ images have been widely used to increase the temporal frequency of Landsat observations. All Landsat TM/ETM+ surface reflectance data from the year 2010 ± 1 available in GEE (a total of 92 images from path 123 and row 32, 89 images from path 125 and row 34 and 94 images from path 125 and row 33) were used in this study. The numeric labels correspond to path/row designations. Noted that the three path/rows were referred to as "scene_123032", "scene_125033" and "scene_125034", respectively, for the reminder of the article.

Landsat TM/ETM+ Data
The Landsat-5 and 7 satellites have the same sun-synchronous orbits and pass over the same location on the Earth's surface every 16 days. Therefore, images acquired by the two different satellites are offset from each other by 8 days [28]. The Landsat-5 TM and Landsat-7 ETM+ sensors have a 15 • field of view and 30 m spatial resolution, and their data are available as approximately 185 km × 180 km scenes which are defined in a Worldwide Reference System (WRS) of path and row coordinates [29]. The two sensors have similar reflectance bands and all TM/ETM+ band pairs also have similar spectral response functions [30]. Therefore, composites of Landsat TM and ETM+ images have been widely used to increase the temporal frequency of Landsat observations. All Landsat TM/ETM+ surface Remote Sens. 2019, 11, 3023 4 of 20 reflectance data from the year 2010 ± 1 available in GEE (a total of 92 images from path 123 and row 32, 89 images from path 125 and row 34 and 94 images from path 125 and row 33) were used in this study.

MODIS Land Cover Product
The MODIS Land Cover Type Product (MCD12Q1.006), which can be accessed in GEE, provides global land-cover maps at yearly intervals and 500 m spatial resolution from 2001 to present. As one of the layers in Collection 6 MCD12Q1 [31], the layers from the International Geosphere-Biosphere Program (IGBP) classification scheme are used to provide training class labels for Landsat-based land-cover classification. The IGBP classification scheme classifies each 500 m pixel into one of 17 classes-10 out of these 17 classes were used in this study. The class names, abbreviations and descriptions are summarized in Table 1. In addition, the IGBP "cropland/natural vegetation mosaic" class was discarded because we assumed that the mosaic of cropland and natural vegetation at a scale of 500 m scale can be separated at a scale of 30 m. Table 1. Class names, abbreviations and descriptions of the 10 IGBP classes [32] used in this study.

Name Abbreviation Description
Evergreen

MODIS Nadir BRDF-Adjusted Reflectance Product
The nadir BRDF-adjusted reflectance (NBAR) product (MCD43A4) [33], which can also be accessed in GEE, provides daily 500 meter reflectance data for MODIS bands 1 through 7. These are adjusted using the BRDF model to remove the view-angle effects from the directional reflectances, resulting in a stable and consistent NBAR product. Bands 1-4, 6 and 7 of this product are used to determine the homogeneous 500 m pixels during the process of pixel filtering for MODIS land-cover products.

Method
The method ( Figure 2) used in this study consisted of three parts. Firstly, the samples for each scene were extracted automatically using the MODIS land-cover products (MCD12Q1.006). A novel method of sample collection based on a previous approach [11] was used. Secondly, two typical feature-extraction methods were used for land-cover characterization. One involved extracting the percentiles from the annual Landsat reflectance time-series; the other method consisted of making monthly cloud-free images using median composites. In addition, the median composited monthly features were compared with the maximum NDVI composited monthly features, which are widely used in land-cover classification. Finally, based on the extracted samples and Landsat features, the land-cover types were classified using the random forest algorithm. All of these steps were performed on the GEE cloud-based platform. relatively fast and flexible process. The MODIS and Landsat images can be easily resampled to the same spatial resolution (e.g., 30 meters) and transformed to the same map projection, e.g., Geographic Lat/Lon (EPSG:4326) using JavaScript codes on GEE cloud computing platform. Therefore, multisource and multi-temporal satellite imagery can be easily processed using this platform without the need to pay too much attention to the basic work of mosaicking, registration, projection conversion and resampling.

Automatic Collection of Samples from MODIS Land Cover Products
MCD12Q1 products were used to select accurate training samples automatically; the selected samples were then used to extract Landsat spectral-temporal features which were used to train the classifier. This process aimed to automatically generate land-cover mapping at 30 m using Landsat data. The preliminary filtering criteria for MCD12Q1 products that were used are described below. These criteria were designed to help select only high-quality training class labels in which confidence was high: (i) the MCD12Q1 pixel values from 2009 to 2011 were identical; (ii) the MCD12Q1 pixels that had the same values in the 8 surrounding pixels for 2010; (iii) the 500 m MODIS pixels were homogeneous; (iv) the 30 m Landsat pixels within the 500 m MODIS pixel were homogenous.
Rule (i) selected the MCD12Q1 pixels that were consistently classified as belonging to the same land-cover class from 2009 to 2011. Rule (ii) helped to reduce 500 m pixel edge effects in situations where there were possible changes in the underlying land cover. Rule (iii) was introduced because the homogeneous pixels had a higher classification accuracy [34]. Rule (iv) helped to reduce the GEE cloud-based platform provides massive amounts of multi-source satellite data and high-performance computation service, making the computation between different satellite imagery a relatively fast and flexible process. The MODIS and Landsat images can be easily resampled to the same spatial resolution (e.g., 30 meters) and transformed to the same map projection, e.g., Geographic Lat/Lon (EPSG:4326) using JavaScript codes on GEE cloud computing platform. Therefore, multi-source and multi-temporal satellite imagery can be easily processed using this platform without the need to pay too much attention to the basic work of mosaicking, registration, projection conversion and resampling.

Automatic Collection of Samples from MODIS Land Cover Products
MCD12Q1 products were used to select accurate training samples automatically; the selected samples were then used to extract Landsat spectral-temporal features which were used to train the classifier. This process aimed to automatically generate land-cover mapping at 30 m using Landsat data. The preliminary filtering criteria for MCD12Q1 products that were used are described below. These criteria were designed to help select only high-quality training class labels in which confidence was high: Rule (i) selected the MCD12Q1 pixels that were consistently classified as belonging to the same land-cover class from 2009 to 2011. Rule (ii) helped to reduce 500 m pixel edge effects in situations where there were possible changes in the underlying land cover. Rule (iii) was introduced because the homogeneous pixels had a higher classification accuracy [34]. Rule (iv) helped to reduce the spectral variation within the 30 m pixels caused by heterogeneous pixels. The thresholds in Table 2 were used Remote Sens. 2019, 11, 3023 6 of 20 to identify homogeneous regions in the MODIS and Landsat imagery: within the 3 × 3 window, if the range of values was less than a predefined threshold then it was considered to be a homogeneous pixel. In the previous approach [11], only one 30 m Landsat pixel within a filtered 500 m MODIS pixel was selected in order to reduce the data volume for mapping land cover at a continental scale. However, for the area covered by a Landsat scene (185 km × 180 km), a few 500 m MODIS pixels of land-cover type occupying a small area were left after preliminary filtering based on rules i, ii and iii. Therefore, using only one Landsat pixel per MODIS pixel may not be enough for the classifier to classify the land cover of local areas accurately. Due to the difference in spatial resolution, there are 17 × 17 (i.e., 289) Landsat pixels within one MODIS pixel. Consequently, all homogeneous Landsat pixels corresponding to the filtered MODIS pixel were selected and further steps were then taken to reduce the spectral variability of the 30 m pixels within the 500 m pixels. The details of this refinement (Rule v) are described below.
(a) Calculate the spectral centroid of the sample set: where ρ i is a vector which consists of the reflectance values of the Landsat TM/ETM+ bands 1-5 and band 7; ρ c is the spectral centroid (a vector consisting of six Landsat TM/ETM+ surface reflectance values) and is the median value of a sample set in the spectral dimension. (b) Calculate the Euclidean distance from an individual sample to the spectral centroid of the sample set: where ∆ i is the Euclidean distance from sample i to the spectral centroid of the sample set. (c) Sort the calculated Euclidean distance from small to large and retain the top 50% of the samples.
A minority of heterogeneous 30 m pixels, which may have had land-cover types that were inconsistent with the 500 m MCD12Q1 pixels, were removed by the above refinement process. For samples with the same land-cover type, the spectral variability caused by the issue of mixed pixels was reduced and the spectral purity was improved significantly.
Therefore, the quality controls for MCD12Q1 pixels are Rule i, Rule ii and Rule iii; the quality controls for Landsat pixels within MCD12Q1 pixel are Rule iv and Rule v. That is to say, the MCD12Q1 pixels filtered by Rule i, ii and iii were used as polygons for Landsat data extraction. Therefore, not all those Landsat pixels belong to the correct class, only the Landsat pixels meet the requirement of Rule iv and Rule v are further considered to be the potential correct class.

Landsat TM/ETM+ Data Pre-Processing
The Landsat TM/ETM+ Surface Reflectance data in GEE were atmospherically corrected using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS), using a radiative transfer code with the aerosol characterization derived independently for each Landsat scene and external water vapor and ozone characterizations [36]. Using these data and the cloud-based platform, nadir BRDF correction was performed using a c-factor BRDF normalization method together with a fixed set of MODIS BRDF spectral model parameters [37] to normalize the forward and backward scattering difference in the area of overlap between the Landsat TM and ETM+ images. Topographic illumination correction was also carried using an empirical rotation model [38] to compensate for the solar irradiance, thus minimizing the variation in the reflectance of similar targets due to topography. For example, vegetation in the shadow of mountains may be confused with water in land-cover classification results. Compared with the cosine and C models, this model performs consistently well for both top-of-atmosphere and top-of-canopy Landsat reflectance data [39].

Landsat TM/ETM+ Composites
Due to the low temporal frequency, many of the images acquired by Landsat sensors are covered by cloud, which heavily limits actual data availability. In addition, the Landsat-7 ETM+ scan line corrector failed in May 2003, causing a 22% loss in data in each Landsat ETM+ image [40]. To improve data availability, pixel-based compositing of Landsat data has increased in popularity since the opening of the USGS Landsat archive in 2008 [41]. In this study, two types of spectral features-annual percentile composites [11,14,21] and monthly cloud-free composites [21,42]-were selected to characterize the land cover. Because cloud cover significantly limits the availability of Landsat images, all Landsat TM/ETM+ data from the year 2010 ± 1 that were available in the GEE were used to derive the metrics and monthly composites, irrespective of the acquisition year, in order to ensure that more cloud-free observations were available.
Based on the Landsat surface reflectance that was corrected using the GEE (Section 3.2), the 10th, 20th, 25th, 50th, 75th, 80th and 90th percentiles of annual time series observations of the TM/ETM+ reflectance bands and NDVI [43] were generated for use as the first type of spectral input; monthly cloud-free composites of the TM/ETM+ reflectance bands and NDVI from April to October were generated for use as the second type of spectral inputs. The composites for other months were discarded because of persistent cloud and snow [44]. Median composite technology was used to reduce each monthly group of images into a single composite on a per-pixel, per-band basis. The two types of Landsat-based compositing algorithms can be implemented easily and quickly on the GEE cloud-based platform. Before the compositing process was performed, areas of cloud, cloud shadow and snow in each Landsat image were masked using the CFMASK band of the Landsat SR product. Therefore, the percentile metric is extracted based on Landsat time-series "clear" (not contaminated by cloud, cloud shadow and snow) observations within the year; and monthly feature was derived based on Landsat multiple "clear" observations within the month.
In addition to the reflectance bands and NDVI, the 30 m digital elevation model (DEM), slope, and aspect bands derived from the Shuttle Radar Topography Mission (SRTM) data [45], were added to the feature inputs used for land-cover classification. The DEM, slope, and aspect features were involved because these variables were capacity of explaining the apparent variance caused by terrain factors within each class. Therefore, both compositing methods had a total of 52 band features; these are listed in Table 3.

Land-Cover Classification
The samples automatically selected from MCD12Q1.006 (Section 3.1) were used as training samples for land cover classification. The per-band pixel values of the stacked composited images were extracted from training samples and the resulting data were used to train Random Forest (RF) classifiers, which is the internal algorithm used in the GEE. The RF classifier is an ensemble machine learning algorithm which uses a set of CART decision trees to make predictions [46]. Each tree is created by randomly selected training samples and randomly selected features. New, unlabeled data is classified by all decision trees and the class with the maximum votes is the one that is finally selected. Two parameters need to be set to produce the forest trees: in this study, the number of decision trees (Ntree) was set to 100 and the user-defined number of features (Mtry) was kept at the default value (the square root of the number of input variables). The RF algorithm was chosen in this study given its superior ability to process features with a large number of dimensions, robustness to outliers and noise, and the higher classification accuracy obtained using the ensemble technique [47]. A land-cover map was generated for each test area using both compositing methods and the classification accuracy of each map was computed using the corresponding validation data sets.

Accuracy Assessment
The accuracy assessments of classification results were performed using a new independent stratified validation samples which were visually interpreted from very high resolution time series images of Google Earth. We have also used the System for Terrestrial Ecosystem Parameterization (STEP) sites which were drawn on Google Earth and used by the MCD12Q1 product, as well as the field photos from the Global Field Photo Library (eomf.ou.edu) for reference and understanding the spatial pattern of each land cover type. A total of 1375 samples from scene_123032, 1489 samples from scene_125034 and 1451 samples from scene_125033 were selected.
The traditional metrics, including user accuracy (UA), producer accuracy (PA) and overall accuracy (OA), were used to quantitatively evaluate the accuracy of classification. Kappa coefficient, a commonly used measure, was not used in this study, because it is highly correlated with overall accuracy [48]. Additionally, all the accuracy measures are adjusted with the area of each stratum (class) and presented with a 95% confidence interval [48].

Samples Automatically Extracted from MCD12Q1.006
The MCD12Q1.006 Land Cover (MLC) product was used to select samples for classifying the land cover in each selected Landsat scene. The MLC pixels which covered the same area as the corresponding Landsat scene were preliminarily filtered using rules (i), (ii), (iii) and (iv). The cloud-free images used for the selection of homogeneous MODIS and Landsat pixels for each scene are listed in Table 4.   Because the spatial resolution of MODIS is coarser than that of Landsat, the Landsat pixels within a given MODIS pixel may have outliers with different land-cover types. Generally, samples corresponding to the same land-cover type have similar spectral characteristics and form a band in a band-reflectance plot. The noisy samples have abnormal spectral curve shapes which should be removed to further refine the purity of the samples. Figure 4 illustrates the spectral features of a set of deciduous broadleaf forest samples before and after these were further refined using Rule v. It can be seen that, compared with beforehand, the homogeneity of the samples was improved significantly. In particular, the "Urban and built-up" samples with an annual maximum NDVI greater than 0.25 (experimentally obtained) were removed before further refinement based on Rule v because for small villages surrounded by cropland, the boundaries were indistinct at a scale of 500 m and the "Urban and built-up" samples were mixed with the "Croplands" type across the boundaries at the 30 m scale. Because the spatial resolution of MODIS is coarser than that of Landsat, the Landsat pixels within a given MODIS pixel may have outliers with different land-cover types. Generally, samples corresponding to the same land-cover type have similar spectral characteristics and form a band in a band-reflectance plot. The noisy samples have abnormal spectral curve shapes which should be removed to further refine the purity of the samples. Figure 4 illustrates the spectral features of a set of deciduous broadleaf forest samples before and after these were further refined using Rule v. It can be seen that, compared with beforehand, the homogeneity of the samples was improved significantly. In particular, the "Urban and built-up" samples with an annual maximum NDVI greater than 0.25 (experimentally obtained) were removed before further refinement based on Rule v because for small villages surrounded by cropland, the boundaries were indistinct at a scale of 500 m and the "Urban and built-up" samples were mixed with the "Croplands" type across the boundaries at the 30 m scale. The sampling strategy used was stratified random sampling; i.e., random sampling was performed independently for each land-cover type according to the proportion of the MCD12Q1 area covered by the type. A total of 2749 samples from scene_123032, 3183 samples from scene_125034 and 2903 samples from scene_125033 were extracted. The number of training and validation samples for each land-cover type is shown in Tables 5-7 for each scene.

Validation of the Automatically Extracted Samples
To investigate the reliability of the samples extracted by the proposed method, the extracted samples were validated using an image interpretation method based on the historical high-resolution imagery in Google Earth, which is widely used to select samples used for land-cover classification. We randomly selected one thousand samples (one hundred for each land-cover types) from the automatically generated sample database to validate the accuracy of the samples. The error rate and accuracy for each sample type and the whole set of samples are shown in Figure 5. The confusion matrix is shown in Table S1. The sample accuracy for grasslands, croplands, and urban and built-up were 98%, 99% and 95%, respectively. The sample accuracy for the remaining land-cover types was 100%. The overall accuracy of all the samples was 99.2%. These validation results demonstrate that the samples extracted by the proposed method were reliable for use in land-cover classification because the RF classifier used in this study was insensitive to noise as long as the noise threshold of 20% was not exceeded [1]. The sampling strategy used was stratified random sampling; i.e., random sampling was performed independently for each land-cover type according to the proportion of the MCD12Q1 area covered by the type. A total of 2749 samples from scene_123032, 3183 samples from scene_125034 and 2903 samples from scene_125033 were extracted. The number of training and validation samples for each land-cover type is shown in Tables 5-7 for each scene. Note: "Tra." represents the number of training samples for each class; "Val." represents the number of validation samples for each class; "Per." represents the classification result obtained using percentile features; "Max" represents the classification result obtained using maximum NDVI composited monthly features; and "Med." represents the classification result obtained using median composited monthly features. The same for Tables 6 and 7.

Validation of the Automatically Extracted Samples
To investigate the reliability of the samples extracted by the proposed method, the extracted samples were validated using an image interpretation method based on the historical high-resolution imagery in Google Earth, which is widely used to select samples used for land-cover classification. We randomly selected one thousand samples (one hundred for each land-cover types) from the automatically generated sample database to validate the accuracy of the samples. The error rate and accuracy for each sample type and the whole set of samples are shown in Figure 5. The confusion matrix is shown in Table S1. The sample accuracy for grasslands, croplands, and urban and built-up were 98%, 99% and 95%, respectively. The sample accuracy for the remaining land-cover types was 100%. The overall accuracy of all the samples was 99.2%. These validation results demonstrate that the samples extracted by the proposed method were reliable for use in land-cover classification because the RF classifier used in this study was insensitive to noise as long as the noise threshold of 20% was not exceeded [1].

Figure 5.
The error rate and accuracy for the samples automatically extracted by the proposed method (Abbreviations on the x-axis correspond to different land-cover classes: see Table 1).

The Landsat-based Classification Results
Using the samples automatically selected by the proposed method, classification of land cover using the Landsat imagery was performed. The classification results for each scene obtained using percentile features and median composited monthly features are shown in Figure 6(b) and Figure  6(d), respectively. From a visual comparison, the spatial patterns of the land-cover types are similar for the two sets of results. The number of training and validation samples, the classification accuracy assessment (including UA, PA and OA) for each scene are given in Tables 5-7. Accuracy measures are adjusted by the area of each stratum (class) and presented with a 95% confidence interval. All the confusion matrixes are shown in Tables S2-S10. As can be clearly seen, the OA of the classification results based on median composited monthly features are 79%, 80% and 80% for scene_123032, scene_125034 and scene_125033, respectively; and those based on the percentile features are 75%, 78% and 77% for scene_123032, scene_125034 and scene_125033, respectively. The average OA of the former (80%) is higher than that of the latter (77%). In the case of the single-type results, for most land-cover types, the producer accuracy was higher for the results obtained using the median composited monthly features than for the results obtained using the percentile features. A comparison of the results thus demonstrates that, in terms of multitemporal Landsat classification, the performance of the monthly composited features was superior to that of the percentile composited features. The time-series monthly features retain phenological information about the land cover and so perform better at land-cover characterization. The percentile features were affected by the number of valid observations: even though as many Landsat data as possible were used to derive the percentile features, the number of valid observations still varied from pixel to pixel.
In addition, the median composite method was compared to the maximum NDVI composite method, which is a method commonly applied to monthly cloud-free composited images [21]. The particular observation-not cloud-contaminated or missing-corresponding to the maximum NDVI in the specific temporal range (i.e., one month) was used for compositing. The classification results obtained using the maximum NDVI composited monthly features for each scene are shown in Figure  6(c). As shown in Tables 5-7, the OA of the classification results that were based on the maximum NDVI composited monthly features were 77%, 79% and 79% for scene_123032, scene_125034 and scene_125033, respectively. The average OA of this compositing method (78%) was lower than that 0% 20% 40% 60% 80% 100% Percentage Class Error rate Accuracy Figure 5. The error rate and accuracy for the samples automatically extracted by the proposed method (Abbreviations on the x-axis correspond to different land-cover classes: see Table 1).

The Landsat-based Classification Results
Using the samples automatically selected by the proposed method, classification of land cover using the Landsat imagery was performed. The classification results for each scene obtained using percentile features and median composited monthly features are shown in Figure 6b,d, respectively. From a visual comparison, the spatial patterns of the land-cover types are similar for the two sets of results. The number of training and validation samples, the classification accuracy assessment (including UA, PA and OA) for each scene are given in Tables 5-7. Accuracy measures are adjusted by the area of each stratum (class) and presented with a 95% confidence interval. All the confusion matrixes are shown in Tables S2-S10. As can be clearly seen, the OA of the classification results based on median composited monthly features are 79%, 80% and 80% for scene_123032, scene_125034 and scene_125033, respectively; and those based on the percentile features are 75%, 78% and 77% for scene_123032, scene_125034 and scene_125033, respectively. The average OA of the former (80%) is higher than that of the latter (77%). In the case of the single-type results, for most land-cover types, the producer accuracy was higher for the results obtained using the median composited monthly features than for the results obtained using the percentile features. A comparison of the results thus demonstrates that, in terms of multitemporal Landsat classification, the performance of the monthly composited features was superior to that of the percentile composited features. The time-series monthly features retain phenological information about the land cover and so perform better at land-cover characterization. The percentile features were affected by the number of valid observations: even though as many Landsat data as possible were used to derive the percentile features, the number of valid observations still varied from pixel to pixel.
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 20 based on the median composited monthly features (80%). A comparison of these results thus also shows that the median composite method was superior to the maximum NDVI composite method for constructing cloud-free time-series of spectral features. Therefore, it can be concluded that landcover types can be more accurately characterized by monthly median composited features.  In addition, the median composite method was compared to the maximum NDVI composite method, which is a method commonly applied to monthly cloud-free composited images [21]. The particular observation-not cloud-contaminated or missing-corresponding to the maximum NDVI in the specific temporal range (i.e., one month) was used for compositing. The classification results obtained using the maximum NDVI composited monthly features for each scene are shown in Figure 6c. As shown in Tables 5-7, the OA of the classification results that were based on the maximum NDVI composited monthly features were 77%, 79% and 79% for scene_123032, scene_125034 and scene_125033, respectively. The average OA of this compositing method (78%) was lower than that based on the median composited monthly features (80%). A comparison of these results thus also shows that the median composite method was superior to the maximum NDVI composite method for constructing cloud-free time-series of spectral features. Therefore, it can be concluded that land-cover types can be more accurately characterized by monthly median composited features.
As can be seen in Tables 5-7, the accuracies of the same land cover type were not identical in different regions. Based on the classification results obtained by median composited monthly features, CSL, WS, SVN and UB have greater difference in producer accuracy (more than 20%) between different regions while DBF, GRL, CRL and WTR have smaller difference in producer accuracy (less than 15%). Region-specific climate and environmental condition usually lead to the variation of spatial pattern and spectral features of the same land cover type. This also highlighted the importance of independent classification of geographic strata.
Based on the results obtained by median composited monthly features, the average producer accuracy was used to compare the differences between single class accuracies. The single classes, sorted by PA in descending order are WTR, GRL, DBF, WS, CRL, SVN, CSL and UB. Therefore, water, grasslands and deciduous broadleaf forest classes have the top three highest producer accuracy while closed shrublands and urban classes have the two lowest accuracies. For classification results obtained by monthly features, the average OA with the median method is slightly higher than that with the maximum NDVI method. However, GRL have higher producer accuracy with the maximum NDVI method, while DBF, CSL, WS, SVN and UB have better producer accuracy with the median method. CRL and WTR have the same accuracy with either maximum NDVI or median method. Furthermore, no significantly better result was obtained with the two methods combined.
In addition, the differences of class area between MODIS classification and the resulting classification of this study were compared and analyzed. The area statistics of land cover class for MODIS land cover product and the resulting Landsat-based classification were illustrated in Figure 7. As can be clearly seen, except for grassland and cropland, most land cover types have no significant area difference among the four classification results. In scene_123032 and scene_125033, the area of grassland type derived from MODIS classification is substantially larger than that derived from Landsat-based results. Additionally, the area of cropland type derived from MODIS classification is obviously smaller in scene_123032 and scene_125033, and slightly larger in scene_125034. The comparison results demonstrated that the scale effect of satellite imagery has great influence on the area of grassland and cropland cover type.

Effect of Feature Combinations and Training-Sample Size
Multitemporal satellite images are widely used in land-cover classification [49]. Because using single-date imagery to identity land-cover types with similar spectral features can be challenging [50,51], multitemporal images have been employed to make use of the phenological variation which is a valuable information source for improving the accuracy of land-cover classification [52]. However, in terms of land-cover classification, the usefulness of information does not always increase in proportion to the number of multitemporal images [53]. For example, the separability between land cover types was found to be highest using images from December and January, and lowest using images from July and August [54]; the classification accuracy achieved by using Landsat images from all four seasons was higher than that achieved by using single-season imagery for land-cover classification in urban areas [55]. Remote Sens. 2019, 11, x FOR PEER REVIEW 15 of 20

Effect of Feature Combinations and Training-Sample Size
Multitemporal satellite images are widely used in land-cover classification [49]. Because using single-date imagery to identity land-cover types with similar spectral features can be challenging [50,51], multitemporal images have been employed to make use of the phenological variation which In this section, the effect of feature combination on classification accuracy is fully explored for the two types of feature input based on the RF classifier. Features were randomly combined based on the mathematical principles of permutation and combination (i.e., C m n equals the number of combinations when m features are taken from n features). The OA measurement, which was derived based on validation samples, was used to quantitatively evaluate the performance for each feature combination. Figures 8a and 9a illustrate the classification accuracy of each feature combination for the specific number of selected percentile/monthly features (from 1 to 7). The results demonstrate that different combinations of the same number of features produced different classification accuracies and that the range of variation for the percentile features was smaller than that for the monthly features. However, RF gave a higher overall accuracy with an increased number of percentile or monthly features, which was more evident in the case of the monthly features than for the percentile features. that for the monthly features. However, RF gave a higher overall accuracy with an increased number of percentile or monthly features, which was more evident in the case of the monthly features than for the percentile features.
In addition, the effect of the training-sample size on the classification accuracy was assessed for the two types of feature input based on the RF classifier and for the validation samples. The effect of the training-sample size was evaluated using the OA by altering the size of the training sets in increments of 5%, from 5% to 95%. Figure 8(b) and Figure 9(b) illustrate the relationship between the size of the training sample and classification accuracy. It can be seen that both the percentile and monthly features had a low sensitivity to the training-sample size. For an increase in sample size from 5% to 95%, the overall accuracy changed from 68% to 76% (an 8% increase) for the percentile features and from 67% to 80% (an 13% increase) for the monthly features. Thus, for the percentile features, the accuracy changed by less as the number of training data increased than it did in the case of the monthly features. However, the relative increase in OA as the training-sample size increased was similar for both percentile and monthly features.

Conclusion
The intention of this study was to develop a method for automatic land-cover mapping at a scale of 30 m based on the Google Earth Engine cloud-based platform. A novel method for automatic collection of training samples was proposed based on previous approaches to resolve the issues of insufficient sample size and sample confusion in local-area mapping. MCD12Q1.006 land-cover products were used to provide reliable training class labels based on the rules of pixel filtering and spectral filtering. All available Landsat TM/ETM+ data acquired in the study area from the year 2010 ± 1 were used for the land-cover classification. Two types of spectral-temporal features (percentile composited features and median composited monthly features) were used for characterizing the land cover. In addition, the monthly features composited using the median were compared with those composited using the maximum NDVI, the latter being a commonly used method for temporal compositing of images. The following conclusions can be drawn. (i) The samples automatically extracted by the proposed method were reliable and accurate with an overall accuracy of 99.2%; (ii) Both the percentile features and monthly features produced excellent land-cover classification results; however, the classification produced using the median composited monthly features was more accurate than that obtained using the percentile features -average overall accuracy of 80% against 77%. In addition, the monthly features composited using the median values In addition, the effect of the training-sample size on the classification accuracy was assessed for the two types of feature input based on the RF classifier and for the validation samples. The effect of the training-sample size was evaluated using the OA by altering the size of the training sets in increments of 5%, from 5% to 95%. Figures 8b and 9b illustrate the relationship between the size of the training sample and classification accuracy. It can be seen that both the percentile and monthly features had a low sensitivity to the training-sample size. For an increase in sample size from 5% to 95%, the overall accuracy changed from 68% to 76% (an 8% increase) for the percentile features and from 67% to 80% (an 13% increase) for the monthly features. Thus, for the percentile features, the accuracy changed by less as the number of training data increased than it did in the case of the monthly features. However, the relative increase in OA as the training-sample size increased was similar for both percentile and monthly features.

Conclusions
The intention of this study was to develop a method for automatic land-cover mapping at a scale of 30 m based on the Google Earth Engine cloud-based platform. A novel method for automatic collection of training samples was proposed based on previous approaches to resolve the issues of insufficient sample size and sample confusion in local-area mapping. MCD12Q1.006 land-cover products were used to provide reliable training class labels based on the rules of pixel filtering and spectral filtering. All available Landsat TM/ETM+ data acquired in the study area from the year 2010 ± 1 were used for the land-cover classification. Two types of spectral-temporal features (percentile composited features and median composited monthly features) were used for characterizing the land cover. In addition, the monthly features composited using the median were compared with those composited using the maximum NDVI, the latter being a commonly used method for temporal compositing of images. The following conclusions can be drawn.
(i) The samples automatically extracted by the proposed method were reliable and accurate with an overall accuracy of 99.2%; (ii) Both the percentile features and monthly features produced excellent land-cover classification results; however, the classification produced using the median composited monthly features was more accurate than that obtained using the percentile features -average overall accuracy of 80% against 77%. In addition, the monthly features composited using the median values outperformed those composited using the maximum NDVI values in terms of the classification performance -average overall accuracy of 80% against 78%; (iii) For single class accuracy, WTR, GRL and DBF have the top three highest producer accuracy while CSL and UB have the two lowest accuracies, based on median composited monthly features. Additionally, GRL have higher producer accuracy with the maximum NDVI method, while DBF, CSL, WS, SVN and UB have better producer accuracy with the median method; (iv) Higher overall accuracies were achieved with an increased number of percentile or monthly features. Both the percentile and monthly features had a low sensitivity to the training-sample size and the OA increased as the size of the training samples increased.
Therefore, the method proposed in this paper was able to achieve land-cover classification for Landsat data automatically, quickly and accurately based on the Google Earth Engine cloud-based platform. These results are of great significance to regional and global land-cover mapping.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/11/24/3023/s1, Table S1: Accuracy assessment of the automatically extracted samples, Table S2: Confusion matrix for the classification result of scene_123032 based on percentile features. Accuracy measures are adjusted with the area of each class and presented with a 95% confidence interval (The same for Tables S3-S10), Table S3: Confusion matrix for the classification result of scene_123032 based on maximum NDVI composited monthly features, Table S4: Confusion matrix for the classification result of scene_123032 based on median composited monthly features, Author Contributions: Conceptualization, L.L.; methodology, S.X. and X.Z.; software, S.X. and X.C.; validation, S.X. and Y.G.; investigation, S.X. and J.Y.; resources, J.Y.; writing-original draft preparation, S.X.; writing-review and editing, X.Z. and L.L.