Long Time Series High-Quality and High-Consistency Land Cover Mapping Based on Machine Learning Method at Heihe River Basin

: Long time series of land cover changes (LCCs) are critical in the analysis of long-term climate, environmental, and ecological changes. Although several moderate to ﬁne resolution global land cover datasets have been publicly released and they show strong consistency at the global scale, they have large deviations at the regional scale; furthermore, high-quality land cover datasets from before 2000 are not available and the classiﬁcation consistency among different datasets is not very good. Thus, long time series of land cover datasets with high quality and consistency are in great demand but they are still unavailable, even at the regional scale. The Landsat series of satellite imagery composed of eight successive satellites can be traced back to 1972 and it is, therefore, possible to produce a long time series land cover dataset. In addition, the newly available satellite data have the capability to construct time series satellite images and a time series analysis method such as LCMM can be employed for making high-quality land cover datasets. Therefore, by taking the advantages of the two categories of satellite data, we proposed a new time series land cover mapping method based on machine learning and it, thereafter, is applied to Heihe River Basin (HRB) for veriﬁcation purposes. Firstly, the high-quality land cover datasets at HRB from 2011–2015, which were retrieved using the LCMM method, are used for quickly and accurately making training samples. Secondly, a strategy for transferring the training samples after 2011 to earlier years is established. Thirdly, the random forest model is employed to train the selected yearly samples and a land cover map for every year is subsequently made. Finally, comprehensive analysis and validation are carried out for evaluation. In this study, a long time series land cover dataset including 1986, 1990, 1995, 2000, 2005, 2010, 2011, 2012, 2013, 2014, and 2015 is ﬁnally made and an average precision of about 90% is achieved. It is the longest time series land cover map with 30 m resolution at HRB and the dataset has good time continuity and stability.


Introduction
Land cover changes (LCCs) are the result of human activities and natural evolution and LCC has great impacts on the climate system and ecology [1]; it is subsequently an important factor in studying environmental changes and climate change. Therefore, a better understanding of LCC is necessary to provide a reference for evaluating the vulnerability of carbon and water cycles [2,3] and other ecosystem processes related to global or regional change [4]. Especially, long time series of land cover maps have been available from long-term land cover mapping based on the sequence of remote sensing detection, which has provided more information on land change in the analysis of longterm climate, environmental, and ecological changes.
Due to its capability of global coverage within a short period, remote sensing has become widely used for global land cover mapping, as it can capture the changes of land cover quickly. Until now, several moderate to fine resolution global land cover datasets have been publicly released [5][6][7][8][9][10][11][12] and the spatial resolutions of these products are between 1 km and 30 m. They show strong consistency at the global scale, but they have large deviations at the regional scale. The main problems include: (1) most of the land cover datasets have a relatively low resolution (500 m and lower) and their accuracies are usually lower than 75% [8,13]; among the released datasets, GlobeLand30 [14] and FROM-GLC30 [9] have a spatial resolution of 30 m, and their overall accuracies are 80% and 72% respectively; (2) the classification accuracy of these released datasets at the regional scale is much lower than the claimed accuracy, especially at heterogeneous areas, which is difficult to meet the modeling requirements, and (3) all the datasets are after 2000 and do not have long time series and therefore cannot support the analysis of long-term climate, environmental, and ecological changes.
Recently, satellite constellations and satellite-loaded sensors with large swath, such as Sentinel-2, FORMOSAT-2, Chinese Huanjing-1, and GF-1, have been launched and subsequently have the capability to scan the whole globe every few days at high resolution; hence time series of remote sensing images are increasingly available. Some regional land cover maps with an accuracy of over 90% have been made [3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18] by employing time series analysis on satellite images. Furthermore, the authors of [19] developed a Land Cover mapping method by using multi-classifiers and multisource remotely sensed imagery (LCMM) by using HJ-1/CCD time series of images to make a finer land cover map at Heihe River Basin (HRB) from 2011-2015 [19]. However, satellite image time series were not available before 2011. Therefore, long time series of land cover datasets are in great demand but still unavailable, even at the regional scale.
In order to make long time series of land cover datasets of high spatial resolution to support the analysis of long-term climate, environmental, and ecological changes, the Landsat series of satellite imagery is the only choice. It is composed of eight satellites and can be traced back to 1972. It provides data support for continuous detection of the global surface and is of great significance; furthermore, it is freely available [20] and the emergence of new computing tools, such as Google Earth Engine (GEE), provides powerful cloud computing capabilities [21]. However, the 16-day revisiting period makes it very difficult to construct time series data, so the methods based on time series analysis, such as LCMM, cannot be employed to make a high-quality land cover map, especially at early stages.
In recent years, machine learning (ML) algorithms, such as random forest tree (RFT), support vector machine (SVM), and neural network (NN), in remote sensing land cover classification have been greatly improved in both efficiency and accuracy with the increase of computation capability and the technological development of artificial intelligence. However, the training samples, usually based on manual collecting or interpretation from satellite images, stopped them from being used widely, especially on large-scale applications. Furthermore, the timeliness of these methods did not allow them to be used for emergency cases [22,23]. Subsequently, an automatic sampling strategy for retrieving high-quality samples has become more and more important, especially for large-scale remote sensing land cover mapping [8,24]. Because samples from historical land cover maps and classification cases contain the prior information and knowledge that is helpful to the classification of the current satellite images, transfer learning is employed to solve the problems [25][26][27][28] and it has achieved good results for applications at the local scale. However, the accuracy is degraded without considering the samples incorrectly introduced by historical samples. In addition, these methods have not fully used the historical land cover maps with multi-year coverage [22,29]. Therefore, we propose a new time series land cover mapping method based on machine learning and it is applied to HRB for verification purposes. Firstly, the highquality land cover datasets at HRB from 2011-2015, which were retrieved using the LCMM method and can be downloaded at http://westdc.westgis.ac.cn/data/6bbf9a3f-e7d8-4 255-9ecb-131e1543316d, accessed on 19 April 2021, are used for quickly and accurately making training samples. Secondly, a strategy for transferring the training samples after 2011 to earlier years is established. Thirdly, the random forest model is employed to train the selected yearly samples and the land cover maps for earlier years are subsequently made. Finally, comprehensive analysis and validation are carried out for evaluation.

Study Area
The study area, HRB, is located at the northeast of the Tibetan plateau. Its geographical coordinates are between 97.1 • E-102.0 • E and 37.7 • N-42.7 • N, and it covers an area of approximately 143,000 km 2 . HRB's elevation ranges from 2000-5000 m and it covers highly heterogeneous landscapes including cold and arid landscapes at the upper stream, the artificial oasis-riparian ecosystem-wetland-desert compound in the middle stream, and the natural oasis and desert at the lower stream. Therefore, the complicated landscapes would be a good test site for verifying the proposed method and this was the first reason behind selecting HRB as the study area ( Figure 1). HRB is a typical inland river basin in China, and it has served as an experimental site for integrated watershed studies, land surface measurements, and hydrological observations for a very long time [30]. Many major research plans have been carried out here, for example, the "Integrated research on the eco-hydrological process of the Heihe River Basin" launched by the National Natural Science Foundation of China [31]. Under the support of these projects, many ground experiments and much scientific research has been carried out to collect a large number of ground measurements, remote sensing data, and land surface parameters including high-quality land cover datasets. Therefore, HRB has been well investigated and it is subsequently conducive to this study.

Data and Preprocessing
In this study, four categories of data were used and they include TM and OLI from Landsat series of satellites, SRTMGL1_003 from ASTER, land cover datasets of 2011-2015 from the LCMM method [19] using HJ-1/CCD data, and high-resolution images from Google Earth for sample verification and validation. The details of these data are listed in Table 1. Since the Landsat/TM images are not enough for land cover mapping before 1986, the earliest year in this study is 1986. In total, 3257 scenes of TM/OLI images including 2084 Landsat5/TM and 1173 Landsat8/OLI surface reflectance images were used from GEE in this study. The number of scenes for each year is listed in Table 2. Among them, the number in 1986 is the least, with only 109 scenes, and the year 2014 has the most scenes, 439. In addition, the number of Landsat8/OLI is more than that of Landsat5/TM. Due to the failure of the Landsat7 satellite, ETM+ was not used [32]. Figure 2 shows the seasonal reflectance image composites for 1986 and 2014 and the composites of 2014 look better than those of 1986 because of more available data (see statistics on Table 2). Especially, the autumn composite of 1986 has a lot of noise induced by clouds, so the final land cover may degrade. The details will be discussed in the discussion section.  In order to make a usable surface reflectance composite, the following preprocessing procedures were carried out.
(1) The CFMASK algorithm [33] was used to generate the quality assessment (QA) band and cloud contamination was subsequently removed. (2) For Landsat 5 images, a negative buffering method [34] was employed to remove bad pixels at edges.
(3) The percentile reducer in GEE was used to mosaic the Landsat images within one season and 25% was used as a threshold for better noise removing. (4) Aiming at minimizing the missing data, a reconstruction algorithm [35] striving to ensure the authenticity and integrity of the data was employed to reconstruct the missing portion of the data.

The Land Cover Classification System
Based on the analysis of existing classification systems, such as IGBP [7] and GLC2000 [6], the requirements from land process modeling at HRB, the characteristics of the HRB, and the capability of the remotely sensed data used in this study, a specific classification system was constructed, which is listed in Table 3. The criteria for building the classification system are as follows: (1) The classification system was a combination of the IGBP [7,8], GLC2000 [6], and GlobCover [5] systems; (2) Based on many ground campaigns at HRB, prior knowledge related to land cover from these campaigns was used to determine the classification system. (3) In order to make a consistent land cover dataset at HRB, the capability of the Landsat series of data directly determined the classification system.

Methodology
The objective of this study is to make the longest time series land cover dataset at HRB (1986-2015) with high accuracy, high spatial resolution, and excellent consistency to support the analysis of long-term climate, environmental, and ecological changes from the perspective of the regional scale or the river basin scale. The Landsat series of satellite imagery composed of eight successive satellites can be traced back to 1972 and it is, therefore, the only choice to work on it. The statistics of data availability in Table 2 and the surface reflectance composites of 1986 and 2014 in Figure 2 further prove that Landsat series satellite data are feasible for this work. However, several factors stopped us from making high-quality and excellent-consistency land cover datasets by only using Landsat series satellite data and they are concluded as follows: (1) Many comprehensive classification methods, such as GlobeLand30 [14] using Landsat series satellite data have been developed but the ones with high accuracy usually require manual intervention; thus, more than 10 years' land cover maps with areas over 140,000 km 2 do not allow a lot of manual intervention. (2) Although methods based on time series analysis like LCMM have the capability for making high-quality land cover datasets [19], a 16-day revisiting period of Landsat series satellite data does not support constructing time series data at HRB. (3) Although machine learning methods, such as FROM-GLC30 [9] have a lot of advantages and have been employed to map land cover using remote sensing imagery, they are usually limited by sampling amount and representativity; the requirement for consistency on data of these methods cannot be met by Landsat series satellite data (see Figure 2), while they are applied to make a long time series land cover dataset.
In order to solve the above problems, we proposed a new time series land cover mapping method based on machine learning by taking the advantages of multiple satellite data and the well explored HRB is taken as the experimental site for land cover mapping and thereafter validation. The procedure of the proposed method is illustrated in Figure 3 and the major idea is described as follows: (1) The new satellite data with high frequency, such as HJ-1/CCD and Sentinel2/MSI, were firstly used to construct monthly time series surface reflectance and they were subsequently used in the LCMM method [19] for time series analysis to make the high-quality land cover datasets at HRB from 2011-2015. In this study, the land cover dataset at HRB from 2011-2015 was made, and publicly released at the Western Data Center of China (http://westdc.westgis.ac.cn/, accessed on 19 April 2021) [19] and this dataset was well tested by many applications under the support of the HiWATER project [31], so it was directly used in this study. This takes advantage of the newly available satellite data. (2) Instead of making the land cover year by year for 10 years, the machine learning method was chosen as the classifier to lower the labor and time costs. While employing machine learning, the training sample is always the key to its performance and it usually requires a lot of labor for manual sampling; subsequently, an automatic sampling strategy was established in this study to retrieve enough accurate training samples from the high-quality land cover dataset at step 1 by comprehensively using the land cover maps from all five years. The details of the strategy are presented in Section 2.4.2.

Land Cover Dataset at HRB from LCMM Method
LCMM is a comprehensive land cover mapping method using multiple classifiers and multisource remotely sensed imagery. Multisource remotely sensed data have advantages in spatial resolution (VHSR images from Google Earth), temporal resolution (monthly HJ-1/CCD images), and spectrum (Landsat/TM). In the meantime, multiple classifiers including time series analysis, SVM, thresholding, object-based method, and decision trees were all employed for different classification purposes. All the classifiers and data were successfully integrated by LCMM, and a land cover dataset at HRB with a high accuracy of over 90% was made in a simple and efficient way, which has been largely downloaded (463 downloads on 12 January 2020) from the datacenter website and widely used for different applications and scientific research, such as vegetation parameter retrieval [36], eco-hydrological modeling [37], land process modeling [38] and so on. However, only land cover maps in 2011-2015 were made because of data availability. In addition, this land cover dataset is monthly, so it needs to be aggregated to a yearly map based on the classification system in Table 3 before using and the mapping rules for different classes can be found in Table 3. In addition, the accuracy can be improved further after 2015, because the Sentinel-2/MSI data from ESA and GF1/6-WFV data from China can compose a higher frequency of time series images with better quality.

Automatically Sampling Strategy from 2011-2015 Land Cover Dataset
Now that high-quality land cover maps at HRB in 2011-2015 are available, the training samples for machine learning methods can be automatically retrieved through random stratified sampling based on the land cover maps. However, the land cover maps are not 100% accurate, so the errors in land cover maps will lead to the wrong samples and they will subsequently degrade the quality of final land cover maps. Therefore, an automatically sampling mechanism needs to be established to further improve the accuracy of samples. The following sampling rules will further guarantee the sampling accuracy from the sample amount, sample refining, and sample distribution; Figure 4 illustrates the sampling procedures.
(1) Based on previous research, land cover classification for large areas requires a larger number of samples and training samples are better when proportional to their areas [39,40]. The authors of [39] indicated that the training sample size should account for approximately 0.25% of the study area. HRB's area exceeds 140,000 square kilometers and the samples were, therefore, close to 400,000 pixels. However, the barren land at HRB has an area more than 80% of the total area and most of the samples for barren land can be greatly reduced without degrading the training; therefore, 50,000 samples were selected for training purposes while considering the calculation efficiency and classification accuracy. (2) The varied pixels in five years have relatively low confidence and only pixels with a consistent category in all five years are therefore used for sampling. This rule further confines the land cover map for sampling to guarantee the accuracy of the final samples. (3) The confined land cover map was objectized and the number of samples was distributed to each object (land cover feature). For each object, 50% of the samples were randomly distributed to the central part of the object and the others were randomly distributed to the part close to its boundary. The central part of an object has typical characteristics such as the corresponding land cover and the boundary is usually easy to be confused with the neighboring land cover; therefore, this rule will improve the classification of boundaries. Based on the above rules, the samples were retrieved and they were subsequently put into machine learning models (the random forest model was chosen in this study and the reasons will be discussed in Section 2.4.4.) for training.

Sample Transferring Strategy for Earlier Years
The trained random forest model looks perfect to be used for land cover mapping directly in earlier years (before 2011); however, the seasonal surface reflectance composites for each year were not as consistent as expected (see Figure 2) and the accuracy by directly using the trained model was subsequently degraded. Figure 5 gives an example of land cover classification based on machine learning transferring. The model used in 2010, 2005, and 2000 was the same one that was trained by the samples collected in Section 2.4.2. It is obvious that the results are not temporally consistent, especially for barren land and forests. In addition, the large number of samples did not allow manual checking. Therefore, a sample transferring strategy for earlier years needs to be established to lower the labor and time costs while guaranteeing the sampling accuracy. The major procedures for the sample transferring are as follows: (1) The trained random forest model was applied to each year's seasonal surface reflectance composites to produce a land cover map as a reference. (2) The training samples in Section 2.4.2. were compared with the land cover reference map from step 1 and the unmatched samples were removed from the sample collection. (3) The surface reflectance composites were automatically checked to remove those samples whose surface reflectance was abnormal, such as noise, cloud contamination, and cloud shadow. (4) Finally, if the number of samples was lower than the requirement, some new samples would be manually added to correct the amount of the samples. Although some manual work is required, only a few samples need to be added, which greatly reduced the labor and time costs while compared to all-labor sampling.
Based on the above procedures, the samples for every year before 2011 were collected and were subsequently put into the random forest model for training. Finally, the land cover maps before 2011 were made.

Machine Learning Model Selection
After training samples are collected, many classifiers can be employed to implement land cover classification, such as maximum likelihood classifier (MLC) [41,42], Support Vector Machine (SVM) [43] classifier, and Random Forest (RF) classifier [44][45][46]. The advantages of MLC include simplicity, computational efficiency, and robustness, but its accuracy is usually limited by its simplicity. The SVM classifier has been widely used for its outstanding performance in remote sensing classification, especially for instances with fewer samples [47]. Much research has reported that the RF classifier performs well [48]; in addition, the RF classifier is robust and accurate while dealing with high-dimensional data, such as multi-spectral and multi-temporal remote sensing images [44,49,50], which is more suitable for the multi-feature dimensions constructed in this paper. Therefore, the RF classifier was employed as the training model and the parameters were set as follows: the number of trees was 100 because it proved it can achieve a better result when considering the classification accuracy and efficiency; the number of variables per split was set to 0 (default), which means the square root of the number of variables; the min leaf population is 1 (default) and the bag fraction was 0.5 (default).

Classification Results
Based on the procedure in Section 2., the land cover maps from 1986-2015 were produced and they are shown in Figure 6.

Validation
The procedure for validating the classification results is as follows: (1) Randomly sample from the classification map by land cover types. The sample number for each class was determined by the area ratio of the class. The sampling details are shown in Table 4.  Table 4 gives an example of the confusion matrix of 2014. The overall accuracy of classification in 2014 is 93.68%, and the kappa coefficient is 0.92. Because of its easy confusion with forests and grassland, shrubland had the lowest accuracy, whose producer's accuracy (PA) was only 77.78%. Except that, the user's accuracy (UA) of grassland and bare land is a little bit lower than 90%, the PAs and UAs of the other classes are all over 90%.  The OAs and Kappa coefficients for each year are listed in Table 5 and it shows that almost all of OAs are over or close to 90%. An average accuracy of 90.32% and an average kappa coefficient of 0.88 were achieved. The OA in 1986 is 85.2% and the result is degraded by the lack of data. Therefore, the proposed method is effective by combining high-precision samples with historical data to produce a land cover dataset. In order to further verify the land cover dataset in this study, the land cover maps at HRB from both GlobeLand30 and the proposed method were compared at close views in 2000 and 2010 at the first place, which is shown in Figure 7. Based on the visual comparison, the advantages of the proposed method can be concluded as follows: (1) The classification accuracy of the proposed method is much higher than that of the GlobeLand30. For example, the small villages on A and C are accurately classified in our map and they are completely missed in the GlobeLand30 one. (2) The temporal consistency is better than that of the GlobeLand30. B1 (2000) and D1 (2010) from the GlobeLand30 were very different; B1 had a large area of shrub and water, but D1 did not. In contrast, B3 (2000) and D3 (2010) from the proposed method are very consistent. (3) Waterbodies can be better classified with seasonal composites in our method. Only limited data used in GlobeLand30 caused the waterbodies to not be discerned.
In order to better illustrate the temporal consistency of our land cover maps at longer time series, the time series of major land covers, such as cropland, built-ups, and waterbodies for every five years from 1986-2015 were plotted. Figure 8 shows two close-view examples of time series land covers from 1986-2015. The cropland and built-ups have been continuously increasing. Furthermore, the total areas at HRB for every five years since 1986 for the major land covers, such as built-ups and urban, snow and ice, forest, and cropland, were calculated and plotted in Figure 9. The increasing cropland and built-ups are consistent with the increasing human activities. The decreasing of snow and ice is strongly related to the global warming trend. The time series analysis of land covers further supports the effectiveness of the proposed method for long time series of land cover mapping.

Conclusions and Discussions
In this paper, a time series land cover mapping method is proposed to produce long time series of a land cover dataset with high accuracy and consistency, especially for earlier years with fewer and lower quality data. The proposed method takes the advantages of time series Landsat images and the high-quality land cover datasets from the LCMM method; the high-quality land cover datasets from LCMM are used for quickly locating the accurate training samples and the RF classifier is employed to train the collected samples of each year to finally get the land cover maps. Based on the comprehensive validation, an average classification accuracy of 90.32% and an average kappa coefficient of 0.88 are achieved, which is suited for most of the applications and research at HRB. Compared to some other land cover datasets, such as LCMM, GlobeLand30 [11], and FROM-GLC30 [9], the proposed method is more applicable for long time series analysis while land process modeling because of the following advantages: (1) It has the longest time series land cover dataset at HRB with 30 m spatial resolution, which starts from 1986. (2) It has an average classification accuracy of over 90% and has high temporal consistency, making it the best land cover map at HRB among the available ones. (3) The automatic strategy for collecting training samples from high-quality land cover maps and transferring samples to earlier years makes it efficient and accurate. Therefore, the proposed method provides a solution for making high-quality land cover maps of earlier years, even though new and high-quality data are not available.
Although the new method is developed for HRB, the methodology can be extended to other regions, which is the next plan of our research. While the new method can be applied to other regions, the land cover map from LCMM needs to be made for the first time and it will take a lot of work; therefore, instead of making land cover maps from LCMM, the strategy for transferring the training samples at HRB to other regions needs to be explored in the perspective of practical use.