Mapping Main Grain Crops and Change Analysis in the West Liaohe River Basin with Limited Samples Based on Google Earth Engine

: It is an important issue to explore achieving high accuracy long-term crop classiﬁcation with limited historical samples. The West Liaohe River Basin (WLRB) serves as a vital agro-pastoral ecotone of Northern China, which experiences signiﬁcant changes in crop planting structure due to a range of policy. Taking WLRB as a case study, this study constructed multidimensional features for crop classiﬁcation suitable for Google Earth Engine cloud platform and proposed a method to extract main grain crops using sample augmentation and model migration in case of limited samples. With limited samples in 2017, the method was employed to train and classify crops (maize, soybean, and rice) in other years, and the spatiotemporal changes in the crop planting structure in WLRB from 2014 to 2020 were analyzed. The following conclusions were drawn: (1) Integrating multidimensional features could discriminate subtle differences, and feature optimization could ensure the accuracy and efﬁciency of classiﬁcation. (2) By augmenting the original sample size by calculating the similarity of the time series NDVI (normalized difference vegetation index) curves, migrating the random forest model, and reselecting the samples for other years based on the model accuracy scores, it was possible to achieve a high crop classiﬁcation accuracy with limited samples. (3) The main grain crops in the WLRB were primarily distributed in the northeastern and southern plains with lower elevations. Maize was the most predominant crop type with a wide distribution. The planting area of main grain crops in the WLRB exhibited an increasing trend, and national policies primarily inﬂuenced the variations of planting structure in maize and soybean. This study provides a scheme for extracting crop types from limited samples with high accuracy and can be applied for long-term crop monitoring and change analysis to support crop structure adjustment and food security.


Introduction
Crop planting structure refers to the types, area, and distribution of crops, reflecting the utilization of agricultural resources within a specific region [1], which is the prerequisite for crop area measurement [2][3][4], yield estimation, and growth monitoring [5].However, land degradation [6], extreme weather events [7], and rapid urbanization [8] pose severe threats and challenges to agricultural production activities [9].Extracting crop planting information accurately can provide crucial decision-making information for both government authorities and farmers [10], which can strongly support agricultural production, policy formulation, and ecological environment protection.It is of great importance for maintaining food security and agricultural sustainable development [11,12].
Remote sensing is widely considered an effective means to obtain crop spatial distribution information, benefiting from its extensive observation range, high temporality, and low cost [13,14].Recent years have witnessed significant improvements in the quality and diversity of remote-sensing images and the refinement of classification algorithms [15].
There also has been considerable progress in the extraction of crops from remote sensing [16].Crop identification depends on the differences in spectral, temporal, and textural features among different crop types.The land cover classification utilized a limited number of spectral features, whereas these finite features encountered problems distinguishing different crops with high spectral similarity.Several studies have demonstrated that constructing high-dimensional features (e.g., spectrum, phenology, backscattering, topography, and texture) can comprehensively identify the differences in diverse crop phonological stages, environmental conditions, and morphological characteristics to facilitate recognition improvement [17,18].For instance, You et al. [19] utilized Sentinel-2 to synthesize 10-day resolution time series images of multiple spectral indices and reflectance bands, calculated the maximum image and harmonic wave parameters of the growing season, selected the optimal feature combination, and used the random forest classifier to obtain the distribution of maize, soybean, and rice in Northeast China.Fatchurrachman et al. [20] generated monthly composite images of VH polarization and NDVI using Sentinel-1 and Sentinel-2 images and employed the K-means clustering to obtain a distribution map of rice growth stages in the Malay Peninsula.Under the constraints of limited samples, an excessive number of features can negatively impact the model's performance and efficiency [21].It is imperative to construct multidimensional features and select them prudently to guarantee a high-quality classification result.
Google Earth Engine (GEE) is the most extensively used remote sensing cloud platform, which has been applied in many research fields, such as change detection, ecological environment monitoring, and crop mapping [22,23].GEE incorporates a variety of massive datasets and offers users free access services, and it is characterized by automatic parallel processing and fast computational ability, enabling the handling of data with large spatial extents and long temporal periods [24].Utilizing a state-of-the-art cloud platform to construct and select suitable multi-dimensional features remains an essential work to be improved in crop classification.
Machine learning algorithms, such as random forest and support vector machines, have demonstrated commendable performance in many practices [25].The emergence of deep learning algorithms, epitomized by convolutional neural networks, recurrent neural networks, and fully convolutional networks, has led to further improvements in classification performance [26,27].However, these methods above rely on the quantity of samples to train a robust classification model, particularly for deep learning algorithms [28].Sample collection is currently dominated by field investigation and manual interpretation via high-resolution images, which is time-consuming and labor-intensive, making it difficult to collect data evenly in large study areas and to ensure that data are available over multiple years [29].Several studies attempted to perform classification for target years with a lack of sample data by exploiting archived samples, involving sample augmentation and feature or model migration [28,30,31].Given a similar planting structure and patterns, sample augmentation refers to generating new samples highly similar to historical samples.Zhang et al. [32] proposed a method that utilized samples from 2013 to 2017 to determine unchanged pixels as new samples for 2018.Liu et al. [29] calculated spectral similarity between the historical year and target year of samples and performed cluster analysis to produce new samples.Transferring models trained on archived reference data to identify crops in the target year is also an approach to solving this issue.Hao et al. [33] successfully transferred a random forest model that trained on cropland data to identify crop types in three test sites where training samples were scarce.You et al. [34] employed samples collected in 2017 and a random forest classifier trained early season images and achieved early stage crop identification for 2018 in the Heilongjiang province of China.The previous studies provided guidelines for crop classification when samples were scarce.However, these approaches were confined to augment samples in a single year, or the historical samples were employed for model migration to classify crop types in other years restricted to adjacent years or regions.It is sometimes impossible to obtain enough samples for consecutive years in certain regions to classify crops due to some factors, such as labor costs, bad weather, and adverse physical environment.Hence, the approach to migrate the model for achieving long-term crop classification with high classification accuracy in more challenging situations, especially when historical samples are inadequate, serves as a valuable subject for exploration.This study will explore the feasibility of combining sample augmentation with model migration to solve this situation.
The West Liaohe River Basin (WLRB), located in the agro-pastoral ecotone of Northern China, is a vital zone connecting agricultural farmland and ecological grassland, rendering it a typical ecologically sensitive and fragile territory [35].The basin serves as a traditional livestock husbandry and significant commodity grain production region in China.Due to long-term and high-intensity human activities, environmental damage occurred in recent years, such as land desertification and soil fertility decline.Additionally, the land use and crop planting structure in WLRB is constantly undergoing adjustments and changes influenced by different policies (e.g., ecological restoration projects and Sickle Bend policies) [36].This study took WLRB as the study area, aiming to (1) construct optimal multidimensional feature sets for crop classification suitable for GEE to extract the main grain crops; (2) develop a scheme to extract main grain crops using sample augmentation and model migration with limited samples; and (3) analyze the spatiotemporal changes in the crop planting structure in WLRB from 2014 to 2020.

Study Area
The WLRB is located in the southwest of Northeastern China, between 116 • 36 -124 • 34 E and 41 • 05 -45 • 12 N, connecting the Mongolia Plateau, the North China Plain and the Northeast Plain.The basin covers four provinces, including Inner Mongolia Autonomous Region, Heibei Province, Liaoning Province, and Jilin Province, with a total area of 13.8 × 10 4 km 2 (Figure 1).WLRB contains various landforms such as mountains, plains, and hills.The terrain exhibits a southwest high-northeast low pattern.The eastern part of the basin is known as the West Liaohe River Plain, located in the "World Golden Maize Belt", characterized by fertile soil, abundant water sources, and flat terrain, making it a crucial agricultural production base in China.WLRB is a transition region from a warm, semi-humid climate to a medium, semi-arid climate, which is dry and windy in spring and autumn, hot and humid in summer, and long and cold in winter.The annual average temperature ranges from 5 to 8 • C, and the average precipitation is approximately 300-400 mm.The main grain crops in WLRB consist of maize, soybean, and rice, with a single harvest per year [36].

Datasets
Landsat 8 is the eighth satellite in the Landsat series, which contains Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) with 11 bands and a revisit period of 16 days.The spatial resolution of Landsat images is 30 m.The OLI sensor consists of 9 spectral bands: coastal aerosol, blue, green, red, near-infrared, shortwave infrared 1, shortwave infrared 2, panchromatic, and cirrus.The dataset has been preprocessed through atmospheric correction and ortho correction.It is labeled as "LAND-

Datasets
Landsat 8 is the eighth satellite in the Landsat series, which contains Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) with 11 bands and a revisit period of 16 days.The spatial resolution of Landsat images is 30 m.The OLI sensor consists of 9 spectral bands: coastal aerosol, blue, green, red, near-infrared, shortwave infrared 1, shortwave infrared 2, panchromatic, and cirrus.The dataset has been preprocessed through atmospheric correction and ortho correction.It is labeled as "LANDSAT/LC08/C02/T1_L2" within the GEE.
MODIS (Moderate Resolution Imaging Spectroradiometer) is a sensor carried by the Terra satellite and Aqua satellite, consisting of 36 spectral bands.These two satellites cooperate to complete a repeated observation in 1-2 days.MOD09Q1 is an 8-day composite data surface reflectance product with 250 m spatial resolution in red and near-infrared bands.It has a quality band that can be utilized for cloud detection.The dataset within the GEE is labeled as "MODIS/061/MOD09Q1".
The SRTM DEM (Shuttle Radar Topography Mission Digital Elevation Model) is a DEM dataset jointly measured by the NASA (National Aeronautics and Space Administration) and the NGA (National Geospatial-Intelligence Agency), with a spatial resolution of 30 m.The SRTM DEM dataset with the identifier "USGS/SRTMGL1_003" within the GEE was utilized.
Sample data were collected using GPS in the WLRB in 2017, including crop type and location information.They were used for training and validation of crop classification.The spatial distribution of samples is shown in Figure 1.

Methods
The study process was divided into four parts, as illustrated in Figure 2. (1) The required data were retrieved and preprocessed on the GEE.The original sample data were augmented, and the cropland was extracted as a mask for crop classification.(2) Multidimensional features were conducted, optimal features are selected, and the random forest algorithm was used to classify crops in 2017.(3) The model trained in 2017 was transferred as a reference, samples of other years are selected, new random forest models were trained and employed annually to obtain classification results.(4) Classification accuracy was evaluated using verification samples and statistical data, and the spatiotemporal changes in the main grain crops were analyzed.

Methods
The study process was divided into four parts, as illustrated in Figure 2. (1) The required data were retrieved and preprocessed on the GEE.The original sample data were augmented, and the cropland was extracted as a mask for crop classification.(2) Multidimensional features were conducted, optimal features are selected, and the random forest algorithm was used to classify crops in 2017.(3) The model trained in 2017 was transferred as a reference, samples of other years are selected, new random forest models were trained and employed annually to obtain classification results.(4) Classification accuracy was evaluated using verification samples and statistical data, and the spatiotemporal changes in the main grain crops were analyzed.

Data Preprocessing and Cropland Extraction
The MOD09Q1, Landsat 8 OLI and SRTM DEM data were obtained from GEE, and the time and spatial filtering, cloud removal, and clipping were performed.The sample data were uploaded to the GEE.The original samples were augmented using the similarity of the NDVI time series curves.The calculation of the NDVI curve was specified in Section 3.2.The mean value of the NDVI time series of the original samples (denoted as

Data Preprocessing and Cropland Extraction
The MOD09Q1, Landsat 8 OLI and SRTM DEM data were obtained from GEE, and the time and spatial filtering, cloud removal, and clipping were performed.The sample data were uploaded to the GEE.The original samples were augmented using the similarity of the NDVI time series curves.The calculation of the NDVI curve was specified in Section 3.2.The mean value of the NDVI time series of the original samples (denoted as Mean_NDVI) was calculated for each crop type.The spectral angle distance between time series NDVI of original samples and Mean_NDVI were calculated (denoted as SAD) with Equation (1).The mean value and standard deviation of the SAD of all original samples were calculated for each crop type (denoted as Mean_SAD and Std_SAD).The spectral angle distance between pixels in the cropland and Mean_NDVI were calculated.The threshold was set to the Mean_SAD plus one Std_SAD, and the pixels with the closest distance were selected as new samples, whose spectral angle distance were smaller than the threshold.The above samples were further refined through comparing high-resolution images to remove sample points with distinct errors.This process was repeated in each type of crop until a sufficient number of samples were generated.The original sample numbers for maize, soybean, rice, and other crop types were 109, 53, 44, and 50, respectively; and they increased to 502, 346, 385, and 393 after sample augmentation.
where SAD is the spectral angle distance, and X and Y are vectors.
The land cover classification in the WLRB was carried out, where cropland was extracted as a mask image for crop classification.The classification contained six types: water, grassland, forest, cropland, construction land, and bare soil.Samples were collected on the GEE platform through manual visual interpretation.The classification features were constructed from Landsat 8 OLI, consisting of spectral bands 2 to 7, as well as spectral indices, including NDVI, EVI (Enhanced Vegetation Index), LSWI (Land Surface Water Index), MNDWI (Modified Normalized Difference Water Index), NDTI (Normalized Difference Tillage Index), NDBI (Normalized Difference Built-Up Index) and NDSVI (Normalized Difference Senescence Vegetation Index), which were calculated as percentage images (5%, 25%, 50%, 75%, 95%) during the period from April to October each year.The random forest classifier was employed to obtain the land cover maps from 2014 to 2020 in WLRB.The classification accuracy was shown in Table S1.

Multidimensional Feature Construction
The Landsat data are vulnerable to the impact of adverse weather conditions, resulting to a reduction of available pixels [37,38].It is necessary to analyze the coverage of Landsat images to provide a reference for subsequent application.A total of 22 Landsat images with different numbers covered the whole basin.Taking the crop growing season (i.e., from March to November) in 2017 for an example, a total of 342 images recorded data were discovered during the period.After the pixels affected by cloud were excluded, the maximum number of revisits in the image overlapping state was 33, and the minimum number of revisits was only 1 (Figure 3).Therefore, it is essential to recover the missing Landsat values to guarantee the existence of key features for crop classification.
This study employed the Gap Filling and Savitzky-Golay (GF-SG) algorithm [39], a high spatiotemporal reconstruction algorithm using Landsat OLI and MOD09Q1 to generate time series Landsat NDVI images from March to November with 8-day resolution.The fundamental principle of this algorithm was to utilize the NDVI shape curve provided by MODIS to calculate the missing values in Landsat.GF-SG algorithm could effectively improve the reconstruction accuracy of Landsat NDVI time series under the condition of long-term continuous missing values, and reduce the residual noise caused by cloud detection error.It could be run in GEE due to its simple and easy principle.The original Landsat NDVI curve, the de-clouded NDVI curve and the GF-SG reconstructed NDVI curve were shown in Figure 4.It could be seen that the time series NDVI curve reconstructed by the GF-SG algorithm was smooth and coherent, compensating for the long-term lack of Landsat NDVI after cloud removal.
The Landsat data are vulnerable to the impact of adverse weather conditions, resulting to a reduction of available pixels [37,38].It is necessary to analyze the coverage of Landsat images to provide a reference for subsequent application.A total of 22 Landsat images with different numbers covered the whole basin.Taking the crop growing season (i.e., from March to November) in 2017 for an example, a total of 342 images recorded data were discovered during the period.After the pixels affected by cloud were excluded, the maximum number of revisits in the image overlapping state was 33, and the minimum number of revisits was only 1 (Figure 3).Therefore, it is essential to recover the missing Landsat values to guarantee the existence of key features for crop classification.This study employed the Gap Filling and Savitzky-Golay (GF-SG) algorithm [39], a high spatiotemporal reconstruction algorithm using Landsat OLI and MOD09Q1 to generate time series Landsat NDVI images from March to November with 8-day resolution.The fundamental principle of this algorithm was to utilize the NDVI shape curve provided by MODIS to calculate the missing values in Landsat.GF-SG algorithm could effectively improve the reconstruction accuracy of Landsat NDVI time series under the condition of long-term continuous missing values, and reduce the residual noise caused by cloud detection error.It could be run in GEE due to its simple and easy principle.The original Landsat NDVI curve, the de-clouded NDVI curve and the GF-SG reconstructed NDVI  Vegetation phenology parameters were calculated from the generated NDVI curves by GF-SG algorithm, including growing season start time, growing season end time, growing season duration, maximum NDVI, minimum NDVI, NDVI amplitude and maximum NDVI date.Furthermore, the percentage (5%, 25%, 50%, 75%, 95%), mean, difference (95% -5%), and standard deviation for the Landsat OLI 2-7 bands and various spectral indices during March to November were calculated as reflectance and spectral index features.Spectral indices including NDVI, EVI, LSWI, MNDWI, NDTI, and NDSVI, were calculated from Landsat OLI.The elevation and slope were selected as topography features.The data sources, names, and numbers of the different types of features (a total of 139) are shown in Table 1.The phenological parameters of the NDVI curve and the calculation formulas of spectral indices are shown in Figure S1 and Table S2, respectively.Vegetation phenology parameters were calculated from the generated NDVI curves by GF-SG algorithm, including growing season start time, growing season end time, growing season duration, maximum NDVI, minimum NDVI, NDVI amplitude and maximum NDVI date.Furthermore, the percentage (5%, 25%, 50%, 75%, 95%), mean, difference (95% − 5%), and standard deviation for the Landsat OLI 2-7 bands and various spectral indices during March to November were calculated as reflectance and spectral index features.Spectral indices including NDVI, EVI, LSWI, MNDWI, NDTI, and NDSVI, were calculated from Landsat OLI.The elevation and slope were selected as topography features.The data sources, names, and numbers of the different types of features (a total of 139) are shown in Table 1.The phenological parameters of the NDVI curve and the calculation formulas of spectral indices are shown in Figure S1 and Table S2, respectively.Random forest (RF) is an ensemble learning algorithm composed of multiple decision trees, which randomly selects a subset of samples with release from the original training samples to train individual decision tree models [40].These individually trained weak classification models are then combined to form a random forest, and the classification result is determined by the vote of each decision tree, following the principle of "majority rules", with the highest number of votes selected as the final result.The RF is known for its simplicity and stability and performs excellently in both classification and regression.As a result, the RF classifier finds extensive applications in land cover and crop classification.
The recursive feature elimination random forest (RFE-RF) method was employed to select the appropriate number and obvious role of classification features.The importance scores of all features were calculated using RFE-RF.The features with the lowest importance scores were successively removed, and the remaining features were fed into the model for training.Given the substantial number of initial features in this study, the selection criterion for removing feature numbers was set to five at each iteration.When confronted with a limited number of samples, a straightforward partitioning of the samples into a training set and validation set in a 7:3 ratio fails to fully exploit the available data, and the obtained validation results are susceptible to the effects of the initial grouping [41].In this study, the k-fold cross-validation method was employed, and the value of k was set to 5 with representing the number of folds.The average overall accuracy and kappa coefficient were calculated across the five different training and validation sets, and then the analysis was performed to examine how the model's accuracy varies with changes in the number of features.

Multi-year Crop Classification Strategy
Referring to the crop classification scheme implemented in 2017, the images of crop classification features were generated in other years.It was worth noting that differences in atmospheric conditions, crop growth stages and sensor performances among different years could reduce classification accuracy when directly transferring the random forest model.The output mode of the random forest was set to "MULTIPROBABILITY", which allows the classification results to display the classification probability of different crop types.The random forest model trained in 2017 was employed to generate probability maps of crops for other years, with probability values ranging from 0 to 1.A higher value indicated a higher probability that the pixel belonged to a specific crop type.The probability maps of the crops were used as a reference for re-selecting the samples.The probability threshold was initially set to 0.8, meaning samples were selected from the pixels greater than this threshold.If there was a relatively sparse distribution of crops within the threshold range, the threshold was sequentially reduced by 0.1 until a sufficient number of samples were selected.A suitable number of new samples with higher classification scores were selected annually for each crop type, and the training set and verification set were divided.The new random forest model was trained to obtain the crop distribution maps from 2014 to 2020 (except 2017).

Classification Accuracy Evaluation
There were two ways to assess the classification accuracy: (1) confusion matrix, and (2) statistical data.The confusion matrix, also referred to as the error matrix, was a common method to evaluate classification accuracy by comparing the classification results with the reference data [42].The confusion matrix was used to calculate the mapping accuracy (MA), user accuracy (UA), overall accuracy (OA), and kappa coefficient, which collectively provide a comprehensive validation of the classification performance [36].
Statistical data were utilized by comparing the crop area in statistical yearbooks with the area of crop classification, including both basin scale and county scale.At the basin scale, the total classification area of different crops was compared with statistical data, where the crop area of counties with incomplete administrative unit areas in the statistics was calculated by the proportion of the basin in which the administrative unit is located.At the county scale, a linear regression was fitted between the statistical crop area and the area of crop classification for complete administrative units within the basin, yielding the determination coefficient R 2 as a metric.

Feature Importance Analysis and Feature Selection
The importance scores of 139 features were calculated.The top five features in importance were the 5% value of the green band, the 95% value of the near-infrared band, the elevation, the 75% value of MNDWI, and the mean value of MNDWI.Ranked by average importance, the various feature types were as follows: topography, time series NDVI, reflectance, spectral indices, and phenology parameters, with importance values of 2.91, 2.49, 2.48, 2.35, and 1.92, respectively.Specifically, the importance of elevation in topographic features was prominent, and the importance of slope was weak.The importance rankings of time series NDVI, reflectance, and spectral index were similar.Among these three types of features, the feature images corresponding to critical time points significantly influenced crop classification results.The phenological parameters entirely had lower importance.
The features were ranked in decreasing order of importance and fed into a random forest model.The relationship between the number of features and the overall accuracy and kappa coefficient is shown in Figure 5.As the number of features increased from 5 to 10, the overall accuracy increased from 88.68% to 92.33%, and the kappa coefficient increased from 0.85 to 0.90.As the number of features increased from 10 to 30, the overall accuracy and kappa coefficient gradually increased, reaching 94.16% and 0.92, respectively.However, after the number of features reached 30, the overall accuracy and kappa coefficient did not show a consistent increase but fluctuated with the increasing number of features.After the number of features exceeded 30, the average overall accuracy was 94.25%, with the highest being 94.94%, and the kappa coefficient at an average of 0.92, with the highest 0.93.Although the accuracy of excessive feature numbers surpassed the feature number at 30, there was not much variation in accuracy compared to the feature number of 30, differing by only 0.78% from the highest overall accuracy and 0.1 from the highest kappa coefficient.Considering that the increase in the number of features could reduce the computational efficiency, the top 30 features of importance were used.
94.25%, with the highest being 94.94%, and the kappa coefficient at an average of 0.92, with the highest 0.93.Although the accuracy of excessive feature numbers surpassed the feature number at 30, there was not much variation in accuracy compared to the feature number of 30, differing by only 0.78% from the highest overall accuracy and 0.1 from the highest kappa coefficient.Considering that the increase in the number of features could reduce the computational efficiency, the top 30 features of importance were used.

Classification Accuracy and Crop Distribution Characteristics
The crop classification accuracies from 2014 to 2020 were validated based on the confusion matrix and statistical data (Table 2).The overall accuracy (OA) and kappa coefficients were both greater than 0.90, with the highest overall accuracy reaching 0.98, and the lowest being 0.93.The maximum value of the kappa coefficient was 0.96, and the minimum value was 0.90.The mapping accuracy (MA) and user accuracy (UA) of each crop were all greater than 0.80, with the highest value at 1.00 and the lowest at 0.80.It showed that the classification results were excellent from the perspective of verification samples.At the whole basin scale of statistical validation, aside from the absolute values of the statistical errors for soybean and rice in 2020 exceeding 20%, the absolute values of errors for the other years were all less than 20%.At the county scale of statistical validation, the linear fitting coefficient (R 2 ) revealed that the accuracies of maize and rice were greater than 0.80 and 0.90, respectively.The accuracy of soybean was relatively low, with values greater than 0.70 except for 2014 and 2015.These indicators collectively demonstrated that the results exhibited a relatively high accuracy.The main grain crops in the WLRB in 2017 were predominantly distributed in the northeastern and southern plains with lower elevations, forming strip-like patterns along both sides of the rivers (Figure 6).The cropland in the region near the mainstream of the West Liaohe River and Xinkai River exhibited a high concentration.Maize was the most predominant crop type with a wide distribution, particularly concentrated in the central and eastern regions of the basin.The soybean was primarily distributed in Changling County and Shuangliao City in the northeastern part of the basin and Aohan Banner and Ongniud Banner in the southern part of the basin.The area of rice was small, and mainly distributed in the northern and eastern parts of Ongniud Banner along the Xar Moron River and Laoha River and the eastern part of Horqin Left Back Banner.

Classification Accuracy and Crop Distribution Characteristics
The crop classification accuracies from 2014 to 2020 were validated based on the confusion matrix and statistical data (Table 2).The overall accuracy (OA) and kappa coefficients were both greater than 0.90, with the highest overall accuracy reaching 0.98, and the lowest being 0.93.The maximum value of the kappa coefficient was 0.96, and the minimum value was 0.90.The mapping accuracy (MA) and user accuracy (UA) of each crop were all greater than 0.80, with the highest value at 1.00 and the lowest at 0.80.It showed that the classification results were excellent from the perspective of verification samples.At the whole basin scale of statistical validation, aside from the absolute values of the statistical errors for soybean and rice in 2020 exceeding 20%, the absolute values of errors for the other years were all less than 20%.At the county scale of statistical validation, the linear fitting coefficient (R 2 ) revealed that the accuracies of maize and rice were greater than 0.80 and 0.90, respectively.The accuracy of soybean was relatively low, with values greater than 0.70 except for 2014 and 2015.These indicators collectively demonstrated that the results exhibited a relatively high accuracy.
The main grain crops in the WLRB in 2017 were predominantly distributed in the northeastern and southern plains with lower elevations, forming strip-like patterns along both sides of the rivers (Figure 6).The cropland in the region near the mainstream of the West Liaohe River and Xinkai River exhibited a high concentration.Maize was the most predominant crop type with a wide distribution, particularly concentrated in the central and eastern regions of the basin.The soybean was primarily distributed in Changling County and Shuangliao City in the northeastern part of the basin and Aohan Banner and Ongniud Banner in the southern part of the basin.The area of rice was small, and mainly distributed in the northern and eastern parts of Ongniud Banner along the Xar Moron River and Laoha River and the eastern part of Horqin Left Back Banner.

Temporal Changes in the Area of Main Grain Crops
The area of main grain crops in the WLRB from 2014 to 2020 is shown in Figure 7.The area of main grain crops in the WLRB exhibited an increasing trend with a total area increment of 4265.74 km 2 and a growth rate of 28.08%.The total area of the crop experienced a process of increase, decrease, a significant increase, and stable increase.Specifically, the crop area increased slightly in 2015 and decreased in 2016, subsequently, there was a substantial rebound from 2016 to 2017 with an increase of 3398.43 km 2 .The crop area showed a gradual increase from 2017 to 2020, however, the annual growth rate gradually declined from 4.28% to 1.27%.

Spatial Changes in Main Grain Crops
Combining the area transfer matrix with crop distribution maps in 2014 and 2020, the spatial changes in main grain crops in the WLRB were analyzed.The distribution of main grain crops in the WLRB is shown in Figure 8, and the area transfer matrix of main grain crops is shown in Table 3.The outflow area of maize was 3886.89km 2 , the inflow area was 7295.72 km 2 , and the area with no change was 10,571.16km 2 , indicating an expansive change for maize.The mainstream of the West Liaohe River, the Laoha River, and the XinKai River basins had traditionally been the core areas for the cultivation of maize, with a relatively stable planting structure.The reduced area of maize was primarily converted into non-cropland and other crop types.The increased maize area was mainly converted from non-cropland, accounting for approximately 76.86%.It was widely distributed throughout the basin, with significant concentrations in the northern, central, northeastern, and southern regions of the basin.Among them, areas with substantial increases were particularly notable in Shuangliao City and the eastern part of Ningcheng County (as shown in border a).The areas of conversion from other crop types into maize were concentrated in the southern part of Naiman Banner.The region of maize that transformed into non-cropland was mainly concentrated in the eastern and central parts of the Horqin District and the central part of Horqin Left Middle Banner (as shown in border b), and the region transformed into other crop types was predominantly located in the eastern part of the Horqin District.
The outflow area of soybean was 314.09 km 2 , the inflow area was 1150.83 km 2 , and only 23.95 km 2 remained unchanged, indicating a dramatic change and significant expansion.The distribution of increased cultivation regions was relatively scattered.Soybeans in Horqin District and Horqin Left Middle Banner were mainly converted from maize (as shown in border c).Soybeans in Aru Horqin Banner, Aohan Banner, Weichang Manchu, and Mongolian Autonomous County were mainly converted from non-cropland.The outflow area of rice was 196.46 km 2 , the inflow area was 205.17 km 2 , and the unchanged area was 191.73 km 2 .The cultivation regions of rice with unchanged, increased, and decreased were primarily concentrated in the northern and western parts of the Wengniute Banner and the surrounding areas of the Horqin Left Rear Banner, respectively.The unchanged region of rice was located in the central part of the above two main rice cultivation areas.In terms of specific crops, the planting area of maize also demonstrated an increasing trend with a total area increase of 3416.90 km 2 .It was the primary crop type contributing to the increase in total grain crop area in the WLRB.The changing trend of maize was almost consistent with the total crop area mentioned above, and the area was gradually stabilized at 17,000-18,000 km 2 (Figure 7).The area of soybean experienced a short-term decline in 2015, reaching its lowest value at 166.10 km 2 in this period.Subsequently, it continuously increased from 2015 to 2020.It is worth noting that it exhibited a dramatic increase from 2017 to 2018, with a growth rate of 149.76% and an increase of 521.78 km 2 .Afterwards, the area of soybean began to steadily increase from 2018 to 2020 reaching more than 1000 km 2 .Although the annual growth rate gradually declined, it remained above 10% yearly.The rice area remained relatively stable, exhibiting fluctuating patterns characterized by a subtle increase followed by a slight decline.The rice area fluctuated ranging from 400 to 500 km 2 , with no particularly significant changes.

Spatial Changes in Main Grain Crops
Combining the area transfer matrix with crop distribution maps in 2014 and 2020, the spatial changes in main grain crops in the WLRB were analyzed.The distribution of main grain crops in the WLRB is shown in Figure 8, and the area transfer matrix of main grain crops is shown in Table 3.The outflow area of maize was 3886.89km 2 , the inflow area was 7295.72 km 2 , and the area with no change was 10,571.16km 2 , indicating an expansive change for maize.The mainstream of the West Liaohe River, the Laoha River, and the XinKai River basins had traditionally been the core areas for the cultivation of maize, with a relatively stable planting structure.The reduced area of maize was primarily converted into non-cropland and other crop types.The increased maize area was mainly converted from non-cropland, accounting for approximately 76.86%.It was widely distributed throughout the basin, with significant concentrations in the northern, central, northeastern, and southern regions of the basin.Among them, areas with substantial increases were particularly notable in Shuangliao City and the eastern part of Ningcheng County (as shown in border a).The areas of conversion from other crop types into maize were concentrated in the southern part of Naiman Banner.The region of maize that transformed into non-cropland was mainly concentrated in the eastern and central parts of the Horqin District and the central part of Horqin Left Middle Banner (as shown in border b), and the region transformed into other crop types was predominantly located in the eastern part of the Horqin District.The outflow area of soybean was 314.09 km 2 , the inflow area was 1150.83 km 2 , and only 23.95 km 2 remained unchanged, indicating a dramatic change and significant expansion.The distribution of increased cultivation regions was relatively scattered.Soybeans in Horqin District and Horqin Left Middle Banner were mainly converted from maize (as shown in border c).Soybeans in Aru Horqin Banner, Aohan Banner, Weichang Manchu, and Mongolian Autonomous County were mainly converted from non-cropland.The outflow area of rice was 196.46 km 2 , the inflow area was 205.17 km 2 , and the unchanged area was 191.73 km 2 .The cultivation regions of rice with unchanged, increased, and decreased were primarily concentrated in the northern and western parts of the Wengniute Banner and the surrounding areas of the Horqin Left Rear Banner, respectively.The unchanged region of rice was located in the central part of the above two main rice cultivation areas.

The Selection of Classification Features and Classifiers
Random forest was used to select the appropriate number of classification features with obvious effects in this study.To assess the superiority of the combination of random forest classifier and optimized features, we designed six feature sets using three classifiers and compared the overall accuracy among different combinations.The three classifiers included classification and regression tree (CART), random forest (RF), and support vector machine (SVM).The six feature sets were shown in Table 4.The overall accuracy of different feature sets and classifiers are shown in Figure 9.For the RF classifier, when incorporating three types of features: i.e., reflectance, spectral index, and time series NDVI, the highest accuracy was achieved at 94.74%.When the optimized features were utilized, the accuracy was 94.52%, exhibiting a minor reduction of only 0.22% compared to the highest accuracy.When all features were added, accuracy decreased to 93.64%, which was lower than the highest accuracy of 1.1%.For the CART classifier, when all features were added, the accuracy was the highest at 90.79%.The accuracy of the optimized features was the second at 89.47%, which was slightly lower than the former.Notably, all features sets of CART exhibited lower accuracy than corresponding feature sets of RF.For the SVM classifier, the accuracy of set 1-4 increased progressively with a maximum of 94.96%.However, the accuracy dropped drastically to 42.98% when all the features were added.Moreover, the optimized feature set was only 72.15%.The SVM model required a uniform data range for input features.The input of phenology and topography features disrupted the uniformity of the original data in set 4.
As the above results suggested, when processing features of multiple types and dimensions, RF classifier did not require data normalization and could achieve optimal accuracy compared with the other two classifiers.The optimized features could obtain a higher classification accuracy while using fewer features and achieve a balance between efficiency and accuracy.Therefore, selecting the combination of optimized feature with random forest classifier can not only reduce the computational complexity but also ensure higher classification accuracy.
As the above results suggested, when processing features of multiple types and dimensions, RF classifier did not require data normalization and could achieve optimal accuracy compared with the other two classifiers.The optimized features could obtain a higher classification accuracy while using fewer features and achieve a balance between efficiency and accuracy.Therefore, selecting the combination of optimized feature with random forest classifier can not only reduce the computational complexity but also ensure higher classification accuracy.

The Characteristics and Influencing Factors of Classification Accuracy
To solve the dilemma of insufficient samples during the classification, we augmented samples in 2017 by spectral angle distance.Moreover, we selected the samples from other years based on the accuracy score of the model trained in 2017 and re-trained the new random forest model annually.Although the augmented and reselected samples met the classification requirements, they were dependent on the characteristics of the original samples.The representativeness and diversity of the new samples decreased, leading to an overfitting in the model with the mapping accuracy or user accuracy being 1 in several years.Hence, utilizing statistical data for auxiliary validations was essential to avoid biased assessments of the results due to overfitting.In this study, the accuracy of classification results was verified by comparing them with statistical data, including at the basin scale and county scale (see Section 3.5).There are incomplete counties in WLRB, that is, some counties have parts of their administrative scope outside the basin.When calculating the crop area of these incomplete counties, we used the proportion of the area of these counties in the basin to convert the crop area, which inevitably brought large errors.The smaller the crop area, the larger the error may be.For example, the planting areas of rice in WLRB were 390-500 km 2 , and the errors were almost −10% or more, while the maize planting areas were over 14,000 km 2 , and the errors were relatively small (Table 2).However, the determination coefficient (R 2 ) for complete county validation was a better indicator of the accuracy between the classification results and the statistics.Because there was no area conversion matter in the verification of the complete county scale, the consistency between the classification results and the statistics could be better explained.As seen from Table 2, the R 2 values were high from 2014 to 2020, which indicated the reliability of our method and classification results.
The classification accuracy of different crops was different.The accuracy of rice and maize were consistently greater than 0.9 and 0.8 in terms of R 2 , respectively, while the accuracy of soybean was relatively low with values exceeding 0.7 for all years except 2014 and 2015.Rice had a distinct planting environment compared to the upland crops, making it easier to identify.Maize benefited from a larger number of original samples; its features were adequately preserved during the model migration.In contrast, soybeans exhibited significant planting structure changes, leading to different crop characteristics across planting years and locations.This made it more difficult to accurately identify soybeans over many years.That may be the reason why the accuracy of soybeans was lower than that of maize and rice.
The classification accuracy varied across regions.The crop area of classification and statistical data were listed in Figure 10, where the area was the average area of each district from 2014 to 2020.The maize classification area in Horqin District, Kailu County, and Yuanbaoshan District exceeded their statistical area, indicating the misclassification of non-maize cropland as maize.The statistical area of maize exceeded the classification area in western regions of the basin, such as Bairin Left Banner and Bairin Right Banner, suggesting some omissions where certain pixels of maize were not correctly identified.Similar patterns were observed for soybeans.However, regional accuracy differences of rice were less pronounced due to the concentrated distribution.The primary reason for this phenomenon may lie in the distribution of original samples which were mainly concentrated in the eastern plains of the WLRB.The classifier tended to over-extract crop features from the eastern plains region, while there were fewer original samples in the western part of the basin.There were notable variations in crop growing environments, phenology, and topography between the eastern and western parts of the basin, leading to significant differences in crop classification accuracy between the two regions.

Causes of Changes in the Area of Mian Grain Crops
The planting area of maize and soybean changed significantly during the study period.Although both of them exhibited an increasing trend, the change process was not monotonous.The governmental macro policies could explain the phenomenon.
For issues such as oversupply and low production efficiency in maize production, the Ministry of Agriculture of China promulgated the "Guiding Opinions on the Structural Adjustment of Maize in the Sickle Bend Region" in 2015 [43,44].This document explicitly stated that the cultivation area of maize should be reduced by 2020, with the total area stabilized at around 100 million mu (6.7 million hm 2 ).Afterwards, the "National Plan for Structural Adjustment of Planting Industry (2016-2020)" was released in 2016, further emphasizing the maize reduction policy in the Sickle Bend Region.The Department of Agriculture and Animal Husbandry of Inner Mongolia Autonomous Region of China actively provided specific requirements and supporting measures for maize planting structure adjustment.Under the requirements of national and local policies, the maize area in the WLRB decreased by 1500.18 km 2 in 2016.However, the maize area did not continue to decrease and experienced a significant rebound in 2017 with an increment of 3280.77km 2 .The primary causes underlying it could be attributed to two factors.Firstly, despite policy subsidies aimed at reducing maize production, the overall profitability of maize remains significantly higher than soybean owing to the influence of market factors, so farmers opted to resume and expand their maize cultivation to pursue high profit [45].Secondly, the inadequate provision of government safeguards regarding the maize production reduction policy failed to stimulate farmers' motivation to decrease their maize output [46].In 2018, fortunately, the central government was aware of these problems concerning the rebound in maize cultivation areas and once again worked with local governments to improve relevant measures, which curbed the proliferation of maize cultivation areas effec-

Causes of Changes in the Area of Mian Grain Crops
The planting area of maize and soybean changed significantly during the study period.Although both of them exhibited an increasing trend, the change process was not monotonous.The governmental macro policies could explain the phenomenon.
For issues such as oversupply and low production efficiency in maize production, the Ministry of Agriculture of China promulgated the "Guiding Opinions on the Structural Adjustment of Maize in the Sickle Bend Region" in 2015 [43,44].This document explicitly stated that the cultivation area of maize should be reduced by 2020, with the total area stabilized at around 100 million mu (6.7 million hm 2 ).Afterwards, the "National Plan for Structural Adjustment of Planting Industry (2016-2020)" was released in 2016, further emphasizing the maize reduction policy in the Sickle Bend Region.The Department of Agriculture and Animal Husbandry of Inner Mongolia Autonomous Region of China actively provided specific requirements and supporting measures for maize planting structure adjustment.Under the requirements of national and local policies, the maize area in the WLRB decreased by 1500.18 km 2 in 2016.However, the maize area did not continue to decrease and experienced a significant rebound in 2017 with an increment of 3280.77km 2 .The primary causes underlying it could be attributed to two factors.Firstly, despite policy subsidies aimed at reducing maize production, the overall profitability of maize remains significantly higher than soybean owing to the influence of market factors, so farmers opted to resume and expand their maize cultivation to pursue high profit [45].Secondly, the inadequate provision of government safeguards regarding the maize production reduction policy failed to stimulate farmers' motivation to decrease their maize output [46].In 2018, fortunately, the central government was aware of these problems concerning the rebound in maize cultivation areas and once again worked with local governments to improve relevant measures, which curbed the proliferation of maize cultivation areas effectively.The growth rate of the maize area in the WLRB experienced a gradual decline and approached a state of stability from 2018 to 2020.
The industry of soybeans in China has stagnated and even shrunk over the past 20 years, while domestic demand for soybeans continued to increase, leading to a growing reliance on international imports and a serious deficiency in self-sufficiency of soybean production [47].The Chinese government initiated the second round of the soybean revitalization plan in 2018, aiming to promote area expansion and production for the sustainable development of the soybean industry [48].The Ministry of Agriculture officially released the implementation plan for the soybean revitalization, which clearly stipulated the requirement to increase soybean area in key regions, striving for a total national cultivation area of 140 million mu (9.3 million hm 2 ) by 2020 [49].Under the guidance of these policies, the soybean area in the WLRB experienced a significant increase trend from 2018 to 2020.
Unlike maize and soybean, the distinct cultivation conditions of rice made it a considerable challenge altering the crop type in a short period.Therefore, rice maintained a relatively stable planting structure during the study period.

The Advantages and Limits of the Study
This study utilized the GF-SG algorithm with Landsat OLI and MOD09Q1 to generate an 8-day temporal resolution time series of Landsat NDVI images during the crop growing season.Compared with traditional spatiotemporal fusion algorithms, GF-SG demonstrated the capability to reconstruct the NDVI time series under consecutive data gaps and could be readily reproduced and utilized on cloud platforms, making it more convenient and efficient for large-scale and long-term research.The time series NDVI images generated by GF-SG algorithm served two purposes.On the one hand, it functioned as a feature for crop classification; on the other hand, it could be used to augment the sample by calculating curve similarity.Table 5 showed the classification accuracy with and without sample augmentation.It could be seen that OA and kappa coefficient without augmentation were 0.87 and 0.79, while the two indictors with sample augmentation reached 0.94 and 0.92, respectively.The MA and UA of maize and soybean were almost improved.It indicated that sample augmentation played a crucial role in improving crop classification accuracy.Also, in this study the classification feature set further incorporated multiple features, such as phenology, spectral indices, reflectance, and topography, to address the issue when crop's NDVI time series curves were similar.Although the selection was made only based on the similarity of time series curves when the samples were augmented, the initial selection was then filtered by the high-resolution image and the geographic location of the newly selected samples to ensure their accuracy.The results of WLRB as a case study also proved that our method were accurate and reliable.In conclusion, our approach of retraining the model based on the samples of high probability took the dynamic factors affecting the feature images across different years into account.It could utilize sample data from a single year to achieve crop classification for the consecutive years.We constructed multi-dimensional features to select the best features.On the one hand, the multi-dimensional features could complement each other, which was conducive to the identification of differences between crop types.On the other hand, and more importantly, it could reduce the computation load, especially for large-scale classification, such as land use classification and crop type classification, which was conducive to improving work efficiency.
The study had some limits.The classification process involved artificial subjective judgment in sample augmentation that may potentially lead to some errors.It could be solved by adding significant spectral indices that differentiate crops as additional criteria for future assessments, which could alleviate the limitations of relying solely on time curves for sample augmentation.The random forest is a relatively straightforward machine-learning algorithm with limited transferability.In the future, investigations could be undertaken using deep networks on small or weak sample learning when sample data are insufficient or have low accuracy.A robust migration classification model might be established to complete crop extraction in different years.The reason for the significant changes in maize and soybean areas was mainly the government's macro policies.There may be several other factors that contributed to the change in planting structure, such as market demand, trade friction, and technological level [50,51].Future research will explore the relationship between these factors and the planting structure change.This study opted to employ Lansat 8 OLI in crop mapping to investigate a crop classification scheme appropriate for Landsat-based data sources.The applicability of Landsat TM images could be explored to attain crop distribution over longer historical years in future study.Note: MA-mapping accuracy; UA-user accuracy; OA-overall accuracy.

Conclusions
In this study, optimal multi-dimensional features for crop classification were constructed based on the GEE, and the crop classification of the WLRB in 2017 was carried out; a method to extract main grain crops using sample augmentation and model migration with limited samples was developed.With limited samples in 2017, the method was used to train and classify crops in other years, and the spatial distribution of maize, soybean, and rice from 2014 to 2020 was extracted; then the spatiotemporal changes in the planting structure in WLRB during 2014-2020 was analyzed.The main conclusions are as follows: (1) Constructing time series NDVI, phenology, topography, reflectance, and spectral index as multi-dimensional features could differentiate subtle differences among crops from various aspects, and these features were suitable and simple to calculate on the GEE.Recursive feature elimination with random forest algorithms was utilized for feature selection, ensuring that the model completes crop classification with higher efficiency and accuracy.
(2) This study augmented the original sample size according to the similarity of time series NDVI curves, migrated the random forest model, and reselected samples for other years based on model accuracy scores, which effectively overcame the problem of lacking samples.It was feasible to realize a multi-year crop classification with higher accuracies in the case of limited samples.The case study showed that the classification results were accurate and reliable in showing crop layout and area by utilizing confusion matrices and statistical metrics for validation.(3) The main grain crops in the WLRB were predominantly distributed in the northeastern and southern plains with lower elevations, forming strip-like patterns along both sides of the rivers.Maize was the most predominant crop type in the basin with a wide distribution.Soybean and rice cultivation areas were relatively small.(4) The planting area of main grain crops in the WLRB exhibited an increasing trend from 2014 to 2020.Maize was the primary crop type driving the increase in crop area, soybean experienced the most dramatic increase, while rice remained relatively stable.National policies primarily influenced the variations of planting structure in maize and soybean.

Figure 2 .
Figure 2. The classification flowchart of main grain crops in the WLRB from 2014 to 2020.

Figure 2 .
Figure 2. The classification flowchart of main grain crops in the WLRB from 2014 to 2020.

Figure 3 .
Figure 3.The number of revisiting frequency of Landsat images after cloud removal in WLRB.

Figure 3 .
Figure 3.The number of revisiting frequency of Landsat images after cloud removal in WLRB.

Figure 4 .
Figure 4.The curves of Landsat NDVI from different sources.

Figure 4 .
Figure 4.The curves of Landsat NDVI from different sources.

Figure 5 .
Figure 5.The relationship between feature dimension and accuracy.Figure 5.The relationship between feature dimension and accuracy.

Figure 5 .
Figure 5.The relationship between feature dimension and accuracy.Figure 5.The relationship between feature dimension and accuracy.

Figure 6 .
Figure 6.The distribution of main grain crops in the West Liaohe River Basin in 2017.Figure 6.The distribution of main grain crops in the West Liaohe River Basin in 2017.

Figure 6 .
Figure 6.The distribution of main grain crops in the West Liaohe River Basin in 2017.Figure 6.The distribution of main grain crops in the West Liaohe River Basin in 2017.

Figure 7 .
Figure 7.The area of main grain crops in the WLRB from 2014 to 2020.

Figure 7 .
Figure 7.The area of main grain crops in the WLRB from 2014 to 2020.

Figure 8 .
Figure 8.The distribution of main grain crops in the WLRB in 2014 and 2020.Note: Border a refers to areas where maize increased prominently.Border b refers to typical areas where maize decreased.Border c refers to areas where soybean converted from maize.

Figure 8 .
Figure 8.The distribution of main grain crops in the WLRB in 2014 and 2020.Note: Border a refers to areas where maize increased prominently.Border b refers to typical areas where maize decreased.Border c refers to areas where soybean converted from maize.

Figure 9 .
Figure 9.The overall accuracy of different feature sets and classifiers.

Figure 9 .
Figure 9.The overall accuracy of different feature sets and classifiers.

21 Figure 10 .
Figure 10.The comparison of classification area and statistical area.

Figure 10 .
Figure 10.The comparison of classification area and statistical area.

Table 1 .
The names and numbers of different classification feature types.

Table 1 .
The names and numbers of different classification feature types.

Table 2 .
The classification accuracy of main grain crops in the West Liaohe River Basin.

Table 3 .
The area transfer matrix of main grain crops from 2014 to 2020.

Table 3 .
The area transfer matrix of main grain crops from 2014 to 2020.

Table 4 .
The composition type and quantity of feature sets.

Table 4 .
The composition type and quantity of feature sets.

Table 5 .
Classification accuracy with and without sample augmentation.