Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China

: Wheat is one of the main crops in China, and crop yield prediction is important for regional trade and national food security. There are increasing concerns with respect to how to integrate multi-source data and employ machine learning techniques to establish a simple, timely, and accurate crop yield prediction model at an administrative unit. Many previous studies were mainly focused on the whole crop growth period through expensive manual surveys, remote sensing, or climate data. However, the e ﬀ ect of selecting di ﬀ erent time window on yield prediction was still unknown. Thus, we separated the whole growth period into four time windows and assessed their corresponding predictive ability by taking the major winter wheat production regions of China as an example in the study. Firstly we developed a modeling framework to integrate climate data, remote sensing data and soil data to predict winter wheat yield based on the Google Earth Engine (GEE) platform. The results show that the models can accurately predict yield 1~2 months before the harvesting dates at the county level in China with an R 2 > 0.75 and yield error less than 10%. Support vector machine (SVM), Gaussian process regression (GPR), and random forest (RF) represent the top three best methods for predicting yields among the eight typical machine learning models tested in this study. In addition, we also found that di ﬀ erent agricultural zones and temporal training settings a ﬀ ect prediction accuracy. The three models perform better as more winter wheat growing season information becomes available. Our ﬁndings highlight a potentially powerful tool to predict yield using multiple-source data and machine learning in other regions and for crops.


Introduction
Wheat (Triticum aestivum L.), as one of the three top grains (wheat, rice, and corn) and one of the most productive cereals in the 21st century [1], provides the most calories and protein for the global food supply [2]. The accurate prediction of crop yields in advance plays an important role in the grain circulation market, famine prevention, and food security [3]. Crop yield forecasting is also valuable for managing field activities, such as fertilization [4]. China is the world's top wheat producer, accounting for 11.26% of the world's total wheat acreage and 17.98% of the world's total production [5]. Especially for winter wheat, China's production assumes an absolutely dominant role, with nearly 85% of total summer grain production [6]. Therefore, accurately and timely estimating winter wheat yield in China is highly required, considering its significant influence on agricultural development and national food security, or even global scale. eight machine learning algorithms for predicting winter wheat yield at the county scale, including K-nearest neighbor (KNN), neural network (NN), decision tree (DT), support vector machine (SVM), Gaussian process regression (GPR), random forest (RF), boost trees (BST) and bagging trees (BGT). Our main objectivities are: (1) to construct a winter wheat yield prediction framework; (2) to select the better machine learning algorithms for yield prediction; (3) to identify the best time window for the training settings of winter wheat; (4) to explore the regional differences of yield prediction and the relative importance of variables.

Study Area
The studied area extends from 100.20 • E to 121.89 • E and from 23.13 • N to 118.2 • N, consisting of 629 counties. Due to differences in climate and planting pattern, agricultural division of China are divided into nine typical types, six of which are planted by winter wheat (Figure 1). The eastern part (Zone III), as the main planting areas for winter wheat, is located in the North China Plain, with an average annual temperature of 16 • C and an annual rainfall of 800 mm. The growing season of winter wheat is nearly eight months, from the end of September to early or mid-June of the following year [29].

Data Sources
The data (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014) obtained in the study include remote sensing data, climate data, soil data, yield data and crop map. An overview of the data is presented in Table A1. First, all data were resampled to a common 1 × 1 km spatial resolution and 1-month temporal resolution. Then all variables were further masked based on the winter wheat planting areas. Finally, we calculated the averages in atmospheric influence and enhance the effects of monitoring vegetation dynamics in high biomass areas. The combination of NDVI and EVI can provide more crop information, which will contribute to crop yield prediction [41]. The two VIs during 2001-2014 in China were derived from the MOD13Q1, which has a spatial resolution of 250 × 250 m (https://ladsweb.modaps.eosdis.nasa.gov/). The recurrence period of this dataset is 16 days. The sequential NDVI and EVI were resampled to monthly scale by the MVC (maximum synthesis method).

Climate Data
Climate variables (e.g., temperature, precipitation) are important drivers for crop production [43][44][45]. Extreme temperatures and droughts show adverse impacts on crops in the context of global climate change [35,46,47]. We selected monthly maximum temperature (TMAX), monthly minimum temperature (TMIN), monthly drought index (DI) and precipitation (PRE) for predicting winter wheat yield. We used Terra Climate, a dataset of high spatial resolution (1/24 • ) monthly climate and climatic water balance for global terrestrial surfaces from 2001-2014 [48] (http://doi.org/10.7923/G43J3B0R). The GEE platform was used to process Terra Climate datasets and calculate the climate variables for each county from 2001 to 2014.

Soil Data
Soil is a crucial factor affecting crop yield, including soil water and the physical and chemical properties of soil [49]. For example, soil moisture can increase winter wheat yield in dryland [50,51]. We selected 14 variables (Appendix A Table A1) that describe the physical and chemical properties of the soil. For example organic carbon content, PH for the topsoil layer (0-30 cm) and the subsoil layer (30-100 cm) as the soil variables in this study. Soil moisture (SM) data were obtained from Terra Climate and soil physicochemical properties data from the Harmonized World Soil Database (HSWD) [52]. More details in Appendix A Table A1.

Wheat Yield Data and Planting Area
The winter wheat yield (kg/ha) data were collected from 2001 to 2014 in 629 counties in China, provided by the Agricultural Statistical Yearbook and some unpublished county-level statistics [53]. We calculated the average and standard deviation (SD) of the yield time series in each county, and then defined the observed values as outliers and removed them if they were not within the range of the biophysical attainable yields or ± 3SD [54]. The planting areas were extracted from our previous study on the phenology information of three main crops [55]. Based on the extracted wheat phenology, we defined the pixels including three key phenology stages (green-up, anthesis, and mature stages) for more than 7 years as the winter wheat planting areas. Overall, we selected 629 counties across the whole main wheat planting areas in China.

Identify the Better Time Window for the Training Settings of Wheat
Different crop growing stages contain different information, and it is essential to determine a suitable time window to retrieve wheat features for yield prediction [56]. Various environmental factors associated with crop yields may vary by the width of time-window during wheat growing season, so we conducted a controlled trial to examine the seasonal sensitivities of winter wheat yields. In China, winter wheat is generally sowed at the end of September and harvested in early or mid-June the next year [29]. To determine the ideal interval for training, we developed machine learning models during different month windows during 2001~2013 and tested them in 2014.
The starting point is triggered by the initial growing period (October and November), and the ending point is determined just before the harvest period (April and May). Accordingly, four time windows are set by combing two different starts and ends as follows: October~May, October~April, November~May, and November~April, respectively.

Machine-Learning Methods for Estimating Crop Yield
Eight advanced machine learning algorithms were applied here. All variables in the first 13 years (2001-2013) were defined as training samples and those of the last year (2014) as test samples. We standardized all the variables in the datasets by the z-score method before developing each model. All algorithms were implemented in Weka3.8 and matlab2019a.

K-Nearest Neighbor Regression
The K-nearest neighbor (KNN) approach is a type of instance-based learning, which is based on the distance of the predictor variables to the nearest training group known to the model [57]. Aha et al. firstly proposed the new framework and methodology for KNN [58]. KNN can tolerate noise and unrelated properties and has a relatively relaxed concept bias [58].

Neural Network (NN)
Neural networks consist of different elements highly interconnected and have been extensively employed in recent years. BP neural network (BPNN) is one of the most widely used artificial neural networks [59]. BPNN typically includes one input layer, one output layer, and multiple hidden layers. The input layer only inputs data, then the neurons in the hidden layer begin to analyze and process the data, and the results are transmitted to the output layer through the transfer function in the end. When dealing with the issue of nonlinear functions, BPNN would usually be well qualified to detect the complex relationship between independent variables [60].

Decision Tree (DT)
The decision tree (e.g., C4.5) is an effective tool for solving classification and regression problems and has been widely used in remote sensing application [61]. The tree consists of a root node (containing all data), internal nodes, and several leaves. Each node makes a binary decision to separate different categories until the leaf node is reached. The algorithm is non-parametric and can deal with large and complex datasets effectively without complex parameter structure [62]. C4.5 decision tree is a method of approximating discrete value function, which is robust to noisy data [63]. The confidence factor used for pruning is 0.25 and the minimum number of instances per leaf is 2.

Support Vector Machine (SVM)
SVM is a supervised non-parametric algorithm, which is characterized by using the kernels and acting on the margins [64]. During SVM regression, the input is mapped to a high-dimensional feature space using a kernel function, and then a linear regression model is constructed in the new feature space to balance between minimizing errors and overfitting [2,65]. Kernel functions (linear, polynomial, Gaussian, etc.) are one of the important hyper-parameters that need tuning. By comparing different kernel functions, the Gaussian kernel function performed the best in this study.

Gaussian Process Regression (GPR)
GPR is a generalized Gaussian probability distribution for nonlinear regressions [66] and a nonparametric method for a variety of situations, especially for high dimensional space problems. The Gaussian process is a collection of random variables whose properties are any finite number of subsets with a joint Gaussian distribution [67]. However, matrix inversion is a necessary challenge to handle, which increases the computational complexity and causes a very slow run of the model. The GPR used in this paper is based on the kernel function of exponential [67], and the parameters were optimized using Bayesian. Random forests are a combination of tree predictors and are more robust with respect to noise [68]. Each tree is built by selecting random variable sets and dataset samples, and all the trees in the forest have the same distribution characteristic. After generating a large number of individual trees, they will vote for the most popular classes. Therefore, RF shows the efficiency to handle high-dimensional datasets and avoids overfitting during the past decade [69,70]. Additionally, RF can quantify the relative importance of measured variables and is a reasonable method for variable selection [68,71].

Ensembles of Learning Machines
Ensemble methods first construct a set of classifiers and then classify new data points by voting on their predictions [72]. This approach can often perform better than any single classifier because it generates classifiers with high precision by combining diverse classifiers with lower precision, which is widely applied to solve practical problems [73]. However, it usually takes more computing time to evaluate the prediction accuracy of an ensemble model. We used two typical ensemble learning algorithms: boost trees (BST) and bagging trees (BGT) [74].

Model Evaluation
In order to evaluate the eight ML techniques, cross-validation (CV) is a widely used strategy for algorithm selection because of its simplicity, universality, and efficiency in avoiding over-fitting issue [75,76]. It is generally accepted that a model with the smallest estimation error is the best model. We used 5-fold cross-validation to select the model in this study. We adopted the root-mean-square error (RMSE), the coefficient of determination(R 2 ), and the mean absolute error (MAE) to evaluate the performance of the machine learning model, which can be calculated as follows: where n (i = 1, 2, . . . , n) is the number of samples used for machine learning model, y i is the observed winter wheat yield, y i is the corresponding mean value, f i is the predict winter wheat yield, f i is the corresponding mean value. The closer R 2 is to 1, the higher the prediction performance of the model is. Small RMSE and MAE values indicate less discrepancy within the observed yield and predicted yield.

Comparison of Training Accuracy of Winter Wheat Yield Prediction Models in Different Time Windows
In this study, eight machine learning models were trained with the observed yields and 21 variables of winter wheat from 2001 to 2013 at county level. The evaluated results, based on the five-fold cross-validation, were summarized according to different models and time windows (Figure 3). Comprehensively considering three evaluation indicators (R 2 , RMSE and MAE), RF, GPR and SVM models showed the higher accuracy, with higher R 2 (0.79~0.81) and lower RMSE (<750 kg/ha) and MAE (<531 kg/ha). Although, R 2 of other models are above 0.6, all their RMSEs were >780 kg/ha, even over 1000 kg/ha (KNN), indicating an insignificant relationship between the predicted and the observed yields, and larger errors. Thus, RF, GPR, and SVM are more suitable for winter wheat yield prediction than other algorithms in China. Moreover, we found that the training accuracy varied by time windows even with the same machine learning algorithm, especially for the RMSE and MAE. However, the time windows showed less impact on R 2 values of RF, GPR and SVM. Finally, three algorithms (SVM, GPR, and RF) were selected to establish prediction models for winter wheat yield at the county level, and all analyses later will only be based on these three models.
Remote Sens. 2019, 11, x FOR PEER REVIEW 8 of 22 over 1000 kg/ha (KNN), indicating an insignificant relationship between the predicted and the observed yields, and larger errors. Thus, RF, GPR, and SVM are more suitable for winter wheat yield prediction than other algorithms in China. Moreover, we found that the training accuracy varied by time windows even with the same machine learning algorithm, especially for the RMSE and MAE. However, the time windows showed less impact on R 2 values of RF, GPR and SVM. Finally, three algorithms (SVM, GPR, and RF) were selected to establish prediction models for winter wheat yield at the county level, and all analyses later will only be based on these three models.

Winter Wheat Yield Predictions
Based on the trained models of RF, GPR and SVM in Section 3.1, winter wheat yields of 629 counties in 2014 were predicted. The residuals of the prediction results of these models all passed Kolmogorov-Smirnov test and obey normal distribution, which showed that these regression models were acceptable ( Figure A1). The scatter diagrams of the predicted and observed yields of the models in different growing periods are shown in Figure 4. We found that the predicted and observed yields showed a good linear fit with R² of about 0.80. Such results indicated that the three machine learning models can predict the yield of winter wheat at the county level with higher accuracy, in the order is RF > GPR > SVM. Although all predicted yields were much closer to the 1:1 line, however consistent underestimations were found for all models and all time windows. Moreover, the prediction areas were overestimated for low yields observed with smaller deviations, while underestimated for high observed yields with relatively greater deviations. Nevertheless, all errors of the predicted yields are within 10%, suggesting that RF, GPR, and SVM models perform well for crop prediction at larger scales.

Winter Wheat Yield Predictions
Based on the trained models of RF, GPR and SVM in Section 3.1, winter wheat yields of 629 counties in 2014 were predicted. The residuals of the prediction results of these models all passed Kolmogorov-Smirnov test and obey normal distribution, which showed that these regression models were acceptable (Appendix B Figure A1). The scatter diagrams of the predicted and observed yields of the models in different growing periods are shown in Figure 4. We found that the predicted and observed yields showed a good linear fit with R 2 of about 0.80. Such results indicated that the three machine learning models can predict the yield of winter wheat at the county level with higher accuracy, in the order is RF > GPR > SVM. Although all predicted yields were much closer to the 1:1 line, however consistent underestimations were found for all models and all time windows. Moreover, the prediction areas were overestimated for low yields observed with smaller deviations, while underestimated for high observed yields with relatively greater deviations. Nevertheless, all errors of the predicted yields are within 10%, suggesting that RF, GPR, and SVM models perform well for crop prediction at larger scales.

Impacts of Selecting Time Windows on Prediction Accuracy
We plotted the RMSEs and MAEs of three models by the four growing periods to investigate the impacts of the selected time window on the prediction accuracy ( Figure 5). RMSEs ( Figure 5a) and MAEs ( Figure 5b) were shown in the order of November~April > October~April > Novemeber~May > October~May, regardless of the algorithms or evaluation indicators. The highest accuracy was indicated by the time window of October~May, with the smallest errors (RMSE and MAE). Thus, the closer to the sowing date (the end of September) for time window opening, and the same ending to harvesting date (June), the higher the prediction accuracy will be, which was more strongly substantiated by SVM models. The finding suggests that the prediction accuracy will be significantly improved as more observations during growing seasons are provided by such within-seasons predictor variables. Johnson [77] further pointed out that more dates included will accumulate the entire season information and the developed model would perform better than those from only some dates. Moreover, many studies have also found good relationships between wheat yield and late-season NDVI at the regional scale [78,79], suggesting NDVI prior to harvest can provide good estimates of regional yield. Beyond the above, other attributes such as climate and soil conditions during early sowing have important implications for later crop growth [80,81]. Hence, more information available during crop growing season could improve the accuracy of winter wheat yield prediction.
Moreover, we found that the sensitivities of RMSE and MAE to time window varied by algorithms, with the most significant difference for SVM, followed by GPR and RF ( Figure 5). The fewer changes in RF suggest that RF has higher generalization ability than other models [68], with relatively smaller RMSE and MAE values (6.89 and 0.19 kg/ha). It is noted that RF predicted accurately wheat yield 1~2 months in advance before harvest dates (R 2 > 0.8).

Spatial Patterns of Winter Wheat Yield Predicted
The spatial pattern of the predicted yields by three machine learning models is almost the same as that of the observed yields despite a slight difference ( Figure 6). Moreover, the higher yields are located mainly in the eastern areas, while lower yields in the western areas ( Figure 6a). However, the areas with higher yields predicted in the east are less than those of observations, especially for SVM ( Figure 6b) and GPR (Figure 6c). The eastern areas in the study are the so-called Huang-Huai-Hai plain, the main producing area of winter wheat in China, with flat terrain, fertile soil, suitable climate, and good irrigation conditions. The southwest areas are mainly dominated by rice cultivation, with steep terrain, where winter wheat yields are generally lower and planting areas are fewer. We also found that some extremely high yields (>6500 kg/ha) are difficult to predict, indicating the limitations of the machine learning technique.

Spatial Patterns of Winter Wheat Yield Predicted
The spatial pattern of the predicted yields by three machine learning models is almost the same as that of the observed yields despite a slight difference ( Figure 6). Moreover, the higher yields are located mainly in the eastern areas, while lower yields in the western areas ( Figure 6a). However, the areas with higher yields predicted in the east are less than those of observations, especially for SVM ( Figure 6b) and GPR (Figure 6c). The eastern areas in the study are the so-called Huang-Huai-Hai plain, the main producing area of winter wheat in China, with flat terrain, fertile soil, suitable climate, and good irrigation conditions. The southwest areas are mainly dominated by rice cultivation, with steep terrain, where winter wheat yields are generally lower and planting areas are fewer. We also found that some extremely high yields (>6500 kg/ha) are difficult to predict, indicating the limitations of the machine learning technique.

Comparison of Forecast Errors in Different Areas
Crop yield is driven by the interaction of management, soil, and weather conditions. The yield would vary not only from season to season but also from location to location [14]. To investigate the prediction errors comprehensively, we summarized them according to the areas and the machine learning types (Figure 7). The results showed that the errors of machine learning algorithms do vary by the agricultural zones. With exception of zone III and zone V, the errors of other agricultural zones are all <10%. Moreover, the positive errors suggest overestimations in zone IV and zone VI, and underestimation for the other four areas. Overestimated yields are generally located in the humid and semi-humid areas.
Remote Sens. 2019, 11, x FOR PEER REVIEW 12 of 22 target one as possible when building a crop yield prediction model [14]. In the meantime, optimal spatial area coverage of machine learning methods should be taken into account.

Prediction Variables and the Order of Relative Importance
To investigate the importance of different variables for the winter wheat yield prediction in China, we calculated the decreased accuracy (mean square error) after removing one variable from the RF model. The mean decrease in accuracy indicates the amount by which the random forest model's prediction accuracy would decrease if one variable is excluded. The larger the decreased accuracy for a prediction variable, the more important the variable is ( Figure 8). Mean square error represents the relative importance of the variables. Soil physicochemical properties are generally kept constant during a short period, so we haven't selected soil physicochemical variables. Importance of prediction variables is ordered as: EVI > TMIN > PRE > NDVI > SM > TMAX > DI. EVI is more important for winter wheat yield prediction than NDVI, which is consistent with the study by Bolton et al. [82]. They stated that MODIS-EVI provided relatively better results for predicting crop yields than MODIS-NDVI [82]. As two variables related to temperature, TMIN contributed more significantly than TMAX to the accurate prediction of wheat yield, implying a greater impact of TMIN on wheat yield than TMAX [35]. We further found that the prediction accuracy of machine learning algorithms does vary by agricultural zones. For example, RF shows the highest accuracy in the zone VI with 2.40% error; SVM and GPR perform the best in zone I (−4.78%, −4.39%). The accuracy which is depended on areas suggests potential other variables should be involved to develop a model. That is to say, a good model developed in one area could reduce its performance in other areas because other key variables exist controlling crop yields, such as crop variety, field managements (e.g., fertilizer application). The crop yield model is very sensitive to the studied area where we are focusing on based on some machine learning algorithms. Therefore, it is essential for us to know the source area as closely resembling the target one as possible when building a crop yield prediction model [14]. In the meantime, optimal spatial area coverage of machine learning methods should be taken into account.

Prediction Variables and the Order of Relative Importance
To investigate the importance of different variables for the winter wheat yield prediction in China, we calculated the decreased accuracy (mean square error) after removing one variable from the RF model. The mean decrease in accuracy indicates the amount by which the random forest model's prediction accuracy would decrease if one variable is excluded. The larger the decreased accuracy for a prediction variable, the more important the variable is ( Figure 8). Mean square error represents the relative importance of the variables. Soil physicochemical properties are generally kept constant during a short period, so we haven't selected soil physicochemical variables. Importance of prediction variables is ordered as: EVI > TMIN > PRE > NDVI > SM > TMAX > DI. EVI is more important for winter wheat yield prediction than NDVI, which is consistent with the study by Bolton et al. [82]. They stated that MODIS-EVI provided relatively better results for predicting crop yields than MODIS-NDVI [82].
As two variables related to temperature, TMIN contributed more significantly than TMAX to the accurate prediction of wheat yield, implying a greater impact of TMIN on wheat yield than TMAX [35]. model's prediction accuracy would decrease if one variable is excluded. The larger the decreased accuracy for a prediction variable, the more important the variable is ( Figure 8). Mean square error represents the relative importance of the variables. Soil physicochemical properties are generally kept constant during a short period, so we haven't selected soil physicochemical variables. Importance of prediction variables is ordered as: EVI > TMIN > PRE > NDVI > SM > TMAX > DI. EVI is more important for winter wheat yield prediction than NDVI, which is consistent with the study by Bolton et al. [82]. They stated that MODIS-EVI provided relatively better results for predicting crop yields than MODIS-NDVI [82]. As two variables related to temperature, TMIN contributed more significantly than TMAX to the accurate prediction of wheat yield, implying a greater impact of TMIN on wheat yield than TMAX [35].

Model Performance for Estimating Yields in Different Time Windows
We proposed the framework for yield prediction through developing a comprehensive yield prediction model driven by weather, satellite remote sensing, and soil datasets for food security decision-making. The differences in various machine learning algorithms have been discussed above. Among the three best machine learning algorithms (GPR, SVM, and RF), the average computing time of GPR is six times that of RF, but its overall prediction accuracy is slightly lower than that of RF, and the prediction ability of RF is consistently better in different time windows. These results indicate that RF is a promising method for yield prediction [22,83].
Furthermore, we found that various environmental factors related to winter wheat yield showed disparate sensitivities to the growing season [84,85], which is in accord with previous studies demonstrating that the longer time series would better reflect the diversity of growth conditions for estimating yields [84,86]. The time period selected in our study could apply to other wheat planting areas in China, and the combination of months with the optimal time window might vary by the crop phenology in different regions. Generally, the heading period of winter wheat in the study is around in April and the grain filling period in May. Remote sensing vegetation indices (e.g., NDVI and EVI) can reflect the physiological characteristics of crops in different growth stages [87,88]. The close correlations between vegetation indexes and crop yields, especially during the flowering and grain filling stages [78,89,90], have strengthened the important role of later time window for yield prediction. Higher values of vegetation indexes are generally associated with faster growth rates and higher biomass accumulation in the vegetative period, which can prolong the grain filling period by delaying leaf senescence at the maturity stage to increase yield [88,91], and vice versa. Thus, more details of vegetation growing conditions with longer time window will really seize the impacts from environments on the final yields [92].
Moreover, the maximum and minimum temperatures can also characterize the impacts of temperature during the growing season in some degree. For example, Slafer et al. [93] have pointed out that during the grain filling stage, the crop is highly sensitive to temperature, and the increase in temperature causes a contraction of grain filling time [94,95]. However, the dry grain weight (quantified as the product of time and grain filling rate) increases linearly with the extension of the filling time under a suitable temperature condition [96]. Hence heat stress during the grain filling stage will negatively affect crop yield [97][98][99] through preventing the transfer of photosynthetic products to the grains [100], or damaging photosynthesis in winter wheat leaves and causing faster senescence and shorter maturity [101,102]. In contrast, the low temperature in winter (frost damage) is a crucial stress affecting the growth of winter wheat, which usually causes damage to, or even the death of wheat seedlings, reducing the number of spikelets or seeds, consequently leading to a decreased yield [93]. Similarly, other climatic variables (e.g., rainfall, soil moisture, and drought index) during the whole growing season of wheat also contribute significantly yield prediction [103].
The weaker associations between EVI/NDVI and yields during the early growth stages, climatic factors could provide more spatial and temporal information for more accurate yield prediction. Combing the appendix role of climatic factors with significant contributions from EVI/NDVI during later growth stages, the yield models have performed robustly for predicting winter wheat yields. Thus, integrating the advantages of remote sensing vegetation indexes and climatic variables, the climate variables in the early time and the late time of the growing period may contribute more to yield forecast. We found the predictors with a longer time windows would better predict the yield because of their better characterization of the diversity of growth conditions. The result is consistent with the previous statement that the effects of different variables on wheat growth and yield are complex and diverse [104,105]. Since machine learning itself is a kind of black box, the conclusions drawn from our analysis are limited in terms of plant physiology. The application of crop process models in the future may help to elaborate the mechanism between variables and yield in more detail.

Model Performance for Regional Differences
We also found that the accuracy of yield prediction varied by the time window and across different regions. The regional disparate is mainly caused by the regional differences in environmental characters, which have been supported by the previous studies [35,85,86,106]. Ji et al. pointed out that the yield prediction accuracy of the regression models increased with the shrinking of the geographical area [86]. Larger spatial scales include more areas and greater variability in planting conditions. In our study, the variation of winter wheat yield in the Huang-Huai-Hai Plain is mainly affected by irrigation applications and shows a lower sensitivity to rainfall, while a higher sensitivity to rainfall is indicated in other rainfall-fed areas. Moreover, the sample size in diverse regions also will affect yield prediction errors [35]. We established a larger-area-scale winter wheat yield prediction model applicable to average climatic conditions in China and achieved good prediction results. However, the framework of crop yield prediction proposed by us will provide a scientific paradigm for all regions. Attention should be paid to the discrepancy in predictive variables (e.g., winter wheat cultivars) as the spatial region varies.

Feature Importance in Yield Estimations
The relative impact of a single variable cannot be quantified independently of other variables, and the RF method provides a measure for assessing the relative importance of variables to the prediction results [22]. The results show that vegetation and climate data are crucial for yield prediction. Previous studies have also emphasized the importance of vegetation index, precipitation and temperature in predicting wheat yield [22,23,107,108]. EVI can reflect the phenological characteristics, which is a significant growing state described in many crop growth models (e.g., WOFOST) [109,110]. Due to the phenological changes of plant growth under different climatic conditions, the correlation between crop biomass and remote sensing vegetation index in different phenological periods is different [111][112][113]. In our study, EVI plays more important role for estimating yield than NDVI. We might attribute the different order to the following: (1) the final crop yield is significantly related to the duration of green biomass [91], especially for the maximum biomass during heading stage [114]; (2) the saturation problem of NDVI under high biomass condition may cause inaccurate yield estimation at the heading stage (April) [88,115]; (3) EVI overcomes the problems of NDVI saturation and soil noise by decoupling of the canopy background signal and reducing the atmosphere influence, and hence EVI will improve the sensitivity to high biomass condition and canopy structure [88,90,116]. Therefore, EVI is an effective indicator to track crop phenological events and to evaluate and monitor seasonal changes of crops and evergreen vegetation [82,90,117]. Furthermore, in many other crop (corn and rice) yield prediction studies, EVI has proven consistently to be more effective than NDVI [82,88]. The important role of EVI will benefit scientists to estimate winter wheat yield more timely and accurately at a larger spatial scale to ensure regional food security.
Climate data affect crop growth rates, which directly determine the final crop yield because of its strong correlation with spike number, spike length and wheat grouting time [35]. The impact of climate on winter wheat is mainly reflected by the annual variabilities in yields. Additionally, we found that the relative importance of TMIN is higher than that of TMAX (Figure 8), in accord with the previous studies [35]. Tao et al. found that crop yields will increase with the increase of TMIN, which is mainly due to the decrease in frost damage [35]. We also found that the importance of the DI is relatively low, and the potential reasons might be the relatively small difference of DI during the whole growing season (Figure 2e).

Uncertainties in the Study
There are still some uncertainties, mainly from data sources. For example, the most limitations come from uncertainties in data quality such as the low resolution of the remote sensing, meteorological and soil datasets, which lower the predictive ability of machine learning and make the predicted yields potentially uncertain [118]. The launch of higher-resolution remote sensing satellites, such as sentinel-2 (with the maximum resolution of 10 × 10 m) [119,120], offers new opportunities to provide more accurate yield prediction. In addition, management measures of human activities such as irrigation, fertilization, and wheat varieties are very important for crop growth, which have been extensively studied in many studies [121]. For example, Ji et al. [86] believe that soil fertility is the key input variable of the artificial neural network model. Because of the limitation of data availability, these variables are only supplied in very limited places, and availability for crop yield prediction at a larger scale remains a challenge. Currently, many studies [122,123] are trying to use more new variables for yield prediction, such as solar-induced chlorophyll fluorescence (SIF) and radar. SIF can reflect crop photosynthesis and microwave remote sensing (either passive or active) can provide canopy biomass and water content. Such additional information will potentially improve yield prediction and should be considered in the future.
In contrast, we should note that no any mechanism processes of crop growth have been included by machine learning models due to their internal black box. Although machine learning is able to mine data efficiently, such unknown processes included will inevitably increase the uncertainty of model performance [85]. Alternatively, crop growth models are developed and validated by many experts during decades of researches. Crop models have characterized the internal growth and development mechanism of crops in some degree and have been widely applied with higher accuracy in many regions. Thus, combining machine learning with the crop growth model is an idea for the future study of yield prediction [123].

Conclusions
We predicted winter wheat yield at the county scale based on multi-source data and multiple machine learning models. It was found that RF, GPR, and SVM predicted wheat yields with higher accuracy, and RF demonstrated the best generalization ability among the three methods. RF model can estimate wheat yields accurately in advance (before the harvesting dates) in China. Moreover, we investigated the impact of the time window selected on the prediction accuracy and found the window has a high prediction accuracy throughout the growing period. We also found that the prediction accuracy varied by agricultural zones and algorithms, and the regional difference will affect the yield prediction accuracy. In addition, EVI is the most important predictor used in our study for winter wheat yield. We are sure the framework to forecast winter wheat yield by multi-source data and the GEE platform is generic and applicable to other crops in the world.