Identifying the Contributions of Multi-Source Data for Winter Wheat Yield Prediction in China

: Wheat is a leading cereal grain throughout the world. Timely and reliable wheat yield prediction at a large scale is essential for the agricultural supply chain and global food security, especially in China as an important wheat producing and consuming country. The conventional approach using either climate or satellite data or both to build empirical and crop models has prevailed for decades. However, to what extent climate and satellite data can improve yield prediction is still unknown. In addition, socio-economic (SC) factors may also improve crop yield prediction, but their contributions need in-depth investigation, especially in regions with good irrigation conditions, su ﬃ cient fertilization, and pesticide application. Here, we performed the ﬁrst attempt to predict wheat yield across China from 2001 to 2015 at the county-level by integrating multi-source data, including monthly climate data, satellite data (i.e., Vegetation indices (VIs)), and SC factors. The results show that incorporating all the datasets by using three machine learning methods (Ridge Regression (RR), Random Forest (RF), and Light Gradient Boosting (LightGBM)) can achieve the best performance in yield prediction (R 2 : 0.68~0.75), with the most individual contributions from climate (~0.53), followed by VIs (~0.45), and SC factors (~0.30). In addition, the combinations of VIs and climate data can capture inter-annual yield variability more e ﬀ ectively than other combinations (e.g., combinations of climate and SC, and combinations of VIs and SC), while combining SC with climate data can better capture spatial yield variability than others. Climate data can provide extra and unique information across the entire growing season, while the peak stage of VIs (Mar.~Apr.) do so. Furthermore, incorporating spatial information and soil proprieties into the benchmark models can improve wheat yield prediction by 0.06 and 0.12, respectively. The optimal wheat prediction can be achieved with approximately a two-month leading time before maturity. Our study develops timely and robust methods for winter wheat yield prediction at a large scale in China, which can be applied to other crops and regions.


Introduction
Wheat (Triticum aestivum L.) is the leading cereal grain in the world, providing the most calories and protein for world food supply [1,2].China is the major wheat producing and consuming country globally [3][4][5][6], with winter wheat reaching about 85% of its total summer grain production [7].Wheat yield in China, however, has stagnated in recent years [7,8].Timely and reliable crop yield prediction has increasingly become one of the key issues for food security, supply chain planning of agriculture industry, and market prediction for the entire population [9,10].Furthermore, such yield estimations will also help farmers to make informed management and financial decisions in advance [11,12].In recent years, extensive studies have focused on crop yield prediction at different spatial scales by either using statistical regression models or crop models in various countries [13][14][15][16].With some explicit cause-effect relationships [7,8], regression models have been increasingly replaced by crop models due to their lower required explanation and limited spatial generalization in recent years [17].However, there are still many problems regarding crop models since they over-depend on extensive input data such as climate, cultivar, management, and soil conditions, as well as huge requirements in replicability, transparency, and code efficiency [1,16,18].Thus, the problems associated with the above two methods highlight the need for another novel approach with an ability to provide timely, reliable, and cost-effective wheat yield prediction.
Machine learning (ML) has been widely and successfully applied in various data-driven fields, such as for analysis of landslides susceptibility, image processing, and facial expression recognition [19][20][21][22][23][24], as well as in various agricultural fields such as crop classification [25,26], grassland fuel content estimation [27], and crop yield estimation [28,29].Compared with crop models and the traditional statistical methods, ML has limited process-based interpretation and can handle nonlinear relationships for crop yield prediction.Several previous studies had proved its ability in improving crop yield prediction [1,12,28,[30][31][32][33].However, such methods have rarely been tested for crop yield prediction in China.
Crop yields are affected by many variables, which are generally classified into dynamic and static variables.Climate, vegetation indices (VIs), and social-economic (SC) factors belong to the former, which can be monitored at various temporal scales [1,34].Climate and VIs are usually available through observation stations or remote sensing, while SC factors are reported in annual statistic reports of different administrative units [8,18].Variables unchanged during the growing season are the static ones, including spatial information (e.g., latitude and longitude) and soil features.Distinct spatial patterns of crop yields have substantiated strongly that these static variables have impacts on final yields [1,35,36], implying their potential roles in improving wheat yield estimation and gap analysis [37].
Satellite remote sensing has been successfully used for monitoring crop growth, tillage differentiation, and yield prediction owing to spatiotemporal capture of earth's surface across various spectral bands [38].Among them, visible and near-infrared data have the most advantages in monitoring crop growth for estimating crop yield by developing various vegetation indices [38][39][40][41].Since Tucker et al. (1980) developed the first Normalized Difference Vegetation Index (NDVI) [42], several popular VIs have been extensively applied in the agricultural field (e.g., NDVI, enhanced vegetation index (EVI), green chlorophyll vegetation index (GCVI)) [18,25,40].Given the spectral indices can dynamically capture crop growing conditions through various combinations, and these various products have commonly shared and complementary information to contribute to yield prediction, we focused on three VIs associated strongly with yield: EVI, GCVI, and NDVI, to determine their contributions to wheat yield prediction in China [43][44][45][46].In addition, climate variables (e.g., temperature and precipitation) are the primary inputs for crop yield prediction, which can capture the environment information [1,34].
With exception of dynamic variables (e.g., VIs, climate data) monitoring natural environment states, SC factors also play an important role in crop yield and production [47,48], especially irrigation, fertilizer application, pesticide use, farm mechanization status, and others.For example, adopting the best farm management practices could increase crop yield [49,50]. Understanding how changes in farm management practices can benefit crop yield prediction, especially for spatial analyses of yield [47,49].However, very few studies have considered the contribution of SC factors for crop yield prediction at a regional scale so far.Although the availability and accessibility of multi-source data with potentially valuable information for agricultural applications increase, very few studies have quantified their shared and unique contributions to predicting wheat yields and their spatiotemporal variations across China.Therefore, we first adopt exploratory data analysis (EDA) to select variables that are significantly related to crop yield.Then a multiple linear regression [ridge regression (RR)] and two machine learning methods-Random Forest (RF) and Light Gradient Boosting (LightGBM)-are applied to estimate wheat yield and compare their yield prediction performance by using different combinations of input variables with different growing stages.In this paper, the following questions need to be solved: (1) how can we integrate such multi-source data (climate, satellite, and SC factors) to derive the best wheat yield prediction model to explain the spatio-temporal variation of yields?(2) how much will such unique and shared information from temporal dynamic data improve wheat yield prediction across China?(3) how do ML techniques perform compared with the common regression technique for crop yield prediction? (4) to what degrees will the static variables (spatial information and soil properties) improve wheat yield prediction in China?

Study Area
In this study, the main winter wheat producing areas in China-North China Plain (NCP) (Figure 1) was selected as our research study, which covers 398 counties in five provinces (Hebei, Shandong, Henan, Jiangsu, and Anhui).This region, between 29 • 41 N~42 • 40 N and 111 • 21 E ~122 • 43 E, is one of the important food baskets in China, with a total wheat cultivation area of 16.18 × 10 4 ha, accounting for 67.02% of the total wheat area and 75.79% of the total wheat yield in China [17,51].The region mainly has a semi-humid climate, with the total annual precipitation from 400 to 800 mm, and the total amount is relatively low for winter wheat growth.Therefore, the abundance and shortage of water resources greatly affects winter wheat yield, and the social and economic development factors (e.g., agricultural irrigation, fertilizer, and pesticide consumption) play a great role in crop yield [17,51]. Winter wheat grown across this region is generally sown at the beginning of October (spring) and harvested at the end of May or the beginning of June (autumn) in the following year [17].
Remote Sens. 2020, 12, 750 3 of 22 unique contributions to predicting wheat yields and their spatiotemporal variations across China.Therefore, we first adopt exploratory data analysis (EDA) to select variables that are significantly related to crop yield.Then a multiple linear regression [ridge regression (RR)] and two machine learning methods-Random Forest (RF) and Light Gradient Boosting (LightGBM)-are applied to estimate wheat yield and compare their yield prediction performance by using different combinations of input variables with different growing stages.In this paper, the following questions need to be solved: 1) how can we integrate such multi-source data (climate, satellite, and SC factors) to derive the best wheat yield prediction model to explain the spatio-temporal variation of yields?2) how much will such unique and shared information from temporal dynamic data improve wheat yield prediction across China? 3) how do ML techniques perform compared with the common regression technique for crop yield prediction? 4) to what degrees will the static variables (spatial information and soil properties) improve wheat yield prediction in China?

Study Area
In this study, the main winter wheat producing areas in China-North China Plain (NCP) (Figure 1) was selected as our research study, which covers 398 counties in five provinces (Hebei, Shandong, Henan, Jiangsu, and Anhui).This region, between 29°41′N~42°40′N and 111°21′E ~ 122°43′E, is one of the important food baskets in China, with a total wheat cultivation area of 16.18 × 10 4 ha, accounting for 67.02% of the total wheat area and 75.79% of the total wheat yield in China [17,51].The region mainly has a semi-humid climate, with the total annual precipitation from 400 to 800 mm, and the total amount is relatively low for winter wheat growth.Therefore, the abundance and shortage of water resources greatly affects winter wheat yield, and the social and economic development factors (e.g., agricultural irrigation, fertilizer, and pesticide consumption) play a great role in crop yield [17,51]. Winter wheat grown across this region is generally sown at the beginning of October (spring) and harvested at the end of May or the beginning of June (autumn) in the following year [17].

Dataset and Preprocessing
The yield records, cropping area, climate, SC, and satellite variables were obtained from various sources (Table 1).We first unified spatial and temporal resolutions of climate and satellite data into 1 km and monthly interval, respectively.Then crop maps were used to mask climate and satellite data, and aggregate mean monthly variables at the county-level.
Table 1.Summary in the collected datasets for winter wheat yield prediction in China.The Tmax, Tmin, Pre, Vpd, and Vap refer to maximum temperatures ( • C), minimum temperatures ( • C), precipitation (mm), vapor pressure deficit (kPa), and vapor pressure (kPa), respectively; IA, CCF, CAP, TPAM, and ECRA refer to the irrigation area (ha), the consumption of chemical fertilizers (ton), the consumption of agricultural pesticide (CAP), the total power of agricultural machinery (TPAM), and the electricity consumed in rural areas (ECRA), respectively.

Crop Yield and Area
The winter wheat yields at county-level were obtained from the Agricultural Yearbook of each county (http://www.stats.gov.cn)from 2001 to 2015 (unit: kg/ha).A preliminary data quality check was conducted to identify and filter the outliers: (i) those that fell outside the range of biophysical attainable yield records. (ii) those that fell outside the range of the mean from 2001 to 2015 plus or minus two times of standard deviation [17,52,53].Furthermore, the accurate information of annual winter wheat regions is vital to predicting yield, but is not available in the public domain.The National Land Cover Dataset (NLCD) of China, with a spatial resolution of 1 km for 2005, 2010, and 2015, was provided by Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (http://www.resdc.cn/).These datasets are generated by visual interpretation and digitization of Landsat TM/ETM+ images with an overall classification accuracy above 90% [54,55].The typical cropping system of this region is a winter wheat-summer maize rotation system, and only wheat is sown in dryland during our study period [7,8].Therefore, we extract dryland in cultivated land as the winter wheat areas in the three time periods and then use the three generated area maps as the broad wheat cropping areas (Figure 1).According to the statistical data from Agricultural Yearbook of each province (Figure A1), we assume that there is no significant crop area change in the five years.Therefore, the crop coverage map of 2005 was used to mask the explanatory variables (i.e., climate, satellite data, and soil properties) for 2001-2005, 2010 for 2005-2010, and 2015 for 2010-2015, respectively.

Satellite Data
Moderate Resolution Imaging Spectroradiometer (MODIS) Terra MOD09A1 Version 6 with 500 m and 8-day resolution (https://doi.org/10.5067/MODIS/MOD09A1.006) from 2000 to 2015 were retrieved from GEE (Google Earth Engine) platform.The GEE platform hosts remotely sensed imagery and other data that are freely available.NDVI is the widest indicator to monitor crop growth and predict yield since the early 1980s [42,56], which is highly correlated with the vegetation vigor and canopy background variations [46].EVI can improve sensitivity in high biomass regions and indicate a potential photosynthetic capacity due to its usefulness in estimating the fraction of absorbed photosynthetically active radiation (fPAR) related to chlorophyll contents [44].GCVI has been found to have the most significant linear relationship with the leaf area index (LAI) compared with other VIs [43,45].Therefore, we chose three commonly used VIs-NDVI [42], EVI [44], and GCVI [43] to approximate aboveground vegetation dynamics associated with biomass and photosynthesis.The three indexes were calculated as where NIR, RED, GREEN, and BLUE represent reflectance at near infrared, red, green, and blue wavelengths, respectively.Here, the values of coefficients for EVI were set at G = 2, C1 = 6, C2 = 7.5, and L = 1.

Climate Data
Meteorological forcing data were obtained from the TerraClimate dataset (http://doi.org/10.7923/G43J3B0R) within the GEE platform.This monthly dataset generated from climatological aided interpolation, with a high spatial resolution (1/24 • , ~4 km) for global terrestrial surfaces from 1958-2018, was produced by Abatzoglou et al., (2018).Compared with the Climatic Research Unit (CRU) and other coarser resolution gridded datasets, TerraClimate data showed significant improvements in the overall mean absolute error and increased spatial realism over maximum (Tmax), minimum temperatures (Tmin), precipitation (Pre), vapor pressure (Vap), and other variables [57].

Socio-Economic Factor
Except for climate and satellite data, socio-economic features could also improve winter wheat yield prediction, especially in the NCP.Winter wheat in this region is generally grown under non-limiting conditions, with sufficient irrigation, and fertilizer applications [51].Therefore, the irrigation area (IA; unit: ha), the consumption of chemical fertilizers (CCF, unit: ton), the consumption of agricultural pesticide (CAP, unit: ton), the total power of agricultural machinery (TPAM, unit: kw), and the electricity consumed in rural areas (ECRA, unit: kwh) at the county-level were obtained from the Agricultural Yearbook of each county (http://www.stats.gov.cn)from 2001 to 2015 to assess the influence of socio-economic factors on yield prediction.

Other Datasets
The DEM (digital elevation model) with 90 m × 90 m was obtained from the Shuttle Radar Topography Mission (SRTM) digital elevation dataset.Soil properties [58] including soil depth, soil texture, organic carbon content, pH, cation exchange capacity, and bulk density for the topsoil layer (0-30 cm) and the subsoil layer (30-100 cm) at 0.00833 • (~1 km) were also collected, which were gathered from a soil particle-size distribution dataset in China (http://globalechange.bnu.edu.cn).

Methods
Two high-performance-based machine-learning methods (RF and LightGBM) and a traditional multiple regression method (RR) were applied to winter wheat prediction in the NCP.First, all the variables and yields from 2001 to 2015 were normalized to have mean zero and unit standard deviation.Therefore, all of the variables in the models are at a common level and are comparable [34].Second, the whole dataset across all years and counties were randomly divided into two parts: training (70%) and testing (30%) datasets.Then, the cross-validated R 2 was calculated by applying 10-fold cross-validation and GridSearchCV packages in Python 3.7 to optimize hyper-parameters for each method from empirical candidates only using the training data [1].Finally, we employed the three optimized models on the testing dataset and calculated the predicted R 2 .In order to minimize the uncertainty of R 2 , the whole process for one predicted R 2 was repeated 100 times to calculate the mean predicted R 2 , which was used to evaluate the performance of different models.All the R 2 in the following sections refer to the mean predicted R 2 .The results of those models were utilized to compare with each other.Here, Python 3.7 was applied to develop the three models.The following sections describe the three methods in details.

Ridge Regression (RR)
RR is one of the most popular multiple linear regression methods proposed by Hoerl and Kennard (1970).Compared with other linear regression methods, this algorithm introduces a method for showing in two dimensions the effects of nonorthogonality, namely ridge trace [59], which locates the minimum residual sum of squares of the prediction and limits the sum of squares of the regression coefficients [60,61].The alpha (Regularization strength) is the main hyper-parameter that needs to be tuned.RR method is mainly applied to multiple regression data with high correlations.When correlation occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value.By adding a degree of bias to the regression estimates, RR can reduce the standard errors.Thus, the RR performs well and results in parsimonious models.Since the input variables (i.e., climate and VIs in different months) have correlations, it is rational to use the RR model for predicting yield.

Random Forest (RF)
RF, first developed by Breiman (2001), is an ensemble learning method, which creates many regression trees that are generated by a large set of decision trees for computing regression [62].The RF algorithm has been applied to research as a powerful tool for yield prediction [1,33].Each in the RF algorithm increases diversity among the regression trees by selecting a random set of variables and samples of the dataset over the different tree induction processes [63].The number of trees in the forest, the tree number of predictive variables used to split the nodes, and the maximum depth of tree are three user-defined hyper-parameters that needed to be tuned [1].The RF usually performs overall better and is more accurate than any of decision tree [63], as the bias is compensated for by the single decision tree due to the randomness.Furthermore, RF can effectively deal with high-dimensional datasets [1], which can analyze these datasets, in our research (e.g., five monthly climate variables and three monthly indices derived from the satellite data over seven months).

Light Gradient Boosting Machine (LightGBM)
The LightGBM, generated by Ke and colleagues in 2017, is a relatively new machine learning method, which is also an ensemble learning method and is mainly based on the Decision Tree algorithm [64].This method has been applied to many different kinds of data mining tasks (e.g., classification, regression, and ordering) and exhibits excellent accuracy [65].The LightGBM algorithm contains two novel techniques: the gradient-based one-side sampling and the exclusive feature bundling, respectively, which are convenient for dealing with a number of data instances and a number of features.Therefore, compared with other similar algorithms (e.g., eXtreme Gradient Boosting) [66], LightGBM can significantly perform better in terms of computational speed and memory consumption.The number of leaves per tree, the speed of iteration, the maximum depth of the tree, the minimum number of the records for a leaf, the fraction of features selected randomly in each iteration, and the fraction of data to be used for each iteration are the main parameters that needed to be tuned in the LightGBM algorithm [66].

Experiment Design
Three experiments were designed to identify the contribution of multi-source data to improving wheat yield prediction and answer the four questions by using RR, RF, and LightGBM mentioned above.The details about those experiments are shown in the following sections.

The First Experiment to Separate the Spatial and Temporal Variations of Yields and Combine the Explanatory Ones Differently
To distinctly capture the spatial and temporal patterns, three options for yield were performed (Figure 2).Option 1: all recorded yields and input variables were selected in a county as a sample, namely "Raw".Such raw yield essentially includes both spatial and temporal features of yield variability.Option 2: Linear trends of the raw yield and input variables at the county level (i.e., detrend) were removed.Then we obtained the corresponding detrended datasets, namely "Detrend", which thus removed their spatial variabilities at the county-level while the remaining temporal variabilities remained both in yield and input variables when compared with raw values [34].Comparing the results between Option 1 and 2 can reveal their distinctions in explaining spatial and temporal variability.Option 3: Using the average of the raw data covering 2001-2015 for each county plus the "Detrend" (Option 2) to obtain another new dataset, namely "Detrend+Mean".It maintains the spatial variability when compared with the "Detrend" dataset, but removes the multi-year linear trend when compared with the raw one [34].

Experiment Design
Three experiments were designed to identify the contribution of multi-source data to improving wheat yield prediction and answer the four questions by using RR, RF, and LightGBM mentioned above.The details about those experiments are shown in the following sections.To distinctly capture the spatial and temporal patterns, three options for yield were performed (Figure 2).Option 1: all recorded yields and input variables were selected in a county as a sample, namely "Raw".Such raw yield essentially includes both spatial and temporal features of yield variability.Option 2: Linear trends of the raw yield and input variables at the county level (i.e., detrend) were removed.Then we obtained the corresponding detrended datasets, namely "Detrend", which thus removed their spatial variabilities at the county-level while the remaining temporal variabilities remained both in yield and input variables when compared with raw values [34].Comparing the results between Option 1 and 2 can reveal their distinctions in explaining spatial and temporal variability.Option 3: Using the average of the raw data covering 2001-2015 for each county plus the "Detrend" (Option 2) to obtain another new dataset, namely "Detrend+Mean".It maintains the spatial variability when compared with the "Detrend" dataset, but removes the multi-year linear trend when compared with the raw one [34].The first sub-group experiment is designed to investigate how climate and VIs during different growing stages-separated by early (Oct.-Feb.: the emergence to green-up period), peak (Mar.-Apr.: the green-up to heading period) and late (May-Jun.: the heading to maturity period) stages [51]-contribute to yield prediction (the second question).Then to evaluate the influence of climate and VIs from different growing stages on crop yield estimation, VIs (climate) data from the whole growing season but only the climate (satellite) data during one specific growing stage in the following six options WERE used.Option 1: only using climate data during the early growing stage ("early climate + all VIs data"); Option 2: only using climate data during the peak growing stage ("peak climate +all VIs data"); Option 3: only using climate data during the late growing stage ("late climate +all VIs data").Option 4: only using satellite data during the early growing stage ("early VIs + all climate data"); Option 5: only using satellite data during the peak growing stage ("peak VIs + all climate data"); Option 6: only using satellite data during the late growing stage ("late VIs + all climate data").The predicted R 2 values of the new models were compared with the corresponding benchmark and determined climate (VIs), at which stage we could have more additional contributions to the final yield prediction.
The second sub-group experiment was focused on the temporal progress of the model performance after inserting another monthly variable.For any month during the growing season, the input information from the current month and all previous months since the beginning of the growing season was used to predict the final wheat yield.The predicted R 2 from different models based on different combinations of inputs (i.e., climate data only, VIs data only, and all data combined) were obtained and compared to quantify the added values of either climate data or VIs data over the time for final wheat yield prediction.

The Third Experiment to Investigate the Values of Static Variables on Yield Prediction, and to Validate the Model Performances
The third experiment was designed to determine whether adding spatial and soil proprieties information can improve the performance of wheat yield prediction.There were three static spatial information variables (DEM, latitude, and longitude) and 14 soil proprieties (topsoil and the subsoil layer) in the following two options: Option 1: only adding spatial information ("With spatial features"), Option 2: only adding soil features ("With soil features").Their predicted R 2 with those of benchmark models (Option 3) was compared and investigated to obtain the additional values of spatial and soil features for predicting wheat yield.
Finally, the "leave-one-year-out" validation was performed from 2001-2015 to assess the generalization ability of those models.Specifically, One-year data were used for testing and all the other years for training.For example, the 2001-2014 data were chosen as training data to predict the crop yield in 2015, and the data for 2001-2013 and 2015 were applied as training data to predict the crop yield for 2014.

Exploratory Data Analysis (EDA)
The correlations between the variables and wheat yields were conducted by using the 2001-2015 raw data (Table 2).The results show that all variables are significantly correlated with yield (p < 0.01), hence they would be included to develop yield models.Tmax (0.03) and Tmin (−0.08) are negatively and positively correlated with yields, respectively.However, a negative correlation (−0.4) is found between Pre and Yield, suggesting that good irrigation conditions in the NCP should have offset the adverse impacts from insufficient precipitation.Some related studies show that irrigation conditions there have been able to complement the normal water requirements during winter wheat growing seasons since the 1980s.Not surprisingly, consistently positive relations are indicated by all SC factors, highlighting that the increasing inputs from the socio-economic measures benefit winter wheat yield, especially IA (0.09), TPAM (0.16), and CCF (0.16).Similarly, the consistently positive impacts of VIs further substantiate that the remote sensing derived data could explain the variability of crop yields.The fifteen-year-averaged (2001-2015) spatial patterns of the VIs, climate, and SC factors in March are plotted (Figure 3).The similar spatial patterns among NDVI, EVI, GCVI, and Yield also support the above findings.

The Performances of Multi-Models for Predicting Wheat Yield
The results from the first experiment (Figure 4a) distinguish both the shared and unique information of the different predictor variables.A ranking of those combinations from high to low ranges from: "VI + Climate + SC" (R 2 , 0.68~0.75)> "VI + Climate" (0.63~0.73) > "Climate +SC" (0.54~0.70) > "VI + SC" (0.50~0.55) > "Climate" (0.45~0.58) > "VI" (0.41~0.47) > "SC" (0.30~0.43).As expected, the model performances increase with more input variables.For example, the models with VIs, climate and SC data together perform best for raw crop yields.However, the models with only climate data achieve the highest R 2 , followed by those with VIs and SC factors.

Quantifying Unique and Shared Information from Climate and VIs
The results of the first sub-group of the second experiment were shown in Figure 5 to indicate how sequential information (climate and VIs) during different growing stages (early, peak, and late stages) contribute to crop yield prediction.The models developed by all VIs with a specific climate data show an increased R 2 ranging from 0.05 to 0.25, especially for LightGBM models (Figure 6a).Moreover, the RR models perform better than RF and LightGBM models when only using VIs data (the black dashed lines), but do worse when using both VIs and climate data.The different performance might be that RR characterized by a linear model could not fully capture the no-linear relationships between climate variables and yield.Additionally, all the R 2 of three models have improved significantly (bars above the dashed line in Figure 5a) after combining a specific climate data with VIs, suggesting the climate inputs across the whole growing seasons could provide unique and added information for better yield estimation.
On the other side, the increased R 2 (0.01-0.15) with all of climate variables and a specific VIs (Figure 6b) is smaller than those in Figure 5a, except for the RR models with peak VIs.More interestingly, the largest improvement of R 2 (0.07~0.15) is found in the models with the peak VIs, while smaller changes for those with the other two stages, especially for the late VIs.The roles of VIs are different because the peak-value vegetation index reflects most crop growth states suffering from both biotic and abiotic stresses, while crop at "Late" stage enters the senescence stage and vegetation index is not a good indicator of biomass and final yield anymore.

Quantifying Unique and Shared Information from Climate and VIs
The results of the first sub-group of the second experiment were shown in Figure 5 to indicate how sequential information (climate and VIs) during different growing stages (early, peak, and late stages) contribute to crop yield prediction.The models developed by all VIs with a specific climate data show an increased R 2 ranging from 0.05 to 0.25, especially for LightGBM models (Figure 6a).Moreover, the RR models perform better than RF and LightGBM models when only using VIs data (the black dashed lines), but do worse when using both VIs and climate data.The different performance might be that RR characterized by a linear model could not fully capture the no-linear relationships between climate variables and yield.Additionally, all the R 2 of three models have improved significantly (bars above the dashed line in Figure 5a) after combining a specific climate data with VIs, suggesting the climate inputs across the whole growing seasons could provide unique and added information for better yield estimation.
On the other side, the increased R 2 (0.01-0.15) with all of climate variables and a specific VIs (Figure 6b) is smaller than those in Figure 5a, except for the RR models with peak VIs.More interestingly, the largest improvement of R 2 (0.07~0.15) is found in the models with the peak VIs, while smaller changes for those with the other two stages, especially for the late VIs.The roles of VIs are different because the peak-value vegetation index reflects most crop growth states suffering from both biotic and abiotic stresses, while crop at "Late" stage enters the senescence stage and vegetation index is not a good indicator of biomass and final yield anymore.We further summarize in the temporal progresses of the three models' R 2 more detail using different combinations of input variables (Figure 6a-c).The consistent temporal patterns have been indicated by all R 2 trajectory curves that R 2 would increase with more input data being included until it reaches a saturation at the "Peak" stage (i.e., April).The combinations of inputs perform better than only individual either VIs or climate data.Overall, only climate inputs obviously outperform unique VIs, except for the RR model in the "Later" trajectory.Moreover, both climate and their combinations start with a relatively high performance (R 2 > 0.4) for the RF and LightGBM models and R 2 increases gradually (0.1~0.3) with the growing season moving on, with the largest increase in June (Figure 6b-c).However, only VIs inputs start with a much lower performance (~0.1) and achieve larger increases in performance as crops become mature (R 2 increase in 0.3~0.4)(Figure 6a-c).These results might be associated with our modeling framework.For example, the research is explained at a county scale with each county as an independent sample; and all models predict both spatial and temporal variability in the study.More specifically, climate information during the early season usually captures some spatial patterns of yield.Temperature gradient in space is largely maintained from the early to the late growing season.Therefore, early season temperature will capture some yield patterns in space if the yield is correlated with temperature.Consequently, the relatively high predictive performances of models using climate inputs during the early season would attribute largely to their spatial patterns that can capture some spatial variability of yields.In contrast, the early VIs are consistently low in space and provide less information about the spatial pattern of yields.We further summarize in the temporal progresses of the three models' R 2 more detail using different combinations of input variables (Figure 6a-c).The consistent temporal patterns have been indicated by all R 2 trajectory curves that R 2 would increase with more input data being included until it reaches a saturation at the "Peak" stage (i.e., April).The combinations of inputs perform better than only individual either VIs or climate data.Overall, only climate inputs obviously outperform unique VIs, except for the RR model in the "Later" trajectory.Moreover, both climate and their combinations start with a relatively high performance (R 2 > 0.4) for the RF and LightGBM models and R 2 increases gradually (0.1~0.3) with the growing season moving on, with the largest increase in June (Figure 6b-c).However, only VIs inputs start with a much lower performance (~0.1) and achieve larger increases in performance as crops become mature (R 2 increase in 0.3~0.4)(Figure 6a-c).These results might be associated with our modeling framework.For example, the research is explained at a county scale with each county as an independent sample; and all models predict both spatial and temporal variability in the study.More specifically, climate information during the early season usually captures some spatial patterns of yield.Temperature gradient in space is largely maintained from the early to the late growing season.Therefore, early season temperature will capture some yield patterns in space if the yield is correlated with temperature.Consequently, the relatively high predictive performances of models using climate inputs during the early season would attribute largely to their spatial patterns that can capture some spatial variability of yields.In contrast, the early VIs are consistently low in space and provide less information about the spatial pattern of yields., the prediction at any specific month contains input data covering the period from the beginning of the growing season to that specific month, thus the later period contains more inputs and usually has a higher performance).Blue refers to the model performance of using input sources including climate data and VIs; orange for VIs only inputted; and gray for climate data only.The right panel shows the differences of model performance between combined input sources and VIs only (a blue column indicates the benefits from VIs, calculated by subtracting the orange line from the blue line in the corresponding left panel); and the differences between VI + climate and climate only (a gray column indicates the benefits from VIs, calculated by subtracting the gray line from the blue line in the corresponding panel).
In addition, the increased values of the individual dataset over time are also calculated (Figure 6d-f).The distinctly consistent patterns of the declined contributions of climate data as wheat grow from the "Early" to the "Late" stages, while increased ones for VIs.These results suggest that as the growing season progresses, the roles from climate have been replaced gradually by those of VIs.Of course, it is worth noting the obvious difference between the roles of climate data in Figure 6 and those in Figure 5.We could primarily ascribe such a disparity to the different experimental design.Figure 5 shows the increased roles of a specific climate/VIs during different stages combining with all VIs/climate information from the whole growth stages, while Figure 6 indicates the accumulative roles of the individual dataset at different monthly scales., the prediction at any specific month contains input data covering the period from the beginning of the growing season to that specific month, thus the later period contains more inputs and usually has a higher performance).Blue refers to the model performance of using input sources including climate data and VIs; orange for VIs only inputted; and gray for climate data only.The right panel shows the differences of model performance between combined input sources and VIs only (a blue column indicates the benefits from VIs, calculated by subtracting the orange line from the blue line in the corresponding left panel); and the differences between VI + climate and climate only (a gray column indicates the benefits from VIs, calculated by subtracting the gray line from the blue line in the corresponding panel).
In addition, the increased values of the individual dataset over time are also calculated (Figure 6d-f).The distinctly consistent patterns of the declined contributions of climate data as wheat grow from the "Early" to the "Late" stages, while increased ones for VIs.These results suggest that as the growing season progresses, the roles from climate have been replaced gradually by those of VIs.Of course, it is worth noting the obvious difference between the roles of climate data in Figure 6 and those in Figure 5.We could primarily ascribe such a disparity to the different experimental design.Figure 5 shows the increased roles of a specific climate/VIs during different stages combining with all VIs/climate information from the whole growth stages, while Figure 6 indicates the accumulative roles of the individual dataset at different monthly scales.

The Effects of Spatial Information and Soil Properties on Improving Yield Estimation
Comparing with the roles of dynamic variables (climate and satellite data) for improving yield estimation, those of two types of static variables are shown in Figure 7.The results show that both spatial information and soil properties contribute significantly to wheat yield prediction (an increased R 2 by 0.05-0.16),especially for spatial information (Figure 7).The underlying reasons might be that elevation, longitude, latitude, and soil properties can characterize county-specific features over a long time, while dynamic factors (climate, VIs, SC factors) capture dynamic states related to yield during the crop growth season.For example, high-yield areas are usually characterized by fertile soils, lower elevations, and other suitable environmental conditions and vice versa for low-yield areas.All such features can be comprehensively represented by spatial information and soil properties, rather than by VIs, climate, or SC factors.

The Effects of Spatial Information and Soil Properties on Improving Yield Estimation
Comparing with the roles of dynamic variables (climate and satellite data) for improving yield estimation, those of two types of static variables are shown in Figure 7.The results show that both spatial information and soil properties contribute significantly to wheat yield prediction (an increased R 2 by 0.05-0.16),especially for spatial information (Figure 7).The underlying reasons might be that elevation, longitude, latitude, and soil properties can characterize county-specific features over a long time, while dynamic factors (climate, VIs, SC factors) capture dynamic states related to yield during the crop growth season.For example, high-yield areas are usually characterized by fertile soils, lower elevations, and other suitable environmental conditions and vice versa for low-yield areas.All such features can be comprehensively represented by spatial information and soil properties, rather than by VIs, climate, or SC factors.

The Best Combinations of Explanatory Variables to Explain Spatial or Temporal Variability of Wheat Yield
In the study, three levels of yield data were separated, namely "raw", "Detrend", and "Detrend + Mean" (Figure 4).The models, driven by different combinations of inputs (i.e., climate, VIs, and SC), show disparate performances to track spatial or temporal crop yield variability (Figure 4).For example, the combination of VIs and Climate data can best capture inter-annual yield variability, but SC and climate combination capture spatial yield variability, suggesting that VIs and climate are key factors controlling inter-annual yield variability, and SC and climate for yield variability among counties [1,53,67].These results might be caused by the fac that the information on VIs and climate variables during a specific period might related to the crop yield for that period, while others such as irrigation and fertilizer application can characterize different features among counties and over a long time.No doubt that the combination of the three datasets can predict yield most accurately for tracking both spatial and temporal yield variability (Figure 4); the reason for this accuracy may be

The Best Combinations of Explanatory Variables to Explain Spatial or Temporal Variability of Wheat Yield
In the study, three levels of yield data were separated, namely "raw", "Detrend", and "Detrend + Mean" (Figure 4).The models, driven by different combinations of inputs (i.e., climate, VIs, and SC), show disparate performances to track spatial or temporal crop yield variability (Figure 4).For example, the combination of VIs and Climate data can best capture inter-annual yield variability, but SC and climate combination capture spatial yield variability, suggesting that VIs and climate are key factors controlling inter-annual yield variability, and SC and climate for yield variability among counties [1,53,67].These results might be caused by the fac that the information on VIs and climate variables during a specific period might related to the crop yield for that period, while others such as irrigation and fertilizer application can characterize different features among counties and over a long time.No doubt that the combination of the three datasets can predict yield most accurately for tracking both spatial and temporal yield variability (Figure 4); the reason for this accuracy may be that crop growing status is not only affected by abiotic factors, but also by biotic factors [68,69].For example, high-yielding can be characterized by fertile soils, water conditions, well-educated farmers, good technologies, well-equipped irrigation facilities, and suitable climate conditions.Thus, combinations of multi-source data may be sufficient to predict yield.Additionally, the results also demonstrate capturing inter-annual yield variability is more difficult and challenging than spatial pattern of crop yield.Our findings are in-line with some previous studies [1,70].
Our results also demonstrate our approaches' capability of predicting wheat yield with a lead time up to two months before the maturity (Figure 6) in China, which shows the R 2 reaches its highest value around May.The best model, i.e., LightGBM, can achieve a yield prediction of 0.79 for the predicted R 2 in Oct. The model performance and the lead time of our prediction are consistent with or better than the existing prior work, for instance, Cai et al. (2019) shows their yield model can achieve an R 2 ranging from 0.5 to 0.73 with about two months of lead time before the harvest time.

The Unique and Shared Contributions of Different Data Sources for Predicting Crop Yield
The dynamic variables such as the climate and satellite (especially Visible-NIR based VIs) data have been the domain sources for crop monitoring and yield prediction [34,36,71].Understanding their shared and unique contributions will better predict and monitor crop yield more scientifically and accurately.First, about 53% and 45% of the raw crop yield variability can be explained by only using climate information and VIs, respectively, and a combination of VIs and climate information can attain a better performance, which is better than previous research (Figure 4) [34].Second, the increased R 2 from a specific VIs combing all climate are much smaller than those from a specific climate combing VIs (Figure 5).The finding further substantiates that climate data supplies more information than VIs for accurately estimating wheat yield, or that the information from VIs includes mixed information of land surfaces, rather than that only related to winter wheat growth [8].Given that the climate data and satellite data have large overlapping amounts of information, our study reveals their unique individual contribution.Third, regarding the contributions of dynamic variables (VIs and climate data) during different stages (the early, peak, and late stages), peak VIs can provide more information than the other two stages (Figure 5b).It is generally accepted that peak VIs strongly associate with biotic or abiotic factors, which eventually determine final yields [17,72].However, no such differences have been indicated by climate data, suggesting that unique individual information exists across the whole season (Figure 5a), rather than in a specific stage.The results further suggest that climate variables over the entire growing season play vital roles in determining the final wheat yield across China for the past 15 years.Moreover, the contributions for wheat yield of other critical factors such as SC factors, soil properties, and spatial information were identified and isolated for the first time.Those factors can also contribute to explain more yield variability, proving the hypothesis from previous research [1,34]

Method Comparison
The non-linear ML methods (RF and LightGBM) perform overall better than the traditional linear method (RR) in yield prediction (Figures 4-8), and such results have been reported in previous studies [19,63].This could be explained by the non-linear relationships between variables and crop yields [1,34].The following findings in the current study would further substantiate this finding: 1) for estimating the raw yields (Figure 4a), the RF and LightGBM models performed much better than the RR model for all combinations including climate data (e.g., climate, climate + SC, climate + VIs, climate + VIs + SC); 2) the RF and LightGBM models have improved more than the RR model when the raw yields are estimated by using climate data rather than by using VIs and SC factors (Figure 5a); 3) the added value of climate data after being combined with all VIs for RR models is smaller than for the RF and LightGBM models, suggesting the linear RR model could not fully capture the non-linear yield responses to climate data.Such responses had been extensively indicated by temperature and precipitation [25,34].In addition, recent rapid development in artificial intelligence has led to increased development of deep learning (DL) algorithms, which has been successfully applied in various domains and obviously outperform other techniques (e.g., ML).Therefore, DL could then be used for crop yield estimation in further research [65,73].
performances in 2007 and 2002 might have been caused by some extreme events, which significantly lowered yields.Since these models in the study were developed based on a very small number of event records, condition variables capturing the events might not have been sampled during the historical training, and thus the yield predictions in such extreme years will inevitably deviate from the reality.Such findings also highlight that more efforts and a novel method should be focused on yield loss estimation for the extreme yields.

Some Limitations of this Study
The first limitation is that using the dryland (one of cultivated land) as winter wheat planting area could lead to errors when we extract the area for the input variables (e.g., satellite and climate data), and so for when the aggregated yield is needed [1].Therefore, detailed winter wheat cover information will best isolate the relevant areas from the input variables.Using the land use data for 2005, 2010, and 2015 to mask cropland cover is feasible to some degree, but the wheat growing area changes yearly [1].Hence, it is highly essential in future studies to produce more accurate crop classification each year [1,74] to reduce potential errors.Secondly, we only used vegetation indices (VIs) from optical and near infrared.Although these VIs can provide indicators of photosynthetic canopy cover or leaf area index [34,41], other satellite data encompassing diverse spectral ranges contain complementary information on crop growth and yield.Many other options might further improve the crop yield prediction [25,34], such as solar-induced fluorescence (SIF), thermal-based To validate the model's generalization, a series of "leave-one-year-out" experiments were conducted by using the best combinations of all variables (e.g., VIs, climate and SC factors), and the results were shown in Figure 8.The highest R 2 was found in both 2011 (0.76~0.84) and 2014 (0.78~0.83), while the lowest R 2 in 2007 (0.48~0.55), followed by 2002 (0.58~0.66).The worse performances in 2007 and 2002 might have been caused by some extreme events, which significantly lowered yields.Since these models in the study were developed based on a very small number of event records, condition variables capturing the events might not have been sampled during the historical training, and thus the yield predictions in such extreme years will inevitably deviate from the reality.Such findings also highlight that more efforts and a novel method should be focused on yield loss estimation for the extreme yields.

Some Limitations of this Study
The first limitation is that using the dryland (one of cultivated land) as winter wheat planting area could lead to errors when we extract the area for the input variables (e.g., satellite and climate data), and so for when the aggregated yield is needed [1].Therefore, detailed winter wheat cover information will best isolate the relevant areas from the input variables.Using the land use data for 2005, 2010, and 2015 to mask cropland cover is feasible to some degree, but the wheat growing area changes yearly [1].Hence, it is highly essential in future studies to produce more accurate crop classification each year [1,74] to reduce potential errors.Secondly, we only used vegetation indices (VIs) from optical and near infrared.Although these VIs can provide indicators of photosynthetic canopy cover or leaf area index [34,41], other satellite data encompassing diverse spectral ranges contain complementary information on crop growth and yield.Many other options might further improve the crop yield prediction [25,34], such as solar-induced fluorescence (SIF), thermal-based Evapotranspiration (ET), QuikSCAT Ku-band radar backscatter, AMSR-E X-band passive microwave Vegetation Optical Depth (VOD), and so on.For example, the SIF can provide a general proxy of plant photosynthesis [1,34,75]; the thermal bands are correlated with crop growth for closed canopies and water stress [1,38]; passive or active microwave remote sensing data provide canopy biomass; and water content [1].All those data may provide complementary information than VIs that are used here.Thirdly, our analysis was based on a relatively coarse county level.ML models are more likely to perform better by collecting more field-level data [12], hence more detailed information will be collected at the field-level for further analyses in the future.Finally, due to their complex model structure for both RF and LightGBM models (so-called black box), it is difficult for us to produce testable hypotheses that could potentially provide biological insights into crop growth and final yield.In the future, we need to combine a crop growth model with ML/DL for predicting yield, forecasting, and monitoring disasters at a large spatial scale.

Conclusions
In this study, two ML and RR models were applied to predict wheat yield in China based on climate, satellite data, and other data.The results showed that integrating climate data, satellite data and SC factors can achieve the best performance for raw wheat yield prediction.However, the combination of VIs and climate can best capture inter-annual yield variability, and combining SC with climate can capture spatial yield variability.More specifically, the satellite data can gradually capture crop yield variability, but climate information serves as a unique contribution to wheat yield prediction across the entire growing season.By adding spatially locations and soil properties into benchmark results, the performances of all models further improved, suggesting they convey unique information for prediction yield.In addition, a decent yield prediction can be obtained around two months before harvest in the NCP.This study formulates a robust modeling framework to integrate multi-source data and predict crop yield at a large spatial scale.This framework can be applied to other crops and regions.
Evapotranspiration (ET), QuikSCAT Ku-band radar backscatter, AMSR-E X-band passive microwave Vegetation Optical Depth (VOD), and so on.For example, the SIF can provide a general proxy of plant photosynthesis [1,34,75]; the thermal bands are correlated with crop growth for closed canopies and water stress [1,38]; passive or active microwave remote sensing data provide canopy biomass; and water content [1].All those data may provide complementary information than VIs that are used here.Thirdly, our analysis was based on a relatively coarse county level.ML models are more likely to perform better by collecting more field-level data [12], hence more detailed information will be collected at the field-level for further analyses in the future.Finally, due to their complex model structure for both RF and LightGBM models (so-called black box), it is difficult for us to produce testable hypotheses that could potentially provide biological insights into crop growth and final yield.In the future, we need to combine a crop growth model with ML/DL for predicting yield, forecasting, and monitoring disasters at a large spatial scale.

Conclusion
In this study, two ML and RR models were applied to predict wheat yield in China based on climate, satellite data, and other data.The results showed that integrating climate data, satellite data and SC factors can achieve the best performance for raw wheat yield prediction.However, the combination of VIs and climate can best capture inter-annual yield variability, and combining SC with climate can capture spatial yield variability.More specifically, the satellite data can gradually capture crop yield variability, but climate information serves as a unique contribution to wheat yield prediction across the entire growing season.By adding spatially locations and soil properties into benchmark results, the performances of all models further improved, suggesting they convey unique information for prediction yield.In addition, a decent yield prediction can be obtained around two months before harvest in the NCP.This study formulates a robust modeling framework to integrate multi-source data and predict crop yield at a large spatial scale.This framework can be applied to other crops and regions.

Figure 1 .
Figure 1.The location of study area.(a) The main winter wheat producing provinces (black line) and counties (orange line), which includes ~76% of the total wheat yield in China; the green shading refers to winter wheat cropping areas.(b) The scope of the China and provincial boundaries (gray line).

Figure 1 .
Figure 1.The location of study area.(a) The main winter wheat producing provinces (black line) and counties (orange line), which includes ~76% of the total wheat yield in China; the green shading refers to winter wheat cropping areas.(b) The scope of the China and provincial boundaries (gray line).

2. 4 . 1 .
The First Experiment to Separate the Spatial and Temporal Variations of Yields and Combine the Explanatory Ones Differently

Figure 2 .
Figure 2. A typical example to illustrate the differences in spatial and temporal patterns of data from Options 1-3.Option 1 uses raw county-level data (the red dashed line shows its linear fit) (a); Option 2 removes the linear trend (the red line in Option 1) from the raw data (b), and Option 3 adds the multi-year average of the raw county-level data into the Option 2 data (c); e-f express the spatial patterns of the three types of yields corresponding to each Option.

Figure 2 .
Figure 2. A typical example to illustrate the differences in spatial and temporal patterns of data from Options 1-3.Option 1 uses raw county-level data (the red dashed line shows its linear fit) (a); Option 2 removes the linear trend (the red line in Option 1) from the raw data (b), and Option 3 adds the multi-year average of the raw county-level data into the Option 2 data (c); (e)-(g) express the spatial patterns of the three types of yields corresponding to each Option.Based on the three dynamic datasets included as explanatory variables (climate, satellite, and SC factors), seven different combinations were applied to develop the models: (1) SC only; (2) VIs

Figure 3 .
Figure 3. Fifteen-year-averaged (2001-2015) spatial patterns after normalization of crop yield (a), the satellite-based variables (b-d), climate variables (e-i), socio-economic factors (j-n) and for all counties in the study region.Note: all of the data have a mean of zero and standard deviation of one; b-i are based on March of every year.

Figure 3 .
Figure 3. Fifteen-year-averaged (2001-2015) spatial patterns after normalization of crop yield (a), the satellite-based variables (b-d), climate variables (e-i), socio-economic factors (j-n) and for all counties in the study region.Note: all of the data have a mean of zero and standard deviation of one; b-i are based on March of every year.

Figure 4 .
Figure 4.The model performances (predicted R2) of the three methods separated by the three Options with different inputs for the entire growing season.(a) "Raw"; (b) "Detrend"; and (c) "Detrend+Mean".The blue color is for RR, green for RF and red for LightGBM.The error bars are one standard deviation of predicted R 2 from 100 ensembles by randomly dividing training and testing datasets.

Figure 4 .
Figure 4.The model performances (predicted R2) of the three methods separated by the three Options with different inputs for the entire growing season.(a) "Raw"; (b) "Detrend"; and (c) "Detrend+Mean".The blue color is for RR, green for RF and red for LightGBM.The error bars are one standard deviation of predicted R 2 from 100 ensembles by randomly dividing training and testing datasets.

Figure 5 .
Figure 5. (a) The model performance (predicted R 2 ) using VIs during the whole growing season and climate data during a specific stage, either for Early (Oct.and Feb.), or Peak (Mar.and Apr.), or Late stage (Jun.and Jul.).The dashed line in (a) represents the benchmark model performance by only using VIs.(b) The model performance (predicted R 2 ) using the climate data during the whole growing season and VIs during a specific stage (the same stage in (a)).The dashed line in (b) represents the benchmark model performance by only using climate data.The error bars are one standard deviation of predicted R 2 from 100 ensembles by randomly dividing training and testing datasets.

Figure 5 .
Figure 5. (a) The model performance (predicted R 2 ) using VIs during the whole growing season and climate data during a specific stage, either for Early (Oct.and Feb.), or Peak (Mar.and Apr.), or Late stage (Jun.and Jul.).The dashed line in (a) represents the benchmark model performance by only using VIs.(b) The model performance (predicted R 2 ) using the climate data during the whole growing season and VIs during a specific stage (the same stage in (a)).The dashed line in (b) represents the benchmark model performance by only using climate data.The error bars are one standard deviation of predicted R 2 from 100 ensembles by randomly dividing training and testing datasets.

Figure 6 .
Figure 6.The temporal progression of the model performance based on the three methods (RR, RF, and LightGBM).The left panel shows the temporal progress of model performance according to each month (i.e., the prediction at any specific month contains input data covering the period from the beginning of the growing season to that specific month, thus the later period contains more inputs and usually has a higher performance).Blue refers to the model performance of using input sources including climate data and VIs; orange for VIs only inputted; and gray for climate data only.The right panel shows the differences of model performance between combined input sources and VIs only (a blue column indicates the benefits from VIs, calculated by subtracting the orange line from the blue line in the corresponding left panel); and the differences between VI + climate and climate only (a gray column indicates the benefits from VIs, calculated by subtracting the gray line from the blue line in the corresponding panel).

Figure 6 .
Figure 6.The temporal progression of the model performance based on the three methods (RR, RF, and LightGBM).The left panel shows the temporal progress of model performance according to each month (i.e., the prediction at any specific month contains input data covering the period from the beginning of the growing season to that specific month, thus the later period contains more inputs and usually has a higher performance).Blue refers to the model performance of using input sources including climate data and VIs; orange for VIs only inputted; and gray for climate data only.The right panel shows the differences of model performance between combined input sources and VIs only (a blue column indicates the benefits from VIs, calculated by subtracting the orange line from the blue line in the corresponding left panel); and the differences between VI + climate and climate only (a gray column indicates the benefits from VIs, calculated by subtracting the gray line from the blue line in the corresponding panel).

Figure 7 .
Figure 7.The model performance (predicted R 2 ) after including spatial information and soil properties; the green and red columns mean the R 2 after including the soil properties and spatial information in the benchmark model (blue color), respectively.The error bars are one standard deviation of predicted R 2 from 100 ensembles obtained by randomly dividing training and testing datasets.

Figure 7 .
Figure 7.The model performance (predicted R 2 ) after including spatial information and soil properties; the green and red columns mean the R 2 after including the soil properties and spatial information in the benchmark model (blue color), respectively.The error bars are one standard deviation of predicted R 2 from 100 ensembles obtained by randomly dividing training and testing datasets.

Figure 8 .
Figure 8.The results of the "leave-one-year-out" experiment across different years.One-year data are selected for testing, while data from other years for training.The worst performance in 2002 and 2007 may be due to extreme events (black for RR, red for RF, and green for LightGBM model).

Figure 8 .
Figure 8.The results of the "leave-one-year-out" experiment across different years.One-year data are selected for testing, while data from other years for training.The worst performance in 2002 and 2007 may be due to extreme events (black for RR, red for RF, and green for LightGBM model).

Table 2 .
EDA results showing the correlations among the 13 variables and correlations between each variable and yields (last column).The input variables were grouped into three categories with different colors: red for climate-related variables, yellow for socio-economic ones, and blue for VIs.
Note: double asterisk ** and threefold asterisks (***) indicate a correlation coefficient (r) with statistical significance levels of p-value 0.01 and p-value b 0.001, respectively.