Prediction of Maize Yield at the City Level in China Using Multi-Source Data

: Maize is a widely grown crop in China, and the relationships between agroclimatic parameters and maize yield are complicated, hence, accurate and timely yield prediction is challenging. Here, climate, satellite data, and meteorological indices were integrated to predict maize yield at the city-level in China from 2000 to 2015 using four machine learning approaches, e.g., cubist, random forest (RF), extreme gradient boosting (Xgboost), and support vector machine (SVM). The climate variables included the diffuse ﬂux of photosynthetic active radiation (PDf), the diffuse ﬂux of shortwave radiation (SDf), the direct ﬂux of shortwave radiation (SDr), minimum temperature (Tmn), potential evapotranspiration (Pet), vapor pressure deﬁcit (Vpd), vapor pressure (Vap), and wet day frequency (Wet). Satellite data, including the enhanced vegetation index (EVI), normalized difference vegetation index (NDVI), and adjusted vegetation index (SAVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS), were used. Meteorological indices, including growing degree day (GDD), extreme degree day (EDD), and the Standardized Precipitation Evapotranspiration Index (SPEI), were used. The results showed that integrating all climate, satellite data, and meteorological indices could achieve the highest accuracy. The highest estimated correlation coefﬁcient (R) values for the cubist, RF, SVM, and Xgboost methods were 0.828, 0.806, 0.742, and 0.758, respectively. The climate, satellite data, or meteorological indices inputs from all growth stages were essential for maize yield prediction, especially in late growth stages. R improved by about 0.126, 0.117, and 0.143 by adding climate data from the early, peak, and late-period to satellite data and meteorological indices from all stages via the four machine learning algorithms, respectively. R increased by 0.016, 0.016, and 0.017 when adding satellite data from the early, peak, and late stages to climate data and meteorological indices from all stages, respectively. R increased by 0.003, 0.032, and 0.042 when adding meteorological indices from the early, peak, and late stages to climate and satellite data from all stages, respectively. The analysis found that the spatial divergences were large and the R value in Northwest region reached 0.942, 0.904, 0.934, and 0.850 for the Cubist, RF, SVM, and Xgboost, respectively. This study highlights the advantages of using climate, satellite data, and meteorological indices for large-scale maize yield estimation with machine learning algorithms.


Introduction
The global demand for the use of agricultural crops as food, feed, and bioenergy will continue to grow in the coming decades [1]. Accurate and timely estimation of yield can help make informed policies and investments in agriculture and increase market stability and efficiency [2,3]. Maize is an important cereal crop and is cultivated almost everywhere around the world. The relationships between agroclimatic input parameters and maize climate extremes, NDVI, crop-process model simulated biomass, and the Standardized Precipitation Evapotranspiration Index (SPEI) with a regression model (RF or multiple linear regression (MLR)). They found that the forecasting system based on RF was better than that based on MLR at each forecasting event.
In China, many studies have attempted to estimate crop yield [38][39][40], however, most researches were based on a single model for small-scale yield forecasting. Few studies used agrometeorological indicators such as GDD, EDD, and SPEI for yield forecasting; thus, it is necessary to explore the possibility of multiple models to predict crop yield in China using multi-source data.
All climate, satellite data, and meteorological indices have had some success in yield predictions by applying machine learning methods [20,29,36], however, previous studies about crop yield prediction still had some limitations. First, these studies paid more attention to machine learning methods such as RF and SVM. In contrast, the newest studies [1,41,42] confirmed that novel machine learning methods, such as cubist and extreme gradient boosting (Xgboost), had the advantage of improving the accuracy of estimating R. Second, satellite data such as the EVI and NDVI have been widely used in yield predictions. Still, few studies have considered the SAVI to be superior to these indices [34]. Third, these studies usually conducted their experiments at a large-scale, such as the provincial or state level [10,43], and few studies have focused on crop yield forecasting at the city level [13]. Moreover, maize is one of the major cereal crops grown around the world, providing the primary caloric and nutritional source for millions of people worldwide. Accurate and rapid monitoring and prediction of maize growth is essential to ensure food security. Hence, executing a reliable, large-scale prediction of maize yield using climate and satellite data at the city level is needed.
Here, this study combined climate, satellite data, and meteorological indices to forecast maize yield using four machine learning methods in China. This study was designed to address the following objectives: (1) to evaluate the prediction accuracies of the four machine learning algorithms by comparing seven forms of input data and discuss the highest estimating correlation coefficient (R) of maize yield prediction; (2) to explore the contributions of climate, satellite data and meteorological indices to maize yield prediction and analyze the divergence in climate and satellite data in different growth stages from maize yield prediction; and (3) to investigate the divergences in R of maize yield prediction between five maize-growing regions.

Study Area
The total land area of China is approximately 9.6 million km 2 ; specifically, mountains, plateaus, and hills account for approximately 67% of the land area, and basins and plains account for approximately 33% of the land area, with high elevation in the western part and low elevation in the eastern part. In terms of climate type, the eastern part of China has a monsoon climate (which can also be divided into subtropical monsoon climates), the northwest part has a subordinate temperate continental climate, and the Qinghai-Tibet Plateau has an alpine climate. China is one of the main producers and exporters of maize globally, accounting for 21.5% of global maize cropping area and 22.8% of global maize production [44], and it plays a major role in the global market. This study covers maize production in 213 cities, whereas nine cities are missing from 2000 to 2015, mainly in five maize-growing regions of China, which includes Northeast China, North China Plain, Northwest China, Southwest China, and South China (Figure 1). The detailed information of five regions during the maize growing season is shown in Table 1.

Materials
Here, the growing season of maize in China was defined from March to August, and the analysis focused on this period. Five types of data, city-level maize statistics, maize cropping region masks, climate data, satellite data, and meteorological indices, were used in this study.

Crop Yield
The city-level maize yields used in this study were survey data collected by the National Bureau of Statistics of China from 2000 to 2015. Due to the limited yield records, 203 cities were selected. The maize cropping regions were derived from the International Food Policy Research Institute (IFPRI) spatial production allocation model (SPAM) (2000, 2005, and 2010) beta cropland product, which was generated in collaboration with the International Institute for Applied Systems Analysis (IIASA). The SPAM [45] was on the basis of the IIASA best available cropland mask, subnation-level statistics, and a series of suitability variables [3]. These features estimated cropland distribution within five arc-minute grids by using a cross-entropy approach to 0.0833 degrees.

Materials
Here, the growing season of maize in China was defined from March to August, and the analysis focused on this period. Five types of data, city-level maize statistics, maize cropping region masks, climate data, satellite data, and meteorological indices, were used in this study.

Crop Yield
The city-level maize yields used in this study were survey data collected by the National Bureau of Statistics of China from 2000 to 2015. Due to the limited yield records, 203 cities were selected. The maize cropping regions were derived from the International Food Policy Research Institute (IFPRI) spatial production allocation model (SPAM) (2000,2005, and 2010) beta cropland product, which was generated in collaboration with the International Institute for Applied Systems Analysis (IIASA). The SPAM [45] was on the basis of the IIASA best available cropland mask, subnation-level statistics, and a series of suitability variables [3]. These features estimated cropland distribution within five arc-minute grids by using a cross-entropy approach to 0.0833 degrees.

Climate, Satellite Data, and Meteorological Indices
Eight variables, including maximum temperature (Tmx), mean temperature (Tmp), minimum temperature (Tmn), potential evapotranspiration (Pet), vapor pressure (Vap), precipitation (Pre), cloud cover percentage (Cld), and wet day frequency (Wet), were collected from the Climatic Research Unit (CRU) [46] at 1-month intervals and at a 0.5 degree spatial Remote Sens. 2021, 13, 146 5 of 17 resolution. The vapor pressure deficit (Vpd) was calculated using the method described in Cai et al. [29]. In addition, the four radiation-related variables used in this study were the diffuse flux of shortwave radiation (SDf), the direct flux of shortwave radiation (SDr), the diffuse flux of photosynthetic active radiation (PDf), and the direct flux of photosynthetic active radiation (PDr), which were developed by the satellite sensor of the Clouds and the Earth's Radiant Energy System (CERES). It was provided as a monthly product with a resolution of 1 degree.
Satellite data were derived from the Moderate Resolution Imaging Spectroradiometer (MODIS, collection 6), including the MOD13A3 EVI and the NDVI, with a 1 km spatial resolution and monthly temporal resolution. The SAVI was calculated using the bands from the MOD13A3 product [34].
Meteorological indices included the GDD, EDD, and SPEI. Daily maximum and minimum temperature from 2000 to 2015 were obtained from the climate data-sharing service system of the China Meteorological Administration (CMA) with a spatial resolution of 0.5 degrees (http://data.cma.cn). The GDD and EDD, which were a representation of effective thermal units and extreme heat events, respectively, were computed as following according to previous studies [47,48].
where T max and T min are maximum and minimum temperature ( • C), respectively. The SPEI product covering the period between 2000 and 2015 was obtained from the Spanish National Research Council (CSIC), with a monthly time resolution and a 0.5 degrees spatial resolution. SPEI, which are often defined as meteorological indices to assess meteorological drought, was calculated based on meteorological variables. Meteorological drought can reflect the characteristics of drought to some extent, while the agricultural drought often has a lag of 3-6 months after meteorological drought [49,50]. Thus, a 3month time scale was chosen as the agriculture was closely related with SPEI-3, according to Zhang et al. [51] and Fu et al. [52].
All the data were obtained for the period 2000 to 2015. First, the spatial and temporal resolutions of climate, satellite and meteorological indices variable were unified at a monthly time resolution and a 0.5-degree spatial resolution. The maize cropping regions (2000,2005,2010)

Exploratory Data Analysis
According to Cai et al. [29], exploratory data analysis (EDA) is an important step before applying machine learning algorithms; hence, EDA were applied in this study. First, EDA was conducted for 13 climate variables divided into four groups according to Cai et al. [29]: (1) radiation-related variables, (2) temperature-related variables, (3) water demand-related variables, and (4) water supply-related variables. The Pearson correlations between each variable and maize yield, and the correlations among the variables were calculated using the average value in the maize growing season by using R statistical software. In each group, the climate variables which obtained the maximum absolute correlation with yield were selected, and the climate variables that had a correlation with Remote Sens. 2021, 13, 146 6 of 17 previously selected variables in the same group below a certain threshold (0.5 in this study) were utilized according to Cai et al. [29].

Machine Learning Methods for Estimating Maize Yield
Four machine learning methods cubist, RF, Xgboost, and SVM, were used in this study. All the 13 climate variables, satellite data and meteorological indices were normalized to the range of 0 to 1 before applying each machine learning method. To generate the predicted R, the data were evenly and randomly partitioned into 10 subsets. Subsequently, nine subsets and samples were the training data, and the remaining data were selected for validation purposes. A 10-fold cross-validation method was utilized in this study, and the whole process was repeated 10 times to calculate the mean predicted R, which was used to assess the performances of different models.
The cubist algorithm [53] is a decision tree-based algorithm, which is an extension of the M5 model tree developed by Rule Quest Company. It is a type of decision tree that adopts a multivariate linear regression fitted at each end of the leaves [54]. This algorithm aims to predict the value of a variable by a series of independent variables (called attributes). This algorithm does not usually fit as well as other ensembles due to its tree being shallow but has the advantage of being able to extrapolate to abnormal values, which is useful in forecasting anomalous years. The cubist algorithm has the strength to analyze the input data for nearest neighbor correlations and can run iteratively multiple times to form committee or ensemble models [55]. Moreover, although commercial production also widely uses regression and classification approach, this algorithm was made available through R statistical software [56]. The "boot" method was used in this study and the number was set as 10.
According to LV et al. [57], a regression model was developed at each node for pruning and prediction as follows: where E is the set of data points that reached the node, Ei is the data point that resulted from splitting at the node and fell into one subspace according to the chosen splitting parameter, and sd is the standard deviation. After avoiding the overfitting problem by pruning the tree, the tree was smoothed to compensate for the sharp discontinuities caused by splitting as follows: where Y is the prediction passed to the next highest node, y is the prediction passed to the next lowest node, n is the number of training instances that reached the node below, t is the value predicted by the model at this node, and m is the smoothing constant. The RF algorithm, proposed by Breiman [58], is a bagging-based method that employs a regression tree method. It has been widely applied for prediction via the "RandomForest" package within the R software environment [59], although it lacks efficiency, especially when dealing with our large training set [60]. The final predicted value is the mean fitted response from all individual trees [61]. The ntree was set as 400 in this study. Compared to traditional decision tree-building methods, RF has the advantages of fast speed, easy adjustment to parameters, accurate results, and the characteristics of insensitivity to multiple collinearities in dealing with multidimensional features. Additionally, RF uses random sampling with replacement to build a decision tree. Thus, an overfitting phenomenon would not occur as in traditional decision trees, even if the decision tree was not pruning [62]. For the training dataset drawn at random from the distribution of the random vector X, Y, and an ensemble of classifiers h 1 (X), h 2 (X) . . . , and h k (X), the margin function is presented as follows: where I(.) is the indicator function. The margin measures the extent to which the mean number of votes at X, Y for the right class exceeds the mean vote for any of the other classes. The Xgboost model is a special variety of gradient boosting algorithm proposed by Chen and Guestrin [63]. It was constructed by building sequential trees, where one tree is added at a time to optimize the objective further. The model aims to develop a strong learner by combining all of the predictions for a set of weak learners through additive training strategies based on the idea of boosting. The model is characterized by parallel calculations to improve computational speed. Xgboost inherits the high fitting capability of ensemble trees, which is extremely efficient (at least ten times faster than RF to train) [64]. In this study, the parameters of "nrounds" was set as 50, the number of parallel trees was set as 100, the verbose was set as 2, and the "maximum depth" was set as 10. The general function of the prediction at step t is presented as follows: where x i is the input variable and f t (xi) and f t i are the learner and predictions at step t, respectively.
To prevent overfitting problems without reducing the computing time of this algorithm, the objective functions are presented as follows: where n denotes the number of observations, l indicates the loss function, and Ω represents the regularization in the form of Equation (8).
where γ designates the minimum loss needed to further partition the leaf node, λ denotes the regularization parameter, and ω represents the vector of scores in the leaves. The SVM is a supervised nonparametric statistical learning algorithm proposed by Cortes and Vapink [65] and has been used in numerous applications for crop yield prediction [1,42]. The basic method of SVM is the mapping of data to a high-dimensional feature space by applying nonlinear mapping [66]. The regression prediction function is constructed in the high-dimensional feature space and finally mapped back to the original space [67]. The biggest characteristic of SVM is that it changes the traditional principle of mining risk based on experience [68]. The problems of a small number of samples, nonlinearity, overfitting, and single-digit disaster are well solved.
SVMs with Gaussian radial basis functions were used in this study. According to Brdar et al. [66], the function is expressed as follows: The regression function can be expressed as follows: where a i and a * i are Lagrange multipliers and x i is a support vector that is learned through the optimization technique in SVM regression. To build a good predictive model, a parameter selection procedure was employed. Cross validation was used to find the best parameters C (cost of constraint violation) and γ = 1 2σ 2 . The final model was built from the whole training dataset with the best parameters that were previously estimated.

Experimental Design
The study included three groups of experiments ( Figure 2). The first group was designed to analyze which combination form would obtain the best performance, and we adopted nine combinations of inputs: (1) the satellite data (EVI, NDVI, and SAVI), (2) climate data only, (3) meteorological indices only, (4) climate and satellite data (EVI, NDVI, and SAVI); (5) climate data and the meteorological indices, (6) satellite data (EVI, NDVI, and SAVI) and the meteorological indices, and (7) climate, satellite data (EVI, NDVI, and SAVI) and the meteorological indices.
The regression function can be expressed as follows: where ai and ai * are Lagrange multipliers and xi is a support vector that is learned through the optimization technique in SVM regression. To build a good predictive model, a parameter selection procedure was employed. Cross validation was used to find the best parameters C (cost of constraint violation) and σ γ 2 2 1 = . The final model was built from the whole training dataset with the best parameters that were previously estimated.

Experimental Design
The study included three groups of experiments ( Figure 2). The first group was designed to analyze which combination form would obtain the best performance, and we adopted nine combinations of inputs: (1) the satellite data (EVI, NDVI, and SAVI), (2) climate data only, (3) meteorological indices only, (4) climate and satellite data (EVI, NDVI, and SAVI); (5) climate data and the meteorological indices, (6) satellite data (EVI, NDVI, and SAVI) and the meteorological indices, and (7) climate, satellite data (EVI, NDVI, and SAVI) and the meteorological indices. The second group was to test how climate, meteorological indices and satellite data from three growing periods contributed to maize yield prediction. The growing stages were defined as (1) the early growing period (March and April), (2) the peak growing period (May and June), and (3) the late growing period (July and August). The performance of climate data and meteorological indices from all stages and satellite data from specific stages, climate data from specific stages and satellite data and meteorological indices from all stages, as well as meteorological indices from specific stages and climate and satellite data from all stages, were compared. The second group was to test how climate, meteorological indices and satellite data from three growing periods contributed to maize yield prediction. The growing stages were defined as (1) the early growing period (March and April), (2) the peak growing period (May and June), and (3) the late growing period (July and August). The performance of climate data and meteorological indices from all stages and satellite data from specific stages, climate data from specific stages and satellite data and meteorological indices from all stages, as well as meteorological indices from specific stages and climate and satellite data from all stages, were compared.
The third group was designed to explore the spatial divergences of the model performance. For five maize-growing regions, the climate, satellite data, and meteorological indices were used to predict the maize yield.
Finally, the leave-one-out prediction method, which uses one year for testing and the rest for training to validate the performance of the model, was conducted.

Selection of Climate Variables
The EDA results are shown in Figure 3. The water supply-related variables and temperature-related variables were negatively correlated with maize yield, whereas the Vpd of the water demand-related variables was positively correlated with maize yield. Among all water supply-related variables, the R between Wet and maize yield was the highest. Additionally, the R between Wet and Cld, Wet and Pre were above 0.5, so Cld and Pre were not selected. Among the three temperature-related variables, the Tmn was selected for the highest correlation coefficient with maize yield. All the water demandrelated variables were selected, as the R between Vap and Pet, Vap and Vpd was below 0.5. Moreover, the results showed that maize yield was negatively related to PDf and SDf; conversely, maize yield was positively related to PDr and SDr. This was consistent with Tollenaar et al. [69], which showed that 27% of the maize yield change between 1984 and 2013 was attributable to solar brightening. Among the four radiation-related variables, the SDr had the highest correlation coefficient with maize yield. The R between other variables and SDr were below 0.5 except PDr, thus, PDf, SDf, and SDr were selected to apply in the machine learning methods. Finally, eight climate variables (Wet, Tmn, Pet, Vap, Vpd, PDf, SDf, and SDr) were selected out of the 13 variables to represent climate conditions.

Multi-Model Performances When Estimating Maize Yield
The first group of experiments used different combinations of climate, satellite data, and meteorological indices to evaluate model performance. The prediction performances of the seven forms of data inputs were evaluated across four machine learning algorithms ( Figure 4). The results showed that the combination of climate, satellite data, and meteorological indices achieved the best performance, with R values ranging from 0.742 to 0.828. Indeed, the combination of the climate data and meteorological indices performed best, following by the combination of climate data with satellite data and the combination of meteorological indices with satellite data. Additionally, the performance of climate data only, with R values between 0.641 and 0.752, was better than that of satellite data only and meteorological indices only, with the lowest R values ranging from 0.460 to 0.487 0.449 to 0.486, respectively. Furthermore, the analyses clearly showed that the cubist model generally provided the best accuracy among the four learning methods in estimating R, followed by RF, Xgboost, and SVM.

Multi-Model Performances When Estimating Maize Yield
The first group of experiments used different combinations of climate, satellite data, and meteorological indices to evaluate model performance. The prediction performances of the seven forms of data inputs were evaluated across four machine learning algorithms ( Figure 4). The results showed that the combination of climate, satellite data, and meteorological indices achieved the best performance, with R values ranging from 0.742 to 0.828. Indeed, the combination of the climate data and meteorological indices performed best, following by the combination of climate data with satellite data and the combination of meteorological indices with satellite data. Additionally, the performance of climate data only, with R values between 0.641 and 0.752, was better than that of satellite data only and meteorological indices only, with the lowest R values ranging from 0.460 to 0.487 0.449 to 0.486, respectively. Furthermore, the analyses clearly showed that the cubist model generally provided the best accuracy among the four learning methods in estimating R, followed by RF, Xgboost, and SVM.
The first group of experiments used different combinations of climate, satellite data, and meteorological indices to evaluate model performance. The prediction performances of the seven forms of data inputs were evaluated across four machine learning algorithms (Figure 4). The results showed that the combination of climate, satellite data, and meteorological indices achieved the best performance, with R values ranging from 0.742 to 0.828. Indeed, the combination of the climate data and meteorological indices performed best, following by the combination of climate data with satellite data and the combination of meteorological indices with satellite data. Additionally, the performance of climate data only, with R values between 0.641 and 0.752, was better than that of satellite data only and meteorological indices only, with the lowest R values ranging from 0.460 to 0.487 0.449 to 0.486, respectively. Furthermore, the analyses clearly showed that the cubist model generally provided the best accuracy among the four learning methods in estimating R, followed by RF, Xgboost, and SVM.  (7) climate, satellite (EVI, NDVI, and SAVI) and meteorological indices, input for the whole growing season. The error bars are one standard deviation of predicted R from 10 ensembles by dividing the training and testing datasets randomly.

The Divergences of Model Performances Between Different Growth Stages and Maize-Growing Regions
The yield forecasting performances using satellite data and meteorological indices from all periods and climate data from one specific period during the growing season were compared with those using satellite data from different periods and climate data and meteorological indices from all periods or meteorological indices from one specific period and climate and satellite data from all stages. R improved by about 0.126, 0.117, and 0.143 by adding climate data from the early, peak, and late periods to satellite data and meteorological indices from all stage via the four machine learning algorithms, respectively. R increased by 0.016, 0.016, and 0.017 when adding satellite data from the early, peak and late stages to climate data and meteorological indices from all stages, respectively. R increased by 0.003, 0.032, and 0.042 when adding meteorological indices from the early, peak and late periods to climate and satellite data from all stages, respectively ( Figure 5).  (7) climate, satellite (EVI, NDVI, and SAVI) and meteorological indices, input for the whole growing season. The error bars are one standard deviation of predicted R from 10 ensembles by dividing the training and testing datasets randomly.

The Divergences of Model Performances between Different Growth Stages and Maize-Growing Regions
The yield forecasting performances using satellite data and meteorological indices from all periods and climate data from one specific period during the growing season were compared with those using satellite data from different periods and climate data and meteorological indices from all periods or meteorological indices from one specific period and climate and satellite data from all stages. R improved by about 0.126, 0.117, and 0.143 by adding climate data from the early, peak, and late periods to satellite data and meteorological indices from all stage via the four machine learning algorithms, respectively. R increased by 0.016, 0.016, and 0.017 when adding satellite data from the early, peak and late stages to climate data and meteorological indices from all stages, respectively. R increased by 0.003, 0.032, and 0.042 when adding meteorological indices from the early, peak and late periods to climate and satellite data from all stages, respectively ( Figure 5).
The result of the third experiment is shown in Figure 6. According to the model performance results, the R-value in Northwest China regions was much higher than that in other regions. and climate and satellite data from all stages. R improved by about 0.126, 0.117, and 0.143 by adding climate data from the early, peak, and late periods to satellite data and meteorological indices from all stage via the four machine learning algorithms, respectively. R increased by 0.016, 0.016, and 0.017 when adding satellite data from the early, peak and late stages to climate data and meteorological indices from all stages, respectively. R increased by 0.003, 0.032, and 0.042 when adding meteorological indices from the early, peak and late periods to climate and satellite data from all stages, respectively ( Figure 5). Figure 5. (a) The R values using satellite data and meteorological indices for all stages but using climate data for only one stage of the maize growing season, which was divided into early (Mar and Apr), peak (May and Jun), and late (Jul and Aug) stages. (b) R values using climate data and meteorological indices for all stages but using satellite data for only one stage. (c) R values using climate and satellite data for all stages but using meteorological indices for only one stage. The dashed lines in (a-c) represent the benchmark model performance when using satellite data only, climate data only, and meteorological indices only during all stages, respectively. Figure 5. (a) The R values using satellite data and meteorological indices for all stages but using climate data for only one stage of the maize growing season, which was divided into early (Mar and Apr), peak (May and Jun), and late (Jul and Aug) stages. (b) R values using climate data and meteorological indices for all stages but using satellite data for only one stage. (c) R values using climate and satellite data for all stages but using meteorological indices for only one stage. The dashed lines in (a-c) represent the benchmark model performance when using satellite data only, climate data only, and meteorological indices only during all stages, respectively.
The result of the third experiment is shown in Figure 6. According to the model performance results, the R-value in Northwest China regions was much higher than that in other regions. In the Northwest China region, the R reached 0.941, 0.904, 0.934, and 0.850 for the cubist, RF, SVM, and Xgboost, respectively. The divergences between Northeast China, North China Plain, and South China were insignificant, ranging from 0.744 to 0.  A leave-one-out test was executed to assess the robustness of the model. Prediction accuracy of different years clearly had some differences in different years. Most years showed great prediction accuracy except for 2003 and 2014 (Figure 7). According to the monthly minimum temperature and accumulated precipitation during the maize growing season, the results showed that the mean monthly temperature in 2003 and 2014 was lower than that in other years. The EDD was considered in the study; however, the extremely low temperature was ignored.  A leave-one-out test was executed to assess the robustness of the model. Prediction accuracy of different years clearly had some differences in different years. Most years showed great prediction accuracy except for 2003 and 2014 (Figure 7). According to the monthly minimum temperature and accumulated precipitation during the maize growing season, the results showed that the mean monthly temperature in 2003 and 2014 was lower than that in other years. The EDD was considered in the study; however, the extremely low temperature was ignored.
climate data for only one stage of the maize growing season, which was divided into early (Mar and Apr), peak (May and Jun), and late (Jul and Aug) stages. (b) R values using climate data and meteorological indices for all stages but using satellite data for only one stage. (c) R values using climate and satellite data for all stages but using meteorological indices for only one stage. The dashed lines in (a-c) represent the benchmark model performance when using satellite data only, climate data only, and meteorological indices only during all stages, respectively.
The result of the third experiment is shown in Figure 6. According to the model performance results, the R-value in Northwest China regions was much higher than that in other regions.  A leave-one-out test was executed to assess the robustness of the model. Prediction accuracy of different years clearly had some differences in different years. Most years showed great prediction accuracy except for 2003 and 2014 (Figure 7). According to the monthly minimum temperature and accumulated precipitation during the maize growing season, the results showed that the mean monthly temperature in 2003 and 2014 was lower than that in other years. The EDD was considered in the study; however, the extremely low temperature was ignored.

Quantifying the Contributions of Climate, Satellite Data, and Meteorological Indices in Different Growth Stages to Maize Yield
Generally, the results indicated that the climate data, satellite data, and meteorological indices over the whole maize growing period played critical roles in determining maize growth and final maize yield. The highest estimated R values for the cubist, RF, SVM, and Xgboost methods were 0.828, 0.806, 0.742, and 0.758, respectively. The cubist algorithm achieved the best performance among four machine learning methods, achieving an R of 0.828 by combining climate, satellite data, and meteorological indices in this study; however, the value was relatively lower than that of Jiang et al. [5] with a value of 0.872. They used a long short-term memory (LSTM) model by integrating heterogeneous crop phenology, meteorology, and remote sensing data to estimate country-level maize yield across the US Corn Belt from 2006 to 2016. The reasons for the slight divergences in R were as follows: (1) this study was conducted in China, while the study of Jiang et al. [47] was conducted in the US Corn Belt. The main maize cropping regions belong to a monsoon climate or subordinate temperate continental climate in China, however, the US Corn Belt belongs to a temperate continental climate zone, where rainfall is less and concentrated, and the continental climate is strong; (2) this study was conducted for a longer period of 2000 to 2015 than that from 2006 to 2016. The R of combining climate and meteorological indices reached 0.80, 0.79, 071, and 0.72 for the cubist, RF, SVM, and Xgboost, respectively. This result was similar to the previous study of Chen et al. [70], which indicated that climate change and extreme climate could account for 56.7% of maize yield change by applying a stepwise regression.
The results also showed that the prediction skill of satellite data and meteorological indices for all stages and climate data for the late-stage was better than that from the early and peak stages of the four machine learning methods. Additionally, the prediction skill of climate and satellite data for all stages and meteorological indices for the latestage was better than that for the early and peak stage. Specifically, R improved by about 0.126, 0.117, and 0.143 by adding climate data from the early, peak, and late periods to satellite data and meteorological indices from all stages via the four machine learning algorithms, respectively. R increased by 0.016, 0.016, and 0.017 when adding satellite data from the early, peak, and late stages to climate data and meteorological indices from all stages, respectively. R increased by 0.003, 0.032, and 0.042 when adding meteorological indices from the early, peak, and late stages to climate and satellite data from all stages, respectively. These were in agreement with the result of Karimi et al. [42], which showed that the RMSE values at the tasseling stage were lower than those at the early growth stage. They also found the R values between the observed and simulated yields at the tasseling stage were much higher than those calculated for the early growth stage by using the SVM algorithm. This study was also consistent with previous studies [71,72], which noted that crop yield was highly correlated with biomass during the ripening stage or grain filling period. This phenomenon might contribute to crop sensitivities to different climatic events varying with the growth phase [73]. For example, according to Daryanto et al. [74], maize was more sensitive to drought during the reproductive phase than during the vegetative phase. In contrast, the difference was not obvious between satellite data in the three stages and climate data and meteorological indices over the whole maize growing stages. The results also revealed that the added value of satellite data given climate data and meteorological indices over the maize growing season, as well as the value of meteorological data given climate and satellite data in the whole growing season, was smaller than the value of climate data given satellite data and meteorological indices over the maize growing season. Indeed, the performance of the model could slightly improve the R-value to a level that was impossible to reach by climate information alone. This finding was in agreement with the results of Becker-Reshef et al. [43] and Kern et al. [75], who found that the model performance was greatly improved by adding satellite information to climate data. According to Son et al. [31], crop biomass at different phenological periods had different correlation levels with different VIs, which might be attributed to the phenological variations in plant growth related to the change in climatic conditions [76,77].

Quantifying the Divergences of Model Performances Between Five Maize-Growing Regions
The spatial divergences of R value between different maize-growing regions were obvious. Indeed, the R-value in Northwest China region was significantly higher than that in other regions. The divergence might be attributed to Northwest China having a continental climate, which is characterized by a dry climate and sufficient sunshine. Besides, the population density in this region is small, with less interference from human activities compared to other regions. The reason for the poor model performance in the Southwest China region might be that this region has a temperate humid climate, which is characterized by abundant rainfall and less sunshine. Additionally, climate changes are more complicated in this region. Therefore, for regions with similar climate characteristics to Northwest China, the maize yield prediction method in this study was efficient and useful. Yao et al. [40] estimated the maize yield by using a process-based model in the Northeast China Plain during 2002-2011. Their results showed that the R and RMSE were 0.827 and 712 kg/ha, respectively, from 2002 to 2011. The results were slightly better than those in this study, however, this might be attributable to the process-based model being superior in small-scale than large-scale regions. However, machine learning methods were widely and well used in large-scale regions.
The leave-one-out test showed great prediction accuracy in most years except for 2003 and 2014. The EDD was considered in the study, however, the extreme low temperature was ignored. This result was consistent with that of Li et al. [11] and Cai et al. [29], who found that extreme climate was another factor affecting yield predictability. However, our results showed good performances throughout the year, which had extremely low precipitation. This might be attributed to the fact that the prediction could be greatly improved by incorporating satellite data that could contain crop progress information [11]. Hence, this study showed that the machine learning models had limited in predicting maize yield when extreme climate conditions existed. These extreme climate events could cause large prediction biases due to that such conditions might not have occurred in the historic training dataset.

Uncertainty and Limitations
This study successfully applied four machine learning approaches to predict maize yield in China. Three limitations of this work should be considered. Firstly, although the SAVI was added to consider soil property information, uncertainties in the heterogeneity of environmental conditions might be ignored [78], including irrigation system, fertilizer application rate, and soil conditions. Secondly, as our models were built on a fixed growth season, crop phenology dynamics were ignored in this study. Hence, further studies should consider the crop phenology of each statistical unit to improve the prediction accuracy. Finally, all the machine learning methods that function as black boxes were nonlinear methods with the characteristic of limited process-based interpretation. Moreover, the mechanism of each method also had divergences, thus, a more accurate machine learning method will improve the performance of maize prediction.

Conclusions
In this study, four machine learning approaches, including cubist, Xgboost, RF, and SVM, were used to predict maize yield at the city level in China from 2000 to 2015 based on climate, satellite data, and meteorological indices. This study included three groups of experiments. The first group adopted seven combinations of inputs to analyze which combination would obtain the best performance. The second group was used to test how climate data, meteorological indices, and satellite data from different growing periods contributed to maize yield prediction, and the third group was designed to explore the model performance differences of five maize-growing regions. The major findings were as follows: (1) The performance of climate data, with R ranging from 0.641 to 0.752, was better than that of satellite data and meteorological indices, with the lowest R ranging from 0.449 to 0.486 and 0.442 to 0.481, respectively. Integrating all climate, satellite data, and meteorological indices could achieve the highest accuracy. (2) The climate data or satellite data inputs from all growth stages were essential for maize yield prediction, especially in late growth stages. (3) The spatial analysis found that the spatial divergences were large, and the R-value in the Northwest region reached 0.942, 0.904, 0.934, and 0.850 for the Cubist, RF, SVM, and Xgboost, respectively. Additionally, unprecedented extreme climate events could cause large prediction biases.
In summary, this study provided an applicable approach using climate, satellite data, and meteorological indices for large-scale yield prediction, which could be used for other crop types and other regions.