A Comparison Between Major Artificial Intelligence Models for Crop Yield Prediction: Case Study of the Midwestern United States, 2006–2015

This paper compares different artificial intelligence (AI) models in order to develop the best crop yield prediction model for the Midwestern United States (US). Through experiments to examine the effects of phenology using three different periods, we selected the July–August (JA) database as the best months to predict corn and soybean yields. Six different AI models for crop yield prediction are tested in this research. Then, a comprehensive and objective comparison is conducted between the AI models. Particularly for the deep neural network (DNN) model, we performed an optimization process to ensure the best configurations for the layer structure, cost function, optimizer, activation function, and drop-out ratio. In terms of mean absolute error (MAE), our DNN model with the JA database was approximately 21–33% and 17–22% more accurate for corn and soybean yields, respectively, than the other five AI models. This indicates that corn and soybean yields for a given year can be forecasted in advance, at the beginning of September, approximately a month or more ahead of harvesting time. A combination of the optimized DNN model and spatial statistical methods should be investigated in future work, to mitigate partly clustered errors in some regions.


Introduction
Accurate estimations of crop yields are important for many agronomic issues, including agricultural management, national food policies, and international crop trade. For this reason, a variety of methods are employed for crop yield prediction, and the application of satellite images is becoming increasingly important. Satellite remote sensing techniques, which continuously cover large areas, can help provide more accurate estimations of crop yields. Prasad et al. [1] predicted corn and soybean yields in Iowa, and Ren et al. [2] estimated the yield of winter wheat in Shandong, China, using a regression model with vegetation index and weather data. Most studies have focused on statistical analysis based on linear relationships between crop yields and vegetation indices obtained from optical satellite sensors [1][2][3][4][5][6][7]. In addition to vegetation indices, various land surface variables, such as weather elements, soil moisture (SM), hydrological conditions, soil properties, and fertilizer application, have also been The objective of this study was to develop an optimized DNN crop yield prediction model using optimized input variables from satellite products and meteorological datasets, following comprehensive and objective comparisons of various AI models. We first prepared input data for crop yield prediction, such as satellite-based vegetation indices, and meteorological and hydrological data, and constructed a matchup database on the Cropland Data Layer (CDL), a high-resolution map classifying crop types. Then, we selected an optimal period for crop yield estimation by considering the effect of phenology; for example, (1) May to September (entire growing season), (2) July to August (major productive period), and (3) a combination of all months showing high correlations between input variables and crop yield. We assumed that the prediction could be done based only on the July to August dataset, which can allow for yield prediction for the year, before harvesting. Using the optimized input dataset, we built six major AI models, including multivariate adaptive regression splines (MARS), SVM, RF, extremely randomized trees (ERT), ANN, and DNN, and comprehensively and objectively compared them. In particular, the DNN model was optimized by adjusting the hyper-parameters, layer structure, loss function, optimizer, activation function, and drop-out ratio to improve the accuracy of the crop yield prediction. Our experiment focused on corn and soybean yields in the Midwestern US, 2006-2015.

Study Area
The US is the world's largest grain exporter [22], and our experiment focused on five states in the Midwestern US (Illinois, Iowa, Minnesota, North Dakota, and South Dakota), where corn and soybeans are dominant. Our study area included 407 counties in five states (102 in Illinois, 99 in Iowa, 87 in Minnesota, 53 in North Dakota, and 66 in South Dakota), which were selected because the area of cropland exceeded 50% of the county area ( Figure 1). comprehensive and objective comparisons of various AI models. We first prepared input data for crop yield prediction, such as satellite-based vegetation indices, and meteorological and hydrological data, and constructed a matchup database on the Cropland Data Layer (CDL), a high-resolution map classifying crop types. Then, we selected an optimal period for crop yield estimation by considering the effect of phenology; for example, (1) May to September (entire growing season), (2) July to August (major productive period), and (3) a combination of all months showing high correlations between input variables and crop yield. We assumed that the prediction could be done based only on the July to August dataset, which can allow for yield prediction for the year, before harvesting. Using the optimized input dataset, we built six major AI models, including multivariate adaptive regression splines (MARS), SVM, RF, extremely randomized trees (ERT), ANN, and DNN, and comprehensively and objectively compared them. In particular, the DNN model was optimized by adjusting the hyperparameters, layer structure, loss function, optimizer, activation function, and drop-out ratio to improve the accuracy of the crop yield prediction. Our experiment focused on corn and soybean yields in the Midwestern US, 2006-2015.

Study Area
The US is the world's largest grain exporter [22], and our experiment focused on five states in the Midwestern US (Illinois, Iowa, Minnesota, North Dakota, and South Dakota), where corn and soybeans are dominant. Our study area included 407 counties in five states (102 in Illinois, 99 in Iowa, 87 in Minnesota, 53 in North Dakota, and 66 in South Dakota), which were selected because the area of cropland exceeded 50% of the county area ( Figure 1).

Satellite Products
The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) is a satellite sensor operated by the National Aeronautics and Space Administration (NASA) for Earth environmental monitoring. The MODIS captures data in 36 spectral bands ranging in wavelength from 400 to 1,440 nm, at varying spatial resolutions (two bands for red and near infrared rays at 250 m resolution, five bands for blue, green, and shortwave infrared rays at 500 m resolution, and a further 29 bands for ultraviolet, visible, and infrared rays at 1 km resolution) [23]. It produces 44 products for the atmosphere, land, and oceans. We obtained the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), leaf area index (LAI), fraction of photosynthetically active radiation

Satellite Products
The Terra Moderate Resolution Imaging Spectroradiometer (MODIS) is a satellite sensor operated by the National Aeronautics and Space Administration (NASA) for Earth environmental monitoring. The MODIS captures data in 36 spectral bands ranging in wavelength from 400 to 1440 nm, at varying spatial resolutions (two bands for red and near infrared rays at 250 m resolution, five bands for blue, green, and shortwave infrared rays at 500 m resolution, and a further 29 bands for ultraviolet, visible, and infrared rays at 1 km resolution) [23]. It produces 44 products for the atmosphere, land, and oceans. We obtained the normalized difference vegetation index (NDVI), enhanced vegetation index (EVI), leaf area index (LAI), fraction of photosynthetically active radiation (FPAR), and gross primary production (GPP) as indicators of vegetation vitality, photosynthesis, and biomass [24].
The NDVI, which represents the vitality of vegetation and photosynthetic activity, is the most commonly used vegetation index, and is calculated using the reflectance of near-infrared and red bands. The EVI is calculated by including the reflectance of the blue band to solve the signal saturation problem of the NDVI. Because the EVI is more sensitive to vegetation vitality than the NDVI in highly vegetated areas, it has been widely used for cropland monitoring [25,26].
The LAI is defined as the total leaf area per unit of ground surface area and is conventionally used as a proxy for the biomass of leaves [27]. It can be derived using maximum primary production and a crop-specific growth coefficient. The FPAR is defined as the fraction of incident photosynthetically active radiation (400-700 nm) absorbed by the green elements of a vegetation canopy [27]. GPP represents the vegetation biomass resulting from photosynthetic activity, in terms of chemical energy (kg C m −2 ) [28].
We used the products for NDVI and EVI, which were provided in the form of 250 m spatial resolution and 16 day temporal resolution. Also, we obtained the products for LAI, FPAR, and GPP which have 500 m spatial resolution and eight-day temporal resolution.

Meteorological and Hydrological Data
The Parameter-elevation Regressions on Independent Slopes Model (PRISM) [29] is a knowledge-based system developed for estimating regional-scale climate elements in physiographically complex landscapes at 4 km spatial resolution, such as precipitation (PPT), maximum temperature (TMAX), minimum temperature (TMIN), and mean temperature (TMEAN) [30]. The Global Land Data Assimilation System (GLDAS) [31] enables a reanalysis of land surface data by combining three land surface models (LSMs): Mosaic, Noah, and the Community Land Model (CLM) [32]. The SM product of GLDAS is known to be very reliable [33], and we used the monthly SM from the Noah land surface model (LSM) in a 0.25 • grid for crop yield modeling.

Cropland Data Layer
The National Agricultural Statistics Service (NASS) of the US Department of Agriculture (USDA) provides a CDL map for crop type classifications at 30 m resolution for 2006-2009 and 56 m resolution for 2010-2015 [34]. The CDL is based on the cropland census and satellite images from the Landsat Thematic Mapper (TM) and Advanced Wide Field Sensor (AWIFS) [35]. The CDL product is updated yearly and used to produce a crop mask map [36,37]. In this study, we extracted pixels that were recorded as corn or soybean area on the CDL map for masking the croplands.

Crop Yield Statistics
The USDA NASS provides annual county-level statistics for crop yields [38]. The yield statistics of corn and soybeans in the study area were gathered for the period 2006-2015. The bushels per acre data were converted to tons per hectare for the sake of convenience. Table 1 summarizes the dataset used in this study. The study period was 2006-2015, during which all of the necessary data were available.  (12) Satellite Images NDVI (2) 250 m 16 days NASA EOSDIS (13) EVI (3) LAI (4) 500 m 8 days FPAR (5) GPP (6) Meteorological Data PPT (7) 4 km Monthly PRISM (14) Climate Group TMAX (8) TMIN (9) TMEAN (10) Hydrological Data SM (11) 0.25 • Monthly NASA GES DISC (15) Crop Yield Statistics Corn County Yearly USDA (12)

Data Processing
We first extracted the cropland pixels from the CDL recorded as corn (ID = 1) and soybean (ID = 5). Then, we carried out spatial re-arrangement to distribute the input variables (NDVI, EVI, LAI, FPAR, GPP, PPT, TMAX, TMIN, TMEAN, and SM) on the CDL grid, using the nearest neighbor method. By referring to the quality index of the MODIS products, only the best-quality pixels were extracted for NDVI, EVI, LAI, FPAR, and GPP. For comparison with county-level yield statistics, all the gridded variables were aggregated in accordance with the polygon zone of each county using a zonal mean operation. Finally, two databases were built for corn-and soybean-sown areas ( Figure 2).
For the 10 candidate explanatory variables, we conducted a multi-collinearity test using the variation inflation factor (VIF). The VIF of the j-th explanatory variable x j is expressed as: where R j 2 is the R-squared value of the regression equation, with x j as a response variable and the other x variables as regressors. It indicates the degree to which an explanatory variable is correlated with the other explanatory variables. The actual input variables for our crop yield models were selected through the multi-collinearity test. The variables with VIF > 10 were generally excluded from the input data, because they could respond erratically to small changes in the model or the data [39]. Six variables were selected as valid explanatory variables: EVI, LAI, GPP, PPT, SM, and either TMAX (for corn) or TMIN (for soybean). The optimal temperature for soybean growth is between 20 and 26 • C; germination, flowering, and pod development can be delayed with lower temperatures [40]. Temperatures of 25-33 • C are appropriate for the growth of corn, which originated from tropical regions; temperature > 35 • C without sufficient rainfall can decrease growth [41]. Table 2 shows the correlation coefficients between the selected explanatory variables and the yields for corn and soybean.  (2) 2.412 1.937 GPP (3) 4.825 3.148 PPT (4) 1.900 1.935 TMAX (5) 1.489 -TMIN (6) -2.122 SM (7) 1.

Major Artificial Intelligence Models
We employed six major artificial intelligence models for crop yield modeling. Main features, advantages, disadvantages, and the software used were summarized in Table 3. Table 3. Summary of the main features, advantages, disadvantages, and the software for the six major artificial intelligence (AI) models used in this study.

Model
Main features Advantages Disadvantages Software used

MARS (1)
A non-parametric regression technique that combines a series of linear models to cope with nonlinearity and interactions between variables.
Generates a flexible model that can handle both linearity and nonlinearity.
Susceptible to overfitting and limited to handling large data. earth package in R SVM (2) Conducts optimal grouping of data and can be combined with a regression model for the optimal groups.
Supports optimal grouping of data by maximizing the margin between groups using kernel functions.
Susceptible to overfitting issues depending on kernel functions used in optimal grouping. e1071 package in R RF (3) An ensemble model that uses the bootstrap and bagging process.
Accurate predictions and better generalizations are achieved due to the utilization of ensemble strategies and random sampling.
Susceptible to overfitting issues because it cannot deal with outliers when the model is trained by small number of datasets.
randomForest package in R ERT (4) An ensemble grouping model using unpruned decision trees.
Increases in generalization capability by constructing the unpruned decision trees through the training used the complete learning sample Susceptible to overfitting issues because it cannot deal with outliers when the model is trained by a small number of datasets.
extraTrees package in R ANN (5) A network model consisting of input, hidden, and output layers to emulate a biological neural system.

Self-adaptive model as compared to traditional linear and simple nonlinear analyses
Local minima problem in which an optimization process often stops at a locally, rather than globally, optimized state.
nnet package in R DNN (6) Accuracy improvement by training complicated, huge input data in a deep and intensive neural network.
Can resolve the problems of overfitting and local minima through an intensive optimization process in a deep network structure with the combination of activation functions and dropout method.
Requires a high-end computer tensorflow package in Python MARS is a non-parametric regression technique that combines a series of linear models to cope with nonlinearity and interactions between variables. The learning dataset is divided into adaptive splines with different slopes. In general, the splines are smoothly connected to each other, and the polynomial function for the splines can generate a flexible model that can handle nonlinear relationships between data [42,43]. The suitability of a MARS model can be evaluated using the generalized cross-validation (GCV) value [44].
SVM is a machine-learning algorithm that performs optimal grouping of data and builds a regression model for the optimally divided dataset [45][46][47]. A hyperplane is used for the optimal grouping of data [48][49][50] by maximizing the margin between groups. The MMH can be found by maximizing the margin between support vectors at the boundary of the data groups [51], using kernel functions such as linear and Gaussian radial basis functions (RBFs).
RF, which represents an improved version of classification and regression trees (CART), is an ensemble model using the bootstrap and bagging technique [52]. It generates a large number of decision trees with slightly different features by extracting random samples from the training data, including a bootstrap that determines the suitability of the sampling distribution, and conducts resampling as needed. During the bagging process, bootstrap-based decision trees are aggregated to create a final solution, using an ensemble model such as the average or majority vote [13,53]. The final decision tree can be optimized by the tree pruning algorithm to determine an optimal size for the trees. In our experiment, the number of trees was set to 500, and the number of variables used for splitting nodes was set to n/3 (n = number of input variables). In addition, the out-of-bag error was used as the criterion of model suitability.
ERT is an ensemble model similar to RF, but it uses unpruned decision trees. ERT divides the nodes by randomly chosen cut-points and incorporates the complete learning sample (without bootstrap copying) to grow the trees [54]. The two main parameters for ERT are the number of attributes randomly selected at each node and the minimum sample size for splitting the node. The predictions of the trees are aggregated to produce a final ensemble prediction. In our experiment, the number of trees and the number of variables used for splitting nodes were set to be the same as those of RF.
ANN is a network model for emulating a biological neural system, and consists of input, hidden, and output layers [55]. The input layer corresponds to the explanatory variables, and the hidden layer represents the core of the node-link network for the computation of nonlinear problems [56,57]. Factors affecting the performance of ANNs include the number of nodes in the hidden layer, the learning rate, and the training tolerance [55]. The learning rate determines the amount by which the weights change during a series of iterations to bring the predicted value within an acceptable range of the observed value; furthermore, the training tolerance refers to the maximum error rate at which the network must converge during training [58]. Through this process, we can optimize a weight and bias set for the node-link structure and produce an approximation of the answer. In our experiment, the number of neurons in the hidden layer was set to three, which was selected through a performance test.
The classic ANN has a local minima problem in which an optimization process often stops at a locally, rather than globally, optimized state. In addition, generic machine learning models sometimes have problems with overfitting, in which they cannot handle data with outliers due to excessive learning from the given dataset. Such problems can be resolved by DNNs through an intensive optimization process in a deep network structure. To handle local minima and issues with overfitting, L1/L2 regularization can be employed to ensure sparsity (L1) and simplicity (L2) of the DNN model. Also, backward and forward optimization is conducted in the back-propagation algorithm to improve accuracy. The problem of vanishing gradients of loss functions, which may occur during the back-propagation process, can be managed by applying appropriate activation functions, such as sigmoid and rectified linear units (ReLU). The drop-out method deals with unexpected outliers via a learning mechanism in which the DNN model becomes more robust to extreme cases through iterations of a type of handicapped training with randomly deleted links and nodes [59]. In addition, a weight and bias set built in an existing DNN model can be imported as an initial value of a new DNN model for more custom-tailored training. This is called pre-training and transfer learning, which can improve the optimization of a DNN model. Fine-tuning can also be incorporated into the optimization process to adjust the weight and bias set in more detail, by including additional training data [60]. We set up the configurations of our DNN model through the parameter optimization procedure presented in Figure 3.
unexpected outliers via a learning mechanism in which the DNN model becomes more robust to extreme cases through iterations of a type of handicapped training with randomly deleted links and nodes [59]. In addition, a weight and bias set built in an existing DNN model can be imported as an initial value of a new DNN model for more custom-tailored training. This is called pre-training and transfer learning, which can improve the optimization of a DNN model. Fine-tuning can also be incorporated into the optimization process to adjust the weight and bias set in more detail, by including additional training data [60]. We set up the configurations of our DNN model through the parameter optimization procedure presented in Figure 3.

Prediction and Validation
Since crop harvesting is conducted yearly, our prediction and validation were also conducted on a yearly basis. Leave-one-year-out cross-validation was conducted to examine the accuracy of corn and soybean yield predictions [61][62][63][64][65]. The hindcasting models were built by training with the nine-year dataset, excluding the target year, and the predicted yields for the target year were validated using the hidden, true observation values of the target year. We conducted 10 rounds of training and validation, and the number of datasets for each round was approximately 3,200 for training and 350 for validation. The mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and the correlation coefficient between the observed and predicted yields during 2006-2015 were calculated for the validation statistics.

Spatial Analysis of Errors
The prediction errors can often show a unique spatial pattern. These errors may be either randomly distributed in space or spatially clustered. Hence, the spatial autocorrelation characteristics of the prediction errors were also examined using the Local Geary index (Gi*) [66], which expresses the degree of spatial association of an area with its neighboring areas: where x j is the attribute value of the neighboring area j, x is the average value of x, w ij is the spatial weight between a target area i and its neighboring area j, n is the number of areas, and S is the standard deviation of x. The Gi* statistic is calculated for each area as a z-score. A positive, high value of Gi* indicates a hot spot with high clustering, and a negative, low value indicates a cold spot with low clustering. A value of zero indicates neutral or random spacing.

Phenology Effect
In general, corn and soybeans are planted between late April and mid-May, mature in September, and are harvested by late October [1]. To analyze the effects of phenology, we derived 13 cases using different combinations of months, including one case representative of the entire growing season (GS) between May and September, five cases corresponding to individual months, four cases corresponding to two successive months, and three cases corresponding to three successive months (Tables 4 and 5). By considering correlation coefficients between input variables and crop yield, three periods were distinguished: (1) the entire GS from planting to harvest; (2) July-August (JA), which is the major period for corn and soybean production; and (3) the optimal combination (OC) of months with high correlations between input variables and crop yield. The OC for corn yield was based on EVI, LAI, and GPP data for JA, PPT data for June-July-August (JJA), TMAX data for July, and SM data for June-July (JJ). The OC for soybean included EVI, LAI, and SM data for JA, GPP data for July-August-September (JAS), PPT data for GS, and TMIN data for June.

Development of the Optimized Prediction Model for Corn Yield
Crop yield can differ by land surface variables according to season. To find the best months for crop yield prediction, we conducted an analysis using the three aforementioned periods: (1) GS, (2) JA, and (3) OC. For corn yield prediction, we built six AI models (MARS, SVM, RF, ERT, ANN, and DNN) for the three periods (GS, JA, and OC). In particular, we carried out the parameter optimization for the DNN models according to the procedure presented in Figure 3. The results of the sensitivity analysis according to DNN parameters are illustrated in Figure 4. The best structure of the hidden layers was determined to be 300-300 nodes, which was selected from several layer structures, including 200-200, 300-300, 500-500, 200-200-200, 300-300-300, and 500-500-500 nodes. From the loss function, such as sum of square errors (SSE), mean of square errors (MSE), and cross entropy (CE), SSE was selected as the best one. The adaptive gradient (AdaGrad) algorithm was better than the root mean square propagation (RMSProp) as an optimizer. For the activation function, the ReLU outperformed the sigmoid and hyperbolic tangent functions. The drop-out ratio was set to 40% through minimization of RMSE and MAPE.  Then, we examined the results of all 18 corn yield models (six models by three periods). The overall prediction accuracy was higher for JA than for GS or OC. In terms of the correlation coefficient, the highest accuracy was 0.945 for JA, 0.919 for GS, and 0.935 for OC. This suggests that a prediction model consisting of July and August input data is sufficient for corn-yield forecasting for the year. Therefore, we can forecast the yield for the year in advance (at the beginning of September) using the  (4) (%) Corr. (5) MARS (6) 0.067 0.773 1.009 11.0 0.911 SVM (7) 0.020 0.726 0.946 10.0 0.917 RF (8) 0.015 0.708 0.929 9.8 0.922 ERT (9) 0.001 0.703 0.922 9.6 0.924 ANN (10) 0.024 0.705 0.928 9.8 0.926 DNN (11) 0.029 0.582 0.765 7. The optimized DNN model produced the highest prediction accuracy (RMSE = 0.765 ton/ha and MAPE = 7.6%), and the scatter plot of the DNN for the predicted versus actual corn yields during 2006-2015 showed better agreement than other models ( Figure 5). The other five models had somewhat dispersed patterns in terms of the one-to-one line of the scatter plots, which was due to problems with overfitting or local minima. The DNN model overcame such issues because of the intensive optimization in the deep network structure, including the best configurations for the cost function, optimizer, activation function, and drop-out ratio. In terms of MAE, the DNN model with the JA database produced an approximately 21% (0.703/0.582-1) to 33% (0.773/0.582-1) better outcome than the other five models.

Development of Optimized Prediction Model for Soybean Yield
To examine the effect of phenology on prediction accuracy, we compared the predicted soybean yields from the GS, JA, and OC periods. The overall prediction accuracy of soybean yield was higher than for GS or OC. In terms of correlation coefficient, the highest accuracy was 0.901 for JA, 0.867 for GS, and 0.880 for OC. The validation results produced by the 10 rounds of experiments between 2006 and 2015 are presented in Table 7. The accuracy of soybean models was similar to that of corn models in terms of MAPE, but was somewhat lower in terms of the correlation coefficient. This is presumably because soybeans are a small grain and have a small yield value, which may lead to a relatively dispersed pattern around the one-to-one line on the scatter plot ( Figure 6). However, the prediction results of the DNN model were relatively stable and its scatter plot was highly concentrated around the one-to-one line. Similar to the optimization of the corn yield model, the best-fitted DNN model for soybean yield was built by setting the structure of the hidden layers to 500-500 nodes, SSE for loss function, AdaGrad for optimizer, ReLU for activation function, and the drop-out ratio to be 40%. The optimized DNN model for soybean showed high prediction accuracy (RMSE = 0.285 ton/ha, MAPE = 7.8%), which suggests that the DNN approach can be also effective for crops other than corn or soybeans. In terms of MAE, the DNN model with the JA database produced an outcome approximately 17% (0.270/0.222-1) to 22% (0.259/0.222-1) more accurate than those of the other five models.   MARS (6) −0.018 0.269 0.336 9.6 0.860 SVM (7) −0.036 0.265 0.339 9.5 0.850 RF (8) −0.027 0.261 0.332 9.4 0.854 ERT (9) −0.030 0.259 0.329 9.3 0.859 ANN (10) −0.016 0.270 0.340 9.7 0.853 DNN (11) −

Spatial Characteristics of the Optimized DNN Models
Sections 4.1 and 4.2 showed that the crop yield prediction accuracy was highest when using the optimized DNN model with the JA period data. To analyze the spatial characteristics of the result in more detail, we examined the annual predicted corn and soybean yields of the optimized DNN model for JA. Figures 7-10 are the maps of the annual county-level corn and soybean yields, respectively, for 2006-2015. The actual (Figures 7a, 8a, 9a and 10a) and predicted yields (Figures 7b, 8b, 9b and 10b) showed quite similar patterns. Figures 7c, 8c, 9c and 10c show the prediction errors for corn and soybean yields. Figures 7d, 8d, 9d and 10d show the Gi* for the prediction errors of our DNN model for corn and soybean yields. A somewhat spatially clustered pattern was found in a few areas for multiple years. This tendency of the spatial autocorrelation of the prediction error may be associated with the spatial distribution of the input data for land surface and weather variables. Indeed, the DNN model does not account for the spatial relationships between crop yield and agricultural variables. To examine the spatial clustering tendency of the prediction errors in more detail, and to solve the problem, the DNN model used in this study could be coupled with spatial statistical methods, such as geographically weighted regression (GWR), in a future work.
Examination of soil properties was also effective for analyzing the spatial pattern of prediction errors. In the case of Illinois, the spatial clustering of prediction errors was found in almost every year, for both corn and soybeans. This may be related to the spatial characteristics of the distribution of soil property values, such as bulk density and sodium content. High bulk density can prevent plants from rooting, and high sodium content can result in water stress for crops. Hence, we referred to the soil property maps from the Harmonized World Soil Database (HWSD) for the study area. Presumably, the spatial clustering of prediction errors in Illinois can be associated with the values of bulk density and sodium content, which were higher than in the other states. Additional treatment aimed at the spatial distribution of soil properties can contribute to improving the accuracy of crop yield prediction.

Conclusions
This paper compared different AI models in order to develop the best crop yield prediction model for the midwestern US. We constructed the input dataset from satellite-based vegetation indices, and meteorological and hydrological variables, in accordance with a high-resolution CDL. Through experiments to examine the effects of phenology using three periods (GS, JA, and OC), we selected the JA database as the best months to predict corn and soybean yields. Using the optimal input database, we built six major AI models for crop yield prediction and conducted a comprehensive and objective comparison of the AI models. Particularly for the DNN model, we performed an optimization process to ensure the best configurations for the layer structure, cost function, optimizer, activation function, and drop-out ratio, to improve the crop yield prediction accuracy. Consequently, the DNN model with the JA database outperformed the other five AI models, with prediction errors of approximately 7.6% and 7.8% for corn and soybeans, respectively, which is a better result than those of previous studies. In terms of MAE, our DNN model with the JA database was approximately 21-33% and 17-22% more accurate for corn and soybeans, respectively, than the other five AI models. Our model showed a correlation coefficient of 0.945 for corn and 0.901 for soybean. This was a better result than that previous research using AI models, although the experiment conditions were different in terms of the crop types and study areas (Table 8). This indicates that our DNN with JA model can forecast the corn and soybean yields very accurately for a given year in advance, at the end of August or beginning of September. The optimized DNN model developed in this study can also be adopted in other regions and/or for other crops, only if the parameter optimization is conducted for a new region (or crop). Additionally, we analyzed the spatial characteristics of the prediction errors using Gi*. Prediction errors had a tendency to cluster spatially in some areas in several years, indicating that the relationship between crop yield and land environmental factors can have a locally unique or regionally heterogeneous pattern, and that it will be necessary to employ spatial statistical methods to solve these problems. Therefore, a combination of the optimized DNN model and a spatial statistical model, such as GWR, should be investigated in a future work for greater accuracy improvement. This could also help mitigate the slight tendency toward underestimation of our soybean yield model. Also, more appropriate input variables, such as the growing degree days, should be added to a modified DNN model.