Comparing Machine Learning Approaches for Predicting Spatially Explicit Life Cycle Global Warming and Eutrophication Impacts from Corn Production

Agriculture ranks as one of the top contributors to global warming and nutrient pollution. Quantifying life cycle environmental impacts from agricultural production serves as a scientific foundation for forming effective remediation strategies. However, methods capable of accurately and efficiently calculating spatially explicit life cycle global warming (GW) and eutrophication (EU) impacts at the county scale over a geographic region are lacking. The objective of this study was to determine the most efficient and accurate model for estimating spatially explicit life cycle GW and EU impacts at the county scale, with corn production in the U.S.’s Midwest region as a case study. This study compared the predictive accuracies and efficiencies of five distinct supervised machine learning (ML) algorithms, testing various sample sizes and feature selections. The results indicated that the gradient boosting regression tree model built with approximately 4000 records of monthly weather features yielded the highest predictive accuracy with cross-validation (CV) values of 0.8 for the life cycle GW impacts. The gradient boosting regression tree model built with nearly 6000 records of monthly weather features showed the highest predictive accuracy with CV values of 0.87 for the life cycle EU impacts based on all modeling scenarios. Moreover, predictive accuracy was improved at the cost of simulation time. The gradient boosting regression tree model required the longest training time. ML algorithms demonstrated to be one million times faster than the traditional process-based model with high predictive accuracy. This indicates that ML can serve as an alternative surrogate of process-based models to estimate life-cycle environmental impacts, capturing large geographic areas and timeframes.


Introduction
Meeting continuously increasing food and fuel demands while protecting environmental integrity is a grand challenge. Agriculture as the primary stage for food and fuel production is associated with a range of environmental pollution issues ranging from global warming to nutrient degradation. traditionally process-based or input-output LCA approaches, ML models are capable of efficiently computing large datasets at the county scale across multiple years. Although ML techniques have promising potentials for transforming LCAs, the application of ML models to spatially and temporally LCA has been limited to date [43][44][45][46][47][48]. Previously, ML techniques were used to estimate either spatially generic life cycle release inventory of production processes [48,49], or life cycle toxicity impacts of chemicals [50]. A few recent studies applied neural networks to examine the life cycle GHG emissions associated with producing various crops based on literature sources or the spatially generic emission factors of the Intergovernmental Panel on Climate Change (IPCC) [6,7,[51][52][53][54][55][56]. These valuable efforts greatly contributed to fusing LCA and ML. However, none of the existing studies assessed the possibility of using ML techniques for predicting spatially and temporally explicit life cycle environmental releases.
To fill in the aforementioned knowledge gaps, this study tested and compared five ML approaches for computing spatially and temporally explicit life cycle environmental impacts at the county scale, with corn production in the Midwest region as a case study. The research question this study addresses is which machine learning model will yield highest predictive accuracy and computational efficiency for estimating the spatially and temporally explicit life cycle environmental impacts of corn. The Midwest region contains more than 39 million hectares of land and produced more than 80 percent of U.S. corn in the year 2018 [57]. This study identified the most efficient the and accurate combination of ML approaches and datasets for predicting spatially explicit life cycle global warming (GW) and eutrophication (EU) impacts of corn production. To the authors' best knowledge, this is the first study applying and comparing ML approaches for predicting the spatially explicit life cycle environmental impacts of agricultural production.

Overview of Models
Five modeling approaches were tested and compared in this study, including linear regressison (LR), support vector machine regression (SVR) (Section 2.1.1), artificial neural network (ANN) (Section 2.1.2), two tree based model, gradient boosted regression tree (GBRT) (Section 2.1.3), and extreme gradient boosting (XGBoost) (Section 2.1.4) to determine the modeling approach which presented best predictive accuracy and the highest computational efficiency [58].

Support Vector Machine Regression
Support vector machines (SVMs) are a class of learning based non-linear modeling algorithms with proven performance in a wide range of practical applications in classification and regression [59,60]. SVMs were initially developed for qualitative modeling problems. With the introduction of an -insensitive loss function, SVMs have been extended to solve quantitative modeling problems. In this study, the SVM regression (SVR) technique was employed to model the county-level life cycle GW and EU impacts influenced by climate, soil characteristics, and farming practices features.
Suppose that a training set S containing n data points is given as: where x i ∈ R d , i.e., d-dimensional space, and y i ∈ R. The goal of SVR is to find a function f which can estimate all these data well. It is equivalent to the following constrained optimization problem: where w is the normal vector of the hyperplane, ξ i and ξ * i are the slack variables, and C is the coefficient of the regularization terms. In this case, it is set to equal to 1. w T x i + b is the regression line which is the center of a tunnel with radius γ = 1 ||w|| . Points on the surface so-called support vectors make up the tunnel. Involving ξ i , ξ * i allows left certain points outside tunnel as errors in order to avoid over-fitting. ξ i , ξ * i are set as 0.1. The main purpose of SVR is to represent complicated relationships via non-linear mapping. The original input space is mapped into a high-dimensional feature space using kernel functions where it becomes linearly separable. Some common used kernel functions are linear, polynomial, sigmoid, and radial basis kernel functions (RBF) [61]. The RBF is the most often used kernel function in SVR.
The kernel function can be chosen according to data structure and the aim of project. Due to the small size of features, Gaussian kernel is used in this study. Two parameters, C and γ , must be appropriately set in SVR. The accuracy will be very high in the training stage and very low in the testing stage, when the value of C is set too large, while an extremely small value of may result in under-fitting, and excessively large value of may lead to over-fitting.

Artificial Neural Networks
Artificial neural networks (ANN) represent a type of computing that is based on the way that the human brain performs computations [62], which has been extensively used for pattern recognition and classification [63,64]. It consists of simple computational units called neurons. The numbers of neurons in the input and output layers are typically fixed by the type of application. The number of hidden layers and their neurons is typically determined by trial and error [65]. ANN can model complex non-linear relationships. The extra layers enable the composition of features from lower layers, giving the potential of modeling complex data with fewer units than a similarly performing shallow network [66].
The multi-layer perceptron (MLP) is the most frequently used ANN method, which consists of layered feedforward networks typically trained with static back propagation (see Figure 1). The neuron receives a set of inputs or signals (x), calculates a weighted average of them (z) using the summation function and weights (w), and then uses some activation function ( f ) to produce an output, where The connections between the input layer and the hidden layer contain weights which are usually determined through training the system. The hidden layer sums the weighted inputs and uses the transfer function to create an output value. The transfer function is a relationship between the internal activation level of the neuron (called activation function) and the outputs. The rectified linear unit (ReLU) is the most commonly used activation function in deep learning models.
where x is the input to a neuron. The MLP was trained with back-propagation algorithm (BPA), the most frequently used in the ANN literature, and was adopted in this study. The values of the weights are changed by using stochastic gradient descent (SGD) to minimize a suitable function used as the training stopping criterion. One of the functions most commonly used in supervised learning is the sum-of squared residuals given by Equation (6): where y i andŷ i are the actual and predicted values, respectively. The current weight change on a given layer is given by Equation (7).The weights will be updated by following the slope of the error surface downwards towards its minimum.
where η (usually between 10 −6 and 1.0) is the learning rate. A high learning rate corresponds to rapid learning which may push the training towards a local minimum or cause oscillation. In turn, when applying small learning rates, the time to reach a global minimum will be considerably increase [67].
To achieve faster learning and avoid local minima, an additional term, called "momentum," is adopted to smooth out the weight changes, which helps to protect network learning from oscillation. To control the overfitting of a neural network, L 2 regularization and dropout are also used in this study.

Gradient Boosting Regression Tree
Gradient boosting regression tree (GBRT) is an ensemble learning method that integrates the regression tree model with gradient boosting method. It builds a set of decision trees in a greedy manner at training time and makes a class prediction by majority voting. The algorithm is shown below.
3. Fit a regression tree to target r im giving terminal regions R mj , for j = 1, ..., J (R mj is a tree consisting of J leaf nodes).
The new multiplier γ jm is updated based on the tree f m−1 (x) and γ jm−1 . 4. The new tree is updated by I is the indicator function which equals to 1 if x is in the region R mj . In this paper, we prevent over-fitting by controlling the learning rate and subsample. We set the learning rate to 0.01 and subsampling rate to 0.75.
The multiplier γ jm shrinks in small steps again to avoid over-fitting sufficiently. The maximum depth of the model is set to be 8 to constrain the function space. It controls the complexity of trees. In general, the GBRT model is voted by majority of M trees to decide the best step function to approximate the regression line. The total estimators is equal to 2000.

Extreme Gradient Boosting
Extreme gradient boosting (XGBoost) is an improved algorithm based on the gradient boosting decision tree and can construct boosted trees efficiently and operate in parallel [68]. Compared with GBRT, XGBoost provides four additional features to achieve better performance. They are: (1) The weights of each new tree can be scaled down by given constant leaves. It reduces the influence of a single tree on the final score. (2) In column sampling, which works in a similar way to random forests, each tree is built using only a column-wise sample from the training dataset. (3) The introduction of the regularized loss function to avoid overfitting. (4) The loss function is approximated by Taylor expansion which speeds up the optimization procedure. We want to minimize the following regularized objective function in order to choose f k .
l is the loss function. In order to prevent too large a complexity of the model, the regularization term Ω is included as follows: where γ is the complexity of each leaf. T is the number of leaves in the decision tree, and λ is a parameter to scale the penalty which is set to be 0.85 in this study.

Datasets and Modeling Scenarios
The predictor variables (Xs) included the features of temperature, precipitation, soil organic content, soil texture, application rates of nitrogen and phosphorus fertilizers, and farming practices. The outcomes variables (Ys) were defined as county-level life cycle GW and EU impacts. A total of 10 modeling scenarios reflecting various combinations of algorithms and features were tested and compared in this study (Table 1). Modeling scenarios S1A through S5A included total of 32 features, containing monthly average temperature and precipitation features. Modeling scenarios S1A through S5A included a total of 18 features, containing seasonally average temperature and precipitation features. The data on climate, soil characteristics, and farming practices were collected from various governmental databases for 12 U.S. Midwest states during the years 2000-2008 ( Table 2). The monthly and seasonal average temperature (°C) and precipitation (mm) were obtained from the National Oceanic and Atmospheric Administration (NOAA)'s [69] National Climatic Data Center (NCDC) [70]. Soil data such as soil organic content and soil texture were extracted from the U.S. Department of Agriculture Soil Survey Geographic Database (SSURGO) [71]. The nitrogen and phosphorus fertilizer application rates and tillage practices were obtained from the U.S. Department of Agriculture's (USDA) [72] National Agricultural Survey Statistics and the Conservation Technology Information Center. The life cycle GW and EU impacts of corn production in Midwest states at the county-level during the years 2000-2008 are reported in our previous work (Lee et al. [17]). The total records are approximately 8000 over 9 years from 2000 through 2008. After removing records containing missing values, around 6000 complete records were left. To the authors' best knowledge, this is the largest database for spatially explicit life cycle impacts of corn production. The data sources for each feature are described in Table 2.

Training, Validation, and Test Approaches
The modeling procedures include four sequential steps: data preprocessing, hyper-parameter tuning, validation, and testing. The process is shown in Figure 2. To preprocess the raw data, we performed some trivial data cleaning, such as removing the missing values from the data using case-wise deletion. Categorical variables were one-hot encoded, while the numeric variables were normalized. The training step is equivalent to tuning the parameters for each model to seek the best fitted hyper-parameter combination. We call the parameters for the model hyper-parameters for distinguishing them from parameters of the data set. In order to find the best performing hyper-parameter combination, the grid search method was used. Two hyper-parameters, C and r, have to select the optimal value from two chooses of each kind. The grid, in this case, can be imagined as a 2 by 2 matrix. The names of hyper-parameters are taken to be the columns, and the two possible choices are taken to be the rows. The algorithm chooses one hyper-parameter from each kind at a time becoming a unique combination. Every model used in this study has different hyper-parameters. The different models should be assigned a distinct hyper-parameter grid. The grid search method would try every combination of preset parameters until it accessed all the combinations once. Preset parameters are given by experience. The algorithm would record every result from every hyper-parameter combination, and eventually return the best result and the related hyper-parameter combination. The best-performed parameter set would settle for the corresponding ML model.
After the first round of grid search, the rough selection provides a range of high performing hyper-parameter. This happens only if the hyper-parameters are continuous. For higher precision, the range is divided into several instances depending on the computer's power. In this study, we divided the hyper-parameters, such as the bag fraction for the GBRT model, into 10 instances.
In order to address the overfitting concern, we constructed models using a stricter 3-fold cross-validation methodology, in which we divided the three folds into training (two folds) and test (one fold) sets. The construction of the folds for cross-validation was created using the same stratification procedure. The process repeated three times until every fold was used as a validation fold once. The output from the 3-fold cross-validation would be the average of all folds of cross-validation results. Three metrics were used to evaluate the predictive accuracy of machine learning models, including cross-validation correlation, mean square error, and coefficient of determination: (i) Cross-validation correlation coefficient (Corr, see Equation (15)), was adopted to measure the strength of relationship between label y i and predictionŷ i . (ii) Mean-square-error (MSE, see Equation (16)) measures the average of the errors between y i andŷ i . This metric was involved to be the measure of training, testing, and cross-validation performance.
(iii) Coefficient of determination (R 2 , see Equation (17)) was used to evaluate how well the regression line represents the data. The computational efficiency was determined by the required time for training ML models and for using ML models for predicting spatially and temporally explicit life cycle environmental impacts, based on the computer configuration of Intel i7-4790k CPU, 8 GB memory at 2400 Mhz and Nvidia GTX 970 graphic card. Finally, we are able to calculate the scores under different metrics. The relative importance of input parameters such as weather, soil, and farming practices on life cycle GW and EU were ranked by sorting their corresponding coefficients. All the algorithms and metrics are adopted from Python package scikit-learn [73].

Influences of Input Variables
The models built on monthly features generally yielded higher predictive accuracy than the models built on seasonal features for the same ML algorithms. For example, both cross-validation (CV) correlation and R 2 values of S1A (linear model based on monthly features) were larger than those of S1B (linear model based on seasonal features) for the life cycle GW. Similarly, both CV correlation and R-squared values of S1A were larger than those of S1B for the life cycle EU. The same trends of CV correlation and R-squared values were observed for the remaining modeling algorithms (Table 3), including XGBoost, GBRT, SVR, and MLP. Consistently, XGBoost and GBRT using monthly features achieved lower MSE values than their corresponding algorithms with seasonal features. The only exception was that LM, SVR, and MLP with monthly features presented slightly higher MSE values than their corresponding algorithms with seasonal features.

Influences of Modeling Algorithms
Based on Table 3, GBRT presented the superior predictive accuracy. GBRT showed the highest CV correlation value (0.8) and R squared score (0.63), and the lowest MSE value (0.27) among all five models for the life cycle GW. Similarly, GBRT showed the highest CV correlation (0.87) and R squared score (0.75), and the lowest MSE value (10) among all five models for the life cycle EU. XGBoost was the second-best choice in terms of predictive accuracy. When the XGBoost model was applied for predicting life cycle GW, it produced the second-highest CV correlation value (0.78), the second-highest R squared score (0.61), and the lowest MSE value (0.28). Similarly, when the XGBoost model was used for predicting life cycle EU, it produced the second-highest CV correlation value (0.86), the second-highest R squared score (0.73), and the lowest MSE value (9). In contrast, the benchmark linear model exhibited the lowest predictive accuracy for both GW and EU. For example, the linear model had the lowest CV correlation value (0.45) and R 2 scores (0.2), and the highest MSE (0.58) among all five models for life cycle GW. Similarly, the linear model showed the lowest CV correlation (0.65) and R squared (0.63), and the lowest MSE (0.27) among all five models for the life cycle EU. Overall, XGBoost, GBRT, SVR, and MLP all showed higher predictive accuracy than the benchmark linear model, indicating that these models were capable of capturing non-linearity and complex interactions among features.

Influences of Sample Sizes
Figures 3 and 4 presented the influences of training data size on models' predictive capabilities. The CV correlation values for predicting life cycle GW exponentially increased for all model algorithms when the sample size increased from 100 to 4000. The CV correlation values for predicting life cycle GW reached the peak values at a sample size of round 4000. When the sample size continuously increased beyond 4000, the peak point, the CV correlation values slightly decreased. The CV correlation values for predicting life cycle EU dramatically increased for all model algorithms when the sample size increased from 100 to 5000. Additionally, the CV correlation values for GB and XGBoost exceeded 0.85 when 5000 input datasets were used to generate the models. Figures 3 and 4 also confirmed that GBRT and XGBoost presented the highest predictive accuracy while the linear model always had the lowest predictive accuracy. For the same sample sizes across different algorithms, the GBRT model consistently showed the highest predictive accuracy among all models for both the life cycle GW and EU. The XGBoost model always ranked as the second-best fitted model, whose CV correlation values were slightly lower than the GBRT model under the same sample sizes.

Predictive Efficiency
All five models were a million times more efficient than the traditional process-based LCA models for estimating spatially explicit life cycle GW and EU. With the configuration of an Intel i7-4790k CPU, 8 GB memory at 2400 Mhz, and a Nvidia GTX 970 graphic card, the training time for five distinct algorithms ranged from 0.02 to 112 s for each simulation of hyper-parameter combination. While the linear model required the shortest duration of 0.02 s/simulation, GBRT needed the longest duration of 112 s/simulation for model training( Table 4). The predictive time for all models was less than 1 s. The total training and prediction times for all five models were less than 4 h for estimating spatially explicit life cycle GW and EU at the county scale in the entire Midwest region. In contrast, the traditional process-based LCA model approach relying on detailed bio-geochemistry models, such as EPIC, DNDC, and DAYCENT, may require months to estimate spatially explicit life cycle GW and EU for a large region such as the entire Midwest [17,39].
The comparison across five algorithms indicated trade-offs existed between computational efficiency and predictive accuracy for both the life cycle GW and EU. The linear model was the most efficient approach but presented the lowest predictive accuracy for both the life cycle GW and EU. In contrast, GBRT required the longest training time, but yielded the highest predictive accuracy for both impacts. In addition, although the time required for training GBRT for each hyper-parameter combination was three times longer than for training XGBoost, the CV correlation value for GBRT was only improved over 0.2 than for XGBoost.

Key Influential Factors
The GBRT results showed that nitrogen and phosphorus application rates, soil organic content, and clay and silt types of soil were the top influencing factors for life cycle GW impacts ( Figure 5). Nitrogen fertilizers contributed to life cycle GW impacts directly by causing on-farm nitrous oxide (N 2 O) emissions from volatilization and denitrification processes in soil [74,75], and indirectly by emitting GHGs from supply chain activities such as manufacturing nitrogen fertilizers [76]. Phosphorus fertilizers contributed to life cycle GW primarily by generating GHGs from supply chain activities such as manufacturing phosphorus fertilizers [76]. Moreover, soil characteristics also had a significant role in regulating GHG emissions. Recent studies based on field experiments [77,78] and mechanistic model (i.e., DNDC) [70] suggested that soil organic matter increased the availability of nitrogen and carbon to soil microorganisms, consequently enhancing N 2 O emissions. Furthermore, finer-textured soils, such as clay or silt soils, which tend to retain more water, often resulted in higher N 2 O emissions due to higher denitrification rates in soil [79]. Additionally, meteorological factors such as temperature also had a significant contribution to life cycle GW. For instance, March's temperature potentially influenced the shift of the N 2 O:N 2 ratio [80], thereby increasing N 2 O emissions during the freeze-thaw cycle [81].
The present study also found that the top influencing factors of life cycle EU impacts were precipitation in February, soil organic content, the temperature in May, and precipitation in May and June ( Figure 6). Precipitation can promote the penetration of nutrients through soil layers and wash away the nutrients from topsoil, thereby increasing nutrient leaching and runoff [82]. The contributions of precipitation to life cycle GW impacts were consistent with previous studies examining N fluxes in the watersheds of the contiguous U.S. [83,84] and Lake Michigan Basin [85], and field-based studies in the U.S. Midwest [86], where N losses to water bodies were higher during wet seasons/years compared to dry seasons/years. Furthermore, few studies [87,88] explained that soil organic carbon can lead to the accumulation of carbon and nutrients in the soil, subsequently resulting in more nutrients available for leaching and runoff, which corroborates the findings of this study. Amery and Vandecasteele also reported that lower phosphate binding capacity to soil organic portion may increase susceptibility to phosphorus leaching [89].

Syntheses of Machine Learning Modeling Comparison
The comparison among all modeling scenarios indicated that the GBRT algorithm along with 4000 records of monthly features yielded the highest predictive accuracy for life cycle GW. The model based on the GBRT algorithm with 6000 records of monthly features showed the highest predictive accuracy for the life cycle EU. It is interesting to note that predictive accuracy was improved at the cost of simulation time. Although GBRT showed the highest predictive accuracy, GBRT required the longest training time. Compared with GBRT, the XGBoost model had slightly lower predictive accuracy but significantly lower computational time. Additionally, more training datasets (larger than 6000) will likely enable higher predictive accuracy for the life cycle EU. However, more training datasets (larger than 4000) cannot further improve the predictive accuracy for life cycle GW. Overall, when predictive accuracy is the key criterion, the GBRT model is the best choice. In contrast, when both predictive accuracy and model efficiency are top concerns, the XGBoost model is highly recommended. The majority of the existing machine learning applications in LCA used the ANN approach and did not compare performances between ANN and other ML approaches. However, our study found that BRT and XGBoost models showed higher predictive accuracy than the ANN approach for estimating corn's life cycle global warming and eutrophication impacts.

Application of Machine Learning Based Modeling Approaches for Supply Chain Management and Environmental Policies
Compared with traditional process-based models, the ML models discussed in this study present unique merits, such as faster execution and reduced storage needs to estimate modeling outputs, and more flexible integration into other processes and simulation platforms. Such advantages permit ML models to rapidly complete a high number of simulation runs, thereby making ML models superior approaches for a range of computationally intensive tasks, such as optimization, prediction, and validation. For example, it is often infeasible for traditional process-based models to identify the optimal corn supply chains for corn-based biorefineries in the U.S. Midwest region at a fine spatial resolution over a large region, due to billions of simulation scenarios reflecting various combinations of weather, soil, farming practices, and supply scenarios. In contrast, the ML models compressed the model simulation time from months to seconds, consequently enabling the identification of the optimal corn supply chains for corn users such as biorefineries and animal sectors at a fine spatial resolution over a large geographic area.
The ML models in this study are valuable for policy assistance at the county, state, and regional scales. At the county and state scales, these models are capable of identifying the top polluting counties and states, and of suggesting spatially targeted remediation strategies. For example, different fertilizer application rates and tillage practices could be recommended for different counties to approximate an optimal trade-off level among life cycle GW and EU impacts. The results could also be aggregated into region scale assessments for supporting regional policies and planning. For example, to support regional biofuel and land use policies, our ML models could be used for corn supply chain optimization, corn-based biorefinery siting, and feedstock landscape optimization.
There are also several drawbacks associated with ML models. First, the computational burden for generating spatially explicit training datasets at fine spatial scales (such as a county) over a large geographic area (i.e., the Midwest) is substantial. The spatially explicit life cycle analyses at a fine spatial resolution over a large geographic area are lacking to date. The majority of LCAs that use spatially explicit biogeochemical models focused on a few farms or a small region due to data and computational constraints. This study obtained spatially explicit life cycle environmental impacts of corn at the county scale in the Midwest by integrating process-based LCA and the spatially explicit EPIC model with supercomputer clusters [17]. The computational burden for generating the spatially explicit training datasets likely will restrain the adoption of ML models, especially when the computing infrastructure is not available for a large number of process-based modeling simulations. However, this study showed that increasing sample size beyond 4000 would not improve model accuracy for life cycle GW. This indicated that reducing the sample size to 4000 would be appropriate for reaching desirable predictive accuracy for life cycle GW, and is capable of reducing computational burden associated with generating training datasets and fitting ML models. We recommend future studies to identify the minimal sample size for spatially and temporally explicit LCAs of other crop and animal systems with machine learning approaches. Second, besides inheriting the uncertainty of process-based LCA and the EPIC model, the training algorithms also introduce additional uncertainty to our ML models. The uncertainty of process-based LCA and EPIC models was extensively discussed in previous studies [17,20]. This study focuses on the uncertainty introduced by training algorithms. By comparing four different algorithms and cross-validation procedures, this study found that the GBRT algorithm was the best statistical approach to approximate the traditional process-based model at a lower computational cost. We recommend future studies to identify the minimal sample size for spatially and temporally explicit LCAs of other crop and animal systems with machine learning approaches. Additionally, caution should be taken when applying the ML models generated from this study to other regions, whose weather, soil, or farming practice values may fall out of the ranges of our training datasets in the Midwest. In these cases, one should follow the procedures described in Figure 4 to construct and test ML models before employing them for other regions.

Conclusions and Recommendations for Future Work
While machine learning approaches were suggested as alternative approaches for predicting the life cycle environmental impacts of agricultural production, no studies yet identified the appropriate sample sizes and most suitable machine learning for predicting spatially and temporally explicit life cycle environmental impacts. This study comprehensively compared the influences of ML algorithms, training datasets, and sample sizes on the predicative accuracy and efficiency of ML models for estimating spatially explicit life cycle GW and EU impacts, with the Midwest corn as a case study. Among all LCA studies which used ML algorithms, this study included the largest training datasets, covering over 1000 counties in the Midwest over 9 years. This study found that GBRT and XGBoost models yielded the highest predictive accuracy for both the life cycle GW and EU. To further improve predictive accuracy, this study suggests expanding spatially and temporally explicit training datasets, particularly for spatially explicit life cycle EU. Moreover, the methods demonstrated in this study can be applied to other crop and animal systems. We recommend future studies to test these methods for predicting spatially explicit life cycle environmental impacts of soybean, beef, and dairy production, which are among the top contributors to the national GW and EU impacts. Additionally, the ML models open up new opportunities in solving computation-intensive challenges, such as optimization, prediction, and validation. Future interdisciplinary work, based on integrating the ML models in this study with optimization and economic models, will enable identifying the optimal supply chains, and supporting spatially and temporally explicit economic and policy analyses.
Funding: This document is the result of the research project funded by Presidential Innovation Award at University at Albany, State University of New York.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.