Article

A Novel Framework for Predicting Daily Reference Evapotranspiration Using Interpretable Machine Learning Techniques

by Elsayed Ahmed Elsadek 1,2,†, Mosaad Ali Hussein Ali 3,†, Clinton Williams 4, Kelly R. Thorp 5 and Diaa Eldin M. Elshikha 1,*

1 Biosystems Engineering Department, University of Arizona, Tucson, AZ 85721, USA
2 Agricultural and Biosystems Engineering Department, College of Agriculture, Damietta University, Damietta 34517, Egypt
3 Mining and Metallurgical Engineering Department, Faculty of Engineering, Assiut University, Assiut 71511, Egypt
4 United States Department of Agriculture (USDA)—Agricultural Research Service (ARS), Arid Land Agricultural Research Center, Maricopa, AZ 85138, USA
5 Grassland Soil & Water Research Laboratory, United States Department of Agriculture (USDA)—Agricultural Research Service (ARS), Temple, TX 76502, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Agriculture 2025, 15(18), 1985; https://doi.org/10.3390/agriculture15181985
Submission received: 28 August 2025 / Revised: 16 September 2025 / Accepted: 16 September 2025 / Published: 20 September 2025
(This article belongs to the Section Agricultural Water Management)

Abstract

Accurate estimation of daily reference evapotranspiration (ETo) is crucial for sustainable water resource management and irrigation scheduling, especially in water-scarce regions like Arizona. The standardized Penman–Monteith (PM) method is costly and requires specialized instruments and expertise, making it generally impractical for commercial growers. This study developed 35 ETo models to predict daily ETo across Coolidge, Maricopa, and Queen Creek in Pinal County, Arizona. Seven input combinations of daily meteorological variables were used for training and testing five machine learning (ML) models: Artificial Neural Network (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Support Vector Machine (SVM). Four statistical indicators, namely the coefficient of determination (R2), normalized root-mean-squared error (RMSEn), mean absolute error (MAE), and simulation error (Se), were used to evaluate the ML models’ performance in comparison with the FAO-56 PM standardized method. The SHapley Additive exPlanations (SHAP) method was used to interpret each meteorological variable’s contribution to the model predictions. Overall, the 35 developed ETo models showed excellent to fair performance in predicting daily ETo over the three weather stations. Employing ANN10, RF10, XGBoost10, CatBoost10, and SVM10, incorporating all ten meteorological variables, yielded the highest accuracies during the training and testing periods (0.994 ≤ R2 ≤ 1.0, 0.729 ≤ RMSEn ≤ 3.662%, 0.030 ≤ MAE ≤ 0.181 mm·day−1, and 0.833 ≤ Se ≤ 2.295%). Excluding meteorological variables caused a gradual decline in the developed ETo models’ performance across the stations. However, 3-variable models using only maximum, minimum, and average temperatures (Tmax, Tmin, and Tave) still predicted ETo well across the three stations during testing (13.469 ≤ RMSEn ≤ 17.655% and Se ≤ 15.45%). Results highlighted that Tmax, solar radiation (Rs), and wind speed at 2 m height (U2) are the most influential factors affecting ETo at the central Arizona sites, followed by extraterrestrial solar radiation (Ra) and Tave. In contrast, humidity-related variables (RHmin, RHmax, and RHave), along with Tmin and precipitation (Pr), had minimal impact on the models’ predictions. The results are informative for assisting growers and policymakers in developing effective water management strategies, especially for arid regions like central Arizona.

1. Introduction

In the desert U.S. Southwest, agroecosystems are vulnerable to water stress due to the ongoing drought in the Colorado River basin [1,2], depletion of groundwater and reservoir water supplies, and increased distribution of water for industrial and municipal needs [2,3,4,5]. A great challenge for agriculture in this region is to find solutions for sustaining crop production under the reduced water allocations mandated for many of the water districts [6,7] and climate change, which significantly impacts the agricultural system and accelerates the inevitable risk of water scarcity [8,9,10]. Given the current and future water scarcity and climate change challenges facing the agriculture sector [11], improving existing irrigation management practices for obtaining economic yields with less water is critical. Water scarcity requires more precise use of irrigation water [12], where irrigation timing and amount are synchronized to meet crop water requirements and sustain plant-available water status [13,14]. The crop water requirement is closely related to crop evapotranspiration (ET), which is the sum of (1) the losses of water to air from the soil surface evaporation and (2) plant transpiration of soil water from the crop root zone [15]. In arid and semi-arid climates in the U.S., ET constitutes a large majority of the water consumption from irrigated fields [16]. Thus, improvements in irrigation management and water use efficiency require accurate estimates of ET to help guide appropriate irrigation scheduling decisions for growers [17].
During the past few decades, various measurement techniques have been shown to accurately quantify the actual crop ET in fields, including soil water balance (SWB) methods, lysimeters, Bowen ratio systems, and eddy covariance (EC) systems [18]. However, use of these methods is costly and time-consuming, and generally impractical for commercial growers. Moreover, they require expertise and knowledge of the instruments used [19]. Nevertheless, some cost-effective ET tools have been developed, such as the widely used crop coefficient (Kc) methods, where the ET for a crop is estimated by multiplying crop-specific, time-based Kc values by a weather-based reference ET [20]. The Penman–Monteith model (PM) is the standard method for estimating reference evapotranspiration (ETo) and a benchmark model for calibrating other ETo models. Measuring all the input variables needed for the PM is costly and labor-intensive, requiring specialized instrumentation and expertise. Moreover, many of these variables may be unavailable or of low quality, especially in developing countries, impeding its universal application [21]. Therefore, empirical models such as the Hargreaves and Samani model [22], a simplified temperature-based equation that utilizes maximum, minimum, and average temperatures, have been developed for calculating ETo [9,10]. However, ETo estimates based on such equations have shown inconsistencies compared to the PM in different geographical regions worldwide [23,24,25].
To date, artificial intelligence techniques have expanded rapidly, offering powerful alternative tools for ETo modeling [26,27,28,29], especially through machine learning (ML) models [30]. However, to the best of our knowledge, no published studies have applied advanced ML techniques to accurately predict daily ETo under the arid climate conditions of Arizona, USA. To fill this scientific gap and provide a reliable and interpretable data-driven framework to support irrigation planning and water resource management in water-scarce regions such as Arizona, the main objectives of this study were to (1) develop and evaluate five advanced ML models, namely Artificial Neural Network (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Support Vector Machine (SVM), for the accurate prediction of daily ETo based on ten weather variables (maximum, minimum, and average temperature [Tmax, Tmin, and Tave], precipitation [Pr], maximum, minimum, and average relative humidity [RHmax, RHmin, and RHave], wind speed at 2 m height [U2], extraterrestrial solar radiation [Ra], and solar radiation [Rs]); (2) employ the SHapley Additive exPlanations (SHAP) method to interpret the contribution of each meteorological variable to model predictions; and (3) use each ML algorithm to develop ETo models with different weather input variables, resulting in varying levels of model complexity. To ensure their accuracy, the 35 developed ETo models were trained and tested at three weather stations in Pinal County, Arizona, using data from January 1995 to June 2025.

2. Materials and Methods

2.1. Study Area and Datasets

Pinal County, located in central Arizona, USA (latitude 32.90431° N, longitude 111.34471° W), covers an area of approximately 13,900 km2 (Figure 1). The region has an arid climate, with average annual precipitation of about 150–200 mm. Hot and dry conditions prevail from late spring through fall, with maximum daily air temperatures typically reaching 43.0 °C to 48.0 °C during summer. Pinal County was selected because agriculture accounts for nearly 92.3% of the county’s water consumption [31], much of which has been sourced from the Colorado River [32]. Irrigation water is predominantly consumed by highly water-intensive crops, such as alfalfa and cotton [33]. Recently, Arizona’s Colorado River supply was reduced by 18–21% due to mandated water rationing, with private growers in the central and southern counties of the state primarily bearing the brunt of the reductions [34].
Daily meteorological data, including Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, Rs, and ETo, covering the period from January 1995 to June 2025, were collected from Arizona Meteorological Network (AZMET) stations in Coolidge, Maricopa, and Queen Creek (https://cals.arizona.edu/AZMET/26.htm, accessed on 1 July 2025). The meteorological variables were input to the five ML models to predict daily ETo as the target output variable across the three weather stations in Pinal County, Arizona. The meteorological variables were selected based on their Kendall correlation (Figure 2) and significance in predicting ETo [23,35,36]. The statistics of the datasets used in the current study are summarized in Table 1 and Figure 2.

2.2. Machine Learning Models

2.2.1. Artificial Neural Network (ANN)

The ANN is a mathematical ML model that simulates how the human brain learns from trial and error. Like the human brain, an ANN consists of neurons, or processing elements (PEs), arranged in three layers: input, hidden, and output. Data are introduced into the ANN in the input layer, processed in the hidden layer, and the results are produced in the output layer. The three layers are fully interconnected by weights, similar to the synapses in biological neurons. Each PE (neuron) sums the weighted inputs, adds a bias term, and then passes the result (the activation value) through an activation, or transfer, function. The activation function (f) works as a nonlinear filter that squashes the output of an artificial neuron to values between two asymptotes, typically between 0 and 1 or −1 and 1.
$$ PE_{output,\,j} = f(I_j) $$

$$ I_j = \sum_{i=1}^{n} \left( w_{ij}\, x_i + b_i \right) $$
where PEoutput,j is the output of processing element (neuron) j, Ij is the sum of the weighted inputs to neuron j, wij is the weight between the i-th input and neuron j of the hidden layer, and xi and bi are the i-th input value and bias term, respectively.
ANNs have been used in hydrological studies to approximate unknown functions or to predict values from noisy, complex time series data [37]. For further information about ANNs, see Kumar et al. [38].
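To make the notation above concrete, the following minimal sketch (illustrative only, not the study’s code) evaluates one processing element with NumPy, using a logistic activation and arbitrary example weights.

```python
import numpy as np

def neuron_output(x, w_j, b, activation=lambda I: 1.0 / (1.0 + np.exp(-I))):
    """Forward pass of a single processing element (neuron) j.

    x  : input vector (x_1 ... x_n)
    w_j: weights w_ij connecting each input i to neuron j
    b  : bias term(s) added to the weighted inputs
    The default activation is the logistic sigmoid, which squashes the
    summed input I_j into the (0, 1) range.
    """
    I_j = np.sum(w_j * x + b)      # I_j = sum_i (w_ij * x_i + b_i)
    return activation(I_j)         # PE_output,j = f(I_j)

# Illustrative (hypothetical) values: three scaled inputs, e.g. Tmax, Rs, U2
x = np.array([0.8, 0.6, 0.3])
w_j = np.array([0.5, 0.4, -0.2])
b = 0.1
print(neuron_output(x, w_j, b))    # a value between 0 and 1
```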

2.2.2. Random Forest (RF)

The RF, developed by Breiman [39], is a powerful ML algorithm widely used for regression and classification tasks. The core concept of RF is to ensemble a set of regression or classification trees. Each tree in the forest is grown on a different learning dataset generated by bagging (bootstrap sampling): in each bootstrap, a random sample is drawn with replacement from the original dataset to train one decision tree. At each node, a random subset of the input features is considered for splitting, so the forest is constructed from many diverse trees.
The splitting algorithm follows a hierarchical structure and is implemented in a binary manner. Thus, at each step, it identifies the optimal split points and the best splitting variable among all input variables and all possible split points, aiming to minimize the sum of squared residuals.
Three training factors control how the forest is built: (1) the number of trees (ntree), (2) the number of input variables randomly tried at each split (mtry), and (3) the maximum number of terminal nodes per tree (maxnode). A full description of RF can be found in Breiman [39].
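As a rough mapping (an illustration, not the configuration used in this study), the three training factors correspond to scikit-learn’s RandomForestRegressor arguments roughly as ntree → n_estimators, mtry → max_features, and maxnode → max_leaf_nodes; the values below are placeholders.

```python
from sklearn.ensemble import RandomForestRegressor

# ntree -> n_estimators, mtry -> max_features, maxnode -> max_leaf_nodes
rf = RandomForestRegressor(
    n_estimators=100,     # number of trees grown on bootstrap samples (ntree)
    max_features=3,       # input variables randomly tried at each split (mtry)
    max_leaf_nodes=None,  # cap on terminal nodes per tree (maxnode); None = unlimited
    bootstrap=True,       # draw each tree's training sample with replacement
    random_state=42,
)
# rf.fit(X_train, y_train); eto_pred = rf.predict(X_test)
```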

2.2.3. XGBoost

Extreme Gradient Boosting (XGBoost) is a scalable end-to-end tree boosting algorithm, widely recognized as an effective machine learning method. The original framework of XGBoost aggregates numerous “weak” learners into a “strong” learner through an iterative process known as gradient boosting of decision trees. In each iteration, the model is updated by correcting the previous predictions based on their residuals. The algorithm allows the loss function used for model evaluation to be selected independently. Furthermore, a regularization term is incorporated into the model to mitigate the risk of overfitting. In regression tasks, the prediction is the sum of the leaf scores of all trees. The XGBoost prediction is expressed as follows:
$$ \hat{y}_i = \sum_{m=1}^{M} f_m(x_i), \quad f_m \in \mathcal{F} $$

where ŷi is the predicted value for the i-th sample, M is the total number of trees (boosting rounds), fm(xi) is the prediction from the m-th regression tree for input xi, and 𝓕 is the space of all possible regression trees. Each fm is a tree structure that maps xi to a leaf score. A full description of XGBoost can be found in Chen and Guestrin [40].
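To illustrate the iterative residual-correction idea behind the equation above, the short sketch below (not the study’s data or settings; the synthetic target is an assumption) fits XGBRegressor with an increasing number of boosting rounds and shows the training error shrinking as more trees fm are added.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 3))                       # three synthetic "weather" inputs
y = 2.0 * X[:, 0] + np.sin(6 * X[:, 1]) + 0.1 * rng.normal(size=500)

# More boosting rounds -> more trees f_m in the additive sum -> smaller residuals
for n_rounds in (1, 10, 100):
    model = XGBRegressor(
        n_estimators=n_rounds,         # M, the number of regression trees
        learning_rate=0.1,
        objective="reg:squarederror",  # loss function chosen for the task
        random_state=42,
    )
    model.fit(X, y)
    print(n_rounds, mean_absolute_error(y, model.predict(X)))
```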

2.2.4. CatBoost

CatBoost is a powerful gradient boosting (GB) toolkit, designed especially to handle categorical features with minimal loss of information. CatBoost provides a fast Graphics Processing Unit (GPU) implementation of the learning algorithm and a Central Processing Unit (CPU) implementation of the scoring algorithm, enabling it to outperform other GB algorithms on ensembles of similar size [41].
CatBoost is distinguished by two crucial algorithmic advances: (1) implementing ordered boosting, a permutation-driven alternative to overcome the overfitting caused by the classical GB algorithms, and (2) an innovative algorithm for handling categorical features. Both were developed to address the prediction shift caused by a specific form of target leakage inherent in all existing implementations of GB algorithms [42].
Binary decision trees are used as the base predictors in CatBoost, where a decision tree is written as follows [42]:
$$ f(x_i) = \sum_{j=1}^{J} b_j \, \mathbf{1}_{\{x \in R_j\}} $$

where f(xi) is the prediction made by a decision tree for the i-th input xi, J is the total number of terminal nodes (leaves) in the decision tree, Rj is the region (subset of the feature space) corresponding to the j-th leaf, bj is the predicted value in region Rj, and 1{x ∈ Rj} is an indicator function equal to one if x falls in region Rj and zero otherwise.
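As a hedged illustration (not the configuration reported in Section 2.3), CatBoost’s permutation-driven ordered boosting can be requested explicitly via the boosting_type option of CatBoostRegressor; the iteration count and learning rate below are placeholder values.

```python
from catboost import CatBoostRegressor

# 'Ordered' selects the permutation-driven scheme designed to reduce the
# prediction shift of classical gradient boosting; 'Plain' is the classical mode.
model = CatBoostRegressor(
    boosting_type="Ordered",
    iterations=500,        # number of binary decision trees (base predictors)
    learning_rate=0.05,    # illustrative value
    verbose=0,
    random_state=42,
)
# model.fit(X_train, y_train); eto_pred = model.predict(X_test)
```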

2.2.5. Support Vector Machine (SVM)

The SVM, originally introduced by Cortes and Vapnik [43], is a machine learning algorithm designed for two-class classification problems; its extension to regression is known as support vector regression (SVR). Theoretically, SVM is a kernel-trick-based algorithm [44]: it works by mapping input vectors nonlinearly into a very high-dimensional feature space, within which a linear decision boundary is established. The distinctive characteristics of this decision boundary give the learning machine a high generalization ability.
$$ f(x) = \sum_{i=1}^{n} \alpha_i \, K(x, x_i) + b $$

where f(x) is the predicted value for an input x, αi is the Lagrange multiplier coefficient (accounting for under- and overestimation), K(x, xi) is the kernel function, and b is a bias term (intercept).
As a powerful kernel for nonlinear data, the radial basis function (RBF) kernel was used in the current study to predict ETo:
$$ K(x, x_i) = \exp\left( -\gamma \, \lVert x - x_i \rVert^2 \right) $$

where γ is the kernel coefficient controlling the width of the RBF.
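A minimal sketch of the RBF kernel above in NumPy, together with the corresponding scikit-learn SVR estimator; the gamma value and the sample feature vectors are illustrative assumptions, not settings from this study.

```python
import numpy as np
from sklearn.svm import SVR

def rbf_kernel(x, x_i, gamma=0.5):
    """K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(x_i)) ** 2))

print(rbf_kernel([25.0, 7.5], [24.0, 7.0]))   # similarity of two feature vectors

# The same kernel inside support vector regression (scikit-learn default gamma='scale')
svr = SVR(kernel="rbf")
# svr.fit(X_train, y_train); eto_pred = svr.predict(X_test)
```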

2.3. Models’ Hyperparameter Settings

Table 2 summarizes the default and tuned hyperparameters for each machine learning model used in ETo prediction. The hyperparameters of each machine learning model were selected based on best practices and empirical performance. For the ANN, a two-hidden-layer architecture was adopted using MLPRegressor (hidden_layer_sizes = (100, 50), max_iter = 1000, random_state = 42), where the increased layer depth and iteration count enhance the model’s capacity to capture complex, nonlinear relationships in meteorological data. The SVM model employed a radial basis function kernel [SVR (kernel = ‘rbf’)], which is well-suited for handling nonlinear regression problems, with other parameters left at default to maintain simplicity. The RF was configured with n_estimators = 100 and random_state = 42, striking a balance between accuracy and computational efficiency while leveraging the ensemble’s robustness to hyperparameter settings. For XGBoost, the model was initialized as XGBRegressor (n_estimators = 100, learning_rate = 0.1, random_state = 42), reflecting standard configurations that offer strong performance across a wide range of tabular datasets. Lastly, CatBoost was used with verbose = 0 to suppress console output and random_state = 42 for reproducibility, as it typically achieves high accuracy with minimal tuning. These configurations were chosen to establish a strong baseline performance while ensuring stability and comparability across the ML models. In this study, all ML models were implemented in the Python 3.13.3 environment using the open-source scikit-learn library (Version 1.7.1), along with the XGBoost and CatBoost Python packages.
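For reference, the configurations described above correspond roughly to the following estimator definitions, a sketch assuming the scikit-learn, xgboost, and catboost Python packages; data loading, scaling, and fitting are omitted.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

models = {
    # Two hidden layers (100 and 50 neurons) with an increased iteration budget
    "ANN": MLPRegressor(hidden_layer_sizes=(100, 50), max_iter=1000, random_state=42),
    # RBF kernel for nonlinear regression; remaining parameters left at defaults
    "SVM": SVR(kernel="rbf"),
    # 100 trees balance accuracy and computational cost
    "RF": RandomForestRegressor(n_estimators=100, random_state=42),
    # Standard gradient-boosting configuration for tabular data
    "XGBoost": XGBRegressor(n_estimators=100, learning_rate=0.1, random_state=42),
    # Console output suppressed; fixed seed for reproducibility
    "CatBoost": CatBoostRegressor(verbose=0, random_state=42),
}
# for name, model in models.items():
#     model.fit(X_train, y_train)
#     eto_pred[name] = model.predict(X_test)
```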

2.4. Input Combinations

The current study used seven input combinations of daily meteorological variables as modeling scenarios for training and testing the five ML models on predicting ETo across Pinal County, Arizona (Table 3). The seven combinations consist of a complete combination with all meteorological variables and incomplete combinations with part of the variables, reflecting different levels of complexity: (1) Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, and Rs; (2) Tmax, Tmin, Tave, Pr, RHave, U2, Ra, and Rs; (3) Tmax, Tmin, Tave, Pr, RHave, U2, and Rs; (4) Tmax, Tmin, Tave, Pr, RHave, and U2; (5) Tmax, Tmin, Tave, Pr, and RHave; (6) Tmax, Tmin, Tave, and Pr; while the last is a simplified temperature-based combination [(7) Tmax, Tmin, and Tave]. Recently, different approaches have been adopted for splitting datasets into training and testing subsets, with a 70:30 division being the most frequently applied [23,45]. Following this approach, the daily meteorological data from January 1995 to June 2025 were randomly partitioned into 70% for training and 30% for testing, ensuring a reliable basis for developing and validating the ETo prediction models.
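The seven input scenarios and the 70:30 random split can be sketched as follows; the DataFrame name, the helper function, and the split seed are illustrative assumptions (the paper does not report the seed used for splitting).

```python
from sklearn.model_selection import train_test_split

FEATURES = ["Tmax", "Tmin", "Tave", "Pr", "RHmax", "RHmin", "RHave", "U2", "Ra", "Rs"]

combinations = {
    10: FEATURES,
    8:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2", "Ra", "Rs"],
    7:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2", "Rs"],
    6:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2"],
    5:  ["Tmax", "Tmin", "Tave", "Pr", "RHave"],
    4:  ["Tmax", "Tmin", "Tave", "Pr"],
    3:  ["Tmax", "Tmin", "Tave"],
}

def split_scenario(df, n_vars, seed=42):
    """Return a 70:30 random train/test split for one input combination.

    df is a pandas DataFrame of daily AZMET records with an 'ETo' column.
    """
    X = df[combinations[n_vars]]
    y = df["ETo"]
    return train_test_split(X, y, test_size=0.3, random_state=seed)

# X_train, X_test, y_train, y_test = split_scenario(azmet_df, n_vars=10)
```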

2.5. SHAP: SHapley Additive exPlanations

SHAP is a framework for understanding an ML model’s prediction by attributing the contribution of each feature to the predicted outcome [46]. SHAP values, which are rooted in cooperative game theory, provide a comprehensive approach to explaining the output of complex models [47]. These values capture the marginal impact of each feature on the prediction while considering the interplay between features and assessing all feasible combinations. Consequently, SHAP represents a valuable tool for understanding and quantifying the significance of each feature in a predictive model, and thus discerning which features propel specific predictions and how much they contribute.
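For tree ensembles such as the CatBoost models used here, SHAP values and the beeswarm/bar summaries shown in Section 3 can be generated along these lines; catboost_model and X_test are hypothetical names for a fitted model and the test-set features.

```python
import shap

# TreeExplainer computes SHAP values efficiently for tree ensembles
explainer = shap.TreeExplainer(catboost_model)
shap_values = explainer.shap_values(X_test)     # one value per sample and feature, in mm/day

# Beeswarm-style summary (feature importance and direction of effect)
shap.summary_plot(shap_values, X_test)
# Mean absolute SHAP values as a bar chart
shap.summary_plot(shap_values, X_test, plot_type="bar")
```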

2.6. Evaluation Indices

Four statistical indicators, namely the coefficient of determination (R2), normalized root-mean-squared error (RMSEn), mean absolute error (MAE), and simulation error (Se), were used to evaluate the performance of the five ML algorithms.
$$ R^2 = \left[ \frac{\sum_{i=1}^{n} \left( ET_{obs,i} - \overline{ET}_{obs} \right)\left( ET_{pre,i} - \overline{ET}_{pre} \right)}{\sqrt{\sum_{i=1}^{n} \left( ET_{obs,i} - \overline{ET}_{obs} \right)^2} \, \sqrt{\sum_{i=1}^{n} \left( ET_{pre,i} - \overline{ET}_{pre} \right)^2}} \right]^2 $$

$$ RMSE_n = \frac{100}{\overline{ET}_{obs}} \sqrt{\frac{\sum_{i=1}^{n} \left( ET_{pre,i} - ET_{obs,i} \right)^2}{n}} $$

$$ MAE = \frac{\sum_{i=1}^{n} \left| ET_{pre,i} - ET_{obs,i} \right|}{n} $$
$$ S_e = \frac{\overline{ET}_{pre} - \overline{ET}_{obs}}{\overline{ET}_{obs}} \times 100\% $$
where ETobs and ETpre refer to the observed and predicted ET, respectively, and the overbars denote the means of the observed and predicted ET. The R2 value reflects the strength of agreement between the predicted and observed datasets; an R2 close to 1.0 indicates a good match. The RMSEn value ranges between 0 and ∞. The simulation is considered excellent if RMSEn < 10% and good if RMSEn is between 10 and 20%; a value from 20 to 30% indicates fair performance, while RMSEn > 30% indicates poor performance [48]. The MAE reflects the mean absolute difference between the modeled and observed values; the lower the MAE, the greater the modeling accuracy. An Se within ±15% is acceptable [49].
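A direct NumPy implementation of the four indices defined above (a minimal sketch; y_test and eto_pred are hypothetical arrays holding the FAO-56 PM ETo and a model’s predictions for the test set).

```python
import numpy as np

def evaluation_indices(et_obs, et_pre):
    """Return R2, RMSEn (%), MAE (mm/day), and Se (%) for one model."""
    et_obs, et_pre = np.asarray(et_obs, float), np.asarray(et_pre, float)
    r2 = np.corrcoef(et_obs, et_pre)[0, 1] ** 2                     # squared Pearson correlation
    rmse_n = 100.0 / et_obs.mean() * np.sqrt(np.mean((et_pre - et_obs) ** 2))
    mae = np.mean(np.abs(et_pre - et_obs))
    se = (et_pre.mean() - et_obs.mean()) / et_obs.mean() * 100.0    # simulation error, %
    return r2, rmse_n, mae, se

# r2, rmse_n, mae, se = evaluation_indices(y_test, eto_pred)
```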

3. Results and Discussion

3.1. Coolidge Station

Table 4 summarizes the performance evaluation results of the 35 developed ETo models at the Coolidge weather station, Pinal County, Arizona. Considering the 10-variable models, which require Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, and Rs, the evaluation indices during the testing period were 1.305 ≤ RMSEn ≤ 3.485% and 0.890 ≤ Se ≤ 2.295% (Table 4). Meanwhile, all 10-variable models showed a slight tendency to overestimate ETo by 0.045–0.115 mm·day−1.
For all ML models, the scatter plots of the test set across Coolidge station showed a linear-positive distribution between ETobs and ETpre with 0.994 ≤ R2 ≤ 0.999 (Table 4 and Figure 3).
Based on violin diagrams and performance indices (Figure 4 and Table 4), there were no obvious differences in the structure and distribution of all ML algorithms compared with the actual datasets, indicating excellent accuracy in predicting ETo.
The CatBoost algorithm, CatBoost10 model, outperformed other models with R2 = 0.999, RMSEn = 1.305%, MAE = 0.045 mm·day−1, and Se = 0.890%, making it the best choice to predict ETo over Coolidge using the ten weather parameters.
Figure 5 presents the SHAP beeswarm plot for the CatBoost10 model, illustrating each input feature’s contribution to the ETo prediction. The features are ranked by importance from top to bottom, based on their average absolute SHAP values. The SHAP values on the x-axis represent the impact of each feature on the model output in terms of the ETo prediction in mm·day−1, where positive values indicate an increase and negative values a decrease in predicted ETo. Each dot corresponds to a sample, with its color reflecting the feature’s actual value (red for high, blue for low). These values quantify how much each feature shifts the model’s output from the base value, i.e., the average model prediction, either positively or negatively. For instance, a SHAP value of +2 for Tmax indicates that this feature has increased the predicted ETo by 2 mm·day−1 for a specific instance. Among all variables, Tmax, Rs, and U2 had the most significant positive influence on ETo predictions, as high values of these parameters tend to enhance evapotranspiration. Ra and Tave also showed moderate positive effects. Conversely, the variables related to relative humidity (RHmin, RHmax, and RHave) showed an adverse effect on ETo, resulting in reduced SHAP values. Tmin and Pr had the least influence, suggesting limited direct relevance to the model’s prediction of ETo under the given conditions.
Figure 6 displays a SHAP bar plot illustrating the mean absolute SHAP values for each input feature in predicting ETo. The x-axis represents the average magnitude of a feature’s contribution to the model’s output, expressed in mm·day−1. Importantly, these values reflect the absolute influence of each variable regardless of whether the effect increases or decreases the average ETo prediction across all data points. Thus, a higher value on the x-axis signifies a more substantial and consistent role in shaping the model’s predictions, rather than a directional change. For instance, a mean absolute SHAP value of 0.6 mm·day−1 for Rs means that, on average, solar radiation contributes ± 0.6 mm·day−1 to the model’s prediction of daily ETo, regardless of whether it increases or decreases the prediction. Among the considered variables, maximum temperature emerged as the most influential factor, followed by solar radiation and wind speed at 2 m height (Tmax, Rs, and U2). Ra and Tave also contributed significantly, to a lesser extent. Meanwhile, humidity-related variables (RHmin, RHmax, and RHave) along with Tmin and Pr, were found to have minimal impact on the model’s predictions.
Related to the 8-variable models, which require the same input variables except RHmax and RHmin, all ML models showed an excellent performance in predicting ETo with 0.994 ≤ R2 ≤ 0.999, 1.685 ≤ RMSEn ≤ 3.546%, 1.191 ≤ Se ≤ 2.389%, and 0.060 ≤ MAE ≤ 0.120 mm·day−1 during the testing period (Table 4).
The ANN algorithm, model ANN8, outperformed other models with R2 = 0.999, RMSEn = 1.685%, MAE = 0.045 mm·day−1, and the lowest simulation error (Se = 1.191%), making it the best choice to predict ETo over Coolidge station using the eight weather parameters.
Considering the 7-variable models (Tmax, Tmin, Tave, RHave, Rs, Pr, U2), all ML models showed a slight tendency to overestimate ETo by 0.105–0.143 mm·day−1. The lowest simulation error during the testing period was recorded for the CatBoost algorithm, the CatBoost7 (CB7) model (Se = 2.099%), making it the best choice to predict ETo using the seven weather parameters over the Coolidge station.
The absence of Rs from the 6-variable models (Tmax, Tmin, Tave, RHave, Pr, U2) led to an increase in Se of up to 8.107% (RF6 model); however, all ML models performed well during the testing period with 0.948 ≤ R2 ≤ 0.954, 10.125 ≤ RMSEn ≤ 10.811%, and a slight overestimation of ETo by 0.381 to 0.406 mm·day−1. The ANN6 model outperformed all 6-variable models with an Se of 7.615%.
Excluding U2 from the 5-variable models (Tmax, Tmin, Tave, RHave, Pr) resulted in a slight overestimation of ETo, ranging from 0.563 (ANN5 model) to 0.601 mm·day−1 (RF5 model). The ANN5 model outperformed the other ML models with R2 = 0.903, RMSEn = 14.718%, and an Se of 11.249%.
Even after excluding Pr and RHave in the 4-variable and 3-variable models, respectively, all tested ML algorithms performed well in predicting ETo over the Coolidge station (0.832 ≤ R2 ≤ 0.902 and 14.748 ≤ RMSEn ≤ 19.327%). All ML models overestimated ETo by 0.567 (ANN4 model) to 0.727 mm·day−1 (RF3 model), with Se between 11.325% (ANN4 model) and 14.525% (RF3 model), making the ANN4 and ANN3 models the best-performing options for the 4-variable and 3-variable combinations, respectively, at the Coolidge station.

3.2. Maricopa Station

The performance evaluation results of the 35 developed ETo models at the Maricopa weather station are summarized in Table 5. Similarly to the Coolidge weather station, when considering the 10-variable models, all ML algorithms showed a linear positive distribution between ETobs and ETpre with 0.995 ≤ R2 ≤ 0.999 during the testing period (Figure 7 and Table 5). All 10-variable models showed a slight tendency to overestimate ETo by 0.043–0.114 mm·day−1. RMSEn values ranged between 1.224 and 3.521%, while Se values were 0.833 ≤ Se ≤ 2.219%.
Our analysis revealed no notable differences in the structure and distribution of violin diagrams of the observed and predicted datasets, indicating a highly accurate prediction of ETo during the testing period (Figure 8 and Table 5).
Likewise, the CatBoost algorithm, CatBoost10 model, outperformed other models with R2 = 0.999, RMSEn = 1.224%, MAE = 0.043 mm·day−1, and Se = 0.833% when considering the ten weather parameters, making it the best choice to predict ETo over Maricopa.
Like Coolidge station, Tmax, U2, and Rs had the most significant positive impact on ETo predictions. Additionally, Ra and Tave showed a moderate positive effect on predicting ETo. However, RHmin, RHmax, and RHave diminished ETo, leading to lower SHAP values. RHave and Pr had the lowest impact on ETo, suggesting limited direct relevance to the model’s predictions (Figure 9).
Among the ten weather parameters, Tmax was identified as the most influential variable, followed by U2 and Rs. Additionally, both Ra and Tave contributed notably to the model’s predictions; however, RHmin, RHmax, Tmin, RHave, and Pr had minimal impacts (Figure 10).
Considering the 8, 7, 6, 5, 4, and 3-variable models, all ML algorithms showed excellent (RMSEn = 1.587%) to fair (RMSEn = 20.737%) accuracy in predicting daily ETo over the Maricopa station. The ANN algorithm outperformed all ML models with Se values ranging between 1.063 and 14.747%. The RMSEn values were between 1.587% and 19.457%, reflecting an excellent-to-good performance of ANN in predicting ETo over the Maricopa station.

3.3. Queen Creek Station

Evaluation indices of the 35 developed ETo models at the Queen Creek weather station are summarized in Table 6. Like the Coolidge and Maricopa weather stations, all ML algorithms showed a linear positive distribution between ETobs and ETpre (0.994 ≤ R2 ≤ 0.999), with a slight overestimation of daily ETo ranging between 0.043 and 0.110 mm·day−1 when considering the 10-variable models (Figure 11 and Table 6).
There were no notable differences in the structure and distribution of the violin diagrams of the observed and predicted datasets (Figure 12). RMSEn values ranged between 1.224 and 3.521%, while Se values were 0.877 ≤ Se ≤ 2.235%, indicating excellent accuracy in the prediction of ETo during the testing period (Table 6).
The CatBoost algorithm, CB10 model, outperformed other models with a slight overestimation of predicted ETo (MAE = 0.043 mm·day−1) when considering the ten weather parameters. R2, Se, and RMSEn were 0.999, 0.877%, and 1.450%, respectively, indicating an excellent performance of the CB10 model in predicting ETo at Queen Creek.
Like Coolidge and Maricopa, Tmax, Rs, and U2 significantly impacted ETo predictions, while Ra and Tave had moderate effects. However, RHmin, RHmax, and RHave reduced predicted ETo, with RHave and Pr showing minimal impact, indicating limited relevance to the model (Figure 13).
Tmax stood out as the most influential variable, with U2 and Rs also playing significant roles among the ten weather parameters. Meanwhile, Ra and Tave made notable contributions to the model’s predictions, while RHmin, RHmax, Tmin, RHave, and Pr had only minor impacts (Figure 14).
All ML algorithms showed excellent (RMSEn = 1.748%) to good (RMSEn = 18.603%) accuracy in predicting daily ETo over the Queen Creek station when considering the 8, 7, 6, 5, 4, and 3-variable models. The ANN algorithm outperformed all ML models with R2 values between 0.862 and 0.999 and Se ≤ 13.489%. The RMSEn ranged between 1.748% and 17.655%, reflecting an excellent-to-good performance of the ANN-based models in predicting ETo over the Queen Creek station.

3.4. Overall Discussion

In this study, 35 ETo models were developed using five ML algorithms and seven input combinations of meteorological variables, as listed in Table 3. Statistical indices assessing the performance of the 35 developed ETo models in comparison with the FAO-56 PM standard model at Coolidge, Maricopa, and Queen Creek are summarized in Table 4, Table 5 and Table 6, respectively. Generally speaking, the 35 developed ETo models showed excellent to fair performance in predicting daily ETo across the three tested weather stations in Pinal County, Arizona. Employing the ANN10, RF10, XGBoost10, CatBoost10, and SVM10 models, which use all the meteorological variables, resulted in the highest accuracies in predicting daily ETo during the training and testing periods at the three tested stations. However, excluding meteorological variables led to a gradual reduction in the performance of the developed models across the three tested weather stations, with the highest Se values recorded for the 3-variable models. ETo models using the complete set of meteorological variables achieve the best prediction accuracy compared with those using incomplete inputs [50,51]. These findings are consistent with previously cited studies, which reported that using more meteorological variables as inputs can generally boost the accuracy of ML models in estimating daily ETo [35,36,52,53].
The CatBoost algorithm, CatBoost10 model, slightly outperformed other 10-variable models with R2 = 0.999, 1.227 ≤ RMSEn ≤ 1.479, 0.043 ≤ MAE ≤ 0.073 mm·day−1, and 0.833 ≤ Se ≤ 0.890% during the testing period (Table 4, Table 5 and Table 6), making it the best choice to predict ETo over the three weather stations. Our findings agreed with Huang et al. [54], who reported a higher performance of CatBoost than SVM and RF-based models when all input combinations are available. Also, Zhang et al. [55] concluded that the CatBoost model has a great ability in modeling daily ETo in the arid and semi-arid regions of Northern China and recommended CatBoost for ETo estimations with similar climates.
The ANN-based models were slightly superior to the SVM-based models across the three weather stations (Table 4, Table 5 and Table 6). These findings were inconsistent with Kişi and Cimen [56] and Wen et al. [57], who reported a better performance of SVM than ANN in modeling daily ETo in the North Coast Valleys of California, USA, and in the extremely arid region of Ejina, China, respectively. This inconsistency might result from factors affecting the performance of the SVM and ANN models, such as the training algorithm used and the models’ hyperparameter settings.
Meanwhile, the SVM models were more robust and capable than the RF models in predicting ETo across Pinal weather stations. This was in agreement with Rai et al. [58], who revealed that SVM outperforms RF for estimating monthly ETo in Uttar Pradesh and Uttarakhand States, India. Similar observations were reported by Abdallah et al. [35], Chen et al. [59], Fan et al. [60], and Huang et al. [54], who evaluated different deep and machine learning models, including SVM and RF, in predicting daily ETo under different climate conditions in China and Sudan.
Similarly, XGBoost models outperformed RF models in predicting ETo across the three weather stations. This agreed with Abdallah et al. [35], who reported that XGBoost models have a higher accuracy in estimating daily ETo than RF-based models under the hyper-arid regions in Sudan. Additionally, Fan et al. [60] concluded that the XGBoost models are more efficient and capable than the RF models in predicting daily ETo over China. Similar findings were reported in Brazil [26,61] and Bengaluru, Karnataka State, in India [62].
Based on our analyses, the RF8, RF7, RF6, RF5, RF4, and RF3 models showed the highest accuracy in predicting ETo during the training period (see the R2, RMSEn, MAE, and Se values in Table 4, Table 5 and Table 6). In contrast, compared with the other ML models, the RF models had the highest simulation errors across the three weather stations during the testing period (Table 4, Table 5 and Table 6). Our observations were in agreement with Abdallah et al. [35], Fan et al. [60], Feng et al. [63] and Huang et al. [54], who reported that the RF algorithm could show higher accuracy in predicting daily ETo than other ML models during the training period, while the RF models had the highest MAE and root-mean-square errors during the testing period [35].
Taking the CatBoost10 models as an example, the SHAP-based interpretability across the three weather stations aligns well with established physical understanding of ETo drivers (Figure 5, Figure 9 and Figure 13). Among all variables, Tmax, Rs, and U2 had the most significant positive influence on ETo predictions, as high values of these parameters tend to enhance ETo. Ra and Tave also showed moderate positive effects. This was consistent with Allen et al. [20], who reported that during hot, dry weather conditions, ET demand increases due to the aridity of the atmosphere and the abundance of energy from direct Rs and latent heat. Under such conditions, the air can hold a considerable amount of water vapor, while wind facilitates water vapor transport, thereby enabling greater water vapor uptake. Conversely, under humid conditions, the wind can only replace saturated air with slightly less saturated air and remove heat energy, so ET proceeds to a far lesser extent than under arid conditions.
In contrast, the variables related to relative humidity (RHmin, RHmax, and RHave) showed an adverse effect on ETo, resulting in reduced SHAP values (Figure 5, Figure 9 and Figure 13). RH is inversely proportional to temperature. If the humidity remains constant, increasing temperature should increase ET. Increased humidity can partially offset the impact of increasing temperature on ET. Tmin and Pr had the least influence, suggesting limited direct relevance to the model’s prediction of ETo under the arid climate conditions.
Based on mean absolute SHAP values, Tmax emerged as the most influential factor, followed by Rs and U2 across the three tested weather stations (Figure 6, Figure 10 and Figure 14). Ra and Tave also contributed significantly, to a lesser extent. These variables are well-known drivers of evapotranspiration due to their direct physical relationship with atmospheric energy and vapor transport [20]. Meanwhile, humidity-related variables (RHmin, RHmax, and RHave) along with Tmin and Pr were found to have minimal impact on the model’s predictions (Figure 6, Figure 10 and Figure 14). This ranking provides a clear interpretation of the relative importance of meteorological parameters, reinforcing the physical understanding that Tmax, Rs, U2, followed by Ra, and Tave are the dominant factors influencing ETo under arid and semi-arid conditions.
The clear hierarchy of feature influence revealed by the SHAP values (Figure 6, Figure 10 and Figure 14) aligns well with established agro-meteorological principles. It provides a transparent and quantitative basis for prioritizing critical input variables in model development or field sensor deployment, especially in settings where data collection may be constrained. Overall, this bar plot enhances model interpretability and supports the conclusion that radiative and aerodynamic factors are the key drivers of ETo within the studied region.

4. Conclusions, Recommendations, and Outlooks

The current study developed 35 ETo models to predict daily ETo across Pinal County, Arizona. Seven input combinations of daily meteorological variables were used for training and testing the five ML models: Artificial Neural Network (ANN), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Support Vector Machine (SVM). Then, SHapley Additive exPlanations (SHAP) was used to interpret each meteorological variable’s contribution to the model predictions. Results highlighted that the ML models could accurately predict daily ETo under the arid climate in Pinal County, Arizona. Based on the statistical indices, the models using all 10 meteorological variables ranked as follows: CatBoost, ANN, SVM, XGBoost, and RF. However, for incomplete input sets, the ranking was ANN, SVM, CatBoost, XGBoost, and RF. Employing the ANN10, RF10, XGBoost10, CatBoost10, and SVM10 models, which use all the meteorological variables, resulted in the highest accuracies in predicting daily ETo during the training and testing periods at the three tested stations. While excluding meteorological variables decreased model performance, models using only Tmax, Tmin, and Tave still predicted ETo well across the three tested weather stations. Therefore, the three-variable temperature-based models are highly recommended as a simplified technique for predicting daily ETo, especially in areas with limited climatic data, such as developing countries. These models can assist in water resource management and irrigation scheduling when meteorological data are limited. Additionally, our findings highlighted that Tmax, Rs, and U2 are the most influential factors affecting ETo in arid conditions, followed by Ra and Tave. In contrast, humidity-related variables (RHmin, RHmax, and RHave), along with Tmin and Pr, had minimal impact on the models’ predictions.
Selecting only three weather stations with daily meteorological data is insufficient to capture the spatial-temporal variability across Arizona and the USA. Thus, to confirm their accuracy, further studies are recommended to evaluate the performance of the 35 ETo-developed models under different spatial and temporal (hourly, monthly, and yearly) scales in the USA and even worldwide. Moreover, the ET-developed ML models rely on historical datasets, which may not reflect future climate change. Therefore, future studies integrating ET-developed models with global climate models would help improve water budget analysis and support growers and policymakers in the face of water scarcity.

Author Contributions

Conceptualization, E.A.E.; methodology, E.A.E., M.A.H.A. and D.E.M.E.; software, E.A.E. and M.A.H.A.; validation, E.A.E., M.A.H.A. and D.E.M.E.; investigation, E.A.E., M.A.H.A. and D.E.M.E.; data curation, E.A.E., M.A.H.A. and D.E.M.E.; writing—original draft preparation, E.A.E. and M.A.H.A.; writing—review and editing, E.A.E., M.A.H.A., C.W., K.R.T. and D.E.M.E.; visualization, E.A.E.; supervision, D.E.M.E.; project administration, D.E.M.E.; funding acquisition, K.R.T., C.W. and D.E.M.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Arizona Cooperative Extension Service, The University of Arizona, Tucson, AZ 85721, USA. Additionally, it was supported by the Agricultural Research Service of the U.S. Department of Agriculture and carried out in collaboration with the Arid Land Agricultural Research Center.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

The authors greatly appreciate the University of Arizona Cooperative Extension Service for supporting irrigation research in arid regions in the USA, such as Arizona.

Conflicts of Interest

The authors declare that they have no known competing financial interest or personal relationships that could have appeared to influence the work reported in this study.

References

  1. Elshikha, D.E.; Attalah, S.; Waller, P.; Hunsaker, D.J.; Thorp, K.R.; Bautista, E.; Williams, C.; Wall, G.W.; Orr, E.; Elsadek, E.A. Can OpenET Transform Irrigation Management in the Southwestern, U.S.? College of Agriculture, Life, and Environmental Sciences, University of Arizona: Tucson, AZ, USA, 2025. [Google Scholar]
  2. Bennett, K.E.; Talsma, C.; Boero, R. Concurrent Changes in Extreme Hydroclimate Events in the Colorado River Basin. Water 2021, 13, 978. [Google Scholar] [CrossRef]
  3. Holdren, G.C.; Turner, K. Characteristics of Lake Mead, Arizona–Nevada. Lake Reserv. Manag. 2010, 26, 230–239. [Google Scholar] [CrossRef]
  4. Castle, S.L.; Reager, J.T.; Thomas, B.F.; Purdy, A.J.; Lo, M.-H.; Famiglietti, J.S.; Tang, Q. Remote Detection of Water Management Impacts on Evapotranspiration in the Colorado River Basin. Geophys. Res. Lett. 2016, 43, 5089–5097. [Google Scholar] [CrossRef]
  5. Elsadek, E.A.; Attalah, S.; Waller, P.; Norton, R.; Hunsaker, D.J.; Williams, C.; Thorp, K.R.; Orr, E.; Elshikha, D.E.M. Simulating Water Use and Yield for Full and Deficit Flood-Irrigated Cotton in Arizona, USA. Agronomy 2025, 15, 2023. [Google Scholar] [CrossRef]
  6. Thorp, K.R.; Calleja, S.; Pauli, D.; Thompson, A.L.; Elshikha, D.E. Agronomic Outcomes of Precision Irrigation Management Technologies with Varying Complexity. J. ASABE 2022, 65, 135–150. [Google Scholar] [CrossRef]
  7. Elshikha, D.E.; Attalah, S.; Elsadek, E.A.; Waller, P.; Thorp, K.; Sanyal, D.; Bautista, E.; Norton, R.; Hunsaker, D.; Williams, C.; et al. The Impact of Gravity Drip and Flood Irrigation on Development, Water Productivity, and Fiber Yield of Cotton in Semi-Arid Conditions of Arizona. In Proceedings of the 2024 ASABE Annual International Meeting, Anaheim, CA, USA, 28–31 July 2024; American Society of Agricultural and Biological Engineers: St. Joseph, MI, USA, 2024; pp. 1–16. [Google Scholar]
  8. Elshikha, D.E.M.; Wang, G.; Waller, P.M.; Hunsaker, D.J.; Dierig, D.; Thorp, K.R.; Thompson, A.; Katterman, M.E.; Herritt, M.T.; Bautista, E.; et al. Guayule Growth and Yield Responses to Deficit Irrigation Strategies in the U.S. Desert. Agric. Water Manag. 2023, 277, 108093. [Google Scholar] [CrossRef]
  9. Elsadek, E.A.; Zhang, K.; Hamoud, Y.A.; Mousa, A.; Awad, A.; Abdallah, M.; Shaghaleh, H.; Hamad, A.A.A.; Jamil, M.T.; Elbeltagi, A. Impacts of Climate Change on Rice Yields in the Nile River Delta of Egypt: A Large-Scale Projection Analysis Based on CMIP6. Agric. Water Manag. 2024, 292, 108673. [Google Scholar] [CrossRef]
  10. Elsadek, E.A. Study on the In-Field Water Balance and the Projected Impacts of Climate Change on Rice Yields in the Nile River Delta. Ph.D. Thesis, Hohai University, Nanjing, China, 2023. [Google Scholar]
  11. Elsadek, E.A.; Elshikha, D.E.M.; Awad, A.; Hamoud, Y.A.; Elsheikha, A.M.; Williams, C.; Orr, E.; Shaghaleh, H.; Hamad, A.A.A.; Thorp, K.R.; et al. Projecting Rice Water Footprint for Different Shared Socioeconomic Pathways under Arid Climate Conditions. Irrig. Sci. 2025, 43, 955–969. [Google Scholar] [CrossRef]
  12. Elsadek, E.; Zhang, K.; Mousa, A.; Ezaz, G.T.; Tola, T.L.; Shaghaleh, H.; Hamad, A.A.A.; Alhaj Hamoud, Y. Study on the In-Field Water Balance of Direct-Seeded Rice with Various Irrigation Regimes under Arid Climatic Conditions in Egypt Using the AquaCrop Model. Agronomy 2023, 13, 609. [Google Scholar] [CrossRef]
  13. Adeyemi, O.; Grove, I.; Peets, S.; Norton, T. Advanced Monitoring and Management Systems for Improving Sustainability in Precision Irrigation. Sustainability 2017, 9, 353. [Google Scholar] [CrossRef]
  14. Abioye, E.A.; Abidin, M.S.Z.; Mahmud, M.S.A.; Buyamin, S.; Ishak, M.H.I.; Rahman, M.K.I.A.; Otuoze, A.O.; Onotu, P.; Ramli, M.S.A. A Review on Monitoring and Advanced Control Strategies for Precision Irrigation. Comput. Electron. Agric. 2020, 173, 105441. [Google Scholar] [CrossRef]
  15. Volk, J.M.; Huntington, J.L.; Melton, F.S.; Allen, R.; Anderson, M.; Fisher, J.B.; Kilic, A.; Ruhoff, A.; Senay, G.B.; Minor, B.; et al. Assessing the Accuracy of OpenET Satellite-Based Evapotranspiration Data to Support Water Resource and Land Management Applications. Nat. Water 2024, 2, 193–205. [Google Scholar] [CrossRef]
  16. Melton, F.S.; Huntington, J.; Grimm, R.; Herring, J.; Hall, M.; Rollison, D.; Erickson, T.; Allen, R.; Anderson, M.; Fisher, J.B.; et al. OpenET: Filling a Critical Data Gap in Water Management for the Western United States. JAWRA J. Am. Water Resour. Assoc. 2022, 58, 971–994. [Google Scholar] [CrossRef]
  17. Wanniarachchi, S.; Sarukkalige, R. A Review on Evapotranspiration Estimation in Agricultural Water Management: Past, Present, and Future. Hydrology 2022, 9, 123. [Google Scholar] [CrossRef]
  18. French, A.N.; Sanchez, C.A.; Hunsaker, D.J.; Anderson, R.G.; Saber, M.N.; Wisniewski, E.H. Lettuce Evapotranspiration and Crop Coefficients Using Eddy Covariance and Remote Sensing Observations. Irrig. Sci. 2024, 42, 1245–1272. [Google Scholar] [CrossRef]
  19. Bawazir, A.S.; Luthy, R.; King, J.P.; Tanzy, B.F.; Solis, J. Assessment of the Crop Coefficient for Saltgrass under Native Riparian Field Conditions in the Desert Southwest. Hydrol. Process. 2014, 28, 6163–6171. [Google Scholar] [CrossRef]
  20. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration-Guidelines for Computing Crop Water Requirements-FAO Irrigation and Drainage Paper 56; FAO: Rome, Italy, 1998. [Google Scholar]
  21. Shiri, J.; Marti, P.; Karimi, S.; Landeras, G. Data Splitting Strategies for Improving Data Driven Models for Reference Evapotranspiration Estimation among Similar Stations. Comput. Electron. Agric. 2019, 162, 70–81. [Google Scholar] [CrossRef]
  22. Hargreaves, G.H.; Samani, Z.A. Reference Crop Evapotranspiration from Temperature. Appl. Eng. Agric. 1985, 1, 96–99. [Google Scholar] [CrossRef]
  23. Elbeltagi, A.; Katipoğlu, O.M.; Kartal, V.; Danandeh Mehr, A.; Berhail, S.; Elsadek, E.A. Advanced Reference Crop Evapotranspiration Prediction: A Novel Framework Combining Neural Nets, Bee Optimization Algorithm, and Mode Decomposition. Appl. Water Sci. 2024, 14, 256. [Google Scholar] [CrossRef]
  24. Raja, P.; Sona, F.; Surendran, U.; Srinivas, C.V.; Kannan, K.; Madhu, M.; Mahesh, P.; Annepu, S.K.; Ahmed, M.; Chandrasekar, K.; et al. Performance Evaluation of Different Empirical Models for Reference Evapotranspiration Estimation over Udhagamandalm, The Nilgiris, India. Sci. Rep. 2024, 14, 12429. [Google Scholar] [CrossRef] [PubMed]
  25. Celestin, S.; Qi, F.; Li, R.; Yu, T.; Cheng, W. Evaluation of 32 Simple Equations against the Penman–Monteith Method to Estimate the Reference Evapotranspiration in the Hexi Corridor, Northwest China. Water 2020, 12, 2772. [Google Scholar] [CrossRef]
  26. Ferreira, L.B.; da Cunha, F.F. New Approach to Estimate Daily Reference Evapotranspiration Based on Hourly Temperature and Relative Humidity Using Machine Learning and Deep Learning. Agric. Water Manag. 2020, 234, 106113. [Google Scholar] [CrossRef]
  27. Sowmya, M.R.; Kumar, M.B.S.; Ambat, S.K. Comparison of Deep Neural Networks for Reference Evapotranspiration Prediction Using Minimal Meteorological Data. In Proceedings of the 2020 Advanced Computing and Communication Technologies for High Performance Applications (ACCTHPA), Cochin, India, 2–4 July 2020; pp. 27–33. [Google Scholar]
  28. Bellido-Jiménez, J.A.; Estévez, J.; García-Marín, A.P. New Machine Learning Approaches to Improve Reference Evapotranspiration Estimates Using Intra-Daily Temperature-Based Variables in a Semi-Arid Region of Spain. Agric. Water Manag. 2021, 245, 106558. [Google Scholar] [CrossRef]
  29. Kaya, Y.Z.; Zelenakova, M.; Üneş, F.; Demirci, M.; Hlavata, H.; Mesaros, P. Estimation of Daily Evapotranspiration in Košice City (Slovakia) Using Several Soft Computing Techniques. Theor. Appl. Climatol. 2021, 144, 287–298. [Google Scholar] [CrossRef]
  30. Chia, M.Y.; Huang, Y.F.; Koo, C.H.; Fung, K.F. Recent Advances in Evapotranspiration Estimation Using Artificial Intelligence Approaches with a Focus on Hybridization Techniques—A Review. Agronomy 2020, 10, 101. [Google Scholar] [CrossRef]
  31. Duval, D.; Bickel, A.K.; Frisvold, G. County Agricultural Economy Profiles for Southern Arizona. Available online: https://mapazdashboard.arizona.edu/article/county-agricultural-economy-profiles-southern-arizona (accessed on 27 June 2025).
  32. Migoya, C. With Colorado River Water Cuts, Some Pinal Farmers Drill Wells. For Others, Fields Sit Dry. Available online: https://www.azcentral.com/story/news/local/arizona-environment/2023/02/08/cap-water-cuts-increase-disparities-among-pinal-county-farmers/69827876007/ (accessed on 26 August 2025).
  33. Duval, D.; Montanía, C.V.; Quintero, J.H. Arizona County Agricultural Economy Profiles; College of Agriculture, Life, and Environmental Sciences, University of Arizona: Tucson, AZ, USA, 2022. [Google Scholar]
  34. Nabhan, G.P.; Richter, B.D.; Riordan, E.C.; Tornbom, C. Toward Water-Resilient Agriculture in Arizona: Future Scenarios Addressing Water Scarcity; Lincoln Institute of Land Policy: Cambridge, MA, USA, 2023. [Google Scholar]
  35. Abdallah, M.; Mohammadi, B.; Zaroug, M.A.H.; Omer, A.; Cheraghalizadeh, M.; Eldow, M.E.E.; Duan, Z. Reference Evapotranspiration Estimation in Hyper-Arid Regions via D-Vine Copula Based-Quantile Regression and Comparison with Empirical Approaches and Machine Learning Models. J. Hydrol. Reg. Stud. 2022, 44, 101259. [Google Scholar] [CrossRef]
  36. Mehdizadeh, S.; Mohammadi, B.; Pham, Q.B.; Duan, Z. Development of Boosted Machine Learning Models for Estimating Daily Reference Evapotranspiration and Comparison with Empirical Approaches. Water 2021, 13, 3489. [Google Scholar] [CrossRef]
  37. Bisoyi, N.; Gupta, H.; Padhy, N.P.; Chakrapani, G.J. Prediction of Daily Sediment Discharge Using a Back Propagation Neural Network Training Algorithm: A Case Study of the Narmada River, India. Int. J. Sediment Res. 2019, 34, 125–135. [Google Scholar] [CrossRef]
  38. Kumar, M.; Raghuwanshi, N.S.; Singh, R. Artificial Neural Networks Approach in Evapotranspiration Modeling: A Review. Irrig. Sci. 2011, 29, 11–25. [Google Scholar] [CrossRef]
  39. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Figure 1. Arizona linear hydrography features and geographic distribution of the Pinal weather stations, Arizona, USA.
Figure 2. Kendall correlation analysis between the variables used in the Pinal weather stations.
Figure 3. Scatter plots of predicted reference evapotranspiration (ETo, mm·day−1) at the Coolidge weather station while considering all meteorological variables (10-variable models).
Figure 4. Violin diagrams for actual reference evapotranspiration (ETo, mm·day−1) versus predicted ETo at the Coolidge weather station while considering all meteorological variables (10-variable models).
Figure 5. SHapley Additive exPlanations for the Coolidge weather station.
Figure 6. Mean absolute SHapley Additive exPlanations (SHAP) values for each input feature used to predict daily reference evapotranspiration (ETo, mm·day−1) across the Coolidge weather station.
Figure 7. Scatter plots of predicted reference evapotranspiration (ETo, mm·day−1) at the Maricopa weather station while considering all meteorological variables (10-variable models).
Figure 8. Violin diagrams for actual reference evapotranspiration (ETo, mm·day−1) versus predicted ETo at the Maricopa weather station while considering all meteorological variables (10-variable models).
Figure 9. SHapley Additive exPlanations for the Maricopa weather station.
Figure 10. Mean absolute SHapley Additive exPlanations (SHAP) values for each input feature used to predict daily reference evapotranspiration (ETo, mm·day−1) across the Maricopa weather station.
Figure 11. Scatter plots of predicted reference evapotranspiration (ETo, mm·day−1) at the Queen Creek weather station while considering all meteorological variables (10-variable models).
Figure 12. Violin diagrams for actual reference evapotranspiration (ETo, mm·day−1) versus predicted ETo at the Queen Creek weather station while considering all meteorological variables (10-variable models).
Figure 13. SHapley Additive exPlanations for the Queen Creek weather station.
Figure 14. Mean absolute SHapley Additive exPlanations (SHAP) values for each input feature used to predict daily reference evapotranspiration (ETo, mm·day−1) across the Queen Creek weather station.
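The SHAP summary and mean-absolute-SHAP plots referenced in Figures 5, 6, 9, 10, 13, and 14 can be generated with the shap library. The sketch below is illustrative only; the `model` and `X_test` arguments are placeholders for a fitted tree-based regressor (e.g., RF, XGBoost, or CatBoost) and its test-set feature matrix, not the authors' actual script.

```python
import shap


def plot_shap_summaries(model, X_test):
    """Generate SHAP plots for a fitted tree-based regressor (placeholder inputs)."""
    explainer = shap.TreeExplainer(model)        # suitable for tree ensembles
    shap_values = explainer.shap_values(X_test)  # per-sample, per-feature contributions

    # Beeswarm summary plot (style of Figures 5, 9, and 13)
    shap.summary_plot(shap_values, X_test)

    # Mean |SHAP| bar plot ranking feature importance (style of Figures 6, 10, and 14)
    shap.summary_plot(shap_values, X_test, plot_type="bar")
```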
Table 1. Statistics of the daily meteorological variables (1995–2025) used to predict reference evapotranspiration (ETo, mm·day−1) across Pinal County, Arizona.

| Parameter | Station | Mean | Standard Deviation | Minimum | Maximum | Standard Error | Range |
|---|---|---|---|---|---|---|---|
| Tmax, °C | Coolidge | 30.2 | 8.5 | 5.8 | 48.7 | 0.1 | 42.9 |
| Tmax, °C | Maricopa | 30.4 | 8.8 | 5.8 | 48.0 | 0.1 | 42.2 |
| Tmax, °C | Queen Creek | 29.8 | 8.6 | 5.7 | 47.9 | 0.1 | 42.2 |
| Tmin, °C | Coolidge | 10.8 | 8.4 | −10 | 31.1 | 0.1 | 41.1 |
| Tmin, °C | Maricopa | 12.7 | 8.9 | −8.7 | 32.5 | 0.1 | 41.2 |
| Tmin, °C | Queen Creek | 12.1 | 8.3 | −17.3 | 32.2 | 0.1 | 49.5 |
| Tave, °C | Coolidge | 20.4 | 8.3 | −1.0 | 37.6 | 0.1 | 38.6 |
| Tave, °C | Maricopa | 21.5 | 8.9 | 0.1 | 39.2 | 0.1 | 39.1 |
| Tave, °C | Queen Creek | 21.1 | 8.4 | −0.6 | 38.4 | 0.1 | 39.0 |
| Pr, mm | Coolidge | 0.4 | 2.3 | 0.0 | 51.0 | 0.01 | 51.0 |
| Pr, mm | Maricopa | 0.4 | 2.5 | 0.0 | 56.9 | 0.01 | 56.9 |
| Pr, mm | Queen Creek | 0.5 | 2.9 | 0.0 | 73.4 | 0.01 | 73.4 |
| RHmax, % | Coolidge | 77.3 | 16.1 | 17.0 | 100.3 | 0.2 | 83.3 |
| RHmax, % | Maricopa | 69.1 | 19.5 | 19.2 | 100 | 0.2 | 80.8 |
| RHmax, % | Queen Creek | 71.9 | 17.4 | 16.9 | 100.2 | 0.2 | 83.3 |
| RHmin, % | Coolidge | 18.6 | 12.3 | 1.4 | 97.0 | 0.1 | 95.6 |
| RHmin, % | Maricopa | 18.2 | 12.1 | 0.1 | 91.0 | 0.1 | 90.9 |
| RHmin, % | Queen Creek | 18.8 | 12.1 | 1.8 | 93.0 | 0.1 | 91.2 |
| RHave, % | Coolidge | 45.1 | 17.6 | 7.4 | 100.0 | 0.2 | 92.6 |
| RHave, % | Maricopa | 40.5 | 18.4 | 8.5 | 98.0 | 0.2 | 89.5 |
| RHave, % | Queen Creek | 42.1 | 17.7 | 6.1 | 100.0 | 0.2 | 93.9 |
| U2, m·s−1 | Coolidge | 1.8 | 0.8 | 0.0 | 8.0 | 0.01 | 8.0 |
| U2, m·s−1 | Maricopa | 1.8 | 0.8 | 0.2 | 7.6 | 0.01 | 7.5 |
| U2, m·s−1 | Queen Creek | 1.7 | 0.7 | 0.0 | 6.6 | 0.01 | 6.6 |
| Ra, MJ·m−2·day−1 | Coolidge | 30.6 | 8.5 | 0.0 | 41.5 | 0.1 | 41.5 |
| Ra, MJ·m−2·day−1 | Maricopa | 30.6 | 8.4 | 17.8 | 41.5 | 0.1 | 23.7 |
| Ra, MJ·m−2·day−1 | Queen Creek | 30.6 | 8.5 | 0.0 | 41.5 | 0.1 | 41.5 |
| Rs, MJ·m−2·day−1 | Coolidge | 20.6 | 7.1 | 1.2 | 34.3 | 0.1 | 33.1 |
| Rs, MJ·m−2·day−1 | Maricopa | 20.7 | 7.1 | 1.2 | 33.8 | 0.1 | 32.6 |
| Rs, MJ·m−2·day−1 | Queen Creek | 20.2 | 7.0 | 0.3 | 32.6 | 0.1 | 32.3 |
| ETo, mm·day−1 | Coolidge | 5.0 | 2.4 | 0.3 | 12.5 | 0.01 | 12.2 |
| ETo, mm·day−1 | Maricopa | 5.2 | 2.6 | 0.4 | 12.8 | 0.01 | 12.3 |
| ETo, mm·day−1 | Queen Creek | 5.0 | 2.3 | 0.3 | 12.6 | 0.01 | 12.2 |
Notes: Tmax, Tmin, and Tave refer to maximum, minimum, and average temperature, respectively. Pr refers to precipitation. RHmax, RHmin, and RHave are the maximum, minimum, and average relative humidity, respectively, while U2 is the wind speed at 2 m height. Ra is extraterrestrial solar radiation, and Rs is solar radiation. ETo is the reference evapotranspiration.
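For reference, the summary statistics in Table 1 can be reproduced from a daily weather record with pandas. The sketch below is illustrative only; the column names and the station file name are assumptions rather than the authors' data files.

```python
import pandas as pd

# Hypothetical daily station record with the Table 1 variables as columns.
df = pd.read_csv("coolidge_daily_1995_2025.csv")

variables = ["Tmax", "Tmin", "Tave", "Pr", "RHmax", "RHmin",
             "RHave", "U2", "Ra", "Rs", "ETo"]

stats = pd.DataFrame({
    "Mean": df[variables].mean(),
    "Standard Deviation": df[variables].std(),
    "Minimum": df[variables].min(),
    "Maximum": df[variables].max(),
    "Standard Error": df[variables].sem(),                    # std / sqrt(n)
    "Range": df[variables].max() - df[variables].min(),
}).round(2)

print(stats)
```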
Table 2. List of the default and tuned hyperparameters for each machine learning model used in ETo prediction.

| Model | Hyperparameter | Default | Tuned |
|---|---|---|---|
| ANN | hidden_layer_sizes | 100 | 100, 50 |
|  | activation | 'relu' | 'relu' |
|  | solver | 'adam' | 'adam' |
|  | alpha (L2 penalty) | 0.0001 | 0.0001 |
|  | learning_rate | 'constant' | 'constant' |
|  | learning_rate_init | 0.001 | 0.001 |
|  | max_iter | 200 | 1000 |
|  | batch_size | 'auto' | 'auto' |
|  | random_state | None | 42 |
| RF | n_estimators | 100 | 100 |
|  | max_depth | None | None |
|  | min_samples_split | 2 | 2 |
|  | min_samples_leaf | 1 | 1 |
|  | max_features | 'sqrt' | 'sqrt' |
|  | bootstrap | True | True |
|  | random_state | None | 42 |
| XGBoost | booster | 'gbtree' | 'gbtree' |
|  | learning_rate | 0.3 | 0.1 |
|  | max_depth | 6 | 6 |
|  | n_estimators | 100 | 100 |
|  | subsample | 1 | 1 |
|  | colsample_bytree | 1 | 1 |
|  | gamma | 0 | 0 |
|  | reg_alpha | 0 | 0 |
|  | reg_lambda | 1 | 1 |
|  | random_state | None | 42 |
| CatBoost | iterations | 1000 | 1000 |
|  | learning_rate | Auto tuned | Auto tuned |
|  | depth | 6 | 6 |
|  | l2_leaf_reg | 3 | 3 |
|  | loss_function | RMSE | RMSE |
|  | bootstrap_type | 'Bayesian' | 'Bayesian' |
|  | random_strength | 1 | 1 |
|  | bagging_temperature | 1 | 1 |
|  | random_state | None | 42 |
| SVM | kernel | 'rbf' | 'rbf' |
|  | C | 1.0 | 1.0 |
|  | epsilon | 0.1 | 0.1 |
|  | gamma | 'scale' | 'scale' |
|  | degree | 3 | 3 |
|  | coef0 | 0.0 | 0.0 |
|  | random_state | None | 42 |
Notes: ANN, RF, XGBoost, CatBoost, and SVM refer to Artificial Neural Network, Random Forest, Extreme Gradient Boosting, Categorical Boosting, and Support Vector Machine, respectively.
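As a hedged illustration of how the tuned settings in Table 2 map onto common Python implementations (scikit-learn for ANN, RF, and SVM; the xgboost and catboost packages for the boosting models), the five regressors could be instantiated as follows. This is a sketch consistent with the table, not the authors' published code.

```python
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(100, 50), activation="relu",
                        solver="adam", alpha=1e-4, learning_rate="constant",
                        learning_rate_init=0.001, max_iter=1000,
                        batch_size="auto", random_state=42),
    "RF": RandomForestRegressor(n_estimators=100, max_depth=None,
                                min_samples_split=2, min_samples_leaf=1,
                                max_features="sqrt", bootstrap=True,
                                random_state=42),
    "XGBoost": XGBRegressor(booster="gbtree", learning_rate=0.1, max_depth=6,
                            n_estimators=100, subsample=1, colsample_bytree=1,
                            gamma=0, reg_alpha=0, reg_lambda=1,
                            random_state=42),
    "CatBoost": CatBoostRegressor(iterations=1000, depth=6, l2_leaf_reg=3,
                                  loss_function="RMSE",
                                  bootstrap_type="Bayesian",
                                  random_strength=1, bagging_temperature=1,
                                  random_state=42, verbose=0),
    # Note: scikit-learn's SVR exposes no random_state argument, so the seed
    # listed in Table 2 applies only to the other learners in this sketch.
    "SVM": SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma="scale",
               degree=3, coef0=0.0),
}
```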
Table 3. Input combination of the five machine learning models used for predicting daily reference evapotranspiration (ETo, mm·day−1) across Pinal County, Arizona.

| Model (No. of Inputs) | Input Combination |
|---|---|
| ANN10, RF10, XGBoost10, CatBoost10, SVM10 | Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, Rs |
| ANN8, RF8, XGBoost8, CatBoost8, SVM8 | Tmax, Tmin, Tave, Pr, RHave, U2, Ra, Rs |
| ANN7, RF7, XGBoost7, CatBoost7, SVM7 | Tmax, Tmin, Tave, Pr, RHave, U2, Rs |
| ANN6, RF6, XGBoost6, CatBoost6, SVM6 | Tmax, Tmin, Tave, Pr, RHave, U2 |
| ANN5, RF5, XGBoost5, CatBoost5, SVM5 | Tmax, Tmin, Tave, Pr, RHave |
| ANN4, RF4, XGBoost4, CatBoost4, SVM4 | Tmax, Tmin, Tave, RHave |
| ANN3, RF3, XGBoost3, CatBoost3, SVM3 | Tmax, Tmin, Tave |
Notes: ANN, RF, XGBoost, CatBoost, and SVM refer to Artificial Neural Network, Random Forest, Extreme Gradient Boosting, Categorical Boosting, and Support Vector Machine, respectively. Tmax, Tmin, and Tave refer to maximum, minimum, and average temperature, respectively. Pr refers to precipitation. RHmax, RHmin, and RHave are the maximum, minimum, and average relative humidity, respectively, while U2 is the wind speed at 2 m height. Ra is extraterrestrial solar radiation, and Rs is solar radiation.
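The seven input combinations in Table 3 translate directly into feature lists that can be looped over for every learner, giving the 35 model configurations (5 learners × 7 input sets) evaluated in Tables 4–6. A minimal sketch follows; the `models` dictionary and `df_train` data frame are assumed to come from earlier steps and are placeholders.

```python
# Feature lists corresponding to the rows of Table 3 (keyed by number of inputs).
input_combinations = {
    10: ["Tmax", "Tmin", "Tave", "Pr", "RHmax", "RHmin", "RHave", "U2", "Ra", "Rs"],
    8:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2", "Ra", "Rs"],
    7:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2", "Rs"],
    6:  ["Tmax", "Tmin", "Tave", "Pr", "RHave", "U2"],
    5:  ["Tmax", "Tmin", "Tave", "Pr", "RHave"],
    4:  ["Tmax", "Tmin", "Tave", "RHave"],
    3:  ["Tmax", "Tmin", "Tave"],
}

# Illustrative training loop (placeholders: `models` from the Table 2 sketch,
# `df_train` a training split with an "ETo" target column):
# for n_inputs, features in input_combinations.items():
#     for name, model in models.items():
#         model.fit(df_train[features], df_train["ETo"])
```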
Table 4. Statistical indices assessing the performance of five machine learning models in predicting daily reference evapotranspiration (ETo, mm·day−1), in comparison with the FAO-56 Penman–Monteith (PM) standardized method at Coolidge, Pinal County, Arizona.

| Model | Training R2 | Training MAE (mm) | Training RMSEn (%) | Training Se (%) | Testing R2 | Testing MAE (mm) | Testing RMSEn (%) | Testing Se (%) |
|---|---|---|---|---|---|---|---|---|
| Inputs: Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN10 | 0.999 | 0.048 | 1.340 | 0.941 | 0.999 | 0.049 | 1.403 | 0.983 |
| SVM10 | 0.999 | 0.052 | 1.750 | 1.032 | 0.998 | 0.056 | 2.238 | 1.123 |
| RF10 | 0.999 | 0.045 | 1.359 | 0.895 | 0.994 | 0.115 | 3.485 | 2.295 |
| XGBoost10 | 0.999 | 0.063 | 1.634 | 1.255 | 0.997 | 0.100 | 2.842 | 1.999 |
| CatBoost10 | 1.000 | 0.031 | 0.788 | 0.609 | 0.999 | 0.045 | 1.305 | 0.890 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN8 | 0.999 | 0.055 | 1.527 | 1.089 | 0.999 | 0.060 | 1.685 | 1.191 |
| SVM8 | 0.998 | 0.056 | 1.956 | 1.098 | 0.997 | 0.062 | 2.474 | 1.229 |
| RF8 | 0.999 | 0.046 | 1.378 | 0.912 | 0.994 | 0.120 | 3.546 | 2.389 |
| XGBoost8 | 0.999 | 0.069 | 1.790 | 1.357 | 0.996 | 0.110 | 3.097 | 2.189 |
| CatBoost8 | 0.999 | 0.044 | 1.152 | 0.869 | 0.999 | 0.062 | 1.836 | 1.243 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Rs |  |  |  |  |  |  |  |  |
| ANN7 | 0.996 | 0.103 | 2.905 | 2.047 | 0.996 | 0.107 | 3.110 | 2.142 |
| SVM7 | 0.996 | 0.102 | 3.115 | 2.025 | 0.994 | 0.109 | 3.575 | 2.173 |
| RF7 | 0.999 | 0.054 | 1.556 | 1.066 | 0.992 | 0.143 | 4.146 | 2.867 |
| XGBoost7 | 0.997 | 0.093 | 2.494 | 1.848 | 0.993 | 0.139 | 3.910 | 2.775 |
| CatBoost7 | 0.998 | 0.082 | 2.252 | 1.625 | 0.996 | 0.105 | 3.059 | 2.099 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2 |  |  |  |  |  |  |  |  |
| ANN6 | 0.957 | 0.373 | 9.726 | 7.381 | 0.954 | 0.381 | 10.125 | 7.615 |
| SVM6 | 0.954 | 0.373 | 10.063 | 7.379 | 0.951 | 0.382 | 10.491 | 7.646 |
| RF6 | 0.992 | 0.154 | 4.083 | 3.042 | 0.948 | 0.406 | 10.811 | 8.107 |
| XGBoost6 | 0.970 | 0.306 | 8.036 | 6.048 | 0.949 | 0.400 | 10.676 | 7.991 |
| CatBoost6 | 0.969 | 0.314 | 8.242 | 6.222 | 0.953 | 0.383 | 10.258 | 7.647 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave |  |  |  |  |  |  |  |  |
| ANN5 | 0.901 | 0.564 | 14.701 | 11.153 | 0.903 | 0.563 | 14.718 | 11.249 |
| SVM5 | 0.896 | 0.573 | 15.071 | 11.343 | 0.899 | 0.569 | 14.973 | 11.371 |
| RF5 | 0.984 | 0.227 | 5.957 | 4.495 | 0.889 | 0.601 | 15.753 | 12.025 |
| XGBoost5 | 0.929 | 0.481 | 12.451 | 9.515 | 0.892 | 0.591 | 15.484 | 11.812 |
| CatBoost5 | 0.927 | 0.489 | 12.583 | 9.678 | 0.897 | 0.579 | 15.145 | 11.573 |
| Inputs: Tmax, Tmin, Tave, RHave |  |  |  |  |  |  |  |  |
| ANN4 | 0.901 | 0.566 | 14.685 | 11.197 | 0.902 | 0.567 | 14.748 | 11.325 |
| SVM4 | 0.896 | 0.575 | 15.077 | 11.367 | 0.899 | 0.569 | 14.991 | 11.367 |
| RF4 | 0.984 | 0.229 | 5.980 | 4.529 | 0.887 | 0.608 | 15.889 | 12.149 |
| XGBoost4 | 0.929 | 0.482 | 12.427 | 9.531 | 0.892 | 0.593 | 15.518 | 11.848 |
| CatBoost4 | 0.928 | 0.489 | 12.554 | 9.679 | 0.896 | 0.580 | 15.180 | 11.586 |
| Inputs: Tmax, Tmin, Tave |  |  |  |  |  |  |  |  |
| ANN3 | 0.850 | 0.686 | 18.105 | 13.574 | 0.850 | 0.683 | 18.284 | 13.649 |
| SVM3 | 0.846 | 0.694 | 18.302 | 13.725 | 0.849 | 0.687 | 18.355 | 13.731 |
| RF3 | 0.977 | 0.268 | 7.116 | 5.292 | 0.832 | 0.727 | 19.327 | 14.525 |
| XGBoost3 | 0.892 | 0.589 | 15.359 | 11.658 | 0.839 | 0.714 | 18.901 | 14.280 |
| CatBoost3 | 0.886 | 0.605 | 15.737 | 11.971 | 0.846 | 0.701 | 18.524 | 14.011 |
Notes: ANN, RF, XGBoost, CatBoost, and SVM refer to Artificial Neural Network, Random Forest, Extreme Gradient Boosting, Categorical Boosting, and Support Vector Machine, respectively. An R2 close to 1.0 indicates a close match between the predicted and observed datasets. The simulation is considered excellent when RMSEn < 10%, good when RMSEn is between 10% and 20%, fair when RMSEn is between 20% and 30%, and poor when RMSEn > 30%. Lower MAE values indicate greater modeling accuracy. Se values within ±15% are considered acceptable.
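The thresholds in the notes above can be applied to metrics computed as in the sketch below. The RMSEn and Se formulas shown here (RMSE normalized by the mean observed ETo, and the mean absolute relative error, both in percent) are common definitions assumed for illustration; the exact expressions defined in the study's Methods section take precedence.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error


def evaluate(obs, pred):
    """Return the four indices used in Tables 4-6 for observed vs. predicted ETo."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)

    r2 = r2_score(obs, pred)
    mae = mean_absolute_error(obs, pred)                 # mm per day
    rmse = np.sqrt(mean_squared_error(obs, pred))
    rmsen = 100.0 * rmse / obs.mean()                    # %, RMSE normalized by mean ETo (assumed)
    se = 100.0 * np.mean(np.abs(pred - obs) / obs)       # %, simulation error (assumed definition)

    return {"R2": r2, "MAE": mae, "RMSEn": rmsen, "Se": se}
```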
Table 5. Statistical indices assessing the performance of five machine learning models in predicting daily reference evapotranspiration (ETo), in comparison with the FAO-56 PM standard model at Maricopa, Pinal County, Arizona.

| Model | Training R2 | Training MAE (mm) | Training RMSEn (%) | Training Se (%) | Testing R2 | Testing MAE (mm) | Testing RMSEn (%) | Testing Se (%) |
|---|---|---|---|---|---|---|---|---|
| Inputs: Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN10 | 1.000 | 0.040 | 1.063 | 0.762 | 0.999 | 0.044 | 1.224 | 0.848 |
| SVM10 | 0.999 | 0.051 | 1.816 | 0.977 | 0.998 | 0.057 | 2.250 | 1.101 |
| RF10 | 0.999 | 0.045 | 1.346 | 0.859 | 0.995 | 0.114 | 3.521 | 2.219 |
| XGBoost10 | 0.999 | 0.059 | 1.506 | 1.137 | 0.997 | 0.101 | 2.879 | 1.955 |
| CatBoost10 | 1.000 | 0.030 | 0.729 | 0.568 | 0.999 | 0.043 | 1.227 | 0.833 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN8 | 0.999 | 0.051 | 1.393 | 0.987 | 0.999 | 0.055 | 1.587 | 1.063 |
| SVM8 | 0.999 | 0.055 | 1.956 | 1.054 | 0.998 | 0.062 | 2.530 | 1.196 |
| RF8 | 0.999 | 0.046 | 1.356 | 0.876 | 0.995 | 0.118 | 3.607 | 2.294 |
| XGBoost8 | 0.999 | 0.063 | 1.625 | 1.220 | 0.997 | 0.104 | 2.988 | 2.015 |
| CatBoost8 | 1.000 | 0.041 | 1.055 | 0.792 | 0.999 | 0.060 | 1.730 | 1.155 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Rs |  |  |  |  |  |  |  |  |
| ANN7 | 0.997 | 0.102 | 2.723 | 1.964 | 0.997 | 0.103 | 2.832 | 1.995 |
| SVM7 | 0.996 | 0.103 | 3.051 | 1.977 | 0.995 | 0.106 | 3.510 | 2.061 |
| RF7 | 0.999 | 0.055 | 1.559 | 1.060 | 0.994 | 0.139 | 4.014 | 2.704 |
| XGBoost7 | 0.998 | 0.092 | 2.406 | 1.763 | 0.995 | 0.131 | 3.601 | 2.534 |
| CatBoost7 | 0.998 | 0.081 | 2.139 | 1.552 | 0.997 | 0.104 | 2.890 | 2.014 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2 |  |  |  |  |  |  |  |  |
| ANN6 | 0.966 | 0.364 | 9.247 | 7.006 | 0.962 | 0.379 | 9.767 | 7.361 |
| SVM6 | 0.961 | 0.376 | 9.862 | 7.239 | 0.959 | 0.382 | 10.131 | 7.416 |
| RF6 | 0.994 | 0.154 | 4.006 | 2.960 | 0.957 | 0.404 | 10.457 | 7.848 |
| XGBoost6 | 0.974 | 0.308 | 7.977 | 5.926 | 0.959 | 0.394 | 10.194 | 7.660 |
| CatBoost6 | 0.974 | 0.314 | 8.059 | 6.046 | 0.962 | 0.381 | 9.853 | 7.401 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave |  |  |  |  |  |  |  |  |
| ANN5 | 0.904 | 0.609 | 15.438 | 11.719 | 0.904 | 0.610 | 15.597 | 11.848 |
| SVM5 | 0.900 | 0.619 | 15.778 | 11.899 | 0.899 | 0.619 | 15.957 | 12.025 |
| RF5 | 0.985 | 0.243 | 6.204 | 4.677 | 0.896 | 0.633 | 16.232 | 12.294 |
| XGBoost5 | 0.931 | 0.517 | 13.063 | 9.948 | 0.899 | 0.623 | 15.982 | 12.095 |
| CatBoost5 | 0.929 | 0.526 | 13.224 | 10.119 | 0.901 | 0.619 | 15.841 | 12.025 |
| Inputs: Tmax, Tmin, Tave, RHave |  |  |  |  |  |  |  |  |
| ANN4 | 0.903 | 0.613 | 15.506 | 11.797 | 0.904 | 0.611 | 15.568 | 11.858 |
| SVM4 | 0.899 | 0.619 | 15.798 | 11.913 | 0.900 | 0.617 | 15.898 | 11.976 |
| RF4 | 0.984 | 0.244 | 6.219 | 4.689 | 0.894 | 0.637 | 16.354 | 12.368 |
| XGBoost4 | 0.932 | 0.518 | 13.014 | 9.954 | 0.897 | 0.629 | 16.121 | 12.207 |
| CatBoost4 | 0.929 | 0.531 | 13.300 | 10.213 | 0.900 | 0.622 | 15.921 | 12.089 |
| Inputs: Tmax, Tmin, Tave |  |  |  |  |  |  |  |  |
| ANN3 | 0.851 | 0.757 | 19.217 | 14.557 | 0.850 | 0.759 | 19.457 | 14.747 |
| SVM3 | 0.845 | 0.772 | 19.592 | 14.842 | 0.845 | 0.769 | 19.781 | 14.939 |
| RF3 | 0.975 | 0.302 | 7.826 | 5.801 | 0.830 | 0.795 | 20.737 | 15.450 |
| XGBoost3 | 0.888 | 0.659 | 16.695 | 12.680 | 0.841 | 0.777 | 20.018 | 15.089 |
| CatBoost3 | 0.883 | 0.673 | 17.033 | 12.952 | 0.846 | 0.767 | 19.718 | 14.904 |
Notes: ANN, RF, XGBoost, CatBoost, and SVM refer to Artificial Neural Network, Random Forest, Extreme Gradient Boosting, Categorical Boosting, and Support Vector Machine, respectively. An R2 close to 1.0 indicates a close match between the predicted and observed datasets. The simulation is considered excellent when RMSEn < 10%, good when RMSEn is between 10% and 20%, fair when RMSEn is between 20% and 30%, and poor when RMSEn > 30%. Lower MAE values indicate greater modeling accuracy. Se values within ±15% are considered acceptable.
Table 6. Statistical indices assessing the performance of five machine learning models in predicting daily reference evapotranspiration (ETo), in comparison with the FAO-56 PM standard model at Queen Creek, Pinal County, Arizona.

| Model | Training R2 | Training MAE (mm) | Training RMSEn (%) | Training Se (%) | Testing R2 | Testing MAE (mm) | Testing RMSEn (%) | Testing Se (%) |
|---|---|---|---|---|---|---|---|---|
| Inputs: Tmax, Tmin, Tave, Pr, RHmax, RHmin, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN10 | 0.999 | 0.062 | 1.240 | 0.833 | 0.999 | 0.072 | 1.450 | 0.932 |
| SVM10 | 0.999 | 0.086 | 1.733 | 1.004 | 0.997 | 0.134 | 2.705 | 1.138 |
| RF10 | 0.999 | 0.063 | 1.277 | 0.827 | 0.994 | 0.181 | 3.662 | 2.235 |
| XGBoost10 | 0.999 | 0.071 | 1.432 | 1.087 | 0.996 | 0.147 | 2.985 | 1.906 |
| CatBoost10 | 1.000 | 0.038 | 0.759 | 0.588 | 0.999 | 0.073 | 1.479 | 0.877 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Ra, Rs |  |  |  |  |  |  |  |  |
| ANN8 | 0.999 | 0.074 | 1.488 | 1.007 | 0.999 | 0.086 | 1.748 | 1.092 |
| SVM8 | 0.998 | 0.092 | 1.856 | 1.049 | 0.996 | 0.142 | 2.879 | 1.186 |
| RF8 | 0.999 | 0.065 | 1.316 | 0.854 | 0.994 | 0.186 | 3.756 | 2.339 |
| XGBoost8 | 0.999 | 0.080 | 1.604 | 1.210 | 0.996 | 0.152 | 3.069 | 2.016 |
| CatBoost8 | 0.999 | 0.055 | 1.100 | 0.826 | 0.999 | 0.092 | 1.863 | 1.165 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2, Rs |  |  |  |  |  |  |  |  |
| ANN7 | 0.996 | 0.141 | 2.834 | 2.013 | 0.996 | 0.157 | 3.182 | 2.218 |
| SVM7 | 0.996 | 0.156 | 3.148 | 2.043 | 0.993 | 0.203 | 4.107 | 2.300 |
| RF7 | 0.999 | 0.076 | 1.520 | 1.041 | 0.991 | 0.225 | 4.564 | 3.042 |
| XGBoost7 | 0.997 | 0.122 | 2.450 | 1.779 | 0.993 | 0.202 | 4.092 | 2.775 |
| CatBoost7 | 0.998 | 0.114 | 2.296 | 1.655 | 0.995 | 0.163 | 3.299 | 2.244 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave, U2 |  |  |  |  |  |  |  |  |
| ANN6 | 0.949 | 0.526 | 10.591 | 8.200 | 0.946 | 0.544 | 11.010 | 8.469 |
| SVM6 | 0.942 | 0.560 | 11.284 | 8.305 | 0.940 | 0.573 | 11.596 | 8.455 |
| RF6 | 0.991 | 0.220 | 4.421 | 3.343 | 0.937 | 0.590 | 11.954 | 9.041 |
| XGBoost6 | 0.963 | 0.447 | 8.996 | 6.883 | 0.939 | 0.581 | 11.760 | 8.919 |
| CatBoost6 | 0.962 | 0.456 | 9.175 | 7.058 | 0.943 | 0.562 | 11.371 | 8.651 |
| Inputs: Tmax, Tmin, Tave, Pr, RHave |  |  |  |  |  |  |  |  |
| ANN5 | 0.897 | 0.746 | 15.019 | 11.580 | 0.893 | 0.765 | 15.499 | 11.854 |
| SVM5 | 0.894 | 0.758 | 15.260 | 11.601 | 0.892 | 0.769 | 15.580 | 11.781 |
| RF5 | 0.984 | 0.298 | 5.998 | 4.535 | 0.883 | 0.801 | 16.219 | 12.284 |
| XGBoost5 | 0.927 | 0.627 | 12.629 | 9.672 | 0.890 | 0.779 | 15.774 | 11.950 |
| CatBoost5 | 0.926 | 0.632 | 12.729 | 9.816 | 0.892 | 0.771 | 15.606 | 11.803 |
| Inputs: Tmax, Tmin, Tave, RHave |  |  |  |  |  |  |  |  |
| ANN4 | 0.899 | 0.564 | 14.873 | 11.366 | 0.894 | 0.580 | 15.461 | 11.750 |
| SVM4 | 0.893 | 0.578 | 15.305 | 11.643 | 0.892 | 0.584 | 15.632 | 11.835 |
| RF4 | 0.983 | 0.227 | 6.039 | 4.570 | 0.881 | 0.612 | 16.339 | 12.398 |
| XGBoost4 | 0.927 | 0.484 | 12.659 | 9.743 | 0.887 | 0.598 | 15.937 | 12.103 |
| CatBoost4 | 0.926 | 0.489 | 12.742 | 9.842 | 0.890 | 0.590 | 15.752 | 11.950 |
| Inputs: Tmax, Tmin, Tave |  |  |  |  |  |  |  |  |
| ANN3 | 0.865 | 0.853 | 17.187 | 13.142 | 0.862 | 0.872 | 17.655 | 13.489 |
| SVM3 | 0.862 | 0.865 | 17.418 | 13.187 | 0.860 | 0.877 | 17.759 | 13.469 |
| RF3 | 0.978 | 0.348 | 7.018 | 5.269 | 0.846 | 0.919 | 18.603 | 14.184 |
| XGBoost3 | 0.902 | 0.729 | 14.689 | 11.217 | 0.854 | 0.896 | 18.143 | 13.814 |
| CatBoost3 | 0.896 | 0.749 | 15.087 | 11.534 | 0.857 | 0.887 | 17.957 | 13.681 |
Notes: ANN, RF, XGBoost, CatBoost, and SVM refer to Artificial Neural Network, Random Forest, Extreme Gradient Boosting, Categorical Boosting, and Support Vector Machine, respectively. An R2 close to 1.0 indicates a close match between the predicted and observed datasets. The simulation is considered excellent when RMSEn < 10%, good when RMSEn is between 10% and 20%, fair when RMSEn is between 20% and 30%, and poor when RMSEn > 30%. Lower MAE values indicate greater modeling accuracy. Se values within ±15% are considered acceptable.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
