Performance Evaluation of Regression-Based Machine Learning Models for Modeling Reference Evapotranspiration with Temperature Data

: In this study, due to their flexibility in forecasting, the capabilities of three regression-based machine learning models were explored, specifically random forest regression (RFr), generalized regression neural network (GRNN), and support vector regression (SVR). The above models were assessed for their suitability in modeling daily reference evapotranspiration (ET o ), based only on temperature data (T min , T max , T mean ), by comparing their daily ET o results with those estimated by the conventional FAO 56 PM model, which requires a broad range of data that may not be available or may not be of reasonable quality. The RFr, GRNN


Introduction
The decrease in water availability over the last few decades is one of the principal environmental problems that could severely restrict agricultural production and industrial development in some arid and semiarid areas of the world [1,2], particularly under climate change conditions.Evapotranspiration (ET) is a parameter of major importance, participating in both hydrological cycle and surface energy balance [1][2][3], and is wellestablished in various disciplines such as hydrology [4], agronomy [5], climatology [6], and other geosciences [7].Modeling daily reference evapotranspiration (ET o ) holds significant importance in various aspects, including water resources management, estimating water balances, scheduling irrigation, forecasting agricultural production, and addressing theoretical challenges within the fields of hydrology and meteorology.The ASCE-standardized method [8], at a daily step for short reference crops (clipped grass of 0.12 m), gives, as suggested by [8], equivalent ET o results to the FAO56 Penman-Monteith (FAO 56 PM) equation [9].The FAO 56 PM equation was adopted by the FAO as a standard method of estimating ET o , as it gives more consistent ET o estimates, and it has been shown to perform Hydrology 2024, 11, 89 2 of 20 better than other ET o methods [8,9].However, the detailed meteorological data required by the FAO 56 PM equation are not often available, especially in developing countries.In such circumstances, there is much research for innovative modeling approaches to ensure the trustworthy estimation of ET o values.
The ASCE Task Committee [10] recognized artificial neural networks (ANNs) as a reliable modeling tool, prompting the further exploration of machine learning (ML) techniques in various hydrological and climatological applications.Since then, ML methods have been used for water quality examination, hydrologic time series analysis [11][12][13][14][15], landslide susceptibility mapping [16], and climate impact assessments on dam seepage [17].
Several studies have focused on estimating reference evapotranspiration (ET o ) using ML techniques.For example, ref. [18] used a cascade correlation neural network with Kalman filtering for monthly ET o estimation, while [19] found that generalized regression neural networks (GRNNs) outperformed radial basis function neural networks (RBFNNs) for daily ET o estimation in northern Algeria.Similarly, ref. [20] evaluated least square support vector regression, multivariate adaptive regression splines, and M5 Model Tree for ET o estimation.Ref. [21] explored daily ET o estimation using ANNs and empirical equations with limited input data, and [22] proposed GRNN and random forest models for daily ET o in the Sichuan basin, southwest China.
Various studies have explored the use of boosted machine learning models as alternatives to empirical methods for estimating daily reference evapotranspiration (ET o ).For instance, ref. [23] investigated boosted ML models, while [24] examined spatial and temporal ML approaches for ET o estimation.Linear regression algorithms were explored by [25] using limited climate data, and [26] focused on support vector machine (SVM) models for reference crop evapotranspiration.A comprehensive evaluation of ML models at both general and regional levels was conducted by [27].In an arid climate context, ref. [28] assessed the performance of ML algorithms for ET o estimation.Lastly, ref. [29] explored the k-nearest neighbor algorithm, multigene genetic programming, and support vector regression (SVR) for daily ET o estimation in Turkey.
The literature mentioned above highlights the necessity for further investigation into the capabilities of different machine learning algorithms and structures.Nevertheless, the utilization of machine learning algorithms and their various adaptations in water resources applications have not been fully exploited.Towards this direction, the main objective of this study is to investigate and finally evaluate three regression-based machine learning modeling approaches, namely random forest regression (RFr), generalized regression neural network (GRNN) and support vector regression (SVR), for accurate and reliable daily reference evapotranspiration (ET o ) estimation.The difference between this work and the Hargreaves-Samani (HG-S) method [30,31], which also assesses temperature-based features for the estimation of ET o , is examined.Taking advantage of the capabilities offered by the strength of machine learning, the goal was to construct accurate daily ET o prediction models using only temperature data (T min , T max , T mean ) and astronomical data (extraterrestrial radiation (R a ), and theoretical sunshine (N), which can easily be estimated for a certain day and location) as inputs.It is important to mention that machine learning models built based on temperature can also be extended to predict ET o using predicted temperature data, especially in the context of climate change conditions.
The motivation for employing the RFr, GRNN, and SVR machine learning approaches was to evaluate various algorithmic strategies for the same problem using three distinct modeling techniques, thereby aiming to produce the most reliable results possible.The reasoning behind the selection of these algorithms was that RFr is an ensemble learning algorithm that leverages the combined knowledge of multiple models to enhance overall performance.GRNN is a probabilistic neural network, which offers significant advantages, particularly its ability to efficiently converge to the underlying data function even with a limited number of training samples.Finally, the ε-SVR methodology addresses the nonlinear regression problem by transforming it into a linear one using kernel functions.
The performance evaluation of the RFr, GRNN and SVR models was assessed by comparing their daily ET o results with those of the FAO 56 PM model [8,9], while the data we used were obtained from daily meteorological data collected at two weather stations, which are sited at Sindos and Piperia, in northern Greece.These stations were selected because they are located in regions frequently affected by droughts due to the combined influence of climate change and extensive human activities.Additionally, this work is motivated by the fact that, so far, scientists in Greece have not investigated the use of machine learning models with temperature-based features.Furthermore, there is a scarcity of global evaluations of these three specific methodologies using only temperature measurements.

Study Area and Data
The study area is located in northern Greece.Daily meteorological variables, including maximum (T max )/minimum (T min ) air temperature at 2 m height, mean relative humidity (RH), wind speed at 2 m height (U 2 ), and net solar radiation (R n ), were obtained at Sindos meteorological station (Lat.40

functions.
The performance evaluation of the RFr, GRNN and SVR models was comparing their daily ETo results with those of the FAO 56 PM model [8,9], w we used were obtained from daily meteorological data collected at two wea which are sited at Sindos and Piperia, in northern Greece.These stations w because they are located in regions frequently affected by droughts due to t influence of climate change and extensive human activities.Additionally, this tivated by the fact that, so far, scientists in Greece have not investigated the us learning models with temperature-based features.Furthermore, there is global evaluations of these three specific methodologies using only temperat ments.

Study Area and Data
The study area is located in northern Greece.Daily

FAO56 Penman-Monteith (FAO 56 PM) Method
In this study, the performance of the RFr, GRNN, and SVR models, using only temperature data (Tmin, Tmax, Tmean) and astronomical data (extraterrestrial radiation (Ra) and theoretical sunshine (N)) as inputs, was assessed by comparing their daily ETo results with those of the conventional FAO 56 PM method.This method was adopted by the FAO as the standard method of estimating ETo as it gives more consistent ETo estimates, and it has been shown to perform better than other ETo methods [8,9].According to [8,9], the FAO 56 PM method is summarized by the following equation: where ETo is the reference evapotranspiration (mm d -1 ), Rn is the daily net solar radiation (MJ m −2 d -1 ), G is the soil heat flux (MJ m −2 d -1 ), T is the average daily air temperature at a height of 2 m (°C), U2 is the daily mean of the wind speed at a height of 2 m (m s -1 ), es is the saturation vapor pressure (kPa), ea is the actual vapor pressure (kPa), Δ is the slope of the saturation vapor pressure versus the air temperature curve (kPa °C-1 ), and γ is the psychrometric constant (kPa °C-1 ).All parameters were calculated using equations provided by [9].The soil heat flux (G) was assumed to be zero over the calculation time step period (24 h) [8].Extraterrestrial radiation (Ra) and theoretical sunshine (N), which are astronomical data, can easily be estimated for a certain day and location, according to [9], as follows: where Ra is extraterrestrial radiation (MJ/m 2 d), Gsc is the solar constant (0.0820 MJ/m 2 min), dr is the inverse relative distance between the Earth and the Sun, ωs is the sunset hour angle (rad), φ is latitude (rad), δ is solar declination (rad), and N is theoretical sunshine (h).

Machine Learning Modeling Approaches
Partitioning the dataset into distinct training and testing subsets constitutes a foundational procedure in machine learning, enabling efficient model training, performance assessment, and protection against overfitting.This methodology serves to promote the model's ability to effectively generalize to unseen data and make precise predictions in

FAO56 Penman-Monteith (FAO 56 PM) Method
In this study, the performance of the RFr, GRNN, and SVR models, using only temperature data (T min , T max , T mean ) and astronomical data (extraterrestrial radiation (R a ) and theoretical sunshine (N)) as inputs, was assessed by comparing their daily ET o results with those of the conventional FAO 56 PM method.This method was adopted by the FAO as the standard method of estimating ET o as it gives more consistent ET o estimates, and it has been shown to perform better than other ET o methods [8,9].According to [8,9], the FAO 56 PM method is summarized by the following equation: where ET o is the reference evapotranspiration (mm d −1 ), R n is the daily net solar radiation (MJ m −2 d −1 ), G is the soil heat flux (MJ m −2 d −1 ), T is the average daily air temperature at a height of 2 m ( • C), U 2 is the daily mean of the wind speed at a height of 2 m (m s −1 ), e s is the saturation vapor pressure (kPa), e a is the actual vapor pressure (kPa), ∆ is the slope of the saturation vapor pressure versus the air temperature curve (kPa • C −1 ), and γ is the psychrometric constant (kPa • C −1 ).All parameters were calculated using equations provided by [9].The soil heat flux (G) was assumed to be zero over the calculation time step period (24 h) [8].
Extraterrestrial radiation (R a ) and theoretical sunshine (N), which are astronomical data, can easily be estimated for a certain day and location, according to [9], as follows: where R a is extraterrestrial radiation (MJ/m 2 d), G sc is the solar constant (0.0820 MJ/m 2 min), d r is the inverse relative distance between the Earth and the Sun, ω s is the sunset hour angle (rad), φ is latitude (rad), δ is solar declination (rad), and N is theoretical sunshine (h).

Machine Learning Modeling Approaches
Partitioning the dataset into distinct training and testing subsets constitutes a foundational procedure in machine learning, enabling efficient model training, performance assessment, and protection against overfitting.This methodology serves to promote the model's ability to effectively generalize to unseen data and make precise predictions in practical real-world applications.Due to this fact, the available dataset was divided into two parts: the daily meteorological data for the period 2000-2007 as the fitting dataset and the daily meteorological data for the period 2008-2009 as the test dataset from Sindos station.In order for the predictive ability of the built machine learning models to be further assessed, the daily meteorological data for the period 2008-2009 from the Piperia station were used as a second test dataset as well.The k = 10 cross-validation technique [32] was applied to the comprehensive dataset, consisting of 2922 daily measurements, ensuring that all available data patterns were taken into account during the model construction phase.

•
Random Forest for regression (RFr) Utilizing the structural and algorithmic power of machine learning techniques, which are known for their superiority in simulating intricate non-linear systems, and, considering the fact that as non-parametric methods, they can bypass the constraints of standard regression modeling [33], the random forest for regression (RFr) machine learning modeling approach was employed in order for accurate and reliable ET o models to be created.This modeling technique has been extensively described in [34].Ref. [35] was the pioneer in introducing this supervised machine learning technique, founded on the idea that multiple models have the capacity to generate an outcome capable of capturing the true underlying structure of the available data.Random forest for regression (RFr) utilizes numerous individual models, known as decision trees, which are ultimately aggregated into one, aiming to minimize both the variance and bias of the base learner, which is the decision tree, to the greatest extent achievable by the system.This approach, referred to as ensemble learning [36][37][38], harnesses the combined knowledge of multiple models to enhance the overall performance of the learning system.That is, a random forest comprises a collection of regression trees (decision trees), utilizing their information collectively.
In the training process, the observations within the fitting dataset are employed to create numerous regression trees, each having distinct training parameters, thereby contributing uniquely to the prediction process.The ultimate observation prediction results from the amalgamation of all individual predictions, thereby leveraging the diverse internal characteristics of each tree to improve generalization.It represents a form of decision structure learning, centered on a predictive model, with the aim of accurately estimating the dependent variable based on the observed values of independent variables.
Each individual regression tree comprises a connected flowchart.In this structure, there is a solitary starting node from which two branches initially extend and lead to 'child' nodes stemming from their parent nodes.Each node has a specific satisfaction condition (impurity criterium), and if this objective is not met, the process advances to a new node and its corresponding children.The ensemble method used for ET o model construction was bootstrapped aggregation (bagging) [36,[39][40][41][42][43].This approach entails training multiple independent models on random subsets (bootstraps) of the fitting data.The algorithm selects a random subset of the available features while ensuring there is no correlation among the decision tree estimators.These estimators showed, as expected, high variance, since they perfectly capture the pattern of the particular sample data.Ultimately, when the predictions from these individual models (regression trees) were combined through averaging, the variance in aggregation was significantly reduced.
For the development of a precise and reliable RFr model, it is essential to appropriately adjust its learning hyperparameters.The most critical factors in this process are the quantity of decision trees and their maximum depth within the modeling system.Additionally, careful consideration should be given to parameters such as the minimum number of samples necessary to split a node, the minimum number of samples required for a leaf node, and the number of features to be considered when searching for the optimal split.To identify the most suitable combination of hyperparameters for the RFr model, we utilized a trial-and-error approach, aiming to minimize the mean square error.This involved iterating through various settings for the first two hyperparameters, while keeping the last three at their default values, until we achieved the desired target error.We examined a range of values for the number of decision trees and their maximum depth, spanning from 50 to 500 per unit for the former and 5 to 12 per unit for the latter.The default values for the minimum number of samples required to split a node, the minimum number of samples needed for a leaf node, and the number of features considered for the optimal split were 2, 1, and 1, respectively.

• Generalized Regression Neural Network (GRNN)
The utilization of a probabilistic neural network, such as a GRNN, offers significant benefits, primarily because it can efficiently converge towards the underlying data function even when there are limited training samples available.Moreover, the minimal additional knowledge required for achieving a satisfactory fit can be obtained without requiring further input from the user.Consequently, this makes the generalized regression neural network (GRNN) a highly valuable tool for making predictions and conducting practical comparisons of system performance.As a statistical method of function approximation into the structure of a neural network, generalized regression neural networks (GRNNs) [44] have been successfully used for daily reference evapotranspiration modeling [20,22,45,46].The concluding remarks of the conducted research indicate that this artificial neural network methodology is worthy of further exploration in hydrology studies.GRNNs have been described extensively in the related literature [47][48][49].Summarizing, the learning of this methodology depends on a single parameter (σ), which is known as the smoothing parameter.It represents the width of the normalized Gaussian function which is embedded in the probability density function used in single-bandwidth GRNN training and can be described as follows: where Ŷ(X) represents the Nadaraya-Watson kernel regression estimator; is the Gaussian radial basis function (RBF), whose outcome is influenced by the smoothing parameter (σ) value; X is the current input vector and X i is the corresponding training output vector; n is the number of elements of the vector X; and the symbol T represents the transpose operation applied to the vector.A CRNN's architecture comprises four distinct layers.The initial layer functions as the input layer, where independent variables are introduced to the system.Subsequently, a second layer called the pattern layer is established, which receives information from the input layer.Within the pattern layer, each node generates a signal using the RBF (radial basis function) and transmits it to the next layer, known as the summation layer, which is the third layer.In this layer, ∑ n i=1 (X − X i ) T •(X − X i ) represents the squared Euclidean distance between the training value and the prediction point, serving as a gauge of the neural network's adaptation to the actual training values.Finally, the fourth layer, known as the output layer, incorporates the values obtained from Equation (4).
Based on the preceding explanation, it becomes obvious that the precise choice of the smoothing coefficient (σ) is of outmost importance for both the accuracy and generalization capability of the final GRNN model.Therefore, its value was fine-tuned using an exhaustive grid-search method [50] through a trial-and-error process, spanning the range of [0, 10] with increments of 0.001.This selection was driven by optimizing a combination of maximum correlation and minimum mean square error between the observed values and those estimated by the GRNN model.

•
Support Vector Regression (SVR) Support vector regression (SVR) is another highly promising algorithm within the realm of machine learning approaches, offering considerable potential for application in environmental modeling.It was initially introduced along with the concept of the capacity of machine learning by [51], in conjunction with the work of Cortes and Vapnik [52].A comprehensive description of SVR can be found in Vapnik's works from 1999 and Hydrology 2024, 11, 89 9 of 20 2000 [53,54], as well as in the publications [55,56].In essence, this methodology addresses the nonlinear regression problem by converting it into a linear one through the use of kernel functions.To achieve this transformation via the ε-SVR algorithm, the original input space is mapped onto a higher-dimensional feature space.ε-SVR involves the creation of an initial space with a width of (2ε), where ε > 0, effectively encapsulating the original data within the range [−ε, +ε].With the introduction of an additional variable ξ i , referred to as a slack variable, which quantifies the deviation of each training point from the initial space with a width of (2ε), the system aims to minimize the ε-insensitive loss function [54,56,57]: where C is a system's hyperparameter that needs to be tuned, w is the vector of the weights, and bc is the system's bias.
Out of the four kernel functions offered in ε-SVR, which include the radial basis function, linear, sigmoid, and polynomial, the radial basis function (RBF) kernel (Equation ( 6)) was chosen for its capacity to effectively measure similarity.It was employed to convert the data into a multi-dimensional super-space (m-dimensional) with the aim of representing intricate nonlinear relationships using an optimal straight line: where γ = ( 1 /2σ 2 ), and x i − x j is the Euclidean distance between the support vectors (SVs).Considering Equations ( 5) and ( 6), it can be seen that the precision of estimation and the intricacy of ε-SVR models rely on three key hyperparameters.These include (ε), responsible for determining the width of the ε-insensitive zone; gamma (γ), serving as the tuning parameter for Gaussian radial basis function (RBF) kernels; and the cost parameter (C), which regulates the influence of each support vector, effectively managing the trade-off between misprediction and model simplicity.The optimal combination of these hyperparameters was determined through an exhaustive grid-search approach [50], where (ε) varied between 0.01 and 0.8 in increments of 0.01, (γ) ranged from 0.01 to 1 with steps of 0.01, and (C) spanned from 1 to 100 in increments of 1.
All machine learning modeling methodologies used were implemented using the scikit-learn libraries [58] within the Python programming language [59].

Performance Evaluation Criteria
In order to determine the accuracy in modeling daily reference evapotranspiration of the RFr, GRNN and SVR models employed in this study, graphical and numerical analyses of the errors were performed.For this purpose, four different criterium values were used.These evaluation metrics are the correlation coefficient (R, mm/d), the absolute average error (AAE, mm/d), the root mean square error (RMSE, mm/d), and the percent relative error (RE%).The formulas for these metrics are provided below: where ET PM

Performance of the Constructed Machine Learning Models
As non-parametric modeling approaches, machine learning methodologies do not impose assumptions.However, there are specific hyperparameters unique to each machine learning algorithm that need to be tuned to generate accurate and reliable models.In order for the best fitted RFr to be constructed, the optimal values of the training elements of the model were assessed through trial-and-error methodologies, taking into account the estimation and prediction mean square errors.Moreover, in order to guarantee full randomization of the procedure, the bootstrap aggregation technique was employed in the selection of training data for each individual decision tree during model building.To achieve this, 300 regression trees were utilized, each having 10 branches.These specific numbers were chosen after testing various options, ranging from 2 to 500 for the number of trees and 1 to 15 for the number of branches per tree.It was observed that once a combination of 300 regression trees with 10 branches each was employed, there was no substantial improvement in the model's average estimation error.Additionally, this choice of tree depth effectively prevented the over-parameterization of the system during the learning process of the random forest regression (RFr) model.
Based on the models developed using generalized regression neural networks (GRNNs), the smoothing factor value (σ) was fine-tuned through an exhaustive grid-search approach [58], involving a total of 4950 fits.The optimal σ value, which resulted in the most precise and reliable model, was determined to be 1.489.An exhaustive grid-search approach was conducted to identify the best hyperparameter combination (ε, γ, and C) for constructing the ε-SVR model as well.The optimal hyperparameter values which resulted in both the highest correlation value and the smallest root mean square error between the observed and estimated ET o values were determined to be 0.01, 0.5, and 180, respectively.
In Table 1, the evaluation metrics of the trained RFr, GRNN, and SVR models at Sindos station during the calibration period (2000-2007), using 2922 daily data points, are given.When evaluating the performance of the machine learning models constructed for the calibration dataset (Table 1), the RFr model demonstrated the most favorable fit to the data, achieving the highest R value and simultaneously the lowest values across all error metrics.The SVR model exhibited the second-best fit to the data, while the constructed GRNN model displayed comparatively lower adaptability, resulting in 1.49 times, 1.57 times, and 6.7% higher AAE, RMSE, and RE% values, respectively, compared to the error values obtained from the RFr model.
To assess the generalization capability of the created machine learning models, we computed evaluation metrics for a new time period spanning from 2008 to 2009, encompassing both the Sindos and Piperia stations.This analysis utilized 731 distinct daily input datasets for each station, and the outcomes can be found in Table 2.
Table 2 illustrates that all the developed models exhibited effective generalization to new data, as evidenced by evaluation metric values closely resembling those obtained during the calibration period (Table 1).This indicates that the models delivered accurate ET o predictions for new data, whether in the same station (Sindos) as the calibration data or in an entirely different station (Piperia).Furthermore, a consistent pattern in the evaluation metrics was observed among the models, with the RFr model generating the most precise predictions, followed by the SVR model, and the GRNN model producing the least accurate predictions.To investigate the accuracy of estimation and prediction by the models on an annual basis, Figures 4 and 5 were generated for each individual year.The sub-figures in Figure 4 illustrate the performance of the RFr, GRNN, and SVR models at Sindos station, during the calibration period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007), for each individual year.It is important to highlight that these models were constructed using solely air temperature, Ra, and N data.In the case of Sindos station, shown in Figure 4, the performance metrics exhibited noticeable variability throughout the calibration period.Specifically, considering the RFr model, the values of R (Figure 4a) ranged from 0.9894 to 0.9851, with an average of 0.9843.The AAE values (Figure 4b) ranged from 0.2104 to 0.2211, averaging 0.2189 mm/d for the same model, while the RMSE values (Figure 4c) ranged from 0.2881 to 0.3201, with an average of 0.3108 mm/d.Lastly, the RE values (Figure 4d) ranged from 0.1093 to 0.1197, with an average value of 0.1269 for the best-performing RFr model.The GRNN and SVR models exhibited lower average R values, with their performance resulting in 2.39% and 2.11% lower average R values, respectively, compared to the average R value derived by the RFr model (Figure 4a).The SVR model yielded an average AAE value that was 1.45 times higher than that of the RFr model, while the GRNN model had an average AAE value 1.49 times higher than that of the RFr model.Additionally, the SVR model resulted in an average RMSE value that was 1.53 times smaller, and the GRNN model had an average RMSE value 1.56 times smaller than that of the RFr model.
Finally, the relative error (RE) in percentage error values for the GRNN and SVR models were 7.11% and 6.67% larger, respectively, compared to the average RE% value of the RFr model.The results mentioned above and shown in Figure 4 indicate that the constructed RFr model demonstrated the most effective fit with the calibration dataset.It is important to highlight that these models were constructed using solely air temperature, Ra, and N data.In the case of Sindos station, shown in Figure 4, the performance metrics exhibited noticeable variability throughout the calibration period.Specifically, considering the RFr model, the values of R (Figure 4a) ranged from 0.9894 to 0.9851, with an average of 0.9843.The AAE values (Figure 4b) ranged from 0.2104 to 0.2211, averaging 0.2189 mm/d for the same model, while the RMSE values (Figure 4c) ranged from 0.2881 to 0.3201, with an average of 0.3108 mm/d.Lastly, the RE values (Figure 4d) ranged from 0.1093 to 0.1197, with an average value of 0.1269 for the best-performing RFr model.The GRNN and SVR models exhibited lower average R values, with their performance resulting in 2.39% and 2.11% lower average R values, respectively, compared to the average R value derived by the RFr model (Figure 4a).The SVR model yielded an average AAE value that was 1.45 times higher than that of the RFr model, while the GRNN model had an average AAE value 1.49 times higher than that of the RFr model.Additionally, the SVR model resulted in an average RMSE value that was 1.53 times smaller, and the GRNN model had an average RMSE value 1.56 times smaller than that of the RFr model.
Finally, the relative error (RE) in percentage error values for the GRNN and SVR models were 7.11% and 6.67% larger, respectively, compared to the average RE% value of the RFr model.The results mentioned above and shown in Figure 4 indicate that the constructed RFr model demonstrated the most effective fit with the calibration dataset.
Based on the evaluation of the machine learning models' generalization capability using test datasets from Sindos and Piperia stations in the 2008-2009 testing period (Figure 5), the performance metrics for all models exhibited minor fluctuations, suggesting that the tested models possessed similar generalization abilities.According to [30,31], the Hargreaves-Samani (HG-S) method, which also relies on temperature data, has been effectively used in certain locations to estimate daily ETo, and it is summarized by the following equation:

𝐸𝑇
[0.0023 (Tmean + 17.8)(Tmax − Tmin) 0.5 Rα (11) where ΕΤο is the reference evapotranspiration (mm d −1 ), Tmean is the mean daily air temperature (°C), Tmax is the maximum daily air temperature (°C), Tmin is the minimum daily air temperature (°C), and Rα is extraterrestrial radiation (mm d −1 ), which can easily be estimated for a certain day and location using Equation (2). Figure 7 displays the annual ETo values (mm/year) estimated using the HG-S method at Sindos station for the period 2000-2009 and at Piperia station for the testing period 2008-2009.Figure 7 highlights a significant divergence between the annual ETo values (mm/year) estimated using the FAO 56 PM and HG-S methods.At Sindos station, when comparing the HG-S method to the reference FAO 56 PM annual ETo values, the HG-S method significantly overestimates ETo for all years.The overestimation of annual ETo values ranged from 20.29% to 41.20%, with an average of 29.33%.Conversely, at Piperia station, the HG-S method significantly underestimates the annual ETo, with underestimation values of 48.61% for 2008 and 50.27% for 2009.Similar discrepancies are observed in the monthly ETo values (mm/day) estimated by the two methods, confirming that the HG-S method has failed to reliably estimate daily ETo at both stations.According to [30,31], the Hargreaves-Samani (HG-S) method, which also relies on temperature data, has been effectively used in certain locations to estimate daily ET o , and it is summarized by the following equation: where ET o is the reference evapotranspiration (mm d −1 ), T mean is the mean daily air temperature ( • C), T max is the maximum daily air temperature ( • C), T min is the minimum daily air temperature ( • C), and R α is extraterrestrial radiation (mm d −1 ), which can easily be estimated for a certain day and location using Equation (2). Figure 7 displays the annual ET o values (mm/year) estimated using the HG-S method at Sindos station for the period 2000-2009 and at Piperia station for the testing period 2008-2009.Figure 7 highlights a significant divergence between the annual ET o values (mm/year) estimated using the FAO 56 PM and HG-S methods.At Sindos station, when comparing the HG-S method to the reference FAO 56 PM annual ET o values, the HG-S method significantly overestimates ET o for all years.The overestimation of annual ET o values ranged from 20.29% to 41.20%, with an average of 29.33%.Conversely, at Piperia station, the HG-S method significantly underestimates the annual ET o , with underestimation values of 48.61% for 2008 and 50.27% for 2009.Similar discrepancies are observed in the monthly ET o values (mm/day) estimated by the two methods, confirming that the HG-S method has failed to reliably estimate daily ET o at both stations.
Figure 8 demonstrates that there is minimal fluctuation in monthly ET o values (mm/day) between the RFr, GRNN, SVR, and FAO 56 PM models.This consistency is observed during the calibration period from 2000 to 2007 at Sindos station (Figure 8a), as well as during the testing period, from 2008 to 2009, at both Sindos (Figure 8b) and Piperia (Figure 8c) stations.Notably, the RFr model consistently outperforms the other models in both periods.As can be seen (Figure 10), the RFr model has the highest R 2 value (0.8775), suggesting it explains the most variance in the data.However, its slope (0.8475) deviates significantly from 1, indicating it tends to underestimate the actual values.GRNN model has a lower R 2 value (0.8581) compared to the RFr model but has the slope (1.0035) closest to 1 and the intercept (0.0783) closest to 0, suggesting very accurate predictions, although it could not be able to explain the most variance in the data.Finally, SVR model showed a middle R 2 value (0.869), a slope (1.0114) close to 1, and a slightly higher intercept (0.1716) compared to the GRNN model.Although the results do not clearly identify the best prediction model for the Piperia test dataset, the overall performance of the RFr model on both the calibration and test datasets, compared to the GRNN and SVR models, suggests that the RFr model can be considered the safest choice.
highest R 2 value (0.8775), suggesting it explains the most variance in the data.However, its slope (0.8475) deviates significantly from 1, indicating it tends to underestimate the actual values.GRNN model has a lower R 2 value (0.8581) compared to the RFr model but has the slope (1.0035) closest to 1 and the intercept (0.0783) closest to 0, suggesting very accurate predictions, although it could not be able to explain the most variance in the data.Finally, SVR model showed a middle R 2 value (0.869), a slope (1.0114) close to 1, and a slightly higher intercept (0.1716) compared to the GRNN model.Although the results do not clearly identify the best prediction model for the Piperia test dataset, the overall performance of the RFr model on both the calibration and test datasets, compared to the GRNN and SVR models, suggests that the RFr model can be considered the safest choice.

Discussion
This study aims to offer a comprehensive and dependable methodology for estimating and predicting daily reference evapotranspiration (ETo), which is of great importance for water resources management, water balance estimation, irrigation scheduling, agricultural production forecasting, and solving many theoretical problems in the fields of hydrology and meteorology.Given the nonlinear nature of ETo, our approach involves employing three non-parametric regression-based machine learning modeling methods.This strategy is designed to leverage the unique strengths of each algorithm, while also taking into account their individual advantages and limitations.
The models were primarily constructed based on temperature data (Tmin, Tmax, Tmean), which are correlated.The effect of multicollinearity introduced in these non-parametric machine learning models can be considered minimal.Examining each modeling method used, it can be said that the ε-SVR approach effectively addresses the non-linear nature of the data by using kernel tricks to map input features to high-dimensional spaces, thus reducing issues from multicollinearity.Random forest (RFr) generally handles multicollinearity well.In our case, the model, composed of 300 regression trees with 10 branches each, was of moderate complexity.This tree depth effectively prevents over-parameterization during learning.Finally, generalized regression neural networks (GRNNs)

Discussion
This study aims to offer a comprehensive and dependable methodology for estimating and predicting daily reference evapotranspiration (ET o ), which is of great importance for water resources management, water balance estimation, irrigation scheduling, agricultural production forecasting, and solving many theoretical problems in the fields of hydrology and meteorology.Given the nonlinear nature of ET o , our approach involves employing three non-parametric regression-based machine learning modeling methods.This strategy is designed to leverage the unique strengths of each algorithm, while also taking into account their individual advantages and limitations.
The models were primarily constructed based on temperature data (Tmin, Tmax, Tmean), which are correlated.The effect of multicollinearity introduced in these nonparametric machine learning models can be considered minimal.Examining each modeling method used, it can be said that the ε-SVR approach effectively addresses the nonlinear nature of the data by using kernel tricks to map input features to high-dimensional spaces, thus reducing issues from multicollinearity.Random forest (RFr) generally handles multicollinearity well.In our case, the model, composed of 300 regression trees with 10 branches each, was of moderate complexity.This tree depth effectively prevents over-parameterization during learning.Finally, generalized regression neural networks (GRNNs) are also less sensitive to multicollinearity due to their kernel regression principles.GRNNs, being inherently non-linear, do not assume linearity and base their predictions on the distance between input and training vectors, minimizing the impact of multicollinearity in high-dimensional spaces.
Additionally, all modeling approaches possess the capability to address data challenges like high variance, outliers, and potential missing values, ultimately resulting in accurate results.Nevertheless, it is essential to appropriately fine-tune the hyperparameters for all modeling approaches, as this task plays a crucial role in ensuring their effective estimation and prediction performance.
Specifically, the first utilized approach for ET o estimation and prediction involves random forest regression (RFr), which, through the independent building of its decision trees, can effectively address the significant issue of overfitting while reducing the variance [60] and bias of predictions [61].Nonetheless, it also exhibits a significant limitation, namely the absence of extrapolation capability.In other words, any forecast made outside the range of values encountered during the system's construction phase will essentially be an average of the previously observed values within the RFr model's scope.The further the predicted value is from the range of the training data, the less reliable the prediction becomes.The second modeling approach utilizes a generalized regression neural network (GRNN), which is a probabilistic neural network.It provides notable advantages [44], particularly in its ability to effectively converge towards the underlying data function, even in scenarios with a limited number of training samples.Furthermore, it requires minimal additional knowledge to achieve a satisfactory fit, eliminating the need for additional user input.As a result, the GRNN proves to be a valuable tool for making predictions and facilitating practical comparisons of system performance.However, due to the fact that it is a memory-intensive neural network, GRNN cannot be considered as a suitable neural network type for large datasets or high-dimensional data.Furthermore, it can be prone to overfitting, which is a common problem for all neural network types.Due to ε-SVR's ability to uncover intricate relationships within real-world data and effectively address issues like overfitting and local minima [61,62], the support vector regression approach was our third viable choice for estimating and predicting ET o .However, training an SVR model is a computationally intensive procedure.
Taking into account the flexibility exhibited by each of the employed approaches, as shown in Table 1, and their capacity to generalize to new datasets, as indicated in Table 2, it is reasonable to deem their performance as satisfactory for both calibration and the two distinct test datasets.Additionally, there was consistency observed in how they performed in terms of estimation and prediction accuracy.
The modeling of daily ET o by the developed RFr, GRNN, and SVR models, using only temperature data, offers the possibility of integrating ET o in many cases, under current and climate change conditions, when temperature data are available.Some of its uses include the prediction of the agricultural production through the reliable estimation of the crop water requirements and its introduction in each grid of distributed hydrological models for the accurate estimation of the water balance components.
At this point, it is noteworthy to say that the Hargreaves-Samani (HG-S) method [30,31], which also assess temperature-based features for estimation ET o , was examined in this paper, and the finding is that it has failed at reliable daily ET o estimation at both stations.

Conclusions
This study delved into the feasibility of utilizing three different regression-based machine learning models to estimate daily reference evapotranspiration (ET o ): random forest regression (RFr), generalized regression neural networks (GRNNs), and support vector regression (SVR).We employed meteorological data from two stations situated in northern Greece, focusing solely on temperature variables (T min , T max , T mean ), extraterrestrial radiation (R a ), and theoretical sunshine (N) as input parameters for the models.To assess performance, we compared the results with FAO 56 PM daily ET o values estimated using comprehensive meteorological data, which served as our reference benchmark.
Analyzing the statistical comparisons (both numerically and graphically), encompassing evaluation metrics like R, AAE, RMSE, and RE%, alongside a comprehensive examination of the models' performance throughout the entire calibration and testing periods, as well as on a yearly basis, it becomes obvious that all three models (RFr, GRNN, and SVR) consistently displayed robust performance when estimating daily ET o at both stations.Notably, the RFr model stood out as the top performer, providing the most precise daily ET o estimates.This study's findings suggest the utilization of three regression-based machine learning models (RFr, GRNN, and SVR) as valuable tools for estimating daily ET o with only temperature data.Once these regression-based machine learning models have been successfully developed, they have the potential to serve as effective alternatives for estimating daily ET o , under current and climate change conditions, when temperature data are available.
Daily ET o is crucial information for water resources management and particularly for the prediction of agricultural production under climate change conditions.Machine learning models offer several advantages, including their ability to support engineers and watershed managers in diverse applications.As the models run on a daily time step, selecting two stations in different regions with significantly different altitudes (10 m and 160 m) ensures the integration of diverse meteorological variable values.
It is worth noting that this study utilized data from two stations, and future research could explore the application of the RFr, GRNN, and SVR models with data from additional stations.Additionally, the significant limitation of the RFr model-its lack of extrapolation capability-must be seriously considered, as it could impact its prediction accuracy.
Finally, it would be interesting for future research to include a comparison of these approaches, which have demonstrated potential in accurately estimating and predicting ET o , against more complex machine learning models to further evaluate their relative performance.

Figure 1 .
Figure 1.Geographic location of the meteorological stations in Greece.

Figures 2
Figures 2 and 3 present the monthly and inter-annual variation in m variables (mean air temperature, relative humidity, wind speed, net solar ra reference evapotranspiration estimated using the FAO 56 PM method (Equa

Figure 1 .
Figure 1.Geographic location of the meteorological stations in Greece.Figures2 and 3present the monthly and inter-annual variation in meteorological variables (mean air temperature, relative humidity, wind speed, net solar radiation, and reference evapotranspiration estimated using the FAO 56 PM method (Equation (1)).

Figure 2 .
Figure 2. Monthly variation in meteorological variables and ETo at Sindos station, during the training (2000-2007) and test (2008-2009) period, and at Piperia station, during the test (2008-2009) period (a-e).Tmean, RH, U2, Rn, and ETo represent mean air temperature, relative humidity, wind speed, net solar radiation, and reference evapotranspiration estimated by the FAO 56 PM method, respectively.These variations are shown for Sindos station during the test periods 2000-2007 and 2008-2009 and for Piperia station during the test period 2008-2009.Since the models operate on a daily time step, the mean monthly temperature values do not capture the deviations in daily values, which can reach up to 12.5 °C, with lower temperatures typically observed at Piperia station (alt.160 m) compared to Sindos station (alt.10 m).However,

Figure 2 .
Figure 2. Monthly variation in meteorological variables and ET o at Sindos station, during the training (2000-2007) and test (2008-2009) period, and at Piperia station, during the test (2008-2009) period (a-e).T mean , RH, U 2 , R n , and ET o represent mean air temperature, relative humidity, wind speed, net solar radiation, and reference evapotranspiration estimated by the FAO 56 PM method, respectively.These variations are shown for Sindos station during the test periods 2000-2007 and 2008-2009 and for Piperia station during the test period 2008-2009.Since the models operate on a daily time step, the mean monthly temperature values do not capture the deviations in daily values, which can reach up to 12.5 • C, with lower temperatures typically observed at Piperia station (alt.160 m) compared to Sindos station (alt.10 m).However, the mean monthly temperature values in Figure 2a and the mean annual temperature values in Figure 3(a 1 ,a 2 ) clearly show temperature disparities between the two stations.Additionally, Figures 2 and 3 indicate notable differences in relative humidity and wind speed between the stations.

i
and ET M i are ET o values at the i-th step obtained by the FAO 56 PM and the constructed machine learning models, respectively; n is the number of the time steps; while ET PM i and ET M i are the average ET o values obtained by the FAO 56 PM and the constructed machine learning models, respectively.High values of R, accompanied by a low AAE, RMSE, and RE, indicate high model performance, whereas the opposite implies poorer performance.

Figure 5 .
Figure 5. Performance of RFr, GRNN, and SVR models at Sindos and Piperia stations during testing (2008-2009) period.R (a), AAE (b), RMSE (c), and RE (d) represent correlation coefficient, absolute average error, root mean square error, and relative error, respectively.The sub-figures in Figure 4 illustrate the performance of the RFr, GRNN, and SVR models at Sindos station, during the calibration period (2000-2007), for each individual year.It is important to highlight that these models were constructed using solely air temperature, Ra, and N data.In the case of Sindos station, shown in Figure4, the performance metrics exhibited noticeable variability throughout the calibration period.Specifically, considering the RFr model, the values of R (Figure4a) ranged from 0.9894 to 0.9851, with an average of 0.9843.The AAE values (Figure4b) ranged from 0.2104 to 0.2211, averaging 0.2189 mm/d for the same model, while the RMSE values (Figure4c) ranged from 0.2881 to 0.3201, with an average of 0.3108 mm/d.Lastly, the RE values (Figure4d) ranged from 0.1093 to 0.1197, with an average value of 0.1269 for the best-performing RFr model.The GRNN and SVR models exhibited lower average R values, with their performance resulting in 2.39% and 2.11% lower average R values, respectively, compared to the average R value derived by the RFr model (Figure4a).The SVR model yielded an average AAE value that was 1.45 times higher than that of the RFr model, while the GRNN model had an average AAE value 1.49 times higher than that of the RFr model.Additionally, the SVR model resulted in an average RMSE value that was 1.53 times smaller, and the GRNN model had an average RMSE value 1.56 times smaller than that of the RFr model.Finally, the relative error (RE) in percentage error values for the GRNN and SVR models were 7.11% and 6.67% larger, respectively, compared to the average RE% value of the RFr model.The results mentioned above and shown in Figure4indicate that the constructed RFr model demonstrated the most effective fit with the calibration dataset.Based on the evaluation of the machine learning models' generalization capability using test datasets from Sindos and Piperia stations in the 2008-2009 testing period (Figure5), the performance metrics for all models exhibited minor fluctuations, suggesting that the tested models possessed similar generalization abilities.Figure6illustrates that there is minimal variability in annual ET o values (mm/year) between the RFr, GRNN, SVR models, as well as the FAO 56 PM model, for the years

Figure 6
illustrates that there is minimal variability in annual ET o values (mm/year) between the RFr, GRNN, SVR models, as well as the FAO 56 PM model, for the years 2000-2009 at Sindos station, as well as for the years 2008-2009 at Piperia station.When comparing these models to the reference FAO 56 PM ET o values, at Sindos station, they tend to overestimate ET o in 2002 and 2006 and underestimate ET o in 2000, 2001, 2007, and 2008, with the RFr model performing better.Meanwhile, at Piperia station, the RFr model overestimated ET o in 2008 and 2009, while the SVR model underestimated ET o in the same years, with the GRNN model showing superior results.using test datasets from Sindos and Piperia stations in the 2008-2009 testing period (Figure5), the performance metrics for all models exhibited minor fluctuations, suggesting that the tested models possessed similar generalization abilities.

Figure 6
illustrates that there is minimal variability in annual ETo values (mm/year) between the RFr, GRNN, SVR models, as well as the FAO 56 PM model, for the years 2000-2009 at Sindos station, as well as for the years 2008-2009 at Piperia station.When comparing these models to the reference FAO 56 PM ETo values, at Sindos station, they tend to overestimate ETo in 2002 and 2006 and underestimate ETo in 2000, 2001, 2007, and 2008, with the RFr model performing better.Meanwhile, at Piperia station, the RFr model overestimated ETo in 2008 and 2009, while the SVR model underestimated ETo in the same years, with the GRNN model showing superior results.

Figure 8
Figure 8 demonstrates that there is minimal fluctuation in monthly ETo values (mm/day) between the RFr, GRNN, SVR, and FAO 56 PM models.This consistency is observed during the calibration period from 2000 to 2007 at Sindos station (Figure 8a), as well as during the testing period, from 2008 to 2009, at both Sindos (Figure 8b) and Piperia (Figure 8c) stations.Notably, the RFr model consistently outperforms the other models in both periods.

Figure
Figure 9a,c,e depict scatterplots comparing ETo values (mm/day) estimated by the FAO 56 PM model with those from the RFr, GRNN, and SVR models during the calibration period (2000-2007) at Sindos station.In these plots, it is obvious that the RFr model outperformed the GRNN and SVR models based on the slope of the fitted line equations and R 2 values (slope: 1.019 vs. 1.0073 and 0.9745; R 2 : 0.9662 vs. 0.9162 and 0.9212, respectively).

Figure 23 Figure 9 .
Figure 9a,c,e depict scatterplots comparing ET o values (mm/day) estimated by the FAO 56 PM model with those from the RFr, GRNN, and SVR models during the calibration period (2000-2007) at Sindos station.In these plots, it is obvious that the RFr model outperformed the GRNN and SVR models based on the slope of the fitted line equations and R 2 values (slope: 1.019 vs. 1.0073 and 0.9745; R 2 : 0.9662 vs. 0.9162 and 0.9212, respectively).R PEER REVIEW 17 of 23

Figure 9 .
Figure 9. Scatterplots of ET o values (mm/d) estimated by FAO 56 PM and RFr, GRNN, and SVR models, during calibration (2000-2007) period (a,c,e) and during testing (2008-2009) period (b,d,f), respectively, at Sindos station.On the other hand, Figure 9b,d,f present scatterplots comparing ET o values (mm/day) estimated by the FAO 56 PM model with those from the RFr, GRNN, and SVR models during the testing period (2008-2009) at Sindos station.Similar to the calibration period, the RFr model demonstrated superior performance compared to the GRNN and SVR models, as indicated by the slope of the fitted line equations and R 2 values (slope: 0.9875 vs. 0.9872 and 0.9596; R 2 : 0.9172 vs. 0.9022 and 0.91, respectively).Figure10a-c illustrate scatterplots that compare ET o values (mm/day) estimated by the FAO 56 PM model with those from the RFr, GRNN, and SVR models during the testing period (2008-2009) at Piperia station.As can be seen (Figure10), the RFr model has the highest R 2 value (0.8775), suggesting it explains the most variance in the data.However, its slope (0.8475) deviates significantly from 1, indicating it tends to underestimate the actual values.GRNN model has a lower R 2 value (0.8581) compared to the RFr model but has the slope (1.0035) closest to 1 and the intercept (0.0783) closest to 0, suggesting very accurate predictions, although it could not be able to explain the most variance in the data.Finally, Figure 10a-c illustrate scatterplots that compare ET o values (mm/day) estimated by the FAO 56 PM model with those from the RFr, GRNN, and SVR models during the testing period (2008-2009) at Piperia station.

Table 2 .
Evaluation metrics for the predicted daily ET o by the RFr, GRNN, and SVR models during the 2008-2009 period, for both Sindos and Piperia stations; n = 731.