Assessing the Robustness of Pan Evaporation Models for Estimating Reference Crop Evapotranspiration during Recalibration at Local Conditions

: A classic method for assessing the reference crop evapotranspiration (ET o ) is the pan evaporation (E pan ) method that uses E pan measurements and pan coe ﬃ cient (k p ) models, which can be functions of relative humidity (RH), wind speed (u 2 ), and temperature (T). The aim of this study is to present a methodology for evaluating the robustness of regression coe ﬃ cients associated to climate parameters (RH, u 2 , and T) in pan method models during recalibration at local conditions. Two years of daily data from April to October (warm season) of meteorological parameters, E pan measurements from class A pan evaporimeter and ET o estimated by ASCE-standardized method for the climatic conditions of Thessaloniki (Greece, semi-arid environment), were used. The regression coe ﬃ cients of six general nonlinear (NLR) regression E pan models were analyzed through recalibration using a technique called “random cross-validation nonlinear regression RCV-NLR” that produced 1000 random splits of the initial dataset into calibration and validation sets using a constant proportion (70% and 30%, respectively). The variance of the regression coe ﬃ cients was analyzed based on the 95% interval of the highest posterior density distribution. NLR models that included coe ﬃ cients with a 95% HPD interval that ﬂuctuates in both positive and negative values were considered nonrobust. The machine-learning technique of random forests (RF) was also used to build a RF model that includes E pan , u 2 , RH, and T parameters. This model was used as a benchmark for evaluating the predictive accuracy of NLR models but, also, for assessing the relative importance of the predictor climate variables if they were all included in one NLR model. The ﬁndings of this study indicated that locally calibrated NLR functions that use only the E pan parameter presented better results, while the inclusion of additional climate parameters was redundant and led to underﬁtting.


Introduction
The parameter of reference crop evapotranspiration ET o describes the maximum water losses that can be achieved by evaporation and transpiration from a field covered by a reference crop (e.g., grass) under no water restrictions, and it is used for water balance analysis and irrigation planning [1,2]. Various methods have been proposed for ET o assessment [1][2][3][4][5][6][7][8][9][10][11], with the most popular being the FAO-56 Penman-Monteith [1] from Food and Agriculture Organization (FAO) of the United Nations, and its updated version, the ASCE-standardized method [2], which reflects the current state-of-the-art, providing ET o estimations for two reference crops (a short and a tall reference crop, which correspond to clipped grass and alfalfa, respectively). The ASCE-standardized method has been proposed by the Task Committee of Environmental and Water Resources Institute (EWRI) of the American Society of Civil Engineers (ASCE) as the most precise method and requires a complete set of the four climatic parameters of temperature, solar radiation, relative humidity, and wind speed, which, in many cases, are not all available.
The problem of data availability can be confronted by other methods of reduced parameters [3][4][5][6][7][8][9][10][11], among which, there is the pan evaporation method that differs from the others, because it requires direct evaporation measurements (E pan ) from an evaporimeter. In this method, ET o is commonly calculated as the multiplication product of E pan and a pan coefficient (k p ), which depends on the local conditions of the surrounding environment [1]. There are various types of evaporimeters, but the most commonly used type is the class A pan. The class A evaporation pan is circular, 120.7 cm in diameter and 25 cm deep. It is made of galvanized iron (22 gauge) or Monel metal (0.8 mm). The pan is mounted on a wooden open frame platform, which is 15 cm above ground level. The pan must be level. It is filled with water to 5 cm below the rim, and the water level should not be allowed to drop to more than 7.5 cm below the rim. The water should be regularly renewed, at least weekly, to eliminate extreme turbidity. The pan, if galvanized, should be painted annually with aluminum paint [1].
With respect to class A pan evaporation, various regression k p models have been developed, which are functions of various parameters such as relative humidity, wind speed, temperature, and the fetch distance between the pan evaporation and boundary of grass covered or fallow soil surface [12][13][14][15][16][17][18][19][20][21]. The specific regression k p models have been tested in many environments , while, in many cases, their regression coefficients have been recalibrated or their form has been changed (e.g., using machine-learning models) to describe better the local conditions [19,26,33,38,40,41,47,49].
There are four main issues during k p models' recalibration or formation that should be considered. The first issue is that the predefined forms of k p models are subjected to nonlinear regression analysis (nonlinear regression also includes machine-learning techniques) without investigating if the climate parameters inside a k p model (e.g., RH, u 2 , and T) have a robust effect.
The second issue is that the regression analysis is usually performed considering the k p as dependent parameter. This approach does not allow a comparative analysis between the effects of E pan and the other climate parameters (e.g., u 2 , RH, and T), because E pan measurements are embodied in the "measured" k p values. This approach also creates other problems that are associated to the regression analysis. During a regression analysis, an optimization algorithm aims at minimizing the errors between observed and predicted values of the dependent variable, and because of this procedure, the observed k p with larger values have a greater effect on the error measures [61] and, consequently, a greater effect on regulating the regression coefficients. For example, if the measured k p values of colder days (lower ET o ) are larger from the respective values of warmer days (higher ET o ), which is not unusual [29,36,49], then the final calibrated k p model will perform better at colder days with lower ET o values. It is also worth mentioning that there are uncertainties in the validity of ET o during the cold periods [7] and thus, the k p values of those periods may not be accurate.
The third issue is that the estimated ET o by a complete formula (e.g., ASCE-standardized method) and the measured E pan evaporation are both affected by the same climate parameters, and someone would expect that solely E pan could explain the largest portion of variance in estimated ET o values, especially when there are no observed extreme deviations of meteorological parameters from normal conditions. That means that a model that considers only E pan without the inclusion of other parameters could provide an adequate prediction accuracy (examples of such models are given in [25,35,43,46,47,60], with R 2 ranging between~0.7-0.95), while the other climate parameters may be redundant. The reasons that could explain a low accuracy regression model between ET o and E pan are (a) the selection of a nonappropriate form of the regression model and (b) the introduction of experimental bias during ET o Hydrology 2020, 7, 62 3 of 17 and E pan measurements and, especially, during the measurement of real ET o in lysimeter conditions where additional effects from the soil and the grass crop may occur [62]. It is also worth mentioning that there are very few cases of calibrated k p models using measured ET o from lysimeters [13,15,16,22,24,33], while, in most cases, ET o is calculated using a benchmark method (e.g., ASCE-standardized method) that already uses a full set of climate parameters. Calibrating k p models that include the same climate parameters used for calculating ET o values may also be questionable from a statistical/mathematical point of view.
The fourth issue is that the relationship between ET o and E pan is usually defined a priori by ET o = k p ·E pan without investigating if these two parameters are related in a different way.
Considering the above issues, the objective of this study is to propose an integrated methodology for evaluating the robustness of regression coefficients and, consequently, their associated climate parameters that participate in general forms of pan evaporation models during recalibration at local conditions. In this study, the general forms of existing pan evaporation models are used as examples, and they are recalibrated at local conditions based on estimated daily ET o values by the ASCE-standardized method for short grass (equivalent to the FAO-56 method). The proposed methodology that is going to be presented in this study allows to select the important climate parameters (e.g., RH, u 2 , and T) in combination with E pan measurements for building a robust model for the estimation of ET o at local conditions.

Data
Daily meteorological data of temperature (T), relative humidity (RH), solar radiation (R s ), wind speed at 2 m above ground (u 2 ), and precipitation (P) covering the warm-dry cultivation periods (May to September) of the years 2008 and 2009 were obtained by the meteorological station of the Aristotle University Farm (40 • 32 08 N, 22 • 59 18 E,~1 m a.s.l.) in Thessaloniki, Greece (Table 1). The daily values of climate parameters were calculated as mean values of hourly observations of a 24-h period.
The climate data were used for estimating daily ET o based on the ASCE-standardized method [2]. Additionally, daily E pan measurements were obtained during the same period using a class A pan evaporimeter of Monel metal with fetch distance F = 1 m for green upwind fetch (Case A) ( Table 1). The data describe adequately the meteorological conditions of the warm-dry season in the Thessaloniki Plain in Greece, where the climate is considered a semi-arid Mediterranean environment. The period of May-September is the period for cultivating summer crops and the period of interest for calculating ET o for irrigation purposes. The records of rainfall days (p > 0) during May-September were also excluded from the analysis, leading to a final number of 212 daily records of meteorological and E pan data.

Reference Crop Evapotranspiration
ET o was estimated in this study using the ASCE-standardized method at a daily step for short reference crops (clipped grass of 12 cm) by the following function [2]: where ET o is the reference crop evapotranspiration (mm d −1 ), R n is the net radiation at the crop surface (MJ m −2 d −1 ), u 2 is the mean daily wind speed at 2-m height (m s −1 ), T is the mean daily air temperature ( • C), G is the soil heat flux density at the soil surface (MJ m −2 d −1 ), e s is the mean daily vapor pressure (kPa), e a is the mean daily actual vapor pressure (kPa), ∆ is the slope of the saturation vapor pressure-temperature curve (kPa • C −1 ), γ is the psychrometric constant (kPa • C −1 ), C n and C d are constants, which vary according to the time step, the reference crop type (bulk surface resistance and aerodynamic roughness of the surface), and daytime/nighttime ratio. For short reference crop calculations at daily steps using Equation (1), the following values 900, 0.34, and 0 are considered for C n , C d , and G, respectively, as suggested by Allen et al. [1,2], providing equivalent ET o results to the FAO-56 Penman-Monteith equation. Table 1. Minimum, maximum, and average values of daily temperature (T), wind speed at 2 m above ground surface (u 2 ), relative humidity (RH), incident solar radiation (R s ), measured Class A pan evaporation (E pan ), and estimated reference evapotranspiration according to the ASCE-standardized method (Equation (1)) after excluding rainfall days.

Pan Evaporation Method and General Forms for Local Conditions
The general equation of the pan evaporation method for estimating reference crop evapotranspiration ET o is the following [63]: where E pan is the measured evaporation from the evaporimeter (mm day −1 ), and k p is the pan evaporation coefficient. The range of k p values is approximately between 0.35 to 1.1 for various pan types and various local conditions of climates and surrounding environments [14,63].
For the purposes of this study, the k p models of Cuenca [13], Allen and Pruitt [14], Snyder [15], Pereira et al. [16], and Orang [17] were used. These models have been developed for estimating short reference crops based on measurements from class A pan evaporimeters and are given, respectively, by the following equations: Hydrology 2020, 7, 62 5 of 17 where RH is the mean daily relative humidity (%), u 2 is the mean daily wind speed at the height of 2 m above the ground (in km d −1 for Equations (3)-(5) and (7) and in m sec −1 for Equation (6)), and F is the upwind distance from the pan to the border of grass-covered land (m). A difference exists in Equation (6) in comparison to the other k p models, because it is a function of u 2 and temperature T (T is included in the ∆ parameter), and it is also a ratio without an intercept, where all coefficients are related to u 2 and T. An additional important k p model is the one of Raghuwanshi and Wallender [18], but it is not used in this study, because it consists of categorical factors where their classes could not be adequately covered by the dataset of this study for allowing a proper recalibration of all its compartments.
On the other hand, Snyder et al. [25] proposed a sine wave equation for estimating ET o using only E pan adjusted for F = 100 m, which was further modified by Ghare et al. [60], who added an intercept as follows: Equation (8) is a quite flexible formula and provides an alternative relation between ET o and E pan compared to the commonly used Equation (2).
The five k p models (Equations (3)- (7)) were transformed to their initial general form in order to analyze the robustness of their regression coefficients and, consequently, the effects of their associated climate parameters (e.g., RH, u 2 , and T) through regression analysis. In Equations (3)-(7), the fetch distance was F = 1 m, which led to the following simplified general forms, respectively: k p = a + b·u 2 + c·RH + d·RH 2 + e·RH 2 u 2 (from Equation (3)) (9) k p = a + b·u 2 + c·ln(RH) (from Equation (4)) (10) k p = a + b·u 2 + c·RH (from Equations (5) and (7)) (11) where a, b, c, d, and e are regression coefficients. Similarly, the general form of Equation (8), with an intercept according to Ghare et al. [61] for estimating ET o using only E pan , was defined as: Finally, the simple regression tool of STATGRAPHICS Centurion XV software (StatPoint Technologies Inc.) was used in order to assess the optimum linear formula with transformed variables between ET o and E pan . The specific tool analyzes automatically 27 linear formulas with various transformations of the variables (Table S1 in Supplementary Materials), providing the option to the user to choose the best one based on performance metrics. For the dataset of this study, the best transformation for both variables (ET o and E pan ) was found to be the following: In this study, the second nonlinear form of Equation (14) was used in the comparative analysis of the models.

Nonlinear Regression with Random Cross-Validation
The robustness of the regression coefficients of the six models (Equations (9)- (14)) was investigated using a random cross-validation nonlinear regression (RCV-NLR) analysis considering ET o as the dependent variable (i.e., the four general nonlinear forms of k p , Equations (9)-(12), were included in Equation (2) for estimating ET o ). The RCV-NLR analysis is based on a random splitting of the initial dataset into a calibration set (70% of the records) and a validation set (30% of the records). This random splitting was performed 1000 times, leading to a respective number of calibration and validation pairs of datasets. The RCV-NLR analysis was performed for each one of the six models based on the calibration sets, providing 1000 estimations of their respective regression coefficients. The estimated coefficients of each calibration set were used to validate each model based on the respective validation set. The RCV-NLR was built in R software using the "nls.lm" function of the {minpack.lm} package [64], which includes the Levenberg-Marquardt nonlinear least-squares algorithm.
The range of 1000 solutions of each regression coefficient of calibration and validation procedures was defined by the 95% confidence interval of the highest posterior density (HPD) distribution. The 2.5% and 97.5% thresholds (HPD thresholds), which contain the central 95% interval of the HPD distribution, were estimated in R software using the "p.interval" function of the {LaplacesDemon} package [65], which returns unimodal or multimodal HPD intervals, depending on the form of the probability distributions. Based on the RCV-NLR results, a model was considered robust according to the following rule: "a model is robust only when the 95% HPD intervals of all its regression coefficients preserve a constant sign (+ or -)" (i.e., a 95% HPD that contains positive and negative values of a regression coefficient indicates a nonrobust coefficient and, consequently, a nonrobust model).

Multiple Random Forests
Random forests (RF) [66] is a machine-learning method that is an extension and improvement of the Classification and Regression Trees (CART) method, also called decision trees. RF uses a modification of the bootstrap aggregating technique (bagging), by which a large collection of decorrelated, noisy, approximately unbiased trees are built and averaged in order to reduce the model variance and instability problems [67]. RF is an ensemble method where the aggregation of multiple trees improves the overall prediction accuracy, with both low bias and low variance results [66,68]. Some of the advantages of RF are the ability of modeling high-dimensional nonlinear relationships, resistance to overfitting, relative robustness, estimation of variable importance, and few user-defined parameters [66][67][68][69].
The hyperparameters of a RF model play a significant role in a model's performance, so the determination of their values is crucial. The hyperparameters that were considered were the number of the regression trees (num.trees), the number of candidate predictors that were randomly sampled (mtry), the proportion of train set that was used for building the model (sample.fraction), and the minimum number of points in the terminal nodes of the regression trees (min.node.size). A number of combinations of various values of hyperparameters was constructed, and by execution of the "ranger" package [70], their optimal set of values was determined (mtry = 3, num.trees = 1000, sample.fraction = 0.7, and min.node.size = 3). The validation set was 30% of the initial dataset, as in the case of RCV-NLR. RF also estimates the predictor variables' importance. The importance of a predictor variable is calculated as the mean-across all trees-decrease (%) of accuracy, expressed by the % change in Mean Squared Error -MSE (%), on the Out-of-Bag (OOB) sample when the variable is not taken into account by permuting its values randomly and maintaining the others as they were. Using constant optimal values of hyperparameters, the internal random procedures in RF lead every time to different solutions. For this reason, 1000 RF iterations (multiple random forests-MRF) were made in order to make comparisons with the NLR models. The MRF analysis was performed using all the predictor variables (E pan , u 2 , RH, and T) that participate in Equations (9)- (14). RF is not restricted by the limitations of a predefined nonlinear form and can be used as a benchmark model for evaluating the predictive accuracy of NLR models (e.g., Equations (9)- (14)) but, also, for assessing the relative importance of the predictor climate variables if they were all included in one NLR model.

Models' Metrics
The metrics of R 2 and the root mean square error (RMSE) were estimated for each calibration and validation dataset of the RCV-NLR analysis (Equations (9)- (14)) and OOB and validation dataset of the MRF analysis, leading to 1000 respective estimations of their values for each model. Moreover, 1000 estimations of the slope and intercept of the trend line in the 1:1 plot of observed (ET o -ASCE) vs. predicted (RCV-NLR or MRF) models only using the validation sets were also made as complementary metrics. The 1000 estimations of the metrics R 2 , RMSE, slope, and intercept were also analyzed through the computation of HPD intervals.

RCV-NLR Approach
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of the regression coefficients of Equations (9)-(14) derived by the RCV-NLR analysis during calibration are provided in Table S2 (Supplementary Materials). The HPD distributions of the regression coefficients based on their 1000 values are given in Figure 1.
Hydrology 2020, 7, x FOR PEER REVIEW 7 of 16 restricted by the limitations of a predefined nonlinear form and can be used as a benchmark model for evaluating the predictive accuracy of NLR models (e.g., Equations (9)- (14)) but, also, for assessing the relative importance of the predictor climate variables if they were all included in one NLR model.

Models' Metrics
The metrics of R 2 and the root mean square error (RMSE) were estimated for each calibration and validation dataset of the RCV-NLR analysis (Equations (9)- (14)) and OOB and validation dataset of the MRF analysis, leading to 1000 respective estimations of their values for each model. Moreover, 1000 estimations of the slope and intercept of the trend line in the 1:1 plot of observed (ETo-ASCE) vs. predicted (RCV-NLR or MRF) models only using the validation sets were also made as complementary metrics. The 1000 estimations of the metrics R 2 , RMSE, slope, and intercept were also analyzed through the computation of HPD intervals.

RCV-NLR Approach
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of the regression coefficients of Equations (9)-(14) derived by the RCV-NLR analysis during calibration are provided in Table S2 (Supplementary Materials). The HPD distributions of the regression coefficients based on their 1000 values are given in Figure 1. Considering Table S2 and Figure 1, the following were observed: • For Equation (9), four out of five regression coefficients (b, c, d, and e coefficients) could not preserve a stable sign inside their 95% HPD interval. These coefficients are associated to RH and u2 based on the specific kp model.

•
For both Equations (10) and (11), one out of three regression coefficients (b coefficient for both cases) could not preserve a stable sign inside its 95% HPD interval. This coefficient is associated to u2 in both kp models. • For Equation (12), one out of three regression coefficients (b coefficient) could not preserve a stable sign inside its 95% HPD interval. Due to the complex form of Equation (12), this coefficient is associated to all climate parameters that participate in the specific kp model.

•
For Equations (13) and (14), all their coefficients presented a stable sign inside their 95% HPD intervals.
Considering the results of Table S2 and Figure 1, the rule of a constant sign in the 95% HPD interval of all regression coefficients is violated in all models except those that use only the Epan parameter (Equations (13) and (14)). Based on this rule, these two models are expected to be more robust than the other four models (Equations (9)- (12)). This suggestion is verified by the results of R 2 and RMSE in the calibration and validation procedures (Table S3 and Figure 2). Figure 2 shows an alternative way to present the HPD distributions using modified box-whisker plots with boxes adjusted to the 95% HPD interval. These plots allow an easier visual comparison of the models' metrics. The results of Figure 2 show that the models of Equations (13) and (14) present higher R 2 and lower RMSE values than the other four models (Equations (9)-(12)) in both calibration (Figure 2a,c) and validation (Figure 2b,d) procedures. The results of Equations (13) and (14) are quite similar, with a slight better performance of Equation (13) (mean R 2 = 0.872 and mean RMSE = 0.429 mm d −1 during calibration and mean R 2 = 0.866 and mean RMSE = 0.443 mm d −1 during validation), because it has three regression coefficients instead of two, as in the case of Equation (14). Considering Table S2 and Figure 1, the following were observed: • For Equation (9), four out of five regression coefficients (b, c, d, and e coefficients) could not preserve a stable sign inside their 95% HPD interval. These coefficients are associated to RH and u 2 based on the specific k p model.

•
For both Equations (10) and (11), one out of three regression coefficients (b coefficient for both cases) could not preserve a stable sign inside its 95% HPD interval. This coefficient is associated to u 2 in both k p models. • For Equation (12), one out of three regression coefficients (b coefficient) could not preserve a stable sign inside its 95% HPD interval. Due to the complex form of Equation (12), this coefficient is associated to all climate parameters that participate in the specific k p model.

•
For Equations (13) and (14), all their coefficients presented a stable sign inside their 95% HPD intervals.
Considering the results of Table S2 and Figure 1, the rule of a constant sign in the 95% HPD interval of all regression coefficients is violated in all models except those that use only the E pan parameter (Equations (13) and (14)). Based on this rule, these two models are expected to be more robust than the other four models (Equations (9)- (12)). This suggestion is verified by the results of R 2 and RMSE in the calibration and validation procedures (Table S3 and Figure 2). Figure 2 shows an alternative way to present the HPD distributions using modified box-whisker plots with boxes adjusted to the 95% HPD interval. These plots allow an easier visual comparison of the models' metrics. The results of Figure 2 show that the models of Equations (13) and (14) present higher R 2 and lower RMSE values than the other four models (Equations (9)-(12)) in both calibration (Figure 2a,c) and validation (Figure 2b,d) procedures. The results of Equations (13) and (14) are quite similar, with a slight better performance of Equation (13) (mean R 2 = 0.872 and mean RMSE = 0.429 mm d −1 during calibration and mean R 2 = 0.866 and mean RMSE = 0.443 mm d −1 during validation), because it has three regression coefficients instead of two, as in the case of Equation (14).  (14)) presented as modified box-whisker plots, with boxes adjusted to the 95% HPD interval.

MRF Approach
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of R 2 and RMSE during the OOB and validation procedures of the MRF approach are provided in the modified box-whisker plots of Figure 3a,b, respectively. Their respective values are provided in Table S4 of the Supplementary Materials. The performance of the MRF modeling approach that uses the full set of variables (i.e., Epan, u2, RH, and T) was generally better than the RCV-NLR models, according to the R 2 and RMSE metrics (mean R 2 = 0.891 and mean RMSE = 0.398 mm d −1 during the OOB procedure and mean R 2 = 0.894 and mean RMSE = 0.398 mm d −1 during validation of the MRF).  14)) presented as modified box-whisker plots, with boxes adjusted to the 95% HPD interval.

MRF Approach
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of R 2 and RMSE during the OOB and validation procedures of the MRF approach are provided in the modified box-whisker plots of Figure 3a,b, respectively. Their respective values are provided in Table S4 of the Supplementary Materials. The performance of the MRF modeling approach that uses the full set of variables (i.e., E pan , u 2 , RH, and T) was generally better than the RCV-NLR models, according to the R 2 and RMSE metrics (mean R 2 = 0.891 and mean RMSE = 0.398 mm d −1 during the OOB procedure and mean R 2 = 0.894 and mean RMSE = 0.398 mm d −1 during validation of the MRF).
The respective modified box-whisker plots of the variables' importance are also provided in Figure 4 and their respective values in Table S5 of the Supplementary Materials. Based on Figure 4 and Table S5, the importance of the variables follows the order E pan > u 2 > RH > T, with the MSE% values of E pan being at least five times greater than the rest of the parameters. the OOB and validation procedures of the MRF approach are provided in the modified box-whisker plots of Figure 3a,b, respectively. Their respective values are provided in Table S4 of the Supplementary Materials. The performance of the MRF modeling approach that uses the full set of variables (i.e., Epan, u2, RH, and T) was generally better than the RCV-NLR models, according to the R 2 and RMSE metrics (mean R 2 = 0.891 and mean RMSE = 0.398 mm d −1 during the OOB procedure and mean R 2 = 0.894 and mean RMSE = 0.398 mm d −1 during validation of the MRF). The respective modified box-whisker plots of the variables' importance are also provided in Figure 4 and their respective values in Table S5 of the Supplementary Materials. Based on Figure 4 and Table S5, the importance of the variables follows the order Epan > u2 > RH > T, with the MSE% values of Epan being at least five times greater than the rest of the parameters. Range of the variables' importance in the MRF analysis presented as modified box-whisker plots, with boxes adjusted to the 95% HPD interval (Epan: daily pan evaporation (mm d −1 ), u2: daily wind speed at 2 m above ground surface (m s −1 ), RH: daily relative humidity (%), T: daily temperature ( o C)).

Trend Line (Slope and Intercept) of Observed vs. Predicted ETo Values from 1:1 Plots Based on the Validation Datasets of All Models
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of the slopes and intercepts of the trend lines of the observed vs. predicted ETo values from 1:1 plots were estimated for all the validation cases from the RCV-NLR analysis of Equations (9)- (14) and MRF analysis (Figure 5a,b, Table S6) (in all cases, the validation data are the 30% of the initial dataset). The mean values and the 95% HPD intervals of the slopes and intercepts ( Figure 5) showed that the ETo predictions of Equations (13) and (14) are better distributed along the perfect fit line of 45 degrees in comparison to the rest RCV-NLR models and the MRF model.

Trend Line (Slope and Intercept) of Observed vs. Predicted ET o Values from 1:1 Plots Based on the Validation Datasets of All Models
The minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of the slopes and intercepts of the trend lines of the observed vs. predicted ET o values from 1:1 plots were estimated for all the validation cases from the RCV-NLR analysis of Equations (9)- (14) and MRF analysis (Figure 5a,b, Table S6) (in all cases, the validation data are the 30% of the initial dataset). The mean values and the 95% HPD intervals of the slopes and intercepts ( Figure 5) showed that the ET o predictions of Equations (13) and (14) are better distributed along the perfect fit line of 45 degrees in comparison to the rest RCV-NLR models and the MRF model. for all the validation cases from the RCV-NLR analysis of Equations (9)- (14) and MRF analysis (Figure 5a,b, Table S6) (in all cases, the validation data are the 30% of the initial dataset). The mean values and the 95% HPD intervals of the slopes and intercepts ( Figure 5) showed that the ETo predictions of Equations (13) and (14) are better distributed along the perfect fit line of 45 degrees in comparison to the rest RCV-NLR models and the MRF model.

Testing the Inclusion of Meteorological Parameters as Independent Variables in the E pan Methodology
The results of this study showed that the decision to include additional meteorological parameters in the E pan methodology for assessing ET o or to use existing k p models that already use meteorological parameters should be tested with more than one methodology before their use. The advantages and disadvantages of each methodology used in this study can be summarized as follows: • The RCV-NLR method with various existing models has the advantage of analyzing the HPD interval of the coefficients, allowing to derive conclusions about the effects of their associated parameters (e.g., RH, u 2 , and T). On the other hand, this method has the following disadvantages: (a) it is based on predefined nonlinear models that are formed by the user and may not be efficient enough to capture the responses of the dependent variable, and (b) in some cases, the form of the nonlinear models is complex, and the effect of the regression coefficient cannot be easily interpreted (e.g., coefficients of Equation (12)).

•
The MRF method has the following advantages: (a) it is a machine-learning technique that is not restricted by predefined nonlinear forms by the user, (b) its predictions can be used as a measure of the predictive accuracy of typical NLR models, and (c) it provides the relative importance of the parameters. On the other hand, this method has the following disadvantages: (a) an RF model cannot be visualized as a function that is easily applicable by others, and (b) it is impossible to derive information that could explain whether an independent variable positively or negatively affects the dependent variable.
Considering the above, it is crucial to combine the two methodologies in order to derive an integrated conclusion that will allow the selection of the optimum NLR or RF model.

Effect of the Nonlinear Form on the Importance of Independent Variables
The results of this study showed that different forms of a nonlinear model affect differently the robustness of a regression coefficient and, consequently, the robustness of its associated independent variable. For example, the regression coefficient c in Equations (10) and (11) is associated to the RH and is robust according to the 95% HPD interval rule, while, in Equation (9), all the regression coefficients that are associated to the RH (i.e., the c, d, and e coefficients) are not robust ( Figure 1). As regards u 2 , its respective coefficient b is not robust in Equations (9)-(11) due to the positive and negative values in the respective 95% HPD intervals. Considering the above, someone could agree that RH presents a more clear and robust effect on ET o in the majority of NLR cases than u 2 , according to the RCV-NLR analysis, but this suggestion is opposite to the results of the MRF analysis, since the importance of the u 2 variable is greater than the one of RH (Figure 4). These results highlight the limitations of NLR models to capture correctly the effect of an independent variable.

Is It Really Needed to Include Climate Parameters in E pan Methodology for Estimating ET o ?
The variables' importance provided by MRF analysis and the small difference in statistical metrics (R 2 and RMSE) between the MRF and Equations (13) and (14) showed that simple NLR models that use only E pan can reach a very satisfactory performance when they are calibrated for local conditions. Moreover, it was observed that the inclusion of additional climate parameters in NLR models may lead to underfitting, since Equations (9)- (12) showed poorer performance than Equations (13) and (14). The MRF analysis also showed the dominance of the E pan variable over the other three parameters (u 2 , RH, and T), even though the dependent variable (ET o of the ASCE-standardized method) was estimated directly by them. This result can be easily justified by the fact that E pan includes/combines all the effects of the rest variables, indicating that their inclusion may be redundant. Our opinion is that the general forms of Equations (13) and (14) or some of the alternative general equations provided in Table S1 can describe satisfactorily the relation between ET o and E pan when they are calibrated for local conditions. The mean values and 95% HPD intervals of the slopes and intercepts that were derived from the 1:1 plots of the observed vs. predicted ET o values of the validation datasets ( Figure 5) also showed that the ET o predictions of Equations (13) and (14) are better distributed along the perfect fit line of 45 degrees in comparison to the MRF model and the rest of the RCV-NLR models. This indicates that MRF may be the most powerful model to minimize the overall error (in terms of RMSE and R 2 ), but it is not so robust at larger and smaller ET o values, probably due to bias from the inclusion of the climate parameters of u 2 , RH, and T.

Future Studies for Different Fetch and Different Climate Types
The precalibrated models of Equations (3)-(9) were given in this study in order to provide the source of different NLR forms, while the general forms that occurred (Equations (9)-(14)) were free from F, because they had to be recalibrated for a specific environment with constant fetch conditions. On the other hand, in the case where recalibration should be performed using data that include different fetch conditions, then the same analysis should be performed with the inclusion of F in the general forms of Equations (9)- (14). Such an analysis would be interesting, allowing also to assess the robustness of F parameter through analysis on its associated regression coefficients. The result of such an analysis regarding the robustness of the F parameter is unknown, and future studies should also focus on this issue.
The analysis of this study is based only on two years during the period May-September due to the temporary installation of a class A pan in the context of a parallel evapotranspiration experiment with lysimeters cropped, with flooded rice during the same period [71,72]. We believe that the selected dataset fulfills the requirements for supporting the results and conclusions of this study, despite the fact that it covers a small period and despite the fact that it describes the conditions of one climate type (semi-arid environment). Our opinion is based on the high variability of the daily records of the parameters (T, u 2 , RH, E pan , and ET o ) that were used in this study, which are provided in the form of histograms ( Figure S1 in the Supplementary Materials). The large coverage and the distribution of these parameters is adequate for supporting robust conclusions about the effects of the T, u 2 , and RH parameters in pan method models for the specific climate type, but future studies are also required to prove these findings for other climate types.

Conclusions
This study presented an integrated methodology for evaluating the robustness of regression coefficients associated to climate parameters (RH, u 2 , and T) in pan method models during recalibration at local conditions. The results of this study showed that: (a) the typical formula of Equation (2), with the use of a predefined form of a k p model with locally calibrated coefficients, may not be the optimum solution for estimating ET o . In the case of using such models, the calibration should be performed using as a dependent variable the ET o and not the "measured" k p (defined as the ratio ET o /E pan ). (b) the inclusion of climate parameters (e.g., u 2 , RH, and T) in pan method models (NLR, MRF, etc.) for estimating ET o may lead to underfitting and can be considered questionably from a statistical/mathematical point of view when ET o is not measured but computed by a formula that is based on the same climate parameters. (c) locally calibrated nonlinear regression functions for estimating ET o , which use only the E pan parameter, can have high predictive accuracy without the inclusion of additional climate parameters and can provide more balanced predictions along the perfect fit line of 45 degrees in 1:1 plots of observed vs. predicted ET o .
The findings of the specific study should also be investigated for different fetch and different climate types in future studies.
Supplementary Materials: The following are available online at http://www.mdpi.com/2306-5338/7/3/62/s1: Table S1: 27 cases of linear models with transformed variables. The performance of these models is automatically tested by StatGraphics Centurion XV software. Table S2: Minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of regression coefficients of Equations (9)-(14) derived by the RCV-NLR analysis during calibration (for Equations (9)-(12), the regression was made based on Equation (2) using ET o as the dependent variable). Table S3: Minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of R 2 and RMSE during calibration and validation for the six models Equations (9)-(14) based on RCV-NLR. Table S4: Minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of R 2 and RMSE during OOB and validation procedures for the MRF modeling approach. Table S5: Minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of the variables' importance expressed as MSE% for the MRF modeling approach. Table S6: Minimum, mean, maximum, and 2.5% and 97.5% HPD thresholds of slopes and intercepts of the trend lines of observed vs. predicted ET o values from 1:1 plots estimated for all the validation cases from the RCV-NLR analysis of Equations (9)- (14) and the MRF analysis. Figure S1: Frequency histograms of the 212 records of T, u2, RH, E pan , and ET o used in this study.