Towards a Comprehensive Assessment of Statistical versus Soft Computing Models in Hydrology: Application to Monthly Pan Evaporation Prediction

: This paper evaluates six soft computational models along with three statistical data-driven models for the prediction of pan evaporation (EP). Accordingly, improved kriging—as a novel statistical model—is proposed for accurate predictions of EP for two meteorological stations in Turkey. In the standard kriging model, the input data nonlinearity effects are increased by using a nonlinear map and transferring input data from a polynomial to an exponential basic function. The accuracy, precision, and over/under prediction tendencies of the response surface method, kriging, improved kriging, multilayer perceptron neural network using the Levenberg–Marquardt (MLP-LM) as well as a conjugate gradient (MLP-CG), radial basis function neural network (RBFNN), multivariate adaptive regression spline (MARS), M5Tree and support vector regression (SVR) were compared. Overall, all the applied models were highly capable of predicting monthly EP in both stations with a mean absolute error ( MAE ) < 0.77 mm and a Willmott index ( d ) > 0.95. Considering periodicity as an input parameter, the MLP-LM provided better results than the other methods among the soft computing models ( MAE = 0.492 mm and d = 0.981). However, the improved kriging method surpassed all the other models based on the statistical measures ( MAE = 0.471 mm and d = 0.983). Finally, the outcomes of the Mann–Whitney test indicated that the applied soft computational models do not have signiﬁcant superiority over the statistical ones ( p -value > 0.65 at α = 0.01 and α = 0.05).


Introduction
One of the key elements of water resources management and hydrological projects is to estimate the evaporation in a given region. This is even more important in managing water resources in arid and semi-arid regions [1]. Some researchers have applied the Budyko framework, which is a straightforward model that considers only rainfall and potential evaporation as the required input for simulating and controlling various water management plans [2,3]. Simply, knowing the accurate amount of the evaporation is essential for water resources management projects.
Researchers have applied different approaches for modeling pan evaporation (EP) and evapotranspiration in the literature classified as (i) physically-based combination models that take into account mass and energy conservation principles; (ii) semi-physical models that use either mass or energy conservation; and (iii) data-driven models including soft computing and statistical techniques [4][5][6]. The shortage of EP data (temporally or spatially) is a major problem in some areas because it is difficult and expensive to install evaporation pans. In these cases, applying data-driven and soft computing models for estimating water evaporation is an effective and appropriate approach [7][8][9]. The accuracy of modeling approaches is the most important parameter to take into account.
Several researchers have used climatic variables to estimate EP values [10,11]. Climatebased approaches are appropriate methods when provided with specific climatic data, which cannot always be easily obtained from a determined area. Similarly, data-driven approaches, including computational intelligence and machine learning are also suitable for estimating the EP. Recently, regular hybrid and integrative data-driven models (e.g., artificial neural networks (ANN) as well as support vector machine (SVM) and adaptive neuro-fuzzy inference systems (ANFIS)) have been used for estimating the EP [7,[12][13][14][15][16][17][18][19][20].
Tabari et al. [21] estimated the daily EP of a region by using different methods (ANNs and multivariate nonlinear regression (MNLR)) and concluded that ANN was more accurate than MNLR. Kişi et al. [22] applied three soft computing models, such as M5tree, ANN, and chi-squared automatic interaction, to predict the daily EPs in Turkey. They reported that the ANN model performed better than the two others. Tezel and Buyukyildiz [23] investigated the usability of ANNs and ε-SVR to estimate monthly EP. According to the performance criteria, the ANN algorithms and ε-SVR had the same performance. Tezel and Buyukyildiz [23] compared the accuracy of SVM basis ε-support vector regression, radial basis function (RBFNN), and multilayer perceptron ANN (MLPNN), and showed that the latter provided the most accurate results. Keshtegar and Kisi [24] proposed the modified response surface method (RSM) and modified RSMs have been compared with ANFIS and M5Tree. Wang et al. [25] investigated the capabilities of ANFIS, M5Tree, and fuzzy genetic (FG) for six stations in the Yangtze River Basin. The results stated that the FG model generally produced better results. In another study, Wang et al. [26] compared the abilities of FG, SVR, MARS, M5Tree, and multiple linear regression (MLR). The overall results indicated that the soft computing models generally performed better than the regression methods. Ghorbani et al. [27] applied a hybrid MLPNN for daily EP prediction at two stations. The results showed that the MLPNN model provided better performance compared to the SVM model. Majhi et al. [28] applied a deep ANN model and compared it with the traditional MLPNN for three areas of the Chhattisgarh State in India. The findings of the study showed that the deep ANN model was more accurate than the traditional MLPNN. The abilities of ANN and extreme learning machine (ELM) models have been compared in predicting EP for two stations in Algeria by Sebbar et al. [29]. The results indicated that the ELM could be successfully used to estimate the daily EP [29]. Al-Mukhtar [30] investigated the applicability of quantile regression forest for EP prediction in arid areas. In comparison to conventional NNs and linear regression models, the applied quantile regression forest provided better results. Mohammadi et al. [31] predicted monthly EP using integrative ANFIS, MLP and RBNN models for two stations in Iran. The results showed that the integrative ANFIS model acted better compared to the MLP and RBNN. Yaseen et al. [32] applied several machine learning models, including ANN, classification and regression tree (CART), gene expression programming and SVM for predicting EP in arid and semi-arid areas. The findings of the study indicated that the SVM was superior to the other applied models.
A literature review related to the kriging approach revealed that it has never been used to predict PE. However, this method was applied for the prediction of solar radiation [33] and the daily total dissolved gas in aquatic systems concentration by Heddam et al. [34]. The kriging interpolation, which is a flexible regression tool for approximating any nonlinear problem, can be introduced as a potential method for providing accurate EP predictions.
Indeed, soft computing techniques have provided satisfactory results in EP prediction [35]. The majority of EP modeling reported the superiority of soft computing models Water 2021, 13, 2451 3 of 22 over statistical models [21,26,36]. The main objective of this study is to challenge the capability of soft computing techniques versus statistical data-driven models.
The present study investigates the accuracy of six soft computing methods, including M5tree, MARS, SVR, RBFNN, Levenberg-Marquardt perceptron ANN and conjugate gradient perceptron ANN and compares them with three different statistical approaches, RSM, kriging, and improved kriging. In improved kriging, the basic functions are transferred from polynomial to exponential functions to estimate monthly EP. In kriging models, the second-order polynomial function is applied as regressed function to predict nonlinear challenges. This function may not have predicted an accurate result for complex problems such as EP. Thus, novel improved modeling is parented to enhance the regressed function applied in original kriging. It uses a nonlinear transformation as an exponential map for input variables of the EP predictions as a complex engineering problem with nonlinear effect that the accurate results in the modeling process is a cap for prediction of the statistical regression approaches such as kriging and RSM models.
To our best knowledge, similar studies have not been carried out in applying the above-mentioned methods for the estimation of EP. The subsequent parts of the rest of the paper are organized as follows: In Section 2, the two stations are introduced, and the data sets are presented. The third section deals with nine modeling methods applied in statistical and soft computing approaches. The results of the predicted EP are presented and compared in the fourth section. The fifth section of the paper deals with the hypothesis testing and relevant discussion. Finally, the last section provides the concluding remarks of the present work.

Case Study and Dataset
The input parameters for this study are monthly climatic data such as solar radiation (SR, as Langley), sunshine hours (HS), relative humidity (RH), wind speed (WS as m/s) as well as the minimum (Tmin as • C) and maximum temperature (Tmax as • C). Two stations in the Eastern Mediterranean Region as Adana (latitude 37.22 • N, longitude 35.40 • E and altitude 20 m) and Antakya (latitude 36.33 • N, longitude 36.30 • E and altitude 100 m) were selected for the comparing modeling results. The map of the study area is illustrated in Figure 1. The studied area has a climate with cool and rainy winters, and moderately hot and dry summers and it receives yearly rainfall amounts of between 580 and 1300 mm. Data were gathered from the Turkish State Meteorological Service (TSMS) having a modernized calibration center. The calibration center was accredited by the Turkish Accreditation Agency to ensure the measurements' reliability and to provide the validity of the measurements' quality around the world. Temperature, RH, and WS calibration laboratories are accredited with TS EN ISO/IEC 17025 standards and work in accordance with this standard. Global radiation and wind direction data are also in accordance with the TS EN ISO/IEC 17025 standards. The evaporimeter used for obtaining pan evaporation in Turkey is the US Weather Bureau Class A pan. The raw datasets were directly utilized in the presented study without pre-processing. The available data period covers September 1981 to March 2016 for Adana and from October 1983 to December 2010 for Antakya. There is no gap in the data.
In Figure 2, the general characteristics of the independent variables and the target value for the (a) Adana and (b) Antakya stations are shown using the box-whisker plot and related correlation values. These plots graphically depict the variability of each parameter in terms of minimum, quartiles, and maximum values. Moreover, outliers are plotted as individual points.      The Pearson correlation coefficient was applied to analyze the effects of Tmax, RH, WS, SR, and HS on EP. It can be seen from Figure 2 that there were high correlations between EP, Tmin, Tmax, SR, and HS for both Stations. It is worth noting that the wind speed showed high correlation with EP for the Antakya Station (correlation = 0.804), whereas the reciprocal value of the WS correlation for the Adana Station is much lower (correlation = 0.245).
For both of the Stations, the correlation between the relative humidity (RH) and the EP was weaker than the other parameters. However, in the Antakya Station, the correlation value of RH is negative, which implies that the increase in relative humidity might lead to a decrease in EP. The main factors responsible for the EP in both Stations include sunshine hours (HS), Tmin, and SR. In the present study, data was split into two sets, 70% (for training) and 30% (for testing) for executing and assessing the applied models.
As for the sensitivity analysis, the best subset regression using adjusted R-Sq and Mallows CP was applied. The results indicated that all of the input variables have a significant impact on the EP variable; hence, all of the independent parameters are used as the input vector for constructing the models. Considering a descending order (from the most influential to the least influential independent variables on PE), the following results were achieved:

Methods
Nine different approaches in terms of two main categories of statistical (RSM, kriging and improved kriging) and machine learning models (SVR, MARS, M5Tree, MLP-LM, MLP-CG and RBNN) were implemented for estimating EP.

Artificial Neural Networks: MLP-LM, MLP-CG, RBFNN
The ANNs are adaptable learning structures constructed from different interconnected layers and a number of processing elements (called artificial neurons). So far, several types of ANNs were developed and implemented for simulating and predicting hydrological problems such as evaporation [37]. Among all the developed models, the multilayer perceptron (MLP) and radial basis function neural network (RBFNN) have been used in several applications, and their potential in capturing nonlinear features of complex phenomena were proven by the following relation [38,39].
where β 0 , β j , w j , w ij are respectively the biases and weights of the output and the M-hidden layer and NV represents the number of input variables. f is an activation function for hidden neurons in the MLP and RBFNN models. Sigmoid functions were considered for the MLP and radial basis function were applied for the RBFNN models. It should be noted that MLPs [38] and RBFNNs [40] can be considered as the fundamental versions of feed-forward networks with a supervised learning approach. In this study, two types of MLP networks have been developed using two different approaches for the learning algorithm: (1) the Levenberg-Marquardt algorithm, and (2) the conjugate gradient (CG) algorithm. In addition to the MLP-LM and MLP-CG neural networks, the efficiency of RBFNN was also challenged for the evaporation simulation [41].

Support Vector Regression (SVR)
The rapid application of SVMs in modeling various problems in engineering urges researchers to apply different types of SVMs to different research fields. The core analogy of constructing SVMs is to map variables from input space into high-dimensional feature space by using special functions as below [42,43]: Water 2021, 13, 2451 where β 0 is bias and K(x, x i ) is the Kernel function for transferring the input data from x-space to N-set feature space which is computed as below relation [44]: where, σ is the parameter of the Kernel function. α i and α * i represent the Lagrange multipliers as unknown coefficients in the SVR model. Recently, the application of the SVR model in hydrological time series modeling has provided promising outcomes [45]. Several researchers have already claimed that SVR is efficient in modeling evaporation processes [23,46].

Multivariate Regression Spline (MARS)
Proposed by Friedman in 1991 [47], multivariate adaptive regression spline is a procedure for fitting adaptive nonlinear functions using a piecewise nonparametric regression method. Unlike the black box models (e.g., ANNs), MARS models are deterministic, which means that in the final regression form the input variables are identified and the interactions between them are specified. Therefore, the MARS models are much easier to be interpreted than the other techniques [48][49][50]. Considering X as the only independent variable and Y as the dependent variable (target value), it can be seen in Figure 3 that the space of X variable is divided into three sub-regions with three different equations. These equations relate the independent variable space to the target of the system.
where β0 is bias and ( , ) is the Kernel function for transferring the input data from xspace to N-set feature space which is computed as below relation [44]: where, is the parameter of the Kernel function. and represent the Lagrange multipliers as unknown coefficients in the SVR model. Recently, the application of the SVR model in hydrological time series modeling has provided promising outcomes [45]. Several researchers have already claimed that SVR is efficient in modeling evaporation processes [23,46].

Multivariate Regression Spline (MARS)
Proposed by Friedman in 1991 [47], multivariate adaptive regression spline is a procedure for fitting adaptive nonlinear functions using a piecewise nonparametric regression method. Unlike the black box models (e.g., ANNs), MARS models are deterministic, which means that in the final regression form the input variables are identified and the interactions between them are specified. Therefore, the MARS models are much easier to be interpreted than the other techniques [48][49][50]. Considering X as the only independent variable and Y as the dependent variable (target value), it can be seen in Figure 3 that the space of X variable is divided into three sub-regions with three different equations. These equations relate the independent variable space to the target of the system. The endpoints of the segments of each sub-region are called knots ( Figure 3). The resulting piecewise regression lines (basis functions, BFs) make the final regression form flexible and appropriate for capturing trends from linear functions as bellow [51]: where βi = 0, 1, …, m are unknown coefficients and m is the number of basis functions (BF) which is determined using a piecewise linear function as follows [33]: where C represents the knot which is a constant coefficient. By considering more independent variables, more equations will be added to the final regression form. In order to The endpoints of the segments of each sub-region are called knots ( Figure 3). The resulting piecewise regression lines (basis functions, BFs) make the final regression form flexible and appropriate for capturing trends from linear functions as bellow [51]: where β i = 0, 1, . . . , m are unknown coefficients and m is the number of basis functions (BF) which is determined using a piecewise linear function as follows [33]: where C represents the knot which is a constant coefficient. By considering more independent variables, more equations will be added to the final regression form. In order to determine the location of knots, an adaptive regression algorithm is used. In addition, BFs are generated by a stepwise searching process. In brief, the main procedure of the MARS method is categorized into two parts of the forward and backward phases. The location of potential knots and BFs equations are specified in the forward phase. To modify and improve the modeling accuracy, unnecessary and the least effective variables are removed in the backward phase [48]. Further details for the mathematical procedure of the MARS method can be obtained from Friedman [47].

M5 Model Tree
Quinlan (1992) introduced a piecewise linear regression model, called the M5 model tree (M5Tree) [52,53], which has a tree structure based on binary decisions. The linear regression functions, which develop interconnection between the input and output vectors, can be extracted at the terminals (leaves) nodes.
Constructing an M5tree model requires two distinct phases; first the initial tree is generated and is then pruned. In the first phase, data sets are split into several subsets, which create a decision tree. In other words, the M5 model tree splits the data set space into subsets (sub-spaces) and generates a linear regression model [54,55]. As can be observed in Figure 4, the two-dimensional dynamic space of the input vector (X1 & X2) is split schematically into five subsets. determine the location of knots, an adaptive regression algorithm is used. In addition, BFs are generated by a stepwise searching process. In brief, the main procedure of the MARS method is categorized into two parts of the forward and backward phases. The location of potential knots and BFs equations are specified in the forward phase. To modify and improve the modeling accuracy, unnecessary and the least effective variables are removed in the backward phase [48]. Further details for the mathematical procedure of the MARS method can be obtained from Friedman [47].

M5 Model Tree
Quinlan (1992) introduced a piecewise linear regression model, called the M5 model tree (M5Tree) [52,53], which has a tree structure based on binary decisions. The linear regression functions, which develop interconnection between the input and output vectors, can be extracted at the terminals (leaves) nodes.
Constructing an M5tree model requires two distinct phases; first the initial tree is generated and is then pruned. In the first phase, data sets are split into several subsets, which create a decision tree. In other words, the M5 model tree splits the data set space into subsets (sub-spaces) and generates a linear regression model [54,55]. As can be observed in Figure 4, the two-dimensional dynamic space of the input vector (X1 & X2) is split schematically into five subsets. The splitting criterion is determined by assuming the standard deviation (sd) of the class values that reach a node. Based on the sd, the standard deviation reduction (SDR) can be calculated as the following relation [13,56]: where T stands for the set of examples that reaches the node, and Ti is the subset of examples with the ith outcome of the potential set. After the first phase (viz. constructing the initial tree), a huge tree-like structure might be generated, which may cause poor generalization. To cope with this problem, in the second phase, the overgrown tree is pruned.

Response Surface Methodology
The response surface methodology (RSM) presents the advantage of multiple regression analysis via a statistical technique to simulate a response space based on quantitative data obtained from the extracted multivariate equations as presented below [57]: where NV denotes the number of input variables. β0, βi and βij are unknown coefficients for polynomial terms. During the mathematical process, RSM explores the influence of The splitting criterion is determined by assuming the standard deviation (sd) of the class values that reach a node. Based on the sd, the standard deviation reduction (SDR) can be calculated as the following relation [13,56]: where T stands for the set of examples that reaches the node, and T i is the subset of examples with the ith outcome of the potential set. After the first phase (viz. constructing the initial tree), a huge tree-like structure might be generated, which may cause poor generalization. To cope with this problem, in the second phase, the overgrown tree is pruned.

Response Surface Methodology
The response surface methodology (RSM) presents the advantage of multiple regression analysis via a statistical technique to simulate a response space based on quantitative data obtained from the extracted multivariate equations as presented below [57]: where NV denotes the number of input variables. β 0 , β i and β ij are unknown coefficients for polynomial terms. During the mathematical process, RSM explores the influence of multiple independent variables on the response parameter and optimizes the trending procedure by tuning the number of required experiments [58][59][60][61].

Kriging Interpolation Approach
The kriging basis nonlinear model is a well-known interpolation approach to approximate geological problems [62]. It is defined by using stochastic terms according to the following relation [63,64] whereβ = β 1 ,β 2 , . . . ,β m T are regression coefficients for n-support points with m basic functions. The unknown coefficients are computed as follows [65]: whereŶ(X) is the predicted value. R represents the correlation matrix which is given as: in which r X i , X j is the cross-correlation function computed by the following relation: where, r ij is the distance between points, X i and X j and θ > 0 are unknown correlation parameters, which are determined as presented below [66][67][68][69]: where n represents the number of training points, andσ 2 denotes the variance of the model obtained as:σ In the kriging model, the basis function f can be defined as below: where the vector [ f 1 (X 1 ), f 2 (X 1 ), . . . , f m (X 1 )] includes the basic functions that are evaluated at the data input point of X 1 , and m is the number of the basic functions. The basis function f can be used as polynomial and exponential functions for original kriging and improved kriging, as presented in this study.
In the kriging models, the basic functions are considered as follows: where Mon represents the periodicity (month of the year), Ws is wind speed (m/s), T max and T min are respectively the maximum and minimum temperature ( • C), RH is the relative humidity (%), SR is the solar radiation (langley), and Hs represents the hours of sunshine (h). The surrogate model that uses an adaptive kriging framework can be used for (i) reducing the computational burden and increase the accuracy results of the optimization problem [50,70], (ii) structural reliability analysis [65,68], and (iii) reliability-based design optimization [67,71].

Improved Kriging
In the fitness process of the kriging model, the basic function term, i.e.,β f is an important factor for providing a flexible prediction. The stochastic term, i.e., r T (X)R −1 (Y −β f ), may produce a smaller covariance for approximating data with accurate basic function. Thus, the nonlinear form of the basic function can improve the accuracy of the EP predictions. A schematic comparative view of the exponential and linear polynomial functions is presented in Figure 5 to illustrate the fitness of the exponential basic function. We used the exponential basic function for the regression process instead of the linear basic function, in order to enhance the ability of the standard kriging model.

Improved Kriging
In the fitness process of the kriging model, the basic function term, i.e., ^ is an important factor for providing a flexible prediction. The stochastic term, i.e., ( ) − ^ , may produce a smaller covariance for approximating data with accurate basic function. Thus, the nonlinear form of the basic function can improve the accuracy of the EP predictions. A schematic comparative view of the exponential and linear polynomial functions is presented in Figure 5 to illustrate the fitness of the exponential basic function. We used the exponential basic function for the regression process instead of the linear basic function, in order to enhance the ability of the standard kriging model. In improved kriging, the linear and exponential functions are used by the following basic function: where are the input variables and exp denotes the exponential operator. The prediction accuracy of the improved kriging model for the estimation of the EP is tested based on an untried data set using ( ) in Equation (11) and predicted relation in Equation (8).

Methodology and Models Evaluation
The modeling process focuses on the monthly predictions of the EP based on two different scenarios, as presented below:


Scenario I (without periodicity): In the first scenario (#I), the monthly averaged of six meteorological parameters including wind speed (WS, m/s), relative humidity (RH, %), solar radiation (SR, Langley), sunshine hours (HS, h), minimum (Tmin, °C), and maximum temperatures (Tmax, °C), are considered as the input vectors of the applied models.


Scenario II, (with periodicity): In the second strategy, all of the mentioned independent meteorological parameters along with the time factor formed the input vector.
Due to the fact that in the second scenario, the order of the data is important for modeling, the time series cross-validation technique was applied. Thus, in both of the scenarios, 70% of the data was used for training the models, and the other 30% was used for the testing set. In the current work, the root mean square error (RMSE) was used as a measure of accuracy, while the mean bias error (MBE) was used as a measure of tendency. The absolute residual of the standard deviation between the actual EP values and the modeled ones (RSTD) was used as a measure of precision as presented below [60,72,73]: In improved kriging, the linear and exponential functions are used by the following basic function: f (X k ) = {1X k exp(X k )} (16) where X k are the input variables and exp denotes the exponential operator. The prediction accuracy of the improved kriging model for the estimation of the EP is tested based on an untried data set using r(X) in Equation (11) and predicted relation in Equation (8).

Methodology and Models Evaluation
The modeling process focuses on the monthly predictions of the EP based on two different scenarios, as presented below: • Scenario I (without periodicity): In the first scenario (#I), the monthly averaged of six meteorological parameters including wind speed (WS, m/s), relative humidity (RH, %), solar radiation (SR, Langley), sunshine hours (HS, h), minimum (Tmin, • C), and maximum temperatures (Tmax, • C), are considered as the input vectors of the applied models.

•
Scenario II, (with periodicity): In the second strategy, all of the mentioned independent meteorological parameters along with the time factor formed the input vector.
Due to the fact that in the second scenario, the order of the data is important for modeling, the time series cross-validation technique was applied. Thus, in both of the scenarios, 70% of the data was used for training the models, and the other 30% was used for the testing set. In the current work, the root mean square error (RMSE) was used as a measure of accuracy, while the mean bias error (MBE) was used as a measure of tendency.
The absolute residual of the standard deviation between the actual EP values and the modeled ones (RSTD) was used as a measure of precision as presented below [60,72,73]: where N is the number of data, EPm i represents the modeled EP for the ith data and EPo i stands for the observed EP values for the ith data. In addition to the above-mentioned measures, other statistics and criteria such as the mean absolute error (MAE), mean absolute percentage error (MAPE), Willmott index (d), total pan evaporation (Tot-EP), maximum value of the relative error between the calculated and observed EP (MAX (RE)) were also used for the evaluation of the applied methods [58].
where EPmean is the mean of monthly EP. In this study, the Wilcoxon nonparametric statistical hypothesis test is also implemented to evaluate the performance of statistical versus soft computing models at the 95% confidence level. The maximum relative absolute error ((Max (RE)) was computed as max (RE i ) and RE i = (|Ŷ i − Y i )|/Y i , whereŶ i an Y i indicate the estimated and observed pan evaporation.

Evaluation of the Applied Models
Tables 1 and 2 report the comparison statistics of the applied data-driven models for the Adana station for the first and second scenarios. For the first scenario (without periodicity), the improved kriging model has the lowest MAE (0.659 mm), MAPE (0.189), RMSE (0.843 mm) and the highest d (0.964), followed by the SVR model. Based on the MAE, d, and RMSE values, the ANN-CG and RSM models provided the weakest results. However, the M5Tree model gave the worst Max (RE) value (135.32). The mean and total pan evaporations were also better approximated by the improved kriging compared to the other models. In the second scenario, the improved kriging model presented a better value of MAE than the SVR model (improved kriging = 0.646 mm vs. SVR = 0.648 mm), but worse values for RMSE (improved kriging = 0.821 mm vs. SVR = 0.796 mm) and could not be seen as the being better than the SVR model.    In general, all of the applied statistical and soft computing models approximated the EP values satisfactory (with d > 0.95 and RMSE < 1 mm). In Table 3, the best predictive model is presented as the improved kriging model based on attaining two of the highest position of the three elements of accuracy, precision, and tendency. Table 3. The general performance of the applied models in terms of accuracy, precision, and tendency for Adana Station in the testing period.   The comparison statistics of the applied models are given in Tables 4 and 5 for the Antakya Station. In the first scenario, the improved kriging model outperformed the other statistical models considering all of the given measures in Table 4. Nonetheless, in comparison to the SVR-as the best soft computing model-the improved kriging model provided the best performance for the MBE (−0.001) value, it failed in sustaining its superiority over the SVR for the MAE (improved kriging = 0.489 mm vs. SVR = 0.463 mm) and RMSE (improved kriging = 0.626 mm vs. SVR = 0.613 mm) criteria.    For the second scenario (Table 5), the improved kriging surpassed all the other applied statistical and soft computing models considering MAE, MAPE, RMSE, and d. In this scenario, the MLP-LM was the best soft computing model in predicting the EP values based on the MAE, MBE, d, and Max (RE) values. Table 6 presents the best predictive models in terms of three perspectives of accuracy, precision, and tendency. As expected, the improved kriging performed better than the other models in the first scenario (without periodicity), while the MLP-LM and MLP-CG were also among the best models for the second scenario (with periodicity).    Figure 7, it is clear that the improved kriging, MARS, and MLP-CG models have similar graphs and they have less scattered predations than the other two models for the two modeling scenarios. It can also be seen that the M5Tree has the most scattered predicted values.
The ratio of the Willmott index of agreement (d) to the MAE can be used as a measure to compare the accuracy of different models. This statistic (d/MAE) varies from 0 to ∞. The larger value of the d/MAE denotes the better calibration of the applied model . The calculated d/MAE ratios of the applied models are illustrated in Figure 8 for both stations. In general, it is apparent that the improved kriging has higher accuracy than the other models. It can also be observed that better results were given by the improved kriging model considering the periodicity (scenario II). Figure 8 shows that the SVR is the second-best accurate model in predicting EP values, which is similar to the results in Tables 3 and 6 (marked with "H"). Despite being the most accurate soft computing model, the SVR did not act well on the precision and tendency of the predicted values, and as a result, it was not specified as the best model in Tables 3 and 6  for both stations. In general, it is apparent that the improved kriging has higher accuracy than the other models. It can also be observed that better results were given by the improved kriging model considering the periodicity (scenario II). Figure 8 shows that the SVR is the second-best accurate model in predicting EP values, which is similar to the results in Tables 3 and 6 (marked with "H"). Despite being the most accurate soft computing model, the SVR did not act well on the precision and tendency of the predicted values, and as a result, it was not specified as the best model in Tables 3 and 6.

Hypothesis Testing
The results of the significance test of the predicted values of statistical techniques and soft computing models using the Mann-Whitney test are presented in Tables 7 and 8 for the Adana and Antakya Stations, respectively. In the Mann-Whitney test, the null and alternative hypotheses are as follows (η is the median): The null hypothesis, H0: η1 − η2 = 0. The alternative hypothesis, H1: η1 − η2 ≠ 0.
The results of Tables 7 and 8 clearly reveal that there is no significant difference between the performance of the statistical models (RSM, kriging, and improved kriging) and soft computing models (M5Tree, RBNN, MLP-LM, MLP-CG, and MARS) at 95% and 99% confidence levels, as they have p-values greater than 0.05 and 0.01. In other words, the Mann-Whitney nonparametric test implies that the null hypothesis was not rejected, and none of the applied statistical-based predictive models surpasses the other soft computing models at the 0.05 and 0.01 levels of significance.

Hypothesis Testing
The results of the significance test of the predicted values of statistical techniques and soft computing models using the Mann-Whitney test are presented in Tables 7 and 8 for the Adana and Antakya Stations, respectively. In the Mann-Whitney test, the null and alternative hypotheses are as follows (η is the median): The null hypothesis, H0: η1 − η2 = 0. The alternative hypothesis, H1: η1 − η2 = 0. The results of Tables 7 and 8 clearly reveal that there is no significant difference between the performance of the statistical models (RSM, kriging, and improved kriging) and soft computing models (M5Tree, RBNN, MLP-LM, MLP-CG, and MARS) at 95% and 99% confidence levels, as they have p-values greater than 0.05 and 0.01. In other words, the Mann-Whitney nonparametric test implies that the null hypothesis was not rejected, and none of the applied statistical-based predictive models surpasses the other soft computing models at the 0.05 and 0.01 levels of significance.

Discussion
This paper aimed to challenge the performance of different statistical and soft computing models based on (i) mathematical (accuracy, precision, and tendency), and (ii) statistical (at the 0.01 and 0.05 levels of significance) perspectives. In accordance with the mathematical comparisons (Tables 1, 2, 4 and 5, and Figure 8), it was concluded that the improved kriging model performed better than the other applied models, which means that an improved statistical model might even be able to surpass soft computing models. Figure 9 illustrates the Taylor diagrams for (a) Adana and (b) Antakya Stations. As shown by these figures, the kriging models provide a better prediction for agreement than the RSM but worse than the soft computing models (viz. SVR, MARS, and RBFNN). The SVR provides a superior correlation with the observed data compared to the other soft computing models. As can be seen in Figure 9, the improved kriging enhanced the predictions of the standard kriging model.  . Taylor diagram for different models in the testing phase of (a) Adana station (b) Antakya station.

Conclusions
The soft computing models and statistical techniques are useful frameworks for making predictions of complex climatological indices, such as the hydrological pan evaporation (EP). The improved kriging method was presented as a statistical technique for the accurate prediction of the EP. The RSM, kriging, and improved kriging models were compared with soft computing models, such as the SVR, M5tree, MARS, RBNN, MLP-LM, and MLP-CG. Two different input scenarios, namely with and without periodicity, were applied for the modeling process in the Antakya and the Adana stations located in Turkey. The abilities of statistical models versus soft computing schemes were compared with There is no doubt that in most cases, soft computing models perform better than traditional statistical models. Similarly, the standard kriging and RSM models failed to surpass the soft computing models due to linear cross-correlation regressed function based on the statistical measures. This assessment is based on pertinent studies in the literature. For instance, in a comparative study between the capability of machine learning versus ANN, and statistical technique versus MLR, for EP prediction, it was found that the model efficiency and correlation coefficient of the ANN was higher than the MLR model for the calibration and validation phases [20]. The same result has been noted for the superiority of ANFIS model over the MLR statistical model [74]. However, it is worth noting that the majority of recent published studies have solely focused on the evaluation of several machine learning models [11,32]. Investigating the outcomes of these studies indicates that the machine learning models perform well in predicting evaporation at different climatical regions.
In this study, in addition to the mathematical evaluation of the potential performance of statistical and soft computing models, the results of the Mann-Whitney hypothesis test were also taken into account. The outcomes of the Mann-Whitney hypothesis test showed that none of the soft computing applied models has significant superiority over the statistical ones. In other words, despite their ability to model nonlinear phenomena, soft computing models should not be taken into granted as the preliminary predictive models. The improved versions of the RSM or kriging-based statistical techniques can improve the accuracy of the prediction of nonlinear problems. Thus, the improved statistical kriging technique uses the exponential transformation of input variables and can also be applied for other engineering problems with nonlinear complex relations. Furthermore, the competency of this method can be apprised by comparing it with machine learning models for complex problems with highly nonlinear relations.

Conclusions
The soft computing models and statistical techniques are useful frameworks for making predictions of complex climatological indices, such as the hydrological pan evaporation (EP). The improved kriging method was presented as a statistical technique for the accurate prediction of the EP. The RSM, kriging, and improved kriging models were compared with soft computing models, such as the SVR, M5tree, MARS, RBNN, MLP-LM, and MLP-CG. Two different input scenarios, namely with and without periodicity, were applied for the modeling process in the Antakya and the Adana stations located in Turkey. The abilities of statistical models versus soft computing schemes were compared with several statistical measures. The key findings of the study are summarized below: • Soft computing using machine learning models such as the SVR, MARS, MLP-ML, and RBNN provided more accurate predictions than the M5Tree and RSM. • The kriging model, as well as the SVR, RBFNN and MLP-ML, provided better performances compared to the RSM and M5Tree. • It was found that the developed improved kriging model performed better than the other applied models, including the soft computing (SVR, RBNN, MLP-ML, and MARS) and standard statistical (kriging and RSM) models. • By comparing the performances of the improved kriging method with six other applied models, it can be concluded that the proposed kriging framework can be successfully applied for this current hydrological challenge while its performances for other hydrological stations and other complex, sophisticated problems should be discussed in future.