The Superiority of Data-Driven Techniques for Estimation of Daily Pan Evaporation

In the present study, estimation of pan evaporation (E pan) was evaluated based on different input parameters: maximum and minimum temperatures, relative humidity, wind speed, and bright sunshine hours. The techniques used for estimating E pan were the artificial neural network (ANN), wavelet-based ANN (WANN), radial function-based support vector machine (SVM-RF), linear function-based SVM (SVM-LF), and multi-linear regression (MLR) models. The proposed models were trained and tested in three different scenarios (Scenario 1, Scenario 2, and Scenario 3) utilizing different percentages of data points: 60%:40% in Scenario 1, 70%:30% in Scenario 2, and 80%:20% in Scenario 3 for the training and testing datasets, respectively. Statistical tools such as Pearson's correlation coefficient (PCC), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and Willmott Index (WI) were used to evaluate the performance of the models. Graphical representations, such as line diagrams, scatter plots, and the Taylor diagram, were also used to evaluate the proposed models' performance. The results showed that the SVM-RF model's performance is superior to the other proposed models in all three scenarios. The most accurate values of PCC, RMSE, NSE, and WI were found to be 0.607, 1.349, 0.183, and 0.749, respectively, for the SVM-RF model during Scenario 1 (60%:40% training:testing). This showed that, with an increase in the sample set for training, the testing data show a less accurate modeled result. Thus, the evolved models produce comparatively better outcomes and foster decision-making for water managers and planners.


Introduction
Estimating pan evaporation (PE) is essential for monitoring, surveying, and managing water resources. In many arid and semi-arid regions, water resources are scarce and seriously endangered by overexploitation. Therefore, the precise estimation of evaporation becomes imperative for planning, managing, and scheduling irrigation practices. Evaporation occurs when there is a vapor pressure differential between the water surface and the overlying air.
A modern universal learning machine proposed by Vapnik (1995) [33] is the support vector machine (SVM), which is applied to both regression [30,34] and pattern recognition. An SVM uses a kernel mapping device to map the input-space data to a high-dimensional feature space where the problem is linearly separable. An SVM's decision function depends on the number of support vectors (SVs), their weights, and the kernel chosen a priori [1,21]. Several kinds of kernels, such as Gaussian and polynomial kernels, may be used [10]. Moreover, artificial neural network (ANN), wavelet-based artificial neural network (WANN), and support vector machine (SVM) models were applied with different combinations of input variables by [23]. Their results showed that an ANN containing three variables (air temperatures and solar radiation), with a root mean square error (RMSE) of 0.701, mean absolute error (MAE) of 0.525, correlation coefficient (R) of 0.990, and Nash-Sutcliffe efficiency (NSE) of 0.977, performed better than the WANN and SVR models.
In principle, wavelet decomposition is an efficient approximation instrument [18]; that is to say, arbitrary functions can be approximated by a set of wavelet bases. To approximate E pan, this study used ANN, WANN, radial function-based support vector machine (SVM-RF), linear function-based support vector machine (SVM-LF), and multi-linear regression (MLR) models of climatic variables.
There have been many studies on the estimation of E pan from weather variables using data-driven methods. However, the estimation of E pan based on lag-time weather variables, which can be obtained easily, is not standard. After testing different acceptable combinations of input variables, the same inputs were used in the artificial intelligence procedures. The main objectives of the proposed study are to (1) model E pan using the ANN, WANN, SVM-RF, SVM-LF, and MLR models under different scenarios and (2) select the best-developed model and scenario for E pan estimation based on statistical metrics. The paper is organized as follows: Section 2 contains the study's materials and methods; Section 3 gives the statistical indexes and methodological properties; Section 4 discusses the models' applicability to evaporation prediction and the results; and Section 5 concludes.

Study Area and Data Collection
Pusa is located in the Samastipur district of Bihar state, at latitude 25°46′ N and longitude 86°10′ E. The location map of the study area is shown in Figure 1. Pusa lies 53 m above mean sea level in a hot sub-humid agro-ecological region in the middle of the Gangetic plain. The study area is located near the Burhi Gandak river, a tributary of the Ganges river. The study area is famous for Dr. Rajendra Prasad Central Agricultural University, a backbone of the study area's development. The average rainfall for Pusa is 1270 mm, of which 80% falls during the monsoon season. The study area is fully covered by the southwest monsoon, which starts in June and eases off in September. The maximum temperature varies from 32 to 38 °C during May and June. The minimum temperature varies from 6 to 9 °C during December and January. The main crops grown in the study area are wheat, maize, paddy, green gram, lentil, potato, and brinjal.
Meteorological data for the study area were gathered from the official Dr. RPCAU website (https://www.rpcau.ac.in, accessed on 13 April 2021), Pusa, Bihar. These included maximum and minimum temperatures (T max and T min, °C), relative humidity at 7 a.m. (RH-1, percent) and at 2 p.m. (RH-2, percent), wind speed (WS, km/h), bright sunshine hours (SSH, h), and daily pan evaporation (E Pan, mm). For modeling pan evaporation, daily data for 1 June to 30 September over five years, a total of 610 data points, were used as input, and the same period was used for the output [35]. Figure 2 displays the climate parameters in a box-and-whisker plot for June 2013 to September 2017 (i.e., a five-year duration), indicating the minimum, first quartile, median, third quartile, and maximum values. The box-and-whisker plot shows that the relative humidity, measured at 7 a.m. and 2 p.m., demonstrates the highest variability among the meteorological parameters. Table 1 presents the statistical analysis of T max, T min, RH-1, RH-2, WS, SSH, and E Pan, including the mean, median, minimum, maximum, standard deviation (Std. Dev.), kurtosis, and skewness values from 2013 to 2017. The data are moderately to highly skewed, which has a considerable negative effect on model performance.
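As an illustration of how the summary statistics reported in Table 1 can be computed, the sketch below derives the same quantities with NumPy; the sample values are hypothetical, not the Pusa data.

```python
import numpy as np

def describe(x):
    """Summary statistics of the kind reported in Table 1: mean, median,
    standard deviation, skewness, and kurtosis (3 for a normal distribution)."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std()                       # population standard deviation
    z = (x - m) / s                   # standardized values
    return {
        "mean": m,
        "median": float(np.median(x)),
        "min": float(x.min()),
        "max": float(x.max()),
        "std": float(x.std(ddof=1)),  # sample standard deviation
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4)),
    }

# Hypothetical daily pan-evaporation values (mm), for illustration only
e_pan = np.array([3.2, 4.1, 2.8, 5.0, 3.6, 4.4, 2.9, 3.8])
stats = describe(e_pan)
```

The kurtosis here is the non-excess (Pearson) form, so the value 3 separates platykurtic from leptokurtic behavior, consistent with the threshold discussed in the text.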
The standard deviations show that values farther from zero indicate higher variability in the data; hence, the variation of the data from the mean value is larger. The kurtosis values depict the platykurtic and leptokurtic nature of the climatic parameters, with kurtosis values less than or greater than 3, respectively. Table 2 depicts the inter-correlation between climatic variables at the given station. It can be observed that all climate parameters have a significant association with E Pan at a significance level of 5%.

Artificial Neural Network (ANN)
The ANN methodology is a tool used to replicate the problem-solving mechanism of the human brain. ANNs are incredibly robust at modeling and simulating linear and non-linear systems. Feed-forward back-propagation ANNs were used in the present study because of their comparatively low complexity [36,37]. An ANN consists of an input layer, an output layer, and hidden layers between them. Each node within a layer is connected to all the nodes of the following layer, and only to those nodes [29]. Each neuron receives, processes, and sends signals to form functional relationships between future and past events. The layers are attached with interconnected weights W ij and W jk between the layers of neurons. The typical structure using the input variables is shown in Figure 3.

Statistical Analysis
For this analysis, only one hidden layer network was used since it was considered dynamic enough to forecast meteorological variables. There are some transfer functions required to create an artificial neural network neuron. Transfer functions are needed to establish the input-output relationship for each neuron layer. In this analysis, Levenberg-Marquardt was used to train the model. A hyperbolic tangent sigmoid transfer function was used to measure a layer's output from its net input. The neural network learns by changing the connection weights between the neurons. By using a suitable learning algorithm, the connection weights are altered using the training data set. The number of hidden layers is typically determined by trial and error. A comprehensive ANN overview is available [25,38,39].
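The configuration described above can be sketched numerically. The following is a minimal one-hidden-layer network with the hyperbolic tangent (tansig) transfer function, trained here by plain gradient descent rather than Levenberg-Marquardt, on synthetic data; all names and values are illustrative, not the study's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy standardized inputs (six climate-like variables) and a target;
# the real inputs would be Tmax, Tmin, RH-1, RH-2, WS, and SSH.
X = rng.standard_normal((200, 6))
y = np.tanh(X @ rng.standard_normal(6)) + 0.1 * rng.standard_normal(200)

n_hidden = 8
W1 = rng.standard_normal((6, n_hidden)) * 0.3   # input-to-hidden weights (Wij)
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal(n_hidden) * 0.3        # hidden-to-output weights (Wjk)
b2 = 0.0

lr = 0.05
for epoch in range(500):
    h = np.tanh(X @ W1 + b1)        # hidden layer, tansig transfer function
    pred = h @ W2 + b2              # linear output layer
    err = pred - y
    # Back-propagation of the mean-squared-error gradient
    gW2 = h.T @ err / len(y)
    gb2 = err.mean()
    dh = np.outer(err, W2) * (1 - h ** 2)
    gW1 = X.T @ dh / len(y)
    gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2)
```

In practice the number of hidden neurons is chosen by trial and error, exactly as described above for the ANN-1 to ANN-3 structures.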
Atmosphere 2021, 12, 701

Wavelet Artificial Neural Network (WANN)
Wavelet analysis (WA) offers a time-dependent spectral analysis that explains processes and their relationships in time-frequency space by breaking down time series [40]. WA is an effective method of time-frequency processing, with more benefits than Fourier analysis [41]; it improves on the Fourier transformation for detecting time functionality in data [40]. Wavelet transformation analysis, which breaks down a time series into essential functions at different frequencies, improves the potential of a predictive model by gathering sufficient information from different resolution levels [25]. There is excellent literature on wavelet transform theory [42,43]; we will not go into it in depth here. It is vital to choose the base function (called the mother wavelet) carefully. The essential functions are generated by translation and dilation [44]. In general, the discrete wavelet transformation (DWT) has been preferred for data decomposition over the continuous wavelet transformation (CWT), because the CWT is time-consuming [3,18].
The present study used the DWT method for daily E Pan (mm) estimation. The DWT decomposes the original input time series data of T max, T min, RH-1, RH-2, WS, and SSH into different frequencies (Figure 4), adapted from Rajaee [44].
This analysis used three stages of the Haar à trous decomposition algorithm, using Equations (1) and (2):

C_r(t) = Σ_l h(l) C_(r−1)(t + 2^(r−1) l)    (1)
W_r(t) = C_(r−1)(t) − C_r(t)    (2)

where h(l) is the discrete low-pass filter, and C_r(t) and W_r(t) (r = 1, 2, 3, ..., n) are the scale coefficient and wavelet coefficient at resolution level r. Two sets of filters, low-pass and high-pass, are employed by the DWT to decompose the main time series. The Haar wavelet is discontinuous and resembles a step function, which makes it well suited to time series with abrupt transitions. The measured time series, H, was decomposed into multi-frequency time series comprising details (HD1, HD2, ..., HDn) and an approximation (Ha) by the optimum DWT (Qasem et al., 2019). The obtained decomposed frequency values serve as the ANN inputs. Hybridizing the decomposed input time series data of T max, T min, RH-1, RH-2, WS, and SSH with an ANN results in a wavelet artificial neural network (WANN) [42]. Three levels of the Haar à trous decomposition algorithm were used in this study. The Levenberg-Marquardt algorithm was used for the model's training, and the hyperbolic tangent sigmoid transfer function was used to compute a layer's output from its net input.
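Under the standard form of the à trous scheme (an assumption, since the paper does not reproduce Equations (1) and (2) in full), a three-level Haar decomposition can be sketched as follows; the signal is synthetic and boundaries are handled circularly for simplicity.

```python
import numpy as np

def haar_a_trous(x, levels=3):
    """Haar a trous decomposition: at each level r, the scale coefficients
    are smoothed with the low-pass filter h = [1/2, 1/2] applied with a
    2^r gap ("holes"), and the wavelet coefficients are the difference
    between successive scales."""
    c = np.asarray(x, dtype=float)
    details = []
    for r in range(levels):
        c_next = 0.5 * (c + np.roll(c, -(2 ** r)))  # C_{r+1}(t), Eq. (1)
        details.append(c - c_next)                  # W_{r+1}(t), Eq. (2)
        c = c_next
    return details, c  # details (HD1..HDn) and approximation (Ha)

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 4 * np.pi, 64)) + 0.2 * rng.standard_normal(64)
details, approx = haar_a_trous(signal, levels=3)
recon = approx + sum(details)  # the decomposition is additive
```

Because the wavelet coefficients telescope, summing the details and the approximation reconstructs the original series exactly, which is what allows the sub-series to be fed to the ANN without information loss.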

Support Vector Machine
The support vector machine (SVM) was developed by [33] for classification and regression procedures. The fundamental concept of an SVM is to add a kernel function, map the input data by non-linear mapping into a high-dimensional feature space, and then perform a linear regression in that feature space [45]. The SVM is a modern classifier based on two principles (Figure 5, adapted from Lin et al. [46]). First, transforming the data into a high-dimensional space can render complicated problems easier, utilizing linear discriminant functions. Second, the SVM is inspired by the training principle of using only the specific inputs nearest to the decision region, since they carry the most information for classification [47].
We assume a non-linear function f(x) is given by:

f(x) = w Φ(x) + b    (3)

where w is the weight vector, b is the bias, and Φ(x_i) is the high-dimensional feature space linearly mapped from the input space x. Equation (3) can be transformed into higher dimensions, giving the final expression:

f(x) = Σ_(i=1)^n (α_i^+ − α_i^−) K(x_i, x_j) + b    (4)


where α_i^+ and α_i^− are Lagrangian multipliers, used to eliminate some primal variables, and K(x_i, x_j) is the kernel function. The derivation and excellent literature on the SVM can be obtained from [48]. The kernel functions used in this study were a linear function (LF) and a radial function (RF).

• Linear kernel function (LF): the most basic form of kernel function, written as K(x_i, x_j) = x_i · x_j.
• Radial basis function (RBF): a mapping identically represented as Gaussian bell shapes, K(x_i, x_j) = exp(−γ ||x_i − x_j||²), where γ is the Gaussian RBF kernel width parameter; the RBF is the most widely used of all the kernel functions in the SVM technique.
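The two kernels can be written directly as functions; the default value of gamma below is an arbitrary illustrative choice, not a value from the study.

```python
import numpy as np

def linear_kernel(xi, xj):
    """Linear kernel: the inner product of the two input vectors."""
    return float(np.dot(xi, xj))

def rbf_kernel(xi, xj, gamma=0.5):
    """Gaussian RBF kernel; gamma is the width parameter described above."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

Note that the RBF kernel equals 1 when the two points coincide and decays toward 0 as they move apart, which is the Gaussian bell shape referred to in the text.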
The efficiency of the SVR technique depends on the setting of the ε-insensitive loss function and three training parameters (C, γ, and ε) for the chosen kernel. However, the values of C and ε influence the complexity of the final model for every specific type of kernel. The ε value governs the number of support vectors (SVs) used for predictions; intuitively, the best value of ε results in fewer support vectors, leading to less complicated regression estimates. The value of C controls the trade-off between model complexity and the degree of deviation permitted within the optimization formulation, so a more considerable value of C undermines model complexity [49]. The selection of optimum values for these training parameters (C and ε) that guarantees less complex models is an active research area.
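The parameter tuning described above can be sketched with scikit-learn's SVR, whose C, gamma, and epsilon arguments correspond to the C, γ, and ε discussed here; the grid values and toy data are illustrative, not those of the study.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(150, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(150)

# Small illustrative grid over the three SVR training parameters
grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=3)
search.fit(X, y)  # the SVM-RF analogue

linear = SVR(kernel="linear", C=1.0, epsilon=0.1).fit(X, y)  # SVM-LF analogue
```

Larger ε values shrink the number of support vectors retained by the fitted model, mirroring the trade-off between sparsity and accuracy described above.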

Multiple Linear Regression (MLR)
A linear regression analysis in which more than one independent variable is involved is called MLR. The advantage of MLR is its simplicity, showing how the dependent variable relates to the independent variables. The overall model of the MLR is:

y = c_0 + c_1 x_1 + c_2 x_2 + ... + c_n x_n

where y is the dependent variable, x_1, x_2, ..., x_n are the independent variables, c_1, c_2, ..., c_n are the regression coefficients, and c_0 is the intercept. These values are calculated using the least-squares rule or other regression methods [27].
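The least-squares fit can be sketched as follows, with synthetic predictors standing in for the climate variables and illustrative coefficients c0..c3.

```python
import numpy as np

# Synthetic example: three illustrative predictors and known coefficients
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
true_c = np.array([2.0, 0.5, -1.0, 3.0])      # c0 (intercept), c1, c2, c3
y = true_c[0] + X @ true_c[1:] + 0.01 * rng.standard_normal(100)

A = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least-squares solution
```

With low noise, the recovered coefficients closely match the true c_0, ..., c_n, which is exactly the behavior the least-squares rule guarantees for a well-posed linear model.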

Modeling Methodology
In the present study, the daily pan evaporation (E Pan) was estimated based on different input climatic variables (T max, T min, RH-1, RH-2, WS, and SSH). The five techniques used for estimation were the artificial neural network (ANN), wavelet-based artificial neural network (WANN), radial function-based support vector machine (SVM-RF), linear function-based support vector machine (SVM-LF), and multi-linear regression (MLR) models. The climatic parameters were collected from 2013 to 2017 and split into three different scenarios, based on the percentages of training and testing data used for model development (Table 3). The results of the applied models in the three scenarios were evaluated through the performance evaluators described in Section 2.5.
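The three splits can be expressed as follows; the paper does not state whether the splits were chronological or random, so a chronological split is assumed here for illustration.

```python
import numpy as np

def split_scenarios(data):
    """Train/test splits for the three scenarios described in the text:
    60:40 (Scenario 1), 70:30 (Scenario 2), and 80:20 (Scenario 3)."""
    n = len(data)
    scenarios = {}
    for name, frac in [("Scenario 1", 0.6), ("Scenario 2", 0.7), ("Scenario 3", 0.8)]:
        k = int(round(n * frac))                 # size of the training set
        scenarios[name] = (data[:k], data[k:])   # chronological split
    return scenarios

data = np.arange(610)  # stand-in for the 610 daily records used in the study
splits = split_scenarios(data)
```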

Performance Evaluation Criteria
Four criteria were used to measure the performance of the scenarios mentioned above: quantitatively, the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), Pearson's correlation coefficient (PCC), and Willmott index (WI); and qualitatively, graphical interpretation (time-series plot, scatter plot, and Taylor diagram). The RMSE ranges from zero to infinity (0 < RMSE < ∞); the lower the RMSE, the better the model's performance. The NSE ranges from minus infinity to one (−∞ < NSE < 1); an NSE of zero indicates that the model predictions are only as accurate as the mean of the observed data, whereas negative values indicate that the observed mean is a better predictor than the model [48]. The PCC, also known as the correlation coefficient, is used to calculate the degree of collinearity between observed and estimated values and varies from minus one to plus one (−1 < PCC < 1) [39]. The WI, also known as the index of agreement, ranges from zero to one (0 < WI < 1), with values approaching 1 indicating ideal agreement/fit [3]. The most accurate models were selected as those showing the highest values of PCC, NSE, and WI and the lowest values of RMSE among all developed models.
where E p obs,i and E p pre,i are the observed and predicted pan evaporation values on the ith day, and the overbarred quantities are the averages of the observed and predicted values, respectively.
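The four criteria can be implemented directly from their standard definitions, which are assumed here to match those used in the paper; the observed/predicted series below are illustrative.

```python
import numpy as np

def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 for a perfect model, 0 when the model
    is only as accurate as the observed mean, negative when it is worse."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pcc(obs, pred):
    return float(np.corrcoef(obs, pred)[0, 1])

def willmott(obs, pred):
    """Willmott index of agreement (standard form)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    denom = np.sum((np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1 - np.sum((obs - pred) ** 2) / denom)

obs = np.array([3.0, 4.2, 2.8, 5.1, 3.9])    # illustrative observed E_pan (mm)
pred = np.array([3.1, 4.0, 3.0, 4.8, 4.1])   # illustrative predicted E_pan (mm)
```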

Quantitative and Qualitative Evaluation of Results
This section presents the quantitative and qualitative results obtained for the developed models. The ANN and WANN trials were conducted with different numbers of neurons in the hidden layer, while the SVM-LF and SVM-RF trials were performed by taking several values of the SVM-g, SVM-c, and SVM-e parameters. These are presented in Tables 4-6 as the model structures.

Comparison of Training and Testing Datasets for Scenario 1
The training results obtained by the ANN, WANN, and SVM models are shown in Table 4. As depicted in Table 4, among the three developed ANN models (ANN-1, ANN-2, and ANN-3), ANN-1 has the highest PCC value of 0.832, the lowest RMSE value of 0.993, the highest NSE value of 0.685, and the highest WI value of 0.904.
Similarly, among the developed WANN models, WANN-1 showed better performance, with a PCC value of 0.773, the lowest RMSE value of 1.123, the highest NSE value of 0.597, and the highest WI value of 0.860. Furthermore, among the developed SVM-RF and SVM-LF models, SVM-RF-3 showed better performance than the other developed models, with the highest PCC value of 0.857, the lowest RMSE value of 0.956, the highest NSE value of 0.708, and the highest WI value of 0.895 on the training dataset. The values of PCC, RMSE, NSE, and WI for the MLR technique were 0.695, 1.274, 0.483, and 0.800. Thus, it can be stated that SVM-RF modeled E pan most efficiently of all the machine learning algorithms developed for training.
For the testing dataset, the corresponding values were … 1.345, 0.188, and 0.725, respectively. The scatter plot and line diagram for the testing dataset are shown in Figure 6. From the line diagram, it can be observed that the results were under-predicted for all models. The scatter plot shows that the highest coefficient of determination (R²) was obtained for the SVM-RF model. Thus, it can be suggested that SVM-RF modeled E pan most efficiently among all the machine learning algorithms developed for testing.


Comparison of Training and Testing Datasets for Scenario 2
In Scenario 2, 70% of the entire data set was used for training, and the rest was used for testing the developed models. The training results obtained by the ANN, WANN, and SVM models are shown in Table 5. For Scenario 2, where 30% of the data set was used for testing, model ANN-1 had the highest PCC value of 0.547, the lowest RMSE value of 1.222, the highest NSE value of 0.046, and a WI value of 0.704 among the ANN models. Similarly, WANN-1 showed better performance, with a PCC value of 0.457, the lowest RMSE value of 1.252, the highest NSE value of −0.002, and the highest WI value of 0.639 among the WANN models. Furthermore, SVM-RF-3 showed better performance than the other developed models among the SVM-RF and SVM-LF models, with the highest PCC value of 0.568, the lowest RMSE value of 1.262, and the highest WI value of 0.714. The values of PCC, RMSE, NSE, and WI for the MLR technique were 0.531, 1.262, −0.017, and 0.700, respectively. The scatter plot and line diagram for testing are shown in Figure 7. It can be seen from the line diagram that the results were under-predicted for all models. The scatter plot showed that the highest coefficient of determination (R²), 0.3221, was obtained for the SVM-RF model. Thus, SVM-RF modeled E pan most efficiently among all the machine learning algorithms developed for testing.

Comparison of Training and Testing Datasets for Scenario 3
In Scenario 3, 80% of the total dataset was used for training periods, while the rest, 20%, was used to test the models. The training results obtained by ANN, wavelet analysis, and SVM have been shown in Table 6.
As depicted in Table 6, … The scatter plot and line diagram for testing are shown in Figure 8. From the line diagram, it was observed that the results were both under-predicted and over-predicted across the models. The scatter plot showed that the highest coefficient of determination (R²), 0.2791, was obtained for the SVM-RF model. Thus, SVM-RF modeled the daily E pan most efficiently among all the machine learning algorithms developed for testing.
The comparative training and testing results are shown in Table 7. This table suggests that, for both training and testing data, E pan can be modeled more accurately using the SVM-RF model than with ANN and WANN.
The performance of the models, from best to worst, is SVM > ANN > MLR > WANN for all three scenarios. Table 7 also shows that the WANN model performed poorly compared to the other models, because the wavelet transformation did not reveal hidden information present in the primary time-series data through its different sub-series. It is also observed that, with an increase in the sample set for training, the testing data show a less accurate modeled result.
The values of PCC, RMSE, NSE, and WI for the MLR technique were 0.506, 1.363, −0.227, and 0.665. The comparative results of all three scenarios for all developed models are also shown in the Taylor diagrams [50] in Figure 9a-c, which convey information based on the correlation coefficient, standard deviation, and root mean square difference [27]. Figure 9a-c indicates that the SVM-RF model predictions in all three scenarios are very close to the daily values of E pan, tending toward the observed point values on the abscissa. The performance in terms of correlation coefficient, standard deviation, and root mean square difference is also superior to the other models. Therefore, the SVM-RF model with the T max, T min, RH-1, RH-2, WS, and SSH climate variables can be used for daily E pan estimation at the Pusa station.

Discussion
Our results are similar to those of [17,39], who modeled pan evaporation and found that the ANN and SVR models achieved high correlation coefficients ranging from 0.81 to 0.90. In addition, our findings agree with Cobaner [15], who observed that the ANN model with the Bayesian Regularization (BR) algorithm generated correlations of 0.76, 0.67, and 0.72 during training, validation, and testing, respectively; applying the Levenberg-Marquardt (LM) algorithm, the corresponding values were 0.77, 0.69, and 0.71. Furthermore, for SVR, this model's findings are close to those of Tezel and Buyukyildiz [51], who concluded that the SVR gave high correlations, ranging from 0.86 to 0.90, for evaporation forecasting. Moreover, the results obtained with SVR are in line with Pammar and Deka [52], who stated that the correlation coefficients and RMSE ranged from 0.79 to 0.84 and from 0.90 to 1.03 under different kernels. The RMSE values reported by Alizamir et al. [17] were 0.836 and 0.882 for the ANN 4-6-6-1 model and 1.028 and 1.106 for the MLR model over the training and testing periods; they found that the ANN estimated evaporation better than MLR, in agreement with the present study. The ANN model of pan evaporation with all available variables as inputs proposed by Rahimi Khoob [21] was the most accurate, delivering an R² of 0.717 and an RMSE of 1.11 mm on an independent evaluation data set, which correlates with our outcomes. As reported by Keskin and Terzi [25], the R² values of the ANN(3,6,1), ANN(6,2,1), and ANN(7,2,1) models, equal to 0.770, 0.787, and 0.788 for modeling E pan, are also acceptable and agree with our results. The developed models produced a more acceptable outcome than those of Kim et al. [53], who stated that ANN and MLR generated R² values ranging from 0.69 to 0.74 and from 0.61 to 0.64, with RMSE varying from 1.38 to 1.48 and from 1.56 to 1.60, respectively.
However, all developed models in this manuscript could not capture the variability of extreme values present in the input and output parameters at the given study location. The models' efficiency might be improved if the extreme values are removed. This is one of the limitations of the study outlined in this paper.

Conclusions
Evaporation is a strongly non-linear and stochastic phenomenon affected by relative humidity, temperature, vapor pressure deficit, and wind speed. In the present study, daily pan evaporation (E pan) estimation was evaluated using the ANN, WANN, SVM-RF, SVM-LF, and MLR models. The input climatic variables for the estimation of daily E pan were maximum and minimum temperatures (T max and T min), relative humidity (RH-1 and RH-2), wind speed (WS), and bright sunshine hours (SSH). The free availability of these meteorological parameters for other stations in Bihar, India, is a significant concern and limitation of this research. The proposed models were trained and tested in three separate scenarios, i.e., Scenario 1, Scenario 2, and Scenario 3, utilizing different percentages of data points. The models were evaluated using statistical tools, namely PCC, RMSE, NSE, and WI, and through visual inspection using line diagrams, scatter plots, and Taylor diagrams. The research results evidenced the SVM-RF model's ability to estimate daily E pan by integrating the weather variables T max, T min, RH-1, RH-2, WS, and SSH. The SVM-RF model's dominance was found at the Pusa station for all scenarios investigated. It is also clear that, with an increase in the sample set for training, the testing data show a less accurate modeled result. Since the Pusa dataset has many extreme values, the developed models could not capture extreme values very efficiently; this is one of the limitations of this paper. Overall, the current research showed the SVM-RF model's viability as a newly established data-intelligent method for simulating pan evaporation in the Indian region. It can be extended to many water resource engineering applications, and it is recommended that SVM-RF models be applied under the same climatic conditions where the same meteorological parameters are available.