Kernel Extreme Learning Machine: An Efficient Model for Estimating Daily Dew Point Temperature Using Weather Data

Abstract: Accurate estimation of dew point temperature (T dew ) has a crucial role in sustainable water resource management. This study investigates kernel extreme learning machine (KELM), boosted regression tree (BRT), radial basis function neural network (RBFNN), multilayer perceptron neural network (MLPNN), and multivariate adaptive regression spline (MARS) models for daily dew point temperature estimation at the Durham and UC Riverside stations in the United States. Measured daily hydrometeorological data, including wind speed (WS), maximum air temperature (TMAX), minimum air temperature (TMIN), maximum relative humidity (RHMAX), minimum relative humidity (RHMIN), vapor pressure (VP), soil temperature (ST), solar radiation (SR), and dew point temperature (T dew ), were utilized to investigate the applied predictive models. Results of the KELM model were compared with those of the other models using eight different input combinations with respect to the root mean square error (RMSE), coefficient of determination (R 2 ), and Nash–Sutcliffe efficiency (NSE) statistical indices. Results showed that the KELM models using three input parameters (VP, TMAX, and RHMIN), with RMSE = 0.419 °C, NSE = 0.995, and R 2 = 0.995 at Durham station, and seven input parameters (VP, ST, RHMAX, TMIN, RHMIN, TMAX, and WS), with RMSE = 0.485 °C, NSE = 0.994, and R 2 = 0.994 at UC Riverside station, exhibited the best performance in the modeling of daily T dew . Finally, it was concluded from a comparison of the results that, of the five models applied, the KELM model was the most robust, improving on the performance of the BRT, RBFNN, MLPNN, and MARS models in the testing phase at both stations. The results confirmed that the KELM model improved the RMSE values by 8.5% (BRT model), 3.2% (RBFNN model), 4.1% (MARS model), and 1.9% (MLPNN model) among the best composite models during the testing phase at Durham station.
In addition, the KELM model enhanced the accuracy by 18.1% (BRT model), 23.5% (MARS model), 15.2% (RBFNN model), and 1.4% (MLPNN model) based on the RMSE values among the best composite models during the testing phase at UC Riverside station. The results suggest that the KELM model can be successfully used for estimating dew point temperature from weather data.


Introduction
Dew point temperature (T dew ) plays a vital role in the elaboration and application of several ecological, hydrological, and meteorological models, especially for the quantification of evapotranspiration [1][2][3]. Different important climatic parameters can be affected by T dew . In addition, it has been demonstrated that T dew can be used as an important factor for climate change studies [4]. Recently, Ali et al. demonstrated a strong relationship between T dew and extreme precipitation [5]. Bui et al. reported that T dew helped significantly to understand the relationship between precipitation and air temperature, which can be used to quantify near-surface humidity [6,7]. Several authors have paid attention to the strong relationship between T dew and meteorological variables [8][9][10][11]. Many studies have tried to develop models to link T dew to meteorological variables using machine learning models. Dong et al. applied machine learning models for modeling dew point temperature (T dew ) using several meteorological variables as inputs, namely, TMAX, TMIN, TMEAN, RHMAX, RHMIN, RHMEAN, and atmospheric pressure (Pa) [12]. The authors applied and compared ten different soft computing techniques to model T dew . Based on the results, the best accuracy was obtained by the Bat-ELM model by employing Tmax, Tmin, RHmax, and RHmin as input variables. Shiri presented three data-driven models to model T dew at weekly and daily time scales using data collected at six meteorological stations [13]. The proposed three models were gene expression programming (GEP), MARS, and RF. For modeling T dew , Tmean, RHmean, sunshine hours (SH), and wind speed (U2) were used as input parameters. In the second application, the previously measured values of T dew were applied as input variables for modeling T dew one day and seven days in advance. 
The best accuracy was obtained using the MARS model by employing Tmean, RHmean, and SH as input variables at all stations, while for T dew prediction, GEP worked best at daily and weekly time steps. Qasem et al. employed three machine learning models for estimating T dew , namely, M5 model tree (M5Tree), GEP, and SVM, using Tmean, RHmean, SH, U2, and actual vapor pressure (VP) [14]. A comparison between the models revealed that the M5Tree model, using five climatic variables (Tmean, RHmean, SH, U2, and VP), had the best accuracy. Naganna et al. introduced new hybrid models for estimating daily T dew using wet bulb temperature (TB), VP, and RHmean [15]. They applied MLPNN optimized using the gravitational search algorithm (MLPNN-GSA) and the firefly algorithm (MLPNN-FFA). Results obtained using MLPNN-FFA and MLPNN-GSA were compared to those obtained using the standard MLPNN, SVM, and ELM models, and the best accuracy was yielded by the MLPNN-FFA model.
Attar et al. compared MARS, GEP, and SVM models for predicting T dew using Tmax, Tmin, RHmean, SH, U2, and atmospheric pressure (P) [16]. The best accuracy was obtained by the MARS model. Mehdizadeh et al. applied the GEP model for modeling and forecasting T dew using several meteorological variables, namely, Tmax, Tmin, Tmean, VP, RHmean, and P [17]. The study was conducted according to three scenarios: (i) temperature-based models using only air temperature as an input variable, (ii) using a combination of meteorological variables, and (iii) forecasting models using previously measured values of T dew as input variables. For the first scenario, the best accuracy was obtained using Tmin and the difference between Tmin and Tmax as input variables. For the second scenario, the best accuracy was obtained using VP, RHmean, and P. Finally, for the third scenario, the best accuracy was obtained by the model with T dew measured on the three previous days and the Julian day (J) as input variables. In some regions of India, Deka et al. compared the SVM and ELM models for modeling T dew using daily measured TB, VP, and RHmean [18]. They reported that the ELM model was more accurate than the SVM model and achieved the best accuracy. Shiri et al. introduced a new artificial neural network model called the Elman discrete recurrent neural network (EDRNN) model, trained using two different algorithms: (i) the conjugate gradient learning algorithm and (ii) the quick prop learning algorithm [19]. Results obtained using the EDRNN model were compared to those obtained using the GEP model. The three models were developed using Tmean, RHmean, U2, P, and solar radiation (SR). By comparing several input combinations, the authors demonstrated that GEP significantly surpassed the two ANN models. Kisi et al. investigated the accuracy of four machine learning models, namely, ANFIS-C, ANFIS-G, GRNN, and SOM [20]. 
Another study was conducted in South Korea using Tmean, RHmean, U2, SH, and VP parameters. The authors reported two important conclusions. First, none of U2, SH, and VP contributed significantly to the improvement of the machine learning models, while Tmean and RHmean were the most efficient variables for predicting T dew . Furthermore, results revealed that the best accuracy was obtained using the GRNN model, while the lowest accuracy was achieved using the SOM model. A hybrid model by combining SVM and the firefly algorithm (SVM-FFA) was developed by Al-Shammari et al. for modeling T dew using Tmean, RHmean, and P, measured at daily time scale in Iran [21]. Results obtained using the SVM-FFA model were compared to those obtained using MLPNN, SVM, and genetic programming (GP), and the best accuracy was obtained using SVM-FFA.
Amirmojahedi et al. employed a new hybrid model combining the wavelet transform and the extreme learning machine (W-ELM) for predicting daily T dew using Tmean, RHmean, and P [22]. Compared to the standard ELM, SVM, and MLPNN models, W-ELM yielded the best result, while the lowest accuracy was obtained using the MLPNN model. In another study, Baghban et al. applied the least squares support vector machine optimized by genetic algorithm (LSSVM-GA) for modeling T dew using Tmean, RHmean, and P [23]. They found that the LSSVM-GA was more accurate than ANFIS-GA. In summary, although many studies have been conducted on modeling T dew , this study investigates a reliable tool, the kernel extreme learning machine (KELM), to estimate daily T dew using hydrometeorological input parameters. Results obtained using KELM were compared to those obtained using the boosted regression tree (BRT), radial basis function neural network (RBFNN), MARS, and MLPNN models. To the best of the authors' knowledge, this is the first study that applies the KELM model to estimate daily T dew at the Durham and UC Riverside stations in the USA.

Artificial Neural Networks (MLPNN and RBFNN)
Artificial neural networks (ANNs) are computational networks and information processing systems that consist of a large number of interconnected computing units called neurons. The formation of ANNs is based on the mathematical simulation of the structure of the biological nervous system. To date, various versions of ANNs have been developed. In this study, two common types of ANNs, the multilayer perceptron (MLPNN) and the radial basis function network (RBFNN), which have performed well in simulating hydrometeorological problems, are reviewed [24,25]. Both the MLPNN and RBFNN models are neural networks with a supervised training structure and feed-forward information transfer. In both models, the developed network consists of an input layer, a hidden layer, and an output layer. It is worth mentioning that although MLPNNs can be developed with more than one hidden layer, these models are often developed with only one [26][27][28][29][30]. The input layer receives user-entered information and transmits it to the hidden layer after applying weight coefficients and biases. The neurons in the hidden layer, defined according to the activation function, process the received values from the input layer neurons and send them to the output layer neurons. Similar to the hidden layer neurons, the neurons in the output layer compute the model output by using an activation function [2,31]. Figure 1 shows the general structure of an MLPNN with two hidden layers, as well as the general structure of an RBFNN.
In the training phase, the computed output values of the network are compared with the target values. The main purpose of the network training is to minimize the discrepancy between computed and observed values in the mean square error sense (as an error function). To this end, back-propagation-based methods, such as gradient descent algorithms, are used. The main difference between the MLPNN and RBFNN methods lies in the type of activation function. Sigmoid functions are generally used in MLPNN networks (Equation (1)), and the radial basis Gaussian function (Equation (2)) is employed in RBFNN networks:

f(x) = \frac{1}{1 + e^{-x}} \qquad (1)

\varphi(x) = \exp\left(-\frac{\|x - c\|^{2}}{2\sigma^{2}}\right) \qquad (2)
where x represents the input variable, c is the center, and σ is the spread (variance) parameter. Further detailed information on the basics of developing and deploying MLPNN and RBFNN networks is provided in many other references [32][33][34]. In this study, the Levenberg-Marquardt (LM) learning method is applied to train the network based on its fast convergence capability for complex datasets. Additionally, sigmoid and linear transfer functions are employed for the hidden and output layers, respectively. Moreover, through a trial-and-error process, the number of neurons in the hidden layer was found to be 10 and 12 for the MLPNN and RBFNN models, respectively.
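For illustration, the two activation functions in Equations (1) and (2), together with a one-hidden-layer forward pass using sigmoid hidden and linear output transfer functions, can be sketched in Python. This is a minimal sketch, not the code used in this study; the array shapes are illustrative.

```python
import numpy as np

def sigmoid(x):
    # Equation (1): logistic sigmoid activation used in the MLPNN hidden layer
    return 1.0 / (1.0 + np.exp(-x))

def gaussian_rbf(x, c, sigma):
    # Equation (2): Gaussian radial basis activation used in the RBFNN,
    # with center c and spread sigma
    return np.exp(-((x - c) ** 2).sum(axis=-1) / (2.0 * sigma ** 2))

def mlp_forward(X, W1, b1, W2, b2):
    # One-hidden-layer MLPNN: sigmoid hidden layer, linear output layer,
    # matching the transfer functions used in this study
    return sigmoid(X @ W1 + b1) @ W2 + b2
```

In practice, the weights W1, W2 and biases b1, b2 would be fitted by a back-propagation method such as Levenberg-Marquardt; only the forward computation is shown here.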

Kernel Extreme Learning Machine (KELM)
Even though artificial neural network models have proven their high capabilities in modeling plenty of nonlinear engineering problems, such as dew point temperature, they might suffer from the drawback of using gradient-descent-based training algorithms. That is, these algorithms can be trapped in local minima. Moreover, the ANN architecture contains many processors (neurons) as well as network parameters, which make it a sophisticated black-box structure in comparison to other machine learning models. To cope with these weaknesses of regular ANN models, Huang et al. proposed a novel training algorithm called the extreme learning machine (ELM) [35]. ELMs are single-hidden-layer feed-forward networks (SLFNs) that choose the input weights and biases randomly, whereas the output weights are calculated by the Moore-Penrose generalized inverse. The ELM models have been successfully used in several applications in hydrological modeling in the past couple of years [36,37]. Nonetheless, the standard version of the ELM model faces the drawback of providing different accuracies in various trials due to its randomly assigned weight strategy. To resolve this shortcoming of the standard ELM, Huang et al. proposed the kernel ELM (KELM) by modifying its random process of allocating weights between the input and hidden layers [38]. This section briefly explains the theory and methodology of KELM. Complete details of the KELM model can be found in Huang et al.'s paper [38]. To begin with, one can present the formulation of the general structure of an SLFN model having M training samples, N hidden nodes, and g(x) as the activation function, as the following [38,39]:

o_j = \sum_{i=1}^{N} \beta_i \, g(w_i \cdot x_j + b_i), \qquad j = 1, \ldots, M \qquad (3)

In Equation (3), i corresponds to a hidden node, and j denotes each training sample. o is the output vector, x represents the input feature vector, and b is the bias.

w shows the weight vector between the input layer (IL) and the hidden layer (HL), while β denotes the weight vector connecting the nodes in the hidden layer to the output nodes. Assuming an ideal condition for the developed SLFN model in Equation (3), one can expect zero error between the target value (t) and the SLFN model's output (o). In that case, Equation (3) can be rewritten as below:

\sum_{i=1}^{N} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, \ldots, M \qquad (4)

Consequently, it is possible to arrange Equation (4) in the compact form Hβ = T, where H is the hidden layer output matrix and T is the target matrix. Equation (4) can be solved using linear methods, for example, via the MP-generalized inverse of H, known as H † , giving β = H † T.
In the KELM model, a kernel function K(x,y) maps the data from the IL to the HL space. In this sense, the KELM model applies the orthogonal projection procedure to compute the H † matrix, replacing the hidden layer output matrix with a kernel matrix Ω, where Ω_{ij} = K(x_i, x_j), so that the output function can be written as

f(x) = \left[K(x, x_1), \ldots, K(x, x_M)\right] \left(\frac{I}{C} + \Omega\right)^{-1} T

where C is the regularization coefficient. For this purpose, the reliable Gaussian kernel is applied for mapping the data between the layers:

K(x, y) = \exp\left(-\gamma \|x - y\|^{2}\right)
where γ is the kernel parameter. In this study, the regularization coefficient and the type of kernel are set to 35 and the wavelet kernel, respectively. The step-by-step procedure of the KELM model developed in this study is as follows:
-Dividing the dataset into the training (80%) and testing (20%) sets.
-Evaluating the developed KELM performance on the testing dataset.
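The closed-form KELM solution described above can be sketched in a few lines of Python. This is a minimal illustration with a Gaussian kernel, not the study's implementation; the parameter values C and gamma below are illustrative, not the values tuned in this study.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||^2), evaluated pairwise
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

class KELM:
    """Kernel extreme learning machine for regression (closed-form solution)."""

    def __init__(self, C=35.0, gamma=0.1):
        self.C = C          # regularization coefficient
        self.gamma = gamma  # kernel parameter

    def fit(self, X, t):
        self.X_train = X
        omega = gaussian_kernel(X, X, self.gamma)  # kernel matrix Omega
        n = X.shape[0]
        # Solve (I/C + Omega) beta = t, the closed-form output weights
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega, t)
        return self

    def predict(self, X):
        # f(x) = [K(x, x_1), ..., K(x, x_M)] beta
        return gaussian_kernel(X, self.X_train, self.gamma) @ self.beta
```

Unlike the standard ELM, no weights are drawn at random, so repeated fits on the same data give identical results.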

Multivariate Adaptive Regression Splines (MARS)
MARS is a machine learning method consisting of several simple regression models, with a high capability for estimating and simulating complex phenomena. In this method, the problem space is divided into intervals of the predictor variables. In each interval, spline functions are fitted to the existing data. The formation of a MARS model is based on the creation of piecewise linear basis functions of the form

\max(0, x - t) \quad \text{and} \quad \max(0, t - x) \qquad (8)
where t represents the knot in the MARS model. The functions written in Equation (8) are known as reflected-pair functions [40]. Therefore, having the n observations of the input variables X, the BFs can be expressed according to the following equation:

C = \left\{ \max(0, X_j - t),\; \max(0, t - X_j) \right\}, \qquad t \in \{x_{1j}, x_{2j}, \ldots, x_{nj}\} \qquad (9)

where n is the total number of observations, and j = 1, 2, . . . , p. The basis functions (BFs) in the MARS method include the input variables, and they express the relationship between the input variables and the target parameter. Finally, by combining the generated spline functions, a resilient and efficient model is created to predict the target parameter as follows:

f(X) = \beta_0 + \sum_{i=1}^{M} \beta_i B_i(X)

In the above equation, B i (X) are BFs in C or the product of two or more such functions (see Equation (9)). β 0 and β i represent the bias and the coefficients of the BFs, which can be calculated by the least squares method (LSM), and M stands for the number of terms in a forward/backward stepwise process [29,41,42].
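The reflected-pair basis functions of Equation (8) and the least-squares estimation of their coefficients can be sketched as follows. This is a simplified illustration for a single predictor with fixed knots; full MARS also performs the forward/backward stepwise knot selection, which is omitted here.

```python
import numpy as np

def hinge(x, t, sign):
    # Reflected pair of piecewise linear basis functions (Equation (8)):
    # sign=+1 -> max(0, x - t),  sign=-1 -> max(0, t - x)
    return np.maximum(0.0, sign * (x - t))

def fit_mars_like(x, y, knots):
    # Build a design matrix from the reflected-pair basis functions at the
    # given knots, then estimate beta_0 and beta_i by least squares.
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(hinge(x, t, +1))
        cols.append(hinge(x, t, -1))
    B = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B, beta
```

A target that is itself piecewise linear with a knot at the chosen location is reproduced exactly by this basis.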

Boosted Regression Tree (BRT)
BRTs are tree-based models capable of simulating complex processes, formed by combining statistical methods and machine learning. The BRT method is based on the creation of an ensemble learning structure called boosting. For this purpose, the performance of a number of premise tree models (such as classification and regression trees (CART)) is aggregated by using a boosting technique, which results in better performance of the BRT model than a single CART model in the simulation and prediction of the desired phenomenon [43]. In the boosting method used in BRT, a forward stagewise process is applied, so that each of the CART models formed is fitted only to a subset of the training data. The subsets are selected by a stochastic process without replacement. In the BRT model, two key parameters, the learning rate and the tree complexity factor, are responsible for creating and forming the overall structure of the model. The learning rate controls the contribution of each CART to the overall BRT structure, while the tree complexity factor controls the number of nodes in each tree [44].
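The forward stagewise boosting idea can be sketched with one-split regression trees (stumps) fitted to the current residuals. This simplified illustration omits the stochastic subsampling without replacement and the tree complexity factor described above; the learning rate appears as the shrinkage applied to each tree's contribution.

```python
import numpy as np

def fit_stump(x, y):
    # Fit a one-split regression tree (stump): choose the threshold that
    # minimizes squared error, predicting the mean on each side.
    best = None
    for s in np.unique(x):
        left, right = y[x <= s], y[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, s, left.mean(), right.mean())
    _, s, lo, hi = best
    return lambda z: np.where(z <= s, lo, hi)

def boost(x, y, n_trees=50, learning_rate=0.1):
    # Forward stagewise boosting: each stump is fitted to the current
    # residuals and added with a shrinkage factor (the learning rate).
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * stump(x)
    return pred
```

A smaller learning rate requires more trees but generally yields a smoother, better-generalizing ensemble, which is why the two parameters are tuned jointly in BRT.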

Description of Study Area and Observational Data
In this research, daily hydroclimatic parameters, including wind speed (WS), maximum air temperature (TMAX), minimum air temperature (TMIN), maximum relative humidity (RHMAX), minimum relative humidity (RHMIN), vapor pressure (VP), soil temperature (ST), and solar radiation (SR), were applied for estimating the values of daily dew point temperature (T dew ) at two stations: Durham (latitude 39°36′ N, longitude 121°49′ W, altitude 39.6 m), located in the Sacramento Valley Region, Butte County, and UC Riverside (latitude 33°57′ N, longitude 117°20′ W, altitude 310.9 m), in the Los Angeles Basin Region, Riverside County. The locations of the Durham and UC Riverside stations can be found in Figure 2. The data used in this study were obtained from the California Irrigation Management Information System (CIMIS). The quality of the CIMIS weather station dataset was controlled by applying different data processing steps, including analyzing the accuracy of the measured weather data (https://cimis.water.ca.gov/cimis/). Data from 1 January 2005 to 31 December 2009 were applied in this study. The training dataset included the first 80% of the daily data, and the remaining data were used for testing the models. Table 1 presents statistical parameters such as average, minimum (Min), maximum (Max), and standard deviation (St. Dev) of the weather variables employed in this study.
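The chronological 80/20 train-test partition described above can be sketched as follows; the split preserves time order rather than shuffling, since the first 80% of the daily records form the training set.

```python
import numpy as np

def chronological_split(data, train_frac=0.8):
    # Use the first 80% of the daily records for training and the
    # remaining 20% for testing, without shuffling (time order preserved).
    n_train = int(len(data) * train_frac)
    return data[:n_train], data[n_train:]
```

Applied to the 2005-2009 daily series, this assigns roughly the first four years to training and the final year to testing.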

Performance Indices
For the reliability assessment of the models, three evaluation indicators, the root mean squared error (RMSE), the coefficient of determination (R 2 ), and the Nash-Sutcliffe efficiency coefficient (NSE), were applied to assess the performance of the five models developed for estimating dew point temperature (T dew ). These criteria are defined as

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_{dew_{io}} - T_{dew_{ip}}\right)^{2}}

R^{2} = \left(\frac{\sum_{i=1}^{n}\left(T_{dew_{io}} - \overline{T}_{dew_{o}}\right)\left(T_{dew_{ip}} - \overline{T}_{dew_{p}}\right)}{\sqrt{\sum_{i=1}^{n}\left(T_{dew_{io}} - \overline{T}_{dew_{o}}\right)^{2}\sum_{i=1}^{n}\left(T_{dew_{ip}} - \overline{T}_{dew_{p}}\right)^{2}}}\right)^{2}

NSE = 1 - \frac{\sum_{i=1}^{n}\left(T_{dew_{io}} - T_{dew_{ip}}\right)^{2}}{\sum_{i=1}^{n}\left(T_{dew_{io}} - \overline{T}_{dew_{o}}\right)^{2}}

where n represents the number of data points, T dew io stands for the observed dew point temperature values, and T dew ip is the model's estimate. Table 1 expresses the descriptive statistics of the observed dataset, including training and testing data, for the Durham and UC Riverside stations, respectively. The standard deviation values indicate considerable fluctuations within the datasets used. The correlation matrices between dew point temperature and the input parameters are provided in Table 2 for the Durham and UC Riverside stations, respectively. Vapor pressure (VP) showed the strongest correlation with dew point temperature (T dew ), whereas wind speed (WS) was inversely correlated with dew point temperature (T dew ) at both stations.
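The three evaluation indices can be computed as follows; this is a straightforward sketch in which `obs` and `pred` denote the observed and estimated T dew series.

```python
import numpy as np

def rmse(obs, pred):
    # Root mean squared error between observed and predicted values
    return np.sqrt(np.mean((obs - pred) ** 2))

def nse(obs, pred):
    # Nash-Sutcliffe efficiency: 1 minus the ratio of residual variance
    # to the variance of the observations around their mean
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(obs, pred):
    # Coefficient of determination as the squared Pearson correlation
    return np.corrcoef(obs, pred)[0, 1] ** 2
```

Note that R 2 only measures linear association, whereas NSE also penalizes bias, which is why the two can differ for a biased model.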

Durham Station
Based on climatic parameters for predicting dew point temperature (T dew ), different parameters, including wind speed (WS), maximum temperature (TMAX), minimum temperature (TMIN), maximum relative humidity (RHMAX), minimum relative humidity (RHMIN), vapor pressure (VP), soil temperature (ST), and solar radiation (SR), were chosen. The input parameters were framed as one-, two-, three-, four-, five-, six-, seven-, and eight-input composites. The performance metrics (i.e., RMSE, NSE, and R 2 ) for the best models are shown in Table 3 for Durham station. In one-input composite models, the values of the three evaluation indicators for the vapor pressure (VP) parameter were the best in all developed models. It is clear from Table 3 that the KELM model (RMSE = 0.426 °C, NSE = 0.995, and R 2 = 0.995) slightly surpassed the BRT, MARS, RBFNN, and MLPNN models during the testing phase. Therefore, vapor pressure (VP), among the different parameters, was chosen as the major parameter to formulate two-, three-, four-, five-, six-, seven-, and eight-input composite models. In two-input composite models, the composite of the VP and TMAX parameters was better than the other two-input composites in the BRT, MARS, RBFNN, and MLPNN models, except for the KELM model (i.e., the composite of VP and SR). It can be seen from Table 3 that the BRT and RBFNN models slightly outperformed the MARS and MLPNN models during the testing phase. Additionally, the KELM model (RMSE = 0.426 °C, NSE = 0.995, and R 2 = 0.995) had the best performance among the two-input composite models during the testing phase. Among the three-input composite models, the composite of the VP, TMAX, and RHMIN parameters was better than the other three-input composites in the BRT, RBFNN, MLPNN, and KELM models, except for the MARS model (i.e., the composite of the VP, ST, and RHMAX parameters).
The KELM model (RMSE = 0.419 °C, NSE = 0.995, and R 2 = 0.995) yielded the best performance among the three-input composite models during the testing phase. By comparing the four-input composite models, it was found that the composite of the VP, TMIN, ST, and RHMIN parameters was better than the other four-input composites in the MARS, RBFNN, and KELM models, whereas the composite of the VP, TMIN, ST, and RHMAX parameters was the best in the BRT and MLPNN models. It can be seen that the MLPNN and KELM models outperformed the BRT, MARS, and RBFNN models during the testing phase. Additionally, the KELM model (RMSE = 0.435 °C, NSE = 0.995, and R 2 = 0.995) clearly gave the best performance among the four-input composite models during the testing phase. The five-input composite models showed that the composite of the VP, TMIN, ST, TMAX, and RHMAX parameters was better than the other five-input composites in the BRT, MARS, RBFNN, and MLPNN models, except for the KELM model (i.e., the composite of the VP, TMIN, ST, TMAX, and RHMIN parameters). In addition, the KELM model (RMSE = 0.423 °C, NSE = 0.995, and R 2 = 0.995) clearly achieved the best performance compared to the other five-input composite models during the testing phase.
Among the six-input composite models, the composite of the VP, TMIN, ST, TMAX, SR, and RHMIN parameters performed better than the other six-input composites in the MARS, RBFNN, MLPNN, and KELM models, except for the BRT model (i.e., the composite of the VP, TMIN, ST, TMAX, SR, and WS parameters). Additionally, the KELM model (RMSE = 0.426 °C, NSE = 0.995, and R 2 = 0.995) attained the best performance among the six-input composite models during the testing phase. In the case of the seven- and eight-input composite models, the predictive results of the three evaluation indicators revealed that the composite of the VP, TMIN, ST, TMAX, SR, WS, and RHMIN parameters was better than the other seven-input composites in the BRT, RBFNN, MLPNN, and KELM models. Only the MARS model (RMSE = 0.466 °C, NSE = 0.994, and R 2 = 0.994) was more accurate with the composite of the VP, TMIN, ST, TMAX, SR, WS, and RHMAX parameters compared to the other seven-input composites. The KELM model (RMSE = 0.426 °C, NSE = 0.995, and R 2 = 0.995) obtained the best performance among the seven-input composite models during the testing phase. The performance of the eight-input composite models confirmed that the KELM model (RMSE = 0.429 °C, NSE = 0.995, and R 2 = 0.995) yielded better results than the BRT, MARS, RBFNN, and MLPNN models during the testing phase. Considering the best composite models, the best performance of the developed models (i.e., BRT (two-input), MARS (eight-input), RBFNN (one-input), MLPNN, and KELM (three-input)) was found based on different composites of input parameters during the testing phase. It can be seen from Table 3 that the optimized structures of all input composites for the KELM model yielded better performance than those of the BRT, MARS, RBFNN, and MLPNN models during the testing phase. Thus, the KELM model is more effective than the BRT, MARS, RBFNN, and MLPNN models in predicting and generalizing the time series of dew point temperature at Durham station.
The scatter plots of observed and estimated daily T dew using the best composite models during the testing phase at Durham station are shown in Figure 3. It can be seen from the R 2 values that there is a slight difference among the BRT, MARS, RBFNN, MLPNN, and KELM models. The KELM model exhibited better performance than the BRT, MARS, RBFNN, and MLPNN models, while the BRT model yielded the lowest accuracy among the best composite models at Durham station. Figure 4 illustrates a comparison of the RMSE values for the best composite models during the testing phase at Durham station. It can be seen from Figure 4 that the RMSE values of the BRT, MARS, and RBFNN models were larger than those of the MLPNN and KELM models during the testing phase. In addition, the KELM model produced the best accuracy, whereas the MARS model produced the lowest accuracy, based on the best composite models at Durham station.

UC Riverside Station
The performance metrics of the best composite models are provided in Table 4 for UC Riverside station. For one-input composite models, the values of the three evaluation indicators for the vapor pressure (VP) parameter were the best for the BRT, MARS, RBFNN, MLPNN, and KELM models during the testing phase. As seen from Table 4, the KELM model (RMSE = 0.570 °C, NSE = 0.992, and R 2 = 0.992) surpassed the BRT, MARS, RBFNN, and MLPNN models during the testing phase. Therefore, vapor pressure (VP), among the different parameters, was chosen as the basic parameter to define two-, three-, four-, five-, six-, seven-, and eight-input composite models. In two-input composite models, the composite of the VP and ST parameters was better than the other two-input composites in the BRT and RBFNN models, and the composite of the VP and RHMAX parameters was better than the other two-input composites in the MARS and KELM models. The best models yielding the lowest RMSE are shown in boldface in Table 4.
By analyzing the four-input composite models, it is clear that the composite of the VP, ST, RHMAX, and WS parameters was better than the other four-input composites in the MARS and RBFNN models. Considering the best composite models, the best performance of the developed models (i.e., BRT (one-input), MARS (eight-input), RBFNN (three-input), MLPNN, and KELM (seven-input)) can be identified based on different composites of input parameters during the testing phase. It can be seen from Table 4 that the optimized structures of all input composites for the KELM model gave a better performance than those of the BRT, MARS, RBFNN, and MLPNN models during the testing phase. Thus, the KELM model was more accurate than the BRT, MARS, RBFNN, and MLPNN models in predicting and generalizing the time series of dew point temperature at UC Riverside station.
Scatter plots of observed and estimated daily T dew are shown in Figure 6 using the best composite models during the testing phase at UC Riverside station. It can be seen from R 2 values that there is a trivial difference among the BRT, MARS, RBFNN, MLPNN, and KELM models. In addition, the KELM model gave a better performance than the BRT, MARS, RBFNN, and MLPNN models, while the MARS model provided the lowest accuracy among the best composite models at UC Riverside station.  Figure 6 using the best composite models during the testing phase at UC Riverside station. It can be seen from R 2 values that there is a trivial difference among the BRT, MARS, RBFNN, MLPNN, and KELM models. In addition, the KELM model gave a better performance than the BRT, MARS, RBFNN, and MLPNN models, while the MARS model provided the lowest accuracy among the best composite models at UC Riverside station.  Figure 7 compares the RMSE values for the best composite models during the testing phase at UC Riverside station. It can be seen from Figure 7 that the RMSE values of the BRT, RBFNN, and MARS models were larger than those of the MLPNN and KELM models during the testing phase. In addition, the KELM model accomplished the best accuracy, whereas the BRT model provided the least accuracy, based on the best composite models at UC Riverside station.  Figure 7 compares the RMSE values for the best composite models during the testing phase at UC Riverside station. It can be seen from Figure 7 that the RMSE values of the BRT, RBFNN, and MARS models were larger than those of the MLPNN and KELM models during the testing phase. In addition, the KELM model accomplished the best accuracy, whereas the BRT model provided the least accuracy, based on the best composite models at UC Riverside station.   Figure 8 explains the error histogram comprising mean (µ) and standard deviation (σ) for the best composite models at UC Riverside station. 
The comparison shows that the KELM model provided the lowest standard deviation, whereas the MARS model gave the highest standard deviation, based on the best composite models during the testing phase. This follows the pattern of the RMSE values for the best composite models during the testing phase at UC Riverside station.
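The error statistics discussed here (RMSE, and the error mean µ and standard deviation σ shown in the histograms), together with the NSE and R 2 indices used throughout the study, follow standard definitions. A small illustrative sketch, not the authors' code:

```python
import numpy as np

def rmse(obs, est):
    # Root mean square error
    return float(np.sqrt(np.mean((obs - est) ** 2)))

def nse(obs, est):
    # Nash-Sutcliffe efficiency: 1 indicates a perfect match
    return float(1.0 - np.sum((obs - est) ** 2)
                 / np.sum((obs - np.mean(obs)) ** 2))

def r2(obs, est):
    # Coefficient of determination (squared Pearson correlation)
    return float(np.corrcoef(obs, est)[0, 1] ** 2)

def error_stats(obs, est):
    # Mean and standard deviation of the estimation errors,
    # the quantities summarized in the error histograms
    e = est - obs
    return float(np.mean(e)), float(np.std(e))
```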

Discussion
The results showed that the best composite models captured the nonlinear behavior of dew point temperature at both stations. A comparison based on RMSE values among the best composite models at Durham station showed that the KELM model improved on the BRT, RBFNN, MARS, and MLPNN models by 8.5%, 3.2%, 4.1%, and 1.9%, respectively, during the testing phase. Similarly, at UC Riverside station, the KELM model enhanced the accuracy over the BRT, MARS, RBFNN, and MLPNN models by 18.1%, 23.5%, 15.2%, and 1.4%, respectively, during the testing phase.
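The percentage improvements quoted above appear to be relative reductions in RMSE. A sketch of that calculation follows; the baseline value is illustrative, as only the KELM RMSE of 0.419 °C at Durham is reported in the abstract:

```python
def rmse_improvement(rmse_baseline, rmse_kelm):
    # Relative RMSE reduction of KELM over a baseline model, in percent
    return 100.0 * (rmse_baseline - rmse_kelm) / rmse_baseline

# Illustrative: a hypothetical baseline RMSE of 0.458 degC vs. KELM's
# reported 0.419 degC at Durham gives roughly an 8.5% improvement.
print(round(rmse_improvement(0.458, 0.419), 1))
```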
Within the category of the best composite models, the predictive accuracy of the KELM model at UC Riverside station improved markedly over that of the other models, whereas the predictive accuracy at Durham station was only slightly enhanced. The difference in improvement between the two stations stems from the characteristics of the available data, as reported in previous research [19,45,46]. If two (or three) potential models provide the best predictive accuracy, additional approaches (e.g., null hypothesis testing [47] and Akaike's information criterion [48]) are recommended in order to select the best model. The authors of [8] developed GRNN and MLP models for predicting daily dew point temperature at Durham and UC Riverside stations. They showed that the best model (i.e., the GRNN4 model) provided RMSE = 0.07 °C (Durham station) and 0.08 °C (UC Riverside station). Reference [20] employed GRNN, SOM, ANFIS-C, and ANFIS-G models for predicting daily dew point temperature at Daegu, Pohang, and Ulsan stations, South Korea. The authors found that the GRNN, ANFIS-C, and ANFIS-G models were more accurate than the SOM model. The predictive results of this study are consistent with those previous studies [8,20].
In addition, different nature-inspired optimization algorithms and data preprocessing tools can be combined with machine learning models to enhance their predictive accuracy. To build on the results of this study, further research employing different machine learning models, evolutionary algorithms, and data preprocessing techniques should be undertaken for predicting dew point temperature.

Conclusions
This study investigated a kernel extreme learning machine (KELM) model for estimating daily dew point temperature (Tdew) at two different stations in the USA. The KELM model was trained to estimate daily Tdew by employing hydroclimatic variables, including WS, TMAX, TMIN, RHMAX, RHMIN, VP, ST, and SR, as inputs. Additionally, eight different scenarios were applied to investigate the effect of the hydrometeorological variables on daily Tdew estimation using the different machine learning models. The KELM models were compared with the BRT, MARS, RBFNN, and MLPNN models with respect to the root mean square error (RMSE), coefficient of determination (R 2 ), and Nash–Sutcliffe efficiency (NSE) statistical indices. The KELM models using the three input parameters VP, TMAX, and RHMIN and the seven input parameters VP, ST, RHMAX, TMIN, RHMIN, TMAX, and WS outperformed the other models for the estimation of daily Tdew at Durham and UC Riverside stations, respectively. The results confirmed that the KELM model improved on the best composite BRT, RBFNN, MARS, and MLPNN models by 8.5%, 3.2%, 4.1%, and 1.9%, respectively, based on RMSE values during the testing phase at Durham station. In addition, the KELM model enhanced the accuracy over the best composite BRT, MARS, RBFNN, and MLPNN models by 18.1%, 23.5%, 15.2%, and 1.4%, respectively, based on RMSE values during the testing phase at UC Riverside station. The results suggest that the KELM model can be successfully used for estimating dew point temperature from weather data as an important input for sustainable water resource management.