Modeling of Monthly Rainfall–Runoff Using Various Machine Learning Techniques in Wadi Ouahrane Basin, Algeria

Abstract: Rainfall–runoff modeling has been the core of hydrological research studies for decades. To comprehend this phenomenon, many machine learning algorithms have been widely used. Nevertheless, a thorough comparison of machine learning algorithms and the effect of pre-processing on their performance is still lacking in the literature. Therefore, the major objective of this research is to simulate rainfall–runoff using nine standalone and hybrid machine learning models. The conventional models include artificial neural networks, least squares support vector machines (LSSVMs), K-nearest neighbor (KNN), M5 model trees, random forests, multiple adaptive regression splines, and multivariate nonlinear regression. In contrast, the hybrid models comprise LSSVM and KNN coupled with a gorilla troop optimizer (GTO). Moreover, the present study introduces a new combination of the feature selection method, principal component analysis (PCA), and empirical mode decomposition (EMD). Mean absolute error (MAE), root mean squared error (RMSE), relative RMSE (RRMSE), Pearson correlation coefficient (R), Nash–Sutcliffe efficiency (NSE), and Kling–Gupta efficiency (KGE) metrics are used for assessing the performance of the developed models. The proposed models are applied to rainfall and runoff data collected in the Wadi Ouahrane basin, Algeria. According to the results, the KNN–GTO model exhibits the best performance (MAE = 0.1640, RMSE = 0.4741, RRMSE = 0.2979, R = 0.9607, NSE = 0.9088, and KGE = 0.7141). These statistical criteria represent improvements over the other developed models of 80%, 70%, 72%, 77%, 112%, and 136%, respectively. The LSSVM model provides the worst results without pre-processing of the data. Moreover, the findings indicate that using feature selection, PCA, and EMD significantly improves the accuracy of rainfall–runoff modeling.


Introduction
Accurate rainfall-runoff modeling has been one of the most popular subjects for hydrology researchers because of its importance for water resources planning and management, including dam design, reservoir operation planning, and flood mitigation management [1,2]. In addition, the development of these models enhances comprehension of the ongoing hydrological processes in the watersheds [3]. This topic has gained paramount attention in recent years because of the world's declining water supply, which necessitates the development of accurate modeling techniques [4]. The intricate link between rainfall and runoff makes it difficult to estimate runoff accurately [5]. This can be attributed to the heterogeneous distribution and the spatial-temporal fluctuations of hydrological components [6]. In addition to rainfall, wind speed, temperature, solar radiation, evapotranspiration, and other meteorological factors, catchment-specific characteristics (e.g., land cover, topography, soil type, and slope) affect river runoff changes. As a result, developing accurate models to capture this dynamic and nonlinear natural phenomenon is challenging because these interrelated factors take place at many temporal and geographical scales [7]. Additionally, it is challenging to gather predictor variables from a catchment system using large samples. The difficulties of accurate and quantitative representation of the available data pose the key problems in the modeling process.
In general, there are two categories of hydrological models: (a) conceptual and physical-based models and (b) empirical or data-driven models. The former models need a lot of input parameters and a lot of hydro-meteorological information. The applicability of these models to represent hydrological processes is frequently limited by these constraints [8]. Also, in the absence of accurate data on meteorological and site-specific parameters, the data-driven models are suitable for modeling the rainfall-runoff process due to their minimal input dataset requirements [9]. Machine learning and data-driven models have been effectively used in recent years to simulate the nonlinear and nonstationary runoff phenomenon [10][11][12]. These approaches can be used to simulate hydrological processes due to various physical phenomena, such as the periodicity, pattern, or randomness of model input and target data [13,14].
Tikhamarine et al. [15] introduced the combination of Harris Hawks optimization (HHO) with a multi-layer perceptron neural network and least squares support vector machine (LSSVM) to predict the rainfall-runoff. Based on the autocorrelation function (ACF), partial ACF (PACF), and cross-correlation function, five alternative situations were explored. The performance of the suggested models was compared with data-driven methodologies integrated with particle swarm optimization (PSO). The findings showed that hybrid models trained using HHO exhibited better performance in forecasting runoff compared with integrated models with PSO. Additionally, coupling LSSVM with HHO resulted in a high degree of runoff prediction accuracy. Adnan et al. [16] examined the application of four machine learning techniques to estimate rainfall-runoff at an hourly timeframe in the Italian Samoggia River basin. The models included a multi-model simple averaging ensemble approach, multiple adaptive regression splines (MARS), an M5 model tree, as well as an adaptive neuro-fuzzy inference system (ANFIS) with fuzzy c-means (FCM) and the PSO algorithm. The outcomes of the developed models were compared with the theoretical EBA4SUB model using five statistics: mean absolute error (MAE), root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), modified index of agreement, and scatter index. The MARS, ANFIS-FCM, and ANFIS-PSO offered equal accuracy, which was better than the M5 model. The machine learning approaches often outperformed the EBA4SUB when compared to the conceptual event-based method; however, in some instances, the latter method provided higher accuracy than the M5 model and MARS.
Mohammadi [11] reviewed the application of machine learning approaches (e.g., support vector machine (SVM), artificial neural network (ANN), and ANFIS) for hydrological subjects, including streamflow, rainfall-runoff, surface hydrology, and flood modeling. Furthermore, the benefits and drawbacks of popular machine learning models were critically examined in the field of runoff modeling. Okkan et al. [17] integrated ANN and support vector regression (SVR) into a conceptual rainfall-runoff model for monthly runoff simulation in the Gediz River Basin, Turkey. The nested hybrid models' parameters were all calibrated at once. The nested hybrid models outperformed the standalone models and linked model versions in terms of mean and high flows, according to the performance metrics. Thus, the research affirmed the credibility of a modeling approach that combined a conceptual model and several machine learning approaches. Roy et al. [18] applied a deep neural network (DNN) and an EO-ELM model that integrated an equilibrium optimizer (EO) and an extreme learning machine (ELM) for rainfall-runoff modeling in the UK's River Fal at Tregony and the Teifi in Glanteifi. In order to deploy the suggested models, an ideal amount of lag inputs was determined using PACF. The proposed models were validated in terms of prediction accuracy using ELM, kernel ELM, PSO-based ELM, SVR, ANN, and gradient boosting machines. Additionally, the research applied a discrete wavelet-based dataset pre-processing approach to improve the performance of the suggested models. This research demonstrated how well EO-ELM and DNN may be used for rainfall-runoff modeling.
Waqas et al. [19] developed radial basis function (RBF)-SVM and M5 models to model the rainfall-runoff process in the Jhelum River Basin, Pakistan. The models were trained and tested using various combinations of datasets. Modeled and observed data were assessed using the coefficient of determination (R²), normalized RMSE, MSE, and coefficient of efficiency for the training and testing phases. According to the findings, gene expression programming was found to be the most precise and highly effective technique. Xiao et al. [20] developed a backpropagation neural network, a generalized regression neural network (GRNN), an ELM, and a wavelet neural network (WNN) for runoff forecasting in the Xijiang River. The GRNN model performed better in runoff forecasting by considering flood propagation time. The WNN model exhibited the highest accuracy in the 7-day lead time for water level. This study suggested that a machine learning-based runoff forecasting model would enhance flood and drought early warning systems. Singh et al. [21] used MARS, SVM, multiple linear regression (MLR), and random forest (RF) for rainfall-runoff prediction in the Gola watershed, Uttarakhand. The performance of the models was assessed using numerical indices (i.e., R², RMSE, NSE, and percent bias) along with graphical charting (i.e., scatter plots, relative error plots, violin plots, line diagrams, and Taylor diagrams). In all case studies, the RF outperformed the other models in terms of daily runoff forecasting in both the training and testing phases.
After reviewing the literature, it is observed that many machine learning algorithms have been employed to mimic rainfall-runoff simulation. However, a comprehensive comparison of machine learning algorithms is lacking. In this regard, the main goal of this research is to simulate the rainfall-runoff phenomenon using standalone and hybrid machine learning models. ANN, LSSVM, K-nearest neighbor (KNN), M5 model, RF, MARS, and multivariate nonlinear regression (MNLR) are the conventional models. Meanwhile, the hybrid models refer to LSSVM and KNN coupled with the gorilla troop optimizer (GTO). Additionally, this study introduces a new combination of the feature selection method, principal component analysis (PCA), and empirical mode decomposition (EMD). The developed models are evaluated using MAE, RMSE, relative RMSE (RRMSE), Pearson correlation coefficient (R), NSE, and Kling–Gupta efficiency (KGE). The proposed models are applied to rainfall and runoff dataset records in Wadi Ouahrane, Algeria, because of the complex and nonlinear nature of runoff production in this basin.

Multivariate Empirical Mode Decomposition (EMD)
EMD was introduced to decompose a signal of original data into a finite number of small oscillating modes. These oscillating modes are known as intrinsic mode functions (IMFs) and should meet the following criteria [22]:
1. Over the entire signal length, the number of zero-crossings and the number of local maxima and minima are either equal or differ by at most one.
2. The average of the upper and lower envelopes, calculated from the local maxima and minima, should be equal to zero.
EMD does not need a base function to be selected, and it is an alternative to signal decomposition methods such as the Fourier transform and the wavelet transform. In this process, the IMFs are extracted from the signal until they satisfy the above-mentioned criteria. The sifting method for extracting IMFs includes the following steps:
Step 1: Determine all the extreme points of the given signal y(t).
Step 2: Use a cubic spline to fit the upper and lower envelopes of the signal.
Step 3: Calculate the average of the upper and lower envelopes.
Step 4: Subtract the average from the data to create the IMF candidate h(t) using Equation (2).
Step 5: If h(t) satisfies the two criteria for IMFs, it is considered the first IMF; otherwise, y(t) is replaced with h(t), and the process returns to Step 1.
Step 6: The residual is regarded as new data, and Steps 1-5 are applied again. This process continues until the residue becomes constant or a monotonic trend is obtained. EMD is a simple and efficient method for the decomposition of signals and is appropriate for identifying instantaneous frequency changes, especially in nonstationary signals.
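As an illustration, the first sifting pass (Steps 1-4) can be sketched in Python. This is a minimal sketch, assuming SciPy's `CubicSpline` for the envelope fitting; the function and variable names are illustrative rather than from the paper, and envelope boundary effects are ignored.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(y):
    """One sifting pass (Steps 1-4): subtract the mean envelope from y."""
    t = np.arange(len(y))
    # Step 1: interior local maxima and minima of the signal
    maxima = [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]
    minima = [i for i in range(1, len(y) - 1) if y[i - 1] > y[i] <= y[i + 1]]
    if len(maxima) < 4 or len(minima) < 4:
        return None  # too few extrema to fit cubic-spline envelopes
    # Step 2: cubic-spline upper and lower envelopes through the extrema
    upper = CubicSpline(maxima, y[maxima])(t)
    lower = CubicSpline(minima, y[minima])(t)
    # Steps 3-4: subtract the envelope average to form the IMF candidate h(t)
    return y - (upper + lower) / 2.0
```

Repeating this pass on the candidate until the two IMF criteria hold, then subtracting the converged IMF from the residual, yields the full decomposition (Steps 5-6).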

Principal Component Analysis (PCA)
PCA is used for data pre-processing to identify the correlation among candidate factors. It converts the input variables into uncorrelated derived variables called principal components (PCs). The sum of the PC variances equals the sum of the variances of the original variables. PCs can be obtained using the linear function in Equation (3), where X j is the original variable, j is the index of the input variable, and i is the index of the PC; the a i,j are the elements of the eigenvectors of the covariance matrix, and PC i is the corresponding derived variable. The present study employs PCA because of the large size of the input dataset.
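A minimal sketch of this transformation, using an eigendecomposition of the covariance matrix (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its leading principal components."""
    Xc = X - X.mean(axis=0)                  # center each input variable X_j
    cov = np.cov(Xc, rowvar=False)           # covariance matrix of the inputs
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenpairs of the symmetric matrix
    order = np.argsort(eigvals)[::-1]        # sort by descending explained variance
    A = eigvecs[:, order[:n_components]]     # loading matrix (the a_{i,j})
    return Xc @ A                            # PC_i = sum_j a_{i,j} X_j
```

Keeping all components preserves the total variance; dropping the trailing ones gives the dimension reduction used later in the framework.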

Multivariate Nonlinear Regression (MNLR)
MNLR is a nonlinear regression method that estimates the nonlinear relationship between multiple input variables and the output data. Equation (4) can be used for estimating the target variable, where W and b are the weight and bias parameters, respectively.
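Since Equation (4) is not reproduced above, the sketch below assumes a hypothetical power-law form for the nonlinear relationship and fits it with SciPy's `curve_fit`; the functional form, names, and synthetic data are illustrative only, not the paper's model.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical nonlinear form (stand-in for Equation (4)):
# y = b * x1^w1 * x2^w2, with weights W = (w1, w2) and a scale parameter b.
def mnlr(X, w1, w2, b):
    return b * X[0] ** w1 * X[1] ** w2

rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.0, size=(2, 100))   # two positive input variables
y = mnlr(X, 1.5, -0.7, 2.0)                # synthetic, noise-free target

params, _ = curve_fit(mnlr, X, y, p0=[1.0, 1.0, 1.0])  # estimate (w1, w2, b)
```

With noise-free data, the fitted parameters reproduce the target almost exactly; real rainfall-runoff data would of course leave residual error.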

Artificial Neural Networks (ANNs)
ANN is a machine learning algorithm that solves linear or nonlinear regression and classification problems. It processes input and output data in a multi-layer network to find the relationship between variables. It consists of one input layer, one or multiple hidden layers, and one output layer, in which each layer comprises one or several neurons. Neurons are simple mathematical models of biological neurons. In a hidden layer, the weighted sum of the previous layer's neurons is passed through an activation function, which generates an output signal that serves as the input to the neurons of the subsequent layer.
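A single forward pass through one hidden layer can be sketched as follows; this is a minimal illustration with a tanh activation, and the names are not from the paper.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One hidden layer: weighted sums pass through an activation (tanh here),
    and the activations feed a linear output neuron."""
    hidden = np.tanh(W1 @ x + b1)   # hidden-layer signals
    return W2 @ hidden + b2         # output-layer signal
```

Training consists of adjusting W1, b1, W2, and b2 (e.g., by backpropagation) so that this mapping reproduces the observed runoff.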

K-Nearest Neighbor (KNN)
KNN is a nonparametric machine learning algorithm that solves regression and classification problems without presuppositions about the training data distribution. In this algorithm, the training data are considered neighbor points. The inverse Euclidean distance between the testing data and a neighbor point is regarded as the weight of that point. The shorter the Euclidean distance, the greater the weight. The neighbor points are sorted based on their weights, and the K neighbor points with the highest weights are selected. Then, KNN computes the output for each input dataset using the weighted average of the K neighbors (Equation (5)) [24], where R j is the jth observed runoff in the training period, R output,i is the ith estimated runoff, and W j is the weight of the jth neighbor, which can be calculated using Equation (6), where X and X j are the testing and training input data, respectively. Figure 1 shows the KNN scheme for modeling runoff.
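The weighted-average prediction described by Equations (5) and (6) can be sketched as follows (names are illustrative; a small epsilon guards against zero distances):

```python
import numpy as np

def knn_predict(x, X_train, R_train, k, eps=1e-12):
    """Distance-weighted KNN runoff estimate in the spirit of Equations (5)-(6)."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to neighbors
    w = 1.0 / (d + eps)                       # inverse-distance weights W_j
    idx = np.argsort(w)[::-1][:k]             # K neighbors with the highest weights
    return np.sum(w[idx] * R_train[idx]) / np.sum(w[idx])
```

With k = 1 this reduces to nearest-neighbor lookup; larger k smooths the estimate across several training months.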



Multivariate Adaptive Regression Spline (MARS)
MARS is a nonparametric and nonlinear machine learning algorithm for solving various regression and classification problems. MARS divides the original dataset into multiple sub-datasets. Then, for each sub-dataset, the target variable is fitted using a spline regression. The formulation for this process is given by the following equation, where b is the bias parameter, β is a constant coefficient, h is the basis function, and N is the number of basis functions [25].
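A minimal sketch of a MARS-style prediction with one-dimensional hinge basis functions; in practice MARS terms may be products of hinges over several inputs, and the names and terms here are illustrative.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """One MARS basis function: a hinge max(0, +/-(x - knot))."""
    return np.maximum(0.0, direction * (x - knot))

def mars_predict(x, b, terms):
    """f(x) = b + sum of beta_i * h_i(x) over the N basis functions."""
    return b + sum(beta * hinge(x, knot, d) for beta, knot, d in terms)
```

Each (beta, knot, direction) triple defines one piecewise-linear segment, so the fitted surface is a sum of hinges anchored at the knots selected during training.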

M5 Model Tree (M5)
M5 is a tree-based machine learning algorithm used for modeling continuous variables. Its structure resembles a tree consisting of nodes, branches, and leaves. It splits the feature space into subsets, and a linear regression is fitted to the target variables of each subset. This process includes two steps: (1) growing the tree using the input data and establishing a linear regression at the end of each leaf, and (2) pruning extra branches to avoid overfitting. The splitting criterion is the maximum reduction in standard deviation, calculated as follows [26], where S is a subset in the parent node, S i is a subset in the child node, and sd is the standard deviation of the input data.
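The standard deviation reduction criterion can be sketched as:

```python
import numpy as np

def sdr(parent, children):
    """Standard deviation reduction: sd(S) - sum_i (|S_i|/|S|) * sd(S_i)."""
    parent = np.asarray(parent, dtype=float)
    return np.std(parent) - sum(
        (len(c) / len(parent)) * np.std(np.asarray(c, dtype=float))
        for c in children
    )
```

A split that separates the targets into low-variance groups yields a large SDR, so the tree grower picks the candidate split with the maximum value.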

Least Square Support Vector Machine (LSSVM)
The LSSVM is a modified version of the standard SVM. Unlike SVM, LSSVM solves a set of linear equations instead of a quadratic programming problem, which improves the computation time and accuracy of SVM. LSSVM uses the following equation to estimate the output, where α and b are the Lagrangian coefficients and bias, respectively. H is a kernel function that maps the nonlinear relation between the input and output variables from a low- to a high-dimensional feature space. This helps LSSVM solve nonlinear problems in linear form. The linear, polynomial, sigmoid, and RBF functions are different types of kernel functions; however, the RBF is the most accurate kernel function and has been used in many studies. The RBF kernel function is estimated as follows [27], where σ represents the width of the kernel function. The main parameters of LSSVM are the penalty coefficient (gamma) and σ, in which gamma is used for computing α and b.
The LSSVM scheme, including one input layer, one hidden layer, and a final output, is demonstrated in Figure 2.
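The RBF kernel and the resulting LSSVM output can be sketched as follows; here α and b are assumed to be already available from training, and the names are illustrative.

```python
import numpy as np

def rbf_kernel(x, x_j, sigma):
    """RBF kernel H(x, x_j) = exp(-||x - x_j||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - x_j) ** 2) / (2.0 * sigma ** 2))

def lssvm_predict(x, X_train, alpha, b, sigma):
    """LSSVM output: sum over training points of alpha_j * H(x, x_j), plus bias b."""
    return sum(a * rbf_kernel(x, xj, sigma) for a, xj in zip(alpha, X_train)) + b
```

In a full LSSVM, α and b come from solving the linear system defined by the kernel matrix and the penalty coefficient gamma.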



Random Forest Regression (RF)
RF is an ensemble machine learning algorithm that solves the overfitting and instability problems of decision trees. First, n random subsamples of the original data are created. Then, one tree model is fitted to each subsample, and RF integrates the results generated by all n trees into the final outcome. In the present study, the M5 is considered as the RF tree. For more information about RF, please see [28].

Gorilla Troop Optimizer (GTO)
GTO is based on the collaborative behavior of gorillas. This algorithm mimics five strategies of gorillas: migration to unknown regions, migration toward other gorillas, migration to other known locations, following the silverback, and competition for adult females [29]. The first three strategies are used for exploration, and the remaining two for exploitation. Each artificial gorilla is considered one solution of the optimization problem, and the best gorilla in each iteration is regarded as the silverback. When rand < p, the first strategy of moving to an unknown region is selected. Otherwise, rand < 0.5 implies that the gorilla moves toward other gorillas, and rand > 0.5 implies that the gorilla migrates to known locations. In the three exploration strategies [30], GX_i^(iter+1) is the new candidate position vector of a gorilla, X_i^(iter) is its current position, and rand_1, rand_2, and rand_3 are random numbers in the range between 0 and 1. The variable p represents the probability of migration to unknown regions. X_r and GX_r are members of the artificial gorilla population selected at random from the whole population, and ub and lb are the upper and lower bounds of the decision variables. C, L, and H can be calculated using Equations (12) and (13), where iter refers to the current iteration, Max_Iter is the maximum number of iterations, F is computed using Equation (8), cos is the cosine function, and rand_4 is a random number in the range [0, 1]. L is calculated using Equation (9), l is a random number ranging from 0 to 1, H is computed using Equation (11), and Z is a random value in the range between -C and C.
The fitness function of all GX is evaluated at the end of the exploration phase, and if the fitness of GX^(iter) is less than that of X^(iter), GX^(iter) replaces X^(iter). The best solution at this stage is the silverback gorilla. GTO uses the following-the-silverback and competition-for-adult-females strategies in the exploitation phase. The silverback is the head of the group, which makes decisions and guides the other gorillas to food sources. The young gorillas become mature and compete with other gorillas for adult female gorillas. These two strategies are modeled mathematically by the equations below: if C ≥ W, the first strategy is followed; otherwise, the second strategy is selected. W can be set before running GTO in Equation (17).
where X_silverback is the position vector of the silverback, Q is the impact force, and A is the degree of violence in case of conflicts. M, Q, and A are computed using the following equations, where GX_i^(iter) is the current position vector of the candidate gorilla, N is the number of gorillas, rand_5 represents a random number between 0 and 1, and E simulates the effect of violence on the solution's dimensions. The values of g and E are calculated as follows, where N_1 is a value drawn from a normal distribution across the problem's dimensions and N_2 is a random number with a normal distribution.

Hybrid of LSSVM and KNN with Gorilla Troop Optimizer
Both LSSVM and KNN have essential parameters that should be selected before applying them. However, choosing these parameters remains challenging. Using nature-inspired optimization algorithms can be an excellent solution to this challenge. Hence, in the present study, the GTO algorithm, as an efficient optimization algorithm, is used to determine the optimal LSSVM and KNN parameter values. In this regard, two hybrid algorithms, called KNN-GTO and LSSVM-GTO, are defined. In KNN-GTO, the number of neighbors and the input weight vector are considered decision variables, whereas in LSSVM-GTO, the penalty coefficient (gamma) and σ are the decision variables. To find the optimal parameters of KNN and LSSVM, GTO minimizes the following fitness function (Equation (23)) within a pre-defined maximum number of iterations, where R observed,i is the observed runoff. The pseudocodes of KNN-GTO and LSSVM-GTO are presented in Algorithm 1, which ends with the following steps:
9: iter := iter + 1
10: end while
11: Return the best solution (optimal W and K for KNN, and gamma and σ for LSSVM)
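As a rough illustration of the hybrid idea, the sketch below tunes KNN's number of neighbors K by minimizing an RMSE-style fitness on held-out data. A simple random search stands in for GTO purely for illustration, and all names are assumptions rather than the paper's implementation.

```python
import numpy as np

def fitness(k, X_tr, y_tr, X_val, y_val):
    """RMSE of a distance-weighted KNN: the quantity the optimizer minimizes."""
    preds = []
    for x in X_val:
        d = np.linalg.norm(X_tr - x, axis=1)
        w = 1.0 / (d + 1e-12)                  # inverse-distance weights
        idx = np.argsort(w)[::-1][:k]          # K highest-weight neighbors
        preds.append(np.sum(w[idx] * y_tr[idx]) / np.sum(w[idx]))
    return np.sqrt(np.mean((np.asarray(preds) - y_val) ** 2))

def tune_k(X_tr, y_tr, X_val, y_val, k_max=20, iters=30, seed=0):
    """Random search over K, standing in for GTO purely for illustration."""
    rng = np.random.default_rng(seed)
    best_k, best_f = None, np.inf
    for _ in range(iters):
        k = int(rng.integers(1, min(k_max, len(X_tr)) + 1))
        f = fitness(k, X_tr, y_tr, X_val, y_val)
        if f < best_f:
            best_k, best_f = k, f
    return best_k, best_f
```

GTO replaces the random draw of candidates with its exploration and exploitation strategies, but the fitness evaluation plays the same role.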

Assessment Criteria
In this study, the MAE, RMSE, RRMSE, R, NSE, and KGE metrics are used for assessing the performance of the rainfall-runoff models using the following equations [31,32], where R output,i, R observe,i, the mean values of R output and R observe, std(R output), std(R observe), and N are the output runoff, observed runoff, average output runoff, average observed runoff, standard deviation of the output runoff, standard deviation of the observed runoff, and number of data points, respectively.
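Under the conventional definitions of these criteria (the paper's exact equations are not reproduced above), the six metrics can be computed as:

```python
import numpy as np

def metrics(obs, out):
    """MAE, RMSE, RRMSE, Pearson R, NSE, and KGE in their standard forms."""
    obs, out = np.asarray(obs, float), np.asarray(out, float)
    err = out - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    rrmse = rmse / np.mean(obs)                       # relative RMSE
    r = np.corrcoef(obs, out)[0, 1]                   # Pearson correlation
    nse = 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    alpha = np.std(out) / np.std(obs)                 # variability ratio
    beta = np.mean(out) / np.mean(obs)                # bias ratio
    kge = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return dict(MAE=mae, RMSE=rmse, RRMSE=rrmse, R=r, NSE=nse, KGE=kge)
```

A perfect simulation gives MAE = RMSE = RRMSE = 0 and R = NSE = KGE = 1, which is why values near these limits in Tables 4-5 indicate near-exact training fits.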

Case Study and Data Description
The study area is the Wadi Ouahrane basin in northern Algeria, located between 36°00′ N-36°24′ N and 01°00′ E-01°3′ E. This 270 km² region is a section of the Wadi Cheliff basin (Figure 3). The research area was mapped using a digital elevation model (12.5 m horizontal resolution), which shows a maximum altitude of 991 m and a minimum altitude of 165 m. Wadi Ouahrane is a small tributary of Wadi Cheliff, a few kilometers long. The flow of water in this basin is monitored by six pluviometric stations. The Wadi Ouahrane basin is bounded by the Wadi Allala basin to the north, the Wadi Sly basin to the south, the Wadi Fodda basin to the east, and the Wadi Ras basin to the west. With an average interannual rainfall of 333 mm from 1972 to 2018, evapotranspiration (ET) of 1050 mm, and a mean annual flow of 0.472 m³/s, this basin has a Mediterranean climate. The yearly average temperature is 18 °C. The monthly rainfall datasets were obtained at the six stations between 1972 and 2018, and these records are used in this study. The meteorological information was provided by the National Meteorological Organization and the National Water Resources Agency of Algeria.
The correlation plot for the input and target variables is shown in Figure 4. In this figure, a positive correlation indicates a direct relationship between inputs and targets, a negative correlation indicates an inverse relationship, and a near-zero correlation indicates no relationship. The maximum and minimum correlations between the input and target variables in Figure 4 are related to R_S1 and T_mean, respectively. However, the correlation between inputs and targets is not close to 1 or -1. The statistical criteria for the input and target variables are presented in Table 1. According to this table, although the coefficient of variation of the runoff data is lower than that of the inputs, its skewness coefficient is significantly higher. Therefore, the studied runoff data do not follow a normal distribution and show high dispersion. These observations confirm the nonlinear runoff production in this basin. Consequently, powerful nonlinear methods are expected to be needed for rainfall-runoff modeling in this basin.

Presented Framework for Modeling Rainfall-Runoff
The present study introduces a framework based on a combination of the feature selection method, PCA, EMD, and hybrids of KNN and LSSVM with GTO. In this framework, the most important inputs are selected using feature selection, and then the dataset is randomly divided into training and testing periods. The pseudocode of the applied feature selection method is illustrated in Algorithm 2. This feature selection method selects lagged inputs with a higher correlation with the target data. Its recoverable steps are:

4: Calculate the Pearson correlation coefficient (R) between the feature and target data
5: if R < threshold of R
6: Remove the feature from the input data
7: end if
8: i := i + 1
9: end while
10: Apply PCA to the remaining input data
11: Return the final inputs list

After feature selection, the number of selected features can still be considerable; therefore, PCA is used for dimension reduction. Then, the prepared dataset is used to apply the KNN-GTO and LSSVM-GTO models to simulate the rainfall-runoff phenomenon. Finally, the best results are selected according to different evaluation criteria. Furthermore, the results of the presented framework are compared with those of other machine learning algorithms, including MLR, KNN, ANN, M5, MARS, LSSVM, and RF, to validate the performance of the introduced framework. Figure 5 shows the scheme of the employed framework.

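The correlation-threshold selection step of Algorithm 2 can be sketched as follows; here the absolute correlation is compared against the threshold, which is an assumption on my part (the pseudocode compares R directly), and the names are illustrative.

```python
import numpy as np

def select_features(X, y, r_threshold=0.05):
    """Keep candidate (lagged) inputs whose Pearson correlation with the
    target clears the threshold; PCA would then run on the survivors."""
    kept = []
    for i in range(X.shape[1]):
        r = np.corrcoef(X[:, i], y)[0, 1]    # Pearson R, feature vs. target
        if abs(r) >= r_threshold:            # assumption: |R| vs. threshold
            kept.append(i)
    return kept
```

The returned column indices define the reduced input matrix that is passed on to PCA and then to the models.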

Results and Discussion
This study defines five scenarios for rainfall-runoff modeling (Table 2).In the first scenario, rainfall in six sections (T min , T mean , T max , Rh_mean, and SW) is considered an input.The second scenario resembles the first scenario, with the difference that a 0 to 24-month lag time is imposed on input data and the R threshold is equal to 0.05.The third to fifth scenarios are the same as the second scenario in input data; however, the main difference is the application of IMF and the R threshold value of 0.1.The MaxNumIMF in the third to fifth scenarios equals 3, 4, and 5, respectively.Since the size of the input dataset in the third to fifth scenarios is large, the PCA is employed for dimension reduction.each scenario is different from that in another scenario, owing to the various amounts of input data in each scenario.Furthermore, the number of inputs in the first scenario is less than that in other scenarios.Therefore, it is expected that the accuracy of modeling rainfallrunoff will be lower in this scenario compared to other scenarios.Also, in the third, fourth, and fifth scenarios, the values of W are higher than in other scenarios, showing a greater correlation between these data and runoff data.The greater W value in the mentioned scenarios can lead to the high precision of KNN and KNN-GTO.
Figure 6 shows the weights of the inputs (W) obtained by KNN-GTO. As seen, the importance of each input lies between 0 and 1, in accordance with the base assumptions of KNN. The W values differ from one scenario to another owing to the different amounts of input data in each scenario. Furthermore, the number of inputs in the first scenario is smaller than in the other scenarios; it is therefore expected that the accuracy of rainfall-runoff modeling will be lower in this scenario. In the third, fourth, and fifth scenarios, the values of W are higher than in the other scenarios, indicating a stronger correlation between these data and the runoff data. The greater W values in these scenarios can lead to the high precision of KNN and KNN-GTO.
Table 4 compares the accuracy of the machine learning algorithms for rainfall-runoff modeling during the training period. According to this table, all algorithms perform weakly in the first scenario, which underlines the importance of selecting appropriate inputs and processing the dataset. In the other scenarios, algorithms such as ANN, LSSVM, KNN, LSSVM-GTO, and KNN-GTO are trained with comparable accuracy. The best performance is associated with ANN in the third scenario and with KNN-GTO in the fourth and fifth scenarios. For ANN, the MAE, RMSE, and RRMSE are equal to 0.0000, while R, NSE, and KGE are equal to 1.0000. The corresponding metrics for KNN-GTO are 0.0001, 0.0016, 0.0011, 1.0000, 1.0000, and 0.9998, respectively.
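The per-input weights W described above can be thought of as scaling each feature before the neighbor search. The sketch below is a minimal, illustrative interpretation of a weighted KNN regressor, assuming W multiplies the features prior to computing Euclidean distances; the paper's exact weighting scheme may differ, and the function name `knn_predict` is ours.

```python
# Minimal sketch of KNN regression with per-feature importance weights
# W in [0, 1], as discussed for the KNN and KNN-GTO models. Scaling the
# features by W before the distance computation is an illustrative
# assumption, not the authors' exact implementation.
import numpy as np

def knn_predict(X_train, y_train, X_query, W, k=3):
    """Predict each query target as the mean of its k nearest training
    targets, with features scaled by the importance weights W."""
    Xw_train = X_train * W          # down-weight less important inputs
    Xw_query = X_query * W
    preds = []
    for q in Xw_query:
        d = np.linalg.norm(Xw_train - q, axis=1)   # Euclidean distances
        idx = np.argsort(d)[:k]                    # k nearest neighbors
        preds.append(y_train[idx].mean())
    return np.array(preds)
```

A weight of 0 removes a feature from the distance entirely, which is how low-importance inputs lose influence on the prediction.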
The time series of rainfall-runoff modeling by KNN and KNN-GTO in the third, fourth, and fifth scenarios are compared in Figure 7. In the third and fifth scenarios, both KNN and KNN-GTO struggle to estimate the peak runoff data. In the fourth scenario, however, KNN performs reasonably, and KNN-GTO performs significantly better than the other models. Additionally, KNN-GTO achieves higher accuracy than KNN, demonstrating the capability of GTO to optimize and improve the precision of KNN.

Model bias refers to the presence of systematic errors that can cause a model to make consistently incorrect predictions. Therefore, in this study, the PBias criterion is employed to analyze the modeling bias in the best scenario, i.e., the fourth scenario. The estimated PBias values are listed in Table 6. According to this table, M5, MARS, MNLR, and KNN-GTO had lower PBias during the training period. KNN and ANN underestimated the runoff, whereas LSSVM and LSSVM-GTO overestimated it. During the testing period, the bias of all investigated algorithms increased, owing to the use of new data in this period. In the testing period, the lowest PBias is associated with MARS and the highest with RF. Over all periods, MARS has the lowest PBias, LSSVM-GTO the highest, and KNN-GTO a reasonable PBias compared with the other investigated algorithms. According to [33], model performance is very good, good, and fair for PBias values of less than 10, between 10 and 15, and between 15 and 25, respectively. Hence, in terms of bias over all periods, the model performance is very good.
The convergence of GTO in optimizing KNN in the first to fourth scenarios is illustrated in Figure 11. In the fourth scenario, the minimum value of the MSE is lower, and the convergence speed is higher, than in the other scenarios. Therefore, using IMFs improves the accuracy of rainfall-runoff modeling, and the optimal value of MaxNumIMF is equal to 4.
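The PBias criterion and the performance classes from [33] used above can be expressed compactly. The sketch below follows the common hydrological definition of percent bias (positive values indicate underestimation); sign conventions vary between references, so treat the sign as an assumption.

```python
# Percent bias (PBias) and the performance classes from [33]:
# |PBias| < 10 very good, 10-15 good, 15-25 fair. With this sign
# convention, positive PBias means underestimation and negative
# PBias means overestimation.
import numpy as np

def pbias(observed, simulated):
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return 100.0 * np.sum(observed - simulated) / np.sum(observed)

def pbias_class(p):
    """Map a PBias value to the qualitative classes of [33]."""
    a = abs(p)
    if a < 10:
        return "very good"
    if a <= 15:
        return "good"
    if a <= 25:
        return "fair"
    return "unsatisfactory"
```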
The significant impact of the pre-processing and post-processing methods and of the time-lagged data demonstrates the effectiveness of input selection on modeling accuracy, as confirmed by the results of the third, fourth, and fifth scenarios in Tables 4 and 5. The role of dataset pre-processing and post-processing methods has also been confirmed in other studies [34-36]. Nevertheless, even in these scenarios, algorithms such as LSSVM, M5, MARS, RF, and MNLR do not perform well, while the ANN and LSSVM-GTO algorithms show moderate performance. The better accuracy of the KNN and KNN-GTO algorithms is due to the kernel function, which considers the K nearest neighbor inputs. Also, the higher accuracy of KNN-GTO compared with KNN indicates the success of GTO in finding the optimal parameters of the KNN algorithm.
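The role GTO plays here, searching the KNN hyperparameter space for the values that minimize training MSE, can be illustrated with a much simpler stand-in. The sketch below uses random search in place of the gorilla troop optimizer; it is not an implementation of GTO, and the function names are ours.

```python
# Illustrative stand-in for the optimizer's role: sample candidate
# hyperparameter values (e.g., the number of neighbors k) and keep the
# one with the lowest training MSE. GTO explores the space with
# population-based troop dynamics instead of uniform random sampling.
import random

def tune_k(candidate_ks, mse_of_k, n_iter=20, seed=0):
    """Return (best_k, best_mse) found by random sampling.

    mse_of_k is a callable mapping a candidate k to a training MSE."""
    rng = random.Random(seed)
    best_k, best_mse = None, float("inf")
    for _ in range(n_iter):
        k = rng.choice(candidate_ks)
        mse = mse_of_k(k)
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse
```

In the paper's setting, the objective evaluated for each candidate would be the MSE of a KNN model trained with those hyperparameters on the training set.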

Conclusions
In the present study, a new methodology was introduced for rainfall-runoff modeling. This methodology comprised dataset decomposition, feature selection, dataset reduction, and modeling by nine standalone and hybrid machine learning algorithms. The employed machine learning algorithms included a neural network-based algorithm (ANN), kernel-based algorithms (LSSVM and KNN), tree-based algorithms (M5 and RF), regression-based algorithms (MARS and MNLR), and hybrid algorithms (LSSVM-GTO and KNN-GTO).
The desired values of MAE and RMSE are zero, and their undesired values approach +∞. The desired values of RRMSE lie in the range [0, 0.5]. The R value lies between −1 and 1, with values close to 1 indicating good model performance. NSE = 1 denotes a perfect fit between the model and the data. KGE values range between −∞ and 1, with values close to 1 indicating better model performance.
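The six metrics described above have standard closed forms; a compact sketch follows. KGE is written here in the Gupta et al. (2009) formulation with correlation, variability-ratio, and bias-ratio terms, which we assume matches the paper's usage.

```python
# The six evaluation metrics in their standard forms. RRMSE is the RMSE
# normalized by the mean observation; KGE uses the 2009 formulation
# (r, alpha, beta terms), assumed to match the paper's variant.
import numpy as np

def metrics(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    err = sim - obs
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    rrmse = rmse / np.mean(obs)                    # relative RMSE
    r = np.corrcoef(obs, sim)[0, 1]                # Pearson correlation
    nse = 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    alpha = sim.std() / obs.std()                  # variability ratio
    beta = sim.mean() / obs.mean()                 # bias ratio
    kge = 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return dict(MAE=mae, RMSE=rmse, RRMSE=rrmse, R=r, NSE=nse, KGE=kge)
```

A perfect simulation gives MAE = RMSE = RRMSE = 0 and R = NSE = KGE = 1, consistent with the desired values stated above.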

Figure 3. Map of the study area.


Figure 4 .
Figure 4. Correlation plot for the input and target variables.

Algorithm 2: Feature selection
1: Load input data and target data
2: Apply lag times to input data
3: while i < number of input features do
4:
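The lag-construction step of Algorithm 2 (step 2) can be sketched as follows for a single series; the paper applies lags to both rainfall and runoff predictors, and the function name `make_lagged` is ours.

```python
# Sketch of the lagged-input construction in Algorithm 2: each predictor
# is shifted by the given lag times so that the model can use antecedent
# values (e.g., runoff at t-1, t-2) to estimate the target at time t.
import numpy as np

def make_lagged(series, lags):
    """Return an (n - max(lags)) x len(lags) matrix of lagged values and
    the aligned target vector for a single series."""
    series = np.asarray(series, float)
    m = max(lags)
    # Column j holds series shifted back by lags[j], trimmed so that
    # every row has all of its lagged values available.
    X = np.column_stack([series[m - lag: len(series) - lag] for lag in lags])
    y = series[m:]
    return X, y
```

Stacking such matrices for several predictors yields the candidate input pool from which the while-loop in Algorithm 2 selects features.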

Figure 5. Scheme of the presented framework.


Figure 7. Time series plot of the best ML algorithms for modeling rainfall-runoff.


Figures 8 and 9 show scatter plots of the observed and modeled runoff with the line of perfect fit at 45° for the training and testing periods. Results closer to the 45° line indicate more accurate machine learning algorithms. KNN and KNN-GTO in the third, fourth, and fifth scenarios produce results close to the perfect fit line. In the testing period, KNN and KNN-GTO underestimated the runoff.

Figure 8. Scatter plot of the best ML algorithms for modeling rainfall-runoff in the training period.

Figure 9. Scatter plot of the best ML algorithms for modeling rainfall-runoff in the testing period.

Figure 10.

Table 1. Statistical criteria for runoff modeling.

Table 2. Characteristics of scenarios.

Table 4. Results of rainfall-runoff modeling using machine learning algorithms for the training period.