# An Integrated Statistical-Machine Learning Approach for Runoff Prediction


## Abstract


[…] coefficient of determination (R^{2}), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), and percent bias (PBIAS) indices. In addition to the numerical comparison, the models' performances were evaluated graphically using time-series line diagrams, scatter plots, violin plots, relative error plots, and Taylor diagrams (TD). The comparison results revealed that the heuristic methods gave higher accuracy than the MLR model. Among the machine learning models, the RF (RMSE (m^{3}/s), R^{2}, NSE, and PBIAS (%) = 6.31, 0.96, 0.94, and −0.20 during the training period, respectively, and 5.53, 0.95, 0.92, and −0.20 during the testing period, respectively) surpassed the MARS, SVM, and MLR models in forecasting daily runoff for all cases studied. The RF model outperformed the other models in both the training and testing periods. In summary, the RF model performed best and shows strong potential for runoff prediction in the Gola watershed.

## 1. Introduction

[…]_{means}. It was found that MARS-K_{means} surpassed all other models for multi-step forecasting, i.e., one, six, and twelve hours in advance. In another study, Li et al. [115] evaluated the performances of extreme learning machines (ELM), RF, and SVM for forecasting daily, low, and peak streamflow. They reported the superiority of the ELM model.

## 2. Materials and Methods

#### 2.1. General Description of Study Area

[…]^{2}. The climate of the Gola watershed is mild and generally warm. The minimum and maximum elevations of the watershed are 252 m and 2302 m above mean sea level, respectively. The watershed has a subtropical climate with predominantly seasonal rainfall. The average annual rainfall is 1699 mm and is heavily influenced by the monsoon. Because the watershed lies on the eastern edge of the Himalayan ranges, it is subject to heavy rainfall. The Gola is mainly a spring-fed river and is a source of water for Haldwani and Kathgodam. The monsoon season extends from July to September and produces 90% of the annual rainfall, with the heaviest rainfall in July and August. As a result, the main stream of rain-fed rivers such as the Gola River carries high discharge in these months. The barrage is a landmark for residents and provides irrigation water for the bhabar fields. For these reasons, reliable daily forecasts of river flow are important for avoiding risk, distress, and fatalities.

#### 2.2. Data Acquisition and Input Data Preparation

#### 2.3. Gamma Test

[…] X_{1}, …, X_{n} correspond to the predictor variables, i.e., m variables for a total of N data points, the scalar Y_{i} is the output variable, and Gamma (Γ) is calculated by building up a linear regression between input (X) and output (Y), as:
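The regression equation itself is not reproduced in this text. In the standard Gamma test formulation it takes the form sketched below. The symbols δ, γ, and A are assumptions taken from the general Gamma test literature rather than from this paper: δ is the mean squared distance between each input vector and its k-th nearest neighbour, γ is half the mean squared difference between the corresponding outputs, A is the gradient of the fitted line, and the intercept Γ is the Gamma statistic.

$$\gamma = A\,\delta + \Gamma$$

Read this way, Γ is the value of γ extrapolated to δ = 0, i.e., an estimate of the output variance that a smooth model of the inputs cannot explain.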

[…] V_{ratio}, given by Equation (3), indicates the predictability of the output variable. A model's complexity can be determined from the output Y of Equation (2). A value of Γ close to zero indicates a suitable input variable. A high gradient indicates a complicated model, and a low gradient indicates a simple model. The Gamma value is more reliable when the standard error (SE) of Γ is smaller. V_{ratio}, given in Equation (3), measures the predictability of a variable.
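Equation (3) is missing from this text. Based on the definitions given for it (the output variance and Γ), it is assumed here to be the usual normalization of the Gamma statistic by the variance of the output:

$$V_{ratio} = \frac{\Gamma}{\sigma^{2}(y)}$$

Under this assumed form, V_{ratio} is scale-free: a value near 0 means that almost all of the output variance can be explained by a smooth function of the inputs, and a value near 1 means that almost none of it can.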

Here, σ^{2}(y) is the variance of the output Y, and Γ is the Gamma statistic. When V_{ratio} is near 0, the predictability is higher. A better mathematical model can be built with smaller values of Gamma (Γ), gradient, SE, and V_{ratio}.
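The quantities described above can be computed with a short script. The following is a minimal sketch of the Gamma test using a brute-force nearest-neighbour search, not the software used in the study. The function name `gamma_test`, the parameter `p` (number of near neighbours), and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def gamma_test(X, y, p=10):
    """Estimate the Gamma statistic, the gradient A, and V_ratio.

    X: (M, m) array of input vectors; y: (M,) array of outputs;
    p: number of near neighbours used for the regression (assumed default).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    M = X.shape[0]

    # Pairwise squared Euclidean distances between all input vectors.
    diff = X[:, None, :] - X[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbour

    # Indices of the p nearest neighbours of each point, nearest first.
    nn = np.argsort(dist2, axis=1)[:, :p]
    rows = np.arange(M)[:, None]

    # delta(k): mean squared input distance to the k-th nearest neighbour.
    delta = dist2[rows, nn].mean(axis=0)
    # gamma(k): half the mean squared output difference to that neighbour.
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)

    # Fit gamma = A * delta + Gamma; the intercept is the Gamma statistic.
    A, Gamma = np.polyfit(delta, gamma, 1)
    v_ratio = Gamma / np.var(y)
    return Gamma, A, v_ratio
```

For a candidate input combination, `X` would hold the selected lagged rainfall and runoff columns and `y` the runoff to be predicted; combinations with smaller Γ and V_{ratio} would then be preferred. The pairwise distance matrix needs memory proportional to M², so a tree-based neighbour search would be preferable for long records.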

## 3. Software Application Used in the Study

#### 3.1. Multiple Linear Regression

[…] X_{1}, X_{2}, …, X_{n} = the input variables; α_{0} = the intercept; and α_{1}, α_{2}, …, α_{n} = the regression coefficients.
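The equation these symbols belong to is missing from this text. With the definitions above, the standard multiple linear regression form would be as follows, where Y (the predicted runoff) is a symbol assumed here:

$$Y = \alpha_{0} + \alpha_{1}X_{1} + \alpha_{2}X_{2} + \dots + \alpha_{n}X_{n}$$

The coefficients are typically estimated by ordinary least squares, i.e., by minimizing the sum of squared differences between the observed and fitted values of Y.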

#### 3.2. Multivariate Adaptive Regression Splines (MARS)

_{i}= the corresponding coefficient of ${\mathrm{H}}_{\mathrm{e}\mathrm{i}}\left({\mathrm{X}}_{\mathrm{v}}\left(\mathrm{e},\mathrm{I}\right)\right)$.

_{i}(X) = the splines function; ∂ = the coefficient of the spline function; and i = the total number of functions in the model.
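The spline (basis) functions in MARS are usually built from mirrored hinge functions of the form max(0, x − t) and max(0, t − x), where t is a knot. The sketch below is only an illustration of that idea and not the full MARS algorithm: it skips the forward and backward knot-selection passes and fits coefficients for a single knot supplied by the user. The function names, the knot value, and the synthetic data are hypothetical.

```python
import numpy as np


def hinge_pair(x, knot):
    """Return the mirrored hinge functions max(0, x - knot) and max(0, knot - x)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)


def fit_fixed_knot_spline(x, y, knot):
    """Least-squares fit of y = c0 + c1*max(0, x-knot) + c2*max(0, knot-x)."""
    right, left = hinge_pair(x, knot)
    design = np.column_stack([np.ones_like(right), right, left])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coeffs


# Synthetic piecewise-linear data with a change of slope at x = 4 (hypothetical).
x = np.linspace(0.0, 10.0, 51)
y = 2.0 + 3.0 * np.maximum(0.0, x - 4.0) - 1.5 * np.maximum(0.0, 4.0 - x)
coeffs = fit_fixed_knot_spline(x, y, knot=4.0)
```

Because the data are generated from the same knot, the fitted coefficients recover the intercept and the two slopes used to build `y`. A real MARS implementation would also search over candidate knots and variables and then prune terms.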

#### 3.3. Support Vector Machine (SVM)

[…] x ∈ X ⊆ R^{n} are the training inputs and y ∈ Y ⊆ R^{n} are the training outputs. Assume a nonlinear function f(x) is given by:

$$f(x) = \mathbf{w}^{T}\mathbf{\Phi}(x_{i}) + b$$

where **w** = the weight vector, b = the bias, and **Φ**(x_{i}) = the higher-dimensional feature space obtained by the nonlinear mapping of the input space x. The main objective is to fit the dataset T with a function f(x) that deviates from the training targets by at most ε. The problem is then transformed into a constrained convex optimization problem, as follows:
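The constrained problem is not reproduced in this text. The standard ε-insensitive support vector regression formulation is sketched below as an assumption about what followed. The penalty parameter C and the slack variables ξ_{i} and ξ_{i}^{*} are symbols from the general SVR literature, not from this paper.

$$\min_{\mathbf{w},\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_{i} + \xi_{i}^{*} \right)$$

$$\text{subject to} \quad \begin{cases} y_{i} - \mathbf{w}^{T}\mathbf{\Phi}(x_{i}) - b \le \varepsilon + \xi_{i} \\ \mathbf{w}^{T}\mathbf{\Phi}(x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*} \\ \xi_{i},\ \xi_{i}^{*} \ge 0, \quad i = 1, \dots, N \end{cases}$$

In this form, errors smaller than ε are ignored, larger errors are penalized linearly through the slack variables, and C sets the trade-off between the flatness of f(x) and the tolerance for such errors.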

#### 3.4. Random Forest (RF)

_{i}= the measured variable value and i = the mean of all out-of-bag (OOB) predictions.
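The equation these definitions refer to is missing here. A common way to write the out-of-bag error of a random forest regression, which appears consistent with the two definitions above, is sketched below. The symbols y_{i}, \bar{y}_{i}^{OOB}, and n are assumptions.

$$MSE_{OOB} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{i} - \bar{y}_{i}^{OOB} \right)^{2}$$

Here y_{i} would be the measured value for sample i, \bar{y}_{i}^{OOB} the mean prediction of the trees that did not see sample i during training, and n the number of samples.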

## 4. Performance Evaluation of Models

[…] (R^{2}), root mean square error (RMSE), Nash–Sutcliffe efficiency (NSE), and percent bias (PBIAS), and visual interpretation using a line diagram, scatter diagram, violin plot, and relative error and Taylor diagrams. The R^{2}, RMSE [12,47,117,137,138], NSE [139], and PBIAS [119,140,141] are described as:
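The equations are not reproduced in this text. The commonly used definitions are sketched below as an assumption. The symbols are chosen here for illustration: O_{i} and P_{i} are the observed and predicted runoff, \bar{O} and \bar{P} are their means, and N is the number of observations. R^{2} is written as the squared Pearson correlation, and PBIAS uses the sign convention in which positive values indicate underestimation, which matches how the values are interpreted in the results.

$$R^{2} = \left[ \frac{\sum_{i=1}^{N} (O_{i} - \bar{O})(P_{i} - \bar{P})}{\sqrt{\sum_{i=1}^{N} (O_{i} - \bar{O})^{2}} \, \sqrt{\sum_{i=1}^{N} (P_{i} - \bar{P})^{2}}} \right]^{2}$$

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (O_{i} - P_{i})^{2}}$$

$$NSE = 1 - \frac{\sum_{i=1}^{N} (O_{i} - P_{i})^{2}}{\sum_{i=1}^{N} (O_{i} - \bar{O})^{2}}$$

$$PBIAS = 100 \times \frac{\sum_{i=1}^{N} (O_{i} - P_{i})}{\sum_{i=1}^{N} O_{i}}$$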

R^{2} is the ratio of explained variation to the total variation [142]. The coefficient of multiple determination measures the percentage of variation in the dependent variable that can be explained by variations in the independent variables taken together [143]. It ranges from 0 to 1; a higher value indicates less error variance, and a value greater than 0.5 is generally considered acceptable [144,145]. It is widely used in model evaluation. This statistic is highly sensitive to outliers and insensitive to additive and proportional differences between observed and predicted data [146]. The square root of the average of the squared errors is called the root mean square error (RMSE) [104]. It is an excellent general-purpose error metric commonly used for numerical prediction models. RMSE is a good measure of accuracy, but it can only be used to compare the prediction errors of different models or model configurations for a particular variable, not between different variables, because it is scale-dependent.

[…] Higher R^{2} and NSE values, and lower RMSE and PBIAS values, indicate a relatively better model for the simulation of Q_{t}.
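The four indices can be computed directly from paired observed and simulated series. The sketch below is one possible implementation, not the code used in the study. The function names are illustrative, R^{2} is taken as the squared Pearson correlation, and PBIAS follows the sign convention in which positive values indicate underestimation; both choices are assumptions consistent with how the indices are described and interpreted in this paper.

```python
import numpy as np


def r_squared(obs, sim):
    """Coefficient of determination as the squared Pearson correlation."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.corrcoef(obs, sim)[0, 1] ** 2)


def rmse(obs, sim):
    """Root mean square error, in the units of the variable (e.g., m^3/s)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def pbias(obs, sim):
    """Percent bias: positive means underestimation, negative overestimation."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(100.0 * np.sum(obs - sim) / np.sum(obs))
```

A simulation that is shifted upward by a constant illustrates the difference between the indices: R^{2} stays at 1 because it is insensitive to additive differences, whereas RMSE, NSE, and PBIAS all register the shift.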

## 5. Results and Discussion

[…] (R^{2}), the Nash–Sutcliffe coefficient of efficiency (NSE), and percent bias (PBIAS).

#### 5.1. Selection of Best Input Combination

[…] (Q_{(t)}), previous day runoff (Q_{(t−1)}), previous two days' runoff (Q_{(t−2)}), previous three days' runoff (Q_{(t−3)}), present-day rainfall (R_{(t)}), previous day rainfall (R_{(t−1)}), previous two days' rainfall (R_{(t−2)}), and previous three days' rainfall (R_{(t−3)}) were used for Gamma testing (Table 2). The input combinations with low Gamma (Γ) and V_{ratio} values were considered the most appropriate for developing the models [149]. The Gamma value and V_{ratio} decreased with an increase in the number of predictors. However, after a certain point, the Gamma value began to increase again. This might be due to the following two reasons: (i) the inclusion of a high number of input variables may cause overfitting, and (ii) the inclusion of a smaller number of input variables leaves the model unable to correctly explain the total variance of the forecasted subset. The minimum Gamma (Γ) and V_{ratio} values were 0.407 and 0.191, respectively, for the M19 predictor set. Hence, the M19 predictor combination was employed for further analysis. It could be stated that using rainfall with a two-day lag and the discharge with one- to three-day lags as predictors would produce an optimum rainfall–runoff model. The Gamma value and V_{ratio} also increased when the rainfall with a three-day lag was included in the predictors. This might be due to the low correlation of this predictor variable with the predictand.
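Building the candidate predictors listed above amounts to aligning lagged copies of the daily runoff and rainfall series with the current-day runoff. A minimal sketch follows; the function name `build_lagged_inputs` and its default lag sets (the full candidate pool of runoff lags 1 to 3 days and rainfall lags 0 to 3 days) are assumptions for illustration and do not represent the M19 combination itself.

```python
import numpy as np


def build_lagged_inputs(runoff, rainfall, q_lags=(1, 2, 3), r_lags=(0, 1, 2, 3)):
    """Return (X, y), where y is Q(t) and the columns of X are lagged Q and R.

    Column order: Q(t - k) for k in q_lags, then R(t - k) for k in r_lags.
    Rows start at the first day for which every requested lag is available.
    """
    runoff = np.asarray(runoff, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    if runoff.shape != rainfall.shape:
        raise ValueError("runoff and rainfall must have the same length")

    max_lag = max(max(q_lags), max(r_lags))
    t = np.arange(max_lag, len(runoff))  # time indices with a full history

    columns = [runoff[t - k] for k in q_lags] + [rainfall[t - k] for k in r_lags]
    X = np.column_stack(columns)
    y = runoff[t]
    return X, y
```

A subset of predictors, such as one of the combinations screened with the Gamma test, can be formed by passing the corresponding lag tuples, for example `q_lags=(1, 2, 3)` together with a shorter `r_lags`.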

#### 5.2. Application of Machine Learning Techniques for Rainfall–Runoff Modeling

#### 5.2.1. MLR Model for Runoff Prediction

[…] RMSE, R^{2}, NSE, and PBIAS were 13.44 m^{3}/s, 0.78, 0.72, and 0.00%, respectively, during the training period and 12.67 m^{3}/s, 0.67, 0.51, and 0.80%, respectively, during the testing period for the MLR model (Table 3). The model showed low bias in training and underestimated runoff values in the testing period. The MLR model did not map the runoff of the Gola watershed satisfactorily. According to Figure 5a,b, most of the simulated runoff values (m^{3}/s) fall outside the 95% confidence intervals, which indicates overestimation and underestimation of the target points in both periods. According to these results, the model's performance is not acceptable.

#### 5.2.2. MARS Model for Runoff Prediction

[…] RMSE, R^{2}, NSE, and PBIAS were 12.55 m^{3}/s, 0.81, 0.76, and 0.00%, respectively, during the training period, and 10.07 m^{3}/s, 0.79, 0.74, and 0.20%, respectively, during the testing period for the MARS model (Table 4). The temporal variations and scatter plots of the observed and predicted runoff during the training and testing periods are displayed in Figure 6 and Figure 7, respectively. As can be seen from Figure 6a,b, there is good agreement between the observed runoff values and the corresponding values simulated by the MARS model in the training and testing periods, respectively. The model followed the trend of the observed runoff of the Gola watershed satisfactorily, but the peak runoff values were not predicted with great accuracy. The PBIAS of 0.00% during the training period indicates an unbiased simulation, and the positive value of 0.20% indicates slight underestimation during the testing period. According to Figure 7a,b, some of the simulated runoff values (m^{3}/s) fall outside the 95% confidence intervals, which indicates overestimation and underestimation of the target points in both periods. Overall, the model's performance is acceptable according to the presented results.

#### 5.2.3. SVM Model for Runoff Prediction

[…] RMSE, R^{2}, NSE, and PBIAS for the SVM model were 12.614 m^{3}/s, 0.83, 0.81, and −3.90% for the training period and 14.02 m^{3}/s, 0.60, 0.60, and 0.40% for the testing period, respectively (Table 3). The R^{2} value (0.83) shows a strong linear relationship between the observed and predicted values in the training period, and the value of 0.60 was satisfactory during the testing period. The NSE value (0.81) revealed good predictive skill during training, and the value of 0.60 shows satisfactory predictive skill during testing. The PBIAS value of −3.90% during the training period shows that the model overpredicted the runoff values, whereas the value of 0.40% during the testing period shows that the model underpredicted them. According to Figure 9a,b, some of the simulated runoff values (m^{3}/s) fall outside the 95% confidence intervals, which indicates underestimation in the training dataset and both overestimation and underestimation of the target points in the testing dataset. Overall, the model's performance is acceptable according to the presented results.

#### 5.2.4. Random Forest Model for Runoff Prediction

[…] m^{3}/s to 6.480 m^{3}/s, and the R^{2} values were in the range of 0.95 to 0.96 during the training period. The RMSE values lay in the range of 5.430 m^{3}/s to 5.677 m^{3}/s, and the R^{2} values lay in the range of 0.94 to 0.95 during the testing period. From the evaluation of all of the results, it was observed that the RF-28 model was superior to the other RF models.

[…] RMSE, R^{2}, NSE, and PBIAS values of the RF-28 model were 6.318 m^{3}/s, 0.96, 0.94, and −0.20% for the training period and 5.565 m^{3}/s, 0.95, 0.92, and −0.10% for the testing period (Table 3). The low RMSE values show a concentration of data around the best-fit line. The R^{2} values (0.96 and 0.95) during the training and testing periods revealed a strong linear relationship between the observed and predicted runoff values. The NSE values were 0.94 and 0.92 during the training and testing periods, respectively, which shows the good predictive ability of the model. The PBIAS values revealed that the model slightly overpredicted the runoff values during training and testing. The temporal variations and scatter plots of the observed and predicted runoff during the training and testing periods are displayed in Figure 10 and Figure 11, respectively. The time-series plot shows that the model slightly overpredicted in both the training and testing periods (Figure 10a,b). The simulation results of the RF model are also shown in Figure 11a,b: except for a few overestimated and underestimated cases in the testing period, all of the simulated data lie within the 95% confidence intervals, which confirms the model's accuracy. According to the statistics presented in Figure 11a,b, it can be concluded that the RF model has a high ability to simulate the runoff value.

#### 5.3. Model Comparison

## 6. Conclusions

[…] (R^{2}) statistics, the models were ranked as RF, SVM, MARS, and MLR for the training dataset and RF, MARS, MLR, and SVM for the testing dataset. Based on the NSE statistics, the models were ranked as RF, SVM, MARS, and MLR for training and RF, MARS, SVM, and MLR for testing. Based on the quantitative analysis and indices, the ranking of the models was RF, MARS, SVM, and MLR for the training period, and RF, MARS, MLR, and SVM for the testing period. These differences may be due to data division, input uncertainties, and model parameter optimization. To determine the consistency of the models, they should be tested using varying data lengths and training–testing splits. The results suggest that the accuracy of the MLR, MARS, SVM, and RF techniques was adequate when rainfall and runoff were used as modeling inputs. The results varied among the different machine learning models. The evaluation of model performance revealed that the RF model outperformed the other regression models in predicting the runoff of the Gola watershed.
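The rankings stated above can be checked against the index values reported in Section 5.2. The sketch below is only an illustration of that check: it collects the reported RMSE, R^{2}, and NSE values for each model (using the RF-28 values for RF) and sorts the models per index. The dictionary layout and the function name `rank_models` are assumptions.

```python
# Index values as reported in Section 5.2 (RMSE in m^3/s).
scores = {
    "MLR": {
        "train": {"RMSE": 13.44, "R2": 0.78, "NSE": 0.72},
        "test": {"RMSE": 12.67, "R2": 0.67, "NSE": 0.51},
    },
    "MARS": {
        "train": {"RMSE": 12.55, "R2": 0.81, "NSE": 0.76},
        "test": {"RMSE": 10.07, "R2": 0.79, "NSE": 0.74},
    },
    "SVM": {
        "train": {"RMSE": 12.614, "R2": 0.83, "NSE": 0.81},
        "test": {"RMSE": 14.02, "R2": 0.60, "NSE": 0.60},
    },
    "RF": {
        "train": {"RMSE": 6.318, "R2": 0.96, "NSE": 0.94},
        "test": {"RMSE": 5.565, "R2": 0.95, "NSE": 0.92},
    },
}


def rank_models(period, metric):
    """Return model names ordered from best to worst for one period and index."""
    higher_is_better = metric in ("R2", "NSE")  # RMSE: lower is better
    return sorted(
        scores,
        key=lambda model: scores[model][period][metric],
        reverse=higher_is_better,
    )


for period in ("train", "test"):
    for metric in ("R2", "NSE", "RMSE"):
        print(period, metric, rank_models(period, metric))
```

Sorting by R^{2} and NSE reproduces the first two rankings given above, and sorting by RMSE gives the same order as the third ranking. That the third ranking reflects RMSE is an inference from these numbers, not something the text states.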


## References

- Alizadeh, Z.; Yazdi, J.; Najafi, M.S. Improving the outputs of regional heavy rainfall forecasting models using an adaptive real-time approach. Hydrol. Sci. J. **2022**, 67, 550–563.
- Khan, M.T.; Shoaib, M.; Hammad, M.; Salahudin, H.; Ahmad, F.; Ahmad, S. Application of machine learning techniques in rainfall–runoff modelling of the soan river basin, Pakistan. Water **2021**, 13, 3528.
- Barrera-Animas, A.Y.; Oyedele, L.O.; Bilal, M.; Akinosho, T.D.; Delgado, J.M.D.; Akanbi, L.A. Rainfall prediction: A comparative analysis of modern machine learning algorithms for time-series forecasting. Mach. Learn. Appl. **2022**, 7, 100204.
- Basha, C.Z.; Bhavana, N.; Bhavya, P.; Sowmya, V. Rainfall prediction using machine learning & deep learning techniques. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; pp. 92–97.
- Yang, T.-H.; Yang, S.-C.; Ho, J.-Y.; Lin, G.-F.; Hwang, G.-D.; Lee, C.-S. Flash flood warnings using the ensemble precipitation forecasting technique: A case study on forecasting floods in Taiwan caused by typhoons. J. Hydrol. **2015**, 520, 367–378.
- Liu, J.; Wang, J.; Pan, S.; Tang, K.; Li, C.; Han, D. A real-time flood forecasting system with dual updating of the NWP rainfall and the river flow. Nat. Hazards **2015**, 77, 1161–1182.
- Mosavi, A.; Ozturk, P.; Chau, K. Flood Prediction Using Machine Learning Models: Literature Review. Water **2018**, 10, 1536.
- You, G.J.-Y.; Thum, B.-H.; Lin, F.-H. The examination of reproducibility in hydro-ecological characteristics by daily synthetic flow models. J. Hydrol. **2014**, 511, 904–919.
- Le, T.-T.; Pham, B.T.; Ly, H.-B.; Shirzadi, A.; Le, L.M. Development of 48-hour precipitation forecasting model using nonlinear autoregressive neural network. In Innovation for Sustainable Infrastructure; Ha-Minh, C., van Dao, D., Benboudjema, F., Derrible, S., Huynh, D.V.K., Tang, A.M., Eds.; Lecture Notes in Civil Engineering; Springer: Singapore, 2020; Volume 54, pp. 1191–1196. ISBN 978-981-150-802-8.
- Amin, I.; Kumar, R.; Jhajharia, D.; Sherring, A. Estimation and validation of runoff and sediment models for Dachigam watershed of Kashmir Valley. Indian J. Soil Conserv. **2015**, 43, 9–14.
- Kumar, R.; Manzoor, S.; Vishwakarma, D.K.; Al-Ansari, N.; Kushwaha, N.L.; Elbeltagi, A.; Sushanth, K.; Prasad, V.; Kuriqi, A. Assessment of Climate Change Impact on Snowmelt Runoff in Himalayan Region. Sustainability **2022**, 14, 1150.
- Vishwakarma, D.K.; Kumar, R.; Pandey, K.; Singh, V.; Kushwaha, K.S. Modeling of Rainfall and Ground Water Fluctuation of Gonda District Uttar Pradesh, India. Int. J. Curr. Microbiol. Appl. Sci. **2018**, 7, 2613–2618.
- Kumar, M.; Kumar, R.; Rajput, T.B.S.; Patel, N. Efficient Design of Drip Irrigation System using Water and Fertilizer Application Uniformity at Different Operating Pressures in a Semi-Arid Region of India. Irrig. Drain. **2017**, 66, 316–326.
- Thomas, D.S.G.; Twyman, C.; Osbahr, H.; Hewitson, B. Adaptation to climate change and variability: Farmer responses to intra-seasonal precipitation trends in South Africa. Clim. Chang. **2007**, 83, 301–322.
- Kramer, K.L.; Hackman, J. Scaling climate change to human behavior predicting good and bad years for Maya farmers. Am. J. Hum. Biol. **2021**, 33, e23524.
- Zhao, Q.; Ma, X.; Liang, L.; Yao, W. Spatial–Temporal Variation Characteristics of Multiple Meteorological Variables and Vegetation over the Loess Plateau Region. Appl. Sci. **2020**, 10, 1000.
- Turgut, M.S.; Turgut, O.E.; Afan, H.A.; El-Shafie, A. A novel Master–Slave optimization algorithm for generating an optimal release policy in case of reservoir operation. J. Hydrol. **2019**, 577, 123959.
- Tikhamarine, Y.; Souag-Gamane, D.; Ahmed, A.N.; Sammen, S.S.; Kisi, O.; Huang, Y.F.; El-Shafie, A. Rainfall-runoff modelling using improved machine learning methods: Harris hawks optimizer vs. particle swarm optimization. J. Hydrol. **2020**, 589, 125133.
- Banadkooki, F.B.; Ehteram, M.; Ahmed, A.N.; Fai, C.M.; Afan, H.A.; Ridwam, W.M.; Sefelnasr, A.; El-Shafie, A. Precipitation Forecasting Using Multilayer Neural Network and Support Vector Machine Optimization Based on Flow Regime Algorithm Taking into Account Uncertainties of Soft Computing Models. Sustainability **2019**, 11, 6681.
- Tikhamarine, Y.; Malik, A.; Kumar, A.; Souag-Gamane, D.; Kisi, O. Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol. Sci. J. **2019**, 64, 1824–1842.
- Chang, T.K.; Talei, A.; Alaghmand, S.; Ooi, M.P.-L. Choice of rainfall inputs for event-based rainfall-runoff modeling in a catchment with multiple rainfall stations using data-driven techniques. J. Hydrol. **2017**, 545, 100–108.
- Tokar, A.S.; Johnson, P.A. Rainfall-runoff modeling using artificial neural networks. J. Hydrol. Eng. **1999**, 4, 232–239.
- Al Sawaf, M.B.; Kawanisi, K.; Jlilati, M.N.; Xiao, C.; Bahreinimotlagh, M. Extent of detection of hidden relationships among different hydrological variables during floods using data-driven models. Environ. Monit. Assess. **2021**, 193, 692.
- Peel, M.C.; McMahon, T.A. Historical development of rainfall-runoff modeling. WIREs Water **2020**, 7, e1471.
- Moradkhani, H.; Sorooshian, S. General review of rainfall-runoff modeling: Model calibration, data assimilation, and uncertainty analysis. In Hydrological Modelling and the Water Cycle; Sorooshian, S., Hsu, K.-L., Coppola, E., Tomassetti, B., Verdecchia, M., Visconti, G., Eds.; Water Science and Technology Library; Springer: Berlin/Heidelberg, Germany, 2008; Volume 63, pp. 1–24. ISBN 978-354-077-843-1.
- Daniell, T.M. Neural networks. Applications in hydrology and water resources engineering. In Proceedings of the National Conference Publication—Institute of Engineers, Perth, Australia; 1991.
- French, M.N.; Krajewski, W.F.; Cuykendall, R.R. Rainfall forecasting in space and time using a neural network. J. Hydrol. **1992**, 137, 1–31.
- Asadi, H.; Shahedi, K.; Jarihani, B.; Sidle, R.C. Rainfall-Runoff Modelling Using Hydrological Connectivity Index and Artificial Neural Network Approach. Water **2019**, 11, 212.
- Dash, Y.; Mishra, S.K.; Panigrahi, B.K. Rainfall prediction for the Kerala state of India using artificial intelligence approaches. Comput. Electr. Eng. **2018**, 70, 66–73.
- Chau, K.W.; Wu, C.L. A hybrid model coupled with singular spectrum analysis for daily rainfall prediction. J. Hydroinform. **2010**, 12, 458–473.
- Zounemat-kermani, M.; Kisi, O.; Rajaee, T. Performance of radial basis and LM-feed forward artificial neural networks for predicting daily watershed runoff. Appl. Soft Comput. **2013**, 13, 4633–4644.
- Harshburger, B.J.; Walden, V.P.; Humes, K.S.; Moore, B.C.; Blandford, T.R.; Rango, A. Generation of Ensemble Streamflow Forecasts Using an Enhanced Version of the Snowmelt Runoff Model. JAWRA J. Am. Water Resour. Assoc. **2012**, 48, 643–655.
- Humphrey, G.B.; Gibbs, M.S.; Dandy, G.C.; Maier, H.R. A hybrid approach to monthly streamflow forecasting: Integrating hydrological model outputs into a Bayesian artificial neural network. J. Hydrol. **2016**, 540, 623–640.
- Kisi, O.; Cimen, M. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol. **2011**, 399, 132–140.
- Thapa, S.; Zhao, Z.; Li, B.; Lu, L.; Fu, D.; Shi, X.; Tang, B.; Qi, H. Snowmelt-Driven Streamflow Prediction Using Machine Learning Techniques (LSTM, NARX, GPR, and SVR). Water **2020**, 12, 1734.
- Nourani, V.; Andalib, G. Wavelet based artificial intelligence approaches for prediction of hydrological time series. In Artificial Life and Computational Intelligence. ACALCI 2015; Chalup, S.K., Blair, A.D., Randall, M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 8955.
- Idrees, M.B.; Jehanzaib, M.; Kim, D.; Kim, T.-W. Comprehensive evaluation of machine learning models for suspended sediment load inflow prediction in a reservoir. Stoch. Environ. Res. Risk Assess. **2021**, 35, 1805–1823.
- Kakaei Lafdani, E.; Moghaddam Nia, A.; Ahmadi, A. Daily suspended sediment load prediction using artificial neural networks and support vector machines. J. Hydrol. **2013**, 478, 50–62.
- Rajaee, T.; Mirbagheri, S.A.; Zounemat-Kermani, M.; Nourani, V. Daily suspended sediment concentration simulation using ANN and neuro-fuzzy models. Sci. Total Environ. **2009**, 407, 4916–4927.
- Melesse, A.M.; Ahmad, S.; McClain, M.E.; Wang, X.; Lim, Y.H. Suspended sediment load prediction of river systems: An artificial neural network approach. Agric. Water Manag. **2011**, 98, 855–866.
- Gupta, D.; Hazarika, B.B.; Berlin, M.; Sharma, U.M.; Mishra, K. Artificial intelligence for suspended sediment load prediction: A review. Environ. Earth Sci. **2021**, 80, 346.
- Azamathulla, H.M.; Cuan, Y.C.; Ghani, A.A.; Chang, C.K. Suspended sediment load prediction of river systems: GEP approach. Arab. J. Geosci. **2013**, 6, 3469–3480.
- Nguyen, D.T.; Chen, S.-T. Real-Time Probabilistic Flood Forecasting Using Multiple Machine Learning Methods. Water **2020**, 12, 787.
- Al-Abadi, A.M. Modeling of stage–discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi–Sugeno inference system technique: A comparative study. Appl. Water Sci. **2016**, 6, 407–420.
- Kisi, Ö.; Çobaner, M. Modeling River Stage-Discharge Relationships Using Different Neural Network Computing Techniques. Clean Soil Air Water **2009**, 37, 160–169.
- Lohani, A.K.; Goel, N.K.; Bhatia, K.K.S. Takagi–Sugeno fuzzy inference system for modeling stage–discharge relationship. J. Hydrol. **2006**, 331, 146–160.
- Shukla, R.; Kumar, P.; Vishwakarma, D.K.; Ali, R.; Kumar, R.; Kuriqi, A. Modeling of stage-discharge using back propagation ANN-, ANFIS-, and WANN-based computing techniques. Theor. Appl. Climatol. **2022**, 147, 867–889.
- Ajmera, T.K.; Goyal, M.K. Development of stage–discharge rating curve using model tree and neural networks: An application to Peachtree Creek in Atlanta. Expert Syst. Appl. **2012**, 39, 5702–5710.
- Araghi, A.; Mousavi-Baygi, M.; Adamowski, J.; Martinez, C.; van der Ploeg, M. Forecasting soil temperature based on surface air temperature using a wavelet artificial neural network. Meteorol. Appl. **2017**, 24, 603–611.
- Feng, Y.; Cui, N.; Hao, W.; Gao, L.; Gong, D. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma **2019**, 338, 67–77.
- Bilgili, M. Prediction of soil temperature using regression and artificial neural network models. Meteorol. Atmos. Phys. **2010**, 110, 59–70.
- Hariharan, G.; Kannan, K.; Sharma, K.R. Haar wavelet in estimating depth profile of soil temperature. Appl. Math. Comput. **2009**, 210, 119–125.
- Singh, V.K.; Singh, B.P.; Kisi, O.; Kushwaha, D.P. Spatial and multi-depth temporal soil temperature assessment by assimilating satellite imagery, artificial intelligence and regression based models in arid area. Comput. Electron. Agric. **2018**, 150, 205–219.
- Mehdizadeh, S.; Fathian, F.; Safari, M.J.S.; Khosravi, A. Developing novel hybrid models for estimation of daily soil temperature at various depths. Soil Tillage Res. **2020**, 197, 104513.
- Wu, W.; Tang, X.-P.; Guo, N.-J.; Yang, C.; Liu, H.-B.; Shang, Y.-F. Spatiotemporal modeling of monthly soil temperature using artificial neural networks. Theor. Appl. Climatol. **2013**, 113, 481–494.
- Seifi, A.; Ehteram, M.; Nayebloei, F.; Soroush, F.; Gharabaghi, B.; Torabi Haghighi, A. GLUE uncertainty analysis of hybrid models for predicting hourly soil temperature and application wavelet coherence analysis for correlation with meteorological variables. Soft Comput. **2021**, 25, 10723–10748.
- Ali Ghorbani, M.; Kazempour, R.; Chau, K.-W.; Shamshirband, S.; Taherei Ghazvinei, P. Forecasting pan evaporation with an integrated artificial neural network quantum-behaved particle swarm optimization model: A case study in Talesh, Northern Iran. Eng. Appl. Comput. Fluid Mech. **2018**, 12, 724–737.
- Terzi, Ö. Daily pan evaporation estimation using gene expression programming and adaptive neural-based fuzzy inference system. Neural Comput. Appl. **2013**, 23, 1035–1044.
- Guven, A.; Kişi, Ö. Daily pan evaporation modeling using linear genetic programming technique. Irrig. Sci. **2011**, 29, 135–145.
- Kushwaha, N.L.; Rajput, J.; Elbeltagi, A.; Elnaggar, A.Y.; Sena, D.R.; Vishwakarma, D.K.; Mani, I.; Hussein, E.E. Data Intelligence Model and Meta-Heuristic Algorithms-Based Pan Evaporation Modelling in Two Different Agro-Climatic Zones: A Case Study from Northern India. Atmosphere **2021**, 12, 1654.
- Piri, J.; Amin, S.; Moghaddamnia, A.; Keshavarz, A.; Han, D.; Remesan, R. Daily Pan Evaporation Modeling in a Hot and Dry Climate. J. Hydrol. Eng. **2009**, 14, 803–811.
- Shabani, S.; Samadianfard, S.; Sattari, M.T.; Shamshirband, S.; Mosavi, A.; Kmet, T.; Várkonyi-Kóczy, A.R. Modeling daily pan evaporation in humid climates using Gaussian Process Regression. arXiv **2019**, arXiv:1908.04267.
- Kim, S.; Shiri, J.; Singh, V.P.; Kisi, O.; Landeras, G. Predicting daily pan evaporation by soft computing models with limited climatic data. Hydrol. Sci. J. **2015**, 60, 1120–1136.
- Keshtegar, B.; Piri, J.; Kisi, O. A nonlinear mathematical modeling of daily pan evaporation based on conjugate gradient method. Comput. Electron. Agric. **2016**, 127, 120–130.
- Kumar, M.; Kumari, A.; Kumar, D.; Al-Ansari, N.; Ali, R.; Kumar, R.; Kumar, A.; Elbeltagi, A.; Kuriqi, A. The superiority of data-driven techniques for estimation of daily pan evaporation. Atmosphere **2021**, 12, 701.
- Malik, A.; Tikhamarine, Y.; Al-Ansari, N.; Shahid, S.; Sekhon, H.S.; Pal, R.K.; Rai, P.; Pandey, K.; Singh, P.; Elbeltagi, A.; et al. Daily pan-evaporation estimation in different agro-climatic zones using novel hybrid support vector regression optimized by Salp swarm algorithm in conjunction with gamma test. Eng. Appl. Comput. Fluid Mech. **2021**, 15, 1075–1094.
- Tabari, H.; Marofi, S.; Sabziparvar, A.-A. Estimation of daily pan evaporation using artificial neural network and multivariate non-linear regression. Irrig. Sci. **2010**, 28, 399–406.
- Bhagwat, S.; Kashyap, P.S.; Singh, B.P.; Singh, V.K. Daily pan evaporation modeling in hilly region of Uttarakhand using artificial neural network. Indian J. Ecol. **2017**, 44, 467–473.
- Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J. Hydrol. **2019**, 574, 1029–1041.
- Nourani, V.; Elkiran, G.; Abdullahi, J. Multi-step ahead modeling of reference evapotranspiration using a multi-model approach. J. Hydrol. **2020**, 581, 124434.
- Wen, X.; Si, J.; He, Z.; Wu, J.; Shao, H.; Yu, H. Support-Vector-Machine-Based Models for Modeling Daily Reference Evapotranspiration with Limited Climatic Data in Extreme Arid Regions. Water Resour. Manag. **2015**, 29, 3195–3209.
- Mor, N.; Jhajharia, D. Time series modelling of monthly reference evapotranspiration for Bikaner, Rajasthan (India). Indian J. Soil Conserv. **2018**, 46, 42–51.
- Feng, Y.; Peng, Y.; Cui, N.; Gong, D.; Zhang, K. Modeling reference evapotranspiration using extreme learning machine and generalized regression neural network only with temperature data. Comput. Electron. Agric. **2017**, 136, 71–78.
- Dong, L.; Zeng, W.; Wu, L.; Lei, G.; Chen, H.; Srivastava, A.K.; Gaiser, T. Estimating the Pan Evaporation in Northwest China by Coupling CatBoost with Bat Algorithm. Water **2021**, 13, 256.
- Fan, J.; Yue, W.; Wu, L.; Zhang, F.; Cai, H.; Wang, X.; Lu, X.; Xiang, Y. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. **2018**, 263, 225–241.
- Kim, S.; Kim, H.S. Neural networks and genetic algorithm approach for nonlinear evaporation and evapotranspiration modeling. J. Hydrol. **2008**, 351, 299–317.
- Elbeltagi, A.; Raza, A.; Hu, Y.; Al-Ansari, N.; Kushwaha, N.L.; Srivastava, A.; Kumar Vishwakarma, D.; Zubair, M. Data intelligence and hybrid metaheuristic algorithms-based estimation of reference evapotranspiration. Appl. Water Sci. **2022**, 12, 152.
- Elbeltagi, A.; Kushwaha, N.L.; Rajput, J.; Vishwakarma, D.K.; Kulimushi, L.C.; Kumar, M.; Zhang, J.; Pande, C.B.; Choudhari, P.; Meshram, S.G.; et al. Modelling daily reference evapotranspiration based on stacking hybridization of ANN with meta-heuristic algorithms under diverse agro-climatic conditions. Stoch. Environ. Res. Risk Assess. **2022**.
- Singh, V.K.; Panda, K.C.; Sagar, A.; Al-Ansari, N.; Duan, H.-F.; Paramaguru, P.K.; Vishwakarma, D.K.; Kumar, A.; Kumar, D.; Kashyap, P.S.; et al. Novel Genetic Algorithm (GA) based hybrid machine learning-pedotransfer Function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity. Eng. Appl. Comput. Fluid Mech.
**2022**. [Google Scholar] [CrossRef] - Singh, V.K.; Panda, K.C.; Sagar, A.; Al-Ansari, N.; Duan, H.-F.; Paramaguru, P.K.; Vishwakarma, D.K.; Kumar, A.; Kumar, D.; Kashyap, P.S.; et al. Novel Genetic Algorithm (GA) based hybrid machine learning-pedotransfer Function (ML-PTF) for prediction of spatial pattern of saturated hydraulic conductivity. Eng. Appl. Comput. Fluid Mech.
**2022**, 16, 1082–1099. [Google Scholar] [CrossRef] - Sihag, P.; Tiwari, N.K.; Ranjan, S. Modelling of infiltration of sandy soil using gaussian process regression. Model. Earth Syst. Environ.
**2017**, 3, 1091–1100. [Google Scholar] [CrossRef] - Sihag, P.; Tiwari, N.K.; Ranjan, S. Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH J. Hydraul. Eng.
**2019**, 25, 132–142. [Google Scholar] [CrossRef] - Sihag, P.; Tiwari, N.K.; Ranjan, S. Estimation and inter-comparison of infiltration models. Water Sci.
**2017**, 31, 34–43. [Google Scholar] [CrossRef] [Green Version] - Singh, B.; Sihag, P.; Parsaie, A.; Angelaki, A. Comparative analysis of artificial intelligence techniques for the prediction of infiltration process. Geol. Ecol. Landsc.
**2021**, 5, 109–118. [Google Scholar] [CrossRef] - Sihag, P.; Singh, B.; Sepah Vand, A.; Mehdipour, V. Modeling the infiltration process with soft computing techniques. ISH J. Hydraul. Eng.
**2020**, 26, 138–152. [Google Scholar] [CrossRef] - Singh, B.; Sihag, P.; Pandhiani, S.M.; Debnath, S.; Gautam, S. Estimation of permeability of soil using easy measured soil parameters: Assessing the artificial intelligence-based models. ISH J. Hydraul. Eng.
**2021**, 27, 38–48. [Google Scholar] [CrossRef] - Sihag, P.; Kumar, M.; Singh, B. Assessment of infiltration models developed using soft computing techniques. Geol. Ecol. Landsc.
**2021**, 5, 241–251. [Google Scholar] [CrossRef] - Sihag, P.; Singh, V.P.; Angelaki, A.; Kumar, V.; Sepahvand, A.; Golia, E. Modelling of infiltration using artificial intelligence techniques in semi-arid Iran. Hydrol. Sci. J.
**2019**, 64, 1647–1658. [Google Scholar] [CrossRef] - Singh, V.K.; Kumar, D.; Kashyap, P.S.; Singh, P.K.; Kumar, A.; Singh, S.K. Modelling of soil permeability using different data driven algorithms based on physical properties of soil. J. Hydrol.
**2020**, 580, 124223. [Google Scholar] [CrossRef] - Elbeltagi, A.; Pande, C.B.; Kouadri, S.; Islam, A.R.M.T. Applications of various data-driven models for the prediction of groundwater quality index in the Akot basin, Maharashtra, India. Environ. Sci. Pollut. Res.
**2022**, 29, 17591–17605. [Google Scholar] [CrossRef] - Gholami, V.; Khaleghi, M.R.; Pirasteh, S.; Booij, M.J. Comparison of Self-Organizing Map, Artificial Neural Network, and Co-Active Neuro-Fuzzy Inference System Methods in Simulating Groundwater Quality: Geospatial Artificial Intelligence. Water Resour. Manag.
**2022**, 36, 451–469. [Google Scholar] [CrossRef] - El Bilali, A.; Taleb, A.; Brouziyne, Y. Groundwater quality forecasting using machine learning algorithms for irrigation purposes. Agric. Water Manag.
**2021**, 245, 106625. [Google Scholar] [CrossRef] - Singha, S.; Pasupuleti, S.; Singha, S.S.; Singh, R.; Kumar, S. Prediction of groundwater quality using efficient machine learning technique. Chemosphere
**2021**, 276, 130265. [Google Scholar] [CrossRef] - Kumar, A.; Singh, V.K.; Saran, B.; Al-Ansari, N.; Singh, V.P.; Adhikari, S.; Joshi, A.; Singh, N.K.; Vishwakarma, D.K. Development of Novel Hybrid Models for Prediction of Drought- and Stress-Tolerance Indices in Teosinte Introgressed Maize Lines Using Artificial Intelligence Techniques. Sustainability
**2022**, 14, 2287. [Google Scholar] [CrossRef] - Elbeltagi, A.; Azad, N.; Arshad, A.; Mohammed, S.; Mokhtar, A.; Pande, C.; Ramezani, H.; Ahmad, S.; Reza, A.; Islam, T.; et al. Applications of Gaussian process regression for predicting blue water footprint: Case study in Ad Daqahliyah, Egypt. Agric. Water Manag.
**2021**, 255, 107052. [Google Scholar] [CrossRef] - Elbeltagi, A.; Deng, J.; Wang, K.; Hong, Y. Crop Water footprint estimation and modeling using an artificial neural network approach in the Nile Delta, Egypt. Agric. Water Manag.
**2020**, 235, 106080. [Google Scholar] [CrossRef] - Babaee, M.; Maroufpoor, S.; Jalali, M.; Zarei, M.; Elbeltagi, A. Artificial intelligence approach to estimating rice yield*. Irrig. Drain.
**2021**, 70, 732–742. [Google Scholar] [CrossRef] - Elbeltagi, A.; Zhang, L.; Deng, J.; Juma, A.; Wang, K. Modeling monthly crop coefficients of maize based on limited meteorological data: A case study in Nile Delta, Egypt. Comput. Electron. Agric.
**2020**, 173, 105368. [Google Scholar] [CrossRef] - Kumar, S.; Roshni, T.; Himayoun, D. A Comparison of Emotional Neural Network (ENN) and Artificial Neural Network (ANN) Approach for Rainfall-Runoff Modelling. Civ. Eng. J.
**2019**, 5, 2120–2130. [Google Scholar] [CrossRef] - Abbot, J.; Marohasy, J. Application of artificial neural networks to rainfall forecasting in Queensland, Australia. Adv. Atmos. Sci.
**2012**, 29, 717–730. [Google Scholar] [CrossRef] - Shoaib, M.; Shamseldin, A.Y.; Melville, B.W.; Khan, M.M. A comparison between wavelet based static and dynamic neural network approaches for runoff prediction. J. Hydrol.
**2016**, 535, 211–225. [Google Scholar] [CrossRef] - Ghumman, A.R.; Ghazaw, Y.M.; Sohail, A.R.; Watanabe, K. Runoff forecasting by artificial neural network and conventional model. Alex. Eng. J.
**2011**, 50, 345–350. [Google Scholar] [CrossRef] [Green Version] - Nayak, P.C.; Sudheer, K.P.; Jain, S.K. Rainfall-runoff modeling through hybrid intelligent system. Water Resour. Res.
**2007**, 43, W07415. [Google Scholar] [CrossRef] - Sinharay, S. An Overview of statistics in education. In International Encyclopedia of Education, 3rd ed.; Peterson, P., Baker, E., McGaw, B., Eds.; Elsevier: Amsterdam, The Netherlands, 2010; pp. 1–11. ISBN 978-0-08-044894-7. [Google Scholar]
- Vishwakarma, D.K.; Pandey, K.; Kaur, A.; Kushwaha, N.L.; Kumar, R.; Ali, R.; Elbeltagi, A.; Kuriqi, A. Methods to estimate evapotranspiration in humid and subtropical climate conditions. Agric. Water Manag.
**2022**, 261, 107378. [Google Scholar] [CrossRef] - Chenini, I.; Khemiri, S. Evaluation of ground water quality using multiple linear regression and structural equation modeling. Int. J. Environ. Sci. Technol.
**2009**, 6, 509–519. [Google Scholar] [CrossRef] [Green Version] - Snedecor, G.W.; Cochran, W.G.; Fuller, J.A.R. Métodos Estadísticos; Continental: México City, Mexico, 1971. [Google Scholar]
- Sekhar Roy, S.; Roy, R.; Balas, V.E. Estimating heating load in buildings using multivariate adaptive regression splines, extreme learning machine, a hybrid model of MARS and ELM. Renew. Sustain. Energy Rev.
**2018**, 82, 4256–4268. [Google Scholar] [CrossRef] - Akin, M.; Eyduran, S.P.; Eyduran, E.; Reed, B.M. Analysis of macro nutrient related growth responses using multivariate adaptive regression splines. Plant Cell Tissue Organ Cult.
**2020**, 140, 661–670. [Google Scholar] [CrossRef] - Mirabbasi, R.; Kisi, O.; Sanikhani, H.; Gajbhiye Meshram, S. Monthly long-term rainfall estimation in Central India using M5Tree, MARS, LSSVR, ANN and GEP models. Neural Comput. Appl.
**2019**, 31, 6843–6862. [Google Scholar] [CrossRef] - Zhang, J.; Zhang, H.; Xiao, H.; Fang, H.; Han, Y.; Yu, L. Effects of rainfall and runoff-yield conditions on runoff. Ain Shams Eng. J.
**2021**, 12, 2111–2116. [Google Scholar] [CrossRef] - Vapnik, V. Statistical Learning Theory; John Wiley & Sons, Inc.: Oxford, UK, 1998; Volume 1. [Google Scholar]
- Li, M.; Zhang, Y.; Wallace, J.; Campbell, E. Estimating annual runoff in response to forest change: A statistical method based on random forest. J. Hydrol.
**2020**, 589, 125168. [Google Scholar] [CrossRef] - Abdulelah Al-Sudani, Z.; Salih, S.Q.; Sharafati, A.; Yaseen, Z.M. Development of multivariate adaptive regression spline integrated with differential evolution model for streamflow simulation. J. Hydrol.
**2019**, 573, 1–12. [Google Scholar] [CrossRef] - Adnan, R.M.; Petroselli, A.; Heddam, S.; Santos, C.A.G.; Kisi, O. Comparison of different methodologies for rainfall–runoff modeling: Machine learning vs conceptual approach. Nat. Hazards
**2021**, 105, 2987–3011. [Google Scholar] [CrossRef] - Li, X.; Sha, J.; Wang, Z.-L. Comparison of daily streamflow forecasts using extreme learning machines and the random forest method. Hydrol. Sci. J.
**2019**, 64, 1857–1866. [Google Scholar] [CrossRef] - Goyal, M.K.; Bharti, B.; Quilty, J.; Adamowski, J.; Pandey, A. Modeling of daily pan evaporation in sub tropical climates using ANN, LS-SVR, Fuzzy Logic, and ANFIS. Expert Syst. Appl.
**2014**, 41, 5267–5276. [Google Scholar] [CrossRef] - Malik, A.; Kumar, A.; Piri, J. Daily suspended sediment concentration simulation using hydrological data of Pranhita River Basin, India. Comput. Electron. Agric.
**2017**, 138, 20–28. [Google Scholar] [CrossRef] - Malik, A.; Kumar, A.; Kim, S.; Kashani, M.H.; Karimi, V.; Sharafati, A.; Ghorbani, M.A.; Al-Ansari, N.; Salih, S.Q.; Yaseen, Z.M.; et al. Modeling monthly pan evaporation process over the Indian central Himalayas: Application of multiple learning artificial intelligence model. Eng. Appl. Comput. Fluid Mech.
**2020**, 14, 323–338. [Google Scholar] [CrossRef] [Green Version] - Singh, A.; Malik, A.; Kumar, A.; Kisi, O. Rainfall-runoff modeling in hilly watershed using heuristic approaches with gamma test. Arab. J. Geosci.
**2018**, 11, 261. [Google Scholar] [CrossRef] - Stefánsson, A.; Končar, N.; Jones, A.J. A note on the Gamma test. Neural Comput. Appl.
**1997**, 5, 131–133. [Google Scholar] [CrossRef] - Noori, R.; Karbassi, A.R.; Moghaddamnia, A.; Han, D.; Zokaei-Ashtiani, M.H.; Farokhnia, A.; Gousheh, M.G. Assessment of input variables determination on the SVM model performance using PCA, Gamma test, and forward selection techniques for monthly stream flow prediction. J. Hydrol.
**2011**, 401, 177–189. [Google Scholar] [CrossRef] - Singh, V.K.; Kumar, D.; Kashyap, P.S.; Kisi, O. Simulation of suspended sediment based on gamma test, heuristic, and regression-based techniques. Environ. Earth Sci.
**2018**, 77, 708. [Google Scholar] [CrossRef] - Singh, V.K.; Kumar, D.; Kashyap, P.S.; Singh, P.K. Predicting unsaturated hydraulic conductivity of soil based on machine learning algorithms. In Proceedings of the International Conference on Opportunities and Challenges in Engineering, Management and Science (OCEMS—2019), Bareilly, India, 15–16 February 2019. [Google Scholar]
- Zhang, W.; Goh, A.T.C.; Zhang, Y.; Chen, Y.; Xiao, Y. Assessment of soil liquefaction based on capacity energy concept and multivariate adaptive regression splines. Eng. Geol.
**2015**, 188, 29–37. [Google Scholar] [CrossRef] - Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat.
**1991**, 19, 1–67. [Google Scholar] [CrossRef] - Kisi, O.; Parmar, K.S. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J. Hydrol.
**2016**, 534, 104–112. [Google Scholar] [CrossRef] - Zhang, X. Matrix Analysis and Applications; Cambridge University Press: Cambridge, UK, 2017; ISBN 110-841-741-8. [Google Scholar]
- Rezaie-Balf, M.; Zahmatkesh, Z.; Kim, S. Soft Computing Techniques for Rainfall-Runoff Simulation: Local Non–Parametric Paradigm vs. Model Classification Methods. Water Resour. Manag.
**2017**, 31, 3843–3865. [Google Scholar] [CrossRef] - Adnan, R.M.; Liang, Z.; Trajkovic, S.; Zounemat-Kermani, M.; Li, B.; Kisi, O. Daily streamflow prediction using optimally pruned extreme learning machine. J. Hydrol.
**2019**, 577, 123981. [Google Scholar] [CrossRef] - Vapnik, V. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1999; ISBN 038-798-780-0. [Google Scholar]
- Bray, M.; Han, D. Identification of support vector machines for runoff modelling. J. Hydroinform.
**2004**, 6, 265–280. [Google Scholar] [CrossRef] [Green Version] - Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Awad, M., Khanna, R., Eds.; Apress: Berkeley, CA, USA, 2015; pp. 67–80. ISBN 978-143-025-990-9. [Google Scholar]
- Kumar, M.; Kumari, A.; Kushwaha, D.P.; Kumar, P.; Malik, A.; Ali, R.; Kuriqi, A. Estimation of Daily Stage–Discharge Relationship by Using Data-Driven Techniques of a Perennial River, India. Sustainability
**2020**, 12, 7877. [Google Scholar] [CrossRef] - Breiman, L. Random Forests. Mach. Learn.
**2001**, 45, 5–32. [Google Scholar] [CrossRef] [Green Version] - Prasad, A.M.; Iverson, L.R.; Liaw, A. Newer Classification and Regression Tree Techniques: Bagging and Random Forests for Ecological Prediction. Ecosystems
**2006**, 9, 181–199. [Google Scholar] [CrossRef] - Sadler, J.M.; Goodall, J.L.; Morsy, M.M.; Spencer, K. Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest. J. Hydrol.
**2018**, 559, 43–55. [Google Scholar] [CrossRef] - Malik, A.; Kumar, A. Pan Evaporation Simulation Based on Daily Meteorological Data Using Soft Computing Techniques and Multiple Linear Regression. Water Resour. Manag.
**2015**, 29, 1859–1872. [Google Scholar] [CrossRef] - Kumar, R.; Kumar, A.; Shankhwar, A.K.; Vishkarma, D.K.; Sachan, A.; Singh, P.V.; Jahangeer, J.; Verma, A.; Kumar, V. Modelling of meteorological drought in the foothills of Central Himalayas: A case study in Uttarakhand State, India. Ain Shams Eng. J.
**2022**, 13, 101595. [Google Scholar] [CrossRef] - Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol.
**1970**, 10, 282–290. [Google Scholar] [CrossRef] - Malik, A.; Kumar, A.; Kisi, O. Daily pan evaporation estimation using heuristic methods with gamma test. J. Irrig. Drain. Eng.
**2018**, 144, 4018023. [Google Scholar] [CrossRef] - Nury, A.H.; Hasan, K.; Alam, M.J. Bin Comparative study of wavelet-ARIMA and wavelet-ANN models for temperature time series data in northeastern Bangladesh. J. King Saud Univ. Sci.
**2017**, 29, 47–61. [Google Scholar] [CrossRef] [Green Version] - Wooldridge, M. An Introduction to Multiagent Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009; ISBN 047-051-946-0. [Google Scholar]
- Schroeder, R.; van de Ven, A.; Scudder, G.; Polley, D. Managing innovation and change processes: Findings from the Minnesota innovation research program. Agribusiness
**1986**, 2, 501–523. [Google Scholar] [CrossRef] - Santhi, C.; Arnold, J.G.; Williams, J.R.; Dugas, W.A.; Srinivasan, R.; Hauck, L.M. Validation of the swat model on a large rwer basin with point and nonpoint sources. JAWRA J. Am. Water Resour. Assoc.
**2001**, 37, 1169–1188. [Google Scholar] [CrossRef] - Van Liew, M.W.; Arnold, J.G.; Garbrecht, J.D. Hydrologic simulation on agricultural watersheds: Choosing between two models. Trans. ASAE
**2003**, 46, 1539. [Google Scholar] [CrossRef] - Legates, D.R.; McCabe, G.J., Jr. Evaluating the use of “goodness-of-fit” Measures in hydrologic and hydroclimatic model validation. Water Resour. Res.
**1999**, 35, 233–241. [Google Scholar] [CrossRef] - Gupta, H.V.; Soroosh, S.; Yapo, P.O. Status of Automatic Calibration for Hydrologic Models: Comparison with Multilevel Expert Calibration. J. Hydrol. Eng.
**1999**, 4, 135–143. [Google Scholar] [CrossRef] - Moriasi, D.N.; Arnold, J.G.; van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE
**2007**, 50, 885–900. [Google Scholar] [CrossRef] - Kheirfam, H.; Mokarram-Kashtiban, S. A regional suspended load yield estimation model for ungauged watersheds. Water Sci. Eng.
**2018**, 11, 328–337. [Google Scholar] [CrossRef] - Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping. Water Resour. Manag.
**2017**, 31, 2761–2775. [Google Scholar] [CrossRef] - Panda, K.C.; Singh, R.M.; Thakural, L.N.; Sahoo, D.P. Representative grid location-multivariate adaptive regression spline (RGL-MARS) algorithm for downscaling dry and wet season rainfall. J. Hydrol.
**2022**, 605, 127381. [Google Scholar] [CrossRef] - Zhang, J.; Ma, G.; Huang, Y.; Sun, J.; Aslani, F.; Nener, B. Modelling uniaxial compressive strength of lightweight self-compacting concrete using random forest regression. Constr. Build. Mater.
**2019**, 210, 713–719. [Google Scholar] [CrossRef] - Reis, I.; Baron, D.; Shahaf, S. Probabilistic Random Forest: A Machine Learning Algorithm for Noisy Data Sets. Astron. J.
**2018**, 157, 16. [Google Scholar] [CrossRef] [Green Version] - Kim, S.; Jeong, M.; Ko, B.C. Lightweight surrogate random forest support for model simplification and feature relevance. Appl. Intell.
**2022**, 52, 471–481. [Google Scholar] [CrossRef]

**Figure 2.** (**a**) Rainfall data from three rain gauge stations, namely Bhimtal, Nainital, and Kathgodam; (**b**) mean areal rainfall time-series data for the Gola watershed; and (**c**) runoff time-series data for the Gola watershed.

**Figure 4.** Simulated runoff (m^{3}/s) of the Gola watershed using the MLR model from 2009 to 2020: (**a**) training and (**b**) testing periods.

**Figure 5.** Scatter plots, evaluation statistics, and confidence intervals of the observed and predicted daily runoff Q(t) using the MLR model: (**a**) training dataset and (**b**) testing dataset.

**Figure 6.** Simulated runoff (m^{3}/s) of the Gola watershed using the MARS model from 2009 to 2020: (**a**) training and (**b**) testing periods.

**Figure 7.** Scatter plots, evaluation statistics, and confidence intervals of the observed and predicted daily runoff Q(t) using the MARS model: (**a**) training dataset and (**b**) testing dataset.

**Figure 8.** Simulated runoff (m^{3}/s) of the Gola watershed using the SVM model from 2009 to 2020: (**a**) training and (**b**) testing periods.

**Figure 9.** Scatter plots, evaluation statistics, and confidence intervals of the observed and predicted daily runoff Q(t) using the SVM model: (**a**) training dataset and (**b**) testing dataset.

**Figure 10.** Simulated runoff (m^{3}/s) of the Gola watershed using the RF model from 2009 to 2020: (**a**) training and (**b**) testing periods.

**Figure 11.** Scatter plots, evaluation statistics, and confidence intervals of the observed and predicted daily runoff Q(t) using the RF model: (**a**) training dataset and (**b**) testing dataset.

**Figure 12.** Violin plots of the observed and predicted runoff distributions for the four models during the (**a**) training and (**b**) testing phases at the Gola watershed.

**Figure 13.** Relative error distribution of daily river flow over the training and testing phases for the Gola watershed: (**a**) MLR, (**b**) SVM, (**c**) MARS, and (**d**) random forest.

**Figure 14.** Taylor diagram of the SVM, random forest, MARS, and MLR models during the (**a**) training and (**b**) testing periods at the Gola watershed.

**Table 1.** Basic statistics of training, testing, and total rainfall and runoff datasets at study stations.

Dataset | Variable | Mean | Median | Minimum | Maximum | Standard Deviation | CV (%) | Skewness |
---|---|---|---|---|---|---|---|---|
Total dataset | Rainfall (mm) | 6.45 | 0 | 0 | 172.38 | 15.60 | 24.18 | 3.87 |
Total dataset | Runoff (m^{3}/s) | 17.22 | 6.06 | 1.88 | 250.03 | 27.51 | 15.97 | 3.80 |
Training data | Rainfall (mm) | 6.45 | 0 | 0 | 172.38 | 15.87 | 24.60 | 4.02 |
Training data | Runoff (m^{3}/s) | 17.35 | 5.66 | 1.38 | 250.03 | 28.69 | 16.54 | 3.76 |
Testing data | Rainfall (mm) | 6.89 | 0 | 0 | 111.69 | 14.47 | 21.00 | 3.07 |
Testing data | Runoff (m^{3}/s) | 16.83 | 7.5 | 1.61 | 197.08 | 22.16 | 13.16 | 3.59 |

Model No. | Model Input Combination | Mask | Gamma | V-Ratio |
---|---|---|---|---|
M1 | Q_{(t−1)} | 0000001 | 0.082 | 0.329 |
M2 | R_{(t)}, Q_{(t−1)} | 1000001 | 0.061 | 0.244 |
M3 | R_{(t)}, R_{(t−1)}, Q_{(t−1)} | 1100001 | 0.064 | 0.256 |
M4 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)} | 1110001 | 0.056 | 0.225 |
M5 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−1)} | 1111001 | 0.051 | 0.207 |
M6 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−3)}, Q_{(t−1)} | 1111101 | 0.054 | 0.217 |
M7 | Q_{(t−1)}, Q_{(t−2)} | 0000011 | 0.081 | 0.320 |
M8 | R_{(t)}, Q_{(t−1)}, Q_{(t−2)} | 1000011 | 0.063 | 0.254 |
M9 | R_{(t)}, R_{(t−1)}, Q_{(t−1)}, Q_{(t−2)} | 1100011 | 0.051 | 0.212 |
M10 | Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 0000111 | 0.076 | 0.307 |
M11 | R_{(t)}, R_{(t−1)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1000111 | 0.053 | 0.213 |
M12 | R_{(t)}, R_{(t−2)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1100111 | 0.048 | 0.194 |
M13 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)} | 1111000 | 0.107 | 0.430 |
M14 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−1)} | 1111001 | 0.061 | 0.207 |
M15 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1110111 | 0.050 | 0.200 |
M16 | R_{(t)}, R_{(t−1)}, R_{(t−2)} | 1110000 | 0.114 | 0.459 |
M17 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)} | 1110001 | 0.056 | 0.225 |
M18 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)}, Q_{(t−2)} | 1110011 | 0.052 | 0.209 |
M19 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1110111 | 0.047 | 0.191 |
M20 | R_{(t)}, R_{(t−1)} | 1100000 | 0.124 | 0.498 |
M21 | R_{(t)}, R_{(t−1)}, Q_{(t−1)} | 1100001 | 0.064 | 0.256 |
M22 | R_{(t)}, R_{(t−1)}, Q_{(t−1)}, Q_{(t−2)} | 1100011 | 0.053 | 0.212 |
M23 | R_{(t)}, R_{(t−1)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1100111 | 0.073 | 0.194 |
M24 | R_{(t)}, R_{(t−1)}, R_{(t−3)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1101111 | 0.050 | 0.203 |
M25 | R_{(t)}, Q_{(t−1)} | 1000001 | 0.061 | 0.244 |
M26 | R_{(t)}, Q_{(t−1)}, Q_{(t−2)} | 1000011 | 0.063 | 0.254 |
M27 | R_{(t)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1000111 | 0.053 | 0.213 |
M28 | R_{(t)}, R_{(t−1)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1100111 | 0.048 | 0.194 |
M29 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)} | 1111000 | 0.107 | 0.430 |
M30 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−1)} | 1111001 | 0.051 | 0.207 |
M31 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, Q_{(t−1)}, Q_{(t−2)} | 1110011 | 0.052 | 0.209 |
M32 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−1)}, Q_{(t−2)} | 1111011 | 0.050 | 0.200 |
M33 | R_{(t)}, R_{(t−1)}, R_{(t−2)}, R_{(t−3)}, Q_{(t−1)}, Q_{(t−2)}, Q_{(t−3)} | 1111111 | 0.048 | 0.194 |
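The Mask column in the table above marks which candidate inputs are switched on in each gamma-test combination. A minimal Python sketch of reading a mask follows. The bit order (R(t), R(t−1), R(t−2), R(t−3), Q(t−3), Q(t−2), Q(t−1)) is an assumption inferred from rows such as M1, M7, M10, and M13, and the names `CANDIDATES` and `decode_mask` are illustrative only; they do not come from the study.

```python
# Assumed bit order, inferred from rows M1, M7, M10, and M13 of the table above.
CANDIDATES = ["R(t)", "R(t-1)", "R(t-2)", "R(t-3)", "Q(t-3)", "Q(t-2)", "Q(t-1)"]


def decode_mask(mask: str) -> list[str]:
    """Return the candidate inputs that are switched on ('1') in a mask string."""
    if len(mask) != len(CANDIDATES) or set(mask) - {"0", "1"}:
        raise ValueError(
            f"expected a {len(CANDIDATES)}-character string of 0s and 1s, got {mask!r}"
        )
    return [name for bit, name in zip(mask, CANDIDATES) if bit == "1"]


if __name__ == "__main__":
    # Print the decoded inputs for a few masks taken from the table.
    for mask in ("0000001", "1110001", "1111111"):
        print(mask, "->", ", ".join(decode_mask(mask)))
```

Under this assumed ordering, a mask can be checked against the listed input combination before comparing Gamma and V-Ratio values across rows.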

Model | Training RMSE (m^{3}/s) | Training R^{2} | Training NSE | Training PBIAS (%) | Testing RMSE (m^{3}/s) | Testing R^{2} | Testing NSE | Testing PBIAS (%) |
---|---|---|---|---|---|---|---|---|
MLR | 13.44 | 0.78 | 0.72 | 0 | 12.67 | 0.67 | 0.51 | 0.80 |
MARS | 12.55 | 0.81 | 0.76 | 0 | 10.07 | 0.79 | 0.74 | 0.20 |
SVM | 12.62 | 0.83 | 0.81 | 4.10 | 14.02 | 0.60 | 0.60 | −0.40 |
RF | 6.31 | 0.96 | 0.94 | −0.20 | 5.53 | 0.95 | 0.92 | −0.20 |
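The four indices in the table above can be computed from paired observed and simulated series. The sketch below is one plausible implementation in plain Python, not the authors' code. It assumes R^{2} is the squared Pearson correlation and that PBIAS uses the observed-minus-simulated convention (positive values indicate underestimation); the function names and the sample values are made up for illustration.

```python
import math


def rmse(obs, sim):
    """Root mean square error, in the units of the data (here m^3/s)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    mean_o = sum(obs) / len(obs)
    mean_s = sum(sim) / len(sim)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variability."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mean_o) ** 2 for o in obs)


def pbias(obs, sim):
    """Percent bias; assumed convention: positive means the model underestimates."""
    return 100 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


if __name__ == "__main__":
    observed = [2.0, 4.0, 6.0, 8.0]   # made-up values, not study data
    simulated = [3.0, 4.0, 5.0, 8.0]  # made-up values, not study data
    for name, fn in (("RMSE", rmse), ("R2", r_squared), ("NSE", nse), ("PBIAS", pbias)):
        print(f"{name}: {fn(observed, simulated):.3f}")
```

Because NSE penalizes bias and scale errors while the squared correlation does not, the two can differ for the same model, which is consistent with the separate R^{2} and NSE columns reported above.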

**Table 4.** Results of different performance indicators for RF models during training and testing sets.

Models | Training RMSE | Training R^{2} | Testing RMSE | Testing R^{2} |
---|---|---|---|---|
RF-1 | 6.443 | 0.95 | 5.553 | 0.95 |
RF-2 | 6.388 | 0.95 | 5.576 | 0.95 |
RF-3 | 6.351 | 0.96 | 5.572 | 0.94 |
RF-4 | 6.423 | 0.95 | 5.550 | 0.95 |
RF-5 | 6.442 | 0.95 | 5.621 | 0.94 |
RF-6 | 6.480 | 0.95 | 5.459 | 0.95 |
RF-7 | 6.415 | 0.95 | 5.609 | 0.95 |
RF-8 | 6.371 | 0.95 | 5.677 | 0.94 |
RF-9 | 6.404 | 0.95 | 5.598 | 0.95 |
RF-10 | 6.378 | 0.95 | 5.521 | 0.95 |
RF-11 | 6.403 | 0.95 | 5.430 | 0.95 |
RF-12 | 6.445 | 0.95 | 5.481 | 0.95 |
RF-13 | 6.373 | 0.95 | 5.563 | 0.95 |
RF-14 | 6.449 | 0.95 | 5.500 | 0.95 |
RF-15 | 6.447 | 0.95 | 5.468 | 0.95 |
RF-16 | 6.435 | 0.96 | 5.571 | 0.94 |
RF-17 | 6.386 | 0.95 | 5.580 | 0.95 |
RF-18 | 6.395 | 0.95 | 5.539 | 0.95 |
RF-19 | 6.481 | 0.95 | 5.451 | 0.95 |
RF-20 | 6.408 | 0.95 | 5.575 | 0.95 |
RF-21 | 6.375 | 0.95 | 5.626 | 0.94 |
RF-22 | 6.389 | 0.95 | 5.529 | 0.95 |
RF-23 | 6.451 | 0.95 | 5.467 | 0.95 |
RF-24 | 6.446 | 0.95 | 5.613 | 0.94 |
RF-25 | 6.369 | 0.95 | 5.453 | 0.95 |
RF-26 | 6.427 | 0.95 | 5.536 | 0.95 |
RF-27 | 6.322 | 0.95 | 5.547 | 0.95 |
RF-28 | 6.318 | 0.96 | 5.565 | 0.95 |
RF-29 | 6.375 | 0.95 | 5.480 | 0.95 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Singh, A.K.; Kumar, P.; Ali, R.; Al-Ansari, N.; Vishwakarma, D.K.; Kushwaha, K.S.; Panda, K.C.; Sagar, A.; Mirzania, E.; Elbeltagi, A.; et al. An Integrated Statistical-Machine Learning Approach for Runoff Prediction. *Sustainability* **2022**, *14*, 8209. https://doi.org/10.3390/su14138209
