# Assessment of Machine Learning Techniques for Monthly Flow Prediction


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. Gaussian Process Regression

- Squared exponential: ${K}_{SE}(x,{x}^{\prime})={\sigma}_{f}^{2}\mathrm{exp}\left[-\frac{{(x-{x}^{\prime})}^{2}}{2{\sigma}_{l}^{2}}\right]$;
- Exponential: ${K}_{E}(x,{x}^{\prime})={\sigma}_{f}^{2}\mathrm{exp}\left(-\frac{\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}\right)$;
- $\gamma $-exponential: $K(x,{x}^{\prime})={\sigma}_{f}^{2}\mathrm{exp}\left(-{\left(\frac{\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}\right)}^{\gamma}\right)$ for $0<\gamma \le 2$;
- Rational quadratic: ${K}_{RQ}(x,{x}^{\prime})={\sigma}_{f}^{2}{\left(1+\frac{{\left(x-{x}^{\prime}\right)}^{2}}{2\alpha {\sigma}_{l}^{2}}\right)}^{-\alpha}$;
- Matern 3/2: ${K}_{M3/2}(x,{x}^{\prime})={\sigma}_{f}^{2}\left(1+\frac{\sqrt{3}\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}\right)\mathrm{exp}\left(-\frac{\sqrt{3}\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}\right)$;
- Matern 5/2: ${K}_{M5/2}(x,{x}^{\prime})={\sigma}_{f}^{2}\left(1+\frac{\sqrt{5}\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}+\frac{5{\left(x-{x}^{\prime}\right)}^{2}}{3{\sigma}_{l}^{2}}\right)\mathrm{exp}\left(-\frac{\sqrt{5}\left|x-{x}^{\prime}\right|}{{\sigma}_{l}}\right)$.
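
To make the covariance functions concrete, here is a minimal NumPy sketch of two of the kernels above together with the standard GPR posterior-mean prediction. This is not the paper's implementation; the hyperparameters $\sigma_f$, $\sigma_l$ and the noise level are illustrative placeholders.

```python
import numpy as np

def k_se(x1, x2, sigma_f=1.0, sigma_l=1.0):
    """Squared-exponential kernel K_SE(x, x')."""
    return sigma_f**2 * np.exp(-((x1 - x2)**2) / (2.0 * sigma_l**2))

def k_matern32(x1, x2, sigma_f=1.0, sigma_l=1.0):
    """Matern 3/2 kernel, written with |x - x'| as in the list above."""
    r = np.abs(x1 - x2)
    return sigma_f**2 * (1.0 + np.sqrt(3.0) * r / sigma_l) \
        * np.exp(-np.sqrt(3.0) * r / sigma_l)

def gpr_predict(X_train, y_train, X_test, kernel, noise=1e-6):
    """GPR posterior mean: K*^T (K + noise*I)^{-1} y."""
    K = kernel(X_train[:, None], X_train[None, :])   # n x n Gram matrix
    K_s = kernel(X_train[:, None], X_test[None, :])  # n x m cross-covariance
    alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)
    return K_s.T @ alpha
```

In practice, the kernel choice and its hyperparameters are what distinguish the GPR variants compared in this study.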

#### 2.2. SVR

- linear, $K({x}_{i},{x}_{j})=\langle {x}_{i},{x}_{j}\rangle$;
- polynomial, $K({x}_{i},{x}_{j})={\left(\gamma \langle {x}_{i},{x}_{j}\rangle +r\right)}^{d},\quad \gamma >0$;
- radial basis function (RBF) or Gaussian, $K({x}_{i},{x}_{j})={e}^{-\gamma {\Vert {x}_{i}-{x}_{j}\Vert}^{2}},\quad \gamma >0$;
- sigmoid, $K({x}_{i},{x}_{j})=\mathrm{tanh}(\gamma \langle {x}_{i},{x}_{j}\rangle +r),\quad \gamma >0$;
- quadratic, $K({x}_{i},{x}_{j})={(\langle {x}_{i},{x}_{j}\rangle +1)}^{2}$.
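
The kernels above can be written as plain functions; a minimal Python sketch follows (illustrative only — the quadratic kernel is the one behind the SVR-Quadratic model discussed in the results, and note the negative exponent in the RBF form):

```python
import numpy as np

def k_linear(xi, xj):
    """Linear kernel: inner product of the two input vectors."""
    return float(np.dot(xi, xj))

def k_rbf(xi, xj, gamma=1.0):
    """RBF/Gaussian kernel: exp(-gamma * ||xi - xj||^2), gamma > 0."""
    diff = np.asarray(xi, float) - np.asarray(xj, float)
    return float(np.exp(-gamma * np.sum(diff**2)))

def k_quadratic(xi, xj):
    """Quadratic kernel: (<xi, xj> + 1)^2."""
    return float((np.dot(xi, xj) + 1.0)**2)
```

A Gram matrix built from any of these functions is what the SVR optimizer actually consumes.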

#### GOA Algorithm

- ${X}_{i}^{d}(t)$: the d-dimensional position of the ith grasshopper in the iteration t;
- G: the number of grasshoppers;
- s: an evaluation function of social interactions;
- l: the attractive length scale;
- $u{b}_{d}$: the upper bound in the $d$-th dimension;
- $l{b}_{d}$: the lower bound in the $d$-th dimension;
- ${d}_{ij}$: the distance between the $i$th and $j$th grasshoppers;
- ${\widehat{T}}_{d}$: the $d$-th component of the best solution found so far;
- c: the reduction factor that shrinks the comfort zone.
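
As an illustration of how these quantities interact, the sketch below implements the social-interaction function $s(r) = f e^{-r/l} - e^{-r}$ and one GOA position update following Saremi et al. [19]. The intensity of attraction $f$, the length scale $l$, and the population size are illustrative values, not the settings calibrated in this study.

```python
import numpy as np

def s(r, f=0.5, l=1.5):
    """Social-interaction function: attraction/repulsion between grasshoppers."""
    return f * np.exp(-r / l) - np.exp(-r)

def goa_step(X, target, c, lb, ub, eps=1e-12):
    """One GOA position update. X: (G, D) grasshopper positions;
    target: best solution found so far; c: comfort-zone reduction factor,
    which is decreased linearly over the iterations."""
    G, D = X.shape
    X_new = np.empty_like(X)
    half_range = (ub - lb) / 2.0
    for i in range(G):
        total = np.zeros(D)
        for j in range(G):
            if i == j:
                continue
            d_ij = np.linalg.norm(X[j] - X[i]) + eps   # distance d_ij
            unit = (X[j] - X[i]) / d_ij                # unit vector toward j
            total += c * half_range * s(np.abs(X[j] - X[i])) * unit
        X_new[i] = np.clip(c * total + target, lb, ub)  # keep inside [lb, ub]
    return X_new
```

In the full algorithm, this step is repeated while $c$ decays from $c_{\max}$ to $c_{\min}$, balancing exploration against exploitation.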

#### 2.3. Artificial Neural Networks

#### 2.3.1. Feedforward Neural Networks

In a feedforward neural network, the output value ${o}_{k,p}$ for an input pattern ${z}_{p}$ is calculated with a single forward pass through the network. For each output unit ${o}_{k}$, we have

${o}_{k,p}={f}_{{o}_{k}}\left({\displaystyle \sum _{j=1}^{J+1}}{w}_{kj}\,{f}_{{y}_{j}}\left({\displaystyle \sum _{i=1}^{I+1}}{v}_{ji}\,{z}_{i,p}\right)\right)$

where ${f}_{{o}_{k}}$ and ${f}_{{y}_{j}}$ are the activation functions of output unit ${o}_{k}$ and hidden unit ${y}_{j}$, respectively, ${w}_{kj}$ is the weight between output unit ${o}_{k}$ and hidden unit ${y}_{j}$, ${v}_{ji}$ is the weight between hidden unit ${y}_{j}$ and input unit ${z}_{i}$, and ${z}_{i,p}$ is the value of input unit ${z}_{i}$ of input pattern ${z}_{p}$. In addition, the $(I+1)$-th input unit and the $(J+1)$-th hidden unit are bias units, representing the threshold values of neurons in the next layer.
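
The single forward pass can be sketched as follows (a minimal NumPy version; the weight shapes and activation functions are illustrative assumptions, with the bias units folded in as extra weight columns):

```python
import numpy as np

def ffnn_forward(z_p, V, W, f_hidden=np.tanh, f_out=lambda a: a):
    """One forward pass: input pattern z_p -> hidden y -> outputs o.
    V: (J, I+1) input-to-hidden weights; W: (K, J+1) hidden-to-output weights.
    The extra column in each matrix carries the bias (threshold) weights."""
    z = np.append(z_p, 1.0)    # (I+1,): append the bias input unit
    y = f_hidden(V @ z)        # (J,): hidden activations
    y = np.append(y, 1.0)      # (J+1,): append the bias hidden unit
    return f_out(W @ y)        # (K,): output activations
```

Training then adjusts `V` and `W` (e.g., by backpropagation) to minimize the prediction error over all patterns.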

#### 2.3.2. Radial Basis Neural Network

#### 2.3.3. Time Delay Neural Networks

A TDNN with ${n}_{t}$ time delays for each input unit is illustrated in Figure 2. The output of a TDNN is calculated in the same way as for a feedforward network, with the ${n}_{t}$ delayed copies of each input treated as additional input units.

#### 2.3.4. Recurrent Neural Network

In an Elman recurrent neural network, context units ${z}_{I+2},\ldots ,{z}_{I+1+J}$ are connected to all hidden units ${y}_{j}$ (for $j=1,\cdots ,J$) with a weight equal to 1. Thus, the activation value ${y}_{j}$ is simply copied to ${z}_{I+1+j}$. Each output unit's activation is then calculated as in the feedforward network [25], with the context units carrying the previous hidden activations:

$({z}_{I+2,p},\ldots ,{z}_{I+1+J,p})=({y}_{1,p(t-1)},\ldots ,{y}_{J,p(t-1)})$.
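
A minimal sketch of one Elman step follows, assuming the context units are simply the previous hidden activations appended to the current inputs (shapes and activation functions are illustrative):

```python
import numpy as np

def elman_step(z_t, y_prev, V, W, f_hidden=np.tanh, f_out=lambda a: a):
    """One Elman RNN step. z_t: current inputs (I,); y_prev: hidden state
    from the previous time step (J,), copied into the context units.
    V: (J, I+1+J) weights to the hidden layer; W: (K, J+1) to the outputs."""
    x = np.concatenate([z_t, [1.0], y_prev])  # inputs + bias + context units
    y = f_hidden(V @ x)                        # new hidden activations
    o = f_out(W @ np.append(y, 1.0))           # outputs, as in the FFNN case
    return o, y                                # y is fed back at the next step
```

Iterating this step over a monthly series is what lets the RNN carry information across time steps without explicit lagged inputs.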

#### 2.4. KNN Model

## 3. Case Studies

#### 3.1. Alavian Basin

The Alavian Basin is located in the northwest part of Iran (Figure 4). Sufichay River, the main stream of the basin, ends at Erumiyeh Lake; it also constitutes the main inflow to Alavian Dam, with an average annual discharge estimated at around 4.6 MCM. For this area, weather information, including average monthly precipitation and temperature for an 18-year period (from 1983 to the end of 2001), was gathered from the Maragheh weather station, and monthly flow discharges were obtained from the historical records of the Tazkand hydrometric station upstream of Alavian Dam. Owing to snowmelt in this basin, the monthly flows depend on seasonal variations; therefore, monthly discharges and temperatures, each with three temporal lags, were used as model inputs (six input variables) for predicting the monthly flows. Figure 4 also shows the locations of the monitoring stations in the Sufichay Basin.
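
The six-variable input construction described above (three lags each of discharge and temperature, predicting the current month's flow) can be sketched as follows; the function name and array layout are illustrative, not taken from the paper:

```python
import numpy as np

def make_lagged_inputs(q, temp, n_lags=3):
    """Build the input matrix X and target vector y: each row of X holds
    n_lags past monthly discharges and n_lags past monthly temperatures,
    and the target is the current month's discharge."""
    q = np.asarray(q, float)
    temp = np.asarray(temp, float)
    rows, targets = [], []
    for t in range(n_lags, len(q)):
        rows.append(np.concatenate([q[t - n_lags:t], temp[t - n_lags:t]]))
        targets.append(q[t])
    return np.array(rows), np.array(targets)
```

The resulting `(X, y)` pair is what each of the compared models (GPR, SVR, the ANN variants, and KNN) is trained on.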

#### 3.2. Dez Basin

The Dez Basin is located in the western part of Iran, between longitudes 48°10′ and 50°20′ and latitudes 31°36′ and 34°08′. The main river of the basin, the Dez River, originates in the mountainous areas upstream and ends in the Persian Gulf. Half of the precipitation occurs during the winter. Two-thirds of the area lies above 1000 m, and one-third above 2000 m; hence, precipitation most often occurs in the form of snow. Figure 5 shows the Dez Dam and the monitoring stations spread throughout the basin. Monthly flow data from the TaleZang hydrometric station, upstream of Dez Dam, are available from 1962 to 1999, and thus only this period was used for predicting the monthly flows in this case study.

## 4. Results and Discussion

## 5. Conclusions

## Supplementary Materials

Supplementary File 1

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Box, G.E.; Jenkins, G.M.; Reinsel, G.C. *Time Series Analysis: Forecasting and Control*, 4th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2013; ISBN 978-1-118-61919-3.
2. Tokar, A.S.; Johnson, P.A. Rainfall-runoff modeling using artificial neural networks. *J. Hydrol. Eng.* **1999**, *4*, 232–239.
3. Khashei, M.; Bijari, M. A novel hybridization of artificial neural networks and ARIMA models for time series forecasting. *Appl. Soft Comput.* **2011**, *11*, 2664–2675.
4. Feng, L.H.; Lu, J. The practical research on flood forecasting based on artificial neural networks. *Expert Syst. Appl.* **2010**, *37*, 2974–2977.
5. Pumo, D.; Francipane, A.; Lo Conti, F.; Arnone, E.; Bitonto, P.; Viola, F.; La Loggia, G.; Noto, L.V. The SESAMO early warning system for rainfall-triggered landslides. *J. Hydroinform.* **2016**, *18*, 256–276.
6. Zounemat-Kermani, M.; Kisi, O.; Rajaee, T. Performance of radial basis and LM-feedforward artificial neural networks for predicting daily watershed runoff. *Appl. Soft Comput.* **2013**, *13*, 4633–4644.
7. Giustolisi, O.; Laucelli, D. Improving generalization of artificial neural networks in rainfall–runoff modelling. *Hydrol. Sci. J.* **2005**, *50*, 439–457.
8. Yu, J.; Qin, X.; Larsen, O.; Chua, L. Comparison between response surface models and artificial neural networks in hydrologic forecasting. *J. Hydrol. Eng.* **2013**, *19*, 473–481.
9. Thissen, U.; Van Brakel, R.; De Weijer, A.; Melssen, W.; Buydens, L. Using support vector machines for time series prediction. *Chemom. Intell. Lab. Syst.* **2003**, *69*, 35–49.
10. Chiu, D.Y.; Chen, P.J. Dynamically exploring internal mechanism of stock market by fuzzy-based support vector machines with high dimension input space and genetic algorithm. *Expert Syst. Appl.* **2009**, *36*, 1240–1248.
11. Malekmohamadi, I.; Bazargan-Lari, M.R.; Kerachian, R.; Nikoo, M.R.; Fallahnia, M. Evaluating the efficacy of SVMs, BNs, ANNs and ANFIS in wave height prediction. *Ocean Eng.* **2011**, *38*, 487–497.
12. Karamouz, M.; Ahmadi, A.; Moridi, A. Probabilistic reservoir operation using Bayesian stochastic model and support vector machine. *Adv. Water Resour.* **2009**, *32*, 1588–1600.
13. Chiang, J.; Tsai, Y. Reservoir drought prediction simulation using support vector machines. *Appl. Mech. Mater.* **2011**, *145*, 455–459.
14. Çimen, M.; Kisi, O. Comparison of two different data-driven techniques in modeling lake level fluctuations in Turkey. *J. Hydrol.* **2009**, *378*, 253–262.
15. Noori, R.; Karbassi, A.; Moghaddamnia, A.; Han, D.; Zokaei-Ashtiani, M.; Farokhnia, A.; Gousheh, M.G. Assessment of input variables determination on the SVM model performance using PCA, gamma test, and forward selection techniques for monthly stream flow prediction. *J. Hydrol.* **2011**, *401*, 177–189.
16. Kalteh, A.M. Wavelet Genetic Algorithm-Support Vector Regression (Wavelet GA-SVR) for monthly flow forecasting. *Water Resour. Manag.* **2014**, *29*, 1283–1293.
17. Hosseini, S.M.; Mahjouri, N. Integrating Support Vector Regression and a geomorphologic Artificial Neural Network for daily rainfall-runoff modeling. *Appl. Soft Comput.* **2016**, *38*, 329–345.
18. Pumo, D.; Viola, F.; Noto, L.V. Generation of natural runoff monthly series at ungauged sites using a regional regressive model. *Water* **2016**, *8*, 209.
19. Saremi, S.; Mirjalili, S.; Lewis, A. Grasshopper Optimisation Algorithm: Theory and application. *Adv. Eng. Softw.* **2017**, *105*, 30–47.
20. Wu, C.L.; Chau, K.W. Data-driven models for monthly streamflow time series prediction. *Eng. Appl. Artif. Intell.* **2010**, *23*, 1350–1367.
21. Akbari, M.; Overloop, P.J.V.; Afshar, A. Clustered K nearest neighbor algorithm for daily inflow forecasting. *Water Resour. Manag.* **2011**, *25*, 1341–1357.
22. Rasmussen, C.E.; Williams, C.K.I. *Gaussian Processes for Machine Learning*; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2006; ISBN 026218253X.
23. Schölkopf, B.; Smola, A.J. *Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond*; MIT Press: Cambridge, MA, USA, 2002; ISBN 978-0262194754.
24. Yazdi, J.; Hokmabadi, A.; Jalili-Ghazizadeh, M.R. Optimal size and placement of water hammer protective devices in water conveyance pipelines. *Water Resour. Manag.* **2018**, accepted.
25. Engelbrecht, A.P. *Computational Intelligence: An Introduction*, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2007.
26. Beale, M.; Hagan, M.; Demuth, H. *Neural Network Toolbox User's Guide*; The MathWorks Inc.: Natick, MA, USA, 2010.
27. Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. *Environ. Model. Softw.* **2010**, *25*, 891–909.
28. Leshno, M.; Lin, V.Y.; Pinkus, A.; Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. *Neural Netw.* **1993**, *6*, 861–867.
29. Picton, P. *Neural Networks*, 2nd ed.; Palgrave Macmillan: New York, NY, USA, 2000.
30. Araghinejad, S. *Data-Driven Modeling: Using MATLAB® in Water Resources and Environmental Engineering*; Springer: Berlin, Germany, 2014; ISBN 978-94-007-7506-0.
31. Lang, K.J.; Waibel, A.H.; Hinton, G.E. A time-delay neural network architecture for isolated word recognition. *Neural Netw.* **1990**, *3*, 23–43.

**Figure 6.** Performance of SVR-Quadratic and KNN (the worst and best techniques) in terms of correlation coefficient for Alavian Basin: (**a**) SVR-Quadratic, training data; (**b**) SVR-Quadratic, verification data; (**c**) KNN, training data; (**d**) KNN, verification data.

**Figure 8.** Monthly flow prediction for the verification dataset for Alavian Basin: (**a**) NNRBF and RNN, and (**b**) NNRBF, SVR-Quadratic, KNN, and GPR.

**Figure 9.** Comparison of the observed flows with the predicted flows obtained by (**a**) KNN and (**b**) SVR-Quadratic for the period 1983–2001 for Alavian Basin.

**Figure 10.** Performance of TDNN and NNRBF (the worst and best techniques) in terms of correlation coefficient for Dez Basin: (**a**) TDNN, training data; (**b**) TDNN, verification data; (**c**) NNRBF, training data; (**d**) NNRBF, verification data.

**Figure 12.** Monthly flow prediction for the verification dataset for Dez Basin: (**a**) NN-MLP and TDNN, and (**b**) NNRBF, SVR-Quadratic, KNN, and GPR.

**Figure 13.** Comparison of the observed flows with the predicted flows obtained by (**a**) NN-RBF and (**b**) TDNN for the period 1962–1999 for Dez Basin.

Parameter | Lower Bound | Upper Bound | Search Agents | Max Iteration |
---|---|---|---|---|
$\epsilon $ | 0.01 | 0.3 | 100 | 100 |
$C$ | 1 | 100 | | |

**Table 2.** Residual criteria for the results of training data for Alavian Basin, where $n$ is the number of observations, ${y}_{i}$ and ${\widehat{y}}_{i}$ are the simulated and observed data, respectively, and $\overline{{\widehat{y}}_{i}}$ is the average of the observed data.

Criteria | Formula | FFNN | TDNN | RBFNN | RNN | GPR | SVR-Quadratic | KNN |
---|---|---|---|---|---|---|---|---|
Correlation coefficient (R) | $\frac{n({\displaystyle \sum _{i=1}^{n}{\widehat{y}}_{i}}{y}_{i})-({\displaystyle \sum _{i=1}^{n}{\widehat{y}}_{i}})({\displaystyle \sum _{i=1}^{n}{y}_{i}})}{\sqrt{[n{\displaystyle \sum _{i=1}^{n}{\widehat{y}}_{i}{}^{2}}-{({\displaystyle \sum _{i=1}^{n}{\widehat{y}}_{i}})}^{2}][n{\displaystyle \sum _{i=1}^{n}{y}_{i}{}^{2}}-{({\displaystyle \sum _{i=1}^{n}{y}_{i}})}^{2}]}}$ | 0.94 | 0.95 | 0.93 | 0.92 | 0.94 | 0.92 | 0.88 |
Mean Relative Absolute Error (MRAE) | $\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\frac{\left|{y}_{i}-{\widehat{y}}_{i}\right|}{{\widehat{y}}_{i}}}$ | 0.378 | 0.425 | 0.464 | 0.53 | 0.43 | 0.45 | 0.47 |
Mean Absolute Error (MAE) | $\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\left|{y}_{i}-{\widehat{y}}_{i}\right|}$ | 1.051 | 0.947 | 1.181 | 1.25 | 1.23 | 1.13 | 1.41 |
Absolute Maximum Error (AME) | $\mathrm{max}\left|{y}_{i}-{\widehat{y}}_{i}\right|$ | 6.492 | 6.250 | 7.251 | 8.44 | 7.16 | 8.08 | 8.73 |
Mean Square Error (MSE) | $\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}$ | 2.627 | 2.046 | 3.213 | 3.99 | 4.02 | 3.35 | 5.35 |
Root Mean Square Error (RMSE) | $\sqrt{\frac{1}{n}{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\widehat{y}}_{i}\right)}^{2}}}$ | 1.621 | 1.430 | 1.792 | 1.99 | 2.01 | 1.83 | 2.31 |
Nash-Sutcliffe Coefficient (E) | $1-\frac{{\displaystyle {\sum}_{i=1}^{n}{\left({\widehat{y}}_{i}-{y}_{i}\right)}^{2}}}{{\displaystyle {\sum}_{i=1}^{n}{\left({\widehat{y}}_{i}-\overline{{\widehat{y}}_{i}}\right)}^{2}}}$ | 0.885 | 0.910 | 0.859 | 0.83 | 0.81 | 0.85 | 0.77 |
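
The criteria above can be computed directly from the two series; a minimal sketch following the table's formulas (first argument simulated, second observed, matching the notation in the caption):

```python
import numpy as np

def residual_criteria(y_sim, y_obs):
    """Compute the residual criteria of Table 2 for simulated vs. observed data."""
    y_sim = np.asarray(y_sim, float)
    y_obs = np.asarray(y_obs, float)
    err = y_sim - y_obs
    return {
        "R": np.corrcoef(y_sim, y_obs)[0, 1],            # correlation coefficient
        "MRAE": np.mean(np.abs(err) / y_obs),            # relative to observed
        "MAE": np.mean(np.abs(err)),
        "AME": np.max(np.abs(err)),                      # absolute maximum error
        "MSE": np.mean(err**2),
        "RMSE": np.sqrt(np.mean(err**2)),
        "E": 1.0 - np.sum(err**2)                        # Nash-Sutcliffe
              / np.sum((y_obs - y_obs.mean())**2),
    }
```

Note that MRAE divides by the observed values and is therefore sensitive to low-flow months, which helps explain the spread in MRAE across models in the tables.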

Criteria | FFNN | TDNN | RBFNN | RNN | GPR | SVR-Quadratic | KNN |
---|---|---|---|---|---|---|---|
R | 0.91 | 0.90 | 0.90 | 0.92 | 0.91 | 0.87 | 0.92 |
MRAE | 0.491 | 0.670 | 0.513 | 0.70 | 0.50 | 0.55 | 0.53 |
MAE | 0.977 | 1.048 | 1.041 | 1.13 | 0.99 | 1.11 | 0.95 |
AME | 5.936 | 5.804 | 5.753 | 5.78 | 6.16 | 7.07 | 5.84 |
MSE | 2.451 | 2.710 | 2.772 | 2.79 | 2.60 | 3.39 | 2.23 |
RMSE | 1.566 | 1.646 | 1.665 | 1.67 | 1.61 | 1.84 | 1.49 |
E | 0.822 | 0.803 | 0.799 | 0.81 | 0.81 | 0.76 | 0.84 |

Criteria | FFNN | TDNN | RBFNN | RNN | GPR | SVR-Quadratic | KNN |
---|---|---|---|---|---|---|---|
R | 0.76 | 0.70 | 0.91 | 0.70 | 0.77 | 0.74 | 0.71 |
MRAE | 0.365 | 0.538 | 0.292 | 0.55 | 0.42 | 0.67 | 0.36 |
MAE | 96.815 | 125.487 | 62.540 | 113.64 | 94.169 | 112.2 | 94.79 |
AME | 1092.976 | 1177.905 | 418.013 | 1210.2 | 1095.94 | 1178.4 | 1132.9 |
MSE | 27,871.642 | 36,228.586 | 10,857.818 | 29,642.51 | 23,597.1 | 26,823.89 | 28,282 |
RMSE | 166.948 | 190.338 | 104.201 | 172.17 | 153.61 | 163.78 | 168.17 |
E | 0.553 | 0.414 | 0.826 | 0.48 | 0.59 | 0.53 | 0.50 |

Criteria | FFNN | TDNN | RBFNN | RNN | GPR | SVR-Quadratic | KNN |
---|---|---|---|---|---|---|---|
R | 0.75 | 0.71 | 0.89 | 0.82 | 0.73 | 0.73 | 0.72 |
MRAE | 0.503 | 9.508 | 0.807 | 0.65 | 0.58 | 0.67 | 0.55 |
MAE | 92.649 | 103.872 | 78.786 | 111.55 | 118.2 | 131.51 | 117.15 |
AME | 526.256 | 632.494 | 373.554 | 581.38 | 764.72 | 785.74 | 826.99 |
MSE | 20,467.509 | 23,024.515 | 12,757.705 | 24,670.98 | 34,144.9 | 34,384.28 | 35,878.1 |
RMSE | 143.065 | 155.391 | 112.950 | 157.07 | 184.78 | 185.43 | 189.42 |
E | 0.536 | 0.447 | 0.711 | 0.67 | 0.53 | 0.53 | 0.51 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Alizadeh, Z.; Yazdi, J.; Kim, J.H.; Al-Shamiri, A.K. Assessment of Machine Learning Techniques for Monthly Flow Prediction. *Water* **2018**, *10*, 1676. https://doi.org/10.3390/w10111676