
Successive-Station Streamflow Prediction and Precipitation Uncertainty Analysis in the Zarrineh River Basin Using a Machine Learning Technique

Department of Environmental Engineering, University of Tehran, Tehran 14179, Iran
Department of Hydraulic Engineering, Tsinghua University, Beijing 100084, China
Association of Talent under Liberty in Technology (TULTECH), 10615 Tallinn, Estonia
Institute for Nanomaterials, Advanced Technologies and Innovation, Technical University of Liberec, Studentská 1402/2, 461 17 Liberec, Czech Republic
Department of Civil and Environmental Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran 15875, Iran
Authors to whom correspondence should be addressed.
Water 2023, 15(5), 999
Submission received: 6 January 2023 / Revised: 19 February 2023 / Accepted: 25 February 2023 / Published: 6 March 2023


Precise forecasting of streamflow is crucial for the proper management of water resources. The present investigation predicts successive-station streamflow using the Gated Recurrent Unit (GRU) model and quantifies the impact of input (i.e., precipitation) uncertainty on the GRU model’s predictions using the Generalized Likelihood Uncertainty Estimation (GLUE) method. The Zarrineh River basin in the Lake Urmia basin, Iran, was selected as the case study because of the importance of the location and its significant contribution to the lake inflow. Four stations in the basin were considered to predict successive-station streamflow from upstream to downstream. The GRU model yielded highly accurate streamflow predictions at all stations. Future precipitation data generated under the Representative Concentration Pathway (RCP) scenarios were used to estimate the effect of precipitation input uncertainty on streamflow prediction. The p-factor (the share of observations inside the uncertainty interval) and the r-factor (the width of the uncertainty interval) were used to evaluate the streamflow prediction uncertainty. GLUE produced reliable uncertainty ranges at all stations, with r-factors from 0.47 to 0.57 and p-factors from 61.6% to 89.3%.

1. Introduction

Efficient management of water resources requires precise prediction of river streamflow to evaluate the effect of climate and land-use alterations, as well as increased agricultural irrigation, on regional aquatic systems [1]. Lake Urmia (LU), located in northwestern Iran, has significant socio-economic importance. However, the water level of the lake has decreased by up to 5 m over the past few decades due to the excessive use of available water and climate changes [2,3]. The changes in river discharges flowing into the lake are primarily responsible for the changes in its water level, since rivers contribute more to the inflow of the lake than groundwater and precipitation [4]. Zarrineh River (ZR) basin is the largest and most crucial sub-basin of the LU basin, providing more than 41% of the environmental flow into LU [3]. Therefore, developing a dependable model to predict the ZR streamflow is of great significance in assessing changes in LU’s water level. Nevertheless, the complex and non-linear behavior of the hydrological system’s components and insufficient data in the region hinder streamflow prediction [5,6].
In recent years, there has been widespread research on river streamflow prediction using process-driven and data-driven methods [7,8,9]. Process-driven methods are practical techniques for understanding fundamental mechanisms of hydrological phenomena, but they require a vast number of high-resolution inputs, including meteorological data, hydrological data, vegetation coverage, soil characteristics, and topographic data [10,11]. Therefore, low computational capacity and regions with unreliable and scarce input data often limit process-based model development.
Data-driven machine learning methods can efficiently capture the nonlinearity of the input–output relationship without requiring knowledge of the underlying physical processes [12,13,14,15]. Several machine learning techniques can be used for river flow prediction, including Artificial Neural Networks (ANNs) [16], Support Vector Machines (SVMs) [17], Random Forests (RFs) [18], Gaussian Process Regression (GPR) [19], and Long Short-Term Memory (LSTM) [20]. For instance, an LSTM-based model was proposed in [20] for streamflow forecasting in a river with multiple dams. Moreover, Xu et al. [19] proposed a hybrid model for river flow forecasting that combines Gaussian process regression with an improved differential evolution algorithm. Similarly, Liu et al. [18] evaluated the effectiveness of a Random Forest model for predicting daily streamflow. However, the simple structure of these data-driven models limits their performance when the relationship between the inputs and the model output is highly nonlinear [21,22].
Recently, deep learning (DL) techniques have been successfully used to address time-series forecasting problems [23,24,25]. By employing multiple neuron layers in the network structure, DL approaches can represent more complex relationships than non-deep neural networks [26]. The Gated Recurrent Unit (GRU) was proposed to simplify the structure of the Long Short-Term Memory model while still addressing the vanishing gradient problem in recurrent neural networks (RNNs) [27]. Compared with other RNN variants, the GRU shortens the computation time without sacrificing prediction performance. GRU networks have demonstrated strong performance in dealing with nonlinearity and large quantities of data, with a simpler structure and higher computational speed than other RNN variants [9,28]. The GRU has found many successful applications in hydrology, particularly in river streamflow prediction [29,30].
Streamflow prediction models face significant uncertainties due to insufficient data and the complexity of the hydrological system. Uncertainties arise from non-optimal model features that are difficult to detect, systematic or measurement errors in the initial data, and the simplifications and assumptions made in the calculation system [11,31]. Several studies have evaluated and estimated the uncertainties in streamflow prediction [31,32,33,34,35] and concluded that input uncertainty is an essential factor affecting the accuracy of streamflow estimation. Since rainfall is the most essential input of the rainfall–runoff computation, its low spatial and temporal resolution or errors in the evaluation of precipitation data lead to significant uncertainty in the streamflow prediction [31,36].
Several studies have quantified uncertainty in streamflow prediction models using various approaches [37,38,39]. Among these, the Generalized Likelihood Uncertainty Estimation (GLUE) method is used to estimate uncertainty in prediction models [37]. It is one of the most widely used approaches for uncertainty analysis owing to its simple concept, low vulnerability to model discontinuity, and easy implementation [11,31,33,40]. This method uses the Monte Carlo (MC) approach coupled with Bayesian estimation to determine the “behavioral” simulations based on a threshold value of the likelihood score.
This paper aims to predict the monthly streamflow of the Zarrineh River at successive stations, from upstream to downstream, by employing the GRU network. To achieve this goal, five model structures with different input variables and various time-lags were considered for each station. The selected input variables include precipitation, temperature, and streamflow with zero to four-month lag time. In addition, the GLUE method was used to quantify precipitation uncertainty in model prediction. However, instead of using MC simulation to produce random precipitation data series, the precipitation data were obtained from General Circulation Models (GCMs) under different Representative Concentration Pathways (RCPs) to avoid the stochastic errors caused by random data generation.

2. Study Area

Lake Urmia, which has a total area of 5750 km², is the largest lake in Iran and accounts for 7% of the country’s surface water [4]. The Zarrineh River (ZR) basin is the largest and most crucial sub-basin of the LU basin, providing more than 41% (i.e., 1271 million cubic meters (MCM)) of the environmental flow into LU [3]. The ZR basin is situated to the southeast of Lake Urmia and covers an area of about 12,025 km², with a river length of around 300 km, as depicted in Figure 1 [41]. However, the lake surface area has drastically decreased to one-tenth, to 500 km², with a volume of half a billion cubic meters, due to the excessive use of available water and climate changes [2,3]. The Boukan Dam is the largest and most significant dam operating in the ZR basin, with a live storage capacity of 650 MCM, storing water for drinking, agricultural, and industrial uses [42]. The average annual precipitation over the basin for the last four decades was 352 mm, which classifies the region as semi-arid with a Mediterranean climate.

3. Data Collection

In this study, four hydrological stations were considered for the streamflow prediction of successive stations in the Zarrineh River basin: Safakhaneh (station #1), Boukan reservoir (station #2), Qezkorpi (station #3), and Nezamabad (station #4) (see Figure 1B). Available measured monthly streamflow and reservoir outflow data for 1974–2014 were obtained from (, accessed on 1 February 2023) and (, accessed on 1 February 2023), respectively. In addition, the meteorological data introduced into the GRU network, including the precipitation, maximum temperature, and minimum temperature dataset for 1974–2014, were collected from Iran Meteorological Organization. A total of 17 well-spread meteorological stations were considered in the basin (Figure 1B) for reliable streamflow predictions.
The projected precipitation data from GCMs under different RCPs were used in the GLUE method to estimate precipitation input uncertainty. Future precipitation data from 2025 to 2060 were collected from all the available models and RCP scenarios in the region. In total, 93 precipitation datasets under the RCP2.6, RCP4.5, RCP6.0, and RCP8.5 scenarios were obtained from the Climate Change, Agriculture and Food Security (CCAFS) data portal (, accessed on 1 February 2023). The RCPs represent greenhouse gas (GHG) concentration trajectories used to understand future climate change, and vary from very low (RCP2.6) to very high (RCP8.5) future concentrations [43]. Because the GCM projected data contain systematic errors arising from their coarse spatial resolution, they cannot be applied directly in climate models [44]. Hence, raw climate model outputs require bias correction to improve the fit of the projected data to the observations. Mengistu [45] compared raw and bias-corrected regional climate model (RCM) outputs against observed climate data. The bias-corrected RCMs performed better in reproducing rainfall, minimum temperature, and maximum temperature than the raw RCMs, which showed obvious biases in estimating climate data.

4. Model Description

4.1. Gated Recurrent Unit (GRU) Cell Structure

The GRU is an advanced variant of an RNN developed to deal with the vanishing gradient problem [27]. Several studies showed that RNNs outperform feedforward networks (FFNs), yielding more accurate and more stable streamflow predictions [46,47]. Compared to other RNN networks, the GRU has a faster training process for multistep-ahead prediction without affecting its prediction performance. Thus, it is a commonly used deep learning technique, which has been utilized in many hydrological investigations, particularly streamflow forecasting.
The typical GRU cell structure is shown in Figure 2. It has a memory ($h_t$), a candidate hidden state ($\tilde{h}_t$), and two controlling gates: the reset gate ($r_t$) and the update gate ($z_t$). The memory at the current time step $t$ is calculated from the memory at the previous time step $t-1$ using the reset and update gates. The update gate controls how much of the state information $h_{t-1}$ is carried over from the previous time step to the current one: the larger the value of the update gate, the more state information from the previous time step is retained. The reset gate determines the degree to which the information from the previous state is forgotten: the smaller the value of the reset gate, the more state information is forgotten. The update equations of the GRU cell are given in Equations (1)–(4):
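$$r_t = \sigma\left(W_r x_t + U_r h_{t-1}\right) \tag{1}$$
$$z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right) \tag{2}$$
$$\tilde{h}_t = \tanh\left(W_h x_t + r_t \odot U_h h_{t-1}\right) \tag{3}$$
$$h_t = \left(1 - z_t\right) \odot \tilde{h}_t + z_t \odot h_{t-1} \tag{4}$$
where $W$ and $U$ are the network’s weight matrices and $\odot$ denotes element-wise multiplication. The sigmoid function ($\sigma$) and the $\tanh$ function limit the output range to [0, 1] and [−1, 1], respectively.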
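As an illustration of Equations (1)–(4), the following is a minimal sketch of a single GRU update written in plain Python. It is not the implementation used in the study: bias terms are omitted as in the equations, the weights are random placeholders, and the names `gru_step`, `matvec`, and `random_matrix` are hypothetical helpers introduced only for this example.

```python
import math
import random

def sigmoid(v):
    """Element-wise logistic function, output in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def gru_step(x_t, h_prev, params):
    """One GRU update following Equations (1)-(4), without bias terms."""
    W_r, U_r, W_z, U_z, W_h, U_h = params
    r_t = sigmoid(add(matvec(W_r, x_t), matvec(U_r, h_prev)))  # reset gate, Eq. (1)
    z_t = sigmoid(add(matvec(W_z, x_t), matvec(U_z, h_prev)))  # update gate, Eq. (2)
    gated = [r * u for r, u in zip(r_t, matvec(U_h, h_prev))]
    h_cand = [math.tanh(a) for a in add(matvec(W_h, x_t), gated)]  # candidate, Eq. (3)
    # Eq. (4): blend the candidate with the previous memory
    return [(1 - z) * c + z * h for z, c, h in zip(z_t, h_cand, h_prev)]

def random_matrix(rows, cols, rng):
    """Placeholder weights; a trained network would supply these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n_in, n_hid = 3, 4  # assumed sizes, e.g. three inputs per month
params = (
    random_matrix(n_hid, n_in, rng), random_matrix(n_hid, n_hid, rng),
    random_matrix(n_hid, n_in, rng), random_matrix(n_hid, n_hid, rng),
    random_matrix(n_hid, n_in, rng), random_matrix(n_hid, n_hid, rng),
)

h = [0.0] * n_hid
for x_t in ([0.2, 0.5, 0.1], [0.4, 0.3, 0.0], [0.9, 0.1, 0.6]):
    h = gru_step(x_t, h, params)
```

Because the new memory is a weighted average of the previous memory and a tanh output, every component of `h` stays inside (−1, 1) when the initial state is zero.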

4.2. GLUE Theory

GLUE is a statistical technique for quantifying the uncertainty of forecasting computations [37]. The method runs numerous simulations of a model with different variable values and classifies the simulations as behavioral or non-behavioral. The generalized likelihood function is used to identify the behavioral simulations: a higher likelihood value represents a better agreement between observed and simulated values. After the non-behavioral simulations are discarded, the behavioral ones are used to quantify the model uncertainty. The term “behavioral” signifies the models that are accepted based on the available data and knowledge.

5. Methodology

5.1. GRU Model Development

Figure 3 shows the GRU modeling steps undertaken in this study. The GRU network was used to predict successive-station streamflow in the Zarrineh River basin. Monthly streamflow, precipitation, reservoir outflow, and maximum and minimum temperature of 1979 to 2000 (427 data points), 2000 to 2003 (50 data points), and 2003 to 2014 (127 data points) were used as the training, validation, and testing samples, respectively. The validation dataset was used to find the optimum model hyper-parameters and avoid overfitting, and the testing dataset, which was not seen during training, was used to evaluate the model’s performance.
The process of finding essential input variables with the most influence on the model’s output requires a trial-and-error procedure since there are no unified methods to determine them. Therefore, five model structures with different input variables and up to a four-month lag time were considered to find the structure with the best performance (see Table 1). In Table 1, $f$ represents the GRU networks; $Q_t$ and $Q^{us}_t$ represent the streamflow of the current and upstream station at month $t$, respectively; $P_t$ is the precipitation at month $t$; $T^{max}_t$, $T^{min}_t$, and $T^{avg}_t$ are the maximum, minimum, and average temperature at month $t$, respectively; $t-1$, $t-2$, $t-3$, and $t-4$ denote one- to four-month lag times in the model structure. Structures S1 and S2 consider all the features, except the current station’s streamflow, with zero- and one-month lag times, respectively. However, the S3, S4, and S5 structures contain all the input variables with two- to four-month lag times. A total of five model structures were used to simulate all hydrometric stations, except for the first station (Safakhaneh station), which does not include the streamflow from the upstream station.
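To make the idea of lagged inputs concrete, the sketch below assembles one input row per month from driver series and antecedent streamflow. The exact variable combinations of Table 1 are not reproduced here; the function name `build_lagged_inputs`, the layout of each row, and the sample values are assumptions chosen only for illustration.

```python
def build_lagged_inputs(drivers, flow, max_lag):
    """Return (X, y) with one row of lagged inputs per month t >= max_lag.

    drivers: dict mapping a name to a monthly series (e.g. precipitation)
    flow:    monthly streamflow at the station being predicted
    """
    X, y = [], []
    for t in range(max_lag, len(flow)):
        row = []
        for name in sorted(drivers):
            # driver values at months t, t-1, ..., t-max_lag
            row.extend(drivers[name][t - k] for k in range(max_lag + 1))
        # antecedent streamflow at months t-1, ..., t-max_lag (never month t)
        row.extend(flow[t - k] for k in range(1, max_lag + 1))
        X.append(row)
        y.append(flow[t])
    return X, y

# Made-up monthly values, for illustration only
precip = [10.0, 12.0, 30.0, 25.0, 5.0, 0.0]
t_avg = [2.0, 4.0, 9.0, 14.0, 20.0, 25.0]
flow = [3.0, 3.5, 8.0, 9.0, 4.0, 2.0]

X, y = build_lagged_inputs({"P": precip, "T_avg": t_avg}, flow, max_lag=2)
```

With a two-month lag, the first usable month is the third one, so six months of data yield four samples. Excluding the month-`t` streamflow from the inputs keeps the target out of its own predictors.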
Tuning the hyper-parameters of the GRU network is an essential step in achieving accurate prediction results [48]. However, there is no specific method available to select and optimize these parameters; therefore, a trial-and-error technique was used to find the hyper-parameters with the best model performance on the validation dataset [49]. To this end, a large number of experiments were performed over a wide range of values for each parameter.
Because the stochastic gradient descent optimization algorithm is applied to train DL networks, a loss function is defined to repeatedly evaluate the current model state. The network’s weights are then updated to improve the model performance on the subsequent evaluation. The present study used the mean squared error (MSE) as the loss function (Equation (5)):
$$MSE = \frac{1}{n}\sum_{t=1}^{n}\left(Q_o - Q_s\right)^2 \tag{5}$$
where $Q_o$ and $Q_s$ are the observed and estimated streamflow at time $t$, respectively, and $n$ is the number of data points.

5.2. Data Normalization

Normalizing raw data is an important pre-processing step in training ML approaches. Mapping all the attribute data to the same scale avoids numerical difficulties in the model and enhances the speed and accuracy of the modeling. Zhu et al. [5] suggested normalizing data into the range of [0, 1] for ML techniques, specifically ANNs. The following equation (Equation (6)) was applied in the present research for the data normalization:
$$X_{norm} = \frac{X_i - X_{min}}{X_{max} - X_{min}} \tag{6}$$
where $X_i$ and $X_{norm}$ denote the raw and normalized data, respectively. $X_{max}$ and $X_{min}$ represent the maximum and minimum of the raw dataset, respectively.
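A short sketch of min-max normalization to [0, 1], together with the inverse mapping needed to turn normalized predictions back into physical units, might look as follows. The function names and sample values are illustrative assumptions.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; also return the bounds for later inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in values], lo, hi

def denormalize(norm_values, lo, hi):
    """Map normalized values back to the original units."""
    return [v * (hi - lo) + lo for v in norm_values]

flow = [4.0, 10.0, 7.0, 16.0]  # made-up streamflow values
flow_norm, q_min, q_max = min_max_normalize(flow)
flow_back = denormalize(flow_norm, q_min, q_max)
```

A common practice, not stated in the text, is to take the minimum and maximum from the training period only and reuse them for the validation and testing periods, so that no information from unseen data enters the scaling.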

5.3. Model Evaluation Criteria

The accuracy and reliability of streamflow prediction were evaluated using three statistical measures. The Nash–Sutcliffe efficiency (NSE) (Equation (7)) is a reliable and widely used criterion for assessing the performance of hydrological models; it compares the variance of the model residuals with the variance of the observed data. The range of NSE is (−∞, 1], with values closer to 1 indicating better performance [46]. The coefficient of determination (R²) (Equation (8)), which has a range of [0, 1], represents the linear relation between the observed and predicted data. The prediction model shows more reliable results if the value of R² is closer to 1. The root mean square error (RMSE) (Equation (9)) evaluates the magnitude of the difference between the observed and predicted values. The closer the value of RMSE is to 0, the higher the accuracy of the prediction.
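$$NSE = 1 - \frac{\sum_{t=1}^{n}\left(Q_m - Q_o\right)^2}{\sum_{t=1}^{n}\left(Q_o - \overline{Q_o}\right)^2} \tag{7}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{n}\left(Q_m - Q_o\right)^2}{\sum_{t=1}^{n} Q_m^2} \tag{8}$$
$$RMSE = \sqrt{\frac{\sum_{t=1}^{n}\left(Q_m - Q_o\right)^2}{n}} \tag{9}$$
where $n$ is the number of data points. $Q_m$ and $Q_o$ are the predicted and observed values, respectively. $\overline{Q_o}$ is the average value of the observations.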
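The NSE and RMSE criteria can be computed with a few lines of plain Python. The sketch below follows the definitions given in the text (Equations (7) and (9)); the function names and the sample series are illustrative assumptions.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((s - o) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread

def rmse(obs, sim):
    """Root mean square error, in the units of the data."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

obs = [2.0, 4.0, 6.0, 8.0]  # made-up observed streamflow
sim = [3.0, 4.0, 5.0, 8.0]  # made-up predicted streamflow

score_nse = nse(obs, sim)
score_rmse = rmse(obs, sim)
```

Two reference points help when reading NSE values: a perfect prediction gives 1, and a model that always predicts the observed mean gives 0, so negative values indicate a model that performs worse than the mean.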

5.4. Bias Correction Method

The bias correction method enhances the reliability of climate model simulations by adjusting projected precipitation and temperature data to the observations [50]. Thus, simulated raw climate data are corrected based on the differences in the mean and variability between the climate model outputs and the observed data in a reference period. The general procedure of the bias correction approach is illustrated in Figure 4.
The general form of the bias correction method uses observations to correct the mean and temporal variability of the climate model outputs. This bias correction is performed by the following equation:
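$$T_{BC}(t) = \overline{O_{REF}} + \frac{\sigma_{o,REF}}{\sigma_{T,REF}}\left(T_{RAW}(t) - \overline{T_{REF}}\right) \tag{10}$$
where $T_{BC}$ is the bias-corrected GCM output, $T_{RAW}$ is the raw GCM output for the historical or future period, $T_{REF}$ is the GCM output from the historical reference period, $\overline{O_{REF}}$ and $\overline{T_{REF}}$ are the means of the observations and of the GCM output over the reference period, respectively, and $\sigma_{T,REF}$ and $\sigma_{o,REF}$ are the standard deviation of the GCM output and the standard deviation of the reference observations from the reference period, respectively.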
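The mean-and-variance bias correction described above can be sketched as follows. The function name `bias_correct` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def bias_correct(raw, gcm_ref, obs_ref):
    """Shift and rescale raw GCM output to match observed mean and variability.

    raw:     raw GCM output to correct (historical or future period)
    gcm_ref: GCM output over the reference period
    obs_ref: observations over the same reference period
    """
    mean_obs = statistics.fmean(obs_ref)
    mean_gcm = statistics.fmean(gcm_ref)
    # ratio of observed to modeled variability in the reference period
    scale = statistics.pstdev(obs_ref) / statistics.pstdev(gcm_ref)
    return [mean_obs + scale * (v - mean_gcm) for v in raw]

obs_ref = [10.0, 20.0, 30.0]  # made-up reference observations
gcm_ref = [14.0, 18.0, 22.0]  # made-up GCM output, too wet and too flat
corrected_ref = bias_correct(gcm_ref, gcm_ref, obs_ref)
```

Applying the correction to the reference-period GCM output itself is a useful sanity check: the corrected series then has the same mean and standard deviation as the observations, which in this small example means it reproduces them up to rounding.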

5.5. Quantification of Input Data Uncertainty Using GLUE

The general concept of the GLUE method as applied in the present paper is shown in Figure 5. The first step in quantifying input data uncertainty is generating multiple sets of input data. While previous studies have used MC simulation for this purpose [11,31,40], this study used projected precipitation data from GCMs under different RCPs for the period of 2025–2060. In total, 93 precipitation datasets were acquired from GCMs. Then, the likelihood value ($L(P|Q)$) was obtained after applying each dataset to the GRU network. The widely used likelihood measure is defined by the NSE (Equation (11)) [31,33]:
$$L(P|Q) = 1 - \frac{\sum_{t=1}^{n}\left(Q_o - Q_S\right)^2}{\sum_{t=1}^{n}\left(Q_o - \overline{Q}\right)^2} \tag{11}$$
where $Q_o$ is the observed streamflow, $Q_S$ is the simulated streamflow, and $\overline{Q}$ is the average of the observed streamflow dataset.
The likelihood value $L(P|Q)$ is compared with the chosen threshold value $\alpha = 80\%$ to separate the behavioral ($L(P|Q) \geq 80\%$) from the non-behavioral ($L(P|Q) < 80\%$) datasets. The non-behavioral datasets are then discarded, and the behavioral ones determine the uncertainty interval through the upper (UL, Equation (12)) and lower (LL, Equation (13)) limits. Furthermore, the streamflow at the upper and lower limits is obtained using Equations (14) and (15).
$$UL = \frac{1 + \alpha}{2} \times 100\% \tag{12}$$
$$LL = \frac{1 - \alpha}{2} \times 100\% \tag{13}$$
$$Q_{UL} = Q_{max} + \frac{P_{UL} - P_{max}}{P_{min} - P_{max}}\left(Q_{P_{min}} - Q_{P_{max}}\right) \tag{14}$$
$$Q_{LL} = Q_{min} + \frac{P_{LL} - P_{max}}{P_{min} - P_{max}}\left(Q_{P_{min}} - Q_{P_{max}}\right) \tag{15}$$
where $Q_{UL}$ and $Q_{LL}$ are the upper and lower limits of the predicted streamflow, respectively; $Q_{min}$ and $Q_{max}$ are the minimum and maximum streamflow, respectively; $P_{min}$ and $P_{max}$ are the precipitation data corresponding to $Q_{min}$ and $Q_{max}$, respectively; and $P_{LL}$ and $P_{UL}$ are the precipitation data associated with the lower and upper limits of the likelihood values, respectively.
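The behavioral/non-behavioral split and the limit percentages of Equations (12) and (13) can be sketched as below. This is only an illustration under stated assumptions: the dataset names, the sample series, and the helpers `split_behavioral` and `likelihood_limits` are hypothetical, and the likelihood is taken to be the NSE as in Equation (11).

```python
def nse(obs, sim):
    """NSE-based likelihood of one simulated series."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread

def split_behavioral(obs, simulations, threshold=0.8):
    """Keep simulations whose likelihood reaches the threshold."""
    behavioral, rejected = {}, {}
    for name, sim in simulations.items():
        score = nse(obs, sim)
        if score >= threshold:
            behavioral[name] = score
        else:
            rejected[name] = score
    return behavioral, rejected

def likelihood_limits(alpha):
    """Upper and lower limits in percent for a threshold alpha in (0, 1)."""
    return (1 + alpha) / 2 * 100, (1 - alpha) / 2 * 100

obs = [2.0, 4.0, 6.0, 8.0]  # made-up observed streamflow
simulations = {
    "dataset_a": [3.0, 4.0, 5.0, 8.0],  # close to the observations
    "dataset_b": [5.0, 5.0, 5.0, 5.0],  # no better than the observed mean
}

behavioral, rejected = split_behavioral(obs, simulations, threshold=0.8)
upper, lower = likelihood_limits(0.8)
```

With the threshold of 0.8, only the first series is retained, and the limits evaluate to roughly 90% and 10%.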
The p-factor and r-factor are applied to quantify the strength of the simulation and evaluate the predicted streamflow uncertainty. The p-factor is the percentage of observed data falling inside the 95% prediction uncertainty interval (95PPU) (Equation (16)). The r-factor reflects the average width of the 95PPU band relative to the standard deviation of the observations (Equation (17)). Theoretically, the prediction is a perfect fit with the observed data if the p-factor and r-factor are 1 and 0, respectively. A p-factor larger than 50% indicates low uncertainty, and a low value of the r-factor shows lower uncertainty in the model prediction.
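$$p\text{-}factor = \frac{\sum_{t=1}^{n} I\left(Q_o^t\right)}{n} \tag{16}$$
$$\text{with}\quad I\left(Q_o^t\right) = \begin{cases} 1 & \text{if } Q_{LL} < Q_o < Q_{UL} \\ 0 & \text{otherwise} \end{cases}$$
$$r\text{-}factor = \frac{\frac{1}{n}\sum_{t=1}^{n}\left(Q_{UL} - Q_{LL}\right)}{\sigma_o} \tag{17}$$
where $Q_o^t$ is the observed streamflow at time $t$ and $\sigma_o$ is the standard deviation of the observed streamflow.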
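The two indices are straightforward to compute once the lower and upper bounds of the uncertainty band are available for each time step. The sketch below uses made-up values; the function names are illustrative, and the population standard deviation is an assumption because the text does not specify the estimator.

```python
import statistics

def p_factor(obs, lower, upper):
    """Fraction of observations strictly inside the uncertainty band."""
    inside = sum(1 for o, lo, hi in zip(obs, lower, upper) if lo < o < hi)
    return inside / len(obs)

def r_factor(obs, lower, upper):
    """Mean band width divided by the standard deviation of the observations."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(obs)
    return mean_width / statistics.pstdev(obs)

obs = [2.0, 4.0, 6.0, 8.0]    # made-up observed streamflow
lower = [1.0, 3.5, 6.5, 7.0]  # made-up lower bound of the band
upper = [3.0, 4.5, 7.5, 9.0]  # made-up upper bound of the band

p = p_factor(obs, lower, upper)
r = r_factor(obs, lower, upper)
```

Here three of the four observations fall inside the band, so the p-factor is 0.75 (75%). The two indices pull in opposite directions: widening the band raises the p-factor but also raises the r-factor, which is why they are reported together.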

6. Results

6.1. Evaluation of GRU Networks

The hyper-parameters that require tuning include the optimizer, activation function, learning rate, number of epochs, and batch size. An epoch is one pass of the entire dataset through the network to complete an iterative calculation. Each epoch contains a large amount of data; thus, the data are split into small batches. The number of epochs and the batch size are set to 1000 and 64, respectively. Although a large number of epochs is selected, an early-stopping callback is applied to stop the training process if the performance on the validation dataset starts to decrease. Optimization with the stochastic gradient descent algorithm is limited by its use of the same learning rate for every variable. The Adam algorithm automatically adapts the learning rate of each variable using the gradients observed for that variable. However, the algorithm may not locate the optima if the learning rate of a variable becomes too small. The Root Mean Squared Propagation (RMSprop) algorithm, an extension of the previous algorithms, uses a decaying moving average of partial gradients to focus on the most recently seen partial gradients and forget early gradients [51,52]. The RMSprop optimizer with a learning rate of 0.001 was selected. Moreover, the activation function was set to “Tanh”. Note that the GRU models with the various structures use the same hyper-parameters.
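To illustrate how RMSprop and early stopping interact, the following toy sketch minimizes a one-variable training loss while monitoring a separate validation loss. It is not the training code of the study: the quadratic losses, the decay rate `rho`, the constant `eps`, and the `patience` value are assumptions; only the learning rate of 0.001 and the cap of 1000 epochs come from the text.

```python
import math

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update with a decaying average of squared gradients."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    w = w - lr * grad / (math.sqrt(avg_sq) + eps)
    return w, avg_sq

def train(grad_fn, val_loss_fn, w0, max_epochs=1000, patience=10, lr=0.001):
    """Run RMSprop and stop once validation loss fails to improve `patience` times."""
    w, avg_sq = w0, 0.0
    best_w, best_loss, waited, epochs_run = w0, val_loss_fn(w0), 0, 0
    for epoch in range(max_epochs):
        epochs_run = epoch + 1
        w, avg_sq = rmsprop_step(w, grad_fn(w), avg_sq, lr=lr)
        loss = val_loss_fn(w)
        if loss < best_loss:
            best_w, best_loss, waited = w, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation performance has started to degrade
    return best_w, best_loss, epochs_run

# Toy problem: training loss (w - 3)^2, validation loss (w - 2.5)^2
best_w, best_loss, epochs_run = train(
    grad_fn=lambda w: 2.0 * (w - 3.0),
    val_loss_fn=lambda w: (w - 2.5) ** 2,
    w0=2.0,
)
```

In this toy setting the weight moves steadily toward the training optimum at 3, passes the validation optimum at 2.5, and training halts a few epochs later, returning the weight with the lowest validation loss rather than the last one.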
Numerous experiments were performed on the chosen range for hidden layers, neurons, and drop-out values for each structure of the GRU network in each hydrometric station. The hidden layers, neurons, and drop-out values varied between 1–5, 5–500, and 0.3–0.7, respectively. For instance, the outlet station of the Zarrineh River basin (Nezamabad station) has two hidden layers with 100 and 120 neurons in each layer, and a drop-out value of 0.4 in the S3 model structure. The GRU network might reach sub-optimal solutions using a random start point. Therefore, ten identical runs were performed for each structure, and the final model was selected based on the replication with the best performance in the testing period.
Table 2 lists the results of the GRU network with five model structures for each hydrometric station. The best model structure was determined based on the statistical criteria of NSE, R², and RMSE in the validation and testing phases to obtain high and comparable performance and avoid model overfitting. The S1 structure, which has no lag time, shows the poorest performance among the model structures. However, introducing the antecedent streamflow of the station and a one-month lag time of the other input parameters in the S2 model increases the model performance significantly compared to the S1 structure. The S2, S3, S4, and S5 model structures have the same input variables with a one- to four-month lag time. All the available climate data with various lag times were considered in the model structures to obtain the best combination of these inputs and their period. In addition, lag times were chosen in order to analyze how temporal variations in inputs affect the results.
At the first station of the Zarrineh River basin, i.e., Safakhaneh, the models’ performance improves as the lag time increases, except for the S5 model, which performed worse than the S4 model. This indicates that the model’s performance declines when the model is complicated with excessive inputs. Overall, the S4 structure shows the best performance among the models, with NSE, R², and RMSE of 0.75, 0.78, and 5.7, respectively, in the testing phase. While the streamflow of an upstream station is not considered in the model structures of this station, the stations downstream of Safakhaneh benefit from the upstream streamflow. The monthly inflow to the Boukan dam was predicted using five structures, in which S1 presents inferior performance compared to the other models. However, applying the station’s streamflow and various lag times substantially improves the statistical criteria of the model. The S5 structure with all the input variables and a four-month lag time shows the best output results, with NSE, R², and RMSE of 0.85, 0.86, and 20.7, respectively.
Considering that the Qezkorpi station is located downstream of the Boukan dam, the measured monthly outflow of the dam is used as the upstream outflow in the GRU network. All the structures show high performance with comparable results, but with a slight improvement in the S4 model composed of all the input variables and a three-month lag time. The evaluation criteria for the S4 model are 0.98, 0.99, and 8.2 for NSE, R², and RMSE, respectively, making it the most accurate model. The most critical station in the Zarrineh River basin is the outlet station, i.e., Nezamabad, which yields the outflow to Lake Urmia. The model produces satisfactory results in all the structures, with the highest model performance in the S3 model, with NSE, R², and RMSE of 0.87, 0.88, and 18.3, respectively. Thus, the GRU network shows a significant capability to predict the successive-station monthly streamflow of the basin, particularly at the outlet station contributing to the Lake Urmia inflow.
The observed and predicted hydrographs and the scatter plots of the structure with the best performance for each hydrometric station in the training, validation, and testing phases are shown in Figure 6. The hydrographs show that the model accurately predicted the streamflow fluctuations at all the stations. In addition, the scatter plots illustrate that the streamflow is predicted with high R² at each station. Although the model shows some inconsistencies at high flows at the Safakhaneh station and Boukan dam, it performed reasonably for the low- and medium-range flows. The GRU model generally performed well for all the flows at the Qezkorpi and Nezamabad stations. Differences in climate data, land use, and the location of the stations are responsible for the inconsistency in the results for the same model structure at different stations. The results demonstrate that the model performed better for the downstream stations than for the upstream stations, considering that the calibrated river flow reaches the downstream stations.

6.2. Uncertainty

The 93 projected precipitation datasets from GCMs were used to determine the uncertainty in the input data. The projected datasets were obtained from 2025 to 2060 and applied to the best GRU model of each station to predict ensemble streamflow. The 95PPU plots derived from the 93 precipitation datasets for each station are presented in Figure 7. The likelihood value $L(P|Q)$ was evaluated against the threshold of $\alpha = 80\%$. The results indicate that more than 92%, 96%, 98%, and 91% of the precipitation datasets satisfied the likelihood threshold at the Safakhaneh, Boukan dam, Qezkorpi, and Nezamabad stations, respectively. These precipitation datasets are called behavioral and were retained to estimate the uncertainty of the input data in the GRU network (see Table 3). The calculated p-factor represents the percentage of observed streamflows falling inside the 95PPU. The uncertainty is lower if the p-factor and r-factor are closer to 1 and zero, respectively [53]. The p-factor of all the stations is greater than 50%, showing low uncertainty in the retained datasets. The stations have similar r-factors, but the p-factors of the Boukan dam and Qezkorpi stations are higher, indicating lower uncertainty than at the other stations.

7. Conclusions

Environmental studies on water-related subjects are increasing in developing countries because of the vital importance of water [54,55,56]. In Iran, a developing country, the alteration of the Zarrineh River streamflow is primarily responsible for the changes in the water level of Lake Urmia. This study uses a reliable machine learning method, i.e., GRU, to predict the successive-station monthly streamflow of the Zarrineh River basin. Among the five model structures defined for each station, the structure with the most accurate results was selected based on the three statistical criteria. The input variables in the model structures include the streamflow of the current and upstream station, precipitation, and maximum, minimum, and average temperature, with a lag time of up to four months, excluding the Safakhaneh station, which had no upstream streamflow. The GRU network showed strong performance in predicting streamflow, particularly at the basin’s outlet station, Nezamabad. Furthermore, the GLUE method was applied to assess the effect of precipitation uncertainty on streamflow prediction. To this end, ensemble streamflows were obtained by applying 93 GCM projected precipitation datasets to the GRU network. The results indicate the capability of this method to include the precipitation input uncertainty in the streamflow prediction. Most of the precipitation datasets satisfied the likelihood threshold at the selected high confidence level. Furthermore, the p-factor and r-factor were used to estimate the input uncertainty by comparing the observed streamflow with the ensemble predicted streamflow. The p-factor of all the stations is greater than 50% and the r-factor is around 0.5, showing low uncertainty in the retained datasets.

Author Contributions

Conceptualization, M.N. and F.G.; methodology, P.N.; software, M.G.; validation, M.A. and M.N.; formal analysis, M.N.; investigation, F.G.; resources, S.W.; data curation, M.G.; writing—original draft preparation, M.N.; writing—review and editing, M.G.; visualization, M.G.; supervision, M.A.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.


Funding

This work was supported by the Ministry of Education, Youth and Sports of the Czech Republic and the European Union (European Structural and Investment Funds) in the framework of the Operational Programme Research, Development and Education, project Hybrid Materials for Hierarchical Structures (HyHi, Reg. no. CZ.02.1.01/0.0/0.0/16_019/0000843).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request.


Acknowledgments

The authors would like to thank the Technical University of Liberec in Czechia, the University of Tehran and Amirkabir University in Iran, and Tsinghua University in China, with special thanks to the non-profit Association of Talent under Liberty in Technology (TULTECH) in Estonia.

Conflicts of Interest

The authors declare no conflict of interest.


References

  1. Nakhaei, M.; Akrami, M.; Gheibi, M.; Coronado, P.D.U.; Hajiaghaei-Keshteli, M.; Mahlknecht, J. A novel framework for technical performance evaluation of water distribution networks based on the water-energy nexus concept. Energy Convers. Manag. 2022, 273, 116422.
  2. Jalili, S.; Hamidi, S.A.; Namdar Ghanbari, R. Climate variability and anthropogenic effects on Lake Urmia water level fluctuations, northwestern Iran. Hydrol. Sci. J. 2016, 61, 1759–1769.
  3. Yazdandoost, F.; Moradian, S.; Izadi, A. Evaluation of Water Sustainability under a Changing Climate in Zarrineh River Basin, Iran. Water Resour. Manag. 2020, 34, 4831–4846.
  4. Farajzadeh, J.; Fard, A.F.; Lotfi, S. Modeling of monthly rainfall and runoff of Urmia lake basin using “feed-forward neural network” and “time series analysis” model. Water Resour. Ind. 2014, 7, 38–48.
  5. Zhu, S.; Zhou, J.; Ye, L.; Meng, C. Streamflow estimation by support vector machine coupled with different methods of time series decomposition in the upper reaches of Yangtze River, China. Environ. Earth Sci. 2016, 75, 531.
  6. Zhang, J.; Chen, X.; Khan, A.; Zhang, Y.K.; Kuang, X.; Liang, X.; Nuttall, J. Daily runoff forecasting by deep recursive neural network. J. Hydrol. 2021, 596, 126067.
  7. Yuan, X.; Chen, C.; Lei, X.; Yuan, Y.; Adnan, R.M. Monthly runoff forecasting based on LSTM–ALO model. Stoch. Environ. Res. Risk Assess. 2018, 32, 2199–2212.
  8. Ghadami, N.; Gheibi, M.; Kian, Z.; Faramarz, M.G.; Naghedi, R.; Eftekhari, M.; Fathollahi-Fard, A.M.; Dulebenets, M.A.; Tian, G. Implementation of solar energy in smart cities using an integration of artificial neural network, photovoltaic system and classical Delphi methods. Sustain. Cities Soc. 2021, 74, 103149.
  9. Morovati, K.; Tian, F.; Kummu, M.; Shi, L.; Tudaji, M.; Nakhaei, P.; Olivares, M.A. Contributions from climate variation and human activities to flow regime change of Tonle Sap Lake from 2001 to 2020. J. Hydrol. 2023, 616, 128800.
  10. Duan, Q.; Sorooshian, S.; Gupta, V. Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resour. Res. 1992, 28, 1015–1031.
  11. Wu, H.; Chen, B. Evaluating uncertainty estimates in distributed hydrological modeling for the Wenjing River watershed in China by GLUE, SUFI-2, and ParaSol methods. Ecol. Eng. 2015, 76, 110–121.
  12. Akbarian, H.; Jalali, F.M.; Gheibi, M.; Hajiaghaei-Keshteli, M.; Akrami, M.; Sarmah, A.K. A sustainable Decision Support System for soil bioremediation of toluene incorporating UN sustainable development goals. Environ. Pollut. 2022, 307, 119587.
  13. Mousavi, S.J.; Nakhaei, P.; Sadollah, A.; Kim, J.H. Optimization of Hydropower Storage Projects Using Harmony Search Algorithm. In Harmony Search Algorithm, Proceedings of the 3rd International Conference on Harmony Search Algorithm (ICHSA 2017), Bilbao, Spain, 22–24 February 2017; Del Ser, J., Ed.; Springer: Singapore, 2017; Volume 514.
  14. Talebidalouei, M.; Mirbagheri, S.A.; Nakhaei, P. Treatment prediction of sugar industry wastewater in moving-bed biofilm reactor using multi expression programming. Desalination Water Treat. 2020, 191, 82–92.
  15. Gheibi, M.; Eftekhari, M.; Akrami, M.; Emrani, N.; Hajiaghaei-Keshteli, M.; Fathollahi-Fard, A.M.; Yazdani, M. A sustainable decision support system for drinking water systems: Resiliency improvement against cyanide contamination. Infrastructures 2022, 7, 88.
  16. Bárdossy, A. Neural network-based modelling of the river Danube. J. Hydrol. 2008, 349, 88–100.
  17. Jang, D.; Kim, T.W.; Park, H. A Comparative Study of Machine Learning Methods for Streamflow Prediction of Unimpaired Rivers in the United States. Water 2016, 8, 438.
  18. Liu, J.; Yang, W.; Xiong, L.; He, X. Prediction of Daily Streamflow Using Random Forest Model. J. Hydrol. Eng. 2018, 23, 04018041.
  19. Xu, Z.; Xu, C.Y.; Song, J. A hybrid model based on Gaussian process regression and an improved differential evolution algorithm for river flow forecasting. J. Hydrol. 2016, 533, 143–153.
  20. Zhang, H.; Sun, Z.; Liu, C.; Zhang, J.; Zhang, B. LSTM-based streamflow forecasting for a river with multiple dams considering hydrologic similarity. J. Hydrol. 2019, 574, 697–710.
  21. Amiri, E. Forecasting daily river flows using nonlinear time series models. J. Hydrol. 2015, 527, 1054–1072.
  22. Arab, M.; Akbarian, H.; Gheibi, M.; Akrami, M.; Fathollahi-Fard, A.M.; Hajiaghaei-Keshteli, M.; Tian, G. A soft-sensor for sustainable operation of coagulation and flocculation units. Eng. Appl. Artif. Intell. 2022, 115, 105315.
  23. Hu, R.; Fang, F.; Pain, C.C.; Navon, I.M. Rapid spatio-temporal flood prediction and uncertainty quantification using a deep learning method. J. Hydrol. 2019, 575, 911–920.
  24. Morovati, K.; Nakhaei, P.; Tian, F.; Tudaji, M.; Hou, S. A Machine Learning Framework to Predict Reverse Flow and Water Level: A Case Study of Tonle Sap Lake. J. Hydrol. 2021, 603, 127168.
  25. Shahsavar, M.M.; Akrami, M.; Gheibi, M.; Kavianpour, B.; Fathollahi-Fard, A.M.; Behzadian, K. Constructing a smart framework for supplying the biogas energy in green buildings using an integration of response surface methodology, artificial intelligence and petri net modelling. Energy Convers. Manag. 2021, 248, 114794.
  26. Raghu, M.; Poole, B.; Kleinberg, J.; Ganguli, S.; Sohl-Dickstein, J. On the expressive power of deep neural networks. In Proceedings of the International Conference on Machine Learning 2017, Sydney, Australia, 6–11 August 2017; pp. 2847–2854.
  27. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
  28. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328.
  29. Apaydin, H.; Feizi, H.; Sattari, M.T.; Colak, M.S.; Shamshirband, S.; Chau, K.W. Comparative analysis of recurrent neural network architectures for reservoir inflow forecasting. Water 2020, 12, 1500.
  30. Zhu, S.; Heddam, S.; Nyarko, E.K.; Hadzima-Nyarko, M.; Piccolroaz, S.; Wu, S. Modeling daily water temperature for rivers: Comparison between adaptive neuro-fuzzy inference systems and artificial neural networks models. Environ. Sci. Pollut. Res. 2019, 26, 402–420.
  31. Bae, D.H.; Trinh, H.L.; Nguyen, H.M. Uncertainty estimation of the SURR model parameters and input data for the Imjin River basin using the GLUE method. J. Hydro-Environ. Res. 2018, 20, 52–62.
  32. Lee, H.; Balin, D.; Shrestha, R.R.; Rode, M. Streamflow prediction with uncertainty analysis, Weida catchment, Germany. KSCE J. Civ. Eng. 2010, 14, 413–420.
  33. Tang, X.; Zhang, J.; Wang, G.; Jin, J.; Liu, C.; Liu, Y.; Bao, Z. Uncertainty Analysis of SWAT Modeling in the Lancang River Basin Using Four Different Algorithms. Water 2021, 13, 341.
  34. Zhang, C.; Yan, H.; Takase, K.; Oue, H. Comparison of the soil physical properties and hydrological processes in two different forest type catchments. Water Resour. 2016, 43, 225–237.
  35. Her, Y.; Yoo, S.H.; Cho, J.; Hwang, S.; Jeong, J.; Seong, C. Uncertainty in hydrological analysis of climate change: Multi-parameter vs. multi-GCM ensemble predictions. Sci. Rep. 2019, 9, 4974.
  36. Fabry, F. Obstacles to the greater use of weather radar information. In Proceedings of the 6th International Symposium on Hydrological Applications of Weather Radar, Melbourne, Australia, 2–4 February 2004.
  37. Beven, K.; Binley, A. The future of distributed models: Model calibration and uncertainty prediction. Hydrol. Process. 1992, 6, 279–298.
  38. Vrugt, J.A.; Diks, C.G.; Gupta, H.V.; Bouten, W.; Verstraten, J.M. Improved treatment of uncertainty in hydrologic modeling: Combining the strengths of global optimization and data assimilation. Water Resour. Res. 2005, 41.
  39. Vrugt, J.A.; Gupta, H.V.; Bouten, W.; Sorooshian, S. A Shuffled Complex Evolution Metropolis algorithm for optimization and uncertainty assessment of hydrologic model parameters. Water Resour. Res. 2003, 39, 1201.
  40. Liang, Y.; Cai, Y.; Sun, L.; Wang, X.; Li, C.; Liu, Q. Sensitivity and uncertainty analysis for streamflow prediction based on multiple optimization algorithms in Yalong River Basin of southwestern China. J. Hydrol. 2021, 601, 126598.
  41. Amini, A.; Ghazvinei, P.T.; Javan, M.; Saghafian, B. Evaluating the impacts of watershed management on runoff storage and peak flow in Gav-Darreh watershed, Kurdistan, Iran. Arab. J. Geosci. 2014, 7, 3271–3279.
  42. Emami, F.; Koch, M. Agricultural water productivity-based hydro-economic modeling for optimal crop pattern and water resources planning in the Zarrine River Basin, Iran, in the wake of climate change. Sustainability 2018, 10, 3953.
  43. Jubb, I.; Canadell, P.; Dix, M. Representative Concentration Pathways (RCPs); Australian Climate Change Science Program; Australian Government, Department of the Environment, Canberra: Canberra, Australia, 2013; pp. 5–7.
  44. Ramirez-Villegas, J.; Challinor, A.J.; Thornton, P.K.; Jarvis, A. Implications of regional improvement in global climate models for agricultural impact research. Environ. Res. Lett. 2013, 8, 024018.
  45. Mengistu, A.G.; Woldesenbet, T.A.; Dile, Y.T. Evaluation of the performance of bias-corrected CORDEX regional climate models in reproducing Baro–Akobo basin climate. Theor. Appl. Climatol. 2021, 144, 751–767.
  46. Kumar, P.S.; Praveen, T.V.; Prasad, M.A. Artificial Neural Network Model for Rainfall-Runoff: A Case Study. Int. J. Hybrid Inf. Technol. 2016, 9, 263–272.
  47. Muhammad, A.U.; Li, X.; Feng, J. Using LSTM GRU and Hybrid Models for Streamflow Forecasting. In Machine Learning and Intelligent Communications, Proceedings of the 4th International Conference, MLICOM 2019, Nanjing, China, 24–25 August 2019; Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering; Zhai, X., Chen, B., Zhu, K., Eds.; Springer: Cham, Switzerland, 2019; Volume 294.
  48. Hutter, F.; Hoos, H.; Leyton-Brown, K. An efficient approach for assessing hyperparameter importance. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 754–762.
  49. Liang, C.; Li, H.; Lei, M.; Du, Q. Dongting lake water level forecast and its relationship with the three gorges dam based on a long short-term memory network. Water 2018, 10, 1389.
  50. Piani, C.; Weedon, G.P.; Best, M.; Gomes, S.M.; Viterbo, P.; Hagemann, S.; Haerter, J.O. Statistical bias correction of global simulated daily precipitation and temperature for the application of hydrological models. J. Hydrol. 2010, 395, 199–215.
  51. Tieleman, T.; Hinton, G. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 2012, 4, 26–31.
  52. Kochenderfer, M.J.; Wheeler, T.A. Algorithms for Optimization; MIT Press: Cambridge, MA, USA, 2019; ISBN 978-0262039420.
  53. Mehan, S.; Neupane, R.P.; Kumar, S. Coupling of SUFI 2 and SWAT for Improving the Simulation of Streamflow in an Agricultural Watershed of South Dakota. Hydrol. Curr. Res. 2017, 8, 280.
  54. Kiyan, A.; Gheibi, M.; Akrami, M.; Moezzi, R.; Behzadian, K. A Comprehensive Platform for Air Pollution Control System Operation in Smart Cities of Developing Countries: A Case Study of Tehran. Environ. Ind. Lett. 2023, 1, 10–27.
  55. Kiyan, A.; Gheibi, M.; Akrami, M.; Moezzi, R.; Behzadian, K.; Taghavian, H. The Operation of Urban Water Treatment Plants: A Review of Smart Dashboard Frameworks. Environ. Ind. Lett. 2023, 1, 28–45.
  56. Gheibi, M.; Chahkandi, B.; Behzadian, K.; Akrami, M.; Moezzi, R. Evaluation of Ceramic Water Filters’ Performance and Analysis of Managerial Insights by SWOT Matrix. Environ. Ind. Lett. 2023, 1, 1–9.
Figure 1. Zarrineh River basin: Part (A) shows the location map of the Zarrineh River basin in the northwest of Iran and part (B) shows the Zarrineh River basin and its rivers network along with hydrometric and meteorological stations 1–4.
Figure 2. Schematic of a GRU cell.
Figure 3. Flowchart of GRU modeling procedure.
Figure 4. Schematic of the bias improvement system.
Figure 5. General concept of GLUE method.
Figure 6. Comparison of observed and predicted streamflow for (a) Safakhaneh, (b) Boukan dam, (c) Qezkorpi, (d) Nezamabad stations.
Figure 7. Uncertainty interval (95 PPU) of input data for (a) Safakhaneh, (b) Boukan dam, (c) Qezkorpi, (d) Nezamabad; the gray area is the uncertainty interval and the dots are observations.
Table 1. The model structures used in the GRU network.

| Name | Model Structure |
|---|---|
| S1 | $Q^{t} = f(Q_{us}^{t}, P^{t}, T_{max}^{t}, T_{min}^{t}, T_{avg}^{t})$ |
| S2 | $Q^{t} = f(Q_{us}^{t-1}, Q_{us}^{t}, Q^{t-1}, P^{t-1}, P^{t}, T_{max}^{t-1}, T_{max}^{t}, T_{min}^{t-1}, T_{min}^{t}, T_{avg}^{t-1}, T_{avg}^{t})$ |
| S3 | $Q^{t} = f(Q_{us}^{t-2}, Q_{us}^{t-1}, Q_{us}^{t}, Q^{t-2}, Q^{t-1}, P^{t-2}, P^{t-1}, P^{t}, T_{max}^{t-2}, T_{max}^{t-1}, T_{max}^{t}, T_{min}^{t-2}, T_{min}^{t-1}, T_{min}^{t}, T_{avg}^{t-2}, T_{avg}^{t-1}, T_{avg}^{t})$ |
| S4 | $Q^{t} = f(Q_{us}^{t-3}, Q_{us}^{t-2}, Q_{us}^{t-1}, Q_{us}^{t}, Q^{t-3}, Q^{t-2}, Q^{t-1}, P^{t-3}, P^{t-2}, P^{t-1}, P^{t}, T_{max}^{t-3}, T_{max}^{t-2}, T_{max}^{t-1}, T_{max}^{t}, T_{min}^{t-3}, T_{min}^{t-2}, T_{min}^{t-1}, T_{min}^{t}, T_{avg}^{t-3}, T_{avg}^{t-2}, T_{avg}^{t-1}, T_{avg}^{t})$ |
| S5 | $Q^{t} = f(Q_{us}^{t-4}, Q_{us}^{t-3}, Q_{us}^{t-2}, Q_{us}^{t-1}, Q_{us}^{t}, Q^{t-4}, Q^{t-3}, Q^{t-2}, Q^{t-1}, P^{t-4}, P^{t-3}, P^{t-2}, P^{t-1}, P^{t}, T_{max}^{t-4}, T_{max}^{t-3}, T_{max}^{t-2}, T_{max}^{t-1}, T_{max}^{t}, T_{min}^{t-4}, T_{min}^{t-3}, T_{min}^{t-2}, T_{min}^{t-1}, T_{min}^{t}, T_{avg}^{t-4}, T_{avg}^{t-3}, T_{avg}^{t-2}, T_{avg}^{t-1}, T_{avg}^{t})$ |
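The structures in Table 1 differ only in the maximum lag, so the corresponding input matrix can be assembled mechanically. The following is a minimal sketch under stated assumptions: each variable is a monthly 1-D series of equal length, and the helper name `build_lagged_inputs` and the dictionary keys (`Q_us`, `Q`, `P`, `Tmax`, `Tmin`, `Tavg`) are hypothetical labels chosen here, not names from the study.

```python
import numpy as np


def build_lagged_inputs(series, max_lag, target_key="Q"):
    """Build inputs X and target y for a structure with lags 0..max_lag.

    series     : dict mapping variable name -> 1-D sequence of length T
    max_lag    : 0 reproduces S1, 1 reproduces S2, ..., 4 reproduces S5
    target_key : variable to predict; its lag-0 value is never an input
    """
    n_steps = len(series[target_key])
    columns, names = [], []
    for name, values in series.items():
        values = np.asarray(values, dtype=float)
        for lag in range(max_lag, -1, -1):
            if name == target_key and lag == 0:
                continue  # Q(t) is the target, not a predictor
            # Row i corresponds to time t = max_lag + i, so take values[t - lag].
            columns.append(values[max_lag - lag : n_steps - lag])
            names.append(f"{name}(t-{lag})" if lag else f"{name}(t)")
    X = np.column_stack(columns)
    y = np.asarray(series[target_key], dtype=float)[max_lag:]
    return X, y, names


# Synthetic illustration only: 12 months of placeholder values.
months = np.arange(12, dtype=float)
demo = {"Q_us": months, "Q": months + 100, "P": months + 200,
        "Tmax": months + 300, "Tmin": months + 400, "Tavg": months + 500}
X, y, names = build_lagged_inputs(demo, max_lag=1)  # structure S2
print(X.shape, names)
```

For a station without upstream streamflow, such as Safakhaneh, the `Q_us` entry would simply be omitted from the dictionary.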
Table 2. Performance of GRU-based streamflow forecasting models for five station structures in the Zarrineh River basin with varying monthly lag time.

| Station | Structure | Training Phase | Validation Phase | Testing Phase |
|---|---|---|---|---|
| Safakhaneh (#1) | S1 | 0.34, 0.35, 13.9 | 0.49, 0.34, 15.1 | 0.46, 0.29, 12.3 |
| Boukan dam (#2) | S1 | −7.9, 0.79, 23.4 | −10.8, 0.82, 27.4 | −12.6, 0.85, 20.7 |
| Qezkorpi (#3) | S1 | 0.94, 0.96, 15.2 | 0.86, 0.84, 18.6 | 0.95, 0.99, 12.7 |
| Nezamabad (#4) | S1 | 0.72, 0.72, 42.3 | 0.66, 0.72, 34.7 | 0.71, 0.77, 27.7 |
Table 3. Input uncertainty estimation of the GRU model for the Zarrineh River basin.

| Station Names | $L(P \mid Q)$ | p-Factor (%) | r-Factor |
|---|---|---|---|
| Boukan dam | 89 | 89.3 | 0.57 |
| Nezam Abad | 85 | 61.6 | 0.47 |
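The GLUE screening step that precedes these indices can be sketched as follows. This is an illustrative outline, not the study's implementation: the Nash-Sutcliffe efficiency is assumed here as the likelihood measure (a common choice in GLUE applications), and the function names and the threshold value are hypothetical. The threshold is assumed to be positive so that the retained likelihoods can be rescaled into weights.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency, used here as an assumed likelihood measure."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance


def retain_behavioral(observed, ensemble, threshold):
    """Keep ensemble members whose likelihood reaches the threshold.

    Returns the retained members, their normalized likelihood weights,
    and the likelihood of every member (retained or not).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    likelihoods = np.array([nash_sutcliffe(observed, m) for m in ensemble])
    keep = likelihoods >= threshold
    if keep.any():
        weights = likelihoods[keep] / likelihoods[keep].sum()
    else:
        weights = np.array([])
    return ensemble[keep], weights, likelihoods


# Synthetic illustration only: one streamflow prediction per precipitation dataset.
rng = np.random.default_rng(1)
observed = 20.0 + 10.0 * rng.random(24)
ensemble = observed + rng.normal(0.0, 2.0, size=(93, 24))
kept, weights, likelihoods = retain_behavioral(observed, ensemble, threshold=0.5)
print(f"{len(kept)} of {len(ensemble)} members retained")
```

The retained members are the ones from which the uncertainty band, and hence the p-factor and r-factor, would then be derived.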
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nakhaei, M.; Ghazban, F.; Nakhaei, P.; Gheibi, M.; Wacławek, S.; Ahmadi, M. Successive-Station Streamflow Prediction and Precipitation Uncertainty Analysis in the Zarrineh River Basin Using a Machine Learning Technique. Water 2023, 15, 999.

