# Streamflow Estimation in a Mediterranean Watershed Using Neural Network Models: A Detailed Description of the Implementation and Optimization


## Abstract


… m³ s⁻¹. This solution consists of a 1D convolutional input layer and a dense output layer, with two additional 1D convolutional layers between them. The best performance was obtained when the input variables were precipitation accumulated over 1 to 5 and 10 days and delayed by 1 to 7 days.

## 1. Introduction

… km²) in southern Portugal. This watershed drains to the Ponte Vila Formosa hydrometric station and represents 30% of the watershed of the Maranhão reservoir, one of the two reservoirs in the collective irrigation system of the Sorraia Valley. Given its high runoff variability throughout the year (45% of the total runoff occurs in December and January, and 14% between April and September [22]) and its role in supplying 54% of the system's irrigation needs, tools that can help optimize the amount of water used in irrigation [23] or predict water availability are essential for improving water management and supporting decision-makers.

## 2. Materials and Methods

#### 2.1. Description of the Study Area

… km²) draining to the Ponte Vila Formosa hydrometric station (39°12′57.6″ N, 7°47′02.4″ W), located on the Raia River, Alter do Chão, southern Portugal (Figure 1).

#### 2.2. Neural Network Models

#### 2.2.1. Artificial Neural Networks

… $(i_1, i_2, i_3, \ldots, i_n)$, where the subscript indicates the node of the previous layer from which the connection originates. Each connection from the previous layer to the $m$th node has an associated weight, and this set of weights is represented by the vector $W_m = (w_{1m}, w_{2m}, w_{3m}, \ldots, w_{nm})$, where $w_{nm}$ represents the connection weight from the $n$th node in the preceding layer to the present node.

… is the output value of the $m$th node, $f_a$ is the activation function, and $b_m$ is the threshold value of this node, also known as the bias. The activation function applied to a node determines its response to the total input signal it receives [10]. Keras allows users to define their own activation functions; however, in the present work, the activation functions tested were selected from those already available in the package, namely the linear, exponential linear unit (ELU), rectified linear unit (ReLU), softsign, and hyperbolic tangent (tanh) functions. Table 3 summarizes the characteristics of these functions according to the Keras documentation [37].
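To make the node computation concrete, the sketch below evaluates a single node as a weighted sum of its inputs plus the bias, passed through an activation function, and defines the five activation functions listed above for scalar inputs. It assumes the usual weighted-sum form of the node equation and an ELU slope of alpha = 1.0; the helper names and the input, weight, and bias values are hypothetical, and this is plain Python rather than the Keras implementation used in the study.

```python
import math
from typing import Callable, Sequence

# Activation functions named in the text, written for scalar inputs.
def linear(x: float) -> float:
    return x

def elu(x: float, alpha: float = 1.0) -> float:
    # alpha = 1.0 is an assumed slope for the negative branch
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def relu(x: float) -> float:
    return max(0.0, x)

def softsign(x: float) -> float:
    return x / (1.0 + abs(x))

def tanh(x: float) -> float:
    return math.tanh(x)

def node_output(inputs: Sequence[float], weights: Sequence[float],
                bias: float, activation: Callable[[float], float]) -> float:
    """Output of node m: f_a(sum_n w_nm * i_n + b_m)."""
    if len(inputs) != len(weights):
        raise ValueError("one weight is needed per input connection")
    net = sum(w * i for w, i in zip(weights, inputs)) + bias  # total input signal
    return activation(net)

if __name__ == "__main__":
    I = [0.5, -1.0, 2.0]      # hypothetical inputs i_1..i_3
    W_m = [0.4, 0.3, -0.1]    # hypothetical weights w_1m..w_3m
    b_m = 0.05                # hypothetical bias
    for f in (linear, elu, relu, softsign, tanh):
        print(f"{f.__name__:>8}: {node_output(I, W_m, b_m, f):.4f}")
```

Here the total input signal is −0.25, so ReLU silences the node while the other functions pass a damped or unchanged negative response.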

**Multi-layer perceptron models.** Multi-layer perceptron (MLP) models are a type of feedforward artificial neural network. MLP models are usually trained with the back-propagation algorithm, which involves a feedforward phase and a backward phase [39]. In the first phase, the input data flow forward through the network to estimate the output values; in the second phase, the differences between the estimated outputs and the corresponding observed values drive the adjustment of the connection weights [40]. Following the architecture described above, MLP models consist of three or more layers of artificial neurons, i.e., they have one or more hidden layers [11].
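The two phases of back-propagation can be illustrated with a deliberately tiny, hypothetical MLP: one input, one hidden layer of tanh units, and one linear output, trained example by example on a squared-error loss. The `train_mlp` helper, the network size, the learning rate, and the toy sine-shaped data are assumptions for illustration only and do not reproduce the study's Keras models.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def train_mlp(data: Sequence[Tuple[float, float]], n_hidden: int = 4,
              lr: float = 0.1, epochs: int = 500,
              seed: int = 0) -> Tuple[Callable[[float], float], List[float]]:
    """Toy 1-input MLP (tanh hidden layer, linear output) trained by back-propagation.

    Returns a prediction function and the mean squared error of each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden weights
    b1 = [0.0] * n_hidden                                  # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output weights
    b2 = 0.0                                               # output bias

    def forward(x: float) -> Tuple[List[float], float]:
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(n_hidden)]
        return h, sum(w2[j] * h[j] for j in range(n_hidden)) + b2

    history: List[float] = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, target in data:
            h, y = forward(x)          # feedforward phase
            err = y - target           # estimate minus observation
            sq_err += err * err
            # Backward phase: gradients of 0.5 * err**2 w.r.t. each weight and bias
            for j in range(n_hidden):
                delta_h = err * w2[j] * (1.0 - h[j] ** 2)  # uses w2[j] before its update
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * delta_h * x
                b1[j] -= lr * delta_h
            b2 -= lr * err
        history.append(sq_err / len(data))

    return (lambda x: forward(x)[1]), history

if __name__ == "__main__":
    toy = [(k / 10.0, math.sin(k / 10.0)) for k in range(-10, 11)]
    predict, history = train_mlp(toy)
    print(f"first-epoch MSE {history[0]:.4f}, last-epoch MSE {history[-1]:.6f}")
```

The feedforward pass produces the estimate, and the error is propagated backward through the chain rule to adjust every weight and bias; the per-epoch mean squared error in `history` falls as training proceeds.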

**Long short-term memory models.** Long short-term memory (LSTM) models are a type of recurrent neural network (RNN). Although RNNs are structurally similar to ANNs, i.e., they are composed of interconnected layers whose cells represent neurons, they have a recurrent hidden unit that allows the model to implicitly maintain information about all past events of a sequence [42,43,44]. At each time step, the recurrent hidden unit receives as input both the information for that step and the information from the previous step [2]. This makes RNNs well suited to time-series modelling [45,46,47]; however, they suffer from the vanishing/exploding gradient problem during learning, which causes them to lose the ability to learn long-distance dependencies. The LSTM structure, proposed by Hochreiter and Schmidhuber [46], emerged from the need to solve the vanishing/exploding gradient problem and is able to learn long-term dependencies [48]. As described by Ni et al. [2], citing LeCun et al. [43], the LSTM uses a memory cell that works as a gated leaky neuron, since “it has a connection to itself at the next step that has a weight of one, but this self-connection is multiplicatively gated by another unit that learns to decide when to clear the content of the memory”. As noted by Xu et al. [48], several applications demonstrate the potential of LSTMs in watershed hydrological modelling, namely in river flow prediction [49,50].
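The gated memory cell can be sketched for a single LSTM unit with scalar input and state, using the common formulation with forget, input, and output gates and a tanh candidate value. The `GateParams` and `lstm_step` names and all parameter values are hypothetical. Saturating the forget gate open and the input gate closed shows the self-connection (with an effective weight close to one) carrying the memory content unchanged across time steps, the mechanism that mitigates the vanishing gradient problem.

```python
import math
from dataclasses import dataclass
from typing import Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

@dataclass
class GateParams:
    w: float  # weight applied to the current input x_t
    u: float  # weight applied to the previous hidden state h_{t-1}
    b: float  # bias

    def pre_activation(self, x: float, h_prev: float) -> float:
        return self.w * x + self.u * h_prev + self.b

def lstm_step(x: float, h_prev: float, c_prev: float, forget: GateParams,
              inp: GateParams, out: GateParams, cand: GateParams) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (h_t, c_t)."""
    f = sigmoid(forget.pre_activation(x, h_prev))  # how much old memory to keep
    i = sigmoid(inp.pre_activation(x, h_prev))     # how much new content to write
    o = sigmoid(out.pre_activation(x, h_prev))     # how much memory to expose
    g = math.tanh(cand.pre_activation(x, h_prev))  # candidate memory content
    c = f * c_prev + i * g   # gated self-connection of the memory cell
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c

if __name__ == "__main__":
    keep = GateParams(0.0, 0.0, 20.0)     # forget gate saturated open (f close to 1)
    closed = GateParams(0.0, 0.0, -20.0)  # input gate saturated closed (i close to 0)
    out, cand = GateParams(0.0, 0.0, 0.0), GateParams(1.0, 0.0, 0.0)
    h, c = 0.0, 0.7
    for x in [1.0, -2.0, 3.0, 0.5]:
        h, c = lstm_step(x, h, c, keep, closed, out, cand)
    print(round(c, 6))  # the stored memory is carried through unchanged
```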

#### 2.2.2. Convolutional Neural Networks
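The abstract describes a model built from stacked 1D convolutional layers. As a hedged sketch of the core operation inside such a layer, `conv1d_valid` (a hypothetical helper) slides one kernel along a sequence with stride 1 and no padding; as in the convolutional layers of common deep learning libraries, the kernel is not flipped, so this is technically a cross-correlation. The precipitation values and kernels are illustrative: a kernel of ones reproduces a 3-day accumulated precipitation, which hints at how learned filters can extract temporally local features from the inputs.

```python
from typing import List, Sequence

def conv1d_valid(signal: Sequence[float], kernel: Sequence[float],
                 bias: float = 0.0) -> List[float]:
    """Slide one kernel along a 1D sequence (stride 1, no padding).

    Like the convolutional layers of deep learning libraries, this is a
    cross-correlation: the kernel is not flipped.
    """
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [sum(kernel[j] * signal[t + j] for j in range(k)) + bias
            for t in range(len(signal) - k + 1)]

def relu_all(values: Sequence[float]) -> List[float]:
    """Element-wise ReLU, a typical activation after a convolution."""
    return [max(0.0, v) for v in values]

if __name__ == "__main__":
    daily_precip = [0.0, 5.0, 0.0, 10.0, 2.0, 0.0, 0.0]  # hypothetical mm/day
    # A kernel of ones computes 3-day accumulated precipitation
    print(conv1d_valid(daily_precip, [1.0, 1.0, 1.0]))
    # A difference kernel followed by ReLU responds only to increases
    print(relu_all(conv1d_valid(daily_precip, [-1.0, 1.0])))
```

In a trained network the kernel weights are learned rather than set by hand, and each layer applies many such kernels in parallel.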

#### 2.2.3. Training Process

… ($\theta \in \mathbb{R}^d$) and updating those parameters in the direction opposite to the gradient of the objective function, $\nabla_\theta J(\theta)$ [58]. The magnitude of each parameter change is scaled by a learning rate, which determines the size of the steps taken toward the minimum of the objective function. A detailed description of the optimizers is given below.
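A minimal sketch of this principle, assuming the gradient is available in closed form: every parameter is moved against its gradient by a step proportional to the learning rate. The `gradient_descent` helper and the quadratic objective $J(\theta) = \sum_k (\theta_k - 3)^2$ are hypothetical examples.

```python
from typing import Callable, List

Vector = List[float]

def gradient_descent(grad: Callable[[Vector], Vector], theta0: Vector,
                     lr: float = 0.1, steps: int = 100) -> Vector:
    """Repeatedly apply theta <- theta - lr * grad J(theta)."""
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        theta = [t - lr * gk for t, gk in zip(theta, g)]
    return theta

def grad_quadratic(theta: Vector) -> Vector:
    """Gradient of the example objective J(theta) = sum_k (theta_k - 3)^2."""
    return [2.0 * (t - 3.0) for t in theta]

if __name__ == "__main__":
    print(gradient_descent(grad_quadratic, [0.0, 10.0]))
```

With a learning rate of 0.1 the iterates contract toward the minimum at θ = 3, whereas a learning rate above 1.0 makes this particular objective diverge, which illustrates why the step size matters.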

**Stochastic gradient descent (SGD).** Stochastic gradient descent (SGD) performs a parameter update for each training input–output example. Thus, the mathematical formulation of this method is:

$$\theta = \theta - \eta \cdot \nabla_\theta J\left(\theta; x^{(i)}; y^{(i)}\right)$$

where $\eta$ is the learning rate and each $x^{(i)}$, $y^{(i)}$ pair represents an input–output example. SGD can be used for online learning. However, its frequent, high-variance updates can cause the objective function to fluctuate heavily.

… 1 × 10⁻⁴, 1 × 10⁻³, and 1 × 10⁻².
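A minimal sketch of per-example SGD on a hypothetical straight-line problem ($y = 2x + 1$, noise-free) with a squared-error loss. The optional momentum term uses the velocity form shown in the Keras documentation for its SGD optimizer (`velocity = momentum * velocity - learning_rate * g`, then `w = w + velocity`); setting momentum to 0 recovers plain SGD. The `sgd_linear_regression` helper and the data are illustrative assumptions.

```python
import random
from typing import Sequence, Tuple

def sgd_linear_regression(xs: Sequence[float], ys: Sequence[float], lr: float = 0.1,
                          momentum: float = 0.0, epochs: int = 500,
                          seed: int = 0) -> Tuple[float, float]:
    """Fit y ~ w * x + b by updating (w, b) after every single example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    vel_w, vel_b = 0.0, 0.0             # momentum "velocities"
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # new random visiting order each epoch
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            grad_w, grad_b = err * xs[i], err   # gradient of 0.5 * err**2
            vel_w = momentum * vel_w - lr * grad_w
            vel_b = momentum * vel_b - lr * grad_b
            w += vel_w
            b += vel_b
    return w, b

if __name__ == "__main__":
    xs = [k / 10 for k in range(11)]
    ys = [2.0 * x + 1.0 for x in xs]     # hypothetical noise-free data
    print(sgd_linear_regression(xs, ys))                        # plain SGD
    print(sgd_linear_regression(xs, ys, lr=0.01, momentum=0.9))  # SGD with momentum
```

Because the toy data are noise-free, every example's gradient vanishes at the solution, so both variants settle on w ≈ 2 and b ≈ 1; with noisy data, per-example updates would keep fluctuating around the minimum, as described above.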

**AdaGrad.** The AdaGrad (adaptive gradient) algorithm [63] has as its main strength the ability to adapt the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones [58]. This largely removes the need to tune the learning rate manually; according to Ruder [58], most implementations use a default value of 0.01. However, the main limitation of this algorithm is that the learning rate can become infinitesimally small, to the point where the algorithm can no longer improve the results, because it is divided by the square root of the accumulated sum of squared gradients, which keeps growing during training.

… 10⁻⁷. In the present study, the learning rate was set to 0.01, while epsilon was tested as 1 × 10⁻⁷ or 1 × 10⁻⁸.
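A minimal sketch of the AdaGrad update in the form given by Ruder [58], $\theta_{t+1} = \theta_t - \eta\, g_t / \sqrt{G_t + \epsilon}$, where $G_t$ is the running sum of squared gradients of each parameter. The `AdaGrad` class is hypothetical, and the Keras implementation may differ in details such as the initial accumulator value. Feeding a constant gradient shows the effective step shrinking as $1/\sqrt{t}$, which is the limitation described above.

```python
import math
from typing import List, Optional, Sequence

class AdaGrad:
    """AdaGrad update following Ruder's formulation (epsilon inside the square root)."""

    def __init__(self, lr: float = 0.01, epsilon: float = 1e-7) -> None:
        self.lr = lr
        self.epsilon = epsilon
        self.G: Optional[List[float]] = None  # per-parameter sum of squared gradients

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if self.G is None:
            self.G = [0.0] * len(params)
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.G[k] += g * g  # only ever grows, so the effective step only shrinks
            updated.append(p - self.lr * g / math.sqrt(self.G[k] + self.epsilon))
        return updated

if __name__ == "__main__":
    opt, theta = AdaGrad(lr=0.01), [0.0]
    for t in range(1, 6):
        new_theta = opt.step(theta, [1.0])           # constant gradient of 1
        print(t, round(theta[0] - new_theta[0], 6))  # step size: 0.01 / sqrt(t)
        theta = new_theta
```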

**RMSprop.** The RMSprop algorithm was developed to overcome AdaGrad's problem of an extremely rapid decrease in the learning rate. To this end, it divides the learning rate by an exponentially decaying average of squared gradients [58]. In Keras, this algorithm has five arguments: the learning rate, with a default value of 0.001; the discounting factor for the history/coming gradient (rho), 0.9 by default; the momentum, set to 0.0 by default; epsilon, already defined for AdaGrad, with a default value of 1 × 10⁻⁷; and the centered option, which, when activated, normalizes the gradients by the estimated variance of the gradient. By default, this last option is deactivated, so the gradients are normalized by the uncentered second moment. In the present study, the learning rate for RMSprop was set to 0.01, and epsilon was tested as 1 × 10⁻⁷ or 1 × 10⁻⁸.
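A minimal sketch of non-centered RMSprop without momentum, following Ruder [58]: $E[g^2]_t = \rho\,E[g^2]_{t-1} + (1-\rho)\,g_t^2$ and $\theta_{t+1} = \theta_t - \eta\, g_t / \sqrt{E[g^2]_t + \epsilon}$. The `RMSprop` class here is hypothetical (not the Keras class of the same name), and the placement of epsilon may differ from Keras. Under a constant gradient the decaying average settles, so the step converges to the learning rate instead of shrinking toward zero as in AdaGrad.

```python
import math
from typing import List, Optional, Sequence

class RMSprop:
    """Non-centered RMSprop without momentum, following Ruder's formulation."""

    def __init__(self, lr: float = 0.01, rho: float = 0.9, epsilon: float = 1e-7) -> None:
        self.lr, self.rho, self.epsilon = lr, rho, epsilon
        self.avg_sq: Optional[List[float]] = None  # decaying average E[g^2] per parameter

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if self.avg_sq is None:
            self.avg_sq = [0.0] * len(params)
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            # Old squared gradients are discounted by rho, so the average cannot grow forever
            self.avg_sq[k] = self.rho * self.avg_sq[k] + (1.0 - self.rho) * g * g
            updated.append(p - self.lr * g / math.sqrt(self.avg_sq[k] + self.epsilon))
        return updated

if __name__ == "__main__":
    opt, theta = RMSprop(lr=0.01), [0.0]
    for t in range(1, 201):
        new_theta = opt.step(theta, [1.0])  # constant gradient of 1
        if t in (1, 10, 200):
            print(t, round(theta[0] - new_theta[0], 6))
        theta = new_theta
```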

**Adam.** The Adam optimizer, whose name comes from adaptive moment estimation, adapts the learning rate for each parameter, like RMSprop. In addition, Adam keeps an exponentially decaying average of past gradients, similar to the momentum method described for the SGD optimizer. According to Kingma and Ba [64], Adam is simple to implement, computationally efficient, and has low memory requirements; it is invariant to diagonal rescaling of the gradients and performs well on noisy problems with large amounts of data. In Keras, this optimizer has five arguments, namely the learning rate, beta_1, beta_2, epsilon, and amsgrad, with default values of 0.001, 0.9, 0.999, 1 × 10⁻⁷, and deactivated, respectively. The learning rate and epsilon were defined above; beta_1 is the decay rate for the 1st moment estimates, beta_2 is the decay rate for the 2nd moment estimates, and the amsgrad option enables the AMSGrad variant of the algorithm (more information can be found in Reddi et al. [65]). Ruder [58] gives 0.9, 0.999, and 1 × 10⁻⁸ as the default values for beta_1, beta_2, and epsilon, respectively. In this study, the learning rate was tuned over the values 1 × 10⁻⁴, 1 × 10⁻³, and 1 × 10⁻², while epsilon was tested as 1 × 10⁻⁷ or 1 × 10⁻⁸.
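A minimal sketch of the Adam update as described by Kingma and Ba [64]: $m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t$, $v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$, bias-corrected estimates $\hat m_t = m_t/(1-\beta_1^t)$ and $\hat v_t = v_t/(1-\beta_2^t)$, and $\theta_t = \theta_{t-1} - \eta\,\hat m_t/(\sqrt{\hat v_t} + \epsilon)$. The `Adam` class is a hypothetical plain-Python illustration without the AMSGrad option, not the Keras optimizer.

```python
import math
from typing import List, Optional, Sequence

class Adam:
    """Adam update with bias-corrected first and second moment estimates."""

    def __init__(self, lr: float = 0.001, beta_1: float = 0.9,
                 beta_2: float = 0.999, epsilon: float = 1e-7) -> None:
        self.lr, self.beta_1, self.beta_2, self.epsilon = lr, beta_1, beta_2, epsilon
        self.m: Optional[List[float]] = None  # 1st moment: decaying mean of gradients
        self.v: Optional[List[float]] = None  # 2nd moment: decaying mean of squared gradients
        self.t = 0                            # time step, used for bias correction

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta_1 * self.m[k] + (1.0 - self.beta_1) * g
            self.v[k] = self.beta_2 * self.v[k] + (1.0 - self.beta_2) * g * g
            m_hat = self.m[k] / (1.0 - self.beta_1 ** self.t)  # undo the bias toward zero
            v_hat = self.v[k] / (1.0 - self.beta_2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.epsilon))
        return updated

if __name__ == "__main__":
    opt = Adam(lr=0.1)
    print(opt.step([0.0, 0.0], [2.0, 200.0]))  # both parameters move by about -0.1
```

Because the update divides the first moment by the square root of the second, a gradient of 2 and one of 200 produce first steps of about the same size (roughly the learning rate), which reflects the invariance to diagonal rescaling of the gradients mentioned above.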

**AdaMax.** The AdaMax algorithm is an extension of Adam proposed by Kingma and Ba [64] that replaces Adam's second-moment estimate with one based on the infinity norm, which improves the stability of the updates. In Keras, AdaMax takes the same arguments as Adam except for the amsgrad option, and both the learning rate and epsilon were tested with the values presented above for Adam.
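A minimal sketch of AdaMax following Kingma and Ba [64]: Adam's second-moment estimate is replaced by the exponentially weighted infinity norm $u_t = \max(\beta_2 u_{t-1}, |g_t|)$, and the update is $\theta_t = \theta_{t-1} - (\eta/(1-\beta_1^t))\, m_t/u_t$. The original formulation has no epsilon; here the epsilon argument, which Keras also exposes, is assumed to be added to the denominator to avoid division by zero. The `AdaMax` class is hypothetical.

```python
from typing import List, Optional, Sequence

class AdaMax:
    """AdaMax: Adam with an infinity-norm second-moment estimate."""

    def __init__(self, lr: float = 0.001, beta_1: float = 0.9,
                 beta_2: float = 0.999, epsilon: float = 1e-7) -> None:
        self.lr, self.beta_1, self.beta_2, self.epsilon = lr, beta_1, beta_2, epsilon
        self.m: Optional[List[float]] = None  # decaying mean of gradients
        self.u: Optional[List[float]] = None  # exponentially weighted infinity norm
        self.t = 0

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if self.m is None or self.u is None:
            self.m, self.u = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        step_size = self.lr / (1.0 - self.beta_1 ** self.t)  # bias correction for m only
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta_1 * self.m[k] + (1.0 - self.beta_1) * g
            self.u[k] = max(self.beta_2 * self.u[k], abs(g))  # no bias correction needed
            updated.append(p - step_size * self.m[k] / (self.u[k] + self.epsilon))
        return updated

if __name__ == "__main__":
    opt, theta = AdaMax(lr=0.1), [0.0]
    theta = opt.step(theta, [2.0])  # first step is about -lr
    theta = opt.step(theta, [1.0])  # u decays from 2.0 to 0.999 * 2.0 = 1.998
    print(theta, opt.u)
```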

**Nadam.** The Nesterov-accelerated adaptive moment estimation (Nadam) algorithm combines Adam with the Nesterov accelerated gradient (NAG) [58]. Nadam was motivated by the fact that Adam uses the regular momentum component, which performs worse than NAG. Dozat [66] developed the Nadam optimizer and demonstrated that it can improve both the speed of convergence and the quality of the learned models compared with Adam. In Keras, Nadam takes the same arguments as Adam except for the amsgrad option; the learning rate and epsilon were therefore tested with the same values as for Adam.
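A minimal sketch of Nadam in the simplified form given by Ruder [58], $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat v_t} + \epsilon}\left(\beta_1 \hat m_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}\right)$, where the bracketed term applies the Nesterov look-ahead to Adam's bias-corrected momentum. The Keras implementation may differ from this simplified form, so this hypothetical `Nadam` class is an illustration, not a replica.

```python
import math
from typing import List, Optional, Sequence

class Nadam:
    """Nadam in the simplified form given by Ruder (not the exact Keras implementation)."""

    def __init__(self, lr: float = 0.001, beta_1: float = 0.9,
                 beta_2: float = 0.999, epsilon: float = 1e-7) -> None:
        self.lr, self.beta_1, self.beta_2, self.epsilon = lr, beta_1, beta_2, epsilon
        self.m: Optional[List[float]] = None
        self.v: Optional[List[float]] = None
        self.t = 0

    def step(self, params: Sequence[float], grads: Sequence[float]) -> List[float]:
        if self.m is None or self.v is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        b1, b2 = self.beta_1, self.beta_2
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = b1 * self.m[k] + (1.0 - b1) * g
            self.v[k] = b2 * self.v[k] + (1.0 - b2) * g * g
            m_hat = self.m[k] / (1.0 - b1 ** self.t)
            v_hat = self.v[k] / (1.0 - b2 ** self.t)
            # Nesterov look-ahead: combine the momentum with the current gradient
            nesterov_m = b1 * m_hat + (1.0 - b1) * g / (1.0 - b1 ** self.t)
            updated.append(p - self.lr * nesterov_m / (math.sqrt(v_hat) + self.epsilon))
        return updated

if __name__ == "__main__":
    print(Nadam(lr=0.1).step([0.0], [2.0]))  # first step of about -0.19
```

On the first step this form moves the parameter by about 1.9 times the learning rate, against about 1 times for Adam, because the look-ahead adds the current gradient on top of the bias-corrected momentum.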

#### 2.2.4. Input Variables

…

$$v_{scaled} = \frac{v - v_{min}}{v_{max} - v_{min}}\,(M - m) + m$$

where $v_{scaled}$ is the new value in the range [0, 0.9], $v$ is the original value in the dataset, $v_{max}$ and $v_{min}$ are the maximum and minimum values in the column being scaled, respectively, $M$ is the maximum value of the range, and $m$ is the minimum value of the range. An upper bound of 0.9, rather than 1, was selected because the maximum streamflow possible at this section may not be represented in the period analyzed.
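The scaling to [m, M] = [0, 0.9], and its inverse for converting model outputs back to flow units, can be sketched as follows. The function names and the flow values are hypothetical, and the sketch assumes $v_{min}$ and $v_{max}$ are taken from the column being scaled, as stated above.

```python
from typing import List, Sequence

def minmax_scale(values: Sequence[float], v_min: float, v_max: float,
                 m: float = 0.0, M: float = 0.9) -> List[float]:
    """Map values linearly from [v_min, v_max] to [m, M]."""
    if v_max == v_min:
        raise ValueError("cannot scale a constant column")
    return [(v - v_min) / (v_max - v_min) * (M - m) + m for v in values]

def minmax_unscale(scaled: Sequence[float], v_min: float, v_max: float,
                   m: float = 0.0, M: float = 0.9) -> List[float]:
    """Invert minmax_scale, e.g. to convert model outputs back to m3/s."""
    return [(s - m) / (M - m) * (v_max - v_min) + v_min for s in scaled]

if __name__ == "__main__":
    flows = [0.2, 3.5, 12.0, 48.0]  # hypothetical daily flows (m3/s)
    v_min, v_max = min(flows), max(flows)
    scaled = minmax_scale(flows, v_min, v_max)
    print(scaled)
    print(minmax_unscale(scaled, v_min, v_max))  # recovers the original flows
    print(minmax_scale([52.0], v_min, v_max))    # a larger flow still maps below 1
```

Because the range stops at 0.9, a flow somewhat larger than the largest value in the analysed period still maps below 1 (here, 52 m³ s⁻¹ maps to about 0.975), which leaves headroom for events larger than those observed.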

#### 2.2.5. Tuning Parameters

#### 2.3. Model Evaluation

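… R²), the percent bias (PBIAS), the root mean square error (RMSE), and the Nash–Sutcliffe efficiency (NSE), which were computed, respectively, as follows:

$$R^2 = \frac{\left[\sum_{i=1}^{p}\left(Q_i^{obs} - Q_{mean}^{obs}\right)\left(Q_i^{sim} - Q_{mean}^{sim}\right)\right]^2}{\sum_{i=1}^{p}\left(Q_i^{obs} - Q_{mean}^{obs}\right)^2 \sum_{i=1}^{p}\left(Q_i^{sim} - Q_{mean}^{sim}\right)^2}$$

$$PBIAS = \frac{\sum_{i=1}^{p}\left(Q_i^{obs} - Q_i^{sim}\right)}{\sum_{i=1}^{p} Q_i^{obs}} \times 100$$

$$RMSE = \sqrt{\frac{1}{p}\sum_{i=1}^{p}\left(Q_i^{obs} - Q_i^{sim}\right)^2}$$

$$NSE = 1 - \frac{\sum_{i=1}^{p}\left(Q_i^{obs} - Q_i^{sim}\right)^2}{\sum_{i=1}^{p}\left(Q_i^{obs} - Q_{mean}^{obs}\right)^2}$$

where $Q_i^{obs}$ and $Q_i^{sim}$ are the flow values observed and estimated by the model on day $i$, respectively, $Q_{mean}^{obs}$ and $Q_{mean}^{sim}$ are the average observed and modelled flow values over the test period, and $p$ is the total number of days (values) in this period. According to Moriasi et al. [79], model performance is considered satisfactory when NSE > 0.5, PBIAS is within ±25%, and R² > 0.5. The RMSE is the square root of the mean of the squared residuals (the differences between predictions and observed values), expressed in flow units; consequently, lower values indicate better model performance.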
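A plain-Python sketch of the four metrics, with hypothetical function names. It assumes R² is the squared Pearson correlation between observed and simulated flows and uses the PBIAS convention $100\sum(Q^{obs} - Q^{sim})/\sum Q^{obs}$, under which positive values indicate that the model underestimates the observed flows; other studies use the opposite sign.

```python
import math
from typing import Sequence

def _validate(obs: Sequence[float], sim: Sequence[float]) -> int:
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    return len(obs)

def r2(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and simulated values."""
    p = _validate(obs, sim)
    mean_obs, mean_sim = sum(obs) / p, sum(sim) / p
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)

def pbias(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Percent bias; positive values mean underestimation under this convention."""
    _validate(obs, sim)
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)

def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error, in the units of the flow."""
    p = _validate(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / p)

def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    p = _validate(obs, sim)
    mean_obs = sum(obs) / p
    return 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim))
                  / sum((o - mean_obs) ** 2 for o in obs))

if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]   # hypothetical observed flows (m3/s)
    sim = [2.0, 3.0, 4.0, 5.0]   # hypothetical simulation offset by +1 m3/s
    print(r2(obs, sim), nse(obs, sim), pbias(obs, sim), rmse(obs, sim))
```

A simulation that is perfectly correlated with the observations but offset by +1 m³ s⁻¹ keeps R² = 1, while NSE (0.2), PBIAS (−40%), and RMSE (1 m³ s⁻¹) all register the bias, which is why the metrics are reported together.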

## 3. Results

… R² of 0.85, a PBIAS of −17.3%, and an RMSE of 5.0 m³ s⁻¹. This solution was obtained for the TP5 input scenario (Figure 7) with the AdaMax optimizer.

… R², PBIAS, and RMSE values of 0.75, 0.83, 15.8%, and 5.6 m³ s⁻¹, respectively. However, the scenario combining TP3 with the RMSprop optimizer (Figure 8b) showed a very similar performance, with an NSE of 0.74, an R² of 0.76, a PBIAS of 13.6%, and an RMSE of 5.8 m³ s⁻¹. As shown in Figure 8, both models predicted negative flow values, which was considered an unacceptable result.

… R² of 0.61, a PBIAS of −20.0%, and an RMSE of 7.2 m³ s⁻¹. However, the performance of this solution was substantially worse than that of the best multi-layer perceptron solution.

… R² higher than 0.82 and 0.83, respectively, with PBIAS lying between −14% and 11% and RMSE varying between 4.2 and 4.9 m³ s⁻¹ (Figure 10). Within this set, the combination of the TP5 scenario with the Nadam optimizer performed best, with NSE, R², PBIAS, and RMSE values of 0.86, 0.87, 10.5%, and 4.2 m³ s⁻¹, respectively.

… 10⁻³ and 1 × 10⁻⁸, respectively, with a batch size of 20 and 200 as the optimal number of epochs.

## 4. Discussion

… km² and an average annual streamflow of 142 m³ s⁻¹, and the Xiangjiaba Hydropower Station basin, where the average annual streamflow is 3810 m³ s⁻¹. The authors considered 68 variables as candidate inputs, of which only rainfall and streamflow were specified. They compared the performance of a CNN model, an MLP-type ANN model, and an extreme learning machine (ELM) model with different numbers of inputs, with the CNN presenting the best performance for both watersheds and for most of the numbers of inputs tested. They concluded that model performance neither clearly improves nor clearly worsens with the inclusion of more inputs, but they did not identify the candidate variables that produced the best performances. Barino et al. [20] also compared the performance of four models, including an MLP model and two CNN models, to predict the river flow at a section of the Madeira River, a tributary of the Amazon River, Brazil. The input variable of the MLP model and of one of the CNN models was the river flow on the previous days, while the other CNN model used the river flow and the turbidity on the previous days as input variables. The authors concluded that the CNN models were the best for predicting the river flow, with an average NSE, R², and MAPE of 0.93, 0.93, and 22.44%, respectively, compared with an NSE of 0.93, an R² of 0.91, and a MAPE of 33.60% for the MLP model. Finally, Duan et al. [80] used a CNN with past values of precipitation, temperature, and solar radiation as inputs to predict long-term river flow in watersheds in California, USA, from the Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS) dataset. The CNN model’s performance was compared with that of other machine learning models, and the authors concluded that ANNs struggle to capture some important temporal features compared with CNNs and RNNs. Additionally, the CNN model was faster and more stable during training and produced better results for the average and high-flow regimes, while the LSTM model performed better for the low-flow regime.

… R² of 0.94. More recently, Darbandi and Pourhosseini [81] and Üneş [82] also used MLP algorithms to predict the river flow in the Ajichay watershed (with a drained area of 12,790 km²), East Azerbaijan, and at a station (with a drained area of 75 km²) on the Stillwater River, Worcester, Sterling, MA, USA, respectively. In the first case, the authors applied the MLP model to predict monthly river flow at three points in the watershed, using as input data the river flow values from the previous one, two, and three months. The average R² (over all stations and input data scenarios) reached 0.86 for the training period and 0.78 for the test period. In the second case, the authors predicted daily flow values using daily average temperature, precipitation, and lagged daily flow values as input variables and obtained a Pearson correlation coefficient of 0.91. In both cases, the MLP models were compared with other models but did not achieve the best performance. Ni et al. [2] used an MLP model and three LSTM models (a simple LSTM model, a convolutional LSTM model, CLSTM, and a wavelet-LSTM model, WLSTM) to predict the monthly streamflow volume one, three, and six months ahead at the Cuntan and Hankou stations, Yangtze River basin, China. They showed that the MLP model had the worst performance (average NSE = 0.72), while the simple LSTM model reached an average NSE of 0.76, and the WLSTM and CLSTM models had average NSEs of 0.78 and 0.79, respectively. According to the authors, WLSTM and CLSTM performed better because both include a pre-processing step based on the convolution operation: both use filters to extract temporally local information from the data. However, the CLSTM filters are trained on the data and can therefore learn, whereas the WLSTM model uses pre-specified, structured filters. Xu et al. [48] applied several models to predict the streamflow in two watersheds in China, namely the Hun River basin, with a drained area of 14,800 km², and the Yangtze River basin, with a drained area of 1,002,300 km². Among the models applied, the authors considered different LSTM structures for each watershed, using meteorological data from different stations in both basins as input variables. They concluded that, during the training period, the LSTM models performed best among all the models in both watersheds, while during the verification period, LSTM performance decreased, making it the second-best solution after the hydrological model. Additionally, Hu et al. [83] used an LSTM model to predict streamflow 6 h ahead at a hydrological station in Tunxi, China. Using streamflow and precipitation data to feed the model, the authors found that the LSTM model outperformed support vector regression and MLP models, reaching an R² of 0.97. Despite these good results, both Xu et al. [48] and Hu et al. [83] reported difficulties in predicting peak flow values with LSTM models.

## 5. Conclusions

… 10⁻³ and an ε of 1 × 10⁻⁸. The best solution was obtained with a batch size of 20 and 200 epochs. The input variables of the best solution included only the average daily precipitation in the watershed, accumulated over 1, 2, 3, 4, 5, and 10 days and delayed by 1, 2, 3, 4, 5, 6, and 7 days. This solution reached an NSE of 0.86 and an R² of 0.87, with PBIAS and RMSE values of 10.5% and 4.2 m³ s⁻¹, respectively. However, it is important to note that the poorer performance of the LSTM and MLP models, compared with solutions found in the literature, may be closely related to the choice and treatment of the input variables.

## Supplementary Materials

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Bourdin, D.R.; Fleming, S.W.; Stull, R.B. Streamflow modelling: A primer on applications, approaches and challenges. Atmos. Ocean
**2012**, 50, 507–536. [Google Scholar] [CrossRef] [Green Version] - Ni, L.; Wang, D.; Singh, V.P.; Wu, J.; Wang, Y.; Tao, Y.; Zhang, J. Streamflow and rainfall forecasting by two long short-term memory-based models. J. Hydrol.
**2020**, 583, 124296. [Google Scholar] [CrossRef] - Humphrey, G.B.; Gibbs, M.S.; Dandy, G.C.; Maier, H.R. A hybrid approach to monthly streamflow forecasting: Integrating hydrological model outputs into a bayesian artificial neural network. J. Hydrol.
**2016**, 540, 623–640. [Google Scholar] [CrossRef] - Besaw, L.E.; Rizzo, D.M.; Bierman, P.R.; Hackett, W.R. Advances in ungauged streamflow prediction using artificial neural networks. J. Hydrol.
**2010**, 386, 27–37. [Google Scholar] [CrossRef] - Chiew, F.; McMahon, T. Application of the daily rainfall-runoff model MODHYDROLOG to 28 Australian catchments. J. Hydrol.
**1994**, 153, 383–416. [Google Scholar] [CrossRef] - Jakeman, A.J.; Littlewood, I.G.; Whitehead, P.G. Computation of the instantaneous unit hydrograph and identifiable component flows with application to two small upland catchments. J. Hydrol.
**1990**, 117, 275–300. [Google Scholar] [CrossRef] - Mehr, A.D.; Kahya, E.; Olyaie, E. Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J. Hydrol.
**2013**, 505, 240–249. [Google Scholar] [CrossRef] - Zhang, X.; Peng, Y.; Zhang, C.; Wang, B. Are hybrid models integrated with data preprocessing techniques suitable for monthly streamflow forecasting? Some experiment evidences. J. Hydrol.
**2015**, 530, 137–152. [Google Scholar] [CrossRef] - Liu, Z.; Zhou, P.; Chen, X.; Guan, Y. A multivariate conditional model for streamflow prediction and spatial precipitation refinement. J. Geophys. Res.
**2015**, 120, 10116–10129. [Google Scholar] [CrossRef] [Green Version] - ASCE Task Committee on Application of Artificial Neural Networks in Hydrology. Artificial Neural Networks in Hydrology. I: Preliminary Concepts. J. Hydrol. Eng.
**2000**, 5, 115–123. [Google Scholar] [CrossRef] - Maier, H.R.; Jain, A.; Dandy, G.C.; Sudheer, K.P. Methods used for the development of neural networks for the prediction of water resource variables in river systems: Current status and future directions. Environ. Model. Softw.
**2010**, 25, 891–909. [Google Scholar] [CrossRef] - Pham, Q.B.; Afan, H.A.; Mohammadi, B.; Ahmed, A.N.; Linh, N.T.T.; Vo, N.D.; Moazenzadeh, R.; Yu, P.-S.; El-Shafie, A. Hybrid model to improve the river streamflow forecasting utilizing multi-layer perceptron-based intelligent water drop optimization algorithm. Soft. Comput.
**2020**, 24, 18039–18056. [Google Scholar] [CrossRef] - Hussain, D.; Khan, A.A. Machine learning techniques for monthly river flow forecasting of Hunza River, Pakistan. Earth Sci. Inform.
**2020**, 13, 939–949. [Google Scholar] [CrossRef] - Sahoo, A.; Samantaray, S.; Ghose, D.K. Stream flow forecasting in Mahanadi River Basin using artificial neural networks. Procedia Comput. Sci.
**2019**, 157, 168–174. [Google Scholar] [CrossRef] - Le, X.-H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) neural network for flood forecasting. Water
**2019**, 11, 1387. [Google Scholar] [CrossRef] [Green Version] - Hauswirth, S.M.; Bierkens, M.F.P.; Beijk, V.; Wanders, N. The potential of data driven approaches for quantifying hydrological extremes. Adv. Water Resour.
**2021**, 155, 104017. [Google Scholar] [CrossRef] - Althoff, D.; Rodrigues, L.N.; Silva, D.D. Addressing hydrological modeling in watersheds under land cover change with deep learning. Adv. Water Resour.
**2021**, 154, 103965. [Google Scholar] [CrossRef] - Shu, X.; Ding, W.; Peng, Y.; Wang, Z.; Wu, J.; Li, M. Monthly streamflow forecasting using convolutional neural network. Water Resour. Manag.
**2021**, 35, 5089–5104. [Google Scholar] [CrossRef] - Wang, J.-H.; Lin, G.-F.; Chang, M.-J.; Huang, I.-H.; Chen, Y.-R. Real-time water-level forecasting using dilated causal convolutional neural networks. Water Resour. Manag.
**2019**, 33, 3759–3780. [Google Scholar] [CrossRef] - Barino, F.O.; Silva, V.N.H.; Lopez-Barbero, A.P.; De Mello Honorio, L.; Santos, A.B.D. Correlated time-series in multi-day-ahead streamflow forecasting using convolutional networks. IEEE Access
**2020**, 8, 215748–215757. [Google Scholar] [CrossRef] - Anderson, S.; Radić, V. Evaluation and interpretation of convolutional long short-term memory networks for regional hydrological modelling. Hydrol. Earth Syst. Sci.
**2022**, 26, 795–825. [Google Scholar] [CrossRef] - SNIRH, n.d. Sistema Nacional de Informação de Recursos Hídricos. Available online: https://snirh.apambiente.pt/index.php?idMain= (accessed on 7 February 2021).
- Simionesei, L.; Ramos, T.B.; Palma, J.; Oliveira, A.R.; Neves, R. IrrigaSys: A web-based irrigation decision support system based on open source data and technology. Comput. Electron. Agric.
**2020**, 178, 105822. [Google Scholar] [CrossRef] - Keras. GitHub. Available online: https://github.com/fchollet/keras (accessed on 19 November 2020).
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. Tensorflow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’16), Savannah, GA, USA, 2–4 November 2016; pp. 265–283. [Google Scholar]
- O’Malley, T.; Bursztein, E.; Long, J.; Chollet, F.; Jin, H.; Invernizzi, L. Keras Tuner. Available online: https://github.com/keras-team/keras-tuner (accessed on 30 May 2021).
- Agencia Estatal de Meteorología (España). Atlas Climático Ibérico: Temperatura del Aire y Precipitación (1971–2000)=Atlas Climático Ibérico: Temperatura do ar e Precipitação (1971–2000)=Iberian Climate Atlas: Air Temperature and Precipitation (1971–2000); Instituto Nacional de Meteorología: Madrid, Spain, 2011. [Google Scholar]
- European Digital Elevation Model (EU-DEM), version 1.1., n.d. © European Union, Copernicus Land Monitoring Service 2019, European Environment Agency (EEA). Available online: https://land.copernicus.eu/pan-european/satellite-derived-products/eu-dem/eu-dem-v1.1/view (accessed on 15 May 2019).
- Panagos, P.; Van Liedekerke, M.; Jones, A.; Montanarella, L. European Soil Data Centre: Response to European policy support and public data requirements. Land Use Policy
**2012**, 29, 329–338. [Google Scholar] [CrossRef] - Corine Land Cover 2012, n.d. © European Union, Copernicus Land Monitoring Service 2018, European Environment Agency (EEA). Available online: https://land.copernicus.eu/pan-european/corine-land-cover (accessed on 22 June 2019).
- ARBVS, n.d. Área Regada. Associação de Regantes e Beneficiários do Vale do Sorraia. Available online: https://www.arbvs.pt/index.php/culturas/area-regada (accessed on 18 October 2022).
- APA and ARH Tejo, 2012. Agência Portguesa do Ambiente and Administração da Região Hidrográfica Tejo. Plano de gestão da região hidrográfica do Tejo—Relatório técnico (Síntese). Available online: https://apambiente.pt/agua/1o-ciclo-de-planeamento-2010-2015 (accessed on 6 September 2022).
- Pörtner, H.-O.; Roberts, D.C.; Tignor, M.; Poloczanska, E.S.; Mintenbeck, K.; Alegría, A.; Craig, M.; Langsdorf, S.; Löschke, S.; Möller, V.; et al. IPCC, 2022: Climate Change 2022: Impacts, Adaptation and Vulnerability. In Contribution of Working Group II to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change; Cambridge University Press: Cambridge, UK, 2022; p. 3068. [Google Scholar] [CrossRef]
- Almeida, C.; Ramos, T.; Segurado, P.; Branco, P.; Neves, R.; Proença de Oliveira, R. Water Quantity and Quality under Future Climate and Societal Scenarios: A Basin-Wide Approach Applied to the Sorraia River, Portugal. Water
**2018**, 10, 1186. [Google Scholar] [CrossRef] [Green Version] - Lohani, A.K.; Kumar, R.; Singh, R.D. Hydrological time series modeling: A comparison between adaptive neuro-fuzzy, neural network and autoregressive techniques. J. Hydrol.
**2012**, 442–443, 23–35. [Google Scholar] [CrossRef] - Dolling, O.R.; Varas, E.A. Artificial neural networks for streamflow prediction. J. Hydraul. Res.
**2002**, 40, 547–554. [Google Scholar] [CrossRef] - Keras Documentation: Layer Activation Functions, n.d. Available online: https://keras.io/api/layers/activations/ (accessed on 14 October 2022).
- Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Hoboken, NJ, USA, 1999. [Google Scholar]
- Cigizoglu, H.K. Estimation, forecasting and extrapolation of river flows by artificial neural networks. Hydrol. Sci. J.
**2003**, 48, 349–361. [Google Scholar] [CrossRef] - Eberhart, R.C.; Dobbins, R.W. Neural Network PC Tools. A Practical Guide; Academic Press: Cambridge, MA, USA, 1990. [Google Scholar] [CrossRef]
- Keras Documentation: Dropout Layer, n.d. Available online: https://keras.io/api/layers/regularization_layers/dropout/ (accessed on 14 October 2022).
- Elman, J.L. Finding structure in Time. Cogn. Sci.
**1990**, 14, 179–211. [Google Scholar] [CrossRef] - LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature
**2015**, 521, 436–444. [Google Scholar] [CrossRef] - Lipton, Z.C.; Berkowitz, J.; Elkan, C. A critical review of recurrent neural networks for sequence learning. arXiv
**2015**. Available online: https://arxiv.org/abs/1506.00019 (accessed on 8 February 2023). - Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural. Netw.
**1994**, 5, 157–166. [Google Scholar] [CrossRef] [PubMed] - Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural. Comput.
**1997**, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed] - Saon, G.; Picheny, M. Recent advances in conversational speech recognition using convolutional and recurrent neural networks. IBM J. Res. Dev.
**2017**, 61, 1:1–1:10. [Google Scholar] [CrossRef] - Xu, W.; Jiang, Y.; Zhang, X.; Li, Y.; Zhang, R.; Fu, G. Using long short-term memory networks for river flow prediction. Hydrol. Res.
**2020**, 51, 1358–1376. [Google Scholar] [CrossRef] - Shen, C. A transdisciplinary review of deep learning research and its relevance for water resources scientists. Water Resour. Res.
**2018**, 54, 8558–8593. [Google Scholar] [CrossRef] - Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci.
**2018**, 22, 6005–6022. [Google Scholar] [CrossRef] [Green Version] - LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time-series. In The Handbook of Brain Theory and Neural Networks; Arbib, M.A., Ed.; MIT Press: Cambridge, MA, USA, 1995. [Google Scholar]
- Chong, K.L.; Lai, S.H.; Yao, Y.; Ahmed, A.N.; Jaafar, W.Z.W.; El-Shafie, A. Performance enhancement model for rainfall forecasting utilizing integrated wavelet-convolutional neural network. Water Resour. Manag.
**2020**, 34, 2371–2387. [Google Scholar] [CrossRef] - Bengio, Y. Learning Deep Architectures for AI. Found. Trends Mach. Learn.
**2009**, 2, 1–127. [Google Scholar] [CrossRef] - Deng, L. A Tutorial Survey of Architectures, Algorithms, and Applications for Deep Learning. APSIPA Trans. Signal Inf. Process.
**2014**, 3, E2. [Google Scholar] [CrossRef] [Green Version] - Tao, Q.; Liu, F.; Li, Y.; Sidorov, D. Air pollution forecasting using a deep learning model based on 1D convnets and bidirectional GRU. IEEE Access
**2019**, 7, 76690–76698. [Google Scholar] [CrossRef] - Huang, C.; Zhang, J.; Cao, L.; Wang, L.; Luo, X.; Wang, J.-H.; Bensoussan, A. Robust forecasting of river-flow based on convolutional neural network. IEEE Trans. Sustain. Comput.
**2020**, 5, 594–600. [Google Scholar] [CrossRef] - Keras Documentation: Layer Weight Initializers, n.d. Available online: https://keras.io/api/layers/initializers/ (accessed on 2 December 2022).
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv
**2017**. Available online: https://arxiv.org/abs/1609.04747 (accessed on 8 February 2023). - Ebert-Uphoff, I.; Lagerquist, R.; Hilburn, K.; Lee, Y.; Haynes, K.; Stock, J.; Kumler, C.; Stewart, J.Q. CIRA guide to custom loss functions for neural networks in environmental sciences—Version 1. arXiv
**2021**. Available online: https://arxiv.org/abs/2106.09757 (accessed on 8 February 2023). - Maier, H.R.; Dandy, G.C. Neural networks for the prediction and forecasting of water resources variables: A review of modelling issues and applications. Environ. Model. Softw.
**2000**, 15, 101–124. [Google Scholar] [CrossRef] - Wu, W.; Dandy, G.C.; Maier, H.R. Protocol for developing ANN models and its application to the assessment of the quality of the ANN model development process in drinking water quality modelling. Environ. Model. Softw.
**2014**, 54, 108–127. [Google Scholar] [CrossRef] - Keras Documentation: Model Training APIs, n.d. Available online: https://keras.io/api/models/model_training_apis/ (accessed on 14 October 2022).
- Duchi, J.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. **2011**, 12, 2121–2159.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv **2017**. Available online: https://arxiv.org/abs/1412.6980 (accessed on 8 February 2023).
- Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of Adam and beyond. arXiv **2019**. Available online: https://arxiv.org/abs/1904.09237 (accessed on 8 February 2023).
- Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the ICLR 2016 Workshop, San Juan, Puerto Rico, 2–4 May 2016.
- Juan, C.; Genxu, W.; Tianxu, M.; Xiangyang, S. ANN Model-based simulation of the runoff variation in response to climate change on the Qinghai-Tibet Plateau, China. Adv. Meteorol. **2017**, 2017, 1–13.
- Nacar, S.; Hınıs, M.A.; Kankal, M. Forecasting daily streamflow discharges using various neural network models and training algorithms. KSCE J. Civ. Eng. **2018**, 22, 3676–3685.
- Riad, S.; Mania, J.; Bouchaou, L.; Najjar, Y. Rainfall-runoff model using an artificial neural network approach. Math. Comput. Model. **2004**, 40, 839–846.
- Yang, S.; Yang, D.; Chen, J.; Zhao, B. Real-time reservoir operation using recurrent neural networks and inflow forecast from a distributed hydrological model. J. Hydrol. **2019**, 579, 124229.
- Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horanyi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. **2020**, 146, 1999–2049.
- McKinney, W. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference 2010, Austin, TX, USA, 28 June–3 July 2010; pp. 56–61.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
- Radiuk, P.M. Impact of training set batch size on the performance of convolutional neural networks for diverse datasets. Inf. Technol. Manag. **2017**, 20, 20–24.
- Airola, R.; Hager, K. Image Classification, Deep Learning and Convolutional Neural Networks: A Comparative Study of Machine Learning Frameworks. 2017. Thesis (BSc). Available online: https://www.diva-portal.org/smash/record.jsf?pid=diva2%3A1111144&dswid=341 (accessed on 16 October 2022).
- Afaq, S.; Rao, S. Significance of epochs on training a neural network. Int. J. Sci. Res. Sci. Eng. **2020**, 9, 485–488.
- Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. arXiv **2012**. Available online: https://arxiv.org/abs/1206.2944 (accessed on 8 February 2023).
- Jin, Y.-F.; Yin, Z.-Y.; Zhou, W.-H.; Shao, J.-F. Bayesian model selection for sand with generalization ability evaluation. Int. J. Numer. Anal. Methods Geomech. **2019**, 43, 2305–2327.
- Moriasi, D.N.; Arnold, J.G.; Liew, M.W.V.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE **2007**, 50, 885–900.
- Duan, S.; Ullrich, P.; Shu, L. Using convolutional neural networks for streamflow projection in California. Front. Water **2020**, 2, 28.
- Darbandi, S.; Pourhosseini, F.A. River flow simulation using a multilayer perceptron-firefly algorithm model. Appl. Water Sci. **2018**, 8, 85.
- Üneş, F.; Demirci, M.; Zelenakova, M.; Çalışıcı, M.; Taşar, B.; Vranay, F.; Kaya, Y.Z. River flow estimation using artificial intelligence and fuzzy techniques. Water **2020**, 12, 2427.
- Hu, Y.; Yan, L.; Hang, T.; Feng, J. Stream-flow forecasting of small rivers based on LSTM. arXiv **2020**. Available online: https://arxiv.org/abs/2001.05681 (accessed on 8 February 2023).
- Lee, H.; Song, J. Introduction to convolutional neural network using Keras; an understanding from a statistician. Commun. Stat. Appl. Methods **2019**, 26, 591–610.
- Sit, M.; Demiray, B.; Demir, I. Short-Term Hourly Streamflow Prediction with Graph Convolutional GRU Networks. arXiv **2021**. Available online: https://arxiv.org/abs/2107.07039 (accessed on 8 February 2023).
- Szczepanek, R. Daily Streamflow Forecasting in Mountainous Catchment Using XGBoost, LightGBM and CatBoost. Hydrology **2022**, 9, 226.
- Demir, I.; Xiang, Z.; Demiray, B.; Sit, M. WaterBench-Iowa: A large-scale benchmark dataset for data-driven streamflow forecasting. Earth Syst. Sci. Data **2022**, 14, 5605–5616.

**Figure 1.** Location of the studied watershed, its delineation details, elevation, main rivers, and Ponte Vila Formosa hydrometric station.

**Figure 4.** Node details (reproduced with permission from ASCE [10]).

**Figure 6.** Distribution of values of the statistical parameters (NSE, R², PBIAS, and RMSE) for each scenario in each type of neural network tested.
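The four statistics summarized in Figure 6 can be computed as in the minimal NumPy sketch below. It uses the usual hydrological definitions (as in Moriasi et al., listed in the references): NSE compares the error variance with the variance of the observations, R² is taken here as the squared Pearson correlation, PBIAS is signed so that positive values indicate underestimation, and RMSE is in the units of the data. These formula choices, the function names, and the sample values are assumptions for illustration; the article's own implementation may differ in detail.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(obs, sim):
    """Coefficient of determination, taken here as the squared Pearson correlation."""
    r = np.corrcoef(obs, sim)[0, 1]
    return r ** 2

def pbias(obs, sim):
    """Percent bias (Moriasi et al. sign convention): positive means underestimation."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

def rmse(obs, sim):
    """Root mean square error, in the units of the data (e.g. m3 s-1)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

# Hypothetical daily streamflow values (m3 s-1), not data from the article.
observed = [0.5, 2.0, 10.0, 4.0]
simulated = [0.6, 1.8, 9.0, 4.5]
print(nse(observed, simulated), r2(observed, simulated),
      pbias(observed, simulated), rmse(observed, simulated))
```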

**Figure 8.** Best solutions for the long short-term memory model: (**a**) TP5 scenario with Adamax optimizer; (**b**) TP3 scenario with RMSprop optimizer.

**Figure 10.** Best solutions for the convolutional model: (**a**) TP5 scenario with Nadam optimizer; (**b**) TP5 scenario with Adam optimizer; (**c**) TP5 scenario with Adagrad optimizer; (**d**) TP5 scenario with RMSprop optimizer.

**Table 1.** Average, minimum, and maximum annual precipitation recorded at the meteorological stations, and number of completed years (years in which the number of daily values equals the number of days in the year).

Station | Period | Average Annual Precipitation (mm) | Minimum Annual Precipitation (mm) | Maximum Annual Precipitation (mm) | Number of Completed Years |
---|---|---|---|---|---|
Aldeia da Mata | 1979–2021 | 621 | 374 | 1056 | 26 |
Alegrete | 1980–2021 | 794 | 457 | 1269 | 17 |
Alpalhão | 1979–2021 | 717 | 365 | 1224 | 26 |
Alter do Chão | 2011–2021 | 614 | 101 | 1081 | 90 |
Cabeço de Vide | 1931–2021 | 668 | 352 | 1184 | 68 |
Campo Experimental Crato (Chança) | 1971–2021 | 662 | 323 | 973 | 29 |
Castelo de Vide | 1931–2021 | 824 | 46 | 1555 | 76 |
Monforte | 1911–2020 | 516 | 256 | 1030 | 88 |
Ribeira de Nisa | 1979–1985 | 673 | 452 | 962 | 5 |
Vale do Peso | 1931–2021 | 757 | 401 | 1324 | 77 |
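The "completed year" criterion in the Table 1 caption can be expressed as a short check on daily records. The sketch below assumes that missing days are simply absent from the record (rather than stored as empty values); the function `completed_years` and the sample record are hypothetical.

```python
import calendar
from collections import Counter
from datetime import date, timedelta

def completed_years(days):
    """Return the years that have a daily value for every calendar day.

    `days` is an iterable of datetime.date objects, one per non-missing
    daily precipitation record; duplicate dates are counted once.
    """
    counts = Counter(d.year for d in set(days))
    return sorted(
        year for year, n in counts.items()
        if n == (366 if calendar.isleap(year) else 365)
    )

# Hypothetical record: every day of 2019-2021, except one missing day in 2021.
start = date(2019, 1, 1)
records = [start + timedelta(days=i) for i in range(365 + 366 + 365)]
records.remove(date(2021, 3, 15))
print(completed_years(records))  # [2019, 2020]
```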

**Table 2.** Characterization of the streamflow dataset (average, minimum, maximum, and standard deviation values, and number of records) of Ponte Vila Formosa hydrometric station between 1 January 1979 and 30 May 2011 (entire dataset) and between 25 July 2001 and 31 December 2008 (studied period) (source: SNIRH [22]).

Period | Average Streamflow (m^{3} s^{−1}) | Minimum Streamflow (m^{3} s^{−1}) | Maximum Streamflow (m^{3} s^{−1}) | Streamflow Std. Deviation (m^{3} s^{−1}) | Number of Records |
---|---|---|---|---|---|
1 November 1979–6 March 2019 | 3.8 | 0 | 272.8 | 12.7 | 7703 |
25 July 2001–31 December 2008 | 3.8 | 0 | 160.1 | 9.0 | 2645 |
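The set sizes listed in Table 8 are consistent with a 70/20/10 division into training, validation, and test sets, with the training and validation sizes rounded down. For TP1, for example, the 2645 records of the studied period give 1851, 529, and 265 samples. The sketch below reproduces this arithmetic; the chronological (unshuffled) ordering in the hypothetical `chronological_split` helper is an assumption, since the tables do not say how samples were assigned to each set.

```python
def split_sizes(n_samples: int) -> tuple[int, int, int]:
    """70/20/10 split with training and validation sizes rounded down."""
    n_train = n_samples * 7 // 10   # integer arithmetic avoids float rounding
    n_val = n_samples * 2 // 10
    return n_train, n_val, n_samples - n_train - n_val

def chronological_split(x, y):
    """Split sequences in time order (assumption: no shuffling)."""
    n_train, n_val, _ = split_sizes(len(x))
    end_val = n_train + n_val
    return ((x[:n_train], y[:n_train]),
            (x[n_train:end_val], y[n_train:end_val]),
            (x[end_val:], y[end_val:]))

# Totals implied by the rows of Table 8 (training + validation + test).
for total in (2645, 2636, 2638, 2586):
    print(total, split_sizes(total))
```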

Long Name | Activation Name | Equation |
---|---|---|
Linear | linear | $f\left(x\right)=x$ |
Exponential linear unit | elu | $f\left(x\right)=\begin{cases}\alpha\left(e^{x}-1\right), & x<0,\ \alpha>0\\ x, & x\ge 0\end{cases}$ |
Rectified linear unit | relu | $f\left(x\right)=\max\left(x,0\right)$ |
Softsign | softsign | $f\left(x\right)=\frac{x}{\lvert x\rvert+1}$ |
Hyperbolic tangent | tanh | $f\left(x\right)=\frac{\sinh\left(x\right)}{\cosh\left(x\right)}=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ |
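The activation functions in the table above can be written directly in NumPy, which is a convenient way to check the equations numerically. This is a sketch, not the Keras source; `alpha = 1.0` for elu is an assumption (it is the Keras default), and `np.tanh` is used instead of the explicit sinh/cosh ratio because that ratio overflows for large |x|.

```python
import numpy as np

def linear(x):
    return np.asarray(x, dtype=float)

def elu(x, alpha=1.0):
    """alpha * (exp(x) - 1) for x < 0, identity for x >= 0."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs.
    return np.where(x < 0, alpha * np.expm1(np.minimum(x, 0.0)), x)

def relu(x):
    return np.maximum(np.asarray(x, dtype=float), 0.0)

def softsign(x):
    x = np.asarray(x, dtype=float)
    return x / (np.abs(x) + 1.0)

def tanh(x):
    # Mathematically sinh(x) / cosh(x); np.tanh is the numerically stable form.
    return np.tanh(np.asarray(x, dtype=float))

xs = np.linspace(-3.0, 3.0, 7)
for name, f in [("linear", linear), ("elu", elu), ("relu", relu),
                ("softsign", softsign), ("tanh", tanh)]:
    print(name, np.round(f(xs), 3))
```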

Layers | Number of Layers | Number of Neurons | Activation Function | Dropout after Dense | Dropout Rate |
---|---|---|---|---|---|
Input dense | 1 | 1, 2, 3, 4, 5, 6, or training set size | Linear, elu, or relu | Yes/No | 0.1 or 0.2 |
Hidden dense | 0, 1, 2, 3, or 4 | 1, 2, 3, 4, 5, 6, or training set size | Softsign, linear, elu, or relu | Yes/No (one after each hidden layer) | 0.1 or 0.2 |
Output dense | 1 | 1 | Softsign or linear | - | - |
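Each candidate in the dense-network table is a stack of fully connected layers in which every node applies its activation function to the weighted sum of its inputs plus a bias (Keras Dense layers include a bias by default). The NumPy forward pass below builds one hypothetical candidate (5 relu neurons in the input layer, one hidden layer of 3 elu neurons, and a single linear output neuron) for 13 hypothetical input features. Weights are drawn with Glorot-uniform initialization, the Keras default for Dense kernels; dropout is omitted because it is only active during training, and training itself is not shown. The helper names are illustrative.

```python
import numpy as np

ACTIVATIONS = {
    "linear": lambda z: z,
    "relu": lambda z: np.maximum(z, 0.0),
    "elu": lambda z: np.where(z < 0, np.expm1(np.minimum(z, 0.0)), z),
    "softsign": lambda z: z / (np.abs(z) + 1.0),
}

def glorot_uniform(fan_in, fan_out, rng):
    """Uniform weights in [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def build_dense(n_features, layers, seed=0):
    """layers: list of (n_neurons, activation), from the input dense layer to the output."""
    rng = np.random.default_rng(seed)
    params, fan_in = [], n_features
    for n_neurons, activation in layers:
        params.append((glorot_uniform(fan_in, n_neurons, rng), np.zeros(n_neurons), activation))
        fan_in = n_neurons
    return params

def predict(params, x):
    """Forward pass: each node computes f(sum_n w_nm * i_n + b_m)."""
    a = np.asarray(x, dtype=float)
    for w, b, activation in params:
        a = ACTIVATIONS[activation](a @ w + b)
    return a

model = build_dense(n_features=13, layers=[(5, "relu"), (3, "elu"), (1, "linear")])
x = np.random.default_rng(1).random((4, 13))   # 4 hypothetical samples
print(predict(model, x).shape)  # (4, 1)
```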

**Table 5.** Structure characteristics tested for the LSTM model, with n_{input} representing the number of neurons in the input layer.

Layers | Number of Layers | Number of Neurons | Activation Function |
---|---|---|---|
Input LSTM | 1 | 4, 8, 16, or 32 | tanh (by default) |
Hidden LSTM | 0, 2, or 4 | 2 hidden layers: 2 × n_{input} (1st), n_{input} (2nd); 4 hidden layers: 2 × n_{input} (1st), 3 × n_{input} (2nd), 2 × n_{input} (3rd), n_{input} (4th) | tanh (by default) |
Output dense | 1 | 1 | linear |
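The hidden-layer rule in Table 5 ties each hidden LSTM layer's size to n_{input}. The helper below (a hypothetical name) simply encodes that rule, which makes the 12 tested stacks (4 input sizes × 3 hidden-layer options) easy to enumerate.

```python
def lstm_layer_units(n_input: int, n_hidden: int) -> list[int]:
    """Units per LSTM layer, input layer first, following Table 5."""
    if n_input not in (4, 8, 16, 32):
        raise ValueError("Table 5 tests 4, 8, 16 or 32 input-layer units")
    # Hidden-layer sizes as multiples of n_input.
    multipliers = {0: [], 2: [2, 1], 4: [2, 3, 2, 1]}
    if n_hidden not in multipliers:
        raise ValueError("Table 5 tests 0, 2 or 4 hidden LSTM layers")
    return [n_input] + [m * n_input for m in multipliers[n_hidden]]

for n_input in (4, 8, 16, 32):
    for n_hidden in (0, 2, 4):
        print(n_input, n_hidden, lstm_layer_units(n_input, n_hidden))
```

If such a stack is assembled in Keras, each LSTM layer except the last normally needs `return_sequences=True` so that the next LSTM layer receives a sequence; Table 5 does not state this, so treat it as an implementation note rather than the authors' configuration.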

Layers | N. of Layers | N. of Filters | Kernel Size | Pooling Size | N. of Neurons | Activation Function | Dropout after Dense | Dropout Rate |
---|---|---|---|---|---|---|---|---|
Input convolutional | 1 | 8, 16, or 32 | 1, 5, or 10 | - | - | None (by default) | - | - |
MaxPooling1D | 1 | - | - | 1 or 2 | - | - | - | - |
Hidden convolutional | 0 or 1 | 8, 16, or 32 | 8, 16, or 32 | - | - | None (by default) | - | - |
MaxPooling1D | 1 | - | - | 1 or 2 | - | - | - | - |
Flatten | 1 | - | - | - | - | - | - | - |
Hidden dense | 0, 1, or 2 | - | - | - | 3, 5, or 10 | softsign, linear, elu, or relu | Yes/No (one after each hidden layer) | 0.1 or 0.2 |
Output dense | 1 | - | - | - | 1 | softsign, linear, elu, or relu | - | - |
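The convolutional and pooling layers in the table above operate along the input sequence. The NumPy sketch below shows what a Conv1D layer with no activation (the table's "None (by default)") and a MaxPooling1D layer compute, assuming the Keras defaults of stride 1 and "valid" (no) padding for the convolution and a pooling stride equal to the pool size. Treating the 13 features of a hypothetical sample as a length-13 sequence with one channel is also an assumption, as are the function names.

```python
import numpy as np

def conv1d_valid(x, kernels, bias):
    """1D convolution (cross-correlation), stride 1, no padding, no activation.

    x: (length, in_channels); kernels: (kernel_size, in_channels, filters);
    bias: (filters,). Returns an array of shape (length - kernel_size + 1, filters).
    """
    k = kernels.shape[0]
    out_len = x.shape[0] - k + 1
    if out_len < 1:
        raise ValueError("kernel is longer than the input sequence")
    out = np.empty((out_len, kernels.shape[2]))
    for t in range(out_len):
        window = x[t:t + k]                                   # (k, in_channels)
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return out

def max_pool1d(x, pool_size):
    """Non-overlapping max pooling (stride = pool_size); a trailing remainder is dropped."""
    n = x.shape[0] // pool_size
    return x[:n * pool_size].reshape(n, pool_size, x.shape[1]).max(axis=1)

rng = np.random.default_rng(0)
x = rng.random((13, 1))                                       # hypothetical sample
h = conv1d_valid(x, rng.normal(size=(5, 1, 8)), np.zeros(8))  # kernel size 5, 8 filters
p = max_pool1d(h, 2)                                          # pooling size 2
print(h.shape, p.shape, p.size)  # (9, 8) (4, 8) 32
```

The flattened length (here 32) is what the following dense layers receive.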

**Table 7.**Meteorological input data characterization (precipitation and air temperature, period 25 July 2001–31 December 2008).

Meteorological Variable | Average | Minimum | Maximum | Std. Deviation |
---|---|---|---|---|
Daily total precipitation (mm) | 1.59 | 0 | 45.50 | 4.24 |
Daily air temperature (°C) | 16.09 | 1.77 | 34.74 | 6.46 |

**Table 8.** Tested scenarios and dataset dimensions (Acc. TP: number of days over which total precipitation is accumulated; Ave. AT: number of days over which air temperature is averaged).

Scenario: Total Precipitation (TP) or Total Precipitation + Air Temperature (TP&AT) | Time Lag (Days) | Acc. TP (Days) | Ave. AT (Days) | Training Set Size | Validation Set Size | Test Set Size |
---|---|---|---|---|---|---|
TP1 | - | 1 | - | 1851 | 529 | 265 |
TP2 | - | 1, 2, 3, 4, 5, 10 | - | 1845 | 527 | 264 |
TP3 | - | 10, 30, 60 | - | 1810 | 517 | 259 |
TP4 | 1, 2, 3, 4, 5, 6, 7 | - | - | 1846 | 527 | 265 |
TP5 | 1, 2, 3, 4, 5, 6, 7 | 1, 2, 3, 4, 5, 10 | - | 1810 | 517 | 259 |
TP6 | 1, 2, 3, 4, 5, 6, 7 | 10, 30, 60 | - | 1845 | 527 | 264 |
TP&AT1 | - | 1 | 1 | 1851 | 529 | 265 |
TP&AT2 | - | 1, 2, 3, 4, 5, 10 | 1, 2, 3, 4, 5, 10 | 1845 | 527 | 264 |
TP&AT3 | - | 10, 30, 60 | 10, 30, 60 | 1810 | 517 | 259 |
TP&AT4 | 1, 2, 3, 4, 5, 6, 7 | - | - | 1846 | 527 | 265 |
TP&AT5 | 1, 2, 3, 4, 5, 6, 7 | 1, 2, 3, 4, 5, 10 | 1, 2, 3, 4, 5, 10 | 1810 | 517 | 259 |
TP&AT6 | 1, 2, 3, 4, 5, 6, 7 | 10, 30, 60 | 10, 30, 60 | 1845 | 527 | 264 |
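The lag and accumulation columns in Table 8 can be built from a daily precipitation series with cumulative sums and shifts. The sketch below assumes that each accumulation window ends on the day being estimated (so a 1-day accumulation is that day's precipitation), that a lag of k days means the precipitation of the single day k days earlier, and that days without enough history are dropped; the function names and the synthetic series are hypothetical.

```python
import numpy as np

def accumulated(p, window):
    """Total precipitation over `window` days ending on day t (NaN until enough history)."""
    p = np.asarray(p, dtype=float)
    out = np.full(p.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(p)))
    out[window - 1:] = csum[window:] - csum[:-window]
    return out

def lagged(p, lag):
    """Precipitation of the single day `lag` days before day t (NaN for the first `lag` days)."""
    if lag < 1:
        raise ValueError("lag must be at least 1 day")
    p = np.asarray(p, dtype=float)
    out = np.full(p.shape, np.nan)
    out[lag:] = p[:-lag]
    return out

def build_features(p, lags=(), windows=()):
    """Stack lag and accumulation columns and drop days that lack enough history."""
    cols = [lagged(p, k) for k in lags] + [accumulated(p, w) for w in windows]
    x = np.column_stack(cols)
    keep = ~np.isnan(x).any(axis=1)
    return x[keep], keep

# TP5-style inputs: lags of 1-7 days plus accumulations over 1, 2, 3, 4, 5 and 10 days.
rng = np.random.default_rng(0)
precip = rng.gamma(0.3, 5.0, size=60)   # hypothetical daily totals (mm), not observed data
x, keep = build_features(precip, lags=range(1, 8), windows=(1, 2, 3, 4, 5, 10))
print(x.shape)  # (51, 13)
```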

Training Algorithm | Learning Rate Values Tested | ε Values Tested |
---|---|---|
SGD | 1 × 10^{−4}, 1 × 10^{−3}, or 1 × 10^{−2} | - |
AdaGrad | - | 1 × 10^{−7} or 1 × 10^{−8} |
RMSprop | - | 1 × 10^{−7} or 1 × 10^{−8} |
Adam | 1 × 10^{−4}, 1 × 10^{−3}, or 1 × 10^{−2} | 1 × 10^{−7} or 1 × 10^{−8} |
AdaMax | 1 × 10^{−4}, 1 × 10^{−3}, or 1 × 10^{−2} | 1 × 10^{−7} or 1 × 10^{−8} |
Nadam | 1 × 10^{−4}, 1 × 10^{−3}, or 1 × 10^{−2} | 1 × 10^{−7} or 1 × 10^{−8} |
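The training-algorithm table tunes the learning rate and ε, the small constant added to the denominator of the adaptive optimizers' updates. The NumPy sketch below shows where both enter one Adam update, following Kingma and Ba (listed in the references), and uses it to minimize a simple quadratic. The Keras `Adam` optimizer documents its `epsilon` as the paper's "epsilon hat", so its placement differs slightly from this textbook form; β1 = 0.9 and β2 = 0.999 are the paper's defaults and are assumptions here, as is the helper name `adam_step`.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update in the form of Kingma and Ba; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                 # bias corrections for the zero start
    v_hat = v / (1.0 - beta2 ** t)
    # lr scales the step; eps keeps the division finite when v_hat is near zero.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(theta) = (theta - 3)^2 from theta = 0 with lr = 1e-2 and eps = 1e-7.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2, eps=1e-7)
print(round(float(theta), 3))
```

Because the first step divides the bias-corrected mean by the root of the bias-corrected variance, its size is close to the learning rate regardless of the gradient's scale, which is one reason the learning rate is the main hyperparameter searched.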

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Oliveira, A.R.; Ramos, T.B.; Neves, R.
Streamflow Estimation in a Mediterranean Watershed Using Neural Network Models: A Detailed Description of the Implementation and Optimization. *Water* **2023**, *15*, 947.
https://doi.org/10.3390/w15050947

**AMA Style**

Oliveira AR, Ramos TB, Neves R.
Streamflow Estimation in a Mediterranean Watershed Using Neural Network Models: A Detailed Description of the Implementation and Optimization. *Water*. 2023; 15(5):947.
https://doi.org/10.3390/w15050947

**Chicago/Turabian Style**

Oliveira, Ana Ramos, Tiago Brito Ramos, and Ramiro Neves.
2023. "Streamflow Estimation in a Mediterranean Watershed Using Neural Network Models: A Detailed Description of the Implementation and Optimization" *Water* 15, no. 5: 947.
https://doi.org/10.3390/w15050947