Water Flow Forecasting Based on River Tributaries Using Long Short-Term Memory Ensemble Model

Water flow forecasts are essential information for energy production, management and hydropower control, since advanced actions to optimize electricity production can be taken based on predicted information. This work proposes an ensemble strategy using recurrent neural networks to forecast the water flow at the Jirau Hydroelectric Power Plant (HPP), installed on the Madeira River in Brazil. The ensemble strategy combines three long short-term memory (LSTM) networks that model the Madeira River and two of its tributaries, the Mamoré and Abunã rivers. Historical streamflow data from the Madeira River and its tributaries are used to validate the ensemble LSTM model: each tributary time series is modeled separately by an LSTM model, and the results are used as input to another LSTM model that forecasts the streamflow of the main river. The experimental results show low errors on the training and test sets for both the individual LSTM networks and the ensemble model. In addition, these results were compared with the operational forecasts produced by the Jirau HPP. The proposed model showed better accuracy in four of the five scenarios tested, which indicates a promising approach to be explored in water flow forecasting based on river tributaries.


Introduction
Water flow forecasts evaluate streamflow in terms of lead time. Prediction is based on the probability of the streamflow and its historical records. Such predictions can be used to understand the complexity of water resource management, to deal with climate uncertainty and to support decision-making and management in hydroelectric power plants [1]. Streamflow forecasts are performed in the short and long term for flood management, water supply, and the analysis and operation of reservoirs in hydroelectric power plants [2]. Despite the climate uncertainty that influences streamflow, traditional forecasting models are based on statistics of stationary historical flows. Non-stationarity in these series increases the uncertainty for investments and water resource planning. Therefore, many models have been explored in order to reduce the uncertainty in planning water resource use [3].
Short-term (real-time) forecasting is performed continuously or after a warning condition. Generally, short-term forecasting is provided for operational purposes when required by hydroelectric power plants and navigation. In hydroelectric power plant systems, planning is usually based on flow statistics and adjusted on a monthly, weekly or daily basis [4]. When a forecast is used for flood control and power production, an expected volume is used in planning. Flood forecasting is performed during the flood season, after a flood warning in a river basin; the warning may be triggered by the basin level, rainfall or weather conditions. The classification is based on the required lead time, or the waiting time of the basin level in relation to rainfall. Floods can be sudden, medium (basin floods) or large [5].
A short-term forecast of 5-10 days is ideal to improve flood response in large river basins [6]. Moreover, it can be used to regulate the streamflow of a hydroelectric power plant in a river system (basin), since this is an important strategy to optimize energy production. Information regarding streamflow is necessary for the analysis and operation of reservoirs. Therefore, it is very important to study the flow pattern to support decision-making and management of hydroelectric power plants [7].
The major benefits of river water flow forecasting in the context of a hydroelectric power plant are to reduce risks in decision making, to plan short-term actions that minimize the impact of disasters and to improve energy production [8]. Reservoir operation is particular to each hydroelectric installation, and the characteristics of the river basins must be known to determine proper reservoir operation [9,10].
Several models have been used to develop short-term water flow forecasting in order to increase prediction accuracy. Stochastic models such as autoregressive (AR) [11] and autoregressive moving average with exogenous inputs (ARMAX) [12] models have been used for short-term flow prediction based on time series. These models simulate water flow from time series datasets using classical statistical models. Nevertheless, they have limited ability to capture nonlinear characteristics of the data. In contrast, machine learning (ML) based data-driven models [13], such as fuzzy neural networks (FNN) [14], support vector machines (SVM) [15], artificial neural networks (ANN) [16], extreme learning machines (ELM) [17] and genetic programming (GP) [18], have been shown to achieve better results in modeling these processes than stochastic models.
Kratzert et al. [19] proposed an approach based on long short-term memory (LSTM) for rainfall-runoff modeling of catchments with snow influence; the LSTM model outperformed traditional models. Fu et al. [20] developed a model based on LSTM and a classic backpropagation neural network model for predicting water flow using historical data from a specific period of time. The results showed that LSTM performed better than the traditional model in different situations. Zaini et al. [21] developed an LSTM-based forecast of daily water level time series for rivers in Malaysia. The forecasting models, named LSTM$_{t-1}$, LSTM$_{t-2}$ and LSTM$_{t-3}$, correspond to 1-h-ahead forecasts with lag times of 1 h, 2 h and 3 h, respectively. Ha and collaborators [22] proposed three deep neural network methods, based on monthly streamflow data of the Yangtze River from 1952 to 2018, to predict the monthly streamflow of the Yangtze River in extreme flood years and small flood years. The proposed models used a stacked LSTM, a Conv LSTM encoder-decoder LSTM and a Conv LSTM encoder-decoder gated recurrent unit (GRU). The results confirm that Conv LSTM is more stable than traditional models for predicting Yangtze River streamflow.
Liu et al. [23] proposed a real-time rolling short-term forecasting model based on LSTM to predict the highly uncertain water level of an urban river in Fuzhou City, China. The results showed that LSTM is a feasible method for real-time forecasting of river water level. Ghimire et al. [24] combined two deep neural networks into an integrated model to predict hourly short-term streamflow of the Brisbane River and Teewah Creek in Australia. The convolutional neural network (CNN) integrated with the LSTM model was named the CNN-LSTM model. Its results were compared with a standalone CNN model, standalone LSTM models and conventional artificial intelligence models; in all cases, CNN-LSTM produced better predictions. Le et al. [25] proposed six supervised learning models to evaluate the performance of deep learning models for streamflow forecasting. The deep learning models include a feedforward neural network, a CNN and four LSTM-based models: two simple models with just one hidden layer (an LSTM and a gated recurrent unit) and two more complex models (a stacked LSTM and a bidirectional LSTM). According to the authors, the LSTM-based models provided better results.
These studies have in common the use of streamflow and rainfall data and the comparison of different neural network methods, with LSTM presenting better accuracy. In a large river basin such as the Madeira River basin, several tributaries with different flow characteristics influence the streamflow of the main river. In this context, this paper proposes an ensemble LSTM model to forecast the streamflow of the Madeira River using as input only streamflow data from two of its tributaries, the Mamoré and Abunã rivers. Meteorological data are not considered: each tributary time series (Mamoré and Abunã) is modeled separately by an LSTM model, and the results are used as input to another LSTM model that forecasts the streamflow of the main river. The dataset used as a case study was provided by the Jirau HPP, installed on the Madeira River in the state of Rondônia, Brazil. The Jirau power plant is managed by the consortium Energia Sustentável do Brasil (ESBR).
Five scenarios were tested in order to compare the accuracy of the ensemble model with the statistical model used by the Jirau HPP. The tested scenarios were restricted to a limited period of time in order to compare the models. The ensemble LSTM model outperformed the statistical method in four of the five scenarios tested. The findings show that it is possible to use ensemble LSTM models for water flow forecasting on the Madeira River based only on the streamflow of its tributaries. Therefore, the proposed method can help the Jirau HPP manage and plan decisions and processes with accurate forecasts five days in advance.

Materials and Methods
This section describes the background of LSTM networks, the case study, the characteristics of the dataset and the methodology of the proposed model.

Long Short-Term Memory
Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture generally applied to deep learning forecasting problems [26,27]. LSTM networks are composed of cells capable of capturing long-term dependencies in sequences while attenuating the vanishing/exploding gradient problem [28]. This capacity is achieved through forget and update gates that modify the memory cell state and allow gradients to flow unchanged [29,30]. The LSTM memory cells are composed of self-loops that encode temporal information in the cell states and three regulating gates that control the flow of information within each cell. Figure 1 presents a schematic representation of an LSTM memory cell. The self-loops are responsible for storing encoded temporal information from the past in the cell state. The three gates are called the forget gate $f_g$, the input gate $i_g$ and the output gate $o_g$, and they control the information flow by erasing, writing and reading, respectively. Therefore, LSTM models memorize information at different intervals and are suitable for predicting time series with dependencies over a certain time interval [30,31].
The cell operation is expressed by Equations (1)–(6), where $h_t$ is a vector that represents the hidden state of the cell, corresponding to short-term memory. Likewise, $C_t$ is the cell state, which corresponds to long-term memory, and $\tilde{C}_t$ is the candidate cell state at time step $t$, responsible for selecting possibly important information to be stored over time. The weight matrices of the forget gate ($f_g$), input gate ($i_g$), output gate ($o_g$) and cell state ($C_t$) are denoted as $W_f$, $W_i$, $W_o$ and $W_c$, respectively. The weight matrices and the bias for the current input $X_t$ are denoted as … The forget gate uses the sigmoid activation function, generating values between 0 and 1 that depend on the current input and the previous output of the LSTM cell, according to Equation (1). A value of 0 in the forget gate means that all information in the previous cell state must be erased and therefore will not persist over time. Conversely, a value of 1 means that the previous information in the cell state must be completely maintained. The input gate works in a similar way: values between 0 and 1 control the writing of new information into the cell state, according to the current input, the cell candidate and the previous state. Similarly, the output gate controls, also through values between 0 and 1, which information is read from the cell state.
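As an illustration of how the gates interact, the sketch below implements one step of the standard LSTM formulation in NumPy, assuming a single weight matrix per gate applied to the concatenation of $h_{t-1}$ and $x_t$, plus a bias. The authors' Equations (1)–(6) may separate input and recurrent weights differently, and the helper names (`sigmoid`, `lstm_cell_step`) and the gate keys are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One time step of a standard LSTM cell (hypothetical helper).

    W[k] has shape (hidden, hidden + inputs) and acts on [h_prev, x_t];
    b[k] has shape (hidden,). Keys: "f" forget, "i" input, "o" output,
    "c" candidate cell state.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: 0 erases C_{t-1}, 1 keeps it
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: how much of the candidate to write
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state (long-term memory)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate: how much of the state to read
    h_t = o_t * np.tanh(c_t)                # new hidden state (short-term memory)
    return h_t, c_t


# Example: 3 hidden units, 2 inputs, zero weights so only the biases act.
H, D = 3, 2
W = {k: np.zeros((H, H + D)) for k in "fico"}
b = {"f": np.full(H, 50.0), "i": np.full(H, -50.0), "c": np.zeros(H), "o": np.zeros(H)}
c_prev = np.array([0.5, -1.0, 2.0])
h_t, c_t = lstm_cell_step(np.ones(D), np.zeros(H), c_prev, W, b)
```

With zero weights, a forget-gate bias of +50 and an input-gate bias of -50, the forget gate is practically 1 and the input gate practically 0, so the previous cell state is kept, matching the behavior described above; flipping the forget bias to -50 erases it.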

Case Study
The case study uses data from the Madeira River and two of its tributaries, the Mamoré and Abunã rivers, provided by the Jirau HPP. The Madeira River basin, depicted in Figure 2, is located in northern Brazil and has a large hydroelectric potential; the flow rate of the Madeira River can reach 60,000 m³/s. As the river's geography is predominantly flat, dams built on it have a nominal head of approximately 15 m, which is low compared with standard dams. To take advantage of the hydroelectric potential of the river, its dams use a large number of lower-power turbines. In particular, the Jirau plant has 50 generating units on the Madeira River. The Energia Sustentável do Brasil (ESBR) consortium manages the Jirau HPP and provided the dataset used to construct and evaluate the proposed model.

Dataset
The dataset consists of three time series containing 2069 daily flow measurements from 28 May 2014 to 22 January 2020. Figure 3 shows the original time series of the Mamoré (a), Abunã (b) and Madeira (c) rivers. To make the data suitable for the LSTM model, a preprocessing step normalizes the raw data to the range between 0 and 1 [33]. This process brings the data to a common magnitude scale, providing more effective weight adjustments for the neural networks [32]. The normalization is performed by Equation (7):

$$X_t = \frac{x_t - \min(x)}{\max(x) - \min(x)} \qquad (7)$$

where $x_t$ represents the time series sample at time step $t$, $X_t$ is the sample at time step $t$ after normalization, and $\min(x)$ and $\max(x)$ are the lowest and highest values in the time series, respectively. The sizes of the training and test sets vary according to the scenarios of each problem [34]. For this experiment, the three time series were individually normalized and divided into three sets: the first 1382 measurements for the training set, the next 682 measurements for the validation set and the last four measurements for the test set, except for the Madeira River time series, whose test set has the last five measurements. This setup was chosen in order to compare with the forecast of the statistical model provided by the Jirau HPP, considering a five-day-ahead forecast.
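The normalization and chronological split described above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the example series is synthetic, and the split assumes that the test block immediately follows the validation block in time, so that for a 2069-sample Madeira River series the five test values are the last ones.

```python
import numpy as np


def min_max_normalize(x):
    """Equation (7): scale a series to [0, 1]; also return (min, max) to invert later."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), (x_min, x_max)


def chronological_split(series, n_train=1382, n_val=682, n_test=4):
    """Split a series in time order (no shuffling).

    Assumes the test block immediately follows the validation block.
    """
    end = n_train + n_val + n_test
    if end > len(series):
        raise ValueError("series is shorter than the requested split")
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:end]
    return train, val, test


# Example with a synthetic 2069-sample series (not the real data).
flow = np.linspace(5000.0, 40000.0, 2069)
flow_norm, (lo, hi) = min_max_normalize(flow)
train, val, test = chronological_split(flow_norm, n_test=5)  # Madeira River: 5 test days
```

Returning the minimum and maximum alongside the scaled series keeps what is needed to map forecasts back to m³/s later.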

Ensemble LSTM Model
The proposed LSTM ensemble model is divided into two stages, as depicted in Figure 4. The first stage has two univariate LSTM networks, called LSTM 1 and LSTM 2, which generate 4-day forecasts for the Mamoré and Abunã river time series, respectively. The second stage has a multivariate LSTM network, called LSTM 3, which uses the first-stage forecasts as input to produce a 5-day forecast of the Madeira River time series [35]. For the training set, a moving window (MW) strategy is used to sample the time series dataset, as shown in Figure 5. The MW strategy converts the entire series of observations into pairs of input ($x_t$) and output ($y_t$) samples for the LSTM cell. A sample $X_t$ of the time series is observed at time step $t$; the dataset has $n + 1$ time steps in total, and $m$ is the total number of MWs used. Each LSTM network has a specific, empirically defined MW that subsamples the measurements used as input. LSTM 1 uses a subsample of a single measurement as input $i_t$ and two output measurements $y_t$, which are evaluated against two real measurements $o_t$. For LSTM 2, the MW has two sequential measurements as input $i_t$ and one output $y_t$, which is evaluated against the actual measurement $o_t$. For LSTM 3, the MW consists of three sequential measurements of each time series as input $i_t$ and one output $y_t$, which is evaluated against one measurement $o_t$ of the Madeira River time series. Note that LSTM 1 and LSTM 2 are univariate, meaning that their $i_t$ and $o_t$ samples belong to the same time series, whereas LSTM 3 uses the Mamoré and Abunã time series simultaneously as its two inputs. Since a training set of $N$ samples yields $m = N - n_{in} - n_{out} + 1$ windows for $n_{in}$ inputs and $n_{out}$ outputs, the 1382 training samples result in $m = 1380$ MWs for LSTM 1 and LSTM 2 and $m = 1379$ MWs for LSTM 3. The windows were generated following the multi-input multi-output (MIMO) principle.
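The moving-window sampling with MIMO targets can be sketched as below. The helper `make_windows` is hypothetical; it assumes that each window's targets immediately follow its inputs in time and that, for LSTM 3, the inputs are the two tributary series while the targets come from the Madeira River series. The synthetic series only stand in for the 1382-sample training sets, and the sketch reproduces the window counts stated above.

```python
import numpy as np


def make_windows(inputs, targets, n_in, n_out):
    """Moving-window (MW) sampling with MIMO targets (hypothetical helper).

    inputs:  shape (T,) for a univariate model or (T, F) for a multivariate one
    targets: shape (T,), the series to be forecast
    Returns X with shape (m, n_in, F) and Y with shape (m, n_out),
    where m = T - n_in - n_out + 1 and each Y row holds all n_out future steps.
    """
    inputs = np.asarray(inputs, dtype=float)
    if inputs.ndim == 1:
        inputs = inputs[:, None]  # treat a univariate series as one feature
    targets = np.asarray(targets, dtype=float)
    if len(targets) != len(inputs):
        raise ValueError("inputs and targets must be aligned in time")
    m = len(inputs) - n_in - n_out + 1
    if m < 1:
        raise ValueError("series too short for this window")
    X = np.stack([inputs[k:k + n_in] for k in range(m)])
    Y = np.stack([targets[k + n_in:k + n_in + n_out] for k in range(m)])
    return X, Y


# Synthetic stand-ins for the 1382-sample training sets (not the real data).
mamore = np.arange(1382, dtype=float)
abuna = 2.0 * mamore
madeira = 3.0 * mamore

X1, Y1 = make_windows(mamore, mamore, n_in=1, n_out=2)  # LSTM 1
X2, Y2 = make_windows(abuna, abuna, n_in=2, n_out=1)    # LSTM 2
X3, Y3 = make_windows(np.column_stack([mamore, abuna]), madeira, n_in=3, n_out=1)  # LSTM 3
print(len(X1), len(X2), len(X3))  # 1380 1380 1379
```

Each target row holds every step of the horizon at once, which is what distinguishes MIMO sampling from training a separate one-step model per lead time.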
MIMO is a multi-step forecasting strategy that predicts all future observations up to the intended forecasting horizon [36,37]. The LSTM hyperparameters were empirically defined through an extensive grid search over the input length, learning rate, number of LSTM units, number of layers and maximum number of training epochs. The models were trained with the Adam optimization algorithm with a learning rate of 0.0001; Adam is generally expected to perform better than other optimizers [38]. LSTM 1 and LSTM 2 were configured with a single LSTM layer of 250 units, and LSTM 3 also has a single layer, with 100 LSTM units. These were the best settings found for these datasets. Figure 6 shows the test procedure of the first stage of the proposed model. In the first iteration $t_1$ of LSTM 2, the last two measurements of the training dataset are used to predict the output for the next time step. The second iteration $t_2$ uses the last measurement of the training set as input, together with the value predicted in the previous iteration, to predict a new output. From the third iteration $t_3$ onwards, the predictions from the last two iterations are used as input to predict the next output. This process is carried out up to 4 days. The procedure for LSTM 1 is similar, using the result of the last iteration as input to generate two sequential forecasts. LSTM 3 uses the same process, but in the first iteration $t_1$ the last three measurements of the training dataset are used to predict an output. In the second iteration $t_2$, the last two measurements of the training set are used together with the LSTM 1 and LSTM 2 predictions for time step $t_1$ to predict the next output of the Madeira River series. In the third iteration $t_3$, the last measurement of the training set is used together with the LSTM 1 and LSTM 2 forecasts for time steps $t_1$ and $t_2$ to predict an output.
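The hyperparameter search can be sketched as an exhaustive grid search. The scoring function and the grid values below are hypothetical placeholders, since the text does not list the grid that was searched; in practice `score_fn` would train an LSTM with the given settings and return its validation error.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Evaluate every combination in `grid` and keep the one with the lowest score.

    score_fn(params) -> float is a hypothetical callable that would train an LSTM
    with `params` and return its validation error.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:  # strict "<" keeps the first of tied settings
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid over the hyperparameters named in the text.
grid = {
    "input_length": [1, 2, 3],
    "units": [50, 100, 250],
    "layers": [1, 2],
    "learning_rate": [1e-3, 1e-4],
    "epochs": [100, 200],
}


def toy_score(p):
    """Toy stand-in for training and validation: favors 250 units, 1 layer, lr = 1e-4."""
    return abs(p["units"] - 250) + (p["layers"] - 1) + abs(p["learning_rate"] - 1e-4)


best, score = grid_search(toy_score, grid)
```

In Keras, the configuration reported for LSTM 1 and LSTM 2 could be expressed as a single `tf.keras.layers.LSTM(250)` layer followed by a `tf.keras.layers.Dense` output layer, compiled with `tf.keras.optimizers.Adam(learning_rate=1e-4)`; the output layer and the loss function are not stated in the text and would be assumptions.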
From the fourth iteration $t_4$ onwards, only first-stage predictions for the time steps preceding the current iteration are used as input to predict an output. This process continues until a 5-day forecast is reached.
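The recursive test procedure of both stages can be sketched as follows, with each trained network replaced by a plain Python callable that maps an input window to its predictions. The helpers `recursive_forecast` and `stage2_forecast` and the toy models are hypothetical. The sketch assumes that the Madeira River forecast for day $k$ uses the tributary values of days $k-3$ to $k-1$, which is why a 4-day first-stage forecast is enough for a 5-day second-stage forecast.

```python
import numpy as np


def recursive_forecast(model, history, n_in, horizon):
    """First stage: roll a univariate model forward until `horizon` steps exist.

    `model(window)` returns one or more next-step predictions for a window
    holding the last `n_in` values; each prediction is fed back as input.
    """
    values = [float(v) for v in history[-n_in:]]
    preds = []
    while len(preds) < horizon:
        step = np.atleast_1d(model(np.array(values[-n_in:])))
        preds.extend(step.tolist())   # keep the new predictions
        values.extend(step.tolist())  # and reuse them as future inputs
    return np.array(preds[:horizon])


def stage2_forecast(model, trib_history, trib_forecast, n_in=3, horizon=5):
    """Second stage: the forecast for day k uses tributary rows of days k-n_in..k-1.

    trib_history: observed (T, 2) array of [Mamoré, Abunã] values
    trib_forecast: first-stage (P, 2) array for days 1..P, with P >= horizon - 1
    """
    if len(trib_forecast) < horizon - 1:
        raise ValueError("first-stage forecast is too short")
    # Last n_in observed days followed by the first-stage forecasts.
    rows = np.vstack([trib_history[-n_in:], trib_forecast])
    return np.array([float(model(rows[k:k + n_in])) for k in range(horizon)])


# Toy stand-ins for the trained networks (not the real models).
def lstm1(window):  # 1 input -> 2 outputs
    return np.array([window[-1] + 1.0, window[-1] + 2.0])


def lstm2(window):  # 2 inputs -> 1 output
    return np.array([window.mean()])


def lstm3(window):  # (3, 2) window -> 1 output
    return window.sum()


mamore_hist = np.array([18.0, 19.0, 20.0])
abuna_hist = np.array([2.0, 1.0, 3.0])
mamore_fc = recursive_forecast(lstm1, mamore_hist, n_in=1, horizon=4)
abuna_fc = recursive_forecast(lstm2, abuna_hist, n_in=2, horizon=4)
madeira_fc = stage2_forecast(lstm3, np.column_stack([mamore_hist, abuna_hist]),
                             np.column_stack([mamore_fc, abuna_fc]))
```

LSTM 1 returns two values per call, so it needs only two iterations to cover four days, whereas LSTM 2 needs four; the same loop handles both.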

Evaluation Method of Ensemble LSTM Model
To evaluate the predictive ability of the proposed LSTM model, the root mean square error (RMSE) and the mean absolute error (MAE) criteria were used:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$

where $N$ is the number of data points, $\hat{y}_i$ is the predicted value and $y_i$ is the measured value. Currently, the Jirau HPP uses its own models to predict the flow of the Madeira River; these models were developed to assist in optimizing electric energy production. The forecasts obtained by the proposed model are therefore compared with the forecasts provided by the Jirau HPP. This comparison aims to verify, in terms of RMSE and MAE, the applicability of the proposed model in a real, practical scenario.
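The RMSE and MAE criteria can be computed as below. This is a straightforward NumPy sketch of the standard definitions; the function names and the toy values are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error over N paired values."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error over N paired values."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(err)))


# Toy example: one of three predictions is off by 2.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse(measured, predicted), mae(measured, predicted))
```

Because RMSE squares the errors before averaging, the single large miss weighs more in RMSE (about 1.15) than in MAE (about 0.67), which is why reporting both gives a fuller picture.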

Results
All experimental results were generated using TensorFlow [39] version 2.10, Keras 2.3.1 and Python 3.7. Table 1 shows the average results of 30 realizations with the respective standard deviations, as well as the best realization on the training and test datasets. The results are presented in terms of RMSE and MAE for the Mamoré, Abunã and Madeira river streamflow datasets with LSTM 1, LSTM 2 and LSTM 3, respectively. The RMSE and MAE values on the test dataset in Table 1 are considerably lower than those on the training dataset. This is due to the difference in size between the datasets: the test dataset was restricted to a short time window so that the results could be compared with the forecasts provided by the Jirau HPP. The lower test RMSE and MAE values therefore arise because this evaluation does not accumulate the errors of a long test history. Figure 7 depicts this behavior: the results are close to the actual measurements, but Figure 7a,c,e show more accumulated error than Figure 7b,d,f. Figure 7 shows the prediction results for the training and test datasets obtained with the best of the 30 realizations, compared with the real streamflow measurements. Figure 7a shows the prediction results for the Mamoré River training dataset, while Figure 7b shows the forecast for its four-day test dataset. Similarly, Figure 7c shows the prediction results for the Abunã River training dataset, and Figure 7d shows the forecast for its four-day test dataset. Finally, Figure 7e shows the prediction results for the Madeira River training dataset, and Figure 7f shows the forecast for its five-day test dataset. As can be seen, the results of all proposed LSTM models (LSTM 1, LSTM 2 and LSTM 3) are close to the real measurements of each river on the training and test datasets.
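Summarizing repeated realizations, as in Table 1, amounts to computing the mean, standard deviation and minimum of each error metric over the runs. The sketch below assumes the sample standard deviation (`ddof=1`), which the text does not specify, treats the lowest error as the best realization, and uses made-up values rather than Table 1 data; the helper name is hypothetical.

```python
import numpy as np


def summarize_runs(errors):
    """Mean, standard deviation and best (lowest) value of one error metric.

    Assumes the sample standard deviation (ddof=1); the text does not say which.
    """
    e = np.asarray(errors, dtype=float)
    return {"mean": float(e.mean()), "std": float(e.std(ddof=1)), "best": float(e.min())}


# Example with made-up RMSE values from 30 repeated training runs (not Table 1 data).
rng = np.random.default_rng(0)
rmse_runs = 0.02 + 0.005 * rng.random(30)
summary = summarize_runs(rmse_runs)
```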
For comparison, the forecasts from the proposed LSTM ensemble model were compared with forecasts generated by the Jirau HPP. Moreover, two standard multivariate LSTM models were used in the experiments to compare accuracy: a Vanilla LSTM with 100 units and a stacked LSTM with 4 layers of 250 units each [40]. Five scenarios were tested, each forecasting five days ahead from a different date in January 2020: scenario 1 covers 23 January 2020 to 27 January 2020, scenario 2 covers 17 January 2020 to 21 January 2020, scenario 3 covers 20 January 2020 to 24 January 2020, scenario 4 covers 21 January 2020 to 25 January 2020, and scenario 5 covers 22 January 2020 to 26 January 2020. The scenarios were chosen according to the Jirau HPP model forecasts available in the dataset it provided. Table 2 shows the RMSE and MAE results for the five scenarios, with the smallest errors highlighted in bold. Scenarios 1, 3, 4 and 5 present considerably lower values of both errors for the forecasts obtained with the proposed ensemble model. For scenario 2, the Jirau HPP strategy was better in both errors, but by a very small margin. The Vanilla and Stacked LSTM models obtained considerably higher values of both errors in all five scenarios. Numerically, the errors of both forecasts may seem low, but once denormalized, the difference is significant in terms of m³/s of water. According to the Jirau HPP, such an amount is relevant to electric energy generation in the long term. Figure 8 shows the forecasts for the five scenarios, comparing the real Madeira River streamflow with the forecasts of the proposed LSTM ensemble, the Jirau HPP, the Vanilla LSTM and the Stacked LSTM. The forecast from the proposed ensemble LSTM model closely fits the real water flow measurements.
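Because the min-max normalization of Equation (7) is an affine map, an error measured on normalized data converts to physical units by multiplying by the range of the series: if $x_t = X_t(\max(x) - \min(x)) + \min(x)$, then both RMSE and MAE scale by $\max(x) - \min(x)$. The sketch below checks this numerically; the range of 40,000 m³/s is a hypothetical value, under which a normalized RMSE of 0.01 corresponds to 0.01 × 40,000 = 400 m³/s. The helper names and sample values are also hypothetical.

```python
import numpy as np


def denormalize(x_norm, x_min, x_max):
    """Invert Equation (7): map normalized values back to m^3/s."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min


def rmse(a, b):
    """Root mean square error between two equally long arrays."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Hypothetical range; the real min/max of the Madeira series are not given here.
x_min, x_max = 1000.0, 41000.0
measured_norm = np.array([0.50, 0.60, 0.55])
forecast_norm = np.array([0.51, 0.58, 0.56])

err_norm = rmse(forecast_norm, measured_norm)
err_phys = rmse(denormalize(forecast_norm, x_min, x_max),
                denormalize(measured_norm, x_min, x_max))
# The error in m^3/s equals the normalized error times the range (40,000 m^3/s).
```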
By tuning the model parameters, training yields results compatible with the real dataset curve. Considering the RMSE and MAE magnitudes, the forecast curve is expected to remain within the variation limits of the real water flow curve.

Discussions and Conclusions
This work presents an approach to forecast water flow based on river tributaries using an ensemble of long short-term memory networks. As a case study, it used data from the Madeira River and two of its tributaries, the Mamoré and Abunã rivers, located in Brazil. The dataset used for training and testing the model consists of the daily water flow history of the Madeira River and its tributaries, provided by the Jirau HPP. Rainfall data were not considered, in order to assess the forecast accuracy using inflow measurements only. The predictive capacity of the proposed model was assessed in terms of RMSE and MAE and compared with individual LSTM models (i.e., Vanilla and Stacked LSTM) and with the statistical model used by the Jirau HPP for water flow forecasting and HPP control.
Five scenarios of five-day-ahead prediction were used to compare the accuracy of the models tested. The ensemble LSTM model achieved better accuracy than the other LSTM models. Moreover, the ensemble LSTM model outperformed the statistical model of the Jirau HPP in four of the five scenarios.
Although similar works were found in the literature, no reference has been found so far on water flow forecasting based only on river tributaries. Therefore, the ensemble LSTM neural network model is a promising approach to be explored in water flow forecasting based on river tributaries. For the specific case study of the Madeira River and its tributaries, this work can support the Jirau HPP in decision making for efficient energy production management and control.