Improving Daily Peak Flow Forecasts Using Hybrid Fourier-Series Autoregressive Integrated Moving Average and Recurrent Artificial Neural Network Models

In multi-purpose reservoirs, optimal operation requires sophisticated models that forecast reservoir inflow over both short and long horizons with acceptable accuracy, particularly for peak flows. In this study, an autoregressive hybrid model is proposed for long-horizon forecasting of daily reservoir inflow. The model is examined for one-year-horizon forecasting of a highly oscillating daily flow time series. First, a Fourier-Series Filtered Autoregressive Integrated Moving Average (FSF-ARIMA) model is applied to forecast the linear behavior of the daily flow time series. Second, a Recurrent Artificial Neural Network (RANN) model is used to forecast the residuals of the FSF-ARIMA model. The hybrid model follows the details of the observed flow variation and forecasts peak flows more accurately than previous models. The proposed model enhances the ability to forecast reservoir inflow, especially peak flows, compared to previous linear and nonlinear autoregressive models. The hybrid model has the potential to decrease the maximum and average forecasting errors by 81% and 80%, respectively. The results of this investigation are useful for stakeholders and water resources managers in scheduling the optimum operation of multi-purpose reservoirs for flood control and hydropower generation.


Introduction
Stream flow forecasting plays an important role in environmental and hydrological research and in disaster management. Mathematical models are now emerging for more precise and longer-horizon forecasting of daily reservoir inflow, which can benefit reservoir operation, flood warning, and optimal water allocation among water users. Predicting peak flows helps in controlling floods and operating hydropower plants.
Among the successful mathematical forecasting models are autoregressive models such as the Autoregressive Integrated Moving Average (ARIMA) and the multilayer perceptron Artificial Neural Network (ANN). Autoregressive models forecast future values of a time series by identifying past temporal patterns in the records. ARIMA models, which are conventional forecasting models, assume a linear relationship between historical values and the future values to be forecasted [1]. Hence, ARIMA models cannot forecast the nonlinear pattern of inflow time series. On the other hand, ANN models, as well-known forecasting models, are able to model nonlinear time series [1]. Autoregressive forecasting models can be categorized by time horizon as short-term, mid-term, and long-term, which range from part of a day to a week, from weeks up to a month, and from months to years, respectively. Applied time scales for inflow forecasting are hourly, daily, weekly, monthly, quarterly, and yearly [2][3][4][5][6][7]. One of the advantages of machine learning models is their ability to forecast time series for long-term horizons. Recently, many studies have reported the need for forecasting daily hydrological data at a long-term horizon (at least one year ahead) [8][9][10][11][12].
Some ANN models have been compared with conventional auto-regressive models (Auto Regressive (AR) and ARIMA) to reveal their ability in short-term horizon forecasting of daily flow. Kisi and Cigizoglu [3] used Feed-Forward Artificial Neural Network (FF-ANN) and AR models to forecast daily flow for three rivers in the USA and a river in Turkey. Three artificial neural network structures were then selected for comparison with the AR model forecasts. Given the same input data for 1-day-ahead forecasts, the ANN structures produced better results than the AR models. Banihabib et al. [2] forecasted the daily inflow to the Dez reservoir by using an FF-ANN model and a linear regression model based on inflow data from hydrometric stations located upstream on the River Dez. The research showed that in short-term horizon forecasting, the FF-ANN performed better than linear regression models. Xie [13] used linear regression and exponential smoothing ANN models to forecast daily flow. These models performed well during the dry seasons, while the non-linear ANNs were superior to the other models in forecasting flood flows of rainy seasons; however, they still had limitations in forecasting peak flows [13]. Sattari et al. [14] forecasted daily flow upstream of Eleviyan Reservoir with a Recurrent ANN (RANN) and a Back-Propagation Neural Network (BP-NN). The results suggested that both models are only fair in one-day-ahead forecasting of flood flow, although both performed well when applied to forecasting low flows. Consequently, these studies show that ANN models perform better for short-term horizon forecasting of daily low flow. However, even for short-term horizon forecasting of inflow, ANN models do not achieve considerable accuracy for peak flows such as flood flows.
Several ARIMA and Artificial Neural Network (ANN) models have been proposed for forecasting inflow at monthly, quarterly, and yearly time scales [4][5][6][15][16][17][18][19][20], while autoregressive models have been reported to forecast daily flow successfully at short-term time horizons [2,3,14,21-26]. The forecasting horizon of most short-term autoregressive models is only one day or 3 to 7 days ahead [2,3,6,21,23-25,27]. Such short-term horizon flow forecasts can benefit a single-purpose dam operated for flood control. However, a longer forecasting horizon is required for the optimal operation of a multi-purpose dam that provides water for hydropower generation, flood control, domestic water supply, and irrigation. There are other successful investigations of daily forecasting for at least one year ahead [8,10-12]. Banihabib et al. [9] proposed a non-linear auto-regressive ANN model for forecasting daily streamflow for a long-term horizon and compared it with an ARIMA model. The auto-regressive ANN model improved long-term forecasting of daily streamflow by continuously following the daily flow pattern, compared to the ARIMA. However, the proposed model still has considerable uncertainty in forecasting peak flows.
The literature review indicates that autoregressive models need to be developed to forecast daily reservoir inflow, especially peak flows, for a long-term horizon, in order to support the optimal operation of multi-purpose reservoirs for flood control, hydropower generation, domestic water supply, and irrigation. Since forecasting by regular ANN has already been done [2,6,9], this study focuses on hybrid ANN models, both for novelty and to increase the accuracy of the predictions. The novelty of this research compared to previous studies such as Banihabib et al. [9] is the development of a hybrid autoregressive model (a combination of linear and nonlinear forecasting models) that provides a robust forecasting model for peak flows in long-horizon daily reservoir inflow forecasting.
The results of this study are applicable for stakeholders and water resources managers to schedule optimum operation of multi-purpose reservoirs to control floods and to generate optimum hydropower.

Multi-Purpose Reservoir and Reservoir Inflow Data
To examine the performance of the proposed autoregressive hybrid model, a multi-purpose reservoir (Dez Dam) has been selected as a case study. Figure 1 shows the location of the Dez basin. The reservoir aims to control floods, generate hydropower, and supply agricultural and domestic water. The Dez reservoir basin is at a latitude of 32°35′N to 34°07′N and longitude of 48°20′ to 50°20′E in Southwest Iran. In this research, the daily flow records of Taleh-Zang hydrometric station located upstream of the reservoir were used to forecast inflow to the reservoir. In addition, daily discharge data from 1975 to 2011 of the hydrometric station were used for calibration/training and forecasting daily reservoir inflow.

Overview of the Research Method and Performance Evaluation
The proposed hybrid autoregressive model consists of two sub-models: Fourier Series Filtered ARIMA (FSF-ARIMA) for the linear part and RANN for the nonlinear part. Both sub-models have two processing phases: training (calibration), in which the models learn from historical observed inflow data, and forecasting, in which the models forecast the daily reservoir inflow for one year. First, FSF-ARIMA models were calibrated and used for forecasting the linear part of the reservoir inflow data; then the RANN model was trained and used for forecasting the non-linear part of the data.
In this study, reservoir inflow forecasting was carried out by using a conventional model (FSF-ARIMA model) and the proposed model (autoregressive hybrid model), and the results were compared using evaluation indices. By comparing the models with observed data, the performance of each model in daily reservoir inflow forecasting was assessed. The Error Index (EI) and the coefficient of determination ($R^2$) were employed as the indices for performance evaluation of the autoregressive hybrid and FSF-ARIMA models in the training and forecasting phases [2,6,9]. In addition, the Average Cumulative Relative Error (ACRE) is defined for determining the best forecasting duration [2,6,9]: ACRE is the average cumulative relative error until a certain month, and m is the cumulative number of days until that certain month. ACRE is used to evaluate the time-tendency of the error of the models in the forecasting phase.
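As a minimal illustration of two of the evaluation indices named above, the following Python sketch computes a coefficient of determination and a cumulative relative error. The exact formulas are not reproduced in this text, so both definitions are assumptions: $R^2$ is taken as the squared Pearson correlation between observed and forecasted flows, and ACRE as the mean absolute relative error over the first m days. The function names and sample flows are hypothetical.

```python
def r_squared(observed, forecast):
    """Squared Pearson correlation between observed and forecasted flows
    (one common definition of R^2; assumed here, not taken from the paper)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov ** 2 / (var_o * var_f)


def acre(observed, forecast, m):
    """Assumed form of ACRE: average of |forecast - observed| / observed
    over the first m days of the forecasting horizon."""
    errors = [abs(f - o) / o for o, f in zip(observed[:m], forecast[:m])]
    return sum(errors) / m


# Illustrative flows only (not data from the case study).
observed = [100.0, 200.0, 400.0, 200.0]
forecast = [110.0, 180.0, 400.0, 220.0]

fit_quality = r_squared(observed, forecast)
acre_two_days = acre(observed, forecast, 2)  # mean of two 10% relative errors
```

Evaluating `acre` for increasing m (for example at the end of each month) would give the kind of time-tendency curve described above, under the stated assumption about its form.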
The models were implemented on Windows, using MATLAB for the RANN and R 2.13.0 for the FSF-ARIMA. Training was stopped when the Mean Squared Error (MSE) reached its minimum. The network settings were as follows: training algorithm, Levenberg-Marquardt; number of hidden layers, 1; transfer functions of the hidden layer, tangent-sigmoid (tansig) and log-sigmoid (logsig); transfer function of the output layer, pure linear. We also tested 1760 nonlinear autoregressive network with exogenous inputs (NARX) recurrent neural network (RNN) model structures (NARX-RNN), differing in transfer functions, numbers of inputs (2 to 5), and neurons per hidden layer; input delays and output delays ranged from 1 to 10. Two training algorithms were tested, Levenberg-Marquardt (LM) and traingdx (gradient descent with momentum and adaptive learning rate), and LM was finally applied as the learning function because it generally achieves high accuracy and learns fast.

Previous Linear and Nonlinear Models
To examine the capability of the proposed hybrid model, the previous linear and nonlinear models (FSF-ARIMA and RANN models) proposed by Banihabib et al. [9] were applied to the case study, and their results were compared to the results of the developed hybrid model.
The procedure for applying FSF-ARIMA to the seasonally variable reservoir inflow data for a one-year horizon is summarized as follows [9]. The FSF-ARIMA procedure requires a normally distributed, stationary reservoir inflow time series. First, the time series is normalized using a logarithm transformation [28]. This method has been successfully employed for inflow forecasting in past investigations [29][30][31]. Then the mean and standard deviation of the logarithm-transformed data are computed. In the next step, a Fourier series is used to remove the seasonal tendency in the logarithm-transformed time series [1,9]. The FSF-ARIMA(p,d,q) model (Equation (5)) is then used to forecast the daily stream flow data [9]. In Equation (5), Q and εt are the Fourier-transformed data and the random error, respectively, at time step t; Øi (i = 1, 2,…, p) and θj (j = 1, 2,…, q) are model parameters; p is the autoregressive model order; and q is the moving average model order [33]. Several FSF-ARIMA(p,1,q) models were developed, and the values of p and q were determined by minimizing the Akaike Information Criterion (AIC) [32], a well-known criterion for evaluating time series models. For each candidate model, the AIC is computed with Equation (6):

$$\mathrm{AIC} = -2\log(\text{maximum likelihood}) + 2(\text{number of parameters}) \qquad (6)$$

The FSF-ARIMA model with the best results based on AIC and the number of parameters was used for forecasting and determining the linear part of the stream flow data. Then Q is forecasted one year ahead using Equation (5).

Building the RANN consists of selecting a learning function, inputs, and activation and training functions [9]. The Levenberg-Marquardt (LM) algorithm, which generally achieves high accuracy and learns fast [34], was applied as the learning function. The neural networks involve a bias, one hidden layer, and tangent-sigmoid or log-sigmoid activation functions. A pure linear activation function was applied in the output layer. In addition, the appropriate number of output delays is determined by a trial-and-error process.
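The preprocessing and model-selection steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the seasonal tendency is estimated with a truncated Fourier series fitted to the log-transformed flows, the series length is assumed to be a whole number of seasonal periods, the ARIMA fitting itself is omitted, and the helper names, the synthetic 12-step cycle, and the AIC tolerance are all hypothetical.

```python
import math


def fourier_seasonal_mean(series, period, n_harmonics):
    """Fit a truncated Fourier series to `series` and return the fitted
    seasonal curve. Assumes len(series) is a whole number of periods."""
    n = len(series)
    mean = sum(series) / n
    fitted = [mean] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a = 2.0 / n * sum(y * math.cos(w * t) for t, y in enumerate(series))
        b = 2.0 / n * sum(y * math.sin(w * t) for t, y in enumerate(series))
        fitted = [f + a * math.cos(w * t) + b * math.sin(w * t)
                  for t, f in enumerate(fitted)]
    return fitted


def deseasonalize_log_flow(flows, period, n_harmonics):
    """Log-transform the flows, then subtract the Fourier seasonal curve."""
    log_flows = [math.log(q) for q in flows]
    seasonal = fourier_seasonal_mean(log_flows, period, n_harmonics)
    return [y - s for y, s in zip(log_flows, seasonal)]


def aic(max_log_likelihood, n_params):
    """AIC = -2 log(maximum likelihood) + 2 (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


def pick_parsimonious(candidates, tolerance):
    """Among (p, d, q) orders whose AIC is within `tolerance` of the
    minimum, return the one with the fewest AR + MA parameters."""
    best = min(candidates.values())
    close = [order for order, value in candidates.items()
             if value - best <= tolerance]
    return min(close, key=lambda order: order[0] + order[2])


# Synthetic flows with a pure 12-step cycle (illustrative, not Dez data).
flows = [math.exp(3.0 + 0.5 * math.sin(2.0 * math.pi * t / 12))
         for t in range(24)]
residual = deseasonalize_log_flow(flows, period=12, n_harmonics=1)
```

For daily data the period would be on the order of 365 days and more harmonics would typically be needed. The `pick_parsimonious` helper mirrors the idea of preferring the model with the fewest parameters when several candidates have similar AIC values.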

The Proposed Autoregressive Hybrid Model
The proposed hybrid model benefits from the advantages of both autoregressive models, FSF-ARIMA and RANN, to recognize the linear and nonlinear patterns of the reservoir inflow time series. Both FSF-ARIMA and RANN models are autoregressive; therefore, the hybrid model is also an autoregressive model. Developing the proposed hybrid model includes two steps. In the first step, an FSF-ARIMA model is applied to forecast the linear elements of the daily inflow time series. In the second step, a RANN model is employed to forecast the residuals of the FSF-ARIMA model. Since FSF-ARIMA models cannot capture nonlinear structures in the data, the residuals of the FSF-ARIMA model represent the nonlinear part of the stream flow time series and can be forecasted by the nonlinear part of the proposed hybrid model (RANN). The Residuals of Linear Forecasting are computed with Equation (7), in which Qot is the output discharge. To forecast the nonlinear part of the reservoir inflow time series, the proposed autoregressive hybrid model employs the RANN model. The RANN model is trained with the residual time series to find the autoregressive nonlinear relation of Equation (8), in which a nonlinear function estimated using the RANN relates the predicted flow at each time step to the delayed flow data and to the day of year of the forecasted day. The number of output delays is tested for 1, 2, 3, 4, and 5 in multiple RANNs (RANN1 to RANN5). The delayed flow data are observed data in the training phase and forecasted values in the forecasting phase.
The RANN is inspired by our understanding of the neural network system of the human brain. First, information processing is accomplished in elements known as neurons; second, information is conveyed between neurons through their connections; third, each connection has a specific weight that multiplies the information conveyed from one neuron to another; fourth, each neuron typically uses a nonlinear activation function to compute its output. A RANN is described by its network structure, training method, and activation function. Figure 2 shows a schematic diagram of a RANN model developed in this study; RANNs are dynamic recurrent nonlinear ANNs. In RANN networks, output delays act as dynamic memory in the reservoir inflow forecasting phase. In this research, numbers of output delays from 1 to 5 are examined in the RANN1 to RANN5 models. In addition, the number of neurons in the hidden layer is tested from 1 to 30. In each test, the output of the RANN (computed by Equation (8)) and the target value of the outputs (computed by Equation (7)) are compared, and the adjustment of weights and biases is repeated to minimize the model error (RMSE). Then, the best RANN model is selected based on minimization of EI.
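The role of the output delays as dynamic memory can be illustrated with a short closed-loop forecasting sketch in Python. It assumes that the hybrid forecast is the sum of the linear forecast and the forecasted residual, which is the usual way of combining the two parts and is not spelled out in this text. The `toy_step` function is a hypothetical stand-in for a trained RANN, and the day-of-year counter ignores leap years.

```python
def recursive_forecast(step_fn, history, n_delays, start_day, horizon):
    """Closed-loop forecast: each prediction is fed back as a delayed input.

    step_fn(delays, day_of_year) stands in for the trained RANN; `history`
    holds the last observed values available before the forecast starts."""
    window = list(history[-n_delays:])
    predictions = []
    for step in range(horizon):
        day_of_year = (start_day + step - 1) % 365 + 1
        value = step_fn(window, day_of_year)
        predictions.append(value)
        window = window[1:] + [value]  # drop oldest delay, append forecast
    return predictions


def hybrid_forecast(linear_part, nonlinear_part):
    """Assumed combination: linear forecast plus forecasted residual."""
    return [lin + res for lin, res in zip(linear_part, nonlinear_part)]


def toy_step(delays, day_of_year):
    """Hypothetical stand-in for a trained network: halves the latest value."""
    return 0.5 * delays[-1]


residual_forecast = recursive_forecast(toy_step, history=[4.0, 2.0, 1.0],
                                       n_delays=2, start_day=1, horizon=3)
combined = hybrid_forecast([10.0, 10.0, 10.0], residual_forecast)
```

In the training phase the delay window would be filled with observed values at every step; in the forecasting phase, as in the loop above, it is progressively filled with the model's own outputs, which is why errors can accumulate over a one-year horizon.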

Results and Discussion
The best FSF-ARIMA model was chosen based on AIC and the number of parameters among multiple possible FSF-ARIMA models (Table 1). Among the examined structures, FSF-ARIMA (5, 1, 5), FSF-ARIMA (5, 1, 6), and FSF-ARIMA (1, 1, 2) had the lowest AIC (about 13,760). Since the AIC values are similar for these three structures, the best model is determined based on the minimum number of parameters. The FSF-ARIMA (1, 1, 2), with the lowest AIC and the minimum number of parameters among the FSF-ARIMA models, was chosen as the best model. […] (Table 2). The result of the best autoregressive hybrid model is selected as the proposed model and is compared with the previous FSF-ARIMA and RANN models.

[…] (Table 3). In addition, the results show that the proposed autoregressive hybrid model had a smaller EI than the previous FSF-ARIMA and RANN models (Table 3). In the training phase, the EI decreased by 0.44 and 0.27, and $R^2$ increased by 0.67 and 0.03, by using the proposed model compared to the FSF-ARIMA and RANN models, respectively. The comparison indicates that the proposed hybrid autoregressive model fits the data better and captures the flow pattern significantly better in the training phase than the previous models. The regression evaluation metrics help to determine how close the predicted values are to the actual ones. However, they do not evaluate whether the model properly fits the data; analysis of the residuals is usually used for this. Therefore, we evaluated the forecasting reliability of the proposed models by examining the errors for auto-correlation [35,36].

The comparison of the observed and model-based hydrographs reveals that the proposed model follows the observed flow variation better than the previous models. Figure 6 shows the comparison of the hydrographs forecasted by the models, in Cubic Meters per Second (CMS), versus the observed data. The inflow hydrograph forecasted by the RANN model follows the observed hydrograph better than that of the FSF-ARIMA model, but with considerable error in forecasting peak flows (Figure 6). Moreover, Figure 6 displays the capability of the proposed model in following the peaks and low points of the observed hydrograph. The hybrid model forecasts peak flows better than the RANN model. The proposed hybrid model forecasts the inflow values of the hydrograph precisely, whereas the FSF-ARIMA model overestimates the inflow except for the maximum peak flow (Figure 6). In addition, the proposed autoregressive hybrid model forecasts the maximum peak flow considerably better than the RANN model, reducing the relative error from 361% to 57% compared to the previous study [9].

The evaluation of the models based on ACRE indicates that the monthly forecasting performance of the proposed model is better than that of the previous models (Figure 7). In most of the months, ACRE for the autoregressive hybrid model is less than for the previous models. Most of the months show ACRE higher than 0.4 for the FSF-ARIMA model, whereas the proposed model decreases ACRE to less than 0.15. In addition, the maximum ACRE for the proposed model is considerably less than for the previous models. Furthermore, the maximum and average ACRE for the FSF-ARIMA model are 1.2 and 0.7, respectively, for the peak season (from January to May). Those values are 0.225 and 0.15 for the proposed model. Indeed, the hybrid model has the potential to decrease the maximum and average forecasting errors by 81% and 80%, respectively.
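The percentage reductions quoted above can be checked from the peak-season values given in the text. The short Python sketch below is only arithmetic on those four numbers, and the function name is hypothetical. The maximum error falls by (1.2 − 0.225)/1.2, about 81%, and the average error by (0.7 − 0.15)/0.7, about 79%, which is close to the 80% reported; the small difference may come from rounding of the tabulated values.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Peak-season values (January to May) quoted in the text.
max_reduction = percent_reduction(1.2, 0.225)   # maximum error
mean_reduction = percent_reduction(0.7, 0.15)   # average error

print(f"maximum error reduced by about {max_reduction:.0f}%")
print(f"average error reduced by about {mean_reduction:.0f}%")
```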

Conclusions
In this study, an autoregressive hybrid model was developed to forecast long-horizon daily inflow for the optimal operation of a multi-purpose reservoir with flood control, hydropower generation, domestic water supply, and irrigation goals. The proposed hybrid model comprises two parts: FSF-ARIMA (for the linear part) and the RANN model (for the nonlinear part). Since FSF-ARIMA models cannot capture nonlinear structures in the data, the residuals of the FSF-ARIMA model represent the nonlinear part of the stream flow time series and can be forecasted by the nonlinear part of the proposed hybrid model (RANN). The best RANN developed in this study had the log-sigmoid activation function, five recurrent inputs, 22 neurons in the hidden layer, and one neuron in the output layer. The best autoregressive hybrid model is selected as the proposed model and compared with the previous linear and nonlinear auto-regressive models (FSF-ARIMA and RANN). Comparison of the proposed autoregressive hybrid model and the previous models showed that the forecasting of long-term daily flow was significantly enhanced by the proposed hybrid model as follows:
• The proposed autoregressive hybrid model captured the inflow pattern significantly better than the previous models.
• The monthly variations of the forecasting accuracy were extensively improved by the proposed model throughout the year and during the peak season.
• The proposed autoregressive hybrid model forecasted the peak flows more precisely than the previous models.
Finally, the contribution of this research compared to the current forecasting models [9] is an autoregressive hybrid model that improves the ability to forecast reservoir inflow, especially the peak flows occurring during a one-year horizon. In [9], NARX-RNN and ARIMA models were employed. The ARIMA results showed EI values of 0.87 and 0.85 for the training and forecasting periods, respectively, and the NARX-RNN results showed EI values of 0.62 and 0.68 for the training and forecasting periods, respectively. Therefore, compared with Table 3, the hybrid model developed in the current study outperforms both the ARIMA and the NARX-RNN presented in [9]. The findings of this study can be used for the optimum allocation of water resources and the release of reservoir water for the optimal operation of multi-purpose reservoirs, especially in operating dams for flood control and hydropower generation. Although the hybrid model outperformed the RANN and FSF-ARIMA models, the ACF plots revealed that all models were unable to make reliable forecasts. Employing more advanced models such as long short-term memory (LSTM) and deep learning techniques like the Convolutional Neural Network (CNN), as well as their combination, CNN-LSTM [35,36], can improve the accuracy of forecasting. In addition, these newer models have shown more reliable forecasts [35,36]. Therefore, comparing the results of this study with LSTM and CNN-LSTM remains an open avenue for future investigations.