Artificial Neural Network and Multiple Linear Regression for Flood Prediction in Mohawk River, New York

Abstract: This research introduces a hybrid model for forecasting river flood events, using the Mohawk River in New York as an example. Time series analysis and artificial neural networks are combined to explain and forecast the daily water discharge from hydrogeological and climatic variables. A low-pass filter (the Kolmogorov–Zurbenko filter) decomposes each time series into long-term, seasonal, and short-term components. To predict the water discharge time series, each component is described by a multiple linear regression (MLR) model and by an artificial neural network (ANN) model. The MLR retains the advantage of a physical interpretation of the water discharge time series. We show that time series decomposition is essential before the application of any model. The decomposition also shows that the Mohawk River is affected by multiple time scale components that contribute to the hydrologic cycle of the included watersheds. Comparison of the models shows that applying the ANN to the decomposed time series improves the accuracy of forecasting flood events. The hybrid model, which combines time series decomposition with an artificial neural network, explains up to 96% of the water discharge time series.


Introduction
Worldwide, flood events are the most significant natural hazards, with severe consequences for the socio-economic structure and damage of up to billions of dollars [1][2][3]. Floods have been called the most dangerous natural phenomenon; although the flood mortality rate has fallen in recent years, the cost is increasing [4]. An increase in the frequency of irregular flood events has been observed with climatic changes [5][6][7][8][9], and prediction of flood events is required in highly occupied regions such as New York State.
In upstate New York, along the Mohawk River and other tributaries, flood events are widespread and persistent. In Schenectady, New York, flood events are due either to an increase in precipitation and storms during the summer or to ice jams in the winter months. Ice jams occur when floes accumulate at the base of bridge piers, locks, and dam structures, impeding the downstream water flow and causing an upstream rise in the water level [10,11]. Flood forecasting and damage evaluation at Schenectady have been surveyed in previous research [12][13][14][15][16][17].
Prediction of flood events is a necessity for mitigating damage. In this study, we implement an artificial neural network (ANN) prediction model, based on hydrogeological and climatic time series with minimal interferences, to predict the water discharge time series in the Mohawk River.
Flood prediction and forecasting is a topic of great interest in the area of natural hazards [18][19][20]. There are two main approaches to the flood prediction problem. The first uses mathematical modeling, in which the physical dynamics are studied, modeled, and applied. Mathematical modeling takes into consideration measurements reflecting the hydrologic cycle. These metrics are then combined and transformed into a hydrographic prediction tailored to a specific spatial variation of the variables contributing to water flow. A channel flow routing model can then be adapted to calculate river flow and yield a successful prediction. River flow or water discharge forecasting is an example of mathematical modeling for flow forecasting [21]. The second approach is the statistical observation of the relationships among inputs and outputs of water discharge, with no other information on the physical dynamics of the process. Such observations contribute to the creation of stochastic models such as moving average models [22] and Markov models [20].
Natural hazard geophysical processes are related to multiple daily natural phenomena such as rain, wind, and gravitational forces. Some hazards are described by univariate models of variables such as precipitation, wind speed, or groundwater-level time series. Floods result from a multi-variable relationship that describes the interaction of the climatic and hydrogeological variables. However, this correlation may involve interferences in data acquired at different time scales. An example is the groundwater fluctuation, which is related to annual water discharge, seasonal cycles, and short-term variations [14]; in that case, a multiple linear regression model applied to the decomposed variables explained up to 83% of the variation. The interplay of those components results in interferences, which introduce noise in the data and may hamper the accuracy of the forecasting analysis. Decomposing the variables into different components decreases the noise and separates the different signals in the data. However, the short-term prediction model showed the lowest performance among the time series components. In this research, we show how this methodology can be improved using non-linear methods such as the ANN.
There are several studies on the analysis and prediction of river flooding in various locations (e.g., in Germany, the Indus River in India, or the Elbe River in the Czech Republic) [23][24][25][26]. In the Czech Republic, the investigation of the August 2002 flooding found a correlation of 0.75, that is, a coefficient of determination (R²) of 0.56, between the water discharge and the climatic variables. The resulting moderate linear relationship might be due to a mixed interference of scales in the time series. However, this model can be improved by separating the scales in the time series [27][28][29]. In particular, the separation of the components in the time series is necessary to avoid interferences from different covariance structures in the data. An absence of separation of scales may lead to erroneously estimated parameters of the linear regression model or of other multivariate models (principal components, canonical pairs). For this reason, time series decomposition is essential before performing any analysis.
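The two figures quoted for the August 2002 flooding can be checked against each other, since for a simple linear relationship the coefficient of determination is the square of the correlation coefficient r:

$$R^2 = r^2 = 0.75^2 = 0.5625 \approx 0.56$$

In other words, a correlation of 0.75 leaves roughly 44% of the variance of the water discharge unexplained by the linear relationship.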
The time series decomposition using the Kolmogorov-Zurbenko (KZ) filter provides adequate separation of frequencies in the time series data, and it has been applied in many environmental applications [30][31][32][33]. The KZ filter has a simple design and allows a physical interpretation of all the components of the time series [29,34]. The KZ filter gives results closest to the optimal mean square error. It also allows effective separation of frequencies when applied directly to datasets with missing data [18,32,35,36].
Several studies have examined the prediction of the water discharge in different locations [9,23,37,38]. Some studies have used artificial neural networks [23,37,38], and others have used traditional statistical approaches [9,39]. ANNs have become popular for prediction and modeling purposes in various fields [40][41][42][43][44][45]. Several studies have developed hydrological forecasting models of river flow using ANNs [46,47]. In recent studies, ANNs have been used as an alternative to statistical and empirical methods for evaluating different physical phenomena [23,37,43]. In comparison with a multiple linear regression (MLR) model, an ANN provides a computational way of determining a non-linear relationship between some inputs and one or more outputs. ANNs have been applied to the modeling, identification, and prediction of complex systems [48]. Although the methods used for the prediction of the water discharge provide a prediction accuracy of up to approximately 80% [14], the performance of the prediction may be increased using ANNs. There are cases where ANNs can suffer in a similar way to linear models due to dataset limitations such as volume and noise [49,50].
Noise in hydrologic datasets is a case-specific feature and depends on many factors (including equipment and sensor sensitivity). Different signals may produce noise on the time series data [14,27,28] and subsequently influence the ANN application. We intend to determine the different signals on the time series and eliminate the noise on the data using time series decomposition. We aim to examine how much the ANN application will be improved for each component, separately.
In this study, we examine the water discharge time series as a function of climatic and hydrological variables. We apply the multiple linear regression model and artificial neural networks to the decomposed time series data. We examine whether the time series components can explain the hydrological behavior of the Mohawk River basin and reveal the main characteristics of the watershed. We aim to facilitate modeling by separating individual patterns and to decrease interferences originating from covariance structures existing between the components of the time series. We compare the linear MLR model with a non-linear ANN model using the climatic and hydrogeological variables. It is anticipated that this study will provide baseline information toward the establishment of a flood warning system for several sections of the Mohawk Watershed, New York.

Study Area
The Mohawk River crosses its basin with an easterly flow direction and a length of 240 kilometers. It is the largest tributary of the Hudson River, which it joins at Cohoes, a few kilometers north of the city of Albany. Many tributaries contribute to the Mohawk River, including the Schoharie Creek, the East Canada Creek, and the West Canada Creek. We selected for investigation three water discharge stations located in the Mohawk River basin west of Albany, NY, USA, and at a location where a confluence exists between the Mohawk River and the Schoharie Creek, originating from the Mohawk watershed and the Schoharie watershed, respectively (Figure 1). The Mohawk River Watershed is divided into three geographic regions: the Upper Mohawk, the Main River, and the Schoharie Watershed. The Main River, also called the Mohawk Valley, includes portions of Fulton, Montgomery, Schenectady, Saratoga, and Albany Counties, and it is a lowland area with extensive agricultural land use. The Schoharie Creek area has less agricultural land use, and it contributes to the public water supply of New York City through a pipeline from the Schoharie Reservoir [51].

Figure 1.
The map shows the location of the weather station, the USGS groundwater station (GWL4001), the two USGS water discharge stations (WD1346 and WD1347) located at the Mohawk River (MR) area of the Mohawk watershed, the third station (WD1500) located at the Schoharie Creek (S) of the Schoharie watershed, and the station of the tides in Albany, NY (ECC-East Canada Creek; MR-Mohawk River; S-Schoharie Creek; WCC-West Canada Creek; NOAA-National Oceanic and Atmospheric Administration; USGS-United States Geological Survey). The geographic information system (GIS) map was composed in ArcMap utilizing data from the New York State clearinghouse and the New York geographic information gateway.

Time Series Decomposition-Kolmogorov-Zurbenko (KZ) Filter
The presence of different signals in time series data may lead to misleading results when a numerical model is applied to the raw data [14,18,19]. Decomposing the time series of the variables [14,18] separates the different signals in the data and minimizes the interferences between the components of the time series. After the decomposition, we apply a model to each component separately [19,27,28]. The decomposition of the time series of a variable can be expressed using expression (1):

$$X(t) = L(t) + Se(t) + Sh(t) \tag{1}$$

where X(t) represents the original time series of a variable, L(t) is the long-term trend component, Se(t) is the seasonal component, and Sh(t) is the short-term component. The long-term component expresses the fluctuations of a time series longer than a given threshold, the seasonal component describes the intra-annual fluctuations, and the short-term component describes the short-term variations.

The KZ filter was applied for the decomposition of the time series data. The KZ filter is a low-pass filter defined by p iterations of a simple moving average of m points, where the window size m is odd (m = 2k + 1); here p = 3 and m = 33. The moving average of the KZ filter is given by expression (2):

$$Y_t = \frac{1}{m} \sum_{j=-k}^{k} X_{t+j} \tag{2}$$

where m = 2k + 1. The output of the first iteration becomes the input for the second iteration, and so on. The time series produced by p iterations of the filter described in expression (2) is denoted by:

$$Y_t = KZ_{m,p}(X_t) \tag{3}$$

The $KZ_{33,3}$ filter (length of 33 with three iterations) was applied to all the variables to provide a physically based explanation of the water discharge time series. The parameters of the filter have been selected to provide the optimal solution for the water discharge time series [14].
The water discharge time series for the three different stations were logarithmically transformed to convert the water discharge time series into a stationary time series with constant variance (Figure 2). The KZ filter was applied to the logarithm of the daily water discharge and produced a time series which consists of the long-term variations in the time series (L(t)). The KZ filter with the same parameters was applied to the variables of daily wind speed, temperature, tides, precipitation, and groundwater level and produced the long-term component of each variable.
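The iterated moving average described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the treatment of series edges and missing values (averaging over whatever points are available in the window) is an assumption that the text does not specify.

```python
def moving_average(x, m):
    """One pass of a centred moving average with odd window m = 2k + 1.

    Assumption: None marks a missing value, and windows that run past the
    ends of the series are truncated and averaged over the available points.
    """
    if m % 2 == 0:
        raise ValueError("window size m must be odd (m = 2k + 1)")
    k = (m - 1) // 2
    out = []
    for t in range(len(x)):
        window = [v for v in x[max(0, t - k): t + k + 1] if v is not None]
        out.append(sum(window) / len(window) if window else None)
    return out


def kz_filter(x, m=33, p=3):
    """KZ_{m,p}: apply the moving average p times, feeding each output forward."""
    y = list(x)
    for _ in range(p):
        y = moving_average(y, m)
    return y


# Toy usage: a constant series passes through the low-pass filter unchanged.
long_term = kz_filter([2.0] * 100)
# A single pass with m = 3 on a short ramp shows the truncated edge windows.
ramp = kz_filter([1.0, 2.0, 3.0], m=3, p=1)
```

Applied to the logarithm of the daily discharge with m = 33 and p = 3, the output would play the role of the long-term component L(t).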

Multiple Linear Regression (MLR)
For the explanation and prediction of the water discharge time series by the climatic and hydrogeological variables, we use the multiple linear regression model, which describes the projection of the water discharge time series onto the climatic and hydrogeological variables. The purpose of the MLR is to explain as much as possible of the variation observed in the water discharge time series, leaving as little variation as possible to unexplained "noise" [52]. For the raw time series data, the multiple linear regression model can be expressed by:

$$WD(t) = \beta_1\,T(t) + \beta_2\,TD(t) + \beta_3\,PR(t) + \beta_4\,WS(t) + \beta_5\,GWL(t) + \varepsilon(t) \tag{4}$$

where WD(t), T(t), TD(t), PR(t), WS(t), and GWL(t) denote the raw data of the logarithm of the water discharge, temperature, tide, precipitation, wind speed, and groundwater-level time series, respectively. The regression coefficients that multiply the independent variables in expression (4) have been selected so that they are statistically significant for the model, and ε(t) represents the residuals. Because expression (4) is given for the natural logarithm of the water discharge, the additive term ε(t) has a multiplicative effect, exp(ε(t)), on the original time series. Since the term ε(t) is sufficiently small, we can conclude that:

$$\exp(\varepsilon(t)) \approx 1 + \varepsilon(t) \tag{5}$$

Therefore, ε(t)*100 corresponds to percent changes in the water discharge time series unexplained by the climatic and hydrogeological variables.
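The small-residual approximation follows from the Taylor expansion of the exponential, where the neglected terms are of second order in ε:

$$\exp(\varepsilon) = 1 + \varepsilon + \frac{\varepsilon^2}{2} + \cdots$$

As an illustration with a hypothetical residual of ε = 0.05 (a value chosen here for the example, not taken from the data), exp(0.05) ≈ 1.0513, while 1 + ε = 1.05. The residual therefore reads as a change of about 5% in the discharge, with an approximation error of about 0.13 percentage points.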
To measure the strength of the relationship in the multiple linear regression model of the raw data, we use the coefficient of determination, R², which is the square of the correlation coefficient, as a measure of goodness of fit.
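A least-squares fit of this kind can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names are hypothetical, the model has no intercept (mirroring the form of expression (6)), the coefficients are obtained from the normal equations, and the toy data are invented so that the exact coefficients are known.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(predictors, response):
    """Least-squares coefficients (no intercept) from the normal equations."""
    n_vars = len(predictors[0])
    xtx = [[sum(row[i] * row[j] for row in predictors) for j in range(n_vars)]
           for i in range(n_vars)]
    xty = [sum(row[i] * y for row, y in zip(predictors, response))
           for i in range(n_vars)]
    return solve_linear_system(xtx, xty)


def predict(predictors, coefs):
    """Fitted values: the linear combination of each row of predictors."""
    return [sum(c * v for c, v in zip(coefs, row)) for row in predictors]


# Toy data (hypothetical): log discharge built exactly as 2*x1 - 1*x2.
rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
log_wd = [2.0, -1.0, 1.0, 3.0]
coefs = fit_mlr(rows, log_wd)
fitted = predict(rows, coefs)
residuals = [y - f for y, f in zip(log_wd, fitted)]
```

With real data, each row would hold the temperature, tide, precipitation, wind speed, and groundwater-level values for one day, and the residuals multiplied by 100 would give the unexplained percent changes discussed above.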
For the prediction of the long-term component of the water discharge, the multiple linear regression model is applied to the filtered climatic and hydrogeological variables at the three stations. Expression (6) describes the multiple linear regression model of the long-term component of the water discharge time series:

$$WD_{LT}(t) = e\,T_{LT}(t) + f\,TD_{LT}(t) + g\,PR_{LT}(t) + h\,WS_{LT}(t) + i\,GWL_{LT}(t) + \varepsilon(t) \tag{6}$$

In expression (6), $WD_{LT}(t)$, $T_{LT}(t)$, $TD_{LT}(t)$, $PR_{LT}(t)$, $WS_{LT}(t)$, and $GWL_{LT}(t)$ denote the long-term components of the water discharge, temperature, tides, precipitation, wind speed, and groundwater-level time series, respectively. A similar methodology has been applied to the prediction of the seasonal component of the water discharge time series. The seasonal component of the water discharge, $WD_{Se}(d)$, can be defined by expression (7):

$$WD_{Se}(d) = \frac{1}{Y}\sum_{y=1}^{Y}\left[WD(t_{d,y}) - WD_{LT}(t_{d,y})\right] \tag{7}$$

where d = 1, 2, ..., 365, with d = 1 representing 1 January; y = 1, ..., Y; and $t_{d,y}$ is the time index of day d in year y.
In expression (7), d represents the day of the year, Y is the number of years of observed values, WD(t) is the raw logarithm of the water discharge time series, and $WD_{LT}(t)$ is the long-term component of the water discharge time series. The seasonal components of the climatic and hydrogeological variables can be defined similarly. Based on previous studies, we apply the multivariate linear regression model for the prediction of the seasonal component of the water discharge time series [14].
To estimate the short-term component of the water discharge time series, $WD_{Sh}(t)$, we use expression (8):

$$WD_{Sh}(t) = WD(t) - WD_{LT}(t) - WD_{Se}(d) \tag{8}$$

where WD(t) is the raw water discharge time series, $WD_{LT}(t)$ is the long-term component of the water discharge time series, and $WD_{Se}(d)$ is the seasonal component of the water discharge. Similarly, we can estimate the short-term components of the climatic and hydrogeological variables.
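The seasonal and short-term components can be sketched as below. The sketch assumes that the seasonal component is the across-year mean of the detrended series for each calendar day, which is what the definitions of d, y, and Y suggest; the function names and the toy numbers are hypothetical.

```python
def seasonal_component(wd, wd_lt, day_of_year):
    """Mean detrended value for each calendar day d across all years."""
    totals, counts = {}, {}
    for value, trend, d in zip(wd, wd_lt, day_of_year):
        totals[d] = totals.get(d, 0.0) + (value - trend)
        counts[d] = counts.get(d, 0) + 1
    return {d: totals[d] / counts[d] for d in totals}


def short_term_component(wd, wd_lt, day_of_year, seasonal):
    """Remainder after removing the long-term and seasonal components."""
    return [value - trend - seasonal[d]
            for value, trend, d in zip(wd, wd_lt, day_of_year)]


# Toy usage: two "years" of three days each (hypothetical values).
days = [1, 2, 3, 1, 2, 3]
wd = [5.0, 6.0, 7.0, 7.0, 8.0, 9.0]        # log discharge
wd_lt = [4.0, 4.0, 4.0, 6.0, 6.0, 6.0]     # long-term component
seasonal = seasonal_component(wd, wd_lt, days)
short_term = short_term_component(wd, wd_lt, days, seasonal)
```

In this toy case the detrended series repeats exactly from one year to the next, so the seasonal component absorbs it entirely and the short-term component is zero on every day.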
To estimate the total explanation of the MLR model for the water discharge time series, we combine the explanations of the multiple linear regression models for the long-term, seasonal, and short-term components using expression (9):

$$R^2_{Total} = R^2_{LT}\,\%Var_{LT} + R^2_{Se}\,\%Var_{Se} + R^2_{Sh}\,\%Var_{Sh} \tag{9}$$

where R² is the coefficient of determination and %Var is the percent of the variance of the corresponding component.
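The variance-weighted combination can be illustrated with a short Python sketch. The function name and the numbers below are hypothetical and are not values from the paper's tables; the sketch also assumes that the variance shares are expressed as fractions that sum to one.

```python
def total_explanation(r2, var_share):
    """Sum of R^2 weighted by each component's share of the variance."""
    if abs(sum(var_share.values()) - 1.0) > 1e-9:
        raise ValueError("variance shares are assumed to sum to 1")
    return sum(r2[name] * var_share[name] for name in r2)


# Hypothetical inputs, for illustration only.
r2 = {"long": 0.80, "seasonal": 0.90, "short": 0.50}
var_share = {"long": 0.20, "seasonal": 0.30, "short": 0.50}
total = total_explanation(r2, var_share)  # 0.16 + 0.27 + 0.25
```

The example shows why a weak short-term model matters: when the short-term component carries a large share of the variance, its R² dominates the total.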

Artificial Neural Network (ANN)
ANNs are a robust method for classifying flow data and may perform well in solving non-linear problems thanks to their inner-parallel architecture [20,49,50,53]. Neural classification usually has two main areas of application. One is to use a number of relevant dataset attributes (predictors) to calculate the value of an unknown attribute (target value). The other is to use all available attributes at any given time window t to calculate their values at t + 1, t + 2, ..., t + n (one-step-ahead and multi-step-ahead prediction). In combination with the KZ filter, this methodology may minimize the interferences between the long-term, seasonal, and short-term components and maximize the performance of the ANN.
For this study, ANNs are applied to the decomposed time series (long-term, seasonal, and short-term components) of all the variables. An ANN is a mathematical model that consists of an interconnected group of neurons and processes information using a connectionist approach to computation. An ANN changes its structure based on external and internal information that flows through the network during the learning process. The advantages of ANNs are that they can represent both linear and non-linear relationships and learn these relationships directly from the data [53].
For the application of ANNs, a feedforward multilayer perceptron (MLP) classifier with two layers has been implemented for the analysis. The hyperbolic tangent has been selected as the transfer function of the MLP classifier. The transfer function can be described by Equation (10):

$$\phi_i = \tanh(\upsilon_i) \tag{10}$$

where $\phi_i$ is the output of the ith node (neuron) and $\upsilon_i$ is the weighted sum of the input connections [53]. MLP training identifies the weight, $\omega_{ij}$, for each layer based on the network learning (Figure 3).
Water 2018, 10, x FOR PEER REVIEW 8 of 20
Learning occurs in the perceptron by changing connection weights after each piece of data is processed, based on the amount of error in the output compared to the expected result. The MLP applies back-propagation to adjust the weights of each neuron while minimizing the error [53]. The error can be represented by Equation (11):

$$\varepsilon(n) = \frac{1}{2}\sum_{j} e_j^2(n) \tag{11}$$

where j is an output node in the nth data point, d is the target value, and y is the value produced by the MLP [53]. The error in the output node j in the nth data point is given by Equation (12):

$$e_j(n) = d_j(n) - y_j(n) \tag{12}$$

To minimize the error, the change in each weight can be obtained by gradient descent, as described in Equation (13):

$$\Delta\omega_{ij}(n) = -\eta\,\frac{\partial \varepsilon(n)}{\partial \upsilon_j(n)}\,y_i(n) \tag{13}$$

In Equation (13), $y_i$ is the output of the previous neuron and η is the learning rate.

To evaluate our proposed ANN architecture, sensor observations from 2005 to 2013 have been used. We split the dataset into 80% (training) and 20% (testing). A seed was kept constant to provide consistent results in the ANN runs of the different time scale components. After splitting the climatic and hydrological time series variables, we apply the ANN approach to each component (long-term, seasonal, and short-term) separately (Figure 4).
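The training procedure described in this section can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names, the number of hidden neurons, the learning rate, the number of epochs, the linear output node, the shuffled split, and the toy data are all assumptions made for the example.

```python
import math
import random


def split_80_20(n, seed=0):
    """Shuffle indices 0..n-1 with a fixed seed and split them 80% / 20%."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * n)
    return idx[:cut], idx[cut:]


def init_model(n_in, n_hidden, seed=0):
    """Random weights; each row ends with a bias term."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return w_hidden, w_out


def forward(model, x):
    """Return (hidden activations, output) for one input vector x."""
    w_hidden, w_out = model
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
              for row in w_hidden]
    y = sum(w * h for w, h in zip(w_out, hidden)) + w_out[-1]
    return hidden, y


def train(model, samples, targets, eta=0.05, epochs=200):
    """Per-sample gradient descent with back-propagation."""
    w_hidden, w_out = model
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            hidden, y = forward(model, x)
            e = d - y  # output error, as in Equation (12)
            # Hidden-layer deltas use tanh'(v) = 1 - tanh(v)^2.
            deltas = [e * w_out[i] * (1.0 - h ** 2)
                      for i, h in enumerate(hidden)]
            # Weight change = eta * local gradient * previous output (Eq. 13).
            for i, h in enumerate(hidden):
                w_out[i] += eta * e * h
            w_out[-1] += eta * e
            for i, row in enumerate(w_hidden):
                for j, v in enumerate(x):
                    row[j] += eta * deltas[i] * v
                row[-1] += eta * deltas[i]
    return model


def mean_squared_error(model, samples, targets):
    """Average squared difference between targets and network outputs."""
    return sum((d - forward(model, x)[1]) ** 2
               for x, d in zip(samples, targets)) / len(targets)


# Toy data (hypothetical): one predictor and a linear target.
xs = [[-1.0 + 2.0 * i / 49] for i in range(50)]
ds = [0.5 * x[0] for x in xs]
train_idx, test_idx = split_80_20(len(xs), seed=42)
x_train = [xs[i] for i in train_idx]
d_train = [ds[i] for i in train_idx]
model = init_model(n_in=1, n_hidden=4, seed=42)
mse_before = mean_squared_error(model, x_train, d_train)
train(model, x_train, d_train)
mse_after = mean_squared_error(model, x_train, d_train)
```

Keeping the seed fixed makes both the split and the initial weights reproducible, which is the role the constant seed plays in the runs for the different components. In practice, one such network would be trained separately on the long-term, seasonal, and short-term components.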

Assessment of Model Performance for MLR and ANN Model
To estimate the performance of the ANN and MLR models, the coefficient of determination, R², and the mean square of error (MSE) are used for the analysis. The forecasting accuracy of the water discharge time series is determined by using those criteria with the following expressions (14) and (15):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2} \tag{14}$$

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 \tag{15}$$

where $Y_i$ is the actual water discharge value, $\bar{Y}$ is the mean of the actual water discharge values, $\hat{Y}_i$ is the estimated water discharge value, and n is the total number of observations. The MSE is always non-negative and equals zero in the ideal case [54]. For ideal data modeling, the coefficient of determination, R², should approach 100% as closely as possible.
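Both criteria are straightforward to compute. The sketch below assumes the residual-sum-of-squares form of R² (one minus the ratio of unexplained to total variation); the function names and the toy values are hypothetical.

```python
def mse(actual, estimated):
    """Mean of the squared differences between actual and estimated values."""
    return sum((y - yh) ** 2 for y, yh in zip(actual, estimated)) / len(actual)


def r_squared(actual, estimated):
    """1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(actual, estimated))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Toy usage with hypothetical values.
actual = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 3.0, 3.5]
toy_mse = mse(actual, estimated)       # (0.25 + 0 + 0 + 0.25) / 4
toy_r2 = r_squared(actual, estimated)  # 1 - 0.5 / 5.0
```

The two measures move in opposite directions for a fixed dataset: lowering the residual sum of squares lowers the MSE and raises R².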

Assessment of the MLR Model
To evaluate the effectiveness of the MLR model in river flood prediction, we estimate the coefficient of determination, R², for the raw time series data and for the components of the time series (long-term, seasonal, and short-term). Table 1 summarizes the coefficients of determination for the three stations. For the raw data, R² is moderate because different signals exist in all the time series and cause interferences in the model; it reaches at most 59% across the locations (Table 1). Stations WD1346 and WD1347 perform similarly (47.6% and 48.7%, respectively), while station WD1500 performs better, at 59%. With the decomposition of all the time series, the accuracy of the MLR model is higher (Table 1). The long-term components of the water discharge time series provide a strong relationship, with a coefficient of determination that ranges from 73.5% to 83.3%. The MLR model applied to the seasonal components of the water discharge time series provides a very strong relationship, with values that range from 91.2% to 93.1%. As shown in previous studies, it is reasonable to observe seasonal variations because all the climatic and hydrogeological variables contain seasonal fluctuations (periodicity of 365 days) [14]. The short-term component provides an R² of 34.1% at station WD1500, while R² ranges from 68.3% to 70.6% for stations WD1346 and WD1347. All stations show a range of different time-scale components affecting the Mohawk watershed, in particular in the short term.
The performance of the MLR model varies for the short-term components of the three stations. Stations WD1346 and WD1347, located in the Herkimer and Little Falls area inside the Mohawk watershed, show a similar performance of the MLR model (68.3% and 70.6%), while the performance is lower for station WD1500 (34.1%), which is located at Schoharie Creek, inside the Schoharie watershed. The multiple linear regression model applied to the short-term components of the variables yields a moderate R² (Table 1).

Assessment of the ANN Model
The ANN model has been applied to the raw data (Figure 5), yielding coefficients of determination from 53.4% to 60.7% across the locations (Table 2). The accuracy of forecasting has been improved by applying the ANNs to the decomposed time series data (long-term, seasonal, and short-term components). In particular, the performance of the long-term component ranges from 97.6% to 98% (Table 2), which is approximately 15% higher compared with the MLR model. The performance of the seasonal component using the ANN model yields a small improvement compared to the MLR model (R² ranges from 93.1% to 95.1% for the ANN). However, the performance of the short-term component has been substantially improved using the ANN model (R² ranges from 94.1% to 95.3%). The accuracy of the model has been improved approximately three to six times for the short-term component (depending on the station) compared with the MLR method.

Figure 5. The feedforward artificial neural network of the raw data from the USGS station WD1346 using a hidden layer transfer function of hyperbolic tangent and an output layer for the prediction of water discharge.

Mean Square of Error (MSE) for the MLR and ANN Model
Another measure that we use to evaluate the performance of both models is the MSE. Table 3 shows the MSE estimated for the MLR and for the training and testing data of the ANN. For the decomposed data of the water discharge time series of all the stations, the MSE is lower for the ANN than for the MLR. This reveals that the ANN model provides a higher performance than the MLR model. The MSE has also been plotted as a function of the coefficient of determination, R² (Figure 6). Using the ANN model on the decomposed data (long-term, seasonal, and short-term), a lower MSE and a higher R² have been achieved compared with the MLR model. Figure 6 also reveals a linear pattern between the MSE and the coefficient of determination for each of the components for both models (MLR and ANN), while there is no pattern between the measures for the raw data. This pattern of forecasting accuracy (lower MSE) has been achieved when the frequencies in the data have been separated (long-term, seasonal, and short-term components). Figure 6 also shows a different pattern between the MSE and the coefficient of determination for the two different watersheds (Mohawk River and Schoharie Creek). It is reasonable to assume that the accuracy of the model depends on the watershed of the river for the MLR model. This plot can be used to identify different watersheds based on the measures of goodness of fit (MSE and R²) for the MLR model. However, such an interpretation was not apparent in the ANN model.

Figure 6. The MSE as a function of the coefficient of determination (R²) for the raw and the decomposed time series data using the MLR and the training ANN model.

Total Explanation for the MLR and ANN Model
To evaluate the overall explanation of the water discharge time series at all the stations, we multiply the percent of variance of each component by the corresponding coefficient of determination (Expression (9)) for the two models (MLR and ANN). Table 4 shows the results of the MLR and ANN application to the three stations. From Table 4, it can be observed that the overall explanation increases markedly when the ANN is used instead of the MLR models. In particular, for station WD1500, the overall explanation rises from 69.4% to 95.1% using ANNs. Similar results are obtained for the remaining stations. The increase in the overall explanation with the ANN method is attributed to the significant increase in the explanation of the short-term component. The short-term component describes short-term variations, which can be modeled with better accuracy using a probabilistic method (ANN). The explanation across all the stations varies from 95.1% to 95.8%, which allows a very accurate prediction of the flood events.
A graphical representation of the overall MLR and ANN models can be used to compare the raw and the predicted values of the water discharge time series at all the stations. Following this process, Figure 7 shows the flooding event during the Irene storm (21 August to 28 August 2011) at West Canada Creek near Herkimer (Mohawk Watershed). Using the ANN model, the predicted values agree well with the raw data. With the MLR model, some of the flood events may be predicted, but with lower forecasting accuracy. A similar plot can be derived for the station WD1500 (Figure 8), which shows the impact of the Mid-Atlantic United States flood of 2006 (25 June to 5 July 2006) at Schoharie Creek, New York. The values of the water discharge predicted by the ANN model capture all the flood events and agree well with the raw data.
Previous studies have shown that separating the data into summer and winter periods may increase the performance of the MLR model [14]. This plot shows the MLR model derived from only the summer period of 2005. The flooding events may be fitted reasonably well, but the MLR model does not perform as well as the ANN model. In this study, we do not separate the data into summer and winter periods, as the ANN model provides the maximum performance in the predicted values without separating the data into different periods of the year.
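The overall explanation described in this section can be illustrated with a short sketch. It assumes that Expression (9) sums, over the three components, the product of each component's share of the total variance and its coefficient of determination; the function name and all numerical values below are hypothetical and are not taken from Table 4.

```python
def total_explanation(variance_share, r2):
    """Sum over components of (share of total variance) x (R^2 of that component).

    Both arguments are dicts keyed by component name; shares and R^2 values
    are fractions between 0 and 1. This is an assumed reading of Expression (9).
    """
    if set(variance_share) != set(r2):
        raise ValueError("both inputs must describe the same components")
    return sum(variance_share[name] * r2[name] for name in variance_share)


# Hypothetical shares and R^2 values, for illustration only.
variance_share = {"long": 0.30, "seasonal": 0.20, "short": 0.50}
r2_mlr = {"long": 0.90, "seasonal": 0.90, "short": 0.40}
r2_ann = {"long": 0.95, "seasonal": 0.95, "short": 0.95}

print("MLR:", total_explanation(variance_share, r2_mlr))
print("ANN:", total_explanation(variance_share, r2_ann))
```

In this hypothetical case the short-term component carries half of the variance, so raising its R² is what moves the total most, which mirrors the argument made above for the ANN.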

Performance of ANN and MLR on Decomposition Data
Traditional statistical models (e.g., the multiple linear regression model) have provided comprehensive and adequate results in response to the ongoing demand for analysis and prediction that is as close to real time as possible [55][56][57][58]. In our study, applying the MLR and the ANN to the decomposed time series substantially increases the model accuracy. Time series decomposition is essential before the application of a linear or non-linear model in order to separate the different signals in the time series. However, the multiple linear regression model shows a measurable limitation in predicting the long- and short-term components of the data (Table 1), which was very prominent for the short-term component of the water discharge at station WD1500. The prediction accuracy of the water discharge can be improved using ANNs, which have been shown to be highly efficient in predicting time series data in a range of environments [55,56,58]. Therefore, a hybrid approach that combines time series analysis and an artificial neural network improves the prediction of all the components of the water discharge time series.
The MLR model showed a wide range of the coefficient of determination (R²) across the short-term, seasonal, and long-term data. For the stations WD1346 and WD1347, located in the Herkimer and Little Falls area inside the same watershed (the Mohawk Watershed), the MLR model performs similarly on the short-term component (68.3% and 70.6%). The station WD1500 is located at Schoharie Creek, in a watershed for which the MLR model performs worse on the short-term component (34.1%). This performance can be attributed to the naive nature of the model, which does not use any previous knowledge (e.g., case-based reasoning) or feed outputs back to inputs (as recurrent neural networks do) to enhance its training and improve its outcomes. The superior performance of the ANN on the short-term data can be explained by the fact that a clear pattern can be identified in the decomposed data, so the training can make the model substantially more robust and accurate in its predictions. The short-term data are sufficiently "cleaned" of the long-term and seasonal variations to allow the ANN to act as an effective non-linear approach that predicts the short-term component of the water discharge time series with better accuracy than the multiple linear regression model. ANNs stand above the MLR in performance on the short-term component, while the MLR yields a weak R² for the short-term component, implying that the linear relationship is unclear.
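The decomposition step on which both models rely can be sketched as follows. This is a minimal illustration that assumes the usual definition of the Kolmogorov–Zurbenko filter as a centered moving average of odd window length applied repeatedly, with the window shortened at the series edges. The window lengths, the number of iterations, the synthetic series, and the way the three components are formed from two filtered series are assumptions made for the example, not the settings of this study.

```python
import math


def moving_average(series, window):
    """Centered moving average; the window shrinks at the series edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def kz_filter(series, window, iterations):
    """KZ(window, iterations): apply the moving average `iterations` times."""
    smoothed = list(series)
    for _ in range(iterations):
        smoothed = moving_average(smoothed, window)
    return smoothed


def decompose(series, long_window, seasonal_window, iterations=3):
    """Split a series into long-term, seasonal, and short-term components.

    Assumed scheme: the wide filter gives the long-term component, the
    narrow filter minus the wide one gives the seasonal component, and
    the residual of the narrow filter gives the short-term component.
    """
    long_term = kz_filter(series, long_window, iterations)
    baseline = kz_filter(series, seasonal_window, iterations)
    seasonal = [b - l for b, l in zip(baseline, long_term)]
    short_term = [x - b for x, b in zip(series, baseline)]
    return long_term, seasonal, short_term


# Synthetic series (trend + slow oscillation + fast oscillation), illustration only.
series = [0.05 * t + math.sin(t / 8.0) + 0.3 * math.sin(2.5 * t) for t in range(120)]
long_term, seasonal, short_term = decompose(series, long_window=31, seasonal_window=5)
```

By construction the three components add back up to the original series, so each model can be fitted to one component at a time and the predictions recombined afterwards.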

Comparison of MLR and ANN Performance on the Physical Interpretation
This study has not only presented the robustness of ANN models in river flood forecasting but also assessed their performance against multiple regression models. To establish the actual merits of ANNs relative to conventional statistical methods (i.e., the multiple linear regression model), we compared the performance of the ANN model and the MLR on each component. Both models share the need for time series decomposition, which physically eliminates the interference between different frequencies in the time series. This paper presents a combination of time series decomposition and ANNs that offers more advantages than the combination of time series decomposition and the multiple linear regression model. The results of this study can be applied to similar watersheds for river flood prediction.
It is well known that stochastic optimization does not provide identical results on every run. For this reason, studying a combination of models, such as an ANN and an MLR, provides more reliable predictions and comparisons. In our study, keeping the seed constant in the ANN, the results were consistent for the MSE and R². However, with stochastic processes such as an ANN, the weights or the importance of the variables may easily change between runs while the R² and MSE remain the same, which may subsequently lead to wrong interpretations of the model. A linear approach expresses a cause-effect relationship between the dependent and the independent variables that does not allow such loose constraints, and although it may show poorer goodness-of-fit measures (R² and MSE) than the ANN, the linear approach is more reliable for the physical interpretation of the prediction model.
The long-term component of the water discharge of the WD1500 at Schoharie Creek dominates in comparison to the short-term component, while the WD1346 and WD1347 stations located closer to the Mohawk Watershed show very similar long-term and short-term components. The MLR may offer a physical interpretation by distinguishing watersheds according to their contrasting performance in the different time scale components, whereas the ANN performs uniformly well regardless of the watershed. The total explanation using the MLR model reveals the importance of each time scale component. Each component of the water discharge time series may affect the Mohawk Watershed and the subsequent hydrologic cycle.
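The effect of the random seed can be illustrated with a deliberately small example. The sketch below trains a one-hidden-layer network with plain stochastic gradient descent on a toy target; the architecture, learning rate, number of epochs, and data are hypothetical choices for the illustration and do not reproduce the ANN used in this study.

```python
import math
import random


def train_tiny_ann(xs, ys, seed, hidden=3, epochs=200, lr=0.05):
    """Train a one-hidden-layer tanh network with SGD; return (weights, mse)."""
    rng = random.Random(seed)  # the seed fixes both initialization and shuffling
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            h = [math.tanh(w1[j] * xs[i] + b1[j]) for j in range(hidden)]
            err = sum(w2[j] * h[j] for j in range(hidden)) + b2 - ys[i]
            for j in range(hidden):
                # Gradient through tanh, computed before w2[j] is updated.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * xs[i]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(x):
        return sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(hidden)) + b2

    error = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return (w1, b1, w2, b2), error


# Toy data, for illustration only.
xs = [i / 10.0 for i in range(-10, 11)]
ys = [x * x for x in xs]

weights_a, mse_a = train_tiny_ann(xs, ys, seed=1)
weights_b, mse_b = train_tiny_ann(xs, ys, seed=1)  # same seed: identical run
weights_c, mse_c = train_tiny_ann(xs, ys, seed=2)  # different seed: different weights
print(weights_a == weights_b, weights_a == weights_c, mse_a, mse_c)
```

Two runs with the same seed return identical weights and MSE, whereas a different seed leads to different weights. This is why the fitted weights of an ANN are a fragile basis for physical interpretation, while the MLR coefficients are determined by the data alone.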
The MLR model's coefficient of determination (R²) for the water discharge time series varies between the time series components, and especially with the location of the water discharge stations. All stations, regardless of their location, indicate similar values of the linear relationship in the seasonal and the long-term components. The short-term component shows a broader range of values across the three stations (Figure 9). In particular, the short-term value is lower for the station located at Schoharie Creek, while the stations located in the Mohawk Watershed perform better in the short term. The WD1346 station measures the water discharge of the East Canada Creek and is located 6.6 km from the Mohawk River following the ECC meander's path, and the WD1347 station measures the Mohawk River. The WD1500 station, located in the Schoharie Watershed, is at a significant distance from the Mohawk River, approximately 24 km following the Schoharie Creek meander path. We think that this contrasting curvature length may influence the way that a river recharges or discharges over a short time. In the long term, the curvature length may not affect the discharge rate, as reflected in the uniformly high performance of the MLR model across the three stations. It is reasonable to assume that at stations located closer to Schoharie Creek, such as the WD1500, the series may be affected by the river and by the watershed properties or use, such as permeability, agricultural, residential, or business activities, and the associated water usage or supply needs. We also notice that the long-term components seem to dominate in watersheds with longer river meanders and less curvature, while the short-term components are mostly affected by smaller watersheds and shorter river meanders with mature curvature.
Another possible explanation for the difference in the watersheds' performance in the short term is the character of the local climatic variations in the drainage basin, including but not limited to precipitation, temperature, and local winds. Even though there might be local climatic variations in the Schoharie Watershed, mainly due to a mountainous landscape, upstate New York is the primary source of water supply for New York State. The operation of reservoirs and dams is necessary for a sustainable water supply and for flood control. Reservoirs such as the one at Delta Dam on the Mohawk River above Rome and the Hinckley Reservoir on West Canada Creek supply water to the Erie Canal, while on Schoharie Creek the Schoharie Reservoir at the Gilboa Dam supplies water to New York City. The reservoirs' operation may introduce interferences in the water discharge time series and affect the multi-scale time series components, especially the short-term variations. The Schoharie Reservoir is one of the two reservoirs in the Catskill system, and its watershed drains the mountainous Catskills area. The Schoharie watershed behaves as a distinct basin whose dimensions and storage capacity ratio differ from those of other watersheds such as the Mohawk Valley Watershed, as its very small reservoir may allow rapid watershed recharge or discharge [55]. Therefore, the short-term component is expected to be affected by local climatic variations or by human contribution, such as the reservoir operation that provides the public water supply of New York City.

Conclusions
The hybrid model that combines time series decomposition and ANNs has been shown to be an efficient model for flood hazard forecasting, and it offers an improved prediction model that explains up to 96% of the water discharge time series. The combination of the KZ filter and the ANN approach allows us to isolate the short-term variations in the decomposed data and take advantage of a non-linear approach. In particular, the decomposition of the water discharge time series from different watersheds has substantially diminished interferences and improved the ANN's performance. This methodology allows us to study similar patterns and frequencies within the same component and maximizes the performance and physical interpretation of the forecasting model. The results demonstrate that the ANN provides a more efficient prediction model than the MLR model. However, the MLR model allows a physical interpretation of the decomposed data. The Mohawk Watershed is affected by physical phenomena that depend on multiple time scale components, such as long-term or short-term variations of the water discharge time series. Local infrastructure and the control of the water supply through dams and intermittent operation of reservoirs may affect the short-term water discharge variations. Depending on the drainage area, the short-term water discharge variations may be prominent in the MLR prediction model.
Author Contributions: K.T., A.M. and S.K. are responsible for the conceptualization, the methodology, the application of software in the data, the formal analysis of the data, the visualization of the results, and the writing, review and editing of different parts of the manuscript.
Funding: This research received no external funding.