Water Level Forecasting Using Deep Learning Time-Series Analysis: A Case Study of Red River of the North

Abstract: The Red River of the North is vulnerable to floods, which have caused significant damage and economic loss to inhabitants. A better capability in flood-event prediction is essential to decision-makers for planning flood-loss-reduction strategies. Over the last decades, classical statistical methods and Machine Learning (ML) algorithms have greatly contributed to the growth of data-driven forecasting systems that provide cost-effective solutions and improved performance in simulating the complex physical processes of floods using mathematical expressions. To improve flood prediction for the Red River of the North, this paper presents effective approaches that make use of a classical statistical method, a classical ML algorithm, and a state-of-the-art Deep Learning method. Respectively, the methods are seasonal autoregressive integrated moving average (SARIMA), Random Forest (RF), and Long Short-Term Memory (LSTM). We used hourly water-level records from three U.S. Geological Survey (USGS) stations, at Pembina, Drayton, and Grand Forks, with twelve years of data (2007–2019), to forecast the water level six hours, twelve hours, one day, three days, and one week in advance. Pembina, at the downstream location, has a water-level gauge but not a flow-gauging station, unlike the others. The floodwater-level-prediction results show that the LSTM method outperforms the SARIMA and RF methods. For the one-week-ahead prediction, the RMSE values for Pembina, Drayton, and Grand Forks are 0.190, 0.151, and 0.107, respectively. These results demonstrate the high precision of the Deep Learning algorithm as a reliable choice for floodwater-level prediction.


Introduction
Forecasting the water levels in rivers and lakes is critical for flood warning and water-resource management.
Since water-level data from hydrological stations typically have a time-series structure, researchers commonly employ time-series hydrological-prediction models to forecast future data. Hidden information can be revealed by using past data to predict future water levels (future behavior), which is important for mitigating flood effects, reducing or preventing disasters, and managing water resources.

History of Flooding in the Red River of the North
Red River discharge varies annually and seasonally, and the water demand of the Red River basin may rise in the future due to a variety of factors, including economic development, population growth, and climate change [1]. Patterns in seasonal and annual streamflow in the basin reflect variability in precipitation. Floods happen in the Red River when the water level rises over the tops of the riverbanks due to significant precipitation over the same area for long periods, in the form of persistent thunderstorms, rain, or snow, combined with spring snowmelt and ice jams. Due to long and severe winters that allow snow to accumulate, warmer temperatures in the spring, and flat topography with weakly permeable soil, the mid-latitude regions of North America are highly vulnerable to spring-melt floods [2][3][4]. Spring-melt floods are frequent in the Red River as it heads north [5,6]. During the spring thaw, the southern part of the Red River basin melts first, and the river becomes hydrologically active; meanwhile, the northern part of the basin is often still frozen. Together with the flat and homogeneous topography, this produces a slow, meandering river that overflows on its northern side, resulting in floods [5,7]. Surface runoff from snowmelt during significant floods leads the Red River to overflow its shallow banks, flooding the whole valley and causing immense damage. Research by Hirsch and Ryberg (2012) and Rice et al. (2015) indicates that the frequency of floods in the Red River basin is increasing dramatically [8,9]. Early flood forecasting can give communities early warnings to protect homes and land and can mitigate the impact of floods. There is therefore an increasing need to improve the characterization and identification of the precursors that affect the hydrological conditions causing spring-snowmelt floods, and to improve predictions in order to reduce Red River flood damage.

Water 2022, 14, 1971. https://doi.org/10.3390/w14121971 https://www.mdpi.com/journal/water

Methods Used Previously for Flood Prediction
In general, there are three methods for forecasting streamflow. The first approach depends mainly on physically based models [10], which have long been used to forecast hydrological events, including storms [11,12], runoff or rainfall [13,14], shallow streamflow [15], hydraulic models [16,17], and further cases of global circulation [18], encompassing the interaction between atmosphere, water, and floods [19]. Although physical models are capable of forecasting a broad range of flooding situations, they typically require a variety of hydro-geomorphological-monitoring datasets, which necessitates costly computing and prevents short-term prediction [20]. Moreover, the construction of physically based models frequently demands in-depth knowledge and expertise in hydrological factors, which has been noted as challenging [21]. Furthermore, many studies demonstrate that there is a gap in the short-term prediction capability of physical models [19].
In the second approach, mathematical models are used to model the streamflow hydrodynamics. Since this approach is based on original hydrological and hydraulic principles, it is broadly utilized in nations across the world. Flood-modeling studies have utilized physically based hydrologic models such as the Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) [22], the soil and water assessment tool (SWAT) [23], IHACRES [24], and the HSPF model [25]. However, using these models necessitates substantial field observations as well as trial-and-error parameterization techniques [26]. They also supply only at-site flood-risk estimates based on local streamflow data obtained at gauging hydrometric stations, making them inappropriate for regional-flood assessment [27,28].
The last approach is data-driven and is based on the statistical relationship between input and output data for near-future predictions. The Machine Learning (ML) model, which has been applied in flood forecasting since the 1990s, is one of the most popular frameworks utilized in the data-driven method. In contrast to a physically based numerical model, ML models can offer a powerful solution for flood prediction without requiring explicit knowledge of the underlying nonlinear dynamic processes [29].
Numerous studies have been conducted to predict the water levels in rivers, lakes, and other water bodies worldwide using different time-series models. The Autoregressive Integrated Moving Average (ARIMA) model is widely used for river discharge and flood forecasting [30][31][32][33][34][35][36]. Yürekli et al. presented a monthly streamflow forecasting method for three gauging stations in the north Anatolia fault line and evaluated the residuals of the ARIMA model [30]. The authors state that a comparison of the monthly mean and standard deviation of the observed and predicted data shows that the ARIMA predictions maintained the main statistical features of the observed data. By comparing the observed and predicted monthly data sequences using linear regression, they found a statistically significant linear relationship between the two. In another study [31], data from two Schuylkill River stations in Berne and Philadelphia (in the United States) were collected over six years. The author demonstrated that the daily data have no seasonality; therefore, there was no seasonal term in the proposed ARIMA formulation. Even though both stations are located along the same river, the ARIMA models proposed for each station differed due to the differing watershed coverage. Exponential smoothing was employed by [37] to study and predict the water-level trends in the Mtera dam in Tanzania. They found that the water level in the Mtera dam has been declining over time, and that the highest and lowest water levels have both shown a declining trend in recent years. Additionally, estimates for the next five years based on the exponential smoothing of time-series data indicated that the water level would be below the lowest water level required for energy production in the spring of 2023. In another study, the authors evaluated the efficiency and accuracy of several models for predicting Tanshui River water levels in Taiwan during 50 historical typhoon events that occurred over 11 years between 1996 and 2007. The authors compared three eager models, namely artificial neural network (ANN), linear regression (REG), and support vector regression (SVR), with two lazy models, namely locally weighted regression (LWR) and the k-nearest neighbor (kNN). According to the results, ANN and SVR outperformed REG among the eager-learning models. However, the authors state that although ANN, SVR, and REG were all considered eager-learning models, their prediction capabilities differed due to their different learning optimizers. Among the lazy-learning models, LWR outperformed kNN, and both lazy models showed more accurate predictions than the REG eager model.
To our knowledge, no previous studies have explicitly applied a classical statistical method, a classical ML algorithm, and a state-of-the-art Deep Learning method to improve flood prediction in the Red River of the North. The goal of our study is to apply three models, SARIMA (a conventional statistical model), RF (a classical ML algorithm), and LSTM (a Deep Learning method), to map flood susceptibility and distinguish flood-hazard regions in the Red River of the North. The findings of this study will assist regional and local authorities as well as policymakers in mitigating flood risks and developing appropriate measures to minimize potential damage. The observed water levels of the Red River of the North at three United States Geological Survey (USGS) stations, at Pembina, Drayton, and Grand Forks, sampled hourly from 2007 to 2019, are used to forecast the water level six hours, twelve hours, one day, three days, and one week in advance. Pembina is the downstream forecasting point, but it only has a water-level station. Both Drayton and Grand Forks have a full discharge-measurement station that provides water-level and discharge series.

Study Area
The Red River basin is an international, multi-jurisdictional watershed of 116,550 square kilometers (45,000 square miles), with 80% of the basin in the United States and 20% in Manitoba, Canada. With a drainage area of 45,000 mi² (104,100 km²), it is a unique basin that flows from the south of the region northward into Canada through Pembina (Figure 1a) [38]. The basin itself is approximately 60 miles (97 km) wide at its widest point and 315 miles (507 km) in length. The climate can be characterized as semi-arid, with cold winters and dry summers. With a length of 545 river miles, the highly sinuous, low-sloping channel of the Red River of the North stretches from Wahpeton, North Dakota to Lake Winnipeg, Manitoba and marks the border between North Dakota and Minnesota (Figure 1b) [39]. In a typical year, most streamflow occurs in the spring and early summer due to snowmelt, rainfall on the snowpack, or severe rain on saturated soil. Flooding is more common in the spring and early summer, and it is more severe during wet seasons [39]. Furthermore, the flat topography of the basin, along with the climatic conditions stated above, often leads to major floods in the Red River and its tributaries.

Data Representation and Pre-Processing
The Red River of the North was chosen for four reasons. First, the river network has a limited number of minor flow-control structures, which is advantageous when using an ML technique to predict floods. Second, the Red River presents a challenging condition for using satellite altimetry to estimate the stage. Despite its vast catchment area, the Red River runs along a main stem channel that is only around 100 m wide at the bankfull stage. Third, USGS gauging stations are adequately established along the main tributaries, offering field-based estimates of river flow and river stage for the verification of the modeling system. Finally, although devastating property losses due to floods typically occur in years with substantial snow accumulations, the river basin, particularly the basin-scale hydrologic response to climatic variability, has not been extensively modeled. It is worth mentioning that among our three selected stations, the Pembina station on the Red River, which is located downstream, does not have any data for river-flow discharge.
The characteristics of all three datasets are summarized in Table 1. Figures 2 and 3 present the monthly and annual water levels of the three selected stations, respectively. All datasets used in this paper have a monthly and an annual component (see Figures 2 and 3), which makes the data non-stationary. The water levels in these three datasets were taken from the USGS hourly gauge-height records. The samples cover the period from 1 November 2007 to 31 December 2019. During data preprocessing, if a run of consecutive missing values spanned less than eight hours, we filled it using linear interpolation. Where more than eight hours of values were missing, we removed the period from our dataset.
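The gap-handling rule described above can be sketched as follows. This is a minimal illustration rather than the authors' preprocessing code: the function name `fill_short_gaps`, the use of `None` for a missing hour, and the choice to drop a gap of exactly eight hours (the text only specifies "less than" and "more than" eight) are all assumptions.

```python
MAX_GAP_HOURS = 8  # threshold taken from the text


def fill_short_gaps(levels, max_gap=MAX_GAP_HOURS):
    """Interpolate short runs of missing hourly values; drop longer runs.

    levels: list of hourly gauge heights, with None marking a missing hour.
    Returns (index, value) pairs. Gaps of max_gap hours or more, and gaps
    touching either end of the record, are dropped (an assumption).
    """
    kept = []
    i, n = 0, len(levels)
    while i < n:
        if levels[i] is not None:
            kept.append((i, levels[i]))
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and levels[j] is None:
            j += 1
        gap = j - i
        has_neighbours = i > 0 and j < n
        if gap < max_gap and has_neighbours:
            left, right = levels[i - 1], levels[j]
            for k in range(i, j):
                # Fraction of the way from the last known to the next known value.
                frac = (k - (i - 1)) / (j - (i - 1))
                kept.append((k, left + frac * (right - left)))
        i = j
    return kept


if __name__ == "__main__":
    record = [20.0, None, None, 23.0] + [None] * 9 + [25.0]
    for hour, level in fill_short_gaps(record):
        print(hour, round(level, 2))
```

Returning index-value pairs keeps the original hour positions visible, so a removed period shows up as a jump in the indices rather than being silently stitched together.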
The Pembina River, a tributary of the Red River of the North, is the major source of water in south-central Manitoba. The Pembina River flows southeast from the Turtle Mountains' highlands, beginning at its highest point (elevation 2000 feet). It joins the Red River from the west just south of Pembina, North Dakota, approximately 2 miles (3 km) south of the US-Canadian border. At Pembina, the height of the water flowing down the Red River is recorded by a stream gauge. The sensor, one of around 8000 maintained by the USGS, acts as a sentinel for communities along the river that were devastated by floods in 2009, 2010, and 2011. The Pembina gauge was targeted for flood prediction for two main reasons: first, this station is the last station on the Red River before it flows into Canada, and second, the two upstream stations, Drayton and Grand Forks, have discharge information with the USGS, whereas the Pembina station, as the downstream station, does not have any discharge information.
Based on Figure 2, April is the month with the highest streamflow at Pembina station, with an average water level of 26.53 feet. The maximum water level recorded at this station was 52.71 feet on 15 April 2009. The streamflow records for the second station, Drayton, have been continuous since 1942. Since 1970, specific-conductance measurements have been made at both stations, Drayton and Emerson, whenever discharge measurements were obtained or about once every month. These long-term data made it possible to examine trends in streamflow and water quality. On 21 January 1986, streamflow measurements under ice conditions were obtained by the USGS crew on the Red River at Drayton [40]. Figure 2 shows that April is also the month with the highest flow at the Drayton station, with an average water level of 19.66 feet. The maximum water level at the time of this study, 43.82 feet, was recorded on 6 April 2009. Figure 2 also shows that May has the second-highest streamflow for two stations, Pembina and Drayton, with average water levels of 25.08 feet and 18.91 feet, respectively.
The upstream gauge station on the Red River of the North at Grand Forks was established in 1882 by the U.S. Engineers (currently the U.S. Army Corps of Engineers). Charles M. Hall, a geology professor at North Dakota Agricultural College, installed an additional station above the original stream gauge on 26 May 1901. Hall's primary objective for the stream gauge was to investigate the possibility of storing Red River floodwaters for hydropower, irrigation, and domestic supply needs [41]. Today, this stream gauge has a continuous record of gauge height, discharge, stream velocity, and water-quality parameters, as well as real-time web data. Figure 2 shows that May is the month with the highest streamflow at the Grand Forks station, with an average water level of 19.70 feet. The maximum water level at the time of this study, 49.84 feet, was recorded on 6 April 2009.
Frequent flooding has been an issue for the Red River of the North at Grand Forks, ND, most notably the major floods of 1882, 1897, 1950, 1996, 1997, 2006, 2009, and 2011, which is why the Grand Forks stream gauge data are essential for the flood protection of the cities of Grand Forks, ND and East Grand Forks, MN.
Figure 3 shows box-and-whisker plots of the annual water-level data at the three hydrology stations of the Red River of the North. The maximum annual average water level for all three stations occurred in 2019 and was 21.54 feet, 16.05 feet, and 18.84 feet for the Pembina, Drayton, and Grand Forks stations, respectively.
Motivated by the success of the Autoregressive Integrated Moving Average (ARIMA) model [32,42,43], we used a seasonal statistical approach, the SARIMA method, to capture the components of the time series separately. This method is tested on the real datasets of the Red River for hourly water-level forecasting. Linear statistical models, such as SARIMA, might not be perfect at modeling the nonlinear relationships in a time series, but they are sufficient for modeling the linear component [44]. Meanwhile, non-parametric statistical ML models, such as long short-term memory (LSTM), can model any nonlinear components (universal approximators). Finally, RF was selected as the third method due to its popularity as an ML algorithm in hydrology applications [45][46][47]. All three selected methods are discussed in the following sections.

Seasonal Autoregressive Integrated Moving Average (SARIMA)
Seasonal Autoregressive Integrated Moving Average (SARIMA) extends the Autoregressive Integrated Moving Average (ARIMA) model and is often known as Seasonal ARIMA. ARIMA combines differencing with autoregressive (AR) and moving average (MA) terms. In ARIMA, the "AR" indicates the relationship between a variable in time-series data and its own lagged values. The "I" represents differencing an observation's value from the previous values to deliver stationary time-series data. The "MA" denotes the linear combination of an observation and the errors of previous observations. The ARIMA model is also called non-seasonal ARIMA, and it is not suitable when time-series data include seasonal components. Hence, an extended version of ARIMA was proposed by adding seasonal terms, called Seasonal ARIMA or, in short, SARIMA. The ARIMA(p, d, q) model can be represented mathematically by the following formula:

$$\Delta^{d} y(t) = c + \sum_{i=1}^{p} \alpha_i \, \Delta^{d} y(t-i) + \sum_{j=1}^{q} \beta_j \, \epsilon(t-j) + \epsilon(t)$$

where $\Delta$ is $(1 - B)$, in which $B$ denotes the "backward" operator with $B\,y(t) = y(t-1)$; $y(t)$ is the data sample at time $t$; $c$ is a constant; $\alpha_1, \ldots, \alpha_p$ are the autoregressive parameters; $\epsilon(t)$ is the white noise at time $t$; and $\beta_1, \ldots, \beta_q$ are the moving-average coefficients [32].
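To make the differencing ("I") step concrete, the sketch below applies the backward-difference operator and its seasonal counterpart to a toy series. It is an illustration only: the function names, the toy series, and the seasonal period of 4 are hypothetical, and the orders and period used for the Red River data are not stated in this section.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag): y(t) - y(t - lag). The output is `lag` samples shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_differencing(series, d=1, D=1, season=24):
    """Apply D seasonal differences followed by d ordinary differences.

    `season` is the assumed seasonal period in samples (hypothetical default).
    """
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=season)
    for _ in range(d):
        out = difference(out, lag=1)
    return out


if __name__ == "__main__":
    # Toy series: linear trend plus a repeating pattern of period 4.
    pattern = [0, 1, 0, -1]
    toy = [0.5 * t + pattern[t % 4] for t in range(20)]
    # Seasonal differencing removes the pattern; ordinary differencing
    # then removes what is left of the trend.
    print(sarima_differencing(toy, d=1, D=1, season=4))
```

After differencing, the AR and MA terms of the formula are fitted to the remaining, approximately stationary series.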

Random Forest
The Random Forest (RF) model is an ensemble supervised ML technique built from multiple decorrelated decision trees. We define a decision tree as a model that relates the output to explanatory variables or attributes. An individual decision tree therefore consists of a set of rules that are organized hierarchically and applied consecutively to a dataset. To decorrelate the trees, each one is grown from a randomly resampled training batch drawn from the original data. For regression applications, the decision trees deliver independent numerical forecasts of the target, in contrast to class labels for classification. The final outcome is the mean forecast of the individual trees.
The RF is a straightforward yet effective choice for real-world water-science problems [48,49]. The RF requires users to specify the number of trees and the number of features considered at each node. Moreover, the RF model is not sensitive to these two factors and does not require fine-tuning of parameters on a new dataset [50]. Additionally, the RF does not overfit when more independent and diverse trees are added. These properties make the RF convenient to use. The RF was selected for its simplicity: tuning only a few parameters can yield higher accuracy than other ML models [51]. In this research, we used Python's scikit-learn package. The functionality of the RF algorithm is briefly explained as follows: for each tree, the algorithm selects a set of independent values that influence the tree's response, which is a subset of the predictor values of the initial dataset. The optimal size of this predictor subset is calculated as a logarithmic function of $M$, where $M$ is the number of inputs. The mean-square error (MSE) for an RF can then be calculated from

$$\varepsilon = \left(v_{\text{observed}} - v_{\text{response}}\right)^{2}$$

where $\varepsilon$, $v_{\text{observed}}$, and $v_{\text{response}}$ are the MSE, the observed variable, and the model result, respectively. Moreover, the average prediction of the trees can be calculated as

$$S = \frac{1}{t} \sum_{k=1}^{t} v_{\text{response}}^{(k)}$$

where $S$ and $t$ are the RF prediction and the number of trees in the forest, respectively, and $v_{\text{response}}^{(k)}$ is the response of the $k$-th tree. In classification, after defining a set of random trees and predictions, the algorithm compares the number of excess votes for a class to the average votes of the other classes. In the regression algorithm, although a predictor set is randomly chosen for each tree from the same distribution, each tree contributes a numerical response to form the RF.
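The bootstrap-and-average idea behind RF regression can be illustrated with a deliberately tiny forest. The sketch below is not the scikit-learn implementation used in the paper: it grows one-split trees ("stumps") on a single feature, and the helper names `fit_stump`, `fit_forest`, and `forest_predict` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising the squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = (sum((y - l_mean) ** 2 for y in left)
               + sum((y - r_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # every x is identical: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x < threshold else r_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Grow each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def forest_predict(trees, x):
    """S: the average of the individual tree responses."""
    return mean(tree(x) for tree in trees)


def mse(observed, predicted):
    """Mean of the squared differences between observed and predicted values."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5, 6, 7]
    ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
    forest = fit_forest(xs, ys)
    preds = [forest_predict(forest, x) for x in xs]
    print(preds, mse(ys, preds))
```

In practice, a library implementation would use deeper trees and a random feature subset at each split; the stump version only keeps the resampling and averaging steps visible.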

Long Short-Term Memory (LSTM)
For this research, we employed a Deep Learning method, the long short-term memory (LSTM) network, which is a type of recurrent neural network (RNN). The notion behind RNNs is to make use of sequential information over arbitrarily long sequences. An RNN performs the same task for every element of a sequence, and each result depends on the prior computations. More specifically, an RNN includes a memory cell that retains information until the training data sequence is completed. RNNs are a good choice for nonlinear time-series problems [52]. However, gradient issues arise when training over long time lags, which is required to predict hydrological time series [52]. LSTM was developed to build a robust many-to-one model for hydrological time series; its memory cell is similar in structure to that of an RNN, with an input, a self-recurrent connection, and forget and output gates [53]. Let $i_t$, $o_t$, and $f_t$ denote the input, output, and forget gates at time $t$.
Figure 4 illustrates the LSTM cell, adopted from [54], where $x_t$ and $h_t$ denote the input and state at time $t$. Similarly, we have $h$ and $x$ at times $t-1$ and $t+1$, etc. $C_t$ and $h_t$ are defined as the long-term and short-term (hidden) memory of the cell. According to the diagram, a chain of operations takes place in the network that lets it learn long-term dependencies. The following equations give the calculation of $h_t$ and $C_t$ at the $t$-th step of this process:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$
$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$
$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$\tilde{C}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh\left(C_t\right)$$

where the $U$ and $W$ terms are weight matrices and the $b$ terms are biases, each subscripted by the gate it belongs to (e.g., $U_i$, $W_i$, and $b_i$ for the input gate); $\sigma$ is the sigmoid activation function; $\odot$ denotes element-wise multiplication; and $\tilde{C}_t$ is the candidate for the cell-state value.
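A single LSTM time step can be written out directly from the gate definitions in this section. The sketch below is a scalar toy version for illustration, not the Keras model used in the paper: the function names and the `params` layout are assumptions, and it follows the standard LSTM formulation, in which W weights the current input and U weights the previous hidden state.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step for a scalar input and a scalar state.

    params maps each gate name ('i', 'f', 'o', 'c') to a (W, U, b) tuple.
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return W * x_t + U * h_prev + b

    i_t = sigmoid(pre_activation("i"))        # input gate
    f_t = sigmoid(pre_activation("f"))        # forget gate
    o_t = sigmoid(pre_activation("o"))        # output gate
    c_tilde = math.tanh(pre_activation("c"))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # long-term memory C_t
    h_t = o_t * math.tanh(c_t)                # short-term (hidden) state h_t
    return h_t, c_t


def run_sequence(xs, params):
    """Many-to-one use: feed the whole sequence and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h


if __name__ == "__main__":
    # Arbitrary illustrative weights, not trained values.
    params = {gate: (0.5, 0.1, 0.0) for gate in "ifoc"}
    print(run_sequence([0.2, 0.4, 0.6], params))
```

In a trained network the scalars become matrices and vectors, and a final dense layer maps the last hidden state to the forecast water level.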

In this work, the time-delay model was assembled using Keras, the Python Deep Learning library. As with the previous methods, we divided the dataset into subsets: we partitioned 70% of the data as the training set, 15% as the validation set, and 15% as the testing set. The LSTM-RNN has one input layer, one LSTM layer with memory blocks, and one output layer. We assessed the model's accuracy based on two criteria: (i) the root mean square error (RMSE) and (ii) the Nash-Sutcliffe efficiency coefficient ($E_{NS}$). These metrics are commonly used in hydrology to assess the agreement between predicted and observed outcomes. They are calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(O_i - P_i\right)^{2}}$$
$$E_{NS} = 1 - \frac{\sum_{i=1}^{N} \left(O_i - P_i\right)^{2}}{\sum_{i=1}^{N} \left(O_i - \bar{O}\right)^{2}}$$

where $O_i$, $P_i$, and $N$ are the observation at time $i$, the prediction at time $i$, and the number of observations, respectively, and $\bar{O}$ is the mean of the observations.
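The two evaluation criteria and the 70/15/15 partition can be sketched in a few lines. This is an illustrative version only: the function names are hypothetical, and splitting in chronological order (rather than at random) is an assumption that the text does not confirm.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def chronological_split(series, train_pct=70, val_pct=15):
    """Split a series into training, validation, and testing parts in time order."""
    n = len(series)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return series[:i], series[i:j], series[j:]


if __name__ == "__main__":
    observed = [10.0, 12.0, 15.0, 13.0]
    predicted = [10.5, 11.5, 14.0, 13.5]
    print(rmse(observed, predicted), nse(observed, predicted))
    train, val, test = chronological_split(list(range(1000)))
    print(len(train), len(val), len(test))
```

A model that always predicts the observed mean scores an efficiency of 0, which is why values close to 1 are read as good agreement.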

Results and Discussion
Forecasting time series accurately, particularly water levels for early flood warnings, is an essential but complicated process. The three methods compared here, SARIMA (a classical statistical method), RF (a classical ML algorithm), and LSTM (a state-of-the-art Deep Learning method), are widely used and effective forecasting models that have been proposed and tested on hydrological time series. Figures 2 and 3 present the monthly and annual data of the three selected stations. We evaluated and compared all tested methods by dividing the collected data into parts for training and testing. The samples were taken with different frequencies from 1 January 2007 to 3 June 2017 for Pembina station, from 1 January 2007 to 7 February 2017 for Drayton station, and from 1 January 2007 to 5 August 2017 for Grand Forks station. As mentioned previously, 70% of the data were used as the training set, 15% as the validation set, and 15% as the testing set. All models were trained on the training datasets, and the trained models were then used to forecast at different lead times on the testing sets.
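Forecasting at several lead times from hourly data amounts to pairing a window of past values with the value a fixed number of hours ahead. The sketch below shows one way to build such samples; the helper `make_samples` and the window length are hypothetical, since the input window used in the study is not stated here. Only the five lead times come from the text.

```python
# Lead times from the text, converted to hourly steps.
HORIZONS_HOURS = {"6 h": 6, "12 h": 12, "1 day": 24, "3 days": 72, "1 week": 168}


def make_samples(series, window, horizon):
    """Pair each `window`-long history with the value `horizon` steps after it."""
    samples = []
    for start in range(len(series) - window - horizon + 1):
        history = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((history, target))
    return samples


if __name__ == "__main__":
    hourly_levels = [20.0 + 0.01 * t for t in range(500)]  # synthetic data
    for name, steps in HORIZONS_HOURS.items():
        samples = make_samples(hourly_levels, window=48, horizon=steps)
        print(name, len(samples))
```

Longer lead times leave fewer usable samples from the same record, because each target must lie further beyond the end of its history window.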
After applying the algorithms described above to the three sampling stations, the model results were extracted for further evaluation and tabulated in Table 2. The table gives the average forecast results of all tested methods at five lead times (six hours, twelve hours, one day, three days, and one week) for the Pembina, Drayton, and Grand Forks datasets. Lower RMSE values indicate a higher forecast accuracy. The best results for each forecasting horizon are highlighted in bold. Comparing the SARIMA, RF, and LSTM models confirms that the LSTM is more accurate than the other two, since it yields a lower RMSE than the RF and SARIMA models when predicting the water-level data for the Red River of the North. At the Pembina station, the RMSE values of the LSTM are 77.22% and 78.70% lower than those of the RF and SARIMA models, respectively. At the Drayton station, the LSTM reduces the RMSE by 26.31% and 31.71% relative to the RF and SARIMA models, respectively. Finally, at the Grand Forks station, the RMSE of the LSTM is 83.70% lower than that of the RF model and 96.39% lower than that of the SARIMA model.
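The percentage reductions quoted above are presumably relative differences in RMSE. As an assumption (the formula is not stated in the text), the reduction achieved by LSTM relative to a baseline model $m$ would be:

```latex
\text{reduction}_{m} =
  \frac{\mathrm{RMSE}_{m} - \mathrm{RMSE}_{\mathrm{LSTM}}}{\mathrm{RMSE}_{m}} \times 100\%,
  \qquad m \in \{\mathrm{RF},\ \mathrm{SARIMA}\}
```

For instance, with purely hypothetical values $\mathrm{RMSE}_{m} = 0.50$ and $\mathrm{RMSE}_{\mathrm{LSTM}} = 0.10$, the reduction would be 80%.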
Figures 5-7 present visual comparisons of all methods for forecasting the water level one week ahead at Pembina, Drayton, and Grand Forks using a classical statistical method (SARIMA), a classical ML algorithm (RF), and a Deep Learning method (LSTM). The green line indicates the observed data that were used as the test data, and the red line indicates the predictions output by our models. Figure 5 shows the results of forecasting the water level one week ahead in a randomly chosen period at Pembina station using SARIMA (from 1 July 2019 to 8 July 2019, Figure 5a), RF (from 10 November 2018 to 18 November 2018, Figure 5b), and LSTM (from 25 June 2018 to 2 July 2018, Figure 5c). When forecasting one week in advance, the LSTM yields the best results, as it captures the trend of the actual data well. The results show that the LSTM predicted the water level better than the RF and SARIMA, with an average difference of 0.583 ± 0.21 feet between the tested and predicted water levels for the three stations. The mean differences between the tested and predicted water levels for RF and SARIMA are 0.983 ± 0.64 feet and 1.848 ± 0.97 feet, respectively. The other two methods do not work as well as LSTM for the Pembina station. Figure 6 shows the results of forecasting the water level one week ahead in a randomly chosen period at Drayton station using SARIMA (from 28 May 2019 to 4 June 2019, Figure 6a), RF (from 25 December 2019 to 31 December 2019, Figure 6b), and LSTM (from 27 June 2016 to 4 July 2016, Figure 6c). Figure 7 shows a similar result to the case of Drayton station in that the LSTM could forecast the peak one week ahead quite accurately. It still captures the trend of the data rather well in one-week-ahead forecasts, but the errors are high. Meanwhile, all other methods failed to forecast the peak and could not capture the data trend.
For the Grand Forks data with hourly sampling, Figure 7 shows one-week-ahead predictions in a randomly chosen period using SARIMA (from 23 July 2018 to 30 July 2018, Figure 7a), RF (from 31 March 2018 to 4 July 2018, Figure 7b), and LSTM (from 8 August 2019 to 15 August 2018, Figure 7c), and demonstrates once again that the LSTM approach is superior to the SARIMA and RF methods. When predicting water levels one week ahead, LSTM produces the values closest to the observed ones (Figure 7c). SARIMA and RF also produce reasonable results at this horizon, but the values predicted by LSTM are closer to the true ones than those of the other methods. Although RF ranks second behind LSTM, the gap between the forecast errors of the two methods is rather wide.
Figures 5c, 6c, and 7c demonstrate that, at all three stations, the LSTM forecasts slightly overestimated the water level. As can be seen in Figures 5a, 6a, and 7a, SARIMA underestimated the water level at the Pembina and Drayton stations but overestimated it at the Grand Forks station. Finally, the RF method overestimated the water level at the Pembina station but underestimated it at the Drayton and Grand Forks stations (Figures 5b, 6b, and 7b).
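Over- and underestimation of this kind can be quantified with the mean signed error (bias). The following is a minimal sketch, assuming the sign convention "predicted minus observed", so that a positive bias means overestimation; the helper names, the tolerance parameter, and the sample values are hypothetical.

```python
def mean_bias(observed, predicted):
    """Mean signed error, predicted minus observed (positive = overestimate)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def bias_label(bias, tolerance=0.0):
    """Classify a bias value; |bias| <= tolerance counts as unbiased."""
    if bias > tolerance:
        return "overestimate"
    if bias < -tolerance:
        return "underestimate"
    return "unbiased"


# Illustrative water levels in feet (made-up numbers, not study data).
observed = [30.0, 31.0, 32.0]
predicted = [30.2, 31.1, 32.3]

bias = mean_bias(observed, predicted)
label = bias_label(bias)
print(f"bias={bias:+.2f} feet -> {label}")
```

Unlike RMSE, the signed error lets positive and negative deviations cancel, so it describes the direction of the error rather than its size.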
Although the preceding figures indicate the capacity of the model to estimate the water level in two weeks, the short duration of the sampled data may not adequately represent the models' ability to capture the flood peak. To assess the accuracy of our model in capturing the magnitude and timing of flood peaks when driven by different water-level datasets, we considered one extreme three-month period in 2016, from 16 May to 14 August. We include this plot because the summary statistics above do not, by themselves, show how well the model tracks a flood peak. For this purpose, we considered the maximum water-level events in 2016 and forecasted these events one week ahead.

Conclusions
Forecasting time series accurately, especially water levels for flood-warning systems, is an important but challenging task. Water-level forecasts at the Red River flow-gauging stations, and specifically at downstream stations without any discharge information available, such as Pembina in this study, play a vital role in the early flood-warning system. In this paper, we have examined a classical statistical method, SARIMA; a classical ML algorithm, RF; and a Deep Learning method, LSTM. As shown in our comparison of the models for the Pembina, Drayton, and Grand Forks stations, the LSTM method achieved more accurate predictions than the SARIMA and RF methods. SARIMA is effective at modeling linear data, whereas machine-learning models are superior at modeling nonlinear data. A water-stage time series, however, frequently has both linear and nonlinear correlation structures. For one-week-ahead prediction, the RMSE values for the models fit to the series at Pembina, Drayton, and Grand Forks are 0.190, 0.151, and 0.107, respectively. These results demonstrate the high precision of the Deep Learning algorithm as a reliable choice for flood prediction. Experimental results at the Pembina, Drayton, and Grand Forks stations show better performance with the LSTM model at all prediction horizons. At the Pembina station, the RMSE values for LSTM are lower by 77.22% in comparison with the RF model and by 78.70% in comparison with the SARIMA model. At the Drayton station, using LSTM reduces the RMSE by 26.31% and 31.71% relative to the RF and SARIMA models, respectively. At the Grand Forks station, the RMSE values for LSTM are lower by 83.70% compared to the RF model and by 96.39% compared to the SARIMA model.

Figure 1. (a) Location of Red River basin; (b) location of USGS stations on Red River in Pembina, Drayton, and Grand Forks.

Figure 2. Monthly water level at three hydrology stations of Red River of the North: (a) Pembina, (b) Drayton, and (c) Grand Forks stations.

Figure 3. Box and whisker plot of water-level data at three hydrology stations of Red River of the North: (a) Pembina, (b) Drayton, and (c) Grand Forks stations.

Figure 4. Memory block with the memory cell C_t.

Figure 5. Visual comparison of one-week-ahead predicted values using (a) SARIMA, (b) RF, and (c) LSTM forecasting methods with true values on the Pembina series.

Figure 6. Visual comparison of one-week-ahead predicted values using (a) SARIMA, (b) RF, and (c) LSTM forecasting methods with true values on the Drayton series.

Figure 7. Visual comparison of one-week-ahead predicted values using (a) SARIMA, (b) RF, and (c) LSTM forecasting methods with true values on the Grand Forks series.

Figure 8 presents a comparison between the observed and predicted data at Grand Forks station. The green line indicates the observed data that were used as test data, and the red line indicates the predictions produced by our model. The results indicate that the peak-flow scenarios observed in the field from May to August 2016 are well captured by the trained LSTM.
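How well a forecast captures a flood peak can be expressed as two numbers: the error in peak level and the error in peak timing. The sketch below is one possible way to compute them, assuming evenly spaced samples (hourly in this study) and a single dominant peak in the window; the function name, the `step_hours` parameter, and the sample values are hypothetical and do not reproduce Figure 8.

```python
def peak_errors(observed, predicted, step_hours=1.0):
    """Return (level_error, timing_error_hours) for the highest peak.

    level_error: predicted peak level minus observed peak level.
    timing_error_hours: positive when the predicted peak arrives late.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    # Index of the first maximum in each series.
    obs_idx = max(range(len(observed)), key=observed.__getitem__)
    pred_idx = max(range(len(predicted)), key=predicted.__getitem__)
    level_error = predicted[pred_idx] - observed[obs_idx]
    timing_error_hours = (pred_idx - obs_idx) * step_hours
    return level_error, timing_error_hours


# Illustrative hourly water levels in feet (made-up numbers, not study data).
observed = [40.0, 42.0, 45.0, 43.0, 41.0]
predicted = [40.2, 41.5, 44.0, 44.6, 41.3]

level_error, timing_error = peak_errors(observed, predicted)
print(f"peak level error={level_error:+.2f} feet, timing error={timing_error:+.1f} h")
```

A window containing several comparable peaks would need an event-matching step before these errors are meaningful.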

Water 2022, 18
Figure 8. Visual comparison of 3 months of predicted values using LSTM forecasting method with true values on Grand Forks series.
Author Contributions: Conceptualization, V.A., H.T.G., and S.M.S.; methodology, R.K.; software, H.T.G.; validation, H.T.G.; formal analysis, V.A.; investigation, V.A.; resources, S.M.S.; data curation, H.T.G.; writing-original draft preparation, V.A.; writing-review and editing, Y.H.L.; visual-

Table 1. Characteristics of the water-level time series at three hydrology stations of the Red River.

Table 2. Evaluation of the performance of the SARIMA, RF, and LSTM models at three USGS stations: root mean square error (RMSE) between the predicted and observed water-level data in the testing phase.