Prediction of Flow Based on a CNN-LSTM Combined Deep Learning Approach

: Although machine learning (ML) techniques are increasingly used in rainfall-runoff models, most of them are based on one-dimensional datasets. In this study, a rainfall-runoff model with deep learning algorithms (CNN-LSTM) was proposed to compute runoff in the watershed based on two-dimensional rainfall radar maps directly. The model explored a convolutional neural network (CNN) to process two-dimensional rainfall maps and long short-term memory (LSTM) to process one-dimensional output data from the CNN and the upstream runoff in order to calculate the ﬂow of the downstream runoff. In addition, the Elbe River basin in Sachsen, Germany, was selected as the study area, and the high-water periods of 2006, 2011, and 2013, and the low-water periods of 2015 and 2018 were used as the study periods. Via the ﬁvefold validation, we found that the Nash–Sutcliffe efﬁciency (NSE) and Kling–Gupta efﬁciency (KGE) ﬂuctuated from 0.46 to 0.97 and from 0.47 to 0.92 for the high-water period, where the optimal fold achieved 0.97 and 0.92, respectively. For the low-water period, the NSE and KGE ranged from 0.63 to 0.86 and from 0.68 to 0.93, where the optimal fold achieved 0.86 and 0.93, respectively. Our results demonstrate that CNN-LSTM would be useful for estimating water availability and ﬂood alerts for river basin management.


Introduction
The rainfall-runoff model has been developed over 170 years and extensively utilized in discharge prediction in the hydro-sciences, producing a surface runoff hydrograph in response to a rainfall event [1,2]. Since then, modelling concepts have been developed by embedding physical process understanding, concepts, and formulations gradually [3]. Driven by the advancements in computer technology and the availability of remote sensing data, the rainfall-runoff model has evolved from simple parametric models (conceptual or grey box) to complex mechanistic models (physically based or white box) or metric models (data-based, empirical, or black box) [4].
Parametric models are constructed on the basis of the simplified systemic mathematical conceptualization ensemble with a certain number of interconnected storage points applied to compute various components of the hydrological process through recharge and depletion. Parametric models were developed relatively early based on a small database of parameters and are still routinely applied for operational purposes [5][6][7][8][9][10]. With the enhancement of machine computing power, semi-distributed and distributed mechanistic models were proposed and studied for hydrological processes. Mechanistic models were capable of considering the spatial and temporal variability of land use, slope, soil, and climate by means of partial differential equations, which were solved through the finite difference or finite computation schemes [11][12][13][14]. However, high demand with regard to the computational costs and input data at high spatial and temporal resolution were

Convolutional Neural Network (CNN)
The CNN was introduced for object recognition in digital image processing [50], with a structure composed of a convolution layer, pooling layer, and fully connected layers. The convolution layer applies a convolution filter to input two-dimensional data to generate a feature map. Different feature maps, generated through different filters, are  Figure 1B illustrates an example of the precipitation radar map in this study, according to the information platform UNDINE (http://undine.bafg.de, accessed 29 November 2021), which contains the long-term daily flow data and catchment information of Germany. To study the applicable scope of the model, several recent extreme cases of the Elbe River were collected: January 2011, March and April 2006, and June 2013 for high-water periods; July, August and September 2015, and August and September 2018 for low-water periods [49]. Meanwhile, precipitation radar data of the periods for these extreme cases were extracted from the database of the Deutscher Wetterdienst (DWD, German Meteorological Service, https://cdc.dwd.de/portal/, accessed 30 November 2021). The Elbe River flow data of the periods at Schoena and Torgau were obtained from the Saxon State Office for Environment, Agriculture and Geology (https://www.umwelt.sachsen.de/umwelt/46037.htm, accessed on 30 November 2021).

Convolutional Neural Network (CNN)
The CNN was introduced for object recognition in digital image processing [50], with a structure composed of a convolution layer, pooling layer, and fully connected layers. The convolution layer applies a convolution filter to input two-dimensional data to generate a feature map. Different feature maps, generated through different filters, are eventually combined to generate a final output. In general, the convolutional layer needs an activation function to enhance the non-linear signal, and a rectified linear unit (ReLU) is usually employed as the activation function [51]. After the convolution layer, the max-pooling layer is used to extract the invariant features by eliminating the non-maximal values with an efficient convergence rate to achieve the purpose of non-linear down sampling [52]. The fully connected layer, linked to the max-pooling layer, connects a loss function to calculate Water 2022, 14, 993 4 of 13 errors between the observed values and model outputs [53]. The mean squared error (MSE) is used as a loss function in the present study.

Long Short-Term Memory (LSTM)
The LSTM network is a variation of the recurrent neural network (RNN), which adopts a directed cycle structure that transfers the output of a hidden layer to the same hidden layer to identify features and time series information through the signals of the previous time series [54]. LSTM was introduced by Hochreiter and Schmidhuber [39] to overcome gradient disappearance issues through memory cells. The cell states could be updated by a gating regulation consisting of the following three gates: the input gate determines which information will update the present block state and which new datasets will be included as inputs; the forget gate controls which information should be preserved or discarded; the output gate decides which outputs to generate [55]. The following equations were used in LSTM: Input gate Γ i : Forget gate Γ f : Cell c t : Output vector a t : where C c t is the cell state vector, x t is the input at current step t, δ is an element-wise nonlinear activation function, and b and W denote the bias and weight matrices, respectively.

River Flow Simulation
In this study, CNN and LSTM network were combined to predict the river flow. The two-dimensional data consisted of rainfall radar images with dimensions of 143 × 159 (1-km resolution), while the upstream flow data were applied as additional information and as single vectors. The upstream flow and rainfall radar time points were calculated based on the time interval with the highest correlation between upstream and downstream flow. The CNN model consisted of one convolutional layer, one max pooling layer, and one flatten layer. The output from the flatten layer was fed into a LSTM model as a onedimensional feature vector. The outputs from the LSTM layers and the upstream flow were concatenated and transferred to a fully connected layer, thereby generating downstream flow. More detailed descriptions of CNN and LSTM can be found in Sections 2.2 and 2.3, respectively. In addition, the CNN and LSTM models were implemented using a Keras environment based on Python.3.7

Performance Evaluation
For the objective function used in hydrologic model performance evaluation, it is desirable to include multiple metrics that could measure different aspects of model performance [56]. One of the most popular objective functions for the performance evaluation of hydrological modelling is the Nash-Sutcliffe efficiency (NSE) [57]. The NSE is defined as follows: Gupta et al. [58] proposed a new objective function for performance evaluation, the Kling-Gupta efficiency (KGE). The KGE aggregates three components (error, bias, and relative variability of flows) that are decomposed and modified from the NSE as follows: where r is the Pearson correlation coefficient used to evaluate the error, β is the bias term used to evaluate the bias, and α is a variability term used to measure the relative variability between the flows of observation (Q 0 ) and simulation (Q s ): where cov is the covariance; σ o and σ s are the standard deviation of observation and simulation, respectively; and µ o and µ s denote the mean flows of observation and simulation, respectively. In addition, the K-fold test was used to verify the generalizability and plausibility of the CNN-LSTM algorithm. K-fold is a technique widely adopted as a model selection criterion, using (K-1) fold data as the training dataset and one-fold as the test dataset after random assignment. It mimicked the application of training and test datasets by repeating the CNN-LSTM algorithm K times.

Flow Time Series
The record of the flow of the Elbe River in Sachsen was highly variable. In the last 15

Input Selection
The input attributes of the models were constructed based on the Pearson coefficient. The Pearson coefficient provides information about the correlation between two separate time series from different monitoring stations. The Pearson coefficient ranges between 1 and 0, where values nearer to 1 indicate near perfect correlation and values near to 0 suggest complete anti-correlation. The river takes a certain amount of time to flow from upstream (Schoena) to downstream (Torgau), so the intraday flow data of Torgau and one-day-or two-days-ahead Schoena flow data were compared using the Pearson coefficient. As shown in Figure 2, the Pearson correlation between the Torgau flow data and two-days-ahead Schoena flow data was higher at low flow levels, while the opposite was true at high flow levels. The change in the Pearson coefficient indicated that the velocity of water flow was slower during low-water periods, while the velocity of water flow was faster during periods of high water. For the Elbe River flow in this study, the value of the variation in the Pearson coefficient was 209 m 3 /s. Therefore, the following training situation was divided into high-water period prediction and low-water prediction based on the flow of Schoena (209 m 3 /s). time series from different monitoring stations. The Pearson coefficient ranges between 1 and 0, where values nearer to 1 indicate near perfect correlation and values near to 0 suggest complete anti-correlation. The river takes a certain amount of time to flow from upstream (Schoena) to downstream (Torgau), so the intraday flow data of Torgau and oneday-or two-days-ahead Schoena flow data were compared using the Pearson coefficient. As shown in Figure 2, the Pearson correlation between the Torgau flow data and twodays-ahead Schoena flow data was higher at low flow levels, while the opposite was true at high flow levels. The change in the Pearson coefficient indicated that the velocity of water flow was slower during low-water periods, while the velocity of water flow was faster during periods of high water. For the Elbe River flow in this study, the value of the variation in the Pearson coefficient was 209 m 3 /s. Therefore, the following training situation was divided into high-water period prediction and low-water prediction based on the flow of Schoena (209 m 3 /s). During the low-water period, the downstream flow data at Trogau on a particular day was adopted as the CNN-LSTM model output corresponding to the training input data, including the radar maps of the previous 48 h of hourly rainfall and the upstream flow data for Schoena from two days prior. Similarly, the downstream flows corresponded to training data, consisting of the radar maps of the previous 24 h of hourly rainfall and upstream data from a day earlier during the high-water period. During the low-water period, the downstream flow data at Trogau on a particular day was adopted as the CNN-LSTM model output corresponding to the training input data, including the radar maps of the previous 48 h of hourly rainfall and the upstream flow data for Schoena from two days prior. Similarly, the downstream flows corresponded to training data, consisting of the radar maps of the previous 24 h of hourly rainfall and upstream data from a day earlier during the high-water period.

Flow Simulation
The performance of all the CNN-LSTM models in terms of river flow prediction during model validation phases were evaluated by several statistical indices, including NSE, KGE, r, α, and β. According to the guidelines of the hydrological model performance evaluation proposed by Moriasi [60], the performance of a hydrological model was regarded as very good if the NSE was more than 0.75. Moreover, a refined version of the KGE was adopted to enable further decomposition of the NSE, which facilitated the analysis of the relative importance of its different components (r, α, and β) in the context of the rainfall-runoff model. The performance of the CNN-LSTM model was also evaluated through an inspection of graphical representations of the comparisons. Line and scatter plots also enabled us to gain a better understanding of the fit between the measured and forecasting data. K-fold cross-validated (k = 5) curves were generated from a comparison of the measured and predicted values after the datasets were randomly disrupted according to high-water and low-water periods. According to the high-water period validation shown in Table 1, it was suggested that k3 and k4 have high NSE and KGE values (NSE ≥ 0.9, KGE ≥ 0.9) with a, b, and r close to 1, showing the highest fitting accuracy. The NSE values of k1 and k5 were 0.96 and 0.90, respectively, but the KGE values of the same phases were 0.87 and 0.75, respectively, and their r values were 0.99 and 0.97. The b values were 1.01 and 1.04 which were closer to 1 than those of k3 and k4, but the values were only 0.87 and 0.75, indicating that their fitted fluctuations were slightly inferior to those of K3 and K4. The worst performer, k2, had NSE, KGE, a, b, and c values of 0.46, 0.47, 0.75, 0.58, and 0.80, respectively, all of which were the worst values obtained among all the test results. Collectively, k6 was the combined result of the fivefold test with NSE and KGE values of 0.78, and 0.75, respectively. As shown in Figure 3, the line plot intuitively shows the fitting deviation of the predicted values and the measured values. Figure 3b(1) indicates that the comparison shows an obvious deviation of the prediction effect and almost all the extreme predicted flow values are far lower than the measured values. The scatter plot over the validation phase shows that almost all the points are aligned along the diagonal line of the plots, except the k2 validation phase in the high-water period. However, the magnitude of the coefficient of determination is distinctly sharply worse for k2 validation phase over the whole testing phase-that is, it affected the R 2 of the whole testing phase (K6) to 0.79. While k2 attained an R 2 of 0.56, the other phases attained an R 2 of more than 0.95. As presented in Table 2, the results of fitting and validation based on the model during the low-water period show that k3 has the highest NSE and KGE values (NSE = 0.86 and KGE = 0.93) with a, b, and r close to 1, showing the highest fitting accuracy. k1 and k2 had higher NSE values than KGE values, while k4 and k5 had higher KGE values than NSE values. It could be seen from r that k4 and k5 had a higher fit while their a-values were only 0.74 and 0.69, indicating the poor performance of the fit with regard to their fluctuations. In all the tests, the b-values were close to 1, indicating that the model for low-water periods was better simulated for the mean value. On balance, k6 was the combined result of the fivefold test, with its NSE and KGE values of 0.81 and 0.76, respectively. As shown in Figure 4, the deviation of the prediction results in the low-water periods is much smaller compared with the high-water periods as the flow during low-water periods is much smaller. The line diagram in Figure 4e(1) shows that the model suggested a large error in predicting the large flow value in the low-water period. The scatter plots show that the magnitude of the coefficients of determination ranged from 0.68 to 0.9.  are the scatter plots of the predicted and measured values for k1 to k5 folds, respectively, and f(2) is the scatter plot for all validation sets from k1 to k5.
As presented in Table 2, the results of fitting and validation based on the model during the low-water period show that k3 has the highest NSE and KGE values (NSE = 0.86 and KGE = 0.93) with a, b, and r close to 1, showing the highest fitting accuracy. k1 and k2 had higher NSE values than KGE values, while k4 and k5 had higher KGE values than NSE values. It could be seen from r that k4 and k5 had a higher fit while their a-values were only 0.74 and 0.69, indicating the poor performance of the fit with regard to their fluctuations. In all the tests, the b-values were close to 1, indicating that the model for lowwater periods was better simulated for the mean value. On balance, k6 was the combined result of the fivefold test, with its NSE and KGE values of 0.81 and 0.76, respectively. As shown in Figure 4, the deviation of the prediction results in the low-water periods is much smaller compared with the high-water periods as the flow during low-water periods is much smaller. The line diagram in Figure 4e(1) shows that the model suggested a large error in predicting the large flow value in the low-water period. The scatter plots show that the magnitude of the coefficients of determination ranged from 0.68 to 0.9.  (1), b(1), c(1), d(1) and e(1) are the predicted and measured values for k1 to k5 folds, respectively, and f(1) is the predicted and measured values for all validation sets. a(2), b(2), c(2), d(2) and e(2) are the scatter plots of the predicted and measured values for k1 to k5 folds, respectively, and f(2) is the scatter plot for all validation sets from k1 to k5.  are the scatter plots of the predicted and measured values for k1 to k5 folds, respectively, and f(2) is the scatter plot for all validation sets from k1 to k5.

Discussion
In this study, we proposed a method that used CNN to process information on rainfall and LSTM to combine the flow data from the upstream point and time series of the CNN results as inputs into the model. The model computed the spatial relationship between rainfall and runoff through CNN and the temporal connection through LSTM. In this study, the model efficiency (in terms of NSE) was found to be 0.46-0.97 and 0.63-0.86 for high-water and low-water periods, respectively. In a previous study by Sahraei et al. [61], different parametric models (HEC-HMS, HBV-EC, HSPF, WATFLOOD, and Model Wrapper) employed in the Canadian prairies for discharge prediction achieved model NSE of between −0.39 and 0.76. Moreover, the SWAT, XAJ, MLR, BP, and LSTM models

Discussion
In this study, we proposed a method that used CNN to process information on rainfall and LSTM to combine the flow data from the upstream point and time series of the CNN results as inputs into the model. The model computed the spatial relationship between rainfall and runoff through CNN and the temporal connection through LSTM. In this study, the model efficiency (in terms of NSE) was found to be 0.46-0.97 and 0.63-0.86 for high-water and low-water periods, respectively. In a previous study by Sahraei et al. [61], different parametric models (HEC-HMS, HBV-EC, HSPF, WATFLOOD, and Model Wrapper) employed in the Canadian prairies for discharge prediction achieved model NSE of between −0.39 and 0.76. Moreover, the SWAT, XAJ, MLR, BP, and LSTM models were compared by Xu et al. [62] for runoff prediction in the Hun River basins and were found to produce NSE values between 0.35 and 0.79. In a study by Bhagwat and Maity [63], ANN and LS-SVR were employed for one-day-ahead runoff prediction in upper Narmada River basin and achieved a model efficiency (NSE) of between 0.58 and 0.68. In summary, the NSE values achieved in the above studies are less than or equivalent to those obtained using the CNN-LSTM model proposed in this study. If the training data are properly selected, the NSE value of the model could reach more than 0.85 for runoff simulation during high-water or low-water periods (Tables 1 and 2).
Compared to the parametric model, the advantage of this model was to avoid the complex data collection required by other rainfall-runoff models. For instance, the Hydrological Simulation Program-Fortran (HSPF), an open-source rainfall-runoff model developed by EPA, required rainfall, barometric pressure, air temperature, relative humidity, evaporation, insolation, topographic and land-use data, and the upstream runoff of the study area as the minimum major boundary conditions to run the model [64]. HSPF combined meteorological and geographic data to derive the contribution of rainfall to runoff through physical equations. On the other hand, the CNN-LSTM model required only radar data regarding rainfall and upstream flow data for the study area. The model processed the contribution of rainfall to runoff at the minimum spatial and temporal resolution within each study scale through CNN and LSTM. Although the CNN-LSTM could not show the huge variety of intermediate values in the internal operations, the algorithm continuously adjusted these intermediate values to achieve the best fit, which was equivalent to the calibration phase of the HSPF model that requires human intervention.
Compared to other metric models, the CNN-LSTM could be trained directly on twodimensional rainfall radar maps and output the predicted flow downstream. In the early 1990s, artificial neural network (ANN) approaches were introduced for rainfall-runoff modeling [65]. In the context of runoff prediction, ANNs employed flexible data-driven techniques to reflect the complicated non-linear connections between input driving factors (precipitation, upstream flow, etc.) and runoff that were difficult to discover using traditional rainfall-runoff modeling approaches. The adjustment of weights and biases in the ANN model corresponded to the contribution of land use, meteorological conditions, and other boundary conditions to the rainfall-runoff process. Limited by the ANN structure itself, the first step in using ANNs for two-dimensional image problems was to convert the two-dimensional images to one-dimensional vectors before training the model. This had two drawbacks: (1) With an increasing image size, the number of trainable parameters and the demand for computer hardware power increased dramatically; (2) ANNs could not capture the sequence information that is required to process sequence data from the input data. Therefore, 2D rainfall radar maps were processed by CNN, and sequence information was processed by LSTM to overcome these two drawbacks in this study.
In addition, the use of models was restricted and their prediction effectiveness for data other than training data is greatly reduced by machine learning [66]. For model prediction of runoff, the accuracy with regard to extreme values was sharply reduced if the trained dataset excludes data obtained under these extreme conditions [67]. Among all the recorded data for flow in Trogau, there were four recorded moments over 3000 m 3 /s. However, three of them (including two maximum cases) were included in the k2 validation set. Therefore, the training data of k2 were not assigned to most of the extreme values, resulting in poor learning during the training process and a lack of training for high-water value output. In summary, a lack of full scope coverage in the training data could result in low prediction values for some high-water periods during the validation phase, leading to poor prediction results. Therefore, the CNN-LSTM model is particularly important for the selection of input data scope coverage.
In the future, the occurrence of extremes should, on the one hand, be recorded in time and the model should be updated for training. On the other hand, as computer power increases, the use of machine learning models for predicting larger study areas with more rainfall data could be attempted.

Conclusions
In this study, the CNN-LSTM model was proposed for river downstream flow prediction. The Elbe River basin in Sachsen, Germany, was selected as the study area, and three high-water cases and two low-water cases were selected as the study period. The performance of models was assessed by KGE and NSE through fivefold validation. The rainfall radar map of the area and the upstream flow were utilized as inputs to the models. The CNN-LSTM model achieved good results in discharge prediction for high-water periods (KGE = 0.75, NSE = 0.78) and low-water periods (KGE = 0.76, NSE = 0.81) with the integrated evaluation of all fold validation sets. However, the CNN-LSTM model underestimated the extreme flows when the one-fold training dataset missed these extreme values (KGE = 0.47, NSE = 0.46), while the other folds performed well (KGE > 0.75, NSE > 0.90) for the high-water period. Therefore, we found that the CNN-LSTM model needs optimized input datasets to achieve better prediction performance. In summary, this study demonstrated the successful application of the CNN-LSTM model in predicting river flow based on two-dimensional rainfall data. This approach could be applied to other river basins with historical rainfall radar data and recorded river flow data. This model will be useful for estimating water availability and flood alerts for river basin management.