Improvement of Deep Learning Models for River Water Level Prediction Using Complex Network Method

Kim, Donghyun; Han, Heechan; Wang, Wonjoon; Kim, Hung Soo

doi:10.3390/w14030466

¹ Department of Civil Engineering, Inha University, Incheon 22212, Korea

² Blackland Research and Extension Center, Texas A&M AgriLife, College Station, TX 76502, USA

* Author to whom correspondence should be addressed.

Water 2022, 14(3), 466; https://doi.org/10.3390/w14030466

Submission received: 22 December 2021 / Revised: 27 January 2022 / Accepted: 30 January 2022 / Published: 4 February 2022

(This article belongs to the Section Hydrogeology)

Abstract

Accurate water level prediction is one of the important challenges in various fields such as hydrology, natural disasters, and water resources management. In this study, a deep neural network and a long short-term memory model were applied to water level prediction between 2000 and 2020 in the Phan Rang River Basin of Nihn Thuan, Vietnam. In addition, a complex network model was utilized to improve the ability of both models to predict the water level at the outlet point of the basin. The water level predicted by each model was compared with the observed water level data, and the predictive power of each model was evaluated using three statistical metrics: the correlation coefficient (CC), the Nash–Sutcliffe efficiency coefficient (NSE), and the normalized root-mean-squared error (NRMSE). When all data from nearby stations are used, predictions may be distorted by data that are unnecessary for model learning. Therefore, the complex network method was applied to find the best data sources, i.e., those providing factors that contribute to water level behaviors. The results showed that the combination of the long short-term memory model and the complex network provided the best predictive performance (CC: 0.99; NSE: 0.99; and NRMSE: 0.17) and was selected as the optimal model for water level prediction in this study. As the need for disaster management is gradually increasing, the deep learning model combined with the complex network method is expected to have sufficient potential to reduce the damage from natural disasters and improve disaster response systems, such as in the outskirts of Vietnam.

Keywords: deep neural network; improve disaster response system; long short-term memory; water level behaviors

1. Introduction

Changes in the river water level (hereafter, water level) play an important role in various fields such as agriculture, natural disasters, and water resources management [1,2]. For example, water level changes can affect water circulation, sediment transport, water quality, and ecosystems [3]. Thus, accurate water level prediction is essential for managing water resources and mitigating floods properly.

Generally, the water level can be estimated using physically based methods such as numerical models, which have been mainly used for 2D and 3D water level estimation [4,5,6]. In addition, multi-source satellite image data have recently been used for monitoring water level changes in various regions [7,8,9,10,11]. However, physically based methods require complex calculation processes with many parameters, which may introduce uncertainty into the outcomes and make water level estimation time-consuming.

Recently, with the advancement of computing resources and algorithms, deep learning models have been used to analyze and predict non-linear relationships between data variables, such as rainfall-runoff relationships. Such models have significantly contributed to improving the performance of hydrological analyses, since they provide highly reliable modeling results [12]. In addition, deep learning models have shown good predictive power compared to physically based models [13].

Deep learning techniques, such as deep neural network (DNN) and long short-term memory (LSTM) models, have been applied for various purposes in the field of hydrology, including predictions of runoff [14,15,16], precipitation [17,18], and groundwater levels [19], as well as drought analysis [20] and soil properties estimation [21]. Moreover, for water level prediction, Jung et al. [22] used an LSTM model to predict water level variations up to 24 h ahead using various environmental variables such as dam information, the tidal effect, and the upstream water level. Baek et al. [23] used an LSTM model combined with a convolutional neural network to simulate the water level and the water quality. Tran and Song [24] applied various deep learning models to flood level prediction in an urban area up to 60 min ahead and showed that the LSTM model performed better than the recurrent neural network (RNN) model. Numerous previous studies have shown good performance for advanced hydrological analysis and improved the performance of physically based hydrologic models [25].

Accurate water level estimation requires understanding the relationships between the elements contributing to water level changes. A complex network is a network with numerous features that do not occur in simple networks. Recently, the complex network method has been applied in various research areas such as computer networks, biological networks, climate networks, and hydrological networks [26,27]. Complex networks are characterized by many different measures and properties, such as degree centrality, the clustering coefficient, the small-world property, and the degree distribution [28]. Sivakumar and Woldemeskel [28] introduced the theory of the clustering coefficient, one of the popular complex network measures, to analyze the spatial connections in streamflow dynamics. Yasmin and Sivakumar [29] used a complex network-based approach to interpret temporal streamflow dynamics. They showed the usefulness of phase-space reconstruction-based network construction for examining temporal connections in the streamflow. Furthermore, Jha and Sivakumar [30] applied complex networks to investigate properties of rainfall such as spatial connections, the temporal scale, and the network size. These studies indicated that the complex network-based approach can be utilized for investigating connections in hydrological variables such as streamflow and rainfall, with important implications for interpolation, classification of catchments, and predictions in unmeasured regions [28].

The Phan Rang River Basin of Nihn Thuan, Vietnam, has insufficient disaster management capabilities for flood damage. To improve the predictive accuracy of a model, it is necessary to find the best data sources, i.e., those providing factors that contribute to water level behaviors. Previous studies predicting water level changes have generally used data sources close to the target point as the input data, without assessing whether each source is relevant. To address this limitation, this study aimed to develop water level prediction models using cutting-edge data-driven models and complex network theory. We applied two data-driven models, the DNN and LSTM models, to predict the water level change in the Phan Rang River Basin. Moreover, to improve the modeling performance, we combined the complex network model with the deep learning models and evaluated the prediction performance of the combined models in comparison with the models used alone.

2. Methodology and Material

2.1. Study Area

The Phan Rang River Basin of Nihn Thuan was selected as the study area (Figure 1). This area is characterized by a tropical monsoon climate affected by seasonal winds, and 80–90% of total annual rainfall is observed between September and December during the rainy season. In particular, the downstream part of the Phan Rang River Basin is a tidal river located in low-elevation areas, which is vulnerable to sea level rise due to climate change, and natural disasters such as typhoons and floods occur frequently. For example, numerous typhoons and tropical cyclones caused serious flood damage costing over $9.5 million in 2017. One of the reasons for this damage is the lack of disaster management capabilities and insufficient prediction information [31,32,33,34]. Therefore, this study attempted to contribute to strengthening the disaster management capabilities of this area by using water level prediction models such as the DNN and LSTM models for the Phan Rang River Basin.

2.2. Flowchart

In this study, we developed a model that addresses the above-mentioned problems and predicts the water level. Figure 2 shows the flowchart for developing the water level prediction model. The calculation process for predicting the water level is as follows.

(1) For the dependent and independent variables, data from Phan Rang's seven weather stations were collected at daily resolution from 1 September 2000 to 31 December 2020.

(2) To develop a model predicting the water level, the data were divided into a learning section and an evaluation section. The data from 2000 to 2013 were used for the learning section, and the data from 2014 to 2020 were used for the evaluation section (the data were divided into approximately 70%/30% by referring to previous studies [1,2,38,39,40,41]). The water levels were predicted using the DNN and LSTM models separately.

(3) The complex network method was applied to the weather stations located in Phan Rang to determine the group of stations.

(4) To develop the improved water level prediction model, the data were again divided into a learning section (2000 to 2013) and an evaluation section (2014 to 2020). The water levels were predicted using the improved DNN and LSTM models separately.

(5) The predictive power of each model was evaluated using the correlation coefficient (CC), the Nash–Sutcliffe efficiency coefficient (NSE), and the normalized root-mean-squared error (NRMSE).
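The chronological split in step (2) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (date, value), the placeholder values, and the boundary date of 31 December 2013 (inferred from the year ranges given above) are all assumptions.

```python
from datetime import date, timedelta

# Assumed boundary between the learning and evaluation sections.
TRAIN_END = date(2013, 12, 31)


def daily_dates(start, end):
    """Return every calendar day from start to end, inclusive."""
    n_days = (end - start).days + 1
    return [start + timedelta(days=k) for k in range(n_days)]


def split_by_date(records, train_end=TRAIN_END):
    """Split (date, value) records chronologically, without shuffling."""
    train = [r for r in records if r[0] <= train_end]
    test = [r for r in records if r[0] > train_end]
    return train, test


# Placeholder records: real water level values are not reproduced here.
dates = daily_dates(date(2000, 9, 1), date(2020, 12, 31))
records = [(d, 0.0) for d in dates]
train, test = split_by_date(records)
train_fraction = len(train) / len(records)
```

Keeping the split chronological means the evaluation section always lies in the future relative to the learning section, which mirrors how a forecasting model would be used in practice.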

2.3. Data Description

In order to develop a water level prediction model using weather data in the Phan Rang River Basin, the daily water level data from 2000 to 2020 of the Daolong Station, which is located at the downstream outlet point of the Phan Rang River Basin, were used as a dependent variable. For independent variables, the daily rainfall data observed at seven weather stations and the daily water level data obtained from the Tanmy Station located in the upstream area were used in this study. Weather data were collected with the help of Viet Nam Meteorological and Hydrological Administration in Vietnam (VNMHA). Table 1 shows the basic statistics of the dependent and independent variables.

2.4. DNN

The DNN is an artificial neural network (ANN) with multiple layers between the input and output layers. There are different types of neural network models, but they always consist of the same components: neurons, synapses, weights, biases, and functions [35]. The structure of the DNN is similar to that of the ANN but has two or more hidden layers (Figure 3). These components function similarly to the human brain and can be trained like any other machine learning algorithm. DNN architectures generate compositional models in which the object is expressed as a layered composition of primitives [35,36,42]. The extra layers enable the composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network. The core of deep learning is to find and predict patterns in large amounts of data. Existing machine learning algorithms are limited in how much their performance improves as the amount of data increases, whereas the performance of deep learning tends to keep increasing with the amount of data.

Each neuron receives a set of x-values (numbered from 1 to n) as an input and computes the predicted value, y-hat. Vector x contains the values of the features in one of the m examples from the training set. Each unit also has its own set of parameters, usually referred to as w (a column vector of weights) and b (a bias), which change during the learning process. In each iteration, the neuron calculates a weighted sum of the values of vector x based on its current weight vector w, to which the bias b is then added. The weighted sum is shown as:

z = w_{1} x_{1} + w_{2} x_{2} + w_{3} x_{3} + \dots + w_{n} x_{n} = w^{T} \cdot x .

(1)
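As a small illustration of Equation (1), the sketch below computes the weighted sum for one neuron and optionally adds the bias b described in the text. The function name and the numerical values are made up for illustration only.

```python
def neuron_pre_activation(weights, inputs, bias=0.0):
    """Weighted sum of Equation (1), z = w^T x, plus an optional bias b."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs))
    return z + bias


# Illustrative values only; they are not taken from the study.
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]
z = neuron_pre_activation(w, x)  # 0.5*2.0 - 1.0*1.0 + 2.0*0.5
z_with_bias = neuron_pre_activation(w, x, bias=0.25)
```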

2.5. LSTM

The LSTM model is a type of RNN that learns directly from sequential data [43,44]. The general RNN has the disadvantage of updating only the learning results of the hidden layer for the entire period, which may cause an overfitting problem that affects the modeling performance. The LSTM model was developed to overcome this limitation by adding a cell state structure to the hidden layer, which stores information about the input data for a longer period of time. Figure 4 shows the conceptual diagram of the LSTM model.

The LSTM model has three main gates, i.e., the forget gate (f_{t}), the input gate (i_{t}), and the output gate (o_{t}). The forget gate (f_{t}) decides which information to discard: it applies the sigmoid function to h_{t - 1} of the previous step and x_{t} of the current step to obtain a value between 0 and 1. This value is multiplied by the previous cell state, and in the process, the forget gate decides whether to keep or discard the information. It can be shown as:

f_{t} = \sigma (W_{f} \times [h_{t - 1}, x_{t}] + b_{f}) .

(2)

Next, a sigmoid function called the input gate (i_{t}) decides which data to update. A hyperbolic tangent function then creates the vector of new candidate values, \tilde{C_{t}}, which is combined with the i_{t} values and added to the cell state (see Equations (3) and (4)):

i_{t} = \sigma (W_{i} \times [h_{t - 1}, x_{t}] + b_{i}),

(3)

\tilde{C_{t}} = \tanh (W_{C} \times [h_{t - 1}, x_{t}] + b_{C}) .

(4)

Using Equations (2)–(4), a new cell state (C_{t}) is created by updating the previous state (C_{t - 1}). Equation (5) can be used to update the information state of the current step:

C_{t} = f_{t} \times C_{t - 1} + i_{t} \times \tilde{C_{t}} .

(5)

Finally, the result is derived through the output gate (o_{t}), which can be regarded as the step that determines which part of the cell state to output. To this end, Equation (6) is used, and the current hidden state (h_{t}) is computed using Equation (7):

o_{t} = \sigma (W_{o} \times [h_{t - 1}, x_{t}] + b_{o}),

(6)

h_{t} = o_{t} \times \tanh (C_{t}) .

(7)
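To make the gate equations concrete, the following sketch performs a single LSTM time step in plain Python, line by line with Equations (2)–(7). It is a didactic sketch, not the implementation used in the study: the dictionary keys, helper names, and the zero-valued weights in the usage example are assumptions chosen so the result can be checked by hand.

```python
import math


def sigmoid(v):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Return W v + b for a matrix W (list of rows) and vectors b and v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (2)-(7)."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], concat)]        # Eq. (2)
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], concat)]        # Eq. (3)
    c_tilde = [math.tanh(v) for v in affine(params["W_C"], params["b_C"], concat)]  # Eq. (4)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]          # Eq. (5)
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], concat)]        # Eq. (6)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]                              # Eq. (7)
    return h_t, c_t


# Usage with one hidden unit, two inputs, and all-zero (untrained) weights:
# every gate then equals sigmoid(0) = 0.5 and the candidate equals tanh(0) = 0.
zero_params = {
    "W_f": [[0.0, 0.0, 0.0]], "b_f": [0.0],
    "W_i": [[0.0, 0.0, 0.0]], "b_i": [0.0],
    "W_C": [[0.0, 0.0, 0.0]], "b_C": [0.0],
    "W_o": [[0.0, 0.0, 0.0]], "b_o": [0.0],
}
h_t, c_t = lstm_step(x_t=[1.0, 2.0], h_prev=[0.0], c_prev=[1.0], params=zero_params)
```

With these zero weights, the forget gate halves the previous cell state and nothing new is written, so the example isolates the role of Equation (5).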

2.6. Complex Network

The complex network is an effective method for representing the structure and process of a complex and fluid hydrological system [45,46]. The network (or graph) consists of multiple nodes and links. For example, in Figure 5, there are five nodes shown as V = {a, b, c, d, e}, and E is a set of links consisting of a total of six links between the five nodes, shown as E = {(a, b), (b, c), (c, d), (b, d), (a, d), (d, e)}. When building a complex network, the most important thing is the presence of links connected to nodes, which are used to calculate various analysis indicators for the complex network, such as the centrality and the clustering coefficient [47,48].

In this study, the importance of each weather station was evaluated using degree centrality, which is one of several centrality measures along with closeness centrality and betweenness centrality. The degree centrality method estimates the importance of each node by evaluating how many other nodes it is directly connected to. In Figure 5, node d had the highest degree centrality, 4, and node e had the lowest, 1. However, when degree centrality values are compared across networks, a fair comparison is difficult if the networks differ in size. Therefore, in this study, the importance was evaluated after normalizing the degree centrality of each node by dividing it by N - 1, the maximum possible degree centrality value for each network, which can be shown as:

\mathrm{Degree\ Centrality} = \frac{N_{C}}{N - 1},

(8)

where N denotes the total number of nodes, and N_{C} denotes the number of links connected to an individual node.
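The normalized degree centrality of Equation (8) can be checked on the five-node example network of Figure 5. The sketch below is illustrative; the function name is hypothetical, and the node and link sets are those listed in the text.

```python
def normalized_degree_centrality(nodes, links):
    """Degree of each node divided by N - 1, as in Equation (8)."""
    degree = {node: 0 for node in nodes}
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {node: degree[node] / (n - 1) for node in nodes}


# Example network from the text: five nodes and six undirected links.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d"), ("a", "d"), ("d", "e")]
centrality = normalized_degree_centrality(V, E)
# Node d has 4 links, so its normalized value is 4 / 4 = 1.0;
# node e has 1 link, so its normalized value is 1 / 4 = 0.25.
```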

2.7. Evaluation of the Predictive Power

In the Phan Rang River Basin of Nihn Thuan, 80–90% of annual rainfall is concentrated between September and December during the rainy season. Therefore, for the development of the model, the data from 1 September 2000 to 31 December 2013 were used as training data to build the model. In addition, the data from 1 September 2014 to 31 December 2020 were used as test data to evaluate the predictive power of the water level prediction model. To consider various conditions, the water level prediction model was constructed in six ways, and the predictive power of each was evaluated.

In this study, the CC, the NSE, and the NRMSE were used as indicators for evaluating predictive power. The CC measures the linear relationship between two variables. In correlation analysis of continuous variables on an interval or ratio scale, it is defined as a measure of the strength of the linear correlation between two variables [1,2,16,38,39,40]:

CC (r) = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}},

(9)

where x_{i} is the observed value, \bar{x} is the mean of the observed values, y_{i} is the predicted value, and \bar{y} is the mean of the predicted values.

The NSE compares the prediction errors with the variability of the observations; a negative value means that the predicted result is poor or inconsistent. It can be shown as:

NSE = \frac{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2} - \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}},

(10)

where y_{i} is the observed value, \bar{y} is the mean of the observed values, and \hat{y_{i}} is the predicted value. A positive value means that using the predicted result gives better results than using the average of the observations, and a value closer to 1 means a result closer to ideal.

The NRMSE is obtained by dividing the RMSE by the range (maximum minus minimum) of the observed values and can be written as:

NRMSE (%) = \frac{\sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}}{\max (y_{i}) - \min (y_{i})} \times 100,

(11)

where y_{i} is the observed value, \hat{y_{i}} is the predicted value, and n is the number of observations. The closer the NRMSE is to 0, the smaller the degree of error.
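The three metrics in Equations (9)–(11) can be written compactly as below. This is a minimal sketch with hypothetical function names and made-up sample values; it follows the equations as stated and does not reproduce any result from the study.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient, Equation (9)."""
    x_bar, y_bar = mean(obs), mean(pred)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(obs, pred))
    var_x = sum((x - x_bar) ** 2 for x in obs)
    var_y = sum((y - y_bar) ** 2 for y in pred)
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))


def nash_sutcliffe_efficiency(obs, pred):
    """NSE, Equation (10): 1 is ideal, 0 matches the mean of observations."""
    obs_mean = mean(obs)
    ss_tot = sum((y - obs_mean) ** 2 for y in obs)
    ss_res = sum((y - p) ** 2 for y, p in zip(obs, pred))
    return (ss_tot - ss_res) / ss_tot


def nrmse_percent(obs, pred):
    """RMSE divided by the range of the observations, in percent, Equation (11)."""
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(obs, pred)) / len(obs))
    return rmse / (max(obs) - min(obs)) * 100


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]  # always predicts the observed mean

nse_perfect = nash_sutcliffe_efficiency(observed, perfect)
nse_mean_only = nash_sutcliffe_efficiency(observed, mean_only)
```

The mean-only prediction shows why an NSE of 0 is the natural baseline: it corresponds to a model that is no better than the average of the observations.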

3. Results

3.1. Overall Performances of the DNN and the LSTM Models

The predictive performances of both the DNN and LSTM models depend on the features of the inputs and on the parameters and hyper-parameters of the models. Weights and biases, whose optimal values are determined by model learning, are called parameters, and other factors that must be set for model learning to be performed effectively are called hyper-parameters. For example, hyper-parameters include the learning rate, the number of hidden layers, the number of hidden nodes, the dropout, and the number of epochs, and users can manually set the combination of these values for the model. Since the optimal hyper-parameter values for each model vary according to the input and output variables, no specific value can be defined as optimal in general. Readjusting these values takes a considerable amount of time whenever a flood event differs from the floods that occurred in the past. To address this problem, this study selected the values randomly for each model, which can also be said to secure objectivity. In addition, K-fold cross-validation arbitrarily divides the dataset into K subsets of the same size and uses one of them as the verification dataset. The process of using the remaining (K-1) subsets as the training dataset is sequentially repeated K times to verify the entire given dataset. The advantage of K-fold cross-validation is that all cases are used for training and verification and each case is used only once for testing, so overfitting can be prevented [41]. The hyper-parameter setting values used in this study are shown in Table 2.
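The K-fold procedure described above can be sketched as an index generator. This is an illustrative sketch only: the use of contiguous, unshuffled folds and the function name are assumptions, since the text does not state how the folds were formed or which value of K was used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for K contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# Example with 10 samples and K = 5 (both values chosen for illustration).
folds = list(k_fold_indices(10, 5))
```

Each sample index appears in exactly one validation fold and in the training set of the other K-1 folds, which is the property the paragraph above relies on.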

For the LSTM model, the optimal parameter values were likewise derived using trial and error methods. The parameter values used in this study are shown in Table 3.

Table 4 shows the results of predicting the water level at Daolong Station using the test data from 1 September 2014 to 31 December 2020.

Figure 6 and Figure 7 illustrate the predicted water levels and the observed data at the outlet point of the basin using the DNN and LSTM models, respectively. The DNN-based models (i.e., models (1)–(3)) showed moderate prediction results for the overall water level patterns but had a low performance in predicting the peak water level, which is important in flood prediction. The LSTM model had a better predictive performance than the DNN model. As shown in Table 4 and Figure 7, models (5) and (6), which used the rainfall and the water level as inputs, had the best performances in predicting water level patterns and peak water level values for the entire period.

3.2. Calculation of Centrality for the Water Level and Rainfall Stations

Many previous studies considered all data from nearby stations around the study area and used them as independent variables. When all data from nearby stations are used, predictions may be distorted by data that are unnecessary for model learning. Therefore, in this study, the model was built using only the data of the stations related to Daolong Station, whose water level is the dependent variable.

A complex network analysis was applied to the water level and rainfall stations located in the Phan Rang River Basin. Each station was represented as a node in the network. The correlation between stations was expressed as a link between nodes, and the strength of the link was calculated through the CC. Centrality was used to analyze the complex network. Previous studies calculating degree centrality offer no specific method for setting an appropriate threshold value; instead, threshold values from 0.1 to 0.9 are set at 0.1 intervals [36,40]. Therefore, in this study, the number of links was calculated for each threshold at 0.1 intervals (Figure 8). Among these, four threshold values (T = 0.3, T = 0.4, T = 0.5, and T = 0.6) at which the graph changes rapidly were identified, and these four threshold values were used to construct the network.
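The thresholding procedure can be sketched as follows: a link is kept between two stations when their correlation reaches the threshold, and the number of links is counted for each threshold from 0.1 to 0.9. Everything specific here is an assumption made for illustration: the station labels S1 to S4, the correlation values, and the choice of a greater-than-or-equal comparison are hypothetical and are not taken from the study.

```python
def build_links(stations, corr, threshold):
    """Keep a link between two stations when their correlation >= threshold."""
    links = []
    for i in range(len(stations)):
        for j in range(i + 1, len(stations)):
            if corr[i][j] >= threshold:
                links.append((stations[i], stations[j]))
    return links


# Hypothetical stations and a made-up symmetric correlation matrix.
stations = ["S1", "S2", "S3", "S4"]
corr = [
    [1.00, 0.82, 0.55, 0.35],
    [0.82, 1.00, 0.65, 0.25],
    [0.55, 0.65, 1.00, 0.15],
    [0.35, 0.25, 0.15, 1.00],
]

# Number of links for thresholds 0.1, 0.2, ..., 0.9.
thresholds = [k / 10 for k in range(1, 10)]
link_counts = {t: len(build_links(stations, corr, t)) for t in thresholds}
```

Plotting `link_counts` against the threshold gives the kind of curve used to spot thresholds at which the network changes rapidly; the links returned for a chosen threshold can then be passed to a degree centrality calculation.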

When the thresholds were set to 0.3 and 0.4, the resulting networks had characteristics that were difficult to distinguish; that is, the different stations did not show a significant difference. At the threshold value of 0.5, the outermost link of Khanh Son Station was removed. At the threshold of 0.6, the network was built around the most important node, Daolong Station. As the threshold was increased in steps of 0.1, the Phan Rang River Basin could be said to form a network centered on Daolong Station. That is, the Phan Rang River Basin was networked with Daolong, Tanmy, Phan Rang, Nhaho, Nhiha, and Quanthe Stations (Figure 9). Therefore, to predict the water level in the Phan Rang River Basin, this study developed a water level prediction model for Daolong Station in the downstream basin using Tanmy Station in the upstream basin and the rainfall stations affecting the Phan Rang River Basin.

3.3. Development of the Water Level Prediction Model Using the Complex Network

As a result of the complex network, stations having significant correlations with the outlet point (i.e., Daolong Station) were identified as Tanmy, Phan Rang, Nhaho, Nhiha, and Quanthe Stations. Therefore, in this study, the water level at Daolong Station in the downstream area was set as a dependent variable of the water level prediction model with the water level of Tanmy Station in the upstream basin and the rainfall data from Tanmy, Phan Rang, Nhaho, Nhiha, and Quanthe Stations located nearby as independent variables. Optimal parameter values were derived for the DNN and LSTM models using trial and error methods. The parameter settings used in this study are shown in Table 5.

The evaluation results showed that the predictive power of the water level prediction model with the complex network method was better than that of the standalone model (i.e., without the complex network). In addition, as shown in Table 6, when comparing the DNN model (model (7)) to the LSTM model (model (8)) with the complex network method, the LSTM model had a better performance than the DNN model in predicting water level patterns as well as peak values. The complex network method was therefore judged to be effective in removing unnecessary data at key points. In this study, model (8), constructed by applying the complex network, was selected as the final model (Table 6, Figure 10).

4. Conclusions

This study proposed a model combining a complex network and deep learning models (i.e., DNN and LSTM models) for water level prediction at Phan Rang River Basin in Vietnam and compared it with basic DNN and LSTM models for the evaluation of model performance. The results are as follows.

The LSTM-based models outperformed the DNN-based models for water level prediction. The LSTM-based models provided good performance in predicting the daily water level variability and the peak values of the observed water levels. The CC, NSE, and NRMSE values were 0.94–0.95, 0.93–0.95, and 9.82–10.41, respectively, for the LSTM-based models, and were 0.90–0.92, 0.80–0.89, and 15.22–30.40, respectively, for the DNN-based models. The evaluation metrics showed that the LSTM-based models were suitable for water level prediction.

When all data from nearby stations are used, predictions may be distorted by data that are unnecessary for model learning. Therefore, in this study, the importance of the stations was evaluated based on the complex network. At the threshold of 0.6, the network was built around the most important node, Daolong Station, so the Phan Rang River Basin formed a network centered on Daolong Station. That is, the Phan Rang River Basin is networked with Daolong, Tanmy, Phan Rang, Nhaho, Nhiha, and Quanthe Stations.

The complex network–LSTM model outperformed the DNN, LSTM, and complex network–DNN models for water level prediction. The complex network–LSTM model provided a high performance in predicting the daily water level variability and the peak values of the observed water level. The CC, NSE, and NRMSE values were 0.99, 0.99, and 0.17, respectively, for the complex network–LSTM model, and were 0.95, 0.89, and 4.41, respectively, for the complex network–DNN models. The evaluation metrics showed that all models were suitable for water level prediction, but the complex network–LSTM model was the best one.

For the prediction of future water levels (floods), models in Vietnam have been developed and operated using past flood discharge data. The rainfall-runoff model currently employed in Vietnam is managed through parameter adjustment that relies on the experience of hydrology researchers every time a new flood event occurs. Furthermore, there are cases in which, in order to match the peak water level, a value outside the valid range is artificially introduced for forecasting the flood discharge. The problem in this case is that the parameter values are fixed to past data and need to be newly adjusted for every new rainfall event. In contrast, the approach of this study has the advantage of being able to quickly predict the water level in the downstream area using data from nearby stations in real time.

5. Discussions

The most difficult part of developing a water level prediction model was considered to be the collection of reliable data. This study has several limitations. The water level and rainfall data used as dependent and independent variables have been measured only since 2000, so only 21 years of data were used. If observation data are managed systematically so that more diverse and larger amounts can be collected, a model with higher predictive power can be developed.

In this study, the proposed models used the water level and the rainfall from the surrounding areas as the input data; in future studies, new input features such as evapotranspiration, topography, and geospatial datasets can be used to improve the predictive performance. In addition, factors such as other weather variables and impervious areas were not considered. If further factors related to water level prediction are considered, a more reliable water level prediction model can be developed.

The results of this study are intended to support disaster management in advance of flood damage in the outskirts of Vietnam. As the need for disaster management is gradually increasing, the approach is expected to be extendable to countries that do not have a disaster response system, as is the case in the outskirts of Vietnam.

Author Contributions

Conceptualization, D.K. and H.S.K.; formal analysis, D.K.; methodology, H.H. and W.W.; supervision, H.S.K. and H.H.; writing—original draft, D.K.; writing—review and editing, D.K. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

Ministry of Interior and Safety, Korea: 2021-MOIS36-002.

Data Availability Statement

Not applicable.

Acknowledgments

This research was supported by a grant (2021-MOIS36-002) of Technology Development Program on Disaster Restoration Capacity Building and Strengthening funded by Ministry of Interior and Safety (MOIS, Korea).

Conflicts of Interest

The authors declare no conflict of interest.

References

Kim, D.; Kim, J.; Kwak, J.; Necesito, I.V.; Kim, J.; Kim, H.S. Development of water level prediction models using deep neural network in mountain wetlands. J. Wetl. Res. 2020, 22, 106–112.
Choi, C.; Kim, J.; Han, H.; Han, D.; Kim, H.S. Development of water level prediction models using machine learning in wetlands: A case study of Upo wetland in South Korea. Water 2020, 12, 93.
Wang, Q.; Wang, S. Machine learning-based water level prediction in Lake Erie. Water 2020, 12, 2654.
Olsen, N.R.; Kjellesvig, H.M. Three-dimensional numerical flow modelling for estimation of spillway capacity. J. Hydraul. Res. 1998, 36, 775–784.
Liao, J.; Gao, L.; Wang, X. Numerical simulation and forecasting of water level for Qinghai Lake using multi-altimeter data between 2002 and 2012. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 609–622.
Reinking, J. GNSS-SNR water level estimation using global optimization based on interval analysis. J. Geod. Sci. 2016, 6, 80–92.
Hostache, R.; Matgen, P.; Schumann, G.; Puech, C.; Hoffmann, L.; Pfister, L. Water level estimation and reduction of hydraulic model calibration uncertainties using satellite SAR images of floods. IEEE Trans. Geosci. Remote Sens. 2009, 47, 431–441.
Velpuri, N.M.; Senay, G.B.; Asante, K.O. A multi-source satellite data approach for modelling Lake Turkana water level: Calibration and validation using satellite altimetry data. Hydrol. Earth Syst. Sci. 2012, 16, 1–18.
Schwatke, C.; Dettmering, D.; Bosch, W.; Seitz, F. DAHITI–an innovative approach for estimating water level time series over inland waters using multi-mission satellite altimetry. Hydrol. Earth Syst. Sci. 2015, 19, 4345–4364.
Tourian, M.J.; Tarpanelli, A.; Elmi, O.; Qin, T.; Brocca, L.; Moramarco, T.; Sneeuw, N. Spatiotemporal densification of river water level time series by multimission satellite altimetry. Water Resour. Res. 2016, 52, 1140–1159.
Rulent, J.; Mir Calafat, F.; Banks, C.J.; Bricheno, L.; Gommenginger, C.; Green, M.; Haigh, I.D.; Lewis, H.; Martin, A. Comparing Water Level Estimation In Coastal And Shelf Seas From Satellite Altimetry And Numerical Models. Front. Mar. Sci. 2020, 7, 919.
Mosavi, A.; Ozturk, P.; Chau, K.W. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536.
Solomatine, D.P.; Ostfeld, A. Data-driven modelling: Some past experiences and new approaches. J. Hydroinf. 2008, 10, 3–22.
Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall–runoff modelling using long short-term memory (LSTM) networks. Hydrol. Earth Syst. Sci. 2018, 22, 6005–6022.
Xiang, Z.; Yan, J.; Demir, I. A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water Resour. Res. 2020, 56, e2019WR025326.
Han, H.; Choi, C.; Jung, J.; Kim, H.S. Deep Learning with Long Short Term Memory Based Sequence-to-Sequence Model for Rainfall-Runoff Simulation. Water 2021, 13, 437.
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
Wu, X.; Zhou, J.; Yu, H.; Liu, D.; Xie, K.; Chen, Y.; Hu, J.; Sun, H.; Xing, F. The Development of a Hybrid Wavelet-ARIMA-LSTM Model for Precipitation Amounts and Drought Analysis. Atmosphere 2021, 12, 74.
Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resour. Res. 2017, 53, 3878–3895.
Poornima, S.; Pushpalatha, M. Drought prediction based on SPI and SPEI with varying timescales using LSTM recurrent neural network. Soft Comput. 2019, 23, 8399–8412.
Feng, Y.; Cui, N.; Hao, W.; Gao, L.; Gong, D. Estimation of soil temperature from meteorological data using different machine learning models. Geoderma 2020, 338, 67–77.
Jung, S.; Cho, H.; Kim, J.; Lee, G. Prediction of water level in a tidal river using a deep-learning based LSTM model. J. Korea Water Resour. Assoc. 2018, 51, 1207–1216.
Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480.
Tran, Q.K.; Song, S.K. Water level forecasting based on deep learning: A use case of Trinity River-Texas-The United States. J. KIISE 2017, 44, 607–612.
Abebe, A.J.; Price, R.K. Managing uncertainty in hydrological models using complementary models. Hydrol. Sci. J. 2003, 48, 679–692.
Albert, R.; Barabási, A.L. Statistical mechanics of complex networks. Rev. Mod. Phys. 2002, 74, 47.
Cohen, R.; Havlin, S. Complex Networks: Structure, Robustness and Function; Cambridge University Press: Cambridge, UK, 2010.
Sivakumar, B.; Woldemeskel, F.M. Complex networks for streamflow dynamics. Hydrol. Earth Syst. Sci. 2014, 18, 4565–4578.
Yasmin, N.; Sivakumar, B. Temporal streamflow analysis: Coupling nonlinear dynamics with complex networks. J. Hydrol. 2018, 564, 59–67.
Jha, S.K.; Sivakumar, B. Complex networks for rainfall modeling: Spatial connections, temporal scale, and network size. J. Hydrol. 2017, 554, 482–489.
Decision No. 339/QD-UBND Dated 26 February 2020 of the Chairman of Ninh Thuan Provincial People’s Committee, on the Promulgation of the Plan for Natural Disaster Prevention and Control for the Period of 2021–2025 in Ninh Thuan Province. Available online: https://vanbanphapluat.co/ (accessed on 26 February 2020).
Ninh Thuan Provincial Commanding Committee of Natural Disaster Prevention and Control, Search and Rescue, Summary of Flood Prevention and Search and Rescue (2016–2020). Available online: https://vanbanphapluat.co/ (accessed on 6 March 2021).
Decision No. 03/2020/QD-TTg Dated 13 January 2020 of the Prime Minister on Regulations on Forecasting, Warning and Communicating of Natural Disasters. Available online: https://vanbanphapluat.co/ (accessed on 13 January 2020).
Decision No. 05/2020/QD-TTg Dated 31 January 2020 of the Prime Minister on Regulations on Water Levels Corresponding to Flood Alarming Levels on Rivers Nationwide. Available online: https://vanbanphapluat.co/ (accessed on 31 January 2020).
Bengio, Y. Learning Deep Architectures for AI; Now Publishers Inc.: Delft, The Netherlands, 2009.
Szegedy, C.; Toshev, A.; Erhan, D. Deep Neural Networks for Object Detection. Available online: https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41457.pdf (accessed on 26 September 2013).
Donges, J.F.; Zou, Y.; Marwan, N.; Kurths, J. Complex networks in climate dynamics. Eur. Phys. J. Spec. Top. 2009, 174, 157–179.
Choi, C.; Kim, J.; Kim, J.; Kim, D.; Bae, Y.; Kim, H.S. Development of heavy rain damage prediction model using machine learning based on big data. Adv. Meteorol. 2018, 2018, 5024930.
Choi, C.H.; Kim, J.S.; Kim, J.H.; Kim, H.Y.; Lee, W.J.; Kim, H.S. Development of Heavy Rain Damage Prediction Function Using Statistical Methodology. J. Korean Soc. Hazard Mitig. 2017, 17, 604–612.
Kim, J.S.; Choi, C.H.; Lee, J.S.; Kim, H.S. Damage Prediction Using Heavy Rain Risk Assessment: (2) Development of Heavy Rain Damage Prediction Function. J. Korean Soc. Hazard Mitig. 2017, 17, 371–379.
Lee, J.S. Development and Application of Artifacts Foundation Model for Real Time Flood Forecasting. Ph.D. Thesis, Inha University Graduate School, Incheon, Korea, 2021.
Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
Hu, C.; Wu, Q.; Li, H.; Jian, S.; Li, N.; Lou, Z. Deep learning with a long short-term memory networks approach for rainfall-runoff simulation. Water 2018, 10, 1543.
Fan, H.; Jiang, M.; Xu, L.; Zhu, H.; Cheng, J.; Jiang, J. Comparison of Long Short Term Memory Networks and the Hydrological Model in Runoff Simulation. Water 2020, 12, 175.
Kim, K.; Joo, H.; Han, D.; Kim, S.; Lee, T.; Kim, H.S. On complex network construction of rain gauge stations considering nonlinearity of observed daily rainfall data. Water 2019, 11, 1578.
Joo, H.; Kim, H.S.; Kim, S.; Sivakumar, B. Complex networks and integrated centrality measure to assess the importance of streamflow stations in a River basin. J. Hydrol. 2021, 598, 126280.
Estrada, E. The Structure of Complex Networks: Theory and Applications; Oxford University Press: Oxford, UK, 2012.
Newman, M. Networks; Oxford University Press: Oxford, UK, 2018.

Figure 1. The Phan Rang River Basin in Ninh Thuan Province, Vietnam. Orange circles represent weather stations and red triangles represent water level stations in the basin.

Figure 2. Procedure for improving the deep learning models for water level prediction using the complex network method [1,2,16,35,36,37,38,39,40].

Figure 3. The structure of the deep neural network (DNN) [1,2].

Figure 4. Conceptual diagram of the long short-term memory (LSTM) model [16].

Figure 5. A network in its simplest form, i.e., an undirected network with only a single type of node and a single type of link [45,46].

Figure 6. Observed water levels at Daolong Station and water levels predicted by the three DNN models. The dotted line represents model (1), red triangles represent model (2), and green X markers represent model (3).

Figure 7. Observed water levels at Daolong Station and water levels predicted by the three LSTM models. The dotted line represents model (4), red triangles represent model (5), and green X markers represent model (6).

Figure 8. Numbers of links for different threshold values.

Figure 9. Construction of links in the stations network with four threshold values: (a) Threshold value = 0.3; (b) Threshold value = 0.4; (c) Threshold value = 0.5; and (d) Threshold value = 0.6.
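The effect of the threshold shown in Figures 8 and 9 can be illustrated with a minimal sketch. It assumes that two stations are linked when a pairwise similarity measure (for example, a correlation coefficient) reaches the threshold, which the captions do not spell out. The station names A to D and all similarity values below are hypothetical and are not taken from the study data.

```python
# Hypothetical pairwise similarities between four made-up stations (A-D).
# These numbers are illustrative only; they are NOT the study data.
SIMILARITY = {
    ("A", "B"): 0.72,
    ("A", "C"): 0.55,
    ("A", "D"): 0.31,
    ("B", "C"): 0.48,
    ("B", "D"): 0.25,
    ("C", "D"): 0.62,
}


def build_links(similarity, threshold):
    """Keep a link between two stations when |similarity| >= threshold (assumed rule)."""
    return [pair for pair, value in similarity.items() if abs(value) >= threshold]


def degree(links):
    """Count the links attached to each station (degree centrality, unnormalized)."""
    counts = {}
    for a, b in links:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts


# Number of links for the four threshold values named in Figure 9.
LINK_COUNTS = {t: len(build_links(SIMILARITY, t)) for t in (0.3, 0.4, 0.5, 0.6)}

if __name__ == "__main__":
    for threshold, count in LINK_COUNTS.items():
        print(f"threshold={threshold}: {count} links")
    print(degree(build_links(SIMILARITY, 0.5)))
```

In this toy example the link count falls as the threshold rises, and stations that lose all of their links drop out of the degree count. This is the kind of pruning that a threshold-based network applies before station selection.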

Figure 10. Observed water levels (H) at Daolong Station and water levels predicted by the two data-driven models combined with the complex network. Green triangles represent model (7) and red X markers represent model (8).

Table 1. Basic statistics for the dependent and independent variables. H, water level (EL·m); R, rainfall (mm).

Classification (Stations)	Unit	Max	Min	Mean	Standard Deviation	Coefficient of Variation
Daolong	H (EL·m)	385.85	−37.86	32.23	41.72	1.29
Tanmy	H (EL·m)	3832.00	3362.00	3439.75	46.28	0.01
Tanmy	R (mm)	325.20	0.00	5.64	18.65	3.31
Phan Rang	R (mm)	321.80	0.00	4.95	16.58	3.35
Nhaho	R (mm)	259.00	0.00	2.56	11.94	4.67
Khanh Son	R (mm)	373.40	0.00	5.90	18.31	3.10
Songha	R (mm)	205.00	0.00	7.75	19.53	2.52
Nhiha	R (mm)	280.40	0.00	3.63	13.92	3.83
Quanthe	R (mm)	272.60	0.00	4.06	14.87	3.66
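The coefficient of variation in Table 1 is the standard deviation divided by the mean. The sketch below recomputes it from the rounded mean and standard deviation values in the table as a quick consistency check. The names `TABLE_1` and `max_cv_discrepancy` are introduced here for illustration only, and small differences from the reported values are expected because the inputs are already rounded.

```python
# (mean, standard deviation, reported coefficient of variation) copied from Table 1.
TABLE_1 = {
    "Daolong H": (32.23, 41.72, 1.29),
    "Tanmy H": (3439.75, 46.28, 0.01),
    "Tanmy R": (5.64, 18.65, 3.31),
    "Phan Rang R": (4.95, 16.58, 3.35),
    "Nhaho R": (2.56, 11.94, 4.67),
    "Khanh Son R": (5.90, 18.31, 3.10),
    "Songha R": (7.75, 19.53, 2.52),
    "Nhiha R": (3.63, 13.92, 3.83),
    "Quanthe R": (4.06, 14.87, 3.66),
}


def coefficient_of_variation(mean, std):
    """Coefficient of variation: standard deviation relative to the mean."""
    return std / mean


def max_cv_discrepancy(rows):
    """Largest absolute gap between the recomputed and the reported CV."""
    return max(
        abs(coefficient_of_variation(mean, std) - reported)
        for mean, std, reported in rows.values()
    )


if __name__ == "__main__":
    for name, (mean, std, reported) in TABLE_1.items():
        cv = coefficient_of_variation(mean, std)
        print(f"{name}: recomputed {cv:.2f}, reported {reported:.2f}")
```

The rainfall series all have a coefficient of variation well above 1, which reflects the many zero-rainfall days, whereas the Tanmy water level varies very little relative to its mean.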

Table 2. Hyper-parameter settings for the DNN models. Model (1) predicts the water level at Daolong Station using rainfall data as the independent variable, model (2) uses water level data as the independent variable, and model (3) uses both rainfall and water level data as independent variables.

Hyper-Parameter	Value (Model (1))	Value (Model (2))	Value (Model (3))
Learning rate	0.1	0.1	0.1
Hidden layer	3	4	3
Hidden nodes	5	8	4
Dropout	0.5	0.5	0.5
Epoch	67	48	55
Batch size	10	10	8
Optimizer	Adam	Adam	Adam
Activation	ReLU	ReLU	ReLU

Table 3. Parameter settings for the LSTM models. Model (4) predicts the water level at Daolong Station using rainfall data as the independent variable, model (5) uses water level data as the independent variable, and model (6) uses both rainfall and water level data as independent variables.

Parameter	Value (Model (4))	Value (Model (5))	Value (Model (6))
Activation	ReLU	ReLU	ReLU
Epoch	47	34	41
Optimizer	Adam	Adam	Adam
Learning rate	0.01	0.01	0.01
Loss	Mean squared error	Mean squared error	Mean squared error

Table 4. Evaluation of predictive power by model. The correlation coefficient (CC), the Nash–Sutcliffe efficiency coefficient (NSE), and the normalized root-mean-squared error (NRMSE) were used as the indicators of predictive power.

Classification	CC	NSE	NRMSE (%)
Model (1)_DNN (rainfall)	0.90	0.80	30.4
Model (2)_DNN (water level)	0.91	0.89	16.75
Model (3)_DNN (water level and rainfall)	0.92	0.88	15.22
Model (4)_LSTM (rainfall)	0.94	0.93	10.41
Model (5)_LSTM (water level)	0.95	0.95	9.82
Model (6)_LSTM (water level and rainfall)	0.95	0.94	9.93

Table 5. Hyper-parameter settings for the DNN model (model (7)) and parameter settings for the LSTM model (model (8)). Both water level prediction models were developed using the water level of Tanmy Station and the rainfall data from Tanmy, Phan Rang, Nhaho, Nhiha, and Quanthe Stations.

Hyper-Parameter	Values (Model (7))	Parameter	Values (Model (8))
Learning rate	0.1	Activation	ReLU
Hidden layer	3	Epoch	53
Hidden nodes	3	Optimizer	Adam
Dropout	0.5	Learning rate	0.01
Epoch	81	Loss	Mean squared error
Batch size	7
Optimizer	Adam
Activation	ReLU

Table 6. Evaluation of predictive power by model. The CC, the NSE, and the NRMSE were used as the indicators of predictive power evaluation.

Classification	CC	NSE	NRMSE (%)
Model (1)_DNN (rainfall)	0.90	0.80	30.4
Model (2)_DNN (water level)	0.91	0.89	16.75
Model (3)_DNN (water level and rainfall)	0.92	0.88	15.22
Model (4)_LSTM (rainfall)	0.94	0.93	10.41
Model (5)_LSTM (water level)	0.95	0.95	9.82
Model (6)_LSTM (water level and rainfall)	0.95	0.94	9.93
Model (7)_Complex network_DNN (water level and rainfall)	0.95	0.89	4.41
Model (8)_Complex network_LSTM (water level and rainfall)	0.99	0.99	0.17
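For readers who want to reproduce the three indicators in Tables 4 and 6, the following is a minimal sketch using their standard definitions. The normalizer for the NRMSE is an assumption: the tables do not state whether the RMSE is divided by the mean or by the range of the observations, so both options are exposed through the hypothetical `normalizer` argument. The function names are illustrative and are not taken from the authors' code.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient (CC) between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / sst


def nrmse_percent(observed, predicted, normalizer="mean"):
    """RMSE as a percentage of the observed mean or range (normalizer is assumed)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    if normalizer == "mean":
        scale = sum(observed) / n
    elif normalizer == "range":
        scale = max(observed) - min(observed)
    else:
        raise ValueError("normalizer must be 'mean' or 'range'")
    return 100.0 * rmse / scale


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [2.0, 3.0, 4.0, 5.0]  # constant offset of +1
    print(correlation_coefficient(obs, pred))
    print(nash_sutcliffe(obs, pred))
    print(nrmse_percent(obs, pred))
```

A prediction with a constant offset keeps a perfect CC but a much lower NSE, which is one reason the tables report all three indicators rather than the CC alone.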

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

