Storm Surge Prediction Based on Long Short-Term Memory Neural Network in the East China Sea

Abstract: As an area that frequently suffers from storm surge, the Yangtze River Estuary in the East China Sea requires fast and accurate water level prediction for disaster prevention and mitigation. Because the storm surge process is affected by the long-term and short-term correlation of multiple factors, this study introduces a data-driven approach to water level prediction during storm surge. Using observed meteorological data and water level data of 12 typhoons from 1986 to 2016 at the Lusi tidal station of Jiangsu Province, China, near the north branch of the Yangtze River Estuary, a Long Short-Term Memory (LSTM) neural network model was constructed from multi-factor time series to predict the water level during the storm surge period. This study concludes that the LSTM model performs precisely for 1 h prediction of water level during the storm surge period, that it can provide a 15 h prediction of water level within a limited error, and that its prediction performance is visibly superior to that of the four traditional ML models by 27% in terms of Accuracy Coefficient.


Introduction
Storm surge is a complex atmosphere-ocean coupled process characterized by the sudden occurrence of rising water and waves. It is widely known that the estimated maximum wind speed is a vital indicator of storm damage [1][2][3], because strong winds accompanied by low atmospheric pressure can push the water to pile up above normal levels [4]. Typhoons or hurricanes are the most energetic atmospheric forces acting on coastal and estuarine waters and are thus the most serious of the marine disasters, causing significant changes in hydrodynamics such as water level or storm surge [5][6][7][8]. Owing to the warmer sea surface water [9][10][11] induced by climate warming in recent years, average typhoon intensity is expected to increase by 14% over the Northwestern Pacific by 2100 [12]. Sea level rise is considered an important factor in storm surge. The sea level rise projected for this century by many researchers [13,14] will aggravate the threat from storm surge flooding, and its effects need to be considered when dealing with the influence of climate change on coastal areas. Storm size, which is usually represented by the radius of maximum wind speed, is another factor that has received less attention. Irish et al. (2008) analyzed observed historical storm data along with idealized numerical simulation data and found that storm surge increases with storm size, especially for intense storms on very shallow slopes [15].
China is one of the nations most prone to storm surge, as more than one-third of its coastal cities are located in high-risk zones [16]. When storm surge coincides with high astronomical tides, exceptionally high water levels can occur near the mouths of estuaries and rivers [17], which can cause dike overflowing, seawall failure and [...] prediction (harmonic analysis), the excellent nonlinear problem processing capability of Neural Networks solves the environmentally influenced noises of seasonal effects and TC-induced surge superposed on the astronomical tide level series [75]. Many studies have used single-layer Neural Networks or multi-layer Neural Networks (known as Deep Neural Networks, DNN) to predict tidal levels or storm surges [58,78,79,80]. Since tidal levels and storm surges vary with time, the Recurrent Neural Network (RNN), a branch of Neural Networks, is a preferable option with a better capability of predicting time series. Long Short-Term Memory (LSTM), proposed by Hochreiter and Schmidhuber [81], is a state-of-the-art development of RNN that overcomes its disadvantages, including the vanishing or explosion of the gradient when dealing with long sequence data [82,83]. As a favorable method for the description and prediction of time series, it is widely applied in multiple scientific and engineering fields, e.g., text recognition [84], speech recognition [85], handwriting recognition [86], trajectory prediction [87], disease diagnosis [88], stock analysis [89], oil production [82], and electricity price [90]. LSTM is also employed to predict tidal level [73,75]. However, prediction becomes complicated for the sea level during the typhoon period, which is composed of the pure tidal level and the intense typhoon-induced nonlinearity.
Hence, the approach implemented in this research is to take the meteorological data, along with the total water level composed of the pure tidal level and the typhoon-induced storm surge, as the input for the LSTM model.
It is well known that the Yangtze River Delta area, one of the most important regions of China, is economically highly developed and densely populated. However, it is vulnerable to coastal disasters such as typhoons, so fast disaster warning is vital to reserve time for preparing disaster prevention measures. In this research, a fast early-warning system based on an LSTM model for water level prediction of storm surge at Lusi tidal station, Jiangsu Province, China, near the Yangtze River mouth, was established. The LSTM model was trained on the time series of water level from 12 typhoons affecting the Yangtze River Delta area from 1986 to 2016. This paper is organized as follows: Section 2 explains the relevant theory of the applied models. Section 3 gives the specific workflow to establish the LSTM prediction model and analyzes the model results. Section 4 extends the prediction time of the LSTM model and then compares four other ML models with the LSTM model. Section 5 summarizes the main conclusions.

Methods
The forerunner of the LSTM neural network is the Recurrent Neural Network (RNN) [91], which evolved from the Multi-Layer Perceptron and is capable of processing sequential data owing to its short-term memory ability. This memory ability is realized by the so-called hidden state $h$, which is transmitted from one hidden layer cell to the next and is the major improvement over the Feedforward Neural Network [92]. A traditional RNN structure whose hidden layer is unfolded into a full network is shown in Figure 1a. In the corresponding formulas, $x_t$ is the input, $y_t$ is the output, and $h_t$ is the hidden state which is transmitted to the next hidden layer cell. The subscript $t$ denotes the time step. $f$ is the nonlinear activation function, typically tanh. $U$, $W$, $b$ and $V$ are the parameters (weights and biases) to be calibrated.
The cycle of the hidden state can store the information of the previous step to keep the dependency between the hidden layer cells, and it can improve the ability to learn and extract characteristics from sequential data. However, the long-term dependence problem [93], e.g., the vanishing or exploding of the gradient during the back-propagation calculation, cannot be well solved. Therefore, the LSTM neural network was proposed to improve RNN with the Gating Mechanism [81], selectively adding new information and forgetting previously accumulated information. The structure of LSTM is more complicated than that of RNN and is shown in Figure 1b.
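The RNN recurrence described above is commonly written in the following standard form. This is a sketch under an assumption: the text defines the symbols, but the assignment of $U$ to the previous hidden state and $W$ to the current input follows the conventional formulation and is not confirmed by the original equations.

```latex
\begin{aligned}
h_t &= f\left(U h_{t-1} + W x_t + b\right), \\
y_t &= V h_t .
\end{aligned}
```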
Compared with RNN, the LSTM neural network [82,83], as shown in Figure 2, introduces a new internal state $c_t$, which delivers information linearly to the next hidden layer cell and outputs information nonlinearly to the hidden layer's external state $h_t$ (which is analogous to the hidden state $h_t$ in RNN). In the expressions for $c_t$ and $h_t$, $c_t$ is the internal state; $f_t$, $i_t$, and $o_t$ are the three gates that control the path of information transmission; the subscript $t$ denotes the time step; and $\tilde{c}_t$ is the candidate state obtained through the nonlinear activation function tanh. $U$, $W$, $b$ and $V$ are the parameters (weights and biases) to be calibrated, and the subscript $c$ marks those associated with the candidate state $\tilde{c}_t$.
The LSTM neural network introduces the Gating Mechanism to control the path of information transmission, i.e., the forget gate, input gate, and output gate. In the formulas of these three gates, $f_t$ is the forget gate, which controls how much information of the internal state $c_{t-1}$ of the previous time step needs to be forgotten. $i_t$ is the input gate, which controls how much information of the candidate state $\tilde{c}_t$ of the current time step needs to be saved. $o_t$ is the output gate, which controls how much information of the internal state $c_t$ needs to be output to the external state $h_t$ at the current time step. $U$, $W$, $b$, and $V$ are the parameters to be calibrated, and the subscripts $f$, $i$ and $o$ represent the forget gate, input gate and output gate, respectively.
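For reference, the state updates and gates described above are usually written as follows. This is the standard LSTM formulation, given as an assumption rather than a reproduction of the original equations: $\odot$ denotes the element-wise product, the tilde on the candidate state is notation chosen here, and the matrix $V$ named in the text does not appear in this common form.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right), \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right), \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
h_t &= o_t \odot \tanh\left(c_t\right).
\end{aligned}
```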
The nonlinear activation function $\sigma$ is the sigmoid function, which maps values into the range between 0 and 1, and is expressed as $\sigma(x) = 1/(1 + e^{-x})$.
From the flow chart of the LSTM neural network in Figure 1b, the computational process is as follows: firstly, the external state $h_{t-1}$ of the previous time step and the input $x_t$ of the current time step are used to calculate the three gates $f_t$, $i_t$, $o_t$ and the candidate state $\tilde{c}_t$; secondly, the forget gate $f_t$ and the input gate $i_t$ are combined to update the internal state $c_t$; finally, combined with the output gate $o_t$, information about the internal state $c_t$ is passed to the external state $h_t$.
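The three-step computation above can be sketched in plain Python. This is a minimal illustration of a standard LSTM cell, not the implementation used in the study: the function names, the list-based matrices, and the `params` layout (one `(W, U, b)` tuple per gate) are all hypothetical choices made for readability.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h_prev):
    """Return W @ x + U @ h_prev + b for list-based matrices and vectors."""
    return [
        sum(w * xi for w, xi in zip(W_row, x))
        + sum(u * hi for u, hi in zip(U_row, h_prev))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """Advance one time step of a standard LSTM cell.

    `params` maps each of "f", "i", "o", "c" to a (W, U, b) tuple;
    this layout is an assumption made for the sketch.
    """
    # Step 1: gates and candidate state from h_{t-1} and x_t.
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]

    # Step 2: forget part of the old internal state, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]

    # Step 3: expose the internal state through the output gate.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell simply halves its internal state; this makes a convenient sanity check for the wiring.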
In addition to the LSTM neural network method, several other ML methods, i.e., Bayesian Ridge Regression (BRR), Gradient Boosted Decision Tree (GBDT), Linear Regression (LR) and Support Vector Regression (SVR), are used in this research as a comparison group. BRR, based on Bayesian inference, aims to solve the problem of multicollinearity in linear regression and serves the purpose of estimating regression coefficients and selecting variables [94]. GBDT is a suitable method for classification and regression problems, which uses decision stumps or regression trees as weak learners [95]. LR is the most basic and widely used model in ML and statistics; it is a regression analysis that models the relationship between independent variables and dependent variables. SVR is another widely used ML approach, which finds a line or a hyperplane in a higher dimension to fit the data [96]. The ML models were constructed with scikit-learn [25], a simple and efficient tool for ML; readers can refer to it for more detailed information on the four ML methods used in this study.
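A comparison group of this kind could be assembled in scikit-learn roughly as follows. This is a sketch only: the study does not report its model settings, so library defaults are assumed, and the helper names `build_baselines` and `fit_and_predict` are hypothetical.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.svm import SVR


def build_baselines():
    """Return the four comparison regressors with default settings (assumed)."""
    return {
        "BRR": BayesianRidge(),
        "GBDT": GradientBoostingRegressor(random_state=0),
        "LR": LinearRegression(),
        "SVR": SVR(),
    }


def fit_and_predict(X_train, y_train, X_test):
    """Fit every baseline on the same data and collect its predictions."""
    predictions = {}
    for name, model in build_baselines().items():
        model.fit(X_train, y_train)
        predictions[name] = model.predict(X_test)
    return predictions
```

Because these regressors expect a flat feature vector per sample, a 24 h input window would need to be flattened into one row before fitting, which is one practical difference from a sequence model such as LSTM.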

Study Area and Data Collection
This research focuses on the Yangtze River Delta Region, which is located in the lower-middle reaches of the Yangtze River near the East China Sea, as shown in Figure 3. The Yangtze River Delta Region covers an area of $3.58 \times 10^5$ km$^2$ including Shanghai, Jiangsu, Zhejiang and Anhui, with a total population of up to 227 million. The Yangtze River Delta is one of the regions in China with the most active economic development, creating nearly a quarter of China's total economy with less than four percent of the country's land area. Owing to its coastal location, the Yangtze River Delta suffers from multiple disasters, especially typhoons and the resulting storm surge. Hence, it is necessary to provide a fast prediction of the water level, which usually exceeds the safety threshold during the typhoon period. In this research, the LSTM model was trained for typhoon water level prediction with observation data from the oceanic and meteorological observation station at Lusi Port of Jiangsu Province, near the north branch of the Yangtze River. The data were collected during the 12 typhoon-induced storm surges that occurred from 1986 to 2016, and the typhoon tracks are shown in Figure 3. In particular, the oceanic and meteorological sample data of Typhoon No. 1410 in 2014 are shown in Table 1. ML algorithms require large amounts of data as training sets for higher reliability. Air pressure, wind speed, wind direction and water level are set as the main training input factors. Typhoon central pressure, central wind speed, moving speed, and moving direction are set as related auxiliary reference factors.

Data Processing and Model Setting
First, it is vital to preprocess the data by normalization or standardization for data correlation analysis and network training, because the magnitudes of the original data usually differ. Here the standardization method is used, expressed as $x_i' = (x_i - \bar{x}) / S_d$, where $x_i'$ is the standardized data, $x_i$ is the original data, $\bar{x}$ is the mean of the original data, and $S_d$ is the standard deviation of the original data. After standardization, the processed data, which avoids the influence of abnormal and extreme values, is well suited to ML training. The standardized data can be restored with the inverse of the formula above. The dataset was built so that the meteorological data and the water level data during the storm surge from the previous time period were selected as input, and the water level data of the next time step were selected as output. In more detail, the input data were divided into data slices by a sliding window of 24 h, and each 24 h data slice was then used to predict the water level data of the next 1 h, 3 h, 7 h, and 15 h, respectively. The dataset was divided into a training set (80%) and a testing set (20%), and the number of training epochs is 100. The training set is used to train the LSTM model and the testing set is used for prediction.
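The preprocessing steps above can be sketched with the Python standard library alone. Several details are assumptions not stated in the text: hourly sampling (so 24 rows span 24 h), a single target value at the given lead time, a population standard deviation, and a chronological rather than shuffled 80/20 split. All function names are hypothetical.

```python
import statistics


def standardize(values):
    """Z-score a sequence; also return mean and std so the scaling can be inverted."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population form; the source does not say which
    return [(v - mean) / std for v in values], mean, std


def make_windows(features, target, window=24, horizon=1):
    """Pair each `window`-row slice of features with the target `horizon` steps later."""
    X, y = [], []
    for start in range(len(features) - window - horizon + 1):
        end = start + window
        X.append(features[start:end])          # rows start .. end-1
        y.append(target[end + horizon - 1])    # value `horizon` steps after the slice
    return X, y


def chronological_split(X, y, train_fraction=0.8):
    """Keep the earliest samples for training and the latest for testing."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

For example, `make_windows(features, target, window=24, horizon=15)` would build the samples for the 15 h prediction case, and a standardized prediction `p` is restored to physical units with `p * std + mean`.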

Model Evaluation
In order to effectively evaluate the predicted results under different ML models, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Accuracy Coefficient (ACC) are employed as the evaluation indexes for different models.
Mean Absolute Error (MAE) evaluates the closeness between the predicted data and the observed data; the smaller the value, the better the fitting result. Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors between the predicted data and the observed data; the smaller the value of RMSE, the better the fitting result. Accuracy Coefficient (ACC) is 1 minus the absolute error between the predicted data and the observed data; the larger the value of ACC, the better the fitting result. MAE and RMSE are expressed as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|X_p^i - X_o^i\right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_p^i - X_o^i\right)^2}$$

where $X_p^i$ and $X_o^i$ are the $i$-th predicted value and the $i$-th observed value of $n$ samples, respectively; ACC is computed from the same quantities. In practice, the peak value of the water level is usually the key criterion for marine management departments to judge whether it is necessary to enter a state of alert or disaster relief. Hence, the maximum difference between the modelled and observed values at the water level peaks is taken into consideration in the analysis below.
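The three indexes can be computed as in the sketch below. MAE and RMSE follow their standard definitions. The exact ACC formula is not recoverable from the text, so the form used here (1 minus the mean absolute error relative to each observed value) is an assumption, as are the function names.

```python
import math


def mae(pred, obs):
    """Mean absolute error between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def acc(pred, obs):
    """ASSUMED form: 1 minus the mean absolute error relative to the observation.

    Requires non-zero observed values.
    """
    return 1.0 - sum(abs(p - o) / abs(o) for p, o in zip(pred, obs)) / len(obs)
```

Under this assumed definition, ACC is dimensionless while MAE and RMSE carry the unit of the water level, which is consistent with the text reporting MAE and RMSE in centimetres and ACC as a fraction.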

Model Results of LSTM
The 12 storm surge processes, with the meteorological data (air pressure, wind speed, wind direction) and the water level data recorded at Lusi tidal station in Jiangsu Province, China, from 1986 to 2016, are used to train the parameters of the LSTM model. Although some scholars have used neural networks to predict the time-varying storm surge, calculated by subtracting the astronomical tidal level from the total water level [57], the data used in this study is the total water level, including the pure astronomical tidal level and the typhoon-induced water level, because the key criterion usually considered by marine management departments is the total water level, especially the peak water level, which is prone to exceed the warning limit. Figure 4 shows the results of the LSTM model with 1 h prediction time; its ACC value on the training set is above 0.95 to guarantee model precision. The testing set is used to predict the water level with the parameters derived by training the LSTM model on the training set. In Figure 4b, the light blue area represents a cold start period of 24 h for the testing set calculation, and it is not included in the analysis. In terms of the model evaluation for the testing set, the values of MAE, RMSE, and ACC are 13.40 cm, 16.28 cm, and 0.95, respectively, as shown in Table 2. The testing set agrees well with the observed water levels. Since the peak value of the water level is of most concern in the time series, the difference at each peak is calculated, and the maximum difference is 18 cm, circled by the red dashed line in Figure 4b. In general, the LSTM model performs quite well for 1 h prediction.

Multiple Prediction Times of LSTM Model
In order to investigate the prediction capability of the LSTM model over a longer term, representative prediction times of 3 h, 7 h, and 15 h are selected to train the model, and the ACC values of the training set are all above 0.85 to ensure model precision. Figure 5 shows the model results of the testing set for the LSTM model with the four prediction times (1 h, 3 h, 7 h, and 15 h). The time series fit the observed values of the testing set well in shape. The maximum difference between the modelled and observed values of the testing set increases with prediction time: 18 cm, 23 cm, 36 cm, and 45 cm at prediction times of 1 h, 3 h, 7 h, and 15 h, respectively. Figure 6 shows the MAE, RMSE, and ACC values of the testing set for the LSTM model with 1 h, 3 h, 7 h, and 15 h prediction times. Unlike the consistently increasing trend of the maximum difference, the MAE value increases sharply by 150% from 1 h to 3 h prediction time, then decreases slightly by 6% from 3 h to 7 h, and increases moderately by 29% from 7 h to 15 h. This might be due to the nature of MAE, which is the average of the absolute difference between the modelled and observed values. The variation of RMSE with prediction time is similar to that of MAE, while ACC shows an inverse pattern. For simplicity, it is feasible to select one of these three metrics to evaluate the model. For example, when MAE is taken as the criterion, its values for 1 h and 15 h prediction times are 13.4 cm and 40.7 cm, respectively. Relative to the 20-year return period storm surge value of 186.5 cm [97], the percentage errors of the 1 h and 15 h predictions are 7% and 22%, respectively.
Finally, it can be concluded that the LSTM model performs precisely for 1 h prediction of water level during the storm surge period and it can also provide a 15 h prediction of water level within a limited error.
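The percentage errors quoted for the 1 h and 15 h predictions follow from dividing each MAE by the 20-year return period value. Using only the figures stated in the text, the arithmetic is:

```latex
\frac{13.4\ \text{cm}}{186.5\ \text{cm}} \approx 0.072 \approx 7\%,
\qquad
\frac{40.7\ \text{cm}}{186.5\ \text{cm}} \approx 0.218 \approx 22\%.
```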

Comparison with Other ML Methods
As mentioned before, the most impressive feature of LSTM is its superior ability, compared with traditional ML methods, to deal with time sequence data, owing to its long short-term memory. Hence, it is necessary to compare its performance with that of other traditional ML methods at different prediction times. When the four models are trained with different prediction times, their ACC values on the training set are all above 0.85 to guarantee model precision. Figure 7 shows the model results of the testing set for the BRR, GBDT, LR and SVR models with 1 h, 3 h, 7 h, and 15 h prediction times. For the 1 h prediction results, it can be observed from Figure 7a,i that the line shapes of BRR and LR are quite similar, with generally the same values of MAE, RMSE, ACC, and maximum difference (listed in Table 2), probably because BRR is a variant of LR. Compared with BRR and LR, the GBDT and SVR results for 1 h prediction fit the observation data worse in shape. As the prediction time increases, the performance of the four models gets worse with different characteristics: the amplitudes of the BRR and LR results are much smaller than those of the observation data, while some parts of the GBDT and SVR results are out of phase with the observation data and mixed with noise.

Appl. Sci. 2022, 12, x FOR PEER REVIEW 13 of 17

[...] or decrease slightly from 3 h to 15 h prediction time. In Figure 8c, the variation trend of ACC is opposite to that of the other evaluation metrics. In the four subplots of Figure 8, the evaluation values of LSTM (dark blue line) are separated from those of the other four models, which indicates that its prediction performance is visibly superior to that of the four traditional ML models. Taking ACC as the evaluation metric, the average ACC values of LSTM and the traditional ML methods over the four prediction times are 0.89 and 0.7, respectively, and therefore LSTM has a superior prediction ability by 27%.
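The 27% figure is the relative improvement of the average ACC of LSTM over that of the traditional ML methods, computed from the two averages given in the text:

```latex
\frac{0.89 - 0.70}{0.70} = \frac{0.19}{0.70} \approx 0.27 = 27\%.
```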

Conclusions
In this study, an improved version of RNN, i.e., the LSTM neural network, is trained to predict the water level during the storm surge period in the East China Sea based on the meteorological and water level data of 12 typhoons from 1986 to 2016. The LSTM model performs quite well for 1 h prediction, with MAE, RMSE, and ACC values of 13.40 cm, 16.28 cm, and 0.95, respectively, and a maximum difference of 18 cm between the predicted and observed values on the testing set. When the prediction time is extended to 3 h, 7 h, and 15 h and the results are compared with those of the 1 h prediction, it can be concluded that the LSTM model performs precisely for 1 h prediction of water level during the storm surge period and can provide a 15 h prediction of water level within a limited error. In addition, four traditional ML models are trained at prediction times of 1 h, 3 h, 7 h, and 15 h, and their results are compared with those of the LSTM model. The comparison indicates that the prediction performance of the LSTM model is visibly superior to that of the four traditional ML models by 27% in terms of ACC. In general, the LSTM model can help engineers and decision-makers quickly obtain storm surge warning information in advance, based on a reasonable water level prediction, and make fast emergency responses.
