Modeling Maximum Tsunami Heights Using Bayesian Neural Networks

: Tsunamis are distinguished from ordinary waves and currents owing to their characteristic longer wavelengths. Although the occurrence frequency of tsunamis is low, it can contribute to the loss of a large number of human lives as well as property damage. To date, tsunami research has concentrated on developing numerical models to predict tsunami heights and run-up heights with improved accuracy because hydraulic experiments are associated with high costs for laboratory installation and maintenance. Recently, artiﬁcial intelligence has been developed and has revealed outstanding performance in science and engineering ﬁelds. In this study, we estimated the maximum tsunami heights for virtual tsunamis. Tsunami numerical simulation was performed to obtain tsunami height proﬁles for historical tsunamis and virtual tsunamis. Subsequently, Bayesian neural networks were employed to predict maximum tsunami heights for virtual tsunamis.


Introduction
Tsunamis triggered by undersea earthquakes around the Pacific Ocean have frequently caused loss of human life and property damage in coastal areas. To date, two huge tsunamis in history occurred in 2004 and 2011. The Sumatra tsunami event in 2004 caused approximately 220,000 human deaths and 10 billion dollars in property damage. More recently, the East Japan tsunami event in 2011 caused over 20,000 casualties and 300 billion dollars in property damage [1] and incited nuclear power plant meltdowns at Fukushima in Japan. In the field of tsunami research, data analysis and hydraulic experiments for tsunami events are challenging because major tsunamis are rare and laboratory installations for hydraulic experiments are exceedingly expensive. For these reasons, many researchers have focused on developing numerical models to predict accurately tsunami heights defined as vertical height from the troughs to crests of tsunamis and run-up heights defined as vertical heights from the datum levels to the highest inundated areas for several decades [2][3][4][5][6]. Recently, artificial intelligence (AI), using big data, has been developed and applied in many science and engineering fields with significant performances. In machine learning, there are two main methods: deterministic and probabilistic methods. The representatives of the deterministic method are Multi-layer perceptron (MLP), convolution neural networks (CNNs), and recurrent neural networks (RNNs) [7]. Meanwhile, probabilistic random forest and Bayesian neural networks (BNNs) can be classified as probabilistic methods [8,9]. In this study, BNNs, which consist of neural networks (NNs) and a Bayesian inference, are proposed. One advantage of a BNN is that it enables prediction under insufficient data. Moreover, it not only considers the relationships between data but rarely causes overfitting problems, which are a major issue in NNs [10]. BNNs include three processes: training, testing, and prediction. Training is performed to search for optimized parameters with given input variables on BNNs. In the testing, so-called model validation, the performance of the trained weights and bias are estimated by applying it to new data that were not part of the training dataset. Finally, predictions are made based on the provided input variables. This study explores the estimation of maximum tsunami heights with BNNs. Jumunjin Port, located on the eastern coast of the Korean Peninsula, was offered to target location. Inundation damage was caused by a tsunami in 1983 [11]. A numerical simulation was performed for two historical tsunamis and virtual tsunamis, which could be triggered by probable undersea earthquake in the future [12,13], to obtain tsunami height profiles and evaluate the maximum tsunami heights. BNNs were employed, and the pairs of tsunami height profiles at the site of the tsunami epicenter and Jumunjin port (37.89 N, 128.83 E) considering two historical tsunami events: tsunami events in 1983 (40.54 N, 139.02 E) and 1993 (42.34 N, 139.25 E), were used as the input variables for training and testing. Finally, BNNs were used to predict the maximum tsunami heights for the virtual tsunamis.

Numerical Simulation
A tsunami that is triggered along the western coast of Japan can travel to the Korean Peninsula. The propagation model for a tsunami traveling across deep water should account for frequency dispersion and the Coriolis force. Thus, the nonlinear convective terms of the momentum equations can be ignored in the numerical simulation because the free surface displacement of a tsunami is relatively lower than the water depth. However, the frequency dispersion effects are influential in distance propagation and should be considered in the model's governing equations because the wave length of a tsunami is relatively short compared to that of the tide and tsunamis are propagated over long distances. Therefore, the governing equations that address the dispersion effect, namely, the linear Boussinesq equations, should be used in the numerical simulation of tsunami propagation. However, the dispersion terms of the linear Boussinesq equations include higher-order derivatives that may cause practical numerical difficulty. To resolve this difficulty, linear shallow-water equations in a set of lower-order derivatives are used for dispersion effects [6,14].

Propagation
For a tsunami propagating far across a deep ocean, a linear shallow-water equation in a set of lower-order derivatives is used. The linear shallow-water equations can be expressed in the form as below, ∂η ∂t where η is the free-surface displacement, h is the mean water depth, g is the acceleration of gravity, and P and Q are the volume fluxes in the x and y directions, respectively, with depth-averaged flow velocities at the still-water level.

Initial Free Surface Displacement
Two historical tsunami events-the Central East Sea tsunami of 1983 and the Hokkaido tsunami in 1993-as well as the virtual tsunamis were employed for the tsunami numerical simulation (Figure 1). The two selected historical tsunamis are representative of tsunamis that have affected Korea in the last 100 years. The virtual tsunamis were selected based on the seismic gap where potential earthquakes could strike in the vertically developed fault zone along the western coast of Japan. Table 1 presents the fault parameters of the historical tsunamis and virtual tsunamis: H is the height at the top of the fault plane, θ is the strike angle, δ is the dip angle, λ is the slip angle, L is the length, W is the width, and D is the dislocation where the plane is displaced ( Figure 2). The source locations, strike angles, heights, dip angles, and slip angles were obtained from the Korean Peninsula Energy Development Organization [12]. The length, width, and dislocation considering the magnitude (M) were calculated according to the equations presented below,

Bayesian Neural Networks
Neural networks are characterized by two main trends: deterministic and probabilistic methods. BNNs, based on the Bayes' theorem [15], utilize a probabilistic method that combines NNs and Bayesian inference. A robust advantage of a BNN is that it offers more information with probability distribution than deterministic method such as MLP or CNNs or RNNs.

Neural Networks
The MLP, which is a fundamental NN, is explained in this section. Given a dataset X, which is an input vector composed of several causal variables, and Y, which is an output vector composed of several resulting variables, the NNs are conducted. Most NNs consist of nodes, links, and layers with input, hidden, and output layers. The input layer receives input variables for the problem to be dealt with. In the output layer, the prediction of values is evaluated through a loss function, and the final output is represented. In the hidden layers, a trial-and-error procedure is performed. Every layer has nodes, and these nodes are connected by links, with synaptic weight assigned to each link to represent the relative connection strength [16]. In mathematics, NNs are based on the parameterized basis function as follows, where φ is the activation function that leads to a nonlinear effect, x represents the input variables, w is the weight, and b is the bias. Tangent hyperbolic functions or sigmoid functions, expressed below, were frequently used in the beginning of research as activation functions; however, ReLU or Selu functions have been developed and widely used more recently ( Figure 3). The loss function is employed at nodes in the output layer and expressed as follows, where N is the number of observations, Y is the output vector, and y is the value predicted by the networks. If the result is not sufficient to objective error after the loss function is evaluated, it is necessary to adjust the weight and bias. The algorithm was introduced as follows [17], where ∆w ij is the adjusted weight and η is the learning rate. One of the most noticeable differences between NNs and BNNs is the method of computing the weight and bias. The BNNs employ a probability distribution for the weight and bias (Figure 4).

Bayesian Inference
Bayesian inference, which has been used in many science and engineering fields, is based on Bayes' theorem. It offers posterior probability based on evidence, prior probability, and marginal likelihood: where X and Y are the datasets, and w is the weight. P(Y|X) is the evidence, P(w) is the prior probability, P(Y|X, w) is the marginal likelihood, and P(w|X, Y) is the posterior probability. In the BNNs, given the data X, Y, and the initial P(w), the evidence, marginal likelihood, and posterior probability can be evaluated for weight at each link in all layers. Meanwhile, it is well known that predicting an analytical solution for posterior probability is typically difficult P(w|X, Y). Therefore, an approximating variational distribution q π (w) is used in this study [18], and an automatic differentiation variational inference (ADVI) scheme that uses a spherical Gaussian without the correlation of parameters was introduced for variational distribution. The goal was to approximate the posterior distribution P(w|X, Y): where X and Y are the datasets, and q * π is the minimum of this optimization objective.

Results
In this study, numerical simulations for historical tsunamis and virtual tsunamis were performed at Jumunjin Port, where a 1983 tsunami caused inundation ( Figure 5). Tsunami height profiles used on BNNs were extracted in the epicenter (G1 or G2) and at the gage point in Jumunjin (G3) for 175 min ( Figure 6).  BNNs were constructed with a combination of NNs and Bayesian inference. First, two nodes for tsunami height and time profiles in the input layer and arbitrary ten nodes in each of the three hidden layers are introduced. Moreover, a tangent hyperbolic function is adopted at the nodes to describe nonlinear effects in NNs. The assumption of Gaussian distribution for weights and bias, expressed in Equation (14), are accepted as prior distribution for links on BNNs [19]. Based on these conditions, BNNs were established to evaluate the maximum tsunami height for virtual tsunamis following three steps: training, testing, and prediction. Prior to estimating the maximum tsunami height, optimization over parameters of Gaussian distribution was carried out owing to effects of parameters of prior distribution as small dataset [10], where µ is mean value and σ 2 expresses standard deviation over tsunami height.

Training
Training plays a key role in BNN prediction. It is well known that normalization can improve performance and contribute to high-speed training of NNs [20]. Therefore, given tsunami height profiles, normalization, expressed below, was performed for tsunami height and time profiles at the epicenter and gage point prior to training: where H and ζ are the original and normalized values, respectively. The training and testing datasets were collected along the arrival of the tsunamis at G3 in Jumunjin Port over historical tsunami events, which was estimated to approximately 1300 sample size for both tsunami in 1983 and in 1993. Following the works of Afshin G. et al. (2018) [21] and Amazon (2020) [22], 80% of the collected datasets were randomly selected for training (Figure 7), and these datasets are used for updating weights and bias on links. Estimated weights and bias at all links were checked for test and prediction processes. Moreover, it is necessary to optimize parameters of Gaussian distribution for maximum tsunami heights. In this study, given training datasets, optimization estimation for parameters over historical tsunamis is conducted beforehand. Figure 8 shows the mean difference fields, which describe the absolute difference between numerical simulation and BNNs according to mean (µ) and standard deviation (σ 2 ) for maximum tsunami height. Figure 8a,b represents tsunamis in 1983. Figure 8a reveals that the minimum of mean difference field is concentrated in the range of 2.6 to 3.5 standard deviation. In this study, through the estimation of each standard deviation where mean difference demonstrates the minimum, the parameters of Gaussian distribution is estimated to 0.0 mean at 2.9 standard deviation as a prior distribution (Figure 8c). Figure 7b,d shows the 1993 tsunami data. Unlike the tsunami in 1983, the minimum mean difference was distributed from 2.0 to 2.6 (Figure 8b), and the parameters are considered for 0.2 and 0.3 mean at 2.5 standard deviations, where the mean difference becomes small (Figure 8d), as a prior distribution to determine the best performance. Based on training datasets and prior distribution for both tsunamis in 1983 and 1993, training is examined with ADVI. Figure 9 shows how ELBO reached a stable state where KL was minimized. In this study, the ELBO approached stable state after 35,000 of the 50,000 over historical tsunami events.

Testing
Testing was performed to determine how training should be conducted. Figures 10 and 11 demonstrate several of the trained weight and bias distribution at links from the input layer to the first hidden layer for the tsunamis in 1983 and 1993. It can confirm that both the weights and bias at each link depict Gaussian-like distributions, and the bias has broader standard deviation bands than weights as a result of training ( Figure 10). Contrary to the tsunami in 1983, it shows that the standard deviation of weights at links expanded wider than the bias for the tsunami in 1993 (Figure 11). Given trained weights and bias, testing is carried out with a rest of 20% of the normalized tsunami height profiles ( Figure 12).   Table 2 shows that the maximum tsunami heights, evaluated over historical tsunami events in BNNs, were compared with the numerical model, and inverted normalized tsunami heights to true tsunami heights before comparison. The tsunami in 1983 was evaluated to be overestimation with +2% bias. In contrast, tsunami in 1993 exhibits non-bias as a mean of 0.2, and it is overestimated little with +2% bias as mean of 0.3 at a standard deviation of 2.5. The case of a 0.2 mean is estimated to be less biased than the 0.3 mean in testing for tsunamis in 1993.

Prediction
Given trained information on weights and bias for historical tsunamis as described in Figures 10 and 11, BNNs are introduced to virtual tsunamis, which are similar to historical tsunamis in terms of magnitude and occurrence locations at the epicenter. Figure 13 shows the tsunami height profiles of the virtual tsunamis at the epicenter, and it was converted to normalized tsunami heights and time as performed in the training process. Afterward, prediction was performed to determine how well the BNNs predicted virtual tsunamis. Table 3 present the results for the maximum tsunami heights of the numerical model and BNNs. In the case of virtual tsunami 1, although the tsunami height of BNNs is underestimated as compared to those of numerical model approximately 0.07 m, it shows a similar tendency to the tsunami height of the numerical model with a 3% difference. Regarding virtual tsunami 2, on the condition of a stationary standard deviation of 2.5, the BNNs model with a mean of 0.3 indicates better performance. The BNNs model with a mean of 0.2, which yielded more accurate results in testing, did not capture the maximum tsunami height, showing a 35% difference. In contrast, although the BNNs model with a mean of 0.3 has minimal bias, which is considered acceptable, it exhibits better performance. The reason why BNNs model with a mean of 0.2 showed worse performance than those with 0.3 mean is the issue of overfitting, which occurs as data fit into the trained dataset well, not considering other datasets [10]. Although overfitting is not a significant concern for BNNs, sensitivity to parameters of prior distribution was observed in this study. In the case of virtual tsunami 2, the 0.3 mean yields the best performance.

Summary and Conclusions
AI has been applied to many science and engineering fields, exhibiting remarkable performance. In this study, we attempted to discern maximum tsunami heights for virtual tsunami at the Jumunjin Port, where the 1983 tsunami caused inundation in some areas. BNNs, combining Bayesian inference and NNs, have the advantage of enabling prediction with a small dataset and were introduced to evaluate maximum tsunami height. Two nodes in the input layer and ten nodes in each of the three hidden layers were constructed with a tangent hyperbolic function to describe nonlinear effects. In addition, variational inference was proposed in Bayesian inference. Tsunami height profiles at the epicenter and gage in port were used as fundamental datasets, and normalization for tsunami height and time was performed in advance. Approximately 80% of the tsunami height profiles were classified for the training dataset, and the remainder of the dataset was applied to the testing dataset. A Gaussian distribution was assumed as a prior distribution in BNNs, and parameters for mean (µ) and standard deviation (σ 2 ) were optimized. Based on this work, the training, testing, and prediction processes were examined. In training, it was determined that posterior probabilities of weights and bias at links on BNNs were well trained for historical tsunami events. Then, good agreement between the numerical model and BNNs for maximum tsunami heights over historical tsunami events were confirmed in testing. Finally, maximum tsunami height over virtual tsunamis were predicted based on trained weights and bias of BNNs. In particular, depending on distribution parameters, different results were obtained such that 0.2 mean meets overfitting, whereas a mean of 0.3 indicated the best performance. Although there were disparities in maximum tsunami heights between the BNNs and numerical model, the BNNs could reasonably predict the maximum tsunami heights for virtual tsunamis.