A Predictive Model for Estimating Damage from Wind Waves during Coastal Storms

In recent years, climate anomalies have been observed globally, and the scale of natural disasters such as typhoons, wind waves, heavy snow, downpours, and storms has increased accordingly. However, compared with other disasters, the timing, location, and severity of damage associated with typhoons and other extreme wind-wave events are difficult to predict. Accurately predicting the extent of damage can reduce its scale by enabling a rapid response. Therefore, in this study, a model to estimate the cost of damage caused by wind waves during coastal storms was developed for the Republic of Korea. The history of wind-wave and typhoon damage in Korean coastal areas was collected from the annual disaster reports (1991–2020), and damage costs were adjusted for inflation to 2020 values. In addition, ocean meteorological data were collected for each wind-wave and typhoon damage event. Using logistic and linear regression, a wind-wave damage prediction model reflecting the characteristics of 74 coastal regions nationwide was developed. The model enables damage forecasting and can be used to improve disaster management laws and policies.


Introduction
In recent years, the incidence of major natural disasters and the resulting damage costs have increased owing to accelerated global warming and more frequent storms. These disasters damage various social assets and, in severe cases, cause human casualties. In countries without disaster prevention facilities or advance response systems, casualties and asset losses are likely to be greater.
Furthermore, ecosystem degradation adversely affects the climate, and coastal areas are becoming increasingly fragile because of changing sea levels and the loss of natural buffers [1]. Approximately 23% of the world's population lives in coastal areas, and 66% of large cities with populations of more than 10 million are located in coastal areas [2][3][4]. Following the 2005 revision of the Natural Disaster Countermeasures Act, storms caused by sea winds and waves were classified as natural disasters, along with heavy rains, typhoons, and earthquakes. However, compared with other natural disasters, very few studies have addressed forecasting storm damage.
Existing studies on predicting storms and storm surges have focused on verifying the efficiency and accuracy of numerical models in predicting surge heights. Kang Si hwan et al. (2004) predicted the storm surge that inundated Masan Bay during Typhoon Maemi by accounting for the effect of typhoon speed in a local storm surge model [5]. Lee Hye Woo et al. (2014) developed a storm prediction model to forecast hazardous weather and measured its accuracy [6]. Lee Seung soo et al. (2014) demonstrated the usefulness of the measurement data provided by the Ocean Research Institute by predicting storms and storm surges with a meteorological-ocean numerical model [7]. Vanem et al. [8] studied applications of extreme value analysis of ocean waves that are considered relevant to coastal and ocean engineering. Fazeres-Ferradosa et al. [9] presented a copula-based approach to obtain the joint cumulative distribution function of the significant wave height and the up-crossing mean period.
The Korea Adaptation Center for Climate Change (2013) analyzed foreign climate change vulnerability assessments and reclassified domestic climate change vulnerability to evaluate the vulnerability of local government areas to climate change [10]. The WAMDI Group (1988) developed WAM, a third-generation ocean wave prediction model [11]. Soomere (2005) conducted a statistical study of wind waves in the Gulf of Tallinn [12]. More recently, Ardhuin et al. (2017) studied the effect of small-scale currents on wind waves [13]. Oliveria et al. [14] found that all European Union guidelines and frameworks are being implemented in Portuguese governmental planning and are well substantiated, whereas the land management instruments (IGT) that form the basis of this planning have a questionable implementation, mainly owing to the number of entities involved and the long implementation process. Taveira-Pinto et al. [15] examined the integrated management and planning of the coastal zones of the CPLP (Community of Portuguese Language Countries). Wu et al. [16] presented a series of large-scale wave flume experiments on scour protection damage around a monopile under combined wave and current conditions, with model scales of 1:16.67 and 1:8.33.
Although various technologies have been developed for predicting wind and waves, research on predicting the extent of damage from past damage history remains insufficient. Therefore, to predict the damage scale and enable an initial response, this study proposes a function that predicts the damage amount using weather factors from the Korea Meteorological Administration and the National Ocean Research Institute together with regional characteristics.

Wind Wave Financial Damage Prediction Model Development Procedure
To develop a wind-wave damage amount prediction model that can predict the extent of storm damage, the study was conducted in five stages, as shown in Figure 1. First, we collected the data used as independent and dependent variables. Second, we divided the data into learning and evaluation sections to assess predictive capacity. Third, we implemented a complex regression model and developed a prototype financial damage prediction model. Fourth, we verified the prototype's accuracy on the evaluation section. Finally, we developed the financial damage prediction model using the entire data set (1994-2020).

Basic Data Building
This study was based on the annual disaster reports and the website of the Ministry of the Interior and Safety. The wind-wave financial damage was set as the dependent variable, and marine weather data (wind speed, wave height, and tide on the date of damage) and social and economic factors (number of fishery workers, shoreline length, and coastal area) were set as the independent variables used to predict it. In addition, a group classification model was developed by dividing the damage amounts into groups with large and small damage, and a damage prediction function was developed for each group.

Classification of Learning and Prediction Sections
To evaluate the predictive ability of the wind-wave damage prediction function, the data were divided into a learning section, used to develop the damage prediction function, and a prediction section, used to verify it.
The learning and prediction sections were divided by analyzing the number of wind-damage cases by year, with 60% of the total number of cases as the dividing point. The learning section at the municipal and provincial levels comprised 252 cases of wind-damage history from 1998 to 2011, and the 126 cases from 2012 to 2020 formed the prediction section.
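The year-based split can be sketched as follows. This is a minimal illustration, assuming each damage case is represented only by its year and that the split year is the first year at which the cumulative case count reaches the stated 60% share; the function name `chronological_split` is hypothetical.

```python
from collections import Counter


def chronological_split(case_years, share=0.6):
    """Split damage cases into learning/prediction sets by year.

    case_years: one entry (the damage year) per wind-wave damage case.
    The split year is the first year whose cumulative case count reaches
    `share` of all cases; cases up to and including it form the learning set.
    Returns (split_year, learning_cases, prediction_cases).
    """
    counts = Counter(case_years)
    total = len(case_years)
    cumulative = 0
    split_year = max(counts)  # fallback: all cases go to the learning set
    for year in sorted(counts):
        cumulative += counts[year]
        if cumulative >= share * total:
            split_year = year
            break
    learning = [y for y in case_years if y <= split_year]
    prediction = [y for y in case_years if y > split_year]
    return split_year, learning, prediction
```

Splitting by year rather than at random keeps the prediction section strictly later than the learning section, so the evaluation mimics forecasting future events.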

Development of Wind Wave Prediction Model Prototype
A complex regression model (CRM) was implemented using the learning-section data. The CRM first classifies each damage case into a damage group using logistic regression and then applies a separate group-specific multiple regression model to predict the amount of damage.

Predictive Performance Evaluation and Final Model Setup
The significance assessment between variables showed no meaningful correlation among them, so each could be used as an independent variable.
In addition, the performance of each model was evaluated against the actual damage, and the model that estimated the amount of damage most accurately was selected as the final model. Classification performance was assessed using the receiver operating characteristic (ROC) curve, and the predictive power for the damage amount within each classified group was assessed using the root mean square error (RMSE) and normalized RMSE (NRMSE).
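A simple screen for correlation among candidate independent variables is sketched below. It computes pairwise Pearson correlation coefficients and flags pairs whose absolute value exceeds a cut-off; the 0.7 threshold, the function names, and the example variable names are hypothetical, and this is a screening heuristic rather than the significance test used in the study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(features, threshold=0.7):
    """Return (name_a, name_b, r) for variable pairs with |r| > threshold.

    features: dict mapping a variable name to its list of values.
    The threshold is a hypothetical cut-off, not taken from the study.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(features.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged
```

Variables returned by `correlated_pairs` would be candidates for removal or combination before fitting the regression models.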

Logistic Regression Model
Logistic regression is a regression method that is used when the dependent variable is binary (when there are two possible values such as failure/success and significant damage/insignificant damage).
Odds, the basis of logistic regression, is the ratio of the probability that Event A occurs to the probability that it does not occur, as shown in Equation (1):
$$\text{odds}(A) = \frac{P(A)}{1 - P(A)} \qquad (1)$$
The closer P(A) is to 1, the higher the odds; if P(A) is 0, the odds are zero. In other words, the greater the odds, the greater the probability of Event A occurring. In this study, a logistic regression model was used to classify the amount of damage: the groups with significant and minimal damage were coded as 1 and 0, respectively, and the optimal group classification was determined based on the ROC curve.
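Logistic regression models the log-odds, log(P(A) / (1 - P(A))), as a linear function of the independent variables. The sketch below fits such a model by batch gradient ascent on the log-likelihood in plain Python; the learning rate, number of epochs, and function names are assumptions for illustration, and in practice a statistical package would normally be used.

```python
import math


def sigmoid(z):
    """Logistic function: converts log-odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds of an event with probability p, as in Equation (1); p must be < 1."""
    return p / (1.0 - p)


def linear_score(beta, row):
    """beta[0] + beta[1]*x1 + ... + beta[p]*xp, i.e. the modelled log-odds."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit log-odds = beta0 + beta1*x1 + ... by batch gradient ascent.

    X: list of feature rows; y: list of 0/1 labels
    (1 = significant damage, 0 = minimal damage).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            # Derivative of the log-likelihood with respect to the score.
            err = label - sigmoid(linear_score(beta, row))
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta, row):
    """Estimated probability that a case belongs to the significant-damage group."""
    return sigmoid(linear_score(beta, row))
```

A case is then assigned to the significant-damage group when `predict_proba` exceeds the chosen probability boundary.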

Multiple Regression Models
A multiple linear regression model is a linear model that predicts a response variable using two or more independent variables. Let the response variable be $y$, the independent variables $X = (x_0, x_1, \ldots, x_p)$ with $x_0 = 1$, and the regression coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_p)$. The response variable is then expressed as a linear combination of the independent variables and the regression coefficients, as shown in Equation (2), where $\beta_0$ is the intercept and $\beta_1, \ldots, \beta_p$ are the regression coefficients of the corresponding independent variables:
$$y = \beta_0 x_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon = X\beta + \varepsilon \qquad (2)$$
Here, $\varepsilon$ is the error not explained by the model; it is assumed to follow a normal distribution with a mean of zero and variance $\sigma^2$. The regression coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are estimated by minimizing the sum of squared errors over the $n$ observations:
$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip} \right)^2$$
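The least-squares estimate that minimizes this error sum can be obtained from the normal equations, (X^T X) beta = X^T y. The following is a minimal pure-Python sketch, assuming a small, well-conditioned design matrix; `fit_ols` and `sse` are hypothetical helpers, not the study's code.

```python
def fit_ols(X, y):
    """Least-squares coefficients [beta0, beta1, ..., betap] for Equation (2).

    A constant column (x0 = 1) is prepended, and the normal equations
    (A^T A) beta = A^T y are solved by Gauss-Jordan elimination with
    partial pivoting.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def sse(beta, X, y):
    """Sum of squared errors, the quantity the estimate minimizes."""
    return sum((yi - beta[0] - sum(b * v for b, v in zip(beta[1:], row))) ** 2
               for row, yi in zip(X, y))
```

For data generated exactly by y = 1 + 2*x1 + 3*x2, `fit_ols` recovers coefficients close to [1, 2, 3] with an error sum near zero.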

Composite Regression Model
The composite regression model used in this study combines the logistic and multiple regression models. Owing to the nature of storm damage, the range of damage varies by season. Therefore, the proposed method first uses the logistic regression model to classify the learning data by damage amount into a group with significant damage and a group with minimal damage, and then develops the damage prediction function by fitting a multiple regression model to each group.
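The two-stage prediction can be sketched as a routing step. This sketch assumes a fitted classifier that returns the probability of significant damage, a probability boundary `p_cut`, and one coefficient vector (intercept first) per group; all names are hypothetical.

```python
def composite_predict(row, classifier, group_models, p_cut):
    """Two-stage damage estimate for one case.

    classifier(row) -> probability that the case has significant damage.
    group_models: {1: beta_significant, 0: beta_minimal}, each a list
    [intercept, coef_1, ..., coef_p] from a group-specific regression.
    Returns (group, predicted_damage).
    """
    group = 1 if classifier(row) >= p_cut else 0
    beta = group_models[group]
    predicted = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return group, predicted
```

Fitting separate regressions per group avoids forcing one linear model to span damage amounts that differ by orders of magnitude.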

Classification Performance Evaluation
In this study, 1 and 0 indicated significant and minimal wind damage, respectively. Classification accuracy was verified using the ROC curve, a two-dimensional plot of sensitivity (true positive rate) against 1 − specificity (false positive rate). To draw the ROC curve, these rates are calculated while the probability boundary (classification threshold) is varied. The closer the curve lies to the upper-left corner, the better the classifier, and the probability boundary that maximizes the sum of sensitivity and specificity is taken as the optimal value.
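The ROC construction and the choice of the optimal probability boundary can be sketched as follows. Here the optimal boundary is taken as the threshold that maximizes sensitivity + specificity (Youden's J statistic), and the area under the curve (AUC) is computed with the trapezoidal rule; this is an illustrative assumption about how the boundary was chosen, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """(threshold, sensitivity, 1 - specificity) for each candidate threshold.

    A case is predicted as significant damage (1) when score >= threshold.
    Thresholds are visited from high to low, so both rates are non-decreasing.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, tp / pos, fp / neg))
    return points


def optimal_cutoff(y_true, scores):
    """Threshold maximizing sensitivity + specificity (Youden's J)."""
    best = max(roc_points(y_true, scores), key=lambda p: p[1] + (1 - p[2]))
    return best[0]


def auc(y_true, scores):
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    pts = [(0.0, 0.0)] + [(fpr, tpr) for _, tpr, fpr in roc_points(y_true, scores)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

An AUC of 1 corresponds to perfect separation of the two damage groups, and 0.5 to random assignment.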

RMSE
The RMSE measures the residuals, i.e., the differences between the values predicted by the model and the actual observed values, and summarizes predictive power in a single measure. The RMSE of the model's predictions $X_{est,i}$ is the square root of the mean squared error, as in Equation (3):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{obs,i} - X_{est,i}\right)^2} \qquad (3)$$
Here, $X_{obs,i}$ is the actual observed value, $X_{est,i}$ is the value predicted by the model, and $n$ is the number of observations.
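Equation (3) translates directly into code; the following minimal sketch assumes the observed and estimated values are aligned lists of equal length, and the function name is hypothetical.

```python
import math


def rmse(observed, estimated):
    """Equation (3): square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)
```

Because the residuals are squared before averaging, a few large misses dominate the RMSE, which is why it is sensitive to overestimated high-damage events.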

NRMSE
The NRMSE is a normalized form of the RMSE that likewise measures the difference between the values predicted by the model and the actual observed values. It has two common forms: the RMSE is normalized either by the range of the observed data, as in Equation (4), or by the mean of the observed data, as in Equation (5):
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{X_{obs,\max} - X_{obs,\min}} \qquad (4)$$
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{X}_{obs}} \qquad (5)$$
In this study, predictive performance was evaluated on data held out from model development: the entire data set was divided into learning and prediction data. Because performance on the evaluation data serves as a proxy for performance on future data, the candidate model with the smallest NRMSE was selected as the final model. The predictive power of the wind-damage prediction function was quantified using the range-normalized NRMSE of Equation (4).
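Both normalizations can be sketched as below. The sketch reports the NRMSE as a percentage, which is an assumption based on how the results in this study are expressed; the function names are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Equation (3): square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def nrmse_range(observed, estimated):
    """Equation (4): RMSE divided by the observed range, in percent."""
    return 100.0 * rmse(observed, estimated) / (max(observed) - min(observed))


def nrmse_mean(observed, estimated):
    """Equation (5): RMSE divided by the observed mean, in percent."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, estimated) / mean_obs
```

The range-normalized form is less affected by a mean that is dragged upward by a few very large damage amounts, which is one reason to prefer it when damage values span a wide range.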

Selection of Research Areas
In this study, damage history by Si (city), Gun (county), and Gu (district) was collected from the disaster yearbook. Among these, the 74 Si, Gun, and Gu in coastal areas were selected as the study areas. Figure 2 shows the study area by sea area.

Classification by Sea Area
Although the disaster compensation list contained 816 storm damage cases in total, only 378 could be analyzed because of the late start of observations by the Korea Meteorological Observatory. Of the 74 coastal cities, counties, and districts, 44 had fewer than five wind-damage cases and only six had more than ten, which was insufficient for developing a damage forecasting function for each region. This study therefore increased the reliability of the wind-damage prediction function by securing larger samples, clustering the study areas by sea area (the East Sea, Yellow Sea, South Sea, and Jeju). Clustering the 74 coastal local governments by sea area yielded groups of 78, 138, and 162 storm-damage cases, the last being the combined South Sea and Jeju cluster (including Jeju-si and Jeju-do); these are listed in Table 1.

Dependent Variable
In this study, the dependent variable was extracted from the annual disaster reports of the Ministry of the Interior and Safety (MOIS). These reports are systematic disaster statistics, and following the enactment and amendment of the Natural Disaster Countermeasures Act, they began to include information on storm damage. To compensate for the short duration of this damage history, additional records that can be regarded as storm damage (a pseudo-storm damage history) were collected for 1994 to 2020. The pseudo-storm damage history covered three types of public facilities and three types of private facilities, selected through a survey of historical typhoon-related storm damage by type of damaged facility (aquaculture, fishing nets, fishing grounds, ships, harbors, and fishing ports) in the disaster compensation records. Finally, the total amount of damage across these damaged facilities was used as the dependent variable.

Independent Variable
The independent variables used in this study were divided into three categories, i.e., ocean weather data, social and economic factors, and the pre-investment factor.
The marine weather data comprised average wave height, maximum wave height, and average wind speed collected from 11 domestic buoys and 9 domestic light stations via the Korea Meteorological Administration's open weather data portal, together with wind speed and tide data from 16 marine observation sites managed by the National Oceanographic Research Institute and from 3 marine science bases, 3 marine observation stations, and 47 tidal stations managed by the Korea Meteorological Administration. For each coastal area, the most suitable station was selected based on the start date of its observations and its distance from the affected area. For the social and economic factors, the area and shoreline length of each coastal region, the number of fishery workers per year, and indices from the National Oceanographic Research Institute's Coastal Disaster Assessment System (CDAS) were used.
The CDAS, described in a report published by the National Oceanographic Research Institute, aims to develop objective and quantitative coastal disaster vulnerability indices for each disaster cause. It establishes a geographic information system (GIS)-based coastal disaster vulnerability assessment system, produces the basic data needed for national disaster response, supports the establishment of coastal policies, and makes all coastal disaster vulnerability information available from a single source. From this system, the coastal disaster exposure index, coastal potential impact index, and coastal disaster sensitivity index were selected and used. The coastal area was obtained from the GIS data provided by the National Geographic Information Service, and the shoreline length is published annually by the National Statistical Office. The number of fishery workers per year was calculated as the number of fishery workers in each city, county, or district relative to its total population in that year.
Next, the pre-investment factor was the restoration cost for each city, county, and district in each year of disaster compensation. Because the restoration cost in the disaster compensation records is calculated from the total damage to the relevant city, county, or district, it should be regarded as the recovery cost for storm damage, and the previous year's value should be used at the time of prediction. The dependent and independent variables, with the variable names used for modeling, are listed in Table 2. In this study, the damage amounts were first classified into groups with a logistic regression model as part of the wind-damage prediction function.
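The rule that restoration costs enter the model with a one-year lag, together with the fishery-worker ratio, can be sketched as a feature-assembly step. The dictionary layout, the default of zero when no restoration cost was recorded in the previous year, the feature order, and all names are hypothetical.

```python
def build_features(region, year, weather, socio, restoration):
    """Assemble one feature row for predicting damage in (region, year).

    weather:     marine weather values observed on the damage date
    socio:       dict with 'workers', 'population', 'shoreline_km', 'area_km2'
    restoration: {(region, year): restoration cost} from compensation records
    """
    # Only the previous year's restoration cost is known at prediction time;
    # zero is assumed when no cost was recorded (hypothetical default).
    prev_restoration = restoration.get((region, year - 1), 0.0)
    # Fishery workers relative to the local population in that year.
    fishery_ratio = socio["workers"] / socio["population"]
    return list(weather) + [
        fishery_ratio,
        socio["shoreline_km"],
        socio["area_km2"],
        prev_restoration,
    ]
```

Using the lagged value prevents the model from seeing restoration spending that would only be known after the damage being predicted.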
The group classification of damage amounts was performed first because the wind-wave damage data showed a difference of approximately 88 billion won between the minimum and maximum damage amounts, which led to overestimated damage when regression analysis was conducted without classifying the damage amounts. Thus, damages in the learning section were classified by scale using the logistic regression model, and the classification accuracy was evaluated at the optimal probability boundary obtained using the area under the ROC curve (AUC). To classify the extent of damage consistently, k-fold cross-validation (k-fold CV) was performed for each parameter to derive the optimal probability boundary (pcut). k-fold CV increases the statistical reliability of a classifier's performance estimate when data are insufficient; k is the number of partitions. Cross-validation was conducted to measure model performance, as shown in Figure 3. In this study, ten-fold cross-validation was performed.
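The cross-validated choice of pcut can be sketched as follows. This sketch assumes that out-of-fold probabilities from the k folds are pooled and that the boundary maximizing Youden's J on those pooled predictions is taken as pcut; the study does not specify this aggregation, and `fit` and `predict` are hypothetical placeholders for any classifier, such as a logistic regression fitter.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle case indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def youden_cutoff(y_true, scores):
    """Threshold maximizing sensitivity + specificity - 1 (Youden's J).

    Assumes both classes (0 and 1) are present in y_true.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for yv, s in zip(y_true, scores) if s >= t and yv == 1)
        tn = sum(1 for yv, s in zip(y_true, scores) if s < t and yv == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t


def cv_cutoff(X, y, fit, predict, k=10, seed=0):
    """Estimate pcut from pooled out-of-fold probabilities.

    fit(X_train, y_train) -> model; predict(model, row) -> probability.
    Each case is scored by a model that never saw it during fitting.
    """
    oof = [0.0] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(model, X[i])
    return youden_cutoff(y, oof)
```

With k = 10, as in this study, each case is scored once by a model trained on the other 90% of the learning data.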

Development of Damage Estimation Model by Group
Based on the probability boundary values determined through logistic regression and ROC curve verification, the learning data were classified into groups with major and minor damage, and a separate multiple regression analysis was conducted for each group. The accuracy of the group damage prediction model prototypes developed in the learning section was evaluated using the RMSE and NRMSE. Verification on the prediction section yielded an RMSE of 1,260,167 and an NRMSE of 6.95%, indicating high accuracy, and the final prediction function was then developed using the entire data period. Table 3 presents the group classification function for the damage cost, together with the regression coefficients and intercepts of the group damage prediction functions.

Verification of Predicted Amount by Wind-Damage Model
Using the wind-damage prediction model developed through the above procedure, the accuracy of the forecasts was verified for four areas in the 2018 wind-damage history that had recovery costs recorded in 2020. First, the independent-variable data and damage amount data for these areas were collected (Table 4); the damage amount was then estimated by group using the previously developed group damage prediction model and compared with the actual damage. The RMSE and NRMSE were 301,269 and 30.7%, respectively. The largest error occurred in Jeju City. This was attributed to the Geomundo Observatory, the station supplying tidal data for Jeju City, being approximately 100 km away, too far to represent the marine climate of Jeju Island.
Although there is a domestic observation station on Chujado Island whose records are closer to the time the damage occurred, it was not suitable for Jeju Island's damage history (three cases from 1999 to 2013). In addition, because Jeju City was clustered and analyzed with the South Sea (Namhae) and Jeju areas, its damage amount was likely overestimated, having been averaged with other areas that experienced significant storm damage (Table 5).

Conclusions
In this study, after collecting wind-damage history and similar wind-damage history data from 1991 to 2020, a storm damage prediction function was developed that reflects local characteristics through the CDAS indices, the number of local households per year, and the area of each locality.
A total of 378 storm damage cases were collected, an average of 5.1 cases for each of the 74 coastal cities and counties, which was deemed insufficient to predict damage for individual cities and counties. Therefore, the data shortage was addressed by clustering by sea area, and the damage prediction function was developed using representative factors for each city and county. The accuracy analysis of the developed wind-damage prediction function yielded an RMSE of 1,260,167 and an NRMSE of 6.95%, which indicates high accuracy. Because the approach overcomes the shortage of disaster statistics by clustering regions while still reflecting regional characteristics, it can contribute to establishing policies tailored to regional characteristics. The proposed model can effectively estimate actual storm damage and can support improvements to disaster management laws and systems and, consequently, to recovery guidelines.
Specifically, by highlighting vulnerable areas that require proactive steps to prevent storm damage, the proposed model can enable the MOIS to establish a comprehensive plan to reduce natural disasters.
However, because the method relies on statistical analysis of disaster history, implementing it requires building a database of continuously collected damage history data; more accurate results are expected if additional data on storm damage alone are collected. In addition, further studies are required to improve observation accuracy for areas where the damage location is far from ocean observation stations.
Data Availability Statement: This study did not report any data.

Conflicts of Interest:
The authors declare no conflict of interest.