Assessment of Building Damage Risk by Natural Disasters in South Korea Using Decision Tree Analysis

Abstract: The purpose of this study is to identify the relationship between weather variables and buildings damaged in natural disasters. We used four datasets on building damage history and 33 weather datasets from 230 regions in South Korea in a decision tree analysis to evaluate the risk of building damage. We generated decision tree models to determine the risk of damage from rain, snowfall, and typhoons (gales were excluded because they caused little damage). Using the weight and limit values of the weather variables derived from the decision tree models, the risk of building damage was assessed for 230 regions in South Korea until 2100. The number of regions at risk of rain damage increased by more than 30% on average. Conversely, regions at risk of damage from snowfall decreased by more than 90%. The regions at risk of typhoons decreased by 57.5% on average, while those at high risk of the same increased by up to 62.5% under RCP 8.5. The results of this study are highly fluid since they are based on the uncertainty of future climate change. However, the study is meaningful because it suggests a new method for assessing disaster risk using weather indices.


Introduction
Since the 1950s, extreme weather changes, such as decreases in cold temperature extremes, increases in warm temperature extremes, increases in high sea levels, and increases in the number of heavy precipitation events [1], have been observed. Such changes in climate are projected to increase the risks to humans and assets in urban areas due to heat stress, storms and extreme precipitation, inland and coastal flooding, landslides, air pollution, drought, water scarcity, sea level rises, and storm surges [2].
The occurrence of natural disasters has trended upward since 1990, although the number of occurrences increases or decreases from year to year [3,4]. On the other hand, the intensity of damage, regardless of the type of disaster, increases every year. Consequently, the number of people directly or indirectly affected by disasters, and the associated costs, are also increasing [5]. According to the annual statistical review of disasters issued by the Centre for Research on the Epidemiology of Disasters (CRED) [6], 342 natural disasters occurred in 2016, which is less than the average from 2006 to 2015 (376.4). Conversely, the number of people reported to have been affected by natural disasters in 2016 (564.4 million) was the highest since 2006, amounting to 1.5 times its annual average (224 million). Furthermore, the costs of damage from natural disasters were reported to be 12% higher (US $154 billion) than the 2006-2014 average. The Asia-Pacific region is the most disaster-prone area in the world [7]. Developing countries in Asia, in particular, are vulnerable to extreme weather events under variable present-day climatic conditions [8]. According to a report from the Economic and Social Commission for Asia and the Pacific [9], 47% of the world's 334 disasters occurred in Asia. In addition, South Asia has accounted for 64% of all deaths in natural disasters. This was caused in part by the fact that developing countries in Asia are vulnerable to extreme weather events due to climate change. In addition, developing countries are increasingly suffering from inadequate infrastructure (e.g., old buildings and unsafe facilities) that cannot survive changes in weather [10]. However, damage from natural disasters can occur at any time, depending on the region and environment. As a result, precautions and countermeasures should be implemented to reduce the magnitude of damage and to prevent unpredictable disasters.
Buildings are the main spaces in which humans live, and in the event of a disaster, building damage or collapse leads to injury and deaths. Therefore, quantitative analysis of building damage caused by natural disasters can help to reduce damage and economic losses in future. Recent studies have attempted to quantitatively analyze the causes and consequences of building damage caused by natural disasters [11].
Blong et al. [12] stated that meteorological perils are more significant than geological hazards, noting that the assessment of damage to residential buildings is the most important because they represent more than half of all constructed space. Although land use regulations are strong in some countries, building regulations are rare. The World Bank suggests that building regulations prepared considering national and regional capabilities would be the most effective tools for reducing chronic disaster risks globally [13]. In Korea, regulations for disaster mitigation are present in building laws. However, most of these are related to reducing rain damage, which does not reflect the risk from various natural disasters. Chmutina et al. [14] presented an efficient alternative for creating building regulations aimed at disaster mitigation in Barbados through interviews with public authorities and private experts. To develop and verify building damage assessment methods applicable to earthquake and storm risks around the world, Chandler et al. [15] set the parameters associated with building structures (e.g., age, height, and occupancy) and proposed an evaluation method using vulnerability curves. Various other data, such as the number of floors and construction type, have also been used to evaluate the impact of disasters on buildings [16]. Yazdani et al. [17] collected data on building damage caused by natural disasters based on a field survey and estimated the cause and magnitude of damage based on the structural characteristics of the buildings. Luo et al. [18] analyzed the relationship between gale speed and building damage to reconstruct a gale field around the damage to a residential building after a typhoon. Based on this work, they suggested that such models could possibly be constructed for other disasters. Pita et al. [19] presented a new approach towards assessing hurricane damage to building interiors using the Florida Public Hurricane Loss Model. 
Finally, Atillah et al. [20] proposed a geographic information system (GIS)-based assessment method to assess building vulnerability and damage in the event of a tsunami in Morocco. Using data on past disasters, they assessed residential buildings and local risks in the Moroccan coastal zone and created a map of building damage.
Various other analytical methods, such as probabilistic analysis [21,22] and Monte Carlo simulation [23][24][25], have also been used to evaluate building damage caused by natural disasters [26]. In recent years, research has used artificial intelligence such as data mining and machine learning to identify and evaluate natural disaster risk factors. These methods present a systematic and automated way to determine statistical rules and patterns from large amounts of data. Other techniques used to study natural disaster risk include artificial neural networks [27][28][29], support vector machines [28,30], and Bayesian network and clustering. Liu et al. [31] also proposed statistical model parameters for compact polarimetric synthetic aperture radar. Finally, Reese et al. [32] assessed building vulnerability using logistic regression for damage data collected during the 2009 South Pacific tsunami and verified the disaster vulnerability factors by building use.
The damage caused by natural disasters to buildings is affected by various factors such as weather conditions, the environment in which the buildings are located, and the structure of the buildings. Easterling et al. [33] argued that if there are identifiable trends in extreme climatic events such as temperature or precipitation, human impacts on climate change are a very important factor in damage caused to buildings from natural disasters. Spekkers et al. [34] investigated the relative contribution of different failure mechanisms to the occurrence of rainstorm damage using a property-level home insurance database of around 3100 water-related damage claims in Rotterdam, the Netherlands, by analyzing the relationship between the mechanisms and the weather variables. They found that the maximum rainfall intensity and rainfall volume are significant predictors of the probability of damage occurring. However, few studies have attempted a quantitative analysis focusing on the weather conditions and building damage.
Therefore, the purpose of this study is to analyze the impact of weather conditions on building damage and evaluate the risk of this occurring in South Korea using weather conditions as variables. It also aims to predict future building damage using climate change scenarios. Decision tree analysis was used to identify the causes of natural disasters and the resultant building damage. It generates decision support rules for various phenomena as a method of machine learning and is useful for evaluating various risks to humans and the environment when the expected utility model indicates a risk that cannot be represented [35]. Many other studies have evaluated the risk of natural disaster using decision tree techniques [36,37]. For example, Guo et al. [36] used classification and regression tree partitioning rules to identify flood hazards in 35 catchments in 10 regions in China. The decision tree technique outperforms other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Moreover, the random forest classification function of decision tree analysis can be executed efficiently for large databases, and it is possible to derive the importance of specific variables among the classifications [37]. Quanlong et al. [38] classified remote sensing images using a random forest classifier as the main method for mapping floods in China and extracted flooded areas with a 94% accuracy.

Overall Methodology
This study used observational weather data, climate change scenarios, and building damage history as the input data. Correlation analysis and decision tree analysis, which is a data mining technique, were used to evaluate the risk of damage to buildings. Decision tree analysis is a useful method for predicting future results using tree models derived from historical data [39]. The spatial units for evaluating and predicting the effects of building damage were based on 230 administrative districts in the Republic of Korea. Each administrative district operates as a separate autonomous region, and disaster management for damage caused by natural disasters is carried out at a regional scale. Therefore, this was an appropriate division for predicting and analyzing the risk of building damage in Korea. The study period was from 2005 to 2014.
This study evaluated building damage caused by natural disasters in three stages (see Figure 1). In the first stage, 33 weather indices and four building damage history indices for the study region and time period were compiled and verified as the input variables.
In the second stage, the decision tree model was established using input variables selected using correlation analysis and the accuracy of the models was determined. Weights and limit values for the input variables were derived from the decision tree model with a low error rate. The former indicates the importance of each variable and the latter represents the minimum limit value of the input variable needed to determine whether a building is damaged. Building damage risk was determined by region using the weights and limit values.
In the third stage, building damage in the 230 regions was compared and analyzed according to climate change scenarios (representative concentration pathways (RCPs) 2.6, 4.5, 6.0, and 8.5) in which the weights and limit values derived from the decision tree model were used to calculate risk.

Decision Tree Analysis
Decision tree analysis is defined as a classification procedure that subdivides a dataset based on a test set defined at each branch or limit value in the tree [40]. This model predicts outputs using given inputs [41] and generates rules for an entire area by repeatedly dividing the area of each variable. Generating rules using decision tree analysis has the advantage of being accessible because of its logical "if-then" format and ease of implementation using simple, structured database languages [42]. The decision tree analysis uses limit values as the criteria for its data classification process.
The Gini impurity in each tree is used to evaluate the relative importance of the input variables and their impact on the model [39], and it indicates whether the data have been appropriately separated [43]. It has a value of 0 when the class proportion p is equal to 0 or 1 and reaches its maximum, following a parabolic curve, at p = 0.5. Overall, the accuracy increases as the Gini impurity value decreases.
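The behavior of the Gini impurity described above can be checked numerically. The following is a minimal sketch for the two-class case, assuming that p denotes the share of one class in a node; the function name is illustrative and does not come from the study.

```python
def gini_impurity(p: float) -> float:
    """Gini impurity of a two-class node, where p is the share of one class."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # 1 minus the sum of squared class shares; algebraically 2 * p * (1 - p).
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)


# The impurity is 0 for a pure node (p = 0 or 1) and peaks at p = 0.5.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, gini_impurity(p))
```

Because the expression reduces to 2p(1 - p), the curve is a downward-opening parabola, which matches the description of a maximum at p = 0.5.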
Using historical data to predict the outcome of the future is one of the most important uses of a decision tree model [39]. In this study, we estimated the risk of building damage up to 2100, deriving weight and limit values for building damage based on weather factors using the method described in Section 2.1. The random forest function, which is an ensemble method using a tree type classifier, was used to derive weights according to each input variable. It partitions each limit value by searching a subset of randomly selected input variables [44]. The accuracy of the random forest technique has been examined in a number of studies that used it to derive the importance of variables [38,45,46]. For example, Gislason et al. [44] compared land cover classifications using random forest and other ensemble methods (bagging and boosting), and showed that the former outperformed other statistical models because it could assess the significance of variables.
This method of datafication generates the criteria used to determine the output variables through the iterative process that is the main feature of the decision tree model. The data collected from the input variables were datafied using the same criteria. Data were divided into "training" and "testing" sets, in a 7:3 ratio, to evaluate the model performance. Overfitting was checked by evaluating the training set using the k-fold cross-validation method. Predictions were made for the testing set using the "predict" function. The weight and limit values of the input variables used to predict future building damage were derived from the decision tree model with the highest accuracy (see Figure 2).
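The data partitioning described above can be sketched as follows. This is an illustration only: the study does not report the value of k, the random seed, or the software routine used, so k = 5, the seed, and the helper names are assumptions.

```python
import random


def train_test_split(rows, train_share=0.7, seed=0):
    """Shuffle the rows and cut them into training and testing sets (7:3 by default)."""
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def k_fold(rows, k=5):
    """Yield (fit, validate) pairs; every row is used for validation exactly once."""
    folds = [rows[i::k] for i in range(k)]   # k interleaved, near-equal folds
    for i in range(k):
        fit = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield fit, folds[i]


records = list(range(230))             # stand-in for one record per region
train, test = train_test_split(records)
print(len(train), len(test))           # 161 69
for fit, validate in k_fold(train, k=5):
    print(len(fit), len(validate))
```

Cross-validation is run on the training set only, so the testing set stays untouched until the final prediction step.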

Study Area
South Korea, located on a peninsula in northeast Asia, was used as the case study for this research. The country is connected to the Asian continent in the north and is surrounded by the North Pacific Ocean (see Figure 3). It has an area of 9,972,000 ha, comprising the mainland and 3330 islands. It is divided by natural barriers, such as mountains and rivers, into three largely distinct regions: the northern, central, and southern regions. South Korea is divided into 17 administrative districts consisting of one special city, six metropolitan cities, and eight provinces.
Korea experiences a unique climate due to seasonal influences and complex terrain. South Korea has four distinct seasons: spring, summer, fall, and winter. Abnormal dryness often occurs in spring, and cold temperatures during this season often affect crops and human lives. In summer, which starts in June, the humidity is very high, heavy flooding and typhoons are frequent (there are three or more typhoons per year), and flood damage is an annual occurrence, especially in the lowlands. The average annual precipitation is 1200-1500 mm in the central region, 1000-1800 mm in the southern region, 1500-1800 mm in Jeju Island, and 1800 mm in the southern region; 5-60% of annual precipitation occurs in summer. The highest daytime temperature exceeds 30 °C, which causes frequent casualties due to urban heatwaves. Fall is the mildest season due to the effects of high pressure systems and there are relatively few natural disasters during this period. Winter is cold and dry due to continental high pressure. However, since the amount of snowfall differs by region, there are large regional differences in human and property damage.

Weather
This study used observational weather data from 2005 to 2014 and climate change scenarios for 2021-2100 (see Table 1). The former were used to evaluate past building damage in Korea and the latter were used to predict future building damage based on data from the past. Both datasets were used to assess building damage caused by natural disasters based on the 230 regions in South Korea. In other words, the building damage for the next 80 years was predicted using evaluations of building damage in Korea in the past 10 years.
The reprocessed climate change scenarios calculated by the Korea Meteorological Administration were used to assess and predict the damage caused by natural disasters in Korea. The intensity and frequency of natural disasters have been increasing, and there is a limit to producing detailed weather data using only a global climate model [47]. Therefore, detailed weather data from regional climate models (RCMs) are essential for assessing regional natural disaster damage. Recent studies have shown that it is necessary to research how to downscale global climate models [48]. Table 2 shows the detailed weather data used in this study, which were calculated using RCMs such as the Hadley Centre Global Environmental Model ver. 3-Atmospheric regional climate model (HadGEM3-RA), the Regional Climate Model ver. 4.0 (RegCM4), the Seoul National University Regional Climate Model (SNURCM), and the Weather Research and Forecasting model ver. 3.4 (WRF) [47].
Figure 4 presents the 230 spatial units of weather data. The weather data used in this study were reproduced from 73 automatic weather stations using objective analysis and interpolation [49] (p. 525). Since climate change scenarios use atmospheric model grid systems of regional climate models, their grid systems, such as Lambert conformal or Mercator projection [50] (p. 519), are unique. In studies related to natural disasters, however, it is more efficient to use a GIS lattice system of administrative districts [50] (p. 526). Moreover, detailed weather data based on regional climate models can only estimate the reliability of future forecasts in a limited way due to errors and uncertainties [51]. Therefore, the weather data used in this study were compared with regional precipitation on 8 July 2005. The automatic and detailed weather data showed the same results; thus, the automatic data were judged to be reliable [52].
The data were generated after bias in the weather observations in the 230 regions was corrected. Since bias correction is based on observational data and modifies weather data, it has the advantage of reducing the uncertainty of the model results [53]. The data were divided into a single disaster index and a complex disaster index. The single disaster index consisted of 26 indices, including precipitation, temperature, gale speed, and snowfall, to predict damage from natural disasters. Korea is particularly vulnerable to typhoons accompanied by heavy flooding and gales [49]. The complex disaster index is based on the precipitation and gale speed on the day of a typhoon and consists of seven indices. These indices have a "©" before them to distinguish them from the single disaster index. All indices use an annual unit, and precipitation days are defined as days when precipitation exceeds 1 mm. "Fresh snow cover" is measured as the height of the snow accumulated in 24 h in the single disaster index. Table 3 lists the weather data indices, and Figure 5 presents the maps of the weather conditions in the study regions from 2005 to 2014. The average values of the precipitation indices "sum_pr" and "max_pr" (see Table 3 for full explanation of the weather index abbreviations) from 2005 to 2014 were 1401.9 mm and 132.4 mm, respectively. High precipitation was observed mainly in the southern and northern regions, in Gyeongnam and Jeonnam Provinces and Seoul and part of Gyeonggi, respectively. The average values for the snowfall indices "max_nsnd" and "sum_nsnd" were 7.7 cm and 28.2 cm, respectively. Snowfall was concentrated mainly in the Jeonnam coastal area and parts of Gangwon Province, which have an average altitude of 500 m above sea level. The average values for the gale indices "max_wind" and "ave_wind567" were 9.9 m/s and 2.0 m/s, respectively. This showed that strong gales were observed mainly in the coastal areas of Jeollanam Province and mountainous areas of Busan and Gangwon Province.
Future short-, mid-, and long-term impacts of climate change were investigated using the RCP 6.0 scenario and the same weather indices. RCP 6.0 assumes that greenhouse gas reduction policies have been implemented to some extent and portrays socio-economic growth greater than that projected in the Special Report on Emissions Scenarios, based on the introduction of new, highly efficient technology as well as a balance of fossil and non-fossil energy sources. The precipitation index in RCP 6.0 used an average "max_pr" of 133.2-160.2 mm (after 2021) and an average "sum_pr" of 1327.4-1397.3 mm (after 2021). Future precipitation was found to increase in the "short-term" to "long-term" period, compared to the past. The snowfall index for the period after 2021 showed an average "sum_nsnd" of 5.9-9.7 cm and a "max_nsnd" of 2.5-3.5 cm, indicating very low snowfall compared to the previous period. There was no significant difference between the past and future gale indices, as indicated by a "max_wind" average of 9.8-10.0 m/s after 2021 and "ave_wind567" values of 2.06-2.08 m/s.

Building Damage History
We used building damage history data from annual disaster reports issued by the government of the Republic of Korea as the dependent variables in our model. These reports record details of buildings, facilities, and loss of lives caused by natural disasters in Korea for one year from 1 January to 31 December. These data, which were considered appropriate to use in an objective comparison of disasters in the study area, were constructed using the same spatiotemporal units as the weather data. The annual disaster reports provide information on flooded areas and damage to buildings, ships, farmland, and public and private facilities. This study used the total sum of buildings listed as missing/destroyed, partially damaged, and flooded after rain, gale, snowfall, and typhoon events from 2005 to 2014.
In total, 87,446 buildings in South Korea incurred rain damage, costing approximately US $90 million, over the 10 years from 2005 to 2014. There were an additional 80 cases of gale damage, 439 cases of snowfall damage, and 15,115 cases of typhoon damage reported. Rain, which accounted for about 84.8% of the total damage, was confirmed as the natural disaster with the most significant effect on buildings. Additional incidents of damage were caused by typhoons (14.7%), snowfall (0.4%), and gales (0.1%). Building damage caused by natural disasters in Korea is divided into "collapsed building" and "flooded building." According to the type of building damage for each natural disaster, about 97.5% of the total damage caused by rain was categorized as "flooded building." More than 80% of the total damage caused by snowfall and gales was categorized as "collapsed building." For typhoons, about 82% and 14.8% of the total damage was described as "flooded building" and "collapsed building," respectively. Figure 6 shows the damage in each region. Spatial characteristics showed that rain damage was concentrated in the southern and northern regions (Seoul, Gyeonggi, and Gangwon), while damage caused by snowfall was concentrated in the coastal areas of South and North Jeolla and Gangwon Provinces. For typhoons, damage was observed mainly in the southern regions (Gyeongnam and South Jeolla Province).
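The percentage shares quoted above follow directly from the four damage counts. The short calculation below reproduces them, treating each reported figure as a count of damaged buildings (an assumption, since the text mixes "buildings" and "cases").

```python
# Damage counts for 2005-2014 as reported in the text.
counts = {"rain": 87_446, "typhoon": 15_115, "snowfall": 439, "gale": 80}

total = sum(counts.values())
# Share of each disaster type in percent, rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)   # 103080
print(shares)  # {'rain': 84.8, 'typhoon': 14.7, 'snowfall': 0.4, 'gale': 0.1}
```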

Selection of Input Variables
A correlation analysis was performed before the decision tree model was constructed, to ensure that weather indices with a high impact were applied as the input variables in the model. The correlations of the 14 precipitation indices of rain damage, 6 snow indices of snowfall damage, 3 wind speed indices of gale damage, and 7 indices (including precipitation and wind speed) of typhoon damage were evaluated. The results are shown in Table 4.
The results showed that 11 precipitation indices, with the exception of "days_wetday," "ave_drydays," and "pnl90," were correlated with rain damage. In addition, all the snowfall indices, with the exception of "days_nsnd50," were highly correlated with snowfall damage. Six of the complex disaster indices, except for "©days_wetday," showed a high correlation with typhoon damage. Of the temperature indices, "days_freez," "ave_maxtemp," and "ave_mintemp" were highly correlated with rain and typhoon damage, rain damage, and typhoon damage, respectively. Conversely, no correlation was observed between gale indices and gale damage, and since there are relatively few mentions of gale damage compared to other natural disasters in the annual disaster reports, the correlation between gale damage and all indicators was confirmed as non-significant. The 14 rain evaluation indices, 5 snowfall evaluation indices, and 8 typhoon evaluation indices with high correlations were selected to determine the damage caused by rain, snowfall, and typhoons, respectively. Consequently, these indices were used to establish the decision tree model.
Table 4 (partial): ... 0.5139 **; ©ave_wind 0.3769 **; ©max_wind 0.3539 **. Significance: * p < 0.05, ** p < 0.01.
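The screening step in this section can be illustrated with a small sketch. The study does not name the correlation coefficient it used and selected indices by significance level; the code below assumes Pearson's r and, as a simplification, keeps indices whose absolute correlation exceeds a hypothetical cutoff (`min_abs_r`). The index values and damage counts are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_indices(indices, damage, min_abs_r=0.3):
    """Keep the names of indices whose |r| with damage reaches the cutoff.

    The cutoff stands in for the significance test used in the study.
    """
    return [name for name, values in indices.items()
            if abs(pearson_r(values, damage)) >= min_abs_r]


# Hypothetical toy data: four regions, two candidate indices.
damage = [1, 2, 3, 4]
indices = {"index_a": [10, 20, 30, 40], "index_b": [1, -1, -1, 1]}
print(screen_indices(indices, damage))  # ['index_a']
```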

Accuracy of the Decision Tree Model
The accuracy of each model was compared and evaluated according to the building damage classification using the indices and variables selected through the correlation analysis detailed in Section 3.1. Decision tree analysis has lower predictive ability when it is used as a regression model with continuous output variables. This disadvantage was overcome by converting the output variables into categories. The "quantile" method, which is the classification method used in ArcMap, was used to set a similar frequency for each category (see Table 5). Risk [A-E] is a value determined by dividing the building damage data by the "quantile" method. Risk [A] is the most dangerous risk, i.e., that which causes the greatest building damage; Risk [E] represents the lowest risk. It serves as a dependent variable in the decision tree model to predict damage to buildings.
Table 6 shows the accuracy and error rates based on the decision tree model. As the number of building damage classes decreased, the model accuracy tended to increase. Among the rain damage prediction models (R-1 to R-4), model R-1 (two categories) showed the highest accuracy of 0.65. Meanwhile, model S-1 (two categories) had the highest accuracy of 0.85 among the snowfall damage prediction models (S-1 to S-4). For the typhoon damage prediction models (T-1 to T-4), model T-1 (two categories) had the highest accuracy of 0.71. The model error rate (out-of-bag estimate of the error rate) also decreased as the number of categories decreased. Thus, the weight and limit values were derived from models R-1, S-1, and T-1 for rain, snowfall, and typhoon damage, respectively, and each had two classifications. The tree size was fixed at the size with the lowest "misclass" value.
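The quantile-based categorization described above can be sketched as follows. This is not ArcMap's implementation: it is a rank-based approximation in which each class receives a similar number of records and the first label (Risk A) is given to the largest damage values. The function name and the damage figures are hypothetical, and ties are broken by input order.

```python
def quantile_classes(values, labels=("A", "B")):
    """Assign one label per value so that classes have similar frequencies.

    The first label goes to the highest values, mirroring Risk A as the
    class with the greatest building damage.
    """
    n, k = len(values), len(labels)
    # Indices of the records ordered from the largest to the smallest value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    classes = [None] * n
    for rank, i in enumerate(order):
        classes[i] = labels[rank * k // n]
    return classes


damage = [10, 0, 3, 7, 1, 5]                 # hypothetical damage counts
print(quantile_classes(damage))              # ['A', 'B', 'B', 'A', 'B', 'A']
print(quantile_classes(damage, labels=("A", "B", "C")))
```

Using two labels corresponds to the two-category models (R-1, S-1, T-1); passing five labels would correspond to the full Risk A-E scale.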

Deriving the Weight and Limit Value from Input Variables
The limit value and weight of the weather factors were derived from the damage caused by each natural disaster scenario using decision tree models R-1, S-1, and T-1, which showed the highest accuracies (see Table 7). The weather factor with the highest impact on building damage caused by rain was "ave_gt80," and the other impacts followed in the order "sum_pr" > "sum_jja_pr" > "px5d" > "max_pr." The impact of the weather factors that affected the risk of building damage from snowfall followed the order "sum_nsnd" > "ave_nsnd" > "max_nsnd." For typhoon damage, the impact on building damage followed the order "©sum_pr" > "©max_gale" > "©ave_mintemp" > "©days_freez" > "©max_pr" > "©ave_wind." In addition, the building limit value and the weight for each index were derived from the decision tree model. The limit value for each model was used as the standard to determine the building damage category (high or low risk) for the output variables during the creation of the decision tree model, and the building limit value was generated for each weather index. Buildings in regions with a limit value lower than the average in 2005-2014 are more likely to suffer damage from natural disasters in the future. In the rain damage model, the indices in this category were "ave_gt80," "sum_pr," "px5d," "max_pr," "pfl90," "pint," and "ave_maxtemp." For the snowfall damage model, the indices "max_nsnd" and "days_nsnd5" had a lower "limit value" than the average. Meanwhile, the "©ave_maxtemp," "©ave_mintemp," and "©ave_wind" indices in the typhoon damage model (model T-1) had a lower value than the average during 2005-2014.
Decision trees are generated based on each node being used to derive the final results, as shown in Figure 7. The final results are the classifications of the dependent variables used in the most accurate models (R-1, T-1, and S-1). In addition, each node holds the characteristic value, or "limit value" (Table 7-(b)), of a weather index, that is, the value of the index that separates Risk A from Risk B. The weights of the variables were derived from the random forest function and measured by how much each variable contributed to the accuracy (mean decrease in accuracy) and to the improvement in node purity (mean decrease in Gini impurity) of the tree model (see Section 2.2).
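To make the notion of a limit value concrete, the sketch below searches a single weather index for the threshold that best separates Risk A from Risk B by minimizing the size-weighted Gini impurity of the two resulting groups. Placing the threshold at the midpoint between neighboring values is a common convention in tree algorithms and is assumed here; the study's software may differ. The index values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_limit(values, labels):
    """Return (limit, weighted_gini) for the best split 'value > limit'.

    Returns None when all values are identical and no split is possible.
    """
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                      # no threshold fits between equal values
        limit = (pairs[i][0] + pairs[i - 1][0]) / 2
        low = [label for _, label in pairs[:i]]
        high = [label for _, label in pairs[i:]]
        score = (len(low) * gini(low) + len(high) * gini(high)) / len(pairs)
        if best is None or score < best[1]:
            best = (limit, score)
    return best


# Hypothetical annual precipitation (mm) and risk class per region.
precip = [100, 120, 150, 200, 220]
risk = ["B", "B", "B", "A", "A"]
print(best_limit(precip, risk))           # (175.0, 0.0)
```

A random forest repeats this kind of search over many trees and random subsets of variables; the accumulated impurity reduction per variable is what the mean decrease in Gini impurity summarizes.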

Risk Assessment Based on Weights and Limit Values
The weights and limit values based on the weather factors that contributed to building damage caused by natural disasters were derived using the decision tree model detailed in Section 3.2. Based on these results, a risk assessment was conducted for the 230 regions in Korea, using climate change scenarios based on the same weather indices, to evaluate the risk of future damage to buildings.
Observational weather data in South Korea were used to assess the past risk of damage to buildings (2005-2014), whereas weather data based on climate change scenarios (RCPs 2.6, 4.5, 6.0, and 8.5) from 2021 to 2100 were applied to predict future risks. For the latter, the overall prediction period was divided into short-term (2021-2040), medium-term (2041-2070), and long-term (2071-2100) timeframes. The weather value was converted according to the weight of each weather index derived from the decision tree models R-1, S-1, and T-1. The weights for the risk index were applied by comparing the weather and limit values. If the weather value was greater than the limit value, the converted weight was applied; if the weather value was less than the limit value, the converted weight was not applied. Risk scores between 0 and 100 points were calculated by summing all the applied weights. In addition, the calculated risk score was classified into five grades using equal interval classification. Regions with a risk index below 20 points were classified as "very low risk," whereas those with 20-40, 40-60, 60-80, and 80-100 points represented "low," "medium," "high," and "extreme" risk, respectively (see Figure 8).
The regions at risk of damage from rain were concentrated in the north (e.g., Seoul and Gyeonggi) and south (e.g., parts of Gyeongnam and Jeonnam). In addition, 41 regions concentrated in the coastal areas of Honam, Gangwon, and Hoseo were at risk of damage from snowfall. Finally, 78 regions located mainly in the southern coastal region were at risk of damage from typhoons.
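The scoring rule described in this section can be written as a short sketch. Several details are assumptions: the weights are rescaled so that they sum to 100 points, a weather value exactly equal to its limit value is treated as not exceeding it, and a score on a grade boundary is assigned to the higher grade. The weights, limit values, and weather values shown are hypothetical and are not taken from Table 7.

```python
GRADES = ["very low", "low", "medium", "high", "extreme"]


def risk_score(weather, limits, weights):
    """Sum the rescaled weights of all indices whose value exceeds its limit."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        if weather[name] > limits[name]:          # equal values are not counted
            score += 100.0 * weight / total_weight
    return score


def risk_grade(score):
    """Map a 0-100 score to one of five equal-interval grades (20 points each)."""
    return GRADES[min(int(score // 20), len(GRADES) - 1)]


# Hypothetical values for one region.
weights = {"sum_pr": 50, "max_pr": 30, "px5d": 20}
limits = {"sum_pr": 1400.0, "max_pr": 130.0, "px5d": 250.0}
weather = {"sum_pr": 1500.0, "max_pr": 120.0, "px5d": 300.0}

score = risk_score(weather, limits, weights)
print(score, risk_grade(score))                   # 70.0 high
```

Running the same function on observed values and on scenario values for each period gives directly comparable scores, since the weights and limit values stay fixed.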
The results of using the same method to predict future damage to buildings (2021-2100) showed an increase in the number of regions at risk of damage from rain under all climate scenarios (RCPs 2.6, 4.5, 6.0, and 8.5). The risk of damage from rain increased in 224 regions compared to the past. Under the climate change scenarios, the increase in precipitation was more prominent than the changes in the other indices. The increased risk of damage from rain in the future is conspicuous when only the weather situation is considered. Fewer than five regions were predicted to be at risk of damage from snowfall in the future, indicating a very low risk. This could be explained by the fact that the values of the snowfall indices "sum_nsnd," "ave_nsnd," and "max_nsnd" tended to be much lower in the future than in the past. Finally, the number of areas at risk of building damage from typhoons fell by half compared to the past, except under RCP 6.0 (in the long-term) and RCP 8.5 (in the short-, medium-, and long-term). The assessment of the risk from typhoons was conducted considering both the precipitation and gale speed indices. Higher precipitation levels were predicted for the future compared to the past, but the "ave_wind" and "max_wind" gale speeds were found to be lower.

Discussion
In this study, the weights and limit values of damage to buildings were derived using weather indices based on past weather observations and building damage history to evaluate the risk of damage to buildings from natural disasters. Furthermore, the derived indices were used to predict and analyze risks of damage to buildings in 230 regions in South Korea, based on climate change scenarios. As a data mining technique, decision tree analysis was used to determine the weights and limit values of damage to buildings according to weather indices. A total of 37 indices were used to construct the input model: 33 weather indices were applied as independent variables, and 4 damage history indices were used as dependent variables.
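To illustrate how a decision tree can yield a limit value for a weather index, the following minimal sketch finds the single threshold that best separates damaged from undamaged cases. The split criterion (Gini impurity), the function names, and the toy data are assumptions made for illustration; the text does not state which tree algorithm or criterion was used for models R-1, S-1, and T-1.

```python
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_limit(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) of the best single split.

    Candidate thresholds are the midpoints between consecutive distinct
    values of the weather index. If all values are identical, no split
    exists and (nan, inf) is returned.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best
```

With hypothetical "sum_pr" values of 900, 1000, 1100, 1400, 1500, and 1600 and damage labels 0, 0, 0, 1, 1, 1, the sketch returns a limit of 1250 with zero impurity. A full tree would repeat this search recursively over all candidate indices.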
In the first stage of this research, a correlation analysis was conducted for each type of natural disaster (rain, gales, snowfall, and typhoons) to select the variables with the highest correlations to building damage. The input variables for the decision tree model were based on rain, snowfall, and typhoon damage; gale damage was excluded because gales caused less building damage.
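The variable screening step can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient and a ranking by absolute value; the text does not specify the correlation measure or the selection cutoff, and the function names and data are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series.

    Convention used here: a constant series has zero correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def rank_indices(indices: Dict[str, List[float]], damage: List[float]) -> List[str]:
    """Weather index names sorted by |correlation| with damage, strongest first."""
    return sorted(indices,
                  key=lambda name: abs(pearson(indices[name], damage)),
                  reverse=True)
```

The highest-ranked indices would then serve as the input variables of the decision tree model for the corresponding disaster type.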
Then, the model with the highest accuracy was selected based on a number of building classifications. The weights according to each weather factor were derived using the most accurate decision tree models, which were R-1, S-1, and T-1. The weather variables with the highest impact on building damage caused by rain followed the order "ave_gt80" > "sum_pr" > "sum_jja_pr" > "px5d" > "max_pr." The impact of building damage caused by snowfall followed the order "sum_nsnd" > "ave_nsnd" > "max_nsnd" > "days_nsnd5." The impact of typhoons on building damage followed the order "sum_pr" > "max_gale" > "ave_min_temp" > "days_freez" > "max_pr" > "ave_wind." Finally, the regions at risk from building damage were predicted for each climate change scenario (RCPs 2.6, 4.5, 6.0, and 8.5). Predictions for multiple time periods (short-, medium-, and long-term) were calculated using the limit value according to the building damage category determined by the decision tree model. The number of regions at risk of rain damage (risk index > 40) increased by over 30% on average compared to the past (2005-2014). Conversely, the risk of snowfall damage decreased by more than 90% relative to the past. There was also an average decrease of 57.5% in areas at risk of typhoon damage under most climate change scenarios, but areas with a high risk of typhoon damage (risk index > 60) increased by more than 60% under RCP 8.5 (see Figure 10). This study also investigated the effects of weather conditions on building damage and predicted the risk of future building damage (2021-2100) in the same regions using a decision tree analysis. The results of this analysis were used to derive the weights and limit values of the weather variables affecting building damage, which are the key data for the risk assessment.
The results support the use of the model presented herein as a reliable disaster management technique given the uncertainty of risks caused by natural disasters. The quantitative data used in this study can be used to prepare for the risks and damage associated with future natural disasters. We further expect that this method can be adopted as a key technique to evaluate sustainable growth in Korea. In addition, this study confirmed the possibility of using decision tree analysis to evaluate the risk of damage to buildings. A more accurate disaster prediction model will be constructed in future studies by expanding the input variables and diversifying the analysis techniques.

Conclusions
The purpose of this study was to evaluate the damage caused to buildings by natural disasters and to propose a methodology to predict future risks of the same. Its results can be applied to Korean national disaster policy in the future, since the risk assessment was carried out based on 230 Korean administrative districts. In addition, it differs from other research in that past and future disaster risks are assessed simultaneously using weather data. However, utilizing these data to predict the risk of disasters involves a variety of restrictive conditions; for example, the spatial unit and index of the data used for the assessment of past and future disaster risks should be the same. This study used recalculated weather condition data [50]. Finally, a data mining technique was used to analyze Korean historical disaster and weather data. This method has recently been used more actively as a risk assessment technique [54-56], but other techniques include field surveys, questionnaires, and statistical analysis. This study uses the decision tree method to efficiently analyze the datasets used. The data collected qualify as "Big Data" because of the volume, variety, and velocity with which they are generated [26]. The use of the decision tree technique to efficiently analyze big data such as that used here distinguishes this study from others.
On the other hand, the future risk index presented as a result of this study cannot be considered objective data, as it uses climate change scenarios with high uncertainty. Risk assessment is somewhat uncertain in that both theoretical work and practice rely on perspectives and principles that could seriously misguide decision-makers [57]. In this study, we used a 20-point interval classification system to convert the final disaster risk score into a risk index. Since classification methods such as quantile classification [58] and natural breaks classification [59] should be applied differently according to the kind of data and its intended use, the accuracy of the results of the classification method used in this study is limited.
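The sensitivity of the risk grade to the classification method can be illustrated with a minimal sketch that compares equal interval classification with a simple rank-based quantile classification. The function names and the scores are hypothetical, and the quantile rule shown here (equal-sized classes by rank, with ties broken by input order) is only one of several possible definitions.

```python
from typing import List


def equal_interval_grade(score: float, n_classes: int = 5,
                         lo: float = 0.0, hi: float = 100.0) -> int:
    """Grade 0..n_classes-1 using classes of equal width over [lo, hi]."""
    width = (hi - lo) / n_classes
    return min(int((score - lo) // width), n_classes - 1)


def quantile_grades(scores: List[float], n_classes: int = 5) -> List[int]:
    """Grade 0..n_classes-1 so that each class holds about the same number
    of regions. Tied scores are split by their order in the input."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    grades = [0] * len(scores)
    for rank, i in enumerate(order):
        grades[i] = rank * n_classes // len(scores)
    return grades
```

For ten hypothetical regions with scores 5, 10, 12, 15, 18, 22, 25, 30, 35, and 90, equal interval classification places five regions in the lowest grade and only one in the highest, whereas the quantile rule places two regions in every grade. The same scores can therefore produce quite different risk maps depending on the method chosen.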
Many studies have considered buildings, the environment, and other urban facilities as risk factors with which to assess the risk of natural disasters [60]. However, only weather conditions were taken into account in this study, and these do not determine the entirety of the damage caused to buildings during a disaster. If additional data on the urban environment (e.g., height, slope, and land type) and the conditions of the buildings (e.g., structure, materials, and age) were used in the model, it would be possible to evaluate the integrated risk of disaster [42]. Unlike meteorological data, however, such data on the built environment are difficult to quantify for the future. Scenarios of the future built environment would complement this study. More reliable risk assessment results could be obtained if objective examinations of the various disaster risk factors in each country and region were carried out in a manner similar to the compilation of the weather conditions used in this study.