Seismic Vulnerability Assessment and Mapping of Gyeongju, South Korea Using Frequency Ratio, Decision Tree, and Random Forest

The main purpose of this study was to compare the prediction accuracies of various seismic vulnerability assessment and mapping methods. We applied the frequency ratio (FR), decision tree (DT), and random forest (RF) methods to seismic data for Gyeongju, South Korea. A magnitude 5.8 earthquake occurred in Gyeongju on 12 September 2016. Buildings damaged during the earthquake were used as dependent variables, and 18 sub-indicators related to seismic vulnerability were used as independent variables. Seismic data were used to construct a model for each method, and the models’ results and prediction accuracies were validated using receiver operating characteristic (ROC) curves. The success rates of the FR, DT, and RF models were 0.661, 0.899, and 1.000, and their prediction rates were 0.655, 0.851, and 0.949, respectively. The importance of each indicator was determined, and the peak ground acceleration (PGA) and distance to epicenter were found to have the greatest impact on seismic vulnerability in the DT and RF models. The constructed models were applied to all buildings in Gyeongju to derive prediction values, which were then normalized to between 0 and 1 and divided into five classes at equal intervals to create seismic vulnerability maps. An analysis of the class distribution of building damage in each of the 23 administrative districts showed that district 15 (Wolseong) was the most vulnerable area and districts 2 (Gangdong), 18 (Yangbuk), and 23 (Yangnam) were the safest areas.
[...] building age, construction materials, building density, number of floors, elderly population, child population, population density, and distances from hospitals, fire stations, police stations, roads, and gas stations) as independent variables and buildings damaged in the 2016 Gyeongju Earthquake as dependent variables.
Epicenter distance and PGA were found to be the most important factors in the DT and RF models, and factors related to construction materials were the least important. These results may be used as reference data for models based on other methodologies. Model accuracy (success and prediction rates) was verified using ROC curves; the RF and FR models exhibited the highest and lowest performance, respectively, indicating that the machine-learning-based model is more suitable for seismic vulnerability assessment. Dangerous and safe areas were identified based on the seismic vulnerability maps created using the three models; in all three maps, district 15 was found to be the most dangerous area and districts 2, 18, and 23 were the safest areas. Therefore, district 15 must be managed first in preparation for future earthquakes. The seismic vulnerability maps created in this study facilitate intuitive identification of dangerous districts within the target area, which will prevent greater damage in future earthquakes through the establishment of evacuation routes for residents. As reference data, our findings may be used for developing earthquake-related policies and determining suitable locations for vulnerable infrastructure (e.g., pipelines or high-voltage facilities), as well as important national facilities (e.g., airports, military facilities, and nuclear power plants).


Introduction
An ML 5.8 earthquake occurred 8.7 km south-southwest of Gyeongju, South Korea (35° [...] [3], the Gyeongju Earthquake was the largest earthquake among those recorded by the domestic seismic observation network; it consisted of a shock wave with concentrated energy, in which strong ground motion lasted for only 1-2 s, 15 km beneath the surface. Due to these characteristics, the initial reporting indicated that the earthquake did not significantly damage structures; however, it resulted in 5368 damaged properties, 111 victims, and 23 injured people. The Gyeongju Earthquake represented a new disaster type that provoked a number of economic and social problems and revealed the limitations of established countermeasures. This disaster also made it impossible to rule out the possibility of similar earthquakes in the future, highlighting the importance of precautions to prevent greater losses.
The Korean Peninsula is located within the Eurasian plate and, therefore, has a lower earthquake occurrence frequency and longer recurrence period than countries located at the plate boundary. The geological structures of the Korean Peninsula include weak crust and many fault structures, which have led to increased earthquake occurrence frequency in recent years, partly because the peninsula is affected by earthquakes occurring in neighboring China and Japan [3]. The 2016 Gyeongju Earthquake occurred approximately five months after the occurrence of the ML 7.3 Kumamoto Earthquake on 16 April 2016, which followed the ML 9.0 Great East Japan Earthquake on 11 March 2011. Earthquake occurrence frequency and size are increasing globally; according to a UN report, disasters related to earthquakes and volcanoes accounted for approximately 10% of the natural disasters that occurred from 1998 to 2017 [4]. Although the proportion of earthquakes is low compared to those of other natural disasters, economic damage caused by earthquakes represented approximately 23% of that of total natural disasters and 56% of total human casualties during the same period. Property damage caused by domestic earthquakes totaled approximately 9.5 million USD in 2016 and 70.6 million USD in 2017, representing significant national losses. Despite continuous damage from earthquakes, it remains impossible to predict earthquake occurrence accurately or to control natural disasters artificially. However, it is possible to minimize damage by predicting areas vulnerable to earthquakes and potential damage, establishing policies suitable for such areas, and performing sustainable preparation in advance.
Seismic vulnerability assessment involves the comprehensive evaluation of factors that affect risks associated with earthquakes within predefined areas. Urban areas are at higher risk of seismic disasters than outlying areas due to their higher building and infrastructure density and larger population. Therefore, in assessing seismic vulnerability, it is essential to select suitable influential factors and methods for the area of interest. Several methodologies have been applied for seismic vulnerability assessment and mapping during the past few decades.
Seismic vulnerability assessment studies commonly analyze case studies using a combination of multi-criteria decision-making (MCDM) and geographic information system (GIS) approaches [5][6][7]. Among these, the analytical hierarchy process (AHP) is one of the most widely known MCDM methodologies; it stratifies and quantifies the importance of each applied influential factor to determine its relative importance, and assesses vulnerability by applying weights to all factors [8][9][10][11][12]. However, this method can be subjective because the opinion of the researcher can affect the weight assignment process; therefore, it is somewhat unsuitable for objective assessment. To address this problem, recent studies have applied hybrid models that combine various methodologies [13][14][15][16][17]. Lee et al. (2019) [16] developed the GIS-based Seismic-Related Vulnerability Calculation Software (SEVUCAS) for seismic vulnerability assessment, which includes a stepwise weight assessment ratio analysis (SWARA), radial basis function (RBF), and teaching-learning-based optimization (TLBO) methods. SEVUCAS provided reliable results by assigning the weights of main indicators and sub-indicators using SWARA and interpolation methods based on RBF and TLBO to reduce the effects of weights with significant variation at the boundary of each class for each factor. Yariyan et al. (2020) [17] constructed a hybrid model by integrating different decision support systems to increase the accuracy of seismic vulnerability mapping. Using this model, seismic vulnerability maps were created based on multiple-criteria decision analysis-multi-criteria evaluation (MCDA-MCE) and MCDA-fuzzy models to construct training datasets, and training points were randomly selected. The MCDA-MCE and MCDA-fuzzy models were found to have 0.85 and 0.80 model accuracy, respectively. Based on two training datasets, MCE-logistic regression (LR) (0.90) and fuzzy-LR (0.85) hybrid models were constructed. 
The accuracy of the resulting seismic vulnerability maps was found to be directly related to that of the training datasets.
Many recent studies related to seismic vulnerability assessment and mapping have been conducted using machine learning techniques [12][18][19][20][21]. For example, Han et al. (2019) [20] used a logistic regression (LR) model and applied the support vector machine (SVM) methodology to four kernel models (linear, polynomial, radial basis function, and sigmoid) to derive a suitable model for seismic vulnerability assessment. This study was notable in that the results of several seismic vulnerability models were compared analytically; such analyses are rarely conducted in this field, despite the broad application of machine learning techniques in recent years.
Tree-based machine learning methodologies have mainly been applied in seismic vulnerability studies for parameter evaluation [45][46][47]. For other natural disasters, these methodologies have also been used to determine the relative influence of seismic parameters on the model results.
The objective of this study was to assess the seismic vulnerability of all buildings in Gyeongju, South Korea, and to create maps using these data. We applied FR, a probabilistic technique, and DT and RF, which are tree-based machine learning techniques, to construct models using 18 sub-indicators related to geotechnical, physical, structural, social, and capacity indicators as independent variables and building damage location data collected after the 2016 Gyeongju Earthquake as dependent variables. Model performance was verified using receiver operating characteristic (ROC) curves. The results were compared and analyzed to identify models suitable for seismic vulnerability assessment and mapping and to evaluate the importance of each factor for each methodology. Finally, dangerous and safe areas were identified in each of the 23 administrative districts by creating maps based on the model with the highest accuracy for each methodology, and the results were assessed (Figure 1).

Study Area
The target area of this study was the city of Gyeongju, Gyeongsangbuk-do, South Korea (35°39'-36°04'N, 128°58'-129°31'E). Gyeongju is in the southeastern part of the Korean Peninsula; it has a population of 254,853 and an area of 1324.82 km², and consists of 23 administrative districts (Figure 2). Within the total area, agriculture and forestry account for 42.36%, followed by green areas (31.04%) and other areas (26.6%) [48].
Sustainability 2020, , x FOR PEER REVIEW 4 of 22

Several faults, including the Dongrae, Moryang, Miryang, Ulsan, and Yangsan faults, are distributed within the study area [50], and the Wolseong, Saeul, and Kori nuclear power plants are located along the nearby coastline to the southeast. Due to these regional and geographic characteristics, the probability of earthquake occurrence in this region is considered to be relatively high, and secondary damage in the event of an earthquake with medium or higher magnitude constitutes an unusually high risk. In 2019, 957 earthquakes with magnitudes of less than 2.0 occurred in the Korean Peninsula; among these, 260 earthquakes (27.17%) occurred in the Gyeongsangbuk-do area (including Daegu) [49]. Among the 88 earthquakes of magnitude 2.0 or higher, 23 (26.17%) occurred in the same area. Earthquakes of various sizes have continued to occur since the 2016 Gyeongju Earthquake. Therefore, sustainable preparation and management planning for such events is required.

Data
We selected factors affecting seismic vulnerability based on the results of a previous study, taking into consideration applicability and practicality [51]. The main indicators were geotechnical, physical, structural, social, and capacity indicators; we selected a total of 18 sub-indicators corresponding to these categories. Geotechnical sub-indicators included slope, altitude, and groundwater level; physical sub-indicators included peak ground acceleration (PGA), epicenter distance, and fault distance; structural sub-indicators included building age, construction materials, building density, and number of floors; social sub-indicators included elderly population (≥65 years), child population (<15 years), and population density; and capacity sub-indicators included distances from hospitals, fire stations, police stations, roads, and gas stations. Sub-indicators were organized into a raster-based spatial database (10 m spatial resolution) and applied to all buildings in Gyeongju as independent variables (Figure 3).

Figure 3. Sub-indicators used in this study: [...] (e) distance from epicenters, (f) distance from faults, (g) density of buildings, (h) construction materials, (i) age of buildings, (j) number of floors, (k) child population, (l) elderly population, (m) population density, (n) distance from hospitals, (o) distance from fire stations, (p) distance from police stations, (q) distance from roads, and (r) distance from gas stations.
We used the 3896 buildings damaged during the 2016 Gyeongju Earthquake as dependent variables. The corresponding building polygons were converted into cells (10 × 10 m spatial resolution) for a total of 9847 cells. Among these cells, 70% (6893) were used as a training dataset to create the models and 30% (2954) were used as a test dataset. We extracted the same number of cells corresponding to undamaged buildings. All cells were randomly sampled, and the accuracy of each model was calculated based on the final training (13,786) and test datasets (5908).
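The sampling arithmetic described above can be checked with a short sketch. It is only an illustration under stated assumptions: the function name `balanced_split`, the integer cell IDs, and the random seed are hypothetical, and the study's actual sampling procedure may differ in detail.

```python
import random


def balanced_split(damaged_cells, undamaged_cells, train_frac=0.7, seed=0):
    """Split damaged cells 70/30 and pair each part with as many undamaged cells.

    Returns two lists of (cell_id, label) tuples, where label 1 means damaged
    and label 0 means undamaged.
    """
    rng = random.Random(seed)
    damaged = list(damaged_cells)
    rng.shuffle(damaged)
    n_train = round(len(damaged) * train_frac)

    # Draw the same number of undamaged cells as there are damaged cells.
    negatives = rng.sample(list(undamaged_cells), len(damaged))

    train = [(c, 1) for c in damaged[:n_train]] + [(c, 0) for c in negatives[:n_train]]
    test = [(c, 1) for c in damaged[n_train:]] + [(c, 0) for c in negatives[n_train:]]
    return train, test


# Hypothetical IDs: 9847 damaged cells and a larger pool of undamaged cells.
train_set, test_set = balanced_split(range(9847), range(100000, 130000))
print(len(train_set), len(test_set))
```

With 9847 damaged cells, the 70% share rounds to 6893 cells, so the balanced training and test sets contain 2 × 6893 and 2 × 2954 cells, which matches the totals reported in the text.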

FR Model
The FR model is a probabilistic model used to determine the influence of each factor by analyzing the correlations between seismic vulnerability and earthquake-related factors. The FR model easily classifies the influence factors associated with the largest numbers of accidents during a disaster [52]. FR > 1 indicates strong correlation between seismic vulnerability and the factor class, whereas FR < 1 indicates weak correlation. FR is calculated as follows [53]:
$$\mathrm{FR} = \frac{\mathrm{TGFC} / \mathrm{WTG}}{\mathrm{FC} / \mathrm{WG}}$$
where TGFC is the training grid of the factor class, WTG is the whole training grid, FC is the factor class grid, and WG is the whole grid. In this study, WTG represents the number of cells corresponding to damaged buildings, TGFC is the number of cells corresponding to damaged buildings in the corresponding class, WG is the number of cells corresponding to all buildings, and FC is the number of cells corresponding to the buildings of the corresponding class. After FR values are calculated for each class of the 18 factors and applied to the grid format of each factor, they are superimposed to create the final seismic vulnerability maps.
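As a minimal sketch of the FR calculation, the following computes the ratio for each class of a single factor from per-cell class labels and damage flags. The function name `frequency_ratio`, the class labels, and the toy data are hypothetical and serve only to illustrate the four cell counts (TGFC, WTG, FC, and WG) defined above.

```python
from collections import Counter


def frequency_ratio(classes, damaged):
    """Return {class: FR} for one factor.

    classes: class label of each building cell.
    damaged: True/False (or 1/0) flag of each building cell.
    """
    wg = len(classes)                      # WG: all building cells
    wtg = sum(1 for d in damaged if d)     # WTG: all damaged cells
    if wg == 0 or wtg == 0:
        raise ValueError("need at least one cell and one damaged cell")
    fc = Counter(classes)                  # FC: cells per class
    tgfc = Counter(c for c, d in zip(classes, damaged) if d)  # TGFC per class
    return {c: (tgfc[c] / wtg) / (fc[c] / wg) for c in fc}


# Toy example: 10 cells in two hypothetical epicenter-distance classes.
toy_classes = ["near"] * 4 + ["far"] * 6
toy_damaged = [True, True, True, False] + [True] + [False] * 5
fr = frequency_ratio(toy_classes, toy_damaged)
print(fr)
```

In this toy case the "near" class holds 75% of the damaged cells but only 40% of all cells, so its FR exceeds 1, whereas the "far" class has FR below 1.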

DT Model
The DT model uses hierarchical structures to find structural patterns in data for the purpose of constructing decision-making rules to estimate the relationships between independent and dependent variables [54]. The DT model consists of three types of nodes: a root node (all data) located at the top, a set of internal nodes (splits), and a set of terminal nodes (leaves) located at the bottom. Pruning is performed from the top of the tree to its bottom until the terminal nodes are reached [55].
Four main algorithms are used to construct DTs: a classification and regression tree (CART), chi-square automatic interaction detector DT (CHAID), ID3, and C4.5 [56]. In this study, we constructed a regression tree model for seismic vulnerability assessment based on a CART algorithm developed by Breiman et al. (1984) [57]. CART is among the most widely known DT algorithms; it minimizes variance through binary recursive partitioning of the branches of a regression tree [58,59]. In this process, CART repeatedly creates two sub-nodes by partitioning a subset of the data using all predictors; its final goal is to create an optimal tree among several candidate trees [60].
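The variance-minimizing binary partitioning at the heart of a regression tree can be illustrated for a single predictor. This is a simplified sketch, not the rpart implementation: the function names, the `minbucket` argument as used here, and the toy PGA values are assumptions made for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, minbucket=1):
    """Find the threshold on x that minimizes the summed SSE of y in both children.

    Returns (total_sse, threshold), or None if no valid split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    # Each child must keep at least `minbucket` observations.
    for i in range(minbucket, len(pairs) - minbucket + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or total < best[0]:
            best = (total, threshold)
    return best


# Hypothetical PGA values (g) and damage labels (1 = damaged).
pga = [0.05, 0.10, 0.15, 0.30, 0.35, 0.40]
damage = [0, 0, 0, 1, 1, 1]
split_sse, split_threshold = best_split(pga, damage)
print(split_sse, split_threshold)
```

A full CART tree applies this search over all predictors at each node and recurses on the two resulting subsets until stopping rules such as a minimum node size are met.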
In this study, we applied the DT model using the rpart package of the RStudio software (ver. 3.6.0), which creates an optimal model by adjusting the values of representative parameters, i.e., minsplit, minbucket, maxdepth, and cp. Minsplit is the minimum number of observations available at the node for splitting attempts, and minbucket is the minimum number of observations at all terminal nodes. As the minbucket value decreases, the tree becomes more detailed, thereby increasing the complexity of the model and increasing the prediction rate. Maxdepth is the maximum depth of the tree; if its value is 1, then a redundant column is not used as a node, whereas if it is 2 or greater, then redundant column nodes are allowed, increasing the complexity of the model. The cp value is a complexity parameter, and has values between zero and 1. As the cp value decreases, the size of the tree increases [31].

RF Model
RF [61] is a powerful ensemble algorithm that exhibits excellent performance; it has a wide variety of applications, including classification, regression, and unsupervised learning [60,62]. RF creates a binary tree by randomly selecting the training data of variables selected at each node based on a bootstrap sample, and constructs a DT for final prediction [63]. The tree inducer then selects the optimal data by randomly sampling an attribute subset instead of performing optimal partitioning; this process is an improved version of the bagging method, which forms a random DT at each iteration [64].
The regression algorithm of RF, which was used in this study, calculates estimates of the dependent variables using the average of the results. RF is suitable for analyzing hierarchical interactions and nonlinearity among large datasets because it does not require assumptions about the relationships between explanatory and response variables [65].
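The averaging step can be sketched with a deliberately simplified forest. The assumptions here are significant: each "tree" is a one-split stump fitted on a bootstrap sample using one randomly chosen predictor, whereas the randomForest package grows full trees and samples a predictor subset at every node. All function names and the toy data are hypothetical.

```python
import random


def fit_stump(rows, targets, feature):
    """Fit a one-split regression stump on a single feature (least squares)."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][feature])
    mean_all = sum(targets) / len(targets)
    best = (float("inf"), None, mean_all, mean_all)
    for k in range(1, len(order)):
        lo, hi = rows[order[k - 1]][feature], rows[order[k]][feature]
        if lo == hi:
            continue  # no threshold fits between identical values
        left = [targets[i] for i in order[:k]]
        right = [targets[i] for i in order[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
        if err < best[0]:
            best = (err, (lo + hi) / 2, ml, mr)
    _, threshold, ml, mr = best
    return feature, threshold, ml, mr


def predict_stump(stump, row):
    feature, threshold, ml, mr = stump
    if threshold is None:  # no split was possible; fall back to the mean
        return ml
    return ml if row[feature] <= threshold else mr


def fit_forest(rows, targets, ntree=50, seed=0):
    """Fit `ntree` stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)         # simplification: one feature per stump
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Regression prediction: the average of the individual predictions."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Toy data: two hypothetical predictors and a 0/1 damage target.
X = [(0.1, 1), (0.2, 2), (0.3, 3), (0.7, 7), (0.8, 8), (0.9, 9)]
y = [0, 0, 0, 1, 1, 1]
forest = fit_forest(X, y)
low, high = predict_forest(forest, (0.1, 1)), predict_forest(forest, (0.9, 9))
print(low, high)
```

Because the output is an average over trees, the prediction is a continuous value between the smallest and largest target, which is what allows it to be normalized and mapped into risk classes later.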
In this study, we applied the RF model using the randomForest package of the RStudio software (ver. 3.6.0). To create the model, we defined four parameters: the number of trees (ntree), the number of variables to be used at each node (mtry), the maximum number of terminal nodes (maxnodes), and the minimum size of terminal nodes (nodesize), which controls the depth of the tree. Although increasing ntree does not guarantee an increase in model accuracy, several ntree values must be tested to find a value sufficiently high for the error to converge [66]. If maxnodes is not given, the tree grows to its maximum size; nodesize is the minimum number of observations in a terminal node, and a small value creates a deep tree.

Assessment of Model Performance
Based on the training and test datasets, the performance of the three models was verified using statistical measures. Predictions are classified into four categories, depending on how well they match the actual damaged buildings: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP and TN are pixels that are correctly classified as damaged and undamaged buildings, respectively, and FP and FN are pixels whose classification is the opposite of their actual state. These are used to calculate the following statistical metrics: sensitivity (also referred to as recall) is the proportion of damaged building pixels correctly classified; specificity is the proportion of undamaged building pixels correctly classified; precision is the positive predictive value, i.e., the proportion of actual damaged building pixels among those classified as damaged by the model; accuracy is the proportion of correctly classified damaged and undamaged pixels; and the F1-score is the harmonic mean of precision and sensitivity [23,31,42]. The statistical indices are calculated as follows:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$
In addition, the models created based on the three methodologies were verified using the receiver operating characteristic (ROC) curve method. This method evaluates overall model performance by calculating the area under the ROC curve (AUC). The AUC can be classified as follows: excellent (0.9-1), very good (0.8-0.9), good (0.7-0.8), average (0.6-0.7), and poor (0.5-0.6) [28]. The y-axis of the ROC curve graph represents sensitivity, or the true positive rate. The x-axis represents 1-specificity, or the false positive rate.
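The indices described above can be computed directly from the four confusion-matrix counts, and the AUC can be obtained from its rank interpretation (the probability that a randomly chosen damaged pixel receives a higher score than a randomly chosen undamaged pixel). The sketch below is a plain-Python illustration with hypothetical function names and toy numbers; the study itself used IBM SPSS for the ROC analysis.

```python
def classification_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, precision, accuracy, and F1 from the four counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy counts and scores, not values from the study.
m = classification_metrics(tp=8, tn=6, fp=4, fn=2)
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(m, toy_auc)
```

The pairwise AUC is quadratic in the number of pixels, so for datasets of the size used here a rank-based or library implementation would be preferable in practice.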

Model Validation and Comparison
The performances of the models were compared using the statistical metrics (Table 1). For the training dataset, the RF model scored 1.000 on all statistical indices, the best performance among the three models. The DT model outperformed the FR model on most indices: specificity (DT 0.842, FR 0.415), precision (DT 0.838, FR 0.583), accuracy (DT 0.828, FR 0.616), and F1-score (DT 0.826, FR 0.584); for sensitivity, FR (0.816) was slightly higher than DT (0.814). The prediction ability of the three models was then verified using the test dataset. On most statistical indices, RF scored highest, followed by DT and FR. Specificity was 0.883 for RF, 0.810 for DT, and 0.318 for FR, indicating that the FR model classified undamaged buildings poorly. Precision was 0.881 for RF, 0.801 for DT, and 0.561 for FR; thus, the positive predictions of the RF model best matched the actual damaged buildings. Accuracy was 0.872 for RF, 0.787 for DT, and 0.594 for FR; the RF model was the best at classifying damaged and undamaged buildings. The F1-score, which combines precision and sensitivity, was 0.881 for RF, 0.782 for DT, and 0.616 for FR. Sensitivity was highest for FR (0.871), which was the best at identifying actual damaged buildings, followed by RF (0.862) and DT (0.764). These results indicate that the RF model is the most suitable for predicting damaged buildings.
The performance of each model was verified by calculating success and prediction rates from ROC curves (Figures 4 and 5). The success rate measures how well the model fits the training data, and the prediction rate measures how well the model predicts building damage in the test data. We verified the accuracy of all models using the IBM SPSS software (ver. 25). The FR model exhibited a success rate of 0.661 and a prediction rate of 0.655. The optimal DT model was obtained by adjusting the minsplit, minbucket, maxdepth, and cp values based on the training dataset; it was created at minsplit, minbucket, maxdepth, and cp values of 20, 7, 30, and 0.001, respectively, and its success and prediction rates were 0.899 and 0.851, respectively. The optimal RF model was likewise obtained by adjusting the ntree and mtry values based on the training dataset; the highest accuracy was observed at an ntree of 8000 and mtry of 6. The RF model showed the highest performance among the three methodologies, with a success rate of 1.000 and prediction rate of 0.949. The validation based on the statistical indices and ROC curves confirmed that the RF model is the most suitable model for the training and test datasets.

Relative Importance of Factors
After deriving the optimal model for each methodology, we determined the relative importance of the 18 sub-indicators. First, for the FR model, each factor was divided into six classes using the natural breaks method to identify the class with the largest impact on seismic vulnerability. FR values were calculated for the sub-indicators in each class based on the number of pixels corresponding to undamaged and damaged buildings, respectively (Table 2) [...] (1.16). Buildings corresponding to these classes are predicted to experience the highest degree of damage due to [...]
Importance scores for the various factors considered in the DT model are shown in Table 3. PGA was found to have the largest impact on building damage due to earthquakes (importance = 434.591), followed by epicenter distance (404.310) and distance from fire stations (307.873). Factors with the smallest impact on seismic vulnerability were related to construction materials (masonry, concrete, wood, steel, and concrete/steel mixture) and slope.
In the RF model, the percent increase in mean squared error (%IncMSE) and the increase in node purity (IncNodePurity) were determined as measures of factor importance in regression tree analysis. Maximum %IncMSE is reached when the variable with the highest value is removed from the model. An increase in IncNodePurity indicates a decrease in the Gini coefficient and includes a reduction in the residual sum of squares of the model. The Gini coefficient is a measure of tree node homogeneity; high Gini coefficient values indicate greater importance of the corresponding variable [67]. Epicenter distance (337.065) exhibited the highest %IncMSE, followed by distance from fire stations (325.576) and PGA (313.262) (Table 3). Epicenter distance (287.309) was found to be the most important factor based on IncNodePurity, followed by PGA (271.752) and altitude (254.792). Thus, epicenter distance and PGA have the greatest impact on seismic vulnerability according to the RF model, whereas factors related to construction materials are of low importance.

Seismic Vulnerability Mapping
Three seismic vulnerability maps were created based on data for all 71,888 buildings in Gyeongju. In the FR map, FR values were applied to each of the six classes for each sub-indicator. The final seismic vulnerability map was created by superimposing the resulting 18 sub-indicators. Seismic vulnerability maps based on the DT and RF models were created based on the prediction values of the models. In all three seismic vulnerability maps, indicator values were normalized to between 0 and 1, and then divided at equal intervals into five risk classes: safe, low risk, moderate risk, high risk, and very high risk. Based on the resulting maps, the distribution of Gyeongju buildings' risk classes was compared among administrative districts. In the FR map, 589 buildings (0.82%) were classified as safe, 9999 (13.91%) as low risk, 36,172 (50.32%) as moderate risk, 21,299 (29.63%) as high risk, and 3829 (5.33%) as very high risk. Areas that are more vulnerable to earthquakes were then identified based on the sum of the proportions of buildings corresponding to high and very high risk. District 11 was found to be the most vulnerable district to earthquake damage, followed by districts 12, 9, 8, and 15. Among areas classified as safe and low risk, district 2 was found to be the safest, followed by districts 23, 1, 20, and 18 ( Figure 6). In the DT map, 33,890 buildings (47.14%) were classified as safe, 13,621 (18.95%) as low risk, 9305 (12.94%) as moderate risk, 7593 (10.56%) as high risk, and 7479 (10.40%) as very high risk. The most vulnerable areas were districts 14, 7, 17, 15, and 8, whereas the safest areas were districts 19, 18, 23, 2, and 1 (Figure 7). In the RF map, 23,803 buildings (33.11%) were classified as safe, 26,429 (36.76%) as low risk, 13,669 (19.01%) as moderate risk, 6548 (9.11%) as high risk, and 1439 (2.00%) as very high risk. 
The most vulnerable areas were districts 14, 7, 15, 17, and 12, whereas the safest areas were districts 2, 18, 19, 23, and 5 ( Figure 8). Figure 9 shows the building distribution by risk class.
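The normalization and equal-interval classification used for all three maps can be sketched as follows. The function names are hypothetical, and the handling of values that fall exactly on a class boundary (assigned to the upper class here, except for the maximum) is an assumption, since the text does not specify it.

```python
RISK_LABELS = ["safe", "low risk", "moderate risk", "high risk", "very high risk"]


def normalize(values):
    """Min-max normalize prediction values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize constant values")
    return [(v - lo) / (hi - lo) for v in values]


def risk_class(score):
    """Map a normalized score to one of five equal-width classes."""
    n = len(RISK_LABELS)
    # A score of exactly 1.0 would index past the last class, so clamp it.
    index = min(int(score * n), n - 1)
    return RISK_LABELS[index]


# Toy prediction values for three hypothetical buildings.
scores = normalize([2.0, 4.0, 6.0])
labels = [risk_class(s) for s in scores]
print(scores, labels)
```

Equal intervals make the class limits identical across the three maps (0.2, 0.4, 0.6, and 0.8), so differences in the class distributions reflect differences in the models' prediction values rather than in the binning.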

Discussion
In this study, three seismic vulnerability maps were created based on the FR, DT, and RF methodologies, and their results were compared. First, we analyzed the importance of sub-indicators according to each methodology. Epicenter distance and PGA exhibited high importance in both the DT and RF models. Among all 9847 cells corresponding to damaged buildings, 5386 (54.70%) and 8756 (88.92%) were located within 5 and 10 km of the epicenter, respectively. These results confirmed that most damaged buildings were close to the epicenter; accordingly, this factor had a large influence on model construction. According to the seismic design criteria of South Korea, for an earthquake with a return period of 1000 years, the design ground acceleration of ground with normal rock (SB) is 0.154 g, whereas that of very dense ground (SC) is 0.18 g [3]. Based on these criteria, 9356 of the cells corresponding to damaged buildings (95.01%) were found in areas where PGA exceeded 0.18 g. Thus, most cells exhibited values higher than the design ground acceleration, which indicates that ground acceleration caused building damage during the 2016 Gyeongju Earthquake, and that PGA exerted a large influence on seismic vulnerability model construction. Factors exhibiting low importance included construction materials (masonry, concrete, wood, and steel/concrete mixture).
Among all damaged building cells, 3083 (31.31%) corresponded to buildings made of masonry or wood, which are relatively vulnerable construction materials. A much larger proportion of damaged building cells corresponded to concrete and steel, which are relatively strong construction materials. In addition, Gyeongju City, as a historic site, has many old buildings made of the relatively weak wood and masonry; however, these buildings continue to be renovated to preserve their historical value. Finally, most affected buildings were small buildings excluded from the seismic design targets (one- or two-story buildings with a floor area of less than 500 square meters) [68]. Therefore, construction materials are somewhat unsuitable as an indicator for seismic vulnerability assessment.
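The proportions quoted in the two preceding paragraphs can be checked with simple arithmetic. The short Python sketch below assumes, as the percentages imply, that 9847 is the total number of cells corresponding to damaged buildings; the function and dictionary names are illustrative only.

```python
# Total number of cells corresponding to damaged buildings (from the text).
DAMAGED_CELLS = 9847


def share_percent(count, total=DAMAGED_CELLS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts reported in the text for each subset of damaged cells.
reported_counts = {
    "within 5 km of the epicenter": 5386,
    "within 10 km of the epicenter": 8756,
    "PGA above 0.18 g": 9356,
    "masonry or wood construction": 3083,
}

for label, count in reported_counts.items():
    print(f"{label}: {share_percent(count)}%")
```

The computed shares agree with the reported values of 54.70%, 88.92%, 95.01%, and 31.31%.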
Next, model success and prediction rates were analyzed to determine their functional differences. The RF model was found to be the most reliable among the three models, with the highest success (1.000) and prediction (0.949) rates. The RF model compensates for the shortcomings of a single tree and operates well on large datasets; therefore, it performed best given the relatively large dataset used in this study. The DT model showed the next best performance, with success and prediction rates of 0.899 and 0.851, respectively. The FR model showed success and prediction rates of 0.661 and 0.655, respectively, indicating low accuracy and underfitting, in which an overly simple model fails to reflect important trends in the data [69]. Therefore, the FR model is somewhat unsuitable for seismic vulnerability assessment.
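As an illustration of how success and prediction rates relate to ROC curves, the following Python sketch computes the area under the ROC curve (AUC) with the pairwise-ranking formulation, once on training data (success rate) and once on held-out data (prediction rate). The labels and scores are hypothetical, and this is a generic sketch rather than the exact procedure or software used in the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen damaged cell (label 1)
    receives a higher score than a randomly chosen undamaged cell (label 0),
    with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical vulnerability scores for training and validation cells.
train_labels = [0, 0, 1, 1]
train_scores = [0.1, 0.2, 0.7, 0.9]
test_labels = [0, 0, 1, 1]
test_scores = [0.1, 0.4, 0.35, 0.8]

success_rate = roc_auc(train_labels, train_scores)    # AUC on training data
prediction_rate = roc_auc(test_labels, test_scores)   # AUC on held-out data
print(success_rate, prediction_rate)
```

The pairwise loop is quadratic in the number of cells, so for datasets of the size used here a sorted-rank implementation or an established library routine would be preferable. In general, a success rate well above the prediction rate is a common sign of overfitting, whereas two similarly low values, as for the FR model, are consistent with underfitting.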
Several studies of disaster susceptibility have also compared model performance among methodologies similar to those used in the present study. In [71], models based on four tree-based machine learning methods (RF, classification and regression tree (CART), logistic model tree (LMT), and best-first DT (BFDT)) were created and compared for landslide susceptibility assessment and mapping. The RF model exhibited the highest prediction accuracy (0.985), followed by LMT (0.945), BFDT (0.934), and CART (0.933). Thus, several studies have found that tree-based machine learning models exhibited higher performance than statistical models, and RF models exhibited high performance in most studies, confirming their suitability for vulnerability analysis. In a previous study, Han et al. (2019) [20] used 15 factors, excluding social indicators, to build LR and SVM kernel models and compared their performance. The SVM model based on the radial basis function (RBF) kernel performed best (0.998), followed by the polynomial (0.842), linear (0.649), LR (0.649), and sigmoid (0.630) models. The corresponding prediction rates were 0.919 (RBF), 0.804 (polynomial), 0.655 (LR), 0.651 (linear), and 0.629 (sigmoid). In terms of prediction rate, the RF model in the present study (0.949) was about 3% more accurate than the RBF kernel-based model (0.919).
Finally, we compared the seismic vulnerability maps created in this study. In all three maps, district 15 (Wolseong) was found to be the most dangerous area, whereas districts 2 (Gangdong), 18 (Yangbuk), and 23 (Yangnam) were identified as safe. Therefore, we mainly focused our sub-indicator characterization and comparison analyses on these districts. District 15 is located in central Gyeongju, with an epicenter distance of 2.798 km, fault distance of 4.269 km, PGA of 0.262 g, altitude of 62.825 m, and groundwater level of 15.078 m. In terms of its structural indicators, district 15 has a building density of 322.834 and an average building age of 43 years. Among all of the buildings in this district, 1521 (68.36%) are less than 50 years old. District 15 has a population density of 237.167. In contrast, districts 2, 18, and 23 are located near the northern and southeastern coasts of Gyeongju, with epicenter and fault distances of 11.211 and 4.407 km, respectively, which are farther than those of district 15. These districts have a PGA of 0.159 g, altitude of 54.165 m, and groundwater level of 10.292 m, which are lower than those of district 15, as well as a building density of 85.869 (ca. 3.76-fold lower than that of district 15) and an average building age of 32 years. Among all buildings in these districts, 6469 (77.90%) are less than 50 years old. These districts have a population density of 68.997, which is ca. 3.44-fold lower than that of district 15. There was no significant difference in the average values of the five capacity-related factors (distance from hospitals, police stations, fire stations, roads, and gas stations) between dangerous and safe areas.
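The fold differences quoted above follow directly from the ratios of the reported district values (building density and population density of district 15 divided by the corresponding values for districts 2, 18, and 23):

```latex
\[
\frac{322.834}{85.869} \approx 3.76,
\qquad
\frac{237.167}{68.997} \approx 3.44
\]
```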
This study is meaningful in that it evaluates seismic vulnerability by comprehensively considering 18 factors related to geotechnical, physical, social, and capacity indicators, along with structural characteristics. It also aimed to derive a model suitable for assessing seismic vulnerability in Gyeongju by establishing models based on several methodologies. The seismic vulnerability assessment data provided in this study may be used as reference data for selecting parameters for seismic vulnerability assessments in other regions using more or fewer influence factors. The proposed method is also expected to contribute to improving seismic vulnerability assessment and mapping in areas of South Korea other than Gyeongju.

Conclusions
In this study, seismic vulnerability maps were created and seismic vulnerability assessment was performed for buildings in Gyeongju, South Korea using the probabilistic FR model and machine-learning-based DT and RF models. Models were created for each methodology using 18 factors affecting seismic vulnerability (slope, altitude, groundwater level, PGA, epicenter distance, fault distance, building age, construction materials, building density, number of floors, elderly population, child population, population density, and distances from hospitals, fire stations, police stations, roads, and gas stations) as independent variables and buildings damaged in the 2016 Gyeongju Earthquake as dependent variables. Epicenter distance and PGA were found to be the most important factors in the DT and RF models, and factors related to construction materials were the least important. These results may be used as reference data for models based on other methodologies. Model accuracy (success and prediction rates) was verified using ROC curves; the RF and FR models exhibited the highest and lowest performance, respectively, indicating that the machine-learning-based models are more suitable for seismic vulnerability assessment. Dangerous and safe areas were identified based on the seismic vulnerability maps created using the three models; in all three maps, district 15 was found to be the most dangerous area and districts 2, 18, and 23 were the safest areas. Therefore, district 15 should be managed first in preparation for future earthquakes. The seismic vulnerability maps created in this study facilitate intuitive identification of dangerous districts within the target area, which may help reduce damage in future earthquakes by supporting the establishment of evacuation routes for residents. As reference data, our findings may be used for developing earthquake-related policies and determining suitable locations for vulnerable infrastructure (e.g., pipelines or high-voltage facilities), as well as important national facilities (e.g., airports, military facilities, and nuclear power plants).