Rapid Visual Screening Feature Importance for Seismic Vulnerability Ranking via Machine Learning and SHAP Values

Structures inevitably suffer damage after an earthquake, with severity ranging from minimal damage of nonstructural elements to partial or even total collapse, possibly with loss of human lives. Thus, it is essential for engineers to understand the crucial factors that drive a structure towards suffering higher degrees of damage in order for preventative measures to be taken. In the present study, we focus on three well-known damage thresholds: the Collapse Limit State, Ultimate Limit State


Introduction
During the last decades, due to the large amount of existing building stock, engineering focus has shifted from analyzing and designing new structures to maintaining preexisting buildings to modern standards of safety and serviceability [1]. As is well known, the results of an earthquake can be catastrophic to society in terms of loss of human lives and can require large monetary reparations, with examples including the Turkey (Izmit) 1999, Athens 1999, Pakistan 2005, and Turkey 2023 earthquakes.
Governments and authorities can take preemptive measures to mitigate these effects; however, due to obvious limitations in resources and manpower, it is not possible to do so for all existing buildings, especially in large urban areas. Thus, most countries have introduced multi-stage procedures to assess and evaluate the total potential consequences and losses from an earthquake, and thereby identify the most critical structures where allocation of further resources should be prioritized.
As a first step in these methods, a Rapid Visual Screening Procedure (RVSP) [2] is usually performed, wherein experts quickly inspect buildings and identify key structural characteristics that affect the overall seismic behaviour. For example, this could include whether or not the structure has short columns or soft storeys, the presence of neighboring buildings that could result in pounding effects, irregularities in the horizontal or vertical plan of the building, and others [3,4]. Subsequently, these obtained characteristics are weighted to compute a seismic vulnerability index which is used to rank the structures according to their expected degree of damage [5]. Finally, the most vulnerable structures identified in the aforementioned steps are subjected to more accurate analytical methods such as step-by-step dynamic analysis. These methods take into account other structural characteristics, such as the design of structural reinforcement and quality of concrete, and yield an accurate assessment of the seismic vulnerability of the structures under consideration. In turn, this allows for the identification of any potentially required preemptive measures to be applied. However, these analytical methods are prohibitively costly and time consuming to apply to every structure in the population.
In the USA, the Federal Emergency Management Agency (FEMA) first introduced such an RVSP [2] in 1988, which has since been modified to include more structural features that affect the overall seismic performance [6]. Countries with high seismic activity, such as Japan, Italy, Canada, India, and Greece, have derived similar pre-earthquake assessments adapted to the characteristics of their respective building stocks. The success of RVSP in screening candidate structures for further analysis heavily depends on accurate calibration of the weights of the structural characteristics. Thus, past researchers have used data from major recorded earthquakes in conjunction with engineering expertise for this task [7,8]. Similarly, for masonry buildings, both index-based [9] and physics-based [10] structural vulnerability assessment studies have been conducted. The effect of the structural parameters of this type of building on their structural vulnerability has been studied as well [11,12].
On the other hand, recent years have seen an increase in the use of Machine Learning (ML) methods for the task of predicting the degree of damage of reinforced concrete structures. Classification techniques have been previously employed to classify structures into predicted damage categories. Harirchian et al. [13] employed Support Vector Machines, which they calibrated on a dataset of earthquakes in four different countries. Sajan et al. [14] employed a variety of models, including Decision Trees, Random Forests, XGBoost, and Logistic Regression. Similarly, regression methods have been employed for this task. Among others, Luo and Paal [15] and Kazemi et al. [16] used ML methods to predict the interstorey drift, which can be used as a damage index.
Even though machine learning methods are powerful, they often lack the desired interpretability. The path that a Decision Tree follows to reach its predictions can be readily visualized; however, the same does not hold for more complex ML models. Thus, explainability techniques and models have been employed in ML [17] in order to analyze how these models weigh their input parameters when making a decision, thereby increasing the reliability of their predictions. Among others, Mangalathu et al. [18] recently employed Shapley additive explanations (SHAP) [19] to quantify the effect of each input parameter on damage predictions of bridges in California. Sajan et al. [14] performed multiclass classification to predict the damage category of structures and binary classification to predict whether the damage was recoverable or reconstruction was needed. Subsequently, they employed SHAP values to identify 19 of the top 20 most important features for both tasks. However, the features they employed deviated significantly from those in the RVS procedure, and lacked many of the features employed in the present study.
The features employed in the present study have been used previously [20,21]; however, there is no consensus on the magnitude of the effect that each feature has on the vulnerability ranking, with different researchers and different seismic codes employing different values. In this paper, we implement explainable machine learning techniques and SHAP values to analyze features' contribution to the relative classification of structures in the respective damage categories. To the best of our knowledge, the novel contribution of the introduced approach is that it does not attempt to directly predict the damage category. Instead, it considers the well-known thresholds of the Serviceability Limit State (SLS), Ultimate Limit State (ULS), and Collapse Limit State (CLS) to distinguish structures that not only surpass the ULS threshold but suffer partial or total collapse which could potentially lead to loss of human life. Moreover, machine learning is used to develop binary classification models capable of distinguishing between adjacent damage categories.
The benefit of this modeling research effort in comparison with the previously established literature is twofold. On the one hand, the obtained binary classifiers have significantly improved accuracy compared to previous models. This higher accuracy enhances the reliability of the extracted feature importance coefficients, which is the main focus of the present study. On the other hand, the binary classification approach allows us to examine each of the damage thresholds separately. This allows us to answer the following questions: What are the deciding factors that lead a structure which would have otherwise suffered minimal to no damage to cross the serviceability limit threshold? If a structure does cross the serviceability threshold, what factors prevent it from crossing the ultimate limit state threshold as well? Finally, if it does cross the ULS threshold, what factors prevent it from ultimately collapsing?

Dataset Description
The dataset used in the present study is a sample consisting of 457 structures obtained after the 1999 Athens Earthquake via Rapid Visual Screening (RVS) [20]. The selected structures suffered damage across the spectrum, ranging from very low or minimal damage to structures that partially or completely collapsed during the earthquake. The dataset was drawn from different geographical regions; thus, the local conditions varied across the sample. In [20], the authors took steps to mitigate the effect on the study of potential biases due to local effects. When sampling from a specific building block, they sampled structures across the entire damage spectrum. This mitigated the effect of the location of the structure on its seismic damage, as structures in the same building block had the same local conditions. The degree of damage was labeled using four categories:
• Black: Structures that suffered total or partial collapse during the earthquake, potentially leading to loss of human life.
• Red: Structures with significant damage to their structural members.
• Yellow: Structures with moderate damage to the structural members, potentially including extended damage to nonstructural elements.
• Green: Structures that suffered very little or no damage.
An example of the application of the RVS procedure can be seen in Figure 1, courtesy of [20]. The distribution of structures across the above damage categories is shown in Figure 2. For each structure, a set of attributes was documented, specifically:
1. Free ground level (Pilotis), soft storeys and/or short columns: In general, this attribute pertains to structures wherein a storey has significantly less structural rigidity than the rest. For example, this can manifest on the ground floor (pilotis) when it has greater height than the typical structure storey, or when the wall fillings do not cover the whole height of a storey, effectively reducing the active height of the adjacent columns.
2. Wall fillings regularity: This indicates whether the infill walls are of sufficient thickness and with few openings. The presence of such wall fillings is beneficial to the structure's overall seismic response, as during an earthquake they act as diagonal struts that support the surrounding frames.
3. Absence of design seismic codes: In Greece, this pertains to pre-1960 structures which were not designed following a dedicated seismic code.
4. Poor condition: Very high or non-uniform ground sinking, concrete with aggregate segregation or erosion, or corrosion in the reinforcement bars are examples of maintenance-related factors that can reduce the seismic capacity of a building.
5. Previous damage: This pertains to structures which had suffered previous earthquake damage that was not adequately repaired. Although this is a distinct feature from "poor condition", it causes a similar reduction in the nominal seismic capacity of the building.
6. Significant height: This describes structures with five or more storeys.
7. Irregularity in height: This describes structures with a discontinuity in the vertical path of the loads.
8. Irregularity in plan: This pertains to structures with floor plans that significantly deviate from a rectangular shape, e.g., floor plans with highly acute angles in their outer walls or with E, Z, or H-shapes. Irregularity in height, plan, or both can cause excess seismic overload on the building.
9. Torsion: This affects structures with high horizontal eccentricity, which are subjected to torsion during the earthquake.
10. Pounding: If adjacent buildings do not have a sufficient gap between them, and especially if they have different heights, then the floor slabs of one building can ram into the columns of the other.
11. Heavy nonstructural elements: These elements can potentially create eccentricities if they are displaced during an earthquake, leading to additional torsion. This is because even though these are nonstructural elements, they can often contribute to the total mass and horizontal stiffness of the structure.
12. Foundation Soil: The Greek Code for Seismic Resistant Structures (EAK 2000) [3] classifies soils into categories A, B, C, D, and X. Class A refers to rock or semi-rock formations extending in wide area and large depth. Class B refers to strongly weathered rocks or soils mechanically equivalent to granular materials. Classes C and D refer to granular materials and soft clay, respectively, while class X refers to loose fine-grained silt [3]. In [20], as well as in the present study, soils in EAK category A are classified as S1, while those in category B are classified as S2; soils in EAK categories C, D, and X were not encountered.
13. The design Seismic Code: This feature describes the seismic code(s) that the structures adhered to at the time of their design. Specifically, structures that were built before 1984 are classified as RC1, buildings constructed between 1985 and 1994 are labeled RC2, and buildings constructed after 1995 are labeled RC3, as the Greek state introduced updated seismic codes at these milestones.
Note that most of the above features are binary, i.e., the dataset provides a Yes/No statement about whether or not the structure displayed the relevant feature. We transformed these to Boolean values, i.e., {No, Yes} → {0, 1}. The design seismic code was transformed to an integer value, i.e., {RC1, RC2, RC3} → {1, 2, 3}. Finally, in 452 out of the 457 total records, the authors of [20] noted the exact number of storeys instead of whether or not this was ≥5. As this was deemed more informative, we opted to disregard the remaining five structures (1.09% of the sample) and use the exact number of storeys as the feature instead.
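The encoding described above can be sketched as follows. This is a minimal illustration rather than the preprocessing code used in the study: the field names (`soft_storey`, `pounding`, `seismic_code`, `storeys`) are hypothetical, and "Yes" is assumed to map to 1 so that a positive value indicates the presence of a feature.

```python
# Hypothetical field names; the dataset's actual column labels are not given in the text.
YES_NO = {"No": 0, "Yes": 1}                  # assumption: "Yes" maps to 1
SEISMIC_CODE = {"RC1": 1, "RC2": 2, "RC3": 3}

def encode_structure(record):
    """Turn one RVS record (a dict of raw values) into a numeric feature list."""
    encoded = []
    for key, value in record.items():
        if key == "seismic_code":
            encoded.append(SEISMIC_CODE[value])
        elif key == "storeys":
            encoded.append(int(value))        # exact storey count, kept as an integer
        else:
            encoded.append(YES_NO[value])     # binary Yes/No attribute
    return encoded

example = {"soft_storey": "Yes", "pounding": "No", "seismic_code": "RC2", "storeys": 4}
print(encode_structure(example))  # [1, 0, 2, 4]
```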

Data Preprocessing
The core of the designed and employed modeling effort lies in the development of a Machine Learning (ML) model for binary classification $f : \mathbb{R}^n \times \mathbb{R}^n \to \{-1, +1\}$ that, given a pair of structures $(s_i, s_j)$ with corresponding feature vectors $x_i, x_j \in \mathbb{R}^n$, is capable of predicting whether $s_j$ should rank higher than $s_i$ or vice versa [22].
However, it can be readily observed from Figure 2 that the "Red" label heavily dominates the sampled dataset. This so-called "class imbalance problem" can have significant adverse effects on machine learning algorithms [23-26]. It skews the model towards the majority class, creating bias and rendering the algorithm unable to adapt to the features of the minority classes [23,24]. This imbalance can be treated by undersampling the majority class, and numerous methods for doing so exist in the literature [27-29]. These methods include randomly selecting a subset of the samples in the majority class [30,31], or using model-based methods such as NearMiss, Tomek Links, or Edited Nearest Neighbours [27-29]. NearMiss-2 was found to perform the best, and is used in the sequel. We undersampled the majority class by a factor of 50% in order to achieve a relative class balance, which, as mentioned, is crucial to the performance of machine learning algorithms. The distribution of structures across the above damage categories after undersampling is shown in Figure 3.
Next, in order to represent the pair $(x_i, x_j)$ using a single feature vector $x_{new}$ as input for the machine learning model, we considered the pairwise transformation $T : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ with $T(x_i, x_j) = x_j - x_i$. Other pairwise transformations can be employed, e.g., $T_2 : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{2n}$, with $T_2(x_i, x_j) = [x_i; x_j]$, i.e., appending $x_j$ to $x_i$ [32]. However, the transformation employed in the present study has the advantage of a more natural interpretation, which is the goal of this study. For example, a value of 2 storeys in the transformed dataset indicates that structure $s_j$ has two more storeys than $s_i$. Similarly, a transformed value of −1 for the "pounding" attribute indicates that $s_i$ suffered from pounding while $s_j$ did not.
A similar transformation was applied to the labels of the damage categories. To this end, the labels were first ranked in ascending order, i.e., {Green, Yellow, Red, Black} → {1, 2, 3, 4}. Then, for a pair of structures $(s_i, s_j)$ with $(y_i, y_j) \in \{1, 2, 3, 4\}^2$ and $y_i \neq y_j$, the transformed target variable was $y_{new} = \mathrm{sign}(y_j - y_i)$, where sign denotes the sign function. Thus, for example, a transformed variable of −1 indicates that $s_j$ suffered less severe damage than $s_i$, while +1 indicates that $s_j$ suffered more severe damage. As the focus of this research is to gauge the contribution of the involved parameters to the extent of a structure's relative damage, pairs with $y_i = y_j$ were not included in the transformed dataset.
Thus, the final transformed dataset had inputs $X_{new}$ and outputs $y_{new}$ obtained via the transformations described above.
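The two transformations can be sketched in a few lines of Python. The sketch below is illustrative only. In particular, each cross-category pair is used once and its orientation (which structure plays the role of $s_i$) is drawn at random; this is an assumption made here so that both labels occur, since the text does not state how the orientation was chosen. The function and variable names are hypothetical.

```python
import random

def sign(v):
    """Sign function: -1, 0, or +1."""
    return (v > 0) - (v < 0)

def pairwise_transform(X, y, seed=0):
    """Build difference vectors x_j - x_i and labels sign(y_j - y_i).

    Each pair of structures from different damage categories is used once.
    The random orientation of the pair is an assumption of this sketch.
    """
    rng = random.Random(seed)
    X_new, y_new = [], []
    for a in range(len(X)):
        for b in range(a + 1, len(X)):
            if y[a] == y[b]:
                continue                      # same damage category: pair is skipped
            i, j = (a, b) if rng.random() < 0.5 else (b, a)
            X_new.append([xj - xi for xi, xj in zip(X[i], X[j])])
            y_new.append(sign(y[j] - y[i]))
    return X_new, y_new

# Toy data: columns are (storeys, pounding); labels 3 = Red, 4 = Black.
X = [[3, 0], [5, 1], [4, 1]]
y = [3, 4, 4]
X_new, y_new = pairwise_transform(X, y)
```

With one Red and two Black structures, the toy example yields 1 × 2 = 2 transformed samples, mirroring how 102 Red and 90 Black structures yield 9180 pairs.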

Machine Learning Algorithm
In order to analyze the importance of each feature for the relative classification of each pair of structures, we considered three different pairings of structures. Specifically, we considered the subsets consisting of the (Green, Yellow), (Yellow, Red), and (Red, Black) structures. We did this because each of the labels has a very distinct definition: the Black and Red structures correspond to the Collapse state and Ultimate Limit State (ULS), respectively, while Yellow corresponds to the Serviceability Limit State (SLS). Thus, by using this pairing our models learn to distinguish adjacent damage states and the features that lead to this increase in damage. For each of these pairs, we performed the pairwise transformations presented above. The number of structures in each pair and each transformed dataset is shown in Table 1. We constructed a binary classifier for each of the above pairs, as described in Section 2.2. The subsequent analysis of the importance of the features of these classifiers helps to determine the deciding factors that lead a structure to being, for example, in the Red rather than the Yellow category, i.e., crossing the ULS and suffering heavy damage instead of only crossing the SLS and suffering moderate damage.
There are many classifiers available in the literature to perform this task. In [22], the authors worked on the same dataset and analyzed a variety of models. The best performing one was found to be the Gradient Boosting (GB) Classifier [33], which we employ in the sequel. GB is a powerful method that learns a classifier incrementally, starting from a base model; specifically, it learns a function
$$f(x) = \sum_{i=1}^{N} \alpha_i h_i(x; \theta_i), \tag{1}$$
where $h_i$ represents the individual "weak" models (Decision Trees [34]) that the algorithm learns at each iteration, $\theta_i$ represents their parameters, $N$ is the user-defined number of such models, and $\alpha_i$ represents the learned weights that produce the final linear combination. The steps of the method are shown in Algorithm 1 [35]. The algorithm was implemented in the Python programming language (v.3.11.5) using the scikit-learn machine learning library (v.1.3.0) [36].
Algorithm 1 Gradient Boosting Learning Process [35]
Initialize $f_0(x)$
for $i = 1, 2, \ldots, N$ do:
end for
In the above algorithm, $L$ is the loss function that measures the error between the predictions and the true values, $M$ is the number of samples the model is trained on, and $\lambda > 0$ is the so-called "learning rate", which modifies the contribution of each individual tree [37].
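To make the incremental nature of Gradient Boosting concrete, the following is a minimal from-scratch sketch of the generic procedure. It is not the scikit-learn implementation used in the study and rests on several simplifying assumptions: a squared-error loss (whose negative gradient is simply the residual), depth-1 regression trees ("stumps") as weak learners, and a weight of 1 for each weak learner, which is the optimal step for this loss when the stump is fitted by least squares. All function names are hypothetical.

```python
def fit_stump(X, r):
    """Least-squares regression stump (depth-1 tree) fitted to targets r."""
    best = None
    for k in range(len(X[0])):
        for t in sorted({row[k] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[k] <= t]
            right = [ri for row, ri in zip(X, r) if row[k] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((ri - ml) ** 2 for ri in left)
                   + sum((ri - mr) ** 2 for ri in right))
            if best is None or sse < best[0]:
                best = (sse, k, t, ml, mr)
    if best is None:                          # no valid split: predict the mean
        mean = sum(r) / len(r)
        return lambda row: mean
    _, k, t, ml, mr = best
    return lambda row: ml if row[k] <= t else mr

def gradient_boost(X, y, n_trees=50, learning_rate=0.1):
    """Boosting with squared-error loss (an assumption of this sketch)."""
    f0 = sum(y) / len(y)                      # constant model minimizing squared error
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        # Negative gradient of the squared-error loss = current residuals.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(X, residuals)
        trees.append(h)
        # The learning rate shrinks the contribution of each new tree.
        pred = [pi + learning_rate * h(row) for pi, row in zip(pred, X)]
    return lambda row: f0 + learning_rate * sum(h(row) for h in trees)

# Toy pairwise data: feature differences and labels in {-1, +1}.
X_toy = [[-1, 0], [1, 0], [-2, 1], [2, -1]]
y_toy = [-1, 1, -1, 1]
model = gradient_boost(X_toy, y_toy)
predicted = [1 if model(row) > 0 else -1 for row in X_toy]
```

The sign of the boosted output serves as the predicted class here. A production classifier such as scikit-learn's would instead use a classification loss and deeper trees.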

Hyperparameter Tuning
As is evident from (1) and Algorithm 1, Gradient Boosting learns a number of parameters during its training, e.g., the weights $\alpha_i$. However, there are a number of so-called hyperparameters, i.e., parameters set by the user before training begins, such as the number $N$ of individual Decision Trees and the maximum allowed depth of each tree. The configuration of these hyperparameters can reduce overfitting [38,39] and has a direct impact on the overall accuracy of the model [40].
Thus, the importance of appropriately tuning these hyperparameters to achieve optimal results becomes clear. This has led to a variety of methods to address this process, with reviews of the existing algorithms provided by Yu and Zhu [41] and by Yang and Shami [40]. In this paper we opt for Bayesian optimization, as it does not search the hyperparameter space blindly, instead using each iteration's results in the next one, which can lead to faster convergence to the optimal solution [42]. The implementation was carried out using the dedicated Python library scikit-optimize [43] (v.0.10.1).

SHAP
A common measure used to gauge the strength of each feature's effect on the outcome, which is the focus of the present study, is the so-called SHapley Additive exPlanation (SHAP) [19]. This is the equivalent in the machine learning literature to the Shapley values in cooperative game theory introduced by Lloyd Shapley in 1951 [44]. SHAP values provide interpretability by constructing a simpler explainable model in the local neighborhood of each point in the dataset. Thus, given a learned Machine Learning model $f$, a local approximation $g$ can be formulated as follows [19]:
$$g(u) = \phi_0 + \sum_{i=1}^{n} \phi_i u_i, \tag{2}$$
where $n$ is the number of features, $u \in \mathbb{R}^n$ is a binary vector whose value in the $i$th position denotes whether or not the corresponding feature was used in the prediction, $\phi_0$ is the base value of the explanation model, and $\phi_i$ denotes the SHAP value of the $i$th feature, i.e., the strength of its contribution to the model's output.
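The values of the $\phi_i$, following the notation of Lundberg et al. [45], are computed as follows: let $N = \{1, 2, \ldots, n\}$ be the set of features used and let $S \subseteq N$ be a subset of $N$; then, we have [45,46]
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ f_x(S \cup \{i\}) - f_x(S) \right], \tag{3}$$
where $f_x(S)$ denotes the model prediction when only the features in $S$ are used. Intuitively, this corresponds to the weighted average over all feature combinations (coalitions) of the difference in the model prediction with and without the inclusion of the $i$th feature.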
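As an illustration of this weighted average over coalitions, the sketch below computes exact Shapley values by brute force for a small toy model. It is not the algorithm used by the SHAP library, which relies on far more efficient estimators. Here, a feature that is absent from a coalition is replaced by a fixed baseline value, which is one simple way of defining the coalition value and is an assumption of this sketch. The toy model and all names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x by enumerating all coalitions.

    Assumption: features outside a coalition S take their baseline value.
    """
    n = len(x)

    def value(S):
        z = [x[k] if k in S else baseline[k] for k in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = set(S)
                total += weight * (value(S | {i}) - value(S))
        phi.append(total)
    return phi

def toy_model(z):
    """Toy model with two additive terms and one interaction term."""
    return 2 * z[0] - z[1] + z[0] * z[2]

x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
```

In this toy case the interaction term is split equally between the first and third features, and the values sum to the difference between the prediction at `x` and at the baseline. The cost grows exponentially with the number of features, which is why practical libraries avoid explicit enumeration.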
As has been mentioned, the above $\phi_i$ values pertain to a specific point. For example, considering the pair (Red, Black), there were 102 Red structures in the undersampled dataset and 90 Black ones, yielding 90 × 102 = 9180 pairs, i.e., samples in the transformed space, as shown in Table 1.
Thus, we have a matrix $\Phi \in \mathbb{R}^{9180 \times 13}$ in which each value $\phi_{ij}$ is the SHAP value of the $j$th feature calculated at the $i$th sample. In order to obtain an aggregated value for the whole dataset, we used a normalized norm of each column in the matrix. We compared the results obtained using the $L_1$ norm (sum of absolute values), which is the most commonly used in the literature, and the well-known Euclidean norm $L_2$, which increases the contribution of larger values while simultaneously reducing the effect of smaller noisy components. Thus, for each feature $j = 1, 2, \ldots, 13$ we considered the alternatives given by Equation (4):
$$\phi_j = \frac{1}{m} \sum_{i=1}^{m} |\phi_{ij}| \quad \text{or} \quad \phi_j = \frac{1}{m} \sqrt{\sum_{i=1}^{m} \phi_{ij}^2}, \tag{4}$$
where $m$ is the number of samples in the transformed space for each pair, as shown in Table 1. The computation of the SHAP values was carried out using the dedicated Python library by Lundberg et al. [47].
Thus, our overall proposed methodology comprises the following steps. For each damage threshold: (1) obtain the transformed inputs $X_{new}$ and outputs $y_{new}$ as described in Section 2.2; (2) train the corresponding binary classification ML model using data for the particular damage threshold; and (3) obtain the feature importance metrics of the trained ML model using SHAP values via Equation (4).
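The difference between the two aggregation alternatives of Equation (4) can be illustrated with a small numerical sketch. The normalization by the number of samples used below is an assumption (any constant factor cancels once the importances are expressed as shares of their sum), and the two SHAP columns are synthetic values chosen only to show the effect.

```python
from math import sqrt

def aggregate_l1(column):
    """Normalized L1 norm of one column of SHAP values."""
    return sum(abs(v) for v in column) / len(column)

def aggregate_l2(column):
    """Normalized L2 norm of one column of SHAP values (assumed 1/m scaling)."""
    return sqrt(sum(v * v for v in column)) / len(column)

# Synthetic SHAP columns over 100 samples (not data from the study):
dense = [0.1] * 100                 # moderate effect in every sample
sparse = [1.0] * 4 + [0.0] * 96     # strong effect, but only in 4 samples

l1_dense, l1_sparse = aggregate_l1(dense), aggregate_l1(sparse)
l2_dense, l2_sparse = aggregate_l2(dense), aggregate_l2(sparse)
```

Under the $L_1$ aggregation the dense feature ranks higher, whereas under $L_2$ the rare but extreme feature overtakes it. This is the mechanism by which squaring gives more weight to large SHAP values that occur in only a few samples.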

Results
As previously stated, the main focus of this study is to analyze the importance of each feature in deciding whether a structure will cross each of the respective damage thresholds. As explained in Section 2.5, this is carried out using SHAP values, which offer just such a quantification. However, the reliability of any feature importance analysis is directly related to the performance of the model under consideration. If a model has poor performance, then the way that it arrives at its predictions will not be very informative. On the other hand, the higher a model's performance, the closer its predictions are to the truth. In that case, the extracted feature importance values are closely coupled with the underlying physical phenomenon, and can be considered highly reliable.
To this end, the rest of this section is structured as follows. In the first part, Section 3.1, we present the results of the hyperparameter tuning and the classification performance metrics. Tuning the hyperparameters allows us to find the model with the highest accuracy and the most reliable feature importance values. Subsequently, we present the accuracy metrics obtained using the optimal values of the involved hyperparameters. This demonstrates the high accuracy obtained by the models, especially in the most critical damage categories, which enhances the reliability of the extracted feature importance values. Finally, in Section 3.2, we present the main results of this research based on the feature importance values obtained from these models.

Binary Classifiers and Hyperparameter Tuning
As mentioned in Section 2.3, we constructed a binary classifier for each pair of labels considered here, namely, (Green, Yellow), (Yellow, Red) and (Red, Black). Each of these classifiers was tuned separately, and we optimized the following hyperparameters:
• max_depth: This is the maximum allowed depth of each individual Decision Tree; too large or too small values can lead to overfitting or underfitting, respectively [48].
• n_estimators: This is the number of individual Decision Trees used in Gradient Boosting.
• min_samples_leaf: This is the minimum number of samples that must remain in an end node (leaf) of each individual tree.
• learning_rate: This controls the contribution of each individual tree, as shown in Algorithm 1. If the value is too large, the algorithm might overfit; however, a lower learning rate has the trade-off that more trees are required to reach the desired accuracy.
Table 2 presents the tuning range of each hyperparameter as well as the optimal value for each of the three classifiers considered here. Having obtained the optimal hyperparameter configuration, we trained and tested our three models using five-fold cross-validation [49]. In this framework, the dataset is split into five parts and each part is iteratively used as the test set, while the remaining parts are used for training. This ensures that the model's predictions are always made on unseen data and reduces the sensitivity/variability of the obtained performance metrics. The performance was measured using the well-known classification metrics of Precision, Recall, F1-score, Accuracy, and Area Under the Curve (AUC) [50], with the results shown in Table 3. The results clearly show that the classifiers achieved high performance, especially for the most critical pairs, i.e., (Red, Black) and (Yellow, Red). The accuracy with which the models were able to distinguish between the categories in each of these pairs increases the reliability of the feature importance analysis, which is the main focus of the study.
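The cross-validation split and the threshold-based metrics can be sketched as follows. This is a simplified illustration with hypothetical function names: the folds are contiguous (no shuffling), and AUC is omitted because it requires predicted scores rather than class labels. In practice, library routines such as those in scikit-learn would be used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if not start <= i < stop]
        yield train_idx, test_idx
        start = stop

def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score, and accuracy for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

folds = list(k_fold_indices(12, k=5))
metrics = binary_metrics([1, 1, -1, -1, 1], [1, -1, -1, 1, 1])
```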

Feature Importance
This subsection presents our main results on the importance of the RVS features for the relative classification of structures, which we analyzed using SHAP values, as explained in Section 2.5. Note that there is some inherent variability in the computation of the $\phi_i$ from (3), and consequently in the aggregated values $\phi_j$ from (4). This can stem from how the algorithm splits the dataset between training and testing at each iteration or from the computation of the SHAP values themselves. To mitigate the sensitivity of the results to these factors, we performed 100 runs of our proposed methodology and averaged the obtained feature importances. This heavily reduces the variability of the computations and increases the reliability of the extracted feature importance values. Thus, we constructed a matrix $\Theta \in \mathbb{R}^{100 \times 13}$, where $\theta_{ij}$ is the value $\phi_j$ from (4) for the $j$th feature at the $i$th iteration. From this, we calculated the average value per column/feature, i.e., we defined
$$\bar{\theta}_j = \frac{1}{100} \sum_{i=1}^{100} \theta_{ij}.$$
Finally, in order to normalize these coefficients, we divided them by their sum, i.e.,
$$\lambda_j = \frac{\bar{\theta}_j}{\sum_{k=1}^{13} \bar{\theta}_k}.$$
With this normalization, we now have $0 \le \lambda_j \le 1$ and $\sum_{j=1}^{13} \lambda_j = 1$; therefore, these coefficients can be interpreted as the percentage of the contribution of the corresponding features to the overall predictions of the model. We carried out the above using both of the alternatives in (4).
The results are shown in Figure 4. This figure presents the comparative results of the contribution of each feature to the model predictions expressed as a percentage of the total. As previously discussed, these correspond to the mean contributions for all pairs of structures, which, given that we are averaging over thousands of pairs, are representative of the overall parameter effect on seismic behaviour across the various limit states of all the structures in the dataset. The left subfigures in Figure 4a-c pertain to $L_1$, i.e., the absolute SHAP values of these features, while the right subfigures pertain to $L_2$, i.e., their squares. The results demonstrate a basic hierarchy of the structural properties that influence the seismic vulnerability of the studied structures and contribute to the observed degrees of damage. In general, the results are in agreement with the existing structural mechanics literature and the seismic behaviour of reinforced concrete structures. We analyze and discuss each of Figure 4a-c separately.

•
Distinction between Red (ULS) and Black (Collapse): As can be seen from the left part of Figure 4a, the most crucial factor overall for the Collapse Limit State is the presence of soft storeys and/or short columns, with a weight of approximately 18%.
The presence of regular infill panel walls, however, has an almost equal in magnitude, but a positive effect, which is why the corresponding bar in the figure is hatched.This is an important feature that helped prevent structures that crossed the ULS to cross the CLS as well.Finally, the absence of design seismic codes, the number of storeys in the structure, and the presence of an irregular plan all play import roles for this damage threshold.
The right part of this feature displays an important distinction, as the absence of design seismic codes is now the dominant feature, even if only slightly.This can be explained in the following way.The absence of design seismic codes feature is indeed a crucial factor, as is well known in the literature, and the model assigns high SHAP values to it.However, not many structures were affected by this feature.Of the 452 structures in our dataset, only 26 lacked a design seismic code.Of these, 20 (77%) crossed the ULS, and 19 of those (95%) crossed the CLS as well.Thus, by taking the squares of the SHAP values, as per the right figure of Figure 4a, we assign more weight to these extreme SHAP values even though they pertained to only a limited number of cases.
It is important to note that there is no noteworthy distinction in the other factors, such as soft storeys/short columns, regularity of the infill panel walls, or structure height, between the left and right subfigures of Figure 4a, as the corresponding SHAP values are more balanced.
• Distinction between Yellow (SLS) and Red (ULS): As can be seen from Figure 4b, the most important features by far are the presence of soft storeys and/or short columns and the presence of regular infill wall panels. Soft storeys/short columns had a detrimental effect, accounting for approximately 30% of the total. On the other hand, regular infill wall panels had a beneficial effect of approximately equal magnitude. This is in agreement with the established engineering literature, as brick walls help to reduce storey drift and consequently decrease the overall degree of damage. The absence of design seismic codes did not play an important role in this case, because most structures that displayed this feature crossed the CLS as well, as mentioned above. Pounding, on the other hand, had a contribution of approximately 15%. The height of the structure and potential preexisting poor condition accounted for 7-8% each. Out of the thirteen total features, these five combined to account for approximately 85% of the total in the model's predictions. Finally, we note that in this case the SHAP values are balanced, as the left and right subfigures, using L1 and L2, respectively, show minimal differences.
• Distinction between Green (minimal to no damage) and Yellow (SLS): Finally, the results for the distinction between structures that crossed the SLS (Yellow) and those that suffered minimal to no damage are shown in Figure 4c. It can be seen that the most important factors here are the existence and type of design seismic codes, each of which accounts for approximately 20% of the total. This is in agreement with the post-1985 Greek seismic codes, which enforce lower damage degrees for the same design earthquake. Regular infill panel walls, soft storeys and/or short columns, and the presence of adjacent structures that could lead to pounding were relevant here, although the magnitude of their effect was only approximately 10%.
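The percentage contributions quoted in the list above can be read as each feature's share of the total mean absolute SHAP value. The following minimal Python sketch illustrates that normalization. The feature names and the SHAP matrix are made-up placeholders, and it is an assumption here, not something stated in the text, that the reported shares were computed in exactly this way.

```python
def shap_shares(shap_rows, feature_names):
    """Return each feature's share (in %) of the total mean |SHAP| value."""
    n_rows = len(shap_rows)
    # Mean absolute SHAP value per feature (column-wise).
    mean_abs = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


# Hypothetical SHAP values: rows are samples, columns are features.
features = ["soft_storey", "regular_infill", "pounding"]
shap_rows = [
    [0.30, -0.25, 0.10],
    [-0.20, 0.35, -0.05],
    [0.40, -0.30, 0.15],
]
shares = shap_shares(shap_rows, features)
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Because absolute values are averaged, a feature with a large beneficial effect and one with a large detrimental effect can receive similar shares; the sign of the effect has to be read separately from the SHAP plots.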

Summary and Conclusions
In this research, we have employed a novel machine learning methodology to approach one of the problems commonly found in countries with high seismic activity, namely, that of preseismic structural assessment. In particular, we analyzed how the features obtained in the Rapid Visual Screening procedure affect the seismic vulnerability of structures. We focused on three well-known damage thresholds: the Serviceability Limit State (SLS), the Ultimate Limit State (ULS), and the Collapse Limit State (CLS), the last of which further emphasizes structures that, in addition to crossing the ULS, suffered total or partial collapse. We employed a pairwise approach to perform our analysis, creating pairs from all structures belonging to adjacent damage categories, as shown in Table 1. We then used a Gradient Boosting Machine to create a binary classification model that learned to distinguish structures for each of the above damage thresholds. As shown in Table 2, we tuned some of the model's hyperparameters to increase its performance. This led to the model having high accuracy, especially in the higher damage categories.
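To make the pairwise idea concrete, the sketch below shows one common way of turning two adjacent damage categories into a binary classification dataset: each pair is encoded as the difference of the two feature vectors, and a mirrored copy with the opposite label keeps the classes balanced. The difference encoding, the mirroring, the function name make_pairs, and the toy data are all illustrative assumptions; the construction actually used in this study is the one summarized in Table 1.

```python
from itertools import product


def make_pairs(X, y, low, high):
    """Build difference-vector samples from two adjacent damage categories.

    Each (high, low) pair yields x_high - x_low with label 1 and the
    mirrored difference x_low - x_high with label 0.
    """
    X_low = [x for x, label in zip(X, y) if label == low]
    X_high = [x for x, label in zip(X, y) if label == high]
    samples, targets = [], []
    for xh, xl in product(X_high, X_low):
        diff = [a - b for a, b in zip(xh, xl)]
        samples.append(diff)
        targets.append(1)
        samples.append([-d for d in diff])
        targets.append(0)
    return samples, targets


# Toy data (hypothetical): two binary features; labels 0=Green, 1=Yellow, 2=Red.
X = [[0, 1], [0, 0], [1, 0], [1, 1], [1, 0]]
y = [0, 0, 1, 1, 2]
pairs, labels = make_pairs(X, y, low=1, high=2)
print(len(pairs), labels)
```

The resulting samples and labels could then be passed to any binary classifier, for example a gradient boosting implementation such as scikit-learn's GradientBoostingClassifier.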
As can be seen from Table 3, the model learned to distinguish the CLS threshold with almost 92% accuracy; similarly, for the ULS threshold it displayed an accuracy close to 89%. While the model's performance dropped to 73% for the SLS, this is the least impactful of the three damage thresholds in engineering practice. Finally, we used SHAP values to quantify the effect of each of the features in our models' predictions. The previously mentioned high accuracy of our models, especially in the higher damage categories, enhances the reliability of the subsequently extracted SHAP values.
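For readers unfamiliar with the quantity behind SHAP, the following sketch computes exact Shapley values for a small toy function by enumerating all feature coalitions, with absent features replaced by a baseline value. The toy model, the baseline, and the function names are hypothetical and serve only as illustration; practical SHAP implementations for tree ensembles rely on much more efficient algorithms, and this is not the code used in the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value, the rest the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical score with an interaction between features 0 and 1.
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[1] - 1.5 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the model output at x and at the baseline, which is the additivity property that allows SHAP values to be interpreted as shares of a prediction.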
In addition, the present study highlights the participation of various factors that contribute to the overall structural vulnerability index as calculated via the RVSP. Qualitatively, our results broadly agree with the previously established engineering literature. For the CLS threshold, soft storeys/short columns, the height of the structure, absence of design seismic codes, and irregularities in height and plan were the most impactful detrimental factors. Regular infill wall panels were shown to have a very positive effect. For the ULS threshold, the absence of a design seismic code did not have a significant influence, as the vast majority of structures with this feature that crossed the ULS crossed the CLS as well. Finally, the implementation of modern design seismic codes played a crucial role in preventing structures from crossing the SLS threshold.
The quantitative results obtained via the application of ML methods and SHAP values demonstrate the potential applicability of this approach for recalibrating the computation of structural vulnerability indices using data from recent earthquakes. The method implemented in the present paper pertains to reinforced concrete structures with a particular set of input features; however, it could be implemented in an identical manner using a different set of input features, for example, in countries where other parameters are deemed more important. It could also be applied to different structural types altogether, for example, to masonry buildings commonly found in traditional communities.

Figure 1. Application of Rapid Visual Screening in a specific area. Samples of structures across the damage spectrum were drawn to mitigate local effects. Image courtesy of [20].

Figure 2. Distribution of structures across the damage spectrum.

Figure 3. Distribution of structures across the damage spectrum after undersampling.

Table 1. Number of structures for each label pair and corresponding samples in the transformed dataset.

Table 3. Classification metrics for the binary classifier of each pair, cross-validated on the whole dataset.