Interpretable Machine Learning for Assessing the Cumulative Damage of a Reinforced Concrete Frame Induced by Seismic Sequences

: This study investigates the interpretability of machine learning (ML) models applied to cumulative damage prediction during a sequence of earthquakes, emphasizing the use of techniques such as SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDPs), Local Interpretable Model-agnostic Explanations (LIME), Accumulated Local Effects (ALE), Permutation and Impurity-based technique. The research explores the cumulative damage during seismic sequences, aiming to identify critical predictors and assess their inﬂuence on the cumulative damage. Moreover, the predictors contribution in respect with the range of ﬁnal damage is evaluated. Nonlinear time history analyses are applied to extract the seismic response of an eight-story Reinforced Concrete (RC) frame. The regression problem’s input variables are divided into two distinct physical classes: pre-existing damage from the initial seismic event and seismic parameters representing the intensity of the subsequent earthquake, expressed by Park and Ang damage index (DI PA ) and Intensity Measures (IMs), respectively. The study offers a comprehensive review of cutting-edge ML methods, hyperparameter tuning, and ML method comparisons. A LightGBM model emerges as the most efﬁcient, among 15 different ML methods examined, with critical predictors for ﬁnal damage being the initial damage caused by the ﬁrst shock and the IMs of the subsequent shock: I FVF and SI H . The importance of these predictors is supported by feature importance analysis and local/global explanation methods, enhancing the interpretability and practical utility of the developed model.


Introduction
A series of successive occurring earthquakes can cause additional damage to already damaged buildings and consequently increasing the risk of collapse.Historical examples of such events include the New Madrid earthquake series in the central United States from 1811 to 1812, and the 1960 Chilean seismic sequence.The New Madrid sequence [1] comprised three primary shocks and numerous aftershocks, causing widespread structural damage and affecting regions as distant as Canada and the eastern United States.The 1960 Chilean earthquake sequence [2], initiated by a record-setting 9.5 magnitude mainshock, led to substantial destruction and casualties in Chile, as well as a series of tsunamis.One notable example of an earthquake sequence occurred in central Italy in 2016 [3,4].Over a span of several months, a succession of earthquakes, with a 6.2 magnitude mainshock, affected the area, leading to significant destruction and casualties.Experts studying the series of seismic events found that they transpired on pre-existing fault lines that had been under stress for a considerable period.Moreover, they ascertained that the earthquake sequence had shifted the stress distribution within the region, potentially increasing the probability of subsequent seismic events.Another example is the Disclaimer/Publisher's Note: The statements, opinions, and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s).MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions, or products referred to in the content.

of 31
2019 Ridgecrest earthquake sequence [5] in California, which included a 6.4 magnitude mainshock followed by several aftershocks, including a 7.1 magnitude event.This sequence caused significant damage to infrastructure and homes in the region, and it demonstrated the potential for aftershocks to be almost as powerful as the mainshock.In February 2023, southeastern Türkiye and parts of Syria were hit by two strong earthquakes that resulted in numerous aftershocks and significant loss of life and injuries [6,7].The earthquakes had magnitudes of 7.8 and 7.5 and occurred approximately nine hours apart.As an outcome of all the above examples, it is important to assess the cumulative damage and vulnerability of buildings to multiple earthquakes to prevent or minimize the potential losses.
Several studies focus on the effect of multiple earthquake events on the seismic performance of Reinforced Concrete (RC) structures.Abdelnaby in his PhD thesis [8] examines the consequences that successive seismic events have on RC buildings' degradation.Later, Hatzivassiliou and Hatzigeorgiou [9] investigate the effect of seismic sequences on the response of three-dimensional RC buildings, while Kavvadias et al. [10] examine the impact of aftershock severity characteristics on the seismic damage of a RC frame.Trevlopoulos and Guéguen [11] propose a framework to assess the vulnerability of RC structures during aftershock sequences based on period elongation.Other studies in this area include Shokrabadi et al. [12] who investigate the impact of mainshock-aftershock (MS-AS) ground motion pairing on the seismic performance, and Furtado et al. [13] asses the damage of infilled RC structures under MS-AS conditions.Di Sarno and Pugliese [14] study the seismic fragility of existing RC buildings with corroded bars under earthquake sequences, while Iervolino et al. [15] examine the accumulation of seismic damage in multiple MS-AS sequences.Rajabi and Ghodrati Amiri [16] develop behavior factor prediction equations for RC frames under critical MS-AS sequences using Artificial Neural Networks (ANNs).Soureshjani and Massumi [17] investigate the seismic behaviour of RC moment-resisting structures with concrete shear walls under MS-AS seismic sequences.Khansefid [18] studies the effect of structural nonlinearity on the risk estimation of buildings under MS-AS sequences in Tehran metro city.Also, Hu et al. [19] propose a framework for the seismic resilience assessment of buildings that considers the effects of both mainshock and multiple aftershocks.Finally, Askouni [20] investigated how repeated earthquakes affect RC buildings with in-plan irregularities.Overall, the research findings demonstrate the importance of considering the effects of multiple earthquakes in the design and assessment of RC structures.
Machine learning (ML) has gained increasing attention in earthquake and structural engineering due to its ability to predict the behaviour of structures.As a result, a number of studies have focused on exploring the application of ML in this fields.The early papers, such as Zhao et al. [21], used ANNs to predict the response of simple simulated structures, and Stavroulakis and Antes [22] applied ANNs in nondestructive elastostatic identification of unilateral cracks.Later papers, such as De Lautour and Omenzetter [23] and Lagaros and Papadrakakis [24], focused on predicting the non-linear seismic response of 2D and 3D buildings, respectively, using ANNs.These papers demonstrated the potential of this ML method to accurately predict the behaviour of complex structures.Other papers have investigated the application of hybrid ML techniques, such as neuro-fuzzy methods, in predicting structural damage under earthquake excitation, as shown in Sánchez Silva and García [25], Alvanitopoulos et al. [26] and Vrochidou et al. [27].Meanwhile, Mangalathu et al. [28] used ML techniques to classified buildings based on data from post-earthquake observations and Wang et al. [29] to predict the progressive building collapse.Recent papers also have explored the use of deep learning, which have shown great promise in rapid seismic response prediction of RC frames, as seen in Wen et al. [30].In addition, Zhang et al. [31] proposed a physics-guided convolutional Neural Network (NN) for data-driven seismic response modelling, while Muradova and Stavroulakis [32] developed physics-informed NN for elastic plate problems with bending and Winkler-type contact effects, furthermore, Katsikis et al. [33] used the same type of NN in static rod and beam problems.Several papers explore the use of ML for seismic response prediction, including Morfidis and Kostinakis [34], who used ANNs to predict the damage state of RC buildings, and Hwang et al. [35], who used ML to predict the seismic demand and collapse of ductile RC building frames.

of 31
Other papers focus on developing earthquake scenarios for building portfolios [36], rapid seismic response prediction [37], and multivariate seismic classification [38].Kazemi et al. [39] presented a ML-based approach for classification of the structural behaviour of tall buildings with a diagrid structure.Karbassi et al. [40] and Ghiasi et al. [41] proposed decision tree and support vector machine algorithms, respectively, to predict damage in RC buildings.Thaler et al. [42] used ML techniques to enhance tail-end prediction of seismic response statistics, while Chen et al. [43], and Jia and Wu [44] investigated probabilistic ML methods for predicting the performance of structures and infrastructures.The current knowledge also cover various application of ML on seismic fragility and vulnerability prediction, including the development of seismic fragility curves, as in Kiani et al. [45], fragility and vulnerability assessment of RC structures as in Kazemi et al. [46], fragility curve parameters prediction for RC buildings as in Dabiri et al. [47], assessing seismic risk and fragility of underground structures as in Yang et al. [48] and efficient and precise seismic vulnerability assessment of urban structures as in De-Miguel-Rodríguez et al. [49].Moreover, Morfidis et al. [50] developed a user-friendly software application that leverages ANNs for rapid damage assessment of RC buildings in earthquake scenarios.Lastly, Lazaridis et al. [51,52] presented studies on forecasting the structural damage experienced by a RC frame subjected to both individual and successive seismic events using ML methods.Additional information regarding the application of ML in the field of earthquake and structural engineering can be found in the respective review papers [53][54][55][56][57][58][59][60][61][62].
In general, ML models are utilized in earthquake engineering to deliver rapid and accurate seismic vulnerability assessment.Despite this, the advanced of these methods have been criticized as "black boxes", due to lack of transparency and resonable explanation of their predictions, questioning their reliability.To address this issue several techniques have been developed recently [63].Global methods focus on capture the general patterns and trends of models, whereas local approaches target specific data points and provide insights for individual predictions.These kind of techniques have been applied by Mangalathu et al. [64] for seismic performance assessment of infrastructure systems, by Wakjira et al. [65] in flexural capacity prediction of RC beams strengthened with FRCM, by Junda et al. [66] in seismic drifts estimation of CLT buildings and by Demertzis et al. [67] in damage prediction of RC buildings following a single earthquake.
In this study, the expansion of interpretation techniques to the concept of cumulative damage during a sequence of earthquakes is carried out, with the pre-existing seismic damage being taken into account.The primary questions addressed in the present paper include: identifying the most important predictors for damage accumulation, understanding how these variables impact the final damage, and determining the range of cumulative damage in which their contributions lie.Furthermore, a thorough review of employed cutting-edge ML methods is offered, with hyperparameter tuning and ML methods comparisons being incorporated.To this end, the transformation of the problem from the perspective of earthquake engineering to that of ML through the adopted workflow and methodology is discussed.

Feature Selection and Dataset Configuration
The input variables for our regression problem fall into two distinct physical classes: the pre-existing damage from the initial seismic event, along with the seismic parameters representing the intensity of the subsequent earthquake, with the goal of forecasting the cumulative damage occurred by two consecutive earthquakes.The first class of feature, as well as the target, is expressed in terms of Park and Ang damage index (DI PA ) [68].For the second category, a preliminary feature selection process has been conducted.During this process, the outcomes of several studies [34,[69][70][71][72][73][74][75][76][77] investigating the interdependence between seismic parameters and damage indicators of RC structures has been taken into account.As a result, 16 prominent Intensity Measures (IMs) have been selected as seismic damage predictors to express the severity of the second shock.The selected IMs, as well as the damage index, are described in the following section.

Ground Motion IMs
Seismic ground motion IMs are metrics employed to quantify the intensity or severity of seismic acceleration signals.These measures play a crucial role in evaluating a site's seismic hazard, predicting seismic demand on structures, and designing earthquake-resistant structures.Various IMs have been suggested over time, each with its own pros and cons.Peak Ground Acceleration (PGA) [78] is among the most commonly used ground motion signal IMs.PGA represents the maximum absolute acceleration of ground motion during an earthquake and is extensively used in seismic hazard analysis and building design, as it offers a straightforward indication of ground shaking intensity.Furthermore, our suit of IMs encompass amplitude parameters such as the maximum absolute values of ground velocity (PGV), and ground displacement (PGD) signals.The Arias intensity (I A ) [79] and Cumulative Absolute Velocity (CAV) [78] are additional seismic ground motion IMs that supply information about the overall amount of ground motion energy during an earthquake.Both I A and CAV are determined as integrals of ground motion acceleration over time, offering a more comprehensive depiction of the seismic signal compared to PGA or PGV, as they account for both the amplitude and duration of the signal.
The frequency content of ground motion signals significantly influences a structure's response.This content can be assessed in a simplified manner using the corresponding frequency PGA/PGV or by calculating the average zero-crossing count of the acceleration time history per unit time.If the number of zero-crossings is denoted as u o , the fraction of I A over u 2  o is recognized as the potential destructiveness measure, according to Araya and Saragoni (I AS ) [80].The strong motion duration of seismic excitation considering as the time interval during which most of the total intensity is released, is another vital parameter.Two definitions of strong motion duration, Trifunac and Brady (SMD TB ) [81] and Reinoso, Ordaz, and Guerrero (SMD ROG ) [82], are based on the Arias intensity's time evolution.The bracketed duration after Bolt (SMD Bolt ) [83], which is calculated based on the initial and final instances when the acceleration exceeds 5% of g, is also utilized.
Advanced measures can be obtained by merging the above parameters, such as power P90 [78], a rms [78], characteristic intensity (I c ) [78], the potential damage metric by Fajfar, Vidic, and Fischinger (I FVF ) [84], and the intensity measure by Riddell and Garcia (I RG ) [85].However, spectral values reliant on the fundamental structural period are not applicable due to the increase in the fundamental period during the initial earthquake.As an alternative, the Housner intensity (SI H ) [86], which aggregates the values of pseudo-velocity spectrum (PSV) to a constant interval of periods and exhibits a strong correlation with seismic damage, is utilized.A brief depiction of the formulas for the studied IMs is given in Table 1.IMs values are derived with Python [87] and NumPy [88] code, whereas the computation of acceleration spectra is performed using OpenSeismoMatlab [89].

Damage Index
Seismic damage in structures manifests as a reduction in resistance to external forces, resulting in instability.The Park and Ang damage index (DI PA ) is a reliable seismic damage metric that represents structural damage as a linear combination of excessive deformation and damage developed by repeated cyclic loading effects.This index is calculated by summing the maximum bending responses and the energy absorbed by plastic hinges during an earthquake, as shown in Equation (1).Kunnath's modified version of the index [91] is calculated using Equation (2).The total damage index (DI G,PA ) is obtained as an adjusted mean of sub-damage values, where each sub-damage belongs to each structural member.The weight of each sub-damage is proportional to the energy used by its corresponding structural member, according to Equation (3).A low value of DI G,PA , close to zero, indicates that the structure has experienced minimal damage and exhibited an elastic response.Conversely, a value around the unity and larger signifies that the structure is nearing collapse.Overall damage indices, such as DI G,PA , offers a quantitative assessment of a structure's seismic damage and has been utilized in several studies [34,[69][70][71][72]76,92] to evaluate the post-earthquake condition of buildings.
The damage index equation comprises several variables associated with structural element capacity and response, such as the maximum displacement response (δ m ), the ultimate displacement capacity (δ u ), the strength deterioration model constant (β) [93], the absorbed cumulative hysteretic energy ( dE), the yield strength (Q y ), the maximum rotation of the member throughout the response (θ m ), the member's ultimate rotation capacity (θ u ), and the recoverable rotation during unloading (θ r ).

Dataset Configuration
For this study, a comprehensive dataset containing both artificial and natural seismic sequences is used.The former are generated by randomly pairing 318 real acceleration records, while the latter comprises 111 real pairs of sequential shocks.Further information about the suite of seismic sequences can be found in [52].All seismic records are sourced from the ESM [94] and PEER NGA West [95] databases, with a 20-second intermediate time gap introduced between consecutive records (Figure 1) to eliminate overlapping structural responses.Nonlinear time history analyses are conducted on both seismic sequences and individual ground motion records to investigate the seismic structural response.
The real seismic sequences are presented in Table A1 in Appendix A, while the histograms and the probability density curves for all variables across the total dataset are provided in Figure A1 in the same appendix.In this study an eight-story RC frame (Figure 2) designed by Hatzigeorgiou and Liolios [96], without seismic provisions [97] or retrofitting [98][99][100][101][102][103] is examined.The finite element simulation is performed using IDARC 2D [104], and the distributed plasticity concept with a three-parameter Park hysteretic model [105] is employed.The concrete and steel materials are modeled based on their effective strength and strain properties, and the structure's initial elastic fundamental period is 1.27 seconds.GNU Octave [106,107] code is also written to automate the creation of IDARC 2D input files and the subsequent processing of the results.

Machine Learning Methods, Hyperparameter Tuning and Interpretation
Machine Learning (ML) is a branch of artificial intelligence that focuses on create algorithms capable to learn from and make predictions based on data, with the ultimate goal to generalize effectively to new and unseen examples.A subcategory of ML is the Supervised Learning which deals with building models that predict a target output using input features.model parameters are adjusted to minimize the cost function and thereby achieving an effective model fit.A loss function (L) measures the difference between the predicted and actual outputs for an individual data point.Conversely, a cost function J computes the total error between the predicted and true outputs for all data points in the training dataset.A general form of a cost function J provided in Equation ( 4) in which the total error estimated as the average of total m data point losses (L).
The main two problems which the ML techniques could be suffered from are the high variance and high bias problem or in other words overfitting and underfitting respectively.Overfitting define the weakness of the model to generalize its prediction after training to new unseen data as those of the test set, in case of high bias the model does not have the required complexity or the proper input features and behave poorly both to train and test set.To address the variance problem, the regularization technique is used in ML.It adds a penalty term to the cost function that discourages the model from having too many parameters or from having parameters with large magnitude.This helps to prevent overfitting and improve the generalization performance of the model.

Linear Models
In linear models, the output variable is determined by calculating the weighted average of the input features and adding a constant, known as the bias term or intercept (Equation 5).Various types of Linear Regression (LR) [108,109] exist, which are further explained below.
The most commonly used linear regression model based on Ordinary Least Squares (OLS), and establishes the association between input features and the output variable by minimizing the sum of squared differences between real and predicted values.As a fundamental and classic approach to linear regression, OLS is often utilized as a baseline against which more advanced models are evaluated.Lasso Regression [110] adds a l1 regularization term to the loss function J, which results in sparse solutions, where many of the coefficients are exactly equal to zero.The regularization term is defined as the sum of the absolute values of the coefficients.This can lead to feature selection and interpretability, but can be too aggressive in shrinking the coefficients and result in a suboptimal solution.Ridge Regression [111] (RR) adds a l2 regularization term to the loss function J, which results in non-sparse solutions, where the coefficients are shrunk towards zero, but not necessarily exactly equal to zero.The regularization term is expressed as the sum of the squared coefficients, which contributes to achieving more stable solutions and prevent overfitting.Elastic Net [112] (EN) is a combination of Lasso and Ridge Regression, which adds both l1 and l2 regularization penalties to the loss function J (Equation ( 6)), weighted by a mixing parameter, ρ.The regularization parameter α determines the strength of the penalty applied to the coefficients, with larger values of α leading to stronger regularization.The mixing parameter ρ governs the balance between the l1 and l2 penalties in the model.When ρ = 1, it results in the Lasso penalty, whereas when ρ = 0, the Ridge penalty is applied.The adjustment of the mixing parameter, can fine-tuned the model and achieve the desired balance between the two penalties.

Non-parametric Algorithms
The so-called non parametric algorithms have no trainable parameters and and as a result, they get their name.An example of this kind of algorithm is the K-Nearest Neighbors [113] (KNN) is an instance-based technique that depends on memorizing the training dataset.The "K" in the method's name signifies the number of training data examples nearest to the point being examined, which are considered when making predictions.This is the primary hyperparameter of the method.
Decision Trees (DTs) [114] constitute a form of ML algorithm used to tackle classification and regression issues.The algorithm constructs a tree-like model illustrating the connections between input features and target variables.Every node in the tree represents a decision of input features, and each branch corresponds to a possible outcome of that decision.The leaves of the tree denote the final prediction or classification outcome.The algorithm's objective is to identify the optimal set of splits that most effectively separate the data, maximizing the homogeneity within each resulting subset.This is accomplished by using criteria such as Information gain or Gini impurity to determine the most suitable feature for data splitting at each node [114].The depth at which a feature is used as a decision node in the tree is indicative of its relative importance, as it reflects the degree to which the feature contributes to the reduction of impurity.Additionally, a feature's importance increases as it is more frequently used in a tree's split points resulting in a greater impurity reduction.DTs provide several advantages such as interpretability, handling both continuous and categorical data, user-friendliness, and fast prediction times.However, they also have some drawbacks, such as being susceptible to overfitting and a tendency to generate complex trees that are challenging to interpret [114].Addressing these challenges, various algorithms have been devised, such as Random Forests, which is an ensemble of DTs [115], and Gradient Boosted Decision Trees, a boosting technique for DTs [116].

Random Forests
Random Forest [115] (RF) is an ensemble ML technique that relies on multiple DTs as its base learners, creating a multitude of trees and fusing their predictions to deliver a final result.This approach aims to tackle the overfitting issue often found in single DTs by aggregating the outputs of several trees.In the RF algorithm, each DT is formed using a random portion of the training data and a random set of features, expanding to its allowed full depth with the ultimate prediction determined by averaging the predictions of individual trees.Employing random data and feature subsets effectively addresses overfitting, as each tree has access to only a limited part of the data and is less prone to fit the data's noise.Nonetheless, random forests can be computationally demanding, particularly when working with large datasets, and interpreting random forests can be more complex than single DTs or linear models because the predictions are averaged across numerous DTs.Despite these drawbacks, RFs have earned popularity and extensive use due to their simplicity, adaptability, and impressive performance in a broad array of applications.An adaptation of RF algorithm is the Extremely Randomized Trees (ERTs) which introduced by Geurts et al. [117] in 2006.This method aims to further reduce the correlation between the trees in the forest to decrease overfitting and enhance the stability of the predictions.In ERTs, the feature selection process at each split in the DTs is randomized also, instead of using a greedy optimization algorithm like traditional DTs.Alongside random feature selection, ERTs also employ random thresholds for each feature at every split [117].By incorporating randomness into the feature selection process, ERTs reduce the correlation between the trees in the forest and make the final predictions more resilient to minor changes of the data.This can lead to enhanced generalization performance, particularly when handling noisy data [117].9 of 31

Boosted Trees
Boosted trees come in several variations, each with their distinct advantages and disadvantages.Adaboost (Adaptive Boosting) [118,119] is among the earliest boosting algorithms.Its fundamental principle is to assign higher weights to challenging to predict instances and lower weights to easily predicted instances.The algorithm trains a weak learner on the weighted training data and adjusts the weights based on the weak learner's performance.This process iterates, with each iteration training a weak learner and updating the weights.The final prediction is derived by combining all weak learners' predictions through weighted majority voting.
Gradient Boosting (GBoost) [116] initiates by fitting a simple decision tree to the training data, followed by fitting another decision tree to the residuals (the discrepancy between the actual target values and the first tree's predictions), and so forth.By integrating multiple trees' predictions, the algorithm strives to rectify previous trees' errors and incrementally enhance the model's overall performance.The central concept of Gradient Boosting is to utilize gradient information to direct model improvement.The gradient information stems from the loss function, which measures the difference between the actual target values and the predictions.The algorithm fits the subsequent tree to the loss function's negative gradient concerning the prediction to minimize the loss.Equation ( 7) explains the general idea behind Gradient Boosting.
where F(x) is the final prediction of the model, F n-1 (x) is the prediction of the previous iteration, T n (x) is the n-th tree model in the ensemble, β n is the weight assigned to the n-th tree model.
The weights β n are optimized to minimize the cost function J (Equation ( 8)), which measures the difference between the true values and the model's predictions.
where F n-1 (x) is the prediction of the previous iteration, m is the number of samples, and L is the loss function.
The learning rate, also referred to as shrinkage or step size, governs the increase of adjustment in each iteration.A smaller learning rate results in slower convergence but generally produces more precise models.
Gradient Boosting presents several benefits over other ML methods, including its ability to handle complex non-linear relationships, work with both categorical and continuous variables, and perform effectively with noisy and incomplete data.However, gradient boosting can be resource-intensive and may overfit the training data if there are excessive trees or if the learning rate is excessively high.Furthermore, interpreting gradient boosting models can be more difficult compared to linear models or individual DTs.
With time, numerous variations of Gradient Boosting have arisen.Some of the most prominent versions of Gradient Boosting include the following: Microsoft's Light Gradient Boosting or LightGBM [120] has gained recognition due to its speed and effectiveness.Created for large-scale, high-dimensional datasets, LightGBM optimizes memory usage and training time.Its gradient-based one-side sampling (GOSS) method, which randomly samples a subset of instances for training each tree, is a key feature that minimizes model bias and improves performance.CatBoost [121,122] employs multiple overfitting prevention techniques such as regularization and early stopping to prevent the model from fitting data noise.XGBoost (eXtreme Gradient Boosting) [123], a gradient boosting implementation, can manage missing values and outliers by partitioning data into smaller segments and fitting trees to the subsets.XGBoost also uses regularization techniques to prevent overfitting and leverages parallel processing for rapid and efficient performance on large datasets.In contrast, Natural Gradient Boosting (NGBoost) [124]  probabilistic framework for decision trees, enabling uncertainty estimation and generating meaningful feature importances that are not easily achieved in conventional gradient boosting.

Feedforward Neural Networks
ANNs are information processing structures initially inspired by the functioning of the human brain and biological neurons [125].Specifically, the Multilayer Perceptron (MLP) [126][127][128], also known as Feedforward Neural Networks, is a type of ANN that serves as a potent learning model for addressing both classification and regression challenges.The MLP consists of artificial neurons called perceptrons or units and organized in layers.Its architecture is composed of successive fully connected layers, with the input layer containing the predictors or independent variables and the final output holding the target variable.Layers and units placed between the input and output layers are referred to as hidden layers and units.Within a hidden layer, each neuron obtains activations from every neuron in the preceding layer and sends its activation to all units in the following layer.More specifically, each neuron's value is calculated by applying a non-linear function, known as the activation function, to the previous layer units linear combination.The coefficients of the aforementioned linear combination are divided into weights (W l ij ) and biases (b l i ), collectively so-called as the trainable parameters of layer l.The process of mapping input features to the output target, called Forward Propagation of a neural network and illustrated by Equation ( 9).The value a l i of the i th neuron in the l th layer depends on the j th neuron in the (l -1) th layer, while the vector α 0 j represent the input attributes.Initially, random values and zeros assigned to weights and biases, respectively, at the beginning of training.
During training, the process of estimating the cost function (J) partial derivatives with respect to trainable parameters (W l ij , b l i ) is known as back-propagation (Equations ( 10)) [129,130].It begins by determining the error at the output layer through a comparison between the predicted target values ŷ and the actual target values y.The error is then propagated backwards through the network.This procedure is iterated for every layer until the error reaches the input layer.To minimize the losses between the predicted and actual target values in the training set, these parameters are updated in each step according to the gradient descent optimization algorithm.This algorithm operates by taking incremental steps in the direction of the negative gradient of the cost function concerning the trainable parameters.The update rules for weights and biases are provided by Equations (11).Furthermore, the complexity of the MLP is influenced by the number of hidden layers and units, as well as the regularization parameter (λ), which functions similarly to the explanation in subsection 3.1, resulting in Equation (12) for the regularized cost function.
where denotes element-wise multiplication.
where α is the learning rate, and ← denotes assignment.
where L is the number of layers, n is the number neurons of layer l, m is the number neurons of layer l -1, and λ is the regularization parameter.

Hyperparameter Tuning and K-fold Cross Validation
The trainable parameters of every parametric ML model could be calibrated after minimization of the cost function on the training set, however this is not the case about the hyperparameters which should be set before the training.Consequently, each variation of hyperparameters' values configured a different instance of a ML method and by comparing the performance of multiple fitted models their optimized values are determined.Fine-tuned hyperparameters could be significantly reduce the bias, variance effects and results in a model with high generalization ability.Hyperparameter tuning is frequently performed using the Grid Search method [131]; however, the Randomized Search Cross-Validation (CV) method [132] can be more efficient when dealing with a large number of hyperparameters and their possible values, as it reduces computational expense.This method select randomly a subset of the possible hyperparameters values combinations and utilizing K-fold CV [133] to assess their performance.According to K-fold procedure, the dataset would be divided in K sections and every ML algorithm should be fitted K times in different every time training set constituted by K-1 parts and evaluated using the remaining Kth part.Afterward, for all of the examined models, the mean cross-validation predicting capacities, which have emerged from the K-fold cross-validation procedure, will be compared, enabling a conclusion to be reached concerning the best performing model for the given problem.All of the above processes are implemented using scikit-learn [134] which is an open source and freely accessible ML library built in Python.

Interpretation Methods
While interpreting a single decision tree can be relatively straightforward by analyzing its structure, including the impurity decrease at each node, gradient boosting models encompass numerous regression trees, making understanding them through individual tree examination more challenging.Moreover, the complexity of ANNs exacerbates interpretability difficulties of high-level ML methods.To address this issue, model-agnostic interpretation approaches have been developed recently [63].This kind of techniques could be used to explain the predictions of any previous described ML method, without consider the internal structure of the model.In this section, several methods that have been developed to distill and interpret advanced and complex ML models, such as gradient boosting models or ANNs, are described.

Global Interpretation Methods
Contrary to local methods, which emphasize in particular instances, global interpretation approaches elucidate the mean performance of ML models.By employing expected values contingent on data distributions, these approaches facilitate the identification of prevailing trends.An illustrative example is the Partial Dependence Plot (PDP) [116,135], a type of feature effect plot that presents the projected outcome whilst marginalizing extraneous features.In essence, the PDP can be seen as the anticipated target response based on the input features of interest.Due to human perceptual constraints, the set of input features of interest should be limited, typically encompassing one or two features.For a single selected feature, the partial dependence function, at a specific feature value, indicates the average prediction when all data points are assigned that particular value.Additionally, Accumulated Local Effects (ALE) [136] plots extend the concept of PDP by accounting for potential feature interactions and correlations.Also, the Permutation feature importance, introduced by Breiman [115], is another global method that assesses the impact of individual features by measuring the change in model performance when the feature values are randomly shuffled, thereby breaking the relationship between the feature and the target.

Local Interpretation Methods
In contrast, local interpretation methods such as Local Interpretable Model-agnostic Explanations (LIME) [137] and SHapley Additive exPlanations (SHAP) [138] focus on explaining individual predictions.More specifically, these methods clarify each feature's contribution to an individual prediction, which can be positive, negative, or neutral.LIME creates an artificial dataset in the vicinity of the examined data point, using the predictions of the complex model as ground truth.In the next step, fit an interpretable model (e.g., linear or decision tree) to the artificial dataset, to approximate the complex model locally with a simpler one.This process offers insights into the model's behaviour for a specific data instance.SHAP is an approach for interpreting ML models that provides a unified and scalable framework for explaining any ML model's output.SHAP values, rooted in cooperative game theory [139], offer a unified measure of feature importance that represent the contribution of each feature to a specific prediction.This technique assess a feature's influence on a prediction by taking into account all possible combinations of feature values and aggregating the resulting predictions.SHAP values quantify a feature's contribution to the prediction while considering feature interactions, and the sum of contributions for all features in a prediction is equal to the difference between the prediction and the average prediction.Choosing between SHAP and LIME depends on the context, as no consensus exists on which is universally better.

Hyperparameter Tuning and ML Models Comparison
An extensive examination of 10,000 distinct variations for the majority of ML methods outlined in Section 3 was conducted, utilizing the Randomized Search CV procedure.Each variation, also referred as an instance of an ML method, was configured with different values and combinations of hyperparameters.A total of 15 ML methods were assessed to determine the most efficient method for predicting the cumulative seismic damage under MS-AS sequences.The best performing instances for each of the examined ML methods are presented in Figure 3, which showcases their performance in terms of the determination coefficient R 2 .This coefficient was calculated for both the training and CV sets during the K-fold process, as well as for the final test set, which was reserved for unbiased evaluation and comprised 15% of the overall dataset.An absolute improvement of 10% in R 2 values was observed when comparing linear models to the most recently developed and advanced ML methods (ensemble, MLP).Our results highlight the high efficiency of the majority of ensemble methods, including both boosted and randomized trees, for predicting the cumulative seismic damage.The MLP model exhibits a slightly more effective bias-variance balance, as it achieves more similar performance across training, CV, and test sets in comparison with the other advanced models.The most efficient model showcased an instance of the LightGBM method which demonstrate the better CV and test performance, while the the poorest performance demonstrated by KNN, with R 2 ≈ 0.4 .In the case of linear models (Section 3.1), regularization factors ranging from 0 to 1 were employed as hyperparameters.For DT, the splitting criterion options included Squared Error, Friedman MSE, Absolute Error, and Poisson, while the maximum tree depth ranged from 0 to 100, both serving as hyperparameters.In the context of random forests, the number of trees varied from 0 to 200, with the splitting criterion, maximum tree depth, and maximum number of features for selection during the splitting procedure all functioning as hyperparameters.In contrast, the splitting criterion is not an individual hyperparameter for boosted trees, and consequently, exploration regarding this aspect is not conducted for this ML method family.However, the learning rate, a hyperparameter which controls the magnitude of updates during the training process of boosted trees, is included.
In Figure 4, the evolution of R 2 against the number of trees is compared for different types of boosted trees.The colored bands represent the influence of other hyperparameters on the performance of each method, with a 95% confidence interval.The AdaBoost method exhibits the poorest performance and lacks the ability to control overfitting as the number of trees increases.In contrast, LightGBM and XGBoost demonstrate the most optimized and robust performance, with higher R 2 values and lower variability as the number of trees increases.Additionally, CatBoost, GBoost, and NGBoost, in descending order, present intermediate R 2 values, with GBoost exhibiting more robust behavior in response to changes in the number of trees.In summary, the majority of boosted trees display instances with high prediction performance, exceeding 0.9.The comparison between the two examined methods of randomized trees, is illustrated in Figure 5.The figure presents the evolution of the mean determination coefficient versus the number of base learner trees, with the hatched band depicting the uncertainty introduced by the other hyperparameters with the same as in case of boosted confidence interval.In the case of Multi-Layer Perceptrons (MLPs), the regularization parameter ranged from 0 to 1. MLPs consisted of one, two, or three hidden layers, each containing between 1 and 200 neurons.The activation function of the hidden neurons varied among identity (no activation function), hyperbolic tangent, and ReLU (Rectified Linear Unit); however, no activation function was consistently applied to the output neuron because the examined problem was being addressed as regression.The sigmoid function was not investigated, as it is more appropriate for use in the output neuron of an MLP classifier.
In Figure 6, the mean R 2 score, calculated in the CV sets using the K-fold validation procedure with K equal to 10, is depicted as being plotted against the total number of hidden neurons composing the MLP.The MLPs utilizing the ReLU activation function appear to exhibit better performance, followed by those with the Tanh activation function.The MLPs with no activation function essentially represent linear mapping, and their R 2 scores are similar to the results from linear models (Figure 3).Additionally, increasing the number of hidden units beyond 100 seems to have little impact on the method's performance, as the R 2 score plateaus around a constant value for each activation function after this point.

Interpretation of the Best ML Model
Generally, input features have varying levels of contribution to predicting the target outcome, and many of them may not be significant.The primary goal in model interpretation is to identify the key features and grasp how they influence the prediction of the target outcome.In this section, the interpretability of the devised ML model is explored, beginning with a discussion of feature importances.
As the hyperparameter tuning and comparison of the examined ML methods are concluded, 8 of them are demonstrated to have high efficiency with R 2 > 0.9.The best ML models are instances of the ensemble and MLP methods and more specifically the more efficient is shown to be the LightGBM model with 195 trees, a maximum depth of 4, a maximum of 17 features (all) and learning rate equal to 0.1589.The Permutation technique, a global model-agnostic interpretation method, is implemented to extract the feature importances for the best models.The relative feature importances of the most efficient models, are presented in Figure 7.As observed, the majority of models agree that the most crucial predictor of the final damage is the pre-existing damage DI G,PA,1st , followed by I FVF , SI H and PGA PGV .Figure 7. Feature Importances for the ML methods with R 2 > 0.9, according to Permutation method.
Individual DTs inherently perform feature selection by choosing suitable split points based on impurity reduction, which can be utilized to assess the significance of each feature.By measuring the decrease in impurity contributed by each feature throughout the tree, we can estimate its relative importance in the decision-making process.Since the most suitable ML method for our problem turned out to be LightGBM, a tree-based ensemble method, the notion of significance can be applied to our model by averaging the impurity-based feature importance across all trees.In Figure 8, the importances of input features, computed using three model-agnostic methods (LIME, SHAP, and Permutation) and the impurity-based method, which is only applicable to decision tree-oriented ML models, are presented.The y-axis of each subplot displays the features in descending order of importance according to the respective interpretation method.Both LIME and SHAP, as previously described, calculate the contributions of each feature for every data point individually.To obtain comprehensive importances, the mean absolute attribution and value for each method are calculated, respectively, over the entire dataset.As depicted in Figure 8, all interpretation methods identify the initial damage DI G,PA,1st as the most significant feature.The majority of interpretation methods indicate the second most important feature and the most significant among intensity measures (IMs) to be I FVF , except for the impurity-based explanation, which identifies SI H . LIME and SHAP rank SI H as the third most important IM (third input feature), while Permutation suggests PGA  PGV and Impurity proposes I AS .The examined methods also, do not agree on the third most important IM: LIME identifies it as CAV, SHAP as PGA PGV , Permutation as SI H , and Impurity as PGD.However, feature importances does not provide insights into whether a positive or negative change in an input variable leads to a corresponding positive or negative influence on the output variable, or the contrary.

Local Explanation Methods (LIME, SHAP)
On the other hand, local interpretation methods offer numerous advantages over other methods for explaining model predictions, including being model-agnostic, enabling explanation of individual predictions, and providing a unified approach for interpreting both linear and non-linear models.By considering each feature's contribution and its interactions with other features, deliver accurate explanations of the model's prediction.The sum of SHAP values for a given instance is equal to the difference between the prediction and the expected prediction value over the entire dataset, ensuring the model's overall behaviour is considered.For a model f(x) that maps input x to a prediction, the SHAP value of feature i for a specific sample x is defined in Equation (13).
where T is the set of all feature indices and x S is the input vector with only the features in S present.
The formula represents the average difference that adding feature i to the input makes over all possible combinations of the remaining features.LIME assesses the impact of each input variable on an individual example basis.The distribution of LIME attributions across the entire dataset is depicted in Figure 9a.On the y-axis, the input features are arranged in descending order based on their mean absolute LIME per feature, as discussed in the previous section.Features such as DI G,PA,1st , I FVF , and SI H appear to have a positive impact, while the PGA PGV ratio suggests a negative impact on the final damage outcome.The distribution of SHAP values for each predictor throughout the overall dataset is depicted in Figure 9b.The x-axis values display the SHAP values for each contributing variable, signifying their impact on the final damage in terms of DI G,PA , while the y-axis arranges the predictors according to their importance.The colors in the figure represents the values of the input features.For instance, increased values of DI G,PA,1st , I FVF , SI H , and PGV lead to a higher estimated final damage, DI G,PA .On the other hand, lower values of PGA PGV and PGA result in higher SHAP values, which negatively influence the final damage.To examine the contribution variation of pre-existing damage and consequent seismic shock severity characteristics across different levels of the final damages, the dataset and the corresponding LIME attributions and SHAP values were grouped according to the following damage states: Minor (0 < DI G,PA ≤ 0.3), Moderate (0.3 < DI G,PA ≤ 0.6), Severe (0.6 < DI G,PA ≤ 0.8), Collapse (0.8 < DI G,PA ).The mean absolute values for each of the above-mentioned interpretation methods were calculated per damage state.The contributions of all the IMs are summarized and compared to those of the initial damage.In Figure 10, the results of the above process are displayed, with values normalized to unity.This illustration highlights the comparison between the contributions of seismic parameters and those of initial damage across each damage state.Quite different results emerge for each interpretation method, but both agree that the contribution of IMs is larger than that of DI G,PA,1st for minor damages.According to LIME, both IMs and established damage contribute equally for the three higher damage states.In contrast, SHAP estimates a larger impact for IMs, which is maximized for moderate damage and decreases as the damage level increases.

Global Explanation Methods (PDP and ALE)
Both PDPs and ALE, as global model-agnostic interpretation methods, can describe the general trends of our ML model with respect to input variables and depict the relationship between the cumulative seismic damage and a group of relevant predictors.
Figure 11 summarizes both PDPs and ALE results for the majority of the damage predictors.In each subplot, the y-axis depicts the expected value of the dependent variable, which in our case is the final damage (DI G,PA ), while the x-axis represents the value of the examined damage predictor.The positive impact of the initial damage on the final damage across its entire range of values can be observed.There is a sharp increase to the point of unity initial damage, and beyond that, the trend levels off horizontally.I FVF has a positive impact for values between 50 and 150 cm • s -0.75 , and zero The SI H has a positive impact for values between 80 and 180 cm, which appear to be responsible for a greater increase in the final damage.Also, final damage values between 0.3 and 0.45 seem to be influenced by this parameter.Subsequently, values of PGA PGV smaller than 15 s -1 appear to have a negative impact on the final damages and larger values present zero impact.CAV has an increasing outcome in the interval of 500-1500 cm s , increasing the damage from 0.3 to 0.4, and zero effect outside of this range.I RG appears to have a predominantly negative influence on damage in the range of [0.2, 0.4].Other seismic parameters (PGD, PGV, I AS , PGA, SMD Bolt , I A ) seem to have less influence on shaping the final damage, affecting its values between 0.3 and 0.4.All the other input parameters appear to have very small or zero impact on the prediction of final damage, predicting values around 0.33, which is the mean output of our total sample.As a general observation, ALE indicates wider ranges of the dependent variable affected in comparison with PDP for every input feature.In summary, this section explored the interpretability of the devised ML model by discussing feature importances, local explanation methods (LIME and SHAP), and global explanation methods (PDP and ALE).The results showed that initial damage, I FVF , and SI H are among the most crucial predictors of the final damage.Additionally, the local explanation methods provided insights into the positive or negative influence of each input variable on the cumulative seismic damage, while the global explanation methods described the general trends of the ML model with respect to damage predictors.

Conclusions
In this study, interpretable ML models to estimate the cumulative damage of an eight-story RC frame subjected to earthquake sequences were presented and analysed.Through the application of local and global explanation methods, a more profound understanding of the impact of individual features and their interactions in the context of ultimate seismic damage was achieved.Local explanation techniques, LIME and SHAP, delivered in-depth insights into how each feature influences the prediction on an individual level, while global explanation approaches, PDP and ALE, facilitated comprehension of the general trends of the ML model concerning input variables.The utilization of these interpretation methodologies contributes to the creation of more transparent and interpretable models, which is of paramount importance for the implementation of ML methods in earthquake engineering problems.Our research investigated the accumulation of damage during a sequence of earthquakes, identifying crucial predictors and understanding their impact on the final damage (DI G,PA ).The input variables for the regression problem were divided into two distinct physical classes: pre-existing damage from the initial seismic event and the characteristics of the subsequent ground motion expressed using the Park and Ang damage index (DI G,PA,1st ) and 16 Intensity Measures (IMs), respectively.The main outcomes are: • The most efficient model for predicting final seismic damage under MS-AS sequences was an instance of the LightGBM method with R 2 greater than 0.95, while the method with the poorest performance was KNN, with an R 2 value of approximately 0.4.• Among the examined boosted trees, LightGBM and XGBoost demonstrated the most optimized and robust performance even against small changes in their hyperparameters.Moreover, they present great resistance to overfitting as the number of trees increases.• In the case of Multi-Layer Perceptrons (MLPs), the ReLU activation function appeared to yield better performance, followed by the Tanh activation function.Also the MLP model present slightly better bias-variance balance than the other advanced ML models.• All the interpretation methods identified the initial damage DI G,PA,1st as the most significant feature followed by the IMs of the subsequent seismic shock.However, the ranking of the IMs importance is varying between the adopted approaches.The majority of interpretation methods indicate the I FVF as the most important IM, except for the impurity-based explanation, which identified SI H .As the second most important IM, LIME and SHAP ranked SI H , although Permutation ranked PGA PGV , and Impurity ranked I AS .• In case of examining the effect of all the IMs in total, both LIME and SHAP local explanation methods show that the contribution of the subsequent ground motion is larger than that of initial damage DI G,PA,1st .In general, the effect of the initial damage tends increase as the final increases.However, they differ in their estimation of contributions for higher damage states.• The analysis of PDPs and ALE reveals key insights into the effects of damage predictors on the final damage.The pre-existing damage demonstrates a positive influence across the entire range of cumulative damage.Additionally, I FVF and SI H present a notable positive impact on moderate final damages.In contrast, PGA PGV values smaller than 15 s -1 seems to have a negative impact on moderate final damages, while CAV and I RG demonstrate more complex effects in narrower range of the final damage.

Figure 3 .
Figure 3. Side-by-Side Bar Plot Comparing the Performance of Different ML Methods on Training (10-fold), CV (10-fold), and Test Sets.

Figure 4 .
Figure 4.The evolution of R 2 versus the number of base learner trees for Boosted Trees.

Figure 5 .
Figure 5.The evolution of R 2 versus the number of base learner trees for Random Forest Algorithms.

Figure 6 .
Figure 6.The evolution of R 2 versus the total number of neurons, for each activation function.

Figure 8 .
Figure 8. Feature Importances for the LightGBM best model, using different interpretation methods.

Table 1 .
Mathematical Formulas of the examined IMs.

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 10 May 2023 doi:10.20944/preprints202305.0737.v1 impact
for smaller and larger values.The corresponding final damage ranges from 0.25 up to 0.56 by PDP and 0.25 to 0.65 by ALE.As observed, a larger cumulative damage range affected by I FVF is identified by ALE.

Table A1 .
Seismic Metadata for the Real Sequences.