A Machine Learning Framework for Regional Damage Assessment Using Multi-Station Seismic Parameters: Insights from the 2023 Kahramanmaraş Earthquakes

Nemutlu, Ömer Faruk; Özçelik, Salih Taha Alperen; Freeshah, Mohamed

doi:10.3390/buildings15183326



by Ömer Faruk Nemutlu¹, Salih Taha Alperen Özçelik² and Mohamed Freeshah³,⁴,*

¹ Civil Engineering Department, Faculty of Engineering and Architecture, Bingol University, 12000 Bingol, Türkiye

² Electrical-Electronics Engineering Department, Faculty of Engineering and Architecture, Bingol University, 12000 Bingol, Türkiye

³ Civil and Environmental Engineering Department, College of Engineering, UAE University, Al Ain 15551, United Arab Emirates

⁴ Geomatics Engineering Department, Faculty of Engineering at Shoubra, Benha University, Cairo 13511, Egypt

* Author to whom correspondence should be addressed.

Buildings 2025, 15(18), 3326; https://doi.org/10.3390/buildings15183326

Submission received: 5 August 2025 / Revised: 4 September 2025 / Accepted: 10 September 2025 / Published: 14 September 2025

(This article belongs to the Section Construction Management, and Computers & Digitization)


Abstract

The twin earthquakes that struck Kahramanmaraş in 2023 (Mw 7.7 and Mw 7.6) caused widespread structural destruction across southeastern Türkiye, underscoring the need for more refined approaches to seismic damage assessment. In this study, a large-scale machine learning (ML) analysis is conducted to identify and classify damage patterns among 304,299 buildings across 11 provinces. Ten ML algorithms are implemented, and their performance is compared on the multiclass classification of damage severity (collapsed, urgent demolition, moderately damaged, and severely damaged). Unlike conventional methods that rely on single-station data, the proposed approach integrates ground motion parameters from the six seismic stations closest to each building. These parameters include peak ground acceleration, several distance measures (Joyner–Boore, rupture, and epicentral distances), and site condition indicators such as the mean shear wave velocity in the upper 30 m and soil classification, yielding 60 engineered features per building. Ensemble learning models, particularly the random forest and a voting ensemble, achieve the highest classification accuracies (79.65% and 79.62%, respectively). Classification performance also varies across damage categories: severely damaged structures exhibit the highest F1-score (0.891), whereas collapsed buildings are classified much less reliably (F1-score: 0.408). These findings offer practical value for post-earthquake emergency operations, and the methodology provides a basis for future seismic risk assessments and data-driven decision-making.

Keywords:

seismic damage evaluation; machine learning; seismic risk; random forest; ensemble learning; Kahramanmaraş earthquake; earthquake observation; damage classification

1. Introduction

Assessing earthquake-induced damage to buildings is a critical aspect of both seismic risk mitigation and post-earthquake recovery management [1,2]. After a major earthquake, evaluating and categorizing structural damage patterns is essential for emergency response, resource allocation, coordination, and long-term urban planning [3,4,5,6,7]. The catastrophic Kahramanmaraş earthquakes (Mw 7.7 and 7.6) of 6 February 2023 in southeastern Türkiye provide an opportunity to advance damage assessment methodologies. Large datasets are now available that combine building-level damage observations with comprehensive multi-station seismic records [2,8,9,10]. This earthquake sequence is one of the most devastating seismic events in Türkiye’s recent history, causing structural damage across 11 provinces and affecting millions of people [11]. Because both events occurred along the same fault system, the sequence generated complex ground motions with pronounced spatial variability in seismic intensity, duration, and frequency content [12,13,14]. This scenario offers insights into the damage mechanisms triggered by sequential strong ground motions and provides an important calibration dataset for predictive models.

Traditional approaches to seismic damage estimation have primarily relied on empirical fragility curves, intensity-based correlations, and simplified ground motion parameters, such as peak ground acceleration (PGA) or spectral acceleration at fundamental periods [15,16,17]. However, these methods often overlook the complex, multidimensional interactions among ground motion characteristics, site conditions, local geology, and structural response [18,19,20]. The limitations of single-parameter approaches become particularly evident during complex earthquake sequences involving multiple events, heterogeneous site conditions, and diverse structural typologies [21].

Recent advances in machine learning (ML) algorithms and artificial intelligence techniques have enabled seismic damage assessments based on extensive multidimensional datasets. They have also made it possible to identify nonlinear patterns that classical statistical methods often fail to capture [22,23,24]. Such models can integrate multiple ground motion parameters, site characteristics, and structural attributes to produce more robust predictions [25,26]. Nevertheless, many ML-based damage assessments remain constrained by limited dataset sizes, oversimplified feature representations, or reliance on single-station seismic records [27,28,29].

The use of multi-station seismic data represents a major advancement in seismic damage estimation. Studies have reported that ground motion characteristics can vary substantially over short distances owing to directivity effects, basin amplification, topographic influences, and local site conditions [30,31,32]. These effects have also been investigated in terms of spectral demands on industrial facilities during the Kahramanmaraş earthquakes [33]. Single-station techniques do not capture these spatial gradients, which can bias damage estimates [34]. In contrast, multi-station analysis integrates data from seismic instruments installed near the affected structures and can therefore characterize the local seismic environment more comprehensively [35,36].

Damage evaluation relies on multiple parameters that reflect distinct dimensions of seismic intensity and structural demand [37]. Among these, PGA has traditionally been the most widely used owing to its computational simplicity and accessibility. However, it captures only the peak acceleration amplitude and neglects duration and frequency content [38]. Distance-based parameters, such as the Joyner–Boore distance (Rjb), rupture distance (Rrup), epicentral distance (Repi), and hypocentral distance (Rhyp), offer supplementary insight into source-to-site geometry and attenuation characteristics [39].

Site conditions, commonly represented by the mean shear wave velocity in the upper 30 m (Vs30), play a central role in ground motion amplification and structural response [40,41]. Soil classifications based on Vs30 measurements offer a standardized framework for incorporating site effects into damage assessment models [42,43]. Integrating these heterogeneous site parameters into ML models facilitates the development of more comprehensive damage prediction frameworks.

Despite notable advances in seismic engineering and ML, critical gaps persist in earthquake damage assessment research. First, many earlier studies have relied on relatively small datasets (hundreds to thousands of buildings), limiting the development of robust models capable of generalizing across varied seismic conditions and structural typologies [44,45]. Second, most ML-based approaches have used reduced feature sets or single-site ground motion records, potentially overlooking meaningful spatial variability in seismic demand [46,47].

Third, few studies have systematically compared ML algorithms for seismic damage classification; most have focused on individual models rather than conducting comprehensive evaluations [23,48]. Fourth, ensemble learning methods, which have shown considerable promise in other domains, have received limited attention in seismic damage assessment [49,50].

ML applications to seismic damage assessment have nonetheless produced promising results across various structural systems and prediction frameworks. Ref. [51] developed ML algorithms to predict structural damage in reinforced concrete frames under both single and multiple seismic events, highlighting the ability of ML approaches to capture cumulative damage effects. Similarly, Ref. [52] proposed a real-time, ML-based seismic damage prediction framework for earthquake early warning systems, demonstrating the potential for immediate damage assessment during seismic events. Physics-informed neural networks, which integrate physical laws with ML algorithms, have shown significant potential for seismic demand prediction [53]. Additionally, analytical approaches incorporating curvature distribution models and matrix-based predictive frameworks have shown promising results for structural resilience assessment and damage prediction [54,55].

Hariri-Ardebili and Sattar (2025) [56] conducted a comprehensive data-driven analysis of post-earthquake reconnaissance findings from the 2023 Türkiye earthquake sequence. They applied AutoML approaches to 242 buildings with detailed structural parameters, including age, height, and column/wall indices. Their study demonstrated the effectiveness of building-specific vulnerability assessment supported by SHAP interpretability analysis. Complementing this approach, ensemble learning methods have shown considerable promise for large-scale seismic damage assessment. Ref. [57] developed a hybrid stacked ensemble model that combines multiple ML algorithms for rapid damage classification, using both building features and seismic data as inputs. Similarly, Ref. [58] presented an explainable ensemble learning framework based on bootstrap aggregating to predict structural damage in masonry buildings during seismic events, emphasizing the importance of model interpretability. Furthermore, Ref. [45] investigated multi-station seismic parameter approaches for regional building damage prediction, demonstrating the potential of characterizing ground motion from multiple recording stations in large-scale damage assessment.

This study addresses these gaps by examining the largest recorded dataset of building damage from a single earthquake series in Türkiye, comprising 304,299 buildings affected by the 2023 Kahramanmaraş earthquakes. The unprecedented scale of this dataset enables the development of robust statistical models and supports comprehensive comparisons of ML algorithms.

The primary objective of this study is to develop an integrated ML framework comparing 10 distinct algorithms for multiclass building damage classification, using the extensive seismic dataset of the 2023 Kahramanmaraş earthquake sequence. We incorporate ground motion data from the six closest seismic stations to each structure to capture spatial variation in seismic demand and perform a comprehensive feature importance analysis to identify the most influential parameters for seismic damage classification. This investigation makes several notable contributions to the field of seismic damage assessment:

The ML-based analysis of 304,299 buildings, representing the largest damage dataset from a single earthquake sequence in Türkiye, enables robust statistical inference and model validation.
The integration of 66 seismic parameters from multiple recording stations enables a detailed characterization of local ground motion conditions.
The comprehensive evaluation of 10 ML algorithms, conducted using standardized datasets, preprocessing procedures, and evaluation metrics, enables objective performance assessment.
The study develops sophisticated feature sets that incorporate multidirectional PGA values, multiple distance metrics, soil velocity measurements, and site classification parameters.
Ensemble methods achieve 79.6% classification accuracy and demonstrate practical improvements over individual algorithms.
A detailed evaluation based on multiple metrics, including accuracy, precision, recall, F1-score, and cross-validation, provides a thorough assessment of model performance.
The acquired insights are directly applicable to earthquake damage assessment, emergency response protocols, and seismic risk mitigation strategies.

The remainder of this paper is structured as follows: Section 2 presents the comprehensive methodology including data collection, preprocessing procedures, feature engineering approaches, and machine learning algorithm implementations; Section 3 provides detailed results including exploratory data analysis, model performance comparisons, and ensemble learning outcomes; Section 4 discusses the implications of findings, comparisons with the existing literature, and practical applications; Section 5 concludes with key findings, limitations, and recommendations for future research directions.

2. Data and Study Area

2.1. Sequence of the February 2023 Kahramanmaraş Earthquakes

The Kahramanmaraş earthquake sequence on 6 February 2023 began with a catastrophic Mw 7.7 event (Table 1) at 04:17 (local time). Approximately 9 h later, an equally damaging Mw 7.6 event followed at 13:24 (local time) [59]. These events formed a closely spaced sequence along the East Anatolian Fault Zone, one of Türkiye’s most seismically active regions, and caused widespread destruction across southeastern Türkiye [60,61].

The first earthquake (Mw 7.7) occurred near the Pazarcık district in Kahramanmaraş Province, and the second major event (Mw 7.6) struck approximately 95 km northeast in the Elbistan district [62,63]. Their close spatial and temporal occurrence led to complex ground motions, characterized by cumulative damage effects and spatial variability in intensity distribution across the affected region [12].

2.2. Study Area and Affected Regions

The Kahramanmaraş earthquake sequence impacted 11 provinces in southeastern Türkiye: Adana, Adıyaman, Diyarbakır, Elazığ, Gaziantep, Hatay, Kahramanmaraş, Kilis, Malatya, Osmaniye, and Şanlıurfa. The affected region spans approximately 110,000 km² and was home to more than 13.5 million people before the earthquakes [64]. Figure 1 presents the geographic distribution of the 304,299 analyzed buildings across the earthquake-affected region. The map extent is restricted to 36–40° E longitude and 36.5–39° N latitude to focus on the area containing the buildings and to omit empty regions. The spatial concentration of buildings reflects urban development density and proximity to the fault rupture zones of the 6 February 2023 earthquake sequence.

The study area encompasses diverse geological settings, from sedimentary basins to mountainous terrain, and features varying soil conditions that substantially influenced ground motion amplification and structural performance [9,65]. Urban centers such as Kahramanmaraş, Hatay (Antakya), Adıyaman, and Gaziantep experienced particularly severe damage due to their proximity to fault ruptures and unfavorable site conditions [62].

2.3. Building Damage Data Collection

The building damage dataset was compiled from the findings of official damage assessment surveys conducted by the Turkish Ministry of Environment, Urbanization and Climate Change after the Kahramanmaraş earthquake sequence. Approximate coordinates of collapsed buildings were obtained from a database developed by Damcı et al. [8], who systematically processed address records from the government portal (https://hasartespit.csb.gov.tr) (accessed on 15 May 2025). These addresses were converted to geographic coordinates using multiple geocoding services, including the Google Geocoding API, Bing Geocoding API, Here Geocoding API, and OpenStreetMap Nominatim API.

2.3.1. Data Source and Collection Methodology

The primary data source for this study was the set of building damage assessments conducted between February and November 2023, together with the geographic coordinates assigned to each assessed building. During data collection, these coordinates were generated using multiple geocoding services, and administrative identifiers (province, district, and neighborhood) were also obtained. Damage classification followed standardized protocols under which trained engineers and technical staff assessed structural damage through field inspections. As part of quality control, records were excluded when the province specified in the address did not match the province derived from the coordinates. These procedures yielded a reliable dataset with comprehensive damage information and verified geographic locations suitable for spatial analysis.

2.3.2. Damage Classification System

Buildings were categorized into four damage levels based on structural integrity and habitability assessments:

Severely Damaged: Buildings with considerable structural damage requiring major repairs but considered potentially salvageable.
Moderately Damaged: Buildings with moderate structural damage requiring repairs but generally habitable after remediation.
Collapsed: Buildings that experienced complete or near-complete structural failure.
Urgent Demolition: Buildings posing immediate safety risks and requiring prompt demolition.

Figure 2 presents the distribution of these categories within the analyzed dataset. As indicated, severely damaged buildings account for the largest share (67.1%, n = 204,257), followed by moderately damaged (13.6%, n = 41,291), collapsed (12.6%, n = 38,365), and urgent demolition (6.7%, n = 20,386).
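The class shares quoted above follow directly from the category counts. The short Python sketch below recomputes them, along with the ratio of the largest to the smallest class. The dictionary keys are simply the English category names used in this section, not necessarily the label strings stored in the official records.

```python
def class_shares(counts):
    """Percentage share of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Category counts reported in Section 2.3.2.
counts = {
    "Severely Damaged": 204_257,
    "Moderately Damaged": 41_291,
    "Collapsed": 38_365,
    "Urgent Demolition": 20_386,
}

total = sum(counts.values())
shares = class_shares(counts)
# Ratio of the largest to the smallest class: a quick measure of imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(total, shares, round(imbalance_ratio, 1))
```

The majority class is roughly ten times larger than the smallest one. This is the imbalance referred to again in Section 2.6.2.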

2.3.3. Data Processing and Quality Control

The KMZ files were processed in a structured way: the archives were decompressed, the KML content was extracted, and geographic coordinates were parsed from the XML. Administrative boundaries and damage categories were then standardized, duplicate entries were removed based on coordinate validation, and all records were integrated into a unified database for analysis. As illustrated in Figure 3, the provincial distribution of building damage displays substantial spatial heterogeneity. For instance, Hatay province recorded the highest absolute number of damaged buildings, while Kahramanmaraş and Adıyaman showed particularly high damage concentrations relative to their respective building stocks.
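The KMZ processing steps above (decompress, read the KML, parse coordinates, drop duplicates) can be sketched with Python's standard library alone. This is a minimal illustration, not the authors' code, and it rests on several assumptions:

- The archive holds a KML 2.2 document in which each building is a `Point` placemark.
- Two records are treated as duplicates when their coordinates agree to six decimal places (an assumed tolerance).
- Only the placemark name is read, because the source does not say where the damage category is stored (placemark name, folder, or `ExtendedData`).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the KML uses the OGC KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def read_kmz_points(kmz_bytes):
    """Return (name, lon, lat) for every Point placemark in the KMZ's first .kml file."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_files = [n for n in archive.namelist() if n.lower().endswith(".kml")]
        if not kml_files:
            raise ValueError("no .kml document found in KMZ archive")
        root = ET.fromstring(archive.read(kml_files[0]))

    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(".//kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # skip placemarks that are not points
        # KML stores "lon,lat[,alt]"; keep longitude and latitude only.
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        points.append((name.strip(), lon, lat))
    return points


def drop_duplicate_coordinates(points, ndigits=6):
    """Keep the first record at each coordinate pair (rounded to `ndigits` decimals)."""
    seen = set()
    unique = []
    for name, lon, lat in points:
        key = (round(lon, ndigits), round(lat, ndigits))
        if key not in seen:
            seen.add(key)
            unique.append((name, lon, lat))
    return unique
```

Rounding before comparison is one simple way to catch geocoder outputs that differ only in trailing digits. A looser tolerance would also merge distinct buildings that share an address point.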

2.3.4. Building Dataset Characteristics and Limitations

The analyzed dataset contains 304,299 buildings from the affected region, with damage distribution shown in detail in Figure 3. However, detailed information regarding building typology (residential, commercial, industrial), construction materials, design codes, building age, or seismic code compliance was not available in the official damage assessment records used in this study. This represents a significant limitation of the current analysis, as both building characteristics and construction era significantly influence seismic vulnerability and damage patterns.

Building age and seismic code compliance are particularly critical factors, as structures built before Türkiye’s major seismic code updates (1975, 1998, 2007, and 2018) typically exhibit different vulnerability characteristics. The absence of this temporal information limits the ability to correlate damage patterns with evolving seismic design standards and construction practices.

The damage assessment was conducted by trained technical teams following standardized protocols established by the Ministry of Environment, Urbanization and Climate Change. While this ensures consistency in damage classification, the absence of detailed building and temporal information limits the ability to perform code-specific or age-based vulnerability analysis.

As evident from Figure 3, the damage distribution varies significantly across provinces, with Hatay showing the highest number of damaged buildings (89,715 total), followed by Kahramanmaraş (53,568) and Malatya (44,710). This provincial variation reflects the complex interplay of ground motion intensity, local site conditions, and building stock characteristics.

Future research should prioritize collecting comprehensive building inventory data including construction year, seismic code compliance level, building type, structural system, and construction materials to enhance the predictive capability of machine learning models for seismic damage assessment.

2.4. Seismic Ground Motion Data

2.4.1. Seismic Station Network

The Disaster and Emergency Management Presidency (AFAD) maintains Türkiye’s national strong-motion network for recording ground motion data. This network comprises digital accelerograph stations equipped with triaxial sensors that capture motion in three orthogonal directions [66]. The station dataset includes detailed seismic parameters for each site, including PGA values in the North–South (NS), East–West (EW), and vertical (UD) directions. It also includes Joyner–Boore distance (R_jb), rupture distance (R_rup), epicentral distance (R_epi), hypocentral distance (R_hyp), average shear wave velocity (Vs₃₀), and soil classification data essential for comprehensive seismic hazard evaluation.

2.4.2. Multi-Station Parameter Integration

For each building in the damage database, seismic parameters were retrieved from the six nearest recording stations to capture spatial variability in ground motion characteristics (Table 2). The multi-station approach represents a significant improvement over traditional single-station analysis in that it provides comprehensive spatial characterization of seismic demand.

The systematic station selection process involves (1) calculating distances from each building to all available seismic stations using the Haversine formula, (2) ranking stations by ascending distance, and (3) assigning sequential labels where Station 1 represents the closest recording station, Station 2 the second closest, continuing through to Station 6. This consistent distance-based labeling methodology ensures that spatial relationships are preserved across all buildings in the dataset.

Distance calculations used the Haversine formula, which computes great-circle distances on a spherical Earth and thus accounts for the Earth’s curvature. This avoids the distortion of planar approximations across the large area affected by the earthquake sequence. It supports accurate assignment of seismic parameters and keeps the data used for ML model training spatially coherent:

d = 2r \arcsin\left(\sqrt{\sin^{2}\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_{1}\cos\varphi_{2}\sin^{2}\left(\frac{\Delta\lambda}{2}\right)}\right)

(1)

where r represents the Earth’s radius (6371 km); φ₁ and φ₂ denote the latitudes of the building and the seismic station, respectively; λ₁ and λ₂ represent the corresponding longitudes; and Δφ = φ₂ − φ₁ and Δλ = λ₂ − λ₁ are the differences in latitude and longitude between the two locations, with all angles expressed in radians.
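Equation (1) and the three-step station ranking described in Section 2.4.2 can be combined into a short sketch. Station identifiers and coordinates here are placeholders, not AFAD station codes. Angles are converted from decimal degrees to radians before the formula is applied. Ties in distance are broken by station identifier, which is an assumption because the source does not say how ties were handled.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great-circle distance (km) from Equation (1); inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    # min() guards against a value marginally above 1 from rounding error.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def nearest_stations(building, stations, k=6):
    """Rank stations by distance and label them Station 1 (closest) ... Station k.

    building: (lat, lon); stations: {station_id: (lat, lon)}.
    Returns a list of (label, station_id, distance_km).
    """
    lat, lon = building
    ranked = sorted(
        ((sid, haversine_km(lat, lon, s_lat, s_lon)) for sid, (s_lat, s_lon) in stations.items()),
        key=lambda item: (item[1], item[0]),  # distance first, then id to break ties
    )
    return [(f"Station {rank}", sid, dist) for rank, (sid, dist) in enumerate(ranked[:k], start=1)]
```

For a building at (37.0, 37.0) and a station one degree of latitude to the north, `haversine_km` returns about 111.2 km, which is the length of one degree of arc on a sphere of radius 6371 km.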

2.5. Feature Engineering and Dataset Integration

2.5.1. Spatial Data Integration

Integrating building damage information with seismic parameters required advanced spatial analysis methods to ensure accurate assignment of parameters to each of the 304,299 buildings in the dataset. The process began with the validation of coordinates to confirm the precision and consistency of building locations, followed by distance calculations to all accessible seismic stations using the Haversine formula. Subsequently, the six nearest stations to each building were identified to reflect local seismic demand variability, and the corresponding seismic parameters were extracted. Lastly, rigorous data quality control was conducted to confirm parameter completeness and ensure the reliability of the merged dataset for ML analysis.

2.5.2. Final Dataset Structure

The merged dataset comprised 304,299 building records and 72 attributes, combining building characteristics with multi-station seismic parameters. The six building attributes included coordinates, administrative details, and damage classification status. The remaining 66 attributes captured seismic parameters from six stations, with 11 parameters per station:

- station identification code;
- building-to-station distance (km);
- PGA in three directions (NS, EW, and UD), in gals;
- four distance metrics, namely the Joyner–Boore, rupture, epicentral, and hypocentral distances (km);
- site-specific characteristics, namely Vs₃₀ (m/s) and soil classification.

The interrelationships among these seismic parameters are illustrated in Figure 4, which displays strong correlations between PGA components (r > 0.85) and consistent spatial patterns among distance-based metrics, underscoring the multidimensional nature of ground motion characterization.
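Turning the six ranked station records into one flat row per building amounts to prefixing each per-station attribute with its rank. The sketch below uses hypothetical column names of the form `S1_pga_ns` through `S6_soil_class`, with one name per item in the per-station list above. The naming scheme is illustrative and not taken from the authors' files.

```python
# Hypothetical column names: one per per-station item listed in Section 2.5.2.
STATION_PARAMS = (
    "code", "distance_km",
    "pga_ns", "pga_ew", "pga_ud",
    "r_jb", "r_rup", "r_epi", "r_hyp",
    "vs30", "soil_class",
)
CATEGORICAL_PARAMS = {"code", "soil_class"}


def feature_names(n_stations=6, params=STATION_PARAMS):
    """Column names S1_<param> ... S<n>_<param>, ordered by station rank."""
    return [f"S{rank}_{p}" for rank in range(1, n_stations + 1) for p in params]


def flatten_station_records(records, params=STATION_PARAMS):
    """Merge per-station dicts (ordered Station 1, Station 2, ...) into one flat row."""
    row = {}
    for rank, record in enumerate(records, start=1):
        for p in params:
            row[f"S{rank}_{p}"] = record[p]
    return row


names = feature_names()
categorical = [n for n in names if n.split("_", 1)[1] in CATEGORICAL_PARAMS]
numeric = [n for n in names if n not in categorical]
print(len(names), len(numeric), len(categorical))
```

With six stations this yields 66 column names. Of these, 54 are numeric and 12 (station code and soil class for each station) are categorical, which matches the split reported in Section 3.1.1.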

The systematic relationships between seismic parameters were analyzed using Pearson correlation coefficients to assess linear relationships between continuous variables. Pearson correlation was selected due to the predominantly continuous nature of seismic parameters and the need to quantify linear dependencies that are fundamental to understanding ground motion physics. The correlation analysis employed pairwise-complete observations to handle any potential missing values, with correlation strength ranging from −1 (perfect negative correlation) to +1 (perfect positive correlation).
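Pairwise-complete Pearson correlation drops, for each pair of variables, only the rows where either value is missing, rather than discarding incomplete rows globally. This is also the default behaviour of pandas `DataFrame.corr`. Below is a dependency-free sketch, offered as an illustration rather than the exact routine used to produce Figure 4.

```python
import math


def _present(v):
    """True when a value is neither None nor NaN."""
    return v is not None and not math.isnan(v)


def pearson_pairwise(x, y):
    """Pearson r over the positions where both x[i] and y[i] are present."""
    pairs = [(a, b) for a, b in zip(x, y) if _present(a) and _present(b)]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan  # correlation undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """{(name_a, name_b): r} for every pair of named columns."""
    names = list(columns)
    return {(a, b): pearson_pairwise(columns[a], columns[b]) for a in names for b in names}
```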

These systematic relationships between seismic parameters are illustrated in Figure 4, which shows strong positive correlations between PGA components in different directions (r > 0.85), indicating consistent ground motion amplitudes across orthogonal recording directions. Distance-based parameters display expected spatial consistency, with moderate to strong correlations observed between Joyner–Boore, rupture, and epicentral distances due to their geometric relationships in the study area. The analysis demonstrates the multi-dimensional nature of ground motion characterization while revealing the redundancy in certain parameter combinations.

2.6. Data Quality and Preprocessing

2.6.1. Missing Data Analysis

The dataset was complete, with no missing values for any critical parameters, owing to systematic damage assessments, dense seismic station coverage, and rigorous quality control procedures.

2.6.2. Data Distribution and Characteristics

The dataset exhibited several properties that influenced the ML analysis:

- a pronounced class imbalance, with 67.1% of records representing severely damaged buildings;
- spatial clustering near urban centers, reflecting population distribution;
- a broad range of seismic parameter values that required normalization (Figure 5);
- multimodal distributions resulting from diverse site conditions across the affected region.

Overall, the large-scale dataset employed in this study forms a robust foundation for ML applications and offers considerable scale for seismic damage assessment. Integrating multi-station seismic metrics with building damage observations supports the development of stable predictive models for earthquake risk assessment and emergency response planning.

3. Methodology

3.1. Workflow

Figure 6 illustrates the overall workflow of the proposed seismic damage assessment methodology. The proposed approach integrates multisource seismic and building damage data through systematic feature engineering, followed by a comprehensive evaluation of both individual and ensemble ML models.

The pipeline begins with the fusion of multisource data, continues through detailed feature engineering based on distance metrics and seismic attributes, proceeds with a systematic comparison of ML models, and concludes with ensemble learning to improve predictive performance.

3.1.1. Feature Selection and Categorization

The preprocessing pipeline systematically screened the 72 original attributes to identify relevant predictive features for ML analysis. Administrative identifiers, such as building coordinates and location codes, were excluded because they are spatial identifiers rather than seismic parameters. The remaining features were categorized into 54 numeric attributes representing continuous seismic measurements (e.g., PGA values, distance metrics, and Vs30) and 12 categorical attributes, including station codes and soil types.

3.1.2. Encoding of Categorical Features

Categorical features were encoded so as to preserve ordinal relationships and physical significance where these exist. Soil classes were converted to a physically meaningful ordinal scale based on stiffness characteristics (Table 3).

Station Code Encoding

Station codes were converted into numeric values using direct integer mapping, preserving their uniqueness while allowing numerical processing.
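The two encodings can be sketched as follows.

- **Soil classes.** Table 3 is not reproduced here, so the soil mapping below is a hypothetical placeholder. It assumes the ZA–ZE site classes of the Turkish Building Earthquake Code (2018), ordered from stiffest (ZA) to softest (ZE), and omits ZF. The actual scale and its direction should be taken from Table 3.
- **Station codes.** These are mapped to integers in sorted order so that the mapping is reproducible across runs.

```python
# HYPOTHETICAL ordinal scale (stiffest = 1); replace with the values in Table 3.
SOIL_ORDINAL = {"ZA": 1, "ZB": 2, "ZC": 3, "ZD": 4, "ZE": 5}


def encode_soil(values, mapping=SOIL_ORDINAL):
    """Map soil-class strings to ordinal integers (case- and whitespace-insensitive)."""
    return [mapping[v.strip().upper()] for v in values]


def build_station_index(codes):
    """Assign each distinct station code an integer, in sorted order for reproducibility."""
    return {code: i for i, code in enumerate(sorted(set(codes)))}


def encode_stations(codes, index):
    """Replace station codes with their integer identifiers."""
    return [index[c] for c in codes]
```

Integer station codes carry no physical order. Tree ensembles can usually tolerate this, whereas for linear or distance-based models such codes impose an artificial ordering.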

3.1.3. Missing Value Treatment

Although the dataset contained no missing values in its primary fields, the preprocessing pipeline guarded against potential numerical conversion issues:

\text{Imputed value} = \begin{cases} \operatorname{median}(X_{i}) & \text{if } X_{i} \text{ is numeric} \\ 0 & \text{if conversion fails} \end{cases}

(2)

where X_i represents the ith feature vector.
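One reading of Equation (2) works in three steps:

1. Coerce each value to a number.
2. Fill values that fail conversion with the median of the values that did convert.
3. Fall back to 0 when no value in the column converts.

The sketch below implements that reading, which is an assumption about how the two cases interact. In pandas the same idea is often written as `pd.to_numeric(col, errors="coerce")` followed by `fillna`. Infinite values are left untouched here and are handled by the outlier step.

```python
import math
import statistics


def to_float(value):
    """Convert a value to float, returning None for failed conversions and NaN."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(x) else x


def impute_column(values):
    """Median-impute values that fail numeric conversion; use 0 if nothing converts."""
    converted = [to_float(v) for v in values]
    observed = [x for x in converted if x is not None]
    fill = statistics.median(observed) if observed else 0.0
    return [fill if x is None else x for x in converted]
```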

3.1.4. Outlier Detection and Treatment

Infinite values and extreme outliers were identified and treated using robust statistical methods:

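X_{\text{cleaned}} = \begin{cases} X & \text{if } |X| < \infty \text{ and } X \text{ is not an outlier} \\ 0 & \text{if } |X| = \infty \\ \operatorname{clip}(X,\; Q_{1} - 1.5 \cdot \mathrm{IQR},\; Q_{3} + 1.5 \cdot \mathrm{IQR}) & \text{if } X \text{ is an outlier} \end{cases}

(3)

where Q_1 and Q_3 represent the first and third quartiles, IQR = Q_3 − Q_1 is the interquartile range, and an outlier is any finite value outside [Q_1 − 1.5·IQR, Q_3 + 1.5·IQR].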
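Equation (3) can be sketched as a two-pass routine. First, infinite values are replaced with 0. Then the IQR fences are computed on the result and used to clip the column. Two choices are assumptions not specified in the source: the order of these passes, and the quartile convention, `statistics.quantiles` with `method="inclusive"`, which treats the minimum and maximum as the 0th and 100th percentiles.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_column(values, k=1.5):
    """Equation (3): infinities -> 0, then clip everything to the IQR fences."""
    finite = [0.0 if math.isinf(v) else float(v) for v in values]
    low, high = iqr_bounds(finite, k)
    return [min(max(v, low), high) for v in finite]
```

In this sketch, a column such as `[1, 2, ..., 8, 100, inf]` keeps 1–8 unchanged, maps `inf` to 0, and pulls 100 down to the upper fence.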

3.2. Target Variable Encoding

Multiclass damage classification required systematic encoding for compatibility with ML algorithms. The LabelEncoder converted categorical damage states into numerical labels, as shown in Table 4.
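scikit-learn's `LabelEncoder` assigns integer codes to the distinct labels in sorted order. The standard-library sketch below mimics that behaviour so the resulting mapping can be checked against Table 4. With the English category names used in this paper, sorting gives Collapsed = 0, Moderately Damaged = 1, Severely Damaged = 2, and Urgent Demolition = 3. The codes in Table 4 depend on the exact label strings stored in the records, however; Turkish labels, for example, would sort differently.

```python
def fit_label_encoder(labels):
    """Sorted distinct classes and a class -> integer mapping (LabelEncoder-style)."""
    classes = sorted(set(labels))
    return classes, {label: code for code, label in enumerate(classes)}


def encode(labels, mapping):
    """Convert class labels to integer codes."""
    return [mapping[label] for label in labels]


def decode(codes, classes):
    """Convert integer codes back to class labels."""
    return [classes[code] for code in codes]


y = ["Severely Damaged", "Collapsed", "Urgent Demolition", "Moderately Damaged", "Collapsed"]
classes, mapping = fit_label_encoder(y)
y_encoded = encode(y, mapping)
```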

3.3. Feature Scaling and Normalization

3.3.1. Algorithm-Specific Scaling

Different ML algorithms require tailored scaling approaches. Accordingly, the preprocessing pipeline implemented algorithm-specific conditional scaling:

For Scale-Sensitive Algorithms (SVM, KNN, Logistic Regression):

StandardScaler normalization was applied to ensure equal feature contribution:

X_{\text{scaled}} = \frac{X - \mu}{\sigma}

(4)

where μ is the feature mean and σ is the standard deviation.
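Below is a minimal standardization sketch for Equation (4). It assumes the statistics are estimated on the training split only and then reused for the test split, the usual practice for avoiding information leakage; the split procedure is not described in this section. Like scikit-learn's `StandardScaler`, it uses the population standard deviation and substitutes σ = 1 for constant features.

```python
import statistics


def fit_standard_scaler(rows):
    """Per-column mean and population standard deviation (sigma = 1 for constant columns)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def standardize(rows, means, stds):
    """Apply Equation (4) column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


X_train = [[1.0, 10.0], [3.0, 10.0]]
X_test = [[5.0, 12.0]]
means, stds = fit_standard_scaler(X_train)
train_scaled = standardize(X_train, means, stds)
test_scaled = standardize(X_test, means, stds)
```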

For Tree-Based Algorithms (Random Forest, Decision Trees, XGBoost):

Original feature scales were preserved as tree-based methods are inherently scale-invariant.

All machine learning algorithms were implemented using their default hyperparameters, with only essential parameters modified for reproducibility and computational efficiency. The configurations used were:

–: Random Forest: n_estimators = 100, max_depth = None, min_samples_split = 2, max_features = ‘sqrt’;
–: Decision Tree: criterion = ‘gini’, max_depth = None, min_samples_split = 2;
–: KNearest Neighbors: n_neighbors = 5, weights = ‘uniform’, algorithm = ‘auto’;
–: XGBoost: learning_rate = 0.3, max_depth = 6, n_estimators = 100, subsample = 1.0;
–: LightGBM: learning_rate = 0.1, num_leaves = 31, n_estimators = 100;
–: Logistic Regression: C = 1.0, solver = ‘lbfgs’, penalty = ‘l2’, max_iter = 1000;
–: Support Vector Machine: C = 1.0, kernel = ‘rbf’, gamma = ‘scale’ (excluded from the final analysis);
–: Extra Trees: n_estimators = 100, max_depth = None, max_features = ‘sqrt’;
–: Naive Bayes: var_smoothing = 1 × 10⁻⁹ (Gaussian assumption);
–: Linear/Quadratic Discriminant Analysis: Default solvers and tolerances.

This standardized approach ensures fair comparison across algorithms without algorithm-specific bias from hyperparameter tuning. Random seeds were set to 42 for reproducibility, and n_jobs = −1 was used for parallel processing where applicable.

3.3.2. Robust Scaling Implementation

For algorithms sensitive to outliers, RobustScaler was implemented as an alternative:

X_{\text{robust}} = \frac{X - \operatorname{median}(X)}{Q_{3} - Q_{1}}

(5)
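Eq. (5) can be sketched column-wise in NumPy as below (hypothetical helper `robust_scale`). As in scikit-learn's RobustScaler, a column whose interquartile range is zero is divided by 1 instead; that guard belongs to this sketch and is not something the study reports.

```python
import numpy as np


def robust_scale(X):
    """Eq. (5), column-wise: (X - median) / (Q3 - Q1).

    Columns with zero IQR are divided by 1 (a guard assumed for this sketch).
    """
    X = np.asarray(X, dtype=float)
    median = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    iqr[iqr == 0] = 1.0
    return (X - median) / iqr


print(robust_scale([[1], [2], [3], [4], [100]]).ravel())
```

Here the outlier 100 does not inflate the median (3) or the IQR (2), so the four ordinary values map to the range −1 to 0.5 while the outlier stands out at 48.5; a mean/standard-deviation scaler would be pulled toward the outlier.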

3.4. Feature Importance and Selection Methods

3.4.1. Univariate Feature Selection

Multiple statistical methods were employed for comprehensive feature importance analysis:

F-Score Analysis

The F-statistic measures the ratio of between-class to within-class variance:

F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{\sum_{i = 1}^{k} n_{i} (\bar{X}_{i} - \bar{X})^{2} / (k - 1)}{\sum_{i = 1}^{k} \sum_{j = 1}^{n_{i}} (X_{ij} - \bar{X}_{i})^{2} / (N - k)}

(6)

where MS_between is the mean square between groups, MS_within is the mean square within groups, k is the number of classes (k = 4 for the damage categories), n_i is the sample size of class i, X̄_i is the mean of class i, X̄ is the overall mean, X_ij is the jth observation in class i, and N is the total sample size (N = 304,299).
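For a single feature, Eq. (6) is a one-way ANOVA across the damage classes. The sketch below computes it directly in NumPy; scikit-learn's `f_classif` computes the same F-statistic for every feature column at once. `anova_f` is a hypothetical name.

```python
import numpy as np


def anova_f(feature, labels):
    """One-way ANOVA F-statistic of Eq. (6) for a single feature."""
    x = np.asarray(feature, dtype=float)
    y = np.asarray(labels)
    classes = np.unique(y)
    k, n = len(classes), len(x)
    grand_mean = x.mean()
    # Between-group sum of squares: n_i * (class mean - overall mean)^2.
    ss_between = sum(
        (y == c).sum() * (x[y == c].mean() - grand_mean) ** 2 for c in classes
    )
    # Within-group sum of squares: squared deviations from each class mean.
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


print(anova_f([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))
```

For this toy feature, MS_between = 13.5 and MS_within = 1, so F = 13.5.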

Mutual Information

The mutual information score quantifies the dependency between features and target variables:

I (X; Y) = \sum_{x \in X} \sum_{y \in Y} p (x, y) \log (\frac{p (x, y)}{p (x) p (y)})

(7)

where p(x, y) is the joint probability distribution, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
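Eq. (7) applies directly to discrete variables. The plug-in estimate below replaces the probabilities with empirical frequencies and uses the natural logarithm, so the result is in nats. It is a sketch for discrete inputs only: for continuous features, scikit-learn's `mutual_info_classif` uses a nearest-neighbour estimator rather than this formula. `mutual_information` is a hypothetical helper.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of Eq. (7) for two discrete variables, in nats."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    np.add.at(joint, (x_idx, y_idx), 1)        # co-occurrence counts
    p_xy = joint / joint.sum()                 # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal p(x), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal p(y), row vector
    nz = p_xy > 0                              # 0 * log 0 is taken as 0
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])).sum())


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Two identical balanced binary variables give log 2 ≈ 0.693 nats; two independent ones give 0.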

3.4.2. Model-Based Feature Importance

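Random forest feature importance was calculated using Gini impurity reduction (mean decrease in impurity):

\text{Importance}(X_{j}) = \sum_{t \in T} p(t) \cdot \Delta i(t) \cdot 1(X_{j} \text{ splits } t)

(8)

where T is the set of nodes of a tree, p(t) is the proportion of samples reaching node t, Δi(t) is the impurity decrease at node t, and 1(X_j splits t) equals 1 if node t splits on feature X_j and 0 otherwise; for the forest, this quantity is averaged over all trees.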
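The quantity summed in Eq. (8) can be computed per node as below, using Gini impurity i(t) = 1 − Σ_c p_c². In this sketch Δi(t) is the parent impurity minus the size-weighted child impurities. The importance of X_j would then be the sum of `weighted_impurity_decrease` over all nodes that split on X_j; scikit-learn additionally averages over trees and normalizes the importances to sum to 1. Function names are hypothetical.

```python
import numpy as np


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a set of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def weighted_impurity_decrease(parent, left, right, n_total):
    """p(t) * delta_i(t) for one binary split at node t, as summed in Eq. (8)."""
    n_t = len(parent)
    delta_i = (gini(parent)
               - len(left) / n_t * gini(left)
               - len(right) / n_t * gini(right))
    return n_t / n_total * delta_i   # p(t) = share of all samples reaching t


print(weighted_impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=8))
```

A pure split of a node holding half of the training samples, [0, 0, 1, 1] into [0, 0] and [1, 1], contributes 0.5 × 0.5 = 0.25.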

3.5. Machine Learning Algorithms

The machine learning algorithms implemented include Random Forest [67], Support Vector Machines [68], XGBoost [69], LightGBM [70], K-Nearest Neighbors [71], Logistic Regression [72], Naive Bayes [73], and Linear Discriminant Analysis [74].

3.5.1. Tree-Based Ensemble Methods

Random Forest

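An ensemble of decision trees using bootstrap aggregating (bagging), with a random subset of features considered at each split:

\hat{y} = \frac{1}{B} \sum_{b = 1}^{B} T_{b} (x)

(9)

where B is the number of trees and T_b(x) is the output of the b-th tree, trained on a bootstrap sample. For classification, T_b(x) is the tree's vector of class probabilities, and the predicted class is the one with the highest averaged probability.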

Hyperparameters: n_estimators = 100, random_state = 42, n_jobs = −1

Extra Trees (Extremely Randomized Trees):

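Like random forest, but with additional randomization in split selection: for each candidate feature, a split threshold is drawn at random, and the best of these random candidate splits is selected:

\text{Split criterion} = \arg\max_{S \in S_{\text{random}}} \text{InformationGain}(S)

(10)

where S_random is the set of randomly generated candidate splits at the node.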

Hyperparameters: n_estimators = 100, random_state = 42, n_jobs = −1

XGBoost (Extreme Gradient Boosting):

Implements gradient boosting with regularization:

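\mathcal{L}(\phi) = \sum_{i} l(\hat{y}_{i}, y_{i}) + \sum_{k} \Omega(f_{k})

(11)

where l is the loss function, f_k is the kth tree, and Ω(f) = γT + ½λ‖w‖² is the regularization term, with T the number of leaves and w the vector of leaf weights.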

Hyperparameters: random_state = 42, eval_metric = ‘mlogloss’

LightGBM:

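Implements gradient boosting with leaf-wise (best-first) tree growth, repeatedly splitting the leaf whose best split yields the largest gain:

\text{Gain} = \frac{1}{2} \left[ \frac{G_{L}^{2}}{H_{L} + \lambda} + \frac{G_{R}^{2}}{H_{R} + \lambda} - \frac{(G_{L} + G_{R})^{2}}{H_{L} + H_{R} + \lambda} \right] - \gamma

(12)

where G_L and G_R are the sums of the first-order gradients of the loss over the samples in the left and right child nodes, H_L and H_R are the corresponding sums of second-order gradients, λ is the L2 regularization on leaf weights, and γ is the minimum loss reduction required to make a split.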

Hyperparameters: random_state = 42, verbose = −1
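Eq. (12) is straightforward to evaluate once the gradient sums are known. The sketch below (hypothetical function `split_gain`, with illustrative default values for λ and γ) shows how γ acts as a threshold: a candidate split whose bracketed improvement does not exceed γ yields a non-positive gain and is not worth making.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Second-order split gain of Eq. (12).

    g_* and h_* are the sums of first- and second-order loss gradients over
    the samples sent to each child; lam and gamma are the regularization terms.
    """
    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right)) - gamma


print(split_gain(2.0, 3.0, -1.0, 2.0, lam=1.0))
print(split_gain(2.0, 3.0, -1.0, 2.0, lam=1.0, gamma=1.0))
```

With G_L = 2, H_L = 3, G_R = −1, H_R = 2 and λ = 1, the gain is 7/12 ≈ 0.583; setting γ = 1 makes it negative, so the split would be rejected.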

3.5.2. Instance-Based Learning

K-Nearest Neighbors (KNN)

Classification based on majority voting among k nearest neighbors:

\hat{y} = \arg \max_{c} \sum_{x_{i} \in N_{k} (x)} 1 (y_{i} = c)

(13)

where N_k(x) represents the set of k nearest neighbors of point x.

Distance metric: Euclidean distance,

d(x_{i}, x_{j}) = \sqrt{\sum_{l = 1}^{p} (x_{il} - x_{jl})^{2}}

where p is the number of features.

Hyperparameters: n_neighbors = 5
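The KNN rule of Eq. (13) with the Euclidean distance above fits in a few lines of NumPy. This is a brute-force sketch with a hypothetical function name; scikit-learn (algorithm = ‘auto’) may use tree-based neighbour search instead. With uniform weights, this sketch breaks vote ties in favour of the smallest class label.

```python
import numpy as np


def knn_predict(X_train, y_train, x, k=5):
    """Majority vote of Eq. (13) among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from x to every training point.
    dists = np.sqrt(((X_train - np.asarray(x, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(dists, kind="stable")[:k]   # indices of N_k(x)
    classes, votes = np.unique(y_train[nearest], return_counts=True)
    return classes[np.argmax(votes)]   # ties go to the smallest class label


X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(knn_predict(X, y, [0.5, 0.5], k=3), knn_predict(X, y, [5.5, 5.5], k=3))
```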

3.5.3. Linear Models

Logistic Regression

Multinomial logistic regression for multiclass classification:

P (Y = k| X) = \frac{\exp (β_{k}^{T} X)}{\sum_{j = 1}^{K} \exp (β_{j}^{T} X)}

(14)

Optimization: Limited-memory BFGS (L-BFGS) algorithm

Hyperparameters: max_iter = 1000, random_state = 42
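Eq. (14) is the softmax of the linear scores β_kᵀX. The sketch below evaluates it for a batch of samples; subtracting the row maximum before exponentiating leaves the probabilities unchanged but avoids overflow. The coefficient matrix in the example is made up, not fitted values, and the function name is hypothetical.

```python
import numpy as np


def multinomial_proba(X, B):
    """Class probabilities of Eq. (14).

    X: (n_samples, n_features); B: (n_classes, n_features), one coefficient
    vector beta_k per class (an intercept can be a column of ones in X).
    """
    logits = np.asarray(X, dtype=float) @ np.asarray(B, dtype=float).T
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability only
    expz = np.exp(logits)
    return expz / expz.sum(axis=1, keepdims=True)


P = multinomial_proba([[1.0, 2.0]], [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(P)
```

Here the linear scores are 0, 1 and 2, so the third class receives the highest probability and each row sums to 1.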

3.5.4. Discriminant Analysis

Linear Discriminant Analysis (LDA)

Assumes equal covariance matrices across classes:

δ_{k} (x) = x^{T} Σ^{- 1} μ_{k} - \frac{1}{2} μ_{k}^{T} Σ^{- 1} μ_{k} + \log π_{k}

(15)

Quadratic Discriminant Analysis (QDA)

Allows different covariance matrices for each class:

δ_{k} (x) = - \frac{1}{2} \log |Σ_{k}| - \frac{1}{2} {(x - μ_{k})}^{T} Σ_{k}^{- 1} (x - μ_{k}) + \log π_{k}

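(16)

where μ_k and π_k are the mean vector and prior probability of class k, Σ is the covariance matrix shared by all classes (LDA), Σ_k is the class-specific covariance matrix (QDA), and x is assigned to the class with the largest δ_k(x).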
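A direct NumPy transcription of Eq. (15) is sketched below with hypothetical class means, a shared covariance matrix, and priors. It forms the inverse Σ⁻¹ explicitly for readability; numerical implementations typically avoid doing so.

```python
import numpy as np


def lda_scores(x, means, cov, priors):
    """Linear discriminant scores delta_k(x) of Eq. (15), one per class.

    means: (K, p) class means mu_k; cov: (p, p) shared covariance Sigma;
    priors: K class priors pi_k. The predicted class is the argmax.
    """
    x = np.asarray(x, dtype=float)
    cov_inv = np.linalg.inv(np.asarray(cov, dtype=float))
    scores = [
        x @ cov_inv @ mu_k - 0.5 * mu_k @ cov_inv @ mu_k + np.log(pi_k)
        for mu_k, pi_k in zip(np.asarray(means, dtype=float), priors)
    ]
    return np.array(scores)


means = [[0.0, 0.0], [4.0, 0.0]]
print(lda_scores([1.0, 0.0], means, np.eye(2), [0.5, 0.5]).argmax())
print(lda_scores([3.0, 0.0], means, np.eye(2), [0.5, 0.5]).argmax())
```

With equal priors and an identity covariance, each point is assigned to the nearer class mean.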

3.5.5. Probabilistic Models

Naive Bayes

Assumes that the features are conditionally independent given the class:

P (y| x_{1}, \dots, x_{n}) = \frac{P (y) \prod_{i = 1}^{n} P (x_{i}| y)}{P (x_{1}, \dots, x_{n})}

(17)

Implementation: GaussianNB assuming Gaussian distribution for continuous features
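In practice Eq. (17) is evaluated in log space, where the product of per-feature likelihoods becomes a sum. That sum is exactly where the conditional-independence assumption enters, and it is why strongly correlated features effectively contribute the same evidence several times. The sketch below uses Gaussian likelihoods as GaussianNB does; the parameters and function name are hypothetical, and GaussianNB's variance smoothing (var_smoothing) is omitted.

```python
import numpy as np


def gaussian_nb_log_posterior(x, means, variances, priors):
    """Unnormalized log P(y | x) of Eq. (17) with Gaussian P(x_i | y).

    means, variances: (K, n) per-class feature means and variances.
    The evidence P(x_1, ..., x_n) is identical for every class, so it is dropped.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Sum over features of log N(x_i; mean, variance): the independence assumption.
    log_lik = -0.5 * (np.log(2 * np.pi * variances)
                      + (x - means) ** 2 / variances).sum(axis=1)
    return np.log(np.asarray(priors, dtype=float)) + log_lik


params = ([[0.0, 0.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]], [0.5, 0.5])
print(gaussian_nb_log_posterior([0.2, -0.1], *params).argmax())
```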

3.5.6. Single Tree Model

Decision Tree

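Recursive binary splitting that selects, at each node, the split with the largest impurity decrease; with entropy as the impurity measure, this decrease is the information gain:

\text{InformationGain}(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_{v}|}{|S|} H(S_{v})

(18)

where H(S) is the entropy of set S and S_v is the subset of S for which split A yields outcome v. The configuration in Section 3.3.1 uses criterion = ‘gini’, in which case Gini impurity takes the place of entropy in Eq. (18).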

Hyperparameters: random_state = 42
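The information gain of Eq. (18) can be sketched as follows, with entropy in bits and hypothetical function names. Replacing `entropy` with Gini impurity gives the criterion configured for the decision tree (criterion = 'gini').

```python
import numpy as np


def entropy(labels):
    """Shannon entropy H(S) of a label set, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, children):
    """Eq. (18): H(S) minus the size-weighted entropy of the subsets S_v."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in children)


print(information_gain([0, 0, 1, 1], [[0, 0], [1, 1]]))
print(information_gain([0, 0, 1, 1], [[0, 1], [0, 1]]))
```

A split that separates the two classes perfectly gains 1 bit; a split that leaves each subset as mixed as the parent gains nothing.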

3.6. Model Training and Validation Strategy

3.6.1. Train–Test Split

The dataset was partitioned using stratified sampling to maintain the class distribution (Table 5).

3.6.2. Cross-Validation Framework

Stratified K-Fold cross-validation (k = 5) was implemented to ensure robust performance estimation:

\text{CV Score} = \frac{1}{k} \sum_{i = 1}^{k} \text{Score}(\text{Model}_{i}, \text{Validation}_{i})

(19)

Stratified K-Fold cross-validation maintains the class distribution in each fold and, by averaging over k different validation sets, reduces the dependence of the performance estimate on any single split, providing robust performance estimates and enabling statistical significance testing.
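The fold assignment behind Eq. (19) can be sketched as below: within each class, samples are shuffled and dealt round-robin to the k folds, so every fold inherits the overall class proportions. This is a simplified stand-in, not scikit-learn's StratifiedKFold (whose allocation differs in detail), and the `score_fn` callback is a hypothetical placeholder for fitting a model on the training indices and scoring it on the validation indices.

```python
import numpy as np


def stratified_folds(labels, k=5, seed=42):
    """Assign each sample a fold index in 0..k-1, class by class."""
    y = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        rng.shuffle(idx)                        # random order within the class
        fold_of[idx] = np.arange(len(idx)) % k  # deal round-robin to folds
    return fold_of


def cv_score(labels, k, score_fn):
    """Eq. (19): mean of score_fn(train_idx, val_idx) over the k folds."""
    fold_of = stratified_folds(labels, k)
    scores = [score_fn(np.flatnonzero(fold_of != i), np.flatnonzero(fold_of == i))
              for i in range(k)]
    return float(np.mean(scores))
```

With 10 samples of one class and 5 of another, each of five folds receives 2 and 1 of them, respectively, preserving the 2:1 ratio.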

3.6.3. Performance Metrics

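Accuracy: The model’s overall correctness rate, i.e., the proportion of all predictions that are correct across all classes. In the multiclass case, the correct predictions are the diagonal entries of the confusion matrix:

\text{Accuracy} = \frac{\sum_{i = 1}^{K} TP_{i}}{\text{Total Predictions}}

(20)

where TP_i is the number of correctly classified samples of class i and K is the number of classes.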

Precision (per class): For each class, it measures how many of the predicted positive cases were actually correct, indicating the model’s ability to avoid false alarms.

\text{Precision}_{i} = \frac{TP_{i}}{TP_{i} + FP_{i}}

(21)

Recall (per class): For each class, it measures how many of the actual positive cases were correctly identified by the model, showing the model’s ability to find all relevant instances.

\text{Recall}_{i} = \frac{TP_{i}}{TP_{i} + FN_{i}}

(22)

F1-Score (per class): The harmonic mean of precision and recall for each class, providing a single metric that balances both false positives and false negatives.

F1_{i} = 2 \times \frac{\text{Precision}_{i} \times \text{Recall}_{i}}{\text{Precision}_{i} + \text{Recall}_{i}}

(23)

Macro-averaged F1: The simple average of F1-scores across all classes, treating each class equally, regardless of how many samples it contains:

F1_{\text{macro}} = \frac{1}{K} \sum_{i = 1}^{K} F1_{i}

(24)

Weighted F1: The weighted average of F1-scores across all classes, where each class’s contribution is proportional to its number of samples in the dataset:

F1_{\text{weighted}} = \sum_{i = 1}^{K} w_{i} \times F1_{i}

(25)

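where w_i is the support weight for class i, i.e., the number of samples in class i divided by the total number of samples, so that the weights sum to 1.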
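All of Eqs. (20)–(25) can be derived from a single confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and that every class is both present and predicted at least once; otherwise a precision or recall denominator is zero, a case scikit-learn handles through its zero_division option. The function name is hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Eqs. (20)-(25) from a K x K confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)   # TP / (TP + FP): column totals
    recall = tp / cm.sum(axis=1)      # TP / (TP + FN): row totals
    f1 = 2 * precision * recall / (precision + recall)
    support = cm.sum(axis=1)          # samples per true class
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "f1_macro": f1.mean(),
        "f1_weighted": (support / support.sum() * f1).sum(),
    }


m = metrics_from_confusion([[8, 2], [1, 9]])
print(m["accuracy"], m["f1"])
```

For this 2 × 2 example, accuracy is 17/20 = 0.85 and the per-class F1-scores are 16/19 and 6/7.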

3.7. Ensemble Learning Methods

3.7.1. Voting Classifier Implementation

Ensemble methods were implemented based on the top-performing individual models identified through cross-validation.

Soft voting averages the predicted probability distributions from multiple models for each class and selects the class with the highest combined probability score:

\hat{y} = \arg \max_{c} \sum_{i = 1}^{M} w_{i} \cdot P_{i} (y = c| x)

(26)

where P_i(y = c | x) is the predicted probability of class c from model i, w_i are the model weights, and M is the number of models in the ensemble.

Hard voting aggregates the discrete class predictions (hard votes) from multiple models and selects the class that receives the most (weighted) votes:

\hat{y} = \arg \max_{c} \sum_{i = 1}^{M} w_{i} \cdot 1(\hat{y}_{i} = c)

(27)

where ŷ_i is the class predicted by model i.
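Eqs. (26) and (27) can disagree, which matters for the soft-versus-hard comparison in Section 3.7.2. The sketch below (hypothetical helpers, equal weights by default) builds a case in which two models lean weakly toward class 1 while a third is confident in class 0: hard voting follows the majority, whereas soft voting follows the accumulated probability mass.

```python
import numpy as np


def soft_vote(probas, weights=None):
    """Eq. (26): argmax of the weighted sum of class-probability matrices.

    probas: (M, n_samples, n_classes) predicted probabilities of M models.
    """
    probas = np.asarray(probas, dtype=float)
    w = np.ones(len(probas)) if weights is None else np.asarray(weights, dtype=float)
    return np.tensordot(w, probas, axes=1).argmax(axis=1)


def hard_vote(predictions, n_classes, weights=None):
    """Eq. (27): class with the largest weighted vote count.

    predictions: (M, n_samples) integer class labels from M models.
    """
    predictions = np.asarray(predictions)
    w = np.ones(len(predictions)) if weights is None else np.asarray(weights, dtype=float)
    votes = np.zeros((predictions.shape[1], n_classes))
    for w_i, pred in zip(w, predictions):
        votes[np.arange(len(pred)), pred] += w_i   # one weighted vote per sample
    return votes.argmax(axis=1)   # ties go to the lowest class index


probas = [[[0.45, 0.55]], [[0.45, 0.55]], [[0.95, 0.05]]]
preds = np.asarray(probas).argmax(axis=2)   # each model's hard prediction
print(soft_vote(probas), hard_vote(preds, n_classes=2))
```

Soft voting sums to 1.85 for class 0 against 1.15 for class 1 and picks class 0, while hard voting counts two votes for class 1 and picks class 1.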

3.7.2. Model Selection for Ensemble

The ensemble composition was systematically determined by ranking all models based on cross-validation accuracy and selecting the top three performers—random forest, extra trees, and decision tree—for voting classifier implementation. Both soft and hard voting methods were evaluated, with hard voting selected based on its superior performance in the final ensemble configuration.

3.8. Computational Implementation

3.8.1. Software Framework

The computational analysis was conducted using a Python-based framework that integrated Pandas 1.5+ for data manipulation, NumPy 1.24+ for numerical operations, Scikit-learn for ML, and XGBoost 1.7+ and LightGBM 3.3+ for gradient boosting. Visualization was conducted using Matplotlib 3.6+ and Seaborn 0.12+. All analyses were performed on a high-performance workstation equipped with an Intel Core i9-14900K processor and 64 GB of RAM (Intel, Santa Clara, CA, USA).

3.8.2. Reproducibility Framework

All analyses were performed with fixed random seeds (random_state = 42) to ensure reproducibility. The entire pipeline was encapsulated within the EarthquakeDamagePredictor class, enabling systematic experimentation and result validation.

3.8.3. Performance Optimization

To ensure computational efficiency and scalability, multiple optimization techniques were applied throughout the analysis. Parallel processing was employed using the n_jobs = −1 parameter for algorithms supporting multi-threading. Moreover, efficient data structures and systematic garbage collection were adopted to enhance memory management. Further efficiency was achieved through NumPy array-based vector operations, supported by comprehensive logging and progress tracking mechanisms for real-time monitoring of model training and testing. Notably, this methodical approach facilitates robust, reproducible exploration of earthquake damage patterns using state-of-the-art ML techniques. The structured workflow supports consistent model comparison and provides a solid foundation for real-world applications in seismic risk analysis.

4. Results and Analysis

4.1. Model Performance Overview

Our comparison of 11 models, comprising 10 individual ML algorithms and the voting ensemble (Table 6, Figure 7), revealed notable differences in performance. The results highlight the effectiveness of ensemble and tree-based algorithms for earthquake damage classification.

4.2. Algorithm Performance Analysis

The experimental results presented in Table 6 and Figure 7 show the performance of each evaluated algorithm. Random forest achieved the highest accuracy (79.65%) with high stability (CV_Std = 0.0016), benefiting from its ensemble nature, which combines multiple decision trees with feature randomization to ensure robust classification and resistance to overfitting and class imbalance. The voting ensemble followed closely with 79.62% accuracy by combining random forest, extra trees, and decision tree models through hard voting, which outperformed soft voting and demonstrated the advantages of model diversity in ensemble learning. Extra trees achieved 79.54% accuracy through its highly randomized split selection strategy, enabling fast training and solid generalization performance.

The mid-performing group included decision trees, which achieved 78.70% accuracy and performed reliably as a standalone model despite their high variance, and KNN, which attained 77.46% accuracy, demonstrating the effectiveness of instance-based learning while remaining sensitive to feature scaling and dimensionality. Gradient boosting algorithms showed mixed performance: XGBoost achieved 75.62% accuracy and LightGBM 74.30%, both lower than typically expected for models of this class. Because their cross-validation results were stable, this shortfall more likely reflects the default hyperparameters used (Section 3.3.1) than instability, suggesting that hyperparameter tuning could improve their performance.

Linear models delivered modest yet stable performance, with logistic regression (67.31%) and LDA (67.25%) yielding nearly identical outcomes. Their lower accuracy likely reflects the limitations imposed by linear decision boundaries in a problem of this complexity. The poorest performers were QDA, with 41.71% accuracy, and naive Bayes, with 35.35% accuracy. The former likely struggled with insufficient data per class and overfitting in high-dimensional space, while the latter was hampered by strong feature correlations that violated its conditional independence assumption. These findings emphasize the importance of selecting algorithms based on both data characteristics and underlying theoretical assumptions.

4.3. Cross-Validation vs. Test Performance Analysis

As depicted in Figure 8, the cross-validation results closely align with test performance across all model categories. Linear models show the smallest CV–test gaps (below 0.1%), although their overall accuracy remains limited. Among the higher-accuracy models, gradient boosting algorithms are the most consistent (gaps under 0.3%), followed by individual tree models (<0.4%) and tree-based ensembles (<0.5%). Instance-based algorithms maintain moderate consistency, with gaps below 0.6%. In contrast, probabilistic models exhibit the largest discrepancies, exceeding 1.0%, suggesting potential overfitting.

4.4. Confusion Matrix Analysis

The confusion matrix analysis (Figure 9) reflects the classification performance of the tested algorithms across all damage categories, with detailed evaluation metrics summarized in Table 7. The matrix displays a clear diagonal pattern, indicating accurate predictions, with some misclassification observed between neighboring damage levels.

The severely damaged class was classified with high accuracy, likely due to its large sample size, while the collapsed class exhibited conservative predictions, characterized by high precision but relatively low recall. Although class imbalance had a notable effect on performance, the model preserved essential safety detection capabilities in identifying severe damage.

4.5. Feature Importance Analysis

The random forest feature importance analysis (Figure 10) indicates that distance-based parameters overwhelmingly dominate the prediction model, with the top six features corresponding to proximity measurements to various seismic stations. Station1_Distance_km is identified as the most influential predictor, with an importance score of approximately 0.12, followed closely by distances to Stations 2 through 6. This pattern suggests that spatial proximity to recording stations is the primary determinant of building damage severity. The remaining top 10 features consist of station identification codes and Joyner–Boore distance metrics (Rjb), while ground motion parameters and site characteristics display lower importance, highlighting the strong predictive power of geometric distance relationships for earthquake damage assessment in this dataset. The prominence of geometric distance parameters (Station_Distance_km) from the nearest stations reflects their superior ability to capture local ground motion variations compared to traditional seismological distance measures. While epicentral distance (Repi) provides general proximity to earthquake sources, the simple geometric distances to recording stations appear to better represent the actual ground motion intensity experienced by buildings, possibly due to complex wave propagation effects and site-specific amplification patterns in the heterogeneous geological environment of the study region.

4.6. ROC Curve Analysis

The ROC curve analysis (Figure 11) reflects differing degrees of discriminatory power across algorithmic approaches, with corresponding area under the curve (AUC) values presented in Table 8.

4.7. Model Complexity vs. Performance Analysis

The relationship between model complexity and performance (Figure 12) identifies optimal zones in which accuracy peaks without incurring substantial computational cost. As shown in Table 9, different complexity levels yield distinct trade-offs among accuracy, training time, and interpretability.

Model complexity is determined by evaluating the computational resources and parameters required for each algorithm. For tree-based models such as random forest and decision tree, complexity is calculated based on the number of trees, tree depth, and nodes per tree. Linear models like logistic regression and linear discriminant analysis have complexity determined by the number of features and output classes. Instance-based algorithms like K-Nearest Neighbors depend on the training dataset size and feature dimensionality. Probabilistic models such as naive Bayes have relatively simple complexity based on feature-class combinations. Ensemble methods combine the complexities of their constituent models.

For this study, with 304,299 samples and 73 features, complexity values range from simple models with few parameters to ensemble methods that require substantial computational resources.

4.8. Learning Curves Analysis

The learning curves of the random forest model (Figure 13) show training scores consistently above 0.95 across all dataset sizes, while validation scores increase steadily from approximately 0.74 to 0.79 as more training data are introduced. Training scores remain well above validation scores, as expected for fully grown trees (max_depth = None), but the rising validation curve shows no sign of degradation as data are added, indicating good generalization. The performance plateau observed beyond approximately 100,000 samples suggests that the dataset is sufficiently large for effective model training.

4.9. Ensemble Learning Performance

The ensemble learning strategy integrated the complementary strengths of random forest’s robust feature selection, extra trees’ enhanced variance reduction, and decision tree’s interpretable decision boundaries. The hard voting ensemble achieved 79.62% accuracy, outperforming soft voting (79.24%) and surpassing the average individual model performance by 1.2%. It also exhibited high consistency, with a cross-validation standard deviation of only 0.0016, underscoring the effectiveness of combining diverse algorithms to enhance predictive accuracy.

4.10. Computational Performance

The computational efficiency analysis (Table 10) displays substantial variation in training and prediction times across algorithms.

The results demonstrate that tree-based ensemble methods achieve optimal performance in earthquake damage classification, with random forest and ensemble models providing the best trade-off among accuracy, stability, and computational efficiency. The dominance of distance-based features underscores the critical role of proximity to seismic sources in shaping damage patterns.

5. Discussion

5.1. Algorithm Performance and Methodological Insights

Extensive testing of the 10 ML methods indicates that tree-based ensemble methods perform optimally in classifying earthquake damage, with random forest (79.65%) and voting ensemble (79.62%) achieving the highest accuracy. This reflects their ability to capture complex nonlinear relationships in earthquake damage, where small differences in ground motion parameters can result in substantially different damage outcomes. Unlike linear models that assume monotonic relationships, tree-based methods inherently identify critical damage thresholds through recursive partitioning without requiring prior assumptions about functional form.

The ensemble’s performance is consistent with the principle that combining multiple models can improve predictive accuracy relative to individual algorithms: the voting ensemble exceeded the average individual model by 1.2%, although it did not surpass the best single model (79.62% vs. 79.65% for random forest). Even so, combining models in this way may remain valuable when scaling to full damage assessment applications. The ensemble integrates random forest (bagging with random features), extra trees (enhanced split randomization), and decision tree (interpretable boundaries) to offer complementary strengths that reduce overall prediction variance.

5.2. Implications of Feature Importance and Seismic Engineering

The dominance of distance-based parameters in the importance rankings provides substantial support for fundamental seismic engineering principles. The presence of distances to multiple stations—not just the nearest—among the top predictors confirms that a multi-station approach is more effective than traditional single-station methods. This finding suggests that spatial averaging of ground motion characteristics offers a considerable advantage in improving damage prediction accuracy and supports the development of more sophisticated seismic characterization methods.

Interestingly, the relatively reduced importance of PGA parameters compared to distance-based indicators contradicts conventional engineering practice and may suggest that proximity to the fault rupture is more influential than specific acceleration values within the observed range. This implies that simplified damage assessment methods based on distance metrics may be unexpectedly effective for regional-scale applications, potentially reducing computational demands in emergency response contexts.

5.3. Class Performance and Emergency Response Applications

The variation in performance across damage classes is attributed to both class imbalance and the characteristics of post-earthquake surveys. The favorable performance for severely damaged buildings (F1 = 0.891) indicates reliable identification of structures requiring urgent action, whereas collapsed buildings are classified conservatively (high precision, low recall): a collapse prediction is usually correct, but a substantial share of collapsed buildings is assigned to other damage classes. These outcomes directly inform emergency response protocols and facilitate more efficient resource allocation and triage procedures in disaster scenarios.

The high recall rate for severely damaged buildings (94.1%) ensures that most structures requiring immediate intervention are identified, allowing for effective deployment of disaster response resources. By distinguishing among damage types, the proposed model supports the use of more advanced resource allocation strategies than those relying solely on simple damaged/undamaged classifications, thereby equipping emergency managers with actionable intelligence to coordinate response efforts.

5.4. Comparison with the Literature and Practical Applications

Compared to earlier studies on earthquake damage assessment, the 79.65% accuracy achieved in this study reflects a marked improvement. The use of a dataset containing 304,299 buildings also represents an unprecedented scale in this domain. The systematic evaluation of multiple algorithms under strict validation protocols enables objective performance comparison—an aspect previously limited by smaller datasets. Most prior studies have focused on a single algorithm or relatively small datasets, reducing their generalizability and limiting their practical utility. This performance aligns well with recent methodological advances, where Hariri-Ardebili and Sattar (2025) [56] achieved comparable accuracy using AutoML on building-specific structural parameters for 242 buildings from the same earthquake dataset. While their approach excelled in detailed vulnerability analysis with comprehensive structural inventories, our multi-station seismic approach demonstrates that similar accuracy can be achieved across substantially larger building populations using readily available seismic parameters. This comparison highlights complementary methodological paradigms serving different operational needs, with ensemble learning methods showing particular promise, as demonstrated by recent studies [57,58] and validated through our comprehensive algorithm evaluation.

The proposed method demonstrates immediate applicability in emergency response networks by enabling rapid post-earthquake damage assessment within hours, using only building coordinates and seismometer data. This capability can substantially enhance situational awareness and support more effective resource deployment during the critical post-earthquake window. Beyond emergency scenarios, the method is also applicable to future risk prediction, supporting scenario-based earthquake modeling, the development of building codes, and the prioritization of seismic retrofitting efforts.

5.5. Limitations and Future Directions

Several limitations constrain the scope and applicability of this study. The geographical specificity of the training data restricts the transferability of the model to regions with differing seismic characteristics or building typologies. Common seismic input metrics, such as PGA and distance, fail to fully capture the complexity of ground motion, including effects related to duration and frequency content. Moreover, the limited availability of detailed building inventories prevents the model from incorporating structural variations that substantially influence damage patterns. Additionally, the rapid visual inspection methodology, while standardized and conducted by trained engineers, may introduce observational uncertainties compared to detailed structural engineering evaluations. These assessments, although practical for large-scale surveys, may miss subtle structural damage or misclassify damage severity in complex structural configurations where visual indicators may not fully represent actual structural integrity.

The damage classification process, while conducted by trained engineers and technical staff following standardized protocols established by the Ministry of Environment, Urbanization and Climate Change, is subject to additional sources of potential bias and assessment uncertainty. Field-based damage evaluation can be influenced by subjective interpretation of damage severity boundaries, time constraints during large-scale emergency assessments, accessibility limitations in heavily damaged areas, variations in assessment team experience, and potential inconsistencies in applying standardized criteria across different teams and regions. Environmental conditions during post-earthquake assessments may also affect evaluation accuracy. These potential inaccuracies in ground truth labels could introduce noise in machine learning model training and validation processes, though the large dataset size (304,299 buildings) helps mitigate the impact of individual assessment variations on overall model performance.

The study focuses exclusively on immediate post-earthquake damage states without considering progressive damage evolution, aftershock effects, or cumulative structural deterioration over time. The temporal dynamics of damage progression and the influence of multiple seismic events on structural performance remain outside the scope of the current analysis.

Furthermore, while the ensemble methods demonstrate superior predictive accuracy, their computational complexity and inherently reduced interpretability present challenges for operational deployment in time-critical emergency response scenarios. The “black-box” nature of sophisticated ensemble models may limit their acceptance by emergency management agencies and regulatory bodies that require transparent decision-making processes for critical infrastructure applications.

To enhance generalizability, future research should expand the geographical coverage, integrate robust building inventories to support improved structural characterization, and develop physics-informed ML frameworks that merge empirical data with established seismic engineering principles. Combining real-time seismic data streams with autonomous damage assessment systems also presents a promising direction for operational deployment.

6. Conclusions

6.1. Key Findings and Contributions

This study presents the largest ML-based analysis of earthquake damage patterns to date, based on data from 304,299 buildings affected by the 2023 Kahramanmaraş earthquakes. The study demonstrates that ensemble learning models provide improved performance in seismic damage prediction (accuracy = 79.62%), with random forests and extra trees constituting the core components of the top-performing models. Systematic testing of 10 ML algorithms provides strong evidence that tree-based methods considerably outperform linear and probabilistic models in addressing this complex classification task.

The use of multi-station seismic parameters represents a methodological improvement over traditional single-station approaches, with distance-based features serving as the strongest predictors of damage intensity. This finding aligns with fundamental principles of seismic engineering and indicates that simplified, distance-based evaluation methods may yield unexpectedly accurate results at regional scales. The extensive feature engineering approach—based on 66 seismic parameters from multiple recording stations—provides an applicable framework for future large-scale damage evaluation studies.

6.2. Practical Applications and Implications

The practical value of this research extends beyond academic relevance to direct application in emergency response. The developed models can deliver rapid damage assessments within hours of an earthquake, facilitating more effective resource allocation and coordination. Additionally, the methodology supports future risk forecasting through scenario-based earthquake modeling and can inform building code development through empirical validation of damage mechanisms. The adoption of multiclass damage predictions allows for more detailed emergency management strategies compared to conventional binary classification.

Because distance-based features are readily available, damage assessments can be conducted using basic geographic information systems, removing the need for advanced ground motion modeling. This is particularly useful in emergency situations with limited seismic monitoring infrastructure, where rapid assessments can still be performed with minimal computational resources.

6.3. Research Impact and Future Opportunities

The 2023 Kahramanmaraş earthquakes, although devastating, provided a rare opportunity to advance the science of earthquake damage assessment. Demonstrating the ability of ML methods to produce highly accurate damage predictions sets new standards for the field and establishes a foundation for developing operational tools that improve emergency response effectiveness.

Future research priorities include expanding the geographical scope to evaluate model generalizability in other seismically active regions, incorporating detailed building inventories to capture structural variability, and developing physics-informed ML models that integrate empirical findings with established engineering principles. Real-time assessment systems based on streaming seismic data may represent the next critical step toward operational deployment.

As seismic hazards continue to grow due to urban expansion and infrastructure deterioration, integrating advanced computational methods with extensive observational data offers promising avenues for reducing earthquake vulnerability and building more earthquake-resistant communities. The methodology developed in this study offers near-term practical applications and provides a foundation for future advancements in seismic risk analysis and earthquake engineering.

Author Contributions

Conceptualization, S.T.A.Ö. and Ö.F.N.; methodology, S.T.A.Ö. and Ö.F.N.; software, Ö.F.N.; validation, S.T.A.Ö., Ö.F.N. and M.F.; formal analysis, S.T.A.Ö. and M.F.; investigation, S.T.A.Ö. and M.F.; resources, M.F.; data curation, M.F.; writing—original draft preparation, S.T.A.Ö. and Ö.F.N.; writing—review and editing, S.T.A.Ö., M.F. and Ö.F.N.; visualization, Ö.F.N.; supervision, M.F.; project administration, M.F.; funding acquisition, M.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the UAE University grant number 12N264.

Data Availability Statement

The data used in this study are available from the authors upon request.

Acknowledgments

The authors acknowledge the Turkish Ministry of Environment, Urbanization and Climate Change for providing access to the building damage assessment database, and the Disaster and Emergency Management Presidency (AFAD) for seismic ground motion data. We also express our gratitude to the field engineers and technical staff who conducted the post-earthquake damage assessments under challenging conditions. Additionally, we would like to thank Hüseyin Üzen for his guidance and support throughout this study.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Figure 1. Geographic distribution of the analyzed buildings across the affected region. The map shows the spatial concentration of the 304,299 buildings, color-coded by damage severity. Longitude (°E) is plotted on the x-axis and latitude (°N) on the y-axis, with axis limits cropped to the extent of the buildings for clarity.

Figure 2. Distribution of building damage categories in the dataset. Severely damaged buildings constitute the largest category (67.1%, n = 204,257), followed by moderately damaged (13.6%, n = 41,291), collapsed (12.6%, n = 38,365), and urgent demolition (6.7%, n = 20,386) categories.

Figure 3. Provincial distribution of building damage, showing marked spatial heterogeneity. Hatay has the highest absolute number of damaged buildings, while Kahramanmaraş and Adıyaman exhibit high damage concentrations relative to their building stock.

Figure 4. Pearson correlation matrix revealing systematic linear relationships between seismic parameters. Strong positive correlations exist between PGA components in different directions (r > 0.85), indicating consistent ground motion amplitudes across orthogonal recording directions. Distance-based parameters show expected spatial consistency patterns with moderate to strong correlations. Color coding represents correlation coefficient strength from −1 (dark blue) to +1 (dark red), with correlation values displayed in each cell for precise interpretation.
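Each cell of a Pearson matrix such as Figure 4 is the covariance of two features divided by the product of their standard deviations. A minimal standard-library sketch follows. The feature values are hypothetical and not taken from the study's dataset. In practice, pandas `DataFrame.corr()` (whose default method is Pearson) produces the same matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """All pairwise Pearson coefficients, keyed by (feature_a, feature_b)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical values for five records (PGA in gal, Rjb in km); not data from the study
features = {
    "PGA_NS": [120.0, 310.5, 450.2, 80.1, 600.3],
    "PGA_EW": [135.2, 295.0, 470.8, 95.4, 580.9],
    "Rjb": [60.0, 25.0, 12.0, 90.0, 5.0],
}
corr = correlation_matrix(features)
print(round(corr[("PGA_NS", "PGA_EW")], 3), round(corr[("PGA_NS", "Rjb")], 3))
```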

Figure 5. Distribution of peak ground acceleration (NS) values across different seismic stations. The histograms show frequency distributions with Station 3 (green) exhibiting the highest PGA values, followed by Station 2 (orange) and Station 1 (blue), reflecting varying proximity to fault rupture zones.

Figure 6. Proposed methodology workflow for seismic damage assessment. The workflow diagram uses a systematic color coding scheme: red/pink represents raw data collection phases (building damage and seismic station data), blue indicates data processing and preparation steps (decoding, validation, feature engineering), green shows machine learning implementation stages (model training, individual algorithms, ensemble methods), orange represents performance evaluation and analysis phases, purple highlights optimization and best model selection processes, and yellow denotes final results and practical applications.

Figure 7. Comparative performance analysis showing test accuracy and cross-validation scores for all evaluated algorithms. Random forest achieves the highest performance (79.65%), followed closely by the voting ensemble (79.62%) and extra trees (79.54%).
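This section does not state whether the voting ensemble uses hard (majority) voting or soft (probability-averaging) voting. The combination step below is a sketch that assumes hard voting. The three prediction lists are hypothetical base-model outputs that use the label codes of Table 4. Ties go to the smallest label code, which matches how scikit-learn's `VotingClassifier` resolves ties in hard voting (ascending class order).

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Majority vote per sample across models; ties go to the smallest label code."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_model)
        top = max(votes.values())
        combined.append(min(label for label, count in votes.items() if count == top))
    return combined


# Hypothetical predictions of three base models (codes as in Table 4: 0 = Severely, 1 = Collapsed, ...)
model_a = [0, 1, 2, 3, 0]
model_b = [0, 1, 1, 3, 2]
model_c = [0, 2, 2, 0, 1]
combined = hard_vote([model_a, model_b, model_c])
print(combined)
```

Soft voting would instead average each model's class probabilities and take the argmax. This requires every base model to output probabilities.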

Figure 8. Relationship between cross-validation and test accuracy. The diagonal line marks equal cross-validation and test accuracy. All models lie within one percentage point of it (Table 6), indicating consistent performance across validation strategies.

Figure 9. Confusion matrix for the best-performing random forest model showing classification accuracy across all damage categories. The matrix reveals strong diagonal performance with some confusion between adjacent damage levels.

Figure 10. Top 10 most important features identified by random forest algorithm. Distance-based parameters dominate feature importance, with station proximity being the primary predictor of damage severity.
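Random forest importances are commonly computed as the mean decrease in impurity (MDI). Each split's weighted Gini decrease is credited to the feature used, summed over all trees, and normalized. This is what scikit-learn's `RandomForestClassifier.feature_importances_` reports. The importance measure behind Figure 10 is not restated here, so the sketch below assumes MDI and shows only its single-split building block. The labels and the split on a distance threshold are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted Gini decrease of one split, taking the node as the whole sample."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Hypothetical node (1 = Collapsed, 0 = other) split on an assumed Rjb threshold
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [1, 1, 1, 0]   # buildings closer than the hypothetical threshold
right = [0, 0, 0, 1]  # buildings farther away
print(impurity_decrease(parent, left, right))
```

A feature that repeatedly produces large decreases, as distance measures would if proximity separates damage classes well, accumulates a high importance score.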

Figure 11. ROC curves for selected models using one-vs.-rest binary classification approach. The curves demonstrate varying discrimination capabilities across different algorithmic approaches.

Figure 12. Relationship between model complexity (approximate parameter count) and test accuracy. The analysis reveals optimal complexity zones where performance peaks without excessive computational overhead.

Figure 13. Learning curves for the best-performing random forest model showing training and validation scores as functions of training set size. The curves indicate good generalization with minimal overfitting.

Table 1. Detailed earthquake parameters.

Parameter	Event 1	Event 2
Date	6 February 2023	6 February 2023
Local Time	04:17:34	13:24:49
Magnitude (Mw)	7.7	7.6
Epicenter Coordinates	37.174° N, 37.032° E	38.024° N, 37.203° E
Focal Depth	8.6 km	7.0 km
Fault Mechanism	Strike-slip	Strike-slip
Rupture Length	~300 km	~150 km
Maximum PGA Recorded	>2000 gal	>1500 gal

Table 2. Seismic parameter ranges and distributions.

Parameter	Minimum	Maximum	Mean	Unit
Peak Ground Acceleration:
PGA_NS	0.38	1787.92	314.20	gal
PGA_EW	2.59	1372.07	315.68	gal
PGA_UD	1.03	1296.27	229.03	gal
Distance Metrics:
Rjb	0.12	458.92	98.45	km
Rrup	2.34	462.18	102.78	km
Repi	45.67	512.34	156.89	km
Rhyp	46.12	513.02	157.23	km
Site Parameters:
Vs30	185.00	1456.00	542.67	m/s
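Tables 1 and 2 report accelerations in gal, where 1 gal = 1 cm/s². Dividing by standard gravity (980.665 cm/s²) converts these values to units of g, which are often easier to compare with design spectra. A one-line helper is sketched below. The function name is illustrative.

```python
STANDARD_GRAVITY_GAL = 980.665  # 1 g expressed in gal (cm/s^2)


def gal_to_g(pga_gal):
    """Convert an acceleration in gal (cm/s^2) to multiples of standard gravity."""
    return pga_gal / STANDARD_GRAVITY_GAL


# PGA_NS maximum and mean from Table 2
for label, value in [("PGA_NS max", 1787.92), ("PGA_NS mean", 314.20)]:
    print(f"{label}: {value} gal = {gal_to_g(value):.2f} g")
```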

Table 3. Soil classification encoding.

Soil Class	Vs30 Range (m/s)	Encoded Value	Stiffness Level
A	>1500	4	Very Stiff
B	760–1500	3	Stiff
C	360–760	2	Moderate
D	<360	1	Soft
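The encoding in Table 3 is an ordinal mapping from Vs30 to a stiffness code. A minimal sketch follows. The table does not say which side of each limit a boundary value belongs to. This sketch assumes that exactly 1500 m/s and exactly 760 m/s fall in class B, and that exactly 360 m/s falls in class C. The function name is hypothetical.

```python
def soil_class(vs30):
    """Map Vs30 (m/s) to (soil class, ordinal code) following Table 3.

    Boundary handling is an assumption: 1500 and 760 map to B, 360 maps to C.
    """
    if vs30 > 1500:
        return "A", 4  # very stiff
    if vs30 >= 760:
        return "B", 3  # stiff
    if vs30 >= 360:
        return "C", 2  # moderate
    return "D", 1      # soft


for v in (185.0, 542.67, 1456.0):  # minimum, mean, and maximum Vs30 from Table 2
    print(v, soil_class(v))
```

Applied to the Vs30 range in Table 2 (185–1456 m/s), this mapping never yields class A. Only codes 1 to 3 would therefore appear in the encoded feature.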

Table 4. Target variable encoding.

Damage Status	Encoded Label	Frequency	Percentage
Severely	0	204,257	67.1%
Collapsed	1	38,365	12.6%
Moderately	2	41,291	13.6%
Urgent_demolition	3	20,386	6.7%
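The label codes in Table 4 and the class shares shown in Figure 2 can be reproduced from the counts alone, as in the sketch below. The dictionary names are illustrative. The ratio between the largest and smallest classes, about 10:1, quantifies the class imbalance. This imbalance is one plausible contributor to the weaker minority-class scores in Table 7.

```python
# Label codes and class counts as reported in Table 4
LABELS = {"Severely": 0, "Collapsed": 1, "Moderately": 2, "Urgent_demolition": 3}
COUNTS = {"Severely": 204_257, "Collapsed": 38_365, "Moderately": 41_291, "Urgent_demolition": 20_386}

total = sum(COUNTS.values())
shares = {name: 100 * n / total for name, n in COUNTS.items()}  # percentage of all buildings
imbalance_ratio = max(COUNTS.values()) / min(COUNTS.values())  # majority / minority class size

# Encoding raw damage-status strings into integer targets
encoded = [LABELS[status] for status in ["Collapsed", "Severely"]]

for name, code in LABELS.items():
    print(f"{code}  {name:<18}{COUNTS[name]:>9,}  {shares[name]:5.1f}%")
print(f"total = {total:,}; majority/minority ratio = {imbalance_ratio:.1f}")
```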

Table 5. Dataset partitioning strategy.

Subset	Size	Percentage	Purpose
Training	243,439	80%	Model training
Testing	60,860	20%	Final evaluation
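Table 5 does not state whether the split was stratified. However, the per-class supports in Table 7 are close to 20% of each class count in Table 4, which is consistent with a stratified split (though it does not prove one). In scikit-learn, stratification is usually obtained with `train_test_split(..., test_size=0.2, stratify=y)`. The standard-library sketch below makes the mechanics explicit. The function name and random seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), drawing the same fraction from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)                       # random order within the class
        n_test = round(test_fraction * len(idx))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


# Synthetic label vector with the class counts of Table 4 (0 = Severely, 1 = Collapsed, ...)
labels = [0] * 204_257 + [1] * 38_365 + [2] * 41_291 + [3] * 20_386
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because each class is rounded separately, the subset sizes can differ by a sample or two from a single global 80/20 cut.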

Table 6. Comprehensive model performance results.

Model	Test Accuracy	CV Mean	CV Std	Rank
Random Forest	0.7965	0.7921	0.0016	1
Voting Ensemble	0.7962	0.7883	0.0016	2
Extra Trees	0.7954	0.7915	0.0018	3
Decision Tree	0.7870	0.7831	0.0020	4
K-Nearest Neighbors	0.7746	0.7685	0.0023	5
XGBoost	0.7562	0.7539	0.0013	6
LightGBM	0.7430	0.7410	0.0015	7
Logistic Regression	0.6731	0.6737	0.0003	8
Linear Discriminant	0.6725	0.6724	0.0010	9
Quadratic Discriminant	0.4171	0.4168	0.0023	10
Naive Bayes	0.3535	0.3585	0.0021	11
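The agreement between held-out accuracy and cross-validated accuracy in Table 6 can be quantified as the difference between the two columns. The sketch below computes this difference from the reported values. The dictionary name is illustrative.

```python
# (test accuracy, CV mean) for each model, as reported in Table 6
RESULTS = {
    "Random Forest": (0.7965, 0.7921),
    "Voting Ensemble": (0.7962, 0.7883),
    "Extra Trees": (0.7954, 0.7915),
    "Decision Tree": (0.7870, 0.7831),
    "K-Nearest Neighbors": (0.7746, 0.7685),
    "XGBoost": (0.7562, 0.7539),
    "LightGBM": (0.7430, 0.7410),
    "Logistic Regression": (0.6731, 0.6737),
    "Linear Discriminant": (0.6725, 0.6724),
    "Quadratic Discriminant": (0.4171, 0.4168),
    "Naive Bayes": (0.3535, 0.3585),
}

# Positive gap: held-out test accuracy exceeds the cross-validation mean
gaps = {model: test - cv for model, (test, cv) in RESULTS.items()}

for model, gap in sorted(gaps.items(), key=lambda kv: -abs(kv[1])):
    print(f"{model:<24}{gap * 100:+.2f} pp")
```

On these figures, every gap is smaller than one percentage point in magnitude. Test accuracy is the higher of the two for nine of the eleven models.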

Table 7. Detailed classification performance for random forest.

Damage Category	Precision	Recall	F1-Score	Support
Severely	0.844	0.941	0.891	40,851
Collapsed	0.584	0.313	0.408	7673
Moderately	0.726	0.705	0.715	8258
Urgent_demolition	0.562	0.426	0.485	4077
Macro Average	0.679	0.596	0.625	60,859
Weighted Average	0.789	0.797	0.785	60,859
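The derived columns in Table 7 follow standard definitions. F1 is the harmonic mean of precision and recall. The macro average is the unweighted mean over the four classes, whereas the weighted average weights each class by its support. The sketch below recomputes F1 and the macro averages from the reported per-class values. The recomputed F1 agrees with the table's F1 column to within about 0.001.

```python
# (precision, recall, reported F1, support) per class, as reported in Table 7
PER_CLASS = {
    "Severely": (0.844, 0.941, 0.891, 40_851),
    "Collapsed": (0.584, 0.313, 0.408, 7_673),
    "Moderately": (0.726, 0.705, 0.715, 8_258),
    "Urgent_demolition": (0.562, 0.426, 0.485, 4_077),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


n = len(PER_CLASS)
macro_p = sum(p for p, _, _, _ in PER_CLASS.values()) / n
macro_r = sum(r for _, r, _, _ in PER_CLASS.values()) / n
macro_f1 = sum(f for _, _, f, _ in PER_CLASS.values()) / n  # unweighted mean of the F1 column

for name, (p, r, f, _) in PER_CLASS.items():
    print(f"{name:<18} recomputed F1 = {f1(p, r):.3f} (reported {f:.3f})")
print(f"macro P = {macro_p:.3f}, macro R = {macro_r:.3f}, macro F1 = {macro_f1:.3f}")
```

Because the macro average gives the small collapsed class the same weight as the dominant severely damaged class, it falls well below the weighted average.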

Table 8. Binary classification performance (severely vs. others).

Model	AUC
Random Forest	0.840
Extra Trees	0.828
Decision Tree	0.794
K-Nearest Neighbors	0.780
Logistic Regression	0.645
Linear Discriminant	0.645
XGBoost	0.809
LightGBM	0.790
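The AUC values in Table 8 can be read as the probability that a randomly chosen positive sample (here, a severely damaged building) receives a higher score than a randomly chosen negative one. Ties count as one half. The pairwise loop below implements that definition directly for illustration. The labels and scores are hypothetical. For the full test set, a rank-based routine such as `sklearn.metrics.roc_auc_score` is the practical choice, because this loop is O(n²).

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical "Severely vs. others" labels (1 = Severely) and predicted probabilities
y = [1, 0, 1, 0]
p = [0.9, 0.8, 0.4, 0.3]
print(roc_auc(y, p))
```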

Table 9. Complexity–performance trade-offs.

Complexity Level	Models	Accuracy Range	Training Time	Interpretability
Very Low	Naive Bayes, Quadratic Discriminant	35–42%	Very Fast	High
Low	Linear Discriminant, Logistic Regression	67–68%	Fast	High
Medium	Decision Tree, K-Nearest Neighbors	77–79%	Moderate	Medium
High	XGBoost, LightGBM	74–76%	Slow	Low
Very High	Random Forest, Voting Ensemble	79–80%	Very Slow	Very Low

Table 10. Training time analysis.

Model	Training Time	Prediction Time
Naive Bayes	0.1 s	0.01 s
Logistic Regression	0.5 s	0.01 s
Decision Tree	2.3 s	0.02 s
Random Forest	15.7 s	0.15 s
XGBoost	45.2 s	0.08 s
Ensemble	18.5 s	0.17 s


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
