A Tabular Data Augmentation Framework Based on Error-Focused XAI-Supported Weighting Strategy: Application to Soil Liquefaction Classification

Nacaroglu, Engin; Tugrul, Ayse Tuba; Yagcioglu, Berk

doi:10.3390/app16010330


by Engin Nacaroglu *, Ayse Tuba Tugrul and Berk Yagcioglu

Faculty of Engineering, Department of Civil Engineering, Kınıklı Campus, Pamukkale University, 20070 Denizli, Türkiye

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(1), 330; https://doi.org/10.3390/app16010330

Submission received: 25 November 2025 / Revised: 25 December 2025 / Accepted: 26 December 2025 / Published: 29 December 2025

(This article belongs to the Special Issue The Application of Machine Learning in Geotechnical Engineering, 2nd Edition)


Abstract

In tabular liquefaction datasets, data augmentation plays a crucial role in enhancing the classification performance of machine learning models. In this study, an XAI-supported, error-focused, weighting-based data augmentation framework is proposed to improve CPT-based soil liquefaction classification in data-limited case-history settings by leveraging feedback from test misclassifications. First, it is hypothesized that test errors are non-random and that certain features contributed the most to misclassifications. Accordingly, a SHAP-based error-contribution score approach was developed to identify error-contributing features. The core of the proposed framework relies on assigning weights to error-contributing features. This targeted weighting was employed in two components: (i) clustering to select training samples for augmentation; and (ii) noise injection applied only in difficult-to-predict regions. To this end, test errors were combined with the training data, and weighted Fuzzy C-Means clustering was applied by assigning a weight of 1.5 to the distance metric in the error-contributing features. Clusters where test errors were concentrated were therefore defined as “difficult-to-predict regions”. In these clusters, noise was injected into the error-contributing features with 1.5× higher amplitude. This design directly integrated XAI-based error explanations into the data augmentation process, enabling targeted augmentation in difficult-to-predict regions. Consequently, the decision boundaries of the models became sharper, particularly in the error-contributing features. The Random Forest model achieved the highest improvement, with its F₁ score increasing by 0.019. These findings demonstrate that the proposed framework enhances classification performance for tabular liquefaction data.

Keywords:

data augmentation; explainable artificial intelligence; soil liquefaction; liquefaction data; error-focused data augmentation; error-contribution score

1. Introduction

Liquefaction is the loss of soil strength resulting from the increase in pore water pressure and the corresponding decrease in effective stress in saturated and particularly cohesionless soils under cyclic and dynamic loading. Loss of bearing capacity, ground settlement, and lateral deformations are typical examples of liquefaction-induced damage. The extensive damage caused by the 1964 Niigata (Japan) and Alaska (USA) earthquakes is considered the starting point of the comprehensive research on liquefaction. Major earthquakes, such as Loma Prieta (1989), Northridge (1994), Kocaeli (1999), Canterbury (2010–2011), Tohoku (2011), and Kahramanmaraş (2023) [1,2], have also resulted in severe liquefaction-induced damage. Liquefaction potential assessment, which is conducted to determine the resistance of soils against liquefaction, emerges as a mandatory analysis for damage prediction, risk management, implementation of preventive measures, and necessary planning. Various methods have been proposed in the literature for this purpose. These include the simplified method [3], the strain-based method [4], energy-based methods [5], laboratory and in situ testing methods [6,7,8], physical modeling testing methods [9,10], and machine learning/artificial intelligence-based approaches [11,12]. In addition, liquefaction vulnerability parameters have been proposed to provide a more performance-oriented assessment of liquefaction effects. Among these, the liquefaction potential index (LPI) [13,14], the liquefaction severity number (LSN) [15,16], and one-dimensional post-liquefaction reconsolidation settlement (S_V1D) [17] are widely used for correlating subsurface conditions with the severity of liquefaction-induced ground deformations and infrastructure damage.

The use of machine learning (ML) in liquefaction prediction has been ongoing since the mid-1990s. Early examples were introduced by [18,19] through backpropagation-based artificial neural networks (ANNs). Limited observations, class imbalance, and overfitting often yield inconsistent results across training runs. Consequently, robust liquefaction models require large, diverse datasets. Therefore, increasing data availability through larger datasets or augmentation improves accuracy and generalization [20,21]. The primary innovation of this study is not a new liquefaction triggering formula, but a novel methodology to overcome the data scarcity and class imbalance problems inherent in geotechnical engineering case histories.

Field case histories of liquefaction are inherently limited in number, which constrains the diversity and representativeness of CPT-based liquefaction classification datasets and increases the risk of overfitting in ML models. In such data-scarce settings, carefully designed augmentation strategies can improve accuracy, robustness, and predictive stability, particularly near the decision boundary for liquefaction occurrence detection. Moreover, it has been shown that models trained on a single dataset may fail when applied to different datasets [22]. In response to this need, Minarelli et al. [23] compiled a database covering 120 liquefied sites from the 2012 Emilia earthquake. Similarly, Hudson et al. [24], within the NGL project, expanded the database to include laboratory testing results and emphasized that robust models require large, diverse datasets beyond field case histories alone.

In line with this, Zhang et al. [25] noted that broader applicability of their Bayesian-optimized SVM for soil-liquefaction prediction needs more data and richer features. To mitigate data dependency, data augmentation has been frequently employed. For instance, Jas and Dodagoudar [12] addressed data imbalance using K-means-supported SMOTE, while Chen et al. [21] tackled the small-sample problem with WGAN, showing it generates more realistic and discrete feature-appropriate samples compared to SMOTE.

In the broader ML literature, targeted data augmentation approaches focusing on incorrect or low-confidence predictions have been proposed. In a non-liquefaction context, de la Calleja et al. [26] showed that SMMO outperformed SMOTE in all tested datasets. Khmaissia and Frigui [27] reported semi-supervised gains on CIFAR-100 with a WideResNet-50-2 backbone by targeting misclassified and low-confidence samples. Apicella et al. [28] incorporated SHAP-based explanations into their framework, showing that explanations can enhance accuracy under certain conditions.

However, data augmentation approaches that directly focus on explanations of misclassifications remain limited. Given the limited observations, class imbalance, and the small-sample nature of tabular soil liquefaction classification datasets, integrating XAI into data augmentation frameworks is therefore of critical importance, particularly in data-scarce domains such as soil liquefaction.

In this study, an XAI-supported, error-focused, and weighting-based data augmentation framework was developed using a tabular liquefaction dataset, driven by misclassified test predictions. While previous work has highlighted the value of data augmentation and model interpretability separately, the proposed framework advances this direction by directly integrating error explanations into the augmentation process. The causes of model test errors were explained through XAI, and a feature-level error-contribution approach was employed to incorporate this information into the weighting strategy. By applying weighting, error-focused clustering was used to identify difficult-to-predict regions, and error-focused data augmentation was applied within these regions. A resulting new training dataset was obtained to reduce model errors and to enhance the predictive performance. In this way, the framework integrates model feedback into targeted data augmentation, which is expected to enhance accuracy and generalizability for tabular prediction systems, particularly in data-limited domains such as soil liquefaction.

2. Materials and Methods

2.1. Liquefaction Dataset Compilation and Feature Engineering

The dataset used in this study, consisting of 321 samples, was compiled from the liquefaction datasets presented by Boulanger and Idriss [29] and Juang et al. [30]. The dataset has a tabular structure composed of ten features representing field observations and a binary class label indicating whether liquefaction occurred in each sample. Three features (CSR, I_c, FC) were generated through feature engineering techniques. In ML, model performance heavily depends on data representation; improving feature extraction/representations has been empirically shown to reduce classification error [31].

This tabular dataset was split into 75% training data (240 samples) and 25% test data (81 samples). Table 1 presents the number of samples for each liquefaction class. All analyses were conducted in Python (v3.13).

To improve model performance, new features were generated from empirical formulas in addition to the experimentally measured data. The features were selected based on soil and ground motion parameters known to influence liquefaction. In general, liquefaction models are evaluated in relation to the CSR (cyclic stress ratio) and the CRR (cyclic resistance ratio). Features representing the CSR include peak ground acceleration, effective stress, total stress, and depth, along with the CSR value itself. Features representing the CRR include raw CPT data (f_s and q_c) and I_c, empirically derived from the CPT data. Independently, liquefaction triggering is also affected by fines content (FC). The influence of fines content on liquefaction triggering in relation to the CSR, q_c1Ncs, and FC has been presented in the triggering curves of Boulanger and Idriss [29]. Accordingly, the geotechnical features representing the CSR, CRR, and FC are summarized in Table 2. Explicit engineering parameters, such as the CRR or the factor of safety (FS), were intentionally excluded from the input set. The objective of using ML in this context is to discover data-driven, non-linear decision boundaries based on fundamental seismic and geotechnical parameters, rather than constraining the model to replicate the existing simplified semi-empirical procedures.

The soil behavior index (I_c) was determined using Equation (1), as proposed by Robertson and Wride [32].

I_{c} = \left[ (3.47 - \log Q)^{2} + (1.22 + \log F)^{2} \right]^{0.5}

(1)

Here, Q and F values were calculated as defined in Equations (2) and (3), respectively. The parameter n was calculated according to Equation (4), as proposed by Robertson [33]. Since the equations are interdependent, I_c, Q, F, and n values were calculated iteratively.

Q = \left( \frac{q_{c} - \sigma_{vc}}{P_{a}} \right) \cdot \left( \frac{P_{a}}{\sigma'_{vc}} \right)^{n}

(2)

F = \left( \frac{f_{s}}{q_{c} - \sigma_{vc}} \right) \cdot 100\%

(3)

n = 0.381 \cdot I_{c} + 0.05 \cdot \left( \frac{\sigma'_{v0}}{P_{a}} \right) - 0.15

(4)
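The interdependence of Equations (1)–(4) can be illustrated with a fixed-point iteration on the exponent n. The following is a minimal sketch, not the authors' implementation. It assumes base-10 logarithms, a starting guess of n = 1.0, an upper cap of n ≤ 1.0, and that the total and effective vertical stresses in Equations (2)–(4) refer to the same pair of values. The function name, the tolerance, and the input values are hypothetical.

```python
import math


def soil_behavior_index(q_c, f_s, sigma_v, sigma_v_eff, p_a=101.325,
                        tol=1e-6, max_iter=100):
    """Iterate Equations (1)-(4) until the stress exponent n stabilizes.

    q_c, f_s, sigma_v, sigma_v_eff and p_a must share one unit (kPa here).
    """
    n = 1.0  # starting guess (assumption)
    for _ in range(max_iter):
        Q = ((q_c - sigma_v) / p_a) * (p_a / sigma_v_eff) ** n   # Eq. (2)
        F = f_s / (q_c - sigma_v) * 100.0                        # Eq. (3), percent
        I_c = math.sqrt((3.47 - math.log10(Q)) ** 2
                        + (1.22 + math.log10(F)) ** 2)           # Eq. (1)
        n_new = 0.381 * I_c + 0.05 * (sigma_v_eff / p_a) - 0.15  # Eq. (4)
        n_new = min(n_new, 1.0)  # cap at 1.0 (assumption, not stated in the text)
        if abs(n_new - n) < tol:
            return I_c, Q, F, n_new
        n = n_new
    raise RuntimeError("I_c iteration did not converge")


# Hypothetical CPT reading: q_c = 5000 kPa, f_s = 30 kPa
I_c, Q, F, n = soil_behavior_index(q_c=5000.0, f_s=30.0,
                                   sigma_v=100.0, sigma_v_eff=60.0)
```

The update of n is only weakly sensitive to the previous n for typical stress levels, so the loop usually settles in a handful of iterations.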

The CSR was determined based on the formula developed by Seed and Idriss [34] (Equation (5)). Here, r_d represents the stress reduction coefficient, which accounts for the deformability of the soil column by reducing the shear stress as depth increases compared to a rigid body assumption [35].

CSR = 0.65 \cdot \frac{\sigma_{v0}}{\sigma'_{v0}} \cdot \frac{a_{max}}{g} \cdot r_{d}

(5)
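As a worked illustration of Equation (5), the short sketch below evaluates the CSR for one hypothetical layer. The stress reduction coefficient r_d is passed in directly, because the text does not state which r_d relationship was used. The function name and all input values are assumptions.

```python
def cyclic_stress_ratio(sigma_v, sigma_v_eff, a_max_g, r_d):
    """Equation (5): CSR = 0.65 * (sigma_v0 / sigma'_v0) * (a_max / g) * r_d.

    a_max_g is the peak ground acceleration already expressed as a fraction of g.
    """
    return 0.65 * (sigma_v / sigma_v_eff) * a_max_g * r_d


# Hypothetical layer: total stress 100 kPa, effective stress 50 kPa,
# a_max = 0.3 g, r_d = 0.9
csr = cyclic_stress_ratio(sigma_v=100.0, sigma_v_eff=50.0, a_max_g=0.3, r_d=0.9)
```

For these inputs the product is 0.65 × 2 × 0.3 × 0.9 = 0.351.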

The general statistical values of the dataset are presented in Table 3.

2.2. Soil Liquefaction Classification Procedure with Original Training Data

Prior to augmentation, liquefaction classification was performed using the original training data (Data1—240 samples) in Data1–Tune1 experiments. In this setting, ten decision tree-based ensemble learning algorithms, including Random Forest (RF), Extra Trees (ExT), Balanced Random Forest (BRF), Rotation Forest (RoF), AdaBoost (AdaB), Gradient Boosting (GBM), CatBoost (CatB), LightGBM (LGBM), NGBoost (NGB), and XGBoost (XGB), were evaluated. These models formed the foundation of the proposed framework (Figure 1) and produced misclassified test predictions.

Ensemble methods improve the predictive performance by combining the outputs of multiple base learners with two classic families: bagging [36] and boosting [37]. In the bagging family, RF builds trees from bootstrap samples with random feature selection, while BRF grows each tree using a class-balanced bootstrap [38]. RoF applies a PCA to feature subsets, constructing trees in a transformed feature space and enhancing model diversity [39]. ExT enhances ensemble diversity by randomizing both feature and cut-point selection at each split and by growing trees on the full learning sample [40]. Within boosting approaches, AdaBoost iteratively reduces errors by assigning higher weights to misclassified samples [37]. GBM minimizes loss by sequentially adding trees in the direction of the negative gradient [41]. XGB improves GBM through second-order optimization, regularization, and parallelization [42]. LGBM achieves high speed and memory efficiency with histogram-based splitting, GOSS, and EFB techniques [43]. CatB has been specifically optimized for the efficient handling of categorical features [44]. Finally, NGB was developed not only to perform classical prediction but to produce probabilistic outputs by learning the parameters of conditional distributions via the natural gradient method [45].

Before data augmentation, hyperparameter optimization (Tune1) was conducted on the ten decision tree-based algorithms using the original training data (Data1, 240 samples) through an exhaustive grid search with 5-fold cross-validation. This tuning strategy is consistent with applied augmentation studies that employ grid-search-based cross-validation for hyperparameter selection [46,47]. For each hyperparameter combination, 5-fold CV F₁ scores were computed, and the mean and standard deviation of F₁ were recorded to summarize the expected performance and to quantify the fold-to-fold variability (i.e., stability-to-data partitioning) during model selection.

Class imbalance was addressed in all algorithms. With the final set of hyperparameters, the models were trained on the 75% training set, and their performance was reported on the fixed 25% test set (Data1–Tune1 experiments, Table 4).

For each model, the F₁ score on the test data, its balance with training values (in terms of overfitting), the 5-fold CV metrics during training (CV mean F₁ and standard deviation), as well as the ROC AUC (area under the receiver operating characteristic curve) values were considered. The F₁ score is the harmonic mean of precision and recall, providing a balanced measure in imbalanced datasets. ROC AUC provides a single comparison score and is insensitive to highly imbalanced datasets, making it reliable in imbalanced settings [46].
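To make the F₁ definition concrete, the sketch below computes precision, recall, and their harmonic mean from confusion-matrix counts. The counts are hypothetical and do not correspond to any model in Table 4.

```python
def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 40 true positives, 4 false positives, 6 false negatives
f1 = f1_from_counts(tp=40, fp=4, fn=6)
```

The same value follows from the equivalent closed form 2·TP / (2·TP + FP + FN) = 80/90.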

According to these criteria, four models were selected in the Data1–Tune1 experiments: GBM, CatB, XGB, and RF. They showed the highest test F₁ scores, good train–test balance, and leading ROC AUC performance. GBM achieved the highest test F₁ (0.925), followed by CatB, XGB, and RF. For ROC AUC, CatB achieved the highest value (0.945), with GBM, RF, and XGB showing comparable performance. The Tune1 hyperparameters were as follows: CatB (70 iterations, maximum depth = 2, learning rate = 0.20), GBM (40 trees, maximum depth = 3, learning rate = 0.08), XGB (20 trees, maximum depth = 3, learning rate = 0.20), and RF (60 trees, maximum depth = 5, maximum features = √(number of features)). These hyperparameters were also applied in Data2–Tune1. The test misclassifications of these four models (5 false positives and 8 false negatives, 13 in total) were combined for the error-focused framework (Table 5).

Table 6 lists the thirteen misclassifications produced by the four selected models on the fixed test set (dataset size N = 321). In the test set, samples 243 and 112 were classified as false positives (FP) by all four models, whereas samples 182, 254, and 253 were classified as false negatives (FN); such recurring FP/FN outcomes are likely associated with transitional cases where the cyclic stress demand (CSR) approaches the cyclic resistance (CRR). These FP/FN misclassifications from the fixed test set were aggregated to form the combined error set used by the proposed error-focused framework, and they include the FP/FN samples used to compute the error-contribution scores. These model-consistent errors point to intrinsically ambiguous, near-boundary instances, motivating our study’s innovation of identifying difficult-to-predict regions within the dataset. Because the dataset is small and tabular, we focused on tree-based ensemble classifiers, which are well suited to such settings.

2.3. Outlier Detection Analysis

Outliers reduce model performance and generalizability. Kaneda et al. [48] recommended detecting and removing outliers prior to modeling, as this improves robustness and generalization ability, particularly in decision boundary-based algorithms. For this purpose, 321 samples were analyzed using the Isolation Forest method. This unsupervised approach, proposed by Liu et al. [49], computes the anomaly score from the average isolation depth E[h(x)] in randomly generated isolation trees, where the score s(x,n) is defined in Equation (6).

s(x, n) = 2^{- \frac{E[h(x)]}{c(n)}}, \quad c(n) = 2 H(n - 1) - \frac{2 (n - 1)}{n}

(6)

Here, E[h(x)] denotes the average isolation depth, c(n) is the normalization constant, H(k) is the k-th harmonic number, and n is the total number of samples. A score of s(x, n) ≈ 1 indicates an anomaly, whereas values below 0.5 represent normal samples.
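The normalization in Equation (6) can be checked numerically with a few lines of code. This is a minimal sketch that evaluates the score from a given average isolation depth. It does not build isolation trees, and the sample size n = 256 and the depths used below are hypothetical.

```python
def harmonic(k):
    """k-th harmonic number H(k) = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / j for j in range(1, k + 1))


def c_norm(n):
    """Normalization constant c(n) = 2 H(n - 1) - 2 (n - 1) / n, for n >= 2."""
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n


def anomaly_score(mean_depth, n):
    """Equation (6): s(x, n) = 2 ** (-E[h(x)] / c(n))."""
    return 2.0 ** (-mean_depth / c_norm(n))


n = 256  # hypothetical sample size
score_average = anomaly_score(c_norm(n), n)  # depth equal to c(n)
score_shallow = anomaly_score(2.0, n)        # isolated after very few splits
score_deep = anomaly_score(20.0, n)          # hard to isolate
```

A sample whose average depth equals c(n) scores exactly 0.5, shallower samples move toward 1, and deeper samples move toward 0.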

Referring to the misclassifications summarized in Table 6, an Isolation Forest model was applied to the full dataset to flag potential outliers. Because such cases may still represent physically plausible case histories, outliers were retained in the training data; however, to prevent them from being over-emphasized in the error-focused framework, outlier-labeled samples were excluded from the error-contribution score computation (which is derived from the SHAP values computed for the non-outlier FP/FN test misclassifications) and the subsequent augmentation steps. In addition, outlier-labeled training samples were excluded from the weighted Fuzzy C-Means clustering, since they can bias distance-based partitioning by shifting cluster centers and memberships, thereby distorting the identification of difficult-to-predict regions targeted for augmentation.

Here, the score was normalized so that negative values were classified as anomalies and positive values as non-outliers. Among the 13 test errors (5 FP and 8 FN), only 2 (samples 19 and 243) were identified as anomalies (Table 7).

In the subsequent stages, the study proceeded with three FP and eight FN samples as the combined test misclassifications for the error-focused framework. These results were also incorporated into the clustering process.

2.4. SHAP-Based Error-Contribution Score: Definition, Computation, and Error-Contributing Features

SHAP, proposed by Lundberg and Lee [50], is a game theory-based explanation method. Based on Shapley values, it calculates the marginal contribution of each feature to the model as the weighted average of its effect across all possible feature combinations. The SHAP value is mathematically defined by Equation (7).

\phi_{i}(f, x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \cdot \left[ f(S \cup \{i\}) - f(S) \right]

(7)

Here, ϕ_i denotes the contribution of the i-th feature to sample x, N represents the set of all features, and S denotes the subsets excluding i. For each S, [f(S ∪ {i}) − f(S)] represents the change in output resulting from adding i, and ϕ_i is the weighted average of these marginal contributions across all subsets. Gilpin et al. [51] emphasized that XAI enhances transparency in decision-making by revealing the reasons for model behavior, thereby highlighting its role in identifying misclassifications and biases.
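Equation (7) can be evaluated exactly by enumerating all subsets when the number of features is small. The sketch below does this for a toy value function with three features. It is a brute-force illustration of the definition, not the algorithm used by SHAP libraries for tree ensembles, and the value function and its numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset S of N \\ {i}."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # |S|! (|N| - |S| - 1)! / |N|!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                S = frozenset(subset)
                phi[i] += weight * (value_fn(S | {i}) - value_fn(S))
    return phi


def toy_value(S):
    """Hypothetical model output when only the features in S are known."""
    base = {0: 0.30, 1: -0.10, 2: 0.05}
    v = sum(base[j] for j in S)
    if 0 in S and 1 in S:
        v += 0.08  # interaction between features 0 and 1
    return v


phi = shapley_values(toy_value, 3)
```

The interaction term is split equally between features 0 and 1, and the three values sum to the difference between the full-feature output and the empty-set output, which is the additivity property that makes the per-feature ranking in the next step meaningful.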

The error-contribution score approach was designed under the assumption that five of the ten features would be responsible for each misclassification and that the same number would be identified as error-contributing across the dataset.

The SHAP values were computed only for the misclassified test samples in each model’s probability predictions to measure feature contributions. In this binary classification problem, liquefaction is the positive class (y = 1). Positive SHAP values increase P(y = 1), whereas negative SHAP values push the prediction toward y = 0.

For each model, only the non-outlier FP and FN samples were considered. In FP cases, the five features with the highest SHAP contributions toward the positive class were selected; whereas, in FN cases, the five features with the lowest SHAP values (i.e., those most reducing the probability of the positive class) were selected. Features were assigned scores from 5 to 1 based on their importance ranking, and color coding was used to indicate the relative error contributions of features as part of the feature-based error-contribution score approach (Table 8 and Table 9). Model-based contribution scores were calculated using Equations (8) and (9), and the indicator function was defined by Equation (10).

\mathrm{FP\_raw\_score}_{model}(f) = \sum_{i = 1}^{N_{FP}^{(m)}} \sum_{r = 1}^{5} P(r) \cdot I\left( f = \mathrm{feature}_{(r)}^{(i)} \right)

(8)

\mathrm{FN\_raw\_score}_{model}(f) = \sum_{i = 1}^{N_{FN}^{(m)}} \sum_{r = 1}^{5} P(r) \cdot I\left( f = \mathrm{feature}_{(r)}^{(i)} \right)

(9)

I\left( f = \mathrm{feature}_{(r)}^{(i)} \right) = \begin{cases} 1, & \text{if } f \text{ is the } r\text{-th ranked feature in the } i\text{-th misclassified sample;} \\ 0, & \text{otherwise} \end{cases}

(10)

For model m, N_FP^(m) and N_FN^(m) denote the numbers of FP and FN samples, respectively, P(r) = (5, 4, 3, 2, 1) denotes the importance scores, I(⋅) is the indicator function, and f denotes the evaluated feature. The indicator function I(⋅) checks whether f appears among the top five features in the corresponding misclassified sample and assigns the score only under this condition (Equation (10)). The raw contribution scores obtained for each model were then normalized by dividing them by the total number of FP or FN samples in that model (Equations (11) and (12)).

\mathrm{FP\_score}_{norm, m}(f) = \frac{\mathrm{FP\_raw\_score}_{model}(f)}{N_{FP}^{(m)}}

(11)

\mathrm{FN\_score}_{norm, m}(f) = \frac{\mathrm{FN\_raw\_score}_{model}(f)}{N_{FN}^{(m)}}

(12)

In the final stage, the normalized contribution scores of all models were aggregated across features to derive global FP and FN scores (Equations (13) and (14)).

\mathrm{FP\_score}_{global}(f) = \sum_{m} \mathrm{FP\_score}_{norm, m}(f)

(13)

\mathrm{FN\_score}_{global}(f) = \sum_{m} \mathrm{FN\_score}_{norm, m}(f)

(14)

Thus, the total contribution of each feature to FP and FN errors was calculated using Equation (15).

\mathrm{Error\_contribution\_score}(f) = \mathrm{FP\_score}_{global}(f) + \mathrm{FN\_score}_{global}(f)

(15)

The feature-based error-contribution scores provide a comparative evaluation of feature roles in error formation, and the results are presented graphically in Figure 2.
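The scoring chain in Equations (8)–(15) can be sketched in a few lines. The example below is a hedged illustration rather than the authors' code: the model names, the six feature names, and every SHAP value are made up, and each misclassified sample is represented as a hypothetical dictionary mapping feature names to SHAP values.

```python
from collections import defaultdict

RANK_SCORES = (5, 4, 3, 2, 1)  # P(r) for ranks r = 1..5


def raw_scores(samples, error_type):
    """Equations (8)-(9): rank-based raw scores for one model and one error type."""
    scores = defaultdict(float)
    for shap_row in samples:
        # FP: largest SHAP values first; FN: most negative SHAP values first
        ranked = sorted(shap_row, key=shap_row.get, reverse=(error_type == "FP"))
        for points, feature in zip(RANK_SCORES, ranked):  # only the top five score
            scores[feature] += points
    return scores


def error_contribution(errors_by_model):
    """Equations (11)-(15): normalize per model, then sum over models and types."""
    total = defaultdict(float)
    for per_type in errors_by_model.values():
        for error_type in ("FP", "FN"):
            samples = per_type.get(error_type, [])
            if not samples:
                continue
            for feature, raw in raw_scores(samples, error_type).items():
                total[feature] += raw / len(samples)
    return dict(total)


# Hypothetical SHAP rows for misclassified samples
fp1 = {"q_c": 0.30, "f_s": 0.10, "FC": 0.20, "M": 0.05, "a_max": 0.15, "depth": -0.02}
fp2 = {"q_c": 0.10, "f_s": 0.30, "FC": 0.20, "M": 0.05, "a_max": 0.15, "depth": -0.02}
fn1 = {"q_c": -0.25, "f_s": -0.05, "FC": -0.10, "M": 0.02, "a_max": -0.15, "depth": 0.01}

scores = error_contribution({
    "model_A": {"FP": [fp1], "FN": [fn1]},
    "model_B": {"FP": [fp1, fp2], "FN": []},
})
```

Dividing by the number of FP or FN samples before summing keeps a model with many errors from dominating the global score, which is the purpose of the normalization step.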

According to the ranking based on the error-contribution scores, the variable I_c was excluded from the set of error-contributing features because it exhibited a strong positive correlation with FC and a strong negative correlation with q_c. Consequently, the error-contributing features were identified as q_c, f_s, FC, M, and a_max (Figure 3).

In geotechnical terms, the error-contributing features highlighted by the SHAP-based error-contribution score analysis (q_c, f_s, FC, M, and a_max) are consistent with the CSR–CRR basis of liquefaction triggering. In line with this interpretation, the CPT soil behavior type index, I_c, computed from normalized cone resistance and the friction ratio, provides a compact indicator of soil behavior type and is frequently used as a proxy for soil classification and fines-related effects; in CPT-based liquefaction practice, I_c (or I_c-based estimates of soil class/FC) is widely used for liquefaction susceptibility screening and for informing fines-related adjustments in triggering correlations. Since I_c is deterministically derived from q_c and f_s, strong correlations with these and other related variables are expected, and the correlation matrix (Figure 3) confirms that I_c is strongly correlated (|r| > 0.70) with multiple input variables. Therefore, I_c was retained for model training; however, as a heuristic redundancy control, it was excluded from receiving an independent weight in the error-contribution weighting/noise-injection step. Although I_c is a fundamental indicator in geotechnical practice, it is mathematically derived from q_c and f_s. Including I_c alongside its constituent variables creates severe multicollinearity, which can distort the calculation of the SHAP values and obscure the true source of model errors. A more formal, correlation-aware attribution/weighting strategy for derived features will be investigated in future work. This treatment aligns with correlation-based redundancy control/feature elimination practices, particularly in small data settings [52], and is consistent with geotechnical ML studies that employ feature-selection strategies to identify key causal factors and support interpretability [53].

2.5. Error-Focused and XAI-Supported Weighting Strategy

Targeted data augmentation strengthens model robustness by mitigating spurious correlations, enabling models to generalize more reliably across biased datasets [54]. Building on the SHAP-based error-contribution analysis, a feature weighting strategy was developed (Figure 4). For methodological consistency, a single weighting factor (w = 1.5) was assigned to the five error-contributing features (q_c, f_s, FC, M, and a_max).

The weighting strategy was applied in two stages: (i) weighted Fuzzy C-Means clustering to identify difficult-to-predict regions, and (ii) Gaussian noise injection with 1.5× amplitude into the error-contributing features to augment the training data. By injecting higher-amplitude noise into the error-contributing features within difficult-to-predict regions, the framework emphasized these critical features and enhanced their influence, thereby improving test performance (Figure 5).

2.6. Identification of Difficult-to-Predict Regions Through Error-Focused Clustering

Weighted Fuzzy C-Means (and its variants) improve clustering quality by learning feature weights that emphasize informative features [55,56,57]. In this study, weighted Fuzzy C-Means clustering was used to identify difficult-to-predict regions, aiming to augment training samples located in clusters with concentrated test errors. In the Fuzzy C-Means (FuzzyCM) algorithm, the Euclidean distance was weighted by assigning w = 1.5 to the error-contributing features identified through XAI. This increased the influence of these five features in the clustering process.

FuzzyCM is a fuzzy clustering method in which each data point can belong to multiple clusters. Unlike K-means, memberships are fractional values in [0,1] that sum to 1 for each point. The algorithm iteratively updates the membership matrix U = [u_ik] and the cluster centers (c₁, c₂, …, c_C) for a given number of clusters C, aiming to minimize the objective function (Equation (16)).

J = \sum_{i = 1}^{N} \sum_{k = 1}^{C} u_{ik}^{m} \left\| x_{i} - c_{k} \right\|^{2}

(16)

N denotes the number of data points; m > 1 is the fuzzification coefficient; u_ik represents the membership degree of the i-th data point in the k-th cluster; and c_k denotes the center of the k-th cluster. The objective function is the weighted sum of the squared Euclidean distances between points and cluster centers, with the weights given by the m-th power of the memberships. Each iteration consists of two steps: (1) Cluster centers are updated as membership-weighted means, with memberships raised to the power of m, so that points with higher degrees exert a stronger influence and pull the centers toward them (Equation (17)). (2) Membership degrees are updated once the centers are determined, with values assigned inversely to the distances, so that points closer to a center obtain higher memberships (Equation (18)).

c_{k} = \frac{\sum_{i = 1}^{N} u_{ik}^{m} x_{i}}{\sum_{i = 1}^{N} u_{ik}^{m}}

(17)

u_{ik} = \left( \sum_{h = 1}^{C} \left( \frac{\left\| x_{i} - c_{k} \right\|}{\left\| x_{i} - c_{h} \right\|} \right)^{\frac{2}{m - 1}} \right)^{-1}

(18)

The algorithm terminates when the changes in centers/memberships fall below a predefined threshold or when the maximum number of iterations is reached.

Weighted FuzzyCM modifies the distance metric by assigning weights w_j to features, while the standard method treats all features equally. In a p-dimensional space, the distance between x_i and c_k is defined as a weighted Euclidean distance (Equation (19)) as follows:

\left\| x_{i} - c_{k} \right\|_{w} = \sqrt{\sum_{j = 1}^{p} w_{j} \cdot (x_{ij} - c_{kj})^{2}}

(19)

Here, x_ij denotes the j-th feature of x_i, and c_kj denotes its counterpart in the k-th cluster center. A weight of w_j = 1.5 was applied to the five error-contributing features, whereas w_j = 1.0 was used for the others. These weights were applied at each iteration of the weighted Euclidean distance calculation.
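The update rules in Equations (17)–(19) can be combined into a compact reference loop. The following pure-Python sketch is illustrative only: it assumes m = 2, a deterministic initialization from two chosen points, and a toy two-feature dataset in which the first feature plays the role of an error-contributing feature (w = 1.5). Feature scaling, which the text does not describe, is left out.

```python
import math


def weighted_distance(x, c, w):
    """Equation (19): weighted Euclidean distance."""
    return math.sqrt(sum(wj * (xj - cj) ** 2 for wj, xj, cj in zip(w, x, c)))


def update_memberships(X, centers, w, m=2.0):
    """Equation (18), evaluated with the weighted distance."""
    U = []
    for x in X:
        d = [weighted_distance(x, c, w) for c in centers]
        if any(dk == 0.0 for dk in d):
            # Point coincides with a center: assign full membership there
            hits = [1.0 if dk == 0.0 else 0.0 for dk in d]
            U.append([h / sum(hits) for h in hits])
            continue
        U.append([1.0 / sum((dk / dh) ** (2.0 / (m - 1.0)) for dh in d)
                  for dk in d])
    return U


def update_centers(X, U, m=2.0):
    """Equation (17): membership-weighted means."""
    centers = []
    for k in range(len(U[0])):
        um = [row[k] ** m for row in U]
        centers.append([sum(u * x[j] for u, x in zip(um, X)) / sum(um)
                        for j in range(len(X[0]))])
    return centers


def weighted_fcm(X, init_centers, w, m=2.0, tol=1e-6, max_iter=200):
    centers = [list(c) for c in init_centers]
    for _ in range(max_iter):
        U = update_memberships(X, centers, w, m)
        new_centers = update_centers(X, U, m)
        shift = max(weighted_distance(a, b, w)
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop when the centers no longer move
            break
    return centers, update_memberships(X, centers, w, m)


# Toy data: two well-separated groups; feature 0 gets w = 1.5, feature 1 gets 1.0
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
centers, U = weighted_fcm(X, init_centers=[X[0], X[3]], w=[1.5, 1.0])
```

Because the weight multiplies the squared difference, w = 1.5 stretches the weighted axis by a factor of √1.5 in distance terms, so differences along error-contributing features separate clusters more strongly.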

Clustering was performed on the combined dataset (Data1 + Errors, 240 + 11 samples). The resulting five clusters were visualized in a 2D t-SNE projection (with centers shown as black stars (★)), and the FP/FN counts were analyzed within each cluster. Training samples identified as outliers by the Isolation Forest were excluded from the analysis.

In the five-cluster solution, 6 of the 11 misclassified test predictions (FP + FN), i.e., more than half, were concentrated in a single cluster (Cluster 3), whereas three were grouped in Cluster 4 (Figure 6). Thus, the errors were predominantly concentrated in two clusters (nine errors in total). Based on this finding, Clusters 3 and 4 were defined as difficult-to-predict regions. Training samples within these two clusters were selected for augmentation through noise injection.

The number of clusters was optimized within the range C = 2–10 using the Silhouette Score. Using FuzzyCM, the average silhouette values were computed (a(i): intra-cluster distance, b(i): nearest-cluster distance).

s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}

(20)

The highest average silhouette value was obtained at C = 5, which was selected as the optimal number of clusters (Figure 7).
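The silhouette criterion in Equation (20) can be sketched as follows. The example assumes hard labels (for fuzzy memberships, one option is to take the cluster with the highest membership, which is an assumption here), plain Euclidean distances, and at least two clusters. The data and labelings are hypothetical.

```python
import math


def mean_silhouette(X, labels):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points.

    Requires at least two distinct cluster labels.
    """
    scores = []
    for i, xi in enumerate(X):
        same = [math.dist(xi, xj) for j, xj in enumerate(X)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(                   # mean distance to the nearest other cluster
            sum(math.dist(xi, xj) for j, xj in enumerate(X) if labels[j] == k)
            / labels.count(k)
            for k in set(labels) if k != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9], [2.9, 3.2]]
good = mean_silhouette(X, [0, 0, 0, 1, 1, 1])   # labels follow the two groups
mixed = mean_silhouette(X, [0, 1, 0, 1, 0, 1])  # labels cut across the groups
```

Repeating this calculation for each candidate C and keeping the C with the highest average is the selection rule described above.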

2.7. Augmented Training Set Through Error-Focused Data Augmentation

Careful design of data augmentation is critical for preventing overfitting. Compared to classical methods, automated approaches, such as AutoAugment, select the most suitable augmentation strategies, thereby improving model performance, robustness, and reliability under real-world variability [58,59].

In this study, Data2 (339 samples) was created by augmenting 99 training samples from the difficult-to-predict regions through the injection of weighted Gaussian random noise (w = 1.5 applied to error-contributing features). For each original sample, one additional sample was generated, preserving the liquefaction class label.

More specifically, for each numerical feature value x,

ε \sim N (0, σ^{2})

was sampled, and

x' = x + ε

was applied, allowing slight variation around the original values and improving robustness to measurement uncertainties. The noise standard deviation is given by Equation (21) for the error-contributing features and by Equation (22) for the others.

σ_{noise} = w \cdot noise\_level \cdot σ_{feature} = 1.5 \cdot 0.05 \cdot σ_{feature} = 0.075 \cdot σ_{feature}

(21)

σ_{noise} = noise\_level \cdot σ_{feature} = 0.05 \cdot σ_{feature}

(22)

Here, σ_feature denotes the standard deviation of the feature in the original training set, and noise_level = 0.05. Accordingly, the noise amplitude was set to approximately 7.5% of the natural variation for the critical features (5% for the others), corresponding to an amplitude 1.5 times higher. The class distribution after augmentation is shown in Table 10.
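The noise-injection step of Equations (21) and (22) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the use of the sample standard deviation, the fixed random seed, and the two-feature toy rows are all assumptions introduced here.

```python
import random
import statistics

NOISE_LEVEL = 0.05  # noise_level from the text
WEIGHT = 1.5        # w, applied to error-contributing features only


def feature_std(rows):
    """Per-feature standard deviation of the original training rows (sigma_feature)."""
    n_features = len(rows[0])
    return [statistics.stdev([r[j] for r in rows]) for j in range(n_features)]


def augment(samples, sigma_feature, critical_idx, seed=0):
    """Return one noisy copy of each (features, label) pair.

    critical_idx holds the column indices treated as error-contributing.
    """
    rng = random.Random(seed)
    augmented = []
    for x, label in samples:
        x_new = []
        for j, value in enumerate(x):
            scale = WEIGHT if j in critical_idx else 1.0
            sigma_noise = scale * NOISE_LEVEL * sigma_feature[j]  # Eq. (21) or (22)
            x_new.append(value + rng.gauss(0.0, sigma_noise))     # x' = x + eps
        augmented.append((x_new, label))  # the class label is preserved
    return augmented


# Hypothetical toy rows: column 0 plays the role of an error-contributing feature.
train = [([1.0, 10.0], 1), ([2.0, 20.0], 0), ([3.0, 30.0], 1), ([4.0, 40.0], 1)]
sigma = feature_std([x for x, _ in train])
new_samples = augment(train, sigma, critical_idx={0})
```

In the study, only the 99 training samples located in the difficult-to-predict clusters would be passed to such a routine, while σ_feature is taken from the original training set.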

The consistency of the augmented dataset with the original distribution was evaluated using the two-sample non-parametric Kolmogorov–Smirnov (KS) test, which confirmed that the augmentation preserved the distribution. The KS test assesses whether two independent samples originate from the same distribution, defined as

D_{n,m} = \sup_{x} |F_{1,n}(x) - F_{2,m}(x)|

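where F_{1,n} and F_{2,m} are the empirical distribution functions of the two samples (of sizes n and m); p ≥ 0.05 indicates that H₀ cannot be rejected, and p < 0.05 indicates a significant difference.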

No significant differences were detected for any of the variables: D ∈ [0.027, 0.105], p ∈ [0.056, 0.998]. ECDF comparisons of the five error-contributing features, where higher noise was applied, also confirmed this result (Figure 8).
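For reference, the KS statistic D defined above can be computed directly from two samples, as in the following sketch. The helper name and the toy samples are assumptions made for illustration. The p-value is not computed here; in practice it would typically come from a library routine such as `scipy.stats.ks_2samp`.

```python
def ks_statistic(sample1, sample2):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n, m = len(s1), len(s2)
    d = 0.0
    i = j = 0
    # Both ECDFs are step functions, so the supremum is attained at a data point.
    for x in sorted(set(s1) | set(s2)):
        while i < n and s1[i] <= x:
            i += 1
        while j < m and s2[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical toy samples.
d_same = ks_statistic([1, 2, 3], [1, 2, 3])
d_shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

A small D, such as the values of 0.027–0.105 reported above, means that the ECDF of a feature in the augmented set stays close to its ECDF in the original training set.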

3. New Experiments and Findings

In this section, the proposed framework was evaluated on four ensemble models (GBM, CatB, RF, and XGB) across three configurations using the fixed test set. The first configuration (Data1–Tune1) corresponds to baseline models trained on the original training data (Data1, 240 samples): hyperparameters were tuned on Data1 via grid search with five-fold cross-validation; with the final hyperparameters, models were trained on the 75% training split and evaluated on the fixed 25% test split. The second configuration (Data2–Tune1) employed the augmented training data (Data2; Data1 plus 99 new samples, total 339) while keeping the Tune1 hyperparameters fixed, to isolate the effect of augmentation at the data level. In the third configuration (Data2–Tune2), hyperparameters were re-optimized on Data2 using the same five-fold CV grid-search procedure.

The proposed error-focused augmentation uses misclassified predictions (FP/FN) to identify difficult-to-predict regions via the weighted Fuzzy C-Means clustering of the combined Data1 + Errors samples; training samples in these regions are then augmented through weighted Gaussian random-noise injection to form Data2. Test performance is reported together with cross-validation mean ± std F₁ to support stability-aware comparisons across Data1–Tune1, Data2–Tune1, and Data2–Tune2.

This comparison investigated the impact of error-focused data augmentation on classification performance. The Tune2 hyperparameters were determined as follows: CatB (120 iterations, max depth = 2, learning rate = 0.30); GBM (60 trees, max depth = 4, learning rate = 0.30); XGB (80 trees, max depth = 3, learning rate = 0.30); and RF (40 trees, max depth = 5, max features = \sqrt{\text{number of features}}).

According to Figure 9 and Tables 4 and 11, test F₁ increased across all models with Data2 compared to Data1–Tune1, most notably in RF (0.906 → 0.925), with five-fold CV metrics supporting these gains. With Data2, ROC AUC improved in GBM and XGB, while precision increased in CatB, recall in GBM, and both precision and recall in RF and XGB relative to Data1–Tune1.
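As a quick arithmetic check, the baseline test F₁ scores can be recomputed from the precision and recall values listed in Table 4, since F₁ is their harmonic mean. The sketch below is illustrative only. Because the tabulated precision and recall are themselves rounded to three decimals, agreement is expected only to within about ±0.001.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Test-set (precision, recall) of the Data1–Tune1 baselines, copied from Table 4.
baseline = {
    "GBM": (0.961, 0.891),
    "CatB": (0.926, 0.909),
    "XGB": (0.925, 0.891),
    "RF": (0.941, 0.873),
}
# Test F1 scores reported in Table 4 for the same models.
reported_f1 = {"GBM": 0.925, "CatB": 0.917, "XGB": 0.907, "RF": 0.906}

recomputed = {name: f1_score(p, r) for name, (p, r) in baseline.items()}
max_gap = max(abs(recomputed[name] - reported_f1[name]) for name in baseline)

# RF improvement quoted in the text: 0.906 -> 0.925.
rf_gain = round(0.925 - 0.906, 3)
```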

4. Discussion

Within the proposed error-focused data augmentation framework, augmentation is intentionally applied only within difficult-to-predict regions, defined as clusters in which misclassified test instances (FP/FN) are concentrated. This targeted strategy is motivated by data-limited tabular settings, where augmenting the entire feature space can increase the computational cost and generate redundant perturbations in already well-separated regions. By performing noise injection only within these error-prone clusters, the framework aims to reduce errors near the decision boundary while leaving well-separated regions (i.e., clusters outside the difficult-to-predict regions, where FP/FN are not concentrated) largely unchanged.

In this context, the weighting factor was fixed at w = 1.5 as a pragmatic default to demonstrate the proposed SHAP-based, error-focused augmentation framework without introducing an additional tuning loop for w, similar in spirit to fixing the perturbation budget when specifying the threat model in early adversarial training studies [60]. Importantly, the augmented training data remained statistically consistent with the original training data, as supported by the two-sample Kolmogorov–Smirnov tests across the variables (D = 0.027–0.105; p = 0.056–0.998) and the ECDF comparisons, with post-augmentation class proportions remaining comparable to the original labels. A systematic calibration and/or optimization of w is a natural extension of this work and is reserved for future studies.

Error explanations are embedded into the augmentation pipeline in three stages. First, SHAP-based error-contribution scores are computed to identify and prioritize the features most responsible for false positives and false negatives. Second, weighted clustering is employed to localize difficult-to-predict regions in the feature space by amplifying the influence of the prioritized features during region identification. Third, controlled Gaussian noise is injected using a higher perturbation amplitude for the prioritized features to generate targeted new training samples that remain close to the original data manifold while improving robustness in those regions.

Across the examined ensemble models, performance gains were consistently observed after augmentation, indicating that the framework operates at the data level rather than relying on a specific model architecture. The most notable improvement was achieved by RF, for which the test F₁ score increased by 0.019 (0.906 → 0.925). The use of observed errors to guide subsequent augmentation can be interpreted as an active-learning-style feedback loop, in which the learner is exposed to informative, hard-to-classify neighborhoods of the feature space.

The Kolmogorov–Smirnov checks reported above, together with the ECDF comparisons, suggest that the augmentation procedure did not introduce a noticeable shift in the underlying training distribution, and the post-augmentation class distribution remained comparable to the original label proportions.

Despite these encouraging results, the dataset size is limited, and further validation on liquefaction databases spanning different scales and geotechnical conditions is required to demonstrate broader generalizability. Future extensions may also consider adaptive choices of the weighting factor (w) and the number of generated samples to better match dataset complexity, feature dimensionality, and potential class imbalance. Finally, the SHAP-based error-contribution scoring can be further refined for single-model settings, alternative ensemble structures, or different modeling pipelines (noting that the present study normalizes the scores on a per-model basis using the combined FP/FN errors across the four selected models).

From a geotechnical perspective, a closer examination of the misclassified samples (Table 6) reveals that the errors are not randomly distributed but are heavily concentrated in transitional soil types. Most of the false positive and false negative instances exhibit soil behavior type index (I_c) values in the range of 1.80–2.20 or possess high fines content (FC). These ranges correspond to the transition zone between sand and silty sand/sandy silt mixtures, where standard CPT-based empirical correlations often face the greatest uncertainty regarding drainage conditions and liquefaction susceptibility. The clustering of errors in these zones suggests that the base models struggled to delineate the boundary in these physically ambiguous transition zones, and the proposed augmentation framework successfully sharpened the decision boundary specifically in this geotechnical ‘grey area’.
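The concentration of errors in the transition zone can be checked against the I_c and FC columns of Table 6. The sketch below copies those two columns and counts the samples falling in the I_c band of 1.80–2.20. The variable names are illustrative, and no threshold for "high" fines content is assumed, so the samples outside the band are simply listed with their FC values.

```python
# (index, error type, I_c, FC in %) copied from Table 6.
errors = [
    (19, "FP", 2.16, 35.89), (32, "FP", 2.14, 30.00), (112, "FP", 1.94, 18.13),
    (243, "FP", 2.84, 90.01), (310, "FP", 1.82, 2.00), (93, "FN", 1.41, 0.00),
    (162, "FN", 1.85, 11.12), (182, "FN", 1.84, 10.48), (203, "FN", 1.79, 6.37),
    (247, "FN", 1.87, 12.16), (253, "FN", 1.69, 0.00), (254, "FN", 1.70, 0.00),
    (309, "FN", 1.99, 5.00),
]

IC_LOW, IC_HIGH = 1.80, 2.20  # transition band quoted in the text

in_band = [idx for idx, _, ic, _ in errors if IC_LOW <= ic <= IC_HIGH]
outside = [(idx, ic, fc) for idx, _, ic, fc in errors if not IC_LOW <= ic <= IC_HIGH]

print(f"{len(in_band)} of {len(errors)} misclassified samples lie in the I_c band")
for idx, ic, fc in outside:
    print(f"outside band: index {idx}, I_c = {ic:.2f}, FC = {fc:.2f}%")
```

Eight of the thirteen samples fall inside the band. Of the remaining five, index 243 has the highest fines content in the table (FC = 90.01%), and index 203 (I_c = 1.79) lies just below the lower limit.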

A critical design choice in this study was the exclusion of derived engineering indices, specifically the soil behavior type index (I_c), from the XAI-based weighting and augmentation process, while retaining I_c in the baseline feature set for model training. Factor of safety (FS) values were excluded to prevent the machine learning models from merely replicating existing semi-empirical simplified procedures.

5. Conclusions

The method applies SHAP-based error weighting (w = 1.5) to emphasize the five most error-contributing features and to generate targeted training samples via (i) weighted Fuzzy C-Means clustering to delineate difficult-to-predict regions and (ii) controlled 1.5× Gaussian-noise injection into those features in those regions.
Future research should focus on validating this framework on larger, multi-site databases and exploring adaptive weighting mechanisms (w) to further tailor the augmentation to varying site conditions.
When evaluated on the fixed hold-out test set, the augmented data configurations (Data2–Tune1 and Data2–Tune2) improved the predictive performance across all examined models (GBM, CatB, RF, and XGB) relative to the baseline Data1–Tune1, with the largest gain observed for the RF model, where the test F₁ score increased by +0.019 (0.906 → 0.925).
The analysis of misclassifications revealed that model errors were not random but heavily concentrated in geotechnical “transition zones” (typically silty sands and sandy silts with I_c values between 1.80 and 2.20).
The approach operates at the data level and is model-agnostic; by using error feedback (FP/FN) to guide subsequent augmentation, it functions similarly to an active-learning-style feedback loop while maintaining statistical consistency, as supported by the KS test results.
Limitations include the relatively small dataset size and the need for further validation on larger and more diverse liquefaction datasets under different geotechnical conditions.
Owing to its model-agnostic, data-level design, the proposed framework is applicable beyond liquefaction to broader geotechnical and general tabular classification problems.

Author Contributions

Conceptualization, E.N.; methodology, E.N. and A.T.T.; software, A.T.T. and B.Y.; validation, E.N.; formal analysis, A.T.T.; investigation E.N. and A.T.T.; resources, E.N. and B.Y.; data curation, A.T.T.; writing—original draft preparation, E.N., A.T.T. and B.Y.; writing—review and editing, E.N., A.T.T. and B.Y.; visualization, A.T.T. and B.Y.; supervision, E.N. and B.Y.; project administration, E.N. and B.Y.; funding acquisition, A.T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the TÜBİTAK BİDEB 2211-A scholarship program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and codes presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AdaB	AdaBoost
a_max	Peak Ground Acceleration (PGA)
ANN	Artificial Neural Network
BRF	Balanced Random Forest
CatB	CatBoost
CPT	Cone Penetration Test
CRR	Cyclic Resistance Ratio
CSR	Cyclic Stress Ratio
CV	Cross-Validation
ECDF	Empirical Cumulative Distribution Function
EFB	Exclusive Feature Bundling
ExT	Extra Trees
FC	Fines Content
FuzzyCM	Fuzzy C-Means
f_s	Cone Sleeve Resistance
FS	Factor of Safety (against liquefaction)
GBM	Gradient Boosting Machine
GOSS	Gradient-based One-Side Sampling
H₀	Null Hypothesis (in statistical tests)
I_c	Soil Behavior Type Index
LGBM	Light Gradient Boosting Machine
LPI	Liquefaction Potential Index
LSN	Liquefaction Severity Number
M	Earthquake Magnitude
ML	Machine Learning
NGB	Natural Gradient Boosting
PCA	Principal Component Analysis
q_c	Cone Tip Resistance
q_c1Ncs	Normalized Clean Sand Cone Tip Resistance
r_d	Stress Reduction Factor
RF	Random Forest
ROC AUC	Receiver Operating Characteristic—Area Under Curve
RoF	Rotation Forest
SMMO	Selective Minority Oversampling
SMOTE	Synthetic Minority Oversampling Technique
SVM	Support Vector Machines
SHAP	SHapley Additive exPlanations
t-SNE	T-distributed Stochastic Neighbor Embedding
w	Weighting Factor
WGAN	Wasserstein Generative Adversarial Network
XAI	Explainable Artificial Intelligence
XGB	Extreme Gradient Boosting (XGBoost)
σ_v	Total Vertical Stress
σ’_v	Effective Vertical Stress

References

1. Toprak, S.; Zulfikar, A.C.; Mutlu, A.; Tugsal, U.M.; Nacaroglu, E.; Karabulut, S.; Karimzadeh, S. The aftermath of 2023 Kahramanmaras earthquakes: Evaluation of strong motion data, geotechnical, building, and infrastructure issues. Nat. Hazards 2025, 121, 2155–2192.
2. Ozener, P.; Monkul, M.M.; Bayat, E.E.; Ari, A.; Cetin, K.O. Liquefaction and performance of foundation systems in Iskenderun during 2023 Kahramanmaras-Turkiye earthquake sequence. Soil Dyn. Earthq. Eng. 2024, 178, 108433.
3. Seed, H.B.; Idriss, I.M. Analysis of soil liquefaction: Niigata earthquake. J. Soil Mech. Found. Div. 1967, 93, 83–108.
4. Dobry, R. Prediction of Pore Water Pressure Buildup and Liquefaction of Sands During Earthquakes by the Cyclic Strain Method; US Department of Commerce: Washington, DC, USA, 1982.
5. Berrill, J.B.; Davis, R.O. Energy dissipation and seismic liquefaction of sands: Revised model. Soils Found. 1985, 25, 106–118.
6. Seed, R.B.; Cetin, K.O.; Moss, R.E.; Kammerer, A.M.; Wu, J.; Pestana, J.M.; Riemer, M.F.; Sancio, R.B.; Bray, R.R.; Kayen, R.R.; et al. Recent advances in soil liquefaction engineering: A unified and consistent framework. In Proceedings of the 26th Annual ASCE Los Angeles Geotechnical Spring Seminar, Long Beach, CA, USA, 30 April 2003.
7. Youd, T.L.; Idriss, I.M.; Andrus, R.D.; Arango, I.; Castro, G.; Christian, J.T.; Dobry, R.; Finn, L.; Harder, L.F., Jr.; Koester, J.P.; et al. Liquefaction Resistance of Soils: Summary Report from the 1996 NCEER and 1998 NCEER/NSF Workshops on Evaluation of Liquefaction Resistance of Soils. J. Geotech. Geoenviron. Eng. 2001, 127, 817–833.
8. Bray, J.D.; Sancio, R.B.; Riemer, M.F.; Durgunoglu, T. Liquefaction susceptibility of fine-grained soils. In Proceedings of the 11th International Conference on Soil Dynamics and Earthquake Engineering and 3rd International Conference on Earthquake Geotechnical Engineering, Berkeley, CA, USA, 7–9 January 2004; pp. 655–662.
9. Iwasaki, T.; Arakawa, T.; Tokida, K.I. Simplified procedures for assessing soil liquefaction during earthquakes. Int. J. Soil Dyn. Earthq. Eng. 1984, 3, 49–58.
10. Yasuda, S.; Nagase, H.; Kiku, H.; Uchida, Y. The Mechanism and A Simplified Procedure for the Analysis of Permanent Ground Displacement due to Liquefaction. Soils Found. 1992, 32, 149–160.
11. Jas, K.; Dodagoudar, G.R. Liquefaction Potential Assessment of Soils Using Machine Learning Techniques: A State-of-the-Art Review from 1994–2021. Int. J. Geomech. 2023, 23, 03123002.
12. Jas, K.; Dodagoudar, G.R. Explainable machine learning model for liquefaction potential assessment of soils using XGBoost-SHAP. Soil Dyn. Earthq. Eng. 2023, 165, 107662.
13. Iwasaki, T.; Tatsuoka, F.; Tokida, K.; Yasuda, S. A practical method for assessing soil liquefaction potential based on case studies at various sites in Japan. In Proceedings of the 2nd International Conference on Microzonation for Safer Construction—Research and Application, San Francisco, CA, USA, 26 November–1 December 1978; Volume 2, pp. 885–896.
14. Toprak, S.; Holzer, T.L. Liquefaction potential index: Field assessment. J. Geotech. Geoenviron. Eng. 2003, 129, 315–322.
15. van Ballegooy, S.; Wentz, F.; Boulanger, R.W. Evaluation of CPT-based liquefaction procedures at regional scale. Soil Dyn. Earthq. Eng. 2015, 79, 315–334.
16. Toprak, S.; Nacaroglu, E.; van Ballegooy, S.; Koc, A.C.; Jacka, M.; Manav, Y.; O’Rourke, T.D. Segmented pipeline damage predictions using liquefaction vulnerability parameters. Soil Dyn. Earthq. Eng. 2019, 125, 105758.
17. Tonkin & Taylor Ltd. Canterbury Earthquake Sequence: Increased Liquefaction Vulnerability Assessment Methodology; Report No. 52010.140.v1.0; Tonkin & Taylor Ltd.: Auckland, New Zealand, 2015.
18. Tung, A.T.; Wang, Y.Y.; Wong, F.S. Assessment of liquefaction potential using neural networks. Soil Dyn. Earthq. Eng. 1993, 12, 325–335.
19. Goh, A.T. Seismic liquefaction potential assessed by neural networks. J. Geotech. Geoenviron. Eng. 1994, 120, 1467–1480.
20. Alobaidi, M.H.; Meguid, M.A.; Chebana, F. Predicting seismic-induced liquefaction through ensemble learning frameworks. Sci. Rep. 2019, 9, 11786.
21. Chen, M.; Kang, X.; Ma, X. Deep Learning-Based Enhancement of Small Sample Liquefaction Data. Int. J. Geomech. 2023, 23, 04023176.
22. Preethaa, S.; Natarajan, Y.; Rathinakumar, A.P.; Lee, D.E.; Choi, Y.; Park, Y.J.; Yi, C.Y. A stacked generalization model to enhance prediction of earthquake-induced soil liquefaction. Sensors 2022, 22, 7292.
23. Minarelli, L.; Amoroso, S.; Civico, R.; De Martini, P.M.; Lugli, S.; Martelli, L.; Molisso, F.; Rollins, K.M.; Salocchi, A.; Stefani, M.; et al. Liquefied sites of the 2012 Emilia earthquake: A comprehensive database of the geological and geotechnical features (Quaternary alluvial Po plain, Italy). Bull. Earthq. Eng. 2022, 20, 3659–3697.
24. Hudson, K.S.; Zimmaro, P.; Ulmer, K.; Brandenberg, S.J.; Stewart, J.P.; Kramer, S.L. Laboratory component of next-generation liquefaction project database. In Proceedings of the 4th International Conference on Performance-Based Design in Earthquake Geotechnical Engineering, Beijing, China, 15–17 July 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 1865–1874.
25. Zhang, X.; He, B.; Sabri, M.M.S.; Al-Bahrani, M.; Ulrikh, D.V. Soil liquefaction prediction based on Bayesian optimization and support vector machines. Sustainability 2022, 14, 11944.
26. De La Calleja, J.; Fuentes, O.; González, J. Selecting minority examples from misclassified data for over-sampling. In Proceedings of the Florida Artificial Intelligence Research Society International Conference (FLAIRS), Coconut Grove, FL, USA, 15–17 May 2008; pp. 276–281.
27. Khmaissia, F.; Frigui, H. Confidence-guided data augmentation for improved semi-supervised training. arXiv 2022, arXiv:2209.08174.
28. Apicella, A.; Giugliano, S.; Isgrò, F.; Prevete, R. SHAP-based explanations to improve classification systems. In Proceedings of the Italian Workshop on Explainable Artificial Intelligence (XAI.it@AIxIA 2023), Rome, Italy, 6–9 November 2023; pp. 76–86.
29. Boulanger, R.W.; Idriss, I.M. CPT and SPT Based Liquefaction Triggering Procedures; Report No. UCD/CGM-14/01; Center for Geotechnical Modeling, Department of Civil and Environmental Engineering, University of California: Davis, CA, USA, 2014.
30. Juang, C.H.; Yuan, H.; Lee, D.H.; Lin, P.S. Simplified Cone Penetration Test-based method for evaluating liquefaction resistance of soils. J. Geotech. Geoenviron. Eng. 2003, 129, 66–80.
31. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
32. Robertson, P.K.; Wride, C.E. Evaluating cyclic liquefaction potential using the cone penetration test. Can. Geotech. J. 1998, 35, 442–459.
33. Robertson, P.K. Interpretation of cone penetration tests—A unified approach. Can. Geotech. J. 2009, 46, 1337–1355.
34. Seed, H.B.; Idriss, I.M. Simplified procedure for evaluating soil liquefaction potential. J. Soil Mech. Found. Div. 1971, 97, 1249–1273.
35. Idriss, I.M. An update to the Seed-Idriss simplified procedure for evaluating liquefaction potential. In Proceedings of the TRB Workshop on New Approaches to Liquefaction, Washington, DC, USA, 10 January 1999; Federal Highway Administration: Washington, DC, USA, 1999. No. FHWARD-99-165.
36. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
37. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
38. Chen, C.; Liaw, A.; Breiman, L. Using Random Forest to Learn Imbalanced Data; Technical Report 666; Department of Statistics, University of California: Berkeley, CA, USA, 2004.
39. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
40. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
41. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
42. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
43. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 3146–3154.
44. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31, pp. 6638–6648.
45. Islam, S.M.M.; Hossain, S.M.M.; Ray, S. DTI-SNNFRA: Drug-target interaction prediction by shared nearest neighbors and fuzzy-rough approximation. PLoS ONE 2021, 16, e0246920.
46. Weng, Y.; Liu, Y.; Chuang, H.-H. Intelligent Assessment of Scientific Creativity by Integrating Data Augmentation and Pseudo-Labeling. Information 2025, 16, 785.
47. Nguyen, H.A.T.; Pham, D.H.; Ahn, Y. Effect of Data Augmentation Using Deep Learning on Predictive Models for Geopolymer Compressive Strength. Appl. Sci. 2024, 14, 3601.
48. Kaneda, Y.; Pei, Y.; Zhao, Q.; Liu, Y. Improving the performance of the decision boundary making algorithm via outlier detection. J. Inf. Process. 2015, 23, 497–504.
49. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
50. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
51. Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–4 October 2018; pp. 80–89.
52. Mikołajczyk-Bareła, A.; Ferlin, M.; Grochowski, M. Targeted data augmentation for improving model robustness. Int. J. Appl. Math. Comput. Sci. 2025, 35, 143–155.
53. Rickert, C.A.; Henkel, M.; Lieleg, O. An efficiency-driven, correlation-based feature elimination strategy for small datasets. APL Mach. Learn. 2023, 1, 016105.
54. Mali, N.; Dutt, V.; Uday, K.V. Determining the Geotechnical Slope Failure Factors via Ensemble and Individual Machine Learning Techniques: A Case Study in Mandi, India. Front. Earth Sci. 2021, 9, 701837.
55. Wang, C.H.; Cheng, C.S.; Lee, T.T. Dynamical optimal training for interval type-2 fuzzy neural network (T2FNN). IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2004, 34, 1462–1477.
56. Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; Xu, B. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv 2016, arXiv:1611.06639.
57. Lu, Q.; Shi, P.; Lam, H.K.; Zhao, Y. Interval type-2 fuzzy model predictive control of nonlinear networked control systems. IEEE Trans. Fuzzy Syst. 2015, 23, 2317–2328.
58. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 113–123.
59. Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6.
60. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083.

Figure 1. Data augmentation framework for improving model performance.

Figure 2. Error-contribution scores for the ten features.

Figure 3. Correlation matrix of the liquefaction dataset.

Figure 4. General workflow of the error-focused and XAI-supported weighting strategy.

Figure 5. Detailed workflow of applying the data augmentation framework to soil liquefaction classification.

Figure 6. Two-dimensional view of Weighted FuzzyCM (feature space reduced to two components). Five clusters were identified (centers shown as black ★, clusters labeled 1–5). Misclassified samples are shown in red (FP) and green (FN).

Figure 7. Selection of the number of clusters based on Silhouette Score optimization.

Figure 8. ECDF plots of the five error-contributing features for the original and the augmented training data. (a) q_c, (b) f_s, (c) FC, (d) M, (e) a_max.

Figure 9. Comparison of the test results ((a) F₁ score and (b) ROC AUC values) for the four models across three experimental setups: before augmentation (Data1–Tune1) and after augmentation (Data2–Tune1, Data2–Tune2).

Table 1. Number of samples in liquefaction classes of the dataset.

Dataset	Liquefied (Yes)	Non-Liquefied (No)	Liquefied (%)	Non-Liquefied (%)
Full Data	229	92	71.34	28.66
Training	174	66	72.5	27.5
Test	55	26	67.9	32.1

Table 2. Data acquisition methods and data types.

Parameter (Symbol)	Data Source
Earthquake magnitude (M)	Measured
Peak ground acceleration (a_max)	Measured
Depth (z)	Measured
Cone tip resistance (q_c)	Measured
Cone sleeve resistance (f_s)	Measured
Cyclic stress ratio (CSR)	Calculated
Soil behavior index (I_c)	Calculated
Fines content (FC)	Measured (if available, otherwise calculated)
Total stress (σ_v)	Measured
Effective stress (σ’_v)	Measured

Table 3. General statistical values of the dataset.

	Count	Mean	Std. dev.	Min	25%	50%	75%	Max
M (M_w)	321	6.960	0.566	5.900	6.600	7.100	7.100	9.000
a_max (g)	321	0.322	0.149	0.090	0.210	0.250	0.400	0.840
z (m)	321	4.700	2.260	1.400	2.900	4.300	5.900	12.700
q_c (kPa)	321	5792.890	3704.660	784.530	3255.810	5000.000	7500.000	25,000.000
f_s (kPa)	321	43.857	44.797	0.980	17.600	31.200	55.897	362.846
CSR	321	0.273	0.126	0.070	0.170	0.245	0.345	0.695
I_c	321	1.985	0.299	1.244	1.811	1.980	2.194	2.959
FC (%)	321	23.238	20.876	0.000	5.000	19.385	34.970	99.766
σ_v (kPa)	321	87.676	43.079	24.000	53.200	80.000	109.300	235.500
σ’_v (kPa)	321	62.444	28.495	19.000	41.000	56.500	77.500	161.600

Table 4. Data1–Tune1 experimental results with ten decision tree-based ensemble learning algorithms.

Model	Subset	F₁ Score	ROC AUC	Mean F₁ (CV)	Std. dev. of F₁ (CV)	Precision	Recall
GBM	Train	0.974	0.994	0.927	0.022	0.994	0.954
GBM	Test	0.925	0.924	0.927	0.022	0.961	0.891
CatB	Train	0.949	0.986	0.912	0.022	0.943	0.954
CatB	Test	0.917	0.945	0.912	0.022	0.926	0.909
XGB	Train	0.940	0.978	0.899	0.019	0.932	0.948
XGB	Test	0.907	0.917	0.899	0.019	0.925	0.891
RF	Train	0.962	0.990	0.903	0.037	0.976	0.948
RF	Test	0.906	0.924	0.903	0.037	0.941	0.873
ExT	Train	0.939	0.965	0.897	0.025	0.953	0.925
ExT	Test	0.899	0.919	0.897	0.025	0.907	0.891
RoF	Train	0.942	0.980	0.888	0.048	0.958	0.925
RoF	Test	0.893	0.909	0.888	0.048	0.958	0.836
LGBM	Train	0.946	0.976	0.891	0.028	0.933	0.960
LGBM	Test	0.889	0.903	0.891	0.028	0.906	0.873
NGB	Train	0.944	0.989	0.904	0.025	0.914	0.977
NGB	Test	0.883	0.886	0.904	0.025	0.875	0.891
BRF	Train	0.909	0.932	0.887	0.029	0.933	0.885
BRF	Test	0.874	0.897	0.887	0.029	0.938	0.818
AdaB	Train	0.956	0.984	0.895	0.027	0.982	0.931
AdaB	Test	0.874	0.931	0.895	0.027	0.938	0.818

Table 5. Indices of FP and FN samples observed in the four models (with indices in the 321-sample dataset).

Model	False Positives (FP)	False Negatives (FN)
CatB	243, 310, 112, 19	182, 254, 203, 247, 253
RF	243, 112, 19	182, 162, 254, 203, 93, 247, 253
XGB	243, 310, 112, 32	182, 162, 254, 93, 309, 253
GBM	243, 112	182, 162, 254, 203, 247, 253

Table 6. Thirteen FP and FN samples of the four selected models (before outlier detection analysis).

Index	Error Type	M (M_w)	a_max (g)	z (m)	q_c (kPa)	f_s (kPa)	CSR (-)	I_c (-)	FC (%)	σ_v (kPa)	σ’_v (kPa)	True Label	Predicted Label
19	FP	6.00	0.37	11.00	6400.00	76.80	0.33	2.16	35.89	209.00	119.00	No	Yes
32	FP	6.20	0.13	4.80	5226.94	77.47	0.13	2.14	30.00	90.00	54.00	No	Yes
112	FP	7.10	0.24	3.00	4700.00	28.20	0.17	1.94	18.13	58.50	51.50	No	Yes
243	FP	7.10	0.38	2.80	900.00	44.10	0.37	2.84	90.01	52.30	34.10	No	Yes
310	FP	7.80	0.16	7.40	6472.30	24.52	0.15	1.82	2.00	135.00	86.00	No	Yes
93	FN	6.90	0.60	4.60	9806.65	14.71	0.59	1.41	0.00	86.00	54.00	Yes	No
162	FN	7.10	0.28	7.00	8500.00	59.50	0.27	1.85	11.12	133.00	84.00	Yes	No
182	FN	7.10	0.27	5.80	9400.00	84.60	0.27	1.84	10.48	109.30	67.60	Yes	No
203	FN	7.10	0.21	7.30	7700.00	30.80	0.18	1.79	6.37	140.70	98.50	Yes	No
247	FN	7.10	0.17	7.30	7227.50	43.15	0.17	1.87	12.16	138.00	80.00	Yes	No
253	FN	7.10	0.23	3.70	7600.15	40.21	0.25	1.69	0.00	70.00	42.00	Yes	No
254	FN	7.10	0.22	4.80	9149.60	56.88	0.25	1.70	0.00	92.00	50.00	Yes	No
309	FN	7.80	0.18	3.30	3236.20	12.75	0.14	1.99	5.00	58.00	48.00	Yes	No

Table 7. Anomaly scores of the 13 test misclassifications obtained through outlier detection.

Type	Index	Score	Outlier?
FP	19	−0.045	Yes
FP	32	0.052	No
FP	112	0.129	No
FP	243	−0.036	Yes
FP	310	0.040	No
FN	93	0.018	No
FN	162	0.088	No
FN	182	0.076	No
FN	203	0.071	No
FN	247	0.082	No
FN	253	0.107	No
FN	254	0.103	No
FN	309	0.088	No
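
In Table 7, the "Yes" labels coincide with the negative anomaly scores. The sketch below reproduces the flags under the assumption that a sample counts as an outlier when its score falls below zero, the same sign convention used by scikit-learn's `IsolationForest.decision_function`. The threshold and the helper `is_outlier` are illustrative assumptions; the detector actually used is described in Section 2.3.

```python
# Anomaly scores transcribed from Table 7 (sample index -> score).
scores = {
    19: -0.045, 32: 0.052, 112: 0.129, 243: -0.036, 310: 0.040,   # FP samples
    93: 0.018, 162: 0.088, 182: 0.076, 203: 0.071,                # FN samples
    247: 0.082, 253: 0.107, 254: 0.103, 309: 0.088,
}


def is_outlier(score, threshold=0.0):
    """Assumed rule: scores below the threshold mark outliers."""
    return score < threshold


outliers = sorted(i for i, s in scores.items() if is_outlier(s))
retained = sorted(i for i in scores if i not in outliers)

print("Flagged as outliers:", outliers)
print("Retained for SHAP analysis:", retained)
```

Under this assumed rule, only samples 19 and 243 are flagged. This agrees with Tables 8 and 9, which report SHAP values for the remaining eleven samples.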

Table 8. The SHAP values of the FP test samples with respect to Class 1.

Global Index: 310
Model	M	a_max	z	q_c	f_s	CSR	I_c	FC	σ_v	σ’_v
CatB	0.016	−0.140	0.016	0.003	0.109	−0.137	0.003	−0.003	0.041	0.003
XGB	0.033	−0.126	0.005	−0.048	0.057	−0.083	0.045	−0.049	0.074	0.014
Global Index: 112
CatB	0.013	0.018	−0.009	0.206	−0.013	−0.031	0.028	−0.005	−0.016	0.011
RF	0.014	−0.014	−0.008	0.141	−0.013	−0.042	0.044	0.014	−0.005	−0.002
XGB	0.008	−0.007	0.002	0.238	0.025	−0.010	0.003	0.027	−0.048	−0.036
GBM	0.014	−0.012	−0.009	0.202	−0.005	−0.028	0.021	0.014	−0.003	0.005
Global Index: 32
XGB	−0.033	−0.149	−0.008	0.202	−0.077	−0.096	−0.001	0.034	0.041	−0.025

Table 9. The SHAP values of the FN test samples with respect to Class 1.

Global Index: 182
Model	M	a_max	z	q_c	f_s	CSR	I_c	FC	σ_v	σ’_v
CatB	0.014	−0.014	0.014	−0.299	−0.143	0.091	0.021	−0.003	0.045	0.012
RF	0.003	−0.006	0.020	−0.250	−0.097	0.060	0.029	0.007	0.022	0.009
XGB	0.000	−0.005	0.030	−0.303	−0.061	−0.013	0.051	−0.029	0.077	−0.012
GBM	−0.001	−0.040	0.023	−0.259	−0.162	0.068	0.015	−0.006	0.033	0.018
Global Index: 254
CatB	0.014	−0.038	−0.002	−0.301	−0.086	0.082	−0.088	−0.003	0.028	−0.027
RF	0.008	−0.029	0.004	−0.234	−0.071	0.036	−0.105	−0.069	0.001	−0.015
XGB	0.001	−0.038	−0.008	−0.285	−0.054	−0.003	−0.056	−0.037	0.006	−0.040
GBM	0.008	−0.024	−0.001	−0.258	−0.098	0.038	−0.081	−0.025	0.000	−0.034
Global Index: 203
CatB	0.015	−0.050	0.015	−0.180	−0.056	−0.075	0.009	−0.003	0.047	0.013
RF	0.004	−0.046	0.019	−0.138	−0.027	−0.070	0.028	0.000	0.007	0.008
GBM	0.007	−0.040	0.028	−0.140	−0.069	−0.081	0.020	−0.036	0.030	0.014
Global Index: 247
CatB	0.016	−0.153	0.016	−0.003	−0.042	−0.066	0.022	−0.003	0.050	0.014
RF	0.005	−0.126	0.015	−0.038	−0.017	−0.070	0.034	0.010	0.006	0.012
GBM	0.011	−0.120	0.025	−0.005	−0.062	−0.084	0.020	−0.014	0.023	0.007
Global Index: 253
CatB	0.015	−0.006	−0.002	−0.167	−0.067	0.095	−0.092	−0.003	−0.023	−0.029
RF	0.010	−0.003	−0.013	−0.104	−0.072	0.065	−0.122	−0.073	−0.006	−0.023
XGB	0.000	−0.012	−0.008	−0.116	−0.035	0.011	−0.060	−0.049	−0.025	−0.027
GBM	0.013	−0.026	−0.014	−0.154	−0.110	0.046	−0.088	−0.025	−0.008	−0.034
Global Index: 162
RF	0.004	0.002	0.009	−0.230	−0.045	0.062	0.032	0.008	0.009	0.007
XGB	0.000	−0.002	−0.004	−0.308	−0.058	−0.013	0.044	−0.022	0.076	0.003
GBM	−0.001	−0.030	0.024	−0.253	−0.080	0.081	0.023	−0.013	0.036	0.016
Global Index: 93
RF	−0.009	0.049	−0.009	−0.214	0.129	0.045	−0.114	−0.056	0.000	0.001
XGB	0.000	0.128	−0.004	−0.251	0.085	0.041	−0.105	−0.039	0.014	−0.020
Global Index: 309
XGB	0.042	−0.164	−0.024	0.108	0.127	−0.096	−0.005	−0.039	−0.035	−0.052
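
To illustrate how the per-sample SHAP values can be read, the sketch below averages the Class 1 SHAP values of sample 182 over the four models and ranks the features by how strongly they push the prediction toward the wrong class. This simple average is an assumption made for illustration only; it is not the error-contribution score defined in Section 2.4. The names `mean_shap` and `rank_toward_error` are hypothetical.

```python
FEATURES = ["M", "a_max", "z", "q_c", "f_s", "CSR", "I_c", "FC",
            "sigma_v", "sigma_v_eff"]

# Class 1 SHAP values of FN sample 182, transcribed from Table 9.
shap_182 = {
    "CatB": [0.014, -0.014, 0.014, -0.299, -0.143, 0.091, 0.021, -0.003, 0.045, 0.012],
    "RF":   [0.003, -0.006, 0.020, -0.250, -0.097, 0.060, 0.029, 0.007, 0.022, 0.009],
    "XGB":  [0.000, -0.005, 0.030, -0.303, -0.061, -0.013, 0.051, -0.029, 0.077, -0.012],
    "GBM":  [-0.001, -0.040, 0.023, -0.259, -0.162, 0.068, 0.015, -0.006, 0.033, 0.018],
}


def mean_shap(per_model):
    """Average each feature's SHAP value over the models (illustrative aggregation)."""
    n = len(per_model)
    return {f: sum(row[i] for row in per_model.values()) / n
            for i, f in enumerate(FEATURES)}


def rank_toward_error(mean_values, true_class_is_positive):
    """Order features from the strongest push toward the wrong class.

    For an FN (true class 1), negative Class 1 SHAP values push toward the
    error; for an FP (true class 0), positive values do.
    """
    sign = 1 if true_class_is_positive else -1
    return sorted(mean_values, key=lambda f: sign * mean_values[f])


means = mean_shap(shap_182)
ranking = rank_toward_error(means, true_class_is_positive=True)
print("Strongest pushes toward 'No' for sample 182:", ranking[:3])
```

For sample 182, the two features with the most negative average contribution are q_c and f_s, which is consistent with the large negative q_c values visible across the FN rows of Table 9.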

Table 10. Liquefaction class percentages after data augmentation.

Dataset	Number of Samples	Yes (%)	No (%)
Full Data	420	70.24	29.76
Training	339	70.80	29.20
Test	81	67.90	32.10

Table 11. Results of Data2–Tune1 (D2-T1) and Data2–Tune2 (D2-T2) experiments for liquefaction models developed using four tree-based ensemble learning algorithms.

Model	F₁ D2-T1	F₁ D2-T2	ROC AUC D2-T1	ROC AUC D2-T2	Prec. D2-T1	Prec. D2-T2	Recall D2-T1	Recall D2-T2	CV Mean F₁ D2-T1	CV Std. dev. F₁ D2-T1	CV Mean F₁ D2-T2	CV Std. dev. F₁ D2-T2
CatB	0.926	0.926	0.943	0.942	0.943	0.943	0.909	0.909	0.907	0.037	0.935	0.027
RF	0.925	0.899	0.917	0.894	0.961	0.907	0.891	0.891	0.913	0.032	0.921	0.024
XGB	0.906	0.915	0.929	0.928	0.941	0.857	0.873	0.982	0.904	0.039	0.928	0.024
GBM	0.917	0.929	0.931	0.931	0.926	0.912	0.909	0.945	0.916	0.026	0.946	0.023
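
The reported F₁ scores in Table 11 can be cross-checked against the reported precision and recall using F₁ = 2PR / (P + R). The sketch below is a consistency check on the published values only; because the inputs are rounded to three decimals, small deviations are expected. The helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), transcribed from Table 11.
table_11 = {
    ("CatB", "D2-T1"): (0.943, 0.909, 0.926),
    ("CatB", "D2-T2"): (0.943, 0.909, 0.926),
    ("RF", "D2-T1"): (0.961, 0.891, 0.925),
    ("RF", "D2-T2"): (0.907, 0.891, 0.899),
    ("XGB", "D2-T1"): (0.941, 0.873, 0.906),
    ("XGB", "D2-T2"): (0.857, 0.982, 0.915),
    ("GBM", "D2-T1"): (0.926, 0.909, 0.917),
    ("GBM", "D2-T2"): (0.912, 0.945, 0.929),
}

# Absolute gap between the recomputed and the reported F1 for each entry.
deviations = {
    key: abs(f1_from(p, r) - reported)
    for key, (p, r, reported) in table_11.items()
}
max_deviation = max(deviations.values())

for (model, run), dev in deviations.items():
    print(f"{model:5s} {run}: |recomputed - reported| = {dev:.4f}")
```

All eight entries agree with the recomputed values to within 0.001, which is the level of discrepancy expected from three-decimal rounding.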

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

