Article

A Tabular Data Augmentation Framework Based on Error-Focused XAI-Supported Weighting Strategy: Application to Soil Liquefaction Classification

Faculty of Engineering, Department of Civil Engineering, Kınıklı Campus, Pamukkale University, 20070 Denizli, Türkiye
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(1), 330; https://doi.org/10.3390/app16010330
Submission received: 25 November 2025 / Revised: 25 December 2025 / Accepted: 26 December 2025 / Published: 29 December 2025

Abstract

In tabular liquefaction datasets, data augmentation plays a crucial role in enhancing the classification performance of machine learning models. In this study, an XAI-supported, error-focused, weighting-based data augmentation framework is proposed to improve CPT-based soil liquefaction classification in data-limited case-history settings by leveraging feedback from test misclassifications. First, it is hypothesized that test errors are non-random and that certain features contributed the most to misclassifications. Accordingly, a SHAP-based error-contribution score approach was developed to identify error-contributing features. The core of the proposed framework relies on assigning weights to error-contributing features. This targeted weighting was employed in two components: (i) clustering to select training samples for augmentation; and (ii) noise injection applied only in difficult-to-predict regions. To this end, test errors were combined with the training data, and weighted Fuzzy C-Means clustering was applied by assigning a weight of 1.5 to the distance metric in the error-contributing features. Clusters where test errors were concentrated were therefore defined as “difficult-to-predict regions”. In these clusters, noise was injected into the error-contributing features with 1.5× higher amplitude. This design directly integrated XAI-based error explanations into the data augmentation process, enabling targeted augmentation in difficult-to-predict regions. Consequently, the decision boundaries of the models became sharper, particularly in the error-contributing features. The Random Forest model achieved the highest improvement, with its F1 score increasing by 0.019. These findings demonstrate that the proposed framework enhances classification performance for tabular liquefaction data.

1. Introduction

Liquefaction is the loss of soil strength resulting from the increase in pore water pressure and the corresponding decrease in effective stress in saturated and particularly cohesionless soils under cyclic and dynamic loading. Loss of bearing capacity, ground settlement, and lateral deformations are typical examples of liquefaction-induced damage. The extensive damage caused by the 1964 Niigata (Japan) and Alaska (USA) earthquakes is considered the starting point of the comprehensive research on liquefaction. Major earthquakes, such as Loma Prieta (1989), Northridge (1994), Kocaeli (1999), Canterbury (2010–2011), Tohoku (2011), and Kahramanmaraş (2023) [1,2], have also resulted in severe liquefaction-induced damage. Liquefaction potential assessment, which is conducted to determine the resistance of soils against liquefaction, emerges as a mandatory analysis for damage prediction, risk management, implementation of preventive measures, and necessary planning. Various methods have been proposed in the literature for this purpose. These include the simplified method [3], the strain-based method [4], energy-based methods [5], laboratory and in situ testing methods [6,7,8], physical modeling testing methods [9,10], and machine learning/artificial intelligence-based approaches [11,12]. In addition, liquefaction vulnerability parameters have been proposed to provide a more performance-oriented assessment of liquefaction effects. Among these, the liquefaction potential index (LPI) [13,14], the liquefaction severity number (LSN) [15,16], and one-dimensional post-liquefaction reconsolidation settlement (SV1D) [17] are widely used for correlating subsurface conditions with the severity of liquefaction-induced ground deformations and infrastructure damage.
The use of machine learning (ML) in liquefaction prediction has been ongoing since the mid-1990s. Early examples were introduced by [18,19] through backpropagation-based artificial neural networks (ANNs). Limited observations, class imbalance, and overfitting often yield inconsistent results across training runs, so robust liquefaction models require large, diverse datasets. Increasing data availability, through larger datasets or augmentation, improves accuracy and generalization [20,21]. The primary innovation of this study is not a new liquefaction triggering formula, but a novel methodology to overcome the data scarcity and class imbalance problems inherent in geotechnical engineering case histories.
Field case histories of liquefaction are inherently limited in number, which constrains the diversity and representativeness of CPT-based liquefaction classification datasets and increases the risk of overfitting in ML models. In such data-scarce settings, carefully designed augmentation strategies can improve accuracy, robustness, and predictive stability, particularly near the decision boundary for liquefaction occurrence detection. Moreover, it has been shown that models trained on a single dataset may fail when applied to different datasets [22]. In response to this need, Minarelli et al. [23] compiled a database covering 120 liquefied sites from the 2012 Emilia earthquake. Similarly, Hudson et al. [24], within the NGL project, expanded the database to include laboratory testing results and emphasized that robust models require large, diverse datasets beyond field case histories alone.
In line with this, Zhang et al. [25] noted that broader applicability of their Bayesian-optimized SVM for soil-liquefaction prediction needs more data and richer features. To mitigate data dependency, data augmentation has been frequently employed. For instance, Jas and Dodagoudar [12] addressed data imbalance using K-means-supported SMOTE, while Chen et al. [21] tackled the small-sample problem with WGAN, showing it generates more realistic and discrete feature-appropriate samples compared to SMOTE.
In the broader ML literature, targeted data augmentation approaches focusing on incorrect or low-confidence predictions have been proposed. In a non-liquefaction context, de la Calleja et al. [26] showed that SMMO outperformed SMOTE in all tested datasets. Khmaissia and Frigui [27] reported semi-supervised gains on CIFAR-100 with a WideResNet-50-2 backbone by targeting misclassified and low-confidence samples. Apicella et al. [28] incorporated SHAP-based explanations into their framework, showing that explanations can enhance accuracy under certain conditions.
However, data augmentation approaches that directly focus on explanations of misclassifications remain limited. Given the limited observations, class imbalance, and small-sample nature of tabular soil liquefaction classification datasets, integrating XAI into data augmentation frameworks is of critical importance in this domain.
In this study, an XAI-supported, error-focused, and weighting-based data augmentation framework was developed using a tabular liquefaction dataset, driven by misclassified test predictions. While previous work has highlighted the value of data augmentation and model interpretability separately, the proposed framework advances this direction by directly integrating error explanations into the augmentation process. The causes of model test errors were explained through XAI, and a feature-level error-contribution approach was employed to incorporate this information into the weighting strategy. By applying weighting, error-focused clustering was used to identify difficult-to-predict regions, and error-focused data augmentation was applied within these regions. A resulting new training dataset was obtained to reduce model errors and to enhance the predictive performance. In this way, the framework integrates model feedback into targeted data augmentation, which is expected to enhance accuracy and generalizability for tabular prediction systems, particularly in data-limited domains such as soil liquefaction.

2. Materials and Methods

2.1. Liquefaction Dataset Compilation and Feature Engineering

The dataset used in this study, consisting of 321 samples, was compiled from the liquefaction datasets presented by Boulanger and Idriss [29] and Juang et al. [30]. The dataset has a tabular structure composed of ten features representing field observations and a binary class label indicating whether liquefaction occurred in each sample. Three features (CSR, Ic, FC) were generated through feature engineering techniques. In ML, model performance heavily depends on data representation; improving feature extraction/representations has been empirically shown to reduce classification error [31].
This tabular dataset was split into 75% training data (240 samples) and 25% test data (81 samples). Table 1 presents the number of samples for each liquefaction class. All analyses were conducted in Python (v3.13).
To improve model performance, new features were generated, in addition to the experimentally measured data, using empirical formulas. The features were determined based on soil and ground motion parameters known to influence liquefaction. In general, liquefaction models are evaluated in relation to the CSR (cyclic stress ratio) and the CRR (cyclic resistance ratio). Features representing the CSR include peak ground acceleration, effective stress, total stress, and depth, along with the CSR value itself. Features representing the CRR include raw CPT data (fs and qc) and Ic, empirically derived from the CPT data. Independently, liquefaction triggering is also affected by fines content. The influence of fines content on liquefaction triggering in relation to the CSR, qc1Ncs, and FC has been presented in the triggering curves of Boulanger and Idriss [29]. Accordingly, the geotechnical features representing the CSR, CRR, and FC are summarized in Table 2. Explicit engineering parameters, like the CRR or factor of safety (FS), were intentionally excluded from the input set. The objective of using ML in this context is to discover data-driven, non-linear decision boundaries based on fundamental seismic and geotechnical parameters, rather than constraining the model to replicate the existing simplified semi-empirical procedures.
The soil behavior index (Ic) was determined using Equation (1), as proposed by Robertson and Wride [32].
$$I_c = \left[ (3.47 - \log Q)^2 + (1.22 + \log F)^2 \right]^{0.5} \tag{1}$$
Here, Q and F values were calculated as defined in Equations (2) and (3), respectively. The parameter n was calculated according to Equation (4), as proposed by Robertson [33]. Since the equations are interdependent, Ic, Q, F, and n values were calculated iteratively.
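$$Q = \left( \frac{q_c - \sigma_{vc}}{P_a} \right) \cdot \left( \frac{P_a}{\sigma'_{vc}} \right)^{n} \tag{2}$$
$$F = \frac{f_s}{q_c - \sigma_{vc}} \cdot 100\% \tag{3}$$
$$n = 0.381 \cdot I_c + 0.05 \cdot \left( \frac{\sigma'_{v0}}{p_a} \right) - 0.15 \tag{4}$$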
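Because Equations (1)–(4) depend on one another, the iterative calculation can be illustrated with a simple fixed-point loop. The following is a minimal sketch and not the authors' implementation: the function name, the starting value n = 1.0, the convergence tolerance, the base-10 logarithm, and the cap n ≤ 1.0 are assumptions, and all pressures are assumed to share one unit (e.g., kPa).

```python
import math

def soil_behavior_index(qc, fs, sigma_v, sigma_v_eff, pa=100.0,
                        tol=1e-6, max_iter=100):
    """Iterate Equations (1)-(4) until the stress exponent n stabilises.

    qc, fs, sigma_v, sigma_v_eff and pa must share one pressure unit.
    Returns (Ic, Q, F, n).
    """
    n = 1.0  # assumed starting value
    for _ in range(max_iter):
        net = qc - sigma_v                                    # net cone resistance
        Q = (net / pa) * (pa / sigma_v_eff) ** n              # Eq. (2)
        F = fs / net * 100.0                                  # Eq. (3), in percent
        Ic = math.sqrt((3.47 - math.log10(Q)) ** 2
                       + (1.22 + math.log10(F)) ** 2)         # Eq. (1)
        n_new = 0.381 * Ic + 0.05 * (sigma_v_eff / pa) - 0.15  # Eq. (4)
        n_new = min(n_new, 1.0)                               # assumed upper bound on n
        if abs(n_new - n) < tol:
            n = n_new
            break
        n = n_new
    return Ic, Q, F, n

# Hypothetical CPT reading (kPa), for illustration only.
Ic, Q, F, n = soil_behavior_index(qc=8000.0, fs=40.0, sigma_v=100.0, sigma_v_eff=60.0)
print(f"Ic = {Ic:.2f}, n = {n:.2f}")
```

For this hypothetical reading the loop settles after a handful of iterations, since n changes only slightly from one pass to the next.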
The CSR was determined based on the formula developed by Seed and Idriss [34] (Equation (5)). Here, rd represents the stress reduction coefficient, which accounts for the deformability of the soil column by reducing the shear stress as depth increases compared to a rigid body assumption [35].
$$CSR = 0.65 \cdot \frac{\sigma_{v0}}{\sigma'_{v0}} \cdot \frac{a_{max}}{g} \cdot r_d \tag{5}$$
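As a worked illustration of Equation (5), the short sketch below evaluates the CSR for one hypothetical sample. The function name and the input values are assumptions; the stress reduction coefficient rd is passed in directly because its depth-dependent expression is not given here.

```python
def cyclic_stress_ratio(sigma_v, sigma_v_eff, a_max_g, r_d):
    """Equation (5): CSR = 0.65 * (sigma_v / sigma_v_eff) * (a_max / g) * r_d.

    a_max_g is the peak ground acceleration already expressed in units of g.
    """
    return 0.65 * (sigma_v / sigma_v_eff) * a_max_g * r_d

# Hypothetical values: total stress 100 kPa, effective stress 60 kPa,
# a_max = 0.30 g, r_d = 0.95.
csr = cyclic_stress_ratio(sigma_v=100.0, sigma_v_eff=60.0, a_max_g=0.30, r_d=0.95)
print(round(csr, 3))  # 0.309
```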
The general statistical values of the dataset are presented in Table 3.

2.2. Soil Liquefaction Classification Procedure with Original Training Data

Prior to augmentation, liquefaction classification was performed using the original training data (Data1—240 samples) in Data1–Tune1 experiments. In this setting, ten decision tree-based ensemble learning algorithms, including Random Forest (RF), Extra Trees (ExT), Balanced Random Forest (BRF), Rotation Forest (RoF), AdaBoost (AdaB), Gradient Boosting (GBM), CatBoost (CatB), LightGBM (LGBM), NGBoost (NGB), and XGBoost (XGB), were evaluated. These models formed the foundation of the proposed framework (Figure 1) and produced misclassified test predictions.
Ensemble methods improve the predictive performance by combining the outputs of multiple base learners with two classic families: bagging [36] and boosting [37]. In the bagging family, RF builds trees from bootstrap samples with random feature selection, while BRF grows each tree using a class-balanced bootstrap [38]. RoF applies a PCA to feature subsets, constructing trees in a transformed feature space and enhancing model diversity [39]. ExT enhances ensemble diversity by randomizing both feature and cut-point selection at each split and by growing trees on the full learning sample [40]. Within boosting approaches, AdaBoost iteratively reduces errors by assigning higher weights to misclassified samples [37]. GBM minimizes loss by sequentially adding trees in the direction of the negative gradient [41]. XGB improves GBM through second-order optimization, regularization, and parallelization [42]. LGBM achieves high speed and memory efficiency with histogram-based splitting, GOSS, and EFB techniques [43]. CatB has been specifically optimized for the efficient handling of categorical features [44]. Finally, NGB was developed not only to perform classical prediction but to produce probabilistic outputs by learning the parameters of conditional distributions via the natural gradient method [45].
Before data augmentation, hyperparameter optimization (Tune1) was conducted on the ten decision tree-based algorithms using the original training data (Data1, 240 samples) through an exhaustive grid search with 5-fold cross-validation. This tuning strategy is consistent with applied augmentation studies that employ grid-search-based cross-validation for hyperparameter selection [46,47]. For each hyperparameter combination, 5-fold CV F1 scores were computed, and the mean and standard deviation of F1 were recorded to summarize the expected performance and to quantify the fold-to-fold variability (i.e., stability-to-data partitioning) during model selection.
Class imbalance was addressed in all algorithms. With the final set of hyperparameters, the models were trained on the 75% training set, and their performance was reported on the fixed 25% test set (Data1–Tune1 experiments, Table 4).
For each model, the F1 score on the test data, its balance with training values (in terms of overfitting), the 5-fold CV metrics during training (CV mean F1 and standard deviation), as well as the ROC AUC (area under the receiver operating characteristic curve) values were considered. The F1 score is the harmonic mean of precision and recall, providing a balanced measure in imbalanced datasets. ROC AUC summarizes performance across all classification thresholds in a single score and is comparatively insensitive to class imbalance, which makes it a useful complementary metric in imbalanced settings [46].
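The selection metrics described above can be illustrated with a short sketch that computes F1 from confusion counts and summarizes it across folds. The per-fold counts are hypothetical, and the use of the sample standard deviation is an assumption (the population form is an equally plausible convention).

```python
from statistics import mean, stdev

def f1_from_counts(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical (tp, fp, fn) counts for the five validation folds
# of a single hyperparameter combination.
folds = [(20, 3, 2), (19, 2, 4), (21, 4, 1), (18, 3, 3), (20, 2, 3)]
fold_f1 = [f1_from_counts(*counts) for counts in folds]

cv_mean = mean(fold_f1)   # expected performance
cv_std = stdev(fold_f1)   # fold-to-fold variability (sample standard deviation)
print(f"CV F1 = {cv_mean:.3f} +/- {cv_std:.3f}")
```

A low standard deviation indicates that the score depends little on how the training data are partitioned, which is the stability criterion used during model selection.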
According to these criteria, four models were selected in the Data1–Tune1 experiments: GBM, CatB, XGB, and RF. They showed the highest test F1 scores, good train–test balance, and leading ROC AUC performance. GBM achieved the highest test F1 (0.925), followed by CatB, XGB, and RF. For ROC AUC, CatB achieved the highest value (0.945), with GBM, RF, and XGB showing comparable performance. The Tune1 hyperparameters were as follows: CatB (70 iterations, maximum depth = 2, learning rate = 0.20), GBM (40 trees, maximum depth = 3, learning rate = 0.08), XGB (20 trees, maximum depth = 3, learning rate = 0.20), and RF (60 trees, maximum depth = 5, maximum features = number of features). These hyperparameters were also applied in Data2–Tune1. The test misclassifications of these four models (5 false positives and 8 false negatives, 13 in total) were combined for the error-focused framework (Table 5).
Table 6 lists the thirteen misclassifications produced by the four selected models on the fixed test set (dataset size N = 321). In the test set, samples 243 and 112 were classified as false positives (FP) by all four models, whereas samples 182, 254, and 253 were classified as false negatives (FN); such recurring FP/FN outcomes are likely associated with transitional cases where the cyclic stress demand (CSR) approaches the cyclic resistance (CRR). These FP/FN misclassifications from the fixed test set were aggregated to form the combined error set used by the proposed error-focused framework, and they include the FP/FN samples used to compute the error-contribution scores. These model-consistent errors point to intrinsically ambiguous, near-boundary instances, motivating our study’s innovation of identifying difficult-to-predict regions within the dataset. Because the dataset is small and tabular, we focused on tree-based ensemble classifiers, which are well suited to such settings.

2.3. Outlier Detection Analysis

Outliers reduce model performance and generalizability. Kaneda et al. [48] recommended detecting and removing outliers prior to modeling, as this improves robustness and generalization ability, particularly in decision boundary-based algorithms. For this purpose, 321 samples were analyzed using the Isolation Forest method. This unsupervised approach, proposed by Liu et al. [49], computes the anomaly score from the average isolation depth E[h(x)] in randomly generated isolation trees, where the score s(x,n) is defined in Equation (6).
$$s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad c(n) = 2H(n-1) - \frac{2(n-1)}{n} \tag{6}$$
Here, E[h(x)] denotes the average isolation depth, c(n) is the normalization constant, H(k) is the k-th harmonic number, and n is the total number of samples. A score s(x, n) approaching 1 indicates an anomaly, whereas values below 0.5 represent normal samples.
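Equation (6) can be evaluated directly once the average isolation depth is known. The sketch below is illustrative only: the function names are assumptions, the harmonic number is summed exactly rather than approximated, and the average depths passed in are hypothetical.

```python
def harmonic(k):
    """k-th harmonic number H(k) = 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))

def c_norm(n):
    """Normalisation constant c(n) = 2 H(n - 1) - 2 (n - 1) / n."""
    if n <= 1:
        return 0.0
    return 2.0 * harmonic(n - 1) - 2.0 * (n - 1) / n

def anomaly_score(mean_depth, n):
    """Equation (6): s(x, n) = 2 ** (-E[h(x)] / c(n))."""
    return 2.0 ** (-mean_depth / c_norm(n))

n_samples = 321  # dataset size used in this study
shallow = anomaly_score(mean_depth=3.0, n=n_samples)    # isolated quickly
deep = anomaly_score(mean_depth=14.0, n=n_samples)      # hard to isolate
print(f"shallow: {shallow:.2f}, deep: {deep:.2f}")
```

A sample whose average depth equals c(n) receives a score of exactly 0.5, and the score rises toward 1 as the sample is isolated in fewer splits.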
Referring to the misclassifications summarized in Table 6, an Isolation Forest model was applied to the full dataset to flag potential outliers. Because such cases may still represent physically plausible case histories, outliers were retained in the training data; however, to prevent them from being over-emphasized in the error-focused framework, outlier-labeled samples were excluded from the error-contribution score computation (which is derived from the SHAP values computed for the non-outlier FP/FN test misclassifications) and the subsequent augmentation steps. In addition, outlier-labeled training samples were excluded from the weighted Fuzzy C-Means clustering, since they can bias distance-based partitioning by shifting cluster centers and memberships, thereby distorting the identification of difficult-to-predict regions targeted for augmentation.
Here, the score was normalized so that negative values were classified as anomalies and positive values as non-outliers. Among the 13 test errors (5 FP and 8 FN), only 2 (samples 19 and 243) were identified as anomalies (Table 7).
In the subsequent stages, the study proceeded with three FP and eight FN samples as the combined test misclassifications for the error-focused framework. These results were also incorporated into the clustering process.

2.4. SHAP-Based Error-Contribution Score: Definition, Computation, and Error-Contributing Features

SHAP, proposed by Lundberg and Lee [50], is a game theory-based explanation method. Based on Shapley values, it calculates the marginal contribution of each feature to the model as the weighted average of its effect across all possible feature combinations. The SHAP value is mathematically defined by Equation (7).
$$\phi_i(f, x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \cdot \left[ f(S \cup \{i\}) - f(S) \right] \tag{7}$$
Here, ϕi denotes the contribution of the i-th feature to sample x, N represents the set of all features, and S denotes the subsets excluding i. For each S, [f(S ∪ {i}) − f(S)] represents the change in output resulting from adding i, and ϕi is the weighted average of these marginal contributions across all subsets. Gilpin et al. [51] emphasized that XAI enhances transparency in decision-making by revealing the reasons for model behavior, thereby highlighting its role in identifying misclassifications and biases.
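To make Equation (7) concrete, the sketch below computes exact Shapley values by enumerating every subset for a small toy value function. It is an illustration of the definition only: the value function, its coefficients, and the feature subset are hypothetical, and SHAP libraries use far more efficient model-specific algorithms than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: weighted average of marginal contributions
    of feature i over every subset S that excludes i (Equation (7))."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (value(S | {i}) - value(S))
        phi[i] = total
    return phi

# Hypothetical value function: additive effects plus one qc-FC interaction.
base = {"qc": 0.2, "FC": 0.1, "M": -0.05}

def toy_value(S):
    v = sum(base[f] for f in S)
    if "qc" in S and "FC" in S:
        v += 0.6  # interaction term, shared equally by qc and FC
    return v

phi = shapley_values(list(base), toy_value)
print({f: round(v, 3) for f, v in phi.items()})  # {'qc': 0.5, 'FC': 0.4, 'M': -0.05}
```

The values sum to the difference between the output with all features and the output with none (0.85 here), and the interaction term is split equally between the two features that produce it.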
The error-contribution score approach was designed under the assumption that five of the ten features would be responsible for each misclassification and that the same number would be identified as error-contributing across the dataset.
The SHAP values were computed only for the misclassified test samples in each model’s probability predictions to measure feature contributions. In this binary classification problem, liquefaction is the positive class (y = 1). Positive SHAP values increase P(y = 1), whereas negative SHAP values push the prediction toward y = 0.
For each model, only the non-outlier FP and FN samples were considered. In FP cases, the five features with the highest SHAP contributions toward the positive class were selected; whereas, in FN cases, the five features with the lowest SHAP values (i.e., those most reducing the probability of the positive class) were selected. Features were assigned scores from 5 to 1 based on their importance ranking, and color coding was used to indicate the relative error contributions of features as part of the feature-based error-contribution score approach (Table 8 and Table 9). Model-based contribution scores were calculated using Equations (8) and (9), and the indicator function was defined by Equation (10).
$$FP\_raw\_score_{model}(f) = \sum_{i=1}^{N_{FP}^{(m)}} \sum_{r=1}^{5} P(r) \cdot I\left(f = feature_{(r)}^{(i)}\right) \tag{8}$$
$$FN\_raw\_score_{model}(f) = \sum_{i=1}^{N_{FN}^{(m)}} \sum_{r=1}^{5} P(r) \cdot I\left(f = feature_{(r)}^{(i)}\right) \tag{9}$$
$$I\left(f = feature_{(r)}^{(i)}\right) = \begin{cases} 1, & \text{if } f \text{ is the } r\text{-th ranked feature in the } i\text{-th misclassified sample} \\ 0, & \text{otherwise} \end{cases} \tag{10}$$
For model m, $N_{FP}^{(m)}$ and $N_{FN}^{(m)}$ denote the numbers of FP and FN samples, respectively, P(r) = (5, 4, 3, 2, 1) denotes the importance scores, I(⋅) is the indicator function, and f denotes the evaluated feature. The indicator function I(⋅) checks whether f appears among the top five features in the corresponding misclassified sample and assigns the score only under this condition (Equation (10)). The raw contribution scores obtained for each model were then normalized by dividing them by the total number of FP or FN samples in that model (Equations (11) and (12)).
$$FP\_score_{norm}(f) = \frac{FP\_raw\_score_{model}(f)}{N_{FP}^{(m)}} \tag{11}$$
$$FN\_score_{norm}(f) = \frac{FN\_raw\_score_{model}(f)}{N_{FN}^{(m)}} \tag{12}$$
In the final stage, the normalized contribution scores of all models were aggregated across features to derive global FP and FN scores (Equations (13) and (14)).
$$FP\_score_{global}(f) = \sum_{m} FP\_score_{norm,m}(f) \tag{13}$$
$$FN\_score_{global}(f) = \sum_{m} FN\_score_{norm,m}(f) \tag{14}$$
Thus, the total contribution of each feature to FP and FN errors was calculated using Equation (15).
$$Error\_contribution\_score(f) = FP\_score_{global}(f) + FN\_score_{global}(f) \tag{15}$$
The feature-based error-contribution scores provide a comparative evaluation of feature roles in error formation, and the results are presented graphically in Figure 2.
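The scoring procedure in Equations (8)–(15) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the data layout (one dictionary of SHAP values per misclassified sample), the single model, and the SHAP values themselves are hypothetical.

```python
from collections import defaultdict

RANK_POINTS = (5, 4, 3, 2, 1)  # P(r) for ranks r = 1..5

def raw_scores(samples, error_type):
    """Eqs. (8)-(9): sum the rank points over one model's misclassified samples.

    samples: list of {feature: SHAP value} dicts, one per misclassified sample.
    error_type: "FP" ranks the largest SHAP values first, "FN" the smallest.
    """
    scores = defaultdict(float)
    for shap_row in samples:
        ranked = sorted(shap_row, key=shap_row.get, reverse=(error_type == "FP"))
        for points, feature in zip(RANK_POINTS, ranked):  # top five only
            scores[feature] += points
    return scores

def error_contribution(models):
    """Eqs. (11)-(15): normalise per model, sum over models, add FP and FN."""
    total = defaultdict(float)
    for errors in models.values():
        for error_type in ("FP", "FN"):
            samples = errors.get(error_type, [])
            if not samples:
                continue
            for feature, score in raw_scores(samples, error_type).items():
                total[feature] += score / len(samples)
    return dict(total)

# Hypothetical SHAP values: one model, two FP samples and one FN sample.
models = {
    "model_A": {
        "FP": [
            {"qc": 0.30, "fs": 0.20, "FC": 0.10, "M": 0.05, "amax": 0.01, "depth": -0.20},
            {"qc": 0.10, "fs": 0.40, "FC": 0.00, "M": 0.20, "amax": 0.05, "depth": -0.10},
        ],
        "FN": [
            {"qc": -0.40, "fs": 0.10, "FC": -0.10, "M": -0.05, "amax": 0.20, "depth": -0.30},
        ],
    }
}
scores = error_contribution(models)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy case qc obtains the highest score (9.0) because it ranks near the top in both the FP and the FN samples; normalizing by the number of samples keeps a model with many errors from dominating the aggregate.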
According to the ranking based on the error-contribution scores, the variable Ic was excluded from the set of error-contributing features because it exhibited a strong positive correlation with FC and a strong negative correlation with qc. Consequently, the error-contributing features were identified as qc, fs, FC, M, and amax (Figure 3).
In geotechnical terms, the error-contributing features highlighted by the SHAP-based error-contribution score analysis (qc, fs, FC, M, and amax) are consistent with the CSR–CRR basis of liquefaction triggering. In line with this interpretation, the CPT soil behavior type index, Ic, computed from normalized cone resistance and the friction ratio, provides a compact indicator of soil behavior type and is frequently used as a proxy for soil classification and fines-related effects; in CPT-based liquefaction practice, Ic (or Ic-based estimates of soil class/FC) is widely used for liquefaction susceptibility screening and for informing fines-related adjustments in triggering correlations. Since Ic is deterministically derived from qc and fs, strong correlations with these and other related variables are expected, and the correlation matrix (Figure 3) confirms that Ic is strongly correlated (|r| > 0.70) with multiple input variables. Therefore, Ic was retained for model training; however, as a heuristic redundancy control, it was excluded from receiving an independent weight in the error-contribution weighting/noise-injection step. Although Ic is a fundamental indicator in geotechnical practice, it is mathematically derived from qc and fs. Including Ic alongside its constituent variables creates severe multicollinearity, which can distort the calculation of the SHAP values and obscure the true source of model errors. A more formal, correlation-aware attribution/weighting strategy for derived features will be investigated in future work. This treatment aligns with correlation-based redundancy control/feature elimination practices, particularly in small data settings [52], and is consistent with geotechnical ML studies that employ feature-selection strategies to identify key causal factors and support interpretability [53].

2.5. Error-Focused and XAI-Supported Weighting Strategy

Targeted data augmentation strengthens model robustness by mitigating spurious correlations, enabling models to generalize more reliably across biased datasets [54]. Building on the SHAP-based error-contribution analysis, a feature weighting strategy was developed (Figure 4). For methodological consistency, a single weighting factor (w = 1.5) was assigned to the five error-contributing features (qc, fs, FC, M, and amax).
The weighting strategy was applied in two stages: (i) weighted Fuzzy C-Means clustering to identify difficult-to-predict regions, and (ii) Gaussian noise injection with 1.5× amplitude into the error-contributing features to augment the training data. By injecting higher-amplitude noise into the error-contributing features within difficult-to-predict regions, the framework emphasized these critical features and enhanced their influence, thereby improving test performance (Figure 5).

2.6. Identification of Difficult-to-Predict Regions Through Error-Focused Clustering

Weighted Fuzzy C-Means (and its variants) improve clustering quality by learning feature weights that emphasize informative features [55,56,57]. In this study, weighted Fuzzy C-Means clustering was used to identify difficult-to-predict regions, aiming to augment training samples located in clusters with concentrated test errors. In the Fuzzy C-Means (FuzzyCM) algorithm, the Euclidean distance was weighted by assigning w = 1.5 to the error-contributing features identified through XAI. This increased the influence of these five features in the clustering process.
FuzzyCM is a fuzzy clustering method in which each data point can belong to multiple clusters with membership degrees between 0 and 1. Unlike K-means, memberships are fractional values in [0,1] and sum to 1 for each point. The algorithm iteratively updates the membership matrix U = [uik] and the cluster centers (c1, c2, …, cC) for a given number of clusters C, aiming to minimize the objective function (Equation (16)).
$$J = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{m} \left\| x_i - c_k \right\|^2 \tag{16}$$
Here, N denotes the number of data points; m > 1 is the fuzzification coefficient; uik represents the membership degree of the i-th data point in the k-th cluster; and ck denotes the center of the k-th cluster. The objective function is the sum of squared Euclidean distances between points and cluster centers, weighted by the m-th power of the memberships. The following occurs at each iteration: (1) Cluster centers are updated as membership-weighted means, with memberships raised to the power of m, so that points with higher degrees exert a stronger influence and pull the centers toward them (Equation (17)). (2) Membership degrees are updated once the centers are determined, with values assigned inversely to the distances, so that points closer to a center obtain higher memberships (Equation (18)).
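$$c_k = \frac{\sum_{i=1}^{N} u_{ik}^{m} x_i}{\sum_{i=1}^{N} u_{ik}^{m}} \tag{17}$$
$$u_{ik} = \left[ \sum_{h=1}^{C} \left( \frac{\left\| x_i - c_k \right\|}{\left\| x_i - c_h \right\|} \right)^{\frac{2}{m-1}} \right]^{-1} \tag{18}$$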
The algorithm terminates when the changes in centers/memberships fall below a predefined threshold or when the maximum number of iterations is reached.
Weighted FuzzyCM modifies the distance metric by assigning weights wj to features, while the standard method treats all features equally. In a p-dimensional space, the distance between xi and ck is defined as a weighted Euclidean distance (Equation (19)) as follows:
$$\left\| x_i - c_k \right\|_w = \sqrt{\sum_{j=1}^{p} w_j \cdot \left( x_{ij} - c_{kj} \right)^2} \tag{19}$$
Here, xij denotes the j-th feature of xi, and ckj denotes its counterpart in the k-th cluster center. A weight of wj = 1.5 was applied to the five error-contributing features, whereas wj = 1.0 was used for the others. These weights were applied at each iteration of the weighted Euclidean distance calculation.
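The update rules in Equations (17) and (18), combined with the weighted distance of Equation (19), can be sketched in a few lines. This is a minimal standard-library illustration and not the authors' implementation: the function names, the random initialization, m = 2, the stopping tolerance, and the toy data are assumptions, and features are assumed to have been scaled beforehand.

```python
import random

def weighted_sq_dist(x, c, w):
    """Squared weighted Euclidean distance (square of Equation (19))."""
    return sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))

def weighted_fcm(X, n_clusters, weights, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Fuzzy C-Means with per-feature distance weights.

    Returns (centers, U), where U[i][k] is the membership of point i in cluster k.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    # Random initial memberships; each row is normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        U.append([u / s for u in row])
    centers = []
    for _ in range(max_iter):
        # Eq. (17): centers are means weighted by memberships to the power m.
        centers = []
        for k in range(n_clusters):
            um = [U[i][k] ** m for i in range(n)]
            denom = sum(um)
            centers.append([sum(um[i] * X[i][j] for i in range(n)) / denom
                            for j in range(p)])
        # Eq. (18): memberships are inversely related to the weighted distances.
        new_U = []
        for x in X:
            d2 = [max(weighted_sq_dist(x, c, weights), 1e-12) for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            s = sum(inv)
            new_U.append([v / s for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(U, new_U) for a, b in zip(ra, rb))
        U = new_U
        if shift < tol:  # memberships have stabilised
            break
    return centers, U

# Toy data: two well-separated groups; the first feature carries w = 1.5.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
centers, U = weighted_fcm(X, n_clusters=2, weights=[1.5, 1.0])
labels = [max(range(2), key=row.__getitem__) for row in U]
print(labels)
```

Because the weights multiply the squared feature differences, a weight of 1.5 makes separation along an error-contributing feature count 1.5 times as much as the same separation along an unweighted feature. Counting the misclassified samples per hardened label then indicates which clusters concentrate the errors.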
Clustering was performed on the combined dataset (Data1 + Errors, 240 + 11 samples). The solution comprised five clusters, which were visualized in a 2D t-SNE projection (with centers shown as black stars (★)), and the FP/FN counts were analyzed within each cluster. Training samples identified as outliers by the Isolation Forest were excluded from the analysis.
In the five-cluster solution, 6 of the 11 misclassified test predictions (FP + FN), i.e., more than half, were concentrated in a single cluster (Cluster 3), whereas three were grouped in Cluster 4 (Figure 6). Thus, the errors were predominantly concentrated in two clusters (nine errors in total). Based on this finding, Clusters 3 and 4 were defined as difficult-to-predict regions. Training samples within these two clusters were selected for augmentation through noise injection.
The number of clusters was optimized within the range C = 2–10 using the Silhouette Score. Using FuzzyCM, the average silhouette values were computed (a(i): intra-cluster distance, b(i): nearest-cluster distance).
$$s(i) = \frac{b(i) - a(i)}{\max\left(a(i), b(i)\right)} \tag{20}$$
The highest average silhouette value was obtained at C = 5, which was selected as the optimal number of clusters (Figure 7).
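The silhouette criterion used to choose C can be illustrated with the sketch below. It assumes that fuzzy memberships are first converted to hard labels (for example, by taking the cluster with the highest membership), which the text does not state explicitly; the function names and the one-dimensional toy data are likewise assumptions.

```python
def mean_silhouette(X, labels, dist):
    """Average of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # convention for singletons or a single cluster
            continue
        # a(i): mean distance to the other members of the same cluster
        a = sum(dist(X[i], X[j]) for j in own) / len(own)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(X[i], X[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

def euclidean(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

# Toy data: two tight groups on a line.
X = [[0.0], [1.0], [10.0], [11.0]]
good = mean_silhouette(X, [0, 0, 1, 1], euclidean)
bad = mean_silhouette(X, [0, 1, 0, 1], euclidean)
print(round(good, 4))  # 0.8997
```

Repeating this evaluation for each candidate C in the range 2–10 and keeping the value with the highest average corresponds to the selection rule described above.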

2.7. Augmented Training Set Through Error-Focused Data Augmentation

Careful design of data augmentation is critical for preventing overfitting. Compared to classical methods, automated approaches, such as AutoAugment, select the most suitable augmentation strategies, thereby improving model performance, robustness, and reliability under real-world variability [58,59].
In this study, Data2 (339 samples) was created by augmenting 99 training samples from the difficult-to-predict regions through the injection of weighted Gaussian random noise (w = 1.5 applied to error-contributing features). For each original sample, one additional sample was generated, preserving the liquefaction class label.
More specifically, for each numerical feature value x, a noise term $\varepsilon \sim N(0, \sigma^2)$ was sampled, and $x' = x + \varepsilon$ was applied, allowing slight variation around the original values and improving robustness to measurement uncertainties. Equation (21) is used for error-contributing features and Equation (22) for the others.
$$\sigma_{noise} = w \cdot noise\_level \cdot \sigma_{feature} = 1.5 \cdot 0.05 \cdot \sigma_{feature} = 0.075 \cdot \sigma_{feature} \tag{21}$$
$$\sigma_{noise} = noise\_level \cdot \sigma_{feature} = 0.05 \cdot \sigma_{feature} \tag{22}$$
Here, σfeature denotes the standard deviation of the feature in the original training set, and noise_level = 0.05. Accordingly, the noise amplitude was set to approximately 7.5% of the natural variation for the critical features (5% for the others), corresponding to an amplitude 1.5 times higher. The class distribution after augmentation is shown in Table 10.
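The noise-injection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dictionary-based sample layout, and the example sample are hypothetical, and the standard deviations are taken from the full-dataset statistics in Table 3 as stand-ins for the training-set values.

```python
import random

def noise_sigma(feature_std, is_error_feature, w=1.5, noise_level=0.05):
    """Noise standard deviation: w * noise_level * std for error-contributing
    features (Equation (21)), noise_level * std otherwise (Equation (22))."""
    scale = w if is_error_feature else 1.0
    return scale * noise_level * feature_std

def augment(samples, feature_std, error_features, seed=0):
    """Create one noisy copy per sample; the class label is copied unchanged."""
    rng = random.Random(seed)
    augmented = []
    for features, label in samples:
        noisy = {}
        for name, value in features.items():
            sigma = noise_sigma(feature_std[name], name in error_features)
            noisy[name] = value + rng.gauss(0.0, sigma)
        augmented.append((noisy, label))
    return augmented

# Hypothetical sample; std values are illustrative stand-ins from Table 3
samples = [({"qc": 5000.0, "Ic": 1.98}, "Yes")]
feature_std = {"qc": 3704.66, "Ic": 0.299}
new_samples = augment(samples, feature_std, error_features={"qc"})
print(new_samples)
```

Fixing the seed keeps the generated samples reproducible, which is convenient when the same augmented set has to be reused across configurations.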
The consistency of the augmented dataset with the original distribution was evaluated using the two-sample non-parametric Kolmogorov–Smirnov (KS) test, which assesses whether two independent samples originate from the same distribution. The test statistic is defined as $D_{n,m} = \sup_{x} \left| F_{1,n}(x) - F_{2,m}(x) \right|$, where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the two samples; p ≥ 0.05 indicates that H0 cannot be rejected, and p < 0.05 indicates a significant difference. The test indicated that the augmentation preserved the distribution.
No significant differences were detected across all variables: D∈[0.027,0.105], p∈[0.056,0.998]. ECDF comparisons of the five error-contributing features, where higher noise was applied, also confirmed this result (Figure 8).
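For reference, the KS statistic D can be computed directly from the two empirical distribution functions. The sketch below covers only the statistic; the p-value computation is omitted, and the toy samples are illustrative rather than taken from the dataset.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = sup_x |F_a(x) - F_b(x)| from empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # The supremum is attained at one of the observed values
    for x in sorted(set(a) | set(b)):
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

# Toy example: original values plus slightly perturbed copies
original = [1.0, 2.0, 3.0, 4.0]
augmented = original + [1.1, 2.1, 3.1, 4.1]
d_value = ks_statistic(original, augmented)
print(d_value)  # 0.125
```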

3. New Experiments and Findings

In this section, the proposed framework was evaluated on four ensemble models (GBM, CatB, RF, and XGB) across three configurations using the fixed test set. The first configuration (Data1–Tune1) corresponds to baseline models trained on the original training data (Data1, 240 samples): hyperparameters were tuned on Data1 via grid search with five-fold cross-validation; with the final hyperparameters, models were trained on the 75% training split and evaluated on the fixed 25% test split. The second configuration (Data2–Tune1) employed the augmented training data (Data2; Data1 plus 99 new samples, total 339) while keeping the Tune1 hyperparameters fixed, to isolate the effect of augmentation at the data level. In the third configuration (Data2–Tune2), hyperparameters were re-optimized on Data2 using the same five-fold CV grid-search procedure.
The proposed error-focused augmentation uses misclassified predictions (FP/FN) to identify difficult-to-predict regions via the weighted Fuzzy C-Means clustering of the combined Data1 + Errors samples; training samples in these regions are then augmented through weighted Gaussian random-noise injection to form Data2. Test performance is reported together with cross-validation mean ± std F1 to support stability-aware comparisons across Data1–Tune1, Data2–Tune1, and Data2–Tune2.
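The effect of weighting on region identification can be sketched with the standard Fuzzy C-Means membership formula. The exact way the weight enters the distance is not spelled out here, so the form below (a per-feature multiplier on the squared difference), the fuzzifier m = 2, and the toy coordinates are assumptions for illustration only.

```python
def weighted_sq_dist(x, center, weights):
    """Squared Euclidean distance with a per-feature weight (assumed form)."""
    return sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, center))

def fcm_memberships(x, centers, weights, m=2.0):
    """Standard Fuzzy C-Means membership of point x in each cluster center."""
    d2 = [weighted_sq_dist(x, c, weights) for c in centers]
    if any(d == 0.0 for d in d2):
        # x coincides with one or more centers: share membership among them
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** power for d in d2]
    total = sum(inv)
    return [v / total for v in inv]

# Toy example with two (standardized) features; the first is treated as
# error-contributing and receives w = 1.5
x = (1.0, 0.0)
centers = [(0.0, 0.0), (1.0, 1.0)]
u_plain = fcm_memberships(x, centers, weights=[1.0, 1.0])
u_weighted = fcm_memberships(x, centers, weights=[1.5, 1.0])
print(u_plain, u_weighted)
```

In this toy case the unweighted memberships are equal, whereas the weighted distance penalizes the mismatch in the first feature and shifts membership toward the second center.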
This comparison investigated the impact of error-focused data augmentation on classification performance. The Tune2 hyperparameters were determined as follows: CatB (120 iterations, max depth = 2, learning rate = 0.30); GBM (60 trees, max depth = 4, learning rate = 0.30); XGB (80 trees, max depth = 3, learning rate = 0.30); and RF (40 trees, max depth = 5, max features = number of features).
According to Figure 9 and Tables 4 and 11, test F1 increased across all models with Data2 compared to Data1–Tune1, most notably in RF (0.906 → 0.925), with five-fold CV metrics supporting these gains. With Data2, ROC AUC improved in GBM and XGB, while precision increased in CatB, recall in GBM, and both precision and recall in RF and XGB relative to Data1–Tune1.
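As a consistency check on the baseline, the RF test metrics in Table 4 can be recomputed from the error counts: Table 5 lists 3 false positives and 7 false negatives for RF, and Table 1 reports 55 liquefied samples in the test set. The helper name `prf1` is introduced here only for illustration.

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

positives = 55   # liquefied test samples (Table 1)
fp, fn = 3, 7    # RF errors listed in Table 5
p, r, f1 = prf1(positives - fn, fp, fn)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.941 0.873 0.906
```

The recomputed values match the Data1–Tune1 RF test row of Table 4, which is the baseline against which the 0.925 result is compared.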

4. Discussion

Within the proposed error-focused data augmentation framework, augmentation is intentionally applied only within difficult-to-predict regions, defined as clusters in which misclassified test instances (FP/FN) are concentrated. This targeted strategy is motivated by data-limited tabular settings, where augmenting the entire feature space can increase the computational cost and generate redundant perturbations in already well-separated regions. By performing noise injection only within these error-prone clusters, the framework aims to reduce errors near the decision boundary while leaving well-separated regions (i.e., clusters outside the difficult-to-predict regions, where FP/FN are not concentrated) largely unchanged.
In this context, the weighting factor was fixed at w = 1.5 as a pragmatic default to demonstrate the proposed SHAP-based, error-focused augmentation framework without introducing an additional tuning loop for w, similar in spirit to fixing the perturbation budget when specifying the threat model in early adversarial training studies [60]. Importantly, the augmented training data remained statistically consistent with the original training data, as supported by the two-sample Kolmogorov–Smirnov tests across the variables (D = 0.027–0.105; p = 0.056–0.998) and the ECDF comparisons, with post-augmentation class proportions remaining comparable to the original labels. A systematic calibration and/or optimization of w is a natural extension of this work and is reserved for future studies.
Error explanations are embedded into the augmentation pipeline in three stages. First, SHAP-based error-contribution scores are computed to identify and prioritize the features most responsible for false positives and false negatives. Second, weighted clustering is employed to localize difficult-to-predict regions in the feature space by amplifying the influence of the prioritized features during region identification. Third, controlled Gaussian noise is injected using a higher perturbation amplitude for the prioritized features to generate targeted new training samples that remain close to the original data manifold while improving robustness in those regions.
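The first stage can be illustrated with the SHAP values reported in Table 8 for false-positive sample 112. The scoring rule below (mean absolute SHAP value per feature, normalized to sum to one) is only an assumed stand-in for the error-contribution score used in the study, and the ASCII feature names `sigma_v` and `sigma_v_eff` are placeholders for σv and σ’v.

```python
FEATURES = ["M", "amax", "z", "qc", "fs", "CSR", "Ic", "FC", "sigma_v", "sigma_v_eff"]

# SHAP values (Class 1) for false-positive sample 112, copied from Table 8
SHAP_112 = {
    "CatB": [0.013, 0.018, -0.009, 0.206, -0.013, -0.031, 0.028, -0.005, -0.016, 0.011],
    "RF": [0.014, -0.014, -0.008, 0.141, -0.013, -0.042, 0.044, 0.014, -0.005, -0.002],
    "XGB": [0.008, -0.007, 0.002, 0.238, 0.025, -0.010, 0.003, 0.027, -0.048, -0.036],
    "GBM": [0.014, -0.012, -0.009, 0.202, -0.005, -0.028, 0.021, 0.014, -0.003, 0.005],
}

def contribution_scores(shap_rows):
    """Mean |SHAP| per feature, normalized to sum to 1 (illustrative definition)."""
    n = len(shap_rows)
    mean_abs = [sum(abs(row[k]) for row in shap_rows) / n for k in range(len(FEATURES))]
    total = sum(mean_abs)
    return {name: value / total for name, value in zip(FEATURES, mean_abs)}

scores = contribution_scores(list(SHAP_112.values()))
top_feature = max(scores, key=scores.get)
print(top_feature)  # qc
```

For this sample, cone tip resistance dominates the contribution in all four models, which is consistent with qc being among the five error-contributing features.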
Across the examined ensemble models, performance gains were consistently observed after augmentation, indicating that the framework operates at the data level rather than relying on a specific model architecture. The most notable improvement was achieved by RF, for which the test F1 score increased by approximately 0.019 (0.906 → 0.925). The iterative use of observed errors to guide subsequent augmentation can be interpreted as an active-learning-style feedback loop, in which the learner is repeatedly exposed to informative, hard-to-classify neighborhoods of the feature space.
The KS tests, together with the ECDF comparisons, suggest that the augmentation procedure did not introduce a noticeable shift in the underlying training distribution, and the post-augmentation class distribution remained comparable to the original label proportions.
Despite these encouraging results, the dataset size is limited, and further validation on liquefaction databases spanning different scales and geotechnical conditions is required to demonstrate broader generalizability. Future extensions may also consider adaptive choices of the weighting factor (w) and the number of generated samples to better match dataset complexity, feature dimensionality, and potential class imbalance. Finally, the SHAP-based error-contribution scoring can be further refined for single-model settings, alternative ensemble structures, or different modeling pipelines (noting that the present study normalizes the scores on a per-model basis using the combined FP/FN errors across the four selected models).
From a geotechnical perspective, a closer examination of the misclassified samples (Table 6) reveals that the errors are not randomly distributed but are heavily concentrated in transitional soil types. Most of the false positive and false negative instances exhibit soil behavior type index (Ic) values in the range of 1.80–2.20 or possess high fines content (FC). These ranges correspond to the transition zone between sand and silty sand/sandy silt mixtures, where standard CPT-based empirical correlations often face the greatest uncertainty regarding drainage conditions and liquefaction susceptibility. The clustering of errors in these zones suggests that the base models struggled to delineate the boundary in these physically ambiguous transition zones, and the proposed augmentation framework successfully sharpened the decision boundary specifically in this geotechnical ‘grey area’.
A critical design choice in this study was the exclusion of derived engineering indices, specifically the soil behavior type index (Ic), from the XAI-based weighting and augmentation process, while retaining Ic in the baseline feature set for model training. Factor of safety (FS) values were excluded to prevent the machine learning models from merely replicating existing semi-empirical simplified procedures.

5. Conclusions

  • The method applies SHAP-based error weighting (w = 1.5) to emphasize the five most error-contributing features and to generate targeted training samples via (i) weighted Fuzzy C-Means clustering to delineate difficult-to-predict regions and (ii) controlled 1.5× Gaussian-noise injection into those features in those regions.
  • Future research should focus on validating this framework on larger, multi-site databases and exploring adaptive weighting mechanisms (w) to further tailor the augmentation to varying site conditions.
  • When evaluated on the fixed hold-out test set, the augmented data configurations (Data2–Tune1 and Data2–Tune2) improved the predictive performance across all examined models (GBM, CatB, RF, and XGB) relative to the baseline Data1–Tune1, with the largest gain observed for the RF model, where the test F1 score increased by +0.019 (0.906 → 0.925).
  • The analysis of misclassifications revealed that model errors were not random but heavily concentrated in geotechnical “transition zones” (typically silty sands and sandy silts with Ic values between 1.80 and 2.20).
  • The approach operates at the data level and is model-agnostic; by using error feedback (FP/FN) to guide subsequent augmentation, it functions similarly to an active-learning-style feedback loop while maintaining statistical consistency, as supported by the KS test results.
  • Limitations include the relatively small dataset size and the need for further validation on larger and more diverse liquefaction datasets under different geotechnical conditions.
  • Owing to its model-agnostic, data-level design, the proposed framework is applicable beyond liquefaction to broader geotechnical and general tabular classification problems.

Author Contributions

Conceptualization, E.N.; methodology, E.N. and A.T.T.; software, A.T.T. and B.Y.; validation, E.N.; formal analysis, A.T.T.; investigation, E.N. and A.T.T.; resources, E.N. and B.Y.; data curation, A.T.T.; writing—original draft preparation, E.N., A.T.T. and B.Y.; writing—review and editing, E.N., A.T.T. and B.Y.; visualization, A.T.T. and B.Y.; supervision, E.N. and B.Y.; project administration, E.N. and B.Y.; funding acquisition, A.T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the TÜBİTAK BİDEB 2211-A scholarship program.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and codes presented in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
|---|---|
| AdaB | AdaBoost |
| amax | Peak Ground Acceleration (PGA) |
| ANN | Artificial Neural Network |
| BRF | Balanced Random Forest |
| CatB | CatBoost |
| CPT | Cone Penetration Test |
| CRR | Cyclic Resistance Ratio |
| CSR | Cyclic Stress Ratio |
| CV | Cross-Validation |
| ECDF | Empirical Cumulative Distribution Function |
| EFB | Exclusive Feature Bundling |
| ExT | Extra Trees |
| FC | Fines Content |
| FuzzyCM | Fuzzy C-Means |
| fs | Cone Sleeve Resistance |
| FS | Factor of Safety (against liquefaction) |
| GBM | Gradient Boosting Machine |
| GOSS | Gradient-based One-Side Sampling |
| H0 | Null Hypothesis (in statistical tests) |
| Ic | Soil Behavior Type Index |
| LGBM | Light Gradient Boosting Machine |
| LPI | Liquefaction Potential Index |
| LSN | Liquefaction Severity Number |
| M | Earthquake Magnitude |
| ML | Machine Learning |
| NGB | Natural Gradient Boosting |
| PCA | Principal Component Analysis |
| qc | Cone Tip Resistance |
| qc1Ncs | Normalized Clean Sand Cone Tip Resistance |
| rd | Stress Reduction Factor |
| RF | Random Forest |
| ROC AUC | Receiver Operating Characteristic—Area Under Curve |
| RoF | Rotation Forest |
| SMMO | Selective Minority Oversampling |
| SMOTE | Synthetic Minority Oversampling Technique |
| SVM | Support Vector Machines |
| SHAP | SHapley Additive exPlanations |
| t-SNE | T-distributed Stochastic Neighbor Embedding |
| w | Weighting Factor |
| WGAN | Wasserstein Generative Adversarial Network |
| XAI | Explainable Artificial Intelligence |
| XGB | Extreme Gradient Boosting (XGBoost) |
| σv | Total Vertical Stress |
| σ’v | Effective Vertical Stress |

References

  1. Toprak, S.; Zulfikar, A.C.; Mutlu, A.; Tugsal, U.M.; Nacaroglu, E.; Karabulut, S.; Karimzadeh, S. The aftermath of 2023 Kahramanmaras earthquakes: Evaluation of strong motion data, geotechnical, building, and infrastructure issues. Nat. Hazards 2025, 121, 2155–2192.
  2. Ozener, P.; Monkul, M.M.; Bayat, E.E.; Ari, A.; Cetin, K.O. Liquefaction and performance of foundation systems in Iskenderun during 2023 Kahramanmaras-Turkiye earthquake sequence. Soil Dyn. Earthq. Eng. 2024, 178, 108433.
  3. Seed, H.B.; Idriss, I.M. Analysis of soil liquefaction: Niigata earthquake. J. Soil Mech. Found. Div. 1967, 93, 83–108.
  4. Dobry, R. Prediction of Pore Water Pressure Buildup and Liquefaction of Sands During Earthquakes by the Cyclic Strain Method; US Department of Commerce: Washington, DC, USA, 1982.
  5. Berrill, J.B.; Davis, R.O. Energy dissipation and seismic liquefaction of sands: Revised model. Soils Found. 1985, 25, 106–118.
  6. Seed, R.B.; Cetin, K.O.; Moss, R.E.; Kammerer, A.M.; Wu, J.; Pestana, J.M.; Riemer, M.F.; Sancio, R.B.; Bray, R.R.; Kayen, R.R.; et al. Recent advances in soil liquefaction engineering: A unified and consistent framework. In Proceedings of the 26th Annual ASCE Los Angeles Geotechnical Spring Seminar, Long Beach, CA, USA, 30 April 2003.
  7. Youd, T.L.; Idriss, I.M.; Andrus, R.D.; Arango, I.; Castro, G.; Christian, J.T.; Dobry, R.; Finn, L.; Harder, L.F., Jr.; Koester, J.P.; et al. Liquefaction Resistance of Soils: Summary Report from the 1996 NCEER and 1998 NCEER/NSF Workshops on Evaluation of Liquefaction Resistance of Soils. J. Geotech. Geoenviron. Eng. 2001, 127, 817–833.
  8. Bray, J.D.; Sancio, R.B.; Riemer, M.F.; Durgunoglu, T. Liquefaction susceptibility of fine-grained soils. In Proceedings of the 11th International Conference on Soil Dynamics and Earthquake Engineering and 3rd International Conference on Earthquake Geotechnical Engineering, Berkeley, CA, USA, 7–9 January 2004; pp. 655–662.
  9. Iwasaki, T.; Arakawa, T.; Tokida, K.I. Simplified procedures for assessing soil liquefaction during earthquakes. Int. J. Soil Dyn. Earthq. Eng. 1984, 3, 49–58.
  10. Yasuda, S.; Nagase, H.; Kiku, H.; Uchida, Y. The Mechanism and A Simplified Procedure for the Analysis of Permanent Ground Displacement due to Liquefaction. Soils Found. 1992, 32, 149–160.
  11. Jas, K.; Dodagoudar, G.R. Liquefaction Potential Assessment of Soils Using Machine Learning Techniques: A State-of-the-Art Review from 1994–2021. Int. J. Geomech. 2023, 23, 03123002.
  12. Jas, K.; Dodagoudar, G.R. Explainable machine learning model for liquefaction potential assessment of soils using XGBoost-SHAP. Soil Dyn. Earthq. Eng. 2023, 165, 107662.
  13. Iwasaki, T.; Tatsuoka, F.; Tokida, K.; Yasuda, S. A practical method for assessing soil liquefaction potential based on case studies at various sites in Japan. In Proceedings of the 2nd International Conference on Microzonation for Safer Construction—Research and Application, San Francisco, CA, USA, 26 November–1 December 1978; Volume 2, pp. 885–896.
  14. Toprak, S.; Holzer, T.L. Liquefaction potential index: Field assessment. J. Geotech. Geoenviron. Eng. 2003, 129, 315–322.
  15. van Ballegooy, S.; Wentz, F.; Boulanger, R.W. Evaluation of CPT-based liquefaction procedures at regional scale. Soil Dyn. Earthq. Eng. 2015, 79, 315–334.
  16. Toprak, S.; Nacaroglu, E.; van Ballegooy, S.; Koc, A.C.; Jacka, M.; Manav, Y.; O’Rourke, T.D. Segmented pipeline damage predictions using liquefaction vulnerability parameters. Soil Dyn. Earthq. Eng. 2019, 125, 105758.
  17. Tonkin & Taylor Ltd. Canterbury Earthquake Sequence: Increased Liquefaction Vulnerability Assessment Methodology; Report No. 52010.140.v1.0; Tonkin & Taylor Ltd.: Auckland, New Zealand, 2015.
  18. Tung, A.T.; Wang, Y.Y.; Wong, F.S. Assessment of liquefaction potential using neural networks. Soil Dyn. Earthq. Eng. 1993, 12, 325–335.
  19. Goh, A.T. Seismic liquefaction potential assessed by neural networks. J. Geotech. Geoenviron. Eng. 1994, 120, 1467–1480.
  20. Alobaidi, M.H.; Meguid, M.A.; Chebana, F. Predicting seismic-induced liquefaction through ensemble learning frameworks. Sci. Rep. 2019, 9, 11786.
  21. Chen, M.; Kang, X.; Ma, X. Deep Learning-Based Enhancement of Small Sample Liquefaction Data. Int. J. Geomech. 2023, 23, 04023176.
  22. Preethaa, S.; Natarajan, Y.; Rathinakumar, A.P.; Lee, D.E.; Choi, Y.; Park, Y.J.; Yi, C.Y. A stacked generalization model to enhance prediction of earthquake-induced soil liquefaction. Sensors 2022, 22, 7292.
  23. Minarelli, L.; Amoroso, S.; Civico, R.; De Martini, P.M.; Lugli, S.; Martelli, L.; Molisso, F.; Rollins, K.M.; Salocchi, A.; Stefani, M.; et al. Liquefied sites of the 2012 Emilia earthquake: A comprehensive database of the geological and geotechnical features (Quaternary alluvial Po plain, Italy). Bull. Earthq. Eng. 2022, 20, 3659–3697.
  24. Hudson, K.S.; Zimmaro, P.; Ulmer, K.; Brandenberg, S.J.; Stewart, J.P.; Kramer, S.L. Laboratory component of next-generation liquefaction project database. In Proceedings of the 4th International Conference on Performance-Based Design in Earthquake Geotechnical Engineering, Beijing, China, 15–17 July 2022; Springer International Publishing: Cham, Switzerland, 2022; pp. 1865–1874.
  25. Zhang, X.; He, B.; Sabri, M.M.S.; Al-Bahrani, M.; Ulrikh, D.V. Soil liquefaction prediction based on Bayesian optimization and support vector machines. Sustainability 2022, 14, 11944.
  26. De La Calleja, J.; Fuentes, O.; González, J. Selecting minority examples from misclassified data for over-sampling. In Proceedings of the Florida Artificial Intelligence Research Society International Conference (FLAIRS), Coconut Grove, FL, USA, 15–17 May 2008; pp. 276–281.
  27. Khmaissia, F.; Frigui, H. Confidence-guided data augmentation for improved semi-supervised training. arXiv 2022, arXiv:2209.08174.
  28. Apicella, A.; Giugliano, S.; Isgrò, F.; Prevete, R. SHAP-based explanations to improve classification systems. In Proceedings of the Italian Workshop on Explainable Artificial Intelligence (XAI.it@AIxIA 2023), Rome, Italy, 6–9 November 2023; pp. 76–86.
  29. Boulanger, R.W.; Idriss, I.M. CPT and SPT Based Liquefaction Triggering Procedures; Report No. UCD/CGM-14/01; Center for Geotechnical Modeling, Department of Civil and Environmental Engineering, University of California: Davis, CA, USA, 2014.
  30. Juang, C.H.; Yuan, H.; Lee, D.H.; Lin, P.S. Simplified Cone Penetration Test-based method for evaluating liquefaction resistance of soils. J. Geotech. Geoenviron. Eng. 2003, 129, 66–80.
  31. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828.
  32. Robertson, P.K.; Wride, C.E. Evaluating cyclic liquefaction potential using the cone penetration test. Can. Geotech. J. 1998, 35, 442–459.
  33. Robertson, P.K. Interpretation of cone penetration tests—A unified approach. Can. Geotech. J. 2009, 46, 1337–1355.
  34. Seed, H.B.; Idriss, I.M. Simplified procedure for evaluating soil liquefaction potential. J. Soil Mech. Found. Div. 1971, 97, 1249–1273.
  35. Idriss, I.M. An update to the Seed-Idriss simplified procedure for evaluating liquefaction potential. In Proceedings of the TRB Workshop on New Approaches to Liquefaction, Washington, DC, USA, 10 January 1999; Federal Highway Administration: Washington, DC, USA, 1999. No. FHWARD-99-165.
  36. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
  37. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
  38. Chen, C.; Liaw, A.; Breiman, L. Using Random Forest to Learn Imbalanced Data; Technical Report 666; Department of Statistics, University of California: Berkeley, CA, USA, 2004.
  39. Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
  40. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
  41. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
  42. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
  43. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 3146–3154.
  44. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31, pp. 6638–6648.
  45. Islam, S.M.M.; Hossain, S.M.M.; Ray, S. DTI-SNNFRA: Drug-target interaction prediction by shared nearest neighbors and fuzzy-rough approximation. PLoS ONE 2021, 16, e0246920.
  46. Weng, Y.; Liu, Y.; Chuang, H.-H. Intelligent Assessment of Scientific Creativity by Integrating Data Augmentation and Pseudo-Labeling. Information 2025, 16, 785.
  47. Nguyen, H.A.T.; Pham, D.H.; Ahn, Y. Effect of Data Augmentation Using Deep Learning on Predictive Models for Geopolymer Compressive Strength. Appl. Sci. 2024, 14, 3601.
  48. Kaneda, Y.; Pei, Y.; Zhao, Q.; Liu, Y. Improving the performance of the decision boundary making algorithm via outlier detection. J. Inf. Process. 2015, 23, 497–504.
  49. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
  50. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.
  51. Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–4 October 2018; pp. 80–89.
  52. Mikołajczyk-Bareła, A.; Ferlin, M.; Grochowski, M. Targeted data augmentation for improving model robustness. Int. J. Appl. Math. Comput. Sci. 2025, 35, 143–155.
  53. Rickert, C.A.; Henkel, M.; Lieleg, O. An efficiency-driven, correlation-based feature elimination strategy for small datasets. APL Mach. Learn. 2023, 1, 016105.
  54. Mali, N.; Dutt, V.; Uday, K.V. Determining the Geotechnical Slope Failure Factors via Ensemble and Individual Machine Learning Techniques: A Case Study in Mandi, India. Front. Earth Sci. 2021, 9, 701837.
  55. Wang, C.H.; Cheng, C.S.; Lee, T.T. Dynamical optimal training for interval type-2 fuzzy neural network (T2FNN). IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2004, 34, 1462–1477.
  56. Zhou, P.; Qi, Z.; Zheng, S.; Xu, J.; Bao, H.; Xu, B. Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. arXiv 2016, arXiv:1611.06639.
  57. Lu, Q.; Shi, P.; Lam, H.K.; Zhao, Y. Interval type-2 fuzzy model predictive control of nonlinear networked control systems. IEEE Trans. Fuzzy Syst. 2015, 23, 2317–2328.
  58. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 113–123.
  59. Wong, S.C.; Gatt, A.; Stamatescu, V.; McDonnell, M.D. Understanding data augmentation for classification: When to warp? In Proceedings of the 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, 30 November–2 December 2016; pp. 1–6.
  60. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083.
Figure 1. Data augmentation framework for improving model performance.
Figure 2. Error-contribution scores for the ten features.
Figure 3. Correlation matrix of the liquefaction dataset.
Figure 4. General workflow of the error-focused and XAI-supported weighting strategy.
Figure 5. Detailed workflow of applying the data augmentation framework to soil liquefaction classification.
Figure 6. Two-dimensional view of Weighted FuzzyCM (feature space reduced to two components). Five clusters were identified (centers shown as black ★, clusters labeled 1–5). Misclassified samples are shown in red (FP) and green (FN).
Figure 7. Selection of the number of clusters based on Silhouette Score optimization.
Figure 8. ECDF plots of the five error-contributing features for the original and the augmented training data. (a) qc, (b) fs, (c) FC, (d) M, (e) amax.
Figure 9. Comparison of the test results ((a) F1 score and (b) ROC AUC values) for the four models across three experimental setups: before augmentation (Data1–Tune1) and after augmentation (Data2–Tune1, Data2–Tune2).
Table 1. Number of samples in liquefaction classes of the dataset.
| Dataset | Liquefied (Yes) | Non-Liquefied (No) | Liquefied (%) | Non-Liquefied (%) |
|---|---|---|---|---|
| Full Data | 229 | 92 | 71.34 | 28.66 |
| Training | 174 | 66 | 72.5 | 27.5 |
| Test | 55 | 26 | 67.9 | 32.1 |
Table 2. Data acquisition methods and data types.
| Parameter (Symbol) | Data Source |
|---|---|
| Earthquake magnitude (M) | Measured |
| Peak ground acceleration (amax) | Measured |
| Depth (z) | Measured |
| Cone tip resistance (qc) | Measured |
| Cone sleeve resistance (fs) | Measured |
| Cyclic stress ratio (CSR) | Calculated |
| Soil behavior index (Ic) | Calculated |
| Fines content (FC) | Measured (if available, otherwise calculated) |
| Total stress (σv) | Measured |
| Effective stress (σ’v) | Measured |
Table 3. General statistical values of the dataset.
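| Parameter | Count | Mean | Std. dev. | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| M (Mw) | 321 | 6.960 | 0.566 | 5.900 | 6.600 | 7.100 | 7.100 | 9.000 |
| amax (g) | 321 | 0.322 | 0.149 | 0.090 | 0.210 | 0.250 | 0.400 | 0.840 |
| z (m) | 321 | 4.700 | 2.260 | 1.400 | 2.900 | 4.300 | 5.900 | 12.700 |
| qc (kPa) | 321 | 5792.890 | 3704.660 | 784.530 | 3255.810 | 5000.000 | 7500.000 | 25,000.000 |
| fs (kPa) | 321 | 43.857 | 44.797 | 0.980 | 17.600 | 31.200 | 55.897 | 362.846 |
| CSR | 321 | 0.273 | 0.126 | 0.070 | 0.170 | 0.245 | 0.345 | 0.695 |
| Ic | 321 | 1.985 | 0.299 | 1.244 | 1.811 | 1.980 | 2.194 | 2.959 |
| FC (%) | 321 | 23.238 | 20.876 | 0.000 | 5.000 | 19.385 | 34.970 | 99.766 |
| σv (kPa) | 321 | 87.676 | 43.079 | 24.000 | 53.200 | 80.000 | 109.300 | 235.500 |
| σ’v (kPa) | 321 | 62.444 | 28.495 | 19.000 | 41.000 | 56.500 | 77.500 | 161.600 |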
Table 4. Data1–Tune1 experimental results with ten decision tree-based ensemble learning algorithms.
| Model | Subset | F1 Score | ROC AUC | Mean F1 (CV) | Std. dev. of F1 (CV) | Precision | Recall |
|---|---|---|---|---|---|---|---|
| GBM | Train | 0.974 | 0.994 | 0.927 | 0.022 | 0.994 | 0.954 |
| GBM | Test | 0.925 | 0.924 | 0.927 | 0.022 | 0.961 | 0.891 |
| CatB | Train | 0.949 | 0.986 | 0.912 | 0.022 | 0.943 | 0.954 |
| CatB | Test | 0.917 | 0.945 | 0.912 | 0.022 | 0.926 | 0.909 |
| XGB | Train | 0.940 | 0.978 | 0.899 | 0.019 | 0.932 | 0.948 |
| XGB | Test | 0.907 | 0.917 | 0.899 | 0.019 | 0.925 | 0.891 |
| RF | Train | 0.962 | 0.990 | 0.903 | 0.037 | 0.976 | 0.948 |
| RF | Test | 0.906 | 0.924 | 0.903 | 0.037 | 0.941 | 0.873 |
| ExT | Train | 0.939 | 0.965 | 0.897 | 0.025 | 0.953 | 0.925 |
| ExT | Test | 0.899 | 0.919 | 0.897 | 0.025 | 0.907 | 0.891 |
| RoF | Train | 0.942 | 0.980 | 0.888 | 0.048 | 0.958 | 0.925 |
| RoF | Test | 0.893 | 0.909 | 0.888 | 0.048 | 0.958 | 0.836 |
| LGBM | Train | 0.946 | 0.976 | 0.891 | 0.028 | 0.933 | 0.960 |
| LGBM | Test | 0.889 | 0.903 | 0.891 | 0.028 | 0.906 | 0.873 |
| NGB | Train | 0.944 | 0.989 | 0.904 | 0.025 | 0.914 | 0.977 |
| NGB | Test | 0.883 | 0.886 | 0.904 | 0.025 | 0.875 | 0.891 |
| BRF | Train | 0.909 | 0.932 | 0.887 | 0.029 | 0.933 | 0.885 |
| BRF | Test | 0.874 | 0.897 | 0.887 | 0.029 | 0.938 | 0.818 |
| AdaB | Train | 0.956 | 0.984 | 0.895 | 0.027 | 0.982 | 0.931 |
| AdaB | Test | 0.874 | 0.931 | 0.895 | 0.027 | 0.938 | 0.818 |
Table 5. Indices of FP and FN samples observed in the four models (with indices in the 321-sample dataset).
| Model | False Positives (FP) | False Negatives (FN) |
|---|---|---|
| CatB | 243, 310, 112, 19 | 182, 254, 203, 247, 253 |
| RF | 243, 112, 19 | 182, 162, 254, 203, 93, 247, 253 |
| XGB | 243, 310, 112, 32 | 182, 162, 254, 93, 309, 253 |
| GBM | 243, 112 | 182, 162, 254, 203, 247, 253 |
Table 6. Thirteen FP and FN samples of the four selected models (before outlier detection analysis).
Table 6. Thirteen FP and FN samples of the four selected models (before outlier detection analysis).
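| Index | Error Type | M (Mw) | amax (g) | z (m) | qc (kPa) | fs (kPa) | CSR (-) | Ic (-) | FC (%) | σv (kPa) | σ’v (kPa) | True Label | Predicted Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | FP | 6.00 | 0.37 | 11.00 | 6400.00 | 76.80 | 0.33 | 2.16 | 35.89 | 209.00 | 119.00 | No | Yes |
| 32 | FP | 6.20 | 0.13 | 4.80 | 5226.94 | 77.47 | 0.13 | 2.14 | 30.00 | 90.00 | 54.00 | No | Yes |
| 112 | FP | 7.10 | 0.24 | 3.00 | 4700.00 | 28.20 | 0.17 | 1.94 | 18.13 | 58.50 | 51.50 | No | Yes |
| 243 | FP | 7.10 | 0.38 | 2.80 | 900.00 | 44.10 | 0.37 | 2.84 | 90.01 | 52.30 | 34.10 | No | Yes |
| 310 | FP | 7.80 | 0.16 | 7.40 | 6472.30 | 24.52 | 0.15 | 1.82 | 2.00 | 135.00 | 86.00 | No | Yes |
| 93 | FN | 6.90 | 0.60 | 4.60 | 9806.65 | 14.71 | 0.59 | 1.41 | 0.00 | 86.00 | 54.00 | Yes | No |
| 162 | FN | 7.10 | 0.28 | 7.00 | 8500.00 | 59.50 | 0.27 | 1.85 | 11.12 | 133.00 | 84.00 | Yes | No |
| 182 | FN | 7.10 | 0.27 | 5.80 | 9400.00 | 84.60 | 0.27 | 1.84 | 10.48 | 109.30 | 67.60 | Yes | No |
| 203 | FN | 7.10 | 0.21 | 7.30 | 7700.00 | 30.80 | 0.18 | 1.79 | 6.37 | 140.70 | 98.50 | Yes | No |
| 247 | FN | 7.10 | 0.17 | 7.30 | 7227.50 | 43.15 | 0.17 | 1.87 | 12.16 | 138.00 | 80.00 | Yes | No |
| 253 | FN | 7.10 | 0.23 | 3.70 | 7600.15 | 40.21 | 0.25 | 1.69 | 0.00 | 70.00 | 42.00 | Yes | No |
| 254 | FN | 7.10 | 0.22 | 4.80 | 9149.60 | 56.88 | 0.25 | 1.70 | 0.00 | 92.00 | 50.00 | Yes | No |
| 309 | FN | 7.80 | 0.18 | 3.30 | 3236.20 | 12.75 | 0.14 | 1.99 | 5.00 | 58.00 | 48.00 | Yes | No |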
Table 7. Anomaly scores of the 13 test misclassifications obtained through outlier detection.
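| Type | FP | FP | FP | FP | FP |
|---|---|---|---|---|---|
| Index | 19 | 32 | 112 | 243 | 310 |
| Score | −0.045 | 0.052 | 0.129 | −0.036 | 0.040 |
| Outlier? | Yes | No | No | Yes | No |

| Type | FN | FN | FN | FN | FN | FN | FN | FN |
|---|---|---|---|---|---|---|---|---|
| Index | 93 | 162 | 182 | 203 | 247 | 253 | 254 | 309 |
| Score | 0.018 | 0.088 | 0.076 | 0.071 | 0.082 | 0.107 | 0.103 | 0.088 |
| Outlier? | No | No | No | No | No | No | No | No |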
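
The outlier flags in Table 7 are consistent with a simple sign rule: a sample is marked as an outlier when its anomaly score is negative. The sketch below applies that rule to the tabulated scores. The zero threshold is an assumption inferred from the table (it matches the convention of scikit-learn's `IsolationForest.decision_function`, although the detector used is not restated here), and the variable names are illustrative.

```python
# (index, error type, anomaly score) copied from Table 7
scores = [
    (19, "FP", -0.045), (32, "FP", 0.052), (112, "FP", 0.129),
    (243, "FP", -0.036), (310, "FP", 0.040),
    (93, "FN", 0.018), (162, "FN", 0.088), (182, "FN", 0.076),
    (203, "FN", 0.071), (247, "FN", 0.082), (253, "FN", 0.107),
    (254, "FN", 0.103), (309, "FN", 0.088),
]

# Assumed decision rule: negative score -> outlier
THRESHOLD = 0.0

outliers = [idx for idx, _, score in scores if score < THRESHOLD]
retained = [idx for idx, _, score in scores if score >= THRESHOLD]

print("outliers:", outliers)
print("retained:", retained)
```

Under this rule, samples 19 and 243 are flagged and the remaining eleven misclassifications are retained, which matches the set of samples for which SHAP values are reported in Tables 8 and 9.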
Table 8. The SHAP values of the FP test samples with respect to Class 1.
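| Global Index | Model | M | amax | z | qc | fs | CSR | Ic | FC | σv | σ’v |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 310 | CatB | 0.016 | −0.140 | 0.016 | 0.003 | 0.109 | −0.137 | 0.003 | −0.003 | 0.041 | 0.003 |
| 310 | XGB | 0.033 | −0.126 | 0.005 | −0.048 | 0.057 | −0.083 | 0.045 | −0.049 | 0.074 | 0.014 |
| 112 | CatB | 0.013 | 0.018 | −0.009 | 0.206 | −0.013 | −0.031 | 0.028 | −0.005 | −0.016 | 0.011 |
| 112 | RF | 0.014 | −0.014 | −0.008 | 0.141 | −0.013 | −0.042 | 0.044 | 0.014 | −0.005 | −0.002 |
| 112 | XGB | 0.008 | −0.007 | 0.002 | 0.238 | 0.025 | −0.010 | 0.003 | 0.027 | −0.048 | −0.036 |
| 112 | GBM | 0.014 | −0.012 | −0.009 | 0.202 | −0.005 | −0.028 | 0.021 | 0.014 | −0.003 | 0.005 |
| 32 | XGB | −0.033 | −0.149 | −0.008 | 0.202 | −0.077 | −0.096 | −0.001 | 0.034 | 0.041 | −0.025 |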
Table 9. The SHAP values of the FN test samples with respect to Class 1.
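| Global Index | Model | M | amax | z | qc | fs | CSR | Ic | FC | σv | σ’v |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 182 | CatB | 0.014 | −0.014 | 0.014 | −0.299 | −0.143 | 0.091 | 0.021 | −0.003 | 0.045 | 0.012 |
| 182 | RF | 0.003 | −0.006 | 0.020 | −0.250 | −0.097 | 0.060 | 0.029 | 0.007 | 0.022 | 0.009 |
| 182 | XGB | 0.000 | −0.005 | 0.030 | −0.303 | −0.061 | −0.013 | 0.051 | −0.029 | 0.077 | −0.012 |
| 182 | GBM | −0.001 | −0.040 | 0.023 | −0.259 | −0.162 | 0.068 | 0.015 | −0.006 | 0.033 | 0.018 |
| 254 | CatB | 0.014 | −0.038 | −0.002 | −0.301 | −0.086 | 0.082 | −0.088 | −0.003 | 0.028 | −0.027 |
| 254 | RF | 0.008 | −0.029 | 0.004 | −0.234 | −0.071 | 0.036 | −0.105 | −0.069 | 0.001 | −0.015 |
| 254 | XGB | 0.001 | −0.038 | −0.008 | −0.285 | −0.054 | −0.003 | −0.056 | −0.037 | 0.006 | −0.040 |
| 254 | GBM | 0.008 | −0.024 | −0.001 | −0.258 | −0.098 | 0.038 | −0.081 | −0.025 | 0.000 | −0.034 |
| 203 | CatB | 0.015 | −0.050 | 0.015 | −0.180 | −0.056 | −0.075 | 0.009 | −0.003 | 0.047 | 0.013 |
| 203 | RF | 0.004 | −0.046 | 0.019 | −0.138 | −0.027 | −0.070 | 0.028 | 0.000 | 0.007 | 0.008 |
| 203 | GBM | 0.007 | −0.040 | 0.028 | −0.140 | −0.069 | −0.081 | 0.020 | −0.036 | 0.030 | 0.014 |
| 247 | CatB | 0.016 | −0.153 | 0.016 | −0.003 | −0.042 | −0.066 | 0.022 | −0.003 | 0.050 | 0.014 |
| 247 | RF | 0.005 | −0.126 | 0.015 | −0.038 | −0.017 | −0.070 | 0.034 | 0.010 | 0.006 | 0.012 |
| 247 | GBM | 0.011 | −0.120 | 0.025 | −0.005 | −0.062 | −0.084 | 0.020 | −0.014 | 0.023 | 0.007 |
| 253 | CatB | 0.015 | −0.006 | −0.002 | −0.167 | −0.067 | 0.095 | −0.092 | −0.003 | −0.023 | −0.029 |
| 253 | RF | 0.010 | −0.003 | −0.013 | −0.104 | −0.072 | 0.065 | −0.122 | −0.073 | −0.006 | −0.023 |
| 253 | XGB | 0.000 | −0.012 | −0.008 | −0.116 | −0.035 | 0.011 | −0.060 | −0.049 | −0.025 | −0.027 |
| 253 | GBM | 0.013 | −0.026 | −0.014 | −0.154 | −0.110 | 0.046 | −0.088 | −0.025 | −0.008 | −0.034 |
| 162 | RF | 0.004 | 0.002 | 0.009 | −0.230 | −0.045 | 0.062 | 0.032 | 0.008 | 0.009 | 0.007 |
| 162 | XGB | 0.000 | −0.002 | −0.004 | −0.308 | −0.058 | −0.013 | 0.044 | −0.022 | 0.076 | 0.003 |
| 162 | GBM | −0.001 | −0.03 | 0.024 | −0.253 | −0.080 | 0.081 | 0.023 | −0.013 | 0.036 | 0.016 |
| 93 | RF | −0.009 | 0.049 | −0.009 | −0.214 | 0.129 | 0.045 | −0.114 | −0.056 | 0.000 | 0.001 |
| 93 | XGB | 0.000 | 0.128 | −0.004 | −0.251 | 0.085 | 0.041 | −0.105 | −0.039 | 0.014 | −0.02 |
| 309 | XGB | 0.042 | −0.164 | −0.024 | 0.108 | 0.127 | −0.096 | −0.005 | −0.039 | −0.035 | −0.052 |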
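
One way to read Tables 8 and 9 is to keep, for each misclassified sample, only the SHAP values that push the prediction toward the wrong class: positive values (toward Class 1) for an FP, and negative values (away from Class 1) for an FN. The sketch below illustrates this reading on two rows of the tables. It is a hypothetical aggregation for illustration only and is not claimed to be the error-contribution score defined in the paper; the function names and the ASCII feature labels `sigma_v` and `sigma_v_eff` (for σv and σ’v) are assumptions.

```python
FEATURES = ["M", "amax", "z", "qc", "fs", "CSR", "Ic", "FC", "sigma_v", "sigma_v_eff"]


def error_direction_contrib(shap_row, error_type):
    """Keep only the SHAP magnitudes that push toward the wrong class.

    FP (true No, predicted Yes): positive values w.r.t. Class 1 are kept.
    FN (true Yes, predicted No): negative values w.r.t. Class 1 are kept.
    """
    sign = -1.0 if error_type == "FN" else 1.0
    return {f: max(sign * v, 0.0) for f, v in zip(FEATURES, shap_row)}


def top_features(contrib, k=2):
    """Return the k features with the largest error-direction contribution."""
    return sorted(contrib, key=contrib.get, reverse=True)[:k]


# XGB, global index 32 (FP), copied from Table 8
xgb_fp_32 = [-0.033, -0.149, -0.008, 0.202, -0.077, -0.096, -0.001, 0.034, 0.041, -0.025]
# CatB, global index 182 (FN), copied from Table 9
catb_fn_182 = [0.014, -0.014, 0.014, -0.299, -0.143, 0.091, 0.021, -0.003, 0.045, 0.012]

fp_top = top_features(error_direction_contrib(xgb_fp_32, "FP"))
fn_top = top_features(error_direction_contrib(catb_fn_182, "FN"))

print("FP 32 (XGB):", fp_top)
print("FN 182 (CatB):", fn_top)
```

In both examples qc carries the largest wrong-direction contribution, followed by σv for the FP sample and fs for the FN sample.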
Table 10. Liquefaction class percentages after data augmentation.
| Dataset | Number of Samples | Yes (%) | No (%) |
|---|---|---|---|
| Full Data | 420 | 70.24 | 29.76 |
| Training | 339 | 70.80 | 29.20 |
| Test | 81 | 67.90 | 32.10 |
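
The percentages in Table 10 can be converted back to class counts to confirm that the training and test subsets add up to the full augmented dataset. This is a minimal sketch; rounding to the nearest integer is an assumption needed because the percentages are given to two decimals.

```python
# (number of samples, Yes %) copied from Table 10
splits = {
    "Full Data": (420, 70.24),
    "Training": (339, 70.80),
    "Test": (81, 67.90),
}

counts = {}
for name, (n_samples, yes_pct) in splits.items():
    # Recover the integer class count from the rounded percentage
    n_yes = round(n_samples * yes_pct / 100)
    counts[name] = {"yes": n_yes, "no": n_samples - n_yes}

for name, c in counts.items():
    print(f"{name}: yes={c['yes']} no={c['no']}")
```

The recovered counts are 295/125 (full), 240/99 (training), and 55/26 (test), so the training and test counts sum to the full-data counts for both classes.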
Table 11. Results of Data2–Tune1 (D2-T1) and Data2–Tune2 (D2-T2) experiments for liquefaction models developed using four tree-based ensemble learning algorithms.
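| Model | F1 D2-T1 | F1 D2-T2 | ROC AUC D2-T1 | ROC AUC D2-T2 | Prec. D2-T1 | Prec. D2-T2 | Recall D2-T1 | Recall D2-T2 | CV Mean F1 D2-T1 | CV Std. dev. F1 D2-T1 | CV Mean F1 D2-T2 | CV Std. dev. F1 D2-T2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CatB | 0.926 | 0.926 | 0.943 | 0.942 | 0.943 | 0.943 | 0.909 | 0.909 | 0.907 | 0.037 | 0.935 | 0.027 |
| RF | 0.925 | 0.899 | 0.917 | 0.894 | 0.961 | 0.907 | 0.891 | 0.891 | 0.913 | 0.032 | 0.921 | 0.024 |
| XGB | 0.906 | 0.915 | 0.929 | 0.928 | 0.941 | 0.857 | 0.873 | 0.982 | 0.904 | 0.039 | 0.928 | 0.024 |
| GBM | 0.917 | 0.929 | 0.931 | 0.931 | 0.926 | 0.912 | 0.909 | 0.945 | 0.916 | 0.026 | 0.946 | 0.023 |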
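
To relate Table 11 to the pre-augmentation baseline, the F1 scores can be differenced against the test rows of Table 4 (Data1–Tune1). The sketch below does this for the D2-T1 experiment, assuming that Table 11 reports test-set scores (the table itself does not label the subset); the variable names are illustrative.

```python
# Test-set F1 before augmentation, copied from Table 4 (Data1-Tune1)
f1_d1_t1 = {"CatB": 0.917, "RF": 0.906, "XGB": 0.907, "GBM": 0.925}

# F1 after augmentation, copied from Table 11 (assumed to be test-set scores)
f1_d2_t1 = {"CatB": 0.926, "RF": 0.925, "XGB": 0.906, "GBM": 0.917}

# Change in F1 attributable to augmentation under the same tuning (Tune1)
delta_t1 = {m: round(f1_d2_t1[m] - f1_d1_t1[m], 3) for m in f1_d1_t1}
best = max(delta_t1, key=delta_t1.get)

for model, delta in delta_t1.items():
    print(f"{model}: {delta:+.3f}")
print("largest gain:", best)
```

Under this assumption, RF shows the largest gain (+0.019), in line with the improvement quoted in the abstract, while XGB and GBM change by −0.001 and −0.008, respectively.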
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Nacaroglu, E.; Tugrul, A.T.; Yagcioglu, B. A Tabular Data Augmentation Framework Based on Error-Focused XAI-Supported Weighting Strategy: Application to Soil Liquefaction Classification. Appl. Sci. 2026, 16, 330. https://doi.org/10.3390/app16010330
