A Tabular Data Augmentation Framework Based on Error-Focused XAI-Supported Weighting Strategy: Application to Soil Liquefaction Classification
Abstract
1. Introduction
2. Materials and Methods
2.1. Liquefaction Dataset Compilation and Feature Engineering
2.2. Soil Liquefaction Classification Procedure with Original Training Data
2.3. Outlier Detection Analysis
2.4. SHAP-Based Error-Contribution Score: Definition, Computation, and Error-Contributing Features
2.5. Error-Focused and XAI-Supported Weighting Strategy
2.6. Identification of Difficult-to-Predict Regions Through Error-Focused Clustering
2.7. Augmented Training Set Through Error-Focused Data Augmentation
3. New Experiments and Findings
4. Discussion
5. Conclusions
- The method applies SHAP-based error weighting (w = 1.5) to emphasize the five most error-contributing features and to generate targeted training samples via (i) weighted Fuzzy C-Means clustering to delineate difficult-to-predict regions and (ii) controlled 1.5× Gaussian-noise injection into those features in those regions.
- Future research should focus on validating this framework on larger, multi-site databases and exploring adaptive weighting mechanisms (w) to further tailor the augmentation to varying site conditions.
- When evaluated on the fixed hold-out test set, at least one of the augmented data configurations (Data2–Tune1 or Data2–Tune2) improved the test F1 score of every examined model (GBM, CatB, RF, and XGB) relative to the baseline Data1–Tune1, although not every configuration improved every model (for example, GBM under Data2–Tune1 and RF under Data2–Tune2 scored below the baseline). The largest gain was observed for the RF model under Data2–Tune1, where the test F1 score increased by 0.019 (0.906 → 0.925).
- The analysis of misclassifications revealed that model errors were not random but heavily concentrated in geotechnical “transition zones” (typically silty sands and sandy silts with Ic values between 1.80 and 2.20).
- The approach operates at the data level and is model-agnostic. By using error feedback (FP/FN) to guide subsequent augmentation, it behaves like an active-learning-style feedback loop while keeping the augmented data statistically consistent with the original data, as supported by the KS test results.
- Limitations include the relatively small dataset size and the need for further validation on larger and more diverse liquefaction datasets under different geotechnical conditions.
- Owing to its model-agnostic, data-level design, the proposed framework is applicable beyond liquefaction to broader geotechnical and general tabular classification problems.
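The noise-injection and consistency-check steps summarized in the bullets above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names `inject_noise` and `ks_statistic`, the `base_scale` fraction, the choice of `qc` and `fs` as focus features, and the reading of the factor 1.5 as a multiplier on the noise standard deviation are all assumptions made for illustration. The weighted Fuzzy C-Means step is omitted: the seed rows (three false-negative cases from the misclassification table) are assumed to have already been selected from a difficult-to-predict region.

```python
import random
import statistics

def inject_noise(rows, focus_features, w=1.5, base_scale=0.05, seed=0):
    """Return one synthetic copy per seed row, with Gaussian noise on focus features only."""
    rng = random.Random(seed)
    # Assumed noise scale: w * base_scale * (std of the feature within the seed rows).
    sigma = {f: w * base_scale * statistics.pstdev([r[f] for r in rows])
             for f in focus_features}
    synthetic = []
    for row in rows:
        new_row = dict(row)  # features outside the focus set are copied unchanged
        for f in focus_features:
            new_row[f] = row[f] + rng.gauss(0.0, sigma[f])
        synthetic.append(new_row)
    return synthetic

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        ecdf_a = sum(v <= x for v in a) / len(a)
        ecdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(ecdf_a - ecdf_b))
    return gap

# Seed rows: qc (kPa), fs (kPa), and CSR of indices 162, 182, and 203.
seed_rows = [
    {"qc": 8500.0, "fs": 59.50, "CSR": 0.27},
    {"qc": 9400.0, "fs": 84.60, "CSR": 0.27},
    {"qc": 7700.0, "fs": 30.80, "CSR": 0.18},
]
synthetic_rows = inject_noise(seed_rows, focus_features=("qc", "fs"))
d_qc = ks_statistic([r["qc"] for r in seed_rows],
                    [r["qc"] for r in synthetic_rows])
print(f"KS statistic for qc: {d_qc:.3f}")
```

A small KS statistic indicates that the synthetic values follow the distribution of the seed values closely. A full consistency check would also need a p-value, which this sketch does not compute.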
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AdaB | AdaBoost |
| amax | Peak Ground Acceleration (PGA) |
| ANN | Artificial Neural Network |
| BRF | Balanced Random Forest |
| CatB | CatBoost |
| CPT | Cone Penetration Test |
| CRR | Cyclic Resistance Ratio |
| CSR | Cyclic Stress Ratio |
| CV | Cross-Validation |
| ECDF | Empirical Cumulative Distribution Function |
| EFB | Exclusive Feature Bundling |
| ExT | Extra Trees |
| FC | Fines Content |
| FuzzyCM | Fuzzy C-Means |
| fs | Cone Sleeve Resistance |
| FS | Factor of Safety (against liquefaction) |
| GBM | Gradient Boosting Machine |
| GOSS | Gradient-based One-Side Sampling |
| H0 | Null Hypothesis (in statistical tests) |
| Ic | Soil Behavior Type Index |
| LGBM | Light Gradient Boosting Machine |
| LPI | Liquefaction Potential Index |
| LSN | Liquefaction Severity Number |
| M | Earthquake Magnitude |
| ML | Machine Learning |
| NGB | Natural Gradient Boosting |
| PCA | Principal Component Analysis |
| qc | Cone Tip Resistance |
| qc1Ncs | Normalized Clean Sand Cone Tip Resistance |
| rd | Stress Reduction Factor |
| RF | Random Forest |
| ROC AUC | Receiver Operating Characteristic—Area Under Curve |
| RoF | Rotation Forest |
| SMMO | Selective Minority Oversampling |
| SMOTE | Synthetic Minority Oversampling Technique |
| SVM | Support Vector Machines |
| SHAP | SHapley Additive exPlanations |
| t-SNE | t-distributed Stochastic Neighbor Embedding |
| w | Weighting Factor |
| WGAN | Wasserstein Generative Adversarial Network |
| XAI | Explainable Artificial Intelligence |
| XGB | Extreme Gradient Boosting (XGBoost) |
| σv | Total Vertical Stress |
| σ’v | Effective Vertical Stress |










| Dataset | Liquefied (Yes) | Non-Liquefied (No) | Liquefied (%) | Non-Liquefied (%) |
|---|---|---|---|---|
| Full Data | 229 | 92 | 71.34 | 28.66 |
| Training | 174 | 66 | 72.5 | 27.5 |
| Test | 55 | 26 | 67.9 | 32.1 |

| Parameter (Symbol) | Data Source |
|---|---|
| Earthquake magnitude (M) | Measured |
| Peak ground acceleration (amax) | Measured |
| Depth (z) | Measured |
| Cone tip resistance (qc) | Measured |
| Cone sleeve resistance (fs) | Measured |
| Cyclic stress ratio (CSR) | Calculated |
| Soil behavior index (Ic) | Calculated |
| Fines content (FC) | Measured (if available, otherwise calculated) |
| Total stress (σv) | Measured |
| Effective stress (σ’v) | Measured |

| Parameter (unit) | Count | Mean | Std. dev. | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| M (Mw) | 321 | 6.960 | 0.566 | 5.900 | 6.600 | 7.100 | 7.100 | 9.000 |
| amax (g) | 321 | 0.322 | 0.149 | 0.090 | 0.210 | 0.250 | 0.400 | 0.840 |
| z (m) | 321 | 4.700 | 2.260 | 1.400 | 2.900 | 4.300 | 5.900 | 12.700 |
| qc (kPa) | 321 | 5792.890 | 3704.660 | 784.530 | 3255.810 | 5000.000 | 7500.000 | 25,000.000 |
| fs (kPa) | 321 | 43.857 | 44.797 | 0.980 | 17.600 | 31.200 | 55.897 | 362.846 |
| CSR | 321 | 0.273 | 0.126 | 0.070 | 0.170 | 0.245 | 0.345 | 0.695 |
| Ic | 321 | 1.985 | 0.299 | 1.244 | 1.811 | 1.980 | 2.194 | 2.959 |
| FC (%) | 321 | 23.238 | 20.876 | 0.000 | 5.000 | 19.385 | 34.970 | 99.766 |
| σv (kPa) | 321 | 87.676 | 43.079 | 24.000 | 53.200 | 80.000 | 109.300 | 235.500 |
| σ’v (kPa) | 321 | 62.444 | 28.495 | 19.000 | 41.000 | 56.500 | 77.500 | 161.600 |

| Model | Subset | F1 Score | ROC AUC | Mean F1 (CV) | Std. dev. of F1 (CV) | Precision | Recall |
|---|---|---|---|---|---|---|---|
| GBM | Train | 0.974 | 0.994 | 0.927 | 0.022 | 0.994 | 0.954 |
| GBM | Test | 0.925 | 0.924 | 0.927 | 0.022 | 0.961 | 0.891 |
| CatB | Train | 0.949 | 0.986 | 0.912 | 0.022 | 0.943 | 0.954 |
| CatB | Test | 0.917 | 0.945 | 0.912 | 0.022 | 0.926 | 0.909 |
| XGB | Train | 0.940 | 0.978 | 0.899 | 0.019 | 0.932 | 0.948 |
| XGB | Test | 0.907 | 0.917 | 0.899 | 0.019 | 0.925 | 0.891 |
| RF | Train | 0.962 | 0.990 | 0.903 | 0.037 | 0.976 | 0.948 |
| RF | Test | 0.906 | 0.924 | 0.903 | 0.037 | 0.941 | 0.873 |
| ExT | Train | 0.939 | 0.965 | 0.897 | 0.025 | 0.953 | 0.925 |
| ExT | Test | 0.899 | 0.919 | 0.897 | 0.025 | 0.907 | 0.891 |
| RoF | Train | 0.942 | 0.980 | 0.888 | 0.048 | 0.958 | 0.925 |
| RoF | Test | 0.893 | 0.909 | 0.888 | 0.048 | 0.958 | 0.836 |
| LGBM | Train | 0.946 | 0.976 | 0.891 | 0.028 | 0.933 | 0.960 |
| LGBM | Test | 0.889 | 0.903 | 0.891 | 0.028 | 0.906 | 0.873 |
| NGB | Train | 0.944 | 0.989 | 0.904 | 0.025 | 0.914 | 0.977 |
| NGB | Test | 0.883 | 0.886 | 0.904 | 0.025 | 0.875 | 0.891 |
| BRF | Train | 0.909 | 0.932 | 0.887 | 0.029 | 0.933 | 0.885 |
| BRF | Test | 0.874 | 0.897 | 0.887 | 0.029 | 0.938 | 0.818 |
| AdaB | Train | 0.956 | 0.984 | 0.895 | 0.027 | 0.982 | 0.931 |
| AdaB | Test | 0.874 | 0.931 | 0.895 | 0.027 | 0.938 | 0.818 |
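The F1 scores in the table above can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this for two test-set rows; the helper name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded table values, so agreement is only expected to three decimals.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

# Test-set precision and recall taken from the table above.
gbm_test_f1 = f1_from_precision_recall(0.961, 0.891)
rf_test_f1 = f1_from_precision_recall(0.941, 0.873)

print(round(gbm_test_f1, 3))  # matches the tabulated GBM test F1 of 0.925
print(round(rf_test_f1, 3))   # matches the tabulated RF test F1 of 0.906
```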

| Model | False Positives (FP) | False Negatives (FN) |
|---|---|---|
| CatB | 243, 310, 112, 19 | 182, 254, 203, 247, 253 |
| RF | 243, 112, 19 | 182, 162, 254, 203, 93, 247, 253 |
| XGB | 243, 310, 112, 32 | 182, 162, 254, 93, 309, 253 |
| GBM | 243, 112 | 182, 162, 254, 203, 247, 253 |
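Set operations give a quick way to see which samples are misclassified by every model and which by at least one. The sketch below uses the indices from the table above; the variable names are illustrative only.

```python
# Misclassified sample indices per model, copied from the table above.
false_positives = {
    "CatB": {243, 310, 112, 19},
    "RF": {243, 112, 19},
    "XGB": {243, 310, 112, 32},
    "GBM": {243, 112},
}
false_negatives = {
    "CatB": {182, 254, 203, 247, 253},
    "RF": {182, 162, 254, 203, 93, 247, 253},
    "XGB": {182, 162, 254, 93, 309, 253},
    "GBM": {182, 162, 254, 203, 247, 253},
}

# Samples that every model gets wrong (the most persistent errors).
fp_common = set.intersection(*false_positives.values())
fn_common = set.intersection(*false_negatives.values())

# Samples that at least one model gets wrong (the full error pool).
fp_any = set.union(*false_positives.values())
fn_any = set.union(*false_negatives.values())

print(sorted(fp_common), sorted(fn_common))
print(len(fp_any), len(fn_any))
```

Indices 112 and 243 are false positives for all four models, and indices 182, 253, and 254 are false negatives for all four. The unions contain 5 false positives and 8 false negatives, which matches the 13 rows of the per-sample error table.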

| Index | Error Type | M (Mw) | amax (g) | z (m) | qc (kPa) | fs (kPa) | CSR (-) | Ic (-) | FC (%) | σv (kPa) | σ’v (kPa) | True Label | Predicted Label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19 | FP | 6.00 | 0.37 | 11.00 | 6400.00 | 76.80 | 0.33 | 2.16 | 35.89 | 209.00 | 119.00 | No | Yes |
| 32 | FP | 6.20 | 0.13 | 4.80 | 5226.94 | 77.47 | 0.13 | 2.14 | 30.00 | 90.00 | 54.00 | No | Yes |
| 112 | FP | 7.10 | 0.24 | 3.00 | 4700.00 | 28.20 | 0.17 | 1.94 | 18.13 | 58.50 | 51.50 | No | Yes |
| 243 | FP | 7.10 | 0.38 | 2.80 | 900.00 | 44.10 | 0.37 | 2.84 | 90.01 | 52.30 | 34.10 | No | Yes |
| 310 | FP | 7.80 | 0.16 | 7.40 | 6472.30 | 24.52 | 0.15 | 1.82 | 2.00 | 135.00 | 86.00 | No | Yes |
| 93 | FN | 6.90 | 0.60 | 4.60 | 9806.65 | 14.71 | 0.59 | 1.41 | 0.00 | 86.00 | 54.00 | Yes | No |
| 162 | FN | 7.10 | 0.28 | 7.00 | 8500.00 | 59.50 | 0.27 | 1.85 | 11.12 | 133.00 | 84.00 | Yes | No |
| 182 | FN | 7.10 | 0.27 | 5.80 | 9400.00 | 84.60 | 0.27 | 1.84 | 10.48 | 109.30 | 67.60 | Yes | No |
| 203 | FN | 7.10 | 0.21 | 7.30 | 7700.00 | 30.80 | 0.18 | 1.79 | 6.37 | 140.70 | 98.50 | Yes | No |
| 247 | FN | 7.10 | 0.17 | 7.30 | 7227.50 | 43.15 | 0.17 | 1.87 | 12.16 | 138.00 | 80.00 | Yes | No |
| 253 | FN | 7.10 | 0.23 | 3.70 | 7600.15 | 40.21 | 0.25 | 1.69 | 0.00 | 70.00 | 42.00 | Yes | No |
| 254 | FN | 7.10 | 0.22 | 4.80 | 9149.60 | 56.88 | 0.25 | 1.70 | 0.00 | 92.00 | 50.00 | Yes | No |
| 309 | FN | 7.80 | 0.18 | 3.30 | 3236.20 | 12.75 | 0.14 | 1.99 | 5.00 | 58.00 | 48.00 | Yes | No |

| Index | Error Type | Score | Outlier? |
|---|---|---|---|
| 19 | FP | −0.045 | Yes |
| 32 | FP | 0.052 | No |
| 112 | FP | 0.129 | No |
| 243 | FP | −0.036 | Yes |
| 310 | FP | 0.040 | No |
| 93 | FN | 0.018 | No |
| 162 | FN | 0.088 | No |
| 182 | FN | 0.076 | No |
| 203 | FN | 0.071 | No |
| 247 | FN | 0.082 | No |
| 253 | FN | 0.107 | No |
| 254 | FN | 0.103 | No |
| 309 | FN | 0.088 | No |
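The outlier flags in the table above are consistent with a simple sign rule on the score: negative scores are flagged and non-negative scores are not. The sketch below applies that rule. The zero threshold is an assumption inferred from the table (it matches the convention of scikit-learn's `IsolationForest.decision_function`, where negative values indicate outliers); the source does not state the threshold explicitly.

```python
# Outlier scores per misclassified sample index, copied from the table above.
scores = {
    19: -0.045, 32: 0.052, 112: 0.129, 243: -0.036, 310: 0.040,  # false positives
    93: 0.018, 162: 0.088, 182: 0.076, 203: 0.071,               # false negatives
    247: 0.082, 253: 0.107, 254: 0.103, 309: 0.088,
}

THRESHOLD = 0.0  # assumed decision boundary: score < 0 means outlier

outlier_indices = sorted(i for i, s in scores.items() if s < THRESHOLD)
print(outlier_indices)
```

Under this rule only indices 19 and 243 (both false positives) are flagged, in agreement with the table.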

| Model | M | amax | z | qc | fs | CSR | Ic | FC | σv | σ’v |
|---|---|---|---|---|---|---|---|---|---|---|
| Global Index: 310 | ||||||||||
| CatB | 0.016 | −0.140 | 0.016 | 0.003 | 0.109 | −0.137 | 0.003 | −0.003 | 0.041 | 0.003 |
| XGB | 0.033 | −0.126 | 0.005 | −0.048 | 0.057 | −0.083 | 0.045 | −0.049 | 0.074 | 0.014 |
| Global Index: 112 | ||||||||||
| CatB | 0.013 | 0.018 | −0.009 | 0.206 | −0.013 | −0.031 | 0.028 | −0.005 | −0.016 | 0.011 |
| RF | 0.014 | −0.014 | −0.008 | 0.141 | −0.013 | −0.042 | 0.044 | 0.014 | −0.005 | −0.002 |
| XGB | 0.008 | −0.007 | 0.002 | 0.238 | 0.025 | −0.010 | 0.003 | 0.027 | −0.048 | −0.036 |
| GBM | 0.014 | −0.012 | −0.009 | 0.202 | −0.005 | −0.028 | 0.021 | 0.014 | −0.003 | 0.005 |
| Global Index: 32 | ||||||||||
| XGB | −0.033 | −0.149 | −0.008 | 0.202 | −0.077 | −0.096 | −0.001 | 0.034 | 0.041 | −0.025 |

| Model | M | amax | z | qc | fs | CSR | Ic | FC | σv | σ’v |
|---|---|---|---|---|---|---|---|---|---|---|
| Global Index: 182 | ||||||||||
| CatB | 0.014 | −0.014 | 0.014 | −0.299 | −0.143 | 0.091 | 0.021 | −0.003 | 0.045 | 0.012 |
| RF | 0.003 | −0.006 | 0.020 | −0.250 | −0.097 | 0.060 | 0.029 | 0.007 | 0.022 | 0.009 |
| XGB | 0.000 | −0.005 | 0.030 | −0.303 | −0.061 | −0.013 | 0.051 | −0.029 | 0.077 | −0.012 |
| GBM | −0.001 | −0.040 | 0.023 | −0.259 | −0.162 | 0.068 | 0.015 | −0.006 | 0.033 | 0.018 |
| Global Index: 254 | ||||||||||
| CatB | 0.014 | −0.038 | −0.002 | −0.301 | −0.086 | 0.082 | −0.088 | −0.003 | 0.028 | −0.027 |
| RF | 0.008 | −0.029 | 0.004 | −0.234 | −0.071 | 0.036 | −0.105 | −0.069 | 0.001 | −0.015 |
| XGB | 0.001 | −0.038 | −0.008 | −0.285 | −0.054 | −0.003 | −0.056 | −0.037 | 0.006 | −0.040 |
| GBM | 0.008 | −0.024 | −0.001 | −0.258 | −0.098 | 0.038 | −0.081 | −0.025 | 0.000 | −0.034 |
| Global Index: 203 | ||||||||||
| CatB | 0.015 | −0.050 | 0.015 | −0.180 | −0.056 | −0.075 | 0.009 | −0.003 | 0.047 | 0.013 |
| RF | 0.004 | −0.046 | 0.019 | −0.138 | −0.027 | −0.070 | 0.028 | 0.000 | 0.007 | 0.008 |
| GBM | 0.007 | −0.040 | 0.028 | −0.140 | −0.069 | −0.081 | 0.020 | −0.036 | 0.030 | 0.014 |
| Global Index: 247 | ||||||||||
| CatB | 0.016 | −0.153 | 0.016 | −0.003 | −0.042 | −0.066 | 0.022 | −0.003 | 0.050 | 0.014 |
| RF | 0.005 | −0.126 | 0.015 | −0.038 | −0.017 | −0.070 | 0.034 | 0.010 | 0.006 | 0.012 |
| GBM | 0.011 | −0.120 | 0.025 | −0.005 | −0.062 | −0.084 | 0.020 | −0.014 | 0.023 | 0.007 |
| Global Index: 253 | ||||||||||
| CatB | 0.015 | −0.006 | −0.002 | −0.167 | −0.067 | 0.095 | −0.092 | −0.003 | −0.023 | −0.029 |
| RF | 0.010 | −0.003 | −0.013 | −0.104 | −0.072 | 0.065 | −0.122 | −0.073 | −0.006 | −0.023 |
| XGB | 0.000 | −0.012 | −0.008 | −0.116 | −0.035 | 0.011 | −0.060 | −0.049 | −0.025 | −0.027 |
| GBM | 0.013 | −0.026 | −0.014 | −0.154 | −0.110 | 0.046 | −0.088 | −0.025 | −0.008 | −0.034 |
| Global Index: 162 | ||||||||||
| RF | 0.004 | 0.002 | 0.009 | −0.230 | −0.045 | 0.062 | 0.032 | 0.008 | 0.009 | 0.007 |
| XGB | 0.000 | −0.002 | −0.004 | −0.308 | −0.058 | −0.013 | 0.044 | −0.022 | 0.076 | 0.003 |
| GBM | −0.001 | −0.03 | 0.024 | −0.253 | −0.080 | 0.081 | 0.023 | −0.013 | 0.036 | 0.016 |
| Global Index: 93 | ||||||||||
| RF | −0.009 | 0.049 | −0.009 | −0.214 | 0.129 | 0.045 | −0.114 | −0.056 | 0.000 | 0.001 |
| XGB | 0.000 | 0.128 | −0.004 | −0.251 | 0.085 | 0.041 | −0.105 | −0.039 | 0.014 | −0.02 |
| Global Index: 309 | ||||||||||
| XGB | 0.042 | −0.164 | −0.024 | 0.108 | 0.127 | −0.096 | −0.005 | −0.039 | −0.035 | −0.052 |
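One simple way to read the per-sample SHAP values above is to average their absolute values across models and rank the features. The sketch below does this for the false-negative sample with global index 182. This aggregation is an illustrative assumption and not necessarily the error-contribution score defined in Section 2.4; the names `sigma_v` and `sigma_v_eff` stand in for σv and σ’v.

```python
features = ["M", "amax", "z", "qc", "fs", "CSR", "Ic", "FC", "sigma_v", "sigma_v_eff"]

# SHAP values for global index 182, one row per model, copied from the table above.
shap_182 = {
    "CatB": [0.014, -0.014, 0.014, -0.299, -0.143, 0.091, 0.021, -0.003, 0.045, 0.012],
    "RF":   [0.003, -0.006, 0.020, -0.250, -0.097, 0.060, 0.029, 0.007, 0.022, 0.009],
    "XGB":  [0.000, -0.005, 0.030, -0.303, -0.061, -0.013, 0.051, -0.029, 0.077, -0.012],
    "GBM":  [-0.001, -0.040, 0.023, -0.259, -0.162, 0.068, 0.015, -0.006, 0.033, 0.018],
}

# Mean absolute SHAP value per feature across the four models.
n_models = len(shap_182)
mean_abs = {
    name: sum(abs(row[j]) for row in shap_182.values()) / n_models
    for j, name in enumerate(features)
}

# Features ordered from largest to smallest mean absolute contribution.
ranking = sorted(features, key=lambda name: mean_abs[name], reverse=True)
print(ranking[:5])
```

For this sample, `qc` dominates (mean absolute SHAP of about 0.278), followed by `fs` and `CSR`, which agrees with the large negative `qc` contributions visible in every model row.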

| Dataset | Number of Samples | Yes (%) | No (%) |
|---|---|---|---|
| Full Data | 420 | 70.24 | 29.76 |
| Training | 339 | 70.80 | 29.20 |
| Test | 81 | 67.90 | 32.10 |
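The number of synthetic samples per class is not listed directly, but it can be recovered by combining the table above with the original class counts (174 liquefied and 66 non-liquefied training cases). The sketch below does this arithmetic; the counts are inferred from rounded percentages, so they should be read as a consistency check and not as reported values.

```python
def count_from_percent(total, percent):
    """Recover an integer class count from a total and a rounded percentage."""
    return round(total * percent / 100.0)

# Original training set (from the class-distribution table).
orig_train_yes, orig_train_no = 174, 66

# Augmented training set (from the table above).
aug_train_total = 339
aug_train_yes = count_from_percent(aug_train_total, 70.80)
aug_train_no = aug_train_total - aug_train_yes

added_yes = aug_train_yes - orig_train_yes
added_no = aug_train_no - orig_train_no
added_total = added_yes + added_no

print(added_yes, added_no, added_total)
```

Under these assumptions the augmentation adds 99 synthetic training samples (66 liquefied and 33 non-liquefied), while the test set stays at 81 samples.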

| Model | F1 D2-T1 | F1 D2-T2 | ROC AUC D2-T1 | ROC AUC D2-T2 | Prec. D2-T1 | Prec. D2-T2 | Recall D2-T1 | Recall D2-T2 | CV Mean F1 D2-T1 | CV Std. dev. F1 D2-T1 | CV Mean F1 D2-T2 | CV Std. dev. F1 D2-T2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CatB | 0.926 | 0.926 | 0.943 | 0.942 | 0.943 | 0.943 | 0.909 | 0.909 | 0.907 | 0.037 | 0.935 | 0.027 |
| RF | 0.925 | 0.899 | 0.917 | 0.894 | 0.961 | 0.907 | 0.891 | 0.891 | 0.913 | 0.032 | 0.921 | 0.024 |
| XGB | 0.906 | 0.915 | 0.929 | 0.928 | 0.941 | 0.857 | 0.873 | 0.982 | 0.904 | 0.039 | 0.928 | 0.024 |
| GBM | 0.917 | 0.929 | 0.931 | 0.931 | 0.926 | 0.912 | 0.909 | 0.945 | 0.916 | 0.026 | 0.946 | 0.023 |
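The change in test F1 relative to the baseline can be computed directly from the two results tables. The sketch below assumes that D2-T1 and D2-T2 denote Data2–Tune1 and Data2–Tune2 and that the baseline values are the test-set F1 scores of the models trained on the original data; the variable names are illustrative.

```python
# Test F1 with the original training data (baseline) and the augmented configurations.
baseline_f1 = {"CatB": 0.917, "RF": 0.906, "XGB": 0.907, "GBM": 0.925}
augmented_f1 = {
    "CatB": {"D2-T1": 0.926, "D2-T2": 0.926},
    "RF":   {"D2-T1": 0.925, "D2-T2": 0.899},
    "XGB":  {"D2-T1": 0.906, "D2-T2": 0.915},
    "GBM":  {"D2-T1": 0.917, "D2-T2": 0.929},
}

# Per-configuration change in F1 for every model.
deltas = {
    model: {cfg: round(f1 - baseline_f1[model], 3) for cfg, f1 in cfgs.items()}
    for model, cfgs in augmented_f1.items()
}

# Best achievable gain per model, and the model with the largest gain.
best_gain = {model: max(cfg_deltas.values()) for model, cfg_deltas in deltas.items()}
top_model = max(best_gain, key=best_gain.get)

for model, cfg_deltas in deltas.items():
    print(model, cfg_deltas)
print(top_model, best_gain[top_model])
```

The best configuration improves every model, with RF gaining the most (0.019 under D2-T1). Individual configurations can still fall below the baseline, as GBM does under D2-T1 and RF under D2-T2.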
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Nacaroglu, E.; Tugrul, A.T.; Yagcioglu, B. A Tabular Data Augmentation Framework Based on Error-Focused XAI-Supported Weighting Strategy: Application to Soil Liquefaction Classification. Appl. Sci. 2026, 16, 330. https://doi.org/10.3390/app16010330

