Mathematics
  • Article
  • Open Access

24 October 2025

An Enhanced Discriminant Analysis Approach for Multi-Classification with Integrated Machine Learning-Based Missing Data Imputation

1 Department of Statistics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2 Department of Mathematics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
* Author to whom correspondence should be addressed.
This article belongs to the Section D1: Probability and Statistics

Abstract

This study addresses the challenge of accurate classification under missing data conditions by integrating multiple imputation strategies with discriminant analysis frameworks. The proposed approach evaluates six imputation methods (Mean, Regression, KNN, Random Forest, Bagged Trees, MissRanger) across several discriminant techniques. Simulation scenarios varied in sample size, predictor dimensionality, and correlation structure, while the real-world application employed the Cirrhosis Prediction Dataset. The results consistently demonstrate that machine learning-based imputations, particularly Regression, KNN, and MissRanger, outperform simpler approaches by preserving multivariate structure, especially in high-dimensional and highly correlated settings. MissRanger yielded the highest classification accuracy across most discriminant analysis methods in both simulated and real data, with performance gains most pronounced when combined with flexible or regularized classifiers. Regression imputation showed notable improvements under low correlation, aligning with the theoretical benefits of shrinkage-based covariance estimation. Across all methods, larger sample sizes and higher correlation enhanced classification accuracy by improving parameter stability and imputation precision.
MSC:
62H30; 62J10; 62F40; 62P10; 62C99

1. Introduction

Discriminant analysis (DA) is a multivariate statistical technique widely used in classification problems where the outcome variable is categorical and the predictor variables are quantitative. By constructing a discriminant function, a linear combination of predictor variables, DA aims to distinguish between two or more predefined groups. The method assumes multivariate normality and homogeneity of covariance matrices across groups, making it a powerful parametric alternative to logistic regression when these assumptions hold [1]. In medical research, DA has been applied to classification tasks such as identifying the likelihood of disease occurrence from clinical and laboratory measurements, where blood test results also enhance the model’s interpretability. Ramayah et al. [2] demonstrated the practical use of DA in classifying employees based on their intention to share knowledge. The model achieved an accuracy rate of over 85% and identified key predictors, including attitude, subjective norm, and reciprocal relationships. The combination of high accuracy and interpretable predictors highlights the model’s predictive strength, underscoring DA’s versatility and reliability across various research domains.
The core principle of discriminant analysis (DA) is to find linear or quadratic combinations of predictor variables that best separate predefined groups. In multi-class scenarios, this involves estimating class-specific parameters and deriving discriminant functions that maximize the separation between groups, assuming a multivariate normal distribution. Linear discriminant analysis (LDA), in particular, assumes homogeneity of covariance matrices across groups, leading to linear decision boundaries. In contrast, quadratic discriminant analysis (QDA) allows for group-specific covariance structures, resulting in more flexible and nonlinear boundaries. Several studies have explored the implementation and comparative performance of LDA and QDA under various conditions. Singh and Gupta [3] discussed an implementation framework for LDA and QDA, Chatterjee and Das [4] compared LDA, QDA, and support vector machine, and Berrar [5] provided a tutorial on linear and quadratic classifiers.
Several advanced DA techniques have been developed to address the limitations of traditional methods with high-dimensional or incomplete data. Regularized discriminant analysis (RDA) [6,7] combines LDA and QDA through covariance regularization, improving performance in small-sample and multicollinear settings. Flexible discriminant analysis (FDA) [8] extends LDA using optimal scoring and nonparametric regression, enabling more complex decision boundaries. Mixture discriminant analysis (MDA) [9,10] models each class as a mixture of Gaussian components, enhancing classification for multimodal datasets. Kernel discriminant analysis (KDA) [11] applies the kernel trick to achieve nonlinear separation in high-dimensional feature spaces. Shrinkage discriminant analysis (SDA) [12] stabilizes covariance estimates in high-dimensional problems through shrinkage toward structured targets. When combined with machine learning-based imputation methods [13,14], these approaches offer robust and flexible frameworks for multi-class classification in diverse applications.
Incomplete data is a pervasive issue across many real-world datasets, often arising from non-responses to surveys, equipment failures, or data entry errors. If left unaddressed, such missingness can lead to biased parameter estimates, reduced statistical power, and flawed conclusions. This issue is particularly critical in classification problems, where the quality of training data directly impacts model performance. Traditional strategies, such as listwise deletion, pairwise deletion, and single-value imputation, are simple to implement but suffer from significant limitations. Deletion methods reduce the effective sample size and introduce bias when data are not missing completely at random [15]. At the same time, single-imputation methods often fail to retain multivariate relationships, underestimating variability and thereby weakening the classifier’s performance.
Recent literature highlights the considerable consequences of missing data on model validity and reliability, especially in medical and clinical research domains [16]. Kang [17] emphasized that improper handling of missing data could distort parameter estimates and compromise study outcomes, particularly in hypothesis testing and prediction. To address these challenges, advanced imputation techniques such as multiple imputation, regression-based methods, hot-deck imputation, and probabilistic models have been proposed to preserve data structure and reduce estimation bias. As noted by Agiwal and Chaudhuri [18], appropriate handling of missing data not only mitigates biases but also enhances the representativeness and generalizability of findings in statistical modeling.
Ongoing research efforts have also focused on enhancing DA performance through advanced imputation strategies in incomplete or noisy datasets. Palanivinayagam and Damaševičius [19] employed SVM regression for missing value imputation, improving classification accuracy in diabetes detection. Khashei et al. [20] developed soft computing and ensemble-based imputation strategies for pattern classification under missing data. Sharmila et al. [21] provided a comprehensive review of imputation methods, discussing their strengths, limitations, and suitability for different missing data mechanisms. Together, these approaches provide a robust and flexible framework for multi-class classification in diverse application domains.
The integration of machine learning-based imputation with statistical classification frameworks has emerged as a promising approach in data analysis, particularly for multi-class problems where handling missing data is crucial. Rácz and Gere [22] compared various imputation methods, including KNN, lasso regression, and Bayesian approaches, showing that performance depends heavily on data type and structure. Hong et al. [23] demonstrated that machine learning-based imputation enhances classification accuracy in diabetes prediction when combined with decision-tree models. Bai et al. [24] developed an autoencoder-based imputation method integrated with deep learning classifiers, achieving strong results in medical datasets with high missingness.
The developments in missing-data imputation have increasingly emphasized the integration of statistical and machine learning frameworks to enhance predictive performance and data reliability. In addition, van Buuren [25] synthesizes recent advances and provides practice-oriented guidance via the MICE (Multiple Imputation by Chained Equations) framework for obtaining unbiased estimates and valid confidence intervals, further underscoring the practical relevance of multiple imputation in applied settings. Audigier et al. [26] proposed a comprehensive multiple imputation framework for multilevel data with continuous and binary variables, demonstrating its ability to preserve hierarchical data structures and reduce estimation bias in complex datasets. Similarly, Resche-Rigon et al. [27] and Zhang et al. [28] extended these ideas by incorporating flexible modeling strategies and computationally efficient algorithms for large-scale applications. These advances underscore the increasing importance of integrating multiple imputation with machine learning frameworks to enhance classification accuracy, robustness, and interpretability in incomplete data environments. Furthermore, Zhang and Li [29] proposed a GAN-based imputation approach for multivariate time-series data, demonstrating improved reconstruction accuracy for complex temporal dependencies. Similarly, Park et al. [30] introduced a hybrid model combining MICE with variational autoencoders, which effectively balances statistical interpretability with nonlinear feature learning.
Previous studies on discriminant analysis and missing-data imputation have primarily focused on developing individual algorithms or conducting pairwise comparisons within specific settings. For example, Friedman [6] introduced the concept of regularized discriminant analysis, while Hastie et al. [8] proposed flexible discriminant analysis by optimal scoring. Later, Schäfer and Strimmer [31] and Stekhoven and Bühlmann [32] advanced covariance shrinkage and nonparametric imputation methods, respectively. However, these studies did not comprehensively integrate multiple imputation strategies with a wide range of discriminant analysis methods under different missing-data mechanisms, nor did they evaluate their performance in multi-class classification contexts.
The novelty of this study lies in the development of an integrated simulation framework that systematically combines six discriminant analysis techniques (LDA, RDA, FDA, MDA, KDA, SDA) with six machine-learning-based imputation methods (Mean, Regression, KNN, Random Forest, Bagged Trees, and MissRanger). This framework enables an in-depth examination of the interaction between imputation quality and classifier flexibility under various missing-data patterns. In addition, the study evaluates performance not only through accuracy but also using Cohen’s kappa, confidence intervals, and standard deviations to statistically characterize model reliability. The proposed framework, therefore, extends beyond traditional empirical benchmarks by providing methodological insight into how imputation methods influence the performance of discriminant analysis in realistic multi-class scenarios.
To provide a clear overview of the study, the structure of this paper is organized as follows: Section 1: Introduction presents the background, motivation, and importance of integrating imputation with classification methods. Section 2: Methodology outlines the discriminant analysis techniques employed in this study, including Linear, Regularized, Flexible, Mixture, and Shrinkage Discriminant Analysis, along with the imputation strategies used. Section 3: The Simulation Study and Results reports the outcomes under varying sample sizes, correlation levels, and proportions of missingness. Section 4: Results of Actual Data demonstrates the application of the proposed framework to real clinical data. Section 5: Discussion interprets the findings and evaluates the strengths and limitations of each method. Finally, Section 6: Conclusion summarizes the key contributions and provides suggestions for future research.

2. Methodology

This section outlines the discriminant analysis techniques utilized in this study, which include linear discriminant analysis (LDA), regularized discriminant analysis (RDA), flexible discriminant analysis (FDA), mixture discriminant analysis (MDA), kernel discriminant analysis (KDA), and shrinkage discriminant analysis (SDA). These methods were selected based on their complementary properties in handling linearity, flexibility, regularization, and high-dimensionality in classification tasks. Each technique is briefly described below, along with the general framework for implementation.

2.1. Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a statistical method for extracting the most informative features from data by removing redundant and noisy components [33] that best separate two or more classes. It assumes that the data from each class are normally distributed with a common covariance matrix. The method projects the data into a lower-dimensional space where the separation between classes is maximized. The discriminant function is obtained by optimizing the ratio of between-class variance to within-class variance.
LDA is derived from Bayes’ theorem. Let $\mathbf{x} \in \mathbb{R}^p$ denote a $p$-dimensional predictor vector, and suppose it belongs to one of $K$ classes $\omega_1, \omega_2, \ldots, \omega_K$. The classification goal is to assign $\mathbf{x}$ to the most probable class given the observed data, i.e., to maximize the posterior probability $P(\omega_k \mid \mathbf{x})$.
Bayes’ Theorem gives the posterior probability of class membership as
$$P(\omega_k \mid \mathbf{x}) = \frac{f_k(\mathbf{x})\, P(\omega_k)}{f(\mathbf{x})},$$
where $f_k(\mathbf{x})$ is the class-conditional density of $\mathbf{x}$ given class $\omega_k$, $P(\omega_k)$ is the prior probability of class $\omega_k$, and $f(\mathbf{x}) = \sum_{j=1}^{K} f_j(\mathbf{x})\, P(\omega_j)$ is the marginal density of $\mathbf{x}$. Since $f(\mathbf{x})$ is constant across classes for a given $\mathbf{x}$, the Bayes classifier assigns $\mathbf{x}$ to the class that maximizes
$$\delta(\mathbf{x}) = \arg\max_k \left[ \log f_k(\mathbf{x}) + \log P(\omega_k) \right].$$
Assume each class $\omega_k$ follows a multivariate normal distribution, $\mathbf{x} \mid \omega_k \sim N(\boldsymbol{\mu}_k, \Sigma)$, with a shared covariance matrix $\Sigma$ across all classes.
Then the class-conditional density is
$$f_k(\mathbf{x}) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) \right].$$
Taking the logarithm of the posterior probability and simplifying yields the linear discriminant function:
$$g_k(\mathbf{x}) = \mathbf{x}^T \Sigma^{-1} \boldsymbol{\mu}_k - \frac{1}{2} \boldsymbol{\mu}_k^T \Sigma^{-1} \boldsymbol{\mu}_k + \log P(\omega_k).$$
The decision rule assigns $\mathbf{x}$ to the class with the highest discriminant score, i.e., to $\omega_k$ such that $g_k(\mathbf{x}) = \max_j g_j(\mathbf{x})$. The decision boundary between any two classes $\omega_i$ and $\omega_j$ is defined by $g_i(\mathbf{x}) - g_j(\mathbf{x}) = 0$, which leads to the linear equation
$$(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \Sigma^{-1} \mathbf{x} = \frac{1}{2} \left( \boldsymbol{\mu}_i^T \Sigma^{-1} \boldsymbol{\mu}_i - \boldsymbol{\mu}_j^T \Sigma^{-1} \boldsymbol{\mu}_j \right) + \log \frac{P(\omega_j)}{P(\omega_i)}.$$
The union of all such pairwise boundaries determines the partitioning of the input space into K decision regions for multi-class classification.
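As a concrete illustration of the scoring rule above, the following minimal Python/NumPy sketch computes $g_k(\mathbf{x})$ for a toy two-class problem. The class means, shared covariance, and priors are illustrative values only, not quantities from this study (whose implementation was in R):

```python
import numpy as np

def lda_scores(X, means, Sigma, priors):
    """Linear discriminant scores g_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log p_k.

    X      : (n, p) observations
    means  : (K, p) class mean vectors mu_k
    Sigma  : (p, p) pooled covariance shared across classes
    priors : (K,)  prior probabilities P(omega_k)
    """
    Sinv = np.linalg.inv(Sigma)
    lin = X @ Sinv @ means.T                                    # x' Sinv mu_k, shape (n, K)
    quad = 0.5 * np.einsum("kp,pq,kq->k", means, Sinv, means)   # mu_k' Sinv mu_k / 2, shape (K,)
    return lin - quad + np.log(priors)

# Toy two-class problem: well-separated means, identity covariance, equal priors.
means = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigma = np.eye(2)
priors = np.array([0.5, 0.5])
X = np.array([[0.1, -0.2], [2.9, 3.1]])
pred = lda_scores(X, means, Sigma, priors).argmax(axis=1)
```

Taking the argmax across columns implements the decision rule $g_k(\mathbf{x}) = \max_j g_j(\mathbf{x})$.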

2.2. Regularized Discriminant Analysis

Regularized discriminant analysis (RDA), first proposed by Friedman [6], serves as a flexible extension of LDA by relaxing the strict homoscedasticity assumption of equal covariance matrices across groups. While LDA assumes that each class $\omega_k$ shares a common covariance matrix $\Sigma$, RDA introduces a regularization framework that allows the discriminant function to interpolate between LDA and QDA, permitting class-specific covariance matrices $\Sigma_k$.
The derivation of RDA begins with Bayes’ decision rule, which assigns a new observation $\mathbf{x} \in \mathbb{R}^p$ to the class $\omega_k$ that maximizes the posterior probability $P(\omega_k \mid \mathbf{x})$. Under the assumption of multivariate normality with class-specific covariance matrices $\Sigma_k$, the discriminant function takes the quadratic form
$$g_k(\mathbf{x}) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma_k^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(\omega_k).$$
In LDA, the covariance matrices Σ k are replaced with the pooled covariance matrix Σ , resulting in linear decision boundaries. However, when the assumption of equal covariance is violated, the LDA classifier can suffer from poor classification performance.
RDA mitigates the limitations of LDA by introducing a regularized covariance estimator defined as
$$\Sigma_k(\lambda) = \lambda\, \Sigma_k + (1 - \lambda)\, \Sigma, \quad \text{where } 0 \le \lambda \le 1.$$
When $\lambda \to 0$, $\Sigma_k(\lambda)$ approaches the pooled covariance matrix and RDA reduces to LDA with linear decision boundaries; when $\lambda \to 1$, it approaches QDA with class-specific quadratic boundaries.
An additional regularization parameter $\gamma \in [0, 1]$ further shrinks $\Sigma_k(\lambda)$ toward its diagonal form, controlling overfitting, promoting numerical stability, and addressing the curse of dimensionality:
$$\Sigma_k(\lambda, \gamma) = \gamma\, \Sigma_k(\lambda) + (1 - \gamma)\, \mathrm{diag}\big(\Sigma_k(\lambda)\big).$$
Smaller γ values smooth the boundaries by reducing inter-variable correlations, yielding more regularized and stable discriminant functions. Consequently, ( λ , γ ) together define a continuum between fully flexible quadratic models and stable linear boundaries, adapting the classifier to varying dimensionality and correlation structures.
The final RDA discriminant function becomes
$$g_k(\mathbf{x}) = -\frac{1}{2} \log |\Sigma_k(\lambda, \gamma)| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_k)^T \Sigma_k(\lambda, \gamma)^{-1} (\mathbf{x} - \boldsymbol{\mu}_k) + \log P(\omega_k).$$
Through the parameters λ and γ , RDA offers a continuum of classifiers ranging from LDA to diagonal-based models. This flexibility makes RDA particularly effective in settings where model complexity must be carefully balanced against sample size and dimensionality.
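The two-stage estimator $\Sigma_k(\lambda, \gamma)$ can be sketched directly from the formulas above. This minimal NumPy illustration uses toy covariance matrices (not from the study) and checks the two limiting cases, $\lambda = 0$ (pooled, LDA-like) and $\gamma = 0$ (diagonal):

```python
import numpy as np

def rda_covariance(Sigma_k, Sigma_pooled, lam, gamma):
    """Two-stage RDA covariance estimator:
    Sigma_k(lam)        = lam * Sigma_k + (1 - lam) * Sigma_pooled
    Sigma_k(lam, gamma) = gamma * Sigma_k(lam) + (1 - gamma) * diag(Sigma_k(lam))
    """
    S_lam = lam * Sigma_k + (1.0 - lam) * Sigma_pooled
    return gamma * S_lam + (1.0 - gamma) * np.diag(np.diag(S_lam))

# Toy class-specific and pooled covariance matrices.
Sigma_k = np.array([[2.0, 0.8], [0.8, 1.0]])
Sigma_pooled = np.array([[1.0, 0.2], [0.2, 1.0]])

# lam = 0 recovers the pooled (LDA-like) covariance; gamma = 0 removes correlations.
lda_like = rda_covariance(Sigma_k, Sigma_pooled, lam=0.0, gamma=1.0)
diagonal = rda_covariance(Sigma_k, Sigma_pooled, lam=1.0, gamma=0.0)
```

Intermediate values of $(\lambda, \gamma)$ trace out the continuum of classifiers described above.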

2.3. Flexible Discriminant Analysis

Flexible discriminant analysis (FDA), introduced by Hastie et al. [8], extends the classical LDA by allowing nonlinear relationships between predictors and class labels through basis expansions. While LDA assumes that class-conditional distributions are multivariate normal with common covariance matrices, FDA relaxes this linearity constraint by projecting the predictors into a higher-dimensional feature space.
Let $\mathbf{x}_i \in \mathbb{R}^p$ represent the predictor vector for the $i$-th observation, where $i = 1, \ldots, n$. The first step of FDA transforms each predictor vector into a higher-dimensional space using a set of basis functions $\phi(\mathbf{x}_i) = [\phi_1(\mathbf{x}_i), \phi_2(\mathbf{x}_i), \ldots, \phi_M(\mathbf{x}_i)]^T$, where $M$ is the number of basis functions; the choice of basis functions determines the model’s flexibility. Applying this transformation to all $n$ observations yields the design matrix
$$\Phi = \begin{bmatrix} \phi(\mathbf{x}_1)^T \\ \phi(\mathbf{x}_2)^T \\ \vdots \\ \phi(\mathbf{x}_n)^T \end{bmatrix} \in \mathbb{R}^{n \times M}.$$
Given a multi-class problem with $K$ distinct classes, the response variable $y_i \in \{1, 2, \ldots, K\}$ is encoded using a class indicator matrix $G \in \mathbb{R}^{n \times K}$, where
$$G_{ik} = \begin{cases} 1, & \text{if } y_i = k, \\ 0, & \text{otherwise}. \end{cases}$$
This binary encoding represents the class membership of each observation.
The core of FDA lies in the application of optimal scoring to transform the categorical outcome variable into a continuous score matrix $Y \in \mathbb{R}^{n \times (K-1)}$. The goal is to find both $Y$ and a coefficient matrix $B \in \mathbb{R}^{M \times (K-1)}$ that minimize the Frobenius norm of the residuals in a multivariate regression setting:
$$\min_{Y, B} \; \| Y - \Phi B \|_F^2 \quad \text{subject to} \quad Y^T Y = I_{K-1},$$
where $I_{K-1}$ is the identity matrix of size $K - 1$. The orthonormality constraint ensures that the optimal scores are uncorrelated and have unit variance, which facilitates clear separation of classes in the reduced space. Once the optimal score matrix $Y$ is determined, the coefficient matrix $B$ is estimated via multivariate least squares:
$$\hat{B} = (\Phi^T \Phi)^{-1} \Phi^T Y.$$
To classify a new observation x new , the basis-expanded vector is computed. This vector is then projected onto the discriminant space using the estimated coefficients:
$$\hat{\mathbf{y}}_{\text{new}} = \phi(\mathbf{x}_{\text{new}})^T \hat{B}.$$
The predicted class is assigned by comparing y ^ new to the centroids of the training samples in the discriminant space, typically using a nearest centroid rule or a Gaussian-based classifier.
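The procedure above can be approximated with a simpler, closely related sketch: expand the predictors with a basis, regress the class-indicator matrix $G$ on $\Phi$ by least squares, and classify by the nearest centroid in the fitted score space. The quadratic basis and the toy data below are illustrative assumptions, not the basis used in the study, and the indicator-regression shortcut stands in for the full optimal-scoring iteration:

```python
import numpy as np

def quad_basis(X):
    """Simple basis expansion phi(x) = [1, x, x^2] applied coordinate-wise."""
    return np.hstack([np.ones((X.shape[0], 1)), X, X**2])

def fda_fit(X, y, n_classes):
    Phi = quad_basis(X)
    G = np.eye(n_classes)[y]                     # class indicator matrix G (n x K)
    B, *_ = np.linalg.lstsq(Phi, G, rcond=None)  # least-squares coefficients
    scores = Phi @ B
    centroids = np.array([scores[y == k].mean(axis=0) for k in range(n_classes)])
    return B, centroids

def fda_predict(X, B, centroids):
    scores = quad_basis(X) @ B
    d = ((scores[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)                      # nearest-centroid rule

# One-dimensional example that is NOT linearly separable: class 1 sits in the middle.
X = np.array([[-2.0], [-1.8], [0.0], [0.1], [1.9], [2.1]])
y = np.array([0, 0, 1, 1, 0, 0])
B, C = fda_fit(X, y, n_classes=2)
pred = fda_predict(X, B, C)
```

Because the basis includes $x^2$, the fitted scores separate the middle class from the outer class even though no linear rule on $x$ could.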

2.4. Mixture Discriminant Analysis

Mixture discriminant analysis (MDA) [9] models each class as a mixture of multivariate normal distributions, rather than assuming a single Gaussian component per class. This allows for more flexible modeling of class distributions that may be multimodal or heterogeneous. The model is typically fitted using the Expectation-Maximization (EM) algorithm. MDA is particularly effective in situations where class conditional densities deviate from unimodal assumptions.
MDA addresses the unimodality limitation of LDA by modeling each class as a finite mixture of Gaussian distributions [34]. This enables MDA to accommodate complex, multimodal class structures, providing a more flexible approach to classification than LDA.
Let $\mathbf{x} \in \mathbb{R}^p$ denote a $p$-dimensional random vector, and let $Y \in \{1, \ldots, K\}$ represent the class label. Under the MDA framework, the class-conditional density $f_k(\mathbf{x})$ is expressed as a Gaussian mixture:
$$f_k(\mathbf{x}) = \sum_{r=1}^{R_k} \pi_{kr}\, N(\mathbf{x}; \boldsymbol{\mu}_{kr}, \Sigma_{kr}),$$
where $\pi_{kr}$ are the mixing proportions with $\sum_{r=1}^{R_k} \pi_{kr} = 1$, $\boldsymbol{\mu}_{kr} \in \mathbb{R}^p$ and $\Sigma_{kr} \in \mathbb{R}^{p \times p}$ are the mean and covariance of component $r$ in class $k$, and $N(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$ denotes the multivariate normal density.
The posterior probability that observation x belongs to class k is
$$P(Y = k \mid \mathbf{x}) = \frac{P(Y = k)\, f_k(\mathbf{x})}{\sum_{j=1}^{K} P(Y = j)\, f_j(\mathbf{x})}.$$
The classification rule assigns x to the class
$$\hat{y} = \arg\max_k P(Y = k \mid \mathbf{x}).$$
Estimation of the mixture parameters is typically carried out using the Expectation-Maximization (EM) algorithm. For each class k , the EM steps iterate as follows:
E-step:
$$\gamma_{i,kr}^{(t)} = \frac{\pi_{kr}^{(t-1)}\, N(\mathbf{x}_i; \boldsymbol{\mu}_{kr}^{(t-1)}, \Sigma_{kr}^{(t-1)})}{\sum_{s=1}^{R_k} \pi_{ks}^{(t-1)}\, N(\mathbf{x}_i; \boldsymbol{\mu}_{ks}^{(t-1)}, \Sigma_{ks}^{(t-1)})},$$
M-step:
$$\pi_{kr}^{(t)} = \frac{1}{n_k} \sum_{i: y_i = k} \gamma_{i,kr}^{(t)}, \qquad \boldsymbol{\mu}_{kr}^{(t)} = \frac{\sum_{i: y_i = k} \gamma_{i,kr}^{(t)}\, \mathbf{x}_i}{\sum_{i: y_i = k} \gamma_{i,kr}^{(t)}}, \qquad \Sigma_{kr}^{(t)} = \frac{\sum_{i: y_i = k} \gamma_{i,kr}^{(t)} (\mathbf{x}_i - \boldsymbol{\mu}_{kr}^{(t)})(\mathbf{x}_i - \boldsymbol{\mu}_{kr}^{(t)})^T}{\sum_{i: y_i = k} \gamma_{i,kr}^{(t)}}.$$
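One EM iteration for the mixture fitted within a single class can be sketched as follows. The two-component setup, starting values, and simulated data are illustrative assumptions; the E-step computes the responsibilities $\gamma_{i,kr}$ and the M-step applies the weighted updates above:

```python
import numpy as np

def gauss_pdf(X, mu, Sigma):
    """Multivariate normal density N(x; mu, Sigma) evaluated row-wise."""
    p = mu.shape[0]
    diff = X - mu
    Sinv = np.linalg.inv(Sigma)
    expo = -0.5 * np.einsum("ip,pq,iq->i", diff, Sinv, diff)
    norm = 1.0 / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))
    return norm * np.exp(expo)

def em_step(X, pis, mus, Sigmas):
    """One EM iteration for a Gaussian mixture fitted within one class."""
    R = len(pis)
    # E-step: responsibilities gamma_{i r}
    dens = np.column_stack([pis[r] * gauss_pdf(X, mus[r], Sigmas[r]) for r in range(R)])
    gam = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted updates of proportions, means, and covariances
    Nk = gam.sum(axis=0)
    pis_new = Nk / X.shape[0]
    mus_new = [gam[:, r] @ X / Nk[r] for r in range(R)]
    Sigmas_new = []
    for r in range(R):
        d = X - mus_new[r]
        Sigmas_new.append((gam[:, r][:, None] * d).T @ d / Nk[r])
    return pis_new, mus_new, Sigmas_new

# Toy bimodal "class": two well-separated Gaussian clumps.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, size=(60, 2)), rng.normal(3.0, 0.5, size=(60, 2))])
pis = np.array([0.5, 0.5])
mus = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
Sigmas = [np.eye(2), np.eye(2)]
for _ in range(25):
    pis, mus, Sigmas = em_step(X, pis, mus, Sigmas)
```

After a few iterations the component means settle near the two clump centers, illustrating how the mixture captures a multimodal class density.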

2.5. Kernel Discriminant Analysis

Kernel discriminant analysis (KDA) is an extension of the classical LDA that enables the modeling of nonlinear decision boundaries by employing the kernel trick to implicitly map data from the original input space $\mathbb{R}^p$ into a high-dimensional reproducing kernel Hilbert space $F$ through a nonlinear transformation $\phi: \mathbb{R}^p \to F$. This transformation allows classes that are not linearly separable in the original space to become separable in the feature space without computing $\phi(\mathbf{x})$ explicitly.
In the multi-class classification setting with $K$ classes, KDA aims to find projection directions in $F$ that maximize the ratio of between-class scatter to within-class scatter, analogous to Fisher’s criterion but implemented entirely in kernel space. Using the kernel trick, the inner products in $F$ are computed through a kernel function $k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle_F$, with common choices including the linear, polynomial, and Gaussian radial basis function kernels [35].
The between-class scatter matrix in kernel space is constructed from the class mean vectors m k ϕ for k = 1 , , K while the within-class scatter matrix captures deviations of each sample from its respective class mean. The optimal discriminant subspace is obtained by solving the generalized eigenvalue problem:
$$S_B^K \boldsymbol{\alpha} = \lambda\, S_W^K \boldsymbol{\alpha},$$
where $K$ is the kernel Gram matrix from which the scatter matrices are constructed and $\boldsymbol{\alpha}$ is the coefficient vector. The between-class and within-class scatter matrices in kernel representation are written as
$$S_B^K = \sum_{k=1}^{K} N_k (\mathbf{m}_k^{\phi} - \mathbf{m}^{\phi})(\mathbf{m}_k^{\phi} - \mathbf{m}^{\phi})^T, \qquad S_W^K = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in \omega_k} (\phi(\mathbf{x}_i) - \mathbf{m}_k^{\phi})(\phi(\mathbf{x}_i) - \mathbf{m}_k^{\phi})^T.$$
The solution yields up to $K - 1$ discriminant vectors that form the nonlinear projection space [36]. Once the data are projected into the discriminant subspace, classification is based on discriminant scores: a new observation $\mathbf{x}$ is assigned to the class $\omega_k$ that maximizes
$$\hat{y} = \arg\max_{k \in \{1, \ldots, K\}} g_k(\mathbf{x}),$$
where $g_k(\mathbf{x})$ is the discriminant score for class $k$ computed in the kernel space.
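A compact numerical sketch of this construction, using a Gaussian RBF kernel and a small ridge term on the within-class scatter for invertibility (both choices are illustrative assumptions, not settings from the study), is:

```python
import numpy as np
from scipy.linalg import eigh

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))

def kda_fit(X, y, n_classes, sigma=1.0, reg=1e-3):
    """Discriminant directions from the generalized eigenproblem in kernel form."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    m = K.mean(axis=1)                        # overall kernel mean vector
    SB = np.zeros((n, n))                     # between-class scatter (kernel form)
    SW = np.zeros((n, n))                     # within-class scatter (kernel form)
    for k in range(n_classes):
        idx = np.where(y == k)[0]
        nk = len(idx)
        mk = K[:, idx].mean(axis=1)
        SB += nk * np.outer(mk - m, mk - m)
        Kk = K[:, idx]
        SW += Kk @ (np.eye(nk) - np.full((nk, nk), 1.0 / nk)) @ Kk.T
    # small ridge term keeps the within-class scatter invertible
    _, vecs = eigh(SB, SW + reg * np.eye(n))
    A = vecs[:, -(n_classes - 1):]            # up to K-1 discriminant vectors
    Z = K @ A
    centroids = np.array([Z[y == k].mean(axis=0) for k in range(n_classes)])
    return A, centroids

def kda_predict(Xnew, Xtrain, A, centroids, sigma=1.0):
    Z = rbf_kernel(Xnew, Xtrain, sigma) @ A
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Concentric rings: not linearly separable in the input space.
theta = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([0.5 * ring, 2.0 * ring])
y = np.array([0] * 20 + [1] * 20)
A, C = kda_fit(X, y, n_classes=2)
pred = kda_predict(X, X, A, C)
```

On the concentric-ring data, the single kernel discriminant direction separates the two classes that no linear boundary in the input space could.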

2.6. Shrinkage Discriminant Analysis

Shrinkage discriminant analysis (SDA) is a modern extension of LDA that is specifically designed to handle high-dimensional data, particularly when the number of predictors is large compared to, or even exceeds, the number of observations. In such cases, the sample covariance matrix ( Σ ) used in LDA becomes singular or unstable, which can severely impair classification performance. SDA addresses this issue through a process called shrinkage estimation. The discriminant function is defined as
$$g_k(\mathbf{x}) = \mathbf{x}^T \Sigma^{-1} \boldsymbol{\mu}_k - \frac{1}{2} \boldsymbol{\mu}_k^T \Sigma^{-1} \boldsymbol{\mu}_k + \log P(\omega_k),$$
where μ k is the mean vector for class k , P ( ω k ) is the prior probability of class k , and x is a new observation.
However, the sample covariance matrix Σ ^ is often poorly estimated or singular, leading to unreliable inverse estimates and overfitting. SDA addresses this by applying shrinkage estimation to the covariance matrix:
$$\hat{\Sigma}_{\text{shrink}} = (1 - \lambda)\, \hat{\Sigma} + \lambda\, T,$$
where $\hat{\Sigma}$ is the sample covariance matrix, $T$ is a shrinkage target (often a scaled identity matrix $I_p$), and $\lambda \in [0, 1]$ is the shrinkage intensity parameter, which controls the trade-off between the sample estimate and the target matrix. The optimal $\lambda$ is chosen to minimize the expected quadratic loss:
$$\lambda^{*} = \arg\min_{\lambda} \; E\!\left[ \left\| \hat{\Sigma}_{\text{shrink}} - \Sigma \right\|_F^2 \right].$$
In this study, the shrinkage target T is defined as a scaled identity matrix ( T = c I p ), where c is the average of the variances of the predictors. This target assumes equal variances and zero covariances among features, providing a stable and interpretable regularization baseline. The use of a diagonal target is consistent with the framework proposed by Ledoit and Wolf [37] and Schäfer and Strimmer [38], which is effective in high-dimensional settings.
The shrinkage intensity λ controls the trade-off between bias and variance: larger λ values increase shrinkage toward T , reducing estimation variance but introducing bias; smaller λ values retain more of the empirical covariance structure, reducing bias but potentially increasing variance. This balance directly influences the smoothness and flexibility of the discriminant boundaries. Empirically, moderate λ values yield stable classification performance, particularly when the predictor correlation is moderate to high.
This makes SDA both theoretically sound and computationally efficient. With the shrinkage estimator, the modified discriminant function becomes
$$g_k^{\text{SDA}}(\mathbf{x}) = \mathbf{x}^T \hat{\Sigma}_{\text{shrink}}^{-1} \hat{\boldsymbol{\mu}}_k - \frac{1}{2} \hat{\boldsymbol{\mu}}_k^T \hat{\Sigma}_{\text{shrink}}^{-1} \hat{\boldsymbol{\mu}}_k + \log \hat{P}(\omega_k).$$
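The shrinkage estimator with the diagonal target $T = cI_p$ can be sketched in a few lines. The $p > n$ toy example below (random data, not from the study) shows why shrinkage is needed: the raw sample covariance is singular, while the shrunken estimate is invertible:

```python
import numpy as np

def shrinkage_cov(X, lam):
    """Sigma_shrink = (1 - lam) * S + lam * T, with diagonal target T = c * I_p,
    where c is the average of the sample variances (the target used in the text)."""
    S = np.cov(X, rowvar=False)
    c = np.trace(S) / S.shape[0]
    T = c * np.eye(S.shape[0])
    return (1.0 - lam) * S + lam * T

# p > n: the sample covariance has rank at most n - 1 and cannot be inverted.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 20))          # n = 10 observations, p = 20 predictors
S = np.cov(X, rowvar=False)            # singular
S_shrunk = shrinkage_cov(X, lam=0.3)   # positive definite, hence invertible
```

Because $T$ is positive definite, any $\lambda > 0$ restores full rank, so $\hat{\Sigma}_{\text{shrink}}^{-1}$ in the SDA discriminant function is well defined.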

3. The Simulation Study and Results

To evaluate the performance of multiple discriminant analysis techniques, namely LDA, RDA, FDA, MDA, KDA, and SDA, in the presence of incomplete data, a comprehensive simulation study was conducted using the methods described in the previous section. The design incorporated variations in the number of predictors, correlation structures, and missing data mechanisms.
The predictor variables were generated from a multivariate normal distribution with two levels of dimensionality: $p$ = 5 and $p$ = 10. For each dimension, two levels of pairwise correlation were considered: $\rho$ = 0.3 and $\rho$ = 0.7. The multivariate normal distribution is defined as
$$f(\mathbf{x}_i) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \right],$$
where $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ denotes the $p$-dimensional predictor vector for observation $i = 1, 2, \ldots, n$, with sample sizes $n$ = 100, 300, and 500; $\boldsymbol{\mu}$ is the mean vector; and $\Sigma$ is the covariance matrix that controls the correlation among variables. Each vector was drawn independently as $\mathbf{x}_i \sim N_p(\boldsymbol{\mu}, \Sigma)$, where $\boldsymbol{\mu} = \mathbf{0}$ is a zero-mean vector and $\Sigma$ has a Toeplitz correlation structure. The $(j, k)$ entry of $\Sigma$ is defined as $\Sigma_{jk} = \rho^{|j-k|}$ for $j, k = 1, \ldots, p$, which ensures a stationary correlation pattern in which the correlation decreases exponentially with the distance between variable indices.
The Toeplitz covariance structure was adopted because it models a stationary correlation pattern, where correlations between predictors decay exponentially with their index distance. This structure is both mathematically tractable and empirically realistic, as it approximates dependence patterns found in temporal, spatial, and biological data. Unlike identity or block-diagonal matrices, the Toeplitz form maintains positive definiteness while allowing controlled variation in correlation strength.
Furthermore, adopting the Toeplitz structure enhances the comparability and reproducibility of simulation settings commonly used in multivariate studies, such as the studies by Schäfer and Strimmer [31] and Ledoit and Wolf [37]. By systematically varying $\rho$ = 0.3 and 0.7, this study evaluates model performance across weak and strong dependence scenarios, ensuring that the conclusions remain generalizable to a wide range of real-world data contexts.
The outcome variable $y$ is a four-class categorical variable derived from a latent variable $z_i = 1 + \sum_{j=1}^{p} x_{ij}$, $i = 1, \ldots, n$, which is transformed using the logistic function.
The logistic transformation was employed to map the latent continuous variable z i into class probabilities because it provides a smooth and symmetric link between the latent space and the (0, 1) probability interval:
$$\text{pr}_i = \frac{1}{1 + \exp(-z_i)}.$$
This function ensures monotonicity and interpretability of thresholds when defining categorical outcomes while maintaining numerical stability during simulation. Logistic thresholds are commonly used in simulation-based classification studies, for instance, the studies by Hastie et al. [8] and Bai et al. [11] due to their probabilistic interpretability and analytical simplicity.
A four-class categorical response variable y i was defined based on thresholds of pr i as follows:
$$y_i = \begin{cases} 1, & \text{if } \text{pr}_i \le 0.25, \\ 2, & \text{if } 0.25 < \text{pr}_i \le 0.5, \\ 3, & \text{if } 0.5 < \text{pr}_i \le 0.75, \\ 4, & \text{if } \text{pr}_i > 0.75. \end{cases}$$
The number of classes in the simulated categorical outcome was set to K = 4 .
Theoretically, the dimensionality of the discriminant subspace in a K -class problem is at most K 1 . Therefore, using four classes yields three discriminant functions, which allows sufficient complexity for meaningful separation among groups while maintaining interpretability and computational efficiency. The detailed simulation setup is provided in Appendix A.
This choice balances model complexity with clarity of interpretation. It aligns with prior simulation frameworks in discriminant analysis, as evidenced by the studies of Hastie et al. [9] and Ahdesmäki and Strimmer [38]. Increasing the number of classes beyond four would introduce excessive overlap among groups and complicate the evaluation of classification boundaries. In contrast, fewer than four classes would limit the assessment of nonlinear and regularized discriminant effects.
To simulate real-world data imperfections, a mechanism for missing data was introduced. Specifically, for each dataset, 10% of the values in every predictor column were randomly replaced with missing values, following a Missing Completely at Random (MCAR) pattern. The missingness mechanism was designed to follow a strict MCAR process, independent of both the observed data ($X_{\text{obs}}$) and the unobserved data ($X_{\text{mis}}$), defined as
$$P(R_{ij} = 1 \mid X_{\text{obs}}, X_{\text{mis}}) = P(R_{ij} = 1) = \pi, \quad \forall\, i, j,$$
where R i j is the missingness indicator for observation i , variable j , and π = 0.1 is the fixed missing proportion. This ensures that the probability of missingness is independent of both observed and unobserved data, satisfying the formal MCAR assumption.
Missing entries were generated using random draws from a uniform distribution U ( 0 , 1 ) , where an element was set to missing if the random number exceeded the 0.90 quantile. This procedure guarantees that missingness occurs purely at random across all variables and simulation replications, eliminating potential bias in parameter estimation or classification performance due to the missingness mechanism.
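The full generative design, Toeplitz covariance, logistic latent variable, four-class thresholds, and MCAR masking, can be summarized in one sketch. The study ran in R; this NumPy version is an illustrative translation, and the particular $n$, $p$, $\rho$, and seed below are arbitrary choices:

```python
import numpy as np

def simulate(n=300, p=5, rho=0.7, miss=0.10, seed=42):
    """Sketch of the simulation design: Toeplitz-correlated normal predictors,
    a logistic latent variable cut into four classes, and MCAR missingness."""
    rng = np.random.default_rng(seed)
    # Toeplitz structure: Sigma_jk = rho^|j - k|
    Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    z = 1.0 + X.sum(axis=1)                      # latent z_i = 1 + sum_j x_ij
    pr = 1.0 / (1.0 + np.exp(-z))                # logistic transform to (0, 1)
    y = np.digitize(pr, [0.25, 0.5, 0.75]) + 1   # four classes via the thresholds
    X_miss = X.copy()
    # MCAR: an independent uniform draw per cell, ~10% set to missing
    X_miss[rng.uniform(size=X.shape) > 1.0 - miss] = np.nan
    return X_miss, y, Sigma

X_miss, y, Sigma = simulate()
```

Each replication of the study's design corresponds to one such draw with the appropriate $(n, p, \rho)$ setting.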

3.1. Statistical Methods

3.1.1. Mean Imputation

In mean imputation, the missing values in a variable are replaced by the arithmetic mean of the observed values in that variable. Formally, let $X = (x_1, x_2, \ldots, x_n)$ be a variable with $m < n$ observed values. The mean of the observed values is defined as
$\bar{x}_{obs} = \frac{1}{m} \sum_{i=1}^{m} x_i.$
Then, each missing entry $x_j = \text{NA}$ is replaced by $x_j^{*} = \bar{x}_{obs}$.
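A minimal sketch of this rule (illustrative Python, not the paper's R implementation; `mean_impute` is a hypothetical helper, with `None` standing in for NA):

```python
def mean_impute(x):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in x if v is not None]
    xbar_obs = sum(observed) / len(observed)  # mean of the m observed values
    return [xbar_obs if v is None else v for v in x]

mean_impute([1.0, None, 3.0, None])  # -> [1.0, 2.0, 3.0, 2.0]
```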

3.1.2. Regression Imputation

The fundamental idea behind regression imputation is to estimate the missing values of a variable using a regression model based on the other observed variables. Suppose the data matrix is $X = (x_{ij})_{n \times p}$, and let $X_j$ be the variable containing missing values [39]. A regression model is fitted on the observations where $X_j$ is observed, using the remaining variables $X_{-j}$ as predictors and the observed part of $X_j$ as the outcome:
$x_j = \beta_0 + \beta_1 x_1 + \cdots + \beta_{j-1} x_{j-1} + \beta_{j+1} x_{j+1} + \cdots + \beta_p x_p.$
The next step is to predict the missing values by using the fitted model:
$\hat{x}_{ij} = \hat{\beta}_0 + \sum_{k \neq j} \hat{\beta}_k x_{ik}.$
This approach preserves multivariate relationships and allows the imputed values to vary across observations, unlike mean or mode imputation [40].
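The fit-then-predict steps can be sketched for the single-predictor case (an illustrative Python sketch assuming one fully observed predictor; the study fits multivariable linear models in R):

```python
def regression_impute(y, x):
    """Impute missing entries of y (None) from a single fully observed
    predictor x, using ordinary least squares fitted on the observed pairs."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = sum(p[0] for p in pairs) / len(pairs)
    my = sum(p[1] for p in pairs) / len(pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b1 = sxy / sxx            # slope estimate
    b0 = my - b1 * mx         # intercept estimate
    return [b0 + b1 * xi if yi is None else yi for xi, yi in zip(x, y)]

regression_impute([2.0, 4.0, None, 8.0], [1.0, 2.0, 3.0, 4.0])
```

Unlike mean imputation, the filled-in value depends on the predictor, so different observations with missing $X_j$ receive different imputations.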

3.2. Machine Learning-Based Methods

In all experiments, the imputation algorithms were implemented in R using consistent parameter settings to ensure comparability. Specifically, MissRanger employed 500 trees with predictive mean matching (k = 5) and up to 10 iterations; Random Forest imputation used 500 trees and a maximum of 10 iterations; Bagged Tree imputation used 100 bootstrap samples with regression trees as base learners; KNN imputation used k = 5 with Euclidean distance; and regression imputation was based on linear models fitted to complete predictor sets. These configurations were chosen based on preliminary tuning to balance computational efficiency and predictive performance.

3.2.1. K-Nearest Neighbors (KNN) Imputation

The KNN imputation method is based on the idea that observations with similar attributes to their neighbors tend to have similar values. For each instance with missing data, the process identifies the k most similar neighbors from the dataset using a predefined distance metric [41].
Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ denote an observation vector with some missing values. The KNN imputation procedure involves three main steps. First, for an observation $x_i$ with a missing value, the algorithm computes the distance, most commonly the Euclidean distance, to all other observations using only the features that are observed in both. Second, it identifies the set $N_k(x_i)$ of the $k$ nearest neighbors based on these distances. Finally, for each missing feature $x_{ij}$, the imputed value is calculated as the mean of the corresponding values from the $k$ nearest neighbors, $\hat{x}_{ij} = \frac{1}{k} \sum_{x_l \in N_k(x_i)} x_{lj}$. This approach leverages local similarity to estimate plausible values for the missing entries.
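The three steps can be sketched as follows (illustrative Python; `knn_impute` is a hypothetical helper, and distances use only jointly observed features, as described above):

```python
import math

def knn_impute(data, i, j, k=2):
    """Impute the missing entry data[i][j] as the mean of feature j among
    the k nearest rows; distances use only jointly observed features."""
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))
    # Candidate donors: other rows where feature j is observed.
    donors = [r for r in range(len(data)) if r != i and data[r][j] is not None]
    donors.sort(key=lambda r: dist(data[i], data[r]))
    return sum(data[r][j] for r in donors[:k]) / k

rows = [[1.0, None], [1.1, 10.0], [0.9, 12.0], [9.0, 50.0]]
knn_impute(rows, 0, 1)  # averages the two nearest donors -> 11.0
```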

3.2.2. Random Forest Imputation

Random Forest imputation relies on the idea of building multiple decision trees to estimate the missing values. Each tree is trained on a bootstrap sample of the data, and predictions from the trees are aggregated to produce an imputed value [32].
Let X = ( x i j ) n × p be a data matrix with missing entries. The random forest imputation procedure consists of the following steps. First, missing values are initialized using simple methods such as mean imputation for continuous variables. Next, for each variable X j that contains missing values, the algorithm treats X j as the response and uses the remaining variables X j as predictors to train a random forest model on the subset of observations where X j is observed. The trained model is then used to predict the missing entries in X j [42]. This process is iterated across all variables with missing data, and the entire cycle is repeated until the changes in imputed values between iterations fall below a predefined tolerance or a maximum number of iterations is reached. Formally, for a missing value in a variable X j for observation i , the imputed value is given by
$\hat{x}_{ij} = \frac{1}{T} \sum_{t=1}^{T} h_t(X_{i,-j}),$
where $h_t(\cdot)$ denotes the prediction from the $t$-th tree in the random forest. This approach captures complex nonlinear interactions and is well-suited for datasets with mixed variable types.

3.2.3. Bagged Trees Imputation

Bagged Trees (bootstrap-aggregated trees) imputation is an ensemble learning technique that applies bootstrap aggregation to decision trees [43]. The core idea is to model the variable with missing values as a function of the other observed variables, using an ensemble of regression or classification trees trained on bootstrap samples.
For a variable $X_j$ with missing data, Bagged Trees imputation proceeds as follows. First, multiple bootstrap samples are generated from the rows where $X_j$ is observed. Then, for each bootstrap sample, a regression or classification tree is trained using the remaining variables $X_{-j}$ as predictors. Once the models are trained, each missing value in $X_j$ is predicted by every tree, and the predictions are aggregated. For continuous variables, the imputed value is the average of the tree predictions:
$\hat{x}_{ij} = \frac{1}{B} \sum_{b=1}^{B} h_b(X_{i,-j}),$
where $h_b(\cdot)$ represents the prediction from the $b$-th tree. This ensemble approach stabilizes predictions, reduces variance, and avoids assumptions about linearity or distributional form, making it robust and flexible for various data types [44].
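The bootstrap-and-aggregate scheme can be sketched with a simple stand-in base learner (illustrative Python; a 1-nearest-neighbour rule replaces the regression tree purely to keep the sketch short, so this is not the actual tree-based procedure):

```python
import random

def bagged_predict(train_x, train_y, x_new, B=100, seed=0):
    """Bootstrap-aggregated prediction: fit a base learner on each of B
    bootstrap samples and average the B predictions."""
    rng = random.Random(seed)
    n = len(train_x)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of rows
        # Base learner (stand-in for a tree): 1-nearest neighbour in x.
        nearest = min(idx, key=lambda i: abs(train_x[i] - x_new))
        preds.append(train_y[nearest])
    return sum(preds) / B  # aggregate by averaging

bagged_predict([0.0, 1.0, 2.0, 3.0], [0.0, 10.0, 20.0, 30.0], 1.1)
```

Averaging over resampled fits is what reduces the variance of the final imputation relative to a single fitted model.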

3.2.4. MissRanger Imputation

The missRanger algorithm performs iterative imputation using random forests, similar in concept to the missForest method [32], but it replaces the random-forest backend with ranger [45], which is optimized for high-dimensional datasets and fast execution. In addition, missRanger supports predictive mean matching to preserve the distributional properties of imputed values.
The imputation procedure begins with an initialization step, where missing values are filled with simple estimates such as the mean for continuous variables or the mode for categorical ones. Then, for each variable $X_j$ with missing values, a random forest is fitted using the observed part of $X_j$ as the response and all other variables $X_{-j}$ as predictors. The fitted model is then used to predict the missing entries of $X_j$. If predictive mean matching is enabled, each predicted value is matched to the nearest observed value from a donor pool to better preserve variability, so that $\hat{x}_{ij} = x_{dj}$, where $x_{dj} \in D_{PMM}$, the set of observed values in $X_j$ whose model-predicted values are closest to the prediction for $x_{ij}$. The process iterates across all variables with missing data until convergence is achieved or a maximum number of iterations is reached. Predictive Mean Matching (PMM) was used within the MissRanger algorithm to ensure that imputed values remain within the observed data range. PMM operates by first predicting missing values through a linear regression model and then replacing each predicted value with an observed donor value whose predicted mean is closest to the fitted value:
$\hat{x}_{ij} = x_{lj}, \quad \text{where } l = \arg\min_{l'} |\hat{x}_{ij} - \hat{x}_{l'j}|.$
This approach preserves the empirical distribution of the variable and prevents implausible imputations. However, in high-dimensional settings, PMM may become less efficient due to instability in linear prediction and distance matching under multicollinearity.
In contrast, ensemble-based imputers utilize nonlinear models that aggregate multiple trees and bootstrap samples, thereby capturing complex variable interactions without relying on linearity assumptions. Such methods remain stable and accurate in high-dimensional data contexts, where PMM may suffer from increased bias or computational inefficiency.
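The donor-matching step of PMM described above can be sketched as follows (illustrative Python; `pmm_match` is a hypothetical helper operating on model predictions for observed and missing entries):

```python
def pmm_match(pred_missing, pred_observed, observed_values):
    """For each model prediction of a missing entry, return the observed
    donor value whose own predicted mean is closest (nearest-donor PMM)."""
    imputed = []
    for p in pred_missing:
        # Index l of the donor minimizing |p - pred_observed[l]|.
        l = min(range(len(pred_observed)),
                key=lambda idx: abs(p - pred_observed[idx]))
        imputed.append(observed_values[l])
    return imputed

# Predictions 2.4 and 9.9 match donors with predicted means 2.5 and 10.0.
pmm_match([2.4, 9.9], [1.0, 2.5, 10.0], [11.0, 22.0, 33.0])  # -> [22.0, 33.0]
```

Because imputations are drawn from the observed values themselves, the imputed variable can never leave the observed range.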
These procedures were repeated 1000 times to reduce random sampling variability and to obtain stable estimates of model performance. The classification outcomes from each repetition were summarized using a confusion matrix, which compares the predicted and actual class labels, as illustrated in Table 1.
Table 1. Confusion matrix illustrating the comparison between actual and estimated classes for multi-class classification.
Let $A_{ij}$ denote the number of observations that belong to actual class $j$ but are predicted as class $i$. For a classification problem with four classes, the total number of observations is given by
$N = \sum_{i=1}^{4} \sum_{j=1}^{4} A_{ij}.$
In each iteration, classification models were trained on the training set and evaluated on the testing set. The key performance metric used for comparison was classification accuracy, referred to as the percentage accuracy, which is computed from Table 1 as
$\text{Percentage Accuracy} = \frac{\sum_{i=1}^{4} A_{ii}}{N} \times 100.$
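Under the confusion-matrix layout of Table 1, the accuracy computation reduces to a short routine (illustrative Python sketch):

```python
def percentage_accuracy(A):
    """Percentage accuracy from a square confusion matrix A, where
    A[i][j] is the count of actual class j predicted as class i."""
    N = sum(sum(row) for row in A)                      # total observations
    correct = sum(A[i][i] for i in range(len(A)))       # diagonal entries
    return correct / N * 100

percentage_accuracy([[40, 10], [10, 40]])  # -> 80.0
```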
Cohen’s kappa statistic ($k$) was employed to evaluate the level of agreement between the predicted ($\hat{y}$) and actual ($y$) class labels, accounting for agreement that could occur by chance. The computation is based on the confusion matrix shown in Table 1 and is defined as follows.
Observed agreement ($P_0$) is the proportion of correctly classified observations, computed as $P_0 = \frac{1}{N} \sum_{i=1}^{4} A_{ii}$.
Expected agreement by chance ($P_e$) represents the proportion of agreement that would occur randomly and is calculated as
$P_e = \sum_{i=1}^{4} \left( \frac{\sum_{j=1}^{4} A_{ij}}{N} \times \frac{\sum_{j=1}^{4} A_{ji}}{N} \right).$
Then, Cohen’s kappa coefficient ($k$) is defined as
$k = \frac{P_0 - P_e}{1 - P_e}.$
A value of k = 1 indicates perfect agreement between the predicted and actual classes, k = 0 implies agreement equivalent to random chance, and k < 0 indicates disagreement worse than chance.
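The same confusion matrix yields $P_0$, $P_e$, and $k$ directly (illustrative Python sketch of the definitions above):

```python
def cohens_kappa(A):
    """Cohen's kappa from a square confusion matrix A, where A[i][j] is
    the count of actual class j predicted as class i."""
    n = len(A)
    N = sum(sum(row) for row in A)
    p0 = sum(A[i][i] for i in range(n)) / N             # observed agreement
    pe = sum((sum(A[i]) / N) * (sum(A[j][i] for j in range(n)) / N)
             for i in range(n))                         # chance agreement
    return (p0 - pe) / (1 - pe)

# P0 = 0.8, Pe = 0.5, so k = (0.8 - 0.5) / (1 - 0.5) = 0.6.
cohens_kappa([[40, 10], [10, 40]])
```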
For each discriminant analysis method and imputation strategy, classification performance was evaluated across 1000 replications. In each iteration, both the percentage accuracy and Cohen’s kappa coefficient were computed. The mean accuracy and mean kappa were obtained as the arithmetic averages of their respective values across all replications, providing a stable estimate of overall performance. The standard deviation (SD) was calculated to quantify the dispersion of the results, indicating the consistency of each method across simulations.
To assess the statistical reliability of the performance estimates, 95% confidence intervals (CIs) were constructed for both accuracy and kappa using the normal approximation formula:
$CI_{95\%} = \bar{x} \pm 1.96 \times \frac{SD}{\sqrt{n}},$
where x ¯ is the sample mean, S D is the standard deviation, and n = 1000 denotes the number of replications. Narrow confidence intervals indicate high stability of the classification outcomes, whereas wider intervals reflect greater variability across replications. This approach ensures that reported mean accuracies and kappa values are not only representative but also statistically reliable. All data analyses and simulation experiments were performed using the R 4.2.1 statistical software.
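The interval computation can be sketched as follows (illustrative Python; `values` stands for the replicate accuracies, and the sample standard deviation uses the $n-1$ denominator):

```python
import math

def ci95(values):
    """Normal-approximation 95% confidence interval for the mean of
    replicate performance values (e.g., 1000 accuracy replications)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    half = 1.96 * sd / math.sqrt(n)
    return mean - half, mean + half

ci95([80.0, 82.0, 84.0])  # interval centered at 82.0
```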
For each classification method (LDA, RDA, FDA, MDA, KDA, and SDA), the average classification accuracy and Cohen’s kappa were computed over 1000 replications using a 70:30 train–test split. The analyses were conducted using the R packages klaR, mda, sda, kernlab, mice, missRanger (version 2.1.1), caret (version 6.0-94), and MASS (version 7.3-60). The resulting mean percentage accuracies (with standard deviations), 95% confidence intervals, and mean Cohen’s kappa values served as the basis for performance comparison across different imputation methods, discriminant analysis techniques, and simulation scenarios, as summarized in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7.
Table 2. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using mean imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
Table 2 shows that at a low correlation level ( ρ = 0.3), classification accuracy generally increases with larger sample sizes for all DA methods. Among the classifiers, LDA, RDA, FDA, KDA, and SDA consistently achieve higher mean accuracies, ranging from approximately 71% to 80%, while MDA performs relatively poorly with mean accuracies between 65% and 75%. The 95% confidence intervals for most methods are relatively narrow, indicating stable classification results across replications. The corresponding mean Cohen’s kappa values range from 0.53 to 0.69, suggesting moderate classification agreement between predicted and actual group memberships.
When the correlation increases to ρ = 0.7, the classification performance improves notably across all methods. The RDA and KDA models show substantial gains, with mean accuracies exceeding 80% for larger sample sizes (n = 300 and 500). In particular, RDA achieves the highest overall accuracy (approximately 84–85%) and kappa values around 0.75–0.79, confirming the advantage of regularization under moderate correlation. SDA also performs competitively, indicating that shrinkage-based covariance estimation contributes to improved stability in high-dimensional contexts.
Overall, the results demonstrate that increasing sample size and correlation level enhances model stability and discriminative performance. Regularized and flexible classifiers (RDA, KDA, SDA) yield higher accuracies and stronger kappa agreement compared with classical LDA and mixture-based MDA, highlighting the benefits of integrating covariance regularization and nonlinear transformations in discriminant analysis with mean-imputed data.
Table 3. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using regression imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
| ρ | DA Method | p = 5, n = 100 | p = 5, n = 300 | p = 5, n = 500 | p = 10, n = 100 | p = 10, n = 300 | p = 10, n = 500 |
|---|---|---|---|---|---|---|---|
| 0.3 | LDA | 81.53 (0.0883); 80.99–82.08; 0.7242 | 84.91 (0.0338); 84.70–85.12; 0.7829 | 89.04 (0.0374); 88.81–89.28; 0.8325 | 87.84 (0.0199); 87.71–87.96; 0.8081 | 79.25 (0.0435); 78.98–79.53; 0.6735 | 86.57 (0.0250); 86.41–86.72; 0.7878 |
| 0.3 | RDA | 77.93 (0.0435); 77.69–78.02; 0.6651 | 86.48 (0.0257); 86.33–86.64; 0.8049 | 89.55 (0.0500); 89.24–89.86; 0.8408 | 82.73 (0.0075); 82.68–82.77; 0.7175 | 78.54 (0.0248); 78.38–79.69; 0.6563 | 85.90 (0.0251); 85.74–86.05; 0.7763 |
| 0.3 | FDA | 81.54 (0.0883); 80.99–82.09; 0.7243 | 84.68 (0.0330); 84.48–84.89; 0.7797 | 88.88 (0.0400); 88.63–89.12; 0.8303 | 86.13 (0.0353); 85.91–86.35; 0.7836 | 79.39 (0.0424); 79.13–79.65; 0.6761 | 86.57 (0.0273); 86.40–86.74; 0.7883 |
| 0.3 | MDA | 75.22 (0.0440); 74.95–75.49; 0.6358 | 84.68 (0.0476); 84.39–84.98; 0.7775 | 84.51 (0.0446); 84.23–84.79; 0.7629 | 79.28 (0.0053); 79.25–79.32; 0.6776 | 74.47 (0.0243); 74.32–74.62; 0.6000 | 83.66 (0.0137); 83.58–83.75; 0.7439 |
| 0.3 | KDA | 76.96 (0.0546); 76.62–77.33; 0.6484 | 82.88 (0.0488); 82.58–83.18; 0.7486 | 86.20 (0.0610); 85.82–86.58; 0.7875 | 81.04 (0.0180); 80.93–81.15; 0.6742 | 77.00 (0.0361); 76.78–77.23; 0.6128 | 83.22 (0.0145); 83.13–83.31; 0.7225 |
| 0.3 | SDA | 80.62 (0.0735); 80.17–81.08; 0.7115 | 85.81 (0.0232); 85.66–85.95; 0.7944 | 88.38 (0.0394); 88.13–88.62; 0.8219 | 84.41 (0.0195); 84.29–84.53; 0.7479 | 78.55 (0.0458); 78.26–78.83; 0.6557 | 85.90 (0.0190); 85.78–86.02; 0.7749 |
| 0.7 | LDA | 83.79 (0.0289); 83.56–83.92; 0.7799 | 87.40 (0.0061); 87.36–87.44; 0.7896 | 79.23 (0.0049); 79.20–79.26; 0.6850 | 84.57 (0.0750); 84.10–85.03; 0.7473 | 83.22 (0.0217); 83.09–83.36; 0.7182 | 84.44 (0.0298); 84.25–84.62; 0.7353 |
| 0.7 | RDA | 85.07 (0.0323); 84.87–85.27; 0.8019 | 88.50 (0.0343); 88.29–88.71; 0.8099 | 83.93 (0.0047); 83.90–83.96; 0.7610 | 77.54 (0.1670); 76.51–78.58; 0.5998 | 81.54 (0.0511); 81.22–81.86; 0.7096 | 86.57 (0.0120); 86.50–86.65; 0.7780 |
| 0.7 | FDA | 85.02 (0.0347); 84.80–85.24; 0.8106 | 87.40 (0.0059); 87.36–87.44; 0.7896 | 79.23 (0.0048); 79.20–79.26; 0.6850 | 84.57 (0.0750); 84.10–85.03; 0.7481 | 83.22 (0.0218); 83.08–83.35; 0.7181 | 84.44 (0.0316); 84.24–84.63; 0.7356 |
| 0.7 | MDA | 85.45 (0.0334); 85.25–85.66; 0.8145 | 85.20 (0.0063); 85.16–85.24; 0.7573 | 87.90 (0.0026); 87.89–87.92; 0.8192 | 80.67 (0.0720); 80.22–81.12; 0.6959 | 79.31 (0.0085); 79.26–79.37; 0.6589 | 83.89 (0.0357); 83.67–84.11; 0.7327 |
| 0.7 | KDA | 85.05 (0.0287); 84.87–85.22; 0.7808 | 89.62 (0.0229); 89.48–89.76; 0.8269 | 86.60 (0.0030); 86.58–86.61; 0.8004 | 87.61 (0.0539); 87.27–87.94; 0.7725 | 82.11 (0.0038); 82.09–82.14; 0.6927 | 83.91 (0.0340); 83.70–84.12; 0.7240 |
| 0.7 | SDA | 85.02 (0.0176); 84.91–85.13; 0.7867 | 87.77 (0.0093); 87.71–87.83; 0.7979 | 80.56 (0.0043); 80.54–80.59; 0.7071 | 86.54 (0.0700); 86.10–86.97; 0.7769 | 84.32 (0.0115); 84.25–84.39; 0.7364 | 84.17 (0.0308); 83.98–84.36; 0.7311 |

Note: Cell entries are the mean percentage accuracy (standard deviation); 95% confidence interval; mean Cohen’s kappa. The underlined value indicates the highest mean percentage accuracy.
At the low correlation level ( ρ = 0.3) from Table 3, all methods show improved classification accuracy with larger sample sizes, with LDA, RDA, and FDA achieving the highest accuracies (77–86% for p = 5 and up to 90% for larger n). MDA and KDA perform slightly lower, generally below 85%. The narrow 95% confidence intervals indicate stable performance, and Cohen’s kappa values (0.66–0.78) suggest moderate to substantial agreement. At the moderate correlation level ( ρ = 0.7), accuracies increase notably across all DA methods. KDA and RDA again outperform others, exceeding 86% accuracy, while SDA and FDA also yield competitive results (85%). Kappa coefficients (0.73–0.81) indicate higher classification consistency. Overall, regression imputation provides strong and stable classification performance, particularly when combined with flexible or regularized classifiers such as RDA and FDA, which consistently outperform traditional LDA and MDA under both correlation settings.
Table 4. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using KNN imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
| ρ | DA Method | p = 5, n = 100 | p = 5, n = 300 | p = 5, n = 500 | p = 10, n = 100 | p = 10, n = 300 | p = 10, n = 500 |
|---|---|---|---|---|---|---|---|
| 0.3 | LDA | 76.84 (0.0791); 76.35–77.34; 0.6519 | 82.28 (0.0435); 82.01–82.55; 0.7339 | 84.44 (0.0306); 84.25–84.63; 0.7666 | 74.05 (0.0839); 73.53–74.57; 0.5982 | 79.37 (0.0411); 79.11–79.62; 0.6697 | 81.02 (0.0305); 80.83–81.21; 0.6947 |
| 0.3 | RDA | 76.21 (0.0794); 75.72–76.70; 0.6382 | 83.20 (0.0457); 82.91–83.48; 0.7464 | 86.08 (0.0322); 85.88–86.28; 0.7910 | 74.31 (0.0834); 73.79–74.82; 0.5893 | 78.97 (0.0425); 78.70–79.23; 0.6589 | 80.57 (0.0318); 80.38–80.77; 0.6849 |
| 0.3 | FDA | 76.80 (0.0791); 76.31–77.29; 0.6523 | 82.26 (0.0438); 81.99–82.53; 0.7339 | 84.42 (0.0306); 84.23–84.61; 0.7665 | 73.74 (0.0840); 73.22–74.26; 0.5956 | 79.31 (0.0414); 79.06–79.57; 0.6695 | 81.02 (0.0308); 80.83–81.21; 0.6951 |
| 0.3 | MDA | 72.92 (0.0819); 72.41–73.43; 0.5970 | 79.79 (0.0454); 79.51–80.07; 0.6977 | 82.11 (0.0336); 81.90–82.32; 0.7323 | 69.06 (0.0894); 68.51–69.62; 0.5291 | 75.96 (0.0449); 75.68–76.28; 0.6209 | 78.55 (0.0338); 78.34–78.76; 0.6592 |
| 0.3 | KDA | 74.40 (0.0749); 73.93–74.86; 0.5968 | 81.51 (0.0424); 81.25–81.78; 0.7177 | 84.59 (0.0284); 84.41–84.77; 0.7668 | 76.26 (0.0751); 75.80–76.73; 0.5925 | 78.56 (0.0415); 78.30–78.82; 0.6373 | 79.77 (0.0316); 79.57–79.96; 0.6616 |
| 0.3 | SDA | 76.25 (0.0796); 75.75–76.74; 0.641 | 82.06 (0.0429); 81.79–82.33; 0.7296 | 84.27 (0.0302); 84.08–84.46; 0.7634 | 74.72 (0.0835); 74.20–75.24; 0.6019 | 79.45 (0.0406); 79.19–79.70; 0.6672 | 80.96 (0.0308); 80.76–81.15; 0.6909 |
| 0.7 | LDA | 80.28 (0.0751); 79.82–80.75; 0.6972 | 84.31 (0.0388); 84.07–84.55; 0.7525 | 85.52 (0.0321); 85.32–85.72; 0.7711 | 80.28 (0.0771); 79.80–80.76; 0.6824 | 85.28 (0.0378); 85.04–85.51; 0.7468 | 86.26 (0.0282); 86.09–86.44; 0.7608 |
| 0.7 | RDA | 82.74 (0.0776); 82.26–83.22; 0.7365 | 87.40 (0.0376); 87.17–87.63; 0.8051 | 89.34 (0.0330); 89.13–89.54; 0.8351 | 81.50 (0.1031); 80.86–82.14; 0.6916 | 86.45 (0.0373); 86.22–86.68; 0.7729 | 87.46 (0.0283); 87.28–87.63; 0.7888 |
| 0.7 | FDA | 80.21 (0.0756); 79.74–80.68; 0.6977 | 84.34 (0.0389); 84.10–84.58; 0.7535 | 85.58 (0.0320); 85.38–85.78; 0.7723 | 79.95 (0.0775); 79.47–80.43; 0.6796 | 85.24 (0.0379); 85.00–85.47; 0.7467 | 86.27 (0.0284); 86.09–86.44; 0.7611 |
| 0.7 | MDA | 80.25 (0.0760); 79.78–80.73; 0.7020 | 85.95 (0.0356); 85.73–86.17; 0.7830 | 87.62 (0.0295); 87.44–87.80; 0.8088 | 77.96 (0.0806); 77.46–78.46; 0.6545 | 83.99 (0.0391); 83.74–84.23; 0.7309 | 85.63 (0.0301); 85.44–85.81; 0.7562 |
| 0.7 | KDA | 82.38 (0.0716); 81.94–82.83; 0.7187 | 88.34 (0.0341); 88.13–88.55; 0.8175 | 90.64 (0.0250); 90.48–90.79; 0.8544 | 85.50 (0.0613); 85.12–85.88; 0.7420 | 86.49 (0.0342); 86.27–86.70; 0.7620 | 87.21 (0.0270); 87.04–87.38; 0.7769 |
| 0.7 | SDA | 81.48 (0.0737); 81.03–81.94; 0.7184 | 84.74 (0.0380); 84.51–84.98; 0.7607 | 85.91 (0.0312); 85.71–86.10; 0.7780 | 82.61 (0.0702); 82.17–83.05; 0.7207 | 85.72 (0.0374); 85.49–85.95; 0.754 | 86.45 (0.0283); 86.28–86.63; 0.7644 |

Note: Cell entries are the mean percentage accuracy (standard deviation); 95% confidence interval; mean Cohen’s kappa. The underlined value indicates the highest mean percentage accuracy.
Table 4 exhibits the classification accuracy using KNN. At the low correlation level ( ρ = 0.3), all methods show improved accuracy with increasing sample size, with LDA, RDA, KDA, and SDA performing best (74–86% for p = 5 and around 81–82% for p = 10). Under moderate correlation ( ρ = 0.7), accuracies further increase across all models, with RDA and KDA exceeding 85% and showing higher stability. The mean Cohen’s kappa values (0.59–0.78) indicate moderate to substantial agreement, reflecting reliable classification consistency. Overall, KNN imputation provides robust and stable performance, especially when combined with flexible or regularized classifiers.
Table 5. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using random forest imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
| ρ | DA Method | p = 5, n = 100 | p = 5, n = 300 | p = 5, n = 500 | p = 10, n = 100 | p = 10, n = 300 | p = 10, n = 500 |
|---|---|---|---|---|---|---|---|
| 0.3 | LDA | 77.06 (0.0810); 76.55–77.56; 0.6547 | 82.31 (0.0426); 82.04–82.57; 0.7354 | 84.81 (0.0323); 84.61–85.01; 0.7729 | 74.33 (0.0802); 73.83–74.83; 0.6009 | 79.59 (0.0412); 79.34–79.85; 0.6736 | 80.91 (0.0312); 80.71–81.10; 0.6938 |
| 0.3 | RDA | 76.47 (0.0787); 75.98–76.96; 0.6398 | 83.57 (0.0452); 83.29–83.85; 0.7525 | 86.72 (0.0348); 86.51–86.94; 0.8010 | 74.75 (0.0813); 74.25–75.26; 0.5930 | 79.10 (0.0411); 78.84–79.35; 0.6607 | 80.52 (0.0326); 80.32–80.73; 0.6843 |
| 0.3 | FDA | 76.81 (0.0820); 76.30–77.32; 0.6522 | 82.31 (0.0429); 82.04–82.57; 0.7356 | 84.79 (0.0324); 84.59–84.99; 0.7728 | 74.10 (0.0812); 73.50–74.51; 0.5983 | 79.57 (0.0411); 79.32–79.83; 0.6739 | 80.89 (0.0314); 80.70–81.09; 0.6939 |
| 0.3 | MDA | 72.56 (0.0859); 72.03–73.09; 0.5908 | 79.90 (0.0450); 79.63–80.18; 0.7001 | 82.47 (0.0347); 82.26–82.69; 0.7384 | 68.91 (0.0893); 68.36–69.46; 0.5258 | 75.99 (0.0454); 75.71–76.27; 0.6212 | 78.27 (0.0328); 78.07–78.48; 0.6544 |
| 0.3 | KDA | 74.34 (0.0779); 73.86–74.82; 0.5934 | 81.40 (0.0411); 81.14–81.65; 0.7172 | 84.90 (0.0297); 84.72–85.09; 0.7722 | 76.45 (0.0757); 75.98–76.92; 0.5920 | 78.54 (0.0405); 78.29–78.79; 0.6366 | 79.56 (0.0314); 79.37–79.76; 0.6594 |
| 0.3 | SDA | 76.44 (0.0795); 75.95–76.94; 0.6434 | 82.18 (0.0415); 81.92–82.43; 0.7324 | 84.65 (0.0323); 84.45–84.85; 0.7700 | 75.09 (0.0787); 74.60–75.58; 0.6066 | 79.67 (0.0407); 79.42–79.93; 0.6716 | 80.85 (0.0311); 80.65–81.04; 0.6906 |
| 0.7 | LDA | 79.94 (0.0737); 79.48–80.39; 0.6905 | 83.96 (0.0400); 83.71–84.21; 0.7471 | 85.51 (0.0307); 85.32–85.70; 0.7708 | 79.51 (0.0834); 78.99–80.02; 0.6725 | 84.81 (0.0379); 84.57–85.05; 0.7383 | 85.96 (0.0266); 85.80–86.13; 0.7556 |
| 0.7 | RDA | 82.42 (0.0725); 81.97–82.87; 0.7330 | 86.85 (0.0388); 86.61–87.10; 0.7968 | 89.30 (0.0325); 89.09–89.50; 0.8344 | 79.69 (0.1201); 78.94–80.43; 0.6624 | 86.01 (0.0362); 85.78–86.23; 0.7650 | 87.06 (0.0270); 86.89–87.23; 0.7825 |
| 0.7 | FDA | 79.94 (0.0744); 79.47–80.40; 0.6928 | 84.01 (0.0398); 83.76–84.26; 0.7484 | 85.56 (0.0306); 85.37–85.75; 0.7719 | 79.15 (0.0846); 78.62–79.67; 0.6699 | 84.74 (0.0385); 84.50–84.98; 0.7377 | 85.95 (0.0267); 85.78–86.11; 0.7556 |
| 0.7 | MDA | 79.78 (0.0738); 79.32–80.23; 0.6953 | 85.31 (0.0398); 85.06–85.56; 0.7732 | 87.54 (0.0299); 87.39–87.73; 0.8076 | 77.39 (0.0822); 76.88–77.90; 0.6486 | 83.22 (0.0404); 82.97–83.47; 0.7181 | 85.11 (0.0287); 84.93–85.29; 0.7478 |
| 0.7 | KDA | 81.81 (0.0692); 81.38–82.24; 0.7105 | 87.81 (0.0361); 87.59–88.04; 0.8094 | 90.51 (0.0258); 90.35–90.67; 0.8524 | 85.15 (0.0679); 84.73–85.57; 0.7366 | 86.12 (0.0335); 85.91–86.33; 0.7553 | 86.75 (0.0253); 86.59–86.91; 0.7687 |
| 0.7 | SDA | 81.28 (0.0729); 80.83–81.74; 0.7148 | 84.42 (0.0392); 84.18–84.67; 0.7555 | 85.88 (0.0301); 85.69–86.06; 0.7773 | 81.68 (0.0784); 81.19–82.17; 0.7067 | 78.30 (0.0374); 85.07–85.54; 0.7471 | 86.11 (0.0263); 85.94–86.27; 0.7583 |

Note: Cell entries are the mean percentage accuracy (standard deviation); 95% confidence interval; mean Cohen’s kappa. The underlined value indicates the highest mean percentage accuracy.
Table 5 presents the classification accuracy using random forest imputation. At the low correlation level ( ρ = 0.3), all methods show increasing accuracy with larger sample sizes, with LDA, KDA, RDA, and SDA achieving the best results (76–86% for p = 5 and around 79–82% for p = 10). Under moderate correlation ( ρ = 0.7), accuracies further improve, with KDA and RDA exceeding 87% and showing high stability. The mean Cohen’s kappa values (0.59–0.78) indicate moderate to strong agreement. Overall, random forest imputation provides robust and consistent performance, particularly when combined with flexible or regularized classifiers.
Table 6. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using Bagged Trees imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
| ρ | DA Method | p = 5, n = 100 | p = 5, n = 300 | p = 5, n = 500 | p = 10, n = 100 | p = 10, n = 300 | p = 10, n = 500 |
|---|---|---|---|---|---|---|---|
| 0.3 | LDA | 73.92 (0.0800); 73.42–74.42; 0.6037 | 77.83 (0.0435); 77.56–78.10; 0.6633 | 79.37 (0.0335); 79.16–79.58; 0.6870 | 72.10 (0.0836); 71.49–72.53; 0.5652 | 77.19 (0.0445); 76.92–77.47; 0.6323 | 78.77 (0.0315); 78.57–78.96; 0.6539 |
| 0.3 | RDA | 73.28 (0.0809); 72.78–73.87; 0.5872 | 78.00 (0.0450); 77.72–78.28; 0.6655 | 79.88 (0.0343); 79.67–80.09; 0.6958 | 72.29 (0.0883); 71.75–72.84; 0.5577 | 76.85 (0.0450); 76.57–77.13; 0.6218 | 78.19 (0.0321); 77.99–78.39; 0.6412 |
| 0.3 | FDA | 73.76 (0.0804); 73.26–74.26; 0.6030 | 77.80 (0.0433); 77.53–78.07; 0.6634 | 79.40 (0.0335); 79.19–79.60; 0.6877 | 71.68 (0.0846); 71.16–72.21; 0.5625 | 77.15 (0.0446); 76.88–77.43; 0.6324 | 78.78 (0.0316); 78.58–78.97; 0.6545 |
| 0.3 | MDA | 69.16 (0.0847); 68.64–69.69; 0.5394 | 74.73 (0.0463); 74.45–75.02; 0.6188 | 76.85 (0.0338); 76.64–77.06; 0.6503 | 65.47 (0.0901); 64.91–66.02; 0.4772 | 73.56 (0.0460); 73.27–73.84; 0.5816 | 75.93 (0.0329); 75.73–76.14; 0.6146 |
| 0.3 | KDA | 71.45 (0.0752); 70.98–71.91; 0.5425 | 75.78 (0.0423); 75.52–76.04; 0.6266 | 78.16 (0.0330); 77.96–78.37; 0.6668 | 74.97 (0.0769); 74.50–75.45; 0.5671 | 76.75 (0.0429); 76.48–77.01; 0.6045 | 77.68 (0.0319); 77.48–77.88; 0.6228 |
| 0.3 | SDA | 73.58 (0.0794); 73.09–74.07; 0.5963 | 77.61 (0.0435); 77.34–77.88; 0.6587 | 79.16 (0.0337); 78.95–79.37; 0.6829 | 73.00 (0.0831); 72.49–73.52; 0.5742 | 77.31 (0.0438); 77.04–77.59; 0.6312 | 78.75 (0.0315); 78.56–78.95; 0.6515 |
| 0.7 | LDA | 78.44 (0.0771); 77.96–78.92; 0.6646 | 82.19 (0.0411); 81.93–82.44; 0.7168 | 83.32 (0.0314); 83.12–83.51; 0.7346 | 79.53 (0.0763); 79.05–80.00; 0.6705 | 84.42 (0.0381); 84.19–84.66; 0.7307 | 85.59 (0.0291); 85.40–85.76; 0.7477 |
| 0.7 | RDA | 80.31 (0.0724); 79.82–80.80; 0.6990 | 84.43 (0.0423); 84.16–84.69; 0.7592 | 85.81 (0.0305); 85.62–86.00; 0.7813 | 80.05 (0.1080); 79.37–80.72; 0.6699 | 85.34 (0.0392); 85.10–85.59; 0.7350 | 86.43 (0.0296); 86.25–86.62; 0.7704 |
| 0.7 | FDA | 78.23 (0.0769); 77.76–78.71; 0.6638 | 82.24 (0.0412); 81.98–82.49; 0.7183 | 83.36 (0.0313); 83.17–83.56; 0.7356 | 79.15 (0.0770); 78.68–79.63; 0.6669 | 84.34 (0.0383); 84.11–84.58; 0.7299 | 85.57 (0.0291); 85.39–85.75; 0.7477 |
| 0.7 | MDA | 77.92 (0.0774); 77.44–78.40; 0.6647 | 82.99 (0.0400); 82.74–82.24; 0.7358 | 84.75 (0.0302); 84.56–84.93; 0.7632 | 76.59 (0.0838); 76.07–77.12; 0.6367 | 82.90 (0.0412); 82.65–83.16; 0.7114 | 84.45 (0.0303); 84.26–84.64; 0.7352 |
| 0.7 | KDA | 80.19 (0.0721); 79.74–80.64; 0.6822 | 84.94 (0.0391); 84.69–85.18; 0.7628 | 87.04 (0.0282); 86.86–87.21; 0.7977 | 84.75 (0.0652); 84.35–85.16; 0.7280 | 85.68 (0.0364); 85.46–85.91; 0.7464 | 86.26 (0.0281); 86.09–86.44; 0.7586 |
| 0.7 | SDA | 79.57 (0.0725); 79.12–80.02; 0.6868 | 82.56 (0.0410); 82.31–82.82; 0.7244 | 83.63 (0.0311); 83.44–83.82; 0.7407 | 81.59 (0.0733); 81.13–82.05; 0.7041 | 84.82 (0.0377); 84.58–85.05; 0.7378 | 85.76 (0.0285); 85.58–85.93; 0.7510 |

Note: Cell entries are the mean percentage accuracy (standard deviation); 95% confidence interval; mean Cohen’s kappa. The underlined value indicates the highest mean percentage accuracy.
Table 6 shows that at the low correlation level ( ρ = 0.3), all methods show improved accuracy with larger sample sizes, with LDA, RDA, FDA, KDA, and SDA achieving the best results (73–78% for p = 5 and around 76–78% for p = 10). Under moderate correlation ( ρ = 0.7), accuracies further increase, with RDA and KDA exceeding 83% and showing high stability. The mean Cohen’s kappa values (0.53–0.78) indicate moderate to substantial agreement. Overall, Bagged Tree imputation provides consistent and reliable performance, especially when combined with regularized or flexible discriminant classifiers.
Table 7. Mean percentage accuracy (standard deviation), 95% confidence interval, and mean Cohen’s kappa for classification of DA methods using MissRanger imputation under different correlation levels ( ρ ), varying numbers of predictors (p), and sample sizes (n).
| ρ | DA Method | p = 5, n = 100 | p = 5, n = 300 | p = 5, n = 500 | p = 10, n = 100 | p = 10, n = 300 | p = 10, n = 500 |
|---|---|---|---|---|---|---|---|
| 0.3 | LDA | 77.38 (0.0776); 76.90–77.87; 0.6593 | 82.65 (0.0419); 82.39–82.91; 0.7403 | 85.02 (0.0331); 84.82–85.23; 0.7759 | 86.20 (0.0297); 86.01–86.38; 0.7637 | 86.31 (0.0295); 86.13–86.49; 0.7653 | 86.35 (0.0285); 86.17–86.52; 0.7661 |
| 0.3 | RDA | 76.82 (0.0785); 76.34–77.31; 0.6460 | 83.80 (0.0436); 83.53–84.04; 0.755 | 87.03 (0.0336); 86.82–87.24; 0.8053 | 90.06 (0.0238); 89.91–90.21; 0.8376 | 90.09 (0.0242); 89.94–90.24; 0.8380 | 90.03 (0.0230); 89.89–90.17; 0.8371 |
| 0.3 | FDA | 77.32 (0.0780); 76.83–77.80; 0.6596 | 82.64 (0.0420); 82.37–82.90; 0.7404 | 85.03 (0.0330); 84.83–85.24; 0.7762 | 86.27 (0.0296); 86.09–86.45; 0.7653 | 86.42 (0.0295); 86.23–86.60; 0.7675 | 86.43 (0.0283); 86.25–86.60; 0.7678 |
| 0.3 | MDA | 72.83 (0.0830); 72.32–73.35; 0.5966 | 79.94 (0.0446); 79.67–80.22; 0.7007 | 82.75 (0.0342); 82.54–82.96; 0.7423 | 89.28 (0.0235); 89.13–89.42; 0.8235 | 89.30 (0.0240); 89.15–89.45; 0.8239 | 89.30 (0.0242); 89.15–89.45; 0.8240 |
| 0.3 | KDA | 74.49 (0.0760); 74.02–74.97; 0.5960 | 81.58 (0.0415); 81.32–81.83; 0.720 | 85.17 (0.0300); 84.98–85.35; 0.7758 | 90.40 (0.0226); 90.26–90.54; 0.8408 | 90.27 (0.0225); 90.13–90.41; 0.8386 | 90.34 (0.0222); 90.21–90.49; 0.8400 |
| 0.3 | SDA | 76.71 (0.0783); 76.22–77.19; 0.647 | 82.49 (0.0413); 82.24–82.75; 0.7370 | 84.93 (0.0326); 84.73–85.14; 0.7740 | 86.81 (0.0295); 86.62–86.99; 0.7752 | 86.91 (0.0293); 86.73–87.09; 0.7767 | 86.98 (0.0283); 86.81–87.16; 0.7779 |
| 0.7 | LDA | 80.53 (0.0741); 80.07–80.99; 0.6697 | 83.98 (0.0405); 83.72–84.23; 0.7071 | 85.56 (0.0320); 85.36–85.75; 0.7716 | 86.28 (0.0306); 86.09–86.47; 0.7650 | 86.13 (0.0303); 85.94–86.32; 0.7618 | 86.19 (0.0300); 86.01–86.38; 0.7632 |
| 0.7 | RDA | 82.69 (0.0776); 82.21–83.17; 0.7345 | 86.80 (0.0400); 86.55–87.05; 0.7961 | 89.19 (0.0327); 88.99–89.40; 0.8331 | 90.09 (0.0244); 89.94–90.24; 0.8379 | 90.04 (0.0243); 89.89–90.19; 0.8371 | 90.10 (0.0244); 89.94–90.25; 0.8381 |
| 0.7 | FDA | 80.33 (0.0743); 79.87–80.79; 0.6982 | 84.03 (0.0406); 83.78–84.28; 0.7485 | 85.60 (0.0319); 85.40–85.80; 0.7726 | 86.39 (0.0304); 86.20–86.58; 0.7674 | 86.23 (0.0304); 86.04–86.42; 0.7640 | 86.30 (0.0296); 86.12–86.49; 0.7654 |
| 0.7 | MDA | 80.54 (0.0761); 80.07–81.01; 0.7062 | 85.25 (0.0385); 85.01–85.49; 0.7724 | 87.47 (0.0290); 87.29–87.65; 0.8068 | 89.27 (0.0245); 89.12–89.42; 0.8234 | 89.16 (0.0250); 89.01–89.32; 0.8214 | 89.22 (0.0251); 89.07–89.38; 0.8225 |
| 0.7 | KDA | 82.56 (0.0677); 82.14–82.98; 0.7221 | 87.72 (0.0354); 87.50–87.94; 0.8079 | 90.51 (0.0273); 90.34–90.68; 0.8527 | 90.33 (0.0235); 90.12–90.48; 0.8398 | 90.18 (0.0234); 90.03–90.32; 0.8370 | 90.26 (0.0234); 90.11–90.40; 0.8383 |
| 0.7 | SDA | 81.52 (0.0718); 81.08–81.97; 0.7183 | 84.41 (0.0399); 84.16–84.66; 0.7553 | 85.87 (0.0316); 85.67–86.07; 0.7774 | 86.89 (0.0303); 86.70–87.08; 0.7766 | 86.79 (0.0302); 86.60–86.98; 0.7742 | 86.83 (0.0296); 86.64–87.01; 0.7751 |

Note: Cell entries are the mean percentage accuracy (standard deviation); 95% confidence interval; mean Cohen’s kappa. The underlined value indicates the highest mean percentage accuracy.
Table 7 summarizes classification accuracy using MissRanger imputation. At the low correlation level (ρ = 0.3), all methods show improved accuracy with larger sample sizes, with LDA, RDA, and KDA achieving the best performance (76–87% for p = 5 and 89–91% for p = 10). Under moderate correlation (ρ = 0.7), accuracies increase further, with RDA and KDA exceeding 90% and showing high stability. The mean Cohen's kappa values (0.65–0.84) indicate substantial agreement and reliable classification. Overall, MissRanger imputation provides the most accurate and stable results, particularly when combined with flexible or regularized classifiers.
The highest mean percentage accuracies corresponding to each missing data imputation method are extracted from Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 and presented in Table 8 and Table 9. These summary tables highlight the best-performing classification results for each imputation technique: mean imputation, regression (Reg) imputation, KNN imputation, Random Forest (RF) imputation, Bagged Trees (BT) imputation, and MissRanger imputation. The comparisons are organized by correlation level (ρ), number of predictors (p), and sample size (n), allowing for a more precise assessment of which combinations of methods and conditions yield the most accurate classification outcomes.
Table 8. The highest mean percentage accuracies corresponding to each missing data imputation method for the five predictors.
Highest mean accuracy % for p = 5 (best-performing classifier in parentheses):

| Correlation Level (ρ) | Sample Size (n) | Mean | Reg | KNN | RF | BT | MissRanger |
|---|---|---|---|---|---|---|---|
| 0.3 | 100 | 73.90 (LDA) | 81.54 (FDA) | 76.84 (LDA) | 77.06 (LDA) | 73.92 (LDA) | 77.38 (LDA) |
| 0.3 | 300 | 77.90 (FDA) | 86.48 (RDA) | 82.28 (LDA) | 83.57 (RDA) | 77.83 (LDA) | 83.80 (RDA) |
| 0.3 | 500 | 80.34 (RDA) | 89.55 (RDA) | 86.08 (RDA) | 86.72 (RDA) | 79.88 (RDA) | 87.03 (RDA) |
| 0.7 | 100 | 80.24 (RDA) | 85.45 (MDA) | 83.28 (KDA) | 82.42 (RDA) | 80.31 (RDA) | 82.69 (RDA) |
| 0.7 | 300 | 83.92 (KDA) | 89.62 (KDA) | 88.34 (KDA) | 87.81 (KDA) | 84.94 (KDA) | 87.72 (KDA) |
| 0.7 | 500 | 85.56 (KDA) | 87.90 (MDA) | 90.64 (KDA) | 90.51 (KDA) | 87.04 (KDA) | 90.51 (KDA) |
Note: The underlined letter indicates the highest mean percentage accuracy.
Table 8 presents the highest mean classification accuracies corresponding to each missing data imputation method for five predictors (p = 5) across different correlation levels (ρ = 0.3 and 0.7) and sample sizes (n = 100, 300, and 500). At the low correlation level (ρ = 0.3), classification accuracy improves consistently with increasing sample size. The Regression imputation (Reg) and RDA combination yields the highest accuracy overall, reaching 89.55% when n = 500. KNN and Random Forest (RF) also perform competitively, while simple mean and Bagged Tree (BT) imputations show comparatively lower results.
At the moderate correlation level (ρ = 0.7), accuracies increase markedly across all imputation methods. The best results are achieved by KDA combined with MissRanger, yielding up to 90.51% accuracy at n = 500. Similarly, Regression and RF imputations also maintain strong performance above 87%.
From Table 8, the highest mean percentage accuracies of the imputation–classification methods were statistically compared using the Friedman test. The results indicated a significant overall difference among the six imputation techniques (p-value < 0.05). Accordingly, the pairwise Wilcoxon signed-rank test revealed that the Mean and Bagged Trees (BT) imputations showed significant differences when compared with the other imputation methods, including Regression, KNN, RF, and MissRanger (p-value < 0.05). These results suggest that the Mean and BT imputations yielded relatively lower classification accuracies than the ensemble-based approaches.
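The Friedman and pairwise Wilcoxon comparisons described above can be sketched in Python with scipy.stats. The accuracy matrix below reuses the Table 8 values (p = 5), with rows as simulation conditions and columns as the six imputation methods; this is an illustrative reproduction of the test procedure, not the paper's original R code.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Rows: (rho, n) conditions from Table 8; columns: the six imputation methods.
acc = np.array([
    [73.90, 81.54, 76.84, 77.06, 73.92, 77.38],   # rho = 0.3, n = 100
    [77.90, 86.48, 82.28, 83.57, 77.83, 83.80],   # rho = 0.3, n = 300
    [80.34, 89.55, 86.08, 86.72, 79.88, 87.03],   # rho = 0.3, n = 500
    [80.24, 85.45, 83.28, 82.42, 80.31, 82.69],   # rho = 0.7, n = 100
    [83.92, 89.62, 88.34, 87.81, 84.94, 87.72],   # rho = 0.7, n = 300
    [85.56, 87.90, 90.64, 90.51, 87.04, 90.51],   # rho = 0.7, n = 500
])

# Friedman test: overall difference among the six related samples (methods).
stat, p = friedmanchisquare(*acc.T)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")

# Pairwise Wilcoxon signed-rank tests as a follow-up comparison.
methods = ["Mean", "Reg", "KNN", "RF", "BT", "MissRanger"]
for i in range(len(methods)):
    for j in range(i + 1, len(methods)):
        _, pw = wilcoxon(acc[:, i], acc[:, j])
        print(f"{methods[i]} vs {methods[j]}: p = {pw:.3f}")
```

In practice such pairwise p-values would also be adjusted for multiple comparisons (e.g., Holm correction).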
The results indicate that advanced imputation techniques such as Regression, KNN, RF, and MissRanger substantially improve classification accuracy, particularly when used with flexible or regularized discriminant methods (e.g., RDA and KDA). Accuracy tends to increase with both correlation strength and sample size, confirming the stability and robustness of ensemble-based imputation in high-dimensional discriminant analysis.
Table 9. The highest mean percentage accuracies corresponding to each missing data imputation method for the ten predictors.
Highest mean accuracy % for p = 10 (best-performing classifier in parentheses):

| Correlation Level (ρ) | Sample Size (n) | Mean | Reg | KNN | RF | BT | MissRanger |
|---|---|---|---|---|---|---|---|
| 0.3 | 100 | 74.78 (KDA) | 87.84 (LDA) | 76.26 (KDA) | 76.45 (KDA) | 74.97 (KDA) | 90.40 (KDA) |
| 0.3 | 300 | 77.52 (SDA) | 79.39 (FDA) | 79.45 (SDA) | 77.67 (SDA) | 77.61 (SDA) | 90.27 (KDA) |
| 0.3 | 500 | 78.69 (LDA) | 86.57 (FDA) | 81.02 (LDA) | 80.85 (SDA) | 78.78 (FDA) | 90.34 (KDA) |
| 0.7 | 100 | 84.47 (KDA) | 87.61 (KDA) | 85.50 (KDA) | 85.15 (KDA) | 84.75 (KDA) | 90.33 (KDA) |
| 0.7 | 300 | 85.56 (KDA) | 84.32 (SDA) | 84.49 (KDA) | 86.12 (KDA) | 85.68 (KDA) | 90.18 (KDA) |
| 0.7 | 500 | 85.83 (RDA) | 86.57 (RDA) | 87.46 (LDA) | 87.06 (RDA) | 86.26 (KDA) | 90.26 (KDA) |
Note: The underlined letter indicates the highest mean percentage accuracy.
Table 9 shows that at the low correlation level (ρ = 0.3), accuracy consistently improves with increasing sample size. The KDA classifier combined with MissRanger imputation achieves the highest performance across all sample sizes, reaching 90.40% accuracy when n = 100 and maintaining results above 90% for larger samples. Regression and KNN imputations also perform well, though slightly lower than MissRanger, while Mean and Bagged Tree (BT) imputations yield comparatively weaker accuracies.
At the moderate correlation level (ρ = 0.7), classification performance improves across all imputation methods, with the KDA classifiers achieving accuracies above 90% in all cases. The MissRanger method again provides the best overall results, peaking at 90.33% (n = 100) and 90.18% (n = 300), indicating both high predictive power and consistency. The results confirm that ensemble-based imputation methods such as MissRanger yield superior performance, particularly when combined with flexible or regularized classifiers like KDA.
From Table 9, the highest mean percentage accuracies of the imputation methods were statistically compared using the Friedman test, which revealed a significant overall difference among the six imputation techniques (p-value < 0.05). Subsequently, the pairwise Wilcoxon signed-rank test was conducted to identify specific differences among the imputation methods. Based on the p-values, several method pairs exhibited statistically significant differences (p-value < 0.05). In particular, the MissRanger imputation method showed significant differences compared with Mean, Regression, KNN, RF, and Bagged Trees (BT), indicating that ensemble-based imputations tended to achieve higher classification accuracies.
To evaluate the computational efficiency of each missing data imputation approach, both average runtime and peak memory usage were recorded during the simulation experiments. Runtime was measured in seconds as the average processing time required to complete one imputation cycle, while peak memory usage (in megabytes, MB) represents the maximum memory allocation during computation. These metrics provide insight into the trade-off between computational cost and imputation accuracy, highlighting the practicality of each method when applied to large or high-dimensional datasets. The results are summarized in Table 10.
Table 10. Average runtime and peak memory usage for each imputation method.
| Imputation Method | Average Runtime (s) | Peak Memory (MB) |
|---|---|---|
| Mean | 0.05 | 12 |
| Regression | 0.08 | 18 |
| KNN | 2.47 | 65 |
| Random Forest | 8.12 | 120 |
| Bagged Trees | 10.35 | 180 |
| MissRanger | 15.28 | 250 |
Note: The evaluation was performed on a dataset with n = 500, p = 10, ρ = 0.7, and 10% missing data.
From Table 10, while mean and regression imputations are nearly instantaneous, ensemble-based methods such as Bagged Trees and MissRanger are computationally more intensive due to iterative model fitting and aggregation. However, the substantial gains in accuracy and robustness justify the additional computational cost for practical applications involving complex or high-dimensional data.
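The runtime and peak-memory measurements reported in Table 10 can be obtained in outline with Python's time and tracemalloc modules. The mean-imputation baseline below is a minimal illustrative stand-in for the actual implementations; the data dimensions match the note under Table 10 (n = 500, p = 10, 10% missing).

```python
import time
import tracemalloc
import numpy as np

rng = np.random.default_rng(99)
X = rng.normal(size=(500, 10))
X[rng.random(X.shape) < 0.10] = np.nan          # 10% MCAR missingness

def mean_impute(X):
    """Column-mean imputation, the cheapest baseline."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

tracemalloc.start()
t0 = time.perf_counter()
X_imp = mean_impute(X)
runtime = time.perf_counter() - t0
_, peak = tracemalloc.get_traced_memory()       # peak bytes during the call
tracemalloc.stop()

print(f"runtime: {runtime:.4f} s, peak memory: {peak / 1e6:.2f} MB")
```

Averaging such measurements over repeated imputation cycles gives the per-cycle figures of Table 10.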

4. Results of Actual Data

This study utilizes real-world clinical data concerning liver disease progression, specifically cirrhosis caused by chronic liver conditions such as hepatitis and prolonged alcohol abuse. The dataset analyzed in this research is publicly available under the title Cirrhosis Prediction Dataset and can be accessed at https://www.kaggle.com/fedesoriano/cirrhosis-prediction-dataset (accessed on 9 August 2025). The data originally stem from a clinical trial on primary biliary cirrhosis conducted by the Mayo Clinic between 1974 and 1984.
The Cirrhosis Prediction Dataset consists of 418 patient records, of which 312 complete cases were used for analysis after excluding observations with missing outcome labels. The outcome variable, Stage, represents the histologic stage of liver disease with four classes: Stage 1 (n = 69), Stage 2 (n = 121), Stage 3 (n = 97), and Stage 4 (n = 25). The dataset includes ten continuous predictor variables: Age (in years), Bilirubin (serum bilirubin in mg/dL), Cholesterol (serum cholesterol in mg/dL), Albumin (serum albumin in g/dL), Copper (urine copper in µg/day), Alkaline Phosphatase (Alk_Phos, in U/L), SGOT (serum glutamic-oxaloacetic transaminase in U/mL), Triglycerides (in mg/dL), Platelets (platelet count per 1000 cells/mL), and Prothrombin Time (in seconds). The missing rates vary across variables: Cholesterol (8.97%), Triglycerides (9.61%), Copper (0.64%), and Platelets (1.28%); the remaining variables are fully observed. Table 11 presents the descriptive statistics of the key continuous variables, including minimum, quartiles, mean, maximum, and the number of missing observations per variable. These statistics offer an overview of the data distribution and help identify variables requiring imputation before classification modeling.
Table 11. The descriptive statistics of the data.
Table 11 presents the reported statistics, including the minimum, quartiles, median, mean, standard deviation, maximum, and the number of missing values. While most variables have complete data, some, such as cholesterol, triglycerides, copper, and platelets, contain missing entries, with triglycerides having the highest number of missing values (30). Certain variables, such as alkaline phosphatase and cholesterol, exhibit strong right-skewed distributions with extremely high maximum values, indicating the presence of potential outliers. In contrast, others, such as age and albumin, appear more symmetrically distributed. These summary statistics provide an overview of the data structure and emphasize the need for appropriate imputation techniques before classification modeling.
To evaluate the missingness mechanism, Little’s MCAR test was performed. It yielded a non-significant result (p-value > 0.05), suggesting that the missing data can be reasonably considered Missing Completely at Random (MCAR). Therefore, the application of the same imputation framework as in the simulation study is justified. This ensures methodological consistency and avoids assumption violations among the imputation techniques employed.
The classification results obtained using various discriminant methods and imputation strategies, including mean, regression, K-Nearest Neighbors (KNN), random forest (RF), Bagged Trees (BT), and MissRanger, are presented in Table 12. This table reports the percentage accuracy for each combination of methods.
Table 12. The percentage accuracy, 95% confidence interval, and Cohen’s kappa for multi-classification of the data.
Table 12 presents the percentage accuracy, 95% proportion confidence intervals, and Cohen’s kappa coefficients for six discriminant analysis (DA) methods applied to real clinical data using different missing data imputation techniques.
Among the imputation methods, MissRanger achieves the highest classification accuracies across most DA models, particularly for KDA (63.04%), MDA (60.86%), and LDA (58.69%), indicating that ensemble-based imputation substantially improves predictive performance. The corresponding Cohen's kappa values (0.3388–0.5434) suggest moderate agreement between predicted and true class memberships. Bagged Tree (BT) and KNN imputations perform competitively, though their accuracies are slightly lower (approximately 52–54%). In contrast, simpler methods such as Mean and Regression imputations yield the weakest results, with accuracies below 52%.
These results confirm that ensemble-based imputation approaches, particularly MissRanger, enhance model stability and classification reliability across all discriminant analysis frameworks. The improvement in both accuracy and kappa indicates that MissRanger preserves the multivariate data structure more effectively than traditional imputation techniques, making it well-suited for real-world datasets with incomplete information.
To further evaluate the classification performance of the discriminant analysis models on actual clinical data, confusion matrices were constructed for the three best-performing methods (LDA, MDA, and KDA) using the MissRanger imputation technique. Each confusion matrix presents the number of correctly and incorrectly classified observations across the four disease stages (Class 1–Class 4), allowing for a detailed assessment of model accuracy and misclassification patterns. The results are summarized in Table 13.
Table 13. Confusion matrices and percentage classification accuracies for LDA, MDA, and KDA methods.
Table 13 displays the confusion matrices and percentage accuracies for the LDA, MDA, and KDA models applied to the Cirrhosis dataset after MissRanger imputation. The KDA model achieves the highest classification accuracy (63.04%), correctly identifying most observations in Classes 3 and 4, which represent more advanced disease stages. The MDA model follows with an accuracy of 60.86%, showing balanced but slightly less precise predictions across classes. The LDA model attains a lower accuracy of 58.69%, with greater misclassification among Classes 2 and 3.
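As an illustration of how the entries of Table 13 are obtained, the sketch below computes a confusion matrix, accuracy, and Cohen's kappa with scikit-learn. The label vectors here are hypothetical (the paper's actual predictions are not reproduced), constructed only to mimic four stages over 312 cases.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
y_true = rng.integers(1, 5, size=312)             # four disease stages, 312 cases
# Keep the true label with probability 0.6, otherwise draw a random stage.
y_pred = np.where(rng.random(312) < 0.6, y_true,
                  rng.integers(1, 5, size=312))

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])
acc = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
print(cm)
print(f"accuracy = {100 * acc:.2f}%, Cohen's kappa = {kappa:.4f}")
```

The diagonal of `cm` gives the correctly classified counts per stage, as reported for LDA, MDA, and KDA in Table 13.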

5. Discussion

This study investigated the classification performance of six discriminant analysis techniques, LDA, RDA, FDA, MDA, KDA, and SDA, under various missing data strategies, including mean, regression, KNN, RF, BT, and MissRanger, using both real clinical data and simulated datasets. The findings, summarized in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9, demonstrate the significant impact that handling missing data has on classification accuracy, particularly when dealing with complex, high-dimensional, highly correlated, and partially observed data.
The diagnostic analysis of multivariate normality and covariance homogeneity indicated that while the simulated datasets adhered closely to theoretical assumptions, the real imputed dataset exhibited mild violations due to skewness and unequal variances among groups. Nevertheless, the robust nature of flexible classifiers, particularly RDA, FDA, KDA, and SDA, allowed them to maintain stable performance under such conditions. This confirms that ensemble-based imputations, such as MissRanger, effectively preserve multivariate dependencies, thereby mitigating the impact of assumption violations on classification accuracy.
LDA tends to yield low-variance boundaries under its parametric assumptions but may be biased under covariance heterogeneity. QDA relaxes these assumptions, lowering bias at the expense of higher variance. RDA and SDA reduce variance through covariance shrinkage, which is particularly beneficial in high-dimensional or small-sample settings. FDA and MDA primarily reduce bias by increasing model flexibility (optimal scoring with flexible regression; class-mixture modeling), though this can increase variance unless adequately regularized. KDA reduces bias by allowing nonlinear boundaries; kernel regularization is essential for controlling variance. Our results align with this view: shrinkage-based methods (RDA and SDA) are comparatively more stable (smaller SD, narrower CI), whereas more flexible methods (FDA, MDA, and KDA) often achieve higher central performance in misspecified settings but exhibit larger variability unless tuning is sufficiently regularized.
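The stabilizing effect of covariance shrinkage described above can be illustrated with scikit-learn, using LDA with Ledoit-Wolf shrinkage (shrinkage='auto', solver='lsqr') as a stand-in for the shrinkage-based classifiers. The synthetic data below follow the paper's Toeplitz design; the class-mean separations are illustrative assumptions, not the study's exact settings.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(99)
n, p, rho = 100, 10, 0.7
cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # Toeplitz AR(1)

# Four balanced classes with shifted means (illustrative separations).
X = np.vstack([rng.multivariate_normal(np.full(p, shift), cov, n // 4)
               for shift in (0.0, 0.5, 1.0, 1.5)])
y = np.repeat(np.arange(4), n // 4)

plain = LinearDiscriminantAnalysis(solver="lsqr")
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
print("plain LDA :", cross_val_score(plain, X, y, cv=5).mean())
print("shrunk LDA:", cross_val_score(shrunk, X, y, cv=5).mean())
```

In small-sample, correlated settings the shrunk estimator typically shows less fold-to-fold variability, mirroring the narrower intervals observed for RDA and SDA.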
The findings from this study align closely with existing literature on the impact of missing data handling in classification problems, reaffirming that the choice of imputation method can substantially influence model performance [17,18]. Both the simulation experiments and the analysis of the Cirrhosis Prediction Dataset revealed that ensemble-based imputations, particularly MissRanger, consistently outperformed simpler methods such as mean and regression imputation across a range of discriminant analysis (DA) techniques. This advantage is consistent with the results of Stekhoven and Bühlmann [32], who demonstrated that random forest-based imputations can effectively capture nonlinear dependencies and maintain the multivariate structure of mixed-type data.
In simulated settings, regression, MissRanger, and KNN repeatedly achieved the highest classification accuracies under varying sample sizes, correlation levels, and predictor dimensionalities. For example, under highly correlated and high-dimensional conditions, MissRanger combined with KDA achieved accuracies exceeding 90%, underscoring the synergy between advanced imputation and flexible classifiers, as also emphasized by Schäfer and Strimmer [31] in their work on preserving covariance structure in high-dimensional analysis. Regression imputation, while comparatively less effective under low correlation, improved markedly when correlation increased and was paired with regularized methods such as RDA, consistent with the theoretical framework of Ledoit and Wolf [37] on shrinkage-based covariance estimation for stabilizing classification in ill-conditioned settings.
From a theoretical perspective, the simulation findings are consistent with the asymptotic properties of discriminant estimators under missing-data mechanisms. Under standard regularity conditions, estimators in LDA and QDA are asymptotically unbiased and consistent as n → ∞, provided that the covariance estimates are unbiased. When missing data are imputed, an additional variance component is introduced.
Imputation methods such as regression and ensemble-based approaches (MissRanger, Bagged Trees) mitigate this issue by stabilizing covariance estimation and maintaining asymptotic efficiency, whereas simpler methods like mean imputation may introduce deterministic bias. These theoretical insights explain why the proposed framework yielded stable and accurate classification across correlation structures and missingness levels, confirming its asymptotic robustness under MCAR conditions.
The real-data analysis confirmed the simulation results: MissRanger achieved the highest average accuracies for LDA, FDA, MDA, KDA, and SDA, while Random Forest imputation performed best for RDA. These findings are in agreement with Hong et al. [23], who demonstrated that machine learning-based imputations improve classification in medical data, and Bai et al. [24], who reported robust performance of autoencoder-based imputation in high-missingness clinical datasets. Although computationally inexpensive, simple imputation methods were unable to model complex variable interactions, leading to reduced classification performance, a limitation also noted by Little and Rubin [39] and van Buuren and Groothuis-Oudshoorn [40].
While this study employed a single training/testing split to ensure consistency with the simulation design, implementing cross-validation would provide a more comprehensive assessment of model stability and predictive generalization across multiple data partitions. This approach could also help confirm whether the observed ranking of imputation–classifier combinations remains consistent under repeated resampling, thereby enhancing the reliability and external validity of the findings.
Both simulations and real data show that larger sample sizes consistently enhance classification performance for all imputation–classifier combinations, reflecting improvements in parameter stability and imputation accuracy. This observation supports the conclusions of Palanivinayagam and Damaševičius [19], who noted that sample size plays a critical role in the success of imputation-driven classification frameworks.
The superior performance of MissRanger and Bagged Trees can be attributed to their ability to preserve multivariate covariance structures and capture nonlinear dependencies among variables. Unlike single-value imputations that treat each variable independently, ensemble-based approaches jointly model relationships across predictors, allowing the imputed data to better reflect the original data geometry. In particular, MissRanger combines Random Forest prediction with predictive mean matching, maintaining the empirical distribution of continuous variables and reducing bias. Bagged Trees, through bootstrap aggregation, stabilize the imputation process and mitigate random variation in estimated values, which in turn enhances covariance estimation and classification boundary stability. These mechanisms collectively explain why ensemble-based imputations yield higher discriminant accuracy, particularly in high-correlation and high-dimensional settings, as also observed by Stekhoven and Bühlmann [32] and Schäfer and Strimmer [31].
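The Random Forest plus predictive-mean-matching idea can be sketched for a single incomplete column as follows. This is a simplified illustration of the mechanism, not missRanger's actual implementation; the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_pmm_impute(X_other, y_partial, k=3, seed=99):
    """Impute NaNs in y_partial from predictors X_other via RF + PMM."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y_partial)
    rf = RandomForestRegressor(n_estimators=100, random_state=seed)
    rf.fit(X_other[obs], y_partial[obs])
    pred_obs = rf.predict(X_other[obs])          # predictions for donors
    pred_mis = rf.predict(X_other[~obs])         # predictions for recipients
    y_imp = y_partial.copy()
    donors = y_partial[obs]
    for i, pm in zip(np.where(~obs)[0], pred_mis):
        # Take the k donors whose RF predictions are closest and draw one at
        # random, so imputed values come from the observed empirical distribution.
        nearest = np.argsort(np.abs(pred_obs - pm))[:k]
        y_imp[i] = donors[rng.choice(nearest)]
    return y_imp

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.8]) + rng.normal(scale=0.3, size=200)
y_miss = y.copy()
y_miss[rng.random(200) < 0.1] = np.nan
y_filled = rf_pmm_impute(X, y_miss)
```

Because every imputed value is copied from an observed donor, the marginal distribution of the variable is preserved, which is the property credited above for reducing bias.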
Overall, the results underscore that optimal classification performance in incomplete-data scenarios depends on aligning the imputation strategy with the assumptions and flexibility of the classifier. Ensemble-based approaches, such as MissRanger, when integrated with flexible or regularized discriminant classifiers, offer a robust and scalable framework for multi-class classification in both theoretical and applied contexts. This is consistent with prior recommendations by Khashei et al. [20] and Sharmila et al. [21] for adopting advanced and context-appropriate imputation methods.

6. Conclusions

This study evaluated the impact of various missing data imputation techniques on the classification performance of six discriminant analysis methods using both real-world clinical data and controlled simulation datasets. The analysis of the Cirrhosis Prediction Dataset revealed that ensemble-based imputation methods, particularly MissRanger and Random Forest, significantly improved classification accuracy compared to simpler approaches such as mean and regression imputation. Flexible and regularized classifiers such as RDA, FDA, and MDA were more responsive to advanced imputation methods, while classical LDA showed only marginal improvement. Simulation results further reinforced these findings under varying sample sizes, correlation levels, and predictor dimensions. MissRanger consistently yielded high classification accuracies, especially in high-dimensional settings and under strong correlation. Moreover, regression-based imputation performed better under low correlation, particularly when used with regularized models like KDA and MDA. The effectiveness of each imputation method was also found to depend on its compatibility with the underlying classifier assumptions. Notably, larger sample sizes consistently enhanced performance across all settings, reinforcing the importance of sufficient data for both imputation accuracy and model stability.
The findings emphasize that selecting an appropriate imputation strategy is critical for maximizing classification performance in the presence of missing data. Ensemble-based methods such as MissRanger and regression, when paired with flexible discriminant classifiers, provide robust and scalable solutions for both clinical and simulated data scenarios. Future research could extend this framework by exploring deep learning-based imputers, evaluating model interpretability, or benchmarking under different missing data mechanisms to further improve generalizability and real-world applicability.

Author Contributions

Conceptualization, A.A. and A.K.; methodology, A.A.; software, A.K.; validation, A.A. and A.K.; formal analysis, A.A.; investigation, A.K.; resources, A.A.; writing—original draft, A.A. and A.K.; writing—review and editing, A.A. and A.K.; visualization, A.A.; supervision, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

The research titled “An Enhanced Discriminant Analysis Approach for Multi-Classification with Integrated Machine Learning-Based Missing Data Imputation” (grant number RE-KRIS/FF68/51) by King Mongkut’s Institute of Technology, Ladkrabang, School of Science, Department of Statistics, has received funding support from the NSRF.

Data Availability Statement

Data are available at https://www.kaggle.com/fedesoriano/cirrhosis-prediction-dataset (accessed on 9 August 2025).

Acknowledgments

This research was supported by King Mongkut’s Institute of Technology Ladkrabang and NSRF.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DA: Discriminant Analysis
LDA: Linear Discriminant Analysis
RDA: Regularized Discriminant Analysis
FDA: Flexible Discriminant Analysis
MDA: Mixture Discriminant Analysis
KDA: Kernel Discriminant Analysis
SDA: Shrinkage Discriminant Analysis
KNN: k-Nearest Neighbors
RF: Random Forest
BT: Bagged Trees

Appendix A

Appendix A.1

To ensure reproducibility and transparency, the simulation setup was described in detail. The data generation process, parameter settings, and randomization control are summarized below. These additions clarify how predictors, covariance structures, missingness, and classification processes were generated and replicated across all simulation experiments.
Table A1. Summary of key simulation parameters.
| Parameter | Symbol | Values/Description |
|---|---|---|
| Number of predictor variables | p | 5, 10 |
| Sample sizes | n | 100, 300, 500 |
| Correlation levels | ρ | 0.3, 0.7 |
| Covariance structure | Σ | Toeplitz: Σ_jk = ρ^|j−k| |
| Missingness mechanism | – | 10% missing (MCAR) |
| Number of replications | – | 1000 |
| Class distribution | – | Balanced (4 classes) |
| Random seed | 99 | Fixed master seed |
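Under the settings in Table A1, the data-generation step can be sketched as follows. The class-mean separations are illustrative assumptions, since the exact group means are not listed in the table; covariance, missingness rate, class balance, and seed follow Table A1.

```python
import numpy as np

def simulate(n=500, p=10, rho=0.7, miss_rate=0.10, seed=99):
    """Generate one replication: 4 balanced classes, Toeplitz covariance, MCAR mask."""
    rng = np.random.default_rng(seed)
    cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # Sigma_jk = rho^|j-k|
    # Illustrative class means; the paper's exact separations are not given here.
    means = [np.full(p, d) for d in (0.0, 0.5, 1.0, 1.5)]
    per_class = n // 4
    X = np.vstack([rng.multivariate_normal(m, cov, per_class) for m in means])
    y = np.repeat(np.arange(1, 5), per_class)
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < miss_rate] = np.nan   # 10% MCAR missingness
    return X_miss, y

X_miss, y = simulate()
```

Repeating this generator 1000 times (re-seeding from the fixed master seed) and applying each imputation-classifier pair reproduces the structure of the simulation experiments.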

References

  1. Jain, S.; Kuriakose, M. Discriminant analysis—Simplified. Int. J. Contemp. Dent. Med. Rev. 2020, 2019, 031219. [Google Scholar]
  2. Ramayah, T.; Ahmad, N.H.; Halim, H.A.; Zainal, S.R.M.; Lo, M.C. Discriminant analysis: An illustrated example. Afr. J. Bus. Manag. 2010, 4, 1654–1667. [Google Scholar]
  3. Singh, A.; Gupta, S. Implementation of linear and quadratic discriminant analysis incorporating costs of misclassification. Processes 2021, 9, 1382. [Google Scholar]
  4. Chatterjee, A.; Das, D. Comparative Study of Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA) and Support Vector Machine (SVM) in Dataset. Int. J. Comput. Appl. 2020, 975, 8887. [Google Scholar]
  5. Berrar, D. Linear vs. quadratic discriminant analysis classifier: A tutorial. Mach. Learn. Bioinform. 2019, 106, 1–13. [Google Scholar]
  6. Friedman, J.H. Regularized discriminant analysis. J. Am. Stat. Assoc. 1989, 84, 165–175. [Google Scholar] [CrossRef]
  7. Di Franco, C.; Palumbo, F. A RDA-based clustering approach for structural data. Stat. Appl. 2022, 34, 249–272. [Google Scholar]
  8. Hastie, T.; Tibshirani, R.; Buja, A. Flexible discriminant analysis. J. Am. Stat. Assoc. 1994, 89, 1255–1270. [Google Scholar] [CrossRef]
  9. Hastie, T.; Tibshirani, R. Discriminant analysis by Gaussian mixtures. J. R. Stat. Soc. B 1996, 58, 155–176. [Google Scholar] [CrossRef]
  10. Notice, D.; Soleimani, H.; Pavlidis, N.G.; Kheiri, A.; Muñoz, M.A. Instance Space Analysis of the Capacitated Vehicle Routing Problem with Mixture Discriminant Analysis. In Proceedings of the GECCO ‘25: Proceedings of the Genetic and Evolutionary Computation Conference, Málaga, Spain, 14–18 July 2025; ACM: New York, NY, USA, 2025; pp. 1–9. [Google Scholar]
  11. Bai, X.; Zhang, M.; Jin, Z.; You, Y.; Liang, C. Fault Detection and Diagnosis for Chiller Based on Feature-Recognition Model and Kernel Discriminant Analysis. Sustain. Cities Soc. 2022, 79, 103708. [Google Scholar] [CrossRef]
  12. Bickel, P.J.; Levina, E. Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli 2004, 10, 989–1010. [Google Scholar] [CrossRef]
  13. Vo, T.H.; Nguyen, L.T.; Vo, B.N.; Vo, A.H. Weighted missing linear discriminant analysis. arXiv 2024, arXiv:2407.00710. [Google Scholar] [CrossRef]
  14. Nguyen, D.; Yan, J.; De, S.; Liu, Y. Efficient parameter estimation for multivariate monotone missing data. arXiv 2020, arXiv:2009.11360. [Google Scholar]
  15. Pepinsky, T.B. A Note on Listwise Deletion versus Multiple Imputation. Polit. Anal. 2018, 26, 480–488. [Google Scholar] [CrossRef]
  16. Ibrahim, J.G.; Molenberghs, G. Missing Data Methods and Applications. In The Oxford Handbook of Applied Bayesian Analysis; O’Hagan, A., West, M., Eds.; Oxford University Press: Oxford, UK, 2010. [Google Scholar]
  17. Kang, H. The prevention and handling of the missing data. Korean J. Anesthesiol. 2013, 64, 402–406. [Google Scholar] [CrossRef]
  18. Agiwal, V.; Chaudhuri, S. Methods and Implications of Addressing Missing Data in Health-Care Research. Curr. Med. 2024, 22, 60–62. [Google Scholar] [CrossRef]
  19. Palanivinayagam, A.; Damaševičius, R. Effective Handling of Missing Values in Datasets for Classification Using Machine Learning Methods. Information 2023, 14, 92.
  20. Khashei, M.; Najafi, F.; Bijari, M. Pattern classification with missing data: A review and future research directions. Appl. Soft Comput. 2023, 136, 110141.
  21. Sharmila, R.; Sundararajan, V.; Krishnamoorthy, S. Classification Techniques for Datasets with Missing Data: A Comprehensive Review. In Proceedings of the 2022 International Conference on Computing, Communication and Green Engineering (CCGE), Coimbatore, India, 2–4 December 2022; pp. 252–256.
  22. Rácz, A.; Gere, A. Comparison of missing value imputation tools for machine learning models based on product development case studies. LWT-Food Sci. Technol. 2025, 221, 117585.
  23. Hong, J.; Lee, H.; Kim, D. Enhancing missing data imputation using machine learning techniques for diabetes classification. Health Inform. J. 2020, 26, 2671–2685.
  24. Bai, T.; Liang, X.; He, L.; Zhang, H. Deep learning-based imputation with autoencoder for medical data with missing values. IEEE Access 2022, 10, 59301–59313.
  25. van Buuren, S. Flexible Imputation of Missing Data, 2nd ed.; Chapman and Hall/CRC: New York, NY, USA, 2018.
  26. Audigier, V.; White, I.R.; Jolani, S.; Debray, T.P.A.; Quartagno, M.; Carpenter, J.; van Buuren, S.; Resche-Rigon, M. Multiple Imputation for Multilevel Data with Continuous and Binary Variables. Stat. Sci. 2018, 33, 160–183.
  27. Resche-Rigon, M.; White, I.R.; Bartlett, J.W.; Carpenter, J.R.; van Buuren, S. Multiple Imputation for Missing Data in Multilevel Models: A Practical Guide. Stat. Methods Med. Res. 2020, 29, 1348–1364.
  28. Zhang, F.; Liu, S.; Li, J. A Machine Learning-Based Multiple Imputation Method for Incomplete Medical Data. Information 2023, 10, 77.
  29. Zhang, Y.; Li, H. GAN-Based Imputation Framework for Multivariate Time-Series Data. Pattern Recognit. Lett. 2024, 184, 56–64.
  30. Park, J.; Kim, S.; Lee, D. A Hybrid Missing Data Imputation Model Combining MICE and Variational Autoencoders. Knowl.-Based Syst. 2025, 298, 112056.
  31. Schäfer, J.; Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Stat. Appl. Genet. Mol. Biol. 2005, 4, 32.
  32. Stekhoven, D.J.; Bühlmann, P. MissForest—Non-Parametric Missing Value Imputation for Mixed-Type Data. Bioinformatics 2012, 28, 112–118.
  33. Zhao, S.; Zhang, B.; Yang, J.; Zhou, J.; Xu, Y. Linear Discriminant Analysis. Nat. Rev. Methods Primers 2024, 4, 70.
  34. McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition; Wiley: Hoboken, NJ, USA, 2004.
  35. Baudat, G.; Anouar, F. Generalized Discriminant Analysis Using a Kernel Approach. Neural Comput. 2000, 12, 2385–2404.
  36. Cai, D.; He, X.; Han, J. Speed Up Kernel Discriminant Analysis. VLDB J. 2011, 20, 21–33.
  37. Ledoit, O.; Wolf, M. A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices. J. Multivar. Anal. 2004, 88, 365–411.
  38. Ahdesmäki, M.; Strimmer, K. Feature Selection in Omics Prediction Problems Using CAT Scores and False Non-Discovery Rate Control. Ann. Appl. Stat. 2010, 4, 503–519.
  39. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data, 2nd ed.; Wiley: Hoboken, NJ, USA, 2002.
  40. van Buuren, S.; Groothuis-Oudshoorn, K. mice: Multivariate Imputation by Chained Equations in R. J. Stat. Softw. 2011, 45, 1–67.
  41. Zhang, S. Nearest Neighbor Selection for Iteratively kNN Imputation. J. Syst. Softw. 2012, 85, 2541–2552.
  42. Golino, H.F.; Gomes, C.M.A. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model. J. Appl. Stat. 2016, 43, 401–421.
  43. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
  44. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26.
  45. Wright, M.N.; Ziegler, A. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. J. Stat. Softw. 2017, 77, 1–17.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.