Open Access Article
Liver Disease Prediction Using Hybrid Feature Selection: A Comparative Analysis of Machine Learning Models
by Osman Eray
Department of Computer Technologies, Korkuteli Vocational School, Akdeniz University, Antalya 07800, Türkiye
Appl. Sci. 2026, 16(13), 6726; https://doi.org/10.3390/app16136726 (registering DOI)
Submission received: 5 May 2026 / Revised: 24 June 2026 / Accepted: 28 June 2026 / Published: 5 July 2026
Abstract
Liver disease represents a major global health burden, and early diagnosis is essential for reducing mortality. Machine learning (ML) approaches offer non-invasive alternatives to conventional diagnostics, yet existing studies on liver disease prediction often lack systematic feature selection, apply resampling before data splitting (introducing leakage), and report results from single train-test splits without statistical testing. This study proposes a Hybrid Feature Selection (HFS) framework combining Pearson-correlation-based redundancy elimination with a weighted Information Gain–Gain Ratio scoring function, integrated with SMOTE within a leakage-free pipeline. The framework is evaluated on two benchmarks—the Indian Liver Patient Dataset (ILPD, n = 583) and the BUPA Liver Disorders Dataset (n = 345)—across ten classifiers and ten independent train-test splits, with significance assessed via paired Wilcoxon signed-rank tests. On ILPD, the HFS + SMOTE pipeline produced statistically significant ROC-AUC improvements (p < 0.05) in five of ten classifiers and resolved majority-class collapse, raising mean Specificity from 0.00–0.33 to 0.61–0.92. A 2 × 2 ablation study confirmed that HFS and SMOTE contribute independently, with SMOTE driving the Specificity transformation and HFS reducing feature-space noise. Sensitivity analyses demonstrated robustness to the weighting parameter w and confirmed k = 6 as the optimal feature count. Replication on BUPA—which exhibits near-perfect class balance and no feature redundancy—produced a principled null result, confirming that the pipeline’s effectiveness is mechanistically linked to dataset characteristics. The HFS algorithm consistently identified four clinically meaningful core features (AST, ALT, Total Bilirubin, Age) across all runs, validated by SHAP and Permutation Importance stability analysis.
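The abstract names the two ingredients of HFS but not their exact formulas, so the following is only a minimal sketch of how such a selector could look, not the author's implementation. It assumes the score takes the form w * IG + (1 - w) * GR, that continuous features are discretized into equal-width bins before computing entropy, and that a feature is treated as redundant when its absolute Pearson correlation with an already kept feature reaches a cutoff `r_max`. The function names, the bin count, the cutoff, and the toy data are all hypothetical. To stay consistent with the leakage-free design described above, the selector would be run on the training split only; SMOTE and the classifiers are not shown.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def discretize(x, bins=4):
    """Equal-width binning (an assumption; the paper's discretization is not stated)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in x]

def ig_and_gr(x, y, bins=4):
    """Information Gain and Gain Ratio of feature x with respect to class labels y."""
    xb = discretize(x, bins)
    n = len(y)
    conditional = 0.0
    for b in set(xb):
        subset = [yi for xi, yi in zip(xb, y) if xi == b]
        conditional += len(subset) / n * entropy(subset)
    ig = entropy(y) - conditional
    split_info = entropy(xb)
    gr = ig / split_info if split_info > 0 else 0.0
    return ig, gr

def hybrid_select(columns, y, k, w=0.5, r_max=0.9, bins=4):
    """Score each feature as w*IG + (1-w)*GR (assumed form), then keep the k
    best-scoring features, skipping any whose |Pearson r| with an already
    kept feature is r_max or higher."""
    scores = {}
    for name, x in columns.items():
        ig, gr = ig_and_gr(x, y, bins)
        scores[name] = w * ig + (1 - w) * gr
    kept = []
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(columns[name], columns[o])) < r_max for o in kept):
            kept.append(name)
        if len(kept) == k:
            break
    return kept, scores

# Toy training split (made-up values, not ILPD or BUPA data).
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
train_columns = {
    "a": [1, 2, 3, 4, 11, 12, 13, 14],        # separates the classes
    "a_copy": [3, 5, 7, 9, 23, 25, 27, 29],   # linear copy of "a" (redundant)
    "b": [0, 0, 0, 1, 0, 1, 1, 1],            # weakly informative
    "noise": [5, 1, 5, 1, 5, 1, 5, 1],        # unrelated to the labels
}
selected, scores = hybrid_select(train_columns, train_labels, k=2)
print(selected)
```

In this sketch the redundancy check is applied greedily in score order, so of two highly correlated features the better-scoring one survives. That is one plausible design choice among several; the paper may instead remove correlated features before scoring.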