Liver Disease Prediction Using Hybrid Feature Selection: A Comparative Analysis of Machine Learning Models

Eray, Osman

doi:10.3390/app16136726

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Osman Eray

Department of Computer Technologies, Korkuteli Vocational School, Akdeniz University, Antalya 07800, Türkiye

Appl. Sci. 2026, 16(13), 6726; https://doi.org/10.3390/app16136726 (registering DOI)

Submission received: 5 May 2026 / Revised: 24 June 2026 / Accepted: 28 June 2026 / Published: 5 July 2026

Abstract

Liver disease represents a major global health burden, and early diagnosis is essential for reducing mortality. Machine learning (ML) approaches offer non-invasive alternatives to conventional diagnostics, yet existing studies on liver disease prediction often lack systematic feature selection, apply resampling before data splitting (introducing leakage), and report results from single train-test splits without statistical testing. This study proposes a Hybrid Feature Selection (HFS) framework combining Pearson-correlation-based redundancy elimination with a weighted Information Gain–Gain Ratio scoring function, integrated with SMOTE within a leakage-free pipeline. The framework is evaluated on two benchmarks—the Indian Liver Patient Dataset (ILPD, n = 583) and the BUPA Liver Disorders Dataset (n = 345)—across ten classifiers and ten independent train-test splits, with significance assessed via paired Wilcoxon signed-rank tests. On ILPD, the HFS + SMOTE pipeline produced statistically significant ROC-AUC improvements (p < 0.05) in five of ten classifiers and resolved majority-class collapse, raising mean Specificity from 0.00–0.33 to 0.61–0.92. A 2 × 2 ablation study confirmed that HFS and SMOTE contribute independently, with SMOTE driving the Specificity transformation and HFS reducing feature-space noise. Sensitivity analyses demonstrated robustness to the weighting parameter w and confirmed k = 6 as the optimal feature count. Replication on BUPA—which exhibits near-perfect class balance and no feature redundancy—produced a principled null result, confirming that the pipeline’s effectiveness is mechanistically linked to dataset characteristics. The HFS algorithm consistently identified four clinically meaningful core features (AST, ALT, Total Bilirubin, Age) across all runs, validated by SHAP and Permutation Importance stability analysis.
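The hybrid selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring formula, the discretization, or the correlation threshold, so everything below is an assumption made for illustration: the score is taken as w * IG + (1 - w) * GR, features are binarized at the median, redundancy is resolved greedily by keeping the higher-scoring feature of any pair whose absolute Pearson correlation exceeds r_max, and the function names (hybrid_select, info_gain_and_ratio) and the toy data are hypothetical. It is not the author's implementation.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def binarize(x):
    """Median split. ASSUMPTION: an illustrative discretization only."""
    med = sorted(x)[len(x) // 2]
    return [int(v >= med) for v in x]


def info_gain_and_ratio(x, y):
    """Information Gain and Gain Ratio of a binarized feature w.r.t. labels y."""
    bins = binarize(x)
    n = len(y)
    conditional = 0.0
    for b in set(bins):
        subset = [yi for bi, yi in zip(bins, y) if bi == b]
        conditional += len(subset) / n * entropy(subset)
    ig = entropy(y) - conditional
    split_info = entropy(bins)
    return ig, (ig / split_info if split_info else 0.0)


def hybrid_select(features, y, w=0.5, r_max=0.8, k=2):
    """Return (selected feature names, scores).

    ASSUMPTIONS: score = w * IG + (1 - w) * GR; a feature is dropped when
    its |Pearson r| with an already kept, higher-ranked feature exceeds r_max.
    """
    scores = {}
    for name, x in features.items():
        ig, gr = info_gain_and_ratio(x, y)
        scores[name] = w * ig + (1 - w) * gr
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[other])) <= r_max
               for other in kept):
            kept.append(name)
    return kept[:k], scores


# Synthetic toy data (not from ILPD or BUPA): f_b is an exact multiple of f_a.
y = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [10, 12, 11, 13, 40, 42, 41, 43],
    "f_b": [20, 24, 22, 26, 80, 84, 82, 86],
    "f_c": [30, 60, 35, 65, 32, 62, 34, 66],
}
selected, scores = hybrid_select(features, y, w=0.5, r_max=0.8, k=2)
print(selected)
```

On this toy input, f_a and f_b separate the classes equally well, but f_b is dropped as redundant because it is perfectly correlated with f_a; f_c is kept despite its low score because it is only weakly correlated with f_a. In a leakage-free setup such as the one the abstract describes, a selection step like this would be fitted on the training portion of each split only, before any resampling of that training portion.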

Keywords: machine learning; liver disease prediction; hybrid feature selection; SMOTE; SHAP; explainable AI
