Cardiovascular disease (CVD) is a major cause of mortality worldwide, which underscores the need for effective predictive tools to inform clinical decision-making. This study compared the predictive performance of ensemble learning algorithms, including Bagging, Random Forest, Extra Trees, Gradient Boosting, and AdaBoost, on a clinical dataset of patients with CVD. The methodology comprised data preprocessing and cross-validation to assess generalization. Model performance was evaluated using accuracy, F1 score, precision, recall, Cohen’s Kappa, and area under the curve (AUC). Among the models evaluated, Bagging demonstrated the best overall performance (accuracy ± SD: 93.36% ± 0.22; F1 score: 0.936; AUC: 0.9686). It also achieved the lowest average rank (1.0) in the Friedman test and was placed, together with Extra Trees (accuracy ± SD: 90.76% ± 0.18; F1 score: 0.916; AUC: 0.9689), in the top statistical group (group A) according to the Nemenyi post hoc test. Both models showed a high degree of agreement with the actual labels (Kappa: 0.87 and 0.83, respectively), supporting their reliability in real clinical settings. The findings support the superiority of aggregation-based ensemble methods in terms of accuracy, stability, and concordance, and point to Bagging and Extra Trees as leading candidates for cardiovascular diagnostic support systems, where reliability and generalization are paramount.
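
Two of the quantities reported above, Cohen's Kappa and the average rank that underlies the Friedman test, can be illustrated with a minimal sketch. The function names `cohen_kappa` and `average_ranks` are hypothetical helpers written for this illustration; they are not taken from the study, and the abstract does not state over which units (folds, metrics, or repetitions) the ranks were computed.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where prediction equals label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement, estimated from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both labelings contain one identical class;
        # this sketch returns 1.0 because the agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def average_ranks(scores):
    """Mean rank per model across comparisons (1 = best, higher score wins).

    `scores` maps a model name to a list of scores, one per comparison
    (assumed here to be cross-validation folds). Tied models share the
    average of the ranks they occupy, as in the Friedman test.
    """
    models = list(scores)
    n_comparisons = len(scores[models[0]])
    totals = {m: 0.0 for m in models}
    for i in range(n_comparisons):
        ordered = sorted(models, key=lambda m: scores[m][i], reverse=True)
        pos = 0
        while pos < len(ordered):
            # Extend `end` over the run of models tied with ordered[pos].
            end = pos
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][i] == scores[ordered[pos]][i]):
                end += 1
            shared_rank = ((pos + 1) + (end + 1)) / 2.0
            for m in ordered[pos:end + 1]:
                totals[m] += shared_rank
            pos = end + 1
    return {m: totals[m] / n_comparisons for m in models}
```

Under this definition, an average rank of 1.0 can only arise when a model ranks first, without ties, in every comparison. A Kappa of 1 indicates perfect agreement with the labels and 0 indicates agreement no better than chance.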