Abstract
Sudden cardiac death (SCD) remains a major clinical challenge, and implantable cardioverter-defibrillators (ICDs) are the primary preventive intervention. Current patient selection guidelines rely on limited and imperfect risk markers. This study explores whether machine learning (ML) models can improve SCD risk prediction from tabular clinical data, including features derived from medical sensing devices such as electrocardiograms (ECGs) and ICDs. Several ML models, including tree-based models, Naive Bayes (NB), logistic regression (LR), and voting classifiers (VC), were trained on demographic, clinical, laboratory, and device-derived variables from patients who underwent ICD implantation at a Croatian tertiary center. The target variable was ICD activation (appropriate or inappropriate/missed), used as a surrogate for high-risk SCD detection. Models were optimized for the F2-score, which weights recall more heavily than precision, to prioritize the detection of high-risk patients. Interpretability was provided by post hoc SHAP (SHapley Additive exPlanations) value analysis, which confirmed known SCD predictors and revealed additional potential ones. The random forest (RF) model achieved the highest F2-score (0.74; AUC-ROC 0.73) with a recall of 97.30%, meeting the primary objective of high true-positive detection, whereas the VC achieved the highest overall discrimination (F2-score 0.71; AUC-ROC 0.76). The predictive performance of multiple ML models, particularly their high recall, indicates that ML is a promising tool for refining ICD patient selection.
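To make the F2 optimization criterion concrete, the following minimal sketch computes the F-beta score from confusion-matrix counts and picks the decision threshold that maximizes F2 on a set of predicted probabilities. It is illustrative only: the abstract does not state how the optimization was performed, so threshold tuning is an assumed stand-in for it. The function names `fbeta_score` and `best_threshold_for_f2` are hypothetical helpers (not a library API), and the toy labels and scores are invented and are not study data.

```python
from typing import List, Optional, Sequence, Tuple


def fbeta_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta for binary labels (1 = positive, e.g. high-risk patient).

    Uses the count form F_beta = (1 + b^2) TP / ((1 + b^2) TP + b^2 FN + FP),
    so with beta = 2 a missed positive (FN) costs four times as much as a false alarm (FP).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold_for_f2(
    y_true: Sequence[int],
    scores: Sequence[float],
    candidates: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Hypothetical helper: return (threshold, F2) maximizing F2 over candidate cut-offs.

    By default the candidates are the distinct observed scores. In practice this
    should be run on validation data, not on the final test set.
    """
    if candidates is None:
        candidates = sorted(set(scores))
    best_t, best_f = candidates[0], -1.0
    for t in candidates:
        y_pred = [1 if s >= t else 0 for s in scores]
        f = fbeta_score(y_true, y_pred, beta=2.0)
        if f > best_f:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f = t, f
    return best_t, best_f


if __name__ == "__main__":
    # Invented toy example: 3 positives, 3 negatives.
    y = [1, 1, 1, 0, 0, 0]
    recall_heavy = [1, 1, 1, 1, 1, 1]  # flags everyone: recall 1.0, precision 0.5
    precise = [1, 1, 0, 0, 0, 0]       # no false alarms but misses one positive
    # F1 prefers the precise predictor; F2 prefers the high-recall one.
    print("F1:", fbeta_score(y, recall_heavy, 1.0), fbeta_score(y, precise, 1.0))
    print("F2:", fbeta_score(y, recall_heavy), fbeta_score(y, precise))
    print(best_threshold_for_f2(y, [0.9, 0.6, 0.3, 0.4, 0.2, 0.1]))
```

The toy comparison shows why F2 suits a screening goal like the one described: between two predictors, F2 favors the one that misses fewer high-risk patients, even at the cost of more false alarms.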