Machine Learning for Mortality Risk Prediction in Myocardial Infarction: A Clinical-Economic Decision Support Framework
Abstract
1. Introduction
2. Materials and Methods
2.1. Data and Preprocessing
2.2. Handling Class Imbalance
2.3. Feature Scaling and Transformation
2.4. Dimensionality Reduction
2.5. Predictive Modeling
2.6. Hyperparameter Optimization
2.7. Subpopulation-Based External Validation
3. Evaluation Metrics
4. Results
5. Discussion
5.1. Methodological Advancements over Prior Frameworks
5.2. Implications for Triage and ICU Management
5.3. Bioinformatics and Economic Perspectives
5.4. Limitations
5.5. Future Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Heidari-Foroozan, M.; Farshbafnadi, M.; Golestani, A.; Younesian, S.; Jafary, H.; Rashidi, M.M.; Tabatabaei-Malazy, O.; Rezaei, N.; Kheirabady, M.M.; Ghotbi, A.B.; et al. National and Subnational Burden of Cardiovascular Diseases in Iran from 1990 to 2021: Results from Global Burden of Diseases 2021 Study. Glob. Heart 2025, 20, 43.
- World Health Organization. Cardiovascular Diseases (CVDs). Available online: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) (accessed on 28 June 2025).
- Zhang, X.; Wang, X.; Xu, L.; Ren, P.; Wu, H. The Predictive Value of Machine Learning for Mortality Risk in Patients with Acute Coronary Syndromes: A Systematic Review and Meta-Analysis. Eur. J. Med. Res. 2023, 28, 451.
- Salari, N.; Morddarvanjoghi, F.; Abdolmaleki, A.; Rasoulpoor, S.; Khaleghi, A.A.; Hezarkhani, L.A.; Shohaimi, S.; Mohammadi, M. The Global Prevalence of Myocardial Infarction: A Systematic Review and Meta-Analysis. BMC Cardiovasc. Disord. 2023, 23, 206.
- Yang, Y.; Tang, J.; Ma, L.; Wu, F.; Guan, X. A Systematic Comparison of Short-Term and Long-Term Mortality Prediction in Acute Myocardial Infarction Using Machine Learning Models. BMC Med. Inform. Decis. Mak. 2025, 25, 208.
- Fanaroff, A.C.; Chen, A.Y.; Thomas, L.E.; Pieper, K.S.; Garratt, K.N.; Peterson, E.D.; Newby, L.K.; de Lemos, J.A.; Kosiborod, M.N.; Amsterdam, E.A.; et al. Risk Score to Predict Need for Intensive Care in Initially Hemodynamically Stable Adults With Non–ST-Segment–Elevation Myocardial Infarction. J. Am. Heart Assoc. 2018, 7, e008894.
- Lee, W.; Lee, J.; Woo, S.I.; Choi, S.H.; Bae, J.W.; Jung, S.; Jeong, M.H.; Lee, W.K. Machine Learning Enhances the Performance of Short- and Long-Term Mortality Prediction Model in Non-ST-Segment Elevation Myocardial Infarction. Sci. Rep. 2021, 11, 12886.
- Gupta, A.K.; Mustafiz, C.; Mutahar, D.; Zaka, A.; Parvez, R.; Mridha, N.; Stretton, B.; Kovoor, J.G.; Bacchi, S.; Ramponi, F.; et al. Machine Learning vs Traditional Approaches to Predict All-Cause Mortality for Acute Coronary Syndrome: A Systematic Review and Meta-Analysis. Can. J. Cardiol. 2025, in press.
- Kumar, R.; Safdar, U.; Yaqoob, N.; Khan, S.F.; Matani, K.; Khan, N.; Jalil, B.; Yousufzai, E.; Shahid, M.O.; Khan, S. Assessment of the Prognostic Performance of TIMI, PAMI, CADILLAC and GRACE Scores for Short-Term Major Adverse Cardiovascular Events in Patients Undergoing Emergent Percutaneous Revascularisation: A Prospective Observational Study. BMJ Open 2025, 15, e091028.
- Newaz, A.; Mohosheu, M.S.; Noman, M.A.A. Predicting Complications of Myocardial Infarction Within Several Hours of Hospitalization Using Data Mining Techniques. Inform. Med. Unlocked 2023, 42, 101361.
- Diakou, I.; Iliopoulos, E.; Papakonstantinou, E.; Dragoumani, K.; Yapijakis, C.; Iliopoulos, C.; Spandidos, D.A.; Chrousos, G.P.; Eliopoulos, E.; Vlachakis, D. Multi-Label Classification of Biomedical Data. Med. Int. 2024, 4, 68.
- Joshi, A.; Gunwant, H.; Sharma, M.; Chaudhary, V. Early Prognosis of Acute Myocardial Infarction Using Machine Learning Techniques. In Lecture Notes on Data Engineering and Communications Technologies, Proceedings of Data Analytics and Management, Polkowice, Poland, 26 June 2021; Springer: Singapore, 2022; Volume 91, pp. 815–829.
- Lokeswar Reddy, K.; Thangam, S. Predicting Relapse of the Myocardial Infarction in Hospitalized Patients. In Proceedings of the 2022 3rd International Conference for Emerging Technology (INCET), Belgaum, India, 20–22 May 2022; pp. 1–6.
- Al-Zaiti, S.S.; Martin-Gill, C.; Zègre-Hemsey, J.K.; Bouzid, Z.; Faramand, Z.; Alrawashdeh, M.O.; Gregg, R.E.; Helman, S.; Riek, N.T.; Kraevsky-Phillips, K.; et al. Machine Learning for ECG Diagnosis and Risk Stratification of Occluded Myocardial Infarction. Nat. Med. 2023, 29, 1804–1813.
- Beaulieu-Jones, B.K.; Moore, J.H. Missing Data Imputation in the Electronic Health Record Using Deeply Learned Autoencoders. Pac. Symp. Biocomput. 2017, 22, 207–218.
- Liu, M.; Li, S.; Yuan, H.; Ong, M.E.H.; Ning, Y.; Xie, F.; Saffari, S.E.; Shang, Y.; Volovici, V.; Chakraborty, B.; et al. Handling Missing Values in Healthcare Data: A Systematic Review of Deep Learning-Based Imputation Techniques. Artif. Intell. Med. 2023, 142, 102587.
- Zamanzadeh, D.J.; Petousis, P.; Davis, T.A.; Nicholas, S.B.; Norris, K.C.; Tuttle, K.R.; Bui, A.A.T.; Sarrafzadeh, M. AutoPopulus: A Novel Framework for Autoencoder Imputation on Large Clinical Datasets. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2021, 2021, 2303–2309.
- Qiu, Y.L.; Zheng, H.; Gevaert, O. Genomic Data Imputation with Variational Autoencoders. GigaScience 2020, 9, giaa082.
- Yeo, I.-K.; Johnson, R.A. A New Family of Power Transformations to Improve Normality or Symmetry. Biometrika 2000, 87, 954–959.
- Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
- Zhu, J.; Pu, S.; He, J.; Su, D.; Cai, W.; Xu, X.; Liu, H. Processing Imbalanced Medical Data at the Data Level with Assisted-Reproduction Data as an Example. BioData Min. 2024, 17, 29.
- Cusworth, S.; Gkoutos, G.V.; Acharjee, A. A Novel Generative Adversarial Networks Modelling for the Class Imbalance Problem in High-Dimensional Omics Data. BMC Med. Inform. Decis. Mak. 2024, 24, 90.
- Liu, R.; Wang, M.; Zheng, T.; Zhang, R.; Li, N.; Chen, Z.; Yan, H.; Shi, Q. An Artificial Intelligence-Based Risk Prediction Model of Myocardial Infarction. BMC Bioinformatics 2022, 23, 217.
- Singh, J.; Beeche, C.; Shi, Z.; Beale, O.; Rosin, B.; Leader, J.; Pu, J. Batch-Balanced Focal Loss: A Hybrid Solution to Class Imbalance in Deep Learning. J. Med. Imaging 2023, 10, 051809.
- Vincent, P.; Larochelle, H.; Lajoie, I.; Bengio, Y.; Manzagol, P.A. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. J. Mach. Learn. Res. 2010, 11, 3371–3408.
- Ditzler, G.; Roveri, M.; Alippi, C.; Polikar, R. Learning in Nonstationary Environments: A Survey. IEEE Comput. Intell. Mag. 2015, 10, 12–25.
- Ahsan, M.M.; Mahmud, M.A.P.; Saha, P.K.; Gupta, K.D.; Siddique, Z. Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance. Technologies 2021, 9, 52.
- Yang, Y.; Sun, H.; Zhang, Y.; Zhang, T.; Gong, J.; Wei, Y.; Duan, Y.G.; Shu, M.; Yang, Y.; Wu, D.; et al. Dimensionality Reduction by UMAP Reinforces Sample Heterogeneity Analysis in Bulk Transcriptomic Data. Cell Rep. 2021, 36, 109442.
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. A Review of Feature Selection Methods on Synthetic Data. Knowl. Inf. Syst. 2013, 34, 483–519.
- Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression, 3rd ed.; Wiley: Hoboken, NJ, USA, 2013.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Cover, T.M.; Hart, P.E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017.
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), Montréal, QC, Canada, 3–8 December 2018.
- Bauer, E.; Kohavi, R. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Mach. Learn. 1999, 36, 105–139.
- Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Representations by Back-Propagating Errors. Nature 1986, 323, 533–536.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. Available online: https://openaccess.thecvf.com/content_ICCV_2017/papers/Lin_Focal_Loss_for_ICCV_2017_paper.pdf (accessed on 15 May 2025).
- Arik, S.O.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021; Volume 35, pp. 6679–6687. Available online: https://arxiv.org/abs/1908.07442 (accessed on 18 July 2025).
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
- Rockenschaub, P.; Akay, E.M.; Carlisle, B.G.; Hilbert, A.; Wendland, J.; Meyer-Eschenbach, F.; Näher, A.F.; Frey, D.; Madai, V.I. External Validation of AI-Based Scoring Systems in the ICU: A Systematic Review and Meta-Analysis. BMC Med. Inform. Decis. Mak. 2025, 25, 5.
- Guo, H.; Li, Y.; Shang, J.; Gu, M.; Hu, Y.; Gong, B. Learning from Class-Imbalanced Data: Review of Methods and Applications. Expert Syst. Appl. 2017, 73, 220–239.
- van den Goorbergh, R.; van Smeden, M.; Timmerman, D.; Van Calster, B. The Harm of Class Imbalance Corrections for Risk Prediction Models. medRxiv 2022.
- Lobo, J.M.; Jiménez-Valverde, A.; Real, R. AUC: A Misleading Measure of the Performance of Predictive Distribution Models. Glob. Ecol. Biogeogr. 2008, 17, 145–151.
- Saito, T.; Rehmsmeier, M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432.
- Sasaki, Y. The Truth of the F-Measure. Teach. Tutor. Mater. 2007, 1, 1–5.
- López, V.; Fernández, A.; García, S.; Palade, V.; Herrera, F. An Insight into Classification with Imbalanced Data: Empirical Results and Current Trends on Using Data Intrinsic Characteristics. Inf. Sci. 2013, 250, 113–141.
- Kubat, M.; Matwin, S. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection. In Proceedings of the 14th International Conference on Machine Learning (ICML), Nashville, TN, USA, 8–12 July 1997; pp. 179–186.
- Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
- Matthews, B.W. Comparison of the Predicted and Observed Secondary Structure of T4 Phage Lysozyme. Biochim. Biophys. Acta Protein Struct. 1975, 405, 442–451.
- Powers, D.M.W. Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. J. Mach. Learn. Technol. 2011, 2, 37–63.
- Tschoellitsch, T.; Seidl, P.; Böck, C.; Maletzky, A.; Moser, P.; Thumfart, S.; Giretzlehner, M.; Hochreiter, S.; Meier, J. Using Emergency Department Triage for Machine Learning-Based Admission and Mortality Prediction. Eur. J. Emerg. Med. 2023, 30, 408–416.
- Greco, S.; Ishizaka, A.; Tasiou, M.; Torrisi, G. On the Methodological Framework of Composite Indices: A Review of the Issues of Weighting, Aggregation, and Robustness. Soc. Indic. Res. 2019, 141, 61–94.
- Altman, D.G.; Bland, J.M. Diagnostic Tests. 1: Sensitivity and Specificity. BMJ 1994, 308, 1552.
- Shreffler, J.; Huecker, M.R. Type I and Type II Errors and Statistical Power. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, January 2025. Available online: https://www.ncbi.nlm.nih.gov/books/NBK557530/ (accessed on 20 May 2025).
- de Vos, J.; Visser, L.A.; de Beer, A.A.; Fornasa, M.; Thoral, P.J.; Elbers, P.W.G.; Cinà, G. The Potential Cost-Effectiveness of a Machine Learning Tool That Can Prevent Untimely Intensive Care Unit Discharge. Value Health 2022, 25, 359–367.
Parameter | Value |
---|---|
Total patients | 1700 MI cases admitted to the intensive care unit (ICU)
Input features (baseline) | 111 clinical variables (demographics, vitals, labs, etc.) |
Outcome variable | In-hospital mortality (binary: 0 = survived, 1 = died) |
Mortality rate | 15.94% (271/1700 patients died in-hospital) |
Missing data | 7.6% of values missing, imputed during preprocessing |
Step | Technique/Method | Applied To | Purpose |
---|---|---|---|
Row Removal | Drop rows with >20 missing values | Samples | Remove incomplete records |
Feature Removal | Drop features with >300 missing values | Features | Remove sparse features |
Feature Categorization | Binary, Categorical, Integer, Continuous | All features | Organize features by type
Numeric Imputation | SimpleImputer (mean) + Denoising Autoencoder | Numeric | Impute and denoise |
Categorical Imputation | SimpleImputer (most frequent) + OneHotEncoder | Categorical | Preserve category structure |
Binary Imputation | SimpleImputer (most frequent) | Binary | Fill binary gaps |
Feature Selection | SelectFromModel (Random Forest) | Transformed features | Reduce dimensionality
Advanced Imputation | VAEImputer (Variational Autoencoder) | Remaining missing values | Reconstruct missing values
Scaling | PowerTransformer (Yeo-Johnson) | Numeric features | Normalize distributions for model performance
Class Imbalance | SMOTE (Synthetic Oversampling) | Training set only | Balance class distribution
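
For concreteness, a minimal sketch of this pipeline with scikit-learn and imbalanced-learn is shown below. The denoising and variational autoencoder imputers are custom models not reproduced here (SimpleImputer stands in for both), the step ordering is simplified, and all column names and the toy data are illustrative placeholders.

```python
# Minimal sketch of the preprocessing pipeline (scikit-learn + imbalanced-learn).
# The DAE/VAE imputation stages are custom models approximated by SimpleImputer;
# column names and toy data are placeholders, not the study's variables.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PowerTransformer

rng = np.random.default_rng(0)
n = 200
X_train = pd.DataFrame({
    "age": rng.normal(65, 10, n),          # continuous placeholders
    "sbp": rng.normal(130, 20, n),
    "serum_k": rng.normal(4.2, 0.5, n),
    "infarct_site": rng.choice(["anterior", "inferior", "lateral"], n),
    "diabetes": rng.integers(0, 2, n),     # binary placeholders
    "prior_mi": rng.integers(0, 2, n),
})
y_train = rng.integers(0, 2, n)            # 0 = survived, 1 = died

numeric_cols = ["age", "sbp", "serum_k"]
categorical_cols = ["infarct_site"]
binary_cols = ["diabetes", "prior_mi"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", PowerTransformer(method="yeo-johnson"))]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ("bin", SimpleImputer(strategy="most_frequent"), binary_cols),
])

X_t = preprocess.fit_transform(X_train)
# Random-forest-based feature selection, as in the SelectFromModel step.
selector = SelectFromModel(RandomForestClassifier(n_estimators=300, random_state=0))
X_sel = selector.fit_transform(X_t, y_train)
# SMOTE is applied to the training split only, after imputation and transformation.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_sel, y_train)
```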
Algorithm | Description and Role in the Study |
---|---|
Neural Network (Focal Loss) | Deep neural network trained with Focal Loss to handle severe class imbalance. Learns nonlinear patterns and serves as the baseline deep learning model for mortality prediction. |
XGBoost | Gradient Boosting Trees with regularization. High-performing model on tabular data, used to model complex feature interactions. Part of the ensemble and standalone comparison. |
LightGBM | Fast, histogram-based boosting algorithm. Efficient on large datasets and sparse features, used to compare with other tree-based models. |
CatBoost | Gradient boosting with native support for categorical features. Included for its strong handling of categorical variables and stable performance. |
Random Forest | Bagging ensemble of decision trees. Used for feature importance (SelectFromModel) and final prediction benchmark. Interpretable and robust. |
Logistic Regression | Interpretable linear model. Used as a benchmark and for its clinical relevance in mortality studies. |
SVM | Support Vector Machine with a nonlinear kernel. Included to test margin-based separation under class imbalance. |
K-Nearest Neighbors | Distance-based classifier. Included for simplicity and to assess performance on the transformed feature space. |
TabNet | Deep learning model with sequential attention. Designed for tabular data, used to explore deep feature learning and interpretability via masks. |
Voting Ensemble | Combines XGBoost, CatBoost, and Random Forest. Aims to leverage diverse model strengths for a more stable final prediction. |
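
The voting ensemble in the last row can be sketched with scikit-learn's VotingClassifier, as below. The hyperparameter values are illustrative defaults, not the tuned settings from the grid-search table, and soft voting (probability averaging) is an assumption, as the excerpt does not state the voting scheme.

```python
# Sketch of the voting ensemble: XGBoost + CatBoost + Random Forest.
# Hyperparameters are illustrative; soft voting is assumed.
from catboost import CatBoostClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from xgboost import XGBClassifier

ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=300, learning_rate=0.1, max_depth=6)),
        ("cat", CatBoostClassifier(iterations=300, learning_rate=0.1, depth=6, verbose=0)),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
    ],
    voting="soft",  # average predicted probabilities across the three models
)
# Usage: ensemble.fit(X_bal, y_bal); proba = ensemble.predict_proba(X_test)[:, 1]
```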
Algorithm | Hyperparameter Grid (Grid Search)
---|---|
Neural Network (Focal Loss) | learning_rate: [0.01, 0.001]; gamma: [2.0, 4.0]; alpha: [0.25, 0.5]; batch_size: [16, 32]; epochs: [15, 30]
XGBoost | n_estimators: [100, 300]; learning_rate: [0.01, 0.1]; max_depth: [3, 6, 9]; subsample: [0.8, 1.0]; colsample_bytree: [0.8, 1.0]
LightGBM | n_estimators: [100, 300]; learning_rate: [0.01, 0.1]; num_leaves: [31, 50]; max_depth: [-1, 10, 20]; min_child_samples: [20, 50]; subsample: [0.8, 1.0]
CatBoost | iterations: [100, 300]; learning_rate: [0.01, 0.1]; depth: [4, 6, 10]; l2_leaf_reg: [1, 3, 5]
Random Forest | n_estimators: [100, 300]; max_depth: [None, 10, 20]; min_samples_split: [2, 5]; min_samples_leaf: [1, 2]; max_features: ['sqrt', 'log2']
Logistic Regression | penalty: ['l2']; C: [0.01, 0.1, 1.0, 10]; solver: ['lbfgs']; max_iter: [100, 300]
SVM (RBF) | C: [0.1, 1, 10]; kernel: ['linear', 'rbf']; gamma: ['scale', 'auto']
K-Nearest Neighbors | n_neighbors: [3, 5, 7, 9]; weights: ['uniform', 'distance']; metric: ['euclidean', 'manhattan']
TabNet | n_d: [8, 16]; n_a: [8, 16]; n_steps: [3, 5]; gamma: [1.3, 1.5]; lambda_sparse: [1e-3, 1e-4]; optimizer_params: [{'lr': 1e-3}, {'lr': 5e-4}]
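
As an example of how one of these grids would be searched, a GridSearchCV sketch for the XGBoost row is given below. The F1 scoring metric and 5-fold cross-validation are assumptions, since the excerpt does not state the exact search configuration.

```python
# Grid search over the XGBoost grid from the table above.
# scoring="f1" and cv=5 are assumptions, not confirmed settings.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 6, 9],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                      param_grid, scoring="f1", cv=5, n_jobs=-1)
# Usage: search.fit(X_bal, y_bal); search.best_params_, search.best_score_
```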
Model | F1 | ROC-AUC | MCC |
---|---|---|---|
NeuralNetwork (FocalLoss) | 0.62 | 0.87 | 0.53 |
XGBoost | 0.64 | 0.89 | 0.60 |
LightGBM | 0.62 | 0.89 | 0.60 |
CatBoost | 0.60 | 0.90 | 0.57 |
Logistic Regression | 0.62 | 0.88 | 0.51 |
Random Forest | 0.52 | 0.88 | 0.54 |
SVM | 0.58 | 0.85 | 0.48 |
k-Nearest Neighbor | 0.46 | 0.72 | 0.29 |
TabNet | 0.36 | 0.71 | 0.11 |
Voting Classifier (XGB+CAT+RF) | 0.62 | 0.90 | 0.60 |
Metric | Interpretation | Key Use Case |
---|---|---|
Accuracy | Proportion of all predictions that are correct | Baseline measure, but misleading under class imbalance
Precision | Correctly predicted deaths over all predicted deaths | Important for minimizing false positive alerts |
Recall | Correctly predicted deaths over all actual deaths | Critical to avoid missing high-risk patients |
Specificity | Correctly predicted survivals over all actual survivals | Complements recall in evaluating full coverage |
F1-Score | Harmonic mean of precision and recall | Useful when both false positives/negatives matter |
G-Mean | Geometric mean of recall and specificity | Ensures balanced performance across classes |
ROC-AUC | The probability of a death receiving a higher risk score than a survivor | Threshold-independent discrimination measure |
MCC | Correlation between predicted and actual outcomes | Reflects overall reliability in imbalanced data |
Confusion Matrix | Distribution of TP, TN, FP, FN across all predictions | Visual error analysis for clinical decisions |
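
These metrics can be computed from a fitted model's test predictions as in the sketch below; specificity and G-Mean, which scikit-learn does not expose directly, are derived from the confusion matrix.

```python
# Computing the table's metrics for a binary (0 = survived, 1 = died) classifier.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
    recall = recall_score(y_true, y_pred)  # sensitivity
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall,
        "Specificity": specificity,
        "F1": f1_score(y_true, y_pred),
        "G-Mean": np.sqrt(recall * specificity),
        "ROC-AUC": roc_auc_score(y_true, y_score),  # needs scores, not labels
        "MCC": matthews_corrcoef(y_true, y_pred),
    }

# Usage: evaluate(y_test, model.predict(X_test), model.predict_proba(X_test)[:, 1])
```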
Performance without SMOTE:
Model | Accuracy | Precision | F1 Score | ROC-AUC | MCC | Recall | Specificity | G-Mean
---|---|---|---|---|---|---|---|---|
NeuralNetwork (FocalLoss) | 0.9097 | 0.8750 | 0.6000 | 0.8879 | 0.5921 | 0.4565 | 0.9886 | 0.6718 |
CatBoost | 0.9000 | 0.8261 | 0.5507 | 0.8693 | 0.5397 | 0.4130 | 0.9848 | 0.6378 |
SVM | 0.8968 | 0.8182 | 0.5294 | 0.8687 | 0.5208 | 0.3913 | 0.9848 | 0.6208 |
LightGBM | 0.9000 | 0.8000 | 0.5634 | 0.8929 | 0.5429 | 0.4348 | 0.9811 | 0.6531 |
Voting Classifier | 0.9129 | 0.9130 | 0.6087 | 0.8777 | 0.6089 | 0.4565 | 0.9924 | 0.6731 |
Random Forest | 0.8968 | 0.8889 | 0.5000 | 0.8648 | 0.5172 | 0.3478 | 0.9924 | 0.5875 |
XGBoost | 0.9065 | 0.7931 | 0.6133 | 0.8892 | 0.5826 | 0.5000 | 0.9773 | 0.6990 |
Logistic Regression | 0.9032 | 0.7500 | 0.6154 | 0.8821 | 0.5742 | 0.5217 | 0.9697 | 0.7113 |
KNN | 0.8806 | 0.7368 | 0.4308 | 0.7417 | 0.4230 | 0.3043 | 0.9811 | 0.5464 |
TabNet | 0.8290 | 0.3333 | 0.2090 | 0.5820 | 0.1402 | 0.1522 | 0.9470 | 0.3796 |
Performance with SMOTE:
Model | Accuracy | Precision | F1 Score | ROC-AUC | MCC | Recall | Specificity | G-Mean
---|---|---|---|---|---|---|---|---|
NeuralNetwork (FocalLoss) | 0.9065 | 0.7297 | 0.6506 | 0.8878 | 0.6020 | 0.5870 | 0.9621 | 0.7515 |
CatBoost | 0.9194 | 0.8621 | 0.6667 | 0.8883 | 0.6450 | 0.5435 | 0.9848 | 0.7316 |
SVM | 0.9161 | 0.7632 | 0.6905 | 0.8970 | 0.6464 | 0.6304 | 0.9659 | 0.7803 |
LightGBM | 0.9161 | 0.8571 | 0.6486 | 0.8953 | 0.6282 | 0.5217 | 0.9848 | 0.7168 |
Voting Classifier | 0.9129 | 0.8519 | 0.6301 | 0.8905 | 0.6112 | 0.5000 | 0.9848 | 0.7017 |
Random Forest | 0.9097 | 1.0000 | 0.5625 | 0.8838 | 0.5948 | 0.3913 | 1.0000 | 0.6255 |
XGBoost | 0.9000 | 0.7586 | 0.5867 | 0.8854 | 0.5515 | 0.4783 | 0.9735 | 0.6823 |
Logistic Regression | 0.8548 | 0.5079 | 0.5872 | 0.8918 | 0.5108 | 0.6957 | 0.8826 | 0.7836 |
KNN | 0.7774 | 0.3678 | 0.4812 | 0.8179 | 0.3856 | 0.6957 | 0.7917 | 0.7421 |
TabNet | 0.7548 | 0.2917 | 0.3559 | 0.6917 | 0.2217 | 0.4565 | 0.8068 | 0.6069 |
Metric | Without SMOTE (Best Model/Score) | With SMOTE (Best Model/Score) |
---|---|---|
Accuracy | Voting Classifier/0.9129 | CatBoost/0.9194
Precision | Voting Classifier/0.9130 | Random Forest/1.0000 |
F1 Score | Logistic Regression/0.6154 | SVM/0.6905 |
ROC-AUC | LightGBM/0.8929 | SVM/0.8970 |
MCC | Voting Classifier/0.6089 | SVM/0.6464
Recall | Logistic Regression/0.5217 | Logistic Regression or KNN/0.6957 |
Specificity | Voting Classifier or RF/0.9924 | Random Forest/1.0000 |
G-Mean | Logistic Regression/0.7113 | Logistic Regression/0.7836 |
Model (SMOTE) | CPI (SMOTE) | Model (No SMOTE) | CPI (No SMOTE) |
---|---|---|---|
SVM | 0.7342 | Logistic Regression | 0.6688 |
CatBoost | 0.7031 | XGBoost | 0.6652 |
NeuralNetwork (FocalLoss) | 0.7018 | Voting Classifier | 0.6559 |
Logistic Regression | 0.6939 | NeuralNetwork (FocalLoss) | 0.6518 |
LightGBM | 0.6902 | LightGBM | 0.6273 |
Voting Classifier | 0.6755 | CatBoost | 0.6140 |
XGBoost | 0.6456 | SVM | 0.5987 |
Random Forest | 0.6244 | Random Forest | 0.5781 |
KNN | 0.6234 | KNN | 0.5108 |
TabNet | 0.4748 | TabNet | 0.3279 |
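
The excerpt does not state how the composite performance index (CPI) is weighted or normalized. The sketch below is therefore a generic, hypothetical composite index (min-max normalization followed by a weighted average, with equal weights by default) in the spirit of composite-index construction per Greco et al., not the study's exact formula.

```python
# Hypothetical composite-index sketch: min-max normalize each metric column,
# then take a weighted average per model. Equal weights are an assumption.
import numpy as np

def composite_index(metric_matrix, weights=None):
    """metric_matrix: models x metrics array; higher is better for every column."""
    m = np.asarray(metric_matrix, dtype=float)
    lo, hi = m.min(axis=0), m.max(axis=0)
    norm = (m - lo) / np.where(hi > lo, hi - lo, 1.0)  # min-max per metric
    w = (np.full(m.shape[1], 1.0 / m.shape[1]) if weights is None
         else np.asarray(weights, dtype=float))
    return norm @ w  # one composite score per model

# Usage: composite_index([[0.92, 0.76, 0.69], [0.90, 0.86, 0.67]])
```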
Model (SMOTE) | Type I Error | Type II Error | Total Error Rate | Model (No SMOTE) | Type I Error | Type II Error | Total Error Rate |
---|---|---|---|---|---|---|---|
SVM | 0.0341 | 0.3696 | 0.4037 | Logistic Regression | 0.0303 | 0.4783 | 0.5086 |
Logistic Regression | 0.1174 | 0.3043 | 0.4217 | XGBoost | 0.0227 | 0.5000 | 0.5227 |
Neural Network (Focal Loss) | 0.0379 | 0.4130 | 0.4509 | Voting Classifier | 0.0076 | 0.5435 | 0.5511 |
CatBoost | 0.0152 | 0.4565 | 0.4717 | Neural Network (FocalLoss) | 0.0114 | 0.5435 | 0.5549 |
KNN | 0.2083 | 0.3043 | 0.5126 | LightGBM | 0.0189 | 0.5652 | 0.5841 |
LightGBM | 0.0152 | 0.4783 | 0.4935 | CatBoost | 0.0152 | 0.5870 | 0.6022 |
Voting Classifier | 0.0152 | 0.5000 | 0.5152 | SVM | 0.0152 | 0.6087 | 0.6239 |
XGBoost | 0.0265 | 0.5217 | 0.5482 | Random Forest | 0.0076 | 0.6522 | 0.6598 |
Random Forest | 0.0000 | 0.6087 | 0.6087 | KNN | 0.0189 | 0.6957 | 0.7146 |
TabNet | 0.1932 | 0.5435 | 0.7367 | TabNet | 0.0530 | 0.8478 | 0.9008 |
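
The error rates in this table follow directly from the confusion matrix and the metrics reported earlier: for example, SVM with SMOTE gives a Type I error of 1 − 0.9659 = 0.0341 and a Type II error of 1 − 0.6304 = 0.3696, and the total error rate is their sum (0.4037):

```latex
\text{Type I error} = \frac{FP}{FP + TN} = 1 - \text{Specificity}, \qquad
\text{Type II error} = \frac{FN}{FN + TP} = 1 - \text{Recall},
```
```latex
\text{Total error rate} = \text{Type I error} + \text{Type II error}.
```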
Authors | Accuracy | Precision | F1 Score | ROC-AUC | MCC | Specificity | G-Mean |
---|---|---|---|---|---|---|---|
This study | CatBoost (0.9194) | RF (1.000) | SVM (0.6905) | SVM (0.8970) | SVM (0.6464) | RF (1.000) | Logistic Regression (0.7836)
Newaz et al. | XGBoost (0.9134) | RF (1.000) | XGBoost (0.5425) | XGBoost (0.7606) | XGBoost (0.6222) | KNN/RF (1.000) | XGBoost (0.7270)
Aspect | Present Study | Newaz et al. | Diakou et al. | Joshi et al. | Lokeswar Reddy K. & Thangam S. |
---|---|---|---|---|---|
Primary Outcome | Mortality | Mortality | 12 complications | Mortality and cause | MI relapse |
Imputation | DAE + VAE | Mean/mode | MICE | Median | Mean/median |
Feature Selection | RF-SFM | Chi-sq + corr. | None | Mutual info | RFE |
Imbalance Handling | SMOTE + Focal Loss | SMOTE | None | None | None |
Models | 12 ML/DL incl. TabNet, GBM, ensemble | RF, DT, GB, SVM, LR, GNB, KNN | Multi-label | KNN, RF, DT, Ridge, SVC | DT, RF, XGB, SVM, KNN, LR, NB, GB |
Metrics | F1, ROC-AUC, MCC, Rec., Spec., G-Mean, CPI, Type I/II error | Acc., Prec., Rec., F1 | Hamming loss | F1 only | Acc., F1, Rec., Prec.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).