Comment on Iacobescu et al. Evaluating Binary Classifiers for Cardiovascular Disease Prediction: Enhancing Early Diagnostic Capabilities. J. Cardiovasc. Dev. Dis. 2024, 11, 396
Abstract
1. Introduction
2. Unusually High Performance Raises Concerns
3. Data Leakage Through Improper Resampling
4. The kNN = 2 Issue and Overfitting
5. Results After Correcting the Methodology
- Reconstruction (global SMOTE-ENN and min–max scaling, then a 70/30 split): follows the same workflow as [3], with the aim of replicating the reported results as closely as possible;
- Leak-free resampling (split first; SMOTE-ENN and scalers fit only inside training folds): restricts SMOTE-ENN to the training set, keeps the model blind to the test set, and evaluates it on a truly held-out 30% test set;
- Undersampling (split first; RandomUnderSampler within folds): uses the same workflow as [3] but replaces oversampling with undersampling, which avoids both data synthesis and leakage into the test set.
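The difference between the first two workflows can be illustrated with a small, self-contained Python sketch. It is not the code used in [3] or in this comment. It assumes pure-noise features (so no classifier can legitimately beat chance), a 1-nearest-neighbour classifier, and a simplified SMOTE-like interpolation written here as a stand-in for SMOTE-ENN. All function names and parameter values are hypothetical. Balanced accuracy is used so that the two test sets, which differ in class balance, remain comparable.

```python
import math
import random


def oversample_minority(X, y, rng):
    """SMOTE-like stand-in (not SMOTE-ENN): add points interpolated between a
    minority sample and its nearest minority neighbour until classes balance."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_needed = sum(1 for label in y if label == 0) - len(minority)
    X_new, y_new = list(X), list(y)
    for i in range(max(n_needed, 0)):
        base = minority[i % len(minority)]
        neighbour = min((m for m in minority if m is not base),
                        key=lambda m: math.dist(base, m))
        u = rng.random()
        X_new.append([b + u * (nb - b) for b, nb in zip(base, neighbour)])
        y_new.append(1)
    return X_new, y_new


def split(X, y, rng, test_fraction=0.3):
    """Random hold-out split; returns X_train, y_train, X_test, y_test."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def balanced_accuracy_1nn(X_tr, y_tr, X_te, y_te):
    """Mean of per-class recall for a 1-nearest-neighbour classifier."""
    hits, totals = {0: 0, 1: 0}, {0: 0, 1: 0}
    for x, label in zip(X_te, y_te):
        nearest = min(range(len(X_tr)), key=lambda j: math.dist(x, X_tr[j]))
        totals[label] += 1
        hits[label] += int(y_tr[nearest] == label)
    return 0.5 * (hits[0] / totals[0] + hits[1] / totals[1])


def run(seed, n_pos=60, n_neg=240, dims=5):
    rng = random.Random(seed)
    # Features are pure noise, so the labels carry no learnable signal.
    X = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n_pos + n_neg)]
    y = [1] * n_pos + [0] * n_neg

    # Workflow 1 (leaky): resample the whole dataset, then split.
    X_bal, y_bal = oversample_minority(X, y, rng)
    leaky = balanced_accuracy_1nn(*split(X_bal, y_bal, rng))

    # Workflow 2 (leak-free): split first, resample the training part only.
    X_tr, y_tr, X_te, y_te = split(X, y, rng)
    X_tr, y_tr = oversample_minority(X_tr, y_tr, rng)
    clean = balanced_accuracy_1nn(X_tr, y_tr, X_te, y_te)
    return leaky, clean


results = [run(seed) for seed in range(5)]
leaky_mean = sum(r[0] for r in results) / len(results)
clean_mean = sum(r[1] for r in results) / len(results)
print(f"resample, then split: balanced accuracy = {leaky_mean:.2f}")
print(f"split, then resample: balanced accuracy = {clean_mean:.2f}")
```

Because the labels are independent of the features, the leak-free estimate should hover around chance. Any excess in the resample-then-split estimate comes from synthetic points and their parent samples landing on opposite sides of the split.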
6. Other Issues Noted
7. Discussion and Recommendations
- Split Data Early and Properly: Always separate the test set (or use cross-validation folds) before any resampling, normalization, or feature engineering steps. This ensures the model is evaluated on truly unseen data.
- Use Nested Validation for Tuning: Hyperparameter tuning (e.g., GridSearchCV) should be performed within a training fold, with an independent validation mechanism, rather than on the full dataset. This prevents “peeking” at test data during model selection.
- Apply Oversampling Only to Training Data: Techniques like SMOTE should never have knowledge of the entire dataset. Generate synthetic samples after splitting, within the training subset only (and, if using cross-validation, repeat the resampling inside each training fold). This keeps synthetic points derived from test samples out of the training data and keeps synthetic points out of the test set.
- Be Wary of Extreme Metrics: Treat near-100% results with healthy skepticism. Examine whether any feature or preprocessing step could be unintentionally leaking information. When performance looks too good to be true, a closer look often reveals data leakage, label-proxy features, or an overly simplistic dataset.
- Cross-Check Model Complexity: If an automated search selects an unusual hyperparameter (e.g., k = 1 or 2 in kNN, or very deep trees), consider whether it reflects overfitting. Manually compare performance on the validation and training sets. A small k in kNN that yields large accuracy gains is a cue to check the data pipeline for leaks or anomalies. Hyperparameters chosen purely by an optimizer should be interpreted in light of clinical goals, so that the resulting model makes meaningful, generalizable predictions rather than merely maximizing a metric.
- Report Methodology Transparently: Provide clear details on when each preprocessing step was performed relative to splitting. Ambiguity in this can hide leakage. Diagrams are helpful, but they must include these details, not just a high-level pipeline. Transparent reporting allows others to trust and reproduce the findings or catch issues if present.
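Several of the recommendations above can be combined in one short Python sketch: hold out the test set first, fit min–max scaling statistics on training data only, choose k by an inner cross-validation that never sees the test set, and compare training with validation accuracy for k = 1. This is an illustrative sketch on hypothetical toy data (two overlapping Gaussian classes), not the pipeline of [3]. The helper names, the candidate values of k, and the single outer hold-out (rather than a full outer cross-validation loop) are assumptions made for brevity.

```python
import math
import random
from collections import Counter


def fit_minmax(rows):
    """Per-feature minimum and maximum, computed from the given rows only."""
    return [min(col) for col in zip(*rows)], [max(col) for col in zip(*rows)]


def apply_minmax(rows, lo, hi):
    # Values outside the fitted range may fall outside [0, 1]; that is expected.
    return [[(v - a) / (b - a) if b > a else 0.0 for v, a, b in zip(row, lo, hi)]
            for row in rows]


def knn_accuracy(X_fit, y_fit, X_eval, y_eval, k):
    """Fit scaling on X_fit only, then score a k-NN majority vote on X_eval."""
    lo, hi = fit_minmax(X_fit)
    X_fit_s, X_eval_s = apply_minmax(X_fit, lo, hi), apply_minmax(X_eval, lo, hi)
    correct = 0
    for x, label in zip(X_eval_s, y_eval):
        order = sorted(range(len(X_fit_s)), key=lambda j: math.dist(x, X_fit_s[j]))
        votes = Counter(y_fit[j] for j in order[:k])
        correct += int(votes.most_common(1)[0][0] == label)
    return correct / len(y_eval)


def select_k(X, y, candidates, n_folds=5):
    """Inner cross-validation restricted to the training portion."""
    scores = {}
    for k in candidates:
        fold_scores = []
        for f in range(n_folds):
            val_idx = set(range(f, len(X), n_folds))
            X_in = [x for i, x in enumerate(X) if i not in val_idx]
            y_in = [t for i, t in enumerate(y) if i not in val_idx]
            X_val = [X[i] for i in sorted(val_idx)]
            y_val = [y[i] for i in sorted(val_idx)]
            fold_scores.append(knn_accuracy(X_in, y_in, X_val, y_val, k))
        scores[k] = sum(fold_scores) / n_folds
    return max(scores, key=scores.get), scores


# Hypothetical toy data: class 1 is shifted by one unit in both features.
rng = random.Random(0)
y_all = [i % 2 for i in range(200)]
X_all = [[rng.gauss(label, 1.0), rng.gauss(label, 1.0)] for label in y_all]

# 1) Split first: the first 60 samples are set aside and not touched again
#    until the final evaluation.
X_test, y_test = X_all[:60], y_all[:60]
X_train, y_train = X_all[60:], y_all[60:]

# 2) Tune k using the training portion only.
candidates = [1, 3, 5, 7, 9, 15]
best_k, cv_scores = select_k(X_train, y_train, candidates)

# 3) Compare training and validation accuracy for k = 1, then score once on
#    the held-out test set with the selected k.
train_acc_k1 = knn_accuracy(X_train, y_train, X_train, y_train, 1)
test_acc = knn_accuracy(X_train, y_train, X_test, y_test, best_k)

print(f"k=1 training accuracy:         {train_acc_k1:.2f}")
print(f"k=1 cross-validated accuracy:  {cv_scores[1]:.2f}")
print(f"selected k = {best_k}, held-out test accuracy = {test_acc:.2f}")
```

With k = 1, every training point is its own nearest neighbour, so training accuracy is perfect by construction. The gap between that figure and the cross-validated one is the kind of signal the complexity check above is meant to surface.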
Author Contributions
Data Availability Statement
Conflicts of Interest
References
1. Dey, D.; Slomka, P.J.; Leeson, P.; Comaniciu, D.; Shrestha, S.; Sengupta, P.P.; Marwick, T.H. Artificial Intelligence in Cardiovascular Imaging. J. Am. Coll. Cardiol. 2019, 73, 1317–1335.
2. Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T.; et al. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057.
3. Iacobescu, P.; Marina, V.; Anghel, C.; Anghele, A.-D. Evaluating Binary Classifiers for Cardiovascular Disease Prediction: Enhancing Early Diagnostic Capabilities. J. Cardiovasc. Dev. Dis. 2024, 11, 396.
4. Van Calster, B.; Nieboer, D.; Vergouwe, Y.; De Cock, B.; Pencina, M.J.; Steyerberg, E.W. A calibration hierarchy for risk models was defined: From utopia to empirical data. J. Clin. Epidemiol. 2016, 74, 167–176.
5. Ioannidis, J.P.A. Why Most Published Research Findings Are False. PLoS Med. 2005, 2, e124.
6. Alturayeif, N.; Hassine, J. Data leakage detection in machine learning code: Transfer learning, active learning, or low-shot prompting? PeerJ Comput. Sci. 2025, 11, e2730.
7. Yagis, E.; Atnafu, S.W.; de Herrera, A.G.S.; Marzi, C.; Scheda, R.; Giannelli, M.; Tessa, C.; Citi, L.; Diciotti, S. Effect of data leakage in brain MRI classification using 2D convolutional neural networks. Sci. Rep. 2021, 11, 22544.
8. Rosenblatt, M.; Tejavibulya, L.; Jiang, R.; Noble, S.; Scheinost, D. Data leakage inflates prediction performance in connectome-based machine learning models. Nat. Commun. 2024, 15, 1829.
9. Demircioğlu, A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci. Rep. 2024, 14, 11563.
10. Maleki, F.; Ovens, K.; Gupta, R.; Reinhold, C.; Spatz, A.; Forghani, R. Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological Pitfalls. Radiol. Artif. Intell. 2023, 5, e220028.
11. Kapoor, S.; Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 2023, 4, 100804.
12. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
13. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009; Volume xxii, 745p.
14. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
15. Terven, J.; Cordova-Esparza, D.-M.; Romero-González, J.-A.; Ramírez-Pedraza, A.; Chávez-Urbiola, E.A. A comprehensive survey of loss functions and metrics in deep learning. Artif. Intell. Rev. 2025, 58, 195.
16. Krstajic, D.; Buturovic, L.J.; Leahy, D.E.; Thomas, S. Cross-validation pitfalls when selecting and assessing regression and classification models. J. Cheminform. 2014, 6, 10.
17. Sasse, L.; Nicolaisen-Sobesky, E.; Dukart, J.; Eickhoff, S.B.; Götz, M.; Hamdan, S.; Komeyer, V.; Kulkarni, A.; Lahnakoski, J.M.; Love, B.C.; et al. Overview of leakage scenarios in supervised machine learning. J. Big Data 2025, 12, 135.
18. Sasse, L.; Nicolaisen-Sobesky, E.; Dukart, J.; Eickhoff, S.B.; Götz, M.; Hamdan, S.; Komeyer, V.; Kulkarni, A.; Lahnakoski, J.; Love, B.C.; et al. On Leakage in Machine Learning Pipelines. arXiv 2024, arXiv:2311.04179.
19. Lones, M.A. How to avoid machine learning pitfalls: A guide for academic researchers. arXiv 2021, arXiv:2108.02497.
| Model (accuracy) | Iacobescu et al. [3] | Reconstruction (1) | Proper Sampling (2) | Undersampling (3) |
|---|---|---|---|---|
| Random Forest | 0.94 | 0.95 | 0.84 | 0.75 |
| Log. Reg. | 0.88 | 0.85 | 0.68 | 0.76 |
| kNN | 0.99 | 0.92 | 0.77 | 0.66 |
| Grad. Boosting | 0.95 | 0.95 | 0.87 | 0.74 |
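The gap between the leaky reconstruction (1) and the leak-free workflow (2) can be read directly off the table. The short Python sketch below only performs that subtraction on the tabulated accuracies. The dictionary layout and key names are illustrative, and no values beyond those in the table are introduced.

```python
# Accuracies copied from the table: (reconstruction, leak-free resampling).
accuracy = {
    "Random Forest": (0.95, 0.84),
    "Log. Reg.": (0.85, 0.68),
    "kNN": (0.92, 0.77),
    "Grad. Boosting": (0.95, 0.87),
}

# Accuracy lost once resampling and scaling are confined to the training data.
gaps = {model: round(recon - leak_free, 2)
        for model, (recon, leak_free) in accuracy.items()}

for model, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{model:<15} {gap:.2f}")
```

On these figures the drop ranges from 0.08 (gradient boosting) to 0.17 (logistic regression) in absolute accuracy.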
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Eltawil, M.; Byham-Gray, L.; Jia, Y.; Mistry, N.; Parrott, J.; Gohel, S. Comment on Iacobescu et al. Evaluating Binary Classifiers for Cardiovascular Disease Prediction: Enhancing Early Diagnostic Capabilities. J. Cardiovasc. Dev. Dis. 2024, 11, 396. J. Cardiovasc. Dev. Dis. 2026, 13, 46. https://doi.org/10.3390/jcdd13010046

