Article

Comparison of Ensemble and Meta-Ensemble Models for Early Risk Prediction of Acute Myocardial Infarction

by Daniel Cristóbal Andrade-Girón 1, Juana Sandivar-Rosas 2, William Joel Marin-Rodriguez 3,*, Marcelo Gumercindo Zúñiga-Rojas 4, Abrahán Cesar Neri-Ayala 5 and Ernesto Díaz-Ronceros 3

1 Department of Formal and Natural Sciences, Universidad Nacional José Faustino Sánchez Carrión, Lima 15136, Peru
2 Department of Engineering, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
3 Department of Engineering Systems, Computer and Electronics, Universidad Nacional José Faustino Sánchez Carrión, Lima 15136, Peru
4 Department of Sociology, Universidad Nacional José Faustino Sánchez Carrión, Lima 15136, Peru
5 Department of Administration and Management, Universidad Nacional José Faustino Sánchez Carrión, Lima 15136, Peru
* Author to whom correspondence should be addressed.
Informatics 2025, 12(4), 109; https://doi.org/10.3390/informatics12040109
Submission received: 18 July 2025 / Revised: 5 September 2025 / Accepted: 8 October 2025 / Published: 11 October 2025

Abstract

Cardiovascular disease (CVD) is a major cause of mortality worldwide, underscoring the critical need for effective predictive tools to inform clinical decision-making. This study compared the predictive performance of ensemble learning algorithms, including Bagging, Random Forest, Extra Trees, Gradient Boosting, and AdaBoost, applied to a clinical dataset of patients with CVD. The methodology entailed data preprocessing and cross-validation to assess generalization. Model performance was evaluated using a variety of metrics, including accuracy, F1 score, precision, recall, Cohen’s Kappa, and area under the curve (AUC). Among the models evaluated, Bagging demonstrated the best overall performance (accuracy ± SD: 93.36% ± 0.22; F1 score: 0.936; AUC: 0.9686). It also reached the lowest average rank (1.0) in the Friedman test and was placed, together with Extra Trees (accuracy ± SD: 90.76% ± 0.18; F1 score: 0.916; AUC: 0.9689), in the superior statistical group (group A) according to the Nemenyi post hoc test. Both models demonstrated a high degree of agreement with the actual labels (Kappa: 0.87 and 0.83, respectively), substantiating their reliability in authentic clinical contexts. The findings substantiated the superiority of aggregation-based ensemble methods in terms of accuracy, stability, and concordance, and positioned Bagging and Extra Trees as optimal candidates for cardiovascular diagnostic support systems, where reliability and generalization are paramount.

1. Introduction

Cardiovascular disease (CVD) has been identified as the leading cause of mortality on a global scale [1,2]. According to the World Health Organization (WHO), approximately 17.9 million people died from CVD in 2019, representing 32% of all global deaths [3]. Of the aforementioned mortalities, 85% were attributed to acute cardiovascular events, primarily acute myocardial infarction (AMI) and cerebrovascular accidents (CVAs) [2]. A substantial proportion of the global burden of CVD is concentrated in low- and middle-income countries (LMICs) [4]. In these regions, structural limitations in health systems, including deficiencies in hospital infrastructure, specialized human resources, and access to emergency services, serve as significant barriers to the delivery of timely, effective, and continuous care. This, in turn, results in elevated levels of morbidity and mortality [5]. This structural scenario is exacerbated by an unfavorable epidemiological transition characterized by sustained increases in modifiable cardiovascular risk factors, such as obesity, type 2 diabetes mellitus, hypertension, and a sedentary lifestyle [6]. The interplay of these clinical and social factors contributes to an escalation in disease burden, leading to an increase in CVD-attributable morbidity and mortality [7]. This upward trend is indicative of a growing challenge for health systems and has profound long-term clinical and epidemiological implications [8]. The most recent global estimates project that, in the absence of effective primary prevention interventions and strengthened primary care, the number of CVD-attributable deaths could exceed 23.6 million per year by the year 2030 [3,9].
The management of CVD represents a multifaceted challenge, not only in terms of its clinical complexity but also in terms of the growing economic burden it poses on health systems worldwide [10]. The most recent estimates project that the total cost associated with CVD-linked treatment, rehabilitation, and lost productivity could reach US$1.1 trillion by 2035, a substantial increase from the US$555 billion reported in 2015 [11]. The observed exponential increase in economic impact is attributable to two factors: population aging and the proliferation of cardiovascular risk factors that are not adequately managed. This underscores the pressing need for cost-effective prevention strategies and predictive models that facilitate early intervention [12].
Preventive measures and the early diagnosis of CVD are of paramount importance to mitigate its impact on overall morbidity and mortality [13]. The majority of cases of CVD can be prevented by interventions that target risk factors [14,15]. Concurrently, the capacity for early detection enables the implementation of optimal clinical management, thereby facilitating the timely execution of therapeutic and pharmacological strategies. These strategies have been shown to improve patient prognosis and quality of life [16].
In the context of early cardiovascular risk assessment, biomarkers play a pivotal role by facilitating early detection of subclinical cardiac dysfunction and enabling precise risk stratification for adverse cardiovascular events [17]. Cardiac troponins (cTnI and cTnT) are among the most widely used markers. They are recognized as the most sensitive and specific markers for detecting myocardial necrosis [18]. This makes them essential tools for the timely diagnosis of AMI [19,20]. Conversely, B-type natriuretic peptide (BNP) and its N-terminal inactive fragment (NT-proBNP) serve as pivotal biomarkers in the functional evaluation of the left ventricle and the categorization of heart failure (HF), both with reduced and preserved ejection fraction [21]. Additionally, high-sensitivity C-reactive protein (hs-CRP) has been extensively validated as a marker of systemic inflammation, with significant clinical implications in the pathophysiology of atherosclerosis and its association with an elevated risk of major cardiovascular events, including myocardial infarction (MI) and stroke [22].
In addition to serum biomarkers, the lipid profile is a critical component in the comprehensive assessment of cardiovascular risk [23]. This profile includes the quantification of lipoproteins, such as low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol, and triglycerides. Altered levels of these substances have been robustly associated with the progression of atherosclerosis and the development of coronary heart disease [24]. Early detection and appropriate management of these dyslipidemias allow the implementation of primary and secondary prevention strategies aimed at reducing the burden of atherothrombotic cardiovascular events [25]. Conversely, noninvasive cardiovascular imaging techniques serve as essential diagnostic tools for the structural and functional detection of heart disease in its early stages [26]. The electrocardiogram (ECG) is a diagnostic tool that utilizes electrical signals to identify abnormalities in the heart’s electrical activity, thereby facilitating the diagnosis of arrhythmias and myocardial ischemia [27]. Echocardiography is a pivotal diagnostic modality, providing insights into myocardial structure and function. It plays a crucial role in the evaluation of HF and valvular heart disease [28]. Coronary angiotomography (CTA) offers a noninvasive assessment of atherosclerosis, while cardiac magnetic resonance (CMR) provides detailed information on myocardial viability, inflammation, and fibrosis [29]. Conversely, functional and stress tests, including stress testing, stress echocardiography, and myocardial perfusion by single photon emission computed tomography (SPECT) and positron emission tomography (PET), facilitate precise evaluation of coronary ischemia and perfusion [30,31].
However, despite their high diagnostic accuracy, many of these techniques are expensive and require specialized infrastructure, which limits their implementation in low-resource countries [2,12]. In light of this challenge, predictive models founded upon machine learning (ML) have materialized as a compelling solution for the expeditious diagnosis of CVD [1,32]. These models have demonstrated a superior capacity for risk stratification and prediction of cardiovascular events, allowing for more efficient identification of high-risk patients and optimization of clinical decision-making [33].
Research on ML algorithms applied to cardiovascular problems has gained significant relevance in recent years [34,35]. Nonetheless, the implementation of an ML model in clinical prediction, such as the early diagnosis of AMI, presents substantial technical limitations that compromise its applicability in a real-world setting [36,37]. Problems such as overfitting, convergence to local optima, sensitivity to noise, and instability to slight variations in the data affect its generalizability and robustness [38,39]. These limitations are further exacerbated when dealing with complex, multivariate, and frequently imbalanced clinical data, as is often the case in cardiology [40]. In response to the aforementioned limitations, the utilization of ensemble methods has emerged as a superior methodological strategy. Ensemble methods integrate multiple models to reduce bias and variance, thereby enhancing the stability of the predictive system [36,37]. Algorithms such as Bagging, Boosting, and Stacking have been demonstrated to leverage the strengths of diverse classifiers, thereby generating more accurate, interpretable, and clinically useful predictions [41]. Despite their proven effectiveness, the application of ensemble architectures in predicting MI remains scarce in the literature [42], revealing a knowledge gap with direct implications for the design of intelligent medical decision support systems. This study addresses this gap by proposing a comprehensive approach that combines methodological rigor and clinical applicability to contribute to the development of decision-support tools in cardiovascular medicine.
In this context, this study has two objectives: first, to develop ML ensemble architectures, and second, to compare these architectures for the prediction of AMI. This approach aims to optimize diagnostic precision while enhancing the resilience and transferability of models within authentic clinical contexts. In such settings, patient heterogeneity, imbalanced data, and variability in the clinical manifestation of the disease present substantial obstacles to the efficacy of conventional models.

2. Literature Review

A thorough examination of the most pertinent literature concerning the prediction of MI was conducted, with a particular emphasis on the utilization of ML algorithms. In this context, Liu et al. [43] developed supervised ML models for predicting MI. They used recursive feature elimination as a selection algorithm and a hierarchical modeling approach. The models evaluated included random forest, gradient boosting decision tree (GBDT), logistic regression, and support vector machine (SVM). GBDT was determined to be the optimal model. In a similar vein, Wang et al. [44] developed three ML models to predict the occurrence of tachyarrhythmias following AMI. Following the implementation of a variable selection process, the artificial neural network (ANN) demonstrated the highest degree of accuracy, thereby exhibiting superior performance in comparison to the model that was based on the GRACE score variables. Sharma and Sunkaria [45] proposed a technique based on the stationary wavelet transform to analyze multiderivative ECG signals, extracting discriminatory features. They then used SVM and k-nearest neighbor (KNN) algorithms to classify patients with MI and controls, using the PTB-DB database. In the class-oriented approach, the SVM attained an accuracy of 98.84% (area under the curve (AUC) = 0.9994), while the KNN algorithm achieved 98.69% accuracy (AUC = 0.9945).
In addition, Oliveira et al. [46] examined data from patients with a primary diagnosis of AMI in a Portuguese hospital (2013–2015) to predict mortality using ML algorithms. A total of three experiments were conducted, with the number and type of variables included in each experiment being systematically varied. In Experiment 1, stochastic gradient descent achieved an accuracy of 80%, a recall of 77%, and an AUC of 79%. In Experiment 2, the incorporation of novel variables enhanced the performance of the SVM, with an AUC of 81%. In Experiment 3, the integration of feature selection and the Synthetic Minority Over-sampling Technique (SMOTE) enhanced the model’s performance, attaining an AUC of 88% and a recall of 80%. The results obtained demonstrate the significance of variable selection and data balancing in the prediction of AMI mortality.
Conversely, Li et al. [47] developed an early warning model for HF in patients with AMI using ML techniques. In Cohort 1 (2018–2019), patients with and without HF were included, and seven algorithms were evaluated using features selected from routine testing. Cohort 2 (2020–2021) was utilized for the external validation of the model. Among the models evaluated, XGBoost demonstrated the most optimal performance, identifying troponin I, triglycerides, urine red blood cell count, γ-glutamyl transpeptidase, glucose, urine density, prothrombin time, prealbumin, and urea as key features. The HF-Lab9 model, which is based on XGBoost, achieved an AUC of 0.966 and demonstrated a high clinical benefit according to decision curve analysis, thereby highlighting its potential in the early prediction of post-AMI HF.
In a systematic review, Cho et al. [48] reported that of 7348 articles identified, 112 were reviewed in full text, resulting in a final set of 24 studies with data from 374,365 patients. The ML models evaluated included neural networks (n = 12), random forests (n = 11), decision trees (n = 8), SVMs (n = 8), and Bayesian approaches (n = 7). In comparison, conventional statistical methods (CSMs) employed logistic regression (n = 19), CSM-derived risk scores (n = 12), and Cox regression (n = 2). Of the 19 studies that analyzed mortality, 13 reported higher C statistics with ML. A total of 29 comparisons were conducted between ML and CSM, with absolute differences of less than 0.05 in 90% of cases.
Barker et al. [49] employed a comparable methodology in their systematic search of 10 databases, resulting in the identification of 4356 studies. Of these, 11 met the predetermined inclusion criteria. The ML models encompassed between 4 and 72 variables, exhibiting an AUC ranging from 0.71 to 0.96. In five of six comparative studies, ML demonstrated superior performance in comparison to regression analysis. However, the study revealed that none of the included studies satisfied the established reporting standards, and five studies were found to be at high risk of bias.
Despite recent advancements in the application of individual ML models for cardiovascular event prediction, these approaches have critical limitations, including risk of bias, low interpretability, limited generalizability [50], and propensity to converge to local optima [51]. These shortcomings compromise their effectiveness in real clinical settings, where data heterogeneity and the need for reliable decisions demand robust, stable, and explainable models [52]. In the context of large-scale datasets, high dimensionality, and clinically heterogeneous populations, the challenges associated with data analysis are exacerbated. The structural complexity and inherent variability of these data increase the risk of overfitting and compromise the stability and generalizability of models [33]. In this context, and as a response to the limitations of individual models, the development and comparative evaluation of ensemble architectures in ML is proposed. The utilization of these techniques facilitates the mitigation of overfitting, enhances learning stability, and fortifies generalization capability by amalgamating multiple base estimators that discern disparate patterns inherent in intricate and heterogeneous clinical data [38].
In this context, this study aims to address the existing gap in the early prediction of AMI by developing, benchmarking, and validating ML models based on ensemble techniques. This methodological approach is designed to enhance the stability, interpretability, and generalizability of predictive models. These attributes are fundamental for ensuring the successful implementation of predictive models in actual clinical settings in the future.
The objective of this study is to enhance the performance of multiple base algorithms by employing robust aggregation methods, such as Bagging, Boosting, AdaBoost, and Extra Trees. This approach aims to mitigate the limitations inherent in individual models, particularly with regard to performance variability, overfitting, and sensitivity to noise. Consequently, this study contributes to the development of more accurate, reliable, and clinically relevant predictive tools that can aid medical decision-making in the timely identification of patients at risk of acute coronary events.

3. Methodology

The development of the model followed the standard methodology for designing ML systems, as outlined below.

3.1. Data Source

A dataset extracted from the Kaggle repository (https://www.kaggle.com/datasets/ankushpanday2/heart-attack-risk-dataset-of-china, accessed 10 February 2024) was utilized in this research. The dataset comprised 239,266 records and 28 variables. This dataset provides a substantial foundation for developing a model that forecasts the likelihood of AMI in the Chinese population. The model integrates critical clinical variables, such as age, sex, smoking, blood pressure, cholesterol levels, and cardiovascular history, with pertinent contextual determinants. These determinants encompass regional disparities in health service access, urban–rural differences in habits, and environmental exposure to air pollutants.

3.2. Data Preprocessing

A preliminary exploratory analysis of the database was conducted using descriptive statistical functions to identify missing values, inconsistencies in data types, and anomalies in the distribution of variables. To ensure structural consistency, the data underwent integration by means of merging and concatenation techniques, along with standardization of variable names and formats.
The management of missing values was addressed through a combination of elimination and imputation strategies, with the mean, median, or mode applied depending on the variable type (continuous or categorical). Z-score analysis was employed to identify outliers and retain only clinically plausible observations. In addition, duplicate observations were removed with the drop_duplicates() function to avoid redundancy and bias in model training.
Categorical variables were transformed using one-hot encoding, with the implementation of the OneHotEncoder and pd.get_dummies() functions to ensure adequate representation for ML algorithms. Subsequently, variable scaling and transformation techniques were applied to improve the homogeneity of the distributions and the numerical stability of the model. Standardization with StandardScaler was employed to achieve these objectives.
The following preprocessing stages were foundational to optimizing the performance of the predictive models in the classification of clinical risk and the identification of complications associated with AMI.
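A minimal sketch of the pipeline just described, assuming a pandas DataFrame loaded from the downloaded Kaggle CSV; the file name and the target column name heart_attack are hypothetical placeholders, and the target is assumed to be a numeric binary label:

```python
import pandas as pd
from scipy import stats
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("heart_attack_risk_china.csv")  # hypothetical file name

# Impute missing values: median for numeric, mode for categorical variables.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# Remove duplicate records to avoid redundancy and training bias.
df = df.drop_duplicates()

# Z-score filtering: retain only numeric observations within |z| < 3.
num_cols = df.select_dtypes(include="number").columns
z_scores = df[num_cols].apply(stats.zscore)
df = df[(z_scores.abs() < 3).all(axis=1)]

# One-hot encode categorical variables, dropping the first level
# of each to avoid perfect collinearity.
df = pd.get_dummies(df, drop_first=True)

# Standardize features to zero mean and unit variance.
y = df["heart_attack"]                 # hypothetical binary target column
X = df.drop(columns=["heart_attack"])
X_scaled = StandardScaler().fit_transform(X)
```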

3.3. Selection of Learning Algorithm

The selection of ensemble models for the prediction of complications of AMI is based on their ability to improve model accuracy and robustness by combining multiple base estimators. This strategy is advantageous in clinical scenarios where data present high variability and nonlinear relationships between patient characteristics and clinical outcomes.

3.3.1. AdaBoost Classifier

The Adaptive Boosting (AdaBoost) algorithm is a boosting-based ensemble method that combines multiple weak classifiers to construct a robust and accurate model [53]. The operation of AdaBoost is predicated on iteratively assigning weights to the observations in the training set [54]. In each iteration, the instances that were misclassified by the previous classifier are weighted more heavily. This process forces the subsequent classifier to concentrate on the errors made [55]. The contribution of each classifier to the final ensemble is determined by a weighting coefficient inversely proportional to its error rate, ensuring a greater influence of the more accurate classifiers on the final decision [53].
In this study, the implementation of AdaBoost Classifier (from the sklearn.ensemble library) led to enhanced detection of relevant clinical patterns, achieved through more effective differentiation between patients at low and high risk for cardiovascular adverse events. This algorithm has demonstrated notable efficacy in identifying nonlinear relationships and synergies among biomedical variables (e.g., biomarkers, clinical history, vital signs), even in the presence of noisy or partially collinear data.
Furthermore, the adaptive nature of AdaBoost offers greater resistance to overfitting compared to more complex models, particularly when used with simple base classifiers, such as decision stumps. This property is of particular relevance in clinical settings, where the generalization of the model to new patients is critical to the practical utility of the medical decision support system.
$$F_M(x) = \sum_{m=1}^{M} \alpha_m h_m(x),$$
where $h_m(x)$ are the base estimators (commonly weak learners such as shallow decision trees), and $\alpha_m$ represents their weighting coefficients.
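A minimal scikit-learn sketch of this configuration follows; the hold-out split, hyperparameter values, and the estimator keyword (scikit-learn ≥ 1.2; older releases use base_estimator) are assumptions rather than the study’s exact settings:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold-out split of the preprocessed data (X_scaled, y) from Section 3.2.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42)

# Decision stumps (depth-1 trees) as weak base classifiers.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,     # illustrative value
    learning_rate=0.5,    # illustrative value
    random_state=42,
)
ada.fit(X_train, y_train)
```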

3.3.2. Gradient Boosting Classifier

The Gradient Boosting Classifier algorithm is a supervised learning method based on boosting, which builds an additive model by progressively minimizing a loss function using downward gradients in the functional space [56]. The architecture of the model is predicated on the sequential combination of weak base classifiers. In this regard, each new estimator is adjusted to correct the residual errors of the preceding models. This process is repeated iteratively to improve the model’s accuracy [57].
In the context of AMI complication prediction, the Gradient Boosting Classifier optimizes the identification of complex and nonlinear clinical patterns, facilitating better risk stratification of patients [58]. Its capacity to manage substantial quantities of heterogeneous data, in conjunction with its resilience to overfitting, positions it as a reliable instrument for clinical decision-making, underpinned by sophisticated predictive models [59].
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x),$$
where $\gamma_m$ is the learning coefficient obtained by minimizing
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\bigl(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\bigr).$$
In contrast to AdaBoost, which adjusts sample weights according to their error, Gradient Boosting builds a sequential model by fitting each new estimator to the residual gradients. This formulation permits flexibility in the choice of loss function, such as squared error for regression or logistic (deviance) loss for classification.
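A corresponding configuration sketch, with illustrative hyperparameters rather than the study’s tuned values:

```python
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    n_estimators=200,    # illustrative value
    learning_rate=0.1,   # shrinkage applied to each gamma_m step
    max_depth=3,         # shallow trees as sequential base learners
    random_state=42,
)
gb.fit(X_train, y_train)
```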

3.3.3. Random Forest Classifier

The Random Forest Classifier algorithm, an ensemble method based on decision trees that combines multiple classifiers built on random subsets of data and features, was used to improve accuracy, reduce variance, and minimize overfitting [60]. The implementation of the model, which was carried out using the sklearn.ensemble library, incorporated hyperparameter adjustment through grid search and stratified cross-validation.
In the context of predicting complications of AMI, Random Forest allowed modeling nonlinear and complex relationships between clinical variables and biomarkers, with high robustness to outliers and good tolerance to high dimensionality. Furthermore, the model provided measures of variable importance, thereby facilitating clinical interpretation and enhancing its applicability in medical decision support environments.
$$F(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x),$$
where $h_t(x)$ are individual decision trees trained with bootstrap sampling.
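A sketch of the grid search with stratified cross-validation described above; the parameter grid and scoring choice are illustrative assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {                      # illustrative grid
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2"],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid, cv=cv, scoring="f1", n_jobs=-1,
)
search.fit(X_train, y_train)
rf = search.best_estimator_   # tuned model used in later evaluation
```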

3.3.4. Extra Trees Classifier

The Extra Trees Classifier (Extremely Randomized Trees) is a variant of the Random Forest algorithm that incorporates a higher degree of randomization during the construction of decision trees [61]. In contrast to the Random Forest method, which determines optimal splitting thresholds based on impurity criteria, Extra Trees allocates these thresholds randomly within the subset of features selected for each node [62]. This strategy fosters increased diversity among the ensemble trees, thereby leading to a reduction in model variance and enhancing its stability against noise and overfitting.
In this study, the Extra Trees Classifier from the sklearn.ensemble library was utilized as part of the set of models evaluated for the prediction of complications of AMI. The architecture’s high degree of randomization enabled the capture of nonlinear interactions and complex relationships between clinical variables and biomarkers, even in the presence of collinearity or skewed distributions.
This approach has proven to be particularly effective in high-dimensional contexts, demonstrating a notable capacity to manage substantial volumes of data while maintaining optimal computational performance. In addition, Extra Trees offers measures of variable significance that facilitate clinical interpretation and the identification of key predictors in cardiovascular risk stratification, as is the case with other tree-based models.
$$F(x) = \frac{1}{B} \sum_{b=1}^{B} h_b(x),$$
where $h_b(x)$ represents the prediction of tree $b$. This strategy introduces a higher bias compared to Random Forest, but reduces the variance and correlation between trees, improving the generalization capability.
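A brief sketch with an illustrative ensemble size, showing how the impurity-based variable importances mentioned above could be inspected (assuming X retains its column labels from the preprocessing step):

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

et = ExtraTreesClassifier(n_estimators=300, random_state=42)  # illustrative size
et.fit(X_train, y_train)

# Impurity-based variable importance for clinical interpretation.
importances = pd.Series(et.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```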

3.3.5. Bagging Classifier

The Bagging Classifier algorithm is an ensemble method that enhances the stability and accuracy of base models by reducing variance through training on multiple subsets of data generated by sampling with replacement (bootstrap) [63]. Each base classifier $h_b(x)$ is independently calibrated on a distinct bootstrap sample, and the final ensemble prediction is obtained by majority voting. This approach has been demonstrated to mitigate overfitting and enhance model generalization [64].
In the context of predicting complications of AMI, Bagging Classifier optimizes the model’s ability to handle clinical data with high variability. This allows for better identification of risk factors and greater robustness in the classification of patients according to their clinical evolution:
$$\hat{y} = \frac{1}{B} \sum_{b=1}^{B} h_b(x),$$
where each $h_b(x)$ is a base model trained on a random subset of the data. The following relationship elucidates the impact of Bagging on variance reduction:
$$\mathrm{Var}(\hat{y}) = \frac{1}{B}\,\mathrm{Var}(h_b) + \left(1 - \frac{1}{B}\right)\mathrm{Cov}(h_b, h_{b'}),$$
where the decrease in variance is greater when the models $h_b$ are less correlated. Bagging thus improves model stability and generalizability, reducing overfitting without significantly increasing bias.
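A minimal configuration sketch with illustrative values; bootstrap=True makes explicit the sampling with replacement described above, and the estimator keyword again assumes scikit-learn ≥ 1.2:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # full-depth trees as base models
    n_estimators=200,                    # illustrative value
    bootstrap=True,                      # sampling with replacement
    random_state=42,
)
bag.fit(X_train, y_train)
```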

3.4. Performance Metrics

The performance of the model is evaluated through performance metrics derived from the confusion matrix, a fundamental tool for assessing classification models [65]. The confusion matrix is a statistical tool that summarizes the prediction results by comparing the actual labels with the predictions generated by the model. It provides detailed insight into the model’s classification capability [66].
Given the inherent fallibility of any classification system, prediction errors may take two primary forms: false positives and false negatives. To comprehend these errors, it is necessary to define four key metrics within the confusion matrix:
  • True positives (TP): These occur in cases where the model correctly predicts an observation belonging to the positive class.
  • False negatives (FN): These occur in situations where the observation belongs to the positive class, but the model misclassifies it as negative.
  • False positives (FP): These occur when an observation of the negative class is incorrectly classified as positive.
  • True negatives (TN): These occur in cases where the model correctly predicts an observation as belonging to the negative class.
It is possible to calculate fundamental performance metrics from these values, including the following:
Accuracy: The proportion of correct predictions over the total number of observations, reflecting the model’s capacity to categorize both emergency and nonemergency cases. Nevertheless, in medical problems where FN can have critical consequences, accuracy alone is insufficient for assessing model performance, since a model can achieve a high accuracy rating even while failing to detect emergencies. It is defined as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision: The degree to which the model’s positive predictions are correct, thereby reducing the impact of FP. For instance, if a model categorizes nonemergency cases as emergencies, it burdens the hospital’s Emergency Department (ED): resources are diverted to patients who, on subsequent diagnostic evaluation, do not meet the criteria for emergency treatment, and the provision of adequate care in actual emergency scenarios may be compromised.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Sensitivity (Recall): This measures the model’s capacity to detect observations belonging to the positive class. Consider a patient who arrives at the hospital with severe HF, a condition that constitutes a medical emergency, and the model is used to ascertain whether the case is an emergency. If the model misclassifies the patient as a nonemergency case, an FN is generated, which can significantly compromise the patient’s life. Given the critical consequences of this type of error, the false negative rate (FNR) is the most relevant parameter, and recall is the primary metric for evaluation, since it reflects the model’s ability to correctly identify positive cases and minimize FN.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
Specificity: This indicates the model’s capacity to accurately identify observations in the negative class, that is, patients who do not present a medical emergency. In the context of a diagnosis of severe HF, a TN occurs when the model correctly classifies a patient with no critical condition, whereas an FP results in an erroneous classification as an emergency, leading to unnecessary interventions and inefficient resource utilization. Specificity is defined as the proportion of TN over the total number of negative cases and assesses the model’s efficacy in ruling out emergency scenarios.
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
F1 score: The harmonic mean between precision and sensitivity, a useful metric in the context of unbalanced datasets. It integrates the two metrics previously mentioned into a single value, facilitating a more comprehensive assessment of model performance. Although recall and precision contribute equally to the F1 score in the formula, recall carries greater clinical weight in this setting; therefore, when conducting a final evaluation, the recall metric should be prioritized as the primary criterion.
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The evaluation of classification model performance is imperative for the optimization of its predictive ability. This process entails the calibration of hyperparameters and the refinement of data preprocessing techniques. In addition to conventional metrics, Cohen’s Kappa coefficient is employed as a statistical measure to quantify the agreement between two classifiers by adjusting the observed agreement based on the expected match by chance [67].
The Kappa coefficient provides a more rigorous assessment than standard accuracy, which measures only the proportion of hits, because it corrects for random agreement, thus allowing a more robust interpretation of model reliability [68]. Mathematically, $\kappa$ is defined in terms of the proportion of observed agreement ($p_0$) and the proportion of agreement expected by chance ($p_e$). The value of $p_0$ is obtained by dividing the sum of the frequencies on the main diagonal of the confusion matrix by the total number of observations, while $p_e$ is calculated from the marginal probabilities of each category. Its mathematical formulation is expressed as follows:
$$\kappa = \frac{p_0 - p_e}{1 - p_e}$$
The interpretation of $\kappa$ follows a conventional scale: $\kappa \le 0.20$ indicates poor agreement; $0.21$–$0.40$, fair; $0.41$–$0.60$, moderate; $0.61$–$0.80$, good; and $0.81$–$1.00$, very good. Values approaching 1 indicate substantial agreement among raters, whereas values near 0 indicate that the model fails to surpass the level of chance agreement. This metric is advantageous for evaluating the consistency of ML models and validating classification systems in environments characterized by ambiguous or imbalanced data, offering a robust measure of reliability in classification tasks.
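A compact sketch of how these confusion-matrix metrics and Cohen’s Kappa could be computed with scikit-learn, assuming the fitted Bagging model and hold-out split from the earlier sketches (specificity is derived manually, as scikit-learn does not expose it directly):

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score,
                             precision_score, recall_score)

y_pred = bag.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("accuracy:   ", accuracy_score(y_test, y_pred))
print("precision:  ", precision_score(y_test, y_pred))
print("recall:     ", recall_score(y_test, y_pred))
print("specificity:", tn / (tn + fp))
print("F1:         ", f1_score(y_test, y_pred))
print("kappa:      ", cohen_kappa_score(y_test, y_pred))
```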
To facilitate model comparison, this study employed a robust nonparametric approach for comparing multiple supervised classification models, based on the Friedman test and the Nemenyi post hoc test.
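A sketch of this statistical comparison using SciPy and the scikit-posthocs package (an assumed dependency); the per-fold accuracies below are illustrative placeholders, not the study’s actual fold results:

```python
import numpy as np
from scipy.stats import friedmanchisquare
import scikit_posthocs as sp  # assumption: scikit-posthocs is installed

# Rows: cross-validation folds; columns: models (illustrative values).
scores = np.array([
    # Bagging, ExtraTrees, RF,    GB,    AdaBoost
    [0.934,    0.908,      0.905, 0.709, 0.652],
    [0.931,    0.906,      0.903, 0.705, 0.649],
    [0.936,    0.909,      0.906, 0.710, 0.653],
    [0.933,    0.907,      0.904, 0.704, 0.650],
    [0.935,    0.908,      0.902, 0.708, 0.651],
])

stat, p_value = friedmanchisquare(*scores.T)
print(f"Friedman chi-squared = {stat:.3f}, p = {p_value:.4f}")

# Pairwise Nemenyi post hoc comparison on the fold-by-model matrix.
print(sp.posthoc_nemenyi_friedman(scores))
```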

3.5. Evaluation of Model Stability

The stability of a classification model is a fundamental criterion for its application in the diagnosis of critical diseases, such as MI [69]. To assess its stability and mitigate the risk of overfitting, cross-validation is employed. In this process, the mean performance and standard deviation (SD) of the classification model are calculated [70].
N-fold cross-validation involves the division of the dataset into N subsets. In each iteration, one of the subsets is designated as the test set, while the remaining subsets are utilized for model training [71]. This process is repeated N times, with the performance obtained in each cycle being stored. In the evaluation of model stability using N-fold cross-validation, two fundamental metrics are employed: the average performance ($\mu_{\mathrm{Performance}}$) and the SD of the performance ($\sigma_{\mathrm{Performance}}$) [72]. These metrics enable the quantification of model stability and generalizability, thereby minimizing the risk of overfitting. In the context of MI diagnosis, this approach ensures that the model can consistently identify patients with acute cardiac events, reducing the risk of misdiagnosis and improving reliability in its clinical implementation.
$$\mu_{\mathrm{Performance}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Performance}_i$$
$$\sigma_{\mathrm{Performance}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\mathrm{Performance}_i - \mu_{\mathrm{Performance}}\right)^2}$$
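A sketch of this stability evaluation with stratified N-fold cross-validation (N = 10 here is an illustrative choice), again assuming the Bagging model and preprocessed data from earlier sketches:

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
fold_scores = cross_val_score(bag, X_scaled, y, cv=cv, scoring="accuracy")
print(f"mean performance = {fold_scores.mean():.4f}, "
      f"SD = {fold_scores.std():.4f}")
```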

3.6. Receiver Operating Characteristic (ROC) Curve

In the domain of healthcare, the ROC curve constitutes a foundational instrument for the evaluation of the efficacy of binary classification models in the identification of critical diseases, such as MI [73]. The ROC curve is a graphical representation of the model’s capacity to differentiate between two classes: the presence or absence of the medical condition of interest [74]. Each point on the ROC curve corresponds to a distinct threshold for classification. The ideal model will approach the upper left corner of the graph, indicating a high discriminatory ability [75]. In the context of clinical practice, the ROC curve is employed to ascertain the optimal threshold of a diagnostic model. This is achieved by minimizing both FN outcomes, which occur when a patient with an infarction is not detected, and FP outcomes, which occur when an unnecessary hospitalization is recommended. In medical settings, the employment of a model with a high degree of sensitivity is of paramount importance. This is due to the fact that an erroneous classification of a patient with infarction as a healthy individual (i.e., an FN) can have lethal consequences.
Consequently, the ROC curve and the AUC will facilitate the selection of the most suitable predictive model, based on the optimal balance between sensitivity and specificity, thereby enhancing decision-making processes in clinical settings. The AUC is defined as:
$$\mathrm{AUC} = \int_{-\infty}^{\infty} \mathrm{TPR}(T)\,\mathrm{FPR}'(T)\,dT,$$
where $T$ is the classifier decision threshold, $\mathrm{TPR}(T)$ is the true positive rate as a function of the threshold, and $\mathrm{FPR}(T)$ is the false positive rate as a function of the threshold, that is:
$$\mathrm{TPR} = \frac{\text{True positives}}{\text{Total positives}}, \qquad \mathrm{FPR} = \frac{\text{False positives}}{\text{Total negatives}}$$
The AUC is indicative of the performance of the binary classifier, irrespective of the class distribution. In essence, it equals the probability $P(x_1 > x_2)$ that a randomly chosen positive instance $x_1$ receives a higher score than a randomly chosen negative instance $x_2$.
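A minimal sketch of computing and plotting the ROC curve and AUC for one of the fitted models from the earlier sketches:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

proba = bag.predict_proba(X_test)[:, 1]   # predicted AMI probability
fpr, tpr, _ = roc_curve(y_test, proba)
print("AUC:", roc_auc_score(y_test, proba))

plt.plot(fpr, tpr, label="Bagging")
plt.plot([0, 1], [0, 1], linestyle="--", label="No discrimination")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate (sensitivity)")
plt.legend()
plt.show()
```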

4. Results and Discussion

The application of ensemble-based ML models to the clinical dataset yielded evidence of differential performance in the classification of the risk of AMI. The implementation of multiple algorithms on a large and heterogeneous population base facilitated the comparative evaluation of their discriminative ability against complex morbidity patterns. In this particular context, a marked superiority of ML ensemble methods was observed. These methods demonstrated greater consistency in the identification of positive cases and adequate sensitivity to the presence of diverse clinical and contextual variables. The findings underscore the efficacy of the ensemble approach in contexts characterized by high epidemiological variability. They also establish a technical framework for model prioritization in diagnostic support tasks within the domain of preventive cardiology.
To ensure the integrity, consistency, and analytical robustness of the dataset, a systematic preprocessing protocol was applied to mitigate the effects of missing values, atypicality, structural redundancy, and class imbalance. These factors are critical in clinical studies of AMI prediction. In the initial phase, a range of imputation strategies was employed, with each strategy tailored to a specific type of variable. In the context of numerical variables, the median was employed as a robust estimator due to its capacity to withstand extreme values and asymmetric distributions. For categorical variables, the imputation by mode was applied, preserving the semantic coherence of the categories and maintaining the stability of the marginal distributions. Subsequently, a process of record elimination was initiated with the objective of reducing structural redundancies and minimizing the risk of overfitting. This process was implemented to ensure that the ML algorithms were not biased by the replication of specific patterns. The coding of categorical variables was performed by one-hot encoding, with the first category systematically excluded to avoid perfect collinearity. Subsequently, the StandardScaler standardization technique was implemented, whereby the numerical variables were transformed to ensure a zero mean and unit SD. This transformation was imperative to guarantee the appropriate convergence of models that are sensitive to the scale of the variables.
Since the positive class (AMI events) was substantially underrepresented compared to the negative class, the problem of class imbalance was explicitly addressed using SMOTE [76]. This method generates synthetic examples through interpolation between nearest neighbors of the minority class, thereby facilitating a realistic expansion of the representation space of patients with infarction [77]. SMOTE was applied after the partitioning of the dataset into training and validation sets, in order to prevent information leakage and maintain the validity of the cross-evaluation process.
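A minimal sketch of this step with the imbalanced-learn package (an assumed dependency), applying SMOTE only to the training partition:

```python
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42)

# Oversample the minority (AMI) class in the training partition only,
# so no synthetic information leaks into the held-out data.
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```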
Consequently, a comparative evaluation of five ensemble algorithms commonly used in supervised ML was performed: AdaBoost, Gradient Boosting, Random Forest, Extra Trees, and Bagging [78]. The selection of these models was predicated on their demonstrated capacity to enhance the precision of predictions by integrating multiple classifiers. This approach has been shown to minimize both bias and variance, which are pivotal considerations in intricate clinical scenarios, such as the prognosis of AMI.
The performance of each model was quantified using widely accepted clinical and computational classification metrics. Overall accuracy is commonly used as an initial performance metric; however, its interpretation in isolation can be misleading in contexts characterized by class imbalance [79]. Consequently, supplementary metrics were integrated to encompass diverse dimensions of predictive efficacy. The F1 score, calculated as the harmonic mean between precision and sensitivity, was used to assess the balance between type I and type II errors. Precision and sensitivity (recall) were reported as complementary metrics to examine the model’s ability to minimize FP and to correctly identify TP cases, respectively.
The assessment of the adjusted agreement between model predictions and actual values was conducted by estimating the Cohen’s Kappa coefficient. This coefficient is utilized to adjust for the expected agreement by chance [80]. Values above 0.80 were considered indicative of a near-perfect level of agreement, according to criteria established in the epidemiological literature [81]. Furthermore, the AUC was calculated, as this metric is utilized to assess the discriminatory capacity of a model in differentiating between cases and controls across a range of decision thresholds [82].
As illustrated in Table 1, the results obtained for each model are accompanied by a critical analysis that addresses their relative performance in terms of accuracy, robustness, and generalizability.
As illustrated in Table 1, the performance of the ensemble models exhibits significant variation in relation to the metric under consideration. The accuracy metric, defined as the proportion of correct predictions over total observations, reached its highest values in the Bagging (0.936) and Extra Trees (0.916) models, in contrast to the AdaBoost (0.646) model, which demonstrated the lowest performance. However, given the potential for the accuracy metric to provide misleading indications in settings characterized by imbalanced classes, as is often observed in clinical cohorts with low prevalence of AMI, its interpretation should be approached with caution [83].
In this context, the F1 score, understood as the harmonic mean between precision and sensitivity (recall), is a pivotal metric for evaluating the performance of models in unbalanced classification tasks, such as the detection of AMI events [84]. This metric offers a robust measurement of the model’s capacity to strike a balance between the accurate identification of positive events (sensitivity) and the minimization of FP (precision).
The findings reveal that the Bagging model attains both the highest overall accuracy and the maximum F1 score (0.936), thereby substantiating its proficiency in the comprehensive classification of AMI events. In a similar vein, Extra Trees displays an F1 score of 0.916, signifying a robust and balanced performance with regard to detection and specificity. Conversely, the AdaBoost model, with an F1 score of 0.646, demonstrates suboptimal performance, which may be attributed to an imbalance between its detection capability (recall) and the reliability of its positive predictions (precision) [85]. This behavior indicates a higher probability of type I (FP) or type II (FN) errors, thereby compromising its clinical utility in scenarios where early detection and diagnostic accuracy are paramount.
Precision, defined as the proportion of TP over all cases predicted as positive, acquires particular clinical relevance in the prediction of cardiovascular events, since low precision can result in a high number of FP and thereby lead to unnecessary invasive or costly diagnostic procedures [86]. The Bagging model demonstrated a high degree of reliability, with a precision of 0.941, signifying its capacity for precise positive predictions. In contrast, the AdaBoost model demonstrated a substantially lower precision of 0.647, suggesting that, in clinical settings, it could lead to an increased frequency of over-intervention.
Sensitivity, also known as recall, is a measure of the proportion of TP correctly identified by the model. From an epidemiological and clinical perspective, this metric is critical, as low recall may imply a high FNR, which is patients with AMI who are missed, with potentially fatal consequences [87]. The Bagging and Extra Trees methods demonstrated a consistent sensitivity level above 93%, indicating their notable efficacy in identifying individuals at risk. These properties render them well-suited for integration within clinical decision support systems.
The Bagging (0.873) and Extra Trees (0.833) models exhibited excellent adjusted concordance, thereby substantiating the statistical robustness of their predictions. In contrast, the results obtained from the AdaBoost (0.292) and Gradient Boosting (0.417) models indicated low-to-moderate agreement. This finding suggests that the accuracy of these models may be largely attributable to chance, thereby compromising their clinical applicability.
To assess the multiscale discriminative capability of the ML models, ROC curves [74] were generated for each of the ensemble algorithms implemented (see Figure 1). This graphical representation enables the analysis of the performance of the classifiers as a function of their TPR versus FPR over different thresholds. Consequently, it provides a global measure using the AUC. This is especially useful in scenarios with imbalanced classes, such as AMI [88].
The Bagging (AUC = 0.9686) and Extra Trees (AUC = 0.9689) models exhibited remarkable discriminative capability, with ROC curves converging toward the upper left vertex of the graph (TPR ≈ 1; FPR ≈ 0), suggesting minimal error probability in both FP and FN. This behavior is characteristic of models with low variance and high stability, which are particularly valuable in clinical settings where the cost of diagnostic error is asymmetric [58]. Random Forest (AUC = 0.9687) showed comparable performance, with only slight variations relative to the preceding models, substantiating its reliability as a dependable classifier within the domain of predictive medicine.
Conversely, Gradient Boosting (AUC = 0.7860) exhibited reduced convexity in the curve, indicating diminished sensitivity at specific decision thresholds, a phenomenon potentially attributable to overfitting or sensitivity to outliers [89]. AdaBoost (AUC = 0.7021) exhibited the lowest discriminative capacity, with a curve approaching the nondiscrimination diagonal, thereby impeding its practical application in clinical settings.
These findings are consistently reflected in the results summarized in Table 2, which details the statistical comparison of the five ensemble learning models evaluated.
As illustrated in Table 2, the ensuing findings are derived from a comparative evaluation of the ensemble learning models applied to the designated dataset. The evaluation of the models was conducted using three complementary approaches: (1) average precision (AP) with its SD, (2) statistical comparison of average ranks using Friedman test with Nemenyi post hoc test, and (3) Cohen’s Kappa index, which quantifies the agreement with the ground truth, correcting for the effect of chance. The Cohen’s Kappa coefficient, a measure of agreement between model predictions and actual classifications, adjusted for expected agreement by chance, offers critical insights into the reliability of the model. The coefficient is of particular value in the presence of class imbalances, as the accuracy may be inflated in such cases [90].
The Bagging algorithm demonstrated the highest level of accuracy (93.36% ± 0.22) and the lowest average rank (1.0), thereby substantiating its consistently superior performance across all validation partitions. This model, in conjunction with Extra Trees (90.76% ± 0.18, rank 2.0), falls within statistical group A, suggesting that the observed disparities in performance are not statistically significant at the significance level α = 0.05. In terms of reliability, both models exhibited high Kappa indices (0.87 and 0.83, respectively), thereby substantiating that their high performance is not merely a product of chance, but rather is indicative of authentic agreement with the actual classification.
For its part, Random Forest demonstrated an average accuracy that was analogous to Extra Trees (90.41% ± 0.18). Nevertheless, its average rank (3.0) and its classification as statistical group B indicate that, from a statistical perspective, its performance is inferior to that of Bagging and Extra Trees in certain test subsets. Nevertheless, the observed Cohen’s Kappa of 0.83 indicates a substantial degree of true agreement, aligning with the observed accuracy.
In contrast, the evaluated boosting algorithms, Gradient Boosting (70.72% ± 0.30) and AdaBoost (65.15% ± 0.29), demonstrate notably lower accuracies and occupy the highest average ranks (4.0 and 5.0, respectively). The post hoc test places Gradient Boosting in group B, while AdaBoost clearly lags behind in group C, indicating a statistically significant difference with respect to the higher performing models. This inferiority in accuracy is consistently reflected in the Kappa indices (0.42 for Gradient Boosting and 0.29 for AdaBoost), values that denote a barely moderate or even poor level of agreement with the reference classification. This suggests a high risk of overfitting or underfitting, possibly attributable to the sensitivity of these algorithms to suboptimal parameter settings or to the nature of the data employed.
The findings indicate the necessity of implementing nonparametric multiple comparison statistical tests in conjunction with robust matching metrics, such as Cohen’s Kappa [68], to derive reliable conclusions regarding the superiority of ML models. In particular, the combination of Bagging and Extra Trees emerges as the most robust and generalizable option for the addressed problem, while boosting-based methods should be carefully reconsidered or adjusted before implementation in production scenarios.
From the standpoint of ML applied to medicine, model selection should not be based exclusively on global metrics. Rather, it should also take into account the model’s behavior under real clinical conditions, its interpretability, and stability in the face of data variability [44,91]. In this regard, Random Forest, Extra Trees, and Bagging are preferred choices for clinical decision support systems, as they exhibit a high AUC, a low FPR, and a high TPR. This approach is designed to minimize the risk of over-intervention, also known as FP, and diagnostic omission, also known as FN. As a result, it optimizes the screening of patients with AMI [92].
The findings of this study are consistent with the recent evidence that combines class-imbalance rebalancing with ensemble models. In particular, Zheng et al. [93] report that integrating SMOTE/SMOTETomek with stacking on imbalanced clinical data markedly enhances discrimination (AUC ≈ 0.99). This finding aligns with the methodological principle that imbalance-oriented preprocessing improves the effectiveness of ensemble architectures by reducing minority-class bias [76]. A similar pattern is also observed in large-scale cardiometabolic applications. In these cases, ML frameworks that leverage extensive clinical data and balance/optimization pipelines deliver sustained gains in risk stratification performance [33,42]. From a clinical perspective, the combination of high AUC, low FPR, and high TPR is crucial for screening and decision support, as it simultaneously minimizes over-intervention (FP) and diagnostic omission (FN) [74].
However, we observe discrepancies with the findings of Kasim et al. [94], who reported that a linear SVM with targeted feature selection (AUC = 0.93; 95% CI: 0.89–0.98) outperforms even the best ensemble (AUC = 0.91; 95% CI: 0.87–0.96). While the findings are valuable, their generalizability is constrained by an evaluation that is almost exclusively focused on discrimination metrics. The methodological literature recommends the use of both discrimination and concordance metrics (e.g., Cohen’s Kappa and Matthews Correlation Coefficient (MCC) to assess the stability and reliability of predictions in cases of class imbalance and clinical heterogeneity [67,68,90]. Additionally, it suggests the implementation of nonparametric multiple comparisons (e.g., the Friedman test with Nemenyi post hoc test) to rule out differences driven by variability across cross-validation partitions [72]. Furthermore, concerns regarding the misuse of specific Random Forest variable-importance indicators in medicine underscore the necessity for robust, reproducible evaluation frameworks [91]. Additionally, common pitfalls in ML development for cardiology further support the implementation of more stringent validation protocols and more comprehensive reporting [39,40].
In light of these limitations, this study proposed an integrated approach that incorporated three elements: (1) the utilization of ROC curves with enhanced AUC metrics, (2) the implementation of Friedman tests with Nemenyi post hoc tests for multi-model comparisons, and (3) the incorporation of concordance metrics, such as Kappa and MCC, to assess the reliability of the models. This integrated approach led to the attainment of Kappa values consistently exceeding 0.80, thereby substantiating the enhanced robustness of Bagging and Extra Trees algorithms when compared to alternative models. While this statistical robustness supports the operational superiority of ensembles for myocardial risk stratification, responsible clinical adoption further requires consideration of temporal stability (e.g., concept drift in real-world data streams) [82], computational cost, integration with hospital information systems, and model transparency to support auditable medical decisions [42,92]. When considered as a whole, these elements favor an explicit balance between accuracy, methodological robustness, and operational suitability. This ensures that predictive gains translate into tangible clinical benefits in cardiology [2,42].
Finally, the results lend support to the convergence of these three lines of quantitative evidence: ROC curves with superior AUCs, statistically significant Friedman tests with Nemenyi post hoc tests, and Cohen’s Kappa values consistently greater than 0.80. This convergence substantiates the robustness of the Bagging and Extra Trees models and confirms their superiority over the other classifiers in predicting cardiovascular events. As noted above, such statistical robustness does not remove the need to weigh temporal performance stability, computational cost, feasibility of integration into hospital information systems, and transparency for auditable medical decisions; clinical adoption must therefore rest on an explicit balance between accuracy, methodological robustness, and operational suitability, so that the predictive gain translates into tangible benefits for cardiology practice.

5. Conclusions

This study, conducted on a heterogeneous clinical cohort under a stringent validation protocol, demonstrates the relative superiority of aggregation- and randomization-based ensembles over boosting alternatives for the early prediction of AMI. In particular, the Bagging and Extra Trees methods achieved an AUC ≈ 0.969 and Kappa ≥ 0.80 together with the lowest average ranks in multiple comparisons, placing them in statistical group A of the Friedman test with the Nemenyi post hoc test and supporting their robustness and between-partition consistency. These findings align with the accuracy ± SD, MCC, and ROC-curve results, thereby reducing reliance on a single metric and mitigating the likelihood of erroneous conclusions.
From a methodological perspective, the observed performance can be attributed to two fundamental mechanisms: first, variance reduction, which stabilizes predictions across resampled training sets; and second, resilience to noise, which is inherent to sampling- and aggregation-based ensembles. These mechanisms are further supported by a workflow that mitigates class imbalance by applying SMOTE exclusively within training folds, preventing information leakage and preserving the integrity of the validation folds, which are evaluated with threshold-independent metrics. Conversely, the boosting methods showed significantly lower accuracy and Kappa values, along with higher average ranks, indicating heightened sensitivity to hyperparameter configuration and to the data structure of this clinical domain. A minimal sketch of the leakage-safe resampling workflow appears below.
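The sketch assumes the imbalanced-learn package and uses synthetic placeholder data; because the sampler lives inside the pipeline, SMOTE is refitted on each training fold only, so validation folds remain untouched:

```python
# Minimal sketch: SMOTE restricted to training folds via an imblearn Pipeline.
# X and y are synthetic placeholders; imbalanced-learn is an assumed dependency.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(600, 12))                    # placeholder feature matrix
y = (rng.random(600) < 0.25).astype(int)          # imbalanced placeholder labels

pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),            # fitted on each training fold only
    ("clf", BaggingClassifier(n_estimators=100, random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
aucs = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")  # threshold-independent
print(f"AUC per fold: {np.round(aucs, 3)}; mean = {aucs.mean():.3f}")
```

Because cross_val_score fits the whole pipeline on each training fold, the validation data are never exposed to the oversampler, mirroring the leakage-safe design described above.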
In terms of applicability, we recommend prioritizing Bagging or Extra Trees as reference models for decision-support workflows in preventive cardiology, accompanied by probability calibration, threshold optimization aligned with the clinical costs of type I/II errors, and auditable interpretability mechanisms to facilitate responsible adoption in hospital settings. It must be acknowledged, however, that external validity and temporal stability require further confirmation. Accordingly, future work should include multicenter and temporal validation, as well as prospective evaluations of the impact on clinical processes and health outcomes. Collectively, these steps will help ensure that the predictive gains of ensemble methods translate into tangible clinical benefits while maintaining an explicit balance among accuracy, methodological robustness, and operational suitability.
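As one possible starting point for such a workflow (the model choice, data, and 10:1 cost ratio below are illustrative assumptions, not prescriptions from this study), probability calibration followed by cost-aware threshold selection might be sketched as follows:

```python
# Minimal sketch: probability calibration plus cost-aware threshold selection.
# Data, model, and the 10:1 FN:FP cost ratio are illustrative assumptions.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.normal(size=(800, 12))                                 # placeholder features
y = (X[:, 0] + rng.normal(0, 1, size=800) > 0.8).astype(int)   # placeholder labels

X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=7)

# Sigmoid (Platt) calibration wrapped around the ensemble.
model = CalibratedClassifierCV(
    ExtraTreesClassifier(n_estimators=200, random_state=7),
    method="sigmoid", cv=5,
).fit(X_tr, y_tr)
proba = model.predict_proba(X_val)[:, 1]

# Pick the threshold minimizing expected clinical cost, assuming a missed
# AMI (FN) is 10x as costly as an unnecessary work-up (FP).
C_FN, C_FP = 10.0, 1.0
thresholds = np.linspace(0.05, 0.95, 91)
costs = [C_FN * np.sum((proba < t) & (y_val == 1)) +
         C_FP * np.sum((proba >= t) & (y_val == 0)) for t in thresholds]
print(f"Cost-optimal threshold: {thresholds[int(np.argmin(costs))]:.2f}")
```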

Author Contributions

D.C.A.-G.: writing-review and editing, writing-original draft, visualization, resources, methodology, investigation, formal analysis, conceptualization. J.S.-R.: visualization, resources, methodology, formal analysis. W.J.M.-R.: writing-original draft, methodology, investigation, formal analysis, supervision. M.G.Z.-R.: writing-review and editing, investigation, supervision. A.C.N.-A.: project administration, resources, validation, funding acquisition. E.D.-R.: project administration, resources, validation, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created in this study. Analyses used the publicly available Kaggle dataset cited in Methods (accessed 10 February 2024). Supporting scripts are available from the corresponding author upon reasonable request.

Acknowledgments

Artificial intelligence platforms were used to improve the writing of this article, but not to generate its content.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Sun, J.; Qiao, Y.; Zhao, M.; Magnussen, C.G.; Xi, B. Global, Regional, and National Burden of Cardiovascular Diseases in Youths and Young Adults Aged 15–39 Years in 204 Countries/Territories, 1990–2019: A Systematic Analysis of Global Burden of Disease Study 2019. BMC Med. 2023, 21, 222.
2. Kaptoge, S.; Pennells, L.; De Bacquer, D.; Cooney, M.T.; Kavousi, M.; Stevens, G.; Riley, L.M.; Savin, S.; Khan, T.; Altay, S.; et al. World Health Organization Cardiovascular Disease Risk Charts: Revised Models to Estimate Risk in 21 Global Regions. Lancet Glob. Health 2019, 7, e1332–e1345.
3. Mc Namara, K.; Alzubaidi, H.; Jackson, J.K. Cardiovascular Disease as a Leading Cause of Death: How Are Pharmacists Getting Involved? Integr. Pharm. Res. Pract. 2019, 8, 1–11.
4. Kelly, B.B.; Narula, J.; Fuster, V. Recognizing Global Burden of Cardiovascular Disease and Related Chronic Diseases. Mt. Sinai J. Med. A J. Transl. Pers. Med. 2012, 79, 632–640.
5. Nansseu, J.R.; Tankeu, A.T.; Kamtchum-Tatuene, J.; Noubiap, J.J. Fixed-Dose Combination Therapy to Reduce the Growing Burden of Cardiovascular Disease in Low- and Middle-Income Countries: Feasibility and Challenges. J. Clin. Hypertens. 2018, 20, 168–173.
6. Janez, A.; Muzurovic, E.; Bogdanski, P.; Czupryniak, L.; Fabryova, L.; Fras, Z.; Guja, C.; Haluzik, M.; Kempler, P.; Lalic, N.; et al. Modern Management of Cardiometabolic Continuum: From Overweight/Obesity to Prediabetes/Type 2 Diabetes Mellitus. Recommendations from the Eastern and Southern Europe Diabetes and Obesity Expert Group. Diabetes Ther. 2024, 15, 1865–1892.
7. Jagannathan, R.; Patel, S.A.; Ali, M.K.; Narayan, K.M.V. Global Updates on Cardiovascular Disease Mortality Trends and Attribution of Traditional Risk Factors. Curr. Diab. Rep. 2019, 19, 44.
8. Ibrahim, L.; Mesinovic, M.; Yang, K.-W.; Eid, M.A. Explainable Prediction of Acute Myocardial Infarction Using Machine Learning and Shapley Values. IEEE Access 2020, 8, 210410–210417.
9. Writing Committee; Smith, S.C.; Collins, A.; Ferrari, R.; Holmes, D.R.; Logstrup, S.; McGhie, D.V.; Ralston, J.; Sacco, R.L.; Stam, H.; et al. Our Time: A Call to Save Preventable Death from Cardiovascular Disease (Heart Disease and Stroke). Eur. Heart J. 2012, 33, 2910–2916.
10. Gheorghe, A.; Griffiths, U.; Murphy, A.; Legido-Quigley, H.; Lamptey, P.; Perel, P. The Economic Burden of Cardiovascular Disease and Hypertension in Low- and Middle-Income Countries: A Systematic Review. BMC Public Health 2018, 18, 975.
11. Parry, M.; Bjørnnes, A.K.; Nickerson, N.; Lie, I. Family Caregivers and Cardiovascular Disease: An Intersectional Approach to Good Health and Wellbeing. In International Perspectives on Family Caregiving; Stanley, S., Ed.; Emerald Publishing Limited: Leeds, UK, 2025; pp. 135–157. ISBN 978-1-83549-612-1.
12. Laslett, L.J.; Alagona, P.; Clark, B.A.; Drozda, J.P.; Saldivar, F.; Wilson, S.R.; Poe, C.; Hart, M. The Worldwide Environment of Cardiovascular Disease: Prevalence, Diagnosis, Therapy, and Policy Issues. J. Am. Coll. Cardiol. 2012, 60, S1–S49.
13. Capotosto, L.; Massoni, F.; De Sio, S.; Ricci, S.; Vitarelli, A. Early Diagnosis of Cardiovascular Diseases in Workers: Role of Standard and Advanced Echocardiography. BioMed Res. Int. 2018, 2018, 7354691.
14. Forman, D.; Bulwer, B.E. Cardiovascular Disease: Optimal Approaches to Risk Factor Modification of Diet and Lifestyle. Curr. Treat. Options Cardio Med. 2006, 8, 47–57.
15. Hymowitz, N. Behavioral Approaches to Preventing Heart Disease: Risk Factor Modification. Int. J. Ment. Health 1980, 9, 27–69.
16. Ullah, M.; Hamayun, S.; Wahab, A.; Khan, S.U.; Rehman, M.U.; Haq, Z.U.; Rehman, K.U.; Ullah, A.; Mehreen, A.; Awan, U.A.; et al. Smart Technologies Used as Smart Tools in the Management of Cardiovascular Disease and Their Future Perspective. Curr. Probl. Cardiol. 2023, 48, 101922.
17. Thupakula, S.; Nimmala, S.S.R.; Ravula, H.; Chekuri, S.; Padiya, R. Emerging Biomarkers for the Detection of Cardiovascular Diseases. Egypt Heart J. 2022, 74, 77.
18. Fathil, M.F.M.; Md Arshad, M.K.; Gopinath, S.C.B.; Hashim, U.; Adzhri, R.; Ayub, R.M.; Ruslinda, A.R.; Nuzaihan, M.N.M.; Azman, A.H.; Zaki, M.; et al. Diagnostics on Acute Myocardial Infarction: Cardiac Troponin Biomarkers. Biosens. Bioelectron. 2015, 70, 209–220.
19. Tiwari, R.P.; Jain, A.; Khan, Z.; Kohli, V.; Bharmal, R.N.; Kartikeyan, S.; Bisen, P.S. Cardiac Troponins I and T: Molecular Markers for Early Diagnosis, Prognosis, and Accurate Triaging of Patients with Acute Myocardial Infarction. Mol. Diagn. Ther. 2012, 16, 371–381.
20. Garg, P.; Morris, P.; Fazlanie, A.L.; Vijayan, S.; Dancso, B.; Dastidar, A.G.; Plein, S.; Mueller, C.; Haaf, P. Cardiac Biomarkers of Acute Coronary Syndrome: From History to High-Sensitivity Cardiac Troponin. Intern. Emerg. Med. 2017, 12, 147–155.
21. Li, Y.; Xu, H.; Chen, S.; Wang, J. Advances in Electrochemical Detection of B-Type Natriuretic Peptide as a Heart Failure Biomarker. Int. J. Electrochem. Sci. 2024, 19, 100748.
22. Onitilo, A.A.; Engel, J.M.; Stankowski, R.V.; Liang, H.; Berg, R.L.; Doi, S.A.R. High-Sensitivity C-Reactive Protein (Hs-CRP) as a Biomarker for Trastuzumab-Induced Cardiotoxicity in HER2-Positive Early-Stage Breast Cancer: A Pilot Study. Breast Cancer Res. Treat. 2012, 134, 291–298.
23. Upadhyay, R.K. Emerging Risk Biomarkers in Cardiovascular Diseases and Disorders. J. Lipids 2015, 2015, 971453.
24. Georgoulis, M.; Chrysohoou, C.; Georgousopoulou, E.; Damigou, E.; Skoumas, I.; Pitsavos, C.; Panagiotakos, D. Long-Term Prognostic Value of LDL-C, HDL-C, Lp(a) and TG Levels on Cardiovascular Disease Incidence, by Body Weight Status, Dietary Habits and Lipid-Lowering Treatment: The ATTICA Epidemiological Cohort Study (2002–2012). Lipids Health Dis. 2022, 21, 141.
25. Sonmez, A.; Yilmaz, M.I.; Saglam, M.; Unal, H.U.; Gok, M.; Cetinkaya, H.; Karaman, M.; Haymana, C.; Eyileten, T.; Oguz, Y.; et al. The Role of Plasma Triglyceride/High-Density Lipoprotein Cholesterol Ratio to Predict Cardiovascular Outcomes in Chronic Kidney Disease. Lipids Health Dis. 2015, 14, 29.
26. Djaberi, R.; Beishuizen, E.D.; Pereira, A.M.; Rabelink, T.J.; Smit, J.W.; Tamsma, J.T.; Huisman, M.V.; Jukema, J.W. Non-Invasive Cardiac Imaging Techniques and Vascular Tools for the Assessment of Cardiovascular Disease in Type 2 Diabetes Mellitus. Diabetologia 2008, 51, 1581–1593.
27. Ansari, S.; Farzaneh, N.; Duda, M.; Horan, K.; Andersson, H.B.; Goldberger, Z.D.; Nallamothu, B.K.; Najarian, K. A Review of Automated Methods for Detection of Myocardial Ischemia and Infarction Using Electrocardiogram and Electronic Health Records. IEEE Rev. Biomed. Eng. 2017, 10, 264–298.
28. Klaeboe, L.G.; Edvardsen, T. Echocardiographic Assessment of Left Ventricular Systolic Function. J. Echocardiogr. 2019, 17, 10–16.
29. Cheng, K.; Lin, A.; Yuvaraj, J.; Nicholls, S.J.; Wong, D.T.L. Cardiac Computed Tomography Radiomics for the Non-Invasive Assessment of Coronary Inflammation. Cells 2021, 10, 879.
30. Mushtaq, S.; Conte, E.; Pontone, G.; Baggiano, A.; Annoni, A.; Formenti, A.; Mancini, M.E.; Guglielmo, M.; Muscogiuri, G.; Tanzilli, A.; et al. State-of-the-Art-Myocardial Perfusion Stress Testing: Static CT Perfusion. J. Cardiovasc. Comput. Tomogr. 2020, 14, 294–302.
31. Beller, G.A.; Heede, R.C. SPECT Imaging for Detecting Coronary Artery Disease and Determining Prognosis by Noninvasive Assessment of Myocardial Perfusion and Myocardial Viability. J. Cardiovasc. Trans. Res. 2011, 4, 416–424.
32. Baghdadi, N.A.; Farghaly Abdelaliem, S.M.; Malki, A.; Gad, I.; Ewis, A.; Atlam, E. Advanced Machine Learning Techniques for Cardiovascular Disease Early Detection and Diagnosis. J. Big Data 2023, 10, 144.
33. Boudali, I.; Chebaane, S.; Zitouni, Y. A Predictive Approach for Myocardial Infarction Risk Assessment Using Machine Learning and Big Clinical Data. Healthc. Anal. 2024, 5, 100319.
34. Dimopoulos, A.C.; Nikolaidou, M.; Caballero, F.F.; Engchuan, W.; Sanchez-Niubo, A.; Arndt, H.; Ayuso-Mateos, J.L.; Haro, J.M.; Chatterji, S.; Georgousopoulou, E.N.; et al. Machine Learning Methodologies versus Cardiovascular Risk Scores, in Predicting Disease Risk. BMC Med. Res. Methodol. 2018, 18, 179.
35. Saikumar, K.; Rajesh, V. A Machine Intelligence Technique for Predicting Cardiovascular Disease (CVD) Using Radiology Dataset. Int. J. Syst. Assur. Eng. Manag. 2024, 15, 135–151.
36. Hakim, M.A.; Jahan, N.; Zerin, Z.A.; Farha, A.B. Performance Evaluation and Comparison of Ensemble Based Bagging and Boosting Machine Learning Methods for Automated Early Prediction of Myocardial Infarction. In Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kharagpur, India, 6–8 July 2021; pp. 1–6.
37. Rai, H.M.; Chatterjee, K. Hybrid CNN-LSTM Deep Learning Model and Ensemble Technique for Automatic Detection of Myocardial Infarction Using Big ECG Data. Appl. Intell. 2022, 52, 5366–5384.
38. Bian, K.; Priyadarshi, R. Machine Learning Optimization Techniques: A Survey, Classification, Challenges, and Future Research Issues. Arch. Comput. Methods Eng. 2024, 31, 4209–4233.
39. Aliferis, C.; Simon, G. Overfitting, Underfitting and General Model Overconfidence and Under-Performance Pitfalls and Best Practices in Machine Learning and AI. In Artificial Intelligence and Machine Learning in Health Care and Medical Sciences: Best Practices and Pitfalls; Simon, G.J., Aliferis, C., Eds.; Springer International Publishing: Cham, Switzerland, 2024; pp. 477–524. ISBN 978-3-031-39355-6.
40. Cai, Y.-Q.; Gong, D.-X.; Tang, L.-Y.; Cai, Y.; Li, H.-J.; Jing, T.-C.; Gong, M.; Hu, W.; Zhang, Z.-W.; Zhang, X.; et al. Pitfalls in Developing Machine Learning Models for Predicting Cardiovascular Diseases: Challenge and Solutions. J. Med. Internet Res. 2024, 26, e47645.
41. Ribeiro, M.H.D.M.; dos Santos Coelho, L. Ensemble Approach Based on Bagging, Boosting and Stacking for Short-Term Prediction in Agribusiness Time Series. Appl. Soft. Comput. 2020, 86, 105837.
42. Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T.; et al. Machine Learning Prediction in Cardiovascular Diseases: A Meta-Analysis. Sci. Rep. 2020, 10, 16057.
43. Liu, R.; Wang, M.; Zheng, T.; Zhang, R.; Li, N.; Chen, Z.; Yan, H.; Shi, Q. An Artificial Intelligence-Based Risk Prediction Model of Myocardial Infarction. BMC Bioinform. 2022, 23, 217.
44. Wang, S.; Li, J.; Sun, L.; Cai, J.; Wang, S.; Zeng, L.; Sun, S. Application of Machine Learning to Predict the Occurrence of Arrhythmia after Acute Myocardial Infarction. BMC Med. Inf. Decis. Mak. 2021, 21, 301.
45. Sharma, L.D.; Sunkaria, R.K. Inferior Myocardial Infarction Detection Using Stationary Wavelet Transform and Machine Learning Approach. Signal Image Video Process. 2018, 12, 199–206.
46. Oliveira, M.; Seringa, J.; Pinto, F.J.; Henriques, R.; Magalhães, T. Machine Learning Prediction of Mortality in Acute Myocardial Infarction. BMC Med. Inf. Decis. Mak. 2023, 23, 70.
47. Li, X.; Shang, C.; Xu, C.; Wang, Y.; Xu, J.; Zhou, Q. Development and Comparison of Machine Learning-Based Models for Predicting Heart Failure after Acute Myocardial Infarction. BMC Med. Inf. Decis. Mak. 2023, 23, 165.
48. Cho, S.M.; Austin, P.C.; Ross, H.J.; Abdel-Qadir, H.; Chicco, D.; Tomlinson, G.; Taheri, C.; Foroutan, F.; Lawler, P.R.; Billia, F.; et al. Machine Learning Compared with Conventional Statistical Models for Predicting Myocardial Infarction Readmission and Mortality: A Systematic Review. Can. J. Cardiol. 2021, 37, 1207–1214.
49. Barker, J.; Li, X.; Khavandi, S.; Koeckerling, D.; Mavilakandy, A.; Pepper, C.; Bountziouka, V.; Chen, L.; Kotb, A.; Antoun, I.; et al. Machine Learning in Sudden Cardiac Death Risk Prediction: A Systematic Review. EP Eur. 2022, 24, 1777–1787.
50. Chellappan, D.; Rajaguru, H. Generalizability of Machine Learning Models for Diabetes Detection a Study with Nordic Islet Transplant and PIMA Datasets. Sci. Rep. 2025, 15, 4479.
51. Sun, Y.; Pang, S.; Zhao, Z.; Zhang, Y. Interpretable SHAP Model Combining Meta-Learning and Vision Transformer for Lithology Classification Using Limited and Unbalanced Drilling Data in Well Logging. Nat. Resour. Res. 2024, 33, 2545–2565.
52. Rana, N.; Sharma, K.; Sharma, A. Diagnostic Strategies Using AI and ML in Cardiovascular Diseases: Challenges and Future Perspectives. In Deep Learning and Computer Vision: Models and Biomedical Applications; Dulhare, U.N., Houssein, E.H., Eds.; Springer Nature: Singapore, 2025; Volume 1, pp. 135–165. ISBN 978-981-96-1285-7.
53. Taherkhani, A.; Cosma, G.; McGinnity, T.M. AdaBoost-CNN: An Adaptive Boosting Algorithm for Convolutional Neural Networks to Classify Multi-Class Imbalanced Datasets Using Transfer Learning. Neurocomputing 2020, 404, 351–366.
54. Cao, Y.; Miao, Q.-G.; Liu, J.-C.; Gao, L. Advance and Prospects of AdaBoost Algorithm. Acta Autom. Sin. 2013, 39, 745–758.
55. Shahraki, A.; Abbasi, M.; Haugen, Ø. Boosting Algorithms for Network Intrusion Detection: A Comparative Evaluation of Real AdaBoost, Gentle AdaBoost and Modest AdaBoost. Eng. Appl. Artif. Intell. 2020, 94, 103770.
56. Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of Gradient Boosting Algorithms. Artif. Intell. Rev. 2021, 54, 1937–1967.
57. Bahad, P.; Saxena, P. Study of AdaBoost and Gradient Boosting Algorithms for Predictive Analytics. In Proceedings of the International Conference on Intelligent Computing and Smart Communication, Tehri, India, 20–21 April 2019; Singh Tomar, G., Chaudhari, N.S., Barbosa, J.L.V., Aghwariya, M.K., Eds.; Springer: Singapore, 2020; pp. 235–244.
58. Sun, R.; Wang, G.; Zhang, W.; Hsu, L.-T.; Ochieng, W.Y. A Gradient Boosting Decision Tree Based GPS Signal Reception Classification Algorithm. Appl. Soft Comput. 2020, 86, 105942.
59. Aziz, N.; Akhir, E.A.P.; Aziz, I.A.; Jaafar, J.; Hasan, M.H.; Abas, A.N.C. A Study on Gradient Boosting Algorithms for Development of AI Monitoring and Prediction Systems. In Proceedings of the 2020 International Conference on Computational Intelligence (ICCI), Virtual, 8–9 October 2020; pp. 11–16.
60. Chowdhury, A.R.; Chatterjee, T.; Banerjee, S. A Random Forest Classifier-Based Approach in the Detection of Abnormalities in the Retina. Med. Biol. Eng. Comput. 2019, 57, 193–203.
61. Dhananjay, B.; Venkatesh, N.P.; Bhardwaj, A.; Sivaraman, J. Cardiac Signals Classification Based on Extra Trees Model. In Proceedings of the 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 26–27 August 2021; pp. 402–406.
62. Aria, M.; Cuccurullo, C.; Gnasso, A. A Comparison among Interpretative Proposals for Random Forests. Mach. Learn. Appl. 2021, 6, 100094.
63. Fumera, G.; Roli, F.; Serrau, A. A Theoretical Analysis of Bagging as a Linear Combination of Classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 1293–1299.
64. Plaia, A.; Buscemi, S.; Fürnkranz, J.; Mencía, E.L. Comparing Boosting and Bagging for Decision Trees of Rankings. J. Classif. 2022, 39, 78–99.
65. Heydarian, M.; Doyle, T.E.; Samavi, R. MLCM: Multi-Label Confusion Matrix. IEEE Access 2022, 10, 19083–19095.
66. Markoulidakis, I.; Markoulidakis, G. Probabilistic Confusion Matrix: A Novel Method for Machine Learning Algorithm Generalized Performance Analysis. Technologies 2024, 12, 113.
67. Kolesnyk, A.S.; Khairova, N.F. Justification for the Use of Cohen’s Kappa Statistic in Experimental Studies of NLP and Text Mining. Cybern. Syst. Anal. 2022, 58, 280–288.
68. Wang, J.; Yang, Y.; Xia, B. A Simplified Cohen’s Kappa for Use in Binary Classification Data Annotation Tasks. IEEE Access 2019, 7, 164386–164397.
69. Mokeddem, S.A. A Fuzzy Classification Model for Myocardial Infarction Risk Assessment. Appl. Intell. 2018, 48, 1233–1250.
70. Yates, L.A.; Aandahl, Z.; Richards, S.A.; Brook, B.W. Cross Validation for Model Selection: A Review with Examples from Ecology. Ecol. Monogr. 2023, 93, e1557.
71. Lim, C.; Yu, B. Estimation Stability with Cross-Validation (ESCV). J. Comput. Graph. Stat. 2016, 25, 464–492.
72. Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv 2020, arXiv:1811.12808.
73. Mohd Faizal, A.S.; Hon, W.Y.; Thevarajah, T.M.; Khor, S.M.; Chang, S.-W. A Biomarker Discovery of Acute Myocardial Infarction Using Feature Selection and Machine Learning. Med. Biol. Eng. Comput. 2023, 61, 2527–2541.
74. Obuchowski, N.A.; Bullen, J.A. Receiver Operating Characteristic (ROC) Curves: Review of Methods with Applications in Diagnostic Medicine. Phys. Med. Biol. 2018, 63, 07TR01.
75. Rojas, J.C.; Lyons, P.G.; Chhikara, K.; Chaudhari, V.; Bhavani, S.V.; Nour, M.; Buell, K.G.; Smith, K.D.; Gao, C.A.; Amagai, S.; et al. A Common Longitudinal Intensive Care Unit Data Format (CLIF) for Critical Illness Research. Intensive Care Med. 2025, 51, 556–569.
76. Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for Handling Class Imbalance. Inf. Sci. 2019, 505, 32–64.
77. Özdemir, A.; Polat, K.; Alhudhaif, A. Classification of Imbalanced Hyperspectral Images Using SMOTE-Based Deep Learning Methods. Expert Syst. Appl. 2021, 178, 114986.
78. Carreira-Perpiñán, M.Á.; Zharmagambetov, A. Ensembles of Bagged TAO Trees Consistently Improve over Random Forests, AdaBoost and Gradient Boosting. In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, Virtual, 18–20 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 35–46.
79. Shao, G.; Tang, L.; Liao, J. Overselling Overall Map Accuracy Misinforms about Research Reliability. Landsc. Ecol. 2019, 34, 2487–2492.
80. Li, M.; Yu, T. Methodological Issues on Evaluating Agreement between Two Detection Methods by Cohen’s Kappa Analysis. Parasit. Vectors 2022, 15, 270.
81. Demirhan, H.; Yilmaz, A.E. Detection of Grey Zones in Inter-Rater Agreement Studies. BMC Med. Res. Methodol. 2023, 23, 3.
82. Brzezinski, D.; Stefanowski, J. Prequential AUC: Properties of the Area under the ROC Curve for Data Streams with Concept Drift. Knowl. Inf. Syst. 2017, 52, 531–562.
83. Newaz, A.; Mohosheu, M.S.; Al Noman, M.A. Predicting Complications of Myocardial Infarction within Several Hours of Hospitalization Using Data Mining Techniques. Inform. Med. Unlocked 2023, 42, 101361.
84. Abbas, S.; Ojo, S.; Krichen, M.; Alamro, M.A.; Mihoub, A.; Vilcekova, L. A Novel Deep Learning Approach for Myocardial Infarction Detection and Multi-Label Classification. IEEE Access 2024, 12, 76003–76021.
85. Alsirhani, A.; Tariq, N.; Humayun, M.; Naif Alwakid, G.; Sanaullah, H. Intrusion Detection in Smart Grids Using Artificial Intelligence-Based Ensemble Modelling. Clust. Comput. 2025, 28, 238.
86. Van den Bruel, A.; Cleemput, I.; Aertgeerts, B.; Ramaekers, D.; Buntinx, F. The Evaluation of Diagnostic Tests: Evidence on Technical and Diagnostic Accuracy, Impact on Patient Outcome and Cost-Effectiveness Is Needed. J. Clin. Epidemiol. 2007, 60, 1116–1122.
87. Miao, J.; Zhu, W. Precision–Recall Curve (PRC) Classification Trees. Evol. Intel. 2022, 15, 1545–1569.
88. Chakraborty, D.; Elzarka, H. Advanced Machine Learning Techniques for Building Performance Simulation: A Comparative Analysis. J. Build. Perform. Simul. 2019, 12, 193–207.
89. Li, A.H.; Bradic, J. Boosting in the Presence of Outliers: Adaptive Classification with Nonconvex Loss Functions. J. Am. Stat. Assoc. 2018, 113, 660–674.
90. Chicco, D.; Warrens, M.J.; Jurman, G. The Matthews Correlation Coefficient (MCC) Is More Informative Than Cohen’s Kappa and Brier Score in Binary Classification Assessment. IEEE Access 2021, 9, 78368–78381.
91. Wallace, M.L.; Mentch, L.; Wheeler, B.J.; Tapia, A.L.; Richards, M.; Zhou, S.; Yi, L.; Redline, S.; Buysse, D.J. Use and Misuse of Random Forest Variable Importance Metrics in Medicine: Demonstrations through Incident Stroke Prediction. BMC Med. Res. Methodol. 2023, 23, 144.
92. Liu, L.; Lewandrowski, K. Establishing Optimal Cutoff Values for High-Sensitivity Cardiac Troponin Algorithms in Risk Stratification of Acute Myocardial Infarction. Crit. Rev. Clin. Lab. Sci. 2024, 61, 1–22.
93. Zheng, H.; Sherazi, S.W.A.; Lee, J.Y. A Stacking Ensemble Prediction Model for the Occurrences of Major Adverse Cardiovascular Events in Patients with Acute Coronary Syndrome on Imbalanced Data. IEEE Access 2021, 9, 113692–113704.
94. Kasim, S.; Amir Rudin, P.N.F.; Malek, S.; Ibrahim, K.S.; Wan Ahmad, W.A.; Fong, A.Y.Y.; Lin, W.Y.; Aziz, F.; Ibrahim, N. Ensemble Machine Learning for Predicting In-Hospital Mortality in Asian Women with ST-Elevation Myocardial Infarction (STEMI). Sci. Rep. 2024, 14, 12378.
Figure 1. Comparative ROC curves of ensemble models for binary classification.
Table 1. Performance of ML ensemble models.

Model               Accuracy    F1 Score    Precision   Recall      AUC
AdaBoost            0.646122    0.645918    0.646548    0.646179    0.702120
Gradient Boosting   0.708342    0.708330    0.708406    0.708360    0.786005
Random Forest       0.912938    0.912910    0.913365    0.912903    0.968798
Extra Trees         0.916395    0.916330    0.917540    0.916337    0.968897
Bagging             0.936273    0.936083    0.941200    0.936156    0.968604
Table 2. Comparative performance of ensemble learning models: average accuracy, average rank, statistical group, and Cohen’s Kappa index.

Model               Accuracy ± SD    Average Rank    Statistical Group    Cohen’s Kappa
Bagging             93.36% ± 0.22    1.0             A                    0.87
Extra Trees         90.76% ± 0.18    2.0             A                    0.83
Random Forest       90.41% ± 0.18    3.0             B                    0.83
Gradient Boosting   70.72% ± 0.30    4.0             B                    0.42
AdaBoost            65.15% ± 0.29    5.0             C                    0.29