1. Introduction
Cardiovascular diseases (CVDs) remain the leading cause of morbidity and mortality worldwide, and ST-segment elevation myocardial infarction (STEMI) represents one of the most lethal clinical manifestations within this spectrum [1]. Despite substantial advances in primary percutaneous coronary intervention (pPCI) and pharmacological therapies, short- and long-term mortality rates in STEMI patients still range between 5% and 10%, underscoring the ongoing critical importance of early risk stratification [2]. Although conventional risk scores such as the thrombolysis in myocardial infarction (TIMI) score and the Global Registry of Acute Coronary Events (GRACE) score are widely used to predict mortality risk, these models typically rely on a limited number of variables and may be insufficient to fully capture the complex pathophysiological status of patients [3,4]. Therefore, there is a need for more comprehensive predictive models incorporating novel, next-generation biomarkers to enable clinicians to identify high-risk patients with greater precision.
It is well established that inflammation plays a pivotal role in the pathogenesis of atherosclerosis and plaque rupture, and that a high inflammatory burden is associated with poor prognosis [5,6]. In this context, biomarkers derived from complete blood count parameters, such as the neutrophil-to-lymphocyte ratio (NLR) and the more contemporary systemic immune-inflammation index (SII), have been reported in the literature as strong predictors of adverse cardiovascular outcomes in STEMI patients [7,8]. In addition to inflammatory markers, metabolic and nutritional parameters reflecting prognosis, such as the triglyceride–glucose index (TyG), an indicator of insulin resistance, and the C-reactive protein (CRP)/albumin ratio (CAR), have also been shown to have significant prognostic value [9,10]. However, analyzing a large number of variables that represent diverse physiological pathways (inflammation, metabolism, renal function, etc.) within a single model using traditional linear regression methods introduces notable statistical challenges.
The complexity of medical data and the presence of non-linear relationships among variables necessitate the use of more advanced analytical tools beyond traditional statistical methods [11]. Machine learning (ML) algorithms, a subfield of artificial intelligence (AI), have shown promising results in cardiology due to their superior ability to process high-dimensional data and detect latent patterns among variables [12,13,14,15,16,17,18]. ML-based models not only improve predictive accuracy but also have the potential to inform clinicians about the relative contribution of each parameter to risk.
In this study, we aimed to comprehensively evaluate the prognostic value of multiple inflammatory and metabolic indices (NLR, SII, Systemic Inflammatory Response Index (SIRI), pan-immune-inflammation value (PIV), CAR, TyG, atherogenic index of plasma (AIP), Prognostic Nutritional Index (PNI), and Advanced Lung Cancer Inflammation Index (ALI)) for long-term (24-month) mortality in hospitalized patients diagnosed with STEMI. The study was conducted on a multidimensional dataset integrating these indices with patients’ demographic characteristics, comorbidities (e.g., diabetes, hypertension (HT), chronic obstructive pulmonary disease (COPD), renal failure), laboratory findings (e.g., lipid profile, renal and liver function tests), and procedural parameters (e.g., door-to-balloon time (DTBT), number of diseased vessels, left ventricular ejection fraction (LVEF)). Within this framework, several ML algorithms—including Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), and Artificial Neural Networks (ANN)—were implemented to build a high-performance prognostic model. The key novelty of this study lies in combining multiple pan-inflammatory and metabolic biomarkers rather than focusing on a single inflammatory marker, and in enhancing clinical applicability by making the model’s decision mechanism explainable using SHAP (SHapley Additive Explanations).
2. Materials and Methods
This study was conducted using retrospective clinical data obtained from patients diagnosed with STEMI who presented to the Emergency Department of Amasya University Faculty of Medicine and were subsequently hospitalized, monitored, and treated in the Department of Cardiology. The primary aim of the study was to develop and evaluate ML-based prognostic models capable of predicting mortality in STEMI patients. The dataset comprised multidimensional clinical variables, including demographic data (age, sex, body mass index (BMI)), comorbidities (diabetes mellitus (DM), HT, COPD, chronic kidney disease (CKD), atrial fibrillation (AF), and history of cerebrovascular accident (CVA)), cardiac and procedural parameters (DTBT, number of diseased vessels, and LVEF), and laboratory markers (complete blood count (CBC), glucose, creatinine, C-reactive protein (CRP), hemoglobin A1c (HbA1c), lipid profile, and liver function tests). In addition, composite indices reflecting inflammatory and metabolic processes (NLR, SII, SIRI, PIV, CAR, TyG, AIP, PNI, and ALI) were calculated for each patient.
2.1. Study Population
This retrospective observational study included adult individuals aged ≥18 years with a confirmed diagnosis of STEMI who were followed and treated in the Department of Cardiology at Amasya University Faculty of Medicine between 10 January 2023 and 18 July 2023. The study was approved by the Amasya University Non-Interventional Clinical Research Ethics Committee (Decision No: 2025/250; approval date: 12 December 2025). Due to the retrospective nature of the study, the requirement for obtaining informed consent from patients was waived by Amasya University Rectorate Non-Interventional Clinical Research Ethics Committee.
Following ethical approval, demographic, clinical, laboratory, and angiographic data were retrospectively reviewed using the hospital information system. The diagnosis of STEMI was confirmed by ST-segment elevation on a 12-lead ECG, typical chest pain and/or elevated cardiac troponin levels, and was supported by coronary angiography findings in all patients.
Inclusion criteria were adult patients with a confirmed STEMI diagnosis who underwent pPCI and had complete clinical and laboratory data. Patients with acute infection, active malignancy, chronic inflammatory disease, advanced liver failure, hematologic disorders, or a history of immunosuppressive therapy were excluded. Cases with missing laboratory records, insufficient angiographic data, or lacking 24-month mortality follow-up were also excluded from the analysis. A total of 1500 records of patients with acute coronary syndrome (ACS) were screened. Of these, 1171 patients were excluded based on the predefined criteria, including missing laboratory data (n = 400), active malignancy (n = 50), chronic inflammatory disease (n = 100), insufficient angiographic data (n = 621), and other exclusion conditions as described above. After applying strict inclusion and exclusion criteria, 329 patients with STEMI were included in the final study cohort. Mortality was defined as long-term all-cause mortality occurring during a mean 24-month follow-up period. Follow-up mortality data were verified against the patients’ hospital records and the Central Population Management System (MERNIS-TÜRKİYE), where deaths are registered in our country.
In the study, all patients without AF received ticagrelor plus acetylsalicylic acid (ASA) therapy for one year, whereas patients with AF were treated with ASA for one week and clopidogrel plus a direct oral anticoagulant for one year.
The methodological workflow of the study, from data collection to model development and performance evaluation, is systematically summarized in Figure 1. This flowchart comprehensively reflects all stages of the research, including patient selection, data preprocessing, modeling, and performance evaluation.
2.2. Clinical and Procedural Parameters
Clinical and procedural parameters of STEMI patients included length of hospital stay, DTBT, vascular involvement score, and LVEF. Length of hospital stay reflects clinical recovery time and the risk of developing complications, whereas DTBT represents reperfusion success and treatment effectiveness. The vascular involvement score was determined according to the presence of severe obstructive coronary artery disease identified on coronary angiography and indicates the overall coronary atherosclerotic burden. Based on the number of major epicardial coronary arteries with significant involvement, the vascular score was classified as follows: Score 1 if one major epicardial vessel was involved, score 2 if two vessels were involved, and score 3 if three vessels were involved. These parameters were analyzed to assess their associations with acute clinical management and long-term mortality in STEMI patients.
2.3. Laboratory Parameters
Laboratory findings at presentation were evaluated as key biomarkers providing information on systemic inflammatory response, metabolic status, and cardiac function. All laboratory variables were examined in three main groups:
CBC: White blood cell count (WBC), hemoglobin (HGB), platelet count (PLT), neutrophil, lymphocyte, and monocyte values were analyzed. These parameters were used to assess inflammatory activity and hematological balance. In particular, the NLR is an important indicator reflecting systemic inflammation [19].
Metabolic–Biochemical Profile: Glucose, creatinine, estimated glomerular filtration rate (eGFR), liver enzymes (alanine aminotransferase (ALT), aspartate aminotransferase (AST), and alkaline phosphatase (ALP)), uric acid, CRP, troponin, and albumin levels were evaluated. These parameters were analyzed to assess metabolic status, renal and hepatic function, inflammatory burden, and myocardial injury.
Lipid Profile: Lipid profile (total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides (TGs)) and HbA1c levels were measured. These markers were used to evaluate cardiometabolic risk and the severity of atherosclerotic processes.
All laboratory data were analyzed to determine the effects of metabolic stress, inflammation, and cardiac injury on mortality in STEMI patients.
2.4. Calculated Indices and Ratios
To comprehensively assess systemic inflammatory activity, metabolic status, and nutritional indicators in STEMI patients, several composite indices with established utility in the literature were calculated from laboratory parameters. These indices were evaluated in two main categories: inflammatory indices and metabolic/prognostic indices.
2.4.1. Inflammatory Indices
Hematological ratios reflecting inflammatory response were calculated using the following formulas:
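In their conventional form, assumed here to match the definitions applied in this study, with N, L, M, and P denoting the absolute neutrophil, lymphocyte, monocyte, and platelet counts from the same admission blood sample (all in the same unit, typically 10^9/L):

```latex
\begin{align}
\mathrm{NLR}  &= \frac{N}{L} \\
\mathrm{SII}  &= \frac{P \times N}{L} \\
\mathrm{SIRI} &= \frac{N \times M}{L} \\
\mathrm{PIV}  &= \frac{N \times P \times M}{L}
\end{align}
```

Because every index shares the lymphocyte count in the denominator, a low lymphocyte count raises all four values at once, which is one reason they tend to be strongly correlated.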
These indices quantitatively reflect the level of systemic inflammatory response and were used as potential biomarkers for predicting post-STEMI mortality risk.
2.4.2. Metabolic and Prognostic Indices
Metabolic status and nutritional indices were calculated as follows:
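In their conventional form, assumed here to match the definitions applied in this study (the units shown are those usually attached to these formulas in the literature and are not stated in the text), with TG for triglycerides, Glu for fasting glucose, Alb for serum albumin, and L for the absolute lymphocyte count:

```latex
\begin{align}
\mathrm{CAR} &= \frac{\mathrm{CRP}}{\mathrm{Alb}} \\
\mathrm{TyG} &= \ln\!\left(\frac{\mathrm{TG}\,[\mathrm{mg/dL}] \times \mathrm{Glu}\,[\mathrm{mg/dL}]}{2}\right) \\
\mathrm{AIP} &= \log_{10}\!\left(\frac{\mathrm{TG}\,[\mathrm{mmol/L}]}{\mathrm{HDL\text{-}C}\,[\mathrm{mmol/L}]}\right) \\
\mathrm{PNI} &= 10 \times \mathrm{Alb}\,[\mathrm{g/dL}] + 0.005 \times L\,[\mathrm{cells/mm^{3}}] \\
\mathrm{ALI} &= \frac{\mathrm{BMI}\,[\mathrm{kg/m^{2}}] \times \mathrm{Alb}\,[\mathrm{g/dL}]}{\mathrm{NLR}}
\end{align}
```

PNI and ALI decrease as nutritional status worsens, whereas CAR, TyG, and AIP increase with inflammatory or metabolic burden, so the expected direction of association with mortality differs between the two groups.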
These indices were used to evaluate—within a multidimensional framework—the impact of inflammatory burden, metabolic dysregulation, and nutritional status on cardiovascular mortality.
2.5. Statistical Analysis
All statistical analyses were performed using SPSS version 25.0 (IBM Corp., Armonk, NY, USA). Continuous variables were summarized as mean ± standard deviation (SD), and categorical variables as counts and percentages (%). The distribution of continuous variables was assessed using the Shapiro–Wilk test. For non-normally distributed variables, the Mann–Whitney U test was used, whereas the independent-samples t-test was applied for normally distributed variables. Differences between categorical variables were analyzed using the chi-square (χ²) test or z-test, as appropriate.
In addition, differences between patients with and without a prior percutaneous coronary intervention (PCI), as well as between mortality groups, were evaluated separately. Statistical significance was set at p < 0.05. Furthermore, receiver operating characteristic (ROC) curve analysis was performed to identify variables with the highest discriminatory power for predicting mortality, and the area under the curve (AUC) was calculated.
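As an illustration of what the AUC measures, the sketch below computes it as the probability that a randomly chosen patient who died has a higher value of the variable than a randomly chosen survivor, counting ties as one half. This is a minimal standard-library sketch, not the SPSS procedure used in the study; the function name `roc_auc` and the example values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical example values: door-to-balloon time (min) and mortality (1 = died)
dtbt = [35, 42, 38, 60, 55, 40, 70, 33]
died = [0, 0, 0, 1, 1, 0, 1, 0]
auc = roc_auc(dtbt, died)
```

An AUC of 0.5 corresponds to no discrimination, and values below 0.5 indicate that lower values of the variable are associated with mortality, as would be expected for albumin or LVEF.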
Univariate statistical analyses were conducted to describe baseline group differences and to provide clinical context. However, inclusion of variables in the machine learning models was not restricted based on univariate statistical significance. Since machine learning algorithms are capable of capturing nonlinear relationships and interaction effects that may not be evident through conventional hypothesis-driven testing, all clinically relevant variables were retained for model training.
2.6. Classification
The classification process was based on multidimensional clinical, laboratory, and procedural data obtained from STEMI patients, with the aim of developing predictive models for long-term mortality.
Five supervised ML algorithms were implemented: LR, RF, XGBoost, SVM, and ANN. The dataset was partitioned into 70% training and 30% independent test sets using stratified sampling to preserve the original class distribution across both subsets.
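A stratified 70/30 partition can be sketched as follows: indices are shuffled within each outcome class and the same fraction of each class is assigned to the test set. This is an illustrative standard-library sketch rather than the code used in the study; the function name, the seed, and the count of 36 deaths (inferred from the approximately 11% mortality among 329 patients) are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.30, seed=42):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # shuffle within the class only
        n_test = round(len(idx) * test_fraction)  # same fraction for every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Illustrative cohort of the reported size: 329 patients, 36 deaths assumed
labels = [1] * 36 + [0] * 293
train_idx, test_idx = stratified_split(labels)
test_deaths = sum(labels[i] for i in test_idx)
```

With these assumed counts the test set contains 99 patients, 11 of whom died, so test-set sensitivity moves in steps of roughly nine percentage points per patient.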
The dataset exhibited significant class imbalance, with mortality cases representing approximately 11% of the total population. To address this, the Synthetic Minority Over-sampling Technique (SMOTE) was applied exclusively to the training set prior to model fitting, generating synthetic samples for the minority class to achieve balanced class distribution. Importantly, SMOTE was performed only within the training partition at each iteration to prevent data leakage into the test set. Additionally, algorithms supporting imbalance handling (LR, RF, and SVM) were configured with balanced class weights to further penalize misclassification of the minority class and improve sensitivity.
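The core idea of SMOTE is shown in the sketch below: each synthetic sample is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. This is a simplified standard-library illustration, not the implementation used in the study; the function name `smote`, the value of k, and the example feature vectors are hypothetical. Only training rows are passed to it, so no test-set information is used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen one
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical standardized training features (two per patient), minority class only
train_minority = [[1.0, 2.0], [1.5, 1.8], [0.8, 2.4], [1.2, 2.1]]
n_majority_train = 10  # assumed size of the majority class in the training set
new_rows = smote(train_minority, n_new=n_majority_train - len(train_minority), k=3)
```

Because the synthetic points are interpolations, they always fall inside the region already spanned by the observed minority cases; SMOTE balances the classes but does not add new clinical information.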
To reduce the risk of performance overestimation associated with a single train–test split, model robustness was evaluated using Repeated Stratified 5-Fold Cross-Validation (5 folds × 10 repetitions = 50 iterations) applied exclusively to the training dataset. Performance metrics were averaged across folds, and standard deviations were calculated to assess stability. Final model performance metrics were subsequently evaluated on the independent hold-out test set.
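The resampling scheme can be sketched as a generator of index pairs: in each of the 10 repetitions, the training indices are reshuffled within each class and dealt into 5 folds, each of which serves once as the validation fold. This is a minimal standard-library sketch under assumed counts (230 training patients, 25 deaths); the function name and seed are hypothetical, and the study's actual implementation is not stated.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each repeat reshuffles within classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            # Deal indices round-robin so every fold gets a near-equal class share
            for pos, i in enumerate(shuffled):
                folds[pos % n_splits].append(i)
        for f in range(n_splits):
            val = sorted(folds[f])
            train = sorted(i for g in range(n_splits) if g != f for i in folds[g])
            yield train, val


# Illustrative training partition: 230 patients with 25 deaths (assumed counts)
train_labels = [1] * 25 + [0] * 205
splits = list(repeated_stratified_kfold(train_labels))
```

Within each of the 50 iterations, oversampling would be applied to the `train` indices only, and the model would then be scored on the untouched `val` indices.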
Continuous variables were standardized using z-score normalization prior to model training. Missing values were handled using median imputation. Hyperparameter configurations were predefined based on prior literature and empirical testing within the training dataset.
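The sketch below illustrates median imputation followed by z-score standardization for a single feature. It learns the median, mean, and SD from the training values and reuses them for the test values; fitting on the training partition only and imputing before scaling are assumptions made here to avoid information leaking from the test set, since the text does not specify either. Function names and example values are hypothetical.

```python
import statistics


def fit_preprocessor(train_column):
    """Learn median, mean, and SD from the training values of one feature."""
    observed = [v for v in train_column if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in train_column]
    mean = statistics.fmean(filled)
    sd = statistics.pstdev(filled)
    # Guard against a constant feature, which would give a zero SD
    return {"median": median, "mean": mean, "sd": sd if sd > 0 else 1.0}


def transform(column, params):
    """Impute missing values with the training median, then apply the z-score."""
    filled = [params["median"] if v is None else v for v in column]
    return [(v - params["mean"]) / params["sd"] for v in filled]


# Hypothetical albumin values (g/dL); None marks a missing measurement
train_albumin = [3.0, 4.0, None, 5.0, 4.0]
test_albumin = [None, 5.0]
params = fit_preprocessor(train_albumin)
train_z = transform(train_albumin, params)
test_z = transform(test_albumin, params)
```

Tree-based models such as RF and XGBoost are insensitive to this rescaling, whereas LR, SVM, and ANN generally benefit from it.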
Model performance was assessed using accuracy, sensitivity, specificity, precision, F1-score, and AUC-ROC. SHAP analysis was applied to the best-performing model to quantify the contribution of each variable to mortality prediction and enhance clinical interpretability.
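The threshold-based metrics listed above all derive from the four cells of the confusion matrix, as the following sketch shows. The function name and the example predictions are hypothetical and do not correspond to results from this study; the counts are chosen to resemble a test set of 99 patients with 11 deaths.

```python
def classification_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics with death (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall for the mortality class
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical predictions: 9 true positives, 2 missed deaths, 4 false alarms
y_true = [1] * 11 + [0] * 88
y_pred = [1] * 9 + [0] * 2 + [1] * 4 + [0] * 84
metrics = classification_metrics(y_true, y_pred)
```

With this class distribution, a model that predicts survival for every patient would reach 88/99 (about 89%) accuracy with zero sensitivity, which is why sensitivity, precision, and F1-score are more informative than accuracy alone on imbalanced data.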
All input features used for machine learning model training were derived from the variables presented in Table 1, Table 2, Table 3 and Table 4, including demographic characteristics, clinical parameters, procedural variables, laboratory measurements, and calculated inflammatory–metabolic indices. No additional or hidden variables beyond those explicitly reported in these tables were included in the modeling process.
4. Discussion
In this study, we present a comprehensive modeling framework in which multivariable clinical, laboratory, and inflammatory indices were analyzed using ML methods to predict long-term mortality in patients with STEMI. Our findings demonstrate that both conventional risk determinants (e.g., age, LVEF, and DTBT) and next-generation inflammatory–metabolic indices (e.g., SIRI, TyG, and CAR) exert a substantial impact on mortality. The highest predictive performance was achieved with the XGBoost and LR models, both of which predicted mortality with high accuracy (98–99%). Moreover, SHAP analysis confirmed that prolonged DTBT, low serum albumin, low ALI, and a high vascular score were the parameters most strongly associated with mortality.
When the demographic characteristics of our cohort were examined, the distributions of age and sex were consistent with those reported in classical STEMI cohorts in the literature [20]. However, the significantly lower BMI observed in the mortality group (21.5 kg/m²) supports the concept of the “obesity paradox,” which remains controversial but has been repeatedly discussed in acute coronary syndrome (ACS) [21]. In a large-scale study by Bucholz et al. (2012), lower BMI values were associated with increased mortality among patients with ACS, and malnutrition/cachexia was suggested to reduce myocardial reserve [22]. Excessive activation of neurohormonal and inflammatory pathways may accelerate catabolic processes and impose an additional burden on myocardial metabolism. Malnourished patients may be metabolically insufficient to meet the increased myocardial energy and nutritional demands. This energy imbalance and reduced physiological reserve may represent one of the key mechanisms explaining the strong relationship between malnutrition and adverse outcomes and increased mortality in ACS [23]. In our study, the lower BMI, albumin, and PNI values in the mortality group similarly reflect the detrimental prognostic effect of malnutrition.
With respect to clinical parameters, the decisive role of DTBT on mortality aligns closely with prior evidence. Nallamothu et al. (2015) demonstrated that each 10 min delay in DTBT linearly increased mortality risk [24]. DTBT delay prolongs the duration of myocardial ischemia, thereby increasing irreversible myocyte injury [25]. Importantly, contemporary understanding emphasizes that successful myocardial perfusion is not limited to epicardial coronary artery patency; microvascular functionality also plays a pivotal role. Prolonged DTBT not only increases ischemic time but also substantially raises the risk of no-reflow and microvascular dysfunction, adversely affecting the final infarct size [26]. Consistent with this, DTBT emerged as the most influential variable in our SHAP analysis, and the mean DTBT was significantly longer in the mortality group (53.6 min; p < 0.001). This finding reaffirms that the timing of revascularization remains the most critical determinant of survival, independent of modern pharmacotherapy and interventional techniques.
Recent studies suggest that beyond the NLR, composite indices incorporating platelets and monocytes, such as SII and SIRI, may better reflect prognosis after STEMI [7,13]. Our results support this notion by demonstrating significantly higher levels of monocyte-based indices (SIRI) and the TyG index, which reflects metabolic dysregulation, in the mortality group. In addition, the strong association between mortality and low ALI (a parameter widely used in oncology but only recently explored in cardiology) suggests that ALI may also serve as a prognostic marker in STEMI. In contrast to some reports, we did not observe significant associations between SII, NLR, or AIP and mortality in our cohort (p > 0.05). This may be attributable to the limited sample size. Nevertheless, these findings indicate that in complex clinical syndromes such as STEMI, where prognosis is influenced by numerous factors, a single biomarker may be insufficient for holistic risk assessment. Accordingly, our data support the hypothesis that comprehensive and complex algorithms integrating multiple parameters may provide more reliable guidance for clinical decision-making.
AI refers to machine-based technologies capable of learning by mimicking human cognitive functions, extracting meaningful patterns from complex datasets, and generating dynamic solutions based on these data. ML, a key subfield of AI, comprises algorithms that systematically analyze data, discover latent patterns, and optimize predictive capacity as they are exposed to additional data over time [27]. This ability to “learn” and improve distinguishes ML from conventional analytical approaches. In the literature, ML-based strategies for mortality prediction in STEMI have been shown to outperform traditional risk scores such as TIMI or GRACE. Aziz et al. (2021) applied ML algorithms in 6299 STEMI patients to predict short- and long-term mortality and reported significantly higher AUC values for ML models compared with the TIMI score (0.88 vs. 0.81) [28]. In line with these findings, our ML models, particularly XGBoost and RF, demonstrated strong predictive performance beyond conventional logistic modeling. While more complex deep learning (DL) or LLM-based models are increasingly applied in large-scale or unstructured datasets, their advantages are less evident in small structured tabular datasets. Given the sample size and the nature of the available features, tree-based algorithms and classical machine learning models were considered methodologically more appropriate and less prone to overfitting.
Furthermore, Han et al. (2025) developed explainable ML models incorporating systemic inflammation indices to predict malignant ventricular arrhythmias in STEMI patients and reported an AUC of 0.925 for the RF model [13]. In our study, the AUC achieved by XGBoost was 0.999, indicating even higher discrimination. This difference may be explained by our integration of multiple indices (SII, SIRI, TyG, CAR, AIP, PIV, etc.) rather than relying on a single inflammatory marker, as well as the inclusion of clinically critical variables (DTBT, LVEF, and COPD). Collectively, this multidimensional approach may better represent biological heterogeneity.
In another relevant study, Fedai et al. (2024) used the CALLY index to predict the no-reflow phenomenon in STEMI and demonstrated that the XGBoost algorithm achieved high accuracy (>90%) [29]. Our work expands this perspective by combining multiple inflammatory and metabolic indices rather than focusing on a single biomarker, thereby yielding a potentially stronger model for clinical decision support.
Regarding interpretability, SHAP analyses revealed clinically meaningful drivers of mortality risk. Prolonged DTBT, presence of COPD, low BMI, longer hospital stay, and reduced LVEF were identified as the strongest factors increasing mortality risk. These results align with previous studies emphasizing the critical importance of reperfusion timing and comorbidity management in STEMI [28,30]. Additionally, protective factors identified by SHAP (e.g., higher albumin and lower CRP) support the prognostic role of inflammation, consistent with prior evidence highlighting the value of inflammatory biomarkers in STEMI prognosis [13].
This study has several limitations. First, it was conducted using a single-center retrospective design, which may introduce selection bias and limit the generalizability of the findings. Although our ML models demonstrated excellent performance in the internal dataset, they were not externally validated; therefore, testing in independent populations is required. Second, the predictors included in the model were limited to routinely available clinical and laboratory data. More comprehensive hemodynamic measurements, advanced imaging parameters (e.g., coronary flow reserve, microvascular perfusion indices), and richer long-term follow-up data were not available and were therefore excluded. Third, because reperfusion strategies, stent types, and pharmacological treatments (e.g., antiplatelet and statin dosing) were not uniformly documented across all cases, these factors may represent potential confounders that could influence model performance. Despite these limitations, integrating prognostic markers—whose utility in STEMI is supported by the literature—with AI/ML technologies holds substantial promise. We believe that combining these data with large-scale electronic health records to develop next-generation prognostic models will provide a strategic roadmap for future research and strengthen clinical decision-support systems. Additionally, more complex DL architectures were not extensively explored due to the limited sample size, and future studies with larger multicenter datasets may further evaluate their potential incremental benefit.
Although all clinically relevant variables were retained for model training, explicit feature selection or dimensionality reduction techniques were not applied. Tree-based algorithms such as XGBoost and RF inherently perform embedded feature selection through their split optimization process, thereby reducing the risk of unnecessary model complexity. Nevertheless, future studies may explore sensitivity analyses with reduced feature subsets to further evaluate model parsimony and generalizability.
Despite these methodological considerations, the overall objective of the present model was to provide a clinically applicable and interpretable prognostic framework rather than to maximize algorithmic complexity.
From a clinical perspective, the proposed ML model may be implemented as a bedside or web-based risk calculator using routinely available admission parameters. Integration into electronic health record systems could allow automatic risk estimation immediately after pPCI, facilitating early identification of high-risk patients who may benefit from closer monitoring or intensified therapeutic strategies. Although conventional risk scores such as TIMI and GRACE remain widely used in STEMI management, a direct comparison within the same dataset was not performed. Future studies should evaluate the incremental predictive value of ML-based approaches relative to established risk scores.
5. Conclusions
In this study, we demonstrated the effectiveness of ML algorithms and next-generation inflammatory–metabolic indices in predicting long-term mortality among patients with STEMI undergoing pPCI.
Based on our findings, the main conclusions are as follows:
The power of AI: Among the developed models, the XGBoost algorithm showed the most clinically robust performance by correctly identifying all mortality cases (100% sensitivity/recall). This suggests that AI-based systems may serve as strong clinical decision-support tools for the early identification of high-risk patients in emergency settings.
The importance of time: DTBT emerged as the most decisive factor associated with mortality. This finding further reinforces the vital importance of rapid revascularization in STEMI management.
Novel biomarkers: In addition to low albumin and low ALI, higher TyG and PIV values were strongly associated with poor prognosis. Because these indices can be easily derived from routine blood tests, they may be incorporated into risk stratification without additional cost.
The obesity paradox: The significantly lower BMI observed in the mortality group highlights the potential role of malnutrition and frailty as important risk factors in STEMI patients.
Overall, this multi-parametric approach—integrating clinical variables, biomarkers, and ML algorithms—may facilitate the development of personalized treatment strategies and contribute to improved survival among high-risk STEMI patients.