3.1. Interventional Cardiologist Phase
In the interventional cardiologist phase, which begins after the patient has been evaluated in the emergency department and referred for coronary angiography or stent placement, the aim is to support interventional cardiologists in identifying patients at high risk of developing cardiogenic shock by analyzing the clinical and laboratory parameters available at this stage. The model incorporates a comprehensive set of features, which are described in the Methods section. These variables were selected to reflect both the hemodynamic and ischemic burden of the patient and were processed using machine learning algorithms to identify the most relevant predictors of cardiogenic shock at this stage.
The analysis focuses on the features ranked most important by Random Forest, which are central to predicting the risk of cardiogenic shock in this phase. These include Killip class, reperfusion type, number of DES, potassium, lesion type (culprit lesion), CKI, and TIMI flow before PCI. These factors contribute most to the predictive power of the model and should therefore be considered in clinical decision-making.
In the interventional cardiologist phase, the Random Forest (RF) model demonstrated strong predictive performance. With an accuracy of 87.10% (95% CI: 75.30–98.90%), a sensitivity of 85.71%, a specificity of 88.24%, and an AUC of 0.9496, the model effectively distinguished patients at risk. It also achieved a low Brier score of 0.1110, indicating well-calibrated probability estimates, which are essential for guiding therapeutic decisions in high-stakes settings such as the cardiac catheterization lab. In comparison, the logistic regression (LR) model achieved a lower accuracy of 74.19% (95% CI: 58.79–89.60%) and a higher Brier score of 0.1567, suggesting that its predictions were less reliably calibrated. Although both models performed well in terms of classification metrics, Random Forest demonstrated superior calibration and overall predictive reliability, as shown in Table 1.
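The accuracy intervals quoted above are consistent with a normal-approximation (Wald) interval computed on 31 test cases. Both the test-set size and the interval method are inferred from the reported figures rather than stated in the text, so the following is only a sketch under those assumptions; the function name `wald_accuracy_ci` is illustrative.

```python
import math

def wald_accuracy_ci(correct, total, z=1.96):
    """Normal-approximation (Wald) confidence interval for an accuracy estimate."""
    p = correct / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    # Clip to [0, 1] because the Wald interval can overshoot for small samples.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Assumed counts (inferred, not reported): 27 of 31 correct for RF, 23 of 31 for LR.
rf_acc, rf_lo, rf_hi = wald_accuracy_ci(27, 31)
lr_acc, lr_lo, lr_hi = wald_accuracy_ci(23, 31)
print(f"RF: {rf_acc:.4f} ({rf_lo:.4f}, {rf_hi:.4f})")
print(f"LR: {lr_acc:.4f} ({lr_lo:.4f}, {lr_hi:.4f})")
```

With so few cases the interval is wide, and a Wilson or bootstrap interval is a common alternative for small samples.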
Additionally, the McNemar test p-value of 0.6171 indicates no statistically significant difference between the classification errors of the two models. Thus, although RF outperforms LR in sensitivity, specificity, and F1-score, the difference in overall performance is not statistically significant on this test set, and the choice between the two models may depend on whether predictive power or model interpretability is the priority in clinical practice.
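For readers unfamiliar with the McNemar test, the sketch below shows the continuity-corrected chi-square form, which uses only the cases on which the two models disagree. The text does not state which variant was used, and the discordant counts in the example are hypothetical rather than taken from the study.

```python
import math

def mcnemar_corrected(b, c):
    """McNemar chi-square test with continuity correction.

    b: cases model A classified correctly and model B misclassified
    c: cases model B classified correctly and model A misclassified
    Returns (statistic, p_value) for one degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical discordant counts, not taken from the study.
stat, p_value = mcnemar_corrected(b=6, c=2)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

When the number of discordant pairs is small, as is likely on a test set of this size, the exact binomial version of the test is often preferred.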
Key parameters, including clinical factors such as Killip class, reperfusion type, number of DES, potassium, culprit lesion, CKI, and TIMI flow before PCI, were evaluated through logistic regression to assess their contribution to predicting the risk of cardiogenic shock in patients undergoing interventional cardiology procedures.
The results from the logistic regression analysis, as illustrated in Table 2, provide significant insights into the prediction of cardiogenic shock risk. Killip class (Coef = 1.1524, p-value < 0.0001) emerges as a highly significant predictor, with a positive coefficient indicating that a higher Killip class, reflecting a worse clinical condition, substantially increases the likelihood of poor outcomes such as cardiogenic shock. Similarly, reperfusion type (Coef = 1.3812, p-value = 0.0029) is significantly associated with improved patient outcomes, where successful reperfusion (e.g., through PCI) significantly reduces the risk of complications and enhances recovery. On the other hand, number of DES (Coef = 0.0795, p-value = 0.5944) does not show statistical significance, suggesting that the number of DES placed may not play a critical role in determining patient outcomes in this context. Potassium (K) (Coef = 0.4891, p-value = 0.2147) exhibits a positive association with outcomes, but this relationship is not statistically significant, indicating that potassium levels may not be a primary predictor in this scenario. Culprit lesion (Coef = 0.1261, p-value = 0.1045) also fails to reach statistical significance, although its inclusion provides valuable information regarding the severity of occlusion. Notably, Creatine Kinase Index (CKI) (Coef = −0.0006, p-value = 0.0052) is identified as a statistically significant negative predictor, with lower CKI levels correlating with poorer outcomes, which aligns with its established role in myocardial injury. Finally, TIMI flow before PCI (Coef = 0.0230, p-value = 0.9337) does not significantly predict patient outcomes in this phase, suggesting that the level of myocardial ischemia prior to intervention may not be a key factor in determining post-procedure prognosis. These findings highlight the critical role of clinical parameters such as Killip class and reperfusion type in predicting outcomes, while emphasizing the limited predictive value of other factors like number of DES and TIMI flow before PCI in the context of cardiogenic shock prediction.
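Logistic regression coefficients are easier to read as odds ratios. The sketch below exponentiates the coefficients quoted above; it assumes they are unstandardized log-odds per one-unit increase with the other predictors held fixed. The coding of categorical variables such as reperfusion type is not given in the text, so the direction of each effect should be checked against that coding.

```python
import math

# Coefficients as quoted in the text for Table 2.
coefficients = {
    "Killip class": 1.1524,
    "Reperfusion type": 1.3812,
    "Number of DES": 0.0795,
    "Potassium": 0.4891,
    "Culprit lesion": 0.1261,
    "CKI": -0.0006,
    "TIMI flow before PCI": 0.0230,
}

# exp(coefficient) is the multiplicative change in odds per one-unit increase.
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: OR = {ratio:.4f}")
```

Under these assumptions, each one-step increase in Killip class multiplies the odds of the modeled outcome by roughly 3.2, while the CKI odds ratio is only slightly below 1 because the variable is measured on a much larger numeric scale.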
To assess multicollinearity in the model and ensure the stability of the regression coefficients, we performed a Variance Inflation Factor (VIF) analysis for each of the key variables included in the model. The VIF values provide insight into how much the variance of a regression coefficient is inflated due to collinearity with other variables. A VIF value greater than 10 would typically suggest high multicollinearity, potentially leading to unreliable coefficient estimates. The following table summarizes the VIF values for each parameter, allowing us to assess the degree of multicollinearity and its potential impact on the model’s stability.
The Variance Inflation Factor analysis, as shown in Table 3, indicates that multicollinearity is not a significant concern in the model, as all VIF values are below the commonly accepted threshold of 10. However, the higher VIF values for “TIMI flow before PCI” (4.7122) and “culprit lesion” (4.4707) suggest some degree of overlap in the information they provide regarding the severity of coronary artery disease. Despite this, these variables continue to contribute independently to the predictive model, emphasizing their relevance in assessing the risk of cardiogenic shock.
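As a reminder of what the VIF measures, the sketch below computes it as the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing each predictor on the others. The helper names and the toy data are hypothetical, and this is not the implementation used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[i][i] for i in range(len(columns))]

# Toy predictors (hypothetical values) with a correlation of 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif_values = vif([x1, x2])
print([round(v, 4) for v in vif_values])
```

A VIF of 1 means a predictor is uncorrelated with the others; values approaching the threshold of 10 indicate that most of its variance is explained by the remaining predictors.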
Overall, these findings underscore the clinical utility of combining clinical and angiographic data in predicting cardiogenic shock and other adverse outcomes during the interventional cardiology phase. In this context, it is important to consider all seven key parameters: Killip class, reperfusion type, number of DES, potassium (K), culprit lesion, CKI, and TIMI flow before PCI. While three parameters may appear most representative based on their individual coefficients and statistical significance, disregarding the others could lead to incomplete risk stratification and suboptimal patient management. Each parameter contributes a distinct perspective to the model, reflecting different facets of the patient’s condition. For example, while Killip class and reperfusion type provide insight into the severity of heart failure and the effectiveness of treatment, the potassium level and TIMI flow before PCI offer additional biochemical and pre-treatment information. Additionally, the culprit lesion, although not statistically significant on its own, and CKI, despite its small coefficient, contribute to a fuller understanding of myocardial injury and recovery potential.
Thus, the integration of these seven parameters in the predictive model ensures that clinicians can more accurately assess the risk of cardiogenic shock and other adverse events.
To support the conclusion that it is important to consider all seven parameters, we can analyze the predictive performance of the Random Forest (RF) model using both all seven features and only the top three features.
When using all seven features, the model demonstrates a higher overall accuracy of 87.10%, with strong sensitivity (88.24%) and specificity (85.71%), as well as an AUC of 0.9496. These results indicate that the model discriminates well and can predict the risk of cardiogenic shock while limiting both false positives and false negatives. The F1-score of 86.96% further reflects the model’s ability to balance precision and recall, which is essential for identifying high-risk patients.
In contrast, when the model is limited to just the top three features, accuracy decreases to 80.65%, and the F1-score drops to 80.00%, demonstrating a reduction in the model’s overall predictive capability. Although the sensitivity increases to 92.31%, which indicates better detection of high-risk patients, specificity drops to 72.22%, meaning there are more false positives, and the model is less reliable in identifying low-risk patients. The AUC also drops to 0.8172, further supporting the notion that using fewer features compromises the model’s ability to discriminate effectively between high-risk and low-risk patients.
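The metrics quoted for the three-feature model can be reproduced from a single confusion matrix. The counts below (12 true positives, 1 false negative, 13 true negatives, 5 false positives on 31 cases) are inferred from the reported percentages and are not given in the text, so they should be read as an assumption; the function name is illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and F1 from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Assumed counts inferred from the reported three-feature results.
top3 = classification_metrics(tp=12, fn=1, tn=13, fp=5)
for name, value in top3.items():
    print(f"{name}: {value * 100:.2f}%")
```

The example makes the trade-off concrete: only one high-risk patient is missed, but five low-risk patients are flagged, which is why sensitivity rises while specificity and accuracy fall.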
These findings support using all seven parameters in the predictive model, as restricting it to the top three features reduced performance, particularly in terms of accuracy and overall discrimination. The inclusion of all seven features yields a more robust and reliable model, which is critical in clinical decision-making, especially for managing high-risk patients in the interventional cardiology phase. Therefore, even though some parameters appear less significant individually, their collective contribution is important for achieving optimal model performance and supporting better clinical outcomes.
3.2. Cardiac Intensive Care Unit
The prediction of cardiogenic shock in patients within the cardiac intensive care unit is a critical element of personalized medical care. Early identification of at-risk patients allows for prompt intervention, which can significantly improve clinical outcomes.
The initial step in our analysis involves considering a comprehensive set of 45 key parameters for assessing patients in the CICU, covering demographic data, cardiovascular risk factors, clinical and biological parameters, ECG findings, echocardiographic findings, and angiographic parameters, with additional data such as the occurrence of cardiac arrest during hospitalization, events of resuscitation during hospitalization, ICU admission, the need for ventilatory support, and the use of inotropic and vasopressor support. Each of these factors plays a role in predicting patient outcomes and informing clinical decisions in this critical phase.
Using Random Forest, we perform feature selection to identify the key parameters that are most predictive of cardiogenic shock. These key parameters, which include Killip class, hemoglobin, pain onset, heart rate, age, urea, and sex, are then used in the predictive models. The subsequent prediction is performed using two modeling approaches, logistic regression and Random Forest, to assess the effectiveness of the selected key parameters in predicting cardiogenic shock outcomes, as demonstrated in Table 4.
The Random Forest model outperforms the logistic regression model in accuracy, sensitivity, F1-score, and AUC. With an accuracy of 80.77%, Random Forest demonstrates a higher ability to correctly classify both positive and negative outcomes compared to logistic regression, which has an accuracy of 76.92%. The sensitivity of Random Forest (80.00%) is also higher than that of logistic regression (73.33%), meaning it is better at identifying patients at high risk of cardiogenic shock.
Specificity remains the same for both models at 81.82%, indicating that both models are equally effective in identifying patients at low risk. However, the higher F1-score of Random Forest (0.8090) compared to logistic regression (0.7734) demonstrates that Random Forest provides a better balance between precision and recall.
Finally, the AUC of Random Forest (0.8667) is superior to that of logistic regression (0.7818), suggesting that Random Forest has a better overall discriminatory ability in predicting cardiogenic shock outcomes.
The p-value of 1.0000 from McNemar’s test indicates that there is no statistically significant difference in misclassification between the two models. This suggests that while the Random Forest model has a slightly higher AUC, both models have comparable performances, and the choice of model may depend on other factors such as interpretability and computational efficiency.
The analysis of logistic regression coefficients reveals that Killip class is a highly significant predictor for the risk of cardiogenic shock, as demonstrated by its low p-value and positive coefficient. Although other variables such as hemoglobin, pain onset, heart rate, age, and urea are included in the model, none of them show significant associations with the outcome, as evidenced by their high p-values, as shown in Table 5.
However, these variables may still contribute to the predictive model, as their VIF values indicate low to moderate levels of multicollinearity, suggesting that they do not substantially overlap with other predictors. For instance, hemoglobin and BUN have reasonable VIFs, but their lack of statistical significance (p-values above 0.05) suggests that they do not strongly influence the prediction of cardiogenic shock in this model.
Thus, while Killip class remains the most significant parameter for predicting cardiogenic shock, the potential importance of the other variables in specific clinical contexts should not be overlooked, as shown in Table 6. These factors, despite not being significant in this analysis, may still hold predictive value when considered in combination with other clinical parameters or in more refined models.
Given the large number of candidate variables (45 parameters), Random Forest (RF) was used for feature selection, focusing on the top seven most important predictors. The model using all seven features achieved an accuracy of 0.8077 (95% CI: 0.6154–0.8846). In contrast, when the model was restricted to the single top parameter (Killip class), performance dropped, with an accuracy of 0.7692, a sensitivity of 0.6923, and an AUC of 0.7758. These results indicate that, while a single highly significant predictor offers some predictive value, using all seven key parameters, such as Killip class, hemoglobin, and heart rate, leads to a more comprehensive and reliable assessment of cardiogenic shock risk. The full model outperforms the simplified version, suggesting that the additional parameters provide complementary information. Therefore, incorporating all selected parameters is important for enhancing prediction accuracy and enabling timely, targeted interventions in the cardiac ICU.
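The accuracy interval reported for the seven-feature model is asymmetric around the point estimate, which is what a percentile bootstrap typically produces. The text does not state how the interval was obtained, so the sketch below is only an assumption about the general approach; the counts (21 of 26 correct) are inferred from the reported accuracy, and the function name, seed, and number of resamples are illustrative.

```python
import random

def bootstrap_accuracy_ci(outcomes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    outcomes: list of 1 (correct prediction) and 0 (incorrect prediction).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the per-case outcomes with replacement and recompute accuracy.
    accuracies = sorted(
        sum(rng.choices(outcomes, k=n)) / n for _ in range(n_resamples)
    )
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return accuracies[lower_index], accuracies[upper_index]

# Assumed outcomes: 21 correct and 5 incorrect predictions on 26 test cases.
outcomes = [1] * 21 + [0] * 5
point_estimate = sum(outcomes) / len(outcomes)
lower, upper = bootstrap_accuracy_ci(outcomes)
print(f"accuracy = {point_estimate:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The exact bounds depend on the seed and the resampling scheme, so this sketch is not expected to reproduce the reported interval digit for digit.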
In this study, Random Forest was used not only as a classifier but also for identifying the most predictive clinical parameters associated with cardiogenic shock in the cardiac intensive care unit. With a Brier score of 0.1505, the model demonstrated both high discriminative ability and reliable probability estimates. This level of calibration is particularly relevant in medical contexts, where accurate risk prediction is critical for guiding clinical decisions. In comparison, the logistic regression model, although showing competitive accuracy and AUC, had a higher Brier score of 0.1870, suggesting less precise probability estimates. These findings support the use of calibrated models like Random Forest for clinical decision support where both classification accuracy and risk reliability are essential.
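For reference, the Brier score is the mean squared difference between the predicted probability and the observed binary outcome, so lower values indicate better probability estimates. The sketch below uses made-up probabilities and labels purely to illustrate the calculation.

```python
def brier_score(probabilities, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    squared_errors = [(p - y) ** 2 for p, y in zip(probabilities, labels)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical predictions for four patients (1 = cardiogenic shock).
probabilities = [0.9, 0.2, 0.7, 0.1]
labels = [1, 0, 1, 0]
score = brier_score(probabilities, labels)

# Baseline: an uninformative model that always predicts 0.5.
baseline = brier_score([0.5] * len(labels), labels)
print(f"Brier score = {score:.4f}, baseline = {baseline:.4f}")
```

A model that always predicts 0.5 scores 0.25, which gives a rough reference point for the values of 0.1505 and 0.1870 reported above.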
The Random Forest model, enhanced through feature selection and VIF analysis, offers a strong balance between predictive performance and interpretability. Identifying key variables not only improves model stability but also contributes to building practical tools that support early recognition of patients at risk of cardiogenic shock.
To further strengthen model reliability, it is important to explore additional algorithms. Models such as Extra Trees and Decision Trees may better capture feature interactions, while probabilistic approaches like QDA and Naïve Bayes can improve understanding of prediction uncertainty. Support Vector Machines, K-Nearest Neighbors, and ensemble methods like Gradient Boosting and AdaBoost offer complementary strengths that may enhance predictive accuracy and generalizability.
Overall, by combining key feature selection with diverse machine learning strategies, this approach supports the development of more accurate and reliable predictive tools for early detection of cardiogenic shock. These tools can improve clinical outcomes by enabling timely and targeted interventions.
Table 7 presents the performance results for the interventional cardiologist dataset (BD_ES_Interventional.csv). Among the 11 evaluated ML models, RF and QDA achieved the highest accuracy, both reaching 87.50%. The two models also had identical precision, recall, and F1-score, each at 87.50%. In terms of MCC, RF slightly outperformed QDA (75.59% versus 75.00%). In comparison, NB had the lowest performance (ACC: 71.87%, precision: 74.24%, recall: 71.87%, F1-score: 71.17%, MCC: 46.05%).
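The Matthews correlation coefficient (MCC) summarizes all four cells of the confusion matrix in a single value between −1 and 1. The sketch below shows the calculation with hypothetical counts (14 true positives, 14 true negatives, 2 false positives, 2 false negatives) chosen because they reproduce an accuracy of 87.50% and an MCC of 75.00%; the actual confusion matrices are not given in the text.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator

# Hypothetical counts on 32 test cases, not taken from the study.
tp, tn, fp, fn = 14, 14, 2, 2
accuracy = (tp + tn) / (tp + tn + fp + fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {accuracy:.4f}, MCC = {mcc:.4f}")
```

Because MCC penalizes errors in both classes, two models with the same accuracy can differ in MCC when their errors are distributed differently between false positives and false negatives.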
Table 8 shows the results for the CICU dataset (BD_ES_CICU.csv). Among all ML models, RF and QDA demonstrated the best performance. For RF, the metrics are: ACC: 84.37%, precision: 85.62%, recall: 84.37%, F1-score: 84.23%, and MCC: 69.99%. Similarly, QDA achieved ACC: 84.37%, precision: 84.50%, recall: 84.37%, F1-score: 84.35%, and MCC: 68.88%. On the other hand, ADA showed the lowest performance, with ACC, precision, recall, and F1-score all at 68.75%, and an MCC of 37.50%.
Notably, QDA was the only model that demonstrated consistently strong performance across all evaluation metrics in both the interventional and intensive care phases, suggesting its robustness and potential for generalized application in dynamic STEMI-CS risk stratification. Our findings support the use of RF and QDA in advanced care phases, offering timely risk estimation that may guide decisions such as extended CICU monitoring, early mechanical support, or stepped-down care in lower-risk patients.
Moreover, the XAI-LIME interpretation of the Random Forest classifier during the interventional cardiologist phase provides a detailed view of how individual features influenced the model’s prediction. The tables below, Table 9 and Table 10, integrate global and local interpretability insights from the Random Forest model used to predict cardiogenic shock. They summarize both the most influential features identified across the entire dataset (global) and the instance-specific contributions revealed by XAI-LIME (local), including directionality and clinical significance.