1. Introduction
Coronary heart disease (CHD) remains one of the leading causes of death and disability worldwide, imposing a substantial clinical and socio-economic burden on healthcare systems [1,2]. In inpatient practice, patients admitted with clinical suspicion of CHD or acute coronary syndromes (ACS) are a particularly challenging category, since diagnostic and therapeutic tactics must be chosen within a short time and on the basis of a limited set of primary examination data [3,4,5]. Current recommendations emphasize the need for an early structured assessment of the risk and likelihood of coronary lesions, as well as timely determination of indications for invasive management and revascularization [6,7]. At the same time, standardization of terms and endpoints (including myocardial infarction) is essential for the correct interpretation of clinical trial results and the construction of reproducible prognostic models [8].
Despite the high diagnostic accuracy of coronary angiography and other coronary imaging modalities, their use may be limited by resource availability, organizational factors, and the clinical stability of the patient [9,10]. For this reason, the role of approaches based on routine non-invasive data available at admission (clinical characteristics, standard laboratory parameters, and electrocardiography) is increasing. These data form the basis of primary triage and are collected in almost all clinics, which makes them attractive for the development of scalable diagnostic decision-support tools. At the same time, it remains unclear how informative such an "early" set of indicators is for predicting not only the anatomical severity of coronary atherosclerosis, but also the actual clinical decision to perform percutaneous coronary intervention (PCI) during the current hospitalization.
Type 2 diabetes mellitus is one of the key cardiometabolic risk factors and substantially modifies the course of coronary heart disease, the incidence of complications, and patient prognosis [11]. Diabetes is associated with more extensive and diffuse coronary artery disease, a higher incidence of adverse events, and potentially worse revascularization outcomes compared with patients without impaired carbohydrate metabolism [12]. Accordingly, clinical guidelines (ESC/EASD) identify patients with diabetes as a high-risk group requiring systematic consideration of metabolic factors in the diagnosis and treatment of cardiovascular disease [13].
Beyond the presence of diabetes itself, increasing attention is being paid to parameters of glycemic control and metabolic instability. HbA1c has traditionally been used in clinical practice as an indicator of chronic hyperglycemia. However, HbA1c reflects average glucose exposure and does not always capture fluctuations in glycemia, which may have independent pathophysiological and prognostic significance [14]. Glycemic control parameters such as HbA1c and admission glucose are commonly used in clinical practice [15]. At the same time, the additional diagnostic value of glycemic control and glucose variability metrics specifically for early prediction of the need for PCI remains insufficiently studied, especially in the context of retrospective inpatient data from real-world practice.
To support clinical decisions, scoring systems and risk stratification scales (GRACE, HEART, TIMI) are traditionally used; they demonstrate acceptable predictive ability in a number of populations but may transfer poorly across clinics, populations, and clinical scenarios [16]. In particular, universal scales are not always optimal for predicting the "tactical" endpoint of PCI performance, since revascularization is the result of a comprehensive decision that depends on clinical manifestations, biomarker dynamics, ECG findings, angiography availability, and expert assessment by the treating team. This creates the need for models capable of integrating heterogeneous features and accounting for nonlinear interactions between them.
Machine learning (ML) methods are considered a promising tool for analyzing clinical tabular data and ECG features, making it possible to increase the discriminative ability of models by identifying complex dependencies and interactions among risk factors [17,18,19]. A number of studies have shown that ML approaches can improve the prediction of coronary pathology from clinical, laboratory, and electrocardiographic parameters compared with traditional methods [20]. At the same time, introducing ML models into the clinic requires not only high accuracy but also transparency, interpretability, and reproducibility of results. This has driven interest in explainable AI (XAI) methods, including approaches based on Shapley values, as well as compliance with modern recommendations on reporting and risk-of-bias assessment in prediction-model studies [21,22].
A retrospective single-center analysis of routine clinical, laboratory, and ECG data was performed for patients hospitalized with suspected coronary heart disease. PCI during the current hospitalization was considered a clinically significant binary endpoint and a proxy for real-world patient management tactics. This approach allows the potential usefulness of diagnostic models to be assessed at an early stage of hospitalization.
The aim of the study was to evaluate the diagnostic performance of machine learning models based on a basic set of primary examination features, and to assess the added value of glycemic control indicators, including glucose variability metrics, for improving prediction of the need for PCI. The results may be useful for developing non-invasive tools for early risk stratification and clinical decision support in patients with suspected coronary artery disease in a hospital setting.
2. Materials and Methods
2.1. Study Design and Sample Characteristics
This was a retrospective, single-center observational study based on clinical data collected at the National Hospital of the Medical Center of the Administrative Department of the President of the Republic of Kazakhstan (Almaty, Kazakhstan), aimed at evaluating the feasibility of predicting percutaneous coronary intervention (PCI) from clinical, laboratory, and electrocardiographic data.
The initial cohort included 138 patients hospitalized with suspected coronary artery disease between October 2024 and December 2025. Inclusion criteria were hospitalization with suspected coronary artery disease and availability of the clinical, laboratory, and electrocardiographic data required for analysis. Patients with missing information on the primary endpoint (PCI performed during the current hospitalization) were excluded; one patient was excluded on this basis, leaving a final analytical sample of 137 patients.
The primary endpoint was performance of PCI during the current hospitalization (PCI: 1 = performed, 0 = not performed). It should be emphasized that PCI reflects a clinical decision based on a combination of factors, including the results of coronary angiography, the physician's clinical assessment, and organizational features, and is not a direct indicator of the anatomical severity of coronary artery damage.
Additional information on coronary angiography was obtained from available clinical records. Coronary angiography was performed according to clinical indications and was not available for all patients.
Patients were categorized based on both PCI status and angiography status. All patients who underwent PCI had previously undergone coronary angiography. In the non-PCI group, patients represented a heterogeneous population and included: (1) patients who underwent coronary angiography without subsequent indication for PCI, and (2) patients who did not undergo angiography due to clinical stability, contraindications, or organizational factors.
Thus, the non-PCI group does not represent a homogeneous population without significant coronary artery disease, but rather reflects real-world clinical practice, where invasive diagnostics are applied selectively.
Baseline characteristics of the study population are presented in Table 1.
The average age of the patients was 65.1 ± 8.5 years, while the proportion of men reached 62.8%. Type 2 diabetes mellitus was detected in 51.1% of patients. The average values of HbA1c and glucose levels at admission were 7.23 ± 1.89% and 7.86 ± 3.62 mmol/L, respectively.
Among the electrocardiographic signs, ST-segment elevation was observed in 22.6% of patients, ST-segment depression in 12.4%, and a pathological Q wave in 21.2%, while T-wave inversion was recorded in the majority of patients (87.6%).
The overall study workflow, including patient selection, data preprocessing, model development, and evaluation, is illustrated in Figure 1.
2.2. Research Variables
The selection of variables for this study was carried out taking into account their clinical significance, as well as their availability at various stages of the patient’s examination. The analysis included indicators reflecting demographic characteristics, data from the initial clinical examination, laboratory parameters, glycemic status and electrocardiographic signs.
Particular attention was paid to the temporal consistency of the variables used, in particular, the differentiation of the parameters available at the admission stage and the indicators obtained during hospitalization.
All variables used in the analysis are presented and described in detail in Table 2. As shown there, the variable set covers the main aspects of the patient's clinical assessment: demographic and clinical indicators reflect baseline characteristics, while laboratory and glycemic parameters characterize metabolic state. Electrocardiographic signs were used to identify objective markers of myocardial ischemia and electrical disturbances. PCI performed during hospitalization served as the primary endpoint and reflects a clinical decision rather than a direct diagnostic measure of disease severity. This structure of variables allowed several models of varying complexity to be formed: from a basic early assessment model (SAFE) to more advanced models including clinical and laboratory (CLINICAL) and glycemic (EXTENDED) parameters.
2.3. Data Preprocessing
Data preprocessing included several sequential steps aimed at improving the quality and reproducibility of the analysis (Figure 2).
At the first stage, the completeness of the data was checked. One patient with a missing target variable value (PCI performance) was excluded from the analysis. Thus, the final analytical sample consisted of 137 patients.
Next, value plausibility was checked. Physiologically implausible or clearly erroneous values (for example, extreme HbA1c values or abnormal ECG interval values) were treated as missing and replaced with NaN. This approach avoided distortions of the results associated with data entry errors.
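As a minimal sketch, such a plausibility check can be implemented by masking out-of-range values rather than dropping patients. The ranges below are hypothetical illustrations; the study's exact clinical cut-offs are not reported.

```python
import numpy as np
import pandas as pd

# Hypothetical plausibility ranges; the study's exact cut-offs are not reported.
PLAUSIBLE_RANGES = {
    "hba1c": (3.0, 20.0),       # %
    "glucose": (2.0, 40.0),     # mmol/L
    "qt_interval": (200, 700),  # ms
}

def mask_implausible(df: pd.DataFrame, ranges: dict) -> pd.DataFrame:
    """Replace out-of-range values with NaN instead of dropping rows."""
    out = df.copy()
    for col, (lo, hi) in ranges.items():
        if col in out.columns:
            out[col] = out[col].where(out[col].between(lo, hi), np.nan)
    return out

demo = pd.DataFrame({"hba1c": [7.2, 58.0], "glucose": [9.1, 0.1]})
cleaned = mask_implausible(demo, PLAUSIBLE_RANGES)
```

Masking to NaN (rather than deleting rows) keeps every patient in the sample, so the subsequent imputation step can handle these cells like any other missing value.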
Categorical variables were converted to binary format (0/1), which ensured their correct use in machine learning models. In particular, signs such as the presence of diabetes mellitus, as well as electrocardiographic changes (ST segment elevation, ST depression, pathological Q wave, inversion of the T wave), were presented as binary indicators.
The missing values were processed using median imputation for quantitative variables and imputation by the most frequent value for categorical features. It is fundamentally important that the imputation procedure was performed separately in the training subsamples as part of cross-validation, which made it possible to avoid data leakage and ensure the correctness of model evaluation.
Variables with a high proportion of missing values (>50%) were not included in the main analysis, as their use could destabilize the models and increase systematic error.
Additionally, several feature sets (SAFE, CLINICAL, and EXTENDED) were formed to account for data availability at different stages of the clinical examination and to minimize the impact of temporal ambiguity.
2.4. Criteria for Choosing Machine Learning Algorithms
The selection of machine learning algorithms was based on methodological and practical considerations related to the clinical context of the study and the characteristics of the dataset.
Logistic regression was chosen as a baseline model due to its high interpretability and its widespread use in clinical research. This approach enables transparent estimation of feature effects and facilitates the assessment of independent associations between predictors and the outcome.
Random Forest was selected as a nonlinear ensemble method capable of capturing complex interactions between variables and effectively handling heterogeneous clinical data without requiring strict distributional assumptions. In addition, it is relatively robust to noise and less prone to overfitting when applied to small- and medium-sized datasets.
Gradient boosting was additionally included as an alternative ensemble approach to evaluate whether sequential learning strategies could further improve predictive performance.
Given the relatively limited sample size, preference was given to algorithms known to perform reliably under such conditions.
The use of multiple algorithms allows comparison between interpretable linear models and more flexible nonlinear methods, providing a more comprehensive assessment of model performance and robustness.
2.5. Building Models
To address the binary classification task (predicting whether PCI is performed), several machine learning approaches were applied, including both interpretable statistical models and ensemble algorithms.
Three feature sets were defined according to data availability at different stages of patient assessment. The SAFE feature set included age, sex, type 2 diabetes status, heart rate, and ECG findings (ST elevation, ST depression, Q wave, and T-wave inversion). The CLINICAL feature set included all SAFE variables plus body mass index, creatinine, and eGFR. The EXTENDED feature set further included glycemic parameters, namely HbA1c and admission glucose.
Missing data were handled using median imputation within the machine learning pipeline. Physiologically implausible values were treated as missing. Specifically, predefined thresholds were applied for key variables (e.g., HbA1c, ECG intervals, and laboratory parameters), and values outside these ranges were excluded from analysis.
All preprocessing steps, including imputation and scaling of continuous variables, were performed within the training folds as part of the machine learning pipeline to prevent data leakage during cross-validation. Model performance was evaluated using repeated stratified 5-fold cross-validation with 10 repetitions, which allowed for more stable estimates and helped reduce the risk of overfitting in a relatively small dataset. Logistic regression, Random Forest, and Gradient Boosting models were used. Model hyperparameters were predefined based on methodological considerations and were selected to limit model complexity and reduce the risk of overfitting.
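The leakage-safe evaluation scheme described above can be sketched as follows, using a synthetic stand-in for the patient table (the real features and preprocessing details are simplified):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for the 137-patient table (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(137, 10))
X[rng.random(X.shape) < 0.05] = np.nan   # sprinkle missing values
y = rng.integers(0, 2, size=137)         # binary PCI endpoint

# Imputation and scaling sit inside the pipeline, so they are re-fitted
# on each training fold; this is what prevents data leakage.
model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because the imputer and scaler are pipeline steps, `cross_val_score` fits them only on the training portion of each of the 50 splits (5 folds x 10 repeats), never on the held-out fold.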
Logistic regression was used as a baseline model for predicting the probability of performing PCI. The model estimates the probability of the binary outcome using a logistic function applied to a linear combination of predictors:

P(y = 1 | x) = 1 / (1 + exp(-(β0 + β1x1 + … + βpxp))),

where x1, …, xp represent the predictors and β0, β1, …, βp are the model coefficients estimated by maximum likelihood. The predicted probability was used for classification with a threshold of 0.5.
The coefficients of the logistic regression were interpreted in terms of odds ratios:

ORi = exp(βi).

A positive coefficient (for example, βdiabetes > 0) indicates an increased probability of PCI in the presence of the corresponding feature. All the relationships obtained were interpreted as associative and do not imply causality.
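For illustration, odds ratios follow directly from the fitted coefficients. The coefficient values below are hypothetical, chosen so that the diabetes coefficient reproduces the OR of 7.36 reported later in Table 6:

```python
import numpy as np

# Hypothetical coefficients on the log-odds scale; OR = exp(beta).
betas = {"diabetes": 1.996, "pathological_q_wave": 1.0, "age_per_year": -0.03}
odds_ratios = {name: float(np.exp(b)) for name, b in betas.items()}
# exp(1.996) is approximately 7.36, i.e. the OR reported for type 2 diabetes
```

A coefficient above zero maps to an OR above 1 (increased odds of PCI), while a negative coefficient maps to an OR below 1.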
2.5.1. Random Forest
The Random Forest algorithm was used as a nonlinear ensemble machine learning method in the study. This approach is based on constructing a set of decision trees and then combining their predictions.
The hyperparameters of the Random Forest model were selected a priori based on methodological considerations and prior experience with similar clinical datasets. In particular, the number of trees (n_estimators = 300) was chosen to ensure stable ensemble performance, while limiting tree depth (max_depth = 4) and increasing the minimum number of samples required for splitting and leaf nodes (min_samples_split = 10, min_samples_leaf = 5) were intended to reduce model complexity and mitigate overfitting, which is especially important given the relatively small sample size.
A formal hyperparameter optimization procedure (e.g., grid search) was not performed in order to avoid overfitting to the limited dataset and to preserve model generalizability. Each tree was trained on a random subsample of data using a random subset of features, which made it possible to reduce the correlation between the trees and increase the generalizing ability of the model.
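The a priori configuration described above translates directly into scikit-learn; the `random_state` is an added assumption for reproducibility:

```python
from sklearn.ensemble import RandomForestClassifier

# A priori hyperparameters as described in the text.
rf = RandomForestClassifier(
    n_estimators=300,      # enough trees for stable ensemble estimates
    max_depth=4,           # shallow trees limit model complexity
    min_samples_split=10,  # conservative splitting for a 137-patient sample
    min_samples_leaf=5,    # larger leaves smooth the probability estimates
    random_state=42,       # assumed, for reproducibility
)
```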
The final prediction was formed by voting across the ensemble of trees. The probability of PCI was estimated as the proportion of trees voting for the corresponding class:

p̂(x) = (1/T) Σ I(ht(x) = 1), with the sum taken over t = 1, …, T,

where T is the number of trees, ht(x) is the prediction of tree t, and I(·) is the indicator function.
Thus, the random forest model made it possible to take into account nonlinear dependencies and interactions between features, which is especially important when analyzing clinical data with a complex structure.
2.5.2. Gradient Boosting
As an alternative ensemble method, the study considered the Gradient Boosting algorithm, based on the sequential construction of weak models in order to minimize the loss function.
Unlike a random forest, where trees are trained independently, in gradient boosting, each subsequent model is built taking into account the errors of previous models. Thus, the model gradually refines the predictions, reducing the cumulative error.
The final model is represented as the sum of base learners:

FM(x) = Σ γm hm(x), with the sum taken over m = 1, …, M,

where hm(x) is the base model (usually a decision tree), γm is the weight of the corresponding model, and M is the total number of iterations (trees).

At each iteration, the new base model is fitted to the negative gradient of the loss function, which makes it possible to reduce the prediction error in a targeted manner:

rim = -∂L(yi, F(xi)) / ∂F(xi),

where L is the loss function and F(x) is the current prediction of the model.

The model is updated according to the rule:

Fm(x) = Fm−1(x) + η γm hm(x),

where η is the learning rate, which determines the contribution of each subsequent model.
This study used the implementation of gradient boosting with limited tree depth and reduced learning rate, which partially controlled the risk of overfitting. However, given the relatively small sample size, the results of this model were interpreted with caution and used primarily for comparative analysis with more stable models. All analyses were performed in Python (version 3.10, Python Software Foundation, Wilmington, DE, USA) using the following libraries: pandas (version 1.5.3), numpy (version 1.23.5), statsmodels (version 0.13.5), and scikit-learn (version 1.2.2).
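A conservative configuration of this kind might look as follows; the exact hyperparameter values used in the study are not reported, so these are illustrative assumptions:

```python
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative conservative settings (assumed values): shallow trees and a
# reduced learning rate to curb overfitting on a small dataset.
gb = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.05,  # eta in the update rule F_m = F_{m-1} + eta*gamma_m*h_m
    max_depth=2,         # weak learners: very shallow trees
    random_state=42,
)
```

Lowering `learning_rate` while keeping `max_depth` small forces each iteration to make only a small correction, which is the overfitting-control strategy described above.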
2.6. Evaluating Models
The assessment of the quality of the constructed models was carried out using Repeated Stratified K-Fold Cross-Validation. This approach makes it possible to ensure the stability and reproducibility of the results, especially in conditions of a limited sample size. The initial data was divided into 5 equal subsamples (folds) while maintaining the initial distribution of the target variable (stratification). At each iteration, one subsample was used as a test sample, and the remaining four were used to train the model. The procedure was repeated 10 times with various random splits, which allowed us to reduce the variability of estimates and obtain more reliable results.
All stages of data preprocessing, including imputation of missing values and scaling of features, were performed exclusively within training subsamples as part of cross-validation. This approach eliminated information leakage and ensured a correct assessment of the generalizing ability of the models.
To quantify model quality, the following metrics were used: the area under the ROC curve (ROC-AUC), which characterizes the ability of the model to discriminate between classes; the area under the precision–recall curve (PR-AUC), which is especially informative for imbalanced data; accuracy, the proportion of correctly classified observations; and the F1 score, the harmonic mean of precision and recall.
For each metric, the mean and standard deviation were calculated across all cross-validation iterations. Additionally, models based on the different feature sets (SAFE, CLINICAL, and EXTENDED) were compared, which made it possible to evaluate the contribution of each group of variables to predicting PCI performance.
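The four metrics can be computed per fold with scikit-learn; the toy predictions below are purely illustrative:

```python
import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             accuracy_score, f1_score)

# Toy fold predictions to show how each reported metric is computed.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.2, 0.8, 0.6, 0.3, 0.7, 0.9])
y_pred = (y_prob >= 0.5).astype(int)  # 0.5 classification threshold

metrics = {
    "roc_auc": roc_auc_score(y_true, y_prob),            # threshold-free ranking
    "pr_auc": average_precision_score(y_true, y_prob),   # PR-AUC analogue
    "accuracy": accuracy_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
```

Note that ROC-AUC and PR-AUC are computed from the predicted probabilities, while accuracy and F1 depend on the chosen threshold.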
Thus, the chosen evaluation strategy ensured reliable and objective verification of the models, minimizing the risk of overfitting and overestimating their effectiveness.
2.7. Statistical Analysis
Statistical analysis was performed to quantify the characteristics of the sample and identify associations between clinical, laboratory, and electrocardiographic parameters and the likelihood of percutaneous coronary intervention (PCI).
Continuous variables were analyzed with regard to their distribution. Normality was assessed with the Shapiro–Wilk test. Normally distributed data are presented as mean and standard deviation (mean ± SD); otherwise, as median and interquartile range (median [IQR]).
Categorical variables are presented as absolute and relative frequencies:

f = (n / N) × 100%,

where n is the number of observations in the category and N is the total sample size.
To compare the two independent groups (PCI vs. non-PCI), Student's t-test was used for normally distributed data and the Mann–Whitney U test for non-normally distributed data.
The χ² test was used for categorical variables:

χ² = Σ (Oi − Ei)² / Ei,

where Oi is the observed frequency and Ei is the expected frequency for category i. For small expected counts, Fisher's exact test was additionally applied.
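Both tests are available in SciPy. The 2×2 table below is an approximation back-calculated from the reported diabetes prevalences in the PCI and non-PCI groups (an assumption, used here only for illustration):

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Approximate 2x2 table of PCI status by type 2 diabetes, reconstructed from
# the reported percentages (assumed counts, for illustration only).
#                 diabetes  no diabetes
table = np.array([[36, 11],    # PCI group
                  [34, 56]])   # non-PCI group

chi2, p_chi2, dof, expected = chi2_contingency(table)
odds_ratio, p_fisher = fisher_exact(table)  # preferred when expected counts are small
```

`chi2_contingency` also returns the expected counts, which is how one checks whether Fisher's exact test is needed in the first place.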
Statistical significance was assessed using two-sided tests (Table 3). A p-value < 0.05 was considered statistically significant.
The study was reported in accordance with the TRIPOD guidelines.
3. Results
3.1. Comparison Between PCI and Non-PCI Groups
Patients who underwent PCI differed significantly from those who did not in several clinical, metabolic, and electrocardiographic parameters.
The PCI group was younger (62.6 ± 8.7 vs. 66.2 ± 8.2 years, p = 0.024) and had a significantly higher prevalence of type 2 diabetes mellitus (76.6% vs. 37.8%, p < 0.001). In addition, both HbA1c levels (7.82 ± 1.90 vs. 6.72 ± 1.74%, p = 0.006) and admission glucose levels (9.17 ± 4.63 vs. 7.08 ± 2.58 mmol/L, p < 0.001) were significantly higher in patients undergoing PCI.
Coronary angiography was performed in a subset of patients as part of routine clinical evaluation. All patients who underwent PCI had previously undergone coronary angiography, as this procedure is required prior to revascularization. In the non-PCI group, only a proportion of patients underwent angiographic assessment, while the remaining patients received conservative treatment without invasive diagnostics. This reflects real-world clinical practice, in which the decision to perform coronary angiography depends on a combination of factors, including clinical presentation, risk stratification, and the treating physician’s judgment.
The inclusion of patients without angiographic confirmation in the non-PCI group may introduce heterogeneity, as some of these patients could have had significant coronary artery disease that remained undetected due to the absence of invasive evaluation.
Electrocardiographic abnormalities were also more frequent in the PCI group, including ST-segment elevation (39.1% vs. 14.6%, p = 0.003), ST-segment depression (21.3% vs. 7.9%, p = 0.048), and pathological Q waves (39.1% vs. 12.4%, p < 0.001).
No statistically significant differences were observed in BMI, renal function parameters (creatinine and eGFR), or T-wave inversion.
These findings suggest that both metabolic factors and ECG abnormalities are strongly associated with the likelihood of undergoing PCI. The comparison of clinical characteristics between the PCI and non-PCI groups is presented in Table 4.
3.2. Machine Learning Model Results
The results of evaluating the machine learning models on the different feature sets (SAFE, CLINICAL, and EXTENDED) are presented in Table 5.
Values are presented as mean ± standard deviation based on repeated stratified 5-fold cross-validation. The models built on the basis of a SAFE set of features (available at the initial examination stage) demonstrated stable predictive ability. The logistic regression showed a ROC-AUC of 0.734 ± 0.092, while the random forest model showed 0.724 ± 0.092.
The expansion of the model to include standard laboratory parameters (the CLINICAL model) led to a slight improvement in the results. The best performance in this group was demonstrated by a random forest (ROC-AUC = 0.739 ± 0.080), while the logistic regression showed a ROC-AUC of 0.713 ± 0.109.
The highest values were achieved with the EXTENDED model, which includes glycemic parameters. The random forest model showed the maximum ROC-AUC (0.755 ± 0.079), as well as higher accuracy (0.718 ± 0.065) and F1 score (0.571 ± 0.100).
Thus, the addition of glycemic parameters provided a moderate improvement in the predictive ability of the model.
3.3. Interpretation of the Model and Significance of the Features
The results of the multivariable logistic regression analysis are presented in Table 6.
The strongest statistically significant association with PCI was observed for the presence of type 2 diabetes mellitus (OR = 7.36; 95% CI: 2.79–19.40; p < 0.001). This finding highlights the important role of metabolic factors in clinical decision making regarding revascularization (Figure 3).
Other variables, including age, sex, heart rate, electrocardiographic parameters, and renal function indicators, did not demonstrate statistically significant associations with PCI (p > 0.05). In particular, the presence of a pathological Q wave showed a positive but non-significant association (OR = 2.72; p = 0.118), indicating a possible trend that requires further investigation in larger cohorts.
It should be emphasized that the identified relationships are associative and should not be interpreted as causal.
To assess potential multicollinearity among predictors, the variance inflation factor (VIF) was calculated for all variables included in the EXTENDED model (Table 7).
All variables demonstrated low VIF values (range: 1.07–2.02), indicating the absence of substantial multicollinearity. In particular, the glycemic variables (type 2 diabetes, HbA1c, and admission glucose), although clinically related, did not exhibit a level of interdependence that could significantly affect the stability of regression estimates.
These findings suggest that the differences observed between the logistic regression results and the Random Forest feature importance are unlikely to be explained by multicollinearity. Rather, they may reflect inherent differences between modeling approaches, including the sensitivity of logistic regression to variable representation and the ability of ensemble methods to capture nonlinear relationships and interactions between predictors.
Analysis of feature importance in the Random Forest model demonstrated that the greatest contribution to PCI prediction was provided by metabolic parameters, particularly HbA1c and admission glucose levels. Additionally, renal function (eGFR), the presence of type 2 diabetes, and heart rate showed notable importance within the model.
These results suggest that metabolic and systemic factors play a substantial role in the integrated prediction of PCI when using machine learning approaches (Figure 4).
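Impurity-based importances of the kind discussed above can be extracted as follows; the data are simulated, with the outcome driven mainly by the glycemic features by construction:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Simulated data; feature names mirror the study's variables for illustration.
rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(137, 4)),
                 columns=["hba1c", "glucose", "egfr", "heart_rate"])
# Outcome driven mainly by the glycemic columns, by construction.
y = (X["hba1c"] + 0.5 * X["glucose"] + rng.normal(0, 1, 137) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, max_depth=4, random_state=42)
rf.fit(X, y)
importances = pd.Series(rf.feature_importances_,
                        index=X.columns).sort_values(ascending=False)
```

Impurity-based importances sum to 1 and should be read as relative contributions within the model, not as effect sizes comparable to odds ratios.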
Figure 4 presents the results of the multivariable logistic regression analysis, showing the odds ratios (OR) and 95% confidence intervals for each predictor associated with the likelihood of undergoing PCI.
Type 2 diabetes mellitus demonstrated the strongest association with PCI, indicating a substantially increased probability of intervention. Electrocardiographic parameters, including Q wave and ST-segment abnormalities, also contributed to the model, although with wider confidence intervals.
From the perspective of model interpretability, this analysis provides a transparent assessment of the direction and magnitude of the effect of each variable. Such representation aligns with explainable artificial intelligence (XAI) principles, as it enables direct clinical interpretation of the model outputs.
3.4. The Relationship of Glycemic Status with PCI
Analysis of PCI frequency as a function of HbA1c level revealed a clear trend toward more frequent intervention as glycemic control worsened. Patients with elevated HbA1c values underwent PCI significantly more often than patients with normal values (Figure 5).
Additionally, the presence of type 2 diabetes mellitus was also associated with a higher frequency of PCI: the proportion of interventions performed was higher in patients with diabetes than in those without (Figure 6). This is consistent with a significant role of metabolic disorders in coronary pathology.
3.5. Evaluation of Machine Learning Models
The comparative analysis of machine learning models demonstrated that all evaluated approaches showed moderate discriminative ability in predicting PCI.
Models based on the SAFE feature set, which includes only variables available at the time of admission, also demonstrated stable performance. In particular, logistic regression achieved a ROC-AUC of 0.734 ± 0.092, indicating that even early non-invasive data can provide clinically relevant predictive information.
The analysis of mean ROC curves obtained using repeated stratified cross-validation (Figure 7) demonstrated moderate and consistent discriminative performance across the selected models, without clear dominance of any single approach.
Precision–recall analysis (Figure 8), which is particularly informative in the presence of class imbalance, showed comparable performance across the selected models. The Random Forest model demonstrated slightly higher PR-AUC values within the CLINICAL feature set, while logistic regression exhibited stable performance across feature configurations.
Overall, these findings indicate that while more complex models and extended feature sets provide incremental improvements, simpler models based on early available data may still offer clinically useful predictive performance.
Calibration analysis was additionally performed for the best-performing EXTENDED Random Forest model. The calibration plot (Figure 9) and calibration table (Table 8) demonstrated generally acceptable agreement between predicted probabilities and observed event rates, with minor deviations in the lower probability range.
The Brier score was 0.192, indicating a moderate level of overall prediction accuracy. The Hosmer–Lemeshow test showed no statistically significant lack of fit (χ2 = 7.49, p = 0.058), suggesting acceptable calibration of the model. Overall, the results indicate that the predicted probabilities are broadly consistent with the observed event rates.
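Both calibration metrics can be computed from a vector of predicted probabilities and observed outcomes. The sketch below uses illustrative synthetic data (not the study's predictions) and a common decile-based variant of the Hosmer–Lemeshow statistic with df = g − 2; the study's grouping scheme may differ.

```python
# Sketch: Brier score and a decile-based Hosmer–Lemeshow goodness-of-fit test
# (illustrative synthetic data; not the study's predicted probabilities).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=400)  # predicted probabilities
y = rng.binomial(1, p)                 # outcomes drawn from p (well calibrated)

# Brier score: mean squared error between predicted probability and outcome.
brier = np.mean((p - y) ** 2)

# Hosmer–Lemeshow: group by deciles of predicted risk, then compare observed
# event counts with expected counts in each group.
deciles = np.quantile(p, np.linspace(0, 1, 11))
groups = np.digitize(p, deciles[1:-1])  # group index 0..9
hl = 0.0
for g in range(10):
    mask = groups == g
    n, obs, exp = mask.sum(), y[mask].sum(), p[mask].sum()
    hl += (obs - exp) ** 2 / (exp * (1 - exp / n))
p_value = chi2.sf(hl, df=10 - 2)  # df = groups - 2 (common convention)

print(f"Brier = {brier:.3f}, HL chi2 = {hl:.2f}, p = {p_value:.3f}")
```

Because the synthetic outcomes are drawn from the predicted probabilities themselves, the test should usually not detect lack of fit here.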
4. Discussion
The present study demonstrated the feasibility of predicting PCI performance from noninvasive clinical, laboratory, and electrocardiographic data. The machine learning models showed moderate predictive ability, with the best results obtained using the expanded feature set that includes glycemic parameters.
One of the key results is that even the basic model (SAFE), based solely on data available at the patient admission stage, demonstrates stable discriminative ability (ROC-AUC of about 0.73). This indicates the potential informative value of standard clinical and ECG indicators for early risk stratification and support for clinical decision making in a time-limited environment.
An interesting finding of this study is that the Gradient Boosting model consistently demonstrated lower performance compared to both Random Forest and logistic regression across all feature sets. This finding is somewhat unexpected, given the theoretical advantages of boosting algorithms.
Several factors may explain this result. First, the relatively small sample size and limited number of events may reduce the effectiveness of sequential learning approaches, which typically require larger datasets to achieve stable performance. Second, the use of conservative hyperparameters (e.g., reduced learning rate and limited model complexity) to mitigate overfitting may have constrained the model’s ability to capture informative patterns. Third, class imbalance may have affected the optimization process, as boosting algorithms can be sensitive to the distribution of the target variable.
In contrast, Random Forest, as a bagging-based method, is generally more robust to small sample sizes and noise, while logistic regression benefits from its simplicity and lower variance. These factors may explain the relatively better performance of these models in the present study.
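The contrast described above can be made concrete. The sketch below shows the kind of conservative boosting configuration discussed (reduced learning rate, limited complexity) next to a bagging-based counterpart; the hyperparameter values are illustrative, not the study's actual settings.

```python
# Sketch: a deliberately conservative gradient-boosting configuration versus a
# bagging-based Random Forest (illustrative hyperparameter values only).
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

gb = GradientBoostingClassifier(
    learning_rate=0.05,  # reduced learning rate slows sequential fitting
    n_estimators=200,
    max_depth=2,         # shallow trees limit model complexity
    subsample=0.8,       # stochastic boosting as additional regularization
    random_state=0,
)

# Bagging averages many decorrelated trees trained in parallel, which tends to
# reduce variance and be more robust to noise in small samples.
rf = RandomForestClassifier(
    n_estimators=500,
    max_features="sqrt",  # feature subsampling decorrelates the trees
    random_state=0,
)
```

With few events, such conservative boosting settings trade bias for variance and may simply stop short of the patterns a variance-averaging forest still captures.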
The addition of laboratory and metabolic parameters (CLINICAL and EXTENDED models) was accompanied by a moderate improvement in predictive performance. In particular, the inclusion of glycemic parameters, including HbA1c, improved the effectiveness of the models, which is consistent with literature data on the significant role of metabolic disorders in the development and progression of coronary heart disease.
The results of the multivariable analysis showed that the presence of type 2 diabetes mellitus was the factor most strongly associated with PCI (OR = 7.36; p < 0.001). This underscores the clinical significance of diabetes as a factor influencing not only the course of the disease, but also the choice of therapeutic tactics.
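For context, an odds ratio in multivariable logistic regression is the exponential of the corresponding model coefficient, so the reported OR for diabetes corresponds to a coefficient of about 2.0 on the log-odds scale:

```python
# Relation between a logistic-regression coefficient (beta) and an odds ratio:
# OR = exp(beta), hence beta = ln(OR). Using the reported OR = 7.36:
import math

beta = math.log(7.36)
print(f"beta = {beta:.3f}, OR = {math.exp(beta):.2f}")
```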
The findings of the present study are consistent with previously published data demonstrating the significant role of metabolic disorders, particularly type 2 diabetes mellitus, in the progression of coronary artery disease and the need for revascularization [23]. Prior studies have shown that patients with diabetes tend to have more diffuse and complex coronary lesions, which increases the likelihood of percutaneous coronary intervention.
In addition, the observed associations between electrocardiographic abnormalities (such as ST-segment changes and pathological Q waves) and PCI are in line with established clinical criteria used in the diagnosis and management of acute coronary syndromes [24]. These ECG markers are routinely incorporated into clinical decision-making algorithms and current cardiology guidelines, where they serve as key indicators for urgent invasive strategies.
Importantly, the endpoint used in this study—PCI performed during the current hospitalization—reflects a clinical management decision rather than a direct measure of disease severity. This decision is influenced not only by anatomical findings, but also by a combination of clinical presentation, physician judgment, and organizational factors. While this may limit the use of PCI as a surrogate for obstructive coronary artery disease, it also provides an opportunity to model real-world clinical decision-making processes. From this perspective, the proposed approach should be interpreted not as a tool for diagnosing coronary artery disease or predicting hard clinical outcomes, but rather as a decision-support framework aimed at estimating the likelihood of revascularization based on early available data. Such models may be particularly useful in the initial stages of hospitalization, where rapid triage and prioritization are required, and where access to advanced diagnostics may be limited or delayed.
A comparison of the algorithms showed that ensemble methods, in particular Random Forest, offered greater stability and a better ability to capture nonlinear relationships between features than logistic regression. At the same time, the differences between the models remained moderate, which may be due to the limited sample size.
In general, the results obtained should be considered preliminary and hypothesis-generating. Despite the moderate accuracy of the models, the study demonstrates the promise of machine learning methods for integrating heterogeneous clinical data and developing decision-support tools.
Further research should aim to validate the proposed models on larger, multicenter datasets that include more diverse patient populations, which would improve the reliability and generalizability of the results. It would also be important to explore the incorporation of additional data sources, such as longitudinal clinical observations or continuous ECG monitoring, which may provide further improvement in predictive performance. Prospective studies are needed to better understand how such models can be integrated into routine clinical practice and whether they can meaningfully support clinical decision making.
From a clinical standpoint, the potential utility of the proposed approach may vary depending on the healthcare setting. In developed countries, these models could complement existing diagnostic pathways by supporting early risk stratification and helping to optimize workflow in high-demand clinical environments. In developing countries, where access to invasive procedures and advanced imaging may be limited, the use of readily available non-invasive data in combination with machine learning may offer a practical tool for preliminary assessment and prioritization of patients.
5. Limitations
This study has several important limitations. First, it was conducted as a retrospective single-center study with a relatively small sample size, which limits the generalizability of the findings. In particular, the number of PCI events (n = 47) is limited. In multivariable modeling, the number of events per predictor variable (EPV) in the EXTENDED model falls below commonly recommended thresholds, which may lead to instability of coefficient estimates and an increased risk of overfitting. The relatively small sample size also limits the ability to capture more complex patterns in the data. Therefore, the findings should be interpreted with caution and considered exploratory.
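The EPV concern can be made concrete: with 47 events and the commonly cited rule of thumb of at least roughly 10 events per candidate predictor, only fairly small models clear the threshold. The predictor counts below are illustrative; the exact size of the EXTENDED feature set is not restated here.

```python
# Sketch: events-per-variable (EPV) check against the ~10 rule of thumb.
# 47 PCI events are from the study; the predictor counts are hypothetical.
events = 47
for n_predictors in (4, 8, 12):  # illustrative model sizes
    epv = events / n_predictors
    verdict = "ok" if epv >= 10 else "below the ~10 threshold"
    print(f"{n_predictors} predictors -> EPV = {epv:.1f} ({verdict})")
```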
Second, the primary endpoint (PCI performance) reflects a clinical decision rather than a direct measure of the anatomical severity of coronary artery disease. Since PCI is typically performed following coronary angiography, patients who did not undergo angiographic assessment could not receive PCI regardless of the true severity of their condition. As a result, the non-PCI group may include a heterogeneous population comprising both patients without significant coronary lesions and those who were not evaluated invasively. This may introduce endpoint-related selection bias and affect both the estimated associations and predictive performance of the models.
Third, information on coronary angiography was not available in a structured format for all patients and therefore was not included as a variable in the main analysis. Although angiographic procedures were documented in individual clinical records, systematic extraction of this information was beyond the scope of the present study. Future studies should incorporate angiography status as a structured variable and perform sensitivity analyses restricted to patients who underwent coronary angiography.
Fourth, although internal model validation was performed using repeated cross-validation and calibration was assessed using calibration plots, calibration tables, Brier score, and the Hosmer–Lemeshow test, no external validation was conducted. This limits the assessment of model generalizability and stability in independent cohorts.
Fifth, some variables were characterized by a substantial proportion of missing values, which could affect model stability despite the use of imputation procedures.
Sixth, model hyperparameters were predefined rather than optimized using systematic search procedures. Although this approach was chosen to reduce the risk of overfitting in a relatively small dataset, it may have limited the ability to identify optimal model configurations.
Finally, a formal risk of bias assessment using the PROBAST framework was not performed, which represents an additional limitation of the study.
Overall, the proposed approach should be considered preliminary and hypothesis-generating. The results should be interpreted as reflecting real-world clinical decision making rather than purely anatomical disease severity. Further studies should include larger and multicenter cohorts, as well as external validation and incorporation of angiographic data.
6. Conclusions
This study demonstrates the feasibility of predicting the likelihood of a clinical decision to perform percutaneous coronary intervention (PCI) based on noninvasive clinical, laboratory, and electrocardiographic data available at early stages of hospitalization. The best-performing model, based on an EXTENDED set of features and a Random Forest algorithm, achieved moderate discriminative performance. In addition to discrimination metrics, calibration analysis indicated an overall acceptable agreement between predicted probabilities and observed event rates.
Importantly, the proposed models are intended to support the assessment of clinical decision-making processes rather than to directly predict the presence or severity of coronary artery disease. The findings also suggest a potential role of metabolic factors, particularly type 2 diabetes and glycemic parameters, in shaping the likelihood of revascularization decisions.
Given the retrospective single-center design, relatively small sample size, absence of external validation, and remaining methodological limitations, the results should be interpreted as preliminary and hypothesis-generating.
Further research using larger, independent cohorts, with external validation and comprehensive calibration assessment, is required before any potential clinical implementation.