Aortic Risks Prediction Models after Cardiac Surgeries Using Integrated Data

The complications of thoracic aortic disease include aortic dissection and aneurysm. The risks are frequently compounded by many cardiovascular comorbidities, which complicates clinical decision making. The purpose of this study is to develop risk prediction models for patients after thoracic aneurysm surgery, using integrated data from different medical institutions. Seven risk features were formulated for prediction. The CatBoost classifier performed best and provided an ROC AUC of 0.94–0.98 and an F-score of 0.95–0.98. The obtained results are broadly in line with the current literature. The findings provide additional support for clinical decision making, guiding a patient care team prior to surgical treatment, and promoting a safe postoperative period.


Introduction
The complications of thoracic aortic disease include aortic dissection and aneurysm. These pathologies are common in elderly patients, males, smokers, and those with a family history of aneurysms. More than 20% of patients with aortic disease who suffer acute aortic events have no symptoms and die at home, without receiving medical help [1].
The causes of death include not only aortic rupture, but also myocardial infarction, renal insufficiency, and stroke [2]. In combination with several cardiovascular comorbidities, these factors complicate clinical decision making. One of the ways to decrease a patient's risk is to ensure a timely prognosis of complications.
Although various risk scales (Euroscore, Euroscore II, STS score) are successfully used in cardiac surgery, there is still no single prognostic risk assessment scale for patients with thoracic aortic pathology. Several attempts are currently being made to design specific predictive models for thoracic aortic pathology risk assessment [3,4]. However, because the predictors obtained across these studies are heterogeneous, the dataset must be extended to identify the most significant risk factors. These risk factors are then used to create a scale that correctly assesses perioperative risk in patients with thoracic aortic pathology.
Machine learning (ML) can provide tools for personalized risk prediction based on real-world data and the clinical history of a patient [5]. It employs routinely collected clinical data to implement mathematical models that can forecast risks [6]. ML models can predict the expansion of an aortic aneurysm based on the anatomical features extracted from CT scans and textual documents. The ML algorithm developed by Hirata et al. [7] could predict the expansion of an aneurysm with high accuracy. Another study used ML techniques to make a prognosis on the risk of aortic aneurysm growth in 85% and 71% of patients at 12 and 24 months, respectively [8].
The incidence of adverse events differs between patients, so evaluating the risk factors for adverse events after such a complex procedure is crucial. To date, some authors have attempted to identify predictors of early postoperative complications [4,9,10,11]. However, identifying the predictors of perioperative and postoperative complications and mortality after thoracic aortic surgery is still an open issue. Recent studies have investigated the problem of thoracic aortic aneurysm (TAA) and related risks. Table 1 summarizes the results of the review performed for cardiovascular predictive modelling. The algorithms most frequently used for cardiovascular predictive modelling are logistic regression (LR), ensemble models and tree models (random forest and decision tree classifiers), and boosting strategies, such as XGBoost. The most frequent metric for the evaluation of predictive models is the area under the receiver operating characteristic curve (AUC-ROC); a higher value corresponds to better discrimination [17].
The goal of the presented study is to develop predictive models for significant risk factor identification in patients after thoracic aneurysm surgeries, using integrated data from different medical institutions.

Materials and Methods
The model for risk prognosis was developed using two datasets from two clinical providers. The first dataset contains 97 structured records for 137 patients with clinical records on aortic operations. The second dataset contains 56,929 text documents from the years 2008-2019 for the 343 TAA operations of 319 patients.
The pipeline for the model development is represented in Figure 1. Features with >30% missing values were eliminated. For features with up to 30% missing values, the k-nearest neighbors (KNN) imputation technique was applied. Pearson's correlation method was used for feature correlation analysis, and features with a high correlation coefficient were eliminated. The synthetic minority over-sampling technique (SMOTE) was employed to balance the dataset. Classification was conducted both with the two most important features and with all of the features, in order to compare performances. Feature selection was organized through the voting of several techniques: univariate feature selection with a chi-squared test, recursive feature elimination (RFE), extra trees classifier, and Lasso.
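The first two filtering steps of the pipeline can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the toy columns, the helper names, and the 0.9 correlation cut-off are assumptions made for illustration (the text specifies only the 30% missing-value limit), and KNN imputation is omitted for brevity.

```python
import math

def missing_fraction(column):
    """Share of missing (None) entries in one feature column."""
    return sum(v is None for v in column) / len(column)

def drop_sparse_features(table, max_missing=0.30):
    """Keep only features whose missing share does not exceed max_missing."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}

def pearson(x, y):
    """Pearson correlation of two equally long, fully observed columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated_features(table, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold.

    The 0.9 threshold is a hypothetical choice, not a value from the study.
    """
    kept = {}
    for name, col in table.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

# Toy data (hypothetical): "lab_marker" has 40% missing values and is dropped.
raw = {
    "age":        [61, 70, None, 58, 66, 73, 49, 64, 59, 68],
    "lab_marker": [None, None, 1.2, None, 0.9, None, 1.1, 1.0, 0.8, 1.3],
}
dense = drop_sparse_features(raw)

# Toy complete data (hypothetical): weight is a linear function of height.
complete = {
    "height_cm": [160, 170, 180, 175, 165],
    "weight_kg": [60, 70, 80, 75, 65],
    "age":       [70, 55, 62, 48, 66],
}
pruned = drop_correlated_features(complete)
```

In this sketch the order of the columns decides which member of a correlated pair survives; a real pipeline would need an explicit rule for that choice.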
We used logistic regression (LR), random forest (RF) and CatBoost (CC) classifiers for the experiments. The parameters were tuned through grid search, and the F-score was used as the optimization metric.
LR is expressed by the following equation: $P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{n} \beta_i x_i)}}$, where $x_i$ are the predictors and $\beta_i$ are the fitted coefficients. LR is the most frequently used machine learning model in medical applications, due to its high interpretability. One of its disadvantages is its sensitivity to multicollinearity. Thus, highly correlated features should not be included in the predictive model.
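Logistic regression maps a weighted sum of predictors to a probability through the sigmoid function. The following minimal sketch shows that mapping for a single patient; the function name, coefficients, and predictor values are hypothetical and are not taken from the fitted models in this study.

```python
import math

def logistic_probability(features, weights, intercept):
    """P(y = 1 | x) for a logistic regression model with given coefficients."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two standardized predictors.
p = logistic_probability([0.5, -1.0], weights=[0.8, 0.4], intercept=0.0)
```

Because each coefficient acts on one predictor, its sign and magnitude can be read directly, which is the source of the interpretability mentioned above.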
RF is an ensemble model based on decision trees. During classification, each tree assigns the most likely target to each patient with a set of predictors. The averaging function is expressed by the following equation: $p(y \mid x) = \frac{1}{T} \sum_{t=1}^{T} p_t(y \mid x)$, where $T$ is the number of trees and $p_t(y \mid x)$ is the probability distribution for each tree. RF is also a widespread algorithm for medical applications.
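The averaging of per-tree probability distributions can be sketched in a few lines. The three tree outputs below are hypothetical values for one patient, and the function name is an assumption for illustration.

```python
def forest_probability(tree_distributions):
    """Average per-tree class distributions p_t(y|x) into the forest estimate."""
    n_trees = len(tree_distributions)
    n_classes = len(tree_distributions[0])
    return [sum(dist[c] for dist in tree_distributions) / n_trees
            for c in range(n_classes)]

# Three hypothetical trees, each giving [P(no event), P(event)] for one patient.
trees = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
averaged = forest_probability(trees)

# The predicted class is the one with the highest averaged probability.
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
```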
CatBoost is an ordered gradient boosting algorithm that addresses the problem of target leakage. CC is effective on small datasets. Binary decision trees are used in the CC classifier. The CC output can be expressed as follows: $H(x_i) = \sum_{j=1}^{J} c_j \mathbf{1}\{x_i \in R_j\}$, where $H(x_i)$ is a decision tree function, $R_j$ is a disjoint region corresponding to the leaves of the tree, and $c_j$ is the value assigned to leaf $j$.
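A decision tree function H(x_i) over disjoint leaf regions R_j is piecewise constant: it returns the value of whichever leaf region contains the input. The sketch below illustrates this for a single numeric feature. The interval boundaries and leaf values are hypothetical, and the code does not reproduce CatBoost's actual tree construction.

```python
def tree_output(x, regions):
    """Return the leaf value of the disjoint region [low, high) that contains x."""
    return sum(value for (low, high, value) in regions if low <= x < high)

# Hypothetical one-feature tree with three leaves (regions do not overlap).
regions = [
    (float("-inf"), 30.0, -0.5),
    (30.0, 40.0, 0.1),
    (40.0, float("inf"), 0.8),
]
leaf_value = tree_output(35.0, regions)
```

Because the regions are disjoint and cover the whole axis, exactly one term contributes to the sum for any input.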
The experiments were conducted with the following Python 3 packages: scikit-learn [18] and CatBoost [19] for machine learning model implementation; seaborn [20] and matplotlib [21] for data visualization; SMOTE [22] for dataset balancing; and SHapley Additive exPlanations (SHAP) [23] for the interpretation of black-box results. The discrimination was evaluated using ROC curves. Table 2 lists the machine learning models and parameters used in the research. Table 3 shows the best performances for each classification target.

Results
The red color in Figure 2 relates to a higher value of the feature (for binary features, it corresponds to one), while the blue color corresponds to a lower feature value. A negative SHAP value corresponds to a negative impact on the prediction, leading the model to predict zero, and a positive SHAP value corresponds to a positive impact on the prediction, leading the model to predict one. For instance, a higher intraoperative hematocrit leads to a lower mortality risk, and a lower intraoperative hematocrit leads to a higher mortality risk. A decreased level of red blood cells leads to lower risks of TND cases, but a decreased level of red blood cells does not necessarily lead to higher risks of TND cases. Figure 3 shows the most powerful predictors of in-hospital mortality for a particular patient from the dataset. The bold value in Figure 3 indicates the model's output value. The red features increase the prediction and the blue features decrease it. Aortic valve insufficiency has a positive impact on the output value and the red blood cell feature has a negative impact on the output value.

Discussion
Despite the fact that a number of scoring systems for cardiac risk assessment have been developed and successfully applied in practice, they do not take into account the specificity of thoracic aortic pathology. More and more medicine-related studies concentrate on building machine learning models to learn from historical experience [24], and to identify specific risk factors.
Currently, there are a number of studies devoted to the identification of prognostic factors for postoperative outcomes in patients with thoracic aortic pathology. Age, NYHA III-IV class of heart failure, renal insufficiency, ascending aorta dilatation, involvement of the aortic arch in the pathological process, lower limb malperfusion, and emergent/urgent aortic surgery are the most common risk factors that affect the survival and development of postoperative complications. In addition, the likelihood of a favorable prognosis decreases, due to reoperations, combined cardiac surgery (e.g., coronary artery bypass grafting), and a prolonged cardiopulmonary bypass duration [4,11]. Some studies have emphasized the negative role of increased blood components in transfusions (packed red blood cells, fresh frozen plasma, and platelets) [4,9,10].
Great attention is paid to the prognostic criteria for thoracic aortic surgery; however, there are few studies that aim to identify the relationship between risk factors and adverse outcomes. This study is dedicated to the development of a predictive model based on integrated medical data, using two datasets from high-throughput aortic centers.
Feature selection plays an important role in medical risk prediction using machine learning models. We removed six features due to discrepancies in the data storage formats and in the diagnostic methods applied in the participating clinics, and because of missing values. The exploratory data analysis resulted in the removal of weight, due to its high correlation with two other features. The circulatory arrest time, cardioplegic arrest time, and cardiopulmonary bypass time were eliminated because of the large number of missing values; as shown in [25], the application of imputation methods can distinctly affect the performance of the predictive model.
We tested three machine learning algorithms to develop a predictive model: (1) LR; (2) RF; (3) CatBoost. CatBoost, with the SMOTE balancing technique, demonstrated the best performance for most targets.
We demonstrated several tools for CatBoost evaluation and interpretation: feature importance scores, which are summarized using summary plots for each target variable (Figure 2), and comparison with other well-known machine learning models (LR and RF), using metrics such as ROC AUC, F-score, recall, and precision (Table 3). Accuracy can be misleading, because high accuracy values may indicate overfitting, especially on imbalanced datasets [26]. Precision is the ratio of correctly classified positive patients to all patients assigned to the positive class. Recall is the proportion of actual positive cases that are correctly identified. If recall equals one, no positive case is missed. This metric is crucial for evaluating medical prediction models, as it is important to identify as many cases of the pathological event as possible. A low recall value means that a high proportion of positive cases of medical risk are missed. F-score is the harmonic mean of recall and precision. The use of F-score in parameter tuning helps to penalize models for extreme values of either metric [27].
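The relationship between these metrics can be made concrete with a short, standard-library sketch. The labels below are toy data invented for illustration, not results from the study; they show how a model that never flags an event can still reach high accuracy on an imbalanced cohort while its recall and F-score are zero.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all patients classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and their harmonic mean (F-score)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy imbalanced cohort: 2 events among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A model that never predicts an event: high accuracy, zero recall.
always_negative = [0] * 10
acc_negative = accuracy(y_true, always_negative)
metrics_negative = precision_recall_f1(y_true, always_negative)

# A model that finds one of the two events and raises one false alarm.
mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics_mixed = precision_recall_f1(y_true, mixed)
```

Both toy models have the same accuracy here, yet only the second detects any event, which is why recall and F-score are the more informative measures for rare outcomes.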
SHAP values were used to ensure the interpretability of the model. SHAP covers two aspects: global and local interpretability. Global interpretability explains the relationships of predictors with target variables, i.e., risk factors with risks, and allows the consistency of the model with current practices to be analyzed. Local interpretability helps to understand why a particular case or patient obtains a particular prediction. Figure 2 illustrates the summary plots for each target variable, showing negative and positive relationships of predictors with targets. These plots take into account the feature importance, the impact of each feature on the final prediction, the initial value of the feature (lower values are blue and higher values are red), and the correlation of the feature with the target (e.g., lower intraoperative creatinine correlates with a lower risk of multiple organ failure). SHAP values reflect correlation, but not causation. Figure 3 illustrates an example of a force plot for a single patient from the dataset. It helps to understand the influence of each predictor on the final output. Such a plot might be useful for future decision making.
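Local interpretability can be illustrated by computing exact Shapley values for a tiny model by enumerating all feature coalitions. This is a brute-force sketch of the underlying idea, not the SHAP library's algorithm: the risk function, its coefficients, the patient, and the single reference patient used as a baseline are all hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(x):
    """Hypothetical risk score: lower hematocrit and valve insufficiency raise it."""
    hematocrit, valve_insufficiency = x
    return 0.5 - 0.01 * (hematocrit - 30) + 0.2 * valve_insufficiency

patient = [25, 1]     # hypothetical patient: low hematocrit, insufficiency present
reference = [30, 0]   # hypothetical reference patient used as the baseline
phi = shapley_values(toy_risk, patient, reference)
```

Both contributions are positive in this toy case, meaning each feature pushes the prediction above the reference, and they sum to the difference between the patient's prediction and the reference prediction. That additivity is what a force plot visualizes for a single patient.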
The performance of the developed models can be compared to the results of other studies on predicting postoperative cardiovascular complications. Coulson et al. [16] aimed to develop models to predict the risks of acute kidney injury and the need for renal replacement therapy after cardiac surgery, using as few predictors as possible. The simplicity and interpretability of the models, and the few predictors used, ensure the accessibility of prediction models for clinicians. Thus, a careful analysis of the literature and accumulated practical experience is needed to stratify risk factors. The AUC ROC for the acute kidney injury postoperative prediction was 0.70, and the AUC ROC for the need for renal replacement therapy postoperative prediction was 0.85.
Fernandes et al. [15] investigated machine learning models to predict mortality after cardiac surgery. The best results were shown by boosting classifiers and random forest, showing 0.87 AUC ROC and up to 0.91 recall.
Czerny et al. [3] showed that logistic regression outperformed the other investigated classifiers, with a mean AUC of 0.712 for predicting mortality rate in acute aortic dissection.
The CatBoost classifier performs better in comparison with the results from the literature. In most cases, the obtained results are in line with the current literature. For instance, the independent risk factors for postoperative acute kidney injury requiring RRT are impaired preoperative renal function, reduced left ventricle ejection fraction, and transfusion of a large volume of blood components, as well as being overweight [28,29,30]. In our model, these factors contribute significantly to postoperative acute kidney injury.
Additionally, Wang et al. [11] demonstrated that the large extent of aortic dissection was an independent risk factor for early mortality. In another study, a significant negative role of primary fenestration with aortic dissection, especially with type B, was revealed as an important factor for mortality [31]. Moreover, the presence of this type of aortic dissection led to an increase in postoperative renal complications [32]. In another study, an enlarged abdominal aortic diameter was shown to be a risk factor for complications in the postoperative period [33].
Nevertheless, we should point out that, from a clinical perspective, the impact of a number of features in the predictive model is not obvious, although most of the features have a logical clinical explanation. One example is the direct relation of the aortic diameter at the sinuses of Valsalva to temporal neurological deficit, which at first appears unclear. To explain it, one needs to follow a logical chain. A large aortic root is an indication for its replacement. This naturally prolongs the cardiopulmonary bypass time and, in turn, increases the risk of neurological deficit.
Despite the successful implementation of surgical risk calculators (Euroscore, Euroscore II, and STS score), a standardized prognostic risk assessment scale for patients with thoracic aortic pathology has not yet been adopted. In the current literature, there have been a few attempts to compile prognostic models [4]. However, due to the heterogeneity of the predictors obtained in each particular study, the accumulation of more data is needed, in order to identify the significant risk factors. Elaboration of the correct risk score calculation for prognosis assessment in patients with thoracic aortic diseases is crucial. Our findings provide additional support for clinical decision making, guiding a patient care team prior to a surgical treatment, and promoting a safe postoperative period.
The presented study has certain limitations. Despite the integration of medical records from the datasets of two different clinics, the number of patients and clinical cases (operations) is relatively small. We are planning to extend it in the future. The study faced a problem of unbalanced data, which is a traditional concern for medical data [12]. This leads to situations where machine learning algorithms tend to classify the data into predominant classes. SMOTE for data balancing, and F-measure as the optimization metric, which is less sensitive to data imbalance, were applied to address the problem. However, the study still has limitations due to the imbalanced medical datasets. Another limitation is related to the loss of data during the integration process. We had to compare and map not only the logical data structures and contents, but also diagnostic methods and treatment approaches in different institutions. This reduced the amount of data we could include in the study.

Conclusions
This study has implemented models for postoperative risk prognosis for patients with thoracic aortic disease, using real-world data from two different medical institutions, comprising both structured data and free-text medical records. The obtained findings provide additional support for clinical decision making, guiding a patient care team prior to surgical treatment, and promoting a safe postoperative period. Future studies may address the current limitations of the study through the generation of relevant synthetic patients, model validation in medical practice, and the development of applied risk stratification scales based on the obtained results.