A Machine Learning Model for the Accurate Prediction of 1-Year Survival in TAVI Patients: A Retrospective Observational Cohort Study

Background: Predicting 1-year survival in patients undergoing transcatheter aortic valve implantation (TAVI) is essential for safe early discharge strategies and resource optimization. Methods: Routinely acquired data (134 variables) from 629 patients who underwent transfemoral TAVI between 2012 and 2018 were used. Support vector machines, neural networks, random forests, nearest neighbour classifiers and Bayes models were trained and then applied to new, previously unseen patients to predict 1-year mortality. A genetic variable selection algorithm identified a set of predictor variables with high predictive power. Results: Univariate analyses revealed 19 variables (clinical, laboratory, echocardiographic, computed tomographic and ECG) that significantly influenced 1-year survival. Before the reject option was applied, the models performed similarly in terms of negative predictive value (NPV) and positive predictive value (PPV). After the reject option was applied, the random forest model identified a subcohort with an NPV of 96% (PPV = 92%, accuracy = 96%). Conclusions: In the subcohort that receives a prediction, our model predicts 1-year survival with a very high NPV, a sufficiently high PPV and very high accuracy. The "reject option" enables high performance and a smooth integration of machine learning into the clinical decision process.


Introduction
The introduction of transcatheter aortic valve implantation (TAVI) in the past decade has revolutionized the management of patients with severe aortic stenosis. Compared with surgical aortic valve replacement (SAVR), key factors in the success of TAVI include its minimal invasiveness, the avoidance of cardiopulmonary bypass and a reduced incidence of early complications such as bleeding, especially in elderly and intermediate- to high-risk patients [1]. Moreover, the possibility of avoiding general anesthesia in most cases has raised the question of whether early discharge strategies are feasible and safe [2][3][4]. Early discharge can prevent hospital-acquired infections and save both costs and resources, considerations that gained weight during the coronavirus disease 2019 (COVID-19) pandemic. On the other hand, about 10% of patients who undergo TAVI are readmitted for any cause within the first 30 days after the procedure in high-volume centres [2,3]. Moreover, although 30-day mortality remains low (about 1%), 1-year mortality occurs in about 7% of patients [4]. Because there is no consensus on either the timing (i.e., same or next day) or the selection criteria for early discharge, tools for the personalized prediction of outcomes (as opposed to general or standardized algorithms) could help improve patient selection. The complexity and variety of clinical scenarios in this population can make prognosis difficult for clinicians. Prior studies have highlighted the influence of clinical, anatomical and procedural variables on postprocedural complications and survival [5][6][7]. In addition, postoperative complications and residual paravalvular regurgitation have been shown to influence outcomes significantly. Artificial intelligence and machine learning (ML) are expected to enable increasingly complex prediction models.
Therefore, we aimed to develop an ML model that is able to assess the 1-year survival of patients following TAVI with a high prediction performance.

Methods
Based on our prior experience in identifying risk factors for outcomes after TAVI [5][6][7], we launched the Machine Learning for Aortic Valve Interventions (MALAVI) initiative in March 2019. The project arose from the observation that the personalized heart team evaluation of patients with aortic valve stenosis, as recommended by international guidelines, is complex. The general aim of the MALAVI project was to support clinicians' decision making with ML in order to achieve the lowest possible complication rate. This is a single-centre, retrospective observational cohort study conducted in accordance with the STROBE statement [8], as reported in Supplementary Table S1.
For the specific aim of the present manuscript, we retrospectively analyzed our centre's experience with TAVI procedures between January 2012 and August 2018. All consecutive patients who underwent TAVI for symptomatic severe stenosis of the native aortic valve were included in the study. The indication for TAVI followed international guidelines, that is, patients at intermediate and/or high or prohibitive surgical risk after discussion by the institutional heart team, which consisted of an interventional cardiologist, an imaging specialist cardiologist, a cardiac surgeon and an anesthetist [9]. Exclusion criteria were a Sievers type 0 bicuspid aortic valve, pure aortic regurgitation, absence of a baseline electrocardiogram-triggered multislice computed tomography (MDCT) scan and use of experimental prostheses. Overall, 881 patients were evaluable. Because of the substantial differences in outcomes between access routes, only patients who underwent transfemoral (TF) TAVI were included; patients treated via transapical (n = 252) or transaortic (n = 1) access were excluded. Clinical, operative and postprocedural data were prospectively collected in our institutional database. A total of 134 input variables were considered for the model; all variables and their definitions are listed in Table S2. Preoperative characteristics were defined according to the EuroSCORE II criteria, whereas procedural and postprocedural variables followed the VARC-2 criteria [10,11]. The VARC-2 definitions consider the onset of most complications within the first 72 h after the index procedure. Follow-up was conducted through outpatient visits or, alternatively, by telephone contact with patients or their physicians, and was completed in March 2022. All patients provided informed consent for the anonymous use of their data, and the study was approved by the institutional review board (IRB-2022-10). The study protocol conforms to the ethical guidelines of the Declaration of Helsinki.
To improve the negative and positive predictive power of the models, an important and common method in machine learning, the "reject option", was applied. This means that the models were allowed to refuse to make a prediction when they were "in doubt". The advantage of this option is that it substantially increases the accuracy of the predictions that are made, so that the medical expert can rely on them. The disadvantage is that some individuals (19% in this study) do not receive a prediction. For these patients, the clinical expert can still estimate 1-year survival if desired.
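The reject option can be illustrated with a minimal Python sketch. It assumes that a trained model outputs a posterior probability of death within 1 year for each patient and that two cutoffs (`lower`, `upper`) have already been chosen; the function names and cutoff values are hypothetical and are not taken from the study, whose analyses were run in STATISTICA and Mathematica.

```python
from typing import List, Optional, Sequence


def predict_with_reject(p_death: float, lower: float, upper: float) -> Optional[int]:
    """Classify one patient from the posterior probability of death within 1 year.

    Returns 0 (predicted 1-year survival), 1 (predicted death) or None (rejected).
    `lower` and `upper` are hypothetical cutoffs with 0 <= lower < upper <= 1.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("cutoffs must satisfy 0 <= lower < upper <= 1")
    if p_death <= lower:
        return 0      # low enough to predict survival with confidence
    if p_death >= upper:
        return 1      # high enough to predict death with confidence
    return None       # in between: the model is "in doubt" and abstains


def predict_batch(probs: Sequence[float], lower: float, upper: float) -> List[Optional[int]]:
    """Apply the reject option to a list of posterior probabilities."""
    return [predict_with_reject(p, lower, upper) for p in probs]


if __name__ == "__main__":
    # Illustrative posterior probabilities, not study data
    print(predict_batch([0.02, 0.15, 0.45, 0.80, 0.97], lower=0.10, upper=0.70))
    # prints [0, None, None, 1, 1]
```

A conventional classifier corresponds to a single cutoff; widening the gap between `lower` and `upper` trades coverage (the share of patients who receive a prediction) for higher NPV and PPV among those who do.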
Medical expert knowledge and the results of published studies were used to select a large set of candidate predictors expected to have high or moderate predictive power. These are shown in Table S2 (labels marked in bold).

Statistical Methods
Data were screened and cleaned for incomplete, incorrect and missing values and tested for normality. To assess univariate predictive power, continuous variables were fitted to various distributions, including normal, log-normal and gamma distributions. Fisher's exact test and Pearson's chi-square test were used to analyze cross-tabulations. For continuous variables, generalized linear models with log-normal and gamma distributions as well as t-tests with and without the assumption of homogeneous variances were used.
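For 2×2 cross-tabulations (for example, a binary risk factor against death within 1 year), the two tests named above can be sketched with the Python standard library alone. This is an illustrative re-implementation, not the STATISTICA or Mathematica routines used in the study; the helper names and the toy counts are hypothetical. The two-sided Fisher p value sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table, and the chi-square p value uses the identity that, with 1 degree of freedom, P(X² > x) = erfc(√(x/2)).

```python
import math
from typing import Tuple


def _hypergeom_pmf(x: int, r1: int, c1: int, n: int) -> float:
    """P(top-left cell = x) for a 2x2 table with row total r1, column total c1, grand total n."""
    return math.comb(c1, x) * math.comb(n - c1, r1 - x) / math.comb(n, r1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    p_obs = _hypergeom_pmf(a, r1, c1, n)
    tolerance = 1e-7 * p_obs  # treat numerically tied tables as equally likely
    p = 0.0
    for x in range(max(0, r1 - (n - c1)), min(r1, c1) + 1):
        prob = _hypergeom_pmf(x, r1, c1, n)
        if prob <= p_obs + tolerance:
            p += prob
    return min(1.0, p)


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson's chi-square test without continuity correction (1 degree of freedom)."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("all row and column totals must be positive")
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: rows = risk factor present/absent, columns = died/alive at 1 year
    print(fisher_exact_two_sided(3, 1, 1, 3))  # approximately 0.4857 (= 34/70)
    print(pearson_chi2_2x2(3, 1, 1, 3))        # chi2 = 2.0, p approximately 0.1573
```

When expected cell counts are small, as is common for rare events in a cohort with 67 deaths, the exact test is usually preferred over the chi-square approximation.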
ML Algorithms: Multilayer perceptron neural networks, support vector machines, nearest neighbour classifiers, random forest models and Bayes classifiers were built, trained on learning samples using 10-fold cross-validation and additionally tested on an independent test sample. Model performances were compared using negative predictive value (NPV), positive predictive value (PPV) and accuracy. Supplementary Table S3 gives an overview of preprocessing; model training; the split of the data into training, validation and test samples; training stop criteria; the selection of thresholds for the reject option; and the final test of the model. No imputation was performed for missing data; all models were trained on complete data.
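The separation between the learning sample (used for 10-fold cross-validation) and the independent test sample can be sketched as follows. The 80/20 hold-out fraction, the seed and the function names are assumptions for illustration only; the actual split is described in Supplementary Table S3.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(n: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient indices and set aside an independent test sample."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (learning sample, test sample)


def k_fold(indices: Sequence[int], k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split the learning sample into k folds; each fold serves once as the validation set."""
    if not 2 <= k <= len(indices):
        raise ValueError("need 2 <= k <= number of samples")
    folds = [list(indices[i::k]) for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((training, validation))
    return splits


if __name__ == "__main__":
    learning, test = holdout_split(565, test_fraction=0.2, seed=42)
    splits = k_fold(learning, k=10)
    print(len(learning), len(test), len(splits))  # prints 452 113 10
```

Given the class imbalance (67 deaths among 565 patients), a stratified variant that distributes deaths evenly across folds is often preferable; the plain version is shown for brevity. The test sample is never touched during cross-validation.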
Feature Selection: Feature selection was performed in two steps. In the first step, prior medical expert knowledge and the results of published studies were used to reduce the initial 134 input variables to a smaller set of 40 candidate predictors expected to have high or moderate predictive power. This set was still likely to contain correlated and redundant variables. In the second step, the 40 candidate predictors were passed to a genetic algorithm for variable selection [12] to further reduce the number of predictors. The genetic feature selection algorithm was integrated directly into the learning algorithm (the "embedded method"), meaning that it was applied during training.
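A genetic algorithm for variable selection represents each candidate subset of predictors as a binary mask and evolves a population of masks toward higher fitness. The sketch below uses elitism, tournament selection, one-point crossover and bit-flip mutation; these operators, the parameter values and the toy fitness function are assumptions, not the configuration of reference [12]. In the embedded setting described above, the fitness would be the model's cross-validated performance in the learning sample rather than the hypothetical score used here.

```python
import random
from typing import Callable, List

Mask = List[int]  # one gene per candidate predictor: 1 = included, 0 = excluded


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 50,
    mutation_rate: float = 0.1,
    seed: int = 0,
) -> Mask:
    """Minimal elitist genetic algorithm over binary inclusion masks."""
    if pop_size < 2 or n_features < 1:
        raise ValueError("need pop_size >= 2 and n_features >= 1")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament() -> Mask:
        a, b = rng.sample(pop, 2)  # pick two masks, keep the fitter one
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


if __name__ == "__main__":
    informative = {0, 2}  # hypothetical: only predictors 0 and 2 carry signal

    def toy_fitness(mask: Mask) -> float:
        # reward informative predictors, penalize each redundant one
        gain = sum(mask[i] for i in informative)
        penalty = sum(g for i, g in enumerate(mask) if i not in informative)
        return gain - 0.5 * penalty

    print(genetic_feature_selection(6, toy_fitness, seed=1))
```

The optimum of this toy fitness is the mask [1, 0, 1, 0, 0, 0]; the penalty term mimics the aim of discarding correlated and redundant candidates.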
Reject Option: To allow the algorithm to refuse a prediction when the prediction model is in doubt, we applied the "reject option" [13]. In the final trained model, two cutoffs for the posterior probabilities, instead of one, were chosen in the training sample and then evaluated in the test samples. We aimed to identify prediction models with high NPVs and PPVs for two reasons: a false-negative prediction would lead to the discharge of a patient at risk of dying within 1 year, whereas a false-positive prediction may unnecessarily prolong the hospitalization of a patient who is not expected to die, increasing resource consumption and the risk of hospital-acquired infection. To achieve this goal, all models were checked as to whether applying the reject option increased predictive values compared with models without a reject option. To explain why some patients did not receive a prediction, and thus why a reject option is needed, we illustrate how strongly the distributions of survivors and non-survivors overlap using matrix plots of univariate predictors.
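How two cutoffs might be chosen in the training sample can be sketched as a grid search: among all pairs (lower, upper), keep those that meet target NPV and PPV values and pick the pair that leaves the fewest patients without a prediction. The selection criterion, the targets, the function names and the toy data are assumptions; the text states only that the cutoffs were chosen in the training sample and then tested in the test samples. A "positive" prediction here means predicted death within 1 year.

```python
from typing import Optional, Sequence, Tuple

NAN = float("nan")


def reject_metrics(
    probs: Sequence[float], died: Sequence[int], lower: float, upper: float
) -> Tuple[float, float, float]:
    """NPV, PPV and coverage when p <= lower predicts survival and p >= upper predicts death."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, died):
        if p <= lower:          # predicted 1-year survival (negative)
            if y == 0:
                tn += 1
            else:
                fn += 1
        elif p >= upper:        # predicted death within 1 year (positive)
            if y == 1:
                tp += 1
            else:
                fp += 1
    npv = tn / (tn + fn) if tn + fn else NAN
    ppv = tp / (tp + fp) if tp + fp else NAN
    coverage = (tp + fp + tn + fn) / len(died)
    return npv, ppv, coverage


def choose_cutoffs(
    probs: Sequence[float], died: Sequence[int],
    target_npv: float = 0.95, target_ppv: float = 0.90, steps: int = 100,
) -> Optional[Tuple[float, float, float]]:
    """Grid search for (lower, upper) meeting both targets with maximal coverage."""
    grid = [i / steps for i in range(steps + 1)]
    best = None
    for lower in grid:
        for upper in grid:
            if upper <= lower:
                continue
            npv, ppv, cov = reject_metrics(probs, died, lower, upper)
            # NaN comparisons are False, so pairs yielding no negatives or no positives are skipped
            if npv >= target_npv and ppv >= target_ppv and (best is None or cov > best[2]):
                best = (lower, upper, cov)
    return best


if __name__ == "__main__":
    # Toy training sample: posterior probability of death and observed outcome (1 = died)
    probs = [0.05, 0.10, 0.20, 0.40, 0.55, 0.60, 0.80, 0.90]
    died = [0, 0, 0, 1, 0, 1, 1, 1]
    print(choose_cutoffs(probs, died))  # prints (0.2, 0.56, 0.75)
```

On the toy data, the patient with probability 0.40 (who died) and the patient with 0.55 (who survived) fall into the reject zone, so 75% of patients receive a prediction. The chosen cutoffs would then be applied unchanged to the independent test sample.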
The a priori probabilities of 1-year survival were supplied to the models as the a priori distribution to increase model performance.
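The text does not specify how the a priori distribution was supplied to each model. For a Bayes classifier, class priors enter directly through Bayes' rule, which the sketch below illustrates using the 1-year outcome distribution reported in the Results (498 of 565 patients alive, 67 of 565 dead). The function name and the likelihood values in the example are hypothetical.

```python
from typing import Dict

# A priori distribution reported in the Results: 498 of 565 alive, 67 of 565 dead at 1 year
PRIORS: Dict[str, float] = {"alive": 498 / 565, "dead": 67 / 565}


def posterior(likelihoods: Dict[str, float], priors: Dict[str, float] = PRIORS) -> Dict[str, float]:
    """Bayes' rule: P(class | x) is proportional to P(x | class) * P(class)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())
    if evidence <= 0:
        raise ValueError("at least one class must have positive joint probability")
    return {c: v / evidence for c, v in joint.items()}


if __name__ == "__main__":
    # Hypothetical class-conditional likelihoods P(x | class) for one patient's data x
    post = posterior({"alive": 0.02, "dead": 0.06})
    print(round(post["dead"], 3))  # prints 0.288
```

With equal likelihoods the posterior simply reproduces the prior (about 88% alive); a patient's data must therefore be considerably more likely under "dead" than under "alive" before the posterior probability of death becomes high.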
Model Performance: Finally, to assess model performance after applying the reject option, we computed the NPV, the PPV, the percentage of subjects without a prediction and the overall proportion of correct predictions (i.e., the number of patients with a correct prediction divided by the number of patients receiving a prediction) in the training and test samples. To demonstrate how well the algorithms generalized to new, previously unseen data, we compared and report the results from the training and test samples [13]. All reported tests were two-sided, and p values < 0.05 were considered statistically significant. All statistical analyses in this report were performed using STATISTICA 13 (StatSoft, Tulsa, OK, USA) and MATHEMATICA 13 (Wolfram Research, Inc., Champaign, IL, USA) [14].
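The performance measures listed above can be computed from a list of predictions in which rejected patients are marked as `None`. This is a minimal sketch with hypothetical names; as in the text, accuracy is computed only over patients who received a prediction, and death within 1 year is treated as the positive class.

```python
from typing import Dict, Optional, Sequence


def performance_with_reject(
    pred: Sequence[Optional[int]], truth: Sequence[int]
) -> Dict[str, float]:
    """NPV, PPV, accuracy among predicted patients and fraction without a prediction.

    pred: 1 = predicted death, 0 = predicted survival, None = rejected.
    truth: 1 = died within 1 year, 0 = alive at 1 year.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)
    tn = sum(1 for p, t in pairs if p == 0 and t == 0)
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)
    n_pred = tp + fp + tn + fn  # patients who received a prediction
    nan = float("nan")
    return {
        "npv": tn / (tn + fn) if tn + fn else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "accuracy": (tp + tn) / n_pred if n_pred else nan,
        "rejected_fraction": 1 - n_pred / len(truth) if len(truth) else nan,
    }


if __name__ == "__main__":
    pred = [0, 0, None, 1, 0, None, 1, 0]  # illustrative, not study data
    truth = [0, 0, 1, 1, 1, 0, 0, 0]
    print(performance_with_reject(pred, truth))
    # prints {'npv': 0.75, 'ppv': 0.5, 'accuracy': 0.6666666666666666, 'rejected_fraction': 0.25}
```

Computing the same function separately on the training and test samples gives the comparison used to judge generalization.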

Results
A total of 629 patients undergoing TF-TAVI between January 2012 and August 2018 were included in the study. Table 1 provides an overview of the demographic data. The prosthesis models and label sizes are shown in Supplementary Table S4. One-year survival status was missing for 64 patients, so the records of 565 patients were included in modelling. Of these 565 patients, 88.1% (498/565) were alive after 1 year and 11.9% (67/565) died within 1 year. This information was used as the a priori distribution. Univariate significant predictors: Of the 40 candidate variables, 19 were significant predictors, namely baseline transvalvular Dmax, p = 0.049; C-reactive protein (mg/dL), p = 0.0003; QRS complex duration (ms), p = 0.008; QTc interval (ms), p = 0.005; baseline hemoglobin level (g/dL), p = 0.016; baseline white blood cell count (×1000/µL), p = 0.033; age, p = 0.0006; baseline left ventricular ejection fraction, p = 0.0015; computed tomography (CT)-based aortic annulus area (mm²), p = 0.008; CT-based aortic annulus perimeter (mm), p = 0.019; SAPS2 value on intensive care unit admission, p = 0.0006; discharge transvalvular Dmean, p = 0.039; postprocedural peak creatinine (mg/dL), p = 0.000006; extracardiac arteriopathy, p = 0.016; non-insulin-dependent diabetes mellitus, p = 0.02; baseline mitral valve regurgitation, p < 0.0001; valve prosthesis label size, p = 0.04; prior valvuloplasty, p = 0.02; and postprocedural paravalvular regurgitation, p = 0.016. An overview of the candidate predictor variables is given in Table 2. Medical expert knowledge and the results of published studies were used to select the 40 candidate predictors expected to have high or moderate predictive power, and a genetic feature selection algorithm was used to further reduce the set of predictors.
Performance of Other ML Algorithms: Before the reject option was applied, model performances in terms of NPV and PPV were very similar across all models, with support vector machines and random forests performing better (Table 3). Random forest and support vector machine models also showed the most promising results in terms of receiver operating characteristic curves and were therefore further analyzed using the reject option to increase NPV and PPV, at the cost of some patients not receiving a prediction (Table 4). [Table footnote: NPV = negative predictive value; PPV = positive predictive value; ¹ no estimation of PPV was possible.] To explain why some patients received no prediction, we illustrate how strongly the distributions of the two outcome groups overlap. As illustrated, there are considerable overlaps between the distributions, indicating the need for the reject option (Figure 1).
Model Performances Using the Random Forest Model after Applying the Reject Option: The best ML model achieved an NPV of 96% and a PPV of 92%. The algorithm excluded 109 of 565 (19%) patients, and the overall proportion of correct predictions was also 96% (Figure 2; Tables 4 and S5).
In the remaining 437 cases (81%) in which a prediction was made, the corresponding NPV was 96% and the PPV was 92% (Table 4). To provide a better understanding of the models and the reject option, we present real data (Table 5).
Table 5. Illustration of model performance for predicting 1-year survival for eight patients undergoing TAVI. Green highlighted patients (columns) were correctly predicted, and red highlighted patients were falsely predicted; patients marked in grey did not receive a prediction due to the reject option. NPV was 96%, PPV was 92% and the overall proportion of correct predictions was 96% after application of the reject option (Table 4). The cost of this improvement is that 19% of all patients (marked in grey) did not receive a prediction.

Discussion
Today, TAVI represents, together with SAVR, the main treatment for severe aortic valve stenosis. The classical and most widely applied scores for predicting perioperative mortality, namely EuroSCORE II and STS-PROM, were designed for SAVR, and many randomized trials conducted over the past decade have shown that they predict outcomes inadequately when applied to TAVI across all risk categories [1]. Whereas these scores were originally intended to support the heart team in choosing between TAVI and SAVR, recent evidence of good early and mid-term results in favour of TAVI, in some cases even superiority, has made their use for this purpose obsolete. Indeed, recent guidelines focus on life expectancy and anatomical variables as the main issues for the heart team's discussion [15]. Beyond the choice between TAVI and SAVR, the spread of TAVI raises new challenges for clinicians. Given the higher prevalence of aortic stenosis in the elderly population, the use of TAVI over SAVR is expected to continue to increase. The higher cost of the transcatheter valve and concerns about the economic sustainability of TAVI as first-line treatment will be a challenge for the next decade [16]. Shorter lengths of stay and lower resource consumption have the potential to lower hospital costs significantly [17]. On the other hand, such strategies could compromise patient safety. Prior experiences with standard algorithms did not result in a statistically significant reduction in 1-year mortality (e.g., Spence and colleagues noted a reduction from 10.5% to 6.4% using the 11-point Discharge Risk Evaluation criteria) [18]. Instruments that support a personalized decision process, ideally with a prediction horizon longer than 30 days, are therefore needed.
The role of ML models in medicine for recognizing complex statistical patterns is a matter of debate. To the best of our knowledge, only a few studies have investigated the performance of ML in predicting clinical outcomes after TAVI [19,20], and their predictions were limited to in-hospital outcomes. Hernandez-Suarez and colleagues, using a large multi-hospital data set (the National Inpatient Sample [NIS]) including 10,883 TAVI procedures, tested four ML models for predicting all-cause in-hospital mortality: logistic regression, neural network, naïve Bayes and random forest [19]. The logistic regression model had the best area under the curve (AUC = 0.92; 95% confidence interval: 0.89 to 0.95). Overall, acute kidney injury was the variable with the greatest importance across all ML algorithms, followed by cardiogenic shock, fluid and electrolyte disorders, cardiac arrest, sepsis, dyslipidemia, hypertension, coagulopathy, current smoking and vascular complications. Unfortunately, variables were defined using ICD-9-CM codes rather than the VARC criteria, which may introduce bias, especially for clinical variables. Although this coding is highly accurate for procedures (for billing purposes), clinical diagnoses unrelated to billing may be omitted and may not reflect the true prevalence of risk factors, and anatomical variables are not evaluated at all. Moreover, the timing of some variables (such as cardiogenic shock) is not well defined and could refer either to the baseline condition or to the postprocedural period. Finally, their study population included patients only up to 2015, thus excluding the intermediate-risk population.
Gomes et al. recently published their experience using three models (neural networks, support vector machines and random forests) to predict in-hospital outcomes such as all-cause mortality, stroke, major vascular complications, paravalvular leakage and new pacemaker implantation [20]. In contrast to Hernandez-Suarez and colleagues, their data set was based on a single-centre experience with 451 patients and included 83 variables (clinical, electrocardiographic, echocardiographic, CT and laboratory variables as well as postprocedural complications) defined according to the VARC criteria.
However, the data set of Gomes et al. was dominated by echocardiographic parameters, whereas some important information, such as CT-based annular dimensions and oversizing/undersizing, was missing [20]. The authors found that the random forest model was the most promising for in-hospital mortality (AUC = 0.97, accuracy = 90%, sensitivity = 0.96), in accordance with our study, but they could not accurately predict other outcomes.
The limitations of the above-mentioned NIS and Gomes models, mainly their restriction to short-term prediction, make them unsuitable for supporting an early discharge strategy. The novelty of our MALAVI-t1y (TAVI 1-year) model can be summarized in three points: (1) it is the first ML model for patients with TAVI designed to predict the longest-term outcome (i.e., 1-year survival) reported in the literature to date; (2) it is the first ML model that takes into account defined, quantitative anatomical variables alongside other well-known variables; and (3) it is the first model to introduce the reject option and thus achieve higher model performance.
The wider application of TAVI to patients at lower risk, and thus with longer life expectancy, shifts the focus to long-term outcomes. Successful implantation of a transcatheter valve in the aortic position does not reverse the underlying pathology of degenerative aortic stenosis, and late-onset complications can lead to death. Retained calcified leaflets can lead to leaflet thrombosis and stroke [21], as well as to residual paravalvular regurgitation, which is associated with reduced survival at 2 years [22]. Moreover, a new postprocedural conduction disturbance such as left bundle branch block (incidence of about 15%) did not carry additional risk at short-term (30-day) follow-up but was associated with an increased risk of all-cause (19.3%) and cardiovascular (16.2%) mortality at 2 years [23]. Therefore, focusing on mid-term follow-up seems to better address the needs of the modern TAVI population. Interestingly, some of the 19 variables that contribute to the prediction are modifiable. For example, a low LVEF can be treated effectively by optimizing medical therapy [24,25], and anemia and infections can also be readily addressed. Applying the KDIGO guidelines (the "KDIGO bundle": avoidance of nephrotoxic agents; discontinuation of angiotensin-converting enzyme inhibitors and angiotensin II receptor blockers for the first 48 h after the procedure; avoidance of hyperglycemia for the first 72 h after the procedure; avoidance of radiocontrast agents; continued close monitoring of urinary output via indwelling urinary catheters; and, most importantly, goal-directed fluid therapy focused on evidence-based hemodynamic endpoints) can reduce the frequency and severity of acute kidney injury [26,27].
Recently, Flores-Emanzor and colleagues reported promising results with percutaneous occluders for treating moderate-to-severe residual paravalvular leaks [28]. Moderate-to-severe mitral regurgitation that does not regress after TAVI can be treated effectively by percutaneous means, with good 5-year outcomes [29]. Addressing these factors through a personalized pathway of care may, in theory, prolong the survival of these patients. The use of ML to guide the long-term care of patients is beyond the scope of the present study and should be investigated in dedicated future studies.
In our model, aortic annulus dimensions (area and perimeter) assessed with MDCT were associated with survival. The role of this anatomical variable has already been widely investigated [5][6][7] and is now an important part of the heart team discussion [15]; however, its inclusion in prior ML models for TAVI has been limited. Our results stress the importance of this variable and of a careful, shared discussion of anatomical structures within the heart team.
To the best of our knowledge, MALAVI-t1y is the first ML model for patients with TAVI that uses the reject option, which allows the model to achieve higher NPV and PPV (visual abstract). The goal of a model with high NPV and PPV was achieved, however, at the cost that 19% of all patients did not receive a prediction. We therefore suggest that follow-up studies identify additional predictors to build models that reduce this proportion while maintaining high NPV and PPV. Classical risk scores, which express risk as a percentage, always return a value; however, they are not always simple to apply to dichotomous outcomes such as mortality, and they prolong the discussion. Physicians cannot discharge four-fifths of a single patient alive while the remaining fifth dies: the patient either lives or dies. Establishing cut-off values leads to inaccuracies and failures of prediction. The purpose of the reject option is to support the clinician in the decision process. If MALAVI-t1y is "not sure" about a prediction, it does not make one; this is the mathematical equivalent of the journalistic "no comment". This occurs in about one-fifth of patients (19%), whereas for the other four-fifths the model makes highly accurate predictions. Considering that ML is not intended to replace clinical decision making but rather to support it, we find this an acceptable compromise. The clinical decision remains the domain of the clinicians, within a cooperative relationship with the patient.

Strengths and Limitations of This Study
Strengths: The most important strength of this study is that it provides a prediction model for patients with TAVI that identifies, with excellent accuracy, patients who will survive at 1 year. In addition, the models are not merely theoretical concepts: they can be implemented in any database and are of practical use in the clinic to support clinicians' decisions. The relatively large sample size of 565 patients is also an important strength, because a large sample is important for obtaining statistically sound models, as emphasized by mathematical statistics and ML theory. Importantly, the models were 10-fold cross-validated in the learning sample and, after application of the reject option, also tested in an independent test sample to avoid overfitting and to provide accurate predictions for new, previously unseen data (generalization to new data and model stability).
Limitations: Some limitations must be disclosed. This algorithm is based on retrospective data from a single centre. Although the TAVI procedure is largely standardized worldwide, and our results are in line with those reported in the international literature, external validation and prospective observational (not interventional) studies are recommended; our group is currently working on both. The study population consisted predominantly of high- and intermediate-risk patients treated under general anesthesia, because during the study period TAVI was not yet endorsed for low-risk or younger patients. As 1-year mortality in this latter group is expected to be even lower, future studies should also investigate the reliability of the model in this specific category of patients. The same consideration applies to procedures performed under local anesthesia, which now represents the standard of care. Finally, readers should be aware that all included patients were selected by a heart team in accordance with the guidelines, meaning that each patient was assigned to either TAVI or SAVR to achieve the best performance and the lowest possible complication rate. This a priori selection is also fundamental for the correct functioning of the MALAVI model.

Conclusions
Our machine learning model identifies a subcohort comprising 81% of the original cohort and predicts 1-year mortality in this subcohort with an accuracy of 96%. After application of the reject option, the NPV of the final random forest model was 96% and the PPV was 92%. Decisions for the remaining 19% of patients remain challenging, and it is up to the clinical expert to make a prediction. This finding may support the heart team in selecting patients eligible for early discharge strategies. These results may improve early discharge strategies and optimize resource allocation without compromising patient safety.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/jcm12175481/s1, Table S1. STROBE Statement. Table S2. List of all variables used for MALAVI project. Table S3. Overview of preprocessing, model training, split of data into training, validation and test sample, model performance and final test of the model. Table S4. An overview of label sizes according to implanted prosthesis. One patient died intraoperatively after balloon dilatation and before prosthesis implantation because of annulus rupture. Table S5

Institutional Review Board Statement:
The study was conducted in accordance with the STROBE guidelines and the Declaration of Helsinki and was approved by the Ethics Committee of Paracelsus Medical University-Campus Nuremberg (IRB-2022-10).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data sets used and/or analyzed during the current study are available in anonymized form from the corresponding author upon reasonable request.

Conflicts of Interest:
D.E. declares speaker fees from Edwards Lifesciences. F.P. is the inventor of an arterial cannula for mechanical circulatory support (patent EP3860674). The other authors have declared no conflicts of interest. No funding was provided for this study.