Causal Models for the Result of Percutaneous Coronary Intervention in Coronary Chronic Total Occlusions

: Background: Patients undergoing coronary angiography very frequently exhibit coronary chronic total occlusions (CTOs). Over the last decade, there has been an increasing acceptance of the percutaneous coronary interventions (PCI) in CTOs due to, among else, rising operator experience and advances in technology. This study is an effort to address the problem of identifying important factors related to the success or failure of the PCI. Methods: The analysis is based on the EuroCTO Registry, which is the largest database available worldwide, consisting of 164 variables and 29,995 cases for the period 2008–2018. The aim is to assess the dynamics of causal models and causal discovery, using observational data, in predicting the result of the PCI. Causal models use graph structure to assess the cause–effect relationships between variables. In this study, the constrained-based algorithm PC was employed. The focus was to ﬁnd the local causal structure around the PCI result and use it as a feature selection tool for building a predictive model. Results: The model developed was compared with other modeling approaches from the literature, and it was found to perform equally well or better. Conclusions: The analysis showcased the potential of employing local causal structure in predictive model development.


Introduction
Coronary chronic total occlusions (CTOs) are defined as occlusions with at least 3 months duration, and thrombolysis in myocardial infarction flow grade 0 (TIMI = 0) [1]. It is a very common condition for patients undergoing coronary angiography with approximately 20% of the patients exhibiting CTOs. They are considered the most complicated malfunctions of coronary arteries that a cardiologist might address. During the last decade, there has been an increasing acceptance of the percutaneous coronary interventions (PCI) in CTO, with an increasing success rate [1]. PCI is a nonsurgical procedure used to treat narrowing of the coronary arteries of the heart found in coronary artery disease. The process involves combining coronary angioplasty with stenting. The success rate of PCI has been increased by, among other reasons, operator experience and advances in technology, both having a positive impact in the success of the operation, and currently achieving levels of even 93%.
There exist several attempts in the literature to assess the impact of different predictor variables to the success or failure of the PCI, ranging in size of the dataset from a few hundred cases [2] to approximately 20,000 [3]. In most studies, the authors develop a scoring system that categorizes patients in risk groups relevant to the success of the operation [2][3][4][5][6][7]. The operation success ranges from 50% in the seminal paper of Morino et al. [2] to 93% in [5].
The current study is based on the largest database available worldwide (EuroCTO Registry), which consists of 29,995 cases referring to the period 2008-2018 (including demo-graphic data, clinical, anatomic, procedure parameters, etc.). The motivation in this study was to assess the dynamics of causal models and causal discovery, using observational data, in predicting the result of the PCI. Causal models are used for the first time in this context to the best of our knowledge. Current approaches are mainly based on standard regression analysis and standard correlation assessment, to detect the most appropriate predictor variables for the response variable of interest. Causal models, on the other hand, aim to discover the cause-effect relationships between variables. The focus in this paper was to find and use the local causal structure around the response/target variable (i.e., operation result) to predict its outcome, particularly the so-called Markov blanket. By definition, the Markov blanket of a variable T is the minimal variable subset conditioned on which all other variables are probabilistically independent of T. The Markov blanket of T consists of the variables representing its parents, children, and other parents of its children in the graph (spouses). Using the Markov blanket of the response variable constitutes one of the emerging successful filtering approaches in variable selection [8].
In [9], the authors used causal models, particularly the probabilistic graphical models, to assess whether lung cancer prediction can be improved. The motivation in their approach was that the Markov blanket around a target variable contains, by definition, the most informative variables for the target variable. They found that their modeling approach, which was based on causal graphs, performed better than other competitive models in the literature. In [10], the authors used extensive simulations to assess the usage of Markov blanket as a feature selection tool compared to other methodological schemes. Particularly, they compared the predictive accuracy of eight logistic regression-based models, in which the set of predictors was selected with a different reasoning. Their results demonstrated empirically that the Markov blanket-based logistic regression model performed equal to or better than all its competitors. The authors argued that using causal models to identify the Markov blanket of a target variable might be a useful, efficient strategy to select predictors in clinical risk prediction models.
The aim of this study is to assess the dynamic of the Markov blanket as a feature selection tool for detecting predictors that are causally related with the result of the PCI, and therefore can be used as input for models predicting the PCI success or failure.

Patient Population
The EuroCTO club is a collaborative effort among high-volume CTO operators in 55 European centers aimed at sharing experiences and outcomes data. The EuroCTO registry is an electronic database, developed by EuroCTO club, including data since 2008, related to patients in whom chronic total occlusion (CTO) recanalization was attempted. In this database, multiple variables are recorded for every patient regarding preprocedural demographic and anatomical characteristics, procedural details, and postprocedural outcomes. It is the largest registry worldwide, including 29,995 cases. The data for this analysis refers to a 10 years' time period starting in 2008.

•
The degree of calcification was visually estimated on fluoroscopy, defined as moderate when one-half of the total CTO segment exhibited visible residues, and as severe when the extension of calcification was >50% of the segment. • Arterial tortuosity arises from abnormal elongation of the arteries and is characterized by blood vessel abnormalities, particularly abnormal twists and turns. The CTO was defined as straight if the pre-occlusive segment contained a bend of <70; moderate when a segment contained either two bends >70, or one bend >90; and as severe when it contained either two or more pre-occlusive bends of >90, or at least one bend of >120.

•
The length of coronary occlusions was visually estimated from angiographic projections with single-or dual-contrast injections.
• Chronic obstructive pulmonary disease (COPD) is a chronic inflammatory lung disease that causes obstructed airflow from the lungs. It is characterized by long-term respiratory symptoms and airflow limitation.

•
The morphology of the vessel stump was characterized as tapered, blunt, or unseen, depending on its appearance on fluoroscopy. The angiographic assessment of collateral connections was made according to the Werner classification. • Dyslipidemia is present when an abnormally high amount of lipids is present.

•
Renal failure is a long-term condition of kidney malfunction.

•
Family history of coronary disease is present when one or more close relatives had early coronary artery disease (men <55 years, women <65 years).

•
Peripheral disease was defined when abnormal narrowing of arteries was present. • Segmental regional abnormalities were defined as regional abnormalities in contractile function.

•
Previous MI was defined as documented history of previous myocardial infarction.

•
In-stent CTO was defined when CTO inside a previous PCI existed. • CTO location represented the vessel in which the CTO is located.

•
Previous CABG corresponded to history of coronary artery bypass surgery.

•
Technical success was defined as a residual stenosis of <10% at the end of the procedure with TIMI flow grade three antegrade flow.

PC Algorithm
The PC algorithm [11,12] is a method commonly used to learn the structure of a causal Bayesian network. More specifically, for each pair of variables (X, Y) in a dataset, the PC algorithm tests their conditional independence given the remaining variables, and it claims the nonexistence of a causal relationship between X and Y, i.e., no edge to be drawn between X and Y in the corresponding graph, when X and Y are independent given some other variables. Practically, the PC algorithm assesses the association conditioning on all subsets of all variables other than X and Y, in order to determine whether there exists a persistent association between X and Y [13]. The relationship is causal when the association exists given each of the conditioning sets. A network with a structure consistent with the results of the tests of independence is returned. It is assumed that causal sufficiency [12], which is a typical condition for the aforementioned algorithm, holds. This condition requires that for every pair of measured variables, all their common direct causes are also measured. In other words, there are no hidden, unmeasured confounders for any pair of variables.

Preprocessing
The initial dataset consisted of 29,995 cases and 164 variables. In order to focus on detecting predictors for the CTO PCI (response variable), conceptual (variables irrelevant to the result of the PCI were excluded), relevance (variables with weak association with the response were filtered out), and literature (variables considered important in the literature were included) criteria were used to filter the initial dataset and exclude a large number of variables. In addition, variables with missing values over 20% were not assessed. No imputation method for the missing values was considered, in order for the analysis to be based only on observed data. The association of the predictor variables with the response was separately assessed for each predictor variable with binary logistic regression analysis. Associations with a p-value > 0.05 resulted in excluding the corresponding variables from the dataset. The final dataset included 22 out of the 164 original variables with 17,370 complete cases.
This dataset was randomly partitioned into two sub-datasets, namely the training and the test datasets, which corresponded to 70% (n = 12,160) and 30% (n = 5210) of its cases, respectively. The training dataset was used for model development. The test dataset was used for model assessment. Since a strong class imbalance was observed between the two categories of the PCI result, failure and success, both in the initial dataset (n = 29,995 cases; 14% failure, 86% success) and in the filtered dataset (n = 17,370 cases; 12.4% failure, 87.6% success); the latter distribution remaining the same in the training dataset due to random partition, downsampling was applied. Particularly, the training dataset was randomly downsampled to address the class imbalance between the two classes of the response variable. The downsampling procedure employed randomly excluded cases with operation success until resulting in a dataset with equally sized classes (success and failure) for the response variable (n = 3026). Despite losing information, downsampling was selected, since it was assumed that it is of equal interest to correctly predict both the successful and unsuccessful operation results. In addition, by selecting downsampling compared to other approaches (e.g., upsampling), the likelihood of overfitting and consequent negative impact in the prediction model's performance was avoided [14]. The above procedure is depicted in a flow chart ( Figure 1).
Appl. Sci. 2021, 11, x FOR PEER REVIEW 5 In the RECHARGE model, the variable "Disease distal landing zone" was exclu since in the EuroCTO registry it exhibited a large number of missing values. Similarl the case of the PROGRESS model, the variable "No interventional collateral" was excluded.
For all hypothesis tests, the level of statistical significance was set at a = 0.05. analysis was performed with R version 4.1.0.

Results
The variable characteristics of the training and test datasets are shown in Sup mental Tables A2, A3 for all the 21 variables considered, separately for the cases successful operation and unsuccessful operation. The results show that the training test datasets are homogeneous regarding all variables considered. For completeness variable characteristics are shown as well for the downsampled training dataset (Sup  mental Table A1), where the results are, as expected, different in general compared to other two datasets.
The causal graph developed based on the PC algorithm is displayed in Figure 2. graph revealed the complicated network structure of causal relations among factors have effect on the operation result. Five variables appear to have a cause-effect rela ship with the operation result, particularly the CTO length, proximal tortuosity, stu calcification, and segmental regional are all direct causes of the operation result. The eration result does not have any effects/children in the graph; thus, the Markov blank the operation result is composed from its causes alone.

Analysis
Standard descriptive analysis was performed both for the qualitative variables (frequencies, percentages) and for the quantitative variables (mean, standard deviation).
In the downsampled training dataset, the PC algorithm was used to develop the causal graph. Since both ordinal and continuous variables were included in the dataset, the constrained-based PC algorithm was used [12] with the R package "MXM" [15], which is appropriate for mixed data. In particular, the function "pc.skel" was used to produce the skeleton of the network, employing the "comb.mm" method (to assess the conditional independence for every pair of variables, each of the two variables is treated as response and the appropriate regression model is fitted. Next, two likelihood ratio tests are performed, and the two emerging p-values are combined in an overall p-value).
Next, the Markov blanket corresponding to the response variable was retrieved. All the variables that constituted the Markov blanket of the response were included as independent variables in a multivariate binary logistic regression model with the operation result as the dependent variable. This model was trained on the downsampled training dataset, resulting in the so-called Markov blanket for feature selection (MBFS) model.
The predictive accuracy of the MBFS model was assessed on the test dataset. Standard statistical accuracy measures were used, namely overall accuracy (overall proportion of true predictions), sensitivity (true predictions of successful operations over total number of successful operations), specificity (true predictions of unsuccessful operations over total number of unsuccessful operations), and the value of the area under the receiver operation characteristics (AUROC) curve [16].
Next, the predictive accuracy of the MBFS model was compared with five predictive models from the literature, namely, JCTO [2], CASTLE [3], RECHARGE [6], CL [4], and PROGRESS [5]. Particularly, the predictors included in each of these five models in the original manuscripts were included as independent variables in five multivariate binary logistic regression models, with the operation result as the target variable, respectively. For fairness, these five models were trained anew in the downsampled training dataset of this study, and new coefficient estimators were computed for the independent variables. Then, their predictive accuracy was assessed on the test dataset, and their performance was compared to the MBFS model. For the comparison, the Akaike information criterion (AIC) was computed after training each model on the downsampled training dataset, and the AUROC was computed while assessing each model on the test dataset.
In the RECHARGE model, the variable "Disease distal landing zone" was excluded since in the EuroCTO registry it exhibited a large number of missing values. Similarly, in the case of the PROGRESS model, the variable "No interventional collateral" was also excluded.
For all hypothesis tests, the level of statistical significance was set at a = 0.05. The analysis was performed with R version 4.1.0.

Results
The variable characteristics of the training and test datasets are shown in Supplemental Tables A2 and A3 for all the 21 variables considered, separately for the cases with successful operation and unsuccessful operation. The results show that the training and test datasets are homogeneous regarding all variables considered. For completeness, the variable characteristics are shown as well for the downsampled training dataset (Supplemental Table A1), where the results are, as expected, different in general compared to the other two datasets.
The causal graph developed based on the PC algorithm is displayed in Figure 2. The graph revealed the complicated network structure of causal relations among factors that have effect on the operation result. Five variables appear to have a cause-effect relationship with the operation result, particularly the CTO length, proximal tortuosity, stump, calcification, and segmental regional are all direct causes of the operation result. The operation result does not have any effects/children in the graph; thus, the Markov blanket of the operation result is composed from its causes alone.
The five variables included in the Markov blanket were then included in a multivariate binary logistic regression model, and the results are shown in Table 1. All five variables were statistically significant in the multivariate model.    In Table 2, the assessment results of the MBFS model are shown. The overall accuracy of the MBFS model at a prediction probability threshold of 0.5 was 66.47%, with sensitivity at 66.62% (representing the percentage of correctly identifying cases with successful operation), and specificity at 65.43% (representing the percentage of correctly identifying cases with unsuccessful operation). The AUROC was 0.7145. Table 2. Accuracy measures for the MBFS model over the test dataset (n = 5210). The values of overall accuracy, sensitivity, specificity, and area under the receiver operation characteristics (AUROC) curve are displayed (the selected threshold for the predicted probabilities was set to 0.5).  Figure 3 shows the tradeoff between sensitivity, specificity, Youden index (sum of sensitivity and specificity minus one), and misclassification error for different prediction probability thresholds. It is displayed that at the standard threshold of 0.5, the curves corresponding to sensitivity and specificity almost cross each other, thus resulting in very similar values (66.62% and 65.43%). At the same time, the Youden index approximately receives its maximum value (0.3205). On the other hand, the misclassification error is 33.44%, which is higher than its minimum value (12.44%), corresponding to prediction probability threshold equal to 0.035 (which is clearly, however, very close to zero). The reason for this is that in the test dataset, the class imbalance in operation success still exists, thus, decrease in sensitivity results in higher values of misclassification error, despite the larger increase in the specificity. Assuming that it is of equal interest to correctly predict both the successful and unsuccessful operation results, a slight increase of the misclassification error is acceptable provided that high values of specificity are achieved. In Table 2, the assessment results of the MBFS model are shown. The overall accuracy of the MBFS model at a prediction probability threshold of 0.5 was 66.47%, with sensitivity at 66.62% (representing the percentage of correctly identifying cases with successful operation), and specificity at 65.43% (representing the percentage of correctly identifying cases with unsuccessful operation). The AUROC was 0.7145. Table 2. Accuracy measures for the MBFS model over the test dataset (n = 5210). The values of overall accuracy, sensitivity, specificity, and area under the receiver operation characteristics (AUROC) curve are displayed (the selected threshold for the predicted probabilities was set to 0.5).

Accuracy Measures MBFS
Overall Accuracy 0.6647 Sensitivity 0.6662 Specificity 0.6543 ΑUROC 0.7145 Figure 3 shows the tradeoff between sensitivity, specificity, Youden index (sum of sensitivity and specificity minus one), and misclassification error for different prediction probability thresholds. It is displayed that at the standard threshold of 0.5, the curves corresponding to sensitivity and specificity almost cross each other, thus resulting in very similar values (66.62% and 65.43%). At the same time, the Youden index approximately receives its maximum value (0.3205). On the other hand, the misclassification error is 33.44%, which is higher than its minimum value (12.44%), corresponding to prediction probability threshold equal to 0.035 (which is clearly, however, very close to zero). The reason for this is that in the test dataset, the class imbalance in operation success still exists, thus, decrease in sensitivity results in higher values of misclassification error, despite the larger increase in the specificity. Assuming that it is of equal interest to correctly predict both the successful and unsuccessful operation results, a slight increase of the misclassification error is acceptable provided that high values of specificity are achieved.  The results of the comparison among the MBFS model and the models JCTO, CASTLE, RECHARGE, CL, and PROGRESS are shown in Table 3 and Figure 4. In Table 3, the number of the features is displayed for each model, along with their specific names, the AIC value obtained based on the corresponding logistic regression model (downsampled training data), and the AUROC value (test data). The differences among the models are small, which has already been noted, among else, in [17]. The proposed MBFS model is the best one in terms of both the AIC (exhibited minimum value) and the AUROC (exhibited maximum value), compared to all the remaining models. Particularly, the MBFS model exhibited an AIC value of 3751.7, compared to (in increasing order) 3760. 6  Note that this is not a strict ranking of the above models, since the JCTO, CASTLE, RECHARGE, CL, and PROGRESS models are used in a different way than in the original articles, and mainly as feature selection tools in this study. In Figure 4, it is visually exhibited that the AUROC corresponding to the MBFS model surpasses all other models.

Discussion
CTO is a very common condition for patients that undergo coronary angiography. The increasing acceptance of PCI in CTO in recent years highlights the importance of efficiently detecting the factors that are strongly related with the result of the PCI for prediction model development. Several attempts are available in the literature that assess the impact of independent predictor variables to the PCI result. To the best of our knowledge in all previous attempts, the authors have used regression-based methodological procedures to select the predictor variables that will be included in their models. In this study, a new approach is proposed to select the most important predictor variables. This approach is based on exploiting the dynamics of cause-effect relationships between variables. The focus was on the local causal structure around the target variable, particularly on its Markov blanket. The Markov blanket of the operation result was used as a feature selection tool towards developing the prediction model for the result of the PCI. Essentially, variables detected to be causally related with the result of the PCI were included as independent factors in a prediction model for the result of the PCI (success/failure). A similar approach has been already successfully applied in [9], aiming in detecting the most informative variables in lung cancer. The proposed prediction model, MBFS, included all five variables that constituted the Markov blanket of the operation result, namely, the CTO length, proximal tortuosity, stump, calcification, and segmental regional. The impact of proximal tortuosity, stump, calcification, and CTO length to the operation result (see Table 1) was found to be in agreement with the literature [2][3][4][5][6][7]. Particularly, it is known that the proximal tortuosity categories moderate and severe are compounding regarding the success of the operation compared to the straight class (odds ratio values 0.671, 0.355, respectively). Similarly, the classes moderate and severe of calcification are compounding regarding the success of the operation compared to the none/mild class (odds ratio values 0.697, 0.395, respectively). The CTO length negatively impacted the operation success, exhibiting odds ratio equal to 0.986. On the other hand, the stump categories tapered and blunt positively impacted the

Discussion
CTO is a very common condition for patients that undergo coronary angiography. The increasing acceptance of PCI in CTO in recent years highlights the importance of efficiently detecting the factors that are strongly related with the result of the PCI for prediction model development. Several attempts are available in the literature that assess the impact of independent predictor variables to the PCI result. To the best of our knowledge in all previous attempts, the authors have used regression-based methodological procedures to select the predictor variables that will be included in their models. In this study, a new approach is proposed to select the most important predictor variables. This approach is based on exploiting the dynamics of cause-effect relationships between variables. The focus was on the local causal structure around the target variable, particularly on its Markov blanket. The Markov blanket of the operation result was used as a feature selection tool towards developing the prediction model for the result of the PCI. Essentially, variables detected to be causally related with the result of the PCI were included as independent factors in a prediction model for the result of the PCI (success/failure). A similar approach has been already successfully applied in [9], aiming in detecting the most informative variables in lung cancer. The proposed prediction model, MBFS, included all five variables that constituted the Markov blanket of the operation result, namely, the CTO length, proximal tortuosity, stump, calcification, and segmental regional. The impact of proximal tortuosity, stump, calcification, and CTO length to the operation result (see Table 1) was found to be in agreement with the literature [2][3][4][5][6][7]. Particularly, it is known that the proximal tortuosity categories moderate and severe are compounding regarding the success of the operation compared to the straight class (odds ratio values 0.671, 0.355, respectively). Similarly, the classes moderate and severe of calcification are compounding regarding the success of the operation compared to the none/mild class (odds ratio values 0.697, 0.395, respectively). The CTO length negatively impacted the operation success, exhibiting odds ratio equal to 0.986. On the other hand, the stump categories tapered and blunt positively impacted the operation success compared to the no category (odds ratio values 3.247, 1.946, respectively). Interestingly, a new predictor variable was included in this model, namely the segmental regional. It was found that the category representing hypokinetic/akinetic/dyskinetic was compounding for operation success, compared to the category representing normal segmental regional, exhibiting an odds ratio equal to 0.674 (Table 1).
The MBFS model that is proposed in this study was assessed and compared to five well-known models from the literature. In the corresponding manuscripts of these five models [2][3][4][5][6], a multivariate binary logistic regression model was developed at first, and then a risk score was computed based on the magnitude of the coefficient estimates of the independent variables that were included in each model. Here, we were only interested in assessing the predictive ability of the binary logistic regression models themselves, which were all retrained in this dataset for fairness. Thus, the comparison is not a strict attempt to rank the MBFS and all five well-known models, but to also assess a novel approach that employs causality in detecting the most important predictor variables, compared to the standard approach that is mainly based on regression. The results of the assessment have not only shown that the MBFS model performed better than all its competitors, but a new important predictor factor emerged as well, namely, the segmental regional.
Building on this experience, a next step might be to further exploit the cause-effect relationships, not only with the target variable (here operation result), but between predictor variables as well. This could possibly assist in detecting yet unexplored variable relationships, and enrich the capacity of the final prediction model.

Conclusions
The above discussion highlights the strong potential of causality as a feature selection tool in prediction model development. Particularly, capitalizing on the dynamics of local causal structure and the Markov blanket resulted in a promising prediction model, and showcased the prospects of employing causal relationships in building prediction models.
Causal model development has yet to be thoroughly assessed as a prediction tool in different scientific fields. In this study, the usage of local causal structure as a feature selection tool for prediction regarding the result of PCI in CTO resulted in developing a very competitive model, thus, highlighting the potential of causal models for applications. Data Availability Statement: The data will not be made available to other researchers for purposes of reproducing the results.

Conflicts of Interest:
The authors declare no conflict of interest.