Towards a Deep-Learning Approach for Prediction of Fractional Flow Reserve from Optical Coherence Tomography

Abstract: Cardiovascular disease (CVD) is the number one cause of death worldwide, and coronary artery disease (CAD) is the most prevalent CVD, accounting for 42% of these deaths. In view of the limitations of the anatomical evaluation of CAD, Fractional Flow Reserve (FFR) has been introduced as a functional diagnostic index. Herein, we evaluate the feasibility of using deep neural networks (DNN) in an ensemble approach to predict the invasively measured FFR from raw anatomical information that is extracted from optical coherence tomography (OCT). We evaluate the performance of various DNN architectures under different formulations: regression, standard classification, and few-shot learning (FSL) classification, on a dataset containing 102 intermediate lesions from 80 patients. The FSL approach that is based on a convolutional neural network leads to slightly better results compared to the standard classification: the per-lesion accuracy, sensitivity, and specificity were 77.5%, 72.9%, and 81.5%, respectively. However, since the 95% confidence intervals overlap, the differences are not statistically significant. The main findings of this study can be summarized as follows: (1) Deep-learning (DL)-based FFR prediction from reduced-order raw anatomical data is feasible in intermediate coronary artery lesions; (2) DL-based FFR prediction provides superior diagnostic performance compared to baseline approaches that are based on minimal lumen diameter and percentage diameter stenosis; and (3) the FFR prediction performance increases quasi-linearly with the dataset size, indicating that a larger training dataset will likely lead to superior diagnostic performance.


Introduction
Cardiovascular disease (CVD) is the number one cause of death worldwide, and coronary artery disease (CAD) is the most prevalent CVD, accounting for 42% of these deaths. In CAD patients, plaque builds up in the coronary arteries and limits the blood flow to the myocardium, especially when the demand is increased (exercise, stress). In severe cases, this can lead to myocardial infarction, or even death.
X-ray coronary angiography (XA) represents the gold standard in CAD imaging [1]. Optical coherence tomography (OCT) is used in certain scenarios in conjunction with XA. since we use as input raw, reduced-order anatomical data instead of hand-crafted features. The second important aspect of the study is that we focus on intermediate lesions, for which the visual anatomical assessment of CAD based on XA does not allow for a clear clinical decision. As a result, the dataset contains a large number of lesions having an FFR value that is close to the cut-off of 0.8, making the prediction task more challenging.
Deep-learning (DL) is a class of machine learning algorithms that uses multiple layers to extract higher level features from the raw input [33]. The FFR prediction task can be formulated either as a regression problem (predict the exact value of FFR) or as a classification problem (predict the FFR class, e.g., binary classification: ≤0.8 or >0.8). There are several types of neural networks that are suitable for the FFR prediction, amongst others:
- fully connected neural networks, commonly referred to as artificial neural networks (ANNs). Potential disadvantages of ANNs are the large number of trainable parameters, which leads to the requirement of large training datasets, and the difficulty in capturing the inherent properties in 1D/2D/3D data structures.
- convolutional neural networks (CNNs). Compared to ANNs, CNNs can capture the inherent properties in 1D/2D/3D data structures, but still require relatively large training sets. Also, fixed size input data are required if the network is not fully convolutional.
- recurrent neural networks (RNNs) [34]. RNNs have the advantage that a variable length input sequence can be processed, but they may be affected by vanishing and exploding gradient issues.
Few-shot learning (FSL) is a type of learning where the prediction is performed based on a limited number of samples [35]. In a study that was published by Yang et al., the models that were used for FSL were classified into four categories: multitask learning, embedding learning, learning with external memory, and generative modeling.
Herein, we evaluate the performance of ANNs, CNNs, and RNNs in both regression and classification formulations. Additionally, we also consider the use of FSL, focusing specifically on prototypical networks [47], a subcategory of the embedding learning models, considered the state of the art for classification tasks. More details that are related to prototypical networks are included in Appendix A.1.

Study Design
This was a single-center, retrospective study that was carried out at the Clinical Emergency Hospital, Bucharest, Romania. The study complied with the Declaration of Helsinki for investigation in human beings. The study protocol was approved by the local ethics committee and each patient signed an informed consent form before the enrolment in the study.

Study Population
Patients at least 18 years old, with stable angina, and an indication for diagnostic XA due to intermediate or high likelihood of obstructive coronary artery disease, were considered. Further inclusion criteria were: at least one lesion with 40% to 80% diameter stenosis by visual assessment, and invasive FFR measurement considered required by the operator for clinical decision-making. Patients were excluded if they were unable to provide informed consent, had significant arrhythmia (heart rate over 120 bpm), suspected acute coronary syndrome, atrial fibrillation, low systolic pressure (below 90 mmHg), contraindication to beta blockers, nitroglycerin or adenosine, a non-cardiac illness with a life expectancy of less than 2 years, pathological aortic valve, rest state angina, or myocardial infarct during the last 6 months. Additionally, aorto-ostial lesions were excluded from the study. A total of 80 patients were included in the study.

Procedure Protocol
Coronary angiography (Siemens Artis Zee, Forchheim, Germany) was performed after iso-centering in posterior-anterior and lateral planes, via a transradial (preferred) or transfemoral approach. In all cases, a 6 French diagnostic catheter was used after intracoronary injection of glyceryl trinitrate according to routine practice in the hospital, with manual contrast injection and cine acquisition at a frame rate of 15 frames/second. OCT imaging was performed using a frequency-domain OCT system (St. Jude Medical/Abbott, St. Paul, MN, USA). The fiber probe was pulled back at a constant speed and cross-sectional images were generated with a spacing of 0.2 mm.
The acquisition of physiological data for FFR calculation was performed according to conventional practice [48] with a commercially available FFR measurement system (PressureWire Aeris; St. Jude Medical, Minneapolis, MN, USA). The 0.014-inch coronary wire with a pressure tip was advanced until the pressure sensor passed the orifice of the guiding catheter. Transcatheter aortic and intracoronary pressure tracings were equalized. Subsequently, the guidewire was advanced into the respective coronary artery until the pressure sensor passed the index lesion. Hyperemia was induced by the administration of adenosine either intravenously at a constant rate of 140 µg/kg/min, or as an intracoronary bolus (100 µg for the right and 200 µg for the left coronary artery); the pressure recording was started, and the FFR was determined. A total of 102 coronary lesions in 80 patients underwent FFR analysis. This invasively measured FFR represents the ground truth that is used during the training of the deep neural networks, as described in the following.

Data Pre-Processing
The OCT data were exported from the OCT workstation available onsite. All OCT slices are RGB images, and the exported data contain the automatically detected coronary lumen, which is overlaid on the image and depicted in green. The spacing between the slices is 0.2 mm, and the number of slices per acquisition is constant at 376. Figure 1 displays the data pre-processing workflow starting from the exported OCT images with automatically detected lumen contour. First, the contours are automatically extracted by processing the green channel as follows: a threshold representing 90% of the maximum intensity value is used to create a binary image, and all the contours are extracted [49]. We then retain the contour which surrounds the center of the image: if there are multiple such contours, we pick the one with the largest area. Next, we use an in-house developed application to collect manual input that is provided by the clinical expert:
- selection of the proximal start and distal end slice, which define the coronary artery region of interest. Slices representing the catheter are excluded, alongside other slices with sub-optimal image quality (e.g., blood artifacts);
- rejecting/correcting erroneous contours within the selected slice-range: the automatically detected contours may be incorrect on certain slices, typically in bifurcation regions and/or if the lumen has a profoundly non-circular shape (e.g., concave shape). Erroneous bifurcation contours are rejected, while erroneous contours in the stenosis region are corrected (required in less than 10% of the OCT acquisitions).
Appl. Sci. 2022, 12, x FOR PEER REVIEW 5 of 24 Figure 1. OCT data processing workflow, including FFR prediction using a deep neural network.
Next, the data are pre-processed: the inside area of each non-rejected lumen contour in the selected slice-range is computed and the effective radius is determined (considering an equivalent circular contour with identical area). The radius of rejected contours is set using linear interpolation that is applied on the radiuses of the closest neighboring contours that have not been rejected. The radiuses are then arranged in a 1D sequence, starting with the proximal slice of the selected slice-range. Since the OCT slices are equidistant, only the radius values are used as input. For the further processing using deep neural networks, the 1D radius sequence is padded to a size of 376 (maximum length of an OCT sequence), and z-score normalization is performed [50]. The mean and standard deviation of each acquisition are computed, and then a global mean and global standard deviation are computed for the training set by averaging the mean and standard deviation values of the acquisitions that are included in the training set. The acquisitions in the validation/test split are normalized using the values that are employed for the training set. The 1D sequence of normalized radius values is used as input for the deep neural network predicting FFR.
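The pre-processing steps above can be illustrated with a minimal sketch. It assumes that rejected contours are marked as None, that the population standard deviation is used, and that padding is performed with zeros after normalization; the padding value and the exact order of padding and normalization are not stated in the text, and all function names are hypothetical.

```python
import math
import statistics

MAX_LEN = 376  # number of slices per OCT acquisition, as stated in the text


def effective_radius(area_mm2):
    """Radius of the circle whose area equals the lumen contour area."""
    return math.sqrt(area_mm2 / math.pi)


def fill_rejected(radii):
    """Replace None entries by linear interpolation between the closest
    accepted neighbours (assumption: at the borders, copy the nearest one)."""
    known = [i for i, r in enumerate(radii) if r is not None]
    filled = list(radii)
    for i, r in enumerate(radii):
        if r is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = radii[right]
        elif right is None:
            filled[i] = radii[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = radii[left] + t * (radii[right] - radii[left])
    return filled


def training_stats(train_sequences):
    """Global mean/std obtained by averaging the per-acquisition values."""
    means = [statistics.fmean(seq) for seq in train_sequences]
    stds = [statistics.pstdev(seq) for seq in train_sequences]  # assumption: population std
    return statistics.fmean(means), statistics.fmean(stds)


def normalize_and_pad(seq, mean, std, length=MAX_LEN, pad_value=0.0):
    """Z-score a radius sequence with training-set statistics, then pad it."""
    z = [(r - mean) / std for r in seq]
    return z + [pad_value] * (length - len(z))
```

The same training-set mean and standard deviation would be passed to normalize_and_pad for the validation/test acquisitions, which mirrors the statement that held-out data are normalized with the values of the training set.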

Deep Neural Network Based FFR Prediction
Different types of neural network models (ANNs, CNNs, and RNNs) are considered for the prediction of the invasively measured FFR, applied with different approaches:
- a regression approach: models predict a real number representing the invasive FFR;
- a classification approach: models predict the class of the FFR value (positive, i.e., FFR ≤ 0.8, or negative, i.e., FFR > 0.8);
- an FSL approach: similar to the classification approach.
As ANN, we used a fully connected neural network with 4 hidden layers, and the rectified linear unit (ReLU) [51] as the activation function for the hidden layers. The details of the ANN architecture are included in Appendix A (Table A1).
As CNN, we used a fully convolutional neural network (1D convolutions) with eight layers. For the hidden layers we used ReLU as activation function, and batch normalization was employed [52]. For the regression and the classification approach we added a final fully connected layer to perform the prediction. For the FSL approach, this layer is not required. The details of the CNN architectures are included in Appendix A (Tables A2 and A3).
As RNN, we included a bidirectional gated recurrent unit (GRU) [53] layer on top of the previously described fully convolutional neural network (referred to as CNN + RNN in Appendix A). This avoids the padding requirement. The CNN layers learn the relevant features from the input, and then the RNN performs the final prediction based on those features. Training a fully recurrent network was not possible considering the small size of the available dataset. For the regression and the classification approach we added a fully connected layer after the bidirectional GRU to perform the prediction. For the bidirectional GRU, we used ReLU as the activation function. The details of the RNN architecture are included in Appendix A (Table A4).
No activation function was used on the last layer for the regression approach, and the sigmoid function [54] was chosen for the classification approach. For the FSL approach, the output of the network is represented by the features from the last hidden layer. The class is then determined by the smallest Euclidean distance between the output of the network and the two class clusters. These are defined by the mean features of the training set samples of each class.
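The nearest-cluster decision rule of the FSL approach can be sketched as follows. This is an illustration only: the embeddings are assumed to be plain lists of floats taken from the last hidden layer, and the function names are hypothetical.

```python
import math


def class_prototypes(embeddings, labels):
    """Mean feature vector of the training samples of each class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict_class(embedding, prototypes):
    """Class whose prototype has the smallest Euclidean distance to the sample."""
    return min(prototypes, key=lambda lab: math.dist(embedding, prototypes[lab]))


train_embeddings = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
train_labels = [0, 0, 1, 1]  # 1: FFR <= 0.8 (positive), 0: FFR > 0.8 (negative)
prototypes = class_prototypes(train_embeddings, train_labels)
```

Because the prototypes depend on the embeddings of all training samples, they have to be recomputed whenever the network weights change.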
For the classification and FSL approaches, all the samples with invasive FFR ≤ 0.8 represent the positive class and all the samples with invasive FFR > 0.8 represent the negative class. Since the dataset consists of only 102 invasive values, the models are evaluated using the leave-one-out cross validation strategy that is applied at the patient level [55]. For each fold, the samples of one patient are moved to a validation set, while the model is trained for a fixed number of epochs (300) on the samples of the remaining patients. The classification accuracy is computed for each epoch, and the epoch leading to the highest accuracy on the entire dataset, i.e., all folds, is chosen for reporting the statistics. Additionally, only during training of the classification-based approaches, we also ignored the samples with invasive FFR values in the range 0.79-0.81 (six samples). By removing these samples that are close to the cut-off point, the model is able to learn to better discriminate between the classes. For all the models we used the Adam optimizer [56], mean squared error as a loss function for the regression approach, and cross entropy [57] for the classification and the FSL approach (more details are included in Appendix A.2). All the architectures were optimized using grid search [58], applied for: number of layers, number of neurons per layer, dropout percentage, and the learning rate. The implementation is based on Python, and the PyTorch [59] library for DL model training and inference.
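A patient-level leave-one-out split, together with the exclusion of borderline samples from the training set, might look like the sketch below. The inclusive treatment of the 0.79-0.81 bounds and the helper names are assumptions made for illustration.

```python
def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_indices, validation_indices) per fold,
    so that all lesions of one patient end up in the same validation set."""
    for patient in sorted(set(patient_ids)):
        val = [i for i, p in enumerate(patient_ids) if p == patient]
        train = [i for i, p in enumerate(patient_ids) if p != patient]
        yield patient, train, val


def drop_borderline(indices, ffr, low=0.79, high=0.81):
    """Remove training samples whose FFR lies in [low, high]
    (used only when training the classification-based approaches)."""
    return [i for i in indices if not (low <= ffr[i] <= high)]


patient_ids = ["A", "A", "B", "C"]   # toy data: patient A has two lesions
ffr_values = [0.70, 0.80, 0.85, 0.79]
folds = list(leave_one_patient_out(patient_ids))
```

Splitting by patient rather than by lesion prevents two lesions of the same patient from appearing on both sides of a fold.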
To allow for a fair assessment of the performance, an ensemble approach is considered for each configuration: each of the proposed models is trained 20 times using different random seeds. For each configuration, the 20 models are then combined into one ensemble model. For regression approaches, the ensemble prediction for one sample is the mean value of the predictions of all 20 models. For classification and FSL approaches, the ensemble prediction for one sample is the mean value of the probabilities of all 20 models. This allows for a more robust assessment of the model performance, which is independent from the random seed that is used during training. The value 20 was chosen following experiments which indicated that the ensemble model performance did not change when using larger values.
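The ensemble averaging can be written in a few lines. In this sketch, member_outputs[m][i] is assumed to hold the output of model m for sample i (a predicted FFR value for regression, a probability of the positive class otherwise); the names and the 0.5 default threshold in to_class are illustrative.

```python
from statistics import fmean


def ensemble_predict(member_outputs):
    """Per-sample mean over the outputs of all ensemble members."""
    return [fmean(sample) for sample in zip(*member_outputs)]


def to_class(probabilities, threshold=0.5):
    """Binary decision from ensemble probabilities of the positive class."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Two toy members and two samples; the study uses 20 members per ensemble.
member_outputs = [[0.2, 0.9], [0.4, 0.7]]
ensemble = ensemble_predict(member_outputs)
```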
For all the ensemble models, we performed the receiver operating characteristic (ROC) analysis [60] and we computed the area under the curve (AUC) score [61]. Based on the ROC curves, we selected for each ensemble model the optimal cut-off point as being the point closest to the point (0, 1) [62]. The reported model performance metrics are based on the optimal cut-off point. The formula that is used to determine the point closest to (0, 1) is [63]:
$$ER(c) = \sqrt{(1 - Se(c))^2 + (1 - Sp(c))^2}$$
where ER(c) is the distance between the ROC point at cut-point c and the point (0, 1), Se is sensitivity, and Sp is specificity; the optimal cut-off point is the value of c that minimizes ER.
Similar to other studies, we further consider the minimum lumen diameter (MLD) and percentage diameter stenosis (%DS) as simple baseline references to assess the performance of the DL models. The %DS is computed as follows:
$$\%DS = \left(1 - \frac{r_{min}}{r_{avg}}\right) \cdot 100$$
where $r_{min}$ is the minimum radius of the sequence, and $r_{avg}$ is the average of the proximal and distal reference radius values of the lesion, as extracted from the OCT data. For both MLD and %DS, we also apply the leave-one-out cross validation strategy at the patient level, as follows: for each fold, a threshold value is chosen which balances sensitivity and specificity on the respective training set, and then this threshold is applied to classify the test sample(s).
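The selection of the cut-off point closest to (0, 1) and the %DS baseline can be sketched as follows. The sketch assumes that a higher score indicates the positive class (for regression outputs, a score such as 1 - FFR would be needed) and that ties are resolved in favor of the smallest cut-point; both choices, as well as the function names, are assumptions.

```python
import math


def closest_to_corner(scores, labels):
    """Return (cut, sensitivity, specificity) for the cut-point whose ROC
    point has the smallest Euclidean distance to (0, 1)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        se, sp = tp / positives, tn / negatives
        distance = math.hypot(1 - se, 1 - sp)
        if best is None or distance < best[0]:
            best = (distance, cut, se, sp)
    return best[1:]


def percent_diameter_stenosis(r_min, r_avg):
    """%DS from the minimum radius and the mean reference radius."""
    return (1 - r_min / r_avg) * 100
```

Since the ratio of radii equals the ratio of diameters, the radius profile can be used directly for the %DS computation.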
To evaluate the results, we computed the diagnostic statistics (accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) [64]) for all approaches, and additionally the mean absolute error (MAE), mean error (ME), and the mean squared error (MSE) for the regression approach. For the diagnostic statistics we additionally computed the 95% confidence intervals.
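For reference, the diagnostic statistics and the regression error measures can be computed as in the following sketch. The sign convention of the mean error (predicted minus measured) is an assumption, and the confidence interval computation is omitted because the method is not described in this section.

```python
def diagnostic_statistics(predicted, actual):
    """Accuracy, sensitivity, specificity, PPV, and NPV from binary labels
    (1: FFR <= 0.8, 0: FFR > 0.8)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def regression_errors(predicted, measured):
    """MAE, ME (assumed sign: predicted - measured), and MSE."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    n = len(diffs)
    return {
        "mae": sum(abs(d) for d in diffs) / n,
        "me": sum(diffs) / n,
        "mse": sum(d * d for d in diffs) / n,
    }
```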

Population Characteristics
Baseline patient and lesion characteristics are summarized in Tables 1 and 2: 80 patients (66 male, 14 female) with 102 lesions were included in this study. The mean patient age was 60.5 ± 11.2 years. The mean FFR was 0.80 ± 0.08, and 48 of the lesions were hemodynamically significant according to the criterion FFR ≤ 0.80.

Invasive FFR Prediction Performance
Figure 2 displays the ROC curve, the AUC scores including their 95% confidence intervals (CI), and the closest point to (0, 1) for all the approaches. The best three approaches based on AUC score are regression CNN, FSL RNN, and FSL CNN. Interestingly, the AUC score is superior for the regression CNN approach, but the FSL CNN approach has the closest point to (0, 1), i.e., the best diagnostic performance statistics, as shown below.
Figure 2. The ROC curve, AUC score, and the closest point to (0, 1) for all approaches. Values in the parentheses represent the 95% confidence intervals computed as in [65].
The performance and statistics of the various ensemble DL models and approaches considered herein are displayed in Table 3. In terms of diagnostic performance, the FSL approach performs better than classical regression and classification, while in terms of AUC, the CNN regression is superior to the other methods. Since the 95% confidence intervals overlap, the differences are not statistically significant. FSL algorithms have been designed for optimal performance on small datasets, where they tend to perform better than classic models. The best performing architecture is the one that is based on CNN. Furthermore, the training accuracy suggests that overfitting is not present for eight of the nine approaches. For the classic CNN-based classification, the model seems to overfit, even though L2 regularization and dropout were employed to address this. The confusion matrix for the best approach is depicted in Table 4.
Each ensemble model consists of 20 models that were trained with different seed values. Table 5 displays the mean accuracy, the standard deviation (std) of the accuracy, the minimum accuracy (min), and the maximum accuracy (max) for the validation dataset when employing the default operating points/thresholds of 0.8 for regression and 0.5 for classification. While all variations are quite small, the smallest std is obtained for the models that are based on FSL, which further underlines the robustness of this approach. Additionally, we computed the ensemble model mean uncertainty by averaging the uncertainty of the ensemble model for each examination [66]. The ensemble model uncertainty for regression approaches is the standard deviation of the predictions of all models for one sample. An intuitive approximation for the ensemble model's uncertainty for classification and FSL approaches was chosen as:
$$U = \frac{1}{N}\sum_{i=1}^{N} \min\left(y^{(i)},\, 1 - y^{(i)}\right)$$
where $y^{(i)}$ is the ensemble model prediction for sample i and N is the number of samples; this uncertainty measure is the distance between the output probability and the predicted class label (0 or 1), therefore, predictions such as 0.1 or 0.9 are considered "confident" while others such as 0.4 or 0.6 are considered more "uncertain". This approximation is feasible since ensemble models usually have well-calibrated outputs [66]. The ensemble uncertainty results of the regression approaches are not directly comparable to the ensemble uncertainty results for the classification and FSL approaches, and it has also been shown [66] that regression-based uncertainty that is computed as the ensemble predictions' standard deviation is not well-calibrated, as the MSE training loss "is not a scoring rule that captures predictive uncertainty" [66]. For the regression approaches, RNNs tend to have the smallest uncertainty. For classification and FSL approaches the uncertainty is similar for five of the approaches, while FSL CNN has a much smaller uncertainty.
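The two uncertainty measures described above can be illustrated with a short sketch. It assumes that the distance to the predicted label is taken as min(y, 1 - y) for a 0.5 decision threshold and that the population standard deviation is used across ensemble members; the function names are hypothetical.

```python
from statistics import fmean, pstdev


def classification_uncertainty(probabilities):
    """Mean distance between each ensemble probability and its predicted
    class label (0 or 1), assuming a 0.5 decision threshold."""
    return fmean([min(p, 1 - p) for p in probabilities])


def regression_uncertainty(member_predictions):
    """Mean over samples of the std of the member predictions, where
    member_predictions[m][i] is the output of model m for sample i."""
    return fmean([pstdev(sample) for sample in zip(*member_predictions)])


confident = classification_uncertainty([0.1, 0.9])
uncertain = classification_uncertainty([0.4, 0.6])
```

With these toy values the confident predictions yield a measure of about 0.1 and the uncertain ones about 0.4, which matches the interpretation given in the text.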
The reason the default thresholds were employed in Table 5 is that selecting a best operating point with respect to some metrics and some held-out test set is part of a post-processing stage; uncertainty estimates, however, depend solely on two factors: the input samples (i.e., input noise, out-of-distribution, etc.) and the learned model (here, the training procedure, the network architecture, and especially the training set have a large influence); the ground-truth label of a test input sample has no influence on the prediction uncertainty. Therefore, for an unbiased assessment, uncertainty measures of all the approaches were computed from the raw ensemble predictions and compared with the mean accuracy that was obtained from using the default thresholds.
Figure 3 displays four sample cases: one for each of the categories true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A representative angiographic frame is displayed, indicating the invasive FFR value and the coronary vessel and region of interest that is visualized using OCT. Further, the longitudinal OCT view and the radius profile that were used as input to the DNNs are displayed.

Subgroup Analyses
In the following, we use the best performing model according to the results in Table 3 (FSL-CNN) to perform a series of subgroup analyses.
As detailed in Section 2.1, the dataset contains a large number of samples in the interval 0.75-0.85 (46%). Hence, we have computed the statistics separately for lesions with FFR < 0.75, lesions with FFR > 0.85, and for the lesions with intermediate values. The results are displayed in Table 6. As expected, the accuracy of the model increases in the two bins at the extremes. In another analysis, we assessed the performance as a function of the vessel on which the measurement was performed. The results are displayed in Table 7 and indicate a higher accuracy on the LCx, compared to the other two main coronary arteries. The literature suggests that the LCx has typically a smaller baseline and hyperemic flow velocity compared to the LAD and RCA, which impacts the FFR measurements [67]. In other words, the same radius profile will lead to different invasive FFR values on different arteries. Since the type of artery is not used as an input to the DNN, a performance difference is expected. Most of the measurements in the study were performed in the LAD. The clinical literature suggests that proximal LAD lesions are of particular interest for long-term patient outcome [68]. Hence, we have divided LAD lesions into proximal lesions and others (mid or distal lesion). The results are displayed in Table 8 and indicate a similar performance in terms of accuracy, but the sensitivity is slightly lower for proximal lesions. This is an expected outcome since literature indicates that a lesion with a certain severity will lead to smaller FFR values when it is located in the proximal LAD, compared to the mid and distal LAD. Hence, the model slightly underestimates the severity of proximal LAD lesions. In another analysis, we assessed the prediction performance for male and female patients. The results in Table 9 indicate that the model performs slightly better for male patients. This is an expected outcome since the vast majority of lesions are from male patients (82%). 
The age of the patient can be another important factor in the clinical decision-making. We have divided the data at the patient level into three equally large bins. The results in Table 10 indicate a marked difference between the three subgroups. The intermediate bin has a slightly larger number of intermediate lesions (18 vs. 15/14), partially explaining the difference in diagnostic performance. Finally, in another subgroup analysis we have considered the centerline length of the input data and have divided the samples into three equally sized bins. The results in Table 11 display a balanced performance, i.e., the considered length has no major influence on the model performance.

Effect of Dataset Size
To assess the impact of the number of samples on the performance, we trained the best performing approach (CNN architecture with FSL) on datasets containing only a part of the original dataset. We started with 30% of the original dataset, and then increased the size in increments of 10%, until reaching 100%, i.e., the original dataset. The smaller datasets were set up by random sampling from the original dataset. To limit the selection bias, for each percentage we ran 20 experiments, where for each experiment a new random sampling was performed, and the CNN was initialized with a new random seed. The accuracy and the standard deviation for all the considered experiments are displayed in Figure 4.
As expected, the dataset size has an important impact on the accuracy. Encouragingly, a relatively linear increase in performance can be observed, indicating that with larger datasets, the performance should further increase. Moreover, the variation, i.e., standard deviation, decreases as the dataset size increases. This can be explained by two aspects. First, the smaller the percentage of data, the larger the variability of the actual dataset that is employed for the leave-one-out cross-validation. When 100% of the data are employed, the variability stems only from the random seed that is used for the initialization. Secondly, the larger the dataset, the more robust the prediction will be, i.e., with a smaller variability.

Saliency Maps and Runtime
To analyze the features on which the model focuses, we computed the saliency maps [69] for the best ensemble model (CNN-FSL). To obtain the saliency map for the ensemble model, we computed the derivative of the output with respect to the input for each individual model and then averaged all saliency maps (see Figure 5). As expected, the output of the ensemble CNN-FSL model is influenced by all coronary diameters, but the gradient is larger in the stenosis area, which is known to be the main determinant of the measured FFR values.
The training time for one fold and one epoch is approximately 1050 ms for all the described approaches. The inference time for one sample is approximately 2 ms for the regression and classification approaches and approximately 25 ms for the FSL approaches. This difference of one order of magnitude is due to the need to determine the classification clusters. All experiments were run on a desktop computer with an AMD Ryzen 9 5900X CPU, 128 GB of RAM, and an NVIDIA RTX 3060 graphics card.
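The ensemble saliency computation described in this subsection can be illustrated with a small sketch. Two assumptions are made here that are not taken from the study: the derivative is approximated by central finite differences rather than by automatic differentiation, and its absolute value is used as the saliency. The function names and the toy linear "models" are hypothetical.

```python
def saliency_map(model, x, eps=1e-4):
    """Estimate |d model(x) / d x_i| at every input position by central differences."""
    saliency = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        saliency.append(abs(model(up) - model(down)) / (2 * eps))
    return saliency


def ensemble_saliency(models, x, eps=1e-4):
    """Average the per-model saliency maps position by position."""
    maps = [saliency_map(model, x, eps) for model in models]
    return [sum(column) / len(maps) for column in zip(*maps)]
```

For a trained network, the same averaging would typically be applied to gradients obtained by backpropagation, which is both faster and more accurate than finite differences.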

Deep Learning-Based Prediction of FFR
As more data are emerging from studies that are based on artificial intelligence and computational modelling, the incremental diagnostic value of predicted coronary functional diagnostic indices over the traditional XA-based visual or quantitative lesion grading is becoming more evident.
We have introduced a method for the deep-learning-based prediction of FFR from routine optical coherence tomography. No specific requirements were formulated for the OCT acquisition. We demonstrated that this approach has a high potential in assessing functionally significant stenoses. Different models and approaches were proposed and evaluated. The experiments indicated the superiority of the FSL-based approach, a type of DL formulation that is specialized for small datasets. However, given the large overlap in the 95% confidence intervals, the differences between the methods are not statistically significant.
Thus, the main findings of this study can be summarized as follows: (1) DL-based FFR prediction from reduced-order raw anatomical data is feasible in a dataset that is focused on intermediate lesions for which the visual anatomical assessment of CAD based on XA does not allow for a clear clinical decision, and with no restriction on the type of lesions that were included in the study, and on the OCT acquisition; (2) DL-based FFR prediction provides superior diagnostic performance compared to baseline approaches based on MLD or %DS; (3) the FFR prediction performance increases quasi-linearly with the dataset size, indicating that a larger training dataset will likely lead to superior diagnostic performance.
The diagnostic accuracy of 77.5% achieved herein is lower compared to that of other studies focusing on FFR prediction from OCT, which reported an accuracy ranging between 88% and 95% [21,22,31,70,71]. There are two main aspects that are responsible for this difference. First, the complexity of the dataset that is processed herein is higher than that of other studies: 46% of the samples have an invasive FFR value ranging between 0.75 and 0.85, while in other studies these grey zone lesions represented between 20% and 44% of the entire dataset [21,22,31,70,71].
Secondly, past studies focusing on FFR prediction from OCT rely either on computational fluid dynamics (CFD) [21,22,70,71] or on ML-based approaches including handcrafted features [31]. By applying a deep neural network directly to the raw data, represented by the effective radius along the centerline of the vessel of interest, we allow the model to automatically learn powerful features for FFR prediction. The results that were obtained in other application areas (healthcare or others) demonstrate that classic machine learning (ML) techniques with handcrafted features typically outperform DL-based approaches when the training set is small, but, conversely, DL-based approaches outperform classic ML-based approaches when the size of the training set increases significantly [70]. The results in Figure 4, depicting the accuracy as a function of the dataset size, indicate that a larger dataset will enable a better performance: the performance of the DL model increases quasi-linearly with the dataset size. As shown in Table 3, the diagnostic performance of the proposed model is already considerably higher outside of the 0.75-0.85 FFR value interval.
To increase the prediction performance of DL models, different types of regularization are employed in the literature: mathematical expressions added to the loss function (L1, L2 regularization) [71], dropout (used to randomly drop out neurons during training) [72], and data augmentation [73]. Herein, we have used L2 regularization and dropout. Data augmentation, i.e., generating new samples by perturbing the input data, is difficult to perform when training against invasively measured FFR, since the approximation of the ground truth values is not straightforward. We have considered data augmentation by adding a small amount of noise to the 1D radius sequence used as input, but the results did not improve.
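The noise-based augmentation mentioned above could look like the following sketch. The noise distribution and magnitude are not specified in the text, so the zero-mean Gaussian noise, the `noise_std` default, and the function name are illustrative assumptions; the FFR label of the original sample would be reused unchanged.

```python
import random


def augment_radius_sequence(radii, noise_std=0.01, rng=None):
    """Return a noisy copy of a 1D radius sequence (same length, same units)."""
    rng = rng or random.Random()
    # A radius cannot be negative, so clip at zero after adding the noise.
    return [max(0.0, r + rng.gauss(0.0, noise_std)) for r in radii]
```

Reusing the measured FFR for the perturbed sequence is only defensible for very small perturbations, which reflects the difficulty noted above of approximating the ground truth for augmented samples.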
DL- or ML-based prediction of FFR has also been considered in studies relying on other types of medical images (CCTA, XA). Kumamaru et al. [74] proposed a DL model to estimate invasive FFR from CCTA. Using a dataset containing 207 measurements from 131 patients, they obtained an accuracy of 75.9% in predicting an abnormal invasive FFR (≤0.8). Another interesting approach was proposed by Zreik et al. [75], who used DL in an unsupervised manner and obtained an overall accuracy of 78% on CCTA data. They obtained an accuracy of 66% for FFR < 0.7, 75% for an FFR between 0.7 and 0.8, 79% for an FFR between 0.8 and 0.9, and 73% for an FFR > 0.9. Itu et al. [29] proposed a DL model that was trained on ground truth values computed with a CFD-based approach on a database of synthetically generated coronary anatomies. They achieved an accuracy of 83.2% on CCTA data.

Clinical Impact
Despite the overwhelming clinical evidence that an FFR-guided revascularization strategy improves patient outcome, the number of coronary interventions preceded by FFR measurements is still relatively low due to the limitations of invasive pressure measurements [76]. Hence, a virtual functional index would increase the adoption of physiology-guided coronary interventions, while drastically reducing the requirement for invasive pressure measurements.
The proposed method is potentially well suited for a clinical setting, given the real-time prediction performance of the DL model. Certain manual steps are required in the current pipeline, but these can be automated using algorithms for image quality assessment, e.g., to exclude slices with blood artifacts, and more accurate lumen contour detection [77]. The approach only requires knowledge of the coronary luminal geometry, which can be extracted directly from OCT.

Limitations
The motivation to perform invasive FFR was clinical, which resulted in a large proportion of anatomically borderline lesions in a population with extensive atherosclerotic disease. No cases were excluded, and the results should be interpreted with the consideration that this was a retrospective single-center study.
The anatomical data that were used as input to the DL model may not always accurately reflect the true luminal geometry due to limitations of the OCT acquisition itself (heart motion during automatic pullback, sub-optimal calibration) and small errors that are introduced by the linear interpolation of radius values for the rejected contours. Furthermore, by using the effective radius information as input, we neglect the actual three-dimensional shape of the coronary lumen. The literature suggests that this has a small impact [78], but in certain samples with non-circular lumen geometry, e.g., a concave shape, the impact may not be negligible.
Moreover, the manual editing steps limit the real-time capabilities of the algorithm and introduce intra- and inter-observer variability.
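To make the two input-related approximations concrete, the sketch below shows one way an effective radius and the linear interpolation over rejected contours could be computed. It rests on assumptions that are not confirmed by the text: the effective radius is taken as the radius of a circle with the same area as the lumen cross-section, rejected contours are marked with `None`, and gaps at either end of the sequence are filled with the nearest accepted value. Both function names are hypothetical.

```python
import math


def effective_radius(lumen_area):
    """Radius of the circle whose area equals the lumen cross-sectional area."""
    return math.sqrt(lumen_area / math.pi)


def fill_rejected(radii):
    """Replace None entries by linear interpolation between accepted neighbours."""
    known = [i for i, r in enumerate(radii) if r is not None]
    if not known:
        raise ValueError("at least one accepted contour is required")
    filled = list(radii)
    # Gaps before the first / after the last accepted contour: hold the
    # nearest accepted value (an assumption made for this sketch).
    for i in range(known[0]):
        filled[i] = radii[known[0]]
    for i in range(known[-1] + 1, len(radii)):
        filled[i] = radii[known[-1]]
    # Interior gaps: straight line between the two bounding accepted values.
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            filled[i] = (1 - t) * radii[left] + t * radii[right]
    return filled
```

The sketch also makes the limitation visible: any narrowing that falls entirely within a run of rejected contours is replaced by a straight line and is therefore invisible to the model.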
While the subgroup analyses indicate that the length of the considered segment does not influence the results, the maximal length of 7.5 cm may represent a limitation in the case of serial stenoses. For example, if lesions are present in the proximal and distal segment of a vessel, a processed vessel length that is larger than the limit of 7.5 cm would be required to accurately predict FFR.
Finally, to validate our findings and to provide more representative results, the proposed method requires further validation in larger, prospective studies, that are conducted at multiple clinical sites.

Future Work
Multiple future directions can be defined, also in view of the limitations that are listed above. First, the size of the training set should be increased to exploit the capabilities of a deep neural network-based approach. To limit the complexity of the input data, we currently use the effective radius. However, we envision the use of the coronary lumen mask as input, which may then allow the model to consider lumen non-circularities for the prediction. The dimensionality of the input data would increase from 1D to 3D, which would require a larger training set to enable an accurate prediction. Furthermore, with the increase in the dataset size, other deep-learning approaches (evaluated herein or others) might lead to the best FFR prediction performance.
When employing a classification-based approach, another possible future direction is to increase the number of output classes. For example, a three-class approach would predict lesions as being functionally significant, functionally non-significant, or intermediate/uncertain. This would allow for the definition of a hybrid decision-making strategy, where lesions that are not in the intermediate, i.e., uncertain, class can be confidently diagnosed, while for the ones in the intermediate class further aspects may be considered for the final decision, potentially even performing the invasive FFR measurement. The invasive FFR cut-off values for distinguishing the three classes may be chosen based on the performance of the model, e.g., to ensure a sensitivity/specificity of at least 95% for the lesions that are not in the intermediate class. The better the performance of the model, the closer the cut-off values may be to 0.8, i.e., the fewer lesions would be predicted as being uncertain.
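One possible reading of this hybrid strategy is sketched below. The sketch assumes a model that outputs a predicted FFR value and searches for the narrowest symmetric band around the 0.80 cut-off such that the lesions predicted outside the band reach the target sensitivity and specificity. The function name, the symmetric band, the step size, and the search limit are all illustrative assumptions; applying the band to predicted rather than invasive values is a simplification of the class definition described above.

```python
def grey_zone_for_target(predicted, invasive, cutoff=0.80, target=0.95,
                         step=0.01, max_steps=20):
    """Narrowest symmetric band around `cutoff` meeting `target` outside the band.

    Returns (lower, upper), or None if no band up to max_steps * step qualifies.
    """
    for k in range(max_steps + 1):
        lower, upper = cutoff - k * step, cutoff + k * step
        tp = fn = tn = fp = 0
        for p, y in zip(predicted, invasive):
            if lower < p <= upper:
                continue                     # intermediate/uncertain class
            truly_significant = y <= cutoff  # invasive FFR <= 0.80
            called_significant = p <= lower
            if truly_significant and called_significant:
                tp += 1
            elif truly_significant:
                fn += 1
            elif called_significant:
                fp += 1
            else:
                tn += 1
        if tp + fn == 0 or tn + fp == 0:
            continue                         # sensitivity or specificity undefined
        if tp / (tp + fn) >= target and tn / (tn + fp) >= target:
            return lower, upper
    return None
```

A better model yields a narrower band, so fewer lesions fall into the uncertain class. To avoid an optimistic estimate, the band would have to be selected on data that are not used to report the final performance.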
Herein, we have considered only the coronary lumen information as input. Previous studies have demonstrated that FFR is also influenced by other patient characteristics (demographics, other pathologies, etc.) [31]. The results of the subgroup analyses have shown that the patient sex and age and the vessel of interest may influence the prediction. Additional features may be considered directly as input into the deep neural network, or a cascaded modeling approach may be designed: the first model processes only the coronary lumen information, while the second model, which takes as input the output of the first model, processes all additional features to perform a final and more accurate prediction.
Standard OCT acquisitions have been used for obtaining the input data for the FFR prediction. OCT acquisition guidelines containing specific requirements (e.g., include the entire stenosis in the OCT sequence) may improve the prediction accuracy. Such an approach was successfully applied in a previous study [79].
The method that is described herein may be applied similarly on coronary lumen information that is extracted from other imaging modalities (XA, CCTA, IVUS). Since the image resolution, especially on XA and CCTA, is lower than on intra-vascular images, the coronary lumen information may be less accurate. However, XA and CCTA allow for a more complete evaluation of the coronary tree since the vessel of interest can be assessed in all its segments, alongside large side branches. A different methodology might lead to the optimal performance in that case, e.g., based on graph neural networks [80].
Finally, the approach can also be extended to predict other hemodynamic quantities, such as coronary flow reserve (CFR), rest Pd/Pa [81], the instantaneous wave-free ratio (iFR) [82], or hyperemic/basal stenosis resistance (HSR/BSR) [83,84], each of which can be used as a ground-truth during training.

Data Availability Statement:
The data were acquired as part of the projects acknowledged in the manuscript and cannot be made public, considering GDPR regulations and the content of the informed consent signed by the patients.
Acknowledgments: The concepts and information presented in this paper are based on research results that are not commercially available. Future commercial availability cannot be guaranteed.

Table A1. ANN architecture. The layer that is highlighted in purple is used only for the regression and the classification approaches (not for the FSL approach). For the regression approach, we used no activation function and the activation function that is highlighted in green is used for the classification approach (not for the FSL approach).

1  Conv1D  3  1    64   2  ReLU  -  Batch norm  3
2  Conv1D  3  64   128  2  ReLU  -  Batch norm  7
3  Conv1D  3  128  256  2  ReLU  -  Batch norm  15
4  Conv1D  3  256  512  2  ReLU  -  Batch norm  31
5  Conv1D  3  512  512  2  ReLU  -  Batch norm  63
6  Conv1D  3  512  512  1  ReLU  -  Batch norm  127
7  Conv1D  3  512  512  1  ReLU  -  Batch norm  191
8  Conv1D  3  512  512  1  ReLU  -  Batch norm  255

Table A3. The fully connected layers that were added on top of the architecture that is presented in Table A2, for the CNN-based regression and classification. For the regression approach, we used no activation function and the activation function that is highlighted in green is used for the classification approach (not for the FSL approach).

FC  2048  1024  ReLU     Dropout
FC  1024  1     Sigmoid  -

Table A4. The bidirectional GRU that was added on top of the architecture that is presented in Table A2, used for CNN + RNN approach. The layer that is highlighted in purple is only used for the regression and the classification approach (not for the FSL approach). For the regression approach, we used no activation function and the activation function that is highlighted in green is used for the classification approach (not for the FSL approach).