Automated Coronary Optical Coherence Tomography Feature Extraction with Application to Three-Dimensional Reconstruction

Coronary optical coherence tomography (OCT) is an intravascular, near-infrared light-based imaging modality capable of reaching axial resolutions of 10–20 µm. This resolution allows for accurate determination of high-risk plaque features, such as thin cap fibroatheroma; however, visualization of morphological features alone still provides unreliable positive predictive capability for plaque progression or future major adverse cardiovascular events (MACE). Biomechanical simulation could assist in this prediction, but this requires extracting morphological features from intravascular imaging to construct accurate three-dimensional (3D) simulations of patients’ arteries. Extracting these features is a laborious process, often carried out manually by trained experts. To address this challenge, numerous techniques have emerged to automate these processes while simultaneously overcoming difficulties associated with OCT imaging, such as its limited penetration depth. This systematic review summarizes advances in automated segmentation techniques from the past five years (2016–2021) with a focus on their application to the 3D reconstruction of vessels and their subsequent simulation. We discuss four categories based on the feature being processed, namely: coronary lumen; artery layers; plaque characteristics and subtypes; and stents. Areas for future innovation are also discussed as well as their potential for future translation.


Introduction
Coronary artery disease (CAD) is a leading cause of death, morbidity, and economic burden globally [1,2]. Although rates of myocardial infarction (MI) are decreasing through some parts of the world, recurrent major adverse cardiovascular events (MACE) following initial MI continue to occur at unacceptably high rates [3]. This is because of the complex pathogenesis and widespread nature of atherosclerotic plaques, including those in non-infarct related arteries that continue to pose a risk of plaque destabilization and atherothrombotic events [4,5]. This is despite advances in structural, molecular, and functional imaging technology, percutaneous coronary intervention (PCI) and pharmacotherapy. While invasive coronary angiography (ICA) is still the cornerstone of CAD assessment in real-world practice [6], intravascular imaging modalities, such as intravascular ultrasound (IVUS) and optical coherence tomography (OCT) can also be adjuvantly used, owing to their ability to identify vulnerable plaque features [7] such as plaque burden [8] and thin-cap fibroatheroma (TCFA) [9], respectively. These high-risk plaque features have been shown to portend up to a six-fold increase in future MACE [10]. However, the ability of conventional IVUS and OCT imaging to predict which plaques will progress to cause future thrombotic events is still suboptimal, with positive predictive values of only 20-30% [11].
Coronary biomechanics is emerging as a potentially useful tool to improve this predictive capability [12]. Computational fluid dynamics (CFD) has predominantly been applied to assess regions of low wall shear stress (WSS) [13-15], an established factor that has shown associations with low-density-lipoprotein deposition [16] and subsequent plaque progression [17,18]. Conversely, in the general population, heightened structural stress [19,20] has been associated with plaque instability and rupture [21], as well as plaque growth over time [22], and can be modulated by the dynamics of left ventricular function [23-25]. This highlights the complex and highly nonlinear relationships within the coronary vasculature that can influence a patient's biomechanical stress profile. Furthermore, the challenge facing coronary biomechanics, much like imaging modalities, is that no one parameter can provide a reliable or holistic summation of a patient's biomechanical profile. To address this, comprehensive biomechanical simulations are required, demanding high-fidelity imaging to segment important regions accurately and deliver robust, realistic, and patient-specific stress distributions.
Among current commercially available intracoronary imaging modalities applied in real-world clinical scenarios, OCT is uniquely placed to deliver sufficient accuracy, given that it has axial and lateral resolutions of 5-20 µm and 10-90 µm, respectively, depending on laser source and lens properties, approximately ten-fold higher axial and lateral resolutions than IVUS [26,27]. OCT achieves this accuracy through light-based, near-infrared spectrum wavelengths of 1250 to 1350 nm emitted from a single invasive fiberoptic wire, which rotates as it is pulled backwards through the target vessel [28]. The backscattering of light measured by the time for light to travel from tissue to the catheter lens over each revolution of the fiberoptic wire forms each cross-sectional image of the vessel wall. The high spatial resolution of this light-based imaging modality allows for delineation between atherosclerotic components [29,30], shown in Figure 1. This enables identification of high-risk features, notably thin fibrous cap, macrophage infiltration, plaque microchannels, cholesterol crystals, spotty calcification, lipid arc [31,32], and layering of plaque [33], which have been identified as predictors of rapid plaque growth [34] and determinants of biomechanical stress.
The primary limitation of commercially available intracoronary OCT is its penetration depth of 0.1 to 2 mm in plaques, compared to up to 10 mm for IVUS, which prevents visualization of the deep content of plaques, the external elastic membrane and adventitial layer in diseased regions [28,35]. This penetration depth decreases significantly in the presence of lipid-rich plaques due to the high attenuation and low backscattering properties of lipid. However, OCT does overcome IVUS's limited penetration depth in calcified lesions, which ultrasound cannot penetrate. Despite its limited penetration depth, many clinical studies have taken OCT-centered approaches [36-39] to assess vulnerable plaque features or biomechanically simulate arteries after three-dimensional (3D) reconstruction [40-44]. Nevertheless, annotation of OCT images is still predominantly a manual and tedious task, susceptible to individual interpretation, which is a major obstacle to its use [45]. Indeed, the risk of intra- and inter-observer variability in quantitative analysis necessitates that each image is analyzed by at least two analysts, further compounding the significant time cost. With the advent of machine learning techniques, automated medical image classification and segmentation has gained significant attention, with deep learning-based neural networks predominantly used for medical image analysis [46]. In the simplest terms, these models work through back-propagation to minimize a prescribed loss function (such as cross-entropy [47], dice loss [48] or Tversky [49]) by directing a machine how to alter its parameters. The most common method used in image analysis is a convolutional neural network (CNN) [50].
Compared to artificial neural networks (ANNs) [51], which connect multiple inputs to individual neurons, multiply each input by a weight and sum the results to produce a single output, CNNs reduce the number of weights through weight sharing, which gives rise to convolution operations and reduces computation time. CNNs generally apply a combination of convolutional and pooling layers, where the pooling layer downsamples data, allowing for an increased field of view in subsequent layers, as described in Figure 2. However, this usually leads to a reduction in image resolution [52], which can hamper the accurate segmentation of tissue borders, a critical feature for biomechanical simulation. Fully convolutional networks (FCN), such as the U-Net [53], which is named after its characteristic U-shaped structure, can assist in meeting this challenge. These networks couple the high-resolution, low-level image data with low-resolution, higher-level feature information to improve image segmentation and classification results. Various architectures exist depending on the task to be completed and interested readers are directed to references [54-58] for more detail.
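To make the loss functions named in this introduction concrete, the following minimal Python sketch computes binary cross-entropy, the Tversky loss and the dice loss for a flattened prediction and a binary ground-truth mask. It is an illustration only and is not taken from any of the reviewed studies; the function names, the smoothing constant eps and the default alpha and beta values are assumptions made here. The dice loss is written as the special case of the Tversky loss with alpha = beta = 0.5.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """1 - Tversky index; alpha weights false positives, beta false negatives."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


def dice_loss(pred, target, eps=1e-7):
    """Dice loss, i.e. the Tversky loss with alpha = beta = 0.5."""
    return tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=eps)
```

Raising beta above alpha penalizes missed foreground pixels more heavily than false alarms, which is one reason Tversky-type losses are considered when the structure of interest occupies only a small part of the image.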
In this systematic review, we evaluate recent methods to automatically segment and classify pathological and non-pathological features in coronary OCT imaging. This automated segmentation is critical to rapidly and quantitatively assessing atherosclerotic lesions in clinical scenarios. Uniquely, we focus this review on the application of automated techniques to 3D computational reconstruction and subsequent patient-specific simulation, which requires specific characteristics to be accurately delineated, such as the outer elastic membrane and deep plaque components. PubMed and Web of Science databases were searched, supplemented by Google Scholar, resulting in 161 articles which were further screened based on title and abstract to include only full-length, original journal articles published during the previous five years (2016-2021). Figure 3 details the CONSORT diagram and review categories. A total of 78 screened articles were classified by their focus into one of four categories: the coronary lumen, artery layers, plaque characteristics and subtypes, or stents. Included articles are summarized in Appendix A (Tables A1-A4), classifying the aim, dataset size, morphological/filter operations, feature detection/classification method, presented outcome and the point of comparison of each study. A glossary of evaluation metrics used to assess algorithm performance is also provided. Finally, we highlight potential challenges and multi-disciplinary opportunities for the computer science, engineering, and medical fields.

Coronary Lumen
Segmentation of the coronary artery lumen contour is perhaps the simplest task for automated techniques when there is no atherosclerotic disease and there has been appropriate clearance of blood from the OCT images. Here, globally used binarization methods [59], such as Otsu thresholding [60-63], morphological operations, edge detection [64-66] and curve fitting [67] were often sufficient to automatically delineate the lumen. However, these methods are challenged when facing bifurcation regions and catheter artefacts, as well as improper blood clearance, which are not uncommon occurrences in clinical scenarios. Using a sequential combination of processing steps, an automated lumen border detection tool has shown good agreement with expert annotation when addressing these challenges [63]. Tissue characteristics, such as reflectivity, backscattering and absorption, were used, followed by contour refinement with a weighted linear least squares local regression approach before fitting of a second-degree polynomial to bridge catheter and bifurcation artefacts. However, these approaches can suffer in more complex lumen geometries, difficult bifurcation contours and stented artery sections.
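The global binarization step mentioned above can be illustrated with a minimal pure-Python sketch of Otsu's method, which picks the gray level that maximizes the between-class variance of the image histogram. This is a generic textbook formulation and not the pipeline of any cited study; the function names and the assumption of integer intensities in the range 0 to 255 are choices made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    `pixels` is a flat sequence of integer intensities in [0, levels).
    Class 0 holds intensities <= t and class 1 holds intensities > t.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w0 = 0          # number of pixels in class 0 so far
    sum0 = 0.0      # intensity sum of class 0 so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Label bright (tissue-like) pixels 1 and dark (lumen-like) pixels 0."""
    return [1 if v > threshold else 0 for v in pixels]
```

In a blood-cleared frame the lumen appears dark and the vessel wall bright, so the resulting mask is only a starting point; the morphological clean-up, edge detection and curve fitting described above would still be needed to obtain a closed lumen contour.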
Addressing complex lumen geometries, Joseph et al., developed a lumen segmentation method by enhancing lumen intensity through a transmittance-based method to iteratively drive the detected lumen edge towards the true lumen contour [68]. By utilizing speckle properties through a localized level-set segmentation method, this approach showed the ability to overcome image intensity variations. This allowed segmentation of challenging imaging datasets, including multiple lumens and subsequent automated 3D reconstruction. Other approaches to difficult lumen geometries include random walks based on edge weights and optical backscattering and graph-cut segmentation [69,70].
The latter, investigated by Essa et al., introduced a spatio-temporal segmentation method applying a Kalman filter to ensure border homogeneity and smoothness across an entire pullback [70]. This assisted in overcoming localized image-based noise and artefacts, an important consideration in automated 3D reconstruction. A cost function based on asymmetric local phase and first-order Gaussian derivatives was introduced alongside a set of shape constraints to train a random forest (RF) classifier [71]. RF is particularly useful when handling noisy data and a large number of input features as it is less prone to overfitting and can be more computationally efficient than other supervised learning techniques such as support vector machines (SVM) [72]. This approach achieved a sensitivity, specificity and Jaccard similarity index of 95.55 ± 3.19%, 99.84 ± 0.29%, and 0.95 ± 0.03, respectively, improving upon earlier first-order Gaussian derivative approaches that achieved 89.76 ± 5.99%, 99.80 ± 0.56%, and 0.89 ± 0.06 in the same metrics [73]. Compared to using image intensity values alone, classification accuracy increased 6.80% in a dataset of 1846 images from 13 pullbacks (457 training, 1389 testing), whilst the mean average difference in area and the Hausdorff distance were reduced by 55% and 70%, respectively. This highlights both that evaluation metric heterogeneity can significantly bias how improvement is measured, and that spatio-temporal approaches that consider all images in a pullback can achieve smooth contour segmentation in complex lumen geometries.
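Because the metrics quoted above measure different things (pixel overlap versus worst-case contour distance), it can help to see them side by side. The sketch below, written for this review rather than taken from any cited work, computes sensitivity, specificity, the Jaccard index and the dice coefficient from two flattened binary masks, and the symmetric Hausdorff distance between two contours given as lists of (x, y) points. The function names are hypothetical, and the overlap metrics assume that both foreground and background are present in the ground truth.

```python
import math


def overlap_metrics(pred, truth):
    """Overlap metrics for two flat binary masks of equal length."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }


def hausdorff_distance(contour_a, contour_b):
    """Symmetric Hausdorff distance between two point lists (brute force)."""

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(contour_a, contour_b), directed(contour_b, contour_a))
```

A segmentation can score well on dice or Jaccard while a single stray contour point still produces a large Hausdorff distance, which is one way the choice of metric shapes how an improvement is reported.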
Although it is common to ignore bifurcation regions in 3D reconstructions, these regions are important to consider when assessing hemodynamics due to their flow-disturbing nature. However, bifurcation regions present difficulties when automatically segmenting the lumen. Addressing this, Macedo et al., built on their earlier work to propose a distance transform, similar to the distance regularized level set proposed in [74], to automatically correct lumen segmentation in bifurcation regions and areas of complex plaque [62,75]. Regions of bifurcations achieved results of 1.20 ± 0.80 mm² and 0.88 ± 0.08 for the mean average difference in area (MADA) and dice coefficient, respectively, compared to manual segmentation. This was in comparison to non-bifurcation regions achieving 0.19 ± 0.13 mm² and 0.97 ± 0.02 in the same metrics. Rather than a distance transform, Akbar et al., proposed an L- and C-mode interpolation approach to bridging lumen contour gaps caused by bifurcations [65]. Their approach, applied to 5931 images (40 patients), was then used to automatically reconstruct 3D lumen models for fractional flow reserve (FFR) assessment, with good correlation between manual and automated segmentations (R = 0.98).
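The general idea of bridging a contour gap can be sketched in a few lines. The example below is a deliberately simple stand-in, not the L- and C-mode interpolation of [65] or the distance transform of [62,75]: it assumes the lumen border of one frame is stored as one radius per A-line, with None marking A-lines where no border was found (for example across a side branch), and fills each gap by linear interpolation between the nearest valid neighbors, wrapping around the full rotation. The function name and data layout are assumptions.

```python
def bridge_gaps(radii):
    """Fill None entries of a circular radius profile by linear interpolation.

    `radii[i]` is the lumen radius along A-line i, or None where the border
    could not be detected. Indices wrap around, since A-lines cover 360 degrees.
    """
    n = len(radii)
    if all(r is None for r in radii):
        raise ValueError("no valid A-lines to interpolate from")

    filled = list(radii)
    for i in range(n):
        if radii[i] is not None:
            continue
        # Step counts to the nearest valid A-line on each side (circular).
        left = next(k for k in range(1, n) if radii[(i - k) % n] is not None)
        right = next(k for k in range(1, n) if radii[(i + k) % n] is not None)
        r_left = radii[(i - left) % n]
        r_right = radii[(i + right) % n]
        filled[i] = r_left + (r_right - r_left) * left / (left + right)
    return filled
```

Linear bridging of this kind closes the contour but, as the text notes, it discards the side-branch geometry, which is why dedicated ostium detection is needed when the bifurcation itself must be reconstructed.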
To automatically segment bifurcation regions, rather than simply bridging over them, Cao et al., developed an automated branch ostium detection method [76]. By first fitting a contour to the main lumen, a dynamic programming-based distance transform, introduced earlier and visualized in Figure 4c [74], was then used to select the main lumen and branch centroids. Ostium points on the main lumen contour were then detected using a differential filter and taking locations of maximum curvature. The method, shown in Figure 4, resulted in reasonable agreement to manual segmentation, but required manual intervention to adjust the threshold for the elliptical ratio of branches to avoid misclassification. Further advancement of this method by using a bifurcation classifier, such as that proposed by Miyagawa et al., could enhance segmentation results [77]. By comparing four CNNs (an original network using stochastic gradient descent followed by three networks making use of transfer learning from previous investigations [78]) a final area under the curve (AUC) of 99.72 ± 0.17% was reached, outperforming other bifurcation classifiers [75,79,80]. Interestingly, no statistically significant difference was found between results using polar and cartesian image coordinates, removing the need to pre-process images to polar form. To improve the ability to classify and segment the lumen in difficult regions, such as stented arteries and bifurcations, machine learning approaches show significant potential. Yang et al., compared the performance of six classifiers (RF, SVM, J48, Bagging, Naïve Bayes and adaptive boosting (AdaBoost) [81-83]) in difficult or irregular regions [84].
By identifying and classifying 92 features from 54 patients and 14,207 images (1857 images denoted as irregular) through supervised learning and a partition-membership filtering method, the RF classifier produced the best overall accuracy compared to the other five classifiers: RF 98.2%, SVM 98.1%, J48 97.3%, Bagging 96.6%, Naïve Bayes 88.8%, AdaBoost 88.7%. However, residual blood artefacts and clots hampered accuracy, which Yong et al., subsequently improved upon with a linear regression CNN trained on a 64 pullback dataset (19,027 images) [85]. Consisting of four convolutional layers and three fully connected layers with gradient based adaptive optimization (ADAM) [86], an overall dice and Jaccard index of 0.99 and 0.97 were reached, respectively, with an average processing time of 40.6 ms per image. Here the most significant improvements in accuracy were seen after training on 25 pullbacks; however, incremental gains were seen by including additional images.
As networks deepen, detailed information can be gradually lost due to resolution degradation, hampering classification and segmentation accuracy. Tang et al., addressed this by proposing a novel N-Net based CNN capable of re-using the original input image in deeper convolutions to couple the initial high-resolution data with low-resolution feature information [87]. Consisting of a multi-scale U-Net architecture and cross-entropy loss function trained on 20,000 images, results showed excellent agreement to expert annotation, including in complex lumen shapes, such as bifurcation regions (accuracy: 0.98 ± 0.00; specificity: 99.40 ± 0.05%; dice: 0.93 ± 0.00). The N-Net also resulted in significantly reduced loss (0.08) compared to traditional U-Net architectures (0.11-0.15). Approaches like this could assist in accurately and efficiently generating 3D lumen geometries for assessment of quantitative flow reserve (QFR) or WSS in near-real time [88-90].
For clinical application, computationally efficient segmentation and simulation is important. Using the K-means algorithm for unsupervised learning, followed by B-spline curve fitting, Athanasiou et al., achieved significant computation speed-ups compared to their previous methods [91,92]. A total computation time of 180 sec for lumen border detection and 3D reconstruction was achieved using biplane angiography. This compared to 1080 sec previously, with added robustness in cases with artefacts and noise, resulting in excellent agreement between manual and automated WSS computations (R² = 0.95). Computational speed and efficiency were further improved during the development of DeepCap, which further focused on using a small memory footprint [93]. Their approach was based on a U-Net architecture, using upsampling, downsampling and skip connections to improve network gradient propagation [94]. Dynamic routing was then utilized to optimize capsule weights [95,96]. Comparisons made between the UNet-ResNet18 (UNet-18), FCNResNet50 (FCN-50) and DeepLabV3-ResNet50 (DLV3-50) [97-99] showed that the proposed DeepCap method achieved 70% faster graphics processing unit (GPU) computation, 95% faster central processing unit (CPU) computation and a 70% reduction in memory. This speedup resulted in segmentation of an entire 200 image pullback in 19 sec on a CPU and just 0.8 sec on a GPU. This was achieved with comparable robustness and accuracy (dice: 97.00 ± 5.82; Hausdorff distance: 3.30 ± 1.51; specificity: 99.54 ± 0.75%; sensitivity: 93.27 ± 8.22%) in a 12,011 image (22 patient) dataset. Impressively, only 12% of the total parameters of previous methods were used. The resulting 3D reconstruction and comparison to expert annotation-based reconstructions is shown in Figure 5.
This rapid clinical application of automated lumen segmentation could produce a significant leap in quantitative data available to clinicians, improving patient outcomes and the utility and acceptance of intravascular imaging modalities, machine learning approaches and the translation of 3D simulation capability, such as WSS computation.
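As a reminder of how lightweight the unsupervised K-means step discussed in this section can be, the following sketch clusters pixel intensities into k groups with Lloyd's algorithm. It is a generic one-dimensional illustration, not the implementation used in [91,92]; the function name, the evenly spaced initialization and the use of raw intensity as the only feature are assumptions made here.

```python
def kmeans_1d(values, k=2, iterations=50):
    """Cluster scalar intensities into k groups with Lloyd's algorithm.

    Returns (centroids, labels). Centroids start evenly spaced between the
    minimum and maximum value, so the result is deterministic.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated

    labels = [nearest(v) for v in values]
    return centroids, labels
```

With k = 2 the two clusters loosely correspond to dark lumen and bright wall pixels; a contour extracted from the resulting labels would then be smoothed, for example with the B-spline fitting mentioned above.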

Artery Layers
In healthy coronary sections the inner and outer elastic membranes can be visualized through intensity changes and their associated gradients, as illustrated previously in Figure 1. Using this knowledge, Zahnd et al., developed a front propagation scheme to segment the intima-media, media-adventitia and adventitia-periadventitial tissue borders [100]. By using the image gradient properties, an AdaBoost classified machine learning approach, and feature selection based on a RF framework, segmentation errors of 29 ± 46 µm, 30 ± 50 µm and 50 ± 64 µm resulted for the intima-media, media-adventitia and adventitia-periadventitial layers (Dice = 0.93). By further investigating the efficacy of three emerging classifiers (CNN pre-trained on the AlexNet model, RF and SVM), Abdolmanafi et al., found that the most robust feature extractor was the pre-trained CNN, while the RF produces the best classification results of up to 96% for the media layer [101]. Furthermore, using the pre-trained CNN as a feature generator for both the RF and SVM classifiers resulted in their highest accuracy (96 ± 0.06 and 0.90 ± 0.10, respectively) and most computationally efficient approach compared to the purely CNN method (0.97 ± 0.04).
Further approaches to segment the intimal and medial layers in cardiac allograft patients made use of the layered optimal graph-based image segmentation for multiple objects and surfaces (LOGISMOS) framework [73,102-105]. This approach enables a fast and quantitative assessment of changes in wall morphology that associate with cardiac allograft vasculopathy (CAV). By using transfer learning from the ImageNet database initialized with the Caffe framework [106], Chen et al., generated exclusion regions to classify artery layers in 50 heart transplant patients, with average errors of 4.98 ± 31.24 µm and 5.38 ± 28.54 µm for the intima and media, respectively [102]. These errors were less than the reported inter-observer variability of 6.76 ± 10.61 µm, although their standard deviations were significantly larger, possibly due to the surface smoothness constraint put on the algorithm.
By extracting further information on vascular tissue components through polarization-sensitive OCT (PS-OCT) [107-109], Haft-Javaherian et al., were able to detect the lumen, intima and medial layers with impressive absolute distance errors of 2.36 ± 3.88 µm, 6.89 ± 9.99 µm and 7.53 ± 8.64 µm, respectively (Figure 6) [110]. Comparisons between the automated approach (blue) and expert annotation (red) showed strong ability to handle many difficult, yet common, features observed in OCT pullbacks. Carried out on a small dataset of 984 images (from 57 patients), a multi-term, multivariate loss function was created through combination of five common functions, namely: dice; weighted cross-entropy; topological; boundary precision loss; and an attending physician loss function to account for manual input. When applied through a U-Net based deep residual learning model using a leaky rectified linear unit (ReLU) function [111], overall classification accuracies for six components were: plaque shadow 0.82, guidewire shadow 0.97, lumen 0.99, intima 0.98, media 1.00 and outer wall 0.99. This approach could also be useful in segmenting the outer elastic membrane in hybrid IVUS-OCT systems [112], where the multivariate loss function could manage the added information provided by IVUS while maintaining the high-resolution OCT image characteristics during segmentation. Although showing impressive accuracy, the segmented outer boundaries in this approach did not always produce smooth contours, particularly in diseased regions where signal attenuation was high (see Figure 6A,D,F-I). Discontinuous contours produce challenges when applying results to 3D modelling (in both computer-aided design (CAD) or finite element mesh (FEM) packages) and do not represent biological tissues well. Addressing this challenge, Olender et al., developed a 3D surface fitting technique using a mechanical, spring-based approach [113].
This method was designed to ensure smoothness of the outer wall over the entire pullback through a force-balance/constrained nonlinear optimization method. Edge detection methods were used to segment the outer elastic membrane in healthy wall regions, an anisotropic, linear elastic mesh was fitted to the associated A-line locations, and forces proportional to the sum of A-line pixel intensities were then added (Figure 7) [114]. The iterative force-balance optimization yielded a mean average difference in area (MADA) of 0.93 ± 0.84 mm² compared to expert annotation in 724 images from seven patients. Further validation against manually annotated and co-registered IVUS pullbacks resulted in a MADA of 1.72 ± 1.43 mm² (19.2 ± 15.0%). While surface smoothing and fitting times were 2.74 ± 0.28 ms and 40.20 ± 7.50 ms per frame, respectively, the lumen and edge detection steps required a much greater 4.20 ± 1.50 s and 5.35 ± 0.85 s per frame, respectively, and would need to be accelerated to make the approach clinically applicable. This approach shows promise for smoothly segmenting the outer wall in OCT images while constraining atherosclerotic tissue classification approaches.
Figure 7. Outline of the surface fitting technique using four different spring stiffnesses (blue, green, yellow, and red) fitted either to visible sections of the outer elastic membrane or the detected lumen contour. Nodes (black circles) were connected to adjacent nodes within the image frame as well as both proximal and distal frames. Gray arrows represent the applied forces proportional to the sum of A-line pixel intensities. The surface fitting and force-balance optimization was carried out across the entire pullback (j direction) to generate a smooth and continuous outer wall over the entire artery section. © [2019] IEEE. Reprinted, with permission, from [113].
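The force-balance idea can be conveyed with a heavily simplified, single-frame analogue. The sketch below is not the anisotropic 3D mesh of [113]: it assumes one node per A-line on a closed ring, a single isotropic spring stiffness between neighboring nodes, and a per-node data weight that is zero wherever the outer wall is not visible, and it omits both the coupling to proximal and distal frames and the intensity-derived forces. Each sweep moves every node to the radius at which the data force and the two spring forces balance (a Gauss-Seidel relaxation). All names and parameter values are hypothetical.

```python
def relax_contour(detected, weights, stiffness=1.0, iterations=500):
    """Smooth a circular radius profile by iterating a spring force balance.

    detected[i] : radius of the detected edge on A-line i (ignored if weight 0)
    weights[i]  : confidence in that detection; 0 where the wall is not visible
    stiffness   : spring constant linking each node to its two ring neighbors
    """
    n = len(detected)
    visible = [d for d, w in zip(detected, weights) if w > 0]
    if not visible:
        raise ValueError("at least one A-line needs a non-zero weight")
    start = sum(visible) / len(visible)
    r = [d if w > 0 else start for d, w in zip(detected, weights)]

    for _ in range(iterations):
        for i in range(n):
            neighbours = r[i - 1] + r[(i + 1) % n]  # r[-1] wraps to the last node
            # Equilibrium of: w*(r_i - d_i) + k*(2*r_i - r_prev - r_next) = 0
            r[i] = (weights[i] * detected[i] + stiffness * neighbours) / (
                weights[i] + 2 * stiffness
            )
    return r
```

Nodes with zero weight are positioned purely by their neighbors, which is how a spring model can carry a smooth outer wall across attenuated regions; a larger stiffness produces a smoother but less data-faithful contour.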

Plaque Characteristics and Subtypes
Finding critical features to help accurately classify coronary plaques is an important research focus, as computation time is heavily dependent on the number of plaque features acquired. These morphological features, including optical characteristics, lumen morphology, A-line peaks and texture analyses, were further investigated in [115]. Here a three-class random forest (3C-RF) classifier was compared to a similar three-class support vector machine (3C-SVM) as well as a dual binary (DB) classifier; the difference being that the three-class classifiers simultaneously searched for fibro-calcific and fibro-lipidic A-lines, whereas the DB followed a sequential approach. Using both the minimal-redundancy-maximal-relevance (mRMR) [116] and binary Wilcoxon [117] methods combined with conditional random field (CRF) [118] denoising, a total of ten feature selection and classification schemes were tested on a dataset of 6556 images (49 pullbacks) and histologically validated on 440 ex vivo images (10 pullbacks). It was found that lumen morphology and 3D edge/texture features from the Leung-Malik filter bank [119] provided the largest improvements in classification accuracy of up to 81.6% in the 3C-SVM with mRMR feature selection. This segmentation was then translated into a 3D rendering to demonstrate an automated, proof-of-concept segmentation tool, shown in Figure 8.
[...] were shown [120-122]. These errors are due to the high signal attenuation and diffuse contours representative of a fibrous cap overlying a lipid pool coupled with inter-observer variability and expert interpretation in the manually segmented ground truth. As accurate thickness measurement is a critical parameter for quantification of plaque vulnerability and biomechanical stress, further research to address these challenges and reduce errors is required [123]. Techniques such as dynamic programming have also demonstrated the capability to overcome these challenges and could be further explored [124,125].
This study was also limited to using only 1008 images (after data augmentation) from two patients, suggesting room for larger, more detailed studies in the future.
To assess the vulnerability of plaques, quantifying multiple plaque components and subtypes is essential. Liu et al., developed an ensemble method to combine the outputs of multiple networks to improve the accuracy of detecting vulnerable regions [150]. By combining the AdaBoost, YOLO, SSD, and Faster region-based CNN outputs, a precision and recall of 88.84% and 95.02%, respectively, were reached, with a total detection quality of 88.46%. To further improve vulnerable plaque assessment, Gerbaud et al., introduced an adaptive attenuation compensation algorithm to assist in visualizing the outer elastic membrane in regions of high attenuation [151]. This allowed plaque burden to be quantitatively and automatically assessed, resulting in a mean difference of 0.27 ± 3.31 mm² for the outer elastic membrane and −0.5 ± 7.0% for plaque burden when compared to matched IVUS frames. Such capability overcomes one of the most significant limitations associated with OCT use and could be further used to assist quantifying the lipid core burden index proposed in [152]. By further developing a normalized-intensity standard deviation (NSD) measure, Rico-Jimenez et al., were also able to successfully automate the detection of macrophage infiltration in regions of intimal thickening, fibrous plaque and fibroatheroma, resulting in an accuracy, sensitivity and specificity of 87.45%, 85.57% and 88.03%, respectively, in a k-fold validation against manual segmentation [153]. Through the introduction of a pyramid parsing network, with an encoder consisting of a ResNet50-based CNN, Shibutani et al., were also able to detect regions of previous rupture/erosion that have since healed [154]. The ex vivo assessment and histological comparison of 1103 segments showed an excellent area under the curve of 0.86, highlighting the potential for future automated classifiers to recognize emerging risk factors.
A key focus has been the classification of atherosclerotic tissue into fibro-calcific and fibro-lipid components through A-line characteristics [115,155-157]. Kolluru et al., showed that CNN classification more closely resembled expert annotations than an ANN, despite similar accuracy for both fibro-calcific and fibro-lipid components [155]. With this knowledge, Lee et al., compared the classification accuracy of the SegNet and Deeplab v3+ CNNs [157-159]. The 91-layer SegNet network, pre-trained on the ImageNet dataset [160], outperformed the Deeplab v3+ network for both fibro-lipidic (Dice: 0.83 ± 0.06 vs. 0.780 ± 0.077; Jaccard: 0.73 ± 0.073 vs. 0.65 ± 0.10) and fibro-calcific (Dice: 0.90 ± 0.04 vs. 0.82 ± 0.07; Jaccard: 0.83 ± 0.04 vs. 0.70 ± 0.10) A-line classifications, respectively. Investigations have also suggested that including attenuation coefficients in A-line classification of fibro-calcific and fibro-lipid components can further increase accuracy, including differentiation from other tissue types (mixed, macrophages, necrotic cores) [161-163]. The network architecture totaled five pooling/unpooling layers with 26 convolutional layers and added image padding to avoid misclassification due to edge effects. This architecture was then applied in a hybrid learning approach on 6556 images from 49 patients with a RF classifier [156] implemented due to the faster computation time, needing only 25% of the training time and 33% of the run time of a SVM to achieve comparable accuracy. When a CRF was applied for noise postprocessing, the hybrid model approach outperformed a purely CNN approach for fibro-calcific (sensitivity: 97.20% vs. 80.20%; specificity: 91.90% vs. 92.90%) and fibro-lipid (sensitivity: 77.30% vs. 46.80%; specificity: 91.90% vs. 92.90%) classification, needing approximately one second per image (the majority, 0.9 s, required for feature extraction). The key differentiator here was that the hybrid method made use of morphological features.
To investigate the classification of fibrous tissue alongside calcification, macrophages, neovascularization and healthy intima/media layers, Abdolmanafi et al., compared three CNN based feature generators (AlexNet [164], VGG-19 [145] and Inception-v3 [165]) to train an RF classifier [132]. Although features generated from pre-trained networks are useful to reduce training/computation time, results show that accuracy, sensitivity, and specificity suffer when supervised fine tuning is not applied. To overcome this, a weighted majority voting approach was applied to the RF results from each set of features, leading to significant improvements in performance over 33 patients (Accuracy: 0.99 ± 0.01%; Sensitivity: 98.00 ± 2.00%; Specificity: 100.00 ± 0.00%). This method outperformed an FCN trained on a larger 5040 image (45 pullback) dataset [133]. By making use of dilated convolutions for semantic segmentation and spatial pyramid pooling modules, Abdolmanafi et al., further developed an FCN capable of classifying and segmenting tissues into fibrous, fibro-calcific, fibroatheroma, thrombus, and micro-vessels with accuracy of over 93% in each case [134]. They demonstrated that the ADAM optimizer and weighted cross-entropy loss function outperformed stochastic gradient descent and the dice loss coefficient, respectively, in the 41-pullback dataset. While ADAM in particular may outperform stochastic gradient descent, its generalization performance may suffer, hampering translation to other datasets [166]. Interestingly, this approach also made use of the original image rather than A-lines from the polar transform, reducing the computational cost associated with this pre-processing step whilst maintaining accuracy.
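The weighted majority voting idea described above can be illustrated with a minimal sketch. It assumes each classifier (for example, one RF per feature generator) emits a single label per sample and carries a scalar weight; the function name, the labels and the weight values are hypothetical and are not taken from [132].

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest summed weight.

    predictions: one predicted label per classifier.
    weights: one non-negative weight per classifier (hypothetical values,
             e.g. each classifier's validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On a tie, the label that was seen first is returned.
    return max(totals, key=totals.get)


# Two weaker classifiers vote "fibrous", one stronger classifier votes "calcification".
votes = ["calcification", "fibrous", "fibrous"]
print(weighted_majority_vote(votes, [0.95, 0.40, 0.45]))
print(weighted_majority_vote(votes, [1.0, 1.0, 1.0]))
```

With equal weights the function reduces to a plain majority vote, so the weights are the only mechanism by which a more reliable feature set can override the others.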
Polar and cartesian representations of OCT images can provide varying features for automated extraction. This was exploited by Gessert et al., with a multi-path architecture, as shown in Figure 9 [130]. Variations in concatenation points for feature fusion, transfer learning approaches and data augmentation resulted in an overall best performance of 91.70%, 90.90%, and 92.40% for accuracy, sensitivity, and specificity, respectively (F1 score of 0.913) [130]. The dual path variations of ResNet-v2 [97] and DenseNet with late feature concatenation increased accuracy by 1.4% and 1.8%, respectively, suggesting some added benefit from combining features from cartesian and polar image forms. Interestingly, cartesian based images saw a more significant gain in accuracy with both data augmentation (16%) and transfer learning approaches (15%), compared to polar images. Both methods were shown to outperform other models to classify vulnerable plaque when applied to a deep residual, U-Net based CNN [126,135]. The traditional encoder was replaced with the pre-trained ResNet101 for transfer learning improvements while rotational based data augmentation increased the number of images ten-fold (to 8000). A multi-term loss function was proposed to overcome imbalances in foreground/background pixels, which can lead to incomplete vulnerable region detection. By combining the weighted cross-entropy loss function, to enhance boundary pixels and improve boundary segmentation, and dice coefficient, to increase pixel classification accuracy, an overall pixel accuracy and precision of 93.31% and 94.33%, respectively, were reached [135], improvements of 49% and 14%, respectively, over the initial prototype U-Net. More impressively, the mean intersection over union and frequency weighted intersection over union, both refined measures of the overlap between two regions, improved by 103% and 71%, respectively.
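To make the multi-term loss concrete, the following is a minimal sketch of a weighted cross-entropy term added to a soft Dice term for a binary (foreground/background) problem. It is an illustration under stated assumptions rather than the formulation used in [135]: the per-pixel weights (larger on boundary pixels), the equal weighting of the two terms and all function names are hypothetical.

```python
import math


def weighted_cross_entropy(probs, targets, weights, eps=1e-7):
    """Binary cross-entropy in which each pixel carries its own weight."""
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / sum(weights)


def soft_dice_loss(probs, targets, eps=1e-7):
    """One minus the soft Dice overlap between predicted probabilities and labels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def combined_loss(probs, targets, weights, dice_weight=1.0):
    """Sum of the boundary-weighted cross-entropy and the Dice term."""
    return (weighted_cross_entropy(probs, targets, weights)
            + dice_weight * soft_dice_loss(probs, targets))


# Four pixels flattened to a list; the second pixel is treated as a boundary pixel.
targets = [1, 1, 0, 0]
weights = [1.0, 3.0, 1.0, 1.0]
confident = [0.9, 0.9, 0.1, 0.1]
boundary_miss = [0.9, 0.2, 0.1, 0.1]
print(combined_loss(confident, targets, weights))
print(combined_loss(boundary_miss, targets, weights))
```

The cross-entropy term penalizes each pixel independently, so it can be steered toward boundaries through the weights, while the Dice term depends only on region overlap and is therefore less sensitive to the ratio of foreground to background pixels.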
Calcified plaques generally present more favorable optical properties for segmentation [45]. Using a deep CNN, trained on the ResNet-50 network over a dataset of 4860 images (18 pullbacks), He et al., managed a precision, recall and F1 score of 0.97 ± 0.01, 0.98 ± 0.03, and 0.96 ± 0.03, respectively [167]. This result was achieved by the zero-padded 3D ResNet network, pre-trained on the ImageNet dataset and making use of the ADAM optimizer, which outperformed the same network setup for the 2D ResNet. Here, data augmentation was also shown to be an important step, reducing model overfitting and strengthening generalizability. In comparison, using a U-Net based architecture with the same binary cross-entropy loss function, Avital et al., managed an impressive accuracy of 0.99 [168]. However, this classification and segmentation still requires translation to 3D geometries for the purpose of application in biomechanical simulation.
Building on their previous work, Lee et al., developed a two-step process to both segment and reconstruct 3D calcification models, as shown in Figure 10 [169]. Here a deep learning CNN model was used for classification followed by the pre-trained SegNet network developed in [170]. The initial classification made use of transfer learning from the VGG-16 and VGG-19 networks with five-fold cross validation and final use of the Tversky loss function, which provided superior performance compared to the weighted cross-entropy and dice loss coefficients. Importantly, a fully connected CRF was applied to denoise the output and create labels with more relevant spatial characteristics, an important step for 3D reconstruction. This resulted in calcification detection sensitivity, specificity and F1 score of 97.70%, 87.70%, and 0.92, respectively, from a dataset of 8231 images (68 patients). This improved upon earlier sensitivity and dice coefficients of 85.00 ± 4.00% and 0.76 ± 0.03 [170], respectively, from a one-step, weighted VGG-16 based CNN that was tested on 2640 images from 34 pullbacks and trained on the CamVid dataset [171]. Furthermore, the two-step approach reduced misclassification of tissues adjacent to calcifications, resulting in more accurate calcification angle, depth and thickness measurements and subsequently better segmentations. Of note, at least 3900 images were required for training of the two-step method to obtain stable and reproducible results, highlighting the need for larger, expert annotated datasets. Dealing with limited datasets, with either scarce or weak annotations, is a significant challenge in the medical field and an ongoing research focus [55]. Rather than addressing the challenge of dataset size by building larger datasets, Kolluru et al., proposed to reduce the number of images needing expert annotation [172]. 
By focusing on calcified lesions, a deep feature-based clustering technique was developed to identify images needing expert annotation from identified volumes of interest (VOI). This removed the need to manually annotate a complete set of training labels, reducing a significant time cost. The clustering method was compared to annotation of equally spaced images on a dataset of 3741 images (60 VOIs from 41 pullbacks), outperforming the equally spaced annotation dataset using just 10% of the total selected images. Further development and use of approaches such as data augmentation, transfer and active learning, CRF post-processing and class activation mapping to reduce the number of annotated images needed for accurate training and classification would benefit the field.

Stents
OCT can be used immediately after stent deployment both to visualize stent sizing and the apposition of struts against the intimal surface and to identify acute stent-related complications (e.g., stent-edge dissection). Furthermore, it also plays a role when assessing the underlying nature of later stent complications, such as in-stent restenosis caused by neointimal hyperplasia or neo-atherosclerosis and stent thrombosis. The automatic detection, segmentation and quantification of stent strut mal-apposition post stent deployment could assist in analyzing areas at increased risk of subsequent neointimal proliferation, stent thrombosis and MACE [173]. Early classification of this apposition and neointimal coverage was carried out using a supervised ANN on a relatively small dataset of 20 pullbacks [174]. Twenty-two A-line features in polar coordinates were extracted based on image intensity gradients in similar fashion to early lumen-based segmentation, but with the addition of strut shadow gradients to classify candidate regions of interest (ROI). A-line representations (previously visualized in Figure 1) of stent struts and their shadows were suggested to be less affected by artefacts and rotational distortion in polar coordinates, a preferable characteristic for automated classification [175]. Based on a 70%, 15% and 15% split for training, validation, and testing, respectively, results showed a strong positive predictive value of 95.60% (97.40% vs. 95.10% for uncovered and covered struts, respectively). However, these results were influenced by image quality, with covered struts in particular suffering from a lower positive predictive value of 86.10% in suboptimal image sets.
To improve stent strut segmentation in suboptimal images, such as those with residual blood artefacts, Cao et al., investigated an AdaBoost trained, cascade classifier [176]. With a combination of three filters of varied angles developed through a dynamic programming approach, true positive scores of 0.87-0.93 in image sets with significant blood artefacts (F score 0.88-0.89) were achieved, comparable to images without artefacts (TPR 0.91-0.96; F score 0.90-0.93). While still using a relatively small dataset of 15 pullbacks (4065 images and 12,550 struts), the overall recall rate for covered struts was 0.98. The resulting malapposition calculation matched well with manual segmentation, although with a slight increase due to the false positive rate of 26.70% driven by images with significant blood artefacts.
Another challenge presented in stented arteries is variation in the optical characteristics between bare metal stents (BMS) and bioresorbable vascular scaffolds (BVS). While metallic stents present with well-defined edges and an invisible strut backside/pronounced shadow, BVS edges are well defined around a dark core [177]. Focusing on metallic stents, Jiang et al., compared the performance of the YOLOv3 framework and a region-based fully-convolutional neural network (R-FCN) [178]. The YOLOv3 framework made use of a binary cross-entropy loss function and K-means adjusted anchor box detector using the SSD method, while the R-FCN combined log-classification and smooth regression loss functions and a novel position-sensitive feature score map. While both obtained similar results, the R-FCN eventually reached the highest precision of 99.8%, although the test set consisted of only 425 images. In contrast, Amrute et al., built on previous work to automatically segment BVS using an unsupervised K-means clustering approach [179]. A positive predictive value of 93.00% was reached through testing on 1140 images. Building on this work, Lau et al., focused on segmenting both BMS and BVS with one architecture [180]. The MobileNetV2 [181] was first combined with the U-Net architecture to reduce computational cost and compared to the DenseNet121 encoder, with the overall best dice coefficient of 0.86 for the segmentation of the BVS. However, misclassification of images with bright fringes (common in BMS), dark shadowing, fractured struts, and areas of large neointimal coverage is common in many approaches. These are still future challenges to be overcome for automatic strut detection methods.
By building larger datasets for training and validation, Lu et al., further addressed the challenges of stent apposition, quantitative coverage measurement and detection in regions of strut clustering [182]. In 80 pullbacks (7125 images) with 39,000 covered and 16,500 uncovered struts, 21 features (including patch features shown in Figure 11) were chosen through a forward feature selection technique with a bagged decision trees classifier.
By using an SVM for classification (LIBSVM library [183]) and a graph-based mesh growing technique to overcome challenges associated with stent struts that were clustered close together, a sensitivity and specificity of 94.00 ± 3.00% and 90.00 ± 4.00%, respectively, were obtained. This approach was further developed into a toolkit (OCTivat-Stent), published in 2020, capable of reducing total segmentation time to just 30 min per pullback, from 6-12 h through manual annotation [184]. Additionally, specificity was greatly improved as strut coverage increased beyond 40 µm, with further research needed to accurately and consistently quantify thinner neointimal coverage.
Figure 11. Patches used to extract features for uncovered, thinly covered, and thickly covered struts. Side patches (orange) capture continuity of the tissue, while the green, blue, red, and purple patches highlight the front, middle, stent strut and backside pixel regions, respectively. Reprinted from [182], with permission, under the Creative Commons.
Feature-based segmentation still encounters challenges with varying acquisition settings and patients, as well as difficulty translating between stent designs without manual intervention. With this in mind, Wu et al., developed a CNN architecture based on the U-Net and RefineNet architectures [185] (Figure 12), to segment stent struts from pseudo-3D image stacks in polar form [175]. The pseudo-3D form uses prior knowledge of the implanted stent design and consecutive image slices to constrain the segmentation results, similar to a previous approach for constraining the 3D segmented point clouds to known strut skeletons [186]. The four-stage deep CNN architecture, consisting of start and end modules sandwiching the encoder and decoder, made use of batch normalization and convolution operations to mitigate gradient degradation and shortcut connections to minimize loss of spatial resolution, common factors impacting strut detection. With 80% of images used for training with the ADAM optimizer and combined binary cross-entropy and Tversky loss functions over 300 epochs, the deep CNN outperformed all feature-based techniques as well as the same deep CNN without the pseudo-3D image input. This highlights the importance of using consecutive image slices and prior knowledge of the stent structure to classify and detect struts. Importantly, in a dataset of 170 pullbacks (205,513 stent struts) containing 13 stent designs, overall results for dice coefficient, Jaccard index and precision were 0.91 ± 0.04, 0.84 ± 0.06 and 0.94 ± 0.04, respectively, highlighting the ability of this approach to handle difficult cases of malapposition and intimal coverage.
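The Tversky loss mentioned above generalizes the Dice loss by weighting false positives and false negatives separately, which is useful when the target (here, thin struts) occupies few pixels. The sketch below is illustrative only; the parameter names `fp_weight` and `fn_weight` and the example values are assumptions, and the combination with binary cross-entropy used in [175] is not reproduced.

```python
def tversky_loss(probs, targets, fp_weight=0.5, fn_weight=0.5, eps=1e-7):
    """One minus the Tversky index for a binary segmentation.

    probs: predicted foreground probabilities, flattened to a list.
    targets: ground-truth labels (0 or 1), flattened to a list.
    fp_weight, fn_weight: penalties on false positives and false negatives;
        setting both to 0.5 recovers the Dice loss.
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    index = (tp + eps) / (tp + fp_weight * fp + fn_weight * fn + eps)
    return 1.0 - index


targets = [1, 1, 0]
missed_strut = [1.0, 0.0, 0.0]  # one false negative
extra_strut = [1.0, 1.0, 1.0]   # one false positive
# Penalizing false negatives more heavily makes a missed strut the costlier error.
print(tversky_loss(missed_strut, targets, fp_weight=0.3, fn_weight=0.7))
print(tversky_loss(extra_strut, targets, fp_weight=0.3, fn_weight=0.7))
```

Choosing a larger false-negative weight trades precision for recall, which may suit strut detection where a missed strut affects the downstream reconstruction more than a spurious one that can be filtered later.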
Application of these segmentation methods to computational simulation requires the additional step of 3D reconstruction of both the stent structure and lumen surface. Building from in vitro models with application of the Sobel edge detection and interpolation between detected struts [187,188], Migliori et al., used a fuzzy logic approach for classification of a Multi-link 8 stent (Abbott Laboratories, Abbott Park, IL, USA) and subsequent 3D reconstruction with reasonable agreement to manual approaches [189]. To improve the stent reconstruction, Elliot et al., made use of diffeomorphic metric mapping to develop a constrained iterative deformation process that configures an initial undeformed stent geometry to the 3D imaged point cloud [190]. Tested on two stents (Integrity bare metal stent and Xience Alpine drug eluting stent) in four in vitro models and compared to manual segmentation and reconstruction, results showed good agreement, with an average distance between the strut centroids of 97.5 ± 54.4 µm. In in vivo cases, by improving lumen segmentation around struts with a novel correction step to account for blood artefacts, Bologna et al., automatically generated a stented artery model for simulation of WSS from the OCT based 3D point cloud and biplane angiography centerline (Figure 13) [64]. However, these approaches suffered in the case of struts that did not have visible, continuous, or square outlines. Building on this with an enhanced reconstruction method using prior knowledge of the undeformed stent geometry, O'Brien et al., automatically analyzed four swine models using attenuation coefficients and a decision tree classifier, expanding previous studies to obtain good agreement with manual segmentation [186,191,192]. WSS results from the enhanced simulation showed improved resolution in the hemodynamic microenvironment compared to the unenhanced method.
Furthermore, a strong association between WSS and strut-lumen distance was seen, highlighting the importance of accurate classification, segmentation, and reconstruction for 3D simulation results.
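A basic building block of the reconstructions discussed above is placing each segmented 2D contour on the plane perpendicular to a 3D centerline. The sketch below shows one simple way this could be done and is not the method of any cited study. It assumes one contour per centerline point, contour coordinates expressed relative to the catheter center in the same units as the centerline, and a fixed reference direction to orient the in-plane axes; correction of rotational (twist) orientation between frames is deliberately omitted. All names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return tuple(x / norm for x in a)


def stack_contours(centerline, contours, reference=(1.0, 0.0, 0.0)):
    """Map planar contours onto planes normal to a 3D centerline.

    centerline: list of (x, y, z) points, one per image frame.
    contours: list of contours, each a list of (u, v) in-plane coordinates.
    reference: direction used to fix the in-plane axes; it must not be
        parallel to the centerline tangent (the projection would vanish).
    """
    if len(centerline) != len(contours) or len(centerline) < 2:
        raise ValueError("need one contour per centerline point and at least two points")
    last = len(centerline) - 1
    surface = []
    for i, (origin, contour) in enumerate(zip(centerline, contours)):
        # Tangent from central differences (one-sided at the two ends).
        tangent = _unit(_sub(centerline[min(i + 1, last)], centerline[max(i - 1, 0)]))
        # First in-plane axis: the reference direction projected onto the plane.
        along = _dot(reference, tangent)
        normal = _unit(tuple(r - along * t for r, t in zip(reference, tangent)))
        # Second in-plane axis completes a right-handed frame.
        binormal = _cross(tangent, normal)
        surface.append([
            tuple(o + u * n + v * b for o, n, b in zip(origin, normal, binormal))
            for u, v in contour
        ])
    return surface


# Straight centerline along z with the same two-point "contour" in every frame.
line = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)]
rings = stack_contours(line, [[(1.0, 0.0), (0.0, 1.0)]] * 3)
print(rings[1])
```

In practice the centerline would come from a second modality such as biplane angiography, and the resulting rings of 3D points would then be lofted into a surface before meshing.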

Discussion
Methods to automate the classification and segmentation of pathological and nonpathological formations in intravascular OCT images are emerging as clinically feasible. To automatically segment the lumen, the deep capsules approach presented by Balaji et al., showed impressive accuracy, speed and computational efficiency, making it a strong candidate for clinical use [93]. This approach built upon the useful characteristics of the U-Net to maintain high-level feature accuracy and shows strong promise to be expanded to plaque component analysis. However, this approach should also be expanded to be able to segment bifurcation regions and requires further work to better handle fringe cases (i.e., increasing the number of cases with artefacts and difficult geometries). Addressing the artery layers and outer wall, the mechanical approach presented by Olender et al., demonstrated impressive speed when fitting and smoothing a 3D surface from all images in a pullback [113]. This overcomes OCT's most significant limitation, penetration depth in deep atherosclerotic components. However, its lumen and outer elastic membrane identification speed is still lacking and could benefit from the U-Net based network proposed by Haft-Javaherian et al. [110]. This approach could also show promise for automating the segmentation of tissue in future hybrid imaging modalities, such as a combined IVUS-OCT probe [193], as its multivariate loss function could manage the added information that IVUS presents. Various techniques provided strong segmentation capability for plaque compositions and coronary stents, with CRF de-noising and strut detection constraints based on prior knowledge of stent design proving more critical to strong results than the underlying network.
However, further research is required to target quantifying fibrous cap thickness accurately in image datasets that well represent real-world scenarios, with current studies significantly limited to small datasets (179-348 images in each study to date [123][124][125]). Until studies have access to datasets that are representative of real-world scenarios, clinical application will remain limited.
Furthermore, while these methods show strong promise, assessing their effectiveness is not a straightforward task, as heterogeneity in evaluation metrics can lead to an incomplete assessment of a methodology. A wide range of evaluation metrics have been used to assess the performance of automated techniques, with significant research applied to developing distance, similarity and boundary overlap metrics [194,195]. Choosing the most effective measure for the task at hand is difficult and can lead to bias in results, particularly when dealing with class imbalance [196]. Making use of frequency weighted evaluation metrics, such as the frequency weighted intersection over union rather than the commonly used Jaccard similarity index, could assist in dealing with this challenge. Development of consensus documents for OCT based deep learning may also help researchers reduce other biases in their work, including data distribution, dataset leakage and methodological bias, factors already shown to significantly skew results in cancer diagnoses [197][198][199][200]. Improving access to large scale, longitudinal and multicenter datasets that are representative of real-world scenarios coupled with consistent use of techniques including cross-validation, model regularization (to prevent overfitting or underfitting) and de-biasing through oversampling and adversarial de-biasing will help in addressing these challenges. Competitions, such as [201], could further assist by standardizing the development and evaluation of methods on pre-defined datasets, improving transparency, while open-source projects, such as the medical open network for artificial intelligence (MONAI), first publicly released in 2020, provide best practice deep learning frameworks [202].
Reviewed studies primarily used supervised learning techniques, such as neural networks, RF and SVM, where the model has access to both the original image, as well as manually annotated versions during training to effectively learn the correct parameters [85,101,156]. This requires large, high-quality, manually annotated datasets for training and validation to produce accurate and robust results, a significant cost. A focus on addressing this challenge by handling imperfect datasets with sparse or no manual annotations is emerging [55]. State-of-the-art unsupervised learning techniques, such as generative adversarial networks (GAN) and autoencoders, are also gaining in popularity and could reduce this burden by learning patterns from unlabeled data or generating further image labels to optimize segmentation [203,204]. While Abdolmanafi et al., applied a sparse autoencoder in their work segmenting atherosclerotic tissue types [134], recent advancements in autoencoders applied to CT imaging are also leading to stronger feature learning and dimensionality reductions that could translate for use in intravascular OCT [205].
With improvements in classification and segmentation capability, there is a growing need to integrate these advances into automated 3D reconstructions within a framework suitable for biomechanical simulation. Lumen and stent-based investigations have already begun developing this ability for clinical application [91,93]. However, structural based analysis still lags due to the added complications of generating smooth and sufficiently connected regions for finite element mesh generation. To the best of our knowledge, the only framework to integrate image classification, segmentation, 3D reconstruction and structural simulation is that recently presented by Kadry et al. [206]. This framework, shown in Figure 14, built on their previous works to classify pixels into six tissue components within a constrained wall area region, making use of 3D mode filtering to improve spatial consistency and continuity of contours [113,114,131]. This approach shows significant potential to translate to clinical use, as it brings together the relevant processing steps into a single framework. Future work could also account for motion artefacts within intravascular imaging, which were suggested to result in relative stenosis length errors of up to 160% (compared to 0.6% after motion catheter trajectory and time synchronization) [207]. While an impressive step forward, future work is still required to integrate an imaging modality capable of generating an accurate 3D centerline to stack the 2D contours [208][209][210][211]. Of the available modalities that could be used, invasive coronary angiography is the primary candidate due to its widespread clinical use and requirement during intracoronary OCT procedures.
However, computed tomography coronary angiography is a rising noninvasive contender and coronary magnetic resonance imaging could also be a useful addition to reduce patient radiation and contrast exposure, although lower image resolution and susceptibility to motion related image degradation could impact reconstruction accuracy in these cases [212,213].
Figure 14. Framework layout for the automated reconstruction and 3D structural simulation of an artery. Initial OCT images were stacked to form a pseudo-3D image sequence before classification with a CNN and generation of label maps which were subsequently smoothed into contours to generate the digital phantom which was converted to a finite element mesh for structural simulation. Republished with permission of The Royal Society Publishing, from [206]; permission conveyed through Copyright Clearance Centre, Inc.
Multi-modal intravascular imaging modalities also have the capability to further overcome challenges with automatic OCT segmentation. The integration of OCT and IVUS, for example, could overcome the limited 0.1 to 2 mm penetration depth associated with OCT in plaques, removing the need for complex estimation techniques to segment the outer wall or plaque backsides and quantify plaque burden in regions of high attenuation [193,214]. The complementary capabilities of these two imaging modalities have already demonstrated their potential to increase positive predictive capability when detecting TCFA [215]. Developments in OCT also show promise for providing useful histopathological information, with PS-OCT [108] demonstrating incremental value in the segmentation of artery layers and the outer wall [110]. Furthermore, molecular information obtained from multi-modal imaging could assist in automatically segmenting emerging vulnerable features, such as layered plaques, indicative of previously destabilized plaque that has since healed, or collagen arrangement within the fibrous cap, which could suggest lesion instability [216,217]. Further development of near-infrared spectroscopy/Raman, fluorescence lifetime (FLIM) and near-infrared autofluorescence (NIRAF) modalities in combination with OCT also shows promise to extract biochemical and molecular tissue information on elastin and macrophages whilst nuclear imaging techniques such as positron emission tomography (PET) could supplement this with information on local inflammatory responses [112,[218][219][220].
This molecular imaging capability could lead to more accurate classification and segmentation of vulnerable plaque regions. For example, the first in-human study on NIRAF combined with OCT showed NIRAF associated with high-risk plaque phenotypes, complementing the structural information available through OCT [221]. Further advancements could also assist in differentiating between healthy re-endothelization or fibrin drug eluting stent coverage, improving the ability to stratify risk of late stent thrombosis [222]. Combining this ability to accurately segment pathological borders and extract molecular information, reminiscent of an advanced virtual histology IVUS/OCT [223,224], presents opportunities to reverse engineer tissue constitutive models and adapt structural simulations to patient-specific conditions, currently a major limitation in the field of biomechanics [225][226][227][228][229][230][231][232][233][234]. However, there is still a need for further evidence to determine which multi-modal imaging technique can provide the strongest incremental benefits and risk stratification to improve both clinical outcomes and simulation capability.
Clinician acceptance of machine learning algorithms, especially in the case of intravascular OCT, is still tied to the imaging modality's clinical utility. While OCT and IVUS are still not a part of routine coronary angiography procedures, automated segmentation approaches that can run in near real time in the catheterization laboratory could provide a significant advance in making quantitative data (e.g., fibrous cap thickness measurement) readily available to the interventional cardiologist and assist with the interpretation of OCT images. In turn, this could inform clinical decision making and lead to better patient outcomes. The future potential for automated approaches to make it into clinical use also requires addressing a number of systemic challenges, including: (1) Improving access to large scale, expertly annotated datasets to train and test techniques on data that is representative of real-world scenarios; (2) Evidence that techniques are robust and reliable enough to enable clinical use and provide sufficient incremental value to justify the associated costs (i.e., health economic analysis); (3) Regulations surrounding the updates of medical technology could inhibit the rapid adoption required for AI in clinical scenarios; (4) Data ownership could impact how techniques develop, particularly if research techniques develop with large scale datasets to the point of commercial potential [235]. These are both multi-disciplinary challenges and opportunities for the engineering, computer science and medical research fields.

Conclusions
Intravascular OCT is a high resolution, near-infrared light-based imaging modality capable of visualizing vulnerable plaque features, such as TCFA. Manual annotation of these images is a time consuming and tedious task, limiting its clinical application and use in 3D reconstructions for biomechanical simulation. With increases in computation power and numerical capability, automated techniques are emerging to classify and segment pathological and non-pathological formations, including vulnerable features. This review summarized recent advances (2016-2021) in automated techniques, applied to coronary OCT imaging and their subsequent application to 3D reconstruction and biomechanical simulation. Deep learning models have demonstrated the capability to classify and segment structural features in OCT imaging, including lipidic, calcific, and fibrous plaques, as well as stent and lumen borders in regions with considerable imaging artefacts. This capability is beginning to show potential for clinical use, with significant reductions in computation time allowing near real-time classification and segmentation. However, challenges surrounding access to large scale, expertly annotated image datasets that represent real-world scenarios and robustness of automated techniques to clinical use still need to be addressed before clinical acceptance. Further advances in multi-modal imaging catheters could increase the information available to automated techniques. When coupled with patient details and developments to streamline the process of 3D reconstruction and simulation, this capability could one day assist in guiding patient-specific care or intervention.

Conflicts of Interest: P.J.P. has received research support from Abbott Vascular; has received consulting fees from Amgen and Esperion; and has received speaker honoraria from AstraZeneca, Bayer, Boehringer Ingelheim, Merck Schering-Plough, and Pfizer. All other authors declare no other relationships relevant to this paper to disclose.

Glossary of Performance Metrics
Accuracy (ACC): Accuracy is the proportion of pixels classified correctly out of the total number of pixels classified, defined as
$$\mathrm{ACC} = \frac{\sum_{i=1}^{k} n_{ii}}{\sum_{i=1}^{k} t_i}$$
where $k$ is the total number of classification categories within the dataset, $t_i$ is the number of pixels belonging to the $i$th category and $n_{ii}$ is the number of correct pixel predictions of the $i$th category.
Area under the curve (AUC): Area under the curve determines an algorithm's ability to distinguish between two classifications, with a value closer to one indicating better performance.
Average symmetric surface distance (ASSD): The average symmetric surface distance calculates the average distances, D, between point, x, on the boundary of the predicted region, ∂P, and its nearest point, y, on the boundary of the ground truth, ∂GT, and in reverse from the ground truth to the predicted surface.
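As an illustration of the boundary-distance metrics in this glossary, the sketch below computes the ASSD and the Hausdorff distance for two boundaries given as lists of points. It assumes the Euclidean distance for $D$ and a brute-force nearest-point search, which is adequate for small contours only. The symmetric Hausdorff variant (the larger of the two directed distances) is included as a commonly used extension; all function names are hypothetical.

```python
import math


def _nearest(point, boundary):
    """Distance from a point to its nearest neighbour on a boundary."""
    return min(math.dist(point, q) for q in boundary)


def directed_hausdorff(source, target):
    """Largest nearest-point distance from the source boundary to the target."""
    return max(_nearest(p, target) for p in source)


def hausdorff_distance(pred, truth):
    """Symmetric variant: the larger of the two directed distances."""
    return max(directed_hausdorff(pred, truth), directed_hausdorff(truth, pred))


def assd(pred, truth):
    """Average of nearest-point distances taken in both directions."""
    total = (sum(_nearest(p, truth) for p in pred)
             + sum(_nearest(q, pred) for q in truth))
    return total / (len(pred) + len(truth))


# The ground truth contains one point that the prediction misses entirely.
pred = [(0.0, 0.0), (1.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 3.0)]
print(directed_hausdorff(pred, truth), hausdorff_distance(pred, truth), assd(pred, truth))
```

The example shows why the direction matters: every predicted point lies on the ground truth, so the directed distance from prediction to ground truth is zero, while the reverse direction exposes the missed point.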
Bhattacharyya distance (BHAT): The Bhattacharyya distance determines the similarity between two discrete probability density functions of image intensity values as [236]
$$\mathrm{BHAT}(P, Q) = -\ln \left( \sum_{i=1}^{h} \sqrt{P(i)\, Q(i)} \right)$$
where, in the case of image analysis, $h$ is the number of image intensity levels considered for the probability distributions $P$ and $Q$.

Coefficient of determination (R²):
The coefficient of determination defines how changes in a dependent or predicted variable are explained by changes in a second variable, described by
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - f_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where there are $n$ values of dataset $y$ with mean $\bar{y}$, and predicted values, $f$.

Cohen's kappa coefficient (CK):
Cohen's kappa coefficient evaluates the reliability of agreement between two results, in this case the ground truth manual annotation and the algorithm result, while taking chance agreement into account, and is evaluated as [237]

$$\mathrm{CK} = \frac{p_0 - p_e}{1 - p_e}$$

where $p_0$ is the observed agreement and $p_e$ is the probability of agreement by chance.
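A minimal sketch of how CK might be evaluated from two label sequences is given below. The function name and the frame-level tissue labels are hypothetical, and the chance agreement is estimated from the marginal label frequencies of the two raters, which is the usual convention.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa, (p0 - pe) / (1 - pe), for two equally long label lists."""
    n = len(a)
    # Observed agreement p0: fraction of items given the same label.
    p0 = sum(1 for u, v in zip(a, b) if u == v) / n
    # Chance agreement pe: product of the marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    pe = sum(count_a[label] * count_b[label] for label in count_a) / n ** 2
    # Assumption: pe < 1, i.e. the raters do not both use a single label only.
    return (p0 - pe) / (1 - pe)


# Made-up labels for six frames: manual annotation versus algorithm output.
manual = ["lipid", "lipid", "calcium", "fibrous", "fibrous", "fibrous"]
auto = ["lipid", "calcium", "calcium", "fibrous", "fibrous", "lipid"]

kappa = cohens_kappa(manual, auto)
print("CK:", kappa)
```

In this toy case the raters agree on four of six frames ($p_0 = 2/3$) while chance alone would give $p_e = 1/3$, so CK is 0.5.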

Concordance correlation coefficient (CCC):
The concordance correlation coefficient determines the agreement between a variable $y$ and a reference ground truth $x$, defined as [238]

$$\mathrm{CCC} = \frac{2\rho\,\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$

where $\mu_x$ and $\mu_y$ are the variable means, $\sigma_x$ and $\sigma_y$ are the standard deviations, $\sigma_x^2$ and $\sigma_y^2$ are the variances and $\rho$ is Pearson's correlation coefficient.
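The difference between correlation and concordance can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than code from any reviewed study: the function names, the `_moments` helper and the paired lumen-area values are hypothetical, and population (divide by n) statistics are used throughout. It relies on the identity $\rho\,\sigma_x\sigma_y = \mathrm{cov}(x, y)$.

```python
import math
from typing import Sequence, Tuple


def _moments(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Population means, variances and covariance of paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's R: cov(X, Y) / (sigma_x * sigma_y)."""
    _, _, vx, vy, cov = _moments(x, y)
    return cov / math.sqrt(vx * vy)


def concordance_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """CCC: 2 * cov / (var_x + var_y + (mean_x - mean_y)^2)."""
    mx, my, vx, vy, cov = _moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def r_squared(y: Sequence[float], f: Sequence[float]) -> float:
    """R^2: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up lumen areas: the automated values carry a constant offset of one.
manual_area = [1.0, 2.0, 3.0, 4.0]
auto_area = [2.0, 3.0, 4.0, 5.0]

print("R:", pearson_r(manual_area, auto_area))
print("CCC:", concordance_cc(manual_area, auto_area))
print("R^2:", r_squared(manual_area, auto_area))
```

With a constant offset between the two series, Pearson's R remains at one because the relationship is perfectly linear, while CCC and $R^2$ both fall because the values themselves disagree.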

Dice coefficient (DICE): The Dice coefficient determines the overlap between two regions $A$ and $B$ as [239]

$$\mathrm{DICE} = \frac{2\,|A \cap B|}{|A| + |B|}$$

where $A$ is the region determined by the algorithm and $B$ is the manually labelled ground truth or point of comparison, with higher values suggesting better performance.

Frequency weighted intersection over union (FIoU):
The frequency weighted intersection over union determines the mean overlap between the algorithm-calculated area and the ground truth, weighted by the frequency of occurrence of each category, defined as

$$\mathrm{FIoU} = \left(\sum_{i=1}^{k} t_i\right)^{-1} \sum_{i=1}^{k} \frac{t_i\, n_{ii}}{t_i + \sum_{j=1}^{k} n_{ji} - n_{ii}}$$

where $k$ is the total number of classification categories within the dataset, $t_i$ is the number of pixels belonging to the $i$th category, $n_{ji}$ is the number of pixels belonging to the $j$th category that are predicted as the $i$th category (an incorrect prediction when $j \neq i$) and $n_{ii}$ is the number of correct pixel predictions of the $i$th category.

Hausdorff distance (HD):
The Hausdorff distance determines the largest of all the distances, $D$, between a point, $x$, on the boundary of the predicted region, $\partial P$, and its nearest point, $y$, on the boundary of the ground truth, $\partial GT$, defined as

$$\mathrm{HD} = \max_{x \in \partial P} \, \min_{y \in \partial GT} D(x, y)$$

Jaccard similarity (JS): The Jaccard similarity index defines the size of the overlapping region divided by the size of the union of the two regions $A$ and $B$ as [240]

$$\mathrm{JS} = \frac{|A \cap B|}{|A \cup B|}$$

where $A$ is the region determined by the algorithm and $B$ is the manually labelled ground truth or point of comparison, with higher values suggesting better performance.
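The two overlap measures, DICE and JS, can be illustrated with a short sketch in which each region is represented as a set of pixel coordinates. The function names and the toy regions are hypothetical, and treating two empty regions as perfect agreement is an assumption rather than a convention taken from the reviewed studies.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def dice(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Dice coefficient: 2 |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumption: two empty regions agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # same assumption as above
    return len(a & b) / len(a | b)


# Toy regions of four pixels each, sharing two pixels.
predicted = {(0, 0), (0, 1), (1, 0), (1, 1)}
ground_truth = {(0, 1), (1, 1), (0, 2), (1, 2)}

d = dice(predicted, ground_truth)
j = jaccard(predicted, ground_truth)
print("DICE:", d, "JS:", j)
```

For any single pair of regions the two measures are related by JS = DICE / (2 - DICE), so they always rank one pair of segmentations the same way, although their averages over many images can differ.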

Kullback-Leibler divergence (KL):
The Kullback-Leibler divergence is a statistical distance measure evaluating the difference between two probability distributions, $P$ and $Q$, over a domain of image intensity values, $h$, defined as [241]

$$\mathrm{KL}(P \,\|\, Q) = \sum_{i \in h} P(i) \ln\frac{P(i)}{Q(i)}$$

Mean average difference in area (MADA): The mean average difference in area is calculated between the ground truth (GT) and predicted (P) results by averaging the difference in area over $N$ samples.

Mean intersection over union (MIoU):
The mean intersection over union calculates the mean overlap between the algorithm-calculated area and the ground truth, defined as

$$\mathrm{MIoU} = \frac{1}{k} \sum_{i=1}^{k} \frac{n_{ii}}{t_i + \sum_{j=1}^{k} n_{ji} - n_{ii}}$$

where $k$ is the total number of classification categories within the dataset, $t_i$ is the number of pixels belonging to the $i$th category, $n_{ji}$ is the number of pixels belonging to the $j$th category that are predicted as the $i$th category (an incorrect prediction when $j \neq i$) and $n_{ii}$ is the number of correct pixel predictions of the $i$th category. This is also the mean Jaccard similarity index.

Mean pixel accuracy (MPA):
The mean pixel accuracy determines the proportion of correctly classified pixels against the total number of pixels in each category, averaged across all categories, defined as

$$\mathrm{MPA} = \frac{1}{k} \sum_{i=1}^{k} \frac{n_{ii}}{t_i}$$

where $k$ is the total number of classification categories within the dataset, $t_i$ is the number of pixels belonging to the $i$th category and $n_{ii}$ is the number of correct pixel predictions of the $i$th category.

Misclassification ratio (MCR):
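The misclassification ratio is the proportion of misclassified pixels, which can be expressed as a percentage, defined as

$$\mathrm{MCR} = 1 - \mathrm{ACC} = 1 - \frac{\sum_{i=1}^{k} n_{ii}}{\sum_{i=1}^{k} t_i}$$

where ACC is the accuracy defined earlier, $k$ is the total number of classification categories within the dataset, $t_i$ is the number of pixels belonging to the $i$th category and $n_{ii}$ is the number of correct pixel predictions of the $i$th category.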
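The pixel-counting metrics in this glossary (ACC, MPA, MIoU, FIoU and MCR) can all be derived from a single confusion matrix, as the sketch below tries to show. It is illustrative only: the function name and the two-category example are hypothetical, and it assumes the common convention that entry `n[i][j]` counts pixels of true category i predicted as category j, with every category containing at least one pixel.

```python
from typing import Dict, List


def segmentation_metrics(n: List[List[int]]) -> Dict[str, float]:
    """Pixel-level metrics from a k x k confusion matrix.

    Assumed convention: n[i][j] = pixels of true category i predicted as j.
    """
    k = len(n)
    t = [sum(row) for row in n]  # t_i: pixels belonging to category i
    # sum_j n_ji: all pixels predicted as category i (column sums).
    predicted = [sum(n[j][i] for j in range(k)) for i in range(k)]
    total = sum(t)
    correct = sum(n[i][i] for i in range(k))
    # Per-category intersection over union: n_ii / (t_i + sum_j n_ji - n_ii).
    iou = [n[i][i] / (t[i] + predicted[i] - n[i][i]) for i in range(k)]
    acc = correct / total
    return {
        "ACC": acc,
        "MPA": sum(n[i][i] / t[i] for i in range(k)) / k,
        "MIoU": sum(iou) / k,
        "FIoU": sum(t[i] * iou[i] for i in range(k)) / total,
        "MCR": 1 - acc,
    }


# Made-up two-category example (e.g. background versus plaque), 100 pixels.
confusion = [[50, 10],
             [5, 35]]
metrics = segmentation_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because FIoU weights each category by its pixel count, it moves towards the intersection over union of the dominant category, whereas MIoU treats rare and common categories equally.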

Negative predictive value (NPV):
The negative predictive value is the proportion of true negatives within the total negative algorithm predictions, defined as the ratio of true negatives (TN) to the sum of true negatives and false negatives (FN):

$$\mathrm{NPV} = \frac{TN}{TN + FN}$$

Pearson's correlation coefficient (R):
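The population-based Pearson's correlation coefficient for a pair of variables $X$ and $Y$ is described by

$$R = \frac{\mathrm{cov}(X, Y)}{\sigma_x \sigma_y}$$

where cov is the covariance and $\sigma_x$ and $\sigma_y$ are the standard deviations of $X$ and $Y$.

Root mean square symmetric surface distance (RMSD): The root mean square symmetric surface distance is defined as

$$\mathrm{RMSD} = \sqrt{\frac{1}{|\partial P| + |\partial GT|} \left( \sum_{x \in \partial P} \min_{y \in \partial GT} D(x, y)^2 + \sum_{y \in \partial GT} \min_{x \in \partial P} D(y, x)^2 \right)}$$

and calculates the root mean square of all distances, $D$, between each point, $x$, on the boundary of the predicted region, $\partial P$, and its nearest point, $y$, on the boundary of the ground truth, $\partial GT$, and in reverse from the ground truth to the predicted surface.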
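The three boundary-based measures in this glossary (ASSD, HD and RMSD) share the same nearest-point distances, which the following sketch makes explicit. It is a brute-force illustration under stated assumptions: boundaries are given as short lists of 2D points, $D$ is taken to be the Euclidean distance, the function names and toy contours are hypothetical, and HD is computed in the directed form (predicted boundary to ground truth) that matches the wording of the glossary entry.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Distance from each point in src to its nearest point in dst."""
    return [min(math.dist(x, y) for y in dst) for x in src]


def surface_metrics(pred: Sequence[Point], truth: Sequence[Point]) -> Dict[str, float]:
    """ASSD, RMSD and directed HD between two boundary point sets."""
    d_pred_to_truth = nearest_distances(pred, truth)
    d_truth_to_pred = nearest_distances(truth, pred)
    both = d_pred_to_truth + d_truth_to_pred  # symmetric: both directions
    return {
        "ASSD": sum(both) / len(both),
        "RMSD": math.sqrt(sum(d * d for d in both) / len(both)),
        # Directed form; a symmetric HD would instead use max(both).
        "HD": max(d_pred_to_truth),
    }


# Toy contours: three ground-truth points and three predicted points.
truth_boundary = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred_boundary = [(0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]

result = surface_metrics(pred_boundary, truth_boundary)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The brute-force search costs one distance evaluation per pair of points, which is acceptable for a sketch but would usually be replaced by a spatial index or distance transform for full-resolution contours. Because RMSD squares each distance before averaging, it penalises isolated large deviations more heavily than ASSD, while HD reports only the single worst deviation.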