Deep Recursive Bayesian Tracking for Fully Automatic Centerline Extraction of Coronary Arteries in CT Images

Extraction of coronary arteries in coronary computed tomography (CT) angiography is a prerequisite for the quantification of coronary lesions. In this study, we propose a tracking method combining a deep convolutional neural network (DNN) and particle filtering method to identify the trajectories from the coronary ostium to each distal end from 3D CT images. The particle filter, as a non-linear approximator, is an appropriate tracking framework for such thin and elongated structures; however, the robust ‘vesselness’ measurement is essential for extracting coronary centerlines. Importantly, we employed the DNN to robustly measure the vesselness using patch images, and we integrated softmax values to the likelihood function in our particle filtering framework. Tangent patches represent cross-sections of coronary arteries of circular shapes. Thus, 2D tangent patches are assumed to include enough features of coronary arteries, and the use of 2D patches significantly reduces computational complexity. Because coronary vasculature has multiple bifurcations, we also modeled a method to detect branching sites by clustering the particle locations. The proposed method is compared with three commercial workstations and two conventional methods from the academic literature.


Introduction
Extraction of coronary arteries in coronary computed tomography angiography (CCTA) is a prerequisite task for the automatic quantification of coronary lesions. In clinical application, quantification of coronary artery lesions is critical for correct diagnosis, treatment, and procedure planning. However, the quantification of coronary artery lesions still requires manual annotation by an experienced expert, which becomes a considerable burden both in time and cost. Coronary arteries are represented as a tree structure in a three-dimensional (D) volume image and elongated with an inhomogeneous contrast enhancement on the lesion. Automatic segmentation of coronary arteries in CT images remains a challenge because coronary arteries are elongated and have complex tree shapes.
In the literature [1,2], considerable attention was paid to the analysis of curvilinear or vascular structures. Seven geometric and four photometric characteristics were introduced for the definition of curvilinear objects as a region of pixels or voxels within one image [2]. Coronary arteries as curvilinear objects have the same characteristics, and the representative characteristics are: • The region of the coronary artery is thin across a long path.

•
The voxels have significantly different intensities compared to the neighboring background.

•
The cross-sectional profile-the intensity values transverse to the main directionfollow a specific distribution.

•
The variation in color along the main direction is smooth.

•
Coronary arteries have local curvatures; for instance, some parts may be mostly straight, other parts can admit soft bends, and other parts may be highly tortuous.

•
Coronary arteries have bifurcation sites that are defined as three-branch joints.
There are several approaches to locating coronary arteries based on the representative characteristics of the curvilinear objects. Lorenz et al. [3] proposed supervised wave propagation based on the models of hyper-intensity, locally tubular geometry, centerline smoothness with the adaptive threshold, and co-variance analysis of extracted segments. Lorigo et al. [4] proposed a curve evolution approach based on the centerline smoothness using curvature regularization and image gradients. Hessian-based minimal paths were found using the multiscale Hessian-based vessel enhancement filters [5][6][7].
Particle filtering-based tracking methods for coronary artery extraction were introduced by varying measurement models or using novel prior information [8][9][10][11][12][13]. Lesage et al. proposed the representative method [9,10] based on the particle filtering framework. Fluxbased vascular features were utilized for the likelihood function, and medial-based geometric models were learned by kernel density estimation in terms of the scale and direction for the prior functions. This method has significant potential for improvement by employing a deep neural network as a robust observation and measurement model. In such a stochastic framework, the most important component is an accurate measurement of the vesselness, which mostly affects tracking results.
Mathematical modeling features for a coronary artery is a challenging task because coronary arteries are thin, elongated, and have complex tree structures. The image qualities can vary by the noise, artifacts, contrast injection timing, which makes the problem more challenging; however, a convolutional neural network (CNN) can extract representative features effectively [14]. CNNs have been successfully applied to vascular segmentation, and motion estimation on various image modalities including cardiac CT images [15][16][17][18][19].
Wolterlink et al. [16] proposed a CNN-based method for coronary artery tracking. This method utilizes 3D volume patches to observe the contextual information around the tracker, which can robustly predict 3D orientations; this, eventually, leads to the location of centerlines of coronary arteries. Similarly, Salahuddin et al. [19] proposed a 3D patchbased CNN method to track the coronary artery centerlines. Jung et al. [17] proposed a CNN-based method for coronary motion estimation where the 2D cross-sectional patches of coronary arteries were sampled and learned for motion correction; finally, the coronary arteries were reconstructed in a 3D volume using the compensated 2D patches. The method shows that CNN can learn the features of coronary arteries using cross-sectional patches. Zhang et al. [18] proposed a CNN-based deep reinforcement learning method. The method employed double deep Q-learning and designed Markov decision process for branch-aware centerline extraction, and shows a higher performance in processing time. Kong et al. [20] proposed a novel tree-structured convolutional gated recurrent unit model to learn the structure of coronary arteries. The method demonstrates that long short-term memory can learn the tree structures of coronary arteries. Inspired by U-Net, Chen et al. [21] proposed an architecture having multiple auxiliary branches; here, an uncertainty map is generated from the multiple abstract feature maps in the inference stage, and the uncertainty map is used to refine the segmentation results.
In this study, we propose a CNN-based stochastic tracking algorithm for the extraction of coronary arteries from 3 to D CCTA, which is inspired by recently introduced methods utilizing a spherical local image patch-based CNN [16] and the adaptive particle filtering method [10]. The proposed method uses tangent patches on a spherical surface as an input of the CNN model, and the robust vesselness probability squashed by a softmax function is utilized as a likelihood function in our particle filtering framework. Our transition priors consider variations in both direction changes and intensity distributions of patches. Further, we present the robust bifurcation detection model based on clustering the particles.
Our contribution is the development of a fully automatic method to track coronary arteries. The deep neural network is integrated with a particle filtering framework. The particles are re-used to identify the bifurcation site by clustering the particles for every time-step. In terms of accuracy, the method shows a higher performance compared with existing methods. The proposed method can eventually provide all the centerlines as a tree structure given CT images.

Tracking Scheme
Coronary arteries found in 3D CCTA are thin, elongated, and tree-structured objects. Our system aims at recovering the vascular centerlines with the tree structure as a chain of sphere centers X T = {x 0 , . . . , x T } that are estimated given the observation and the stationary image Y t = Y, ∀t. The recursive fashion of Maximum a Posteriori (MAP) estimation is a feasible solver for the most probable path of the coronary artery [22]. Thereafter, a Monte-Carlo approximation is varied out for the posterior distribution p(X t |Y) using the weighted population of N t discrete samples t ) are the likelihood and the transition prior, respectively. Using the principle of sequential importance resampling (SIR) at every time-step t, we estimated the posterior distribution in two steps: prediction and update. In the prediction step, the N T samples {x In the update step, the samples are weighted by Equation (1). The result would be the For the centerline estimation, we can estimate and update the state x t as,x where γ denotes the step size, and || d  indicates the potential successive samples of x The likelihood function is approximated by employing a 2D tangent patch-based convolutional neural network. The tangent patches Φ can be achieved using the direction d (i,j) t as normal and the centroid location x (i,j) t+1 . When the tangent patch is well aligned to the main direction, the cross-sectional shape of the coronary artery in the patch image will be near-circular (red). Otherwise, the cross-sectional shape will be irregular (blue). A simple CNN can learn for these small patch images, and the robust vesselness measure is possible to accomplish by the trained network.

Likelihood Approximator p(Y|x
Our likelihood function is designed to provide reliable and robust vesselness probabilities. The likelihood function is approximated by the local patch-based CNN in Figure 2a. We utilize the softmax values of the CNN to obtain the vesselness probabilities. Consider-ing a successoral location x can be achieved, where the patch's centroid and the normal are x where f θ is a deep CNN parametrized by θ. The parameter θ is highly optimized by training with 170,000 local patch images as described in detail in Section 3.1. As shown in Figure 1b, the cross-sectional shape of the coronary artery is near-circular when the normal of the tangent patch is well aligned to the main direction of the coronary artery; otherwise, the cross-sectional shapes of coronary arteries appear irregular . The original purpose of the CNN for the training was to classify the patches into in-vessel (y = 1) and out-of-vessel (y = 0) classes. We directly used the output of the softmax function in Equation (3) for the likelihood term and in Equation (1) for our tracking scheme.

Transition Prior p(x
t ) Our transition prior is assumed to be the first-order Markovian model in Equation (4). We mainly have two priors with independent variables considering direction variations and the similarity of the adjacent patch-images for vascular dynamics.
where KL(P||Q) = ∑ x∈X P(x)log( P(x) Q(x) ) and M = 1 2 (P + Q). Distributions P and Q in Equation (5) are the normalized image histograms Π  1] values for a similarity distance, with values near zero indicating a similarity between distributions and positive values indicating divergence in distribution. Thereafter, we mapped the distance values to a weighting function [23]: w Φ in Equation (6) is a polynomial function to approximate the probability given the distribution similarity as, where λ(= 6) is a scale factor for the probability calibration. Overall, our transition prior (7) prefers that the direction and intensity distribution do not change significantly. As prior information, the tracker prefers to change the directions smoothly so that the image distribution between adjacent time-steps do not change significantly. A directional prior distribution learned from ground truth (a) can be directly used in our method, and the local patches are simply converted to an intensity distribution form (b,c) for distance measure using Jensen-Shannon divergence.

Majority-Minority (M&m) System for Bifurcation Detection
In the case of vessels with a single structure, the posterior distribution is clearly higher around the center of the vessel; an example can be seen in Figure 4c. However, at the branching point, higher values are mapped in two places, as seen in Figure 4d. Thus, it is not easy to determine the direction of the main vessel at the branching site. For every step t, the proposed method utilizes only the important samples Ω t ⊂ S t , leading to a geometrical splitting of the important samples into by density-based spatial clustering of applications with noise (DBSCAN) [24] visualized in Figure 4e. The weighted average µ t,k = Σ Let v t,k = µ t,k −x t be the candidate direction. The K directions v t,k are used to determine whether the state is on a branching site or not by computing the angle θ = cos −1 v t,k=1 , v t,k=2 . If the θ is higher than a specific angle α, then the current state is on a branching site. The θ responses along the trajectory during tracking are plotted in Figure 5. There are some high peaks when the tracker passes branching sites, compared to a weak signal and small variations in other places. At such branching sites, the tracker continues in the main vessel direction depending on the size of clusters. The tracker recognizes the direction of the main vessel using the bigger cluster Ω t,M while storing the direction using the smaller cluster Ω t,m in the stack for the branch vessels, as shown in Figure 4a. Ω t,m are used for the new seed points for branch vessels. The process for tracking multiple vessels as a tree structure is fully automatically performed.

Stopping Criterion
We introduced a likelihood measurement p(Y|Φ (i) t ) in Equation (3), which purely indicates a probability of vesselness. We assume that the summation of the likelihoods measured from all particles at the coronary distals converges to zero, as the tracking goes to more distal parts. We used this assumption as a stopping criterion τ t = ∑ N t i=0 p(Y|Φ The algorithm is forced to track the coronary artery from its ostium to time step (t = 130) to check how τ t changes on every time-step t. As shown in Figure 6, clearly different trends are observed from where a coronary exists to where a coronary does not exist. We terminate the tracking when τ t < β.

Experiment and Result
The particle filtering framework with CNN (Deep-PF) was implemented in Python using TensorFlow library. Experiments were performed using an NVIDIA Titan Xp GPU. Our framework extracts a single vessel at an average of 7.6 s. Our method was evaluated and compared with the three commercial workstations (Xelis Cardiac 3D, by INFINITT [25], and Vitrea, by vital image [26], QAngioCT, by Medis [27]) and the recently introduced stochastic methods (AS [28], AAPF [10]) on eight CT images with thirty two coronary vessels provided in MICCAI 2008 Coronary Artery Tracking Challenge (CAT08) dataset [29]. Note that it was difficult to compare AAPF directly, as it was tested on 50 private cases. For each dataset that was tested, the network parameter and prior distribution were learned on the remaining seven cases using Leave-One-Out rule.

Training Patch-Based CNN
To train the patch-based CNN introduced in Section 2.2, we collected 170,000 local patches from the centerline ground truth (GT) [29]. From each image, four types of vessels (RCA, LCA, LCX, and one branch) were annotated by two medical experts in GT. A single vessel consisted of centerline locations and radiuses for time-step t as {x t , y t , z t , r t } t=0:L .
An even number of local patches from the proximal (t ≈ 0) to distal parts (t ≈ L) are sampled for all vessel scales as shown in Figure 7a. At a specific time-step t, the local patch images are randomly sampled considering the system noise φ ∼ U[0, 2π), and θ ∼ N(0, σ 2 θ ) for the sphere parameters. We labeled y = 1 for the patches whose centroids lie inside the vessels of radius r t ; otherwise, we labeled them as y = 0. Then, 85,000 patch images were evenly sampled for each class for the class balance. In total, 170,000 local patch images were sampled for training data by varying vessel types, radius, and patients. For example, LAD, LCX, RCA, a branch can be one of the vessel types, and the radius scale can vary from 2.5 mm to 0.8 mm.
A simple CNN architecture was employed for learning the local patch images, and the architecture required a 32 × 32 of input image, three convolution layers with 32, 64 and 128 channels, and three fully connected layers with 512, 256 and 2 neurons for binary classification. ReLU activation function was used for all the layers.
We trained this simple CNN with 170,000 local patches, and used the softmax values as uncertainties for the likelihood function in Equation

Initialization and Parameters
The method is automatically initialized using the seed points (coronary ostia) as x 0 and x 1 , and an initial main direction as d 0 = x 1 − x 0 by the identification method [30]. We fixed the other parameters as N T = 200, λ = 6, α = 50 • , β = 5 for the experiment.

Evaluation on a CCTA Database
The robustness and accuracy were evaluated based on the criteria and the public dataset introduced in [29]. The proposed method was applied on eight patients, 32 vessels for the quantitative evaluation, and the dataset is described in Table 1  For the evaluation metrics [29], a point of the GT centerline is marked as true positive reference (TPR) if the distance to at least one of the points connected on the centerline result by a method to be evaluated is less than the radius, otherwise, a point of the centerline GT is marked as false negative (FN). Then, a point of the centerline result by a method is marked as true positive method (TPM) if there is at least one point on the GT centerline at a distance less than the radius defined at the GT point; otherwise, the point of the centerline result is marked as false positive (FP). Then, the evaluation metrics are defined as below, • OV: Total overlap, ||TPM||+||TPR|| ||TPM||+||TPR||+||FN||+||FP|| . • OT: OV of the extracted centerline with the clinically relevant part of the vessel (radius ≥ 1.5 mm), which indicates how well the method is able to track the section of the vessel that is assumed to be clinically relevant. Vessel segments with a diameter of 1.5 mm or larger are assumed to be clinically relevant [31,32]. • AI: The average inside accuracy metric (AI) measures the average distance between the reference, A(x) = 1/nΣ(d(p(x), p i )) 2 , and extracted centerline for automatically extracted points that are within the radius of the reference centerline.

Evaluation and Results
In this study, we proposed a deep CNN with particle filtering method (Deep-PF) for extraction of coronary arteries from CT images. Table 1 shows the average overlap and accuracy results for each of the eight datasets. In terms of overlap, Deep-PF obtained an average OV of 92% and an average OT of 93%. In terms of accuracy, Deep-PF obtained AI of 0.36 mm, which is similar to the typical width of a voxel, but smaller than the spacing between slices in the dataset. The extracted coronary tree by Deep-PF and the GT centerlines are co-visualized in Figure 8, and some examples from the results are visualized in Figure 9. Three workstations were compared with Deep-PF in Table 2. In the case of Vitrea [26], the centerline is not provided directly; however, it provides the segmentation region of the coronary artery without any user-interaction. The centerlines were manually obtained based on the automatically extracted segmentation regions. Therefore, the OV and OT corresponding to the length of the detected region are comparable. In the case of xelis [25], the centerlines were extracted based on the semi-automatic algorithm, and we gave multiple seed points. In the case of QAngioCT [27], the centerlines were extracted without any interaction. It was possible to make a comparison by referring to the method AS (active search) [28], which employed the experiment with the same measures and dataset. In the comparison presented in Table 2, the Deep-PF showed the best performance in OV and OT, which is a measure of the extent to which the coronary artery is tracked. In terms of accuracy, all methods have small values (0.23 mm-0.36 mm), which is smaller than the typical width of a voxel or similar to one in the dataset, and the AAPF method showed the highest accuracy (AI). However, the accuracy of Deep-PF can be improved by centerline refinement through post-processing. Overall, Deep-PF shows a higher performance of coronary artery extraction as an approach combining Deep CNN with particle filtering.

Discussion
In this paper, we have presented a method for a robust centerline extraction method using CNN and particle filtering framework. Tangent patches reflect the features of the cross-sectional shape of coronary arteries that are circular in appearance. CNN can learn these features and provide a robust vesselness probability. Our particle filtering framework is a novel design that uses CNN as a likelihood function. Furthermore, we have presented a robust bifurcation detection model by clustering the particles. The proposed method can automatically extract all the centerlines as a tree structure.
AAPF [10] is based on a highly optimized coronary artery tracking approach by improving the original particle filtering framework. However, AAPF still uses imagegradient-based hand-crafted features for the likelihood function. On the other hand, the proposed method uses CNN to measure vesselness probability with a more sophisticated and large number of features. However, since both methods use local information sequentially, it is very challenging near the distal part of vessels. The proposed method showed slightly higher performance by minimizing the problem of early stopping or over-tracking by reducing uncertainty at the distal part of the vessels.
In recent studies, Wolterink et al. [16] proposed a 3D patch-based CNN that independently classifies the local orientation of a coronary artery, and the method shows the accurate performance. On the other hand, the proposed method is designed to minimize the computation cost to evaluate each particle using 2D patches since particle filtering is a population-based approach. From the perspective of observing image information. In fact, the quantity of the contextual information from the multiple 2D tangent patches is not small compared to a single 3D boxed patch. The use of a 3D boxed patch has the advantage to observe the structural contextual information around the tracker, while the use of multiple 2D tangent planes allows the tracker to observe the contextual information around a wider area. However, there is a clear difference in the tracking approach between the 3D patch-based CNN [16] and the proposed method. In the case of 3D patch-based CNN, a local direction with the highest probability is directly chosen by a single 3D patch for each step, and the direction is not dependent on the previous direction. Whereas the proposed method estimates an optimal direction with multiple particles and the particle filtering framework enables the first-order Markovian property to consider the previous states, which can prevent leakages to other organs or non-coronary vessels.
A limitation of the proposed method may arise when the coronary arteries have long missing regions due to severe complex coronary lesions. The tracker may terminate earlier, which is critical for achieving an accuracy score. This is actually a common limitation that may occur in other tracking-based methods. Even though the proposed method shows a higher performance compared with other tracking-based methods, it has the same limitations. Furthermore, the prior functions in our method make the trajectories smoother, which leads the tracker to be generally more robust. In spite of this advantage, the tracker may leak to other organs if there are very sharply curved vessels. Furthermore, detecting the terminal point is a common challenge of all tracking-based methods because these methods eventually find the local maximal path. To improve these concerns, a graph-based global approach may be an alternative, but there is a limit to its practical use because the computation cost is too high. An eclectic alternative may be policy-based reinforcement learning methods that approximate a global solution.
Deep reinforcement learning (DRL) is one feasible solver for the sequential decision process. DRL can ideally learn and search near-global paths quickly, which is demonstrated from the literature [33][34][35][36]; however, it is still not simple to learn every scenario of the coronary arteries, thus its accuracy is still incomparable with the state-of-the-art methods [18]. The learned policy by DRL, nevertheless, may be helpful for tracking-based methods by hybridizing local and global perspectives.

Conclusions
In this study, we proposed a deep learning-based tracking method for the extraction of coronary arteries from CT images. The proposed method shows best performances as 0.92, 0.93 and 0.36 for OV, OT, AI, respectively. However, this approach still offers potential for improvement. We are currently improving our method to combine with the policy-based guidance from deep reinforcement learning, and planning to test the method with more CT images to validate its robustness. Institutional Review Board Statement: No human or animal studies were carried out by the authors for this article. The institutional review board approved this study and waived the requirement for informed consent due to its retrospective design.