Bayesian Estimation of Geometric Morphometric Landmarks for Simultaneous Localization of Multiple Anatomies in Cardiac CT Images

We propose a robust method to simultaneously localize multiple objects in cardiac computed tomography angiography (CTA) images. The relative prior distributions of the multiple objects in the three-dimensional (3D) space can be obtained through integrating the geometric morphological relationship of each target object to some reference objects. In cardiac CTA images, the cross-sections of ascending and descending aorta can play the role of the reference objects. We employed the maximum a posteriori (MAP) estimator that utilizes anatomic prior knowledge to address this problem of localizing multiple objects. We propose a new feature for each pixel using the relative distances, which can define any objects that have unclear boundaries. Our experimental results targeting four pulmonary veins (PVs) and the left atrial appendage (LAA) in cardiac CTA images demonstrate the robustness of the proposed method. The method could also be extended to localize other multiple objects in different applications.


Introduction
As a type of arrhythmia, atrial fibrillation (AF) is a powerful risk factor for stroke, independently increasing risk fivefold in all age ranges. Radio frequency (RF) energy, a procedure that is used in many clinical methods and studies [1][2][3] to treat arrhythmia, is applied around the pulmonary veins (PVs) using a catheter to block sources of ectopic foci. This minimally invasive arrhythmia procedure is performed by matching the anatomical landmarks manually annotated in the pre-scanned three-dimensional (3D) computed tomography angiography (CTA) image with the rough points obtained in the real-time electrophysiology (EP) catheterization and form the atrial shape [4]. However, since it is time-consuming to designate the 3D landmark manually, an automatic method for localization of the target objects is needed to reduce the time required and improve the accuracy of the procedure.
The left atrial appendage (LAA) is known as one of the major locations of cardiac thrombus formation. LAA occlusion devices are used to prevent stroke in AF patients [5], and accurate measurement of the LAA diameter is important for determining the appropriate device sizes. However, as LAA can have a variety of sizes and shapes, it is challenging to describe its anatomic structure. Localization of the LAA is a prerequisite for automatically finding a longitudinal view of the LAA, where the operator can define the size of the device. Furthermore, in fluid simulations based on the left atrium for predicting patient-specific blood patterns or planning LAA intervention, precise manual annotation of in-and out-lets such as four PVs and LAA is essential [6].

Materials and Methods
The proposed method simultaneously estimates multiple objects on the basis of anatomic and geometric features. Our target objects are four PVs and LAA. As these have variable shapes and locations, it is difficult to automatically localize the target objects if each object is considered separately. However, by considering geometric information from the aorta as a robust reference, robust detection of the PVs and LAA is possible. The overall workflow was divided into three parts, as shown in Figure 2. CTA images were observed to have a large variation in intensity distribution. Hence, adaptive image parameters needed to be estimated before the next processes Figure 2a. The image parameters were utilized for the two localization processes both for reference and target objects Figure 2b,c.
We can obtain the geometric prior distributions of the five targets by estimating the AA and DA together to derive the geometric information in terms of angle and distance. We can estimate AA, DA, PVs, and LAA automatically by using the corresponding geometric information as the prior distribution.
After pre-processing, we assume the set of where m is the number of target objects, and n is the number of reference objects. Note that C i is either a voxel or a discrete region depending on the pre-processing method. Our goal is to find m + n objects among N candidates. Here we assume that m target objects are conditionally independent of the reference objects. Then, for a given image I, we have the following Bayesian formula: P(C 1 , . . . , C m target objects , C m+1 , . . . , C m+n reference objects | I) ∝ P(I | C 1 , . . . , C m ) · P(I | C m+1 , . . . , C m+n ) · P(C 1 , . . . , C m , C m+1 , . . . , C m+n ), where P(I) is treated as a constant since I is given data. If we set the probability of the reference objects as P mn := P(C m+1 , . . . , C m+n ), then we have P(C 1 , . . . , C m , C m+1 , . . . , C m+n ) = P(C 1 , . . . , C m | C m+1 , . . . , C m+n ) · P mn .
Putting Equation (1) into Equation (1), we now have The probabilities of both target and reference objects are simultaneously modeled by Equation (3). However, we separately consider target-and reference-associated terms for reducing computational complexity. Thus, we first identify the reference objects by arg max C m+1 ,··· ,C m+n P(I | C m+1 , · · · , C m+n ) · P mn = arg max C m+1 ,··· ,C m+n L(I | C m+1 , . . . , C m+n ) + L(C m+1 , . . . , C m+n ), (2) where L(·) is the log-likelihood, log P(·). After the reference objects C * m+1 , · · · , C * m+n are detected, the probability associated with the reference objects, P(I | C * m+1 , · · · , C * m+n ) · P * mn is assumed to be a uniform distribution. Thus, the multiple target objects are finally estimated by the equation In Sections 2.2 and 2.3, we analyze each component in detail.

Estimation of Image-Specific Parameters
The PVs, LAAs, AA, and DA all have the common feature that they belong to contrastenhanced regions. However, since contrast density may vary from case to case, as shown in Figure 3, we propose a method to adaptively determine thresholds by automatically separating foreground and background values using the unsupervised manner from the input images.
First, we analyze the histogram of input images after thresholding by 0 HU. Then, R + is divided into the foreground set (F ) and the background set (B) by applying k-means clustering (k = 2) with the within-cluster sum-of-squares (WCSS) optimization function.
Let µ B and σ B be the average and standard deviation, respectively, of the set {I(x, y) | I(x, y) ∈ B}. Likewise, let µ F and σ F be the average and standard deviation, respectively, of the set {I(x, y) | I(x, y) ∈ F}. Then, the input image is thresholded to generate candidates for the reference and targets, and the resulting mask is generated as follows: where The estimated values will also be used for the target value of the likelihood function in Section 2.3.

Localization of Reference Objects
The ascending and descending aorta (AA and DA) have the relatively distinct features of large, thick, stem-like shapes and the appearance of two large circles in the axial plane. For this reason, AA and DA are selected as reference objects (n = 2). The locations of AA and DA can be robustly identified by the method in [28], which exploits the ratios of the eigenvalues, as their shape is relatively close to circular in the axial view. To find the eigenvalues, first, the resulting mask in Equation (4) is labeled as multiple discrete regions using connected-component analysis (CCA): Then, we consider the covariance matrix Σ C i for each C i : where (x k , y k ) ∈ C i . Let λ 1,i ≤ λ 2,i be the eigenvalues of Σ C i and let r i := λ 1,i /λ 2,i . Then, the ideal value of r i ≈ 1 if the object C i is near circular. The joint log-likelihood for the two most isotropic components is formulated as where σ 2 r is the variance of the eigenvalue ratio for AA and DA. The log-likelihood measures the deviation of the two components from the isotropy for both AA and DA.
The prior term with two references is denoted as L(C m+1 , C m+2 ). With the discrete candidates in Equation (5), the geometric features of the aorta in the axial images are described in terms of the distance δ and the angle θ between the ascending and descending aorta ( Figure 4): (8) Figure 4. A geometric model for AA and DA (C m+1 , C m+2 ) and a target (C 1 ). (a) The distance | r| and angle between r andû x are sampled to obtain a prior distribution for reference, where r is the vector between the two references. m is the mean point between the two references, and t 1 = C 1 − m is the direction of the target C 1 , which would be used to specify a single solution. δ is the distance between C 1 and C m+k . (b) The simplified geometry is overlaid on a CTA image with an example of C 1 = LAA. More details about the intersection of two circles whose centroids are AA and DA are described in Section 3.3. AA, ascending aorta; DA, descending aorta; LAA, left atrial appendage.
This prior term L(C m+1 , C m+2 ) is interpreted as the two learned prior distributions with regard to the angles and distances. Let r C m+1 ,C m+2 be the vector from the centroid of C m+1 to the centroid of C m+2 , and let δ C m+1 ,C m+2 be its norm. The variable θ C m+1 ,C m+2 is the angle between the x-axis and r C m+1 ,C m+2 . Let θ and δ be the mean angle and distance, where the distance and angle variations can be evaluated as the ratios respectively. The two independent variables are converted into the prior probabilities as The ratios θ are a perfect match to our model. Recall the reference estimator in Equation (2); two reference objects are found by maximizing the following equation: The object pair C * m+1 and C * m+2 , which maximize Equation (10), is assumed to be AA and DA. We now have two robust cylinder-like objects that cover the entire range of the CTA volume as patient-specific references, as shown in Figure 5.  3D (a,b). Each cylinder-like object is approximated to a vector to cover the entire range of volume (c). AA, ascending aorta; DA, descending aorta.

Localization of Multiple Target Objects
Although the deviation of the geometric relationships between the clinical positions from the average is small, the variation in the size of the whole heart can be very large, for example between adults and children. In addition, the angle at which the heart is placed may be slightly different in different cases. To minimize this variance, we used a prior distribution created by sampling the distance ratios between each clinically named location and the references, similarly to the flexible prior distribution illustrated. This can fit most human heart shapes if the reference positions are correctly posed (such as by using the robust reference identification method introduced in Section 2.2).
Unlike the reference objects, the boundaries of the target objects are generally not clear. Hence, it may not be possible to have discrete components that were used in the technique for identifying the reference objects. Instead, we propose a new feature for each pixel using the relative distances.
Let (x m+1 , y m+1 ) and (x m+2 , y m+2 ) be the coordinates of the centroids of C m+1 and C m+2 , respectively. We define three distances by As we have the prior distributions of the relative distances between all target objects, we denote µ i,j and σ 2 i,j to be the average distance between the i-th object and the j-th object, and its variance, respectively.
Then, we propose a target similarity measure for each target C k as follows: The target similarity measure is computed for every 2D slice. Our estimated parameter for reference µ m+1,m+2 is 81.6 mm. Table 1 shows our estimated parameter values from the distance measurement samples by Equation (11), which can be referred for Equation (12).
Given a robust reference, a relative distance distribution is determined in order to localize the target object, as shown in Figure 6c. Additionally, the elements of f k (x, y) are separately analyzed, and Figure 7 shows why distance distributions from both references are needed to localize an object. However, since the intersection of each distribution forming the 2D ring shape appears in two places, it still does not uniquely specify a single position. Therefore, we combine it with the directional information t shown in Figure 4a to obtain the final distribution. The behavior of f k (x, y) in Equation (12) is also analyzed to show that the target object can be localized without depending on the rotation and scale by using the geometric prior distribution, which changes dynamically according to changes in the position of the reference, as shown in Figure 8.
In order to simultaneously localize multiple targets in 3D, we compute f k for each 2D slice, and we finally obtain the following 3D mask by thresholding the resultant measure: where the element Ω i is a grouped sample set specified by Equation (13), which is visualized, for example, in Figure 9c. A common fact that the target objects exist in the contrast region is considered in the likelihood function for the targets, which is formulated as where I k is the mean intensity of all voxels in C k . The image-specific parameters, µ F and σ F , for Equation (14), can be calculated by the adaptive method given in Section 2.1. The higher probabilities are mapped on the contrast regions by computing the likelihood as shown in Figure 6b.  Visualization of geometric prior distributions: the distance-based PDFs based on two reference locations, C m+1 (a) and C m+2 (b). While each ring-shaped PDF is not able to identify a unique location, the joint PDF (c) can simply designate a target location; PDF, probability density function.
Then, the positions of the target objects with unclear boundaries can be specified by the posterior probability in Figure 6d.
Subsequently, we formulate the probability of the target objects as follows: where δ i,j is the distance between Ω i and Ω j , and µ i,j and σ 2 i,j are its average and variance, respectively, for Equation (15). The geometry of the relationships among the components is similar to a complete graph. The goal of our method is to find a geometry that fits our prior probabilities.

Experimental Results
We have performed a number of experiments to demonstrate the robustness of our system and the quantitative results for the five target objects.

Data Set
The proposed method was applied on a total of 60 CTA images. Besides thirtytwo public datasets, Rotterdam Challenge images [30,31], which are well focused on hearts, were acquired on the CT scanners from various vendors. We additionally selected twenty-eight cases with variations in the field of views, contrast injection timings, artifacts, and morphology from Severance Hospital, Republic of Korea. Each CTA image was reconstructed to 512 × 512 × [272, 433] voxels with an isotropic voxel size between (0.26 mm × 0.26 mm × 0.26 mm) and (0.48 mm × 0.48 mm × 0.48 mm ). Ground truth sets for 60 CTA cases were manually labeled by a medical expert. We have each labeled target region (four PVs and LAA).

Parameters
We fixed parameters for Equation (9) as δ = 81.6 mm, θ = 66.9 • , σ δ = 0.132, and σ θ = 0.011. The parameters were referred from the previous investigation on aortic localization [28], which were measured from CTA images from eight patients in the Rotterdam Challenge training set [30,31]. σ δ and σ θ are the parameters normalized by δ and θ, which play the roles of the internal weighting parameters that can be adjusted so that the larger the difference from the mean values (δ, θ), the greater the penalty. Despite the limited amount of data, the estimated average values can be converged quite closely to the actual values. For example, the average arch width (notated as δ) has been reported as 82 mm, which is a value obtained from 234 patient cases in a clinical study [32].
The parameters of Equation (12) were set to the values of a geometric Gaussian distribution in Table 1. The parameters are newly investigated in this paper with the same dataset from eight patient's CTA images, using the Rotterdam Challenge training dataset [30,31].

Analysis of Geometric Prior Distribution
Given the robust reference objects, a geometric prior distribution was determined to localize a target object. Figure 7 shows why two reference objects are needed to localize a target position. The elements of f k (x, y) are separately analyzed, and each 2D ringshaped distribution is drawn in Figure 7a,b based on C m+1 and C m+2 . For the experiment, the learned parameters, which are provided in Table 1, are referred for localizing the five target objects based on AA and DA. Figure 8 shows that the geometric prior distributions were rotation-and scale-invariant. A target object can be localized regardless of rotation and scale using the geometric prior distribution that flexibly changes according to the change of the positions of the reference objects, as shown in Figure 8a,b. However, since the joint distributions L(d m+1,k ) + L(d m+2,k ) have two local peaks, it still can not specify a unique target position. Hence, we used directional information → t described in Figure 8c together for a unique position. where X and Y can be the outputs by one of methods and ground truth, respectively. In terms of a detection rate, TPR is computed to compare the robustness, as shown in Table 2. Even though the experiments were done without any user interaction or parameter changes, the reference objects, AA and DA, were detected 100% of the time in our dataset.  Table 2 presents the results for PVs and LAA localizations. The quantitative comparison of our method with the output given by the CT EP planning function of a workstation (Vitrea, by Vital Images [33]) is shown. The success rate was measured with a prepared ground truth set corresponding to 60 images in the dataset we tested.

Quantitative Comparison with Other Methods
Our proposed method required 7.34 s total computation time using an i7 3.50-GHz 32-GB PC. Most of the time was consumed in the processing the object preparation. Vitrea required about 10-20 s computation time, depending on CTA images, using dual Xeon E5-2690 2.90 GHz 128-GB HP workstation. Vitrea does not provide a description of the detailed methodology for the detection of LA and its landmarks, but it seems to always conduct LA segmentation first, and then the landmark detection process is done based on the LA segment. On the other hand, the proposed method did not need a segmentation process for LA, rather it could directly detect LA landmarks, which seems to make a difference in computation time. Figure 9 shows some examples, some target candidates, and their estimated geometries, where maximum probability is visualized with maximum intensity projection (MIP), in the axial view in Figure 9a and a magnified view in the yellow box in Figure 9b. It shows the candidates on the target regions, RIPV and RSPV. All the candidates Ω in Equation (13) are visualized in Figure 9c from a view slightly rotated to the blue arrow in Figure 9a. Figure 9d shows a magnified view of the yellow box in Figure 9c. In this example, the five objects connected to each other with red edges are the estimated 3D target objects among seven candidate objects by Equation (15), and C 1 , C 2 , C 3 , C 4 , and C 5 are RSPV, RIPV, LAA, LSPV, and LIPV, respectively. Figure 9e demonstrates that the results were detected with maximum probabilities from several cases. We find such geometries form a complete graph, and its vertices are our target objects.

Discussion and Conclusions
In this paper, we proposed a Bayesian framework to localize five target objects (four PVs and LAA) in LA based on the two reference objects (AA and DA) by designing a Bayesian formulation that utilizes geometric prior distributions. The proposed method is based on the important prior knowledge that there are geometrical relationships among the cardiac anatomies and that follow Gaussian distributions. Multiple target objects whose structures are complex can be robustly localized. For successful localization of the multiple objects, first, we propose an adaptive model for estimating patient-specific image parameters using an unsupervised manner, which can be generally utilized for the preprocessing of other methods. We also designed a Bayesian formulation comprising the relative geometric prior distribution to solve the LA landmark detection problem. We investigated and provided the geometric prior distributions between five anatomies (LSPV, RSPV, LIPV, LSPV, and LAA) and AA and DA based on distance measures. As a result, the average detection rates were 1.0, 0.97, 0.91, 1.0, and 0.94 for localizing LSPV, LIPV, LAA, RIPV, and RSPV, respectively. The proposed method required a total of 7.34 s computation time.
The geometric prior distributions can be easily integrated into other methods. In addition, the proposed method can be easily applied to other applications since it can specify all positions in the cardiac CT image based on two robust reference objects. For example, policy-based localization methods [23,34,35] using deep reinforcement learning, which are the new approaches in the literature, are very fast compared to other approaches that require an extensive search. However, in order to track a target object from a random location in CT images, the direction toward the target location is sequentially searched using only local boxed information. For this reason, sometimes it fails to converge to the target location. Such a problem may occur because the agent cannot use the context of the global heart structure. Initializing the start position of the agent using the proposed method can help to minimize the failure case.
A limitation of the proposed method may arise when the reference or target anatomic parts are abnormally positioned. Cardiac malposition is very rarely reported, but the position of the heart can be reversed, or parts may be missing as in cases of situs inversus, asplenia, or polysplenia. However, we will include case-specific distributions in our database, which is supposed to solve these problems.
In future work, we plan to approximate the likelihood function using a convolutional neural network to obtain a more robust posterior probability in order to improve the robustness of the proposed method, and the geometric prior distribution will be combined with a policy-based method to minimize the failure rate of target object localization in further research. In addition, we aim to apply the proposed method to automatically analyze the morphology of LAA, select device sizes for the LAA procedure, and automate the detection of the landing zone. For LA fluid simulations, we also plan to generate simulation-ready LA geometries by automatically localizing the inlets and outlets of blood flow.