Cascaded Regression-Based Segmentation of Cardiac CT under Probabilistic Correspondences

The creation of 3D models for cardiac mapping systems is time-consuming, and the models suffer from issues with repeatability among operators. The present study aimed to construct a double-shaped model composed of the left ventricle and left atrium. We developed cascaded-regression-based segmentation software with probabilistic point and appearance correspondence. Group-wise registration of point sets constructs the point correspondence from probabilistic matches, and the proposed method also calculates appearance correspondence from these probabilistic matches. The final point correspondence of group-wise registration is constructed independently for the three surfaces of the double-shaped model. Stochastic appearance selection in cascaded regression makes model construction efficient in terms of memory usage and computation time. Two correspondence construction methods for active appearance models were compared on the paired segmentation of the left atrium (LA) and left ventricle (LV). The proposed method segmented 35 cardiac CTs in six-fold cross-validation, and the symmetric surface distance (SSD), Hausdorff distance (HD), and Dice coefficient (DC) were used for evaluation. The proposed method produced an LV SSD of 1.88 ± 0.37 mm, an LA SSD of 2.25 ± 0.51 mm*, and a left heart (LH) SSD of 2.06 ± 0.34 mm*. Additionally, the DC was 80.45% ± 4.27%***, where * p < 0.05, ** p < 0.01, and *** p < 0.001. All p values derive from paired t-tests comparing iterative closest point registration with the proposed method. In conclusion, the authors developed a cascaded regression framework for 3D cardiac CT segmentation.


Introduction
Compared with imaging modalities such as ultrasound and magnetic resonance imaging, cardiac computed tomography (CT) can provide more comprehensive anatomic information on the heart chambers, large vessels, and coronary arteries [1]. Pulmonary veins have been isolated using various catheter techniques in patients with atrial fibrillation [2,3]. Many of the approaches incorporate cardiac mapping systems because the capabilities of these systems are useful in catheter ablation procedures [4]. The registration of three-dimensional anatomic models with an interventional system could support diagnosis and help navigate mapping and ablation catheters around complex structures such as the left atrium (LA). However, the creation of 3D models for such cardiac mapping systems is time-consuming, and the models suffer from issues with repeatability among operators. The present study aimed to construct a double-shaped model composed of the left ventricle (LV) and LA.
A convolutional neural network (CNN)-based method requires training a large number of parameters [5]. In contrast, as a learning-based system, the active appearance model (AAM) can be adapted to heart segmentation. Mitchell et al. proposed a three-dimensional AAM for cardiac images [6]. This system produced a 3D shape model by using Procrustes analysis and 2D contours sampled slice by slice. However, 2D contour modeling restricts the topology of the input shape. AAM learns the structure of the target organ [6][7][8], and this training method has been used together with deep learning for prostate segmentation [9]. In AAM, the shape is expressed as a linear combination of shape bases learned via PCA, while the appearance is modeled as the volume enclosed by the shape mesh [10]. Therefore, the AAM-based method matches statistical models of appearance to images. Cootes et al. [10] constructed an iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors. Compared with Cootes et al.'s method, which uses a fixed perturbation distribution, the cascaded regression method can produce better accuracy by using a large cascade depth [10,11]. Cootes et al.'s method instead performs several iterations with a single regression result, changing only the step size.
Cascaded pose regression refines a loosely specified initial guess, where a different regressor performs each refinement. Each regressor executes simple image measurements that depend on the output of the previous regressors [12]. This regressor framework has been developed in several directions [11,[13][14][15]. The present paper used the supervised descent method (SDM). From SDM, we adopted the learned descent directions and added stochastic modeling of the appearance data. In the present study, we extended this 2D deformable facial-model framework into a 3D framework for 3D cardiac segmentation. Since the variation of 3D heart shape is greater than that of the facial model, this extension did not achieve the best segmentation compared with other state-of-the-art methods [7,16].
Boundary search along the normal direction improves the segmentation accuracy considerably [7,16]. Zheng et al.'s method [7] uses marginal space learning and projection onto the subspace constructed by PCA, in contrast to the cascaded regression used in the method proposed in the present study. Their final segmentation step uses boundary detectors to move each control point along the normal direction to the optimal position, where the score from the boundary detector is highest. This method produced point-to-mesh errors of 1.32 mm for the LA and 1.17 mm for the LV on 323 volumes from 137 patients. The high accuracy comes from boundary-search-based non-rigid deformation. Ecabert et al.'s method deforms surface triangles sequentially with a similarity transformation, an affine transformation, and deformable adaptation. Their whole-heart segmentation produced a surface-to-surface error of 0.82 mm on 28 computed tomography images. This method also uses boundary search along the normal direction. Compared with these methods, the proposed method uses an active appearance model and does not adopt boundary-search-based final processing, although skipping this step forgoes a considerable accuracy improvement. Since high-gradient boundary information is needed to perform the boundary normal search, the proposed method instead compares statistical modeling choices within the active appearance framework.
Zheng et al. proposed a part-based left atrium model, which includes the chamber, the appendage, four major pulmonary veins and right-middle pulmonary veins [17]. Depa et al. proposed the automatic segmentation of the LA by using weighted voting label fusion and a variant of the demons registration algorithm [8].
Recently, many papers have been published in the heart segmentation field. In MRI, a fully convolutional neural network (FCN) was used for segmenting the left and right ventricles [18]. Payer et al. proposed fully automatic whole-heart segmentation based on a multi-label CNN with volumetric kernels. After localizing the center of the bounding box around all heart structures, the fine detail of the whole heart structure within the bounding box is segmented [19,20]. De Vos et al. developed a localization method to extract the bounding box around the LV using a combination of three CNNs [21]. Wang and Smedby proposed an automatic whole-heart segmentation framework combining a CNN with statistical shape priors. The additional shape information provides explicit 3D shape knowledge to the CNN [22,23]. Like Wang and Smedby's method, AAM can also be combined with a CNN to improve system accuracy and to better understand the segmentation system.
Group-wise registration of point sets is a fundamental step in creating a statistical shape model. Probabilistic correspondence construction methods estimate correspondence from a probabilistic viewpoint [24][25][26][27]. The correspondence for each point on one shape is formulated as a weighted combination of all points on the other shape, where the weights/probabilities are derived from a probabilistic function of the pairwise distances. Gooya et al. extended pairwise matching to group-wise correspondence construction [26]. They proposed a sparse model selection method and similarity-parameter update equations for the Gaussian Mixture Model (GMM)-based method. In the proposed method, only simple probabilistic correspondence, without sparse model selection, is adopted. In Gooya et al.'s method, only point correspondences were constructed, without appearance correspondence, because their research was aimed solely at constructing statistical shape models and was not extended to a segmentation method.
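As a concrete illustration, the soft correspondence described above can be sketched in a few lines of numpy. The Gaussian weighting of pairwise distances is the idea from the text; the function name and the single shared σ are illustrative assumptions:

```python
import numpy as np

def soft_correspondence(points, model, sigma=1.0):
    """Probabilistic correspondence: w[i, j] is the weight that point x_i of
    one shape corresponds to point m_j of the other, derived from a Gaussian
    of the pairwise distance and normalized over j (illustrative sketch)."""
    d2 = ((points[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)
```

Each row of `w` sums to one, so the correspondence for a point is a weighted combination of all points on the other shape rather than a hard nearest-neighbor match.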
Since the cascaded regression model is constructed on appearance correspondence, the exact shape of the point model must be estimated from the input reference volume. The proposed method estimates appearance correspondence probabilistically through a mapping of tetrahedron sets. The reason why appearance correspondence is needed is explained in detail in Section 2.2. Moreover, the proposed method deals with three object models. As there is close interaction between the two objects (inner and outer surfaces) of the left ventricle, we developed a new separate model-point construction method. We compare the proposed method with a simple and popular pairwise method [28] based on similarity and nonrigid transformations. In the present study, we developed cascaded regression-based segmentation software for a double-shaped heart model with point and appearance correspondence under the modified group-wise correspondence construction method.

Patients
In total, 35 cardiac CT angiography (CCTA) volumes were used for the evaluation of cardiac CT segmentation. These volumes were acquired at our institution (Cardiovascular Center, Seoul National University Bundang Hospital, Seoul, South Korea). The need for informed consent was waived because of the retrospective nature of this study. We used a 256-slice CT scanner (iCT256, Philips Healthcare/Philips Medical System, Eindhoven, The Netherlands). Volumes were downsampled by a factor of two along all three axes for efficient execution; the x and y dimensions were therefore adjusted to 256 × 256. For evaluation, six-fold cross-validation was performed by dividing the 35 cases into six groups of [6,6,6,6,6,5] cases. Five groups were selected as the training set, and the remaining group was used as the test set. Two engineers with bioengineering backgrounds delineated the LA and the inner and outer boundaries of the LV by using ITK-SNAP software [29]. One engineer had 10 years of experience, and the other had 3 years. The masks from the more experienced engineer were used as the gold standard, and the remaining masks were used for repeatability calculation.

Comparison of Two Correspondence Models
Two closed meshes were used to represent the LV and LA. The mesh of the LV includes inner and outer surfaces. The atrium model includes the four large vessels connected to the cavity of the atrium. It is necessary to establish point correspondence among a group of shapes to build a statistical shape model. Here, two correspondence construction methods are compared as representative pairwise and group-wise methods. As the pairwise method, Frangi et al.'s method uses an atlas for correspondence construction [28].
Since the group-wise method performs iterative correspondence construction over the whole set of volumes, it produces less biased results than the pairwise method. The GMM-based correspondence construction method is used to take this advantage [26]. We used the implementation of Gooya et al. without the sparsity parameter change. Therefore, the vertex number of the 3D model was given as a fixed value, and the additional distance features on the narrow band were ignored in the EM iterations. This GMM framework iteratively updates the estimates of the similarity registration parameters of the point sets to the mean model by using an expectation-maximization (EM) algorithm. We can define S = {S k } as the set of n K similarity transforms, where n K is the number of training sets and 1 ≤ k ≤ n K . In the E-step of EM, the posterior probability is updated by using the GMM model under the current parameter estimate. In the M-step, given an estimate of the posteriors, EM maximizes the lower bound to calculate a new parameter estimate. We modified this method in the correspondence construction by adding the independence of the objects, which are the LA, the inner surface of the LV, and the outer surface of the LV. This modification is explained in the following section. As in the pairwise method, the training image was transformed by using the similarity transform S.
In cascaded regression-based segmentation, a tetrahedron set of the closed atlas mesh must be provided. This tetrahedron set covers the inside volume of the closed atlas, and from it we can construct appearance data. A tetrahedron inside a closed mesh is denoted as T a ; a = 0, ..., n T − 1, where n T is the number of tetrahedrons. In the pairwise correspondence method, the reference inside volume of the closed atlas can be obtained in the atlas construction step by using the non-rigid transformation of the two transforms performed in the pairwise method. In the GMM-based method, however, this non-rigid transformation is replaced with soft correspondences, so this inside volume must be calculated. In the next section, the way to extract this information from the soft correspondences is provided for the tetrahedron set construction of the atlas.

Correspondence Point Construction
With n K the number of training sets, X = {χ k }, 1 ≤ k ≤ n K , denotes the n K observed 3-dimensional training point sets, where each point x ki specifies spatial coordinates. The EM-based correspondence construction method updates the mean model in every iteration. Let M = {m j ∈ R 3 }, 1 ≤ j ≤ n M , be the model point set, where m j denotes the pure spatial coordinates and n M denotes the vertex number of the 3D model. Each S k globally transforms the model points m j ∈ M to the space of χ k . E kij specifies the posterior probability that x ki is sampled from the Gaussian component with mean m j in each iteration. Each point set is regarded as a spatially transformed GMM sample, and the registration problem is formulated as estimating the underlying GMM from the training samples x ki . Thus each Gaussian in the mixture specifies a model point m j , which probabilistically corresponds to a training point x ki . In EM, the parameter set is defined as {S k , M, covariance}. Exact modeling can be found in Gooya et al.'s paper [26].
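A stripped-down version of the E-step/M-step alternation may help fix ideas. In this sketch the similarity transforms S k and the covariance update are omitted; the posterior computation and the weighted mean-model update are standard GMM steps, and everything else (function name, isotropic fixed variance) is an illustrative assumption:

```python
import numpy as np

def em_pointset_gmm(point_sets, model, sigma2=1.0, n_iter=10):
    """Simplified EM for GMM-based group-wise point-set registration.
    E-step: posterior E[i, j] that training point x_ki was drawn from the
    Gaussian centered at model point m_j. M-step: each m_j becomes the
    posterior-weighted mean of all training points."""
    M = model.copy()
    for _ in range(n_iter):
        num = np.zeros_like(M)
        den = np.zeros(len(M))
        posteriors = []
        for X in point_sets:
            d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
            E = np.exp(-d2 / (2.0 * sigma2))
            E /= E.sum(axis=1, keepdims=True)   # posterior over model points
            num += E.T @ X                      # accumulate weighted points
            den += E.sum(axis=0)                # accumulate weights
            posteriors.append(E)
        M = num / den[:, None]                  # M-step: update mean model
    return M, posteriors
```

The returned posteriors play the role of E kij in the text: every training point contributes to every model point, with weights that concentrate as the model converges.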
Shape models are constructed by PCA under Equation (2), and this method requires one-to-one point correspondences. As the EM-based method does not directly produce exact point correspondences, point correspondences are identified by a virtual correspondence [26]. Compared with the original modeling, the virtual correspondence modeling of the proposed method considers the independence of the objects. Before the virtual correspondences of the three objects are constructed separately, label selection among the three virtual correspondences is performed over the n M point set. Since an object label has the same value at the same spatial location across the n K virtual correspondence sets, statistics over the n K virtual correspondence sets can help to find the object labels.
Let us denote U b k as the subset of the input point set χ k that includes the points corresponding to object b. Here, objects 0, 1, and 2 are the LA, the inner surface of the LV, and the outer surface of the LV, respectively. U b k is found by searching the nearest points among the point sets of the three objects under the object label masks given in pre-processing. For any model point m j , a virtual correspondence point of U b k , denoted by m b kj , is induced by the following equation.
Here, m b kj is calculated for each object and for each point index j. Let us define a flag matrix C ∈ Z b×n M for the object label, and we initialize all elements of C to 0. Here, Z denotes the integers. This C matrix accumulates majority votes on the index of the argmax.
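Since the defining equation is not reproduced here, the following numpy sketch shows one plausible reading consistent with the surrounding text: the virtual correspondence point is the posterior-weighted average of the object-b input points, and the object label of each model point is chosen by majority voting on accumulated per-object posterior mass. Function names and the exact weighting are assumptions:

```python
import numpy as np

def virtual_correspondence(U_b, E_b):
    """Virtual correspondence points for one object b: for each model point j,
    the posterior-weighted average of the input points belonging to object b.
    U_b: (n_b, 3) input points of object b; E_b: (n_b, n_M) posteriors."""
    w = E_b / np.maximum(E_b.sum(axis=0, keepdims=True), 1e-12)
    return w.T @ U_b                                  # (n_M, 3)

def vote_object_labels(mass):
    """mass: (n_objects, n_M) accumulated posterior mass per object over all
    training sets; the label of model point j is the majority-vote argmax."""
    return np.argmax(mass, axis=0)
```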

T a Construction Method
We can find the closed iso-surface mesh from the training input mask by using the marching cubes algorithm [30]. This triangulation is based on χ k . However, 3D triangulation of the point set of the iso-surface mesh finds only the convex hull. Therefore, we kept only tetrahedrons whose centers lie on labeled voxels of the training input mask. This filtering makes the shape of the tetrahedron set similar to the training input mask and is denoted as constrained triangulation. The resulting tetrahedron set of the filtering is denoted as D k . Here, p * l is the index of any point, and c k is the total tetrahedron number of the k-th input. It is not easy to calculate T k from D k , since we cannot know the exact reference mask for T k .
T k is the tetrahedron set under the m kj set. Therefore, a tetrahedron mapping is calculated between D k and T k by using E k,p * l ,j . For one point p * l , we find the m model points of m kj with the largest effective correspondence by sorting the E k,p * l ,j values. m 4 tetrahedrons of T k are then constructed from the four model point sets in every combination; in other words, one tetrahedron of D k produces m 4 tetrahedrons of T k . After converting one tetrahedron into a volume point set by using an inside-tetrahedron check, the tetrahedron counts on a volume grid image are increased by using this volume point set. The accumulation of the tetrahedrons generated under T k is therefore performed on the volume grid image that contains the tetrahedron counts. If we want to constitute a dense point set of a target object, thresholding this volume grid image yields that dense point set. We tested m values of 1, 2, 3, and 4; m = 3 was optimal in terms of computation time and reconstruction quality. Reconstruction quality was visually evaluated by manually checking for missing structures in every case. The resulting volume accumulation is shown in Figure 1.
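The inside-tetrahedron check and the count accumulation on the volume grid can be sketched as follows. Unit voxel spacing, a brute-force voxel loop, and a grid that contains the tetrahedron are simplifying assumptions; barycentric coordinates provide the inside test:

```python
import numpy as np

def barycentric(tet, p):
    """Barycentric coordinates of point p in tetrahedron tet (4x3 vertices)."""
    T = np.column_stack([tet[1] - tet[0], tet[2] - tet[0], tet[3] - tet[0]])
    b = np.linalg.solve(T, p - tet[0])
    return np.array([1.0 - b.sum(), b[0], b[1], b[2]])

def accumulate_tetrahedron(grid, tet):
    """Increment the count of every grid voxel whose integer center lies
    inside tet (the per-tetrahedron step of the volume accumulation)."""
    lo = np.floor(tet.min(axis=0)).astype(int)
    hi = np.ceil(tet.max(axis=0)).astype(int) + 1
    for x in range(lo[0], hi[0]):
        for y in range(lo[1], hi[1]):
            for z in range(lo[2], hi[2]):
                p = np.array([x, y, z], dtype=float)
                if np.all(barycentric(tet, p) >= -1e-9):  # inside test
                    grid[x, y, z] += 1
    return grid
```

Running this for every generated tetrahedron and thresholding the resulting count image yields the dense point set of the target object described above.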
As the volume accumulation generated from this mapping contains much redundancy, we simplify the volume accumulation into tetrahedron sets by using a filtering result of the 3D Delaunay triangulation of M. This 3D Delaunay triangulation of the model points is performed with an alpha radius of 40 to remove noise tetrahedrons [31].
To extract one statistical model tetrahedron set from all simplified T k , we use statistics over every T k . In the tetrahedron statistics calculation, tetrahedrons between the LV and LA are assigned a probability of 0.5 to each of the two objects. Tetrahedrons composed of only inner-surface points of the LV are not considered in the statistics. As common reference tetrahedrons are made from the 3D Delaunay triangulation of M, statistical analysis is possible. As the input T a of cascaded regression, the atlas tetrahedron set is made by using the tetrahedrons whose accumulated probability exceeds a threshold. The probability threshold was empirically set to 14%. Although the surface of the atlas tetrahedron set is not realistic, the reconstructed T k makes realistic shapes in Figure 2. With the found T a , we make the final segmentation mask under the shape parameter set found.

Shape and Appearance Models
Because we can judge by a simple calculation whether (x, y, z) is inside T a , the point set P a ; a = 0, ..., n T − 1 is constructed on a regular grid of the 3D input. The n T groups of point sets P a , concatenated over all tetrahedrons of T a , are denoted as P T . The array P T is the full set of target points of the appearance. From this array, a random selection of appearance points is performed with a given number. The input image has no requirements on size or spacing, because appearance sampling is performed based on physical locations.
The transform between two images is constructed as a piecewise-affine transform through the correspondence of tetrahedron vertexes. Specifically, the meshes of the training images have the same size because they are produced from the correspondence construction procedure. Between the atlas mesh and an input mesh, we can therefore perform a piecewise-affine transform [32]. Appearance correspondence can be constructed using these transforms. Because we know the P T point set of the atlas, we can rapidly extract the intensity and additional data of the training target by applying the piecewise-affine transform to the P T point set.
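Per tetrahedron, the piecewise-affine transform amounts to carrying a point's barycentric coordinates from the atlas tetrahedron to the corresponding input tetrahedron, for example (a minimal sketch; looking up which tetrahedron of the mesh contains the point is omitted):

```python
import numpy as np

def piecewise_affine_map(p, src_tet, dst_tet):
    """Map point p from src_tet (4x3 atlas tetrahedron) to dst_tet (4x3 input
    tetrahedron) by keeping its barycentric coordinates fixed."""
    T = np.column_stack([src_tet[1] - src_tet[0],
                         src_tet[2] - src_tet[0],
                         src_tet[3] - src_tet[0]])
    b = np.linalg.solve(T, p - src_tet[0])          # barycentric coords b1..b3
    bary = np.array([1.0 - b.sum(), b[0], b[1], b[2]])
    return bary @ dst_tet                           # same coords in dst_tet
```

Applying this map to every point of P T transfers the atlas sampling locations into each training volume, which is how the appearance data can be extracted rapidly.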
The training input image I k is a three-dimensional volume, and the two shapes are defined with u 1 and u 2 tetrahedron vertexes. The set of all vertexes is a vector ∈ R 3(u 1 +u 2 )×1 that defines the two models of the heart. This set is obtained from the virtual correspondence points m kj of the group-wise correspondence construction procedure. Since the correspondence construction method makes similarity-free models, the shape correspondences have no variation due to similarity transforms (translation, rotation, and scaling). PCA is applied to obtain the shape model; the number of shape eigenvectors is chosen independently of the appearance information. The model is defined by the mean shape s 0 and the ith shape eigenvector s i , with n sh eigenvectors in total; it also provides the ith eigenvalue λ i . Shape eigenvectors are represented as columns of the matrix B ∈ R 3(u 1 +u 2 )×n sh . Finally, to model similarity transforms, B is appended with six additional bases [33]. An instance of the shape model is constructed as s = s 0 + Bp, where p ∈ R n sh ×1 is the vector of the shape parameters.
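The PCA shape model and its instantiation can be sketched as follows (the six similarity bases are omitted here, and shapes are flattened to 3·n_points vectors; function names are assumptions):

```python
import numpy as np

def build_shape_model(shapes, n_sh):
    """PCA shape model from similarity-free training shapes (n_K, 3*n_points):
    mean shape s0 and the first n_sh eigenvectors as columns of B."""
    s0 = shapes.mean(axis=0)
    _, _, Vt = np.linalg.svd(shapes - s0, full_matrices=False)
    B = Vt[:n_sh].T                       # (3*n_points, n_sh)
    return s0, B

def shape_instance(s0, B, p):
    """Instance of the shape model: s = s0 + B p."""
    return s0 + B @ p
```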
In order to learn the appearance model, each training image I k is warped to a reference frame by using S k to produce a similarity-free appearance. Since the voxels in a tetrahedron are found by applying the piecewise-affine transform to the P T point set, the array of voxels included in each tetrahedron can be constructed rapidly. The total number of voxels over all tetrahedrons is denoted as n v . Three features per voxel are accumulated to construct an appearance set ∈ R 3n v ×1 . The original intensity of the training image is used as the first feature, normalized with statistics in the convex envelope volume of the LV containing the cavity region of the LV [33]. The sum of the x, y, and z gradient components, computed under Gaussian smoothing with σ = 0.5 and σ = 1.0, is used as the second and third features. These three features are normalized by the z-score. Subsequently, only the mean appearance A 0 is calculated.
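A numpy-only sketch of the three-feature appearance sampling is given below. Nearest-voxel sampling, the truncated separable Gaussian kernel, and the exact normalization details are assumptions of this sketch:

```python
import numpy as np

def _gauss_smooth(vol, sigma):
    """Separable Gaussian smoothing with a truncated kernel (numpy-only)."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    out = vol.astype(float)
    for ax in range(3):
        out = np.apply_along_axis(np.convolve, ax, out, k, mode='same')
    return out

def appearance_features(volume, pts):
    """Three z-scored features per sampled point: intensity, and the sum of
    the x/y/z gradient components at sigma = 0.5 and sigma = 1.0."""
    idx = tuple(np.round(pts).astype(int).T)        # nearest-voxel sampling
    feats = [volume.astype(float)[idx]]
    for sigma in (0.5, 1.0):
        gx, gy, gz = np.gradient(_gauss_smooth(volume, sigma))
        feats.append((gx + gy + gz)[idx])
    A = np.stack(feats)                             # (3, n_points)
    A = (A - A.mean(axis=1, keepdims=True)) / (A.std(axis=1, keepdims=True) + 1e-12)
    return A.reshape(-1)                            # length 3 * n_points
```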

Stochastic Cascaded Regression
Two points are manually selected as the initial location for the proposed method. The two points are the central locations of the LV and LA, and because they are selected on the axial, sagittal, and coronal views by using ITK-SNAP software [29], these central locations are relatively accurate initial locations. Moreover, while this initial location provides only scale and translation information, the shape-deformation information must be determined by the segmentation program. The flow chart of the proposed method is shown in Figure 3.
Stochastic optimization was introduced by Robbins and Monro [34]. The true gradient direction is approximated in every iteration by computing the gradient direction using only a small subset of randomly chosen samples. In this manner, the computational cost per iteration is significantly reduced, while the convergence properties remain comparable to those obtained with deterministic gradient descent. Although the original cascaded regression method used the full appearance set owing to its small number of landmarks, the number of appearance samples can be too large in the proposed method because this framework operates on 3D volumes. To reduce computational cost and memory usage, we adopted a stochastic method that randomly selects a subset of the appearance set. We denote the size of the selected appearance subset as 3n s . This subset usage can substantially reduce the cost of the ridge regression.
The ground-truth shape parameters p * k ∈ R n sh are defined for each training image I k , k = 0, ..., n K − 1. After the ground-truth shape parameters p * k are calculated for each training image, a set of n Y perturbed shape parameters p k,j (t), j = 0, ..., n Y − 1, is generated for t = 0, where t is the iteration number. This perturbation must capture the statistics of the heart-segmentation initialization process. The SDM model of cascaded regression was used for the segmentation [11].
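The training loop of the stochastic cascaded regression can be sketched on a toy problem. Here `feat(p)` stands in for appearance sampling at shape parameters p (an assumed callable), each stage fits a ridge regressor from features to parameter updates, and `n_sub` realizes the random appearance-subset selection; everything beyond the SDM structure is an illustrative assumption:

```python
import numpy as np

def train_cascade(feat, p_true, p_init, n_stages=5, lam=1e-3, n_sub=None, rng=None):
    """SDM-style cascaded regression training (sketch). Each stage samples
    features at the current shape estimates and learns a ridge regressor
    toward the ground-truth parameter updates; n_sub enables stochastic
    appearance selection per stage."""
    rng = rng or np.random.default_rng(0)
    stages, P = [], p_init.copy()
    for _ in range(n_stages):
        X = np.stack([feat(p) for p in P])                  # (n_samples, d)
        idx = (rng.choice(X.shape[1], n_sub, replace=False)
               if n_sub else np.arange(X.shape[1]))
        Xs = np.column_stack([X[:, idx], np.ones(len(X))])  # bias column
        Y = p_true - P                                      # target updates
        W = np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T @ Y)
        stages.append((idx, W))
        P = P + Xs @ W                                      # apply update
    return stages

def run_cascade(stages, feat, p):
    """Apply the trained cascade to one initial parameter vector."""
    for idx, W in stages:
        x = np.append(feat(p)[idx], 1.0)
        p = p + x @ W
    return p
```

Each stage sees the shapes produced by the previous stage, which is what distinguishes the cascade from repeating a single regressor with different step sizes.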
Generalized ridge regression (GRR) can be applied when the number of predictors p exceeds the sample size n [35]. Furthermore, GRR can apply different sparsity parameters to different groups of predictor variables. The present study used GRR as the ridge regression method because it can implement additional selectiveness in the proposed framework: features can effectively be removed by increasing their sparsity parameters. The procedure of GRR is conceptually as follows. The columns of the design matrix are first rescaled by the inverse square roots of the sparsity parameters, and ridge regression is performed with a sparsity parameter of 1. The resulting coefficient vector is then rescaled again by the inverse square roots of the same sparsity parameters.
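The rescaling trick just described can be written directly (a minimal sketch; the per-feature penalties λ_j generalize the single ridge parameter):

```python
import numpy as np

def generalized_ridge(X, y, penalties):
    """GRR via rescaling: divide each column of X by sqrt(lambda_j), solve
    ordinary ridge with lambda = 1, then rescale the coefficients by the same
    inverse square roots. Equivalent to minimizing
    ||y - X b||^2 + sum_j lambda_j * b_j^2."""
    s = 1.0 / np.sqrt(penalties)
    Xs = X * s                                              # rescale columns
    g = np.linalg.solve(Xs.T @ Xs + np.eye(X.shape[1]), Xs.T @ y)
    return g * s                                            # rescale back
```

A large λ_j shrinks coefficient j toward zero, which is the feature-removal mechanism mentioned above.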

Evaluation Method
The training procedure is evaluated by the average L1 error over the n K × n Y samples at iteration t, which is defined in Equation (3). The evolution of this error over the training procedure is used to find an optimal sparsity parameter for the ridge regression.
SSD is calculated by averaging the closest Euclidean distances in both directions between two surfaces. Three SSDs were evaluated, for the LV, LA, and left heart (LH) objects. The Hausdorff distance (HD) of the LV and LA and the Dice coefficient (DC) of the LH were also evaluated [36]. To compare our method with other methods, Microsoft Excel 2016 was used to generate descriptive statistics and perform paired t-tests. The significance levels were set to 0.05, 0.01, and 0.001.
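For reference, the three metrics can be computed on point sets and masks as follows (point sets stand in for the meshed surfaces in this brute-force sketch):

```python
import numpy as np

def ssd(A, B):
    """Symmetric surface distance: mean closest-point distance, both directions."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def hausdorff(A, B):
    """Hausdorff distance: worst-case closest-point distance, both directions."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def dice(mask_a, mask_b):
    """Dice coefficient of two boolean masks: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())
```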

Results
The proposed method was implemented in C++ and run on a laptop with an Intel Core i7-8750H and 16 GB RAM. The perturbation number n Y was 60 for the 30 (or 29) training images in each of the six folds. The grid spacing in appearance sampling was 0.7188 × 0.7188 × 0.9001 mm, which made the size of the P T point set (n v ) approximately 500,000. Since the appearance point numbers of the LA and LV were 500 and 500, the total appearance feature number (3n s ) was 3000. The sparsity parameter of the ridge regression was selected empirically as 7500 by considering the training profile of Equation (3). The shape eigen number (n sh ) was selected as 26; this value includes the six similarity bases. Training of the cascaded regression took 3496 ± 191 s for 10 iterations. Segmentation took 14.85 ± 7.31 s for 10 iterations.

System Evaluation
The evaluation metrics between the gold-standard masks of the two operators were calculated to assess inter-operator difference. The results were (1.06 ± 0.14 mm, 0.97 ± 0.12 mm, 1.02 ± 0.12 mm, 9.98 ± 2.83 mm, 10.60 ± 11.21 mm, 89.33% ± 1.87%), denoted as (LV SSD, LA SSD, LH SSD, LV HD, LA HD, LH DC). Since the extent of the structures in the four major pulmonary vessels of the LA might differ between the two gold-standard masks, the DC could be lowered to some degree. The repeatability of the gold-standard generation can be estimated as 0.51 mm, calculated as the standard deviation over the test sets. All p values derive from paired t-tests comparing the proposed method with each of the three methods. * p < 0.05, ** p < 0.01, *** p < 0.001. Initial: segmentation result of two landmarks; Pairwise: segmentation result of the pairwise correspondence construction method; Group-wise 1: segmentation result of the group-wise correspondence construction method without separate object control; Proposed: segmentation result of the proposed method.
According to Table 1, the proposed method produced significantly better results in the LH SSD and DC metrics than all three other methods. The group-wise method improved the LA SSD from 2.88 ± 0.70 mm for the pairwise method to 2.31 ± 0.53 mm. When the atlas of the LA was constructed with the pairwise correspondence method, the structure of the four major pulmonary vessels was smoothed; this problem was removed in the proposed method. The Group-wise 1 method performs correspondence construction without separate object control. In the LV, the inner and outer surfaces can have a dependence in the soft correspondence construction of the GMM. The proposed method showed a significantly better LV SSD of 1.94 ± 0.34 mm than the 2.34 ± 0.44 mm of the Group-wise 1 method. Because of the dependence of the two surfaces of the LV, the LV SSD of the Group-wise 1 method is worse than that of the pairwise method. The training curves of the proposed method for the six folds are shown in Figure 4.
The iterative closest point (ICP) method minimizes the difference between two point clouds [37]. We performed the ICP method with a trained shape model under six-fold cross-validation. Since the point cloud extracted directly from the gold-standard mask is used for the registration of the shape model, the appearance of the LH is not considered in this registration. In Table 2, PairICP produced significantly worse results in every metric than the cascaded segmentation of the pairwise method. However, GroupICP showed a registration result approaching the proposed method, with only slight significance in the differences. If the ICP method were improved beyond simple ICP, the registration result would improve. DC was calculated under the appearance correspondence of the cascaded regression.

We tested the proposed method by varying the vertex number n M of the 3D model in Table 3. The experiments show that a small n M of 3000 or 4500 can degrade the segmentation accuracy. As the training point-set size statistic is 9219 ± 1696, a selection of 3000 or 4500 can cause shape loss. Since there is not much difference between 6000 and 7500, we selected 6000 as n M in the experiments of Tables 1 and 2. As an additional experiment, we checked which model is better between a combined model of the two objects and a separate shape model. We can duplicate each shape eigenvector and set the other object's part of the duplicated eigenvector to zero; under this condition, the combined model with 26 eigenvectors is extended to a separate model with 46 eigenvectors, where the six similarity bases are not duplicated. The evaluation result of the separate model with n M of 6000 was (1.91 ± 0.37 mm, 2.24 ± 0.52 mm *, 2.07 ± 0.32 mm **, 14.38 ± 2.82 mm, 19.21 ± 5.18 mm, 81.45% ± 3.84% ***), where * p < 0.05, ** p < 0.01, and *** p < 0.001. All p values derive from paired t-tests comparing the separate model with the proposed method.
The separate model with 46 eigenvectors was slightly better than the proposed method, although conflict between the LA and LV could occur in a separate model.

Comparison with Other Methods
Tran et al.'s method uses a deep, fully convolutional neural network for ventricle segmentation in cardiac MRI [18]. This method evaluates the surface distance of 30 cases on the endocardial and epicardial surfaces of the LV. The endocardial surface error was 1.73 ± 0.35 mm, and the epicardial surface error was 1.65 ± 0.31 mm. Tran et al.'s method was evaluated only on the ventricle model. Although repeatability is not considered in the evaluation metric calculation, the proposed method shows a comparable accuracy of 1.88 ± 0.37 mm. Mitchell et al.'s method uses a 3D active appearance model and produced an error of 2.75 ± 0.86 mm for endocardial contours and 2.63 ± 0.76 mm for epicardial contours, evaluated on 359 slices of cardiac MRI [6]. Compared with this result, our method shows a better accuracy of 1.88 ± 0.37 mm in LV SSD. Tölli et al.'s method performed the segmentation of a four-chamber model by using an artificially enlarged training set and Active Shape Model-based segmentation on 25 subjects. This method produced 1.77 ± 0.36 mm in the LV and 2.44 ± 0.85 mm in the LA, although the training-set enlargement method was used [38]. Our method shows a better result in the LA compared with this method. Zhuang et al. evaluated whole-heart segmentation under deep learning (DL)-based and multi-atlas segmentation (MAS) frameworks [36]. Their evaluation shows a mean SSD of 2.12 ± 5.13 mm and a mean DC of 87.20% ± 8.70%. The range of DC can differ from that of the proposed method because the characteristics of the two datasets are different. In this Open-Access Grand Challenge, no active appearance-based method was utilized. If the current research is extended to whole-heart components, we could also apply it to such a challenge.

Examples of the Segmentation Result
Three examples of the proposed method with n_M of 7500 are presented in Figures 5-7. Figure 5 presents the best case. Figure 6 presents the median case, in which the boundary of the LV is not captured exactly. Figure 7 presents the worst case, in which the statistics of the LV were not learned properly because the training set does not contain enough similar images.

Discussion
Multi-object modeling in cascaded regression is still under development. In the correspondence construction of the GMM-based method, the dependence between objects must be considered, although the training procedure is performed on the combined shape. If the vertex number n_M of the 3D model is increased to cover many objects, the required computational resources increase rapidly. There is a trade-off between memory problems and shape loss in the separate and combined constructions of correspondences.
Segmentation accuracy depends on the quality of the tetrahedron set T_a, since the final segmentation region is calculated by testing whether points lie inside the tetrahedra. In the proposed method, several heuristic rules are applied to the tetrahedron statistics calculations. A more systematic rule must be developed to increase the quality of the segmentation shape obtained from the found point set.
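The inside test that converts the fitted surface into a segmentation region can be sketched with a standard barycentric-coordinate check. This is a generic formulation of a point-in-tetrahedron test; the paper's heuristic rules for building T_a are not reproduced here:

```python
import numpy as np

def inside_tetrahedron(p, v0, v1, v2, v3, eps=1e-12):
    """Return True if point p lies inside (or on) the tetrahedron
    (v0, v1, v2, v3), using barycentric coordinates."""
    # Columns are the three edge vectors from v0.
    T = np.column_stack((v1 - v0, v2 - v0, v3 - v0))
    if abs(np.linalg.det(T)) < eps:  # degenerate tetrahedron
        return False
    # Solve for barycentric coordinates (b1, b2, b3) of p.
    b = np.linalg.solve(T, p - v0)
    # Inside iff all coordinates >= 0 and b1 + b2 + b3 <= 1.
    return bool((b >= -eps).all() and b.sum() <= 1 + eps)
```

Running this test per voxel center over every tetrahedron in T_a yields the binary segmentation mask, which is why noisy tetrahedra directly degrade the final shape.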
The separate correspondence construction operates on the sorted input array of training points. Since the sort statistics are calculated accurately using ∑_{i∈U_b^k} E_{kij}, we apply this sort information to the statistical analysis of the tetrahedron set T_a. Tetrahedra between the LA and LV can be noisy since the topology between the two chambers is complex. Finer control is needed, especially in the region between the LA and LV.
The separate shape model with duplicated eigenvectors gave more freedom in shape change and produced a slightly better evaluation result than the proposed method. However, since conflicts between the LA and LV can occur, a conflict check must be performed when using this model. Because this inter-object conflict check requires additional computation, the combined shape model can be used instead.
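The duplicate-eigenvector construction of the separate model described earlier (copy each combined eigenvector and zero the entries of the other object) can be sketched as follows. This is an illustrative implementation; the function name and the row-index arguments `idx_la`/`idx_lv` are hypothetical:

```python
import numpy as np

def duplicate_eigenvectors(P, idx_la, idx_lv):
    """Build a 'separate' shape basis from a combined one.

    P       : (3n, k) matrix, columns are combined shape eigenvectors.
    idx_la  : coordinate rows belonging to the LA surface.
    idx_lv  : coordinate rows belonging to the LV surfaces.
    Each column is duplicated; in each copy the rows of the other
    object are set to zero, giving a (3n, 2k) basis.
    """
    P_la = np.zeros_like(P)
    P_la[idx_la, :] = P[idx_la, :]  # LA-only copy
    P_lv = np.zeros_like(P)
    P_lv[idx_lv, :] = P[idx_lv, :]  # LV-only copy
    return np.hstack([P_la, P_lv])
```

Because the two objects can now deform independently, the extra freedom that improves accuracy is also what allows LA/LV surface conflicts.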
Since unit-vector normalization makes the appearance values too small, z-score normalization is applied to the appearance input of the ridge regression. Furthermore, ridge regression is not scale-invariant: if the scales used to express the individual appearance variables are changed, the ridge coefficients do not change in inverse proportion to the changes in the variable scales [39]. Therefore, z-score normalization is expected to produce better scaling of the data.
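The normalization choice can be illustrated with a minimal sketch, assuming the standard closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy; the function names are illustrative and not taken from the paper's implementation:

```python
import numpy as np

def zscore(X, eps=1e-8):
    """Standardize each appearance variable (column) to zero mean,
    unit variance, as applied to the ridge-regression input."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: w = (X'X + lam*I)^{-1} X'y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ y)
```

Rescaling one column of X by a constant c does not, in general, shrink its ridge coefficient by exactly 1/c (unlike ordinary least squares), which is why standardizing the columns first gives more predictable behavior.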
In the future, a segmentation model that handles every phase of the heart must be developed, since the statistics of the systole phase make the DC comparatively low. If an extensive database is applied, memory and other technical problems will occur, since cascaded regression requires texture modeling, unlike an active shape model. To improve the segmentation result, many cases must be used to construct the heart anatomical model. Finally, readjustment of the segmentation surfaces along the boundary normal direction can be developed to increase segmentation performance by including this post-processing in the active appearance framework.

Conclusions
We developed a cascaded regression framework for 3D cardiac CT segmentation. Within the GMM-based correspondence construction method, an appearance correspondence extraction method is proposed to extend this correspondence method to model-based segmentation. An independent final point correspondence construction method is developed to apply the GMM-based correspondence method to the double-shaped model of the LH. Stochastic appearance selection enables effective construction in terms of memory usage and computation time. A comparison between the two state-of-the-art correspondence construction methods was performed in terms of segmentation accuracy. By using the constructed cardiac model, the rapid segmentation of three-dimensional anatomical models could support diagnosis and help navigate mapping and ablation catheters off complex structures such as the LA.