Bimodal CT / MRI-Based Segmentation Method for Intervertebral Disc Boundary Extraction

: Intervertebral disc (IVD) localization and segmentation have triggered intensive research e ﬀ orts in the medical image analysis community, since IVD abnormalities are strong indicators of various spinal cord-related pathologies. Despite the intensive research e ﬀ orts to address IVD boundary extraction based on MR images, the potential of bimodal approaches, which beneﬁt from complementary information derived from both magnetic resonance imaging (MRI) and computed tomography (CT), has not yet been fully realized. Furthermore, most existing approaches rely on manual intervention or on learning, although su ﬃ ciently large and labelled 3D datasets are not always available. In this light, this work introduces a bimodal segmentation method for vertebrae and IVD boundary extraction, which requires a limited amount of intervention and is not based on learning. The proposed method comprises various image processing and analysis stages, including CT / MRI registration, Otsu-based thresholding and Chan–Vese-based segmentation. The method was applied on 98 expert-annotated pairs of CT and MR spinal cord images with varying slice thicknesses and pixel sizes, which were obtained from 7 patients using di ﬀ erent scanners. The experimental results had a Dice similarity coe ﬃ cient equal to 94.77(%) for CT and 86.26(%) for MRI and a Hausdor ﬀ distance equal to 4.4 pixels for CT and 4.5 pixels for MRI. Experimental comparisons with state-of-the-art CT and MRI segmentation methods lead to the conclusion that the proposed method provides a reliable alternative for vertebrae and IVD boundary extraction. Moreover, the segmentation results are utilized to perform a bimodal visualization of the spine, which could potentially aid di ﬀ erential diagnosis with respect to several spine-related pathologies.


Introduction
Intervertebral disc (IVD) localization and segmentation has triggered activity in the medical image analysis community, since IVD abnormalities are strongly associated with various chronic diseases, including disk herniation and slipped vertebrae [1]. Most works utilize magnetic resonance imaging (MRI) due to its superior soft-tissue contrast and non-invasive nature [2]. On the other hand, computed tomography (CT) often provides essential cues aiding diagnosis, leading to a minority of CT-based methods [3][4][5]. Still, there is a lack of bimodal approaches, based on complementary information derived from both MRI and CT imaging. framework to tackle the problem of localization and segmentation of IVDs from T2-weighted MR data [31].
More recently, deep learning provided another course of effective methods for spinal image analysis. Cai et al. proposed to use a hierarchical 3D deformable model for multi-modality vertebra recognition, where multi-modal features extracted from deep networks were used for vertebra landmark detection [32]. The recognition result guides a watershed-based segmentation algorithm. However, their deep network is trained in a set of only 1200 pairs of CT/MRI patches, whereas there is no convincing check for overfitting in the evaluation of the recognition part. Moreover, no actual details are provided with respect to the segmentation algorithm, as well as no study on the dependency between the segmentation quality and the recognition accuracy. Despite these shortcomings, the method of Cai et al. is interesting in the sense that it combines information from CT and MRI data [32]. Similar criticism with respect to the size of the training dataset and overfitting applies on other deep learning applications, such as those of [33], who used feed-forward neural networks on CT data and Chen et al., who used 3D fully convolutional networks (FCNs) with flexible 3D convolutional kernels on MRI data [34]. Overall, the main problem arising when considering deep learning is the availability of sufficiently large 3D datasets. More so in the case of a potential bimodal approach which requires pairs of CT/MRI data.
A recent comparative study of several state-of-the-art methods, focusing on MRI, both for localization and segmentation, can be found in [5]. Most of these works, require human intervention, whereas they operate on 2D sagittal images instead of 3D volumes. Most importantly, apart from the work [32], all works operate on a single modality, either CT or MRI. Table 1 summarizes the advantages and limitations of several state-of-the-art methods, including information originally presented in [5] (in the case of MR methods). These methods are included in the experimental comparisons presented in this work. A model of the interspace between objects to guaranteed that the shapes are not deformed.
Requires the manual selection of the IVD center. 50 s per vertebra, 2.4 GHz C++

MR
Lopez Andrade and Glocker [27] Globally optimal segmentation with learned likelihood.
L5-S1 disc should be present. Requires training.
Complexity depends on the number of atlases. Problems in the segmentation of structures deviating from atlases.
Computationally intensive. Memory cost is proportional to image resolution.
Computational complexity proportional to the number of voxels used for training. Problems in the presence of severe pathologies and cropped image parts.
This work introduces a bimodal method for vertebrae and IVD boundary extraction. Although it assumes the availability of both CT and MR images for each case, the proposed method derives complementary information from both modalities and does not depend on learning or on the availability of large datasets, whereas it requires a limited degree of human intervention. Moreover, it is capable to obtain segmentation results of at least comparable quality to the ones obtained by state-of-the-art methods, although it is not learning-based. Beyond segmentation, it offers a bimodal visualization of the spine, which could potentially aid differential diagnosis with respect to several spine-related pathologies. It is based on the observation that vertebrae are much more prominent in CTs. In this light, each MRI image is geometrically transformed to be aligned with its CT counterpart. As a next stage, vertebral regions are extracted from sagittal CT and projected on the corresponding sagittal MRI. The projected vertebra regions naturally define the boundaries of IVD regions and thus can be used to guide localization. In the final stage, segmentation is performed using a region-based active contour, initialized and guided by the localization result.
The rest of this paper is organized as follows: Section 2 presents the various stages of the proposed method, whereas Section 3 provides an experimental evaluation on actual CT/MRI pairs, as well as comparisons with the state of the art. Finally, Section 4 discusses the main results and summarizes the conclusions of this work.

Materials and Methods
The proposed method aims to extract IVD boundaries and provide a bimodal visualization of the spine, using imaging data from both CT and MR images. It consists of six main stages, as illustrated in Figure 1: (1) geometric transformation in order to derive the rules for the projection of structures identified in CT in the context of MR, (2) CT segmentation for vertebral boundary extraction, taking into account that vertebra are prominent and thus easier to identify in the context of this modality. Linear gray level normalization and Otsu thresholding with 3 gray levels are applied at this stage, (3) vertebral region projection on MRI, using the rules derived in stage 1, (4) IVD localization by means of a simple heuristic, (5) CT/MRI-based segmentation for IVD boundary extraction, using the boundaries defined by the vertebra projected in stage 3, the coarse IVD regions obtained in stage 4, and the Chan-Vese active contour, (6) CT/MRI image fusion, offering a bimodal visualization of the spine. The proposed method is based on 3D images (CT and MR), although some of its components are applied on 2D slices of each 3D image.

Geometric Transformation
A pair of CT and MRI 3D images is used as the input and the user is asked to manually determine a pair of 3D regions of interest (ROIs). Although these ROIs are selected to approximately match each other, they have a different number of slices, as well as different pixel sizes. It should be remarked that determining ROIs is the only human intervention required by the proposed method. The MRI 3D ROI is geometrically transformed to match the CT 3D ROI, by means of one-to-one evolutionary optimization [36] and the geometric transformation method of Wells et al. [37], which is based on mutual information ( Figure 2). This method considers that both MR and CT are informative of the same underlying anatomy and share mutual information. In this light, a transformation function T is found by maximizing this mutual information, as quantified by means of entropy I: in which: where x are voxel coordinates, u(x) is a voxel of the reference volume (e.g., CT), υ(x) is a voxel of the target volume (e.g., MR), h(u) and h(υ) are the entropies of random variables u and υ, respectively, and h(u, υ) ≡ − p(u, υ)lnp(u, υ)dudυ is the joint entropy of random variables u, υ. The calculation of entropies is based on estimating the underlying probability density function by means of Parzen window density. Further details can be found in [36].

CT Segmentation for Vertebral Boundary Extraction
Taking into account that vertebral regions are much more prominent in CT, we focused on this modality for vertebral boundary extraction. Still, there are several challenges for accurate vertebral boundary extraction in CT, including the intensity inhomogeneity within each structure, which may result in the identification of false 'gaps' as well as the presence of noise, which along with intensity inhomogeneity may lead to segmentation artifacts. Linear gray-level normalization is applied on the CT ( Figure 3) as a pre-processing stage, aiming to further enhance the intensity distribution. In the resulting pre-processed CT image, vertebral regions are even more distinguishable. In the following step, the well-known Otsu's thresholding with three classes is applied to extract vertebral boundaries. Otsu's thresholding is 'a non-parametric, unsupervised method for automatic threshold selection' [38]. The algorithm exhaustively searches k−1 thresholds for k classes, maximizing inter-class variance. Its main stages are: (1) intensity distributions and probabilities of each class are initialized; (2) iteratively, all possible threshold combinations are examined, class probabilities and mean intensity values for each class are updated, and inter-class variance is calculated; (3) threshold values, corresponding to maximum inter-class variance, are selected. Figure 3 illustrates a flowchart for Otsu's thresholding. The three classes considered are associated with background, vertebral bodies, and vertebral contours (Figure 4b). In this subfigure, the presence of 'gaps' as a result of intensity inhomogeneity is evident. The regions marked with the latter two classes are maintained (Figure 4c). Aiming to cope with the effects of intensity inhomogeneity in the results of Otsu thresholding, we compactify the resulting regions by means of morphological closing operations, with a size of 10 × 10 and a disk-shaped structuring element with a radius equal to 2. At this stage, some segmentation artifacts are generated in the form of 'islands' of background pixels, implausibly isolated on the z-axis. Aiming to cope with this, we utilize the extra information of the third dimension and consider an inter-slice window in the z-axis. The labels are corrected by means of a voting scheme, taking into account all corresponding labels in the neighboring slices (Figure 4d,e). As a result, the 'islands' of background pixels are re-labelled as parts of vertebral bodies. Note that only the central regions will eventually be maintained, whereas the rest of the extracted regions (e.g., at the right of Figure 4d,e) will be discarded in the following stage.

Vertebral Region Projection on MRI
The sagittal MRI images are linearly normalized with respect to gray-levels. The vertebral regions identified in CT, as described in stage 2, are projected on the pre-processed MRIs by means of the geometrical transformation derived in stage 1, and the gray-levels associated with the projected regions are set to zero ( Figure 5). The accuracy of the subsequent IVD localization stage is inherently determined by this stage.

IVD Localization
IVD regions are localized by means of a simple heuristic applied on the result of stage 3: non-zero regions (Figure 5c) within an empirically defined stripe approximating the spine are determined as IVD regions. The stripe is defined by scanning the registered image from left to the right and marking the first vertebra pixel. The same process is also performed reversely, from right to left. Eventually, for each row, two pixels are marked defining the stripe (illustrated as red in Figure 6a). Although this empirically determined stripe provides a rough approximation of the spinal regions, when combined with the result of stage 3, the derived localizations are robust (Figure 6b,c). It should be stressed that the localization result of this stage is a rough approximation used to initialize the Chan-Vese active contour model described in the next sub-section.

CT/MRI-Based Segmentation for IVD Boundary Extraction
Starting from the rough approximation of the spinal regions, which is obtained at stage 4, this stage aims at accurately extracting IVD boundaries. For this task, the Chan-Vese active contour model [39] is the segmentation approach of choice, since it is relatively insensitive to initialization and robust against weak edges and noise. The steps followed at this stage are: (1) MR image enhancement by means of the sharpening technique of Saleh and Nordin [40] (Figure 7a

CT/MRI Image Fusion
Apart from the segmentation result, the proposed method provides a hybrid, CT/MRI-based, visualization of the spine. Vertebral regions identified on CT (stage 2) are super-imposed on spinal regions identified on MR (stage 5). Figure 8 illustrates an example of different views of such a visualization, in which both vertebral (marked with gray) and IVD (marked with red) regions are identified. This bimodal-based illustration of both regions provides a valuable tool, potentially aiding differential diagnosis with respect to various spine-related pathologies.

Experimental Evaluation
The proposed method was applied on 7 pairs of CT and MR images with 98 images using different scanners. The pixel size and slice thickness differed within the ranges of 0.33-0.37 mm and 1.5-3 mm for CT and 0.47-0.55 mm and 3-4 mm for MRI, respectively. This dataset is publicly available (http://spineweb.digitalimaginggroup.ca).

Evaluation Metrics
The method was quantitatively evaluated, using CT and MRI ground truth segmentations obtained by a medical expert. The Dice similarity coefficient (DSC) [41] and Hausdorff distance (HD) [42] were adopted to evaluate the segmentation accuracy. DSC depends on the number of common pixels between the images compared, whereas HD is derived from distances of all pairs of pixels.
Let S g and S t be the binary images that are obtained from the manual and the proposed segmentation, respectively. In both images, the pixels of the structures are set to 1 and the rest are set to 0. Let also X g and Y t be the boundaries of structures in S g and S t , respectively. DSC and HD are defined in Equations (3) and (4): HD X g , Y t = max sup x∈X g in f y∈Y t d(x, y), sup y∈Y t in f x∈X g d(x, y) where the metric d employed is the Euclidean distance.

Results and Discussion
The CT segmentation of the proposed method was quantitatively compared with the methods of Huang and Isaac [3,4] ( Table 2). The IVD segmentations obtained by the proposed method were quantitatively compared with four state-of-the-art methods [5] (Table 3). Figures 9 and 10 illustrate the results of Tables 2 and 3, respectively. Figure 11 illustrates the segmentation quality obtained by all methods compared in both modalities. It can be noted that when considering both quality measures, the proposed method is at least comparable to state-of-the-art learning-based methods in both modalities. Apart from the dependency on training, the state-of-the-art methods compared have also numerous limitations, which include difficulties in the presence of severe pathologies or when certain structures are absent, memory requirements, dependency on atlases etc. These limitations are summarized in Table 1. Figures 11 and 12 illustrate example CT and MRI segmentations obtained by the proposed method. In Figure 11, it can be observed that the vertebra boundaries extracted are plausible. This is also confirmed by a medical expert who was asked to qualitatively evaluate the segmentation result. Similarly, in Figure 12, it can be observed that the IVD boundaries extracted are plausible, whereas this is confirmed by the medical expert. Figure 13 presents example visualizations obtained by the proposed method, marking vertebrae and IVDs with different colors. Table 2. CT image segmentation quality of the proposed method and state-of-the-art.

Method
Mean DSC (%) ± SD Mean HD (mm) ± SD Lopez Andrade and Glocker [27] 87.9 ± 3.4 4.9 ± 1.5 Wang and Forsberg [29] 90.0 ± 2.6 4.7 ± 0.9 Chen et al. [35] 88.4 ± 3.7 4.7 ± 1.4 Korez et al. [30] 91.5 ± 2.     Another set of experiments involved the evaluation of the robustness of the proposed method against various parameters. Two different transformation types where tested: affine and similarity-based. In addition, three different interpolation methods where tested: linear, nearest neighbor, and cubic. Similarity-based transformation and linear interpolation led to the highest segmentation quality with 100 iterations, a radius growth factor equal to 1.05, and an initial search radius equal to 0.004. The DSC and HD of the segmentation result obtained for various settings of the mask size for morphological filtering on CT (blue line) and MRI (red line) are illustrated in Figure 14a,b. It can be observed that the differences in the accuracy obtained are less than 5% with respect to DSC and less than 0.2 pixels with respect to HD, as this parameter ranges from 8 to 12. In the previously presented experimental comparisons, this value was set to 10. In addition, the variances in DSC and HD as the number of active contour iterations ranges from 20 to 65 are illustrated in Figure 14c,d. It can be observed that the differences in the accuracy obtained are less than 6% with respect to DSC and less than 1.5 pixels with respect to HD. In the previously presented experimental comparisons, this value was set to 30. It should be remarked that, unlike these parameters, the thresholds involved in CT segmentation (Section 2.2) are automatically selected.
As a final remark, for our suboptimal Matlab (Mathworks, MA, USA) implementation, the CT-based and MR-based stages are currently applied sequentially, with an effect on overall time cost, which is approximately 3 min. Parallelization of these stages is expected to drastically reduce time cost.

Conclusions
This work introduces a computational approach for vertebrae and IVD boundary extraction, based on CT and MRI data. The proposed method (1) derives complementary information from both modalities; (2) is not learning-based and is not dependent on the availability of large, labelled datasets, unlike the vast majority of state-of-the-art methods; (3) is capable of obtaining segmentation results of at least comparable quality to the ones obtained by state-of-the-art methods, although it is not learning-based; (4) provides a bimodal visualization of the spine, which could potentially aid differential diagnosis with respect to several spine-related pathologies. In addition, the proposed method requires a limited amount of intervention. It starts from aligning corresponding CT and MR images, whereas the CT images are segmented to extract vertebrae boundaries. The result of CT segmentation is fused with MR images to guide the subsequent localization and segmentation stages. The result of CT segmentation guides IVD localization and the exact IVD boundaries are extracted by applying the Chan-Vese active contour model on a contrast-enhanced version of the MR image. Finally, the extracted vertebrae and IVD boundaries can be used to provide a hybrid 3D visualization of the spine. The proposed method was compared with state-of-the-art methods, with respect to: (1) CT image segmentation for vertebrae boundary extraction, (2) MR image segmentation for IVD boundary extraction. In both cases, the obtained segmentation quality, as quantified by means of the DICE and HD measures, was at least comparable to the one obtained from state-of-the-art learning-based methods. In addition, unlike competing methods, the proposed method requires no prior knowledge in the form of an atlas or learning from annotated samples. On the other hand, the proposed method depends on the availability of CT/MR image pairs, which could limit its applicability, taking into account that several currently adopted clinical practices are single-modal. Moreover, our suboptimal Matlab implementation is relatively slower than some of the state-of-the-art methods.
A research challenge for future work is the design of hybrid approaches that incorporate learning-based components, which can take advantage of the potential availability of small-sized, labelled CT/MR image pairs. Moreover, the development of an optimized implementation, as well as additional experiments on clinically important cases, such as those involving the presence of dehydrated discs, are the subject of our ongoing research.