Integration of Deep Learning and Active Shape Models for More Accurate Prostate Segmentation in 3D MR Images

Magnetic resonance imaging (MRI) has a growing role in the clinical workup of prostate cancer. However, manual three-dimensional (3D) segmentation of the prostate is a laborious and time-consuming task. In this scenario, automated segmentation algorithms can relieve physicians of a substantial workload. In this work, we propose a fully automated hybrid approach for prostate gland segmentation in MR images: an initial segmentation of the prostate volume is produced by a custom-made 3D deep network (VNet-T2) and then refined with an Active Shape Model (ASM). While the deep network focuses on the three-dimensional spatial coherence of the shape, the ASM relies on local image information, and this joint effort allows for improved segmentation of the organ contours. Our method is developed and tested on a dataset composed of T2-weighted (T2w) MRI prostatic volumes of 60 male patients. On the test set, the proposed method shows excellent segmentation performance, achieving a mean Dice score and Hausdorff distance of 0.851 and 7.55 mm, respectively. In the future, this algorithm could serve as an enabling technology for the development of computer-aided systems for prostate cancer characterization in MR imaging.


Introduction
Prostate cancer is the most common cancer among the male population worldwide [1]. Several techniques can be used for diagnosis, such as screening based on the serum prostate-specific antigen (PSA) or urological approaches such as biopsies guided by transrectal ultrasound; however, the method that demonstrates the highest accuracy is magnetic resonance imaging (MRI). The use of MRI in prostate cancer diagnosis has the primary purpose of determining the stage of the disease and, potentially, planning the necessary radiation therapy, both in the form of external beam radiation and as a guide for seed implantation in brachytherapy.
Prostate gland (PG) boundary delineation plays an important role in radiation therapy planning, MRI-guided biopsy, and the development of computer-aided diagnosis systems for the early detection of prostate cancer [2]. However, manual segmentation of the prostate in 3D MRI is time-consuming and prone to inter- and intra-operator variability. In this regard, fast and accurate automated segmentation algorithms are in high demand in clinical practice. Automatic segmentation of the PG remains a very challenging task.
In this paper, a novel prostate gland segmentation strategy is presented. Our approach starts from an initial segmentation of the prostate using a volumetric CNN model to fully exploit the 3D spatial contextual information of the input data and perform a volume-to-volume prediction. The proposed deep network is three-dimensional, thus giving more importance to spatial coherence, which allows obtaining the organ in its entirety compared to a two-dimensional network. Although three-dimensional segmentation has this spatial advantage, it places less emphasis on accuracy on the individual slice, especially at the edges of the prostate [24]. For this reason, an improved active shape model is proposed to refine the prostate contours on the CNN outputs, achieving a more precise segmentation and adjusting any implausible shapes output by the network, as they are corrected according to the shape constraints defined by the ASM parameters. The main contributions of this paper can be summarized as follows:

•
We propose a fully automated hybrid approach for 3D MRI prostate segmentation using CNNs and ASMs. A custom 3D deep network is employed to provide a first segmentation of the prostate gland. Then, a statistical model is applied to refine the segmentation at the prostate edges.

•
We develop an effective approach based on the combination of semantic features extracted from the CNN and statistical features from the ASM. By using a CNN-based initialization, we can bypass the limitations of current ASMs.

•
We improved the robustness of the ASM by removing noisy intensity profiles using the DB-SCAN clustering algorithm.

•
We publicly release the code used in this work. An extended validation is also performed by comparing the proposed approach with two manual operators. Our algorithm obtains highly satisfactory results.

Patients and Dataset Composition
The dataset is composed of T2w MRI prostatic volumes of 60 male patients. T2w imaging was employed since it provides good contrast between the anatomical characteristics of the organ and is therefore helpful in reaching a diagnosis [25]. The study was approved by the local Ethics Committee and all the participants signed an informed consent form (protocol number: CE IRCCS 191/2014, approval date: 17 June 2014). The images were collected with a 1.5 T scanner (Signa Excite HD, GE Healthcare, Chicago, IL, USA) using a four-channel phased-array coil combined with an endorectal coil (Medrad, Indianola, PA, USA). The following protocol was adopted to acquire the T2w images: slice thickness of 3 mm covering 7.2 cm, hence obtaining 24 slices per volume; field of view equal to 16 × 16 cm; acquisition matrix of 384 × 288 with a reconstruction matrix of 512 × 512 pixels. The prostate gland was segmented in T2w images by an expert radiologist with 11 years of experience in MRI prostate examination. Manual annotations were obtained using the 3D Slicer software [26]. An example of manual annotations is shown in Figure 1. Our dataset was divided into a construction set (45 patients) and a test set (15 patients). The construction set was then divided into a training set (36 patients) and a validation set (9 patients).
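The patient-level partitioning described above (60 patients into 45 construction + 15 test, then 36 training + 9 validation) can be sketched as below. The shuffling and seed are illustrative assumptions; the paper does not specify how patients were assigned to each subset.

```python
import random

def split_patients(patient_ids, n_test=15, n_val=9, seed=0):
    """Patient-level split: 60 -> 45 construction + 15 test,
    then construction -> 36 training + 9 validation.
    The random shuffle and seed are illustrative, not the paper's."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    test = ids[:n_test]
    construction = ids[n_test:]
    val = construction[:n_val]
    train = construction[n_val:]
    return train, val, test

train, val, test = split_patients(range(60))
print(len(train), len(val), len(test))  # 36 9 15
```

Splitting at the patient level (rather than the slice level) avoids leaking slices of the same prostate between training and test sets.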


Bias Field Correction and Intensity Normalization
Pre-processing is applied to improve the quality of the MR images by enhancing contrast and making feature recognition easier for the algorithm. MR images can be corrupted by a low-frequency intensity non-uniformity (i.e., bias field). Bias field correction is performed using the N4 algorithm [27], which adopts an iterative scheme to estimate and remove the distortion field. The algorithm takes an image as input and estimates its bias field using a mask automatically defined by applying Otsu's threshold algorithm. Let us consider the MRI image defined as v(x) = u(x) f(x) + n(x), where v is the image, u represents the uncorrupted image, f is the bias field and n is the Gaussian noise. In the log domain the model becomes additive (v̂ = û + f̂), and the iterative procedure followed by the N4 algorithm allows us to obtain the uncorrupted image at the n-th iteration as û⁽ⁿ⁾ = v̂ − Σᵢ₌₁ⁿ f̂ᵣ⁽ⁱ⁾, where f̂ᵣ⁽ⁱ⁾ is the residual bias field estimated at iteration i. The optimization is applied to the residual bias field, and the total bias field is obtained as the sum of the residuals. Compared to other techniques [28], the N4 algorithm does not use least-squares fitting and therefore needs no regularization parameters to be adjusted; it is also more robust to noise because the approximation is performed locally and all the local solutions are then combined to obtain the global approximation.
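The multiplicative bias model can be illustrated with a small numpy sketch. Note the heavy Gaussian smoothing used here to estimate the log-bias is only a crude stand-in for N4's iterative B-spline fitting; the point is the log-domain algebra, not the actual N4 estimator.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy 2D "uncorrupted" image u and a smooth multiplicative bias field f,
# following the model v = u * f + n (noise omitted for clarity).
u = np.full((64, 64), 100.0)
yy, xx = np.mgrid[0:64, 0:64]
f = 1.0 + 0.3 * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 20.0 ** 2))
v = u * f  # corrupted image

# In the log domain the model is additive (log v = log u + log f), so a
# smooth estimate of log f can be subtracted out. Gaussian smoothing is
# a crude stand-in for N4's iterative B-spline fit of the residual bias.
log_v = np.log(v)
log_f_hat = gaussian_filter(log_v - log_v.mean(), sigma=16)
u_hat = np.exp(log_v - log_f_hat)

# The correction should reduce the intensity spread introduced by f.
print(v.std() > u_hat.std())  # True
```

In practice one would call an actual N4 implementation (e.g., the one in ITK-based toolkits) rather than re-implement the estimator.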
The lack of a standard image intensity scale in MRI makes both visualization and subsequent image analysis challenging. MRI intensities do not have a fixed meaning due to different protocols, patients, and scanners. This can cause various problems during image segmentation and quantification of radiomic features [29][30][31]. Therefore, we normalized the intensity of the MRI volumes according to the method proposed by László et al. [32]. Briefly, several reference points (i.e., percentile values, mode) are extracted from the grayscale histograms of the training set volumes to compute a standardized histogram. Then, for each MRI volume, image mapping is applied from the original histogram space to the standardized one through piecewise linear transformations. More details about the intensity normalization are provided in Appendix A. Figure 2 shows the pre-processing steps employed.
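The piecewise-linear mapping between histogram landmarks can be sketched with `np.interp`. The decile landmarks and the 0-1000 standard scale below are illustrative assumptions; the actual reference points (percentiles, mode) and standard histogram are those learned on the training set as described in Appendix A.

```python
import numpy as np

def standardize(volume, landmarks_src, landmarks_std):
    """Map intensities piecewise-linearly from a volume's own landmark
    values to the standardized landmark values learned on the training
    set (Nyul-style histogram standardization)."""
    return np.interp(volume, landmarks_src, landmarks_std)

rng = np.random.default_rng(1)
vol = rng.gamma(2.0, 50.0, size=(8, 32, 32))  # synthetic MRI-like volume

# Illustrative landmarks: deciles of this volume mapped onto a fixed
# standard scale (assumed values, not the ones used in the paper).
pcts = np.linspace(0, 100, 11)
src = np.percentile(vol, pcts)
std_scale = np.linspace(0, 1000, 11)

norm = standardize(vol, src, std_scale)
print(norm.min(), norm.max())  # 0.0 1000.0
```

Because the mapping is monotone, the ordering of intensities within each volume is preserved while the landmark values become comparable across patients and scanners.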

Deep Convolutional Networks
A custom network, called VNet-T2, is designed to perform an initial segmentation of the prostate gland. The basic idea is to create a network inspired by the UNet architecture [33] that, taking as input a prostate volume, learns from it to discriminate between voxels belonging to the prostate and the background, and returns a volumetric label of the same size as the input. The network discussed in this section is trained from scratch since the weights obtained for similar problems employed different input sizes, and due to the complexity and high computational load of the volumetric problem, it was decided to avoid resampling to make the sizes compatible. This means that the original size of the volumes was maintained (512 × 512 × 24 pixels) and the input size of the network was consequently customized.

The VNet-T2 architecture, based on the 3D-UNet Keras model [34], is shown in Figure 3. This network has alternating blocks of 3D convolution, 3D max-pooling, activation, and 3D deconvolution. The VNet-T2 network consists of an initial encoding path composed of convolution and max-pooling blocks to extract image features at deeper levels. Specifically, the individual blocks of the encoder are structured as follows:

•
A first convolutional layer (kernel size 3 × 3 × 3) followed by an instance normalization layer;

•
A second convolutional layer (kernel size 3 × 3 × 3) followed by an instance normalization layer;

•
A 3D max-pooling layer with a pool size of (2, 2, 2).

The spatial information is reconstructed in the subsequent upsampling path, composed of alternating transposed-convolution and convolution blocks, to restore the original dimensions. The output of the network is a probability map that assigns each voxel a probability of belonging to a specific class.
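The (2, 2, 2) pooling halves every dimension at each encoder level, which is what limits the usable depth for a 512 × 512 × 24 input (the 24-slice axis shrinks fastest). The short sketch below walks through these sizes; the depth of 3 and the 'same'-padded convolutions are assumptions for illustration, not the paper's final configuration.

```python
def encoder_shapes(shape, depth):
    """Feature-map size after each (2, 2, 2) max-pooling step of the
    encoder; convolutions are assumed 'same'-padded so only pooling
    changes the spatial size."""
    shapes = [shape]
    for _ in range(depth):
        shape = tuple(s // 2 for s in shape)
        shapes.append(shape)
    return shapes

print(encoder_shapes((512, 512, 24), 3))
# [(512, 512, 24), (256, 256, 12), (128, 128, 6), (64, 64, 3)]
```

After three pooling steps only 3 slices remain along z, so deeper encoders would have to pool anisotropically or stop pooling along the slice axis.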
The entire network is trained on a two-class problem (prostate vs. background), giving the 512 × 512 × 24 grayscale images as input and the corresponding labeled masks as the target. To solve the problem of class imbalance, two different strategies have been tested:

•
Class balancing: the network's loss function is class-weighted according to how frequently each class occurs in the training set. This means that the least represented class (prostate gland) contributes more than the most represented one (background) during the weight update. This follows the same approach as our previous work [35].

•
Dice loss: the network loss function is calculated as 1 − DSC, where DSC is the Dice score computed between the manual annotation and the network prediction. The Dice overlap is a widely used loss function for highly unbalanced segmentations [36].
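The two strategies above can be sketched in numpy. The inverse-frequency weighting shown here is one common choice and may differ in detail from the scheme of [35]; the Dice loss is the standard 1 − DSC formulation.

```python
import numpy as np

def dice_loss(y_true, y_pred, eps=1e-7):
    """1 - DSC on (soft or binary) predictions; eps guards empty masks."""
    inter = np.sum(y_true * y_pred)
    return 1.0 - (2.0 * inter + eps) / (y_true.sum() + y_pred.sum() + eps)

def class_weights(y_true):
    """Inverse-frequency weights for (background, prostate); the exact
    weighting scheme used in the paper may differ."""
    fg = y_true.mean()
    return {0: 1.0 / (1.0 - fg), 1: 1.0 / fg}

# Toy volume: ground truth and a shifted prediction.
y = np.zeros((24, 64, 64)); y[8:16, 20:40, 20:40] = 1
p = np.zeros_like(y);       p[8:16, 24:44, 20:40] = 1
print(round(dice_loss(y, p), 3))  # 0.2
```

With the prostate occupying only a few percent of the voxels, the foreground weight is roughly 30 times the background weight here, which is exactly the imbalance both strategies are designed to counteract.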
To optimize all hyperparameters of the network, we tried different configurations, varying the network depth, number of filters, loss function, learning rate, and evaluation metric. The maximum number of epochs was set to 50, with a validation patience of 25 epochs for early stopping of the training process. The final configuration of the VNet-T2 network is summarized in Table 2. The best model was chosen by analyzing the performance obtained on the training set (36 patients). We have made the image dataset and implementation code of the VNet-T2 network publicly available (see Data Availability Statement). Training was performed on a workstation with 32 GB of RAM with access to an Nvidia K80 GPU (Nvidia, Santa Clara, CA, USA) using the Keras framework with TensorFlow as the backend.

Active Shape Models (ASM)
The output of the VNet-T2 network is a rough segmentation of the prostate that needs further refinement. To obtain the final, precise segmentation of the prostate in the MRI volumes, an ASM approach was employed.
The Point Distribution Model (PDM) [37] represents the modes of variation of the shape of a structure, with the average shape calculated on the training set serving as the initial model. From this average shape, the position, orientation, scale, and shape parameters are adjusted to obtain a better fit to the image. Limits are placed on the movement of every point so that the shapes obtained remain plausible (global shape constraints) [4]. Hence, the first step in the ASM refinement is the construction of a mean shape model on the training dataset.

Mean Shape Model Determination and Appearance Data
In order to create a mean shape model, a set of coherent vertices for all the prostates in the dataset must first be determined, after which the final shape model can be built. After an initial isotropic upsampling in order to have the same resolution (0.5 mm × 0.5 mm × 0.5 mm) along all three axes, the computation of the coherent vertices is divided into 4 steps:

1.
3D-surface fit: in this step, the external surface of each prostate is fitted with a three-dimensional ellipsoidal surface. Subsequently, key reference points are determined in the x–y and x–z planes as F_xy = (x_center, y_center ± c_xy/2, z_center) and F_xz = (x_center ± c_xz/2, y_center, z_center), respectively, where c_xy = radius²_max − radius²_min and c_xz = radius²_min.

2.
Triangulation: Starting from the 3D vertices obtained in step 1, the Alpha Shape Triangulation [38] is employed to divide the 3D surface into a variable number of triangles. This triangulation method requires an α parameter that defines the level of refinement of the structure. A value of α = 50 was used, as it was found to be appropriate for all prostate shapes included in the dataset.

3.
Ray–triangle intersection: This step is necessary to obtain corresponding key points in each prostate. To do so, the Möller–Trumbore algorithm [38] is employed to compute the intersection between a set of rays originating in each of the key points found in step 1 (i.e., F_xy and F_xz) and each of the triangles obtained in step 2.
A ray is defined as R(t) = O + tD, where O is the origin of the ray and D is a normalized direction. In this study, we chose 8 directions with a step angle (θ) of 360°/8. For a detailed description of how this algorithm works, please see the study by Möller et al. [38].

4.
Vertices determination: For each direction D, the intersection points between the ray and the 3D model are determined and make up the final vertices.
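The ray-triangle test at the core of step 3 can be sketched as follows. This is a standard Möller–Trumbore implementation for a single ray and triangle; the production code would batch it over all 8 ray directions and all triangles of the alpha-shape mesh.

```python
import numpy as np

def ray_triangle(orig, direc, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle intersection: returns the distance t
    along R(t) = O + t*D, or None if there is no intersection."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(direc, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:            # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = orig - v0
    u = np.dot(s, p) * inv        # first barycentric coordinate
    if u < 0 or u > 1:
        return None
    q = np.cross(s, e1)
    v = np.dot(direc, q) * inv    # second barycentric coordinate
    if v < 0 or u + v > 1:
        return None
    t = np.dot(e2, q) * inv       # distance along the ray
    return t if t > eps else None

# Ray from the origin along +z through a triangle in the z = 2 plane.
t = ray_triangle(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                 np.array([-1.0, -1.0, 2.0]),
                 np.array([3.0, -1.0, 2.0]),
                 np.array([-1.0, 3.0, 2.0]))
print(t)  # 2.0
```

Rejecting hits with t ≤ 0 keeps only intersections in front of the ray origin, which is what the vertex search in step 4 needs.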
For the final mean shape model determination, the vertices obtained in step 4 for all volumes in the training set (i.e., manual segmentations) need to be aligned. For simplicity, the contour positions of all volumes are aligned to the first volume in the dataset. Alignment is obtained by removing any potential translation (i.e., the volumes are centered) and by correcting the rotation using the four-quadrant inverse tangent in two steps (Figure 4). Subsequently, a matrix containing all the contour point data of the training set is constructed and a principal component analysis (PCA) is applied to remove contour noise by maintaining only 98% of the total variance, which produces the final mean shape model (Figure 4). The gray level profiles of the original images are then used to construct the grayscale appearance model. The next step in our ASM approach is the determination of the appearance data (i.e., gray levels) used to optimize the final prostate contour. A single-resolution model (scale = 1), based on the values of the gray profiles along the normal directions of the surface points, is employed.
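The "98% of total variance" PCA step can be sketched as below, assuming the aligned shapes are stacked one flattened shape per row. The synthetic single-mode data is purely illustrative.

```python
import numpy as np

def shape_pca(X, keep=0.98):
    """Build a PDM from aligned shapes. X has one flattened shape
    (x1, y1, z1, x2, ...) per row; modes are kept until `keep` of the
    total variance is explained."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the PCA eigenvectors/eigenvalues.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (len(X) - 1)
    ratio = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(ratio, keep)) + 1
    return mean, Vt[:k].T, var[:k]   # mean shape, W_s, eigenvalues

rng = np.random.default_rng(2)
# Synthetic "shapes": one dominant deformation mode plus small noise.
mode = rng.normal(size=30)
X = np.array([rng.normal(scale=2.0) * mode
              + rng.normal(scale=0.05, size=30) for _ in range(40)])
mean, W, lam = shape_pca(X)
print(W.shape[1] < X.shape[1])  # far fewer modes than coordinates
```

Truncating the modes is what removes contour noise: high-order modes carry almost no variance and mostly encode vertex jitter.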
The gray level intensity profiles along the normal directions of the surface are obtained by setting a profile length, length = (n_s × 2) + 1, centered on the surface and protruding both within and outside it. In this study, n_s is equal to 2. A cubic interpolation is then applied to the gray level intensity profiles, and subsequently a clustering algorithm is employed. Since the prostate presents great variability along the contours, the application of the density-based spatial clustering of applications with noise (DB-SCAN) algorithm [39,40] was crucial to find clusters of arbitrary shape while taking into consideration the potential noise present in the data distribution (Figure 4). More details about the DB-SCAN algorithm applied to the gray level intensity profiles are reported in Appendix B.

ASM Model Application on Network Output
The model described in Section 2.4.1 is subsequently applied to the vertices obtained in output from the 3D CNN (Section 2.3) after performing isotropic upsampling. The model is applied to the input volume at its original resolution (scale = 1). The input for the mean shape model application algorithm is the matrix of vertices corresponding to a single segmented volume (i.e., the 3D CNN binary output). The normal directions to the surface are computed as described previously, and a gray level intensity profile (g) is placed along each normal direction with a length n = 2 n_s + 1, where n_s corresponds to the search length in both directions. PCA is applied to the matrix containing the selected gray level profiles to compute the eigenvector matrix W_g and the mean gray level intensity profile ḡ. Noisy intensity profiles were filtered out before applying the PCA using the DB-SCAN clustering algorithm (see Appendix B). The objective function for the ASM evolution is defined as f(g) = Σ_j (b_g,j² / λ_g,j), with b_g = W_gᵀ (g − ḡ), where λ_g are the eigenvalues of the gray level model. The regularization error is calculated as the sum of the objective function values for all the points corresponding to the search length within the profile length. The minimum value of f is selected at each iteration to determine the movement of each point as movement = ind − (n_s + 1), where n_s is the length of the search profile in both directions as described previously and ind is the position in the profile at which f reaches its minimum value. The model is then realigned, removing the translation and rotation, using the same method as described in Section 2.4.1. The final b parameters are obtained as b = W_sᵀ (x_search − x_mean), where W_s, x_search and x_mean are the shape eigenvector matrix, the shape at the current iteration, and the mean shape, respectively. The parameters are limited to the constraints of the mean model using the eigenvalues of the PCA model obtained during the training phase, as described previously.
Hence, each b parameter of the final shape should lie within limits given by a parameter m: |b_i| ≤ m √λ_s,i, where λ_s are the shape eigenvalues. A detailed analysis of the shape constraint parameter (m) is given in Section 3. The b parameters of the model are then mapped back to boundary positions in the original space as x = x_mean + W_s b. Finally, the translation and rotation are restored with an inverse alignment procedure (i.e., the inverse of the procedure described previously is applied).
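The project / clamp / reconstruct cycle of the shape constraint can be sketched in a few lines of numpy. The single-mode toy shape below is illustrative; the real W_s and eigenvalues come from the trained PDM.

```python
import numpy as np

def constrain_shape(x_search, x_mean, W_s, lam_s, m=2.0):
    """Project a candidate shape onto the PDM, clamp each parameter to
    +/- m*sqrt(lambda_i), and map back to the original space
    (the b = W_s^T (x_search - x_mean) and x = x_mean + W_s b steps)."""
    b = W_s.T @ (x_search - x_mean)
    b_max = m * np.sqrt(lam_s)
    b = np.clip(b, -b_max, b_max)
    return x_mean + W_s @ b, b

# Tiny illustration: one unit-norm mode, candidate lying 5 sigma along it.
x_mean = np.zeros(6)
W_s = np.eye(6)[:, :1]
lam = np.array([1.0])
x_bad = 5.0 * W_s[:, 0]
x_ok, b = constrain_shape(x_bad, x_mean, W_s, lam, m=2.0)
print(b)  # [2.]
```

An implausible candidate 5 standard deviations along a mode is pulled back to the m = 2 boundary, which is exactly how the ASM corrects implausible network outputs.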

Post-Processing
The final vertices obtained in output from the active shape model need to be post-processed in order to compute validation parameters and evaluate the entire pipeline's performance (Figure 5). While the final vertices are in theory very close to the real shape of the considered prostate, a post-processing algorithm is necessary to obtain the final binary segmentation of the volume. The algorithm is divided into 3 steps:

1.
Triangulation: the Alpha Shape Triangulation method is employed to divide the 3D surface into a variable number of triangles, with α = 30.

2.
2D slices definition: to obtain the final 2D slices of the segmentation, the volume is divided into a number of planes whose z-coordinates correspond to the slice positions. Then, similarly to what was described previously, new vertices of the segmentation are found by computing the intersection between each ray and each triangle and taking the furthest point from the center, which is a first approximation of the points on the outermost surface.

3.
Final 3D volume reconstruction: the final 3D volume is obtained by stacking the 2D slices together, employing a hole-filling operation, and then a 3D morphological closing (spherical structural element, radius = 4). The post-processed binary volume is finally downsampled to the original resolution.
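The morphological part of step 3 can be sketched with `scipy.ndimage`. The spherical structuring element of radius 4 matches the description above; the toy volume, the reduced radius in the demo, and the function name are illustrative.

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_fill_holes

def postprocess(mask, radius=4):
    """Hole filling followed by 3D morphological closing with a
    spherical structuring element (radius = 4 in the paper; details
    here are a sketch)."""
    zz, yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1,
                          -radius:radius + 1]
    ball = zz ** 2 + yy ** 2 + xx ** 2 <= radius ** 2
    mask = binary_fill_holes(mask)
    return binary_closing(mask, structure=ball)

vol = np.zeros((20, 20, 20), dtype=bool)
vol[5:15, 5:15, 5:15] = True
vol[9:11, 9:11, 9:11] = False        # internal hole left by stacking
out = postprocess(vol, radius=2)
print(bool(out[10, 10, 10]))  # True
```

Hole filling removes internal cavities created by stacking the 2D slices, while the closing smooths small surface notches without shrinking the gland.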


Performance Metrics
A comparison between manual and automatic masks is carried out to assess the performance of the proposed method for prostate gland segmentation. The Dice similarity coefficient (DSC) is calculated for both the training and test sets. Since prostate gland boundary delineation has a crucial role in radiation therapy planning, we also computed the Hausdorff distance. Specifically, we computed the 95th percentile Hausdorff distance (HD95), i.e., the 95th percentile of the distances from each point of one boundary (manual) to the nearest point of the other (automatic). Being based on the 95th percentile rather than the maximum of the distances, this metric is more robust towards a very small subset of outliers. Finally, we calculated the relative volume difference (RVD) to measure the under- and over-segmentation of the algorithm [41].
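The three metrics can be sketched in numpy as below. The brute-force pairwise distance in `hd95` is fine for boundary point sets of this size; the sign convention for RVD (positive = over-segmentation) is an assumption consistent with how the over-segmentation result is reported later.

```python
import numpy as np

def dsc(a, b):
    """Dice similarity coefficient between two binary masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95(pts_a, pts_b):
    """95th-percentile symmetric Hausdorff distance between two point
    sets (boundary coordinates, in mm after spacing scaling)."""
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return max(np.percentile(d.min(axis=1), 95),
               np.percentile(d.min(axis=0), 95))

def rvd(auto, manual):
    """Relative volume difference (%); positive = over-segmentation."""
    return 100.0 * (auto.sum() - manual.sum()) / manual.sum()

m = np.zeros((10, 10), dtype=bool); m[2:8, 2:8] = True   # 36 voxels
a = np.zeros_like(m);               a[2:8, 2:9] = True   # 42 voxels
pts_m = np.argwhere(m).astype(float)
print(round(dsc(m, a), 3), round(rvd(a, m), 1))  # 0.923 16.7
```

On identical point sets HD95 is exactly zero, and a prediction one column too wide already costs several Dice points, which is why boundary refinement pays off in these metrics.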


Ablation Study
We tested our method in 4 different configurations by varying three of the most important parameters of the ASM: number of iterations (it), search length (ns), and shape constraint (m). The number of iterations defines how many times the algorithm runs the search before stopping; the search length refers to the length in pixels, along the profile normal to each surface point, over which the search is performed; the shape constraint is used to limit the model parameters based on what is considered a normal contour. Table 3 shows the ASM configurations used in this work, while Table 4 summarizes the performance of the proposed method in prostate segmentation. The highest DSC value on the training set is obtained by the 3D network alone (0.886), while on the test set the best DSC is achieved by the VNet-T2 + ASM-2 configuration (0.851). In terms of Hausdorff distance, the best performing configuration is VNet-T2 + ASM-2 for both the training and test sets. The same ASM configuration (it: 2, ns: 8, m: 2) also obtains the best RVD value on the test set, showing good generalization ability of the proposed method. The HD95 analysis reveals a maximum distance between surfaces of about 7.55 mm, while RVD has an average value of 9.60%, meaning that the algorithm tends to over-segment on average. Interestingly, the Hausdorff distance always decreases with the application of the ASM for all four tested configurations (both mean values and standard deviations). Furthermore, the application of the ASM model reduces the performance gap between the training and test sets, thus mitigating the overfitting of the VNet-T2 network.
Figure 6 shows the performance of our method before and after the application of the ASM (i.e., VNet-T2 vs. VNet-T2 + ASM-2).
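The ablation above amounts to a small grid search over (it, ns, m). A sketch of such a loop is shown below; `run_asm` is a hypothetical callback returning the DSC of one configuration, and the candidate values are illustrative, not the exact grids of Table 3:

```python
from itertools import product

# Illustrative candidate values for the three ASM parameters (not Table 3)
CONFIGS = {
    "it": [2, 4],   # number of search iterations
    "ns": [6, 8],   # search length in pixels along the profile normal
    "m": [2, 3],    # shape constraint on the model parameters
}

def ablate(run_asm, volumes):
    """run_asm(volume, it, ns, m) -> DSC; returns (best mean DSC, best config)."""
    best = None
    for it, ns, m in product(*CONFIGS.values()):
        mean_dsc = sum(run_asm(v, it, ns, m) for v in volumes) / len(volumes)
        if best is None or mean_dsc > best[0]:
            best = (mean_dsc, {"it": it, "ns": ns, "m": m})
    return best
```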

Inter-Observer Variability
In the research field of prostate gland segmentation, several studies have shown a fair amount of variability between the manual annotations of two operators [42,43]. Therefore, we asked a second experienced operator to manually segment the prostate volumes of the test set. Table 5 reports the metric values (DSC, HD95, and RVD) comparing Operator 1 (Op1) vs. Operator 2 (Op2), so as to account for inter-operator variability. Specifically, for the test set we compared the minimum, average, and maximum values of the metrics obtained between the two operators and those of the best performing configuration of our method (i.e., VNet-T2 + ASM-2). As can be seen from Table 5, the application of the ASM improves the performance of the deep volumetric network: it increases the average values of the metrics, reduces the outliers (maximum values), and yields values similar to those obtained between the two operators (Op1 vs. Op2). Figure 7 depicts, for three patients in the test set, the comparison between VNet-T2 and the same network with ASM refinement.

Discussion
Prostate segmentation in MR images is a challenging task, due to the heterogeneity in size, morphology, and appearance of the prostate gland. In addition, the boundaries between the prostate gland and the surrounding anatomic structures are not always clear and well defined. In this study, we propose a fully automated hybrid approach for 3D MRI segmentation of the prostate using CNN and ASM models. A 3D CNN, named VNet-T2, is employed to obtain an initial segmentation of the prostate gland. The output of the CNN is then used as the initialization of the statistical model (ASM) to obtain an accurate segmentation of the prostate contours. The main reason for proposing a hybrid model based on a CNN and an ASM is that the 3D CNN model does not guarantee a noise-free prediction, which leads to some boundaries that deviate from the real ones. When the ASM is applied, the overall segmentation is constrained by the average shape model of the training set; this ensures the plausibility of the shapes obtained as the final output and reduces the noise problem.
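The two-stage design can be summarized as a short pipeline sketch. Here `cnn_predict` and `asm_refine` are hypothetical callables standing in for the VNet-T2 network and the ASM refinement; they do not reflect the actual interfaces of our implementation:

```python
import numpy as np

def segment_prostate(volume, cnn_predict, asm_refine, threshold=0.5):
    """Hybrid CNN + ASM segmentation sketch (hypothetical interfaces)."""
    # Stage 1: the 3D network produces a voxel-wise probability map
    prob = cnn_predict(volume)
    # Stage 2: thresholding gives the binary mask that initializes the ASM
    init_mask = prob > threshold
    # Stage 3: the ASM refines the boundary under the learned shape constraint
    return asm_refine(volume, init_mask)
```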
The results obtained show that applying the statistical model in cascade to the CNN improves the segmentation performance, increasing the DSC from 0.840 to 0.851 and decreasing the Hausdorff distance from 10.74 mm to 7.55 mm. These results show that applying the ASM to the CNN output yields a segmentation closer to the ground truth, especially at the contours. From Table 4, it can be seen that the ASM decreases the standard deviation of the metrics, confirming the robustness of the proposed method. The use of a hybrid model allows us to overcome the limitations of the individual models, taking the most advantageous features from each. In fact, the overall segmentation can be improved by incorporating knowledge of prostate shape variability into a CNN-based framework.
Despite the good performance, this method is not free from limitations. The CNN architecture presented here is quite simple, and to achieve better results it would be necessary to increase the depth of the network and the number of base filters, since the ASM performs better the more precise its initialization is. Currently, it was not possible to further increase the depth of the network without running into overfitting; this problem may be solved in the future by increasing the size of the dataset. This limitation can also be partially overcome by adding terms that better approximate the shape of the CNN output, by penalizing the curvature within the triangulation algorithm, or by using other triangulation algorithms [44]. Moreover, since the ASM was developed to operate on T2w MR images, if the MRI modality were to change (e.g., ADC, T1w, etc.), the model would have to be partially fine-tuned again. The constructed mean shape model could be maintained, since it depends on the organ and not on the image acquisition modality, but the grayscale appearance model would have to undergo another fine-tuning process.
In the future, we plan to expand the dataset, including volumes acquired using different scanners, so as to improve the generalization capability of both the deep learning network and the ASM. The presented pipeline is also modular, so that in the future it would be possible to plug in newly developed deep neural networks and/or active shape models. Our methodology (i.e., the integration of an ASM with a 3D CNN) can be easily extended to other segmentation problems, including the segmentation of other organs (e.g., lung, muscles, etc.) and segmentation within images acquired using different imaging modalities (e.g., CT, US). Our research group is currently working on the extension of this approach to prostate gland segmentation in multiparametric magnetic resonance imaging (mp-MRI), with the aim of developing innovative computer-aided diagnosis systems for prostate cancer characterization.

Conclusions
In this paper, we propose a fully automated hybrid approach for 3D MRI prostate segmentation using CNN and ASM models. The combination of semantic features extracted from the CNN and statistical features from the ASM model results in performance comparable to that of manual operators. The most important finding of this work is that the use of a CNN-based initialization for an ASM model improves segmentation performance compared to the deep network alone. The quantitative differences between this hybrid approach and the 3D network may not be large, but the improved accuracy could prove critical for specific clinical applications, allowing the preservation of smaller structures such as the bladder, external sphincter, or seminal vesicles.
The publicly available code can be used to compare the proposed method against popular state-of-the-art models and newly developed architectures. Given the direct clinical need for prostate segmentation and volumetric analysis in surgical planning and fusion biopsy, our approach could represent a promising model for future clinical implementations of deep learning in clinical imaging.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The image dataset and the scripts used to develop the VNet-T2 for prostate segmentation in MR images can be found at doi: 10.17632/25r9ccd7z2.1.

Conflicts of Interest:
The authors declare no conflict of interest.


Appendix A
The standardization method involves two stages: a first training phase to learn the transformation parameters, and a second transformation phase in which these parameters are used to map the intensity histogram of each volume into a standardized range. The algorithm is based on the following parameters of the histogram (Figure A1). For our implementation, the value of µ was computed by imposing conditions on the difference between adjacent elements of the histogram, and the final value was chosen to fall between the 30th and the 70th percentile of the entire volume.
Transformation phase: for each volume, µ_j is matched to µ_s through two linear mappings (from [p_1j, µ_j] to [s_1, µ_s] and from [µ_j, p_2j] to [µ_s, s_2]); the range [m_1, m_2] is then extended to [s_1j, s_2j]. The total mapping is called the standardizer of the image and, for an intensity x, takes the form

τ_j(x) = µ_s + (x − µ_j)(s_1 − µ_s)/(p_1j − µ_j)  if x ≤ µ_j,
τ_j(x) = µ_s + (x − µ_j)(s_2 − µ_s)/(p_2j − µ_j)  if x > µ_j.
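As a minimal sketch, the two phases can be written as follows (assuming NumPy; the percentiles defining [p_1j, p_2j], the standard scale [s_1, s_2], and the use of the median as the landmark µ, which falls within the 30th-70th percentile range required above, are all simplifying assumptions):

```python
import numpy as np

def train_standardizer(volumes, pc=(1, 99), s_range=(0.0, 4095.0)):
    # Training phase: map each volume's landmark onto the standard scale
    # and average the results to obtain the standard landmark mu_s.
    s1, s2 = s_range
    mapped = []
    for v in volumes:
        p1, p2 = np.percentile(v, pc)
        mu = np.percentile(v, 50)  # simplified landmark (median)
        mapped.append(s1 + (mu - p1) * (s2 - s1) / (p2 - p1))
    return float(np.mean(mapped))

def standardize(v, mu_s, pc=(1, 99), s_range=(0.0, 4095.0)):
    # Transformation phase: piecewise-linear mapping sending
    # [p1, mu] -> [s1, mu_s] and [mu, p2] -> [mu_s, s2].
    s1, s2 = s_range
    p1, p2 = np.percentile(v, pc)
    mu = np.percentile(v, 50)
    return np.interp(v, [p1, mu, p2], [s1, mu_s, s2])
```

Note that np.interp clamps intensities outside [p1, p2] to the ends of the standard scale, which corresponds to the range extension described above.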

Appendix B
The density-based spatial clustering of applications with noise (DBSCAN) finds the neighbors of each data point within a circle of radius ε and adds them to the same cluster. For any neighbor point, if its ε-neighborhood contains at least a predefined number of points (minPoints), the cluster is expanded to contain its neighbors as well. However, if the number of points in the neighborhood is less than minPoints, the point is labeled as noise and deleted [40].
In our implementation, the algorithm is applied to the gray-level profiles normal to each point of the prostate surface, in order to keep only the most similar profiles within the training dataset. The value of minPoints was set to 1, meaning that a single point is sufficient to define a cluster. As for the ε parameter, an optimization procedure was carried out to obtain a cluster containing between 5 and 25 profiles, which was considered a good compromise to obtain a sufficiently high density.
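For illustration, a compact self-contained version of the clustering step might look as follows (NumPy only; the Euclidean metric over whole profiles and the example ε are our assumptions and do not necessarily match the implementation of [40]):

```python
import numpy as np

def dbscan_profiles(profiles, eps, min_points=1):
    """Minimal DBSCAN sketch over gray-level profiles; -1 marks noise."""
    X = np.asarray(profiles, dtype=float)
    n = len(X)
    labels = np.full(n, -1)  # -1 = unvisited or noise
    # pairwise Euclidean distances between profiles
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cluster = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        neighbors = np.flatnonzero(dist[i] <= eps)
        if len(neighbors) < min_points:
            continue  # too sparse: leave as noise
        labels[i] = cluster
        stack = list(neighbors)
        while stack:  # expand the cluster through dense neighborhoods
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                nj = np.flatnonzero(dist[j] <= eps)
                if len(nj) >= min_points:
                    stack.extend(nj)
        cluster += 1
    return labels
```

With min_points set to 1, every profile has at least itself as a neighbor, so no profile is ever discarded as noise and the clusters are simply the connected components of the ε-neighborhood graph.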