On Unsupervised Methods for Medical Image Segmentation: Investigating Classic Approaches in Breast Cancer DCE-MRI

Unsupervised segmentation techniques, which do not require labeled data for training and can be more easily integrated into the clinical routine, represent a valid solution especially from a clinical feasibility perspective. Indeed, large-scale annotated datasets are not always available, undermining their immediate implementation and use in the clinic. Breast cancer is the most common cause of cancer death in women worldwide. In this study, breast lesion delineation in Dynamic Contrast Enhanced MRI (DCE-MRI) series was addressed by means of four popular unsupervised segmentation approaches: Split-and-Merge combined with Region Growing (SMRG), k-means, Fuzzy C-Means (FCM), and spatial FCM (sFCM). They represent well-established pattern recognition techniques that are still widely used in clinical research. Starting from the basic versions of these segmentation approaches, during our analysis, we identified the shortcomings of each of them, proposing improved versions, as well as developing ad hoc pre- and post-processing steps. The obtained experimental results, in terms of area-based—namely, Dice Index (DI), Jaccard Index (JI), Sensitivity, Specificity, False Positive Ratio (FPR), False Negative Ratio (FNR)—and distance-based metrics—Mean Absolute Distance (MAD), Maximum Distance (MaxD), Hausdorff Distance (HD)—encourage the use of unsupervised machine learning techniques in medical image segmentation. In particular, fuzzy clustering approaches (namely, FCM and sFCM) achieved the best performance. In fact, for area-based metrics, they obtained DI = 78.23% ± 6.50 (sFCM), JI = 65.90% ± 8.14 (sFCM), sensitivity = 77.84% ± 8.72 (FCM), specificity = 87.10% ± 8.24 (sFCM), FPR = 0.14 ± 0.12 (sFCM), and FNR = 0.22 ± 0.09 (sFCM). Concerning distance-based metrics, they obtained MAD = 1.37 ± 0.90 (sFCM), MaxD = 4.04 ± 2.87 (sFCM), and HD = 2.21 ± 0.43 (FCM). These experimental findings suggest that further research would be useful for advanced fuzzy logic techniques specifically tailored to medical image segmentation.


Introduction
The use of advanced imaging technologies has significantly improved the quality of medical care delivered to patients, allowing medical imaging to be an essential part of today's healthcare system [1]. In fact, medical imaging comprises techniques for acquiring • unsupervised segmentation methods, based on classic pattern recognition techniques, can still provide an effective solution despite the prevalence of supervised approaches, such as deep CNNs; • the use of traditional unsupervised approaches-requiring no training-is an advantage in terms of of immediate clinical feasibility; • fuzzy clustering techniques significantly outperformed direct region detection approaches and crisp k-means; • the integration of spatial information into the FCM algorithm achieved the best performance; • with the goal of providing a guide for beginners, as well as possibly enabling new future extensions from other researchers, this study provides all the technical information needed to understand both the functioning of each of the studied algorithms and the implemented workflow.
The remainder of this work is organized as follows. Section 2 introduces the theoretical background about unsupervised segmentation approaches, focusing on the algorithms used in this study. A detailed description of the performed analysis and of the implemented processing steps is proposed in Section 3, where the exploited DCE-MRI dataset is also described. Section 4 formulates the metrics used to evaluate the performance of the analyzed approaches. Section 5 illustrates the experimental results, including a discussion about the comparison of the proposed techniques. Finally, discussion and conclusions are provided in Section 6.

Theoretical Background
A number of algorithms and techniques for image segmentation have been developed and implemented over the years, and a large amount of literature papers about this nontrivial task were proposed. The aim of this section is to present a brief overview about theoretical notions of literature approaches from which this work drawn inspiration. For this reason, we decided for a comprehensive description, to provide the reader with all the technical information needed to understand the functioning of each of the investigated algorithms, as well as the implemented workflow.
The segmentation involves the image partitioning into homogeneous and meaningful sub-regions. By a formal point of view, the segmentation of an image I involves the identification of a finite set of regions R 1 , R 2 , . . . , R N as in Equation (1): with the following constraints: P(R i ) = TRUE, for i = 1, 2, . . . , N, With more details, P is an appropriate logical predicate leading the segmentation process. Equation (1) states that the union of all the sub-regions resulting from segmentation process. Equation (2) points out that the intersection of two different sub-regions is the empty set: this means that the segmented sub-regions do not overlap each other. According to Equation (3), the result of the logical predicate P on all the pixels belonging to the same sub-region is always TRUE; in other words, all the pixels belonging to the same region share the same characteristics according to the predicate P. As a consequence, Equation (4) states that the result of P on the union of two distinct sub-regions is FALSE.
It is important to clarify that each type of medical image has a specific set of features reflecting its own properties: in fact, each image is the result of a complex interaction between the human body and the scanner (i.e., X-rays for CT, magnetic fields for MRI, radioactive decay for nuclear medicine exams) [19]. As a consequence, not all the segmentation techniques obtain the same results on all image types, but some algorithms yield better results when applied on a specific kind of image. Furthermore, it is necessary to point out that, typically, an approach that works very well with one type of image does not mean that it continues to perform well even on different images. As a matter of fact, bioimages exhibit a very high variability, as they depend on various factors, both intrinsic (e.g., patients) and extrinsic (e.g., imaging modalities, acquisition parameters). Accordingly, ad hoc modifications might be required at the level of both the segmentation approach and the pre-/post-processing phases.
In what follows, we outline the techniques investigated and compared for contrastenhancing mass segmentation on DCE-MRI.

Split-and-Merge Combined with Region Growing
The simplest segmentation approaches use a global threshold applied on pixel intensity to partition the original image: pixels with an intensity greater than threshold T are assigned to one region, while those below the threshold T are assigned to another one [20][21][22][23]. In this way, a binary image is created providing the segmentation of the original image with respect to the chosen threshold value. Split-and-Merge and Region Growing (SMRG) is basically a threshold-based approach that combines Split-and-Merge (SM)-composed of a first split (top-down) phase of the image followed by a merge (bottom-up) phase-with RG for the refinement of the identified regions [23].
The SM algorithm represents a valid alternative to thresholding, because it can find homogeneous regions in terms of uniformity criteria [24][25][26][27]. Unlike SM, the RG algorithm, starting from one or more seed-points, identifies an ROI through a growing procedure guided by appropriate similarity properties that describe ROI intensity features. Generally SR and RG are used individually for image segmentation, but the use of both together allows us to exploit the overall potential [28,29].
With more details, the idea behind the SM algorithm involves successive splits of the whole image into disjoint regions lead by a logical predicate P. The algorithm starts with an arbitrary partition R of the original image (i.e., the whole image) and yields an output composed of uniform sub-regions R i , for i = 1, 2, . . . , n, according to the logical criterion expressed by P. At the generic step t, if P(R i ) = FALSE, each region R i is split into four sub-regions (also called 'quad-regions'): this process iteratively continues until a quad-region such that P(R i ) = TRUE or with an area smaller than a certain threshold is found. The logical predicate chosen for this work allows us to find the quad-regions with a mean intensity that is greater than the threshold value yielded by the Otsu's method [30]. If only splitting is used, the final partition contains adjacent regions with identical properties: this drawback can be overcome by merging only adjacent regions where the combined pixels satisfy the predicate P.
After the initial rough ROI identification obtained by means of SM, the RG algorithm expands this initial region to properly identify the lesion. According to the classic RG algorithm, each region begins its own growth from a single pixel (seed-point). Instead, when the SM is exploited, the seed-point can be obtained by using the ROI yielded by the SM algorithm (more correctly, a seed-region) [23]: the seed-region is iteratively grown by evaluating, for each pixel on the boundary, its 8-neighborhood as candidate for the growth. A stopping rule is necessary for interrupting the growing procedure if no more pixels match the membership criterion, which often refers to the proximity of the pixel intensities.

K-Means
Classification refers to data labeling into disjoint sets according to a common set of features. Among these, clustering algorithms can be used to determine the natural structures in the data. Clustering algorithms can use more sophisticated properties of the image: in digital imaging, this means that spatial and/or spectral features concerning pixels can be exploited [31].
K-means is an unsupervised clustering technique that aims at partitioning an input set of N observations into k clusters [32]. Let X = {x 1 , x 2 , . . . , x N } be a set of vector observations such that x i ∈ R n for i = 1, 2, . . . , N. In image segmentation, each component of a vector x represents a numerical pixel attribute: if segmentation is based on gray-scale intensity alone, the n-dimensional observation x i ∈ R n degenerates into the scalar value x i ∈ R representing the intensity of the i-th pixel. The final purpose of k-means is to partition the N observations into k < N disjoint cluster sets C = {C 1 , C 2 , . . . , C k } such that the sum of the distances from each point in a set to the mean of that set is minimum. From a mathematical point of view, k-means turns classification into an optimization problem with the following cost function in Equation (5): where v i is the centroid of the samples in the set C i for i = 1, 2, . . . , k. The function J (X , V ) has no analytical solution: as a result, k-means proceeds by iteratively finding the minimum of its cost function. In particular, at each iteration t the centroid values of the prototype set V (t) ) are updated according to Equation (6).
where |C i | is the number of objects belonging to the i-th at the step t.

Fuzzy C-Means
Biomedical images are characterized by an intrinsic uncertainty, thus causing not well-defined regions (e.g., blurry boundaries or poor anatomical details). This aspect makes thresholding approaches and crisp approaches (such as k-means) not always suitable. Therefore, the natural fuzziness characterizing the Fuzzy C-Means (FCM) clustering approach allows to reach better segmentation results than the hard partitioning offered by k-means [33][34][35][36].
The FCM algorithm is an unsupervised clustering technique that searches for the optimal partition of an input data set. The idea leading the FCM classification process is that of minimizing the intra-cluster variance as well as maximizing the inter-cluster variance, in terms of a distance metrics between the feature vectors. Formally, the FCM technique searches for the optimal partition of an input data set X = {x 1 , x 2 , . . . , x N } of N objects into C clusters. With respect to the k-means algorithm in which each point is assigned to one cluster only, FCM allows each object to belong to multiple clusters with different degrees of membership. This 'soft' classification allows us to define a fuzzy partition, P defined as a fuzzy set family P = {Y 1 , Y 2 , . . . , Y C } such that each point can have a partial membership to multiple clusters. In mathematical terms, the matrix U = [u ik ] ∈ R C×N denotes a fuzzy C-partition of the data set X by means of C membership functions u i : X → [0, 1], whose values u ik := u i (x k ) ∈ [0, 1] represent membership grades of each element x k to the i-th fuzzy cluster Y i , and have to hold the constraints in Equation (7): Although hard clustering works well on compact and well-separated groups of data, in many real-world situations clusters overlap each other: as a result, assigning them with gradual memberships by exploiting a soft computing approach may be more appropriate. Computationally, the FCM algorithm assigns to the sample x k the membership function values using the relative distance (i.e., intensity value similarity) of x k from the C prototype points V = {v 1 , v 2 , . . . , v C } identifying the centroids of the C clusters. Such as k-means algorithm, FCM may be rewritten as an optimization problem with respect to the objective function in Equation (8). where: • m is the fuzzification constant (i.e., a weighting exponent such that 1 ≤ m < ∞) controlling the fuzziness of the classification process. If m = 1, the FCM algorithm degenerates to a k-means clustering: in general, the higher the m value the greater will be the fuzziness degree (the most common value is m = 2); • U is the fuzzy C-partition of the set X ; • |x k − v i | 2 is the Euclidean distance between the elements x k and the centroid v i ∈ V.
Considering that the optimization problem described by FCM does not have a closedform solution, the minimum of the cost function J m (U, V; X ) has to be found iteratively. In particular, at each iteration t, the centroid values of the prototype setV (t) and the elements of the matrixÛ (t) are updated according to Equations (9) and (10), respectively.
with m > 1 and x k =v (t) j , ∀j, k. At each iteration, each object x k is compared with the elements of the centroid vector and is assigned to the nearest cluster. The process stops when the convergence condition (i.e., the matrix norm distance betweenÛ (t+1) andÛ (t) is less than a fixed value (i.e., minimum improvement in the objective function J between two consecutive iterations ) or the maximum number of iterations T max is reached. After the convergence, a defuzzification is applied to assign each pixel to the cluster with the highest membership degree, thus achieving a binary classification.

Spatial Fuzzy C-Means
The traditional FCM clustering does not take into account spatial relationship among neighboring pixels, making it sensitive to noise and other imaging artifacts [37,38]. Breast lesions generally tend to grow in an isotropic way, preserving a pseudo-spherical appearance [39]. Relying on those features, it is expected that neighbouring pixels in a digital image are highly correlated and that the probability that they belong to the same cluster is great. Therefore, the use of the sFCM, taking advantage of spatial relationship of neighbouring pixels, can help image segmentation [40,41]. The spatial function used by sFCM is defined in Equation (11): where N (x j ) represents a square neighborhood (in the spatial domain) around the pixel x j . The term h ij represents the probability that the pixel x j belongs to i-th cluster: as a result, the spatial function of a pixel for a cluster is large if the majority of its neighbourhood belongs to the same cluster. The contribution of the spatial function modifies the classic FCM membership function according to Equation (12): where p and q control the relative importance of both functions. In a homogeneous region, the spatial functions simply emphasizes the original membership and the clustering remains unchanged. On the other hand, for a noisy pixel this formula reduces the weighting of a noisy cluster by the labels of its neighboring pixels: as a result, misclassified pixels from noisy regions can easily be corrected. The sFCM clustering process involves two steps at each iteration: the former, which is the same as that in standard FCM, allows for calculating the membership function in the features domain, while the latter maps the membership information of each pixel into the spatial domain and allows to calculate the spatial functions for each pixel of the image. At this point, the FCM iteration proceeds with the new membership that is incorporated with the spatial function and stops when the maximum difference between cluster centers at two successive iterations is lower than a certain threshold. After the convergence, a defuzzification scheme (i.e., maximum membership) is applied.

Materials and Methods
As introduced in Section 1, the purpose of this work is to present an in-depth analysis of classical unsupervised segmentation algorithms, developing them appropriately to adapt them to the clinical case addressed (i.e., breast lesions detection), thus improving the overall performance.
This section, along with showing the characteristics of the DCE-MRI dataset analyzed, describes the processing pipeline of the proposed analysis. It is necessary to point out that particular attention was paid to the optimization of each step for the specific case study, basically on three different levels: • pre-processing: obtaining images with similar characteristics for the downstream processing steps; • segmentation: optimization of the parameters of the investigated segmentation algorithms; • post-processing: definition of appropriate region properties (based on connectedcomponents) to refine the segmentation results.

MRI Dataset Description
The analysis of this study was performed on a clinical DCE-MRI dataset composed of 50 patients with breast cancer: a total of 599 slices were processed. The main details on MRI acquisition parameters are reported in Table 1 , while in Figure 1 the phases related to some benign and malignant lesions are shown. The dataset includes patients with different stages of breast cancer, allowing us to cover a wide clinical scenario: as matter of fact, this dataset contains various levels of segmentation difficulty with some scans showing low contrast and large inhomogeneities. Lesions with a non-homogeneous enhancement region, masses with an irregular shape or necrotic core are also included.  for segmentation purposes the clinician selected the strongest one. In particular, for these lesions, the 4th, 3rd, 4th, and 3rd phase was chosen, respectively.

The Proposed Analysis
Clinical knowledge points out that breast lesions in DCE-MRI appear hyperintense compared to the adipose and muscle surrounding tissues. Furthermore, anatomical atlases and diagnosis reports also refer that breast lesions have the tendency to grow in an isotropic manner, preserving a pseudo-spherical shape [39]. Relying on these morphological features, in this study we decided to start the segmentation from the central slice, which should be the one with the largest area: adjacent slices were processed successively. The idea behind this choice is to segment initially the slice in which the lesion appears more evident, finding the centroid of the connected-component identifying the lesion. On slices different from the central one, the lesion ROI was found by identifying the connected-component with the centroid that has the minimum distance from the centroid found for the central slice. The whole segmentation process can be summarized as follows: • dataset loading: all DCE-MRI images (belonging to the selected patient) and its groundtruth are loaded; • ROI selection: a bounding box containing the lesion is manually drawn in the central image: in this way, the algorithm analyzes only the pixels within the box, reducing computational times and avoiding classification mismatches; • pre-processing: necessary to reduce noise and provide images with similar characteristics for the next processing steps; • lesion segmentation: once the images are pre-processed, the MR image stack of each patient is segmented by all the analyzed methods (i.e., SMRG, k-means, FCM, and sFCM); • post-processing: performed to refine the segmentation and to properly identify the ROI from each binary mask obtained in the previous step; • performance evaluation: the final masks are stored and performance metrics (i.e., areabased and distance-based metrics) are calculated comparing the masks against the ground-truth.
In addition to the well-known segmentation techniques, also the exploited pre-and post-processing operations are widely used by the international community in the field of medical imaging. In fact, the literature offers many works that use the same preprocessing [42,43] and post-processing [44,45] steps in the image analysis phases (before segmentation), as well as in the refinement phases (after segmentation), with excellent results. Figure 2 shows the flow diagram of the implemented and proposed analysis. Each processing block is described in the following subsections. All the methods were developed using the MatLab environment (Natick, MA, USA). The code is available via GitHub: https: //github.com/carmilitello/UnsupervisedSegmentation.git (accessed on 9 November 2021).

Dataset Loading and ROI Selection
This step selects the MRI series to be analyzed. The data loading step, even though does not represent a real processing step, is essential for each algorithm-both supervised and unsupervised-to obtain the data to process. Moreover, the concept of 'supervision' refers generally to the training phase of an algorithm by requiring labeled data. For these reasons, the 'unsupervised' nature of the algorithm used here is preserved. In order to reduce computational times and improve segmentation performance, a minimal user interaction is needed. In fact, by manually tracing a rectangular bounding-box on the image, the operator provides to approximately select the initial ROI containing the breast lesion [13]. Once traced, this ROI is used to crop all the slices of the MR image stack The manual selection of the initial ROI containing the lesion is useful to reduce the processing time (since only a portion of the whole image is processed), as well as to exclude regions containing pixels with characteristics similar to the lesion that could complicate the algorithm performance, thus invalidating the final result. Moreover, the initial, interactive input provides the clinician with the ability of confidently controlling the entire segmentation process, which is generally preferred compared to a fully automatic process [18,46]. Moreover, considering that the method works on an ROI-selected by the clinician-it is assumed that the tumor is always present within the ROI.

Pre-Processing
In order to improve segmentation results, some pre-processing operations are applied after the ROI selection ( Figure 3). In particular, the aim of this preliminary phase is to perform denoising and data pre-processing, allowing us to achieve reliable results during the segmentation phase.
The first pre-processing operation deals with noise reduction [47]. The use of median or average (with Gaussian or flat kernels) filters is a common choice in MR image processing. For instance, median filtering was applied to facilitate the ROI identification by reducing the outlier introduced by anatomical peculiarities [48] or also deal with small patient shifts [49]. We used Gaussian kernels having the form in Equation (13), which are the only circular symmetric kernels that are also separable and that allow you to reduce noise by altering the image less than the average filter.
where r, σ and k represent the radius, the standard deviation and the normalization factor of the Gaussian function G, respectively. Kernel normalization, obtained by multiplying its coefficients by k-obtained as the inverse of the sum of all kernel coefficients-has two purposes: (i) the average value of an area of constant intensity would equal that intensity in the filtered image (as it should) and (ii) it prevents the introduction of biases during filtering (i.e., the sum of the pixels in the original and filtered images will be the same). For the purpose of this work, a Gaussian kernel with r = 5 and σ = 0.6 was chosen: this standard deviation value performed denoising without an excessive blurring of the original image. The next pre-processing step subtracts the mean value from the original image. This is a very helpful step in signal processing because it emphasizes signal variations by shifting its mean to zero. For this reason, during the pre-processing phase, the algorithm removed this bias from the original image amplifying gray level variations. Once the mean value subtraction, the pixels resulting in a negative value have to be clipped to zero in order to avoid visualization problems in the following steps.
After noise reduction and mean subtraction, the last step performs a modified gammatransformation to stretch the original image histogram. The general form of a gamma transformation is defined in Equation (14): where c and γ are positive constants. Power-law curves with fractional values of γ map a narrow range of dark input values into a wider range of output values, and vice versa for bright input values. When c = γ = 1 the gamma transformation reduces to the identity transformation. The modified gamma transformation proposed in this work used c = 1 and γ = 0.7, by means a piece-wise function in which the final value of each pixel depends on a certain threshold value θ. In particular, the function mapping the initial value of a pixel r(x, y) into its final value s(x, y) is defined in Equation (15). The value of θ, used for the piece wise gamma selection, is the one suggested by the Otsu's thresholding method [30]. Figure 4 shows the results of each pre-processing step applied on two (one per row) DCE-MRI breast lesions.

Segmentation
After pre-processing, the segmentation involves the partitioning of a digital image into multiple sets of pixels, according to the specific clinical purpose. Nowadays, segmentation of digital images still remains one of the most challenging topics in image processing: in fact, medical image segmentation is often still performed manually, via time-consuming and operator-dependent procedures.
As indicated in Section 3.2.1, to start the segmentation process, it is necessary to select the ROI containing the lesion in the central image. The central slice-which is generally the one with the largest tumor section-is determined automatically after the operator sets the range of the initial and final slices containing the tumor. Starting from the selected ROI in the central slice, the ROIs in the adjacent slices are determined starting from the centroid of the segmented lesion in the previous slide. By doing so, the ROIs (set automatically) in the adjacent slices are centered (or almost) with the lesion section they contain. After the segmentation, a check is made to verify that the lesion does not touch the ROI boundary. In fact-considering that the ROI size is set equal to that of the ROI in the previous slice-it could happen that the selected ROI is too small and, consequently, the lesion was cropped. If so, the ROI is enlarged in order to fully contain the lesion section and repeated segmentation.
The segmentation is sequential: starting from the central slice, all the upper slices are processed first and then the lower ones. All the segmentation steps are the same for both the central slice and the other slices. The only difference lies in the different definition of the ratios-defined in Equations (18) and (19)-used in the post-processing steps, differentiated with the goal of optimizing the segmentation result.

SMRG Setting
There are several parameters controlling the SMRG behavior. First of all, the splitting predicate P, a logical predicate fundamental to achieve satisfactory outputs. In order to identify breast lesions, homogeneity criteria are defined in terms of the mean value µ of each quad-regions. Here, the logical predicate in Equation (16) was used: The minimum block dimension ρ min sets the minimum quad-region size beyond which no further splitting is carried out: the best results are found with minimum block dimensions of 4 × 4 pixels, because small regions are detected too. After the splitting phase, the final partition will contain adjacent regions with identical properties: this drawback can be addressed by merging only adjacent regions whose combined pixels satisfy the predicate P.
Finally, a stopping rule interrupts the growing procedure if no more pixels match the membership criterion. For the aim of this work, a reasonable stopping rule uses a similarity criterion between the candidate pixel to be incorporated and the pixels already belonging to the region. The criterion for the stopping rule is defined in terms of absolute distance between the regional mean of each quad-region and the threshold provided by the Otsu's method [30] on the original cropped image, both of them calculated without the contribution of null pixels.

Clustering Setting
Here, the optimal setting used for k-means, FCM and sFCM clustering algorithms is reported, in terms of the following parameters: (i) number of clusters; (ii) maximum number of iterations; (iii) minimum improvement in objective function between two consecutive iterations; (iv) exponent for the fuzzy partition matrix (only for fuzzy approaches); (v) < p, q > parameters (only for fuzzy approaches).
Regarding the number of clusters, the lower the partition fuzziness, the better the segmentation result: the best clustering is achieved when V pe is minimal. If the partition entropy for C = 2 is lower than the one obtained for C = 3, the lesion does not include any necrotic region: if so, pixels belonging the brightest cluster are turned to 1 and the others to 0. Otherwise, the two brightest clusters are fused together and their pixels are turned to 1, allowing for the inclusion of necrotic cores into the preliminary mask. Considering that, sometimes breast lesions can include an inner necrotic region that appears darker (i.e., hypo-intense) with respect to the rest of the lesion. In those cases where a binary clustering (C = 2) does not allow for properly detecting a necrosis, the number of clusters might be increased (C = 3). The choice of the optimal number of clusters to be used in k-means, FCM and sFCM is automatically set by evaluating the partition entropy for both cases C = 2 and C = 3. The partition entropy V pe is a cluster validity function defined as in Equation (17): The p, q parameters, dealing with FCM and sFCM clustering, control the relative importance of both functions defined in Equations (10) and (11), respectively. In particular, with p, q = 1, 1 , we sFCM is equivalent to the traditional FCM. As a matter of fact, to properly weight spatial information, the values p, q >= 1, 2 were used. The choice was guided by the analysis and results obtained in [18], tackling a similar problem on DCE-MRI images. All these parameters are reported in Table 2.
At the end of the segmentation of each slice, the algorithm also verifies whether or not the contour of the detected lesion touches the boundaries of the initial rectangular bounding-box: if so, the shape of the lesion results in a cut version of the original one. For this reason, when the lesion intersects the boundaries of the original m × n rectangular crop, the algorithm shifts and expands the bounding-box itself until the boundaries of both the lesion and the crop do not intersect each other. The result of the segmentation process is a binary image in which a label is assigned to each pixel: 0 if the pixel does not belong to the mask, and 1 otherwise. The aim of the post-processing phase is to properly choose, among the connected-components resulting from the previous step, the only one representing the lesion. The post-processing itself consists of several sub-steps allowing for removing connected-components that do not meet specific morphological criteria ( Figure 5).

Small area removal
Remove connected component with area< (0.75* mean area of all the connected regions)

"Ratio_central" maximization
Maximize the ratio (μ i * s i )/ ε i for all the remaining connected components

Small area removal
Remove connected component with area< (0.75* mean area of all the connected regions)

"Ratio_other" maximization
Maximize the ratio (μ i * s i )/ (d i ) 2 for all the remaining connected components The first post-processing phase allows us to extract all the connected-components from the raw binary mask. For each connected-component, a list of morphological features (i.e., extreme points, area, centroid, eccentricity, mean intensity and solidity) are computed that the algorithm is going to exploit in the following steps. In order to remove spurious connected-components with an area that is too small to be considered as a lesion, an areabased selection is performed: all the connected-components with an area smaller than 0.75 × meanArea of all the connected-components. In such a way, spurious regions with a very small area are removed, reducing the number of valid candidates to be analyzed in the following steps.
Clinical evidence allows us to state that breast lesions generally grows in an isotropic way preserving a pseudo-spherical appearance [39]. As a consequence, their particular shape can be described in terms of eccentricity, which is defined as the ratio of the distance between the foci of an ellipse and its major axis length. Eccentricity values lay in the range [0, 1]: the extreme values represent degenerate cases identifying a circle and a line segment, respectively. The aim of this post-processing step is to delete from the list of the lesion candidates the regions with eccentricity lower than 0.3 and higher than 0.9 (these threshold values were determined experimentally). Eccentricity values close to 1 relate to very elongated lesions, while eccentricity values close to 0 relate to almost perfectly round shapes. In the case of breast lesions, even if overall there are rounded lesions, the lesion has a 'lobed' trend for which the final value of the eccentricity never takes values below 0.2-0.4. Using 0.3 as eccentricity lower limit in the post-processing phase, allow us to eliminate those (almost perfectly) circular connected components with pixel values similar to the lesion (and which are therefore incorrectly selected), but which are instead part of the background. As a consequence of their tendency to be round-like shaped, breast lesions exhibit an high solidity, which is defined as the ratio between the region area and the (including the region) convex polygon area. Considering this morphological information, the lesion ROI identification can be turned into an optimization problem with the ultimate goal of maximizing ratioCentral, defined in Equation (18): where µ i is the mean intensity, s i is the solidity and i denote the eccentricity of the i-th connected-component, respectively.
The post-processing in the other slices differs from the central slice. In this case, the parameter to maximize ratioOther is defined according to Equation (19): where d 2 i represents the square of the distance between the centroid of the lesion region in the central slice and the centroids of all the connected-components in the current slice. In order to reduce false positives identification, ratioOther maximization searches for the connected-component with the centroid that is the closest to the one of the central slice mask. Recalling that breast lesions usually preserve a pseudo-spherical appearance, as the slices move away from the central one, the cross section identifying the lesion is reduced. In Equation (19) the replacement of i with d 2 i aims to avoid misclassification of spurious regions in slices different from the central one. Figure 6 shows the results of each post-processing step applied to a segmentation mask obtained by means of sFCM clustering. Unfortunately, breast lesions might be characterized by different scenario-in terms of uniformity, contrast and well-defined boundaries-which a simple thresholding cannot properly manage. Figure 7 shows two segmentation results that allow us to appreciate the lesion non-homogeneity that only clustering approaches can properly manage, thus maximizing the result accuracy.  (18) and (19). With respect to the position in the DCE-MRI sequence, all images have been rotated 90°clockwise, in order to improve the graphical representation and allow all the images, related to aspecific lesions, to be displayed in a single line.  Figures 4 and 6, a 2× zoom factor was used, that allows us to better appreciate the lesion non-homogeneity that only clustering approaches can properly manage, thus maximizing the final result accuracy.

Segmentation Performance Evaluation
Several evaluation measures are computed to quantify segmentation performances by comparing masks-obtained from analyzed unsupervised methods-and the groundtruth-provided by a radiologist, with more than 5-year experience on breast MRI, in consensus with a consultant breast radiologist (with more than 30-year experience on breast imaging). To obtain an accurate and detailed quantification, both area-based and distance-based metrics were used. The reason behind this choice is that area-based metrics strongly depend on region size and are not always able to evaluate the precision of a segmentation approach. On the other hand, distance-based metrics take into account the distance between the boundaries of the two segmentations to be compared, ignoring the actual volume difference between the two masks.

Spatial Area-Based Metrics
Spatial area-based metrics compare the semi-automatic segmented regions with the manually segmented ones (R A and R T , respectively) by calculating the overlapping percentage of area between the two masks obtained from the segmentation of the image I.
When validating the segmentation results, the two most used area-based metrics are the Dice Index (DI) and the Jaccard Index (JI), defined in Equations (20) and (21), respectively. DI and JI are used to describe how much similar the manual (ground-truth) and the semiautomatic segmentations are: the greater they are, the higher is the overlapping percentage between the two masks.
Sensitivity and specificity-defined in Equations (22) and (23)-represent the portion of positive pixels (foreground) and negative pixels (background) correctly detected by a segmentation method with respect to the ground-truth, respectively.
False Positive Ratio (FPR) and False Negative Ratio (FNR)-defined in Equations (24) and (25)-denote the presence of false positives and false negative compared to the reference region, respectively.

Spatial Distance-Based Metrics
Area-based metrics are susceptible to differences between the positions of segmented regions and strongly dependent on their own size. To take into account the spatial position of the pixels, it is necessary to quantify the distance between the boundaries computed by the semi-automatic methods and the ground-truth delineated by the expert. Let A = {a i : i = 1, 2, . . . , K} be the set of vertices belonging to the semi-automatic mask and T = {t j : i = 1, 2, . . . , N} the set of vertices belonging to the ground-truth, the distance between the i-th pixel in A and the set T is defined as: where ||a i − t j || 2 denotes the Euclidean distance between two points. Many metrics can be defined in order to quantify the similarity/dissimilarity between two segmentations.
The Mean Absolute Distance (MAD)-defined in Equation (27)-quantifies the average error in the segmentation process. The Maximum Distance (MaxD)-defined in Equation (28)-measures the maximum difference between the two ROI boundaries. The Hausdorff Distance (HD) between the point sets A and T-defined in Equation (29)measures the maximal distance from a point in the first set to a nearest point in the other one.
where h(T, A) = max It is important to point out that all the measured distances are expressed in pixels: in this way, they result will be independent from the spatial resolution among different MRI datasets (i.e., pixel spacing). Tables 3 and 4-expressed as mean ± standard deviation. As easily appreciable, the results showed that the fuzzy framework offered by FCM and sFCM reflects the intrinsic uncertainty that characterizes medical images, allowing us to achieve better segmentation results compared to the hard clustering performed by k-means. Furthermore, it is worth to note that spatial constraints taken into account by sFCM contribute to reduce the standard deviation of the final result, thus ensuring higher reliability with respect to SMRG and k-means. On the other hand, the large value in standard deviation indicates that SMRG has a high variability of the results that affects its reliability. The boxplots in Figures 8 and 9 summarize the obtained results.

Area-based metrics obtained by each segmentation algorithm are shown in
Regarding sensitivity, the results showed that the FCM-based approach offered better performance in terms of both mean value and standard deviation; on the contrary, the hard partition offered by k-means did not provide satisfying results. The specificity values showed that the clustering-based approaches obtained better performances compared to SMRG. In particular, sFCM and k-means achieved slightly better results than FCM.
Altogether, the results in Tables 3 and 4 showed that segmentation approaches based on soft-clustering techniques achieved better performance compared to SMRG and kmeans. In fact, by explicitly exploiting the fuzziness, both FCM and sFCM better handled the intrinsic uncertainty and the natural variability of medical images. On the other hand, the crisp k-means allowed us to reach satisfying results in specificity values, but without granting good performance in sensitivity.    0.14 ± 0.12 0.22 ± 0.09 Figure 10 shows some segmentation examples focusing on scenarios where the results are not particularly satisfactory in terms of FPs and FNs. In general, it is possible to note that SMRG and k-means have a greater tendency to leave out parts of the lesion (FN)-especially, when these are not uniform-and to include areas outside the lesion (FP), while the techniques based on clustering, and in particular the sFCM guarantees a better lesion detection.

Spatial Distance-Based Metrics Segmentation Results
The need for using both area-and distance-based metrics comes out in considering that area-based metrics do not take in account pixels' spatial distribution. This leads to the need to quantify the distance between the boundaries computed by the semi-automatic methods and the ground-truth. The boxplots in Figure 11 summarize the obtained results.
In terms of HD, all the presented methods share similar characteristics with a mean value of ≈2.2 and a standard deviation of ≈0. 43. This means that the boundaries of the semi-automatic masks are quite close to the manually traced ones.
The MAD metric is quite similar across the examined techniques, except for the SMRG, which shows the higher mean value and standard deviation. Clustering-based approaches, on the other hand, result more stable and precise, offering a better performance on the whole dataset.

Mean Absolute Distance Maximum Distance
Hausdorff Distance  Table 5 shows distance-based metrics obtained using the considered unsupervised segmentation approaches: lower distance values indicate better segmentation results. Observing the general trend, just a small deviation between the segmentations of the proposed methods and those of the experienced radiologist can be denoted. Furthermore, the achieved spatial distance-based indices are consistent with area-based metrics, confirming that clustering-based segmentation approaches allow us to reach better results with respect to SMRG. As shown at the top of Figure 12, FCM (with p = 6.268 × 10 −6 and p = 2.670 × 10 −7 ) and sFCM (with p = 8.566 × 10 −5 and p = 1.855 × 10 −7 ) clustering methods achieved significantly higher DI values compared to SMRG and k-means, respectively. To statistically validate the obtained results, the two-sided Wilcoxon signed rank test [50] on paired DI results was performed with the null hypothesis that the samples come from continuous distributions with equal medians (considering a significance level of 0.05). Obtained p-values are shown in Table 6. With more details, this test on paired results was used to statistically compare the distributions of the DI values achieved by two competing methods and identify significant differences. The p-values were corrected by the Bonferroni-Holm method [51] for multiple comparisons.
The values reported in Tables 3-5 represent the average value ± standard deviation over all 50 breast masses. All the values obtained on each breast mass are reported in the Supplementary Materials.

Processing Times
In order to evaluate the processing times, all segmentations were performed on the entire dataset, by calculating the average elapsed time and the corresponding standard deviation for each of the four investigated algorithms over all the analyzed images, obtaining the following values: SMRG: 2.78 ± 2.79 s; k-means: 1.35 ± 0.78 s; FCM: 1.65 ± 0.88 s; sFCM: 1.76 ± 0.83 s. These processing times were measured using the Matlab R2019b IDE (by means of the tic and toc stopwatch timer functions) running on a Windows 10 Pro general-purpose PC equipped with an Intel I7-3630QM@2.40 GHz CPU and 8 GB RAM.
As expected, SMRG had the longest processing time due to the two-stage approach and also considering the iterative processes employed during the Split-and-Merge and Region Growing executions. Interestingly, the introduction of the fuzzy logic is negligible compared to the crisp k-means, as well as the integration of the spatial constraints into the sFCM algorithm does not require a remarkable computational overhead in addition to the standard FCM clustering.
Overall, these results demonstrate the clinical feasibility of the investigated classic unsupervised methods also in terms of both computational resources and processing times, by considering that supervised CNN-based approaches for segmentation require a training phase and then an inference phase typically performed on Graphics Processing Units [18].

Difficult Cases
As previously highlighted, medical images are characterized by an intrinsic variability in which boundaries or anatomical details may be not well defined. Furthermore, noise corrupts digital images, thus affecting certain features within the original image. MRI suffers from various kinds of noise and artefacts because of the nature of the signal detection and spatial encoding [37,38]. For instance, hardware-induced errors are often caused by the complicated acquisition scheme depending on radiofrequency coils, while thermal noise can derive from transmission lines, receiver circuits and polarization magnetic field B 0 drift during the scan acquisitions. In addition, natural body motion (e.g., respiratory and cardiac motion) can degrade the image quality too. As a consequence, even after a proper pre-processing step, it is common to deal with low-contrast, noisy images. The proper lesion segmentation in this kind of images is not a trivial task and user interaction would be required to produce accurate results.

Case with Low Contrast-Enhanced Mass
In some cases, MR images are difficult to segment because the lesion itself does not appear brighter with respect to the muscle and adipose surrounding tissue. This kind of images exhibit a narrow histogram located typically toward the middle of the intensity scale, implying a washed-out grey look through the whole image. As a consequence, the meaningful partition of the original image results in a very difficult task leading to imprecise results. The opposite is true for the histogram of a high-contrast image, which covers a wide range of the intensity scale and has a pixel distribution not too far from uniform. The effect is an image that shows a great deal of gray-level detail and has a high dynamic range. As a matter of fact, a segmentation process on this kind of images will produce satisfying results with a high reproducibility. As easily appreciable in Figure 13, MR images offer a very difficult scenario where the lesion boundaries are not clearly distinguishable from the rest of the image. The manual segmentation (green contour) cannot properly segment the whole lesion because of the strong uncertainty due to the low contrast characterizing the whole image. On the other hand, the FCM segmentation (red contour) correctly identifies the lesion but, because of the low percentage of overlapping area with the manual mask, it does not ensure satisfying results.

Case with Blurred Boundary Mass
From Figure 14, it is possible to appreciate how blur severely compromises the contrast of the original image, making lesion segmentation a very challenging task. As mentioned above, because of the strong uncertainty in boundaries delineation, the manual segmentation (green contour) identifies a very simple and smooth shape into which the lesion is included. Of course, this kind of strategy allows the identification of lesion's location, but also includes into the ROI a lot of false positives. On the other hand, the semi-automatic mask (red contour) identifies a more precise region avoiding the misclassification of FP pixels. Even in this case, because of the imperfect area, overlap area-based metrics will not yield a high score.

Case with Irregular Mass
Opposed to what is reported in the literature [39], breast lesions may sometimes exhibit irregular shapes and borders with internal divisions. In these cases, breast masses could cause difficulties during the manual segmentation process: in fact, as the lesion contour becomes more irregular, the manual tracing of the ROI becomes more challenging. As a consequence, the manual segmentation of breast lesions with unusual shapes does not always match the effective lesion contour. Figure 15 exhibits one of this cases with unusual elongated breast lesion. The manual segmentation (green contour) completely cut the left portion in the upper part of the lesion which is, instead, properly included in the semiautomatic boundary (red contour). As a consequence, computer-assisted segmentation process results in a more precise mask that properly reproduces the narrowed-shape of lesion.

Discussion and Conclusions
The main objective of this work was to offer a detailed analysis of well-established classical unsupervised segmentation techniques by carefully comparing them in a real clinical application. Breast cancer is the most common cause of cancer death in women worldwide [52,53] and the second most common cancer overall [54]. Fortunately, science evolution has led to the development of medical imaging techniques, which are used to detect abnormalities in breast parenchyma. Among imaging techniques, multiparametric MRI plays a crucial role and it is widely used in clinical applications, due to its high resolution images and the ability to precisely differentiate soft tissues.
As matter of fact, as a case study, the contrast-enhancing mass delineation in breast DCE-MRI was addressed by means of four popular unsupervised segmentation methods, namely: SMRG, k-means, FCM, and sFCM. Although they represent well-known approaches in the literature, they are still widely used in clinical tools. Starting from the basic versions of these approaches, during the initial analysis, we identified the shortcomings of each of them, developing and implementing improved versions, when possible.
Nowadays, deep learning techniques represent the state-of-the-art, allowing us to achieve high performance and accurate lesion segmentation for datasets of thousands of patients [14]. Deep learning approaches for image segmentation are generally supervised techniques that require a considerable computation times and a large amount of data for training [15]. In fact, these data must be representative of all the possible scenarios in which the deep neural network could operate, not always available in small-or medium-sized hospitals, and therefore not clinically feasible. Moreover, it should be noted that the authors of [18] showed that semi-automatic approached based on classic unsupervised techniques obtained results comparable or superior to the deep CNNs (namely, SegNet and U-Net). Therefore, this study was focused on classic pattern recognition approaches with the goal of providing an in-depth analysis.
It is important to point out that, even considering all the disadvantageous aspects related to manual segmentation, the contribution of an expert radiologist still remains essential at least to validate the results obtained by means a computer-assisted approach. In fact, clinicians rely on computational approaches with interpretable results [41]. From this point of view, the classical unsupervised approaches-such as those analyzed in this work-provide this important advantage. This aspect is even more critical in deep learning architectures, where CNNs are generally adopted as 'black-box', thus making it difficult to offer a physical interpretation to the features encoded in all the intermediate CNN layers [46].
A second segmentation of the same radiologist or the segmentation of a different radiologist would have certainly allowed us to quantify the inter-/intra-operator variability of the results. Nevertheless, as already observed in [56], the mean DI was 0.81 (range 0.19-0.96). The mean DI is higher for the 'easy tumors' compared to the 'challenging tumors' (0.83 vs. 0.75, respectively, p < 0.001). The mean DI for each observer combination separately, for all tumors, ranged between 0.78 and 0.83, where the segmentations of the breast radiologist and the medical student showed the highest overlap. These results confirm that the performance achieved by the best performing methods are in line with the inter-observer agreement, also in terms of metrics variability according to the lesion types.
As further developments, we plan to investigate innovative improvements to further improve the performance with fuzzy clustering, by using (i) more sophisticated membership functions, and (ii) more advanced pre-and post-processing steps. Moreover, investigating and comparing the latest machine learning techniques, such as Generative Adversarial Networks (GANs), for unsupervised detection and segmentation [57,58] would be relevant with a sufficient amount of data for training and test. Finally, the implementation of multiparametric or multimodal approaches [59], by using different types of co-registered medical images-i.e., Diffusion Weighted Imaging (DWI) and Positron Emission Tomography (PET)/MRI-probably would allow us to improve the detection performance [60,61].
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/app12010162/s1: Table S1: Dice Index (DI) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S2: Jaccard Index (JI) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S3: Sensitivity values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S4: Specificity values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table  S5: False Positive Ratio (FPR) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S6: False Negative Ratio (FNR) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S7: Mean Absolute Distance (MAD) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S8: Maximum Distance (MaxD) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported; Table S9: Hausdorff Distance (HD) values obtained by the four investigated unsupervised methods on each of the 50 segmented breast masses on DCE-MRI. In the last row the mean value ± the standard deviation is reported.  Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of "Azienda Ospedaliera Universitaria Policlinico P.Giaccone" of Palermo, Italy (protocol code n.1/2020-15/01/2020).

Informed Consent Statement:
Retrospective data collection was approved by the Ethics Committee. The requirement for evidence of informed consent was waived because of the retrospective nature of our study.

Data Availability Statement:
The data presented in this study may be available on reasonable request from the corresponding author. The data are not publicly available due to ethical and privacy restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.