A Bayesian Neo-Normal Mixture Model (Nenomimo) for MRI-Based Brain Tumor Segmentation

Abstract: The detection of a brain tumor through magnetic resonance imaging (MRI) is still challenging when the image is of low quality. Image segmentation can be used to delineate the brain tumor area as the region of interest. In this study, we propose an improved model-based clustering approach for MRI-based image segmentation. The main contribution is the use of adaptive neo-normal distributions in the form of a finite mixture model that can handle both symmetrical and asymmetrical patterns in an MRI image. The neo-normal mixture model (Nenomimo) also resolves the limitations of the Gaussian mixture model (GMM) and the generalized GMM (GGMM), which are constrained by the short-tailed form of their distributions and their sensitivity to noise. Model estimation is done through an optimization process using the Bayesian method coupled with a Markov chain Monte Carlo (MCMC) approach, and it employs a silhouette coefficient to find the optimum number of clusters. The performance of the Nenomimo was evaluated against the GMM and the GGMM using the misclassification ratio (MCR). Finally, this study found that the Nenomimo provides better segmentation results for both simulated and real datasets, with an average MCR for MRI brain tumor image segmentation of less than 3%.


Introduction
We chose brain tumor detection as the topic for this study since a brain tumor is the 15th most deadly disease in Indonesia, a comparison that includes all types of cancer. The World Health Organization noted that during 2018, Indonesia had 5323 cases of brain and nervous system tumors, with 4229 cases of mortality [1]. This high mortality rate also occurs in Surabaya, where cases of brain tumors have been increasing from year to year. The Dr. Soetomo General Hospital (RSUD) in Surabaya provides radiological examination services for brain tumor patients, including brain scanning with 3 Tesla and 1.5 Tesla magnetic resonance imaging (MRI) scanners. The higher Tesla power provides better quality MRI images. However, the 1.5 Tesla service is still in high demand due to its low cost; moreover, this service is also covered by the Indonesian government's social security (BPJS). A high-quality MRI image is needed to provide an accurate tumor location for medical surgery. This study tried to employ the image segmentation technique to enhance brain tumor images from the 1.5 Tesla MRI scanner. Segmentation is done by separating the tumor area (known as the region of interest, or ROI) from the surrounding non-tumor area.

Neo-Normal Mixture Model
In this section, we present the mixture model based on neo-normal distributions that was applied in segmenting the MRI brain tumor images. Both the finite mixture model and the neo-normal family of distributions are explained.

Finite Mixture Model
This study used MRI images as the input data. MRI images are grayscale digitized images, in which each pixel has a gray color intensity. The grayscale intensities of the pixels describe the level of color from black to white. The lower the intensity value, the darker the gray level, and vice versa.
Suppose we have random variables Y = (Y_1, Y_2, ..., Y_N), where y_i denotes the grayscale intensity of the i-th pixel observation, and we assume that the N pixels in an image group into K clusters; then, the finite mixture model is a superposition of K specific distributions whose form is given by [11]:

f(y_i | θ, p) = Σ_{j=1}^{K} p_j f_j(y_i | θ_j),   (1)

where f(y_i | θ, p) is the mixture density with parameters θ = (θ_1, θ_2, ..., θ_K) and p = (p_1, p_2, ..., p_K), and f_j(y_i | θ_j) is the density of a specific univariate distribution with parameter θ_j. The p_j terms are the proportions of the mixture component densities, which must satisfy the constraints Σ_{j=1}^{K} p_j = 1 and 0 ≤ p_j ≤ 1, j = 1, 2, ..., K.
The likelihood of the N data can be written as:

L(θ, p | y) = Π_{i=1}^{N} f(y_i | θ, p) = Π_{i=1}^{N} Σ_{j=1}^{K} p_j f_j(y_i | θ_j),   (2)

where Y_i and Y_n are assumed to be conditionally independent for all i ≠ n and to come from identical distributions. This formulation is a general finite K-component mixture model. In a specific way, we can assign a certain distribution to the form of f_j(y_i | θ_j). If it is a normal density, the model becomes a normal or Gaussian mixture model. Likewise, if f_j(y_i | θ_j) is a neo-normal density, the model becomes a neo-normal mixture model, or the Nenomimo.
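The mixture density of Equation (1) and the likelihood of Equation (2) can be sketched in a few lines of code. The following is a minimal Python illustration (the paper's published implementation is in MATLAB); the function names and the Gaussian example component are illustrative, not part of the paper:

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Gaussian density, used here as an example component f_j(y | theta_j)."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(y, weights, params, component_pdf):
    """Mixture density of Equation (1): f(y | theta, p) = sum_j p_j f_j(y | theta_j)."""
    y = np.asarray(y, dtype=float)
    dens = np.zeros_like(y)
    for p_j, theta_j in zip(weights, params):
        dens += p_j * component_pdf(y, *theta_j)
    return dens

def mixture_loglik(y, weights, params, component_pdf):
    """Log-likelihood of Equation (2) for N conditionally independent pixels."""
    return float(np.sum(np.log(mixture_pdf(y, weights, params, component_pdf))))

# Two well-separated intensity clusters, as in a dark background / bright ROI image.
y = np.array([10.0, 12.0, 11.0, 200.0, 205.0])
weights = [0.6, 0.4]                      # mixture proportions p_j, summing to 1
params = [(11.0, 2.0), (202.0, 3.0)]      # (mu_j, sigma_j) per component
ll = mixture_loglik(y, weights, params, normal_pdf)
```

Replacing `normal_pdf` with a neo-normal density turns this GMM into a Nenomimo evaluation, which is exactly the substitution described in the text.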

Neo-Normal Distribution
The neo-normal distribution family consists of adaptive distributions that have symmetrical properties but can also accommodate skewed or asymmetrical patterns; moreover, they can be sharper or gentler than the normal [18]. These distributions deviate slightly from the normal or Gaussian distribution but set their mode as the location parameter. Distributions included in the neo-normal family are the exponential power distribution and the Azzalini skew normal distribution. The exponential power distribution is similar to the Gaussian distribution, but it can be more peaked and has a thicker tail [19]. The Azzalini skew normal distribution can accommodate symmetrical or skewed characteristics of a distribution, but its mode is not stable at its location parameter [20].
Fernandez and Steel [15] overcame the weakness of Azzalini's distribution by introducing another skew normal distribution. This distribution maintains the stability of the mode at its location parameter, whether symmetrical or skewed, by adding a transformation parameter. This distribution is called the FSSN. Another distribution that is also stable in mode is the MSNBurr by Iriawan [16]. The advantage of using the MSNBurr is that it can accommodate long-tailed data, as in an MRI image's pattern. This study used the FSSN and MSNBurr distributions to build the mixture models. The cluster centers are more robust since these distributions are stable in their modes.
Supposing the random variable Y_i follows the FSSN distribution, Iriawan et al. [17] provide the density as:

f(y_i | μ, σ, γ) = (2 / (γ + 1/γ)) (1/σ) [φ((y_i − μ)/(σγ)) I(y_i ≥ μ) + φ(γ(y_i − μ)/σ) I(y_i < μ)],   (3)

where φ(·) is the standard normal density, μ is a location parameter, σ is a dispersion parameter, and γ is a skewness parameter. This distribution transforms the symmetrical normal distribution using inverse scale factors on the positive and negative axes. It loses its symmetry when γ ≠ 1: it skews to the left when γ < 1 and to the right when γ > 1. By substituting Equation (3) as f_j(y_i | θ_j) in Equation (1), we obtain the FSSN mixture model.

The MSNBurr distribution was designed as a skewed distribution that keeps the mode of the Burr II distribution at its location parameter for all values of its skewness parameter, γ. Iriawan [16] defined the density function (p.d.f.) of the MSNBurr family as:

f(y_i | γ, μ, σ) = (κ/σ) exp(−κ(y_i − μ)/σ) [1 + (1/γ) exp(−κ(y_i − μ)/σ)]^{−(γ+1)},   (4)

where μ and σ are location and dispersion parameters, respectively, and κ is a scale factor whose value is given by Equation (5):

κ = (1/√(2π)) (1 + 1/γ)^{γ+1}.   (5)

The distribution in Equation (4) is called the MSNBurr(γ, μ, σ). The MSNBurr mixture model is built by substituting Equation (4) as f_j(y_i | θ_j) in Equations (1) and (2) with parameter θ_j = (γ_j, μ_j, σ_j, κ_j).
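Both densities can be checked numerically; both integrate to one and keep their mode at μ for any γ. The following Python sketch (the paper's code is MATLAB) implements the two densities; the form of κ used here is an assumption taken from the MSNBurr literature:

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def phi(t):
    """Standard normal density."""
    return np.exp(-0.5 * t ** 2) / SQRT_2PI

def fssn_pdf(y, mu, sigma, gamma):
    """Fernandez-Steel skew normal density (Equation (3)): the two sides of the
    mode mu are rescaled by gamma and 1/gamma, so the mode stays at mu."""
    z = (np.asarray(y, dtype=float) - mu) / sigma
    c = 2.0 / (gamma + 1.0 / gamma)
    return (c / sigma) * np.where(z >= 0.0, phi(z / gamma), phi(z * gamma))

def msnburr_pdf(y, mu, sigma, gamma):
    """MSNBurr density (Equations (4) and (5)); kappa is the scale factor that
    keeps the mode at mu (form of kappa assumed from the MSNBurr literature)."""
    kappa = (1.0 / SQRT_2PI) * (1.0 + 1.0 / gamma) ** (gamma + 1.0)
    z = kappa * (np.asarray(y, dtype=float) - mu) / sigma
    e = np.exp(-z)
    return (kappa / sigma) * e * (1.0 + e / gamma) ** (-(gamma + 1.0))
```

At γ = 1 both densities are symmetric about μ; γ > 1 skews them to the right and γ < 1 to the left, while the mode remains at μ, which is the stability property exploited by the Nenomimo.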

Bayesian Coupled with Markov Chain Monte Carlo Approach
The fully Bayesian method coupled with the MCMC approach was employed to estimate the model parameters and conduct inferences. Bayesian MCMC estimates parameters using the full conditional posterior distributions through a sample average approach that is stochastically optimized [21]. In this section, we present Bayesian inference for the FSSN and MSNBurr mixture models and construct the algorithms.
Suppose N denotes the number of pixels of the MRI image and K denotes the number of mixture components; then, we assume that each pixel i belongs to a single cluster indexed by the label z_i = (z_i1, z_i2, ..., z_iK). For each observation, the label z_i is assumed to be independent, and its value is given by:

z_ij = 1 if pixel i belongs to cluster j, and z_ij = 0 otherwise.   (6)

For each z_i, exactly one of the z_ij equals 1 and the rest are 0; therefore, for a given i, Σ_{j=1}^{K} z_ij = 1. The single-trial result for each z_i falls in exactly one of the K possible mixture components with probabilities p_1, p_2, ..., p_K, p_j ∈ (0, 1), j = 1, 2, ..., K. Since the z_i are independent, the joint density of the variable is:

f(z_i | p) = Π_{j=1}^{K} p_j^{z_ij}.   (7)
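Drawing the one-hot labels of Equations (6) and (7) is a single multinomial trial per pixel. A minimal Python sketch (function name illustrative; the paper's code is MATLAB):

```python
import numpy as np

def sample_labels(N, p, rng):
    """Draw the one-hot label vectors z_i of Equations (6)-(7): each row is a
    single multinomial trial over K components with probabilities p_1..p_K."""
    return rng.multinomial(1, p, size=N)

rng = np.random.default_rng(42)
Z = sample_labels(10000, [0.2, 0.3, 0.5], rng)   # 10000 pixels, K = 3 clusters
```

Each row of `Z` contains exactly one 1, so summing over j recovers the constraint Σ_j z_ij = 1 used throughout the Gibbs sampler.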

Let Y = (Y_1, Y_2, ..., Y_N) be an i.i.d. random sample drawn from the neo-normal mixture density. The likelihood of the mixture density is given by Equation (2). By incorporating the labels z_i into the likelihood, we can rewrite the likelihood as:

f(y, z | θ, p) = Π_{i=1}^{N} Π_{j=1}^{K} [p_j f_j(y_i | θ_j)]^{z_ij},   (8)

where θ_j is the set of neo-normal parameters.

Bayesian Approach for FSSN Mixture Model
The Bayesian approach was employed to perform the estimation, owing to the non-closed form of the posterior distribution, which is the product of the likelihood function in Equation (8) and the prior distributions. We designed the prior distributions for the FSSN mixture model (FSSN-MM) following Iriawan et al. [17]. The prior distributions of mixture parameters were discussed by Gelman et al. [22].
The values of the hyperparameters were calculated from the data: η_j = mean(y_{i∈j}) and ϕ_j = var(y_{i∈j}), where y_{i∈j} represents all data belonging to the jth cluster. We also set α_j = ϕ_j, β_j = η_j/(α_j − 1), a_j = η_j²/ϕ_j, and b_j = η_j/ϕ_j. By treating these priors as independent and combining them with the likelihood, the joint posterior can be constructed easily by multiplying them together. This derivation and the Gibbs sampling algorithm for the FSSN mixture model were adopted from Iriawan et al. [17].
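The data-driven hyperparameter rules above are straightforward to compute per cluster. A small Python sketch (the function name and dictionary layout are illustrative, not from the paper):

```python
import numpy as np

def fssn_hyperparameters(y, labels, K):
    """Per-cluster hyperparameters as described in the text:
    eta_j = mean and phi_j = variance of the data assigned to cluster j, then
    alpha_j = phi_j, beta_j = eta_j / (alpha_j - 1),
    a_j = eta_j**2 / phi_j, b_j = eta_j / phi_j."""
    hyper = {}
    for j in range(K):
        y_j = np.asarray(y, dtype=float)[np.asarray(labels) == j]
        eta, phi = float(y_j.mean()), float(y_j.var())
        hyper[j] = {"eta": eta, "phi": phi, "alpha": phi,
                    "beta": eta / (phi - 1.0),
                    "a": eta ** 2 / phi, "b": eta / phi}
    return hyper

# Grayscale intensities of two rough clusters (dark background vs. bright ROI).
y = [10, 12, 14, 11, 200, 210, 205, 195]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
hyper = fssn_hyperparameters(y, labels, 2)
```

Note that β_j is only well defined when ϕ_j > 1, which holds comfortably for grayscale intensity data on the 0-255 scale.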
The full conditional posterior was derived to generate the parameter values. It is obtained by multiplying the likelihood and the prior distributions of the parameters being estimated [23]. The full conditional posterior of each parameter is given below (for simplicity, all equations are written in logarithmic form, and all parts of an equation that do not contain the estimated parameter are collected into a constant).

Bayesian Approach for MSNBurr Mixture Model
The prior distributions for the MSNBurr were discussed by Iriawan [16]. The proportion of the mixture components follows the Dirichlet distribution, as in the FSSN-MM, and the prior distributions for the remaining parameters of the MSNBurr mixture model (MSNBurr-MM) were specified analogously. The hyperparameter values are given as in the FSSN-MM; however, we set the hyperparameters for the prior of γ_j to τ_j = ϕ_j/2, l_j = 0, and u_j = 2. By setting these priors as independent priors and multiplying them with the likelihood, the joint posterior can be found easily (see Appendix A). The full conditional posterior distributions for each parameter are discussed below. For simplicity, the formulas are shown as logarithms (some derivations are shown in Appendix B), and all parts of an equation that do not contain the estimated parameter are collected into a constant.
(e) The full conditional posterior of the label z_ij follows the multinomial distribution, j = 1, 2, ..., K, as in Equation (13).
The Gibbs sampling algorithm was constructed with a full conditional posterior for each parameter. The step of Gibbs sampling for the MSNBurr-MM is presented in Algorithm 2.

Both Algorithm 1 and Algorithm 2 were used as a basis for programming via MATLAB. To generate samples in both algorithms, we used the acceptance-rejection sampling method. All programs were run on a computer with an Intel Core i7 processor, 8 GB RAM, and a 256 GB SSD, without a GPU and VRAM. For the sake of reproducibility of our results, we made our MATLAB code for the MSNBurr-MM available online at https://github.com/anindya364/nenomimo.
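The acceptance-rejection step mentioned above can be sketched generically: draw from an easy proposal and accept with probability equal to the density ratio scaled by a bound M. A minimal Python illustration with a toy Beta(2, 2) target (the actual targets in Algorithms 1 and 2 are the full conditional posteriors; names here are illustrative):

```python
import numpy as np

def acceptance_rejection(target_pdf, proposal_draw, proposal_pdf, M, n, rng):
    """Acceptance-rejection sampling: draw x from the proposal and accept it
    when u * M * proposal_pdf(x) <= target_pdf(x), where u ~ Uniform(0, 1)
    and M bounds the ratio target_pdf / proposal_pdf."""
    samples = []
    while len(samples) < n:
        x = proposal_draw(rng)
        if rng.uniform() * M * proposal_pdf(x) <= target_pdf(x):
            samples.append(x)
    return np.array(samples)

# Toy example: sample a Beta(2, 2) target from a Uniform(0, 1) proposal.
target = lambda x: 6.0 * x * (1.0 - x)   # Beta(2,2) density; maximum 1.5 at x = 0.5
rng = np.random.default_rng(1)
draws = acceptance_rejection(target, lambda r: r.uniform(),
                             lambda x: 1.0, 1.5, 5000, rng)
```

The same routine works for any full conditional that can be evaluated pointwise, which is why it pairs naturally with the Gibbs steps of Algorithms 1 and 2.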

Cluster Validation and Comparison Tools
Cluster validation was done by employing the silhouette coefficient (SC). The value of the SC determines the optimum number of clusters. SC_i^j is formulated in Equation (17) [24]:

SC_i^j = (ξ_i^j − ζ_i^j) / max(ζ_i^j, ξ_i^j),   (17)

where SC_i^j is the SC value for the ith data point in the jth cluster, ζ_i^j is the average squared Euclidean distance denoting how similar the ith data point is to its own cluster, and ξ_i^j is the average squared Euclidean distance from the ith data point to the nearest neighboring cluster. The overall silhouette coefficient (OSC) is obtained by averaging SC_i^j over all data, as in Equation (19). The OSC has a range of values from −1 to 1. The closer the OSC is to 1, the more likely the data are to belong in their clusters.
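For one-dimensional grayscale intensities, the silhouette computation of Equation (17) can be sketched directly in Python (function names illustrative; a naive O(N²) loop, not the paper's implementation):

```python
import numpy as np

def silhouette_values(y, labels):
    """Silhouette of Equation (17) for 1-D data with squared Euclidean distance:
    SC_i = (xi_i - zeta_i) / max(zeta_i, xi_i), where zeta_i is the average
    squared distance to i's own cluster and xi_i that to the nearest other one."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    K = int(labels.max()) + 1
    n = len(y)
    sc = np.zeros(n)
    for i in range(n):
        d = (y - y[i]) ** 2
        own = (labels == labels[i]) & (np.arange(n) != i)
        zeta = d[own].mean()
        xi = min(d[labels == j].mean() for j in range(K) if j != labels[i])
        sc[i] = (xi - zeta) / max(zeta, xi)
    return sc

def osc(y, labels):
    """Overall silhouette coefficient: the average silhouette over all data."""
    return float(silhouette_values(y, labels).mean())
```

Computing `osc` for each candidate K and taking the maximizer gives the optimum number of clusters, as done in the study.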
The comparison measurement tool used for the segmentation results was the misclassification ratio (MCR). The MCR measures the differences between the segmentation results and the ground truth (GT) image. The smaller the value of the MCR, the better the segmentation results [17]. The formula of the MCR is given by Equation (20):

MCR = (number of misclassified pixels) / (total number of pixels).   (20)
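Equation (20) is a one-line computation once the segmentation and GT label maps share the same shape. A minimal Python sketch (function name illustrative):

```python
import numpy as np

def mcr(segmented, ground_truth):
    """Misclassification ratio of Equation (20): the fraction of pixels whose
    segmented label differs from the ground-truth (GT) label."""
    segmented = np.asarray(segmented)
    ground_truth = np.asarray(ground_truth)
    return float(np.mean(segmented != ground_truth))

# 1 misclassified pixel out of 8 gives MCR = 0.125.
seg = np.array([[0, 0, 1, 1], [0, 0, 1, 0]])
gt = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
```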

Result and Discussion
In this section, we compare the proposed models with the GMM by Rasmussen [10] and the GGMM by Deledalle et al. [14]. The GMM and the GGMM represented the model-based normal distribution, while the FSSN-MM and the MSNBurr-MM represented the Nenomimo.

Application for Data Simulation
The data simulation used in this study involved synthetic images and natural images. For the synthetic images, the original images were also used as the GT. To make the segmentation more challenging, we added noise (Gaussian, salt-and-pepper, and speckle noise) to each image. The synthetic images (SI01 and SI02) and natural images (NI01 and NI02) with their GT are visualized in Figure 1.
The segmentation results of each method for SI01, SI02, NI01, and NI02 are shown in Figures 2-5. The methods were run with two clusters for SI01 and NI01, and with three clusters for SI02 and NI02.
The accuracy of segmentation results was shown by the MCR. As visualized in Figure 6, the minimum MCR was shown by the Nenomimo (FSSN-MM and MSNBurr-MM). For different types of noise, the Nenomimo gave a better performance in segmenting both the synthetic images and natural images. This indicated that the Nenomimo is more robust in handling noise than the GMM and the GGMM. Overall, for the simulation dataset, the Nenomimo segmented the images more precisely than the model-based normal distribution.

The next application was for a real dataset. The MRI-based brain tumor segmentation was also done under the Nenomimo, the GMM, and the GGMM.

Application for MRI-Based Brain Tumor
The real datasets were from the Dr. Soetomo General Hospital (RSUD), Surabaya. The MRI sequences in this study were T1 MEMP + C and T2 FLAIR, both taken from the axial point of view. T1 MEMP + C was the MRI sequence in which the tumor area was enhanced by contrast media, a material given to patients, either orally or by injection, before the MRI scan. T2 FLAIR was the MRI sequence without contrast media; this sequence visualized the swelling or edema of the brain tumor more visibly; therefore, the ROI of T2 FLAIR was usually larger than that of T1 MEMP + C.

We tried to segment about 75 images from the T1 MEMP + C and T2 FLAIR sequences. The tumor area was set as the ROI, while the other areas were set as non-ROI. All sequences used in this study passed medical approval, and the GT was also established under medical judgment. All MRI images were analyzed under the normal and neo-normal models. This paper displays only four sample images. The sample datasets are visualized in Figure 7, while their histograms are shown in Figure 8. Figure 7a,c includes samples from the T1 MEMP + C sequence, while Figure 7b,d includes samples from the T2 FLAIR sequence. The sample images are named IM01, IM02, IM03, and IM04. The first procedure applied was preprocessing, which was done to eliminate the fat around the skull, which could have interfered with the segmentation process. Figure 8 shows the image histograms after preprocessing. Unlike Figure 8b-d, the multimodality in Figure 8a is not clearly visible since the frequencies of the other modes are very small. Due to the multimodality present in the MRI data patterns, applying the mixture model framework was considered appropriate for this case.
The 75 datasets were segmented under the GMM, the GGMM, the FSSN-MM, and the MSNBurr-MM. The OSC was calculated to provide the optimum number of clusters. Table 1 shows a sample of the OSC values for IM01-04. The optimum value of the OSC led to the optimum number of clusters. The optimum number of clusters under the Nenomimo was K = 3 for IM01 and IM03, while IM02 and IM04 reached the optimum at K = 4. Various results were found with the GMM: for IM01, the optimum number of clusters was K = 7, while IM02 and IM04 reached the optimum at K = 4, and IM03 at K = 5. The GGMM had its cluster optimum at K = 4 for IM01 and IM04, K = 5 for IM02, and K = 3 for IM03. The segmentation results for the optimum clusters are visualized in Figure 9. Based on the segmentation results in Figure 9, we can see that the ROI from the GMM still contained more noise than those from the GGMM, the FSSN-MM, and the MSNBurr-MM. For an empirical comparison, the MCR was computed for all 75 datasets. Figure 10 shows the comparison of the MCR for all datasets using the different methods. The MCR values in Figure 10 show that the minimum was reached by the MSNBurr-MM, while the FSSN-MM gave similar results. This indicated that the Nenomimo has a better performance than the models based on the normal distribution. The average MCR for the FSSN-MM was about 0.02761, while the MSNBurr-MM had an average MCR of about 0.02644. These values show that the Nenomimo could segment the real dataset with a misclassification ratio of less than 3%.
The parameters of the Nenomimo for the optimum clusters are shown in Table 2, in which the ROI for each dataset is denoted by the last cluster. For example, in IM01 under the FSSN-MM, the ROI had a cluster center at a grayscale intensity of 165 with an intensity dispersion of about 6.719, and the pixels in the ROI made up about 5.4% of the whole MRI area. Different results were provided by the MSNBurr-MM, which had a cluster center at a grayscale intensity of about 163, an intensity dispersion of 7.138, and ROI pixels making up just 6.3% of the MRI area. Both the FSSN-MM and the MSNBurr-MM had skewness parameters γ > 1 in the first and third clusters, indicating that the distribution of the grayscale intensities was right-skewed; the skewness parameter γ < 1 in the second cluster of IM01 indicated a left-skewed distribution. Figure 11 shows the historical traces of the generated Nenomimo parameters over the iterations.
Under a computer specification of an Intel Core i7 processor, 8 GB RAM, and a 256 GB SSD, without a GPU and VRAM, the average running time for clustering the MRI images was about six minutes for the FSSN-MM and about four minutes for the MSNBurr-MM. From Figure 11, we can see that the parameters of both the FSSN-MM and the MSNBurr-MM reached convergence quickly; the steady state of the historical means plotted in Figure 11 was reached before 100 iterations.
Figure 11. Historical traces of the generated Nenomimo parameters for the sample MRI images.

Conclusions
In this paper, we presented an improved method for image segmentation, namely the neo-normal mixture model (Nenomimo). The parameter estimation of the Nenomimo employs the Bayesian method coupled with an MCMC framework via Gibbs sampling. The algorithm ran well on a computer with an Intel Core i7 processor, 8 GB RAM, and a 256 GB SSD, without a GPU and VRAM, and it reached convergence quickly. The proposed method successfully overcame the limitations of mixture models based on the normal distribution. The Nenomimo gave a better performance than the GMM and the GGMM on both simulated and real datasets. For segmenting MRI brain tumor images, the Nenomimo showed powerful results, with a misclassification ratio of less than 3%.
In the present work, the segmentation results for MRI brain tumors using the Nenomimo still contained a little noise, which was misidentified as ROI. Therefore, for future research, it is recommended to develop a Nenomimo model that considers spatial dependencies between pixels for image segmentation.

Appendix B. Some Direction for the Derivation of Full Conditional Posterior
Appendix B gives an example of the derivation of the full conditional posterior for one of the MSNBurr-MM parameters, namely μ_j, j = 1, 2, ..., K. The full conditional posterior of μ_j for a certain jth cluster is given by:

f(μ_j | σ_j, γ_j, p_j, z, y) ∝ [Π_{i=1}^{N} f_j(y_i | μ_j, σ_j, γ_j)^{z_ij}] f(μ_j).   (A2)

To simplify the derivation, we process Equation (A2) in logarithmic form as follows:

ln f(μ_j | σ_j, γ_j, p_j, z, y) = Σ_{i=1}^{N} z_ij ln f_j(y_i | μ_j, σ_j, γ_j) + ln f(μ_j) + C.   (A3)

Equation components that do not contain μ_j can be collected into the constant C, simplifying Equation (A3) further. In the same way, the full conditional posteriors for parameters σ_j and γ_j can be obtained.
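The log full conditional of Equation (A3) is easy to evaluate pointwise, which is all the acceptance-rejection step inside the Gibbs sampler needs. A Python sketch (the paper's code is MATLAB; the normal prior on μ_j and the form of κ are assumptions for illustration):

```python
import numpy as np

def msnburr_logpdf(y, mu, sigma, gamma):
    """Log of the MSNBurr density of Equation (4); the form of kappa is
    assumed from the MSNBurr literature."""
    kappa = (1.0 / np.sqrt(2.0 * np.pi)) * (1.0 + 1.0 / gamma) ** (gamma + 1.0)
    z = kappa * (np.asarray(y, dtype=float) - mu) / sigma
    return np.log(kappa / sigma) - z - (gamma + 1.0) * np.log1p(np.exp(-z) / gamma)

def log_full_conditional_mu(mu_j, y, z_j, sigma_j, gamma_j, prior_mu, prior_var):
    """ln f(mu_j | sigma_j, gamma_j, z, y) up to an additive constant, as in
    Equation (A3): the z_ij-weighted log-likelihood of cluster j plus the
    log-prior of mu_j (a normal prior is assumed here for illustration)."""
    loglik = float(np.sum(np.asarray(z_j) * msnburr_logpdf(y, mu_j, sigma_j, gamma_j)))
    logprior = -0.5 * (mu_j - prior_mu) ** 2 / prior_var
    return loglik + logprior
```

Evaluating this function over a grid of μ_j values shows a single peak near the center of the data assigned to cluster j, which is the behavior the Gibbs step exploits.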