Robust Mixture Modeling Based on Two-Piece Scale Mixtures of Normal Family

In this paper, we examine the finite mixture (FM) model whose components belong to a flexible class of two-piece distributions based on the scale mixtures of normal (TP-SMN) family. This family allows robust estimation of FM models. The TP-SMN is a rich class of distributions that covers symmetric/asymmetric and light/heavy-tailed distributions. It represents an alternative to the well-known scale mixtures of the skew normal (SMSN) family studied by Branco and Dey (2001). The TP-SMN family contains the SMN family (normal, t, slash, and contaminated normal distributions) as its symmetric members and their two-piece versions as its asymmetric members. A key feature of this study is the use of a suitable hierarchical representation of the family to obtain maximum likelihood estimates of the model parameters via an EM-type algorithm. The performance of the proposed robust model is demonstrated using simulated and real data and compared with that of finite mixtures of SMSN models.


Introduction
Finite mixture models are in high demand in machine learning because of their useful properties, their computational tractability, and their ability to approximate continuous densities well [1]. They are also an important statistical tool for many applications in clustering, discriminant analysis, image processing and satellite imaging [2]. Beyond the well-known results for the finite mixture of normal distributions (FM-NOR) model [1], recent developments cover symmetric/asymmetric and light/heavy-tailed distributions. One of these is the class of finite mixture of multivariate skew-normal (FM-SN) models [3,4], which offers some advantages over normal mixtures: although normal components can model any distribution arbitrarily closely as the number of components increases, in the context of supervised learning, groups of observations with asymmetrically distributed data can then lead to wrong classification. The components of skew-normal mixture models, however, capture skewness directly due to their flexibility [1]. In addition, the FM-SN model has been extended to the robust finite mixture of skew-t (FM-ST) model in the influential works of [3,5–7]. The FM-ST components capture both skewness and extreme observations due to their flexibility [8].
The SMSN family is a rich and very flexible class of distributions that covers light/heavy-tailed distributions, e.g., the skew-normal (SN), skew-t (ST), skew-slash (SSL) and skew contaminated-normal (SCN) distributions, and it has been widely used in many statistical models, especially FM models (see e.g., [5,9–15]). The SMSN family is a skewed extension of the well-known symmetric scale mixtures of normal (SMN) family, which contains the light/heavy-tailed members normal (N), t (T), slash (SL) and contaminated-normal (CN) [16]. Lange et al. [17], Lange and Sinsheimer [18], and Maleki and Nematollahi [19] used the SMN family for robust statistical modeling. A two-piece distribution, built from a symmetric distribution with different scales on the two sides of its location, is an alternative approach to modeling atypical data (see e.g., [10,20–24]). In our approach, we use two-piece distributions based on the SMN family. This family, called the two-piece distributions based on the scale mixtures of normal (TP-SMN) family, is analogous to the SMSN family and contains the light/heavy-tailed members two-piece normal (TP-N), two-piece t (TP-T), two-piece slash (TP-SL) and two-piece contaminated-normal (TP-CN).
In this paper, we consider the TP-SMN family of distributions as a two-component mixture of truncated SMN distributions on a particular two-set partition of the real line (R), and then propose the finite mixture of this family, called the FM-TP-SMN model. It represents an alternative to the well-known scale mixtures of skew normal (SMSN) family studied by [25]. We also use a hierarchical representation of the FM-TP-SMN model and implement an expectation-maximization (EM)-type algorithm to find the maximum likelihood (ML) estimates of the proposed model. Studies by [21,23] show that truncating the distribution on the two sets of the partition makes it possible to obtain a better fit to the empirical distribution, because the underlying process of the complete likelihood is modeled. In this way, the "two-piece" model is a direct competitor of the FM-SMSN family of distributions [21].
The rest of this paper is organized as follows. In Section 2, we review some main properties of the TP-SMN family and represent this family as a two-component mixture of truncated SMN distributions. In Section 3, the FM-TP-SMN model is introduced and the ML estimates of the model parameters are obtained via an EM-type algorithm. In Section 4, numerical studies and applications of the proposed models and estimates are considered. Some conclusions and ideas for future research are offered in Section 5.

The Two-Piece Scale Mixtures of Normal Distributions
In this section, we analyze some necessary properties of the TP-SMN family of distributions for our proposed FM model.
The well-known SMN family introduced by [16], which is the basis of the robust asymmetric TP-SMN family, has the following probability density function (PDF) and stochastic representation. Let X ∼ SMN(µ, σ, ν); then its PDF is
$$f_{SMN}(x \mid \mu, \sigma, \nu) = \int_0^{\infty} \phi(x \mid \mu, u^{-1}\sigma^2)\, dH(u \mid \nu), \quad x \in \mathbb{R}, \tag{1}$$
and its stochastic representation is
$$X = \mu + \sigma U^{-1/2} W, \tag{2}$$
where φ(·|µ, σ²) represents the density of the N(µ, σ²) distribution, H(·|ν) is the cumulative distribution function (CDF) of the scale mixing random variable U, which can be indexed by a scalar or vector of parameters ν, and W is a standard normal random variable that is independent of U.
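The stochastic representation of an SMN variable, a normal variate rescaled by a positive mixing variable, can be sketched in Python. This is a minimal sketch, assuming the usual form X = µ + σ U^(−1/2) W and the standard mixing laws for each member (U = 1 for the normal, a Gamma(ν/2, ν/2) variable for the t, a Beta(ν, 1) variable for the slash, and a two-point variable for the contaminated normal). The function names and the `(p, c)` encoding of the contaminated-normal parameters are hypothetical and are not taken from the authors' R code.

```python
import math
import random

def sample_mixing_variable(family, nu, rng):
    """Draw the scale mixing variable U for one SMN member (assumed laws)."""
    if family == "normal":
        return 1.0                                  # degenerate U = 1
    if family == "t":
        # U ~ Gamma(shape = nu/2, rate = nu/2); gammavariate takes a scale
        return rng.gammavariate(nu / 2.0, 2.0 / nu)
    if family == "slash":
        # U ~ Beta(nu, 1), drawn by inverting its CDF u**nu on (0, 1]
        return (1.0 - rng.random()) ** (1.0 / nu)
    if family == "cn":
        # hypothetical encoding nu = (p, c): U = c with probability p, else 1
        p, c = nu
        return c if rng.random() < p else 1.0
    raise ValueError(f"unknown family: {family}")

def sample_smn(mu, sigma, family, nu=None, rng=random):
    """One draw of X = mu + sigma * U**(-1/2) * W with W ~ N(0, 1)."""
    u = sample_mixing_variable(family, nu, rng)
    w = rng.gauss(0.0, 1.0)
    return mu + sigma * w / math.sqrt(u)
```

Small values of U inflate the variance of a draw, which is how the heavy-tailed members generate extreme observations.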
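The TP-SMN is a rich family of distributions that covers the asymmetric light-tailed TP-N (also called the epsilon-skew-normal; [26]), the asymmetric heavy-tailed TP-T, TP-SL and TP-CN distributions, and their corresponding symmetric members. Note that the symmetric members of both the TP-SMN and SMSN classes form the SMN family. In terms of density, for y ∈ R this family can be represented as
$$f_{TP\text{-}SMN}(y \mid \mu, \sigma, \gamma, \nu) = \begin{cases} 2(1-\gamma)\, f_{SMN}(y \mid \mu, \sigma(1-\gamma), \nu), & y \le \mu, \\ 2\gamma\, f_{SMN}(y \mid \mu, \sigma\gamma, \nu), & y > \mu, \end{cases} \tag{3}$$
where 0 < γ < 1 is the slant parameter and f_SMN(·|µ, σ, ν) is given by (1), with U the scale mixing random variable in (2); this is denoted by Y ∼ TP-SMN(µ, σ, γ, ν). Equivalently, the density is a two-component mixture of truncated SMN densities,
$$f_{TP\text{-}SMN}(y \mid \mu, \sigma, \gamma, \nu) = (1-\gamma)\, 2 f_{SMN}(y \mid \mu, \sigma(1-\gamma), \nu)\, I_{(-\infty, \mu]}(y) + \gamma\, 2 f_{SMN}(y \mid \mu, \sigma\gamma, \nu)\, I_{(\mu, \infty)}(y). \tag{4}$$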
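By using auxiliary (latent) variables S_j, j = 1, 2, for the components of the mixture in Equation (4), the TP-SMN random variable has the following stochastic representation:
$$Y \mid S_1 = 1 \sim SMN(\mu, \sigma(1-\gamma), \nu)\, I_A(y), \qquad Y \mid S_2 = 1 \sim SMN(\mu, \sigma\gamma, \nu)\, I_{A^c}(y), \tag{5}$$
where A = (−∞, µ), SMN(·)I_A(·) denotes the SMN distribution truncated to the interval A, and S = (S_1, S_2) has a multinomial distribution with the following probability mass function (PMF):
$$P(S = s) = (1-\gamma)^{s_1} \gamma^{s_2}, \quad s_j \in \{0, 1\}, \; s_1 + s_2 = 1, \tag{6}$$
which is denoted by S ∼ Multinomial(1, 1 − γ, γ).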
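The latent-variable construction can be illustrated for the lightest-tailed member, the two-piece normal. This is a minimal sketch under a loudly stated assumption: the left piece is taken to have weight 1 − γ and scale σ(1 − γ), and the right piece weight γ and scale σγ, which is one common two-piece parameterization and may differ from the authors' exact convention. The function name is hypothetical.

```python
import random

def sample_tp_normal(mu, sigma, gamma, rng=random):
    """One draw from a two-piece normal via a latent piece label.

    Assumed parameterization (not confirmed by the text): the left piece
    has weight 1 - gamma and scale sigma * (1 - gamma); the right piece has
    weight gamma and scale sigma * gamma.
    """
    half_normal = abs(rng.gauss(0.0, 1.0))      # |W| with W ~ N(0, 1)
    if rng.random() < 1.0 - gamma:              # latent label selects the left piece
        # a normal truncated to (-inf, mu) is mu minus a scaled half-normal
        return mu - sigma * (1.0 - gamma) * half_normal
    # otherwise the right piece: a normal truncated to (mu, inf)
    return mu + sigma * gamma * half_normal
```

Under this assumption the share of draws below µ is about 1 − γ, so γ < 1/2 places more mass on the left. A heavy-tailed member would additionally divide the half-normal by the square root of a draw of the mixing variable U.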

Finite Mixtures TP-SMN
In this section, we introduce the finite mixture of TP-SMN (FM-TP-SMN) model and obtain the ML estimates of this model's parameters.
Concerning the parameter ν_j of the mixing distribution H(·|ν_j), for j = 1, …, g, it is worth noting that it can be a vector of parameters, e.g., for the contaminated normal distribution. Thus, for computational convenience we assume that ν_1 = ⋯ = ν_g = ν (see also [5]).
In terms of the components of the mixture, Equation (7) can be equivalently obtained by
$$Y \mid Z_j = 1 \sim \text{TP-SMN}(\mu_j, \sigma_j, \gamma_j, \nu), \quad j = 1, \ldots, g, \tag{8}$$
where Z = (Z_1, …, Z_g) ∼ Multinomial(1, π_1, …, π_g) is a multinomial (component-label) vector with probability mass function
$$P(Z = z) = \pi_1^{z_1} \cdots \pi_g^{z_g}.$$
Since only one component of Z can be equal to one (the remaining ones are zero), the events {Z_j = 1} and {Z_j = 1, Z_r = 0; ∀ r ≠ j} are equivalent, each indicating that Y is distributed according to the j-th component of the mixture; for further details, see e.g., [1].
The FM-TP-SMN densities in (7) form an extremely flexible class, which includes the finite mixtures of SMN densities as a special case when all components are symmetric, i.e., γ_j = 1/2, j = 1, …, g.
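For an i.i.d. sample Y = (Y_1, …, Y_n)ᵀ, by considering the PDF (7), the log-likelihood function is
$$\ell(\Theta \mid y) = \sum_{i=1}^{n} \log \sum_{j=1}^{g} \pi_j\, f_{TP\text{-}SMN}(y_i \mid \mu_j, \sigma_j, \gamma_j, \nu), \tag{9}$$
where f_TP-SMN denotes the TP-SMN component density and Θ collects all the model parameters.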
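The log-likelihood of a finite mixture is a sum over observations of the log of a weighted sum of component densities, and it is usually evaluated with the log-sum-exp device for numerical stability. The following is a minimal sketch of that computation; the function names are hypothetical, and a normal log-density is used purely as a stand-in for the TP-SMN component density.

```python
import math

def normal_logpdf(y, theta):
    """Stand-in component: log N(y | mu, sigma^2) with theta = (mu, sigma)."""
    mu, sigma = theta
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def mixture_loglik(ys, weights, thetas, component_logpdf):
    """Sum over observations of log sum_j pi_j * f_j(y_i)."""
    total = 0.0
    for y in ys:
        # log(pi_j) + log f_j(y) for every component j
        terms = [math.log(w) + component_logpdf(y, th)
                 for w, th in zip(weights, thetas)]
        m = max(terms)                          # log-sum-exp shift for stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

Passing the component log-density as an argument keeps the routine unchanged when the stand-in is replaced by a two-piece density.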
The above hierarchical representation of the FM-TP-SMN model will be used to obtain the ML estimates via an ECME-algorithm. This algorithm is a generalization of the ECM-algorithm introduced by [27], which is itself an extension of the EM-algorithm [28]. It is obtained by replacing some CM-steps, which maximize the constrained expected complete-data log-likelihood function, with steps that maximize the corresponding constrained actual likelihood function. As [27,29] indicated, ECME-algorithms are much more efficient than other EM-type algorithms for obtaining the joint ML estimates.
CML-step of the ECME-algorithm.

Numerical Studies
In this section, we assess the performance of the proposed FM model using simulated and real datasets. The algorithms were implemented in R [30], version 3.5.1, on a Core i7 760 processor (2.8 GHz), and a relative tolerance of 10^−5 was used as the convergence criterion of the ECME-algorithms. A sample copy of the R code is available upon request from the authors and will be made available in an R package specialized for the proposed model.

Simulations
In this section, we present three simulations. In the first, we show the robustness of the FM-TP-SMN models in classifying heterogeneous data; in the second, we study the proposed FM-TP-SMN models under misspecification; and in the third, we examine the asymptotic properties of the proposed model estimates. FM models are useful for clustering observations by allocating them into groups that are similar in some sense. In fact, by considering the estimated (posterior) probabilities, we can assign each observation to one of the groups. However, atypical data can have an undesirable effect on the clustering (see e.g., [1,2,8]). In our models, we account for skewness and base the clustering on it in order to show the robustness of the clustering to atypical data in the components. We generated 1000 samples from the FM-TP-SMN model with two components and, for each sample, applied k-means clustering while ignoring the true classification.
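Allocation by estimated posterior probabilities amounts to computing, for each observation, the weight of each component times its density, normalizing, and picking the largest value. The following is a minimal sketch of that rule; the function names are hypothetical, and normal densities stand in for fitted TP-SMN component densities.

```python
import math

def posterior_probs(y, weights, component_pdfs):
    """P(Z_j = 1 | y), proportional to pi_j * f_j(y)."""
    joint = [w * pdf(y) for w, pdf in zip(weights, component_pdfs)]
    total = sum(joint)
    return [v / total for v in joint]

def allocate(y, weights, component_pdfs):
    """Index of the component with the largest posterior probability."""
    probs = posterior_probs(y, weights, component_pdfs)
    return max(range(len(probs)), key=probs.__getitem__)

def normal_pdf(mu, sigma):
    """Stand-in component density; a fitted TP-SMN density would go here."""
    def pdf(y):
        z = (y - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf
```

Comparing these allocations with the known labels of simulated data gives a rate of correct allocation of the kind summarized in Table 1.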

Misspecification
For this section, we simulated 2000 samples of size n = 150 from the FM-SN (asymmetric and light-tailed components) and FM-ST (asymmetric and heavy-tailed components) models separately, with the parameters of the previous simulation structure and with (λ_1, λ_2) = (−2, 3). Then, we fitted the various proposed FM-TP-SMN models to these data. In Table 2, the FM-TP-SMN models are first compared with the ordinary FM-NOR model (symmetric and light-tailed components) and then with one another (asymmetric components). The results in the first four rows of Table 2 demonstrate that the preferred models belonged to the class of FM-TP-SMN models rather than the FM-NOR model. Also, the model preferred most often for the FM-SN data was the FM-TP-N, and the other models preferred in this case were those similar to it (for example, the FM-TP-T with a large degrees-of-freedom value ν); i.e., the models preferred for FM-SN data, with asymmetric and light-tailed components, were the FM-TP-SMN models with light-tailed components. For the FM-ST data, with asymmetric but heavy-tailed components, the FM-TP-SMN models with heavy-tailed components were likewise preferred.
In this part and in the real applications, the model selection criteria used to choose the best model are the logarithm of the maximized likelihood function (log-like), ℓ(Θ̂|y), the Akaike information criterion (AIC) [31] and the Bayesian information criterion (BIC) [32], in the form
$$AIC = 2k - 2\ell(\hat\Theta \mid y), \qquad BIC = k \log n - 2\ell(\hat\Theta \mid y),$$
respectively, where k is the number of model parameters and n is the sample size.
For the third simulation, we simulated 400 samples for each of the sample sizes n = 150, 600, 1000, 2000, 4000 from FM-TP-T models with two components that are weakly separated (WS), medium separated (MS) and strongly separated (SS), i.e., with large, medium and little overlap of components, respectively (see Figure 1). Using the proposed ECME algorithm to find the ML estimates, we focus on the Monte-Carlo average bias (MC-bias) and mean squared error (MSE) of the ML estimates, reported in Tables 3-5 and defined as
$$\text{MC-bias}(\hat\xi_j) = \frac{1}{400} \sum_{i=1}^{400} \left( \hat\xi_j^{(i)} - \xi_j \right), \qquad MSE(\hat\xi_j) = \frac{1}{400} \sum_{i=1}^{400} \left( \hat\xi_j^{(i)} - \xi_j \right)^2,$$
respectively, where ξ̂_j^{(i)} is the ML estimate of the parameter ξ_j in the i-th sample, i = 1, …, 400. The results in Tables 3-5 are obtained from the different fitted FM-TP-SMN models and show the performance of the proposed models as well as of their parameter estimates. As the sample size increased, the MC-bias and MSE of the ML estimates tended toward zero, as expected.
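The two Monte-Carlo summaries are averages, over the replicated samples, of the estimation error and of the squared estimation error. The following is a minimal sketch, assuming these standard definitions; the function names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
def mc_bias(estimates, true_value):
    """Average of (estimate - true value) over the replicated samples."""
    return sum(e - true_value for e in estimates) / len(estimates)

def mc_mse(estimates, true_value):
    """Average of (estimate - true value)**2 over the replicated samples."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)
```

For example, `mc_bias([1.1, 0.9, 1.2], 1.0)` averages the errors 0.1, −0.1 and 0.2, while `mc_mse` averages their squares. In the study each list would hold the 400 ML estimates of one parameter.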
Table 3. Monte-Carlo average bias (MC-bias) and mean squared error (MSE) for maximum likelihood (ML) estimates in the weakly separated components (WS) FM-TP-T model.

Applications
In this section, we apply the FM-TP-SMN models to several real data sets to show the performance of the proposed models and estimates in applications.

BMI Data
We considered the body mass index (BMI) data set collected for men aged between 18 and 80 years. The data were gathered in the National Health and Nutrition Examination Survey by the US National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention (CDC). The strong relationship between obesity and many chronic diseases has attracted attention in recent years; that is, most people with an obesity problem also have chronic diseases. BMI, the ratio of body weight in kilograms to the square of height in meters, is a measure used to assess overweight and obesity: a person with BMI > 25 is considered overweight, while a person with BMI > 30 is considered obese.
This dataset had 4579 participants with BMI records; for modeling with finite mixture models, however, only participants with weights within 39.50–70.00 kg (1069 participants) and 95.01–196.80 kg (1054 participants) were considered, forming the first and second subgroups, respectively. Lin et al. [7] first analyzed this dataset using the 1999–2000 and 2001–2002 reports and fitted the FM-NOR, FM-T, FM-SN and FM-ST models, always with two components; then [5,13] fitted the FM-SMSN models to this dataset. The results obtained by [13] were general and included those of [5,7]. We therefore fitted the proposed FM-TP-SMN models to this dataset and compared the results with those in [13].
Table 6 contains the ML estimates of the FM-TP-SMN models with two components, and Table 7 reports the log-likelihood, AIC and BIC values of the proposed FM-TP-SMN models together with those of the FM-SMSN models taken from Table 1 of [13]. As noted by Lin et al. [4] and Prates et al. [13], the criteria values in Table 7 indicate that the heavy-tailed FM-SMSN models (FM-ST, FM-SCN and FM-SSL) fit better than the ordinary FM-NOR and FM-SN models, and that the FM-SSL and FM-ST were the best-fitting models. The same pattern holds for the FM-TP-SMN models and their corresponding FM-SMSN counterparts, while the FM-TP-SMN models were more reasonable than the FM-SMSN models. Among them, the FM-TP-SL and FM-TP-T were the best models. In Figure 2, we plot the fitted FM-TP-T and FM-ST density curves on the histogram of the BMI data.
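Both information criteria follow directly from the maximized log-likelihood, the number of free parameters and the sample size. The following is a minimal sketch, assuming the standard definitions AIC = 2k − 2ℓ and BIC = k log n − 2ℓ; the function names are hypothetical, and no values from Table 7 are reproduced here.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2 * maximized log-likelihood."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k * log(n) - 2 * maximized log-likelihood."""
    return k * math.log(n) - 2.0 * loglik
```

With these definitions smaller values indicate a better trade-off between fit and complexity, and BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7, since log n is then greater than 2.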

UScrime Data
As a further application of the FM-TP-SMN models and proposed methodology, we consider the effect of punishment regimes on crime rates [33,34], which is of high interest to criminologists.This has been studied using aggregate data of 47 US states for 1960 given in this data frame, and we

P_x(a, b) denotes the distribution function of the Gamma(a, b) distribution evaluated at x. Now, the expectation step (E-step) at the (r + 1)th iteration of the ECME-algorithm requires the calculation of Q

Figure 1. Artificial data of size n = 400 simulated from finite mixture two-piece distributions based on the scale mixtures of normal (FM-TP-SMN) models with two components: weakly separated components (WS), medium separated components (MS) and strongly separated components (SS), shown with the probability density function (PDF) curves from which the datasets were drawn.

Figure 2. Histogram of body mass index (BMI) data with fitted FM-TP-T (left) and FM-ST (right) models with two components.

Figure 3. Histogram of UScrime data with fitted FM-TP-SMN and FM-SMSN models with two components.

Table 1. Mean of true allocation rates for fitted finite mixture two-piece distributions based on the scale mixtures of normal (FM-TP-SMN) models.

Table 2. The number of times (out of 2000) the true FM models were chosen under seven proposed hypotheses.

Table 4. MC-bias and MSE for ML estimates in the medium separated components (MS) FM-TP-T model.

Table 5. MC-bias and MSE for ML estimates in the strongly separated components (SS) FM-TP-T model.

Table 6. ML estimation results for fitting FM-TP-SMN models to the body mass index (BMI) data.

Table 7. Model selection criteria for fitting FM-TP-SMN and FM-scale mixtures of the skew normal (SMSN) models to the BMI data. The best values are marked in bold.