A Multivariate Flexible Skew-Symmetric-Normal Distribution: Scale-Shape Mixtures and Parameter Estimation via Selection Representation

Multivariate skew-symmetric-normal (MSSN) distributions have been recognized as an appealing tool for modeling data with non-normal features such as asymmetry and heavy tails, rendering them suitable for applications in diverse areas. We introduce a richer class of MSSN distributions based on a scale-shape mixture of (multivariate) flexible skew-symmetric normal distributions, called the SSMFSSN distributions. This very general class of SSMFSSN distributions can capture various shapes of multimodality, skewness, and leptokurtic behavior in the data. We investigate some of its probabilistic characterizations and distributional properties which are useful for further methodological developments. An efficient EM-type algorithm designed under the selection mechanism is advocated to compute the maximum likelihood (ML) estimates of parameters. Simulation studies as well as applications to a real dataset are employed to illustrate the usefulness of the presented methods. Numerical results show the superiority of our proposed model in comparison to several existing competitors.


Introduction
The use of multivariate normal (MN) distribution plays a central role in statistical modeling. However, there are some situations where the data are not in agreement with the MN distribution. Departure from normality can take place in different ways, such as multimodality, lack in central symmetry, and positive or negative excess of kurtosis. The class of scale mixtures of skew-normal distributions (SMSN) whose general form was first introduced by Branco and Dey [1] includes many multivariate skew symmetric (MSS) distributions with only one mode.
More formally, a scale mixture distribution can be constituted by mixing a base density over a scaling distribution. Its density can be expressed in the form of the integral given by f (y) = ∞ 0 g y|κ(τ) dH(τ; ν), (1) where g y|κ(τ) is the conditional density of a p × 1 random vector Y given κ(τ). Herein, κ(·) is a positive function of a scaling variable τ with cumulative distribution function (cdf) H(τ; ν), indexed by the parameter vector ν.
As shown by Azzalini and Capitanio [6], the pdf of the MSS distribution can be written as follows: where ξ and Σ are, respectively, the location vector and scale covariance matrix; f 0 : R p → R + is a p-variate centrally symmetric pdf with respect to the origin, i.e., f 0 (x) = f 0 (−x); G 0 : R → [0, 1] is a univariate cdf satisfying G 0 (−x) = 1 − G 0 (x); and w : R p → R is an odd real-valued function, namely w{−x} = w{x}. The MSS class is equivalent to the one studied by Wang et al. [7] for which G 0 w{x} is replaced by π : In light of (6), it is possible to generate a wide range of asymmetric families of unimodal and multimodal skew distributions depending on the specification of the function w{x}. This essential property induces greater flexibility in the available shapes. For example, Ma and Genton [8] proposed a flexible class of skew-symmetric distributions by choosing w{x} = P K (x), where P K (x) is an odd polynomial function of order K.
The Hadamard product, also known as the Schur product, is a type of matrix multiplication that is commutative and simpler than the matrix product. See [9] for a comprehensive review and its applications to multivariate statistical analysis. The Hadamard product is advantageous in computations and algebraic manipulations because the products are entry-wise, the multiplication is commutative, and, particularly, the inverse is very easy to obtain and the computation of power matrices is straightforward. By convention, we use " " to denote the Hadamard operations (product and power). Let A = (a ij ) and B = (b ij ) be two p × q matrices of the same dimension but not necessarily square. Then, the Hadamard product between these two matrices, denoted by A B, is the element-wise product of A and B, that is, a p × q matrix whose (i, j) entry is a ij b ij . Accordingly, the nth Hadamard power of matrix A is defined as A n = [a n ij ]. More key properties concerning the multiplication and partial derivatives of the Hadamard product are deferred to Appendix A.
In the multivariate setup, the odd polynomial of order K has various combinations of the coefficients. For example, the bivariate case under an odd polynomial of order K = 3 is given by As an alternative to that of Ma and Genton [8] in more a general setting, we introduce the multivariate flexible skew-symmetric-normal (MFSSN) distribution, denoted by MFSSN p (ξ, Σ, α), which has the following pdf: where α = (λ 1 , . . . , λ m ) is a pm × 1 multiple-scaled vector of shape parameters and 2k−1 remains a p × 1 vector through an odd Hadamard power transformation of order 2K − 1, for K = 1, . . . , m. Notably, the MFSSN distribution encompasses the flexible generalized skew-normal (FGSN) distribution introduced by Ma and Genton [8] as the univariate case. Figure 1 illustrates the scatter-contour plots coupled with their marginal histograms of the bivariate MFSSN distribution under ξ = 0, Σ = I 2 and various specifications of shape parameters arisen from two setups of m. As can be seen, many different non-elliptically distributional shapes with multiple modes and asymmetric patterns can be produced. Additional flexibility can be gained by expanding the thickness of tails. This motivates us to propose a more general class of scale-shape mixtures of MFSSN (SSMFSSN) distributions. The class of SSMFSSN distributions can be hierarchically represented by where . Further, we denote the pdf of (8), the marginal pdf of Y is given by Obviously, the MFSSN model is obtained by setting τ 1 = τ 2 = 1 in (9). The family of SSMFSSN distributions introduced in (8) is quite vast, containing several subfamilies of asymmetric and multimodal distributions which have never been considered in the literature. Notice that the MSSMSN distribution described in (5) is a simple case of SSMFSSN by taking κ(τ 1 ) = 1/τ 1 , η(τ 1 , τ 2 ) = (τ 1 /τ 2 ) −1/2 , and λ j = 0, for j = 2, . . . , m. More importantly, the scale-shape mixtures of flexible generalized skewnormal (SSMFGSN) distributions proposed very recently by Mahdavi et al. [10] can be thought of as a univariate case of SSMFSSN when the dimension p = 1.
The EM algorithm [11] and some of its extraordinary variants such as the expectation conditional maximization (ECM) algorithm [12] and the expectation conditional maximization either (ECME) algorithm [13] are broadly applicable methods to carry out ML estimation for multivariate skew distributions in a complete-data framework. To the best of our knowledge, previous developments of the EM-type algorithm are based on the convolution-type representations, see, e.g., [14] for the MSN distribution, [15] for the MST distribution, [1] for the SMSN distribution, and [5] for the MSSMSN distribution. Since our proposed SSMFSSN model cannot be explicitly expressed by a convolution-type representation, we develop a novel EM-based procedure designed under the selection mechanism to compute the ML estimates. The rest of the paper is organized as follows. Section 2 presents the formulation of the general SSMFSSN model and discusses how to deploy the ECME algorithm for ML estimation based on the selection-type mechanism. Section 3 exemplifies some particular cases of SSMFSSN distributions constructed by setting different mixing distributions for τ. The proposed techniques are illustrated by conducting two simulation studies in Section 4 and analyzing a real data example in Section 5. We conclude in Section 6 with a few remaks and offer directions for future research.

The Family of SSMFSSN Distributions
A p-variate random vector Y∼SSMFSSN p (ξ, Σ, α, ν) is asserted to follow the SSMF-SSN distribution with location vector ξ, scale covariance matrix Σ, shape parameters α = (λ 1 , . . . , λ m ) , and flatness parameters ν if it has the following selection-type representation: where and (Z 1 , Z 2 ) ∼N p+1 (0, I p+1 ). Herein, ' d =' stands for equality in distribution, and U is obviously a continuous random variable symmetric about zero. Using this character-ization, the random samples for SSMFSSN p (0, I p , α, ν) can be simulated through the following scheme As a result, the random samples of the general SSMFSSN p (ξ, Σ, α, ν) can be obtained by the affine transformation Y = ξ + Σ 1/2 X.
For fitting the SSMFSSN model (10) within the complete-data framework via the EMtype algorithm, we introduce two latent variables W d = U | (U > 0) and γ = (γ 1 , has the following joint pdf: where Integrating out W from (12) gives the following joint pdf Therefore, the marginal pdf of Y is given by where the second equality holds if we further assume τ 1 and τ 2 are mutually independent. It is noteworthy that the shape mixtures of MSSMN distribution established by Arellano-Valle et al. [5] also belongs to the family of our proposed SSMFSSN model by imposing λ 2 = · · · = λ m = 0. Dividing (12) by (13) yields the following relation for which '≡' in (15) means that W and γ 1 are conditionally independent. Moreover, it is straightforward to show that where TN(µ, σ 2 )I A represents a doubly truncated normal distribution confined within the interval A = {a 1 < x < a 2 }, and I A is an indicator function of set A. Using Lemma 2 of Lin et al. [16], we have the following conditional expectation: By Bayes' rule, the conditional pdf of γ = (γ 1 , γ 2 ) given Y = y is

Parameter Estimation via the ECME Algorithm
The EM algorithm [11] is a widely used iterative technique to deal with ML estimation in models that involve incomplete data or latent variables. One primary virtue of EM lies in the fact of attractive monotone convergence properties and the preservation of simplicity and stability. In practice, a major limitation of EM is often that some estimators in the M-step cannot be solved in terms of closed-form expressions. To overcome this obstacle, the ECM algorithm proposed by Meng and Rubin [12] recommends replacing the E-step of EM with a sequence of simpler conditional maximization (CM) steps, yet it also enjoys the same convergence properties as EM. However, in certain problems, some of the CM-steps of ECM may become analytically intractable or suffer from slow convergence. As a further flexible extension, the ECME algorithm [13] divides the CM-steps of ECM to maximize either the Q-function, called the CMQ-step, or the corresponding constrained actual loglikelihood function, named as the CML-step. In what follows, we describe in greater detail how the proposed SSMFSSN model can be fitted by using the ECME algorithm.
Suppose that y = (y 1 , . . . , y n ) constitutes a set of p-dimensional observed samples of size n arising from the SSMFSSN model. Under the EM framework, the latent variables W = (W 1 , . . . , W n ) and γ = (γ 1 , . . . , γ n ) introduced in the preceding subsection are treated as missing data. Then, the complete data are given by y c = (y , W , γ ) . Further, we let θ = (ξ , vech(Σ) , α , ν ) denote the entire unknown parameters to be estimated in the SSMFSSN model, where vech(·) is the half-vectorization operator that stacks the lower triangular elements of a p × p symmetric matrix into a single p(p + 1)/2 vector.
According to (12), the log-likelihood function of θ corresponding to the complete-data y c , excluding additive constants and terms that do not involve parameters of the model, is given by with containing the elements of λ j on the main diagonal, and On the kth iteration, the E-step requires the calculation of the so-called Q-function, which is the conditional expectation of (19) given the observed data y and the current estimateθ (k) , where the superscript (k) denote the updated estimates at iteration k. To evaluate the Q-function, we require the following conditional expectations: which have explicit expressions that are discussed in detail in a subsequent section along withŝ which may not have standard form for some subfamilies. Substituting (20) and (21) into (19) yields the following Q-function: The CM-steps are implemented to update estimates of θ in the order of ξ, Σ, α and ν by maximizing, one by one, the Q-function obtained in the E-step. After some algebraic manipulations, they are summarized by the following CMQ and CML steps: CMQ-Step 1: Fixing Σ =Σ (k) and α =α (k) , we updateξ (k) via Proposition A2 by taking the partial derivative of (22) with respect to ξ. Since the derivation cannot get a closed-form expression for its maximizer, the solution ofξ (k+1) is validated by numerically solving the root of the following equation: where the two terms ζ

CMQ-Step 3:
Fixing ξ =ξ (k+1) , we update∆ (k) j via Proposition A3 by taking the partial derivative of (22) with respect to ∆ j each, j = 1, . . . , m. Since their solutions cannot be isolated and set equal to zeros, we have the following equation for finding the nonlinear roots of ∆ j : where ζ Collecting the above solutions turn out to beα (k+1) = (λ For some members of SSMFSSN, the calculation ofŝ 4i (ν) is not straightforward. An update ofν (k) can be achieved by directly maximizing the constrained actual log-likelihood function. This gives rise to the following CML-step:

CML-
Step:ν (k) is updated by optimizing the following constrained log-likelihood function: ν). (27) Note that the maximization in the above CML-step requires p-dimensional search of the objective function (constrained log-likelihood), which can be easily accomplished by using, for example, the optim routine in R Development Core Team [17]. The iterations of the above algorithms are alternately repeated until a suitable convergence rule is satisfied, e.g., the relative difference | (θ (k+1) )/ (θ (k) ) − 1| is sufficiently small, e.g. less than 10 −6 , which we consider in the numerical experiments, where (θ) = ∑ n i=1 ln f SSMFSSN (y i ; θ). To prevent infinite loop from adopting this criterion, the maximum number of iterations is set to 5000.
On the initialization of parameters for starting the algorithm, the location vectorξ (0) and the scale covariance matrixΣ (0) are specified as the sample mean vector and sample covariance matrix, respectively. The initial values for the shape parametersλ 2 ) , their initial values are chosen as relatively small values depending on the settings of parameter domain. To avoid getting stuck in one of the many local maxima of the likelihood function, a convenient method is to try a variety different of initial values with perturbations or using the bootstrap resampling method [14]. The solution with the highest log-likelihood value is treated as the ML estimates, denoted bŷ θ = (ξ , vech(Σ) ,α ,ν ) .

Examples of SSMFSSN Distributions
We present some special cases of SSMFSSN distributions which are induced by setting different mixing distributions for τ. For each case, additional conditional expectations are also derived for the implementation of ECME.

CMQ-
Step 4:ν (k+1) is obtained by solving the root of the following equation:

The Multivariate Flexible Skew-Symmetric-Slash-Normal Distribution
Let Y be a p-dimensional random vector with the following representation where Z∼N p (0, I p ) and is independent of τ d = U 1/ν ∼Beta(ν, 1), where U is the uniform distribution on the interval (0, 1). From (31), the conditional distribution Y given τ is N p (ξ, τ −1 Σ). Then, Y has a multivariate slash distribution with pdf given by where G(·; r) denotes the cdf of the gamma distribution with scale parameter 1 and shape parameter r. Using the law of iterated expectations, the mean vector and variancecovariance matrix of Y are If we select τ 1 ∼Beta(ν, 1) and τ 2 = 1 for (10), this generates the multivariate flexible skew-symmetric-slash-normal (MFSSSLN) distribution, denoted by Y∼MFSSSLN p (ξ, Σ, α, ν), with the pdf taking the form of Note that the MFSSSLN distribution contains the MFSSN distribution as a limiting case for ν → ∞ and encloses the multivariate skew-slash distribution considered by Wang and Genton [19] as a reduced case when λ 2 = · · · = λ m = 0.
According to (18), the conditional distribution γ 1 | y is given by In addition, some algebraic manipulations yield and To implement the ECME procedure for fitting MFSSSLN, the conditional expectations involved in (20) and (21) can be easily evaluated via the results of (17), (36), and (37). Besides, the DOFν (k) can be alternatively updated by maximizing the Q-function over ν, leading to the following CMQ-Step:
To obtainŝ (k) 4i , we require the following conditional expectation The resulting Q-function can be easily evaluated through (17) and (41) sinceŝ (k) 2i = 1. To estimate ν 1 and ν 2 , we perform the CML-Step, so the calculation ofŝ (k) 4i can be omitted.
With arguments similar to those of Lin et al. [20], it is straightforward to derive and where Additionally, using the law of iterated expectations, we can deduce that As a consequence, the E-step in (20) and (21) requires the calculation ofŝ 4i = E(ln γ i | y i ,θ (k) ), which can be directly evaluated via (45)-(47). Moreover, the procedure of updatingν (k) is the same as (30).
According to (18), it is easy to see γ 1 | Y = y∼Γ (ν 1 + p)/2, (ν 1 + δ 2 )/2 . The conditional pdf of γ 2 given Y = y is Further, we have the following conditional expectations: and Applying the law of iterated expectations to (15) and (50) gives With slight modifications as defined in (20) and (21), the necessary conditional expectations in the E-step includeŝ , which can be easily evaluated via the results given in (29) and (51)-(53), respectively. To numerically estimate ν 1 and ν 2 for the MFSSTT distribution, we resort to the following two root-finding equations:

CMQ-Step 4:ν
are obtained by solving the roots of the following two equations: = 0 (54) and 1 + ln

Recovery of the True Underlying Parameters
The first experiment intends to investigate the ability of the proposed ECME algorithm to recover the true underlying parameters. Monte Carlo samples of different sample sizes n = 100, 250, 500, and 1000 were generated from the MFSSN distributions specified in (7) and five examples of SSMFSSN distributions studied in Section 3. For ease of exposition, we considered the Hadamard power transformation of order three (K = 3) that allows two shape parameters in the skewing function Φ(·). Moreover, the flatness parameters for MFSSCN and MFSSTT were assumed to be equal, say ν 1 = ν 2 = ν, referred to as the MFSSCNe and MFSSTTe distributions. The presumed true parameters were ξ = (1, −1) , σ = vech(Σ) = (1, 0.5, 4) , λ 1 = (−2, 2) , and λ 2 = (1, −1) . Furthermore, ν = 3 was taken for the MFSSTN, MFSSSLN, MFSST, and MFSSTTe distributions, while ν = 0.7 was adopted for the MFSSCNe distribution since its support lies within the interval (0, 1). As an illustration, Figure 2 displays the scatter plots superimposed on the fitted contours for each type of data simulated from one trail. For all scenarios, the accuracies of the parameter estimates are assessed by computing the mean absolute bias (MAB) and the root mean square error (RMSE) over R = 100 replications. For a vector of parameters θ = (θ 1 , . . . , θ p ) , these measures are, respectively, defined as whereθ kr denotes the ML estimate of the kth parameter at the rth replication and θ A k represents the actual value of θ k . The experimental results are summarized in Table 1. It is readily seen both MAB and RMSE values tend to approach zero with increasing the sample size. While this study is limited to the simplest case (p = 2; m = 2), our developed ECME algorithm shows favorable ability to recover the true parameter values with data generated exactly according to model assumptions. Similar experiments have also been undertaken on more complex scenarios (p = 3; m = 3). The extensive results would not necessarily be excessively reported since the conclusions are in accordance with those already presented.

Comparing the Proposed Procedure with Convolution-Type EM Algorithms
The second experiment aims to compare the performance of the proposed selectiontype ECME procedure outlined in Section 2.2 with the traditional EM-based algorithms derived based on convolution-type representations. As an illustration, we consider the fitting of MSN, MST, and MSCN distributions arisen from the multivariate skew-normal independent (SNI) family studied by Cabral et al. [22]. As discussed above, they are special cases of our proposed MFSSN, MFSST, and MFSSCN distributions by setting λ 2 = · · · = λ m = 0. Accordingly, the CMQ-Step 1 in Section 2 can be simplified as follows: CMQ-Step 1: Fixing Σ =Σ (k) and λ 1 =λ (k) 1 , we obtainξ (k+1) bŷ 1i +Σ (k)∆ (k) In total, 100 Monte Carlo (MC) samples of sizes n = 100, 500, and 1000 were generated from each of the three distributions. The true parameters were the same as those given in the previous experiment except for λ 2 = 0. Each simulated sample was fitted twice with the proposed selection-type ECME procedure and the EM-type algorithm based on convolution-type representations, as implemented by the mixsmsn R package [23]. For a fair comparison, we started the two algorithms using the same initial values as described at the end of Section 2. All computations were carried out by Microsoft R package 3.5.1 in win64 environment of a desktop computer with 2.80-GHz/Intel Core(TM) i7-7700HQ CPU Processor and 16.0 GB RAM. Performance evaluation was assessed by the execution CPU time and the converged log-likelihood maxima.
The box plots depicted in Figure 3 reveal the selection-type algorithm demands much lower computational cost than those required for the convolution-type algorithm. The phenomenon is more apparent for the MST and MSCN distributions, particularly for larger n. The high efficiency of the selection-type algorithm can be ascribed to the fact that its E-step is designed by virtue of simplification. Finally, it is worth mentioning that both algorithms can achieve the same final log-likelihood, as demonstrated by violin plots in Figure 4.

An Illustrative Example: The Wind Speed Data
We considered a trivariate dataset analyzed by Azzalini and Genton [24] for the study of spatial distribution of wind speed by means of the MST and various MSSMSN distributions proposed by Azzalini and Capitanio [6] and Arellano-Valle et al. [5], respectively. This dataset contains 278 hourly average speed assembled at three meteorological towers: Goodnoe Hills (gh), Kennewick (kw), and Vansycle (vs) from 23 February to 30 November 2003 recorded at midnight when wind speeds tend to peak. The positive and negative signs of wind speed measurements represent a westerly wind direction and an easterly wind direction, respectively. The Ljung-Box test indicates weak serial correlation for observations measured at the three stations. For modeling these data, we followed Azzalini and Genton [24] to treat the observations as being independent and identically distributed. Figure 5 presents histograms overlaid with kernel density curves obtained by using R density() function for measurements collected at each tower.
Six SSMFSSN models of order 3 were considered to fit the wind speed data. For the sake of comparison, the MST, MSTN, and MSTC distributions belonging to the MSSMSN family [5] were also fitted as sub-models of SSMFSSN subject to the constraint of λ 2 = 0.
To select an appropriate model from the candidates, we adopted the Akaike information criterion (AIC) [25] and the Bayesian information criterion (BIC) [26], which are the two most widely used model selection indices based on penalized likelihood and applicable for both nested and non-nested models. The two criteria are defined as where d is the number of free parameters in the model and max is the maximized loglikelihood value. A lower AIC or BIC value indicates that a closer fit of the model to the data. Table 2 compares the ML estimation results for nine candidate models. As can be seen, our proposed SSMFSSN models perform favorably as compared to three MSSMSN analogs because they suffer from a lack of ability to capture the possibly bimodal behavior of the wind speed data ( Figure 5). Accordingly, the MFSSTT distribution provides the best fit in terms of the lowest value of AIC as well as BIC, followed by the MFSST distribution. The MSTC and MST are the top two MSSMSN models with smaller AIC and BIC values.   Table 3 summarizes the resulting ML estimates of 6 considered SSMFFSN models together with their asymptotic standard errors obtained by performing the parametric bootstrap method [27]. Notably, the estimates of the shape and flat parameters indicate the presence of skewed and leptokurtic characteristics toward different directions among the three variables. As an illustration, the fitted 3D contour densities of MSTC, MST, MFSST, and MFSSTT distributions are depicted in Figure 6. It is interesting to see that the MFSSTT model having the smallest AIC and BIC can adapt the shape of the wind speed data more closely than the other three competitors.

Conclusions
We introduce a novel family of SSMFSSN distributions as a generalization of the work of Mahdavi et al. [10] that can capture simultaneously the dependency among multivariate responses, skewness, heavy-tailedness, and, in particular, multimodal density shapes without resorting to the use of finite mixtures [14,15,21]. Since the SSMFSSN model cannot be represented by a convolution-type form, this stimulates us to devise a feasible ECME algorithm for ML estimation under the selection-type mechanism. The effectiveness and efficiency of the algorithm are evaluated by conducting two simulation studies. Numerical analysis of a real dataset highlights the potential and capability of our proposed approach as a promising alternative tool for modeling multimodal multivariate data with asymmetrical behavior. Computer programs for implementation of our methods can be installed as a R package from Github devtools::install_github(''a-mahdavi/SSMFSSN.EM''). Further developments of the current approach could be exploited for powerful extensions of the factor analysis model or finite mixtures thereof with censored or possibly missing values that were considered recently by the authors of [28][29][30][31][32][33][34][35]. One limitation of the SSMFSSN model is that it may not be suited to the data with modes having too far distances. Another worthwhile extension of this work is to pursue a mixture modeling framework of the current approach that would be an effective way to resolve this problem.
Author Contributions: A.M. and T.-I.L. conceived the project, developed the statistical methods, designed the approach, and analyzed the results. All authors contributed to the development of the methodology and to writing the manuscript. All authors have read and agreed to the published version of the manuscript.