Article

Bayesian Inference for Skew-Symmetric Distributions

by Fatemeh Ghaderinezhad 1, Christophe Ley 1,* and Nicola Loperfido 2
1 Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Krijgslaan 281, S9, Campus Sterre, 9000 Gent, Belgium
2 Dipartimento di Economia, Società e Politica, Università degli Studi di Urbino “Carlo Bo”, Via Saffi 42, 61029 Urbino (PU), Italy
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(4), 491; https://doi.org/10.3390/sym12040491
Submission received: 27 February 2020 / Revised: 17 March 2020 / Accepted: 18 March 2020 / Published: 25 March 2020
(This article belongs to the Special Issue Recent Advances on Symmetry in Mathematical Statistics)

Abstract
Skew-symmetric distributions are a popular family of flexible distributions that conveniently model non-normal features such as skewness, kurtosis and multimodality. Unfortunately, their frequentist inference poses several difficulties, which may be adequately addressed by means of a Bayesian approach. This paper reviews the main prior distributions proposed for the parameters of skew-symmetric distributions, with special emphasis on the skew-normal and the skew-t distributions which are the most prominent skew-symmetric models. The paper focuses on the univariate case in the absence of covariates, but more general models are also discussed.

1. Introduction

The need to model skewed data led to the development of many skewed distributions which are obtained by adding to a symmetric distribution a parameter that controls skewness [1]. Arguably, the best known example is the skew-normal distribution introduced by Azzalini (1985) [2]. Its probability density function (pdf) is
$$ sn(x; \mu, \sigma, \lambda) = \frac{2}{\sigma}\, \phi\!\left(\frac{x-\mu}{\sigma}\right) \Phi\!\left(\lambda\, \frac{x-\mu}{\sigma}\right), \quad x \in \mathbb{R}, \tag{1} $$
where $\mu \in \mathbb{R}$ is the location parameter, $\sigma \in \mathbb{R}^+$ the scale parameter, both inherited from the normal distribution, $\phi$ and $\Phi$ denote the pdf and the cumulative distribution function (cdf) of the standard normal distribution, and $\lambda \in \mathbb{R}$ is called the skewness parameter given that density (1) is asymmetric for $\lambda \neq 0$ and reduces to the normal pdf with location $\mu$ and scale $\sigma$ for $\lambda = 0$. Several extensions and generalizations followed, see [3], a very general and highly popular one being the skew-symmetric distributions of Wang et al. (2004) [4] with pdf
$$ s_{f;\Pi}(x; \mu, \sigma, \lambda) = \frac{2}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma}\right) \Pi\!\left(\frac{x-\mu}{\sigma}, \lambda\right), \quad x \in \mathbb{R}, $$
where $f$ is the symmetric density to be skewed and $\Pi : \mathbb{R} \times \mathbb{R} \to [0, 1]$ is a so-called skewing function satisfying $\Pi(-z, \lambda) + \Pi(z, \lambda) = 1$ for all $z, \lambda \in \mathbb{R}$ and $\Pi(z, 0) = 1/2$ for all $z \in \mathbb{R}$. The most widely used subfamily of skew-symmetric distributions has densities of the form
$$ s_{f;G}(x; \mu, \sigma, \lambda) = \frac{2}{\sigma}\, f\!\left(\frac{x-\mu}{\sigma}\right) G\!\left(\lambda\, \frac{x-\mu}{\sigma}\right), \quad x \in \mathbb{R}, \tag{2} $$
where $G$ is any symmetric, univariate, absolutely continuous cumulative distribution function. In (2), $\mu \in \mathbb{R}$ is a location, $\sigma \in \mathbb{R}^+$ a scale and $\lambda \in \mathbb{R}$ a skewness parameter. The function $G$ might be replaced by a function $w(\cdot)$ satisfying $0 \le w(-x) = 1 - w(x) \le 1$, as done in [4,5]. Different skew models might then be obtained by different choices for $f$ and $G$, for example the pdf and the cdf of the power exponential distribution [6], the Student t distribution [5] and the logistic distribution [7].
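The construction in (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`skew_symmetric_pdf`, `skew_normal_pdf`, `logistic_cdf`, `integrate`) and the parameter values are hypothetical choices made here, and the integrals are approximated with a plain midpoint rule.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # supplies phi (pdf) and Phi (cdf)

def skew_symmetric_pdf(x, mu, sigma, lam, f, G):
    """Density (2): (2 / sigma) * f(z) * G(lam * z) with z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 2.0 / sigma * f(z) * G(lam * z)

def skew_normal_pdf(x, mu=0.0, sigma=1.0, lam=0.0):
    """Density (1): the special case f = phi, G = Phi."""
    return skew_symmetric_pdf(x, mu, sigma, lam, STD_NORMAL.pdf, STD_NORMAL.cdf)

def logistic_cdf(u):
    """A symmetric cdf, used here as an alternative choice of G."""
    return 1.0 / (1.0 + math.exp(-u))

def integrate(func, lower, upper, steps=20000):
    """Midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))

# Both densities should integrate to one (illustrative parameter values).
total_sn = integrate(lambda x: skew_normal_pdf(x, 1.0, 2.0, 3.0), -30.0, 30.0)
total_logistic = integrate(
    lambda x: skew_symmetric_pdf(x, 0.0, 1.0, -2.0, STD_NORMAL.pdf, logistic_cdf),
    -12.0, 12.0)
```

Passing `f` and `G` as arguments mirrors the way (2) separates the symmetric base density from the skewing mechanism; with `lam=0` the factor `G(0) = 1/2` cancels the leading 2 and the base density is recovered.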
The skew-normal distribution admits several stochastic representations, a very simple one being the following [8]: let $U$ and $V$ be independent, standard normal variables, then
$$ Z_\lambda = \frac{\lambda}{\sqrt{1+\lambda^2}}\, |U| + \frac{1}{\sqrt{1+\lambda^2}}\, V \tag{3} $$
has the standard skew-normal distribution $SN(\lambda) = SN(0, 1, \lambda)$. This representation models departures from normality, allows for simple generation of random numbers and the calculation of odd moments. Despite its numerous appealing probabilistic properties [3], the skew-normal distribution suffers from serious inferential problems, as remarked already by Azzalini (1985). First, the method of moments might lead to complex estimates of the parameters. Second, when $\mu = 0$, $\sigma = 1$ and all observations are positive (negative), the likelihood function is monotonically increasing (decreasing) in $\lambda$, so that the maximum likelihood estimate (MLE) of $\lambda$ is (minus) infinity. Third, the MLE is quite unstable even in the presence of negative and positive observations. Fourth, skew-normal data offer little help in discriminating between different values of the shape parameter $\lambda$: very different values of $\lambda$ might correspond to very similar skew-normal densities. Fifth, the sampling distribution of the MLE does not allow for analytically tractable standard errors and confidence intervals. Sixth, if all three parameters are assumed to be unknown, the profile likelihood function for $\lambda$ has a stationary point at $\lambda = 0$, regardless of the observed sample, and the Fisher information matrix becomes singular as $\lambda$ approaches zero.
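The stochastic representation (3) translates directly into a random number generator. The sketch below is illustrative only: the function name, the seed, the value of λ and the sample size are arbitrary assumptions. As a sanity check it compares the sample mean with the value $\sqrt{2/\pi}\,\lambda/\sqrt{1+\lambda^2}$ implied by $E|U| = \sqrt{2/\pi}$, and the sample second moment with 1.

```python
import math
import random

def sample_standard_skew_normal(lam, rng):
    """One draw of Z_lambda from representation (3)."""
    u = rng.gauss(0.0, 1.0)
    v = rng.gauss(0.0, 1.0)
    root = math.sqrt(1.0 + lam * lam)
    return lam / root * abs(u) + v / root

rng = random.Random(2020)          # fixed seed, chosen arbitrarily
lam = 3.0                          # illustrative skewness value
draws = [sample_standard_skew_normal(lam, rng) for _ in range(200_000)]

sample_mean = sum(draws) / len(draws)
sample_second_moment = sum(z * z for z in draws) / len(draws)

delta = lam / math.sqrt(1.0 + lam * lam)
theoretical_mean = math.sqrt(2.0 / math.pi) * delta   # E|U| = sqrt(2 / pi)
```

The second moment equals 1 for every λ because the squared weights on $|U|$ and $V$ sum to one, which is one way to see that λ changes the shape of the distribution without inflating its raw second moment.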
The latter problem, also known as the singularity problem, is very serious and has motivated an active line of research both for multivariate and semiparametric generalizations of the skew-normal distribution. It has been discussed in various papers such as [9,10,11,12,13,14,15]. Different parameterizations have been proposed by, among others, [2,11,13,16]. While most authors were pointing at some special status for the normal distribution as symmetric base distribution, Ref. [12] showed that this singularity can occur for very general symmetric densities f and provided a full characterization, in a general multivariate context, of the singularity problem, showing that it is due to unfortunate mismatches between the symmetric density f and the skewing function Π ( · , · ) . It is to be noted that in the context of models (2) singularity occurs only when f is normal, irrespective of the choice of function G. However, even the proposed solutions of reparameterization do not remove all problems as remarked by Azzalini and Capitanio (1999) [9]: “there are cases where the likelihood shape and the MLE are problematic. We are not referring here to difficulties with numerical maximization, but to the intrinsic properties of the likelihood function, not removable by change of parameterization. In case of this sort, the behaviour of the MLE appears quite unsatisfactory, and an alternative estimation method is called for”.
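The monotone-likelihood problem is easy to reproduce numerically. The following sketch assumes known location 0 and scale 1 and uses two small made-up samples (they are not data from any of the cited papers). It evaluates the λ-dependent part of the skew-normal log-likelihood on a grid.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf

def log_likelihood_lambda(lam, data):
    """Log-likelihood of lambda in sn(z; lambda) = 2 phi(z) Phi(lambda z),
    dropping the terms that do not involve lambda."""
    return sum(math.log(PHI(lam * z)) for z in data)

all_positive = [0.2, 0.7, 1.1, 1.9, 2.4]    # hypothetical sample
mixed_signs = [-0.4, 0.7, 1.1, 1.9, 2.4]    # hypothetical sample

grid = [0.5 * k for k in range(21)]         # lambda = 0, 0.5, ..., 10
ll_positive = [log_likelihood_lambda(lam, all_positive) for lam in grid]
ll_mixed = [log_likelihood_lambda(lam, mixed_signs) for lam in grid]

# Grid point with the largest log-likelihood for the mixed-sign sample.
best_mixed = grid[max(range(len(grid)), key=ll_mixed.__getitem__)]
```

For the all-positive sample every term $\log\Phi(\lambda z_i)$ increases with λ, so the grid values keep rising and no finite maximizer exists. A single negative observation is enough to produce an interior maximum on this grid.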
These difficulties arise from the shape of the likelihood function, which can be modified by appropriate weighting functions [17]. The most natural and best known approach to weight the likelihood function is the Bayesian one, where the prior distribution plays the role of the weight function. The Bayesian approach might adequately address both point estimation and hypothesis testing of the skewness parameter λ . Most papers in this area focused on objective priors for the skew-normal distribution, whereas only a few of them dealt with its multivariate or semiparametric generalizations. Bayesian inference and prior elicitation of λ for skew-symmetric distributions are challenging since λ not only controls symmetry but also spread, modes and tail behaviour. This paper is not meant to be a comprehensive review of this very active research topic, but rather a handy source for interested readers to the Bayesian analysis of skew-symmetric distributions. The paper is structured as follows. Section 2 recalls some basic concepts regarding the default choices for the prior distribution. Section 3 and Section 4 review the prior distributions for the parameters indexing the skew-normal and some other skew-symmetric models, respectively, while Section 5 reviews some of the literature on generalizations of the univariate skew-symmetric model. Final comments are provided in Section 6.

2. Default Prior Choices in Bayesian Statistics

The Bayesian approach to quantify uncertainty in statistical inference can be broken down into three steps [18]. The first step consists in choosing a joint probability distribution for observable and unobservable quantities, consistently with the available knowledge about the underlying scientific problem and the data collection process. The second step is to condition on the observed data, which is carried out by means of several computational techniques. The third step is assessing the model’s fit and interpreting the implications of the resulting posterior distribution. In this section we focus on the first step and review the most common default choices for the prior distribution. We use the following notation in the sequel: θ is our parameter of interest, π ( θ ) is the prior distribution, π ( θ | t ) is the posterior distribution given data information t, p ( t | θ ) is the data likelihood, p ( t , θ ) = p ( t | θ ) π ( θ ) is the joint distribution of t and θ , and p ( t ) is the marginal distribution of t. With this notation in hand, we of course have that the posterior equals
$$ \pi(\theta \mid t) = \frac{\pi(\theta)\, p(t \mid \theta)}{p(t)} . $$
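As a small worked illustration of this formula, the sketch below evaluates prior times likelihood on a grid and divides by a numerical approximation of $p(t)$. The binomial example (7 successes in 10 trials, uniform prior) and the helper name `grid_posterior` are assumptions introduced here for illustration; they do not come from the paper.

```python
import math

def grid_posterior(prior, likelihood, grid):
    """Posterior pi(theta | t) on an equally spaced grid: prior times
    likelihood, divided by a Riemann-sum approximation of p(t)."""
    step = grid[1] - grid[0]
    joint = [prior(th) * likelihood(th) for th in grid]
    marginal = step * sum(joint)             # approximates p(t)
    return [j / marginal for j in joint], marginal

# Hypothetical data: 7 successes in 10 Bernoulli trials, uniform prior.
successes, trials = 7, 10
grid = [(i + 0.5) / 2000 for i in range(2000)]      # midpoints of (0, 1)

def binomial_likelihood(th):
    return math.comb(trials, successes) * th**successes * (1 - th)**(trials - successes)

posterior, marginal = grid_posterior(lambda th: 1.0, binomial_likelihood, grid)
step = grid[1] - grid[0]
posterior_mean = step * sum(th * p for th, p in zip(grid, posterior))
```

In this conjugate case the exact answers are known, $p(t) = 1/11$ and a Beta(8, 4) posterior with mean $2/3$, which makes the grid approximation easy to check.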

2.1. Jeffreys Priors

In Bayesian analysis there are situations in which the available prior information is too vague to be formalized into a probability distribution, too controversial to be acceptable in scientific communities or too complicated to allow for a reliable statistical analysis. Hence the need for priors with minimal effect on the posterior distribution, so that “the chosen prior would let the data speak for themselves” [19]. Reference analysis aims at an “objective” Bayesian solution to statistical inference in the same way as conventional statistical methods, where solutions only depend on model assumptions and observed data.
One of the earliest non-informative (objective) priors is the uniform prior for the Binomial proportion [20,21]. Unfortunately, this prior suffers from its lack of invariance under one-to-one reparameterization. Jeffreys’ prior is a non-informative prior which is invariant under one-to-one reparameterization and is proportional to the positive square root of the Fisher information associated with the parameter of interest. For regular models where asymptotic normality holds, the Jeffreys prior enjoys some optimality properties in the absence of nuisance parameters, but suffers from serious difficulties in the presence of nuisance parameters. As a first example, in the Neyman–Scott problem it leads to a strong inconsistency in Bayes estimation of the error variance [22]. As a second example, when estimating the product of two independent normal means, a circular symmetric prior was found to be superior to Jeffreys’ prior [23]. As a third example, Jeffreys himself supported the use of another prior for location-scale models.
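The invariance property can be checked on the Binomial example mentioned above. The sketch below uses a single Bernoulli observation and the logit reparameterization $\psi = \log\{\theta/(1-\theta)\}$; both the reparameterization and the function names are choices made here for illustration. The Fisher information is computed separately in each parameterization, and the two Jeffreys priors are then compared through the usual change-of-variables factor.

```python
import math

def fisher_information_theta(theta):
    """Expected squared score of one Bernoulli(theta) observation."""
    score_1 = 1.0 / theta                # d/dtheta log f(1 | theta)
    score_0 = -1.0 / (1.0 - theta)       # d/dtheta log f(0 | theta)
    return theta * score_1**2 + (1.0 - theta) * score_0**2

def fisher_information_logit(psi):
    """Expected squared score in psi, where log f(y | psi) = y*psi - log(1 + e^psi)."""
    theta = 1.0 / (1.0 + math.exp(-psi))
    score_1 = 1.0 - theta                # d/dpsi log f(1 | psi)
    score_0 = -theta                     # d/dpsi log f(0 | psi)
    return theta * score_1**2 + (1.0 - theta) * score_0**2

theta = 0.3
psi = math.log(theta / (1.0 - theta))

jeffreys_theta = math.sqrt(fisher_information_theta(theta))
jeffreys_psi = math.sqrt(fisher_information_logit(psi))

# Transforming the theta-prior as a density: multiply by |d theta / d psi|.
transformed_jeffreys = jeffreys_theta * theta * (1.0 - theta)
# The uniform prior on theta, by contrast, is not uniform in psi.
transformed_uniform = 1.0 * theta * (1.0 - theta)
```

Here `transformed_jeffreys` agrees with `jeffreys_psi`, whereas the transformed uniform prior depends on θ and therefore is not the uniform prior one would have written down directly for ψ.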

2.2. Reference Priors

Intuitively, a reference prior for some real-valued parameter $\theta$ is a prior of the form $\pi(\theta) = \pi(\theta \mid \mathcal{T}, \mathcal{P})$ which maximizes the missing information about $\theta$ within the class $\mathcal{P}$ of prior distributions compatible with the available prior knowledge $\mathcal{T}$ [19]. More formally, let $D$ be a set of observations, generated by some random mechanism $p(D \mid \theta)$ that only depends on a real-valued parameter $\theta \in \Theta$. Furthermore, let $t = t(D) \in \mathcal{T}$ be any sufficient statistic (which may be the complete data set $D$). In Shannon’s general information theory the amount of information $I^{\theta}\{\mathcal{T}, \pi(\theta)\}$, which may be expected to be provided by $D$ or equivalently by $t(D)$ about $\theta$, is
$$ I^{\theta}\{\mathcal{T}, \pi(\theta)\} = \int_{\mathcal{T}} \int_{\Theta} p(t, \theta) \log \frac{p(t, \theta)}{p(t)\, \pi(\theta)}\, d\theta\, dt = E_t\!\left[ \int_{\Theta} \pi(\theta \mid t) \log \frac{\pi(\theta \mid t)}{\pi(\theta)}\, d\theta \right], \tag{4} $$
which is the expected Kullback–Leibler divergence of the prior from the posterior (here $E_t$ indicates that the expectation is taken on the $t$-part). The functional $I^{\theta}\{\mathcal{T}, \pi(\theta)\}$ is concave, non-negative and invariant under one-to-one transformations of $\theta$. Lindley (1956) [24] and Bernardo (1979) [19] defined the reference prior as the prior maximizing (4). In some situations the maximization has to be carried out asymptotically, since for a fixed sample size $n$ it might lead to a discrete prior with finitely many jumps, which is hardly compatible with the concept of diffuse prior [25]. Ref. [19] proved that, in the absence of any nuisance parameter, Jeffreys’ prior yields the necessary maximization.
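The equality of the two expressions in (4) can be verified on a toy problem in which the integrals become finite sums. The three-point parameter space, the prior weights and the Binomial(5, θ) sampling model below are hypothetical choices made purely for illustration.

```python
import math

thetas = [0.2, 0.5, 0.8]            # hypothetical finite parameter space
prior = [0.25, 0.5, 0.25]           # hypothetical prior weights
n_trials = 5

def binomial_pmf(t, theta):
    return math.comb(n_trials, t) * theta**t * (1 - theta)**(n_trials - t)

outcomes = range(n_trials + 1)
joint = {(t, i): binomial_pmf(t, th) * prior[i]
         for t in outcomes for i, th in enumerate(thetas)}
marginal = {t: sum(joint[t, i] for i in range(len(thetas))) for t in outcomes}

# First expression in (4): double sum over t and theta.
info_joint = sum(p * math.log(p / (marginal[t] * prior[i]))
                 for (t, i), p in joint.items())

# Second expression in (4): Kullback-Leibler divergence between posterior
# and prior, averaged over the marginal distribution of t.
info_kl = 0.0
for t in outcomes:
    posterior = [joint[t, i] / marginal[t] for i in range(len(thetas))]
    kl = sum(q * math.log(q / prior[i]) for i, q in enumerate(posterior))
    info_kl += marginal[t] * kl
```

Both quantities coincide and are strictly positive here, since the data carry information about θ under this sampling model.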

2.3. Matching Priors

Matching priors allow for posterior probability statements which have an interpretation as confidence statements in the sampling model. Matching priors aim at achieving a compromise between Bayesian and frequentist inference based on some order of approximation, thus providing default priors for routine use in Bayesian inference and possibly more palatable to frequentist statisticians. The concept of matching prior appears to have been proposed first by Lindley (1958) [26] and several matching priors have been proposed since, such as for example quantile matching priors, matching priors for distribution functions, highest probability density matching priors and matching priors associated with likelihood ratio statistics [27].
In this subsection we illustrate the approach to matching priors introduced by Welch and Peers in the seminal paper [28]. Suppose that $Y_1, \ldots, Y_n$ are i.i.d. random variables with pdf $f(Y \mid \theta)$, where $\theta$ is real-valued. In addition, assume all the regularity conditions that allow the posterior to be expanded around the MLE $\hat{\theta}_n$. Furthermore, for $0 < \alpha < 1$, let $\theta^{\pi}_{1-\alpha}(Y_1, \ldots, Y_n) \equiv \theta^{\pi}_{1-\alpha}$ denote the $(1-\alpha)$-th asymptotic posterior quantile of $\theta$ based on the prior $\pi$, that is
$$ P^{\pi}\!\left[ \theta \le \theta^{\pi}_{1-\alpha} \mid Y_1, \ldots, Y_n \right] = 1 - \alpha + O_p(n^{-r}) $$
for some $r > 0$. If $r = 1$, $\pi$ is called a first-order matching prior and if $r = 3/2$ the prior $\pi$ will be a second-order probability matching prior. For instance, the Jeffreys prior is a first-order probability matching prior in the absence of nuisance parameters. We illustrate this appealing property with an example from [29]. Suppose that $Y_1, \ldots, Y_n$ are i.i.d. $N(\theta, 1)$ and that $\pi(\theta) = 1$ with $-\infty < \theta < \infty$. Then the posterior $\pi(\theta \mid Y_1, \ldots, Y_n)$ is the $N(\bar{Y}_n, n^{-1})$ distribution. Denoting by $z_{1-\alpha}$ the $100(1-\alpha)\%$ quantile of the $N(0, 1)$ distribution, we have
$$ P\!\left[ \sqrt{n}\,(\theta - \bar{Y}_n) \le z_{1-\alpha} \mid Y_1, \ldots, Y_n \right] = 1 - \alpha = P\!\left[ \sqrt{n}\,(\bar{Y}_n - \theta) \ge -z_{1-\alpha} \mid \theta \right]. $$
Therefore, the one-sided credible interval with upper limit $\bar{Y}_n + z_{1-\alpha}/\sqrt{n}$ for $\theta$ has exact frequentist coverage probability $(1-\alpha)$. Such exact matching is not always available. However, if $Y_1, \ldots, Y_n$ are i.i.d. random variables then
$$ \hat{\theta}_n \mid \theta \overset{a}{\sim} N\!\left( \theta,\, I^{-1}(\theta)/n \right) $$
where $I$ is the expected Fisher information and $\overset{a}{\sim}$ means asymptotically equivalent in distribution. Using the delta method we have
$$ g(\hat{\theta}_n) \mid \theta \overset{a}{\sim} N\!\left( g(\theta),\, (g'(\theta))^2\, I^{-1}(\theta)/n \right). $$
Therefore, if $g'(\theta) = I^{1/2}(\theta)$, then $g(\theta) = \int^{\theta} I^{1/2}(t)\, dt$ and $\sqrt{n}\,\{ g(\hat{\theta}_n) - g(\theta) \} \mid \theta$ is asymptotically $N(0, 1)$. In the absence of nuisance parameters, a first-order matching prior for $\theta$ is a solution of the differential equation
$$ \frac{d}{d\theta} \left\{ \pi(\theta)\, I^{-1/2}(\theta) \right\} = 0, $$
so that the Jeffreys prior is the unique first-order matching prior; it need not, however, be second-order matching [27].
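The exact matching in the normal example above can be checked by simulation. The sketch below is a plain Monte Carlo experiment; the true mean, sample size, level, number of replications and seed are arbitrary assumptions. It estimates how often the upper credible limit $\bar{Y}_n + z_{1-\alpha}/\sqrt{n}$ lies above the true θ.

```python
import math
import random
from statistics import NormalDist

def upper_credible_limit(sample, alpha):
    """(1 - alpha) posterior quantile of theta under the flat prior,
    i.e. the quantile of N(sample mean, 1/n)."""
    n = len(sample)
    mean = sum(sample) / n
    return mean + NormalDist().inv_cdf(1.0 - alpha) / math.sqrt(n)

def frequentist_coverage(theta, n, alpha, replications, rng):
    """Share of simulated samples whose credible limit lies above theta."""
    hits = 0
    for _ in range(replications):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        hits += theta <= upper_credible_limit(sample, alpha)
    return hits / replications

rng = random.Random(12)            # fixed seed, chosen arbitrarily
coverage = frequentist_coverage(theta=1.5, n=10, alpha=0.1,
                                replications=20_000, rng=rng)
```

Up to Monte Carlo error the estimated coverage equals $1 - \alpha = 0.9$, and it does so for any sample size because the matching is exact in this model rather than asymptotic.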
To obtain the second-order matching prior we need an asymptotic expansion of the posterior distribution function up to $O(n^{-1})$ and the differential equation given by Mukerjee and Dey (1993) [30] and Mukerjee and Ghosh (1997) [31], that is
$$ \frac{1}{3} \frac{d}{d\theta} \left\{ \pi(\theta)\, I^{-2}(\theta)\, g_3(\theta) \right\} + \frac{d^2}{d\theta^2} \left\{ \pi(\theta)\, I^{-1}(\theta) \right\} = 0 \tag{5} $$
where $g_3(\theta) = -E\!\left[ \dfrac{d^3 \log f(Y_1 \mid \theta)}{d\theta^3} \,\Big|\, \theta \right]$. Jeffreys’ prior is the unique second-order matching prior if it satisfies (5), as happens for the location-scale families: for $\pi_J(\theta) = I^{1/2}(\theta)$, (5) becomes
$$ \frac{1}{3} \frac{d}{d\theta} \left[ I^{-3/2}(\theta)\, g_3(\theta) \right] + \frac{d^2}{d\theta^2}\, I^{-1/2}(\theta) = 0 $$
which requires
$$ \frac{1}{3}\, I^{-3/2}(\theta)\, g_3(\theta) + \frac{d}{d\theta}\, I^{-1/2}(\theta) $$
to be constant for all values of $\theta$. We refer the reader to [29] for more details on cases where, in the absence of nuisance parameters, no second-order probability matching prior exists, and on cases where, in the presence of nuisance parameters, first- and second-order matching priors exist or a second-order matching prior fails to exist.
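As a concrete check of this constancy condition, consider the exponential model with rate θ, a scale family chosen here only as an illustration. The sketch assumes the sign convention $g_3(\theta) = -E[d^3 \log f(Y_1 \mid \theta)/d\theta^3 \mid \theta]$; the constancy conclusion holds under either sign for this model, only the value of the constant changes. The Fisher information and $g_3$ are hard-coded from $\log f(y \mid \theta) = \log\theta - \theta y$, and the derivative of $I^{-1/2}$ is taken by central differences.

```python
def fisher_information(theta):
    """Exponential model with rate theta: I(theta) = 1 / theta^2."""
    return 1.0 / theta**2

def g3(theta):
    """Assumed convention g_3 = -E[d^3 log f / d theta^3]; the third
    derivative of log(theta) - theta * y with respect to theta is 2 / theta^3."""
    return -2.0 / theta**3

def second_order_quantity(theta, eps=1e-5):
    """(1/3) I^{-3/2} g_3 + d/dtheta I^{-1/2}, derivative by central differences."""
    def inv_root(t):
        return fisher_information(t) ** -0.5
    derivative = (inv_root(theta + eps) - inv_root(theta - eps)) / (2.0 * eps)
    return fisher_information(theta) ** -1.5 * g3(theta) / 3.0 + derivative

values = [second_order_quantity(t) for t in (0.5, 1.0, 2.0, 7.5)]
```

All entries of `values` equal $1/3$ up to rounding, so the quantity is free of θ and the Jeffreys prior $\pi_J(\theta) \propto 1/\theta$ satisfies the second-order condition in this example.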

3. Prior Choices for the Skew-Normal Distribution

This section reviews the prior distributions proposed for Bayesian inference on the parameters of the skew-normal distribution: the reference prior by Liseo and Loperfido (2006) [32], the matching prior by Cabras et al. (2012) [33] and the informative prior by Canale and Scarpa (2013) [34].

3.1. The Reference Prior

Liseo and Loperfido (2006) [32] first proposed a default prior for the shape parameter of the location-scale-free (standard) skew-normal model $sn(z; \lambda) = 2\phi(z)\Phi(\lambda z)$, $z \in \mathbb{R}$. The associated Jeffreys prior is
$$ \pi_J(\lambda) \propto I^{1/2}(\lambda) \quad \text{where} \quad I(\lambda) = \int_{-\infty}^{\infty} 2 z^2\, \phi(z)\, \frac{\phi^2(\lambda z)}{\Phi(\lambda z)}\, dz . $$
This prior is proper, symmetric about $\lambda = 0$, decreasing in $|\lambda|$ and its tails are of order $O(\lambda^{-3/2})$. This prior is therefore suitable for testing the hypothesis of symmetry, which might be formalized in the skew-normal framework as $H_0 : \lambda = 0$ versus $H_1 : \lambda \neq 0$. The same authors investigated the frequentist performances of this prior with simulated data, concluding that the Bayesian approach might be beneficial in easing some inferential difficulties of the frequentist approach for the standard skew-normal distribution.
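The stated properties of this Jeffreys prior can be explored numerically. The sketch below is only an illustration and not the computation used in [32]: it folds the integral defining $I(\lambda)$ onto $z > 0$, using $1/\Phi(u) + 1/\Phi(-u) = 1/\{\Phi(u)\Phi(-u)\}$, and applies a midpoint rule. The cutoff, the number of steps and the function names are assumptions made here.

```python
import math

CUTOFF = 25.0   # beyond |u| = CUTOFF the integrand is numerically zero

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))   # accurate in both tails

def tail_ratio(u):
    """phi(u)^2 / (Phi(u) * (1 - Phi(u))), set to 0 where it is negligible."""
    if abs(u) > CUTOFF:
        return 0.0
    return phi(u) ** 2 / (Phi(u) * Phi(-u))

def fisher_information(lam, steps=4000):
    """I(lambda) = int_0^inf 2 z^2 phi(z) phi(lam z)^2 / (Phi(lam z) Phi(-lam z)) dz,
    which equals the integral over the whole real line of 2 z^2 phi phi^2 / Phi."""
    upper = 12.0 if lam == 0 else min(12.0, CUTOFF / abs(lam))
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h                      # midpoint rule
        total += 2.0 * z * z * phi(z) * tail_ratio(lam * z)
    return h * total

def jeffreys_prior(lam):
    """Unnormalized Jeffreys prior, the square root of I(lambda)."""
    return math.sqrt(fisher_information(lam))

tail_decay = jeffreys_prior(200.0) / jeffreys_prior(100.0)
```

At $\lambda = 0$ the integral reduces to $4\phi(0)^2 = 2/\pi$, and doubling a large λ multiplies the prior by roughly $2^{-3/2} \approx 0.354$, in line with the $O(\lambda^{-3/2})$ tail behaviour.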
Ref. [32] also considered a default Bayes analysis for the general scalar case (1), where $\lambda$ is the parameter of interest and the location parameter $\mu$ and the scale parameter $\sigma$ are the nuisance parameters. They are assumed to be independent of $\lambda$ and to have a normal-inverse gamma distribution:
$$ \mu \mid \sigma \sim N\!\left( \mu_0, \frac{\sigma^2}{\tau} \right) \ \text{with} \ \mu_0 \in \mathbb{R},\ \tau > 0, \qquad \sigma^{-2} \sim Gamma(\alpha, \beta) $$
where $Gamma(\alpha, \beta)$ is a Gamma distribution with parameters $\alpha, \beta > 0$. The default prior $\pi(\mu, \sigma) \propto \sigma^{-1}$ is a limiting case and is the conditional reference prior for $(\mu, \sigma)$ given $\lambda$. These assumptions allow for a closed-form expression of the marginal likelihood for $\lambda$. The proposed method has been successfully applied to the infamous “frontier” dataset (see http://azzalini.stat.unipd.it/SN/frontier.dat), where the maximum likelihood estimate of the skewness parameter $\lambda$ is infinite.
Bayes and Branco (2007) [35] highlighted the advantages of the Bayesian approach and proposed two priors. They considered the stochastic representation (3) of the skew-normal distribution and, following the Bayes–Laplace rule, chose the uniform distribution on the interval $(-1, 1)$ as a prior for $\lambda / \sqrt{1 + \lambda^2}$, thus leading to a $t(0, 0.5, 2)$ distribution as prior for $\lambda$, where $t(a, b, c)$ denotes the Student t distribution centered in $a \in \mathbb{R}$ with scale $b > 0$ and $c > 0$ degrees of freedom, which is a non-vague and non-subjective prior. They further proposed the tractable approximation $t(0, \pi^2/4, 1/2)$ for the Jeffreys prior from [32]. They motivated it by the following approximation (see [36]):
$$ \frac{1}{\pi}\, \frac{\phi(x)}{\sqrt{\Phi(x)\,(1 - \Phi(x))}} \approx \frac{1}{\sqrt{2\pi}\,(\pi/2)} \exp\!\left( -\frac{2x^2}{\pi^2} \right). $$
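Both statements in this paragraph lend themselves to a quick numerical check. The sketch below first measures the largest gap between the two sides of the displayed approximation on a grid, and then compares the distribution of λ induced by a uniform prior on $\lambda/\sqrt{1+\lambda^2}$ with a Student t distribution with 2 degrees of freedom. It reads the second argument of $t(0, 0.5, 2)$ as a squared scale, which is an interpretation assumed here because it makes the identity exact; the function names are likewise hypothetical.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))   # accurate in both tails

def exact_kernel(x):
    """Left-hand side: phi(x) / (pi * sqrt(Phi(x) * (1 - Phi(x))))."""
    return phi(x) / (math.pi * math.sqrt(Phi(x) * Phi(-x)))

def normal_approximation(x):
    """Right-hand side: the N(0, (pi/2)^2) density."""
    scale = math.pi / 2.0
    return math.exp(-2.0 * x * x / math.pi**2) / (math.sqrt(2.0 * math.pi) * scale)

grid = [-6.0 + 0.01 * k for k in range(1201)]
max_gap = max(abs(exact_kernel(x) - normal_approximation(x)) for x in grid)

def cdf_lambda_under_uniform_delta(lam):
    """P(lambda <= lam) when delta = lambda / sqrt(1 + lambda^2) ~ U(-1, 1)."""
    return 0.5 * (1.0 + lam / math.sqrt(1.0 + lam * lam))

def cdf_student_t2(lam, scale_sq=0.5):
    """Cdf of a Student t with 2 degrees of freedom and squared scale scale_sq."""
    x = lam / math.sqrt(scale_sq)
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))
```

The two sides of the approximation agree exactly at $x = 0$ and stay within a few thousandths of each other on the grid, while the two cdfs coincide for every λ.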

3.2. The Matching Prior

Cabras et al. (2012) [33] proposed another approach towards Bayesian inference about the shape parameter of the skew-normal distribution. It is based on a pseudo-likelihood function and a matching prior distribution for the shape parameter when location and scale parameters are unknown. First, they derive the marginal likelihood
$$ L_m(\lambda) = \int_0^{+\infty}\!\! \int_{-\infty}^{+\infty} \frac{L(\lambda, \eta)}{\sigma}\, d\mu\, d\sigma, \tag{6} $$
where $L(\lambda, \eta) = \prod_{i=1}^{n} sn(y_i; \eta, \lambda)$ is the skew-normal likelihood function, $\eta = (\mu, \sigma)$ is the nuisance parameter and $\sigma^{-1}$ is the right-invariant Haar measure on the location-scale group of transformations, whose action on the parameter space leaves $\lambda$ unchanged. The marginal likelihood (6) can be approximated by the modified profile likelihood $L_{mp}(\lambda)$ of [37], since $L_m(\lambda) = L_{mp}(\lambda)\,(1 + O(n^{-1}))$ (see [38]). Invoking results about the use of pseudo-likelihood functions in Bayesian analysis, the matching prior $\pi(\lambda)$ is then simply proportional to the square root of the inverse of the asymptotic variance of the MLE of $\lambda$. Based on Ventura et al. (2009) [39], the matching prior for $\lambda$ corresponding to (6) is
$$ \pi(\lambda) \propto I_{\lambda\lambda.\eta}(\lambda, \hat{\eta}_\lambda)^{1/2}, \tag{7} $$
where $\hat{\eta}_\lambda$ is the constrained MLE of $\eta$ for a given $\lambda$ and $I_{\lambda\lambda.\eta}(\lambda, \eta) = I_{\lambda\lambda}(\lambda, \eta) - I_{\lambda\eta}(\lambda, \eta)\, I_{\eta\eta}(\lambda, \eta)^{-1}\, I_{\eta\lambda}(\lambda, \eta)$ is the partial information, with $I_{\lambda\lambda}(\lambda, \eta)$, $I_{\lambda\eta}(\lambda, \eta)$, $I_{\eta\eta}(\lambda, \eta)$ and $I_{\eta\lambda}(\lambda, \eta)$ blocks of the expected Fisher information
$$ I(\lambda, \eta) = \begin{pmatrix} I_{\lambda\lambda}(\lambda, \eta) & I_{\lambda\eta}(\lambda, \eta) \\ I_{\eta\lambda}(\lambda, \eta) & I_{\eta\eta}(\lambda, \eta) \end{pmatrix}. $$
For the interested reader we provide the detailed quantities of this matrix:
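$$ I_{\lambda\lambda}(\lambda, \eta) = a_2, \quad I_{\lambda\mu}(\lambda, \eta) = \frac{1}{\sigma} \left( \frac{b}{A^{3/2}} - \lambda a_1 \right), \quad I_{\lambda\sigma}(\lambda, \eta) = -\frac{\lambda a_2}{\sigma}, $$
$$ I_{\mu\mu}(\lambda, \eta) = \frac{1 + \lambda^2 a_0}{\sigma^2}, \quad I_{\mu\sigma}(\lambda, \eta) = \frac{1}{\sigma^2} \left( \frac{b \lambda (1 + 2\lambda^2)}{A^{3/2}} + \lambda^2 a_1 \right), \quad I_{\sigma\sigma}(\lambda, \eta) = \frac{2 + \lambda^2 a_2}{\sigma^2}, $$
where $b = \sqrt{2/\pi}$, $A = 1 + \lambda^2$, and $a_i = E\!\left[ Z^i \left( \dfrac{\phi(\lambda Z)}{\Phi(\lambda Z)} \right)^{2} \right]$, $i = 0, 1, 2$, with $Z$ following the standard skew-normal with parameter $\lambda$. However, the prior (7) might be data dependent because of the presence of $\hat{\eta}_\lambda$. A prior for $\lambda$ which does not suffer from this problem is proportional to
$$ \sqrt{ \frac{ a_2 A^2 \left[ \pi (1 + a_0 \lambda^4) + \lambda^2 \left( \pi (1 + a_0) - 4 \right) \right] + 2\sqrt{2\pi}\, a_1 \lambda A^{3/2} - \pi a_1^2 \lambda^2 A^3 - 2 }{ \pi A^3 \left[ 2 + \lambda^2 (2 a_0 + a_2) + \lambda^4 (a_0 a_2 - a_1^2) \right] - 2 (\lambda + 2\lambda^3)^2 - 2\sqrt{2\pi}\, a_1 \lambda^3 \sqrt{A}\, (1 + 3\lambda^2 + 2\lambda^4) } } . $$
This prior is proper, symmetric at the origin and with tails of order $O(\lambda^{-3/2})$. It also compensates for the possible monotonicity of the modified profile likelihood (6) and possesses good frequentist properties.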
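The link between the information blocks and the data-free prior can be made concrete with a short numerical sketch. It is an illustration under the reading adopted here, namely that the displayed ratio sits under a square root and equals half the partial information; it is not code from [33]. The coefficients $a_0, a_1, a_2$ are obtained by a midpoint rule, σ is set to 1 since it cancels in the partial information, and all function names are hypothetical.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def a_coefficients(lam, half_width=10.0, steps=20000):
    """a_i = E[Z^i (phi(lam Z) / Phi(lam Z))^2] for Z ~ SN(lam), i = 0, 1, 2.
    The density 2 phi(z) Phi(lam z) cancels one factor Phi in the ratio."""
    h = 2.0 * half_width / steps
    a = [0.0, 0.0, 0.0]
    for k in range(steps):
        z = -half_width + (k + 0.5) * h
        u = lam * z
        if abs(u) > 25.0:              # integrand is numerically zero here
            continue
        weight = 2.0 * phi(z) * phi(u) ** 2 / Phi(u)
        for i in range(3):
            a[i] += h * weight * z**i
    return a

def partial_information(lam):
    """I_{lambda lambda . eta} from the information blocks with sigma = 1."""
    a0, a1, a2 = a_coefficients(lam)
    b, A = math.sqrt(2.0 / math.pi), 1.0 + lam * lam
    i_lm = b / A**1.5 - lam * a1
    i_ls = -lam * a2
    i_mm = 1.0 + lam * lam * a0
    i_ms = b * lam * (1.0 + 2.0 * lam * lam) / A**1.5 + lam * lam * a1
    i_ss = 2.0 + lam * lam * a2
    det = i_mm * i_ss - i_ms**2
    quad = (i_lm**2 * i_ss - 2.0 * i_lm * i_ls * i_ms + i_ls**2 * i_mm) / det
    return a2 - quad

def closed_form(lam):
    """Twice the ratio whose square root gives the data-free prior."""
    a0, a1, a2 = a_coefficients(lam)
    A, pi = 1.0 + lam * lam, math.pi
    num = (a2 * A**2 * (pi * (1 + a0 * lam**4) + lam**2 * (pi * (1 + a0) - 4))
           + 2 * math.sqrt(2 * pi) * a1 * lam * A**1.5
           - pi * a1**2 * lam**2 * A**3 - 2)
    den = (pi * A**3 * (2 + lam**2 * (2 * a0 + a2) + lam**4 * (a0 * a2 - a1**2))
           - 2 * (lam + 2 * lam**3) ** 2
           - 2 * math.sqrt(2 * pi) * a1 * lam**3 * math.sqrt(A)
           * (1 + 3 * lam**2 + 2 * lam**4))
    return 2.0 * num / den
```

Evaluating both functions at the same λ gives matching values, which supports reading the closed-form expression as the partial information with the nuisance parameters eliminated analytically. The numerator vanishes at $\lambda = 0$, reflecting the singularity of the Fisher information discussed in the Introduction.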

3.3. The Informative Priors of Canale and Scarpa (2013)

Canale and Scarpa (2013) [34] discuss two informative priors for the skewness parameter of the skew-normal distribution. Their study is motivated by an interesting data set on marks referring to first-year undergraduate students for the program in Economics at the University of Padua. The skew-normal model is implemented on students’ grades in the first mandatory class of Statistics. Making inference on the grades of the previous years shows that the distribution of Statistics grades is skewed to the right around a certain mean, which explains why they need informative priors for their endeavour.
The first prior is the normal density with hyperparameters reflecting prior beliefs about the expectation and variance of $\lambda$, so that the prior can be centered on a particular guess for $\lambda$. The resulting posterior belongs to the family of unified skew-normal (SUN) distributions, introduced in [40]. The explicit expressions for the mean and the variance of the posterior are not very tractable but they allow for a simple interpretation. The second informative prior is itself a skew-normal, motivated by the distribution of grades of university examinations [41]. The skew-normal prior includes location and scale hyperparameters as well as a skewness hyperparameter reflecting the beliefs on the direction of skewness. The posterior distribution also belongs to the class of SUN distributions. The authors set the location hyperparameter of the skew-normal prior equal to zero in order to encode rough prior information only about the side towards which the data are skewed: positive or negative values of the skewness hyperparameter put more prior mass on the positive or negative semi-axis, respectively. In both cases the resulting posteriors are intractable, but the SUN parametrization allows efficient sampling for posterior computation via Markov chain Monte Carlo (MCMC). For both prior choices for $\lambda$, they specified an independent normal-inverse gamma distribution for the location and scale parameters. To perform the related Bayesian inference, the authors presented an algorithm to simulate from the full conditional distribution of the skewness parameter $\lambda$ given the location and scale parameters. This algorithm uses a Gibbs sampler based on the stochastic representation of the SUN model. To obtain the posterior, the authors introduced normal latent variables, say $\eta_1, \ldots, \eta_n$. Conditionally on these latent variables, the generic $i$-th observation is normally distributed with a specific mean and variance.
This way of constructing the Gibbs sampler leads to conjugacy for the location and scale parameters. For the detailed computations we refer the reader to [34]. This sampling method is useful in MCMC methods to approximate the posterior distribution.
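To give a feel for posterior simulation under an informative normal prior on λ, the sketch below runs a plain random-walk Metropolis chain. This is deliberately not the Gibbs sampler of [34]: it is a simpler, generic technique swapped in for illustration, with location 0 and scale 1 treated as known. The data, the prior hyperparameters, the proposal step and the seed are all hypothetical.

```python
import math
import random

def log_Phi(x):
    """log of the standard normal cdf; returns -inf if it underflows."""
    p = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else -math.inf

def log_posterior(lam, data, prior_mean, prior_sd):
    """Log of [normal prior on lambda] x [standard skew-normal likelihood],
    up to an additive constant."""
    log_prior = -0.5 * ((lam - prior_mean) / prior_sd) ** 2
    return log_prior + sum(log_Phi(lam * z) for z in data)

def metropolis(data, prior_mean, prior_sd, draws, step, rng):
    """Random-walk Metropolis chain for lambda."""
    lam = prior_mean
    current = log_posterior(lam, data, prior_mean, prior_sd)
    chain, accepted = [], 0
    for _ in range(draws):
        proposal = lam + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data, prior_mean, prior_sd)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            lam, current, accepted = proposal, candidate, accepted + 1
        chain.append(lam)
    return chain, accepted / draws

data = [0.2, 0.7, 1.1, 1.9, 2.4]          # hypothetical all-positive sample
rng = random.Random(34)                   # fixed seed, chosen arbitrarily
chain, acceptance_rate = metropolis(data, prior_mean=0.0, prior_sd=3.0,
                                    draws=5000, step=2.0, rng=rng)
kept = chain[1000:]                       # discard a burn-in period
posterior_mean = sum(kept) / len(kept)
```

For this all-positive sample the MLE of λ is infinite, yet the normal prior keeps the posterior proper and the chain settles around finite positive values.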
We also wish to mention that MCMC methods are particularly important for model selection in Bayesian statistics. Suppose we have a set of models reflecting competing hypotheses about the underlying data set, where each model is characterized by a specific vector of parameters of interest. From the Bayesian viewpoint, these models are compared pairwise through their Bayes factor, which is the ratio of their marginal likelihoods. The marginal likelihood, however, is often not available analytically. We refer the reader to [42] and references therein for estimation methods of the marginal likelihood, specifically in general non-nested models.
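In low-dimensional problems the marginal likelihoods entering a Bayes factor can simply be integrated numerically. The sketch below compares the standard normal model with the standard skew-normal model, assuming location 0 and scale 1 are known and placing a uniform prior on $\delta = \lambda/\sqrt{1+\lambda^2}$. The choice of models, the prior and the data are assumptions made here for illustration; this is not one of the estimation methods surveyed in [42].

```python
import math

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def bayes_factor_skew_vs_normal(data, steps=4000):
    """BF_10 = m_1(data) / m_0(data), where m_0 is the N(0, 1) likelihood and
    m_1 averages the standard skew-normal likelihood over the prior on lambda.
    The common factor prod phi(z_i) cancels, leaving the prior expectation of
    prod 2 Phi(lambda z_i), computed by a midpoint rule in delta on (-1, 1)."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        delta = -1.0 + (k + 0.5) * h
        lam = delta / math.sqrt(1.0 - delta * delta)
        ratio = 1.0
        for z in data:
            ratio *= 2.0 * Phi(lam * z)
        total += 0.5 * h * ratio       # uniform prior density 1/2 on (-1, 1)
    return total

positive_sample = [0.5, 1.0, 1.5, 2.0]             # hypothetical data
bf_positive = bayes_factor_skew_vs_normal(positive_sample)
bf_mirrored = bayes_factor_skew_vs_normal([-z for z in positive_sample])
```

A sample lying entirely on one side of zero favours the skewed model, and mirroring the data leaves the Bayes factor unchanged because the prior is symmetric about $\lambda = 0$.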

4. Prior Choices for Other Skew-Symmetric Distributions

There exists a wide literature on the Bayesian analysis of skew-symmetric distributions other than the skew-normal. Azzalini (1986) [6] and Naranjo et al. (2012) [43] provided a Bayesian analysis of a skewed exponential power distribution. This family includes the symmetric exponential distribution as well as the skew-normal distribution, and provides flexible distributions with lighter and heavier tails. Interestingly, this family of densities can fit each tail separately. Hossianzadeh and Zare (2016) [44] estimated the parameter of the discrete skewed Laplace distribution by an empirical Bayesian analysis and compared it with the maximum likelihood approach. In what follows, we will first consider the popular skew-t distribution and then focus on two general approaches for skew-symmetric densities.

4.1. Jeffreys’ Prior for Skew-t Distributions

Skew-t distributions are the best-known alternatives to skew-normal ones, due to their flexibility: they can model any level of skewness and excess kurtosis. However, they pose some further inferential problems, which we illustrate in the simpler case of the Student t distribution with known location and scale parameters. Ref. [45] discussed that the likelihood function approaches infinity when the degrees of freedom go to zero, and showed that the supremum of the likelihood function may be achieved when the degrees of freedom go to infinity. There have been several frequentist attempts to solve the inferential problems of the skew-t distribution with all parameters unknown. Sartori (2006) [46] used the modified score function, which requires the degrees of freedom to be fixed. Azzalini and Genton (2008) [10] proposed a deviance approach which is only partially satisfactory, since its implementation might not be straightforward. We illustrate the problem with the univariate skew-t distribution. The deviance approach replaces the boundary maximum likelihood estimate of $(\lambda, \upsilon)$ by the smallest vector $(\lambda_0, \upsilon_0)$ for which the null hypothesis $H_0 : (\lambda, \upsilon) = (\lambda_0, \upsilon_0)$ is not rejected. The deviance approach assumes that such a smallest vector exists, but neither theoretical results nor simulation studies support this assumption. For these reasons we cannot exclude the existence of samples admitting two vectors $(\lambda_a, \upsilon_a)$ and $(\lambda_b, \upsilon_b)$ satisfying $\lambda_a > \lambda_b$ and $\upsilon_a < \upsilon_b$ for which the hypotheses $H_0 : (\lambda, \upsilon) = (\lambda_a, \upsilon_a)$ and $H_0 : (\lambda, \upsilon) = (\lambda_b, \upsilon_b)$ are not rejected, while no vector $(\lambda_0, \upsilon_0)$ exists satisfying $(\lambda_0, \upsilon_0) < (\lambda_a, \upsilon_a)$, $(\lambda_0, \upsilon_0) < (\lambda_b, \upsilon_b)$ and for which the null hypothesis $H_0 : (\lambda, \upsilon) = (\lambda_0, \upsilon_0)$ is not rejected.
We reckon that this situation is likely to happen, given that when either of the parameters $\lambda$ or $\upsilon$ is large the shape of the skew-t density function remains almost unchanged if one parameter is substantially increased while the other is substantially decreased.
Given these shortcomings, Branco, Genton and Liseo (2012) [47] studied Bayesian analysis for various forms of skew-t distributions. Denoting by ν > 0 the degrees of freedom parameter, they first considered skew-t densities of the form
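$$ 2\, t(z \mid \nu)\, T(\lambda z \mid \nu), \quad z \in \mathbb{R}, $$
where $t(\cdot \mid \nu)$ and $T(\cdot \mid \nu)$ are the pdf and the cdf of a Student t distribution with $\nu$ degrees of freedom. The corresponding Jeffreys prior for the skewness parameter $\lambda$ when $\nu$ is known and finite is
$$ \pi_J(\lambda \mid \nu) \propto \sqrt{ \int_{-\infty}^{\infty} z^2\, t(z \mid \nu)\, \frac{t^2(\lambda z \mid \nu)}{T(\lambda z \mid \nu)}\, dz } . $$
It is proper, symmetric about zero and with tails of order $O(\lambda^{-3/2})$. The same authors further investigated the case of the skew-t distribution of [5] with pdf
$$ 2\, t(z \mid \nu)\, T\!\left( \lambda z \sqrt{\frac{\nu + 1}{\nu + z^2}} \,\Big|\, \nu + 1 \right), \quad z \in \mathbb{R}. $$
The corresponding Jeffreys prior for $\lambda$ for known and finite $\nu$ is
$$ \pi_J(\lambda \mid \nu) \propto \sqrt{ \int_0^{\infty} \frac{ z^2\, t(z \mid \nu)\, t^2(\lambda z \mid \nu) }{ (\nu + z^2)\, T\!\left( \lambda z \sqrt{\frac{\nu + 1}{\nu + z^2}} \,\Big|\, \nu + 1 \right) \left[ 1 - T\!\left( \lambda z \sqrt{\frac{\nu + 1}{\nu + z^2}} \,\Big|\, \nu + 1 \right) \right] }\, dz } . $$
Again, this prior is proper, symmetric about zero and the tails are of order $O(\lambda^{-3/2})$.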

4.2. Jeffreys Prior for General Skew-Symmetric Models

Rubio and Liseo (2014) [48] investigated the Jeffreys prior for the skewness parameter of a general class of scalar skew-symmetric models. The Jeffreys prior cannot be used for some skew-symmetric models at $\lambda = 0$ because of the singularity of the Fisher information at this point; see the Introduction for details about this issue. They showed that under mild conditions, including knowledge of location and scale parameters, the Jeffreys prior of the skewness parameter $\lambda$ in the skew-symmetric model is proper, symmetric about zero and has tails of order $O(|\lambda|^{-3/2})$. They used these results to construct the independence Jeffreys prior for the model including the location and scale parameters: it is the product of the Jeffreys prior of each parameter, under the assumption that the remaining parameters are held fixed. The same authors also provided sufficient conditions for the existence of the posterior distribution, briefly discussed the existence of a proper independence Jeffreys prior for the skew-logistic model described in [7], and gave a Student t approximation to that prior.
The approach in [48] might be sketched as follows. The Fisher information for the shape parameter in (2) is
$$ I(\lambda) = \int_{-\infty}^{\infty} 2 z^2 f(z)\, \frac{g^2(\lambda z)}{G(\lambda z)}\, dz = \int_0^{+\infty} 2 z^2 f(z)\, \left[ \pi\, h(\lambda z) \right]^2 dz $$
where
$$ h(z) = \frac{1}{\pi}\, \frac{g(z)}{\sqrt{G(z)\, [1 - G(z)]}} $$
and therefore in this case the Jeffreys prior for $\lambda$ is $\pi_J(\lambda) \propto \sqrt{I(\lambda)}$. The first step for approximating the function $h(z)$ in (8) is to see that the transformed random variable $Z = G^{-1}(X)$ has density $h(z)$, where the random variable $X \sim Beta(1/2, 1/2)$ and $G$ is the cumulative distribution function of an absolutely continuous symmetric random variable, that is $G(z) = 1 - G(-z)$ for all $z \in \mathbb{R}$. The corresponding cumulative distribution function $H$ is
$$ H(z) = \frac{2}{\pi} \arcsin \sqrt{G(z)} . $$
An approximation of $h$ in terms of $g$ might then be achieved by choosing the scale parameter $\sigma$ such that
$$ h(z) \approx \frac{1}{\sigma}\, g\!\left( \frac{z}{\sigma} \right). $$
The quality of this approximation depends on the thickness of the tails of $g$. The authors illustrate this point by considering the case of $g(z)$ being the density of a Student t distribution with $\nu$ degrees of freedom and comparing the approximations using quantiles. Alternatively, $\sigma$ might be chosen to minimize the Kullback–Leibler divergence between $h(z)$ and $g(z/\sigma)/\sigma$. Ref. [32] approximated the Jeffreys prior using the parameterization $\delta = \lambda / \sqrt{1 + \lambda^2}$. They also proposed to use the symmetric Beta prior $Beta(\tau, \tau)$ for $\beta = (\delta + 1)/2$, thus leading to the Student t prior for $\lambda$
$$ \pi_J(\lambda) = \frac{\Gamma(2\tau)}{2^{2\tau - 1}\, \Gamma(\tau)\, \Gamma(\tau)}\, (1 + \lambda^2)^{-(\tau + 1/2)} $$
which reduces to the Cauchy distribution for $\tau = 0.5$, see [32].
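Two ingredients of this construction are simple to verify numerically: that $h$ is the derivative of $H$, and that a $Beta(\tau, \tau)$ prior on $(\delta + 1)/2$ induces the displayed Student t density for λ. The sketch below takes the logistic distribution as an illustrative choice of $g$ and $G$. That choice, the function names and the numerical settings are assumptions made here.

```python
import math

def g(z):
    """Logistic density, an illustrative symmetric choice of g."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2

def G(z):
    """Logistic cdf."""
    return 1.0 / (1.0 + math.exp(-z))

def h(z):
    """h(z) = g(z) / (pi * sqrt(G(z) * (1 - G(z))))."""
    return g(z) / (math.pi * math.sqrt(G(z) * G(-z)))

def H(z):
    """Cdf of G^{-1}(X) for X ~ Beta(1/2, 1/2)."""
    return 2.0 / math.pi * math.asin(math.sqrt(G(z)))

def beta_induced_prior(lam, tau):
    """Density of lambda when (delta + 1) / 2 ~ Beta(tau, tau)."""
    const = math.gamma(2 * tau) / (2 ** (2 * tau - 1) * math.gamma(tau) ** 2)
    return const * (1.0 + lam * lam) ** -(tau + 0.5)

def total_mass(tau, steps=20000):
    """Integral of the prior over the real line via lambda = tan(angle)."""
    width = math.pi / steps
    total = 0.0
    for k in range(steps):
        angle = -math.pi / 2 + (k + 0.5) * width
        lam = math.tan(angle)
        # d(lambda) = (1 + lambda^2) d(angle)
        total += width * beta_induced_prior(lam, tau) * (1.0 + lam * lam)
    return total

eps = 1e-6
numerical_derivative = (H(0.7 + eps) - H(0.7 - eps)) / (2.0 * eps)
```

The numerical derivative of $H$ matches $h$, the induced prior integrates to one, and for $\tau = 0.5$ it coincides with the standard Cauchy density $1/\{\pi(1+\lambda^2)\}$.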

4.3. Distance-Based Priors

As already mentioned, the shape parameter $\lambda$ does not only impact the skewness in skew-symmetric models, but also the mean, the variance, the modes and the kurtosis. Dette et al. (2018) [49] dealt with this issue by assigning a prior distribution to the perturbation effect of the skewness parameter rather than to the skewness parameter itself. This effect is quantified by the Total Variation distance between the symmetric density $f$ and its skew-symmetric counterpart $2 f(x) G(\lambda w(x))$, where $w$ is an odd function. The rationale behind this choice is that such a distance is more easily interpretable than the parameter $\lambda$, and hence informative as well as non-informative priors can more readily be found for the effect of $\lambda$ than for $\lambda$ itself. The Total Variation distance between two probability measures $\mu(\cdot)$ and $\nu(\cdot)$ on $\mathbb{R}$ is
$$d_{TV}(\mu, \nu) = \sup_{A \subset \mathbb{R}} |\mu(A) - \nu(A)|,$$
that is, the maximum difference between the probabilities assigned to the same event by the two measures. It is bounded between zero and one, $0 \le d_{TV}(\mu, \nu) \le 1$. The Total Variation distance between $f$ and $2 f(x) G(\lambda w(x))$ is given by
$$d_{TV}(f, G \mid \lambda) = \frac{1}{2} \int_{\mathbb{R}} \left| 2 G(\lambda w(x)) - 1 \right| f(x)\, dx.$$
The symmetry of $G$ implies that $d_{TV}(f, G \mid \lambda)$ is not a one-to-one function of the parameter $\lambda$: $d_{TV}(f, G \mid \lambda) = d_{TV}(f, G \mid -\lambda)$. It is therefore convenient to use $M_{TV}(\lambda) = \mathrm{sign}(\lambda)\, d_{TV}(f, G \mid \lambda)$ as a measure of perturbation, due to its appealing properties: $M_{TV}(0) = 0$, the largest/smallest value of $M_{TV}(\lambda)$ is $\pm 0.5$ (attained when $\lambda \to \pm\infty$) so that $M_{TV}(\lambda) \in (-1/2, 1/2)$, and $M_{TV}(\lambda)$ is invariant under affine transformations. Moreover, $M_{TV}(\lambda) = 0.5\,[1 - 2 S_{f;G}(0; \lambda)]$, where $S_{f;G}$ is the cdf associated with $s_{f;G}$, which means that $M_{TV}(\lambda)$ is a re-scaling of the difference between the mass cumulated on either side of the origin for a fixed choice of $f$ and $G$ by the distribution $S_{f;G}$. In summary, $M_{TV}(\lambda)$ quantifies the impact of the parameter $\lambda$ on the relocation of the probability mass on either side of the symmetry center of $f$.
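To make the perturbation measure concrete, the sketch below evaluates the Total Variation distance by quadrature for the skew-normal case, that is, assuming $f = \phi$, $G = \Phi$ and $w(x) = x$. For this particular choice the integral is known to reduce to $\arctan(\lambda)/\pi$, which is stated here as an aside rather than taken from the text and is used only as a reference value. The function names, integration range and grid size are hypothetical choices.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d_tv(lam, lo=-10.0, hi=10.0, n=20000):
    """(1/2) * integral of |2 Phi(lam x) - 1| phi(x) dx, composite trapezoid rule."""
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * abs(2.0 * Phi(lam * x) - 1.0) * phi(x)
    return 0.5 * total * step

def m_tv(lam):
    """Signed perturbation measure sign(lam) * d_TV."""
    sign = (lam > 0) - (lam < 0)
    return sign * d_tv(lam)

for lam in (-2.0, 0.0, 1.0, 5.0):
    print(lam, m_tv(lam), math.atan(lam) / math.pi)
```

The printed pairs should agree closely and stay inside the interval $(-1/2, 1/2)$, in line with the properties listed above.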
The proposed measure $M_{TV}$ allows one to build both informative and non-informative priors for the perturbation parameter $\lambda$ in skew-symmetric models. Since $M_{TV} \in (-1/2, 1/2)$ is an injective function of $\lambda$, any proper prior assigned to $M_{TV}$ induces a proper prior on $\lambda$. Ref. [49] proposed, for $M_{TV}(\lambda)$, Beta priors with support on the interval $(-1/2, 1/2)$ and with density
$$\frac{1}{\mathrm{Beta}(\alpha, \beta)} \left(u + \frac{1}{2}\right)^{\alpha - 1} \left(\frac{1}{2} - u\right)^{\beta - 1},$$
where $\mathrm{Beta}(\alpha, \beta)$ is the beta function with hyperparameters $\alpha, \beta > 0$. This induces on $\lambda$ the proper prior with pdf
$$\pi(\lambda \mid \alpha, \beta) = \frac{1}{\mathrm{Beta}(\alpha, \beta)} \left(M_{TV}(\lambda) + \frac{1}{2}\right)^{\alpha - 1} \left(\frac{1}{2} - M_{TV}(\lambda)\right)^{\beta - 1} \frac{d}{d\lambda} M_{TV}(\lambda).$$
Priors of this type are called Beta Total Variation priors and are denoted by $BTV(\alpha, \beta)$; they are flexible, interpretable and lead to tractable posterior distributions. The behaviour of the prior $BTV(\alpha, \beta)$ is well illustrated by the special case $BTV(1, 1)$, that is, a uniform prior on $M_{TV}$ giving equal probability mass to any pair of subintervals of equal length belonging to the support. If $g$ is a bounded pdf and $\int_0^\infty w(x) f(x)\, dx < \infty$, then $BTV(1, 1)$ is well-defined for all $\lambda$ and is given by
$$\pi_{TV}(\lambda \mid 1, 1) = 2 \int_0^\infty w(x) f(x)\, g(\lambda w(x))\, dx. \tag{9}$$
Since (9) does not have a closed form in general, the authors proposed to approximate it by a Cauchy distribution centered at the origin and with scale parameter equal to $0.92$. A Monte Carlo study showed that the proposed non-informative prior induces a posterior distribution with good frequentist properties, similar to those of the Jeffreys prior.
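When no closed form is available, the $BTV(1,1)$ density can be evaluated by quadrature. The following sketch is a minimal illustration under stated assumptions: the helper names, the truncation of the integral at `upper` and the Simpson grid are hypothetical choices, and $w(x) = x$ is used throughout. Two reference values are used as checks: with $f = g = \phi$ the integral equals the standard Cauchy density, and with logistic $f$ and $g$ its value at $\lambda = 0$ is $\ln(2)/2$.

```python
import math

def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * step)
    return total * step / 3.0

def btv11_density(lam, f, g, w, upper=40.0):
    """2 * integral_0^inf w(x) f(x) g(lam * w(x)) dx, truncated at `upper`."""
    return 2.0 * simpson(lambda x: w(x) * f(x) * g(lam * w(x)), 0.0, upper)

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def logistic_pdf(x):
    e = math.exp(-abs(x))  # symmetric form, avoids overflow for large |x|
    return e / (1.0 + e) ** 2

def identity(x):
    return x

# Skew-normal case: should match 1 / (pi * (1 + lam^2)).
sn = btv11_density(1.5, normal_pdf, normal_pdf, identity)
# Skew-logistic case at lam = 0: should match ln(2) / 2.
sl0 = btv11_density(0.0, logistic_pdf, logistic_pdf, identity)
print(sn, 1.0 / (math.pi * (1.0 + 1.5 ** 2)), sl0, math.log(2.0) / 2.0)
```

Other choices of `f`, `g` and an odd `w` can be passed in the same way, provided the integrand decays fast enough for the truncation to be harmless.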

4.4. Prior Choices in the Presence of Kurtosis Parameters

Rubio and Steel (2015) [50] proposed a general strategy for constructing weakly informative priors for kurtosis parameters by assigning a uniform prior to a bounded measure of kurtosis applied to the symmetric baseline density $f(\cdot \mid \delta)$, in which $\delta$ is the tail parameter and is a one-to-one function of the kurtosis. This methodology, used in [51], induces a proper prior on $\delta$ that can be interpreted as a weakly informative prior, in that it assigns a flat prior to a function that incorporates the influence of the parameter $\delta$ on the shape of the density. This prior can be coupled with the Jeffreys prior for the skewness parameter in order to produce a joint prior for $(\delta, \lambda)$ in skew-symmetric models by using $p(\lambda, \delta) = p(\lambda \mid \delta)\, p(\delta)$, where
$$p(\lambda \mid \delta) \propto \sqrt{\int_0^\infty x^2 f(x \mid \delta)\, \frac{g(\lambda x)^2}{G(\lambda x)\left[1 - G(\lambda x)\right]}\, dx}.$$
For each value of $\delta$ the tails of $p(\lambda \mid \delta)$ are of order $O(|\lambda|^{-3/2})$. A simulation study showed that this prior produces a posterior density with good frequentist properties.
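The tail order of the conditional prior can be illustrated numerically. The sketch below is only an illustration under assumptions that are not taken from [50]: the baseline $f(\cdot \mid \delta)$ is a Student t density with $\delta = \nu$ degrees of freedom, $G = \Phi$ with $g = \phi$, and the integral is truncated and evaluated with a midpoint rule. If the tails are of order $|\lambda|^{-3/2}$, doubling a large $\lambda$ should scale the unnormalised prior by roughly $2^{-3/2}$.

```python
import math

def student_t_pdf(x, nu):
    """Student t density with nu degrees of freedom (assumed baseline f(. | delta))."""
    c = math.gamma((nu + 1.0) / 2.0) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2.0))
    return c * (1.0 + x * x / nu) ** (-(nu + 1.0) / 2.0)

def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def normal_cdf_product(t):
    """Phi(t) * (1 - Phi(t)), computed with erfc to avoid cancellation in the tails."""
    upper = 0.5 * math.erfc(t / math.sqrt(2.0))
    lower = 0.5 * math.erfc(-t / math.sqrt(2.0))
    return lower * upper

def unnormalised_prior(lam, nu, upper=10.0, n=20000):
    """sqrt of the integral over (0, upper) of x^2 f(x|nu) g(lam x)^2 / (G (1 - G))."""
    step = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step      # midpoint rule
        t = lam * x
        if abs(t) > 25.0:         # integrand is negligible here; skipping avoids 0/0
            continue
        total += (x * x * student_t_pdf(x, nu)
                  * normal_pdf(t) ** 2 / normal_cdf_product(t) * step)
    return math.sqrt(total)

nu = 5.0
ratio = unnormalised_prior(40.0, nu) / unnormalised_prior(20.0, nu)
print(ratio, 2.0 ** -1.5)
```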

5. Overview on Related Topics

So far this paper has focussed on the univariate case without covariates. This section briefly reviews some of the literature on more general settings related to skew-symmetric distributions.
Ref. [52] proposed a general population Monte Carlo algorithm in order to conduct a full Bayesian analysis of the multivariate skew-normal distribution, also in the presence of constrained parameters. Since the prior distribution approximates the actual reference prior for the shape parameter vector, the prior used in this approach can be regarded as weakly informative. A generalization to the matrix variate regression model with skew-normal errors is also provided.
Ref. [53] carried out a Bayesian analysis of a p-variate skew-t distribution by providing a new parameterization, considering a set of non-informative priors and a sampler designed to obtain the posterior model based on the parameters. The methodology can be extended to multivariate regression models with skewed errors and also stochastic frontier models.
Ref. [54] investigated the time series of electricity spot prices, which exhibit heavy tails and skewness. The authors conducted Bayesian inference on the multivariate skew-t distribution by putting a normal prior on the multi-dimensional skewness parameter.
Ref. [51] proposed a general non-informative structure for regression models with skew-symmetric errors, showed that under some mild conditions the resulting posterior distribution is proper and extended the results to the cases where the response variables are censored. The authors also investigated accelerated failure time models, which are relevant in survival analysis. Different prior distributions have been implemented on the skewness parameter of the skew-normal model, including a Jeffreys prior, a matching prior, an informative prior and a uniform, non-informative prior on the parameter $\delta = \lambda/\sqrt{1+\lambda^2}$, leading to the proper prior
$$\pi(\lambda) \propto \frac{1}{(1+\lambda^2)^{3/2}}.$$
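The change of variables behind this prior can be checked by simulation. The sketch below draws $\delta$ uniformly on $(-1, 1)$, maps it to $\lambda = \delta/\sqrt{1-\delta^2}$, and compares empirical frequencies with the cdf implied by the normalised density $\tfrac{1}{2}(1+\lambda^2)^{-3/2}$. The function names, seed and sample size are arbitrary illustrative choices.

```python
import math
import random

def prior_cdf(lam):
    """Cdf of lambda when delta = lambda / sqrt(1 + lambda^2) is uniform on (-1, 1)."""
    return 0.5 * (1.0 + lam / math.sqrt(1.0 + lam * lam))

def prior_pdf(lam):
    """Normalised density: (1/2) * (1 + lambda^2)^(-3/2)."""
    return 0.5 * (1.0 + lam * lam) ** -1.5

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = []
for _ in range(100000):
    delta = rng.uniform(-1.0, 1.0)
    if abs(delta) < 1.0:  # guard against the endpoints
        draws.append(delta / math.sqrt(1.0 - delta * delta))

for q in (-1.0, 0.0, 2.0):
    empirical = sum(d <= q for d in draws) / len(draws)
    print(q, empirical, prior_cdf(q))
```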
Ref. [55] used finite mixtures of skewed distributions to model flow cytometry data, in order to describe their skewness, kurtosis and heterogeneity. The authors developed Bayesian inference for this model based on data augmentation and MCMC sampling. Data augmentation in this case is based on the stochastic representation of the skew-normal distribution in terms of a random-effects model with truncated normal random effects. For finite mixtures of skew-normals, this yields a Gibbs sampling scheme that requires draws from standard densities only. The same MCMC scheme is extended to mixtures of skew-t distributions by considering the skew-t distribution as a scale mixture of skew-normals.
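The stochastic representation mentioned above can be illustrated with a short simulation. The sketch assumes the well-known form $Z = \delta |U_0| + \sqrt{1-\delta^2}\, U_1$ with independent standard normal $U_0, U_1$ and $\delta = \lambda/\sqrt{1+\lambda^2}$ (compare [8]); it is not the sampler of [55], and the function name, seed and sample size are hypothetical. The sample mean is compared with the theoretical skew-normal mean $\delta\sqrt{2/\pi}$.

```python
import math
import random

def rskew_normal(lam, rng):
    """One draw from SN(lambda) via Z = delta * |U0| + sqrt(1 - delta^2) * U1."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    u0 = abs(rng.gauss(0.0, 1.0))  # half-normal (truncated normal) latent effect
    u1 = rng.gauss(0.0, 1.0)
    return delta * u0 + math.sqrt(1.0 - delta * delta) * u1

rng = random.Random(1)  # fixed seed so the run is reproducible
lam = 3.0
sample = [rskew_normal(lam, rng) for _ in range(200000)]
sample_mean = sum(sample) / len(sample)
theory_mean = lam / math.sqrt(1.0 + lam * lam) * math.sqrt(2.0 / math.pi)
print(sample_mean, theory_mean)
```

Conditionally on the latent half-normal variable, the draw is Gaussian, which is why a data-augmented Gibbs sampler built on this representation only needs standard densities.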
Ref. [56] proposed a new class of distributions by introducing a skewness parameter in multivariate elliptically symmetric densities. This class of densities contains many standard families such as skew-t and skew-normal distributions. They condition on some unobserved variables commonly used in regression modelling and model stock market returns, security options or risky financial assets subject to shocks. Within the Bayesian realm, they show inter alia that there exist posterior distributions and moments for regression coefficients derived under improper priors.
Linear mixed models (LMMs) are commonly used to analyze repeated measures data since they allow for flexible modelling of within-subject correlations. Most LMMs for continuous responses assume that the random effects and the within-subject errors are normally distributed, which can be unrealistic. Ref. [57] considered the less restrictive assumption of skew-normality and Bayesian inference based on prior distributions very similar to non-informative ones. They illustrated the proposed approach with the Framingham cholesterol data, obtained from a well-known long-term study aimed at investigating the relationship between various risk factors and diseases and at characterizing the natural history of chronic circulatory diseases.

6. Discussion

In this paper we have provided an overview of the various proposals for Bayesian inference within skew-symmetric models. We hope that the reader will consider it a helpful tool and source of information on this research domain. We refer the interested reader to the simulation study and real data analysis of the recent paper [49] for a performance comparison between several of the prior proposals described above. Further performance comparisons are a promising research task, as they would give a more complete picture of which prior is best suited to which situation when dealing with skew-symmetric distributions.
A referee remarked that in the general case the posterior distribution may be multimodal and it is therefore necessary to impose some conditions ensuring unimodality. Log-concavity implies unimodality and it is preserved under convolution, marginalization, affine transformations and conditioning. For example, the assumption that the joint distribution of the parameter and the observations is log-concave implies that the posterior distribution is log-concave, too. We illustrate log-concavity of the posterior with a simple example. Assume that we sampled just one observation from a standard skew-normal distribution and that our prior distribution on the shape parameter is standard normal: $f(z \mid \lambda) = 2\phi(z)\Phi(\lambda z)$ and $\pi(\lambda) = \phi(\lambda)$. The joint distribution of the observation and the parameter is then $f(z, \lambda) = 2\phi(z)\phi(\lambda)\Phi(\lambda z)$. A little calculation shows that, for every fixed $z$, $f(z, \lambda)$ is log-concave as a function of $\lambda$, being proportional to the product of the log-concave functions $\phi(\lambda)$ and $\Phi(\lambda z)$. Hence, without further calculation, we know that $f(\lambda \mid z)$ is log-concave and hence unimodal, too. MAP estimates are then uniquely defined and can be easily derived by noticing that the posterior distribution is skew-normal: $\pi(\lambda \mid z) = 2\phi(\lambda)\Phi(\lambda z)$. Ref. [58] provides a thorough review of the literature on log-concavity, both in the univariate and in the multivariate case. Should the assumption of log-concavity be too restrictive, one could resort to other multivariate generalizations of unimodality, as for example block-unimodality, which already appeared in the Bayesian literature ([59]).
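The closed-form posterior in this example lends itself to a quick numerical check. The sketch below, with hypothetical function names and an arbitrary observed value $z = 1.2$, verifies that $2\phi(\lambda)\Phi(\lambda z)$ integrates to one over $\lambda$ and locates the MAP estimate by ternary search, a method that relies on the unimodality discussed above.

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf, via erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def posterior(lam, z):
    """pi(lambda | z) = 2 phi(lambda) Phi(lambda z): skew-normal in lambda with shape z."""
    return 2.0 * phi(lam) * Phi(lam * z)

def integrate(func, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return sum(func(lo + (i + 0.5) * step) for i in range(n)) * step

def map_estimate(z, lo=-8.0, hi=8.0, iters=200):
    """Ternary search for the maximiser; valid because the posterior is unimodal."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if posterior(m1, z) < posterior(m2, z):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

z = 1.2  # a single hypothetical observation
mass = integrate(lambda lam: posterior(lam, z))
mode = map_estimate(z)
print(mass, mode)
```

For a positive observation the mode is positive, and for $z = 0$ the posterior reduces to the standard normal prior with mode at zero.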

Author Contributions

Conceptualization, F.G. and C.L.; methodology, F.G.; writing—original draft preparation, F.G.; writing—review and editing, C.L. and N.L.; supervision, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by a BOF Starting Grant of Ghent University.

Acknowledgments

We thank two anonymous reviewers for their constructive comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ley, C. Skew distributions. In Statistical Theory and Methods, Encyclopedia of Environmetrics, 2nd ed.; El-Shaarawi, A., Piegorsch, W., Eds.; Wiley: New York, NY, USA, 2012; pp. 1944–1949.
  2. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
  3. Azzalini, A.; Capitanio, A. The Skew-Normal and Related Families; IMS Monograph Series; Cambridge University Press: Cambridge, UK, 2014.
  4. Wang, J.; Boyer, J.; Genton, M. A skew-symmetric representation of multivariate distribution. Stat. Sin. 2004, 14, 1259–1270.
  5. Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389.
  6. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208.
  7. Nadarajah, S. The skew logistic distribution. Statistica 2009, 93, 197–203.
  8. Henze, N. A probabilistic representation of the skew-normal distribution. Scand. J. Stat. 1986, 13, 271–275.
  9. Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew-normal distribution. J. R. Stat. Soc. Ser. B 1999, 61, 579–602.
  10. Azzalini, A.; Genton, M. Robust likelihood methods based on the skew-t and related distributions. Int. Stat. Rev. 2008, 76, 106–129.
  11. Chiogna, M. A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Stat. Methods Appl. 2005, 14, 331–341.
  12. Hallin, M.; Ley, C. Skew-symmetric distributions and Fisher information—A tale of two densities. Bernoulli 2012, 18, 747–763.
  13. Hallin, M.; Ley, C. Skew-symmetric distributions and Fisher information: The double sin of the skew-normal. Bernoulli 2014, 20, 1432–1453.
  14. Ley, C.; Paindaveine, D. On the singularity of multivariate skew-symmetric models. J. Multivar. Anal. 2010, 101, 1434–1444.
  15. Pewsey, A. Problems of inference for Azzalini’s skew-normal distribution. J. Appl. Stat. 2000, 27, 859–870.
  16. Arellano-Valle, R.B.; Azzalini, A. The centred parametrization for the multivariate skew-normal distribution. J. Multivar. Anal. 2009, 99, 1362–1382, Erratum in 2009, 100, 816.
  17. Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 1993, 80, 27–38.
  18. Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Chapman & Hall/CRC: London, UK, 2013.
  19. Bernardo, J.M. Reference posterior distributions for Bayesian inference (with discussion). J. R. Stat. Soc. Ser. B 1979, 41, 113–147.
  20. Bayes, T. An essay towards solving a problem in the doctrine of chances. Philos. Trans. R. Soc. Ser. A 1763, 53, 370–418.
  21. Laplace, P.S. Théorie Analytique des Probabilités; Courcier: Paris, France, 1812.
  22. Bernardo, J.M.; Berger, J.O. On the development of reference priors (with discussion). Bayesian Stat. 1992, 4, 35–60.
  23. Efron, B. Why isn’t everyone a Bayesian? (with discussion). Am. Stat. 1986, 40, 1–11.
  24. Lindley, D.V. On the measure of the information provided by an experiment. Ann. Math. Stat. 1956, 27, 986–1005.
  25. Berger, J.O.; Bernardo, J.M. Estimating a product of means. Bayesian analysis with reference priors. J. Am. Stat. Assoc. 1989, 84, 200–207.
  26. Lindley, D.V. Fiducial distributions and Bayes’ theorem. J. R. Stat. Soc. Ser. B 1958, 20, 102–107.
  27. Datta, G.S.; Mukerjee, R. Probability Matching Priors: Higher Order Asymptotics; Springer: New York, NY, USA, 2004.
  28. Welch, B.; Peers, H.W. On formulae for confidence points based on integrals of weighted likelihoods. J. R. Stat. Soc. Ser. B 1963, 25, 318–329.
  29. Ghosh, M. Objective priors: an introduction for frequentists. J. Stat. Sci. 2011, 26, 187–202.
  30. Mukerjee, R.; Dey, D.K. Frequentist validity of posterior quantiles in the presence of a nuisance parameter: Higher order asymptotics. Biometrika 1993, 80, 499–505.
  31. Mukerjee, R.; Ghosh, M. Second-order probability matching priors. Biometrika 1997, 84, 970–975.
  32. Liseo, B.; Loperfido, N. A note on reference priors for the scalar skew-normal distribution. J. Stat. Plan. Inference 2006, 136, 373–389.
  33. Cabras, S.; Racugno, W.; Castellanos, M.E.; Ventura, L. A matching prior for the shape parameter of the skew-normal distribution. Scand. J. Stat. 2012, 39, 236–247.
  34. Canale, A.; Scarpa, B. Informative Bayesian inference for the skew-normal distribution. arXiv 2013, arXiv:1305.3080.
  35. Bayes, C.; Branco, E. Bayesian inference for the skewness parameter of the scalar skew-normal distribution. J. Stat. Plan. Inference 2007, 21, 141–163.
  36. Chaibub Neto, E.; Branco, M.D. Bayesian Reference Analysis for Binomial Calibration Problem; Technical Report; IME-USP: São Paulo, Brazil, 2003; p. 12.
  37. Barndorff-Nielsen, O.E. On a formula for the distribution of the maximum likelihood estimator. Biometrika 1983, 70, 343–365.
  38. Pace, L.; Salvan, A.; Ventura, L. Likelihood based discrimination between separate regression models. J. Stat. Plan. Inference 2006, 136, 3539–3553.
  39. Ventura, L.; Cabras, S.; Racugno, W. Prior distributions from pseudo-likelihoods in the presence of nuisance parameters. J. Am. Stat. Assoc. 2009, 104, 768–774.
  40. Arellano-Valle, R.B.; Azzalini, A. On the unification of families of skew-normal distributions. Scand. J. Stat. 2006, 33, 561–574.
  41. Canale, A.; Pagui, E.C.K.; Scarpa, B. Bayesian modelling of university first-year students’ grades after placement test. J. Appl. Stat. 2016, 43, 3015–3029.
  42. Chib, S.; Jeliazkov, I. Marginal likelihood from the Metropolis-Hastings output. J. Am. Stat. Assoc. 2001, 96, 270–291.
  43. Naranjo, L.; Perez, C.J.; Martin, J. Bayesian analysis of a skewed exponential power distribution. In Proceedings of the Computational Statistics 2012, Limassol, Cyprus, 27–31 August 2012; Colubi, A., Ed.; Physica-Verlag (Springer) Publishing: Heidelberg, Germany, 2012; pp. 641–652.
  44. Hossianzadeh, A.; Zare, K. Bayesian analysis of discrete skewed Laplace distribution. J. Mod. Appl. Stat. Methods 2016, 15, 696–702.
  45. Fernàndez, C.; Steel, M.F.J. Multivariate Student t regression models: pitfalls and inference. Biometrika 1999, 86, 153–167.
  46. Sartori, N. Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. J. Stat. Plan. Inference 2008, 136, 4259–4275.
  47. Branco, E.; Genton, M.; Liseo, B. Objective Bayesian analysis of skew-t distributions. Scand. J. Stat. 2012, 40, 63–85.
  48. Rubio, F.J.; Liseo, B. On the independence Jeffreys prior for skew-symmetric models. Stat. Probab. Lett. 2014, 85, 91–97.
  49. Dette, H.; Ley, C.; Rubio, F.J. Natural (non-)informative priors for skew-symmetric distributions. Scand. J. Stat. 2018, 45, 405–420.
  50. Rubio, F.J.; Steel, J.M. Bayesian modelling of skewness and kurtosis with two-piece scale and shape distributions. Electron. J. Stat. 2015, 9, 1884–1912.
  51. Rubio, F.J.; Genton, M. Bayesian linear regression with skew-symmetric error distributions with applications to survival analysis. Stat. Med. 2016, 35, 2441–2454.
  52. Liseo, B.; Parisi, A. Bayesian inference for the multivariate skew-normal model: A population Monte Carlo approach. Comput. Stat. Data Anal. 2013, 63, 125–138.
  53. Parisi, A.; Liseo, B. Objective Bayesian analysis for the multivariate skew-t model. Stat. Methods Appl. 2017, 27, 277–295.
  54. Panagiotelisa, A.; Smith, M. Bayesian density forecasting of intraday electricity prices using multivariate skew t distributions. Int. J. Forecast. 2008, 24, 710–727.
  55. Frühwirth-Schnatter, S.; Pyne, S. Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 2010, 11, 317–336.
  56. Sahu, S.K.; Dey, D.K.; Branco, M.D. A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 2003, 31, 129–150.
  57. Arellano-Valle, R.B.; Bolfarine, H.; Lachos, V.H. Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 2007, 34, 663–682.
  58. Samworth, R.J. Recent progress in log-concave density estimation. Stat. Sci. 2018, 33, 493–509.
  59. Liseo, B.; Petrella, L.; Salinetti, G. Block unimodality for multivariate Bayesian robustness. J. Ital. Stat. Soc. 1993, 2, 55–71.

Share and Cite

MDPI and ACS Style

Ghaderinezhad, F.; Ley, C.; Loperfido, N. Bayesian Inference for Skew-Symmetric Distributions. Symmetry 2020, 12, 491. https://doi.org/10.3390/sym12040491
