A Note on Ordering Probability Distributions by Skewness

This paper describes a complementary tool for fitting probability distributions in data analysis. First, we examine the well-known bivariate index of skewness and the aggregate skewness function, and then introduce orderings of probability distributions by skewness. Using an example, we highlight the advantages of this approach and then present results for these orderings in common uniparametric families of continuous distributions, showing that the orderings are well suited to the intuitive conception of skewness and, moreover, that the skewness can be controlled via the parameter values.


Introduction
Detailed knowledge of the characteristics of probability models is desirable (if not essential) if data are to be modeled properly. In studying these properties, many authors have considered orderings within probability distribution families, according to diverse measuring criteria. The usual approach taken by researchers in this field is to evaluate or measure one or more theoretical characteristics of a given distribution and to study the effect produced by the value of its parameters on this measurement. In actuarial science, stochastic orders are widely used in order to make risk comparisons [1].
Some parametric distributions can be ordered according to the evaluation made of a given property, merely by comparing some of their parameters. Although most related orders are actually preorders, each one presents interesting applications. Many studies have been conducted in this area, and the following are particularly significant: Lehmann (1955) [2], which is of seminal importance; Arnold (1987) [3], who compared random variables according to stochastic ordering in a particular Lorenz order; Shaked and Shanthikumar (2006) [1], on stochastic orders; Nanda and Shaked (2001) [4], on reversed hazard rate orders; Ramos-Romero and Sordo-Díaz (2001) [5], on the likelihood ratio order; and Gupta and Aziz (2010) [6], on convex orders.
In this paper, we study the relationship between the skewness of some parametric distributions and the value of one of their parameters. The first question to be addressed is that of measuring the skewness. In this respect, Oja (1981) [7] introduced a set of axioms to be verified by any measurement of skewness considered. These axioms were established for indexes of skewness with one main constraint: that the skewness of a distribution should be evaluated by a single real number. This point is discussed below.
Many authors have proposed different descriptive measures of skewness (see, for instance, [8][9][10][11][12][13]). Ref. [10] suggested a measurement of skewness relative to the (unique) mode, $M$, given by the following index:
$$\gamma_M(F) = 1 - 2F(M). \qquad (1)$$
Ref. [10] applied this index to ordering the gamma, log-logistic, lognormal and Weibull families of distributions by their skewness, taking into account the feasible values of their respective parameters. Index (1), which is proven to satisfy the axioms derived from Oja (1981) [7], is also recommended in [14] as a (very) good index of skewness. However, (1) only compares the probability weight to the left of a central point (the mode) with the value 1/2; it does not account for how the weights are distributed on each side of the centre.
García et al. (2015) [15] introduced some further elements to be incorporated into the list of skewness measurements of a probability distribution. According to these authors, given a unimodal probability distribution $F(x)$, its skewness is considered to be a local function of a given distance, $z$, from the mode, $M$. For such a distance, and given the interval $[M - z, M + z]$, the aggregate skewness function, $\nu_F(z)$, compares the probability weight of $F$ on either side of the interval:
$$\nu_F(z) = [1 - F(M + z)] - F(M - z), \qquad (2)$$
where $z \ge 0$. Thus, the (maximum) right skewness of the distribution $F$ and its (minimum) left skewness are respectively given by
$$S^+(F) = \max_{z \ge 0} \nu_F(z), \qquad S^-(F) = \min_{z \ge 0} \nu_F(z). \qquad (3)$$
The distances $z_p$ and $z_n$ at which these extreme values are achieved are termed the critical distances to the mode. As the skewness function is bounded inside the interval $[-1, 1]$ and $\nu_F(\infty) = 0$, the bivariate index $(S^-(F), S^+(F))$ belongs to $[-1, 0] \times [0, 1]$. A distribution function $F$ such that $\nu_F(z) \ge 0$ for all $z \ge 0$ is said to be skewed only to the right; if $\nu_F(z) \le 0$ for all $z \ge 0$, it is said to be skewed only to the left.
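As a minimal numerical sketch of these definitions, the following Python code evaluates the mode-based index and the aggregate skewness function under the assumption that the latter takes the form $\nu_F(z) = 1 - F(M+z) - F(M-z)$, which is consistent with $\nu_F(0) = \gamma_M(F) = 1 - 2F(M)$. The helper names (`gamma_m`, `nu`, `bivariate_index`), the grid search and the unit exponential test case are illustrative choices, not part of the original method.

```python
import math

def gamma_m(cdf, mode):
    """Mode-based index: 1 - 2 F(M)."""
    return 1.0 - 2.0 * cdf(mode)

def nu(cdf, mode, z):
    """Aggregate skewness at distance z >= 0 from the mode (assumed form):
    weight to the right of M + z minus weight to the left of M - z."""
    return (1.0 - cdf(mode + z)) - cdf(mode - z)

def bivariate_index(cdf, mode, z_max=20.0, steps=20000):
    """Grid approximation of (S-, S+) over [0, z_max].
    The values are clamped at 0 because nu tends to 0 as z grows,
    so S- <= 0 <= S+ always holds."""
    values = [nu(cdf, mode, z_max * i / steps) for i in range(steps + 1)]
    return min(min(values), 0.0), max(max(values), 0.0)

def exp_cdf(x):
    """Unit exponential CDF (mode at 0), used only as a test case."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

s_minus, s_plus = bivariate_index(exp_cdf, mode=0.0)
print(gamma_m(exp_cdf, 0.0), s_minus, s_plus)
```

For the unit exponential, the whole probability weight lies to the right of the mode, so the sketch reports a distribution that is skewed only to the right, with the maximum attained at $z = 0$.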
The relationship $F <_c G$ ($F$ c-precedes $G$) means that $G^{-1}[F(x)]$ is a convex function. For a continuous distribution $F$, the bivariate measurement of skewness $(S^-(F), S^+(F))$ verifies the following properties, where $aF + b$ and $-F$ denote the distributions of the corresponding transformations of an $F$-distributed random variable: These properties can be considered as a vectorial interpretation of the axioms given by Oja (1981) [7].
As it is easily proven that ν F (0) = γ M (F), we can establish that (2) and (3) give considerably clearer and more complete information than (1) about the skewness of any distribution function.
Most families of continuous distributions are skewed only to the right (or only to the left), while double-sign skewness is abundant within the discrete families, as shown in [15]. Nevertheless, the joint use of the function (2) and the bivariate index (3) makes it possible to improve the skewness-based ordering of distributions discussed in [10], as can be seen in the following example.
Example 1. Assume the following random variable $X \in [-2, \infty)$ with PDF given by: Assume also the PDF, $g(y)$, of $Y = -X$. Then, $\gamma_M(F) = 0 = \gamma_M(G)$. That is, according to the coefficient $\gamma_M(\cdot)$, both distributions have the same null skewness, although they do not even have a symmetric support set. However, using expression (2), we find that and $\nu_G(z) = -\nu_F(z)$, for all $z \ge 0$. These functions are plotted in Figure 1, where it can be seen that Clearly, the information about skewness obtained from the aggregate skewness function $\nu(z)$ and the indices $S^+(\cdot)$ and $S^-(\cdot)$ is considerably more comprehensive than that obtained from $\gamma_M(\cdot)$.

Outline
In applied statistical analysis, it is useful to have a large catalogue of plausible distributions with which to fit the data. According to García et al. (2015) [15], common measures of skewness can be complemented with a bivariate index of positive-negative skewness, and the authors show that the mode is the relevant central value to study both right and left skewness. In this paper, we extend the tool-box approach to fitting data with probability distributions by introducing two orderings that are deduced from the skewness measures given in [15]. The first of these orderings is based on the positive part of the bivariate index of skewness, which in many instances coincides with the well-known $\gamma_M(F)$. Nevertheless, the differences can be highly significant, as in the previous example. The second, more noteworthy, ordering is based on the skewness function $\nu_F(z)$; it implies the first ordering, although the converse does not hold.
There are two reasons for ordering a family of distributions according to a given measurement of skewness. Firstly, as a property of the distribution, this ordering allows us to control its skewness by the appropriate selection of the parameter. When this is done (and the parameter is readily determined), the theoretical results have immediate applications in the data-fitting process. Secondly, when a given family of distributions is conceived as being more or less skewed according to the value of a parameter, and a measurement of skewness ratifies the ordering, it may be concluded that the functioning of this measurement provides a reasonably good fit with an intuitive conception of skewness.
The rest of this paper is organized as follows. In Section 2, we study the aggregate skewness function and the resultant skewness-based ordering of the gamma, log-logistic, lognormal, Weibull and asymmetric Laplace families of continuous probability distributions. In Section 3, we study the ordering of two of the best-known distributions used in PERT methods: the beta and the asymmetric triangular distributions. Finally, conclusions are presented in Section 4.

Families of Uniparametric Distributions Ordered by Skewness
Let $F$ and $G$ be unimodal distribution functions, with no centre or scale parameters, and modes $M_F$ and $M_G$, respectively. We compare their respective skewness by two different criteria.

Definition 1.
If
$$\nu_F(z) \ge \nu_G(z) \quad \text{for all } z \ge 0, \qquad (4)$$
then we say that $F$ has equal or more aggregate skewness to the right at any point than $G$. We denote this by $F \ge_\nu G$.

Definition 2.
If $F$ and $G$ are both skewed only to the right, we say that $F$ has equal or more maximum aggregate skewness to the right than $G$ when
$$S^+(F) \ge S^+(G), \qquad (5)$$
and we denote this by $F \ge_+ G$.
With these definitions, it immediately follows that:
Proposition 1. If $F$ and $G$ are both skewed only to the right, then $F \ge_\nu G$ implies $F \ge_+ G$. The reverse implication is not true in general.
Proof. The proof follows immediately from the definitions given in (4) and (5).
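The two orderings can be compared numerically. The sketch below is only illustrative: it assumes the form $\nu_F(z) = 1 - F(M+z) - F(M-z)$ for the aggregate skewness function, checks the orderings on a finite grid rather than for all $z \ge 0$, and uses hypothetical helper names (`geq_nu`, `geq_plus`). The two test distributions (a unit exponential and an Erlang distribution with shape 2) are chosen here only because their CDFs are available in closed form.

```python
import math

def nu(cdf, mode, z):
    """Assumed aggregate skewness function: 1 - F(M + z) - F(M - z)."""
    return (1.0 - cdf(mode + z)) - cdf(mode - z)

def grid(z_max=15.0, steps=3000):
    return [z_max * i / steps for i in range(steps + 1)]

def geq_nu(cdf_f, mode_f, cdf_g, mode_g, tol=1e-12):
    """Grid check of F >=_nu G: nu_F(z) >= nu_G(z) at every grid point."""
    return all(nu(cdf_f, mode_f, z) >= nu(cdf_g, mode_g, z) - tol for z in grid())

def s_plus(cdf, mode):
    """Grid approximation of the maximum right skewness."""
    return max(nu(cdf, mode, z) for z in grid())

def geq_plus(cdf_f, mode_f, cdf_g, mode_g):
    """Grid check of F >=_+ G: S+(F) >= S+(G)."""
    return s_plus(cdf_f, mode_f) >= s_plus(cdf_g, mode_g)

def exp_cdf(x):       # unit exponential, mode 0
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def erlang2_cdf(x):   # Erlang with shape 2 and unit rate, mode 1
    return 1.0 - math.exp(-x) * (1.0 + x) if x > 0 else 0.0

print(geq_plus(exp_cdf, 0.0, erlang2_cdf, 1.0))  # True
print(geq_nu(exp_cdf, 0.0, erlang2_cdf, 1.0))    # False
```

In this example the exponential has the larger maximum skewness, but the two skewness functions cross (for instance, at $z = 2$ the Erlang curve lies above the exponential one), which illustrates why the ordering by maximum aggregate skewness does not imply the ordering by aggregate skewness.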
In what follows, we consider some well-known uniparametric families of continuous distributions, with no centre or scale parameters but depending on a skewness parameter, and examine whether they are ordered by aggregate skewness or by maximum aggregate skewness. The gamma family is a very broad one, which includes many other well-known distributions as particular cases. Studying the log-logistic, lognormal, Weibull and asymmetric Laplace families in turn, insofar as they are not contained in the gamma family, produces widely varying results.

Proposition 2.
Let $G(\alpha_1)$ and $G(\alpha_2)$ be gamma distributions with CDF as in (6). Then:
Proof. Part 1. We can write Therefore, $u(z) = 0$ when is the integral of a negative function, so it is negative, and the proof is complete. and, Then, clearly we have that $d\nu_G/dz < 0$ for all $z \ge \alpha$. For $0 \le z < \alpha$, if we denote then the sign of $d\nu_G/dz$ is the sign of $w(x)$. As $w(0) = 0$, $w(\alpha) = -(2\alpha)^{\alpha} e^{-\alpha} < 0$, and we conclude that $\nu_G$ is a decreasing function on $z \ge 0$ and $S^+(G(\alpha)) = \nu_G(0; \alpha)$.
where $\Gamma(1 + \alpha, \alpha)$ is the incomplete gamma function, and then $S^+(G(\alpha))$ is a decreasing function of $\alpha$, when $\alpha \to \infty$. Nevertheless, a simple plot of the functions $\nu_G(z; \alpha_i)$ for any $0 < \alpha_1 < \alpha_2$ shows that they cross each other and that they are not ordered by "$\ge_\nu$". Thus, the proof is completed.
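The behaviour described in this proof can be illustrated numerically. The following sketch rests on two assumptions that are not stated explicitly in the text: that the CDF in (6) is that of a gamma distribution with shape $\alpha + 1$ and unit scale (so that the mode is $\alpha$, in line with the references to $z \ge \alpha$ and $\Gamma(1 + \alpha, \alpha)$ above), and that $\nu_G(z; \alpha) = 1 - F(\alpha + z) - F(\alpha - z)$. To avoid external libraries, it is restricted to integer $\alpha$, for which the upper incomplete gamma ratio reduces to a finite sum; the function names are hypothetical.

```python
import math

def upper_gamma_ratio(alpha, x):
    """Gamma(1 + alpha, x) / Gamma(1 + alpha) for integer alpha >= 0,
    via the finite sum exp(-x) * sum_{n=0}^{alpha} x^n / n!."""
    return math.exp(-x) * sum(x ** n / math.factorial(n) for n in range(alpha + 1))

def nu_gamma(z, alpha):
    """Assumed form: 1 - F(M + z) - F(M - z) with mode M = alpha."""
    right = upper_gamma_ratio(alpha, alpha + z)   # 1 - F(alpha + z)
    # F(alpha - z) vanishes once z reaches alpha (no mass below 0)
    left = 1.0 - upper_gamma_ratio(alpha, alpha - z) if z < alpha else 0.0
    return right - left

def s_plus_gamma(alpha):
    """S+ under the assumption that the maximum is attained at z = 0."""
    return nu_gamma(0.0, alpha)

for alpha in (1, 2, 5, 10):
    print(alpha, round(s_plus_gamma(alpha), 4))

# The curves for alpha = 1 and alpha = 2 cross between z = 0 and z = 3.
print(nu_gamma(0.0, 1) > nu_gamma(0.0, 2), nu_gamma(3.0, 1) < nu_gamma(3.0, 2))
```

Under these assumptions, the maximum skewness decreases as $\alpha$ grows, while the curves for two different values of $\alpha$ cross, so the family is ordered by "$\ge_+$" but not by "$\ge_\nu$".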

Log-Logistic Distributions
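The CDF of a uniparametric log-logistic distributed random variable $X$ is given by
$$F_{LL}(x; \theta) = \frac{x^{\theta}}{1 + x^{\theta}}$$
for $x > 0$, with $\theta > 0$. The mode of these distributions depends on $\theta$. If $0 < \theta \le 1$, then $M = 0$, $\nu_{LL}(z; \theta) = \frac{1}{1 + z^{\theta}}$, and $S^+(F_{LL}(\theta)) = 1$. The functions $\nu_{LL}(z; \theta)$ for different values of $\theta$ within this range cross each other at $z = 1$, and these distributions are ordered neither by skewness function nor by skewness indexes. Nevertheless, for $\theta > 1$, the mode is
$$M = \left(\frac{\theta - 1}{\theta + 1}\right)^{1/\theta}.$$
Notice that $M$ is an increasing function of $\theta$ when $\theta > 1$, because
$$\frac{d \ln M}{d\theta} = -\frac{1}{\theta^{2}} \ln\frac{\theta - 1}{\theta + 1} + \frac{2}{\theta(\theta^{2} - 1)} > 0.$$
When $\theta > 1$, it is also known from Arnold and Groeneveld (1995) that $\nu_{LL}(0; \theta) = 1/\theta$.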
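A short numerical check of these statements is sketched below. It assumes the standard log-logistic CDF $x^{\theta}/(1 + x^{\theta})$, the mode $((\theta - 1)/(\theta + 1))^{1/\theta}$ for $\theta > 1$, and the form $\nu(z) = 1 - F(M + z) - F(M - z)$ for the aggregate skewness function; the function names are illustrative.

```python
def ll_cdf(x, theta):
    """Log-logistic CDF x^theta / (1 + x^theta) for x > 0, and 0 otherwise."""
    return x ** theta / (1.0 + x ** theta) if x > 0 else 0.0

def ll_mode(theta):
    """Mode: 0 for theta <= 1, ((theta - 1) / (theta + 1))^(1 / theta) otherwise."""
    return 0.0 if theta <= 1 else ((theta - 1.0) / (theta + 1.0)) ** (1.0 / theta)

def nu_ll(z, theta):
    """Assumed aggregate skewness function of the log-logistic family."""
    m = ll_mode(theta)
    return (1.0 - ll_cdf(m + z, theta)) - ll_cdf(m - z, theta)

# For theta <= 1 every curve passes through the same point at z = 1.
for theta in (0.25, 0.5, 1.0):
    print(theta, nu_ll(1.0, theta))

# For theta > 1 the value at z = 0 can be compared with 1 / theta.
for theta in (1.5, 2.0, 4.0):
    print(theta, nu_ll(0.0, theta), 1.0 / theta)
```

The common value at $z = 1$ for $\theta \le 1$ is what prevents an ordering by "$\ge_\nu$" in that range, whereas for $\theta > 1$ the value at the mode shrinks as $\theta$ increases.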

Lognormal Variance Distributions
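The CDF of the uniparametric lognormal distribution is given by $F_{LN}(x; \sigma) = \Phi(\ln x / \sigma)$ for $x, \sigma > 0$, where $\Phi(\cdot)$ is the standard normal distribution function. The mode is given by $M = e^{-\sigma^{2}}$.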

Uniparametric Weibull Distributions
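Consider the uniparametric Weibull family of distributions given by the CDF
$$F_W(x; c) = 1 - e^{-x^{c}}, \quad x > 0, \ c > 0. \qquad (12)$$
The mode is known to be at 0 for $c \le 1$ (as a limit, when $c < 1$) and at
$$M = \left(\frac{c - 1}{c}\right)^{1/c}$$
for $c > 1$. The expression for $\nu_W$ is given by
$$\nu_{W(c)}(z) = \begin{cases} e^{-(M + z)^{c}} + e^{-(M - z)^{c}} - 1, & 0 \le z < M, \\ e^{-(M + z)^{c}}, & z \ge M. \end{cases}$$
On the one hand, when $c < 1$, note that $\nu_{W(c)}(1) = e^{-1}$, so all these functions intersect at this point. Graphically, it can be seen that there is no ordering by "$\ge_\nu$", and also that $S^+(W(c)) = 1$, when $c < 1$. On the other hand, for $1 \le c_1 < c_2$, the following result is obtained.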

Proposition 5.
Let $W(c_1)$ and $W(c_2)$ be Weibull distributions with $1 \le c_1 < c_2$ and CDF as in (12). Then,
Proof. For $1 \le c_1 < c_2$, the corresponding modes are $M_1 < M_2$. Then, for $0 < z < M_1$, because each part of the expression inside brackets {·} is positive. If we take $M_1 \le z < M_2$, then for a similar reason. Finally, if we take $z > M_2$, then and the proof is completed.
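The contrast between the cases $c < 1$ and $c \ge 1$ can be explored numerically. The sketch below assumes the CDF $1 - e^{-x^{c}}$, the mode $((c - 1)/c)^{1/c}$ for $c > 1$, and the form $\nu(z) = 1 - F(M + z) - F(M - z)$; it checks, on a finite grid only, whether $\nu_W(z; c_1) \ge \nu_W(z; c_2)$, which is the kind of ordering discussed in this section. The helper `dominates` and the chosen parameter values are illustrative.

```python
import math

def weibull_mode(c):
    """Mode: 0 for c <= 1, ((c - 1) / c)^(1 / c) otherwise."""
    return 0.0 if c <= 1 else ((c - 1.0) / c) ** (1.0 / c)

def weibull_cdf(x, c):
    """Uniparametric Weibull CDF 1 - exp(-x^c) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-(x ** c)) if x > 0 else 0.0

def nu_weibull(z, c):
    """Assumed aggregate skewness function of the Weibull family."""
    m = weibull_mode(c)
    return (1.0 - weibull_cdf(m + z, c)) - weibull_cdf(m - z, c)

def dominates(c1, c2, z_max=5.0, steps=1000, tol=1e-12):
    """Grid check that nu(z; c1) >= nu(z; c2) for every z in [0, z_max]."""
    return all(
        nu_weibull(z_max * i / steps, c1) >= nu_weibull(z_max * i / steps, c2) - tol
        for i in range(steps + 1)
    )

print(nu_weibull(1.0, 0.5), math.exp(-1.0))       # common crossing value when c < 1
print(dominates(1.0, 2.0), dominates(1.5, 3.0))   # True True
print(dominates(0.5, 0.8), dominates(0.8, 0.5))   # False False
```

A grid check of this kind is not a proof, but it is a quick way to see that the curves for $c < 1$ cross while those for $1 \le c_1 < c_2$ stay ordered over the range examined.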

Asymmetric Laplace Distributions
The asymmetric Laplace distribution has been introduced in the literature in different ways ([16,17]). In this paper, we follow the formulation of Kozubowski and Podgórski (2002) [18] (later refined in [19]). This distribution is obtained by using the scheme introduced by Fernández and Steel (1998) [20] to produce skewness in a symmetric distribution. In this way, the pdf of a skewed or asymmetric Laplace distribution can be written in the form where $\sigma, \kappa > 0$, and $-\infty < \mu < \infty$. Then, we assign the values (0, 1) to the centre and scale parameters ($\mu$ and $\sigma$, respectively) in order to study the aggregate skewness function, so that the extreme right and left skewness indices depend only on the skewness parameter $\kappa > 0$. Thus, it is easily proven that:
1. The aggregate skewness function of an AL($\kappa$) distribution can be written as
2. $\nu_{AL}(z; \kappa)$ is an increasing negative function of $z$ when $\kappa > 1$, and it is a decreasing positive function of $z$ when $0 < \kappa < 1$. Moreover, $\nu_{AL}(z; 1) = 0$ for all $z \ge 0$. That is, any AL distribution is skewed only to the right or only to the left, depending on $\kappa$. In any case, the function verifies $\lim_{z \to \infty} \nu_{AL}(z; \kappa) = 0$ but, when $\kappa \ne 1$, the function never reaches that limit value. To prove these results, it is sufficient to note that
3. At $z = 0$, the skewness function takes the following value: Then, $\nu_{AL}(0; \kappa)$ is the value of $S^+(F_{AL}(\kappa))$ or $S^-(F_{AL}(\kappa))$, depending on its sign.
4. $\nu_{AL}(z; \kappa)$ is a strictly decreasing function of $\kappa$. This is easily shown by means of for all $z > 0$, and all $\kappa > 0$.
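The four properties listed above can be checked numerically under an explicit assumption about the density. The sketch below assumes the standardized form $f(x; \kappa) = \frac{\kappa}{1 + \kappa^{2}} e^{-\kappa x}$ for $x \ge 0$ and $f(x; \kappa) = \frac{\kappa}{1 + \kappa^{2}} e^{x/\kappa}$ for $x < 0$, with mode at 0; the parametrization actually used in the text may differ from this one by a constant scale factor. With this density, $\nu_{AL}(z; \kappa) = P(X > z) - P(X < -z)$ has the closed form coded in the hypothetical helper `nu_al`.

```python
import math

def nu_al(z, kappa):
    """Aggregate skewness of the assumed AL(kappa) density with mode 0:
    P(X > z) - P(X < -z)."""
    right = math.exp(-kappa * z) / (1.0 + kappa ** 2)
    left = kappa ** 2 * math.exp(-z / kappa) / (1.0 + kappa ** 2)
    return right - left

kappas = [0.25, 0.5, 1.0, 2.0, 4.0]
for z in (0.0, 0.5, 2.0):
    # Each row should decrease from left to right (property 4),
    # be positive for kappa < 1, zero at kappa = 1 and negative for kappa > 1.
    print(z, [round(nu_al(z, k), 4) for k in kappas])
```

Under this assumed density, the value at $z = 0$ equals $(1 - \kappa^{2})/(1 + \kappa^{2})$, which changes sign at $\kappa = 1$ in agreement with property 2.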

The Beta and the AST Distributions
Program Evaluation and Review Technique (PERT) methods are well known and widely applied when the activities required for a given project must be ordered according to precedence in time. Some of these methods require modelling the duration of each activity as a random variable, following an expert's opinion. The beta and the asymmetric triangular distributions are commonly used by engineers to describe these durations. In any case, the indications of the experts can be related to maximum and minimum values and a mode, often complemented with further considerations about the shape and skewness of the PDF of the duration. A detailed study of the skewness of both families of probability distributions would therefore help to improve the model fit.
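On the one hand, the asymmetric standard triangular distribution (ASTD), free of centre and scale parameters, depends on only one parameter, $0 \le \theta \le 1$, and has the pdf:
$$f(x; \theta) = \begin{cases} 2x/\theta, & 0 \le x \le \theta, \\ 2(1 - x)/(1 - \theta), & \theta < x \le 1, \end{cases}$$
where only the non-degenerate branch applies when $\theta = 0$ or $\theta = 1$. There is a large body of literature on the use of the ASTD in PERT methods (see [21] and [19] and references therein). Note that the cases $\theta = 0, 1$ are members of the beta family of distributions.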
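For $0 < \theta < 1$, the ASTD($\theta$) CDF can be written as follows:
$$F(x; \theta) = \begin{cases} x^{2}/\theta, & 0 \le x \le \theta, \\ 1 - (1 - x)^{2}/(1 - \theta), & \theta < x \le 1. \end{cases}$$
As the mode is found to be at $x = \theta$, its skewness function is found to be
$$\nu_{ASTD}(z; \theta) = (1 - 2\theta)\left[1 - \frac{z^{2}}{\theta(1 - \theta)}\right], \quad 0 \le z \le \min(\theta, 1 - \theta).$$
In the case $\theta = 0.5$, the skewness function is null. Then, for $0 < \theta < 0.5$ and $\theta < z \le 1 - \theta$,
$$\nu_{ASTD}(z; \theta) = \frac{(1 - \theta - z)^{2}}{1 - \theta}.$$
In the case $0.5 < \theta < 1$, for $1 - \theta < z \le \theta$,
$$\nu_{ASTD}(z; \theta) = -\frac{(\theta - z)^{2}}{\theta},$$
and it is easily found that for $0 \le z < \infty$. Some algebra shows that, for $0 < \theta_1 < \theta_2 < 1$,
$$\nu_{ASTD}(z; \theta_1) \ge \nu_{ASTD}(z; \theta_2) \quad \text{for all } z \ge 0.$$
Therefore, the skewness of the ASTD distributions is completely controlled by the parameter $\theta$.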
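The way in which $\theta$ controls the skewness of the ASTD family can be illustrated with a small numerical sketch. It assumes the usual triangular CDF on $[0, 1]$ with mode $\theta$ (that is, $x^{2}/\theta$ below the mode and $1 - (1 - x)^{2}/(1 - \theta)$ above it) and the form $\nu(z) = 1 - F(\theta + z) - F(\theta - z)$; the helper `ordered_by_theta` is hypothetical and only inspects a finite grid of distances.

```python
def astd_cdf(x, theta):
    """CDF of the standard triangular distribution on [0, 1] with mode theta,
    for 0 < theta < 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x <= theta:
        return x * x / theta
    return 1.0 - (1.0 - x) ** 2 / (1.0 - theta)

def nu_astd(z, theta):
    """Assumed aggregate skewness function of the ASTD family."""
    return (1.0 - astd_cdf(theta + z, theta)) - astd_cdf(theta - z, theta)

def ordered_by_theta(thetas, steps=200, tol=1e-12):
    """Grid check that nu(z; theta) does not increase as theta increases."""
    zs = [i / steps for i in range(steps + 1)]
    return all(
        nu_astd(z, a) >= nu_astd(z, b) - tol
        for a, b in zip(thetas, thetas[1:])
        for z in zs
    )

print(nu_astd(0.0, 0.2), nu_astd(0.3, 0.5))
print(ordered_by_theta([0.1, 0.3, 0.5, 0.7, 0.9]))
```

In this sketch the value at $z = 0$ is $1 - 2\theta$, the symmetric case $\theta = 0.5$ gives a skewness function that vanishes up to rounding, and the curves for increasing $\theta$ remain ordered over the whole grid.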
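On the other hand, the pdf of a beta distribution is given by
$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 < x < 1,$$
where $\alpha, \beta > 0$, and $B(\cdot, \cdot)$ is the beta function. Given that its CDF verifies $F(x; \alpha, \beta) = 1 - F(1 - x; \beta, \alpha)$ and the sign of its skewness depends only on whether $\beta \ge \alpha$ or $\beta \le \alpha$, we can study only the case $\beta > \alpha$.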
We are interested in the cases $\alpha, \beta > 1$, where there is a unique mode,
$$M = \frac{\alpha - 1}{\alpha + \beta - 2}.$$
Hence, we only consider cases where $1 < \alpha < \beta$, for which the skewness is to the right; the cases $1 < \beta < \alpha$, with left skewness, can be immediately deduced by interchanging the parameters.
With these results, we can conclude that $\nu_B(z; \alpha + 1, 1 + (1 - M)\alpha/M)$ decreases with the feasible values of $\alpha$. Thus, the subsets of beta distributions with fixed mode are ordered by skewness (see Figure 2). As the parameter values increase, these beta distributions become less skewed.
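A rough numerical illustration of this fixed-mode ordering is sketched below. It assumes the form $\nu(z) = 1 - F(M + z) - F(M - z)$ and evaluates the beta CDF by composite Simpson integration of the pdf, which is a convenience chosen here to keep the example free of external libraries and is adequate only for shape parameters of at least 2 (so that the integrand is smooth at 0). The function names and the values $M = 0.3$ and $\alpha \in \{1, 4, 16\}$ are illustrative.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density for 0 < x < 1, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))

def beta_cdf(x, a, b, n=2000):
    """Composite Simpson approximation of the Beta(a, b) CDF (n must be even)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    total = 0.0  # the pdf vanishes at 0 for a > 1, so f(0) contributes nothing
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * beta_pdf(i * h, a, b)
    total += beta_pdf(x, a, b)
    return total * h / 3.0

def nu_beta_fixed_mode(z, alpha, mode):
    """nu_B(z; alpha + 1, 1 + (1 - M) alpha / M): beta family with mode M."""
    a = alpha + 1.0
    b = 1.0 + (1.0 - mode) * alpha / mode
    return (1.0 - beta_cdf(mode + z, a, b)) - beta_cdf(mode - z, a, b)

for alpha in (1.0, 4.0, 16.0):
    print(alpha, round(nu_beta_fixed_mode(0.0, alpha, 0.3), 4),
          round(nu_beta_fixed_mode(0.1, alpha, 0.3), 4))
```

With the mode held at 0.3, the printed values shrink as $\alpha$ grows, which matches the statement that these beta distributions become less skewed as the parameter values increase.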

Conclusions
Two main objectives are achieved in this paper. First, the given examples show that the skewness function orders the families considered in good accordance with the intuitive conception of skewness. Second, these examples show that the skewness of a distribution obtained from certain parametric families can be controlled by reference to their parameters.
As we show, the function $\nu_F(z)$ facilitates the description of a random variable by means of a probability distribution, by making any skewness in the model easily observable. Further research should be undertaken to examine the use of these properties in data fitting.
Much can be learned from this model, but there remains the risk that it may be wrongly specified in real applications. Thus, in practice we must be willing to assume that the underlying distribution has a unique mode and belongs to a uniparametric family of distributions.
In many practical situations, the maximum skewness index coincides with the well known γ M (F), but this second index only takes into account the difference of probability weights at each side of the mode, while the first takes a value from the point where this difference is maximum. Moreover, the aggregate skewness function gives more accurate information about how the probability weight is distributed along both sides of the mode. Accordingly, the condition F ≥ ν G provides highly valuable information.