Article

Divergence from, and Convergence to, Uniformity of Probability Density Quantiles

by Robert Staudte 1,*,† and Aihua Xia 2,†
1 Department of Mathematics and Statistics, La Trobe University, Bundoora, VIC 3086, Australia
2 School of Mathematics and Statistics, University of Melbourne, Parkville, VIC 3010, Australia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2018, 20(5), 317; https://doi.org/10.3390/e20050317
Received: 7 March 2018 / Revised: 10 April 2018 / Accepted: 19 April 2018 / Published: 25 April 2018
(This article belongs to the Special Issue Entropy: From Physics to Information Sciences and Geometry)

Abstract

We demonstrate that questions of convergence and divergence regarding shapes of distributions can be addressed in a location- and scale-free environment. This environment is the class of probability density quantiles (pdQs), obtained by normalizing the composition of the density with the associated quantile function. It has earlier been shown that the pdQ is representative of a location-scale family and carries essential information regarding shape and tail behavior of the family. The pdQs are densities of continuous distributions with a common domain, the unit interval, which facilitates metric and semi-metric comparisons. The Kullback–Leibler divergences from uniformity of these pdQs are mapped to illustrate their relative positions with respect to uniformity. To gain more insight into the information that is conserved under the pdQ mapping, we repeatedly apply the pdQ mapping and find that further applications of it are quite generally entropy increasing, so convergence to the uniform distribution is investigated. New fixed point theorems are established with elementary probabilistic arguments and illustrated by examples.
Keywords: convergence in $L_r$ norm; fixed point theorem; Kullback–Leibler divergence; relative entropy; semi-metric; uniformity testing

1. Introduction

For each continuous location-scale family of distributions with square-integrable density, there is a probability density quantile (pdQ), which is an absolutely continuous distribution on the unit interval. Members of the class of such pdQs differ only in shape, and the asymmetry of their shapes can be partially ordered by their Hellinger distances or Kullback–Leibler divergences from the class of symmetric distributions on this interval. In addition, the tail behaviour of the original family can be described in terms of the boundary derivatives of its pdQ. Empirical estimators of the pdQs enable one to carry out inference, such as robust fitting of shape parameter families to data; details are in [1].
The Kullback–Leibler directed divergences and symmetrized divergence (KLD) of a pdQ with respect to the uniform distribution on [0,1] are investigated in Section 2, with remarkably simple numerical results, and a map of these divergences for some standard location-scale families is constructed. The ‘shapeless’ uniform distribution is the center of the pdQ universe, as is explained in Section 3, where it is found to be a fixed point. A natural question of interest is to find the invariant information of the pdQ mapping, that is, the information conserved after the pdQ mapping is applied. To this end, it is necessary to apply the pdQ mapping repeatedly to extract the information. Numerical studies indicate that further applications of the pdQ transformation are generally entropy increasing, so we investigate the convergence to uniformity of repeated applications of the pdQ transformation, by means of fixed point theorems for a semi-metric. As the pdQ mapping is not a contraction, the proofs of the fixed point theorems are through elementary probabilistic arguments rather than the classical contraction mapping principle. Our approach may shed light on future research in fixed point theory. Further ideas are discussed in Section 4.

2. Divergences between Probability Density Quantiles

2.1. Definitions

Let $\mathcal{F}$ denote the class of cumulative distribution functions (cdfs) on the real line $\mathbb{R}$ and for each $F \in \mathcal{F}$ define the associated quantile function of $F$ by $Q(u) = \inf\{x : F(x) \ge u\}$, for $0 < u < 1$. When the random variable $X$ has cdf $F$, we write $X \sim F$. When the density function $f = F'$ exists, we also write $X \sim f$. We only discuss $F$ as absolutely continuous with respect to Lebesgue measure, but the results can be extended to the discrete and mixture cases using suitable dominating measures.
Definition 1.
Let $\mathcal{F}' = \{F \in \mathcal{F} : f = F' \text{ exists and is positive}\}$. For each $F \in \mathcal{F}'$, we follow [2] and define the quantile density function $q(u) = Q'(u) = 1/f(Q(u))$. Parzen called its reciprocal function $fQ(u) = f(Q(u))$ the density quantile function. For $F \in \mathcal{F}'$, and $U$ uniformly distributed on [0,1], assume $\kappa = E[fQ(U)] = \int f^2(x)\,dx$ is finite; that is, $f$ is square integrable. Then, we can define the continuous pdQ of $F$ by $f^*(u) = fQ(u)/\kappa$, $0 < u < 1$. Let $\mathcal{F}'^{*} \subset \mathcal{F}'$ denote the class of all such $F$.
Not all $f$ are square-integrable, and this requirement for the mapping $f \to f^*$ means that $\mathcal{F}'^{*}$ is a proper subset of $\mathcal{F}'$. The advantages of working with $f^*$s over $f$s are that they are free of location and scale parameters, they ignore flat spots in $F$, and they have a common bounded support. Moreover, $f^*$ often has a simpler formula than $f$; see Table 1 for examples.
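Definition 1 can be illustrated numerically. The following is a minimal sketch, not the authors' code: the helper name `pdq`, the midpoint rule and the grid size are assumptions made for illustration, and only the Python standard library is used. It computes $\kappa$ for a Normal(3, 2) density and compares the resulting $f^*$ with the closed form for the normal family listed in Table 1, which should not depend on the location and scale chosen.

```python
import math
from statistics import NormalDist


def pdq(pdf, quantile, n=20000):
    """Return (f_star, kappa) for a density `pdf` with quantile function `quantile`.

    kappa = E[f(Q(U))] is approximated by a midpoint rule on (0, 1).
    """
    h = 1.0 / n
    kappa = h * sum(pdf(quantile((i + 0.5) * h)) for i in range(n))

    def f_star(u):
        # f*(u) = f(Q(u)) / kappa
        return pdf(quantile(u)) / kappa

    return f_star, kappa


# Normal(3, 2): location and scale should drop out of the pdQ.
nd = NormalDist(mu=3.0, sigma=2.0)
f_star_normal, kappa_normal = pdq(nd.pdf, nd.inv_cdf)

# Closed form from Table 1: f*(u) = 2*sqrt(pi)*phi(z_u).
std = NormalDist()


def table_normal_pdq(u):
    return 2.0 * math.sqrt(math.pi) * std.pdf(std.inv_cdf(u))


for u in (0.1, 0.5, 0.9):
    print(u, f_star_normal(u), table_normal_pdq(u))
```

The two printed columns should agree to several decimal places; replacing `nd` by any other normal distribution should leave them unchanged.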
Remark 1.
Given that a pdQ $f^*$ exists for a distribution with density $f$, then so does the cdf $F^*$ and quantile function $Q^* = (F^*)^{-1}$ associated with $f^*$. Thus, a monotone transformation from $X \sim F$ to $X^* \sim F^*$ exists; it is simply $X^* = m(X) = Q^*(F(X))$. For the Power($b$) distribution of Table 1, $f_b^* = f_{b^*}$, where $b^* = 2 - 1/b$, so $m_b(x) = Q_{b^*}(F_b(x)) = x^{b/b^*} = x^{b^2/(2b-1)}$. For the normal distribution with parameters $\mu, \sigma$, it is $m_{\mu,\sigma}(x) = \Phi((x-\mu)/(\sqrt{2}\,\sigma))$. In general, an explicit expression for $Q^*$ that depends only on $f$ or $F$ (plus location-scale parameters) need not exist.
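The normal case in Remark 1 can be checked directly from the Table 1 entry $f^*(u) = 2\sqrt{\pi}\,\phi(z_u)$. The short derivation below is a sketch of that check, using the substitution $u = \Phi(z)$ and writing $z_t = \Phi^{-1}(t)$:

```latex
\begin{aligned}
F^*(t) &= \int_0^t 2\sqrt{\pi}\,\phi(z_u)\,du
        = \int_{-\infty}^{z_t} 2\sqrt{\pi}\,\phi(z)^2\,dz
        = \int_{-\infty}^{z_t} \frac{1}{\sqrt{\pi}}\,e^{-z^2}\,dz
        = \Phi\big(\sqrt{2}\,z_t\big),\\
Q^*(v) &= \Phi\big(z_v/\sqrt{2}\big),\\
m_{\mu,\sigma}(x) &= Q^*\big(F(x)\big)
        = \Phi\!\Big(\frac{x-\mu}{\sqrt{2}\,\sigma}\Big),
\qquad \text{since } z_{F(x)} = (x-\mu)/\sigma .
\end{aligned}
```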

2.2. Divergence Map

Next, we evaluate and plot the Kullback–Leibler [3] divergences from uniformity. The Kullback–Leibler [3] divergence of density $f_1$ from density $f_2$, when both have domain [0,1], is defined as
$$I(f_1 : f_2) := \int_0^1 \ln\{f_1(u)/f_2(u)\}\, f_1(u)\,du = E[\ln\{f_1(U)/f_2(U)\}\, f_1(U)],$$
where $U$ denotes a random variable with the uniform distribution $\mathcal{U}$ on [0,1]. The divergences from uniformity are easily computed through
$$I(\mathcal{U} : f^*) = -\int_0^1 \ln(f^*(u))\,du = -E[\ln(f^*(U))]$$
and
$$I(f^* : \mathcal{U}) = \int_0^1 \ln(f^*(u))\, f^*(u)\,du = E[\ln(f^*(U))\, f^*(U)].$$
Kullback ([4], p. 6) interprets $I(f^* : \mathcal{U})$ as the mean evidence in one observation $V \sim f^*$ for $f^*$ over $\mathcal{U}$; it is also known as the relative entropy of $f^*$ with respect to $\mathcal{U}$. The terminology directed divergence for $I(f_1 : f_2)$ is also sometimes used ([4], p. 7) with ‘directed’ explained in ([4], pp. 82, 85); see also [5] in this regard.
Table 1 shows the quantile functions of some standard distributions, along with their pdQs, associated divergences $I(\mathcal{U} : f^*)$, $I(f^* : \mathcal{U})$ and symmetrized divergence (KLD) defined by $J(\mathcal{U}, f^*) := I(\mathcal{U} : f^*) + I(f^* : \mathcal{U})$. The last measure was earlier introduced in a different form by [6].
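The divergences listed in Table 1 can be reproduced by simple quadrature. The sketch below is illustrative only: the function name `kl_from_uniform` and the midpoint rule are assumptions, and the three pdQs are the closed forms given in Table 1 for the logistic, exponential and Cauchy families.

```python
import math


def kl_from_uniform(f_star, n=20000):
    """Return (I(U:f*), I(f*:U), J(U,f*)) by a midpoint rule on (0, 1)."""
    h = 1.0 / n
    i_u_f = 0.0  # I(U : f*) = -integral of ln f*
    i_f_u = 0.0  # I(f* : U) =  integral of f* ln f*
    for i in range(n):
        v = f_star((i + 0.5) * h)
        lv = math.log(v)
        i_u_f -= lv * h
        i_f_u += v * lv * h
    return i_u_f, i_f_u, i_u_f + i_f_u


# Closed-form pdQs taken from Table 1.
pdqs = {
    "logistic": lambda u: 6.0 * u * (1.0 - u),
    "exponential": lambda u: 2.0 * (1.0 - u),
    "cauchy": lambda u: 2.0 * math.sin(math.pi * u) ** 2,
}

for name, f_star in pdqs.items():
    print(name, [round(x, 3) for x in kl_from_uniform(f_star)])
```

The midpoint rule never evaluates the integrand at the endpoints, where $\ln f^*$ is unbounded for these families, so no special handling of the logarithmic singularities is needed at this accuracy.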
Definition 2.
Given pdQs $f_1^*, f_2^*$, let $d(f_1^*, f_2^*) := \sqrt{I(f_1^* : f_2^*) + I(f_2^* : f_1^*)}$. Then, $d$ is a semi-metric on the space of pdQs; i.e., $d$ satisfies all requirements of a metric except the triangle inequality. Introducing the coordinates $(s_1, s_2) = (\sqrt{I(\mathcal{U} : f^*)}, \sqrt{I(f^* : \mathcal{U})})$, we can define the distance from uniformity of any $f^*$ by the Euclidean distance of $(s_1, s_2)$ from the origin $(0, 0)$, namely $d(\mathcal{U}, f^*)$.
Remark 2.
This $d$ does not satisfy the triangle inequality: for example, if $\mathcal{U}$, $\mathcal{N}$ and $\mathcal{C}$ denote the uniform, normal and Cauchy pdQs, then $d(\mathcal{U}, \mathcal{N}) = 0.5$, $d(\mathcal{N}, \mathcal{C}) = 0.4681$ but $d(\mathcal{U}, \mathcal{C}) = 1$; see Table 1 and Figure 1. However, $d$ can provide an informative measure of distance from uniformity.
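The failure of the triangle inequality in Remark 2 can be checked numerically. The following is a sketch under stated assumptions: `root_divergence` is a hypothetical helper that takes $d$ to be the square root of the symmetrized divergence, which is the reading consistent with the quoted values and Table 1, and the pdQs are the Table 1 closed forms.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def pdq_uniform(u):
    return 1.0


def pdq_normal(u):
    # Table 1: f*(u) = 2*sqrt(pi)*phi(z_u)
    return 2.0 * math.sqrt(math.pi) * _STD.pdf(_STD.inv_cdf(u))


def pdq_cauchy(u):
    # Table 1: f*(u) = 2*sin^2(pi*u)
    return 2.0 * math.sin(math.pi * u) ** 2


def root_divergence(f1, f2, n=20000):
    """d(f1, f2) = sqrt(I(f1:f2) + I(f2:f1)), midpoint rule on (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        a, b = f1(u), f2(u)
        # I(f1:f2) + I(f2:f1) equals the integral of (f1 - f2) * ln(f1 / f2)
        total += (a - b) * math.log(a / b) * h
    return math.sqrt(total)


d_un = root_divergence(pdq_uniform, pdq_normal)
d_nc = root_divergence(pdq_normal, pdq_cauchy)
d_uc = root_divergence(pdq_uniform, pdq_cauchy)
print(d_un, d_nc, d_uc, d_un + d_nc < d_uc)
```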
Figure 1 shows the loci of points $(s_1, s_2)$ for some continuous shape families. The light dotted arcs with radii $1/\sqrt{2}$, 1 and $\sqrt{2}$ are a guide to these distances from uniformity. The large discs in purple, red and black correspond to $\mathcal{U}$, $\mathcal{N}$ and $\mathcal{C}$. The blue cross at distance $1/\sqrt{2}$ from the origin corresponds to the exponential distribution. Nearby is the standard lognormal point marked by a red cross. The lower red curve is nearly straight and is the locus of points corresponding to the lognormal shape family.
The chi-squared($\nu$), $\nu > 1$, family also appears as a red curve; it passes through the blue cross when $\nu = 2$, as expected, and heads toward the normal disc as $\nu \to \infty$. The Gamma family has the same locus of points as the chi-squared family. The curve for the Weibull($\beta$) family, for $0.5 < \beta < 3$, is shown in blue; it crosses the exponential blue cross when $\beta = 1$. The Pareto($a$) curve is shown in black. As $a$ increases from 0, this line crosses the arcs distant $\sqrt{2}$ and 1 from the origin for $a = (2\sqrt{2} + 1)/7 \approx 0.547$ and $a = (\sqrt{5} + 1)/2 \approx 1.618$, respectively, and approaches the exponential blue cross as $a \to \infty$.
The Power($b$), or Beta($b$, 1), family for $b > 1/2$ is represented by the magenta curve of points moving toward the origin as $b$ increases from 1/2 to 1, and then moving out towards the exponential blue cross as $b \to \infty$. For each choice of $\alpha > 0.5$, $\beta > 0.5$ the locus of the Beta($\alpha$, $\beta$) pdQ divergences lies above the chi-squared red curve and mostly below the Power($b$) magenta curve; however, the U-shaped Beta distributions have loci above it.
The lower green line near the Pareto black curve gives the loci of root-divergences from uniformity of the Tukey($\lambda$) with $\lambda < 1$, while the upper green curve corresponds to $\lambda \ge 1$. It is known that the Tukey($\lambda$) distributions, with $\lambda < 1/7$, are good approximations to Student’s t-distributions for $\nu > 0$, provided $\lambda$ is chosen properly. The same is true for their corresponding pdQs ([1], Section 3.2). For example, the pdQ of $t_\nu$ with $\nu = 0.24$ degrees of freedom is well approximated by the choice $\lambda = -4.063$. Its location is marked by the small black disk in Figure 1; it is of distance $\sqrt{2}$ from uniformity. The generalized Tukey distributions of [7] with two shape parameters also fill a large funnel shaped region (not marked on the map) emanating from the origin and just including the region bounded by the green curves of the Tukey symmetric distributions.

2.3. Uniformity Testing

There are numerous tests for uniformity, but as [8] points out, many are undermined by the common practice of estimating location-scale parameters of the null and/or alternative distributions when in fact it is assumed that these distributions are known exactly. In practice, this means that if a test for uniformity is preceded by a probability integral transformation including parameter estimates, then the actual levels of such tests will not be those nominated unless (often complicated and model-specific) adjustments are made. Examples of such adjustments are in [9,10].
Given a random sample of $m$ independent, identically distributed (i.i.d.) variables, each from a distribution with density $f$, it is feasible to carry out a nonparametric test of uniformity by estimating the pdQ with a kernel density estimator $\widehat{f^*_m}$ and comparing it with the uniform density on [0,1] using any one of a number of metrics or semi-metrics. Consistent estimators $\widehat{f^*_m}$ for $f^*$ based on normalized reciprocals of the quantile density estimators derived in [11] are available and described in (Staudte [1], Section 2). Note that such a test compares an arbitrary uniform distribution with an arbitrary member of the location-scale family generated by $f$; it is a test of shape only. Preliminary work suggests that such a test is feasible. However, an investigation into such omnibus nonparametric testing procedures, including comparison with bootstrap and other kernel density based techniques found in the literature, such as [12,13,14,15,16,17], is beyond the scope of this work.

3. Convergence of Density Shapes to Uniformity via Fixed Point Theorems

The transformation $f \to f^*$ of Definition 1 is quite powerful, removing location and scale and moving the distribution from the support of $f$ to the unit interval. A natural question of interest is to find the information in a density that is invariant after the pdQ mapping is applied. To this end, it is necessary to repeatedly apply the pdQ mapping to extract the information. Examples suggest that another application of the transformation $f^{2*} := (f^*)^*$ leaves less information about $f$ in $f^{2*}$ and hence it is closer to the uniform density. Furthermore, with $n$ iterations $f^{(n+1)*} := (f^{n*})^*$ for $n \ge 2$, it seems that no information can be conserved after repeated *-transformation, so we would expect that $f^{n*}$ converges to the uniform density as $n \to \infty$. An R script [18] for finding repeated *-iterates of a given pdQ is available as Supplementary Material.

3.1. Conditions for Convergence to Uniformity

Definition 3.
Given $f \in \mathcal{F}'$, we say that $f$ is of *-order $n$ if $f^*, f^{2*}, \ldots, f^{n*}$ exist but $f^{(n+1)*}$ does not. When the infinite sequence $\{f^{n*}\}_{n \ge 1}$ exists, it is said to be of infinite *-order.
For example, the Power(3/4) family is of *-order 2, while the Power(2) family is of infinite *-order. The $\chi^2_\nu$ distribution is of finite *-order for $1 < \nu < 2$ and infinite *-order for $\nu \ge 2$. The normal distribution is of infinite *-order.
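For the Power family the *-order can be computed exactly, because the pdQ of Power($b$) is Power($b^*$) with $b^* = 2 - 1/b$ (Remark 1) and the Power($b$) density is square integrable only for $b > 1/2$. The sketch below uses these two facts; the function name `power_star_order` is hypothetical, and exact rational arithmetic is an implementation choice made here so that the boundary case $b = 1/2$ is not blurred by floating-point rounding.

```python
from fractions import Fraction


def power_star_order(b, max_iter=10_000):
    """*-order of the Power(b) family; None stands for infinite *-order.

    Uses two facts: Power(b) is square integrable iff b > 1/2,
    and its pdQ is Power(b*) with b* = 2 - 1/b.
    """
    b = Fraction(b)
    if b >= 1:
        return None  # infinite *-order (Example 1 in Section 3.2)
    order = 0
    while b > Fraction(1, 2) and order < max_iter:
        b = 2 - 1 / b  # one application of the *-map
        order += 1
    return order


print(power_star_order(Fraction(3, 4)))  # 2: Power(3/4) -> Power(2/3) -> Power(1/2)
print(power_star_order(2))               # None: infinite *-order
```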
We write $\mu_n := \int \{f(y)\}^n\,dy$, $\kappa_n = \int_0^1 \{f^{n*}(x)\}^2\,dx$, $n \ge 1$, and $\kappa_0 = \int \{f(x)\}^2\,dx$. The next proposition characterises the property of infinite *-order.
Proposition 1.
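For $f \in \mathcal{F}'$ and $m \ge 1$, the following statements are equivalent:
(i) $\mu_{m+2} < \infty$,
(ii) $\mu_j < \infty$ for all $1 \le j \le m+2$,
(iii) $\kappa_j < \infty$ and $\kappa_j = \dfrac{\mu_j\,\mu_{j+2}}{\mu_{j+1}^2}$ for all $1 \le j \le m$.
In particular, $f$ is of infinite *-order if and only if $\mu_n < \infty$ for all $n \ge 1$.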
Proof of Proposition 1.
For each $i, n \ge 1$, provided all terms below are finite, we have the following recursive formula
$$\nu_{n,i} := \int \{f^{n*}(x)\}^i\,dx = \frac{1}{\kappa_{n-1}^{\,i}}\,\nu_{n-1,i+1}, \tag{1}$$
giving
$$\kappa_n = \frac{1}{\prod_{j=0}^{n-1} \kappa_j^{\,n+1-j}}\,\mu_{n+2}. \tag{2}$$
(i) ⇒ (ii) For $1 \le j \le m+2$,
$$\mu_j = \int \{f(x)\}^j\,\mathbf{1}_{\{f(x) > 1\}}\,dx + \int \{f(x)\}^j\,\mathbf{1}_{\{f(x) \le 1\}}\,dx \le \int \{f(x)\}^{m+2}\,dx + \int f(x)\,dx = \mu_{m+2} + 1 < \infty.$$
(ii) ⇒ (iii) Use (2) and proceed with induction for $1 \le n \le m$.
(iii) ⇒ (i) By Definition 1, $\kappa_1 < \infty$ means that $\kappa_0 < \infty$. Hence, (i) follows from (2) with $n = m$. ☐
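The moment formula in Proposition 1(iii) can be verified exactly for the Power($b$) family with $b \ge 1$. This sketch rests on two closed forms not spelled out above but obtained by direct integration: $\mu_n = b^n/\{n(b-1)+1\}$, and $\kappa_n = b_n^2/(2b_n - 1)$, where $b_n$ is the $n$-th iterate of $b \mapsto 2 - 1/b$. The function names are illustrative.

```python
from fractions import Fraction


def mu(b, n):
    """mu_n = integral over (0,1) of (b x^(b-1))^n dx = b^n / (n(b-1) + 1), b >= 1."""
    return b ** n / (n * (b - 1) + 1)


def kappa_from_moments(b, n):
    """Proposition 1(iii): kappa_n = mu_n * mu_{n+2} / mu_{n+1}^2."""
    return mu(b, n) * mu(b, n + 2) / mu(b, n + 1) ** 2


def kappa_direct(b, n):
    """Iterate b -> 2 - 1/b n times, then integrate the squared Power density."""
    for _ in range(n):
        b = 2 - 1 / b
    return b ** 2 / (2 * b - 1)


b = Fraction(2)
for n in range(1, 6):
    print(n, kappa_from_moments(b, n), kappa_direct(b, n))
```

For $b = 2$ both columns reduce to $(n+2)^2/\{(n+1)(n+3)\}$, which tends to 1, in line with the convergence results that follow.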
Next, we investigate the involutionary nature of the *-transformation.
Proposition 2.
Let $f^*$ be a pdQ and assume $f^{2*}$ exists. Then, $f^* \sim \mathcal{U}$ if and only if $f^{2*} \sim \mathcal{U}$.
Proof of Proposition 2.
For $r > 0$, we have
$$\int_0^1 |f^{2*}(u) - 1|^r\,du = \frac{1}{\kappa_1^{\,r}} \int_0^1 |f^*(x) - \kappa_1|^r\, f^*(x)\,dx. \tag{3}$$
If $f^* \sim \mathcal{U}$, then $\kappa_1 = 1$ and (3) ensures $\int_0^1 |f^{2*}(u) - 1|^r\,du = 0$, so $f^{2*} \sim \mathcal{U}$.
Conversely, if $f^{2*} \sim \mathcal{U}$, then using (3) again gives $\int_0^1 |f^*(x) - \kappa_1|^r f^*(x)\,dx = 0$. Since $f^*(x) > 0$ a.s., we have $f^*(x) = \kappa_1$ a.s. and this can only happen when $\kappa_1 = 1$. Thus, $f^* \sim \mathcal{U}$, as required. ☐
Proposition 2 shows that the uniform distribution is a fixed point in the Banach space of integrable functions on [0,1] with the $L_r$ norm for any $r > 0$. It remains to show that $f^{n*}$ has a limit and that the limit is the uniform distribution. It was hoped that the classical machinery for convergence in Banach spaces ([19], Chapter 10) would prove useful in this regard, but the *-mapping is not a contraction. For this reason, although there are many studies of fixed point theory in metric and semi-metric spaces (see, e.g., [20] and references therein), the fixed point Theorems 1, 2 and 3 shown below do not seem to be covered in these general studies. Moreover, our proofs are purely probabilistic and non-standard in this area. For simplicity, we use $\xrightarrow{L_r}$ to stand for convergence in $L_r$ norm and $\xrightarrow{P}$ for convergence in probability as $n \to \infty$.
Theorem 1.
For $f \in \mathcal{F}'$ with infinite *-order, the following statements are equivalent:
(i) $f^{n*} \xrightarrow{L_2} 1$;
(ii) For all $r > 0$, $f^{n*} \xrightarrow{L_r} 1$;
(iii) $\dfrac{\mu_n\,\mu_{n+2}}{\mu_{n+1}^2} \to 1$ as $n \to \infty$.
Remark 3.
Notice that $\mu_n = E\{f^*(U)^{n-1}\}$, $n \ge 1$, are the moments of the random variable $f^*(U)$ with $U \sim \mathcal{U}$. Theorem 1 says that the convergence of $\{f^{n*} : n \ge 1\}$ is purely determined by the moments of $f^*(U)$. This is rather puzzling because it is well known that the moments do not uniquely determine the distribution ([21], p. 227), meaning that different distributions with the same moments have the same converging behaviour. However, if $f$ is bounded, then $f^*(U)$ is a bounded random variable so its moments uniquely specify its distribution ([21], pp. 225–226), leading to stronger results in Theorem 2.
Proof of Theorem 1.
It is clear that (ii) implies (i).
(i) ⇒ (iii): By Proposition 1, $\kappa_n = \dfrac{\mu_n\,\mu_{n+2}}{\mu_{n+1}^2}$. Now,
$$\int_0^1 \{f^{n*}(x) - 1\}^2\,dx = \kappa_n - 1, \tag{4}$$
so (iii) follows immediately.
(iii) ⇒ (ii): It suffices to show that $f^{n*} \xrightarrow{L_r} 1$ for any integer $r \ge 4$. To this end, since for $a, b \ge 0$, $|a - b|^{r-2} \le a^{r-2} + b^{r-2}$, we have from (4) that
$$\int_0^1 |f^{n*}(x) - 1|^r\,dx \le \int_0^1 (f^{n*}(x) - 1)^2\,\big(f^{n*}(x)^{r-2} + 1\big)\,dx = \nu_{n,r} - 2\nu_{n,r-1} + \nu_{n,r-2} + \kappa_n - 1, \tag{5}$$
where, as before, $\nu_{n,r} = \int_0^1 \{f^{n*}(x)\}^r\,dx$. However, applying (1) gives
$$\nu_{n,r} = \frac{\mu_{n+r}}{\kappa_{n-1}^{\,r}\,\kappa_{n-2}^{\,r+1} \cdots \kappa_0^{\,n+r-1}}$$
and (2) ensures
$$\mu_{n+r} = \kappa_{n+r-2}\,\kappa_{n+r-3}^{\,2} \cdots \kappa_0^{\,n+r-1},$$
which imply
$$\nu_{n,r} = \kappa_{n+r-2}\,\kappa_{n+r-3}^{\,2} \cdots \kappa_n^{\,r-1} \to 1$$
as $n \to \infty$. Hence, it follows from (5) that $\int_0^1 |f^{n*}(x) - 1|^r\,dx \to 0$ as $n \to \infty$, completing the proof. ☐
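Two steps used in the proofs above are stated without detail. As a sketch, identity (4) follows by expanding the square and using that $f^{n*}$ is a density on $[0,1]$, and recursion (1) follows from the change of variable $u = F^{(n-1)*}(x)$, where $F^{(n-1)*}$ denotes the cdf of $f^{(n-1)*}$ and $Q^{(n-1)*}$ its inverse (notation assumed here for illustration):

```latex
\begin{aligned}
\int_0^1 \{f^{n*}(x) - 1\}^2\,dx
  &= \int_0^1 \{f^{n*}(x)\}^2\,dx - 2\int_0^1 f^{n*}(x)\,dx + 1
   = \kappa_n - 2 + 1 = \kappa_n - 1,\\[4pt]
\nu_{n,i} = \int_0^1 \{f^{n*}(u)\}^i\,du
  &= \int_0^1 \left\{\frac{f^{(n-1)*}\big(Q^{(n-1)*}(u)\big)}{\kappa_{n-1}}\right\}^{i} du
   = \frac{1}{\kappa_{n-1}^{\,i}} \int \{f^{(n-1)*}(x)\}^{i}\, f^{(n-1)*}(x)\,dx
   = \frac{\nu_{n-1,i+1}}{\kappa_{n-1}^{\,i}} .
\end{aligned}
```

In particular, (4) shows $\kappa_n \ge 1$, with equality only when $f^{n*} = 1$ almost everywhere.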
We write $\|g\| = \sup_x |g(x)|$ for each bounded function $g$.
Theorem 2.
If $f$ is bounded, then
(i) for all $n \ge 0$, $\|f^{(n+1)*}\| \le \|f^{n*}\|$ and the inequality becomes equality if and only if $f^{n*} \sim \mathcal{U}$;
(ii) $f^{n*} \xrightarrow{L_r} 1$ for all $r > 0$.
Proof of Theorem 2.
It follows from (4) that $\kappa_n \ge 1$ and the inequality becomes equality if and only if $f^{n*} \sim \mathcal{U}$.
(i) Let $Q^{n*}$ be the inverse of the cumulative distribution function of $f^{n*}$; then $f^{(n+1)*}(u) = \dfrac{f^{n*}(Q^{n*}(u))}{\kappa_n} \le \dfrac{\|f^{n*}\|}{\kappa_n}$, giving $\|f^{(n+1)*}\| \le \dfrac{\|f^{n*}\|}{\kappa_n} \le \|f^{n*}\|$. If $f^{n*} \sim \mathcal{U}$, then Proposition 2 ensures that $f^{(n+1)*} \sim \mathcal{U}$, so $\|f^{(n+1)*}\| = \|f^{n*}\|$. Conversely, if $\|f^{(n+1)*}\| = \|f^{n*}\|$, then $\kappa_n = 1$, so $f^{n*} \sim \mathcal{U}$.
(ii) It remains to show that $\kappa_n \to 1$ as $n \to \infty$. In fact, if $\kappa_n \not\to 1$, since $\kappa_n \ge 1$, there exist a $\delta > 0$ and a subsequence $\{n_k\}$ such that $\kappa_{n_k} \ge 1 + \delta$, which implies
$$\frac{\mu_{n_k+2}}{\mu_{n_k+1}} = \prod_{i=0}^{n_k} \kappa_i \ge (1 + \delta)^k \to \infty \quad \text{as } k \to \infty. \tag{6}$$
However, $\dfrac{\mu_{n_k+2}}{\mu_{n_k+1}} \le \|f\| < \infty$, which contradicts (6). ☐
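Theorem 2 can be illustrated by iterating the *-map on a grid. The sketch below is not the authors' R script; it is a rough piecewise-constant discretisation written for illustration, with `star_step`, the grid size and the number of iterations all chosen here as assumptions. It starts from the Cauchy pdQ $2\sin^2(\pi u)$ of Table 1 and records $\kappa$ and the sup norm at each step.

```python
import math
from bisect import bisect_right


def star_step(g):
    """One *-iteration for a density on [0, 1] given by values g[i] at cell midpoints.

    The density is treated as piecewise constant on n equal cells.
    Returns (g_star, kappa).
    """
    n = len(g)
    h = 1.0 / n
    mass = h * sum(g)
    g = [v / mass for v in g]           # renormalise so the cells integrate to 1
    kappa = h * sum(v * v for v in g)   # kappa = integral of g^2
    edges = [0.0]                       # cdf of g at the cell edges
    for v in g:
        edges.append(edges[-1] + h * v)
    g_star = []
    for j in range(n):
        v = (j + 0.5) * h
        # index of the cell containing the quantile Q(v)
        k = min(max(bisect_right(edges, v) - 1, 0), n - 1)
        g_star.append(g[k] / kappa)     # f*(v) = g(Q(v)) / kappa
    return g_star, kappa


n = 2000
g = [2.0 * math.sin(math.pi * (i + 0.5) / n) ** 2 for i in range(n)]  # Cauchy pdQ
kappas, sups = [], [max(g)]
for _ in range(4):
    g, kappa = star_step(g)
    kappas.append(kappa)
    sups.append(max(g))
print(kappas)
print(sups)
```

In this run the recorded values of $\kappa$ stay above 1 and decrease toward it, while the sup norms decrease from 2 toward 1, as Theorem 2 predicts; the discretisation error is of the order of the cell width, so only the first few digits are meaningful.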
Theorem 3.
For $f \in \mathcal{F}'$ with infinite *-order such that $\left\{\dfrac{\mu_n\,\mu_{n+2}}{\mu_{n+1}^2} : n \ge 1\right\}$ is a bounded sequence, the following statements are equivalent:
(i*) $f^{n*} \xrightarrow{P} 1$;
(ii) For all $r > 0$, $f^{n*} \xrightarrow{L_r} 1$;
(iii) $\dfrac{\mu_n\,\mu_{n+2}}{\mu_{n+1}^2} \to 1$ as $n \to \infty$.
Proof of Theorem 3.
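It suffices to show that (i*) implies (iii). Recall that $\kappa_n = \dfrac{\mu_n\,\mu_{n+2}}{\mu_{n+1}^2}$. For each subsequence $\{\kappa_{n_k}\}$, there exists a converging sub-subsequence $\{\kappa_{n_{k_i}}\}$ such that $\kappa_{n_{k_i}} \to b$ as $i \to \infty$. It remains to show that $b = 1$. To this end, for $\delta > 1$, we have
$$\int_0^1 \big|f^{(n_{k_i}+1)*}(x) - 1\big|\,\mathbf{1}_{\{|f^{(n_{k_i}+1)*}(x) - 1| \le \delta\}}\,dx = \frac{1}{\kappa_{n_{k_i}}} \int_0^1 \big|f^{(n_{k_i})*}(x) - \kappa_{n_{k_i}}\big|\, f^{(n_{k_i})*}(x)\,\mathbf{1}_{\{|f^{(n_{k_i})*}(x) - \kappa_{n_{k_i}}| \le \delta \kappa_{n_{k_i}}\}}\,dx. \tag{7}$$
(i*) ensures that
$$\big|f^{(n_{k_i}+1)*} - 1\big| \xrightarrow{P} 0, \qquad f^{(n_{k_i})*}\,\big|f^{(n_{k_i})*} - \kappa_{n_{k_i}}\big| \xrightarrow{P} |1 - b|, \qquad \mathbf{1}_{\{|f^{(n_{k_i})*}(x) - \kappa_{n_{k_i}}| \le \delta \kappa_{n_{k_i}}\}} \xrightarrow{P} 1$$
as $i \to \infty$, so applying the bounded convergence theorem to both sides of (7) gives $0 = |1/b - 1|$, i.e., $b = 1$. ☐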
Remark 4.
We note that not all distributions are of infinite *-order so the fixed point theorems are only applicable to a proper subclass of all distributions.

3.2. Examples of Convergence to Uniformity

The main results in Section 3.1 cover all the standard distributions with infinite *-order in [22,23]. In fact, as observed in Remark 3, the convergence to uniformity is purely determined by the moments of $f^*(U)$ with $U \sim \mathcal{U}$, and we have been unable to construct a density such that $\{f^{n*} : n \ge 1\}$ does not converge to the uniform distribution. Here, we give a few examples to show that the main results in Section 3.1 are indeed very convenient to use.
Example 1.
Power function family.
From Table 1, the Power($b$) family has density $f_b(x) = b x^{b-1}$, $0 < x < 1$, so it is of infinite *-order if and only if $b \ge 1$. As $f_b$ is bounded for $b \ge 1$, Theorem 2 ensures that $f_b^{n*}$ converges to the uniform in $L_r$ for any $r > 0$.
Example 2.
Exponential distribution.
Suppose $f(x) = e^{x}$, $x < 0$. Then $f$ is bounded, so Theorem 2 says that $f^{n*}$ converges to the uniform distribution as $n \to \infty$. By symmetry, the same result holds for $f(x) = e^{-x}$, $x > 0$.
Example 3.
Pareto distribution.
The Pareto($a$) family, with $a > 0$, has $f_a(x) = a x^{-a-1}$ for $x > 1$, which is bounded, so an application of Theorem 2 yields that the sequence $\{f_a^{n*}\}_{n \ge 1}$ converges to the uniform distribution as $n \to \infty$.
Example 4.
Cauchy distribution.
The pdQ of the Cauchy density is given by $f^*(u) = 2\sin^2(\pi u)$, $0 < u < 1$, see Table 1; it retains the bell shape of $f$. It follows that $F^*(t) = t - \sin(2\pi t)/(2\pi)$, for $0 < t < 1$. It seems impossible to obtain an analytical form of $f^{n*}$ for $n \ge 2$. However, as $f$ is bounded, using Theorem 2, we can conclude that $f^{n*}$ converges to the uniform distribution as $n \to \infty$.
Example 5.
Skew-normal.
A skew-normal distribution [17,24] has the density of the form
$$f(x) = 2\phi(x)\Phi(\alpha x), \quad x \in \mathbb{R},$$
where $\alpha \in \mathbb{R}$ is a parameter, and $\phi$ and $\Phi$, as before, are the density and cdf of the standard normal distribution. When $\alpha = 0$, $f$ reduces to the standard normal, so it is possible to obtain its $\{f^{n*}\}$ by induction and then derive directly that $f^{n*}$ converges to the uniform distribution as $n \to \infty$. The general form of skew-normal densities is a lot harder to handle, but one can easily see that the density is bounded, and so Theorem 2 can be employed to conclude that $f^{n*}$ converges to the uniform distribution as $n \to \infty$.
Example 6.
Let $f(x) = -\ln x$, $x \in (0, 1)$. Then, $\mu_n = n!$ and $\kappa_n = \dfrac{n+2}{n+1} \to 1$ as $n \to \infty$, so we have from Theorem 1 that, for any $r > 0$, $f^{n*}$ converges in $L_r$ norm to constant 1 as $n \to \infty$.
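The quantities in Example 6 can be checked with exact arithmetic. In this minimal sketch the function names are illustrative; it uses the standard integral $\int_0^1 (-\ln x)^n\,dx = n!$ together with the moment formula of Proposition 1, and reports the squared $L_2$ distance from uniformity given by identity (4).

```python
import math
from fractions import Fraction


def mu(n):
    """mu_n = integral over (0,1) of (-ln x)^n dx = Gamma(n+1) = n!."""
    return math.factorial(n)


def kappa(n):
    """kappa_n = mu_n * mu_{n+2} / mu_{n+1}^2 (Proposition 1)."""
    return Fraction(mu(n) * mu(n + 2), mu(n + 1) ** 2)


for n in (1, 2, 5, 10, 100):
    k = kappa(n)
    # By (4), the squared L2 distance of f^{n*} from the uniform density is kappa_n - 1.
    print(n, k, float(k - 1))
```

The squared $L_2$ distance equals $1/(n+1)$, so convergence to uniformity is slow for this unbounded density.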

4. Discussion

The pdQ transformation from a density function $f$ to $f^*$ extracts the important information of $f$ such as its asymmetry and tail behaviour and ignores the less critical information such as gaps, location and scale, and thus provides a powerful tool in studying the shapes of density functions. We found the directed divergences from uniformity of the pdQs of many standard location-scale families and used them to make a map locating each shape family relative to others and giving its distance from uniformity. It would be of interest to find the pdQs of other shape families, such as the skew-normal of Example 5; however, a simple expression for this pdQ appears unlikely given the complicated nature of its quantile function. Nevertheless, the skew-normal family of Jones and Pewsey [25] should be amenable in this regard because there are explicit formulae for both its density and quantile functions. To obtain the information conserved in the pdQ transformation, we repeatedly applied the transformation and found the limiting behaviour of repeated applications of the pdQ mapping. When the density function $f$ is bounded, we showed that each application lowers its modal height and hence the resulting density function $f^*$ is closer to the uniform density than $f$. Furthermore, we established a necessary and sufficient condition for $f^{n*}$ converging in $L_2$ norm to the uniform density, giving a positive answer to a conjecture raised in [1]. In particular, if $f$ is bounded, we proved that $f^{n*}$ converges in $L_r$ norm to the uniform density for any $r > 0$. The fixed point theorems can be interpreted as follows. As we repeatedly apply the pdQ transformation, we keep losing information about the shape of the original $f$ and will eventually exhaust the information, leaving nothing in the limit, as represented by the uniform density, which means no points carry more information than other points.
Thus, the pdQ transformation plays a similar role to the difference operator in time series analysis, where repeated applications of the difference operator to a time series with a polynomial component lead to a white noise with a constant power spectral density ([26], p. 19). We conjecture that every almost surely positive density $g$ on $[0, 1]$ is a pdQ of a density function, hence uniquely representing a location-scale family. This is equivalent to saying that there exists a density function $f$ such that $g = f^*$. When $g$ satisfies $\int_0^1 \frac{1}{g(t)}\,dt < \infty$, one can show that the cdf $F$ of $f$ can be uniquely (up to location-scale parameters) represented as $F(x) = H^{-1}(H(1)x)$, where $H(x) = \int_0^x \frac{1}{g(t)}\,dt$ (Professor A.D. Barbour, personal communication). The condition $\int_0^1 \frac{1}{g(t)}\,dt < \infty$ is equivalent to saying that $f$ has bounded support and it is certainly not necessary, e.g., $g(x) = 2x$ for $x \in [0, 1]$ and $f(x) = e^{x}$ for $x < 0$ (see Example 2 in Section 3.2).
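As a worked illustration of the representation $F(x) = H^{-1}(H(1)x)$, take $g(u) = \tfrac{3}{2}u^{1/2}$, which is the Power(2) pdQ according to the Table 1 formula. The choice of $g$ is an example made here, not one from the text above:

```latex
\begin{aligned}
H(x) &= \int_0^x \frac{1}{g(t)}\,dt = \int_0^x \tfrac{2}{3}\,t^{-1/2}\,dt = \tfrac{4}{3}\,x^{1/2},
\qquad H(1) = \tfrac{4}{3} < \infty,\\
H^{-1}(y) &= \big(\tfrac{3}{4}\,y\big)^2,\\
F(x) &= H^{-1}\big(H(1)\,x\big) = \big(\tfrac{3}{4}\cdot\tfrac{4}{3}\,x\big)^2 = x^2, \qquad 0 < x < 1 .
\end{aligned}
```

This recovers the Power(2) cdf, whose pdQ is $\frac{2b-1}{b}\,u^{1-1/b} = \tfrac{3}{2}u^{1/2}$ at $b = 2$, as expected.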

5. Conclusions

In summary, the study of shapes of probability densities is facilitated by composing them with their own quantile functions, which puts them on the same finite support where they are absolutely continuous with respect to Lebesgue measure, and thus amenable to metric and semi-metric comparisons. In addition, we showed that the intuition that further applications of this transformation reduce information and increase the relative entropy is generally valid, but its proof requires a non-standard approach. Similar results are likely to be obtainable in the multivariate case. Further research could investigate the relationship between relative entropy and tail-weight or distance from the class of symmetric pdQs.

Supplementary Materials

An R script entitled StaudteXiaSupp.R, which is available online at https://www.mdpi.com/1099-4300/20/5/317/s1, enables the reader to plot successive iterates of the pdQ transformation on any standard probability distribution available in R.

Author Contributions

Section 2 is mostly due to Robert Staudte and Section 3 is mostly due to Aihua Xia. Both authors contributed equally to the other three sections.

Acknowledgments

The authors thank the three reviewers for their critiques and many positive suggestions. The authors also thank Peter J. Brockwell for helpful commentary on an earlier version of this manuscript. The research by Aihua Xia is supported by an Australian Research Council Discovery Grant DP150101459. The authors have not received funds for covering the costs to publish in open access.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Staudte, R. The shapes of things to come: Probability density quantiles. Statistics 2017, 51, 782–800.
  2. Parzen, E. Nonparametric statistical data modeling. J. Am. Stat. Assoc. 1979, 7, 105–131.
  3. Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
  4. Kullback, S. Information Theory and Statistics; Dover: Mineola, NY, USA, 1968.
  5. Abbas, A.; Cadenbach, A.; Salimi, E. A Kullback–Leibler View of Maximum Entropy and Maximum Log-Probability Methods. Entropy 2017, 19, 232.
  6. Jeffreys, H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A 1946, 186, 453–461.
  7. Freimer, M.; Kollia, G.; Mudholkar, G.; Lin, C. A study of the generalized Tukey lambda family. Commun. Stat. Theory Methods 1988, 17, 3547–3567.
  8. Stephens, M. Uniformity, Tests of. In Encyclopedia of Statistical Sciences; John Wiley & Sons: Hoboken, NJ, USA, 2006; Volume 53, pp. 1–8.
  9. Lockhart, R.; O’Reilly, F.; Stephens, M. Tests of Fit Based on Normalized Spacings. J. R. Stat. Soc. B 1986, 48, 344–352.
  10. Schader, M.; Schmid, F. Power of tests for uniformity when limits are unknown. J. Appl. Stat. 1997, 24, 193–205.
  11. Prendergast, L.; Staudte, R. Exploiting the quantile optimality ratio in finding confidence Intervals for a quantile. Stat 2016, 5, 70–81.
  12. Dudewicz, E.; Van Der Meulen, E. Entropy-Based Tests of Uniformity. J. Am. Stat. Assoc. 1981, 76, 967–974.
  13. Bowman, A. Density based tests for goodness-of-fit. J. Stat. Comput. Simul. 1992, 40, 1–13.
  14. Fan, Y. Testing the Goodness of Fit of a Parametric Density Function by Kernel Method. Econ. Theory 1994, 10, 316–356.
  15. Pavia, J. Testing Goodness-of-Fit with the Kernel Density Estimator: GoFKernel. J. Stat. Softw. 2015, 66, 1–27.
  16. Noughabi, H. Entropy-based tests of uniformity: A Monte Carlo power comparison. Commun. Stat. Simul. Comput. 2017, 46, 1266–1279.
  17. Arellano-Valle, R.; Contreras-Reyes, J.; Stehlik, M. Generalized Skew-Normal Negentropy and Its Application to Fish Condition Factor Time Series. Entropy 2017, 19, 528.
  18. R Core Team. R Foundation for Statistical Computing; R Core Team: Vienna, Austria, 2008; ISBN 3-900051-07-0.
  19. Luenberger, D. Optimization by Vector Space Methods; Wiley: New York, NY, USA, 1969.
  20. Bessenyei, M.; Páles, Z. A contraction principle in semimetric spaces. J. Nonlinear Convex Anal. 2017, 18, 515–524.
  21. Feller, W. An Introduction to Probability Theory and Its Applications; John Wiley & Sons: New York, NY, USA, 1971; Volume 2.
  22. Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions; John Wiley & Sons: New York, NY, USA, 1994; Volume 1.
  23. Johnson, N.; Kotz, S.; Balakrishnan, N. Continuous Univariate Distributions; John Wiley & Sons: New York, NY, USA, 1995; Volume 2, ISBN 0-471-58494-0.
  24. Azzalini, A. A Class of Distributions which Includes the Normal Ones. Scand. J. Stat. 1985, 12, 171–178.
  25. Jones, M.; Pewsey, A. Sinh-arcsinh distributions. Biometrika 2009, 96, 761–780.
  26. Brockwell, P.; Davis, R. Time Series: Theory and Methods; Springer: New York, NY, USA, 2009.
Figure 1. Divergence from uniformity. The loci of points $(s_1, s_2) = (\sqrt{I(\mathcal{U} : f^*)}, \sqrt{I(f^* : \mathcal{U})})$ are shown for various standard families. The large disks correspond respectively to the symmetric families: uniform (purple), normal (red) and Cauchy (black). The crosses correspond to the asymmetric distributions: exponential (blue) and standard lognormal (red). More details are given in Section 2.2.
Table 1. Quantiles of some distributions, their pdQs and divergences. In general, we denote $x_u = Q(u) = F^{-1}(u)$, but for the normal $F = \Phi$ with density $\phi$, we use $z_u = \Phi^{-1}(u)$. The Laplace quantile function is only given for $u \le 0.5$, but it is symmetric about $u = 0.5$. Lognormal($\sigma$) represents the lognormal distribution with shape parameter $\sigma$. The quantile function for the Pareto is for the Type II distribution with shape parameter $a$, and the pdQ is the same for Type I and Type II Pareto models.
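| Distribution | $Q(u)$ | $f^*(u)$ | $I(\mathcal{U} : f^*)$ | $I(f^* : \mathcal{U})$ | $J(\mathcal{U}, f^*)$ |
|---|---|---|---|---|---|
| Normal | $z_u$ | $2\sqrt{\pi}\,\phi(z_u)$ | 0.153 | 0.097 | 0.250 |
| Logistic | $\ln(u/(1-u))$ | $6u(1-u)$ | 0.208 | 0.125 | 0.333 |
| Laplace | $\ln(2u),\ u \le 0.5$ | $2\min\{u, 1-u\}$ | 0.307 | 0.193 | 0.500 |
| $t_2$ | $\dfrac{2u-1}{\{2u(1-u)\}^{1/2}}$ | $\dfrac{2^7\{u(1-u)\}^{3/2}}{3\pi}$ | 0.391 | 0.200 | 0.591 |
| Cauchy | $\tan\{\pi(u-0.5)\}$ | $2\sin^2(\pi u)$ | 0.693 | 0.307 | 1.000 |
| Exponential | $-\ln(1-u)$ | $2(1-u)$ | 0.307 | 0.193 | 0.500 |
| Gumbel | $-\ln(-\ln(u))$ | $-4u\ln(u)$ | 0.191 | 0.116 | 0.307 |
| Lognormal($\sigma$) | $e^{\sigma z_u}$ | $2\sqrt{\pi}\,e^{-\sigma^2/4}\,\phi(z_u)\,e^{-\sigma z_u}$ | $\dfrac{\sigma^2}{4} + \dfrac{1}{2} - \ln(\sqrt{2})$ | - | $\dfrac{1}{4} + \dfrac{3\sigma^2}{8}$ |
| Pareto($a$) | $(1-u)^{-1/a}$ | $\dfrac{2a+1}{a}\,(1-u)^{1+1/a}$ | $\dfrac{1+a}{a} - \ln\!\left(2 + \dfrac{1}{a}\right)$ | - | $\dfrac{(1+a)^2}{a(1+2a)}$ |
| Power($b$) | $u^{1/b}$ | $\dfrac{2b-1}{b}\,u^{1-1/b}$ | $\dfrac{b-1}{b} - \ln\!\left(2 - \dfrac{1}{b}\right)$ | - | $\dfrac{(b-1)^2}{b(2b-1)}$ |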