Article

The Ordering of Shannon Entropies for the Multivariate Distributions and Distributions of Eigenvalues

1 Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan
2 Center for General Education, Saint Mary’s Junior College of Medicine, Nursing and Management, Yi-Lan 266, Taiwan
3 Institute of Statistics, National Chiao-Tung University, Hsin-Chu 30010, Taiwan
* Author to whom correspondence should be addressed.
Entropy 2019, 21(2), 201; https://doi.org/10.3390/e21020201
Submission received: 14 January 2019 / Revised: 17 February 2019 / Accepted: 18 February 2019 / Published: 20 February 2019

Abstract: In this paper, we prove Shannon entropy inequalities for multivariate distributions via the notion of convex ordering of two multivariate distributions. We further characterize the multivariate totally positive of order 2 ($MTP_2$) property of the distribution functions of the eigenvalues of both central Wishart and central MANOVA models, and of both noncentral Wishart and noncentral MANOVA models, under the general population covariance matrix set-up. These results can be directly applied both to the comparison of two Shannon entropy measures and to the power monotonicity problem for the MANOVA problem.

1. Introduction

Let $X_1, X_2, \ldots$ be independent random vectors with a common continuous distribution function $F(\mathbf{x})$ which has density function $f(\mathbf{x})$, $\mathbf{x} \in R^p$, with respect to Lebesgue measure. The Shannon entropy for $f(\mathbf{x})$ is defined by
$$I_S(f) = -\int \{\log f(\mathbf{x})\}\, f(\mathbf{x})\, d\mathbf{x}.$$
We further use $\mathcal{F}$ to denote the class of all density functions whose Shannon entropies exist, that is,
$$\mathcal{F} = \{ f : I_S(f) < \infty \}.$$
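As a quick illustration of this definition, the following sketch (ours, not from the paper) numerically approximates $I_S(f)$ for a standard normal density and checks it against the closed form $\tfrac{1}{2}\log(2\pi e)$; the function name and integration limits are arbitrary choices.

```python
import numpy as np
from scipy import integrate, stats

def shannon_entropy(pdf, lower, upper):
    """Numerically approximate I_S(f) = -int f(x) log f(x) dx over [lower, upper]."""
    integrand = lambda x: -pdf(x) * np.log(pdf(x))
    value, _ = integrate.quad(integrand, lower, upper)
    return value

# Standard normal: the closed form is 0.5 * log(2 * pi * e) ~ 1.4189
approx = shannon_entropy(stats.norm.pdf, -10.0, 10.0)
exact = 0.5 * np.log(2.0 * np.pi * np.e)
print(approx, exact)  # both approximately 1.4189
```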
Entropy inequalities have been studied by several authors. Karlin and Rinott ([1,2], and references therein) used the notions of majorization and Schur functions to study comparisons of Shannon (differential) entropies. Zografos and Nadarajah [3] pointed out that some comparisons of Shannon entropies can be induced by affine transformations; those results reflect the fact that the Shannon entropy is not scale invariant. Tsai [4] used the notion of convex ordering to investigate Shannon entropy inequalities for univariate distributions. The novel feature of that approach is to adopt the difference of two Shannon entropy measures as a new measure, which is essentially symmetric. This new measure has some advantages over the well-known Kullback-Leibler divergence (relative entropy): the former is designed for small or moderate deviations, whereas the latter is typically used for large deviations. The measure of the difference between two Shannon entropies also enjoys the finite sample property.
The key point of this method is to transform the convex ordering into the Lorenz ordering, which neatly depicts the stochastic ordering of two normalized distributions with the same starting and ending points. The difference between two Shannon entropy measures can be represented in terms of a standard form of the Kullback-Leibler divergence; it is interesting to note, however, that the original Kullback-Leibler divergence does not admit such a symmetric representation. Furthermore, the new measure is a monotone likelihood ratio of two normalized density functions. These properties provide some advantages for statistical inference.
The ordering of Shannon entropies arises naturally when two underlying distributions are related by convex ordering or concave ordering, via the Lorenz ordering. In this paper, one of the main goals is to extend the result of Tsai [4] to the multivariate case, which is studied in Section 3. Please note that the notions of convex ordering of two distributions, monotone likelihood ratio ($MLR$) and totally positive of order 2 ($TP_2$) are essentially equivalent. We then adopt the notion of multivariate totally positive of order 2 ($MTP_2$), or multivariate reverse rule of order 2 ($MRR_2$), to obtain a more complete theory of Shannon entropy comparisons. The eigenvalue problem is currently a very active topic. We further study the $MTP_2$ property of the distribution functions of eigenvalues of both central Wishart and central MANOVA models, and of both noncentral Wishart and noncentral MANOVA models, under the general population covariance matrix set-up in Section 4. The results can be directly applied to the comparison of two Shannon entropy measures as well as to the power monotonicity problem for the MANOVA problem, which has been open for a long time in the literature (Perlman and Olkin [5]). For a high-dimensional Wishart matrix whose population covariance matrix is an unknown scale times the identity matrix, the $TP_2$ property of the limiting empirical density function of eigenvalues is also studied in Section 4.5. Mixture density functions, such as the noncentral chi-square density function and those presented in Section 4.3 and Section 4.4, play an important role in statistical inference. We also connect this notion with the well-known Fisher multivariate analysis of variance in the final remark section.

2. The Univariate Distributions

Tsai [4] adopted the notion of convex ordering of distribution functions to study the orderings of Shannon entropy measures. For two univariate distributions $F, G \in \mathcal{F}$, $F$ is said to be c-ordered (convex ordered) with respect to $G$ ($F \leq_c G$) if and only if $G^{-1}F$ is convex on the interval where $0 < F(x) < 1$ (van Zwet [6]). The transformation of Barlow and Doksum [7] turns the convex ordering of the distributions $F$ and $G$ into a stochastic ordering of distributions; it can essentially be viewed as a kind of Lorenz ordering (Gastwirth [8]).
Let $C_F(x) = G^{-1}F(x)$; then note that $dC_F(x)/dx = f(x)/g(G^{-1}F(x))$, $x \in (-\infty, \infty)$. It can also be rewritten as $f(F^{-1}(u))/g(G^{-1}(u))$, where $u = F(x)$, $u \in [0,1]$. Following Barlow and Doksum [7], write $H_F(u) = \int_0^{G^{-1}(u)} f(F^{-1}G(t))\, dt$, $u \in [0,1]$, which can be viewed as the inverse function of the transformation considered by Barlow and Doksum [7]. Please note that the notation of Barlow and Doksum [7] was originally defined with $F(0) = 0$ and $F^{-1}(0) = 0$; we use the general conventions $F(-\infty) = 0$ and $F^{-1}(0) = -\infty$ in this paper. Obviously, $H_G(u) = u$, $u \in [0,1]$. Under the assumption that $F(x) \leq_c G(y)$, Barlow and Doksum [7] pointed out that both $H_F(u)$ and $H_G(u)$, $u \in [0,1]$, are distribution functions. Let $h_F$ and $h_G$ be the corresponding density functions of $H_F$ and $H_G$, respectively; then $h_F(u) = f(F^{-1}(u))/g(G^{-1}(u))$ and $h_G(u) = 1$, $u \in [0,1]$. It is then easy to see that the difference of two Shannon entropies can be represented as the Kullback-Leibler divergence of the uniform density function $h_G$ and the density function $h_F$.
Please note that if $F(x) \leq_c G(y)$, then $h_F(u)$ is nondecreasing in $u \in [0,1]$; namely, it enjoys the monotone likelihood ratio property. In addition, it crosses the uniform density function $h_G(u) = 1$, $u \in [0,1]$, at most once, and the sign of the difference $h_F(u) - h_G(u)$, $u \in [0,1]$, changes from negative to nonnegative. Also note that $\int_0^1 (h_F(v) - 1)\, dv = 0$; thus we have $H_F(u) \leq u = H_G(u)$, $u \in [0,1]$. Similarly, if $F(x) \leq_c G(y)$, then $F(x) \leq G(y)$, and $C_F(x)$ is a convex function in $x$, $x \in (-\infty, \infty)$ (for details see Shaked and Shanthikumar [9]). Namely, the convex ordering implies the stochastic ordering; the converse, however, need not be true. We provide a simple counterexample. For $0 \leq x \leq 1$, let $F(x) = x$ and $G(x) = (3x-2)^3/9 + 8/9$; then $F^{-1}(u) = u$ and $G^{-1}(u) = (\sqrt[3]{9u-8} + 2)/3$, $u \in [0,1]$. Thus $f(F^{-1}(u)) = 1$, $g(G^{-1}(u)) = (9u-8)^{2/3}$ and $h_F(u) = (9u-8)^{-2/3}$. It is easy to see that $F(u) \leq G(u)$. However, $h_F(u)$ is not nondecreasing in $u$, $u \in [0,1]$, which means that $F(x) \leq_c G(y)$ does not hold in this example. As such, we may conclude that if $F(x) \leq_c G(y)$, then $F(x) \leq G(y)$; however, $F(x) \leq G(y)$ does not in general imply $F(x) \leq_c G(y)$.
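The counterexample above is easy to verify numerically; the sketch below (ours, not from the paper) confirms that $F \leq G$ pointwise while $h_F(u) = (9u-8)^{-2/3}$ fails to be nondecreasing.

```python
import numpy as np

u = np.linspace(0.001, 0.999, 999)

F = lambda x: x                               # F(x) = x on [0, 1]
G = lambda x: (3 * x - 2) ** 3 / 9 + 8 / 9    # G(x) = (3x - 2)^3 / 9 + 8 / 9

# Stochastic comparison: F(x) <= G(x) on [0, 1]
print(np.all(F(u) <= G(u) + 1e-12))           # True

# h_F(u) = f(F^{-1}(u)) / g(G^{-1}(u)) = (9u - 8)^(-2/3); np.cbrt handles 9u - 8 < 0
h_F = np.cbrt(9 * u - 8) ** (-2.0)
print(np.all(np.diff(h_F) >= 0))              # False: h_F is not nondecreasing
print(np.mean(h_F) * (u[-1] - u[0]))          # roughly 1 (the exact integral over [0, 1] equals 1)
```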
Theorem 1
(Tsai [4]). If $F(x) \leq_c G(y)$, then $I_S(f) \geq I_S(g)$.
With the convex ordering assumption, i.e., $F(x) \leq_c G(y)$, the corresponding Lorenz ordering plays an important role in the proof of Theorem 1. To check whether the condition $F(x) \leq_c G(y)$ holds, one should simultaneously check whether the condition $F(x) \leq G(y)$ holds. Similarly, define $F(x) \leq_{concave} G(x)$ if and only if $F^{-1}G(x)$ is a concave function in $x$, $x \in (-\infty, \infty)$; then we have the following corollary.
Corollary 1.
If $F(x) \leq_{concave} G(y)$, then $I_S(f) \leq I_S(g)$.
Proof. 
It follows easily from Theorem 1 by changing the sign of $\log h_F(u)$. □
Proposition 1.
Composition lemma (Karlin [10]). If $K$ is $TP_2$, $L$ is $TP_2$ and $\nu$ is a σ-finite measure, then the convolution $M(x,y) = \int_Z K(x,z)\, L(z,y)\, d\nu(z)$ is $TP_2$.
It can be proved similarly. We also note that if $K$ is $TP_2$ and $L$ is $RR_2$, then $M$ is $RR_2$.
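A small numerical illustration of the composition lemma (a sketch of ours; the kernels $e^{xz}$, $e^{zy}$ and the counting measure are arbitrary choices, not taken from the paper):

```python
import numpy as np

def is_tp2(kernel, tol=1e-9):
    """Check TP_2 of a strictly positive kernel on a grid via log-supermodularity:
    K[i, j] * K[i2, j2] >= K[i, j2] * K[i2, j] for all i < i2, j < j2."""
    lk = np.log(kernel)
    n, m = lk.shape
    for i in range(n - 1):
        for j in range(m - 1):
            lhs = lk[i, j] + lk[i + 1:, j + 1:]
            rhs = lk[i, j + 1:][None, :] + lk[i + 1:, j][:, None]
            if np.any(lhs + tol < rhs):
                return False
    return True

# Two TP_2 kernels on a common grid: K(x, z) = exp(x z) and L(z, y) = exp(z y)
x = z = y = np.linspace(0.0, 2.0, 8)
K1 = np.exp(np.outer(x, z))
K2 = np.exp(np.outer(z, y))

# Discrete analogue of M(x, y) = int K(x, z) L(z, y) dnu(z), with nu a counting measure
M = K1 @ K2
print(is_tp2(K1), is_tp2(K2), is_tp2(M))  # True True True
```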

3. The Multivariate Distributions

Via the convex ordering transformation, we can neatly transform the difference of two Shannon entropy measures into the Lorenz ordering. In this section, we extend this result to the multivariate case. It is natural to do so in the sense of conditional convex ordering. Let $X = (X_1, \ldots, X_p)$ and $Y = (Y_1, \ldots, Y_p)$ be random vectors, and let $f$ and $g$ be the density functions of $X$ and $Y$, respectively. Please note that $MTP_2$ holds if and only if all the pairwise $TP_2$ conditions hold (Fact 4.3.2 of Tong [11]); namely, examining $MTP_2$ is the same as examining the pairwise $TP_2$ property for every fixed pair. Hence, without loss of generality, we may take $p = 2$. Since $f(x_1, x_2) = f_1(x_1)\, f_{2|1}(x_2|x_1)$, $f(x_1, x_2)$ is $TP_2$ if and only if $f_1(x_1)\, f_{2|1}(x_2|x_1)$ is $TP_2$. By the product property (the product of two $TP_2$ functions is $TP_2$), we may conclude that $f(x_1, x_2)$ is $TP_2$ if and only if both $f_1(x_1)$ and $f_{2|1}(x_2|x_1)$ are $TP_2$. That is to say, $F(x_1, x_2) \leq_c G(y_1, y_2)$ if and only if $F_1(x_1) \leq_c G_1(y_1)$ and $F_{2|1}(x_2|x_1) \leq_c G_{2|1}(y_2|y_1)$. Continuing this process for the general $p$ case, we have the following.
Proposition 2.
Let $c = (c_1, \ldots, c_p) \in R^p$. Then (i) $P\{Y_1 > c_1\} \leq_c P\{X_1 > c_1\}$, and (ii) $P\{Y_j > c_j \mid Y_1 = y_1, \ldots, Y_{j-1} = y_{j-1}\} \leq_c P\{X_j > c_j \mid X_1 = x_1, \ldots, X_{j-1} = x_{j-1}\}$ for all $x_1 \leq y_1, \ldots, x_{j-1} \leq y_{j-1}$ and for all $j = 2, \ldots, p$, if and only if $F(\mathbf{x}) \leq_c G(\mathbf{y})$.
From the results of Proposition 2, we are now in a position to use the convex ordering assumption $F_{j|1,\ldots,j-1}(x_j|x_1,\ldots,x_{j-1}) \leq_c G_{j|1,\ldots,j-1}(y_j|y_1,\ldots,y_{j-1})$ to obtain each corresponding Lorenz ordering, $j = 1, \ldots, p$, where $j = 1$ denotes $F_1(x_1) \leq_c G_1(y_1)$. With some elementary arguments, we then have the following.
Theorem 2.
If $F(\mathbf{x}) \leq_c G(\mathbf{y})$, then $I_S(f) \geq I_S(g)$.
Proof. 
For simplicity, consider first the situation $p = 2$. Let $g_{12}(y_1, y_2)$ be the joint density function of the random variables $Y_1$ and $Y_2$, let $g_{2|1}(y_2|y_1)$ be the corresponding conditional density function of $Y_2$ given $Y_1 = y_1$, and let $G_{2|1}(y_2|y_1)$ be the corresponding conditional distribution function. Let $f_{12}(x_1, x_2)$, $f_{2|1}(x_2|x_1)$ and $F_{2|1}(x_2|x_1)$ be defined similarly for the variables $X_1$ and $X_2$. Also let $U_1 = F_1(x_1)$ and $U_2|u_1 = F_{2|1}(x_2|x_1)$ (respectively, $V_1 = G_1(y_1)$ and $V_2|v_1 = G_{2|1}(y_2|y_1)$). Please note that
$$\begin{aligned} I_S(f_{12}) - I_S(g_{12}) &= -\int\!\!\int \{\log f_{12}(x_1,x_2)\}\, f_{12}(x_1,x_2)\, dx_1 dx_2 + \int\!\!\int \{\log g_{12}(y_1,y_2)\}\, g_{12}(y_1,y_2)\, dy_1 dy_2 \\ &= -\int_0^1\!\!\int_0^1 \log\big[f_1(F_1^{-1}(u_1))\, f_{2|1}(F_{2|1}^{-1}(u_2|u_1))\big]\, du_1 du_2 + \int_0^1\!\!\int_0^1 \log\big[g_1(G_1^{-1}(v_1))\, g_{2|1}(G_{2|1}^{-1}(v_2|v_1))\big]\, dv_1 dv_2 \\ &= -\int_0^1\!\!\int_0^1 \log\Big[\frac{f_1(F_1^{-1}(u_1))\, f_{2|1}(F_{2|1}^{-1}(u_2|u_1))}{g_1(G_1^{-1}(u_1))\, g_{2|1}(G_{2|1}^{-1}(u_2|u_1))}\Big]\, du_1\, du_2 \\ &= -\int_0^1 \log h_{F_1}(u_1)\, du_1 + \int_0^1 \Big[-\int_0^1 \log h_{F_{2|1}}(u_2|u_1)\, du_2\Big] du_1, \end{aligned} \qquad (3)$$
where $h_{F_1}(u_1) = f_1(F_1^{-1}(u_1))/g_1(G_1^{-1}(u_1))$, $h_{G_1}(u_1) = 1$, and $h_{F_{2|1}}(u_2|u_1) = f_{2|1}(F_{2|1}^{-1}(u_2|u_1))/g_{2|1}(G_{2|1}^{-1}(u_2|u_1))$, $h_{G_{2|1}}(u_2|u_1) = 1$. Furthermore, by Proposition 2 we have $F_1(x_1) \leq_c G_1(y_1)$, and thus, via the arguments in Section 2, $h_{F_1}(u_1)$, $u_1 \in [0,1]$, is a density function. Similarly, we also have $F_{2|1}(x_2|x_1) \leq_c G_{2|1}(y_2|y_1)$, and then $h_{F_{2|1}}(u_2|u_1)$, $u_2 \in [0,1]$, is a density function. Thus, by the information inequality, the first term on the right hand side of Equation (3) is non-negative, and the term inside the brackets of the second term is non-negative too; hence the second term on the right hand side of Equation (3) is non-negative. We may therefore conclude that $I_S(f_{12}) \geq I_S(g_{12})$.
Let $U_i|u_1, \ldots, u_{i-1} = F_{i|1,\ldots,i-1}(x_i|x_1,\ldots,x_{i-1})$, $V_i|v_1, \ldots, v_{i-1} = G_{i|1,\ldots,i-1}(y_i|y_1,\ldots,y_{i-1})$ and $h_{F_{i|1,\ldots,i-1}}(u_i|u_1,\ldots,u_{i-1}) = f_{i|1,\ldots,i-1}(F_{i|1,\ldots,i-1}^{-1}(u_i|u_1,\ldots,u_{i-1}))/g_{i|1,\ldots,i-1}(G_{i|1,\ldots,i-1}^{-1}(u_i|u_1,\ldots,u_{i-1}))$, $i = 2, \ldots, p$. Continuing the process, for general $p$ we finally have
$$\begin{aligned} I_S(f) - I_S(g) &= -\int \{\log f(\mathbf{x})\}\, f(\mathbf{x})\, d\mathbf{x} + \int \{\log g(\mathbf{y})\}\, g(\mathbf{y})\, d\mathbf{y} \\ &= -\int_0^1 \log h_{F_1}(u_1)\, du_1 + \int_0^1\Big[-\int_0^1 \log h_{F_{2|1}}(u_2|u_1)\, du_2\Big] du_1 + \cdots \\ &\quad + \int_0^1 \cdots \int_0^1 \Big[-\int_0^1 \log h_{F_{p|1,\ldots,p-1}}(u_p|u_1,\ldots,u_{p-1})\, du_p\Big] du_{p-1} \cdots du_1 \;\geq\; 0. \end{aligned} \qquad (4)$$
Hence, the theorem follows. □
For two arbitrary densities $f_g$, $f_h$, the Kullback-Leibler information discrimination is defined by $K(f_g, f_h) = \int f_g(\mathbf{x}) \log\{f_g(\mathbf{x})/f_h(\mathbf{x})\}\, d\mathbf{x}$. It is non-negative for all $f_g$, $f_h$, and is equal to zero if $f_g = f_h$ almost everywhere. Let $E[X]$ denote the expectation of the random variable $X$. If $F(\mathbf{x}) \leq_c G(\mathbf{y})$, from Equation (4) we have the following representation:
$$I_S(f) - I_S(g) = K(1, h_{F_1}) + E_{U_1}\big[K(1, h_{F_{2|1}})\big] + \cdots + E_{U_1,\ldots,U_{p-1}}\big[K(1, h_{F_{p|1,\ldots,p-1}})\big],$$
where $E_{U_1,\ldots,U_{i-1}}$ denotes the expectation with respect to the independent uniform densities of $(U_1, \ldots, U_{i-1})$, $U_i \in [0,1]$, $i = 1, \ldots, p-1$.
Corollary 2.
If $F(\mathbf{x}) \leq_{concave} G(\mathbf{y})$, then $I_S(f) \leq I_S(g)$.
As we have seen, the proof of Theorem 2 is surprisingly easy. However, characterizing the convex ordering of two multivariate distributions via the conditional approach is sometimes much more complicated than characterizing the $MTP_2$ property directly. The convex ordering of two multivariate distributions is essentially equivalent to the notion of $MTP_2$, and likewise the concave ordering to $MRR_2$. Karlin and Rinott ([12,13]) studied several $MTP_2$ and $MRR_2$ distributions, respectively. Some other frequently encountered models related to Theorem 2 and Corollary 2 are exemplified in the following.
Example 1.
Let $\mathbf{y} = A\mathbf{x}$, where $A$ is a nonsingular matrix. Then $I_S(g) = \log|A| + I_S(f)$, where $|A|$ denotes the determinant of the matrix $A$ (Zografos and Nadarajah [3]). Thus we have $I_S(f) \leq I_S(g)$ if and only if $|A| \geq 1$.
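A quick numerical sanity check of this identity (a hypothetical sketch of ours, using the multivariate normal, whose entropy has a closed form, and an arbitrary random matrix $A$):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
p = 3
A = rng.normal(size=(p, p))   # a (generically) nonsingular matrix

# X ~ N(0, I_p) and Y = A X ~ N(0, A A'); both entropies have closed forms
H_f = multivariate_normal(cov=np.eye(p)).entropy()
H_g = multivariate_normal(cov=A @ A.T).entropy()

print(np.isclose(H_g - H_f, np.log(abs(np.linalg.det(A)))))  # True
```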
Example 2.
Let $B_p = \{(x_1, \ldots, x_p) : x_i > 0,\ i = 1, \ldots, p,\ \sum_{i=1}^p x_i \leq 1\}$ be the $p$-dimensional simplex, and let
$$f(\mathbf{x}; a_1, \ldots, a_p; b_1, \ldots, b_p) = c \prod_{i=1}^p x_i^{a_i - 1}\Big(1 - \sum_{k=1}^i x_k\Big)^{b_i - 1}$$
be the generalized Dirichlet density function with parameters $a_i > 0$, $b_i > 0$, $i = 1, \ldots, p$, and
$$c = \prod_{i=1}^p \frac{\Gamma\big(\sum_{k=i}^p (a_k + b_k - 1) + 1\big)}{\Gamma(a_i)\, \Gamma\big(b_i + \sum_{k=i}^p (a_k + b_k - 1)\big)}.$$
It is easy to note that $x_i^{a_i - 1}$ is $TP_2$ in $(x_i, a_i)$, $(1 - \sum_{k=1}^i x_k)^{b_i - 1}$ is $RR_2$ in $(x_i, b_i)$, and $(1 - \sum_{k=1}^i x_k)^{b_i - 1}$ is pairwise $RR_2$ in $(x_i, x_j)$ when $b_i \geq 1$. Then, by Proposition 2, we have (i) $\prod_{i=1}^p x_i^{a_i - 1}$ is $MTP_2$ and (ii) $\prod_{i=1}^p (1 - \sum_{k=1}^i x_k)^{b_i - 1}$ is $MRR_2$ when $b_i \geq 1$, $i = 1, \ldots, p$. Thus, we may conclude that the generalized Dirichlet density function is $MRR_2$ when $b_i \geq 1$, $i = 1, \ldots, p$. In addition, Corollary 2 is applicable.
Example 3.
Gupta and Richards [14] considered the multivariate Liouville distribution, whose density function is of the form
$$f(\mathbf{x}, \boldsymbol{\theta}) = c(\boldsymbol{\theta})\, g\Big(\sum_{i=1}^p x_i\Big) \prod_{i=1}^p x_i^{\theta_i},$$
where $g: R^+ \to R^+$ is continuous, $\boldsymbol{\theta}, \mathbf{x} \in R_+^p$, and
$$c^{-1}(\boldsymbol{\theta}) = \frac{\prod_{i=1}^p \Gamma(\theta_i)}{\Gamma\big(\sum_{i=1}^p \theta_i\big)} \int_0^\infty t^{\sum_{i=1}^p \theta_i - 1}\, g(t)\, dt.$$
Please note that $\prod_{i=1}^p x_i^{\theta_i}$ is $MTP_2$. By Proposition 2, it is easy to see that $f(\mathbf{x}, \boldsymbol{\theta})$ is $MTP_2$ ($MRR_2$) if and only if $g(x_1 + x_2)$ is $TP_2$ ($RR_2$) in $(x_1, x_2)$ on $R_+^2$. The $MRR_2$ property of some special cases of multivariate Liouville distributions, such as Dirichlet distributions and inverted Dirichlet distributions, is studied by Karlin and Rinott [13].
Example 4.
In biostatistics, many studies are carried out under the well-known Cox proportional hazards model (Cox [15]), whose basic assumption is that $\bar{G}(\mathbf{x}) = (\bar{F}(\mathbf{x}))^\lambda$, $\lambda > 0$, where $\bar{G}(\mathbf{x}) = 1 - G(\mathbf{x})$. Without loss of generality, we may assume $p = 2$. Write $\bar{G}(\mathbf{x}) = \bar{G}_1(x_1)\, \bar{G}_{2|1}(x_2|x_1)$ and $\bar{F}(\mathbf{x}) = \bar{F}_1(x_1)\, \bar{F}_{2|1}(x_2|x_1)$. Let $G_1(x_1) = u_1$, $u_1 \in [0,1]$; then $F_1(G_1^{-1}(u_1)) = 1 - (1 - u_1)^{1/\lambda}$, and thus $dF_1(G_1^{-1}(u_1))/du_1 = (1 - u_1)^{1/\lambda - 1}/\lambda$, which is nondecreasing in $u_1$ if $\lambda \geq 1$. As such, we have $G_1^{-1}(u_1) \leq_c F_1^{-1}(u_1)$, $u_1 \in [0,1]$, when $\lambda \geq 1$; namely, when $\lambda \geq 1$ we have $F_1(x_1) \leq_c G_1(x_1)$. Let $G_{2|1}(x_2|x_1) = u_2$; by arguments similar to those above, we have $F_{2|1}(x_2|x_1) \leq_c G_{2|1}(x_2|x_1)$ when $\lambda \geq 1$. Thus, by Proposition 2, we have $F(\mathbf{x}) \leq_c G(\mathbf{x})$ when $\lambda \geq 1$. This procedure can be directly generalized to the general $p$ case by mathematical induction. As such, for the positive disease gene case $\lambda \geq 1$, Theorem 2 yields $I_S(f) \geq I_S(g)$. Similarly, for the negative disease gene case $\lambda \leq 1$, we have $I_S(f) \leq I_S(g)$.
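For a univariate exponential baseline the two entropies are available in closed form, and the stated direction of the ordering can be checked directly; the following sketch (ours, with an exponential baseline chosen purely for illustration) does so.

```python
import numpy as np
from scipy.stats import expon

# Baseline F = Exp(1); under bar(G) = bar(F)^lambda the second lifetime is Exp(lambda),
# so I_S(f) = 1 and I_S(g) = 1 - log(lambda).
for lam in (0.5, 1.0, 2.0):
    H_f = expon(scale=1.0).entropy()          # I_S(f)
    H_g = expon(scale=1.0 / lam).entropy()    # I_S(g)
    ordered = (H_f >= H_g) if lam >= 1 else (H_f <= H_g)
    print(lam, float(H_f), float(H_g), ordered)   # the ordering matches the lambda regime
```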
Example 5.
The density function of eigenvalues for a multivariate beta matrix is of the form
$$f(\mathbf{x}) = c \prod_{i=1}^p x_i^{\frac{1}{2}(n_1 - p + 1) - 1}\,(1 - x_i)^{\frac{1}{2}(n_2 - p + 1) - 1} \prod_{1 \leq i < j \leq p} |x_i - x_j|,$$
where $0 < x_i < 1$, $i = 1, \ldots, p$, $n_1, n_2 > p$, and
$$c = \prod_{i=1}^p \frac{\Gamma(\tfrac{3}{2})\, \Gamma(\tfrac{1}{2}(n_1 + n_2 - p + i))}{\Gamma(1 + \tfrac{i}{2})\, \Gamma(\tfrac{1}{2}(n_1 - p + i))\, \Gamma(\tfrac{1}{2}(n_2 - p + i))}$$
(Dumitriu [16]; Peddada and Richards [17]). Please note that the Vandermonde factor $\prod_{1 \leq i < j \leq p} |x_i - x_j|$ is $MTP_2$; for the details see Dykstra and Hewett [18]. Using Proposition 2, it is easy to see that $\prod_{i=1}^p x_i^{\frac{1}{2}(n_1 - p + 1) - 1}$ is $TP_2$ in $(x_i, n_1 - p + 1)$, and $\prod_{i=1}^p (1 - x_i)^{\frac{1}{2}(n_2 - p + 1) - 1}$ is $RR_2$ in $(x_i, n_2 - p + 1)$. Thus, $\prod_{i=1}^p x_i^{\frac{1}{2}(n_1 - p + 1) - 1}(1 - x_i)^{\frac{1}{2}(n_2 - p + 1) - 1}$ is $MRR_2$. We may then conclude that the density function, which is the product of these functions, is $MRR_2$. In addition, Corollary 2 is applicable.
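To illustrate the pairwise checks used in this example, a short numerical sketch (ours, not from the paper) verifies on a grid that $x^{a-1}$ is log-supermodular ($TP_2$) in $(x, a)$ and that $(1-x)^{b-1}$ is log-submodular ($RR_2$) in $(x, b)$:

```python
import numpy as np

def supermodular(lk, tol=1e-9):
    """True if lk[i, j] + lk[i2, j2] >= lk[i, j2] + lk[i2, j] for all i < i2, j < j2."""
    n, m = lk.shape
    for i in range(n - 1):
        for j in range(m - 1):
            if np.any(lk[i, j] + lk[i + 1:, j + 1:] + tol
                      < lk[i, j + 1:][None, :] + lk[i + 1:, j][:, None]):
                return False
    return True

x = np.linspace(0.05, 0.95, 15)
a = np.linspace(0.5, 5.0, 15)
b = np.linspace(1.0, 5.0, 15)

log_xa = np.outer(a - 1.0, np.log(x))           # log of x^(a-1), rows indexed by a
log_1xb = np.outer(b - 1.0, np.log(1.0 - x))    # log of (1-x)^(b-1), rows indexed by b

print(supermodular(log_xa))     # True:  x^(a-1) is TP_2 in (x, a)
print(supermodular(-log_1xb))   # True:  (1-x)^(b-1) is RR_2 in (x, b)
```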

4. The Central Wishart and Central MANOVA Models, and Noncentral Wishart and Noncentral MANOVA Models

Theorem 2 states that if the density function has the $MTP_2$ property, then the corresponding ordering of Shannon entropies holds; namely, the $MTP_2$ (or $MRR_2$) property of the density function ensures the ordering of Shannon entropies. The $MTP_2$ property also implies the power monotonicity of MANOVA tests that are based on monotone functions of the eigenvalues. Eigenvalues play an important role in statistical inference. Example 5 deals with the density function of the eigenvalues of MANOVA models (or Jacobi ensembles) when the population covariance matrix is assumed to be an identity matrix ($\Sigma = I$). Two other frequently encountered models are (i) Gaussian (or Hermite) ensembles,
$$f(\mathbf{x}) = (2\pi)^{-\frac{p}{2}} \prod_{i=1}^p \frac{\Gamma(\tfrac{3}{2})}{\Gamma(1 + \tfrac{i}{2})}\, e^{-\frac{1}{2}\sum_{i=1}^p x_i^2} \prod_{1 \leq i < j \leq p} |x_i - x_j|,$$
and (ii) Wishart models (or Laguerre ensembles),
$$f(\mathbf{x}) = 2^{-\frac{1}{2}p(p+1)} \prod_{i=1}^p \frac{\Gamma(\tfrac{3}{2})}{\Gamma(1 + \tfrac{i}{2})\, \Gamma(\tfrac{1}{2}(n_1 - p + 1))} \prod_{i=1}^p x_i^{\frac{1}{2}(n_1 - p + 1) - 1}\, e^{-\frac{1}{2}\sum_{i=1}^p x_i} \prod_{1 \leq i < j \leq p} |x_i - x_j|$$
(Dumitriu [16]). By arguments similar to those of Example 5, it is easy to see that the density function $f(\mathbf{x})$ is $MTP_2$ either for (i) (the identity population covariance matrix is an M-matrix) or for (ii) (a gamma-type density function), respectively; we omit the details.
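As a small numerical illustration of the $MTP_2$ claim for the Hermite ensemble (a sketch of ours; the claim only needs the mixed partials of the log-density, which equal $1/(x_i - x_j)^2 > 0$ on the ordered cone):

```python
import numpy as np

def log_density_hermite(x):
    """log of the (unnormalized) beta = 1 Hermite/GOE eigenvalue density:
    -0.5 * sum x_i^2 + sum_{i<j} log |x_i - x_j|."""
    x = np.asarray(x, dtype=float)
    diffs = np.abs(np.subtract.outer(x, x))
    iu = np.triu_indices(len(x), k=1)
    return -0.5 * np.sum(x ** 2) + np.sum(np.log(diffs[iu]))

def mixed_partial(f, x, i, j, h=1e-4):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j)."""
    e_i, e_j = np.zeros(len(x)), np.zeros(len(x))
    e_i[i], e_j[j] = h, h
    return (f(x + e_i + e_j) - f(x + e_i - e_j)
            - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)

x = np.array([3.0, 1.0, -2.0])   # an ordered point x_1 > x_2 > x_3
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, mixed_partial(log_density_hermite, x, i, j))  # all positive
```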
For statistical inference, generally the population covariance matrix is unknown. In this section, we further study the M T P 2 property of the distribution functions of eigenvalues of central Wishart and MANOVA models, and of noncentral Wishart and MANOVA models (James [19]; Muirhead [20]) under the general population covariance matrix set-up.

4.1. Type $_0F_0$, Exponential: Eigenvalues of a Wishart Matrix

Suppose that the columns of a $p \times n$ matrix $X$ are independently normally distributed with covariance matrix $\Sigma$ and $E(X) = 0$. Let $L = \mathrm{diag}(l_1, \ldots, l_p)$, where $l_i$ is the $i$th largest eigenvalue of $XX'$. Similarly, let $\Delta = \mathrm{diag}(\delta_1, \ldots, \delta_p)$, where $\delta_i$ is the $i$th largest eigenvalue of $\Sigma$. Write $\Sigma = Q\Delta Q'$ and $XX' = ULU'$, where $Q, U \in O(p)$, with $O(p)$ being the group of $p \times p$ orthogonal matrices. Also denote by $(dU)$ the $m$-form of $U$ with $m = \frac{1}{2}p(p+1)$, and let $\mathrm{tr}$ denote the trace operator; then
$${}_0F_0\big(-\tfrac{1}{2}\Sigma^{-1}, XX'\big) = \int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(Q\Delta^{-1}Q' U L U')}\,(dU) = \int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(\Delta^{-1}H L H')}\,(dH) = {}_0F_0\big(-\tfrac{1}{2}\Delta^{-1}, L\big),$$
where $H = Q'U \in O(p)$, noting that $(dU) = (dH)$. Due to the invariance property, the density function of $L$ depends only upon $\Delta$ and is of the form
$$\phi_\delta(l_1, \ldots, l_p) = |\Delta|^{-\frac{1}{2}n}\, {}_0F_0\big(-\tfrac{1}{2}\Delta^{-1}, L\big)\, \phi_0(l_1, \ldots, l_p),$$
where
$$\phi_0(l_1, \ldots, l_p) = \frac{\pi^{\frac{1}{2}p^2}\, 2^{-\frac{1}{2}pn}}{\Gamma_p(\tfrac{1}{2}n)\, \Gamma_p(\tfrac{1}{2}p)}\, |L|^{\frac{1}{2}(n-p-1)} \prod_{i<j}(l_i - l_j)$$
with $\Gamma_p(a) = \pi^{\frac{1}{4}p(p-1)} \prod_{i=1}^p \Gamma\big(a - \tfrac{1}{2}(i-1)\big)$.
Theorem 3.
${}_0F_0\big(-\tfrac{1}{2}\Delta^{-1}, L\big)$ is $MTP_2$.
Proof. 
Please note that
$$\frac{\partial \log {}_0F_0(-\tfrac{1}{2}\Delta^{-1}, L)}{\partial L} = \frac{\int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(\Delta^{-1}HLH')}\,\big(-\tfrac{1}{2}\Delta^{-1}H'H\big)\,(dH)}{\int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(\Delta^{-1}HLH')}\,(dH)} = \frac{\int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(\Delta^{-1}HLH')}\,\big(-\tfrac{1}{2}\Delta^{-1}\big)\,(dH)}{\int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(\Delta^{-1}HLH')}\,(dH)} = -\tfrac{1}{2}\Delta^{-1}.$$
Thus, it is easy to see that
$$\frac{\partial^2 \log {}_0F_0(-\tfrac{1}{2}\Delta^{-1}, L)}{\partial L\, \partial L'} = 0$$
and
$$\frac{\partial^2 \log {}_0F_0(-\tfrac{1}{2}\Delta^{-1}, L)}{\partial \Delta\, \partial L} = \tfrac{1}{2}\Delta^{-2},$$
respectively. Please note that $\Delta$ is a positive diagonal matrix, which is clearly positive definite. Hence the theorem follows. □
By the result of Dykstra and Hewett [18], it is easy to see that $\phi_0(l_1, \ldots, l_p)$ enjoys the $MTP_2$ property. Hence, by Theorem 3, we may conclude that the density function $\phi_\delta(l_1, \ldots, l_p)$ of the eigenvalues of $XX'$, being the product of two $MTP_2$ functions (Karlin and Rinott [12]), is $MTP_2$.

4.2. Type $_1F_0$, Binomial Series: Eigenvalues When $\Sigma_1 \neq \Sigma_2$

Suppose that the $p \times n_1$ matrix variate $X$ has $n_1$ independent columns distributed as $N(0, \Sigma_1)$ and the $p \times n_2$ matrix variate $Y$ has $n_2$ independent columns distributed as $N(0, \Sigma_2)$. Also let $F = \mathrm{diag}(f_1, \ldots, f_p)$ and $\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_p)$, where $f_i$ is the $i$th largest root of
$$|XX' - f\, YY'| = 0,$$
and $\omega_i$ is the $i$th largest root of
$$|\Sigma_1 - \omega \Sigma_2| = 0.$$
Let
$${}_1F_0(a; S, T) = \int_{O(p)} |I + S H T H'|^{-a}\,(dH);$$
the integral is well defined for all $S$ and $T$ positive definite. Without loss of generality, we may take both $S$ and $T$ to be positive diagonal matrices. Then the joint density function of the eigenvalues $f_1, \ldots, f_p$ is of the form
$$\phi_\omega(f_1, \ldots, f_p) = |\Omega|^{-\frac{1}{2}n_1}\, {}_1F_0\big(\tfrac{1}{2}(n_1 + n_2); \Omega^{-1}, F\big)\, \phi^*(f_1, \ldots, f_p),$$
where
$$\phi^*(f_1, \ldots, f_p) = \frac{\pi^{\frac{1}{2}p^2}\, \Gamma_p(\tfrac{1}{2}(n_1+n_2))}{\Gamma_p(\tfrac{1}{2}n_1)\, \Gamma_p(\tfrac{1}{2}n_2)\, \Gamma_p(\tfrac{1}{2}p)}\, |F|^{\frac{1}{2}(n_1 - p - 1)} \prod_{i<j}(f_i - f_j).$$
Theorem 4.
${}_1F_0\big(\tfrac{1}{2}n; \Omega^{-1}, F\big)$ is $MTP_2$, where $n = n_1 + n_2$.
Proof. 
Please note that ${}_1F_0(\tfrac{1}{2}n; \Omega^{-1}, F) = \int_{O(p)} |I + \Omega^{-1}HFH'|^{-\frac{1}{2}n}\,(dH) = |\Omega|^{\frac{1}{2}n}\int_{O(p)}|\Omega + HFH'|^{-\frac{1}{2}n}\,(dH)$; thus
$$\frac{\partial \log {}_1F_0(\tfrac{1}{2}n; \Omega^{-1}, F)}{\partial F} = \frac{-\int_{O(p)} \tfrac{1}{2}n\, |\Omega + HFH'|^{-\frac{1}{2}n}\,(\Omega + HFH')^{-1}\,(dH)}{\int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}\,(dH)}.$$
Write $B(\Omega, F) = \int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}\,(dH) > 0$; then, after some calculations,
$$\frac{\partial^2 \log {}_1F_0(\tfrac{1}{2}n; \Omega^{-1}, F)}{\partial \Omega\, \partial F} = \Big\{ \tfrac{1}{4}n^2 \int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}(\Omega + HFH')^{-2}\,(dH) \times B(\Omega, F) - \tfrac{1}{4}n^2\Big[\int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}(\Omega + HFH')^{-1}\,(dH)\Big]^2 + \tfrac{1}{2}n \int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}(\Omega + HFH')^{-2}\,(dH) \times B(\Omega, F)\Big\}\Big/ B^2(\Omega, F).$$
By the Schwarz inequality, we then have
$$\frac{\partial^2 \log {}_1F_0(\tfrac{1}{2}n; \Omega^{-1}, F)}{\partial \Omega\, \partial F} \geq \frac{\tfrac{1}{2}n \int_{O(p)} |\Omega + HFH'|^{-\frac{1}{2}n}(\Omega + HFH')^{-2}\,(dH)}{B(\Omega, F)} = \tfrac{1}{2}n\, E_f\big[(\Omega + HFH')^{-2} \mid F; \Omega\big] \geq 0$$
with probability one, where $f(H \mid F; \Omega) = |\Omega + HFH'|^{-\frac{1}{2}n}/B(\Omega, F)$ can be viewed as the conditional density function of $H$ given $F$. Please note that the matrix $HFH'$ is positive definite with probability one; thus the conditional expectation $E_f[(\Omega + HFH')^{-2} \mid F; \Omega]$ is positive definite with probability one. Hence, the theorem follows. □
Please note that $\phi^*(f_1, \ldots, f_p)$ is $MTP_2$. Thus, by Theorem 4, we have that the density function $\phi_\omega(f_1, \ldots, f_p)$ is $MTP_2$.

4.3. Type $_0F_1$, Bessel: Noncentral Means with Known Covariance

Suppose that the columns of a $p \times n$ matrix $X$ are independently normally distributed with covariance matrix $\Sigma$ and $E(X) = M$. Let $w_i$ be the eigenvalues of $|XX' - w\Sigma| = 0$ and $W = \mathrm{diag}(w_1, \ldots, w_p)$; then the density function of $W$ depends only upon $\Omega = \mathrm{diag}(\omega_1, \ldots, \omega_p)$, where the $\omega_i$ are the eigenvalues of $|MM' - \omega\Sigma| = 0$, and is
$$\psi_\omega(w_1, \ldots, w_p; \Omega) = e^{-\frac{1}{2}\mathrm{tr}\,\Omega}\, {}_0F_1\big(\tfrac{1}{2}n; \tfrac{1}{4}\Omega, W\big)\, \psi_0(w_1, \ldots, w_p),$$
where
$$\psi_0(w_1, \ldots, w_p) = \frac{\pi^{\frac{1}{2}p^2}\, 2^{-\frac{1}{2}pn}}{\Gamma_p(\tfrac{1}{2}n)\, \Gamma_p(\tfrac{1}{2}p)}\, e^{-\frac{1}{2}\mathrm{tr}\,W}\, |W|^{\frac{1}{2}(n-p-1)} \prod_{i<j}(w_i - w_j).$$
Furthermore, from Equation (32) of James [19], we have the inverse Laplace transform
$${}_0F_1\big(\tfrac{1}{2}n; \Omega, W\big) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(T)>0} e^{\mathrm{tr}\,T}\, |T|^{-\frac{1}{2}n}\, {}_0F_0\big(T^{-1}\Omega, W\big)\,(dT). \qquad (16)$$
Without loss of generality, we may assume that $T = \mathrm{diag}(t_1, \ldots, t_p)$; also letting $T^{-1}\Omega = U^{-1}$, Equation (16) becomes
$${}_0F_1(b; \Omega, W) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(U)>0} e^{\mathrm{tr}(\Omega U)}\, |\Omega U|^{-\frac{1}{2}n}\, {}_0F_0(U^{-1}, W)\, |\Omega|^p\,(dU) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(U)>0} g(u_1, \ldots, u_p; \Omega)\, {}_0F_0(U^{-1}, W)\,(dU),$$
where
$$g(u_1, \ldots, u_p; \Omega) = e^{\sum_{i=1}^p \omega_i u_i}\, \prod_{i=1}^p \omega_i^{-\frac{1}{2}n + p}\, \prod_{i=1}^p u_i^{-\frac{1}{2}n}. \qquad (18)$$
Theorem 5.
${}_0F_1\big(\tfrac{1}{2}n; \Omega, W\big)$ is $MTP_2$.
Proof. 
By Equation (18), after some straightforward calculations, we have
$$\frac{\partial^2 \log g(u_1, \ldots, u_p; \Omega)}{\partial \omega_i\, \partial u_i} = 1 > 0, \quad i = 1, \ldots, p.$$
Thus, $g(u_1, \ldots, u_p; \Omega)$ is pairwise $TP_2$ in $(u_i, \omega_i)$; namely, $g(u_1, \ldots, u_p; \Omega)$ is $MTP_2$. The theorem then follows by the composition lemma (Proposition 1) of Karlin [10]. □
Please note that $\psi_0(w_1, \ldots, w_p)$ is $MTP_2$; thus, by Theorem 5, we may conclude that the density function $\psi_\omega(w_1, \ldots, w_p; \Omega)$ is $MTP_2$.

4.4. Type $_1F_1$, Confluent Hypergeometric: Noncentral Eigenvalues

Suppose that the $p \times n_1$ matrix variate $X$ has $n_1$ independent columns distributed as $N(M, \Sigma)$ and the $p \times n_2$ matrix variate $Y$ has $n_2$ independent columns distributed as $N(0, \Sigma)$, for $n_1 \geq p$. Let $\omega_i^*$ be the eigenvalues of $|MM' - \omega^*\Sigma| = 0$ and write $\Omega^* = \mathrm{diag}(\omega_1^*, \ldots, \omega_p^*)$. Then the joint density function of the eigenvalues $f_i$ of $|XX' - f\,YY'| = 0$, (i) for $n_1 \geq p$, is of the form
$$\phi^*_{\omega^*}(f_1, \ldots, f_p; \Omega^*) = e^{-\frac{1}{2}\mathrm{tr}\,\Omega^*}\, {}_1F_1\big(\tfrac{1}{2}(n_1+n_2); \tfrac{1}{2}n_2; \tfrac{1}{2}\Omega^*, (I + F^{-1})^{-1}\big)\, \phi^*(f_1, \ldots, f_p),$$
where
$$\phi^*(f_1, \ldots, f_p) = \frac{\pi^{\frac{1}{2}p^2}\, \Gamma_p(\tfrac{1}{2}(n_1+n_2))}{\Gamma_p(\tfrac{1}{2}n_1)\, \Gamma_p(\tfrac{1}{2}n_2)\, \Gamma_p(\tfrac{1}{2}p)}\, |F|^{\frac{1}{2}(n_1-p-1)}\, |I + F|^{-\frac{1}{2}(n_1+n_2)} \prod_{i<j}(f_i - f_j).$$
Theorem 6.
${}_1F_1\big(\tfrac{1}{2}n; \tfrac{1}{2}n_2; \tfrac{1}{2}\Omega^*, (I + F^{-1})^{-1}\big)$ is $MRR_2$, where $n = n_1 + n_2$.
Proof. 
Please note that
$${}_1F_0\big(\tfrac{1}{2}n; \Omega^{*-1}, (I+F^{-1})^{-1}\big) = \int_{O(p)} |I - \Omega^{*-1} H (I+F^{-1})^{-1} H'|^{-\frac{1}{2}n}\,(dH) = |\Omega^*|^{\frac{1}{2}n} \int_{O(p)} |\Omega^* - H(I+F^{-1})^{-1}H'|^{-\frac{1}{2}n}\,(dH).$$
Let $G = (I + F^{-1})^{-1}$ and $C(\Omega^*, G) = \int_{O(p)} |\Omega^* - HGH'|^{-\frac{1}{2}n}\,(dH) > 0$; then
$$\frac{\partial^2 \log {}_1F_0(\tfrac{1}{2}n; \Omega^{*-1}, G)}{\partial \Omega^*\, \partial G} = \Big\{ -\tfrac{1}{4}n^2 \int_{O(p)} |\Omega^* - HGH'|^{-\frac{1}{2}n}(\Omega^* - HGH')^{-2}\,(dH) \times C(\Omega^*, G) + \tfrac{1}{4}n^2\Big[\int_{O(p)} |\Omega^* - HGH'|^{-\frac{1}{2}n}(\Omega^* - HGH')^{-1}\,(dH)\Big]^2 - \tfrac{1}{2}n\int_{O(p)} |\Omega^* - HGH'|^{-\frac{1}{2}n}(\Omega^* - HGH')^{-2}\,(dH) \times C(\Omega^*, G)\Big\}\Big/ C^2(\Omega^*, G).$$
By the Schwarz inequality, then
$$\frac{\partial^2 \log {}_1F_0(\tfrac{1}{2}n; \Omega^{*-1}, G)}{\partial \Omega^*\, \partial G} \leq -\frac{\tfrac{1}{2}n \int_{O(p)} |\Omega^* - HGH'|^{-\frac{1}{2}n}(\Omega^* - HGH')^{-2}\,(dH)}{C(\Omega^*, G)} = -\tfrac{1}{2}n\, E_g\big[(\Omega^* - HGH')^{-2} \mid G; \Omega^*\big] \leq 0$$
with probability one, where $g(H \mid G; \Omega^*) = |\Omega^* - HGH'|^{-\frac{1}{2}n}/C(\Omega^*, G)$ can be viewed as the conditional density function of $H$ given $G$. Please note that the matrix $HGH'$ is positive definite with probability one; thus the conditional expectation $E_g[(\Omega^* - HGH')^{-2} \mid G; \Omega^*]$ is positive definite with probability one. Then we have that ${}_1F_0(\tfrac{1}{2}n; \Omega^{*-1}, G)$ is $MRR_2$ in $(\Omega^*, G)$. Also, it is easy to note that $G$ is a monotone increasing function of $F$; thus ${}_1F_0(\tfrac{1}{2}n; \Omega^{*-1}, (I+F^{-1})^{-1})$ is $MRR_2$ in $(\Omega^*, F)$.
Next, consider
$${}_1F_1\big(\tfrac{1}{2}n, \tfrac{1}{2}n_2; \Omega^*, (I+F^{-1})^{-1}\big) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n_2)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(T)>0} e^{\mathrm{tr}\,T}\, |T|^{-\frac{1}{2}n_2}\, {}_1F_0\big(\tfrac{1}{2}n; T^{-1}\Omega^*, (I+F^{-1})^{-1}\big)\,(dT).$$
Let $T^{-1}\Omega^* = U^{-1}$; then
$${}_1F_1\big(\tfrac{1}{2}n, \tfrac{1}{2}n_2; \Omega^*, (I+F^{-1})^{-1}\big) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n_2)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(U)>0} e^{\mathrm{tr}(\Omega^* U)}\, |\Omega^* U|^{-\frac{1}{2}n_2}\, {}_1F_0\big(\tfrac{1}{2}n; U^{-1}, G\big)\, |\Omega^*|^p\,(dU) = \frac{2^{\frac{1}{2}p(p-1)}\,\Gamma_p(\tfrac{1}{2}n_2)}{(2\pi i)^{\frac{1}{2}p(p+1)}} \int_{R(U)>0} g^*(u_1, \ldots, u_p; \Omega^*)\, {}_1F_0\big(\tfrac{1}{2}n; U^{-1}, G\big)\,(dU),$$
where
$$g^*(u_1, \ldots, u_p; \Omega^*) = e^{\sum_{i=1}^p \omega_i^* u_i}\, \prod_{i=1}^p (\omega_i^*)^{-\frac{1}{2}n_2 + p}\, \prod_{i=1}^p u_i^{-\frac{1}{2}n_2}.$$
After some straightforward calculations, we have
$$\frac{\partial^2 \log g^*(u_1, \ldots, u_p; \Omega^*)}{\partial \omega_i^*\, \partial u_i} = 1 > 0, \quad i = 1, \ldots, p.$$
Thus, $g^*(u_1, \ldots, u_p; \Omega^*)$ is pairwise $TP_2$ in $(u_i, \omega_i^*)$; namely, $g^*(u_1, \ldots, u_p; \Omega^*)$ is $MTP_2$. The theorem then follows by the composition lemma (Proposition 1) of Karlin [10]. □
Similar to the arguments in Example 5, we have that $\phi^*(f_1, \ldots, f_p)$ is $MRR_2$. Thus, by Theorem 6, we may conclude that the density function $\phi^*_{\omega^*}(f_1, \ldots, f_p; \Omega^*)$, which is the product of two $MRR_2$ functions (Karlin and Rinott [13]), is $MRR_2$.
(ii) For $p \geq n_1$, the joint density function of the eigenvalues $f_1, \ldots, f_p$ is of the form
$$\phi^*_{2\omega^*}(f_1, \ldots, f_p; \Omega^*) = e^{-\frac{1}{2}\mathrm{tr}\,\Omega^*}\, {}_1F_1\big(\tfrac{1}{2}(n_1+n_2); \tfrac{1}{2}p; \tfrac{1}{2}\Omega^*, (I+F^{-1})^{-1}\big)\, \phi^*_2(f_1, \ldots, f_p),$$
where
$$\phi^*_2(f_1, \ldots, f_p) = \frac{\pi^{\frac{1}{2}n_1^2}\, \Gamma_{n_1}(\tfrac{1}{2}(n_1+n_2))}{\Gamma_{n_1}(\tfrac{1}{2}n_1)\, \Gamma_{n_1}(\tfrac{1}{2}(n_1+n_2-p))\, \Gamma_{n_1}(\tfrac{1}{2}p)}\, |F|^{\frac{1}{2}(p-n_1-1)}\, |I+F|^{-\frac{1}{2}(n_1+n_2)} \prod_{i<j}(f_i - f_j).$$
Similar to the arguments in Theorem 6, we may conclude that the density function $\phi^*_{2\omega^*}(f_1, \ldots, f_p; \Omega^*)$ is $MRR_2$.

4.5. High-Dimensional Wishart Matrices

Suppose that the columns of a $p \times n$ matrix $X$ are independently normally distributed with covariance matrix $\Sigma$ and $E(X) = 0$. Let $L = \mathrm{diag}(l_1, \ldots, l_p)$, where $l_i$ is the $i$th largest eigenvalue of $XX'$. Let $\Sigma = \sigma^2 I$ and $c = \lim_{n \to \infty} p/n$, $c \in (0,1)$. When the dimension $p$ is large, the limiting distribution of the sample spectral eigenvalues is the well-known Marčenko-Pastur distribution $F_{c,\sigma^2}$ (M-P law) with index $c$ and scale parameter $\sigma$, whose density function is of the form
$$f_{c,\sigma^2}(x) = \frac{1}{2\pi x c \sigma^2}\sqrt{(b\sigma^2 - x)(x - a\sigma^2)}, \qquad a\sigma^2 \leq x \leq b\sigma^2,$$
where $a = (1 - \sqrt{c})^2$ and $b = (1 + \sqrt{c})^2$ (Marčenko and Pastur [21]). After some algebraic manipulations, we can show that
$$\frac{\partial^2 \log f_{c,\sigma^2}(x)}{\partial x\, \partial \sigma^2} = \frac{(a+b)x^2 - 4ab\,x\sigma^2 + (a+b)ab\,\sigma^4}{2\,(b\sigma^2 - x)^2 (x - a\sigma^2)^2} \geq 0.$$
Thus, the Marčenko-Pastur density function is $TP_2$ in $(x, \sigma^2)$, and hence Theorem 1 is applicable.
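A small numerical confirmation of this $TP_2$ claim (a sketch of ours; the grid is chosen so that every $x$ stays inside the support $[a\sigma^2, b\sigma^2]$ for every $\sigma^2$ on the grid):

```python
import numpy as np

def mp_logpdf(x, sigma2, c):
    """log of the Marchenko-Pastur density with index c and scale parameter sigma^2."""
    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
    inside = (b * sigma2 - x) * (x - a * sigma2)
    return np.log(np.sqrt(inside) / (2 * np.pi * x * c * sigma2))

c = 0.25                                   # a = 0.25, b = 2.25
sigma2_grid = np.linspace(0.9, 1.1, 30)
x_grid = np.linspace(0.36, 2.0, 60)        # inside the support for every sigma2 on the grid

L = np.array([[mp_logpdf(x, s2, c) for x in x_grid] for s2 in sigma2_grid])
# TP_2 in (x, sigma^2) on the grid <=> log-supermodularity of adjacent 2 x 2 cells
minors = (L[:-1, :-1] + L[1:, 1:]) - (L[:-1, 1:] + L[1:, :-1])
print(np.all(minors >= -1e-10))  # True
```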

5. Remarks

We adopted the notion of convex ordering of two distributions to prove the ordering of Shannon entropies in Theorem 2. Please note that the notions of convex ordering of two distributions, the monotone likelihood ratio and multivariate total positivity of order 2 ($MTP_2$) are essentially equivalent. For many density functions, such as those discussed in Section 3 and Section 4, it seems easier to characterize the $MTP_2$ property of the density functions than the convex ordering of the distribution functions. In practice, we suggest using the notion of $MTP_2$ for comparisons of Shannon entropy measures.
The difference of two Shannon entropies is an intrinsic distribution measure, as we have emphasized in this paper. As a result, the Shannon entropy measures can be ordered whenever two underlying distributions are related by the convex ordering (i.e., the $MTP_2$) property. The monotonicity of the power function, which is usually discussed for the same distribution with differently ordered parameters, turns out to be a special case of the comparison of two Shannon entropies, which can even be carried out for two totally different distribution functions.
For the problem of testing $H_0: F(\mathbf{x}) =_c G(\mathbf{x})$ against $H_1: F(\mathbf{x}) \leq_c G(\mathbf{x})$, a test statistic based on the sample version of the difference of two Shannon entropies enjoys the optimality property. The above testing problem is equivalent to testing $H_0: H_F(u) = u$, $u \in [0,1]$, against $H_1: H_F(u)$ is convex on $[0,1]$. Let $U_{1:n} < U_{2:n} < \cdots < U_{n:n}$ be the order statistics from $H_F$; then the most powerful test is based on the statistic $\sum_{i=1}^n \log h_F(U_{i:n})$, which is the empirical version of the difference of two Shannon entropies. Since $h_F(u)$ is monotone increasing in $u$, the test has the power monotonicity property.
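A sketch (ours) of how such a statistic could be computed is given below; the pair $H_F(u) = u^2$, $h_F(u) = 2u$ is an arbitrary toy choice used only to illustrate the behaviour of $\sum_i \log h_F(U_{i:n})$ under the two hypotheses.

```python
import numpy as np

def entropy_difference_statistic(u_sample, h_F):
    """Empirical version of the entropy difference: sum_i log h_F(U_{i:n}),
    where u_sample are draws on [0, 1] and h_F is the density of H_F."""
    u = np.sort(np.asarray(u_sample))
    return np.sum(np.log(h_F(u)))

rng = np.random.default_rng(1)
h_F = lambda u: 2.0 * u                     # density of H_F(u) = u^2 (a convex H_F)
u_alt = np.sqrt(rng.uniform(size=200))      # draws from H_F(u) = u^2 (alternative)
u_null = rng.uniform(size=200)              # draws from H_G(u) = u (null)

print(entropy_difference_statistic(u_alt, h_F))   # tends to be large under H_1
print(entropy_difference_statistic(u_null, h_F))  # tends to be much smaller under H_0
```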
Noncentral chi-square distributions play an important role in statistical inference. In the univariate case, a noncentral chi-square or gamma density is typically a Poisson mixture of central chi-square or gamma type densities. Beyond the monotonicity property, we may note the following: let $f_\alpha(x, \beta)$ denote the univariate gamma density function with shape parameter $\alpha$ and scale parameter $\beta$, and let $F_\alpha(x, \beta)$ denote its distribution function. For gamma densities with a fixed scale parameter, van Zwet [6] established the convex ordering of gamma distributions, i.e., if $0 < \alpha_1 \leq \alpha_2$ then $F_{\alpha_2}(x, \beta) \leq_c F_{\alpha_1}(x, \beta)$. Namely, $f_\alpha(x, \beta)$ is $TP_2$ in $(x, \alpha)$ when $\beta$ is fixed. Thus, by Theorem 1, if $0 < \alpha_1 \leq \alpha_2$ then $I_S(f_{\alpha_1}) \leq I_S(f_{\alpha_2})$ when $\beta$ is fixed. Similarly, when $\alpha$ is fixed, $f_\alpha(x, \beta)$ is $TP_2$ in $(x, \beta)$. The above results tell us that the larger the degrees of freedom, the larger the Shannon entropy measure when both noncentralities are kept the same. For any two test statistics (or estimators) that are noncentral chi-square distributed with the same noncentrality but different degrees of freedom, the Pitman efficiency (based on the Kullback-Leibler divergence) cannot distinguish between them. However, our new measure of the difference of two Shannon entropies is nonzero, so the new measure does distinguish between them. For the MANOVA models, where the density functions of the eigenvalues are of Bessel multivariate gamma type or of confluent hypergeometric multivariate beta type, such as those discussed in Section 4.3 and Section 4.4, respectively, our results can be directly applied to tests with monotone acceptance regions, so that they enjoy power monotonicity for those density functions with the $MTP_2$ ($MLR$) property.
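The gamma-entropy monotonicity just described is easy to confirm numerically (a sketch of ours, using the closed-form gamma entropy available in SciPy):

```python
import numpy as np
from scipy.stats import gamma

# Shannon entropy of the gamma density with shape alpha and fixed scale beta = 1:
# it is nondecreasing in alpha, in line with the convex-ordering argument above.
alphas = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
entropies = np.array([gamma(a, scale=1.0).entropy() for a in alphas])
print(entropies)                        # an increasing sequence
print(np.all(np.diff(entropies) > 0))   # True
```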
The conceived mixture model also arises in the finite case. Consider a multi-sample model where, for some $G$ ($G > 1$), there are $G$ densities $f_g(\mathbf{x})$, $g = 1, \ldots, G$, and a sample of size $n_g$ is drawn from the density $f_g(\mathbf{x})$, so that the pooled sample relates to the mixture density with weights $w_g = n_g / \sum_{g=1}^G n_g$. For the finite mixture case, it is hard to find an explicit form of the Shannon entropy of the mixture distribution, even under the multinormal set-up. Suppose that there are $G$ groups and the random vector $X$ has the mixture density function $f_G^*(\mathbf{x}) = \sum_{g=1}^G w_g f_g(\mathbf{x})$, where $\sum_{g=1}^G w_g = 1$. Then the distribution function is $F_G^*(\mathbf{x}) = P\{X \leq \mathbf{x}\} = \int_{-\infty}^{\mathbf{x}} \sum_{g=1}^G w_g f_g(\mathbf{y})\, d\mathbf{y} = \sum_{g=1}^G w_g \int_{-\infty}^{\mathbf{x}} f_g(\mathbf{y})\, d\mathbf{y} = \sum_{g=1}^G w_g F_g(\mathbf{x})$. The $w_g$ are nonnegative and add up to 1, so the mixture density is a convex combination of the component densities. We find that the decomposability of Shannon entropy still holds, in a different way, for continuous random vectors.
Theorem 7.
$I_S(f_G^*) = \sum_{g=1}^G w_g I_S(f_g) + \sum_{g=1}^G w_g K(f_g, f_G^*)$.
Proof. 
Please note that
$$\begin{aligned} I_S(f_G^*) &= -\int \log f_G^*(\mathbf{x})\, dF_G^*(\mathbf{x}) = -\int \log\Big(\sum_{g=1}^G w_g f_g(\mathbf{x})\Big)\, d\Big[\sum_{i=1}^G w_i F_i(\mathbf{x})\Big] \\ &= -\sum_{g=1}^G w_g \int f_g(\mathbf{x}) \log\Big(\sum_{i=1}^G w_i f_i(\mathbf{x})\Big)\, d\mathbf{x} \\ &= -\sum_{g=1}^G w_g \int f_g(\mathbf{x}) \log f_g(\mathbf{x})\, d\mathbf{x} + \sum_{g=1}^G w_g \int f_g(\mathbf{x}) \log\Big(\frac{f_g(\mathbf{x})}{\sum_{i=1}^G w_i f_i(\mathbf{x})}\Big)\, d\mathbf{x} \\ &= \sum_{g=1}^G w_g I_S(f_g) + \sum_{g=1}^G w_g K(f_g, f_G^*). \end{aligned}$$
 □
It follows from Theorem 7 that $I_S(f_G^*) \geq \sum_{g=1}^G w_g I_S(f_g)$. We further note that, for a mixture model, the Shannon entropy has a decomposability property similar to the well-known Fisher MANOVA decomposition. Please note that the first term on the right-hand side represents an average of the individual entropies (i.e., analogous to the within-group sum of squares), while the second term is nonnegative and represents the between-group distances.
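Theorem 7 is straightforward to verify numerically; the sketch below (ours, with an arbitrary two-component normal mixture) checks the decomposition and the resulting inequality.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Two-component normal mixture: check I_S(f*) = sum_g w_g I_S(f_g) + sum_g w_g K(f_g, f*)
w = [0.3, 0.7]
comps = [norm(loc=-1.0, scale=1.0), norm(loc=2.0, scale=0.5)]
f_mix = lambda x: sum(wg * c.pdf(x) for wg, c in zip(w, comps))

I_mix, _ = integrate.quad(lambda x: -f_mix(x) * np.log(f_mix(x)), -12, 12, points=[-1.0, 2.0])
I_within = sum(wg * c.entropy() for wg, c in zip(w, comps))

def kl_to_mixture(c):
    """K(f_g, f*) computed over the effective support of the component."""
    lo, hi = c.ppf(1e-12), c.ppf(1 - 1e-12)
    val, _ = integrate.quad(lambda x: c.pdf(x) * np.log(c.pdf(x) / f_mix(x)), lo, hi)
    return val

K_between = sum(wg * kl_to_mixture(c) for wg, c in zip(w, comps))
print(np.isclose(I_mix, I_within + K_between))  # True
print(I_mix >= I_within)                        # True
```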
For a high-dimensional Wishart matrix with an unknown scale times the identity matrix as the population covariance matrix, the $TP_2$ (i.e., $MLR$) property of the limiting empirical density function of the eigenvalues was studied in Section 4.5. Although the Marčenko-Pastur equation (Silverstein [22]) links the limiting empirical spectral distribution $F$ to the limiting behavior of the population spectral distribution $H$, so that one may expect to retrieve information about $H$ from $F$, the difficulty lies in the fact that the relationship between $H$ and $F$ is entangled. Whether the result of Section 4.5 can be extended to the general population covariance matrix case remains to be clarified. Another focus of the random matrix literature concerns the Gaussian divisible ensembles. These are matrices of the form $H_t = e^{-t/2}H_0 + (1 - e^{-t})^{1/2}H_G$, where $t > 0$ is a parameter, $H_0$ is a Wigner matrix and $H_G$ is an independent Gaussian orthogonal ensemble matrix. The eigenvalue distribution of the Gaussian divisible ensemble is the same as that of the solution of a matrix-valued Ornstein-Uhlenbeck process $H_t$ for any time $t \geq 0$. The dynamics of the eigenvalues of $H_t$ is given by the system of so-called Dyson Brownian motions (Dyson [23]). The treatment of the sample covariance matrix is analogous, but the formulas change slightly (see Erdős et al. [24] for details). It would be interesting to study further whether, under the general model, the corresponding limiting spectral density function of the eigenvalues, obtained via the convolution of the limiting density function of the eigenvalues of the initial matrix with the Marčenko-Pastur density function, is $TP_2$ or not. We pose these problems as a project for future study.

Author Contributions

Conceptualization, M.-T.T., F.-J.H. and C.-H.T.; methodology, M.-T.T., F.-J.H. and C.-H.T.; validation, M.-T.T., F.-J.H. and C.-H.T.; formal analysis, M.-T.T., F.-J.H. and C.-H.T.; writing–original draft preparation, M.-T.T., F.-J.H. and C.-H.T.; writing–review and editing, M.-T.T.

Funding

The APC was funded by Academia Sinica, Taiwan, R.O.C.

Acknowledgments

The authors would like to thank the reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Karlin, S.; Rinott, Y. Entropy inequalities for classes of probability distributions I. The univariate case. Adv. Appl. Probab. 1981, 13, 93–112. [Google Scholar] [CrossRef]
  2. Karlin, S.; Rinott, Y. Entropy inequalities for classes of probability distributions II. The multivariate case. Adv. Appl. Probab. 1981, 13, 325–351. [Google Scholar] [CrossRef]
  3. Zografos, K.; Nadarajah, S. Expressions for Renyi and Shannon entropies for multivariate distributions. Stat. Probab. Lett. 2005, 71, 71–84. [Google Scholar] [CrossRef]
  4. Tsai, M.-T. The ordering of Shannon entropies. Stat. Sin. 2017, 27, 1725–1729. [Google Scholar] [CrossRef]
  5. Perlman, M.; Olkin, I. Unbiasedness of invariant tests for MANOVA and other multivariate problems. Ann. Stat. 1980, 8, 1326–1341. [Google Scholar] [CrossRef]
  6. Van Zwet, W.R. Convex Transformations of Random Variables; Mathematisch Centrum: Amsterdam, The Netherlands, 1964. [Google Scholar]
  7. Barlow, R.E.; Doksum, K.A. Isotonic tests for convex orderings. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume I; University of California Press: Berkeley, CA, USA, 1972; pp. 293–323. [Google Scholar]
  8. Gastwirth, J.L. A general definition of the Lorenz curve. Econometrica 1971, 39, 1037–1039. [Google Scholar] [CrossRef]
  9. Shaked, M.; Shanthikumar, J. Stochastic Orders; Springer: New York, NY, USA, 2007. [Google Scholar]
  10. Karlin, S. Total Positivity; Stanford University Press: Stanford, CA, USA, 1968. [Google Scholar]
  11. Tong, Y.L. The Multivariate Normal Distribution; Springer: New York, NY, USA, 1990. [Google Scholar]
  12. Karlin, S.; Rinott, Y. Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Multivar. Anal. 1980, 10, 467–498. [Google Scholar] [CrossRef]
  13. Karlin, S.; Rinott, Y. Classes of orderings of measures and related correlation inequalities. II. Multivariate reverse rule distributions. J. Multivar. Anal. 1980, 10, 499–516. [Google Scholar] [CrossRef]
  14. Gupta, R.D.; Richards, D.S.P. Multivariate Liouville distributions. J. Multivar. Anal. 1987, 23, 233–256. [Google Scholar] [CrossRef]
  15. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B 1972, 34, 187–220. [Google Scholar] [CrossRef]
  16. Dumitriu, I. Eigenvalue Statistics for Beta-Ensembles. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2003. [Google Scholar]
  17. Peddada, S.D.; Richards, D.S.P. Entropy inequality for some multivariate distributions. J. Multivar. Anal. 1991, 39, 202–208. [Google Scholar] [CrossRef]
  18. Dykstra, R.L.; Hewett, J.E. Positive dependence of the roots of a Wishart matrix. Ann. Stat. 1978, 6, 235–238. [Google Scholar] [CrossRef]
  19. James, A.T. The distribution of the latent roots of the covariance matrix. Ann. Math. Stat. 1960, 31, 874–882. [Google Scholar] [CrossRef]
  20. Muirhead, R.J. Aspects of Multivariate Statistical Theory; John Wiley & Sons: New York, NY, USA, 1982. [Google Scholar]
  21. Marčenko, V.A.; Pastur, L.A. Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik 1967, 1, 457–483. [Google Scholar] [CrossRef]
  22. Silverstein, J.W. Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivar. Anal. 1995, 55, 331–339. [Google Scholar] [CrossRef]
  23. Dyson, F.J. A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 1962, 3, 1191–1198. [Google Scholar] [CrossRef]
  24. Erdős, L.; Schlein, B.; Yau, H.-T.; Yin, J. The local relaxation flow approach to universality of the local statistics for random matrices. Ann. Inst. Henri Poincaré Probab. Stat. 2012, 48, 1–46. [Google Scholar] [CrossRef]
