
On Joint and Conditional Entropies

D. V. Gokhale
Department of Statistics, University of California, Riverside, CA 92521, USA
Entropy 1999, 1(2), 21-24; https://doi.org/10.3390/e1020021
Submission received: 19 March 1999 / Accepted: 30 April 1999 / Published: 5 May 1999

Abstract

It is shown that if the conditional densities of a bivariate random variable maximize entropy, subject to certain constraints, then the bivariate density also maximizes entropy, subject to appropriate constraints. Some examples are discussed.

Introduction and Notation

Let f(x,y) denote a continuous bivariate probability density defined on the support SX × SY. The entropy of f(x,y) is defined as H(f) = Ef(-ln f(X,Y)). We shall use a similar notation for the entropy of univariate densities. Let the conditional densities of f(x,y) be denoted by f1(x|y) and f2(y|x). Many families of probability distributions are known to maximize the entropy among distributions that satisfy given constraints on the expectations of certain statistics. The Beta, Gamma and normal families of distributions are well known examples of this principle. Suppose now that f1(x|y) and f2(y|x) are known to belong to such families. The purpose of the present note is to show that f(x,y) also maximizes the entropy subject to constraints on the expectations of suitably defined statistics. This statement is made more specific and more precise in the following section and presented as a theorem.
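For concreteness, the univariate version of the principle invoked here may be stated as follows (a standard result, recalled for the reader; the normal and Gamma illustrations are ours). Among continuous densities p on a support S satisfying
$$E_p[q_i(X)] = \mu_i, \qquad i = 1, 2, \ldots, k,$$
the entropy H(p) = -∫S p(x) ln p(x) dx is maximized, whenever the constraints can be met by such a density, by one of the exponential-family form
$$p(x) = \beta(\underline{\theta})\exp\Big\{\sum_{i=1}^{k}\theta_i\,q_i(x)\Big\}, \qquad x \in S.$$
Taking q1(x) = x and q2(x) = x² on S = (-∞, ∞) yields the normal family, while q1(x) = x and q2(x) = ln x on S = (0, ∞) yields the Gamma family.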

Main Results

Let us make the following assumptions on f1(x|y) and f2(y|x).
Assumption 2.1. Density f1(x|y) maximizes the entropy H(f1(·|y)), for each y ∈ SY, in the class Ψ1 of continuous densities which satisfy
$$E_1[q_i(X)] = \mu_i(y), \qquad i = 1, 2, \ldots, k,\tag{1}$$
for some constants μi(y), which may depend on y. Here E1 denotes the expectation with respect to the densities in Ψ1. The statistics {qi(x)} are functionally independent.
Assumption 2.2. Density f2(y|x) maximizes the entropy H(f2(·|x)), for each x ∈ SX, in the class Ψ2 of continuous densities which satisfy
$$E_2[t_j(Y)] = \nu_j(x), \qquad j = 1, 2, \ldots, l,\tag{2}$$
for some constants νj(x), which may depend on x. Here E2 denotes the expectation with respect to the densities in Ψ2. The statistics {tj(y)} are functionally independent.
Theorem 1: If Assumptions 2.1 and 2.2 hold, then the joint density f(x,y) maximizes the entropy H(f) in the class Ψ of densities subject to the constraints
$$E_g[q_i(X)\,t_j(Y)] = \sigma_{ij}, \qquad i = 0, 1, 2, \ldots, k,\quad j = 0, 1, 2, \ldots, l,\tag{3}$$
where q0(x) = t0(y) ≡ 1, σij are appropriate constants, and Eg denotes the expectation with respect to densities in Ψ.
Proof: It is well known (see [2,3], for example) that under Assumption 2.1 the density f1(x|y) is of the form
$$f_1(x\,|\,y) = \beta_1(\underline{\theta}(y))\exp\Big\{\sum_{i=1}^{k}\theta_i(y)\,q_i(x)\Big\}, \qquad x \in S_X,\ y \in S_Y,\tag{4}$$
and equals zero otherwise. The correspondence between the μi(y) and (β1(θ(y)), θi(y)) is assumed to be one-to-one. Similarly, under Assumption 2.2 the density f2(y|x) is of the form
$$f_2(y\,|\,x) = \beta_2(\underline{\lambda}(x))\exp\Big\{\sum_{j=1}^{l}\lambda_j(x)\,t_j(y)\Big\}, \qquad y \in S_Y,\ x \in S_X,\tag{5}$$
and equals zero otherwise. The correspondence between the νj(x) and (β2(λ(x)), λj(x)) is assumed to be one-to-one. Now, if f1(x|y) and f2(y|x) are given to be of the forms (4) and (5), respectively, it has been shown by Arnold et al. [1, page 36] that the bivariate density f(x,y) must be of the form
$$f(x,y) = \exp\{q(x)'\,M\,t(y)\}, \qquad (x,y) \in S_X \times S_Y,\tag{6}$$
and equals zero otherwise, for a suitable choice of the (k+1)×(l+1) matrix of parameters M. Here q(x) = (q0(x), q1(x), q2(x), …, qk(x))′ and t(y) = (t0(y), t1(y), t2(y), …, tl(y))′. From (6) it follows that f(x,y) maximizes the entropy H(f) in the class Ψ of continuous bivariate densities in which, for g ∈ Ψ,
$$E_g[q_i(X)\,t_j(Y)] = \sigma_{ij}, \qquad i = 0, 1, 2, \ldots, k,\quad j = 0, 1, 2, \ldots, l,\tag{7}$$
and the parameters in M are one-to-one functions of the σij. Note that the maximum entropy density (6) incorporates constraints only on the mixed moments of the form Eg[qi(X)tj(Y)].
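To see why (6) is compatible with the conditional forms (4) and (5), fix y and collect terms (a brief verification added here; writing mij for the entries of M is our notation):
$$f_1(x\,|\,y) \propto \exp\Big\{\sum_{i=0}^{k}\sum_{j=0}^{l} m_{ij}\,q_i(x)\,t_j(y)\Big\} = \exp\{\theta_0(y)\}\exp\Big\{\sum_{i=1}^{k}\theta_i(y)\,q_i(x)\Big\}, \qquad \theta_i(y) = \sum_{j=0}^{l} m_{ij}\,t_j(y).$$
Since q0 ≡ 1, the factor exp{θ0(y)} is absorbed into the normalizing constant β1(θ(y)) of (4); the argument for f2(y|x) and (5) is symmetric.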

Examples and Some Remarks

As a first example, suppose that the conditional density of Y given X = x is normal with mean a(x) and standard deviation b(x), and the conditional density of X given Y = y is normal with mean d(y) and standard deviation c(y). Arnold et al. [1, pages 25-27] show that the joint density of (X,Y) must be of the form
$$f(x,y) = (2\pi)^{-1}\exp\{-[A + 2Bx + 2Gy + Cx^2 + Dy^2 + 2Hxy + 2Jx^2y + 2Exy^2 + Fx^2y^2]/2\},\tag{8}$$
in which the parameters A, B, C, D, E, F, G, H, J must satisfy one of the following two sets of conditions.
(i)
F = E = J = 0, D > 0, C > 0, H2 < DC, or
(ii)
F > 0, FD > E2, CF > J2.
Models satisfying conditions (i) are classical bivariate normal densities with normal marginals, normal conditionals, linear regressions and constant conditional variances. Models satisfying conditions (ii) have normal conditionals (as specified) but distinctly non-normal marginal densities. These marginals are
$$g(x) = [2\pi(D + 2Ex + Fx^2)]^{-1/2}\exp\{-[A + 2Bx + Cx^2 - (G + Hx + Jx^2)^2/(D + 2Ex + Fx^2)]/2\}\tag{9}$$
for X and
$$h(y) = [2\pi(C + 2Jy + Fy^2)]^{-1/2}\exp\{-[A + 2Gy + Dy^2 - (B + Hy + Ey^2)^2/(C + 2Jy + Fy^2)]/2\}\tag{10}$$
for Y. From the point of view of this paper, these results provide an interesting insight into the structure of joint maximum entropy distributions when conditional maximum entropy distributions are specified. Note that one set of conditions gives the usual bivariate family of normal distributions, which is well known to maximize entropy subject to moment constraints specified by a mean vector and a positive definite covariance matrix. In such families the joint moment constraint acts through the covariance of X and Y, while the other constraints are on the marginal moments. The other set of conditions yields a family of bivariate distributions which still has maximum entropy, but whose marginal moment constraints are remarkably different and complicated.
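The condition-(ii) case is easy to examine numerically. The short sketch below is an illustration added here, not part of the original argument: the parameter values are arbitrary choices satisfying condition (ii), and the joint density (8) is normalized numerically rather than through the constant A. It confirms that the conditional density of Y given X = x extracted from (8) is exactly normal, with precision D + 2Ex + Fx² and mean -(G + Hx + Jx²)/(D + 2Ex + Fx²), even though the marginal of X has the non-normal form (9).

import numpy as np
from scipy import stats

# Arbitrary parameters satisfying condition (ii): F > 0, F*D > E**2, C*F > J**2.
A, B, G = 0.0, 0.5, 0.5
C, D, H = 2.0, 2.0, 0.3
J, E, F = 0.2, 0.2, 1.0

def kernel(x, y):
    # Unnormalized version of the joint density (8).
    q = (A + 2*B*x + 2*G*y + C*x**2 + D*y**2
         + 2*H*x*y + 2*J*x**2*y + 2*E*x*y**2 + F*x**2*y**2)
    return np.exp(-0.5 * q)

# Conditional of Y given X = x0: normal with the precision and mean below.
x0 = 1.0
prec = D + 2*E*x0 + F*x0**2
mean = -(G + H*x0 + J*x0**2) / prec
sd = 1.0 / np.sqrt(prec)

ys = np.linspace(mean - 8*sd, mean + 8*sd, 4001)
cond = kernel(x0, ys)
cond /= np.trapz(cond, ys)          # normalize numerically in y

# Discrepancy from the normal pdf is at the level of numerical integration error.
print(np.max(np.abs(cond - stats.norm.pdf(ys, mean, sd))))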
As a second example, consider the situation in which the conditional densities of X|y and Y|x are exponential with mean functions [μ(y)]^{-1} and [ν(x)]^{-1}, respectively. Then the joint density must be of the form
$$f(x,y) \propto \exp[-(Bx + Cy + Dxy)], \qquad x > 0,\ y > 0,\tag{11}$$
with the parameters B, C and D being positive. In this case, even though the joint density is a maximum entropy density subject to moment constraints on X, Y and XY, the marginal density of X is proportional to (Dx + C)^{-1} exp(-Bx) (see [1], page 19). This is not the exponential density, which maximizes entropy on the positive real line subject to the usual constraint on the mean E(X). Further, the correlation coefficient between X and Y is always non-positive.
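Both facts can be checked numerically. The brief sketch below is an illustration added here, with B, C and D set to arbitrary positive values: it integrates the unnormalized density (11) to confirm that the covariance of X and Y is negative and that integrating y out at a fixed x gives exp(-Bx)/(C + Dx) exactly.

import numpy as np
from scipy import integrate

# Arbitrary positive parameters.
B, C, D = 1.0, 1.0, 0.5

def f(y, x):
    # Unnormalized joint density (11); the argument order (y, x) is what dblquad expects.
    return np.exp(-(B*x + C*y + D*x*y))

# Normalizing constant and moments by numerical integration over (0, inf) x (0, inf).
Z, _   = integrate.dblquad(f, 0, np.inf, 0, np.inf)
EX, _  = integrate.dblquad(lambda y, x: x * f(y, x), 0, np.inf, 0, np.inf)
EY, _  = integrate.dblquad(lambda y, x: y * f(y, x), 0, np.inf, 0, np.inf)
EXY, _ = integrate.dblquad(lambda y, x: x * y * f(y, x), 0, np.inf, 0, np.inf)
EX, EY, EXY = EX / Z, EY / Z, EXY / Z
print(EXY - EX * EY)                 # covariance of X and Y: negative, as claimed

# Marginal of X at a fixed x0, up to the overall normalizing constant.
x0 = 2.0
m, _ = integrate.quad(lambda y: f(y, x0), 0, np.inf)
print(m, np.exp(-B * x0) / (C + D * x0))   # the two numbers agree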
In summary, one can say that there are two approaches to obtaining bivariate maximum entropy densities. The usual approach is to start with marginal maximum entropy densities and obtain joint maximum entropy densities by imposing constraints on bivariate moments. The other is to start with conditional maximum entropy densities and construct a joint density. Theorem 1 proves that the second approach also leads to a maximum entropy density. The examples above bring out the differences between the two approaches.
As a final remark, it may be pointed out that the proof of Theorem 1 above, as it follows directly from the results of Arnold et al. [1], is not at all intricate. On the other hand, an alternative proof, without the use of their results, seems quite difficult.

Acknowledgments

Thanks are due to the referee for helpful suggestions.

References and Notes

  1. Arnold, B.C.; Castillo, E.; Sarabia, J. Conditionally Specified Distributions. Lecture Notes in Statistics, No. 73; Springer-Verlag: New York, NY, 1992.
  2. Gokhale, D.V. Maximum entropy characterizations of some distributions. In Statistical Distributions in Scientific Work; Patil, G.P., Kotz, S., Ord, J.K., Eds.; D. Reidel Publishing: Boston, MA, 1975; pp. 299–304.
  3. Kagan, A.M.; Linnik, Yu.V.; Rao, C.R. Characterization Problems in Mathematical Statistics; J. Wiley: New York, NY, 1973.
