On joint and conditional entropies

It is shown that if the conditional densities of a bivariate random variable have maximum entropies, subject to certain constraints, then the bivariate density also maximizes entropy, subject to appropriate constraints. Some examples are discussed.


Introduction and Notation
Let f(x,y) denote a continuous bivariate probability density defined on the support S_X × S_Y. The entropy of f(x,y) is defined as H(f) = E_f(-ln f(X,Y)). We shall use a similar notation for the entropy of univariate densities. Let the conditional densities of f(x,y) be denoted by f_1(x|y) and f_2(y|x). Many families of probability distributions are known to maximize the entropy among distributions that satisfy given constraints on the expectations of certain statistics; the Beta, Gamma and normal families of distributions are well-known examples of this principle. Suppose now that f_1(x|y) and f_2(y|x) are known to belong to such families. The purpose of the present note is to show that f(x,y) then also maximizes the entropy subject to constraints on the expectations of suitably defined statistics. This statement is made more specific and more precise in the following section and presented as a theorem.
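The maximum entropy principle invoked above can be illustrated numerically. The following sketch (the parameter choices are ours, for illustration only) compares the differential entropy of a standard normal density with that of a Laplace density rescaled to have the same mean and variance; the normal, which maximizes entropy under a mean and variance constraint, has the larger entropy.

```python
import numpy as np

# Grid for numerical integration of H(f) = -integral of f ln f.
x = np.linspace(-20, 20, 200001)

def entropy(f):
    # Differential entropy by the trapezoidal rule; 0*ln(0) is treated as 0.
    integrand = np.where(f > 0, -f * np.log(f), 0.0)
    return np.trapz(integrand, x)

# Standard normal density: mean 0, variance 1.
normal = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Laplace density with scale b chosen so that its variance 2*b^2 equals 1.
b = 1 / np.sqrt(2)
laplace = np.exp(-np.abs(x) / b) / (2 * b)

H_normal, H_laplace = entropy(normal), entropy(laplace)
print(H_normal, H_laplace)   # H_normal should be close to 0.5*ln(2*pi*e)
assert H_normal > H_laplace
```

Any other density with zero mean and unit variance (here the rescaled Laplace) has strictly smaller entropy than the standard normal.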

Main Results
Let us make the following assumptions on f_1(x|y) and f_2(y|x).

Assumption 2.1. The density f_1(x|y) maximizes the entropy H(f_1(·|y)), for each y ∈ S_Y, in the class Ψ_1 of continuous densities which satisfy

E_1(q_i(X)) = μ_i(y), i = 1, 2, …, k, (1)

for some constants μ_i, which may depend on y. Here E_1 denotes the expectation with respect to the densities in Ψ_1. The statistics {q_i(x)} are functionally independent.
Assumption 2.2. The density f_2(y|x) maximizes the entropy H(f_2(·|x)), for each x ∈ S_X, in the class Ψ_2 of continuous densities which satisfy

E_2(t_j(Y)) = ν_j(x), j = 1, 2, …, l, (2)

for some constants ν_j, which may depend on x. Here E_2 denotes the expectation with respect to the densities in Ψ_2. The statistics {t_j(y)} are functionally independent.
Theorem 1. If Assumptions 2.1 and 2.2 hold, then the joint density f(x,y) maximizes its entropy H(f) in the class Ψ of densities subject to the constraints

E_g(q_i(X) t_j(Y)) = σ_ij, i = 0, 1, …, k, j = 0, 1, …, l, (i,j) ≠ (0,0),

where q_0(x) = t_0(y) ≡ 1, the σ_ij are appropriate constants, and E_g denotes the expectation with respect to densities in Ψ.

Examples and some remarks
As a first example, suppose that the conditional density of Y given X = x is normal with mean a(x) and standard deviation b(x), and the conditional density of X given Y = y is normal with mean d(y) and standard deviation c(y). Arnold et al. [1, pages 25-27] show that the joint density of (X,Y) must be of the form

f(x,y) = (2π)^{-1} exp{-[A + 2Bx + 2Gy + Cx^2 + Dy^2 + 2Hxy + 2Jx^2y + 2Exy^2 + Fx^2y^2]/2}, (8)

in which the parameters must satisfy one of the following two sets of conditions:
(i) F = E = J = 0, D > 0, C > 0, H^2 < DC, or (ii) F > 0, FD > E^2, CF > J^2.

Models satisfying conditions (i) are classical bivariate normal densities, with normal marginals, normal conditionals, linear regressions and constant conditional variances. Models satisfying conditions (ii) have normal conditionals (as specified) but distinctly non-normal marginals. Integrating (8) over y, and respectively over x, shows that the marginal densities are proportional to

(D + 2Ex + Fx^2)^{-1/2} exp{-[A + 2Bx + Cx^2 - (G + Hx + Jx^2)^2/(D + 2Ex + Fx^2)]/2} for X, and

(C + 2Jy + Fy^2)^{-1/2} exp{-[A + 2Gy + Dy^2 - (B + Hy + Ey^2)^2/(C + 2Jy + Fy^2)]/2} for Y.

From the point of view of this paper, these results provide an interesting insight into the structure of joint maximum entropy distributions when conditional maximum entropy distributions are specified. One set of conditions gives the usual bivariate family of normal distributions, which is well known to maximize entropy subject to the moment constraints specified by a given mean vector and a positive definite covariance matrix; in that family the joint moment constraint is through the covariance of X and Y, while the other constraints are on the marginal moments. The other set of conditions yields a family of bivariate distributions which still has maximum entropy, but whose marginal moment constraints are remarkably different and complicated.
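A concrete instance of case (ii) is obtained by taking A = B = G = H = J = E = 0 and C = D = F = 1 in (8), a choice of ours that satisfies F > 0, FD > E^2 and CF > J^2. Completing the square in y then predicts that Y | X = x is normal with mean 0 and variance 1/(1 + x^2); the sketch below verifies this numerically by normalizing a slice of the joint density.

```python
import numpy as np

# Joint density (8), up to a constant, with A=B=G=H=J=E=0 and C=D=F=1:
# f(x, y) proportional to exp{-[x^2 + y^2 + x^2 y^2]/2}.
def kernel(x, y):
    return np.exp(-(x**2 + y**2 + x**2 * y**2) / 2)

x0 = 1.0                        # condition on X = x0
y = np.linspace(-8, 8, 40001)

# Conditional density of Y given X = x0: normalize the slice f(x0, .).
slice_ = kernel(x0, y)
cond = slice_ / np.trapz(slice_, y)

# Predicted conditional: normal with mean 0 and variance 1/(1 + x0^2).
var = 1 / (1 + x0**2)
pred = np.exp(-y**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

err = np.max(np.abs(cond - pred))
print(err)                      # should be near 0
assert err < 1e-6
```

Although every such slice is exactly normal, integrating the same kernel over y gives a marginal proportional to (1 + x^2)^{-1/2} exp(-x^2/2), which is visibly non-normal, as conditions (ii) imply.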
As a second example, consider the situation where the conditional densities of X|y and Y|x are exponential, with mean functions given by [μ(y)]^{-1} and [ν(x)]^{-1}. Then the joint density must be of the form f(x,y) ∝ exp[-(Bx + Cy + Dxy)], x > 0, y > 0, with the parameters B, C and D being positive. In this case, even though the joint density is a maximum entropy density subject to moment constraints on X, Y and XY, the marginal density of X is proportional to (Dx + C)^{-1} exp(-Bx) (see [1], page 19). This is not the exponential density that maximizes entropy with the positive real line as support and the usual constraint on the mean E(X). Further, the correlation coefficient between X and Y is always non-positive.
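The non-positive correlation in the exponential-conditionals model can be checked numerically. The sketch below, with the illustrative choice B = C = 1 and D = 1/2, computes the covariance of X and Y from the joint density exp[-(Bx + Cy + Dxy)] by two-dimensional numerical integration.

```python
import numpy as np

B, C, D = 1.0, 1.0, 0.5        # illustrative positive parameters

# Grid on (0, 40]^2; the integrand is negligible beyond the cutoff.
t = np.linspace(1e-6, 40, 2001)
X, Y = np.meshgrid(t, t, indexing="ij")
W = np.exp(-(B * X + C * Y + D * X * Y))   # unnormalized joint density

def integrate2d(F):
    # Iterated trapezoidal rule over the grid.
    return np.trapz(np.trapz(F, t, axis=1), t)

Z = integrate2d(W)                          # normalizing constant
EX = integrate2d(X * W) / Z
EY = integrate2d(Y * W) / Z
EXY = integrate2d(X * Y * W) / Z

cov = EXY - EX * EY
print(cov)                                  # negative whenever D > 0
assert cov < 0
```

Setting D = 0 in the same computation makes the density factor, so the covariance vanishes; any D > 0 drives it negative, in line with the non-positive correlation noted above.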
In summary, one can say that there are two approaches to obtaining bivariate maximum entropy densities. The usual approach is to start with marginal maximum entropy densities and obtain a joint maximum entropy density by imposing constraints on bivariate moments. The other is to start with conditional maximum entropy densities and construct a joint density from them. Theorem 1 proves that the second approach also leads to a maximum entropy density. The examples above bring out the differences between the two approaches.
As a final remark, it may be pointed out that the proof of Theorem 1 above, since it follows directly from the results of Arnold et al. [1], is not at all intricate. On the other hand, an alternative proof, without the use of their results, appears quite difficult.
Acknowledgement: Thanks are due to the referee for helpful suggestions.