Abstract
It is shown that if the conditional densities of a bivariate random variable maximize entropy subject to certain constraints, then the bivariate density also maximizes entropy, subject to appropriately defined constraints. Some examples are discussed.
Introduction and Notation
Let f(x,y) denote a continuous bivariate probability density defined on the support SX × SY. The entropy of f(x,y) is defined as H(f) = Ef(-ln f(X,Y)). We shall use a similar notation for the entropy of univariate densities. Let the conditional densities of f(x,y) be denoted by f1(x|y) and f2(y|x). Many families of probability distributions are known to maximize the entropy among distributions that satisfy given constraints on the expectations of certain statistics. The Beta, Gamma and normal families of distributions are well known examples of this principle. Suppose now that f1(x|y) and f2(y|x) are known to belong to such families. The purpose of the present note is to show that f(x,y) also maximizes the entropy subject to constraints on the expectations of suitably defined statistics. This statement is made more specific and more precise in the following section and presented as a theorem.
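To make the underlying principle concrete, the following sketch (a standard Lagrange-multiplier argument, included here only for orientation and not part of the original note) shows why a density maximizing entropy under expectation constraints has exponential family form; the normal, Gamma and Beta cases correspond to particular choices of the constrained statistics.

```latex
% Sketch: maximize H(f) = -\int f \ln f  subject to  \int f = 1  and  \int q_i f = \mu_i.
\[
\mathcal{L}(f) \;=\; -\int f(x)\ln f(x)\,dx
   \;+\; \lambda_0\!\left(\int f(x)\,dx - 1\right)
   \;+\; \sum_{i=1}^{k}\lambda_i\!\left(\int q_i(x)\,f(x)\,dx - \mu_i\right).
\]
% Setting the functional derivative with respect to f(x) to zero gives
\[
-\ln f(x) - 1 + \lambda_0 + \sum_{i=1}^{k}\lambda_i\,q_i(x) \;=\; 0,
\qquad\text{i.e.}\qquad
f(x) \;=\; \exp\Big\{\theta_0 + \sum_{i=1}^{k}\theta_i\,q_i(x)\Big\}
\]
% on the support.
% Normal: q_1(x)=x, q_2(x)=x^2;   Gamma: q_1(x)=x, q_2(x)=\ln x;
% Beta:   q_1(x)=\ln x, q_2(x)=\ln(1-x).
```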
Main Results
Let us make the following assumptions on f1(x|y) and f2(y|x).
Assumption 2.1. Density f1(x|y) maximizes the entropy H(f1(·|y)), for each y ∈ SY, in the class Ψ1 of continuous densities which satisfy
E1[qi(X)] = μi,  i = 1, 2, …, k, (1)
for some constants μi, which may depend on y. Here E1 denotes the expectation with respect to the densities in Ψ1. The statistics {qi(x)} are functionally independent.
Assumption 2.2. Density f2(y|x) maximizes the entropy H(f2(·|x)), for each x ∈ SX, in the class Ψ2 of continuous densities which satisfy
E2[tj(Y)] = νj,  j = 1, 2, …, l, (2)
for some constants νj, which may depend on x. Here E2 denotes the expectation with respect to the densities in Ψ2. The statistics {tj(y)} are functionally independent.
Theorem 1: If Assumptions 2.1 and 2.2 hold, then the joint density f(x,y) maximizes its entropy H(f) in the class Ψ of continuous bivariate densities subject to the constraints
Eg[qi(X)tj(Y)] = σij,  i = 0, 1, …, k;  j = 0, 1, …, l, (3)
where q0(x) = t0(y) ≡ 1, the σij are appropriate constants, and Eg denotes the expectation with respect to densities in Ψ.
Proof: It is well known (see [2,3], for example) that under Assumption 2.1 the density f1(x|y) is of the form
f1(x|y) = exp{β1(θ(y)) + θ1(y)q1(x) + … + θk(y)qk(x)},  x ∈ SX, (4)
and equals zero otherwise, where θ(y) = (θ1(y), …, θk(y)). The correspondence between the μi(y) and (β1(θ(y)), θ1(y), …, θk(y)) is assumed to be 1:1. Similarly, under Assumption 2.2 the density f2(y|x) is of the form
f2(y|x) = exp{β2(λ(x)) + λ1(x)t1(y) + … + λl(x)tl(y)},  y ∈ SY, (5)
and equals zero otherwise. The correspondence between the νj(x) and (β2(λ(x)), λ1(x), …, λl(x)) is assumed to be 1:1. Now, if f1(x|y) and f2(y|x) are given to be of the forms (4) and (5) respectively, it has been shown by Arnold et al. [1, page 36] that the bivariate density f(x,y) must be of the form
f(x,y) = exp{q(x)′ M t(y)},  (x,y) ∈ SX × SY, (6)
and equals zero otherwise, for a suitable choice of the (k+1)×(l+1) matrix of parameters M. Here q(x) = (q0(x), q1(x), q2(x), …, qk(x))′ and t(y) = (t0(y), t1(y), t2(y), …, tl(y))′. From (6) it follows that f(x,y) maximizes the entropy H(f) in the class Ψ of continuous bivariate densities whose members g satisfy the constraints (3): since ln f(x,y) is a linear combination of the constrained statistics qi(x)tj(y), one has Eg[−ln f(X,Y)] = Ef[−ln f(X,Y)] for every g ∈ Ψ, and hence H(g) ≤ H(f) by the information inequality. The parameters in M are 1:1 functions of the σij. Note that the maximum entropy density (6) incorporates constraints on the mixed moments of the form Eg[qi(X)tj(Y)] only.
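The following is a small numerical sketch (not part of the original argument) of Theorem 1 in the simplest case k = l = 1 with q1(x) = x and t1(y) = y, so that the constraints (3) fix E(X), E(Y) and E(XY). The continuous problem is replaced by a finite grid, purely for this illustration, and a generic constrained optimizer is used; the entropy-maximizing distribution it finds should agree with the bilinear-exponential form (6).

```python
# Numerical sketch of Theorem 1 on a finite grid (illustration only, not a proof).
# Statistics: q(x) = (1, x), t(y) = (1, y); constraints fix E[X], E[Y], E[XY].
import numpy as np
from scipy.optimize import minimize

xs = np.linspace(0.1, 5.0, 15)
ys = np.linspace(0.1, 5.0, 15)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# A density of the bilinear-exponential form (6): f proportional to exp{-(B x + C y + D x y)}.
B, C, D = 1.0, 1.2, 0.5
f = np.exp(-(B * X + C * Y + D * X * Y))
f /= f.sum()

# Mixed-moment targets computed from f itself.
stats = [X, Y, X * Y]
targets = [float((s * f).sum()) for s in stats]

def neg_entropy(p):
    return float(np.sum(p * np.log(p + 1e-300)))

constraints = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
for s, t in zip(stats, targets):
    constraints.append(
        {"type": "eq", "fun": lambda p, s=s.ravel(), t=t: float(p @ s - t)}
    )

p0 = np.full(X.size, 1.0 / X.size)          # start from the uniform grid density
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(1e-12, 1.0)] * X.size, constraints=constraints)

# The constrained entropy maximizer should coincide (numerically) with f.
print("max |p_hat - f| =", np.abs(res.x.reshape(X.shape) - f).max())
```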
Examples and Some Remarks
As a first example, suppose that the conditional density of Y given X = x is normal with mean a(x) and standard deviation b(x), and the conditional density of X given Y = y is normal with mean d(y) and standard deviation c(y). Arnold et al. [1, pages 25–27] show that the joint density of (X,Y) must then be of the form (6) with q(x) = (1, x, x²)′ and t(y) = (1, y, y²)′, that is, the exponential of a polynomial of degree at most two separately in x and in y, and that the coefficients of this polynomial must satisfy one of the following two sets of conditions.
- (i) F = E = J = 0, D > 0, C > 0, H² < DC; or
- (ii) F > 0, FD > E², CF > J².
Models satisfying conditions (i) are classical bivariate normal densities, with normal marginals, normal conditionals, linear regressions and constant conditional variances. Models satisfying conditions (ii) have normal conditionals (as specified) but distinctly non-normal marginal densities for X and for Y; explicit expressions for these marginals are given in [1]. From the point of view of this paper, these results provide an interesting insight into the structure of joint maximum entropy distributions when conditional maximum entropy distributions are specified. Note that one set of conditions gives the usual bivariate family of normal distributions, which is well known to maximize entropy subject to moment constraints fixing the mean vector and a positive definite covariance matrix. In such families the joint moment constraint acts through the covariance of X and Y, while the other constraints are on the marginal moments. The other set of conditions yields a family of bivariate distributions which still has maximum entropy, but whose marginal moment constraints are remarkably different and complicated.
As a second example, consider the situation in which the conditional densities of X|y and Y|x are exponential with mean functions given by [μ(y)]⁻¹ and [ν(x)]⁻¹. Then the joint density must be of the form
f(x,y) ∝ exp{−(Bx + Cy + Dxy)},  x > 0, y > 0,
with the parameters B, C, and D being positive. In this case, even though the joint density is a maximum entropy density subject to moment constraints on X, Y and XY, the marginal density of X is proportional to (Dx + C)⁻¹ exp(−Bx) (see [1], page 19). This is not the exponential density, which maximizes entropy on the positive real line under the usual constraint on the mean E(X). Further, the correlation coefficient between X and Y is always non-positive.
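The following short numerical check (with arbitrarily chosen positive parameters, purely for illustration) confirms the two features just mentioned: the marginal of X agrees with (Dx + C)⁻¹ exp(−Bx) up to normalization, and the correlation of X and Y is negative.

```python
# Exponential-conditionals joint density f(x,y) proportional to exp(-(B x + C y + D x y)), x, y > 0.
import numpy as np

B, C, D = 1.0, 1.0, 0.5                       # arbitrary positive parameters
xs = np.linspace(1e-3, 30.0, 600)
ys = np.linspace(1e-3, 30.0, 600)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")

f = np.exp(-(B * X + C * Y + D * X * Y))
f /= f.sum() * dx * dy

# Marginal of X versus the stated closed form (D x + C)^(-1) exp(-B x), both normalized.
fx = f.sum(axis=1) * dy
gx = np.exp(-B * xs) / (D * xs + C)
gx /= gx.sum() * dx
print("max |marginal - closed form| :", np.abs(fx - gx).max())    # should be ~0

# Correlation of X and Y: non-positive, as noted in the text.
mx, my = (X * f).sum() * dx * dy, (Y * f).sum() * dx * dy
cov = ((X - mx) * (Y - my) * f).sum() * dx * dy
sx = np.sqrt(((X - mx) ** 2 * f).sum() * dx * dy)
sy = np.sqrt(((Y - my) ** 2 * f).sum() * dx * dy)
print("corr(X, Y) =", cov / (sx * sy))
```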
In summary, one can say that there are two approaches to obtaining bivariate maximum entropy densities. The usual approach is to start with marginal maximum entropy densities and obtain joint maximum entropy densities by imposing constraints on bivariate moments. The other is to start with conditional maximum entropy densities and construct a joint density from them. Theorem 1 shows that this second approach also leads to a maximum entropy density. The examples above bring out the differences between the two approaches.
As a final remark, it may be pointed out that the proof of Theorem 1 above, since it follows directly from the results of Arnold et al. [1], is not at all intricate. On the other hand, an alternative proof that does not use their results seems quite difficult.
Acknowledgments
Thanks are due to the referee for helpful suggestions.
References and Notes
- Arnold, B.C.; Castillo, E.; Sarabia, J. Conditionally Specified Distributions. Lecture Notes in Statistics, No. 73; Springer-Verlag: New York, NY, 1992.
- Gokhale, D.V. Maximum entropy characterizations of some distributions. In Statistical Distributions in Scientific Work; Patil, G.P., Kotz, S., Ord, J.K., Eds.; D. Reidel Publishing: Boston, MA, 1975; pp. 299–304.
- Kagan, A.M.; Linnik, Yu.V.; Rao, C.R. Characterization Problems in Mathematical Statistics; J. Wiley: New York, NY, 1973.
© 1999 by the authors. Reproduction of this article, by any means, is permitted for noncommercial purposes.