Abstract
It is shown that if the conditional densities of a bivariate random variable maximize entropy subject to certain constraints, then the bivariate density also maximizes entropy, subject to appropriately defined constraints. Some examples are discussed.
Introduction and Notation
Let f(x,y) denote a continuous bivariate probability density defined on the support SX × SY. The entropy of f(x,y) is defined as H(f) = Ef(-ln f(X,Y)). We shall use a similar notation for the entropy of univariate densities. Let the conditional densities of f(x,y) be denoted by f1(x|y) and f2(y|x). Many families of probability distributions are known to maximize the entropy among distributions that satisfy given constraints on the expectations of certain statistics. The Beta, Gamma and normal families of distributions are well known examples of this principle. Suppose now that f1(x|y) and f2(y|x) are known to belong to such families. The purpose of the present note is to show that f(x,y) also maximizes the entropy subject to constraints on the expectations of suitably defined statistics. This statement is made more specific and more precise in the following section and presented as a theorem.
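To make the underlying principle concrete, the following sketch (a standard Lagrange-multiplier argument, included here only for orientation and not part of the original note) shows why a density maximizing entropy under expectation constraints has exponential family form; the normal, Gamma and Beta cases correspond to particular choices of the constrained statistics.

```latex
% Sketch: maximize H(f) = -\int f \ln f  subject to  \int f = 1  and  \int q_i f = \mu_i.
\[
\mathcal{L}(f) \;=\; -\int f(x)\ln f(x)\,dx
   \;+\; \lambda_0\!\left(\int f(x)\,dx - 1\right)
   \;+\; \sum_{i=1}^{k}\lambda_i\!\left(\int q_i(x)\,f(x)\,dx - \mu_i\right).
\]
% Setting the functional derivative with respect to f(x) to zero gives
\[
-\ln f(x) - 1 + \lambda_0 + \sum_{i=1}^{k}\lambda_i\,q_i(x) \;=\; 0,
\qquad\text{i.e.}\qquad
f(x) \;=\; \exp\Big\{\theta_0 + \sum_{i=1}^{k}\theta_i\,q_i(x)\Big\}
\]
% on the support.
% Normal: q_1(x)=x, q_2(x)=x^2;   Gamma: q_1(x)=x, q_2(x)=\ln x;
% Beta:   q_1(x)=\ln x, q_2(x)=\ln(1-x).
```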
Main Results
Let us make the following assumptions on f1(x|y) and f2(y|x).
Assumption 2.1. Density f1(x|y) maximizes the entropy H(f1(·|y)), for each y ∈ SY, in the class Ψ1 of continuous densities which satisfy
E1[qi(X)] = μi,  i = 1, 2, …, k, (1)
for some constants μi, which may depend on y. Here E1 denotes the expectation with respect to the densities in Ψ1. The statistics {qi(x)} are functionally independent.
Assumption 2.2. Density f2(y|x) maximizes the entropy H(f2(·|x)), for each x ∈ SX, in the class Ψ2 of continuous densities which satisfy
E2[tj(Y)] = νj,  j = 1, 2, …, l, (2)
for some constants νj, which may depend on x. Here E2 denotes the expectation with respect to the densities in Ψ2. The statistics {tj(y)} are functionally independent.
Theorem 1: If Assumptions 2.1 and 2.2 hold, then the joint density f(x,y) maximizes its entropy H(f) in the class Ψ of continuous bivariate densities subject to the constraints
Eg[qi(X)tj(Y)] = σij,  i = 0, 1, …, k;  j = 0, 1, …, l, (3)
where q0(x) = t0(y) ≡ 1, the σij are appropriate constants, and Eg denotes the expectation with respect to densities in Ψ.
Proof: It is well known (see [2,3], for example) that under Assumption 2.1 the density f1(x|y) is of the form
f1(x|y) = exp{β1(θ(y)) + θ1(y)q1(x) + … + θk(y)qk(x)},  x ∈ SX, (4)
and equals zero otherwise, where θ(y) = (θ1(y), …, θk(y)). The correspondence between the μi(y) and (β1(θ(y)), θ1(y), …, θk(y)) is assumed to be 1:1. Similarly, under Assumption 2.2 the density f2(y|x) is of the form
f2(y|x) = exp{β2(λ(x)) + λ1(x)t1(y) + … + λl(x)tl(y)},  y ∈ SY, (5)
and equals zero otherwise. The correspondence between the νj(x) and (β2(λ(x)), λ1(x), …, λl(x)) is assumed to be 1:1. Now, if f1(x|y) and f2(y|x) are given to be of the forms (4) and (5) respectively, it has been shown by Arnold et al. [1, page 36] that the bivariate density f(x,y) must be of the form
f(x,y) = exp{q(x)′ M t(y)},  (x,y) ∈ SX × SY, (6)
and equals zero otherwise, for a suitable choice of the (k+1)×(l+1) matrix of parameters M. Here q(x) = (q0(x), q1(x), q2(x), …, qk(x))′ and t(y) = (t0(y), t1(y), t2(y), …, tl(y))′. From (6) it follows that f(x,y) maximizes the entropy H(f) in the class Ψ of continuous bivariate densities whose members g satisfy the constraints (3): since ln f(x,y) is a linear combination of the constrained statistics qi(x)tj(y), one has Eg[−ln f(X,Y)] = Ef[−ln f(X,Y)] for every g ∈ Ψ, and hence H(g) ≤ H(f) by the information inequality. The parameters in M are 1:1 functions of the σij. Note that the maximum entropy density (6) incorporates constraints on the mixed moments of the form Eg[qi(X)tj(Y)] only.
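The following is a small numerical sketch (not part of the original argument) of Theorem 1 in the simplest case k = l = 1 with q1(x) = x and t1(y) = y, so that the constraints (3) fix E(X), E(Y) and E(XY). The continuous problem is replaced by a finite grid, purely for this illustration, and a generic constrained optimizer is used; the entropy-maximizing distribution it finds should agree with the bilinear-exponential form (6).

```python
# Numerical sketch of Theorem 1 on a finite grid (illustration only, not a proof).
# Statistics: q(x) = (1, x), t(y) = (1, y); constraints fix E[X], E[Y], E[XY].
import numpy as np
from scipy.optimize import minimize

xs = np.linspace(0.1, 5.0, 15)
ys = np.linspace(0.1, 5.0, 15)
X, Y = np.meshgrid(xs, ys, indexing="ij")

# A density of the bilinear-exponential form (6): f proportional to exp{-(B x + C y + D x y)}.
B, C, D = 1.0, 1.2, 0.5
f = np.exp(-(B * X + C * Y + D * X * Y))
f /= f.sum()

# Mixed-moment targets computed from f itself.
stats = [X, Y, X * Y]
targets = [float((s * f).sum()) for s in stats]

def neg_entropy(p):
    return float(np.sum(p * np.log(p + 1e-300)))

constraints = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
for s, t in zip(stats, targets):
    constraints.append(
        {"type": "eq", "fun": lambda p, s=s.ravel(), t=t: float(p @ s - t)}
    )

p0 = np.full(X.size, 1.0 / X.size)          # start from the uniform grid density
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(1e-12, 1.0)] * X.size, constraints=constraints)

# The constrained entropy maximizer should coincide (numerically) with f.
print("max |p_hat - f| =", np.abs(res.x.reshape(X.shape) - f).max())
```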
Examples and Some Remarks
As a first example, suppose that the conditional density of Y given X = x is normal with mean a(x) and standard deviation b(x), and the conditional density of X given Y = y is normal with mean d(y) and standard deviation c(y). Arnold et al. [1, pages 25–27] show that the joint density of (X,Y) must then be of the form (6) with q(x) = (1, x, x²)′ and t(y) = (1, y, y²)′, that is, the exponential of a polynomial of degree at most two separately in x and in y, and that the coefficients of this polynomial must satisfy one of the following two sets of conditions.
- (i) F = E = J = 0, D > 0, C > 0, H² < DC; or
- (ii) F > 0, FD > E², CF > J².
Models satisfying conditions (i) are classical bivariate normal densities, with normal marginals, normal conditionals, linear regressions and constant conditional variances. Models satisfying conditions (ii) have normal conditionals (as specified) but distinctly non-normal marginal densities for X and for Y; explicit expressions for these marginals are given in [1]. From the point of view of this paper, these results provide an interesting insight into the structure of joint maximum entropy distributions when conditional maximum entropy distributions are specified. Note that one set of conditions gives the usual bivariate family of normal distributions, which is well known to maximize entropy subject to moment constraints fixing the mean vector and a positive definite covariance matrix. In such families the joint moment constraint acts through the covariance of X and Y, while the other constraints are on the marginal moments. The other set of conditions yields a family of bivariate distributions which still has maximum entropy, but whose marginal moment constraints are remarkably different and complicated.
As a second example, consider the situation in which the conditional densities of X|y and Y|x are exponential with mean functions given by [μ(y)]⁻¹ and [ν(x)]⁻¹. Then the joint density must be of the form
f(x,y) ∝ exp{−(Bx + Cy + Dxy)},  x > 0, y > 0,
with the parameters B, C, and D being positive. In this case, even though the joint density is a maximum entropy density subject to moment constraints on X, Y and XY, the marginal density of X is proportional to (Dx + C)⁻¹ exp(−Bx) (see [1], page 19). This is not the exponential density, which maximizes entropy on the positive real line under the usual constraint on the mean E(X). Further, the correlation coefficient between X and Y is always non-positive.
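The following short numerical check (with arbitrarily chosen positive parameters, purely for illustration) confirms the two features just mentioned: the marginal of X agrees with (Dx + C)⁻¹ exp(−Bx) up to normalization, and the correlation of X and Y is negative.

```python
# Exponential-conditionals joint density f(x,y) proportional to exp(-(B x + C y + D x y)), x, y > 0.
import numpy as np

B, C, D = 1.0, 1.0, 0.5                       # arbitrary positive parameters
xs = np.linspace(1e-3, 30.0, 600)
ys = np.linspace(1e-3, 30.0, 600)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")

f = np.exp(-(B * X + C * Y + D * X * Y))
f /= f.sum() * dx * dy

# Marginal of X versus the stated closed form (D x + C)^(-1) exp(-B x), both normalized.
fx = f.sum(axis=1) * dy
gx = np.exp(-B * xs) / (D * xs + C)
gx /= gx.sum() * dx
print("max |marginal - closed form| :", np.abs(fx - gx).max())    # should be ~0

# Correlation of X and Y: non-positive, as noted in the text.
mx, my = (X * f).sum() * dx * dy, (Y * f).sum() * dx * dy
cov = ((X - mx) * (Y - my) * f).sum() * dx * dy
sx = np.sqrt(((X - mx) ** 2 * f).sum() * dx * dy)
sy = np.sqrt(((Y - my) ** 2 * f).sum() * dx * dy)
print("corr(X, Y) =", cov / (sx * sy))
```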
In summary, one can say that there are two approaches to obtaining bivariate maximum entropy densities. The usual approach is to start with marginal maximum entropy densities and obtain joint maximum entropy densities by imposing constraints on bivariate moments. The other is to start with conditional maximum entropy densities and construct a joint density from them. Theorem 1 shows that this second approach also leads to a maximum entropy density. The examples above bring out the differences between the two approaches.
As a final remark, it may be pointed out that the proof of Theorem 1 above, since it follows directly from the results of Arnold et al. [1], is not at all intricate. On the other hand, an alternative proof that does not use their results seems quite difficult.
Acknowledgments
Thanks are due to the referee for helpful suggestions.
References and Notes
- Arnold, B.C.; Castillo, E.; Sarabia, J. Conditionally Specified Distributions. Lecture Notes in Statistics, No. 73; Springer-Verlag: New York, NY, 1992.
- Gokhale, D.V. Maximum entropy characterizations of some distributions. In Statistical Distributions in Scientific Work; Patil, G.P., Kotz, S., Ord, J.K., Eds.; D. Reidel Publishing: Boston, MA, 1975; pp. 299–304.
- Kagan, A.M.; Linnik, Yu.V.; Rao, C.R. Characterization Problems in Mathematical Statistics; J. Wiley: New York, NY, 1973.
© 1999 by the authors. Reproduction of this article, by any means, is permitted for noncommercial purposes.