Calculation of Differential Entropy for a Mixed Gaussian Distribution

In this work, an analytical expression is developed for the differential entropy of a mixed Gaussian distribution. One of the terms is given by a tabulated function of the ratio of the distribution parameters.


Introduction
The concept of entropy for a random process was introduced by Shannon [1] to characterize the irreducible complexity in a particular process beyond which no compression is possible. Entropy was first formulated for discrete random variables, and was then generalized to continuous random variables, in which case it is called differential entropy. By definition, for a continuous random variable $X$ with probability density function $p(x)$, the differential entropy is given by
$$h(X) = -\int_S p(x) \log p(x)\, dx \tag{1}$$
where $S = \{x \mid p(x) > 0\}$ is the support set of $X$. The log function may be taken to be $\log_2$, in which case the entropy is expressed in bits, or $\ln$, in which case the entropy is in nats. We shall use the latter convention for the computations in this paper.
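As a quick illustration of the definition, the following sketch evaluates the entropy integral numerically for an ordinary Gaussian and compares it with the well-known closed form $\frac{1}{2}\ln(2\pi e \sigma^2)$ nats. The helper names (`simpson`, `entropy_nats`), the value of `sigma`, and the truncation of the support to $\pm 12\sigma$ are illustrative assumptions, not part of the original computation.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def gaussian_pdf(x, sigma):
    """Density of N(0, sigma^2)."""
    return math.exp(-x * x / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def entropy_nats(pdf, lo, hi):
    """Approximate -integral of p ln p over [lo, hi]; p == 0 contributes nothing."""
    def integrand(x):
        p = pdf(x)
        return -p * math.log(p) if p > 0.0 else 0.0
    return simpson(integrand, lo, hi)

sigma = 1.5  # illustrative value only
h_numeric = entropy_nats(lambda x: gaussian_pdf(x, sigma), -12 * sigma, 12 * sigma)
h_closed = 0.5 * math.log(2 * math.pi * math.e * sigma**2)
print(h_numeric, h_closed)
```

Dividing either value by `math.log(2)` converts it from nats to bits.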
Textbooks that discuss the concept of entropy (e.g. Cover & Thomas [2]) often do not provide analytic calculations of differential entropy for many probability distributions; only a few specific cases are usually worked out. Cover & Thomas [2] (pp. 486-487) does provide a table of entropies for a large number of the probability density functions usually listed in a table of statistical distributions. This table was extracted from a paper by Lazo & Rathie [3]. In addition, a very detailed computation of these entropies may be found in Michalowicz et al. [4]. (Note: there are two typographical errors in the Cover & Thomas list; readers should check its formulas against the other two references, both of which give the correct expressions.)
In this paper we calculate the differential entropy for a case not appearing in the lists cited above; namely, for a mixed Gaussian distribution with the probability density function
$$p(x) = \frac{1}{2\sigma\sqrt{2\pi}} \left[ e^{-(x-\mu)^2/2\sigma^2} + e^{-(x+\mu)^2/2\sigma^2} \right]. \tag{2}$$
Clearly this distribution is obtained by splitting a Gaussian distribution $N(0, \sigma^2)$ into two equal parts, centering one half about $+\mu$ and the other about $-\mu$, and summing the results. Such a density function is depicted in Figure 1. This distribution has a mean of zero and a variance given by $\sigma_{mg}^2 = \sigma^2 + \mu^2$. This is because the second moment of the mixed Gaussian is one half the sum of the second moments of the Gaussian components, each of which is $\sigma^2 + \mu^2$. The density can also be written in the more compact form
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x^2+\mu^2)/2\sigma^2} \cosh(\mu x/\sigma^2). \tag{3}$$
The mixed Gaussian distribution is often considered as a noise model in signal processing applications. This particular noise model is used in describing co-channel interference, for example, where thermal, Gaussian-distributed noise is combined with man-made "clutter", e.g. signals from communication systems [5]. Wang and Wu [6] considered a mixed Gaussian noise model in a nonlinear signal detection application. Mixed Gaussian noise was also used for modeling purposes in Tan et al. [7]. Additional works on mixed Gaussian noise include that of Bhatia and Mulgrew [5], who looked at a non-parametric channel estimator for this type of noise, and Lu [8], who looked at entropy regularized likelihood learning on Gaussian mixture models. It has also been demonstrated that entropy-based parameter estimation techniques (e.g. mutual information maximization) are of great utility in estimating signals corrupted by non-Gaussian noise [9,10], particularly when the noise is mixed Gaussian [11].
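The stated properties of this density can be checked numerically. The sketch below assumes the density is the equal-weight sum of $N(+\mu, \sigma^2)$ and $N(-\mu, \sigma^2)$, and that the compact form is $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x^2+\mu^2)/2\sigma^2}\cosh(\mu x/\sigma^2)$; the function names, parameter values and integration limits are illustrative choices.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def mixed_pdf(x, mu, sigma):
    """Equal-weight sum of N(+mu, sigma^2) and N(-mu, sigma^2)."""
    c = 1.0 / (2.0 * sigma * math.sqrt(2.0 * math.pi))
    return c * (math.exp(-(x - mu) ** 2 / (2 * sigma**2))
                + math.exp(-(x + mu) ** 2 / (2 * sigma**2)))

def mixed_pdf_cosh(x, mu, sigma):
    """Same density in the compact cosh form (fine for moderate mu*x/sigma^2)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return (c * math.exp(-(x * x + mu * mu) / (2 * sigma**2))
            * math.cosh(mu * x / sigma**2))

mu, sigma = 2.0, 1.0  # illustrative values only
lo, hi = -mu - 12 * sigma, mu + 12 * sigma

mass = simpson(lambda x: mixed_pdf(x, mu, sigma), lo, hi)
mean = simpson(lambda x: x * mixed_pdf(x, mu, sigma), lo, hi)
second_moment = simpson(lambda x: x * x * mixed_pdf(x, mu, sigma), lo, hi)

# Largest pointwise difference between the two ways of writing the density.
grid = [lo + k * (hi - lo) / 200 for k in range(201)]
max_form_gap = max(abs(mixed_pdf(x, mu, sigma) - mixed_pdf_cosh(x, mu, sigma))
                   for x in grid)

print(mass, mean, second_moment, sigma**2 + mu**2, max_form_gap)
```

Because the mean is zero, the second moment printed here is also the variance $\sigma^2 + \mu^2$.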
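However, these works relied on non-parametric estimation of signal entropy due to the absence of a closed-form expression. Our work is therefore aimed at providing an analytical expression for signal entropy in situations where the corrupting noise source is mixed Gaussian. The calculation of the differential entropy, in nats, proceeds as follows:
$$
\begin{aligned}
h_e(X) &= -\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx \\
&= -\int_{-\infty}^{\infty} p(x) \left[ -\ln\left(\sigma\sqrt{2\pi}\right) - \frac{x^2+\mu^2}{2\sigma^2} + \ln\cosh(\mu x/\sigma^2) \right] dx \\
&= \frac{1}{2}\ln(2\pi\sigma^2) + \frac{\sigma^2 + 2\mu^2}{2\sigma^2} - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\mu^2/2\sigma^2} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2} \cosh(\mu x/\sigma^2) \ln(\cosh(\mu x/\sigma^2))\, dx \\
&= \frac{1}{2}\ln(2\pi e\sigma^2) + \frac{\mu^2}{\sigma^2} - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\mu^2/2\sigma^2} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2} \cosh(\mu x/\sigma^2) \ln(\cosh(\mu x/\sigma^2))\, dx.
\end{aligned} \tag{4}
$$
If we let $y = \mu x/\sigma^2$ in this integral, the above expression becomes
$$h_e(X) = \frac{1}{2}\ln(2\pi e\sigma^2) + \frac{\mu^2}{\sigma^2} - \frac{\sigma}{\mu\sqrt{2\pi}}\, e^{-\mu^2/2\sigma^2} \int_{-\infty}^{\infty} e^{-\sigma^2 y^2/2\mu^2} \cosh(y) \ln(\cosh(y))\, dy. \tag{5}$$
Noting that the integrand is an even function, we obtain
$$h_e(X) = \frac{1}{2}\ln(2\pi e\sigma^2) + \frac{\mu^2}{\sigma^2} - \sqrt{\frac{2}{\pi}}\, \frac{\sigma}{\mu}\, e^{-\mu^2/2\sigma^2} \int_{0}^{\infty} e^{-\sigma^2 y^2/2\mu^2} \cosh(y) \ln(\cosh(y))\, dy. \tag{6}$$
Let $\alpha = \mu/\sigma$. Then
$$h_e(X) = \frac{1}{2}\ln(2\pi e\sigma^2) + \alpha^2 - \frac{2}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2} \int_{0}^{\infty} e^{-y^2/2\alpha^2} \cosh(y) \ln(\cosh(y))\, dy. \tag{7}$$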
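The derivation can be cross-checked by comparing the final expression with a direct numerical evaluation of $-\int p \ln p\, dx$. The sketch below assumes the final expression reads $\frac{1}{2}\ln(2\pi e\sigma^2) + \alpha^2 - I(\alpha)$, with $I(\alpha) = \frac{2}{\sqrt{2\pi}\,\alpha} e^{-\alpha^2/2}\int_0^\infty e^{-y^2/2\alpha^2}\cosh(y)\ln(\cosh(y))\,dy$. To avoid overflow in `cosh`, the code completes the square in the exponent and uses a stable form of $\ln\cosh$; these rearrangements, the function names and the parameter values are our choices for illustration.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def ln_cosh(y):
    """ln(cosh(y)) computed without overflow for large |y|."""
    a = abs(y)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def integral_term(alpha, n=4000):
    """Integral term I(alpha), prefactor included; I(0) is taken as 0."""
    if alpha == 0.0:
        return 0.0
    c = 1.0 / (alpha * math.sqrt(2.0 * math.pi))
    def f(y):
        # prefactor * exp(-y^2 / (2 alpha^2)) * cosh(y), with the square completed
        w = c * (math.exp(-(y - alpha**2) ** 2 / (2 * alpha**2))
                 + math.exp(-(y + alpha**2) ** 2 / (2 * alpha**2)))
        return w * ln_cosh(y)
    return simpson(f, 0.0, alpha**2 + 12.0 * alpha, n)

def entropy_formula(mu, sigma):
    """Entropy in nats from the alpha-form of the derivation."""
    alpha = mu / sigma
    return (0.5 * math.log(2 * math.pi * math.e * sigma**2)
            + alpha**2 - integral_term(alpha))

def entropy_direct(mu, sigma, n=4000):
    """Entropy in nats by integrating -p ln p directly."""
    c = 1.0 / (2.0 * sigma * math.sqrt(2.0 * math.pi))
    def f(x):
        p = c * (math.exp(-(x - mu) ** 2 / (2 * sigma**2))
                 + math.exp(-(x + mu) ** 2 / (2 * sigma**2)))
        return -p * math.log(p) if p > 0.0 else 0.0
    return simpson(f, -mu - 12 * sigma, mu + 12 * sigma, n)

mu, sigma = 1.5, 0.8  # illustrative values only
h_formula = entropy_formula(mu, sigma)
h_direct = entropy_direct(mu, sigma)
print(h_formula, h_direct)
```

The two values should agree to within the quadrature error.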
The first term is recognized as the entropy in nats of a Gaussian distribution. When $\mu = 0$ (and so $\alpha = 0$), our distribution reduces to a Gaussian distribution and the entropy reduces to just this first term. An analytic expression for the integral in Eqn. (7) could not be found. However, there are analytic bounds for the integral term, which are derived by noting that $y - \ln 2 \le \ln(\cosh(y)) \le y$ for all $y \ge 0$. This follows from $\ln(\cosh(y)) = y - \ln 2 + \ln(1 + e^{-2y})$, since $0 \le \ln(1 + e^{-2y}) \le \ln 2$ for $y \ge 0$.
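Thus, denoting the integral term in Eqn. (7) by $I$, for the upper bound we have
$$I = \frac{2}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2} \int_{0}^{\infty} e^{-y^2/2\alpha^2} \cosh(y) \ln(\cosh(y))\, dy \le \frac{2}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2} \int_{0}^{\infty} y\, e^{-y^2/2\alpha^2} \cosh(y)\, dy = \sqrt{\frac{2}{\pi}}\, \alpha\, e^{-\alpha^2/2} + \alpha^2 \operatorname{erf}\left(\alpha/\sqrt{2}\right) \tag{8}$$
by means of formula 3.562 (4) in [12], where erf denotes the error function, defined as
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} e^{-u^2}\, du. \tag{9}$$
Likewise, for the lower bound we have
$$I \ge \frac{2}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2} \int_{0}^{\infty} (y - \ln 2)\, e^{-y^2/2\alpha^2} \cosh(y)\, dy = \sqrt{\frac{2}{\pi}}\, \alpha\, e^{-\alpha^2/2} + \alpha^2 \operatorname{erf}\left(\alpha/\sqrt{2}\right) - \ln 2 \tag{10}$$
by means of formula 3.546 (2) in [12]. Since the integrand in $I$ is always greater than or equal to 0, we know that $I \ge 0$, so we can write
$$\max\left(0,\; g(\alpha) - \ln 2\right) \le I \le g(\alpha) \tag{11}$$
where
$$g(\alpha) = \sqrt{\frac{2}{\pi}}\, \alpha\, e^{-\alpha^2/2} + \alpha^2 \operatorname{erf}\left(\alpha/\sqrt{2}\right)$$
for all $\alpha = \mu/\sigma \ge 0$.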
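The bounds can be compared against a numerically computed $I$. This sketch assumes the upper bound is $g(\alpha) = \sqrt{2/\pi}\,\alpha e^{-\alpha^2/2} + \alpha^2 \operatorname{erf}(\alpha/\sqrt{2})$ and the lower bound is $\max(0, g(\alpha) - \ln 2)$; the name `g`, the quadrature settings and the sample values of $\alpha$ are illustrative, and $I$ is evaluated in an algebraically equivalent but overflow-safe form.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def ln_cosh(y):
    """ln(cosh(y)) computed without overflow for large |y|."""
    a = abs(y)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def integral_term(alpha, n=4000):
    """Integral term I(alpha), prefactor included, for alpha > 0."""
    c = 1.0 / (alpha * math.sqrt(2.0 * math.pi))
    def f(y):
        w = c * (math.exp(-(y - alpha**2) ** 2 / (2 * alpha**2))
                 + math.exp(-(y + alpha**2) ** 2 / (2 * alpha**2)))
        return w * ln_cosh(y)
    return simpson(f, 0.0, alpha**2 + 12.0 * alpha, n)

def g(alpha):
    """Assumed analytic upper bound on I(alpha)."""
    return (math.sqrt(2.0 / math.pi) * alpha * math.exp(-alpha**2 / 2.0)
            + alpha**2 * math.erf(alpha / math.sqrt(2.0)))

rows = []
for alpha in (0.25, 0.5, 1.0, 2.0, 3.0, 4.0):
    upper = g(alpha)
    lower = max(0.0, upper - math.log(2.0))
    rows.append((alpha, lower, integral_term(alpha), upper))

for alpha, lower, value, upper in rows:
    print(f"alpha={alpha:4.2f}  lower={lower:9.5f}  I={value:9.5f}  upper={upper:9.5f}")
```

For the larger values of $\alpha$ in this list, the printed $I$ should sit close to the lower bound.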
The graph of $I$ as a function of $\alpha$ is shown in Figure 2, along with the analytic upper and lower bounds. Clearly $I$ converges rapidly to the lower bound as $\alpha$ increases. A tabulation of numerically computed values of $I$ is presented in Table 1, together with corresponding values of $\alpha^2 - I$. As is clear from the table, $(\alpha^2 - I)$ increases monotonically from 0 to $\ln 2 = 0.6931$. Hence the differential entropy, in nats, of a mixed Gaussian distribution, as depicted in Figure 1, can be expressed as
$$h_e(X) = \frac{1}{2}\ln(2\pi e\sigma^2) + (\alpha^2 - I) \tag{12}$$
where $(\alpha^2 - I)$ is a function of $\alpha = \mu/\sigma$ (tabulated in Table 1) which is equal to zero at $\alpha = 0$ (in which case the distribution is Gaussian) and increases monotonically to $\ln 2$ as $\alpha$ increases beyond about 3.5 (in which case the distribution is effectively split into two separate Gaussians). In particular, if $\sigma = 1$, $h_e(X)$ is a monotonically increasing function of $\mu$ which goes from $\frac{1}{2}\ln(2\pi e) \approx 1.419$ at $\mu = 0$ to $\frac{1}{2}\ln(2\pi e) + \ln 2 \approx 2.112$ for $\mu > 3.5$.
To express the differential entropy in bits, Eqn. (12) needs to be divided by $\ln 2$, which gives
$$h(X) = \frac{1}{2}\log_2(2\pi e\sigma^2) + \frac{\alpha^2 - I}{\ln 2} \tag{13}$$
where the second term is a monotonically increasing function of $\alpha = \mu/\sigma$ which goes from 0 at $\alpha = 0$ to 1 for $\alpha > 3.5$. In particular, for $\sigma = 1$, the differential entropy in bits goes from 2.05 to 3.05 depending on the value of $\mu$; that is, depending on how far apart the two halves of the mixed Gaussian distribution are.
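The behaviour of $(\alpha^2 - I)$ and the resulting entropy in bits can be reproduced with a short script. This is a sketch only: it assumes the entropy in nats is $\frac{1}{2}\ln(2\pi e\sigma^2) + (\alpha^2 - I)$, evaluates $I$ by Simpson quadrature in an overflow-safe form, and uses function names and a grid of $\alpha$ values of our own choosing. It is not the procedure used to produce Table 1.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def ln_cosh(y):
    """ln(cosh(y)) computed without overflow for large |y|."""
    a = abs(y)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def integral_term(alpha, n=4000):
    """Integral term I(alpha), prefactor included; I(0) is taken as 0."""
    if alpha == 0.0:
        return 0.0
    c = 1.0 / (alpha * math.sqrt(2.0 * math.pi))
    def f(y):
        w = c * (math.exp(-(y - alpha**2) ** 2 / (2 * alpha**2))
                 + math.exp(-(y + alpha**2) ** 2 / (2 * alpha**2)))
        return w * ln_cosh(y)
    return simpson(f, 0.0, alpha**2 + 12.0 * alpha, n)

def excess_nats(alpha):
    """alpha^2 - I: entropy added on top of the single-Gaussian term."""
    return alpha**2 - integral_term(alpha)

def entropy_bits(mu, sigma):
    """Differential entropy of the mixed Gaussian, in bits."""
    nats = 0.5 * math.log(2 * math.pi * math.e * sigma**2) + excess_nats(mu / sigma)
    return nats / math.log(2.0)

alphas = [0.5 * k for k in range(9)]          # 0.0, 0.5, ..., 4.0
excess = [excess_nats(a) for a in alphas]

for a, e in zip(alphas, excess):
    print(f"alpha={a:3.1f}  alpha^2 - I={e:7.4f}  bits(sigma=1)={entropy_bits(a, 1.0):6.3f}")
```

With $\sigma = 1$, the last column runs from about 2.05 bits at $\mu = 0$ to about 3.05 bits at $\mu = 4$, in line with the limits quoted above.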

Conclusions
This paper calculates the differential entropy for a mixed Gaussian distribution governed by the parameters $\mu$ and $\sigma$. A closed-form solution was not available for one of the terms; however, this term was calculated numerically and tabulated, and was also bounded above and below by analytic expressions. For $\mu = 0$ the entropy corresponds to the entropy of a pure Gaussian distribution; it increases monotonically to a well-defined limit for two well-separated Gaussian distribution halves ($\mu \gg 0$). Parameter estimation techniques based on information theory are one area where such calculations are likely to be useful.