A Note on the W-S Lower Bound of the MEE Estimation

The minimum error entropy (MEE) estimation is concerned with estimating a certain random variable (the unknown variable) based on another random variable (the observation), so that the entropy of the estimation error is minimized. This estimation method may outperform the well-known minimum mean square error (MMSE) estimation, especially in non-Gaussian situations. There is an important performance bound on the MEE estimation, namely the W-S lower bound, which is given by the conditional entropy of the unknown variable given the observation. Although this bound has been known in the literature for a considerable time, it has so far received little study. In this paper, we reexamine the W-S lower bound. Some basic properties of the W-S lower bound are presented, and the characterization of the Gaussian distribution using the W-S lower bound is investigated.


Introduction
Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors with joint probability density function (PDF) $p_{XY}(x,y)$, where $X$ represents an unknown variable and $Y$ stands for the observation. An optimal estimator of $X$ based on the observation is a function of $Y$ that minimizes a certain cost function. Under the well-known minimum mean square error (MMSE) criterion, the optimal estimator is
$$ g^{*}(Y) = \arg\min_{g \in \mathcal{G}} E\big[\|X - g(Y)\|^{2}\big], $$
where $E[\cdot]$ denotes the expectation operator, $E = X - g(Y)$ denotes the estimation error, and $\mathcal{G}$ denotes the collection of all measurable functions of $Y$. The MMSE criterion is prevalent in estimation theory due to its mathematical tractability. Under a Gaussian assumption, the MMSE criterion yields a linear optimal estimator, which requires only a simple matrix-vector operation [1]. When the data are non-Gaussian, however, the MMSE estimator will be suboptimal and can even be unacceptable, since it takes only statistics up to second order into account in its design.
In order to take higher-order statistics into account in the design of estimators, researchers have proposed many non-MMSE criteria. The minimum error entropy (MEE) criterion is one of them [2-9]. Under the MEE criterion, the optimal estimator is obtained by minimizing the error entropy, that is,
$$ g^{\dagger}(Y) = \arg\min_{g \in \mathcal{G}} H(E) = \arg\min_{g \in \mathcal{G}} \left( -\int_{\mathbb{R}^{n}} p_{E}(e) \log p_{E}(e)\, de \right), \qquad p_{E}(e) = \int_{\mathbb{R}^{m}} p_{X|Y}\big(e + g(y) \mid y\big)\, p_{Y}(y)\, dy, $$
where $p_{Y}(y)$ is the marginal PDF of $Y$. The MEE criterion is invariant with respect to the error's mean; in practice, the MEE estimator is therefore usually restricted to an unbiased one with zero-mean error. Since entropy measures how concentrated a distribution is, minimizing the error entropy forces the error to concentrate.
The early work on MEE estimation can be traced back to the late 1960s, when Weidemann and Stear [2] studied the use of the error entropy as a cost function for analyzing the performance of general sampled-data estimating systems. Minamide [5] extended Weidemann and Stear's results to continuous-time estimating systems. Tomita, Kalata, Minamide et al. applied MEE estimation to linear Gaussian systems, and studied filtering (state estimation), smoothing, and prediction problems from the information-theoretic viewpoint [3-5]. Some important properties of MEE estimation were also reported in [11-16]. In recent years, MEE has become a popular optimization criterion in the areas of signal processing and machine learning [8,9,17-22]. Combining kernel density estimation (KDE) and Rényi's quadratic entropy yields a computationally simple, nonparametric entropy estimator that has been successfully used in information theoretic learning (ITL) [8].
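To make the last point concrete, the sketch below (not taken from the paper; the function name, the kernel width, and the illustrative data are choices made here) implements the standard ITL plug-in estimator of Rényi's quadratic entropy $H_{2}(E) = -\log \int p_{E}^{2}(e)\,de$: a Gaussian Parzen window is placed on each error sample, and the resulting double sum of kernels of width $\sigma\sqrt{2}$ (the "information potential") is averaged and negated in log.

```python
import numpy as np

def renyi_quadratic_entropy(errors, sigma=0.5):
    """Plug-in estimate of Renyi's quadratic entropy H2(E) = -log( integral of p_E(e)^2 de )
    from scalar error samples, using a Gaussian Parzen (KDE) window of width sigma.
    Convolving two Gaussian kernels of width sigma gives one of width sigma*sqrt(2)."""
    e = np.asarray(errors, dtype=float).ravel()
    diff = e[:, None] - e[None, :]                  # pairwise differences e_i - e_j
    var = 2.0 * sigma ** 2                          # variance of the convolved kernel
    kernel = np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    information_potential = kernel.mean()           # (1/N^2) * sum_ij G_{sigma*sqrt(2)}(e_i - e_j)
    return -np.log(information_potential)

# A more concentrated error distribution yields a smaller entropy estimate.
rng = np.random.default_rng(0)
print(renyi_quadratic_entropy(rng.normal(scale=0.3, size=500)))
print(renyi_quadratic_entropy(rng.normal(scale=1.0, size=500)))
```

Note that this estimates Rényi's quadratic entropy rather than the Shannon entropy $H(E)$ used above; it is quoted here only because it is the estimator commonly employed in ITL practice [8].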
There is a performance bound on the MEE estimation, which was originally derived by Weidemann and Stear [2] and later rederived and named the W-S lower bound by Janzura et al. [6]. The W-S lower bound provides a lower bound on the error entropy, although it is not necessarily attained by the MEE estimator for a given joint distribution $p_{XY}$. This performance bound is nothing but the conditional entropy of the unknown variable $X$ given the observation $Y$, that is,
$$ H(E) \ge H(X \mid Y). $$
The above inequality can be easily derived using Jensen's inequality. Let $\varphi(x) = -x \log x$. We have:

$$
\begin{aligned}
H(E) &= -\int_{\mathbb{R}^{n}} p_{E}(x) \log p_{E}(x)\, dx \\
&= \int_{\mathbb{R}^{n}} \varphi\!\left( \int_{\mathbb{R}^{m}} p_{X|Y}\big(x + g(y) \mid y\big)\, p_{Y}(y)\, dy \right) dx \\
&\overset{(a)}{\ge} \int_{\mathbb{R}^{n}} \int_{\mathbb{R}^{m}} \varphi\big( p_{X|Y}(x + g(y) \mid y) \big)\, p_{Y}(y)\, dy\, dx \\
&= \int_{\mathbb{R}^{m}} \left( \int_{\mathbb{R}^{n}} \varphi\big( p_{X|Y}(x + g(y) \mid y) \big)\, dx \right) p_{Y}(y)\, dy \\
&= \int_{\mathbb{R}^{m}} \left( -\int_{\mathbb{R}^{n}} p_{X|Y}(x \mid y) \log p_{X|Y}(x \mid y)\, dx \right) p_{Y}(y)\, dy \\
&= \int_{\mathbb{R}^{m}} H(X \mid Y = y)\, p_{Y}(y)\, dy = H(X \mid Y),
\end{aligned}
$$
where (a) follows from the concavity of $\varphi(x)$ and Jensen's inequality, and $H(X \mid Y = y)$ denotes the conditional entropy of $X$ given $Y = y$.
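As a quick worked example (added here for illustration; it is not part of the original derivation), let $X$ and $Y$ be scalar and jointly Gaussian with $\operatorname{Var}(X) = \sigma_{X}^{2}$ and correlation coefficient $\rho$. Then the conditional distribution of $X$ given $Y = y$ is Gaussian with variance $\sigma_{X}^{2}(1-\rho^{2})$ for every $y$, and the error of the conditional-mean estimator is Gaussian with the same variance and independent of $Y$, so
$$ H(E) = \tfrac{1}{2}\log\!\big(2\pi e\, \sigma_{X}^{2}(1-\rho^{2})\big) = H(X \mid Y), $$
i.e., the W-S lower bound is attained in the jointly Gaussian case (consistent with Theorem 5 below).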
Performance bounds are very important in estimation theory. So far, however, there has been little study of the W-S lower bound of the MEE estimation. In this paper, we present some important properties of the W-S lower bound and show that this performance bound can be applied to characterize the Gaussian distribution. The rest of the paper is organized as follows: in Section 2, some basic properties of the W-S lower bound are presented; in Section 3, the characterization of the Gaussian distribution using the W-S lower bound is investigated; finally, the conclusions are given in Section 4.

Some Properties of the W-S Lower Bound
In the following, we present some properties of the W-S lower bound. First, we give several necessary and sufficient conditions under which the W-S lower bound can be achieved.

Theorem 1: The MEE estimator $g^{\dagger}(Y)$ achieves the W-S lower bound, i.e., $H(E^{\dagger}) = H(X \mid Y)$ with $E^{\dagger} = X - g^{\dagger}(Y)$, if and only if any one of the following properties holds:
(1) the error $E^{\dagger} = X - g^{\dagger}(Y)$ is independent of $Y$;
(2) $X$ can be expressed as $X = g^{\dagger}(Y) + Z$, where $Z \in \mathbb{R}^{n}$ is a random vector independent of $Y$;
(3)-(4) equivalent restatements of (2) in terms of the conditional density of $X$ given $Y = y$, namely that $p_{X|Y}(x \mid y)$ is the error density shifted by $g^{\dagger}(y)$.
Proof of (1): Since $H(E^{\dagger}) - H(E^{\dagger} \mid Y) = I(E^{\dagger}; Y)$ and $H(E^{\dagger} \mid Y) = H(X - g^{\dagger}(Y) \mid Y) = H(X \mid Y)$, the error entropy equals the W-S lower bound exactly when $I(E^{\dagger}; Y) = 0$. The mutual information $I(E^{\dagger}; Y)$ equals zero if and only if $E^{\dagger}$ and $Y$ are independent, so we conclude that the MEE estimator achieves the W-S lower bound if and only if the error is independent of $Y$.
Proof of (2): If the error $E^{\dagger} = X - g^{\dagger}(Y)$ is independent of $Y$, then $X = g^{\dagger}(Y) + Z$ with $Z = E^{\dagger}$ independent of $Y$; conversely, such a representation makes the error independent of $Y$, so property (2) is equivalent to property (1).
Remark: Properties (2)~(4) of Theorem 1 suggest that if the error entropy achieves the W-S lower bound, only the location (or mean) of the conditional density of $X$ given $Y = y$ depends on $y$, through the function $g^{\dagger}(y)$, while the shape of the conditional density is always the same as the shape of the error density, which is independent of $y$.
Theorem 2: Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors. If there exists an MEE estimator of $X$ based on $Y$ that achieves the W-S lower bound $H(X \mid Y)$, then for any nonnegative integers $k_{1}, k_{2}, \ldots, k_{n}$ the conditional central moment
$$ E\big[ (X_{1} - \mu_{1})^{k_{1}} (X_{2} - \mu_{2})^{k_{2}} \cdots (X_{n} - \mu_{n})^{k_{n}} \,\big|\, Y = y \big] $$
does not depend on $y$, where $\mu_{i}$ denotes the conditional mean value of $X_{i}$ given $Y = y$.
Proof: If the error entropy achieves the W-S lower bound, the shape of the conditional density of $X$ given $Y = y$ does not depend on $y$ (Theorem 1). The theorem then holds because the central moments of a distribution are determined by its shape alone: they depend only on the shape of a density and are independent of its location.
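For a scalar unknown ($n = 1$), the statement can be spelled out explicitly (this specialization is added here for illustration): achieving the bound means $X = g^{\dagger}(Y) + Z$ with $Z$ independent of $Y$, so $\mu(y) = E[X \mid Y = y] = g^{\dagger}(y) + E[Z]$ and, for every integer $k \ge 2$,
$$ E\big[ (X - \mu(y))^{k} \mid Y = y \big] = E\big[ (Z - E[Z])^{k} \big], $$
which is constant in $y$; in particular, the conditional variance, skewness, and kurtosis of $X$ given $Y = y$ do not vary with $y$.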
Theorem 3: Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors. If there exists an MEE estimator of $X$ based on $Y$ that achieves the W-S lower bound $H(X \mid Y)$, then it will be of the form
$$ g^{\dagger}(Y) = E[X \mid Y] + c, $$
where $c \in \mathbb{R}^{n}$ is an $n$-dimensional constant vector.
Proof: According to property (2) of Theorem 1, we have $X = g^{\dagger}(Y) + Z$, where $Z$ is independent of $Y$. Taking the conditional expectation given $Y$ yields $E[X \mid Y] = g^{\dagger}(Y) + E[Z]$, and hence $g^{\dagger}(Y) = E[X \mid Y] + c$, where $c = -E[Z]$ is a constant vector.
It has been shown in [15] that the MEE estimator may be non-unique even if the error distribution is restricted to be zero-mean (unbiased). However, the following corollary holds.
Corollary 1: Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors. If there exists an MEE estimator of $X$ based on $Y$ that achieves the W-S lower bound $H(X \mid Y)$, then the unbiased MEE estimator will be unique and identical to the MMSE estimator.
Proof: If the error is restricted to be zero-mean (i.e., $E[E] = 0$), then by Theorem 3 we must have $c = 0$. In this case, the MEE estimator becomes the conditional mean of $X$ given $Y$ (i.e., the MMSE estimator), which is obviously unique.
Theorem 4: Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors. If there exists an MEE estimator of $X$ based on $Y$ that achieves the W-S lower bound $H(X \mid Y)$, then the MEE estimator and the smoothed MEE (SMEE) estimator of $X$ based on $Y$ will be identical.
Proof: According to [16], the SMEE estimator is obtained by minimizing the smoothed MEE criterion $H(E + \lambda U)$, where $\lambda$ is the smoothing factor and $U$ is a smoothing variable (see [16] for a detailed description) that is independent of $X$, $Y$ and $E$. Clearly, the SMEE estimator of $X$ based on $Y$ is identical to the MEE estimator of $X + \lambda U$ based on $Y$. Since the MEE estimator of $X$ based on $Y$ achieves the W-S lower bound, we have $X + \lambda U = g^{\dagger}(Y) + Z'$, where $Z' = Z + \lambda U$. Because $U$ is independent of $X$, $Y$ and $E$, the variable $Z'$ is also independent of $Y$. By property (2) of Theorem 1, one easily concludes that the MEE estimator of $X + \lambda U$ based on $Y$ is identical to the MEE estimator of $X$ based on $Y$. This completes the proof.
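A small simulation can make Theorem 3 and Corollary 1 concrete. The sketch below is ours, not the paper's: the additive model $X = \sin(Y) + Z$ with exponential noise, the sample size, and all names are assumptions chosen so that the W-S bound is attained by construction (the error $Z$ is independent of $Y$). Any shift $\sin(Y) + c$ then has the same error entropy (differential entropy is shift-invariant), so the MEE estimator is non-unique, while the mean square error picks out the conditional mean $\sin(Y) + E[Z]$; the unbiased MEE estimator coincides with this MMSE estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Hypothetical model chosen for illustration: X = sin(Y) + Z with Z independent of Y,
# so g(Y) = sin(Y) + c attains the W-S bound for every constant c.
Y = rng.uniform(-np.pi, np.pi, N)
Z = rng.exponential(1.0, N)        # non-Gaussian noise, E[Z] = 1
X = np.sin(Y) + Z

def mse(c):
    # mean square error of the shifted estimator g(Y) = sin(Y) + c
    return np.mean((X - (np.sin(Y) + c)) ** 2)

# (a) The MSE is minimized near c = E[Z] = 1, i.e., at the conditional mean E[X|Y] = sin(Y) + 1.
cs = np.linspace(0.0, 2.0, 201)
print("MMSE shift c* ~", cs[np.argmin([mse(c) for c in cs])])

# (b) Differential entropy is shift-invariant: the errors X - (sin(Y) + c) = Z - c have the same
#     distribution up to a constant shift, so every c gives the same error entropy and the MEE
#     estimator is non-unique.  The *unbiased* MEE estimator (c = E[Z]) equals the MMSE estimator.
print("error mean, c = 0:", np.mean(X - np.sin(Y)))          # about 1.0 (biased)
print("error mean, c = 1:", np.mean(X - np.sin(Y) - 1.0))    # about 0.0 (unbiased)
```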
Theorem 5: Let the random vector $(X^{T}, Y^{T})^{T}$ be jointly Gaussian. Then the MEE estimator of $X$ based on $Y$ will achieve the W-S lower bound, and it will be an affine linear function of $Y$.
Proof: It is easy to verify that the conditional distribution of $X$ given $Y$ is Gaussian, that the conditional mean of $X$ given $Y$ is an affine linear function of $Y$, and that the conditional covariance matrix of $X$ given $Y$ is constant (i.e., does not depend on $Y$). Since the shape of a Gaussian distribution depends only on its covariance matrix, the conditional density of $X$ given $Y$ has a fixed shape, and hence the MEE estimator of $X$ based on $Y$ achieves the W-S lower bound. By Theorem 3, the MEE estimator of $X$ is then also an affine linear function of $Y$.
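The following numerical sketch (ours; the dimensions, the mixing matrix, the noise scales, and all variable names are arbitrary choices for illustration) checks Theorem 5 empirically: for a jointly Gaussian pair, the affine MMSE estimator computed from the joint covariance has an error whose cross-covariance with $Y$ vanishes and whose Gaussian entropy matches the W-S lower bound $H(X \mid Y) = \tfrac{1}{2}\log\big((2\pi e)^{n}\det \Sigma_{X|Y}\big)$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 2, 3, 500_000

# Hypothetical jointly Gaussian pair: X = A Y + W with Gaussian W independent of Y.
A = rng.normal(size=(n, m))
Y = rng.normal(size=(N, m))
W = rng.normal(size=(N, n)) * np.array([0.5, 1.5])    # independent noise components
X = Y @ A.T + W

# Affine MMSE estimator X_hat = a + B Y computed from the joint covariance.
C = np.cov(np.hstack([X, Y]).T)
Cxx, Cxy, Cyy = C[:n, :n], C[:n, n:], C[n:, n:]
B = Cxy @ np.linalg.inv(Cyy)
a = X.mean(axis=0) - B @ Y.mean(axis=0)
E = X - (Y @ B.T + a)                                  # estimation error

# The error covariance equals the constant conditional covariance of X given Y, so the
# Gaussian error entropy matches the W-S lower bound H(X|Y).
Cond = Cxx - Cxy @ np.linalg.inv(Cyy) @ Cxy.T
H_err = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(np.cov(E.T)))
H_XgY = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(Cond))
print(H_err, H_XgY)                                    # nearly identical
print(np.abs(np.cov(np.hstack([E, Y]).T)[:n, n:]).max())   # cross-covariance of E and Y ~ 0
```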
Remark: If $X$ and $Y$ are jointly Gaussian, the MEE estimator of $X$ based on $Y$ achieves the W-S lower bound. It should be noted, however, that in most cases the MEE estimator cannot achieve this performance bound. A simple example is given below.
Example 1: Consider a joint PDF $p_{XY}$ obtained by perturbing a product of Gaussian densities with a sinusoidal term weighted by a parameter $\varepsilon$, $|\varepsilon| \le 1$. The joint density $p_{XY}$ is jointly Gaussian only when $\varepsilon = 0$, in which case $X$ is independent of $Y$. For any $y$ and $\varepsilon$ ($|\varepsilon| \le 1$), the conditional distribution of $X$ given $Y = y$ is Gaussian, hence symmetric and unimodal (SUM). According to Theorem 1 in [12], the MEE estimator of $X$ based on $Y$ is then the conditional median of $X$ given $Y$. For different $\varepsilon$ values, we can calculate the minimum error entropy and the W-S lower bound $H(X \mid Y)$. The results are shown in Figure 1. As one can see clearly, when $\varepsilon \ne 0$ (jointly non-Gaussian), the minimum error entropy is always above the W-S lower bound.
[Figure 1. The minimum error entropy and the W-S lower bound.]
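Since the density of Example 1 is not reproduced above, the following sketch (entirely ours) only illustrates how the two curves in Figure 1 can be computed numerically for a given joint PDF: it evaluates the W-S lower bound $H(X \mid Y)$ by numerical integration and, for comparison, the error entropy of the conditional-mean estimator (which upper-bounds the minimum error entropy). The FGM-type density, the grid, and all names below are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical non-Gaussian joint density used only for illustration (this is NOT the density
# of Example 1): an FGM-type perturbation of a standard Gaussian product,
# p(x, y) = phi(x) phi(y) [1 + eps (2 Phi(x) - 1) (2 Phi(y) - 1)],  |eps| <= 1.
eps = 0.8
xs = np.linspace(-6.0, 6.0, 1201)
dx = xs[1] - xs[0]
GX, GY = np.meshgrid(xs, xs, indexing="ij")           # grid over (x, y)
p = norm.pdf(GX) * norm.pdf(GY) * (1 + eps * (2 * norm.cdf(GX) - 1) * (2 * norm.cdf(GY) - 1))

py = p.sum(axis=0) * dx                                # marginal p_Y(y)
p_cond = p / py                                        # p_{X|Y}(x | y)

# W-S lower bound: H(X|Y) = E_Y[ -int p(x|y) log p(x|y) dx ]
H_y = -np.sum(np.where(p_cond > 0, p_cond * np.log(p_cond), 0.0), axis=0) * dx
H_X_given_Y = np.sum(H_y * py) * dx

# Error entropy of one particular estimator, the conditional mean g(y) = E[X | Y = y]:
g = np.sum(xs[:, None] * p_cond, axis=0) * dx
pe = np.zeros_like(xs)                                 # p_E(e) = int p_{X|Y}(e + g(y) | y) p_Y(y) dy
for j in range(xs.size):
    shifted = np.interp(xs + g[j], xs, p_cond[:, j], left=0.0, right=0.0)
    pe += shifted * py[j] * dx
pe /= pe.sum() * dx                                    # absorb small numerical error
H_E = -np.sum(np.where(pe > 0, pe * np.log(pe), 0.0)) * dx

print(f"H(X|Y) = {H_X_given_Y:.4f}   H(E), conditional-mean estimator = {H_E:.4f}")
# H(E) >= H(X|Y): the inequality is strict for eps != 0, although the gap may be small.
```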

Characterization of the Gaussian Distribution
The W-S lower bound can be applied to characterize the Gaussian distribution, i.e., to construct conditions under which a distribution is Gaussian (or jointly Gaussian). The characterization of the Gaussian distribution is an interesting problem that has been extensively studied in the literature [23-27]. First, we introduce a lemma.
Lemma 1 [24]: Let $X \in \mathbb{R}^{p}$ and $Y \in \mathbb{R}^{q}$ be two random vectors such that $X$ is Gaussian and the distribution of $Y$ given $X$ is a $q$-dimensional Gaussian distribution with mean vector $a + BX$ ($a \in \mathbb{R}^{q}$, $B \in \mathbb{R}^{q \times p}$) and constant covariance matrix $\Sigma \in \mathbb{R}^{q \times q}$, $\Sigma > 0$. Then the joint distribution of $(X^{T}, Y^{T})^{T}$ will be a $(p+q)$-dimensional Gaussian distribution, whose mean vector and covariance matrix are given below.
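For completeness (this display is added here; the symbols $\mu_{X}$ and $\Sigma_{X}$ for the mean and covariance of $X$ are introduced for the purpose), writing $X \sim \mathcal{N}(\mu_{X}, \Sigma_{X})$ the joint parameters are
$$ E\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \mu_{X} \\ a + B\mu_{X} \end{bmatrix}, \qquad \operatorname{Cov}\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} \Sigma_{X} & \Sigma_{X} B^{T} \\ B\Sigma_{X} & \Sigma + B\Sigma_{X} B^{T} \end{bmatrix}. $$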
Based on Lemma 1, we can state the following theorem.
Theorem 6: Let $X \in \mathbb{R}^{n}$ and $Y \in \mathbb{R}^{m}$ be two random vectors such that $Y$ is Gaussian and the distribution of $X$ given $Y$ is an $n$-dimensional Gaussian distribution. If there exists a linear estimator $\hat{X} = BY$ ($B \in \mathbb{R}^{n \times m}$) such that the error entropy $H(E)$ achieves the W-S lower bound $H(X \mid Y)$, then $(X^{T}, Y^{T})^{T}$ will be an $(n+m)$-dimensional multivariate Gaussian random vector.
Proof: Since the linear estimator $\hat{X} = BY$ achieves the W-S lower bound, by Theorem 1 we have $X = BY + Z$, where $Z \in \mathbb{R}^{n}$ is a random vector that is independent of $Y$. Hence the conditional mean vector of $X$ given $Y$ is $a + BY$ with $a = E[Z]$, and the conditional covariance matrix of $X$ given $Y$ equals the covariance matrix of $Z$, which is a constant matrix (i.e., it does not depend on $Y$). Applying Lemma 1 (with the roles of $X$ and $Y$ exchanged) completes the proof.
The next lemma is needed in the proof of Theorem 7.
Lemma 2 [25] provides conditions on the conditional density of $X$ given $Y = y$ under which the joint density is multivariate Gaussian. In the proof of Theorem 7, one verifies that the relevant conditional quantities do not depend on $y$ (i.e., they are constant on $\mathbb{R}^{m}$); by Lemma 2, the joint density $p_{XY}$ will then be multivariate Gaussian. Before presenting Theorem 8, we introduce a third lemma, which is an extended version of Ghurye and Olkin's theorem [23,26].
Lemma 3 [26]: Let $U_{1} \in \mathbb{R}^{p}$ and $U_{2} \in \mathbb{R}^{q}$ be two independent, non-degenerate random vectors, and let $X_{1} \in \mathbb{R}^{p}$ and $X_{2} \in \mathbb{R}^{q}$ be two independent random vectors obtained from $U_{1}$ and $U_{2}$ through linear transformations, where (i) the transformation matrices satisfy a suitable non-degeneracy condition and (ii) none of the rows of the relevant transformation matrix vanishes. Then $U_{1}$ and $U_{2}$ are both Gaussian.
Consider now the problem of estimating a source $X \in \mathbb{R}^{n}$ given the observation $Y = AX + Z$, where $Y \in \mathbb{R}^{m}$ and $Z$ is additive noise that is independent of $X$, as shown in Figure 2.
[Figure 2. General setup of the source estimating problem.]
Theorem 8: For the estimation problem in Figure 2, if there exists a linear estimator $\hat{X} = BY$ such that the error entropy $H(E)$ achieves the W-S lower bound $H(X \mid Y)$, and the matrices involved satisfy the conditions of Lemma 3, then $X$ and $Z$ are both Gaussian.
Proof: Since $Y = AX + Z$, the error $E$ can be expressed as
$$ E = X - BY = (I - BA)X - BZ. $$
Thus $E$ and $Y = AX + Z$ are two linear combinations of the independent random vectors $X$ and $Z$. The error entropy achieves the W-S lower bound if and only if $E$ is independent of the observation $Y$ (Theorem 1). Then, by applying Lemma 3 (with $U_{1} = X$, $U_{2} = Z$, $X_{1} = E$, $X_{2} = Y$), we arrive easily at the result.
Therefore, we have the following corollary:
Corollary 2: For the estimation problem in Figure 2, if $X$, $Y$, $Z$ are all scalar random variables ($m = n = 1$), and there exists a linear estimator $\hat{X} = BY$ such that the error entropy $H(E)$ achieves the W-S lower bound $H(X \mid Y)$, then $X$ and $Z$ (and hence $Y$) are Gaussian.
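In the scalar Gaussian case the converse direction is easy to see explicitly (this computation is added for illustration; the symbols $\sigma_{X}^{2}$ and $\sigma_{Z}^{2}$ are introduced here). Let $Y = aX + Z$ with $X \sim \mathcal{N}(0, \sigma_{X}^{2})$ and $Z \sim \mathcal{N}(0, \sigma_{Z}^{2})$ independent. For the linear estimator $\hat{X} = BY$, the error is $E = (1 - Ba)X - BZ$, and
$$ \operatorname{Cov}(E, Y) = (1 - Ba)\,a\,\sigma_{X}^{2} - B\,\sigma_{Z}^{2} = 0 \quad\text{for}\quad B = \frac{a\,\sigma_{X}^{2}}{a^{2}\sigma_{X}^{2} + \sigma_{Z}^{2}}. $$
Since $(E, Y)$ is jointly Gaussian, zero covariance implies independence, so this choice of $B$ achieves the W-S lower bound $H(E) = H(X \mid Y)$.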

