Some Further Results on the Minimum Error Entropy Estimation

The minimum error entropy (MEE) criterion has been receiving increasing attention because of its promising applications in signal processing and machine learning. In the context of Bayesian estimation, the MEE criterion is concerned with estimating a random variable from another random variable so that the entropy of the estimation error is minimized. Several theoretical results on this topic have been reported. In this work, we present some further results on MEE estimation. The contributions are twofold: (1) we extend a recent result on the minimum entropy of a mixture of unimodal and symmetric distributions to a more general case, and prove that if the conditional distributions are generalized uniformly dominated (GUD), the dominant alignment will be the MEE estimator; (2) we show by examples that the MEE estimator (not limited to singular cases) may be non-unique even if the error distribution is restricted to be zero-mean (unbiased).


Introduction
A central concept in information theory is entropy, which is a mathematical measure of the uncertainty or the amount of missing information [1]. Entropy has been widely used in many areas, including physics, mathematics, communication, economics, signal processing, and machine learning. The maximum entropy principle is a powerful and widely accepted method for statistical inference or probabilistic reasoning with incomplete knowledge of the probability distribution [2]. Another important entropy principle is the minimum entropy principle, which seeks to reduce the uncertainty associated with a system. In particular, the minimum error entropy (MEE) criterion can be applied to problems such as estimation [3][4][5], identification [6,7], filtering [8][9][10], and system control [11,12]. In recent years, the MEE criterion, together with the nonparametric Renyi entropy estimator, has been successfully used in information theoretic learning (ITL) [13][14][15].
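In the scenario of Bayesian estimation, the MEE criterion aims to minimize the entropy of the estimation error, and hence to decrease the uncertainty in estimation. Given two random variables, $X \in \mathbb{R}^n$, an unknown parameter to be estimated, and $Y \in \mathbb{R}^m$, the observation (or measurement), the MEE estimation of $X$ based on $Y$ can be formulated as:
$$g_{\mathrm{MEE}} = \arg\min_{g \in G} H(e) = \arg\min_{g \in G} \left( -\int_{\mathbb{R}^n} p_g(x) \log p_g(x)\, dx \right) \tag{1}$$
where $\hat{X} = g(Y)$ denotes an estimator of $X$ based on $Y$, $g$ is a measurable function, $G$ stands for the collection of all measurable functions of $Y$, and $p_g(\cdot)$ is the probability density function (PDF) of the estimation error $e = X - \hat{X}$. Let $p(x|y)$ be the conditional PDF of $X$ given $Y$. Then:
$$p_g(x) = \int_{\mathbb{R}^m} p(x + g(y) \,|\, y)\, dF(y) \tag{2}$$
where $F(y)$ denotes the distribution function of $Y$. From (2), one can see that the error PDF $p_g(x)$ is actually a mixture of the shifted conditional PDFs.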
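As a numerical illustration of the mixture formula for the error PDF, the sketch below assumes a discrete $Y$ taking values $0, \dots, K-1$ (so the integral over $dF(y)$ becomes a weighted sum) and evaluates $p_g$ on a uniform grid, approximating the error entropy by a Riemann sum. The two Gaussian conditional densities, their probabilities, and the helper names `error_pdf` and `grid_entropy` are hypothetical choices made only for illustration.

```python
import numpy as np

def gauss(mu, s):
    """Return a Gaussian PDF with mean mu and std s (hypothetical conditional PDF)."""
    return lambda x: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def error_pdf(x, g, cond_pdfs, probs):
    """Error PDF for a discrete Y in {0, ..., K-1}: sum_y P(Y=y) p(x + g(y) | y)."""
    return sum(p * f(x + g(y)) for y, (f, p) in enumerate(zip(cond_pdfs, probs)))

def grid_entropy(pdf_vals, dx):
    """Riemann-sum approximation of -int p log p dx, with the convention 0 log 0 = 0."""
    p = pdf_vals[pdf_vals > 0]
    return -np.sum(p * np.log(p)) * dx

# Hypothetical setup: P(Y=0) = 0.3, P(Y=1) = 0.7, X|Y=0 ~ N(-1, 0.5^2), X|Y=1 ~ N(2, 1)
probs = [0.3, 0.7]
cond_pdfs = [gauss(-1.0, 0.5), gauss(2.0, 1.0)]
dx = 1e-3
x = (np.arange(40000) + 0.5) * dx - 20.0    # midpoint grid on [-20, 20]

g_mean = lambda y: [-1.0, 2.0][y]           # conditional mean E[X | Y = y]
g_zero = lambda y: 0.0                      # trivial estimator X_hat = 0
g_shift = lambda y: g_mean(y) + 3.0         # g_mean plus a constant bias

H_mean = grid_entropy(error_pdf(x, g_mean, cond_pdfs, probs), dx)
H_zero = grid_entropy(error_pdf(x, g_zero, cond_pdfs, probs), dx)
H_shift = grid_entropy(error_pdf(x, g_shift, cond_pdfs, probs), dx)
mass = error_pdf(x, g_mean, cond_pdfs, probs).sum() * dx   # should be close to 1
```

Adding a constant to the estimator only translates the error PDF, so `H_shift` matches `H_mean`, while the unaligned estimator `g_zero` spreads the mixture and gives a larger entropy.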
Unlike conventional Bayesian risks, such as the mean square error (MSE) and the risk-sensitive cost [16], the "loss function" in MEE is $-\log p_g(\cdot)$, which is directly related to the error's PDF and nonlinearly transforms the error through its own PDF. Some theoretical aspects of MEE estimation have been studied in the literature. In an early work [3], Weidemann and Stear proved that minimizing the error entropy is equivalent to minimizing the mutual information between the error and the observation, and also proved that the reduced error entropy is upper-bounded by the amount of information obtained by the observation. In [17], Janzura et al. proved that, for the case of finite mixtures (Y is a discrete random variable with finitely many possible values), the MEE estimator equals the conditional median provided that the conditional PDFs are conditionally symmetric and unimodal (CSUM). Otahal [18] extended Janzura's results to finite-dimensional Euclidean space. In a recent paper, Chen and Geman [19] employed a "function rearrangement" to study the minimum entropy of a mixture of CSUM distributions, where no restriction on Y was imposed. More recently, Chen et al. have investigated the robustness, the non-uniqueness (for singular cases), and the sufficient and necessary conditions involved in MEE estimation [20]. Chen et al. have also presented a new interpretation of the MSE criterion as a robust MEE criterion [21].
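To make the role of the loss $-\log p_g(\cdot)$ concrete, the following sketch estimates $H(e) = E[-\log p_g(e)]$ from error samples by averaging $-\log$ of a leave-one-out Gaussian Parzen (kernel) density estimate at each sample. This plug-in Shannon estimator and Silverman's bandwidth rule are assumptions chosen for illustration; they are not the estimator analyzed in this paper (ITL work typically uses Parzen-based estimators of Renyi's entropy instead).

```python
import numpy as np

def loo_kde_entropy(e, h=None):
    """Plug-in estimate of H(e) = E[-log p(e)] using a leave-one-out Gaussian Parzen window.

    The bandwidth defaults to Silverman's rule of thumb (an assumption for illustration)."""
    e = np.asarray(e, dtype=float)
    n = e.size
    if h is None:
        h = 1.06 * e.std(ddof=1) * n ** (-0.2)
    diff = (e[:, None] - e[None, :]) / h
    k = np.exp(-0.5 * diff ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)            # leave-one-out: drop each sample's own kernel
    p_hat = k.sum(axis=1) / (n - 1)     # estimated error PDF at each error sample
    return -np.mean(np.log(p_hat))      # sample average of the MEE "loss" -log p(e_i)

rng = np.random.default_rng(0)
e = rng.standard_normal(1000)           # hypothetical Gaussian errors, true H = 0.5 log(2 pi e)
H_hat = loo_kde_entropy(e)
```

Each sample's contribution depends on how densely the other errors cluster around it, which is the sense in which the error is transformed "by its own PDF"; shifting all errors by a constant leaves the estimate unchanged.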
In this work, we continue the study of MEE estimation and obtain some further results. Our contributions are twofold. First, we extend the results of Chen and Geman to a more general case, and show that when the conditional PDFs are generalized uniformly dominated (GUD), the MEE estimator equals the dominant alignment. Second, we show by examples that the unbiased MEE estimator (not limited to singular cases) may be non-unique, and that there can even be infinitely many optimal solutions. The rest of the paper is organized as follows. In Section 2, we study the minimum entropy of a mixture of generalized uniformly dominated conditional distributions. In Section 3, we present two examples to show the non-uniqueness of the unbiased MEE estimation. Finally, we give our conclusions in Section 4.

MEE Estimator for Generalized Uniformly Dominated Conditional Distributions
Before presenting the main theorem of this section, we give the following definitions.
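Definition 1: The nonnegative, integrable function set $\mathcal{F} = \{ f_\gamma(x) : \gamma \in \Gamma \}$ is said to be uniformly dominated in $x \in \mathbb{R}^n$ if and only if, for any $\varepsilon > 0$, there exists a set $D_\varepsilon \subset \mathbb{R}^n$ with $\lambda(D_\varepsilon) = \varepsilon$ such that
$$\int_{D_\varepsilon} f_\gamma(x)\, dx = \sup_{D:\, \lambda(D) = \varepsilon} \int_{D} f_\gamma(x)\, dx, \quad \forall \gamma \in \Gamma,$$
where $\lambda$ is the Lebesgue measure. The set $D_\varepsilon$ is called the $\varepsilon$-volume dominant support of $\mathcal{F}$.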
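Definition 2: The nonnegative, integrable function set $\mathcal{F}$ is said to be generalized uniformly dominated (GUD) in $x \in \mathbb{R}^n$ if and only if there exists a function $c: \Gamma \to \mathbb{R}^n$ such that the function set $\mathcal{F}_c$ is uniformly dominated, where:
$$\mathcal{F}_c = \{ f_\gamma(x + c(\gamma)) : \gamma \in \Gamma \}.$$
Such a function $c(\gamma)$ is called a dominant alignment of $\mathcal{F}$. Clearly, if $c(\gamma)$ is a dominant alignment of $\mathcal{F}$, then for any constant $t \in \mathbb{R}^n$, the function $c(\gamma) + t$ will also be a dominant alignment of $\mathcal{F}$.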
When regarding $y$ as an index parameter, the conditional PDF $p(x|y)$ represents a set of nonnegative and integrable functions, that is:
$$\mathcal{F} = \{ p(x|y) : y \in \mathbb{R}^m \}.$$
If the above function set is (generalized) uniformly dominated in $x \in \mathbb{R}^n$, then we say that the conditional PDF $p(x|y)$ is (generalized) uniformly dominated.
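Uniform domination can be checked numerically for a finite family of one-dimensional densities. On a uniform grid, the largest mass that any set of volume $\varepsilon$ can carry is the sum of the $\varepsilon/dx$ largest density values times $dx$, so the sketch below takes the top cells of the first density as a candidate $D_\varepsilon$ and tests whether that same set is optimal for every member; a GUD check first applies a proposed alignment $c(\gamma)$. The grid, the test densities, and the function names are hypothetical, and the test is only exact when the first density has no plateaus at the selection boundary.

```python
import numpy as np

def gauss(mu, s):
    return lambda x: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def expo(lam):
    # Exponential PDF on x >= 0: skewed, hence not symmetric
    return lambda x: np.where(x >= 0, lam * np.exp(-lam * np.maximum(x, 0.0)), 0.0)

def is_uniformly_dominated(densities, x, eps_list, tol=1e-9):
    """Grid check of uniform domination for a finite family of 1-D densities."""
    dx = x[1] - x[0]
    vals = [f(x) for f in densities]
    for eps in eps_list:
        k = int(round(eps / dx))                   # number of cells with total length eps
        D = np.argsort(vals[0])[::-1][:k]          # candidate eps-volume dominant support
        for v in vals:
            best = np.sort(v)[::-1][:k].sum() * dx  # largest mass over all sets of length eps
            if v[D].sum() * dx < best - tol:
                return False
    return True

def align(densities, c):
    """Aligned family {f_gamma(x + c(gamma))}."""
    return [lambda x, f=f, s=s: f(x + s) for f, s in zip(densities, c)]

dx = 1e-3
x = (np.arange(20000) + 0.5) * dx - 10.0   # midpoint grid on [-10, 10]
eps_list = [0.5, 1.0, 2.0]
shifted_pair = [gauss(0.0, 1.0), gauss(1.0, 1.0)]
```

Centered Gaussians of different widths, and exponentials with different rates (which are not CSUM), pass the check; two Gaussians with different centers fail it but pass after the alignment $c = (0, 1)$, so that pair is GUD.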

Remark 2:
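GUD is much more general than CSUM. Actually, if the conditional PDF is CSUM, it must also be GUD (with the conditional mean as the dominant alignment), but not vice versa. In Figure 1, we show two examples where two PDFs (solid and dotted lines) are uniformly dominated but not CSUM.
Theorem 1: Assume that the conditional PDF $p(x|y)$ is generalized uniformly dominated with dominant alignment $\varphi(y)$. If, for $g \in G$, both (a) $H(X - \varphi(Y))$ and (b) $H(X - g(Y))$ exist (here "exists" means "exists in the extended sense" as defined in [19]), then $H(X - \varphi(Y)) \le H(X - g(Y))$; that is, the dominant alignment $\varphi(Y)$ minimizes the error entropy.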

Proof of Theorem 1:
The proof presented below is similar to that of Theorem 1 in [19], except that the discretization procedure is avoided. In the following, we give a brief sketch of the proof and consider only the case $n = 1$ (the proof can be easily extended to $n > 1$). First, one needs to prove the following proposition.
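Proposition 1: Assume that the function $f(x, y)$ satisfies the following conditions: (1) it is nonnegative, continuous, and integrable in $x \in \mathbb{R}$ for each $y \in \mathbb{R}^m$; (2) it is generalized uniformly dominated in $x$, with dominant alignment $\varphi(y)$; (3) it is uniformly bounded in $(x, y)$.
Then, for any $g: \mathbb{R}^m \to \mathbb{R}$, we have:
$$H(m_\varphi) \le H(m_g),$$
where $H(m) = -\int_{\mathbb{R}} m(x) \log m(x)\, dx$ (here we extend the entropy definition to nonnegative $L^1$ functions), and:
$$m_g(x) = \int_{\mathbb{R}^m} f(x + g(y), y)\, dF(y).$$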

Proof of Proposition 1:
The above proposition can be readily proved using the following two lemmas.
Lemma 1 [19]: Assume the nonnegative function Then the following results hold: (c) For any Proof of Lemma 1: See [19].
Therefore, to prove Proposition 1, it suffices to prove the following lemma.
Lemma 2: The functions $m_\varphi$ and $m_g$ satisfy: (a) (b)
Proof of Lemma 2: (a) comes directly from the fact . We only need to prove (b).
We have: where where   .
 denotes the indicator function, and (a) follows from In the above proof, we adopt the convention 0 0    .

Q.E.D. (Proposition 1)
The proof of Proposition 1 is now complete. To finish the proof of Theorem 1, we have to remove the conditions of continuity and uniform boundedness imposed in Proposition 1. This can be easily accomplished by approximating $p(x|y)$ by a sequence of functions that satisfy these conditions. The remaining proof is omitted here, since it is exactly the same as the last part of the proof of Theorem 1 in [19].

Q.E.D. (Theorem 1)
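Example 1: Consider an additive noise model:
$$X = h(Y) + \xi,$$
where $\xi$ is an additive noise that is independent of $Y$. In this case, we have:
$$p(x|y) = p_\xi(x - h(y)).$$
Since the aligned functions $p(x + h(y) \,|\, y) = p_\xi(x)$ do not depend on $y$, the conditional PDF is generalized uniformly dominated with dominant alignment $h(y)$, and by Theorem 1, $h(Y)$ minimizes the error entropy. In fact, this result can also be proved by:
$$H(X - g(Y)) = H(\xi + h(Y) - g(Y)) \overset{(b)}{\ge} H(\xi) = H(X - h(Y)),$$
where (b) comes from the fact that $\xi$ and $h(Y) - g(Y)$ are independent (for independent random variables $U$ and $V$, the inequality $H(U + V) \ge \max\{H(U), H(V)\}$ holds, since $H(U + V) \ge H(U + V \,|\, V) = H(U \,|\, V) = H(U)$). In this example, the conditional PDF $p(x|y)$ is, obviously, not necessarily CSUM.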
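The additive-noise argument can be checked numerically. The sketch assumes a hypothetical instance: $Y$ uniform on $\{0, 1, 2\}$, $h(y) = y^2$, and exponential noise $\xi \sim \mathrm{Exp}(1)$, whose density is skewed, so $p(x|y)$ is not CSUM. It compares the error entropy of $g = h$ (which equals $H(\xi) = 1$ nat for this noise) with that of a perturbed estimator; all names and values are illustrative assumptions.

```python
import numpy as np

# Hypothetical instance: Y uniform on {0, 1, 2}, h(y) = y**2, xi ~ Exponential(1)
probs = [1 / 3, 1 / 3, 1 / 3]

def h(y):
    return float(y) ** 2

def p_xi(u):
    # Exponential(1) density; np.maximum avoids overflow in exp for u < 0
    return np.where(u >= 0, np.exp(-np.maximum(u, 0.0)), 0.0)

def error_entropy(g, dx=1e-3, lo=-10.0, hi=30.0):
    """Grid approximation of H(X - g(Y)), using p_g(x) = sum_y P(y) p_xi(x + g(y) - h(y))."""
    x = lo + (np.arange(int(round((hi - lo) / dx))) + 0.5) * dx
    p = sum(probs[y] * p_xi(x + g(y) - h(y)) for y in range(3))
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx

H_align = error_entropy(h)                       # g = h, the dominant alignment
offset = {0: 0.0, 1: 0.5, 2: -0.5}               # hypothetical perturbation of h
H_other = error_entropy(lambda y: h(y) + offset[y])
```

With $g = h$ the error is exactly $\xi$; the perturbed estimator turns the error into a mixture of shifted copies of $p_\xi$, whose entropy is strictly larger here.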
Example 2: Suppose the joint PDF of random variables X , Y ( , where 0 y  , exp( ) 1 x y  .Then the conditional PDF   p x y will be: One can easily verify that the above conditional PDF is non-symmetric but generalized uniformly dominated, with dominant alignment ( ) exp( ) ).By Theorem 1, the function exp( ) y  is the minimizer of error entropy.

Non-Uniqueness of Unbiased MEE Estimation
Because entropy is shift-invariant, the MEE estimator is obviously non-unique. In practical applications, in order to yield a unique solution, or to meet the desire for small error values, the MEE estimator is usually restricted to be unbiased, that is, the estimation error is restricted to be zero-mean [15]. The question of interest in this paper is whether the unbiased MEE estimator is unique. In [20], it has been shown that, for the singular case (in which the error entropy approaches minus infinity), the unbiased MEE estimation may yield non-unique (even infinitely many) solutions. In the following, we present two examples to show that this result still holds even for the nonsingular case.
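The shift-invariance argument can be written out in two lines: subtracting a constant $c$ from the error only translates its PDF, and the translation cancels under the change of variables $u = x + c$. Consequently, any estimator $g$ can be made unbiased, without changing its error entropy, by adding the constant $E[X - g(Y)]$; restricting to unbiased estimators therefore does not raise the minimum error entropy.

```latex
p_{e-c}(x) = p_e(x+c)
\;\Longrightarrow\;
H(e-c) = -\int p_e(x+c)\log p_e(x+c)\,dx = -\int p_e(u)\log p_e(u)\,du = H(e),

\tilde g(Y) = g(Y) + \mathbb{E}[X - g(Y)]
\;\Longrightarrow\;
\mathbb{E}[X - \tilde g(Y)] = 0, \qquad H(X - \tilde g(Y)) = H(X - g(Y)).
```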
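Example 3: Let the joint PDF of $X$ and $Y$ ($X, Y \in \mathbb{R}$) be a mixed-Gaussian density [20]. The resulting conditional PDF $p(x|y)$ of $X$ given $Y$ is symmetric around zero (but not unimodal in $x$). It can be shown that, for some values of the density's parameters, the MEE estimator of $X$ based on $Y$ does not equal zero (see [20], Example 3). In these cases, the MEE estimator will be non-unique, even if the error's PDF is restricted to be zero-mean (unbiased). This can be proved as follows. Let $g^*$ be an unbiased MEE estimator of $X$ based on $Y$. Then $g^*$ does not equal zero (otherwise, since adding a constant to an estimator does not change the error entropy, zero would minimize the error entropy over all estimators). Because $p(x|y)$ is symmetric in $x$ around zero, the estimator $-g^*$ yields the error PDF
$$p_{-g^*}(x) = \int p(x - g^*(y) \,|\, y)\, dF(y) = \int p(-x + g^*(y) \,|\, y)\, dF(y) = p_{g^*}(-x),$$
so its error is also zero-mean and has the same entropy as the error of $g^*$; that is, $-g^*$ is also an unbiased MEE estimator. Since $g^* \neq 0$, we have $-g^* \neq g^*$. Therefore, the unbiased MEE estimator must be non-unique. Obviously, the above result can be extended to more general cases. In fact, we have the following proposition.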
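The key property behind Example 3 is that, when $p(x|y)$ is symmetric in $x$ around zero, the estimators $g$ and $-g$ yield mirror-image error PDFs, $p_{-g}(x) = p_g(-x)$, and hence equal error entropies. The sketch checks this on a grid for a hypothetical stand-in (not the mixed-Gaussian density of [20]): $Y \in \{0, 1\}$ with equal probabilities and $p(x|y) = \frac{1}{2}N(x; m(y), s^2) + \frac{1}{2}N(x; -m(y), s^2)$, with an estimator chosen so that $E[g(Y)] = 0$.

```python
import numpy as np

def gauss_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

# Hypothetical symmetric, bimodal conditional PDFs; P(Y=0) = P(Y=1) = 1/2
probs = [0.5, 0.5]
m = [1.5, 2.0]
s = 0.5

def cond_pdf(x, y):
    return 0.5 * gauss_pdf(x, m[y], s) + 0.5 * gauss_pdf(x, -m[y], s)

def error_pdf(x, g):
    # p_g(x) = sum_y P(Y=y) p(x + g(y) | y) for a discrete Y
    return sum(probs[y] * cond_pdf(x + g[y], y) for y in range(2))

def grid_entropy(p, dx):
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx

dx = 1e-3
x = (np.arange(20000) + 0.5) * dx - 10.0   # symmetric midpoint grid on [-10, 10]

g = [0.7, -0.7]               # E[g(Y)] = 0, so the error X - g(Y) has zero mean
g_neg = [-v for v in g]       # the mirrored estimator -g

p_g, p_neg = error_pdf(x, g), error_pdf(x, g_neg)
H_g, H_neg = grid_entropy(p_g, dx), grid_entropy(p_neg, dx)
mean_g, mean_neg = np.sum(x * p_g) * dx, np.sum(x * p_neg) * dx
```

Because the grid is symmetric, reversing the array `p_g` evaluates it at $-x$, which is how the mirror relation is compared.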
Proposition 2: The unbiased MEE estimator will be non-unique if the conditional PDF $p(x|y)$ satisfies: (1) Symmetric in (2) There exists a function :
Proof: Similar to the proof presented above (omitted).
In the next example, we show that, for some particular situations, there can be even infinitely many unbiased MEE estimators.
Example 4: Suppose $Y$ is a discrete random variable with a Bernoulli distribution, and the conditional PDF of $X$ given $Y$ (see Figure 2) depends on a parameter $a > 0$. Note that this conditional PDF is uniformly dominated in $x \in \mathbb{R}$. Given an estimator $\hat{X} = g(Y)$, the error's PDF is a mixture of the two shifted conditional PDFs, as in (2). Let $g$ be an unbiased estimator; then $E[X - g(Y)] = 0$, and hence $g(0) = -g(1)$. In the following, we assume $g(0) \ge 0$ (due to symmetry, similar results can be obtained for $g(0) < 0$) and consider three cases, computing the error PDF and the error entropy in each. One can easily verify that the error entropy achieves its minimum value when $0 \le g(0) \le a/4$ (the first case). There are, therefore, infinitely many unbiased estimators that minimize the error entropy.
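The sketch below does not use the conditional PDF of Example 4; it uses a different, hypothetical pair of densities to show how a whole interval of unbiased estimators can share the same minimal error entropy. Assume $Y \in \{0, 1\}$ with equal probabilities, $X|Y=0$ uniform on $[-1, 1]$, and $X|Y=1$ uniform on $[-0.25, 0.25]$. Both conditional means are zero, so unbiasedness forces $g(1) = -g(0)$. As long as the narrow error density stays inside the wide one (here $|g(0)| \le 0.375$), the error PDF takes the same two values on sets of the same lengths, so its entropy does not depend on $g(0)$.

```python
import numpy as np

def uniform_pdf(x, half_width):
    """PDF of U[-half_width, half_width]."""
    return np.where(np.abs(x) <= half_width, 0.5 / half_width, 0.0)

def unbiased_error_entropy(g0, dx=1e-4, L=3.0):
    """Error entropy of the unbiased estimator g(0) = g0, g(1) = -g0 (hypothetical densities)."""
    g = (g0, -g0)
    x = (np.arange(int(round(2 * L / dx))) + 0.5) * dx - L   # midpoint grid on [-L, L]
    # Mixture of shifted conditional PDFs with P(Y=0) = P(Y=1) = 1/2
    p = 0.5 * uniform_pdf(x + g[0], 1.0) + 0.5 * uniform_pdf(x + g[1], 0.25)
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx

flat = [unbiased_error_entropy(g0) for g0 in (0.0, 0.1, 0.2, 0.3)]
outside = unbiased_error_entropy(0.6)
```

Inside the flat region the entropy is $-(0.625 \ln 1.25 + 0.375 \ln 0.25) \approx 0.380$ nats, while $g(0) = 0.6$, where the two supports only partly overlap, gives about $0.662$ nats.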

Conclusion
Two issues involved in minimum error entropy (MEE) estimation have been studied in this work. The first issue is which estimator minimizes the error entropy. In general, there is no explicit expression for the MEE estimator unless some constraints on the conditional distribution are imposed. In the past, several researchers have shown that if the conditional density is conditionally symmetric and unimodal (CSUM), then the conditional mean (or median) will be the MEE estimator. We extend these results to a more general case, and show that if the conditional densities are generalized uniformly dominated (GUD), then the dominant alignment will minimize the error entropy. The second issue is the non-uniqueness of the unbiased MEE estimation. It has been shown in a recent paper that, for the singular case (in which the error entropy approaches minus infinity), the unbiased MEE estimation may yield non-unique (even infinitely many) solutions. In this work, we show by examples that this result still holds even for the nonsingular case.

$X \in \mathbb{R}^n$, an unknown parameter to be estimated, and $Y \in \mathbb{R}^m$, the observation (or measurement), the MEE estimation of $X$ based on $Y$ can be formulated as in (1), where $\hat{X} = g(Y)$ denotes an estimator of $X$ based on $Y$, $g$ is a measurable function, $G$ stands for the collection of all measurable functions of $Y$, and $p_g(\cdot)$ is the probability density function (PDF) of the estimation error. Let $p(x|y)$ be the conditional PDF of $X$ given $Y$. Then:

Figure 1. Two examples in which two PDFs (solid and dotted lines) are uniformly dominated but not CSUM.
as $x \to \infty$. (b) For any function

We are now in a position to prove (11):

PDF. It is clear that $p(x|y)$ is generalized uniformly dominated. According to Theorem 1, we have where (c) comes from the fact that $p(x|y)$ is symmetric around zero, and further: , which contradicts the fact that *