Convergence of a Fixed-Point Minimum Error Entropy Algorithm

The minimum error entropy (MEE) criterion is an important learning criterion in information theoretic learning (ITL). However, the MEE solution cannot be obtained in closed form even for a simple linear regression problem, and one usually has to search for it iteratively. Fixed-point iteration is an efficient way to obtain the MEE solution. In this work, we study a fixed-point MEE algorithm for linear regression, focusing mainly on the convergence issue. We provide a sufficient condition (although a somewhat loose one) that guarantees the convergence of the fixed-point MEE algorithm. An illustrative example is also presented.


Introduction
In recent years, information theoretic measures, such as entropy and mutual information, have been widely applied in machine learning (so-called information theoretic learning (ITL) [1]) and signal processing [1,2]. A possible main reason for the success of ITL is that information theoretic quantities can capture higher-order statistics of the data and offer potentially significant performance improvement in machine learning applications [1]. Based on the Parzen window method [3], smooth and nonparametric information theoretic estimators can be applied directly to the data without imposing any a priori assumptions (say, the Gaussian assumption) about the underlying probability density functions (PDFs). In particular, Renyi's quadratic entropy estimator can be easily calculated by a double sum over samples [4][5][6][7]. In supervised learning, the entropy serves as a measure of similarity and follows a framework similar to that of the well-known mean square error (MSE) [1,2]. An adaptive system can be trained by minimizing the entropy of the error over the training dataset [4]. This learning criterion is called the minimum error entropy (MEE) criterion [1,2][8][9][10]. MEE may achieve much better performance than MSE, especially when the data are heavy-tailed or multimodal non-Gaussian [1,2,10].
However, the MEE solution cannot be obtained in closed form even when the system is a simple linear model such as a finite impulse response (FIR) filter. A practical approach is to search for the solution over the performance surface with an iterative algorithm. Usually, a simple gradient based search algorithm is adopted. With a gradient based learning algorithm, however, one has to select a proper learning rate (or step-size) to ensure stability and achieve a good tradeoff between misadjustment and convergence speed [4][5][6][7]. A more promising search algorithm is the fixed-point iterative algorithm, which is step-size free and is often much faster than gradient based methods [11]. Fixed-point algorithms have received considerable attention in machine learning and signal processing due to their desirable properties of low computational requirement and fast convergence speed [12][13][14][15][16][17].
Convergence is a key issue for an iterative learning algorithm. For the gradient based MEE algorithms, the convergence problem has already been studied and some theoretical results have been obtained [6,7]. For the fixed-point MEE algorithms, up to now there is still no study concerning the convergence. The goal of this paper is to study the convergence of a fixed-point MEE algorithm and provide a sufficient condition that ensures the convergence to a unique solution (the fixed point). It is worth noting that the convergence of a fixed-point maximum correntropy criterion (MCC) algorithm has been studied in [18]. The remainder of the paper is organized as follows. In Section 2, we derive a fixed-point MEE algorithm. In Section 3, we prove a sufficient condition that guarantees the convergence. In Section 4, we present an illustrative example. Finally, in Section 5, we give the conclusion.

Fixed-Point MEE Algorithm
Consider a simple linear regression (filtering) case where the error signal is

$$e(i) = d(i) - W^T X(i) \tag{1}$$

with $d(i) \in \mathbb{R}$ being a desired value at time $i$, $W = [w_1, w_2, \ldots, w_m]^T \in \mathbb{R}^m$ the weight vector, and $X(i) = [x_1(i), x_2(i), \ldots, x_m(i)]^T \in \mathbb{R}^m$ the input vector (i.e., the regressor). The goal is to find a weight vector such that the error signal is as small as possible.
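Under the MEE criterion, the optimal weight vector is obtained by minimizing the error entropy [1,2]. With Renyi's quadratic entropy, the MEE solution can be expressed as

$$W^* = \arg\min_{W}\left(-\log \int p_e^2(x)\,dx\right) = \arg\max_{W} \int p_e^2(x)\,dx \tag{2}$$

where $p_e(\cdot)$ denotes the PDF of the error signal. The quantity $V(e) = \int p_e^2(x)\,dx$ is also called the quadratic information potential (QIP) [1]. In a practical situation, however, the error distribution is usually unknown, and one has to estimate it from the error samples $\{e(1), e(2), \ldots, e(N)\}$, where $N$ denotes the sample number. Based on the Parzen window approach [3], the estimated PDF takes the form

$$\hat{p}_e(x) = \frac{1}{N}\sum_{i=1}^{N} \kappa(x - e(i)) \tag{3}$$

where $\kappa(\cdot)$ stands for a kernel function (not necessarily a Mercer kernel), satisfying $\kappa(x) \ge 0$ and $\int \kappa(x)\,dx = 1$. Unless mentioned otherwise, the kernel function is selected as a Gaussian kernel, given by

$$\kappa_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right) \tag{4}$$

where $\sigma$ denotes the kernel bandwidth. With a Gaussian kernel, the QIP can be simply estimated as

$$\hat{V}(e) = \int \hat{p}_e^2(x)\,dx = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j)) \tag{5}$$

Therefore, in practical situations, the MEE solution of (2) becomes

$$W^* = \arg\max_{W} \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j)) \tag{6}$$

Unfortunately, there is no closed form solution of (6). One can apply a gradient based iterative algorithm to search for the solution, starting from an initial point. Below we derive a fixed-point iterative algorithm, which is, in general, much faster than a gradient based method (although a gradient method can be viewed as a special case of the fixed-point methods, it involves a step-size parameter). Consider the following first order derivative:

$$\begin{aligned}
\frac{\partial \hat{V}(e)}{\partial W} &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\partial}{\partial W}\kappa_{\sqrt{2}\sigma}(e(i) - e(j)) \\
&= \frac{1}{2\sigma^2 N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(e(i) - e(j))\,[X(i) - X(j)] \\
&= \frac{1}{2\sigma^2}\left\{ \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)] - \left[\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T\right] W \right\} \\
&= \frac{1}{2\sigma^2}\left\{ P_{dX}^{MEE} - R_{XX}^{MEE}\, W \right\}
\end{aligned} \tag{7}$$

where

$$\begin{aligned}
R_{XX}^{MEE} &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T \\
P_{dX}^{MEE} &= \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)]
\end{aligned} \tag{8}$$

Let $\partial \hat{V}(e) / \partial W = 0$, and assume that the matrix $R_{XX}^{MEE}$ is invertible. Then, we obtain the following solution [15]:

$$W = \left[R_{XX}^{MEE}\right]^{-1} P_{dX}^{MEE} \tag{9}$$

The above solution is, in form, very similar to the well-known Wiener solution [19]. However, it is not a closed form solution, since both the matrix $R_{XX}^{MEE}$ and the vector $P_{dX}^{MEE}$ depend on the weight vector $W$ (note that $e(i)$ depends on $W$). Therefore, (9) is actually a fixed-point equation, which can also be expressed as $W = f(W)$, where

$$f(W) = \left[R_{XX}^{MEE}\right]^{-1} P_{dX}^{MEE} \tag{10}$$

The solution (fixed point) of the equation $W = f(W)$ can be found by the following iterative fixed-point algorithm:

$$W_{k+1} = f(W_k) \tag{11}$$

where $W_k$ denotes the estimated weight vector at iteration $k$. This algorithm is called the fixed-point MEE algorithm [15]. An online fixed-point MEE algorithm was also derived in [15]. In the next section, we will prove a sufficient condition under which the algorithm (11) surely converges to a unique fixed point.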
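The fixed-point iteration described in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names, the stopping tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced here, and the kernel is taken to be a Gaussian with bandwidth $\sqrt{2}\sigma$ evaluated on pairwise error differences.

```python
import numpy as np


def gaussian_kernel(u, sigma):
    """Gaussian kernel with bandwidth sqrt(2)*sigma, evaluated at u."""
    return np.exp(-u**2 / (4.0 * sigma**2)) / (2.0 * np.sqrt(np.pi) * sigma)


def fixed_point_mee(X, d, sigma, w0, tol=1e-8, max_iter=500):
    """Iterate W_{k+1} = R(W_k)^{-1} P(W_k) for a linear model.

    X: (N, m) inputs, d: (N,) desired values, w0: (m,) initial weights.
    tol and max_iter are illustrative choices (assumptions), not values
    taken from the paper. Returns the final weights and iteration count.
    """
    dX = X[:, None, :] - X[None, :, :]   # X(i) - X(j), shape (N, N, m)
    dd = d[:, None] - d[None, :]         # d(i) - d(j), shape (N, N)
    n2 = float(len(d)) ** 2
    w = np.asarray(w0, dtype=float)
    for k in range(1, max_iter + 1):
        e = d - X @ w                                    # current errors
        K = gaussian_kernel(e[:, None] - e[None, :], sigma)
        R = np.einsum("ij,ija,ijb->ab", K, dX, dX) / n2  # kernel-weighted matrix
        P = np.einsum("ij,ij,ija->a", K, dd, dX) / n2    # kernel-weighted vector
        w_new = np.linalg.solve(R, P)                    # assumes R is invertible
        if np.linalg.norm(w_new - w) < tol:
            return w_new, k
        w = w_new
    return w, max_iter
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse matrix explicitly; the pairwise differences are precomputed once because only the kernel weights change between iterations.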

Convergence of the Fixed-Point MEE
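The convergence of a fixed-point algorithm can be proved by the well-known contraction mapping theorem (also known as the Banach fixed-point theorem) [11]. According to the contraction mapping theorem, the convergence of the fixed-point MEE algorithm (11) is guaranteed if there exist $\beta > 0$ and $0 < \alpha < 1$ such that the initial weight vector satisfies $\|W_0\|_p \le \beta$, and for all $W \in \{W \in \mathbb{R}^m : \|W\|_p \le \beta\}$ it holds that

$$\begin{cases} \|f(W)\|_p \le \beta \\ \|\nabla_W f(W)\|_p \le \alpha \end{cases} \tag{12}$$

where $\|\cdot\|_p$ denotes an $l_p$-norm of a vector or an induced norm of a matrix, defined by $\|A\|_p = \max_{\|X\|_p \ne 0} \|AX\|_p / \|X\|_p$ with $p \ge 1$, and $\nabla_W f(W)$ denotes the $m \times m$ Jacobian matrix of $f(W)$ with respect to $W$, given by

$$\nabla_W f(W) = \left[\frac{\partial f(W)}{\partial w_1}, \frac{\partial f(W)}{\partial w_2}, \ldots, \frac{\partial f(W)}{\partial w_m}\right] \tag{13}$$

where

$$\begin{aligned}
\frac{\partial f(W)}{\partial w_s} &= \frac{\partial}{\partial w_s}\left(\left[R_{XX}^{MEE}\right]^{-1} P_{dX}^{MEE}\right) \\
&= -\left[R_{XX}^{MEE}\right]^{-1}\left(\frac{1}{2\sigma^2 N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} (e(i) - e(j))(x_s(i) - x_s(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T\right) f(W) \\
&\quad + \left[R_{XX}^{MEE}\right]^{-1}\left(\frac{1}{2\sigma^2 N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} (e(i) - e(j))(x_s(i) - x_s(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)]\right)
\end{aligned} \tag{14}$$

To obtain a sufficient condition that guarantees the convergence of the fixed-point MEE algorithm (11), we prove two theorems below.

Theorem 1. If

$$\beta > \xi = \frac{\sqrt{m}\sum_{i=1}^{N}\sum_{j=1}^{N} |d(i) - d(j)|\,\|X(i) - X(j)\|_1}{\lambda_{\min}\left[\sum_{i=1}^{N}\sum_{j=1}^{N} [X(i) - X(j)][X(i) - X(j)]^T\right]}$$

and $\sigma \ge \sigma^*$, where $\sigma^*$ is the solution of the equation $\varphi(\sigma) = \beta$, with

$$\varphi(\sigma) = \frac{\sqrt{m}\sum_{i=1}^{N}\sum_{j=1}^{N} |d(i) - d(j)|\,\|X(i) - X(j)\|_1}{\lambda_{\min}\left[\sum_{i=1}^{N}\sum_{j=1}^{N} \exp\left(-\frac{\left(\beta\|X(i) - X(j)\|_1 + |d(i) - d(j)|\right)^2}{4\sigma^2}\right)[X(i) - X(j)][X(i) - X(j)]^T\right]}, \quad \sigma \in (0, \infty) \tag{15}$$

then $\|f(W)\|_1 \le \beta$ for all $W \in \{W \in \mathbb{R}^m : \|W\|_1 \le \beta\}$.

Proof. The induced matrix norm is compatible with the corresponding vector $l_p$-norm, hence

$$\|f(W)\|_1 = \left\|\left[R_{XX}^{MEE}\right]^{-1} P_{dX}^{MEE}\right\|_1 \le \left\|\left[R_{XX}^{MEE}\right]^{-1}\right\|_1 \left\|P_{dX}^{MEE}\right\|_1 \tag{16}$$

where $\|[R_{XX}^{MEE}]^{-1}\|_1$ is the 1-norm (also referred to as the column-sum norm) of the inverse matrix $[R_{XX}^{MEE}]^{-1}$, which is simply the maximum absolute column sum of the matrix. According to matrix theory, the following inequality holds:

$$\left\|\left[R_{XX}^{MEE}\right]^{-1}\right\|_1 \le \sqrt{m}\,\left\|\left[R_{XX}^{MEE}\right]^{-1}\right\|_2 = \sqrt{m}\,\lambda_{\max}\left[\left[R_{XX}^{MEE}\right]^{-1}\right] \tag{17}$$

where $\|[R_{XX}^{MEE}]^{-1}\|_2$ is the 2-norm (also referred to as the spectral norm) of $[R_{XX}^{MEE}]^{-1}$, which equals the maximum eigenvalue of the matrix because the matrix is symmetric and positive definite. Further, we have

$$\lambda_{\max}\left[\left[R_{XX}^{MEE}\right]^{-1}\right] = \frac{1}{\lambda_{\min}\left[R_{XX}^{MEE}\right]} \overset{(a)}{\le} \frac{2\sqrt{\pi}\,\sigma N^2}{\lambda_{\min}\left[\sum_{i=1}^{N}\sum_{j=1}^{N} \exp\left(-\frac{\left(\beta\|X(i) - X(j)\|_1 + |d(i) - d(j)|\right)^2}{4\sigma^2}\right)[X(i) - X(j)][X(i) - X(j)]^T\right]} \tag{18}$$

where (a) comes from

$$|e(i) - e(j)| = \left|d(i) - d(j) - W^T[X(i) - X(j)]\right| \le \|W\|_1\,\|X(i) - X(j)\|_1 + |d(i) - d(j)| \le \beta\|X(i) - X(j)\|_1 + |d(i) - d(j)| \tag{19}$$

which gives a lower bound on every kernel weight $\kappa_{\sqrt{2}\sigma}(e(i) - e(j))$ and hence on $\lambda_{\min}[R_{XX}^{MEE}]$. In addition, it holds that

$$\left\|P_{dX}^{MEE}\right\|_1 = \left\|\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)]\right\|_1 \overset{(b)}{\le} \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} \kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,|d(i) - d(j)|\,\|X(i) - X(j)\|_1 \overset{(c)}{\le} \frac{1}{2\sqrt{\pi}\,\sigma N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} |d(i) - d(j)|\,\|X(i) - X(j)\|_1 \tag{20}$$

where (b) follows from the convexity of the vector $l_1$-norm, and (c) is because $\kappa_{\sqrt{2}\sigma}(x) \le 1/(2\sqrt{\pi}\,\sigma)$ for any $x$. Combining (16)-(18) and (20), we derive

$$\|f(W)\|_1 \le \varphi(\sigma) \tag{21}$$

Clearly, the function $\varphi(\sigma)$ is a continuous and monotonically decreasing function of $\sigma$ over $(0, \infty)$, satisfying $\lim_{\sigma \to 0^+} \varphi(\sigma) = \infty$ and $\lim_{\sigma \to \infty} \varphi(\sigma) = \xi$. Therefore, if $\beta > \xi$, the equation $\varphi(\sigma) = \beta$ has a unique solution $\sigma^*$ over $(0, \infty)$, and if $\sigma \ge \sigma^*$, then $\varphi(\sigma) \le \beta$, which completes the proof.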
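Theorem 2. If $\beta > \xi$ and $\sigma \ge \max\{\sigma^*, \sigma^\dagger\}$, where $\sigma^*$ is the solution of the equation $\varphi(\sigma) = \beta$, and $\sigma^\dagger$ is the solution of the equation $\psi(\sigma) = \alpha$ $(0 < \alpha < 1)$, with

$$\psi(\sigma) = \frac{\sqrt{m}\sum_{i=1}^{N}\sum_{j=1}^{N} \left(\beta\|X(i) - X(j)\|_1 + |d(i) - d(j)|\right)\|X(i) - X(j)\|_1\left(\beta\left\|[X(i) - X(j)][X(i) - X(j)]^T\right\|_1 + |d(i) - d(j)|\,\|X(i) - X(j)\|_1\right)}{2\sigma^2\,\lambda_{\min}\left[\sum_{i=1}^{N}\sum_{j=1}^{N} \exp\left(-\frac{\left(\beta\|X(i) - X(j)\|_1 + |d(i) - d(j)|\right)^2}{4\sigma^2}\right)[X(i) - X(j)][X(i) - X(j)]^T\right]}, \quad \sigma \in (0, \infty) \tag{22}$$

then it holds that $\|f(W)\|_1 \le \beta$ and $\|\nabla_W f(W)\|_1 \le \alpha$ for all $W \in \{W \in \mathbb{R}^m : \|W\|_1 \le \beta\}$.

Proof. By Theorem 1, we have $\|f(W)\|_1 \le \beta$. Since the 1-norm of a matrix is its maximum absolute column sum, to prove $\|\nabla_W f(W)\|_1 \le \alpha$ it suffices to prove $\|\partial f(W) / \partial w_s\|_1 \le \alpha$ for every $s$. From the expression of $\partial f(W) / \partial w_s$, we have

$$\begin{aligned}
\left\|\frac{\partial f(W)}{\partial w_s}\right\|_1 &\le \left\|\left[R_{XX}^{MEE}\right]^{-1}\right\|_1 \left\|\frac{1}{2\sigma^2 N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} (e(i) - e(j))(x_s(i) - x_s(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,[X(i) - X(j)][X(i) - X(j)]^T\right\|_1 \|f(W)\|_1 \\
&\quad + \left\|\left[R_{XX}^{MEE}\right]^{-1}\right\|_1 \left\|\frac{1}{2\sigma^2 N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} (e(i) - e(j))(x_s(i) - x_s(j))\,\kappa_{\sqrt{2}\sigma}(e(i) - e(j))\,(d(i) - d(j))\,[X(i) - X(j)]\right\|_1
\end{aligned} \tag{23}$$

Using $|e(i) - e(j)| \le \beta\|X(i) - X(j)\|_1 + |d(i) - d(j)|$, $|x_s(i) - x_s(j)| \le \|X(i) - X(j)\|_1$, $\kappa_{\sqrt{2}\sigma}(x) \le 1/(2\sqrt{\pi}\,\sigma)$, $\|f(W)\|_1 \le \beta$, and the bound on $\|[R_{XX}^{MEE}]^{-1}\|_1$ obtained in the proof of Theorem 1, we derive

$$\left\|\frac{\partial f(W)}{\partial w_s}\right\|_1 \le \psi(\sigma) \tag{24}$$

Obviously, $\psi(\sigma)$ is also a continuous and monotonically decreasing function of $\sigma$ over $(0, \infty)$, and satisfies $\lim_{\sigma \to 0^+} \psi(\sigma) = \infty$ and $\lim_{\sigma \to \infty} \psi(\sigma) = 0$. Therefore, given $0 < \alpha < 1$, the equation $\psi(\sigma) = \alpha$ has a unique solution $\sigma^\dagger$ over $(0, \infty)$, and if $\sigma \ge \sigma^\dagger$, then $\psi(\sigma) \le \alpha$. This completes the proof.

According to Theorem 2 and the Banach fixed-point theorem [11], given an initial weight vector satisfying $\|W_0\|_1 \le \beta$, the fixed-point MEE algorithm (11) will surely converge to a unique fixed point in the range $W \in \{W \in \mathbb{R}^m : \|W\|_1 \le \beta\}$ provided that the kernel bandwidth $\sigma$ is larger than a certain value. Moreover, the value of $\alpha$ $(0 < \alpha < 1)$ bounds the convergence rate: a smaller $\alpha$ implies faster convergence. It is worth noting that the derived sufficient condition is certainly somewhat loose, because several bounds are relaxed in the course of the proof.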
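To make the bandwidth condition concrete, the following sketch evaluates a bound function of the kind used in Theorem 1 and solves $\varphi(\sigma) = \beta$ by bisection. It rests on stated assumptions: $\varphi(\sigma)$ is taken to be $\sqrt{m}\sum_{i,j}|d(i)-d(j)|\,\|X(i)-X(j)\|_1$ divided by the smallest eigenvalue of the pairwise outer-product sum weighted by $\exp(-(\beta\|X(i)-X(j)\|_1 + |d(i)-d(j)|)^2 / (4\sigma^2))$, and $\xi$ is its limit as $\sigma \to \infty$. The function names and the bisection bracket are hypothetical choices made for this illustration.

```python
import numpy as np


def pairwise_terms(X, d):
    """Return all differences X(i)-X(j) as rows and d(i)-d(j) as a vector."""
    dX = (X[:, None, :] - X[None, :, :]).reshape(-1, X.shape[1])
    dd = (d[:, None] - d[None, :]).ravel()
    return dX, dd


def xi(X, d):
    """Assumed limit of phi as sigma -> infinity (all weights equal to one)."""
    dX, dd = pairwise_terms(X, d)
    num = np.sqrt(X.shape[1]) * np.sum(np.abs(dd) * np.abs(dX).sum(axis=1))
    return num / np.linalg.eigvalsh(dX.T @ dX)[0]


def phi(sigma, X, d, beta):
    """Assumed form of the bound phi(sigma) on the 1-norm of f(W)."""
    dX, dd = pairwise_terms(X, d)
    nx = np.abs(dX).sum(axis=1)                     # ||X(i) - X(j)||_1
    num = np.sqrt(X.shape[1]) * np.sum(np.abs(dd) * nx)
    wts = np.exp(-(beta * nx + np.abs(dd)) ** 2 / (4.0 * sigma**2))
    lam_min = np.linalg.eigvalsh((dX * wts[:, None]).T @ dX)[0]
    return num / lam_min if lam_min > 0 else np.inf


def solve_decreasing(g, target, lo, hi, iters=200):
    """Bisection for g(s) = target, assuming g is continuous and
    decreasing on [lo, hi] with g(lo) > target > g(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return hi
```

Bisection is sufficient here because the bound is monotonically decreasing in $\sigma$, so the root is unique whenever $\beta$ exceeds the limiting value returned by `xi`.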

Illustrative Example
In the following, we give an illustrative example to verify the derived sufficient condition that guarantees the convergence of the fixed-point MEE algorithm. Let us consider a simple linear model (28), in which $X(i)$ is a scalar input and $v(i)$ is an additive noise.
Table 1 shows the numbers of iterations for convergence with different kernel bandwidths (3.0, 1.0, 0.1, 0.05). The initial weight vector is set at $W_0 = 0.1$, and the iteration is terminated when a prescribed stop condition is satisfied. When the kernel bandwidth satisfies the sufficient condition, the fixed-point MEE algorithm will surely converge to a solution within a few iterations. When $\sigma$ becomes smaller, the algorithm may still converge, but the convergence speed becomes much slower. Note that when $\sigma$ is too small (e.g., $\sigma = 0.01$), the algorithm will diverge (the corresponding results are not shown in Table 1).
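An experiment of this kind can be sketched as follows for a scalar model. The noise variance (0.01), sample count (100), initial weight (0.1) and bandwidth list come from the text; the true weight `TRUE_WEIGHT`, the uniform input distribution, the random seed, the tolerance `tol` and the iteration cap are hypothetical choices made only for this illustration, so the iteration counts it produces are not those of Table 1.

```python
import numpy as np

# Hypothetical data-generating setup (assumptions, not taken from the paper).
TRUE_WEIGHT = 2.0
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 100)                       # scalar inputs
d = TRUE_WEIGHT * X + np.sqrt(0.01) * rng.standard_normal(100)

dX = X[:, None] - X[None, :]                          # X(i) - X(j)
dd = d[:, None] - d[None, :]                          # d(i) - d(j)


def f(w, sigma):
    """Scalar fixed-point map: kernel-weighted ratio P / R."""
    de = dd - w * dX                                  # e(i) - e(j)
    k = np.exp(-de**2 / (4.0 * sigma**2))             # constant factor cancels
    return np.sum(k * dd * dX) / np.sum(k * dX**2)


def run_fixed_point(sigma, w0=0.1, tol=1e-6, max_iter=2000):
    """Iterate w <- f(w) until |w_new - w| < tol (assumed stop rule)."""
    w = w0
    for k in range(1, max_iter + 1):
        w_new = f(w, sigma)
        if abs(w_new - w) < tol:
            return w_new, k
        w = w_new
    return w, max_iter


if __name__ == "__main__":
    for sigma in (3.0, 1.0, 0.1, 0.05):
        w, k = run_fixed_point(sigma)
        print(f"sigma={sigma}: w={w:.4f} after {k} iterations")
```

The kernel normalising constant is omitted because it appears in both numerator and denominator of the scalar map and therefore cancels.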

Conclusion
The MEE criterion has received increasing attention in signal processing and machine learning due to its desirable performance in adaptive system training, especially with non-Gaussian data. Many iterative optimization methods have been developed to minimize the error entropy for practical use. However, fixed-point algorithms have seldom been studied, and in particular, too little attention has been paid to the convergence of fixed-point MEE algorithms. This paper presented a theoretical study of this problem and proved a sufficient condition that guarantees the convergence of a fixed-point MEE algorithm. The results of this study may provide a possible range for choosing a kernel bandwidth for MEE learning. However, the derived sufficient condition may give a much larger kernel bandwidth than desired, because several bounds are relaxed in the derivation. In future work, we will try to derive a tighter sufficient condition that ensures the convergence of the fixed-point MEE algorithm.
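Here $\sigma^*$ is the solution of the equation $\varphi(\sigma) = \beta$; inequality (b) follows from the convexity of the vector $l_1$-norm, and (c) holds because the Gaussian kernel is bounded above by its peak value. Clearly, the function $\varphi(\sigma)$ in (21) is a continuous and monotonically decreasing function of $\sigma$ over $(0, \infty)$.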

In the model (28), the noise $v(i)$ is zero-mean Gaussian with variance 0.01. There are 100 training samples $\{X(i), d(i)\}_{i=1}^{100}$ generated from the system (28). Based on these data we calculate the quantities required by Theorems 1 and 2. Then, by solving the equations $\varphi(\sigma) = \beta$ and $\psi(\sigma) = \alpha$, we obtain $\sigma^* = 2.38$ and $\sigma^\dagger = 2.68$. Therefore, by Theorem 2, if $\sigma \ge 2.68$ the fixed-point MEE algorithm will converge to a unique solution in the range $-3 \le W \le 3$. Figures 1-3 illustrate the curves of the functions $W$, $f(W)$ and $df(W)/dW$ for different kernel bandwidths. For a bandwidth smaller than $\sigma^\dagger$, the algorithm may still converge to a unique solution in the range $-3 \le W \le 3$. This result confirms that the derived sufficient condition is somewhat loose (i.e., far from being necessary). The main reason is that several bounds are relaxed in the derivation. However, when $\sigma$ is too small, the contraction condition no longer holds, and in this case the algorithm may diverge.

Figure 1. Plots of the functions $W$, $f(W)$ and $df(W)/dW$ when $\sigma = 3.0$.

Figure 2. Plots of the functions $W$, $f(W)$ and $df(W)/dW$ when $\sigma = 0.1$.

Figure 3. Plots of the functions $W$, $f(W)$ and $df(W)/dW$ when $\sigma = 0.01$.

Table 1. Numbers of iterations for convergence with different kernel bandwidths $\sigma$.