Proportionate Minimum Error Entropy Algorithm for Sparse System Identification

Sparse system identification has received a great deal of attention due to its broad applicability. The proportionate normalized least mean square (PNLMS) algorithm, as a popular tool, achieves excellent performance for sparse system identification. In previous studies, most of the cost functions used in proportionate-type sparse adaptive algorithms are based on the mean square error (MSE) criterion, which is optimal only when the measurement noise is Gaussian. However, this condition does not hold in most real-world environments. In this work, we use the minimum error entropy (MEE) criterion, an alternative to the conventional MSE criterion, to develop the proportionate minimum error entropy (PMEE) algorithm for sparse system identification, which may achieve much better performance than the MSE-based methods, especially in heavy-tailed non-Gaussian situations. Moreover, we analyze the convergence of the proposed algorithm and derive a sufficient condition that ensures the mean square convergence. Simulation results confirm the excellent performance of the new algorithm.


Introduction
Sparse system identification is an active research area at present, with various real-world applications in network echo cancellation, wireless multipath channels, underwater acoustic communications, and so on [1,2]. A system is qualitatively classified as sparse if only a small percentage of its coefficients are active, while the other coefficients are insignificant (i.e., equal or close to zero). It is worth noting that in compressive sensing and sparse coding, the term "sparse vector" usually means that most of the elements are exactly zero. In system identification and adaptive filtering, however, the term "sparse system (or filter)" in general means that most of the coefficients are equal or close to zero. For a sparse system, classic adaptive filtering algorithms like least mean square (LMS) and normalized LMS (NLMS) [3] may perform poorly in terms of steady-state excess mean square error and convergence speed because they do not exploit the a priori sparsity knowledge, especially in applications with long sparse systems. As a newer scheme, the proportionate normalized least mean square (PNLMS) algorithm [4], which updates each filter coefficient in proportion to the magnitude of its estimate, has recently received a great deal of attention, and can perform much better than the conventional NLMS in the identification of sparse systems. Several improvements of the PNLMS algorithm have been proposed [5-8]. Moreover, several proportionate-type affine projection algorithms (APAs) have also been developed [9-12].
Most of the existing proportionate-type adaptive algorithms (such as PNLMS) are developed based on the well-known mean square error (MSE) criterion. The MSE criterion is computationally simple, mathematically tractable, and optimal when the data are Gaussian. However, when the data are non-Gaussian (especially when they are disturbed by impulsive noises or contain large outliers), the MSE may be a poor descriptor of optimality. Man-made low-frequency atmospheric noises and lightning spikes in natural phenomena can be described more accurately using non-Gaussian noise models [13,14]. From a statistical point of view, the MSE only takes into account the second-order statistics, which is insufficient to capture all possible information from the data. As a result, in non-Gaussian situations, the proportionate-type NLMS algorithms may perform poorly, especially in the presence of impulsive noises.
Information theoretic learning (ITL) provides an appropriate framework for dealing with non-Gaussian signal processing [15,16]. In ITL, the quadratic Renyi's entropy of the error was proposed as an alternative to the MSE. With the nonparametric Parzen window approach, the entropy can be easily estimated from the samples. Under the minimum error entropy (MEE) criterion, an adaptive system can be trained such that the error entropy between the model and the unknown system is minimized [17-25]. Since entropy captures higher-order statistics and the information content of signals rather than simply their energy, MEE-based adaptive algorithms may achieve significant performance improvements in non-Gaussian situations. In this work, we propose a novel proportionate algorithm for sparse system identification, called the proportionate minimum error entropy (PMEE) algorithm. Instead of using the MSE criterion, the new algorithm is derived based on the MEE criterion. The PMEE algorithm may perform much better than the PNLMS when identifying a sparse system with non-Gaussian noises. In a recent paper [26], we proposed three sparse adaptive filtering algorithms under the MEE criterion, namely ZAMEE, RZAMEE, and CIMMEE, which are derived by incorporating a sparsity penalty term into the MEE criterion. These algorithms also perform well for sparse system identification with non-Gaussian noises. However, simulation results in this work show that PMEE can outperform them.
The rest of the paper is organized as follows. In Section 2, after briefly introducing the MEE criterion, we derive the PMEE algorithm. In Section 3, we carry out the mean square convergence analysis. In Section 4, we present simulation results to confirm the excellent performance of the PMEE. Finally, in Section 5, we give the conclusion.

Minimum Error Entropy Criterion
Figure 1 depicts an adaptive filtering scheme under the MEE criterion. According to Figure 1, adaptive filtering can be formulated as minimizing the error entropy between the filter output and the desired response. Since entropy quantifies the average uncertainty or dispersion of a random variable, its minimization makes the error concentrated. Consider a linear system where the desired signal is generated by:

$d(n) = W^{*T} X(n) + v(n)$  (1)

where $W^*$ denotes the weight (parameter) vector of a finite impulse response (FIR) channel with $M$ being the memory size, $X(n)$ is the input vector, $(\cdot)^T$ denotes the transpose operator, and $v(n)$ stands for the interference or measurement noise. Assume that the adaptive filter is also an FIR filter with weight vector $W(n)$. Then the filtering error is:

$e(n) = d(n) - y(n) = d(n) - W^T(n) X(n)$  (2)

where $y(n)$ is the output of the adaptive filter at instant $n$. Let the filtering error be a random variable with probability density function (PDF) $p_e(\cdot)$. The quadratic Renyi's entropy of the error is:

$H_2(e) = -\log \int p_e^2(\xi)\, d\xi = -\log V(e)$  (3)

where $V(e) = \int p_e^2(\xi)\, d\xi$ is named the quadratic information potential (QIP) [17,18]. In practical applications, however, an analytical expression of the error entropy is in general not available; one has to estimate it directly from the error samples. By the Parzen window approach, the error PDF can be estimated as:

$\hat p_e(\xi) = \frac{1}{N}\sum_{i=1}^{N} \kappa_\sigma(\xi - e(i))$  (4)

where $N$ is the number of samples, and $\kappa_\sigma(\cdot)$ denotes a kernel function with bandwidth $\sigma$, satisfying $\kappa_\sigma(\xi) \ge 0$ and $\int \kappa_\sigma(\xi)\, d\xi = 1$. The most popular kernel function used in ITL is the Gaussian kernel:

$\kappa_\sigma(\xi) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\xi^2}{2\sigma^2}\right)$  (5)

In the rest of the paper, unless mentioned otherwise, we will use the Gaussian kernel. Combining Equations (3) and (4) yields:

$\hat H_2(e) = -\log \int \left( \frac{1}{N} \sum_{i=1}^{N} \kappa_\sigma(\xi - e(i)) \right)^{2} d\xi = -\log \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \int \kappa_\sigma(\xi - e(i))\, \kappa_\sigma(\xi - e(j))\, d\xi = -\log \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sigma\sqrt{2}}(e(i) - e(j))$  (6)

where the last equality follows from the fact that the convolution of two Gaussian kernels of bandwidth $\sigma$ is a Gaussian kernel of bandwidth $\sigma\sqrt{2}$. Accordingly, the empirical QIP is:

$\hat V(e) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sigma\sqrt{2}}(e(i) - e(j))$  (7)
Obviously, minimizing the quadratic Renyi entropy is equivalent to maximizing the QIP. Thus, the optimal weight vector under the MEE criterion can be formulated as:

$W_{MEE} = \arg\max_{W} \hat V(e) = \arg\max_{W} \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \kappa_{\sigma\sqrt{2}}(e(i) - e(j))$  (8)
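As an illustration, the empirical QIP and the resulting entropy estimate can be computed directly from a vector of error samples. The following is a minimal Python sketch; the function names are ours, not from the paper:

```python
import numpy as np

def gaussian_kernel(x, sigma):
    # Gaussian kernel: kappa_sigma(x) = exp(-x^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def quadratic_information_potential(e, sigma):
    # Empirical QIP: V(e) = (1/N^2) sum_i sum_j kappa_{sigma*sqrt(2)}(e(i) - e(j)).
    diffs = e[:, None] - e[None, :]          # all pairwise error differences
    return gaussian_kernel(diffs, sigma * np.sqrt(2.0)).mean()

def renyi_entropy(e, sigma):
    # Quadratic Renyi entropy estimate: H2(e) = -log V(e).
    return -np.log(quadratic_information_potential(e, sigma))
```

A tightly concentrated error vector yields a larger QIP (and hence a smaller entropy) than a widely dispersed one, which is exactly what the MEE criterion rewards.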

Proportionate Minimum Error Entropy
Before presenting the PMEE algorithm, a general form of the PNLMS-type algorithms is revisited. Generally, the weight update equation of the PNLMS-type algorithms can be expressed as [4,8]:

$W(n+1) = W(n) + \frac{\mu\, G(n) X(n) e(n)}{X^T(n) G(n) X(n) + \delta}$  (9)

where $\mu$ is a step-size parameter, $\delta > 0$ is a regularization parameter that prevents division by zero in Equation (9) and stabilizes the solution, and $G(n)$ is a diagonal matrix that modifies the step size of each tap according to a specific rule. In general, the matrix $G(n)$ is given by:

$G(n) = \mathrm{diag}\{g_1(n), g_2(n), \ldots, g_M(n)\}$  (10)

where:

$g_i(n) = \frac{\gamma_i(n)}{\sum_{j=1}^{M} \gamma_j(n)}$  (11)

$\gamma_i(n) = \max\big\{\varepsilon \max\{\varphi, |w_1(n)|, \ldots, |w_M(n)|\},\; |w_i(n)|\big\}$  (12)

The parameter $\varepsilon$ prevents the coefficients from stalling when they are much smaller than the largest one. The parameter $\varphi$ is an initialization parameter that prevents stalling of the weight updates at the initial stage when all the taps are initialized to zero.
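The gain rule of Equations (10)-(12) and the PNLMS update of Equation (9) can be sketched in a few lines. This is a minimal sketch; the function names, default parameter values, and the sum normalization of the gains are illustrative choices, not prescriptions from the paper:

```python
import numpy as np

def pnlms_gain_matrix(w, eps=0.01, phi=0.01):
    # Per-tap gains: gamma_i = max{ eps * max{phi, |w_1|, ..., |w_M|}, |w_i| },
    # normalized so the diagonal of G sums to 1 (some PNLMS variants divide
    # by the mean of gamma instead; the difference is absorbed into mu).
    abs_w = np.abs(w)
    gamma = np.maximum(eps * max(phi, abs_w.max()), abs_w)
    return np.diag(gamma / gamma.sum())

def pnlms_update(w, x, d, mu=0.5, delta=1e-4, eps=0.01, phi=0.01):
    # One PNLMS iteration: w <- w + mu * G x e / (x^T G x + delta).
    G = pnlms_gain_matrix(w, eps, phi)
    e = d - x @ w
    w_new = w + mu * (G @ x) * e / (x @ (G @ x) + delta)
    return w_new, e
```

Large-magnitude taps receive proportionally larger gains, which is what accelerates convergence on sparse systems.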
To develop the PMEE algorithm for sparse system identification, we use the error entropy instead of the squared error as the adaptation cost. According to Equation (8), a steepest ascent algorithm for estimating the weight vector can be derived as:

$W(n+1) = W(n) + \mu \nabla \hat V(e(n))$  (13)

where $\nabla \hat V(e(n))$ denotes the gradient of the QIP with respect to the weight vector, given by:

$\nabla \hat V(e(n)) = \frac{1}{2\sigma^2 L^2} \sum_{i=n-L+1}^{n} \sum_{j=n-L+1}^{n} \kappa_{\sigma\sqrt{2}}\big(\Delta e(i,j)\big)\, \Delta e(i,j)\, \big[X(i) - X(j)\big]$  (14)

where $\Delta e(i,j) = e(i) - e(j)$, and $L$ denotes the sliding data length. Hence, inspired by the PNLMS-type algorithms, we propose the following weight update equation:

$W(n+1) = W(n) + \mu\, G(n) \nabla \hat V(e(n))$  (15)

where $G(n)$ is determined by Equations (10)-(12). This algorithm is referred to as the PMEE algorithm.
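A single PMEE iteration over a sliding window can likewise be sketched as follows. This is a minimal sketch, not the paper's reference implementation: the Gaussian kernel's normalization constant is absorbed into the step size, and all names and default parameter values are illustrative:

```python
import numpy as np

def pmee_update(w, X, d, mu, sigma, eps=0.01, phi=0.01):
    # One PMEE iteration over the L most recent samples.
    # X: (L, M) matrix of input vectors; d: (L,) desired responses.
    L = len(d)
    e = d - X @ w
    de = e[:, None] - e[None, :]               # Delta_e(i, j) = e(i) - e(j)
    dX = X[:, None, :] - X[None, :, :]         # X(i) - X(j), shape (L, L, M)
    kern = np.exp(-de ** 2 / (4.0 * sigma ** 2))   # Gaussian kernel, width sigma*sqrt(2)
    # QIP gradient: (1/(2 sigma^2 L^2)) sum_{i,j} kern * Delta_e * (X(i) - X(j)),
    # with the kernel's constant factor folded into mu.
    grad = ((kern * de)[:, :, None] * dX).sum(axis=(0, 1)) / (2.0 * sigma ** 2 * L ** 2)
    # proportionate per-tap gains, same rule as PNLMS
    abs_w = np.abs(w)
    gamma = np.maximum(eps * max(phi, abs_w.max()), abs_w)
    g = gamma / gamma.sum()
    return w + mu * g * grad                   # G(n) is diagonal, so use elementwise product
```

Note that the gradient depends only on pairwise error differences, which is what makes the update insensitive to large isolated outliers: the Gaussian kernel downweights pairs with very large error gaps.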

Remark 1. Obviously, one can also propose a normalized version of PMEE by dividing the update term by $X^T(n) G(n) X(n) + \delta$, just as in Equation (9). However, our simulation results indicate that the normalized PMEE performs well only when the underlying system is extremely sparse. Thus, in this work, we do not consider the normalized PMEE. Other update rules may exist that provide better results, but this is beyond the scope of this work.

Energy Conservation Relation
We rewrite Equation (2) in a form of block data:

$e(n) = d(n) - X^T(n) W(n)$  (16)

where $X(n) = [X(n-L+1), \ldots, X(n)]$ is the $M \times L$ input matrix, and $d(n) = [d(n-L+1), \ldots, d(n)]^T$ and $e(n) = [e(n-L+1), \ldots, e(n)]^T$ collect the desired responses and errors over the sliding window. Now Equation (15) can be rewritten as:

$W(n+1) = W(n) + \mu\, G(n) X(n) h(e(n))$  (21)

where $h(e(n)) = [h_1(e(n)), h_2(e(n)), \ldots, h_L(e(n))]^T$, with:

$h_i(e(n)) = \frac{1}{\sigma^2 L^2} \sum_{j=1}^{L} \kappa_{\sigma\sqrt{2}}\big(e(n-L+i) - e(n-L+j)\big)\, \big(e(n-L+i) - e(n-L+j)\big)$

Define the weight-error vector $\tilde W(n) = W^* - W(n)$. Subtracting both sides of Equation (21) from $W^*$ yields the weight-error recursion:

$\tilde W(n+1) = \tilde W(n) - \mu\, G(n) X(n) h(e(n))$  (22)

Squaring both sides of Equation (22), we have:

$\|\tilde W(n+1)\|^2 = \|\tilde W(n)\|^2 - 2\mu\, h^T(e(n))\, X^T(n) G(n) \tilde W(n) + \mu^2\, \|G(n) X(n) h(e(n))\|^2$  (23)

After some simple manipulations, we derive:

$\|\tilde W(n+1)\|^2 = \|\tilde W(n)\|^2 - 2\mu\, h^T(e(n))\, e_a^G(n) + \mu^2\, \|G(n) X(n) h(e(n))\|^2$  (24)

where $e_a^G(n) = X^T(n) G(n) \tilde W(n)$ is the $G(n)$-weighted a priori error vector. Taking the expectations of both sides of Equation (24) leads to the energy conservation relation [19,23,26]:

$E\big[\|\tilde W(n+1)\|^2\big] = E\big[\|\tilde W(n)\|^2\big] - 2\mu\, E\big[h^T(e(n))\, e_a^G(n)\big] + \mu^2\, E\big[\|G(n) X(n) h(e(n))\|^2\big]$  (25)

where $E[\|\tilde W(n)\|^2]$ is called the weight error power (WEP) at iteration $n$.
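The energy balance above is, before taking expectations, a purely algebraic consequence of the weight-error recursion $\tilde W(n+1) = \tilde W(n) - \mu\, G(n) X(n) h(e(n))$, and can be checked numerically for arbitrary vectors. The sketch below uses randomly generated placeholder quantities rather than outputs of the actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(42)
M, L, mu = 6, 4, 0.3

w_err = rng.standard_normal(M)                  # weight-error vector W~(n)
X = rng.standard_normal((M, L))                 # block input matrix X(n)
h = rng.standard_normal(L)                      # error nonlinearity h(e(n))
G = np.diag(rng.uniform(0.1, 1.0, M))           # proportionate matrix G(n)

# weight-error recursion: W~(n+1) = W~(n) - mu * G X h
w_err_next = w_err - mu * G @ X @ h

lhs = w_err_next @ w_err_next
rhs = (w_err @ w_err
       - 2.0 * mu * h @ (X.T @ G @ w_err)       # cross term: G-weighted a priori error
       + mu ** 2 * np.linalg.norm(G @ X @ h) ** 2)  # excess (squared-norm) term

assert abs(lhs - rhs) < 1e-10
```

Taking expectations of both sides of this identity is what produces the energy conservation relation used in the convergence analysis.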

Sufficient Condition for Mean Square Convergence
Based on the energy conservation relation Equation (25), a sufficient condition that guarantees the mean square convergence (i.e., the monotonic decrease of the WEP) can be easily derived. Substituting Equation (25) into the requirement $E[\|\tilde W(n+1)\|^2] \le E[\|\tilde W(n)\|^2]$, where $\tilde W(n) = W^* - W(n)$, it follows that:

$\mu\, E\big[\|G(n) X(n) h(e(n))\|^2\big] \le 2\, E\big[h^T(e(n))\, e_a^G(n)\big]$  (26)

where $e_a^G(n) = X^T(n) G(n) \tilde W(n)$. Since $\mu \ge 0$, a sufficient condition for the mean square convergence will be:

$E\big[h^T(e(n))\, e_a^G(n)\big] > 0 \quad \text{and} \quad 0 < \mu \le \inf_n \frac{2\, E\big[h^T(e(n))\, e_a^G(n)\big]}{E\big[\|G(n) X(n) h(e(n))\|^2\big]}$  (27)

The above sufficient condition ensures that the WEP will be monotonically decreasing (hence the algorithm will not diverge).

Simulation Results
Now we present simulation results on sparse system identification to demonstrate the performance of the PMEE, compared with the ZAMEE, RZAMEE, CIMMEE [26] and PNLMS algorithms. The mean square deviation (MSD), calculated by $\mathrm{MSD} = E[\|W^* - W(n)\|^2]$, is used as the performance index. In the first two simulations, the measurement noise is drawn from an $\alpha$-stable distribution with parameter vector $V = (\alpha, \beta, \gamma, \delta)$, and the performance advantage of PMEE over the MSE-based algorithms shrinks when $\alpha$ is very close to 2.0. The main reason for this is that, when $\alpha$ comes near to 2.0, the noise will be approximately Gaussian. Simulation results confirm that the PMEE can effectively identify a sparse system in a non-Gaussian impulsive noise environment. In the third simulation, we investigate how the kernel width affects the convergence performance. The noise parameters are set at $V = (1.2, 0, 0.4, 0)$. Simulation results are shown in Figure 5, from which one can see that the kernel width has a significant influence on the convergence performance. When the kernel width is too large or too small, the performance becomes poor. In practical applications, the kernel width can be selected manually or optimized by trial and error.
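For reproducibility, symmetric $\alpha$-stable impulsive noise can be generated with the Chambers-Mallows-Stuck method, and the MSD evaluated directly. This is a sketch under two assumptions not spelled out in the text: the noise is symmetric ($\beta = 0$), and the scale argument plays the role of $\gamma^{1/\alpha}$ when $\gamma$ is read as a dispersion parameter:

```python
import numpy as np

def symmetric_alpha_stable(alpha, scale, size, rng):
    # Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise
    # (beta = 0, location 0). For alpha = 2 this reduces to a Gaussian;
    # smaller alpha gives heavier tails (more impulsive samples).
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    X = (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)
         * (np.cos(U - alpha * U) / W) ** ((1.0 - alpha) / alpha))
    return scale * X

def msd_db(w_true, w_est):
    # Mean square deviation MSD = ||W* - W(n)||^2, reported in dB.
    return 10.0 * np.log10(np.sum((w_true - w_est) ** 2))
```

With $\alpha = 1.2$ the generated samples exhibit the occasional very large spikes characteristic of impulsive noise, whereas $\alpha = 2.0$ produces purely Gaussian samples.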

Conclusions
In this work, the proportionate minimum error entropy (PMEE) algorithm has been developed to identify a sparse system. Different from the existing proportionate-type adaptive filtering algorithms, such as the proportionate normalized least mean square (PNLMS), PMEE is derived by using the minimum error entropy (MEE) instead of the traditional mean square error (MSE) as the adaptation criterion. A convergence analysis based on the energy conservation relation has been carried out, and a sufficient condition ensuring the mean square stability has been obtained. Simulation results have demonstrated the superior performance of the proposed algorithm, especially in impulsive non-Gaussian situations.
