A Proportionate Normalized Maximum Correntropy Criterion Algorithm with Correntropy Induced Metric Constraint for Identifying Sparse Systems

Abstract: A proportionate-type normalized maximum correntropy criterion (PNMCC) algorithm with a correntropy induced metric (CIM) zero-attraction term is presented, and its performance is discussed for identifying sparse systems. The proposed sparse algorithms combine the advantages of the proportionate scheme, the maximum correntropy criterion (MCC) algorithm, and zero-attraction theory. The CIM scheme is incorporated into the basic MCC to further exploit the sparsity of inherently sparse systems, yielding the CIM-PNMCC algorithm, whose derivation is given in detail. The proposed algorithms are evaluated on sparse systems in a non-Gaussian environment, and the simulation results show that the expanded normalized maximum correntropy criterion (NMCC) adaptive filter algorithms outperform squared-error-criterion proportionate algorithms such as the proportionate normalized least mean square (PNLMS) algorithm. The proposed algorithm can be used for estimating finite impulse response (FIR) systems with symmetric impulse responses to prevent phase distortion in communication systems.


Introduction
Sparse adaptive filtering (AF) algorithms have become an active research topic in recent decades, and they have been applied to sparse system identification, multi-path channel estimation, echo cancellation and underwater acoustic communications [1][2][3][4][5][6]. Additionally, AF has been used for improving electrocardiogram (ECG) detection during magnetic resonance imaging (MRI) [7]. Furthermore, many natural signals are sparse or can be regarded as sparse, such as underwater channels, multi-path channels in complex terrain environments, and echoes in high-definition television channels, in which only a few coefficients are dominant while most are unimportant [8][9][10][11]. In this case, the insignificant coefficients are usually inactive, i.e., zero or near zero. For such sparse signal processing, the traditional least mean square (LMS) and normalized LMS (NLMS) algorithms may perform poorly in terms of convergence speed and steady-state error (SSE), since these conventional adaptive filters do not use the prior sparse information of the systems. Fast-converging sparse AF algorithms are therefore desired in sparse finite impulse response (FIR) environments, which has led to the rapid development of proportionate-type (Pt) adaptive filters, including the PNLMS algorithm [2]. The PNLMS algorithm exploits the sparsity of the system by proportionally assigning gains according to the magnitudes of the estimated coefficients [2,3]. By using this technique, the convergence speed at the initial stage is improved. However, the PNLMS algorithm may converge more slowly than the traditional NLMS algorithm when estimating long sparse signals, because the small coefficients are assigned small gains, which increases the time needed to reach the steady-state mean square error (MSE) [3].
Another drawback of the typical PNLMS algorithm is that its convergence may be slower than that of the NLMS algorithm when the system is less sparse or dispersive. In the sequel, several improved PNLMS algorithms have been proposed to enhance its performance [3,4,12,13,14]. These low-complexity PNLMS algorithms rely on the squared error criterion, which is optimal for processing Gaussian data. All of these PNLMS algorithms mainly update their favored adaptive filter coefficients, which renders them suitable for sparse signal processing [12][13][14][15]. Moreover, the PNLMS algorithm reduces to the NLMS algorithm if all the coefficients are assigned the same gain. However, the PNLMS algorithm may perform poorly when the noise is non-Gaussian, such as low-frequency atmospheric noise [16]. The PNLMS algorithm and its variants use only second-order statistics to construct the cost function [2][3][4], which is not enough to capture all the information in the data. Therefore, the performance of these PNLMS algorithms degrades in non-Gaussian environments.
Along with the development of AF algorithms, the minimum error entropy (MEE) criterion was reported within the information theoretic learning framework for handling non-Gaussian signals [16][17][18][19][20][21][22][23]. In information theoretic learning, the quadratic Renyi's entropy has been used for data classification, channel modeling and data fusion [24][25][26], and it provides an alternative to the MSE [16]. The entropy can be estimated well from samples by using the non-parametric Parzen window method. MEE adaptive filters have then been used for system identification, obtaining better estimation performance in non-Gaussian environments [23]. This is because the entropy employs high-order statistics and exploits the information content of the signal rather than only its power. Subsequently, the proportionate MEE (PMEE) algorithm was proposed to utilize the sparsity of natural systems [16]. However, the PMEE algorithm has a high computational burden, which makes it unsuitable for practical engineering applications. To reduce the computational burden, the maximum correntropy criterion (MCC) was developed, using a similarity function as a novel cost function [27]. The MCC has a complexity comparable to the basic LMS algorithm, while providing robustness similar to that of the MEE algorithm. Thereby, the MCC algorithm is a suitable choice for practical engineering applications in non-Gaussian environments. However, the MCC does not use the sparse-structure information in the existing sparse system. Although a Pt technique was integrated into the normalized MCC (NMCC) to form a proportionate NMCC (PNMCC) algorithm for handling sparse signals, the PNMCC's convergence becomes slow after the early iterations, and the PNMCC cannot fully use the sparse information of natural sparse signals.
In this paper, an improved proportionate normalized MCC (PNMCC) algorithm and a correntropy induced metric (CIM) penalized PNMCC (CIM-PNMCC) algorithm are proposed to fully exploit the inherent sparsity characteristics of sparse systems. The developed PNMCC algorithm is realized by incorporating the proportionate-type technique, the normalization method and a generalized Gaussian density function (GGDF) into the MCC algorithm, while the CIM-PNMCC algorithm is implemented by integrating the CIM theory into the newly presented PNMCC algorithm to form an effective zero attractor. The proposed PNMCC and CIM-PNMCC algorithms are derived mathematically in detail and are used for estimating sparse systems. These two algorithms perform better than the PNLMS algorithm for identifying sparse systems in non-Gaussian noise.
The rest of this paper is organized as follows. We review the MCC and zero-attraction methods in Section 2. In Section 3, the PNMCC and CIM-PNMCC algorithms are proposed. The performance of the developed algorithms for sparse system identification is discussed in Section 4. Finally, we conclude the paper.
Notations:
‖·‖            l2-norm
(·)^T          Transpose operation for a matrix or a vector
x (bold font)  Vector or matrix

Conventional MCC
We consider sparse system identification within the MCC framework shown in Figure 1. From Figure 1, we can see that the goal of MCC-based adaptive filtering is to minimize the error entropy of e(n), which is the difference between the output y(n) and the desired signal d(n) under non-Gaussian noise v(n). As we know, entropy quantifies the uncertainty of a random variable, and hence minimizing it concentrates the errors. Herein, we discuss the MCC algorithm based on the linear system given in Figure 1. From system identification theory, d(n) is given by:

d(n) = x^T(n) w_o(n) + v(n),    (1)

where x(n) = [x(n), x(n − 1), · · · , x(n − N + 1)]^T represents the input vector, w_o(n) = [w_0, w_1, · · · , w_{N−1}]^T denotes the unknown system, which is an FIR channel, T is the transpose operation, and v(n) denotes the non-Gaussian noise or interference signal. Here, the memory size of the channel is N. In the AF framework, the estimated output is y(n) = x^T(n)w(n), and hence e(n) is given by:

e(n) = d(n) − y(n) = d(n) − x^T(n)w(n),    (2)

where w(n) represents the estimated system. The MCC algorithm solves the constrained problem [23,27]:

min_{w(n+1)} ‖w(n + 1) − w(n)‖²  subject to  ê(n) = [1 − ξ exp(−e²(n)/(2σ²))] e(n).    (3)

Herein, ê(n) = d(n) − x^T(n)w(n + 1), ‖·‖ represents the l2 norm, σ > 0 denotes the Gaussian kernel width, and we use ξ = χ_MCC ‖x(n)‖², where χ_MCC is a step-size parameter.
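As a quick illustration of this signal model, the following sketch (Python/NumPy, with hypothetical variable names and values) builds a length-16 sparse system with a single dominant tap and forms d(n) = x^T(n)w_o + v(n). Plain Gaussian noise is used here for simplicity; the impulsive Gaussian-mixture noise of the experiments is discussed in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 16            # memory size of the FIR channel
n_samples = 1000

# Sparse unknown system w_o: a single dominant tap, all others zero.
w_o = np.zeros(N)
w_o[3] = 1.0      # tap position chosen arbitrarily for this sketch

x = rng.standard_normal(n_samples)        # input signal x(n)
v = 0.1 * rng.standard_normal(n_samples)  # additive noise v(n)

# d(n) = x^T(n) w_o + v(n), with x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T
d = np.empty(n_samples)
for n in range(n_samples):
    x_vec = np.array([x[n - k] if n >= k else 0.0 for k in range(N)])
    d[n] = x_vec @ w_o + v[n]
```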
To solve Equation (3), the Lagrange multiplier (LM) method is used in this paper. Therefore, the MCC's cost function is [27]:

J(n + 1) = ‖w(n + 1) − w(n)‖² + λ{ê(n) − [1 − ξ exp(−e²(n)/(2σ²))] e(n)},    (4)

where λ denotes the Lagrange multiplier. Setting the gradient of J(n + 1) with respect to w(n + 1) to zero, we have:

2[w(n + 1) − w(n)] − λx(n) = 0.

Thus, we can get the following formula:

w(n + 1) = w(n) + (λ/2) x(n).    (5)

Here, λ is obtained by substituting Equation (5) into the constraint of Equation (3):

λ = 2ξ exp(−e²(n)/(2σ²)) e(n) / ‖x(n)‖².    (6)

Substituting Equation (6) into Equation (5), the update of the MCC is [23,27]:

w(n + 1) = w(n) + χ_MCC exp(−e²(n)/(2σ²)) e(n) x(n).    (7)

Compared to the basic LMS, an exponential weighting is used in the MCC to suppress large errors. As a result, the MCC algorithm is robust against impulsive noise in system identification.
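The MCC update of Equation (7) can be sketched in a few lines; this is a minimal illustration, not the paper's code, and the parameter values are placeholders. Note how the exponential weighting freezes the update when an impulsive error appears.

```python
import numpy as np

def mcc_update(w, x_vec, d_n, chi_mcc=0.05, sigma=1.0):
    # Equation (7): w(n+1) = w(n) + chi_MCC * exp(-e^2/(2 sigma^2)) * e * x
    e = d_n - x_vec @ w                            # a-priori error e(n)
    weight = np.exp(-e ** 2 / (2.0 * sigma ** 2))  # exponential weighting
    return w + chi_mcc * weight * e * x_vec

x_vec = np.array([1.0, 0.5, -0.3, 0.2])
w1 = mcc_update(np.zeros(4), x_vec, d_n=1.0)    # ordinary error: normal step
w2 = mcc_update(np.zeros(4), x_vec, d_n=100.0)  # impulsive error: step nearly vanishes
```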

Zero Attracting Technique
From the update in Equation (7), it can be found that the MCC system identification method does not use the sparse-structure information in the sparse system w_o(n). To exploit this information, zero-attracting variants append an attraction term to the update, so that the MCC-based system identification method can be written in the generic form [23,27]:

w(n + 1) = w(n) + adaptation term + zero-attraction term,

where the adaptation term is the MCC update of Equation (7) and the zero attractor pulls the inactive coefficients toward zero.
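As a concrete (hypothetical) instance of this generic form, the ZA-MCC variant compared in Section 4 can be sketched with an l1-induced sign attractor in the style of ZA-LMS; the ρ_ZA value below is a placeholder.

```python
import numpy as np

def za_mcc_update(w, x_vec, d_n, chi=0.05, sigma=1.0, rho_za=5e-4):
    e = d_n - x_vec @ w
    weight = np.exp(-e ** 2 / (2.0 * sigma ** 2))
    # MCC adaptation term plus a uniform zero attractor -rho_za * sgn(w),
    # which pulls every coefficient toward zero by the same small amount.
    return w + chi * weight * e * x_vec - rho_za * np.sign(w)
```

The uniform attraction is exactly the weakness noted later for the ZA-MCC: it shrinks active and inactive coefficients alike.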

Proportionate NMCC (PNMCC) Algorithm
In order to derive the PNMCC, we revisit the MCC in the AF framework. Here, the NMCC also solves Equation (3); the difference between the MCC and NMCC algorithms is that we use ξ = χ in the NMCC algorithm in place of ξ = χ_MCC ‖x(n)‖² in the MCC. Accordingly, the NMCC implements the following update equation:

w(n + 1) = w(n) + χ exp(−e²(n)/(2σ²)) e(n) x(n) / ‖x(n)‖².    (10)

Then, we use a reweighting control matrix G(n) to reassign weights according to the magnitudes of the filter coefficients, an idea borrowed from the known Pt algorithms. Introducing G(n) into Equation (10), we get:

w(n + 1) = w(n) + χ exp(−e²(n)/(2σ²)) G(n) e(n) x(n) / [x^T(n) G(n) x(n) + ϑ].    (11)

Here, ϑ > 0 is a small constant that ensures a stable solution. The diagonal matrix G(n) modifies the NMCC's step size according to the following rule. In general, G(n) is [2][3][4]:

G(n) = diag(g_0(n), g_1(n), · · · , g_{N−1}(n)),    (12)

where the individual gain g_i(n) is:

g_i(n) = κ_i(n) / [(1/N) Σ_{j=0}^{N−1} κ_j(n)],  with  κ_i(n) = max{ρ_p max[γ_g, |w_0(n)|, · · · , |w_{N−1}(n)|], |w_i(n)|}.    (13)

Here, ρ_p > 0 and γ_g > 0 are constants, usually set to ρ_p = 0.01 and γ_g = 5/N [2]. The parameter ρ_p prevents a coefficient w_i(n) from stalling when it is much smaller than the largest coefficient, while γ_g prevents the coefficients from stalling at the initial stage when all of them are zero.
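The gain rule and the PNMCC update described above can be sketched as follows (Python/NumPy). The Duttweiler-style form of the gain rule is an assumption here, and the step-size values are placeholders.

```python
import numpy as np

def proportionate_gains(w, rho_p=0.01, gamma_g=None):
    # kappa_i = max(rho_p * max(gamma_g, |w_0|, ..., |w_{N-1}|), |w_i|)
    N = len(w)
    if gamma_g is None:
        gamma_g = 5.0 / N
    w_abs = np.abs(w)
    kappa = np.maximum(rho_p * max(gamma_g, w_abs.max()), w_abs)
    # g_i = kappa_i / ((1/N) * sum_j kappa_j), so the mean gain is 1
    return kappa / kappa.mean()

def pnmcc_update(w, x_vec, d_n, chi=0.3, sigma=1.0, vartheta=0.01):
    g = proportionate_gains(w)                 # diagonal of G(n)
    e = d_n - x_vec @ w
    weight = np.exp(-e ** 2 / (2.0 * sigma ** 2))
    denom = x_vec @ (g * x_vec) + vartheta     # x^T(n) G(n) x(n) + vartheta
    return w + chi * weight * e * g * x_vec / denom
```

Because the mean gain is normalized to 1, a dominant tap receives a gain well above 1 while near-zero taps receive small but non-vanishing gains.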
Finally, we propose a CIM-penalized PNMCC (CIM-PNMCC) algorithm, which is realized by introducing a CIM penalty into the PNMCC's cost function to construct a sparse PNMCC. We also take the gain control matrix into consideration to design the desired zero attractor. Following the zero-attracting sparse AF algorithms, the developed CIM-PNMCC minimizes the cost in Equation (20), where G^{-1}(n) denotes the inverse of G(n), and γ_CIM > 0 is a regularization parameter that balances the system identification error against the CIM penalty on w(n + 1). Our devised CIM-PNMCC differs from general zero-attracting algorithms in that it provides a scaling by G^{-1}(n) on the CIM penalty.
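The CIM itself is not reproduced in this excerpt. A commonly used Gaussian-kernel form, assumed here (the paper's exact normalization may differ), together with the gradient that later serves as the zero attractor, is:

```latex
% CIM between the weight vector w and the zero vector, with a Gaussian
% kernel of width \varepsilon (assumed standard form):
\mathrm{CIM}^{2}(\mathbf{w},\mathbf{0})
   = \kappa(0)-\frac{1}{N}\sum_{i=0}^{N-1}\kappa(w_i),
\qquad
\kappa(t)=\frac{1}{\sqrt{2\pi}\,\varepsilon}
          \exp\!\Bigl(-\frac{t^{2}}{2\varepsilon^{2}}\Bigr),

% elementwise gradient: strong attraction for small w_i,
% negligible attraction for large (active) w_i
\frac{\partial\,\mathrm{CIM}^{2}(\mathbf{w},\mathbf{0})}{\partial w_i}
   = \frac{w_i}{\sqrt{2\pi}\,\varepsilon^{3}N}
     \exp\!\Bigl(-\frac{w_i^{2}}{2\varepsilon^{2}}\Bigr).
```

This gradient is what makes the CIM an l0-norm-like penalty: taps far from zero contribute almost nothing to the attraction.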
To minimize Equation (20), the LM method is employed; the proposed CIM-PNMCC's cost function is given in Equation (21). Taking the gradients of J(n + 1) with respect to w(n + 1) and λ yields Equations (22) and (23). Multiplying both sides of Equation (23) by x^T(n) and considering Equation (22), the Lagrange multiplier is obtained in Equation (26). Considering Equations (26) and (23), we get Equation (27). From the update in Equation (27), we can see that the elements of G(n)x(n)x^T(n){x^T(n)G(n)x(n)}^{-1} are very small compared with 1; ignoring this term yields the simplified update in Equation (28). We then introduce a step size β and set ε_CIM = δ²_x/N, so the proposed CIM-PNMCC's update equation becomes:

w̃(n + 1) = w(n) + χ₁ exp(−e²(n)/(2σ²)) G(n) e(n) x(n) / [x^T(n) G(n) x(n) + ϑ] − ρ_CIM ∂CIM²(w(n), 0)/∂w(n),    (29)

where χ₁ = ξβ and ρ_CIM = βγ_CIM is a regularization parameter used for controlling the zero-attraction strength. Similar to the earlier reported zero-attracting algorithms, there is an additional term in the CIM-PNMCC algorithm, called the CIM zero attractor. In the devised CIM-PNMCC, the constructed CIM zero attractor attracts small coefficients to zero with high probability. Furthermore, our proposed CIM-PNMCC algorithm employs a gain control matrix to assign a large step size to the dominant coefficients, while the zero attractor mainly applies a strong zero attraction to the inactive coefficients. Therefore, our CIM-PNMCC algorithm combines the advantages of proportionate-type algorithms, normalized adaptive filtering algorithms and zero-attraction adaptive filters. From the derivation of the CIM-PNMCC, we make the following remarks:
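The complete CIM-PNMCC iteration can be sketched as follows (Python/NumPy). The Gaussian-kernel CIM gradient and the Duttweiler-style gain rule are assumptions made for this sketch, and all parameter values are placeholders.

```python
import numpy as np

def cim_pnmcc_update(w, x_vec, d_n, chi1=0.3, sigma=1.0, vartheta=0.01,
                     rho_cim=5e-5, eps_cim=0.05, rho_p=0.01):
    N = len(w)

    # Proportionate gains g_i(n) (Duttweiler-style rule assumed).
    w_abs = np.abs(w)
    kappa = np.maximum(rho_p * max(5.0 / N, w_abs.max()), w_abs)
    g = kappa / kappa.mean()

    # Normalized, exponentially weighted adaptation term.
    e = d_n - x_vec @ w
    weight = np.exp(-e ** 2 / (2.0 * sigma ** 2))
    denom = x_vec @ (g * x_vec) + vartheta      # x^T(n) G(n) x(n) + vartheta

    # CIM zero attractor: gradient of CIM^2(w, 0) under a Gaussian kernel;
    # small coefficients are pulled hard toward zero, large ones barely move.
    attractor = w * np.exp(-w ** 2 / (2.0 * eps_cim ** 2)) \
                / (np.sqrt(2.0 * np.pi) * eps_cim ** 3 * N)

    return w + chi1 * weight * e * g * x_vec / denom - rho_cim * attractor
```

With a zero input vector the adaptation term vanishes, which isolates the attractor: a small tap (0.02) is pulled toward zero while a dominant tap (1.0) is essentially untouched.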

1.
A PNMCC algorithm is devised by using a generalized Gaussian distribution function to utilize the prior sparse-structure information in natural sparse systems.

2.
A CIM constraint is adopted and incorporated into the proposed PNMCC's cost function to create a modified cost function.

3.
The derivation of the devised CIM-PNMCC algorithm is presented by the use of the LM method to further take advantage of the prior sparse-structure information.

4.

The convergence of the CIM-PNMCC is analyzed and its performance is discussed for identifying sparse systems, in comparison with the previous MCC algorithms.

5.

Our developed CIM-PNMCC outperforms the previous MCC algorithms in terms of convergence and MSD.

Convergence Analysis of the Devised CIM-PNMCC
The mean and mean square convergence analysis of the CIM-PNMCC is performed based on an approximation approach. For simplicity of description, our proposed CIM-PNMCC is rewritten in a compact form, where f(e(n)) = exp(−e²(n)/(2σ²)). To simplify the convergence analysis, we adopt some assumptions that are frequently used in [51][52][53][54].
We make the following assumptions (A):

A1: x(n) is i.i.d. with zero mean.
A2: v(n) is independent of x(n), zero mean, with variance σ²_v.
A3: f(e(n)) is independent of x(n).
A4: w̃(n) and m(n) are independent of x(n).
A5: The expectation of a ratio of two random variables equals the ratio of their expectations, and the relevant expectations exist.

Mean Square Convergence (MSC)
For the MSC of our CIM-PNMCC, we consider the auto-covariance of w̃(n) defined in Equation (38). Then, considering Equations (39), (32) and (33), we obtain Equation (40), with the auxiliary terms Y(n) and B(n) defined accordingly. Under Assumptions A1-A5, Y(n) and B(n) are both zero mean. In addition, w̃(n), x(n) and v(n) are independent of each other. Substituting Equation (40) into Equation (38) gives Equation (42), where ϕ(n) = E[w̃(n)]. Since the fourth-order moment of a Gaussian variable is three times the square of its variance [52], and S(n) is symmetric [52], we obtain Equation (43). Moreover, using Equations (31) and (39) together with B(n), we have Equation (44). According to Equations (42)-(44), we arrive at Equation (45). In Equation (45), ϕ(n), E[w̃(n)] and B(n) are all bounded; thus, E[w̃(n)B^T(n)] converges. The CIM-PNMCC is then stable when the condition in Equation (46) holds, and solving Equation (46) gives the admissible range of χ₁ in Equation (47).

Results and Discussions of the PNMCC and CIM-PNMCC Algorithms
We now discuss the behavior of the proposed PNMCC and CIM-PNMCC algorithms for identifying sparse systems. Their system identification performance is analyzed by comparison with the NMCC, MCC, RZA-MCC and ZA-MCC algorithms. Additionally, the PNLMS algorithm is employed for comparison, since the MCC is an LMS-like algorithm and the PNLMS is a proportionate-type NLMS, i.e., a PNMCC-like algorithm. The MSD is employed for evaluating the behavior of the devised PNMCC and CIM-PNMCC algorithms, defined by:

MSD(n) = E[‖w_o − w(n)‖²].

Based on the derivation of the PNMCC and CIM-PNMCC algorithms, χ has an important effect on the PNMCC algorithm, while χ₁ and ρ_CIM play important roles in the CIM-PNMCC algorithm. Thereby, we first investigate their effects on the estimation behavior over sparse systems with impulsive noise. In this paper, the desired noise is generated from the Gaussian mixture (1 − θ)N(ι₁, ν₁²) + θN(ι₂, ν₂²) with (ι₁, ν₁², ι₂, ν₂², θ) = (0, 0.01, 0, 20, 0.05), where N(ι_i, ν_i²) (i = 1, 2) are Gaussian distributions with means ι_i and variances ν_i², and θ is the mixture parameter. Herein, a sparse system of length N = 16 is adopted, with one non-zero tap randomly located in the system. To investigate the effects of χ on the PNMCC algorithm, the simulation parameters are ϑ = 0.01 and σ = 1000, with the corresponding results presented in Figure 2. The developed PNMCC converges faster as χ increases from χ = 0.3 to χ = 1; χ plays a role similar to the step size in the PNLMS algorithm.
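The MSD metric and the Gaussian-mixture impulsive noise described above can be sketched as follows (hypothetical helper names; the mixture parameters are those stated in the text, and in practice the expectation in the MSD is approximated by ensemble averaging over independent runs).

```python
import numpy as np

def msd_db(w_o, w_est):
    # Instantaneous squared deviation ||w_o - w(n)||^2, reported in dB;
    # average over independent runs to approximate MSD(n) = E||w_o - w(n)||^2.
    return 10.0 * np.log10(np.sum((w_o - w_est) ** 2))

def mixture_noise(n, theta=0.05, var1=0.01, var2=20.0, rng=None):
    # (1 - theta) * N(0, var1) + theta * N(0, var2): mostly small background
    # noise, with occasional large impulses from the high-variance component.
    rng = np.random.default_rng() if rng is None else rng
    impulsive = rng.random(n) < theta
    return np.where(impulsive,
                    np.sqrt(var2) * rng.standard_normal(n),
                    np.sqrt(var1) * rng.standard_normal(n))
```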
Similarly, the parameter χ₁ is varied to understand the performance of the derived CIM-PNMCC, which is described in Figure 3. The CIM-PNMCC's convergence becomes faster as χ₁ increases from 0.2 to 0.7; however, the estimation misalignment worsens as χ₁ grows. Additionally, ρ_CIM is the zero-attraction control parameter, which trades off sparsity exploitation against estimation bias. The effects of ρ_CIM on the CIM-PNMCC estimation behavior are illustrated in Figure 4. It is observed that the estimation bias of the CIM-PNMCC is reduced as ρ_CIM decreases from 9 × 10⁻⁴ to 5 × 10⁻⁵. If ρ_CIM is further reduced from 5 × 10⁻⁵ to 5 × 10⁻⁷, the estimation bias increases again. Therefore, proper parameter values should be chosen to obtain good performance from the PNMCC and CIM-PNMCC algorithms. These parameters are then used in the convergence investigation of the developed PNMCC and CIM-PNMCC algorithms, where χ_NMCC, χ_MCC, μ_RZA, μ_ZA and μ_PNLMS are the step sizes of the NMCC, MCC, RZA-MCC, ZA-MCC and PNLMS algorithms, and ρ_RZA and ρ_ZA are the regularization parameters of the RZA-MCC and ZA-MCC algorithms, respectively. The convergence comparisons for the PNMCC and CIM-PNMCC algorithms are given in Figure 5, from which we observe that the PNMCC converges faster than the PNLMS and NMCC algorithms. It is worth noting that our developed CIM-PNMCC achieves the fastest convergence for adaptive system identification. The CIM-PNMCC algorithm needs only 200 iterations to reach a steady-state MSD, while the previously reported RZA-MCC algorithm requires more than 400 iterations to reach its steady-state MSD. Thereby, the CIM-PNMCC algorithm converges much more quickly than the RZA-MCC algorithm for achieving the same MSD. Next, the effect of the sparsity level is analyzed through the number of non-zero coefficients K in the FIR system.
Here, we still use a sparse system of length 16 that has K dominant coefficients. The simulation parameters for the mentioned adaptive filtering algorithms are χ_MCC = 0.03, χ_NMCC = 0.4, μ_ZA = μ_RZA = 0.03, ρ_ZA = 8 × 10⁻⁵, ρ_RZA = 2 × 10⁻⁴, μ_PNLMS = 0.27, χ = 0.24, χ₁ = 0.3, ρ_CIM = 5 × 10⁻⁵. The estimation behavior of the developed PNMCC and CIM-PNMCC algorithms is discussed for various K, with the results given in Figures 6-9, respectively. It is noted that the developed PNMCC achieves only a small gain over the PNLMS, as they have comparable complexity and similar update equations except for the exponential weighting; nevertheless, our PNMCC algorithm achieves a lower steady-state bias for K = 1. As for the CIM-PNMCC, it provides the lowest steady-state misalignment because its zero-attraction term quickly forces inactive coefficients to zero. As K increases, the MSD floor rises; however, the developed CIM-PNMCC still provides the lowest steady-state misalignment, indicating that the CIM-PNMCC is more useful. Even for K = 8, the CIM-PNMCC algorithm can still reach an MSD level of 10⁻⁴, which is a very low estimation bias. As we know, the ZA-MCC is a regular zero-attraction algorithm; however, as K increases, its performance degrades because of its uniform zero attraction on all the channel coefficients. We then construct an example to discuss the tracking behavior of the devised PNMCC and CIM-PNMCC algorithms. Herein, we investigate their tracking performance over network echo channels with different sparsities, where the channel length is set to 256. We use ζ₁₂ to measure the sparsity of the designated network echo channel [37]. A typical network echo channel is given in Figure 10. The tracking behavior for ζ₁₂(w_o) = 0.8222 and ζ₁₂(w_o) = 0.7362 under these simulation parameters is illustrated in Figure 10.
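The sparsity measure ζ₁₂ is not defined in this excerpt; a widely used sparseness measure of this form (assumed here to be the one intended by [37]) is ζ₁₂(w) = N/(N − √N) · (1 − ‖w‖₁/(√N ‖w‖₂)), sketched below.

```python
import numpy as np

def zeta_12(w):
    # zeta_12(w) = N/(N - sqrt(N)) * (1 - ||w||_1 / (sqrt(N) * ||w||_2)):
    # 0 for a uniform (dispersive) channel, 1 for a single-tap channel.
    w = np.asarray(w, dtype=float)
    N = len(w)
    l1 = np.sum(np.abs(w))
    l2 = np.sqrt(np.sum(w ** 2))
    return (N / (N - np.sqrt(N))) * (1.0 - l1 / (np.sqrt(N) * l2))
```

For a length-256 channel, a single active tap gives ζ₁₂ = 1 and an all-equal channel gives ζ₁₂ = 0, so the quoted values 0.8222 and 0.7362 correspond to strongly and moderately sparse echo paths, respectively.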
Our PNMCC algorithm outperforms the PNLMS, MCC and NMCC algorithms in terms of estimation bias for ζ₁₂(w_o) = 0.8222, although it achieves tracking behavior similar to that of the PNLMS algorithm because of the weighted step-size assignment scheme. Our developed CIM-PNMCC possesses the fastest convergence speed and the lowest MSD for both ζ₁₂(w_o) = 0.8222 and ζ₁₂(w_o) = 0.7362. Therefore, we conclude that the developed PNMCC and CIM-PNMCC algorithms can effectively handle sparse system identification. Furthermore, the CIM-PNMCC algorithm is less sensitive to the sparsity of w_o than the algorithms mentioned above. Finally, a real speech signal is used as the input to test the performance of the devised CIM-PNMCC. In this experiment, the speech signal is a 2 s real speech recording with a sampling rate of 8 kHz. The parameters used herein are the same as in the last experiment. The speech signal and the results are shown in Figures 11 and 12, respectively. The proposed CIM-PNMCC still achieves the fastest convergence and a lower estimation error for echo cancellation applications.
According to the aforementioned discussion of the proposed PNMCC and CIM-PNMCC algorithms, the CIM-PNMCC achieves the smallest estimation misalignment and the quickest convergence among the considered adaptive system identification algorithms. The proposed PNMCC algorithm has a complexity similar to that of the PNLMS, while providing a slightly better gain than the PNLMS in impulsive noise environments. Moreover, the CIM-PNMCC performs better than the PNMCC since it introduces a CIM zero attractor to exploit the sparsity characteristics of the systems. This is attributed to the CIM measure, which can account for the dominant non-zero coefficients, while the CIM zero attractor forces the inactive coefficients toward zero rapidly. Additionally, the developed CIM-PNMCC algorithm has an additional regularization parameter ρ_CIM for controlling the zero-attraction strength.

Conclusions
The PNMCC and CIM-PNMCC algorithms were developed, and their behaviors were presented and discussed for sparse system identification. The CIM-PNMCC algorithm provides the fastest convergence, while the PNMCC also performs better than the PNLMS algorithm. Since the CIM-PNMCC algorithm integrates a CIM zero attractor into its iteration, its convergence is faster and its estimation bias is smaller than those of the presented PNMCC algorithm. The obtained results showed that the CIM-PNMCC is stable and can obtain further performance gains. Therefore, the proposed CIM-PNMCC is a good candidate for practical sparse system identification applications. In future work, we will develop a block-sparse PNMCC algorithm based on the block method in [56,57] and reduce the complexity of the proposed CIM-PNMCC algorithm.