Newton Recursion Based Random Data-Reusing Generalized Maximum Correntropy Criterion Adaptive Filtering Algorithm

For system identification in impulsive-noise environments, the gradient-based generalized maximum correntropy criterion (GB-GMCC) algorithm can achieve desirable filtering performance. However, the gradient method uses only first-order derivative information, and its stationary point can be a maximum point, a minimum point or a saddle point, so the gradient method is not always a good choice. Furthermore, GB-GMCC uses only the current input vector to update the weight vector, so its convergence rate degrades dramatically when the input signal is highly correlated. To overcome these problems, this paper combines the Newton recursion method with the data-reusing method and proposes a robust adaptive filtering algorithm called the Newton recursion-based data-reusing GMCC (NR-DR-GMCC). On the one hand, based on the Newton recursion method, NR-DR-GMCC can use second-order derivative information to update the weight vector. On the other hand, by using the data-reusing method, our proposal uses the information of the latest M input vectors to improve the convergence performance of GB-GMCC. In addition, to further enhance the filtering performance of NR-DR-GMCC, a random strategy can be used to extract more information from past input vectors, which yields an enhanced algorithm called the Newton recursion-based random data-reusing GMCC (NR-RDR-GMCC). Simulation results for system identification and acoustic echo cancellation validate that NR-RDR-GMCC provides better filtering performance than existing algorithms in terms of filtering accuracy and convergence rate.


Introduction
Adaptive filters have been widely used in different engineering fields. Typical applications include system identification, active noise control, acoustic echo cancellation and channel equalization [1][2][3][4]. The adaptive filtering algorithms based on the mean square error (MSE) criterion mainly include the least mean square (LMS) family, the affine projection (AP) family and the least squares family [5][6][7]. Among them, the LMS algorithm is widely used because of its simple structure, low computational complexity, and good convergence in stationary environments. However, these MSE-based algorithms suffer obvious performance degradation in non-Gaussian environments, especially under heavy-tailed noise. Typical heavy-tailed noises include Laplace noise, Cauchy noise, mixed Gaussian noise, and alpha-stable distribution (α-SD) noise [8].
In order to enhance the ability of an adaptive filtering algorithm to suppress impulsive noise and to improve its robustness, a series of nonlinear optimization criteria and/or cost functions have been proposed from the perspective of similarity measurement and applied to signal processing. Typical examples include the $\ell_p$-norm with p ∈ [1, 2) [9], the M-estimation theory [10], and the information theoretic learning (ITL) family [11]. More details about robust adaptive signal processing schemes can be found in a review article [12]. Within the ITL family, the minimum error entropy (MEE) criterion [13] and the maximum correntropy criterion (MCC) [14,15] capture all the even-order moments of the error signal, and they are therefore widely used in robust signal processing and machine learning. Generally speaking, the MCC criterion has a smaller computational burden than the MEE criterion. Although correntropy provides a generalized similarity measure between two random variables, the Gaussian kernel function used in the MCC criterion may not always be the best choice. Therefore, as an extension of the MCC criterion, the correntropy criterion based on the generalized Gaussian density function, usually called the generalized maximum correntropy criterion (GMCC), has been proposed and widely used in the field of robust adaptive filtering [16][17][18][19][20][21]. Similar to the LMS family, the gradient-based generalized maximum correntropy criterion algorithm (GB-GMCC) suffers obvious convergence degradation as the correlation of the input signal increases. In addition, the gradient method provides only first-order derivative information, which limits the filtering performance of the corresponding algorithms.
The GB-GMCC algorithm converges slowly when the input is highly colored. In other words, the gradient method is not well suited to correlated inputs. There are many methods to overcome this issue, including affine projection, the recursive method [10], the subband method [22] and the data-reusing method. Among them, the data-reusing method usually has the lowest complexity, so we consider it as a candidate for solving the problem introduced by highly colored inputs. On the other hand, relying only on first-order derivative information may limit the convergence rate of GB-GMCC. We can therefore use second-order derivative information to remedy this issue; that is to say, the Newton recursion method can be a good replacement for the gradient method.
In this paper, a new robust adaptive filtering algorithm is proposed by combining data-reusing, Newton recursion and GMCC. The new algorithm is called the Newton recursion-based data-reusing generalized maximum correntropy criterion algorithm (NR-DR-GMCC). In addition, to further improve the convergence rate and filtering accuracy of the NR-DR-GMCC algorithm, an enhanced version based on the random data-reusing strategy is derived and named the Newton recursion-based random data-reusing generalized maximum correntropy criterion algorithm (NR-RDR-GMCC). Simulation results for system identification and acoustic echo cancellation demonstrate that NR-RDR-GMCC achieves better filtering accuracy and a faster convergence rate than existing related algorithms in the α-SD noise environment. The main contributions of this work are as follows:
• Based on the data-reusing and Newton recursion methods, we propose the NR-DR-GMCC algorithm, which is derived by maximizing the generalized correntropy. With the data-reusing method, the information contained in the latest M input vectors is used to combat the negative influence of highly colored inputs. With the Newton recursion method, the second-order derivative information, i.e., the Hessian matrix, is exploited to update the weight vector, which leads to a faster convergence rate and overcomes the disadvantage of the gradient method, which considers only first-order derivative information.
• Inspired by the random strategy, NR-RDR-GMCC is derived by introducing a cache window with length C ≥ M into the NR-DR-GMCC algorithm. Compared with NR-DR-GMCC, NR-RDR-GMCC has very similar computational complexity, except that it requires extra memory to store C input vectors. However, through the cache window, NR-RDR-GMCC can explore more historical information and avoid the effects of consecutive large outliers when updating the weight vector, thereby improving on the filtering performance of the NR-DR-GMCC algorithm.
The rest of this paper is organized as follows. In Section 2, some preliminaries about α-SD noise, the GMCC criterion, and the Newton recursion method are reviewed. Section 3 presents our proposed algorithms, namely, NR-DR-GMCC and NR-RDR-GMCC. Simulation results are presented in Section 4. Finally, the conclusion is given in Section 5.

Alpha-Stable Distribution Noise
In this paper, impulsive noise is simulated by the α-SD model, which describes well the ubiquitous non-Gaussian noise with strong impulsive characteristics [23]. The characteristic function of this model is defined as:
$$\varphi(t) = \exp\left\{ j\delta t - \lambda |t|^{\alpha} \left[ 1 + j\varsigma\, \mathrm{sgn}(t)\, S(t, \alpha) \right] \right\} \tag{1}$$
with
$$S(t, \alpha) = \begin{cases} \tan(\alpha\pi/2), & \alpha \neq 1, \\ (2/\pi)\log|t|, & \alpha = 1, \end{cases} \tag{2}$$
where $j^2 = -1$; α ∈ (0, 2] denotes a characteristic exponent measuring the thickness of the tail of the distribution. The smaller the value of α, the thicker the tail of the corresponding distribution, and the more significant the impulsive characteristics. When the value of α is close to 2, the corresponding distribution is close to the Gaussian distribution; λ > 0 denotes a dispersion parameter, which is similar to the variance of a Gaussian distribution; ς ∈ [−1, 1] is the symmetry parameter, which defines the skewness of the distribution; and δ ∈ (−∞, +∞) denotes the location parameter. In general, when α = 2 and ς = 0, α-SD is equivalent to a Gaussian distribution. In this paper, for the sake of simplicity, the parameter vector of α-SD is denoted as $\mathbf{d}_{\alpha}^{T} = [\alpha, \varsigma, \delta, \lambda]$.
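As an illustration of the noise model, the following sketch draws symmetric α-SD samples (ς = 0, δ = 0, the setting used later in the simulations). It relies on the Chambers-Mallows-Stuck transform, which is an assumption of this sketch rather than something stated in the text, and the function name `sample_symmetric_alpha_stable` is hypothetical. It is intended for moderate values of α such as those used in this paper.

```python
import math
import random


def sample_symmetric_alpha_stable(alpha, dispersion, n, seed=0):
    """Draw n samples with characteristic function exp(-dispersion * |t|**alpha).

    Assumption: symmetric, zero-location case only, generated with the
    Chambers-Mallows-Stuck transform (not taken from the paper).
    """
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    rng = random.Random(seed)
    # The dispersion multiplies |t|**alpha, so the amplitude scale is its alpha-th root.
    scale = dispersion ** (1.0 / alpha)
    samples = []
    for _ in range(n):
        v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)  # uniform phase
        w = max(rng.expovariate(1.0), 1e-300)           # unit-mean exponential, kept away from 0
        if alpha == 1.0:
            x = math.tan(v)                             # Cauchy special case
        else:
            x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
                 * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
        samples.append(scale * x)
    return samples


# Example: noise with alpha = 1.5 and dispersion 0.1.
noise = sample_symmetric_alpha_stable(1.5, 0.1, 1000, seed=42)
```

With α = 2 the transform reduces to a Gaussian generator whose variance is twice the dispersion, which gives a simple sanity check on the scaling.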

Generalized Maximum Correntropy Criterion
Correntropy measures the local similarity between two random variables X and Y [11]:
$$V(X, Y) = E[\kappa_{\sigma}(X, Y)] = \int \kappa_{\sigma}(x, y)\, dF_{X,Y}(x, y), \tag{3}$$
where E[·] stands for the expectation operator; $F_{X,Y}(x, y)$ denotes the joint distribution function of X and Y; and $\kappa_{\sigma}(\cdot, \cdot)$ denotes a Gaussian kernel defined as:
$$\kappa_{\sigma}(x, y) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(x - y)^2}{2\sigma^2} \right),$$
where σ > 0 is the kernel width. In practice, the joint distribution $F_{X,Y}(x, y)$ is usually unknown, and thus only a finite number of data $\{x_n, y_n\}_{n=1}^{N}$ are used to derive the sample estimator of (3) as
$$\hat{V}(X, Y) = \frac{1}{N} \sum_{n=1}^{N} \kappa_{\sigma}(x_n, y_n). \tag{4}$$
From a probabilistic point of view, maximizing correntropy drives the probability density of the error to its maximum at the origin, and thus Equation (4) can be regarded as a nonlinear cost function, which has been widely adopted in the field of robust adaptive signal processing. However, the Gaussian kernel function used in the estimator (4) may not always be the best choice. Therefore, other types of kernel functions can be used to replace the Gaussian kernel, such as a kernel function based on the generalized Gaussian density (GGD) function, i.e.,
$$G_{s,t}(e) = \frac{s}{2t\,\Gamma(1/s)} \exp\left( -\left| \frac{e}{t} \right|^{s} \right) = \gamma_{s,t} \exp\left( -\tau |e|^{s} \right), \tag{5}$$
where $\tau = t^{-s}$ is the kernel parameter; s and t are positive numbers that represent the shape parameter and the scale factor, respectively; and $\gamma_{s,t} = s / (2t\,\Gamma(1/s))$ denotes the normalization constant, with Γ(·) being the Gamma function. Injecting the GGD into (4), a new sample estimator can be obtained as
$$\hat{V}_{s,t}(X, Y) = \frac{1}{N} \sum_{n=1}^{N} G_{s,t}(x_n - y_n) = \frac{\gamma_{s,t}}{N} \sum_{n=1}^{N} \exp\left( -\tau |x_n - y_n|^{s} \right). \tag{6}$$
In the field of robust adaptive signal processing, (6) is often referred to as generalized correntropy. In applications such as regression analysis and classification studies, the correntropy loss (C-loss) metric can be used in place of correntropy [24]. Inspired by this, the generalized C-loss (GC-loss) function between the random variables X and Y can be defined as
$$J_{\mathrm{GC}}(X, Y) = G_{s,t}(0) - \hat{V}_{s,t}(X, Y) = \gamma_{s,t} \left( 1 - \frac{1}{N} \sum_{n=1}^{N} \exp\left( -\tau |e_n|^{s} \right) \right), \tag{7}$$
where $e_n = x_n - y_n$ is the error. Equation (7) shows that minimizing the GC-loss is equivalent to maximizing the generalized correntropy, which is usually denoted as GMCC and is widely used in the field of adaptive filtering [18,20,25].
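The generalized correntropy estimator and the GC-loss described above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the GGD kernel with τ = t^(−s) and normalization constant s/(2tΓ(1/s)) as defined in this section.

```python
import math


def ggd_normalizer(s, t):
    """Normalization constant gamma_{s,t} = s / (2 t Gamma(1/s))."""
    return s / (2.0 * t * math.gamma(1.0 / s))


def generalized_correntropy(x, y, s, t):
    """Sample estimator: average GGD kernel value over the errors x_n - y_n."""
    tau = t ** (-s)
    kernel_sum = sum(math.exp(-tau * abs(a - b) ** s) for a, b in zip(x, y))
    return ggd_normalizer(s, t) * kernel_sum / len(x)


def gc_loss(x, y, s, t):
    """GC-loss: kernel value at zero error minus the generalized correntropy."""
    return ggd_normalizer(s, t) - generalized_correntropy(x, y, s, t)


# A single huge outlier cannot push the loss beyond gamma_{s,t}.
clean = gc_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], s=1.5, t=1.0)
outlier = gc_loss([1.0, 2.0, 3.0], [1.0, 2.0, 1e6], s=1.5, t=1.0)
```

Because each kernel term lies in (0, 1], the loss is bounded by the normalization constant, which is the property that makes the criterion insensitive to large impulsive errors.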

Newton Recursion
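Using the second-order Taylor formula to expand a twice continuously differentiable function f(x) around the point $x_k$ gives
$$f(x) \approx f(x_k) + g(x_k)^{T}(x - x_k) + \frac{1}{2}(x - x_k)^{T} H(x_k)(x - x_k), \tag{8}$$
where $g(x_k) = \nabla f(x_k)$ and $H(x_k) = \nabla^2 f(x_k)$. Therefore, based on (8), we can obtain the gradient of f(x) with respect to x as
$$g(x) \approx g(x_k) + H(x_k)(x - x_k). \tag{9}$$
Setting g(x) = 0, we obtain
$$x = x_k - H^{-1}(x_k)\, g(x_k), \tag{10}$$
which leads to the general form of the Newton recursion method [26],
$$x_{k+1} = x_k - H^{-1}(x_k)\, g(x_k). \tag{11}$$
This recursive method is often used to obtain approximate solutions of the nonlinear equation g(x) = 0. If g(x) is a real-valued function, then $x_{k+1}$ represents the point where the tangent $y - g(x_k) = H(x_k)(x - x_k)$ of the function g(x) at the point $x_k$ intersects the x-axis [27]. It is worth noting that when g(x) is a vector function, H(x) becomes a Hessian matrix and, in this case, the use of Newton recursion must ensure that the Hessian matrix is positive definite.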
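For a scalar variable, the recursion above reduces to a one-line update. The sketch below applies it to the hypothetical example f(x) = x³/3 − 2x, whose gradient g(x) = x² − 2 vanishes at x = √2, where the second derivative h(x) = 2x is positive.

```python
def newton_recursion(g, h, x0, iterations=20):
    """Iterate x_{k+1} = x_k - g(x_k) / h(x_k) for a scalar variable."""
    x = x0
    for _ in range(iterations):
        x = x - g(x) / h(x)
    return x


# Stationary point of f(x) = x**3 / 3 - 2 * x, starting from x0 = 1.
root = newton_recursion(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Each step moves to the point where the tangent of g at the current iterate crosses the x-axis, so the iterate approaches the solution of g(x) = 0 very quickly once it is close.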

GB-GMCC
The following linear model is considered to reconstruct an unknown system such that the output y(n) of an adaptive filter matches the desired signal d(n):
$$d(n) = \mathbf{u}(n)^{T} \mathbf{w}_o + v(n), \tag{12}$$
where $\mathbf{u}(n) = [u(n), u(n-1), \ldots, u(n-L+1)]^{T}$ denotes the input vector with length L; $\mathbf{w}_o = [w_{o,1}, w_{o,2}, \ldots, w_{o,L}]^{T}$ is the intrinsic weight vector of the unknown system; and v(n) is the α-SD noise. For simplicity, the normalization constant $\gamma_{s,t}$ in the GGD function is ignored, and the sample estimator in (6) is replaced by an instantaneous estimator to establish the following cost function,
$$J(\mathbf{w}) = \exp\left( -\tau |e(n)|^{s} \right), \tag{13}$$
where $e(n) = d(n) - \mathbf{u}(n)^{T} \mathbf{w}$ denotes the estimation error and $\mathbf{w} = [w_1, w_2, \ldots, w_L]^{T}$ is the estimated weight vector. Calculating the gradient of (13) and applying the gradient ascent method, with the constant factor τs absorbed into the step size, gives the weight vector update formula of the GB-GMCC algorithm as
$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu f(e(n))\, \mathbf{u}(n), \tag{14}$$
where
$$f(e(n)) = \exp\left( -\tau |e(n)|^{s} \right) |e(n)|^{s-1}\, \mathrm{sgn}(e(n)) \tag{15}$$
denotes a nonlinear function of the error; and µ > 0 stands for the learning step size.
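A minimal sketch of the GB-GMCC update, assuming the nonlinear function exp(−τ|e|^s)|e|^(s−1)sgn(e) with constant factors absorbed into the step size µ. The toy identification run (noise-free, white Gaussian input, four taps) and all function names are illustrative assumptions, not the experimental setup of this paper.

```python
import math
import random


def gb_gmcc_step(w, u, d, mu, s, tau):
    """One gradient-ascent update w <- w + mu * f(e) * u; returns (new_w, error)."""
    e = d - sum(wi * ui for wi, ui in zip(w, u))
    if e == 0.0:
        return list(w), e  # f(0) is taken as 0, which also avoids 0**(s-1) for s < 1
    f = math.exp(-tau * abs(e) ** s) * abs(e) ** (s - 1.0) * math.copysign(1.0, e)
    return [wi + mu * f * ui for wi, ui in zip(w, u)], e


def identify(w_o, iterations=2000, mu=0.05, s=1.5, tau=1e-4, seed=0):
    """Toy system identification without noise; returns the final estimate."""
    rng = random.Random(seed)
    taps = len(w_o)
    w = [0.0] * taps
    u = [0.0] * taps
    for _ in range(iterations):
        u = [rng.gauss(0.0, 1.0)] + u[:-1]       # shift the tapped delay line
        d = sum(a * b for a, b in zip(w_o, u))   # desired signal of the unknown system
        w, _err = gb_gmcc_step(w, u, d, mu, s, tau)
    return w


estimate = identify([0.5, -0.4, 0.3, 0.1])
```

Only the current input vector enters each update, which is the property that slows convergence for strongly correlated inputs.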

NR-DR-GMCC
From (14), one can see that the GB-GMCC algorithm uses only the current input vector u(n) to update the weight vector. Although this keeps the computational complexity low, it slows the convergence of GB-GMCC as the correlation of the input signal increases. This problem can be overcome by using the data-reusing method, and thus the cost function in (13) can be changed to
$$J(\mathbf{w}) = \exp\left( -\tau \|\mathbf{e}(n)\|_{s}^{s} \right), \tag{16}$$
where $\mathbf{e}(n) = [e(n), e(n-1), \ldots, e(n-M+1)]^{T}$ represents an error vector consisting of the M most recent error signals and $\|\cdot\|_{s}^{s}$ denotes the s-th power of the $\ell_s$-norm. In addition, the GB-GMCC algorithm updates the weight vector by using the gradient ascent strategy, which considers only first-order derivative information. When all elements of the gradient are zero, the stationary point reached by the gradient ascent method may be a local maximum and/or a saddle point. In other words, the gradient ascent method is an inefficient or even ineffective optimization method in some situations. To this end, inspired by the Newton recursion strategy, we can derive a new generalized maximum correntropy criterion algorithm as follows.
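Let $\mathbf{g}(\mathbf{w}) = \partial J(\mathbf{w}) / \partial \mathbf{w}$, and take the gradient of (16) with respect to w as
$$\mathbf{g}(\mathbf{w}) = \tau s \exp\left( -\tau \|\mathbf{e}(n)\|_{s}^{s} \right) \sum_{m=0}^{M-1} |e(n-m)|^{s-1}\, \mathrm{sgn}(e(n-m))\, \mathbf{u}(n-m),$$
where sgn(·) is the sign function. Further denoting $\partial \mathbf{g}(\mathbf{w}) / \partial \mathbf{w}$ as H(w), we can obtain the Hessian matrix. Based on the following definitions, g(w) and H(w) can be written in more compact forms:
$$\mathrm{sgn}(\mathbf{e}(n)) = [\mathrm{sgn}(e(n)), \mathrm{sgn}(e(n-1)), \ldots, \mathrm{sgn}(e(n-M+1))]^{T}, \tag{20}$$
where $\odot$ is the Hadamard product. Therefore, based on the Newton recursion in (11), the weight vector update of the Newton recursion-based data-reusing generalized maximum correntropy criterion (NR-DR-GMCC) algorithm is derived, where $\mathbf{R}_{u,\phi}(n) = \mathbf{U}(n)\boldsymbol{\phi}(n)\boldsymbol{\phi}(n)^{T}\mathbf{U}(n)^{T}$; $\mathbf{R}_{u} = \mathbf{U}(n)\mathbf{U}(n)^{T}$ is a covariance matrix of the input matrix U(n); η > 0 is the step size; and ε > 0 denotes a small-valued smoothing factor.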
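The gradient of the data-reusing cost can be checked numerically. The sketch below assumes the cost exp(−τ Σ|e(n−m)|^s) over the M most recent errors and compares the analytical gradient with central finite differences. The function names are hypothetical, and the M input vectors are stored as rows of `inputs`, which is a layout choice of this sketch.

```python
import math


def _errors(w, inputs, desired):
    """Errors e(n-m) = d(n-m) - u(n-m)^T w for the M stored pairs."""
    return [d - sum(wi * ui for wi, ui in zip(w, u)) for u, d in zip(inputs, desired)]


def dr_cost(w, inputs, desired, s, tau):
    """Data-reusing cost exp(-tau * sum_m |e(n-m)|**s)."""
    return math.exp(-tau * sum(abs(e) ** s for e in _errors(w, inputs, desired)))


def dr_gradient(w, inputs, desired, s, tau):
    """Analytical gradient: tau*s*J(w) * sum_m |e_m|**(s-1) * sgn(e_m) * u_m."""
    errors = _errors(w, inputs, desired)
    common = tau * s * math.exp(-tau * sum(abs(e) ** s for e in errors))
    grad = [0.0] * len(w)
    for u, e in zip(inputs, errors):
        if e == 0.0:
            continue  # a zero error contributes nothing for s > 1
        phi = abs(e) ** (s - 1.0) * math.copysign(1.0, e)
        for i, ui in enumerate(u):
            grad[i] += common * phi * ui
    return grad


def finite_difference_gradient(w, inputs, desired, s, tau, h=1e-6):
    """Central-difference approximation of the gradient, used only as a check."""
    grad = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((dr_cost(up, inputs, desired, s, tau)
                     - dr_cost(down, inputs, desired, s, tau)) / (2.0 * h))
    return grad


inputs = [[1.0, 0.5], [-0.3, 0.8], [0.2, -1.1]]   # M = 3 input vectors of length L = 2
desired = [0.7, 0.5, 0.4]
analytic = dr_gradient([0.1, -0.2], inputs, desired, s=1.5, tau=0.1)
numeric = finite_difference_gradient([0.1, -0.2], inputs, desired, s=1.5, tau=0.1)
```

Agreement between the two gradients confirms the sign convention: the gradient points along the error-weighted inputs, so ascent on the cost reduces the errors.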

NR-RDR-GMCC
Although the filtering performance of NR-DR-GMCC can be improved by exploiting more of the information contained in the latest M input vectors, the computational complexity of the algorithm increases when the data-reusing order M is large. Therefore, to balance the computational burden against good filtering performance, we further inject a random data-reusing strategy into the NR-DR-GMCC algorithm, and thus an enhanced version of NR-DR-GMCC is obtained as follows.
Based on this, we can define the error vector $\mathbf{e}_r(n)$ from the M randomly selected entries. Based on operations similar to (18) and (19), we can obtain the weight vector update of the Newton recursion-based random data-reusing generalized maximum correntropy criterion (NR-RDR-GMCC) algorithm.
Remarks:
(1) Based on the data-reusing method, the NR-DR-GMCC algorithm can use the latest M input vectors to update the weight vector, mitigating the convergence rate degradation of GB-GMCC caused by the increased correlation of the input signal;
(2) In comparison with GB-GMCC, which uses only first-order derivative information, the NR-DR-GMCC algorithm can use the second-order information provided by the Newton recursion method, and thus NR-DR-GMCC achieves faster convergence;
(3) As an improved version of NR-DR-GMCC, the NR-RDR-GMCC algorithm uses a cache window with length C ≥ M to store the latest C input vectors. Then, to update the weight vector, NR-RDR-GMCC randomly selects M input vectors from the cache window. With the random data-reusing method, NR-RDR-GMCC obtains abundant error information from the cache window and avoids the convergence loss caused by consecutive outliers. Furthermore, when the data-reusing order M is fixed, NR-RDR-GMCC not only has a computational complexity similar to that of NR-DR-GMCC with data-reusing order M, but also achieves better filtering performance than NR-DR-GMCC. Moreover, observing from (12) and (16), one can find that, when C = M, the NR-RDR-GMCC algorithm becomes the NR-DR-GMCC algorithm. That is to say, NR-RDR-GMCC can be regarded as an effective extension of NR-DR-GMCC. Algorithm 1 summarizes the pseudocode of the proposed algorithms.
Moreover, Figure 1 plots the schematic diagram of the data-reusing and the random data-reusing methods, and shows the difference between these two methods, namely, the data-reusing method only utilizes the latest M inputs, whereas the random data-reusing method saves the latest C inputs and randomly selects M entries from these C inputs.
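The difference between the two reuse strategies can be sketched with a bounded cache. The class below is hypothetical, and it assumes that the M entries are drawn uniformly without replacement from the latest C input/desired pairs, a detail the text does not specify.

```python
import random
from collections import deque


class RandomReuseCache:
    """Keeps the latest C (input vector, desired sample) pairs and draws M of them."""

    def __init__(self, cache_length, reuse_order, seed=0):
        if cache_length < reuse_order:
            raise ValueError("cache length C must satisfy C >= M")
        self.buffer = deque(maxlen=cache_length)  # the oldest pair is dropped automatically
        self.reuse_order = reuse_order
        self.rng = random.Random(seed)

    def push(self, u, d):
        """Store the newest input vector u and desired sample d."""
        self.buffer.append((u, d))

    def select(self):
        """Return M pairs sampled without replacement (fewer while the cache fills)."""
        count = min(self.reuse_order, len(self.buffer))
        return self.rng.sample(list(self.buffer), count)


cache = RandomReuseCache(cache_length=5, reuse_order=3, seed=1)
for i in range(10):
    cache.push([float(i)], float(i))
picked = cache.select()   # three distinct pairs among the five most recent ones
```

When C = M the draw always returns the latest M pairs (in some order), which corresponds to plain data reusing.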

Computation Complexity Analysis
Different from the traditional GB-GMCC algorithm, NR-RDR-GMCC applies a random strategy to reuse M past input vectors efficiently, and thus it can extract more historical information to support the Newton recursion-based weight vector update. Although NR-RDR-GMCC has a higher computational complexity than GB-GMCC, the cost is worthwhile because NR-RDR-GMCC outperforms GB-GMCC in terms of filtering accuracy, as will be shown in the simulation results. It is worth noting that the matrix inverse involved in NR-RDR-GMCC can be computed with the help of iterative optimization methods, thereby reducing the computational complexity of the NR-RDR-GMCC algorithm [28]. In addition, Table 1 lists the per-iteration computational complexity of the proposed NR-RDR-GMCC and other related algorithms in terms of multiplications and additions, in which M represents the order of data-reusing and/or projection, L stands for the tap length, $N_{\exp}^{\tau,s}$ indicates the computational complexity associated with the nonlinear function $\exp(-\tau |e(i)|^{s}) |e(i)|^{s-1}$ [20], and $O(L^3)$ is the computational burden required for the direct inverse of an L × L square matrix.

Table 1. Computational complexity per iteration (columns: Algorithms, Multiplications, Additions).

System Identification
In order to evaluate the filtering performance of the NR-RDR-GMCC algorithm, we consider the system identification problem represented in (12), where the unknown weight vector $\mathbf{w}_o$ is randomly generated and the tap length is L = 32. Impulsive noise is modeled by α-SD with the parameter vector $\mathbf{d}_{\alpha}^{T} = [1.5, 0, 0, 0.1]$. The input signal u(n) is generated by filtering a zero-mean Gaussian noise with unit variance through a second-order system. To compare the filtering performance of competing algorithms, the normalized mean square deviation (NMSD) is used, i.e.,
$$\mathrm{NMSD} = 20 \log_{10} \frac{\|\mathbf{w} - \mathbf{w}_o\|}{\|\mathbf{w}_o\|}; \tag{27}$$
in addition, all NMSD simulation results are averaged over 100 independent experiments.
Firstly, in order to study the effect of the data-reusing order M on the convergence performance of the NR-RDR-GMCC algorithm, we select M ∈ {2, 4, 6, 8, 16, 32}, and the other parameters are fixed as C = 512, s = 1.5, τ = 0.0001, η = 0.005, and ε = 0.0001. Figure 2 plots the corresponding NMSD curves. From Figure 2, we can observe that: (1) When M is large, such as M = 32, the NR-RDR-GMCC algorithm does not converge; (2) Provided that NR-RDR-GMCC converges, appropriately increasing M, such as from 2 to 16, significantly improves the convergence rate of NR-RDR-GMCC; (3) With some moderate values, such as M = 4, NR-RDR-GMCC can achieve an acceptable convergence rate and the best filtering accuracy. Therefore, in practical applications of NR-RDR-GMCC, it is necessary to select a proper value of M to trade off computational complexity, filtering accuracy and convergence rate.
Secondly, the effect of the cache window length C is examined in Figure 3, which clearly shows that: (1) The filtering accuracy of NR-RDR-GMCC can be enhanced by increasing the value of C. In addition, with some large C values, such as C ∈ {256, 512}, the jitter of NR-RDR-GMCC can be restrained, thereby improving the stability of our proposed algorithm; (2) When C = M = 16, NR-RDR-GMCC reduces to the NR-DR-GMCC algorithm. However, compared to NR-RDR-GMCC with larger C values, the NR-DR-GMCC algorithm realizes the fastest convergence rate at the cost of the worst accuracy. That is to say, with some moderate C values, NR-RDR-GMCC can achieve a good balance between a smaller misadjustment and a good convergence rate.
Thirdly, we study the effect of the shape parameter s and the kernel parameter τ on the filtering accuracy of NR-RDR-GMCC. To this end, different s and τ values are considered and the other parameters are set as C = 512, M = 16, η = 0.005 and ε = 0.0001. Figure 4 shows the steady-state NMSD (SS-NMSD), estimated from the last 100 NMSD values, for NR-RDR-GMCC with respect to various values of s ∈ (0, 3) and τ ∈ (0.00001, 0.0002). From this figure we can observe that: (1) With the same shape parameter s, different τ values lead to very similar filtering accuracy; (2) With the same kernel parameter τ, different s values result in obviously different steady-state behaviors of NR-RDR-GMCC. That is to say, when s ∈ (1.0, 3.0), the filtering accuracy gradually improves as s decreases; when s = 1.0, the algorithm abruptly diverges; when s ∈ (0.0, 1.0), the algorithm does not converge. In addition, to clearly show the filtering behavior, some NMSD curves of NR-RDR-GMCC with τ ∈ {0.00001, 0.002} and s ∈ {1.5, 2.0, 2.5, 3.0} are plotted in Figure 5. From this figure, we have the following observations: (1) When the s parameter is the same, in most cases, the NR-RDR-GMCC algorithms with τ = 0.00001 or τ = 0.002 realize similar convergence behavior; (2) With the same τ parameter, NR-RDR-GMCC can realize an enhanced convergence rate and filtering accuracy by decreasing the value of s from 3.0 to 1.5.
Finally, in order to verify the effectiveness of the proposed NR-RDR-GMCC algorithm, some related algorithms, namely NR-DR-GMCC, GB-GMCC [19], affine projection GMCC (AP-GMCC) [20], the AP sign algorithm (APSA) [29], AP Versoria (APV) [30] and MCC-APA [31], are considered in this part. For all algorithms, the smoothing or regularization parameter is ε = 0.001; for all GMCC-based algorithms, s = 1.5 and τ = 0.0001; the other parameters are tuned experimentally so that the algorithms have similar initial convergence rates. Table 2 lists the parameter settings. In addition, to further verify the advantages of the GMCC criterion, this experiment also considers the special case of s = 2.0 in the NR-RDR-GMCC algorithm; in other words, the NR-RDR-GMCC algorithm reduces to an MCC-based NR-RDR algorithm. Furthermore, we change the unknown weight vector from $\mathbf{w}_o$ to $-\mathbf{w}_o$ at the middle iteration to simulate a sudden system change and explore the tracking performance of the various algorithms. Figure 6 plots the corresponding NMSD curves and reveals that: (1) Compared to the GB-GMCC algorithm, the NR-RDR-GMCC (s = 1.5) algorithm has a very obvious advantage in terms of filtering accuracy and convergence rate. In addition, both before and after the system change, the NR-RDR-GMCC (s = 1.5) algorithm achieves a comparable convergence rate and the best filtering accuracy in comparison with the other algorithms; (2) In this simulation, the NR-RDR-GMCC (s = 2.0) algorithm is inferior to the other algorithms in terms of convergence rate and steady-state misadjustment as shown in Table 2. That is to say, in some situations, GMCC is a better basis than MCC for deriving robust adaptive filtering algorithms.
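The NMSD metric in (27) and the averaging over independent runs can be sketched as follows. The function names are hypothetical, and the sketch assumes that the per-run NMSD curves (in dB) are averaged pointwise, which is one plausible reading of the averaging described above.

```python
import math


def nmsd_db(w, w_o):
    """NMSD in dB: 20 * log10(||w - w_o|| / ||w_o||)."""
    deviation = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_o)))
    reference = math.sqrt(sum(b * b for b in w_o))
    if deviation == 0.0:
        return float("-inf")  # perfect identification
    return 20.0 * math.log10(deviation / reference)


def average_curves(curves):
    """Pointwise mean of several NMSD curves of equal length."""
    runs = len(curves)
    return [sum(values) / runs for values in zip(*curves)]


start = nmsd_db([0.0, 0.0], [3.0, 4.0])   # all-zero initial weights give 0 dB
mean_curve = average_curves([[0.0, -10.0], [0.0, -20.0]])
```

An all-zero initial weight vector gives 0 dB, so the curves in the figures can be read as the deviation relative to the starting point.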

Acoustic Echo Cancellation
To further compare the filtering behaviors of the mentioned algorithms, a typical application, i.e., acoustic echo cancellation (AEC), is also considered in this work. Under double-talk conditions, two kinds of environmental noise are considered in this part, namely: (1) the noise v(n) being impulsive and (2) the noise v(n) not being impulsive. In all trials, a Gaussian background noise with a signal-to-noise ratio of 60 dB was considered. The echo paths $\mathbf{w}_{o1}$ and $\mathbf{w}_{o2}$ [32] involved in this experiment and the voice input signal are shown in Figures 7 and 8, respectively, where the signal sample rate was set to 8 kHz. To simulate a sudden system change, the echo path was abruptly changed from $\mathbf{w}_{o1}$ to $\mathbf{w}_{o2}$ at the 7000th iteration. Except where otherwise mentioned, for all algorithms, the data-reusing order or projection order is M = 16, the regularization or smoothing parameter is ε = 0.0001, and for GMCC-based algorithms the kernel parameter is τ = 0.001.

Without Impulsive Noise
In order to compare the acoustic echo cancellation performance of the NR-RDR-GMCC algorithm and other related algorithms in an environment without impulsive noise, Table 3 lists the remaining parameter settings, which were chosen through experimental tests to realize a similar initial convergence rate. Figure 9 plots the corresponding NMSD curves. From this figure, we can see that: (1) Due to the interference of near-end voice, the NR-RDR-GMCC algorithm and the other related algorithms show slight jitter after convergence; (2) Without the interference of impulsive noise, NR-RDR-GMCC is superior to the other algorithms in terms of convergence rate and filtering accuracy both before and after the system change.

With Impulsive Noise
In this part, we use the Bernoulli-Gaussian distribution model to obtain the impulsive noise, i.e., $v(n) = b(n) v_g(n)$, where b(n) is a Bernoulli process with probability $P\{b(n) = 1\} = P_r = 0.001$. Before the system change, $v_g(n)$ is a zero-mean Gaussian noise with a variance of 151. After the system change, the variance of $v_g(n)$ becomes 110. For all algorithms, Table 4 lists the remaining parameter settings, chosen to realize a similar initial convergence rate. The corresponding NMSD curves are plotted in Figure 10, which shows that, although the impulsive noise may have a negative effect on the filtering performance of the considered algorithms, our proposed NR-RDR-GMCC still achieves the smallest steady-state misadjustment both before and after the system change.
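The Bernoulli-Gaussian model above is straightforward to simulate. In the sketch below, the function name and the seed handling are illustrative assumptions, while the probability 0.001 and the variance 151 are the values quoted in the text.

```python
import random


def bernoulli_gaussian_noise(n, prob, variance, seed=0):
    """Return n samples of v(n) = b(n) * v_g(n) with P{b(n) = 1} = prob."""
    rng = random.Random(seed)
    std = variance ** 0.5
    # A Gaussian sample is drawn only when the Bernoulli gate b(n) equals 1.
    return [rng.gauss(0.0, std) if rng.random() < prob else 0.0 for _ in range(n)]


impulses = bernoulli_gaussian_noise(100000, prob=0.001, variance=151.0, seed=3)
active = sum(1 for v in impulses if v != 0.0)   # about 100 impulses are expected
```

With these settings only about one sample in a thousand is nonzero, but each nonzero sample has a standard deviation of roughly 12.3, which is what makes the noise impulsive.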

Conclusions
The traditional GB-GMCC algorithm suffers degraded convergence performance when facing highly colored input signals. In this work, to overcome this issue, the NR-DR-GMCC algorithm was derived by using the data-reusing method and the Newton recursion method. In addition, to further avoid the influence of consecutive large outliers in the output signal and to extract more historical data, a cache window is used to collect the latest C input vectors, and then M input vectors in the cache window are randomly selected to derive an enhanced version of NR-DR-GMCC, namely, NR-RDR-GMCC. From a mathematical point of view, when the cache window length C is equal to the data-reusing order M, NR-RDR-GMCC reduces to the NR-DR-GMCC algorithm, and thus the former can be regarded as a good extension of the latter. The system identification simulation results show that the NR-RDR-GMCC algorithm has better filtering accuracy and a faster convergence rate than the corresponding AP-type algorithms and GB-GMCC. Furthermore, in acoustic echo cancellation applications, the NR-RDR-GMCC algorithm has obvious advantages over other related algorithms in terms of steady-state misalignment.