Low-Complexity Constrained Recursive Kernel Risk-Sensitive Loss Algorithm

Abstract: The constrained recursive maximum correntropy criterion (CRMCC) combats non-Gaussian noise effectively. However, the performance surface of the maximum correntropy criterion (MCC) is highly non-convex, which limits its accuracy. Inspired by the smooth kernel risk-sensitive loss (KRSL), a novel constrained recursive KRSL (CRKRSL) algorithm is proposed, which achieves higher filtering accuracy and lower computational complexity than CRMCC. Meanwhile, a modified update strategy is developed to avoid the instability of CRKRSL in the early iterations. By using Isserlis's theorem to decompose the symmetric matrix containing fourth-moment variables, the mean square stability condition of CRKRSL is derived, and simulation results validate its advantages.


Introduction
The constrained adaptive filters (CAFs) [1], in which the weight vector is subject to linear constraints, have been widely studied in the field of adaptive signal processing. The original research on CAFs originated from antenna array processing, which employed the linearly-constrained minimum-variance (LCMV) criterion to estimate the direction of the antenna array [2]. CAFs have since been successfully applied to adaptive beamforming [3], system identification [4], channel equalization [5], and blind multiuser detection [6].
The simplest linearly-constrained adaptive filter, the constrained least mean-square (CLMS) algorithm [2], is derived from the LCMV criterion, and its mean square performance is analyzed in [7] based on a decomposable symmetric matrix. Owing to its stochastic gradient optimization, CLMS has a simple structure with low computational complexity; however, its performance is strongly influenced by the step size and by correlated inputs. To improve the convergence speed, the constrained fast least-squares (CFLS) algorithm [8], the linear-equality-constrained recursive least-squares (CRLS) algorithm [9], and its relaxed version were proposed at the expense of high computational complexity. Furthermore, the reduced-complexity constrained recursive least-squares algorithm based on dichotomous coordinate descent (CRLS-DCD) iterations [10] and the low-complexity constrained affine-projection (CAP) algorithm [11] with a data-selection method were proposed to reduce the computational complexity effectively. Other improved constrained filters [12][13][14] have also been widely used, trading off computational complexity against filtering performance, i.e., convergence speed and filtering accuracy. All the constrained algorithms mentioned above are developed from the mean square error (MSE) criterion [15] and perform well under Gaussian assumptions; in non-Gaussian cases, however, their filtering accuracy declines sharply.
Therefore, the maximum correntropy criterion (MCC) [16,17], the generalized MCC (GMCC) [18], and the minimum error entropy (MEE) [19] criterion from information theoretic learning (ITL) [20] have become alternative criteria, showing strong robustness to non-Gaussian signals. By adding linear constraints on the weights under MCC and GMCC, the constrained MCC (CMCC) [21] and constrained GMCC (CGMCC) [22] algorithms were developed using stochastic gradient optimization. CMCC and CGMCC perform well in the presence of single-peak heavy-tailed noise; however, their performance declines when coping with multi-peak noise. Owing to the symmetry of the errors, MEE counteracts the influence of multi-peak noise effectively. Adding linear constraints to MEE yields a gradient-based constrained MEE (CMEE) algorithm [23] with a sliding window, which has higher complexity but better accuracy than CMCC and CGMCC in the presence of multi-peak noise. Beyond the ITL-based constrained filters, other criteria [24,25] also show good performance in non-Gaussian environments. A constrained least mean M-estimation (CLMM) algorithm based on an improved M-estimation loss function was proposed in [24]. Inspired by the boundedness of the gradient of the lncosh function, a constrained least lncosh adaptive filtering (CLLAF) algorithm was developed in [25]. These constrained algorithms perform well under different types of non-Gaussian noise; however, the gradient-based constrained adaptive filters must trade accuracy against convergence speed through an adjustable step size. Hence, the constrained recursive MCC (CRMCC) algorithm [26] was developed, which improves both the accuracy and the convergence speed.
Recently, an advanced ITL-based criterion, named kernel risk-sensitive loss (KRSL) [27], has been proposed by introducing a beneficial risk-sensitive parameter to regulate the shape of its performance surface. Compared with MCC, the KRSL surface is more "convex", which leads to better accuracy and faster convergence. Based on the KRSL, a gradient-based constrained mixture KRSL algorithm [28] has been proposed, which achieves higher filtering accuracy than CMCC.
In this paper, leveraging the advantages of the KRSL criterion, a novel constrained recursive KRSL (CRKRSL) algorithm is proposed by using an average approximation method [29]. Under this approximation, the proposed CRKRSL algorithm reduces to a variable-step-size gradient-based algorithm, which has lower complexity than traditional constrained recursive algorithms. Meanwhile, owing to the instability of CRKRSL at the initial update stage, a gradient update with a fixed step size is used to replace the recursive update in the early iterations. Moreover, the mean square stability condition with respect to the number of iterations is derived by decomposing a symmetric matrix containing fourth-order variables. Simulation results demonstrate the advantages of CRKRSL in filtering accuracy and computational complexity.
The rest of the paper is organized as follows. The constrained KRSL loss and the CRKRSL algorithm are presented in Section 2. Stability analysis of CRKRSL is given in Section 3. Simulation results and discussion of CRKRSL are shown in Section 4. Finally, the conclusion is given in Section 5.

Notations
Throughout this paper, R denotes the real field; R^m denotes the m-dimensional real-valued vector space; R^{n×m} denotes the set of n × m matrices with entries in R; (·)^T represents the transpose operation; E[·] is the expectation operator; ‖·‖ denotes the Euclidean norm; and O(·) represents the computational complexity of an algorithm.

KRSL Loss
As a nonlinear similarity measure between variables X, Y ∈ R, the KRSL [27] is defined as

L_γ(X, Y) = (1/γ) E[exp(γ(1 − κ_σ(X − Y)))],

where γ ∈ (0, +∞) is a risk-sensitive parameter and κ_σ(x) = exp(−x²/(2σ²)) is the Gaussian kernel with width σ, whose corresponding mapping ϕ(·) embeds the variables into a reproducing kernel Hilbert space (RKHS) [30]. In general, since the joint distribution of X and Y is unknown, the expected loss is approximated from a finite set of sampled data pairs {x_l, y_l}_{l=1}^{N}, which gives

L̂_γ = (1/(γN)) Σ_{l=1}^{N} exp(γ(1 − κ_σ(x_l − y_l))).
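As a concrete illustration, the empirical KRSL loss above can be computed as follows (a minimal sketch; the function names and default parameter values are ours, not from the paper):

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel kappa_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return np.exp(-np.square(e) / (2.0 * sigma**2))

def krsl_loss(x, y, gamma=2.0, sigma=1.0):
    """Empirical KRSL: (1 / (gamma N)) * sum_l exp(gamma * (1 - kappa(x_l - y_l)))."""
    e = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.exp(gamma * (1.0 - gaussian_kernel(e, sigma)))) / gamma)
```

Note that the loss equals 1/γ at zero error and grows smoothly with the error magnitude, which reflects the "more convex" performance surface discussed above.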

Constrained KRSL Loss
When applied to constrained adaptive filtering, the optimization problem with the KRSL loss becomes

min_w (1/(γN)) Σ_{l=1}^{N} exp(γ(1 − κ_σ(e_l)))  subject to  C^T w = f,

where {u_l, d_l}_{l=1}^{N} ∈ R^m × R are the input-output training data pairs; w ∈ R^m is the weight vector and e_l = d_l − w^T u_l is the corresponding error; C ∈ R^{m×q} and f ∈ R^q are the constraint matrix and vector, respectively. By constructing the Lagrange function, the constrained problem is transformed into minimizing the following constrained KRSL loss:

J_c(n) = (1/γ) Σ_{l=1}^{n} exp(γ(1 − κ_σ(e_l))) + θ_n^T (f − C^T w),

with θ_n ∈ R^q being the Lagrange multiplier.

Proposed CRKRSL Algorithm
Setting the gradient of J_c(n) to zero at instant n yields the stationary condition, where φ(e_l) = exp(γ(1 − κ(e_l)))κ(e_l). Defining the weighted correlation quantities accordingly, the constrained solution is derived as Equation (7). By the matrix inversion lemma [31], Equation (7) is further rewritten as Equation (9), where the gain g_n is given by Equation (11); reorganizing Equation (11) yields another form of g_n in Equation (12). To obtain a recursive solution, we expand Equation (9) into Equation (13), where the corrected error e_n updated by the a priori weight w_{n−1} is given by Equation (14). Substituting Equation (12) into Equation (13) gives Equation (15). Therefore, combining Equations (10), (11), (14) and (15), we obtain the constrained recursive CRKRSL algorithm.

The main drawback of Equation (15) is that the inverse matrix (C^T U_n C)^{−1} needs to be updated iteratively with complexity O(q³). To reduce the computational complexity of the CRKRSL algorithm, we consider the following linear model and make some assumptions. The linear model is described by

d_n = u_n^T w* + v_n,

where w* is the model parameter and v_n is the noise at instant n. The assumptions are as follows:

A1: the input sequence {u_n} is i.i.d. with zero mean and positive definite correlation matrix R = E[u_n u_n^T];
A2: the noise v_n is independent of the input u_n;
A3: the error e_n is uncorrelated with u_n u_n^T.

Inspired by the average approximation [29] and based on A1-A3, the correlation matrix U_n is approximated by

U_n ≈ η_n Z,  with η_n = (nE[φ(v_n)])^{−1} and Z = R^{−1},

where the required inverse is approximated by a first-order Taylor expansion around the noise.
Furthermore, Equation (13) can be simplified as

w_n = w_{n−1} + φ(e_n)e_n U_n u_n + U_n Cθ_n ≈ w_{n−1} + η_n φ(e_n)e_n Zu_n + ZCθ̃,

where θ̃ = Θ(f − C^T(w_{n−1} + η_n φ(e_n)e_n Zu_n)) and the constrained inverse matrix is defined as Θ = (C^T ZC)^{−1}. Therefore, w_n is further expressed as

w_n = Q(w_{n−1} + η_n φ(e_n)e_n Zu_n) + p   (20)

with Q = I − ZCΘC^T and p = ZCΘf.
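As a quick numerical sanity check (with hypothetical C, f, and R; the variable names follow the text), the pair Q = I − ZCΘC^T and p = ZCΘf forces any update of the form w_n = Q(·) + p to satisfy the constraint C^T w_n = f exactly, since C^T Q = 0 and C^T p = f:

```python
import numpy as np

rng = np.random.default_rng(0)
m, q = 7, 3

# Hypothetical constraint data; R plays the role of E[u u^T] and Z = R^{-1}.
C = rng.standard_normal((m, q))
f = rng.standard_normal(q)
A = rng.standard_normal((m, m))
R = A @ A.T + np.eye(m)               # positive definite correlation matrix
Z = np.linalg.inv(R)

Theta = np.linalg.inv(C.T @ Z @ C)    # constant, computed once
Q = np.eye(m) - Z @ C @ Theta @ C.T
p = Z @ C @ Theta @ f

# Any update of the form w = Q(...) + p satisfies C^T w = f exactly.
w = Q @ rng.standard_normal(m) + p
assert np.allclose(C.T @ w, f)
assert np.allclose(C.T @ Q, np.zeros((q, m)), atol=1e-10)
```

This is why the recursion never drifts off the constraint manifold, regardless of the error weighting φ(e_n).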
Based on this approximation, the recursive CRKRSL algorithm is converted into a gradient-type algorithm with variable step size η_n and transformed input Zu_n. However, Equation (20) is not stable in the initial update phase, since the variable step size η_n is large when the instant n is small. In particular, when the kernel width of KRSL is small, the step size η_n may even exceed the convergence range, degrading the filtering performance. To overcome this unfavorable factor, we introduce a gradient strategy with a fixed step size µ to replace the update of Equation (20) in the initial L iterations, which is described by

w_n = Q̃(w_{n−1} + µφ(e_n)e_n u_n) + p̃,   (21)

where Q̃ = I − C(C^T C)^{−1}C^T and p̃ = C(C^T C)^{−1}f denote the standard constraint projection matrix and vector. Finally, the CRKRSL algorithm is summarized in Algorithm 1.

Remark 1. The term φ(e_n) in Equation (20) has a significant impact on the stability of the CRKRSL algorithm under non-Gaussian noise, since CRKRSL can suppress large outliers (e_n → ∞) with a small φ(e_n). Figure 1 shows the relation between φ(e_n) and the error e_n. It is clear that, when the error is small, φ(e_n) in CRKRSL is larger than κ(e_n) in CRMCC (note that φ(e_n) = κ(e_n) if γ = 0). Moreover, when γ = 0, CRKRSL degenerates to an efficient CRMCC [26] algorithm and φ(e_n) reaches its maximum at e_n = 0. When γ > 0, φ(e_n) reaches its maximum at points around e_n = 0 with a larger increment, resulting in faster convergence and higher accuracy than CRMCC.
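The behavior of the weighting function φ(e_n) = exp(γ(1 − κ(e_n)))κ(e_n) discussed above can be verified numerically (a small sketch with illustrative parameter values of our choosing):

```python
import numpy as np

def kappa(e, sigma=1.0):
    """Gaussian kernel kappa_sigma(e)."""
    return np.exp(-np.square(e) / (2.0 * sigma**2))

def phi(e, gamma=2.0, sigma=1.0):
    """KRSL weighting: phi(e) = exp(gamma * (1 - kappa(e))) * kappa(e)."""
    k = kappa(e, sigma)
    return np.exp(gamma * (1.0 - k)) * k

e = np.linspace(-6.0, 6.0, 2001)

# gamma = 0 recovers the CRMCC weighting kappa(e), maximal at e = 0.
assert np.allclose(phi(e, gamma=0.0), kappa(e))

# For gamma > 1 the maximum sits away from e = 0 (at |e| = sigma*sqrt(2 ln gamma)),
# while large outliers are still suppressed (phi -> 0 as |e| -> inf).
g = 2.0
e_star = np.sqrt(2.0 * np.log(g))
assert phi(np.array([e_star]), gamma=g)[0] > phi(np.array([0.0]), gamma=g)[0]
assert phi(np.array([50.0]), gamma=g)[0] < 1e-6
```

The larger weighting at small errors is what yields the faster convergence relative to CRMCC, while the decay at large errors preserves outlier robustness.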

Algorithm 1: CRKRSL
Initialization: choose the step size µ, kernel width σ, risk-sensitive parameter γ, initial iterative length L, and training size N_tr; set the initial weight w_0 = 0.
for n = 1 : L
    e_n = d_n − w_{n−1}^T u_n
    w_n = Q̃(w_{n−1} + µφ(e_n)e_n u_n) + p̃
end
for n = (L + 1) : N_tr
    e_n = d_n − w_{n−1}^T u_n
    w_n = Q(w_{n−1} + η_n φ(e_n)e_n Zu_n) + p
end

Remark 2. In Equation (20), the constant inverse matrix Θ = (C^T ZC)^{−1} needs to be calculated only once before the update. On the contrary, (C^T U_n C)^{−1} in Equation (15) needs to be updated iteratively. Meanwhile, the update of the matrix U_n by Equation (10) is avoided by using Equation (20). Therefore, the proposed CRKRSL with Equation (20) has a lower computational complexity than the one with Equation (15).
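For illustration, Algorithm 1 can be sketched in Python as follows. This is a minimal interpretation, not the authors' reference code: Z = R^{−1} is assumed known (or pre-estimated), the projections follow Equations (20) and (21), and E[φ(v_n)] in the step size η_n is replaced by a running average of φ(e_n):

```python
import numpy as np

def crkrsl(u, d, C, f, Z, gamma=2.0, sigma=2.0, mu=0.05, L=None):
    """Sketch of the CRKRSL recursion (Algorithm 1).

    u : (N, m) input matrix, d : (N,) desired output,
    C : (m, q) constraint matrix, f : (q,) constraint vector,
    Z : (m, m) inverse input correlation matrix R^{-1} (assumed known here).
    """
    N, m = u.shape
    L = m if L is None else L

    # Constant projections, each computed once before the update.
    Theta = np.linalg.inv(C.T @ Z @ C)
    Q = np.eye(m) - Z @ C @ Theta @ C.T        # Eq. (20) projection
    p = Z @ C @ Theta @ f
    Theta_t = np.linalg.inv(C.T @ C)           # our reading of the tilde
    Qt = np.eye(m) - C @ Theta_t @ C.T         # quantities in Eq. (21)
    pt = C @ Theta_t @ f

    w = np.zeros(m)
    phi_bar = 1.0                              # running estimate of E[phi(v_n)]
    for n in range(1, N + 1):
        un, dn = u[n - 1], d[n - 1]
        e = dn - w @ un
        k = np.exp(-e**2 / (2.0 * sigma**2))
        phi_e = np.exp(gamma * (1.0 - k)) * k
        phi_bar += (phi_e - phi_bar) / n       # running mean of phi(e)
        if n <= L:                             # fixed-step phase, Eq. (21)
            w = Qt @ (w + mu * phi_e * e * un) + pt
        else:                                  # variable-step phase, Eq. (20)
            eta = 1.0 / (n * phi_bar)
            w = Q @ (w + eta * phi_e * e * (Z @ un)) + p
    return w
```

A short synthetic run (white Gaussian input, so Z = I, and a target weight chosen to satisfy the constraint) shows the estimate staying exactly on the constraint manifold while approaching the true weight.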

Stability Analysis
To obtain the mean square stability condition on the weight error of CRKRSL, we define the weight error as w̃_n = w_n − w_o and the model error as w_∆ = w* − w_o, where w_o is the optimal constrained weight determined by the robust correlation matrix and vector in the limit, U = lim_{n→∞} U_n. Subtracting w_o from both sides of Equation (20), we obtain

w̃_n = Q(w_{n−1} + η_n φ(e_n)Zu_n d_n − η_n φ(e_n)Zu_n u_n^T w_{n−1}) + p − w_o.

Since Q is an idempotent matrix, we have Qw_o − w_o + p = 0 and Qw̃_n = w̃_n. Then, we get

w̃_n = Q(I − η_n φ(e_n)Zu_n u_n^T)w̃_{n−1} + η_n φ(e_n)QZu_n(u_n^T w_∆ + v_n)
     = (I − η_n φ(e_n)QZu_n u_n^T)w̃_{n−1} + η_n φ(e_n)QZu_n(u_n^T w_∆ + v_n).   (26)

We take the expectation of the squared norm on both sides; since the noise v_n is independent of u_n and the input sequence {u_n} is i.i.d. under assumptions A1-A2, the a priori weight error w̃_{n−1} is independent of u_n and v_n under the independence assumptions [32]. Hence, the cross terms are equal to zero, and we obtain the recursion in Equation (27). According to Isserlis's theorem [33], the symmetric matrix containing fourth-moment Gaussian variables can be separated as

E[u_n u_n^T A u_n u_n^T] = R tr{AR} + 2RAR,

where tr{·} denotes the trace operator and R = Z^{−1} = E[u_n u_n^T] is a positive definite correlation matrix.
Therefore, Equation (27) can be simplified as Equation (30) with a simplified transition matrix F_n. Let q_k and r_k, k ∈ {1, 2, ..., m}, denote the kth eigenvalues of the matrices Q and R, respectively. To ensure mean square stability, the eigenvalues of F_n should satisfy the condition in (32), from which the convergence condition on the step size is expressed in (33). Since the step size η_n = (nE[φ(v_n)])^{−1} depends on the iteration index n, we finally obtain the mean square stability condition on the iteration n in (34), where the nonlinear terms E[φ(e_n)] and E[φ²(e_n)] can be approximated by a Taylor expansion.

Remark 3. Inequality (34) implies that the iteration index n should be sufficiently large to guarantee convergence. When n is small, CRKRSL with Equation (20) cannot satisfy (34), resulting in fluctuations at the initial update stage. Therefore, it is reasonable to replace Equation (20) with Equation (21) in the early iterations to improve the convergence speed and filtering accuracy.

Results and Discussion
In this section, we demonstrate the advantages of the CRKRSL algorithm in filtering accuracy and computational complexity for both low-dimensional and high-dimensional inputs. The noise models, data selection, and algorithm comparisons are described in turn.
Data selection: The training inputs are sampled from a Gaussian distribution with zero mean and covariance matrix R, and 5000 samples are used for the simulation. The parameters C, f, and R are configured as in [7]. Note that [7] provides two data sets with different input dimensions, so the underlying dimension is either m = 7 or m = 31. The simulated mean square deviation (MSD) is defined as MSD(dB) = 10 log₁₀(‖w̃_n‖²), and the steady-state MSD is defined as the mean over the last 1000 samples. The reported results are averaged over 500 Monte Carlo trials. All simulations were run in MATLAB R2020b on a Windows 10 system with an Intel(R) Core(TM) i7-8700 CPU at 3.20 GHz and 16 GB of RAM.
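The MSD metric used throughout this section can be written as a small helper (our naming; w_err denotes the weight-error vector w̃_n, and the steady-state average follows the 1000-sample convention above):

```python
import numpy as np

def msd_db(w_err):
    """MSD(dB) = 10 * log10(||w_tilde||^2) for a weight-error vector."""
    w_err = np.asarray(w_err, dtype=float)
    return 10.0 * np.log10(np.dot(w_err, w_err))

def steady_state_msd(msd_curve, tail=1000):
    """Mean of the last `tail` per-iteration MSD values (in dB)."""
    return float(np.mean(np.asarray(msd_curve, dtype=float)[-tail:]))
```

For example, a weight error of norm 0.1 corresponds to an MSD of −20 dB.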
Compared algorithms: The constrained algorithms CLMS [7], CMCC [21], CLLAF [25], CRLS [9], and CRMCC [26] are chosen for comparison with the proposed CRKRSL. For a fair comparison, the kernel widths σ of CMCC, CRMCC, and CRKRSL are set to the same value; the regularization terms of CRLS and CRMCC are set to 0.001; and the initial iterative length of CRKRSL is set equal to the input dimension, i.e., L = m.

Low-Dimensional Input
In this part, the input dimension m and constraint dimension q are set to m = 7 and q = 3, respectively. The noise considered here is either Gaussian noise v(n) ∼ N(0, 0.1) or mixed noise whose Gaussian component satisfies v_1(n) ∼ N(0, 0.1). To reflect the influence of the risk-sensitive parameter γ on the MSD, the relations between γ and the steady-state MSD under the different noise models are shown in Figures 2 and 3. From Figure 2, one can see that CRKRSL has a stable steady-state MSD under Gaussian noise; hence, γ has little influence on its performance in this case. From Figure 3, it is observed that CRKRSL is sensitive to γ under mixed noise with model (a) and achieves the lowest steady-state MSD around γ = 2.5. Therefore, we choose γ = 2.5 for the algorithm comparisons in Figures 4 and 5. Note that ρ denotes the shape parameter of the CLLAF algorithm; all necessary parameters are given in the figures.

From Figure 4, one can see that CRKRSL, CRMCC, and CRLS coincide and have almost the same MSDs under Gaussian noise. This implies that CRKRSL handles Gaussian noise well when a large kernel width is chosen. From Figure 5, it is observed that CRKRSL has the best performance among all the constrained algorithms, since the risk-sensitive parameter γ avoids the fluctuations caused by a small kernel width. Moreover, CRKRSL is more stable than CRMCC at the initial stage, which potentially leads to better filtering performance. In Table 1, we further compare the per-iteration computation time and the steady-state MSDs of each algorithm under mixed noise with model (a); the computation time of CRKRSL is far less than that of CRMCC and CRLS. To show the advantage of CRKRSL in computational complexity, Table 2 lists the per-iteration complexity of all the mentioned algorithms. One can see that CRKRSL has a lower computational complexity than CRLS and CRMCC by avoiding the calculation of the inverse matrix.
Although the inverse of the matrix U_n is not required in CRMCC, the inverse matrix (C^T U_n C)^{−1} still needs to be calculated at each iteration, resulting in a high computational complexity for large q.
To test the performance of CRKRSL under mixed noise with model (b), Figure 6 gives the MSD results of all the mentioned algorithms. One can see from Figure 6 that CRKRSL, CRMCC, CMCC, and CLLAF show strong robustness to outliers, with CRKRSL achieving the lowest MSD, whereas CLMS and CRLS are unstable owing to their sensitivity to α-stable noise.

Table 2. Computational complexity of the compared algorithms at each iteration.

High-Dimensional Input
In this part, the input dimension m and constraint dimension q are set to m = 31 and q = 1, respectively. We only consider the mixed noise, since Gaussian noise has little influence on the performance of CRKRSL when a large kernel width is selected. The Gaussian component of the mixed noise satisfies v_1(n) ∼ N(0, 1). Figures 7 and 8 show the MSDs of the different algorithms under mixed noise with model (a) and model (b), respectively. It is clear that CRKRSL shows the best performance among all the compared algorithms under both noise models. The initial iterative length L influences the convergence speed significantly; therefore, L should not be smaller than the input dimension. In Table 3, we further compare the per-iteration computation time and the steady-state MSDs of each algorithm under mixed noise with model (a). One can see that the computation time of CRKRSL is far less than that of CRMCC, and that CRKRSL has the lowest steady-state MSD.

Conclusions
By introducing linear constraints into the kernel risk-sensitive loss (KRSL), a low-complexity constrained recursive KRSL (CRKRSL) algorithm is presented with the help of an average approximation. Since the risk-sensitive parameter is able to control the smoothness of the performance surface, CRKRSL achieves higher accuracy than some existing constrained recursive algorithms. Owing to the inaccuracy of the average approximation when few inputs are available, a fixed-step-size gradient method is adopted to avoid the instability of CRKRSL at the initial update stage. Moreover, the mean square analysis indicates that the number of iterations significantly influences the stability of CRKRSL, and the simulation results confirm its advantages. The effectiveness of CRKRSL relies heavily on the average approximation method, which has limitations when coping with nonstationary signals. In future work, we will focus on finding a novel approximation method to process both stationary and nonstationary signals and to further improve the computational efficiency and accuracy of constrained recursive algorithms.