Recursive Minimum Complex Kernel Risk-Sensitive Loss Algorithm

The maximum complex correntropy criterion (MCCC) extends correntropy to the complex domain for dealing with complex-valued data in the presence of impulsive noise. Compared with the correntropy-based loss, the kernel risk-sensitive loss (KRSL), defined in kernel space, has demonstrated a superior performance surface in the complex domain. However, there has been no report regarding a recursive KRSL algorithm in the complex domain. Therefore, in this paper we propose a recursive complex KRSL algorithm called the recursive minimum complex kernel risk-sensitive loss (RMCKRSL). In addition, we analyze its stability and obtain the theoretical value of the excess mean square error (EMSE), both of which are supported by simulations. Simulation results verify that the proposed RMCKRSL outperforms the MCCC, the generalized MCCC (GMCCC), and traditional recursive least squares (RLS).


Introduction
As many noises are non-Gaussian distributed in practice, the performance of traditional second-order statistics-based similarity measures may deteriorate dramatically [1,2]. To efficiently handle non-Gaussian noise, a higher-order statistic called correntropy [3][4][5][6] was proposed. Correntropy is a nonlinear, local similarity measure widely used in adaptive filters [7][8][9][10][11][12][13][14][15], and it usually employs a Gaussian function as the kernel owing to its flexibility and positive definiteness. However, the Gaussian kernel is not always the best choice [16]. Hence, Chen et al. proposed the generalized maximum correntropy criterion (GMCC) algorithm [16,17], which uses a generalized Gaussian density function as the kernel. Compared with the traditional maximum correntropy criterion (MCC), the GMCC behaves better when the shape parameter is properly selected; moreover, the MCC can be regarded as a special case of the GMCC. Considering that the error performance surface of the correntropic loss is highly non-convex, Chen et al. proposed another algorithm, named the minimum kernel risk-sensitive loss (MKRSL), which is defined in kernel space but also inherits the original form of the risk-sensitive loss (RSL) [18,19]. The performance surface of the kernel risk-sensitive loss (KRSL) is more favorable than that of the MCC, resulting in a faster convergence speed and a higher accuracy. Furthermore, the KRSL is also insensitive to outliers.
Generally, adaptive filtering has mainly focused on the real domain and cannot be used to deal with complex-valued data directly. Recently, complex-domain adaptive filtering has drawn more attention. Guimaraes et al. proposed the maximum complex correntropy criterion (MCCC) [20,21] and provided a probabilistic interpretation [20]. The MCCC shows an obvious advantage over the least absolute deviation (LAD) [22], complex least mean square (CLMS) [23], and recursive least squares (RLS) [24] algorithms. The stability analysis and the theoretical EMSE of the MCCC have been derived [25], and the MCCC has been extended to the generalized case [26]. The generalized MCCC (GMCCC) algorithm employs a complex generalized Gaussian density as the kernel and offers a desirable performance for handling complex-valued data. In addition, a gradient-based complex kernel risk-sensitive loss (CKRSL) algorithm defined in kernel space has shown a superior performance [27]. Until now, however, there has been no report about a recursive complex KRSL algorithm. Therefore, in this paper we first propose a recursive minimum CKRSL (RMCKRSL) algorithm. Then, we analyze its stability and calculate the theoretical value of the EMSE. Simulations show that the RMCKRSL outperforms the MCCC, GMCCC, and traditional RLS, and they also demonstrate the correctness of the theoretical analysis.
The remaining parts of this paper are organized as follows: In Section 2, we provide the loss function of the CKRSL and propose the recursive MCKRSL algorithm. In Section 3 we analyze the stability and obtain the theoretical value of the EMSE for the proposed algorithm. In Section 4, simulations are performed to verify the superior convergence of the RMCKRSL algorithm and the correctness of the theoretical analysis. Finally, in Section 5 we draw a conclusion.

Complex Kernel Risk-Sensitive Loss
Supposing there are two complex random variables C1 = X1 + jY1 and C2 = X2 + jY2, the complex kernel risk-sensitive loss (CKRSL) is defined as [27]:

L_λ(C1, C2) = (1/λ) E[exp(λ(1 − κ^c_σ(C1 − C2)))],

where X1, X2, Y1, and Y2 are real variables, λ > 0 is the risk-sensitive parameter, E[·] denotes the expectation operator, and κ^c_σ(C1 − C2) is the kernel function.
This paper employs a Gaussian kernel, which is expressed as:

κ^c_σ(C1 − C2) = exp(−(C1 − C2)(C1 − C2)*/(2σ²)) = exp(−|C1 − C2|²/(2σ²)),

where σ > 0 is the kernel width.
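To make the definition concrete, the CKRSL with the Gaussian kernel can be evaluated numerically. The following NumPy sketch (the function names and the values λ = 2, σ = 1 are ours, not the paper's) replaces the expectation with a sample mean:

```python
import numpy as np

def gaussian_kernel(e, sigma=1.0):
    # Complex Gaussian kernel: exp(-|e|^2 / (2 sigma^2)), equal to 1 at e = 0
    return np.exp(-np.abs(e) ** 2 / (2.0 * sigma ** 2))

def ckrsl(c1, c2, lam=2.0, sigma=1.0):
    # Sample estimate of (1/lam) E[exp(lam * (1 - kernel(C1 - C2)))]
    return np.mean(np.exp(lam * (1.0 - gaussian_kernel(c1 - c2, sigma)))) / lam
```

When c1 = c2 the loss attains its minimum 1/λ, and it saturates at exp(λ)/λ for large errors, which is what bounds the influence of outliers.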

Cost Function
We define the cost function of the MCKRSL as:

J_MCKRSL(w) = (1/λ) E[exp(λ(1 − κ^c_σ(e(k))))],

where e(k) = d(k) − w^H x(k) denotes the error at the k-th iteration, d(k) represents the expected response at the k-th iteration, w = [w_1, w_2, ..., w_m]^T denotes the estimated weight vector, m is the length of the adaptive filter, and x(k) denotes the input vector at the k-th iteration.
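For a finite data record, the expectation in the cost can be replaced by a sample mean over the errors. A minimal sketch under that assumption (λ = 2 and σ = 1 are illustrative; rows of X are taken as the input vectors x(i)):

```python
import numpy as np

def mckrsl_cost(w, X, d, lam=2.0, sigma=1.0):
    # Empirical MCKRSL cost with e(i) = d(i) - w^H x(i)
    e = d - X @ np.conj(w)
    kappa = np.exp(-np.abs(e) ** 2 / (2.0 * sigma ** 2))
    return np.mean(np.exp(lam * (1.0 - kappa))) / lam
```

The cost is minimized (value 1/λ) when the filter output matches the desired response exactly, and any mismatch raises it toward the saturation value exp(λ)/λ.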

Recursive Solution
Using the Wirtinger calculus [28,29], the gradient of J_MCKRSL with respect to w* is derived as:

∂J_MCKRSL/∂w* = −(1/(2σ²)) E[h(e(k)) e*(k) x(k)],

where h(e(k)) = exp(λ(1 − κ^c_σ(e(k)))) κ^c_σ(e(k)). By setting ∂J_MCKRSL/∂w* = 0, we obtain the optimal solution

w = R^(−1) p,  (6)

where R = E[h(e(k)) x(k) x^H(k)] and p = E[h(e(k)) d*(k) x(k)]. It should be noted that Equation (6) is actually a fixed-point solution, because R and p depend on w. In practice, R and p are usually estimated as follows when the samples are finite:

R̂_k = (1/k) Σ_{i=1}^{k} h(e(i)) x(i) x^H(i),  p̂_k = (1/k) Σ_{i=1}^{k} h(e(i)) d*(i) x(i).

Hence, R̂, p̂, and w are updated sample by sample. Using the matrix inversion lemma [30], we may rewrite R̂^(−1)_k in Equation (12) as:

R̂^(−1)_k = R̂^(−1)_{k−1} − (h(e(k)) R̂^(−1)_{k−1} x(k) x^H(k) R̂^(−1)_{k−1}) / (1 + h(e(k)) x^H(k) R̂^(−1)_{k−1} x(k)).

After some algebraic manipulations, we may derive the recursive form of w(k) as follows:

w(k) = w(k − 1) + g(k) e*(k),  g(k) = (h(e(k)) R̂^(−1)_{k−1} x(k)) / (1 + h(e(k)) x^H(k) R̂^(−1)_{k−1} x(k)),

with e(k) = d(k) − w^H(k − 1) x(k). Finally, Algorithm 1 summarizes the recursive MCKRSL (RMCKRSL) algorithm.
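The recursive update can be sketched in NumPy as below (our own naming; δ is a regularization constant for initializing P as an approximation of the inverse weighted autocorrelation, and h(e) = exp(λ(1 − κσ(e)))κσ(e) is the error-dependent weight, so the recursion reduces to standard complex RLS when h ≡ 1):

```python
import numpy as np

def rmckrsl(X, d, lam=2.0, sigma=1.0, delta=1e-2):
    # Recursive minimum CKRSL filter; rows of X are input vectors x(i)
    n, m = X.shape
    w = np.zeros(m, dtype=complex)
    P = np.eye(m, dtype=complex) / delta          # P approximates R^{-1}
    for i in range(n):
        x = X[i]
        e = d[i] - np.vdot(w, x)                  # a priori error d(i) - w^H x(i)
        kappa = np.exp(-np.abs(e) ** 2 / (2.0 * sigma ** 2))
        h = np.exp(lam * (1.0 - kappa)) * kappa   # small for outliers -> robust
        Px = P @ x
        g = h * Px / (1.0 + h * np.vdot(x, Px).real)   # gain vector
        P = P - np.outer(g, np.conj(x) @ P)            # matrix inversion lemma
        w = w + g * np.conj(e)                          # weight update
    return w
```

An outlier drives κσ(e) toward zero, so h (and hence the gain) vanishes and the corrupted sample is effectively ignored, which is the source of the algorithm's robustness.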

Stability Analysis
Supposing the desired signal is given by

d(k) = w_0^H x(k) + v(k),

we rewrite the error as:

e(k) = e_a(k) + v(k),

where w_0 is the system parameter to be estimated, w̃(k − 1) = w_0 − w(k − 1), v(k) represents the noise at discrete time k, and e_a(k) = w̃^H(k − 1) x(k) is the a priori error. Furthermore, we rewrite w(k) as in Equation (20), where f(e(k)) = h(e(k)) e*(k), and the second line of Equation (20) is obtained approximately by using the following: (1) the approximation in the second line of Equation (20) holds when |e_a(l)|² is small enough, where e_a(l) = (w_0 − w(l − 1))^H x(l); (2) according to Equation (20), the RMCKRSL can be approximately viewed as a gradient descent method with variable step size a_0/k. By multiplying both sides of Equation (20) by R̂^(1/2), we obtain a recursion for the weighted weight-error vector, where ||·||_F represents the Frobenius norm, Re(·) denotes the real part, and I_m denotes the m × m identity matrix. From this recursion we can determine that, when the corresponding contraction condition holds, the sequence E[||w̃(k)||²] is decreasing and the algorithm will converge.

Excess Mean Square Error
Let S(k) denote the excess mean square error (EMSE), defined as S(k) = E[|e_a(k)|²]. To derive the theoretical value of S(k), we adopt some commonly used assumptions [8,27,31]: (A1) v(k) is zero-mean and independent and identically distributed (IID), and e_a(k) is zero-mean and independent of v(k); (A2) x(k) is independent of v(k), circular, and stationary. Thus, taking Equations (23) and (25) into consideration, we obtain the evolution of S(k). Similar to [27], we can then obtain a first-order recursion for S(k).
Entropy 2018, 20, 902
It can be seen from Equation (31) that S(k) is the solution to a first-order difference equation. Thus, we derive that S(k) = S_h(k) + S_p(k), where S_h(k) is the homogeneous (transient) part, which contains a constant c_1, and S_p(k) is the particular (steady-state) part; Equation (32) is reliable only when |e_a|² is small enough and k is large. The constant c_1 can be obtained from the initial value of the EMSE. However, it is generally unnecessary to calculate c_1, because S_h(k) ≪ S_p(k) when k is large. Thus, S(k) ≈ S_p(k).

Simulation
In this section, two examples, i.e., system identification and nonlinear prediction, are used to illustrate the superior performance of the RMCKRSL. The simulation results were obtained by averaging over 1000 Monte Carlo trials.

Example 1
We chose the length of the filter as five, where the weight vector w_0 = [w_1, w_2, ..., w_5]^T is generated randomly, with w_i = w_Ri + j w_Ii, w_Ri ∈ N(0, 0.1) and w_Ii ∈ N(0, 0.1) being the real and imaginary parts of w_i, and N(µ, σ²) denoting the Gaussian distribution with mean µ and variance σ². The input signal x = x_R + j x_I is also generated randomly, where x_R, x_I ∈ N(0, 1). An additive complex noise v = v_R + j v_I, with v_R and v_I being its real and imaginary parts, is considered in the simulations.
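The setup above can be reproduced as follows (a sketch; the seed is arbitrary, and N(0, 0.1) means variance 0.1, i.e., standard deviation √0.1):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 5000

# w_i = w_Ri + j*w_Ii with w_Ri, w_Ii ~ N(0, 0.1)
w0 = rng.normal(0.0, np.sqrt(0.1), m) + 1j * rng.normal(0.0, np.sqrt(0.1), m)

# input x = x_R + j*x_I with x_R, x_I ~ N(0, 1); one m-tap vector per sample
X = rng.normal(0.0, 1.0, (n, m)) + 1j * rng.normal(0.0, 1.0, (n, m))
```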
First, we verify the superiority of the RMCKRSL in the presence of contaminated Gaussian noise [17,19], modeled as v(k) = (1 − c(k)) v_1(k) + c(k) v_2(k), where v_1R, v_1I ∈ N(0, 0.1) is the background noise, v_2R, v_2I ∈ N(0, 20) represents an outlier (or impulsive disturbance), P(c(k) = 1) = 0.06 is the occurrence probability of impulsive disturbances, and P(c(k) = 0) = 0.94. To ensure a fair comparison, all the algorithms use recursive iteration to search for the optimal solution, and the parameters of the different algorithms are chosen experimentally to guarantee a desirable solution. The performances of the different algorithms in terms of the weight error power ||w_0 − w(k)||² are shown in Figure 1. It is clear that, compared with the MCCC, GMCCC, and traditional RLS, the RMCKRSL has the best filtering performance.
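The contaminated Gaussian model used here can be sampled as below (a sketch, assuming the mixture form v(k) = (1 − c(k))v1(k) + c(k)v2(k) with the stated per-component variances and p = 0.06):

```python
import numpy as np

def contaminated_gaussian(n, var1=0.1, var2=20.0, p=0.06, seed=1):
    # v(k) = (1 - c(k)) v1(k) + c(k) v2(k): background noise v1 with
    # per-component variance var1, outliers v2 with per-component
    # variance var2, and c(k) ~ Bernoulli(p) switching the outliers on
    rng = np.random.default_rng(seed)
    c = rng.random(n) < p
    v1 = rng.normal(0, np.sqrt(var1), n) + 1j * rng.normal(0, np.sqrt(var1), n)
    v2 = rng.normal(0, np.sqrt(var2), n) + 1j * rng.normal(0, np.sqrt(var2), n)
    return np.where(c, v2, v1)
```

The resulting noise power E|v|² = (1 − p)·2·var1 + p·2·var2 ≈ 2.59 is dominated by the rare outliers, which is what degrades second-order criteria such as RLS.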
Then, the validity of the theoretical EMSE of the MCKRSL is demonstrated. The noise model is again a contaminated Gaussian model with v_2R, v_2I ∈ N(0, 20), P(c(k) = 1) = 0.06, and P(c(k) = 0) = 0.94, while the background noise variance σ²_v is varied. Figure 2 compares the theoretical EMSEs with the simulated ones under variations of σ²_v. Obviously, there is a good match between the theoretical EMSEs and the simulated ones. In addition, the EMSE becomes larger as the noise variance increases.
Next, we tested the influence of outliers on the performance of the RMCKRSL algorithm. The noise model is again a contaminated Gaussian noise, where v_1R, v_1I ∈ N(0, 0.1), v_2R, v_2I ∈ N(0, σ²_B/2), P(c(k) = 1) = p, and P(c(k) = 0) = 1 − p. Figure 3 depicts the performances of the different algorithms under different outlier probability (p) values, where the sample size is 5000 and the variance of the outlier is σ²_B = 40. One can observe that the proposed RMCKRSL algorithm is robust to the probability of an outlier and has better performance than the MCCC, GMCCC, and RLS. Figure 4 depicts the performances of the different algorithms under different outlier variance (σ²_B) values, where the sample size is also 5000 and the probability of an outlier is p = 0.06. It can be observed that the proposed RMCKRSL algorithm is also robust to the variance of an outlier and has better performance than the other algorithms.
Finally, the influences of the kernel width σ and the risk-sensitive parameter λ on the performance of the RMCKRSL are investigated. The noise model is again a contaminated Gaussian noise, where v_1R, v_1I ∈ N(0, 0.1), v_2R, v_2I ∈ N(0, 20), P(c(k) = 1) = 0.06, and P(c(k) = 0) = 0.94. Figures 5 and 6 present the performance of the RMCKRSL under different kernel widths σ and risk-sensitive parameters λ, respectively. One can see that both σ and λ play an important role in the performance of the RMCKRSL. It is challenging to choose the optimal σ and λ because they depend on the statistical characteristics of the noise, which are unknown in practical cases. Thus, it is suggested that the parameters be chosen by experimentation.



Example 2
In this example, the superiority of the RMCKRSL is demonstrated by the prediction of a nonlinear system, where s(t) = u_0 [s_1(t) + j s_2(t)], s_1(t) is a Mackey-Glass chaotic time series governed by the delay differential equation [15]:

ds_1(t)/dt = a s_1(t − τ)/(1 + s_1^10(t − τ)) − b s_1(t),

s_2(t) is the reverse of s_1(t), and u_0 is a complex-valued number whose real and imaginary parts are randomly generated and obey a uniform distribution over the interval [0, 1]. s(t) is discretized by sampling with an interval of six seconds and is affected by the contaminated Gaussian noise. The performances of the different algorithms are shown in Figure 7. One may observe that the RMCKRSL has a faster convergence rate and better filtering accuracy than the other algorithms. In addition, the RLS behaves the worst, since the minimum square error criterion is not robust to impulsive noise.
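As an illustration, s_1(t) can be generated by an Euler discretization of the Mackey-Glass delay equation (the constants a = 0.2, b = 0.1, τ = 17 and the constant initial history are commonly used values and are our assumptions, since the paper's exact settings are not reproduced here):

```python
import numpy as np

def mackey_glass(n, a=0.2, b=0.1, tau=17, dt=1.0, s_init=1.2):
    # Euler steps of ds/dt = a*s(t - tau)/(1 + s(t - tau)^10) - b*s(t),
    # started from a constant history s(t) = s_init for t <= 0
    hist = int(round(tau / dt))
    s = np.full(n + hist, s_init)
    for t in range(hist, n + hist - 1):
        delayed = s[t - hist]
        s[t + 1] = s[t] + dt * (a * delayed / (1.0 + delayed ** 10) - b * s[t])
    return s[hist:]

rng = np.random.default_rng(2)
s1 = mackey_glass(1000)
u0 = rng.random() + 1j * rng.random()   # Re(u0), Im(u0) ~ U[0, 1]
s = u0 * (s1 + 1j * s1[::-1])           # s2(t) taken as the reverse of s1(t)
```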


Conclusions
As a nonlinear similarity measure defined in kernel space, the kernel risk-sensitive loss (KRSL) shows a superior performance in adaptive filtering. However, there had been no report about a recursive KRSL algorithm in the complex domain. Thus, in this paper we focused on complex-domain adaptive filtering and proposed a recursive minimum complex KRSL (RMCKRSL) algorithm. Compared with the MCCC, GMCCC, and traditional RLS algorithms, the proposed algorithm offers both a faster convergence rate and a higher accuracy. Moreover, we derived the theoretical value of the EMSE and demonstrated its correctness by simulations.