Abstract
The maximum complex correntropy criterion (MCCC) extends the maximum correntropy criterion to the complex domain for dealing with complex-valued data in the presence of impulsive noise. Compared with the correntropy-based loss, the kernel risk-sensitive loss (KRSL), defined in kernel space, has demonstrated a superior performance surface in the complex domain. However, there has been no report on a recursive KRSL algorithm in the complex domain. Therefore, in this paper we propose a recursive complex KRSL algorithm called the recursive minimum complex kernel risk-sensitive loss (RMCKRSL). In addition, we analyze its stability and obtain the theoretical value of the excess mean square error (EMSE), both of which are supported by simulations. Simulation results verify that the proposed RMCKRSL outperforms the MCCC, the generalized MCCC (GMCCC), and traditional recursive least squares (RLS).
1. Introduction
As many noises encountered in practice are non-Gaussian, the performance of traditional second-order statistics-based similarity measures may deteriorate dramatically [1,2]. To handle non-Gaussian noise efficiently, a higher-order statistic called correntropy [3,4,5,6] was proposed. Correntropy is a nonlinear and local similarity measure widely used in adaptive filters [7,8,9,10,11,12,13,14,15], and usually employs a Gaussian function as the kernel function thanks to its flexibility and positive definiteness. However, the Gaussian kernel is not always the best choice [16]. Hence, Chen et al. proposed the generalized maximum correntropy criterion (GMCC) algorithm [16,17], which uses a generalized Gaussian density function as the kernel. Compared with the traditional maximum correntropy criterion (MCC), the GMCC behaves better when the shape parameter is properly selected; moreover, the MCC can be regarded as a special case of the GMCC. Considering that the error performance surface of the correntropic loss is highly non-convex, Chen et al. proposed another algorithm named the minimum kernel risk-sensitive loss (MKRSL), which is defined in kernel space but inherits the original form of the risk-sensitive loss (RSL) [18,19]. The performance surface of the kernel risk-sensitive loss (KRSL) is more favorable than that of the MCC, resulting in a faster convergence speed and higher accuracy. Furthermore, the KRSL is also insensitive to outliers.
Generally, adaptive filtering has mainly focused on the real domain, and real-domain algorithms cannot deal with complex-valued data directly. Recently, complex-domain adaptive filters have drawn increasing attention. Guimaraes et al. proposed the maximum complex correntropy criterion (MCCC) [20,21] and provided a probabilistic interpretation [20]. The MCCC shows an obvious advantage over the least absolute deviation (LAD) [22], complex least mean square (CLMS) [23], and recursive least squares (RLS) [24] algorithms. The stability analysis and the theoretical EMSE of the MCCC have been derived [25], and the MCCC has been extended to the generalized case [26]. The generalized MCCC (GMCCC) algorithm employs a complex generalized Gaussian density as the kernel and offers a desirable performance for handling complex-valued data. In addition, a gradient-based complex kernel risk-sensitive loss (CKRSL) algorithm defined in kernel space has shown a superior performance [27]. Until now, however, there has been no report on a recursive CKRSL algorithm. Therefore, in this paper we first propose a recursive minimum CKRSL (RMCKRSL) algorithm. Then, we analyze its stability and calculate the theoretical value of the EMSE. Simulations show that the RMCKRSL is better than the MCCC, GMCCC, and traditional RLS, and they also demonstrate the correctness of the theoretical analysis.
The remaining parts of this paper are organized as follows: In Section 2, we present the loss function of the CKRSL and propose the recursive MCKRSL algorithm. In Section 3, we analyze the stability and obtain the theoretical value of the EMSE for the proposed algorithm. In Section 4, simulations are performed to verify the superior convergence of the RMCKRSL algorithm and the correctness of the theoretical analysis. Finally, in Section 5, we draw conclusions.
2. Fixed Point Algorithm under Minimizing Complex Kernel Risk-Sensitive Loss
2.1. Complex Kernel Risk-Sensitive Loss
Supposing there are two complex variables $C_1 = x_1 + \mathrm{j}y_1$ and $C_2 = x_2 + \mathrm{j}y_2$, the complex kernel risk-sensitive loss (CKRSL) is defined as [27]:
$$L_\lambda(C_1, C_2) = \frac{1}{\lambda}\,\mathrm{E}\!\left[\exp\!\big(\lambda\big(1 - \kappa_\sigma(C_1 - C_2)\big)\big)\right], \tag{1}$$
where $x_1$, $y_1$, $x_2$, and $y_2$ are real variables, $\lambda > 0$ is the risk-sensitive parameter, and $\kappa_\sigma(\cdot)$ is the kernel function.
This paper employs a Gaussian kernel, which is expressed as:
$$\kappa_\sigma(C_1 - C_2) = \exp\!\left(-\frac{(C_1 - C_2)(C_1 - C_2)^*}{2\sigma^2}\right), \tag{2}$$
where $\sigma > 0$ is the kernel width and $(\cdot)^*$ denotes the complex conjugate.
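For concreteness, the following minimal NumPy sketch evaluates the Gaussian kernel in Equation (2) and a sample estimate of the CKRSL in Equation (1); the function names and the use of a sample mean in place of the expectation are our own choices, not taken from [27].

```python
import numpy as np

def gaussian_kernel(e, sigma):
    # kappa_sigma(C1 - C2) = exp(-|C1 - C2|^2 / (2 sigma^2))
    return np.exp(-np.abs(e) ** 2 / (2.0 * sigma ** 2))

def ckrsl(c1, c2, lam, sigma):
    # Sample-mean estimate of (1/lambda) E[exp(lambda (1 - kappa_sigma(C1 - C2)))]
    e = np.asarray(c1) - np.asarray(c2)
    return np.mean(np.exp(lam * (1.0 - gaussian_kernel(e, sigma)))) / lam

# Example: the loss grows smoothly with the error magnitude
rng = np.random.default_rng(0)
c1 = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
c2 = c1 + 0.1 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
print(ckrsl(c1, c2, lam=2.0, sigma=1.0))
```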
2.2. Recursive Minimum Complex Kernel Risk-Sensitive Loss (RMCKRSL)
2.2.1. Cost Function
We define the cost function of the MCKRSL as:
$$J(\mathbf{w}) = \frac{1}{\lambda}\,\mathrm{E}\!\left[\exp\!\big(\lambda\big(1 - \kappa_\sigma(e(i))\big)\big)\right], \tag{3}$$
where
$$e(i) = d(i) - \mathbf{w}^H\mathbf{x}(i) \tag{4}$$
denotes the error at the $i$th iteration, $d(i)$ represents the expected response at the $i$th iteration, $\mathbf{w}$ denotes the estimated weight vector, $L$ is the length of the adaptive filter, $\mathbf{x}(i)$ is the $L \times 1$ input vector, and $(\cdot)^H$ and $(\cdot)^T$ denote the conjugate transpose and transpose, respectively.
2.2.2. Recursive Solution
Using the Wirtinger calculus [28,29], the gradient of $J(\mathbf{w})$ with respect to $\mathbf{w}^*$ is derived:
$$\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}^*} = -\frac{1}{2\sigma^2}\,\mathrm{E}\!\left[\theta(e(i))\,e^*(i)\,\mathbf{x}(i)\right], \tag{5}$$
where $\theta(e(i)) = \exp\!\big(\lambda\big(1 - \kappa_\sigma(e(i))\big)\big)\,\kappa_\sigma(e(i))$. By making $\partial J(\mathbf{w})/\partial \mathbf{w}^* = \mathbf{0}$, we obtain the optimal solution
$$\mathbf{w} = \mathbf{R}^{-1}\mathbf{P}, \tag{6}$$
where
$$\mathbf{R} = \mathrm{E}\!\left[\theta(e(i))\,\mathbf{x}(i)\mathbf{x}^H(i)\right], \tag{7}$$
$$\mathbf{P} = \mathrm{E}\!\left[\theta(e(i))\,\mathbf{x}(i)\,d^*(i)\right]. \tag{8}$$
It is noted that Equation (6) is actually a fixed-point solution because $\mathbf{R}$ and $\mathbf{P}$ depend on $\mathbf{w}$ through the error $e(i)$. In practice, $\mathbf{R}$ and $\mathbf{P}$ are usually estimated as follows when the samples are finite:
$$\mathbf{R}(k) = \sum_{i=1}^{k}\theta(e(i))\,\mathbf{x}(i)\mathbf{x}^H(i), \tag{9}$$
$$\mathbf{P}(k) = \sum_{i=1}^{k}\theta(e(i))\,\mathbf{x}(i)\,d^*(i). \tag{10}$$
Hence, $\theta(e(k))$, $\mathbf{R}(k)$, and $\mathbf{P}(k)$ are updated as follows:
$$\theta(e(k)) = \exp\!\big(\lambda\big(1 - \kappa_\sigma(e(k))\big)\big)\,\kappa_\sigma(e(k)), \tag{11}$$
$$\mathbf{R}(k) = \mathbf{R}(k-1) + \theta(e(k))\,\mathbf{x}(k)\mathbf{x}^H(k), \tag{12}$$
$$\mathbf{P}(k) = \mathbf{P}(k-1) + \theta(e(k))\,\mathbf{x}(k)\,d^*(k). \tag{13}$$
Using the matrix inversion lemma [30], we may rewrite the inverse $\mathbf{Q}(k) = \mathbf{R}^{-1}(k)$ of Equation (12) as:
$$\mathbf{Q}(k) = \mathbf{Q}(k-1) - \frac{\theta(e(k))\,\mathbf{Q}(k-1)\mathbf{x}(k)\mathbf{x}^H(k)\mathbf{Q}(k-1)}{1 + \theta(e(k))\,\mathbf{x}^H(k)\mathbf{Q}(k-1)\mathbf{x}(k)}, \tag{14}$$
and
$$\mathbf{k}(k) = \frac{\theta(e(k))\,\mathbf{Q}(k-1)\mathbf{x}(k)}{1 + \theta(e(k))\,\mathbf{x}^H(k)\mathbf{Q}(k-1)\mathbf{x}(k)} = \theta(e(k))\,\mathbf{Q}(k)\mathbf{x}(k). \tag{15}$$
After some algebraic manipulations, we may derive the recursive form of $\mathbf{w}(k)$ as follows:
$$\mathbf{w}(k) = \mathbf{w}(k-1) + \mathbf{k}(k)\,e^*(k), \tag{16}$$
where $e(k) = d(k) - \mathbf{w}^H(k-1)\mathbf{x}(k)$.
Finally, Algorithm 1 summarizes the recursive MCKRSL (RMCKRSL) algorithm.
| Algorithm 1: RMCKRSL. |
| Input: kernel width $\sigma$, risk-sensitive parameter $\lambda$, samples $\{\mathbf{x}(k), d(k)\}$ 1. Initializations: $\mathbf{w}(0) = \mathbf{0}$, $\mathbf{Q}(0) = \delta\mathbf{I}$ ($\delta$ a positive constant), $k = 1$ 2. While $\{\mathbf{x}(k), d(k)\}$ is available, do 3. $e(k) = d(k) - \mathbf{w}^H(k-1)\mathbf{x}(k)$ 4. $\theta(e(k)) = \exp(\lambda(1 - \kappa_\sigma(e(k))))\,\kappa_\sigma(e(k))$ 5. $\mathbf{k}(k) = \theta(e(k))\,\mathbf{Q}(k-1)\mathbf{x}(k)\,/\,(1 + \theta(e(k))\,\mathbf{x}^H(k)\mathbf{Q}(k-1)\mathbf{x}(k))$ 6. $\mathbf{Q}(k) = \mathbf{Q}(k-1) - \mathbf{k}(k)\,\mathbf{x}^H(k)\mathbf{Q}(k-1)$ 7. $\mathbf{w}(k) = \mathbf{w}(k-1) + \mathbf{k}(k)\,e^*(k)$ 8. $k = k + 1$ 9. End while |
| 10. Output: Estimated filter weight $\mathbf{w}$ |
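A compact NumPy sketch of Algorithm 1 might look as follows; the initialization constant `delta` and the variable names are our own assumptions, not taken from the paper.

```python
import numpy as np

def rmckrsl(X, d, lam, sigma, delta=100.0):
    """RMCKRSL sketch (Algorithm 1).

    X: (N, L) complex input vectors x(k); d: (N,) complex desired responses.
    delta scales the assumed initialization Q(0) = delta * I.
    """
    N, L = X.shape
    w = np.zeros(L, dtype=complex)
    Q = delta * np.eye(L, dtype=complex)
    for k in range(N):
        x = X[k]
        e = d[k] - np.vdot(w, x)                         # e(k) = d(k) - w^H x(k)
        ker = np.exp(-np.abs(e) ** 2 / (2 * sigma ** 2))
        theta = np.exp(lam * (1.0 - ker)) * ker          # theta(e(k)), Eq. (11)
        Qx = Q @ x
        g = theta * Qx / (1.0 + theta * np.vdot(x, Qx))  # gain vector, Eq. (15)
        Q = Q - np.outer(g, np.conj(x)) @ Q              # Eq. (14)
        w = w + g * np.conj(e)                           # Eq. (16)
    return w
```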
3. Convergence Analysis
3.1. Stability Analysis
Supposing the desired signal is as follows:
$$d(k) = \mathbf{w}_o^H\mathbf{x}(k) + v(k), \tag{17}$$
we rewrite the error as:
$$e(k) = v(k) + \tilde{\mathbf{w}}^H(k-1)\mathbf{x}(k), \tag{18}$$
where $\mathbf{w}_o$ is the system parameter to be estimated, $\tilde{\mathbf{w}}(k) = \mathbf{w}_o - \mathbf{w}(k)$ is the weight error vector, $v(k)$ represents the noise at discrete time $k$, and $e_a(k) = \tilde{\mathbf{w}}^H(k-1)\mathbf{x}(k)$ is the a priori error.
Furthermore, we rewrite $\tilde{\mathbf{w}}(k)$ as:
$$\begin{aligned}\tilde{\mathbf{w}}(k) &= \tilde{\mathbf{w}}(k-1) - \theta(e(k))\,\mathbf{Q}(k)\mathbf{x}(k)\,e^*(k)\\ &\approx \tilde{\mathbf{w}}(k-1) - \frac{\theta(e(k))}{k}\,\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\,e^*(k),\end{aligned} \tag{19}$$
where $\bar{\mathbf{R}} = \mathrm{E}[\theta(e(k))\,\mathbf{x}(k)\mathbf{x}^H(k)]$, and the second line is approximately obtained by using the following:
$$\mathbf{Q}(k) = \mathbf{R}^{-1}(k) \approx \frac{1}{k}\,\bar{\mathbf{R}}^{-1}. \tag{20}$$
Remark 1.
(1) The second line of Equation (19) is a good approximation when $1/k$ is small enough, where $\bar{\mathbf{R}} = \mathrm{E}[\theta(e(k))\,\mathbf{x}(k)\mathbf{x}^H(k)]$.
(2) According to Equation (19), the RMCKRSL can be approximately viewed as a gradient descent method with a variable step size proportional to $1/k$.
(3) We can estimate $\bar{\mathbf{R}}$ by $\frac{1}{N}\sum_{i=1}^{N}\theta(e(i))\,\mathbf{x}(i)\mathbf{x}^H(i)$, where $N$ is the number of samples.
By substituting Equation (18) into Equation (19) and multiplying each side of Equation (19) by its own conjugate transpose, we obtain the following:
$$\|\tilde{\mathbf{w}}(k)\|^2 \approx \left\|\mathbf{A}(k)\,\tilde{\mathbf{w}}(k-1) - \frac{\theta(e(k))}{k}\,\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\,v^*(k)\right\|^2, \tag{21}$$
where $\mathbf{A}(k) = \mathbf{I} - \frac{\theta(e(k))}{k}\,\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\mathbf{x}^H(k)$.
Therefore,
$$\mathrm{E}\big[\|\mathbf{A}(k)\|_F^2\big] = \mathrm{E}\left[\|\mathbf{I}\|_F^2 - \frac{2\theta(e(k))}{k}\,\mathrm{Re}\big\{\mathbf{x}^H(k)\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\big\} + \frac{\theta^2(e(k))}{k^2}\,\big\|\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\mathbf{x}^H(k)\big\|_F^2\right], \tag{22}$$
where $\|\cdot\|_F$ represents the Frobenius norm, $\mathrm{Re}\{\cdot\}$ is the real part, and $\mathbf{I}$ denotes the identity matrix.
Then, we can determine that if
$$2k\,\mathrm{E}\Big[\theta(e(k))\,\mathrm{Re}\big\{\mathbf{x}^H(k)\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\big\}\Big] > \mathrm{E}\Big[\theta^2(e(k))\,\big\|\bar{\mathbf{R}}^{-1}\mathbf{x}(k)\mathbf{x}^H(k)\big\|_F^2\Big], \tag{23}$$
the sequence $\mathrm{E}[\|\tilde{\mathbf{w}}(k)\|^2]$ is decreasing and the algorithm will converge.
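This contraction behavior can be checked empirically. The short sketch below (our own construction, reusing the recursion from the `rmckrsl` sketch in Section 2 with illustrative parameter values) tracks the weight error power $\|\tilde{\mathbf{w}}(k)\|^2$ during adaptation, which should decrease on average under a stable configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, N = 5, 2000
lam, sigma, delta = 2.0, 1.0, 100.0                      # illustrative values
w_o = rng.standard_normal(L) + 1j * rng.standard_normal(L)
X = rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))
v = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
d = X @ np.conj(w_o) + v                                 # d(k) = w_o^H x(k) + v(k)

w = np.zeros(L, dtype=complex)
Q = delta * np.eye(L, dtype=complex)
wep = []
for k in range(N):
    x = X[k]
    e = d[k] - np.vdot(w, x)
    ker = np.exp(-np.abs(e) ** 2 / (2 * sigma ** 2))
    theta = np.exp(lam * (1.0 - ker)) * ker
    Qx = Q @ x
    g = theta * Qx / (1.0 + theta * np.vdot(x, Qx))
    Q = Q - np.outer(g, np.conj(x)) @ Q
    w = w + g * np.conj(e)
    wep.append(np.linalg.norm(w_o - w) ** 2)             # ||w_tilde(k)||^2

print(wep[0], wep[-1])  # the weight error power should have decreased
```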
3.2. Excess Mean Square Error
Let $S(k)$ be the excess mean square error (EMSE), defined as:
$$S(k) = \mathrm{E}\big[|e_a(k)|^2\big], \tag{24}$$
where $e_a(k) = \tilde{\mathbf{w}}^H(k-1)\mathbf{x}(k)$ is the a priori error.
To derive the theoretical value of $S(k)$, we adopt some commonly used assumptions [8,27,31]:
(A1) The noise $v(k)$ is zero-mean and independently identically distributed (IID); the a priori error $e_a(k)$ is independent of $v(k)$ and also zero-mean;
(A2) The input $\mathbf{x}(k)$ is independent of $v(k)$, circular, and stationary.
Thus, taking Equations (19) and (24) into consideration, substituting Equation (18) into Equation (19), and proceeding in a similar way to [27] under (A1) and (A2), we can obtain a recursion in which $S(k)$ depends linearly on $S(k-1)$, with coefficients determined by $\lambda$, $\sigma$, the input covariance, and the noise statistics.
It can be seen that $S(k)$ is the solution to a first-order difference equation. Thus, we derive that:
$$S(k) = S_h(k) + S_p(k), \tag{25}$$
where $S_h(k)$ is the homogeneous solution, which depends on the initial value of the EMSE, and $S_p(k)$ is the particular solution.
Remark 2.
The theoretical value of $S(k)$ in Equation (25) is reliable only when $1/k$ is small enough, i.e., when $k$ is large. $S_h(k)$ can be obtained by using the initial value of the EMSE. However, it is not necessary to calculate $S_h(k)$ in general, because $S_h(k) \to 0$ when $k$ is large. Thus, $S(k) \approx S_p(k)$ for large $k$.
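In simulations, the steady-state EMSE is therefore usually estimated by averaging $|e_a(k)|^2$ over the tail of the learning curve and over Monte Carlo runs; a minimal sketch of such an estimator (our own construction, with an assumed tail length) is:

```python
import numpy as np

def empirical_emse(ea, tail=500):
    """Estimate the steady-state EMSE S = E[|e_a(k)|^2].

    ea:   (runs, N) array of a priori errors e_a(k) = w_tilde^H(k-1) x(k)
    tail: number of final iterations treated as steady state (assumed)
    """
    ea = np.atleast_2d(np.asarray(ea))
    return float(np.mean(np.abs(ea[:, -tail:]) ** 2))
```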
4. Simulation
In this section, two examples are used to illustrate the superior performance of the RMCKRSL, namely system identification and nonlinear prediction. The simulation results were obtained by averaging over 1000 Monte Carlo trials.
4.1. Example 1
We chose the length of the filter as five, where the weight vector $\mathbf{w}_o$ is generated randomly with $w_{o,i} = a_i + \mathrm{j}b_i$, $a_i$ and $b_i$ being the real and imaginary parts of $w_{o,i}$ drawn from the Gaussian distribution $N(\mu, \nu)$, with $\mu$ and $\nu$ denoting the mean and variance, respectively. The input signal $\mathbf{x}(k)$ is also generated randomly in the same manner. An additive complex noise $v(k) = v_R(k) + \mathrm{j}v_I(k)$, with $v_R(k)$ and $v_I(k)$ being its real and imaginary parts, is considered in the simulations.
First, we verify the superiority of the RMCKRSL in the presence of contaminated Gaussian noise [17,19], i.e., $v(k) = (1 - c(k))A(k) + c(k)B(k)$, where $A(k)$ is a background complex Gaussian noise with a small variance, $B(k)$ is a complex Gaussian noise with a much larger variance that represents an outlier (or impulsive disturbance), and $c(k)$ is a binary IID process whose occurrence probability of impulsive disturbances is $p_r$ (a noise-generation sketch is given after Figure 1). To ensure a fair comparison, all the algorithms use a recursive iteration to search for the optimal solution, and the parameters of the different algorithms are chosen experimentally to guarantee a desirable solution. The performances of the different algorithms on the basis of the weight error power are shown in Figure 1. It is clear that, compared with the MCCC, GMCCC, and traditional RLS, the RMCKRSL has the best filtering performance.
Figure 1.
Learning curves of different algorithms.
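For reproducibility, contaminated Gaussian noise of the kind used throughout this section can be generated as in the following sketch; the default probability and variances are illustrative placeholders, not the exact values used in the experiments.

```python
import numpy as np

def contaminated_gaussian(n, p_r=0.05, var_bg=0.01, var_out=100.0, rng=None):
    """Complex contaminated Gaussian noise v = (1 - c) * A + c * B.

    A: background Gaussian noise, B: large-variance outliers,
    c: Bernoulli(p_r) occurrence of impulsive disturbances.
    """
    rng = rng or np.random.default_rng()
    def cgauss(var):
        return np.sqrt(var / 2.0) * (rng.standard_normal(n)
                                     + 1j * rng.standard_normal(n))
    c = rng.random(n) < p_r
    return np.where(c, cgauss(var_out), cgauss(var_bg))
```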
Then, the validity of the theoretical EMSE for the RMCKRSL is demonstrated. The noise model is again a contaminated Gaussian model. Figure 2 compares the theoretical EMSEs with the simulated ones under variations of the noise variance. Obviously, there is a good match between the theoretical EMSEs and the simulated ones. In addition, it can be seen that the EMSE becomes larger as the noise variance increases.
Figure 2.
EMSE as a function of noise variances.
Next, we tested the influence of outliers on the performance of the RMCKRSL algorithm. The noise model is again a contaminated Gaussian noise. Figure 3 compares the performances of the different algorithms under different values of the outlier probability $p_r$, where the sample size is 5000. One can observe that the proposed RMCKRSL algorithm is robust to the probability of an outlier and performs better than the MCCC, GMCCC, and RLS. Figure 4 depicts the performances of the different algorithms under different values of the outlier variance, where the sample size is also 5000. It can be observed that the proposed RMCKRSL algorithm is also robust to the variance of an outlier and again outperforms the other algorithms.
Figure 3.
Influence of the probability of outliers.
Figure 4.
Influence of the variance of outliers.
Finally, the influences of the kernel width $\sigma$ and the risk-sensitive parameter $\lambda$ on the performance of the RMCKRSL are investigated. The noise model is again a contaminated Gaussian noise. Figure 5 and Figure 6 present the performance of the RMCKRSL under different values of the kernel width $\sigma$ and the risk-sensitive parameter $\lambda$, respectively. One can see that both parameters play an important role in the performance of the RMCKRSL. It is challenging to choose the optimal $\sigma$ and $\lambda$ because they depend on the statistical characteristics of the noise, which are unknown in practical cases. Thus, it is suggested that the parameters be chosen by experimentation (see the selection sketch after Figure 6).
Figure 5.
Influence of the kernel width ($\sigma$).
Figure 6.
Influence of the risk-sensitive parameter ($\lambda$).
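In practice, such an experimental choice can be organized as a simple grid search over $(\sigma, \lambda)$ on a validation set. The sketch below is our own illustration and reuses the `rmckrsl` function from the sketch in Section 2; the grid values are placeholders.

```python
import numpy as np
from itertools import product

def select_params(X, d, X_val, d_val, sigmas, lams):
    """Pick (sigma, lambda) minimizing the validation MSE (illustrative)."""
    best, best_mse = None, np.inf
    for sigma, lam in product(sigmas, lams):
        w = rmckrsl(X, d, lam=lam, sigma=sigma)
        e = d_val - X_val @ np.conj(w)          # e(k) = d(k) - w^H x(k)
        mse = np.mean(np.abs(e) ** 2)
        if mse < best_mse:
            best, best_mse = (sigma, lam), mse
    return best

# Example grid (placeholder values):
# best = select_params(X, d, X_val, d_val, sigmas=[0.5, 1, 2, 4], lams=[1, 2, 5])
```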
4.2. Example 2
In this example, the superiority of the RMCKRSL is demonstrated by the prediction of a nonlinear system, where the real part of the signal $s(k)$ is a Mackey-Glass chaotic time series described as follows [15]:
$$\frac{d\,s_R(t)}{d t} = \frac{0.2\,s_R(t-\tau)}{1 + s_R^{10}(t-\tau)} - 0.1\,s_R(t),$$
the imaginary part of $s(k)$ is the reverse of its real part, and the initial value is a complex-valued number whose real and imaginary parts are randomly generated and obey a uniform distribution over the interval [0, 1]. The series is discretized by sampling with an interval of six seconds and is affected by the contaminated Gaussian noise $v(k) = (1 - c(k))A(k) + c(k)B(k)$, with $c(k)$, $A(k)$, and $B(k)$ defined as in Example 1. $s(k)$ is predicted from $\mathbf{x}(k) = [s(k-1)\ s(k-2)\ \cdots\ s(k-6)]^T$, and the performance is measured by the mean square error (MSE), $\mathrm{MSE} = \mathrm{E}[|s(k) - \hat{s}(k)|^2]$, where $\hat{s}(k)$ denotes the predicted value (a data-generation sketch is given after Figure 7). The convergence curves of the different algorithms on the basis of the MSE are compared in Figure 7. One may observe that the RMCKRSL has a faster convergence rate and better filtering accuracy than the other algorithms. In addition, the RLS behaves the worst, since the minimum square error criterion is not robust to impulsive noise.
Figure 7.
Convergence curves of different algorithms.
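A sketch of the data generation for this example is given below; the delay `tau`, the Euler discretization, the initialization, and the embedding details are our own assumptions beyond the standard Mackey-Glass parameters.

```python
import numpy as np

def mackey_glass(n, tau=30.0, a=0.2, b=0.1, dt=6.0, x0=0.5):
    """Euler-discretized Mackey-Glass series (delay and step size assumed)."""
    steps = max(1, int(tau / dt))
    x = np.full(n + steps, x0)
    for k in range(steps, n + steps - 1):
        x[k + 1] = x[k] + dt * (a * x[k - steps] / (1.0 + x[k - steps] ** 10)
                                - b * x[k])
    return x[steps:]

n, L = 5000, 6
sr = mackey_glass(n)
s = sr + 1j * sr[::-1]                     # imaginary part: time-reversed real part
X = np.stack([s[k - L:k][::-1] for k in range(L, n)])  # x(k) = [s(k-1), ..., s(k-6)]
d = s[L:n]                                 # targets s(k)
```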
5. Conclusions
As a nonlinear similarity measure defined in kernel space, the kernel risk-sensitive loss (KRSL) shows a superior performance in adaptive filtering. However, there has been no report on a recursive KRSL algorithm in the complex domain. Thus, in this paper we focused on complex-domain adaptive filtering and proposed a recursive minimum complex KRSL (RMCKRSL) algorithm. Compared with the MCCC, GMCCC, and traditional RLS algorithms, the proposed algorithm offers both a faster convergence rate and higher accuracy. Moreover, we derived the theoretical value of the EMSE and demonstrated its correctness by simulations.
Author Contributions
Conceptualization, G.Q., D.L. and S.W.; methodology, S.W.; software, D.L.; validation, G.Q.; formal analysis, G.Q.; investigation, G.Q., D.L. and S.W.; resources, G.Q.; data curation, D.L.; writing—original draft preparation, G.Q. and D.L.; writing—review and editing, S.W.; visualization, D.L.; supervision, G.Q., D.L. and S.W.; project administration, G.Q.; funding acquisition, G.Q.
Funding
This research was funded by the China Postdoctoral Science Foundation Funded Project under grant 2017M610583, and Fundamental Research Funds for the Central Universities under grant SWU116013.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Principe, J.C. Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives; Springer: New York, NY, USA, 2010.
- Chen, B.; Zhu, Y.; Hu, J.; Principe, J.C. System Parameter Identification: Information Criteria and Algorithms; Newnes: Oxford, UK, 2013.
- Liu, W.; Pokharel, P.P.; Príncipe, J. Correntropy: A localized similarity measure. In Proceedings of the 2006 IEEE International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 16–21 July 2006; pp. 4919–4924.
- Liu, W.; Pokharel, P.P.; Principe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298.
- Singh, A.; Principe, J.C. Using correntropy as a cost function in linear adaptive filters. In Proceedings of the 2009 International Joint Conference on Neural Networks (IJCNN), Atlanta, GA, USA, 14–19 June 2009; pp. 2950–2955.
- Singh, A.; Principe, J.C. A loss function for classification based on a robust similarity metric. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–6.
- Zhao, S.; Chen, B.; Principe, J.C. Kernel adaptive filtering with maximum correntropy criterion. In Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN), San Jose, CA, USA, 31 July–5 August 2011; pp. 2012–2017.
- Chen, B.; Xing, L.; Liang, J.; Zheng, N.; Principe, J.C. Steady-state mean-square error analysis for adaptive filtering under the maximum correntropy criterion. IEEE Signal Process. Lett. 2014, 21, 880–884.
- Wu, Z.; Peng, S.; Chen, B.; Zhao, H. Robust Hammerstein adaptive filtering under maximum correntropy criterion. Entropy 2015, 17, 7149–7166.
- Chen, B.; Wang, J.; Zhao, H.; Zheng, N.; Principe, J.C. Convergence of a Fixed-Point Algorithm under Maximum Correntropy Criterion. IEEE Signal Process. Lett. 2015, 22, 1723–1727.
- Wang, W.; Zhao, J.; Qu, H.; Chen, B.; Principe, J.C. Convergence performance analysis of an adaptive kernel width MCC algorithm. AEU-Int. J. Electron. Commun. 2017, 76, 71–76.
- Liu, X.; Chen, B.; Zhao, H.; Qin, J.; Cao, J. Maximum Correntropy Kalman Filter with State Constraints. IEEE Access 2017, 5, 25846–25853.
- Wang, F.; He, Y.; Wang, S.; Chen, B. Maximum total correntropy adaptive filtering against heavy-tailed noises. Signal Process. 2017, 141, 84–95.
- Chen, B.; Liu, X.; Zhao, H.; Principe, J.C. Maximum correntropy Kalman filter. Automatica 2017, 76, 70–77.
- Wang, S.; Dang, L.; Wang, W.; Qian, G.; Tse, C.K. Kernel Adaptive Filters with Feedback Based on Maximum Correntropy. IEEE Access 2018, 6, 10540–10552.
- He, Y.; Wang, F.; Yang, J.; Rong, H.; Chen, B. Kernel adaptive filtering under generalized Maximum Correntropy Criterion. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1738–1745.
- Chen, B.; Xing, L.; Zhao, H.; Zheng, N.; Príncipe, J.C. Generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 2016, 64, 3376–3387.
- Chen, B.; Wang, R. Risk-sensitive loss in kernel space for robust adaptive filtering. In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; pp. 921–925.
- Chen, B.; Xing, L.; Xu, B.; Zhao, H.; Zheng, N.; Príncipe, J.C. Kernel Risk-Sensitive Loss: Definition, Properties and Application to Robust Adaptive Filtering. IEEE Trans. Signal Process. 2017, 65, 2888–2901.
- Guimaraes, J.P.F.; Fontes, A.I.R.; Rego, J.B.A.; Martins, A.M.; Principe, J.C. Complex correntropy: Probabilistic interpretation and application to complex-valued data. IEEE Signal Process. Lett. 2017, 24, 42–45.
- Guimaraes, J.P.F.; Fontes, A.I.R.; Rego, J.B.A.; Martins, A.M.; Principe, J.C. Complex Correntropy Function: Properties, and application to a channel equalization problem. Expert Syst. Appl. 2018, 107, 173–181.
- Alliney, S.; Ruzinsky, S.A. An algorithm for the minimization of mixed l1 and l2 norms with application to Bayesian estimation. IEEE Trans. Signal Process. 1994, 42, 618–627.
- Mandic, D.; Goh, V. Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models (ser. Adaptive and Cognitive Dynamic Systems: Signal Processing, Learning, Communications and Control); John Wiley & Sons: New York, NY, USA, 2009.
- Diniz, P.S.R. Adaptive Filtering: Algorithms and Practical Implementation, 4th ed.; Springer-Verlag: New York, NY, USA, 2013.
- Qian, G.; Wang, S.; Wang, L.; Duan, S. Convergence Analysis of a Fixed Point Algorithm under Maximum Complex Correntropy Criterion. IEEE Signal Process. Lett. 2017, 24, 1830–1834.
- Qian, G.; Wang, S. Generalized Complex Correntropy: Application to Adaptive Filtering of Complex Data. IEEE Access 2018, 6, 19113–19120.
- Qian, G.; Wang, S. Complex Kernel Risk-Sensitive Loss: Application to Robust Adaptive Filtering in Complex Domain. IEEE Access 2018, 6.
- Wirtinger, W. Zur formalen Theorie der Funktionen von mehr complexen Veränderlichen [On the formal theory of functions of several complex variables]. Math. Ann. 1927, 97, 357–375.
- Bouboulis, P.; Theodoridis, S. Extension of Wirtinger's calculus to reproducing Kernel Hilbert spaces and the complex kernel LMS. IEEE Trans. Signal Process. 2011, 59, 964–978.
- Zhang, X. Matrix Analysis and Application, 2nd ed.; Tsinghua University Press: Beijing, China, 2013.
- Picinbono, B. On circularity. IEEE Trans. Signal Process. 1994, 42, 3473–3482.