A Robust Adaptive Filter for a Complex Hammerstein System

The Hammerstein adaptive filter using the maximum correntropy criterion (MCC) has been shown to be more robust to outliers than its counterparts using the traditional mean square error (MSE) criterion. As there is no report on robust Hammerstein adaptive filters in the complex domain, in this paper we extend the robust Hammerstein adaptive filter under MCC to the complex domain and propose the Hammerstein maximum complex correntropy criterion (HMCCC) algorithm. The new Hammerstein adaptive filter can thus directly handle complex-valued data. Additionally, we analyze the stability and steady-state mean square performance of HMCCC. Simulations illustrate that the proposed HMCCC algorithm converges in impulsive noise environments, and achieves higher accuracy and a faster convergence speed than the Hammerstein complex least mean square (HCLMS) algorithm.


Introduction
Since algorithms derived from the traditional mean square error (MSE) criterion are sensitive to outliers, they cannot deal with impulsive noise effectively [1,2]. However, impulsive noise commonly exists in practice. To solve this problem, a higher-order statistic, named correntropy, was proposed [3,4]. It has been proven that the maximum correntropy criterion (MCC) algorithm is robust to impulsive noise, and clearly outperforms the traditional MSE algorithms when the noise is non-Gaussian. Thus, the MCC algorithm [5,6] and its variants [7][8][9][10], such as the generalized maximum correntropy criterion (GMCC) [7], are widely used in practice.
Different from the well-known Wiener adaptive filter [11], the Hammerstein adaptive filter consists of two parts: a nonlinear memoryless polynomial function and a linear finite impulse response (FIR) filter [12][13][14]. The Hammerstein system has been widely applied to signal processing [15][16][17][18] as well as other applications [19,20]. Considering that the performance of the Hammerstein adaptive filter under the MSE criterion degrades dramatically when impulsive noise exists, Wu et al. applied the MCC criterion to the Hammerstein adaptive filter and developed a robust Hammerstein adaptive filtering algorithm [21]. This novel adaptive filter is insensitive to outliers and behaves better than the traditional Hammerstein adaptive filters, especially in the presence of impulsive noise.
However, the Hammerstein adaptive filters under the traditional MSE criterion and the MCC criterion are defined in the field of real numbers; they cannot be directly employed to handle complex-valued data. In fact, many signals in practical applications are defined in the complex domain [22][23][24][25]. Thus, in this work, we put forward a Hammerstein maximum complex correntropy criterion (HMCCC) algorithm, which extends the Hammerstein adaptive filter under the MCC criterion to the complex domain. HMCCC can handle complex-valued data directly, while remaining robust to impulsive noise. We analyze the stability and provide the steady-state mean square performance of the HMCCC algorithm. Simulations show that HMCCC is robust to outliers, and achieves higher accuracy and a faster convergence speed than the Hammerstein complex least mean square (HCLMS) algorithm.
The rest of the paper is organized as follows. A complex Hammerstein adaptive filter under MCCC is developed in Section 2. In Section 3, we analyze the stability and provide the steady-state mean square performance of the HMCCC algorithm. In Section 4, several simulations are presented so as to verify the superior performance of the HMCCC algorithm. Finally, a conclusion is drawn in Section 5.

Complex Correntropy
Considering two complex random variables C1 = X1 + jY1 and C2 = X2 + jY2, the complex correntropy is defined as [22]

V_σ(C1, C2) = E[κ^c_σ(C1 − C2)],

where κ^c_σ(·) represents the kernel function, and X1, Y1, X2, and Y2 denote real random variables. A Gaussian kernel is adopted in this paper, expressed as

κ^c_σ(C1 − C2) = (1/(2πσ²)) exp(−|C1 − C2|²/(2σ²)),

with σ being the kernel width.
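As a concrete illustration of the definition above, the complex correntropy between two sets of samples can be estimated by averaging the Gaussian kernel over the sample differences. The following sketch (the function names are ours, not from the paper) assumes the plain sample-mean estimator:

```python
import numpy as np

def complex_gaussian_kernel(c, sigma):
    # Gaussian kernel for a complex argument c = C1 - C2, with the
    # normalization constant 1 / (2 * pi * sigma^2) used in the text.
    return np.exp(-np.abs(c) ** 2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def complex_correntropy(c1, c2, sigma=1.0):
    # Sample-mean estimate of the complex correntropy E[kernel(C1 - C2)].
    c1 = np.asarray(c1, dtype=complex)
    c2 = np.asarray(c2, dtype=complex)
    return np.mean(complex_gaussian_kernel(c1 - c2, sigma))

z = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.7j])
val_same = complex_correntropy(z, z)        # identical samples: kernel maximum
val_far = complex_correntropy(z, z + 10)    # far-apart samples: near zero
```

For identical samples the estimate attains the kernel maximum 1/(2πσ²), and it decays toward zero as the two variables move apart, which is the mechanism that discounts outliers.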

Cost Function
Consider a complex Hammerstein system. The output of the polynomial nonlinear part is

s(k) = p^H u(k),

where p = [p_1 p_2 · · · p_M]^T denotes the vector of complex polynomial coefficients, M is the polynomial order, u(k) = [x(k) x²(k) · · · x^M(k)]^T is the complex polynomial regressor vector, and (·)^T and (·)^H denote the transpose and conjugate transpose, respectively. Collecting the last N nonlinear outputs into the vector s(k) = [s(k) s(k−1) · · · s(k−N+1)]^T = X^T(k)p*(k), with X(k) = [u(k) u(k−1) · · · u(k−N+1)], the output of the Hammerstein filter is y(k) = w^H(k)s(k). The cost function of the complex Hammerstein filtering algorithm under MCCC is

J_HMCCC = E[κ^c_σ(e(k))],

where e(k) = d(k) − y(k) and d(k) = w_0^H X^T(k)p_0* + v(k), N denotes the length of the linear FIR filter, w_0 and p_0 are the unknown system parameters to be estimated, which are the optimum solutions for w and p, and v(k) is the observation noise.
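The input–output relation just described can be sketched in a few lines. This is a hypothetical helper: the function name and the X^T(k)p*(k) convention are our reconstruction, not code from the paper:

```python
import numpy as np

def hammerstein_output(p, w, x_window):
    # Output of the complex Hammerstein filter sketched above, under the
    # conventions u(k) = [x(k), ..., x(k)^M]^T, s(k - i) = p^H u(k - i),
    # and y(k) = w^H s(k).  x_window holds x(k), x(k-1), ..., x(k-N+1).
    M = len(p)
    # M x N regressor matrix X(k): column i is u(k - i).
    X = np.vstack([x_window ** (m + 1) for m in range(M)])
    s = X.T @ np.conj(p)          # s(k) = X^T(k) p*(k)
    return np.conj(w) @ s         # y(k) = w^H(k) s(k)
```

For M = 1 and p = [1], the nonlinear part is the identity and the filter reduces to a plain linear FIR filter, y(k) = w^H x(k), which is a useful sanity check.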

Adaptive Algorithm
Based on the Wirtinger calculus [26][27][28], we derive the stochastic gradient of J_HMCCC with respect to p* as

∂J_HMCCC/∂p* = (1/(2πσ²)) (1/(2σ²)) exp(−|e(k)|²/(2σ²)) e*(k) X(k)w*(k),

and with respect to w* as

∂J_HMCCC/∂w* = (1/(2πσ²)) (1/(2σ²)) exp(−|e(k)|²/(2σ²)) e*(k) X^T(k)p*(k),

where X(k) = [u(k) u(k−1) · · · u(k−N+1)]. Then, the updates for p and w are

p(k+1) = p(k) + η_p f(e(k)) X(k)w*(k),
w(k+1) = w(k) + η_w f(e(k)) X^T(k)p*(k),

where f(e(k)) = exp(−|e(k)|²/(2σ²)) e*(k), and the constant 1/(2πσ²), together with 1/(2σ²), is merged into the step-size parameters η_p and η_w.
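Putting the updates together, one HMCCC iteration can be sketched as follows. The conventions (regressor matrix X(k) with columns u(k−i), s(k) = X^T(k)p*(k), y(k) = w^H(k)s(k)), the function name, and the simultaneous use of the old p and w in both updates are our assumptions:

```python
import numpy as np

def hmccc_step(p, w, x_window, d, sigma, eta_p, eta_w):
    # One stochastic-gradient-ascent step of HMCCC on a single sample.
    # x_window holds x(k), x(k-1), ..., x(k-N+1); d is the desired output.
    M = len(p)
    X = np.vstack([x_window ** (m + 1) for m in range(M)])       # M x N regressor
    e = d - np.conj(w) @ (X.T @ np.conj(p))                      # e(k) = d(k) - y(k)
    f = np.exp(-np.abs(e) ** 2 / (2 * sigma ** 2)) * np.conj(e)  # f(e(k))
    p_new = p + eta_p * f * (X @ np.conj(w))     # update of the polynomial part
    w_new = w + eta_w * f * (X.T @ np.conj(p))   # update of the FIR part
    return p_new, w_new, e

# A single small-step update on one sample should shrink the instantaneous error.
p_est = np.array([1 + 0.6j, 0.6 + 1j])
w_est = np.array([1 + 0.6j, 0.6 + 1j, 0.1 + 0.2j])
x_win = np.array([0.5 + 0.2j, -0.3 + 0.4j, 0.8 - 0.1j])
p_est2, w_est2, e_before = hmccc_step(p_est, w_est, x_win, 2 + 1j, 1.0, 1e-3, 1e-3)
_, _, e_after = hmccc_step(p_est2, w_est2, x_win, 2 + 1j, 1.0, 0.0, 0.0)
```

Calling the step again with zero step sizes simply re-evaluates the error at the updated weights, so e_after measures the effect of the preceding update.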

Convergence Analysis
To begin the derivation of the convergence analysis, the following widely used assumptions are adopted: (A1) v(k) is independent and identically distributed (i.i.d.), zero-mean, circular, and independent of e_p(k), e_w(k), and x(k); (A2) both ||X^T(k)p*(k)||² and ||X(k)w*(k)||² are uncorrelated with |f(e(k))|² when k → ∞.

Stability Analysis
As |e(k)|² is a real-valued function of [p^T(k) w^T(k)]^T, the following expression can be derived by taking the Taylor series expansion of |e(k+1)|² around [p^T(k) w^T(k)]^T:

|e(k+1)|² = |e(k)|² + 2Re{(∂|e(k)|²/∂p*)^H Δp(k)} + 2Re{(∂|e(k)|²/∂w*)^H Δw(k)} + h.o.t.,

where Δp(k) = p(k+1) − p(k), Δw(k) = w(k+1) − w(k), and h.o.t. represents the terms of higher-order infinitesimal. Then, substituting the updates of p and w yields

|e(k+1)|² ≈ |e(k)|² [1 − 2 exp(−|e(k)|²/(2σ²)) (η_p ||X(k)w*(k)||² + η_w ||X^T(k)p*(k)||²)].

Thus, the sequence |e(k)| will decrease in the mean sense if

0 < exp(−|e(k)|²/(2σ²)) (η_p ||X(k)w*(k)||² + η_w ||X^T(k)p*(k)||²) < 1,

that is,

η_p ||X(k)w*(k)||² + η_w ||X^T(k)p*(k)||² < exp(|e(k)|²/(2σ²)).

Considering the fact that exp(−|e(k)|²/(2σ²)) ≤ 1, we can obtain that the sequence |e(k)| will decrease in the mean sense whenever

η_p ||X(k)w*(k)||² + η_w ||X^T(k)p*(k)||² < 1.

In this case, the HMCCC algorithm will converge in the mean sense.

Steady-State Excess Mean Square Error
When the algorithm reaches the steady state, the error of the whole Hammerstein system can be approximately divided into two parts:

e(k) ≈ e_p(k) + e_w(k),

where e_p(k) and e_w(k) denote the errors associated with the nonlinear part and the FIR part, respectively. When only the nonlinear part is taken into consideration, multiplying each side of the weight-error recursion (Equation (20)) by its conjugate transpose and taking the expectation yields a variance relation for e_p(k). Based on the results of Equations (38) and (46) in the literature [23], we obtain the corresponding expressions by replacing α and λ with 1 and 1/(2σ²), respectively. Furthermore, based on the result of Equation (47) in [23], we obtain the EMSE for the nonlinear part (Equation (26)). When only the FIR filter is taken into consideration, the EMSE for the FIR part (Equation (27)) is similarly derived. When both the nonlinear part and the FIR filter are considered, the total EMSE is the sum of the two EMSEs plus a cross term. Remark 2: H_cross = lim_{k→∞} E[2Re(e_p*(k)e_w(k))] is the cross EMSE of the Hammerstein system, and it equals zero when e_p(k) and e_w(k) are both zero-mean and independent.
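The steady-state decomposition above can be restated compactly (H_p and H_w are shorthand labels we introduce here for the nonlinear-part and FIR-part EMSEs given by Equations (26) and (27)):

```latex
\begin{aligned}
e(k) &\approx e_p(k) + e_w(k),\\
H &= H_p + H_w + H_{\mathrm{cross}},\qquad
H_p = \lim_{k\to\infty}\mathrm{E}\big[|e_p(k)|^2\big],\qquad
H_w = \lim_{k\to\infty}\mathrm{E}\big[|e_w(k)|^2\big],\\
H_{\mathrm{cross}} &= \lim_{k\to\infty}\mathrm{E}\big[2\,\mathrm{Re}\big(e_p^{*}(k)\,e_w(k)\big)\big].
\end{aligned}
```

Under the zero-mean and independence conditions of Remark 2, H_cross vanishes and the total EMSE reduces to H_p + H_w.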

Simulation
In this part, we provide some simulations to illustrate the superior performance of HMCCC. We chose the weight vector as w_0 = [1+0.6j, 0.6+j, 0.1+0.2j, 0.2+0.1j, 0.06+0.04j, 0.04+0.06j]^T and the complex polynomial coefficient vector as p_0 = [1+0.6j, 0.6+j]^T. An additive complex noise v = v_R + jv_I was considered in the simulations, with v_R and v_I being the real and imaginary parts, respectively. We compared the performance of HMCCC with HCLMS (the extension of HLMS [17] to the complex domain, summarized in Appendix A), and chose the parameters of both algorithms by trial so as to ensure a desirable solution. Simulation results were obtained by averaging 100 Monte Carlo trials. The input signal x(k) was generated by a first-order autoregressive process, x(k) = a·x(k−1) + ξ(k), where x(k) = x_R(k) + jx_I(k), with x_R(k) and x_I(k) being the real and imaginary parts of x(k), a = 0.95, ξ(k) = ξ_R(k) + jξ_I(k), ξ_R(k), ξ_I(k) ∼ N(0, 1), and N(µ, σ²) denotes the Gaussian distribution with mean µ and variance σ². First, the superiority of HMCCC was verified in the complex alpha-stable noise environment. The noise parameters were v_R, v_I ∈ σ_v · v_alpha(α, β, γ, δ), where σ_v² = 0.1, α = 1.2 is the characteristic factor, β = 0 is the symmetry parameter, γ = 0.6 is the dispersion parameter, and δ = 0 is the location parameter. Figure 1 shows the time sequence and histogram for the real and imaginary parts of the complex alpha-stable noise. It is noted that HCLMS may diverge in the complex alpha-stable noise environment; thus, we omitted the trials for HCLMS whenever ||w(k)||² ≥ 100. The simulation shows that HCLMS diverged twice in the 100 trials, while HMCCC did not diverge.
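The paper does not state how the complex alpha-stable noise was sampled. A minimal sketch using the Chambers–Mallows–Stuck method for the symmetric case (β = 0, δ = 0) with the parameters above is given below; treating the dispersion γ as a plain scale factor is our assumption:

```python
import numpy as np

def symmetric_alpha_stable(alpha, gamma, size, rng):
    # Chambers-Mallows-Stuck sampler for a symmetric alpha-stable variable
    # (beta = 0, delta = 0); gamma is treated here as a scale factor.
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    x = (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
         * (np.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))
    return gamma * x

rng = np.random.default_rng(0)
# Complex noise with the simulation's parameters: alpha = 1.2, gamma = 0.6,
# and an overall scaling of sigma_v = sqrt(0.1).
sigma_v = 0.1 ** 0.5
v = sigma_v * (symmetric_alpha_stable(1.2, 0.6, 10000, rng)
               + 1j * symmetric_alpha_stable(1.2, 0.6, 10000, rng))
```

For α = 1.2 the samples are heavy-tailed: a few draws are orders of magnitude larger than the median, which is exactly the impulsive behavior visible in Figure 1.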
The performances of the different algorithms in terms of the normalized testing mean square error (MSE) are shown in Figure 2, where the testing MSE was obtained from a test set of 100 samples, and the divergent trials were omitted for HCLMS. It is clear that, compared with HCLMS, HMCCC has a better filtering performance in the presence of complex alpha-stable noise. Then, we compared the steady-state testing MSE of HMCCC under different noise parameters. We ran 15,000 iterations to make sure that HMCCC reached the steady state, and calculated the steady-state testing MSE as the average over the next 1000 iterations. Figure 3 shows the steady-state normalized testing MSEs under different characteristic factors and dispersion parameters, respectively. It illustrates that HMCCC performs well under different parameters of the alpha-stable noise.
Next, the superiority of HMCCC was verified in the contaminated Gaussian (CG) noise environment. Figure 4 shows the time sequence and histogram for the real and imaginary parts of the CG noise. The performances of the different algorithms in terms of the normalized testing MSE are shown in Figure 5, where the testing MSE was also obtained from a test set of 100 samples. One can clearly see that, compared with HCLMS, HMCCC has a better filtering performance in the presence of CG noise.
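The CG noise used here is a two-component Gaussian mixture: a small-variance background component and an occasional large-variance outlier component selected by a Bernoulli switch c(k). A sketch of such a sampler (the function name is ours; the values p = 0.06 and σ_B² = 20 follow the settings used later in this section):

```python
import numpy as np

def contaminated_gaussian(p_outlier, var_bg, var_out, size, rng):
    # With probability p_outlier draw from the outlier component N(0, var_out),
    # otherwise from the background component N(0, var_bg).
    c = rng.random(size) < p_outlier              # Bernoulli switch c(k)
    bg = rng.normal(0.0, np.sqrt(var_bg), size)
    out = rng.normal(0.0, np.sqrt(var_out), size)
    return np.where(c, out, bg)

rng = np.random.default_rng(1)
# Real and imaginary parts drawn independently, as in the simulations.
v = (contaminated_gaussian(0.06, 0.1, 20.0, 10000, rng)
     + 1j * contaminated_gaussian(0.06, 0.1, 20.0, 10000, rng))
```

Most samples stay near zero while roughly 6% of them are drawn from the wide outlier component, reproducing the impulsive character shown in Figure 4.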
Furthermore, we tested the robustness of the HMCCC algorithm to outliers. The CG noise was also used in this simulation, where v_1R, v_1I ∈ N(0, 0.1), P(c(k) = 1) = p, v_2R, v_2I ∈ N(0, σ_B²), and P(c(k) = 0) = 1 − p. Figure 6 depicts the steady-state normalized testing MSE of the HMCCC algorithm under different probabilities, p, and variances of the outlier, σ_B², where 15,000 iterations were run to make sure that HMCCC reached the steady state, and the steady-state normalized testing MSEs were calculated as the average over the next 1000 iterations. One can observe that the proposed HMCCC algorithm is robust to outliers, and behaves well under different p and σ_B². Moreover, HMCCC has a slightly smaller steady-state testing MSE with a bigger σ_B², which is a little surprising, but is consistent with Chen's work [9].
This is due to the fact that the convergence rates are slightly different under different σ_B², even with the same learning rate.
Afterward, we investigated the influence of the kernel width σ on the performance of HMCCC. The CG noise was also employed in this simulation, where v_1R, v_1I ∈ N(0, 0.1), P(c(k) = 1) = 0.06, v_2R, v_2I ∈ N(0, 20), and P(c(k) = 0) = 1 − 0.06. Figure 7 presents the normalized testing MSE of HMCCC for three different kernel widths σ. It can be seen that the kernel width σ plays a vital role in the learning rate and steady-state value of HMCCC. With a small kernel width, HMCCC converges slowly, but achieves a small steady-state value; on the contrary, with a large kernel width, HMCCC converges quickly, but reaches a high steady-state value.
Finally, we compared the simulated steady-state testing MSEs with the theoretical ones. Gaussian noise was used in this simulation, where v(k) = v_R(k) + jv_I(k) and v_R, v_I ∈ N(0, σ_v²). It is noted that the testing MSEs were not normalized in this simulation, and were obtained from a test set of 1000 samples. In addition, the theoretical values for the nonlinear part and the FIR part were calculated by Equations (26) and (27), respectively. Figure 8 shows the simulated steady-state testing MSEs and the theoretical ones under different σ_v², where 40,000 iterations were run to make sure that the algorithm reached the steady state, and the steady-state testing MSEs were calculated as the average over the next 1000 iterations. One can see that the simulated values almost match the theoretical ones for the nonlinear part and the FIR part. Moreover, there is a small gap between the simulated whole-system EMSE and the sum of the theoretical nonlinear and FIR parts, which is the value of the cross EMSE.

Conclusions
Since existing Hammerstein adaptive filters can only deal with real-valued data, in this paper we extended the Hammerstein filter under the maximum correntropy criterion (MCC) to the complex domain and developed a new algorithm, named the Hammerstein maximum complex correntropy criterion (HMCCC) algorithm. We also analyzed the stability and derived theoretical steady-state results for HMCCC. The simulations illustrated that HMCCC remained convergent and performed better than the traditional Hammerstein complex LMS (HCLMS) algorithm in the presence of impulsive noise. Additionally, the kernel width has an important impact on the performance of HMCCC, and the new algorithm behaves well under different probabilities and variances of outliers.