Robust Hammerstein Adaptive Filtering under Maximum Correntropy Criterion

The maximum correntropy criterion (MCC) has recently been successfully applied to adaptive filtering. Adaptive algorithms under MCC show strong robustness against large outliers. In this work, we apply the MCC criterion to develop a robust Hammerstein adaptive filter. Compared with the traditional Hammerstein adaptive filters, which are usually derived based on the well-known mean square error (MSE) criterion, the proposed algorithm can achieve better convergence performance especially in the presence of impulsive non-Gaussian (e.g., α-stable) noises. Additionally, some theoretical results concerning the convergence behavior are also obtained. Simulation examples are presented to confirm the superior performance of the new algorithm.


Introduction
Nonlinear system identification is still an active research area [1]. Although linear systems have a well-established theory [2], most practical systems (e.g., hands-free telephone systems) are more adequately represented by a nonlinear model. One of the main challenges in nonlinear system identification is the choice of a nonlinear filtering structure that accurately captures the characteristics of the underlying nonlinear system. A common structure used in nonlinear modeling is the block-oriented representation. The Wiener model and the Hammerstein model are two typical block-oriented nonlinear models [3]. Specifically, the Wiener model consists of a linear time-invariant (LTI) filter followed by a static nonlinear function, known as a linear-nonlinear (LN) model [4][5][6], and the Hammerstein model consists of a static nonlinear function followed by an LTI filter, known as a nonlinear-linear (NL) model [7][8][9][10][11][12][13][14][15][16][17][18][19]. Other nonlinear models include neural networks (NNs) [20], Volterra adaptive filters (VAFs) [21], and kernel adaptive filters (KAFs) [22][23][24][25], among others.
Hammerstein filters can accurately model many real-world systems and, as a consequence, they have been successfully used in various engineering applications [26][27][28][29]. Due to its simplicity and efficiency, the mean square error (MSE) criterion has been widely applied in Hammerstein adaptive filtering [30]. Adaptive algorithms under MSE usually perform very well when the desired signals are disturbed by Gaussian noises. However, when the desired signals are disturbed by non-Gaussian noises, especially in the presence of large outliers (observations that significantly deviate from the bulk of the data), the performance of MSE-based algorithms may deteriorate rapidly, because MSE is rather sensitive to outliers. In most practical situations, heavy-tailed impulsive noises may occur, which often cause large outliers. For instance, different types of artificial noises in electronic devices, atmospheric noises, and lightning spikes in natural phenomena can be described as impulsive noise [31,32].
In this work, instead of using the MSE criterion, we apply the maximum correntropy criterion (MCC) to develop a robust Hammerstein adaptive filtering algorithm. Correntropy is a nonlinear similarity measure between two signals [33,34]. The MCC aims at maximizing the similarity (measured by correntropy) between the model output and the desired response such that the adaptive model is as close as possible to the unknown system. It has been shown that the MCC is very robust to impulsive noises in terms of both stability and accuracy [33][34][35][36][37][38][39]. Compared with the traditional Hammerstein adaptive filtering algorithms based on the MSE criterion, the new algorithm can achieve better performance, especially in the presence of impulsive non-Gaussian noises.
The rest of the paper is organized as follows. In Section 2, after briefly introducing the correntropy, we derive a Hammerstein adaptive filtering algorithm under the MCC criterion. In Section 3, we carry out the convergence analysis. In Section 4, we present simulation examples to demonstrate the superior performance of the proposed algorithm. Finally, we give the conclusion in Section 5.

Hammerstein Adaptive Filtering under the Maximum Correntropy Criterion
Figure 1 shows the structure of a Hammerstein adaptive filter under the MCC criterion, where the filter consists of a polynomial memoryless nonlinearity followed by a linear FIR filter. This structure has been commonly used in Hammerstein adaptive filtering [8,9,27]. As shown in Figure 1, under the MCC criterion, the parameters of the linear and nonlinear parts are adjusted to maximize the correntropy between the model output and the desired response.
Figure 1. Structure of a Hammerstein adaptive filter under the maximum correntropy criterion (MCC).

Correntropy
Correntropy is a nonlinear similarity measure between two signals. Given two random variables X and Y, the correntropy is [33][34][35][36][37][38][39]

$$V(X, Y) = E[\kappa(X, Y)] = \iint \kappa(x, y)\, f_{XY}(x, y)\, dx\, dy \tag{1}$$

where E[·] denotes the expectation operator, κ(·,·) is a shift-invariant Mercer kernel, and $f_{XY}(x, y)$ stands for the joint probability density function (PDF) of (X, Y). The most widely used kernel in correntropy is the Gaussian kernel, given by

$$\kappa(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e^2}{2\sigma^2}\right) \tag{2}$$

where e = x − y, and σ stands for the kernel bandwidth. In this work, unless otherwise stated, the kernel function is a Gaussian kernel. In practical situations, the joint distribution of X and Y is usually unknown and only a finite number of data $\{(d(i), y(i))\}_{i=1}^{K}$ are available. In these cases, one can use a sample mean estimator of the correntropy:

$$\hat{V}(X, Y) = \frac{1}{K} \sum_{i=1}^{K} \kappa(d(i), y(i)) \tag{3}$$

The optimization cost under MCC is thus

$$\max J_{MCC} = \frac{1}{K} \sum_{i=1}^{K} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e^2(i)}{2\sigma^2}\right) \tag{4}$$

where e(i) = d(i) − y(i). We can evaluate the sensitivity (derivative) of the MCC cost $J_{MCC}$ with respect to the error e(i):

$$\frac{\partial J_{MCC}}{\partial e(i)} = -\frac{1}{K\sqrt{2\pi}\,\sigma^3}\, e(i) \exp\left(-\frac{e^2(i)}{2\sigma^2}\right) \tag{5}$$

The derivative curves of $-J_{MCC}$ for different kernel widths are illustrated in Figure 2. As one can see, when the magnitude of the error is very large, the derivative becomes rather small, especially for a smaller kernel width. Therefore, the MCC training is insensitive (hence robust) to large errors.
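The robustness argument above can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a Gaussian kernel with the usual 1/(√(2π)σ) normalization; the function names (`gaussian_kernel`, `sample_correntropy`, `kernel_sensitivity`) and the toy data are hypothetical and chosen only for illustration. It compares how a single large outlier affects the mean squared error and the sample correntropy.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel evaluated at the error e = x - y (assumed normalization)."""
    e = np.asarray(e, dtype=float)
    return np.exp(-e**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def sample_correntropy(d, y, sigma):
    """Sample-mean estimate of the correntropy between paired samples d and y."""
    e = np.asarray(d, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(gaussian_kernel(e, sigma)))

def kernel_sensitivity(e, sigma):
    """Derivative of the Gaussian kernel with respect to the error."""
    e = np.asarray(e, dtype=float)
    return -e / sigma**2 * gaussian_kernel(e, sigma)

# Toy data (illustrative only): the second set contains one large outlier.
zeros = np.zeros(4)
clean = np.array([0.1, -0.2, 0.05, 0.0])
spiky = np.array([0.1, -0.2, 0.05, 50.0])

mse_clean = float(np.mean((clean - zeros) ** 2))
mse_spiky = float(np.mean((spiky - zeros) ** 2))
v_clean = sample_correntropy(clean, zeros, sigma=1.0)
v_spiky = sample_correntropy(spiky, zeros, sigma=1.0)

print("MSE         clean / with outlier:", mse_clean, mse_spiky)
print("Correntropy clean / with outlier:", v_clean, v_spiky)
print("Sensitivity at e = 1 and e = 10 :",
      float(kernel_sensitivity(1.0, 1.0)), float(kernel_sensitivity(10.0, 1.0)))
```

In this sketch the outlier dominates the MSE, whereas it only removes its own (bounded) kernel contribution from the correntropy estimate, and the sensitivity at a large error is close to zero.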

Hammerstein Adaptive Filtering
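Assume that the input-output mapping of the memoryless polynomial nonlinearity is

$$\hat{s}(n) = p_1 x(n) + p_2 x^2(n) + \cdots + p_M x^M(n) \tag{6}$$

where M and $p_m$ denote the polynomial order and the m-th order coefficient, respectively. Expression (6) can be rewritten as

$$\hat{s}(n) = \mathbf{x}_p^T(n)\, \mathbf{p}(n) \tag{7}$$

where $\mathbf{x}_p(n) = [x(n)\; x^2(n) \cdots x^M(n)]^T$ is the polynomial regressor, and $\mathbf{p}(n) = [p_1\; p_2 \cdots p_M]^T$ is the polynomial coefficient vector. The output of the FIR filter can be expressed as

$$y(n) = \mathbf{w}^T(n)\, \hat{\mathbf{s}}(n) \tag{8}$$

where $\mathbf{w}(n) = [w_1\; w_2 \cdots w_N]^T$ is the FIR weight vector, and

$$\hat{\mathbf{s}}(n) = [\hat{s}(n)\; \hat{s}(n-1) \cdots \hat{s}(n-N+1)]^T = \mathbf{X}(n)\, \mathbf{p}(n) \tag{9}$$

with $\mathbf{X}(n) = [\mathbf{x}_p(n)\; \mathbf{x}_p(n-1) \cdots \mathbf{x}_p(n-N+1)]^T$ being the N × M input matrix. Combining Equations (8) and (9) yields

$$y(n) = \mathbf{w}^T(n)\, \hat{\mathbf{s}}(n) = \mathbf{w}^T(n)\, \mathbf{X}(n)\, \mathbf{p}(n) \tag{10}$$

Assume that the unknown system that needs to be identified is also a Hammerstein system with parameter vectors $\mathbf{w}^*$ and $\mathbf{p}^*$. Then, the desired signal can be expressed as

$$d(n) = \mathbf{w}^{*T} \mathbf{X}(n)\, \mathbf{p}^* + v(n) \tag{11}$$

where v(n) stands for an additive disturbance noise. The error signal can then be calculated as

$$e(n) = d(n) - y(n) = d(n) - \mathbf{w}^T(n)\, \mathbf{X}(n)\, \mathbf{p}(n) \tag{12}$$

In the following, we derive an adaptive algorithm to estimate the Hammerstein parameter vectors using MCC instead of MSE as the optimization criterion. Let us consider the following cost function

$$J_{MCC}(n) = \frac{1}{L} \sum_{j=n-L+1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{e^2(j)}{2\sigma^2}\right) \tag{13}$$

where e(j) = d(j) − y(j), and L denotes the sliding data length. Then, a steepest ascent algorithm for estimating the polynomial coefficient vector can be derived as follows:

$$\begin{aligned} \mathbf{p}(n+1) &= \mathbf{p}(n) + \mu_p \frac{\partial J_{MCC}(n)}{\partial \mathbf{p}(n)} \\ &= \mathbf{p}(n) - \frac{\mu_p}{L\sqrt{2\pi}\,\sigma^3} \sum_{j=n-L+1}^{n} \exp\left(-\frac{e^2(j)}{2\sigma^2}\right) e(j)\, \frac{\partial e(j)}{\partial \mathbf{p}(n)} \\ &= \mathbf{p}(n) + \frac{\mu_p}{L\sqrt{2\pi}\,\sigma^3} \sum_{j=n-L+1}^{n} \exp\left(-\frac{e^2(j)}{2\sigma^2}\right) e(j)\, \mathbf{X}^T(j)\, \mathbf{w}(j) \end{aligned} \tag{14}$$

In a similar way, we propose the following weight update equation for the coefficients of the FIR filter:

$$\begin{aligned} \mathbf{w}(n+1) &= \mathbf{w}(n) + \mu_w \frac{\partial J_{MCC}(n)}{\partial \mathbf{w}(n)} \\ &= \mathbf{w}(n) - \frac{\mu_w}{L\sqrt{2\pi}\,\sigma^3} \sum_{j=n-L+1}^{n} \exp\left(-\frac{e^2(j)}{2\sigma^2}\right) e(j)\, \frac{\partial e(j)}{\partial \mathbf{w}(n)} \\ &= \mathbf{w}(n) + \frac{\mu_w}{L\sqrt{2\pi}\,\sigma^3} \sum_{j=n-L+1}^{n} \exp\left(-\frac{e^2(j)}{2\sigma^2}\right) e(j)\, \mathbf{X}(j)\, \mathbf{p}(j) \end{aligned} \tag{16}$$

In Equations (14) and (16), $\mu_p$ and $\mu_w$ are, respectively, the step-sizes for the polynomial nonlinearity subsystem and the FIR subsystem. In this work, for simplicity we consider only the stochastic gradient based algorithm (i.e., L = 1). In this case, with the constant factor $1/(\sqrt{2\pi}\,\sigma^3)$ merged into the step-sizes, we have

$$\mathbf{p}(n+1) = \mathbf{p}(n) + \mu_p \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n)\, \mathbf{X}^T(n)\, \mathbf{w}(n) \tag{17}$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu_w \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n)\, \mathbf{X}(n)\, \mathbf{p}(n) \tag{18}$$

The proposed algorithm is similar in form to the traditional Hammerstein adaptive filters under the MSE criterion [7], but the step-sizes are different.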
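The stochastic-gradient case (L = 1) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: it assumes that the constant kernel factor is absorbed into the step-sizes, that both parameter vectors are updated simultaneously from their values at time n, that input samples before time 0 are zero, and that the coefficient vectors are initialized with the first coefficient equal to 1 and the others equal to zero. The names `build_regressor` and `hammerstein_mcc`, the toy system, and all numerical values are hypothetical.

```python
import numpy as np

def build_regressor(x, n, N, M):
    """N x M matrix X(n); row k holds [x(n-k), x(n-k)^2, ..., x(n-k)^M].
    Samples before time 0 are taken as zero (an assumption of this sketch)."""
    X = np.zeros((N, M))
    powers = np.arange(1, M + 1)
    for k in range(N):
        if n - k >= 0:
            X[k] = x[n - k] ** powers
    return X

def hammerstein_mcc(x, d, N, M, mu_p, mu_w, sigma):
    """Stochastic-gradient (L = 1) Hammerstein filter under MCC."""
    p = np.zeros(M); p[0] = 1.0   # polynomial coefficients
    w = np.zeros(N); w[0] = 1.0   # FIR weights
    errors = np.empty(len(x))
    for n in range(len(x)):
        X = build_regressor(x, n, N, M)
        e = d[n] - w @ X @ p                         # e(n) = d(n) - w^T X p
        g = np.exp(-e**2 / (2.0 * sigma**2)) * e     # kernel-weighted error
        p_next = p + mu_p * g * (X.T @ w)            # polynomial update
        w_next = w + mu_w * g * (X @ p)              # FIR update
        p, w = p_next, w_next
        errors[n] = e
    return w, p, errors

# Toy identification run (all values are illustrative assumptions).
rng = np.random.default_rng(0)
N, M = 3, 2
w_true = np.array([1.0, 0.6, 0.1])
p_true = np.array([1.0, 0.6])
x = 0.5 * rng.standard_normal(5000)
d = np.array([w_true @ build_regressor(x, n, N, M) @ p_true
              for n in range(len(x))])
d += 0.01 * rng.standard_normal(len(x))              # small additive noise

w_hat, p_hat, errors = hammerstein_mcc(x, d, N, M, mu_p=0.02, mu_w=0.02, sigma=1.0)
print("MSE over first 200 samples:", np.mean(errors[:200] ** 2))
print("MSE over last 200 samples :", np.mean(errors[-200:] ** 2))
```

Because the output depends on the product of the two parameter vectors, the estimates are only identifiable up to a common scale factor, so the sketch reports the output error rather than the parameter error.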

Stability Analysis
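Using the Taylor series expansion of the error e(n + 1) around the instant n and keeping only the linear term, we have [4,7,40]

$$e(n+1) = e(n) + \left[\frac{\partial e(n)}{\partial \mathbf{w}(n)}\right]^T \Delta\mathbf{w}(n) + \left[\frac{\partial e(n)}{\partial \mathbf{p}(n)}\right]^T \Delta\mathbf{p}(n) + \text{h.o.t.} \tag{19}$$

where h.o.t. denotes higher-order terms. Combining Equations (11), (17) and (18), we can obtain

$$\frac{\partial e(n)}{\partial \mathbf{w}(n)} = -\mathbf{X}(n)\, \mathbf{p}(n) \tag{20}$$

$$\frac{\partial e(n)}{\partial \mathbf{p}(n)} = -\mathbf{X}^T(n)\, \mathbf{w}(n) \tag{21}$$

$$\Delta\mathbf{w}(n) = \mu_w \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n)\, \mathbf{X}(n)\, \mathbf{p}(n) \tag{22}$$

$$\Delta\mathbf{p}(n) = \mu_p \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) e(n)\, \mathbf{X}^T(n)\, \mathbf{w}(n) \tag{23}$$

Substituting Equations (20)-(23) in Equation (19), and after simple manipulation, we have

$$e(n+1) = \left[1 - \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) \left(\mu_w \|\mathbf{X}(n)\mathbf{p}(n)\|^2 + \mu_p \|\mathbf{X}^T(n)\mathbf{w}(n)\|^2\right)\right] e(n) \tag{24}$$

To ensure the stability of the proposed algorithm, we must ensure that

$$|e(n+1)| \le |e(n)| \tag{25}$$

which yields

$$\left|1 - \exp\left(-\frac{e^2(n)}{2\sigma^2}\right) \left(\mu_w \|\mathbf{X}(n)\mathbf{p}(n)\|^2 + \mu_p \|\mathbf{X}^T(n)\mathbf{w}(n)\|^2\right)\right| \le 1 \tag{26}$$

Since $0 < \exp(-e^2(n)/(2\sigma^2)) \le 1$, the following condition guarantees convergence:

$$0 < \mu_w \|\mathbf{X}(n)\mathbf{p}(n)\|^2 + \mu_p \|\mathbf{X}^T(n)\mathbf{w}(n)\|^2 \le 2 \tag{27}$$

Remark 1. The derived bound on the step-sizes is only of theoretical importance since, in general, Equation (27) cannot be verified in a practical situation. Similar theoretical results can be found in [7].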
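As a small numerical illustration of the stability argument, the sketch below evaluates the first-order error gain and a sufficient step-size condition for a given regressor matrix. It assumes the linearized relation e(n+1) ≈ [1 − exp(−e²(n)/(2σ²)) (μw‖X(n)p(n)‖² + μp‖Xᵀ(n)w(n)‖²)] e(n), which results from inserting the stochastic-gradient updates into the first-order Taylor expansion; the function names and the example values are hypothetical.

```python
import numpy as np

def linearized_error_gain(X, w, p, e, mu_p, mu_w, sigma):
    """First-order factor relating e(n+1) to e(n) under the assumed linearization."""
    g = np.exp(-e**2 / (2.0 * sigma**2))
    energy = mu_w * np.sum((X @ p) ** 2) + mu_p * np.sum((X.T @ w) ** 2)
    return float(1.0 - g * energy)

def satisfies_step_size_bound(X, w, p, mu_p, mu_w):
    """Sufficient condition 0 < mu_w*||X p||^2 + mu_p*||X^T w||^2 <= 2."""
    energy = mu_w * np.sum((X @ p) ** 2) + mu_p * np.sum((X.T @ w) ** 2)
    return bool(0.0 < energy <= 2.0)

# Illustrative values only.
X = np.ones((3, 2))
w = np.array([1.0, 0.0, 0.0])
p = np.array([1.0, 0.0])

print(satisfies_step_size_bound(X, w, p, mu_p=0.01, mu_w=0.01))
print(linearized_error_gain(X, w, p, e=0.5, mu_p=0.01, mu_w=0.01, sigma=1.0))
print(satisfies_step_size_bound(X, w, p, mu_p=1.0, mu_w=1.0))
```

Because the exponential factor never exceeds one, a step-size pair that satisfies the bound keeps the magnitude of the linearized gain at or below one for every error value.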

Steady-State Mean Square Performance
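We denote by $e_{pw}(n)$ the a priori error of the whole system, by $e_p(n)$ the a priori error when only the nonlinear part is adapted while the linear filter is fixed, and by $e_w(n)$ the a priori error when only the linear filter is adapted while the nonlinear part is fixed. Let

$$H_{pw} = \lim_{n\to\infty} E[e_{pw}^2(n)], \quad H_p = \lim_{n\to\infty} E[e_p^2(n)], \quad H_w = \lim_{n\to\infty} E[e_w^2(n)]$$

denote the corresponding steady-state excess mean square errors (EMSEs). Before evaluating the theoretical values of the steady-state EMSEs, we make the following assumptions:
(A) The noise v(n) is zero-mean, independent, identically distributed, and is independent of the input $\mathbf{X}(n)$, $\hat{s}(n)$ and e(n).
(B) The a priori errors $e_p(n)$ and $e_w(n)$ are zero-mean Gaussian, and independent of the noise v(n).
(C) $\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2$ and $\|\mathbf{X}(n)\mathbf{p}(n)\|^2$ are asymptotically uncorrelated with $f^2(e(n))$, where $f(e(n)) = \exp(-e^2(n)/(2\sigma^2))\, e(n)$.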
Remark 2. For the assumption (A), it is very common to assume that the noise is independent of the regression vector [41][42][43]. In addition, the noise is often restricted to be zero-mean and identically distributed [33][34][35]. As discussed in [44,45], the assumption (B) is reasonable for long adaptive filters. Since $e_p(n)$ is the a priori error when only the nonlinear part is adapted while the linear filter is fixed, we have the approximation $\mathbf{w}^* \approx \mathbf{w}(n)$ such that $\mathbf{w}(n)$ is asymptotically uncorrelated with $f^2(e(n))$. Due to the independence assumption (A), $\mathbf{X}(n)$ is also asymptotically uncorrelated with $f^2(e(n))$. So $\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2$ is asymptotically uncorrelated with $f^2(e(n))$. Similarly, $\|\mathbf{X}(n)\mathbf{p}(n)\|^2$ is asymptotically uncorrelated with $f^2(e(n))$. Therefore, the assumption (C) is reasonable.
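When only the polynomial part with parameter vector $\mathbf{p}$ is adapted, the error $e_p(n)$ is

$$e_p(n) = \mathbf{w}^T(n)\, \mathbf{X}(n)\, \mathbf{p}^* - \mathbf{w}^T(n)\, \mathbf{X}(n)\, \mathbf{p}(n) = \mathbf{w}^T(n)\, \mathbf{X}(n)\, \tilde{\mathbf{p}}(n)$$

where $\tilde{\mathbf{p}}(n) = \mathbf{p}^* - \mathbf{p}(n)$ is the coefficient error vector. Subtracting both sides of Equation (17) from $\mathbf{p}^*$ and taking the squared norm, with $f(e(n)) = \exp(-e^2(n)/(2\sigma^2))\, e(n)$, gives

$$\|\tilde{\mathbf{p}}(n+1)\|^2 = \|\tilde{\mathbf{p}}(n)\|^2 - 2\mu_p\, e_p(n)\, f(e(n)) + \mu_p^2\, \|\mathbf{X}^T(n)\mathbf{w}(n)\|^2 f^2(e(n)) \tag{33}$$

Taking the expectations of both sides of Equation (33) yields

$$E[\|\tilde{\mathbf{p}}(n+1)\|^2] = E[\|\tilde{\mathbf{p}}(n)\|^2] - 2\mu_p E[e_p(n) f(e(n))] + \mu_p^2 E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2 f^2(e(n))] \tag{34}$$

Assuming the filter is stable and attains the steady state, it holds that

$$\lim_{n\to\infty} E[\|\tilde{\mathbf{p}}(n+1)\|^2] = \lim_{n\to\infty} E[\|\tilde{\mathbf{p}}(n)\|^2] \tag{35}$$

Combining Equations (34) and (35) and the above assumptions, we obtain

$$2 \lim_{n\to\infty} E[e_p(n) f(e(n))] = \mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, E[f^2(e(n))] \tag{36}$$

In order to derive a theoretical value of the steady-state EMSE, we consider two cases below.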

Case A. Gaussian Noise
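Recalling that $e(n) = e_p(n) + v(n)$, and assuming that the noise v(n) is zero-mean Gaussian with variance $\varsigma_v^2$, we get [34]

$$\lim_{n\to\infty} E[e_p(n) f(e(n))] = \frac{\sigma^3 H_p}{(\sigma^2 + \sigma_e^2)^{3/2}} \tag{37}$$

where $\sigma_e^2 = H_p + \varsigma_v^2$ denotes the variance of the error, and

$$\lim_{n\to\infty} E[f^2(e(n))] = \frac{\sigma^3 \sigma_e^2}{(\sigma^2 + 2\sigma_e^2)^{3/2}} \tag{38}$$

Substituting Equations (37) and (38) into Equation (36), we have

$$\frac{2\sigma^3 H_p}{(\sigma^2 + H_p + \varsigma_v^2)^{3/2}} = \mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, \frac{\sigma^3 (H_p + \varsigma_v^2)}{(\sigma^2 + 2H_p + 2\varsigma_v^2)^{3/2}} \tag{39}$$

Therefore, the steady-state EMSE $H_p$ satisfies

$$H_p = \frac{\mu_p}{2} \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, (H_p + \varsigma_v^2) \left(\frac{\sigma^2 + H_p + \varsigma_v^2}{\sigma^2 + 2H_p + 2\varsigma_v^2}\right)^{3/2} \tag{40}$$

Theorem 1. In a Gaussian noise environment and with the same step-size, the proposed nonlinear Hammerstein adaptive filter under the MCC criterion has a smaller steady-state EMSE than under the MSE criterion. As the kernel width increases, the two steady-state EMSE values become almost identical.

Proof. It can be shown that [34]

$$H_{p-MSE} = \frac{\mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, \varsigma_v^2}{2 - \mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]}$$

where $H_{p-MSE}$ denotes the steady-state EMSE under the MSE criterion. From Equation (40), we have

$$H_p = \frac{\mu_p \rho \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, \varsigma_v^2}{2 - \mu_p \rho \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]}$$

where

$$\rho = \left(\frac{\sigma^2 + H_p + \varsigma_v^2}{\sigma^2 + 2H_p + 2\varsigma_v^2}\right)^{3/2} \le 1$$

Since the right-hand side is increasing in ρ, it follows that $H_p \le H_{p-MSE}$. Further, as σ → ∞, we have ρ → 1 and hence $H_p \to H_{p-MSE}$.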
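The comparison stated in Theorem 1 can be explored numerically. The sketch below is an assumption-laden illustration, not a result from the paper: it assumes that the Gaussian-case steady-state EMSE under MCC satisfies the fixed-point relation H = (μ/2)·T·(H + ς²)·((σ² + H + ς²)/(σ² + 2H + 2ς²))^{3/2}, obtained from the Gaussian expectations of [34], and that the MSE counterpart is H = μ·T·ς²/(2 − μ·T). Here T stands for the steady-state value of E[‖Xᵀ(n)w(n)‖²]; the names `emse_mse` and `emse_mcc_gaussian` and all numerical values are hypothetical.

```python
def emse_mse(mu, T, noise_var):
    """Closed-form steady-state EMSE under the MSE criterion (assumed relation)."""
    return mu * T * noise_var / (2.0 - mu * T)

def emse_mcc_gaussian(mu, T, noise_var, sigma, iters=200):
    """Solve the assumed fixed-point relation for the EMSE under MCC by iteration."""
    H = 0.0
    for _ in range(iters):
        s = H + noise_var                       # error variance H + noise variance
        rho = ((sigma**2 + s) / (sigma**2 + 2.0 * s)) ** 1.5
        H = 0.5 * mu * T * s * rho
    return H

# Illustrative values only.
mu, T, noise_var = 0.01, 5.0, 0.1
H_mse = emse_mse(mu, T, noise_var)
for sigma in (0.5, 1.0, 2.0, 100.0):
    print(sigma, emse_mcc_gaussian(mu, T, noise_var, sigma), H_mse)
```

For a small step-size the iteration is a contraction, so a simple fixed-point loop is sufficient here; a root finder could be substituted for larger step-sizes.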

Case B. Non-Gaussian Noise
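Taking the Taylor series expansion of $f(e(n))$ around v(n) yields

$$f(e(n)) = f(v(n) + e_p(n)) = f(v(n)) + f'(v(n))\, e_p(n) + \frac{1}{2} f''(v(n))\, e_p^2(n) + o(e_p^2(n))$$

where

$$f'(v(n)) = \exp\left(-\frac{v^2(n)}{2\sigma^2}\right)\left(1 - \frac{v^2(n)}{\sigma^2}\right), \quad f''(v(n)) = \exp\left(-\frac{v^2(n)}{2\sigma^2}\right)\left(\frac{v^3(n)}{\sigma^4} - \frac{3v(n)}{\sigma^2}\right) \tag{45}$$

Under the assumptions (A) and (B), we get [34]

$$\lim_{n\to\infty} E[e_p(n) f(e(n))] \approx E[f'(v(n))]\, H_p \tag{46}$$

$$\lim_{n\to\infty} E[f^2(e(n))] \approx E[f^2(v(n))] + E[f(v(n)) f''(v(n)) + |f'(v(n))|^2]\, H_p \tag{47}$$

Substituting Equations (46) and (47) into Equation (36), we obtain

$$H_p = \frac{\mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, E[f^2(v(n))]}{2E[f'(v(n))] - \mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, E[f(v(n)) f''(v(n)) + |f'(v(n))|^2]} \tag{48}$$

Further, substituting Equation (45) into Equation (48), we obtain

$$H_p = \frac{\mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, E\left[\exp\left(-\frac{v^2(n)}{\sigma^2}\right) v^2(n)\right]}{2E\left[\exp\left(-\frac{v^2(n)}{2\sigma^2}\right)\left(1 - \frac{v^2(n)}{\sigma^2}\right)\right] - \mu_p \lim_{n\to\infty} E[\|\mathbf{X}^T(n)\mathbf{w}(n)\|^2]\, E\left[\exp\left(-\frac{v^2(n)}{\sigma^2}\right)\left(1 + \frac{2v^4(n)}{\sigma^4} - \frac{5v^2(n)}{\sigma^2}\right)\right]} \tag{49}$$

When only the linear filter with parameter vector $\mathbf{w}(n)$ is adapted, we get

$$2 \lim_{n\to\infty} E[e_w(n) f(e(n))] = \mu_w \lim_{n\to\infty} E[\|\mathbf{X}(n)\mathbf{p}(n)\|^2]\, E[f^2(e(n))]$$

where $e_w(n) = \tilde{\mathbf{w}}^T(n)\, \mathbf{X}(n)\, \mathbf{p}(n)$ and $\tilde{\mathbf{w}}(n) = \mathbf{w}^* - \mathbf{w}(n)$. For the Gaussian noise case, we obtain

$$H_w = \frac{\mu_w}{2} \lim_{n\to\infty} E[\|\mathbf{X}(n)\mathbf{p}(n)\|^2]\, (H_w + \varsigma_v^2) \left(\frac{\sigma^2 + H_w + \varsigma_v^2}{\sigma^2 + 2H_w + 2\varsigma_v^2}\right)^{3/2}$$

In non-Gaussian environments, we have

$$H_w = \frac{\mu_w \lim_{n\to\infty} E[\|\mathbf{X}(n)\mathbf{p}(n)\|^2]\, E[f^2(v(n))]}{2E[f'(v(n))] - \mu_w \lim_{n\to\infty} E[\|\mathbf{X}(n)\mathbf{p}(n)\|^2]\, E[f(v(n)) f''(v(n)) + |f'(v(n))|^2]}$$

It follows that

$$H_{pw} = H_p + H_w + 2H_{cross}$$

where $H_{cross} = \lim_{n\to\infty} E[e_p(n)\, e_w(n)]$ stands for the cross-EMSE and $H_{cross} \ge 0$ ($H_{cross} = 0$ when $e_p(n)$ and $e_w(n)$ are statistically independent and zero mean) [7]. Therefore, $H_{pw} \ge H_p + H_w$, which completes the proof.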

Simulation Results
Now, we present simulation results to demonstrate the performance of the Hammerstein adaptive filtering under MCC. In order to show the performance of the proposed algorithm in non-Gaussian noises, we adopt the alpha-stable distribution to generate the disturbance noise, whose characteristic function is [32,46]

$$\psi(t) = \exp\{j\delta t - \gamma |t|^{\alpha} [1 + j\beta\, \mathrm{sgn}(t)\, S(t, \alpha)]\}$$

with

$$S(t, \alpha) = \begin{cases} \tan(\alpha\pi/2), & \alpha \ne 1 \\ (2/\pi) \log|t|, & \alpha = 1 \end{cases}$$

where α ∈ (0, 2] denotes the characteristic factor, −∞ < δ < +∞ is the location parameter, β ∈ [−1, 1] stands for the symmetry parameter, and γ > 0 is the dispersion parameter. The characteristic factor α measures the tail heaviness of the distribution: the smaller α is, the heavier the tail is. In addition, γ measures the dispersion of the distribution. The distribution is symmetric about its location δ when β = 0; such a distribution is called a symmetric alpha-stable (SαS) distribution. The parameter vector of the noise model is defined as V = (α, β, γ, δ).
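For readers who wish to reproduce this kind of disturbance, symmetric alpha-stable samples (β = 0) can be drawn with the Chambers-Mallows-Stuck method. The paper does not state how its noise was generated, so the following is only a sketch under that assumption; it also assumes the dispersion convention in which the characteristic function is exp(jδt − γ|t|^α), so that samples are scaled by γ^(1/α). The function name `sas_noise` is hypothetical.

```python
import numpy as np

def sas_noise(alpha, gamma, delta, size, rng):
    """Draw symmetric alpha-stable samples (beta = 0) via Chambers-Mallows-Stuck.
    gamma is the dispersion, so the scale applied to the samples is gamma**(1/alpha)."""
    U = rng.uniform(-np.pi / 2.0, np.pi / 2.0, size)   # uniform phase
    W = rng.exponential(1.0, size)                     # unit-mean exponential
    if np.isclose(alpha, 1.0):
        z = np.tan(U)                                  # Cauchy special case
    else:
        z = (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)
             * (np.cos((1.0 - alpha) * U) / W) ** ((1.0 - alpha) / alpha))
    return delta + gamma ** (1.0 / alpha) * z

rng = np.random.default_rng(0)
v = sas_noise(alpha=1.2, gamma=0.6, delta=0.0, size=15000, rng=rng)
print("median |v|:", np.median(np.abs(v)), " max |v|:", np.max(np.abs(v)))
```

With α = 2 the sampler reduces to a Gaussian with variance 2γ, and with α = 1 to a Cauchy distribution, which gives two simple sanity checks for the sketch.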
In the simulations below, the input signal considered is a colored signal obtained from the following equation:

$$x(n) = a\, x(n-1) + \sqrt{1 - a^2}\, \xi(n)$$

with a = 0.95, and ξ(n) being a white Gaussian signal of unit variance. In addition, the coefficient vectors are initialized with the first coefficient equal to 1 and the others equal to zero [7].
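A colored input of this kind can be sketched as a first-order autoregressive process. The exact recursion is an assumption here: the sketch uses x(n) = a·x(n−1) + √(1 − a²)·ξ(n), where the √(1 − a²) scaling is chosen so that the stationary variance equals one. The function name `colored_input` is hypothetical.

```python
import numpy as np

def colored_input(n_samples, a=0.95, rng=None):
    """First-order autoregressive input driven by unit-variance white Gaussian noise.
    The sqrt(1 - a^2) gain is an assumed scaling that keeps unit stationary variance."""
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(n_samples)
    gain = np.sqrt(1.0 - a**2)
    x = np.empty(n_samples)
    x[0] = xi[0]                       # start from the stationary distribution
    for n in range(1, n_samples):
        x[n] = a * x[n - 1] + gain * xi[n]
    return x

x = colored_input(15000, a=0.95, rng=np.random.default_rng(0))
print("sample variance:", np.var(x))
print("lag-1 correlation:", np.corrcoef(x[:-1], x[1:])[0, 1])
```

A value of a close to one yields a strongly correlated input, which typically slows the convergence of gradient-based adaptive filters compared with a white input.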

Experiment 1
First, we consider an unknown Hammerstein system with parameter vectors p* = [1, 0.6], w* = [1, 0.6, 0.1, −0.2, −0.06, 0.04, 0.02, −0.03, −0.02, 0.01]. Thus, M = 2, N = 10. The kernel width σ is 1.0. The noise parameter vector V is set at (1.2, 0, 0.6, 0), and the noise signal is shown in Figure 3. Simulation results are averaged over 100 independent Monte Carlo runs. In each simulation, 15,000 iterations are run to ensure that the algorithm reaches the steady state, and the steady-state MSE is obtained as an average over the last 2000 iterations. The step-sizes are set at μp = μw = 0.005 for MSE and μp = μw = 0.01 for MCC. Figure 4 shows the average convergence curves under MCC and MSE. As we can see, the Hammerstein adaptive filtering under the MCC criterion achieves faster convergence speed and lower steady-state testing MSE than under the MSE criterion. Here the testing MSE is evaluated on a test set with 100 samples. Second, we investigate the performance of the algorithms with different noise parameters. The steady-state MSEs with different γ (0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6) and different α (0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2.0) are shown in Figures 5 and 6, respectively. We observe: (1) In most cases, the new algorithm performs better and achieves a lower steady-state MSE compared with the Hammerstein adaptive filtering under the MSE criterion; (2) When α is close to 2.0, the Hammerstein adaptive filtering under the MSE criterion can achieve better performance than under the MCC criterion. The main reason for this is that, when α ≈ 2.0, the noise is approximately Gaussian. Simulation results suggest that the proposed algorithm is particularly useful for identifying a Hammerstein system in non-Gaussian noises.

Experiment 2
The second experiment is drawn from [47]. The nonlinear dynamic system is composed of two blocks. The first block is a non-polynomial nonlinearity. The noise parameter vector V is set at (1.0, 0, 0.8, 0) (see Figure 7 for a typical sequence of the noise), and the polynomial order M and the FIR memory size N are set at 3 and 6, respectively. Simulation results are averaged over 50 independent Monte Carlo runs. In each simulation, 30,000 iterations are run to ensure that the algorithm reaches the steady state, and the steady-state MSE is obtained as an average over the last 2000 iterations. The testing MSE is evaluated on a test set with 100 samples. Figure 8 shows the convergence curves under MCC and MSE. For both adaptive filtering algorithms, the step-sizes are set at μp = 0.005, μw = 0.015. It can be seen that the Hammerstein adaptive filter under the MCC criterion performs better (i.e., with faster convergence speed and smaller mismatch error) than under the MSE criterion. Finally, we show the steady-state performance of the algorithms with different kernel widths σ (0.01, 1.0, 2.0, 3.0, 4.0, 5.0). Simulation results are shown in Figure 9. As we can see, the kernel width has a significant influence on the performance of the proposed algorithm. In this example, the lowest steady-state MSE is obtained when σ = 1.0.

Conclusions
The MCC has been successfully applied in machine learning and signal processing due to its strong robustness in impulsive non-Gaussian situations. In this work, we develop a robust Hammerstein adaptive filter under the MCC criterion. Different from the traditional Hammerstein adaptive filtering algorithms, the new algorithm uses the MCC instead of the well-known MSE as the adaptation criterion, which can achieve desirable performance, especially in impulsive noises. Based on [7,31], we carry out the convergence analysis and obtain some important theoretical results. Simulation examples confirm the excellent performance of the proposed algorithm. How to verify the derived theoretical results is an interesting topic for future study.

Figure 2. Derivative curves of −JMCC with respect to e(i) for different kernel widths.

These update equations are referred to as the Hammerstein adaptive filtering algorithm under the MCC criterion, whose pseudocode is presented in Algorithm 1.

Figure 4. Convergence curves under maximum correntropy criterion (MCC) and mean square error (MSE) (for unknown system with polynomial nonlinearity).
The second block is an FIR filter with weight vector [1 0.75 0.5 0.25 0 0.25].

Figure 8. Convergence curves under maximum correntropy criterion (MCC) and mean square error (MSE) (for unknown system with non-polynomial nonlinearity).