1. Introduction
Adaptive filtering is a fundamental technique in signal processing, offering effective solutions for tracking and modeling time-varying systems. While conventional adaptive filters are well suited to linear scenarios, their performance can degrade significantly in practical applications involving unknown nonlinearities. To overcome this limitation, structured nonlinear models have been integrated into adaptive filtering frameworks, enabling more accurate system representation. Among these, the Hammerstein model [1], comprising a static nonlinear block followed by a linear dynamic system, has become a widely adopted approach due to its simplicity and effectiveness in capturing nonlinear dynamics. Building on this structure, Hammerstein adaptive filters (HAFs) have been extensively investigated and successfully applied in various scenarios, such as global navigation satellite systems, acoustic echo cancellation, and nonlinear system identification [2,3,4,5,6].
A critical consideration in designing HAF algorithms lies in the choice of the cost function. Traditionally, the mean square error (MSE) criterion has been widely adopted for its simplicity and solid theoretical foundation. However, MSE is known to be sensitive to outliers and non-Gaussian disturbances, which often arise in practical nonlinear systems. To overcome this limitation, the sign error criterion has been introduced as an alternative, leading to the development of the sign normalized least mean square algorithm based on the Hammerstein spline adaptive filter [7]. Owing to the inherent insensitivity of the sign error criterion to outliers, such a filter demonstrates improved robustness compared to MSE-based approaches. Nevertheless, the derivative of the sign error function with respect to the error is either 1 or −1 at non-zero points. This characteristic prevents the algorithm from assigning negligible weights to large error samples, which are often caused by outliers, and can result in the loss of valuable information embedded in normal error samples. To further enhance robustness and adaptability, the least mean p-power (LMP) error criterion has been employed to design HAFs [8]. By flexibly tuning the value of p, the mean p-power error criterion reduces to the classical MSE and sign error criteria as special cases, enabling more versatile and effective filtering in non-Gaussian environments. Beyond the sign error and LMP error criteria, alternative cost functions such as the maximum correntropy criterion (MCC) [9,10] and the kernel risk-sensitive loss (KRSL) [11] have also demonstrated strong potential. These advanced error criteria have significantly improved the robustness of HAF algorithms. However, achieving an optimal trade-off between robustness, convergence speed, and steady-state accuracy remains a significant challenge.
To address these issues, this paper proposes a novel Hammerstein adaptive filtering framework based on the kernel mean p-power error (KMPE) criterion [12]. KMPE extends the traditional p-power loss into kernel space, enabling improved robustness against outliers while capturing higher-order statistical information. By embedding the input data in a reproducing kernel Hilbert space [13], the proposed approach leverages the kernel-induced features to construct a more resilient cost surface, thus improving both stability and performance under non-Gaussian noise conditions. For additional application examples of KMPE, please refer to [14,15,16].
In addition to the choice of cost function, the modeling of the nonlinear component within the Hammerstein structure is crucial for accurately capturing system dynamics. In earlier designs, polynomial functions were commonly adopted as the default approach [4]. However, due to their limited approximation capacity, polynomial-based methods may exhibit suboptimal performance, particularly when identifying Hammerstein systems without prior knowledge of the nonlinear sub-block. To address this limitation, alternative nonlinear mapping strategies have been explored. Representative models include spline functions [17], the extreme learning machine model [18], the Volterra model [19], the kernel-based model [20,21], and the random Fourier features model [11]. Among these, the random Fourier features (RFF)-based model provides a good balance between computational efficiency and approximation accuracy, making it particularly suitable for capturing unknown nonlinearities [22,23,24]. Motivated by these desirable properties, this paper adopts random Fourier features to model the nonlinear transformation. Because this approach allows a more flexible representation of the nonlinear sub-block, it significantly enhances both the scalability and representational capacity of the overall system. The main contributions of this paper are outlined as follows:
- (1) A robust Hammerstein adaptive filter based on the KMPE cost function is proposed, providing enhanced resistance to non-Gaussian noise.
- (2) A random Fourier feature-based nonlinear modeling scheme is integrated into the proposed HAF structure, enabling efficient and flexible representation of nonlinearities.
- (3) A theoretical analysis of the steady-state excess mean square error is provided to reveal the steady-state behavior of the proposed method.
- (4) Numerical experiments are conducted to validate the effectiveness and robustness of the proposed method in nonlinear system identification tasks.
The remainder of this paper is organized as follows. Section 2 presents the framework of the HAF. In Section 3, we derive a robust HAF algorithm based on the KMPE criterion. Section 4 focuses on analyzing the steady-state performance of the algorithm. Section 5 provides several experimental evaluations that validate both the theoretical analysis and the desired performance of the proposed algorithm. Finally, Section 6 concludes this work.
2. HAF Structure
The HAF is a block-based filter [4], whose block diagram is shown in Figure 1. It is clear that the HAF consists of a nonlinear sub-block and a linear sub-block. An input signal $x(n)$ is first sent to the nonlinear sub-block, obtaining the following intermediate output:

$$
s(n) = \mathbf{w}^{\top}\boldsymbol{\phi}(x(n)), \tag{1}
$$

where $\boldsymbol{\phi}(x(n))$ is the new representation of $x(n)$ in a high-dimensional feature space, and where $\mathbf{w} = [w_{1}, w_{2}, \ldots, w_{L}]^{\top}$ is a vector that stores the corresponding weights. Many methods can be used to construct $\boldsymbol{\phi}(x(n))$. For example, if the polynomial method is adopted, $\boldsymbol{\phi}(x(n))$ can be constructed by

$$
\boldsymbol{\phi}(x(n)) = \left[x(n), x^{2}(n), \ldots, x^{L}(n)\right]^{\top}, \tag{2}
$$

where L is the dimension of the feature space. However, due to the inherent limitations of polynomials in approximating unknown nonlinear functions, methods that utilize polynomial features may yield unexpected performance when applied to identify Hammerstein systems without any prior knowledge of the nonlinear sub-block. Consequently, some advanced techniques have been employed to construct $\boldsymbol{\phi}(x(n))$. Among these methods, the RFF method is a newly introduced technique that has exhibited remarkable competitiveness [11]. Let $\{\omega_{1}, \omega_{2}, \ldots, \omega_{L}\}$ denote a sequence that is randomly generated according to a Gaussian distribution with zero mean and variance $\sigma_{\omega}^{2}$. Additionally, let $\{b_{1}, b_{2}, \ldots, b_{L}\}$ denote another sequence that is randomly generated following a uniform distribution over the interval $[0, 2\pi]$. Then, $\boldsymbol{\phi}(x(n))$ obtained through the RFF method can be expressed as follows:

$$
\boldsymbol{\phi}(x(n)) = \left[\cos(\omega_{1}x(n) + b_{1}), \ldots, \cos(\omega_{L}x(n) + b_{L})\right]^{\top}, \tag{3}
$$

where $\cos(\cdot)$ denotes the cosine function. In the rest of the paper, the RFF method will be the default option to construct $\boldsymbol{\phi}(x(n))$ due to its simplicity and excellent competitiveness in comparison with other options.
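To make this construction concrete, the following minimal Python sketch draws the random parameters and builds the feature map of (3); the function and variable names are illustrative, and the default values of L and $\sigma_{\omega}$ are placeholders rather than values prescribed here:

```python
import numpy as np

def make_rff_map(L=50, sigma_omega=1.0, seed=None):
    """Draw the random parameters of an RFF feature map, as in Equation (3).

    L           : dimension of the feature space
    sigma_omega : standard deviation of the Gaussian generating the frequencies
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, sigma_omega, size=L)  # omega_i ~ N(0, sigma_omega^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=L)     # b_i ~ U[0, 2*pi]

    def phi(x):
        # phi(x) = [cos(omega_1 x + b_1), ..., cos(omega_L x + b_L)]^T
        return np.cos(omega * x + b)

    return phi

phi = make_rff_map(L=50, sigma_omega=1.0, seed=0)
s_features = phi(0.3)  # feature vector of one scalar input sample
```

Note that the $\omega_{i}$ and $b_{i}$ are drawn once at initialization and then kept fixed; only the weight vectors are adapted during filtering.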
Once the vector $\boldsymbol{\phi}(x(n))$ is constructed, the corresponding intermediate output $s(n)$ can be obtained using (1). This intermediate output is then forwarded to the linear sub-block to yield the final system output $y(n)$. In particular, if we use $\mathbf{h} = [h_{1}, h_{2}, \ldots, h_{M}]^{\top}$ to denote the weight vector of the linear sub-block, the final $y(n)$ can be obtained with the following formula:

$$
y(n) = \mathbf{h}^{\top}\mathbf{s}(n), \tag{4}
$$

where $\mathbf{s}(n) = [s(n), s(n-1), \ldots, s(n-M+1)]^{\top}$ and $M$ is the length of the linear sub-block.
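As a minimal sketch of the complete forward pass defined by (1) and (4), continuing the previous snippet (the names haf_forward and x_window, and the weight values, are illustrative):

```python
import numpy as np

def haf_forward(x_window, w, h, phi):
    """One forward pass of the HAF, following Equations (1) and (4).

    x_window : the M most recent inputs [x(n), x(n-1), ..., x(n-M+1)]
    w        : nonlinear sub-block weight vector (length L)
    h        : linear sub-block weight vector (length M)
    phi      : RFF feature map from the previous sketch
    """
    # Nonlinear sub-block: s(n-k) = w^T phi(x(n-k)) for each lag k
    s = np.array([w @ phi(x) for x in x_window])
    # Linear sub-block: y(n) = h^T [s(n), ..., s(n-M+1)]
    return h @ s

M, L = 4, 50
w = np.full(L, 1.0 / L)                 # illustrative initial weights
h = np.full(M, 1.0 / M)
x_window = np.array([0.3, 0.1, -0.2, 0.5])
y = haf_forward(x_window, w, h, phi)    # scalar output y(n)
```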
It can be observed from (1) and (4) that both $\mathbf{w}$ and $\mathbf{h}$ are the parameters that require adjustment in the developed model. To effectively learn these parameters from the noisy observation $d(n)$, a well-designed cost function is crucial. In previous studies [4], the instantaneous MSE has typically been employed as a default choice, leading to the following cost function:

$$
J(n) = e^{2}(n), \tag{5}
$$

where $e(n) = d(n) - y(n)$ is the estimated error of the nth sample. Although this cost function is effective when the observation noise sequence is drawn from a Gaussian distribution, it degrades the performance of the designed HAF when the observation noise sequence contains outliers or other more complex non-Gaussian noises. To enhance the robustness of the HAF, we propose a robust version of the HAF with a KMPE-based cost function.
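Section 3 derives the actual update rules; for orientation, a per-sample KMPE loss can be sketched as below, assuming the standard definition from [12] with Gaussian kernel bandwidth $\sigma$ and power parameter p:

```python
import numpy as np

def kmpe_loss(e, sigma=1.0, p=2.0):
    """Per-sample KMPE loss, assuming the definition in [12]:
    loss(e) = (2 * (1 - exp(-e^2 / (2 * sigma^2)))) ** (p / 2).

    As |e| grows, the loss saturates toward 2 ** (p / 2), so outliers
    receive a bounded penalty, unlike the unbounded MSE loss e^2.
    """
    kappa = np.exp(-np.square(e) / (2.0 * sigma**2))  # Gaussian kernel value
    return (2.0 * (1.0 - kappa)) ** (p / 2.0)
```

This boundedness is the source of the robustness discussed above: arbitrarily large, outlier-induced errors contribute almost equally to the cost, while small errors are still penalized in a graded way.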
5. Simulation Results
This section presents simulations to validate the theoretical analysis and evaluate the performance of the proposed HAF–RFF–KMPE method.
5.1. Verification of Theoretical Results
To empirically validate the theoretical results of Section 4, a dataset of 300,000 samples was synthesized from a Hammerstein system of the form (1) and (4), where the nonlinear transformation was constructed through the RFF method, $\{\omega_{i}\}$ denotes a sequence that is randomly generated according to a Gaussian distribution with zero mean and variance $\sigma_{\omega}^{2}$, and $\{b_{i}\}$ denotes another sequence that is randomly generated following a uniform distribution over the interval $[0, 2\pi]$. The coefficient vectors of the two sub-blocks remain fixed throughout, while the system inputs $x(n)$ are sampled uniformly. Similar to [11,28,29], we further corrupted the outputs of the data pairs using a standard non-Gaussian noise model, described as
$$
v(n) = (1 - c(n))\,A(n) + c(n)\,B(n), \tag{46}
$$

where $A(n)$ denotes the normal noise, $B(n)$ represents the outliers, and $c(n)$ is a binary variable satisfying $P(c(n)=1) = p_r$ and $P(c(n)=0) = 1 - p_r$. Within this noise model, the inner noise $A(n)$ is set to be drawn from a uniform distribution, and the outlier component $B(n)$ is set to be drawn from a Gaussian distribution with a much larger variance. Meanwhile, the outlier probability $p_r$ is held fixed throughout this experiment.
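For illustration, the mixture noise in (46) could be generated as in the sketch below; the uniform interval, outlier variance, and occurrence probability are placeholder values, since the exact settings are experiment-specific:

```python
import numpy as np

def mixture_noise(n_samples, a_low=-0.1, a_high=0.1,
                  outlier_std=np.sqrt(10.0), p_r=0.05, seed=None):
    """Impulsive mixture noise of Equation (46):
    v(n) = (1 - c(n)) * A(n) + c(n) * B(n).
    All numeric defaults are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(a_low, a_high, n_samples)    # inner noise A(n)
    B = rng.normal(0.0, outlier_std, n_samples)  # outlier component B(n)
    c = rng.random(n_samples) < p_r              # c(n) = 1 with probability p_r
    return np.where(c, B, A)

v = mixture_noise(300_000, seed=1)  # noise sequence added to the clean outputs
```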
Figure 2 illustrates the comparison between the theoretical and simulated steady-state EMSE of the proposed HAF–RFF–KMPE algorithm, where the theoretical values are derived based on Theorem 1. Similarly, Figure 3 presents the corresponding comparison based on Theorem 2. To estimate the steady-state EMSE from the simulations, the values were computed by averaging the EMSE over the final 30,000 iterations of the EMSE learning curves. Furthermore, to reduce the impact of randomness in the simulation, each result was averaged over 50 independent runs. As shown in Figure 2 and Figure 3, the simulated steady-state EMSE values closely matched the theoretical predictions in both scenarios when the steady-state EMSE was relatively small, which supports the validity of the proposed theoretical analysis. However, it should be noted that as the steady-state EMSE became larger (see Figure 2a), the theoretical values became less accurate. For the underlying reason behind these slight differences, refer to Remark 2.
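Under the assumption that an EMSE learning curve is available as a NumPy array, the tail-averaging step described above amounts to a one-liner (the function name is illustrative; the 30,000-iteration window follows the text):

```python
import numpy as np

def steady_state_emse(emse_curve, tail=30_000):
    """Estimate the steady-state EMSE by averaging the final `tail`
    iterations of a simulated EMSE learning curve."""
    return float(np.mean(emse_curve[-tail:]))
```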
5.2. Performance Evaluation Under Different Nonlinear Sub-Block Settings
To further test the performance of the proposed HAF–RFF–KMPE, a general Hammerstein system is considered, where the input $x(n)$ of the system is still set to be drawn from a uniform distribution, $\mathbf{h}^{*}$ is the weight vector of the linear sub-block, and f denotes a general nonlinear function. In the following, four cases of f are considered.
Following (47)–(52), we generate four groups of input–output data pairs, where the only difference between them is the nonlinear function used. Each set of data contains 50,000 pairs of training samples and 100 pairs of testing samples. The generated training data are mixed with a noise sequence on the corresponding outputs and used to train the algorithms, while the generated test samples are used directly to evaluate performance. The noise sequence is generated according to the procedure in Section 5.1; for the current experiment, we maintained this procedure while modifying the outlier occurrence rate. The evaluated performance was quantified through the MSE, expressed as

$$
\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(d_{j} - \hat{y}_{j}\right)^{2},
$$

where $d_{j}$ and $\hat{y}_{j}$ denote the actual and estimated outputs of the j-th sample, respectively, and where N is the number of samples.
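This testing metric has a direct implementation; a minimal sketch, assuming the actual and estimated outputs are stored as arrays:

```python
import numpy as np

def testing_mse(d, y_hat):
    """Testing MSE over N samples: mean of (d_j - y_hat_j)^2."""
    d, y_hat = np.asarray(d), np.asarray(y_hat)
    return float(np.mean((d - y_hat) ** 2))
```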
Figure 4 shows the averaged testing MSE curves of the HAF–RFF–KMPE under different nonlinear sub-block settings. For comparison, the averaged testing MSE curves of several existing algorithms designed with an HAF structure are also incorporated into this figure. These methods include an HAF designed with the MSE criterion and a polynomial function (the HAF–Polynomial–MSE) [4], an HAF designed with the MCC criterion and a polynomial function (the HAF–Polynomial–MCC) [9], an HAF designed with the LMP criterion and a spline function (the HAF–Spline–LMP) [8], and an HAF designed with the KRSL criterion and an RFF transformation (the HAF–RFF–KRSL) [11]. To ensure a fair comparison, the parameters of the different algorithms were selected so that all the algorithms achieved their best performance with almost the same initial convergence speed, similar to [30]. Within the HAF–Spline–LMP framework, the Catmull–Rom spline basis [8] serves as the default choice for spline function design, and the spline configuration employs 23 control points spaced at 0.2 intervals. For both the HAF–RFF–KRSL and the HAF–RFF–KMPE implementations, the RFF space dimensionality was fixed at 50, with the parameter $\sigma_{\omega}$ fixed at 0.1. The remaining parameters across the algorithms were empirically determined to balance the convergence rate and steady-state performance; the specific values are detailed in Table 1. Herein, $\mu_{w}$ denotes the step size for nonlinear weight adaptation, $\mu_{h}$ represents the step size for linear weight updates, $P$ specifies the polynomial order in the HAF–Polynomial–MSE and the HAF–Polynomial–MCC, and h defines the kernel bandwidth for the HAF–Polynomial–MCC. Meanwhile, $\lambda$ and $\sigma$ denote the cost function parameters used in the HAF–RFF–KRSL, and p and $\sigma$ denote the cost function parameters used in the HAF–RFF–KMPE.
It can be observed from Figure 4 that the proposed HAF–RFF–KMPE not only had a faster convergence speed at the initial stage but also obtained smaller testing MSE values in the steady state compared with the other four algorithms. These results indicate that, for Hammerstein systems with unknown nonlinearities, the HAF–RFF–KMPE presents a superior alternative to the HAF–Polynomial–MSE, the HAF–Polynomial–MCC, the HAF–Spline–LMP, and the HAF–RFF–KRSL.
5.3. Performance Evaluation Under Different Non-Gaussian Noise Environments
For this section, we tested the performance of the proposed HAF–RFF–KMPE under different non-Gaussian noise environments. The noise model used was set the same as (46), in which the outlier probability $p_r$ was held fixed and $B(n)$ was set to be drawn from a Gaussian distribution with zero mean and a variance of 10. However, four cases were considered to model the inner noise $A(n)$. The details are as follows:
Case 1: $A(n)$ was set to be drawn from a Gaussian distribution with zero mean and a fixed variance.
Case 2: $A(n)$ was set to be drawn from a binary distribution.
Case 3: $A(n)$ was set to be drawn from a uniform distribution.
Case 4: $A(n)$ was set to be drawn from a Laplace distribution with a location parameter of 0 and a scale parameter of 0.1.
Figure 5 shows the averaged testing MSE curves of the HAF–RFF–KMPE in different non-Gaussian noise environments, and the related parameters are summarized in Table 2. Herein, the nonlinear sub-block of the Hammerstein system was kept fixed, and the corresponding linear sub-block was set the same as (47). As can be observed from Figure 5, the proposed HAF–RFF–KMPE, while exhibiting almost the same initial convergence speed as its competitors, always obtained the smallest testing MSE at the final iteration, which means that it is capable of achieving higher filtering accuracy under different non-Gaussian noise environments. Furthermore, although the HAF–RFF–KRSL also adopted the RFF to model the nonlinear sub-block of Hammerstein systems, it tended to obtain larger testing MSEs than the HAF–RFF–KMPE. This indicates that the adopted KMPE cost function is more effective than the KRSL for identifying Hammerstein systems contaminated by non-Gaussian noises.
5.4. Parameter Sensitivity
There are four key parameters—i.e., L, $\sigma_{\omega}$, p, and $\sigma$—that should be appropriately selected to obtain the desired performance of the proposed method. For this section, we investigated the influence of these parameters on the learning performance of the HAF–RFF–KMPE.
First, we examined how the parameter L affected the learning performance of the HAF–RFF–KMPE. Specifically, we varied the value of L from 10 to 100 in increments of 10, while keeping all the other parameters consistent with those used in Table 1. Figure 6a presents the steady-state MSEs of the HAF–RFF–KMPE for different values of L. The steady-state MSE was calculated by averaging over the last 500 iterations of the MSE curves. Additionally, the nonlinear sub-block of the Hammerstein system was kept fixed, and we maintained a noise model identical to that used in Case 1. As illustrated in Figure 6a, it is evident that the steady-state MSE of the HAF–RFF–KMPE decreased as L increased. However, it should be noted that a larger L implies a higher computational complexity. Therefore, L should be selected to provide a balance between computational efficiency and filtering accuracy.
Similarly, we investigated how varying the parameter $\sigma_{\omega}$ impacted the learning performance of the HAF–RFF–KMPE. Figure 6b shows the steady-state MSEs of the HAF–RFF–KMPE under different selections of $\sigma_{\omega}$. From this figure, it can be observed that setting $\sigma_{\omega}$ either too small or too large leads to degraded learning performance for the HAF–RFF–KMPE. Consequently, it is crucial to select an appropriate value for $\sigma_{\omega}$ before implementation.
Furthermore, we examined the impact of the parameter p on the learning performance of the HAF–RFF–KMPE, with the results presented in Figure 6c. As illustrated in Figure 6c, a very small value of p can lead to a deterioration in the learning performance of the HAF–RFF–KMPE. Conversely, selecting values of p ranging from 1 to 4 yields relatively superior filtering performance.
Finally, we investigated the effect of the parameter $\sigma$ on the learning performance of the HAF–RFF–KMPE, as shown in Figure 6d. It is evident from this figure that the influence of $\sigma$ on the learning performance of the HAF–RFF–KMPE is similar to that of $\sigma_{\omega}$. Therefore, $\sigma$ should also be selected appropriately before use.
5.5. Ablation Experiments
Both RFF and KMPE were employed in the design of the proposed method. For this section, we performed a comparative analysis of the performance improvements achieved through the independent application of RFF, the independent application of KMPE, and the synergistic integration of both methods across the aforementioned four types of noise and nonlinear functions.
First, we fixed the RFF mapping function and investigated the influence of different error criteria on the filtering performance. Specifically, the proposed KMPE criterion was compared with classical alternatives, including MSE, MCC, and KRSL. As shown in Table 3, across the four distinct noise types, the KMPE-based method consistently achieved lower steady-state MSE values than the other error criteria. This indicates that the KMPE criterion is more effective in characterizing error distributions under non-Gaussian conditions, thereby suppressing outliers more efficiently, which ultimately leads to enhanced filtering accuracy.
Furthermore, we fixed the KMPE criterion and examined the role of different mapping functions. The RFF mapping function employed in the proposed method was compared with polynomial and spline mapping functions, and the results are summarized in Table 4. Under the four different nonlinear function scenarios, the RFF-based approach consistently outperformed the alternatives, yielding the lowest steady-state MSE values. These results demonstrate that the adopted RFF mapping function possesses a stronger representational capability for capturing nonlinear structures in the input data.
As a brief summary, the ablation experiments confirm the effectiveness of the proposed HAF–RFF–KMPE method from two complementary perspectives: (1) the KMPE criterion provides superior adaptability to non-Gaussian environments compared to classical error measures, and (2) the RFF mapping function offers better expressiveness and robustness than alternative mapping strategies. The combination of these two components accounts for the significant improvement in filtering accuracy and robustness observed in the proposed method.