1. Introduction
Adaptive filtering is a fundamental technique in signal processing, offering effective solutions for tracking and modeling time-varying systems. While conventional adaptive filters are well suited to linear scenarios, their performance can degrade significantly in practical applications involving unknown nonlinearities. To overcome this limitation, structured nonlinear models have been integrated into adaptive filtering frameworks, enabling more accurate system representation. Among these, the Hammerstein model [1], comprising a static nonlinear block followed by a linear dynamic system, has become a widely adopted approach due to its simplicity and effectiveness in capturing nonlinear dynamics. Building on this structure, Hammerstein adaptive filters (HAFs) have been extensively investigated and successfully applied in various scenarios, such as global navigation satellite systems, acoustic echo cancellation, and nonlinear system identification [2,3,4,5,6].
A critical consideration in designing HAF algorithms lies in the choice of the cost function. Traditionally, the mean square error (MSE) criterion has been widely adopted for its simplicity and solid theoretical foundation. However, MSE is known to be sensitive to outliers and non-Gaussian disturbances, which often arise in practical nonlinear systems. To overcome this limitation, the sign error criterion has been introduced as an alternative, leading to the development of the sign normalized least mean square algorithm based on the Hammerstein spline adaptive filter [7]. Owing to the inherent insensitivity of the sign error criterion to outliers, such a filter demonstrates improved robustness compared to MSE-based approaches. Nevertheless, the derivative of the sign error function with respect to the error is either 1 or −1 at non-zero points. This characteristic prevents the algorithm from assigning negligible weights to large error samples, which are often caused by outliers, and can result in the loss of valuable information embedded in normal error samples. To further enhance robustness and adaptability, the least mean p-power (LMP) error criterion has been employed to design HAFs [8]. By flexibly tuning the value of p, the mean p-power error criterion reduces to the classical MSE and sign error criteria as special cases, enabling more versatile and effective filtering in non-Gaussian environments. Beyond the sign error and LMP error criteria, alternative cost functions such as the maximum correntropy criterion (MCC) [9,10] and the kernel risk-sensitive loss (KRSL) [11] have also demonstrated strong potential. These advanced error criteria have significantly improved the robustness of HAF algorithms. However, achieving an optimal trade-off between robustness, convergence speed, and steady-state accuracy remains a significant challenge.
To address these issues, this paper proposes a novel Hammerstein adaptive filtering framework based on the kernel mean p-power error (KMPE) criterion [12]. KMPE extends the traditional p-power loss into kernel space, enabling improved robustness against outliers while capturing higher-order statistical information. By embedding the input data in a reproducing kernel Hilbert space [13], the proposed approach leverages the kernel-induced features to construct a more resilient cost surface, thus improving both stability and performance under non-Gaussian noise conditions. For additional application examples of KMPE, please refer to [14,15,16].
In addition to the choice of cost function, the modeling of the nonlinear component within the Hammerstein structure is crucial for accurately capturing system dynamics. In earlier designs, polynomial functions were commonly adopted as the default approach [4]. However, due to their limited approximation capacity, polynomial-based methods may exhibit suboptimal performance, particularly when identifying Hammerstein systems without prior knowledge of the nonlinear sub-block. To address this limitation, alternative nonlinear mapping strategies have been explored. Representative models include spline functions [17], the extreme learning machine model [18], the Volterra model [19], the kernel-based model [20,21], and the random Fourier features model [11]. Among these, the random Fourier features (RFF)-based model provides a good balance between computational efficiency and approximation accuracy, making it particularly suitable for capturing unknown nonlinearities [22,23,24]. Motivated by these desirable properties, this paper adopts random Fourier features to model the nonlinear transformation. Because this approach allows a more flexible representation of the nonlinear sub-block, it significantly enhances both the scalability and representational capacity of the overall system. The main contributions of this paper are outlined as follows:
- (1) A robust Hammerstein adaptive filter based on the KMPE cost function is proposed, providing enhanced resistance to non-Gaussian noise.
- (2) A random Fourier feature-based nonlinear modeling scheme is integrated into the proposed HAF structure, enabling efficient and flexible representation of nonlinearities.
- (3) A theoretical analysis of the steady-state excess mean square error is provided to reveal the steady-state behavior of the proposed method.
- (4) Numerical experiments are conducted to validate the effectiveness and robustness of the proposed method in nonlinear system identification tasks.
The remainder of this paper is organized as follows. Section 2 presents the framework of the HAF. In Section 3, we derive a robust HAF algorithm based on the KMPE criterion. Section 4 focuses on analyzing the steady-state performance of the algorithm. Section 5 provides several experimental evaluations that validate both the theoretical analysis and the desired performance of the proposed algorithm. Finally, Section 6 concludes this work.
2. HAF Structure
The HAF is a block-based filter [4], whose block diagram is shown in Figure 1. It is clear that the HAF consists of a nonlinear sub-block and a linear sub-block. An input signal $x(n)$ is first sent to the nonlinear sub-block, obtaining the following intermediate output:

$$
s(n) = \mathbf{w}^{\top}\boldsymbol{\phi}(x(n)), \tag{1}
$$

where $\boldsymbol{\phi}(x(n))$ is the new representation of $x(n)$ in a high-dimensional feature space, and where $\mathbf{w} = [w_{1}, w_{2}, \ldots, w_{L}]^{\top}$ is a vector that stores the corresponding weights. Many methods can be used to construct $\boldsymbol{\phi}(x(n))$. For example, if the polynomial method is adopted, $\boldsymbol{\phi}(x(n))$ can be constructed by

$$
\boldsymbol{\phi}(x(n)) = \left[x(n), x^{2}(n), \ldots, x^{L}(n)\right]^{\top}, \tag{2}
$$

where L is the dimension of the feature space. However, due to the inherent limitations of polynomials in approximating unknown nonlinear functions, methods that utilize polynomial features may yield unexpected performance when applied to identify Hammerstein systems without any prior knowledge of the nonlinear sub-block. Consequently, some advanced techniques have been employed to construct $\boldsymbol{\phi}(x(n))$. Among these methods, the RFF method is a newly introduced technique that has exhibited remarkable competitiveness [11]. Let $\{\omega_{1}, \omega_{2}, \ldots, \omega_{L}\}$ denote a sequence that is randomly generated according to a Gaussian distribution with zero mean and variance $\sigma_{\omega}^{2}$. Additionally, let $\{b_{1}, b_{2}, \ldots, b_{L}\}$ denote another sequence that is randomly generated following a uniform distribution over the interval $[0, 2\pi]$. Then, $\boldsymbol{\phi}(x(n))$ obtained through the RFF method can be expressed as follows:

$$
\boldsymbol{\phi}(x(n)) = \left[\cos(\omega_{1}x(n) + b_{1}), \ldots, \cos(\omega_{L}x(n) + b_{L})\right]^{\top}, \tag{3}
$$

where $\cos(\cdot)$ denotes the cosine function. In the rest of the paper, the RFF method will be the default option to construct $\boldsymbol{\phi}(x(n))$ due to its simplicity and excellent competitiveness in comparison with other options.
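To make this construction concrete, the following minimal Python sketch draws the random parameters and builds the feature map of (3); the function and variable names are illustrative, and the default values of L and $\sigma_{\omega}$ are placeholders rather than values prescribed here:

```python
import numpy as np

def make_rff_map(L=50, sigma_omega=1.0, seed=None):
    """Draw the random parameters of an RFF feature map, as in Equation (3).

    L           : dimension of the feature space
    sigma_omega : standard deviation of the Gaussian generating the frequencies
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, sigma_omega, size=L)  # omega_i ~ N(0, sigma_omega^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=L)     # b_i ~ U[0, 2*pi]

    def phi(x):
        # phi(x) = [cos(omega_1 x + b_1), ..., cos(omega_L x + b_L)]^T
        return np.cos(omega * x + b)

    return phi

phi = make_rff_map(L=50, sigma_omega=1.0, seed=0)
s_features = phi(0.3)  # feature vector of one scalar input sample
```

Note that the $\omega_{i}$ and $b_{i}$ are drawn once at initialization and then kept fixed; only the weight vectors are adapted during filtering.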
Once the vector $\boldsymbol{\phi}(x(n))$ is constructed, the corresponding intermediate output $s(n)$ can be obtained using (1). This intermediate output is then forwarded to the linear sub-block to yield the final system output $y(n)$. In particular, if we use $\mathbf{h} = [h_{1}, h_{2}, \ldots, h_{M}]^{\top}$ to denote the weight vector of the linear sub-block, the final $y(n)$ can be obtained with the following formula:

$$
y(n) = \mathbf{h}^{\top}\mathbf{s}(n), \tag{4}
$$

where $\mathbf{s}(n) = [s(n), s(n-1), \ldots, s(n-M+1)]^{\top}$ and $M$ is the length of the linear sub-block.
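As a minimal sketch of the complete forward pass defined by (1) and (4), continuing the previous snippet (the names haf_forward and x_window, and the weight values, are illustrative):

```python
import numpy as np

def haf_forward(x_window, w, h, phi):
    """One forward pass of the HAF, following Equations (1) and (4).

    x_window : the M most recent inputs [x(n), x(n-1), ..., x(n-M+1)]
    w        : nonlinear sub-block weight vector (length L)
    h        : linear sub-block weight vector (length M)
    phi      : RFF feature map from the previous sketch
    """
    # Nonlinear sub-block: s(n-k) = w^T phi(x(n-k)) for each lag k
    s = np.array([w @ phi(x) for x in x_window])
    # Linear sub-block: y(n) = h^T [s(n), ..., s(n-M+1)]
    return h @ s

M, L = 4, 50
w = np.full(L, 1.0 / L)                 # illustrative initial weights
h = np.full(M, 1.0 / M)
x_window = np.array([0.3, 0.1, -0.2, 0.5])
y = haf_forward(x_window, w, h, phi)    # scalar output y(n)
```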
It can be observed from (1) and (4) that both $\mathbf{w}$ and $\mathbf{h}$ are the parameters that require adjustment in the developed model. To effectively learn these parameters from the noisy observation $d(n)$, a well-designed cost function is crucial. In previous studies [4], the instantaneous MSE has typically been employed as a default choice, leading to the following cost function:

$$
J(n) = e^{2}(n), \tag{5}
$$

where $e(n) = d(n) - y(n)$ is the estimated error of the nth sample. Although this cost function is effective when the observation noise sequence is drawn from a Gaussian distribution, it degrades the performance of the designed HAF when the observation noise sequence contains outliers or other more complex non-Gaussian noises. To enhance the robustness of the HAF, we propose a robust version of the HAF with a KMPE-based cost function.
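Section 3 derives the actual update rules; for orientation, a per-sample KMPE loss can be sketched as below, assuming the standard definition from [12] with Gaussian kernel bandwidth $\sigma$ and power parameter p:

```python
import numpy as np

def kmpe_loss(e, sigma=1.0, p=2.0):
    """Per-sample KMPE loss, assuming the definition in [12]:
    loss(e) = (2 * (1 - exp(-e^2 / (2 * sigma^2)))) ** (p / 2).

    As |e| grows, the loss saturates toward 2 ** (p / 2), so outliers
    receive a bounded penalty, unlike the unbounded MSE loss e^2.
    """
    kappa = np.exp(-np.square(e) / (2.0 * sigma**2))  # Gaussian kernel value
    return (2.0 * (1.0 - kappa)) ** (p / 2.0)
```

This boundedness is the source of the robustness discussed above: arbitrarily large, outlier-induced errors contribute almost equally to the cost, while small errors are still penalized in a graded way.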
5. Simulation Results
This section presents simulations to validate the theoretical analysis and evaluate the performance of the proposed HAF–RFF–KMPE method.
5.1. Verification of Theoretical Results
To empirically validate the theoretical results of Section 4, a dataset of 300,000 samples was synthesized from a Hammerstein system of the form (1) and (4), where the nonlinear transformation was constructed through the RFF method, $\{\omega_{i}\}$ denotes a sequence that is randomly generated according to a Gaussian distribution with zero mean and variance $\sigma_{\omega}^{2}$, and $\{b_{i}\}$ denotes another sequence that is randomly generated following a uniform distribution over the interval $[0, 2\pi]$. The coefficient vectors of the two sub-blocks remain fixed throughout, while the system inputs $x(n)$ are sampled uniformly. Similar to [11,28,29], we further corrupted the outputs of the data pairs using a standard non-Gaussian noise model, described as
$$
v(n) = (1 - c(n))\,A(n) + c(n)\,B(n), \tag{46}
$$

where $A(n)$ denotes the normal noise, $B(n)$ represents the outliers, and $c(n)$ is a binary variable satisfying $P(c(n)=1) = p_r$ and $P(c(n)=0) = 1 - p_r$. Within this noise model, the inner noise $A(n)$ is set to be drawn from a uniform distribution, and the outlier component $B(n)$ is set to be drawn from a Gaussian distribution with a much larger variance. Meanwhile, the outlier probability $p_r$ is held fixed throughout this experiment.
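For illustration, the mixture noise in (46) could be generated as in the sketch below; the uniform interval, outlier variance, and occurrence probability are placeholder values, since the exact settings are experiment-specific:

```python
import numpy as np

def mixture_noise(n_samples, a_low=-0.1, a_high=0.1,
                  outlier_std=np.sqrt(10.0), p_r=0.05, seed=None):
    """Impulsive mixture noise of Equation (46):
    v(n) = (1 - c(n)) * A(n) + c(n) * B(n).
    All numeric defaults are illustrative placeholders.
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(a_low, a_high, n_samples)    # inner noise A(n)
    B = rng.normal(0.0, outlier_std, n_samples)  # outlier component B(n)
    c = rng.random(n_samples) < p_r              # c(n) = 1 with probability p_r
    return np.where(c, B, A)

v = mixture_noise(300_000, seed=1)  # noise sequence added to the clean outputs
```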
Figure 2 illustrates the comparison between the theoretical and simulated steady-state EMSE of the proposed HAF–RFF–KMPE algorithm, where the theoretical values are derived based on Theorem 1. Similarly, Figure 3 presents the corresponding comparison based on Theorem 2. To estimate the steady-state EMSE from the simulations, the values were computed by averaging the EMSE over the final 30,000 iterations of the EMSE learning curves. Furthermore, to reduce the impact of randomness in the simulation, each result was averaged over 50 independent runs. As shown in Figure 2 and Figure 3, the simulated steady-state EMSE values closely matched the theoretical predictions in both scenarios when the steady-state EMSE was relatively small, which supports the validity of the proposed theoretical analysis. However, it should be noted that as the steady-state EMSE became larger (see Figure 2a), the theoretical values became less accurate. For the underlying reason behind these slight differences, refer to Remark 2.
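Under the assumption that an EMSE learning curve is available as a NumPy array, the tail-averaging step described above amounts to a one-liner (the function name is illustrative; the 30,000-iteration window follows the text):

```python
import numpy as np

def steady_state_emse(emse_curve, tail=30_000):
    """Estimate the steady-state EMSE by averaging the final `tail`
    iterations of a simulated EMSE learning curve."""
    return float(np.mean(emse_curve[-tail:]))
```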
5.2. Performance Evaluation Under Different Nonlinear Sub-Block Settings
To further test the performance of the proposed HAF–RFF–KMPE, a general Hammerstein system is considered, where the input $x(n)$ of the system is still set to be drawn from a uniform distribution, $\mathbf{h}^{*}$ is the weight vector of the linear sub-block, and f denotes a general nonlinear function. In the following, four cases of f are considered.
Following (47)–(52), we generate four groups of input–output data pairs, where the only difference between them is the nonlinear function used. Each set of data contains 50,000 pairs of training samples and 100 pairs of testing samples. The generated training data are mixed with a noise sequence on the corresponding outputs and used to train the algorithms, while the generated test samples are used directly to evaluate performance. The noise sequence is generated according to the procedure in Section 5.1; for the current experiment, we maintained this procedure while modifying the outlier occurrence rate. The evaluated performance was quantified through the MSE, expressed as

$$
\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(d_{j} - \hat{y}_{j}\right)^{2},
$$

where $d_{j}$ and $\hat{y}_{j}$ denote the actual and estimated outputs of the j-th sample, respectively, and where N is the number of samples.
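This testing metric has a direct implementation; a minimal sketch, assuming the actual and estimated outputs are stored as arrays:

```python
import numpy as np

def testing_mse(d, y_hat):
    """Testing MSE over N samples: mean of (d_j - y_hat_j)^2."""
    d, y_hat = np.asarray(d), np.asarray(y_hat)
    return float(np.mean((d - y_hat) ** 2))
```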
Figure 4 shows the averaged testing MSE curves of the HAF–RFF–KMPE under different nonlinear sub-block settings. For comparison, the averaged testing MSE curves of several existing algorithms designed with an HAF structure are also incorporated into this figure. These methods include an HAF designed with the MSE criterion and a polynomial function (the HAF–Polynomial–MSE) [4], an HAF designed with the MCC criterion and a polynomial function (the HAF–Polynomial–MCC) [9], an HAF designed with the LMP criterion and a spline function (the HAF–Spline–LMP) [8], and an HAF designed with the KRSL criterion and an RFF transformation (the HAF–RFF–KRSL) [11]. To ensure a fair comparison, the parameters of the different algorithms were selected so that all the algorithms achieved their best performance with almost the same initial convergence speed, similar to [30]. Within the HAF–Spline–LMP framework, the Catmull–Rom spline basis [8] serves as the default choice for spline function design, and the spline configuration employs 23 control points spaced at 0.2 intervals. For both the HAF–RFF–KRSL and the HAF–RFF–KMPE implementations, the RFF space dimensionality was fixed at 50, with the parameter $\sigma_{\omega}$ fixed at 0.1. The remaining parameters across the algorithms were empirically determined to balance the convergence rate and steady-state performance; the specific values are detailed in Table 1. Herein, $\mu_{w}$ denotes the step size for nonlinear weight adaptation, $\mu_{h}$ represents the step size for linear weight updates, $P$ specifies the polynomial order in the HAF–Polynomial–MSE and the HAF–Polynomial–MCC, and h defines the kernel bandwidth for the HAF–Polynomial–MCC. Meanwhile, $\lambda$ and $\sigma$ denote the cost function parameters used in the HAF–RFF–KRSL, and p and $\sigma$ denote the cost function parameters used in the HAF–RFF–KMPE.
It can be observed from Figure 4 that the proposed HAF–RFF–KMPE not only had a faster convergence speed at the initial stage but also obtained smaller testing MSE values in the steady state compared with the other four algorithms. These results indicate that, for Hammerstein systems with unknown nonlinearities, the HAF–RFF–KMPE presents a superior alternative to the HAF–Polynomial–MSE, the HAF–Polynomial–MCC, the HAF–Spline–LMP, and the HAF–RFF–KRSL.
5.3. Performance Evaluation Under Different Non-Gaussian Noise Environments
For this section, we tested the performance of the proposed HAF–RFF–KMPE under different non-Gaussian noise environments. The noise model used was set the same as (46), in which the outlier probability $p_r$ was held fixed and $B(n)$ was set to be drawn from a Gaussian distribution with zero mean and a variance of 10. However, four cases were considered to model the inner noise $A(n)$. The details are as follows:
Case 1: $A(n)$ was set to be drawn from a Gaussian distribution with zero mean and a fixed variance.
Case 2: $A(n)$ was set to be drawn from a binary distribution.
Case 3: $A(n)$ was set to be drawn from a uniform distribution.
Case 4: $A(n)$ was set to be drawn from a Laplace distribution with a location parameter of 0 and a scale parameter of 0.1.
Figure 5 shows the averaged testing MSE curves of the HAF–RFF–KMPE in different non-Gaussian noise environments, and the related parameters are summarized in Table 2. Herein, the nonlinear sub-block of the Hammerstein system was kept fixed, and the corresponding linear sub-block was set the same as (47). As can be observed from Figure 5, the proposed HAF–RFF–KMPE, while exhibiting almost the same initial convergence speed as its competitors, always obtained the smallest testing MSE at the final iteration, which means that it is capable of achieving higher filtering accuracy under different non-Gaussian noise environments. Furthermore, although the HAF–RFF–KRSL also adopted the RFF to model the nonlinear sub-block of Hammerstein systems, it tended to obtain larger testing MSEs than the HAF–RFF–KMPE. This indicates that the adopted KMPE cost function is more effective than the KRSL for identifying Hammerstein systems contaminated by non-Gaussian noises.
5.4. Parameter Sensitivity
There are four key parameters—i.e., L, $\sigma_{\omega}$, p, and $\sigma$—that should be appropriately selected to obtain the desired performance of the proposed method. For this section, we investigated the influence of these parameters on the learning performance of the HAF–RFF–KMPE.
First, we examined how the parameter L affected the learning performance of the HAF–RFF–KMPE. Specifically, we varied the value of L from 10 to 100 in increments of 10, while keeping all the other parameters consistent with those used in Table 1. Figure 6a presents the steady-state MSEs of the HAF–RFF–KMPE for different values of L. The steady-state MSE was calculated by averaging over the last 500 iterations of the MSE curves. Additionally, the nonlinear sub-block of the Hammerstein system was kept fixed, and we maintained a noise model identical to that used in Case 1. As illustrated in Figure 6a, it is evident that the steady-state MSE of the HAF–RFF–KMPE decreased as L increased. However, it should be noted that a larger L implies a higher computational complexity. Therefore, L should be selected to provide a balance between computational efficiency and filtering accuracy.
Similarly, we investigated how varying the parameter $\sigma_{\omega}$ impacted the learning performance of the HAF–RFF–KMPE. Figure 6b shows the steady-state MSEs of the HAF–RFF–KMPE under different selections of $\sigma_{\omega}$. From this figure, it can be observed that setting $\sigma_{\omega}$ either too small or too large leads to degraded learning performance for the HAF–RFF–KMPE. Consequently, it is crucial to select an appropriate value for $\sigma_{\omega}$ before implementation.
Furthermore, we examined the impact of the parameter p on the learning performance of the HAF–RFF–KMPE, with the results presented in Figure 6c. As illustrated in Figure 6c, a very small value of p can lead to a deterioration in the learning performance of the HAF–RFF–KMPE. Conversely, selecting values of p ranging from 1 to 4 yields relatively superior filtering performance.
Finally, we investigated the effect of the parameter $\sigma$ on the learning performance of the HAF–RFF–KMPE, as shown in Figure 6d. It is evident from this figure that the influence of $\sigma$ on the learning performance of the HAF–RFF–KMPE is similar to that of $\sigma_{\omega}$. Therefore, $\sigma$ should also be selected appropriately before use.
5.5. Ablation Experiments
Both RFF and KMPE were employed in the design of the proposed method. For this section, we performed a comparative analysis of the performance improvements achieved through the independent application of RFF, the independent application of KMPE, and the synergistic integration of both methods across the aforementioned four types of noise and nonlinear functions.
First, we fixed the RFF mapping function and investigated the influence of different error criteria on the filtering performance. Specifically, the proposed KMPE criterion was compared with classical alternatives, including MSE, MCC, and KRSL. As shown in Table 3, across the four distinct noise types, the KMPE-based method consistently achieved lower steady-state MSE values than the other error criteria. This indicates that the KMPE criterion is more effective in characterizing error distributions under non-Gaussian conditions, thereby suppressing outliers more efficiently, which ultimately leads to enhanced filtering accuracy.
Furthermore, we fixed the KMPE criterion and examined the role of different mapping functions. The RFF mapping function employed in the proposed method was compared with polynomial and spline mapping functions, and the results are summarized in Table 4. Under the four different nonlinear function scenarios, the RFF-based approach consistently outperformed the alternatives, yielding the lowest steady-state MSE values. These results demonstrate that the adopted RFF mapping function possesses a stronger representational capability for capturing nonlinear structures in the input data.
As a brief summary, the ablation experiments confirm the effectiveness of the proposed HAF–RFF–KMPE method from two complementary perspectives: (1) the KMPE criterion provides superior adaptability to non-Gaussian environments compared to classical error measures, and (2) the RFF mapping function offers better expressiveness and robustness than alternative mapping strategies. The combination of these two components accounts for the significant improvement in filtering accuracy and robustness observed in the proposed method.