Identification of Non-Stationary Communication Channels with a Sparseness Property

Marcin Ciołek

doi:10.3390/app152413043

Department of Signals and Systems, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland

Appl. Sci.2025, 15(24), 13043;https://doi.org/10.3390/app152413043

Version Notes

Order Reprints

Abstract

The problem of identifying non-stationary communication channels with a sparseness property using the local basis function approach is considered. This sparseness refers to scenarios where a few impulse response coefficients significantly differ from zero. The sparsity-aware estimation algorithms are usually obtained using

ℓ^{1}

regularization. Unfortunately, the minimization problem lacks a sometimes closed-form solution; one must rely on numerical search, which is a serious drawback. We propose the fast regularized local basis functions (fRLBF) algorithm based on appropriately reweighted

ℓ^{2}

regularizers, which can be regarded as a first-order approximation of the

ℓ^{1}

approach. The proposed solution incorporates two regularizers, enhancing sparseness in both the time/lag and frequency domains. The choice of regularization gains is an important part of regularized estimation. To address this, three approaches are proposed and compared to solve this problem: empirical Bayes, decentralized, and cross-validation approaches. The performance of the proposed algorithm is demonstrated in a numerical experiment simulating underwater acoustics communication scenarios. It is shown that the new approach can outperform the classical one and is computationally attractive.

Keywords:

adaptive filter; fLBF estimator; preestimate; regularization; sparse channel; time-varying system; underwater acoustics

1. Introduction

In modern wireless systems, distortion in transmitted signals is primarily caused by multi-path effects, where the signal reaches the receiver along different paths with varying time delays. When these multi-path effects are dominated by a few strong reflectors (scatterers), the impulse response coefficients significantly differ from zero [1,2,3]. The time variation of these coefficients is due to the movement of the transmitter/receiver and/or changes in the pattern of scatterers. Depending on how fast the channel coefficients vary with time, different estimation approaches may be required. For slowly varying coefficients, the time-localized versions of the least squares or maximum likelihood approach may be used [4,5,6]. Fast parameter changes can be tracked by algorithms relying on an explicit model of parameter variation, either stochastic [7,8,9] or deterministic [10,11,12,13,14]. Further improvements in parameter estimation accuracy are achieved through regularization [15]. While regularization is commonly used in system identification, most research in the field focuses on time-invariant systems [16,17]. The papers [18,19] started a new trend in the identification of time-varying systems. In both studies, estimation is conducted using the local basis functions (LBF) or fast local basis functions (fLBF) approach for time-varying finite impulse response (FIR) systems, with

ℓ^{2}

regularization applied to penalize excess values of the squared norm of hyperparameters [18] or the squared norm of trajectory parameters [19]. While useful for general-purpose identification, such regularization does not effectively address the specific sparsity property of mobile telecommunication channels.

Sparsity-aware estimation algorithms typically utilize

ℓ^{1}

regularization and belong to the LASSO (Least Absolute Shrinkage and Selection Operator) family [20]. Mainstream sparse identification techniques, such as sparse Bayesian learning [21], the Iterative Shrinkage-Thresholding Algorithm (ISTA) [22], its accelerated variant Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [23], Orthogonal Matching Pursuit (OMP) [24], and its extension Compressive Sampling Matching Pursuit (CoSaMP) [25], perform sparse channel estimation in an iterative manner. The ISTA algorithm and its fast version FISTA are designed to handle only a single

ℓ^{1}

regularization term; when two or more regularization terms are present, both algorithms must be modified to approximate the solution. A common extension to ISTA/FISTA in this context is the use of the Alternating Direction Method of Multipliers (ADMM) algorithm [26], which can efficiently handle multiple regularization terms. On the other hand, OMP and CoSaMP are greedy algorithms that iteratively construct a sparse solution by selecting the most relevant components at each step. For all of these algorithms, the processing time increases with the number of iterations, making the choice of appropriate stopping criteria crucial for practical applications. Moreover, since the underlying minimization problem does not admit a closed-form solution, these methods rely on numerical optimization, which can be computationally demanding—especially when channel parameter estimation, adaptive hyperparameter selection, and regularization gain optimization must all be performed within a sliding window fashion.

In this paper, the fast regularized local basis functions (fRLBF) algorithm is proposed to address these challenges. In this approach,

ℓ^{1}

regularization is replaced with appropriately reweighted

ℓ^{2}

regularizers. This solution differs from the one described in [27] as it incorporates two regularizers. The first regularizer penalizes a large number of basis functions used to approximate the time evolution of channel parameters, while the second one penalizes a large number of non-zero components of the parameter vector. Finally, three approaches to adaptively choose regularization gains are proposed: the empirical Bayes approach, the decentralized approach, and the cross-validation approach. The core advantages of the proposed multi-stage solution are its computational efficiency in the sliding window approach and its flexibility, which allows each system parameter trajectory to be identified independently. Furthermore, regularization can be performed either jointly or separately for all parameters. Thanks to its multi-stage structure and adaptive hyperparameter selection, the proposed method is well-suited for sparse problems, achieving very high accuracy in time-varying channel estimation.

2. Material and Methods

Many non-stationary communication channels, both terrestrial and underwater, can be well approximated by a time-varying FIR model of the form [1,2]

\begin{matrix} y (t) & = \sum_{i = 1}^{n} θ_{i}^{*} (t) u (t - i + 1) + e (t) \\ = θ^{H} (t) φ (t) + e (t) \end{matrix}

(1)

where

t = \dots, - 1, 0, 1, \dots

denotes discrete (normalized) time,

y (t)

is the complex-valued output signal,

φ (t) = {[u (t), \dots, u (t - n + 1)]}^{T}

is the regression vector composed of past samples of the complex-valued input signal

u (t)

,

θ (t) = {[θ_{1} (t), \dots, θ_{n} (t)]}^{T}

is the vector of time-varying system coefficients, and

{e (t)}

denotes measurement noise. The symbol ∗ stands for complex conjugate, and H denotes the Hermitian (complex conjugate) transpose.

The following assumptions are made:

( $A$ 1): ${u (t)}$ is a zero-mean circular white noise with variance $σ_{u}^{2}$ .
( $A$ 2): ${e (t)}$ , independent of ${u (t)}$ , is a zero-mean circular white noise with variance $σ_{e}^{2}$ .
( $A$ 3): ${θ (t)}$ is a sequence independent of ${u (t)}$ and ${e (t)}$ .

Circular white noise is a sequence of independent and identically distributed (i.i.d.) random variables with independent real and imaginary parts. The FIR model structure is a direct consequence of multi-path signal propagation, while the time-varying nature of

θ (t)

stems from Doppler shifts caused by transmitter/receiver movement and/or changes in the surrounding environment. These assumptions (

A

1)–(

A

3) are typical in wireless communication systems.

The self-interference channel model of a full-duplex underwater acoustic (UWA) system is an example of non-stationary communication channel with sparseness. In these systems, the transmit and receive antennas operate concurrently within the same frequency bandwidth [28], nearly doubling the limited capacity of the acoustic link. The goal is to isolate the signal

e (t)

from

y (t)

in (1), which includes both the far-end signal and measurement noise, by removing the self-interference component

θ^{H} (t) φ (t)

. This self-interference is caused by reflections from surrounding objects like the sea surface, sea bottom, fish, and vessels. Since the near-end signal

u (t)

, generated by the transmit antenna, is always known, the problem reduces to tracking the impulse response coefficients of the self-interference channel.

2.1. Local Basis Functions Estimators

The local basis function (LBF) identification technique assumes that within the local analysis interval

T (t) = [t - k, t + k]

of length

K = 2 k + 1

, centered at t, system parameters can be expressed as linear combinations of a certain number of linearly independent real or complex-valued time functions

f_{1} (j), \dots, f_{m} (j), j \in I_{k} = [- k, k]

, referred to as basis functions, namely

\begin{matrix} θ_{i} (t + j) = \sum_{l = 1}^{m} f_{l}^{*} (j) a_{i l} (t) = f^{H} (j) α_{i} (t) \\ j \in I_{k}, i = 1, \dots, n, α_{i} (t) = {[a_{i 1} (t), \dots, a_{i m} (t)]}^{T} \end{matrix}

(2)

where

f (j) = {[f_{1} (j), \dots, f_{m} (j)]}^{T}

.

Following the local estimation paradigm, parameter trajectories based on the hypermodel (2) are estimated independently for each position of the analysis interval

T (t)

, using a sliding window approach. Thus, while system hyperparameters

a_{i l}

are assumed constant within each interval

[t - k, t + k]

, their values are allowed to change along with the position of the analysis window, and are therefore expressed as functions of t.

The hypermodel (2) can be represented more compactly

\begin{matrix} \begin{matrix} θ (t + j) = F (j) α (t), j \in I_{k} \end{matrix} \end{matrix}

(3)

where

α (t) = {[α_{1}^{T}, \dots, α_{n}^{T}]}^{T}

,

F (j) = I_{n} \otimes f^{H} (j)

, and ⊗ denote the Kronecker product of the corresponding vectors/matrices. Using (3), the system Equation (1) can be reformulated as

\begin{matrix} y (t + j) = α^{H} (t) ψ (t, j) + e (t + j), j \in I_{k} \end{matrix}

(4)

where

ψ (t, j) = φ (t + j) \otimes f (j)

denotes the generalized regression vector.

The most common choices for basis functions before normalization, which enable recursive computability, include powers of time (Taylor series approximation), cosine functions (Fourier series approximations), and the complex exponential basis set [1]. Figure 1 shows the first three basis functions before normalization in the range

[- k, k]

: on the left, powers of time; on the right, cosine functions. In this paper, we will adopt real-valued powers of time as basis functions (for generality, we will retain the complex conjugate transpose). For convenience, and without loss of generality, we will assume that the adopted basis functions are orthonormal, namely

\begin{matrix} \sum_{j = - k}^{k} f (j) f^{H} (j) = I_{m}, \sum_{j = - k}^{k} F^{H} (j) F (j) = I_{m n} \end{matrix}

(5)

where

I_{p}

denotes the

p \times p

identity matrix.

Figure 1. The first three basis functions before normalization in the range

j \in [- k, k]

: on the left, powers of time; on the right, cosine functions.

The LBF estimator has the form [14]

\begin{matrix} {\hat{θ}}^{LBF} (t) = F_{0} {\hat{α}}^{LBF} (t) \end{matrix}

(6)

\begin{matrix} {\hat{α}}^{LBF} (t) = \underset{α}{arg min} \sum_{j = - k}^{k} {| y (t + j) - α^{H} ψ (t, j) |}^{2} \\ = {[\sum_{j = - k}^{k} ψ (t, j) ψ {(t, j)}^{H}]}^{- 1} [\sum_{j = - k}^{k} ψ (t, j) y^{*} (t + j)] \end{matrix}

(7)

where

F_{0} = F (0) = I_{n} \otimes f_{0}^{H}, f_{0} = f (0)

.

An essential property of LBF is that it performs non-causal identification, reducing estimation delay but introducing decision delay. The number of estimated hyperparameters

n m

determines the minimum window length K, which must be substantially greater than

n m

to prevent numerical issues. Unfortunately, the LBF technique requires inverting an

n m \times n m

matrix at every instant t, posing a significant computational drawback.

To help readers understand the concept of the LBF approach, it is illustrated here using results obtained for a non-stationary one-tap FIR system described by

\begin{matrix} y (t) = θ_{1} (t) u (t) + e (t), \end{matrix}

(8)

where the input

u (t)

is a zero-mean, unit-variance, white binary sequence (

\pm 1

). The coefficient

θ_{1} (t)

is modeled as a sinusoidal linear chirp, and the variance of the measurement white Gaussian noise

e (t)

is set to ensure an average signal-to-noise ratio of

SNR = 30 dB

. Unlike classical parameter estimation methods, the LBF approach does not require restrictive assumptions such as local stationarity or slow parameter variations. Instead, the local evolution of the parameter is directly represented as a linear combination of basis functions. Figure 2, Figure 3 and Figure 4 demonstrate the estimation capabilities of the LBF approach and emphasize the importance of proper selection of the design parameters k and m.

Figure 2. From top to bottom: the true parameter trajectory (red) with the LBF parameter estimate superimposed (black) for

k = 200

and

m = 5

; and, at the bottom, the corresponding estimation error along with the mean square estimation error (MSE).

Figure 3. This figure shows

3 \times 3

grid of plots of the estimation error

θ_{1} (t) - {\hat{θ}}_{1} (t)

for different LBF settings, obtained for all combinations of

k \in [50, 100, 200]

and

m \in [1, 3, 5]

. In each plot, the parameter mean square estimation error (MSE) is displayed in the top-left corner.

Figure 4. The parameter estimation error

θ_{1} (t) - {\hat{θ}}_{1} (t)

for different SNR levels

[10, 20, 30]

dB for the setting

k = 50

and

m = 5

. In each plot, the parameter mean square estimation error (MSE) is shown in the top-left corner.

Figure 2 presents, from top to bottom, the true parameter trajectory (red) with the LBF parameter estimate superimposed (black) for

k = 200

and

m = 5

, followed by the estimation error,

θ_{1} (t) - {\hat{θ}}_{1} (t)

. The LBF estimation was initialized before

t = 0

and continued beyond

t =

10,000. The chosen values of

k = 200

and

m = 5

yield a small parameter estimation error as long as the parameter variations are moderate. However, when the parameter begins to vary faster, a better choice of k and m is necessary to maintain satisfactory tracking performance.

Figure 3 presents a

3 \times 3

grid of plots showing the estimation error

θ_{1} (t) - {\hat{θ}}_{1} (t)

for different LBF settings, obtained for all combinations of

k \in [50, 100, 200]

and

m \in [1, 3, 5]

. The selection of these design parameters depends on the rate at which the system parameters vary. For constant parameters, one would typically choose a relatively large window size K and set

m = 1

. Note that for

m = 1

, the basis function approach reduces to the classical least squares method. For slowly time-varying parameters, a smaller value of k and a larger number of basis functions m becomes beneficial. In the case of faster changing parameters, m should be increased further, and an appropriate window size K must be selected to achieve a favorable mean squared estimation error, shown in the top left corner of each plot. The improvement over the classical least squares method (case

m = 1

) is substantial when additional basis functions are used (case

m > 1

)—compare, for example, the plots for

k = 50, m = 1

, and

k = 50, m = 5

.

Figure 4 presents how different SNR levels

[10, 20, 30]

dB impact the parameter estimation error. In this simulated experiment, for

k = 50

and

m = 5

, the MSE increased by approximately a factor of 10 each time the SNR dropped by 10 dB.

2.2. fLBF Estimators

As shown in [29,30], under assumptions (

A

1)–(

A

3), the LBF estimates

{\hat{α}}^{LBF} (t)

and

{\hat{θ}}^{LBF} (t)

can be approximated by the following computationally fast formulas:

\begin{matrix} {\hat{α}}^{fLBF} (t) & = \underset{α}{arg min} \sum_{j = - k}^{k} | | \tilde{θ} (t + j) - F (j) α {| |}_{2}^{2} \end{matrix}

(9)

\begin{matrix} {\hat{θ}}^{fLBF} (t) & = F_{0} {\hat{α}}^{fLBF} (t) = \sum_{j = - k}^{k} h (j) \tilde{θ} (t + j) \end{matrix}

(10)

where

\begin{matrix} h (j) = f_{0}^{H} f (j), j \in I_{k} \end{matrix}

(11)

denotes the impulse response of the FIR filter associated with the LBF estimator, and

{\tilde{θ} (t)}

denotes the vector of pre-estimated trajectories. These pre-estimates can be obtained through “inverse filtering” of the estimates yielded by the exponentially weighted least squares (EWLS) algorithm

\begin{matrix} {\hat{θ}}^{EWLS} (t) = \underset{θ}{arg min} \sum_{j = 0}^{t - 1} λ^{j} {| y (t - j) - θ^{H} φ (t - j) |}^{2} \end{matrix}

(12)

where

λ, 0 < λ < 1

, denotes the forgetting constant. The short-memory EWLS estimates can be computed using the well-known recursive algorithm [4]:

\begin{matrix} ε (t) & = y (t) - {[{\hat{θ}}^{EWLS} (t - 1)]}^{H} φ (t) \end{matrix}

(13)

\begin{matrix} k (t) & = \frac{R (t - 1) φ (t)}{λ + φ^{H} (t) R (t - 1) φ (t)} \end{matrix}

(14)

\begin{matrix} {\hat{θ}}^{EWLS} (t) & = {\hat{θ}}^{EWLS} (t - 1) + k (t) ε^{*} (t) \end{matrix}

(15)

\begin{matrix} R (t) & = \frac{1}{λ} [I_{n} - k (t) φ^{H} (t)] R (t - 1) \end{matrix}

(16)

with initial conditions

{\hat{θ}}^{EWLS} (0) = 0

and

R (0) = c I_{n}

, where c denotes a large positive constant. Alternatively, to reduce the computational cost, one can use the iterative dichotomous coordinate descent (DCD) algorithm described in [31].

The inverse filtering formula has the form

\begin{matrix} \tilde{θ} (t) = L_{t} {\hat{θ}}^{EWLS} (t) - λ L_{t - 1} {\hat{θ}}^{EWLS} (t - 1) \end{matrix}

(17)

where

L_{t} = \sum_{j = 0}^{t - 1} λ^{j} = λ L_{t - 1} + 1, L_{0} = 1

, denotes the effective width of the exponential window. For large values of t, when the effective window width reaches its steady state value

L_{\infty} = 1 / (1 - λ)

, the Formula (17) can be replaced with

\begin{matrix} \tilde{θ} (t) = \frac{1}{1 - λ} [{\hat{θ}}^{EWLS} (t) - λ {\hat{θ}}^{EWLS} (t - 1)] . \end{matrix}

(18)

The recommended choice of the forgetting factor [29], which performs well in practice, is

\begin{matrix} λ = max {0.9, 1 - 2 / n} . \end{matrix}

(19)

For

λ = 1 - 2 / n

, the effective window length

L_{\infty}

is approximately equal to half the number of estimated coefficients, i.e.,

n / 2

. As

L_{\infty}

increases, the mean square deviation of

{\hat{θ}}^{EWLS} (t)

from

θ (t)

becomes increasingly dominated by the bias error, which arises primarily because the estimated parameter trajectory lags behind the true trajectory. Therefore, to achieve a favorable bias–variance trade-off,

L_{\infty}

should remain relatively small.

According to [29], under assumptions (

A

1)–(

A

3), the pre-estimate

\tilde{θ} (t)

is approximately unbiased, i.e., as follows:

\begin{matrix} \tilde{θ} (t) ≊ θ (t) + z (t), \end{matrix}

(20)

where

z (t)

denotes approximately zero-mean white noise with large covariance matrix

cov [z (t)] = σ_{z}^{2} I_{n}

. The unbiasedness property comes at the cost of significant variability. The fLBF estimate

{\hat{θ}}^{fLBF} (t)

can thus be interpreted as a denoised version of

\tilde{θ} (t)

obtained via the basis function approach.

The fLBF approach is illustrated using the same one-tap FIR example as the LBF method—see (8). Figure 5 presents the pre-estimates (A, black) obtained by inverse filtering of the EWLS estimates using Formula (18), with

λ = 0.9

at an SNR level of 10 dB. These pre-estimates were then denoised using the FIR filter impulse response defined in (11) for

k = 50

and

m = 5

, resulting in the final fLBF estimates shown in (B, black). In this simple case, the estimation error for the fLBF results shown in (C) closely matches that of the LBF approach.

Figure 5. From top to bottom: (A) the true parameter trajectory (red) with pre-estimates superimposed (black) at an SNR level of 10 dB; (B) the true parameter trajectory (red) with fLBF estimates superimposed (black) for

k = 50

and

m = 5

; (C) the estimation error,

θ_{1} (t) - {\hat{θ}}_{1} (t)

, with the mean square estimation error (MSE) indicated in the top-right corner.

2.3. Regularized fLBF Estimators

Denote by

{| | x | |}_{2} = \sqrt{\sum_{i = 1}^{n} {| x_{i} |}^{2}}

the

ℓ^{2}

norm of a complex-valued vector

x = {[x_{1}, \dots, x_{n}]}^{T}

, and by

{| | x | |}_{1} = \sum_{i = 1}^{n} | x_{i} |

its

ℓ^{1}

norm. Finally, let

{| | x | |}_{A} = \sqrt{x^{H} A x}

, where

A

denotes a

n \times n

positive definite Hermitian matrix.

The LASSO-type regularized fLBF estimator, defined as

\begin{matrix} {\hat{α}}^{LASSO} (t) & = \underset{α}{arg min} \{\sum_{j = - k}^{k} | | \tilde{θ} (t + j) - {F (j) α | |}_{2}^{2} + {η | | α | |}_{1} + μ | | F_{0} α {| |}_{1}\} \end{matrix}

(21)

\begin{matrix} {\hat{θ}}^{LASSO} (t) & = F_{0} {\hat{α}}^{LASSO} (t) \end{matrix}

(22)

where

η, μ > 0

denote regularization constants, which incorporates two regularizers. When both regularization constants are set to zero, the solution reduces to the standard fLBF approach. These constants allow for fine-tuning the degree of shrinkage applied to the estimates as they approach zero. Setting higher regularization constants increases shrinkage but does not eliminate parameters entirely, unlike LASSO, which can set coefficients to zero. The first regularizer,

{η | | α (t) | |}_{1}

, promotes sparseness in the frequency domain by penalizing the excessive number of basis functions used to approximate the time evolution of channel parameters

θ_{1} (t), \dots, θ_{n} (t)

. The second regularizer,

μ | | F_{0} {α | |}_{1} = {μ | | θ (t) | |}_{1}

, promotes sparseness in the time/lag domain by penalizing non-zero components of the vector

θ (t)

. With the inclusion of the second regularizer, schemes (21) and (22) serve as a group LASSO solution, enhancing model sparsity at both individual (hyperparameter) and group (parameter) component levels [32].

Due to the lack of a closed-form solution to the minimization problem in (21) and (22), numerical search is necessary, posing a drawback for sliding window estimation. Similarly, selecting optimal regularization gains

η

and

μ

presents a similar challenge. To mitigate these issues, we propose replacing the

ℓ^{1}

regularization terms in (21) with appropriately reweighted

ℓ^{2}

regularizers. Reweighting is a known optimization technique [33,34]. Note that the

ℓ^{1}

norm of the vector

x

can be expressed as

\begin{matrix} {| | x | |}_{1} = \sum_{i = 1}^{n} \frac{x_{i}^{*} x_{i}}{| x_{i} |} = x^{H} W x = {| | x | |}_{W}^{2} \end{matrix}

(23)

where

W = diag {| x_{1} |^{- 1}, \dots, | x_{n} |^{- 1}}

is the weight matrix. In order to apply the reweighting technique directly to the vector

α (t)

\begin{matrix} {| | α (t) | |}_{1} = \sum_{i = 1}^{n} \sum_{l = 1}^{m} \frac{a_{i l}^{*} (t) a_{i l} (t)}{| a_{i l} (t) |} = α^{H} (t) W α (t) = {| | α (t) | |}_{W}^{2} \end{matrix}

(24)

one would need to know the true trajectories of the system hyperparameters

a_{i l} (t)

, which are assumed to be unknown. Therefore, the weight matrix

W

is initialized based on the estimated values

{\hat{a}}_{i l}^{fLBF} (t)

obtained from the fLBF algorithm. Using the reweighting technique together with the fLBF approach, one arrives at the approximations

\begin{matrix} {| | α (t) | |}_{1} & {≅ | | α (t) | |}_{Δ (t)}^{2} \end{matrix}

(25)

\begin{matrix} Δ (t) & = bl diag {Δ_{1} (t), \dots, Δ_{n} (t)} \end{matrix}

(26)

\begin{matrix} Δ_{i} (t) & = diag {δ_{i 1} (t), \dots, δ_{i m} (t)} \end{matrix}

(27)

\begin{matrix} δ_{i l} (t) & = | {\hat{a}}_{i l}^{fLBF} {(t) |}^{- 1}, i = 1, \dots, n, l = 1, \dots, m \end{matrix}

(28)

and

\begin{matrix} {| | θ (t) | |}_{1} & = | | F_{0} {α (t) | |}_{1} {≅ | | α (t) | |}_{F_{0}^{H} Γ (t) F_{0}}^{2} \end{matrix}

(29)

\begin{matrix} Γ (t) & = diag {γ_{1} (t), \dots, γ_{n} (t)} \end{matrix}

(30)

\begin{matrix} γ_{i} (t) & = | {\hat{θ}}^{fLBF} {(t) |}^{- 1}, i = 1, \dots, n . \end{matrix}

(31)

This leads to the following approximate version of (21) and (22), further referred to as the fast regularized LBF (fRLBF) estimator

\begin{matrix} {\hat{α}}^{fRLBF} (t) & = \underset{α}{arg min} \{\sum_{j = - k}^{k} | | \tilde{θ} (t + j) - {F (j) α | |}_{2}^{2} + {| | α | |}_{Σ (t; η, μ)}^{2}\} \\ = {[I_{n m} + Σ (t; η, μ)]}^{- 1} {\hat{α}}^{fLBF} (t) \end{matrix}

(32)

\begin{matrix} {\hat{θ}}^{fRLBF} (t) & = F_{0} {\hat{α}}^{fRLBF} (t) \end{matrix}

(33)

where

\begin{matrix} \begin{matrix} Σ (t; η, μ) & = η Δ (t) + μ F_{0}^{H} Γ (t) F_{0} \\ = η Δ (t) + μ Γ (t) \otimes (f_{0} f_{0}^{H}) . \end{matrix} \end{matrix}

(34)

The last transition in (34) follows from the identity

\begin{matrix} (A \otimes B) (C \otimes D) = (A C) \otimes (B D) \end{matrix}

(35)

which holds true for Kronecker products, provided that all dimensions match.

Utilizing the fact that the matrix

Σ (t; η, μ)

is block diagonal

\begin{matrix} Σ (t; η, μ) & = bl diag {Σ_{1} (t; η, μ), \dots, Σ_{n} (t; η, μ)} \end{matrix}

(36)

\begin{matrix} Σ_{i} (t; η, μ) & = η Δ_{i} (t) + μ γ_{i} (t) f_{0} f_{0}^{H}, i = 1, \dots, n \end{matrix}

(37)

the relationship (32) and (33) can be rewritten in a simpler, decomposed form

\begin{matrix} {\hat{α}}_{i}^{fRLBF} (t) & = {[I_{m} + Σ_{i} (t; η, μ)]}^{- 1} {\hat{α}}_{i}^{fLBF} (t) \end{matrix}

(38)

\begin{matrix} {\hat{θ}}_{i}^{fRLBF} (t) & = f_{0}^{H} {\hat{α}}_{i}^{fRLBF} (t) \\ i & = 1, \dots, n \end{matrix}

(39)

where

\begin{matrix} {\hat{α}}_{i}^{fLBF} (t) = \sum_{j = - k}^{k} {\tilde{θ}}_{i} (t + j) f (j) . \end{matrix}

(40)

LASSO estimators discard insignificant components by shrinking them to zero. The proposed fRLBF scheme approximates this behavior. When the estimates

{\hat{a}}_{i l}^{fLBF} (t)

and

{\hat{θ}}_{i}^{fLBF} (t)

are very small in magnitude, the weights

δ_{i l} (t)

and

γ_{i} (t)

in (38) and (39) become very large, effectively shrinking the corresponding fLBF estimates towards zero.

Remark 1.

Communication channels typically exhibit a decaying power profile due to spreading and absorption loss, which can be modeled by assuming that

\begin{matrix} var [θ_{i} (t)] \propto ξ^{i - 1}, i = 1, \dots, n \end{matrix}

where

ξ, 0 < ξ < 1

, represents the decay rate of the exponential power envelope. Incorporating this prior knowledge into the estimation scheme is straightforward by setting

\begin{matrix} γ_{i} (t) = [ξ^{\frac{i - 1}{2}} | {\hat{θ}}_{i}^{fLBF} (t) {|]}^{- 1} \end{matrix}

(41)

This corresponds to replacing the second regularizer in (21) with the weighted

ℓ^{1}

norm

θ (t)

given by

\begin{matrix} \sum_{i = 1}^{n} \frac{| θ_{i} |}{ξ^{\frac{i - 1}{2}}} \end{matrix}

Hence, adjusting ξ according to the decaying power profile leads to improved estimation results.

2.4. Computational Complexity of fRLBF Estimators

First of all, we note that for the selected basis set, powers of time, the fLBF estimates are recursively computable [14]. This is a direct consequence of the recursive computability of

f (j)

. It is easy to check that

\begin{matrix} f (j) = Λ_{m | k} f (j + 1) \end{matrix}

(42)

where

\begin{matrix} Λ_{m | k} = [\begin{matrix} 1 & 0 & \dots & 0 \\ - \frac{1}{k} & 1 & 0 \\ ⋱ & ⋱ \\ \frac{(\binom{m - 1}{m - 1})}{{(- k)}^{m - 1}} & \frac{(\binom{m - 1}{m - 2})}{{(- k)}^{m - 2}} & \dots & 1 \end{matrix}] \end{matrix}

(43)

denotes the

m \times m

transition matrix, and

(\binom{n}{k}) = \frac{n!}{k! (n - k)!}

denotes binomial coefficient, leading to the following recursive formula:

\begin{matrix} {\hat{α}}_{i}^{fLBF} (t + 1) = Λ_{m | k} [{\hat{α}}_{i}^{fLBF} (t) - {\tilde{θ}}_{i} (t - k) f (- k)] + {\tilde{θ}}_{i} (t + k + 1) f (k) . \end{matrix}

(44)

All n fLBF estimates can be obtained at the cost of

(m^{2} + 2 m) n

complex flops (complex multiply-add operations) per time updated. Importantly, this cost remains independent of the analysis window width K.

To estimate the computational burden of the second estimation stage, consider the matrix inversion lemma [4]

\begin{matrix} \begin{matrix} {[I_{m} + Σ_{i} (t; η, μ)]}^{- 1} = {[G_{i} (t; η) + μ γ_{i} (t) f_{0} f_{0}^{H}]}^{- 1} \\ = G_{i}^{- 1} (t; η) - μ γ_{i} (t) \frac{G_{i}^{- 1} (t; η) f_{0} f_{0}^{H} G_{i}^{- 1} (t; η)}{1 + μ γ_{i} (t) f_{0}^{H} G_{i}^{- 1} (t; η) f_{0}} \end{matrix} \end{matrix}

(45)

where

\begin{matrix} G_{i}^{- 1} (t; η) & = diag {g_{i 1} (t; η), \dots, g_{i m} (t; η)} \end{matrix}

(46)

\begin{matrix} g_{i l} (t; η) & = {[1 + η δ_{i l} (t)]}^{- 1} . \end{matrix}

(47)

Exploiting the diagonal nature of the matrix

G_{i}^{- 1} (t; η) = {[I_{m} + η Δ_{i} (t)]}^{- 1}

and the fact that

f_{0} = {[1, \dots, 1]}^{T}

, the cost of updating the estimates

{\hat{α}}_{i}^{fRLBF} (t), i = 1, \dots, n

, is approximately

4 m n

complex flops, after neglecting terms of order

O (n)

and

O (m)

. Additional

m n

complex additions are required to update

{\hat{θ}}_{i}^{fRLBF} (t), i = 1, \dots, n

.

Finally, computation of

g_{i l} (t; η), i = 1, \dots, n, l = 1, \dots, m

, requires

m n

complex flops,

m n

square root operations, and

m n

real divisions. Hence, the cost of running the two-stage procedure is roughly

(m^{2} + 6 m) n

complex flops,

m n

complex additions,

m n

square root operations, and

m n

real divisions per time update.

Remark 2.

Since all eigenvalues of the matrix

Λ_{m | k}

are located on the unit circle in the complex plane, the recursive algorithm (44) is not exponentially stable but only marginally stable. This means that it has the tendency to diverge at a slow (linear) rate when the number of time steps becomes very large. To prevent numerical problems caused by the unbounded accumulation of round-off errors, the algorithm (44) should be periodically reset by directly computing

{\hat{α}}_{i}^{fLBF} (t)

using (40). In the absence of an automated mechanism for detecting numerical issues and initiating timely resets, a practical strategy was employed. To balance the computational efficiency of the recursive formulation with numerical stability, algorithm (44) was reset every 1000 steps, with the fLBF estimates re-initialized directly from the local estimation window at these intervals [35].

2.5. Multi-Step Procedure of fRLBF Estimators

The fRLBF scheme described above is a two-step procedure. For improved performance, one can consider a multi-step procedure, similar to those described in [33,34], at the expense of increased computational complexity, as follows:

\begin{matrix} {\hat{α}}_{i}^{fRLBF} (t, j + 1) & = {[I_{m} + Σ_{i} (t, j; η, μ)]}^{- 1} {\hat{α}}_{i}^{fLBF} (t) \end{matrix}

(48)

\begin{matrix} {\hat{θ}}_{i}^{fRLBF} (t, j + 1) & = f_{0}^{H} {\hat{α}}_{i}^{fRLBF} (t, j + 1) \\ i = 1, \dots, n, & j = 1, 2, \dots \end{matrix}

(49)

where

\begin{matrix} Σ_{i} (t, j; η, μ) & = η Δ_{i} (t, j) + μ γ_{i} (t, j) f_{0} f_{0}^{H} \end{matrix}

(50)

\begin{matrix} Δ_{i} (t, j) & = diag {δ_{i 1} (t, j), \dots, δ_{i m} (t, j)} \end{matrix}

(51)

\begin{matrix} δ_{i l} (t, j) & = | {\hat{a}}_{i l}^{fRLBF} {(t, j) |}^{- 1} \end{matrix}

(52)

\begin{matrix} γ_{i} (t, j) & = | {\hat{θ}}_{i}^{fRLBF} {(t, j) |}^{- 1} \\ i = 1, \dots, & n, l = 1, \dots, m \end{matrix}

(53)

and

{\hat{α}}_{i}^{fRLBF} (t, 1) = {\hat{α}}_{i}^{fLBF} (t)

,

{\hat{θ}}_{i}^{fRLBF} (t, 1) = {\hat{θ}}_{i}^{fLBF} (t)

. To avoid numerical issues, the magnitudes of

| {\hat{a}}_{i l}^{fRLBF} (t, j) |

and

| {\hat{θ}}_{i}^{fRLBF} (t, j) |

should be replaced by a fixed number before inversion if they are too close to zero. This iterative procedure can be implemented efficiently without requiring inversion of an

m \times m

matrix, as demonstrated in the previous section (see (45)–(47)).

It is important to emphasize that the proposed fRLBF approach is not iterative in the same sense as algorithms like ISTA, where a stopping criterion is required to prevent excessive computational cost once convergence is achieved, or OMP, where limiting the number of selected atoms is essential not only to control computational cost but also to avoid increased estimation error due to over-parameterization. In the fRLBF approach, estimation convergence is determined by how the pre-estimates are obtained and by the careful selection of fLBF hyperparameters. The multi-step procedure is intended solely for fine-tuning the results, with each iteration involving

6 m n

complex flops,

m n

square root operations, and

m n

real divisions. While increasing the number of iterations can yield incremental improvements in estimation accuracy, typical criteria—such as absolute or relative changes in parameter values—require selecting an appropriate convergence threshold. If not properly chosen, this can lead to unnecessary iterations and reduced computational efficiency. In the case of the time-varying FIR model, the optimal number of iterations is application-dependent and should be selected to balance estimation accuracy with computational efficiency. Based on empirical observations, the best trade-off was achieved by stopping the procedure after the second iteration.

2.6. Selection of Regularization Gains

The choice of regularization gains is an important part of regularized estimation—if the values of

η

and

μ

are not chosen appropriately, the accuracy of fRLBF estimates can be worse than the accuracy of their not regularized versions. In this section, we will present three approaches to solve this problem.

2.6.1. Empirical Bayes Approach

Bayesian redefinition of the optimization problem (32) and (33) is based on the observation that minimization of the quadratic cost function

\begin{matrix} \sum_{j = - k}^{k} | | \tilde{θ} (t + j) - {F (j) α | |}_{2}^{2} + {| | α | |}_{Σ}^{2} \end{matrix}

is equivalent to the maximization of the expression

\begin{matrix} exp \{- \frac{1}{2 σ_{z}^{2}} \sum_{j = - k}^{k} | | \tilde{θ} (t + j) - F (j) α {| |}_{2}^{2}\} \times exp \{- \frac{1}{σ_{z}^{2}} {| | α | |}_{Σ}^{2}\} \end{matrix}

(54)

which can be given a probabilistic interpretation. Assuming that the pre-estimation noise

z (t)

is Gaussian, the first term in (54) represents the conditional data distribution (likelihood)

\begin{matrix} p (Θ (t) | α, σ_{z}^{2}) = \frac{1}{{(2 π σ_{z}^{2})}^{\frac{M}{2}}} exp {- \frac{1}{2 σ_{z}^{2}} {| | α | |}_{2}^{2} \\ + \frac{1}{2 σ_{z}^{2}} α^{H} \sum_{j = - k}^{k} F^{H} (j) \tilde{θ} (t + j) + \frac{1}{2 σ_{z}^{2}} \sum_{j = - k}^{k} {\tilde{θ}}^{H} (t + j) F (j) α \\ - \frac{1}{2 σ_{z}^{2}} \sum_{j = - k}^{k} | | \tilde{θ} {(t + j) | |}_{2}^{2}} \end{matrix}

(55)

where

Θ (t) = {\tilde{θ} (t + j), j \in I_{k}}

and

M = K n

. The second term corresponds to a prior distribution of

α

\begin{matrix} π (α | Σ, σ_{z}^{2}) = \frac{{| Σ |}^{1 / 2}}{{(2 π σ_{z}^{2})}^{m n / 2}} exp \{- \frac{1}{2 σ_{z}^{2}} α^{H} Σ α\} \end{matrix}

(56)

where

| Σ |

denotes the determinant of the matrix

Σ

. The likelihood for the unknown parameters

Σ

and

σ_{z}^{2}

can be obtained from (see Appendix A)

\begin{matrix} L (Σ, σ_{z}^{2}) = \int_{C^{m n}} p (Θ (t) | α, σ_{z}^{2}) π (α | Σ, σ_{z}^{2}) d α \\ = \frac{{| Σ |}^{1 / 2}}{{(2 π σ_{z}^{2})}^{M / 2} {| I_{m n} + Σ |}^{1 / 2}} exp \{- \frac{ζ (t; μ)}{2 σ_{z}^{2}}\} \end{matrix}

(57)

where

\begin{matrix} ζ (t; Σ) = \sum_{j = - k}^{k} | | \tilde{θ} (t + j) {| |}_{2}^{2} - {[{\hat{α}}^{fLBF} (t)]}^{H} {[I_{m n} + Σ]}^{- 1} {\hat{α}}^{fLBF} (t) \end{matrix}

(58)

denotes the residual sum of squares. Good [36] referred to the maximization of (57) as a type II maximum likelihood method, but recently it has been more frequently referred to as the empirical Bayes approach [15,36].

Since the maximum likelihood estimate of the variance

σ_{z}^{2}

can be obtained in the form

{\hat{σ}}_{z}^{2} (Σ) = ζ (t, Σ) / M

, the optimal value of the regularization matrix

Σ

can be obtained by maximizing the concentrated likelihood function

L (Σ, {\hat{σ}}_{z}^{2} (Σ))

, or equivalently by minimizing the quantity

\begin{matrix} - 2 log L (Σ, σ_{z}^{2} (Σ)) = const + M log {\hat{σ}}_{z}^{2} (Σ) - log | Σ | + log | I_{m n} + Σ | . \end{matrix}

(59)

In the case of structured matrix of the form

Σ = Σ (t; η, μ) = η Δ (t) + μ Γ (t)

, the optimization is restricted to regularization gains

η

and

μ

. This leads to the following optimization formula (see Appendix B):

\begin{matrix} {η^{opt}, μ^{opt}} = arg min_{η, μ} J (t; η, μ) \end{matrix}

(60)

where

\begin{matrix} \begin{matrix} J (t; η, μ) & = M log ζ (t; η, μ) + \sum_{i = 1}^{n} \sum_{l = 1}^{m} log \frac{1 + η δ_{i l} (t)}{η δ_{i l} (t)} \\ + \sum_{i = 1}^{n} log \frac{1 + μ γ_{i} (t) \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{1 + η δ_{i l} (t)}}{1 + μ γ_{i} (t) \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{η δ_{i l} (t)}} . \end{matrix} \end{matrix}

(61)

As a practical way of solving the optimization problem, one can consider parallel estimation. In this case, not one but p fRLBF algorithms equipped with different regularization gains

(η, μ) \in P = {(η_{1}, μ_{1}), \dots, (η_{p}, μ_{p})}

, yielding the estimates

{\hat{α}}^{fRLBF} (t | η, μ)

and

{\hat{θ}}^{fRLBF} (t | η, μ)

, are run simultaneously and compared using the empirical Bayes measure of fit (61). At each time instant t, the best fitting values of

η

and

μ

are chosen using the grid search

\begin{matrix} {\hat{η} (t), \hat{μ} (t)} = \underset{(η, μ) \in P}{arg min} J (t; η, μ) \end{matrix}

(62)

and the final estimates for empirical Bayes approach have the form

\begin{matrix} {\hat{α}}^{fRLBF} (t | \hat{η} (t), \hat{μ} (t)), {\hat{θ}}^{fRLBF} (t | \hat{η} (t), \hat{μ} (t)) . \end{matrix}

(63)

The cost of evaluating (63) is of order

O (p m n)

per time update. Note that in this case, the following computational shortcut can be used to evaluate the residual sum of squares:

\begin{matrix} \begin{matrix} ζ (t; η, μ) & = \sum_{j = - k}^{k} | | \tilde{θ} (t + j) {| |}_{2}^{2} - {[{\hat{α}}^{fLBF} (t)]}^{H} {[I_{m n} + Σ (t; η, μ)]}^{- 1} {\hat{α}}^{fLBF} (t) \\ = \sum_{i = 1}^{n} \sum_{j = - k}^{k} {| {\tilde{θ}}_{i} (t + j) |}^{2} - \sum_{i = 1}^{n} {[{\hat{α}}_{i}^{fLBF} (t)]}^{H} {\hat{α}}_{i}^{fRLBF} (t | η, μ) . \end{matrix} \end{matrix}

(64)

Note also that the first term on the right-hand side of (64) can be updated recursively.

2.6.2. Decentralized Approach

Alternatively, following the decomposed form in (38) and (39), we propose to optimize the regularization gains

η

and

μ

independently for each

{\hat{α}}_{i}^{fRLBF} (t)

, which enhances the flexibility of the approach. Since some of the parameters

θ_{i} (t)

may be static while others time-varying, different regularization gains are necessary. This leads to the decomposed optimization formula

\begin{matrix} {η_{i}^{opt}, μ_{i}^{opt}} = \underset{η_{i}, μ_{i}}{arg min} J_{i} (t; η_{i}, μ_{i}), i = 1, \dots, n \end{matrix}

(65)

where

\begin{matrix} \begin{matrix} J_{i} (t; η_{i}, μ_{i}) & = K log ζ_{i} (t; η_{i}, μ_{i}) + \sum_{l = 1}^{m} log \frac{1 + η_{i} δ_{i l} (t)}{η_{i} δ_{i l} (t)} \\ + log \frac{1 + μ_{i} γ_{i} (t) \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{1 + η_{i} δ_{i l} (t)}}{1 + μ_{i} γ_{i} (t) \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{η_{i} δ_{i l} (t)}} \end{matrix} \end{matrix}

(66)

and

\begin{matrix} \begin{matrix} ζ_{i} (t; η_{i}, μ_{i}) = \sum_{j = - k}^{k} {| {\tilde{θ}}_{i} (t + j) |}^{2} - {[{\hat{α}}_{i}^{fLBF} (t)]}^{H} {\hat{α}}_{i}^{fRLBF} (t | η_{i}, μ_{i}) . \end{matrix} \end{matrix}

(67)

Similar to the centralized approach, at each time instant t, the best fitting values of

η_{i}

and

μ_{i}

are chosen using the grid search

\begin{matrix} {{\hat{η}}_{i} (t), {\hat{μ}}_{i} (t)} = \underset{(η_{i}, μ_{i}) \in P}{arg min} J_{i} (t; η_{i}, μ_{i}) \end{matrix}

(68)

and the final estimates for the decentralized approach have the form

\begin{matrix} {\hat{α}}_{i}^{fRLBF} (t | {\hat{η}}_{i} (t), {\hat{μ}}_{i} (t)), {\hat{θ}}_{i}^{fRLBF} (t | {\hat{η}}_{i} (t), {\hat{μ}}_{i} (t)) . \end{matrix}

(69)

2.6.3. Cross-Validation Approach

Denote by

{\hat{α}}^{\circ} (t | η, μ)

and

{\hat{θ}}^{\circ} (t | η, μ)

the “holey” estimates of

α (t | η, μ)

and

θ (t | η, μ)

, obtained by excluding from the estimation process the central “measurement”

\tilde{θ} (t)

\begin{matrix} {\hat{α}}^{\circ} (t | η, μ) & = \underset{α}{arg min} \{\sum_{\begin{matrix} j = - k \\ j \neq 0 \end{matrix}}^{k} | | \tilde{θ} (t + j) - {F (j) α | |}_{2}^{2} + {| | α | |}_{Σ (t; η, μ)}^{2}\} \end{matrix}

(70)

\begin{matrix} {\hat{θ}}^{\circ} (t | η, μ) & = F_{0} {\hat{α}}^{\circ} (t | η, μ) . \end{matrix}

(71)

The proposed leave-one-out cross-validation approach is based on minimization of the local sum of squared deleted residuals

\begin{matrix} ε^{\circ} (t | η, μ) = y (t) - {[{\hat{θ}}^{\circ} (t | η, μ)]}^{H} φ (t) \end{matrix}

(72)

\begin{matrix} J (t; η, μ) = \sum_{j = - N}^{N} {| ε^{\circ} (t + j | η, μ) |}^{2} \end{matrix}

(73)

where N determines the size of the local decision window.

It is straightforward to check that

\begin{matrix} {\hat{α}}_{i}^{\circ} (t | η, μ) = {[D_{i} (t; η, μ) - f_{0} f^{H}]}^{- 1} [{\hat{α}}_{i}^{fLBF} (t) - {\tilde{θ}}_{i} (t) f_{0}] \end{matrix}

(74)

where

D_{i} (t; η, μ) = I_{m} + Σ_{i} (t; η, μ)

. Using the matrix inversion lemma, one obtains

\begin{matrix} \begin{matrix} {[D_{i} (t; η, μ) - f_{0} f_{0}^{H}]}^{- 1} = D_{i}^{- 1} (t; η, μ) + \frac{D_{i}^{- 1} (t; η, μ) f_{0} f_{0}^{H} D_{i}^{- 1} (t; η, μ)}{1 - f_{0}^{H} D_{i}^{- 1} (t; η, μ) f_{0}} . \end{matrix} \end{matrix}

(75)

Combining (74) with (75) and taking into account the fact that

D_{i}^{- 1} (t; η, μ) {\hat{α}}_{i}^{fLBF} (t) = {\hat{α}}_{i}^{fRLBF} (t)

, one arrives at

\begin{matrix} {\hat{θ}}_{i}^{\circ} (t | η, μ) = \frac{{\hat{θ}}_{i}^{fRLBF} (t | η, μ) - β_{i} (t; η, μ) {\tilde{θ}}_{i} (t)}{1 - β_{i} (t; η, μ)} \end{matrix}

(76)

where (cf. (45))

\begin{matrix} β_{i} (t; η, μ) = f_{0}^{H} D_{i}^{- 1} (t; η, μ) f_{0} = \frac{f_{0}^{H} G_{i}^{- 1} (t; η) f_{0}}{1 + μ γ_{i} (t) f_{0}^{H} G_{i}^{- 1} (t; η) f_{0}} . \end{matrix}

(77)

This means that deleted residuals can be determined without evaluating of the corresponding holey estimates of

α_{i} (t)

.

Remark 3.

The optimization formula can be further extended by the adaptive selection of the power decay coefficient ξ

\begin{matrix} {\hat{η} (t), \hat{μ} (t), \hat{ξ} (t)} = \underset{(η, μ, ξ) \in P}{arg min} J (t; η, μ, ξ) \end{matrix}

(78)

where

(η, μ, ξ) \in P = {(η_{1}, μ_{1}, ξ_{1}), \dots, (η_{p}, μ_{p}, ξ_{p})}

.

2.7. Debiasing

One of the crucial observations about the fLBF approach is that the estimated parameter trajectory lags behind the true one. The length of this delay depends on the forgetting factor

λ

used in the pre-estimation stage (18) and may vary with time. Since fLBF estimates are used to obtain fRLBF estimates, the estimation lag is inherited.

In [37] authors proposed a simple adaptive scheme to obtain time-shifted fLBF estimates

\begin{matrix} {\hat{θ}}^{dfLBF} (t) = {\hat{θ}}^{fLBF} (t + d (t)) \end{matrix}

(79)

where the time-shift

d (t)

is estimated for every time instant t

\begin{matrix} \hat{d} (t) & = \underset{d \in D}{arg min} J (t, d) \end{matrix}

(80)

\begin{matrix} J (t, d) & = λ_{0} J (t - 1, d) + {| ε_{d} (t) |}^{2}, 0 < λ_{0} < 1 \end{matrix}

(81)

\begin{matrix} ε_{d} (t) & = y (t) - {[{\hat{θ}}^{fLBF} (t + τ)]}^{H} φ (t) \end{matrix}

(82)

and

D = [τ - δ_{0}, τ + δ_{0}]

, where

τ

is an initial approximation obtained by rounding

λ / (1 - λ)

. The proposed debiasing procedure can reduce the bias of fLBF estimates without affecting the variance component of the mean square parameter estimation error. The time-shifted fLBF estimates can be used to obtain the time-shifted fRLBF estimates (dfRLBF).

Another solution to the problem relies on replacing unidirectional pre-estimates (18) with bidirectional pre-estimates obtained with non-causal double exponentially weighted least squares estimates [38].

2.8. The Number of Basis Functions and the Analysis Window Size

To balance the bias and variance in the mean squared parameter estimation error, it is crucial to choose appropriate values for the design parameters m and k. Increasing m or decreasing k reduces estimation bias but raises variance, while decreasing m or increasing k does the opposite [29]. Therefore, adjusting m and k according to the rate and mode of parameter variation ensures satisfactory estimation results. This can be achieved using parallel estimation techniques, where multiple identification algorithms with different settings are run simultaneously, providing estimates

{\hat{α}}_{m | k}^{fLBF} (t)

,

{\hat{θ}}_{m | k}^{fLBF} (t)

for various

m \in M

and

k \in K

. At each time instant, only one algorithm is selected, yielding parameter estimates in the form

\begin{matrix} {\hat{α}}_{\hat{m} (t) | \hat{k} (t)}^{fLBF} (t), {\hat{θ}}_{\hat{m} (t) | \hat{k} (t)}^{fLBF} (t) \end{matrix}

(83)

where

\begin{matrix} {\hat{m} (t), \hat{k} (t)} = \underset{\begin{matrix} m \in M \\ k \in K \end{matrix}}{arg min} I_{m | k} (t) \end{matrix}

(84)

and

I_{m | k} (t)

denotes the local decision statistic such as the cross-validation selection rule [29] or the modified Akaike’s final prediction error criterion [39]. One can use the proposed regularization scheme only for the best-fitting fLFB algorithm or apply the decision rules to the fRLBF estimates

{\hat{θ}}_{m | k}^{fRLBF} (t)

.

3. Results and Discussion

To demonstrate the identification performance of a non-stationary communication channel with sparseness using the proposed regularization approach, we analyzed a channel loosely inspired by the self-interference channel model of a full-duplex UWA system [28].

3.1. Channel Model

Following the methodology described in [28], first we generated a channel model with an exponential envelope. The channel was characterized as a 50-tap FIR filter with complex-valued coefficients that varied independently with the decreasing variance chosen according to

\begin{matrix} var [θ_{i} (t)] = ξ^{i - 1}, i = 1, \dots, 50 \end{matrix}

(85)

where

ξ

was set to 0.69. Next, variations were introduced to the channel coefficients to better reflect real-world conditions. First, 30% of these coefficients were randomly set to zero, mimicking signal obstruction or absorption, similar to physical barriers underwater. Subsequently, 50% of the coefficients were assigned fixed values, representing stable transmission paths or reflections from static features like the sea floor. Finally, 20% of the coefficients were left unchanged to simulate time-varying effects. These fluctuations are often introduced by moving objects, such as marine life or waves, or by changes in surface reflections.

The input signal was generated as a circular white binary sequence

u (t) = \pm 1 \pm j

, while the measurement noise followed a circular white Gaussian distribution with variance

σ_{e}^{2}

settings corresponded to input signal-to-noise ratios (SNR) of 30 dB, 20 dB, and 10 dB, respectively. Typical trajectories of the time-varying channel coefficient are shown in Figure 6.

Figure 6. The true time-varying channel coefficient trajectory (a typical example) was used in the system Equation (1) to generate the system output

y (t)

at different SNR levels.

3.2. Metrics

To assess the performance of the compared approaches, we utilized two metrics. The first metric is the self-interference cancellation factor (SICF) [28], defined as

\begin{matrix} SICF = \frac{\sum_{t} {| θ^{H} (t) φ (t) |}^{2}}{\sum_{t} {| {[θ (t) - \hat{θ} (t)]}^{H} φ (t) |}^{2}} \end{matrix}

(86)

The second metric, introduced in [15], measures the normalized root mean squared error of fit, denoted as FIT(t)

\begin{matrix} FIT (t) = 100 (1 - {[\frac{\sum_{i = 1}^{50} {| θ_{i} (t) - {\hat{θ}}_{i} (t) |}^{2}}{\sum_{i = 1}^{50} {| θ_{i} (t) - \bar{θ} (t) |}^{2}}]}^{1 / 2}) \end{matrix}

(87)

where

\bar{θ} (t) = \frac{1}{50} \sum_{i = 1}^{50} θ_{i} (t)

. A FIT value of 100 indicates a perfect match between the true and estimated impulse response. In the simulation experiment, all results were averaged over 20 independent realizations of channel coefficients and 10,000 time instants. Each realization had a different variation pattern. Additionally, each data realization included an additional 1000 time instants at the beginning and the end to mitigate boundary problems.

3.3. Simulation Experiment Settings

In the numerical experiment, three algorithms were compared: LBF, fLBF, and fRLBF (after debiasing dfLBF and dfRLBF). These comparisons were conducted across various combinations of k and m and at three SNR levels: 10 dB, 20 dB, and 30 dB. The combinations considered include

k \in [100, 300]

and

m \in [1, 3, 5]

, excluding the combination

k = 100, m = 5

for LBF. This exclusion was necessary because without regularization, it is impossible to estimate 250 hyperparameters based on only 201 data points. For the dfRLBF algorithm, three approaches were proposed and compared for selecting the regularization parameters: the empirical Bayes (dfRLBF [A]), the decentralized (dfRLBF [B]), and the cross-validation approach (dfRLBF [C]). The regularization parameter combinations included

ξ \in [0.5, 0.7, 0.9]

,

μ \in [0.01, 0.1, 1]

, and

η \in [0.01, 0.1, 1]

. These values were selected based on pre-experimental experience and are intended to cover typical ranges of regularization intensity. The case where the regularization gains are set to zero yields results identical to those obtained with the fLBF approach (without regularization). An improvement in estimation accuracy was observed as the regularization gains increased. However, for values above 1, a degradation in the estimation results occurred. One may obtain even better estimation accuracy by introducing a finer grid in the range of regularization gains between 0.1 and 1, but this comes at the cost of increased computational load.

The size of the local decision window for the cross-validation rule was set to

N = 80

, while for the debiasing technique, it was set to

δ_{0} = 20

.

3.4. Results

Table 1 compares the averaged FIT [%] and SICF [dB] scores for three algorithms, LBF, fLBF, and dfLBF, across various k, m, and SNR settings. The choice of k and m significantly influences the identification results for all methods, as shown in the table. When

m = 1

, LBF’s performance is inferior compared to the fLBF and dfLBF methods. The most notable discrepancy between LBF and dfLBF is observed at

k = 100

and

m = 1

with a 30 dB SNR, where the differences are

Δ

FIT = 2.4% and

Δ

SICF = 7.2 dB. As m increases, so does the number of estimated hyperparameters. LBF’s reliance on matrix inversion makes it sensitive to the number of hyperparameters, affecting the minimum window length required. Consequently, some table entries are marked with ‘x’ where there are insufficient data points to estimate the hyperparameters. Despite its non-causal nature (it uses both past and future data) causing decision delays that grow with larger k, LBF’s estimated parameter trajectory does not lag behind the true one. At high SNR (30 dB), LBF achieves its best results with

k = 300, m = 5

, showing FIT = 99.2% and SICF = 45.4 dB, outperforming the other methods. The fLBF method, despite being non-causal, requires time-shift correction to mitigate bias arising from using causal pre-estimates. It utilizes causal pre-estimates and inherits the estimation delay. Once debiasing is applied, dfLBF mostly shows improvement, especially at higher SNRs (≥20 dB). For instance, at

k = 300

and

m = 3

, the improvements at 10 dB SNR are

Δ

FIT =

0.6 %

and

Δ

SICF =

0.4

dB, while at 30 dB SNR, they are

Δ

FIT = 1.1% and

Δ

SICF =

6.6

dB. This suggests that time-shift estimates are less precise in low SNR scenarios. In the table, the best results for each unique parameter combination of k, m, and SNR are highlighted in bold, demonstrating that no single method is superior across all settings.

Table 1. Average FIT [%] and SICF [dB] values for the LBF, fLBF, and dfLBF algorithms. Results are presented across varying design parameters: local estimation window size (

K = 2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), and at signal-to-noise ratio (SNR) levels of 10 dB, 20 dB, and 30 dB. Best performance values among the compared methods are highlighted in bold for each unique parameter combination and SNR level; a method with a greater number of highlighted results signifies its superior overall performance under identical experimental conditions. Cases with insufficient data for the LBF algorithm are denoted by ‘x’.

In the experiment, the fRLBF approach improved FIT [%] and SICF [dB] scores compared to fLBF, with only a few instances of minor degradation. Table 2 presents results for the dfRLBF (after debiasing) with three gain selection algorithms: dfRLBF

[A]

, dfRLBF

[B]

, and dfRLBF

[C]

. The table highlights the best results for each unique parameter combination k, m, and SNR, showing that dfRLBF [B] performs best at 10 and 20 dB SNR, while both dfRLBF

[A]

and dfRLBF

[C]

achieve the highest FIT [%] and SICF [dB] scores at 30 dB SNR.

Table 2. Average FIT [%] and SICF [dB] values for the dfRLBF algorithm, comparing various gains selection approaches: [A] empirical Bayes approach, [B] decentralized approach, and [C] cross-validation approach. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at SNR levels of 10 dB, 20 dB, and 30 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold. An adaptive approach, dynamically selecting m and k at each time instant, is denoted by ‘A’.

At a high SNR level of 30 dB, dfRLBF

[B]

performs slightly worse than dfRLBF

[A]

and dfRLBF

[C]

, primarily because its decentralized regularization gain selection mechanism is more sensitive to local variations, resulting in frequent adjustments of the regularization gains. This makes it less stable than approaches that select regularization gains jointly. However, in low and medium SNR scenarios, this gain selection mechanism can enhance channel estimation performance. As expected, the proposed regularization yields the highest improvements at SNR 10 dB; for instance, dfRLBF

[B]

compared to dfLBF at 10 dB SNR with

k = 100, m = 5

results in

Δ

FIT = 5.6% and

Δ

SICF = 3.6 dB. As SNR increases, the score improvements diminish, such as dfRLBF

[B]

versus dfLBF at 30 dB SNR with

k = 100, m = 5

, resulting in

Δ

FIT = 1% and

Δ

SICF = 1.1 dB. In the adaptive approach, labeled as ‘A’ in the table, regularization is applied to the best-fitting dfLBF algorithm. At lower SNR levels (10 and 20 dB), adaptive dfRLBF

[B]

outperforms both adaptive dfRLBF

[A]

and dfRLBF

[C]

, with results approaching those from the best settings. However, at 30 dB SNR, the adaptive dfRLBF

[C]

approach is superior to the others, showing results that are equal to or better than those from the best settings.

Table 3 shows the average FIT [%] and SICF [dB] values for the multi-step dfRLBF

[B, j]

approach with varying iterations:

j = 1, \dots, 4

across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at SNR level of 10 dB. The multi-step procedure described in (48)–(50) notably improves performance at low SNR (10 dB), with the most improvement seen from dfRLBF

[B, 1]

to dfRLBF

[B, 2]

. Additional iterations yield only slight accuracy gains, as illustrated in Figure 7 for 20 iterations. A practical rule of thumb is to stop after the second iteration, since each additional iteration increases the computational cost by

6 m n

complex flops,

m n

square root operations, and

m n

real divisions. This pattern is similar for the dfRLBF

[A]

and dfRLBF

[C]

methods.

Table 3. Average FIT [%] and SICF [dB] values for the dfRLBF [B] algorithm with varying numbers of iterations

[B, j], j = 1, 2, \dots

in the multi-step procedure. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at an SNR level of 10 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold.

Figure 7. Convergence of the average FIT [%] and SICF [dB] values for the multi-step dfRLBF

[B, j]

algorithm, evaluated for

k = 100

and

m = 3

at an SNR level of 10 dB over twenty iterations in the multi-step procedure (

j = 1, \dots, 20

). The presented results were averaged over time for a single realization of the channel model (a typical example).

Figure 8 compares LBF, fLBF, and dfRLBF

[B, 2]

for

k = 100

and

m = 3

at an SNR level of 10 dB, focusing on parameter channel errors. The dfRLBF

[B, 2]

approach yields both quantitative and qualitative improvements over the LBF and fLBF algorithms. The improvement is noticeable for taps

i \geq 20

, where the channel parameter errors are smaller than in the other compared methods.

Figure 8. A snapshot of the true impulse response of the identified system (top figure) and three corresponding parameter channel errors obtained using the LBF, fLBF, and dfRLBF

[B, 2]

, respectively, for

k = 100

and

m = 3

settings at an SNR level of 10 dB.

3.5. Impact of Measurement Noise

So far, it has been assumed that the measurement noise follows a Gaussian distribution. To provide a more comprehensive evaluation, three additional noise models are considered using the same experimental setup: Laplacian noise, non-stationary noise, and pulse noise. All noises meet the assumption (

A

2).

The Laplacian noise is generated with the same variance as the Gaussian noise used in previous experiments [40]. This model is commonly employed to represent measurement noise that occasionally includes small outliers, as the heavier tails of the Laplace distribution make such deviations more likely compared to the Gaussian case.

To simulate non-stationary measurement noise, the noise variance was modulated over time according to the function

\begin{matrix} σ_{e} (t) = 0.7 + 0.5 sin (\frac{2 π t}{T}) \end{matrix}

(88)

where T is the total number of samples. At each time instant, the noise sample was drawn from a zero-mean Gaussian distribution with the corresponding time-varying standard deviation

N (0, σ_{e}^{2} (t))

.

Pulse noise is introduced using a Gaussian mixture model with

ϵ

-contamination [41]. Such a model is often used to describe measurement noise that is mostly Gaussian, but occasionally includes outliers or impulsive disturbances modeled by the high-variance Gaussian component. In this model, the noise distribution is a weighted sum (mixture) of two zero-mean Gaussian distributions with different variances

\begin{matrix} p (x) = (1 - ϵ), N (0, σ_{e}^{2}) + ϵ, N (0, κ σ_{e}^{2}) \end{matrix}

(89)

where

ϵ, (0 < ϵ < 1)

denotes the probability that a sample is drawn from the contaminating distribution, determining the frequency of outliers, while the coefficient

κ ≫ 1

controls the severity of these outliers by substantially increasing the variance of the contaminated component.

Three realizations of measurement noise sequences with different distributions are presented in Figure 9. From to top to bottom: (A) Laplace noise, (B) non-stationary noise, and (C) pulse noise generated for parameters

κ = 100, ϵ = 0.01

. Table 4 presents the average FIT [%] and SICF [dB] values for the dfLBF and dfRLBF

[B]

algorithms under various types of measurement noise and at SNR levels of 10 dB, 20 dB, and 30 dB. Both approaches achieve high values for these metrics when applied to Gaussian, Laplacian, and non-stationary noise. However, performance drops significantly in the presence of pulse noise, primarily because the fLBF approach relies on

ℓ^{2}

-norm minimization, which is not robust to impulsive disturbances. Nevertheless, under these challenging conditions, the dfRLBF

[B]

algorithm demonstrates a marked improvement over the dfLBF method. Similar results were also observed for the dfRLBF

[A]

and dfRLBF

[C]

approaches.

Figure 9. Three measurement noise sequences with different distributions. From top to bottom: (A) Laplacian noise, (B) non-stationary noise with time-varying variance, and (C) mixture noise—Gaussian noise contaminated by occasional impulsive (pulse) noise modeled for parameters

κ = 100, ϵ = 0.01

.

One practical solution is to incorporate a pulse detection algorithm that identifies pulses within the estimation window. With this enhancement, a robust LBF and fLBF approach would estimate channel parameters using only the outlier-free observations

y (t)

, as proposed in [42]. However, a limitation of this approach is that the estimation window will contain fewer data points, which, in cases of a high percentage of contamination, may lead to a drop in estimation performance or even numerical issues. An alternative solution, described in [43], combines pulse detection with the reconstruction of corrupted signal samples based on a signal model. This recursive algorithm is computationally efficient, preserves the original number of data points in the estimation window after reconstruction, and can effectively handle bursts of outliers affecting multiple consecutive samples. The results obtained by combining the pulse removal algorithm [43] with the proposed approaches are denoted as ‘Pulse*’ in Table 4. The pulse removal algorithm [43] based on semi-causal detection was configured with the following settings: an estimation window size of 201, a fifth-order AR model, a prediction-based detection threshold of 3.5, an interpolation-based detection threshold of 4.0, and a maximum detection burst length of 5.

Table 4. Average FIT [%] and SICF [dB] values for the dfLBF and dfRLBF [B] algorithms under various types of measurement noise—Gaussian noise, Laplacian noise, non-stationary noise, pulse noise, and data preprocessed with the pulse removal algorithm [43] (denoted as Pulse*). Performance was evaluated for parameters

k = 300

and

m = 3

, across SNR levels of 10 dB, 20 dB, and 30 dB.

Table 4. Average FIT [%] and SICF [dB] values for the dfLBF and dfRLBF [B] algorithms under various types of measurement noise—Gaussian noise, Laplacian noise, non-stationary noise, pulse noise, and data preprocessed with the pulse removal algorithm [43] (denoted as Pulse*). Performance was evaluated for parameters

k = 300

and

m = 3

, across SNR levels of 10 dB, 20 dB, and 30 dB.

		FIT [%]			SICF [dB]
Method	Noise	10 dB	20 dB	30 dB	10 dB	20 dB	30 dB
dfLBF	Gaussian	93.3	97.7	98.8	26.4	35.6	41.5
	Laplace	92.5	97.4	97.4	25.8	35	33.3
	Non-stationary	91.8	97	97	23.8	33	34.4
	Pulse	65	66.3	66.4	10.3	10.5	10.6
	Pulse*	87	90.7	91.2	18.9	21.2	21.6
dfRLBF [B]	Gaussian	95.3	98.3	98.9	29.2	37.7	41
	Laplace	95.1	98.1	97.4	28.8	37	33.3
	Non-stationary	94.7	97.8	97.5	27.1	35.1	34.4
	Pulse	79.2	80	80.1	14.5	14.8	14.8
	Pulse*	91.8	93.9	94.3	22.5	24.6	24.9

3.6. Comparison with the Orthogonal Matching Pursuit

The Orthogonal Matching Pursuit (OMP) algorithm [24] was employed to obtain a sub-optimal least squares estimate of the hyperparameter vector

α

, as defined in (9). This estimation was achieved by selecting

N_{OMP}

most significant atoms. The atom dictionary, an

n (2 k + 1) \times n m

matrix

F

, was constructed from basis functions and explicitly defined as

F = {[F^{T} (- k), \dots, F^{T} (k)]}^{T}

. The corresponding observation vector, with dimensions

n (2 k + 1) \times 1

, was formed by concatenating all pre-estimate vectors, specifically

{[{\tilde{θ}}_{1}^{T} (t - k), \dots, {\tilde{θ}}_{n}^{T} (t + k)]}^{T}

.

A significant reduction in computational cost during the atom selection step is attributed to the sparse nature of matrix

F

, characterized by its relatively small number of non-zero elements—only 2%. Furthermore, the block diagonal structure of

F (j)

provides crucial optimization for the OMP algorithm. This structure ensures that only a single coefficient needs to be determined per iteration, thereby avoiding the computationally intensive process of recomputing all previously identified coefficients, which is characteristic of standard OMP. This optimized implementation, while preserving the accuracy of the results, effectively lowers the computational cost of the OMP algorithm roughly by a factor of 12 compared to conventional OMP implementation. Additionally, once the atom dictionary is constructed with the chosen design parameters n, k, and m, it remains fixed throughout the simulation.

The atom dictionary for OMP algorithm was constructed with design parameters

n = 50

,

k = 100

, and

m = 3

to investigate the impact of the number of selected atoms on processing time. Table 5 summarizes the average FIT [%], SICF [dB], and processing time (

T_{p}

in seconds) achieved by OMP. This table illustrates how

N_{OMP}

(selected from

m n

dictionary elements) influences performance under different SNR conditions (10 dB, 20 dB, 30 dB). The reported

T_{p}

is for a single realization of 12,000-sample channel coefficients. In comparison, the dRLBF [B ] algorithm requires about 5 s for coefficient shrinking on the same CPU, after its regularization gains (

η

,

μ

) are chosen, with their optimization taking roughly 68 s.

Table 5. Average FIT [%], SICF [dB] values, and processing time (

T_{p}

) in seconds for the Orthogonal Matching Pursuit (OMP) algorithm. The table demonstrates the impact of the number of selected atoms (

N_{OMP}

, chosen from

m \times n

possible atoms in the dictionary) on performance across various SNR levels (10 dB, 20 dB, 30 dB). Processing time is measured per one realization of channel coefficients of length 12,000 samples. The best results per SNR level are indicated in bold.

It is important to note that an increasing number of selected atoms leads to a substantial increase in processing time, making the OMP algorithm less usable compared to the dfRLBF [B] approach for sliding window estimation. Furthermore, an increased number of atoms leads to a noticeable degradation in the OMP algorithm’s performance across both metrics, particularly evident at a low SNR of 10 dB—see Figure 10. At higher SNR levels (20 dB and 30 dB), this degradation remains observable but is smaller.

Figure 10. A snapshot of the true impulse response (solid line) and superimposed OMP estimates (circles) for

N_{OMP} = 20

and

N_{OMP} = 50

atoms are shown in Plots (A) and (B), respectively. OMP results were generated under parameters

k = 100

and

m = 3

at an SNR of 10 dB. Plots (C,D), directly below, illustrate the parameter estimation errors corresponding to the OMP results presented in Plots (A,B).

Table 6 presents the average FIT [%] and SICF [dB] values achieved by the OMP algorithm at different SNR levels (10 dB, 20 dB, and 30 dB), using a fixed selection of

N_{OMP} = 20

atoms. For each experimental condition, atom dictionaries were constructed from matrices

F (j), j \in [- k, k]

, utilizing a fixed

n = 50

and exploring all combinations of

k \in [100, 300]

and

m \in [1, 3, 5]

. At a low SNR of 10 dB, the OMP algorithm achieved the best results among the evaluated methods (LBF, fLBF, dfLBF). Only the dfRLBF [B] approach yielded performance comparable to OMP at this SNR level. Conversely, for SNR values of 20 dB and 30 dB, dfRLBF [B] demonstrated superior performance over OMP, with this advantage becoming more pronounced as the SNR increased. Table 7 shows average

Δ

FIT [%] and

Δ

SICF [dB] values, showing the difference in scores between the dfRLBF [B] and OMP algorithms. Positive values indicate dfRLBF [B]’s superior performance, whereas negative values suggest OMP’s better performance.

Table 6. Average FIT [%] and SICF [dB] values for the Orthogonal Matching Pursuit (OMP) algorithm, using

N_{OMP} = 20

selected atoms. For these experiments, atom dictionaries were constructed from matrices

F (j), j \in [- k, k]

for each combination of design parameters: fixed

n = 50

,

k \in [100, 300]

, and

m \in [1, 3, 5]

. The best results per SNR level are indicated in bold.

Table 7. Average

Δ

FIT [%] and

Δ

SICF [dB] values, quantifying the performance difference between the dfRLBF [B] and OMP (

N_{OMP} = 20

atoms) algorithms. The table illustrates the impact of varying design parameters: local estimation window size (

2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), across SNR levels of 10 dB, 20 dB, and 30 dB. Positive values indicate dfRLBF [B]’s superior performance, whereas negative values suggest OMP’s better performance.

3.7. Discussion

Once the communication channel can be approximated by the time-varying FIR model shown in (1), the main goal is to identify the unknown time-varying system coefficients. The assumption regarding the speed of parameter changes is very important, as it determines the choice of the identification algorithm. Classical identification algorithms are typically limited to tracking slowly time-varying parameters [4,5,6]. In contrast, the time-varying channel identification algorithm proposed in this paper is designed to effectively track parameter variations at various speeds, including slow, medium, and fast changes. This enhanced capability is achieved through the use of a hypermodel based on local basis functions. By employing a linear combination of an appropriate number of basis functions (m) and selecting a suitable analysis window size (K), the algorithm can accurately capture even rapid parameter changes. Therefore, the design parameters m and K play a crucial role in the performance of the identification process, as demonstrated in all tables presented.

Previous research has proposed two local statistics for parameter selection: the cross-validation selection rule [29] and the modified Akaike’s final prediction error criterion [39]. In this paper, we show that the cross-validation selection rule is also effective for the proposed regularized estimators. As shown in Table 2, results obtained using this rule closely approach the best results across all tested combinations of m and k. Recent works [44,45] show that if there is prior knowledge of parameter changes, it can improve the selection and/or type of basis functions used in the local basis function approach.

Another important aspect examined in this work is the compensation of estimation bias introduced by the causal EWLS algorithm used to obtain pre-estimates at the first stage. This bias can be mitigated by replacing unidirectional pre-estimates with bidirectional pre-estimates. Alternatively, as detailed in Equations (79)–(82), the parameter estimates can also be refined by applying a time-shift correction at each time instant. As shown in Table 1, the time-shift correction leads to improved estimation results across all SNR cases, with greater benefits observed at higher SNR levels. However, accurately estimating the time-shift in low SNR scenarios remains challenging and may require further investigation in future research.

Next, it was demonstrated that the proposed regularization technique is suitable for channel identification with sparse characteristics. As shown in Figure 8, the channel parameter errors are reduced and are closer to zero in the impulse response tail compared to the LBF or fLBF methods. This behavior is the expected outcome of the proposed method. To achieve the best results, it is necessary to select the regularization gains adaptively. In this paper, three gain selection approaches were derived and compared. The best results were obtained when the gain selection was performed independently for each system coefficient.

The proposed method was validated using various noise models, including Gaussian noise, Laplace noise, non-stationary noise, and pulse noise, achieving high estimation accuracy for all cases except the challenging pulse noise scenario (see Table 4). Since both the LBF and fLBF approaches are based on

ℓ^{2}

-norm minimization, they are inherently not robust to severe impulsive (pulse) noise. Even a small proportion of contamination, such as 1%, can significantly degrade the accuracy of channel parameter estimation. To address this challenge, an additional preprocessing step can be implemented to detect and exclude, trim, or replace corrupted samples before applying the proposed method, as discussed in [42]. Alternatively, the strategy proposed in [43] allows for detecting corrupted samples and replacing them with interpolated values based on a signal model.

Finally, the proposed approach was compared with the Orthogonal Matching Pursuit (OMP) algorithm [24]. The atom dictionary, constructed from basis functions, exhibited inherent sparsity and a block diagonal structure, which reduced the computational cost of the OMP algorithm by a factor of 12 without affecting its results. Nevertheless, the computational burden of OMP remained significantly higher than that of the dfRLBF [B] approach, and was strongly dependent on the number of selected atoms. While OMP achieved comparable results to dfRLBF [B] with 20 atoms at a low SNR of 10 dB, its performance in both metrics declined at higher SNR levels (20 dB and 30 dB).

A brief summary of the key aspects of the proposed multistage method for non-stationary sparse channel identification is provided in Table 8.

Table 8. Summary of proposed multistage approach for identification of non-stationary sparse communication channels.

4. Conclusions

The proposed regularization technique, using local basis functions with reweighted

ℓ^{2}

regularizers in both the time and frequency domains, effectively enhances the identification of non-stationary communication channels, especially those with sparse characteristics and exponential envelope profiles, as commonly observed in full-duplex underwater acoustic channels. Three adaptive gain selection approaches, the empirical Bayes, decentralized, and cross-validation, were investigated, showing additional improvements with debiasing and multi-step procedures. The proposed fRLBF approach presents a superior alternative to LBF for lower SNR levels (10 and 20 dB), providing lower complexity and greater flexibility. In simulation experiments, the proposed fRLBF approach achieved results that were competitive with, or even superior to, those obtained using the Orthogonal Matching Pursuit algorithm, while also delivering significant computational savings. Additionally, the proposed method demonstrated robustness to various types of measurement noise—including Gaussian, Laplacian, and non-stationary noise—but its performance deteriorated in the presence of pulse noise. This suggests that a preprocessing step for pulse noise removal should be considered in such scenarios.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MATLAB R2022b code used to generate input data, the final results, and the processing code are attached to the paper under the link https://doi.org/10.5281/zenodo.17507866.

Acknowledgments

Computer simulations were carried out at the Academic Computer Center in Gdańsk.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Derivation of (57)

Derivation of (57) is based on the identity

\begin{matrix} \int_{C^{p}} exp \{- \frac{1}{2} α^{H} A α - \frac{1}{2} α^{H} b - \frac{1}{2} b^{H} α - c\} d α = \frac{{(2 π)}^{p / 2}}{{| A |}^{1 / 2}} exp \{\frac{1}{2} b^{H} A^{- 1} b - c\} \end{matrix}

(A1)

valid for any

p \times p

positive definite Hermitian matrix

A

, which stems from properties of the multivariate Gaussian distribution and the following “completing the squares” identity

\begin{matrix} α^{H} A α + α^{H} b + b^{H} α = {(α + A^{- 1} b)}^{H} A (α + A^{- 1} b) - b^{H} A^{- 1} b . \end{matrix}

Combining (55) with (56), one obtains

A = \frac{1}{σ_{z}^{2}} (I_{m n} + Σ)

,

b = - \frac{1}{σ_{z}^{2}} \sum_{j = - k}^{k} F^{H} (j) \tilde{θ} (t + j) = - \frac{1}{σ_{z}^{2}} {\hat{α}}^{fLBF} (t)

and

c = \frac{1}{2 σ_{z}^{2}} \sum_{j = - k}^{k} | | \tilde{θ} (t + j) {| |}_{2}^{2}

which, when substituted to (A1), leads to (57).

Appendix B. Derivation of (61)

Using the identity

\begin{matrix} | A + x x^{H} | = | A | (1 + x^{H} A^{- 1} x) \end{matrix}

(A2)

which holds true for any nonsingular matrix

A

, one arrives at

\begin{matrix} | Σ (t; η, μ) | & = \prod_{i = 1}^{n} | Σ_{i} (t; η, μ) | \\ | Σ_{i} (t; η, μ) | & = | η Δ_{i} (t) + μ γ_{i} (t) f_{0} f_{0}^{H} | \\ = | η Δ_{i} (t) | [1 + \frac{μ γ_{i} (t)}{η} f_{0}^{H} Δ_{i}^{- 1} (t) f_{0}] \end{matrix}

where

\begin{matrix} | η Δ_{i} (t) | = η^{m} \prod_{l = 1}^{m} δ_{i l} (t), f_{0}^{H} Δ_{i}^{- 1} (t) f_{0} = \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{δ_{i l} (t)} . \end{matrix}

In a similar way, one can show that

\begin{matrix} | I_{m n} + Σ (t; η, μ) | = \prod_{i = 1}^{n} | I_{m} + Σ_{i} (t; η, μ) | \\ | I_{m n} + Σ_{i} (t; η, μ) | = | G_{i} (t; η) + μ γ_{i} (t) f_{0} f_{0}^{H} | \\ = | G_{i} (t; η) | [1 + μ γ_{i} (t) f_{0}^{H} G_{i}^{- 1} (t; η) f_{0}] \end{matrix}

where

\begin{matrix} | G_{i} (t; η) | = | I_{m} + η Δ_{i} (t) | = \prod_{l = 1}^{m} [1 + η δ_{i l} (t)] \\ f_{0}^{H} G_{i}^{- 1} (t; η) f_{0} = \sum_{l = 1}^{m} \frac{| f_{l} {(0) |}^{2}}{1 + η δ_{i l} (t)} \end{matrix}

Substituting all results into (59), one arrives at (61).

References

Tsatsanis, M.K.; Giannakis, G.B. Modeling and equalization of rapidly fading channels. Int. J. Adapt. Contr. Signal Process. 1996, 10, 159–176. [Google Scholar] [CrossRef]
Stojanovic, M.; Preisig, J. Underwater acoustic communication channels: Propagation models and statistical characterization. IEEE Commun. Mag. 2009, 47, 84–89. [Google Scholar] [CrossRef]
Songzuo, L.; Iqbal, B.; Khan, I.U.; Ahmed, N.; Qiao, G.; Zhou, F. Full Duplex Physical and MAC Layer-Based Underwater Wireless Communication Systems and Protocols: Opportunities, Challenges, and Future Directions. J. Mar. Sci. Eng. 2021, 9, 468. [Google Scholar] [CrossRef]
Söderström, T.; Stoica, P. System Identification; Prentice-Hall: Hoboken, NJ, USA, 1988; ISBN 978-0-13-881236-2. [Google Scholar]
Haykin, S. Adaptive Filter Theory; Prentice-Hall: Hoboken, NJ, USA, 1996; ISBN 978-0-13-322760-4. [Google Scholar]
Niedźwiecki, M. Identification of Time-Varying Processes; Wiley: Hoboken, NJ, USA, 2000; ISBN 978-0-471-98629-4. [Google Scholar]
Kitagawa, G.; Gersch, W. A smoothness priors time-varying AR coefficient modeling of nonstationary covariance time series. IEEE Trans. Autom. Control 1985, 30, 48–56. [Google Scholar] [CrossRef]
Kitagawa, G.; Gersch, W. Smoothness Priors Analysis of Time Series; Springer: Berlin/Heidelberg, Germany, 1996; ISBN 978-0-387-94819-5. [Google Scholar]
Niedźwiecki, M. Locally adaptive cooperative Kalman smoothing and its application to identification of nonstationary stochastic systems. IEEE Trans. Signal Process. 2012, 60, 48–59. [Google Scholar] [CrossRef]
Niedźwiecki, M. Functional series modeling approach to identification of nonstationary stochastic systems. IEEE Trans. Autom. Control 1988, 33, 955–961. [Google Scholar] [CrossRef]
Tsatsanis, M.K.; Giannaki, G.B. Time-varying system identification and model validation using wavelets. IEEE Trans. Signal Process. 1993, 41, 3512–3523. [Google Scholar] [CrossRef]
Borah, D.K.; Hart, B.D. Frequency-selective fading channel estimation with a polynomial time-varying channel model. IEEE Trans. Commun. 1999, 47, 862–873. [Google Scholar] [CrossRef]
Wei, H.L.; Liu, J.J.; Billings, S.A. Identification of time-varying systems using multi-resolution wavelet models. Int. J. Syst. Sci. 2002, 33, 1217–1228. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Ciołek, M. Generalized Savitzky-Golay filters for identification of nonstationary systems. Automatica 2020, 108, 108477. [Google Scholar] [CrossRef]
Ljung, L.; Chen, T. What can regularization offer for estimation of dynamic systems? In Proceedings of the 11th IFAC Workshop on Adaptation and Learning in Control and Signal Processing, Caen, France, 3–5 July 2013; pp. 1–8. [Google Scholar]
Chen, T.; Ohlsson, H.; Ljung, L. On the estimation of transfer functions, regularizations and Gaussian process—Revisited. Automatica 2012, 48, 1525–1535. [Google Scholar] [CrossRef]
Pillonetto, G.; Dinuzzo, F.; Chen, T.; De Nicolao, G.; Ljung, L. Kernel methods in system identification, machine learning and function estimation: A survey. Automatica 2014, 50, 657–682. [Google Scholar] [CrossRef]
Gańcza, A.; Niedźwiecki, M.; Ciołek, M. Regularized local basis function approach to identification of nonstationary processes. IEEE Trans. Signal Process. 2021, 69, 1665–1680. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Gańcza, A.; Kaczmarek, P. Identification of fast time-varying communication channels using the preestimation technique. In Proceedings of the 19th Symposium on System Identification, SYSID, Padova, Italy, 13–16 July 2021; Volume 54, pp. 351–356. [Google Scholar]
Tibshirani, R. Regression shrinkage and selection via the LASSO. J. R. Statist. Soc. B 1996, 58, 267–288. [Google Scholar] [CrossRef]
Tipping, M.E. Sparse Bayesian Learning and the Relevance Vector Machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
Daubechies, I.; Defrise, M.; De Mol, C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 2004, 57, 1413–1457. [Google Scholar] [CrossRef]
Beck, A.; Teboulle, M. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
Davis, G.; Mallat, S.; Avellaneda, M. Adaptive greedy approximations. Constr. Approx. 1997, 13, 57–98. [Google Scholar] [CrossRef]
Needell, D.; Tropp, J.A. Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 2009, 26, 301–321. [Google Scholar] [CrossRef]
Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends® Mach. Learn. 2011, 3, 1–122. [Google Scholar] [CrossRef]
Gańcza, A.; Niedźwiecki, M. Regularized identification of fast time-varying systems—Comparison of two regularization strategies. In Proceedings of the 60th IEEE Conference on Decision and Control, Austin, TX, USA, 14–17 December 2021; pp. 864–871. [Google Scholar]
Shen, L.; Zakharov, Y.; Henson, B.; Morozs, N.; Mitchell, P. Adaptive filtering for full-duplex UWA systems with time-varying self-interference channel. IEEE Access 2020, 8, 187590–187604. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Ciołek, M.; Gańcza, A. A new look at the statistical identification of nonstationary systems. Automatica 2020, 118, 109037. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Gańcza, A.; Ciołek, M. On the preestimation technique and its application to identification of nonstationary systems. In Proceedings of the 59th Conference on Decision and Control CDC 2020, Jeju Island, Republic of Korea, 14–18 December 2020; pp. 286–293. [Google Scholar]
Zakharov, Y.V.; White, G.P.; Liu, J. Low-complexity RLS algorithms using dichotomous coordinate descent iterations. IEEE Trans. Signal Process. 2008, 56, 3150–3161. [Google Scholar] [CrossRef]
Friedman, J.; Hastie, T.; Tibshirani, R. A note on the group lasso and sparse group lasso. arXiv 2010, arXiv:1001.0736. [Google Scholar] [CrossRef]
Bani, M.S.; Chalmers, B.L. Best approximation in L^inf via iterative Hilbert space procedures. J. Approx. Theory 1984, 42, 173–180. [Google Scholar] [CrossRef]
Burrus, C.S.; Barreto, J.A.; Selesnick, I.W. Iterative reweighted least squares design of FIR filters. IEEE Trans. Signal Process. 1994, 42, 2926–2936. [Google Scholar] [CrossRef]
Gańcza, A. Local Basis Function Method for Identification of Nonstationary Systems. Ph.D. Thesis, The Gdańsk University of Technology, Gdańsk, Poland, 2024. [Google Scholar]
Good, I.J. The Estimation of Probabilities; MIT Press: Cambridge, MA, USA, 1965; ISBN 978-0-262-57015-2. [Google Scholar]
Niedźwiecki, M.; Gańcza, A.; Shen, L.; Zakharov, Y. Adaptive identification of linear systems with mixed static and time-varying parameters. Signal Process. 2020, 200, 108664. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Gańcza, A.; Shen, L.; Zakharov, Y. On Bidirectional preestimates and their application to identification of fast time-varying systems. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar]
Niedźwiecki, M.; Ciołek, M. Fully adaptive Savitzky-Golay type smoothers. In Proceedings of the 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain, 2–6 September 2019; pp. 1–5. [Google Scholar]
Miller, S.L.; Childers, D.G.S. Probability and Random Processes: With Applications to Signal Processing and Communications, 2nd ed.; Academic Press: Cambridge, MA, USA, 2012; ISBN 978-0-12-386981-4. [Google Scholar]
Huber, P.J. Robust Estimation of a Location Parameter. Ann. Math. Stat. 1964, 35, 73–101. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Gańcza, A.; Żuławiński, W.; Wyłomańska, A. Robust local basis function algorithms for identification of time-varying FIR systems in impulsive noise environments. In Proceedings of the 63rd Conference on Decision and Control (CDC), Milan, Italy, 16–19 December 2024; pp. 3463–3470. [Google Scholar]
Ciołek, M.; Niedźwiecki, M. Detection of impulsive disturbances in archive audio signals. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 671–675. [Google Scholar]
Niedźwiecki, M.; Gańcza, A. Karhunen-Loeve-based approach to tracking of rapidly fading wireless communication channels. Signal Process. 2023, 209, 109043. [Google Scholar] [CrossRef]
Niedźwiecki, M.; Gańcza, A. On Optimal Tracking of Rapidly Varying Telecommunication Channels. IEEE Trans. Signal Process. 2024, 72, 2726–2738. [Google Scholar] [CrossRef]

Figure 1. The first three basis functions before normalization in the range

j \in [- k, k]

: on the left, powers of time; on the right, cosine functions.

Figure 2. From top to bottom: the true parameter trajectory (red) with the LBF parameter estimate superimposed (black) for

k = 200

and

m = 5

; and, at the bottom, the corresponding estimation error along with the mean square estimation error (MSE).

Figure 3. This figure shows

3 \times 3

grid of plots of the estimation error

θ_{1} (t) - {\hat{θ}}_{1} (t)

for different LBF settings, obtained for all combinations of

k \in [50, 100, 200]

and

m \in [1, 3, 5]

. In each plot, the parameter mean square estimation error (MSE) is displayed in the top-left corner.

Figure 4. The parameter estimation error

θ_{1} (t) - {\hat{θ}}_{1} (t)

for different SNR levels

[10, 20, 30]

dB for the setting

k = 50

and

m = 5

. In each plot, the parameter mean square estimation error (MSE) is shown in the top-left corner.

Figure 5. From top to bottom: (A) the true parameter trajectory (red) with pre-estimates superimposed (black) at an SNR level of 10 dB; (B) the true parameter trajectory (red) with fLBF estimates superimposed (black) for

k = 50

and

m = 5

; (C) the estimation error,

θ_{1} (t) - {\hat{θ}}_{1} (t)

, with the mean square estimation error (MSE) indicated in the top-right corner.

Figure 6. The true time-varying channel coefficient trajectory (a typical example) was used in the system Equation (1) to generate the system output

y (t)

at different SNR levels.

Figure 7. Convergence of the average FIT [%] and SICF [dB] values for the multi-step dfRLBF

[B, j]

algorithm, evaluated for

k = 100

and

m = 3

at an SNR level of 10 dB over twenty iterations in the multi-step procedure (

j = 1, \dots, 20

). The presented results were averaged over time for a single realization of the channel model (a typical example).

Figure 8. A snapshot of the true impulse response of the identified system (top figure) and three corresponding parameter channel errors obtained using the LBF, fLBF, and dfRLBF

[B, 2]

, respectively, for

k = 100

and

m = 3

settings at an SNR level of 10 dB.

Figure 9. Three measurement noise sequences with different distributions. From top to bottom: (A) Laplacian noise, (B) non-stationary noise with time-varying variance, and (C) mixture noise—Gaussian noise contaminated by occasional impulsive (pulse) noise modeled for parameters

κ = 100, ϵ = 0.01

.

Figure 10. A snapshot of the true impulse response (solid line) and superimposed OMP estimates (circles) for

N_{OMP} = 20

and

N_{OMP} = 50

atoms are shown in Plots (A) and (B), respectively. OMP results were generated under parameters

k = 100

and

m = 3

at an SNR of 10 dB. Plots (C,D), directly below, illustrate the parameter estimation errors corresponding to the OMP results presented in Plots (A,B).

Table 1. Average FIT [%] and SICF [dB] values for the LBF, fLBF, and dfLBF algorithms. Results are presented across varying design parameters: local estimation window size (

K = 2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), and at signal-to-noise ratio (SNR) levels of 10 dB, 20 dB, and 30 dB. Best performance values among the compared methods are highlighted in bold for each unique parameter combination and SNR level; a method with a greater number of highlighted results signifies its superior overall performance under identical experimental conditions. Cases with insufficient data for the LBF algorithm are denoted by ‘x’.

Table 1. Average FIT [%] and SICF [dB] values for the LBF, fLBF, and dfLBF algorithms. Results are presented across varying design parameters: local estimation window size (

K = 2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), and at signal-to-noise ratio (SNR) levels of 10 dB, 20 dB, and 30 dB. Best performance values among the compared methods are highlighted in bold for each unique parameter combination and SNR level; a method with a greater number of highlighted results signifies its superior overall performance under identical experimental conditions. Cases with insufficient data for the LBF algorithm are denoted by ‘x’.

		FIT [%]			SICF [dB]
Method	k/m	1	3	5	1	3	5
		10 dB
LBF	100	91.8	85.4	x	25.1	22.2	x
LBF	300	88.8	93.9	92	21.4	27.4	25.4
fLBF	100	92	88.6	86	25.2	22.7	21.2
fLBF	300	90.1	93	91.5	21.8	26	24.6
dfLBF	100	92.3	88.7	86.1	25.5	22.8	21.1
dfLBF	300	90.7	93.3	91.7	22.3	26.4	24.9
		20 dB
LBF	100	95.3	95.4	x	30.2	32.2	x
LBF	300	89.4	97.9	97.5	21.8	36.5	35.4
fLBF	100	96.5	95.5	94.4	31.9	31.1	30.1
fLBF	300	91.1	96.9	96.5	22.4	32.8	32.2
dfLBF	100	97.1	95.9	94.7	34.1	31.9	30.2
dfLBF	300	91.9	97.7	97.1	23.1	35.6	34.2
		30 dB
LBF	100	95.9	98.5	x	31.3	42.2	x
LBF	300	89.4	98.9	99.2	21.8	42.1	45.4
fLBF	100	97.3	97	96.3	33.9	34.9	34.9
fLBF	300	91.3	97.7	97.5	22.5	34.9	35
dfLBF	100	98.3	97.6	96.7	38.5	37.4	35.9
dfLBF	300	92	98.8	98.6	23.2	41.5	40.4

Table 2. Average FIT [%] and SICF [dB] values for the dfRLBF algorithm, comparing various gains selection approaches: [A] empirical Bayes approach, [B] decentralized approach, and [C] cross-validation approach. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at SNR levels of 10 dB, 20 dB, and 30 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold. An adaptive approach, dynamically selecting m and k at each time instant, is denoted by ‘A’.

Table 2. Average FIT [%] and SICF [dB] values for the dfRLBF algorithm, comparing various gains selection approaches: [A] empirical Bayes approach, [B] decentralized approach, and [C] cross-validation approach. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at SNR levels of 10 dB, 20 dB, and 30 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold. An adaptive approach, dynamically selecting m and k at each time instant, is denoted by ‘A’.

		FIT [%]			SICF [dB]
Method	k/m	1	3	5	1	3	5
		10 dB
dfRLBF [A]	100	94.4	92.1	90.6	27.6	25.3	23.9
	300	91.2	94.9	94	22.6	28.5	27.3
	A	94.2			27.3
dfRLBF [B]	100	94.1	92.7	91.7	27.4	25.8	24.7
	300	91.2	95.3	94.6	22.6	29.2	28.1
	A	94.4			27.7
dfRLBF [C]	100	93.7	91.3	89.7	26.9	24.9	23.6
	300	91.1	94.8	93.9	22.6	28.4	27.3
	A	90.7			24
		20 dB
dfRLBF [A]	100	97.7	97.1	96.3	35.2	33.9	32.5
	300	91.9	98.1	97.9	23.1	37.2	36.3
	A	98			36.6
dfRLBF [B]	100	97.7	97.2	96.7	35.1	33.9	32.8
	300	91.9	98.3	98.1	23.1	37.7	36.9
	A	98.2			37.2
dfRLBF [C]	100	97.6	96.9	96.2	35.1	33.9	32.6
	300	91.9	98.1	97.8	23.1	37.1	36.3
	A	97.5			35
		30 dB
dfRLBF [A]	100	98.5	98.3	97.6	38.6	38.8	37.1
	300	92	99	98.9	23.2	42.2	42.1
	A	98.9			41.8
dfRLBF [B]	100	98.4	98.2	97.7	37.5	36.7	35.6
	300	92	98.9	98.9	23.2	41	40.6
	A	98.9			40.9
dfRLBF [C]	100	98.5	98.3	97.7	38.7	38.8	37
	300	92	99	98.9	23.2	42.3	42.2
	A	99			42.6

Table 3. Average FIT [%] and SICF [dB] values for the dfRLBF [B] algorithm with varying numbers of iterations

[B, j], j = 1, 2, \dots

in the multi-step procedure. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at an SNR level of 10 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold.

Table 3. Average FIT [%] and SICF [dB] values for the dfRLBF [B] algorithm with varying numbers of iterations

[B, j], j = 1, 2, \dots

in the multi-step procedure. Performance is evaluated across design parameters

k \in [100, 300]

and

m \in [1, 3, 5]

, and at an SNR level of 10 dB. Best performance values for each unique parameter combination k, m, and SNR are highlighted in bold.

		FIT [%]			SICF [dB]
Method	k/m	1	3	5	1	3	5
		10 dB
dfRLBF $[B, 1]$	100	94.1	92.7	91.7	27.4	25.8	24.7
dfRLBF $[B, 1]$	300	91.2	95.3	94.6	22.6	29.2	28.1
dfRLBF $[B, 2]$	100	94.3	93.1	92.3	27.7	26.1	25
dfRLBF $[B, 2]$	300	91.2	95.6	95	22.6	29.5	28.5
dfRLBF $[B, 3]$	100	94.4	93.2	92.5	27.7	26.1	25.1
dfRLBF $[B, 3]$	300	91.2	95.6	95	22.6	29.5	28.6
dfRLBF $[B, 4]$	100	94.4	93.3	92.5	27.7	26.1	25.1
dfRLBF $[B, 4]$	300	91.2	95.6	95	22.6	29.5	28.6

Table 5. Average FIT [%], SICF [dB] values, and processing time (

T_{p}

) in seconds for the Orthogonal Matching Pursuit (OMP) algorithm. The table demonstrates the impact of the number of selected atoms (

N_{OMP}

, chosen from

m \times n

possible atoms in the dictionary) on performance across various SNR levels (10 dB, 20 dB, 30 dB). Processing time is measured per one realization of channel coefficients of length 12,000 samples. The best results per SNR level are indicated in bold.

Table 5. Average FIT [%], SICF [dB] values, and processing time (

T_{p}

) in seconds for the Orthogonal Matching Pursuit (OMP) algorithm. The table demonstrates the impact of the number of selected atoms (

N_{OMP}

, chosen from

m \times n

possible atoms in the dictionary) on performance across various SNR levels (10 dB, 20 dB, 30 dB). Processing time is measured per one realization of channel coefficients of length 12,000 samples. The best results per SNR level are indicated in bold.

		FIT [%]			SICF [dB]
$N_{OMP}$	$T_{p}$ [s]	10 dB	20 dB	30 dB	10 dB	20 dB	30 dB
10	128	91	91.9	92	22.4	23.2	23.4
20	230	93.5	96.8	97.4	26.2	32	33.3
30	327	92.2	96.6	97.5	25.1	32.2	34.4
40	431	91.3	96.4	97.3	24.3	31.9	34.5
50	528	90.7	96.1	97.2	23.9	31.6	34.5

Table 6. Average FIT [%] and SICF [dB] values for the Orthogonal Matching Pursuit (OMP) algorithm, using

N_{OMP} = 20

selected atoms. For these experiments, atom dictionaries were constructed from matrices

F (j), j \in [- k, k]

for each combination of design parameters: fixed

n = 50

,

k \in [100, 300]

, and

m \in [1, 3, 5]

. The best results per SNR level are indicated in bold.

Table 6. Average FIT [%] and SICF [dB] values for the Orthogonal Matching Pursuit (OMP) algorithm, using

N_{OMP} = 20

selected atoms. For these experiments, atom dictionaries were constructed from matrices

F (j), j \in [- k, k]

for each combination of design parameters: fixed

n = 50

,

k \in [100, 300]

, and

m \in [1, 3, 5]

. The best results per SNR level are indicated in bold.

		FIT [%]			SICF [dB]
SNR	k/m	1	3	5	1	3	5
10 dB	100	93.4	93.5	93.4	26.3	26.2	26.1
10 dB	300	90.49	95.3	95.2	22	28.2	28.3
20 dB	100	96.8	96.8	96.8	32.3	32	31.9
20 dB	300	91.2	96.5	96.4	22.4	31	30.7
30 dB	100	97.4	97.4	97.3	33.6	33.3	33.2
30 dB	300	91.2	96.6	96.5	22.5	31.3	31

Table 7. Average

Δ

FIT [%] and

Δ

SICF [dB] values, quantifying the performance difference between the dfRLBF [B] and OMP (

N_{OMP} = 20

atoms) algorithms. The table illustrates the impact of varying design parameters: local estimation window size (

2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), across SNR levels of 10 dB, 20 dB, and 30 dB. Positive values indicate dfRLBF [B]’s superior performance, whereas negative values suggest OMP’s better performance.

Table 7. Average

Δ

FIT [%] and

Δ

SICF [dB] values, quantifying the performance difference between the dfRLBF [B] and OMP (

N_{OMP} = 20

atoms) algorithms. The table illustrates the impact of varying design parameters: local estimation window size (

2 k + 1

, for

k \in [100, 300]

) and number of basis functions (

m \in [1, 3, 5]

), across SNR levels of 10 dB, 20 dB, and 30 dB. Positive values indicate dfRLBF [B]’s superior performance, whereas negative values suggest OMP’s better performance.

		$Δ$ FIT [%]			$Δ$ SICF [dB]
SNR	k/m	1	3	5	1	3	5
10 dB	100	0.7	−0.8	−1.7	1.1	−0.4	−1.4
10 dB	300	0.71	0	−0.6	0.6	1	−0.2
20 dB	100	0.9	0.4	−0.1	2.8	1.9	0.9
20 dB	300	0.7	1.8	1.7	0.7	6.1	5.6
30 dB	100	1	0.8	0.4	3.9	3.4	2.4
30 dB	300	0.8	2.3	2.4	0.7	9.7	9.6

Table 8. Summary of proposed multistage approach for identification of non-stationary sparse communication channels.

Aspect	Description
Goal	Identification of non-stationary communication channels exhibiting sparsity, approximated by an FIR model.
Estimation framework	Sliding window approach with model parameters updated at each time instant.
Rate of channel variation	Tracks slow, medium, and fast variations; no assumption of local stationarity.
Algorithm features	Approximated closed-form solution; computationally efficient, recursive, adaptive, and does not require matrix inversion.
Numerical stability	Recursive fLBF algorithm is marginally stable; periodic reset required (e.g., every 1000 steps).
Measurement noise	Validated using various noise models, including Gaussian, Laplacian, non-stationary, and pulse noise. High estimation accuracy achieved in all cases except for pulse noise; robustness can be enhanced by excluding or interpolating corrupted samples (see [42,43]).
Sparse problems	Handles sparse systems using two reweighted $ℓ^{2}$ regularizers; both regularization gains are adaptively optimized at each time instant.
Multistage method	Yes (three stages).
Stage 1	The pre-estimates are obtained using inverse filtering of EWLS estimates or, for reduced cost, the iterative dichotomous coordinate descent algorithm; followed by fLBF estimation with adaptive hyperparameter selection (e.g., number of basis functions, window size, decay profile).
Stage 2	fRLBF estimation with joint or separate optimization of regularization gains for each fLBF estimate.
Stage 3	Multi-step refinement procedure with a few iterations to further improve fRLBF estimates.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Identification of Non-Stationary Communication Channels with a Sparseness Property

Abstract

1. Introduction

2. Material and Methods

2.1. Local Basis Functions Estimators

2.2. fLBF Estimators

2.3. Regularized fLBF Estimators

2.4. Computational Complexity of fRLBF Estimators

2.5. Multi-Step Procedure of fRLBF Estimators

2.6. Selection of Regularization Gains

2.6.1. Empirical Bayes Approach

2.6.2. Decentralized Approach

2.6.3. Cross-Validation Approach

2.7. Debiasing

2.8. The Number of Basis Functions and the Analysis Window Size

3. Results and Discussion

3.1. Channel Model

3.2. Metrics

3.3. Simulation Experiment Settings

3.4. Results

3.5. Impact of Measurement Noise

3.6. Comparison with the Orthogonal Matching Pursuit

3.7. Discussion

4. Conclusions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Derivation of (57)

Appendix B. Derivation of (61)

References

Article Metrics

Citations

Article Access Statistics