Article

Sparse Diffusion Least Mean-Square Algorithm with Hard Thresholding over Networks

1 System LSI Business, Samsung Electronics, Hwaseong 18448, Republic of Korea
2 Department of Applied Artificial Intelligence, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(22), 4638; https://doi.org/10.3390/math11224638
Submission received: 7 October 2023 / Revised: 5 November 2023 / Accepted: 11 November 2023 / Published: 14 November 2023
(This article belongs to the Section Dynamical Systems)

Abstract

This paper proposes a distributed estimation technique based on the diffusion least mean-square (LMS) algorithm, specifically designed for sparse systems in which many coefficients of the system are zero. To efficiently exploit the sparse representation of the system and achieve promising performance, we incorporate $L_0$-norm regularization into the diffusion LMS algorithm. This integration is accomplished by embedding a hard-thresholding operation, derived through a variable splitting method, into the update equation. The efficacy of our approach is validated by a comprehensive theoretical analysis that rigorously examines the mean stability as well as the transient and steady-state behaviors of the proposed algorithm. The proposed algorithm preserves the behavior of large coefficients and strongly enforces smaller coefficients toward zero through the relaxation of the $L_0$-norm regularization. Experimental results show that the proposed algorithm achieves superior convergence performance compared with conventional sparse algorithms.

1. Introduction

The prevalence of the Internet-of-Things (IoT) in all aspects of life has considerably increased the importance of distributed estimation over networks to identify parameters of interest using the collective data from a group of sensor nodes in the network [1]. Broadly, distributed estimation approaches can be classified into three types: incremental [2], consensus [3,4], and diffusion [5,6,7] strategies.
In the incremental approach [2], a predetermined data path cycles through all the sensor nodes in the network. During each cycle, the algorithm iteratively processes the data from each node in this sequence, continuously refining its estimates until the optimization objective is met. Despite its conceptual simplicity, this method comes with a substantial computational challenge: finding an optimal cyclic path that passes through all nodes is an NP-hard problem, making it computationally intensive to solve. In contrast, consensus strategies require each sensor node to continuously share its locally processed data with all neighboring nodes in the network at every time instant [3,4]. While this can accelerate convergence to a network-wide consensus, it often proves impractical for real-world applications. Continuous data sharing at high frequencies can overwhelm the computational capacities of the sensor nodes and can also lead to network congestion, especially when bandwidth resources are limited.
Diffusion-based methods have gained considerable attention due to their superior performance compared to consensus algorithms [6,8,9]. This advantage is largely attributed to the effective utilization of spatio-temporal data dynamics across all nodes in the network [10]. Depending on the adaptive method employed, there are many variants of diffusion adaptive algorithms: diffusion least mean-square (LMS) [5], diffusion normalized LMS [11,12], diffusion recursive least-squares [13], and the diffusion affine projection algorithm [14]. Among these, the diffusion LMS algorithm serves as a foundational structure for the development of more advanced mechanisms owing to its inherent simplicity [15,16,17]. In particular, our focus is on the adapt-then-combine (ATC) implementation of the diffusion LMS, which generally outperforms the combine-then-adapt (CTA) implementation [6]. For brevity, we will refer to this as the diffusion LMS algorithm, omitting the “ATC” qualifier.
In many practical systems, the impulse response is largely sparse, meaning that the response vector contains a few relatively large parameters while the rest are zero. Despite this intrinsic sparsity, sparsity information is not considered in the cost function of the traditional diffusion LMS algorithms [5,6,7]. Incorporating such sparsity knowledge has been shown to enhance estimation performance in the field of adaptive filters [18,19,20,21,22,23]. Sparsity was first exploited primarily in compressed sensing (CS) methods for signal reconstruction, where the initial batch-processing algorithms required substantial memory and computational resources [24]. Sparsity regularization has also been used successfully in various other applications [25,26,27]. Motivated by these achievements, various diffusion LMS algorithms that take sparsity regularization into account have recently been proposed [7,28,29,30,31,32,33,34,35].
One simple approach to exploiting sparsity employs proportionate mechanisms in the diffusion LMS algorithm, in which greater gains are assigned to larger weights of the impulse response [28,29]. While proportionate diffusion LMS algorithms improve the convergence speed during the transient state for sparse systems, their ability to decrease the steady-state error remains limited compared with standard diffusion LMS algorithms [28,29]. As an alternative, sparsity is incorporated into the mean-square error (MSE) cost function of the standard diffusion LMS [7,30,31]. These sparsity-constrained diffusion LMS algorithms employ zero-attracting terms in the objective function, motivated by the LASSO technique [36], that force small weights toward zero, which in turn reduces the steady-state error in sparse systems. Stemming from this concept, an $L_1$-norm regularization-based zero-attracting (ZA) diffusion LMS algorithm and its reweighted version (RZA) have been proposed, both of which show improvements over the traditional diffusion LMS algorithms [30]. To further improve performance over the ZA- and RZA-based diffusion LMS algorithms, the diffusion polynomial zero-attraction LMS algorithm was proposed [32]. Moreover, to improve robustness against non-Gaussian noise, diffusion Versoria zero-attraction LMS algorithms have been proposed, which adaptively determine the strength of zero attraction based on the maximum correntropy criterion and the lncosh cost [33]. Additionally, fused sparse diffusion algorithms have emerged [34,35], employing two regularization terms: one for the sparsity of the weights and the other for the similarity of adjacent weights.
In this paper, we propose a novel sparse diffusion LMS algorithm employing a hard-thresholding technique based on $L_0$-norm regularization. Hard thresholding is a simple operation that sets input values below a specified threshold to zero. The operation is derived using a variable splitting strategy for minimizing the diffusion LMS cost function with $L_0$-norm regularization, as elucidated in [37,38], which effectively forces insignificant weights to zero. This algorithm is particularly beneficial in reducing the steady-state error when estimating a sparse impulse response with many zero weights. We elucidate the theoretical convergence behavior of the proposed method by performing mean stability and mean-square analyses. Simulation results show the superiority of the proposed algorithm over conventional algorithms, such as the diffusion LMS and the RZA diffusion LMS, especially in system identification applications.
This paper is organized as follows. In Section 2, we derive a sparse diffusion LMS algorithm utilizing a hard-thresholding technique. Section 3 presents performance analysis results including both the mean stability condition and the mean-square performance analysis. The simulation results of the proposed algorithm are presented in Section 4. Finally, some conclusions are drawn in Section 5.
The notations adopted throughout this paper are listed in Table 1. All vectors are column vectors except for the input regressors, which are taken to be row vectors for convenience of notation. We use boldface letters for random variables and normal letters for deterministic quantities, e.g., $\mathbf{w}_{k,i}$ and $w_{k,i}$.

2. Algorithm Formulation

2.1. Derivation of Sparse Diffusion LMS with Hard Thresholding

Our objective is to estimate an unknown sparse vector of interest, denoted as $w^o$, over a network comprising N nodes. Each node in this network endeavors to estimate the common $w^o$. At every time instant i, node k collects an observation $d_k(i)$ generated by the linear model

$d_k(i) = \mathbf{u}_{k,i} w^o + v_k(i),$  (1)

where $\mathbf{u}_{k,i}$ is a $1 \times M$ input regressor, expressed as $\mathbf{u}_{k,i} = [u_k(i), \ldots, u_k(i-M+1)]$ for node k, and $v_k(i)$ represents Gaussian measurement noise with zero mean and variance $\sigma_{v,k}^2$. Consequently, $d_k(i)$ represents the measured signal at sensor node k, inclusive of additive noise, at time instant i. We assume that this additive noise is independent of the input regressors throughout the network. The distributed sparse estimation problem seeks the optimal vector $w^o$ by minimizing the global cost function:
$J^{\mathrm{glob}}(w) = \sum_{k=1}^{N} E\left| d_k(i) - \mathbf{u}_{k,i} w \right|^2 + \gamma f(w),$  (2)
where $f(w)$ represents a sparse regularization function with a regularization weight $\gamma$ used to exploit the sparsity of the unknown vector $w^o$. In this paper, we employ the $L_0$-norm for $f(w)$, represented as $\| w \|_0$. Following the derivation for the diffusion process presented in [6,30], we adopt the distributed cost function:
$J_k^{\mathrm{dist}}(w) = \sum_{l \in \mathcal{N}_k} c_{l,k}\, E\left| d_l(i) - \mathbf{u}_{l,i} w \right|^2 + \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} \left\| w - \psi_l \right\|^2 + \gamma \| w \|_0,$  (3)
where $\psi_l$ denotes the intermediate estimate of $w^o$ at node l, $\mathcal{N}_k$ represents the set of neighbors connected with node k, $\mathcal{N}_k \setminus \{k\}$ denotes the neighbors of node k excluding the node itself, $c_{l,k}$ is a nonnegative weight of node l on node k that satisfies $\sum_{l \in \mathcal{N}_k} c_{l,k} = 1$, and $b_{l,k}$ is another nonnegative weight of node l on node k.
Motivated by a variable splitting strategy [37,38], we introduce an auxiliary weight vector, $w_{\mathrm{aux}}$, into the distributed cost function as follows:
$J_k^{\mathrm{dist}}(w, w_{\mathrm{aux}}) = \sum_{l \in \mathcal{N}_k} c_{l,k}\, E\left| d_l(i) - \mathbf{u}_{l,i} w_{\mathrm{aux}} \right|^2 + \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} \left\| w_{\mathrm{aux}} - \psi_l \right\|^2 + \rho_k \gamma \left\| w_{\mathrm{aux}} - w \right\|^2 + \gamma \| w \|_0,$  (4)
where $\rho_k$ is an auxiliary regularization parameter that determines the threshold for zero attraction. To solve this minimization problem, the cost function is divided into two subproblems. With a fixed $w$, the first subproblem, used to obtain an estimate of $w_{\mathrm{aux}}$, is
$J_k^{\mathrm{dist},1}(w_{\mathrm{aux}}) = \sum_{l \in \mathcal{N}_k} c_{l,k}\, E\left| d_l(i) - \mathbf{u}_{l,i} w_{\mathrm{aux}} \right|^2 + \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} \left\| w_{\mathrm{aux}} - \psi_l \right\|^2 + \rho_k \gamma \left\| w_{\mathrm{aux}} - w \right\|^2.$  (5)
By applying the steepest-descent approach to (5), the update equation for the estimate $w_{\mathrm{aux},k,i}$ at node k and time i becomes
$w_{\mathrm{aux},k,i} = w_{\mathrm{aux},k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{l,k} \left( r_{du,l} - R_{u,l}\, w_{\mathrm{aux},k,i-1} \right) + \mu_k \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} \left( \psi_l - w_{\mathrm{aux},k,i-1} \right) + \mu_k \rho_k \gamma \left( w - w_{\mathrm{aux},k,i-1} \right),$  (6)
where $r_{du,k} = E[d_k(i)\, \mathbf{u}_{k,i}^*]$, $R_{u,k} = E[\mathbf{u}_{k,i}^* \mathbf{u}_{k,i}]$, and $\mu_k$ is a step-size parameter for node k. We can divide (6) into two update equations involving an intermediate estimate $\psi_{\mathrm{aux},k,i}$ at node k and time i as follows:
$\psi_{\mathrm{aux},k,i} = w_{\mathrm{aux},k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{l,k} \left( r_{du,l} - R_{u,l}\, w_{\mathrm{aux},k,i-1} \right) + \mu_k \rho_k \gamma \left( w - w_{\mathrm{aux},k,i-1} \right),$  (7)

$w_{\mathrm{aux},k,i} = \psi_{\mathrm{aux},k,i} + \mu_k \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k} \left( \psi_l - w_{\mathrm{aux},k,i-1} \right).$  (8)
By substituting $w_{\mathrm{aux},k,i-1}$ and $\psi_l$ in (8) with $\psi_{\mathrm{aux},k,i}$ and $\psi_{\mathrm{aux},l,i}$, respectively, substituting $w$ in (7) with $w_{k,i-1}$, and introducing the parameters $a_{l,k} = \mu_k b_{l,k}$ ($l \neq k$) and $a_{k,k} = 1 - \mu_k \sum_{l \in \mathcal{N}_k \setminus \{k\}} b_{l,k}$ [6,30], we can rewrite (7) and (8) as
$\psi_{\mathrm{aux},k,i} = w_{\mathrm{aux},k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{l,k} \left( r_{du,l} - R_{u,l}\, w_{\mathrm{aux},k,i-1} \right) + \mu_k \rho_k \gamma \left( w_{k,i-1} - w_{\mathrm{aux},k,i-1} \right),$  (9)

$w_{\mathrm{aux},k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, \psi_{\mathrm{aux},l,i}.$  (10)
Subsequently, we find the $w_k$ that minimizes the following second cost function given $w_{\mathrm{aux},k}$:
$J_k^{\mathrm{dist},2}(w_k) = \left\| w_{\mathrm{aux},k} - w_k \right\|^2 + \frac{1}{\rho_k} \| w_k \|_0.$  (11)
Utilizing the hard-thresholding technique [38], the update equation of $w_k$ can be represented as

$w_{k,i}(n) = \begin{cases} 0, & \text{if } \left| w_{\mathrm{aux},k,i}(n) \right| < \sqrt{2/\rho_k} \\ w_{\mathrm{aux},k,i}(n), & \text{otherwise}. \end{cases}$  (12)
Finally, employing the instantaneous approximations $r_{du,l} \approx d_l(i)\, \mathbf{u}_{l,i}^*$ and $R_{u,l} \approx \mathbf{u}_{l,i}^* \mathbf{u}_{l,i}$ in (9), the final algorithm can be derived as follows:
$\psi_{\mathrm{aux},k,i} = w_{\mathrm{aux},k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{l,k}\, \mathbf{u}_{l,i}^* \left( d_l(i) - \mathbf{u}_{l,i}\, w_{\mathrm{aux},k,i-1} \right) + \mu_k \rho_k \gamma \left( w_{k,i-1} - w_{\mathrm{aux},k,i-1} \right),$  (13)

$w_{\mathrm{aux},k,i} = \sum_{l \in \mathcal{N}_k} a_{l,k}\, \psi_{\mathrm{aux},l,i},$  (14)

$w_{k,i}(n) = \begin{cases} 0, & \text{if } \left| w_{\mathrm{aux},k,i}(n) \right| < \sqrt{2/\rho_k} \\ w_{\mathrm{aux},k,i}(n), & \text{otherwise}, \end{cases}$  (15)

for $k = 1, \ldots, N$ and $n = 1, \ldots, M$.
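To make the adapt–combine–threshold structure of (13)–(15) concrete, the following NumPy sketch carries out one iteration for real-valued signals. It is only a minimal illustration, not the authors' reference implementation: the function name, data layout, and per-node loops are our own choices.

```python
import numpy as np

def sparse_diffusion_lms_step(w_aux_prev, w_prev, d, u, C, A, mu, rho, gamma, neighbors):
    """One iteration of (13)-(15): adapt, combine, then hard-threshold (real-valued case).

    w_aux_prev, w_prev : (N, M) auxiliary and thresholded estimates at time i-1
    d : (N,) measurements d_k(i);  u : (N, M) regressors u_{k,i}
    C, A : (N, N) adaptation and combination matrices with entries c_{l,k}, a_{l,k}
    mu, rho : (N,) per-node step sizes and threshold parameters
    neighbors : list of index arrays; neighbors[k] holds the nodes in N_k (including k)
    """
    mu = np.asarray(mu, dtype=float)
    rho = np.asarray(rho, dtype=float)
    N, M = w_aux_prev.shape
    psi_aux = np.empty((N, M))
    for k in range(N):
        # Adaptation step (13): weighted LMS corrections from neighbors plus zero-attraction pull
        grad = np.zeros(M)
        for l in neighbors[k]:
            err = d[l] - u[l] @ w_aux_prev[k]
            grad += C[l, k] * err * u[l]
        psi_aux[k] = (w_aux_prev[k] + mu[k] * grad
                      + mu[k] * rho[k] * gamma * (w_prev[k] - w_aux_prev[k]))
    # Combination step (14): convex combination of the neighbors' intermediate estimates
    w_aux = np.empty((N, M))
    for k in range(N):
        w_aux[k] = sum(A[l, k] * psi_aux[l] for l in neighbors[k])
    # Hard thresholding (15): zero every entry whose magnitude is below sqrt(2 / rho_k)
    w = np.where(np.abs(w_aux) < np.sqrt(2.0 / rho)[:, None], 0.0, w_aux)
    return w_aux, w
```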

2.2. Guideline for Determining Thresholds

In (15), the appropriate choice of the threshold parameter $\rho_k$ is crucial for attaining a good solution; it is therefore essential to provide a clear guideline for determining $\rho_k$. If the threshold is set excessively high, relatively large weights across the nodes may be forced toward negligible values. Conversely, if the threshold is set too small, relatively small weights may not be effectively drawn toward zero. To avoid these situations, it is reasonable to set the threshold greater than the steady-state absolute mean of the zero tap weights. This guideline ensures the attraction of only the zero tap weights, without disturbing the convergence of the non-zero taps. In this regard, we utilize the following assumption [21]:
Assumption 1. 
The weights $\mathbf{w}_{k,i}$ follow a Gaussian distribution.
Assumption 1 is commonly adopted in the performance analysis of sparse adaptive filters and is consistent with empirical simulation results [21]. Under this assumption, the steady-state absolute mean of the zero weights can be expressed as
$E\left| \mathbf{w}_{k,\infty}(n) \right| = \sqrt{\frac{2}{\pi}\, E\left| \tilde{\mathbf{w}}_{k,\infty}(n) \right|^2},$  (16)
where $\tilde{\mathbf{w}}_{k,i} = w^o - \mathbf{w}_{k,i}$. For an index n corresponding to a zero weight, we have $E\left| \tilde{\mathbf{w}}_{k,i}(n) \right| = E\left| \mathbf{w}_{k,i}(n) \right|$, since $w^o(n) = 0$. However, obtaining a tractable expression for the steady-state mean-squared deviation $E\left| \tilde{\mathbf{w}}_{k,\infty}(n) \right|^2$ of the diffusion LMS algorithm in (16) remains a difficult challenge [5,6]. If $\gamma$ is small and we assume no measurement exchange, i.e., $c_{k,k} = 1$ and $c_{l,k} = 0$ for $l \neq k$, then $\psi_{k,i}$ is approximately updated through the standard LMS algorithm. Using this tractable simplification, according to [39], the steady-state mean-squared deviation reduces to
$E\left| \tilde{\mathbf{w}}_{k,\infty}(n) \right|^2 = \frac{\mu_k \sigma_{v,k}^2}{2 - \mu_k \sigma_{u,k}^2 (M+2)},$  (17)
where $\sigma_{u,k}^2 = E\left| u_k(i) \right|^2$. It is advisable to set the threshold proportional to the absolute mean of the zero weights. Consequently, from (15), $\rho_k$ can be defined as
$\rho_k = \frac{\pi \left( 2 - \mu_k \sigma_{u,k}^2 (M+2) \right)}{\alpha\, \mu_k \sigma_{v,k}^2},$  (18)
where α serves as a user parameter to control the threshold.
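As a quick illustration of this guideline, the snippet below evaluates (18) and the resulting hard-threshold level $\sqrt{2/\rho_k}$. It is a minimal sketch: the unit regressor variance and the noise variance in the example call are illustrative placeholders chosen by us, not values reported in the paper.

```python
import numpy as np

def threshold_parameter(mu_k, sigma_u2_k, sigma_v2_k, M, alpha):
    """Threshold parameter rho_k from (18); the hard-threshold level in (15) is sqrt(2 / rho_k)."""
    return np.pi * (2.0 - mu_k * sigma_u2_k * (M + 2)) / (alpha * mu_k * sigma_v2_k)

# Example with the step size, filter length, and alpha used in Section 4
# (mu = 0.03, M = 32, alpha = 3.25); sigma_v2_k = 0.02 is an assumed placeholder.
rho_k = threshold_parameter(mu_k=0.03, sigma_u2_k=1.0, sigma_v2_k=0.02, M=32, alpha=3.25)
print(rho_k, np.sqrt(2.0 / rho_k))
```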

3. Performance Analysis

For the performance analysis, the estimate $w_{k,i}$ is regarded as a realization of a random process $\mathbf{w}_{k,i}$. First, the error weight vectors for node k are defined as
$\tilde{\mathbf{w}}_{k,i} = w^o - \mathbf{w}_{k,i},$  (19)

$\tilde{\boldsymbol{\psi}}_{k,i} = w^o - \boldsymbol{\psi}_{k,i},$  (20)

$\tilde{\mathbf{w}}_{\mathrm{aux},k,i} = w^o - \mathbf{w}_{\mathrm{aux},k,i}.$  (21)
With these notations, the global vectors are defined as
$\tilde{\mathbf{w}}_i = \operatorname{col}\{\tilde{\mathbf{w}}_{1,i}, \ldots, \tilde{\mathbf{w}}_{N,i}\}, \quad \tilde{\boldsymbol{\psi}}_i = \operatorname{col}\{\tilde{\boldsymbol{\psi}}_{1,i}, \ldots, \tilde{\boldsymbol{\psi}}_{N,i}\}, \quad \tilde{\mathbf{w}}_{\mathrm{aux},i} = \operatorname{col}\{\tilde{\mathbf{w}}_{\mathrm{aux},1,i}, \ldots, \tilde{\mathbf{w}}_{\mathrm{aux},N,i}\}, \quad W^o = \operatorname{col}\{w^o, \ldots, w^o\}.$  (22)
Subsequently, the step-size matrix is defined as
$\mathcal{M} = \operatorname{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\},$  (23)
and the threshold matrix is
$\mathcal{P} = \operatorname{diag}\{\rho_1 I_M, \ldots, \rho_N I_M\}.$  (24)
Furthermore, the weighting matrices are represented as
$\mathcal{C} = C \otimes I_M, \qquad \mathcal{A} = A \otimes I_M \qquad (NM \times NM),$  (25)
where $C$ is the adaptation matrix with entries $c_{l,k}$, and $A$ is the combination matrix with entries $a_{l,k}$. We also define the following matrices:
$\mathbf{D}_i = \operatorname{diag}\left\{ \sum_{l=1}^{N} c_{l,1}\, \mathbf{u}_{l,i}^* \mathbf{u}_{l,i}, \ldots, \sum_{l=1}^{N} c_{l,N}\, \mathbf{u}_{l,i}^* \mathbf{u}_{l,i} \right\} \quad (NM \times NM),$  (26)

$\mathbf{V}_i = \mathcal{C}^T \operatorname{col}\left\{ \mathbf{u}_{1,i}^* v_1(i), \ldots, \mathbf{u}_{N,i}^* v_N(i) \right\} \quad (NM \times 1).$  (27)
From Equations (13) and (14), we have the following relations:
$\tilde{\boldsymbol{\psi}}_{\mathrm{aux},i} = \tilde{\mathbf{w}}_{\mathrm{aux},i-1} - \mathcal{M} \left( \mathbf{D}_i \tilde{\mathbf{w}}_{\mathrm{aux},i-1} + \mathbf{V}_i \right) + \gamma \mathcal{M} \mathcal{P} \left( \tilde{\mathbf{w}}_{i-1} - \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \right),$  (28)

$\tilde{\mathbf{w}}_{\mathrm{aux},i} = \mathcal{A}^T \tilde{\boldsymbol{\psi}}_{\mathrm{aux},i}.$  (29)
By combining (28) and (29), we have a single recursion:
$\tilde{\mathbf{w}}_{\mathrm{aux},i} = \mathcal{A}^T \left( I - \mathcal{M} \mathbf{D}_i \right) \tilde{\mathbf{w}}_{\mathrm{aux},i-1} - \mathcal{A}^T \mathcal{M} \mathbf{V}_i + \gamma \mathcal{A}^T \mathcal{M} \mathcal{P} \left( \tilde{\mathbf{w}}_{i-1} - \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \right).$  (30)
In addition, a binary matrix is defined via thresholding:
$\mathbf{Q}_i = \operatorname{diag}\left\{ \operatorname{diag}(\mathbf{q}_{1,i}), \ldots, \operatorname{diag}(\mathbf{q}_{N,i}) \right\},$  (31)
where $\mathbf{q}_{k,i}$ satisfies
$\mathbf{q}_{k,i}(n) = \begin{cases} 0, & \text{if } \left| \mathbf{w}_{\mathrm{aux},k,i}(n) \right| > \sqrt{2/\rho_k} \\ 1, & \text{otherwise}. \end{cases}$  (32)
With this matrix, the relationship between $\mathbf{w}_i$ and $\mathbf{w}_{\mathrm{aux},i}$ is reformulated as
$\mathbf{w}_i = (I - \mathbf{Q}_i)\, \mathbf{w}_{\mathrm{aux},i},$  (33)

which yields

$\tilde{\mathbf{w}}_i = (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} + \mathbf{Q}_i W^o.$  (34)
Using this relationship, (30) is rewritten as
$\tilde{\mathbf{w}}_{\mathrm{aux},i} = \mathcal{A}^T \left( I - \mathcal{M} \mathbf{D}_i - \gamma \mathcal{M} \mathcal{P} \mathbf{Q}_{i-1} \right) \tilde{\mathbf{w}}_{\mathrm{aux},i-1} - \mathcal{A}^T \mathcal{M} \mathbf{V}_i + \gamma \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_{i-1} W^o.$  (35)

3.1. Assumptions

For the sake of mathematical tractability, the following assumptions are adopted. Although they do not hold exactly in general, they significantly simplify the analysis while yielding theoretical results that align well with empirical observations. Most of these assumptions stem from the analytical results associated with the original diffusion LMS algorithm and its variants.
Assumption 2. 
It is assumed that all weight vectors $\mathbf{w}_{\mathrm{aux},k,i}$ and $\mathbf{w}_{k,i}$, the input vector $\mathbf{u}_{k,i}$, and the additive noise $v_k(i)$ are mutually independent of each other.
Assumption 3. 
The matrix $\mathbf{Q}_i$ is assumed to be independent of all weight vectors $\mathbf{w}_{\mathrm{aux},i}$ and $\mathbf{w}_i$, and of the input and additive-noise matrices $\mathbf{D}_i$ and $\mathbf{V}_i$ [29,40].
Assumption 4. 
The components of $\mathbf{Q}_i$ are assumed to be mutually uncorrelated, i.e., $E\left[ \mathbf{q}_{k,i}(n)\, \mathbf{q}_{l,i}(m) \right] = E\left[ \mathbf{q}_{k,i}(n) \right] E\left[ \mathbf{q}_{l,i}(m) \right]$ for all $k$, $l$, $m$, and $n$ [29,40].

3.2. Mean Stability Analysis

Under these assumptions, and recognizing that $\tilde{\mathbf{w}}_{i-1} - \tilde{\mathbf{w}}_{\mathrm{aux},i-1} = \mathbf{w}_{\mathrm{aux},i-1} - \mathbf{w}_{i-1}$, the expectation of (30) can be reformulated as
$E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i} \right] = \mathcal{A}^T \left( I - \mathcal{M} D \right) E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \right] + \gamma \mathcal{A}^T \mathcal{M} \mathcal{P}\, E\left[ \mathbf{w}_{\mathrm{aux},i-1} - \mathbf{w}_{i-1} \right],$  (36)
where $D = E[\mathbf{D}_i]$. The recursion (36) can be further expanded as
$E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i} \right] = B^i\, E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},0} \right] + \sum_{n=0}^{i-1} B^n \Delta_{i-n},$  (37)

where $B = \mathcal{A}^T (I - \mathcal{M} D)$ and $\Delta_i = \gamma \mathcal{A}^T \mathcal{M} \mathcal{P}\, E\left[ \mathbf{w}_{\mathrm{aux},i-1} - \mathbf{w}_{i-1} \right]$.
To ensure the convergence of $E[\tilde{\mathbf{w}}_{\mathrm{aux},i}]$, the spectral radius of $B$ should be less than one, i.e., $\lambda_{\max}\left( \mathcal{A}^T (I - \mathcal{M} D) \right) < 1$, where $\lambda_{\max}(A)$ denotes the maximum eigenvalue of a Hermitian positive semi-definite matrix $A$ [30]. In this regard, $\mu_k$ should comply with
$0 < \mu_k < \frac{2}{\lambda_{\max}\left( \sum_{l=1}^{N} c_{l,k} R_{u,l} \right)}, \qquad k = 1, \ldots, N.$  (38)
Under the step-size condition (38), the bias in the mean sense, $E[\tilde{\mathbf{w}}_{\mathrm{aux},i}]$, converges as $i \to \infty$, and it can be deduced that
$\lim_{i \to \infty} E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i} \right] = \lim_{i \to \infty} \gamma Z\, E\left[ \mathbf{w}_{\mathrm{aux},i} - \mathbf{w}_i \right],$  (39)
where
$Z = \left( I - \mathcal{A}^T (I - \mathcal{M} D) \right)^{-1} \mathcal{A}^T \mathcal{M} \mathcal{P}.$  (40)
Furthermore, from (37), the steady-state expectation of the maximum norm of the bias is bounded as [30]
$\lim_{i \to \infty} \left\| E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i} \right] \right\|_\infty = \lim_{i \to \infty} \left\| \sum_{j=0}^{i-1} B^j \Delta_{i-j} \right\|_\infty \leq \lim_{i \to \infty} \sum_{j=0}^{i-1} \left\| B^j \Delta_{i-j} \right\|_\infty.$  (41)
Let us define $\delta = \| B \|_\infty$ and $b_{\max} = \max_i \left\| E\left[ \mathbf{w}_{\mathrm{aux},i} - \mathbf{w}_i \right] \right\|_\infty$; then we have
$\lim_{i \to \infty} \sum_{j=0}^{i-1} \left\| B^j \Delta_{i-j} \right\|_\infty \leq \lim_{i \to \infty} \sum_{j=0}^{i-1} \| B \|_\infty^j \left\| \Delta_{i-j} \right\|_\infty \leq \frac{\gamma\, \mu_{\max}\, \rho_{\max}\, b_{\max}}{1 - \delta},$  (42)
where $\mu_{\max}$ and $\rho_{\max}$ represent the maximum step size and threshold parameter across all nodes, respectively. From (15), the absolute difference $\left| \mathbf{w}_{\mathrm{aux},k,i}(n) - \mathbf{w}_{k,i}(n) \right|$ is always less than or equal to $\sqrt{2/\rho_k}$. Consequently, $b_{\max} \leq \sqrt{2/\rho_{\min}}$, where $\rho_{\min}$ is the smallest threshold parameter across all nodes. As a result, the inequality in (42) can be rewritten as
$\lim_{i \to \infty} \left\| E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i} \right] \right\|_\infty \leq \frac{\gamma\, \mu_{\max}\, \rho_{\max}\, b_{\max}}{1 - \delta} \leq \frac{\gamma\, \mu_{\max}\, \rho_{\max} \sqrt{2/\rho_{\min}}}{1 - \delta}.$  (43)
Given that $\tilde{\mathbf{w}}_{\mathrm{aux},i}$ converges, it follows that $\tilde{\mathbf{w}}_i$ also converges in the mean sense.
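For a numerical sanity check of the stability condition (38), the sketch below computes the per-node step-size upper bound from the weighted combination of regressor covariance matrices. The function name and data layout are our own illustrative choices, not part of the paper.

```python
import numpy as np

def max_stable_step_size(C, R_u, k):
    """Upper bound on mu_k from (38): 2 / lambda_max( sum_l c_{l,k} R_{u,l} ).

    C   : (N, N) adaptation matrix with entries c_{l,k} (zero for l outside N_k)
    R_u : (N, M, M) regressor covariance matrices R_{u,l}
    k   : node index
    """
    R_k = np.tensordot(C[:, k], R_u, axes=1)   # sum_l c_{l,k} R_{u,l}
    lam_max = np.linalg.eigvalsh(R_k).max()    # R_k is Hermitian positive semi-definite
    return 2.0 / lam_max
```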

3.3. Transient Analysis in Mean Square

In this section, we delineate the transient behavior of the mean-square deviation from a theoretical perspective. Consider $\Sigma$ as a Hermitian positive semi-definite matrix that we are free to choose. We use the notation $\| \tilde{\mathbf{w}}_i \|_\Sigma^2$ to denote the squared weighted norm of $\tilde{\mathbf{w}}_i$ with respect to $\Sigma$, i.e., $\| \tilde{\mathbf{w}}_i \|_\Sigma^2 = \tilde{\mathbf{w}}_i^* \Sigma\, \tilde{\mathbf{w}}_i$.
From (35), the expectation of the squared weighted norm of $\tilde{\mathbf{w}}_{\mathrm{aux},i}$ is represented as
E w ˜ aux , i Σ 2 = E w ˜ aux , i 1 I D i M γ Q i P M A Σ A T I M D i γ M P Q i 2 + E V i * M A Σ A T M V i + γ 2 E W o T Q i P M A Σ A T M P Q i W o + 2 γ E W o T Q i P M A Σ A T I M D i γ M P Q i w ˜ aux , i 1 .
Under Assumptions 2–4, Equation (44) can be reformulated as follows:
$E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_\Sigma^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \|_{\Sigma'}^2 + \operatorname{Tr}\left( \Sigma\, \mathcal{A}^T \mathcal{M} G \mathcal{M} \mathcal{A} \right) + \gamma^2 \operatorname{Tr}\left( E\left[ \Sigma\, \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \right] \right) + 2\gamma \operatorname{Tr}\left( E\left[ \Sigma\, \mathcal{A}^T \left( I - \mathcal{M} D - \gamma \mathcal{M} \mathcal{P} \mathbf{Q}_i \right) \tilde{\mathbf{w}}_{\mathrm{aux},i-1} W^{oT} \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \right] \right),$  (45)
where
$\Sigma' = \mathcal{A} \Sigma \mathcal{A}^T - D \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T - \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} D - \gamma\, \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i - \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T + \gamma\, D \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i + \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} D + E\left[ \mathbf{D}_i \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} \mathbf{D}_i \right] + \gamma^2 E\left[ \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \Sigma \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i \right],$  (46)
and
$G = E\left[ \mathbf{V}_i \mathbf{V}_i^* \right] = \mathcal{C}^T \operatorname{diag}\left\{ \sigma_{v,1}^2 R_{u,1}, \ldots, \sigma_{v,N}^2 R_{u,N} \right\} \mathcal{C}.$  (47)
Let $\sigma = \operatorname{vec}(\Sigma)$ and $\Sigma = \operatorname{vec}^{-1}(\sigma)$, where $\operatorname{vec}^{-1}(\cdot)$ is the inverse of the $\operatorname{vec}(\cdot)$ operation. Following the relation between $\sigma$ and $\Sigma$, we get
$\sigma' = \operatorname{vec}(\Sigma') = \mathbf{F}_i\, \sigma,$  (48)
where the matrix $\mathbf{F}_i$ is given by
$\mathbf{F}_i = \left\{ I \otimes I - I \otimes D\mathcal{M} - D\mathcal{M} \otimes I - \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \otimes I - \gamma\, I \otimes \mathbf{Q}_i \mathcal{P} \mathcal{M} + \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \otimes D\mathcal{M} + \gamma\, D\mathcal{M} \otimes \mathbf{Q}_i \mathcal{P} \mathcal{M} + \gamma^2 E\left[ \mathbf{Q}_i \mathcal{P} \mathcal{M} \otimes \mathbf{Q}_i \mathcal{P} \mathcal{M} \right] + E\left[ \mathbf{D}_i \mathcal{M} \otimes \mathbf{D}_i \mathcal{M} \right] \right\} (\mathcal{A} \otimes \mathcal{A}).$  (49)
Using the property $\operatorname{Tr}(\Sigma X) = \operatorname{vec}(X^T)^T \sigma$, we can rewrite (45) as follows:
$E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_\sigma^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \|_{\mathbf{F}_i \sigma}^2 + \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} G \mathcal{M} \mathcal{A} \right)^T \sigma + \gamma^2 \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \right)^T \sigma + 2\gamma\, \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i W^o E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i-1}^T \right] \left( I - D\mathcal{M} - \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \right) \mathcal{A} \right)^T \sigma.$  (50)
Let us define
$\mathbf{b}_i = \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} G \mathcal{M} \mathcal{A} \right) + \gamma^2 \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \mathcal{P} \mathcal{M} \mathcal{A} \right) + 2\gamma\, \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_i W^o E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i-1}^T \right] \left( I - D\mathcal{M} - \gamma\, \mathbf{Q}_i \mathcal{P} \mathcal{M} \right) \mathcal{A} \right).$  (51)
We can rewrite (50) as [41]
$E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_\sigma^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i-1} \|_{\mathbf{F}_i \sigma}^2 + \mathbf{b}_i^T \sigma.$  (52)
Expanding (52), the expectation of the squared weighted norm of $\tilde{\mathbf{w}}_{\mathrm{aux},i}$ is
$E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_\sigma^2 = E\left\| \tilde{\mathbf{w}}_{\mathrm{aux},0} \right\|^2_{\left( \prod_{j=1}^{i} \mathbf{F}_j \right) \sigma} + \Delta_i^T \sigma,$  (53)
where
$\prod_{j=1}^{i} \mathbf{F}_j = \left( \prod_{j=1}^{i-1} \mathbf{F}_j \right) \mathbf{F}_i,$  (54)

$\Delta_i = \mathbf{F}_i^T \Delta_{i-1} + \mathbf{b}_i, \qquad \Delta_0 = \mathbf{b}_0.$  (55)
When we assume that $w_{\mathrm{aux},k,0}(n) = 0$ for all nodes, $\tilde{\mathbf{w}}_{\mathrm{aux},0}$ can be replaced with $W^o$. Given that $\tilde{\mathbf{w}}_i = (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} + \mathbf{Q}_i W^o$, the mean-square deviation at node k, $\mathrm{MSD}_k(i)$, is expressed as
$\mathrm{MSD}_k(i) = E\| \tilde{\mathbf{w}}_{k,i} \|^2 = E\left\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \right\|^2_{\left( (I - \mathbf{Q}_i) \otimes (I - \mathbf{Q}_i) \right) \mathbf{m}_k} + \mathbf{m}_k^T \operatorname{vec}\left( \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \right) + 2\, \mathbf{m}_k^T \operatorname{vec}\left( \mathbf{Q}_i W^o E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i}^T \right] (I - \mathbf{Q}_i) \right),$  (56)
where
$\mathbf{m}_k = \operatorname{vec}\left( \operatorname{diag}(e_k) \otimes I_M \right),$  (57)
and $e_k$ is the kth standard basis vector, i.e., the kth column of the $N \times N$ identity matrix $I_N$: an $N \times 1$ vector with a one in the kth position and zeros elsewhere. The detailed derivation is provided in Appendix A.

3.4. Steady-State Analysis in Mean Square

In the steady state, using (52) with $\mathbf{F}$ and $\mathbf{b}$ denoting the steady-state values of $\mathbf{F}_i$ and $\mathbf{b}_i$, we have

$E\| \tilde{\mathbf{w}}_{\mathrm{aux},\infty} \|_{(I - \mathbf{F})\sigma}^2 = \mathbf{b}^T \sigma.$  (58)
Assuming that the matrix $I - \mathbf{F}$ is invertible, (58) can be rewritten as
$E\| \tilde{\mathbf{w}}_{\mathrm{aux},\infty} \|_\sigma^2 = \mathbf{b}^T \left( I - \mathbf{F} \right)^{-1} \sigma.$  (59)
If we assume that $E[\mathbf{w}_{\mathrm{aux},k,\infty}] = w^o$ [23], $\mathbf{b}$ can be simplified to
$\mathbf{b} = \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} G \mathcal{M} \mathcal{A} \right) + \gamma^2 \operatorname{vec}\left( \mathcal{A}^T \mathcal{M} \mathcal{P} \mathbf{Q}_\infty W^o W^{oT} \mathbf{Q}_\infty \mathcal{P} \mathcal{M} \mathcal{A} \right),$  (60)
where
$E\left[ \mathbf{q}_{k,\infty}(n) \right] = \begin{cases} 0, & \text{if } \left| w^o(n) \right| > \sqrt{2/\rho_k} \\ 1, & \text{otherwise}. \end{cases}$  (61)
Therefore, using (59), the steady-state MSD at node k is given by
$\mathrm{MSD}_k(\infty) = E\| \tilde{\mathbf{w}}_{k,\infty} \|^2 = E\left\| \tilde{\mathbf{w}}_{\mathrm{aux},\infty} \right\|^2_{\left( (I - \mathbf{Q}_\infty) \otimes (I - \mathbf{Q}_\infty) \right) \mathbf{m}_k} + \mathbf{m}_k^T \operatorname{vec}\left( \mathbf{Q}_\infty W^o W^{oT} \mathbf{Q}_\infty \right).$  (62)

4. Simulation Results

4.1. Performance Comparison

We conducted computer simulations to illustrate the effectiveness of our proposed algorithm.
Simulation setup: We consider a network topology composed of 16 interconnected nodes, as depicted in the top of Figure 1. The input regressors are zero-mean white Gaussian with variances $\sigma_{u,k}^2$, as illustrated in the middle of Figure 1. The background white-noise powers of the individual nodes, denoted by $\sigma_{v,k}^2$, are shown in the bottom of Figure 1. The combination matrix A employs relative-degree weights, defined as $a_{l,k} = n_l / \sum_{m \in \mathcal{N}_k} n_m$. The adaptation matrix C uses the Metropolis weights given by $c_{l,k} = 1 / \max(n_k, n_l)$ [6,29]. Here, $n_k$ represents the degree of node k, which corresponds to the number of its neighbors. The length of the unknown vector is set to $M = 32$, and the regressor vector has the same length. Our study involves three unknown channels with 32 weights, reflecting different sparsity scenarios as shown in Figure 2. Initially, only one of the 32 weights of $w^o$ is set to one, resulting in a highly sparse system. After 2000 iterations, 16 randomly selected weights are set to one, leading to a sparsity ratio of 50%. After 4000 iterations, all weights are set to one, giving a fully dense (non-sparse) system.
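For reproducibility, the following sketch shows one way to build the combination matrix A (relative-degree weights) and the adaptation matrix C (Metropolis weights) from an adjacency matrix. It is an illustrative construction under our own assumptions: the neighborhood N_k is taken to include node k itself, and the Metropolis self-weight c_{k,k} is chosen so that each column of C sums to one (the paper only states the off-diagonal rule).

```python
import numpy as np

def combination_matrices(adj):
    """Build A (relative-degree weights) and C (Metropolis weights) from an adjacency matrix.

    adj : (N, N) symmetric 0/1 adjacency with adj[k, k] = 1, so each node is its own neighbor.
    Returns column-stochastic A and C with entries a_{l,k} and c_{l,k}.
    """
    N = adj.shape[0]
    deg = adj.sum(axis=1)                                  # n_k = |N_k|, assumed to include node k
    A = np.zeros((N, N))
    C = np.zeros((N, N))
    for k in range(N):
        nbrs = np.flatnonzero(adj[:, k])
        A[nbrs, k] = deg[nbrs] / deg[nbrs].sum()           # a_{l,k} = n_l / sum_{m in N_k} n_m
        for l in nbrs:
            if l != k:
                C[l, k] = 1.0 / max(deg[k], deg[l])        # c_{l,k} = 1 / max(n_k, n_l)
        C[k, k] = 1.0 - C[:, k].sum()                      # assumed self-weight so the column sums to one
    return A, C
```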
Performance comparison: In Figure 3, we present the learning curves of the network MSD for three diffusion algorithms: the diffusion LMS [13], the RZA diffusion LMS [30], and our proposed algorithm. To ensure a balanced comparison, we used a consistent step size of $\mu = 0.03$ across all algorithms. The simulation outcomes were obtained by averaging over 50 independent trials. The sparsity parameters of the RZA diffusion LMS were set to $\gamma = 5 \times 10^{-5}$ and $\varepsilon = 0.01$. Meanwhile, the parameters of our proposed algorithm were set to $\gamma = 5 \times 10^{-5}$ and $\alpha = 3.25$ (yielding $\rho_k \approx 1800$) to ensure an identical rate of convergence across all algorithms. As shown in Figure 3, in the highly sparse system, our proposed algorithm consistently outperforms both the standard diffusion LMS and the conventional RZA diffusion LMS in terms of steady-state performance. Its steady-state error is approximately 11.6 dB and 14.7 dB lower than that of the RZA diffusion LMS and the standard diffusion LMS, respectively. For the system whose unknown vector exhibits 50% sparsity, our proposed algorithm again achieves the lowest error, which is 2.1 dB and 3.3 dB lower than the respective algorithms. While the sparse diffusion LMS algorithms still maintain a lower steady-state MSD than the standard diffusion LMS, the performance difference becomes less pronounced as sparsity decreases. In the fully dense system, the performance of all algorithms is largely equivalent.

4.2. Theoretical Validation

In this subsection, we validate our theoretical findings against the empirical results. For this simulation, only the 9th weight was set to one and all the others were set to zero. The step size was set to $\mu = 0.01$. The other parameters remained consistent with those outlined in Section 4.1.
Figure 4 displays the transient network MSD of our proposed algorithm alongside its theoretical result. The theoretical result is well matched with the empirical result throughout all iterations.
Finally, to verify the steady-state MSD analysis, the theoretical result (62) is compared with the simulation outcomes at each node, as shown in Figure 5. Although the theoretical results are slightly higher than the empirical results at steady state, their overall agreement is satisfactory. This minor deviation can be attributed to the simplifications in our analysis stemming from several assumptions and approximations.

5. Conclusions

In this paper, we introduced a novel sparse distributed estimation method based on the diffusion LMS algorithm. Sparse systems are typically characterized by a predominance of zero coefficients. To exploit this sparse representation, we incorporated $L_0$-norm regularization into the diffusion LMS update equation by employing a hard-thresholding technique. Our rigorous statistical analysis provided theoretical insights into the mean stability as well as the transient and steady-state MSD behaviors. Simulation results confirmed the performance improvement of the proposed algorithm and demonstrated the consistency between the theoretical results and empirical observations. Notably, the proposed algorithm achieved superior performance compared to conventional algorithms, including the standard diffusion LMS and the RZA diffusion LMS, especially in the highly sparse scenario of distributed estimation over networks. Building on these findings, we also recognize the current limitations of the approach. Future work will explore extensions to sparse diffusion normalized LMS (NLMS) and sparse diffusion affine projection algorithms, aiming to speed up the convergence rate in the presence of highly correlated input signals.

Author Contributions

Conceptualization, H.-S.L. and S.-E.K.; methodology, H.-S.L.; software, H.-S.L.; validation, H.-S.L., C.J., C.S. and S.-E.K.; formal analysis, H.-S.L.; investigation, H.-S.L., C.J., C.S. and S.-E.K.; resources, H.-S.L.; writing—original draft preparation, H.-S.L. and S.-E.K.; writing—review and editing, C.J., C.S. and S.-E.K.; visualization, S.-E.K.; supervision, S.-E.K.; funding acquisition, S.-E.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by Seoul National University of Science and Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Equation (56) can be derived as follows. The mean-square deviation at node k, $E\| \tilde{\mathbf{w}}_{k,i} \|^2$, can be obtained from
$E\| \tilde{\mathbf{w}}_{k,i} \|^2 = E\left[ \tilde{\mathbf{w}}_i^T S_k \tilde{\mathbf{w}}_i \right] = E\| \tilde{\mathbf{w}}_i \|_{S_k}^2,$  (A1)
where $S_k = \operatorname{diag}(e_k) \otimes I_M$ is a selection matrix for node k. From the relationship $\tilde{\mathbf{w}}_i = (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} + \mathbf{Q}_i W^o$, Equation (A1) is reformulated as
$E\| \tilde{\mathbf{w}}_i \|_{S_k}^2 = E\left\| (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} + \mathbf{Q}_i W^o \right\|_{S_k}^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_{(I - \mathbf{Q}_i) S_k (I - \mathbf{Q}_i)}^2 + E\left[ W^{oT} \mathbf{Q}_i S_k \mathbf{Q}_i W^o \right] + 2\, E\left[ W^{oT} \mathbf{Q}_i S_k (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} \right].$  (A2)
Using the relation $x^T y = \operatorname{Tr}(y x^T)$, Equation (A2) can be rewritten as
$E\| \tilde{\mathbf{w}}_i \|_{S_k}^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_{(I - \mathbf{Q}_i) S_k (I - \mathbf{Q}_i)}^2 + \operatorname{Tr}\left( E\left[ S_k \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \right] \right) + 2 \operatorname{Tr}\left( E\left[ S_k (I - \mathbf{Q}_i)\, \tilde{\mathbf{w}}_{\mathrm{aux},i} W^{oT} \mathbf{Q}_i \right] \right).$  (A3)
Using the property $\operatorname{Tr}(S_k X) = \mathbf{m}_k^T \operatorname{vec}(X^T)$, where $\mathbf{m}_k = \operatorname{vec}(S_k)$, we can rewrite (A3) as follows:
$E\| \tilde{\mathbf{w}}_i \|_{\mathbf{m}_k}^2 = E\| \tilde{\mathbf{w}}_{\mathrm{aux},i} \|_{\left( (I - \mathbf{Q}_i) \otimes (I - \mathbf{Q}_i) \right) \mathbf{m}_k}^2 + \mathbf{m}_k^T \operatorname{vec}\left( \mathbf{Q}_i W^o W^{oT} \mathbf{Q}_i \right) + 2\, \mathbf{m}_k^T \operatorname{vec}\left( \mathbf{Q}_i W^o E\left[ \tilde{\mathbf{w}}_{\mathrm{aux},i}^T \right] (I - \mathbf{Q}_i) \right).$  (A4)

References

  1. Sayed, A.H. Adaptation, learning, and optimization over networks. Found. Trends Mach. Learn. 2014, 7, 311–801. [Google Scholar] [CrossRef]
  2. Lopes, C.G.; Sayed, A.H. Incremental adaptive strategies over distributed networks. IEEE Trans. Signal Process. 2007, 55, 4064–4077. [Google Scholar] [CrossRef]
  3. Schizas, I.D.; Mateos, G.; Giannakis, G.B. Distributed LMS for consensus-based in-network adaptive processing. IEEE Trans. Signal Process. 2009, 57, 2365–2382. [Google Scholar] [CrossRef]
  4. Soatti, G.; Nicoli, M.; Savazzi, S.; Spagnolini, U. Consensus-based algorithms for distributed network-state estimation and localization. IEEE Trans. Signal Inf. Process. Netw. 2016, 3, 430–444. [Google Scholar] [CrossRef]
  5. Lopes, C.G.; Sayed, A.H. Diffusion least-mean squares over adaptive networks: Formulation and performance analysis. IEEE Trans. Signal Process. 2008, 56, 3122–3136. [Google Scholar] [CrossRef]
  6. Cattivelli, F.S.; Sayed, A.H. Diffusion LMS strategies for distributed estimation. IEEE Trans. Signal Process. 2010, 58, 1035–1048. [Google Scholar] [CrossRef]
  7. Liu, Y.; Li, C.; Zhang, Z. Diffusion sparse least-mean squares over networks. IEEE Trans. Signal Process. 2012, 60, 4480–4485. [Google Scholar] [CrossRef]
  8. Sayed, A.H. Diffusion adaptation over networks. In Academic Press Library Signal Processing; Elsevier: Amsterdam, The Netherlands, 2014; Volume 3, pp. 323–453. [Google Scholar]
  9. Tu, S.Y.; Sayed, A.H. Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks. IEEE Trans. Signal Process. 2012, 60, 6217–6234. [Google Scholar] [CrossRef]
  10. Lee, J.W.; Kim, S.E.; Song, W.J.; Sayed, A.H. Spatio-temporal diffusion strategies for estimation and detection over networks. IEEE Trans. Signal Process. 2012, 60, 4017–4034. [Google Scholar] [CrossRef]
  11. Jung, S.M.; Seo, J.H.; Park, P. A variable step-size diffusion normalized least-mean-square algorithm with a combination method based on mean-square deviation. Circuits Syst. Signal Process. 2015, 34, 3291–3304. [Google Scholar] [CrossRef]
  12. Yu, Y.; He, H.; Yang, T.; Wang, X.; de Lamare, R.C. Diffusion normalized least mean M-estimate algorithms: Design and performance analysis. IEEE Trans. Signal Process. 2020, 68, 2199–2214. [Google Scholar] [CrossRef]
  13. Cattivelli, F.S.; Lopes, C.G.; Sayed, A.H. Diffusion recursive least-squares for distributed estimation over adaptive networks. IEEE Trans. Signal Process. 2008, 56, 1865–1877. [Google Scholar] [CrossRef]
  14. Li, L.; Chambers, J.A. Distributed adaptive estimation based on the APA algorithm over diffusion networks with changing topology. In Proceedings of the IEEE Statistical Signal Processing Workshop, Cardiff, UK, 31 August–3 September 2009; pp. 757–760. [Google Scholar]
  15. Takahashi, N.; Yamada, I.; Sayed, A.H. Diffusion least-mean squares with adaptive combiners: Formulation and performance Analysis. IEEE Trans. Signal Process. 2010, 58, 4795–4810. [Google Scholar] [CrossRef]
  16. Lee, H.S.; Kim, S.E.; Lee, J.W.; Song, W.J. A variable step-size diffusion LMS algorithm for distributed estimation. IEEE Trans. Signal Process. 2015, 63, 1808–1820. [Google Scholar] [CrossRef]
  17. Chu, Y.; Chan, S.C.; Zhou, Y.; Wu, M. A new diffusion variable spatial regularized LMS algorithm. Signal Process. 2021, 188, 108207. [Google Scholar] [CrossRef]
  18. Shi, K.; Shi, P. Convergence analysis of sparse LMS algorithms with l1-norm penalty based on white input signal. Signal Process. 2010, 90, 3289–3293. [Google Scholar] [CrossRef]
  19. Wagner, K.; Doroslovacki, M. Proportionate-type normalized least mean square algorithms with gain allocation motivated by mean-square error minimization for white input. IEEE Trans. Signal Process. 2011, 59, 2410–2415. [Google Scholar] [CrossRef]
  20. Murakami, Y.; Yamagishi, M.; Yukawa, M.; Yamada, I. A sparse adaptive filtering using time-varying soft-thresholding techniques. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 14–19 March 2010; pp. 3734–3737. [Google Scholar]
  21. Su, G.; Jin, J.; Gu, Y.; Wang, J. Performance analysis of l_0 norm constraint least mean square algorithm. IEEE Trans. Signal Process. 2012, 60, 2223–2235. [Google Scholar] [CrossRef]
  22. Chen, Y.; Gu, Y.; Hero III, A.O. Sparse LMS for system identification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; pp. 3125–3128. [Google Scholar]
  23. Lee, H.S.; Lee, J.W.; Song, W.J.; Kim, S.E. Adaptive algorithm for sparse system identification based on hard-thresholding techniques. IEEE Trans. Circuits Syst. II Exp. Briefs 2020, 67, 3597–3601. [Google Scholar] [CrossRef]
  24. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  25. Yang, X.; Wu, L.; Zhang, H. A space-time spectral order sinc-collocation method for the fourth-order nonlocal heat model arising in viscoelasticity. Appl. Math. Comput. 2023, 457, 128192. [Google Scholar] [CrossRef]
  26. Wang, W.; Zhang, H.; Jiang, X.; Yang, X. A high-order and efficient numerical technique for the nonlocal neutron diffusion equation representing neutron transport in a nuclear reactor. Ann. Nucl. Energy 2023, 195, 110163. [Google Scholar] [CrossRef]
  27. Jiang, X.; Wang, J.; Wang, W.; Zhang, H. A predictor-corrector compact difference scheme for a nonlinear fractional differential equation. Fractal Fract. 2023, 7, 521. [Google Scholar] [CrossRef]
  28. Yim, S.H.; Lee, H.S.; Song, W.J. Proportionate diffusion LMS algorithm for sparse distributed estimation. IEEE Trans. Circuits Syst. II Exp. Briefs 2015, 62, 992–996. [Google Scholar] [CrossRef]
  29. Lee, H.S.; Yim, S.H.; Song, W.J. z2-proportionate diffusion LMS algorithm with mean square performance analysis. Signal Process. 2017, 131, 154–160. [Google Scholar] [CrossRef]
  30. Lorenzo, P.D.; Sayed, A.H. Sparse distributed learning based on diffusion adaptation. IEEE Trans. Signal Process. 2013, 61, 1419–1433. [Google Scholar] [CrossRef]
  31. Das, B.K.; Chakraborty, M.; Arenas-García, J. Sparse distributed estimation via heterogeneous diffusion adaptive networks. IEEE Trans. Circuits Syst. II Exp. Briefs 2016, 63, 1079–1083. [Google Scholar] [CrossRef]
  32. Kumar, K.S.; George, N.V. Polynomial sparse adaptive estimation in distributed networks. IEEE Trans. Circuits Syst. II Exp. Briefs 2018, 65, 401–405. [Google Scholar]
  33. Nautiyal, M.; Bhattacharjee, S.S.; George, N.V. Robust and sparse aware diffusion adaptive algorithms for distributed estimation. IEEE Trans. Circuits Syst. II Exp. Briefs 2022, 69, 239–243. [Google Scholar] [CrossRef]
  34. Huang, W.; Chen, C.; Yao, X.; Li, Q. Diffusion fused sparse LMS algorithm over networks. Signal Process. 2020, 171, 107497. [Google Scholar] [CrossRef]
  35. Huang, W.; Shan, H.; Xu, J.; Yao, X. Adaptive diffusion pairwise fused Lasso LMS algorithm over networks. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 5816–5827. [Google Scholar] [CrossRef] [PubMed]
  36. Martin, R.K.; Sethares, W.A.; Williamson, R.C.; Johnson, C.R. Exploiting sparsity in adaptive filters. IEEE Trans. Signal Process. 2002, 50, 1883–1894. [Google Scholar] [CrossRef]
  37. Krishnan, D.; Fergus, R. Fast image deconvolution using hyper-Laplacian priors. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 6–11 December 2009; pp. 1033–1041. [Google Scholar]
  38. Zuo, W.; Meng, D.; Zhang, L.; Feng, X.; Zhang, D. A generalized iterated shrinkage algorithm for non-convex sparse coding. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 217–224. [Google Scholar]
  39. Haykin, S.S. Adaptive Filter Theory; Pearson Education India: Noida, India, 2005. [Google Scholar]
  40. Haddad, D.B.; Petraglia, R. Transient and steady-state MSE analysis of the IMPNLMS algorithm. Digital Signal Process. 2014, 33, 50–59. [Google Scholar] [CrossRef]
  41. Saeed, M.O.B.; Zerguine, A.; Zummo, S.A. A variable step-size strategy for distributed estimation over adaptive networks. EURASIP J. Adv. Signal Process. 2013, 1, 135. [Google Scholar] [CrossRef]
Figure 1. Network topology with 16 interconnected nodes (top), regressor variances $\sigma_{u,k}^2$ (middle), and noise variances $\sigma_{v,k}^2$ (bottom).
Figure 2. Unknown vectors: Channel 1 (top), Channel 2 (middle), and Channel 3 (bottom).
Figure 3. Network MSD comparison of the proposed algorithm with the standard [13] and RZA [30] diffusion LMS algorithms.
Figure 4. Theoretical transient MSD compared with empirical results.
Figure 5. Theoretical steady-state MSD at each node compared with empirical results.
Table 1. Mathematical Notations.

Notation | Description
$\| \cdot \|$ | Euclidean norm of its argument
$\| \cdot \|_0$ | $L_0$-norm of its argument
$\| \cdot \|_\infty$ | $L_\infty$-norm of its argument (maximum norm)
$E[\cdot]$ | Mathematical expectation
$\lambda_{\max}(\cdot)$ | Largest eigenvalue of a matrix
$\operatorname{Tr}(\cdot)$ | Trace operator
$(\cdot)^T$ | Transposition
$(\cdot)^*$ | Hermitian transposition
$\operatorname{col}\{\cdot\}$ | Column vector with its entries
$\operatorname{diag}\{\cdot\}$ | Diagonal matrix with its entries
$\operatorname{vec}(\cdot)$ | Stacks the columns of its matrix argument on top of each other
$I$ | Identity matrix
$\otimes$ | Kronecker product operation

Share and Cite

MDPI and ACS Style

Lee, H.-S.; Jin, C.; Shin, C.; Kim, S.-E. Sparse Diffusion Least Mean-Square Algorithm with Hard Thresholding over Networks. Mathematics 2023, 11, 4638. https://doi.org/10.3390/math11224638

