A Robust Diffusion Estimation Algorithm with Self-Adjusting Step-Size in WSNs

In wireless sensor networks (WSNs), each sensor node can estimate the global parameter from the local data in a distributed manner. This paper proposes a robust diffusion estimation algorithm based on a minimum error entropy criterion with a self-adjusting step-size, which is referred to as the diffusion MEE-SAS (DMEE-SAS) algorithm. The DMEE-SAS algorithm converges quickly and is robust against non-Gaussian noise in the measurements. A detailed performance analysis of the DMEE-SAS algorithm is provided. By combining the DMEE-SAS algorithm with the diffusion minimum error entropy (DMEE) algorithm, an Improving DMEE-SAS algorithm is proposed for a non-stationary environment where tracking is very important. The Improving DMEE-SAS algorithm avoids the insensitivity of the DMEE-SAS algorithm caused by its small effective step-size near the optimal estimator, while retaining a fast convergence speed. Numerical simulations are given to verify the effectiveness and advantages of the proposed algorithms.


Introduction
The problem of parameter estimation, which is the indirect determination of unknown parameters from measurements of other quantities [1][2][3][4][5][6], is a key issue in the signal processing field. Distributed estimation has become very popular for parameter estimation in wireless sensor networks. The objective is to enable the nodes to estimate a vector of parameters of interest in a distributed manner from the observed data. Distributed estimation schemes over adaptive networks can be mainly classified into incremental strategies [7][8][9], consensus strategies [10,11], and diffusion strategies [12][13][14][15][16][17][18][19][20][21][22]. In the incremental strategies, data are processed in a cyclic fashion through the network. The consensus strategies rely on the fusion of intermediate estimates of multiple neighboring nodes. In the diffusion strategies, information is processed at all nodes while the nodes communicate with all their neighbors to share their intermediate estimates. The diffusion strategies, such as the diffusion least mean squares (DLMS) algorithm [12], are particularly attractive because they are robust, flexible and fully distributed. In this paper, we focus on the diffusion estimation strategies.
The performance of distributed estimation degrades severely when the signals are perturbed by non-Gaussian noise. Non-Gaussian noise may be natural, due to atmospheric phenomena, or man-made, due to either electric machinery present in the operating environment or multipath telecommunications signals [23][24][25]. Recently, researchers have focused on improving the robustness of distributed estimation methods against non-Gaussian noise. The efforts are mainly directed at finding a more robust cost function to replace the MSE criterion, which is optimal only when the measurement noise is Gaussian. To address this problem, the diffusion least mean p-power (DLMP) algorithm, based on the p-norm error criterion, was proposed to estimate the parameters of wireless sensor networks [26]. Correntropy, a nonlinear similarity measure, has been successfully used as a robust and efficient cost function for non-Gaussian signal processing [27][28][29][30]. In [27], two robust diffusion algorithms based on the maximum correntropy criterion (MCC), namely the Adapt-then-Combine (ATC) and Combine-then-Adapt (CTA) diffusion maximum correntropy criterion (DMCC) algorithms, are developed to improve the performance of distributed estimation over networks in impulsive noise environments.
The error entropy criterion based on the minimum error entropy (MEE) method also has shown its ability to achieve more accurate estimates than mean-square error (MSE) under non-Gaussian noise [31][32][33][34][35][36][37]. In [31], the diffusion minimum error entropy (DMEE) was proposed. The DMEE algorithm achieved improved performance for non-Gaussian noise with the fixed step-size, but it still suffers from conflicting requirements between convergence rate and the steady-state mean square error. A large step-size leads to a fast convergence rate but a large mean-square error at the steady state. For this problem, variable step-size techniques have been widely used to improve the convergence of diffusion LMS algorithms remarkably by adjusting the step-size appropriately [38][39][40][41]. Lee et al. [38] proposed a novel variable step-size diffusion LMS algorithm which controls the step-size suboptimally to attain the minimum mean square error at each time instant. In [41], Abdolee investigated the effect of adaptation step-sizes on the tracking performance of DLMS algorithms in networks under non-stationary signal conditions. However, to the best of our knowledge, the variable step-size technique has not been extended to the field of distributed minimum error entropy estimation for non-Gaussian noise yet.
In this paper, we incorporate the minimum error entropy criterion with self-adjusting step-size (MEE-SAS) [42] into the cost function of diffusion distributed estimation. We then derive the corresponding diffusion-strategy solution, which is referred to as the diffusion MEE-SAS (DMEE-SAS) algorithm. Numerical simulation results show that the DMEE-SAS algorithm outperforms the DLMS, DLMP and DMEE algorithms when the noise is non-Gaussian. We also design an Improving DMEE-SAS algorithm for a non-stationary environment by using a switching scheme between the DMEE-SAS and DMEE algorithms, which tracks the changing estimator very effectively. The Improving DMEE-SAS algorithm avoids the small effective step-size of the DMEE-SAS algorithm when the estimate is close to the optimal estimator.
We organize the paper as follows. In Section 2, we briefly revisit the minimization error entropy criterion. In Section 3, firstly, we propose the DMEE-SAS algorithm and analyze the mean, mean square and instantaneous MSD performance for the DMEE-SAS algorithm. Then, we propose the Improving DMEE-SAS algorithm for a non-stationary scenario. Simulation results are shown in Section 4. Finally, we draw conclusions in Section 5.

Minimization Error Entropy Criterion
Considering the limited computational capability and limited memory space of nodes in real distributed networks, this paper is based on an MEE criterion, which is simple enough and has good estimation accuracy. Important properties of MEE can be found in [32,35,37]. In many real-world applications, the MEE estimator can significantly outperform the well-known MSE estimator and show strong robustness to noise, especially when the data are contaminated by non-Gaussian noise. In this section, we introduce the MEE criterion, which is used to derive a robust diffusion estimation algorithm with a self-adjusting step-size (DMEE-SAS).
The aim of the adaptive signal processing problem is to minimize the difference between the desired output and the system output, which is defined as the error $e$. To evaluate the error entropy, we seek to estimate the entropy directly from the error samples. System parameters can therefore be estimated by minimizing Renyi's entropy of the error $e$. Renyi's entropy is given by where $q(e)$ is the probability density function of a continuous error $e$, and $\alpha$ is a parameter. When $\alpha$ is set to 2, Equation (1) is the quadratic Renyi's entropy. Using a Gaussian kernel with kernel size $\sigma$, we can obtain a convenient evaluation of the integral operator in the formulation of the quadratic Renyi's entropy as follows: where $e = [e_1, e_2, \cdots, e_N]$ contains $N$ independent and identically distributed samples, and the Gaussian kernel is defined as The quantity $V(e)$ is the quadratic information potential and is defined as the expectation of the probability density function, $V(e) = E[q(e)]$. The quadratic information potential $V(e)$ can be easily estimated by using a simple and effective nonparametric estimator The maximum value of the quadratic information potential, $V(0)$, is achieved when $e_1 = e_2 = \cdots = e_N$.
The above results are obtained in batch mode, where the $N$ data points are fixed. For online training methods, in order to reduce the computational cost, the estimate of the quadratic information potential can be approximated stochastically by dropping the time average in (3), leading to where $L$ is the number of most recent samples used at time $i$. Minimizing the error entropy is equivalent to maximizing the quadratic information potential, since the logarithm is a monotonic function. Therefore, the cost function for the MEE criterion is given by
The selection of the kernel size $\sigma$ is an important step in estimating the information potential and is critical to the success of information theoretic criteria. In particular, increasing the kernel size leads to a stretching effect on the performance surface in the weight space, which results in increased accuracy of the quadratic approximation around the optimal point [43]. To ensure accuracy, a sufficiently large kernel size is used during the adaptation process in the following, as is common in information theoretic criteria [42,44].
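The estimators described in this section can be illustrated with a minimal Python sketch. It follows the standard information theoretic learning forms rather than reproducing this paper's equations exactly: the batch estimate averages a Gaussian kernel over all pairs of errors, and the stochastic estimate compares only the newest error with the latest window. The function names and the use of a pairwise kernel size of $\sigma\sqrt{2}$ are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(x, sigma):
    """Gaussian kernel with kernel size sigma."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

def information_potential(errors, sigma):
    """Batch estimate: average kernel value over all N^2 error pairs."""
    e = np.asarray(errors, dtype=float)
    diffs = e[:, None] - e[None, :]  # all pairwise differences e_i - e_j
    # Assumption: the pairwise kernel has size sigma*sqrt(2), which is what
    # results from convolving two Parzen kernels of size sigma.
    return gaussian_kernel(diffs, sigma * np.sqrt(2.0)).mean()

def stochastic_information_potential(window_errors, sigma):
    """Online estimate: compare the newest error with the latest L errors."""
    e = np.asarray(window_errors, dtype=float)
    return gaussian_kernel(e[-1] - e, sigma * np.sqrt(2.0)).mean()

def quadratic_renyi_entropy(errors, sigma):
    """Quadratic Renyi entropy estimate, i.e. minus the log of V(e)."""
    return -np.log(information_potential(errors, sigma))

if __name__ == "__main__":
    sigma = 1.5
    v_max = gaussian_kernel(0.0, sigma * np.sqrt(2.0))  # V(0)
    print("V(0)            :", v_max)
    print("V, equal errors :", information_potential([0.3] * 5, sigma))
    print("V, spread errors:", information_potential([0.0, 1.0, -2.0, 3.0], sigma))
```

With equal errors the estimate reaches $V(0)$, and any spread in the errors lowers it, which is why maximizing the information potential concentrates the error distribution.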

Proposed Algorithms
As mentioned in the Introduction, the diffusion minimum error entropy algorithm achieved improved performance for non-Gaussian noise with the fixed step-size, but it still suffers from conflicting requirements between convergence rate and the steady-state mean square error. Therefore, we consider a new cost function, which can achieve fast convergence speed and strong robustness against non-Gaussian noise.

Diffusion MEE-SAS Algorithm
Consider a connected wireless sensor network with $K$ nodes, where $k \in \{1, 2, \ldots, K\}$ is the node index and $i$ is the time index. To proceed with the analysis, we assume a linear measurement model as follows: where $w_0$ is an $M \times 1$ deterministic but unknown vector, $d_{k,i}$ is a scalar measurement of some random process, $u_{k,i}$ is the $M \times 1$ regression vector at time $i$ with zero mean, and $v_{k,i}$ is the random noise signal at time $i$ with zero mean. For each node $k$, we have where We seek an estimate of $w_0$ by minimizing a linear combination of local information. As explained in Section 2, minimizing a linear combination of the local error entropies is equivalent to maximizing a linear combination of the local quadratic information potentials $V(e_{k,i})$. Maximizing the information potential is in turn equivalent to minimizing the following cost function: where $N_k$ denotes the one-hop neighbor set of node $k$, and $\{c_{lk}\}$ are non-negative cooperative coefficients satisfying Here, $\bar{C}$ is an $N \times N$ matrix with individual entries $\{c_{lk}\}$ and $1_N$ is an $N \times 1$ all-unity vector. The gradient of the individual local cost function is given by where f l (w) = ( 2 We can replace the estimate of the quadratic information potential by the stochastic quadratic information potential, leading to where f where An iterative steepest-descent solution for estimating $w_0$ at each node $k$ can thus be derived as where $\mu_k$ is a positive step size. Using the general framework for diffusion-based distributed adaptive optimization [13], an adapt-then-combine (ATC) strategy for the diffusion MEE-SAS algorithm can be formulated as According to Equation (15), the DMEE-SAS algorithm can be seen as a diffusion estimation algorithm with variable step size $\mu_k(i)$, where The DMEE-SAS algorithm is described formally in Algorithm 1.

Algorithm 1: DMEE-SAS Algorithm
Initialize: w k,i = 0
for i = 1 : T
    for each node k:
        Adaptation

end for
In the adaptation step of the DMEE-SAS algorithm, $V(0) - V(e_{k,i})$ is close to $V(0)$ when the algorithm starts and close to 0 when the algorithm begins to converge. Because $V(0) - V(e_{k,i})$ is always a non-negative scalar, it acts as a self-adjusting scaling of the step size: the effective step is large far from the optimum, which accelerates convergence, and small near the optimum, which yields small steady-state estimation errors. Both properties hold in the presence of non-Gaussian noise in the measurements.
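The adapt-then-combine structure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a transcription of Algorithm 1: it assumes the local cost $[V(0) - V(e_{k,i})]^2$ of MEE-SAS [42], a stochastic information potential over the latest $L$ samples with pairwise kernel size $\sigma\sqrt{2}$, an adaptation step that uses only the node's own data, and the effective step size $2\mu_k[V(0) - V(e_{k,i})]$. The names `dmee_sas` and `network_msd`, the example weights and the Laplacian noise are hypothetical choices.

```python
import numpy as np

def kernel(x, s):
    """Gaussian kernel with size s."""
    return np.exp(-x**2 / (2.0 * s**2)) / (np.sqrt(2.0 * np.pi) * s)

def dmee_sas(U, D, C, mu, sigma=1.5, L=8):
    """ATC diffusion with an MEE-SAS adaptation step (illustrative sketch).

    U: (K, T, M) regressors, D: (K, T) measurements,
    C: (K, K) weights with C[l, k] = c_lk and columns summing to one.
    Returns the (T, K, M) history of node estimates.
    """
    K, T, M = U.shape
    s = sigma * np.sqrt(2.0)   # assumed pairwise kernel size
    v0 = kernel(0.0, s)        # V(0), the maximum information potential
    w = np.zeros((K, M))
    history = np.zeros((T, K, M))
    for i in range(T):
        psi = w.copy()
        if i >= L - 1:         # adapt once a full window is available
            for k in range(K):
                Uw = U[k, i - L + 1:i + 1]             # latest L regressors
                e = D[k, i - L + 1:i + 1] - Uw @ w[k]  # errors on the window
                de = e[-1] - e                         # e_i - e_j
                du = Uw[-1] - Uw                       # u_i - u_j
                kv = kernel(de, s)
                v = kv.mean()                          # stochastic V(e_{k,i})
                grad_v = (kv * de) @ du / (L * s**2)   # gradient of V w.r.t. w
                step = 2.0 * mu * (v0 - v)             # self-adjusting step size
                psi[k] = w[k] + step * grad_v          # adaptation
        w = C.T @ psi          # combination: w_k = sum_l c_lk * psi_l
        history[i] = w
    return history

def network_msd(history, w0):
    """Network MSD per iteration: squared deviation averaged over nodes."""
    return ((history - w0) ** 2).sum(axis=2).mean(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, T, M = 5, 3000, 2
    w0 = np.array([1.0, -0.5])
    U = rng.standard_normal((K, T, M))
    D = U @ w0 + 0.1 * rng.laplace(size=(K, T))  # heavy-tailed example noise
    C = np.full((K, K), 1.0 / K)                 # fully connected, uniform weights
    msd = network_msd(dmee_sas(U, D, C, mu=5.0), w0)
    print("MSD at start: %.4f, at end: %.4f" % (msd[0], msd[-1]))
```

The factor `step` shrinks as the window errors become similar, which is the behaviour the paragraph above attributes to $V(0) - V(e_{k,i})$.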

Performance Analysis
In this section, we analyze the mean, mean-square and instantaneous MSD performance of the DMEE-SAS algorithm. For tractability of the analysis, we focus here on the batch mode. To present the convergence properties of the proposed algorithm in terms of global quantities, the following notation is introduced: $M = \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_K I_M\}$, $W_i = \mathrm{col}\{w_{1,i}, \cdots, w_{K,i}\}$, $w^{(0)} = \mathrm{col}\{w_0, \cdots, w_0\}$, $\tilde{W}_i = \mathrm{col}\{\tilde{w}_{1,i}, \cdots, \tilde{w}_{K,i}\}$, $S = \mathrm{col}\{s_1(w_0), \cdots, s_K(w_0)\}$, $C = \bar{C}^T \otimes I_M$, where $I_M$ is the $M \times M$ identity matrix.
In order to make the analysis tractable, the following assumptions are made:
Assumption 1: The regressor $u_{k,i}$ is independent and identically distributed (i.i.d.) in time and spatially independent, with $E[u_{k,i}] = 0$ and $R_k = E[u_{k,i}^T u_{k,i}]$.
Assumption 2: The input noise $v_{k,i}$ is super-Gaussian. In addition, $v_{k,i}$ and the regressor $u_{k,i}$ are independent of each other, with $E[v_{k,i}] = 0$ and $E[v_{k,i}^2] = \xi_k$.
Assumption 3: The step-sizes $\mu_k$, $\forall k$, are small enough that their squared values are negligible.

Mean Performance
Because the input signal and output noises are generated from stationary and ergodic processes, the double time average in Equation (10) can be replaced by the expectation, leading to We consider the gradient error caused by approximating the quadratic information potential $V(e_{k,i})$ with its instantaneous value [45]. The gradient error at iteration $i$ and each node $k$ is defined as follows: Using Equation (15), the update equation of the intermediate estimate can be rewritten as According to [44], when the input signal-to-noise ratio is not too low, the error should be small on the whole. Therefore, for a relatively large kernel size $\sigma$, when $w = w_0$, $(e_{k,i} - e_{k,j})/\sigma \approx 0$ and . Therefore, the Hessian matrix function $H_k(w_0)$ of $F_l(w)$ is calculated as: Based on Theorem 1.2.1 of [46], we obtain where $\tilde{w}_{k,i} = w_0 - w_{k,i}$ is the weight error vector for node $k$. We assume that the estimate of each node converges to the vicinity of the unknown vector $w_0$. Therefore, $\tilde{w}_{k,i}$ is small enough to be negligible, yielding We can also obtain the approximation of the gradient error in the vicinity of $w_0$, which is given by Substituting Equations (22) and (23) into Equation (19), an approximation of the intermediate estimate can be obtained in the vicinity of By substituting Equation (24) into the second equation of Equation (15), we get the estimate of the unknown parameter as follows: Using the global quantities defined above gives the update equation for the network estimate vector as where $H$ collects the Hessian matrices across the network into the global matrix $H = \mathrm{diag}(H_1(w_0), \cdots, H_N(w_0))$. Noting that $C w^{(0)} = w^{(0)}$, subtraction of both sides of Equation (26) from $w^{(0)}$ gives In view of Assumptions 1 and 2, $\tilde{W}_i$, $H$ and $C$ are independent of each other.
Hence, taking the expectation of both sides of Equation (27) leads to We can easily find that $E[S] = \mathrm{col}\{E[s_1(w_0), \cdots, s_N(w_0)]\} = 0$, and Equation (28) therefore reduces to From Equation (29), we observe that, for Algorithm 1 to be stable in the mean sense, the matrix $E[C](I_{MN} - MH)$ should be stable. All the entries of $E[C]$ are non-negative and all its rows add up to unity. Therefore, to ensure stability in the mean, it should hold that We use the notation $\lambda_{\max}(A)$ to denote the maximum eigenvalue of a Hermitian matrix $A$. Thus, we note that a sufficient condition for unbiasedness is
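The requirement that the matrix $C(I_{MN} - MH)$ be stable can be checked numerically for a given network. The following Python sketch builds the global matrices from per-node quantities and returns the spectral radius, which must be below one for mean stability. The function name `mean_stability_radius` is hypothetical, and the Hessians $H_k(w_0)$ are treated as user-supplied inputs, since their closed form depends on the regressor statistics and kernel size; identity matrices are used below only as placeholders.

```python
import numpy as np

def mean_stability_radius(C_bar, mus, hessians):
    """Spectral radius of C (I - M H), with C = C_bar^T kron I_M.

    C_bar: (K, K) combination matrix with entries c_lk,
    mus: length-K step sizes, hessians: list of K (M, M) matrices H_k(w0).
    """
    K = len(mus)
    M = hessians[0].shape[0]
    C = np.kron(C_bar.T, np.eye(M))            # extended combination matrix
    M_big = np.kron(np.diag(mus), np.eye(M))   # diag{mu_1 I_M, ..., mu_K I_M}
    H = np.zeros((K * M, K * M))
    for k, Hk in enumerate(hessians):          # block-diagonal Hessian matrix
        H[k * M:(k + 1) * M, k * M:(k + 1) * M] = Hk
    B = C @ (np.eye(K * M) - M_big @ H)
    return np.max(np.abs(np.linalg.eigvals(B)))

if __name__ == "__main__":
    K, M = 3, 2
    C_bar = np.full((K, K), 1.0 / K)           # uniform weights (example only)
    placeholder_hessians = [np.eye(M)] * K     # placeholders, not H_k(w0)
    for mu in (0.1, 2.5):
        rho = mean_stability_radius(C_bar, [mu] * K, placeholder_hessians)
        print("mu = %.1f -> spectral radius %.3f, stable: %s" % (mu, rho, rho < 1))
```

With these placeholder Hessians, a small step size keeps the spectral radius below one, while an overly large step size pushes it above one.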

Mean-Square Performance
In order to make the presentation clearer, we introduce the following notation Performing a weighted energy balance on both sides of Equation (27) and taking expectations gives where $\Sigma$ is an arbitrary symmetric nonnegative-definite matrix, and the notation $\|a\|_{\Sigma}^2 = a^T \Sigma a$ represents a weighted vector norm for any Hermitian $\Sigma > 0$. By defining where the $\mathrm{vec}(\cdot)$ notation stacks the columns of its matrix argument on top of each other. We can modify Equation (32) to Using the following relationship between the vectorization operator and the Kronecker product [47]: We can obtain that where Considering Assumption 3, we can approximate Equation (35) as Using the following relationship between the vectorization operator and the matrix trace [47]: We find that where Substituting Equations (34) and (37) into Equation (33), we can then reformulate the recursion as follows: It is known that Equation (38) is stable and convergent if the matrix $\phi$ is stable [48], from the equation We know that all the entries of $\beta$ in Equation (37) are non-negative, and all its columns sum up to unity. Using the property $\lambda(A \otimes A) = \lambda^2(A)$, the stability of $\phi$ has the same conditions as the stability of $I_{MN} - MH$. Therefore, we choose the step size in accordance with Equation (31), which keeps the DMEE-SAS algorithm stable in the mean-square sense.
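The matrix identities invoked in this derivation can be verified numerically. The Python sketch below checks the standard forms $\mathrm{vec}(A \Sigma B) = (B^T \otimes A)\,\mathrm{vec}(\Sigma)$ and $\mathrm{Tr}(\Sigma X) = \mathrm{vec}(X^T)^T \mathrm{vec}(\Sigma)$, together with the fact that the spectral radius of $A \otimes A$ is the square of that of $A$. These are the textbook statements of the relationships; the exact form in which the paper writes them may differ, and the random test matrices are arbitrary.

```python
import numpy as np

def vec(X):
    """Stack the columns of X on top of each other."""
    return X.reshape(-1, order="F")

def spectral_radius(X):
    """Largest eigenvalue magnitude of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(X)))

rng = np.random.default_rng(0)
A, Sigma, B, X = (rng.standard_normal((3, 3)) for _ in range(4))

# vec(A Sigma B) = (B^T kron A) vec(Sigma)
kron_lhs = vec(A @ Sigma @ B)
kron_rhs = np.kron(B.T, A) @ vec(Sigma)

# Tr(Sigma X) = vec(X^T)^T vec(Sigma)
trace_lhs = np.trace(Sigma @ X)
trace_rhs = vec(X.T) @ vec(Sigma)

# Eigenvalues of A kron A are pairwise products of eigenvalues of A,
# so its spectral radius is the square of the spectral radius of A.
rho_kron = spectral_radius(np.kron(A, A))
rho_squared = spectral_radius(A) ** 2

print(np.allclose(kron_lhs, kron_rhs),
      np.isclose(trace_lhs, trace_rhs),
      np.isclose(rho_kron, rho_squared))
```

The last identity is what allows the stability of the Kronecker-structured matrix $\phi$ to be tied back to the stability of $I_{MN} - MH$.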

Instantaneous MSD
In order to analyze the instantaneous mean-square deviation (MSD), we can exploit the freedom of choosing $\theta$ at time $i$. Then, Expression (38) gives: The sum of both sides of Equation (39) for $n = 0, 1, \ldots, i - 1$ can be given by We can describe the time instant $i + 1$ in a similar way, given by Subtraction of both sides of Equation (40) from Equation (41) gives By setting in Equation (44) and dividing both sides of it by $N$, the instantaneous MSD for the whole network is computed by: where $\tilde{W}_i$ can be obtained by the following iteration:

An Improving Scheme for the DMEE-SAS Algorithm
An excessively small effective step size near the optimal estimator hinders the tracking ability of the DMEE-SAS algorithm in a non-stationary environment, where the optimal estimator undergoes small changes over time. A random-walk model is commonly used in the literature to describe the non-stationarity of the weight vector [48].
Therefore, we combine the DMEE-SAS algorithm with the DMEE algorithm [31] for a non-stationary environment where tracking is important. The DMEE-SAS algorithm should be used when the algorithm starts, because of its faster convergence, and the DMEE algorithm should be used when the algorithm begins to converge. We use Lyapunov stability theory [49] to determine the switching time for each node. The Lyapunov energy function is a tool for analyzing the convergence characteristics of dynamic systems, and the cost function can be viewed as a Lyapunov energy function. For the DMEE-SAS algorithm, the continuous-time learning rule is The temporal dynamics of the Lyapunov energy that describes the DMEE-SAS algorithm can be obtained as follows: The individual local energy function for the DMEE algorithm can be written as For the DMEE algorithm, the continuous-time learning rule is In a similar way, the temporal dynamics of the Lyapunov energy that describes the DMEE algorithm can be obtained as follows: The switching time is determined as When the condition of Equation (50) is met, we should switch from the DMEE-SAS algorithm to the DMEE algorithm. We introduce the following auxiliary variable: This yields the following algorithm, which we refer to as the Improving DMEE-SAS algorithm: For the purpose of clarity, we summarize the procedure of the Improving DMEE-SAS algorithm in Algorithm 2.
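One way to make the switching idea concrete is sketched below in Python. It is an illustration under explicit assumptions and is not claimed to reproduce Equation (50): the DMEE-SAS energy is taken as $[V(0) - V(e)]^2$ and the DMEE energy as $V(0) - V(e)$, each algorithm follows a gradient flow with its own step size, and the node switches to DMEE when the DMEE energy would decay faster. Under these assumptions the squared gradient norm of $V$ is a common factor and cancels, leaving a comparison of $4\mu_{SAS}[V(0) - V(e)]^2$ with $\mu_{MEE}$. All function and parameter names are hypothetical.

```python
import math

def information_potential_max(sigma):
    """V(0) for a Gaussian kernel, assuming pairwise kernel size sigma*sqrt(2)."""
    s = sigma * math.sqrt(2.0)
    return 1.0 / (math.sqrt(2.0 * math.pi) * s)

def energy_decay_factors(v_e, sigma, mu_sas, mu_mee):
    """Decay rates of the two energies, up to the common squared gradient norm.

    Assumed gradient flows: MEE-SAS energy [V(0) - V]^2 decays at
    4 * mu_sas * (V(0) - V)^2, and MEE energy V(0) - V decays at mu_mee.
    """
    gap = information_potential_max(sigma) - v_e
    return 4.0 * mu_sas * gap ** 2, mu_mee

def should_switch_to_dmee(v_e, sigma, mu_sas, mu_mee):
    """True when the DMEE update is expected to reduce its energy faster."""
    sas_rate, mee_rate = energy_decay_factors(v_e, sigma, mu_sas, mu_mee)
    return sas_rate < mee_rate

if __name__ == "__main__":
    sigma, mu_sas, mu_mee = 1.5, 100.0, 0.01   # example values only
    v0 = information_potential_max(sigma)
    for v_e in (0.0, 0.5 * v0, 0.99 * v0, v0):
        print("V(e) = %.4f -> use DMEE: %s"
              % (v_e, should_switch_to_dmee(v_e, sigma, mu_sas, mu_mee)))
```

Far from the optimum the gap $V(0) - V(e)$ is large and DMEE-SAS is retained; as the gap shrinks, the rule hands over to the fixed-step DMEE update, which matches the qualitative behaviour described above.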

Simulation Results
Twenty sensors are randomly placed in a 100 × 100 square, as shown in Figure 1. The communication distance is set to 50. In this paper, the steady-state network MSD [12] is adopted for performance comparison. All of the performance measures are averaged over 100 trials. We employ a super-Gaussian distribution as the noise model in our simulations. We generate the noise from the zero-mean generalized Gaussian distribution with probability density function $q_V(v) \propto \exp(-|v|^p)$, where $p$ is a positive shape parameter [50]. We set $p = 0.6$ so that the noise distribution is super-Gaussian.
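A simulation setup of this kind can be sketched in Python as follows. The sketch draws generalized Gaussian noise through a Gamma transformation (if $G \sim \mathrm{Gamma}(1/p, 1)$, then a random sign times $G^{1/p}$ has density proportional to $\exp(-|v|^p)$) and builds a random geometric network with a communication radius. The unit noise scale, the uniform neighborhood-averaging weights and the function names are assumptions; the text does not specify the combination rule, and a randomly drawn topology is not guaranteed to be connected.

```python
import numpy as np

def generalized_gaussian(p, size, rng):
    """Zero-mean samples with density proportional to exp(-|v|**p)."""
    g = rng.gamma(shape=1.0 / p, scale=1.0, size=size)  # |v|**p ~ Gamma(1/p, 1)
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * g ** (1.0 / p)

def random_network(K, side, radius, rng):
    """Place K nodes uniformly in a square and link nodes within `radius`."""
    pos = rng.uniform(0.0, side, size=(K, 2))
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    adj = dist <= radius                      # each node is its own neighbor
    # Assumption: uniform averaging over each neighborhood, so that column k
    # holds the weights c_lk and sums to one.
    C = adj.astype(float) / adj.sum(axis=0, keepdims=True)
    return pos, adj, C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = generalized_gaussian(0.6, 100000, rng)
    kurtosis = np.mean(noise ** 4) / np.mean(noise ** 2) ** 2
    print("sample kurtosis (Gaussian would be 3): %.2f" % kurtosis)
    pos, adj, C = random_network(20, 100.0, 50.0, rng)
    print("average neighborhood size: %.1f" % adj.sum(axis=0).mean())
```

A sample kurtosis well above 3 confirms the heavy-tailed, super-Gaussian character of the noise for $p = 0.6$.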

(a) In Stationary Environment
Here, the performance of the proposed DMEE-SAS algorithm is compared with that of some existing algorithms in the literature. We assume that the communication links are ideal. The unknown parameter vector w 0 is set to [ 1 We set the window length $L = 8$ and kernel size $\sigma = 1.5$ for both the DMEE and DMEE-SAS algorithms. Furthermore, $p$ is 1.2 for the DLMP algorithm. The steady-state MSD curves are plotted in Figure 2. The DMEE-SAS algorithm is robust to non-Gaussian noise and performs better than the DLMP [26] and DLMS [12] algorithms. The DMEE-SAS algorithm also converges faster than the DMEE algorithm [31] when the two algorithms reach a comparable steady-state performance.

(b) In Non-Stationary Environment
Here, the simulations are carried out in the same environment as in the stationary case, except for the optimal estimator $w_0$. We compare the proposed Improving DMEE-SAS algorithm with other algorithms.
Motivated by [51], we assume a time-varying $w_0$ of length 6 as follows: $[a_{1,i}, a_{2,i}, a_{3,i}, a_{4,i}, a_{5,i}, a_{6,i}]^T$, where $a_{k,i} = \cos\left(\omega i + \frac{(k-1)}{2}\pi\right)$ for $k = 1, 2, 3, 4, 5, 6$ and $\omega = \frac{\pi}{3000}$. The unknown link is assumed to change at time 6000. Figure 3 shows that the Improving DMEE-SAS algorithm can detect the change in the weight vector and performs better than the DLMS algorithm. We observe that the Improving DMEE-SAS and DMEE algorithms achieve comparable steady-state performance, while the Improving DMEE-SAS algorithm converges faster than the DMEE algorithm. Compared with the DMEE-SAS algorithm, the Improving DMEE-SAS algorithm exhibits a significant improvement in performance when the estimate is close to the optimal estimator. The Improving DMEE-SAS algorithm achieves a low MSD and a fast rate of convergence in the non-stationary environment.

Conclusions
In this paper, a robust diffusion estimation algorithm with a self-adjusting step-size, called the DMEE-SAS algorithm, is developed. The mean and mean-square convergence analyses of this new algorithm are carried out, and a sufficient condition for ensuring stability is obtained. Simulation results illustrate that the DMEE-SAS algorithm achieves better performance than the DLMS, robust DLMP, and DMEE algorithms in non-Gaussian noise scenarios. In addition, we propose the Improving DMEE-SAS algorithm for the non-stationary scenario, where the unknown parameter changes over time. The Improving DMEE-SAS algorithm combines the DMEE-SAS algorithm with the DMEE algorithm, and it avoids the small effective step-size of the DMEE-SAS algorithm when the estimate is close to the optimal estimator.