Diffusion Logarithm-Correntropy Algorithm for Parameter Estimation in Non-Stationary Environments over Sensor Networks

This paper considers the parameter estimation problem in non-stationary environments over sensor networks. The unknown parameter vector is modeled as a time-varying sequence. To improve estimation performance, this paper proposes a novel diffusion logarithm-correntropy algorithm for each node in the network. The algorithm applies both the logarithm operation and the correntropy criterion to the estimation error. Moreover, if the error gets larger due to the non-stationary environment, the algorithm can respond immediately by taking relatively steeper steps. Thus, the proposed algorithm achieves a smaller error in time. The tracking performance of the proposed logarithm-correntropy algorithm is analyzed. Finally, experiments verify the validity of the proposed algorithmic schemes, which are compared with other recent algorithms for parameter estimation.


Introduction
Sensor networks are useful tools for disaster relief management, target localization and tracking, and environment monitoring [1][2][3][4]. Distributed parameter estimation plays an essential role in sensor networks [5][6][7]. The objective of the parameter estimation is to estimate some essential parameters from noisy observation measurements through cooperation between nodes. Moreover, distributed strategies are of great significance to solve the problem of parameter estimation in sensor networks, due to their robustness against imperfections, low complexity, and low power demands.
Among these distributed schemes, in the incremental strategy [8], a cyclic path is defined over the nodes and data are processed in a cyclic manner through the network until optimization is achieved. However, determining a cyclic path that runs across all nodes is generally a challenging (NP-hard) task to perform. In the consensus strategy [9], vanishing step sizes are used to ensure that nodes can reach consensus and converge to the same optimizer in steady-state. In the diffusion strategy, information is processed locally and simultaneously at all nodes. The processed data are diffused through a real-time sharing mechanism that ripples through the network continuously [10,11]. The diffusion strategies are particularly attractive because they are robust [12][13][14][15], flexible, and fully distributed compared with incremental and consensus strategies, so we adopt diffusion strategies in this paper.
Most prior literature is mainly concerned with the case where nodes estimate the parameter vector collaboratively in the stationary case over sensor networks [10,16]. However, in the real world, the non-stationary case is normal. In this work, we mainly consider the parameter estimation in the non-stationary case, where the parameter is always time-varying. The observation data are nonlinear and non-Gaussian, since the data may be disturbed by changing communication links or outliers under the non-stationary environments.
Inspired by the differentiability and mathematical tractability of logarithm functions, we introduce the logarithm function as the error cost function [17]. Moreover, the correntropy criterion is a nonlinear measure of similarity between two random variables [18]; it is a robust optimality criterion that has been successfully used in non-Gaussian signal processing. To make the error cost function more suitable for non-stationary environments, we propose a diffusion signal processing framework with a logarithm-correntropy cost function to solve the parameter estimation problem. This framework gradually adjusts the cost function during optimization according to the size of the error.

A. Related Works
The tracking behavior of a wide range of adaptive networks under non-stationary conditions was thoroughly investigated in [19][20][21][22]. Under stationary conditions, based on the p-norm error criterion, a diffusion minimum average p-power (dLMP) algorithm was proposed to estimate the parameters in wireless sensor networks [23]. To estimate the mean-square weight deviations under zero-mean stationary measurement noise, proportionate-type normalized least mean square algorithms were proposed in [24]. The diffusion normalized least-mean-square algorithm (dNLMS) was proposed for parameter estimation in a distributed network [25], and the variable step size of the dNLMS algorithm was obtained by minimizing the mean-square deviation to achieve a fast convergence rate. The gradient-descent total least-squares (dTLS) algorithm is a stochastic-gradient adaptive filtering algorithm that compensates for errors in both input and output data [26]. The steady-state analysis of gradient-descent total least-squares was inspired by the energy-conservation-based approach to the performance analysis of adaptive filters. For measurement noise that involves impulsive interference, Ni, Chen, and Chen [27] designed a diffusion sign-error LMS (dSE-LMS) algorithm to solve the parameter estimation problem. The tracking performance of a variable step-size diffusion LMS algorithm in a non-stationary environment was considered in [28], but that work did not obtain closed-form expressions for the steady-state mean-square deviation (MSD) or excess mean-square error (EMSE) of the network; consequently, the theory and simulation do not match well. To date, the performance of distributed estimation algorithms has been predominantly studied under stationary conditions, and the performance of these algorithms may degrade in non-stationary environments.
To find the optimal adaptation step sizes over the networks, Abdolee, Vakilian, and Champagne [29] formulated a constrained nonlinear optimization problem and solved it through a log-barrier Newton algorithm in an iterative manner. By using the optimal step size at each node, the performance of diffusion least-mean squares (DLMS) could be improved in non-stationary signal environments. Compared with this research, the proposed algorithm can respond immediately by taking relatively steeper steps when the error gets larger, and as a result, the new algorithm can perform well in non-stationary environments without finding the optimal step size at each node.

B. Our Contributions and Organization
To further improve estimation performance in non-stationary environments over sensor networks, a novel algorithm needs to be designed. In this paper, the random-walk model is introduced for non-stationary environments. We propose the logarithm-correntropy algorithm for parameter estimation in sensor networks under non-stationary environments. This algorithm applies both the logarithm operation and the correntropy criterion to the estimation error. Moreover, if the error gets larger due to the non-stationary environment, the algorithm can respond immediately by taking relatively steeper steps. Thus, the proposed algorithm achieves a smaller error in time. The tracking performance of the proposed algorithm is analyzed, and simulation results are presented to evaluate it.
The rest of this paper is organized as follows. In Section 2, we describe the estimation problem in a non-stationary environment. Section 3 introduces the adapt-then-combine (ATC) diffusion logarithmic-correntropy algorithm. In Section 4, the tracking performance analysis of the proposed algorithm is presented. Simulation results are presented in Section 5. Finally, conclusions are drawn in Section 6.
Notation: In what follows, bold letters denote random variables and non-bold letters represent their realizations. The operators $(\cdot)^T$ and $E[\cdot]$ denote transposition and expectation, respectively. $I_m$ denotes an $m \times m$ identity matrix. $\mathbb{1}$ is an $N \times 1$ all-ones vector. $|\cdot|$ is the absolute value of a scalar.

Estimation Problem in a Non-Stationary Environment
Consider a network with N nodes (sensors) deployed to observe some physical phenomena and specific events in a special environment. It is fundamentally necessary to consider and analyze parameter estimation under non-stationary conditions with the intent of employing them for practical applications. One challenge confronted in real-world applications is the non-stationary nature of the underlying parameters. For this purpose, a data model with a varying parameter is required. In this paper, we use the random walk model in [19] to depict the non-stationary condition.

Assumption 1. (Random Walk Model):
The parameter vector varies according to the following model:
$$w^*_i = w^*_{i-1} + \eta_i, \qquad (1)$$
where $w^*_{i-1}$ is a random variable with a constant mean, $i$ is the time index, and $\eta_i$ is a zero-mean random sequence with covariance matrix $R_\eta$.
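The random walk model can be simulated directly. The following is a minimal sketch, assuming Gaussian perturbations with $R_\eta = \sigma_\eta^2 I$; the function name `random_walk` and the parameter `sigma_eta` are illustrative and do not come from the paper.

```python
import random


def random_walk(w0, steps, sigma_eta, seed=0):
    """Return the trajectory [w_0, ..., w_steps] of w_i = w_{i-1} + eta_i."""
    rng = random.Random(seed)
    trajectory = [list(w0)]
    for _ in range(steps):
        prev = trajectory[-1]
        # eta_i: zero-mean Gaussian perturbation, independent across entries
        # (R_eta = sigma_eta^2 * I is an assumption of this sketch).
        eta = [rng.gauss(0.0, sigma_eta) for _ in prev]
        trajectory.append([p + e for p, e in zip(prev, eta)])
    return trajectory


path = random_walk(w0=[1.0, -0.5], steps=1000, sigma_eta=0.01)
print(len(path), path[-1])
```

Setting `sigma_eta` to zero recovers the stationary case, in which the parameter vector stays at its initial value.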

Assumption 2.
The sequence $\eta_i$ is independent of $u_{k,i}$ and $n_{k,i}$ for all $k$ and $i$.
At every time $i$, every node $k$ can only exchange information with the nodes in its neighborhood $N_k$ (including node $k$ itself), and takes a scalar measurement $d_{k,i}$ according to:
$$d_{k,i} = u_{k,i}^T w^*_i + n_{k,i}, \qquad (2)$$
where $u_{k,i}$ denotes the $M \times 1$ random regression input signal vector and we assume I > M, and $n_{k,i}$ is Gaussian noise with zero mean and variance $\sigma^2_{n,k}$. The problem is to estimate the $M \times 1$ unknown time-varying vector $w^*_i$ at each node $k$ from the collected measurements. The objective of the network is to search over the unknown variable $w$ and find the best estimate $w^*$ by minimizing the MSE cost function in a distributed manner as follows:
$$\min_w J_k(w) = E\,|d_{k,i} - u_{k,i}^T w|^2.$$
The cost function of the global network can be described as:
$$J^{glob}(w) = \sum_{k=1}^{N} E\,|d_{k,i} - u_{k,i}^T w|^2. \qquad (3)$$
The optimization problem in Equation (3) can be solved by the diffusion strategies proposed in [30,31]. In these strategies, the estimate at each node is generated through a fixed combination strategy, which gives different weights to the estimates of the neighbors of node $k$ in order to minimize the local function:
$$J_k^{loc}(w) = \sum_{l \in N_k} c_{lk}\, E\,|d_{l,i} - u_{l,i}^T w|^2,$$
where $c_{lk}$ is the combination coefficient. For simplicity and good performance, we use the Metropolis rule in our work:
$$c_{lk} = \begin{cases} 1/\max(n_k, n_l), & l \in N_k \setminus \{k\}, \\ 1 - \sum_{j \in N_k \setminus \{k\}} c_{jk}, & l = k, \\ 0, & l \notin N_k, \end{cases}$$
where $n_k$ is the degree of node $k$ (the number of nodes connected to node $k$). The combining coefficients $c_{lk}$ also satisfy the following conditions:
$$\sum_{l \in N_k} c_{lk} = 1, \qquad c_{lk} = 0 \ \text{if } l \notin N_k, \qquad \mathbb{1}^T C = \mathbb{1}^T,$$
where $C$ is an $N \times N$ matrix with non-negative real entries $\{c_{lk}\}$.
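The Metropolis rule is easy to compute from the neighborhood sets alone. The sketch below assumes that the degree $n_k$ counts node $k$ itself (that is, $n_k = |N_k|$), which is one common convention; the text does not state which convention is used, and the function name `metropolis_weights` is illustrative.

```python
def metropolis_weights(neighbors):
    """Build the combination matrix C with C[l][k] = c_lk.

    neighbors[k] is the set of nodes linked to node k, excluding k itself.
    The degree n_k is taken as len(neighbors[k]) + 1 (assumption: the
    node counts itself as a member of its own neighborhood).
    """
    n = len(neighbors)
    degree = [len(neighbors[k]) + 1 for k in range(n)]
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in neighbors[k]:
            C[l][k] = 1.0 / max(degree[k], degree[l])
        # The self-weight absorbs the remainder so that each column sums to one.
        C[k][k] = 1.0 - sum(C[l][k] for l in neighbors[k])
    return C


# Three nodes on a line: 0 - 1 - 2
C = metropolis_weights([{1}, {0, 2}, {1}])
for row in C:
    print(row)
```

With this rule the matrix is symmetric, so every row and every column sums to one, and nodes that are not neighbors receive zero weight.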

Diffusion Logarithmic-Correntropy Algorithm
In the non-stationary case, the parameter is always time-varying. We propose a new logarithmic-correntropy method to solve the parameter estimation problem. In order to solve Equation (3), since nodes in sensor networks have access to the observed data, we can take advantage of node cooperation by introducing a distributed diffusion learning manner.
In this paper, we are inspired by recent developments in information theoretic learning (ITL) related to the "logarithmic cost function" and "correntropy"-based approaches [17,32]. The logarithmic function is differentiable, which makes it mathematically tractable. We introduce the logarithmic function as an efficient cost function in the adaptive algorithm. In this framework, we introduce an error cost function using the logarithmic function given by:
$$H(e_{k,i}) = F(e_{k,i}) - \frac{1}{\alpha} \ln\bigl(1 + \alpha F(e_{k,i})\bigr), \qquad (7)$$
where $\alpha > 0$ is a small systemic parameter and $F(e_{k,i})$ is a conventional cost function of the estimation error $e_{k,i}$ at each node $k$. The estimation error is $e_{k,i} = d_{k,i} - w^T u_{k,i}$. In this paper, we use the correntropy criterion to formulate the conventional cost function $F(\cdot)$. The correntropy is a similarity measure based on the ITL criterion. Given two random variables $X$ and $Y$, the correntropy between them is defined by [33]:
$$V_\sigma(X, Y) = E[k_\sigma(X - Y)] = \int k_\sigma(x - y)\, dF_{XY}(x, y),$$
where $k_\sigma(\cdot)$ is a continuous, symmetric, positive-definite function with bandwidth $\sigma$, also called the Mercer kernel, $E$ is the expectation operator, and $F_{XY}(x, y)$ is the joint distribution function of $X$ and $Y$. This paper mainly uses the Gaussian kernel.
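The two ingredients of this section can be illustrated numerically. The sketch below assumes the normalized Gaussian kernel $k_\sigma(x) = \exp(-x^2/2\sigma^2)/(\sqrt{2\pi}\sigma)$ and estimates the correntropy by a sample mean; the helper names are illustrative. It also evaluates the logarithmic cost $F - \frac{1}{\alpha}\ln(1 + \alpha F)$ for a given value of $F$, without committing to a particular choice of $F$.

```python
import math


def gaussian_kernel(x, sigma):
    """Normalized Gaussian kernel with bandwidth sigma."""
    return math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def sample_correntropy(xs, ys, sigma):
    """Sample-mean estimate of E[k_sigma(X - Y)] from paired samples."""
    return sum(gaussian_kernel(x - y, sigma) for x, y in zip(xs, ys)) / len(xs)


def log_cost(F, alpha):
    """Logarithmic cost F - (1/alpha) * ln(1 + alpha * F) for a value F >= 0."""
    return F - math.log1p(alpha * F) / alpha


# Small F: the cost behaves like alpha * F^2 / 2; large F: it approaches F.
for F in (1e-3, 1.0, 1e3):
    print(F, log_cost(F, alpha=0.5))
```

The loop shows the mechanism that the text describes: the logarithmic term flattens the cost near zero error and leaves it almost unchanged for large values of $F$.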
For each node $k$, based on the correntropy criterion, the instantaneous conventional cost function $F(e_{k,i})$ is:
In non-stationary conditions over networks, the communication among nodes is subject to link noise, and it is natural that the observation vectors are affected by noise. The total least squares (TLS) method can achieve desirable estimation performance by reducing the effect of noise in both the observation vector and the data matrix [34]. We briefly explain the TLS method as follows. Consider the linear parameter estimation problem $Ax \approx b$, where $A$ is the data matrix, $b$ is the observation vector, and $x$ is the unknown parameter vector. The least squares (LS) approach assumes that the observation vector is noisy while the data matrix is noiseless, whereas the TLS approach assumes that both the observation vector and the data matrix are noisy [35]. The LS approach seeks the estimate of the unknown parameter vector $x$ by minimizing a sum of squared residuals:
$$\min_x \|Ax - b\|_2^2,$$
while the TLS approach minimizes a sum of weighted squared residuals:
$$\min_x \frac{\|Ax - b\|_2^2}{\|x\|_2^2 + 1}.$$
From the matrix algebra viewpoint, the TLS approach is a refinement of the LS method when there are errors in both the observation vector and the data matrix. Inspired by the desirable features of the TLS method, to make the logarithm-correntropy method more suitable for non-stationary environments, we rewrite the conventional cost function as:
To demonstrate the superiority of the proposed logarithm-correntropy method, we introduce different stochastic cost functions, such as the least mean square cost $e^2$ and the absolute difference cost $|e|$. Figure 1 compares these cost functions with the proposed logarithm-correntropy cost function. It can be observed that the proposed cost function is less sensitive to tiny interference on the error, and shows comparable steepness for quite large error interference.
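The difference between the LS and TLS criteria can be seen on a small example. The sketch below assumes the common scalar-weighted form of the TLS objective, $\|Ax - b\|^2 / (\|x\|^2 + 1)$, which is valid when the noise in the data matrix and in the observation vector has equal variance; the function names are illustrative.

```python
def residual_norm_sq(A, x, b):
    """Squared Euclidean norm of the residual A x - b."""
    return sum((sum(a * xj for a, xj in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))


def ls_cost(A, x, b):
    """Least-squares objective: only b is assumed noisy."""
    return residual_norm_sq(A, x, b)


def tls_cost(A, x, b):
    """Total-least-squares objective: the residual is down-weighted by ||x||^2 + 1,
    which accounts for noise in A as well (equal-variance assumption)."""
    return residual_norm_sq(A, x, b) / (sum(xj * xj for xj in x) + 1.0)


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 2.0]
x = [1.0, 1.0]
print(ls_cost(A, x, b), tls_cost(A, x, b))
```

For this candidate the residual is $(0, -1, 0)$, so the LS cost is 1 and the TLS cost is 1/3, because $\|x\|^2 + 1 = 3$.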
Furthermore, this new logarithm-correntropy cost function benefits from mapping the original input space into a potentially higher-dimensional "feature space", where linear methods can be employed. In particular, if the error gets larger due to the non-stationary environment, the algorithm can respond immediately by taking relatively steeper steps. Thus, the proposed algorithm achieves a smaller error in time and takes more gradual steps in space.
Given the data model, all nodes can observe data generated by Equation (2). It is natural to expect collaboration between nodes to be beneficial for a distributed sensor network: neighboring nodes can share information with each other as permitted by the network topology. Therefore, according to Equations (7) and (13), we define a global cost function so that all nodes in the sensor network can be adapted in a distributed manner. The new global function is built as follows:
To develop the distributed diffusion logarithm-correntropy algorithm in non-stationary environments over sensor networks, we build the following diffusion Logarithm-Correntropy Algorithm (dLCA) local cost function at every node $k$:
where $w_{k,i}$ is the local estimate obtained by node $k$ at time $i$, $e_{l,i} = d_{l,i} - (w_{l,i})^T u_{l,i}$ is the estimation error at node $l$, $l$ denotes any neighbor of node $k$, $H(e_{l,i}) = F(e_{l,i}) - \frac{1}{\alpha} \ln\bigl(1 + \alpha F(e_{l,i})\bigr)$, and the $c_{l,k}$ denote the combining coefficients, which also follow the Metropolis rule. To reach the minimum $w^*_i$, it is natural to use the steepest-descent method. Taking the derivative of Equation (15), we have
where … $e_{l,i}\,(u_{l,i} + d_{l,i} w_{l,i})$.
Since nodes in the sensor network have access to all observed data, we can take advantage of node cooperation by introducing a diffusion strategy to estimate the parameter $w_{k,i}$ in a fully distributed manner. This paper uses the Adapt-then-Combine (ATC) scheme of the diffusion strategy. As Figure 2 shows, in the ATC scheme, each node first updates its own estimate and then combines the intermediate estimates from its immediate neighbors, in the following steps:
(1) Adaptation: In order to obtain an intermediate estimate, we introduce a step-size parameter µ. Each node updates its current estimate of the true parameter value using the steepest-descent method.
We can obtain an intermediate estimate $\phi_{k,i}$ as follows:
(2) Combination: This step is also called the diffusion step. To obtain a new estimate, each node aggregates the intermediate estimates of all its neighbor nodes (including its own) as follows:
$$w_{k,i} = \sum_{l \in N_k} c_{lk}\, \phi_{l,i}. \qquad (18)$$
For the purpose of clarity, we summarize the procedure of the diffusion Logarithm-Correntropy Algorithm (dLCA) in Algorithm 1:

Algorithm 1: diffusion Logarithm-Correntropy Algorithm
Initialize: Start with $w_{l,-1} = 0$ for all $l$; initialize $w_{k,0}$ for each node $k$, the step-size µ, and the cooperative coefficients $c_{lk}$. Set α > 0, σ > 0.
for t = 1 : T
    for each node k:
        Adaptation: compute the intermediate estimate by Equation (17).
        Combination: compute the new estimate by Equation (18).

Figure 2. Step 1 depicts a sensor network working in a non-stationary environment. In the adaptation stage (Step 2), each node uses the observed data $u_{k,i}$, $d_{k,i}$ to update its intermediate estimate $\phi_{k,i}$. Step 3 shows the information exchange process between nodes. In the combination stage (Step 4), each node collects the intermediate estimates from its neighbors.
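The adapt-then-combine structure of Algorithm 1 can be sketched in a few lines. This is not the paper's update: the gradient of the dLCA cost is not reproduced here, so the adaptation step takes a generic error nonlinearity `g` as a placeholder. The demo uses `g(e) = e` (plain diffusion LMS) on noiseless data, and `lmls_nonlinearity` shows, as a further assumption, the shape obtained from the logarithmic cost when $F(e) = e^2$ (up to a factor of 2). All names are illustrative.

```python
import random


def atc_step(w, data, C, mu, g):
    """One adapt-then-combine iteration over all nodes.

    w[k]    : current estimate at node k (list of M floats)
    data[k] : (u_k, d_k), regressor and scalar measurement at node k
    C[l][k] : weight that node k gives to neighbor l (columns sum to one)
    g       : error nonlinearity, a placeholder for the paper's gradient term
    """
    n, m = len(w), len(w[0])
    # Adaptation: each node takes a stochastic-gradient step on its own data.
    phi = []
    for k in range(n):
        u, d = data[k]
        e = d - sum(wj * uj for wj, uj in zip(w[k], u))
        phi.append([wj + mu * g(e) * uj for wj, uj in zip(w[k], u)])
    # Combination: each node averages the intermediate estimates of its neighbors.
    return [[sum(C[l][k] * phi[l][j] for l in range(n)) for j in range(m)]
            for k in range(n)]


def lmls_nonlinearity(e, alpha=1.0):
    """Gradient shape of the log cost with F(e) = e^2 (stand-in, not Equation (17))."""
    return alpha * e ** 3 / (1.0 + alpha * e * e)


rng = random.Random(1)
w_true = [0.5, -1.0]
C = [[1.0 / 3.0] * 3 for _ in range(3)]      # fully connected, uniform weights
w = [[0.0, 0.0] for _ in range(3)]
for _ in range(500):
    data = []
    for k in range(3):
        u = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        data.append((u, sum(a * b for a, b in zip(w_true, u))))
    w = atc_step(w, data, C, mu=0.05, g=lambda e: e)
print(w[0])
```

Passing `g=lmls_nonlinearity` instead of the identity gives a cubic response to small errors and a nearly linear response to large ones, which is the qualitative behavior attributed to logarithmic costs in the text.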

Tracking Performance Analysis
The tracking performance of the proposed diffusion logarithm-correntropy algorithm is analyzed in this section. The convergence condition is first studied with $\tilde{w}_i$, defined as the error signal, which is a time-varying parameter under the random walk model:
It has been proven in [10] that subtracting $w^*_i$ from both sides of the update procedure at a node and then taking the expectation leads to the following relation under stationary conditions:
Then, considering Assumptions 1 and 2 on the random sequence $\{\eta_i\}$, we observe that $w^*_i$ has a constant mean and hence $E\,w^*_i = E\,w^*_{i-1}$ under the relation in Equation (1). Taking the expectation under the non-stationary conditions of Equation (1), we obtain
In Equation (21), $\eta_i$ is a zero-mean random sequence with covariance matrix $R_\eta$. Our purpose is to obtain the mean square deviation (MSD) and excess mean square error (EMSE) for each node, which are defined as:
In the proposed algorithm in Equation (17), the error signals can be defined as follows:
In the non-stationary case with $w^*_i = w^*_{i-1} + \eta_i$, based on the definition in Equation (15), $J_k^{local}(w)$ is twice continuously differentiable when w = 0. Then we obtain the Hessian matrix of $J_k^{local}(w)$, which is defined as $\nabla_w^2 J_k^{local}(w)$. From Lemma 1 and Theorem 1 in [12], the Hessian is bounded as $\lambda_{k,\min} I_M \le \nabla_w^2 J_k^{local} \le \lambda_{k,\max} I_M$ with $0 \le \lambda_{k,\min} \le \lambda_{k,\max}$.
Equations (16)-(18) cause gradient error. The error recursion is then given by
In Equation (26), $H_{k,i-1}$ is a positive-definite random matrix defined as
Applying Jensen's inequality to Equation (27), the variance of $\tilde{w}_{k,i}$ is bounded by
where $\|\cdot\|^2$ is a convex function and represents the squared Euclidean norm.
Integrating both sides of Equation (26), we obtain
It follows from the bounded Hessian and Equation (28) that
where
According to [12], substituting Equation (32) into Equation (30), we get
where α ≥ 0 is a constant. The global MSD is introduced, which leads to
We collect the
The matrix $C_i$ is left-stochastic, that is, $C_i^T \mathbb{1}_N = \mathbb{1}_N$, where $\mathbb{1}_N$ denotes the $N \times 1$ all-ones vector. From Equations (29) and (35), it holds that
where $\preceq$ denotes element-wise ordering and $\Xi = \mathrm{diag}\{\mu^2 \sigma^2_{n,1}, \cdots, \mu^2 \sigma^2_{n,N}\}$.
In order to ensure the stability of the proposed algorithm in the mean sense, according to Theorem 1 (mean-square stability) in Reference [12], it should hold that
Since
where $\|\cdot\|_\infty$ is the $l_\infty$ norm. When the step-sizes {µ} are sufficiently small, we can further conclude that
According to the bound in Equation (41), if the step-sizes {µ} are sufficiently small, the MSD of each node, $E\|\omega_k(i)\|^2$, can become sufficiently small.

Simulation Results
In this section, to verify the performance of the proposed diffusion logarithm-correntropy algorithm, we considered a network consisting of 20 nodes and 50 communication links. The topology is shown in Figure 3. The sensor nodes were randomly deployed in an area of 100 × 100 and the communication distance between nodes was set as 35. All results below were averaged over 150 independent Monte Carlo simulations with randomly generated samples.
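A deployment of this kind can be generated as a random geometric graph. The sketch below is a hypothetical setup that places 20 nodes uniformly in a 100 × 100 area and links any two nodes within distance 35; the function name is illustrative, and a random draw will not in general reproduce the 50 links of the topology in Figure 3.

```python
import math
import random


def random_geometric_network(n=20, side=100.0, radius=35.0, seed=0):
    """Place n nodes uniformly in a side x side square and link nodes within radius."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]
    neighbors = [{k} for k in range(n)]   # N_k includes node k itself
    for k in range(n):
        for l in range(k + 1, n):
            if math.dist(pos[k], pos[l]) <= radius:
                neighbors[k].add(l)
                neighbors[l].add(k)
    return pos, neighbors


pos, neighbors = random_geometric_network()
links = sum(len(s) - 1 for s in neighbors) // 2
print("number of links:", links)
```

To match a prescribed link count, one option is to redraw with different seeds until the count fits; connectivity of the resulting graph should also be checked before running a diffusion algorithm on it.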
In this simulation part, the performance of the proposed algorithm was first verified in a non-stationary environment over sensor networks with ideal communication links. As shown in Figure 4, the regression inputs $\{u_k(i)\}$ are independent and identically distributed (i.i.d.) zero-mean Gaussian vectors with covariance matrices $R_{u,k} = \sigma^2_{u,k} I_M$, where $\sigma^2_{u,k}$ is the input variance. The background noises $\{n_k(i)\}$ are i.i.d. and drawn independently of the regressors. The unknown parameter vector $w^*_i$ is time-varying, as Figure 5 shows. A fixed step-size µ = 0.002 is used in the simulations. The MSD learning curves are plotted in Figure 6. The proposed dLCA algorithm obtained the fastest convergence rate when compared with the dSE-LMS, dLMP, dTLS, and dNLMS algorithms, and it also achieved a smaller network MSD than these algorithms. These simulation results indicate that the diffusion logarithm-correntropy algorithm exhibited better tracking ability in non-stationary environments than the existing classical algorithms.

Figure 6. A comparison of simulated MSD learning curves in a non-stationary environment over sensor networks for the diffusion sign-error least-mean-square (dSE-LMS), diffusion minimum average p-power (dLMP), gradient-descent total least-squares (dTLS), diffusion normalized least-mean-square (dNLMS), and diffusion logarithm-correntropy (dLCA) algorithms.

Figure 7 compares the steady-state EMSE performances of the related algorithms at each node in the sensor network. Large differences were observed at some nodes that achieved low EMSE. The steady-state EMSE values were obtained by averaging over 150 experiments and over 50 time samples after convergence. The proposed algorithm showed a better steady-state performance trend than the other algorithms.
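The learning curves discussed above rest on two simple computations: the network MSD at one time instant and the average over Monte Carlo runs. The sketch below shows one way to compute both, assuming the network MSD is the node-averaged squared deviation and is reported in decibels; the dB scale and the function names are assumptions of this sketch.

```python
import math


def network_msd_db(w_true, estimates):
    """10*log10 of the node-averaged squared deviation ||w* - w_k||^2."""
    sq = [sum((a - b) ** 2 for a, b in zip(w_true, w_k)) for w_k in estimates]
    return 10.0 * math.log10(sum(sq) / len(sq))


def average_runs(curves):
    """Average per-iteration values (linear scale) over independent runs."""
    runs = len(curves)
    return [sum(c[i] for c in curves) / runs for i in range(len(curves[0]))]


# Two nodes, each off by 0.1 in the first coordinate.
print(network_msd_db([1.0, 0.0], [[0.9, 0.0], [1.1, 0.0]]))
print(average_runs([[1.0, 3.0], [3.0, 5.0]]))
```

Averaging is done on the linear scale before converting to decibels, since averaging dB values directly would bias the curve downward.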
Secondly, to further simulate the non-stationary scenarios in sensor networks, the link was assumed to change at time 4000. The unknown parameter vector w * i was time-varying with link changing, as Figure 8 shows. From the simulation results shown in Figure 9, in non-stationary environments over sensor networks with links changing, the diffusion logarithm-correntropy algorithm had smaller MSD than other related algorithms, such as dSE-LMS, dLMP, dTLS, and dNLMS algorithms. It further shows that the proposed dLCA algorithm had better tracking ability in non-stationary environments.
Finally, we compared the simulated network MSD curves with theoretical results under Equation (41) in Figure 10. One can see that theoretical network MSD curves of the proposed algorithm showed good match with its simulated MSD curves.

Conclusions
To solve the problem of parameter estimation in non-stationary environments over sensor networks, each node in the sensor networks was equipped with the logarithm-correntropy cost function. The proposed algorithm can gradually adjust the cost function in its optimization based on the estimation error amount. We investigated the tracking behavior of the proposed algorithm under non-stationary conditions. Furthermore, the simulations were implemented in the non-stationary environments, where the parameters were time-varying with link changing. Simulation experiments were conducted to verify the analytical results, and illustrated that the proposed algorithm outperformed existing algorithms, such as dSE-LMS, dLMP, dTLS, and dNLMS algorithms.