A Robust Diffusion Minimum Kernel Risk-Sensitive Loss Algorithm over Multitask Sensor Networks

Distributed estimation over sensor networks has attracted much attention due to its wide range of applications. The mean-square error (MSE) criterion is one of the most popular cost functions in distributed estimation, but it is optimal only under Gaussian noise. Impulsive noise, however, is common in real-world sensor networks. This paper therefore proposes a distributed estimation algorithm based on the minimum kernel risk-sensitive loss (MKRSL) criterion to deal with non-Gaussian noise, particularly impulsive noise. Furthermore, multitask estimation problems in sensor networks are considered. In contrast to the conventional single-task setting, the unknown parameters (tasks) may differ from node to node in the multitask problem. Another important issue we focus on is the impact of task similarity among nodes on multitask estimation performance. In addition, the mean and mean-square performance of the algorithm are analyzed theoretically. Simulation results verify the superior performance of the proposed algorithm compared with related algorithms.


Introduction
Distributed data processing over sensor networks has emerged as an attractive and challenging research area for various applications such as industrial automation, cognitive radios and inference tasks [1][2][3][4]. Distributed estimation plays a significant role in distributed data processing: each node estimates some parameters of interest from noisy measurements by exchanging information with its neighboring nodes. Most algorithms proposed for distributed estimation can be classified into consensus strategies [5][6][7][8], incremental strategies [9][10][11] and diffusion strategies [12][13][14]. In our work, we focus on the diffusion strategy, which is robust, fully distributed and flexible [15][16][17][18][19].
Diffusion strategies are particularly attractive schemes in distributed estimation, with examples such as diffusion Recursive Least Squares (RLS) [20,21] and diffusion Least Mean Square (LMS) [13,14]. With the mean-square error (MSE) criterion, these algorithms achieve satisfactory performance in a Gaussian noise environment. However, their performance may deteriorate dramatically in the presence of impulsive noise [22,23]. Some algorithms have been proposed to solve this issue, including diffusion least-mean power (D-LMP) and the diffusion sign-error Least Mean Square (DSE-LMS).

Consider a network of K nodes. At time index i, node k observes a measurement $d_{k,i}$ and a regression vector $u_{k,i}$ related by the linear model

$$d_{k,i} = u_{k,i}^T w_k^0 + n_{k,i}, \quad (1)$$

where $n_{k,i}$ is the random measurement noise with zero mean and variance $\sigma_{n,k}^2$, which is independent of the regression vector $u_{k,i}$. The goal of distributed estimation is to estimate an $M \times 1$ deterministic but unknown vector $w_k^0$ by exchanging and combining data only with neighboring nodes. This is a single-task problem when $w_k^0 = w^0$ for $k = 1, 2, \ldots, K$, and a multitask problem when $w_k^0 \neq w_l^0$ for $k \neq l$. It is assumed that there is no limit on how much information can be transmitted among neighbors.

Diffusion MKRSL Algorithm
In many previous works, diffusion distributed estimation algorithms are based on the MSE criterion, which achieves desirable performance when the measurement noise is Gaussian but may deteriorate dramatically in an impulsive noise environment. To solve the parameter estimation problem over multitask sensor networks, our main interest is therefore to design a novel algorithm that is robust to both Gaussian and impulsive noise.
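Information theoretic learning (ITL) provides a general framework for distributed parameter estimation in non-Gaussian cases. Correntropy is a local statistical similarity measure in ITL, defined as [26]

$$V(X, Y) = E[\kappa_\sigma(X - Y)] = \int \kappa_\sigma(x - y)\, dF_{XY}(x, y), \quad (2)$$

where X, Y are two random variables, $\kappa_\sigma(\cdot)$ is a shift-invariant Mercer kernel and $\sigma > 0$ denotes the kernel bandwidth. $F_{XY}(x, y)$ is the joint distribution function of (X, Y). In our work, we focus on the Gaussian kernel, which takes the following form:

$$\kappa_\sigma(x - y) = \exp\left(-\frac{(x - y)^2}{2\sigma^2}\right). \quad (3)$$

The minimum kernel risk-sensitive loss (MKRSL) algorithm is an adaptive filtering algorithm derived from the kernel risk-sensitive loss (KRSL), whose error performance surface has better convexity properties than that of the correntropic loss [29,38]. The KRSL between two random variables X and Y is defined by

$$L_\lambda(X, Y) = \frac{1}{\lambda} E\left[\exp\left(\lambda\left(1 - \kappa_\sigma(X - Y)\right)\right)\right] = \frac{1}{\lambda} \int \exp\left(\lambda\left(1 - \kappa_\sigma(x - y)\right)\right) dF_{XY}(x, y), \quad (4)$$

where $\lambda > 0$ is the risk-sensitive parameter. The exact joint distribution of (X, Y) is usually unavailable in application scenarios; only a finite number of samples $\{x(i), y(i)\}_{i=1}^{L}$ are known. Therefore, the sample mean estimator of the KRSL (called the empirical KRSL) is calculated by averaging over the samples:

$$\hat{L}_\lambda(X, Y) = \frac{1}{L\lambda} \sum_{i=1}^{L} \exp\left(\lambda\left(1 - \kappa_\sigma(x(i) - y(i))\right)\right). \quad (5)$$

Then, the KRSL cost function is derived as

$$J(w) = \frac{1}{L\lambda} \sum_{i=1}^{L} \exp\left(\lambda\left(1 - \kappa_\sigma(e(i))\right)\right), \quad (6)$$

with $e(i) = d(i) - u_i^T w$. The time average in the KRSL cost function above can be replaced by the expectation

$$J(w) = \frac{1}{\lambda} E\left[\exp\left(\lambda\left(1 - \kappa_\sigma(e(i))\right)\right)\right]. \quad (7)$$

Based on the KRSL cost function in Equation (7), the instantaneous cost function of the KRSL algorithm is approximated as

$$\hat{J}(w) = \frac{1}{\lambda} \exp\left(\lambda\left(1 - \kappa_\sigma(e(i))\right)\right). \quad (8)$$

For the distributed diffusion estimation problem, our goal is to seek the best estimate of $w_k^0$ by minimizing the diffusion KRSL cost function at each node k in cooperation with all neighboring nodes. For each node k, $N_k$ is the one-hop neighbor set and $c_{l,k}$ are non-negative real cooperation weights chosen according to the Metropolis rule:

$$c_{l,k} = \begin{cases} 1/\max(n_k, n_l), & l \in N_k \setminus \{k\}, \\ 1 - \sum_{m \in N_k \setminus \{k\}} c_{m,k}, & l = k, \\ 0, & \text{otherwise}, \end{cases} \quad (9)$$

where $n_k$ is the degree of node k.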
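As an illustration of the empirical KRSL described above, the following minimal Python sketch evaluates the sample-mean loss for two short sequences. It assumes the unnormalized Gaussian kernel exp(-e^2 / (2 sigma^2)); the function names `gaussian_kernel` and `empirical_krsl` and the sample values are hypothetical and chosen only for this example.

```python
import math


def gaussian_kernel(e, sigma):
    """Unnormalized Gaussian kernel exp(-e^2 / (2 sigma^2)) (assumed form)."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


def empirical_krsl(x, y, sigma, lam):
    """Sample-mean KRSL: (1 / (L * lam)) * sum_i exp(lam * (1 - kappa(x_i - y_i)))."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    total = sum(
        math.exp(lam * (1.0 - gaussian_kernel(a - b, sigma)))
        for a, b in zip(x, y)
    )
    return total / (len(x) * lam)


# Identical sequences give the minimum value 1 / lam.
clean = empirical_krsl([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], sigma=1.0, lam=2.0)

# A single huge outlier raises the loss, but each term is capped at exp(lam).
outlier = empirical_krsl([0.0, 1.0, 2.0], [0.0, 1.0, 1e6], sigma=1.0, lam=2.0)
```

Because every summand lies between 1 and exp(lam), a single outlier can only move the loss by a bounded amount, which is the property that the robustness discussion relies on.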
The real, non-negative combining coefficients $c_{l,k}$ satisfy the following conditions: $\sum_{l \in N_k \cup \{k\}} c_{l,k} = 1$, $c_{l,k} = 0$ if $l \notin N_k$, and $C\mathbf{1} = \mathbf{1}$, where C is the matrix with entries $c_{l,k}$ and $\mathbf{1}$ is the all-ones vector. The KRSL local cost function at each node k is formulated in Equation (10), and its derivative with respect to w is given in Equation (11). At node k, the stochastic-gradient weight update for $w_k^0$ is given in Equation (12), where $\eta = \mu/\sigma^2$ is the step-size and $w_k(i)$ is the estimate of $w_k^0$ at time index i. This update is a new expression of the MKRSL algorithm. Inspired by the general framework for diffusion-based distributed estimation algorithms [13], an adapt-then-combine (ATC) strategy for the diffusion MKRSL algorithm is proposed. The ATC scheme first updates the estimate at each node with the adaptive algorithm; each node k then fuses the intermediate estimates from its neighbors. The intermediate estimate at each node k is defined in Equation (13) and updated according to Equation (14), where $\phi_k(i-1)$ is the intermediate estimate at time index i − 1 for node k. The non-negative real value $\beta_{l,k}$ is a weight coefficient corresponding to the matrix B, with B = I in the ATC scheme [12]. This leads to Equation (15), in which the task relatedness among nodes is ignored; the resulting algorithm is called the non-cooperative diffusion MKRSL in this article.
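The Metropolis rule for the cooperation weights can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact construction used in the experiments: the helper `metropolis_weights` and the three-node line graph are hypothetical, and $n_k$ is taken here to count node k itself.

```python
def metropolis_weights(neighbors):
    """Return a dict c[(l, k)] of Metropolis weights for an undirected graph.

    `neighbors` maps each node to the set of its one-hop neighbors,
    excluding the node itself.
    """
    # Assumption: n_k counts node k itself, so n_k = |neighbors[k]| + 1.
    n = {k: len(nbrs) + 1 for k, nbrs in neighbors.items()}
    c = {}
    for k, nbrs in neighbors.items():
        for l in nbrs:
            c[(l, k)] = 1.0 / max(n[k], n[l])
        # The self-weight absorbs the remainder so the weights into k sum to one.
        c[(k, k)] = 1.0 - sum(c[(l, k)] for l in nbrs)
    return c


# Hypothetical three-node line graph 0 - 1 - 2.
line_graph = {0: {1}, 1: {0, 2}, 2: {1}}
c = metropolis_weights(line_graph)
column_sums = {
    k: sum(c[(l, k)] for l in line_graph[k] | {k}) for k in line_graph
}
```

The resulting weights are non-negative and symmetric, and the weights entering each node sum to one, which matches the conditions stated for the combining coefficients.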
However, multitask estimation is an attractive field in practical applications. In clustered multitask networks, nodes are grouped into clusters and all nodes in a cluster share an identical task. Furthermore, the performance of distributed estimation can be improved by utilizing the relatedness of tasks. Equation (15) is adjusted for multitask estimation to give Equation (16), where $c(k)$ is the cluster of node k, $\tau$ is a non-negative strength parameter, $\rho_{kl}$ are weights and $\eta(i) = \exp(\lambda(1 - \kappa_\sigma(e_i)))\kappa_\sigma(e_i)$. The notation $N_k \cap c(k)$ denotes the set of neighboring nodes of k that are in the same cluster as k, whereas $N_k \setminus c(k)$ denotes the set of neighboring nodes of k that are not in the same cluster as k. Equations (15) and (16) define the increment step. The combination step is then given by Equation (17). The step-size $\eta(i)$ is a function of $e(i)$; its curves for different values of $\lambda$ (with $\sigma = \eta = 2.0$) and of $\sigma$ (with $\lambda = \eta = 2.0$) are depicted in Figure 1.
It is shown that the step-size η(i) will approach zero as |e(i)| → ∞ for different values of λ. Therefore, the MKRSL algorithm maintains the robustness to outliers, such as impulsive noise.
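The decay of the error-dependent weighting can be checked numerically. The sketch below assumes the unnormalized Gaussian kernel and uses hypothetical helper names (`krsl_weight`, `mkrsl_step`); it applies the factor exp(λ(1 − κ_σ(e)))κ_σ(e) inside a single-node stochastic-gradient step and shows that an impulsive sample leaves the estimate essentially unchanged.

```python
import math


def krsl_weight(e, sigma, lam):
    """Error-dependent factor exp(lam * (1 - kappa(e))) * kappa(e)."""
    kappa = math.exp(-(e * e) / (2.0 * sigma * sigma))  # assumed kernel form
    return math.exp(lam * (1.0 - kappa)) * kappa


def mkrsl_step(w, u, d, eta, sigma, lam):
    """One single-node update w + eta * weight(e) * e * u, with e = d - u^T w."""
    e = d - sum(ui * wi for ui, wi in zip(u, w))
    gain = eta * krsl_weight(e, sigma, lam) * e
    return [wi + gain * ui for wi, ui in zip(w, u)]


# A nominal sample moves the estimate towards the target ...
w_nominal = mkrsl_step([0.0, 0.0], [1.0, 0.0], 1.0, eta=0.02, sigma=1.5, lam=2.0)

# ... while an impulsive sample (huge d) is effectively ignored.
w_impulse = mkrsl_step([0.5, -0.5], [1.0, 2.0], 1e6, eta=0.02, sigma=1.5, lam=2.0)
```

For an LMS-type update the same impulsive sample would shift the estimate by an amount proportional to the error, whereas here the weighting factor underflows to zero.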
For a better understanding, the Multitask Diffusion MKRSL algorithm is summarized in Algorithm 1:

Algorithm 1: Multitask Diffusion MKRSL Algorithm
Input: $d_{k,i}$, $u_{k,i}^T$, $\eta$, $\tau$, and $c_{l,k}$ satisfying (10)
Initialization: Start with $w_{l,-1} = 0$ for all l.
for i = 1 : T
    for each node k:
        Adaptation
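To make the adapt-then-combine structure concrete, the following Python sketch performs one pass over all nodes. It is only a sketch under explicit assumptions and should not be read as the exact recursion of Algorithm 1: the regularization term τ Σ ρ_kl (w_l − w_k) over neighbors outside the cluster follows the usual multitask diffusion LMS form, the combination averages intermediate estimates over same-cluster neighbors, and all names (`atc_iteration`, `rho`, the toy two-node network) are hypothetical.

```python
import math


def krsl_weight(e, sigma, lam):
    """Error-dependent factor exp(lam * (1 - kappa(e))) * kappa(e)."""
    kappa = math.exp(-(e * e) / (2.0 * sigma * sigma))  # assumed kernel form
    return math.exp(lam * (1.0 - kappa)) * kappa


def atc_iteration(w, data, neighbors, cluster, c, rho, eta, tau, sigma, lam):
    """One adapt-then-combine pass (assumed form, see the lead-in).

    w:         dict node -> current estimate (list of floats)
    data:      dict node -> (u, d), regression vector and measurement
    neighbors: dict node -> set of neighbors, including the node itself
    cluster:   dict node -> cluster label
    c:         dict (l, k) -> combination weight
    rho:       dict (k, l) -> regularization weight between clusters
    """
    phi = {}
    for k, (u, d) in data.items():
        # Adaptation: KRSL-weighted stochastic-gradient step on local data.
        e = d - sum(ui * wi for ui, wi in zip(u, w[k]))
        gain = eta * krsl_weight(e, sigma, lam) * e
        phi_k = [wi + gain * ui for wi, ui in zip(w[k], u)]
        # Assumed regularizer: pull towards neighbors in other clusters.
        for l in neighbors[k]:
            if cluster[l] != cluster[k]:
                coef = eta * tau * rho[(k, l)]
                phi_k = [p + coef * (wl - wk)
                         for p, wl, wk in zip(phi_k, w[l], w[k])]
        phi[k] = phi_k
    new_w = {}
    for k in w:
        # Combination: fuse intermediate estimates within the same cluster.
        same = [l for l in neighbors[k] if cluster[l] == cluster[k]]
        new_w[k] = [sum(c[(l, k)] * phi[l][m] for l in same)
                    for m in range(len(w[k]))]
    return new_w


# Toy example: two nodes in one cluster, noise-free data, common task w_true.
neighbors = {0: {0, 1}, 1: {0, 1}}
cluster = {0: "a", 1: "a"}
c = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 0.5}
w_true = [1.0, -1.0]
w = {0: [0.0, 0.0], 1: [0.0, 0.0]}
for i in range(200):
    u = [1.0, 0.0] if i % 2 == 0 else [0.0, 1.0]
    d = sum(ui * wi for ui, wi in zip(u, w_true))
    w = atc_iteration(w, {0: (u, d), 1: (u, d)}, neighbors, cluster, c,
                      rho={}, eta=0.5, tau=0.1, sigma=1.5, lam=2.0)
```

In this noise-free toy setting both nodes converge to `w_true`; the caller is responsible for supplying combination weights that sum to one over each node's same-cluster neighborhood.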

Performance Analysis
The multitask D-MKRSL algorithms are evaluated theoretically under model (1) in this section. In the following, some common assumptions are adopted for tractable analysis [39,40].
(1) The regression vectors $u_{k,i}$ are independent and identically distributed (i.i.d.).
(2) For each node k at time index i, the noise $n_{k,i}$ is independent of $u_{k,i}$ and is a mixture of zero-mean Gaussian signals, so that $E[n_{k,i}] = 0$.
(3) The step-size $\eta$ is small enough that its squared value is negligible.
Then, the estimate-error vectors are defined as follows: Furthermore, the global quantities are defined to convert the local variables to global ones:

Mean Performance
We consider the gradient error caused by replacing the KRSL cost function with its instantaneous value. At time i and node k, the gradient error of the intermediate estimate is defined through the instantaneous gradient $\hat{f}_k(w_{k,i-1})$ and its expectation $f_k(w_{k,i-1})$, where

$$\hat{f}_k(w_{k,i-1}) = \frac{1}{\sigma^2} \exp\left(\lambda\left(1 - \kappa_\sigma(e_{k,i-1})\right)\right) \kappa_\sigma(e_{k,i-1})\, e_{k,i-1}\, u_{k,i-1}^T$$

and

$$f_k(w_{k,i-1}) = \frac{1}{\sigma^2} E\left[\exp\left(\lambda\left(1 - \kappa_\sigma(e_{k,i-1})\right)\right) \kappa_\sigma(e_{k,i-1})\, e_{k,i-1}\, u_{k,i-1}^T\right].$$

With these quantities, the update equation of the intermediate estimate can be rewritten as Equation (26). The function $f_k(w_{k,i-1})$ is twice continuously differentiable in a neighborhood of the line segment between the points $w_k^0$ and $w_{k,i-1}$. Thus, Theorem 1.2.1 in Reference [41] gives an expansion of $f_k$ around $w_k^0$ in which $H_k(w)$ is the Hessian matrix of $f_k(w_{k,i-1})$ and $\tilde{w}_{k,i-1} = w_k^0 - w_{k,i-1}$ is the weight error vector for node k. The unknown vector $w_k^0$ is the true value that we want to estimate, so $f_k(w_k^0)$ is equal to zero. The estimate at each node converges to the vicinity of the unknown vector $w_k^0$, so $\tilde{w}_{k,i}$ is small enough that higher-order terms are negligible, which yields Equation (28), where $R_{u,k} = E[u_{k,i} u_{k,i}^T]$ and $\beta$ is a constant. The approximate value of the gradient error at $w_k^0$ is then given by Equation (29). Substituting (28) and (29) into (26) and adjusting for multitask estimation, we obtain the intermediate estimate in Equation (30), where P is the matrix with (k, l)-th entry $\rho_{kl}$. Substituting (30) into (17), we obtain the update equation of $w_k(i)$, Equation (32). Define the global quantity $H = \mathrm{diag}\{H_1(w_1^0), \ldots, H_N(w_N^0)\}$ and rewrite (32) in global form as Equation (33). Noting that $Cw^0 = w^0$ and subtracting both sides of (33) from $w^0$, we obtain the recursion for the global weight error vector, Equation (34). Taking the expectation of (34) leads to the mean recursion. Based on Lemma 1 of [13], the matrix $I_{MN} - KH + XQ$ should be stable to guarantee mean stability, that is,

$$\left|\lambda_{\max}\left(I_{MN} - KH + XQ\right)\right| < 1, \quad (36)$$

where $\lambda_{\max}(\cdot)$ is the largest eigenvalue of a matrix. A sufficient condition for maintaining the stability of the algorithm is given by Equation (37).
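The role of the stability requirement can be illustrated with a generic linear recursion x_i = B x_{i−1}, which is the form taken by the mean weight-error recursion. The sketch below is not the matrix of this analysis: the 2 × 2 matrices `B_stable` and `B_unstable` are hypothetical stand-ins whose eigenvalues lie inside and outside the unit circle, respectively.

```python
import math


def iterate(B, x0, steps):
    """Apply x <- B x for the given number of steps and return the result."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(bij * xj for bij, xj in zip(row, x)) for row in B]
    return x


def norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(v * v for v in x))


# Hypothetical matrices: eigenvalues (0.9, 0.8) versus (1.05, 0.5).
B_stable = [[0.9, 0.05], [0.0, 0.8]]
B_unstable = [[1.05, 0.0], [0.0, 0.5]]

err_stable = norm(iterate(B_stable, [1.0, 1.0], 500))
err_unstable = norm(iterate(B_unstable, [1.0, 1.0], 500))
```

When all eigenvalues have magnitude below one the mean error decays to zero from any starting point, whereas a single eigenvalue above one makes it diverge; this is why the step-size must be chosen so that the stability condition holds.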

Mean-Square Performance
In this section, we focus on the mean-square performance of the proposed algorithm. Computing the weighted norm of (34) and taking expectations, we obtain Equation (38), where $\Sigma$ is a Hermitian non-negative-definite matrix. Under Assumptions 1 and 2, $\tilde{w}_i$ is independent of $\Gamma$, which leads to Equation (40). Together with the quantity defined in Equation (41), let

$$\sigma = \mathrm{vec}\{\Sigma\}, \quad (42)$$

where vec(·) is the transpose of the vectorization of a matrix. Using (41) and (42), Equation (40) can be rewritten as Equation (43), with the vectorization operator as defined in Reference [42]. Taking expectation and vectorization operations with (38), (41) and (42), we obtain Equation (45). Based on the relationship between the matrix trace and the vectorization operator [42], and since $\Sigma$ is symmetric and deterministic, the noise-dependent term can be expressed through $V = K E[s_i s_i^T] K$. According to A.1 and A.2, V can be evaluated as in Equation (50). Substituting (45) and (50) into (43) yields Equation (51), which is stable and convergent if the matrix $\delta$ is stable. In the approximation of $\delta$, all the entries of Z are non-negative and all its columns sum to unity. Hence, the stability of $\delta$ is in accordance with the stability of $I_{MN} - KH + XQ$. Therefore, choosing the step-size in line with Equation (37) keeps the proposed algorithm stable in the mean-square sense.
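For reference, the two standard identities relating the trace, the vectorization operator and the Kronecker product that underlie this type of mean-square analysis are recalled below. They are general matrix facts and are stated here with generic matrices A, B, X and $\Sigma$ of compatible sizes, not with the specific quantities of this section.

```latex
\begin{align}
  \operatorname{Tr}(\Sigma X) &= \operatorname{vec}\!\left(X^{T}\right)^{T} \operatorname{vec}(\Sigma), \\
  \operatorname{vec}(A \Sigma B) &= \left(B^{T} \otimes A\right) \operatorname{vec}(\Sigma).
\end{align}
```

In particular, a weighting matrix of the form $B^{T} \Sigma B$ has vectorization $(B^{T} \otimes B^{T})\operatorname{vec}(\Sigma)$, which is how a recursion on weighting matrices becomes a linear recursion on the vector $\sigma$.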

Simulation
In this section, we validate the performance of the proposed algorithm over multitask sensor networks in two scenarios: a Gaussian noise environment and an impulsive noise environment. The noise is assumed to be generated by a Gaussian mixture distribution, which is commonly used in signal processing [43,44]:

$$n(i) \sim (1 - \nu_i)\, N(0, \sigma_1^2) + \nu_i\, N(0, \sigma_2^2), \quad (54)$$

where $N(0, \sigma_i^2)$ (i = 1, 2) is the Gaussian distribution with zero mean and variance $\sigma_i^2$. The variance $\sigma_2^2$ is set much larger than $\sigma_1^2$ so that the second component generates impulsive noise, and increasing $\nu_i$ leads to more frequent impulses. We consider a fully connected sensor network with 15 nodes. The network topology and cluster structure are shown in Figure 2: nodes 1 to 6 belong to the first cluster, nodes 7 to 10 compose the second cluster and nodes 11 to 15 form the third cluster.

Scenario 1 (Gaussian noise environment): As shown in Figure 3, the desired signal is a random process with zero-mean i.i.d. Gaussian noise. In the experiment, the system parameters are set to $\lambda = 2$ and $\sigma = 1.5$, and the step-size is set to $\eta = 0.02$. $\tau$ is a regularization parameter that promotes similarity between the tasks of neighboring clusters and is chosen as $\tau = 0.1$. The learning curve of the mean square deviation (MSD), Equation (55), is adopted for performance comparison. In Figure 4a, d(i) is the average value of $d_{k,i}$ over all nodes k at time i. In Figure 4b, we compare several related algorithms over the multitask network: diffusion least mean p-power (D-LMP) [21], the diffusion generalized maximum correntropy criterion algorithm (D-GMCC) [16], diffusion sign-error LMS (DSE-LMS) [22], D-LMS [12] and the proposed D-MKRSL algorithm. The step-sizes of all algorithms are chosen experimentally to ensure the same convergence speed, and the other parameters of each algorithm are selected experimentally to achieve a desirable performance. From Figure 4b, we can conclude that the D-MKRSL algorithm outperforms the other related algorithms in the Gaussian noise environment.

Scenario 2 (Impulsive noise environment): The impulsive noise model (54) is adopted to describe the distribution of impulsive interference in the experiment. We now test the influence of the impulsive interference on the performance of the algorithms mentioned above. In Figures 5a and 6a, the desired signals are plotted for impulsive noise with $\nu_i = 0.05$ and $\nu_i = 0.03$, respectively. The corresponding performance of the algorithms in the impulsive noise environment is plotted in Figures 5b and 6b. The values of the parameters $\alpha$ and $\lambda$ for D-GMCC are selected to achieve the best performance in both the Gaussian and impulsive noise environments. We can observe that the proposed D-MKRSL algorithm is robust and shows superior performance compared with the other related algorithms in the impulsive noise environment.

Furthermore, we consider the performance of the algorithm in a nonstationary scenario in which the unknown vector $w_k^0$ is assumed to change at time 1000. From the convergence curves in Figure 7, it can be observed that the proposed algorithm maintains a desirable performance even in the presence of a sudden change of the unknown vector. Another important aspect is how the correlation of tasks influences the estimation performance.
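As a brief aside on the simulation setup, the two-component Gaussian mixture noise and an MSD measure can be sketched in Python as follows. This is a minimal sketch under assumptions: the mixing probability `nu` selects the large-variance component, the MSD is taken as the squared deviation averaged over nodes and expressed in dB, and the names `mixture_noise` and `msd_db` as well as the numerical values are hypothetical.

```python
import math
import random


def mixture_noise(n, nu, sigma1, sigma2, rng):
    """Draw n samples: N(0, sigma2^2) with probability nu, else N(0, sigma1^2)."""
    samples = []
    for _ in range(n):
        std = sigma2 if rng.random() < nu else sigma1
        samples.append(rng.gauss(0.0, std))
    return samples


def msd_db(w_true, w_est):
    """Assumed network MSD in dB: 10*log10 of the node-averaged squared error."""
    per_node = [
        sum((a - b) ** 2 for a, b in zip(wt, we))
        for wt, we in zip(w_true, w_est)
    ]
    return 10.0 * math.log10(sum(per_node) / len(per_node))


rng = random.Random(0)
noise = mixture_noise(20000, nu=0.1, sigma1=0.1, sigma2=10.0, rng=rng)
# Samples far outside the range of the small-variance component are impulses.
impulse_fraction = sum(1 for x in noise if abs(x) > 1.0) / len(noise)

msd_unit = msd_db([[1.0, 0.0]], [[0.0, 0.0]])
msd_small = msd_db([[1.0, 0.0]], [[0.9, 0.0]])
```

Setting `nu = 0` recovers the purely Gaussian environment of Scenario 1, while larger values of `nu` produce more frequent impulses.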
First, we investigate whether the proposed algorithm can improve performance by utilizing the relatedness of tasks, compared with the non-cooperative strategy. Figure 8 compares the D-MKRSL algorithm with the non-cooperative strategy over a multitask network with identical task relatedness. The results show that exploiting the relatedness of tasks improves estimation performance. Next, the impact of task similarity on performance is studied. According to Reference [35], the optimum parameter vectors are assumed to be uniformly distributed on a circle of radius r centered at $w^0$. The larger the value of r, the smaller the correlation between the tasks. Under this model, the optimum parameter vectors over the multitask network are different but related. The multitask estimation model is expressed in Equation (56). Figure 9 demonstrates that the performance of the algorithms improves as the similarity of the tasks increases.
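The task model with optimum vectors placed on a circle can be sketched as follows. The sketch assumes two-dimensional parameter vectors and independently drawn uniform angles; the function name `tasks_on_circle`, the center and the radius are hypothetical choices for illustration only.

```python
import math
import random


def tasks_on_circle(center, r, num_tasks, rng):
    """Draw task vectors uniformly on a circle of radius r around `center` (M = 2)."""
    tasks = []
    for _ in range(num_tasks):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        tasks.append([center[0] + r * math.cos(theta),
                      center[1] + r * math.sin(theta)])
    return tasks


rng = random.Random(1)
center = [0.5, -0.4]
related = tasks_on_circle(center, r=0.1, num_tasks=3, rng=rng)
identical = tasks_on_circle(center, r=0.0, num_tasks=3, rng=rng)
distances = [math.hypot(t[0] - center[0], t[1] - center[1]) for t in related]
```

With r = 0 all clusters share the same task, and increasing r spreads the tasks apart, which lowers their similarity.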

Conclusions
In this work, we consider the problem of distributed estimation over multitask sensor networks and propose the D-MKRSL algorithm, which achieves a desirable performance. Through theoretical analysis, a sufficient condition for ensuring the stability of the D-MKRSL algorithm is obtained. The simulation results show that the D-MKRSL algorithm performs better than related algorithms in both Gaussian and impulsive noise environments. Furthermore, we examine the relationship between the relatedness of tasks and estimation performance. It is demonstrated that, with the cooperation strategy, performance improves as the correlation among tasks increases.