Article

A Robust Diffusion Minimum Kernel Risk-Sensitive Loss Algorithm over Multitask Sensor Networks

Xinyu Li, Qing Shi, Shuangyi Xiao, Shukai Duan and Feng Chen
1 College of Artificial Intelligence, Southwest University, Chongqing 400715, China
2 Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, and Chongqing Collaborative Innovation Center for Brain Science, Chongqing 400715, China
* Author to whom correspondence should be addressed. Current address: Chongqing Collaborative Innovation Center for Brain Science, Southwest University, Chongqing 400715, China.
Sensors 2019, 19(10), 2339; https://doi.org/10.3390/s19102339
Submission received: 28 April 2019 / Accepted: 17 May 2019 / Published: 21 May 2019
(This article belongs to the Section Sensor Networks)

Abstract

Distributed estimation over sensor networks has attracted much attention due to its wide range of applications. The mean-square error (MSE) criterion is one of the most popular cost functions used in distributed estimation, but it is optimal only under Gaussian noise, whereas impulsive noise is also widespread in real-world sensor networks. A distributed estimation algorithm based on the minimum kernel risk-sensitive loss (MKRSL) criterion is therefore proposed in this paper to deal with non-Gaussian noise, particularly impulsive noise. Furthermore, multitask estimation problems in sensor networks are considered: unlike the conventional single-task case, the unknown parameters (tasks) can differ from node to node. Another important issue we focus on is the impact of task similarity among nodes on multitask estimation performance. In addition, the mean and mean-square performance of the algorithm are analyzed theoretically. Simulation results verify the superior performance of the proposed algorithm compared with related algorithms.

1. Introduction

Distributed data processing over sensor networks has emerged as an attractive and challenging research area with applications such as industrial automation, cognitive radios and inference tasks [1,2,3,4]. Distributed estimation plays a significant role in distributed data processing: it estimates parameters of interest from noisy measurements by exchanging information among neighboring nodes. Most algorithms proposed for distributed estimation can be classified into consensus strategies [5,6,7,8], incremental strategies [9,10,11] and diffusion strategies [12,13,14]. In this work, we focus on the diffusion strategy, which is robust, fully distributed and flexible [15,16,17,18,19].
Diffusion strategies are particularly attractive schemes for distributed estimation; examples include diffusion Recursive Least Squares (RLS) [20,21] and diffusion Least Mean Square (LMS) [13,14]. Built on the mean-square error (MSE) criterion, these algorithms achieve satisfactory performance in a Gaussian noise environment. However, their performance may deteriorate dramatically in the presence of impulsive noise [22,23]. Several algorithms have been proposed to address this issue, including diffusion least-mean p-power (D-LMP) and the diffusion sign-error Least Mean Square (DSE-LMS) adaptive filtering algorithm [24,25]. To handle non-Gaussian noise more efficiently, correntropy [26,27], a higher-order statistic widely used in adaptive filters, was introduced. Moreover, the generalized maximum correntropy criterion (GMCC) algorithm and the minimum kernel risk-sensitive loss (MKRSL) algorithm were proposed [28,29], providing more general frameworks and better performance. In this work, we consider the diffusion minimum kernel risk-sensitive loss (D-MKRSL) algorithm for distributed estimation over multitask networks.
Previous work on diffusion strategies has mainly focused on the single-task estimation problem, in which all nodes estimate an identical parameter vector [30]. In contrast, many essential applications are multitask-oriented, such as regression, web page categorization and target location tracking. In these situations, the optimum parameter vectors are different but related, and are inferred simultaneously over the network by all nodes in a collaborative manner. This type of problem is known as a multitask problem. Distributed estimation problems over multitask networks can be roughly classified into two categories. In the first, no knowledge about the correlation of the tasks is available: it is unknown which nodes share the same task, and nodes cooperate according to the network topology [31,32,33]. In the second, it is assumed that each node knows which cluster it belongs to and that the parameter vector is the same within each cluster; by exploiting this information about task similarity, diffusion strategies for distributed estimation over multitask networks are obtained [34,35,36,37]. In our work, we focus on the second case.
Inspired by the adapt-then-combine (ATC) diffusion LMS algorithm, we propose the diffusion MKRSL algorithm over multitask networks. The algorithm achieves desirable performance in both Gaussian and impulsive noise environments. Additionally, the impact of task relatedness on estimation performance is studied, and the mean and mean-square stability are analyzed theoretically. The effectiveness and advantages of the proposed algorithm are verified by simulation results.
The remaining parts of the article are organized as follows: In Section 2, we briefly introduce the data model of distributed estimation and propose the multitask Diffusion MKRSL algorithm. In Section 3, the mean and mean square performance of the multitask D-MKRSL algorithm are analyzed. Simulation results are demonstrated in Section 4. Finally, we draw conclusions in Section 5.
Notation: We use $(\cdot)^T$, $E[\cdot]$ and $\otimes$ to denote the transposition, expectation and Kronecker product operators, respectively. $I_M$ denotes an $M \times M$ identity matrix, $\mathbb{1}$ is an $N \times 1$ all-one vector, and $|\cdot|$ is the absolute value of a scalar.

2. Multitask Diffusion Estimation

2.1. Data Model

Let us consider a connected network of $K$ nodes. Every node $k \in \{1, 2, \ldots, K\}$ has access to a scalar random measurement $d_{k,i}$ and a zero-mean $M \times 1$ regression vector $u_{k,i}$ at every time instant $i \ge 0$. The data at node $k$ are related via the linear regression model:
$$d_{k,i} = u_{k,i}^T w_k^0 + n_{k,i} \qquad (1)$$
where $n_{k,i}$ is the zero-mean random measurement noise with variance $\sigma_{n,k}^2$, independent of the regression vector $u_{k,i}$. The goal of distributed estimation is to estimate the $M \times 1$ deterministic but unknown vectors $w_k^0$ by exchanging and combining data only between neighboring nodes. The problem is a single-task problem when $w_k^0 = w^0$ for $k = 1, 2, \ldots, K$, and a multitask problem when $w_k^0 \neq w_l^0$ for $k \neq l$. It is assumed that there is no limit on how much information can be transmitted between neighbors.
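As a concrete illustration of model (1), the following minimal NumPy sketch (our own, not from the paper) generates synthetic multitask data; the network size, parameter dimension, noise level and the randomly drawn tasks $w_k^0$ are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, T = 15, 2, 1000                 # nodes, parameter dimension, time steps (illustrative)

# One unknown M x 1 task per node, drawn at random here for demonstration.
w0 = rng.standard_normal((K, M))

# Zero-mean regressors u_{k,i} and measurements d_{k,i} = u_{k,i}^T w_k^0 + n_{k,i}.
u = rng.standard_normal((K, T, M))
noise = 0.1 * rng.standard_normal((K, T))      # sigma_{n,k} = 0.1 for every node
d = np.einsum('ktm,km->kt', u, w0) + noise
```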

2.2. Diffusion MKRSL Algorithm

In many previous works, diffusion distributed estimation algorithms are based on the MSE criterion, which achieves desirable performance when the measurement noise is Gaussian but may deteriorate dramatically in an impulsive noise environment. A central goal of this work is therefore to design an algorithm for parameter estimation over multitask sensor networks that is robust to both Gaussian and impulsive noise.
Information theoretic learning (ITL) plays a significant role in, and provides a general framework for, distributed parameter estimation in non-Gaussian cases. Correntropy is a local statistical similarity measure in ITL, defined by Reference [26]
$$V(X, Y) = E[k_\sigma(X - Y)] = \int k_\sigma(x - y) \, dF_{XY}(x, y) \qquad (2)$$
where $X, Y$ are two random variables, $k_\sigma(\cdot)$ is a shift-invariant Mercer kernel and $\sigma > 0$ denotes the kernel bandwidth. $F_{XY}(x, y)$ is the joint distribution function of $(X, Y)$. In our work, we focus on the Gaussian kernel, which takes the following form:
$$k_\sigma(x - y) = \exp\left( -\frac{(x - y)^2}{2\sigma^2} \right) \qquad (3)$$
The minimum kernel risk-sensitive loss (MKRSL) algorithm is derived by applying the kernel risk-sensitive loss (KRSL) to develop a new adaptive filtering algorithm; the KRSL shows better convexity properties than the correntropic loss on the error performance surface [29,38]. The KRSL between two random variables $X$ and $Y$ is defined by
$$L_\lambda(X, Y) = \frac{1}{\lambda} E\left[ \exp(\lambda(1 - k_\sigma(X - Y))) \right] = \frac{1}{\lambda} \int \exp(\lambda(1 - k_\sigma(x - y))) \, dF_{XY}(x, y) \qquad (4)$$
where $\lambda > 0$ is the risk-sensitive parameter. Nevertheless, the exact joint distribution of $(X, Y)$ is usually unavailable in application scenarios; only a limited number of samples $\{x(i), y(i)\}_{i=1}^{L}$ are known. Therefore, the sample-mean estimator of the KRSL, called the empirical KRSL, is calculated as an average over the samples:
$$\hat{L}_\lambda(X, Y) = \frac{1}{L\lambda} \sum_{i=1}^{L} \exp\left( \lambda(1 - k_\sigma(x(i) - y(i))) \right) \qquad (5)$$
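A short numerical sketch (ours, purely illustrative) of the empirical KRSL of Equation (5) highlights the property behind its robustness: each sample's contribution is bounded by $e^\lambda / \lambda$, so a single outlier cannot dominate the loss the way it dominates the MSE.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel k_sigma(e) = exp(-e^2 / (2 sigma^2)) of Eq. (3)."""
    return np.exp(-e**2 / (2.0 * sigma**2))

def empirical_krsl(e, sigma=2.0, lam=2.0):
    """Empirical KRSL of an error sample vector e, Eq. (5)."""
    return np.mean(np.exp(lam * (1.0 - gaussian_kernel(e, sigma)))) / lam

e = np.array([0.1, -0.2, 0.05, 50.0])   # the last entry mimics an impulsive outlier
print(empirical_krsl(e))                 # bounded: the outlier contributes at most exp(lam)/lam
print(np.mean(e**2))                     # the MSE, by contrast, is dominated by the outlier
```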
Then, the KRSL cost function is derived as
$$J_{KRSL} = \frac{1}{L\lambda} \sum_{i=1}^{L} \exp\left( \lambda(1 - k_\sigma(e(i))) \right) \qquad (6)$$
with $e(i) = d(i) - u_i^T w$. The time average of the KRSL cost function in the above equation can be replaced by the expectation
$$J_{KRSL} = \frac{1}{\lambda} E\left[ \exp(\lambda(1 - k_\sigma(e(i)))) \right] \qquad (7)$$
Based on the KRSL cost function in Equation (7), the instantaneous cost function of the KRSL algorithm is approximated as
$$\tilde{J}_{KRSL} = \frac{1}{\lambda} \exp\left( \lambda(1 - k_\sigma(e(i))) \right) \qquad (8)$$
For the distributed diffusion estimation problem, our goal is to seek the best $w_k^0$ by minimizing the diffusion KRSL cost function at each node $k$ in cooperation with its neighboring nodes. For each node $k$, $\mathcal{N}_k$ is the one-hop neighbor set, and the $c_{l,k}$ are non-negative real combination weights chosen according to the Metropolis rule:
$$c_{l,k} = \begin{cases} \dfrac{1}{\max(n_k, n_l)}, & \text{if } l \in \mathcal{N}_k \setminus \{k\}, \\[4pt] 1 - \sum_{l \in \mathcal{N}_k \setminus \{k\}} c_{l,k}, & \text{if } l = k, \\[4pt] 0, & \text{if } l \notin \mathcal{N}_k, \end{cases} \qquad (9)$$
where $n_k$ is the degree of node $k$.
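For concreteness, a minimal NumPy sketch (ours, not the paper's) builds such a combination matrix from an adjacency matrix; the convention that $n_k$ counts neighbors excluding the node itself is our assumption.

```python
import numpy as np

def metropolis_weights(adj):
    """Combination matrix C with entries c_{l,k} per Eq. (9); adj is a symmetric
    0/1 adjacency matrix without self-loops (degree convention is our assumption)."""
    K = adj.shape[0]
    deg = adj.sum(axis=0)                      # n_k: number of neighbors of node k
    C = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            if l != k and adj[l, k]:
                C[l, k] = 1.0 / max(deg[k], deg[l])
        C[k, k] = 1.0 - C[:, k].sum()          # so that each column sums to one
    return C

# Example: a 4-node line graph; the resulting C is doubly stochastic.
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
print(metropolis_weights(adj))
```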
The real, non-negative combination coefficients $c_{l,k}$ satisfy the following conditions: $\sum_{l \in \mathcal{N}_k} c_{l,k} = 1$ and $c_{l,k} = 0$ if $l \notin \mathcal{N}_k$, so that $C\mathbb{1} = \mathbb{1}$ and $\mathbb{1}^T C = \mathbb{1}^T$, where $C$ is the $N \times N$ matrix with entries $c_{l,k}$. The KRSL local cost function at each node $k$ can be formulated as
$$J_k^{loc}(w) = \sum_{l \in \mathcal{N}_k} c_{l,k} \tilde{J}_{KRSL}(e_{l,i}) = \frac{1}{\lambda} \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) = \frac{1}{\lambda} \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(d_{l,i} - u_{l,i}^T w)) \right) \qquad (10)$$
Based on the KRSL local cost function, the derivative of (10) with respect to w can be derived as
$$\nabla J_k^{loc}(w) = \frac{1}{\lambda} \sum_{l \in \mathcal{N}_k} c_{l,k} \nabla_w \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) = -\frac{1}{\sigma^2} \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} \qquad (11)$$
At node $k$, the weight vector update equation for $w_k^0$ based on the stochastic gradient is obtained as
$$w_k(i) = w_k(i-1) - \mu \nabla J_k^{loc}(w) = w_k(i-1) + \frac{\mu}{\sigma^2} \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} = w_k(i-1) + \eta \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} \qquad (12)$$
where $\eta = \mu / \sigma^2$ is the step-size and $w_k(i)$ is the estimate of $w_k^0$ at time index $i$. The above recursion is a new expression of the MKRSL algorithm. Inspired by the general framework for diffusion-based distributed estimation algorithms [13], an adapt-then-combine (ATC) strategy for the diffusion MKRSL algorithm is proposed. The ATC scheme first updates the estimate at each node with the adaptive algorithm; then, each node $k$ fuses the intermediate estimates from its neighbors. The intermediate estimate at each node $k$ is defined as:
$$\phi_k(i-1) = \sum_{l \in \mathcal{N}_k} \beta_{l,k} w_l(i-1) \qquad (13)$$
The nodes update their intermediate estimates by
$$\phi_k(i) = \phi_k(i-1) + \eta \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} \qquad (14)$$
where $\phi_k(i-1)$ is the intermediate estimate at time index $i-1$ for node $k$. The non-negative real value $\beta_{l,k}$ is a weight coefficient corresponding to the matrix $B$; in particular, $B = I$ in the ATC scheme [12]. Therefore, we obtain:
$$\phi_k(i) = w_k(i-1) + \eta \sum_{l \in \mathcal{N}_k} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} \qquad (15)$$
Equation (15) ignores the task relatedness among nodes, and is called the non-cooperative diffusion MKRSL in this article.
However, multitask estimation is an attractive field in practical applications. In clustered multitask networks, the nodes are grouped into clusters and each cluster has an identical task. Furthermore, by utilizing the relatedness of the tasks, the performance of distributed estimation can be improved. Equation (15) is adjusted for multitask estimation:
$$\phi_k(i) = w_k(i-1) + \eta \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} + \tau \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl} \left( w_l(i-1) - w_k(i-1) \right) \qquad (16)$$
where $\mathcal{C}(k)$ denotes the cluster of node $k$, $\tau$ is a non-negative regularization strength parameter, the $\rho_{kl}$ are weights, and $\eta(i) = \exp(\lambda(1 - k_\sigma(e(i)))) k_\sigma(e(i))$ is the error-dependent factor of the effective step-size. The set $\mathcal{N}_k \cap \mathcal{C}(k)$ contains the neighbors of node $k$ that are in the same cluster as $k$; conversely, $\mathcal{N}_k \setminus \mathcal{C}(k)$ denotes the neighbors of $k$ that are not in the same cluster as $k$. Equations (15) and (16) are defined as the incremental (adaptation) step. The combination step can then be derived as
$$w_k(i) = \sum_{l \in \mathcal{N}_k} c_{l,k} \phi_l(i) \qquad (17)$$
The step-size factor $\eta(i)$ is a function of $e(i)$; its curves for different values of $\lambda$ (with $\sigma = \eta = 2.0$) and of $\sigma$ (with $\lambda = \eta = 2.0$) are depicted in Figure 1. It is shown that $\eta(i)$ approaches zero as $|e(i)| \to \infty$ for all values of $\lambda$. Therefore, the MKRSL algorithm maintains robustness to outliers, such as impulsive noise.
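The decay of $\eta(i)$ for large errors is easy to verify numerically; the snippet below (ours, illustrative) reproduces the qualitative behavior of Figure 1.

```python
import numpy as np

def eta_i(e, sigma=2.0, lam=2.0):
    """Error-dependent factor eta(i) = exp(lam * (1 - k_sigma(e))) * k_sigma(e)."""
    k = np.exp(-e**2 / (2.0 * sigma**2))
    return np.exp(lam * (1.0 - k)) * k

for e in [0.0, 1.0, 5.0, 50.0]:
    print(e, eta_i(e))   # decays toward zero for large |e|: outliers barely move the estimate
```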
For a better understanding, the Multitask Diffusion MKRSL algorithm is summarized in Algorithm 1:
Algorithm 1: Multitask Diffusion MKRSL Algorithm
Input: $d_{k,i}$, $u_{k,i}$, $\eta$, $\tau$, and $c_{l,k}$ satisfying (9)
Initialization: start with $w_l(-1) = 0$ for all $l$.
for $i = 1 : T$
    for each node $k$:
        Adaptation:
        $\phi_k(i) = w_k(i-1) + \eta \sum_{l \in \mathcal{N}_k \cap \mathcal{C}(k)} c_{l,k} \exp\left( \lambda(1 - k_\sigma(e_{l,i})) \right) k_\sigma(e_{l,i}) e_{l,i} u_{l,i} + \tau \sum_{l \in \mathcal{N}_k \setminus \mathcal{C}(k)} \rho_{kl} \left( w_l(i-1) - w_k(i-1) \right)$
        Communication:
        transmit the intermediate estimate $\phi_k(i)$ to all neighbors in $\mathcal{N}_k$
        Combination:
        $w_k(i) = \sum_{l \in \mathcal{N}_k} c_{l,k} \phi_l(i)$
end for
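To make the steps concrete, here is a minimal NumPy sketch of Algorithm 1 (our illustration, not the authors' code); the data layout, the neighbor test via C[l, k] > 0, and the use of $w_k(i-1)$ inside the error $e_{l,i}$ are our assumptions.

```python
import numpy as np

def d_mkrsl(d, u, C, clusters, rho, eta=0.02, tau=0.1, lam=2.0, sigma=1.5):
    """Illustrative multitask D-MKRSL run (Algorithm 1).
    d: (K, T) measurements; u: (K, T, M) regressors; C: (K, K) combination matrix;
    clusters: length-K cluster labels; rho: (K, K) inter-cluster weights."""
    K, T, M = u.shape
    w = np.zeros((K, M))
    for i in range(T):
        phi = np.zeros((K, M))
        for k in range(K):
            grad = np.zeros(M)               # MKRSL step over in-cluster neighbors, Eq. (16)
            reg = np.zeros(M)                # regularization toward other-cluster neighbors
            for l in range(K):
                if C[l, k] == 0:
                    continue                 # not a neighbor
                if clusters[l] == clusters[k]:
                    e = d[l, i] - u[l, i] @ w[k]
                    ks = np.exp(-e**2 / (2.0 * sigma**2))
                    grad += C[l, k] * np.exp(lam * (1.0 - ks)) * ks * e * u[l, i]
                else:
                    reg += rho[k, l] * (w[l] - w[k])
            phi[k] = w[k] + eta * grad + tau * reg
        # Combination step, Eq. (17): fuse intermediate estimates from neighbors.
        w = np.array([sum(C[l, k] * phi[l] for l in range(K)) for k in range(K)])
    return w
```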

3. Performance Analysis

The multitask D-MKRSL algorithm is evaluated theoretically under model (1) in this section. In the following, some common assumptions are adopted to make the analysis tractable [39,40].
(1) The regression vectors $u_{k,i}$ are independently and identically distributed (i.i.d.) in time, with $E[u_{k,i} u_{k,i}^T] = R_{u,k}$.
(2) For each node $k$ at time index $i$, the measurement noise $n_{k,i}$ is independent of $u_{k,i}$ and is a zero-mean Gaussian mixture, so that $E[n_{k,i}] = 0$.
(3) The step-size $\eta$ is small enough that its squared value is negligible.
Then, the estimate-error vectors are defined as follows:
$$\tilde{w}_{k,i} = w_k^0 - w_{k,i} \qquad (18)$$
and
$$\tilde{\phi}_{k,i} = w_k^0 - \phi_{k,i} \qquad (19)$$
Furthermore, global quantities are defined to convert the local variables into network-level ones:
$$K = \mathrm{blockdiag}\{ \eta I_M, \ldots, \eta I_M \} \qquad (20)$$
$$X = \mathrm{blockdiag}\{ \tau I_M, \ldots, \tau I_M \} \qquad (21)$$
$$\tilde{w}_i = \mathrm{col}\{ \tilde{w}_{1,i}, \ldots, \tilde{w}_{K,i} \} \qquad (22)$$
$$w_i = \mathrm{col}\{ w_{1,i}, \ldots, w_{K,i} \} \qquad (23)$$
$$w^0 = \mathrm{col}\{ w_1^0, \ldots, w_K^0 \} \qquad (24)$$

3.1. Mean Performance

We consider the gradient error caused by replacing the expected KRSL cost function with its instantaneous value. The gradient error of the intermediate estimate at time $i$ and node $k$ is defined as follows:
$$s_k(w_{k,i-1}) = \hat{f}_k(w_{k,i-1}) - f_k(w_{k,i-1}) \qquad (25)$$
where $\hat{f}_k(w_{k,i-1}) = \frac{1}{\sigma^2} \exp\left( \lambda(1 - k_\sigma(e_{k,i-1})) \right) k_\sigma(e_{k,i-1}) e_{k,i-1} u_{k,i-1}$ and $f_k(w_{k,i-1}) = \frac{1}{\sigma^2} E\left[ \exp\left( \lambda(1 - k_\sigma(e_{k,i-1})) \right) k_\sigma(e_{k,i-1}) e_{k,i-1} u_{k,i-1} \right]$.
The update equation of the intermediate estimate can be rewritten as
$$\phi_{k,i} = w_{k,i-1} + \mu \left[ s_k(w_{k,i-1}) + f_k(w_{k,i-1}) \right] \qquad (26)$$
$f_k(w_{k,i-1})$ is twice continuously differentiable in a neighborhood of the line segment between the points $w_k^0$ and $w_{k,i-1}$. Thus, based on Theorem 1.2.1 in Reference [41], we have
$$f_k(w_{k,i-1}) = f_k(w_k^0) - \left[ \int_0^1 H_k\left( w_k^0 - t \tilde{w}_{k,i-1} \right) dt \right] \tilde{w}_{k,i-1} \qquad (27)$$
where $H_k(w)$ is the Hessian matrix of $f_k(w)$ and $\tilde{w}_{k,i-1} = w_k^0 - w_{k,i-1}$ is the weight error vector of node $k$. The unknown vector $w_k^0$ is the true value that we want to estimate, so $f_k(w_k^0)$ is equal to zero. The estimate of each node converges to the vicinity of the unknown vector $w_k^0$; thus, $\tilde{w}_{k,i-1}$ is small enough to be neglected inside the Hessian, yielding
$$f_k(w_{k,i-1}) \approx -\left[ \int_0^1 H_k(w_k^0) \, dt \right] \tilde{w}_{k,i-1} = -H_k(w_k^0) \tilde{w}_{k,i-1} = \beta R_{u,k} \tilde{w}_{k,i-1} \qquad (28)$$
where $R_{u,k} = E[u_{k,i} u_{k,i}^T]$ and $\beta$ is a constant.
Hence, the gradient error can be approximated by its value at $w_k^0$:
$$s_k(w_{k,i-1}) \approx s_k(w_k^0) = \hat{f}_k(w_k^0) - f_k(w_k^0) = \frac{1}{\sigma^2} \exp\left( \lambda(1 - k_\sigma(e_{k,i-1})) \right) k_\sigma(e_{k,i-1}) e_{k,i-1} u_{k,i-1} \qquad (29)$$
Substituting (28) and (29) into (26) and adjusting for multitask estimation, we obtain the intermediate estimate
$$\phi_{k,i} = w_{k,i-1} + \mu \left( s_k(w_k^0) - H_k(w_k^0) \tilde{w}_{k,i-1} \right) + \tau \left[ Q \left( \tilde{w}_{i-1} + w^0 \right) \right]_k \qquad (30)$$
where $[\cdot]_k$ denotes the $k$-th $M \times 1$ block of a network-level vector and
$$Q = I_{MN} - P \otimes I_M \qquad (31)$$
$P$ is the matrix with $(k,l)$-th entry $\rho_{kl}$. Substituting (30) into (17), we can get the update equation of $w_k(i)$ as follows
$$w_k(i) = \sum_{l \in \mathcal{N}_k} c_{l,k} \left[ w_{l,i-1} + \mu \left( s_l(w_l^0) - H_l(w_l^0) \tilde{w}_{l,i-1} \right) + \tau \left[ Q \left( \tilde{w}_{i-1} + w^0 \right) \right]_l \right] \qquad (32)$$
Defining the global quantity $H = \mathrm{blockdiag}\{ H_1(w_1^0), \ldots, H_K(w_K^0) \}$, we can rewrite (32) as
$$w_i = C \left( w_{i-1} + K s_i - KH \tilde{w}_{i-1} + XQ \tilde{w}_{i-1} + XQ w^0 \right) \qquad (33)$$
Noting that $Cw^0 = w^0$ and subtracting both sides of (33) from $w^0$, the global error recursion is obtained:
$$\tilde{w}_i = C \left( I_{MN} - KH + XQ \right) \tilde{w}_{i-1} + CK s_i + CXQ w^0 \qquad (34)$$
Calculating the expectation of (34) leads to
$$E[\tilde{w}_i] = C \left( I_{MN} - KH + XQ \right) E[\tilde{w}_{i-1}] + CK E[s_i] + CXQ w^0 \qquad (35)$$
where $E[s_i] = \mathrm{col}\{ E[s_1(w_1^0)], \ldots, E[s_K(w_K^0)] \} = 0$. Based on Lemma 1 of [13], the matrix $I_{MN} - KH + XQ$ should be stable to guarantee mean stability; that is, it must hold that
$$\lambda_{\max} \left( I_{MN} - KH + XQ \right) < 1 \qquad (36)$$
where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of a matrix. Thus, a sufficient condition for maintaining the stability of the algorithm is:
$$0 < \eta < \frac{2}{\beta \lambda_{\max}(R_{u,k}) + 2\tau} \qquad (37)$$
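A short sketch (ours) of evaluating this bound numerically follows; in practice $\beta$ depends on the noise and kernel parameters and is generally unknown, so the value used here is an assumption for illustration.

```python
import numpy as np

def max_stepsize(R_u, beta=1.0, tau=0.1):
    """Sufficient step-size bound of Eq. (37): 0 < eta < 2 / (beta * lambda_max(R_u) + 2 * tau).
    beta is the (unknown) Hessian scaling constant, set to 1 here for illustration."""
    lam_max = np.linalg.eigvalsh(R_u).max()   # R_u is symmetric, so eigvalsh applies
    return 2.0 / (beta * lam_max + 2.0 * tau)

R_u = np.diag([1.0, 0.5])        # toy regressor covariance
print(max_stepsize(R_u))         # 2 / (1.0 + 0.2) = 1.666...
```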

3.2. Mean-Square Performance

In this section, we mainly focus on the mean-square performance of the proposed algorithm. Computing the weighted norm of (34) and taking expectations, we can obtain
$$E\| \tilde{w}_i \|_\Sigma^2 = E\| \tilde{w}_{i-1} \|_\Gamma^2 + E\left[ s_i^T K C^T \Sigma C K s_i \right] + (CXQ w^0)^T \Sigma \left[ 2C \left( I_{MN} - KH + XQ \right) E[\tilde{w}_{i-1}] + CXQ w^0 \right] \qquad (38)$$
where
$$\Gamma = \left( I_{MN} - KH + XQ \right)^T C^T \Sigma C \left( I_{MN} - KH + XQ \right) \qquad (39)$$
and $\Sigma$ is a Hermitian non-negative-definite weighting matrix. Under Assumptions 1 and 2, $\tilde{w}_{i-1}$ is independent of $\Gamma$. Therefore, we have:
$$E\| \tilde{w}_{i-1} \|_\Gamma^2 = E\| \tilde{w}_{i-1} \|_{E[\Gamma]}^2 \qquad (40)$$
Let
$$\gamma = \mathrm{vec}\left( E[\Gamma] \right) \qquad (41)$$
and
$$\sigma = \mathrm{vec}(\Sigma) \qquad (42)$$
where $\mathrm{vec}(\cdot)$ denotes the vectorization of a matrix, and $\| x \|_\sigma^2$ is written interchangeably with $\| x \|_\Sigma^2$ for $\sigma = \mathrm{vec}(\Sigma)$. With (41) and (42), Equation (38) can be rewritten as
$$E\| \tilde{w}_i \|_\sigma^2 = E\| \tilde{w}_{i-1} \|_\gamma^2 + E\left[ s_i^T K C^T \Sigma C K s_i \right] + (CXQ w^0)^T \Sigma \left[ 2C \left( I_{MN} - KH + XQ \right) E[\tilde{w}_{i-1}] + CXQ w^0 \right] \qquad (43)$$
The vectorization operator satisfies the identity [42]:
$$\mathrm{vec}(ABC) = \left( C^T \otimes A \right) \mathrm{vec}(B) \qquad (44)$$
Applying the expectation and vectorization operations to (39) with (41) and (42), we have
$$\gamma = \delta \sigma \qquad (45)$$
where
$$\delta = E\left[ \left( I_{MN} - KH + XQ \right) \otimes \left( I_{MN} - KH + XQ \right) \right] Z \qquad (46)$$
$$Z = E\left[ C^T \otimes C^T \right] \qquad (47)$$
Based on the relationship between the matrix trace and the vectorization operator [42], we have
$$\mathrm{tr}(A^T B) = \mathrm{vec}(B)^T \mathrm{vec}(A) \qquad (48)$$
Since $\Sigma$ is symmetric and deterministic, we obtain
$$E\left[ s_i^T K C^T \Sigma C K s_i \right] = \mathrm{vec}(V)^T Z \sigma \qquad (49)$$
where $V = K E[s_i s_i^T] K$. According to Assumptions 1 and 2, $V$ can be evaluated as
$$V = \mathrm{blockdiag}\left\{ \eta^2 E\left[ s_1(w_1^0) s_1^T(w_1^0) \right], \ldots, \eta^2 E\left[ s_K(w_K^0) s_K^T(w_K^0) \right] \right\} \qquad (50)$$
Substituting (45) and (50) into (43) gives
$$E\| \tilde{w}_i \|_\sigma^2 = E\| \tilde{w}_{i-1} \|_{\delta \sigma}^2 + \mathrm{vec}(V)^T Z \sigma + (CXQ w^0)^T \Sigma \left[ 2C \left( I_{MN} - KH + XQ \right) E[\tilde{w}_{i-1}] + CXQ w^0 \right] \qquad (51)$$
The recursion of Equation (51) is stable and convergent if the matrix $\delta$ is stable. $\delta$ can be approximated as
$$\delta \approx \left[ \left( I_{MN} - KH + XQ \right) \otimes \left( I_{MN} - KH + XQ \right) \right] Z \qquad (52)$$
All entries of $Z$ are non-negative and all of its columns sum up to unity. From the above equation, the stability of $\delta$ is therefore in accordance with the stability of $I_{MN} - KH + XQ$. Hence, choosing the step-size in line with Equation (37) keeps the proposed algorithm stable in the mean-square sense.
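As a toy numerical illustration of this argument (our construction, with small placeholder matrices rather than the true $KH$ and $XQ$), one can check that $\delta$ inherits its spectral radius from $A = I_{MN} - KH + XQ$:

```python
import numpy as np

MN = 4
A = 0.9 * np.eye(MN)                 # placeholder for I - KH + XQ under a valid step-size
C = np.full((MN, MN), 1.0 / MN)      # a doubly stochastic combination matrix
Z = np.kron(C.T, C.T)
delta = np.kron(A, A) @ Z            # Eq. (52)
rho = max(abs(np.linalg.eigvals(delta)))
print(rho, rho < 1.0)                # spectral radius below one: mean-square stable
```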

4. Simulation

In this section, we validate the performance of the proposed algorithm over multitask sensor networks in two scenarios: a Gaussian environment and an impulsive noise environment. The noise is assumed to be generated by a Gaussian mixture distribution, which is commonly used in signal processing [43,44]:
$$p(n_i) = (1 - \nu_i) N(0, \sigma_1^2) + \nu_i N(0, \sigma_2^2) \qquad (53)$$
where $N(0, \sigma_j^2)$ ($j = 1, 2$) is the Gaussian distribution with zero mean and variance $\sigma_j^2$, and $\sigma_2^2$ is set much larger than $\sigma_1^2$, which generates the impulsive component.
The noise is Gaussian or impulsive according to
$$\begin{cases} \text{Gaussian}, & \text{if } \nu_i = 0, \\ \text{Impulsive}, & \text{if } \nu_i \neq 0, \end{cases} \qquad (54)$$
and increasing $\nu_i$ leads to more frequent impulses.
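A minimal generator for this mixture model (our sketch; the variance values are illustrative, not the paper's exact settings) is:

```python
import numpy as np

def gaussian_mixture_noise(T, nu=0.05, sigma1=0.1, sigma2=10.0, rng=None):
    """Noise per model (53): N(0, sigma1^2) with probability 1 - nu,
    N(0, sigma2^2) with probability nu; sigma2 >> sigma1 yields impulses."""
    rng = rng or np.random.default_rng()
    impulsive = rng.random(T) < nu
    return np.where(impulsive,
                    sigma2 * rng.standard_normal(T),
                    sigma1 * rng.standard_normal(T))

n = gaussian_mixture_noise(2000, nu=0.05)   # nu = 0 recovers the pure Gaussian case
```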
We consider a fully connected sensor network with 15 nodes. The network topology and cluster structure are shown in Figure 2: nodes 1 to 6 belong to the first cluster, nodes 7 to 10 compose the second cluster, and nodes 11 to 15 form the third cluster.
Input variances and noise variances based on Assumptions 1 and 2 are depicted in Figure 3.
Scenario 1 (Gaussian noise environment): As shown in Figure 3, the desired signal is a random process observed in zero-mean (i.i.d.) Gaussian noise. In the experiment, the system parameters are set to $\lambda = 2$ and $\sigma = 1.5$, and the step-size is $\eta = 0.02$. $\tau$ is a regularization parameter that promotes similarity between the tasks of neighboring clusters and is chosen as $\tau = 0.1$. The learning curve of the mean square deviation (MSD) is defined as
$$\mathrm{MSD} = \frac{1}{K} \sum_{k=1}^{K} \left\| w_k^0 - w_{k,i} \right\|_2^2 \qquad (55)$$
which is adopted for performance comparison. In Figure 4a, $d(i)$ is the average value of $d_{k,i}$ over all nodes $k$ at time $i$. In Figure 4b, we compare several related algorithms over the multitask network: diffusion least mean p-power (D-LMP) [24], the diffusion generalized maximum correntropy criterion algorithm (D-GMCC) [16], diffusion sign-error LMS (DSE-LMS) [25], D-LMS [13] and the proposed D-MKRSL algorithm. The step-sizes of all algorithms are chosen after many experiments to ensure the same convergence speed, and the other parameters of each algorithm are selected experimentally to achieve desirable performance. From the figure, we can conclude that the D-MKRSL algorithm outperforms the other related algorithms in the Gaussian noise environment.
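The MSD learning curves used in such comparisons can be computed with a helper like the following (ours, illustrative):

```python
import numpy as np

def network_msd_db(w0, w_hist):
    """Network MSD of Eq. (55) in dB. w0: (K, M) true tasks;
    w_hist: (T, K, M) estimates w_{k,i} recorded at every iteration."""
    msd = np.mean(np.sum((w_hist - w0) ** 2, axis=2), axis=1)   # average over the K nodes
    return 10.0 * np.log10(msd)
```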
Scenario 2 (impulsive noise environment): The impulsive noise model (53)–(54) is adopted to describe the distribution of the impulsive interference in this experiment. We now test the influence of the impulsive interference on the performance of the algorithms mentioned above. In Figure 5a and Figure 6a, the desired signals are plotted with impulsive noise of $\nu_i = 0.05$ and $\nu_i = 0.03$, respectively. The corresponding performance of the algorithms in the impulsive noise environment is plotted in Figure 5b and Figure 6b. The values of the parameters $\alpha$ and $\lambda$ for D-GMCC are selected to achieve the best performance in both the Gaussian and impulsive noise environments. We can observe that the proposed D-MKRSL algorithm is robust and shows superior performance compared with the other related algorithms in the impulsive noise environment.
Furthermore, we consider the performance of the algorithm in a nonstationary scenario, in which the unknown vector $w_k^0$ is assumed to change at time instant 1000. From the convergence curves in Figure 7, it can be easily observed that the proposed algorithm maintains desirable performance even in the presence of a sudden change of the unknown vector.
Another important aspect is how the correlation of the tasks influences the estimation performance. First, we investigate whether the proposed algorithm can improve performance by utilizing the relatedness of the tasks, compared with the non-cooperative strategy. Figure 8 compares the D-MKRSL algorithm with the non-cooperative strategy over a multitask network with identical task relatedness. It is clear that utilizing the relatedness of the tasks is beneficial to the estimation performance. Next, the impact of the similarity of the tasks on performance is studied. Following Reference [35], the optimum parameter vectors are assumed to be uniformly distributed on a circle of radius $r$ centered at $w^0$; the larger the value of $r$, the smaller the correlation among the tasks. Based on this model, the optimum parameter vectors over the multitask network are different but related. The multitask estimation model can be expressed as:
$$w_k^0 = w^0 + r \begin{bmatrix} \cos \theta_k \\ \sin \theta_k \end{bmatrix}, \qquad \theta_k = 2\pi (k - 1)/N + \pi/8 \qquad (56)$$
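A one-line construction of these tasks (our sketch, for $M = 2$) makes the role of $r$ explicit:

```python
import numpy as np

def tasks_on_circle(w0, r, N):
    """Optimum vectors of model (56): N tasks on a circle of radius r around w0."""
    theta = 2.0 * np.pi * np.arange(N) / N + np.pi / 8.0   # theta_k = 2*pi*(k-1)/N + pi/8
    return w0 + r * np.stack([np.cos(theta), np.sin(theta)], axis=1)

wk0 = tasks_on_circle(np.array([1.0, 1.0]), r=0.5, N=15)   # smaller r => more similar tasks
```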
Figure 9 demonstrates that the performance of the algorithm improves as the similarity of the tasks increases.

5. Conclusions

In this work, we considered the problem of distributed estimation over multitask sensor networks and proposed the D-MKRSL algorithm, which achieves desirable performance. Through theoretical analysis, a sufficient condition for ensuring the stability of the D-MKRSL algorithm was obtained. The simulation results show that, compared with related algorithms, the D-MKRSL algorithm performs better in both Gaussian and impulsive noise environments. Furthermore, we uncovered the relationship between the relatedness of the tasks and the estimation performance: under the cooperative strategy, performance improves as the correlation among the tasks increases.

Author Contributions

Data curation, F.C.; Funding acquisition, S.D.; Project administration, X.L.; Software, X.L. and Q.S.; Supervision, S.D. and F.C.; Writing–original draft, X.L.; Writing–review and editing, Q.S. and S.X.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No.61875168) and Chongqing Research Program of Basic Research and Frontier Technology (No. cstc2017jcyjAX0265).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sayed, A.H. Adaptation, learning, and optimization over networks. Found. Trends Mach. Learn. 2014, 7, 311–801.
  2. Lorenzo, P.D.; Barbarossa, S.; Sayed, A.H. Bio-inspired swarming for dynamic radio access based on diffusion adaptation. In Proceedings of the 2011 19th European Signal Processing Conference (EUSIPCO), Barcelona, Spain, 29 August–2 September 2011; pp. 402–406.
  3. Chen, J.; Cao, X.; Cheng, P.; Xiao, Y.; Sun, Y. Distributed collaborative control for industrial automation with wireless sensor and actuator networks. IEEE Trans. Ind. Electron. 2010, 57, 4219–4230.
  4. Sayed, A.H.; Tu, S.; Chen, J.; Zhao, X.; Towfic, Z.J. Diffusion strategies for adaptation and learning over networks. IEEE Signal Process. Mag. 2013, 30, 155–171.
  5. Olfati-Saber, R.; Fax, J.A.; Murray, R.M. Consensus and cooperation in networked multi-agent systems. Proc. IEEE 2007, 95, 215–233.
  6. Kar, S.; Moura, J.M.F. Distributed consensus algorithms in sensor networks: Link failures and channel noise. IEEE Trans. Signal Process. 2009, 57, 355–369.
  7. Wang, J.; Peng, D.; Jing, Z.; Chen, J. Consensus-Based Filter for Distributed Sensor Networks with Colored Measurement Noise. Sensors 2018, 18, 3678.
  8. Nedic, A.; Ozdaglar, A. Distributed subgradient methods for multiagent optimization. IEEE Trans. Autom. Control 2009, 54, 48–61.
  9. Nedic, A.; Bertsekas, D.P. Incremental subgradient methods for nondifferentiable optimization. SIAM J. Optim. 2001, 12, 109–138.
  10. Rabbat, M.G.; Nowak, R.D. Quantized incremental algorithms for distributed optimization. IEEE J. Sel. Areas Commun. 2005, 23, 798–808.
  11. Lopes, C.G.; Sayed, A.H. Incremental adaptive strategies over distributed networks. IEEE Trans. Signal Process. 2007, 48, 223–229.
  12. Chen, J.; Sayed, A.H. Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Trans. Signal Process. 2012, 60, 4289–4305.
  13. Cattivelli, F.S.; Sayed, A.H. Diffusion LMS strategies for distributed estimation. IEEE Trans. Signal Process. 2010, 58, 1035–1048.
  14. Zhao, X.; Sayed, A.H. Performance limits for distributed estimation over LMS adaptive networks. IEEE Trans. Signal Process. 2012, 60, 5107–5124.
  15. Tu, S.Y.; Sayed, A.H. Diffusion strategies outperform consensus strategies for distributed estimation over adaptive networks. IEEE Trans. Signal Process. 2012, 60, 6217–6234.
  16. Chen, F.; Li, X.; Duan, S.; Wang, L.; Wu, J. Diffusion generalized maximum correntropy criterion algorithm for distributed estimation over multitask network. Digit. Signal Process. 2018, 81, 16–25.
  17. Liu, Y.; Li, C.; Tang, W.K.S.; Zhang, Z. Distributed estimation over complex networks. Inf. Sci. 2012, 197, 91–104.
  18. Chen, F.; Shao, X. Broken-motifs diffusion LMS algorithm for reducing communication load. Signal Process. 2017, 197, 91–104.
  19. Chen, F.; Shao, X. Complementary performance analysis of general complex-valued diffusion LMS for noncircular signals. Signal Process. 2019, 160, 237–246.
  20. Cattivelli, F.S.; Lopes, C.G.; Sayed, A.H. A diffusion RLS scheme for distributed estimation over adaptive networks. In Proceedings of the 2007 IEEE 8th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Helsinki, Finland, 17–20 June 2007; pp. 1–5.
  21. Cattivelli, F.S.; Lopes, C.G.; Sayed, A.H. Diffusion recursive least-squares for distributed estimation over adaptive networks. IEEE Trans. Signal Process. 2008, 56, 1865–1877.
  22. Gao, W.; Chen, J. Kernel Least Mean p-Power algorithm. IEEE Signal Process. Lett. 2017, 24, 996–1000.
  23. Shao, X.; Chen, F.; Ye, Q.; Duan, S. A Robust Diffusion Estimation Algorithm with Self-Adjusting Step-Size in WSNs. Sensors 2017, 17, 824.
  24. Wen, F. Diffusion least-mean P-power algorithms for distributed estimation in alpha-stable noise environments. Electron. Lett. 2013, 49, 1355–1356.
  25. Ni, J.; Chen, J.; Chen, X. Diffusion sign-error LMS algorithm: Formulation and stochastic behavior analysis. Signal Process. 2016, 128, 142–149.
  26. Liu, W.; Pokharel, P.P.; Principe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298.
  27. Chen, B.; Liu, X.; Zhao, H.; Principe, J.C. Maximum correntropy Kalman filter. Automatica 2017, 76, 70–77.
  28. Chen, B.; Xing, L.; Zhao, H.; Zheng, N.; Principe, J.C. Generalized correntropy for robust adaptive filtering. IEEE Trans. Signal Process. 2016, 64, 3376–3387.
  29. Chen, B.; Xing, L.; Xu, B.; Zhao, H.; Zheng, N.; Principe, J.C. Kernel Risk-Sensitive Loss: Definition, Properties and Application to Robust Adaptive Filtering. IEEE Trans. Signal Process. 2017, 65, 2888–2901.
  30. Chen, J.; Richard, C.; Sayed, A.H. Diffusion LMS over multitask networks. IEEE Trans. Signal Process. 2015, 63, 2733–2748.
  31. Chen, J.; Sayed, A.H. Distributed Pareto optimization via diffusion strategies. IEEE J. Sel. Top. Signal Process. 2013, 7, 205–220.
  32. Zhao, X.; Sayed, A.H. Clustering via diffusion adaptation over networks. In Proceedings of the 2012 3rd International Workshop on Cognitive Information Processing (CIP), Parador de Baiona, Spain, 28–30 May 2012; pp. 1–6.
  33. Zhao, X.; Sayed, A.H. Distributed clustering and learning over networks. IEEE Trans. Signal Process. 2015, 63, 3285–3300.
  34. Chen, J.; Richard, C.; Hero, A.O.; Sayed, A.H. Diffusion LMS for multitask problems with overlapping hypothesis subspaces. In Proceedings of the 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Reims, France, 21–24 September 2014; pp. 1–6.
  35. Bogdanovic, N.; Plata-Chaves, J.; Berberidis, K. Distributed diffusion-based LMS for node-specific parameter estimation over adaptive networks. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 7223–7227.
  36. Chen, J.; Richard, C.; Sayed, A.H. Multitask diffusion adaptation over networks. IEEE Trans. Signal Process. 2014, 62, 4129–4144.
  37. Nassif, R.; Richard, C.; Ferrari, A. Proximal multitask learning over networks with sparsity-inducing coregularization. IEEE Trans. Signal Process. 2016, 64, 6329–6344.
  38. Ma, W.; Chen, B.; Duan, J.; Zhao, H. Diffusion maximum correntropy criterion algorithms for robust distributed estimation. Digit. Signal Process. 2016, 58, 10–19.
  39. Sayed, A.H. Adaptive Filters; Wiley: Hoboken, NJ, USA, 2008.
  40. Haykin, S. Adaptive Filter Theory; Prentice-Hall: Upper Saddle River, NJ, USA, 2002.
  41. Kelley, C.T. Iterative Methods for Optimization; SIAM: Philadelphia, PA, USA, 1999.
  42. Abadir, K.M.; Magnus, J.R. Matrix Algebra; Cambridge University Press: Cambridge, UK, 2005.
  43. Chan, S.C.; Zou, Y.X. A recursive least M-estimate algorithm for robust adaptive filtering in impulsive noise: Fast algorithm and convergence performance analysis. IEEE Trans. Signal Process. 2004, 52, 975–991.
  44. Sayed, A.S.; Zoubir, A.M.; Sayed, A.H. Robust adaptation in impulsive noise. IEEE Trans. Signal Process. 2016, 64, 2851–2865.
Figure 1. Curves of $\eta(i)$ as a function of $e(i)$: (a) different values of $\lambda$ ($\sigma = \eta = 2.0$); (b) different values of $\sigma$ ($\lambda = \eta = 2.0$).
Figure 2. Network topology.
Figure 3. The variances of the input signal (a) and noise (b).
Figure 4. Gaussian noise environment: (a) desired signal; (b) transient network MSD (dB).
Figure 5. Impulsive interference environment with $\nu_i = 0.05$: (a) desired signal; (b) transient network MSD (dB).
Figure 6. Impulsive interference environment with $\nu_i = 0.03$: (a) desired signal; (b) transient network MSD (dB).
Figure 7. MSD learning curves in a nonstationary environment: (a) Gaussian environment; (b) impulsive interference.
Figure 8. Network MSD comparison over the multitask environment.
Figure 9. Network MSD comparison with different values of $r$.
