Diffusion Generalized MCC with a Variable Center Algorithm for Robust Distributed Estimation

Abstract: Classical adaptive filtering algorithms with a diffusion strategy under the mean square error (MSE) criterion can face difficulties in ensuring robust performance for distributed estimation (DE) over networks in complex noise environments, such as non-zero mean non-Gaussian noise. To overcome these limitations, this paper proposes a novel robust diffusion adaptive filtering algorithm developed using the generalized maximum Correntropy criterion with a variable center (GMCC-VC). Generalized Correntropy with a variable center is first defined by introducing a non-zero center into the original generalized Correntropy; it can then be used as a robust cost function, called GMCC-VC, for adaptive filtering algorithms. To improve the robustness of traditional MSE-based DE algorithms, the GMCC-VC is applied in a diffusion adaptive filter to design a novel robust DE method with the adapt-then-combine strategy. This method achieves outstanding steady-state performance in non-Gaussian noise environments because the GMCC-VC can match the distribution of non-zero mean non-Gaussian noise. Simulation results for distributed estimation under non-zero mean non-Gaussian noise demonstrate that the proposed diffusion GMCC-VC approach delivers more robust and stable performance than comparable DE methods.


Introduction
Distributed estimation has become an important technology. Its objective is to estimate parameters of interest from noisy measurements using a cooperative strategy between nodes over networks, for distributed applications such as environment monitoring, spectrum sensing, and source localization [1][2][3]. In recent years, diffusion adaptive filtering (DAF) algorithms with the mean square error (MSE) criterion have proven to be optimal and effective methods for distributed estimation (DE) in additive white Gaussian noise environments. Among DAFs, the diffusion least mean square (DLMS) [4][5][6] and diffusion recursive least squares [7] algorithms are outstanding representatives that have received significant attention in DE applications. However, the performance of these MSE-based methods may degrade in non-Gaussian environments, such as lightning noise, sea clutter, and co-channel interference in distributed spectrum sensing for cognitive radio applications [8].
Recently, an increasing number of researchers have focused on developing robust DAF algorithms using non-second-order statistics. For this purpose, a robust DE method based on the mean p-power error criterion, called diffusion least mean p-power (DLMP), was developed to estimate the parameters of wireless sensor networks [9]. Specifically, the diffusion least mean fourth (DLMF) [10,11] and diffusion sign error-LMS (DSE-LMS) [12] algorithms, as special cases of the DLMP, were proposed for DE over networks under non-Gaussian noise interference. In addition, the maximum Correntropy criterion (MCC) was defined to extend second-order statistics to higher-order statistics by exploiting the Gaussian kernel function; it can be used as a cost function to design robust adaptive filters due to its smoothness and strict positive-definiteness [13,14]. A robust DE method based on MCC, called diffusion MCC (DMCC), was developed in [15] to improve the robustness of traditional DAF algorithms. In addition, a proportionate DMCC algorithm with adaptable kernel width was proposed for sparse distributed system identification in the presence of impulsive noise [16]. The diffusion subband adaptive filtering (DSAF) algorithm, based on symmetrical MCC with individual weighting factors, was developed for colored input signals [17]. To improve the convergence of the conventional diffusion affine projection (AP) algorithm, an MCC-based diffusion AP algorithm was further derived using the MCC as the cost function for DE over networks [18]. However, the Gaussian kernel in Correntropy may not always be an ideal option under some specific conditions [19]. Consequently, a generalized form of Correntropy was defined by Chen [19], in which the generalized Gaussian density (GGD) function is utilized as the kernel function in Correntropy.
Similarly, generalized Correntropy can also be used as a cost function in adaptive signal processing and machine learning, in which case it is called the generalized MCC (GMCC). The GMCC may achieve better performance than MCC-based methods for measurements in non-Gaussian noise environments [20][21][22]. This is because GMCC contains more higher-order moments of the error data, and the additional shape parameter introduced by the GGD further expands the range of possible induced metrics. Therefore, GMCC has been widely utilized to design various robust adaptive filters, such as kernel adaptive filtering under GMCC [22], kernel recursive GMCC [20], the Stacked Extreme Learning Machine (ELM) with GMCC [23], and the unscented Kalman filter with GMCC [24]. Chen et al. proposed the diffusion GMCC method for distributed estimation [25], and a novel robust diffusion affine projection GMCC algorithm was further developed over networks [26]. Although GMCC-based methods can achieve good performance in non-Gaussian noise cases, their steady-state performance may degrade in some practical situations because the center of the generalized Gaussian kernel is located at zero. For example, error criteria centered at zero cannot obtain outstanding results when the error distribution of the signal has a non-zero mean, mainly because the zero-mean Gaussian function usually cannot match the error distribution well in this case. To overcome this problem, a variable center was introduced into the MCC to define a novel MCC-based criterion [27], called MCC-VC. MCC-VC-based adaptive filtering methods can achieve better performance than the original MCC-based methods under non-zero mean non-Gaussian noise because of the non-zero center. Taking advantage of the MCC-VC, several MCC-VC-based adaptive filtering algorithms [28,29] and the ELM with MCC-VC [30] have been proposed for signal processing and machine learning applications.
Inspired by the MCC-VC and considering the properties of the GMCC, a GMCC with a variable center (GMCC-VC) was defined by the author [30], and a recursive adaptive filtering algorithm with a sparse penalty term based on GMCC-VC was developed for sparse system estimation under non-zero mean non-Gaussian environments. In this paper, we focus on developing a novel robust diffusion adaptive filtering algorithm based on the GMCC-VC, because the center can be located anywhere to obtain good performance for DE over networks in more general situations. Due to its insensitivity to outliers, especially with a small kernel bandwidth, the GMCC further mitigates the negative impact of non-Gaussian (impulsive) noise on the estimation performance. Moreover, the variable center strategy locates the center appropriately in order to improve the robustness of the proposed method in non-zero mean non-Gaussian noise environments. For feasibility, an online parameter optimization method based on the gradient approach is designed to improve the performance of the proposed algorithm. Simulation results demonstrate that the proposed method can effectively improve distributed parameter estimation over networks in the presence of non-zero mean non-Gaussian noise.
The remainder of this paper is organized as follows. In Section 2, we briefly review generalized Correntropy and define the GMCC with a variable center. In Section 3, the diffusion GMCC with a variable center algorithm is developed and the parameter optimization methods are presented. In Section 4, numerical simulations are performed to test the performance of the proposed algorithm. Finally, we conclude this work in Section 5.

Generalized Maximum Correntropy Criterion with Variable Center
In order to improve the performance of the original generalized Correntropy in the cases of non-zero mean non-Gaussian noise, we introduce the variable center idea to generalized Correntropy to extend its applications. First, this section briefly reviews generalized Correntropy, and then defines generalized Correntropy with a variable center.

Brief Review of Generalized Correntropy
Generalized Correntropy with a GGD kernel function between arbitrary random variables X and Y is defined as [19]:

V_{α,τ}(X, Y) = E[G_{α,τ}(X − Y)],    (1)

where E[·] denotes the expectation operator. The GGD function with a zero mean is usually used as the kernel function in Equation (1), and is expressed as:

G_{α,τ}(e) = (α / (2βΓ(1/α))) exp(−|e/β|^α) = γ_{α,τ} exp(−τ|e|^α),    (2)

where e = x − y, α > 0 denotes the shape parameter, β > 0 represents the bandwidth parameter, τ = 1/β^α is the kernel parameter, and γ_{α,τ} = α/(2βΓ(1/α)) is a normalizing constant. The GGD in Equation (2) is general and flexible, so generalized Correntropy can achieve good capability in complex noise cases. In addition, Correntropy with the Gaussian kernel is recovered as a special case of generalized Correntropy with suitable parameters (α = 2).
In general, it is difficult to know the joint distribution of X and Y, and only a finite number of samples {(x_i, y_i)}_{i=1}^{N} are available. Therefore, the sample-mean estimator of generalized Correntropy is usually defined and used in practice, expressed as:

V̂_{α,τ}(X, Y) = (1/N) Σ_{i=1}^{N} G_{α,τ}(x_i − y_i),    (3)

where N is the number of sample points. The generalized Correntropy of the error can be utilized as a cost function to design robust adaptive filtering algorithms; this is called the generalized maximum Correntropy criterion (denoted as GMCC). Furthermore, a generalized C-loss function can be defined as:

J_{GC-loss}(X, Y) = G_{α,τ}(0) − V_{α,τ}(X, Y).    (4)

Electronics 2021, 10, 2807 4 of 12

From Equation (3), one can easily obtain the sample-mean estimator of the generalized C-loss in Equation (4) as:

Ĵ_{GC-loss}(X, Y) = G_{α,τ}(0) − (1/N) Σ_{i=1}^{N} G_{α,τ}(x_i − y_i).    (5)

One can easily see that minimizing J_{GC-loss}(X, Y) is equivalent to maximizing V_{α,τ}(X, Y).
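As a sanity check on the definitions above, the GGD kernel and the sample-mean generalized C-loss can be sketched in a few lines of Python. This is a minimal illustration under the stated formulas; the function names are ours, not from the paper:

```python
import numpy as np
from math import gamma

def ggd_kernel(e, alpha=2.0, beta=1.0):
    """Zero-mean GGD kernel G_{alpha,tau}(e): the kernel parameter is
    tau = 1/beta**alpha and the normalizing constant is alpha/(2*beta*Gamma(1/alpha))."""
    tau = 1.0 / beta**alpha
    norm = alpha / (2.0 * beta * gamma(1.0 / alpha))
    return norm * np.exp(-tau * np.abs(e)**alpha)

def generalized_correntropy(x, y, alpha=2.0, beta=1.0):
    """Sample-mean estimator of generalized Correntropy over paired samples."""
    return np.mean(ggd_kernel(np.asarray(x) - np.asarray(y), alpha, beta))

def gc_loss(x, y, alpha=2.0, beta=1.0):
    """Generalized C-loss: G(0) minus the sample generalized Correntropy,
    so it is zero for a perfect match and grows with the error."""
    return ggd_kernel(0.0, alpha, beta) - generalized_correntropy(x, y, alpha, beta)
```

With α = 2 the kernel has a Gaussian shape, recovering ordinary Correntropy up to parameterization, which illustrates the special case noted above.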

Generalized Maximum Correntropy Criterion with Variable Center
As mentioned in [19], generalized Correntropy with the GGD kernel can achieve good performance, and many generalized C-loss-based adaptive filtering and machine learning methods have now been developed for different applications. However, the performance of GMCC with a zero-mean GGD kernel may degrade under noise with a non-zero mean distribution. Therefore, it is important and of interest to expand the flexibility of generalized Correntropy so that it can be adapted to such situations.
Inspired by Correntropy with a variable center, we define generalized Correntropy with a variable center (GC-VC) between X and Y as [31]:

V_{α,τ,c}(X, Y) = E[G_{α,τ}(X − Y − c)] = γ_{α,τ} E[exp(−τ|X − Y − c|^α)],    (6)

where the center location c ∈ ℝ should be optimized, and mainly controls the performance of the GC-VC. By comparing Equations (1) and (6), it can be seen that the GC-VC reduces to generalized Correntropy when the center is set at zero, and to Correntropy with a variable center when α = 2. The GC-VC also involves higher-order moments of the error about the center c:

V_{α,τ,c}(X, Y) = γ_{α,τ} Σ_{n=0}^{∞} ((−τ)^n / n!) E[|e − c|^{αn}].    (7)

Similar to generalized Correntropy, the sample estimator of the GC-VC can be given as:

V̂_{α,τ,c}(X, Y) = (1/N) Σ_{i=1}^{N} G_{α,τ}(e_i − c), with e_i = x_i − y_i.    (8)

Furthermore, one can easily obtain the GC-VC loss as:

J_{GC-VC}(X, Y) = G_{α,τ}(0) − V_{α,τ,c}(X, Y).    (9)

Now, an optimal model under the GC-VC (or GC-VC loss) can be obtained as:

M* = arg min_{M ∈ ℘} J_{GC-VC}(X, Y) = arg max_{M ∈ ℘} V_{α,τ,c}(X, Y),    (10)

where M* denotes the optimal model and ℘ is the hypothesis space of models. We then call this criterion GMCC with a variable center (GMCC-VC).
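The GC-VC estimator amounts to evaluating the same GGD kernel at the center-shifted error e − c. A minimal Python sketch (illustrative names, not the paper's code) shows why a well-chosen center matters: for an error with mean 1, the estimator is larger when c matches the bias than when c = 0:

```python
import numpy as np
from math import gamma

def gc_vc(x, y, c=0.0, alpha=2.0, beta=1.0):
    """Sample estimator of generalized Correntropy with a variable center:
    the GGD kernel is applied to the center-shifted error e - c."""
    e = np.asarray(x) - np.asarray(y)
    tau = 1.0 / beta**alpha
    norm = alpha / (2.0 * beta * gamma(1.0 / alpha))
    return np.mean(norm * np.exp(-tau * np.abs(e - c)**alpha))
```

Since the kernel peaks where e = c, setting the center at the noise mean keeps biased errors near the peak of the kernel, which is the intuition behind the variable center strategy.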

Diffusion Adaptive Filtering Algorithm under GMCC-VC
Diffusion adaptive filtering (DAF) algorithms have been widely used for distributed estimation over networks due to their outstanding performance. However, the traditional DAF algorithms under MSE cannot achieve good performance in cases of complex non-Gaussian noise. In this section, we develop a novel robust DAF algorithm under the GMCC-VC criterion.

Signal Model and Diffusion GMCC-VC
Here, we consider a network composed of N nodes distributed over a geographic area, whose aim is to estimate an unknown vector w^o of size (M × 1) from measurements collected at the N nodes. Each node k has access to the realization of a scalar measurement d_k(i) and a regression vector u_k(i) of size (M × 1) at each time index i (i = 1, 2, ..., I), related as:

d_k(i) = u_k^T(i) w^o + v_k(i),    (11)

where v_k(i) represents the measurement noise and T stands for transposition. Based on this model, we develop the diffusion GMCC-VC (DGMCC-VC) algorithm, in which each node k estimates w^o by maximizing a linear combination of the local generalized Correntropy with a variable center over the neighborhood N_k of node k. The network-wide cost can be written as:

J^{glob}(w) = Σ_{k=1}^{N} E[G_{α,τ}(d_k(i) − u_k^T(i) w − c)],    (12)

and we define the local cost function of the DGMCC-VC for each node k as:

J_k(w) = Σ_{l ∈ N_k} δ_{l,k} E[G_{α,τ}(d_l(i) − u_l^T(i) w − c)].    (13)

In general, the adapt-then-combine (ATC) strategy is usually used to design diffusion adaptive filtering algorithms because it can achieve lower steady-state misalignment than the combine-then-adapt (CTA) strategy in some situations [5]. As a result, we mainly focus on the ATC diffusion GMCC-VC (briefly denoted as DGMCC-VC) algorithm in this work, given by the following adaptation and combination steps:

ψ_k(i) = w_k(i − 1) + μ_k Σ_{l ∈ N_k} δ_{l,k} G_{τ,c}(e_l^c(i)) |e_l^c(i)|^{α−1} sign(e_l^c(i)) u_l(i),
w_k(i) = Σ_{l ∈ N_k} β_{l,k} ψ_l(i),    (14)

where e_l^c(i) = d_l(i) − u_l^T(i) w_k(i − 1) − c is the center-shifted error, ψ_k(i) represents an intermediate estimate of w^o supplied by node k at time instant i, and β_{l,k} denotes the combination weight of agent l on agent k. Generally, δ_{l,k} and β_{l,k} should be set to satisfy the following conditions:

Σ_{l ∈ N_k} δ_{l,k} = 1, Σ_{l ∈ N_k} β_{l,k} = 1, and δ_{l,k} = β_{l,k} = 0 if l ∉ N_k.    (15)

Several rules have been established for selecting these weights, such as the uniform, Metropolis, maximum-degree, relative-degree, and relative-degree-variance rules; details can be found in [4].
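The Metropolis rule mentioned above can be sketched as follows. This is one common variant, assuming the adjacency matrix includes self-loops and the neighborhood size counts the node itself; it is an illustration, not the paper's exact implementation:

```python
import numpy as np

def metropolis_weights(adj):
    """Combination weights via the Metropolis rule: for neighbours l != k,
    W[l, k] = 1 / max(|N_k|, |N_l|); the self-weight absorbs the remainder
    so that each column sums to one (the condition on beta_{l,k} above)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)  # neighbourhood sizes |N_k|, self-loop included
    W = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            if l != k and adj[k, l]:
                W[l, k] = 1.0 / max(deg[k], deg[l])
        W[k, k] = 1.0 - W[:, k].sum()
    return W
```

For a ring of four nodes each entry comes out to 1/3, matching the uniform rule on a regular graph; on irregular graphs the Metropolis rule down-weights highly connected neighbours.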

Free Parameter Optimization
Equation (13) shows that two free parameters (the center c and the kernel width τ) are contained in the DGMCC-VC algorithm, and they have a significant influence on its performance. Optimizing these parameters is therefore a crucial problem for this algorithm. In this subsection, we use an online parameter adaptation approach to optimize them; the optimization model is:

(τ_l(i), c_l(i)) = arg max_{τ_l ∈ T, c_l ∈ C} V̂_{α,τ_l,c_l},    (16)

where T and C are the admissible sets of the parameters τ_l and c_l, and τ_l(i) and c_l(i) denote the adapted parameters at iteration time i. In general, using Parzen window theory over the latest error samples, we have:

V̂_{α,τ,c} = (1/L) Σ_{j=i−L+1}^{i} G_{α,τ}(e_l(j) − c),    (17)

where L is the window length of the error samples. From [17], it can be seen that the gradient-based approach can be used to solve the optimization problem in Equation (16) over a given finite set. In this work, the following simple methods are used iteratively to optimize the free parameters c_l and τ_l.

Center c: The mean or median of the error samples in the window is used to estimate the center parameter c:

c_l(i) = median{e_l(i − L + 1), ..., e_l(i)},    (18)

where the window length L is usually selected to ensure that the error curve is fitted well enough to estimate the parameters [31]; alternatively, the mean of the sorted window can be used, where sort{X} denotes a function that sorts the elements of the vector X in ascending order and median{X} represents the median of the elements in X.

Kernel width τ: To select an optimal kernel width, we use a gradient-based method to adaptively optimize this free parameter at each iteration. Taking the derivative of Equation (17) with respect to τ, we formulate a simple gradient-based search rule to update the kernel width:

τ_l(i) = τ_l(i − 1) + η_τ ∂V̂_{α,τ,c_l(i)}/∂τ |_{τ = τ_l(i−1)},    (19)

where η_τ = μ_τ χ_{α,τ} denotes the step-size parameter for the update of τ_l.
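The two update rules in Equations (18) and (19) can be sketched as follows. This is illustrative Python under our reading of the windowed estimator: the width update uses the derivative of γ_{α,τ} exp(−τ|e − c|^α) with respect to τ (including the τ-dependence of the normalizing constant, γ_{α,τ} ∝ τ^{1/α}), with all remaining constants absorbed into the step size η_τ:

```python
import numpy as np

def estimate_center(errors, L=5, use_median=True):
    """Center estimate from the last L error samples: median by default,
    sample mean as the alternative mentioned in the text."""
    window = np.asarray(errors, dtype=float)[-L:]
    return float(np.median(window)) if use_median else float(np.mean(window))

def update_tau(tau, errors_c, alpha=2.0, eta=0.05):
    """One gradient-ascent step on the kernel width tau; errors_c are the
    center-shifted errors e - c over the window. The gradient factor
    (1/(alpha*tau) - |e|^alpha) comes from differentiating
    tau**(1/alpha) * exp(-tau*|e|^alpha) with respect to tau."""
    e = np.abs(np.asarray(errors_c, dtype=float))
    g = np.mean((1.0 / (alpha * tau) - e**alpha) * np.exp(-tau * e**alpha))
    return max(tau + eta * g, 1e-6)  # keep tau strictly positive
```

The median is preferred when outliers may land inside the window, since a single outlier can move the mean arbitrarily but barely affects the median.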

DGMCC-VC Algorithm with No Measurement Exchange
Equation (14) shows that an exchange of measurement data is required during the adaptation stage, which makes the communication cost relatively high. To address this problem, a simple strategy with no measurement exchange is adopted in the adaptation stage. Together with the parameter optimization method, the updated adaptation and combination equations of the novel DGMCC-VC algorithm become:

ψ_k(i) = w_k(i − 1) + μ_k G_{τ,c}(e_k^c(i)) |e_k^c(i)|^{α−1} sign(e_k^c(i)) u_k(i),
w_k(i) = Σ_{l ∈ N_k} β_{l,k} ψ_l(i),    (20)

where e_k^c(i) = d_k(i) − u_k^T(i) w_k(i − 1) − c_k(i) is the extended error with a variable center for node k. The DGMCC-VC algorithm is summarized schematically in Algorithm 1.
The free parameters τ and c are optimized according to Equations (18) and (19).

Remark: An extra exponential function of the error G τ,c (e c k (i)) introduced by the GMCC-VC is contained in Equation (20), and this scaling factor will approach zero when a large error occurs (possibly caused by an outlier), which endows the DGMCC-VC algorithm with the property of resisting the influence of outliers. The DGMCC-VC algorithm can be viewed as the DGMCC algorithm when we set the center at zero. In addition, the DGMCC algorithm will reduce to the DLMP algorithm [9] when G τ,c (e c k (i)) is one and the center is located at zero. In addition, it can easily be found that the DLMS, DLMF, and DSE-LMS are special cases of the DGMCC-VC algorithm.
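The shrinking effect of the scaling factor G_{τ,c}(e_k^c(i)) described in the remark can be illustrated with a per-node Python sketch of the Equation (20) update (illustrative code; kernel constants are folded into the step size μ):

```python
import numpy as np

def dgmcc_vc_adapt(w, u, d, c, tau, alpha=2.0, mu=0.1):
    """Adaptation step of the DGMCC-VC node update (sketch): the factor
    exp(-tau*|e_c|**alpha) plays the role of G(e_c) and drives the update
    towards zero for outlier-sized errors, giving robustness."""
    e_c = d - u @ w - c                        # extended (center-shifted) error
    scale = np.exp(-tau * np.abs(e_c)**alpha)  # robust scaling factor G(e_c)
    return w + mu * scale * np.abs(e_c)**(alpha - 1) * np.sign(e_c) * u

def dgmcc_vc_combine(psis, beta_k):
    """Combination step: w_k = sum_l beta_{l,k} * psi_l over the neighbourhood."""
    return sum(b * p for b, p in zip(beta_k, psis))
```

With the scale factor frozen at one and c = 0, the adaptation step reduces to a DLMP-style update with p = α, matching the special cases listed in the remark.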

Simulation Results
In this section, we perform Monte Carlo (MC) simulations to verify the performance of the proposed DGMCC-VC algorithm for distributed parameter estimation over networks in non-Gaussian and non-zero mean noise environments. We consider a network topology with 20 nodes, generated as a realization of the random geometric graph model, and the unknown parameter vector is set to randn(M, 1)/√M with M = 10, where randn(·) represents a function that generates random values with a Gaussian distribution. The input signals are assumed to be zero-mean Gaussian with size M = 10. All results are calculated by taking the ensemble average of the network MSD over 200 independent MC runs. Furthermore, the linear combination coefficients are obtained using the Metropolis rule [4]. In particular, the performance of the proposed DGMCC-VC is compared with some existing algorithms, including the DLMS, DMCC [15], DLMP [9], DSE-LMS [12], and DGMCC algorithms. In order to test the convergence and steady-state performance, we define the mean square deviation (MSD), given in Equation (21), as the evaluation criterion:

MSD(i) = (1/N) Σ_{k=1}^{N} E[ ||w^o − w_k(i)||^2 ].    (21)

The measurement noise v(i) is composed of two independent noises and can be expressed as v(i) = (1 − p(i))A(i) + p(i)B(i), where A(i), which has a non-zero mean, is the inner noise; B(i), which has a much larger variance, models outliers; and p(i) denotes an independent and identically distributed binary process with an occurrence probability of 0.05. In the following simulations, we assume that the noises A(i) and B(i) are independent of the input signal, and that B(i) is a white Gaussian process with a mean of zero and variance of 100. The following three non-zero mean non-Gaussian distributions are considered as the inner noise A(i) [31]: (1) Uniform distribution over [1, 2]; (2) Laplace distribution with a mean of one and unit variance; (3) Binary distribution over {0, 1} (with the probability mass specified as in [31]).
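The noise model and the MSD criterion above can be reproduced with a short sketch (illustrative Python; the uniform inner noise corresponds to case (1)):

```python
import numpy as np

def mixture_noise(n, p=0.05, inner=lambda rng, n: rng.uniform(1, 2, n),
                  outlier_var=100.0, seed=0):
    """v(i) = (1 - p(i)) A(i) + p(i) B(i): non-zero-mean inner noise A
    plus rare zero-mean Gaussian outliers B with large variance."""
    rng = np.random.default_rng(seed)
    mask = rng.random(n) < p                          # binary process p(i)
    A = inner(rng, n)                                 # inner noise A(i)
    B = rng.normal(0.0, np.sqrt(outlier_var), n)      # outlier noise B(i)
    return np.where(mask, B, A)

def network_msd(w_true, W_nodes):
    """Network MSD in dB: average of ||w_true - w_k||^2 over all nodes."""
    dev = np.mean([np.sum((w_true - wk)**2) for wk in W_nodes])
    return 10.0 * np.log10(dev)
```

Swapping the `inner` callable for a Laplace or binary generator covers the other two noise cases without changing the rest of the setup.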

Performance Comparison among the Proposed Algorithm and Other Algorithms
We investigated the steady-state performance of the proposed algorithm in different non-zero mean non-Gaussian noise environments. For each simulation, the number of iterations was set at 1000. The step sizes of all algorithms were set to ensure almost the same initial convergence rate, as shown in the legends. The parameter p was set at 1.1 for the DLMP algorithm, the kernel size was selected as 2.0 for the DMCC algorithm, and the kernel width for DGMCC was 0.1. The exponent parameters were set to 1.8, 1.8, and 2.5 for the DGMCC and DGMCC-VC algorithms under the three inner noises mentioned above, respectively. All parameters were set by scanning for the best results. The convergence curves in terms of MSD are shown in Figures 1a-3a under the inner noises (1)-(3), respectively. We can clearly see that the proposed DGMCC-VC algorithm outperforms the other methods in terms of steady-state accuracy. The results confirm that, owing to the variable center strategy, the proposed algorithm exhibits a significant improvement in steady-state performance in non-zero mean non-Gaussian noise environments. Furthermore, the steady-state MSDs at each node k are given in Figures 1b-3b, respectively. These figures support the above conclusion that the DGMCC-VC algorithm maintains good performance compared with all other algorithms.


Performance Comparison under Time-Varying Parameter Estimation
In order to test the tracking capability of the proposed method, we considered a time-varying parameter estimation case in which the unknown system changes in the middle of the iterations. Here, the number of iterations is 1000, and the parameters of the unknown system are set to different values before and after iteration 500. The inner noise follows a uniform distribution. The convergence curves and the steady-state MSD are shown in Figure 4. The results in Figure 4 show that: (1) the proposed algorithm achieves better steady-state accuracy at the different stages compared with the other methods; and (2) the algorithm converges quickly when the unknown parameters change, which means that the proposed algorithm has good tracking ability.



Performance of the Proposed Algorithm with Different Free Parameters
From Equations (18) and (19), we know that the window length L and the step size η_τ are used to adaptively optimize the center and kernel width. In this subsection, we further investigate the effect of these two free parameters on the performance of the proposed DGMCC-VC algorithm under uniform inner noise (noise case (1)). First, we set L to 5, 15, 20, 25, and 30; the other simulation settings were consistent with those of the previous simulations. The convergence curves of the proposed algorithm with different L values are plotted in Figure 5. We observe that the proposed DGMCC-VC converges for all selected values of L, with the best performance obtained for L = 5 in this case. Second, we performed simulations to investigate the performance of the DGMCC-VC algorithm with step size values of 0.05, 0.08, 0.1, 0.12, 0.15, and 0.20. The results in Figure 6 show that the proposed method converges consistently for the different step sizes, and the performance steadily improves as the step size increases from 0.05 to 0.2. From these results, we conclude that the free parameters of the optimization method for the center and kernel width remain important for the proposed method.


Conclusions
This paper proposed a novel diffusion adaptive filter (DAF) based on generalized maximum Correntropy with a variable center (GMCC-VC) to improve the performance of classical DAFs for distributed estimation over a network in a non-zero mean non-Gaussian noise environment. Generalized Correntropy with a variable center via the generalized Gaussian kernel function was defined to match the non-zero mean distribution of the non-Gaussian noise. Then, a novel robust diffusion adaptive filtering algorithm based on the GMCC-VC was designed using the adapt-then-combine strategy for distributed estimation over networks. The free parameter optimization techniques based on the gradient method were employed to improve the performance of the proposed algorithm. Simulation results demonstrate that the proposed method outperforms the existing comparable methods for distributed estimation in the case of non-zero mean non-Gaussian noise.
Although the proposed method shows outstanding performance for distributed estimation over networks in these cases, some limitations remain, namely: (1) how to adaptively select the optimal shape parameter α under different conditions; and (2) how to reduce the time complexity of the parameter optimization process. These two limitations may be challenges and directions for our future research. Furthermore, sparse distributed estimation in the presence of non-zero mean non-Gaussian noise will also be a meaningful area of study in the future.