Article

Distributed Density Estimation Based on a Mixture of Factor Analyzers in a Sensor Network

1 College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
3 School of Information Science and Engineering, Southeast University, Nanjing 210096, China
* Author to whom correspondence should be addressed.
Sensors 2015, 15(8), 19047-19068; https://doi.org/10.3390/s150819047
Submission received: 27 May 2015 / Revised: 27 July 2015 / Accepted: 30 July 2015 / Published: 5 August 2015
(This article belongs to the Section Sensor Networks)

Abstract

Distributed density estimation in sensor networks has received much attention due to its broad applicability. When encountering high-dimensional observations, a mixture of factor analyzers (MFA) can be used in place of a mixture of Gaussians to describe the distributions of the observations. In this paper, we study distributed density estimation based on a mixture of factor analyzers. Existing estimation algorithms for the MFA are designed for the centralized case and are therefore not suitable for distributed processing in sensor networks. We present distributed density estimation algorithms for the MFA and its extension, the mixture of Student's t-factor analyzers (MtFA). We first define an objective function as a linear combination of local log-likelihoods. Then, we give the derivations of the distributed estimation algorithms for the MFA and the MtFA in detail. In these algorithms, the local sufficient statistics (LSS) are first calculated and then diffused. Each node then performs a linear combination of the LSS received from the nodes in its neighborhood to obtain the combined sufficient statistics (CSS). The parameters of the MFA and the MtFA can be obtained using the CSS. Finally, we evaluate the performance of these algorithms by numerical simulations and an application example. Experimental results validate the promising performance of the proposed algorithms.

1. Introduction

Sensor networks are composed of tiny, intelligent sensor nodes deployed over a geographic region. This type of network has a broad range of applications, such as environmental monitoring, precision agriculture and military surveillance [1,2,3]. Distributed estimation over sensor networks aims to estimate parameters of interest through local computation and information exchange among neighboring nodes. Compared to centralized estimation, it does not need to send the observations collected by all of the sensors to a powerful central node, so complexity and resource consumption can be reduced. Furthermore, distributed estimation is more flexible and robust to node and/or link failures [2,4]. Recently, many distributed estimation algorithms have been proposed, such as distributed LMS [5], distributed recursive least squares (RLS) [6], distributed source localization [7], distributed power allocation [8], distributed sparse estimation [9,10], distributed information theoretic learning [11] and distributed Gaussian process regression [12].
The mixture of Gaussians, or Gaussian mixture model (GMM), is a flexible and powerful probabilistic modeling tool for density estimation. It has been used in several areas, such as pattern recognition, computer vision, signal and image analysis and machine learning. When estimating the parameters of the GMM by the maximum likelihood criterion, the expectation maximization (EM) algorithm [13,14] is usually adopted. It iteratively performs the expectation step (E-step), which calculates the conditional expectations of the unobserved/hidden variables, and the maximization step (M-step), which estimates the parameters of the data distributions based on the result of the E-step. However, when the dimension of the observations is high, the fitting performance of the GMM deteriorates, or the associated EM algorithm may even fail to work [15]. The main reason is that the GMM cannot realize dimensionality reduction, that is, it cannot compress the highly correlated components of the observations. In this case, a mixture of factor analyzers (MFA) [16,17] can be considered. The MFA combines local factor analyzers in the form of a finite mixture. Since factor analysis can describe the variability among high-dimensional observations in terms of potentially low-dimensional latent factors, the MFA can carry out dimensionality reduction while performing specific tasks. Moreover, in order to handle non-normality of the data or outliers, the normal distributions in the MFA can be replaced by Student's t-distributions, yielding the mixture of Student's t-factor analyzers (MtFA) [18,19]. Therefore, the MFA and its extension, the MtFA, are effective tools for processing high-dimensional observations [20]. They have been successfully applied in signal processing [21,22], bioinformatics [23,24] and other applied fields.
In sensor networks, the GMM has been introduced for density estimation of observations [25,26,27,28,29,30,31,32]. The estimation process for the GMM needs to be realized by distributed EM algorithms. According to the way in which nodes communicate with each other, distributed EM algorithms can be classified into the incremental type [25,26], the consensus type [27] and the diffusion type [28,29,30,31,32]. In the incremental scheme [25,26], a long path traversing the nodes from the first to the last must be pre-selected; when any node along the path fails, reliability problems may arise. In the consensus-based distributed EM algorithm for the GMM [27], a consensus filter, which provides each node with global statistics, is run between the E-step and the M-step at each iteration. The objective is to obtain the same estimates at all nodes at each iteration. In the diffusion type of distributed estimation for the GMM, each node exchanges information only with its neighbors through a diffusion cooperative protocol. Good performance is obtained while the communication overhead is kept low [2,4]. In this paper, we focus on the diffusion type of distributed estimation. Among previous studies, a distributed model order and parameter estimation algorithm for the GMM was proposed in [28], and its performance was analyzed. In [29], a diffusion-based EM algorithm was presented for distributed estimation in unreliable sensor networks, in which some nodes may be subject to data failures and report only noise; the aim of the algorithm was to achieve optimal performance over the whole range of SNRs. In [30], information diffusion and averaging were considered and performed simultaneously. In [31], an adaptive diffusion scheme was proposed. In [32], the performance of the diffusion-based EM algorithm was analyzed; it can be considered a stochastic approximation method [33] for finding the maximum likelihood estimate of the GMM.
As the MFA can handle high-dimensional observations, which are also commonly encountered in sensor networks, in this paper we propose distributed density estimation algorithms for the MFA and its extension, the MtFA. We denote these two algorithms by D-MFA and D-MtFA, respectively. Specifically, for each node in the sensor network, we define an objective function as a linear combination of local log-likelihoods, whose combination weights are determined by the number of observations at the corresponding neighbor nodes. After the local sufficient statistics are computed, the current node calculates its combined sufficient statistics as a linear weighted combination of the local sufficient statistics received from the nodes in its neighborhood set. Finally, the parameters of the MFA and the MtFA are updated using the combined sufficient statistics. Apart from the distributed processing of the MFA and the MtFA, there are two other differences from existing algorithms. First, in the related algorithms [25,27,28], the mixing proportions of the GMM differ from node to node, whereas the means and covariances are the same throughout the network. In contrast, in this paper, all of the parameters of the MFA or the MtFA are the same throughout the network. With this design, distributed clustering and classification can be performed at arbitrary nodes after the estimation process finishes. Second, for each node, the objective function is defined directly, and the combination weights in the objective function are designed effectively.
The rest of this paper is organized as follows. In Section 2, a brief overview of the MFA and the MtFA is provided. In Section 3, the D-MFA and D-MtFA algorithms are formulated. In Section 4, numerical simulations on synthetic observations are performed to illustrate the effectiveness and advantages of the proposed algorithms, and the application of these algorithms to distributed clustering is also presented. Finally, conclusions are drawn in Section 5.
The acronyms mentioned in this paper are listed in the following.
Acronym list:
GMM: Gaussian mixture model
EM: expectation maximization
E-step: expectation step
M-step: maximization step
MFA: mixture of factor analyzers
MtFA: mixture of Student's t-factor analyzers
D-MFA: distributed density estimation algorithm for the MFA
D-MtFA: distributed density estimation algorithm for the MtFA
CSS: combined sufficient statistics
LSS: local sufficient statistics
S-MFA: standard EM algorithm for the MFA
S-MtFA: standard EM algorithm for the MtFA
NC-MFA: non-cooperation MFA
NC-MtFA: non-cooperation MtFA
D-GMM: distributed density estimation algorithm for the GMM
D-tMM: distributed density estimation algorithm for the Student's t-mixture model

2. Preliminaries: MFA and MtFA

2.1. Mixture of Factor Analyzers

Let the observed dataset be $Y = \{y_1, \ldots, y_N\}$. In the MFA, it is assumed that each $p$-dimensional data vector $y_n$ is generated as:
$$y_n = \mu_i + A_i u_n + e_{ni} \quad \text{with prob. } \pi_i \quad (i = 1, \ldots, I) \tag{1}$$
where $I$ is the number of mixing components. The corresponding $q$-dimensional ($q < p$) factor $u_n \sim \mathcal{N}(u_n \mid 0, I_q)$ is independent of the noise $e_{ni} \sim \mathcal{N}(e_{ni} \mid 0, D_i)$, where $D_i$ is a $p \times p$ diagonal matrix. The parameter $\mu_i$ is the mean of the $i$-th analyzer, and $A_i$ ($p \times q$) is the linear transformation known as the factor loading matrix. The so-called mixing proportions $\pi_i$ ($i = 1, \ldots, I$) are nonnegative and sum to one. The standard EM algorithm for the MFA is given in [15,16].
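As an illustration, the following minimal sketch (not from the paper; it assumes numpy and illustrative parameter values) draws observations from the generative model in Equation (1).

```python
import numpy as np

def sample_mfa(N, pis, mus, As, Ds, rng=None):
    """Draw N observations y_n = mu_i + A_i u_n + e_ni, component i chosen with prob. pi_i."""
    rng = np.random.default_rng(rng)
    p, q = mus[0].shape[0], As[0].shape[1]
    comps = rng.choice(len(pis), size=N, p=pis)           # mixture component labels
    Y = np.empty((N, p))
    for n, i in enumerate(comps):
        u = rng.standard_normal(q)                        # u_n ~ N(0, I_q)
        e = rng.normal(0.0, np.sqrt(np.diag(Ds[i])))      # e_ni ~ N(0, D_i), D_i diagonal
        Y[n] = mus[i] + As[i] @ u + e
    return Y, comps
```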

2.2. Mixture of Student’s t-Factor Analyzers

Since the MFA adopts the normal family for the distributions of the errors and the latent factors, it is sensitive to outliers. An obvious way to improve the robustness of this model for observations with longer-than-normal tails is to use the t-family of elliptically-symmetric distributions. Therefore, the MtFA was proposed in [18]. In the MtFA, the $p$-dimensional data vector $y_n$ is assumed to be generated in the same way as in the MFA, as shown in Equation (1). However, the distributions of the $q$-dimensional ($q < p$) factor $u_n$ and the noise $e_{ni}$ are $t(u_n \mid 0, I_q, \nu_i)$ and $t(e_{ni} \mid 0, D_i, \nu_i)$, respectively. In these Student's t-distributions, $\nu_i$ is the degree of freedom that controls the length of the tails. With this modification, the MtFA is more robust to outliers and can better handle the non-normality of observations [23]. In essence, $t(u_n \mid 0, I_q, \nu_i)$ and $t(e_{ni} \mid 0, D_i, \nu_i)$ can be regarded as Gaussian scale mixtures $\mathcal{N}(u_n \mid 0, I_q / w_{ni})$ and $\mathcal{N}(e_{ni} \mid 0, D_i / w_{ni})$ averaged over a Gamma-distributed precision scalar $w_{ni}$, that is:
$$t(u_n \mid 0, I_q, \nu_i) = \int \mathrm{d}w_{ni}\, \mathcal{N}(u_n \mid 0, I_q / w_{ni})\, \mathcal{G}(w_{ni} \mid \nu_i/2, \nu_i/2)$$
$$t(e_{ni} \mid 0, D_i, \nu_i) = \int \mathrm{d}w_{ni}\, \mathcal{N}(e_{ni} \mid 0, D_i / w_{ni})\, \mathcal{G}(w_{ni} \mid \nu_i/2, \nu_i/2)$$
where $\mathcal{G}(\cdot)$ denotes the Gamma distribution. The standard EM algorithm for the MtFA is given in [14,18].
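To make the scale-mixture representation concrete, the following sketch (an illustration assuming numpy, not part of the estimation algorithms) draws multivariate Student's t samples by first drawing the precision scalar $w \sim \mathcal{G}(\nu/2, \nu/2)$ and then a Gaussian with covariance scaled by $1/w$.

```python
import numpy as np

def sample_t_scale_mixture(mean, cov, nu, size, rng=None):
    """Student's t samples via the Gaussian scale mixture with Gamma(nu/2, nu/2) precisions."""
    rng = np.random.default_rng(rng)
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=size)   # rate nu/2 -> scale 2/nu
    return np.array([rng.multivariate_normal(mean, cov / wi) for wi in w]), w
```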

3. Distributed Estimation Algorithms for the MFA and MtFA

3.1. Network Model and Objective Function

Consider a sensor network with $M$ nodes. The $m$-th node has $N_m$ observations $Y_m = \{y_{m,n}\}_{n=1,\ldots,N_m}$ ($m = 1, \ldots, M$), where $y_{m,n}$ denotes the $n$-th observation at node $m$. The distribution of each $p$-dimensional observation $y_{m,n}$ is modeled by the MFA defined in Equation (1). Note that the factor associated with $y_{m,n}$ is denoted by $u_{m,n}$. The parameter set of the MFA to be estimated is $\Theta = \{\pi_i, \mu_i, A_i, D_i\}_{i=1,\ldots,I}$.
The network topology is described by a graph. Let $W$ denote the distance over which a node can communicate via wireless radio links. Nodes $m$ and $l$ are connected if the Euclidean distance $d_{m,l}$ between them is less than or equal to $W$. Moreover, a graph is connected if for any pair of nodes $(m, n)$ there exists a path from $m$ to $n$. The neighborhood set of node $m$, denoted by $R_m$, is defined as the set of one-hop neighbors of node $m$ (including $m$ itself). For example, in Figure 1, the dashed circle represents the neighborhood set of node $m$, containing Node 1, Node 2, node $l$ and node $m$ itself.
Figure 1. A sensor network consists of a collection of cooperating nodes. Node $m$ only exchanges information (e.g., the local sufficient statistics (LSS) in the proposed D-MFA and D-MtFA algorithms) with the nodes in $R_m$.
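A small sketch of how such neighborhood sets might be built from node coordinates and the communication radius $W$ is given below (assuming numpy; this construction is an illustration and not part of the paper's algorithm itself).

```python
import numpy as np

def neighborhood_sets(positions, W):
    """positions: (M, 2) node coordinates; returns R[m], the one-hop neighbors of m (m included)."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    return [np.flatnonzero(d[m] <= W) for m in range(positions.shape[0])]
```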
In order to design the D-MFA and D-MtFA algorithms, the objective functions should first be carefully specified. Here, we take node $m$ in the sensor network as an example. We define the objective function $F_m(\Theta)$ of the MFA at node $m$ as a linear combination of the local log-likelihoods $\log p(Y_l \mid \Theta)$ associated with the nodes $l$ in its neighborhood $R_m$:
$$F_m(\Theta) = \sum_{l \in R_m} c_{lm} \log p(Y_l \mid \Theta) = \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \log \sum_{i=1}^{I} \pi_i\, \mathcal{N}(y_{l,n} \mid \mu_i, A_i A_i^T + D_i) \tag{2}$$
where $\{c_{lm}\}_{l \in R_m}$ are non-negative combination coefficients satisfying $\sum_{l \in R_m} c_{lm} = 1$, with $c_{lm} = 0$ if node $l \notin R_m$.
It should be emphasized that, when defining $F_m(\Theta)$, we consider two important factors. First, as node $m$ can only communicate with its neighbors, it is reasonable to define $F_m(\Theta)$ as a combination of the local log-likelihoods $\log p(Y_l \mid \Theta)$ ($l \in R_m$). When estimating $\Theta$, node $m$ can then make use of the information from the nodes in $R_m$. Due to the effect of information diffusion, each node can obtain information directly or indirectly from all other nodes. Second, the contributions of the different local log-likelihoods $\log p(Y_l \mid \Theta)$ ($l \in R_m$) to the estimation of $\Theta$ at node $m$ may also be different. The combination coefficient $c_{lm}$ weights the importance of the information flow from node $l$ ($l \in R_m$). Therefore, how to choose $c_{lm}$ is important. Here, we adopt a simple but effective mechanism in which $c_{lm}$ is determined by:
$$c_{lm} = \frac{N_l}{\sum_{l' \in R_m} N_{l'}} \tag{3}$$
If node $l$ has a larger number of observations $N_l$, the information from this node makes a larger contribution to obtaining accurate parameter estimates. Therefore, a larger combination coefficient $c_{lm}$ in Equation (3) further makes this contribution prominent. In the future, a more effective implementation, such as an adaptive strategy [4], can be considered to determine these combination coefficients.
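As an illustration of Equation (3), the following sketch (assuming numpy; the names are illustrative) computes the combination coefficients of every node from the observation counts broadcast by its neighbors.

```python
import numpy as np

def combination_coefficients(R, N_obs):
    """R[m]: indices of the neighborhood of node m; N_obs[l]: observation count at node l."""
    M = len(R)
    C = np.zeros((M, M))
    for m in range(M):
        total = sum(N_obs[l] for l in R[m])
        for l in R[m]:
            C[l, m] = N_obs[l] / total       # c_{lm}; entries outside R_m stay zero
    return C
```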

3.2. Distributed Density Estimation Algorithm for the MFA

After the objective functions $F_m(\Theta)$ ($m = 1, \ldots, M$) have been determined, the next task is to estimate the parameters $\Theta$ of the MFA by maximizing $F_m(\Theta)$. For node $l$, an $I$-dimensional binary latent variable $z_{l,n}$, associated with $y_{l,n}$, is introduced. As the MFA is a mixture model, $z_{l,n,i} = 1$ denotes that $y_{l,n}$ belongs to the $i$-th component of the MFA. The latent variables for node $l$ in the neighborhood $R_m$ are $U_l = \{u_{l,n}\}_{n=1,\ldots,N_l}$ and $Z_l = \{z_{l,n}\}_{n=1,\ldots,N_l}$ ($l \in R_m$). Now, $F_m(\Theta)$ ($m = 1, \ldots, M$) in Equation (2) can be expressed as:
$$F_m(\Theta) = \sum_{l \in R_m} c_{lm} \log \sum_{Z_l} \int \mathrm{d}U_l\, p(Y_l, Z_l, U_l \mid \Theta)$$
where:
$$p(Y_l, Z_l, U_l \mid \Theta) = p(Z_l \mid \Theta)\, p(U_l \mid Z_l, \Theta)\, p(Y_l \mid Z_l, U_l, \Theta) \tag{4}$$
The three conditional probabilities in Equation (4) are:
$$p(Z_l \mid \Theta) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} \pi_i^{z_{l,n,i}}, \quad p(U_l \mid Z_l, \Theta) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} \mathcal{N}(u_{l,n} \mid 0, I_q)^{z_{l,n,i}}, \quad p(Y_l \mid Z_l, U_l, \Theta) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} \mathcal{N}(y_{l,n} \mid \mu_i + A_i u_{l,n}, D_i)^{z_{l,n,i}}$$
Here, we derive the distributed estimation algorithm with the aid of the standard EM algorithm [13]. First, we introduce two distributions $q(Z_l)$ and $q(U_l \mid Z_l)$ defined over the latent variables. For any choice of $q(Z_l)$ and $q(U_l \mid Z_l)$, the following decomposition holds:
$$F_m(\Theta) = \sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta) + \sum_{l \in R_m} c_{lm}\, \mathrm{KL}(q_l \,\|\, p_l) \tag{5}$$
where:
$$\mathcal{L}(q_l, \Theta) = \sum_{Z_l} q(Z_l) \int \mathrm{d}U_l\, q(U_l \mid Z_l) \log \frac{p(Y_l, Z_l, U_l \mid \Theta)}{q(Z_l)\, q(U_l \mid Z_l)}, \qquad \mathrm{KL}(q_l \,\|\, p_l) = -\sum_{Z_l} q(Z_l) \int \mathrm{d}U_l\, q(U_l \mid Z_l) \log \frac{p(Z_l \mid Y_l, \Theta)\, p(U_l \mid Z_l, Y_l, \Theta)}{q(Z_l)\, q(U_l \mid Z_l)}$$
The verification of this log-likelihood decomposition can be found in [34]. As $F_m(\Theta)$ is a combination of local log-likelihoods, the whole decomposition can also be expressed as a combination of local log-likelihood decompositions, as shown in Equation (5).
Moreover, $\mathrm{KL}(q_l \,\|\, p_l)$ in Equation (5) is the Kullback–Leibler divergence between $q(Z_l)\, q(U_l \mid Z_l)$ and $p(Z_l \mid Y_l, \Theta)\, p(U_l \mid Z_l, Y_l, \Theta)$, which satisfies $\mathrm{KL}(q_l \,\|\, p_l) \ge 0$. Therefore, it can be seen from Equation (5) that $\sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta) \le F_m(\Theta)$. In other words, $\sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta)$ is a lower bound on $F_m(\Theta)$. As direct maximization of $F_m(\Theta)$ is difficult, it is solved by maximizing this lower bound instead.
Suppose that the parameters estimated in the last iteration are $\Theta^{\mathrm{old}} = \{\pi_i^{\mathrm{old}}, \mu_i^{\mathrm{old}}, A_i^{\mathrm{old}}, D_i^{\mathrm{old}}\}_{i=1,\ldots,I}$. In the first stage, the lower bound $\sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta^{\mathrm{old}})$ is maximized with respect to $q(Z_l)\, q(U_l \mid Z_l)$ while holding $\Theta^{\mathrm{old}}$ fixed. From Equation (5), this maximum is achieved when $\mathrm{KL}(q_l \,\|\, p_l) = 0$, that is, when $q(Z_l)\, q(U_l \mid Z_l) = p(Z_l \mid Y_l, \Theta^{\mathrm{old}})\, p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}})$. Therefore, the two conditional distributions $p(Z_l \mid Y_l, \Theta^{\mathrm{old}})$ and $p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}})$ should be computed.
Concretely, for node $l$ ($l \in R_m$), $p(Z_l \mid Y_l, \Theta^{\mathrm{old}})$ can be calculated as:
$$p(Z_l \mid Y_l, \Theta^{\mathrm{old}}) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} p(z_{l,n,i} \mid y_{l,n})$$
where:
$$p(z_{l,n,i} \mid y_{l,n}) = \frac{\pi_i^{\mathrm{old}}\, \mathcal{N}(y_{l,n} \mid \mu_i^{\mathrm{old}}, A_i^{\mathrm{old}} (A_i^{\mathrm{old}})^T + D_i^{\mathrm{old}})}{\sum_{i'=1}^{I} \pi_{i'}^{\mathrm{old}}\, \mathcal{N}(y_{l,n} \mid \mu_{i'}^{\mathrm{old}}, A_{i'}^{\mathrm{old}} (A_{i'}^{\mathrm{old}})^T + D_{i'}^{\mathrm{old}})} \tag{6}$$
Moreover, $p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}})$ should also be obtained:
$$p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}}) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} p(u_{l,n} \mid y_{l,n}, z_{l,n,i}) = \prod_{n=1}^{N_l} \prod_{i=1}^{I} \mathcal{N}(u_{l,n} \mid \bar{u}_{l,n,i}, \Omega_i) \tag{7}$$
The mean $\bar{u}_{l,n,i}$ and covariance $\Omega_i$ are:
$$\bar{u}_{l,n,i} = g_i^T (y_{l,n} - \mu_i^{\mathrm{old}}), \qquad \Omega_i = I_q - g_i^T A_i^{\mathrm{old}} \tag{8}$$
where:
$$g_i = \left( A_i^{\mathrm{old}} (A_i^{\mathrm{old}})^T + D_i^{\mathrm{old}} \right)^{-1} A_i^{\mathrm{old}} \tag{9}$$
is an intermediate variable introduced to simplify the expressions in the following steps.
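A minimal sketch of these local E-step quantities (Equations (6)-(9)) for one node is given below, assuming numpy and scipy; the variable names mirror the text, but the implementation itself is only illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def local_estep(Y, pis, mus, As, Ds):
    """Responsibilities r[n, i], factor posterior means u_bar[i], and (g_i, Omega_i)."""
    N, I, q = Y.shape[0], len(pis), As[0].shape[1]
    r = np.zeros((N, I))
    g, Omega, u_bar = [], [], []
    for i in range(I):
        Sigma_i = As[i] @ As[i].T + Ds[i]                      # marginal covariance
        r[:, i] = pis[i] * multivariate_normal.pdf(Y, mus[i], Sigma_i)
        g_i = np.linalg.solve(Sigma_i, As[i])                  # Equation (9)
        g.append(g_i)
        Omega.append(np.eye(q) - g_i.T @ As[i])                # Equation (8)
        u_bar.append((Y - mus[i]) @ g_i)                       # row n is u_bar_{n,i}
    r /= r.sum(axis=1, keepdims=True)                          # Equation (6)
    return r, u_bar, g, Omega
```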
When the above two conditional distributions have been obtained, $q(Z_l)\, q(U_l \mid Z_l)$ is determined and held fixed, and the lower bound $\sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta)$ is maximized with respect to $\Theta$ to obtain the new estimate $\Theta^{\mathrm{new}}$. This causes the lower bound to increase, which necessarily causes the corresponding $F_m(\Theta)$ to increase.
Concretely, the current lower bound is expressed as:
$$\sum_{l \in R_m} c_{lm}\, \mathcal{L}(q_l, \Theta) = \sum_{l \in R_m} c_{lm} \sum_{Z_l} p(Z_l \mid Y_l, \Theta^{\mathrm{old}}) \int \mathrm{d}U_l\, p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}}) \left[ \log p(Y_l, Z_l, U_l \mid \Theta) - \log p(Z_l \mid Y_l, \Theta^{\mathrm{old}})\, p(U_l \mid Z_l, Y_l, \Theta^{\mathrm{old}}) \right] \tag{10}$$
Discarding the second logarithmic term, which is unrelated to $\Theta$, in Equation (10), the objective function at node $m$, denoted by $Q_m(\Theta)$, is:
$$Q_m(\Theta) = \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \sum_{i=1}^{I} p(z_{l,n,i} \mid y_{l,n})\, p(u_{l,n} \mid y_{l,n}, z_{l,n,i}) \times z_{l,n,i} \log \left[ \pi_i\, \mathcal{N}(u_{l,n} \mid 0, I_q)\, \mathcal{N}(y_{l,n} \mid \mu_i + A_i u_{l,n}, D_i) \right]$$
Now, the parameters in $\Theta$ can be obtained by taking the derivatives of $Q_m(\Theta)$ with respect to $\Theta$.
First, $\pi_i$ and $\mu_i$ are updated to:
$$\pi_i = \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle}{\sum_{i'=1}^{I} \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i'} \rangle} \tag{11}$$
and:
$$\mu_i = \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle\, y_{l,n}}{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle} \tag{12}$$
respectively. Subsequently, by taking the derivative of $Q_m(\Theta)$ with respect to $A_i$, we have:
$$A_i = \left[ \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle (y_{l,n} - \mu_i) \langle u_{l,n} \rangle^T \right] \left[ \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle u_{l,n} u_{l,n}^T \rangle \right]^{-1} \tag{13}$$
Finally, the expression for $D_i$ can be obtained in the same way:
$$D_i = \mathrm{diag}\left\{ \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \left[ (y_{l,n} - \mu_i)(y_{l,n} - \mu_i)^T - A_i \langle u_{l,n} u_{l,n}^T \rangle A_i^T \right]}{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle} \right\} \tag{14}$$
where $\mathrm{diag}\{\cdot\}$ denotes the operator setting the off-diagonal terms to zero. In Equations (11)-(14), $\langle z_{l,n,i} \rangle$ is the expectation of $z_{l,n,i}$, given by Equation (6). $\langle u_{l,n} \rangle$ and $\langle u_{l,n} u_{l,n}^T \rangle$ can be obtained from Equation (7), and are:
$$\langle u_{l,n} \rangle = \bar{u}_{l,n,i} \quad \text{and} \quad \langle u_{l,n} u_{l,n}^T \rangle = \Omega_i + \bar{u}_{l,n,i}\, \bar{u}_{l,n,i}^T \tag{15}$$
respectively.
Substituting Equation (15) into Equations (13) and (14), we have:
$$A_i = V_i\, g_i \left( g_i^T V_i\, g_i + \Omega_i \right)^{-1} \tag{16}$$
and:
$$D_i = \mathrm{diag}\left\{ V_i - A_i \left( g_i^T V_i\, g_i + \Omega_i \right) A_i^T \right\} \tag{17}$$
where:
$$V_i = \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle (y_{l,n} - \mu_i)(y_{l,n} - \mu_i)^T}{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle} \tag{18}$$
From Equations (11), (12) and (16)-(18), we can see that, when estimating the parameters $\Theta$ at node $m$, three combined sufficient statistics (CSS) must be obtained, represented as:
$$\mathrm{CSS}_m^{(1)}[i] = \sum_{l \in R_m} c_{lm}\, \mathrm{LSS}_l^{(1)}[i], \quad \mathrm{CSS}_m^{(2)}[i] = \sum_{l \in R_m} c_{lm}\, \mathrm{LSS}_l^{(2)}[i], \quad \mathrm{CSS}_m^{(3)}[i] = \sum_{l \in R_m} c_{lm}\, \mathrm{LSS}_l^{(3)}[i] \tag{19}$$
where:
$$\mathrm{LSS}_l^{(1)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle, \quad \mathrm{LSS}_l^{(2)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle\, y_{l,n}, \quad \mathrm{LSS}_l^{(3)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle\, y_{l,n}\, y_{l,n}^T \tag{20}$$
$\mathrm{LSS}_l = \{\mathrm{LSS}_l^{(1)}[i], \mathrm{LSS}_l^{(2)}[i], \mathrm{LSS}_l^{(3)}[i]\}_{i=1,\ldots,I}$ are the local sufficient statistics (LSS) of node $l$. Therefore, the CSS at node $m$ is a linear combination of the LSS of the nodes in $R_m$. If node $l$ has a large number of observations, the accuracy of the calculated $\mathrm{LSS}_l$ should be high, and it should make an important contribution to the CSS of node $m$. A relatively large $c_{lm}$ in Equation (3) makes this contribution prominent, leading to an accurate estimate of $\Theta$.
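As an illustration of Equation (20), the following sketch (assuming numpy and the responsibilities r from the E-step sketch above) computes the three local sufficient statistics at one node.

```python
import numpy as np

def local_sufficient_statistics(Y, r):
    """Y: (N, p) observations; r[n, i]: responsibilities <z_{l,n,i}> from Equation (6)."""
    LSS1 = r.sum(axis=0)                              # sum_n <z_{n,i}>
    LSS2 = r.T @ Y                                    # sum_n <z_{n,i}> y_n
    LSS3 = np.einsum('ni,np,nq->ipq', r, Y, Y)        # sum_n <z_{n,i}> y_n y_n^T
    return LSS1, LSS2, LSS3
```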
In the following, we summarize the realization process of the D-MFA algorithm.
Step 1 (Initialization): Initialize the parameters $\{\pi_i, \mu_i, A_i, D_i\}_{i=1,\ldots,I}$. Each node $l$ broadcasts the number of its observations $N_l$ to its neighbors. When receiving this information, each node calculates the combination coefficients by Equation (3).
Step 2 (Computation): Each node $l$ in the sensor network computes $\langle z_{l,n,i} \rangle$, $\Omega_i$ and $g_i$ by Equations (6), (8) and (9), respectively. Then, it computes the three local sufficient statistics $\mathrm{LSS}_l = \{\mathrm{LSS}_l^{(1)}[i], \mathrm{LSS}_l^{(2)}[i], \mathrm{LSS}_l^{(3)}[i]\}_{i=1,\ldots,I}$ according to its own observations $Y_l$ by Equation (20).
Step 3 (Diffusion): Each node $l$ in the sensor network diffuses its local sufficient statistics $\mathrm{LSS}_l$, as shown in Figure 1.
Step 4 (Combination): When node $m$ ($m = 1, \ldots, M$) receives the local sufficient statistics from all of its one-hop neighbor nodes $l$ ($l \in R_m$), it computes the combined sufficient statistics $\{\mathrm{CSS}_m^{(1)}[i], \mathrm{CSS}_m^{(2)}[i], \mathrm{CSS}_m^{(3)}[i]\}_{i=1,\ldots,I}$ by Equation (19).
Step 5 (Estimation): Node $m$ ($m = 1, \ldots, M$) estimates $\pi_i$, $\mu_i$, $A_i$ and $D_i$ according to Equations (11), (12), (16) and (17), respectively. Here, we substitute Equation (19) into Equations (11), (12) and (18), reformulating the estimation step as follows:
$$\pi_i = \frac{\mathrm{CSS}_m^{(1)}[i]}{\sum_{i'=1}^{I} \mathrm{CSS}_m^{(1)}[i']}, \quad \mu_i = \frac{\mathrm{CSS}_m^{(2)}[i]}{\mathrm{CSS}_m^{(1)}[i]}, \quad A_i = V_i\, g_i \left( g_i^T V_i\, g_i + \Omega_i \right)^{-1}, \quad D_i = \mathrm{diag}\left\{ V_i - A_i \left( g_i^T V_i\, g_i + \Omega_i \right) A_i^T \right\}$$
where:
$$V_i = \frac{\mathrm{CSS}_m^{(3)}[i] - 2\, \mathrm{CSS}_m^{(2)}[i]\, \mu_i^T + \mathrm{CSS}_m^{(1)}[i]\, \mu_i\, \mu_i^T}{\mathrm{CSS}_m^{(1)}[i]}$$
Step 6 (Termination): Node $m$ ($m = 1, \ldots, M$) calculates its current local log-likelihood as:
$$\log p(Y_m \mid \Theta^{\mathrm{new}}) = \sum_{n=1}^{N_m} \log \sum_{i=1}^{I} \pi_i\, \mathcal{N}(y_{m,n} \mid \mu_i, A_i A_i^T + D_i)$$
where superscript “new” denotes the newly estimated parameters at the current iteration. If log p ( Y m | Θ new ) - log p ( Y m | Θ old ) < ϵ , node m enters the terminated state; else, go to Step 2 and start the next iteration. It is noted that the terminated nodes do no computation or communication in the following iterations. If one node cannot receive information from a neighbor node in the next iteration, the node will use the received and saved LSS information from that neighbor node at the last iteration when updating CSS. When there is no message communication or information exchange in the network, implying all nodes reach the terminated state, the algorithm ends.
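To make Steps 4 and 5 concrete, the following sketch (assuming numpy, the coefficients C from Equation (3), each neighbor's LSS, and the g_i and Omega_i from the E-step sketch above) performs the combination and estimation at a single node; it is illustrative rather than a full implementation of the algorithm.

```python
import numpy as np

def combine_and_estimate(m, R, C, LSS, g, Omega):
    # Step 4: combined sufficient statistics, Equation (19)
    CSS1 = sum(C[l, m] * LSS[l][0] for l in R[m])
    CSS2 = sum(C[l, m] * LSS[l][1] for l in R[m])
    CSS3 = sum(C[l, m] * LSS[l][2] for l in R[m])
    # Step 5: parameter estimates from the CSS
    pis = CSS1 / CSS1.sum()
    mus, As, Ds = [], [], []
    for i in range(len(pis)):
        mu_i = CSS2[i] / CSS1[i]
        V_i = (CSS3[i] - 2 * np.outer(CSS2[i], mu_i)
               + CSS1[i] * np.outer(mu_i, mu_i)) / CSS1[i]
        B = g[i].T @ V_i @ g[i] + Omega[i]
        A_i = V_i @ g[i] @ np.linalg.inv(B)
        D_i = np.diag(np.diag(V_i - A_i @ B @ A_i.T))
        mus.append(mu_i); As.append(A_i); Ds.append(D_i)
    return pis, mus, As, Ds
```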

3.3. Distributed Density Estimation Algorithm for the MtFA

Compared to the MFA, the main difference of the MtFA is that it has an additional degree-of-freedom parameter $\nu_i$ ($i = 1, \ldots, I$). Therefore, the parameter set of the MtFA to be estimated is $\Theta = \{\pi_i, \mu_i, A_i, D_i, \nu_i\}_{i=1,\ldots,I}$. Moreover, apart from $Z_l$ and $U_l$, the latent variables $W_l = \{w_{l,n,i}\}_{n=1,\ldots,N_l}^{i=1,\ldots,I}$, explained in Section 2.2, should be introduced. Similarly, for node $m$ in the sensor network, a linear combination of the local log-likelihoods associated with the nodes in $R_m$ is defined as:
$$F_m(\Theta) = \sum_{l \in R_m} c_{lm} \log p(Y_l \mid \Theta) = \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \log \sum_{i=1}^{I} \pi_i\, t(y_{l,n} \mid \mu_i, A_i A_i^T + D_i, \nu_i) \tag{21}$$
The derivation of the D-MtFA algorithm is similar to that of the D-MFA, except that in Step 2 the posterior distributions $p(w_{m,n,i} \mid y_{m,n}, z_{m,n,i})$ and $p(u_{m,n} \mid y_{m,n}, z_{m,n,i}, w_{m,n,i})$ should be computed, and in Step 5 $\nu_i$ needs to be estimated. We give this derivation in detail in the Appendix and directly describe the D-MtFA algorithm here.
Step 1 (Initialization): Initialize the values of the parameters $\{\pi_i, \mu_i, A_i, D_i, \nu_i\}_{i=1,\ldots,I}$. Each node $l$ broadcasts the number of its observations $N_l$ to its neighbors. When receiving this information, each node calculates the combination coefficients by Equation (3).
Step 2 (Computation): Each node $l$ in the sensor network computes five local sufficient statistics $\mathrm{LSS}_l = \{\mathrm{LSS}_l^{(1)}[i], \mathrm{LSS}_l^{(2)}[i], \mathrm{LSS}_l^{(3)}[i], \mathrm{LSS}_l^{(4)}[i], \mathrm{LSS}_l^{(5)}[i]\}_{i=1,\ldots,I}$ according to its observations $Y_l$, given as:
$$\mathrm{LSS}_l^{(1)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle, \quad \mathrm{LSS}_l^{(2)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle, \quad \mathrm{LSS}_l^{(3)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle\, y_{l,n}, \quad \mathrm{LSS}_l^{(4)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle\, y_{l,n}\, y_{l,n}^T \tag{22}$$
$$\mathrm{LSS}_l^{(5)}[i] = \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \log \langle w_{l,n,i} \rangle$$
The expressions for the expectations $\langle z_{l,n,i} \rangle$ and $\langle w_{l,n,i} \rangle$ in Equation (22) are given in the Appendix. Moreover, the intermediate variables $\Omega_i$ and $g_i$, also given in the Appendix, should be prepared to simplify the expressions in Step 5.
Step 3 (Diffusion): Each node $l$ in the sensor network diffuses its local sufficient statistics $\mathrm{LSS}_l$, as shown in Figure 1.
Step 4 (Combination): When node m ( m = 1 , . . . , M ) receives the local sufficient statistics from all of its one-hop neighbor nodes l ( l R m ), it calculates the combined sufficient statistics, shown as:
$$\mathrm{CSS}_m^{(H)}[i] = \sum_{l \in R_m} c_{lm}\, \mathrm{LSS}_l^{(H)}[i], \quad H = 1, 2, 3, 4, 5 \tag{23}$$
Step 5 (Estimation): Node m ( m = 1 , . . . , M ) estimates the parameters of the MtFA:
$$\pi_i = \frac{\mathrm{CSS}_m^{(1)}[i]}{\sum_{i'=1}^{I} \mathrm{CSS}_m^{(1)}[i']} \tag{24}$$
$$\mu_i = \frac{\mathrm{CSS}_m^{(3)}[i]}{\mathrm{CSS}_m^{(2)}[i]} \tag{25}$$
$$A_i = V_i\, g_i \left( g_i^T V_i\, g_i + \Omega_i \right)^{-1} \tag{26}$$
$$D_i = \mathrm{diag}\left\{ V_i - A_i \left( g_i^T V_i\, g_i + \Omega_i \right) A_i^T \right\} \tag{27}$$
where:
$$V_i = \frac{\mathrm{CSS}_m^{(4)}[i] - 2\, \mathrm{CSS}_m^{(3)}[i]\, \mu_i^T + \mathrm{CSS}_m^{(2)}[i]\, \mu_i\, \mu_i^T}{\mathrm{CSS}_m^{(1)}[i]} \tag{28}$$
In addition, ν i is updated by solving the following equation:
$$\log \frac{\nu_i}{2} - \psi\!\left(\frac{\nu_i}{2}\right) + 1 - \frac{\mathrm{CSS}_m^{(5)}[i] - \mathrm{CSS}_m^{(2)}[i]}{\mathrm{CSS}_m^{(1)}[i]} - \log \frac{\nu_i^{\mathrm{old}} + p}{2} + \psi\!\left(\frac{\nu_i^{\mathrm{old}} + p}{2}\right) = 0 \tag{29}$$
where $\psi(\cdot)$ is the digamma function and $\nu_i^{\mathrm{old}}$ is the value of $\nu_i$ at the last iteration of the algorithm. Equation (29) can be solved by numerical methods, e.g., Newton's method.
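Equation (29) is a scalar root-finding problem in $\nu_i$. The sketch below (assuming scipy; a bracketing solver is used here instead of Newton's method purely for illustration) collects all $\nu_i$-independent terms of Equation (29) into a constant and solves for the root.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def update_nu(const, lo=1e-3, hi=1e3):
    """Solve log(nu/2) - psi(nu/2) + const = 0, where `const` gathers the CSS-based
    terms and the psi/log terms evaluated at nu_old from Equation (29)."""
    f = lambda nu: np.log(nu / 2.0) - digamma(nu / 2.0) + const
    return brentq(f, lo, hi)   # assumes the root lies in the bracket [lo, hi]
```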
Step 6 (Termination): Node $m$ ($m = 1, \ldots, M$) calculates its current local log-likelihood $\log p(Y_m \mid \Theta^{\mathrm{new}})$, expressed as in Equation (21). The superscript "new" denotes the newly-estimated parameters at the current iteration. The termination condition of the algorithm is the same as that in the D-MFA algorithm.

4. Experimental Results

4.1. Synthetic Data

In this subsection, we test the performance of the proposed algorithms on synthetic data. Here, we consider a sensor network composed of 100 nodes to evaluate the estimation performance of the proposed algorithms. Nodes are randomly placed in a 5 × 5 square. The communication distance is taken as 0.8. In this setting, the connected graph reflecting the network topology is shown in Figure 2.
Figure 2. Network connection.
In the first 30 nodes (Node 1–Node 30), each node has 80 observations. In the next 40 nodes (Node 31–Node 70), each node contains 100 observations. In the last 30 nodes (Node 71–Node 100), each node has 120 observations. All of the 10-dimensional observations in the 100 nodes are assumed to be generated from three-component Gaussian mixtures. The parameters are as follows:
$(\pi_1, \pi_2, \pi_3) = (0.3, 0.5, 0.2)$; $\mu_1 = (3\; 3\; 3\; 3\; 3\; 0\; 0\; 0\; 0\; 0)$, $\mu_2 = (0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0\; 0)$, $\mu_3 = (-3\; {-3}\; {-3}\; {-3}\; {-3}\; 0\; 0\; 0\; 0\; 0)$; $\Sigma_1 = \mathrm{diag}(1\; 1\; 1\; 1\; 1\; 0.1\; 0.1\; 0.1\; 0.1\; 0.1)$, $\Sigma_2 = \Sigma_1$, $\Sigma_3 = \Sigma_1$.
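For reproducibility, a minimal sketch (assuming numpy) of how such observations could be generated at each node from the common three-component mixture is given below.

```python
import numpy as np

rng = np.random.default_rng(0)
pis = np.array([0.3, 0.5, 0.2])
mus = np.array([[3.0]*5 + [0.0]*5, [0.0]*10, [-3.0]*5 + [0.0]*5])
Sigma = np.diag([1.0]*5 + [0.1]*5)            # shared diagonal covariance
counts = [80]*30 + [100]*40 + [120]*30        # observations at Nodes 1-100

node_data = []
for N_m in counts:
    labels = rng.choice(3, size=N_m, p=pis)
    node_data.append(np.array([rng.multivariate_normal(mus[k], Sigma) for k in labels]))
```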
We adopt several models to represent the distributions of these observations, and the task is to estimate the parameters of the models. Here, we compare the performance of four schemes. In the first scheme, the standard EM algorithm for the MFA (S-MFA) is implemented in a centralized unit using all observations from the 100 nodes. In the second scheme, the D-MFA algorithm proposed in Section 3.2 runs simultaneously at all nodes. In the third scheme, the EM algorithm for the MFA runs at each node using only the local observations of that node; in other words, there is no information exchange among nodes. We abbreviate it as the non-cooperation MFA (NC-MFA) for convenience. In the last scheme, the distributed EM algorithm for the GMM (D-GMM) is implemented. In the D-GMM, the objective function is similar to that of the proposed D-MFA, except that the MFA is replaced by the GMM. It should be emphasized that the S-MFA assumes an always-reliable centralized unit under ideal conditions. However, this condition cannot always be fulfilled, since the centralized unit may fail. Therefore, the S-MFA is seldom adopted in sensor networks. The aim here is to test whether the estimation performance of the D-MFA can approach that of the S-MFA.
In the initialization of these MFA schemes, the dimension of the factors is set to five, $(\pi_1^0, \pi_2^0, \pi_3^0) = (1/3, 1/3, 1/3)$, and $\{\mu_1^0, \mu_2^0, \mu_3^0\}$ are set to randomly-selected observations in those nodes. The initial elements of $\{D_1^0, D_2^0, D_3^0\}$ and $\{A_1^0, A_2^0, A_3^0\}$ are generated from standard normal distributions. In order to visualize the estimation results, principal component analysis is performed on the observations, obtaining the two largest eigenvalues and the associated eigenvectors. Then, the observations, the estimated means $\mu_i$ ($i = 1, 2, 3$) and the covariances $\Sigma_i$ ($\Sigma_i = A_i A_i^T + D_i$) after the termination of the algorithms can be projected into the 2D principal subspace [34]. Figure 3 illustrates the estimated parameters in the 2D principal subspace for these four schemes. In this figure, the estimated mean $\mu_i$ of each component is denoted by "+", and the estimated covariance $\Sigma_i$ is represented by a shaded ellipse. Concretely, in Figure 3a, the parameters are correctly estimated by the S-MFA, as the centralized unit can use all of the observations directly. In Figure 3b-d, the results of a randomly-selected node are given. For the NC-MFA, the parameters are incorrectly estimated, as each node can only use its own observations, which also happens at the other nodes. For the D-GMM, as it is based on the GMM, it cannot describe and process the high-dimensional observations well. Finally, in the D-MFA, each node receives the calculated LSS from the nodes in its neighborhood set and combines them for parameter estimation. Compared to the GMM, the MFA can reflect the properties of these high-dimensional observations more accurately. Therefore, the estimated means and covariances are correct in the D-MFA, as shown in Figure 3b. The other nodes have the same results as this selected node, which are not shown here due to space limitations. Moreover, as the same observations and models are used in the three MFA schemes, the changes of the average log-likelihood over all nodes for the S-MFA, the D-MFA and the NC-MFA are shown as different lines in Figure 4. We can see that as the iterations proceed, the D-MFA converges, and its convergence performance approaches that of the S-MFA.
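A minimal sketch (assuming numpy) of the 2D principal-subspace projection used for this visualization is given below: the two leading eigenvectors of the sample covariance project both the observations and the estimated $(\mu_i, \Sigma_i = A_i A_i^T + D_i)$.

```python
import numpy as np

def project_to_2d(Y, mus, As, Ds):
    """Project observations and estimated component parameters onto the top-2 principal axes."""
    mean = Y.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov((Y - mean).T))
    P = evecs[:, -2:]                                  # eigenvectors of the two largest eigenvalues
    Y2 = (Y - mean) @ P
    mus2 = [(mu - mean) @ P for mu in mus]
    Sigmas2 = [P.T @ (A @ A.T + D) @ P for A, D in zip(As, Ds)]
    return Y2, mus2, Sigmas2
```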
In order to further show the estimation accuracy of the D-MFA at all of the nodes in the sensor network, we select two kinds of parameters, $(\pi_1, \pi_2, \pi_3)$ and $\mu_1$, and give their estimation results. In Figure 5, the estimated $(\pi_1, \pi_2, \pi_3)$ of the D-MFA and the NC-MFA at all 100 nodes are provided. In the NC-MFA, each node cannot correctly estimate the parameters due to its limited observations and the lack of information exchange with other nodes, as shown by the dashed lines. On the contrary, after the D-MFA converges, the estimated values of $(\pi_1, \pi_2, \pi_3)$ at all 100 nodes approach their true values $(0.3, 0.5, 0.2)$. In Figure 6, we compare all of the vector components of $\mu_1$ estimated by the three MFA schemes. Note that for the D-MFA and the NC-MFA, we give the mean and standard deviation of each vector component over the 100 nodes to reflect the performance of the whole network. We can clearly see that the S-MFA correctly estimates $\mu_1$, as it can use all of the observations. For the D-MFA, the mean of each estimated vector component of $\mu_1$ approaches the corresponding true value $(3\; 3\; 3\; 3\; 3\; 0\; 0\; 0\; 0\; 0)$, while the mean of $\mu_1$ obtained by the NC-MFA is not consistent with the true value. Moreover, the standard deviation of the D-MFA is smaller than that of the NC-MFA. Since the other parameters lead to similar results, we omit them here.
Figure 3. Scatter plot of observations with the estimated parameters at 2D principal subspace using different schemes: (a) S-MFA; (b) D-MFA; (c) NC-MFA; (d) D-GMM.
Figure 4. Log-likelihood changes of three MFA schemes during 30 iterations.
Figure 5. The estimated mixing proportions ( π 1 , π 2 , π 3 ) in the D-MFA and the NC-MFA at 100 nodes.
Figure 6. The mean and standard deviation of all of the vector components in estimated μ 1 over 100 nodes.
For the D-MtFA algorithm, we test its performance and compare it to the S-MtFA, the NC-MtFA and the D-tMM. Note that the S-MtFA, the NC-MtFA and the D-tMM can be realized by replacing the Gaussian distributions in the S-MFA, the NC-MFA and the D-GMM with Student's t-distributions, respectively. Here, the observations are generated by mixtures of Student's t-distributions. The parameters $\{\pi_i, \mu_i, \Sigma_i\}_{i=1,2,3}$ are unchanged, while $\nu_1 = \nu_2 = \nu_3 = 5$. In Figure 7, the scatter plot of the observations with the estimated parameters in the 2D principal subspace is shown. Note that for the D-MtFA, the NC-MtFA and the D-tMM, the results of a randomly-selected node are given. From this figure, we can see several observations located outside the ordinary regions, which can be regarded as outliers. The S-MtFA in the centralized unit, shown in Figure 7a, can capture the distributions, as it can make use of all observations. On the contrary, the performance of the NC-MtFA and the D-tMM is poor; the reasons are similar to those already explained for the NC-MFA and the D-GMM. When implementing the D-MtFA, the parameters can be accurately estimated, while robustness to outliers is still maintained, as shown in Figure 7b. In summary, the proposed D-MFA and D-MtFA can accurately estimate parameters in a distributed way when each node in the sensor network has part of the high-dimensional observations.
Figure 7. Scatter plot of observations with the estimated parameters at the 2D principal subspace using different schemes: (a) S-MtFA; (b) D-MtFA; (c) NC-MtFA; (d) D-tMM.

4.2. Real Data

In several countries, there are monitoring sites located in different regions whose task is to detect the nutritional ingredients in wine samples. These sites form a sensor network, in which each site communicates only with its neighbors and can perform local computations. The wine samples sent to these monitoring sites may belong to different cultivars. Therefore, at each monitoring site, these samples need to be classified, which facilitates an in-depth analysis of the relationship between the nutritional ingredients of the wines and their cultivars. The more reference samples available to each site, the better the results will be. Therefore, networking and cooperation between sites are required.
In this subsection, we consider the wine cultivar clustering problem as a simulation of the above scenario. The database for this problem is the wine dataset, which is one of the most popular datasets in the UCI machine learning repository [35]. In this wine dataset, 178 samples are collected from a chemical analysis of wines grown from three different cultivars in Italy (samples No. 1∼No. 59 belong to the first class, samples No. 60∼No. 130 to the second class and samples No. 131∼No. 178 to the third class). Each sample has 13 attributes, so the dimension of the observations is 13. The sensor network is composed of eight nodes, representing eight monitoring sites. The average number of nodes in a neighborhood set is two, and the graph is guaranteed to be connected. The average number of samples at each site is 22.
Clustering belongs to the unsupervised learning paradigm in machine learning. When the D-MFA (or the D-MtFA) is adopted for clustering, the initial values of the parameters are set as in Section 4.1. Then, the corresponding algorithm derived in Section 3.2 (or Section 3.3) is performed. After the algorithm converges, an additional computation step based on the estimated parameters $\Theta$ at node $m$ is carried out to obtain $\langle z_{m,n,i} \rangle$ by Equation (6). Finally, the cluster decision for each observation $y_{m,n}$ is:
$$\mathcal{C}_{m,n} = \arg\max_{i=1,\ldots,I} \langle z_{m,n,i} \rangle, \quad m = 1, \ldots, M,\; n = 1, \ldots, N_m$$
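A minimal sketch of this decision step (assuming numpy and scipy; illustrative only) computes the responsibilities of Equation (6) at a node with the final parameter estimates and assigns each observation to the most probable component.

```python
import numpy as np
from scipy.stats import multivariate_normal

def cluster_node(Y_m, pis, mus, As, Ds):
    """Assign each observation at node m to the component with the largest responsibility."""
    r = np.column_stack([pi * multivariate_normal.pdf(Y_m, mu, A @ A.T + D)
                         for pi, mu, A, D in zip(pis, mus, As, Ds)])
    return np.argmax(r, axis=1)       # normalizing r does not change the argmax
```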
The clustering results of the D-MFA at Node 1∼Node 8 are shown in Figure 8. In this figure, a blue "∘" represents a correctly-clustered observation, while a red "×" denotes a wrongly-clustered one. From these figures, we can see that the correct ratios at the eight nodes are 100%, 100%, 95.2%, 95.5%, 100%, 95.5%, 100% and 92.9%. There are five wrongly-clustered observations in all. The correct ratio over the entire network is 97.2%. In order to compare the performance of the D-MFA with that of the S-MFA, we run the D-MFA and the S-MFA algorithms 20 times. The average correct ratio over these 20 runs for the D-MFA is 96.9%, approaching that of the S-MFA, which is 98.2%. The reason for the small performance gap between the S-MFA and the D-MFA may be that the number of observations at each node is small and the dimension is relatively high, so the accuracy of the LSS calculated in Step 2 of the D-MFA is a little worse than that of the global sufficient statistics obtained from all of the observations in the E-step of the S-MFA. For the NC-MFA, as the number of observations at each node in this example is small, the clustering cannot be implemented. For the D-GMM, as the dimension of the observations is high, it also cannot finish the task of this example effectively. Moreover, as there are no outliers in this dataset, the clustering result of the D-MtFA is the same as that of the D-MFA and is not shown here. In summary, the proposed schemes can be used to realize distributed clustering.
Figure 8. Clustering results of the wine dataset at (ah) Node 1∼Node 8.

5. Conclusions

In this paper, we propose a distributed density estimation method based on a mixture of factor analyzers in sensor networks. First, for each node, a linear combination of the local log-likelihoods associated with the nodes in its neighborhood (including itself) is defined as the objective function. In this objective function, the combination coefficients are determined by the number of observations at the corresponding nodes. Then, the D-MFA and D-MtFA algorithms are derived. In these algorithms, the combined sufficient statistics used by each node to estimate the parameters are obtained by a linear combination of the local sufficient statistics from the nodes in its neighborhood. Finally, we evaluate the performance of the proposed algorithms and apply them to clustering and classification tasks. Experimental results show that they are promising and effective statistical tools for processing high-dimensional datasets in a distributed way in sensor networks.
In our future work, we will investigate distributed algorithms that can automatically determine the structure of MFA, e.g., the number of components. We also intend to design adaptive strategies to adjust combination coefficients more flexibly. Moreover, the coverage problem [36] in a sensor network is important when implementing distributed algorithms. We will consider this issue in the future.

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61171153, 61201326, 61273266, 61201165, 61271240, 61322104, 61401228), the State Key Development Program of Basic Research of China (2013CB329005), the Natural Science Fund for Higher Education of Jiangsu Province (Grant Nos. 12KJB510021, 13KJB510020), the Natural Science Foundation of Jiangsu Province (BK20140891), the Priority Academic Program Development of Jiangsu Higher Education Institutions, the Scientific Research Foundation of NUPT (Grant Nos. NY211032, NY211039), the Qing Lan Project, the Zhejiang Provincial Natural Science Foundation of China (Grant No. LR12F01001) and the National Program for Special Support of Eminent Professionals.

Author Contributions

All authors conceived the algorithms and designed the simulations; Xin Wei wrote the initial research manuscript; the other authors contributed to the revision of the final paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

A. Derivation of the D-MtFA Algorithm

In this Appendix, we give the derivation of the D-MtFA after defining the objective function $F_m(\Theta)$ in Section 3.3. After introducing three distributions $q(Z_l)$, $q(W_l \mid Z_l)$ and $q(U_l \mid W_l, Z_l)$, the decomposition shown in Equation (5) holds, where:
$$\mathcal{L}(q_l, \Theta) = \sum_{Z_l} q(Z_l) \int \mathrm{d}W_l\, q(W_l \mid Z_l) \int \mathrm{d}U_l\, q(U_l \mid W_l, Z_l) \log \frac{p(Y_l, U_l, W_l, Z_l \mid \Theta)}{q(Z_l)\, q(W_l \mid Z_l)\, q(U_l \mid W_l, Z_l)}$$
$$\mathrm{KL}(q_l \,\|\, p_l) = -\sum_{Z_l} q(Z_l) \int \mathrm{d}W_l\, q(W_l \mid Z_l) \int \mathrm{d}U_l\, q(U_l \mid W_l, Z_l) \log \frac{p(Z_l \mid Y_l, \Theta)\, p(W_l \mid Z_l, Y_l, \Theta)\, p(U_l \mid W_l, Z_l, Y_l, \Theta)}{q(Z_l)\, q(W_l \mid Z_l)\, q(U_l \mid W_l, Z_l)}$$
In the first stage, three conditional distributions need to be computed. Concretely, $p(z_{l,n,i} \mid y_{l,n})$ at node $l$ is:
$$p(z_{l,n,i} \mid y_{l,n}) = \frac{\pi_i^{\mathrm{old}}\, t(y_{l,n} \mid \mu_i^{\mathrm{old}}, A_i^{\mathrm{old}} (A_i^{\mathrm{old}})^T + D_i^{\mathrm{old}}, \nu_i^{\mathrm{old}})}{\sum_{i'=1}^{I} \pi_{i'}^{\mathrm{old}}\, t(y_{l,n} \mid \mu_{i'}^{\mathrm{old}}, A_{i'}^{\mathrm{old}} (A_{i'}^{\mathrm{old}})^T + D_{i'}^{\mathrm{old}}, \nu_{i'}^{\mathrm{old}})} \tag{A1}$$
The conditional distribution of $w_{l,n,i}$ given $z_{l,n,i}$ and $y_{l,n}$ is:
$$p(w_{l,n,i} \mid z_{l,n,i}, y_{l,n}) = \mathcal{G}\!\left( w_{l,n,i} \,\Big|\, \frac{\nu_i^{\mathrm{old}} + p}{2}, \frac{\nu_i^{\mathrm{old}} + \mathcal{W}_{l,n,i}}{2} \right) \tag{A2}$$
where:
$$\mathcal{W}_{l,n,i} = (y_{l,n} - \mu_i^{\mathrm{old}})^T \left( A_i^{\mathrm{old}} (A_i^{\mathrm{old}})^T + D_i^{\mathrm{old}} \right)^{-1} (y_{l,n} - \mu_i^{\mathrm{old}})$$
The expectations $\langle z_{l,n,i} \rangle$ are given by Equation (A1), and $\langle w_{l,n,i} \rangle$ can be obtained from Equation (A2) as:
$$\langle w_{l,n,i} \rangle = \frac{\nu_i^{\mathrm{old}} + p}{\nu_i^{\mathrm{old}} + \mathcal{W}_{l,n,i}}$$
The conditional distribution of $u_{l,n}$ given $w_{l,n,i}$, $z_{l,n,i}$ and $y_{l,n}$ can be computed as:
$$p(u_{l,n} \mid w_{l,n,i}, z_{l,n,i}, y_{l,n}) = \mathcal{N}(u_{l,n} \mid \bar{u}_{l,n,i}, \Omega_i / w_{l,n,i})$$
The mean $\bar{u}_{l,n,i}$ and $\Omega_i$ are:
$$\bar{u}_{l,n,i} = g_i^T (y_{l,n} - \mu_i^{\mathrm{old}}), \qquad \Omega_i = I_q - g_i^T A_i^{\mathrm{old}}$$
where:
$$g_i = \left( A_i^{\mathrm{old}} (A_i^{\mathrm{old}})^T + D_i^{\mathrm{old}} \right)^{-1} A_i^{\mathrm{old}}$$
is an intermediate variable introduced to simplify the expressions in the following stage. Moreover, the expectations $\langle u_{l,n} \rangle$ and $\langle u_{l,n} u_{l,n}^T \rangle$ are:
$$\langle u_{l,n} \rangle = \bar{u}_{l,n,i} \quad \text{and} \quad \langle u_{l,n} u_{l,n}^T \rangle = \Omega_i / \langle w_{l,n,i} \rangle + \bar{u}_{l,n,i}\, \bar{u}_{l,n,i}^T$$
Similar to the D-MFA, when the first stage finishes, the current lower bound is:
$$Q_m(\Theta) = \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \sum_{i=1}^{I} p(z_{l,n,i} \mid y_{l,n})\, p(w_{l,n,i} \mid y_{l,n}, z_{l,n,i})\, p(u_{l,n} \mid y_{l,n}, z_{l,n,i}, w_{l,n,i}) \times z_{l,n,i} \log \left[ \pi_i\, \mathcal{N}(u_{l,n} \mid 0, I_q)\, \mathcal{G}(w_{l,n,i} \mid \nu_i/2, \nu_i/2)\, \mathcal{N}(y_{l,n} \mid \mu_i + A_i u_{l,n}, D_i) \right]$$
Now, at node $m$, the parameters can be obtained by taking the derivatives of $Q_m(\Theta)$ with respect to $\Theta$. First, we obtain:
$$\pi_i = \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle}{\sum_{i'=1}^{I} \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i'} \rangle} \tag{A3}$$
and:
$$\mu_i = \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle\, y_{l,n}}{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle} \tag{A4}$$
respectively. By substituting Equations (22) and (23) into Equations (A3) and (A4), we can obtain Equations (24) and (25).
Subsequently, by taking the derivatives of $Q_m(\Theta)$ with respect to $A_i$ and $D_i$, respectively, we have:
$$A_i = \left[ \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle (y_{l,n} - \mu_i) \langle u_{l,n} \rangle^T \right] \left[ \sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \langle w_{l,n,i} \rangle \langle u_{l,n} u_{l,n}^T \rangle \right]^{-1} \tag{A5}$$
and:
$$D_i = \mathrm{diag}\left\{ \frac{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle \left[ \langle w_{l,n,i} \rangle (y_{l,n} - \mu_i)(y_{l,n} - \mu_i)^T - A_i \langle u_{l,n} u_{l,n}^T \rangle A_i^T \right]}{\sum_{l \in R_m} c_{lm} \sum_{n=1}^{N_l} \langle z_{l,n,i} \rangle} \right\} \tag{A6}$$
By substituting $\langle z_{l,n,i} \rangle$, $\langle w_{l,n,i} \rangle$, $\langle u_{l,n} \rangle$ and $\langle u_{l,n} u_{l,n}^T \rangle$ into Equations (A5) and (A6), and by considering the simplified expressions in Equations (22), (23) and (28), we can obtain Equations (26) and (27).
Finally, by taking the derivative of $Q_m(\Theta)$ with respect to $\nu_i$, the update equation for $\nu_i$, shown in Equation (29), is obtained.

References

  1. Akyildiz, I.; Su, W.; Sankarasubramniam, Y. A survey on sensor networks. IEEE Commun. Mag. 2002, 40, 102–114. [Google Scholar] [CrossRef]
  2. Sayed, A.H.; Tu, S.Y.; Chen, J.; Zhao, X.; Towfic, Z. Diffusion strategies for adaptation and learning over networks: An examination of distributed strategies and network behavior. IEEE Signal Process. Mag. 2013, 30, 155–171. [Google Scholar] [CrossRef]
  3. Poza-Lujan, J.; Posadas-Yagüe, J.; Simó-Ten, J.; Simarro, R.; Benet, G. Distributed sensor architecture for intelligent control that supports quality of control and quality of service. Sensors 2015, 15, 4700–4733. [Google Scholar] [CrossRef] [PubMed]
  4. Chen, J.; Sayed, A.H. Diffusion adaptation strategies for distributed optimization and learning over networks. IEEE Trans. Signal Process. 2012, 60, 4289–4305. [Google Scholar] [CrossRef]
  5. Cattivelli, F.S.; Sayed, A.H. Diffusion LMS strategies for distributed estimation. IEEE Trans. Signal Process. 2010, 58, 1035–1048. [Google Scholar] [CrossRef]
  6. Meteos, G.; Giannakis, G.B. Distributed recursive least-squares: Stability and performance analysis. IEEE Trans. Signal Process. 2012, 60, 3740–3754. [Google Scholar] [CrossRef]
  7. Cao, M.; Meng, Q.; Zeng, M.; Sun, B.; Li, W.; Ding, C. Distributed least-squares estimation of a remote chemical source via convex combination in wireless sensor networks. Sensors 2014, 14, 11444–11466. [Google Scholar] [CrossRef] [PubMed]
  8. Cao, L.; Xu, C.; Shao, W.; Zhang, G.; Zhou, H.; Sun, Q.; Guo, Y. Distributed power allocation for sink-centric clusters in multiple sink wireless sensor networks. Sensors 2010, 10, 2003–2026. [Google Scholar] [CrossRef] [PubMed]
  9. Lorenzo, P.D.; Sayed, A.H. Sparse distributed learning based on diffusion adaption. IEEE Trans. Signal Process. 2013, 61, 1419–1433. [Google Scholar] [CrossRef]
  10. Liu, Z.; Liu, Y.; Li, C. Distributed sparse recursive least-squares over networks. IEEE Trans. Signal Process. 2014, 62, 1385–1395. [Google Scholar] [CrossRef]
  11. Li, C.; Shen, P.; Liu, Y.; Zhang, Z. Diffusion information theoretic learning for distributed estimation over network. IEEE Trans. Signal Process. 2013, 61, 4011–4024. [Google Scholar] [CrossRef]
  12. Gu, D.; Hu, H. Spatial Gaussian process regression with mobile sensor networks. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1279–1290. [Google Scholar] [PubMed]
  13. Dempster, A.P.; Laird, N.M.; Robin, D.B. Maximum likelihood estimation from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–38. [Google Scholar]
  14. McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions, 2nd ed.; Wiley: New York, NY, USA, 2008. [Google Scholar]
  15. McLachlan, G.J.; Peel, D. Finite Mixture Models; Wiley: New York, NY, USA, 2000. [Google Scholar]
  16. Ghahramani, Z.; Hinton, G.E. The EM Algorithm for Mixtures of Factor Analyzers; Tech. Rep. CRG-TR-96-1; Department of Computer Science, University of Toronto: Toronto, ON, USA, 1997. [Google Scholar]
  17. Zhao, J.; Yu, P.L.H. Fast ML estimation for the mixture of factor analyzers via an ECM algorithm. IEEE Trans. Neural Netw. 2008, 19, 1956–1961. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. McLachlan, G.J.; Bean, R.W.; Jones, L.B. Extension of the mixture of factor analyzers model to incorporate the multivariated t-distribution. Comput. Stat. Data Anal. 2007, 51, 5327–5338. [Google Scholar] [CrossRef]
  19. Andrews, J.L.; McNicholas, P.D. Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. J. Stat. Plan. Infer. 2011, 141, 1479–1486. [Google Scholar] [CrossRef]
  20. Carin, L.; Baraniuk, R.G.; Cevher, V.; Dunson, D.; Jordan, M.I.; Sapiro, G.; Wakin, M.B. Learning low-dimensional signal models. IEEE Signal Process. Mag. 2011, 28, 39–51. [Google Scholar] [CrossRef] [PubMed]
  21. Li, R.; Tian, T.P.; Sclaroff, S. 3D human motion tracking with a coordinated mixture of factor analyzers. Int. J. Comput. Vis. 2010, 87, 1–2. [Google Scholar] [CrossRef]
  22. Wu, Z.; Kinnunen, T.; Chng, E.S. Mixture of factor analyzers using priors from non-parallel speech for voice conversion. IEEE Signal Process. Lett. 2012, 19, 914–917. [Google Scholar] [CrossRef]
  23. Baek, J.; McLachlan, G.J. Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics 2011, 21, 1269–1276. [Google Scholar] [CrossRef] [PubMed]
  24. Wei, X.; Li, C. Bayesian mixtures of common factor analyzers: Model, variational inference, and applications. Signal Process. 2013, 93, 2894–2904. [Google Scholar] [CrossRef]
  25. Nowak, R.D. Distributed EM algorithms for density estimation and clustering in sensor networks. IEEE Trans. Signal Process. 2003, 51, 2245–2253. [Google Scholar] [CrossRef]
  26. Safarinejadian, B.; Menhaj, M.B.; Karrari, M. Distributed variational Bayesian algorithms for Gaussian mixtures in sensor networks. Signal Process. 2010, 90, 1197–1208. [Google Scholar] [CrossRef]
  27. Gu, D. Distributed EM algorithm for Gaussian mixtures in sensor networks. IEEE Trans. Neural Netw. 2008, 19, 1154–1166. [Google Scholar] [CrossRef]
  28. Safarinejadian, B.; Menhaj, M.B.; Karrari, M. Distributed unsupervised Gaussian mixture learning for density estimation in sensor networks. IEEE Trans. Instrum. Meas. 2010, 59, 2250–2260. [Google Scholar] [CrossRef]
  29. Pereira, S.S.; Valcarce, R.L.; Zamora, A.P. A diffusion-based EM algorithm for distributed estimation in unreliable sensor networks. IEEE Signal Process. Lett. 2013, 20, 595–598. [Google Scholar] [CrossRef]
  30. Pereira, S.S.; Zamora, A.P.; Valcarce, R.L. A diffusion-based distributed EM algorithm for density estimation in wireless sensor networks. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 4449–4453.
  31. Towfic, Z.J.; Chen, J.; Sayed, A.H. Collaborative learning of mixture models using diffusion adaptation. In Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Santander, Spain, 18–21 September 2011; pp. 1–6.
  32. Weng, Y.; Xiao, W.; Xie, L. Diffusion-based EM algorithm for distributed estimation of Gaussian mixtures in wireless sensor networks. Sensors 2011, 11, 6297–6316. [Google Scholar] [CrossRef] [PubMed]
  33. Bianchi, P.; Fort, G.; Hachem, W. Performance of a distributed stochastic approximation algorithm. IEEE Trans. Inf. Theory 2013, 59, 7405–7418. [Google Scholar]
  34. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  35. UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science, 2013 [online]. Available online: http://archive.ics.uci.edu/ml (accessed on 16 April 2015).
  36. Katsuma, R.; Murata, Y.; Shibata, N.; Yasumoto, K.; Ito, M. Extending k-coverage lifetime of wireless sensor networks using mobile sensor nodes. In Proceedings of the 5th Annual IEEE International Conference on Wireless and Mobile Computing, Networking and Communications, Marrakech, Morocco, 12–14 October 2009; pp. 48–54.
