Article

Computing the Partial Correlation of ICA Models for Non-Gaussian Graph Signal Processing

Institute of Telecommunications and Multimedia Applications, Universitat Politècnica de València, 46022 València, Spain
*
Author to whom correspondence should be addressed.
Entropy 2019, 21(1), 22; https://doi.org/10.3390/e21010022
Submission received: 23 November 2018 / Revised: 23 December 2018 / Accepted: 24 December 2018 / Published: 29 December 2018
(This article belongs to the Special Issue Information Theory Applications in Signal Processing)

Abstract

Conventional partial correlation coefficients (PCC) were extended to the non-Gaussian case, in particular to independent component analysis (ICA) models of the observed multivariate samples. Thus, the usual methods that define the pairwise connections of a graph from the precision matrix were correspondingly extended. The basic concept involved replacing the implicit linear estimation of conventional PCC with a nonlinear estimation (conditional mean) assuming ICA. Thus, the correlation between a given pair of nodes induced by the rest of the nodes is better removed, and hence the specific connectivity weights can be better estimated. Some synthetic and real data examples illustrate the approach in a graph signal processing context.

1. Introduction

1.1. Background

The partial correlation coefficient (PCC) [1] is a classical concept that is relevant in a variety of statistical signal processing problems. Essentially, the PCC measures the correlation between two random variables conditioned on other observed random variables. Interest in it has recently increased due to the emergence of graph signal processing (GSP) [2,3,4]. One key aspect of GSP is defining the graph connectivity. Although this can be done considering the natural interactions of the context where the graph signal is defined (e.g., time or space proximity between two nodes), it is desirable to develop formal statistical methods; that is, given a set of multivariate samples where every sample component is assigned to a node of the graph, the graph connectivity which best describes the implicit dependences between any two nodes can be learned. Thus, PCCs are appropriate candidates to define the connectivity, as the effect of the rest of the nodes is removed from the pairwise correlation. Actually, the PCC is formally interpreted as the correlation between the residuals obtained after optimal estimation of the values of the two involved nodes from the rest of the nodes, where optimality is in the sense of minimum linear mean square error (LMSE). Fortunately, it is not necessary to make an explicit estimation, as the PCCs can be computed from the so-called precision matrix (the inverse of the covariance matrix). Thus, many efforts have focused on estimating the precision matrix, both in GSP [5,6] and in statistics [7,8,9,10,11,12]. However, the minimum LMSE estimator is optimum only if Gaussianity can be assumed; accordingly, methods based on the precision matrix are suboptimal in non-Gaussian scenarios. The concept of PCC was extended to a Gaussian mixture model (GMM) in [13]. Apart from this work, and to our knowledge, there have been no other attempts to consider non-Gaussian models in graph connectivity learning.

1.2. New Contributions and Paper Organization

In this work we consider the computation of the partial correlation under a non-Gaussian model, in particular an independent component analysis (ICA) model. ICA [14,15,16,17] is a consolidated technique which has found a myriad of applications in statistical signal processing (e.g., blind source separation [18,19,20,21,22,23,24,25]) and pattern recognition (see [26,27,28,29,30,31] and references therein). From the perspective of this work, ICA is a model that incorporates non-Gaussianity through some independent variables (sources), which are linearly mixed to create the observed samples. This makes it highly versatile and allows the modeling of non-Gaussian multivariate densities.
In the next section we define a new partial correlation coefficient, the ICA-PCC. The basic concept is to replace the implicit linear estimation of conventional PCC with a nonlinear estimation (conditional mean) assuming an underlying ICA model. A general formula is then presented to compute the residual covariance matrix from which the ICA-PCCs are computed. An essential part of this formula is a diagonal matrix whose entries are the mean square errors of estimating the sources of the ICA model. In Section 3, a practical method is presented to estimate such a matrix from the ICA model parameters. Finally, Section 4 includes some simulations that illustrate the improved estimation of the partial correlation by ICA-PCC in non-Gaussian scenarios. A real-data example with highly non-Gaussian multichannel EEG signals is also included to quantify changes in brain connectivity between normal and abnormal states of a patient during sleep.

2. The Partial Correlation of ICA Models

2.1. Statement of the Problem

Let $\mathbf{x} = [x_1 \cdots x_N]^T$ be the observation vector, having covariance matrix $E[\mathbf{x}\mathbf{x}^T] = \mathbf{C}_{xx}$. We assume that $\mathbf{x}$ obeys an ICA model; then
$$\mathbf{x} = \mathbf{U}\mathbf{s}, \qquad \mathbf{s} = \mathbf{W}\mathbf{x} \tag{1}$$
where $\mathbf{s} = [s_1 \cdots s_N]^T$ is a vector of independent sources and $\mathbf{U}$ is a square and invertible mixing matrix ($\mathbf{W} = \mathbf{U}^{-1}$ is the de-mixing matrix). The sources are considered standardized (zero mean and unit variance), but otherwise they may have different non-Gaussian marginal densities, which factorize the joint probability density function (pdf) $p(\mathbf{s}) = p(s_1)\cdots p(s_N)$. Notice that
$$E[\mathbf{s}] = \mathbf{0}, \quad \mathbf{C}_{ss} = E[\mathbf{s}\mathbf{s}^T] = \mathbf{I}, \quad E[\mathbf{x}] = \mathbf{0}, \quad \mathbf{C}_{xx} = E[\mathbf{x}\mathbf{x}^T] = E[\mathbf{U}\mathbf{s}\mathbf{s}^T\mathbf{U}^T] = \mathbf{U}\mathbf{U}^T \tag{2}$$
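As a quick illustration of the model in Equations (1) and (2), the following NumPy sketch (ours, not part of the paper; the source distribution, dimensions and seed are arbitrary choices) draws standardized Laplacian sources, mixes them with a random invertible $\mathbf{U}$, and checks that the sample covariance approaches $\mathbf{U}\mathbf{U}^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 5, 100_000

# Random invertible mixing matrix U and standardized Laplacian sources
# (zero mean, unit variance: var = 2*scale^2 = 1 for scale = 1/sqrt(2)).
U = rng.standard_normal((N, N))
S = rng.laplace(scale=1 / np.sqrt(2), size=(N, L))
X = U @ S                                  # observations, Equation (1)

Cxx = X @ X.T / L                          # sample covariance
print(np.max(np.abs(Cxx - U @ U.T)))       # small for large L, Equation (2)
```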
Every component of $\mathbf{x}$ is assigned to a node of a graph $\mathcal{G}\{\mathcal{V}, \mathcal{E}, \mathbf{A}\}$, where $\mathcal{V}$ is the set of $N$ nodes, $\mathcal{E}$ is the set of edges connecting the nodes and $\mathbf{A}$ is the adjacency matrix. The generic element $a_{nm}$ is the weight (assumed real and nonnegative) corresponding to the edge connecting node $m$ to node $n$. We will consider undirected graphs, so $a_{nm} = a_{mn}$. The problem is to learn $\mathbf{A}$ from an available set of observation vectors. PCCs are reasonable candidates, as they measure the correlation between two nodes after removing the effect of the rest of the nodes. Moreover, PCCs can be computed from the precision matrix $\mathbf{Q}_{xx} = \mathbf{C}_{xx}^{-1}$ in the form
$$\rho_{nm}^{PCC} = -\frac{q_{nm}}{\sqrt{q_{nn}\,q_{mm}}} \tag{3}$$
where $q_{nm}$ is the $(n,m)$ element of the matrix $\mathbf{Q}_{xx}$ and $\rho_{nm}^{PCC}$ is the PCC of nodes $n$ and $m$. Equation (3) could be used for any underlying joint probability density $p(\mathbf{x})$; however, it is optimal only in the Gaussian case. This is because the formal definition of $\rho_{nm}^{PCC}$ is given by
$$\rho_{nm}^{PCC} = \frac{E\big[(x_n - L[x_n/\mathbf{x}_{-nm}])\,(x_m - L[x_m/\mathbf{x}_{-nm}])\big]}{\sqrt{E\big[(x_n - L[x_n/\mathbf{x}_{-nm}])^2\big]\,E\big[(x_m - L[x_m/\mathbf{x}_{-nm}])^2\big]}} \tag{4}$$
where $\mathbf{x}_{-nm}$ is the vector formed by all the components of $\mathbf{x}$ except $x_n$ and $x_m$, and $L[x_n/\mathbf{x}_{-nm}]$, $L[x_m/\mathbf{x}_{-nm}]$ are respectively the minimum LMSE estimates of $x_n$ and $x_m$ from $\mathbf{x}_{-nm}$. However, optimum removal of the effect of $\mathbf{x}_{-nm}$ implies the use of the conditional means $E[x_n/\mathbf{x}_{-nm}]$ and $E[x_m/\mathbf{x}_{-nm}]$, which coincide with $L[x_n/\mathbf{x}_{-nm}]$ and $L[x_m/\mathbf{x}_{-nm}]$ only when $p(\mathbf{x})$ is multivariate Gaussian. Thus, in the non-Gaussian case, conventional PCC does not precisely capture the partial correlation, and hence the graph connectivity. In [13], a generalized PCC (GPCC) was defined in the form
$$\rho_{nm}^{GPCC} = \frac{E\big[(x_n - E[x_n/\mathbf{x}_{-nm}])\,(x_m - E[x_m/\mathbf{x}_{-nm}])\big]}{\sqrt{E\big[(x_n - E[x_n/\mathbf{x}_{-nm}])^2\big]\,E\big[(x_m - E[x_m/\mathbf{x}_{-nm}])^2\big]}} \tag{5}$$
where the conditional mean $E[x_n/\mathbf{x}_{-nm}]$ depends on the specific model assumed for $p(\mathbf{x})$. In this paper we consider the ICA model (1). The corresponding partial correlation coefficient is then called ICA-PCC, and it is denoted $\rho_{nm}^{ICA-PCC}$ to distinguish it from the general definition (5).
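For reference, the conventional PCC of Equation (3) can be computed directly from an empirical covariance matrix. The sketch below is our own illustration (the function name and interface are ours), not part of the paper.

```python
import numpy as np

def pcc_matrix(Cxx: np.ndarray) -> np.ndarray:
    """Conventional partial correlations from the precision matrix, Equation (3):
    rho_nm = -q_nm / sqrt(q_nn * q_mm), with Q = inv(Cxx)."""
    Q = np.linalg.inv(Cxx)
    d = np.sqrt(np.diag(Q))
    R = -Q / np.outer(d, d)
    np.fill_diagonal(R, 1.0)   # self-correlation set to 1 by convention
    return R

# Usage (X is N x L): R_pcc = pcc_matrix(X @ X.T / X.shape[1])
```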

2.2. A General Formula for the Residual Covariance

Let us define the vector $\mathbf{x}_{nm} = [x_n\ x_m]^T$. We can express $\mathbf{x}$ in the form
$$\mathbf{x} = \mathbf{T}_{-nm}\,\mathbf{x}_{-nm} + \mathbf{T}_{nm}\,\mathbf{x}_{nm} \tag{6}$$
where $\mathbf{T}_{-nm}$ is a matrix of dimension $(N \times (N-2))$ obtained from an $(N \times N)$ identity matrix by dropping the $n$-th and $m$-th columns. Similarly, $\mathbf{T}_{nm}$ is a matrix of dimension $(N \times 2)$ obtained from an $(N \times N)$ identity matrix by dropping all but the $n$-th and $m$-th columns. Let us also define the residual vector $\mathbf{e}_{nm} = [e_n\ e_m]^T$, with $e_n = x_n - E[x_n/\mathbf{x}_{-nm}]$ and $e_m = x_m - E[x_m/\mathbf{x}_{-nm}]$. Notice that the conditional mean is an unbiased estimator; hence the residuals are zero mean and the residual covariance matrix will be
$$\mathbf{C}_{e_{nm}e_{nm}} = E[\mathbf{e}_{nm}\mathbf{e}_{nm}^T] = \begin{bmatrix} E[e_n^2] & E[e_n e_m] \\ E[e_m e_n] & E[e_m^2] \end{bmatrix} \tag{7}$$
We want to compute the residual covariance matrix so that (5) can be applied. We assume an ICA model. First notice that $\mathbf{e}_{nm} = \mathbf{x}_{nm} - E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]$, but considering (1) and (6), we may write
$$E[\mathbf{s}/\mathbf{x}_{-nm}] = \mathbf{W}\big(\mathbf{T}_{-nm}\,\mathbf{x}_{-nm} + \mathbf{T}_{nm}\,E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]\big) \tag{8}$$
Then, we can solve for $E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]$:
$$E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}] = (\mathbf{W}\mathbf{T}_{nm})^+\big(E[\mathbf{s}/\mathbf{x}_{-nm}] - \mathbf{W}\mathbf{T}_{-nm}\,\mathbf{x}_{-nm}\big) \tag{9}$$
where $(\cdot)^+$ denotes the Moore–Penrose (left) pseudoinverse. Thus, (9) expresses the conditional mean of $\mathbf{x}_{nm}$ in terms of the conditional mean of the sources and the ICA model parameters. This allows us to derive the following general formula, which, in spite of its simplicity, requires a rather tedious derivation that can be found in Appendix A:
$$\mathbf{C}_{e_{nm}e_{nm}} = (\mathbf{W}\mathbf{T}_{nm})^+\,\mathbf{M}_{nm}\,\big((\mathbf{W}\mathbf{T}_{nm})^+\big)^T \tag{10}$$
where $\mathbf{M}_{nm}$ is an $(N \times N)$ diagonal matrix having in its main diagonal the MSEs of optimally estimating the sources from $\mathbf{x}_{-nm}$, that is,
$$\mathbf{M}_{nm}(i,i) = mse_{nm\,i} = E\big[(s_i - E[s_i/\mathbf{x}_{-nm}])^2\big] \tag{11}$$
Moreover, from (A7) (see Appendix A) we know that $mse_{nm\,i} = 1 - \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big]$; hence, considering that, by definition, $mse_{nm\,i}$ and $\mathrm{var}[\cdot]$ are nonnegative quantities, we conclude that $0 \le mse_{nm\,i} \le 1$.
Notice that (10) is a combination of the contributions of every source to the residual covariance matrix. This can be better seen by expressing (10) in the alternative form
$$\mathbf{C}_{e_{nm}e_{nm}} = \sum_{i=1}^{N} mse_{nm\,i}\;\mathbf{u}_{nm\,i}^{+}\,\big(\mathbf{u}_{nm\,i}^{+}\big)^T \tag{12}$$
where $\mathbf{u}_{nm\,i}^{+}$ is the $i$-th column of $(\mathbf{W}\mathbf{T}_{nm})^+$. Notice that $\mathbf{W}\mathbf{T}_{nm}$ is an $(N \times 2)$ matrix formed by the $n$-th and $m$-th columns of $\mathbf{W}$, i.e., by the coefficients that define the (de-mixing) contributions of $x_n$ and $x_m$ to $\mathbf{s}$. Thus, $(\mathbf{W}\mathbf{T}_{nm})^+$ is a $(2 \times N)$ matrix, $\mathbf{u}_{nm\,i}^{+}$ is a $(2 \times 1)$ vector, and $\mathbf{u}_{nm\,i}^{+}(\mathbf{u}_{nm\,i}^{+})^T$ is a $(2 \times 2)$ matrix that can be interpreted as the contribution of source $s_i$ to $\mathbf{C}_{e_{nm}e_{nm}}$. This contribution is weighted by $mse_{nm\,i}$. Thus, $mse_{nm\,i} = 0$ indicates that source $s_i$ is perfectly estimated from $\mathbf{x}_{-nm}$, so $s_i$ does not contribute to the partial correlation between $x_n$ and $x_m$. At the other extreme, $mse_{nm\,i} = 1$ indicates that $s_i$ is independent of $\mathbf{x}_{-nm}$, so it has maximum contribution to the partial correlation between $x_n$ and $x_m$.
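The following sketch evaluates Equation (10) (equivalently, the sum in Equation (12)) for one pair of nodes. It is our own illustration: it assumes that the de-mixing matrix $\mathbf{W}$ and the diagonal matrix $\mathbf{M}_{nm}$ are already available, and the helper name is ours.

```python
import numpy as np

def residual_covariance(W: np.ndarray, M_nm: np.ndarray, n: int, m: int) -> np.ndarray:
    """C_{e_nm e_nm} of Equation (10): (W T_nm)^+ M_nm ((W T_nm)^+)^T."""
    N = W.shape[0]
    T_nm = np.eye(N)[:, [n, m]]          # keep only columns n and m of the identity
    pinv = np.linalg.pinv(W @ T_nm)      # (W T_nm)^+, a 2 x N matrix
    return pinv @ M_nm @ pinv.T          # 2 x 2 residual covariance
```

The 2 x 2 output collects $E[e_n^2]$, $E[e_m^2]$ and $E[e_n e_m]$, from which the ICA-PCC of Equation (13) follows directly.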

3. Estimating the ICA Partial Correlation Coefficients

We want to estimate $\rho_{nm}^{ICA-PCC}$ by
$$\hat{\rho}_{nm}^{ICA-PCC} = \frac{\hat{E}[e_n e_m]}{\sqrt{\hat{E}[e_n^2]\;\hat{E}[e_m^2]}} \tag{13}$$
So, according to (7), we have to estimate $\mathbf{C}_{e_{nm}e_{nm}}$. Considering (10), we need estimates of $\mathbf{W}$ and $\mathbf{M}_{nm}$. Estimates of $\mathbf{W}$, the ICA model parameters, can be obtained using a variety of algorithms [14,15,16,17,26,27,28,29,30,31], so in the following we concentrate on the estimation of $\mathbf{M}_{nm}$, i.e., on estimating $mse_{nm\,i} = E[(s_i - E[s_i/\mathbf{x}_{-nm}])^2]$, $i = 1 \ldots N$. To compute $E[s_i/\mathbf{x}_{-nm}]$ we will consider a particular form of the Wiener structure, which was proposed in [32], namely
$$E[s_i/\mathbf{x}_{-nm}] \cong E[s_i/\hat{s}_i^{\,l}] \tag{14}$$
where $\hat{s}_i^{\,l}$ is the LMSE estimate of $s_i$ from $\mathbf{x}_{-nm}$ (we drop the dependence on $nm$ to ease the notation), and the one-dimensional conditional mean can be approximated by [32,33]
$$E[s_i/\hat{s}_i^{\,l}] = \sum_{k=1}^{\infty} \frac{1}{k!}\,E\big[s_i\,(\hat{s}_i^{\,ln})^k\big]\,H_k(\hat{s}_i^{\,ln}) \tag{15}$$
where $H_k(x)$ is the $k$-th Hermite polynomial and $\hat{s}_i^{\,ln} = \hat{s}_i^{\,l}\,(\mathrm{var}[\hat{s}_i^{\,l}])^{-1/2}$ is the standardized linear estimate, assumed to be a standardized Gaussian random variable (this is justified in [32] by using the central limit theorem).
Let us approximate (15) by its first two terms. Taking into account that $H_1(x) = x$ and $H_2(x) = x^2 - 1$, we can write
$$E[s_i/\hat{s}_i^{\,l}] = E[s_i\,\hat{s}_i^{\,ln}]\,\hat{s}_i^{\,ln} + E\big[s_i\,(\hat{s}_i^{\,ln})^2\big]\,\frac{(\hat{s}_i^{\,ln})^2 - 1}{2} \tag{16}$$
but
$$E[s_i\,\hat{s}_i^{\,ln}]\,\hat{s}_i^{\,ln} = \frac{E[s_i\,\hat{s}_i^{\,l}]\,\hat{s}_i^{\,l}}{\mathrm{var}(\hat{s}_i^{\,l})} = \frac{E[\hat{s}_i^{\,l}\,\hat{s}_i^{\,l}]\,\hat{s}_i^{\,l}}{\mathrm{var}(\hat{s}_i^{\,l})} = \frac{\mathrm{var}(\hat{s}_i^{\,l})\,\hat{s}_i^{\,l}}{\mathrm{var}(\hat{s}_i^{\,l})} = \hat{s}_i^{\,l} \tag{17}$$
where we have used $E[s_i\,\hat{s}_i^{\,l}] = E[\hat{s}_i^{\,l}\,\hat{s}_i^{\,l}]$ (due to the orthogonality between the estimation error and the linear estimate) and the fact that, as any LMSE estimator is unbiased, $E[\hat{s}_i^{\,l}] = E[s_i] = 0$. Then, we can express the conditional mean in (16) as the combination of a linear term $\hat{s}_i^{\,l}$ plus a nonlinear term $\hat{s}_i^{\,nl} = E[s_i\,(\hat{s}_i^{\,ln})^2]\,\frac{(\hat{s}_i^{\,ln})^2 - 1}{2}$. Let us now compactly express the estimation of $\mathbf{s}$ from $\mathbf{x}_{-nm}$ in the form
$$E[\mathbf{s}/\mathbf{x}_{-nm}] = \hat{\mathbf{s}} = \hat{\mathbf{s}}^{\,l} + \hat{\mathbf{s}}^{\,nl} \tag{18}$$
We can write
$$\begin{aligned}\mathbf{M}_{nm} &= \mathrm{diag}\big(E[(\mathbf{s}-\hat{\mathbf{s}})(\mathbf{s}-\hat{\mathbf{s}})^T]\big) = \mathrm{diag}\Big(E\big[\big((\mathbf{s}-\hat{\mathbf{s}}^{\,l}) - \hat{\mathbf{s}}^{\,nl}\big)\big((\mathbf{s}-\hat{\mathbf{s}}^{\,l}) - \hat{\mathbf{s}}^{\,nl}\big)^T\big]\Big) \\ &= \mathbf{M}_{nm}^{\,l} + \mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,nl}(\hat{\mathbf{s}}^{\,nl})^T]\big) - 2\,\mathrm{diag}\big(E[(\mathbf{s}-\hat{\mathbf{s}}^{\,l})(\hat{\mathbf{s}}^{\,nl})^T]\big)\end{aligned} \tag{19}$$
where $\mathbf{M}_{nm}^{\,l}$ is a diagonal matrix whose elements are the MSEs corresponding to the linear estimation of $s_i$ from $\mathbf{x}_{-nm}$, that is,
$$\mathbf{M}_{nm}^{\,l} = \mathrm{diag}\big(E[(\mathbf{s}-\hat{\mathbf{s}}^{\,l})(\mathbf{s}-\hat{\mathbf{s}}^{\,l})^T]\big) = \mathrm{diag}\big(E[\mathbf{s}\mathbf{s}^T]\big) - \mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,l}(\hat{\mathbf{s}}^{\,l})^T]\big) \tag{20}$$
In (20), we have considered the orthogonality between the error vector and the estimate vector. Moreover, $\hat{\mathbf{s}}^{\,l}$ is the minimum LMSE estimate, so it can be obtained from the Wiener–Hopf equations
$$\hat{\mathbf{s}}^{\,l} = \mathbf{C}_{s x_{-nm}}\,\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\,\mathbf{x}_{-nm}, \qquad \mathbf{C}_{s x_{-nm}} = E[\mathbf{s}\,\mathbf{x}_{-nm}^T], \quad \mathbf{C}_{x_{-nm}x_{-nm}} = E[\mathbf{x}_{-nm}\,\mathbf{x}_{-nm}^T] \tag{21}$$
Hence, we can write
$$\mathbf{M}_{nm}^{\,l} = \mathbf{I} - \mathrm{diag}\big(E[\mathbf{C}_{s x_{-nm}}\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\mathbf{x}_{-nm}\mathbf{x}_{-nm}^T\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\mathbf{C}_{s x_{-nm}}^T]\big) = \mathbf{I} - \mathrm{diag}\big(\mathbf{C}_{s x_{-nm}}\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\mathbf{C}_{s x_{-nm}}^T\big) \tag{22}$$
Taking into account that $\mathbf{C}_{s x_{-nm}} = \mathbf{W}\mathbf{C}_{xx}\mathbf{T}_{-nm}$, $\mathbf{C}_{x_{-nm}x_{-nm}} = \mathbf{T}_{-nm}^T\mathbf{C}_{xx}\mathbf{T}_{-nm}$ and $\mathbf{C}_{xx} = \mathbf{W}^{-1}(\mathbf{W}^{-1})^T$, we can finally express $\mathbf{M}_{nm}^{\,l}$ in terms of the ICA model parameters:
$$\mathbf{M}_{nm}^{\,l} = \mathbf{I} - \mathrm{diag}\Big((\mathbf{W}^{-1})^T\mathbf{T}_{-nm}\big(\mathbf{T}_{-nm}^T\mathbf{W}^{-1}(\mathbf{W}^{-1})^T\mathbf{T}_{-nm}\big)^{-1}\mathbf{T}_{-nm}^T\mathbf{W}^{-1}\Big) \tag{23}$$
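A direct transcription of Equation (23) in NumPy might look as follows (our sketch; the helper name is illustrative). It returns the diagonal matrix of linear MSEs using only the estimated de-mixing matrix.

```python
import numpy as np

def linear_mse_matrix(W: np.ndarray, n: int, m: int) -> np.ndarray:
    """M^l_nm of Equation (23), built only from the de-mixing matrix W."""
    N = W.shape[0]
    keep = [k for k in range(N) if k not in (n, m)]
    T_mnm = np.eye(N)[:, keep]                 # T_{-nm}: identity without columns n, m
    Winv = np.linalg.inv(W)
    A = Winv.T @ T_mnm                         # (W^{-1})^T T_{-nm} = C_{s x_{-nm}}
    G = T_mnm.T @ Winv @ Winv.T @ T_mnm        # T_{-nm}^T C_xx T_{-nm}
    P = A @ np.linalg.solve(G, A.T)            # C_{s x} C_{xx}^{-1} C_{s x}^T
    return np.eye(N) - np.diag(np.diag(P))     # I - diag(...)
```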
Let us now consider the other two terms in (19). First, notice that $\hat{s}_i^{\,nl}$ can be interpreted as a linear estimate of $s_i$ from $(\hat{s}_i^{\,ln})^2$: assuming that $\hat{s}_i^{\,ln}$ is a standardized Gaussian random variable, $(\hat{s}_i^{\,ln})^2$ is a $\chi^2$ variable with mean equal to 1 and variance equal to 2. Hence, we can apply orthogonality again:
$$\mathrm{diag}\big(E[(\mathbf{s}-\hat{\mathbf{s}}^{\,nl})(\hat{\mathbf{s}}^{\,nl})^T]\big) = \mathbf{0} \;\;\Rightarrow\;\; \mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big) = \mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,nl}(\hat{\mathbf{s}}^{\,nl})^T]\big) \tag{24}$$
Consequently, we have
$$\mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,nl}(\hat{\mathbf{s}}^{\,nl})^T]\big) - 2\,\mathrm{diag}\big(E[(\mathbf{s}-\hat{\mathbf{s}}^{\,l})(\hat{\mathbf{s}}^{\,nl})^T]\big) = -\mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big) + 2\,\mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,l}(\hat{\mathbf{s}}^{\,nl})^T]\big) \tag{25}$$
The second term in (25) is zero, because
$$\big[\mathrm{diag}\big(E[\hat{\mathbf{s}}^{\,l}(\hat{\mathbf{s}}^{\,nl})^T]\big)\big]_{ii} = E\Big[\hat{s}_i^{\,l}\,E[s_i(\hat{s}_i^{\,ln})^2]\,\frac{(\hat{s}_i^{\,ln})^2 - 1}{2}\Big] = \frac{1}{2}\,E[s_i(\hat{s}_i^{\,ln})^2]\,\big(E[\hat{s}_i^{\,l}(\hat{s}_i^{\,ln})^2] - E[\hat{s}_i^{\,l}]\big) \tag{26}$$
where $E[\hat{s}_i^{\,l}] = 0$ and $E[\hat{s}_i^{\,l}(\hat{s}_i^{\,ln})^2] = (\mathrm{var}[\hat{s}_i^{\,l}])^{1/2}\,E[(\hat{s}_i^{\,ln})^3] = 0$, because we assume that $\hat{s}_i^{\,ln}$ is Gaussian, so its odd moments are zero. Regarding the first term in (25),
$$\big[\mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big)\big]_{ii} = E\Big[s_i\,E[s_i(\hat{s}_i^{\,ln})^2]\,\frac{(\hat{s}_i^{\,ln})^2 - 1}{2}\Big] = \frac{1}{2}\,E[s_i(\hat{s}_i^{\,ln})^2]\,\big(E[s_i(\hat{s}_i^{\,ln})^2] - E[s_i]\big) = \frac{1}{2\,\mathrm{var}^2[\hat{s}_i^{\,l}]}\,E^2\big[s_i(\hat{s}_i^{\,l})^2\big] \tag{27}$$
Defining the vector $\hat{\mathbf{s}}^{\,l(2)} = [(\hat{s}_1^{\,l})^2 \cdots (\hat{s}_N^{\,l})^2]^T$ and taking into account that $\mathrm{var}[\hat{s}_i^{\,l}] = \big[\mathbf{C}_{s x_{-nm}}\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\mathbf{C}_{s x_{-nm}}^T\big]_{ii}$, we can write
$$\mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big) = \frac{1}{2}\,\mathrm{diag}^2\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,l(2)})^T]\big)\;\mathrm{diag}^{-2}\big(\mathbf{C}_{s x_{-nm}}\mathbf{C}_{x_{-nm}x_{-nm}}^{-1}\mathbf{C}_{s x_{-nm}}^T\big) \tag{28}$$
and, considering (21) and (23), (28) can be expressed in terms of the ICA model parameters:
$$\mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big) = \frac{1}{2}\,\mathrm{diag}^2\Big(\mathbf{W}\,E\Big[\mathbf{x}\,\Big(\big((\mathbf{W}^{-1})^T\mathbf{T}_{-nm}(\mathbf{T}_{-nm}^T\mathbf{W}^{-1}(\mathbf{W}^{-1})^T\mathbf{T}_{-nm})^{-1}\mathbf{x}_{-nm}\big)^{(2)}\Big)^T\Big]\Big)\;\mathrm{diag}^{-2}\Big((\mathbf{W}^{-1})^T\mathbf{T}_{-nm}\big(\mathbf{T}_{-nm}^T\mathbf{W}^{-1}(\mathbf{W}^{-1})^T\mathbf{T}_{-nm}\big)^{-1}\mathbf{T}_{-nm}^T\mathbf{W}^{-1}\Big) \tag{29}$$
where $(\cdot)^{(2)}$ denotes the element-wise square.
So, in conclusion, we can express the matrix $\mathbf{M}_{nm}$ as
$$\mathbf{M}_{nm} = \mathbf{M}_{nm}^{\,l} - \mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big) \tag{30}$$
where $\mathbf{M}_{nm}^{\,l}$ and $\mathrm{diag}\big(E[\mathbf{s}(\hat{\mathbf{s}}^{\,nl})^T]\big)$ can be obtained from (23) and (29), respectively, using estimates $\hat{\mathbf{W}}$ of the model parameters and a sample mean to evaluate the expectation required in (29). Algorithm 1 below describes the estimation procedure.
Algorithm 1: Computing ICA-PCC.
1: Input: learning data set $\mathbf{x}(l)$, $l = 1 \ldots L$
2: Compute $\hat{\mathbf{W}}$ from the learning data set (any ICA algorithm is a candidate)
3: for $n = 1, 2 \ldots N$
4:  for $m = n \ldots N$
5:   Compute $\hat{\mathbf{M}}_{nm}$ (Equations (23), (29) and (30))
6:   Compute $\hat{\mathbf{C}}_{e_{nm}e_{nm}} = (\hat{\mathbf{W}}\mathbf{T}_{nm})^+\,\hat{\mathbf{M}}_{nm}\,\big((\hat{\mathbf{W}}\mathbf{T}_{nm})^+\big)^T$
7:   Compute $\hat{\rho}_{nm}^{ICA-PCC}$ (Equation (13))
8:   Compute $\hat{\rho}_{mn}^{ICA-PCC} = \hat{\rho}_{nm}^{ICA-PCC}$
9:  end for
10: end for
11: Output: $\hat{\rho}_{nm}^{ICA-PCC}$, $n = 1 \ldots N$, $m = 1 \ldots N$
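A compact NumPy sketch of Algorithm 1 is given below. It is our own rendering under the assumptions of this section: the de-mixing matrix estimate is supplied externally (any ICA algorithm), the expectation in (29) is replaced by a sample mean over the learning data, and the clipping of the MSEs to [0, 1] is an added numerical safeguard not prescribed by the paper.

```python
import numpy as np

def ica_pcc(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Sketch of Algorithm 1. X: (N x L) learning data; W: estimated de-mixing matrix."""
    N, L = X.shape
    Winv = np.linalg.inv(W)
    S = W @ X                                        # estimated sources
    R = np.eye(N)                                    # ICA-PCC matrix (diagonal set to 1)
    for n in range(N):
        for m in range(n + 1, N):
            keep = [k for k in range(N) if k not in (n, m)]
            T_mnm, T_nm = np.eye(N)[:, keep], np.eye(N)[:, [n, m]]
            A = Winv.T @ T_mnm                       # C_{s x_{-nm}}
            G = T_mnm.T @ Winv @ Winv.T @ T_mnm      # C_{x_{-nm} x_{-nm}}
            B = A @ np.linalg.inv(G)                 # maps x_{-nm} to s_hat^l
            P = B @ A.T                              # var[s_hat^l_i] on the diagonal
            mse_l = 1.0 - np.diag(P)                 # Equation (23)
            S_l = B @ (T_mnm.T @ X)                  # linear source estimates per sample
            corr = 0.5 * (np.mean(S * S_l**2, axis=1) / np.diag(P)) ** 2   # Equation (29)
            M = np.diag(np.clip(mse_l - corr, 0.0, 1.0))                   # Equation (30)
            pinv = np.linalg.pinv(W @ T_nm)
            Ce = pinv @ M @ pinv.T                   # Equation (10)
            R[n, m] = R[m, n] = Ce[0, 1] / np.sqrt(Ce[0, 0] * Ce[1, 1])    # Equation (13)
    return R
```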
Equation (30) provides an interesting decomposition of $mse_{nm\,i}$. Let $mse_{nm\,i}^{\,l}$ denote the entries of the diagonal matrix $\mathbf{M}_{nm}^{\,l}$; then $mse_{nm\,i}$ can be expressed as $mse_{nm\,i}^{\,l}$ minus a nonnegative term (see Equation (29)), so that $mse_{nm\,i} \le mse_{nm\,i}^{\,l}$. The condition $mse_{nm\,i} = mse_{nm\,i}^{\,l}$ (i.e., $\mathbf{M}_{nm} = \mathbf{M}_{nm}^{\,l}$) holds in the Gaussian case, because then $E[s_i(\hat{s}_i^{\,ln})^2]$ (see Equation (27)) becomes zero (it is an odd higher-order moment of a multivariate Gaussian variable). In such a case, $E[\mathbf{s}/\mathbf{x}_{-nm}] = \hat{\mathbf{s}}^{\,l}$ becomes a linear function of $\mathbf{x}_{-nm}$, and the same happens with $E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]$ in (9). Hence, the second term in (30) is responsible for the improved removal of the influence of $\mathbf{x}_{-nm}$ in the estimation of the partial correlation between $x_n$ and $x_m$ in the non-Gaussian case. Moreover, we should expect similar results for ICA-PCC and PCC in the Gaussian case.

4. Experiments

4.1. Synthetic Data Experiments

In this experiment we evaluated the influence of the training set size on the estimation of $\rho_{nm}^{ICA-PCC}$, and compared the quality of the estimate with that obtained from the precision matrix. To this aim, we generated synthetic data corresponding to three different ICA models. In the first one, the sources $s_i$ were independent and identically distributed (i.i.d.) random variables having a unit-variance, zero-mean uniform pdf. This corresponds to an example of a sub-Gaussian distribution, as the excess kurtosis is negative, $\kappa - \kappa_G = -1.2$, where $\kappa$ is the kurtosis and $\kappa_G = 3$ is the kurtosis of a Gaussian pdf. In the second model, the sources $s_i$ were i.i.d. random variables having a unit-variance, zero-mean Laplacian pdf. This is an example of a super-Gaussian distribution, as the excess kurtosis is positive, $\kappa - \kappa_G = 3$. In the third model, some sources were uniform and the rest were Laplacian. Finally, we also considered the Gaussian case by generating sources having a standard Gaussian pdf. Figure 1 shows the errors corresponding to the estimation of $\rho_{nm}^{ICA-PCC}$ for the four models.
Every curve is an average of 10 curves corresponding to 10 different runs. In every run, an ICA matrix $\mathbf{U} = \mathbf{W}^{-1}$ was randomly selected; every entry was obtained by sampling a standard Gaussian pdf. Then, a varying number of training vectors $\mathbf{x}$ was generated from source vectors $\mathbf{s}$ having independent components sampled from the mentioned marginal pdfs: sub-Gaussian (Figure 1a), super-Gaussian (Figure 1b), mixed sub/super-Gaussian (Figure 1c) and Gaussian (Figure 1d). The error was computed as
$$\epsilon_{ICA-PCC} = \frac{1}{N^2 - N}\sum_{n=1}^{N}\sum_{m \ne n}\big(\,|\rho_{nm}^{ICA-PCC}| - |\hat{\rho}_{nm}^{ICA-PCC}|\,\big)^2 \tag{31}$$
and averaged over the 10 runs for every training set size. Notice that $0 \le \epsilon_{ICA-PCC} \le 1$, because $\epsilon_{ICA-PCC} = 0$ when $|\rho_{nm}^{ICA-PCC}| = |\hat{\rho}_{nm}^{ICA-PCC}|$ $\forall n,\ \forall m \ne n$, and $\epsilon_{ICA-PCC} = 1$ when $|\rho_{nm}^{ICA-PCC}| - |\hat{\rho}_{nm}^{ICA-PCC}| = \pm 1$ $\forall n,\ \forall m \ne n$. In (31), $\rho_{nm}^{ICA-PCC}$ was obtained from Algorithm 1 using the true matrix $\mathbf{W}$ and an extremely large number of instances for the sample mean required to compute the expectation in (29). On the other hand, $\hat{\rho}_{nm}^{ICA-PCC}$ was computed from Algorithm 1 using estimates of $\mathbf{W}$ obtained with the corresponding finite training set. We used the Extended Infomax algorithm described in [34] and the JADE algorithm [35]. Extended Infomax is an extension of the Infomax algorithm [36] devised to deal with mixed sub/super-Gaussian sources; it is representative of algorithms that iteratively optimize some defined cost function, like FastICA. JADE is based on matrix computation and diagonalization, so it is not sensitive to initialization or optimization-path problems. The same finite training set was also used to evaluate the expectation in (29). In all cases we considered $N = 20$. For comparison, we also computed the error
$$\epsilon_{PCC} = \frac{1}{N^2 - N}\sum_{n=1}^{N}\sum_{m \ne n}\big(\,|\rho_{nm}^{ICA-PCC}| - |\hat{\rho}_{nm}^{PCC}|\,\big)^2 \tag{32}$$
which corresponds to the PCCs obtained from empirical estimates $\hat{\mathbf{Q}}_{xx}$ of the precision matrix, as indicated in (3): $\hat{\rho}_{nm}^{PCC} = -\hat{q}_{nm}/\sqrt{\hat{q}_{nn}\hat{q}_{mm}}$. Notice that it also holds that $0 \le \epsilon_{PCC} \le 1$.
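The error measure of Equations (31) and (32) simply averages the squared differences of partial-correlation magnitudes over the off-diagonal pairs; a minimal sketch (ours, with an illustrative function name) is

```python
import numpy as np

def magnitude_error(R_ref: np.ndarray, R_hat: np.ndarray) -> float:
    """Equations (31)-(32): mean of (|rho_nm| - |rho_hat_nm|)^2 over the
    N^2 - N off-diagonal pairs of the two partial-correlation matrices."""
    off = ~np.eye(R_ref.shape[0], dtype=bool)
    return float(np.mean((np.abs(R_ref[off]) - np.abs(R_hat[off])) ** 2))
```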
Several conclusions may be drawn from Figure 1. First, we can see that in the non-Gaussian cases (a), (b) and (c), PCC cannot decrease the error as the training set size increases. This demonstrates the model mismatch due to the implicit Gaussianity assumption of PCC. In these three cases, the ICA-PCC methods improve on PCC after a sufficient number of training samples and maintain a decreased error as the training set size increases. The minimum training set size required to improve on PCC depends on the case and on the ICA-PCC method. Thus, this minimum number is smaller in Figure 1a (sub-Gaussian) than in Figure 1b (super-Gaussian), and has an intermediate value in Figure 1c (mixed sub/super-Gaussian). Notice, however, that the non-decreasing error of PCC is much higher in Figure 1a,c, which suggests that the ICA-PCC improvement begins from a smaller training set size. On the other hand, this minimum value is smaller for JADE than for Extended Infomax. This is due to the different nature of the two algorithms: JADE requires a matrix diagonalization, while Extended Infomax requires iterative learning. However, the computational complexity of JADE is much larger, especially as N increases. Regarding convergence for large training set sizes, we can see that the error level of the mixed case is clearly above the others. This is because, in general, all ICA algorithms have more difficulty estimating the model in the mixed case. Actually, Extended Infomax was conceived in an effort to deal with the mixed case by incorporating a procedure to estimate the class (sub/super) of every source. This explains the smaller error of Extended Infomax with respect to JADE in Figure 1c, beyond a given training set size. For the Gaussian case, PCC yields a very small error, which decreases with increasing training set size. In this case, ICA-PCC is worse than PCC, although the error is reasonably small. Remember that, in the Gaussian case, we expected similar results for both methods; however, the estimation path followed is different: in PCC the precision matrix is directly estimated, while in ICA-PCC the matrices $\mathbf{W}$ and $\mathbf{M}_{nm}$ are estimated in Algorithm 1. This could explain the separation observed in the error curves of Figure 1d. Finally, most ICA algorithms decompose the estimation of $\mathbf{W}$ into two steps: first, estimate a decorrelation matrix, and then a rotation matrix. When the independent components are Gaussian, any rotation matrix is valid, as all of them are compatible with Gaussianity. However, in the non-Gaussian cases the rotation matrix must be properly estimated for the corresponding non-Gaussian model. This can be interpreted as if a smaller number of model parameters (only the entries of the decorrelation matrix) actually had to be estimated in the Gaussian case, which explains the faster convergence of the curves in Figure 1d.

4.2. A Real Data Application

We applied the proposed method to quantify the significance of changes in brain connectivity during the sleep of patients having disorders such as apnea or epilepsy [37]. These disorders are characterized by recurrent arousals, which are stages of abnormal, degraded sleep. The frequency of arousals in a given period of time is related to the seriousness of the pathology. However, the intensity of the arousals may also be relevant for an appropriate diagnosis. Assuming that an arousal is associated with changes in brain connectivity [38], a measure related to the magnitude of the change may be useful to quantify the significance of the pathology. To this aim, the patient was monitored during sleep by 19 channels of EEG recordings. Every signal channel was segmented into intervals of 2 s; a given feature was computed in every interval and averaged in epochs of 26 s. Each epoch was manually or automatically [22] labelled with one of two possible states: normal sleep (state 0) or arousal (state 1). Then, associated with every epoch, an observation vector $\mathbf{x}$ was built with one feature extracted from all the channels (the same type of feature for all of them), thus $N = 19$. In this experiment, a total of 2000 epochs were available in every state. Given these data sets, an average measure related to brain connectivity was computed to quantify the importance of brain changes between the two states.
There are many possible definitions of overall connectivity; here, we considered the so-called algebraic connectivity [39], which can be computed as the second smallest eigenvalue $\lambda_2$ of the graph Laplacian matrix [40] $\mathbf{L} = \mathbf{D} - \mathbf{A}$, where $\mathbf{D}$ is a diagonal matrix with entries $d_{nn} = \sum_{m \ne n} a_{nm}$ and $\mathbf{A}$ is the adjacency matrix with entries $a_{nm} \ge 0$. The Laplacian matrix is positive semidefinite, with the smallest eigenvalue $\lambda_1$ equal to zero, so $\lambda_2 \ge 0$. Moreover, it is demonstrated in [39] that $\lambda_2 = N$ for a complete graph (a graph with $a_{nm} = 1\ \forall n \ne m$). It is also demonstrated in [39] that
$$\lambda_2 \le \frac{N}{N-1}\,\min_n\,[d_{nn}] \tag{33}$$
Hence, assuming that $0 \le a_{nm} \le 1$ (as it will be in our case), the greatest upper bound for $\lambda_2$ in (33) corresponds to the complete graph ($a_{nm} = 1\ \forall n \ne m \Rightarrow d_{nn} = N - 1\ \forall n$); therefore, $0 \le \lambda_2 \le N$. Consequently, we propose a normalized version of $\lambda_2$ to measure the connectivity
$$\varsigma = \frac{\lambda_2}{N}, \qquad 0 \le \varsigma \le 1 \tag{34}$$
The lower bound $\varsigma = 0$ corresponds to a disconnected graph, as it implies a multiplicity greater than 1 of the smallest eigenvalue. The upper bound corresponds to a complete graph, which is the one having maximum connectivity under the constraint $0 \le a_{nm} \le 1$. We obtained connectivity estimates for every state (0 or 1) and method (ICA-PCC or PCC): $\hat{\varsigma}_0^{ICA-PCC}$, $\hat{\varsigma}_1^{ICA-PCC}$, $\hat{\varsigma}_0^{PCC}$, $\hat{\varsigma}_1^{PCC}$. This was done from (34) with $N = 19$, after computing the second smallest eigenvalue of the Laplacian matrix, considering that the entries of the associated adjacency matrix are the respective magnitudes of the partial correlation estimates obtained from training set 0 or 1:
$$a_{nm\,0}^{ICA-PCC} = |\hat{\rho}_{nm\,0}^{ICA-PCC}|, \quad a_{nm\,1}^{ICA-PCC} = |\hat{\rho}_{nm\,1}^{ICA-PCC}|, \qquad a_{nm\,0}^{PCC} = |\hat{\rho}_{nm\,0}^{PCC}|, \quad a_{nm\,1}^{PCC} = |\hat{\rho}_{nm\,1}^{PCC}| \tag{35}$$
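The normalized algebraic connectivity of Equations (33)–(35) can be obtained from any of the estimated adjacency matrices as in the sketch below (our illustration; the function name is ours).

```python
import numpy as np

def normalized_connectivity(A: np.ndarray) -> float:
    """Equation (34): second smallest Laplacian eigenvalue divided by N."""
    N = A.shape[0]
    Adj = np.abs(A).copy()
    np.fill_diagonal(Adj, 0.0)                 # no self-loops
    Lap = np.diag(Adj.sum(axis=1)) - Adj       # L = D - A
    lam = np.sort(np.linalg.eigvalsh(Lap))
    return float(lam[1] / N)                   # lambda_2 / N

# Usage: zeta = normalized_connectivity(np.abs(R_icapcc)), per state and method (Equation (35)).
```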
Two different features were considered separately. The first is the amplitude (Amp), which is the maximum amplitude in the corresponding 2 s interval; the second is the alfa-slow-index (Asi), which is the ratio of the power in the alpha band (8.0–11 Hz) to the combined power in the delta (0.5–3.5 Hz) and theta (3.5–8.0 Hz) bands. Table 1 and Table 2 show the results corresponding to the Amp and Asi features, respectively, for 6 different patients. Together with the normalized connectivity, we include the connectivity variation between states, defined as $\Delta_{ICA-PCC} = |\hat{\varsigma}_1^{ICA-PCC} - \hat{\varsigma}_0^{ICA-PCC}|$ and $\Delta_{PCC} = |\hat{\varsigma}_1^{PCC} - \hat{\varsigma}_0^{PCC}|$. We also include a kurtosis estimate for every patient and state. This estimate was obtained as the mean of the 19 empirical kurtoses separately calculated for every component of the vector $\mathbf{x}$, i.e., the empirical kurtoses of the marginal distributions of $\mathbf{x}$. Notice that the estimated kurtosis is clearly above the Gaussian reference $\kappa_G = 3$, so the Gaussianity assumption does not hold in this case.
We can see in Table 1 and Table 2 that the PCC method yields very small values of connectivity for all subjects and states; therefore, it is not sensitive to possible changes between states. However, ICA-PCC provides larger values of connectivity and significant changes between states. Figure 2 and Figure 3 show the estimated adjacency matrices corresponding to the different subjects, methods and states. We can see that the PCC magnitudes are, in general, much lower than the ICA-PCC magnitudes; therefore, PCC has more difficulty revealing the interrelations between the different EEG channels due to brain activity. This may be explained in terms of the residuals $e_n$ and $e_m$. Notice from (12) that
$$E[e_n^2] = \sum_{i=1}^{N} mse_{nm\,i}\,(v_{1\,nm\,i}^{+})^2, \qquad E[e_m^2] = \sum_{i=1}^{N} mse_{nm\,i}\,(v_{2\,nm\,i}^{+})^2 \tag{36}$$
where $\{v_{1\,nm\,i}^{+}\}$ and $\{v_{2\,nm\,i}^{+}\}$ are the elements of the vectors $\mathbf{v}_{1\,nm}^{+}$ and $\mathbf{v}_{2\,nm}^{+}$, which are respectively the first and second rows of $(\mathbf{W}\mathbf{T}_{nm})^+$. We showed in Section 3 that PCC should be similar to ICA-PCC for $mse_{nm\,i} = mse_{nm\,i}^{\,l}$, but for non-Gaussian observations $mse_{nm\,i} < mse_{nm\,i}^{\,l}$, so it is deduced from (36) that
$$E[e_n^2] \le E^l[e_n^2] = \sum_{i=1}^{N} mse_{nm\,i}^{\,l}\,(v_{1\,nm\,i}^{+})^2, \qquad E[e_m^2] \le E^l[e_m^2] = \sum_{i=1}^{N} mse_{nm\,i}^{\,l}\,(v_{2\,nm\,i}^{+})^2 \tag{37}$$
where equality holds in the Gaussian case. Hence, PCC provides overestimated residuals in which the actual partial correlation between $x_n$ and $x_m$ may eventually be hidden. This masking effect should increase with the non-Gaussian character of the observations. In our experiment, the features are highly non-Gaussian, as demonstrated by the kurtosis values of Table 1 and Table 2. So, when using PCC, the "true" residuals are overestimated by rather uncorrelated residuals that provide too low an estimate of the actual interrelation between the different EEG channels.

5. Conclusions and Extensions

Partial correlations may be used to define the weights of an undirected graph for subsequent graph signal processing. Conventionally, partial correlations are obtained from the precision matrix, but this is optimal only under the Gaussianity assumption. Hence, we have proposed a new method for computing the partial correlation, assuming a non-Gaussian model (ICA). The latter is a versatile model which suits a diversity of non-Gaussian pdfs.
The proposed method requires the computation of the ICA model parameters, which can be done using any of the many existing algorithms. Two different ICA methods, which may be considered representative of two different kinds of approaches to estimating the ICA model parameters, have been considered in the synthetic examples; both yield similar performance. Computing the mean square errors corresponding to the optimal estimation of the sources is also required. To this end, we have proposed a second-order approximation of the conditional mean; higher orders could be tried at the price of increased complexity.
We have verified, both by simulations and by real-data experiments, that the new method better captures the pairwise and overall connectivity of the graph compared with the precision matrix in non-Gaussian scenarios. The results could be extended to larger values of N, but the training set sizes should be correspondingly increased to keep the quality of the model parameter estimates.
Future extensions of this work can be devised. Some kind of regularization is desirable to emphasize the relevant information provided by the graph connectivity and/or to establish more natural relations between the connected nodes. Thus, sparsity is a common requirement of graph learning (see [41] as a representative example). Considering Equation (10), sparsity could be imposed by selecting only those sources that significantly contribute to the partial correlation between $x_n$ and $x_m$, i.e., by soft or hard thresholding on $mse_{nm\,i}$. On the other hand, smoothness regularization could be tried in a manner similar to the approach proposed in [42] for the Gaussian case. To this aim, it could be considered that the representation matrix $\mathbf{U}$ can be factorized into a correlation matrix multiplied by a rotation (unitary) matrix [43]. Understanding how this rotation relates to the graph connectivity may allow the definition of cost functions that include possible smoothness-related terms. Other structural constraints [44] could also be compatible with the non-Gaussian model.

Author Contributions

The work reported here was developed in collaboration among all authors. All authors have contributed to the preparation of the manuscript, and have approved it. Conceptualization, L.V. and A.S.; Data curation, J.B. and G.S.; Formal analysis, J.B., L.V. and G.S.; Funding acquisition, L.V. and A.S.; Investigation, J.B., G.S. and A.S.; Software, J.B. and G.S.; Supervision, L.V. and A.S.; Writing—original draft, J.B. and L.V.; Writing—review & editing, J.B. and L.V.

Funding

This research was funded by the Spanish Administration and the European Union under grants TEC2014-58438-R and TEC2017-84743-P.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of the General Formula

Due to the orthogonality between the error and the conditional mean, we can write
$$\mathbf{C}_{e_{nm}e_{nm}} = E[\mathbf{x}_{nm}\mathbf{x}_{nm}^T] - E\big[E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]\,E^T[\mathbf{x}_{nm}/\mathbf{x}_{-nm}]\big] \tag{A1}$$
Let us define
$$\mathbf{f}(\mathbf{x}_{-nm}) = E[\mathbf{s}/\mathbf{x}_{-nm}] \equiv \mathbf{f}, \qquad \mathbf{q}(\mathbf{x}_{-nm}) = \mathbf{W}\mathbf{T}_{-nm}\,\mathbf{x}_{-nm} \equiv \mathbf{q} \tag{A2}$$
So, from (9), we may write $E[\mathbf{x}_{nm}/\mathbf{x}_{-nm}] = (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{f} - \mathbf{q})$; hence
$$\begin{aligned}
\mathbf{C}_{e_{nm}e_{nm}} &= E[\mathbf{x}_{nm}\mathbf{x}_{nm}^T] - E\big[(\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{f}-\mathbf{q})(\mathbf{f}^T-\mathbf{q}^T)((\mathbf{W}\mathbf{T}_{nm})^+)^T\big] \\
&= \underbrace{E[\mathbf{x}_{nm}\mathbf{x}_{nm}^T]}_{1}
- \underbrace{(\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{q}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T}_{2}
- \underbrace{(\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{f}\mathbf{f}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T}_{3} \\
&\quad + \underbrace{\Big((\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{f}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T + (\mathbf{W}\mathbf{T}_{nm})^+E^T[\mathbf{f}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T\Big)}_{4}
\end{aligned} \tag{A3}$$
We have to compute the four terms of (A3).
- Term 1:
$$E[\mathbf{x}_{nm}\mathbf{x}_{nm}^T] = \mathbf{C}_{x_{nm}x_{nm}} = \mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{T}_{nm} \tag{A4}$$
- Term 2:
$$\begin{aligned}
E[\mathbf{q}\mathbf{q}^T] &= E[\mathbf{W}\mathbf{T}_{-nm}\mathbf{x}_{-nm}\mathbf{x}_{-nm}^T\mathbf{T}_{-nm}^T\mathbf{W}^T]
= \mathbf{W}\mathbf{T}_{-nm}\mathbf{C}_{x_{-nm}x_{-nm}}\mathbf{T}_{-nm}^T\mathbf{W}^T
= \mathbf{W}\mathbf{T}_{-nm}\mathbf{T}_{-nm}^T\mathbf{C}_{xx}\mathbf{T}_{-nm}\mathbf{T}_{-nm}^T\mathbf{W}^T \\
&= \mathbf{W}(\mathbf{I}-\mathbf{T}_{nm}\mathbf{T}_{nm}^T)\mathbf{C}_{xx}(\mathbf{I}-\mathbf{T}_{nm}\mathbf{T}_{nm}^T)\mathbf{W}^T \\
&= \mathbf{W}\mathbf{C}_{xx}\mathbf{W}^T + \mathbf{W}\mathbf{T}_{nm}\mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{T}_{nm}\mathbf{T}_{nm}^T\mathbf{W}^T
- \mathbf{W}\mathbf{T}_{nm}\mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{W}^T - \mathbf{W}\mathbf{C}_{xx}\mathbf{T}_{nm}\mathbf{T}_{nm}^T\mathbf{W}^T \\
(\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{q}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T &= (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T + \mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{T}_{nm}
- \mathbf{T}_{nm}^T\mathbf{W}^{-1}((\mathbf{W}\mathbf{T}_{nm})^+)^T - (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{W}^{-1})^T\mathbf{T}_{nm}
\end{aligned} \tag{A5}$$
- Term 3:
$$\begin{aligned}
i \ne j:\quad & E[\mathbf{f}\mathbf{f}^T](i,j) = E\big[E[s_i/\mathbf{x}_{-nm}]\,E[s_j/\mathbf{x}_{-nm}]\big] = E\big[E[s_i/\mathbf{x}_{-nm}]\big]\,E\big[E[s_j/\mathbf{x}_{-nm}]\big] = E[s_i]\,E[s_j] = 0 \\
i = j:\quad & E[\mathbf{f}\mathbf{f}^T](i,i) = E\big[E^2[s_i/\mathbf{x}_{-nm}]\big] = \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big] + E^2\big[E[s_i/\mathbf{x}_{-nm}]\big] = \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big] + E^2[s_i] = \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big]
\end{aligned} \tag{A6}$$
Let us express $\mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big]$ in terms of $mse_{nm\,i}$ as defined in (A7):
$$\begin{aligned}
mse_{nm\,i} &= E\big[(s_i - E[s_i/\mathbf{x}_{-nm}])^2\big] = E[s_i^2] + E\big[E^2[s_i/\mathbf{x}_{-nm}]\big] - 2\,E\big[s_i\,E[s_i/\mathbf{x}_{-nm}]\big] \\
E\big[(s_i - E[s_i/\mathbf{x}_{-nm}])\,E[s_i/\mathbf{x}_{-nm}]\big] &= 0 \;\Rightarrow\; E\big[s_i\,E[s_i/\mathbf{x}_{-nm}]\big] = E\big[E[s_i/\mathbf{x}_{-nm}]\,E[s_i/\mathbf{x}_{-nm}]\big] \\
mse_{nm\,i} &= E[s_i^2] - E\big[E^2[s_i/\mathbf{x}_{-nm}]\big] = 1 - \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big]
\end{aligned} \tag{A7}$$
where we have taken into account that the conditional mean is an unbiased estimator. Then, $E[\mathbf{f}\mathbf{f}^T](i,i) = \mathrm{var}\big[E[s_i/\mathbf{x}_{-nm}]\big] = 1 - mse_{nm\,i}$, and if we define $\mathbf{M}_{nm}$ as an $(N \times N)$ diagonal matrix having in its main diagonal the values $mse_{nm\,i}$, $i = 1 \ldots N$, we can write
$$(\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{f}\mathbf{f}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T = (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{I}-\mathbf{M}_{nm})((\mathbf{W}\mathbf{T}_{nm})^+)^T \tag{A8}$$
- Term 4:
$$\begin{aligned}
E[\mathbf{f}\mathbf{q}^T] &= E[\mathbf{f}\,\mathbf{x}_{-nm}^T\,\mathbf{T}_{-nm}^T\,\mathbf{W}^T] \\
E\big[E[s_i/\mathbf{x}_{-nm}]\,x_j\big] &= \int_{\mathbf{x}_{-nm}} x_j\,E[s_i/\mathbf{x}_{-nm}]\,p(\mathbf{x}_{-nm})\,d\mathbf{x}_{-nm}
= \int_{\mathbf{x}_{-nm}} x_j\Big[\int_{s_i} s_i\,p(s_i/\mathbf{x}_{-nm})\,ds_i\Big]\,p(\mathbf{x}_{-nm})\,d\mathbf{x}_{-nm} \\
&= \int_{s_i} s_i\Big[\int_{\mathbf{x}_{-nm}} x_j\,p(s_i/\mathbf{x}_{-nm})\,p(\mathbf{x}_{-nm})\,d\mathbf{x}_{-nm}\Big]\,ds_i
= \int_{s_i} s_i\,p(s_i)\Big[\int_{\mathbf{x}_{-nm}} x_j\,p(\mathbf{x}_{-nm}/s_i)\,d\mathbf{x}_{-nm}\Big]\,ds_i \\
&= \int_{s_i} s_i\,p(s_i)\Big[\int_{x_j} x_j\,p(x_j/s_i)\,dx_j\Big]\,ds_i
= \int_{s_i}\int_{x_j} s_i\,x_j\,p(s_i, x_j)\,ds_i\,dx_j = E[s_i\,x_j], \qquad x_j \in \{\mathbf{x}_{-nm}\} \\
E[\mathbf{f}\mathbf{q}^T] &= E[\mathbf{f}\,\mathbf{x}_{-nm}^T\,\mathbf{T}_{-nm}^T]\,\mathbf{W}^T = E[\mathbf{s}\,\mathbf{x}_{-nm}^T\,\mathbf{T}_{-nm}^T]\,\mathbf{W}^T = E[\mathbf{W}\mathbf{x}\,\mathbf{x}_{-nm}^T\,\mathbf{T}_{-nm}^T]\,\mathbf{W}^T = \mathbf{W}\mathbf{C}_{x x_{-nm}}\mathbf{T}_{-nm}^T\mathbf{W}^T \\
&= \mathbf{W}\mathbf{C}_{xx}\mathbf{T}_{-nm}\mathbf{T}_{-nm}^T\mathbf{W}^T = (\mathbf{W}^{-1})^T(\mathbf{I}-\mathbf{T}_{nm}\mathbf{T}_{nm}^T)\mathbf{W}^T = \mathbf{I} - (\mathbf{W}^{-1})^T\mathbf{T}_{nm}\mathbf{T}_{nm}^T\mathbf{W}^T \\
(\mathbf{W}\mathbf{T}_{nm})^+E[\mathbf{f}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T &= -(\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{W}^{-1})^T\mathbf{T}_{nm} + (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T \\
(\mathbf{W}\mathbf{T}_{nm})^+E^T[\mathbf{f}\mathbf{q}^T]((\mathbf{W}\mathbf{T}_{nm})^+)^T &= -\mathbf{T}_{nm}^T\mathbf{W}^{-1}((\mathbf{W}\mathbf{T}_{nm})^+)^T + (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T
\end{aligned} \tag{A9}$$
Finally, considering (A3)–(A6), (A8) and (A9), we can write
$$\begin{aligned}
\mathbf{C}_{e_{nm}e_{nm}} &= \mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{T}_{nm}
- (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T - \mathbf{T}_{nm}^T\mathbf{C}_{xx}\mathbf{T}_{nm}
+ \mathbf{T}_{nm}^T\mathbf{W}^{-1}((\mathbf{W}\mathbf{T}_{nm})^+)^T + (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{W}^{-1})^T\mathbf{T}_{nm} \\
&\quad - (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{I}-\mathbf{M}_{nm})((\mathbf{W}\mathbf{T}_{nm})^+)^T
- (\mathbf{W}\mathbf{T}_{nm})^+(\mathbf{W}^{-1})^T\mathbf{T}_{nm} + (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T \\
&\quad - \mathbf{T}_{nm}^T\mathbf{W}^{-1}((\mathbf{W}\mathbf{T}_{nm})^+)^T + (\mathbf{W}\mathbf{T}_{nm})^+((\mathbf{W}\mathbf{T}_{nm})^+)^T \\
&= (\mathbf{W}\mathbf{T}_{nm})^+\,\mathbf{M}_{nm}\,((\mathbf{W}\mathbf{T}_{nm})^+)^T
\end{aligned} \tag{A10}$$

References

  1. Baba, K.; Shibata, R.; Sibuya, M. Partial correlation and conditional correlation as measures of conditional independence. Aust. N. Z. J. Stat. 2004, 46, 657–664.
  2. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. 2013, 30, 83–98.
  3. Sandryhaila, A.; Moura, J.M.F. Discrete signal processing on graphs. IEEE Trans. Signal Process. 2013, 61, 1644–1656.
  4. Ortega, A.; Frossard, P.; Kovacevic, J.; Moura, J.M.F.; Vandergheynst, P. Graph signal processing: Overview, challenges and applications. Proc. IEEE 2018, 106, 808–828.
  5. Zhang, C.; Florencio, D.; Chou, P.A. Graph Signal Processing—A Probabilistic Framework; Tech. Rep. MSR-TR-2015-31; Microsoft Research Lab: Redmond, WA, USA, 2015.
  6. Pávez, E.; Ortega, A. Generalized precision matrix estimation for graph signal processing. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, 20–25 March 2016; pp. 6350–6354.
  7. Mazumder, R.; Hastie, T. The graphical lasso: New insights and alternatives. Electron. J. Stat. 2012, 6, 2125–2149.
  8. Hsieh, C.J.; Sustik, M.A.; Dhillon, I.S.; Ravikumar, P. Sparse inverse covariance matrix estimation using quadratic approximation. Adv. Neural Inf. Process. Syst. 2011, 24, 2330–2338.
  9. Chen, X.; Xu, M.; Wu, W.B. Covariance and precision matrix estimation for high-dimensional time series. Ann. Stat. 2013, 41, 2994–3021.
  10. Öllerer, V.; Croux, C. Robust high-dimensional precision matrix estimation. In Modern Multivariate and Robust Methods; Nordhausen, K., Taskinen, S., Eds.; Springer: New York, NY, USA, 2015; pp. 329–354.
  11. Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441.
  12. Peng, J.; Wang, P.; Zhou, N.; Zhu, J. Partial correlation estimation by joint sparse regression model. J. Am. Stat. Assoc. 2009, 104, 735–746.
  13. Belda, J.; Vergara, L.; Salazar, A.; Safont, G. Estimating the Laplacian matrix of Gaussian mixtures for signal processing on graphs. Signal Process. 2018, 148, 241–249.
  14. Salazar, A.; Vergara, L. Independent Component Analysis (ICA): Algorithms, Applications and Ambiguities; Nova Science Publishers: New York, NY, USA, 2018.
  15. Comon, P.; Jutten, C. Handbook of Blind Source Separation: Independent Component Analysis and Applications; Academic Press: Cambridge, MA, USA, 2010.
  16. Hyvärinen, A. Independent component analysis: Algorithms and applications. Neural Netw. 2000, 13, 411–430.
  17. Lee, T.W. Independent Component Analysis: Theory and Applications; Kluwer: Norwell, MA, USA, 1998.
  18. Chai, R.; Naik, G.R.; Nguyen, T.N.; Ling, S.H.; Tran, Y.; Craig, A.; Nguyen, H.T. Driver fatigue classification with independent component by entropy rate bound minimization analysis in an EEG-based system. IEEE J. Biomed. Health Inform. 2017, 21, 715–724.
  19. Liu, H.; Liu, S.; Huang, T.; Zhang, Z.; Hu, Y.; Zhang, T. Infrared spectrum blind deconvolution algorithm via learned dictionaries and sparse representation. Appl. Opt. 2016, 55, 2813–2818.
  20. Naik, G.R.; Selvan, S.E.; Nguyen, H.T. Single-channel EMG classification with ensemble-empirical-mode-decomposition-based ICA for diagnosing neuromuscular disorders. IEEE Trans. Neural Syst. Rehab. Eng. 2016, 24, 734–743.
  21. Guo, Y.; Huang, S.; Li, Y.; Naik, G.R. Edge effect elimination in single-mixture blind source separation. Circuits Syst. Signal Process. 2013, 32, 2317–2334.
  22. Chi, Y. Guaranteed blind sparse spikes deconvolution via lifting and convex optimization. IEEE J. Sel. Top. Signal Process. 2016, 10, 782–794.
  23. Pendharkara, G.; Naik, G.R.; Nguyen, H.T. Using blind source separation on accelerometry data to analyze and distinguish the toe walking gait from normal gait in ITW children. Biomed. Signal Process. Control 2014, 13, 41–49.
  24. Guo, Y.; Naik, G.R.; Nguyen, H.T. Single channel blind source separation based local mean decomposition for biomedical applications. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, 3–7 July 2013; pp. 6812–6815.
  25. Wang, L.; Chi, Y. Blind deconvolution from multiple sparse inputs. IEEE Signal Process. Lett. 2016, 23, 1384–1388.
  26. Salazar, A.; Vergara, L.; Serrano, A.; Igual, J. A general procedure for learning mixtures of independent component analyzers. Pattern Recognit. 2010, 43, 69–85.
  27. Safont, G.; Salazar, A.; Vergara, L.; Gomez, E.; Villanueva, V. Probabilistic distance for mixtures of independent component analyzers. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 1161–1173.
  28. Salazar, A.; Igual, J.; Safont, G.; Vergara, L.; Vidal, A. Image applications of agglomerative clustering using mixtures of non-Gaussian distributions. In Proceedings of the 2015 International Conference on Computational Science and Computational Intelligence, Las Vegas, NV, USA, 7–9 December 2015; pp. 459–463.
  29. Safont, G.; Salazar, A.; Rodriguez, A.; Vergara, L. On recovering missing ground penetrating radar traces by statistical interpolation methods. Remote Sens. 2014, 6, 7546–7565.
  30. Salazar, A.; Safont, G.; Soriano, A.; Vergara, L. Automatic credit card fraud detection based on non-linear signal processing. In Proceedings of the IEEE International Carnahan Conference on Security Technology, Boston, MA, USA, 15–18 October 2012; pp. 207–212.
  31. Salazar, A.; Igual, J.; Vergara, L.; Serrano, A. Learning hierarchies from ICA mixtures. In Proceedings of the IEEE International Joint Conference on Artificial Neural Networks, Orlando, FL, USA, 12–17 August 2007; pp. 2271–2276.
  32. Vergara, L.; Bernabeu, P. Simple approach to nonlinear prediction. Electron. Lett. 2001, 37, 928–936.
  33. Celebi, E. General formula for conditional mean using higher-order statistics. Electron. Lett. 1997, 33, 2097–2099.
  34. Lee, T.W.; Girolami, M.; Sejnowski, T.J. Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Comput. 1999, 11, 409–433.
  35. Cardoso, J.F. Blind beamforming for non-Gaussian signals. IEE Proc. F Radar Signal Process. 1993, 140, 362–370.
  36. Hyvärinen, A.; Oja, E. A fast fixed-point algorithm for independent component analysis. Neural Comput. 1997, 9, 1483–1492.
  37. Salazar, A.; Vergara, L.; Miralles, R. On including sequential dependence in ICA mixture models. Signal Process. 2010, 90, 2314–2318.
  38. Lang, E.W.; Tomé, A.; Keck, I.R.; Górriz-Sáez, J.; Puntonet, C. Brain connectivity analysis: A short survey. Comput. Intell. Neurosci. 2012, 2012.
  39. Fiedler, M. Algebraic connectivity of graphs. Czechoslovak Math. J. 1973, 23, 298–305.
  40. Merris, R. Laplacian matrices of a graph: A survey. Linear Algebra Appl. 1994, 197, 143–176.
  41. Lake, B.; Tenenbaum, J. Discovering structure by learning sparse graph. In Proceedings of the 32nd Annual Meeting of the Cognitive Science Society CogSci 2010, Portland, OR, USA, 11–14 August 2010; pp. 778–783.
  42. Dong, X.; Thanou, D.; Frossard, P.; Vandergheynst, P. Learning Laplacian matrix in smooth graph signal representations. IEEE Trans. Signal Process. 2016, 64, 6160–6173.
  43. Moragues, J.; Vergara, L.; Gosálbez, J. Generalized matched subspace filter for nonindependent noise based on ICA. IEEE Trans. Signal Process. 2011, 59, 3430–3434.
  44. Egilmez, H.E.; Pavez, E.; Ortega, A. Graph learning from data under Laplacian and structural constraints. IEEE J. Sel. Top. Signal Process. 2017, 11, 825–841.
Figure 1. $\epsilon_{ICA-PCC}$ (blue: Extended Infomax; yellow: JADE) and $\epsilon_{PCC}$; (a) sub-Gaussian case; (b) super-Gaussian case; (c) mixed (15/5) sub/super-Gaussian case; (d) Gaussian case.
Figure 2. Adjacency matrices corresponding to Amp.
Figure 3. Adjacency matrices corresponding to Asi.
Table 1. Results corresponding to the amplitude (Amp).
Subj. | $\kappa_0$ | $\kappa_1$ | $\hat{\varsigma}_0^{ICA-PCC}$ | $\hat{\varsigma}_1^{ICA-PCC}$ | $\Delta_{ICA-PCC}$ | $\hat{\varsigma}_0^{PCC}$ | $\hat{\varsigma}_1^{PCC}$ | $\Delta_{PCC}$
S1 | 6.46 | 4.58 | 0.30 | 0.33 | 0.03 | 0.03 | 0.03 | 0.00
S2 | 8.05 | 5.29 | 0.74 | 0.39 | 0.35 | 0.04 | 0.04 | 0.00
S3 | 9.84 | 6.76 | 0.57 | 0.28 | 0.29 | 0.03 | 0.02 | 0.01
S4 | 9.04 | 8.87 | 0.39 | 0.66 | 0.27 | 0.04 | 0.02 | 0.02
S5 | 9.61 | 15.13 | 0.31 | 0.44 | 0.13 | 0.02 | 0.03 | 0.01
S6 | 9.14 | 13.82 | 0.24 | 0.36 | 0.12 | 0.02 | 0.02 | 0.00
Table 2. Results corresponding to the alfa-slow-index (Asi).
Subj. | $\kappa_0$ | $\kappa_1$ | $\hat{\varsigma}_0^{ICA-PCC}$ | $\hat{\varsigma}_1^{ICA-PCC}$ | $\Delta_{ICA-PCC}$ | $\hat{\varsigma}_0^{PCC}$ | $\hat{\varsigma}_1^{PCC}$ | $\Delta_{PCC}$
S1 | 16.32 | 22.51 | 0.31 | 0.51 | 0.20 | 0.02 | 0.02 | 0.00
S2 | 10.52 | 9.09 | 0.34 | 0.60 | 0.26 | 0.02 | 0.02 | 0.00
S3 | 9.91 | 7.05 | 0.68 | 0.48 | 0.20 | 0.02 | 0.03 | 0.01
S4 | 8.39 | 11.69 | 0.37 | 0.74 | 0.37 | 0.03 | 0.02 | 0.01
S5 | 7.72 | 13.15 | 0.22 | 0.71 | 0.49 | 0.02 | 0.03 | 0.01
S6 | 11.86 | 9.24 | 0.43 | 0.56 | 0.13 | 0.02 | 0.03 | 0.01
