Article

Bi-Smoothed Functional Independent Component Analysis for EEG Artifact Removal

Marc Vidal, Mattia Rosso and Ana M. Aguilera *
1 Institute of Psychoacoustics and Electronic Music (IPEM), Ghent University, 9000 Ghent, Belgium
2 Department of Statistics and O.R. and IMAG, University of Granada, 18071 Granada, Spain
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(11), 1243; https://doi.org/10.3390/math9111243
Submission received: 19 April 2021 / Revised: 24 May 2021 / Accepted: 25 May 2021 / Published: 28 May 2021

Abstract

Motivated by mapping adverse artifactual events caused by body movements in electroencephalographic (EEG) signals, we present a functional independent component analysis based on the spectral decomposition of the kurtosis operator of a smoothed principal component expansion. A discrete roughness penalty is introduced in the orthonormality constraint of the covariance eigenfunctions in order to obtain the smoothed basis for the proposed independent component model. To select the tuning parameters, a cross-validation method that incorporates shrinkage is used to enhance the performance on functional representations with a large basis dimension. This method provides an estimation strategy to determine the penalty parameter and the optimal number of components. Our independent component approach is applied to real EEG data to estimate genuine brain potentials from a contaminated signal. As a result, it is possible to control high-frequency remnants of neural origin overlapping artifactual sources to optimize their removal from the signal. An R package implementing our methods is available at CRAN.

1. Introduction

In the field of neurophysiology, electroencephalography (EEG) represents one of the few techniques providing a direct measure of bioelectrical brain activity, as oscillations in the excitability of populations of cortical pyramidal cells [1] contribute to variations in the electrical potentials over the scalp. Oscillations are characterized by dominant intrinsic rhythms conventionally grouped into frequency bands, which are by now validated as markers of several neurocognitive phenomena [2]. However, despite the temporal resolution achievable with its high sampling rate, EEG suffers from a low signal-to-noise ratio. This is mainly because the layers of tissue separating the electrodes from the cortex act as a natural filter attenuating genuine brain activity, resulting in a combination of cortical and artifactual sources in the EEG signal. In addition, the dominant brain-related spectral features often overlap with artifactual activity in higher frequency bands [3], while at lower frequencies most of the variance in the signal is explained by physiological sources outside the brain. For these reasons, analyzing EEG signals can ultimately be viewed as solving a source-separation problem with the goal of estimating brain potentials of interest.
Blind source separation techniques such as independent component analysis (ICA) are commonly used for artifact detection and correction of EEG signals. The term ICA encompasses a broad scope of algorithms and theoretical foundations built on the assumption of independence of the latent sources in the data. From the statistical perspective, it can be regarded as a refinement of principal component analysis (PCA) that goes beyond the variance patterns of the data, introducing higher-order measures such as kurtosis or negentropy to obtain more interpretable outcomes. This way, the data can be approximately represented in terms of a small set of independent variables, whereas in the PCA reduction these variables are only assumed to be uncorrelated. An overview of statistical methodologies for ICA is provided in [4]. A comprehensive monograph on the subject can be found in [5].
The use of sampling units in the form of functions that evolve over a continuum, rather than vectors of measurements, has been popularized over the last two decades to solve a broad class of problems. Functional data analysis provides a natural generalization of a wide variety of statistical techniques that take advantage of the complete functional form of the data by including relevant information related to smoothness and differentiability (see [6,7,8] for a systematic review of the topic). The extension of ICA to functional data has, however, not yet received the attention nor the prolific developments of other reduction techniques in this framework, such as functional principal component analysis (FPCA).
A first attempt to extend the classic multivariate ICA model was investigated in [9] by exploiting the functional principal component decomposition. Functional ICA techniques were also implemented in [10], who defined the kurtosis operator of a standardized sample in an approximation to a separable infinite-dimensional Hilbert space. Under this setting, the kurtosis eigenfunctions are expected to be rougher, as the space does not contain functions that are pointwise convergent. Their approach focuses on the classification properties of the kurtosis operator, whose decomposition is assumed to have a form similar to the Fisher discriminant function. More recently, [11,12] developed a functional ICA model using an estimation procedure stemming from the finite Karhunen–Loève (K-L) expansion [13] (p. 37), which is a less rough space since its orthogonal expansion is optimal in the least-squares error sense. In order to control the roughness of the K-L functions, we extend this model setup by endowing the space with a new geometrical structure given by a Sobolev inner product.
The use of functional data in brain imaging analysis has gained prominence in recent years, despite the complexity and computational cost arising in its treatment. Data acquired from an electroencephalogram lend themselves to a wide variety of functional data methods, ranging from the estimation of smoothed sample curves to more advanced reduction and forecasting techniques; see, for example, [14,15,16,17,18]. Current research is mainly focused on functional principal component approaches for modelling data free of artifactual sources. However, the efficiency of functional ICA techniques used at stages where data are contaminated by physiological artifacts remains, to the best of our knowledge, untested. In contrast, this problem has been extensively addressed in the multivariate environment; Ref. [19] compares the state-of-the-art methods for artifact removal.
In this paper, a methodology based on piecewise polynomial smoothing (B-splines) is developed to disentangle the overlap between neural activity and artifactual sources. Because of the transient nature and complex morphology of EEG data, B-splines provide a good alternative for representing the non-sinusoidal behaviour of neural oscillatory phenomena due to their well-behaved local smoothing. The goal is to use the proposed smoothed functional ICA to obtain more accurate brain estimates by subtracting artifacts free of noise. Wavelet-based approaches, and hybrid settings combining wavelets with ICA, have been demonstrated to perform well at denoising common artifacts, albeit for a strictly different kind of data (see, e.g., [3,20,21,22]). By contrast, our independent component estimation is based on a penalized spline (P-spline) approach [23,24], which has a lower computational cost and is mathematically simpler. P-splines have been successfully applied for dimension reduction [25] as well as for the estimation of different functional regression models [26,27,28,29].
What characterizes our method, nevertheless, is that the decomposition is naturally regulated by the principal component eigendirections and optimized by penalized estimators, whereas in wavelet approaches this is decided on the basis of the frequency band features of the data or the components. For this reason, the proposed functional ICA can be conceived as a bi-smoothed estimation procedure. The end-user will appreciate that artifact extraction can be fine-tuned by regulating a single smoothing parameter, making it intuitive to improve the results through a visual inspection of the independent component scores.
The paper is organized as follows. We introduce our model in Section 2 and develop the smoothed FICA decomposition using basis expansion representations of functional data in Section 3. A method for selecting the tuning parameters is discussed in Section 4. To test the effectiveness of our model in recovering brain signals, Section 5 provides a simulation using real EEG data on single trial designs containing stereotyped artifacts. Section 6 shows how our smoothed FICA works in the context of event-related potentials designs. Finally, we conclude with a brief discussion in Section 7. The presented P-spline smoothed FICA is implemented in the R package pfica [30].

2. Smoothed Functional Independent Component Analysis

2.1. Preliminaries

Let $y_i = (y_{i1}, \ldots, y_{im_i})^T$ $(i = 1, \ldots, n)$ be the components of a signal digitized at the sampling points $t_{ik}$, $k = 1, \ldots, m_i$. Consider that the sample data are observed with error, so that they can be modeled as
$$ y_{ik} = x_i(t_{ik}) + \varepsilon_{ik}, \qquad (1) $$
where $x_i$ is the $i$th functional trajectory of the signal and the $\varepsilon_{ik}$ are mutually independent measurement errors with zero mean. The sample functions $x_1, \ldots, x_n$ are assumed to be realizations of independent and identically distributed copies of a random functional variable $X$ in $L^2(T)$, the separable Hilbert space of square-integrable functions from $T$ to $\mathbb{R}$, endowed with the usual inner product $\langle f, g \rangle = \int_T f(t) g(t)\, dt$ and the induced norm $\|f\| = \langle f, f \rangle^{1/2}$. Throughout the text, $X$ is assumed to have zero mean and finite fourth moments, which implies that the higher-order operators are well defined.
For $s, t \in T$, the sample covariance operator $C_x$ is an integral operator with kernel $c(s,t) = n^{-1} \sum_{i=1}^{n} x_i(s) x_i(t)$ admitting the Mercer decomposition
$$ c(s,t) = \sum_{j=1}^{\infty} \eta_j \gamma_j(s) \gamma_j(t), $$
where $\{\eta_j, \gamma_j\}_j$ is a positive sequence of eigenvalues in descending order with their associated orthonormal eigenfunctions. The functions $x_i(t)$ can be approximately represented by a truncated K-L expansion
$$ x_i^q(t) = \sum_{j=1}^{q} z_{ij} \gamma_j(t), \qquad (2) $$
where $z_{ij} = \langle x_i, \gamma_j \rangle$ are zero-mean random variables with $\mathrm{var}(z_j) = \eta_j$ and $\mathrm{cov}(z_j, z_{j'}) = 0$ for $j \neq j'$. These variables, referred to as the principal component scores, are uncorrelated generalized linear combinations of the functional variable with maximum variance. Moreover, if the truncation order $q$ in (2) is optimally selected, the mean squared error is minimized, providing the best linear approximation to the original data [31] (p. 21). A functional Varimax rotation has recently been introduced to improve the interpretation of the most explicative principal component scores [32].
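To make this optimality concrete (a standard property of the K-L expansion, stated here for completeness rather than taken from the original text), the expected squared truncation error equals the sum of the discarded eigenvalues:
$$ \mathbb{E}\,\| X - X^q \|^2 = \mathbb{E}\,\Big\| \sum_{j > q} z_j \gamma_j \Big\|^2 = \sum_{j > q} \mathrm{var}(z_j) = \sum_{j = q+1}^{\infty} \eta_j, $$
so that choosing $q$ to retain a given fraction of the total variance $\sum_j \eta_j$ directly controls the mean squared reconstruction error.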

2.2. Functional ICA of a Smoothed Principal Component Expansion

The notion of independent components of a random vector cannot be immediately extended to Hilbert-valued random elements (functional data) because a probability density function is not generally defined in this context [33]. In the sequel, we consider the definition of independence introduced in [34], which establishes that a functional random variable has independent components if the coordinates obtained after projecting onto a given orthonormal basis are independent variables. The aim of functional independent component analysis (FICA) is then to find a linear operator $\Gamma$ such that, for a truncated orthonormal basis $\phi_j$ $(j = 1, \ldots, q)$ in $L^2(T)$, the variables $\langle \Gamma X, \phi_j \rangle$ are mutually independent. If $X$ were generated by a Gaussian process, a functional principal component analysis (FPCA) would suffice to obtain the independent components [13] (p. 40). However, as functional data are not inherently of this kind, it is assumed that if $X$ has a finite-dimensional representation, then it can be transformed by the operator $\Gamma$ to achieve the goals of the model. This raises the question of the choice of basis for $X$, on which the results markedly depend.
In this paper, the sample $x_i$ is approximated by a smoothed functional PCA representation obtained by introducing an orthonormality constraint with respect to the weighted Sobolev inner product
$$ \langle f, g \rangle_\lambda = \langle f, g \rangle + \lambda \langle R f, R g \rangle, \qquad (3) $$
where $R$ is an operator with action $R f(t) = d^2 f(t)/dt^2$, $f \in \mathrm{dom}(R)$, that measures the roughness of the curves, and $\lambda$ is a non-negative penalty parameter. Notice that when $\lambda = 0$, (3) simplifies to the usual inner product, meaning that $x_i$ can be uniquely represented by the K-L basis, i.e., the eigenfunctions of $C_x$. To estimate the smoothed principal components, Silverman [35] proposed the following variance maximization problem:
$$ \gamma_{\lambda,j} = \operatorname*{argmax}_{\gamma} \frac{\mathrm{var}\,\langle \gamma, x \rangle}{\|\gamma\|^2 + \lambda \langle R\gamma, R\gamma \rangle} = \operatorname*{argmax}_{\gamma} \frac{\langle \gamma, C_x \gamma \rangle}{\|\gamma\|_\lambda^2}, \qquad (4) $$
subject to the constraint $\langle \gamma, \gamma_{\lambda,k} \rangle_\lambda = 0$ for all $k < j$, where $\gamma$ ranges over a closed linear subspace of $L^2$ with square-integrable second derivatives. We emphasize that the problem of finding $\gamma_{\lambda,j}$ depends on the sample size $n$ and on the selection of the penalty parameter $\lambda$. The authors of [36] established the existence of solutions of the optimization problem (4) for any $\lambda \geq 0$. Silverman [35] proved the consistency of the estimators as $n \to \infty$ and $\lambda \to 0$. Generalized consistency and asymptotic distributions of the estimators have been derived in [37], using expansions of the perturbed eigensystem of a sample smoothed covariance operator.
The functions $\{\gamma_{\lambda,j}\}$ form a complete orthonormal system in the subspace endowed with $\langle \cdot, \cdot \rangle_\lambda$, which makes this basis incompatible with our independent component model in $L^2(T)$. However, [38] generalized Silverman's method, providing the following equivalent functional PCAs.
Proposition 1.
Given a sample $\{x_i\}$ of a functional variable with trajectories in $L^2(T)$, there exists a positive definite operator $S^2$ such that the following PCA decompositions are equivalent:
  • the FPCA of $S^2(x_i)$ with respect to $\langle \cdot, \cdot \rangle_\lambda$: $S^2(x_i) = \sum_j z_{ij} \gamma_{\lambda,j}$;
  • the FPCA of $S(x_i)$ with respect to $\langle \cdot, \cdot \rangle$: $S(x_i) = \sum_j z_{ij} S^{-1}(\gamma_{\lambda,j})$;
  • the FPCA of $x_i$ with respect to $\langle \cdot, \cdot \rangle_S$: $x_i = \sum_j z_{ij} S^{-2}(\gamma_{\lambda,j})$,
    with $\langle f, g \rangle_S = \langle S(f), S(g) \rangle = \langle S^2(f), S^2(g) \rangle_\lambda$.
Therefore, the eigenfunctions of the covariance operator $C_{Sx} = S C_x S$ of the smoothed sample $S(x_i)$ are given by $\beta_j = S^{-1}(\gamma_{\lambda,j})$, where the $\gamma_{\lambda,j}$ are obtained by the penalized estimation procedure set out for (4). The basis $\{\beta_j\}$ is then orthonormal with respect to the usual inner product in $L^2(T)$, so that the smoothed sample data $S(x_i)$ can be approximated by the truncated K-L expansion
$$ \chi_i^q(t) = \sum_{j=1}^{q} z_{ij} \beta_j(t), \qquad (5) $$
where $z_{ij} = \langle \beta_j, S(x_i) \rangle = \langle \gamma_{\lambda,j}, x_i \rangle$, and $\chi_i^q(t)$ denotes a $q$-dimensional orthonormal representation of the smoothed sample data $S(x_i)$ in $L^2(T)$. The functional ICA version proposed in this paper uses the elements of this expansion to estimate the independent components of the original data.
Our main assumption is that the target functions can be found in the space spanned by the first $q$ eigenfunctions of the operator $C_{Sx}$, as it is endowed with a smooth second-order structure represented by the major modes of variation of the empirical data. In this eigensubspace, some accuracy is expected to be gained in the forthcoming results owing to the attenuation of the higher oscillation modes corresponding to the small eigenvalues of $C_{Sx}$. Henceforth, we denote by $M^q = \mathrm{span}\{\beta_1, \ldots, \beta_q\}$ the subspace spanned by the first $q$ eigenfunctions of $C_{Sx}$. Without loss of generality, $M^q$ will be assumed to preserve the inner product in $L^2(T)$.
Most multivariate ICA methods require the standardization of the observed data with the inverse square root of the covariance matrix in order to remove any linear dependencies and normalize the variance along its dimensions. In infinite-dimensional spaces, however, covariance operators are not invertible, giving rise to an ill-posed problem. As long as our signal is represented in $M^q$, no regularization is needed and, under moderate conditions, the inverse of the covariance operator is well defined. Since standardization is a particular case of whitening (or sphering), we can generalize the procedure in the form of a whitening operator $\Psi$ that transforms a function in $M^q$ into a standardized function in the same space. This implies that $\Psi(\chi^q) = \tilde{\chi}^q$ is a standardized functional sample whose covariance operator $C_{\tilde{\chi}^q}$ is the identity on that space.
As an extension of the multivariate case, the sample kurtosis operator of the standardized data is defined as
$$ K_{\tilde{\chi}^q}(h)(s) = \frac{1}{n} \sum_{i=1}^{n} \langle \tilde{\chi}_i^q, \tilde{\chi}_i^q \rangle \, \langle \tilde{\chi}_i^q, h \rangle \, \tilde{\chi}_i^q(s) = \langle k(s, \cdot), h \rangle, \qquad (6) $$
where $k(s,t) = n^{-1} \sum_{i=1}^{n} \|\tilde{\chi}_i^q\|^2 \tilde{\chi}_i^q(s) \tilde{\chi}_i^q(t)$ denotes the kurtosis kernel function of $\tilde{\chi}^q$, and $h$ is the function in $M^q$ to be transformed. In the remainder of this article, the kurtosis operator is assumed to be positive-definite, Hermitian and equivariant (see [11]). Again, by Mercer's theorem its kernel admits the eigendecomposition
$$ k(s,t) = \sum_{l=1}^{q} \rho_l \psi_l(s) \psi_l(t), $$
where $\{\rho_l, \psi_l\}_{l=1}^{q}$ is a positive sequence of eigenvalues with related eigenfunctions. With this, we can define the independent components of $\chi_i^q$ as mutually independent variables with maximum kurtosis, given by
$$ \zeta_{il,\tilde{\chi}^q} = \langle \tilde{\chi}_i^q, \psi_l \rangle. $$
Challenging questions arise as to how the Karhunen–Loève theorem might be applied in this context. Intuitively, we note that this procedure leads to the expansion $\tilde{\chi}_i^q(t) = \sum_{l=1}^{q} \zeta_{il,\tilde{\chi}^q} \psi_l(t)$, which can be approximated in terms of $r$ eigenfunctions $\psi_l$ of interest, e.g., those associated with the independent components with extreme kurtosis values. Under mild conditions, this problem was solved in [11,12] by choosing $r = q$. However, there are other possibilities, such as considering $r < q$, or taking $\{\psi_1, \ldots, \psi_q\}$ as a projection basis for either $x$, $\chi^q$ or $\tilde{\chi}^q$, in view of the fact that it preserves the fourth-order structure of the standardized data.

3. Basis Expansion Estimation Using a P-Spline Penalty

In order to estimate the independent components from the noisy discrete observations in Equation (1), it will be assumed that the trajectories belong to a finite-dimensional subspace of $L^2(T)$ spanned by a set of B-spline basis functions $\{\phi_1(t), \ldots, \phi_p(t)\}$. Each sample curve can then be expanded as
$$ x_i(t) = \sum_{j=1}^{p} a_{ij} \phi_j(t), \qquad (7) $$
or, in matrix form, $x = A \phi$, where $A = (a_{ij}) \in \mathbb{R}^{n \times p}$ is the coefficient matrix and $\phi = (\phi_1, \ldots, \phi_p)^T$, $x = (x_1, \ldots, x_n)^T$ denote vector-valued functions. The basis coefficients of each sample curve can be found by least squares, minimizing the mean squared error
$$ \mathrm{MSE}(a_i \mid x_i) = (x_i - \Phi_i a_i)^T (x_i - \Phi_i a_i), $$
where $\Phi_i = \{\phi_j(t_{ik})\} \in \mathbb{R}^{m_i \times p}$ and $a_i = (a_{i1}, \ldots, a_{ip})^T$. For general guidance on the choice of both knots and order of the B-splines, we refer the reader to [6] (Chapters 3 and 4). Although a non-penalized least squares approximation is assumed in this paper, [39] give a detailed account of how to estimate the basis coefficients using different roughness penalty approaches (continuous and discrete) in terms of B-splines.
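The least-squares fit above is simple to reproduce. The following minimal sketch in R (the language of the accompanying pfica package) uses only the base splines package; the signal matrix Y, the grid tk and the basis dimension are illustrative placeholders, not objects taken from the paper's supplementary code.

library(splines)

# Illustrative data: n curves observed at m common sampling points
n  <- 64; m <- 3000
tk <- seq(0, 3, length.out = m)           # a 3 s trial sampled at 1 kHz
Y  <- matrix(rnorm(n * m), n, m)          # placeholder for the EEG signal

# Cubic B-spline design matrix Phi (m x p)
p   <- 30                                 # illustrative; the paper uses p = 230
Phi <- bs(tk, df = p, degree = 3, intercept = TRUE)

# Least-squares basis coefficients: one row of A per curve,
# each row solving min_a || y_i - Phi a ||^2
A <- t(solve(crossprod(Phi), crossprod(Phi, t(Y))))   # n x p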
The next step consists of smoothing the sample curves in terms of the smoothed principal components and associated weight functions $\beta_j$ in (5). To do so, we next derive the P-spline FPCA approach developed in [25], which incorporates a discrete penalty based on $d$-order differences of adjacent B-spline coefficients (P-spline penalty) in the orthonormality constraint. Let us consider the B-spline basis expansion of the covariance eigenfunctions $\gamma(t) = \phi(t)^T b$, with $b = (b_1, \ldots, b_p)^T$ its vector of basis coefficients, and a discrete P-spline roughness penalty function defined by $\mathrm{pen}_d(\gamma) = b^T P_d b$, where $P_d \in \mathbb{R}^{p \times p}$ is the penalty matrix $P_d = \Delta_d^T \Delta_d$, with $\Delta_d$ a matrix representation of the $d$-order difference operator. Throughout the paper, we assume second-order differences, defining the penalty function $b^T P_2 b = (b_1 - 2 b_2 + b_3)^2 + \cdots + (b_{p-2} - 2 b_{p-1} + b_p)^2$. This way, the inner product in (3) is given in terms of B-spline expansions as
$$ \langle f, g \rangle_\lambda = f^T G g + \lambda f^T P_2 g, $$
with $f = \phi^T f$, $g = \phi^T g$, and $G = (\langle \phi_j, \phi_{j'} \rangle)$, $(j, j' = 1, \ldots, p)$. Then, the maximization problem in (4) is equivalent to solving the matrix problem
$$ b_{\lambda,j} = \operatorname*{argmax}_{b} \frac{b^T G \Sigma_A G b}{b^T (G + \lambda P_2) b}, \qquad (8) $$
subject to the constraint $b^T (G + \lambda P_2) b_{\lambda,k} = 0$ for all $k < j$, where $\Sigma_A = n^{-1} A^T A$ and $\lambda \geq 0$ is the penalty parameter used to control the trade-off between maximizing the sample variance and the strength of the penalty.
Because B-spline bases are non-orthonormal with respect to the usual $L^2$ geometry, we can apply a Cholesky factorization of the form $L L^T = G + \lambda P_2$ in order to find a non-singular matrix that allows us to operate in terms of the B-spline geometrical structure induced in $\mathbb{R}^p$. Finding the weight coefficients then corresponds to solving the eigenvalue problem
$$ L^{-1} G \Sigma_A G (L^{-1})^T v_j = \eta_j v_j, \qquad (9) $$
where $v_j = L^T b_{\lambda,j}$, so that the coefficients of $\gamma_{\lambda,j}$ are $b_{\lambda,j} = (L^{-1})^T v_j$. We have therefore obtained a set of orthonormal functions with respect to the inner product $\langle \cdot, \cdot \rangle_\lambda$. The $j$th smoothed principal component is then given by
$$ z_j = A G b_{\lambda,j} = A G (L^{-1})^T v_j. $$
Thus, the problem is reduced to the multivariate PCA of the matrix $A G (L^{-1})^T$ in $\mathbb{R}^p$ (see [25] for a detailed study). From the results in [38,40], we deduce in this paper the expression of the smoothing operator $S$ that provides the equivalence between this multivariate PCA and the functional PCA of the smoothed data $S(x_i)$ in $L^2(T)$.
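The chain above (Gram matrix, P-spline penalty, Cholesky factor, eigendecomposition) translates almost line by line into R. The sketch below continues the previous snippet; approximating the Gram matrix G by quadrature on the sampling grid is an assumption of this illustration, not necessarily the paper's exact implementation.

# Gram matrix G of the B-spline basis, approximated on the fine grid tk
dt <- tk[2] - tk[1]
G  <- crossprod(Phi) * dt                 # p x p, G[j,j'] ~ <phi_j, phi_j'>

# Second-order P-spline penalty matrix P2 = t(Delta2) %*% Delta2
Delta2 <- diff(diag(p), differences = 2)
P2     <- crossprod(Delta2)

# Penalized geometry G + lambda * P2 = L L^T (chol() returns the upper factor)
lambda <- 10
L    <- t(chol(G + lambda * P2))
Linv <- solve(L)

# Eigenproblem (9): L^{-1} G Sigma_A G (L^{-1})^T v_j = eta_j v_j
SigmaA <- crossprod(A) / n                # sample covariance of the coefficients
M      <- Linv %*% G %*% SigmaA %*% G %*% t(Linv)
eig    <- eigen((M + t(M)) / 2)           # symmetrized for numerical stability
V      <- eig$vectors
eta    <- eig$values

# Coefficients of the smoothed eigenfunctions and smoothed PC scores
B <- t(Linv) %*% V                        # columns are b_{lambda,j} = (L^{-1})^T v_j
Z <- A %*% G %*% B                        # scores z_j = A G b_{lambda,j}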
Proposition 2.
Given the basis expansion (7) for a random sample $\{x_i\}$ of curves in $L^2(T)$, the PCA of the matrix $A G (L^{-1})^T$ with the usual inner product in $\mathbb{R}^p$ is equivalent to all the FPCAs in Proposition 1, with the operator $S^2$ defined as $S^2(f) = \phi(t)^T (G + \lambda P_d)^{-1} G f$, for $f = \phi(t)^T f$.
Proof.
Define, for all $f = \phi(t)^T f$, $g = \phi(t)^T g$, the new inner product $\langle f, g \rangle_K = f^T K g$, where $K = D^T D$ with $D = L^{-1} G^T$. Proposition 2 in [40] proved that the PCA of the matrix $A D^T$ with the usual inner product in $\mathbb{R}^p$ is equivalent to the FPCA of $x_i$ with respect to $\langle \cdot, \cdot \rangle_K$; that is, $x_i = \sum_j z_{ij} f_j$ with $f_j = \phi^T D^{-1} v_j$, where the $v_j$ are the eigenvectors of the matrix $A D^T$. Then, from Proposition 1 in this paper, we have that $\langle S^2(f), S^2(g) \rangle_\lambda = \langle f, g \rangle_K$. If we suppose that there exists a matrix $C$ such that $S^2(f) = \phi^T C f$, then $\langle S^2(f), S^2(g) \rangle_\lambda = f^T C^T (G + \lambda P_d) C g = f^T D^T D g$. As a consequence, $C^T L L^T C = D^T D$, so that $L^T C = R D$ with $R$ an orthonormal matrix ($R R^T = I_p$). Therefore, $S^2(f) = \phi^T \{(L^{-1})^T R D\} f$. On the other hand, from Proposition 1 we have that $\gamma_j = S^2(f_j)$, which implies that $(L^{-1})^T v_j = (L^{-1})^T R D D^{-1} v_j$. As a consequence, we obtain $R = I_p$ and $S^2(f) = \phi^T \{(L^{-1})^T D\} f = \phi^T \{(G + \lambda P_d)^{-1} G\} f$. □
As a result, the principal components (scores) of $S(x_i)$ are given by $Z = A G (L^{-1})^T V$, where $V$ is the matrix whose columns are the eigenvectors $v_j$ verifying Equation (9), and thus the eigenfunctions are $\beta_j = S^{-1}(\gamma_{\lambda,j})$.
Having estimated the weight function coefficients and principal component scores, assume next that the smoothed principal component expansion in (5) is truncated at the $q$th term. Then, the column vector of smoothed sample curves is given by $\chi^q(t) = Z^q \beta(t)$, where $Z^q = (z_{ij}) \in \mathbb{R}^{n \times q}$ is the matrix whose columns are the first $q$ principal component scores with respect to the basis of smoothed principal component weight functions $\beta(t) = (\beta_1(t), \ldots, \beta_q(t))^T$.
With the above results, the functional independent components are computed from the smoothed principal component approximation of the functional data. Following the usual ICA pre-processing steps, we first standardize the approximated curves, defining the whitening operator as $\Psi\{\chi^q(t)\} = \tilde{\chi}^q(t) = \tilde{Z}^q \beta(t)$, with $\tilde{Z}^q = Z^q \Sigma_{Z^q}^{-1/2}$ the matrix of standardized principal components and $\Sigma_{Z^q}^{-1/2} = n^{1/2} \{(Z^q)^T Z^q\}^{-1/2}$ the inverse square root of the covariance matrix of $Z^q$. The described whitening transformation is essentially an orthogonalization of the probabilistic part of $\chi^q$, so the matrix $\tilde{Z}^q \in \mathbb{R}^{n \times q}$ naturally satisfies $\Sigma_{\tilde{Z}^q} = I_q$, and the associated covariance operator $C_{\tilde{\chi}^q}$ is unitary.
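In code, the whitening step amounts to multiplying the scores by an inverse square root of their empirical covariance, computed here via an eigendecomposition. This continues the running sketch and is not the pfica implementation itself.

# Keep the first q smoothed principal component scores
q  <- 5
Zq <- Z[, 1:q]

# Inverse square root of Sigma_{Z^q} = (1/n) t(Zq) %*% Zq
SigmaZ  <- crossprod(Zq) / n
es      <- eigen(SigmaZ, symmetric = TRUE)
SigInv2 <- es$vectors %*% diag(1 / sqrt(es$values)) %*% t(es$vectors)

# Whitened scores: their covariance is the identity I_q
Zt <- Zq %*% SigInv2
round(crossprod(Zt) / n, 10)              # check: ~ identity matrix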
Then, the kurtosis operator (6) of the standardized curves $\tilde{\chi}^q(t)$ is given in matrix form by
$$ K_{\tilde{\chi}^q}(h) = \frac{1}{n} \big( (\tilde{Z}^q)^T D_{\tilde{Z}^q} \tilde{Z}^q \, \mathbf{h} \big)^T \beta(t), \qquad h = \beta(t)^T \mathbf{h}, $$
where $D_{\tilde{Z}^q} = \mathrm{diag}\big(\tilde{Z}^q (\tilde{Z}^q)^T\big)$. The eigenanalysis of this kurtosis operator leads to the diagonalization of the kurtosis matrix of the standardized principal components $\tilde{Z}^q$,
$$ \Sigma_{4,\tilde{Z}^q} u_l = \rho_l u_l \quad (l = 1, \ldots, q), \qquad (10) $$
where $\Sigma_{4,\tilde{Z}^q} \in \mathbb{R}^{q \times q}$ is defined as
$$ \Sigma_{4,\tilde{Z}^q} = \frac{1}{n} \sum_{i=1}^{n} \|\tilde{z}_i^q\|^2 \, \tilde{z}_i^q (\tilde{z}_i^q)^T = \frac{1}{n} (\tilde{Z}^q)^T D_{\tilde{Z}^q} \tilde{Z}^q, $$
with $\tilde{z}_i^q$ the $q \times 1$ column vector containing the $i$th row of the matrix $\tilde{Z}^q$. The eigenproblem (10) does not restrict us to assuming that $\Sigma_{4,\tilde{Z}^q}$ is uniquely determined; in fact, other kurtosis matrices can be considered (see, e.g., [41,42]). This way, the P-spline smoothed functional ICA of $x$ in $L^2(T)$ is obtained from the multivariate ICA of $Z^q$ in $\mathbb{R}^q$. The resulting weight functions are now $\psi_l(t) = \beta(t)^T u_l$ $(l = 1, \ldots, q)$, where the coefficient vectors $u_l$ are the eigenvectors of the predefined kurtosis matrix. The independent components can then be calculated as $\zeta_{l,\tilde{\chi}^q} = \tilde{Z}^q u_l$. Finally, the operator $\Gamma$ defining the FICA model is
$$ \Gamma(\chi_i^q) = \beta^T U^T \Sigma_{Z^q}^{-1/2} z_i^q, $$
with $z_i^q$ the $q \times 1$ column vector containing the $i$th row of $Z^q$ and $U \in \mathbb{R}^{q \times q}$ the matrix of eigenvectors of the kurtosis matrix $\Sigma_{4,\tilde{Z}^q}$.
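The closing FICA step is thus a symmetric eigendecomposition of the empirical kurtosis matrix of the whitened scores. Continuing the running sketch (with the FOBI-type kurtosis matrix written above):

# Kurtosis matrix Sigma_4 = (1/n) t(Zt) %*% diag(||z_i||^2) %*% Zt
Dz     <- diag(rowSums(Zt^2))
Sigma4 <- crossprod(Zt, Dz %*% Zt) / n

# Eigenvectors u_l give the independent weight functions psi_l = beta(t)^T u_l
ek  <- eigen(Sigma4, symmetric = TRUE)
U   <- ek$vectors
rho <- ek$values

# Independent components and psi_l expanded in the B-spline basis phi_j
zeta    <- Zt %*% U                       # n x q independent component scores
PsiCoef <- B[, 1:q] %*% U                 # p x q, column l holds psi_l's coefficients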

4. Parameter Tuning

The problem concerning the estimation of the smoothed independent component curves lies in finding an optimal truncation point $q$ as well as a suitable penalty parameter. As $q$ approaches $p$, more of the higher oscillation modes of the standardized sample enter the estimation; otherwise, we are denoising the data from its second- and fourth-order structure simultaneously. From this perspective, it is desirable to increase the value of $q$ so that the latent functions of the whitened space can be captured by the kurtosis operator. Observe that this kind of regularization is not exactly the same as that provided by the P-spline penalization of the roughness of the weight functions: attenuating the higher-frequency components of the FPCA model does not necessarily affect an entire frequency bandwidth of the data. Thus, if the original curves are observed with independent error, and the error persists in the functional approximation, it may overlap the estimation of the kurtosis eigenfunctions; in this context, smoothing is appropriate. Once the value of $q$ is decided, we should examine those components with extreme kurtosis, contrary to FPCA, where only the components associated with large eigenvalues are considered.

Penalty Parameter Selection

Leave-one-out cross-validation [43] is generally used to select the penalty parameter in order to achieve a suitable degree of smoothness in the weight functions, but also to induce the truncation point $q$. In condensed form, this procedure in our model consists of finding a value of $\lambda$ that minimizes
$$ \mathrm{cv}^q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \big\| x_i - \chi_i^{q(-i)} \big\|^2, \qquad (11) $$
where $\chi_i^{q(-i)} = \sum_{l=1}^{q} z_{il}^{(-i)} \beta_l^{(-i)}(t)$ is the reconstruction of the $i$th curve $x_i$ in terms of the first $q$ smoothed principal components estimated with that curve left out. We found, however, that cross-validation was not sensitive for a reasonably large basis dimension, forcing us to reformulate the strategy.
To address this problem, the penalty parameter might be chosen subjectively, although this can lead to bias and poor extraction of the artifactual sources. Hence, for the results presented in this paper, we propose a novel adaptive approach that consists of replacing (11) with
$$ \mathrm{bcv}^q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \big\| \chi_i^{q;\lambda(-i)} - \chi_i^{q;\lambda+\epsilon(-i)} \big\|^2, \qquad (12) $$
where $\chi_i^{q;\lambda(-i)}$ is a smoothed representation of $x_i$ for some $\lambda$, and $\epsilon > 0$ is a value that increases the penalty in the second term of the norm; we assume $\epsilon = 0.1$. Then, for a fixed $q$, (12) is iterated over each $\lambda$ in a given grid to find the one that minimizes $\mathrm{bcv}^q(\lambda)$. Among all the values of $q$ considered in the estimation process, we select the truncation point that minimizes this function.
If we require a basis dimension $p$ greater than the sample size $n$, a shrinkage covariance estimator [44] can be considered for computing $\Sigma_A$. This method guarantees positive definiteness and, consequently, an estimation of the leading, important eigenvalues that is not biased upwards. The same strategy is used for $\mathrm{bcv}^q(\lambda)$. Recall the quadratic distances in (12); these are given in terms of basis functions by
$$ \big\| \chi_i^{q;\lambda(-i)} - \chi_i^{q;\lambda+\epsilon(-i)} \big\|^2 = \int_T \Big( \chi_i^{q;\lambda(-i)}(t) - \chi_i^{q;\lambda+\epsilon(-i)}(t) \Big)^2 dt = \int_T \Bigg( \sum_{l=1}^{q} z_{il}^{\lambda(-i)} \sum_{j=1}^{p} b_{lj}^{\lambda(-i)} \phi_j(t) - \sum_{l=1}^{q} z_{il}^{\lambda+\epsilon(-i)} \sum_{j=1}^{p} b_{lj}^{\lambda+\epsilon(-i)} \phi_j(t) \Bigg)^2 dt = \int_T \Bigg( \sum_{j=1}^{p} e_{ij} \phi_j(t) \Bigg)^2 dt = e_i^T G e_i, $$
where $b_j = (b_{j1}, \ldots, b_{jp})^T$ is the vector of basis coefficients of the $j$th weight function $\beta_j$ in the B-spline basis $\{\phi_j(t)\}$, and $e_i = (e_{i1}, \ldots, e_{ip})^T$ is a vector of residuals. Next, the matrix $E = (e_{ij}) \in \mathbb{R}^{n \times p}$ is reconstructed via shrinkage. That is, first we compute $\mathrm{cov}_S(E)$, where $\mathrm{cov}_S$ is a predefined shrinkage covariance estimator; then we apply a Cholesky decomposition of the form $L L^T = \mathrm{cov}_S(E)$. Finally, the basis coefficients of the reconstructed residual functions are $\hat{e}_i = (L^{-1})^T e_i$, and consequently
$$ \mathrm{bcv}^q(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \big\| \chi_i^{q;\lambda(-i)} - \chi_i^{q;\lambda+\epsilon(-i)} \big\|^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{e}_i^T G \hat{e}_i. $$
We call this method baseline cross-validation (see Algorithm 1), as it operates across different reconstructions of $x_i$ for a given baseline penalty parameter and a fixed $q$. This approach is more versatile and particularly useful when the original curves are extremely rough and approximated with a large basis dimension, thus preventing the least-squares fit from collapsing. Moreover, for a given $q$, it allows scoring more than one $\lambda$ as a result of the various relative minima it produces. The intuition behind baseline cross-validation is that several smoothing levels endow the estimator with predictive ability: these are given by evaluating "short distances" for a smoothing baseline $\lambda$ in a given $\chi_i^q$, which may be seen as a way of finding a trade-off for the global roughness of a $q$-dimensional basis. Note that, as the value of $q$ increases, and despite the minimization of the mean squared error, it may be more difficult to find a smoothing balance between the elements of the basis due to a complex fabric of variability modes.
Algorithm 1. Baseline cross-validation
Input: $A$, $\phi_j$ $(j = 1, \ldots, p)$, $G$, $P_2$, $\lambda_k = (\lambda_1, \ldots, \lambda_m)^T$
Output: $\lambda$.
for each $\lambda$ in $\lambda_k$:
1: Calculate $L^{-1}$ via the Cholesky decomposition of the matrix $G + \lambda P_2 = L L^T$, and $L_\epsilon^{-1}$ likewise for $G + (\lambda + \epsilon) P_2 = L_\epsilon L_\epsilon^T$.
2: Diagonalize $L^{-1} G \Sigma_A^s G (L^{-1})^T$, where $\Sigma_A^s = \mathrm{cov}_S(A)$, to obtain the coefficients $b_j$ of the eigenfunctions $\beta_j$, and $b_{\epsilon,j}$ for the incremental smoothing case.
3: Calculate the scores $Z^q = A G B$ and $Z_\epsilon^q = A G B_\epsilon$, with $B = (b_1, \ldots, b_q)$ and $B_\epsilon = (b_{\epsilon,1}, \ldots, b_{\epsilon,q})$, and form $A_\ast = B (Z^q)^T$ and $A_{\ast\epsilon} = B_\epsilon (Z_\epsilon^q)^T$, where $A_\ast$, $A_{\ast\epsilon}$ are the coefficient matrices of the smoothed principal component expansions in terms of $\phi_j$.
4: Set $E = A_\ast - A_{\ast\epsilon}$ and reconstruct $E$ via the shrinkage covariance matrix $\mathrm{cov}_S(E)$.
5: $\mathrm{bcv}(\lambda) = n^{-1} \mathrm{tr}(\hat{E}^T G \hat{E})$, where $\hat{E}$ is the reconstructed matrix of residual coefficients and $\mathrm{tr}(\cdot)$ sums the diagonal elements of a square matrix.
end for
$\lambda \leftarrow \operatorname*{argmin}_{\lambda} \mathrm{bcv}(\lambda)$.
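A compact (and deliberately simplified) R sketch of this grid search is given below: the leave-one-out reconstructions and the shrinkage steps of Algorithm 1 (steps 4 and 5) are omitted for brevity, so this is an outline of the computation flow rather than the full procedure.

smooth_pca <- function(A, G, P2, lambda, q) {
  L    <- t(chol(G + lambda * P2))
  Linv <- solve(L)
  M    <- Linv %*% G %*% (crossprod(A) / nrow(A)) %*% G %*% t(Linv)
  V    <- eigen((M + t(M)) / 2)$vectors[, 1:q, drop = FALSE]
  B    <- t(Linv) %*% V                   # coefficients of beta_1, ..., beta_q
  list(B = B, Z = A %*% G %*% B)          # scores Z^q = A G B
}

bcv <- function(A, G, P2, lambda, q, eps = 0.1) {
  f0 <- smooth_pca(A, G, P2, lambda, q)
  f1 <- smooth_pca(A, G, P2, lambda + eps, q)
  E  <- f0$Z %*% t(f0$B) - f1$Z %*% t(f1$B)   # n x p residual coefficients
  sum(diag(E %*% G %*% t(E))) / nrow(A)       # mean of e_i^T G e_i
}

grid       <- 10^seq(-2, 4, length.out = 25)
vals       <- sapply(grid, function(l) bcv(A, G, P2, l, q = 5))
lambda_hat <- grid[which.min(vals)]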

5. Simulation Study

A simulation study based on EEG data segments containing stereotyped artifacts was conducted to validate our methods for recovering brain sources. The data consist of four separate 64-channel recordings of a subject performing the following classes of self-paced repetitive movements: nodding, hand-tapping with a wide arm movement, eye-blinking and chewing. Recordings were performed in the absence of sensory stimulation, in trials of 3 s length sampled at 1 kHz, i.e., $t_{ik}$ $(i = 1, \ldots, 64;\ k = 1, \ldots, 3000)$. The signal was high-pass and low-pass filtered using Butterworth filters (cut-offs at 0.5 and 30 Hz, orders 4 and 6, respectively). An additional notch filter was applied to suppress the 50 Hz power-line noise. More details on the preprocessing steps and experimental conditions are given in the online supplementary material. In reconstructing the functional form of the sample paths, we sought a less smooth fit to mimic the brain potential fluctuations. Accordingly, a basis of cubic B-spline functions of dimension $p = 230$ is fitted to all signal components, minimizing the mean squared error to a negligible value.
The process of identifying artifactual functions is addressed by using topographic maps that roughly represent patterns of eigenactivity related to the distribution of bioelectric energy on the scalp. These maps are elaborated from the projection of the signal components $x_1, \ldots, x_{64}$ onto the basis of independent weight functions, i.e., $\zeta_{il,x} = \langle x_i, \psi_l \rangle$ $(i = 1, \ldots, 64;\ l = 1, \ldots, q)$, whose resulting score vectors $\zeta_{l,x} = (\zeta_{1l}, \ldots, \zeta_{nl})^T$ are depicted in the spatial electrode domain. The aim is therefore to examine how the kurtosis eigenfunctions contribute to $x_i$, to discern possible patterns of artifactual activity. The components identified as artifacts will be considered for subtraction.
In order to simplify the burden of a manual selection, assume that all $\psi_1, \ldots, \psi_q$ obtained from the model correspond to a structure of latent artifactual eigenpatterns. Moreover, let $\chi_i^q(t) = \sum_{l=1}^{q} \zeta_{il,x} \psi_l(t)$ be an expansion of artifactual components and related artifactual eigenfunctions. Then, the artifact subtraction in terms of basis expansions is
$$ x_i(t) - \chi_i^q(t) = \sum_{j=1}^{p} a_{ij} \phi_j(t) - \sum_{l=1}^{q} \zeta_{il,x} \sum_{j=1}^{p} (u_l^T b_j) \phi_j(t) = \sum_{j=1}^{p} d_{ij} \phi_j(t), \qquad (13) $$
where the $d_{ij}$ are the cleaned (or residual) coefficients, and $u_l$ is the vector of coefficients of the independent weight function $\psi_l$ in terms of the principal eigenfunctions. Thus, given the model parameters $q$ and $\lambda$, the procedure to estimate and remove smooth artifactual components from EEG functional data can be succinctly stated as in Algorithm 2.
Algorithm 2. Functional artifact subtraction
Input: $A$, $\phi_j$ $(j = 1, \ldots, p)$, $G$, $P_2$, $\lambda$, $q$
Output: $d_j$.
1: Calculate $L^{-1}$ via the Cholesky decomposition of the matrix $G + \lambda P_2 = L L^T$.
2: Perform the PCA of $A G (L^{-1})^T$; obtain $Z^q$ and the coefficients $b_j$ of $\beta_j$. If $p > n$, diagonalize $L^{-1} G \Sigma_A^s G (L^{-1})^T$, where $\Sigma_A^s = \mathrm{cov}_S(A)$.
3: Whiten $Z^q$, i.e., $\tilde{Z}^q = Z^q \Sigma_{Z^q}^{-1/2}$.
4: Fix a fourth-order matrix $\Sigma_{4,\tilde{Z}^q}$ and diagonalize it; obtain the eigenvalues $\rho_l$ and associated eigenvectors $u_l$ $(l = 1, \ldots, q)$.
5: Calculate $\zeta_{il,x} = \langle x_i, \psi_l \rangle$ for $\psi_l(t) = \sum_{j=1}^{q} u_{lj} \beta_j(t)$.
6: Select the artifactual score vectors in $\zeta_{l,x}$; expand the artifactual space as $\chi_i^q(t) = \sum_{l=1}^{q} \zeta_{il,x} \psi_l(t)$.
7: Subtract the artifactual coefficients in terms of $\phi_j$ using (13) and obtain the vector of coefficients $d_j$ to reconstruct the functional brain signal.
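In coefficient space, steps 5–7 reduce to a couple of matrix products. Continuing the running sketch (A, G, Phi and PsiCoef from the earlier snippets), and assuming for illustration that all q components are flagged as artifactual:

# Scores of the raw curves on the independent weight functions:
# zeta_{il,x} = <x_i, psi_l> = a_i^T G (B u_l)
ZetaX <- A %*% G %*% PsiCoef              # n x q

# Cleaned coefficients d_ij of Equation (13)
D <- A - ZetaX %*% t(PsiCoef)             # n x p

# Reconstruct the artifact-corrected curves on the sampling grid
X_clean <- D %*% t(Phi)                   # n x m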
Baseline cross-validation was performed on a given grid, selecting the value that minimizes $\mathrm{bcv}^q(\lambda)$ for $q = 1, \ldots, j_0$, where $j_0$ is defined as the index corresponding to the first relative maximum of the first-order differences of the FPCA eigenvalues, $\Delta \eta_j$. We find that truncating at $q = j_0$ is a way of exploring independence in the high-variability structure of the data. In analysing EEG signals, this entails major effectiveness at reducing the artifactual content to a few eigenfunctions, particularly for low-frequency physiological activity such as blinks and movement-related artifacts. One may see this truncation rule as a measure to improve the accuracy in the estimation of certain artifacts while preserving the modes of variability related to the rhythms of the latent brain processes. The log-distances using $\mathrm{bcv}(\lambda)$ for each of the datasets are shown in Figure 1. Further results are presented in Table 1.
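The $j_0$ truncation rule is easy to automate; one possible reading of "first relative maximum of $\Delta \eta_j$" is sketched below (our interpretation of the rule, not code from the paper).

# eta: FPCA eigenvalues in descending order (e.g., eig$values above)
first_relative_max <- function(eta) {
  d <- diff(eta)                          # first-order differences
  for (j in 2:(length(d) - 1))
    if (d[j] > d[j - 1] && d[j] > d[j + 1]) return(j)
  length(d)                               # fallback: no interior maximum found
}
j0 <- first_relative_max(eta)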
Preliminary results comparing penalized and non-penalized estimation show that the smoothed FICA presumably attenuates the high-frequency potentials of neural origin, revealing the latent shape of the artifact. More importantly, however, all topographic maps reflect the well-known spatial activation of the artifactual content. A selection of eigenfunctions from each trial and their associated component scores are depicted in Figure 2. Physiological non-brain activity near the recording zone, such as blinks and large-amplitude body movements, can be easily detected in controlled conditions using the proposed methodology. However, the coexistence of such artifacts may result in a non-linear distortion of them, e.g., via large changes of the impedance [45]. This could entail a more challenging situation, as algorithms based on linear mixing may not be effective beyond a certain point. Nonetheless, the aforementioned artifacts enhance the role of smoothing due to their low-frequency trademark in the signal. In contrast, when artifacts are characterized by localized high-amplitude curves, as is the case of the fourth artifactual eigenfunction (chewing), smoothing is not able to denoise effectively. We believe this happens for two reasons: first, the noise provided by the fourth-order structure of the model is essential to configure the shape of the artifact; second, the B-spline basis has limited flexibility to smooth abrupt local contours. Hence, artifacts such as jaw clenching and chewing are quite sensitive to smoothing and difficult to correct by subtraction. Interestingly, hybrid procedures combining spline interpolation and wavelet filtering have shown promising results for this problem in functional near-infrared spectroscopy research (see [46]).
It seems reasonable to conjecture that restricting $q$ to the first FPCA terms decreases the odds of obtaining spurious artifactual functions, as these represent dominant modes of variability usually related to large artifacts. In such cases, artifact subtraction with the smoothed components preserved the brain activity rhythms in their original form, while for $\lambda = 0$ it caused a reduction and a distortion of relevant potentials. However, bcv may tend to oversmooth slightly from the standpoint of effective artifact removal, leaving a certain artifactual residue after subtraction. This happens due to the complexity of the mixed sources, and can be solved by examining other relative minima in our results. The plots for all channels and datasets comparing the effect of subtracting artifactual components are omitted for the sake of space; the online supplementary materials provide R code for their visualization.
Although our tests have provided good results by subtracting all smoothed components, further research is needed to corroborate their physiological validity. As reported in [47], reducing the dimensionality of the data with a PCA before applying ICA is not always beneficial, although in some cases it may improve the signal-to-noise ratio of the large sources and their subsequent isolation. We see that our approach paves the way for developing measures of correlation, dipolarity, stability or sparsity in the functional data domain to fine-tune artifact selection. An important issue that remains open is whether the restriction imposed on the truncation point is beneficial for achieving better results.

6. Estimating Brain Signals from Contaminated Event-Related Potentials

To illustrate our methods, we reproduced a typical experimental scenario in which a human participant had to perform full-arm movements synchronised to a periodic auditory stimulus while an EEG recording was performed. Arguably, this is a paradigmatic example wherein the researcher needs to clean the signal from motion-related artifacts while preserving activity genuinely related to perceptual and motor brain processes. In one condition, the subject was instructed to tap his hand on the table in synchrony with a steady auditory stimulus; in the other, he listened to the same stimulus without any movement involved. Having this baseline at our disposal, we could directly compare the outcome of our cleaning procedure with an uncontaminated experimental situation. We recorded 100 trials of 3 s per condition, divided into randomized blocks of 25 trials. The stimulus period was 750 milliseconds, i.e., four taps per trial in the movement condition. Movements were intentionally exaggerated to maximize eventual movement-related artifacts. In this section, the same configuration for running the model ($p = 230$; $i = 1, \ldots, 64$; $k = 1, \ldots, 3000$) is preserved from the previous one.
The P-spline smoothed FICA is performed on each trial to obtain brain estimates by subtracting the artifactual components. Here, the complexity of the signal increases, as it is assumed to be a mixture of artifacts and other brain processes due to the cognitive task. Figure 3 shows the grand-averaged results comparing both conditions before and after the artifact removal. An FPCA is performed on the averaged data to visualize the spatial distribution of the scores in the direction of the leading eigenvector before and after the removal. As expected, the activations were nearly coincident after the artifact removal and more prominent in the central region of the scalp. The upper left panel displays the EEG signal in some frontal channels where the movement-related artifact is prominently visible before the subtraction. Further evidence of such artifactual content is given in the second row, where the raw curves are shown for the other condition; clearly, the pooled artifacts across the trials have a different origin here. The same panel shows the curves after subtracting the artifactual curves.
Our procedure notably reduces the movement-related artifact and renders the signal more stationary. Differences are indeed smaller in the non-movement condition but, in either case, our algorithm is capable of reducing artifactual content while leaving the brain activity intact. From our previous tests, one may expect some artifact residue at the trial level, depending on the estimated $\lambda$ and the diversity of source artifacts. We stress that, as the response to the repeated stimulus is assumed to be invariant and small in terms of amplitude, averaging suppresses non-phase-locked activity and reveals the potential elicited by the stimulus [48]. Consequently, attenuating the roughness of the artifactual component functions leads to a better estimation of the brain potentials upon averaging than subtracting rough components would.

7. Discussion

The proposed independent component techniques are, to the best of our knowledge, the first to provide a functional framework for smoothed artifact extraction and removal in dense data approximated with a large number of knots. We found that using shrinkage estimators is a reasonable starting point for smoothing covariance operators with this kind of functional data (see also [49]). In accordance with this setting, a novel cross-validation method is proposed for selecting the model parameters. Despite being computationally expensive, our approach overcomes the lack of sensitivity of other existing methods. Overall, this allows independent component techniques to be applied from a smoothing perspective that is somewhat more flexible than other modelling strategies.
Although [11] established a form of Fisher consistency for the kurtosis operator decomposition, no asymptotic results for the non-smoothed, and hence for the smoothed, independent components have been derived. Therefore, we rely on a competitive performance derived from previous FPCA asymptotic results. In our empirical setting, however, the study of such properties must be related to the type of functional data and the penalized spline method used, involving considerably more technicalities; see, for example, [50,51]. These theoretical developments lie beyond the scope of the present work, but we hope to pursue such a study in a separate paper.
In our simulations, the kurtosis operator has proven to work well at capturing artifactual eigenfunctions with different frequency characteristics, at least under certain conditions. One of the strengths of our model is the double regularization, which allows us to circumvent the leak of brain activity and obtain clean movement-related artifacts. In essence, the degree of separation is defined through the space dimension, from more dependent (first $q$ terms of the FPCA decomposition) to more independent ($q \to p$). Thus, $q$ acts as a regularization parameter to explore the variational component of the artifactual sources in the EEG signal, while $\lambda$ provides more accurate estimations, particularly when using the first $q$ terms of the K-L expansion. Further research is needed to determine how the model parameter selection can optimize the removal of artifacts with a minimum loss of the variance patterns related to brain sources. Non-linearly distorted artifacts, inevitably entangled with cortical activity, remain challenging to correct, suggesting the exploration of other subspaces prone to kurtosis data structures in addition to the smoothed principal component eigendirections.

Author Contributions

Conceptualization, methodology, software and formal analysis, M.V. and A.M.A.; writing—original draft preparation, M.V.; writing—review and editing, M.V., M.R. and A.M.A.; visualization, M.V.; supervision, A.M.A.; data collection and preprocessing M.V. and M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Methusalem funding from the Flemish Government, by the project MTM2017-88708-P of the Spanish Ministry of Science, Innovation and Universities, and by the project FQM-307 of the Government of Andalusia (Spain).

Informed Consent Statement

Informed consent was obtained from the subject involved in the study.

Data Availability Statement

Supplementary material containing the experimental results, additional R code and the datasets used in Section 5 and Section 6, will be publicly available in due course at: https://github.com/m-vidal/psfica-eeg-data.

Acknowledgments

The authors wish to thank Marc Leman for his valuable comments and Daniel Gost for helping with figures and formatting.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

EEG   Electroencephalography
FICA  Functional Independent Component Analysis
FPCA  Functional Principal Component Analysis
ICA   Independent Component Analysis
K-L   Karhunen–Loève
PCA   Principal Component Analysis

References

  1. Wang, X. Neurophysiological and Computational Principles of Cortical Rhythms in Cognition. Physiol. Rev. 2010, 90, 1195–1268.
  2. Buzsáki, G. Rhythms of the Brain; Oxford University Press: Oxford, UK, 2006.
  3. Castellanos, N.P.; Makarov, V.A. Recovering EEG Brain Signals: Artifact Suppression with Wavelet Enhanced Independent Component Analysis. J. Neurosci. Methods 2006, 158, 300–312.
  4. Nordhausen, K.; Oja, H. Independent Component Analysis: A Statistical Perspective. WIREs Comput. Stat. 2018, 10, 1–23.
  5. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; John Wiley & Sons, Ltd.: New York, NY, USA, 2001.
  6. Ramsay, J.; Silverman, B.W. Functional Data Analysis; Springer: New York, NY, USA, 2005.
  7. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; Wiley: Chichester, UK, 2015.
  8. Wang, J.L.; Chiou, J.M.; Müller, H.G. Functional Data Analysis. Annu. Rev. Stat. Its Appl. 2016, 3, 257–295.
  9. Mehta, N.; Gray, A. FuncICA for Time Series Pattern Discovery. In Proceedings of the 2009 SIAM International Conference on Data Mining, Sparks, NV, USA, 30 April–2 May 2009; pp. 73–84.
  10. Peña, C.; Prieto, J.; Rendón, C. Independent Components Techniques Based on Kurtosis for Functional Data Analysis; Working Paper 14–10, Statistics and Econometric Series (06); Universidad Carlos III de Madrid: Madrid, Spain, 2014.
  11. Li, B.; Bever, G.V.; Oja, H.; Sabolová, R.; Critchley, F. Functional Independent Component Analysis: An Extension of the Fourth-Order Blind Identification; Technical Report; Université de Namur: Namur, Belgium, 2019.
  12. Virta, J.; Li, B.; Nordhausen, K.; Oja, H. Independent Component Analysis for Multivariate Functional Data. J. Multivar. Anal. 2020, 176, 1–19.
  13. Ash, R.B.; Gardner, M.F. Topics in Stochastic Processes; Academic Press: New York, NY, USA, 1975.
  14. Xiao, L.; Zipunnikov, V.; Ruppert, D.; Crainiceanu, C. Fast Covariance Estimation for High-Dimensional Functional Data. Stat. Comput. 2016, 26, 409–421.
  15. Hasenstab, K.; Scheffler, A.; Telesca, D.; Sugar, C.A.; Jeste, S.; DiStefano, C.; Sentürk, D. A Multi-Dimensional Functional Principal Components Analysis of EEG Data. Biometrics 2017, 3, 999–1009.
  16. Nie, Y.; Wang, L.; Liu, B.; Cao, J. Supervised Functional Principal Component Analysis. Stat. Comput. 2018, 28, 713–723.
  17. Pokora, O.; Kolacek, J.; Chiu, T.; Qiu, W. Functional Data Analysis of Single-Trial Auditory Evoked Potentials Recorded in the Awake Rat. Biosystems 2018, 161, 67–75.
  18. Scheffler, A.; Telesca, D.; Li, Q.; Sugar, C.A.; Distefano, C.; Jeste, S.; Şentürk, D. Hybrid Principal Components Analysis for Region-Referenced Longitudinal Functional EEG Data. Biostatistics 2018, 21, 139–157.
  19. Urigüen, J.A.; Garcia-Zapirain, B. EEG Artifact Removal—State-of-the-Art and Guidelines. J. Neural Eng. 2015, 12, 1–23.
  20. Akhtar, M.T.; Mitsuhashi, W.; James, C.J. Employing Spatially Constrained ICA and Wavelet Denoising, for Automatic Removal of Artifacts from Multichannel EEG Data. Signal Process. 2012, 92, 401–416.
  21. Mammone, N.; Morabito, F.C. Enhanced Automatic Wavelet Independent Component Analysis for Electroencephalographic Artifact Removal. Entropy 2014, 16, 6553–6572.
  22. Bajaj, N.; Requena Carrión, J.; Bellotti, F.; Berta, R.; De Gloria, A. Automatic and Tunable Algorithm for EEG Artifact Removal Using Wavelet Decomposition with Applications in Predictive Modeling During Auditory Tasks. Biomed. Signal Process. Control 2020, 55, 101624.
  23. Eilers, P.H.C.; Marx, B.D. Flexible Smoothing with B-Splines and Penalties (with Discussion). Stat. Sci. 1996, 11, 89–121.
  24. Currie, I.D.; Durban, M. Flexible Smoothing with P-Splines: A Unified Approach. Stat. Model. 2002, 2, 333–349.
  25. Aguilera, A.M.; Aguilera-Morillo, M.C. Penalized PCA Approaches for B-Spline Expansions of Smooth Functional Data. Appl. Math. Comput. 2013, 219, 7805–7819.
  26. Aguilera-Morillo, M.; Aguilera, A.; Escabias, M. Penalized Spline Approaches for Functional Logit Regression. TEST 2013, 22, 251–277.
  27. Aguilera, A.; Aguilera-Morillo, M.; Preda, C. Penalized Versions of Functional PLS Regression. Chemom. Intell. Lab. Syst. 2016, 154, 80–92.
  28. Aguilera-Morillo, M.; Aguilera, A.; Durbán, M. Prediction of Functional Data with Spatial Dependence: A Penalized Approach. Stoch. Environ. Res. Risk Assess. 2017, 31, 7–22.
  29. Aguilera-Morillo, M.; Aguilera, A. Multi-Class Classification of Biomechanical Data: A Functional LDA Approach Based on Multi-Class Penalized Functional PLS. Stat. Model. 2020, 20, 592–616.
  30. Vidal, M.; Aguilera, A.M. pfica: Independent Component Analysis for Univariate Functional Data; R Package Version 0.1.2, 2021. Available online: https://CRAN.R-project.org/package=pfica (accessed on 25 May 2021).
  31. Ghanem, R.; Spanos, P. Stochastic Finite Elements: A Spectral Approach; Springer: New York, NY, USA, 1991.
  32. Acal, C.; Aguilera, A.M.; Escabias, M. New Modeling Approaches Based on Varimax Rotation of Functional Principal Components. Mathematics 2020, 8, 2085.
  33. Delaigle, A.; Hall, P. Defining Probability Density for a Distribution of Random Functions. Ann. Stat. 2010, 38, 1171–1193.
  34. Gutch, H.W.; Theis, F.J. To Infinity and Beyond: On ICA over Hilbert Spaces. In Latent Variable Analysis and Signal Separation; Theis, F.J., Cichocki, A., Yeredor, A., Zibulevsky, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 180–187.
  35. Silverman, B.W. Smoothed Functional Principal Components Analysis by Choice of Norm. Ann. Stat. 1996, 24, 1–24.
  36. Qi, X.; Zhao, H. Some Theoretical Properties of Silverman's Method for Smoothed Functional Principal Component Analysis. J. Multivar. Anal. 2011, 102, 742–767.
  37. Lakraj, G.P.; Ruymgaart, F. Some Asymptotic Theory for Silverman's Smoothed Functional Principal Components in an Abstract Hilbert Space. J. Multivar. Anal. 2017, 155, 122–132.
  38. Ocaña, F.A.; Aguilera, A.M.; Valderrama, M.J. Functional Principal Component Analysis by Choice of Norm. J. Multivar. Anal. 1999, 71, 262–276.
  39. Aguilera, A.M.; Aguilera-Morillo, M.C. Comparative Study of Different B-Spline Approaches for Functional Data. Math. Comput. Model. 2013, 58, 1568–1579.
  40. Ocaña, F.A.; Aguilera, A.M.; Escabias, M. Computational Considerations in Functional Principal Component Analysis. Comput. Stat. 2007, 22, 449–465.
  41. Kollo, T. Multivariate Skewness and Kurtosis Measures with an Application in ICA. J. Multivar. Anal. 2008, 99, 2328–2338.
  42. Loperfido, N. A New Kurtosis Matrix, with Statistical Applications. Linear Algebra Its Appl. 2017, 512, 1–17.
  43. Rice, J.A.; Silverman, B.W. Estimating the Mean and Covariance Structure Nonparametrically when the Data are Curves. J. R. Stat. Soc. Ser. B 1991, 53, 233–243.
  44. Schäfer, J.; Strimmer, K. A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Stat. Appl. Genet. Mol. Biol. 2005, 4, 1–29.
  45. Noury, N.; Hipp, J.F.; Siegel, M. Physiological Processes Non-linearly Affect Electrophysiological Recordings During Transcranial Electric Stimulation. Neuroimage 2016, 140, 99–109.
  46. Novi, S.L.; Roberts, E.; Spagnuolo, D.; Spilsbury, B.M.; Price, D.C.; Imbalzano, C.A.; Forero, E.; Yodh, A.G.; Tellis, G.M.; Tellis, C.M.; et al. Functional Near-Infrared Spectroscopy for Speech Protocols: Characterization of Motion Artifacts and Guidelines for Improving Data Analysis. Neurophotonics 2020, 7, 1–15.
  47. Artoni, F.; Delorme, A.; Makeig, S. Applying Dimension Reduction to EEG Data by Principal Component Analysis Reduces the Quality of its Subsequent Independent Component Decomposition. Neuroimage 2018, 175, 176–187.
  48. Tong, S.; Thakor, N.V. Quantitative EEG Analysis Methods and Clinical Applications; Artech House: Boston, MA, USA, 2009.
  49. Ieva, F.; Paganoni, A.M.; Tarabelloni, N. Covariance-based Clustering in Multivariate and Functional Data Analysis. J. Mach. Learn. Res. 2016, 17, 1–21.
  50. Zhang, X.; Wang, J.L. From Sparse to Dense Functional Data and Beyond. Ann. Stat. 2016, 44, 2281–2312.
  51. Xiao, L. Asymptotic Theory of Penalized Splines. Electron. J. Stat. 2019, 13, 747–794.
Figure 1. The estimated log-bcv(λ) function for the first components of each EEG dataset containing different classes of artifacts.
Figure 2. Artifactual eigenfunctions selected from each trial. The unpenalized FICA (grey) and P-spline smoothed FICA (black dashed) decompositions are compared. The scalp maps represent the scores, depicted in the spatial electrode domain, obtained by projecting the smooth eigenfunctions onto the original sample.
Figure 3. (a) Topographic maps representing the leading functional principal component of the averaged trials before and after performing the P-spline smoothed FICA. (b) Grand-average across trials of a prefrontal channel where the artifactual activity is revealed. A descriptive scheme of the movement is provided at the bottom of the plots. (c) Box plots of the number of components, the selected penalty parameter and the cumulative variance of the model.
Table 1. Summary of parameters and cumulative variance of the FICA model.

Trial      j0   q   λ        log-bcv(λ)   var (%), selected λ   var (%), λ = 0
Nodding    6    5   10^8     10.66        99.40                 94.43
Arm mov.   4    2   4000     13.91        75.85                 62.42
Blinks     4    3   400.0    13.76        97.50                 93.56
Chewing    5    4   0.300    13.01        68.23                 68.03
