Bi-Smoothed Functional Independent Component Analysis for EEG Artifact Removal

Motivated by mapping adverse artifactual events caused by body movements in electroencephalographic (EEG) signals, we present a functional independent component analysis based on the spectral decomposition of the kurtosis operator of a smoothed principal component expansion. A discrete roughness penalty is introduced in the orthonormality constraint of the covariance eigenfunctions in order to obtain the smoothed basis for the proposed independent component model. To select the tuning parameters, a cross-validation method that incorporates shrinkage is used to enhance the performance on functional representations with large basis dimension. This method provides an estimation strategy to determine the penalty parameter and the optimal number of components. Our independent component approach is applied to real EEG data to estimate genuine brain potentials from a contaminated signal. As a result, it is possible to control high-frequency remnants of neural origin overlapping artifactual sources to optimize their removal from the signal. An R package implementing our methods is available at CRAN.


Introduction
In the field of neurophysiology, electroencephalography (EEG) represents one of the few techniques providing a direct measure of bioelectrical brain activity, as oscillations in excitability of populations of cortical pyramidal cells [48] contribute to variations in the electrical potentials over the scalp. Oscillations are characterized by dominant intrinsic rhythms conventionally grouped into frequency bands, which are by now validated as markers of several neurocognitive phenomena [12]. However, despite the temporal resolution achievable with its high sampling rate, EEG is a technique that suffers from low signal-to-noise ratio. This is mainly due to the fact that the layers of tissue dividing the electrodes from the cortex act as a natural filter attenuating genuine brain activity, resulting in a combination of cortical and artifactual sources in the EEG signal. In addition, the dominant brain-related spectral features often overlap with artifactual activity in higher frequency bands [13], and particularly at lower frequencies most of the variance in the signal is explained by physiological sources outside the brain. For these reasons, analyzing EEG signals can ultimately be viewed as solving a source-separation problem with the goal of estimating brain potentials of interest.
Blind source separation techniques such as independent component analysis (ICA) are commonly used to address artifact detection and correction of EEG signals. The term ICA encompasses a broad scope of algorithms and theoretical rudiments aligned to the assumption of independence of the latent sources in the data. From the statistical perspective, it could be regarded as a refinement of principal component analysis (PCA) that goes beyond the variance patterns of the data, introducing high-order measures such as kurtosis or negentropy to get more interpretable outcomes. This way, the data can be approximately represented in terms of a small set of independent variables, while in the PCA reduction, these variables are only assumed to be uncorrelated. An overview of statistical methodologies for ICA is provided in [30]. A comprehensive monograph of the subject can be found in [21].
The use of sampling units in form of functions that evolve on a continuum, rather than through vectors of measurements, has been popularized over the last two decades to solve a broad class of problems. Functional data analysis provides a natural generalization for a wide variety of statistical techniques that take advantage of the complete functional form of data by including relevant information related to smoothness and derivability (see [20,38,47] for a systematic review of the topic). The extension of ICA to functional data has, however, not yet received the attention nor the prolific developments of other reduction techniques in this framework, such as functional principal component analysis (FPCA). A first attempt to develop an extension of the classic multivariate ICA model was investigated in [28] by exploiting the functional principal component decomposition. Functional ICA techniques were also implemented in [35], who defined the kurtosis operator of a standardized sample in an approximation to a separable infinite-dimensional Hilbert space. Under this setting, the kurtosis eigenfunctions are expected to be rougher as the space does not contain functions that are pointwise convergent. Their approach focuses on the classification properties of the kurtosis operator, whose decomposition is assumed to have a similar form to the Fisher discriminant function. More recently, [25,46] developed a functional ICA model using an estimation procedure stemmed from the finite Karhunen-Loève (K-L) expansion [10, pp. 37], which is a less rough space since its orthogonal expansion is optimal in the least-squared error sense. We extend this model setup endowing the space with a new geometrical structure given by a Sobolev inner product in order to control the roughness of the K-L functions.
The use of functional data in brain imaging analysis has gained notoriety in the last years, despite the complexity and computational cost arisen in its treatment. Data acquired from an electroencephalogram might elicit a wide variety of functional data methods, going from the estimation of smoothed sample curves to more advanced reduction and forecast techniques. See, for example, [19,29,36,41,50]. Current research is mainly focused on functional principal component approaches for modelling data free of artifactual sources. However, the efficiency of functional ICA techniques used in stages where data is contaminated by physiological artifacts remains, to the best of our knowledge, untested. In contrast, this problem has been extensively addressed in the multivariate environment; [44] compares the state-of-the-art methods for artifact removal.
In this paper, a methodology based on piecewise polynomial smoothing (B-splines) is developed to disentangle the overlap between neural activity and artifactual sources. Because of the transient nature and the complex morphology of EEG data, B-splines provide a good alternative to represent the non-sinusoidal behaviour of the neural oscillatory phenomena due to its well-behaved local smoothing. The goal is to use the proposed smoothed functional ICA to get more accurate brain estimates by subtracting artifacts free of noise. While for a strictly different kind of data, waveletbased approaches or hybrid settings combining wavelet with ICA have been demonstrated to perform well at denoising common artifacts (see, e.g., [8,11,13,27]). By contrast, and despite the obvious differences between both data kinds, our independent component estimation is based on a penalized spline (P-spline) approach [14,16] that has a lower computational cost and is mathematically simpler. P-splines have been successfully applied for dimension reduction [4] as well as for the estimation of different functional regression models [2,[5][6][7].
Nevertheless, what characterizes our method is that the decomposition is naturally regulated by the principal component eigendirections and optimized by penalized estimators. Contrarily, in using the wavelet approaches, this is decided on the basis of the frequency band features of the data or the components. For this reason, the proposed functional ICA can be conceived as a bi-smoothed estimation procedure. The end-user will finally appreciate how artifact extraction can be fine-tuned by regulating a single smoothing parameter, making it intuitive to improve the results through a visual inspection of the independent component scores.
The paper is organized as follows. We introduce our model in Section 2 and develop the smoothed FICA decomposition using basis expansion representations of functional data in Section 3. A method for selecting the tuning parameters is discussed in Section 4. To test the effectiveness of our model in recovering brain signals, Section 5 provides a simulation using real EEG data on single trial designs containing stereotyped artifacts. Section 6 shows how our smoothed FICA works in the context of event-related potentials designs. Finally, we conclude with a brief discussion in Section 7. The presented P-spline smoothed FICA is implemented in the R package pfica [45].

Preliminaries
Let y i = (y i1 , . . . , y im i ) T be a signal of i, (i = 1, . . . , n) components digitized at the sampling points t ik , (k = 1, . . . , m i ). Consider that the sample data is observed with error, so that it can be modeled as where x i is the ith functional trajectory of the signal and ε ik mutually independent measurement errors with zero means. The sample functions x 1 , . . . , x n are assumed to be realizations of independent and identically distributed copies of a random functional variable X in L 2 (T ), the separable Hilbert space of square integrable functions from T to R, endowed with the usual inner product f , g = T f (t)g(t)dt, and the induced norm f = f , f 1/2 . Thorough the text, X is assumed to have zero mean and finite fourth moments, which implies that higher order operators are well defined.
For s,t ∈ T , the sample covariance operator C x is an integral oper- where {η j , γ j } j is a positive sequence of eigenvalues in descending order and their associated orthonormal eigenfunctions. The functions x i (t) can be approximately represented by a truncated series of the K-L expansion where z i j = x i , γ j are zero mean random variables with var(z j ) = η j and cov(z j , z j ′ ) = 0 for j = j ′ . These variables are referred to as the principal components scores and are uncorrelated generalized linear combinations of the functional variable with maximum variance. Moreover, if the q term in (2) is optimally selected, the mean squared error is minimized, providing the best linear approximation to the original data [17] (pp. 21). A functional Varimax rotation has been recently introduced to improve the interpretation of the most explicative principal component scores [1].

Functional ICA of a smoothed principal component expansion
The notion of independent components of a random vector cannot be immediately extended to the case of Hilbert-valued random elements (functional data) due to the fact that a probability density function is not generally defined in this context [15]. In the sequel, we consider the definition of independence introduced in [18], which establishes that a functional random variable has independent components if the coordinates obtained after projecting on to a given orthonormal basis are independent variables. Then, the aim of functional independent component analysis (FICA) is to find a linear operator Γ, such that for a truncated orthonormal basis φ j ( j = 1, . . . , q) in L 2 (T ), the variables ΓX, φ j are mutually independent. By considering X prompted by a Gaussian process, a functional principal component analysis (FPCA) would suffice to obtain the independent components [10, pp. 40] . However, as functional data is not inherently of this kind, it is assumed that if X has a finite-dimensional representation, then it can be transformed by the operator Γ to achieve the goals of the model. This begs the question of the basis choice for X, whereupon the results markedly depend. In this paper, the sample x i is approximated by a smoothed functional PCA representation obtained by introducing an orthonormality constraint with respect to the weighted Sobolev inner product where R is an operator with the action R f (t) = d 2 f (t)/dt 2 , f ∈ dom(R) that measures the roughness of the curves, and λ is a nonnegative penalty parameter. Notice that, when λ = 0, (3) is simplified to the usual inner product, meaning that x i can be uniquely represented by the K-L basis, i.e. the eigenfunctions of C x . To estimate the smoothed principal components, Silverman [42] proposed the following variance maximization problem subject to the constraint γ, γ λ ,k λ = 0 for all k < j, where γ is a function assumed in a closed linear subspace of L 2 with square-integrable second derivatives. We emphasize that, the problem of finding γ λ , j depends on the sample size n and the selection of the penalization parameter λ . In [37], the authors established the existence of the solutions of the optimization problem (4) for any λ ≥ 0. Silverman [42] proved the consistency of the estimators as n → ∞ and λ → 0. Generalized consistency and asymptotic distributions of the estimators have been derived in [24], using expansions of the perturbed eigensystem of a sample smoothed covariance operator. The functions {γ λ , j } form a complete orthonormal system in the subspace endowed by ·, · λ , making this basis non-compatible for our independent component model in L 2 (T ). However, [34] generalized Silverman's method providing the following equivalents functional PCA. Proposition 1. Given a sample {x i } of a functional variable with trajectories in L 2 (T ), there exists a positive definite operator S 2 such that the following PCA decompositions are equivalent: Therefore, the eigenfunctions of the covariance operator C Sx = SC x S of the smoothed sample S(x i ) are given by β j = S −1 (γ λ , j ), where γ λ , j are obtained by the penalized estimation procedure set out for (4). Then, the basis β j is orthonormal with respect to the usual inner product in L 2 (T ), so that the smooth sample data S(x i ) can be approximated by its truncated K-L expansion where The functional ICA version proposed in this paper uses the elements of this expansion to estimate the independent components of the original data.
Our main assumption facts that the target functions can be found in the space spanned by the first q eigenfunctions of the operator C Sx , as it is endowed with a smooth second-order structure represented by the major modes of variation of the empirical data. Thus, in such eigensubspace, it is expected to gain some accuracy in the forthcoming results due to the attenuation of the higher oscillation modes corresponding to the small eigenvalues of C Sx . Henceforth, we denote by M q = span{β 1 , . . . , β q } the subspace spanned by the q first eigenfunctions of C Sx . Without loss of generality, M q will be assumed to preserve the inner product in L 2 (T ).
Most of the multivariate ICA methods require the standardization of the observed data with the inverse square root of the covariance matrix in order to remove any linear dependencies and normalize the variance along its dimensions. In infinite-dimensional spaces, however, covariance operators are not invertible giving rise to an ill-posed problem. As long as our signal is represented in M q , no regularization is needed and under moderate conditions, the inverse of the covariance operator can be well defined. Since standardization is a particular case of whitening (or sphering), we can generalize the procedure in the form of a whitening operator Ψ that transforms a function in M q into a standarized function on the same space. This implies that Ψ( χ q ) =χ q is a standardized functional sample whose covariance operator Cχq naturally satisfies to be the identity inside the space.
As an extension of the multivariate case, the sample kurtosis operator of the standardized data is usually defined as where denotes the kurtosis kernel function ofχ q , and h the function in M q to be transformed. In the remainder of this article, it is assumed that the kurtosis operator is positive-definite, Hermitian and equivariant (see [25]). Again, by Mercer's theorem its kernel admits the eigendecomposition where {ρ l , ψ l } q l=1 is a positive sequence of eigenvalues and related eigenfuntions. With this, we can define the independent components of χ q i as mutually independent variables with maximum kurtosis given by ζ il,χ q = χ q i , ψ l . Challenging questions arise on how the Karhunen-Loève Theorem might be applied in this context. Intuitively, we note that this procedure leads to the expansionχ q i (t) = ∑ q l=1 ζ il,χ q ψ l (t) which can be approximated in terms of r eigenfuntions ψ l of interest, e.g. those associated with the independent components with extreme kurtosis values. Under mild conditions, this problem was solved in [25,46] by choosing r = q. However, there are other possibilities, such as considering r < q or {ψ 1 , . . . , ψ q } as a basis of projection for either x, χ q orχ q , in view of the fact that it preserves the four-order structure of the standardized data.

Basis expansion estimation using a P-spline penalty
In order to estimate the independent components from noisy discrete observations in Equation (1), it will be assumed that the tajectories belong to a finite-dimensional space of L 2 (T ) spanned by a set of Bspline basis functions {φ 1 (t), . . . , φ p (t)}. Then, each sample curve can be expanded as or, in matrix form, The basis coefficients for each sample curve can be found by least squares approximation minimizing the mean squared error For general guidance on both definition knots and order of B-splines, we refer the reader to [38]. Although in this paper a non-penalized least squares approximation is assumed, [3] give a detailed account of how to estimate the basis coefficients using different roughness penalty approaches (continuous and discrete) in terms of B-splines. The next step consists of smoothing the sample curves in terms of the smoothed principal components and associated weight functions β j in (5). To do so, we next derive the P-spline FPCA approach developed in [4] that incorporates a discrete penalty based on dorder differences of adjacent B-spline coefficients (P-spline penalty) in the orthonormality constraint. Let us consider the B-spline basis expansion of the covariance eigenfunctions

being its vector of basis coefficients, and a discrete P-spline roughness penalty function defined by PEN
Throughout the paper, we assume two order differences defining the penalty This way, the inner product in (3) is given in terms of B-splines expansions as p). Then, the maximization problem in (4) is equivalent to solve the following matrix problem: subject to the constraint b T (G + λ P 2 ) b λ ,k = 0 for all k < j, where Σ A = n −1 A T A and λ ≥ 0 is the penalty parameter used to control the trade-off between maximizing the sample variance and the strength of the penalty. Because B-spline basis are non-orthonormal with respect to the usual L 2 geometry, we can apply Cholesky factorization of the form LL T = G + λ P 2 in order to find a non-singular matrix that allows us to operate in terms of the B-spline geometrical structure induced into R q . Then, finding the weight coefficients corresponds to solve the eigenvalue problem where v j = L T b λ , j and the coefficients of γ λ , j are b λ , j = (L −1 ) T v j . Therefore, we have obtained a set of orthonormal functions with respect to the inner product ·, · λ . The jth smoothed principal component is then given by Thus, the problem is reduced to the multivariate PCA of the matrix AG (L −1 ) T in R q (see [4] for a detailed study). From the results in [33,34] we deduce in this paper the expression of the smoothing operator S that provides the equivalence between this multivariate PCA and the functional PCA of the smoothed data S(x i ) in L 2 (T ). (7) for a random sample {x i } of curves in L 2 (T ), the PCA of the matrix AG (L −1 ) T with the usual inner product in R p is equivalent to all FPCA in Proposition 1 with the operator S 2 defined as S 2 [33] proved that the PCA of matrix AD T with the usual inner product in R p is equivalent to FPCA of x i with respect to ·, · K . That is,

Proposition 2. Given the basis expansion
with v j being the eigenvectors of the matrix AD T . Then, from Proposition 1 in this paper, we have that As a consequence we obtain that R = I p and As a result, the principal components (scores) of S(x i ) are given by Z = AG (L −1 ) T V where V is the matrix whose columns are the eigenvectors v j verifying Equation (9), and thus the eigenfunctions β j are β j = S −1 (γ λ , j ).
Having estimated the weight functions coefficients and principal components scores, assume next that the smooth principal component expansion in (5) is truncated at the q-term. Then, the column vector of smoothed sample curves is given by χ q (t) = Z q β (t) , where Z q = (z i j ) ∈ R n×q is the matrix whose columns are the first q principal components scores with respect to the basis of smooth principal component weight functions β (t) = (β 1 (t), . . . , β q (t)) T .
With the above results, the functional independent components are computed from the smoothed principal component approximation of functional data. Following the ICA pre-processing steps, we first standardize the approximated curves defining the whitening operator as Ψ{ χ q (t)} =χ being the matrix of standardized principal components and Σ n{(Z q ) T Z q } −1/2 , the inverse square root of the covariance matrix of Z q . The described whitening transformation is essentially an orthogonalization of the probabilistic part of χ q , so the matrix Z q ∈ R n×q naturally satisfy ΣZq = I q , and the associated covariance operator Cxq is unitary.
Then, the kurtosis operator (6) of the standardized curvesχ q (t) is given in matrix form by where DZq = diag( Z q Z q T ). The eigenanalysis of this kurtosis operator leads to the diagonalization of the kurtosis matrix of the standardized principal components Z q , where Σ 4, Z q ∈ R q×q is defined as withz q i being the column vector q × 1 with the ith row of the matrix Z q . The eigenproblem (10) is not restricting to assume that Σ 4, Z q is uniquely determined. In fact, other kurtosis matrices can be considered (see, e.g., [23,26]). This way, the P-spline smoothed functional ICA of x in L 2 (T ) is obtained from the multivariate ICA of Z q in R q . The resulting weight functions are now ψ l (t) = β (t) T u l (l = 1, . . . , q), where the coefficients vectors u l are the eigenvectors of the predefined kurtosis matrix. Then, the independent components can be calculated as ζ l,χ q =Z q u l . Finally, the operator Γ defining the FICA model is with z q i being the column vector q × 1 with the ith row of Z q and U ∈ R q×q the matrix of eigenvectors of the kurtosis matrix Σ 4, Z q .

Parameter tuning
The problem concerning the estimation of the smoothed independent component curves lies in finding an optimal truncation point q, as well as a suitable penalty parameter. As q approaches p, more of the higher oscillation modes of the standardized sample are induced in the estimation. Otherwise, we are denoising the data from its second and fourth-order structure simultaneously. From this perspective, it is desirable to increase the value of q such that the latent functions of the whitened space can be captured by the kurtosis operator. Observe that this kind of regularization is not exactly the same as the one providing the P-spline penalization of the roughness of the weight functions. Attenuating the higher frequency components of the FPCA model does not necessarily affect an entire frequency bandwidth of the data. Thus, if the original curves are observed with independent error, and the error is persistent in the functional approximation, it may overlap the estimation of the kurtosis eigenfunctions. In this context, smoothing would be appropriate. Once the value of q is decided, we should examine those components with extreme kurtosis, contrary to the FPCA where only the components associated to large eigenvalues are considered.

Penalty parameter selection
Leave-one-out cross-validation [39] is generally used to select the penalty parameter in order to achieve a suitable degree of smoothness on the weight functions, but also to induce the truncation point q. In a more explicit and condensed form, this procedure in our model lies in finding a value of λ that minimizes where is the reconstruction of the ith curve x i in terms of the q first smoothed principal components by leaving out it in the estimation process. We found, however, that cross-validation was not sensitive for a reasonably large basis dimension, forcing us to reformulate the strategy. To address this problem, the penalty parameter might be subjectively chosen although this can lead to the bias and poor extraction of the artifactual sources. Hence, for the results presented in this paper, we propose a novel adaptive approach which consists in replacing (11) by where χ q;λ (−i) i is a smoothed representation of x i for some λ and ℓ > 0 a value that increases the penalty in the second term of the norm, assume ℓ = 0.1. Then, for a fixed q, (12) is iterated for each λ in a given grid to find the one that minimizes BCV q (λ ). Among all the q considered in the estimation process, we select the truncation point that minimizes this function.
If we require a basis dimension p greater than sample size n, a shrinkage covariance estimator [40] can be considered for computing Σ A . This method guarantees positive definiteness and consequently an estimation of the higher and important eigenvalues not biased upwards. The same strategy is used for BCV q (λ ). Recall the quadratic distances in (12). These are given in terms of basis functions by where b j = (b j1 , . . . , b jp ) T is the vector of basis coefficients of the jth weight function β j in the B-spline basis φ j (t) and e i = (e i1 , . . . , e ip ) T is a vector of residuals. Next, the matrix E = (e i j ) ∈ R n×q is reconstructed via shrinkage. That is, first we compute cov S (E ) where cov S is a predefined shrinkage covariance estimator, then we apply Cholesky decomposition of the form LL T = cov S (E ). Finally, the basis coefficients of the reconstructed residual functions areê i = (L −1 ) T e i , and consequently now We call this method baseline cross-validation, as it operates across different reconstructions of x i for a given baseline penalty parameter and a fixed q. This approach is more versatile and particularly useful when the original curves are extremely rough and approximated with a larger basis dimension, thus avoiding the least squares to collapse. Moreover, for a given q, it allows to score more than one λ as a result of the various relative minima it produces. The intuition behind baseline cross-validation is that there are several smoothing levels to endow the estimator with the ability for predictive modelling. These are given at evaluating "short distances" for a smoothing baseline λ in a given χ q i , which may be seen as a way of finding a trade-off for the global roughness of a q-dimensional basis. Note that, as the value of q increases, and despite the minimization of the mean squared error, it may be more difficult to find a smoothing balance between the elements of the basis due to a complex fabric of variability modes.

Simulation study
A simulation study based on EEG data segments containing stereotyped artifacts was conducted to validate our methods for recovering brain sources. The data consists of 4 separate 64-channel recordings of a subject performing the following classes of self-paced repetitive movements: nodding, hand-tapping with a wide arm movement, eye-blinking and chewing. Recordings were performed in absence of sensory stimulation in a trial length 3 seconds sampled at 1 kHz, i.e., t ik (i = 1, . . . , 64; k = 1, . . . , 3000). More details on the preprocessing steps and experimental conditions are deferred to the online supplementary material. In reconstructing the functional form of the sample paths, we sought a less smooth fitting to mimic the brain potential fluctuations. Accordingly, a basis of cubic B-spline functions of dimension p = 230 is fitted to all signal components minimizing the mean squared error to a negligible value.
The process of identifying artifactual functions is addressed by using topographic maps that roughly represent patterns of eigenactivity related to the distribution of bioelectric energy on the scalp. These maps are elaborated from the projection of the signal components x 1 , . . . , x 64 on to the basis of independent weight functions, i.e. ζ il,x = x i , ψ l (i = 1, . . . , 64; l = 1, . . . , q), whose resulting score vectors ζ l, χ = (ζ 1l , . . . , ζ nl ) T are depicted in the spatial electrode domain. Therefore, the aim is to examine how the kurtosis eigenfunctions contribute into x i to discern possible patterns of artifactual activity. The components identified as artifacts will be considered for subtraction.
In order to simplify the burden of a manual selection, assume that all ψ 1 , . . . , ψ q obtained from the model correspond to a structure of latent artifactual eigenpatterns. Moreover, let χ q i (t) = ∑ q l=1 ζ il,x ψ l (t) be an expansion of artifactual components and related artifactual eigenfunctions. Then, the artifact subtraction in terms of basis expansions is where d i j are the cleaned (or residual) coefficients, with u l being the vector of coefficients of the independent weight function ψ l in terms of the principal eigenfunctions. Thus, given the model parameters q and λ , the procedure to estimate and remove smooth artifactual components from EEG functional data can succinctly be derived as follows: Algorithm FUNCTIONAL ARTIFACT SUBTRACTION Input: A, φ j ( j = 1, . . . , p), G , P 2 , λ , q Output: d j .
1: Calculate L −1 via Cholesky decomposition of the matrix G + λ P 2 = LL T . 2: Perform the PCA of AG (L −1 ) T . Obtain Z q and the coefficients b j of β j .

5: Calculate
6: Select the artifactual score vectors in ζ l,x . Expand the artifactual space as χ q 7: Subtract the artifactual coefficients in terms of φ j using (13) and obtain the vector of coefficients d j to reconstruct the functional brain signal.
Baseline cross-validation was performed on a given grid, selecting the value which minimizes BCV q (λ ) for q = 1, . . . , j 0 where j 0 is defined as the index entry corresponding to the first relative maximum of the first order differences of FPCA's eigenvalues ∆η j . We find that truncating at q = j 0 is a way of exploring independence in the high variability structure of the data. In analysing EEG signals, this entails major effectiveness at reducing the artifactual content to a few eigenfunctions, particularly for the low-frequency physiological activity such as blinks and movement-related artifacts. One may see this truncation rule as a measure to improve the accuracy in the estimation of certain artifacts, while preserving the modes of variability related to the rhythms of the latent brain processes. The log-distances using BCV(λ ) for each one of the datasets are shown in Figure 1. Further results are presented in Table 1.  Preliminary results comparing both penalized and non-penalized estimation show that the smoothed FICA presumably attenuates the high-frequency potentials of neural origin, revealing the latent shape of the artifact. More importantly, however, is that all topographic maps reflect well-known spatial activation of the artifactual content. A selection of eigenfunctions from each trial and their associated component scores are depicted in Figure 2. Physiological non-brain activity near the recording zone, such as blinks and large amplitude body movements, can be easily detected in controlled conditions using the proposed methodology. However, the coexistence of such artifacts may result in a non-linear distortion of them, e.g., via large changes of the impedance [31]. This could entail a more challenging situation, as algorithms based on linear mixing may not be that effective at a certain point. Nonetheless, the aforementioned artifacts enhance the role of smoothing due to their low-frequency trademark in the signal. In contrast, when artifacts are characterised by localised high-amplitude curves, as it is the case of the fourth artifactual eigenfunction (chewing), smoothing is not able to denoise effectively. We believe this happens for two reasons: first, the noise provided by the fourth-order structure of the model is essential to configure the shape of the artifact; second, the B-spline basis has a limited flexibility to smooth abrupt local contours. Hence, artifacts such as jaw clenching and chewing are quite sensible to smoothing and difficult to correct for subtraction. Interestingly, hybrid procedures combining spline interpolation and wavelet filtering have shown promising results trying to solve this problem in functional near-infrared spectroscopy research (see [32]).
It seems reasonable to conjecture that restricting q to the first FPCA terms decreases the odds of getting spurious artifactual functions, as they represent dominant modes of variability usually related to large artifacts. In such cases, the artifact subtraction with the smoothed components preserved the brain activity rhythms in the original form, while for λ = 0 it caused a reduction and a distortion of relevant potentials. However, BCV may tend to oversmooth slightly in a sense of an effective artifact removal, resulting in certain artifactual residue after subtraction. This happens due to the complexity of the mixed sources and can be solved by examining other relative minima in our results. The plots for all channels and datasets comparing the effect of subtracting artifactual components are omitted for the sake of space. Online supplementary materials provide R code for its visualization.
Although our tests have provided good results by subtracting all smoothed components, further research is needed to corroborate their physiological validity. As reported in [9], reducing the dimensionality of the data with a PCA before applying ICA is not always beneficial, although in some cases may improve the signal-to-noise ratio of the large sources and their subsequent isolation. We see that our approach paves the way for developing measures of correlation, dipolarity, stability or sparsity in the functional data domain to fine-tune artifact selection. An important issue that remains open is whether the restriction imposed for the truncation point is beneficial or not to achieve better results.

Estimating brain signals from contaminated event-related potentials
To illustrate our methods, we reproduced a typical experimental scenario where a human participant had to perform full-arm movements synchronised to a periodic auditory stimulus. An EEG recording was performed during the task. Arguably, what we provide here is a paradigmatic example wherein the researcher needs to clean the signal from motion-related artifacts while preserving activity genuinely related to perceptual and motor brain processes. The subject was instructed to tap his hand on the table synchronizing with a steady auditory stimulus in one condition while listening to the same stimulus without any movement involved in the other. Disposing of a baseline, we could directly compare the outcome of our cleaning procedure with an uncontaminated experimental situation. We recorded 100 trials of 3 seconds per condition, divided into randomized blocks of 25 trials. The stimulus period was 750 milliseconds, i.e., 4 tappings in one of the conditions. Movements were intentionally exaggerated to maximize eventual movementrelated artifacts. In this section, the same configuration for running the model (p = 230; i = 1, . . . , 64; k = 1, . . . , 3000) is preserved from the previous one.
The P-spline smoothed FICA is performed at each trial to obtain brain estimates by subtracting the artifactual components. Here, the complexity of the signal increases as it is assumed a mixture of artifacts and other brain processes due to the cognitive task. Figure 3 shows the grand-averaged results comparing both conditions before and after the artifact removal. A FPCA is performed on the averaged data to visualize the spatial distribution of the scores in the direction of the leading eigenvector before and after the removal. As expected, the activations where nearly coincident after the artifact removal and more prominent in the central region of the scalp. The upper left panel displays the EEG signal in some frontal channels where the movement-related artifact is prominently visible before the subtraction. Further evidence of such artifactual content is given in second row where the raw curves are shown in the other condition. Clearly, the pooled artifacts across the trials have here a different origin. The same panel shows the curves after subtracting the artifactual curves.
Our procedure notably reduces the movement-related artifact and renders the signal more stationary. Indeed, differences are smaller in the non-movement condition but, in either case, our algorithm is capable of reducing artifactual content while retaining the brain activity intact. From our previous tests, one may expect some artifact residue at a trial level depending on the estimated λ and the diversity of source artifacts. We stress that as the response to the repeated stimulus is assumed to be invariant and small in terms of amplitude, averaging suppresses non-phase-locked activity and reveals the potential elicited by the stimulus [43]. Consequently, the attenuation of the roughness of the artifactual component functions will lead to a better estimation of brain potentials at averaging rather than the subtraction of rough components.

Discussion
The proposed independent techniques are, to the best of our knowledge, the first to provide a functional framework for smoothed artifact extraction and removal of dense data approximated with a large number of knots. We found that using shrinkage estimators is a reasonable starting point for smoothing covariance operators with this kind of functional data (see also [22]). According to this setting, a novel cross-validation method has been proposed for selecting the model parameters. Despite being computationally expensive, our approach has proven to outperform the lack of sensitivity of other existent methods. Overall, this allows the application of independent component techniques from a smoothing perspective somewhat more flexible when compared to other modelling strategies. Although [25] established a form of Fisher consistency for the kurtosis operator decomposition, no asymptotic results of the nonsmoothed and, hence of the smoothed independent components have been derived. Therefore, one can assume that we rely on a competitive performance derived from previous FPCA asymptotic results. In our empirical setting, however, the study of such properties must be related to the functional data type and the penalized spline method used, involving considerably more technicalities. See, for example, [51] and [49]. These theoretical developments lie beyond the scope of the present work. However, we hope to pursue such study in a separate paper.
In our simulations, the kurtosis operator has proven to work well at capturing artifactual eigenfunctions with different frequency characteristics, at least under certain conditions. One of the strengths of our model is the double regularization, which allows us to circumvent the leak of brain activity and get clean movement-related artifacts. In essence, the degree of separation is defined through the space dimension, from more dependent (first q terms of the FPCA decomposition) to more independent (q → p). Thus q acts as a regularization parameter to explore the variational component of the artifactual sources in the EEG signal, while λ provides more accurate estimations, particularly in using the first q terms of the K-L expansion. Further research is needed to determine how the model parameter selection can optimize the removal of artifacts with a minimum loss of variance patterns related to brain sources. Non-linear artifact distortion will inevitably suffer from cortical entrainment of challenging correction, suggesting the exploration of other subspaces prone to kurtosis data structures in addition to the smoothed principal component eigendirections.