iCanClean Removes Motion, Muscle, Eye, and Line-Noise Artifacts from Phantom EEG

The goal of this study was to test a novel approach (iCanClean) for removing non-brain sources from scalp EEG data recorded in mobile conditions. We created an electrically conductive phantom head with 10 brain sources, 10 contaminating sources, scalp, and hair. We tested the ability of iCanClean to remove artifacts while preserving brain activity under six conditions: Brain, Brain + Eyes, Brain + Neck Muscles, Brain + Facial Muscles, Brain + Walking Motion, and Brain + All Artifacts. We compared iCanClean to three other methods: Artifact Subspace Reconstruction (ASR), Auto-CCA, and Adaptive Filtering. Before and after cleaning, we calculated a Data Quality Score (0–100%) based on the average correlation between brain sources and EEG channels. iCanClean consistently outperformed the other three methods, regardless of the type or number of artifacts present. The most striking result was for the condition with all artifacts present simultaneously. Starting from a Data Quality Score of 15.7% before cleaning, the Brain + All Artifacts condition improved to 55.9% after iCanClean, but only to 27.6%, 27.2%, and 32.9% after ASR, Auto-CCA, and Adaptive Filtering, respectively. For context, the Brain condition scored 57.2% without cleaning (a reasonable target). We conclude that iCanClean can remove multiple artifact sources in real time and could facilitate mobile human brain-imaging studies with EEG.


1. Introduction
Noise affects the fidelity of data recordings in many experiments. Algorithms for removing noise could lead to better scientific conclusions from experimental studies. There are many options for removing noise from data recordings, such as frequency-based filtering [1], adaptive filtering [2], wavelets [3], [4], independent component analysis [3], [5], and other blind source separation techniques [3], [4]. However, no single algorithm is optimal for all scenarios. The efficacy of these approaches varies with the makeup of the recordings (e.g., transient versus static noise, or large- versus small-amplitude noise). Here we describe a novel algorithm, termed iCanClean, that may have applications in high-density electroencephalography and other biomedical recording modalities.

The iCanClean Algorithm
The iCanClean algorithm consists of four steps. First, canonical correlation analysis is used to identify candidate noise components that project onto both the corrupted data recordings and the reference noise recordings. Second, a subset of these noise components is selected for removal. Third, the projection from the bad components onto the data channels is calculated. Fourth, the projected noise components are directly subtracted from the data channels. This process can be applied both to a large window of data and to a smaller sliding window, to deal with static and/or transient noise.
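The four steps can be sketched end to end in NumPy. This is an illustrative translation, not the authors' implementation. Canonical correlation analysis is computed with QR and SVD as a stand-in for MATLAB's built-in routine. The threshold value and the synthetic "brain" and noise signals are hypothetical. The sliding-window wrapper assumes non-overlapping windows, which the text does not specify. Each window must also contain more samples than channels.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlation analysis via QR + SVD (a stand-in for MATLAB's canoncorr).

    Assumes mean-centered X and Y have full column rank (more samples than channels).
    Returns A, B, R, U, V with U = Xmc @ A, V = Ymc @ B, and R sorted largest first.
    """
    T = X.shape[0]
    Xmc = X - X.mean(axis=0)
    Ymc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xmc)                      # orthonormal basis of the data subspace
    Qy, Ry = np.linalg.qr(Ymc)                      # orthonormal basis of the reference subspace
    L, s, Mt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, L) * np.sqrt(T - 1)      # scaled so U has unit-variance columns
    B = np.linalg.solve(Ry, Mt.T) * np.sqrt(T - 1)
    R = np.clip(s, 0.0, 1.0)                        # canonical correlations
    return A, B, R, Xmc @ A, Ymc @ B


def icanclean_window(X, Y, thresh):
    """Clean one window: X is T x N_Data, Y is T x N_Noise."""
    _, _, R, U, _ = cca(X, Y)                        # Step 1: candidate noise components
    bad = np.flatnonzero(R ** 2 >= thresh)           # Step 2: R^2 at or above the threshold
    if bad.size == 0:
        return X.copy()
    bad_activity = U[:, bad]                         # use U (from the data channels) to clean X
    Xmc = X - X.mean(axis=0)
    P, *_ = np.linalg.lstsq(bad_activity, Xmc, rcond=None)  # Step 3: least-squares projection
    return X - bad_activity @ P                      # Step 4: subtract the projected noise


def icanclean_sliding(X, Y, thresh, win):
    """Hypothetical sliding-window wrapper: cleans non-overlapping windows of `win` samples."""
    out = np.empty_like(X, dtype=float)
    for start in range(0, X.shape[0], win):
        sl = slice(start, start + win)
        out[sl] = icanclean_window(X[sl], Y[sl], thresh)
    return out


def variance_explained(Z, S):
    """Fraction of the centered variance of Z captured by a linear fit on sources S."""
    Zc = Z - Z.mean(axis=0)
    Sc = S - S.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Sc, Zc, rcond=None)
    return 1.0 - ((Zc - Sc @ coef) ** 2).sum() / (Zc ** 2).sum()


# Synthetic demo: 4 "brain" sources seen only by X, 2 noise sources seen by both X and Y.
rng = np.random.default_rng(0)
T = 5000
brain = rng.standard_normal((T, 4))
noise = rng.standard_normal((T, 2))
X = (brain @ rng.standard_normal((4, 8)) + 3.0 * noise @ rng.standard_normal((2, 8))
     + 0.01 * rng.standard_normal((T, 8)))
Y = noise @ rng.standard_normal((2, 4)) + 0.01 * rng.standard_normal((T, 4))

X_clean = icanclean_window(X, Y, thresh=0.9)
X_sliding = icanclean_sliding(X, Y, thresh=0.9, win=1000)
print(variance_explained(X, noise), variance_explained(X_clean, noise))
```

The single-window version fixes one set of rejected components for the whole recording. The windowed version recomputes Steps 1 to 4 per window, so the rejected components can change over time.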
Additional mathematical detail is provided below, with key equations written in MATLAB format using built-in MATLAB functions. Variables are bolded and functions are italicized. Table 1 lists the variables along with their descriptions and dimensions.

Definitions
Let X be the corrupted data recordings the user wishes to clean, with dimensions T × N_Data, where T is the number of time points (samples) and N_Data is the number of data channels. Similarly, let Y be the reference noise recordings, with dimensions T × N_Noise, where N_Noise is the number of reference noise channels.
Step 1. Given corrupted data to clean (X) and reference noise recordings (Y), canonical correlation analysis is used to identify latent sources of noise (i.e., candidate noise components) common to both X and Y:

[A, B, R, U, V] = canoncorr(X, Y);

A is an unmixing matrix that converts the corrupted data recordings to candidate noise components as U = X_MC*A, where X_MC is the mean-centered data. Similarly, B is an unmixing matrix that converts the reference noise recordings to a second set of candidate noise components as V = Y_MC*B, where Y_MC is the mean-centered reference noise. Finally, R is a vector containing the correlation between each (U_i, V_i) pair, where U_i and V_i are the i-th columns of U and V, respectively. The number of candidate noise components identified, N_Comp, depends on the rank of the data. Specifically, N_Comp = min(rank(X), rank(Y)). Therefore, N_Comp ≤ min(N_Data, N_Noise).
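Step 1 can be sketched in NumPy to check the relationships stated above: U = X_MC*A, R holds the correlation of each (U_i, V_i) pair, and N_Comp = min(rank(X), rank(Y)). This is an illustrative translation, not the authors' code, and the function names are hypothetical. Rank is estimated on the mean-centered data using the tolerance rule that np.linalg.matrix_rank applies by default. MATLAB's rank may differ slightly. A and B are scaled so that each column of U and V has unit variance, which is our understanding of the canoncorr convention.

```python
import numpy as np


def orthonormal_basis(Zmc):
    """Rank-revealing SVD: returns Q (orthonormal basis of the column space of Zmc)
    and W such that Zmc @ W equals Q up to rounding."""
    Uz, sz, Vzt = np.linalg.svd(Zmc, full_matrices=False)
    tol = max(Zmc.shape) * np.finfo(float).eps * sz[0]   # np.linalg.matrix_rank's default rule
    r = int((sz > tol).sum())                            # numerical rank
    return Uz[:, :r], Vzt[:r].T / sz[:r]


def canoncorr_sketch(X, Y):
    """Returns A, B, R, U, V in the spirit of [A, B, R, U, V] = canoncorr(X, Y)."""
    T = X.shape[0]
    Xmc = X - X.mean(axis=0)                 # X_MC
    Ymc = Y - Y.mean(axis=0)                 # Y_MC
    Qx, Wx = orthonormal_basis(Xmc)
    Qy, Wy = orthonormal_basis(Ymc)
    # One singular value per component: N_Comp = min(rank(X_MC), rank(Y_MC))
    L, s, Mt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = Wx @ L * np.sqrt(T - 1)              # unmixing matrix for the data channels
    B = Wy @ Mt.T * np.sqrt(T - 1)           # unmixing matrix for the noise channels
    R = np.clip(s, 0.0, 1.0)
    return A, B, R, Xmc @ A, Ymc @ B


# Hypothetical example: X has 8 channels but rank 5; Y has 6 channels but rank 3.
rng = np.random.default_rng(0)
T = 2000
shared = rng.standard_normal((T, 2))         # sources present in both X and Y
X = np.hstack([rng.standard_normal((T, 3)), shared]) @ rng.standard_normal((5, 8))
Y = np.hstack([shared, rng.standard_normal((T, 1))]) @ rng.standard_normal((3, 6))
A, B, R, U, V = canoncorr_sketch(X, Y)
```

The SVD-based basis tolerates rank-deficient recordings. This is why A has only N_Comp = 3 columns even though X has 8 channels.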
With the iCanClean approach, we use the corrupted data recordings as one set of inputs to canonical correlation analysis and the reference noise recordings as the second set. Canonical correlation analysis finds the subspaces of these two datasets that are maximally correlated with each other.
Because the corrupted data recordings and reference noise recordings both contain noise, canonical correlation analysis will identify hidden noise components that are common to both datasets.
Canonical correlation analysis returns candidate components in ranked order. Thus, the noise component pair with the strongest relationship (i.e., the largest R²) appears first (U_1, V_1). The next pair (U_2, V_2) has the second-largest R² and is uncorrelated with the first pair, and so forth. The noise components identified by canonical correlation analysis do not depend on how strongly the noise sources project onto the data channels or noise channels, so both large and small noise sources are identified. In a subsequent step, the candidate noise components are scaled appropriately to match the channels being cleaned.
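A small synthetic check illustrates two of these properties. First, components come back in ranked order. Second, a weak noise source is found as readily as a strong one, because canonical correlations do not depend on amplitude. The sketch computes canonical correlations as cosines of the principal angles between column spaces, a standard equivalent formulation. It assumes full-column-rank inputs. The sources, amplitudes, and mixing matrices are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations as cosines of the principal angles between the column
    spaces of the mean-centered X and Y (assumes full column rank)."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)   # returned largest first
    return np.clip(s, 0.0, 1.0)


rng = np.random.default_rng(1)
T = 4000
brain = rng.standard_normal((T, 3))
big_noise = 10.0 * rng.standard_normal((T, 1))       # hypothetical large artifact
small_noise = 0.1 * rng.standard_normal((T, 1))      # hypothetical small artifact
noise = np.hstack([big_noise, small_noise])

X = (np.hstack([brain, noise]) @ rng.standard_normal((5, 8))
     + 1e-3 * rng.standard_normal((T, 8)))
Y = noise @ rng.standard_normal((2, 3)) + 1e-3 * rng.standard_normal((T, 3))

R = canonical_correlations(X, Y)
# Rescaling the reference channels changes their amplitudes but not their column space.
R_rescaled = canonical_correlations(X, Y * np.array([10.0, 1.0, 0.1]))
print(R ** 2)
```

Both shared noise sources produce canonical correlations near 1, even though one is 100 times larger than the other. The third component, which has no shared source, is near 0.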
Step 2. A subset of bad components is identified. We suggest a basic thresholding technique in which all components with an R² value ≥ a user-defined threshold (Thresh) are rejected. Alternative approaches can be employed as well, for example, examining the frequency content of the candidate noise components. The user can choose U, V, or a combination of U and V as the noise components.
The best choice may vary with the specific application. As an example, we assume here that the user wishes to use mixtures (or subspaces) of the data channels to clean the data channels themselves (i.e., use U to clean X).
BadCompList = find(R.^2 >= Thresh);     % indices of components whose R^2 meets the threshold
BadCompActivity = U(:, BadCompList);    % time courses of the rejected components (T rows)

Step 3. The projection from the noise components onto the channels is calculated. We recommend using a least-squares solution (e.g., using matrix division in MATLAB). Alternatively, the projection can be calculated by applying a Moore-Penrose pseudoinverse to the A and/or B unmixing matrices, but we have found that this does not perform as well in practice.
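Steps 2 to 4 can be sketched in NumPy for the case described above, where U is used to clean X. Here np.linalg.lstsq plays the role of MATLAB matrix division, which returns a least-squares solution for an overdetermined system. The helper name, the threshold, and the synthetic components are hypothetical. The subtraction in Step 4 follows the four-step overview of the algorithm.

```python
import numpy as np


def remove_bad_components(X, U, R, thresh):
    """Hypothetical helper for Steps 2-4 of one window.

    X: T x N_Data data, U: T x N_Comp zero-mean candidate noise components,
    R: canonical correlations (length N_Comp) from Step 1.
    Returns the cleaned data and the (number of bad components) x N_Data projection.
    """
    bad_comp_list = np.flatnonzero(R ** 2 >= thresh)      # Step 2: BadCompList
    if bad_comp_list.size == 0:
        return X.copy(), np.zeros((0, X.shape[1]))
    bad_comp_activity = U[:, bad_comp_list]                # Step 2: BadCompActivity
    Xmc = X - X.mean(axis=0)
    # Step 3: least-squares projection from the bad components onto the channels
    projection, *_ = np.linalg.lstsq(bad_comp_activity, Xmc, rcond=None)
    # Step 4: subtract the projected noise from the data channels
    return X - bad_comp_activity @ projection, projection


rng = np.random.default_rng(2)
T = 1000
U = rng.standard_normal((T, 3))
U -= U.mean(axis=0)                            # candidate components are zero-mean
R = np.array([0.95, 0.90, 0.30])               # hypothetical canonical correlations
W_true = rng.standard_normal((2, 4))           # hypothetical noise-to-channel weights
clean = 5.0 + rng.standard_normal((T, 4))      # stand-in for brain activity
X = clean + U[:, :2] @ W_true                  # only the first two components contaminate X

X_clean, projection = remove_bad_components(X, U, R, thresh=0.8)
```

The candidate components are zero-mean, so the subtraction leaves each channel's mean unchanged. The centered cleaned data are also orthogonal to the rejected component time courses, as expected of a least-squares residual.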

Table 1: List of variables.
Variable | Description | Dimensions
U | Candidate noise components (calculated from data channels) | T × N_Comp
V | Candidate noise components (calculated from noise channels) | T × N_Comp