Less Is Enough: Assessment of the Random Sampling Method for the Analysis of Magnetoencephalography (MEG) Data

: Magnetoencephalography (MEG) aims at reconstructing the unknown neuroelectric activity in the brain from non-invasive measurements of the magnetic ﬁeld induced by neural sources. The solution of this ill-posed, ill-conditioned inverse problem is usually dealt with using regularization techniques that are often time-consuming, and computationally and memory storage demanding. In this paper we analyze how a slimmer procedure, random sampling, affects the estimation of the brain activity generated by both synthetic and real sources.


Introduction
Some medical imaging modalities rely on mathematical methods to fully exploit the information contained in the measured data, especially when the solution of a complex ill-posed inverse problem is involved. In particular, magnetoencephalography (MEG) [1] and electroencephalography (EEG) [2] require specific attention: These two techniques record the magnetic field and the electric potential difference induced by neural currents, respectively, by means of sensors placed outside or on the skull. Due to several factors (the distance of the sensors from the sources, the weakness of the neural currents, the changes in the sensitivity of the different tissues-brain, cerebrospinal fluid, skull, and scalp-between the source and the sensors), the signal-to-noise ratio (SNR) of these data is extremely low. Several methods have been proposed for the inversion of the M/EEG problem, i.e., the reconstruction of the brain activity from the measured magnetic/electric signals: Approaches implementing L 2 regularization [3] and its variation [4] have been extensively used and, more recently, mixed methods have also been implemented [5,6]. Beamforming methods have shown to be particularly effective [7][8][9] in the analysis of MEG data.
All the inversion methods require the computation of the forward solution, i.e., the electric potential and the magnetic field produced at sensor positions by a given neuro-electric current density supported on the source space: The source space represents the cortical surface where the neural currents are generated and it is usually modeled using magnetic resonance (MRI) images. If MRI images are not available, templates can be used. In order to accurately represent the cortex, the source space consists of thousands of points, a distant few millimeters from each other. Each point represents the position of a current dipole on the cortical surface and for each of these points the dipole amplitude is estimated by solving the M/EEG inverse problem. This operation can be time and memory-consuming, especially if one considers the new trends in MEG and EEG communities: the real time analysis [10,11] and the employment of portable devices, such as wireless EEG caps or MEG helmets based on room temperature magnetometers [12][13][14][15]. So far, M/EEG data are pre-processed and analyzed off-line, but currently there is a lot of interest in the development of tools for real time processing and data analysis, both within and outside laboratories [16][17][18]. For these reasons, there is a need for fast, slim methods for the solution of the inverse M/EEG problem that can also be performed on tablets or single-board computers.
The RAndoM Sampling mEThod (RAMSET) [19,20] meets these demands. In fact, RAMSET uses a sampling procedure to significantly reduce the dimension of the source space thus reducing both memory usage and computation time. A first study on the impact of this source space reduction on the estimate of the brain activity generated by a single neural source was presented in [20] using synthetic MEG data. In this paper we test RAMSET on both synthetic data and real MEG measurements and we investigate how the source space reduction performed by RAMSET affects the localization accuracy of well established inverse algorithms when the forward problem is solved by three different methods.
The paper is organized as follows: in Section 2 we sketch the forward and inverse problems and the methods we used for their solution along with the random sampling method. The synthetic and real data we used in the tests are also described. In Section 3 we compare the results we obtain when applying RAMSET to these forward and inverse methods in the case of synthetic and real MEG data; the discussion of the results is given in Section 4. Finally, our conclusions are offered in Section 5.

The Forward Problem and the Lead-Field Matrix
Brain signals originate from the electrical activities of the pyramidal cells in the cortical mantle. Since the brain is a conductor, these small electrical sources induce both an electric potential difference on the scalp and a (weak) magnetic field outside the head that can be measured by EEG and MEG devices, respectively. The forward problem, required in the computation of the inverse solution, consists in the calculation of the electric potential/magnetic field produced by a known current source distribution at sensors' positions.
Due to the low frequency of the currents involved in neural activity, the magnetic field at sensors' positions q i , i = 1, · · · , N sens , can be computed by discretizing the Biot-Savart law: where V 0 is the brain volume, having permeability µ 0 , and J( r ) is the neuro-electric current density inside V 0 . The total current density J( r ) can be written as: where J p ( r ) is the primary current produced by the neural sources, i.e., the pyramidal neurons, σ( r ) is the macroscopic conductivity of the head tissues and V( r ) is the electric potential obeying the Poisson's equation: with proper boundary conditions [21].
To discretize (1)-(3), an accurate representation of the cortex is required for several reasons: It provides a good support for the solution of the inverse problem and it allows us to pre-compute the forward solution only for the points of the cortex. The cortical mantle surface can be extracted from MRI images using segmentation techniques [22][23][24] and discretized in a regular triangulation whose nodes are the support of the solution, i.e., the source space. The points of the source space are used to discretize the forward problem and to compute the forward solution.
Boundary element methods (BEMs) [25] and finite element methods (FEMs) [26] have been proposed for the solution of the M/EEG forward problem. FEMs are recommended for EEG because of the sensitivity of the signal to the change of electrical conductivity in different tissues (brain, cerebrospinal fluid, skull, and scalp). However, due to the computational demands of FEMs, currently BEMs are the most used approach for MEG, assuming the head is constituted of different regions of homogeneous conductivity whose value is known. This method provides accurate forward solutions with a lower computational cost. A more simplified model is the multishell spherical model (SP) in which the head is modeled as a set of three or four concentric spherical layers with different conductivities representing the different tissues inside the head. In this case, the solution of the forward problem can be approximated by an analytical expression [25]. Furthermore, in order to keep the computational cost low, we also consider a less sophisticated model based on a discretization of the Biot-Savart operator (1) with the assumption that σ( r )∇V( r ) in (2) is negligible so that J( r ) ≈ J p ( r ) [19].
Whatever model chosen for the head, the MEG forward solution can be represented as: where J = [ j 1 , . . . , j N p ] is the current distribution obtained by discretizing the primary current J p as a sum of N p current dipoles j k , , k = 1, · · · , N p , located on the N p grid points of the source space. The vector B = [b 1 , . . . , b N sens ] contains the magnetic field b i , i = 1, . . . , N sens , generated at sensor position i by J. The N sens × 3N p matrix L = [ L ik , i = 1, . . . , N sens , k = 1, . . . , N P ] is the lead-field matrix: Each entry L ik represents the magnetic field generated at sensor position i by a unitary dipole located at the k-th grid point. Its expression depends on the forward model we used.

The MEG Inverse Problem
The MEG inverse problem consists in determining the vector J that minimizes the discrepancy: where B is the MEG data vector having dimension N sens . This problem results in the solution of an underdetermined linear system that can be solved by the least squares method (LSQR) [27]. In fact, the lead-field matrix L is usually ill-conditioned so that regularization techniques have to be used to make the solution of the minimization problem feasible. The most common methods used to solve the MEG inverse problem are: the weighted minimum norm estimates (wMNE) [28], the dynamic statistical parametric map (dSPM) [4], and linearly constrained minimum variance (LCMV) beamforming [29]. Another beamforming method, Truncated Singular Value Decomposition Beamformig (TSBF), presented in [30] was proved to give good results when used for the solution of the MEG inverse problem (cf. [20]). For all the methods, the computation of the inverse solution is given by J = Q B where Q is the inverse kernel of the method. In more detail: • LSQR provides the least squares solution J that minimizes ∆(J) without adding any further constraints. The kernel is given by the pseudoinverse of the lead-field matrix L: • wMNE provides a weighted minimum norm estimates of ∆(J). The kernel is given by: where R is the spatial covariance matrix of the dipole strength vector, C is the noise covariance matrix of the data and λ > 0 is the regularization parameter. We set λ to 0.5||L T B|| ∞ . • dSPM provides noise-normalized estimates, through the application of a normalized version of the wMNE inverse operator: where C and R are as above. Each column Q dSPM i of the dSPM kernel matrix Q dSPM is given by: For LCMV beamforming each row of the kernel is given by: where L i is the i-th column of the lead-field matrix L.

•
In TSBF beamforming each row Q BF i of the beamforming kernel matrix Q BF is computed as: where τ > 0 is the regularization parameter. The matrix L is regularized by means of a truncated singular value decomposition, employing 80 singular values, and τ is set to 10 −8 (cf. [20]).
To solve the inverse problem we used: a Matlab [31] routine for LSQR; the Minimun Norm Estimate (MNE) routines for wMNE, dSPM, and LCMV; and a homemade implemented Matlab routine for TSBF. For all the methods, the lead-field matrix was preconditioned by dividing each column for its norm.

The Random Sampling Method
The RAndoM Sampling mEThod (RAMSET) allows the computation of the solution of the MEG inverse problem using just a small set of points in the source space. The key idea consists in modeling the neuro-electric current distribution J appearing in the forward problem (4) as a linear combination of few current dipoles. The number M of the current dipoles can be chosen in the order of the number of measurements N sens so that the resulting lead-field matrix has 3M 3N p columns and the memory storage is considerably reduced. Moreover, the conditioning of the inverse problem is improved and it can be solved even without adding further constraints [19].
RAMSET randomly selects M current dipoles by choosing among the N P entries of the vector J a set K of M indexes sampled by a uniform distribution, i.e.
This random drawing of source positions from the source space leads to approximate the forward solution as B ≈ L J, where L = [ L ik , i = 1, . . . , N sens , k ∈ K] is a N sens × 3M matrix approximating the lead-field matrix L.
In this paper we use the random sampling method to sample from the three lead-field matrices described in Section 2.1, i.e., the lead-field matrix computed using the boundary element method (LF-BEM) [25], the lead-field matrix computed using the spherical model (LF-SP) [25], and the lead-field matrix computed using the discretized Biot-Savart law (LF-BS) [19]. We compute these matrices using the boundary element method and the spherical model available in MNE software (http://martinos. org/mne/stable/index.html) [32], while we discretize the Biot-Savart operator (1) by a homemade implemented Matlab routine.

Tests on Synthetic Data
In order to assess the robustness of RAMSET with respect to the neural source position, we created N d synthetic datasets, using the anatomical information contained in the sample dataset within the MNE software [32]. A dipole position was randomly selected for N d = 100 times from a source space containing 18,841 points (3.1 mm source spacing) and a synthetic waveform was associated to it in order to create synthetic MEG evoked fields. Noise coming from the pre-stimulus of a real measurement was added: The SNR at the peak is 17.25 ± 2.57 dB. As a forward model for the data generation we employed the boundary element method.
In order to avoid "inverse crime" we employed a different source space for the solution of the inverse problem. We analyzed the data using a source space containing 7498 points (4.9 mm source spacing), not coincident with the points used to generate the data. Due to this, we have to deal with some un-eliminable distance error (1 mm) between the "real" source and the reconstructed one. We consider here the three models for the forward solution and the five kernels for the inverse solution described in the above sections: The lead-field matrices for LF-BEM, LF-SP, and LF-BS were sampled by RAMSET and the solution of the minimization step was performed by means of LSQR, wMNE, dSPM, LCMV, and TSBF methods. We computed the estimates using M = 500, M = 1000, M = 2000, and M = 4000 random points in N runs = 10 runs, selecting in each run a different sample of current dipoles in the source space. For comparison we evaluated also the solution obtained when using the full source space (FSS).
The localization accuracy was estimated by computing (for each run) the distance localization error (DLE) [33] defined as the distance between the estimated source with maximum amplitude and the real source position. Then, the DLE is averaged over all the runs in order to get its mean and standard deviation.

Tests on Real Data
We considered a dataset within the MNE software [32]: The dataset consists of MEG recordings during auditory and visual stimuli. As reported in the MNE manual, the MEG data were acquired with the Neuromag Vectorview system at Athinoula A. Martinos Center Biomedical Imaging, MGH/HMS/MIT (Massachusetts General Hospital, Harvard Medical School, Massachusetts Institute of Technology). According to the processing suggested by the MNE manual, the data were filtered between 0-40 Hz and a MEG sensor was excluded from the analysis. The trials of each condition were properly averaged in order to get four evoked fields (Left Auditory, Right Auditory, Left Visual, and Right Visual). We used the routine available within MNE for the estimation of SNR in real data. The estimated SNR values, around 90 ms after the stimulus, are around 10 for the left auditory, right auditory, right visual data, and around 5 for the left visual data. MRI images, acquired with a Siemens 1.5 T Sonata scanner using an magnetizationrPrepared -rapid gradient echo MPRAGE sequence, were processed using Freesurfer (http://surfer.nmr.mgh.harvard.edu/) [22,23] and MNE in order to extract the source space. A source space containing 18,841 points (3.1 mm source spacing) representing the cortical mantle was created as well as the three lead-field matrices described in Section 2.1.
As in the synthetic case, for all the three forward models and the five inversion kernels we compute the estimates with M = 500, M = 1000, M = 2000, and M = 4000 random points in N runs = 10 runs. We compute also the results obtained when using the full source space.
In this case, at each run we computed the DLE as the distance between the the estimated source with maximum amplitude and the dipole estimated by the dipole fitting method [1] implemented in MNE at the time sample with peak of activity. Finally, we get the average and standard deviation values of the DLE across all the runs.

Synthetic Data
The results are summarized in Figure 1 where boxplots of DLE for sources at different depth levels are shown: D > 5 stays for the sources at least 5 cm from the origin of the head coordinate system, 3 ≤ D ≤ 5 for the sources between 3 and 5 cm from the origin, and D < 3 for the sources at less than 3 cm from the origin.
The boxplots show that the random sampling method is able to give a good localization of the dipole source. In fact, the DLE does not increase significantly when using just few points of the source space or the full source space (cf. the blue and yellow boxplots in Figure 1). To be more specific: • Case M = 500 (Blue bloxplots): The DLE is in the order of 2 cm or less in the case of superficial sources, i.e., distance from the head center greater than 5 cm, for all the forward and inverse methods. Higher errors are produced when using LF-SP, especially when coupled with dSPM. In the case of deep sources, i.e., when the distance from the head center is less than 5 cm, the DLE increases for all the methods except dSPM especially when coupled with LF-BEM. In this case the error remains below 2 cm. We notice that both LSQR and wMNE produce a very high DLE, i.e., greater than 3 cm, when coupled with any of the forward models, while TSBF produces good results, i.e., DLE less than 1 cm, when the distance of the sources from the center is geater than 3 cm especially when coupled with LF-BEM. In conclusion, when using just M = 500 points in the source space the boundary element method coupled with dSPM or TSBF gives acceptable small DLE values. To summarize, for superficial sources the random sampling method gives a rather small DLE when using TSBF as inversion method. Better results are obtained when it is coupled with the boundary element method to solve the forward problem, even if the spherical model and the discretized Biot-Savart operator also give good results. In the case of deep sources, dSPM coupled with the boundary element method is the most accurate method even when just few hundreds of points are selected in the source space.

Real Data Analysis
The results regarding DLE are shown in Figures 2-5 for Left Auditory, Right Auditory, Left Visual, and Right Visual data, and full source space, respectively. The plots show that also in this case the random sampling method gives results comparable with methods that use the full source space: • Left Auditory Case (Figure 2): The random sampling method with M ≤ 4000 gives lower errors when using LCMV or TSBF as inversion methods. In particular, for M = 500 the DLE is about 1 cm or less when they are coupled with LF-BS. As for the other inversion methods, LSQR produces a slightly higher DLE, i.e., below 2 cm, while wMNE and dSPM produce a DLE greater than 2 cm. As expected, the DLE decreases when M increases except for wMNE, which is the less accurate inversion method. Nevertheless, the decrease is very small for M ≥ 2000. These results are confirmed by Figures 6-15 where the intensity of the estimated sources at time 90 ms is displayed. The shown intensity is the average intensity, obtained by averaging the solutions on the 10 runs and then normalizing between 0 and 1. For visualization purposes we consider as support the full source space and divide the intensity into five intervals: For values in the [0, 0.3] interval the color associated to the source space point is gray, in (0.3, 0.5] the color is blue, in [0.5, 0.7), the color is sky blue, in [0.7, 0.9) the color is cyan and in [0.9, 1] the color is yellow. The red circle represents the dipole fitted by the dipole fitting method [1] implemented in MNE. Figures 6-10 refer to the left auditory evoked field for the five inverse methods we consider. The first row of each figure refers to the case M = 500, the second to the case M = 2000, and the third one to the full source space case. Figures 11-15 refer to the right visual evoked field for the five inverse methods we consider. Again, the first row of each figure refers to the case M = 500, the second to the case M = 2000, and the third one to the full source space case.

Discussion
To summarize, the random sampling method highly reduces the computational cost at the price of a lower accuracy, especially in the case of deep sources. Nevertheless, the results show that the accuracy of the random sampling method is comparable with the accuracy obtained when using the full source space in the case of superficial sources. In this case a few thousands of points sampled from the source space are sufficient to have a DLE less than 1 cm when using LCMV with the boundary element method or TSBF with all the three forward models. The DLE increases as the depth of the source increases. For deep sources better accuracy is given when using dSPM as the inversion method and LF-BEM as a forward model.
We point out that although the average of 10 runs is poor from a statistical viewpoint, it is a good compromise between accuracy and computational cost. In [20] the authors investigated how the increasing of the number of runs up to 50 affects the localization accuracy. They found that accuracy does not improve significantly as the number of runs increases.
Before testing the random sampling method on real-time applications there are several issues to be further analyzed. First of all, it should be investigated if the choice of the distribution from which to sample the source space can affect the results. Here, we used the uniform distribution but other types of distributions could improve the accuracy of the localization [34]. Another issue to be investigated is how the accuracy depends on the number of sensors. In the tests we used a few hundreds of magnetometers but portable devices have just a few dozen sensors. Finally, we should analyze if our approach can localize sources spatially close to each other with an acceptable accuracy. We will investigate these issues in the future.
We point out that other approaches to reduce the dimension of the source space are based on its parcellisation (see, for instance, [35]). These methods require a computationally demanding pre-processing since they require an available atlas based on the MRI of the subject or on a template and to assign each point to a parcel. This pre-processing can be performed off-line on MRI images and it allows the keeping of information on the anatomical or functional brain region, where the activity is located, that could be relevant for real-time applications. On the contrary, our approach has a very low computational cost at the price of losing anatomical or functional information on the active brain region. This could be done as a post-processing step at the cost of increasing the computational time.

Conclusions
The solution of the MEG inverse problem is a tricky task that usually requires time and memory consuming methods to be solved. Here, we test a recently proposed technique, the random sampling method (RAMSET), and analyze how the source space reduction performed by RAMSET affects the localization accuracy of well-established inverse methods applied to synthetic and real MEG data. We found results that are comparable with the ones obtained by using full source space methods, while using lower computational cost and memory storage. As the numerical tests show, the employment of a few hundreds of points in the source space does not compromise the capability of detecting neural sources, as shown in synthetic MEG datasets. Moreover, we showed that the employment of a lead-field matrix computed in a slimmer way does not prejudice the reconstruction accuracy of superficial sources. These results suggest that random sampling could be an efficient tool for the analysis not only of real evoked activities but also of raw data such as in the case of real-time recordings and measurements acquired with portable devices. The use of the random sampling with this kind of data and, in particular, with EEG data, will be the subject of a forthcoming paper.
Author Contributions: C.C., A.P., F.P. conceived and designed the experiments; C.C. performed the experiments and analyzed the data; C.C., A.P., F.P. wrote the paper.