# Information Theoretic Approaches for Motor-Imagery BCI Systems: Review and Experimental Comparison

## Abstract

## 1. Introduction

#### EEG Measurement and Preprocessing

## 2. The Common Spatial Pattern Criterion

## 3. Divergence-Based Criteria

#### 3.1. Criterion Based on the Symmetric Kullback–Leibler Divergence

#### 3.2. Criterion Based on the Beta Divergence

#### 3.3. Criterion Based on the Alpha-Beta Log-Det Divergence

#### 3.4. Algorithms for Maximizing the Divergence-Based Criteria

#### 3.4.1. Tangent Methods

#### 3.4.2. Optimization on the Lie Algebra

- Start at the zero matrix $\mathbf{0}$.
- Move from $\mathbf{0}$ to $$\mathbf{M}_{t}=\mu \left.\nabla_{\mathbf{M}}J\right|_{\mathbf{M}=\mathbf{0}},$$ where $$\nabla_{\mathbf{M}}J=\partial J(\mathbf{R})\,\mathbf{R}^{\top}-\mathbf{R}\,\partial J(\mathbf{R})^{\top}.$$
- Define $\mathbf{Q}_{t}=\exp(\mathbf{M}_{t})$, and use it to come back into the space of the orthogonal matrices.
- Update ${\mathbf{R}}_{t+1}={\mathbf{Q}}_{t}{\mathbf{R}}_{t}$.
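The four steps above can be sketched numerically as follows. This is only a minimal illustration under stated assumptions: the criterion `J` and its Euclidean gradient `grad_J` are a hypothetical toy example (they are not one of the divergence-based criteria of this paper), the step size `mu` is fixed by hand, and the helper `skew_expm` computes the exponential of a skew-symmetric matrix through an eigendecomposition rather than through any particular routine used by the authors.

```python
import numpy as np

def skew_expm(M):
    """Exponential of a real skew-symmetric matrix M.

    1j*M is Hermitian, so it has a unitary eigendecomposition
    1j*M = V diag(lam) V^H, and exp(M) = V diag(exp(-1j*lam)) V^H.
    """
    lam, V = np.linalg.eigh(1j * M)
    return ((V * np.exp(-1j * lam)) @ V.conj().T).real

def lie_algebra_step(R, grad_J, mu):
    """One ascent step: R_{t+1} = exp(M_t) R_t."""
    G = grad_J(R)          # Euclidean gradient of J at R
    A = G @ R.T
    M = mu * (A - A.T)     # M_t = mu * (dJ(R) R^T - R dJ(R)^T), skew-symmetric
    Q = skew_expm(M)       # Q_t is orthogonal because M_t is skew-symmetric
    return Q @ R           # the product of orthogonal matrices stays orthogonal

# Hypothetical toy criterion J(R) = trace(D R S R^T), with D and S symmetric.
D = np.diag([1.0, 2.0, 3.0])
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

def J(R):
    return float(np.trace(D @ R @ S @ R.T))

def grad_J(R):
    return 2.0 * D @ R @ S   # valid because D and S are symmetric

R0 = np.eye(3)
R1 = lie_algebra_step(R0, grad_J, mu=0.05)
```

Because the update multiplies by the exponential of a skew-symmetric matrix, no re-orthogonalization of `R1` is needed; in practice the step size would be chosen by a line search rather than fixed.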

#### 3.4.3. Post-Processing

## 4. The Information Theoretic Feature Extraction Framework

## 5. Non-Information-Theoretic Variants of CSP

## 6. Experimental Results

- FBCSP (see Section 5): We have used a variation of the algorithm in [30]. The candidate frequency bands correspond to the theta (4–7 Hz), alpha (8–15 Hz), beta (16–31 Hz) and low gamma (32–40 Hz) brainwaves, and five-fold cross-validation has been used to select the best combination of these bands. We extract d features from each band, where d is selected using the method in [72].
- DivCSP (see Section 3.2 and Section 3.4): The values of $\beta $ and $\varphi $ (the regularization parameter) have been selected by five-fold cross-validation over the ranges $\beta \in [0,1]$ and $\varphi \in [0,0.5]$. This divergence includes the KL divergence as a particular case when $\beta =0$. The MATLAB code of the algorithm has been downloaded from [74] and used without any modification. Optimization has been performed using the so-called subspace method (see Section 3.4).
- Sub-LD (subspace log-det): This algorithm, which also belongs to the class of subspace methods, is based on the criterion in [42] to maximize the Alpha-Beta log-det divergence (see Section 3.3 and Section 3.4). In this paper, the implementation is based on the BFGS method on the Stiefel manifold of semi-orthogonal matrices and is initialized at the solution obtained by the CSP algorithm. The regularization parameter $\eta $ has been chosen by five-fold cross-validation in the range $(-0.2,0.2)$, i.e., among values that are not far from zero. Negative values of $\eta $ favor the expansion of the clusters, while positive values favor their contraction. For $\eta $ close to zero, the solution of this criterion should not be far from that of CSP, which improves the convergence time of the algorithm and reduces the impact of the values of $\alpha ,\beta $ on the results, so both parameters have been fixed to 0.5.
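As a point of reference for the Sub-LD setting $\alpha =\beta =0.5$, the following minimal sketch evaluates an Alpha-Beta log-det divergence between two covariance matrices. It assumes the closed form $\frac{1}{\alpha \beta }\log \det \frac{\alpha {(\mathbf{P}{\mathbf{Q}}^{-1})}^{\beta }+\beta {(\mathbf{P}{\mathbf{Q}}^{-1})}^{-\alpha }}{\alpha +\beta }$ from the AB log-det divergence literature and is restricted to $\alpha ,\beta >0$. The function name and the Cholesky-whitening route are illustrative choices; this is not the implementation used in the experiments, and it omits the regularization term and the Stiefel-manifold optimization.

```python
import numpy as np

def ab_logdet_divergence(P, Q, alpha=0.5, beta=0.5):
    """AB log-det divergence between SPD matrices P and Q (sketch, alpha, beta > 0)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("this sketch only covers alpha > 0 and beta > 0")
    # Whiten P with the Cholesky factor of Q = L L^T: the eigenvalues of
    # L^{-1} P L^{-T} coincide with those of P Q^{-1} and are real and positive.
    L = np.linalg.cholesky(Q)
    L_inv = np.linalg.inv(L)
    lam = np.linalg.eigvalsh(L_inv @ P @ L_inv.T)
    # log det of the matrix function reduces to a sum over the eigenvalues.
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(terms)) / (alpha * beta))

# Two small illustrative covariance matrices (made-up values).
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
Q = np.array([[1.0, 0.2],
              [0.2, 1.5]])

d_pq = ab_logdet_divergence(P, Q)   # alpha = beta = 0.5
d_qp = ab_logdet_divergence(Q, P)
d_pp = ab_logdet_divergence(P, P)
```

With $\alpha =\beta $ the value is symmetric in its two arguments, and it vanishes when both matrices coincide, which gives a quick sanity check on any implementation.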

#### Results on Artificially Perturbed Data

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Sanei, S.; Chambers, J.A. EEG Signal Processing; John Wiley & Sons: Hoboken, NJ, USA, 2013.
2. Sörnmo, L.; Laguna, P. Bioelectrical Signal Processing in Cardiac and Neurological Applications; Academic Press: Cambridge, MA, USA, 2005; Volume 8.
3. Devlaminck, D.; Wyns, B.; Grosse-Wentrup, M.; Otte, G.; Santens, P. Multisubject learning for common spatial patterns in motor-imagery BCI. Comput. Intell. Neurosci. **2011**, 217987.
4. Lotte, F. A tutorial on EEG signal-processing techniques for mental-state recognition in brain-computer interfaces. In Guide to Brain-Computer Music Interfacing; Springer: London, UK, 2014; pp. 133–161.
5. Samek, W.; Meinecke, F.C.; Müller, K.-R. Transferring subspaces between subjects in brain–computer interfacing. IEEE Trans. Biomed. Eng. **2013**, 60, 2289–2298.
6. Wu, W.; Gao, X.; Hong, B.; Gao, S. Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL). IEEE Trans. Biomed. Eng. **2008**, 55, 1733–1743.
7. Grosse-Wentrup, M.; Liefhold, C.; Gramann, K.; Buss, M. Beamforming in noninvasive brain-computer interfaces. IEEE Trans. Biomed. Eng. **2009**, 56, 1209–1219.
8. Gouy-Pailler, C.; Congedo, M.; Brunner, C.; Jutten, C.; Pfurtscheller, G. Nonstationary brain source separation for multiclass motor imagery. IEEE Trans. Biomed. Eng. **2010**, 57, 469–478.
9. Sun, G.; Hu, J.; Wu, G. A novel frequency band selection method for common spatial pattern in motor imagery based brain computer interface. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–6.
10. Thomas, K.P.; Guan, C.; Lau, C.T.; Vinod, A.P.; Ang, K.K. A new discriminative common spatial pattern method for motor imagery brain-computer interfaces. IEEE Trans. Biomed. Eng. **2009**, 56, 2730–2733.
11. Graimann, B.; Allison, B.; Pfurtscheller, G. Brain-computer interfaces: A gentle introduction. In Brain-Computer Interfaces; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–27.
12. Pfurtscheller, G.; Lopes Da Silva, F.H. Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clin. Neurophysiol. **1999**, 110, 1842–1857.
13. Lotte, F.; Bougrain, L.; Cichocki, A.; Clerc, M.; Congedo, M.; Rakotomamonjy, A.; Yger, F. A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces: A 10-year Update. J. Neural Eng. **2018**. (in print).
14. Schlögl, A.; Lee, F.; Bischof, H.; Pfurtscheller, G. Characterization of four-class motor imagery EEG data for the BCI-competition 2005. J. Neural Eng. **2005**, 2, L14–L22.
15. Ehrsson, H.; Geyer, S.; Naito, E. Imagery of Voluntary Movement of Fingers, Toes, and Tongue Activates Corresponding Body-Part-Specific Motor Representations. J. Neurophysiol. **2003**, 90, 3304–3316.
16. Dagaev, N.; Volkova, K.; Ossadtchi, A. Latent variable method for automatic adaptation to background states in motor imagery BCI. J. Neural Eng. **2017**.
17. Perdikis, S.; Leeb, R.; Millán, J.D. Context-aware adaptive spelling in motor imagery BCI. J. Neural Eng. **2016**, 13, 036018.
18. Ramoser, H.; Müller-Gerking, J.; Pfurtscheller, G. Optimal spatial filtering of single trial EEG during imagined hand movement. IEEE Trans. Rehabil. Eng. **2000**, 8, 441–446.
19. Brandl, S.; Müller, K.-R.; Samek, W. Robust common spatial patterns based on Bhattacharyya distance and Gamma divergence. In Proceedings of the 2015 3rd International Winter Conference on Brain-Computer Interface (BCI), Sabuk, Korea, 12–14 January 2015; pp. 1–4.
20. Lotte, F.; Guan, C. Spatially regularized common spatial patterns for EEG classification. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 3712–3715.
21. Lu, H.; Plataniotis, K.N.; Venetsanopoulos, A.N. Regularized common spatial patterns with generic learning for EEG signal classification. In Proceedings of the 2009 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 6599–6602.
22. Samek, W.; Vidaurre, C.; Müller, K.-R.; Kawanabe, M. Stationary common spatial patterns for brain-computer interfacing. J. Neural Eng. **2012**, 9, 026013.
23. Samek, W.; Kawanabe, M.; Müller, K.-R. Divergence-based framework for common spatial patterns algorithms. IEEE Rev. Biomed. Eng. **2014**, 7, 50–72.
24. Wang, H. Harmonic mean of Kullback–Leibler divergences for optimizing multiclass EEG spatio-temporal filters. Neural Process. Lett. **2012**, 36, 161–171.
25. Samek, W.; Müller, K.-R. Tackling noise, artifacts and nonstationarity in BCI with robust divergences. In Proceedings of the 2015 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 31 August–4 September 2015; pp. 2741–2745.
26. Lawhern, V.; David Hairston, W.; McDowell, K.; Westerfield, M.; Robbins, K. Detection and classification of subject-generated artifacts in EEG signals using autoregressive models. J. Neurosci. Methods **2012**, 208, 181–189.
27. Delorme, A.; Sejnowski, T.; Makeig, S. Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. Neuroimage **2007**, 34, 1443–1449.
28. Uusitalo, M.; Ilmoniemi, R.J. Signal-space projection method for separating MEG or EEG into components. Med. Biol. Eng. Comput. **1997**, 35, 135–140.
29. Urigüen, J.A.; García-Zapirain, B. EEG artifact removal-state-of-the-art and guidelines. J. Neural Eng. **2015**, 12, 031001.
30. Ang, K.K.; Chin, Z.Y.; Zhang, H.; Guan, C. Filter bank common spatial pattern (FBCSP) in brain-computer interface. In Proceedings of the IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 2390–2397.
31. Dornhege, G.; Blankertz, B.; Krauledat, M.; Losch, F.; Curio, G.; Müller, K.-R. Combined optimization of spatial and temporal filters for improving brain-computer interfacing. IEEE Trans. Biomed. Eng. **2006**, 53, 2274–2281.
32. Kang, H.; Nam, Y.; Choi, S. Composite common spatial pattern for subject-to-subject transfer. IEEE Signal Process. Lett. **2009**, 16, 683–686.
33. Ang, K.; Chin, Z.Y.; Zhang, H.; Guan, C. Mutual information-based selection of optimal spatial-temporal patterns for single-trial EEG-based BCIs. Pattern Recognit. **2012**, 45, 2137–2144.
34. Koles, Z.; Lind, J.; Flor-Henry, P. Spatial patterns in the background EEG underlying mental disease in man. Electroencephalogr. Clin. Neurophysiol. **1994**, 91, 319–328.
35. Wu, W.; Chen, Z.; Gao, S.; Brown, E. A probabilistic framework for robust common spatial patterns. In Proceedings of the Annual International Conference of the Engineering in Medicine and Biology Society (EMBC), Minneapolis, MN, USA, 3–6 September 2009; pp. 4658–4661.
36. Kang, H.; Choi, S. Probabilistic models for common spatial patterns: Parameter extended EM and variational bayes. In Proceedings of the XXVI AAAI Conference on Artificial Intelligence, Toronto, ON, Canada, 22–26 July 2012; pp. 970–976.
37. Kawanabe, M.; Vidaurre, C. Improving BCI performance by modified common spatial patterns with robustly averaged covariance matrices. In Proceedings of the World Congress on Medical Physics and Biomedical Engineering; Springer: Munich, Germany, 7–12 September 2009; pp. 279–282.
38. Yong, X.; Ward, R.K.; Birch, G.E. Robust common spatial patterns for EEG signal preprocessing. In Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Vancouver, BC, Canada, 20–25 August 2008; pp. 2087–2090.
39. Samek, W.; Kawanabe, M.; Vidaurre, C. Group-wise stationary subspace analysis—A novel method for studying non-stationarities. Proc. Int. Brain Comput. Interfaces Conf. **2011**. Available online: https://www.researchgate.net/profile/MotoakiKawanabe/publication/216887788_Group-wise_Stationary_Subspace_Analysis_-_A_Novel_Method_for_Studying_Non-Stationarities/links/02e7e51d7fec25159b000000.pdf (accessed on 19 December 2017).
40. Arvaneh, M.; Guan, C.; Ang, K.K.; Quek, C. Optimizing spatial filters by minimizing within-class dissimilarities in electroencephalogram-based brain-computer interface. IEEE Trans. Neural Netw. Learn. Syst. **2013**, 24, 610–619.
41. Samek, W.; Blythe, D.; Müller, K.-R.; Kawanabe, M. Robust spatial filtering with beta divergence. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2013; pp. 1007–1015.
42. Thiyam, D.B.; Cruces, S.; Olías, J.; Cichocki, A. Optimization of Alpha-Beta log-det divergences and their application in the spatial filtering of two class motor imagery movements. Entropy **2017**, 19, 89.
43. Cichocki, A.; Cruces, S.; Amari, S.-I. Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy **2011**, 13, 134–170.
44. Plumbley, M.D. Geometrical methods for non-negative ICA: Manifolds, Lie groups and toral subalgebras. Neurocomputing **2005**, 67, 161–197.
45. Edelman, A.; Arias, T.A.; Smith, S.T. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. **1998**, 20, 303–353.
46. Moler, C.; Van Loan, C. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev. **2003**, 45, 3–49.
47. Huang, W.; Absil, P.-A.; Gallivan, K.A. A Riemannian BFGS Method for Nonconvex Optimization Problems. In Numerical Mathematics and Advanced Applications ENUMATH 2015; Springer: Cham, Switzerland, 2016; pp. 627–634.
48. Boumal, N.; Mishra, B.; Absil, P.-A.; Sepulchre, R. Manopt, a Matlab Toolbox for Optimization on Manifolds. J. Mach. Learn. Res. **2014**, 15, 1455–1459.
49. Grosse-Wentrup, M.; Buss, M. Multiclass common spatial patterns and information theoretic feature extraction. IEEE Trans. Biomed. Eng. **2008**, 55, 1991–2000.
50. Feder, M.; Merhav, N. Relations between entropy and error probability. IEEE Trans. Inf. Theory **1994**, 40, 259–266.
51. Jones, M.C.; Sibson, R. What is projection pursuit? (with discussion). J. R. Stat. Soc. Ser. A **1987**, 150, 1–36.
52. Wang, H.; Tang, Q.; Zheng, W. L1-norm-based common spatial patterns. IEEE Trans. Biomed. Eng. **2012**, 59, 653–662.
53. Daly, I.; Nicolaou, N.; Nasuto, S.; Warwick, K. Automated artifact removal from the electroencephalogram: A comparative study. Clin. EEG Neurosci. **2013**, 44, 291–306.
54. Fatourechi, M.; Bashashati, A.; Ward, R.; Birch, G. EMG and EOG artifacts in brain-computer interface systems: A survey. Clin. Neurophysiol. **2007**, 118, 480–494.
55. Wang, H.; Li, X. Regularized filters for L1-norm-based common spatial patterns. IEEE Trans. Neural Syst. Rehabil. Eng. **2016**, 24, 201–211.
56. Arvaneh, M.; Guan, C.; Ang, K.K.; Quek, C. Optimizing the channel selection and classification accuracy in EEG-based BCI. IEEE Trans. Biomed. Eng. **2011**, 58, 1865–1873.
57. Park, J.; Chung, W. Common spatial patterns based on generalized norms. In Proceedings of the 2013 International Winter Workshop on Brain-Computer Interface (BCI), Jeongseon, Korea, 18–20 February 2013; pp. 39–42.
58. Lotte, F.; Guan, C. Learning from other subjects helps reducing brain-computer interface calibration time. In Proceedings of the 2010 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Dallas, TX, USA, 14–19 March 2010; pp. 614–617.
59. Blankertz, B.; Kawanabe, M.; Tomioka, R.; Hohlefeld, F.U.; Nikulin, V.V.; Müller, K.-R. Invariant common spatial patterns: Alleviating nonstationarities in brain-computer interfacing. In Proceedings of the Advances in Neural Information Processing Systems 20 (NIPS 2007), Vancouver, BC, Canada, 3–5 December 2007; pp. 113–120.
60. Wojcikiewicz, W.; Vidaurre, C.; Kawanabe, M. Stationary common spatial patterns: Towards robust classification of non-stationary EEG signals. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011; pp. 577–580.
61. Wojcikiewicz, W.; Vidaurre, C.; Kawanabe, M. Improving classification performance of BCIs by using stationary common spatial patterns and unsupervised bias adaptation. In Proceedings of the International Conference on Hybrid Artificial Intelligence Systems, Wroclaw, Poland, 23–25 May 2011; pp. 34–41.
62. Kawanabe, M.; Vidaurre, C.; Scholler, S.; Müller, K.-R. Robust common spatial filters with a maxmin approach. In Proceedings of the 2009 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Minneapolis, MN, USA, 3–6 September 2009; pp. 2470–2473.
63. Kawanabe, M.; Samek, W.; Müller, K.-R.; Vidaurre, C. Robust common spatial filters with a maxmin approach. Neural Comput. **2014**, 26, 349–376.
64. Lotte, F.; Guan, C. Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms. IEEE Trans. Biomed. Eng. **2011**, 58, 355–362.
65. Suk, H.-I.; Lee, S.-W. A novel bayesian framework for discriminative feature extraction in brain-computer interfaces. IEEE Trans. Pattern Anal. Mach. Intell. **2013**, 35, 286–299.
66. Wang, H.; Zheng, W. Local temporal common spatial patterns for robust single-trial EEG classification. IEEE Trans. Neural Syst. Rehabil. Eng. **2008**, 16, 131–139.
67. Dornhege, G.; Blankertz, B.; Curio, G.; Müller, K.-R. Increase Information Transfer Rates in BCI by CSP Extension to Multi-class. In Proceedings of the Advances in Neural Information Processing Systems 16, Vancouver and Whistler, BC, Canada, 8–13 December 2003.
68. Yang, Y.; Chevallier, S.; Wiart, J.; Bloch, I. Time-frequency optimization for discrimination between imagination of right and left hand movements based on two bipolar electroencephalography channels. EURASIP J. Adv. Signal Process. **2014**, 38.
69. Yang, Y.; Chevallier, S.; Wiart, J.; Bloch, I. Subject-specific time-frequency selection for multi-class motor imagery-based BCIs using few Laplacian EEG channels. Biomed. Signal Process. Control **2017**, 38, 302–311.
70. Yang, Y.; Chevallier, S.; Wiart, J.; Bloch, I. Subject-Specific Channel Selection Using Time Information for Motor Imagery Brain-Computer Interfaces. Cogn. Comput. **2016**, 8, 505–518.
71. BCI Competitions. Available online: http://www.bbci.de/competition/ (accessed on 5 June 2017).
72. Yang, Y.; Chevallier, S.; Wiart, J.; Bloch, I. Automatic selection of the number of spatial filters for motor-imagery BCI. In Proceedings of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 25–27 April 2012; pp. 109–114.
73. Fabien Lotte. Matlab Codes and Software. Available online: https://sites.google.com/site/fabienlotte/code-and-softwares (accessed on 12 November 2017).
74. Wojciech Samek. The Divergence Methods Web Site. Available online: http://divergence-methods.org (accessed on 12 January 2017).

**Figure 1.** Electrode locations of the international 10–20 system for EEG recording. The letters “F”, “T”, “C”, “P” and “O” stand for frontal, temporal, central, parietal and occipital lobes, respectively. Even numbers correspond to electrodes placed on the right hemisphere, whereas odd numbers refer to those on the left hemisphere. The “z” refers to electrodes placed in the midline.

**Figure 2.** Illustration of the Alpha-Beta log-det (AB-LD) divergence ${D}_{LD}^{(\alpha ,\beta )}({\mathbf{\Sigma}}_{1}\parallel {\mathbf{\Sigma}}_{2})$ in the $(\alpha ,\beta )$-plane. Note that the position of each divergence is specified by the value of the hyperparameters $(\alpha ,\beta )$. This parameterization smoothly connects several positive definite matrix divergences, such as the squared Riemannian metric ($\alpha =0,\beta =0$), the KL matrix divergence or Stein’s loss ($\alpha =1,\beta =0$), the dual KL matrix divergence ($\alpha =0,\beta =1$) and the S-divergence ($\alpha ={\textstyle \frac{1}{2}},\beta ={\textstyle \frac{1}{2}}$), among others.

**Figure 3.** Evolution of the common spatial patterns (CSP) criterion function (blue line), the symmetrized Kullback–Leibler (sKL) divergence (red line), the symmetrized beta divergence (purple line) and the AB-LD divergence (yellow line), all of them as a function of the components of the spatial filter $\mathit{w}=[{w}_{1},{w}_{2}]$ in the two-dimensional case, where it is assumed that ${\parallel \mathit{w}\parallel}_{2}^{2}={w}_{1}^{2}+{w}_{2}^{2}=1$. All the divergences are normalized with respect to their maximum values, and no regularization has been applied. Observe the coincidence of all the critical points. The covariance matrices were generated at random in this experiment.

**Figure 5.** Illustration of the advantages in performance of using an automatic cross-validation method to estimate the best even number of features d with respect to using an a priori fixed value of d. The automatic method relies on the technique proposed in [72], which was implemented here using one-sided t-tests of significance instead of the original two-sided tests. (**a**) Scatter plot comparison of the accuracies (in percentage) obtained by the CSP algorithm for fixed $d=8$ (x-axis) and for the automatic estimation of d (y-axis); (**b**) histogram of the estimated best even number of features d.

**Figure 6.** Comparison of the expected accuracy percentages obtained by each of the considered algorithms. The figure shows box plots in which the median is drawn as a red line, while the 25% and 75% percentiles are respectively at the bottom and top of each box. Larger positive values $T-STAT\gg 0$ and smaller $P-VAL\ll 1/2$ would correspond to greater expected improvements over CSP. However, none of the p-values, which are shown below their respective box plots, attains the $5\%$ threshold level of significance ($P-VAL<0.05$), so the possible improvements cannot be claimed to be statistically significant with respect to those obtained by CSP.

**Figure 7.** Performance of the algorithms for different motor imagery combinations involving the right hand. (**a**) Right-hand versus left-hand motor imagery classification; (**b**) right-hand versus feet motor imagery classification; (**c**) right-hand versus tongue motor imagery classification.

**Figure 8.** Accuracy percentages and p-values for the testing of an improvement in performance over CSP when right-hand versus left-hand movement imagination is discriminated. The results reveal that, in general and except in a few isolated cases, the null hypothesis that the other methods do not improve the performance over CSP cannot be rejected. (**a**) Average accuracy obtained by the algorithms for each subject; (**b**) p-values of the t-tests that compare whether the performance of the alternative algorithms is significantly better than the one obtained by CSP. The horizontal dashed line represents the threshold level of significance of 5%.

**Figure 9.** Histogram of the values of the regularization parameter in the Sub-LD algorithm that have been chosen by cross-validation.

**Figure 10.** Histogram of the hyper-parameters of the DivCSP algorithm selected by cross-validation. (**a**) Case with $\beta \in [0,0.5]$ and $\varphi =0$; (**b**) case with $\beta =0.5$ and $\varphi \in [0,0.5]$.

**Figure 11.** Comparison of the accuracy percentages obtained by each of the considered algorithms with respect to the percentage of mismatched labels in the training set. This experiment illustrates how the performance of the algorithms deteriorates as the percentage of randomly switched labels of the motor imagery movements increases.

**Figure 12.** Accuracy percentages versus the percentage of training trials with outliers in a synthetic classification experiment.

**Table 1.** Computational burden of the considered algorithms, sorted by increasing execution time, without using cross-validation. FBCSP, filter bank CSP; ITFE, information theoretic feature extraction.

| Algorithm | Time (s) |
|---|---|
| CSP | 0.0017 |
| FBCSP | 0.0050 |
| ITFE | 0.3070 |
| Sub-LD | 1.0538 |
| DivCSP | 4.6696 |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Martín-Clemente, R.; Olias, J.; Thiyam, D.B.; Cichocki, A.; Cruces, S.
Information Theoretic Approaches for Motor-Imagery BCI Systems: Review and Experimental Comparison. *Entropy* **2018**, *20*, 7.
https://doi.org/10.3390/e20010007