An Intelligent Motor Imagery Detection System Using Electroencephalography with Adaptive Wavelets

Classification of motor imagery (MI) tasks provides a robust solution for specially-abled people to connect with the milieu for brain-computer interface. Precise selection of uniform tuning parameters of tunable Q wavelet transform (TQWT) for electroencephalography (EEG) signals is arduous. Therefore, this paper proposes robust TQWT for automatically selecting optimum tuning parameters to decompose non-stationary EEG signals accurately. Three evolutionary optimization algorithms are explored for automating the tuning parameters of robust TQWT. The fitness function of the mean square error of decomposition is used. This paper also exploits channel selection using a Laplacian score for dominant channel selection. Important features elicited from sub-bands of robust TQWT are classified using different kernels of the least square support vector machine classifier. The radial basis function kernel has provided the highest accuracy of 99.78%, proving that the proposed method is superior to other state-of-the-art using the same database.


Introduction
The recent increase in Brain-computer interface (BCI) has attracted researchers to develop solutions for computers and other devices with the aid of brain activities. BCI manipulates the brain's electrical activities to help the specially-abled carry out their routine tasks. The cognitive activities accessed through electroencephalogram (EEG) signals provide an intelligent solution to assist specially-abled individuals to accomplish body movement just by mere imagination without any external support is motor imagery (MI) classification [1]. Non-invasive nature and wide exposure of EEG signals for various neurological activities makes it most viable for modelling BCI systems [2][3][4][5][6].
Several techniques using EEG signals exploring time, frequency and time-frequency (TF) domain computation have been developed to detect MI tasks. Separation of MI tasks by distinction sensitive learning vector quantization based adaptive autoregression model has been proposed in [7]. Welch spectral analysis method based on fast Fourier transform (FFT) with decision tree classifier has been used for discrimination of brain activities [8]. Regularized common spatial patterns (R-CSP) [9], filtered regularized CSP (FRCSP) [10], weighted and regularized CSP [11], CSP with particle swarm optimization (PSO) [12], optimization of filterbanks and time window within CSP [13] and adaptive removal of artifacts within CSP [14] have been explored for the MI classification. Cross-correlation (CC) and support vector machine with least square (LS-SVM) has been used in [15] for bifurcation of MI tasks. Cross-correlation [16] and modified cross-correlation (M-CC) [17] with logistic regression (LR) have been proposed for identification of MI. The utility of optimal allocation (OA) algorithm using Naive Bayes (NB) and LS-SVM have been used in MI recognition [18,19]. Power spectral density, phase lock value (PLV) with CC [20], and PLV with spectral coherence [21] have been proposed for MI detection.
Wavelet transform (WT) and Wilcoxon statistical method with tabu fuzzy standards have been used in [22]. Isolation of MI has been accomplished using machine learning techniques combined with discrete WT (DWT) [23,24], rational dilation WT (RDWT) [25], and tunable Q WT (TQWT) [26]. Empirical mode decomposition (EMD) with Hilbert transform [27] based on AM-FM intrinsic mode functions has also been used for recognizing MI tasks. Several TF analysis has also been proposed for MI classification. Classification of MI tasks using TF analysis filtered based FFT with long-short term memory [28], WT [29], and continuous WT (CWT) [30] with convolutional neural networks (CNN) have been proposed. Multivariate EMD with Gaussian SVM has been used for MI identification [31]. In addition to this, Z-score [32], multiscale principal component analysis (MSPCA) with wavelet packet decomposition (WPD) [33], clustering technique (CT) [34], adaptive space-time-frequency analysis [35] and comprehensive spectral methods [36] have been proposed for segregations of MI tasks. An automated form of variational mode decomposition combined with extreme learning machine classifier has been also used to detect MI tasks [37]. In [38], the authors explored hybridization of MSPCA combined with WPD to extract hidden information from EEG using subbands. The utility of flexible analytic wavelet transform (FAWT) has been used to extract and classify the evaluated time-frequency features from the subbands using linear discriminant analysis classifier [39]. The time-series analysis using a CNN module has been explored to detect MI tasks [40]. The authors in [41] explored multiscale Siamese CNN with cross-channel fusion to detect MI. The classification of MI has been accomplished using multi-branch hybrid neural network [42]. Efficient and accurate modelling of BCI systems is only possible with proper selection of channels, decomposition technique, machine learning or deep learning framework, and evaluation of performance indices [43]. Most of the signal decomposition methods used in the literature use an empirical selection of fixed basis functions. The choice of fixed basis function for non-stationary EEG signals may not properly decompose EEG signals. This inspires us to present adaptive tuning parameters selection for fruitful EEG analysis. This methodology proposes robust TQWT using evolutionary optimization algorithms (EOA) for quintessential tuning parameters and Laplacian score for channel selection. The novelty of the method is as follows: • Exploring evolutionary optimization for adaptive selection of proper basis function; • Use of optimum decomposition level to decompose the signal into sub-bands; • Exploring the utility of Laplacian score for selecting proper channels to reduce computational complexity.
The automated and adaptive TQWT has been used to various physiological conditions (sleep, emotions, and drowsiness) [44][45][46] and neurological disorders (schizophrenia and Parkison's disease) [47][48][49]. The paper is structured as datasets, preprocessing, robust TQWT, feature extraction, channel selection, and classification are described in Section 2. Section 3 explains the results of computation of the explored method, discussion with state-of-the-art techniques (SOTA) in Section 4. The gist of the developed technique is presented in Section 5.

Data-Set
The proposed method uses the IV-a dataset of BCI competition-III [50]. MI-tasks of five healthy subjects (aa, al, av, aw, ay) sat on comfortable chairs have been recorded. Subjects performed the MI tasks of right foot (RF) (task1) and right hand (RH) (task2). According to the international 10-20 system, one hundred eighteen electrodes are used to acquire the EEG recordings of MI-related tasks. For intimation of MI tasks, subjects are shown with visual cues of 3.5 s before acquiring EEG signals [51]. For each subject, a total of 280 trials are taken. Each task has 140 trials equally. Each subject consists of different sizes of training and testing sets. Out of 280 trials, for subjects aa, al, av, aw, ay training set composed of 168, 224, 84, 56 and 28 trials and remaining used for testing. The actual sampling rate in recording EEG signals is 1000 Hz, later downsampling at 100 Hz. The stepwise working of the proposed method is shown in Figure 1.

Pre-Processing
For BCI, two types of oscillations play a vital role, Rolandic mu and central beta lying in the frequency range of 7-30 Hz [52]. These oscillations originate in the sensorimotor cortex. Hence the EEG signals are bandpass through a sixth-order Butterworth IIR filter in the frequency range of 7-30 Hz. After the BPF, the filtered data of RF and RH has been segmented into 250 and 200 epochs of length 2000 samples each for every channel for further processing. The RF and RH EEG signals are shown in Figure 2, respectively. It is evident from the Figure that both filtered EEG signals of RF and RH have the same amplitude range, making it difficult to classify MI tasks. Examples of EEG signals for RF and RH brain activity.

Robust Tunable Q Wavelet Transform (r-TQWT)
A TQWT is a parametrized discrete-time WT with a tunable quality factor. To decompose the signal into subbands (SBs) using TQWT requires quality factor (Q), redundancy rate (r) and levels of decomposition (L) as a basis function. Decomposing a signal with L level results in the generation of one lowpass SB (LPS) and L highpass SBs (HPS). TQWT can tune the Q as per the oscillatory behavior of signals. The higher quality factor is recommended for better frequency resolution and a higher redundancy rate for the well-localized time domain. After the L level decomposition of signal, LPS V L 0 (ω) and HPS V L 1 (ω) are represented by [53] With the tuning property of Q and r following the highly oscillatory nature of EEG signals, TQWT has gained wide acceptance in signal processing. The experiential selection of fixed tuning parameters may lead to the improper decomposition of EEG signals and the loss of some critical information. Setting a common for different EEG signals can lead to a higher decomposition error. The non-stationary nature of EEG signals requires the different values of tuning parameters for every set EEG signal for accurate reconstruction and better decomposition. This inspires us to propose the robust TQWT for selecting appropriate tuning parameters with the help of EOAs using the fitness function of mean square error (MSE) of decomposition. To resolve an issue of uniform tuning parameters, robust TQWT is proposed to automate the selection of Q and r using EOAs. The steps for robust TQWT are described in Algorithm 1. The quality factor of TQWT, which must be higher for signals with higher oscillations, is represented by Algorithm 1 Robust TQWT 1: Select the Q and r initially to 1. 2: Calculate L max from the values of Q and r. 3: Decompose EEGs with Q, r and L max . 4: Get the approximated original signal by inverse robust TQWT using Q, r and M (No. of time samples). 5: Compute the MSE of decomposition error w e (t). 6: while (min(w e (t))) do 7: if (w e (t)== min) then 8: Optimum tuning parameters (Q opt i and r opt i ). 9: else 10: Iterate for next values of Q and r.

11:
end if 12: end while 13: Obtain the maximum decomposition levels L max i from Q opt i and r opt i . 14: Evaluate the optimum decomposition levels from L max . 15: Obtain the SBs of EEG signals using the optimum tuning parameters.
where Q opt i , L max i and r opt i are optimum tuning parameters for signal i.
The time-domain response of the signal is well-localized if r is ≥ 3. The redundancy rate is defined as Maximum levels of decomposition in TQWT is denoted by An EEG signal is represented as a combination of approximated and error signal that can be expressed as where f (t) is the original signal in time domain,f (t) denotes approximated signal evaluated by inverse TQWT, and decomposition error is d e (t). To reconstruct the decomposed EEG signal with minimum decomposition error, mean square error is chosen as an fitness function of EOA represented by Obtaining the minimum decomposition error of EEG signals by manually selecting the optimum tuning parameters is laborious. To overcome this, heuristic optimization techniques, namely PSO, ABC, and CS, are used to automate the tuning parameters. Details of these optimization algorithms are available in [54]. Optimization algorithms evaluate the fittest solution by repetitive iterations using several search agents. PSO mimics the behaviour of bird flocks and fish schooling to get the optimal global solution. The fittest solution formulas of PSO are expressed by where Υ is the constriction factor to increase the chances of convergence, p i and g i are the personal and global best. a 1 and a 2 are the personal and social cognitive factors, and ϕ is the random number in the range 0 to 1. The ABC algorithm copies the honey bees' intelligent behaviour in search of nectar. The update equation to find the optimal solution is given by where v i and x i are the i th particle velocity and position, x r is the randomly selected food source, fitness function is denoted by h i , κ is random number in the range {−1, 1} both are inclusive. f i is the value of fitness function for solution x i , probability is denoted by p s i and number of food sources are FN. The cuckoo search algorithm impersonates the cuckoo bird species' broot parasite and Levy flight property. Following the Levy flight, the new solution of CS is generated by where Levy is the Levy distribution, δ is the step size mostly equal to 1, and ⊗ is the matrix multiplication. Once the optimum Q and r are obtained from EOA, the maximum decomposition level is calculated from Equation (5). To maintain the uniformity among all the decomposed EEG signals, the optimum level of the decomposition L opt is evaluated. The equation of the L opt is defined by where P is the number of signals belonging to each class and C are the number of classes. The examples of subbands obtained after r-TQWT decomposition for RF and RH brain activities are shown in Figures 3 and 4. It is seen from the Figures that subbands of RH and RF have different amplitude ranges, which depicts their better discrimination ability. Thus, the decomposition of r-TQWT extracts insight information of the EEG signals. Therefore, the idea of r-TQWT is justified for detecting brain activities.

Features Extraction and Channel Selection
Features represent the statistical measures of the decomposed EEG signals. To discriminate the motor imagery tasks, five statistical parameters are evaluated with details as follow.

Hurst Exponent (HE)
HE measures long-term memory of time series. HE quantifies the relative tendency of time series either regress sharply to cluster in a direction or to the mean.
where E is the expected value, r is the range, J are the number of samples in each signal and σ is the standard deviation.

Modified Mean Absolute value1 (MAV1)
MAV1 [55] is the extension of mean absolute value with assignment of weighted window function w j for improving the robustness.

Difference Absolute Standard Deviation Value (DASDV)
DASDV [55] is representing the standard deviation value of the wavelength.

Log Energy Entropy (LEE)
LEE [56] is a measure of the complexity of the EEG signals.
where p j is the probability of occurance of y.

Variance
Variance measure the variation in the data points of the data defined by The features elicited from the filtered robust TQWT decomposed SBs are computed for 118 channels. To reduce the complexity of the BCI system, instead of exploring all the channels for classification, the adequacy of the single EEG channel is used in this framework. Laplacian score (LS), based on Laplacian Eigenmaps and locality preserving projection, is explored for selecting appropriate channels. The Laplacian score for the k th feature is defined as [57].
where F k is the k th feature matrix, T is the transform, 1 = [1, 1, .....1] T , D = diag(S1) and the Laplacian graph L = S − D. i and j correspond to nodes y i and y j , while t is the constant, respectively.

Classifiers
Statistically, significant features are classified using classification techniques. This methodology uses the LS-SVM classifier to discriminate against the RH and RF activities. LS-SVM involves the formulation of inequality constraints that transform non-linear problems into linear equations with a rigid regression cost function. For a training set of data length of N points {z i , y i } N i=1 the decision function of the LS-SVM classifier [58] is defined as where y i ∈ IR n is the i th input feature vector and z i ∈ IR is the i th output pattern. a i is the positive real coefficient, β is the real constant biasing term, Ψ(y, y i ) is the kernel function. Five kernels namely linear, polynomial, Mexican hat (MHW), Morlet (MW), and Radial basis function (RBF) have been used in this framework to test the compatibility of the proposed method. The details of these kernels are available in [59]. The linear kernel is expressed as Ψ(y, y i ) = (y T i y) The polynomial kernel can be defined as The equation of MHW kernel is denoted as MW kernel is represented as The RBF kernel is defined by where l is the degree of polynomial kernel, σ is the RBF kernel width, ω 0 and a are the parameters of MHW and MW kernels.

Results
The selection of fixed basis functions for decomposing non-stationary EEG signals may result in information loss. Setting the tuning parameters of TQWT may not yield the desired results, leading to higher misclassification. The method proposed in this paper uses the automatic selection of tuning parameters to decompose EEG signals using EOA accurately. PSO, ABC, and CS optimizations are used for self-selection of Q, r and L opt to excerpt the time-domain features from the SBs of decomposed signals. To maintain the effectiveness of the proposed method number of iterations (200) and search agents (50) is kept uniform. To choose the tuning parameters adaptively for every EEG signal, the fitness function of the MSE of the reconstructed and the original signal is considered.
The decomposition error obtained for robust TQWT using EOA is shown in Table 1. The average MSE using EOA is minimum in the case of CS algorithm with 2.548 × 10 −05 for RF and 2.352 × 10 −05 for RH while it is maximum for PSO producing the error as 7.822 × 10 −05 and 5.997 × 10 −05 for RF and RH, respectively. The least average MSE is obtained for the cuckoo search algorithm, which inspires us to use the optimum parameters of this algorithm over other EOA. L max evaluated from Q and r is used to compute L opt from Equation (11) is obtained as 5. These optimum Q, r and L opt are employed to obtain the SBs of EEG. Features from 118 electrodes are elicited from decomposed SBs. Employing features of 118 electrodes for classifying MI tasks increases the system's complexity. Therefore, the Laplacian feature score is evaluated to select the substantial channel for performance evaluation. The Laplacian score of the five time-domain features is shown in Table 2. Out of 118 channels, LS of the best performing seven channels is shown in the table. Features of a channel with the highest Laplacian score are considered significant for classification. LS for four features, MAV1, LEE, variance, and HE, produce the highest score for channel-1, while DASDV provides the best score for channel-9. Results of Table 2 prove the superiority of channel-1; hence features of this channel are taken into consideration for further evaluation. Kruskal Wallis (KW) analysis is applied to the extracted features of channel-1 to check the discrimination ability of the features. KW test is a non-parameterized analysis of variance that gives the chi(p) probability. The probabilistic values of p < 0.05 are considered to be significant. Table 3 represents the probabilistic values of various features of the SBs on channel-1. As observed from the table, five features, MAV1, LEE, DASDV, variance, and HE, show the compelling discrimination ability for all the SBs and motivate us to use these feature sets for the segregation of the MI tasks. Five kernels of LS-SVM classifiers are employed in this method to evaluate the classification accuracy of different SBs of channel-1. Table 4 indicates the accuracy (ACC) score of SBs using the LS-SVM classification technique. Table 4 shows that the worst-performing is the linear kernel, while the RBF kernel gives the best performance in all the SBs. SB5 is most superior, while SB6 is most inferior in the case of the linear kernel with a classification accuracy of 73.56% and 61.56%. The highest accuracy score provided by polynomial and Morlet is in SB4, providing an ACC of 89.78% and 92.44%, while the ACC of 97.56% is recorded highest in SB3 for the Mexican hat kernel. RBF kernel is the best performing kernel among all with ACC of 95.56%, 98.22%, 96%, 95.78%, 99.78% and 72.44% for SB1, SB2, SB3, SB4, SB5 and SB6, respectively. As the RBF kernel overperforms other kernels, it is used to evaluate the performance parameters of channel-1. Table 5 shows the results of the performance parameters evaluated on all the SBs of channel 1. Five performance parameters are evaluated to test the dominance of the proposed methodology, viz. ACC, sensitivity (SEN), specificity (SPE), F-1 score and Matthews correlation coefficient (MCC), respectively. As observed from the table, SB5 gives the best results, while SB-6 is the worst performing SB. ACC of 99.78%, SEN and SPE are 99.60% and 100%, F-1 score is noted 0.998 while MCC 99.55%is marked as best scores for SB5. Worst values of ACC, SEN, SPE, F-1 score and MCC is obtained in SB6 as 72.44%, 75%, 69.19%, 0.752% and 44.14% respectively. The binary classification ability of the model is evaluated using the analysis of receiver operating characteristics and area under the curve, as shown in Figure 5. Our model has obtained the area under the curve of 99.81% with a minimal standard deviation depicting its binary classification ability. The confusion matrix of RF and RH is indicated in Table 6. The confusion matrix of RF is given 100% correct and 0% misclassification, while RH provides an incorrect classification of 0.5% and 99.5% of correct classification.

Discussion
The proposed framework is compared with the existing SOTA using the same database for its effectiveness. The performance of the proposed method is compared with 15 SOTA techniques considering the number of channels and features they have used, as shown in Table 7. Ince et al. [35] used the class separability (CT) method with SVM using three channels and eight features with 96% accuracy. Different versions of CSP have also been employed by Park et al. [10], and Lu et al. [9] to detect the MI tasks with an accuracy mark of 86.23% and 88.32%. Siuly et al. explored cross-correlation based signal processing [15][16][17] and optimal allocation [18,19] for recognition of MI tasks. The highest accuracy achieved is 96.62% while using OA and LS-SVM. Zhang et al. [32] used Z-score claimed 81.1% accuracy using linear discriminant analysis (LDA); the ACC of 96.1% was achieved by Verma et al. [33] using DWT. Kevric et al. [23] used MSPCA, WPD and higher-order statistics (HOS), claiming an ACC of 92.8%. Taran et al. [26,27] used EMD, and TQWT to discriminate brain activities and achieved an ACC of 96.89% and 97.56% using LS-SVM, respectively. Taran et al. [25] used RDWT and quadratic SVM classifier combining 10 channels and five features to obtain an ACC of 99.11%. Recently proposed MI tasks classification method using CWT and CNN by Chaudhary et al. [30] achieved an ACC of 99.35%. Subasi and Qaisar [38] developed a hybrid model combing hybridization of MSPCA and WPD to extract the SB. Six statistical measures extracted from the SB have been classified using ensemble classifier techniques. Their model yielded the highest accuracy of 98.69% using an RF classifier. The proposed method uses single-channel selection by Laplacian score and four time-domain features for classification of MI tasks using RBF kernel to achieve an ACC of 99.78% which is higher than all the SOTA using the same database.
The developed model has following advantages: • The model is robust (developed using a 10-fold cross-validation technique). • The developed model is simple (requiring only a single channel and five features to compute model performance). • The model is data-driven (does not require selection of basis).
However, the model suffers some limitations which are as follows: • The model is tested on a single EEG dataset with only five subjects. • The model does not provide explainability for the predictions.

Conclusions
This method explores the utility of automatic selection of optimum tuning parameters of robust TQWT using evolutionary optimization algorithms. The adaptive selection of optimum tuning parameters is accomplished using PSO, ABC and CS algorithms. CS provides the fittest solution for the objective function; hence, the tuning parameters of CS are used to evaluate optimum decomposition levels and decompose the signal with these parameters. This method exploits the LS for selecting a dominant channel for segregating MI tasks using the LS-SVM classifier. The proposed method obtained an ACC of 99.78% which is higher than all other methods using the same database. The method proposed in this paper can be employed in a real-time BCI system to recognise MI tasks with lowered complexity as it uses only a single channel with a lower number of features.

Data Availability Statement:
The data is taken from BCI Competition III which is available at https://www.bbci.de/competition/iii/(accessed on 31 December 2020).

Acknowledgments:
The authors would like to thanks BCI Competition III for providing open access to the EEG dataset. The authors would also like to thank Varun Bajaj, Indian Institute of Information Technology, Design, and Manufacturing (IIITDM) Jabalpur, India for his support and motivation.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: