1. Introduction
A brain-computer interface (BCI) is a direct communication and control system established between the human brain and an electronic device [1,2]. BCI systems have important application value in many fields, especially in medicine [3]. Various electroencephalogram (EEG) signals have been used in BCI systems, such as P300 potentials [4,5], steady-state visual evoked potentials (SSVEP) [6,7], and motor imagery (MI) [8,9]. Among these EEG signals, the MI signal is one of the most common, as it can be generated spontaneously without any external stimulus. However, recognizing MI-EEG is often difficult for several reasons. First, the high-dimensional MI-EEG signal is weak and its signal-to-noise ratio is low [10]. Second, the MI-EEG signal is nonlinear and non-stationary, meaning that its statistics, such as the mean and variance, change over time [11]. Third, MI signals are time-varying [12]. In general, MI-EEG signals are highly complex and unstable, which makes MI-EEG feature extraction and classification challenging.
Feature extraction plays a crucial role in the recognition of MI-EEG signals, and it is usually used in conjunction with preprocessing; the choice of preprocessing method strongly affects how effective features are extracted from the raw MI-EEG. Traditional methods usually use energy features and employ preprocessing, such as frequency or temporal filtering, to map the raw MI-EEG data into energy signals [13,14,15]. Duan et al. [16] used a spatial filter to map the MI-EEG data to an energy signal containing the most salient features. Dose et al. [17] extracted time-domain energy and spatial location features directly from the raw EEG. Sturm et al. [18] applied layer-wise relevance propagation (LRP) and deep neural networks (DNNs) to convert the MI-EEG into frequency-energy characteristics. Zhang et al. [19] used a one-versus-rest filter to analyze the MI-EEG signal and then extracted the spatial and temporal features.
Li et al. [20] used the wavelet packet transform (WPT) to decompose and reconstruct the MI-EEG and obtain mu-rhythm and beta-rhythm energy features. More recently, time-frequency analysis methods have been used to map MI-EEG signals to time-frequency images. Tang et al. [21] mapped MI-EEG signals to time-frequency images using the fast Fourier transform (FFT). Tabar and Halici [22] employed the short-time Fourier transform (STFT) to perform time-frequency analysis on MI-EEG. Although both have been used to map MI-EEG to time-frequency images, the FFT cannot fully capture the details of the signal, and the window of the STFT cannot change with frequency; consequently, it is difficult for the FFT and STFT to balance global and local features when dealing with nonlinear, non-stationary MI-EEG signals [23]. In this study, we use the continuous wavelet transform (CWT), which addresses these problems by decomposing the signal into different segments and providing a window that changes with frequency, giving high time resolution.
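To make the time-frequency mapping concrete, the following is a minimal NumPy sketch of a complex Morlet CWT applied to one EEG channel. The sampling rate, frequency range, wavelet parameter, and the synthetic "mu-rhythm-like" burst are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

def morlet_cwt(signal, fs, freqs, w=6.0):
    """Continuous wavelet transform with a complex Morlet wavelet.

    Returns a (len(freqs), len(signal)) magnitude 'image':
    rows are frequencies, columns are time samples.
    """
    n = len(signal)
    t = (np.arange(n) - n // 2) / fs
    out = np.empty((len(freqs), n))
    for i, f in enumerate(freqs):
        # Scale the Gaussian envelope so the wavelet's centre frequency is f;
        # the window therefore narrows as f grows (unlike a fixed STFT window).
        s = w / (2 * np.pi * f)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * s**2))
        wavelet /= np.sqrt(s) * np.pi**0.25
        out[i] = np.abs(np.convolve(signal, wavelet, mode="same")) / fs
    return out

# Illustrative input: a 10 Hz burst sampled at 250 Hz (hypothetical values)
fs = 250
t = np.arange(0, 2, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) * np.exp(-((t - 1) ** 2) / 0.05)
freqs = np.arange(4, 31)            # 4-30 Hz covers the mu and beta bands
tf_image = morlet_cwt(eeg, fs, freqs)
# The energy should concentrate near the 10 Hz row of the image
peak_row = np.unravel_index(tf_image.argmax(), tf_image.shape)[0]
```

The resulting 2-D magnitude array is the kind of time-frequency "image signal" that is subsequently fed to a convolutional network.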
Deep learning has a strong ability to handle complex, nonlinear, high-dimensional data, and it allows machines to learn features from, or classify, the input data [24]. It has been successfully applied to pattern recognition, especially natural language processing, computer vision, and speech recognition [25,26,27,28]. Owing to its excellent self-learning ability [29,30,31], deep learning has gradually been applied to the identification of EEG data, such as P300 [32,33], SSVEP [34], and MI [35]. Tayeb et al. [36] used three-channel MI-EEG as the input to an STFT, and their proposed convolutional neural network (pCNN) was trained and tested on the STFT output. Li et al. [37] used the optimal wavelet packet transform (OWPT) to construct MI-EEG feature vectors, which were used to train a long short-term memory (LSTM) recurrent neural network (RNN); the algorithm performs well on dataset III of the BCI Competition 2003, but its structure is overly complex. Liu et al. [32] used a new CNN structure to classify P300 signals, which performs well on the BCI Competition P300 datasets. Although these deep learning methods classify well, the networks are commonly complex and have massive numbers of parameters. In this paper, we propose a new neural network that not only simplifies the network structure and reduces the number of parameters but also improves classification performance.
In this study, a new CWT-simplified convolutional neural network (SCNN) algorithm based on deep learning is proposed to identify MI-EEG. First, the CWT is used to map the MI-EEG data into time-frequency images, which contain both time- and frequency-domain features. Second, we propose a convolutional neural network without pooling layers, named SCNN; it uses two convolutional layers to extract the time- and frequency-domain features, followed by a softmax layer to classify the MI-EEG data. The method is validated on BCI Competition IV Dataset 2b. The experimental results show that the performance of our algorithm is improved compared with other algorithms. In addition, when the same MI-EEG signal and SCNN are used, the CWT outperforms the common spatial pattern (CSP), FFT, and STFT as the preprocessing step.
3. Experimental Results
In this study, 320 trials from BCI Competition IV Dataset 2b were used to test our algorithm.
Table 2 shows the classification accuracy of each subject's MI-EEG data using the convolutional neural network and stacked autoencoder (CNN-SAE) [22], CSP [13], adaptive common spatial patterns (ACSP) [45], deep belief net (DBN) [46], and CWT-SCNN algorithms. Bold text indicates the highest classification accuracy for each subject. As seen from Table 2, the classification performance of the deep learning algorithms, such as CWT-SCNN and DBN, is better than that of the traditional CSP and ACSP algorithms. Furthermore, four of the nine subjects (S2, S5, S6, and S8) obtained the highest classification accuracy with the CWT-SCNN algorithm. In addition, the CWT-SCNN algorithm has the highest average classification accuracy, approximately 5–8% higher than the other algorithms.
The kappa value is used to evaluate the classification performance of the algorithm and remove the impact of random classification [22]. The kappa coefficient is calculated as

kappa = (p_o − p_e) / (1 − p_e),  (10)

where p_o is the observed classification accuracy and p_e is the random (chance) classification accuracy. Since a two-class problem is studied here, the random classification accuracy in Equation (10) is p_e = 0.5.
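Equation (10) can be checked with a few lines of Python. The function below is a direct transcription of the formula; the example accuracy is the paper's reported CWT-SCNN average, and note that averaging per-subject kappa values (as in Table 3) generally differs slightly from applying the formula to the average accuracy.

```python
def kappa(accuracy, chance=0.5):
    """Cohen's kappa for a classifier with a known chance level.

    kappa = (p_o - p_e) / (1 - p_e), as in Equation (10),
    where p_o is the observed accuracy and p_e the chance accuracy
    (0.5 for a balanced two-class MI task).
    """
    return (accuracy - chance) / (1.0 - chance)

# A two-class accuracy of 83.2% maps to a kappa of 0.664; an accuracy
# at chance level (50%) maps to 0, as intended.
k = kappa(0.832)
```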
Table 3 shows the kappa values using the CNN-SAE [22], CSP [13], ACSP [45], DBN [46], and CWT-SCNN algorithms.
As seen from Table 3, compared with traditional algorithms such as CSP [13] and ACSP [45], random classification has a smaller impact on deep learning algorithms such as CWT-SCNN. Four of the nine subjects achieved the highest kappa values with the proposed algorithm, and three of these had kappa values above 0.8. The highest kappa value for S4 with the proposed algorithm is 0.923, slightly less than that of DBN. In addition, the CWT-SCNN algorithm has the highest average kappa value, approximately 11–13% higher than the other algorithms.
Table 4 and Table 5 show the classification accuracies and kappa values of CSP-SCNN, FFT-SCNN, STFT-SCNN, and CWT-SCNN on BCI Competition IV Dataset 2b. The MI-EEG signal is mapped to a time-frequency image using the FFT, STFT, or CWT; with traditional CSP, a matrix that maximizes the difference between the two classes of features is obtained instead. The image signal or matrix is then trained and tested by the SCNN using 10 × 10-fold cross-validation. As shown in Table 4, eight of the nine subjects obtained the highest classification accuracy using the CWT-SCNN method. Furthermore, the highest average classification accuracy is obtained with the proposed CWT-SCNN method and is at least 4% higher than the other methods. In Table 5, seven of the nine subjects obtained the highest kappa value with the CWT-SCNN method, and the highest average kappa value, obtained with CWT-SCNN, is about 7–10% higher than the other methods.
To compare the proposed algorithm with the other algorithms for statistical significance, we use the non-parametric Friedman test [47,48] to evaluate the classification performance in Table 2, Table 3, Table 4 and Table 5. The alpha value is set to 0.05, and the number of samples is 9. For the data in Table 2, we establish the hypothesis H0: the median classification accuracy of each algorithm is the same for the MI-EEG data. The p value is 0.0147, less than the 0.05 significance level, so H0 is rejected, revealing a significant difference between the classification accuracies of the five compared algorithms. Using the same method, we find a significant difference (p = 0.0134 < 0.05) between the classification accuracies of the CWT-SCNN, CSP-SCNN, FFT-SCNN, and STFT-SCNN algorithms for the data in Table 4. Furthermore, for the data in Table 3 and Table 5, the differences in kappa values across algorithms are also statistically significant (p = 0.015 and 0.0179, respectively).
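Such a test can be run with `scipy.stats.friedmanchisquare`, which takes one sample (here, nine per-subject accuracies) per algorithm. The accuracy matrix below is invented for illustration, not the values from Table 2; it simply makes one algorithm consistently best so the test rejects H0.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical per-subject accuracies for 9 subjects x 5 algorithms
# (rows: subjects; these are NOT the values from Table 2).
rng = np.random.default_rng(0)
base = rng.uniform(0.6, 0.8, size=9)
acc = np.column_stack([
    base,            # e.g. CSP
    base + 0.02,     # e.g. ACSP
    base + 0.05,     # e.g. CNN-SAE
    base + 0.06,     # e.g. DBN
    base + 0.10,     # e.g. CWT-SCNN (consistently best -> small p)
])
# The Friedman test ranks algorithms within each subject, then asks
# whether the rank sums could arise under H0 (equal medians).
stat, p = friedmanchisquare(*acc.T)
significant = p < 0.05   # reject H0 at alpha = 0.05
```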
To compare performance with and without pooling layers, we add pooling layers after the C2 and C3 layers of the SCNN to form a standard CNN. Table 6 lists the output matrix and the parameters of each network layer when the standard CNN is trained with an image signal of size (44, 200). Compared with the CNN, the number of network parameters of the SCNN is reduced by half, which not only saves computation but also shortens the training time. The average training time of the CNN is 456 s, which is 45 s longer than that of the SCNN (each training set contains 288 trials).
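The shape and parameter bookkeeping behind such a comparison can be sketched in a few lines of plain Python. The exact layer sizes come from Tables 1 and 6 (not reproduced here); the kernel sizes, strides, and channel counts below are illustrative assumptions showing how a pooling-free stack can still shrink a (44, 200) time-frequency image using valid, strided convolutions alone.

```python
def conv2d_shape(h, w, kh, kw, sh=1, sw=1):
    """Output height/width of a 'valid' (no padding) 2-D convolution."""
    return (h - kh) // sh + 1, (w - kw) // sw + 1

def conv2d_params(cin, cout, kh, kw):
    """Trainable parameters of a conv layer: weights plus one bias per filter."""
    return cout * (cin * kh * kw + 1)

# Illustrative SCNN-style stack on a (44, 200) CWT image (hypothetical sizes):
h, w = 44, 200
h, w = conv2d_shape(h, w, kh=44, kw=1)          # C2: full-height frequency kernel
p2 = conv2d_params(1, 20, 44, 1)
h, w = conv2d_shape(h, w, kh=1, kw=25, sw=5)    # C3: temporal kernel, stride 5
p3 = conv2d_params(20, 40, 1, 25)               # stride shrinks width, no pooling
total = p2 + p3
```

Because downsampling is done by the convolution strides themselves, no pooling layer (and no associated loss of high-resolution detail) is needed between C2 and C3.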
Figure 5a,b shows the classification accuracies and kappa values on BCI Competition IV Dataset 2b using the CWT-SCNN and CWT-CNN methods. The MI-EEG signal is mapped to a time-frequency image using the CWT, and the image signals are evaluated with 10 × 10-fold cross-validation on the SCNN and CNN. As shown in Figure 5a,b, eight of the nine subjects obtained higher classification accuracy and kappa values with the CWT-SCNN method than with CWT-CNN. In addition, the CWT-SCNN algorithm has the highest mean classification accuracy and mean kappa value; compared with the CWT-CNN method, these improved by 2.9% and 5.3%, respectively.
4. Discussion
In this study, the proposed CWT-SCNN algorithm is used to identify left- and right-hand MI-EEG signals. After simple filtering, the EEG signals are mapped to image signals through the CWT, and these are input into the SCNN for feature extraction and classification. On BCI Competition IV Dataset 2b, the average classification accuracy and average kappa value obtained by the CWT-SCNN algorithm are 83.2% and 0.651, respectively. Among the CSP-SCNN, FFT-SCNN, STFT-SCNN, and CWT-SCNN methods, the CWT-SCNN method achieves the best average classification accuracy and average kappa value. Furthermore, the experimental results show that the CWT-SCNN method not only has a higher average classification accuracy and average kappa value, but also a shorter training time than the CWT-CNN method. In short, compared with traditional or deep learning classification methods, the CWT-SCNN method improves the classification accuracy and kappa value while shortening the training time.
To improve the performance of BCI systems, we proposed combining the CWT and SCNN methods to identify MI-EEG signals. As can be seen from Table 2 and Table 3, compared with the traditional classification algorithms CSP and ACSP, the CWT-SCNN method improves each subject's classification accuracy and kappa value. Compared with the deep learning algorithms CNN-SAE and DBN, the CWT-SCNN method obtains a higher average classification accuracy and average kappa value. In general, compared with traditional or deep learning classification algorithms, the CWT-SCNN method improves not only the classification performance but also the overall performance of the system.
Comparing the classification results and kappa values of the different preprocessing methods shows that the CWT is more suitable than CSP, FFT, and STFT for coordinating with the SCNN to analyze MI-EEG signals. As shown in Table 4 and Table 5, the CWT-SCNN obtains higher classification accuracies and kappa values than the CSP-SCNN, FFT-SCNN, and STFT-SCNN methods. Previous work showed that CSP, as a linear analysis method, may ignore short-term changes in the signal and fail to capture the details of signal change [49]. Furthermore, the FFT cannot capture the local features of MI-EEG signals well [23], and because the window size of the STFT is fixed, it cannot resolve both global and local features clearly. The CWT can balance global and local features by decomposing the signal and providing a time-varying window with high temporal resolution [50]. This may explain why CWT preprocessing enhances the SCNN's classification performance on MI-EEG signals.
The proposed SCNN framework combines feature extraction and classification. As can be seen from Figure 5a,b, compared with the CWT-CNN method, the CWT-SCNN method achieves higher classification accuracy and kappa values. From Table 1 and Table 6, compared with the CNN, the SCNN not only has a simpler network structure but also fewer network parameters. The SCNN differs from a traditional CNN in that it lacks pooling layers. Generally, pooling layers reduce image dimensions and parameter counts; however, previous work has shown that high-resolution signals may lose important information in the pooling layer [49]. To reduce the dimensionality of the image instead, the sizes of the convolution kernels in this method have been appropriately adjusted, similar to approaches described in the literature [42]. In sum, compared with the CNN, the proposed SCNN has better application value.
5. Conclusions
In this paper, we propose CWT-SCNN, a new algorithm for identifying left- and right-hand motor imagery EEG signals. To obtain a time-frequency image as the feature signal and to better support the subsequent feature extraction, the CWT is used to map the simply filtered MI-EEG signal; this addresses the problem that traditional and current preprocessing methods cannot balance global and local features. The signals are then input into the SCNN, which is obtained by removing the pooling layers from a traditional CNN structure, to extract features and classify them.
Compared with the CNN, the SCNN not only shortens the training time and reduces the number of parameters but also improves the classification accuracy and kappa value; it improves the overall performance of the CNN and can be regarded as an upgraded CNN. Overall, the combined CWT and SCNN method performs better than the compared traditional and deep learning classification methods. The experimental results show that the CWT-SCNN algorithm performs well and is worth considering for further application in BCI systems. In future work, we will continue to improve the robustness and classification accuracy of the algorithm and apply it to real-time online BCI systems.