Classification of Motor Imagery EEG Signals Based on Data Augmentation and Convolutional Neural Networks

In brain–computer interface (BCI) systems, motor imagery electroencephalography (MI-EEG) signals are commonly used to detect participant intent. Many factors make MI classification difficult, including low signal-to-noise ratios and a scarcity of high-quality samples. Accurate decoding of MI-EEG signals is therefore essential for BCI systems to function. Deep learning approaches have recently been applied successfully in pattern recognition and other fields. However, few effective deep learning algorithms have been applied to BCI systems, especially MI-based systems. In this paper, we address these problems from two aspects based on the characteristics of EEG signals. First, we propose a combined time–frequency domain data augmentation method, designed to effectively increase the size of the training data while preserving the intrinsic structure of the data. Second, we design a parallel CNN that takes both raw EEG images and images transformed through the continuous wavelet transform (CWT) as inputs. We conducted classification experiments on a public data set to verify the effectiveness of the algorithm. On BCI Competition IV Dataset 2a, the average classification accuracy is 97.61%, and a comparison with other algorithms shows that the proposed algorithm achieves better classification performance. The algorithm can be used to improve the classification performance of MI-based BCIs, including BCI systems designed for people with disabilities.


Introduction
A brain–computer interface (BCI) is a system that enables communication between the human brain and external devices (such as a computer or other electronic devices) without any intermediaries. It allows the user to control a computer or smart device directly through signals generated by the brain, without the involvement of peripheral nerves and muscles. BCI research has considerable theoretical, military, medical, and recreational value [1][2][3][4]. Theoretically, it has the potential to offer unprecedented insight into the functioning of the human mind. For military purposes, it could enhance soldiers' efficiency in the field. In the medical realm, it could potentially be used to treat conditions previously considered untreatable, such as paralysis. Finally, it enables captivating leisure experiences, allowing users to immerse themselves in virtual settings and interact with them in new ways. Electroencephalography (EEG) is widely used to build BCIs because it is cost-effective, non-invasive, and portable [5]. At the same time, EEG signals are high-dimensional and weak, with a low signal-to-noise ratio [6]. Furthermore, MI-EEG is a non-linear and non-stationary signal, meaning that its statistical parameters (e.g., mean and variance) change over time [7].
Several scholars have recently used convolutional neural networks to extract features from EEG signals. This technique significantly decreases the number of connections … structure of the data. Second, we designed a parallel-input CNN that takes the raw EEG image and the continuous wavelet transform (CWT) image of the EEG as inputs, and we conducted classification experiments on a public data set to verify the effectiveness of the algorithm. This model not only emphasizes the main features of the original data but also preserves its other valuable features.

Methods
CWT and CNNs are both used in EEG analysis to extract essential features and patterns from EEG signals. The CWT provides a time–frequency representation of EEG signals that captures both their frequency content and how it evolves over time. This matters because EEG signals are non-stationary and transient, and the CWT captures these properties effectively. CNNs, on the other hand, are deep learning models well suited to image and signal analysis, which makes them a useful tool for EEG analysis. By learning complex patterns and relationships in EEG signals, CNNs can identify important EEG features, such as spikes and oscillations, that are indicative of brain activity and neurological conditions. Combining the two lets EEG analysis exploit both the time–frequency representations provided by the CWT and the ability of CNNs to identify intricate patterns, which can improve EEG analysis and deepen our understanding of brain activity and neurological conditions.

Continuous Wavelet Transformation (CWT)
The CWT is a widely used method for the time–frequency analysis of signals. It was proposed by Grossmann and Morlet in the 1980s. The CWT provides a time–frequency representation of a signal by decomposing it into wavelets of different scales, which are used to analyze the signal's different frequency components. This allows a more complete analysis than traditional frequency analysis methods such as the Fourier transform, which provide a single frequency representation of the whole signal and do not show when each frequency component occurs. The CWT has proven to be an effective tool for analyzing the frequency and temporal characteristics of EEG signals, which is crucial for understanding brain activity and diagnosing various neurological conditions [20][21][22].
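The continuous wavelet transform is defined in Equation (1) [23]:
$$W(a,\tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s(t)\, \varphi^{*}\!\left(\frac{t-\tau}{a}\right) \mathrm{d}t \qquad (1)$$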
where s(t) is the input signal, a is the scale of the wavelet transform, φ is the wavelet basis function, and τ is the time offset. Five commonly used wavelet basis functions are the Morlet, Mexican hat, Haar, Daubechies, and Symlet (symN) wavelets. We choose the Morlet wavelet as the wavelet basis function; its time-domain and frequency-domain expressions are given in Equations (2) and (3), respectively. Its parameters T and ω_c are determined by analyzing the data. We use the CWT to preprocess the original signal and then use the resulting time–frequency image as one of the inputs to the proposed CNN.
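The CWT preprocessing step can be illustrated with a minimal NumPy sketch. It assumes a common complex Morlet parameterization, φ(u) = (πT)^(-1/2) exp(-u²/T) exp(j2π ω_c u), with bandwidth T and centre frequency ω_c (in cycles per unit of u). The defaults T = 1.5 and ω_c = 1.0, the function names, and the analysis frequencies are hypothetical, not the values determined in this paper. The integral in Equation (1) is approximated by a discrete sum at the EEG sampling rate.

```python
import numpy as np

def complex_morlet(u, T=1.5, wc=1.0):
    """Assumed complex Morlet: Gaussian envelope (bandwidth T) times a complex
    exponential at centre frequency wc (cycles per unit of u)."""
    return np.exp(-u**2 / T) * np.exp(2j * np.pi * wc * u) / np.sqrt(np.pi * T)

def cwt_morlet(x, freqs, fs, T=1.5, wc=1.0):
    """Discrete approximation of W(a, tau) = a^(-1/2) * sum_t s(t) conj(phi((t - tau)/a)) / fs
    for a 1-D signal x sampled at fs Hz; one output row per analysis frequency."""
    x = np.asarray(x, dtype=float)
    n = x.size
    out = np.empty((len(freqs), n), dtype=complex)
    for row, f in enumerate(freqs):
        a = wc / f                                      # scale (s) whose centre frequency is f Hz
        m = int(np.ceil(4 * np.sqrt(T / 2) * a * fs))   # truncate the envelope at ~4 std
        t = np.arange(-m, m + 1) / fs
        kernel = np.conj(complex_morlet(t / a, T, wc))[::-1] / np.sqrt(a)
        full = np.convolve(x, kernel, mode="full")      # length n + 2m
        out[row] = full[m:m + n] / fs                   # align the kernel centre with tau
    return out

# Usage: 2 s of a synthetic 10 Hz rhythm sampled at 250 Hz.
fs = 250
t = np.arange(2 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)
scalogram = np.abs(cwt_morlet(x, freqs=[10.0, 25.0], fs=fs))
print(scalogram.shape)  # (2, 500)
```

The 10 Hz row dominates the 25 Hz row, as expected for a 10 Hz input. In a pipeline like the one described here, such a magnitude map computed per channel (for example over the 8–32 Hz range where ERD/ERS typically occurs) could be rendered as the CWT image; the exact frequency grid used by the authors is not stated.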

CNN
Deep learning is an essential and rapidly advancing branch of machine learning. It has gradually taken a leading role in many fields, attracting the attention of numerous scholars [24][25][26][27]. Among deep learning models, CNNs are widely used across various fields [28][29][30], and relevant research progress has also been made in BCI systems [31,32]. A typical CNN stacks multiple convolutional and pooling layers in the middle of the network, followed by fully connected layers.
The convolutional layer is the core of a CNN; its primary function is to extract features from the input signal by performing convolution operations on it. A single convolutional layer can contain more than one convolution kernel (also called a filter), and the weights and bias of each kernel are learned during training. Each convolution slides the kernel over the input and computes a weighted sum of the covered values, producing a feature map. Let (m, n) denote the position of a neuron in the feature map output by the kth convolution kernel. Its output is given by Equation (4), where I(m, n) is the input data, b is the bias, w_k^(i) is the kth convolution kernel of the ith layer, and f is the activation function of the neuron; common choices include tanh, sigmoid, and the rectified linear unit (ReLU) [33].
Connecting subsequent, deeper layers enables the extraction of more distinctive features. By stacking multiple convolutional layers, the features of high-dimensional input data can be extracted progressively, so that increasingly abstract features are obtained from the signal. Besides the convolution itself, a convolutional layer also involves operations such as padding and stride, which make the associated computation more complex.
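The operation behind Equation (4), including padding and stride, can be sketched in a few lines of NumPy: one kernel slides over a 2-D input, each output element is the weighted sum of the covered patch plus the bias, and the activation f (ReLU here) is applied. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation). The function name and arguments are illustrative, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv2d_single(I, w, b=0.0, stride=(1, 1), padding=(0, 0), f=relu):
    """Feature map of one kernel w over input I, followed by activation f."""
    ph, pw = padding
    I = np.pad(I, ((ph, ph), (pw, pw)))          # zero padding on both sides
    kh, kw = w.shape
    sh, sw = stride
    out_h = (I.shape[0] - kh) // sh + 1
    out_w = (I.shape[1] - kw) // sw + 1
    out = np.empty((out_h, out_w))
    for m in range(out_h):
        for n in range(out_w):
            patch = I[m * sh:m * sh + kh, n * sw:n * sw + kw]
            out[m, n] = np.sum(patch * w) + b     # weighted sum plus bias
    return f(out)

I = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[1.0, 0.0], [0.0, -1.0]])
# Every entry is I[m, n] - I[m + 1, n + 1] + 6 = -5 + 6 = 1.
print(conv2d_single(I, w, b=6.0))
```

Changing `stride` or `padding` changes only the output size, which is (L + 2P − K) // S + 1 along each axis.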
Pooling layers, also known as downsampling layers, reduce the height and width of the input feature map. Placed after a convolutional layer, a pooling layer downsamples its output to extract local features. This reduces the number of network parameters, and with it the computational cost and complexity of the model. It also makes the network more robust to small perturbations in the data and helps mitigate overfitting. There are two primary pooling operations. In max pooling, the feature value of the target region is the largest element in that region; this emphasizes local features and reduces the effect of small errors introduced by the convolutional layer. In average pooling, the feature value of the target region is the average of its elements; this retains more neighborhood information and reduces the regional error caused by the limited size of the convolution kernel. The max and average pooling expressions are given in Equations (8) and (9), assuming a pooling kernel of size N × N.
Max pooling:
$$y(m,n) = \max_{(i,j) \in R_{m,n}} x(i,j) \qquad (8)$$
Average pooling:
$$y(m,n) = \frac{1}{N^{2}} \sum_{(i,j) \in R_{m,n}} x(i,j) \qquad (9)$$
where x is the input feature map, y is the pooled output, and R_{m,n} is the N × N region of x covered by the pooling kernel at output position (m, n).
The fully connected layer is generally the final component of a CNN. As the feature maps produced by the intermediate layers pass through it, they are flattened into a vector, and the fully connected layer combines the previously extracted features through matrix multiplication. It maps the high-dimensional features extracted by the convolutional layers to the output space, completing the learning process with a non-linear mapping.
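Although the pooling layers are ultimately removed from the proposed model, the two pooling operations described above can be illustrated with a short NumPy sketch. It assumes non-overlapping N × N windows (stride N), which the text does not specify; the function name is hypothetical.

```python
import numpy as np

def pool2d(x, N, mode="max"):
    """Non-overlapping N x N pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // N, x.shape[1] // N
    blocks = x[:h * N, :w * N].reshape(h, N, w, N)  # blocks[m, i, n, j] = x[m*N + i, n*N + j]
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 0., 1., 1.],
              [0., 4., 1., 3.]])
print(pool2d(x, 2, "max"))  # block maxima: 4, 8 (top row), 4, 3 (bottom row)
print(pool2d(x, 2, "avg"))  # block means: 2.5, 6.5 (top row), 1.0, 1.5 (bottom row)
```

Reshaping into (h, N, w, N) and reducing over the two window axes avoids explicit loops.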
The robustness and strong generalization of CNN models are mainly attributed to design ideas such as the sparse connectivity of convolutional layers, weight sharing, downsampling in pooling layers, and the non-linear mapping of fully connected layers. Traditional recognition approaches must analyze and process a large number of samples before effective features can be extracted and classified. By contrast, CNN models learn the intrinsic features of the signal automatically through convolution and related operations [34]. CNNs therefore offer excellent feature extraction, and their layered network structure is comparatively easy to interpret.

Proposed CNN Structure
Related studies report that event-related desynchronization (ERD) and event-related synchronization (ERS) typically occur between 8 and 32 Hz during MI tasks [35,36]. In [37], the researchers designed a CNN with a parallel structure that takes three different frequency bands as input. They also used mixed convolution scales by assigning each frequency band to a convolutional layer with a different kernel scale. This parallel, multiscale feature extraction improves the accuracy of MI-EEG classification. In [38], the authors proposed a parallel multiscale filter bank CNN for MI classification, which takes time-domain images as input and uses convolution kernels at four different time scales for feature extraction to improve the performance, robustness, and transfer learning ability of the model. Most CNN applications use either time-domain or frequency-domain signals [9] as input. Because EEG signals exhibit continuous and complex variations, a model can hardly extract enough features from the time or frequency domain alone; we therefore propose a multi-input time–frequency CNN structure in this paper.
Because the EEG signal carries temporal, frequency, and spatial information, the proposed CNN is divided into two parts to extract more comprehensive temporal and spatial features. As shown in Figure 1, the input to the left half of the model is a time-domain image. This branch mainly modifies the convolution kernel sizes of the original model [31]. Its first and second convolutional layers are 1-D convolutional layers with kernel sizes of 1 × 3 and 10 × 1, both with a stride of 2 × 1; they learn the spatial information across channels and the temporal features, respectively. The input to the right half of the model is the CWT-mapped image. After the wavelet transform, the overall and local features of the EEG signal may become more evident, allowing the network to extract features more effectively [34]. The two convolutional layers of this branch contain 32 and 64 kernels, respectively, each of size 5 × 5 with a stride of 2 × 2. The 2-D feature maps extracted by the left and right branches are each converted into a 1-D vector by a fully connected layer. We then concatenate the 1-D features from the two branches into a single vector, which serves as the input of the classifier; in this work, the classifier is a softmax function. ReLU is used as the activation function because it tends to speed up CNN training while improving classification accuracy. To prevent overfitting, L2 regularization is applied with a regularization parameter of 0.01, and the Adam optimizer is used for optimization. The initial learning rate is set to 0.1 and is adjusted automatically during training. The batch size and the number of epochs are set to 64 and 100, respectively. An early stopping mechanism is also applied during training.
Sensors 2023, 23, 1932
Training is stopped when the validation loss has not decreased for 5 consecutive epochs. This prevents the model from continuing to fit the training data and keeps the best-performing weights, which generalize better to new data.
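The early stopping rule can be sketched as a small, framework-independent helper: it tracks the best validation loss seen so far, keeps a copy of the corresponding weights, and signals a stop once the loss has failed to improve for `patience` = 5 consecutive epochs. The class name, the `min_delta` tolerance, and the way weights are stored are assumptions for illustration; frameworks offer comparable behavior, for example the Keras `EarlyStopping` callback with `patience=5` and `restore_best_weights=True`.

```python
import copy

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_weights = None
        self.epochs_without_improvement = 0

    def step(self, val_loss, weights):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_weights = copy.deepcopy(weights)  # keep the best-performing weights
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience

# Usage with a made-up validation-loss curve.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74, 0.73, 0.9]
stopper = EarlyStopping(patience=5)
for epoch, loss in enumerate(losses):
    if stopper.step(loss, weights={"epoch": epoch}):
        break
print(epoch, stopper.best_loss, stopper.best_weights)
```

Here the best loss occurs at epoch 2, and training stops at epoch 7 after five epochs without improvement.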
Although the pooling layer of a CNN reduces the number of network parameters, some useful features might be lost. In [39], a simplified CNN was proposed to classify MI-EEG signals. In this paper, the pooling layers are removed from the standard CNN to simplify the network structure and prevent the loss of effective features.
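Because no pooling layers are used, the feature-map sizes in each branch depend only on the kernel sizes and strides given above. The sketch traces them with the standard output-size formula ⌊(L + 2P − K)/S⌋ + 1. It assumes no padding (P = 0), a 2 s time-domain input of 500 samples × 22 electrodes with time along the first axis, and a hypothetical 64 × 64 CWT image; the padding, the CWT image size, and the axis layout are not stated in the text.

```python
def conv_out(length, kernel, stride, padding=0):
    """Output length of a convolution along one axis."""
    return (length + 2 * padding - kernel) // stride + 1

def trace(shape, layers):
    """Apply conv_out on both axes for each (kernel, stride) pair; return every shape."""
    shapes = [shape]
    for (kh, kw), (sh, sw) in layers:
        h, w = shapes[-1]
        shapes.append((conv_out(h, kh, sh), conv_out(w, kw, sw)))
    return shapes

# Kernel sizes and strides as given in the text; the input sizes are assumptions.
time_branch = [((1, 3), (2, 1)), ((10, 1), (2, 1))]  # spatial kernel, then temporal kernel
cwt_branch = [((5, 5), (2, 2)), ((5, 5), (2, 2))]    # 32 and 64 kernels

print(trace((500, 22), time_branch))  # [(500, 22), (250, 20), (121, 20)]
print(trace((64, 64), cwt_branch))    # [(64, 64), (30, 30), (13, 13)]

h, w = trace((64, 64), cwt_branch)[-1]
print(h * w * 64)  # flattened CWT-branch length before its fully connected layer: 10816
```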

Data Augmentation
The EEG signal collected by each electrode is commonly treated as 1-D data, unlike in computer vision, where 2-D images are often rotated, cropped, deformed, scaled, or subjected to other data augmentation (DA) methods.

Current EEG data augmentation techniques fall into two main categories: data transformation [18,40,41] and noise addition [16,42].
Analysis of the essential features of EEG signals shows that they have distinctive characteristics in both the time and frequency domains. In this work, the DA algorithm is therefore designed in both domains while preserving the original characteristics of the EEG signal as much as possible.
The proposed method consists of two steps, a time-domain transformation and a frequency-domain transformation, as shown in Figures 2 and 3.
• Sample 1, sample 2, and sample 3 are three randomly selected samples of the same class. A randomly chosen 1 s segment of sample 2 replaces the data at the same time position in sample 1; Figure 2 shows the generation of a time-domain sample. The same procedure is applied to the test set.
• The artificial time-domain EEG sample and sample 3 are each divided into two frequency bands, 7–13 Hz and 14–30 Hz, by band-pass filtering. One frequency band of sample 3 is then exchanged with the corresponding band of the artificial time-domain sample to reconstruct a time–frequency EEG signal.
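The two augmentation steps above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation. It assumes trials shaped (samples × electrodes) at 250 Hz, replaces the unspecified band-pass filters with an idealized FFT-domain band mask (artificial − band(artificial) + band(sample 3)), and picks which of the two bands to exchange at random, which the text does not specify. All function names are hypothetical.

```python
import numpy as np

FS = 250                     # sampling rate of the data set (Hz)
BANDS = [(7, 13), (14, 30)]  # the two bands named in the text (Hz)

def time_domain_swap(s1, s2, rng, seg_sec=1.0, fs=FS):
    """Step 1: copy a random 1 s block of sample 2 into the same position of sample 1."""
    seg = int(seg_sec * fs)
    start = rng.integers(0, s1.shape[0] - seg + 1)
    out = s1.copy()
    out[start:start + seg] = s2[start:start + seg]
    return out

def band_swap(x, donor, band, fs=FS):
    """Step 2: replace the `band` content of x with that of donor (ideal FFT band mask)."""
    n = x.shape[0]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    X = np.fft.rfft(x, axis=0)
    X[mask] = np.fft.rfft(donor, axis=0)[mask]
    return np.fft.irfft(X, n=n, axis=0)

def augment(s1, s2, s3, rng):
    """One artificial trial from three randomly chosen trials of the same class."""
    artificial = time_domain_swap(s1, s2, rng)
    band = BANDS[rng.integers(len(BANDS))]  # assumption: the exchanged band is chosen at random
    return band_swap(artificial, s3, band)

rng = np.random.default_rng(0)
s1, s2, s3 = rng.standard_normal((3, 625, 22))  # stand-ins for same-class trials
new_trial = augment(s1, s2, s3, rng)
print(new_trial.shape)  # (625, 22)
```

An ideal FFT mask keeps the sketch dependency-free; a practical implementation would more likely use zero-phase IIR or FIR band-pass filters, whose transition bands blur the exact band edges.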

Results
All experimental results were obtained on a computer equipped with an Intel Core i5-7300HQ CPU and a GeForce GTX 1050Ti GPU.

Database
Since 2001, several international BCI competitions have been held to provide researchers in this area with reliable data sources and uniform standards for evaluating detection algorithms. To test the performance of the proposed method, we selected a public data set, the 2008 BCI Competition IV Data Set 2a [43].
Figure 4 shows the acquisition process for this data set. Each trial is divided into three parts. The first 3 s of a trial are the preparation phase. Directional arrow prompts appear on the screen, accompanied by an audible alert. Subjects follow the on-screen prompts and perform motor imagery from the 3rd to the 6th second, after which they rest and wait for the cue to start the next trial. Because subjects may not respond immediately when the cue appears, we extracted the segment from 3.5 s to 6 s, which yields 625 data points per original sample. Each raw sample therefore has the format 625 × 22, where 625 is the number of data points in 2.5 s (sampled at 250 Hz) and 22 is the number of electrodes.
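The windowing arithmetic can be checked with a short sketch. It assumes each trial is available as an array of shape (samples × 22 electrodes) at 250 Hz with t = 0 at the start of the trial; the function name and the 7 s trial length are hypothetical.

```python
import numpy as np

FS = 250  # Hz

def extract_mi_window(trial, start_s=3.5, end_s=6.0, fs=FS):
    """Return the samples from start_s (inclusive) to end_s (exclusive)."""
    start, end = int(round(start_s * fs)), int(round(end_s * fs))
    return trial[start:end]

trial = np.zeros((7 * FS, 22))  # a hypothetical 7 s trial with 22 electrodes
window = extract_mi_window(trial)
print(window.shape)  # (625, 22): 2.5 s x 250 Hz = 625 samples
```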

A total of 576 trials were collected from each subject. We repeated the proposed DA process 1000 times to obtain 1000 new samples, increasing the number of samples per subject from 576 to 1576. Figures 5 and 6 show the spectral power before and after augmentation. The red line is the spectrum of the original data, and the blue line is the spectrum after augmentation. The spectra of C3, Cz, and C4, the electrodes most closely associated with motor imagery, are shown as examples. The comparison shows that the characteristics of the original EEG signal are largely retained after augmentation. The proposed DA method increases the energy of the replaced band while keeping the spectral ratio between the signal and its replaced band relatively constant, which may help improve the signal-to-noise ratio and better reveal the characteristics of the signal.
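A spectral comparison like the one in Figures 5 and 6 can also be summarized numerically by averaging the power spectra of original and augmented trials for one channel and comparing band powers. The sketch uses a plain periodogram computed with the NumPy FFT rather than whatever estimator produced the figures, and synthetic stand-in data in place of real and augmented trials; the function names are hypothetical.

```python
import numpy as np

FS = 250

def mean_power_spectrum(trials, fs=FS):
    """Average periodogram over trials for one channel; trials: (n_trials, n_samples)."""
    n = trials.shape[1]
    spec = np.abs(np.fft.rfft(trials, axis=1)) ** 2 / (fs * n)
    return np.fft.rfftfreq(n, d=1.0 / fs), spec.mean(axis=0)

def band_power(freqs, psd, lo, hi):
    """Sum of spectral power between lo and hi Hz (inclusive)."""
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].sum()

rng = np.random.default_rng(1)
t = np.arange(625) / FS
original = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal((576, 625))
augmented = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.standard_normal((1000, 625))

ratios = {}
for name, x in [("original", original), ("augmented", augmented)]:
    f, p = mean_power_spectrum(x)
    ratios[name] = band_power(f, p, 7, 13) / band_power(f, p, 14, 30)
print(ratios)
```

Similar 7–13 Hz to 14–30 Hz power ratios for the two sets indicate that the relative band structure is preserved.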

Performance Evaluation Metrics
In this section, we introduce several practical metrics to evaluate the performance of the proposed method. The accuracy (Acc) is calculated as:
$$Acc = \frac{T_P + T_N}{T_P + T_N + F_P + F_N}$$
where T_P, T_N, F_P, and F_N represent true positives, true negatives, false positives, and false negatives, respectively.

The other three important metrics, Precision, Recall, and F1 score, can be expressed as:
$$Precision = \frac{T_P}{T_P + F_P}$$
$$Recall = \frac{T_P}{T_P + F_N}$$
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
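For the four-class task, these counts can be obtained one-vs-rest from a confusion matrix, as in the minimal NumPy sketch below; the overall accuracy then equals the trace of the matrix divided by the number of trials. The function names and labels are illustrative, and averaging the per-class values (macro averaging) is an assumption, since the text does not say how per-class values are aggregated.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for true, pred in zip(y_true, y_pred):
        cm[true, pred] += 1  # rows: true class, columns: predicted class
    return cm

def metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class k but belonging to another class
    fn = cm.sum(axis=1) - tp  # belonging to class k but predicted as another class
    accuracy = tp.sum() / cm.sum()
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)
    return accuracy, precision, recall, f1

# Made-up labels: 0 left hand, 1 right hand, 2 feet, 3 tongue.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
acc, prec, rec, f1 = metrics(confusion_matrix(y_true, y_pred, 4))
print(acc, prec.mean(), rec.mean(), f1.mean())
```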

Determining the Length of Data Segments
Data segmentation with overlapping windows is a common way to increase the amount of data and can significantly improve classification accuracy [44]. It can also reduce resource consumption during network computation: it decreases the input size as well as the amount of data that must be transmitted and stored during training and inference, which improves performance and reduces costs. These methods are therefore valuable optimization tools. We used windows with 50% overlap, meaning that half of the data in each segment is shared with the previous segment. We then determined the segment length. Figure 7 compares the accuracy for different segment lengths: the 0.5 s window yields the lowest classification accuracy, whereas the 2 s and 3 s windows yield similar accuracy. This suggests that a 2 s segment already contains sufficient MI information, so all subsequent experiments use a 2 s time window to reduce resource use.
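Overlapping segmentation can be sketched as follows. The function name is hypothetical, and the example input is assumed to be one 625-sample trial (22 electrodes at 250 Hz) as described in the Database section; the number of segments per trial depends directly on the window length and step.

```python
import numpy as np

def sliding_windows(trial, win_s, overlap=0.5, fs=250):
    """Split a (samples x channels) trial into win_s-second windows with the given overlap."""
    win = int(round(win_s * fs))
    step = max(1, int(round(win * (1 - overlap))))
    starts = range(0, trial.shape[0] - win + 1, step)
    return np.stack([trial[s:s + win] for s in starts])

trial = np.zeros((625, 22))  # one 2.5 s trial, 22 electrodes
for win_s in (0.5, 1.0, 2.0):
    print(win_s, sliding_windows(trial, win_s).shape)
```

With 50% overlap, each window starts half a window after the previous one, so consecutive segments share half of their samples.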

Performance of the Proposed Model
To evaluate the proposed method, we compared our results with several state-of-the-art models; Table 1 summarizes the compared studies. We retrained two highly cited models (VGGNet and EEGNet) on the same dataset to evaluate their performance, and the comparison results are shown in Table 2, where methods with the suffix _A use DA techniques. The proposed method achieves an average accuracy of 97.61%, the highest among the compared methods. In addition, our DA method improves accuracy to varying degrees, by up to 4.41% in average accuracy and up to 11.15% for an individual subject.
Table 3 compares our results with those reported in other articles [37,46–48]. Our method achieves relatively high classification accuracy among the state-of-the-art methods, with large improvements for subject 2 (up to 18.01%) and subject 6 (up to 19.45%). Notably, comparing Tables 2 and 3, the classification performance of the proposed method varies little across subjects: the maximum difference between subjects is 5%, versus 32% for VGGNet and 12% for DWT-CNN. This suggests that the proposed method reduces the effect of inter-subject EEG variability and can automatically learn each subject's EEG activity patterns for the four motor imagery classes. Other metrics can also be used to evaluate the proposed model: the confusion matrices are given in Figures 8 and 9, and Table 4 reports the precision and recall of the proposed model on BCI Competition IV dataset 2a.

Conclusions
Because of the complexity of EEG signals, automatic classification accuracy is usually limited, and individual differences are significant when training data are scarce. Another challenge is that existing CNN-based methods for MI-EEG classification extract features from a single domain. This limits classification performance, because MI classification requires jointly decoding the time-domain, frequency-domain, and spatial information of EEG signals. We propose a hybrid CNN architecture combined with a data augmentation approach. Using images from different domains as CNN inputs while increasing the amount of training data improves the accuracy of EEG motor imagery classification.
The optimal window size may vary from dataset to dataset, so tuning the size of the sliding window can improve accuracy. For the 2008 BCI Competition IV 2a dataset, a window size of 2 s meets our requirements: it reduces the consumption of computational resources while maintaining acceptable accuracy. In our experiments, the DA method improved the accuracy of VGGNet, EEGNet, and our proposed model. Overall, our proposed MI-EEG image classification method achieves an average accuracy of 97.61%. Its main improvement is the design of two CNN branches at different scales for the time-domain images and the CWT maps, which makes feature extraction more comprehensive, and it achieves the highest average classification accuracy among the compared methods. This indicates the potential of our DA technique to help improve the performance of machine learning models; we believe it could be applied to other machine learning and MI-EEG analysis tasks, and we look forward to exploring its capabilities further. In the future, we aim to design a specialized inference accelerator for this model that can be integrated into reconfigurable devices such as field-programmable gate arrays (FPGAs). However, this goal presents several design challenges, particularly for the parallel CNN architecture: both the CNN and the CWT mapping require significant computational resources, which may lead to insufficient memory or slow inference on the target devices. To achieve our goal, we need to overcome these difficulties with solutions that make effective use of the available resources while maintaining performance.