A Model for EEG-Based Emotion Recognition: CNN-Bi-LSTM with Attention Mechanism

Emotion analysis is a key technology in human–computer emotional interaction and has gradually become a research hotspot in the field of artificial intelligence. The key problems of EEG-based emotion analysis are feature extraction and classifier design. Existing methods of emotion analysis mainly use machine learning and rely on manually extracted features. As an end-to-end method, deep learning can extract EEG features and classify them automatically. However, most deep learning models for EEG-based emotion recognition still require manual screening and data pre-processing, and their accuracy and convenience are not high enough. Therefore, this paper proposes a CNN-Bi-LSTM-Attention model to automatically extract features and classify emotions based on EEG signals. The raw EEG data are used as input, a CNN and a Bi-LSTM network are used for feature extraction and fusion, and the electrode channel weights are then balanced through an attention mechanism layer. Finally, the EEG signals are classified into different kinds of emotions. An EEG-based emotion classification experiment is conducted on the SEED dataset to evaluate the performance of the proposed model. The experimental results show that the proposed method can classify EEG emotions effectively. The method was assessed on two distinct classification tasks, one with three and one with four target classes. The average ten-fold cross-validation classification accuracies are 99.55% and 99.79% for the three- and four-class tasks, respectively, which is significantly better than the other methods. It can be concluded that our method is superior to existing methods of emotion recognition and can be widely used in many fields, including modern neuroscience, psychology, neural engineering, and computer science.


Introduction
Brain-computer interface (BCI) technology makes it possible for the brain to connect directly to peripheral devices and has a huge impact on people's daily lives. Emotion recognition is an important research topic in human-computer interaction and a key technology of computer intelligence [1]. EEG emotion recognition, as an important BCI method, has been widely used in interpersonal communication, decision making, and mental illness diagnosis. In practical projects, it helps human-machine interaction become more friendly [2], and machines can understand emotions and interact with humans [3,4]. In medical research, it contributes to the diagnosis and treatment of various psychiatric disorders, such as depression and autism spectrum disorders [5,6]. In the field of education, it helps to track and improve the learning efficiency of students [7,8]. As a result, emotion recognition has become an integral part of our daily lives.
In recent years, researchers in the fields of machine learning and emotional computing have been working on emotional expression analysis based on visual and physical signals.
The most common signals include facial expressions [9], speech [10], posture [11], magnetoencephalography (MEG) [12], electroencephalogram (EEG) [13], and electrocardiogram [...] important features of the entire EEG record. Note that the attention mechanism layer can set the weight coefficients of each channel to distinguish the differences between them. This kind of weighted operation makes better use of important information and improves the recognition performance of the model.
The innovations of this study, compared to previous studies, are as follows: (1) It uses raw EEG signals without any preprocessing, facilitating their application in brain-computer interfaces.
(2) It introduces a Bi-LSTM, which performs better than LSTM, to mitigate gradient explosion and vanishing gradients. (3) It introduces attention mechanisms into the deep learning framework to solve the problem of manual selection of electrode channel characteristics. (4) The model is applied to the SEED and SEED-IV datasets, and the results show that it has very good generalization performance.

SEED Dataset
The SJTU emotion EEG dataset (SEED) was provided by Professor Lu of the BCMI laboratory at Shanghai Jiao Tong University. SEED was made from EEG recordings of 15 subjects [38,39]. During the experiment, 15 Chinese film clips covering positive, neutral, and negative emotions were selected as the stimuli. The selection criteria for the movie clips were as follows: (1) The length of the whole experiment should not be too long, so as not to fatigue the subjects. (2) The video can be understood without explanation.
(3) The video should elicit a single target emotion.
The duration of each film clip is about 4 min. Each film clip was carefully edited to create coherent emotional triggers and maximize emotional meaning. There were 15 trials per experiment in total. Each clip was preceded by a 5 s cue and followed by 45 s for self-assessment and a 15 s rest. The order was arranged so that two film clips targeting the same emotion were never shown consecutively. For feedback, participants were asked to report their emotional responses to each movie clip by completing a questionnaire immediately after watching it. SEED-IV [40] contains EEG recordings of 15 subjects, with 72 movie clips carefully selected over three experiments to induce feelings of happiness, sadness, fear, or neutrality. To test the stability and portability of the model for emotion recognition, we conducted the experiment on the SEED and SEED-IV datasets, respectively.
SEED and SEED-IV EEG data were collected from 62 channels at a sampling frequency of 200 Hz. To filter noise and remove artifacts, a band-pass filter of 0–75 Hz was applied. The film clips used in the SEED and SEED-IV experiments are shown in Tables 1 and 2. Due to the sheer volume of data, we extracted 1000 consecutive data points per subject from the middle of each video segment, so SEED and SEED-IV yielded 15 subjects × 15 videos × 1000 EEG data points = 225,000 samples and 15 subjects × 24 videos × 1000 EEG data points = 360,000 samples, respectively. The EEG acquisition process is shown in Figure 1. EEG signals and eye movements were collected using the 62-channel ESI NeuroScan system and SMI eye-tracking glasses.
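The sample counts above follow directly from the extraction scheme; a quick arithmetic check:

```python
# Sanity check of the dataset sizes quoted above (pure arithmetic,
# mirroring the subjects x clips x 1000-sample extraction).
subjects = 15
seed_clips = 15        # SEED: 15 film clips per subject
seed_iv_clips = 24     # SEED-IV: 24 clips per session (72 over 3 sessions)
samples_per_clip = 1000

seed_total = subjects * seed_clips * samples_per_clip       # 225,000
seed_iv_total = subjects * seed_iv_clips * samples_per_clip # 360,000
```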

Data Pre-Processing
The SEED includes files containing the differential entropy (DE) features of the extracted EEG signal, which are ideal for those who want to quickly test classification methods without preprocessing the raw EEG data. SEED-IV filtered out noise and artifacts independent of EEG features with linear dynamic systems (LDSs) and moving averages. The SJTU emotion EEG dataset was officially pre-processed, and the original dataset was reconstructed. However, to test the superiority of our model, the raw EEG data were used to perform the experiments without any preprocessing. Data preprocessing in this study only normalized the officially provided EEG signal data, with the aim of reducing the computational load while increasing the convergence rate of the model.
Normalization was performed by dividing the raw EEG signal by the maximum of each channel to ensure the same distribution of data across the input layers. Converting the data labels into one-hot encoding transforms categorical data into a unified numerical format, facilitating processing and computation by machine learning algorithms. The pre-processed data are divided into training and test sets as input to the deep learning model.
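A minimal sketch of the normalization and one-hot encoding steps described above (the array values and the per-channel max-absolute scaling are illustrative assumptions, not the authors' exact pipeline):

```python
import numpy as np

# Hypothetical shapes: eeg has shape (n_samples, n_channels); labels in {0, 1, 2}.
eeg = np.array([[10.0, -4.0], [5.0, 8.0], [-2.0, 2.0]])
labels = np.array([0, 2, 1])

# Divide each channel by its maximum absolute value so every channel
# shares the same scale at the input layer.
eeg_norm = eeg / np.abs(eeg).max(axis=0, keepdims=True)

# One-hot encode the class labels.
n_classes = 3
one_hot = np.eye(n_classes)[labels]
```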

CNN-Bi-LSTM-Attention Model
There are significant correlations in the temporal dimension of EEG signals. Bi-LSTM is well suited to extracting temporal features and processing data with sequential structure but cannot extract spatial features, so a CNN was also introduced. In this paper, the CNN-Bi-LSTM model is extended with attention mechanisms, widely used in the field of computer vision, to improve prediction accuracy while taking into account spatial characteristics and the temporal dimension as well as electrode channel selection. The presented model consists of an input layer, a convolution layer, a Bi-LSTM layer, an attention mechanism layer, two fully connected layers, and an output layer. A pooling layer can reduce dimensions, but some features may be lost, reducing the classification accuracy of the model, so it was not included in the proposed model. The normalized data are fed into the CNN, which extracts spatial features; the output of the CNN is then put into the Bi-LSTM to extract temporal features. The extracted features are passed to the attention mechanism layer, which calculates and assigns a weight value to each feature; the features are then further extracted and reduced in dimension through two fully connected layers, and the final result is classified using the Softmax function. The network structure of the CNN-Bi-LSTM-Attention model is shown in Figure 2.

Convolutional Neural Network
A convolutional neural network (CNN) is a kind of deep neural network mainly used in image classification, object detection, segmentation, and so on. Usually, a CNN consists of several types of layers, including a convolutional layer, rectified linear unit (ReLU), pooling layer, dropout layer, and fully connected layer (FC). The convolutional layer is the basis of the CNN. In this layer, the signals or features of the previous layer are convolved with a sliding kernel to extract new features. The ReLU layer introduces nonlinearity into the feature mapping by applying the activation function f(x) = max(0, x). The pooling layer reduces the dimension of the feature map by sliding a window and calculating the mean, maximum, or sum of the values within the window. The dropout layer sets input elements to zero with a given probability to reduce over-fitting. The fully connected layer operates on a flattened column vector and is commonly used in the later layers of a convolutional neural network for classification tasks. Each node in the fully connected layer is connected to all the nodes of the previous layer, and the layer synthesizes the features extracted by the preceding layers. Due to this fully connected structure, the fully connected layer generally has the most parameters. It can also achieve dimension reduction by mapping high-dimensional features to a low-dimensional space.
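The convolution, ReLU, and pooling operations described above can be sketched on a toy 1-D signal (all values are illustrative; note that the proposed model itself omits pooling):

```python
import numpy as np

# Toy 1-D convolution followed by ReLU, mirroring the conv + ReLU
# layers described above. Signal and kernel values are illustrative.
signal = np.array([1.0, -2.0, 3.0, 0.5, -1.0])
kernel = np.array([0.5, 1.0, -0.5])

# Valid sliding-window dot products (np.correlate does not flip the
# kernel, matching the sliding-kernel description in the text).
feature = np.correlate(signal, kernel, mode="valid")

relu = np.maximum(0.0, feature)  # f(x) = max(0, x)

# Max pooling with window 2 and stride 2 shrinks the feature map.
pooled = relu[: len(relu) // 2 * 2].reshape(-1, 2).max(axis=1)
```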

Bi-Directional Long Short-Term Memory
EEG signals are typical time-series data that usually have significant temporal correlations; that is, the output at some point is closely related to the past state. To model this sequence structure, an RNN was introduced. The RNN introduces recurrent connections in the temporal dimension, adding hidden-state links between different time points so that the entire neural network can model the relationships between the elements of a sequence. However, the RNN has a long-term dependence problem due to vanishing or exploding gradients. LSTM can solve these problems.
The key idea of the LSTM model is the "cell state", which resembles a conveyor belt. Along the conveyor belt, there are only a few linear interactions. LSTM introduces an internal mechanism called a "gate" that regulates the flow of information. These gate structures can learn which data in a sequence are important information to retain and which to delete. LSTM has three gates to protect and control the cell state. The forget gate determines how much of the previous cell state passes through the current LSTM unit. The input gate updates the cell state using information from the current input and the previous hidden state. The output gate controls the selective output of the current cell state. These functions enable LSTM to learn temporal relationships in long sequences.
The LSTM model is shown in Figure 3. In the LSTM model, the first step is to decide what information to discard from the cell, which is performed using the forget gate. The layer reads the current input x_t and the previous hidden state h_{t−1}, and f_t decides which information to discard. An output of 1 means "fully retained" and 0 means "completely discarded". The second step, which consists of two layers, determines the new information to store in the cell state. The sigmoid layer acts as the input gate, determining the values i_t to update. The tanh layer creates a new candidate value vector C̃_t to join the state. The third step updates the old cell state C_{t−1} to C_t. We multiply the old state by f_t, discarding the information that is not needed, and then add i_t × C̃_t. These are the new candidate values, scaled by how much we decided to update each state. The final step determines the output, which is based on the cell state but in a filtered version. First, we run a sigmoid layer to determine which parts of the cell state will be output. Then, we pass the cell state through tanh (to obtain a value between −1 and 1) and multiply it by the output of the sigmoid gate, so that only the selected portion is output. The mathematical expression of the LSTM unit is defined as shown in Equations (1)-(6).

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)  (1)
i_t = σ(W_i · [h_{t−1}, x_t] + b_i)  (2)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)  (3)
C_t = f_t × C_{t−1} + i_t × C̃_t  (4)
o_t = σ(W_o · [h_{t−1}, x_t] + b_o)  (5)
h_t = o_t × tanh(C_t)  (6)
However, both traditional RNN and LSTM pass messages in a single direction along the conveyor belt, which is limiting in many tasks, such as lexical tagging, where a word is related not only to the preceding word but also to the following one. To solve this problem, two LSTM networks are used in this paper, together known as a bi-directional long short-term memory network (Bi-LSTM), comprising a forward and a backward LSTM. The Bi-LSTM neural network structure is divided into two independent LSTMs, and the input sequence is presented to the two LSTM networks in opposite orders (one positive order and one negative order). The arrows over h_t represent the LSTM in the forward and backward directions. The Bi-LSTM network structure is shown in Figure 4.
The entire output h_t of the Bi-LSTM can be calculated using Equation (7). In the Bi-LSTM, the feature data obtained at moment t contain both past and future information. Compared with the single LSTM structure, Bi-LSTM is more efficient at extracting EEG signal features. Bi-LSTM can make use of both early and late sequence information, which helps to explore deep brain information from long EEG sequence signals.
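A minimal NumPy sketch of the LSTM gate computations and the Bi-LSTM forward/backward concatenation described above (the weight initialization, dimensions, and packed gate layout are illustrative assumptions, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W packs the four gate weight matrices [W_f; W_i; W_C; W_o],
    # each applied to the concatenated [h_{t-1}, x_t] (Equations (1)-(6)).
    z = W @ np.concatenate([h_prev, x_t]) + b
    H = h_prev.size
    f = sigmoid(z[:H])               # forget gate
    i = sigmoid(z[H:2 * H])          # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])  # candidate cell state
    o = sigmoid(z[3 * H:])           # output gate
    c = f * c_prev + i * c_tilde     # new cell state
    h = o * np.tanh(c)               # new hidden state
    return h, c

def lstm_run(seq, W, b, H):
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in seq:
        h, c = lstm_step(x_t, h, c, W, b)
        outs.append(h)
    return np.array(outs)

# Bi-LSTM: one pass forward, one backward, then concatenate the two
# hidden states at each time step (Equation (7)).
D, H, T = 3, 4, 5  # input dim, hidden dim, sequence length (illustrative)
seq = rng.standard_normal((T, D))
W_f, b_f = rng.standard_normal((4 * H, H + D)) * 0.1, np.zeros(4 * H)
W_b, b_b = rng.standard_normal((4 * H, H + D)) * 0.1, np.zeros(4 * H)

h_fwd = lstm_run(seq, W_f, b_f, H)
h_bwd = lstm_run(seq[::-1], W_b, b_b, H)[::-1]  # re-reverse to align steps
h_bi = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2*H)
```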


Attention Mechanism
The attention mechanism is a kind of resource allocation mechanism that simulates the human brain. When the human brain processes things, it tends to focus more on what is important and pays less or no attention to other areas, thus obtaining the detailed information that needs attention, suppressing useless information, and ignoring irrelevant information while amplifying the information that matters. The attention mechanism breaks the limitation of the traditional encoder-decoder structure, which relies on a fixed-length vector in the code. To improve model accuracy, we used the attention mechanism to focus on the properties that have a significant impact on the output variables, leveraging the most decisive information in EEG sequences.
In this paper, a dot-product attention mechanism is used to compute a weighted sum of the hidden-layer vectors output by the Bi-LSTM. By applying the attention mechanism after the feature extraction model, we can focus on the features that affect the output variables and improve the accuracy of the method. Dot-product attention consists of three parts: a learned key matrix K, a value matrix V, and a query vector q [41]. The key matrix K is obtained via Equation (8).
where W_a is a randomly initialized weight matrix. After that, with the key matrix determined, the similarity between the query vector and each key value is calculated to obtain a normalized probability vector d, that is, the weight vector, as shown in Equation (9).
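A sketch of the dot-product attention weighting described above (the tanh projection used for K, the dimensions, and the random initialization are assumptions standing in for the paper's exact Equations (8) and (9)):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

# Hidden states from the Bi-LSTM: one row per time step (sizes illustrative).
T, D = 6, 8
H_mat = rng.standard_normal((T, D))

W_a = rng.standard_normal((D, D)) * 0.1  # randomly initialized weight matrix
q = rng.standard_normal(D)               # learned query vector

K = np.tanh(H_mat @ W_a)  # key matrix (our assumed form of Equation (8))
d = softmax(K @ q)        # normalized weight vector d (Equation (9))
context = d @ H_mat       # attention-weighted sum of hidden states
```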

Fully Connected Layer (FC)
The fully connected layer operates on a flattened column vector and is used in the later layers of deep neural networks for classification tasks. Each node in the FC layer is connected to all the nodes of the previous layer and synthesizes the features extracted earlier in the network. Because of this fully connected structure, the FC layer also has the most parameters. The fully connected layer can also map features to a lower-dimensional space to reduce dimensionality.

Classifying
Softmax is a very common function in machine learning, especially deep learning, particularly in multi-class scenarios. It maps each input to a real number between 0 and 1. In a multi-classification problem, we need the classifier to output the probability of each class, and for the probabilities to be comparable, they should sum to 1. Therefore, the Softmax function is used in this paper.
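The Softmax mapping can be illustrated in a few lines (the logit values are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is a
    # probability vector: entries in (0, 1) that sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # illustrative class scores
probs = softmax(logits)
pred = int(np.argmax(probs))        # predicted emotion class
```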

Evaluation Indexes
In this paper, we evaluated the validity and robustness of our model from different perspectives using indicators common in EEG emotion classification, namely five metrics: accuracy, precision, recall, F1-score, and Matthews correlation coefficient (MCC). Here, true positives, false negatives, true negatives, and false positives are denoted by TP, FN, TN, and FP, respectively [42].
Accuracy: the proportion of correct predictions among all positive and negative cases. Precision: the proportion of correctly predicted samples among those predicted positive. Recall: the proportion of correctly predicted samples among all actual positive samples. F1-score: the harmonic mean of precision and recall. MCC is essentially a coefficient describing the correlation between the actual and predicted classifications; it ranges from 1 for a perfect prediction, through 0 for a prediction no better than random, to −1 for complete disagreement between the predicted and actual classifications. These assessment metrics are calculated as in Equations (11)-(15).
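These five metrics can be computed directly from the confusion counts (the counts below are illustrative, not experimental results):

```python
import math

# Binary confusion counts (illustrative values).
TP, FN, TN, FP = 40, 10, 45, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
```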

Experimental Setup
In this experiment, the dataset was split into 80% for training and 20% for testing. Because the dataset is large enough, the accuracy on the test set remains stable. Training ran for 200 epochs with a batch size of 1024, using the Adam optimizer and categorical cross-entropy as the loss function. To ensure consistency of the data used in the training and test sets, all pre-training data were shuffled with the same fixed random seeds before being fed to the network models. CNN-Bi-LSTM-Attention and the other comparison network models were implemented and trained using Python 3.7 and TensorFlow 2.3 on a GeForce RTX 2080Ti. Table 3 shows the parameter settings of the CNN-Bi-LSTM-Attention model.

Recognition Results of Three and Four Classification Task
To validate the classification performance of the CNN-Bi-LSTM-Attention model in EEG detection, we compared it to combinations of DNN, CNN, depthwise separable convolutional neural networks (DSCNNs), LSTM, and Bi-LSTM. In addition, 1D CAE is a two-layer convolutional autoencoder, and 1D InceptionV1 is a model in which the two-dimensional convolution kernels are replaced with one-dimensional kernels. We also compared them to six traditional machine learning models: Adaboost, Bayes, Decision Tree, KNN, Random Forest, and XGBoost, all using the same random seeds to ensure consistent training and test datasets across models. Because EEG signals are highly correlated, we divided the dataset with 80% as a training set and 20% as a test set, meaning EEG data from the first 12 subjects formed the training set and the last 3 the test set. The experimental results of the three and four classification tasks are shown in Tables 4 and 5. Table 4 shows that the CNN-Bi-LSTM-Attention model performed best on the three classification tasks, with 99.44% accuracy, 99.45% precision, 99.44% recall, 99.44% F1-score, and 99.16% MCC. Table 5 shows that the CNN-Bi-LSTM-Attention model also performed best on the four classification tasks, with 99.99% accuracy, 99.99% precision, 99.99% recall, 99.99% F1-score, and 99.99% MCC. Further, 1D CAE and Random Forest were second only to the CNN-Bi-LSTM-Attention model in the three and four classification tasks, respectively. The Bayes classifier performed the worst on both classification tasks. These results show that the proposed model is well suited to emotion recognition based on EEG signals. As shown in Figure 5, we drew the confusion matrices of the CNN-Bi-LSTM-Attention model for the three and four classification tasks. In machine learning, a confusion matrix is an error matrix that is often used to intuitively evaluate the performance of supervised learning algorithms. A confusion matrix is a square matrix whose dimension equals the number of classes. Each row of the matrix represents the instances of an actual class, and each column represents the instances of a predicted class. Figure 5 shows that the CNN-Bi-LSTM-Attention model is highly accurate for emotion recognition classification.

The Results of Ten-Fold Cross-Validation
A single test result is not enough to establish the superiority of our model because deep learning results differ between training runs. So, we further validated the performance of the proposed model with 10-fold cross-validation. The 10-fold cross-validation splits all samples evenly into 10 parts, each of which serves once as the test data, and is used to obtain reliable and stable models. We also used fixed random seeds and determined the accuracy of the prediction algorithm by averaging the 10 results. Figures 6 and 7 show the results of the proposed model based on 10-fold cross-validation of the test sets for the three and four classification tasks. Ten-fold cross-validation of the three and four classification tasks yielded average accuracies of 99.55% and 99.79%, respectively, for the proposed model.
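The fold construction described above can be sketched as follows (the sample count and seed value are illustrative; model training itself is omitted):

```python
import numpy as np

# 10-fold split with a fixed random seed for reproducible folds.
rng = np.random.default_rng(42)
n_samples, k = 100, 10

idx = rng.permutation(n_samples)
folds = np.array_split(idx, k)

for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # Each part serves exactly once as the test set; the model would be
    # trained on train_idx and scored on test_idx here.
    assert set(test_idx.tolist()).isdisjoint(train_idx.tolist())
```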


Furthermore, we compared the proposed model with the other models, as shown in Tables 6 and 7. Table 6 shows that the proposed model performed best in ten-fold cross-validation for the three classification tasks, with 99.55% accuracy, 99.55% precision, 99.55% recall, 99.54% F1-score, and 99.32% MCC. Table 7 shows that the proposed model also performed best in ten-fold cross-validation for the four classification tasks, with 99.79% accuracy, 99.79% precision, 99.79% recall, 99.79% F1-score, and 99.72% MCC. Random Forest had accuracy rates of 97.26% and 95.98%, respectively, second only to the proposed model.

Discussion
The CNN-Bi-LSTM-Attention model performed well on the three and four classification tasks, whether in a single test or 10-fold cross-validation, and comparison with other models further validated its superiority. In this paper, spatial features were extracted via one-dimensional convolution, and temporal features were extracted through the bi-directional LSTM. These two models can sufficiently extract the temporal-spatial features of EEG data. Finally, the EEG signal channels were re-weighted using an attention mechanism module, and the final classification results were obtained using the Softmax classifier. Numerous experimental results show that the proposed method has obvious advantages over other methods. This may be because the machine learning models are unable to extract deeper features, while the other deep learning models do not use attention mechanisms. Thus, the CNN-Bi-LSTM-Attention model presented in this paper achieves higher classification precision and can dynamically learn the relationships between the channels of EEG emotion signals.

Conclusions
In this paper, we proposed a deep learning framework that integrates CNN, Bi-LSTM, and attention mechanism networks to automatically extract and classify the time-series characteristics of EEG emotional signals. The method normalizes the raw data and then feeds them into the CNN-Bi-LSTM-Attention network. The average ten-fold cross-validation accuracy of the method was 99.55% for the three classification tasks and 99.79% for the four classification tasks. This model is superior to other models in predicting EEG emotion signals, with high accuracy and reliability. The experimental results show that deep learning is more advantageous for automatic feature extraction from EEG signals than manual feature extraction, and the electrode channels are automatically screened using an attention mechanism. Thus, our deep learning model can be extended to applications such as epilepsy diagnosis through EEG classification and can be further refined by combining EEG with ECG, EMG, and facial expressions through multi-modal deep learning training.
In the future, we can integrate the model into machine learning system-based applications [43,44], such as brain-computer interface devices, to make human-machine interaction more friendly.

Electronics 2023, 12.

Figure 2 .
Figure 2. Network structure of the CNN-Bi-LSTM-Attention model.

Figure 5 .
Figure 5. Confusion matrices of three-and four-class classification tasks.


Figure 6 .
Figure 6. The accuracy of the CNN-Bi-LSTM-Attention model based on ten-fold cross-validation for the three-category task.


Figure 7 .
Figure 7. The accuracy of the CNN-Bi-LSTM-Attention model based on ten-fold cross-validation for the four-category task.


Table 4 .
The performance of different models on three classification tasks of test sets.

Table 5 .
The performance of different models on four classification tasks of test sets.

Table 6 .
The performance of different models based on ten-fold cross-validation (three classification tasks).


Table 7 .
The performance of different models based on ten-fold cross-validation (four classification tasks).