Article

Efficient Multi-View Graph Convolutional Network with Self-Attention for Multi-Class Motor Imagery Decoding

College of Computer Science, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
Bioengineering 2024, 11(9), 926; https://doi.org/10.3390/bioengineering11090926
Submission received: 17 August 2024 / Revised: 11 September 2024 / Accepted: 14 September 2024 / Published: 15 September 2024
(This article belongs to the Section Biosignal Processing)

Abstract

Research on electroencephalogram-based motor imagery (MI-EEG) aims to identify, by decoding EEG signals, which limb a subject is imagining moving, which is an important issue in the field of brain–computer interfaces (BCIs). Existing deep-learning-based classification methods have not fully exploited the topological information among brain regions, and thus their classification performance needs further improvement. In this paper, we propose a multi-view graph convolutional attention network (MGCANet) with a residual learning structure for multi-class MI decoding. Specifically, we design a multi-view graph convolution spatial feature extraction method based on the topological relationships of brain regions to achieve more comprehensive information aggregation. During modeling, we build an adaptive weight fusion (Awf) module to adaptively merge features from different brain views and improve classification accuracy. In addition, a self-attention mechanism is introduced for feature selection to expand the receptive field of EEG signals to global dependencies and enhance the expression of important features. The proposed model is experimentally evaluated on two public MI datasets and achieves mean accuracies of 78.26% (BCIC IV 2a dataset) and 73.68% (OpenBMI dataset), significantly outperforming representative comparative methods in classification accuracy. Comprehensive experimental results verify the effectiveness of the proposed method, which offers a novel perspective on MI decoding.

1. Introduction

A brain–computer interface (BCI) is a system that enables direct interactive communication between the human brain and the outside world without relying on peripheral nerves and muscles [1]. Because BCIs have been applied in many fields, such as medical rehabilitation and military practice, BCI research has promoted the development of computer- and mathematics-related disciplines as well as brain cognitive science and neuroinformatics [2]. Motor imagery (MI), whose signals are generated by the sensorimotor cortex, is one of the BCI paradigms that researchers focus on most. When users perform MI actions, event-related desynchronization (ERD) and event-related synchronization (ERS) are induced mainly in the alpha and beta bands, accompanied by spectral oscillations [3]. Researchers identify the parts of the body involved in motor imagery through feature extraction and classification of the collected signals [4].
Methods based on traditional machine learning are commonly used for feature extraction from EEG signals [5]. Machine-learning-based methods such as Filter Bank Common Spatial Patterns (FBCSP) [6] and Sub-band Common Spatial Pattern (SBCSP) [7] generally apply optimal spatial filters constructed with category information to extract the spatial distribution of EEG data. However, the feature selection of these methods relies heavily on manual design, resulting in unsatisfactory classification accuracy.
In BCI studies, deep learning methods have attracted the attention of researchers in medical applications [8], and studies have found that deep learning methods achieve higher accuracy in MI decoding tasks than traditional machine learning methods [9]. Convolutional neural networks (CNNs) have been commonly used for automatic feature extraction [10] and classification [11]. Schirrmeister et al. proposed the DeepConvNet and ShallowConvNet models, based on FBCSP and CNNs, for feature extraction, which improved the four-class average accuracy by 7% on the public BCIC motor imagery dataset [12]. Lawhern et al. proposed the EEGNet algorithm for feature extraction and integration, which contains depth-wise separable CNNs; its number of parameters is much smaller than that of DeepConvNet and ShallowConvNet, and the model reached fine classification accuracy in MI [13]. Izzuddin et al. proposed a more compact MI decoding architecture based on EEGNet, which uses a parameterized SincNet layer for bandpass filtering in the first CNN layer [14]. CNN-based deep learning for EEG classification is a promising decoding path. However, the input of CNNs is limited to Euclidean data, which ignores the non-Euclidean structure of the brain and cannot fully extract the spatial features of MI signals.
Recently, models based on graph structures have been proposed to solve the problem of feature extraction in non-Euclidean space [15]. Scarselli et al. proposed a Graph Neural Network (GNN) model suitable for graph structures and nodes, and designed a function that maps a graph and its nodes to a higher-dimensional Euclidean space [16]. This work laid the foundation for subsequent GNN research. Kipf et al. proposed the Graph Convolutional Network (GCN) through a localized first-order approximation of spectral graph convolutions, an extension of GNNs [17]. At present, graph convolutional networks have rarely been applied to MI tasks. Zhang et al. constructed three kinds of distance-based brain views according to the spatial locations of EEG electrodes and performed a graph embedding of MI signals to learn spatial and temporal features from the most discriminative time periods [18]. Sun et al. proposed an Adaptive Spatiotemporal Graph Convolutional Network (ASTGCN), which dynamically integrates electrode channel information by constructing an adaptive graph convolutional layer to classify MI tasks [19]. Hou et al. proposed an MI classification method based on graph convolutional networks and constructed a Laplacian graph representing the electrode topological relationships based on the absolute Pearson correlation matrix of EEG signals [20]. However, these methods focus only on a single aspect, the physical location connection, and ignore the importance of the functional connections between brain regions.
In addition, the attention mechanism has recently made great progress in machine translation [21] and computer vision [22], as it can dynamically assign weights to input vectors for feature selection. Researchers have also applied attention mechanisms to motor imagery decoding. Li et al. proposed a CNN model based on the attention mechanism (MS-AMF), which extracts spatiotemporal features at multiple scales based on the Squeeze-and-Excitation (SE) channel attention method [23]. Zhang et al. reported an automatic channel selection (ACS) strategy based on the SE method for automatically assigning channel weights to EEG signals [24]. Liu et al. developed a distinguishable spatial–spectral feature learning method, using FBCSP for preliminary feature extraction and two SE blocks for feature recalibration [25]. Yu et al. proposed a lightweight feature fusion method based on an improved convolutional block attention module (CBAM) for feature selection [26]. However, these methods focus on the attention information between local channels and are limited in sensing the overall dependencies of EEG signals. The self-attention method has been applied to emphasize the global expression of discriminable EEG features in emotion recognition [27] and sleep stage classification [28], but it has not yet been leveraged for the MI graph representation classification task.
To overcome the above limitations, we propose a multi-view graph convolutional attention network (MGCANet) for decoding MI-EEG signals. Specifically, we design a multi-view graph convolutional network with a residual structure and a temporal convolutional network for sufficient feature extraction. Subsequently, we employ multi-head self-attention for feature selection.
The major contributions of our study are summarized as follows:
(1)
We constructed different representations of brain views based on physical distance and functional connectivity, which can sufficiently express the topological relationships of brain regions in MI signals for subsequent spatial feature extraction.
(2)
We designed a residual graph convolutional network called ResChebyNet by combining the advantages of residual learning and Chebyshev polynomials, in order to avoid the gradient vanishing problem caused by increasing the number of graph convolutional layers.
(3)
We developed an adaptive-weighted fusion (Awf) module for collaborative integration of features extracted from different brain views, which can enhance the reliability and accuracy of feature fusion.
(4)
We introduced the multi-head self-attention method in the classification framework, which can extend the receptive field of MI signals to a global scale and effectively enhance the expression ability of important features to improve decoding accuracy.
The rest of this paper is organized as follows. Section 2 presents the details of the proposed approach. Section 3 describes the experimental results. Section 4 discusses the proposed approach. Section 5 concludes the paper.

2. Methods

2.1. Overall Architecture

The overall architecture of the proposed MGCANet model is illustrated in Figure 1. It contains four components: multi-view brain network construction, spatial–temporal feature extraction, feature selection, and classification. First, multiple views of brain connectivity are designed to fully reflect the topological relationships among brain regions; then ResChebyNet is applied to extract spatial features and a depth-wise convolutional layer is employed to extract temporal features. The Awf method is developed to adaptively combine the features obtained from the different views. After that, the self-attention mechanism is utilized for feature selection. Finally, the obtained features are fed into the classifier.

2.2. Multi-View on Brain Graph

Neuroscience research suggests that, during MI tasks, there are interactions among the functional regions of the brain. Existing MI modeling methods are limited to a single aspect, namely physical spatial connections. We therefore construct a physical-distance-based brain view and a functional-connectivity-based brain view. The different views reflect different spatial correlations, and integrating the two enhances the spatial information expression of MI-EEG signals.
In this paper, the original MI-EEG signal sequence is defined as $X = [x_1, \ldots, x_N] \in \mathbb{R}^{N \times T_{\text{time}}}$, where $N$ is the number of EEG electrodes, $T_{\text{time}}$ is the number of time sampling points, and $x_i \in \mathbb{R}^{T_{\text{time}}}$ ($i \in [1, N]$) is a one-dimensional EEG signal. The MI brain network is defined as a graph $G = (V, E, A)$, where $V$ is the set of vertices, each vertex corresponding to an EEG electrode, so that $|V| = N$; $E$ is the set of edges, i.e., the connections between EEG electrodes; and $A$ is the adjacency matrix of the motor imagery brain network $G$.
As shown in Figure 2, each time node of the EEG signal $X \in \mathbb{R}^{N \times T_{\text{time}}}$ is regarded as a graph, and multiple brain graphs are constructed according to the number of time nodes. We construct the physical-distance-based brain view $G_P$ and the functional-connectivity-based brain view $G_F$ from the brain graph. They are described separately below.

2.2.1. Physical-Distance-Based Brain View

We construct the adjacency matrix based on the natural spatial distribution of EEG electrodes and quantify the degree of connection between different electrodes by their physical distance; the corresponding diagram is illustrated in Figure 3. Research has shown that the closer two EEG channels are physically located, the greater the interaction between them, while the interaction between distant channels is smaller. Inspired by the D-Graph [18], we use a distance-based connection method to calculate the distances among electrode nodes, thereby reflecting the connections between physical spatial locations and enhancing the representation of brain region information.
We define the set of distances between any two EEG nodes as $D_{\text{dis}} = \{ d_{ij} \mid p_i, p_j \in V, i \neq j \}$, where $V$ is the set of all nodes, $p_i$ and $p_j$ are two different electrode nodes in $V$, $d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2}$ is the Euclidean distance between nodes $p_i$ and $p_j$, and $x_i$, $y_i$, and $z_i$ are the physical coordinates of node $p_i$. Two nodes are regarded as adjacent when the distance between them is less than the average value of $D_{\text{dis}}$; for a node itself (the diagonal case), the distance is defined from the average distance of its adjacent nodes. The diagram of the constructed adjacency matrix based on physical distance is shown in Figure 3 (taking the BCIC IV 2a dataset as an example). The adjacency matrix $A_{\text{distance}}$ of the physical-distance-based brain view is expressed as follows:
$$A_{\text{distance}}(i,j) = \begin{cases} \dfrac{1}{d_{ij}}, & \text{if } d_{ij} < M(D_{\text{dis}}) \\ 0, & \text{if } d_{ij} \ge M(D_{\text{dis}}) \\ \dfrac{1}{M\left(\{ d_{iq} \mid d_{iq} < M(D_{\text{dis}}),\ q \in [1, N] \}\right)}, & \text{if } i = j \end{cases}$$
where $M(D_{\text{dis}})$ is the average of the distance set $D_{\text{dis}}$.
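To make the construction concrete, the following NumPy sketch assembles such an adjacency matrix from a set of 3D electrode coordinates. It is a minimal illustration of the piecewise rule above, assuming a coords array of electrode positions; the function name and array shapes are illustrative and not taken from the authors' code.

```python
import numpy as np

def physical_distance_adjacency(coords: np.ndarray) -> np.ndarray:
    """Sketch: build the physical-distance-based adjacency matrix A_distance
    from an (N, 3) array of electrode coordinates (x, y, z)."""
    N = coords.shape[0]
    # Pairwise Euclidean distances d_ij between electrodes.
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))

    # Threshold: mean of all off-diagonal distances, M(D_dis).
    off_diag = ~np.eye(N, dtype=bool)
    mean_d = d[off_diag].mean()

    A = np.zeros((N, N))
    close = (d < mean_d) & off_diag      # adjacent if closer than the mean distance
    A[close] = 1.0 / d[close]            # edge weight = inverse distance

    # Diagonal: inverse of the mean distance to this node's adjacent neighbours.
    for i in range(N):
        neigh = d[i][(d[i] < mean_d) & off_diag[i]]
        if neigh.size > 0:
            A[i, i] = 1.0 / neigh.mean()
    return A
```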

2.2.2. Functional-Connectivity-Based Brain View

During motor imagery tasks, collaborative neural activity occurs between different brain regions, generating functional spatial topological relationships. We therefore construct a brain view based on functional connections to dynamically learn the functional graph of MI-EEG. Adjacency matrices are constructed adaptively from the connection relationships between different channels to further enhance the expression of spatial correlation information.
Specifically, different EEG channels are regarded as nodes in the graph network, and the matrix $A_{\text{function}}$ is defined to represent the functional connection relationship between node $p$ and node $q$, which is expressed as:
$$A_{\text{function}}(p,q) = \frac{\exp\left(\mathrm{ReLU}\left(\omega^{T} \left| x_p - x_q \right|\right)\right)}{\sum_{q=1}^{N} \exp\left(\mathrm{ReLU}\left(\omega^{T} \left| x_p - x_q \right|\right)\right)}$$
where $x_i \in \mathbb{R}^{T_{\text{time}}}$ ($i \in [1, N]$) is the input EEG signal, $\omega = [\omega_1, \omega_2, \ldots, \omega_{T_{\text{time}}}]^{T} \in \mathbb{R}^{T_{\text{time}}}$ is a learnable parameter, and ReLU is the activation function that ensures $A_{\text{function}}$ is non-negative. The Softmax-style normalization is applied to the rows of the matrix. The learnable parameter $\omega$ is updated by minimizing the loss function, which is expressed as:
$$L_{\text{function}} = \sum_{p,q=1}^{N} \left\| x_p - x_q \right\|_2^2 \, A_{\text{function}}(p,q) + \lambda \left\| A_{\text{function}} \right\|_2^2$$
where $\left\| x_p - x_q \right\|$ represents the functional distance between the two nodes. The regularization parameter $\lambda$ is used to control the sparsity of the matrix and avoid generating trivial solutions. We adopt the cross-entropy loss as the supervised classification loss, which is defined as:
$$L_{\text{crossloss}} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} y_{n,k} \log \hat{y}_{n,k}$$
where $N$ is the number of samples, $K$ is the number of categories, $y_{n,k}$ indicates whether sample $n$ belongs to category $k$, and $\hat{y}_{n,k}$ is the predicted probability that sample $n$ belongs to category $k$. The functional loss $L_{\text{function}}$ is adopted as a regularization term, so the total loss $L_{\text{loss}}$ of the model can be expressed as follows:
$$L_{\text{loss}} = L_{\text{crossloss}} + L_{\text{function}}$$
To achieve adaptive learning of the graph structure and improve the model's ability to process graph data, we combine the adjacency matrices $A_{\text{distance}}$ and $A_{\text{function}}$ with trainable parameter matrices before the graph convolutional layer for adaptive adjustment:
$$H_{\text{distance}} = A_{\text{distance}} W_1, \qquad H_{\text{function}} = A_{\text{function}} W_2$$
where $A_{\text{distance}}$ and $A_{\text{function}}$ are the physical-distance-based and functional-connectivity-based adjacency matrices, respectively, $W_1 \in \mathbb{R}^{N \times N}$ and $W_2 \in \mathbb{R}^{N \times N}$ are learnable parameter matrices, and $H_{\text{distance}}$ and $H_{\text{function}}$ are the output adjacency matrices after adaptive adjustment.
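The functional view can be sketched in PyTorch as a small module holding the learnable vector omega, following the formulas above. The class name, initialization scale, and the default lambda are illustrative assumptions, and the use of the absolute difference |x_p - x_q| is reconstructed from the loss definition rather than taken from released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalGraphLearner(nn.Module):
    """Sketch of the functional-connectivity brain view: a learnable vector w
    scores the absolute difference between every pair of channel signals, and
    a row-wise softmax yields A_function."""

    def __init__(self, t_len: int, lam: float = 1e-3):
        super().__init__()
        self.w = nn.Parameter(torch.randn(t_len) * 0.01)  # learnable omega
        self.lam = lam                                    # regularization weight (assumed)

    def forward(self, x: torch.Tensor):
        # x: (N, T) single-trial EEG with N channels and T time samples.
        diff = (x.unsqueeze(1) - x.unsqueeze(0)).abs()    # (N, N, T) |x_p - x_q|
        scores = F.relu(diff @ self.w)                    # (N, N)
        A = F.softmax(scores, dim=1)                      # row-normalized A_function

        # Graph-learning loss: penalize strong connections between dissimilar
        # nodes, plus an L2 term on A to avoid trivial solutions.
        dist2 = (diff ** 2).sum(-1)                       # ||x_p - x_q||^2
        loss = (dist2 * A).sum() + self.lam * (A ** 2).sum()
        return A, loss
```

In the full model, the learned matrices would additionally be combined with the learnable N × N matrices W1 and W2 described above before entering the graph convolutional layers.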

2.3. Spatial–Temporal Feature Extraction

2.3.1. Spatial Feature Extraction

Graph convolution based on spectral graph theory can be used to extract spatial features. We develop a method based on Chebyshev graph convolution [17] to extract the spatial features of MI signals. The Chebyshev graph convolution approach is reviewed in this section.
For a graph $G$, its Laplacian matrix $L \in \mathbb{R}^{N \times N}$ can be expressed as:
$$L = D - A$$
where the degree matrix $D \in \mathbb{R}^{N \times N}$ is the diagonal matrix of node degrees, $D_{ii} = \sum_{j} A_{i,j}$, and $A$ is the adjacency matrix. Since $L$ is a real symmetric matrix, it can be normalized and decomposed as:
$$L = I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}} = U \Lambda U^{T}$$
where $I_N$ is the identity matrix, $\Lambda$ is the eigenvalue matrix of $L$, and $U$ is the orthonormal matrix composed of the eigenvectors of $L$. For a graph input $x$, the graph Fourier transform is denoted as:
$$\hat{x} = U^{T} x$$
and the inverse graph Fourier transform is expressed as:
$$x = U \hat{x}$$
The graph convolution of input data $x$ with filter $g_\theta$ can be expressed as:
$$x \ast g_\theta = U \left( \left( U^{T} g_\theta \right) \odot \left( U^{T} x \right) \right) = U \hat{g}_\theta U^{T} x$$
where $\odot$ represents the Hadamard product and $\hat{g}_\theta = U^{T} g_\theta$. Let $g_\theta$ be a function $g_\theta(\Lambda)$ of the eigenvalue matrix of $L$.
Since this operation has high computational complexity and lacks locality, Defferrard et al. [17] introduced the K-order truncated expansion of Chebyshev polynomials to approximate the filtering operation as follows:
$$g_\theta = g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{\Lambda}), \qquad \tilde{\Lambda} = \frac{2}{\lambda_{\max}} \Lambda - I_N$$
where $\lambda_{\max}$ is the largest eigenvalue of $L$ and $\theta_k$ are the Chebyshev coefficients. The Chebyshev graph convolution (ChebyNet) of input data $x$ with filter $g_\theta$ can be expressed as:
$$x \ast g_\theta \approx \sum_{k=0}^{K} \theta_k T_k(\tilde{L}) x, \qquad \tilde{L} = \frac{2}{\lambda_{\max}} L - I_N$$
where $T_k$ is the Chebyshev polynomial, recursively defined as:
$$T_0(\tilde{L}) = I_N, \qquad T_1(\tilde{L}) = \tilde{L}, \qquad T_k(\tilde{L}) = 2 \tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L}), \quad k > 1$$
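A minimal PyTorch sketch of a Chebyshev graph convolution implementing the truncated expansion above is shown below. Here K counts the number of polynomial terms, the scaled Laplacian is assumed to be precomputed for each brain view, and all names and shapes are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ChebConv(nn.Module):
    """Sketch of a K-term Chebyshev graph convolution: sum_k theta_k T_k(L~) x."""

    def __init__(self, in_feats: int, out_feats: int, K: int = 3):
        super().__init__()
        # One weight matrix theta_k per Chebyshev order.
        self.theta = nn.Parameter(torch.empty(K, in_feats, out_feats))
        nn.init.xavier_uniform_(self.theta)
        self.K = K

    def forward(self, x: torch.Tensor, L_tilde: torch.Tensor) -> torch.Tensor:
        # x: (N, F_in) node features; L_tilde: (N, N) scaled Laplacian.
        Tx_prev, Tx = x, L_tilde @ x                # T_0(L~)x = x, T_1(L~)x = L~x
        out = Tx_prev @ self.theta[0]
        if self.K > 1:
            out = out + Tx @ self.theta[1]
        for k in range(2, self.K):
            Tx_next = 2 * L_tilde @ Tx - Tx_prev    # Chebyshev recursion
            out = out + Tx_next @ self.theta[k]
            Tx_prev, Tx = Tx, Tx_next
        return out
```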
Since deep graph convolutional networks can aggregate high-order features, we build the ResChebyNet block by combining ChebyNet with residual learning [29] to alleviate the gradient vanishing problem caused by stacking multiple graph convolutional layers; the structure of the ResChebyNet block is shown in Figure 4. A ChebyNet layer contains a graph convolutional layer, a normalization layer, and an activation layer. We use ChebyNet as the graph convolutional layer for feature extraction, followed closely by a batch normalization layer [30] and a ReLU activation function, which accelerate network convergence and perform the nonlinear computation. The ResChebyNet block consists of two sequential ChebyNet layers, and a skip connection adds the input of the block directly to its output.
Over-smoothing occurs when the aggregation scope of the graph convolutional network gradually expands to all nodes of the graph, which causes vanishing gradients in backpropagation. To overcome this defect of deep GCNs, we use an inter-block residual structure to connect the ResChebyNet blocks: the local graph structure produced by the previous block is used as the input to the next block, as shown in Figure 1. During spatial feature extraction, the sum of the inputs of each block and the output of the last block is taken as the final extracted spatial feature.
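Continuing the sketch above, a ResChebyNet-style block could wrap two such layers with batch normalization, ReLU, and a skip connection, roughly as follows; the per-trial (N, F) feature shape and equal input/output widths are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class ResChebyBlock(nn.Module):
    """Sketch of a residual block of two ChebConv layers (defined above), each
    followed by batch normalization and ReLU, with an additive skip connection."""

    def __init__(self, feats: int, K: int = 3):
        super().__init__()
        self.conv1 = ChebConv(feats, feats, K)
        self.conv2 = ChebConv(feats, feats, K)
        self.bn1 = nn.BatchNorm1d(feats)
        self.bn2 = nn.BatchNorm1d(feats)

    def forward(self, x: torch.Tensor, L_tilde: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.bn1(self.conv1(x, L_tilde)))
        h = torch.relu(self.bn2(self.conv2(h, L_tilde)))
        return x + h    # skip connection: add the block input to its output
```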

2.3.2. Temporal Feature Extraction

After extracting spatial features with the proposed ResChebyNet, we use a depth-wise convolutional layer with a kernel size of (1, 25) to extract temporal features along the temporal dimension. A batch normalization layer and an ELU activation layer are applied after the depth-wise convolutional layer to speed up training. Then, an average pooling layer with a kernel size of (1, 4) is used to compress the temporal features and integrate the information.
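As a rough illustration of this temporal branch, the following snippet stacks the layers in the order described, assuming an input of shape (batch, feature channels, electrodes, time); the padding choice is an assumption not stated in the text.

```python
import torch.nn as nn

def temporal_block(channels: int) -> nn.Sequential:
    """Sketch of the temporal branch: depth-wise convolution along time with a
    (1, 25) kernel, batch normalization, ELU, and (1, 4) average pooling."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=(1, 25),
                  groups=channels, padding=(0, 12)),   # depth-wise over the time axis
        nn.BatchNorm2d(channels),
        nn.ELU(),
        nn.AvgPool2d(kernel_size=(1, 4)),              # compress the time axis
    )
```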

2.4. Adaptive-Weighted Fusion

Existing research [19,31] generally uses element-wise addition or concatenation to directly combine the features extracted from different graph convolution branches, which ignores the potential collaboration between them. Therefore, we propose an adaptive-weighted fusion (Awf) module, which exploits the complementary advantages of features from the two views to maximize the representativeness of each feature and improve decoding performance.
As shown in Figure 5, $X_P$ and $X_F$ represent the features from the physical-distance-based and functional-connectivity-based brain views, respectively. First, a concatenation operation connects the two feature sets to obtain the fused feature $Z$. A depth-wise convolutional layer with kernel size 3 and a pointwise convolutional layer with kernel size 1 are used to reduce the channel dimension. A global average pooling (GAP) layer is used to reduce the model parameters and computation. Then, a convolutional layer with kernel size 1 and no bias integrates the channel information, and a sigmoid function is used to compute the correlation coefficient matrix $R$:
$$R = \sigma\left( f_{\text{Conv}}\left( \mathrm{GAP}\left( f_{\text{PwConv}}\left( f_{\text{DwConv}}(Z) \right) \right) \right) \right)$$
where $\sigma$ is the sigmoid function, $f_{\text{Conv}}$, $f_{\text{PwConv}}$, and $f_{\text{DwConv}}$ denote the convolutional, pointwise convolutional, and depth-wise convolutional operations, and $\mathrm{GAP}$ is the global average pooling operation. The coefficient matrix is then multiplied with the input features to obtain $\tilde{X}_P$ and $\tilde{X}_F$, which are added together to produce the final output feature $\tilde{Z}$:
$$\tilde{Z} = \mathrm{ReLU}\left( \tilde{X}_P + \tilde{X}_F \right)$$
where $\mathrm{ReLU}$ is the activation function.
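A hedged PyTorch sketch of such an Awf module is given below. The assumed input shape (batch, channels, electrodes, time) and the exact layer widths are illustrative, and applying the same coefficient matrix R to both view features follows the description above rather than released code.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Sketch of the Awf module: concatenate the two view features, squeeze them
    with depth-wise + point-wise convolutions and global average pooling, derive
    coefficients R with a 1x1 convolution and a sigmoid, then re-weight and merge."""

    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=3,
                      padding=1, groups=2 * channels),         # depth-wise, kernel 3
            nn.Conv2d(2 * channels, channels, kernel_size=1),  # point-wise, kernel 1
        )
        self.gap = nn.AdaptiveAvgPool2d(1)                     # global average pooling
        self.excite = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    def forward(self, x_p: torch.Tensor, x_f: torch.Tensor) -> torch.Tensor:
        z = torch.cat([x_p, x_f], dim=1)                          # fused feature Z
        r = torch.sigmoid(self.excite(self.gap(self.reduce(z))))  # coefficients R
        return torch.relu(r * x_p + r * x_f)                      # weighted sum + ReLU
```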

2.5. Self-Attention Feature Selection

Each convolution operation has a limited receptive field, which may lead to the loss of global feature information during EEG feature extraction and affect the classification accuracy of the model. To address the limited receptive field of convolutional structures, the self-attention mechanism is introduced to capture the global dependencies of EEG signals. We incorporate the multi-head attention (MHA) method [32] into MGCANet to compute the attention strength between nodes and assign more attention weight to the features that contribute more to classification. In addition, multiple attention heads learn the global dependencies of MI-EEG signals from different perspectives to represent more feature information.
The input of the MHA method, denoted $\tilde{Z}$, is obtained from the preceding blocks. First, the input features are transformed into query vectors (Q), key vectors (K), and value vectors (V) of the same shape by multiplying with three corresponding weight matrices. Second, a dot product is computed between each Q and all K to measure the degree of correlation between vectors. The self-attention weights are calculated by matrix multiplication and a SoftMax function, which can be expressed as follows:
$$\mathrm{Attention}(\tilde{Z}) = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_k}} \right) V$$
where $d_k$ is a scaling factor that ensures the stability of the gradient. The self-attention feature is obtained by weighting the value vectors with the self-attention weights and summing them element-wise. Multi-head attention concatenates multiple self-attention heads to capture internal features of different representation subspaces and achieve a richer feature representation. MHA is expressed as follows:
$$\mathrm{MHA}(\tilde{Z}_Q, \tilde{Z}_K, \tilde{Z}_V) = \left[ \mathrm{head}_0; \ldots; \mathrm{head}_{h-1} \right] W^{O}$$
$$\mathrm{head}_i = \mathrm{Attention}\left( \tilde{Z}_Q W_i^{Q},\ \tilde{Z}_K W_i^{K},\ \tilde{Z}_V W_i^{V} \right)$$
where $W_i^{Q}$, $W_i^{K}$, and $W_i^{V}$ are the weight matrices of the $i$-th group of Q, K, and V, respectively, $\mathrm{head}_i$ is the representation subspace of the $i$-th attention head, and MHA merges all attention heads to obtain the final output containing the information from every head. Layer normalization is added after the MHA layer to normalize the representation, and a feed-forward (FF) block consisting of two fully connected layers is applied to improve calculation efficiency.
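For illustration, this feature-selection stage can be approximated with PyTorch's built-in multi-head attention followed by layer normalization and a two-layer feed-forward block, as sketched below. The residual connections, embedding size, and feed-forward width are assumptions beyond what the text specifies; only the use of eight heads follows the paper's parameter choice.

```python
import torch
import torch.nn as nn

class AttentionFeatureSelection(nn.Module):
    """Sketch of the self-attention feature-selection stage built on
    nn.MultiheadAttention, with layer normalization and a feed-forward block."""

    def __init__(self, dim: int, heads: int = 8, ff_dim: int = 256):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(),
                                nn.Linear(ff_dim, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, tokens, dim) spatio-temporal feature sequence.
        attn_out, _ = self.mha(z, z, z)     # Q = K = V = z (self-attention)
        z = self.norm1(z + attn_out)        # residual + layer normalization (assumed)
        return self.norm2(z + self.ff(z))   # feed-forward block with two linear layers
```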

2.6. Classification

After feature extraction, the spatio-temporal feature maps obtained by the preceding blocks are concatenated along the channel axis and compressed into a one-dimensional feature vector. Finally, two fully connected layers and a SoftMax function serve as the classifier to recognize the target limb.
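A minimal sketch of such a classification head is given below; the hidden width is a placeholder, and the softmax is assumed to be applied inside the cross-entropy loss rather than as an explicit layer.

```python
import torch.nn as nn

def classifier_head(feat_dim: int, n_classes: int, hidden: int = 128) -> nn.Sequential:
    """Sketch of the classifier: flatten the selected feature maps and pass
    them through two fully connected layers to obtain class logits."""
    return nn.Sequential(
        nn.Flatten(),                  # concatenate and flatten the feature maps
        nn.Linear(feat_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, n_classes),  # logits; softmax is applied in the loss
    )
```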

3. Experiments and Results

We conducted several experiments on two open datasets and compared the performance of the proposed MGCANet with other representative MI classification methods. The details of the experimental datasets, setup, and results are presented below.

3.1. Dataset

Two frequently used public MI datasets, BCI Competition IV 2a dataset [33] and OpenBMI dataset [34], are utilized for the experiments.

3.1.1. BCI Competition IV 2a Dataset

BCI Competition IV 2a dataset (denoted as BCIC IV 2a below) is a public dataset containing the motor imagery records of nine subjects recorded on two different days. It has two sessions, each including four types of motor imagery tasks (left hand, right hand, feet, and tongue). Each session includes a total of 288 trials, with 72 trials per category. EEG signals were collected using 22 electrodes at a sampling rate of 250 Hz. We intercept the data from 0 to 4 s after the prompt as a trial, and each trial is a matrix with a dimension of channels (22) × sampling points (1000).

3.1.2. OpenBMI Dataset

The OpenBMI dataset is a public dataset containing the motor imagery records of 54 subjects performing left-hand and right-hand tasks. It has two sessions, where each session includes a total of 200 trials, with 100 trials per category. EEG signals were collected using 62 electrodes at a sampling rate of 1000 Hz. Following the settings of the original dataset, the raw data are downsampled to 250 Hz and 20 electrodes from the motor cortex region are selected for analysis, namely FC5, FC3, FC1, FC2, FC4, FC6, C5, C3, C1, Cz, C2, C4, C6, CP5, CP3, CP1, CPz, CP2, CP4, and CP6. We intercept the data from 0 to 4 s after the prompt as a trial, and each trial is a matrix with a dimension of channels (20) × sampling points (1000).

3.2. Experimental Setup

The experiments were implemented with the PyTorch library on a workstation with i9-13900K CPU and RTX4090 GPU. Scikit-learn was utilized to calculate the confusion matrix.
For preprocessing, we applied a first-order Butterworth filter to the EEG signals and performed exponential moving average normalization on the filtered signals [35] to reduce the impact of numerical differences. We performed cross-session evaluation on BCIC IV 2a and OpenBMI: the signals in session 1 were used for training, and the signals in session 2 were used for evaluation. Within the training data, 80% was randomly selected as the training set and the remaining 20% as the validation set. For the training settings, the batch size was set to 64 and the maximum number of epochs was 400. The adaptive moment estimation (Adam) optimizer [36] was used for model optimization with a learning rate of 0.001. The Wilcoxon signed-rank test was used to evaluate statistical significance.
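The preprocessing step can be sketched with SciPy and NumPy as follows; the pass band and the EMA smoothing factor are assumptions, since the paper specifies only a first-order Butterworth filter and exponential moving average normalization.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(trial: np.ndarray, fs: int = 250, band=(4.0, 40.0), alpha: float = 1e-3):
    """Sketch: first-order Butterworth band-pass filtering followed by
    exponential moving average (EMA) normalization of a (channels, time) trial.
    The band limits and smoothing factor alpha are assumed values."""
    b, a = butter(1, band, btype="bandpass", fs=fs)
    x = filtfilt(b, a, trial, axis=-1)

    # EMA normalization: running mean and variance along the time axis.
    mean = np.zeros(x.shape[0])
    var = np.ones(x.shape[0])
    out = np.empty_like(x)
    for t in range(x.shape[1]):
        mean = (1 - alpha) * mean + alpha * x[:, t]
        var = (1 - alpha) * var + alpha * (x[:, t] - mean) ** 2
        out[:, t] = (x[:, t] - mean) / np.sqrt(var + 1e-8)
    return out
```

The model itself would then be optimized with torch.optim.Adam(model.parameters(), lr=1e-3) for up to 400 epochs with a batch size of 64, as described above.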

3.3. Compared Methods

We compared six representative methods published in recent years. The descriptions of the compared methods are given as follows.
  • DeepConvNet [12]: DeepConvNet is a general-purpose architecture which combines temporal convolution and spatial convolution operations. It consists of five convolutional layers. We trained this model in the same way as the original paper.
  • EEGNet [13]: EEGNet is a lightweight CNN designed for EEG decoding. The method was modified according to Borra et al. [37], as the original is suited to 128 Hz EEG signals (as opposed to the 250 Hz signals in our study).
  • Sinc-ShallowNet [37]: This method extracts features of EEG signals by stacking temporal sinc-convolutional layers and spatial convolutional layers. We reproduced the authors' design and obtained comparable performance.
  • G-CRAM [18]: G-CRAM constructs three graph structures from the positioning information of EEG nodes and uses a convolutional recurrent model for feature extraction. We adjusted the input format based on the original paper.
  • BiLSTM-GCN [38]: BiLSTM-GCN uses the BiLSTM with the attention model to extract features and uses the GCN model based on Pearson’s matrix for feature learning. We performed corresponding reproduction operations according to the design of the original paper.
  • EEG-Conformer [39]: The method consists of three parts: a convolution block, a self-attention block, and a fully connected classifier. We performed corresponding reproduction operations according to the published code in the paper.

3.4. Results

3.4.1. Comparison Experiments

We compared the performance of our model with that of the six representative methods on the two MI datasets. The results of the cross-session experiments are listed in Table 1. We report the average accuracy and kappa value as performance metrics, where the kappa value measures the stability of the model. The standard deviation is denoted as std, and the highest accuracy (acc) and kappa value are shown in bold. The average acc and kappa value of MGCANet are significantly higher than those of the compared methods (p-value < 0.05), and its standard deviation is the lowest, indicating superior classification performance and better robustness. MGCANet reached an average classification accuracy of 78.26% on BCIC IV 2a, which is 2.89% higher than the compared EEG-Conformer method, and 73.68% on OpenBMI, which is 5.04% higher than Sinc-ShallowNet in the two-class scenario.
DeepConvNet directly uses the spatio-temporal matrix of the EEG signal without fully considering the topological relationships between EEG electrodes; its average classification accuracy is therefore 11.39% lower than that of the proposed model. EEGNet and Sinc-ShallowNet contain multi-layer CNNs for feature extraction, but their average accuracy is unsatisfactory because they do not consider the topological connections of brain regions. G-CRAM focuses only on the physical positions of the electrodes, ignoring the influence of the functional connections between brain regions. BiLSTM-GCN has a relatively weak ability to extract global information, which results in lower classification performance than the proposed method. EEG-Conformer uses self-attention to learn global temporal dependencies of EEG features, but its mean accuracy is lower because it does not analyze the topological connections among brain regions.

3.4.2. Confusion Matrix

We plotted the confusion matrices of the proposed method on the two datasets to evaluate its performance, as shown in Figure 6. The horizontal axis of each matrix represents the predicted labels of the samples, while the vertical axis represents the actual labels. The percentages in each column indicate the proportion of samples relative to that class. From Figure 6, it can be seen that on the BCIC IV 2a dataset the proposed method has a higher recall for the left-hand and right-hand categories than for the other two categories. This is because the spatial features induced by left-hand and right-hand motor imagery are more discriminable, indicating that the proposed method can effectively distinguish categories with significant differences. Misclassifications occur most frequently between the left-hand and right-hand MI tasks, and feet MI tasks are frequently misclassified as tongue tasks. This is due to the complexity of the subjects' mental activity, as body categories with small inter-category differences are easily confused. On the OpenBMI dataset, the proposed method correctly classifies the vast majority of left-hand and right-hand samples, and the two classes have similar recall rates.

3.4.3. Ablation Study

The ablation study was performed on the two datasets to verify the necessity of each component of the MGCANet model; the results are shown in Table 2. The following variants were evaluated:
(a) MGCANet without using the physical-distance-based brain view (denoted as ‘w/o P-view’);
(b) MGCANet without using the functional connection-based brain view (denoted as ‘w/o F-view’);
(c) MGCANet without using the Residual structure (denoted as ‘w/o Res’);
(d) MGCANet without using the adaptive-weighted fusion (denoted as ‘w/o Awf’);
(e) MGCANet without using the multi-head attention method (denoted as ‘w/o MHA’);
(f) Complete MGCANet method.
Table 2 shows that the average accuracy of the full MGCANet is significantly higher than that of the ablated variants on both datasets, demonstrating that each proposed contribution is effective and necessary. On BCIC IV 2a, the accuracy of experiments (a) and (b) decreased significantly, by 3.43% and 3.59%, respectively, indicating that the P-view and F-view contribute substantially to the model's decoding performance. On the OpenBMI dataset, the accuracy of experiments (a) and (c) decreased significantly, by 3.73% and 3.96%, respectively, indicating that the P-view and the residual structure play a significant role in improving decoding performance. This also indicates that the emphasis of the network model varies across datasets.

3.4.4. Visualization

We used the t-SNE method [40] to visualize the feature distributions on the two datasets, and the quality of the proposed model was analyzed from the visualization results shown in Figure 7 and Figure 8. Yellow, blue, green, and red represent the left hand, right hand, feet, and tongue MI tasks, respectively. In each figure, panels (a–c) show the distribution of the initial data, the distribution after the feature extraction block, and the distribution after the feature selection block, respectively. The distribution of the initial data is difficult to discriminate. After the feature extraction block, each category of MI data shows a clustered distribution in the two-dimensional space. After the feature selection block, the distance between samples of different categories becomes larger and each category shows clearly clustered distribution characteristics. This indicates that each block in the proposed method contributes to the discriminability of the learned features.

4. Discussion

In this study, the proposed MGCANet model improves accuracy by fusing the topological associations of brain regions with the global attention dependencies of the EEG signal sequence. First, we consider the physical and functional associations of brain regions to construct different views, achieving information complementarity and emphasizing the spatial feature expression of MI signals. Second, we use residual learning to avoid the gradient vanishing and over-smoothing problems caused by increasing GCN depth, which allows the model to learn high-order features and converge more easily. Third, we introduce multi-head attention to capture global feature information, which enables the model to obtain richer features. In addition, we propose a method to adaptively fuse the features to build a more efficient model. The effectiveness of the proposed model is demonstrated through the comparison experiments and the ablation study in Section 3.4, showing that it achieves higher MI classification accuracy.

4.1. Visualization of the Brain Topographical Map

To explore the impact of the functional-connection-based brain view on the spatial feature distribution, we performed brain topographical visualization on the OpenBMI dataset. As shown in Figure 9, panels (a,b) show left-hand motor imagery samples of subject 17, and panels (c,d) show left-hand motor imagery samples of subject 48. There are differences in event-related desynchronization and event-related synchronization (ERD/ERS) when different subjects (subject 17 and subject 48) perform the same motor imagery task (left hand). Specifically, the activation location and intensity differ, showing that there are obvious individual differences in MI-EEG signals between subjects. In addition, the energy distribution of the left-hand motor imagery samples changes more noticeably after spatial feature enhancement than in the original samples, while both reflect the spatial energy distribution of left-hand motor imagery.

4.2. Selection of Parameters

In this study, we adjusted the parameters of the proposed method to obtain better classification results. The K-order of the Chebyshev polynomials, the number of graph convolutional layers, the number of attention heads, and the maximum norm of the weights in the fully connected layers all influence classification performance. The range of values was determined according to values commonly used in deep learning. The parameter selection experiment started with the K-order and then adjusted the number of graph convolutional layers, the number of attention heads, and the maximum norm sequentially; the other parameters were kept fixed while tuning each parameter. The results of the parameter selection are shown in Table 3, and the highest average classification accuracy is shown in bold.
As can be seen from Table 3, the best performance is obtained with K = 3, three graph convolutional layers, eight attention heads, and a maximum norm of 0.5. The network with K = 3 achieves higher classification accuracy than networks with K = 1 or 2: the larger the K-order of the Chebyshev polynomial, the larger the receptive radius of the convolutional kernel, so each node can capture and aggregate more feature information from other nodes. However, the accuracy with K = 4 is lower than with K = 3, indicating that as the receptive radius grows, nodes also aggregate more irrelevant information. The number of graph convolutional layers likewise affects classification accuracy: increasing it can increase the expressiveness of the model, but it also increases model complexity. The accuracy of the four-layer graph convolutional network is lower than that of the three-layer network, indicating that adding more graph convolutional layers may lead to overfitting. Classification accuracy increases as the number of attention heads grows, since more heads capture more comprehensive features and prevent the model from relying on only a few features for decoding. However, too many attention heads increase the number of parameters, leading to overfitting and limiting further accuracy gains.

4.3. Ablation Study of the Adaptive-Weighted Fusion

To verify the effectiveness of the proposed adaptive-weighted fusion method, we compare it with two other feature fusion methods: element-wise addition (denoted as add) and concatenation (denoted as concat). The experimental results are shown in Table 4. On the BCIC IV 2a dataset, the average accuracy of the proposed fusion method is 1.78% and 1.01% higher than that of the two comparison methods, respectively. On the OpenBMI dataset, it is 1.44% and 0.83% higher, respectively. These results further demonstrate the effectiveness of the proposed Awf method.

4.4. The Influence of Different Attention Methods

In this work, we introduced the multi-head attention (MHA) method for feature selection by capturing global attention information, which effectively integrates local and global features. We evaluated the classification performance of four attention methods on the two datasets: Squeeze-and-Excitation (SE) [41], Efficient Channel Attention (ECA) [42], Shuffle Attention (SA) [43], and MHA; the results are shown in Table 5. The standard deviation is denoted as std, and the highest accuracy (acc) is shown in bold. The classification accuracy of MGCANet combined with MHA is higher than that of the model combined with the other attention blocks. The model combined with SE allocates attention weights based on the information between feature maps, considering only channel attention and ignoring attention in the spatial dimension. Since the SA method divides the feature maps into groups, it weakens the effect of channel attention and affects the accuracy of the model combined with SA. The model combined with MHA can compute feature correlations in the spatial–temporal dimensions together with global attention information, achieving effective attention weight allocation and improving decoding performance.
Although the proposed model achieves good results on MI decoding tasks, its number of parameters still needs further reduction to build a more lightweight classification model. Future work will explore ways to optimize the network structure by reducing the number of parameters and the number of attention blocks.

5. Conclusions

In this paper, we propose a multi-view graph convolutional attention network (MGCANet) model for motor imagery classification. According to the topological relationships of brain regions during MI tasks, different brain views are constructed based on physical and functional connections to enrich the representation of spatial association information. Considering that increasing the number of GCN layers can cause gradient vanishing and network degradation, we construct ResChebyNet with a residual learning structure and design an adaptive-weighted fusion strategy to fuse the features from the different brain graphs and improve feature learning. In addition, we introduce the multi-head self-attention method to learn global dependencies and further improve classification accuracy. The experimental results on public datasets demonstrate the effectiveness of the proposed MGCANet, which can improve the performance of MI decoding tasks.

Author Contributions

Conceptualization, X.T. and D.W.; methodology, X.T. and D.W.; validation, X.T. and D.W.; formal analysis, X.T. and S.W.; writing—original draft preparation, X.T. and M.X.; writing—review and editing, D.W. and J.C.; supervision, D.W.; funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Postdoctoral Fellowship Program of the China Postdoctoral Science Foundation under Grant Number GZC20230189, the Natural Science Foundation of China under Grant No. 12275295, and the Project of Construction and Support for high-level teaching Teams of Beijing Municipal Institutions.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available at the following URL/DOI: https://bbci.de/competition/iv/ accessed on 10 May 2023.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aggarwal, S.; Chugh, N. Review of machine learning techniques for EEG based brain computer interface. Arch. Comput. Methods Eng. 2022, 29, 3001–3020. [Google Scholar] [CrossRef]
  2. Orban, M.; Elsamanty, M.; Guo, K.; Zhang, S.; Yang, H. A review of brain activity and EEG-based brain–computer interfaces for rehabilitation application. Bioengineering 2022, 9, 768. [Google Scholar] [CrossRef]
  3. Grazia, A.; Wimmer, M.; Müller-Putz, G.R.; Wriessnegger, S.C. Neural suppression elicited during motor imagery following the observation of biological motion from point-light walker stimuli. Front. Hum. Neurosci. 2022, 15, 788036. [Google Scholar] [CrossRef] [PubMed]
  4. Chaddad, A.; Wu, Y.; Kateb, R.; Bouridane, A. Electroencephalography signal processing: A comprehensive review and analysis of methods and techniques. Sensors 2023, 23, 6434. [Google Scholar] [CrossRef] [PubMed]
  5. Hosseini, M.P.; Hosseini, A.; Ahi, K. A review on machine learning for EEG signal processing in bioengineering. IEEE Rev. Biomed. Eng. 2020, 14, 204–218. [Google Scholar] [CrossRef]
  6. Rithwik, P.; Benzy, V.K.; Vinod, A.P. High accuracy decoding of motor imagery directions from EEG-based brain computer interface using filter bank spatially regularised common spatial pattern method. Biomed. Signal Process. Control 2022, 72, 103241. [Google Scholar] [CrossRef]
  7. Quadrianto, N.; Cuntai, G.; Dat, T.H.; Xue, P. Sub-band Common Spatial Pattern (SBCSP) for Brain-Computer Interface. In Proceedings of the 2007 3rd International IEEE/EMBS Conference on Neural Engineering, Kohala Coast, HI, USA, 2–5 May 2007; pp. 2–7. [Google Scholar]
  8. Kumar, S.; Sharma, A.; Mamun, K.; Tsunoda, T. A Deep Learning Approach for Motor Imagery EEG Signal Classification. In Proceedings of the 2016 3rd Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE), Nadi, Fiji, 10–12 December 2016; pp. 34–39. [Google Scholar]
  9. Yannick, R.; Hubert, B.; Isabela, A.; Alexandre, G.; Falk, T.H.; Jocelyn, F. Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng. 2019, 16, 051001. [Google Scholar]
  10. Dai, G.; Zhou, J.; Huang, J.; Wang, N. HS-CNN: A CNN with hybrid convolution scale for EEG motor imagery classification. J. Neural Eng. 2020, 17, 016025. [Google Scholar] [CrossRef] [PubMed]
  11. Li, H.; Ding, M.; Zhang, R.; Xiu, C. Motor imagery EEG classification algorithm based on CNN-LSTM feature fusion network. Biomed. Signal Process. Control 2022, 72, 103342. [Google Scholar] [CrossRef]
  12. Schirrmeister, R.; Gemein, L.; Eggensperger, K.; Hutter, F.; Ball, T. Deep learning with convolutional neural networks for decoding and visualization of EEG pathology. In Proceedings of the 2017 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), Philadelphia, PA, USA, 2 December 2017; pp. 1–7. [Google Scholar]
  13. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
  14. Izzuddin, T.A.; Safri, N.M.; Othman, M.A. Compact convolutional neural network (CNN) based on SincNet for end-to-end motor imagery decoding and analysis. Biocybern. Biomed. Eng. 2021, 41, 1629–1645. [Google Scholar] [CrossRef]
  15. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  16. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef]
  17. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  18. Zhang, D.; Chen, K.; Jian, D.; Yao, L. Motor imagery classification via temporal attention cues of graph embedded EEG signals. IEEE J. Biomed. Health Inform. 2020, 24, 2570–2579. [Google Scholar] [CrossRef]
  19. Sun, B.; Zhang, H.; Wu, Z.; Zhang, Y.; Li, T. Adaptive spatiotemporal graph convolutional networks for motor imagery classification. IEEE Signal Process. Lett. 2021, 28, 219–223. [Google Scholar] [CrossRef]
  20. Hou, Y.; Jia, S.; Lun, X.; Hao, Z.; Shi, Y.; Li, Y.; Lv, J. GCNs-net: A graph convolutional neural network approach for decoding time-resolved eeg motor imagery signals. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 7312–7322. [Google Scholar] [CrossRef]
  21. Galassi, A.; Lippi, M.; Torroni, P. Attention in natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4291–4308. [Google Scholar] [CrossRef]
  22. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.-H.; Martin, R.R.; Cheng, M.-M.; Hu, S.-M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
  23. Li, D.; Xu, J.; Wang, J.; Fang, X.; Ji, Y. A multi-scale fusion convolutional neural network based on attention mechanism for the visualization analysis of EEG signals decoding. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2615–2626. [Google Scholar] [CrossRef]
  24. Zhang, H.; Zhao, X.; Wu, Z.; Sun, B.; Li, T. Motor imagery recognition with automatic EEG channel selection and deep learning. J. Neural Eng. 2021, 18, 016004. [Google Scholar] [CrossRef] [PubMed]
  25. Liu, C.; Jin, J.; Xu, R.; Li, S.; Zuo, C.; Sun, H.; Cichocki, A. Distinguishable spatial-spectral feature learning neural network framework for motor imagery-based brain–computer interface. J. Neural Eng. 2021, 18, 0460e4. [Google Scholar] [CrossRef]
  26. Yu, Z.; Chen, W.; Zhang, T. Motor imagery EEG classification algorithm based on improved lightweight feature fusion network. Biomed. Signal Process. Control 2022, 75, 103618. [Google Scholar] [CrossRef]
  27. Liu, S.; Wang, X.; Zhao, L.; Li, B.; Hu, W.; Yu, J.; Zhang, Y.D. 3DCANN: A spatio-temporal convolution attention neural network for EEG emotion recognition. IEEE J. Biomed. Health Inform. 2021, 26, 5321–5331. [Google Scholar] [CrossRef] [PubMed]
  28. Eldele, E.; Chen, Z.; Liu, C.; Wu, M.; Kwoh, C.K.; Li, X.; Guan, C. An attention-based deep learning approach for sleep stage classification with single-channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 809–818. [Google Scholar] [CrossRef] [PubMed]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  30. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. JMLR Org. 2015, 37, 448–456. [Google Scholar]
  31. Ye, Z.; Li, Z.; Li, G.; Zhao, H. Dual-channel deep graph convolutional neural networks. Front. Artif. Intell. 2024, 7, 1290491. [Google Scholar] [CrossRef]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  33. Tangermann, M.; Müller, K.-R.; Aertsen, A.; Birbaumer, N.; Braun, C.; Brunner, C.; Leeb, R.; Mehring, C.; Miller, K.J.; Müller-Putz, G.R.; et al. Review of the BCI Competition IV. Front. Neurosci. 2012, 6, 55. [Google Scholar] [CrossRef]
  34. Lee, M.-H.; Kwon, O.-Y.; Kim, Y.-J.; Kim, H.-K.; Lee, Y.-E.; Williamson, J.; Fazli, S.; Lee, S.-W. EEG dataset and OpenBMI toolbox for three BCI paradigms: An investigation into BCI illiteracy. GigaScience 2019, 8, giz002. [Google Scholar] [CrossRef]
  35. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef] [PubMed]
  36. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  37. Borra, D.; Fantozzi, S.; Magosso, E. Interpretable and lightweight convolutional neural network for EEG decoding: Application to movement execution and imagination. Neural Netw. 2020, 129, 55–74. [Google Scholar] [CrossRef] [PubMed]
  38. Hou, Y.; Jia, S.; Lun, X.; Zhang, S.; Chen, T.; Wang, F.; Lv, J. Deep feature mining via the attention-based bidirectional long short term memory graph convolutional neural network for human motor imagery recognition. Front. Bioeng. Biotechnol. 2022, 9, 706229. [Google Scholar] [CrossRef]
  39. Song, Y.; Zheng, Q.; Liu, B.; Gao, X. EEG conformer: Convolutional transformer for EEG decoding and visualization. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 710–719. [Google Scholar] [CrossRef]
  40. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  41. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  42. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  43. Zhang, Q.L.; Yang, Y.B. Sa-net: Shuffle attention for deep convolutional neural networks. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 2235–2239. [Google Scholar]
Figure 1. The structure of the MGCANet method. P-view and F-view represent the physical-distance-based brain view and functional-connectivity-based brain view, respectively. DwConv and MHA stand for the depth-wise convolutional layer and multi-head attention layer.
Figure 2. Graph conversion of EEG signals.
Figure 3. The diagram of the constructed adjacency matrix based on physical distance.
Figure 4. The structure of the ResChebyNet.
Figure 5. The structure of the adaptive-weighted fusion module. DwConv and PwConv denote the depth-wise convolutional layer and the pointwise convolutional layer and GAP denotes the global average pooling layer.
Figure 6. The confusion matrix of MGCANet. (a) The confusion matrix on BCIC IV 2a. (b) The confusion matrix on OpenBMI.
Figure 7. Visualization results on BCIC IV 2a dataset (a) Data distribution of initial data. (b) Data distribution after the feature extraction block. (c) Data distribution after the feature selection block.
Figure 8. Visualization results on OpenBMI dataset (a) Data distribution of initial data. (b) Data distribution after the feature extraction block. (c) Data distribution after the feature selection block.
Figure 9. Topographical distribution of spatial features in OpenBMI: (a) the topography of original samples of subject 17; (b) the topography of enhanced samples by the functional connection of subject 17; (c) the topography of original samples of subject 48; (d) the topography of enhanced samples by the functional connection of subject 48.
Table 1. Cross-session experiment comparison results.

Dataset      Method           Year    Average Acc (%)    Kappa    Std
BCIC IV 2a   DeepConvNet      2017    66.87              0.59     15.03
             EEGNet           2018    68.18              0.57     14.25
             Sinc-ShallowNet  2020    73.34              0.65     12.80
             G-CRAM           2020    72.53              0.64     12.35
             BiLSTM-GCN       2022    73.65              0.67     12.06
             EEG-Conformer    2023    75.37              0.69     12.74
             MGCANet                  78.26              0.70     10.50
OpenBMI      DeepConvNet      2017    60.08              0.31     14.95
             EEGNet           2018    68.17              0.39     13.06
             Sinc-ShallowNet  2020    68.64              0.36     13.90
             G-CRAM           2020    68.05              0.36     14.21
             BiLSTM-GCN       2022    67.92              0.39     15.14
             EEG-Conformer    2023    66.45              0.35     14.32
             MGCANet                  73.68              0.45     12.82
Table 2. The result of the ablation study.

Dataset      w/o P-View (%)   w/o F-View (%)   w/o Res (%)   w/o Awf (%)   w/o MHA (%)   MGCANet (%)
BCIC IV 2a   74.83            74.67            75.21         77.39         75.60         78.26
OpenBMI      69.95            70.39            69.72         72.65         71.14         73.68
Table 3. The results of parameter selection.

Parameter           Value   BCIC IV 2a (%)   OpenBMI (%)
K                   1       74.34            71.17
                    2       76.58            71.45
                    3       78.26            73.68
                    4       74.03            70.92
Number of layers    1       75.18            70.60
                    2       76.72            71.61
                    3       78.26            73.68
                    4       72.51            69.24
Number of heads     1       77.65            72.75
                    4       77.73            73.16
                    6       78.04            73.51
                    8       78.26            73.68
                    10      77.48            72.24
Max norm            0.1     73.63            71.90
                    0.2     75.84            72.44
                    0.5     78.26            73.68
                    1.0     73.19            71.07
Table 4. The results of the ablation study of the adaptive-weighted fusion.

Method     Fusion     BCIC IV 2a (%)   OpenBMI (%)
MGCANet    add        76.48            72.24
           concat     77.25            72.85
           proposed   78.26            73.68
Table 5. The results of the different attention methods.

Dataset      Method   Average Acc (%)   Std (%)
BCIC IV 2a   SE       73.95             15.76
             ECA      76.67             11.08
             SA       77.20             12.28
             MHA      78.26             10.50
OpenBMI      SE       70.24             16.03
             ECA      72.52             13.24
             SA       71.43             12.95
             MHA      73.68             12.82
