Four-Stage Domain Adaptation Transfer Learning for EEG-Based Decoding of Unilateral Upper Limb Motor Imagery

Nan, Jiaofen; Jin, Xueqi; Lin, Jingjing; Li, Conghui; Li, Duan; Zheng, Qian

doi:10.3390/info17060592

by Jiaofen Nan, Xueqi Jin, Jingjing Lin, Conghui Li, Duan Li and Qian Zheng ^*

School of Computer Science and Artificial Intelligence, Zhengzhou University of Light Industry, Zhengzhou 450000, China

^* Author to whom correspondence should be addressed.

Information 2026, 17(6), 592; https://doi.org/10.3390/info17060592 (registering DOI)

Submission received: 14 April 2026 / Revised: 4 June 2026 / Accepted: 10 June 2026 / Published: 13 June 2026

(This article belongs to the Topic Image Processing, Signal Processing and Their Applications)

Abstract

The practical application of Brain–Computer Interface (BCI) technology is frequently challenged by significant inter-individual variability in electroencephalogram (EEG) signals. This variability makes it extremely difficult to decode the brain activity of new subjects using pre-recorded data from previous subjects. To address these issues, this study presents an EEG decoding approach based on four-stage domain generalization. We start by preprocessing the data and then dividing it into source and target domains. The source domain data are then passed through four sequential modules: Feature Extraction, Feature Augmentation, Feature Optimization, and Domain Adaptation, where we adjust the parameters using the source domain loss function. Next, the target domain data go through the same four stages while we fine-tune the parameters together with the domain adaptation loss, ultimately obtaining the decoding results for the target domain. The proposed method achieves the highest classification accuracy of 72.61%, outperforming EEGTransferNet by 7.22% and surpassing all classical and deep learning baselines by improvements ranging from 5.97% to 23.86%. Overall, the proposed method significantly enhances cross-subject generalization in motor imagery decoding, offering practical value for plug-and-play BCI applications.

Keywords:

brain–computer interface; domain adaptation; motor imagery; transfer learning; unilateral upper limb

1. Introduction

The human upper limb is the core motor system for performing daily operations, tool use, and fine interactions. Its fundamental movements, such as reaching, forearm pronation/supination, and hand grasping, constitute the basic units of the vast majority of functional activities. Among populations with neurological injuries, including stroke, spinal cord injuries, and motor dysfunctions, impairment of the upper limb motor control pathway often leads to a significant decline in the ability to live independently. Consequently, accurately parsing upper limb movement intentions and achieving reliable decoding have become key scientific issues in the fields of neural engineering, rehabilitation medicine, and Brain–Computer Interface (BCI) [1,2,3].

To date, considerable progress has been made in motor intention decoding based on electroencephalogram (EEG) [4,5]. Since the 1990s, research on motor imagery (MI) decoding methods for humans has received widespread attention. At present, decoding research for MI-BCI mainly develops along two directions: traditional machine learning [6] and deep learning [7]. Traditional machine learning methods require manual feature extraction, after which selected features are used to train classifiers. Notably, classical Filter Bank Common Spatial Pattern (FBCSP) [8]-based approaches have served as the main reference methods for several years. FBCSP extracts discriminative features by dividing EEG signals into multiple frequency bands and applying the Common Spatial Pattern (CSP) [9] algorithm to each band. Methods such as RCSP + SVM, RCSP + KNN, and RCSP + RF share this reliance on manual feature extraction [10,11,12]. In contrast, deep learning can automatically extract features and perform classification, thereby learning complex feature representations from raw data and overcoming the limitations of traditional methods. In particular, Convolutional Neural Networks (CNNs) have been widely adopted as the mainstream architectures due to their capability in extracting local spatial–temporal patterns. For instance, Schirrmeister et al. [13] designed a deep convolutional network architecture (DEEP CONVNET), leveraging the advantages of deep network structures to automatically learn and extract complex hierarchical features from EEG data. Lawhern et al. [14] proposed a compact convolutional neural network model (referred to as EEGNet), which automatically extracts spatiotemporal features through depthwise and separable convolutions for various EEG classification tasks and demonstrated robust generalization performance in small-sample scenarios. Cai et al. [15] utilized a Genetic Algorithm-pre-trained EEGNet, achieving accuracies of 93.92%, 90.2%, and 94.64% in pairwise classification of unilateral elbow, wrist, and hand joints, respectively. Zhao et al. [16] constructed a multi-branch 3D convolutional neural network framework (3DCNN) specifically for MI classification tasks to automatically model spatial and temporal information. Zhang R. et al. [17] proposed a feature extraction method called Multi-branch Fusion Convolutional Network (MF-CNN), using two independent CNNs to extract time–frequency domain features and spatial features, achieving a classification accuracy of 78.52% on arm, wrist, and grasping movements. Furthermore, recent Transformer-based architectures have been increasingly explored to capture global temporal and spatial dependencies in EEG analysis [18,19]. For instance, Song et al. [20] proposed a Spatial–Temporal Tiny Transformer method, using the attention mechanism to perform spatial and temporal transformations on feature channels and time slices, achieving average classification accuracies of 82.59% and 84.26% on motor imagery tasks involving hands, feet, and tongue. While the above studies have achieved good results in motor intention recognition, the data distribution across different subjects usually exhibits significant differences due to prominent inter-individual neurophysiological variations. Consequently, when a traditional deep learning model trained on one subject is directly applied to another, its performance often declines. This inter-subject distribution discrepancy greatly limits the versatility and practicality of BCI systems.

To solve this problem, an increasing number of studies in recent years have focused on cross-subject transfer learning methods. These approaches aim to leverage existing data from multiple subjects (source domains) to construct models with robust generalization capabilities and transfer them to new target subjects through appropriate adaptation mechanisms, thereby enhancing the performance and adaptability of the model on new subjects. For example, Roy A.M. et al. [21] proposed an adaptive transfer learning-based multi-scale feature-fused deep convolutional neural network (MSFFCNN) that captures signal characteristics across non-overlapping EEG frequency bands, achieving a classification accuracy of 91.61% in cross-subject experiments involving four tasks: left hand, right hand, feet, and tongue. Zhang K. et al. [22] utilized an adaptive transfer learning method based on deep convolutional neural networks for EEG signal classification, reaching a cross-subject accuracy of 84.19% in decoding EEG for imagined left- and right-hand tasks. Khademi Z. et al. [23] proposed a hybrid model consisting of CNN and LSTM, combined with a pre-trained Inception-v3 transfer learning strategy, which achieved an average cross-subject decoding accuracy of 92.00% across four classes of motor imagery tasks (left hand, right hand, feet, and tongue). Liang et al. [24] proposed EEGTransferNet, an end-to-end modular transfer learning framework that integrates statistical distribution alignment with domain adaptation techniques to learn cross-individual generic features and domain-specific features at different layers of the neural network. Experimental results demonstrated that EEGTransferNet achieved cross-subject accuracies of 74.68% and 68.29% on two-class motor imagery tasks involving the left and right hands and four-class motor imagery tasks involving the hands, feet, and tongue, respectively. Furthermore, Miao et al. [25] proposed a meta-learning-enhanced multi-source domain adaptation framework, unifying cross-task, cross-dataset, and cross-subject adaptation to achieve zero-calibration MI-EEG decoding. Other recent studies have also actively explored diverse strategies, such as multi-source dynamic adaptation, synchronized self-training, and privacy-preserving adversarial training, to further mitigate cross-domain shifts [26,27,28,29]. The above studies demonstrate that transfer learning and domain adaptation methods make high-precision cross-subject EEG decoding possible. However, existing research on cross-domain decoding specifically targeted at fine motor movements of the same limb remains relatively scarce.

As is well known, limb motor dysfunction caused by stroke is often unilateral [30,31]. In BCI-based rehabilitation training, using motor imagery (MI) tasks of a unilateral limb is more natural and intuitive than using MI tasks involving different body parts [32]. Therefore, focusing on the motor imagery tasks of the right upper limb and addressing the issue of poor cross-subject generalization in EEG decoding, this study employs a four-stage domain generalization technique to enhance the distribution similarity between source and target domain data. This ensures that the proposed model not only meets the deployment requirements of practical BCI systems but also contributes to exploring more robust methods for constructing cross-individual BCI models. The contributions of this paper are as follows:

(1) To decode three core movements of the unilateral upper limb (reaching, rotating, and grasping), this study proposes a four-stage domain generalization deep learning framework that integrates Feature Extraction, Feature Augmentation, Feature Optimization, and Domain Adaptation. By progressively narrowing the feature distribution discrepancy between the source and target domains, this framework learns EEG representations of unilateral limb motor imagery that are shared across subjects, achieving effective knowledge transfer from the source domain to the target domain.
(2) The Feature Extraction based on the channel attention mechanism focuses on EEG representations of key spatial motor activity, and the Feature Augmentation based on the time-shift strategy expands data diversity.
(3) In cross-subject experiments, this method outperforms the classical methods and mainstream deep/transfer learning models it is compared with. The research findings provide technical support for upper limb rehabilitation BCI, intelligent prosthetic control, and human–computer interaction systems, reducing the user calibration costs of BCI systems and promoting the transition of motor neural decoding from laboratory research to clinical and daily life scenarios.

2. Experimental Data and Methods

2.1. Experimental Data

The EEG data used in this study were obtained from the GigaDB dataset (http://gigadb.org/dataset/view/id/100788/File_page/2 (accessed on 28 October 2023)) published by Jeong et al. [32]. This dataset includes data from 25 healthy subjects. Before the experiment, subjects were required to abstain from alcohol and ensure adequate sleep. During the experiment, subjects were asked to perform three types of motor imagery tasks: forward arm reaching, cup grasping, and left wrist rotation. When the experiment began, a gray background appeared on the screen with a black cross symbol, and visual instructions were provided on the monitor. After a rest period, a visual prompt with text symbols was displayed on the monitor for 3 s. Upon seeing the prompt, the subjects performed the corresponding motor imagery task within 3 s. The specific experimental paradigm is shown in Figure 1.

Subjects were required to participate in three experimental sessions, with each session separated by an interval of one week. The experimental environment and protocol were identical for all three sessions. EEG electrodes were placed according to the international 10–20 system, with a total of 60 channels placed and a sampling frequency of 2500 Hz.

2.2. Methods

This study proposes an EEG decoding method based on four-stage domain generalization, the workflow of which is illustrated in Figure 2. The method begins with data preprocessing, followed by the division into source and target domain data pairs. Subsequently, the source domain data sequentially enter four modules: Feature Extraction, Feature Augmentation, Feature Optimization, and Domain Adaptation, during which parameter tuning is performed through the loss function of the source domain. Next, the target domain data undergo feature extraction, feature augmentation, feature optimization, and domain adaptation, with parameters fine-tuned in conjunction with the domain adaptation loss to obtain the decoding results for the target domain. The details are as follows.

2.2.1. Data Partitioning of Source and Target Domains

This study employs Leave-One-Subject-Out Cross-Validation (LOOCV) to complete data partitioning and model evaluation, strictly avoiding any potential data leakage. The dataset comprises a total of 25 subjects. In each round of validation, the EEG data of 24 subjects are selected as the source domain for model training, while the data of the remaining 1 subject serve as the target domain for model testing. Each subject is sequentially designated as the target domain, and the aforementioned process is repeated until all subjects have completed one round of validation as the target domain. Ultimately, 25 sets of independent experimental results are obtained to objectively measure the generalization ability of the model in cross-subject EEG recognition tasks, ensuring the robustness and credibility of the evaluation results.
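The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code; the subject identifiers (`S01` to `S25`) and the function name `leave_one_subject_out` are hypothetical.

```python
from typing import Iterator, List, Tuple


def leave_one_subject_out(subject_ids: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_subjects, target_subject) pairs, one fold per subject."""
    for target in subject_ids:
        # Every subject except the held-out one forms the source domain.
        source = [s for s in subject_ids if s != target]
        yield source, target


# Hypothetical identifiers for the 25 subjects of the dataset.
subjects = [f"S{i:02d}" for i in range(1, 26)]
folds = list(leave_one_subject_out(subjects))

for source, target in folds[:2]:
    print(f"target={target}, number of source subjects={len(source)}")
```

Because the target subject never appears among the source subjects of its own fold, no trial of the test subject can leak into training.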

2.2.2. Data Preprocessing

The EEG signal preprocessing stage includes signal denoising, channel selection, band-pass filtering, data downsampling, and data normalization. The details are as follows:

(1) Signal Denoising: This study employs Information Maximization Independent Component Analysis (Infomax-ICA) to decompose the EEG signals, facilitating the removal of artifacts such as those caused by eye and head movements.
(2) Channel Selection: Previous studies [17] have shown that when subjects execute motor tasks, the event-related desynchronization (ERD) and event-related synchronization (ERS) phenomena are most pronounced in the sensorimotor cortex. Therefore, this study selects 20 EEG channels in the sensorimotor cortex area (FC1, FC2, FC3, FC4, FC5, FC6, C1, C2, C3, C4, C5, C6, CP1, CP2, CP3, CP4, CP5, CP6, CZ, and CPZ) for subsequent feature extraction and task analysis.
(3) Band-pass Filtering and Downsampling: For the denoised EEG signals from the selected 20 channels, a zero-phase fourth-order Butterworth filter is applied for band-pass filtering (8 to 30 Hz), and the signals are downsampled from 2500 Hz to 250 Hz.
(4) Data Normalization: After merging the multi-trial data for each subject, a normalization process is applied to reduce signal volatility and non-stationarity, which is formulated as follows:

$$X = \frac{X_{0} - \mu}{\sigma} \tag{1}$$

where $X_{0}$ and $X$ represent the band-pass filtered data and the normalized output, respectively; $\mu$ and $\sigma$ represent the mean and the standard deviation, which are calculated individually for each subject on a per-channel basis using only the training data, and then applied directly to the test data.
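A minimal sketch of this normalization for a single subject is given below. The helper names, the use of the population standard deviation, and the small constant `eps` that guards against division by zero are assumptions made for illustration; the paper does not specify them.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Signal = List[List[float]]  # shape: channels x time points


def fit_channel_stats(train: Signal) -> List[Tuple[float, float]]:
    """Per-channel mean and standard deviation, estimated on training data only."""
    return [(fmean(channel), pstdev(channel)) for channel in train]


def apply_zscore(data: Signal, stats: List[Tuple[float, float]],
                 eps: float = 1e-8) -> Signal:
    """Apply X = (X0 - mu) / sigma channel by channel with stored statistics."""
    return [[(x - mu) / (sigma + eps) for x in channel]
            for channel, (mu, sigma) in zip(data, stats)]


# Toy data: 2 channels; the statistics come from the training part only.
train = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
test = [[2.0, 4.0], [20.0, 40.0]]

stats = fit_channel_stats(train)
train_norm = apply_zscore(train, stats)
test_norm = apply_zscore(test, stats)  # reuses the training mean and std
```

Reusing the stored training statistics on the test data is what keeps information about the test distribution out of the normalization step.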

2.2.3. Feature Extraction Module

For the source domain data, the pre-processed EEG signals are first passed through a one-dimensional convolutional layer to extract local features in the time domain. The input EEG signal is a two-dimensional tensor $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of EEG electrodes and $T$ is the number of time points. The output of this layer is represented as:

$$Z_{1}^{f} = \mathrm{Conv2D}(X, W_{1}) \in \mathbb{R}^{F_{1} \times C \times (T - L + 1)} \tag{2}$$

where the convolutional kernel $W_{1} \in \mathbb{R}^{F_{1} \times 1 \times 1 \times L}$ is used to generate $F_{1}$ feature maps that capture local temporal features, and $L$ is the sliding window size along the time axis. Subsequently, the extracted local temporal features undergo batch normalization and are input into the spatial convolutional layer as follows:

$$Z_{2}^{f} = \mathrm{Conv2D}(Z_{1}^{f}, W_{2}) \in \mathbb{R}^{B \times F_{1} \times (T - L + 1)} \tag{3}$$

where the value of $f$ ranges from 1 to $F_{1}$, and $B$ represents the number of output channels of the spatial convolutional layer. Each feature map independently uses a spatial convolutional kernel to obtain the spatial correlation between EEG signal channels. Then, the feature is output through the one-dimensional convolutional operation as follows:

$$Z_{3} = \mathrm{Conv2D}(Z_{2}^{f}, W_{3}) \in \mathbb{R}^{B \times F_{2} \times 1 \times (T - L + 1)} \tag{4}$$

where the convolutional kernel $W_{3} \in \mathbb{R}^{F_{2} \times F_{1} \times 1 \times 1}$.
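To make the output sizes concrete, the following toy sketch applies one temporal kernel of length L without padding, which shortens the time axis from T to T - L + 1, and then one spatial kernel that collapses the electrode axis. The kernel values and input are arbitrary illustrative numbers, and the pure-Python loops merely stand in for the convolutional layers of a deep learning framework.

```python
from typing import List

Matrix = List[List[float]]  # electrodes x time points


def temporal_conv(x: Matrix, kernel: List[float]) -> Matrix:
    """Slide one 1 x L kernel along time for every electrode (no padding).

    As in deep learning libraries, the kernel is not flipped (cross-correlation).
    """
    length = len(kernel)
    return [
        [sum(row[t + k] * kernel[k] for k in range(length))
         for t in range(len(row) - length + 1)]
        for row in x
    ]


def spatial_conv(z: Matrix, weights: List[float]) -> List[float]:
    """Combine all electrodes at each time step using one weight per electrode."""
    return [sum(w * row[t] for w, row in zip(weights, z))
            for t in range(len(z[0]))]


# Toy input: C = 2 electrodes, T = 6 time points, kernel length L = 3.
x = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
     [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]]
z1 = temporal_conv(x, [1.0, 0.0, -1.0])  # one temporal feature map
z2 = spatial_conv(z1, [0.5, 0.5])        # electrode dimension collapsed
print(len(z1), len(z1[0]), len(z2))      # C, T - L + 1, T - L + 1
```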

To further enhance the network’s ability to focus on key feature channels, a channel attention mechanism is introduced at this feature extraction stage. This mechanism adaptively assigns different weights to different channels, improving the feature representation capability of the network. For the input feature map $Z_{3}$, global average pooling is performed along the spatial dimension to obtain the channel descriptor, as follows:

$$s = \mathrm{AvgPool}(Z_{3}) \in \mathbb{R}^{B \times F_{2} \times 1 \times 1} \tag{5}$$

Then, the channel descriptor vector is input into the attention mechanism. This attention mechanism compresses the channel dimension to $1/D$ of the original to reduce computational complexity and introduces non-linear feature modeling, which can be expressed as:

$$a = \sigma(W_{5} \cdot \mathrm{ReLU}(W_{4} \cdot s)); \quad W_{4} \in \mathbb{R}^{\frac{F_{2}}{D} \times F_{2}}, \; W_{5} \in \mathbb{R}^{F_{2} \times \frac{F_{2}}{D}} \tag{6}$$

where $\sigma(\cdot)$ denotes the sigmoid activation function, and $D$ represents the reduction ratio, which is set to 2 in our experiments. Finally, the attention weights, $a$, are multiplied element-wise by the feature maps, $Z_{3}$, as follows:

$$Z_{4} = Z_{3} \otimes a \tag{7}$$

where $\otimes$ denotes element-wise multiplication. Through this mechanism, the model significantly enhances the response to task-related channels while suppressing irrelevant channel features, thereby improving the overall discriminative capability.
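The three steps of Equations (5) to (7) can be traced on a single sample with the sketch below. It assumes F2 = 4 feature channels and a reduction ratio D = 2; the weight matrices `w4` and `w5` hold arbitrary illustrative values (in the model they are learned), and the function name is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def channel_attention(z3: Matrix, w4: Matrix, w5: Matrix) -> Tuple[Matrix, List[float]]:
    """Reweight feature channels: squeeze (Eq. 5), excite (Eq. 6), scale (Eq. 7)."""
    # Eq. (5): one descriptor per feature channel via global average pooling.
    s = [sum(fmap) / len(fmap) for fmap in z3]
    # Eq. (6): bottleneck of size F2 / D with ReLU, then sigmoid back to F2 weights.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w4]
    a = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
         for row in w5]
    # Eq. (7): scale every feature map by its attention weight.
    z4 = [[a_f * v for v in fmap] for a_f, fmap in zip(a, z3)]
    return z4, a


# Toy feature maps: F2 = 4 channels, 2 time points each.
z3 = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 1.0]]
w4 = [[0.1, -0.2, 0.3, 0.0],      # shape (F2 / D) x F2 = 2 x 4
      [0.0, 0.2, -0.1, 0.4]]
w5 = [[0.5, -0.5], [0.3, 0.3],    # shape F2 x (F2 / D) = 4 x 2
      [-0.4, 0.2], [0.1, 0.6]]

z4, a = channel_attention(z3, w4, w5)
print([round(weight, 3) for weight in a])
```

Since the sigmoid keeps every weight strictly between 0 and 1, the mechanism can only attenuate a feature channel relative to the others; it never changes the sign of a feature.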

2.2.4. Feature Augmentation

To improve the robustness of the proposed transfer learning network against temporal perturbations and enhance its cross-subject generalization capability, this study performs feature augmentation on the data $Z_{4}$ output by the feature extraction stage. By introducing instance-level random temporal shifts, this time-shift augmentation strategy generates diverse training samples that simulate the natural trial-to-trial latency variations commonly observed in EEG data. The source domain data are shifted along the time axis within a predefined range to obtain $Z_{5}$ as follows:

$$Z_{5}(t) = Z_{4}(t + \Delta) \tag{8}$$

where $t$ denotes the time step, and $\Delta$ represents the time shift amount; positions that fall outside the original sequence are set to zero. Regarding the implementation details, a zero-padding strategy is adopted. During each training iteration, the shift amount $\Delta$ for each individual EEG sample within a batch is randomly and independently sampled from a uniform distribution over the integers in $[-S, S]$. In our experiments, the maximum shift range $S$ is set to 10 time steps. To maintain a consistent sequence length after shifting, a sample-specific “Crop and Pad” mechanism is employed. Figure 3 illustrates this mechanism using the maximum forward shift scenario ($\Delta = -10$) as an example. In this case, the tail of the original signal is truncated (Figure 3a, “Cropped Region”), and the vacated elements at the head of the augmented sequence are filled with constant zeros (Figure 3b, “Zero Padding”). Conversely, when $\Delta > 0$, the head is truncated, and the tail is padded with zeros. This approach introduces temporal variance while avoiding the artificial high-frequency boundary noise typically caused by circular shifting.
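A minimal sketch of the “Crop and Pad” shift is shown below. It follows the sign convention of the description above (a negative shift pads the head with zeros and crops the tail, a positive shift does the opposite); the function names and the nested-list data layout are assumptions made for illustration.

```python
import random
from typing import List, Optional

Sample = List[List[float]]  # feature channels x time steps


def shift_with_zero_padding(signal: List[float], delta: int) -> List[float]:
    """Shift one sequence by delta steps, keeping its length via crop and zero pad."""
    n = len(signal)
    k = min(abs(delta), n)
    if delta < 0:
        # Negative shift: zeros fill the head, the tail is cropped.
        return [0.0] * k + signal[: n - k]
    # Positive (or zero) shift: the head is cropped, zeros fill the tail.
    return signal[k:] + [0.0] * k


def augment_batch(batch: List[Sample], max_shift: int = 10,
                  rng: Optional[random.Random] = None) -> List[Sample]:
    """Draw one independent shift per sample, uniformly from [-max_shift, max_shift]."""
    rng = rng or random.Random()
    augmented = []
    for sample in batch:
        delta = rng.randint(-max_shift, max_shift)  # both endpoints included
        augmented.append([shift_with_zero_padding(ch, delta) for ch in sample])
    return augmented


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(shift_with_zero_padding(x, -2))  # [0.0, 0.0, 1.0, 2.0, 3.0]
print(shift_with_zero_padding(x, 2))   # [3.0, 4.0, 5.0, 0.0, 0.0]
```

All channels of one sample share the same shift, so the relative timing between channels is preserved while the absolute latency varies from sample to sample.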

2.2.5. Feature Optimization

As a critical component of the model, the feature optimization stage receives the features $Z_{5}$ output from the feature augmentation stage and subjects them to further optimization processing. The details are as follows:

(1) Separable convolution, an efficient convolutional operation, is employed to decompose standard convolution into depthwise and pointwise convolutions. This significantly reduces the computational complexity and parameter size of the model while preserving its feature extraction capability and ensuring overall performance.
(2) Batch normalization is applied to normalize the input data, ensuring a stable distribution for the input of each network layer. This helps accelerate the training process and improves the stability of the model.
(3) The ELU activation function is utilized to enable the model to learn more complex feature representations. Additionally, average pooling is adopted to reduce the spatial dimensions of the feature maps, which retains the most critical feature information while decreasing the number of parameters and computational overhead.
(4) Dropout is implemented to prevent overfitting by randomly dropping certain neurons in the network, thereby improving the generalization capability of the model.
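The parameter saving from separable convolution in step (1) can be checked with simple arithmetic. The sketch below counts weights for a one-dimensional kernel and ignores bias terms; the channel counts and kernel size are hypothetical values chosen only for illustration, since the paper does not list them here.

```python
def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights of a standard convolution: every output channel sees every input channel."""
    return c_in * c_out * kernel_size


def separable_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Depthwise part (one kernel per input channel) plus 1 x 1 pointwise mixing."""
    depthwise = c_in * kernel_size
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 16 input channels, 16 output channels, kernel length 16.
standard = standard_conv_params(16, 16, 16)
separable = separable_conv_params(16, 16, 16)
print(standard, separable, standard / separable)
```

For this hypothetical layer the separable variant needs 512 weights instead of 4096, an eightfold reduction.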

This stage effectively optimizes the features, providing a higher-quality and more compact feature input $Z_{6}$ for the subsequent knowledge transfer stage, thereby achieving more effective cross-domain knowledge transfer. For the source domain data, predicted labels are obtained through a fully connected layer, and training optimization is performed using the loss between the source domain ground truth labels and the predicted labels (source domain loss). The source domain loss is calculated as follows:

$$L_{cls} = L_{CE}(\hat{Y}_{s}, Y_{s}) = L_{CE}(F_{c}(X_{s}), Y_{s}) \tag{9}$$

where $X_{s}$ and $Y_{s}$ represent the input data and the corresponding ground truth labels of the source domain, respectively. $\hat{Y}_{s}$ denotes the predicted labels. $F_{c}$ is a classification function, implemented here as a fully connected layer to obtain the final classification results. $L_{CE}(\cdot, \cdot)$ represents the cross-entropy loss function.
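As an illustration of the source domain loss, the sketch below computes the mean cross-entropy from the raw outputs (logits) of a classifier for the three-class task. The logits are made-up numbers and the function names are hypothetical; a softmax over the fully connected layer's outputs is assumed.

```python
import math
from typing import List


def cross_entropy(logits: List[float], label: int) -> float:
    """Negative log-softmax probability of the true class (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits: List[List[float]], labels: List[int]) -> float:
    """Average the per-sample loss over a batch of source domain samples."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Three classes; an uninformative prediction costs log(3), about 1.0986.
print(cross_entropy([0.0, 0.0, 0.0], 0))
print(mean_cross_entropy([[4.0, 0.5, 0.1], [0.2, 3.0, 0.3]], [0, 1]))
```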

2.2.6. Domain Adaptation

For the target domain data, feature extraction, feature augmentation, and feature optimization are performed by sharing the parameters trained on the source domain. A domain adaptation stage $D(x)$ is designed for the optimized feature output $Z_{6}$. This stage primarily aims to reduce the feature distribution discrepancy between the source and target domains, thereby achieving cross-domain transfer. The underlying principle is illustrated in Figure 4. Its core idea is to align the data distributions of the source and target domains by introducing an adversarial discriminator. With the assistance of the domain adaptation loss, parameter fine-tuning and optimization are realized. The domain adaptation loss is calculated as follows:

$$L_{adp} = d_{adv}(A(D(x_{i})), \beta_{i}) = -\frac{1}{n_{s} + n_{t}} \sum_{i=1}^{n_{s} + n_{t}} \left[\beta_{i} \log A(D(x_{i})) + (1 - \beta_{i}) \log\left(1 - A(D(x_{i}))\right)\right] \tag{10}$$

where $\beta_{i}$ denotes the binary label of the $i$-th sample, indicating whether the sample belongs to the source or target domain; $n_{s}$ and $n_{t}$ represent the number of samples in the source and target domains, respectively; $d_{adv}$ is the adversarial distance function; and $A(\cdot)$ refers to the domain discriminator, which utilizes a sigmoid activation function. The specific meaning of the formula is that for a source domain sample $x_{i}$, the domain discriminator should correctly identify that it originates from the source domain, i.e., $A(D(x_{i}))$ approaches 1. For a target domain sample $x_{i}$, the domain discriminator should identify that it does not originate from the source domain, i.e., $A(D(x_{i}))$ approaches 0. The logarithm and the summation compute the loss for each sample and average it over all $n_{s} + n_{t}$ samples. Training the domain discriminator with this loss enables adversarial learning between the source and target domains.
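Assuming that Equation (10) is the standard binary cross-entropy over domain labels (1 for source, 0 for target), the discriminator loss can be sketched as follows. The probabilities are made-up discriminator outputs, and the clamping constant `eps` is an implementation assumption that avoids taking the logarithm of zero.

```python
import math
from typing import List


def domain_adversarial_loss(probs: List[float], domain_labels: List[int],
                            eps: float = 1e-12) -> float:
    """Binary cross-entropy between discriminator outputs and domain labels.

    probs[i] plays the role of A(D(x_i)); domain_labels[i] is 1 for a source
    sample and 0 for a target sample.
    """
    total = 0.0
    for p, beta in zip(probs, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += beta * math.log(p) + (1 - beta) * math.log(1.0 - p)
    return -total / len(probs)


labels = [1, 1, 0, 0]  # two source samples, two target samples
print(domain_adversarial_loss([0.9, 0.8, 0.2, 0.1], labels))  # domains easy to tell apart
print(domain_adversarial_loss([0.5, 0.5, 0.5, 0.5], labels))  # indistinguishable: log(2)
```

A loss near log(2) means the discriminator cannot separate the two domains, which is the situation that aligned feature distributions are expected to produce.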

The loss function during parameter fine-tuning is formulated as follows:

$$\min_{f} \left(L_{cls} + \lambda L_{adp}\right) \tag{11}$$

where $L_{cls}$ is the classification loss of the source domain labels, $L_{adp}$ represents the domain adaptation loss, and $\lambda$ ($\lambda > 0$) is a trade-off parameter. To address the class imbalance problem, categories with fewer samples are assigned higher weights in the loss function. Specifically, the weights are set as the reciprocal of the sample proportions to achieve more balanced training.
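The class weighting and the combined objective can be illustrated with the short sketch below. The label counts, the loss values, and the value of the trade-off parameter are hypothetical; the paper does not report them in this section.

```python
from collections import Counter
from typing import Dict, List


def inverse_proportion_weights(labels: List[int]) -> Dict[int, float]:
    """Weight each class by the reciprocal of its share of the samples."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: n / count for cls, count in counts.items()}


def total_loss(l_cls: float, l_adp: float, lam: float) -> float:
    """Objective of Equation (11): classification loss plus weighted adaptation loss."""
    return l_cls + lam * l_adp


# Hypothetical imbalanced label set: class 2 is the rarest.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_proportion_weights(labels)
print(weights)                                 # rarer classes get larger weights
print(total_loss(l_cls=0.9, l_adp=0.6, lam=0.5))
```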

2.3. Comparison Methods

In this experiment, we compared various EEG-based classification methods, including machine learning, deep learning, and transfer learning. All methods used data at a sampling rate of 250 Hz. The detailed descriptions of the comparison methods are as follows:

CNN [23]: Extracts local features through convolutional layers, reduces feature dimensions using pooling layers, and finally performs classification via fully connected layers.

TF-CNN [33]: Performs feature extraction on input time–frequency representations and feeds them into a CNN model for classification.

MF-CNN [17]: Combines multiple features of EEG signals (e.g., temporal features and spatial features) for feature extraction.

SVM [10]: A supervised learning algorithm based on statistical learning theory, which aims to find an optimal hyperplane to separate data from different categories.

LDA [34]: A classic linear classification method that optimizes the feature space by maximizing the inter-class distance and minimizing the intra-class distance.

KNN [11]: An instance-based learning method that classifies by calculating the distance (e.g., Euclidean distance, Manhattan distance) between samples.

DT (Decision Tree) [35]: Performs classification by constructing a tree-like structure, where each node represents a splitting point of a feature. By recursively splitting the feature space, it ultimately forms multiple leaf nodes, each corresponding to a category.

RF (Random Forest) [12]: An ensemble learning method that improves classification performance by constructing and integrating multiple decision trees. During training, it utilizes randomly sampled data subsets (bootstrapping) and randomly selected feature subsets, ultimately determining the classification results through majority voting.

ALEXNET [36]: A deep CNN model comprising multiple convolutional layers, pooling layers, and fully connected layers, capable of extracting deep-level features.

2DCNN [37]: Treats EEG data as a two-dimensional matrix, where rows represent channels and columns represent time points. This structure can effectively capture the spatial relationships between channels and the local features in time series.

3DCNN [38]: Treats EEG data as a three-dimensional tensor encompassing channel, time, and trial dimensions. This structure can simultaneously capture the spatial relationships between channels, the dynamic changes of time series, and the global features across different trials.

CSP [9]: A classic feature extraction method that learns spatial filters maximizing the variance of the signals of one class while minimizing the variance of the other.

FBCSP [8]: Divides EEG signals into multiple frequency bands, applies the CSP algorithm to each band separately to extract features, optimizes the feature combination through feature selection algorithms, and finally inputs the extracted features into a classifier for classification.

SWDA-RCSP [39]: An improved version of CSP that further optimizes feature extraction through relevant subspace projection, enhancing the ability to suppress noise and interference.

Transformer [20]: A deep learning architecture that primarily relies on the attention mechanism for EEG decoding. It applies attention transformation on the feature-channel dimension to enhance relevant spatial features and slices the data in the time dimension to capture global temporal dependencies.

EEGTransferNet [24]: Extracts features through convolutional neural networks and integrates statistical distribution alignment and domain adaptation in different network layers to learn generic features and domain-specific features, respectively, thereby enhancing the similarity of EEG signals across individuals.

MLEMSDA [25]: A meta-learning-enhanced multi-source domain adaptation framework designed for zero-calibration decoding, which unifies cross-task, cross-dataset, and cross-subject adaptation strategies to learn robust representations.

2.4. Evaluation Metric

To evaluate the performance of the algorithm, the classification precision, accuracy, recall, F1-score, confusion matrix, and Kappa coefficient were assessed.

1. Precision

Precision is the number of correctly predicted positive results divided by the total number of positive results predicted by the model, as shown in Formula (12), where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{12}$$

2. Accuracy

Accuracy [40] represents the proportion of correctly classified samples out of the total samples, reflecting the overall correctness of the model's classification, as shown in Formula (13).

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

3. Recall

Recall is the number of correctly predicted positive results divided by the total number of actual positive results, as shown in Formula (14).

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{14}$$

4. F1-score

The F1-score is the harmonic mean of precision and recall, so it takes both into account simultaneously. The closer the F1-score is to 1, the better the performance, as shown in Formula (15).

$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{15}$$

5. Confusion Matrix

The confusion matrix clearly demonstrates the classification accuracy of the model and the degree of misclassification in different tasks [41]. The rows of the matrix represent the true labels, while the columns represent the predicted labels of the model. The values on the diagonal indicate the number of correctly classified samples, whereas the off-diagonal values represent the number of misclassified samples. This intuitively reflects the model’s classification accuracy and misclassification status across different categories.
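The relationship between the confusion matrix and the other metrics can be sketched as follows. The matrix is a made-up three-class example (rows are true labels, columns are predicted labels), not a result from this study, and the per-class values assume a one-vs-rest reading of TP, FP, and FN.

```python
from typing import Dict, List, Tuple


def metrics_from_confusion(cm: List[List[int]]) -> Tuple[float, List[Dict[str, float]]]:
    """Overall accuracy plus per-class precision, recall, and F1-score."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true label differs
        fn = sum(cm[k]) - tp                       # true k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Hypothetical confusion matrix for three classes with 10 samples each.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 2, 8]]
accuracy, per_class = metrics_from_confusion(cm)
print(round(accuracy, 4))
for k, scores in enumerate(per_class):
    print(k, {name: round(value, 4) for name, value in scores.items()})
```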

2.5. Experimental Environment and Parameter Settings

All results presented in this paper were obtained on a computer device equipped with an 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50 GHz, 8 cores, 16 threads, and 16.0 GB of RAM. The software code for all experiments was based on the PyCharm 2024.1.4 integrated development environment and written and debugged using Python 3.8.8 and MATLAB R2020b (version 9.9.0.144674).

The model was trained for 300 epochs with a batch size of 72. We optimized the network using Stochastic Gradient Descent (SGD) with an initial learning rate of 0.001, a momentum of 0.9, and a weight decay of 5 × 10⁻⁴, which serves as an L2 regularization term that penalizes large weights, thereby mitigating overfitting and improving generalization. The learning rate decay coefficient and decay factor were 0.03 and 0.75, respectively.

3. Results

3.1. Cross-Subject Transfer Decoding Results

The cross-subject decoding results are shown in Table 1. Across the 25 subjects, the mean precision, accuracy, F1-score, and recall are 85.66%, 72.61%, 0.72, and 0.73, respectively, with corresponding standard deviations of 3.07%, 3.75%, 0.05, and 0.04. The standard deviations of all four metrics are of similar magnitude (roughly 3 to 5 percentage points), indicating moderate variation in performance across subjects. In terms of individual metrics, Subject-05 ranks first in precision (90.43%), while Subject-10 achieves the highest accuracy (81.33%), F1-score (0.81), and recall (0.81), giving the best overall performance. Subject-22 records the lowest accuracy (66.00%), F1-score (0.61), and recall (0.66), giving the poorest overall performance. In terms of distribution, precision lies between 81% and 91%, accuracy between 66% and 82%, and both F1-score and recall between 0.61 and 0.81. In summary, the proposed model decodes every subject with an accuracy well above the 33.3% chance level of a balanced three-class task.

Figure 5 presents the confusion matrices of the three-class task. For the best-performing subject (Figure 5a), among the samples with the true label of forward arm reaching, 147 were correctly predicted, 2 were misclassified as cup grasping, and 1 was misclassified as left wrist rotating, the fewest misclassifications of any class. Among the samples with the true label of cup grasping, 89 were correctly predicted, while 58 were misclassified as left wrist rotating, making it the class with the fewest correct predictions and the most severe misclassification. Among the samples with the true label of left wrist rotating, 124 were correctly predicted, 5 were misclassified as forward arm reaching, and 21 were misclassified as cup grasping. The worst-performing subject (Figure 5b) shows a similar trend. All 150 samples with the true label of forward arm reaching were correctly predicted; however, among the samples with the true label of cup grasping, only 50 were correctly predicted, and 94 were misclassified as left wrist rotating. Among the samples with the true label of left wrist rotating, 117 were correctly predicted and 32 were misclassified as cup grasping. For both the best- and worst-performing subjects, the model is therefore most reliable in recognizing forward arm reaching, whereas cup grasping and left wrist rotating are frequently confused with each other.

3.2. Comparison Results with Other Methods

This study compared the proposed model with three categories of mainstream algorithms: classical machine learning methods, deep learning methods, and transfer learning methods. As shown in Table 2, the proposed model achieves the best performance among all compared algorithms, with a classification accuracy of 72.61%, which is 7.22 and 5.46 percentage points higher than the transfer learning methods EEGTransferNet (65.39%) and MLEMSDA (67.15%), respectively. Compared to the classical methods (RCSP + SVM, RCSP + LDA, RCSP + KNN, RCSP + DT, CSP, FBCSP, RCSP + RF, and SWDA-RCSP), the proposed method achieves improvements of 7.40%, 11.55%, 7.74%, 13.06%, 18.76%, 18.61%, 6.46%, and 5.97%, respectively. Compared to the deep learning algorithms (CNN, TF-CNN, MF-CNN, ALEXNET, 2DCNN, DEEP-CONVNET, 3DCNN, and Transformer), the proposed method achieves improvements of 21.53%, 22.37%, 15.55%, 23.86%, 16.25%, 12.96%, 19.61%, and 7.37%, respectively. To further verify the reliability of these results, a paired t-test was used to assess the statistical significance of the differences between the proposed method and each compared algorithm. The performance differences between the proposed method and all compared methods are statistically significant (p < 0.01). In summary, the proposed method shows a statistically significant advantage in classification accuracy for motor imagery recognition, which supports its effectiveness.
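The paired t-test used for this comparison can be sketched as follows. The test statistic is computed from the per-subject accuracy differences between two methods evaluated on the same subjects. In this example, the `proposed` values are the accuracies of sub-01 to sub-05 from Table 1, while the `baseline` values are hypothetical placeholders and not results from this study. The function name `paired_t_statistic` is introduced here only for illustration.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) of a paired t-test on two matched samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # unbiased sample variance
    if var_d == 0.0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1


# Accuracies (%) of sub-01 to sub-05 from Table 1.
proposed = [75.78, 68.00, 73.33, 76.44, 79.33]
# Hypothetical baseline accuracies for the same five subjects (not data from the paper).
baseline = [70.0, 64.0, 66.0, 70.0, 72.0]

t_value, dof = paired_t_statistic(proposed, baseline)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

The p-value follows from the t-distribution with n − 1 degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel` returns both the statistic and the p-value directly.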

3.3. Ablation Study and Impact Analysis of Each Module

The results of the ablation study are shown in Table 3. When the feature extraction stage is removed, precision and accuracy decrease by 32.66% and 34.97%, respectively. This indicates that the feature extraction stage is crucial for constructing an effective feature space, and that its absence severely hinders the recognition capability of the model. When only the feature augmentation stage is removed, precision and accuracy drop by 1.20% and 2.65%, respectively. Although the contribution of the feature augmentation stage is relatively small, it still improves performance when combined with the other stages. Without feature optimization, performance degrades markedly, with precision decreasing by 27.41% and accuracy by 26.35%, which highlights the critical role of feature optimization in enhancing the generalization capability of the model. Without domain adaptation, precision and accuracy decrease by 4.79% and 7.03%, respectively, and the standard deviation increases considerably. This indicates that the domain adaptation module contributes not only to decoding correctness but also to the stability of the model. The best performance is achieved by the complete framework, showing that the combined effect of feature extraction, feature augmentation, feature optimization, and domain adaptation improves both prediction capability and stability.

Figure 6 visualizes the feature distributions of the original target domain data and of the data after each of the four processing stages. Figure 6a presents the distribution of the original data. After feature extraction (Figure 6b), the data points of each category become more aggregated, and both the clarity of the boundaries and the degree of separation between categories improve. The clustering of Forward Arm Reaching is markedly enhanced, and the distribution of Left Wrist Rotating also becomes more compact. After feature augmentation (Figure 6c), the data points of Forward Arm Reaching are more tightly clustered. After feature optimization (Figure 6d), the distinction among the three categories is further enhanced: the data points of Forward Arm Reaching are almost completely separated, while the boundaries of Cup Grasping and Left Wrist Rotating become clearer. After domain adaptation (Figure 6e), the feature distribution reaches its best state. The separation between categories and the clustering of data points are further improved, Forward Arm Reaching is completely separated, and the distribution of the entire dataset in the two-dimensional space becomes clearer and more organized.

The above results indicate that, through the cumulative effect of the four-stage feature processing, the separability of the processed data improves substantially. The confusion between category features is considerably alleviated, which supports the improved EEG decoding performance and the effectiveness of the feature processing pipeline in the proposed model.

3.4. Impact of Batch Size on Decoding Accuracy

Table 4 reports the impact of batch size on the evaluation metrics. As the batch size increases from 16 to 108, the metrics first rise and then fall. When the batch size increases from 16 to 64, precision improves from 80.50% to 85.66%, accuracy from 68.00% to 72.61%, F1-score from 0.65 to 0.72, and recall from 0.68 to 0.73. The improvement is largest from 16 to 32 and somewhat smaller from 32 to 64. When the batch size is further increased from 64 to 108, all metrics decline: precision drops to 82.70%, accuracy to 71.32%, F1-score to 0.69, and recall to 0.71. Among the batch sizes tested in this experiment, 64 therefore gives the best performance, and enlarging the batch size beyond this value degrades the model's effectiveness.

3.5. Impact of the Placement of the Feature Augmentation Module on Decoding Performance

Figure 7 compares how the position of the feature augmentation module within the model affects the decoding performance of the proposed method. When the module is placed before feature extraction, accuracy and precision are 53.55% and 68.00%, respectively, which are relatively low. When it is placed before feature optimization, accuracy rises to 72.61% and precision to 85.66%. When it is placed before domain adaptation, both metrics decline, with accuracy and precision dropping by 6.39% and 4.79%, respectively. This indicates that the placement of the feature augmentation module adopted in this study is reasonable and effective. This placement avoids the destruction of physiological features caused by augmenting raw signals and reduces the interference of post-optimization augmentation with domain adaptation, providing an effective augmentation scheme for cross-domain generalized EEG decoding.

3.6. Impact of Attention Mechanism on Decoding Performance

Table 5 demonstrates the impact of the channel attention mechanism on the performance of the proposed model. Compared to the model without channel attention, the decoding model with channel attention shows significant improvements across all evaluation metrics: precision increases by 4.48%, accuracy by 7.94%, F1-score by 18%, and recall by 9%.

4. Discussion

(1) Practical significance of decoding for unilateral upper limb motor imagery

For the three-class task of unilateral limb movements (forward arm reaching, cup grasping, and left wrist rotating), this study achieved a target domain decoding accuracy of 72.61% in a cross-subject domain generalization scenario involving 25 subjects. This result is at a leading level among similar three-class motor imagery EEG studies, indicating that the proposed four-stage domain generalization framework can alleviate the distribution discrepancies of cross-subject EEG data and provide reliable decoding performance for practical BCI systems. It suggests application potential in real-world scenarios with substantial individual differences that require cross-user generalization, such as motor function rehabilitation training for stroke patients and exoskeleton robot control.

(2) Synergistic effect of the four-stage domain generalization framework

The ablation study results show that the complete combination of the four modules (Feature Extraction, Feature Augmentation, Feature Optimization, and Domain Adaptation) improves decoding accuracy by between 2.65% and 34.97% compared to variants missing any single module. This supports the necessity and synergistic effect of each stage. The parameter-sharing mechanism between the source and target domains enables the model to learn EEG representations of limb movement that are shared across subjects. Meanwhile, the domain adaptation loss further narrows the feature distribution discrepancy between the source and target domains, thereby transferring knowledge from the source domain to the target domain. This design preserves the advantage of deep learning in automatically extracting features while addressing the core issues of large individual differences and high annotation costs in EEG signals through the domain generalization strategy. Feature visualization results show that after the step-by-step processing of feature extraction, feature augmentation, and feature optimization, the feature distributions of the source and target domains gradually cluster, and the distribution discrepancy is reduced. Following the domain adaptation stage, the overlap between the Cup Grasping and Left Wrist Rotating features is further reduced, indicating that the four-stage framework can achieve cross-domain feature alignment. This visualization illustrates the progressive role of each module in narrowing inter-domain differences and enhancing feature transferability, providing visual evidence for the model's effectiveness.
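To make the idea of a domain adaptation loss concrete, the sketch below adds a distribution discrepancy term to a classification loss. It uses a linear-kernel maximum mean discrepancy (the squared distance between the mean source and mean target feature vectors) purely as a stand-in; this is a generic example and is not claimed to be the loss used in the proposed framework. The function names, the trade-off weight `lam`, and all feature values are hypothetical.

```python
def feature_mean(features):
    """Mean feature vector of a batch given as a list of equal-length lists."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[j] for f in features) / n for j in range(dim)]


def linear_mmd(source, target):
    """Squared Euclidean distance between the source and target feature means."""
    mean_s = feature_mean(source)
    mean_t = feature_mean(target)
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


def total_loss(classification_loss, source, target, lam=0.5):
    """Classification loss plus a weighted discrepancy term (lam is a hypothetical weight)."""
    return classification_loss + lam * linear_mmd(source, target)


# Hypothetical two-dimensional features for a source batch and a target batch.
source_batch = [[0.0, 1.0], [2.0, 3.0]]
target_batch = [[1.0, 1.0], [3.0, 5.0]]

print("discrepancy:", linear_mmd(source_batch, target_batch))
print("total loss:", total_loss(0.8, source_batch, target_batch))
```

Minimizing the second term pulls the two feature distributions together, while the first term keeps the features discriminative on the labeled source data.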

The core role of introducing the feature augmentation module (time-shift augmentation strategy) is to expand the diversity of training data and enhance the model’s robustness against temporal jitter by applying minor perturbations to the original EEG signals in the temporal dimension. The research results show that the optimal performance is achieved when the feature augmentation module is placed after feature extraction and before feature optimization. This is because performing augmentation after initially extracting motor-related features can retain key neural information while avoiding the excessive amplification of original noise signals. This finding provides an important reference for the engineering implementation of EEG feature augmentation strategies.
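A time-shift augmentation of this kind can be sketched as follows. Each augmented copy shifts every channel of a trial by the same small number of samples along the time axis. The circular (wrap-around) boundary handling, the uniform random choice of the shift, and the names `time_shift` and `augment` are assumptions made for this illustration; the strategy shown in Figure 3 may treat the boundaries differently.

```python
import random


def time_shift(trial, shift):
    """Circularly shift every channel of one trial by `shift` samples.

    trial is a list of channels, each a non-empty list of samples.
    A positive shift moves samples later in time; wrap-around is an assumption.
    """
    shifted = []
    for channel in trial:
        k = shift % len(channel)
        shifted.append(channel[-k:] + channel[:-k] if k else list(channel))
    return shifted


def augment(trial, max_shift, n_copies, seed=0):
    """Create n_copies shifted versions with shifts drawn from [-max_shift, max_shift]."""
    rng = random.Random(seed)
    return [time_shift(trial, rng.randint(-max_shift, max_shift)) for _ in range(n_copies)]


# Hypothetical trial with two channels and six time samples.
trial = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
]
print(time_shift(trial, 1))
print(len(augment(trial, max_shift=2, n_copies=4)))
```

Applying the same shift to all channels keeps the spatial relationships between channels intact, so only the temporal alignment of the pattern is perturbed.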

To provide a more insightful validation of our architectural design, we conducted a fine-grained ablation analysis specifically isolating the channel attention mechanism. The results demonstrate that the addition of this mechanism improved the decoding accuracy by 7.94%, confirming that the framework’s performance gain inherently relies on the module’s capability to adaptively assign spatial weights. By effectively suppressing background noise from irrelevant regions, it plays an indispensable role in extracting highly discriminative spatial features for unilateral motor imagery tasks.
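The adaptive reweighting of channels can be illustrated with a generic squeeze-and-excitation style block: each channel is summarized by its mean over time, a small two-layer network maps these summaries to one weight per channel in (0, 1), and each channel is scaled by its weight. This is a common form of channel attention and is shown only as an assumption about the general mechanism, not as the exact module of the proposed model. The weight matrices `w1` and `w2`, which would be learned in practice, are hypothetical fixed values here.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def channel_attention(x, w1, w2):
    """Generic squeeze-and-excitation style channel attention.

    x  : feature map as a list of channels, each a list of time samples
    w1 : hidden x channels weight matrix (reduction layer)
    w2 : channels x hidden weight matrix (expansion layer)
    Returns the reweighted feature map and the per-channel weights.
    """
    squeezed = [sum(ch) / len(ch) for ch in x]                      # global average over time
    hidden = [max(0.0, h) for h in matvec(w1, squeezed)]            # ReLU
    weights = [1.0 / (1.0 + math.exp(-z)) for z in matvec(w2, hidden)]  # sigmoid
    scaled = [[wt * s for s in ch] for wt, ch in zip(weights, x)]
    return scaled, weights


# Hypothetical feature map with three channels and four time samples.
x = [
    [1.0, 2.0, 3.0, 2.0],
    [0.1, 0.0, 0.2, 0.1],
    [2.0, 2.0, 2.0, 2.0],
]
w1 = [[0.5, -0.2, 0.3]]          # 1 hidden unit, 3 channels (hypothetical values)
w2 = [[1.0], [-1.0], [0.5]]      # 3 channels, 1 hidden unit (hypothetical values)

scaled, weights = channel_attention(x, w1, w2)
print("channel weights:", [round(w, 3) for w in weights])
```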

(3) Comparative analysis with existing methods

Compared to the classical methods, the proposed method achieves an 18.76% accuracy improvement over the basic CSP method in the cross-subject decoding task. The advantage of this method stems from the effective combination of the deep learning architecture and the domain generalization strategy. The end-to-end network design can automatically learn hierarchical features and eliminate the reliance on manual features and prior knowledge, thereby accurately capturing subtle neural activity patterns. Simultaneously, the introduced domain generalization strategy overcomes the performance degradation of traditional methods caused by data distribution shifts in cross-subject scenarios, fully demonstrating the potential of deep learning in analyzing high-dimensional, non-stationary EEG signals.

Compared with existing CNN-based architectures, our method offers three key advantages. First, unlike conventional CNN approaches (e.g., EEGNet, DeepConvNet) that assign equal importance to all EEG channels, our framework incorporates a task-specific channel attention mechanism tailored to motor imagery decoding, enabling the network to adaptively reweight the contribution of different channels. Second, in contrast to recent models such as CTNet and MSVTNet, which primarily rely on multi-head self-attention to capture global temporal dependencies, our channel attention mechanism specifically emphasizes neural activity originating from the cortex most closely associated with unilateral upper-limb motor imagery. Third, by suppressing irrelevant background activity and enhancing informative spatial representations, our method learns more discriminative and compact feature embeddings than conventional CNNs, thereby providing a stronger foundation for subsequent domain adaptation and classification.

In the field of deep learning and transfer learning, compared to baseline models specifically designed for EEG such as EEGNet, the target domain accuracy of the proposed method is improved by 3.36%. This indicates that relying solely on a basic shared feature extraction network cannot fully break through the generalization bottleneck across subjects. Furthermore, compared to EEGTransferNet, which also falls under the category of transfer learning, the accuracy of this method is further improved by 7.22%. Compared to the recent meta-learning domain adaptation framework MLEMSDA, the accuracy of our method is improved by 5.46%. Various other sophisticated strategies [26,27,28,29] have recently been explored to address domain shifts, but they often require highly specific pre-processing pipelines or complex training phases. In contrast, our proposed method maintains structural simplicity while delivering robust performance. This comprehensive performance superiority is primarily attributed to the progressive four-stage architecture design of the proposed method. By pre-positioning the feature augmentation and feature optimization modules before the domain adaptation stage, the network can effectively filter redundant noise and significantly improve the robustness and discriminability of features, thereby avoiding the performance bottleneck caused by directly using primary features for distribution alignment. In addition, the deep integration of the domain adaptation mechanism further shortens the distribution distance between the source and target domains, substantially weakening the interference of individual heterogeneity on model stability. This indicates that the domain generalization framework achieves an ideal balance between feature robustification and distribution alignment, providing a highly feasible new approach for cross-subject EEG decoding.

(4) Impact of batch size on decoding performance

Experimental results show that the model achieves its best decoding accuracy when the batch size is 64. A likely explanation is that a batch size that is too small leads to unstable gradient estimation, making it difficult for the domain adaptation process to converge, whereas a batch size that is too large reduces the generalization capability of the model and tends to cause overfitting on the source domain data. This pattern suggests that in cross-subject domain generalization tasks, the batch size must be chosen to balance gradient stability and generalization performance, which provides a practical basis for hyperparameter selection in subsequent related research.

(5) This study still has certain limitations. First, it included only 25 subjects, and both the sample size and the scenario complexity need to be expanded. Second, the three-class task is relatively simple, and the decoding of fine unilateral limb movements can be extended to more categories of motor intention. Third, the integration of physiological priors is insufficient. The current model is primarily data-driven and does not fully incorporate physiological priors such as brain network connectivity and neural oscillations, leaving room for improvement in interpretability. Fourth, the current framework focuses on synchronous BCI scenarios and does not yet discriminate between motor imagery tasks and non-task-related brain activity (i.e., the idle state in asynchronous BCI). As highlighted by studies addressing this challenging scenario [42,43,44], real-world BCI applications require this continuous decoding capability to prevent false triggers caused by background neural activity. In future work, we plan to extend our approach to asynchronous paradigms by introducing a rest class or threshold-based rejection mechanisms, thereby bridging the gap between laboratory validation and practical real-world applications.
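As a rough illustration of the threshold-based rejection idea mentioned for future work, the sketch below converts classifier outputs into softmax probabilities and issues a command only when the most probable class exceeds a confidence threshold; otherwise the window is treated as idle. The threshold value of 0.6, the example logits, and the function names are hypothetical, and this is not a mechanism implemented or evaluated in this study.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_with_rejection(logits, labels, threshold=0.6):
    """Return the predicted label, or None when confidence is below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None


labels = ["forward arm reaching", "cup grasping", "left wrist rotating"]

# Hypothetical classifier outputs for a confident window and an ambiguous window.
print(decode_with_rejection([4.0, 0.0, 0.0], labels))
print(decode_with_rejection([0.1, 0.0, 0.0], labels))
```

Raising the threshold reduces false triggers during idle periods at the cost of rejecting more genuine motor imagery attempts, so it would need to be tuned per application.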

5. Conclusions

Aiming at the problems of large individual differences and weak generalization ability in cross-subject decoding of EEG signals, this paper proposes a four-stage deep learning framework that integrates feature extraction, feature augmentation, feature optimization, and domain adaptation. Experimental results demonstrate that the proposed framework achieves excellent decoding accuracy, comprehensively outperforming existing classical methods, basic deep learning, and transfer learning baseline models.

This framework not only leverages the advantages of end-to-end automatic representation in deep learning but also substantially improves the prediction stability of the model when facing completely unfamiliar subjects. This provides a practical and feasible technical path for reducing the user calibration cost of brain–computer interface systems and also offers a valuable new paradigm for advancing the practical engineering implementation of EEG signal decoding in the future.

Author Contributions

Conceptualization, J.L. and X.J.; methodology, J.L.; writing—original draft preparation, J.N., X.J. and J.L.; writing—review and editing, J.N. and X.J.; visualization, J.L. and X.J.; validation, C.L. and D.L.; supervision, J.N. and D.L.; funding acquisition, J.N. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key Science and Technology Program of Henan Province [262102211079, 262102211027, and 252102210139], the National Natural Science Foundation of China [No. 82370513 and No. 62476255], the Key Science Research Project of Colleges and Universities in Henan Province of China [No. 25A520003], the Post-Doctoral Foundation of Henan Province [No. HN2025151], and the Zhengzhou Youth Science and Technology Talent Program [No. 20248651].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: http://gigadb.org/dataset/view/id/100788/File_page/2 (accessed on 28 October 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lakshminarayanan, K.; Ramu, V.; Shah, R.; Haque Sunny, M.S.; Madathil, D.; Brahmi, B.; Wang, I.; Fareh, R.; Rahman, M.H. Developing a tablet-based brain-computer interface and robotic prototype for upper limb rehabilitation. PeerJ Comput. Sci. 2024, 10, e2174.
2. Ke, Z.; Mahmoud, S.S.; Ruan, H.; Fang, Q. Design and Validation of a 3D VR-Assisted Motor Imagery System. In Proceedings of the 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Copenhagen, Denmark, 14–17 July 2025; pp. 1–4.
3. Shi, T.; Gu, X.; Bi, H.; Lv, J.; Liu, Y.; Dai, Y.; Zou, L. A study of unilateral upper limb fine motor imagery decoding using frequency-band attention network. IEEE Access 2024, 12, 32679–32692.
4. Rong, F.; Yang, B.; Guan, C. Decoding multi-class motor imagery from unilateral limbs using EEG signals. IEEE Trans. Neural Syst. Rehabil. Eng. 2024, 32, 3399–3409.
5. Gao, Y.; Xie, W.; Luo, Z.; Houston, M.; Zhang, Y. Multi-domain feature analysis of MI-EEG signals using tensor train decomposition and projected gradient Non-negative Matrix Factorization. Neurocomputing 2025, 623, 129410.
6. Gu, L.; Jiang, J.; Han, H.; Gan, J.Q.; Wang, H. Recognition of unilateral lower limb movement based on EEG signals with ERP-PCA analysis. Neurosci. Lett. 2023, 800, 137133.
7. Raza, A.; Yusoff, M.Z. Deep Learning Approaches for EEG-Motor Imagery-Based BCIs: Current Models, Generalization Challenges, and Emerging Trends. IEEE Access 2025, 13, 151866–151893.
8. Yang, B.; Tang, J.; Guan, C.; Li, B. Motor imagery EEG recognition based on FBCSP and PCA. In Proceedings of the International Conference on Brain Inspired Cognitive Systems, Xi’an, China, 7–9 July 2018; pp. 195–205.
9. Jiang, X.; Meng, L.; Chen, X.; Xu, Y.; Wu, D. CSP-Net: Common spatial pattern empowered neural networks for EEG-based motor imagery classification. Knowl. Based Syst. 2024, 305, 112668.
10. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
11. Mohammady, F.; Asadi Amiri, S.; Mohammadpoory, Z. Leveraging Segmentation and Visibility Graph Analysis to Enhance Motor Imagery Classification in EEG Signals. Cogn. Comput. 2026, 18, 8.
12. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
13. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420.
14. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013.
15. Cai, Q.; Liu, C.; Chen, A. Classification of Motor Imagery Tasks Derived from Unilateral Upper Limb based on a Weight-optimized Learning Model. J. Integr. Neurosci. 2024, 23, 106.
16. Zhao, X.; Zhang, H.; Zhu, G.; You, F.; Kuang, S.; Sun, L. A multi-branch 3D convolutional neural network for EEG-based motor imagery classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2019, 27, 2164–2177.
17. Zhang, R.; Chen, Y.; Xu, Z.; Zhang, L.; Hu, Y.; Chen, M. Recognition of single upper limb motor imagery tasks from EEG using multi-branch fusion. Front. Neurosci. 2023, 17, 1129049.
18. Zhao, W.; Jiang, X.; Zhang, B.; Xiao, S.; Weng, S. CTNet: A convolutional transformer network for EEG-based motor imagery classification. Sci. Rep. 2024, 14, 20237.
19. Vafaei, E.; Hosseini, M. Transformers in EEG analysis: A review of architectures and applications in motor imagery, seizure, and emotion classification. Sensors 2025, 25, 1293.
20. Song, Y.; Jia, X.; Yang, L.; Xie, L. Transformer-based spatial-temporal feature learning for EEG decoding. arXiv 2021, arXiv:2106.11170.
21. Roy, A.M. Adaptive transfer learning-based multiscale feature fused deep convolutional neural network for EEG MI multiclassification in brain–computer interface. Eng. Appl. Artif. Intell. 2022, 116, 105347.
22. Zhang, K.; Robinson, N.; Lee, S.-W.; Guan, C. Adaptive transfer learning for EEG motor imagery classification with deep convolutional neural network. Neural Netw. 2021, 136, 1–10.
23. Khademi, Z.; Ebrahimi, F.; Kordy, H.M. A transfer learning-based CNN and LSTM hybrid deep learning model to classify motor imagery EEG signals. Comput. Biol. Med. 2022, 143, 105288.
24. Liang, Z.; Zheng, Z.; Chen, W.; Pei, Z.; Wang, J.; Chen, J. A novel deep transfer learning framework integrating general and domain-specific features for EEG-based brain–computer interface. Biomed. Signal Process. Control 2024, 95, 106311.
25. Miao, M.; Fu, W.; Zeng, H.; Xu, B.; Zhang, W.; Hu, W. Meta-Learning Enhanced Multi-Source Domain Adaptation for zero-calibration motor imagery EEG decoding. J. Neurosci. Methods 2026, 431, 110742.
26. Chen, P.; Liu, X.; Ma, C.; Wang, H.; Yang, X.; Grebogi, C.; Gu, X.; Gao, Z. Unsupervised domain adaptation with synchronized self-training for cross-domain motor imagery recognition. IEEE J. Biomed. Health Inform. 2025, 29, 3664–3677.
27. Gong, Y.; Shi, K.; Niu, X.; Yang, L.; Yang, X.; Zheng, C. Multi-source Discriminant Dynamic Domain Adaptation for Cross-subject Motor Imagery EEG Recognition. IEEE J. Biomed. Health Inform. 2025, 30, 3556–3567.
28. Guo, S.; Wang, Y.; Liu, Y.; Zhang, X.; Tang, B. Improving cross-session motor imagery decoding performance with data augmentation and domain adaptation. Biomed. Signal Process. Control 2025, 106, 107756.
29. Zhu, J.; Xu, G.; Lin, Z.; Long, J.; Zhou, T.; Sheng, B.; Yang, X. Semi-Supervised Privacy-Preserving EEG-Based Motor Imagery Classification via Self and Adversarial Training. IEEE Trans. Autom. Sci. Eng. 2025, 22, 20679–20690.
30. Yang, B.; Ma, J.; Qiu, W.; Zhu, Y.; Meng, X. A new 2-class unilateral upper limb motor imagery tasks for stroke rehabilitation training. Med. Nov. Technol. Devices 2022, 13, 100100.
31. Jiang, Y.; Yin, J.; Zhao, B.; Zhang, Y.; Peng, T.; Zhuang, W.; Wang, S.; Huang, S.; Zhong, M.; Zhang, Y. Motor imagery brain-computer interface in rehabilitation of upper limb motor dysfunction after stroke. JoVE (J. Vis. Exp.) 2023, 199, e65405.
32. Jeong, J.-H.; Cho, J.-H.; Shim, K.-H.; Kwon, B.-H.; Lee, B.-H.; Lee, D.-Y.; Lee, D.-H.; Lee, S.-W. Multimodal signal dataset for 11 intuitive movement tasks from single upper extremity during multiple recording sessions. GigaScience 2020, 9, giaa098.
33. Mitra, V.; Franco, H. Time-frequency convolutional networks for robust speech recognition. In Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, AZ, USA, 13–17 December 2015; pp. 317–323.
34. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188.
35. Luo, Y.; He, X.; Ren, K. A feature extraction and classification algorithm for motor imagery EEG signals based on decision tree and CSP-SVM. In Proceedings of the Optics in Health Care and Biomedical Optics XI, Nantong, China, 10–19 October 2021; pp. 338–345.
36. Alwasiti, H.; Yusoff, M.Z.; Raza, K. Motor imagery classification for brain computer interface using deep metric learning. IEEE Access 2020, 8, 109949–109963.
37. Toraman, S. Preictal and Interictal Recognition for Epileptic Seizure Prediction Using Pre-trained 2DCNN Models. Trait. Du. Signal 2020, 37, 1045.
38. Li, M.-A.; Zhang, J. A Novel Approach to MI-EEG Classification via 3D Interpolation and 3DCNN. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20–22 May 2023; pp. 3062–3066.
39. Meng, Y.; Zhu, N.; Li, D.; Nan, J.; Xia, Y.; Yao, N.; Han, C. Stepwise discriminant analysis based optimal frequency band selection and ensemble learning for same limb MI recognition. Clust. Comput. 2025, 28, 197.
40. Lian, X.; Liu, C.; Gao, C.; Deng, Z.; Guan, W.; Gong, Y. A Multi-Branch Network for Integrating Spatial, Spectral, and Temporal Features in Motor Imagery EEG Classification. Brain Sci. 2025, 15, 877.
41. Krstinić, D.; Braović, M.; Šerić, L.; Božić-Štulić, D. Multi-label classifier performance evaluation with confusion matrix. Comput. Sci. Inf. Technol. 2020, 1, 1–14.
42. Bove, S.; Giaquinto, M.; Percannella, G.; Saggese, A.; Vento, M. The Impact of Non-Task-Related Neural Activity in EEG-Based Motor Imagery Classification. In Proceedings of the 2025 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), Ancona, Italy, 22–24 October 2025; pp. 1–6.
43. Huang, D.; Qian, K.; Fei, D.-Y.; Jia, W.; Chen, X.; Bai, O. Electroencephalography (EEG)-based brain–computer interface (BCI): A 2-D virtual wheelchair control based on event-related desynchronization/synchronization and state control. IEEE Trans. Neural Syst. Rehabil. Eng. 2012, 20, 379–388.
44. Arpaia, P.; Esposito, A.; Galasso, E.; Galdieri, F.; Natalizio, A. A wearable brain-computer interface to play an endless runner game by self-paced motor imagery. J. Neural Eng. 2025, 22, 026032.

Figure 1. Motor imagery process of the GigaDB dataset.

Figure 2. Model architecture.

Figure 3. Illustration of the Time-Shift Augmentation Strategy.

Figure 4. Feature adaptation process.

Figure 5. Confusion matrices of the three-class task for the best- and worst-performing subjects. Class 1: Forward Arm Reaching; Class 2: Cup Grasping; Class 3: Left Wrist Rotating.

Figure 6. Changes in the feature distribution of subjects.

Figure 7. Impact of the placement of the Feature Augmentation module on decoding accuracy and precision.

Table 1. Cross-subject decoding results for 25 subjects.

Subject	Precision (%)	Accuracy (%)	F1-Score	Recall
sub-01	88.02	75.78	0.75	0.76
sub-02	86.31	68.00	0.67	0.68
sub-03	87.77	73.33	0.73	0.73
sub-04	88.62	76.44	0.76	0.76
sub-05	90.43	79.33	0.79	0.79
sub-06	88.46	73.33	0.72	0.73
sub-07	83.71	71.33	0.70	0.71
sub-08	87.23	70.67	0.69	0.71
sub-09	84.70	72.00	0.71	0.72
sub-10	90.33	81.33	0.81	0.81
sub-11	83.76	68.89	0.69	0.69
sub-12	87.17	72.89	0.72	0.73
sub-13	83.76	68.89	0.67	0.69
sub-14	85.72	72.44	0.72	0.72
sub-15	84.03	69.78	0.67	0.70
sub-16	81.20	70.89	0.71	0.71
sub-17	81.71	71.33	0.71	0.71
sub-18	88.66	77.33	0.77	0.77
sub-19	87.58	75.33	0.74	0.75
sub-20	80.96	70.00	0.70	0.70
sub-21	89.24	76.89	0.77	0.77
sub-22	83.38	66.00	0.61	0.66
sub-23	79.44	68.89	0.63	0.69
sub-24	86.49	74.67	0.74	0.75
sub-25	82.86	69.56	0.69	0.70
Mean/std	85.66/3.07	72.61/3.75	0.72/0.05	0.73/0.04
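
The Mean/std row of Table 1 can be cross-checked directly from the per-subject rows. The following is a minimal sketch, not the authors' evaluation code: it copies the Accuracy (%) column verbatim and recomputes the summary statistics. The variable names are illustrative, and because the table does not state whether the standard deviation uses the sample (n − 1) or population (n) denominator, both are computed.

```python
from statistics import mean, pstdev, stdev

# Accuracy (%) column of Table 1, sub-01 through sub-25, copied verbatim.
accuracy = [
    75.78, 68.00, 73.33, 76.44, 79.33,
    73.33, 71.33, 70.67, 72.00, 81.33,
    68.89, 72.89, 68.89, 72.44, 69.78,
    70.89, 71.33, 77.33, 75.33, 70.00,
    76.89, 66.00, 68.89, 74.67, 69.56,
]

mean_acc = mean(accuracy)
sample_sd = stdev(accuracy)       # n - 1 denominator
population_sd = pstdev(accuracy)  # n denominator

print(f"subjects      = {len(accuracy)}")
print(f"mean accuracy = {mean_acc:.2f}")
print(f"sample SD     = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

The recomputed mean rounds to the 72.61 reported in the table. The same pattern applies to the Precision, F1-Score, and Recall columns.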

Table 2. Comparison of classification accuracies among different algorithms.

Category	Method	Accuracy (%)
Classical Methods	RCSP + SVM *	65.21
	RCSP + LDA **	61.06
	RCSP + KNN **	64.87
	RCSP + DT **	59.55
	CSP **	53.85
	FBCSP **	54.00
	RCSP + RF *	66.15
	SWDA-RCSP *	66.64
Deep Learning	CNN **	51.08
	TF-CNN **	50.24
	MF-CNN **	57.06
	ALEXNET **	48.75
	2DCNN **	56.36
	EEGNET *	69.25
	DEEP CONVNET **	59.65
	3DCNN **	53.00
	Transformer **	65.24
Transfer Learning	EEGTRANSFERNET *	65.39
	MLEMSDA **	67.15
	Ours	72.61

Note: * indicates p < 0.01, ** indicates p < 0.001 (paired t-test).
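
To make the margins in Table 2 explicit, the sketch below subtracts each baseline accuracy from the accuracy of the proposed method. It is illustrative arithmetic on the reported values only, not a re-run of any experiment; the dictionary keys simply reuse the method labels from the table, and the margins are in percentage points.

```python
# Accuracy (%) of each baseline, copied verbatim from Table 2.
baselines = {
    "RCSP + SVM": 65.21, "RCSP + LDA": 61.06, "RCSP + KNN": 64.87,
    "RCSP + DT": 59.55, "CSP": 53.85, "FBCSP": 54.00,
    "RCSP + RF": 66.15, "SWDA-RCSP": 66.64,
    "CNN": 51.08, "TF-CNN": 50.24, "MF-CNN": 57.06,
    "ALEXNET": 48.75, "2DCNN": 56.36, "EEGNET": 69.25,
    "DEEP CONVNET": 59.65, "3DCNN": 53.00, "Transformer": 65.24,
    "EEGTRANSFERNET": 65.39, "MLEMSDA": 67.15,
}
ours = 72.61  # "Ours" row of Table 2

# Margin of the proposed method over each baseline, in percentage points.
margins = {name: round(ours - acc, 2) for name, acc in baselines.items()}

# Print from the closest baseline to the most distant one.
for name, gap in sorted(margins.items(), key=lambda item: item[1]):
    print(f"{name:<16} +{gap:.2f}")
```

On the values as tabulated, the closest baseline is EEGNET (3.36 points) and the most distant is ALEXNET (23.86 points); the margin over EEGTRANSFERNET is 7.22 points.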

Table 3. Ablation experimental results.

Feature Extraction	Feature Augmentation	Feature Optimization	Feature Adaptation	Precision (%) Mean/STD	Accuracy (%) Mean/STD
×	√	√	√	53.00/1.99	37.64/2.61
√	×	√	√	84.46/2.91	69.96/3.09
√	√	×	√	58.25/3.22	46.26/3.20
√	√	√	×	80.87/8.86	65.58/8.87
√	√	√	√	85.66/3.07	72.61/3.75

Note: × means without this module, √ means with this module.

Table 4. Impact of batch size.

Batch Size	16	32	64	108
Precision	80.50%	83.52%	85.66%	82.70%
Accuracy	68.00%	72.00%	72.61%	71.32%
F1-score	0.65	0.71	0.72	0.69
Recall	0.68	0.68	0.73	0.71

Table 5. Decoding results with and without channel attention mechanism.

Evaluation Metrics	Without Channel Attention	With Channel Attention
Precision	81.18%	85.66%
Accuracy	64.67%	72.61%
F1-score	0.54	0.72
Recall	0.64	0.73

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
