1. Introduction
Brain-computer interface (BCI) systems provide their users with communication and control channels that do not depend on the brain’s normal output channels of peripheral nerves and muscles [
1]. Motor imagery holds considerable significance in the realm of motor BCI systems [
2], offering a novel avenue for control and interaction, particularly for individuals who have lost voluntary motor function, such as patients with severe motor neuron diseases [
3], locked-in syndrome, or other conditions that impair their normal movements. Defined as the mental rehearsal of physical movements without actual execution [
4], motor imagery enables these individuals to interact with external devices by translating their mentally imagined movements into commands or actions.
In the BCI research community, electroencephalography (EEG) signals associated with motor imagery are characterized by two fundamental properties [
5]: a low signal-to-noise ratio (SNR) and high non-stationarity coupled with complexity. The former stems from the inherently low amplitude of EEG signals and their susceptibility to external interference; the latter arises from the intricate, coordinated neural activities across multiple brain regions engaged in motor imagery, yielding signals that are not only information-rich but also highly complex [
6]. To efficiently extract discriminative features from raw EEG data, a variety of techniques have been developed, including time-frequency analysis [
7], frequency-domain feature extraction [
8], spatial information extraction such as common spatial patterns (CSP) [
9,
10] and brain connectivity patterns [
11], and deep learning methodologies [
12]. By leveraging these sophisticated feature extraction methods, researchers are better equipped to address the inherent challenges posed by the EEG signals of motor imagery.
In addition to the aforementioned considerations, EEG signals often exhibit notable cross-subject feature distribution discrepancies [
13,
14], with significant variations in EEG activation patterns between individuals. Even when performing identical tasks, such as imagining the movement of the left hand, different subjects may display distinct activation patterns, frequency characteristics, and spatial distributions in the collected EEG data [
15]. These distribution shifts between subjects are a primary obstacle in EEG-based decoding and often lead to suboptimal performance when a model is applied to unseen users. Transfer learning addresses this challenge by bridging the gap between a data-rich source domain and a data-scarce target domain [
16]. Specifically, in the realm of EEG analysis, this approach capitalizes on prior knowledge gained from a group of source participants to accelerate learning for a target individual. This process effectively mitigates the burden of collecting massive labeled datasets for every new user, thereby streamlining the calibration phase required for practical BCI deployment. By integrating such methodologies, researchers can overcome the limitations posed by subject-specific variations, thereby achieving more accurate and reliable semantic decoding from EEG data.
Although many transfer learning methods exist, most existing frameworks require direct access to source EEG data, which may compromise the privacy of source subjects [
17,
18]. This limitation is particularly acute in EEG applications, where signals inherently encapsulate sensitive biometric and pathological markers, raising severe compliance concerns under stringent data protection regulations [
19]. To address this critical gap, our primary contribution is the formulation of a privacy-preserving transfer learning paradigm. Specifically, we introduce a source-free domain adaptation strategy that relies solely on the pre-trained source model, strictly prohibiting access to raw source EEG data during the adaptation phase [
20]. This architecture not only fortifies the confidentiality of source EEG data but also optimizes computational efficiency by eliminating the overhead associated with cross-domain data transmission.
To simultaneously tackle cross-subject EEG adaptation challenges and uphold source data privacy, we introduce a T-CMDP (Transformer-based source-free domain adaptation via Class-balanced Multicentric Dynamic Pseudo-labeling) framework in this paper. This framework orchestrates three synergistic modules into a cohesive pipeline. Initially, during source model construction, we leverage manifold feature extraction to distill task-discriminative representations from raw EEG signals, laying a robust foundation for the subsequent Transformer-based local model training. Building upon this, the adaptation phase implements a parameter transfer mechanism that migrates pre-trained source weights to the target domain, effectively enhancing predictive fidelity while strictly adhering to privacy constraints. Concurrently, a self-supervised refinement loop is integrated to dynamically optimize pseudo-labels for unlabeled target EEG samples, thereby maximizing the model capacity to exploit latent information. Through this intricate interplay of feature distillation, privacy-preserving transfer, and label refinement, T-CMDP significantly elevates both the robustness and generalization performance of cross-subject EEG decoding.
The primary contributions of this study are articulated as follows:
Advancing a privacy-centric adaptation paradigm. We introduce T-CMDP, a novel source-free domain adaptation framework tailored for cross-subject motor-imagery EEG classification. Distinct from conventional approaches, T-CMDP obviates the need for raw source EEG data during target adaptation. By synergizing Transformer-based representation learning with Riemannian manifold-aware feature extraction, our framework significantly enhances transferability while rigorously preserving source subject privacy.
Devising a robust pseudo-label refinement mechanism. We formulate a class-balanced multicentric dynamic pseudo-labeling strategy to address the challenges of source-free adaptation. This mechanism integrates knowledge distillation and information maximization with global inter-class balanced sampling and intra-class multicentric prototype construction. Such a holistic design effectively mitigates class bias and suppresses noisy label propagation, thereby sharpening decision boundaries in the target domain.
Establishing new performance benchmarks. Comprehensive evaluations across three public motor-imagery EEG datasets reveal that T-CMDP consistently outperforms state-of-the-art machine learning, deep learning, and source-free baselines, achieving average accuracies of 56.85%, 76.34%, and 74.49%, respectively. Furthermore, rigorous ablation studies and parameter sensitivity analyses corroborate the complementary efficacy of TSM-based feature extraction, domain adaptation, and self-supervised refinement, while validating the optimality of key hyper-parameters (e.g., the number of class centers and high-confidence instances).
The remainder of this paper is organized as follows. In
Section 2, we review some works related to this study, encompassing EEG feature extraction methods and techniques for cross-subject EEG classification.
Section 3 provides detailed descriptions of the proposed T-CMDP model architecture and its involved components. In
Section 4, we outline the experimental design, present the experimental results, and analyze the indispensability of each model component. Finally,
Section 5 concludes the whole paper by summarizing the main contributions and proposing the future work.
3. The Proposed T-CMDP Method
This section describes the EEG feature extraction method and the proposed privacy-preserving T-CMDP model.
3.1. Problem Definition
Consider a general case with $K$ source subjects, where the $k$-th subject has $n_s^k$ labeled EEG trials. The source domain dataset can be expressed as $\mathcal{D}_s = \{\{(X_{s,i}^{k}, y_{s,i}^{k})\}_{i=1}^{n_s^k}\}_{k=1}^{K}$, where $X_{s,i}^{k} \in \mathbb{R}^{m \times l}$ represents the EEG data matrix for the $i$-th trial, and $y_{s,i}^{k} \in \{1, \dots, C\}$ denotes its corresponding class label. Here, $m$ and $l$ refer to the numbers of EEG channels and time-domain sampling points, respectively. In this study, $C$ represents the total number of motor intentions. The total number of source samples is $n_s = \sum_{k=1}^{K} n_s^k$. In contrast, the target domain consists of $n_t$ unlabeled EEG trials, formulated as $\mathcal{D}_t = \{X_{t,i}\}_{i=1}^{n_t}$, where $X_{t,i} \in \mathbb{R}^{m \times l}$ is the EEG data matrix for the $i$-th trial in the target domain. Because the target domain lacks labeled data, it is utilized exclusively for domain adaptation and model evaluation. Notably, the raw source EEG data is accessible only during the initial feature extraction and source model training. The main symbols used throughout this paper are summarized in Table 1.
3.2. Manifold Adaptive EEG Feature Extraction
Raw EEG data is typically structured in three dimensions, i.e., trials, channels, and sampling points. To enable our model to extract more informative and discriminative features from raw EEG data, we employ the Tangent Space Mapping (TSM) method for feature extraction, which is based on the Riemannian geometry and aims to linearize EEG data by transforming its covariance matrix into a tangent space. This approach not only reduces computational complexity in subsequent analysis steps but also effectively preserves the intrinsic geometric properties of the original EEG data, enhancing the robustness and discriminability of the extracted features.
Specifically, before applying tangent space mapping, it is first necessary to compute the covariance matrix for each trial $X_i \in \mathbb{R}^{m \times l}$ by $P_i = \frac{1}{l-1} X_i X_i^{\top}$, which serves as a representation of the brain state during that particular trial. Then, the squared Riemannian distance between two covariance matrices (i.e., $P_1$ and $P_2$) is defined as

$$\delta_R^2(P_1, P_2) = \left\| \log\left(P_1^{-1/2} P_2 P_1^{-1/2}\right) \right\|_F^2 = \left\| \mathrm{upper}\left(\log\left(P_1^{-1/2} P_2 P_1^{-1/2}\right)\right) \right\|_2^2,$$

where the function $\log(\cdot)$ represents the matrix logarithm, which maps a covariance matrix from the Riemannian manifold to the Euclidean space, and the function $\mathrm{upper}(\cdot)$ extracts the upper triangular elements of the resulting matrix, exploiting its symmetry to reduce redundancy and ensure a compact feature representation (the diagonal elements have weight one and the off-diagonal elements have weight $\sqrt{2}$). Then, the Riemannian mean of the $N$ covariance matrices $\{P_i\}_{i=1}^{N}$ (where $N$ equals the number of source trials $n_s$ or target trials $n_t$, depending on the domain) is

$$\bar{P} = \arg\min_{P} \sum_{i=1}^{N} \delta_R^2(P, P_i).$$

Here $\bar{P}$ serves as the reference point of the common tangent space, in which the covariance matrix of each EEG trial is transformed into a more discriminative feature representation. Specifically, this process is achieved by

$$x_i = \mathrm{upper}\left(\log(\tilde{P}_i)\right), \quad \tilde{P}_i = \bar{P}^{-1/2} P_i \bar{P}^{-1/2},$$

where $\tilde{P}_i$ denotes the aligned covariance matrix of the EEG trial after applying Riemannian geometric transformations. This transformation facilitates downstream processing by linearizing the otherwise non-Euclidean distributed EEG data, thereby enhancing the efficiency and robustness of subsequent classification models.
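The tangent space mapping described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the covariance uses the 1/(l-1) normalization without mean removal, the Riemannian mean is computed with a standard fixed-point iteration, and the trials are synthetic random data.

```python
import numpy as np

def _sym_fn(P, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(P)
    return (V * fn(w)) @ V.T

def _whiten(P_ref, C):
    """Return P_ref^{-1/2} C P_ref^{-1/2}, symmetrized against round-off."""
    isq = _sym_fn(P_ref, lambda w: w ** -0.5)
    M = isq @ C @ isq
    return (M + M.T) / 2

def covariance(X):
    """Covariance of one trial X with shape (channels, samples)."""
    return X @ X.T / (X.shape[1] - 1)

def riemann_distance_sq(P1, P2):
    """Squared affine-invariant Riemannian distance between two SPD matrices."""
    return float(np.sum(_sym_fn(_whiten(P1, P2), np.log) ** 2))

def riemann_mean(covs, n_iter=50, tol=1e-10):
    """Fixed-point iteration for the Riemannian mean of SPD matrices."""
    P = np.mean(covs, axis=0)  # Euclidean mean as the starting point
    for _ in range(n_iter):
        sq = _sym_fn(P, np.sqrt)
        # Average of all trials in the tangent space at the current estimate
        T = np.mean([_sym_fn(_whiten(P, C), np.log) for C in covs], axis=0)
        P = sq @ _sym_fn(T, np.exp) @ sq
        if np.linalg.norm(T) < tol:
            break
    return P

def upper(S):
    """Upper-triangular vectorization: weight 1 on the diagonal, sqrt(2) off it."""
    iu = np.triu_indices(S.shape[0])
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return weights * S[iu]

def tangent_space(covs):
    """Map each covariance matrix to a tangent-space feature vector."""
    P_bar = riemann_mean(covs)
    feats = np.array([upper(_sym_fn(_whiten(P_bar, C), np.log)) for C in covs])
    return feats, P_bar

rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 8, 256))  # synthetic: 20 trials, 8 channels, 256 samples
covs = np.array([covariance(X) for X in trials])
feats, P_bar = tangent_space(covs)
```

With 8 channels each feature vector has 8 · 9 / 2 = 36 entries. The sqrt(2) weighting makes the squared Euclidean norm of a feature vector equal to the squared Riemannian distance between that trial's covariance matrix and the mean.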
3.3. Source Model Training
The proposed source model comprises three sequential modules, each designed to address specific challenges associated with EEG signal processing and cross-domain (cross-subject) classification. The framework of our proposed T-CMDP model is shown in
Figure 1.
To preserve the intrinsic temporal dynamics of EEG signals, sinusoidal positional encoding is incorporated into the input sequence. Although the EEG feature extraction stage produces a trial-level feature vector, it can be reorganized into an ordered set of $S$ tokens without information loss, yielding an input matrix $F \in \mathbb{R}^{S \times d_f}$, where each row corresponds to a token and $d_f$ denotes its feature dimension. This reformulation enables the Transformer to process the features as a sequence while preserving the structural relationships among tokens. Mathematically, the 1D TSM feature vector of length $L$ is sequentially reshaped into an $S \times d_f$ 2D matrix without altering any intrinsic values, ensuring no information loss. Physically, each token represents a subset of pairwise spatial connectivities between EEG channels. Unlike direct MLP inputs, this tokenized structure allows the subsequent Multi-Head Self-Attention mechanism to dynamically model high-order interactions between different localized brain connectivity networks, which is essential for capturing domain-invariant motor imagery patterns across subjects. Specifically, the input feature matrix $F$ is projected into an embedding space of dimension $d_{\mathrm{model}}$ through a learnable linear transformation, i.e.,

$$\hat{F} = F W_e,$$

where $W_e \in \mathbb{R}^{d_f \times d_{\mathrm{model}}}$ is a trainable weight matrix. The resulting projected features $\hat{F} \in \mathbb{R}^{S \times d_{\mathrm{model}}}$ are then combined with the positional encoding $E_{\mathrm{pos}} \in \mathbb{R}^{S \times d_{\mathrm{model}}}$ via element-wise addition, i.e.,

$$Z = \hat{F} + E_{\mathrm{pos}}.$$

This design ensures dimensional compatibility between the input features and positional encodings, which enables the Transformer encoder to effectively capture both spatial-channel dynamics and relative temporal positions.
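The tokenization and positional-encoding step can be illustrated as follows. This is a sketch under assumptions that do not come from the paper: a 22-channel recording gives a tangent-space vector of length 22 · 23 / 2 = 253, which is split here into 11 tokens of 23 values; the embedding width of 64 and the random projection matrix are placeholders for learned quantities.

```python
import numpy as np

def tokenize(feature_vec, n_tokens):
    """Reshape a 1D feature vector into (n_tokens, token_dim) without changing values."""
    if feature_vec.size % n_tokens != 0:
        raise ValueError("feature length must be divisible by the number of tokens")
    return feature_vec.reshape(n_tokens, -1)

def sinusoidal_encoding(n_tokens, d_model):
    """Sinusoidal positional encoding: sine on even columns, cosine on odd columns."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((n_tokens, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles[:, : d_model // 2])
    return pe

rng = np.random.default_rng(0)
feat = rng.standard_normal(253)              # synthetic stand-in for one TSM feature vector
tokens = tokenize(feat, n_tokens=11)         # shape (11, 23)
W_e = rng.standard_normal((23, 64)) * 0.1    # placeholder for the learnable projection
pe = sinusoidal_encoding(11, 64)
z = tokens @ W_e + pe                        # projected tokens plus positional encoding
```

Flattening `tokens` returns the original vector exactly, which is the sense in which the reshaping loses no information.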
In the first module, a Transformer encoder serves as the primary feature extractor, operating directly on the previously TSM-processed feature vectors derived from each EEG trial. This transformed feature vector forms the input sequence to the Transformer encoder. Unlike conventional approaches that process raw time-series EEG data, our model processes these geometrically regularized features, which have been projected into Euclidean space while preserving the underlying Riemannian structure. This enables the Transformer to focus on high-level temporal dynamics among channels rather than low-level noise or non-stationarities present in raw EEG data.
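The Transformer encoder consists of a Multi-Head Self-Attention (MHSA) mechanism and a Feed-Forward Network (FFN). Its core computation is defined as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,$$

where $Q$, $K$, $V \in \mathbb{R}^{S \times d_k}$ are query, key, and value matrices obtained by respectively projecting the input sequence $Z$ via learnable weight matrices $W^Q$, $W^K$, $W^V$. Here, $S$ denotes the sequence length (i.e., the number of constructed tokens), and $d_k$ is the dimension per attention head. Multi-head concatenation and projection can be respectively expressed as

$$\mathrm{head}_j = \mathrm{Attention}(Z W_j^Q, Z W_j^K, Z W_j^V), \qquad \mathrm{MHSA}(Z) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^O,$$

where $W^O$ is the projection matrix that linearly transforms the concatenated outputs of all attention heads back to the desired dimension.
The feed-forward network (FFN) is defined as

$$\mathrm{FFN}(H) = \mathrm{GELU}(H W_1 + b_1)\, W_2 + b_2.$$

Here $\mathrm{GELU}(\cdot)$ refers to the Gaussian Error Linear Unit activation function, which introduces non-linearity into the network, $W_1$, $W_2$, $b_1$, and $b_2$ are learnable weights and biases, and $H$ is the output of the Multi-Head Self-Attention layer.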
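As a rough illustration of the attention and feed-forward computations above, the following NumPy sketch runs one forward pass with random weights. It is not the authors' network: residual connections and layer normalization are omitted, GELU uses the common tanh approximation, and all sizes (11 tokens, width 64, 4 heads, hidden width 128) are assumed for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    """Tanh approximation of the Gaussian Error Linear Unit."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def attention(Q, K, V):
    """Scaled dot-product attention; returns the output and the attention weights."""
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return weights @ V, weights

def mhsa(Z, Wq, Wk, Wv, Wo):
    """Multi-head self-attention. Wq, Wk, Wv: (heads, d_model, d_k); Wo: (heads * d_k, d_model)."""
    heads = [attention(Z @ q, Z @ k, Z @ v)[0] for q, k, v in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

def ffn(H, W1, b1, W2, b2):
    """Position-wise feed-forward network with GELU activation."""
    return gelu(H @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
S, d_model, n_heads = 11, 64, 4
d_k = d_model // n_heads
Z = rng.standard_normal((S, d_model))
Wq, Wk, Wv = (rng.standard_normal((n_heads, d_model, d_k)) * 0.1 for _ in range(3))
Wo = rng.standard_normal((n_heads * d_k, d_model)) * 0.1
H = mhsa(Z, Wq, Wk, Wv, Wo)
W1, b1 = rng.standard_normal((d_model, 128)) * 0.1, np.zeros(128)
W2, b2 = rng.standard_normal((128, d_model)) * 0.1, np.zeros(d_model)
out = ffn(H, W1, b1, W2, b2)
```

Each row of the attention weight matrix sums to one, so every output token is a convex combination of the value vectors.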
The second module incorporates a feature bottleneck layer that serves to reduce the dimensionality of the high-dimensional features generated by the encoder. This dimensionality reduction is achieved by mapping the extracted features into a lower-dimensional latent space, which not only streamlines the feature representation but also mitigates the risk of overfitting by controlling the overall model complexity. Such compression is essential for distilling the most discriminative information while preserving the salient characteristics of the EEG data.
In the final module, a fully connected classifier is employed to produce the final motor intention predictions. The output dimensionality of the classifier is aligned with the number of target classes, ensuring a direct correspondence between the model output and the predefined labels.
To promote a stable training process and enhance convergence, all these modules are initialized using the Xavier uniform distribution. This initialization strategy maintains the variance of activations across layers, thereby mitigating issues such as vanishing or exploding gradients during training.
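For reference, Xavier uniform initialization draws each weight from a uniform distribution on [-a, a] with a = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out). The sketch below assumes a hypothetical 256-to-50 layer; these sizes are illustrative and not taken from the paper.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U(-a, a), a = sqrt(6 / (fan_in + fan_out))."""
    bound = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-bound, bound, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(256, 50, rng)             # hypothetical layer sizes
bound = np.sqrt(6.0 / (256 + 50))
expected_var = 2.0 / (256 + 50)              # variance of U(-a, a) is a^2 / 3
```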
3.4. Target Domain Adaptation
This step implements model optimization for the target domain adaptation using a bi-objective transfer learning framework. The specific process is illustrated as follows.
In the first objective of knowledge distillation, we achieve source domain knowledge transfer through soft label alignment. We construct a Transformer network for the target domain EEG data with the same architecture as the source model, and independently initialize its parameters using the Xavier uniform distribution so that the target model can explore a solution space different from that of the source model. Then, we freeze the parameters of the source model, and compute the soft label distribution of the target domain EEG samples through forward propagation

$$p_s(x_i) = \mathrm{softmax}\left(h_s\left(b_s\left(f_s(x_i)\right)\right)\right),$$

where $f_s$, $b_s$, and $h_s$ denote the feature extractor (Transformer-based encoder), bottleneck projection layer, and final classifier head, respectively. After this, we execute distillation loss optimization by minimizing the mean squared error between the output distribution $p_t(x_i)$ of the target model and the soft labels of the source model over the $n_t$ target samples, i.e.,

$$\mathcal{L}_{KD} = \frac{1}{n_t} \sum_{i=1}^{n_t} \left\| p_t(x_i) - p_s(x_i) \right\|_2^2.$$

The SGD optimizer is used for updating parameters, with momentum coefficient and $\ell_2$ regularization settings consistent with those of the source model.
For target domain adaptation, we also need to jointly optimize the discriminative power and distribution alignment of target domain EEG features, which is achieved by the second objective, the information maximization (IM) loss, i.e.,

$$\mathcal{L}_{IM} = -\frac{1}{n_t} \sum_{i=1}^{n_t} \sum_{c=1}^{C} p_c(x_i) \log p_c(x_i) + \sum_{c=1}^{C} \bar{p}_c \log \bar{p}_c,$$

where $p_c(x)$ denotes the predicted probability that sample $x$ belongs to class $c$, and $\bar{p}_c = \frac{1}{n_t} \sum_{i=1}^{n_t} p_c(x_i)$ represents the aggregated class probability over the target domain, computed as the average predicted probability of class $c$ over all target samples. By minimizing the prediction entropy while maximizing the entropy of the class distribution, the model promotes confident predictions for individual EEG samples and encourages balanced utilization of different classes, thereby improving the compactness and separability of target-domain EEG feature representations.
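The final joint optimization objective is

$$\mathcal{L} = \mathcal{L}_{KD} + \lambda \mathcal{L}_{IM},$$

where $\mathcal{L}_{KD}$ and $\mathcal{L}_{IM}$ are the knowledge distillation and information maximization losses, and $\lambda$ is a balancing coefficient. The parameters of the target model are updated using the SGD optimizer.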
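The two adaptation objectives can be written compactly in NumPy. The sketch below is illustrative only: the function names are hypothetical, the balancing coefficient is assumed to weight the information maximization term, and a small epsilon is added inside the logarithms for numerical safety.

```python
import numpy as np

def kd_loss(p_target, p_source):
    """Mean squared error between target predictions and frozen source soft labels."""
    return float(np.mean(np.sum((p_target - p_source) ** 2, axis=1)))

def im_loss(p, eps=1e-8):
    """Information maximization: mean prediction entropy minus class-marginal entropy."""
    entropy = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    p_bar = p.mean(axis=0)                        # average prediction per class
    neg_diversity = np.sum(p_bar * np.log(p_bar + eps))
    return float(entropy + neg_diversity)

def joint_loss(p_target, p_source, lam=1.0):
    """Assumed combination: distillation loss plus lam times the IM loss."""
    return kd_loss(p_target, p_source) + lam * im_loss(p_target)

confident_balanced = np.eye(4)                    # one confident sample per class
uniform = np.full((8, 4), 0.25)                   # maximally uncertain predictions
best_case = im_loss(confident_balanced)           # close to -log(4), the minimum for 4 classes
flat_case = im_loss(uniform)                      # close to 0
```

Confident and class-balanced predictions attain the lowest IM loss, whereas uniform predictions score zero, which is why minimizing this term favors compact and evenly used classes.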
3.5. Self-Supervised Learning
In cross-subject EEG classification based on source-free domain adaptation (SFDA), pseudo-labeling is critical to compensate for the absence of labeled target EEG data; however, existing methods often update pseudo-labels at fixed intervals, which introduces two key limitations. The first is that a static label assignment strategy fails to capture the evolution of the model during training, leading to outdated or noisy labels. The second is that pseudo-labels derived from a single prototype per class amplify class bias and propagate errors for ambiguous samples, especially under inter-class imbalance and intra-class diversity. To address these issues, we introduce a Class-balanced Multicentric Dynamic Pseudo-labeling (CMDP) strategy, which integrates robust prototype design with network update dynamics to generate adaptive and reliable pseudo-labels.
Existing weighted $k$-means strategies typically utilize all target domain samples to construct class prototypes, which may result in class imbalance. To prevent easily transferable classes from gradually dominating the prototype generation process, we implement a simple yet effective inter-class balanced sampling strategy that fairly aggregates potential EEG samples for each class. In multi-instance learning (MIL), individual samples are grouped into positive and negative bags. For a specific class $c$, we treat the target domain as consisting of one positive bag and one negative bag, where each EEG sample $x_i$ is represented by its feature vector $f_i$ and classification score $p_c(x_i)$. Consequently, the prototype for class $c$ is derived from the positive bag. Since the top-ranked samples are most likely to belong to the positive bag, we aggregate the top $M$ samples in the target domain with the highest $p_c(x_i)$ scores as potential instances. The class-balanced feature prototype $\mu_c$ is then constructed by averaging these samples, which in turn facilitates the assignment of pseudo-labels $\hat{y}_i$. This dynamic updating approach not only mitigates class bias but also enhances the model generalization ability in the target domain. This CMDP-based updating process can be expressed by

$$\mu_c = \frac{1}{|\mathcal{M}_c|} \sum_{x_i \in \mathcal{M}_c} f_i, \qquad \hat{y}_i = \arg\min_{c} d(f_i, \mu_c),$$

where $\mathcal{M}_c$ is the set composed of the $M$ top-scored target samples for class $c$, $|\cdot|$ measures the cardinality of a set, and $d(\cdot, \cdot)$ is a Euclidean distance-based metric. Through this strategy, the risk of class bias is reduced and cross-domain knowledge transfer becomes more reliable, providing a more robust foundation for subsequent learning processes.
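The class-balanced prototype construction and nearest-prototype labeling described above can be sketched as follows. This is a simplified illustration with hypothetical names and synthetic two-dimensional features: it builds a single prototype per class from the top-M scored samples and does not cover the multicentric extension or the dynamic update schedule.

```python
import numpy as np

def class_balanced_prototypes(features, probs, top_m):
    """One prototype per class: the mean feature of the top_m samples ranked by that class's score."""
    n_classes = probs.shape[1]
    protos = np.empty((n_classes, features.shape[1]))
    for c in range(n_classes):
        top_idx = np.argsort(probs[:, c])[::-1][:top_m]
        protos[c] = features[top_idx].mean(axis=0)
    return protos

def assign_pseudo_labels(features, prototypes):
    """Label each sample with the class of its nearest prototype (Euclidean distance)."""
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Synthetic, deliberately imbalanced target set: 30 samples of class 0, 10 of class 1
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
y_true = np.array([0] * 30 + [1] * 10)
features = centers[y_true] + 0.3 * rng.standard_normal((40, 2))

# Stand-in classification scores: softmax over negative distances to the class centers
logits = -np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

prototypes = class_balanced_prototypes(features, probs, top_m=5)
pseudo_labels = assign_pseudo_labels(features, prototypes)
```

Because each class contributes the same number of top-ranked samples, the minority class receives a prototype of the same quality as the majority class even though it has three times fewer samples.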
The pseudo-code of our proposed T-CMDP model is shown in Algorithm 1.
Algorithm 1: The procedure of the proposed T-CMDP model
4. Experiments
4.1. Data Preparation
In the subsequent experiments, three motor imagery EEG datasets were utilized to evaluate the effectiveness of our proposed T-CMDP model, as detailed in
Table 2.
Our evaluation leverages three benchmark motor-imagery (MI) EEG datasets. BNCI2014001, sourced from BCI Competition IV [
32], features data recorded at 250 Hz and bandpass-filtered between 0.5 and 100 Hz; this dataset encompasses a four-class MI paradigm involving the left hand, right hand, feet, and tongue. In contrast, the BNCI2014002 dataset [
33] was acquired using active Ag/AgCl electrodes at a higher sampling frequency of 512 Hz, while BNCI2015001 [
34] shares this 512 Hz sampling rate but incorporates additional preprocessing with a 50 Hz notch filter alongside the standard 0.5–100 Hz bandpass filter. Distinct from the four-class setup of BNCI2014001, both BNCI2014002 and BNCI2015001 focus on a binary classification task comprising imagined movements of the right hand and feet.
All three datasets adhere to a standard cue-based motor imagery (MI) protocol. Each trial commences with a fixation cross displayed for an initial 2-s baseline period. Subsequently, a directional arrow cue (indicating left, right, foot, or tongue imagery) is presented to prompt the subject. The participant is required to sustain the specified motor imagery from the cue onset until the trial concludes at the 6-s mark, coinciding with the disappearance of the fixation point. This structure ensures a consistent 4-s window for active mental task execution following the cue.
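The trial timing translates into sample indices as in the short sketch below. The helper name is hypothetical and the window boundaries are taken directly from the protocol description (cue at 2 s, trial end at 6 s); the epochs actually extracted by a given toolbox may use different offsets.

```python
def imagery_window(fs_hz, cue_onset_s=2.0, trial_end_s=6.0):
    """Return (start, stop) sample indices of the motor imagery window within a trial."""
    start = int(round(cue_onset_s * fs_hz))
    stop = int(round(trial_end_s * fs_hz))
    return start, stop

window_250 = imagery_window(250)   # sampling rate of BNCI2014001
window_512 = imagery_window(512)   # sampling rate of BNCI2014002 and BNCI2015001
```

At 250 Hz the 4-s imagery window spans 1000 samples, and at 512 Hz it spans 2048 samples.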
For all the datasets in Table 2, we utilized the MOABB library to facilitate the downloading and preprocessing of raw EEG data under the motor imagery paradigm. This standardized approach enables the efficient extraction of pertinent information from each participant’s recordings, ensuring that only task-relevant data is retained for subsequent analysis.
4.2. Experimental Setup
To evaluate the performance of the proposed T-CMDP model, we conducted cross-subject motor imagery classification experiments and compared it with three types of models: classic approaches, deep learning architectures, and source-free domain adaptation techniques. The following items provide an overview of these models.
Traditional models. (1) CSP-LDA (Common Spatial Pattern-Linear Discriminant Analysis) is a two-stage method that extracts discriminative spatial features by identifying filters that maximize inter-class separability, followed by an LDA-based classifier to categorize the extracted EEG features [
10]. (2) EA-CSP-LDA (Euclidean Alignment-CSP-LDA) incorporates the EA strategy to reduce inter-subject variability in EEG data on the basis of the established CSP-LDA framework [
35]. (3) CA-TSM-LDA (Centroid Alignment-Tangent Space Mapping-LDA) combines centroid alignment [
36] with tangent space mapping [
37], aiming to minimize EEG feature distribution discrepancies across subjects.
Deep learning models. (1) Deep Convolutional Networks (DCNs) have been extensively applied to automatically capture spatio-temporal patterns in EEG data through convolutional layers [
38]. (2) Deep Adaptation Networks (DANs) align source and target feature distributions by minimizing a multi-kernel maximum mean discrepancy (MK-MMD) between them [
39]. Similarly, (3) Domain Adversarial Neural Networks (DANNs) reduce domain shift by incorporating domain classifiers into the training process to align source and target data distributions [
40].
Source-free domain adaptation techniques. (1) SHOT offers a unique solution by freezing the classifier of a pretrained source model and adapting the feature extractor for target domain data using information maximization and self-supervised pseudo-labeling strategies [
26]. (2) Augmentation-based Source-Free Adaptation (ASFA) leverages data augmentation during source model training, emphasizing uncertainty reduction and consistency regularization to improve robustness in the target domain [
29]. (3) Lightweight Source-Free Transfer (LSFT) constructs an intermediate virtual domain consisting of target domain samples on which the trained source models predict with high consistency, which enables knowledge transfer while preserving privacy [
30]. (4) EEG-DG [
40] is a multi-source domain generalization framework, which addresses cross-subject EEG classification by constructing robust domain-invariant representations. It achieves this through a dual-distribution alignment strategy that simultaneously optimizes both marginal and conditional distributions, thereby effectively minimizing statistical discrepancies across diverse source domains. Crucially, adhering to the standard DG paradigm, EEG-DG performs this optimization entirely without access to target domain data during the training phase, relying solely on the generalizability learned from multiple sources. (5) TransDA [
41] is a Transformer-based framework to address source-free domain adaptation by injecting Transformer blocks as attention modules into convolutional networks. This mechanism encourages the model to focus on discriminative regions to improve generalization on unseen target domain samples. Furthermore, it effectively adapts the Transformer to target domain using a novel self-supervised knowledge distillation approach with target pseudo-labels.
The experiments followed a leave-one-subject-out paradigm, in which each subject in turn serves as the target domain while the remaining subjects form the source domain. This setting corresponds to an unsupervised knowledge transfer scenario in which target domain EEG recordings remain completely unannotated. Source domain processing and model training were performed on local infrastructure to maintain data privacy. Classification accuracy served as the primary evaluation metric for assessing model efficacy across domains.
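The leave-one-subject-out protocol amounts to the following loop. This is a schematic sketch with a hypothetical helper name, shown for a 9-subject dataset such as BNCI2014001; training and adaptation are left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (source_subjects, target_subject) pairs, one fold per subject."""
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield sources, target

folds = list(leave_one_subject_out(list(range(1, 10))))
# For each fold: train the source model on `sources` (labeled),
# then adapt to and evaluate on `target` (labels used for evaluation only).
```

Each dataset therefore yields as many evaluation cases as it has subjects.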
The deep learning models were implemented with distinct configurations. DCN employed a sequential structure of three identical modules, each containing convolutional operations followed by batch normalization, ReLU-based nonlinear activation, max-pooling, and dropout layers (with probability 0.5). DANN incorporated a gradient reversal mechanism during adversarial training, while DAN employed a multi-kernel distribution matching strategy based on the MK-MMD metric. Hyperparameter settings also differed across architectures. DCN used a batch size of 32 with an initial learning rate of 0.002, whereas DAN and DANN shared a larger batch size of 128 and a higher learning rate of 0.01. All of them used bottleneck layer dimensionalities between 50 and 288 units.
Implementation details for the SFDA methods followed established protocols. SHOT used a batch size of 128 with 50-dimensional bottleneck representations, with 20 epochs for source model training and 300 epochs for target domain adaptation. Its primary settings were a temperature scaling factor of 0.1, cosine distance measurement, periodic adjustments at five-epoch intervals, and a regularization balance coefficient of 0.3. ASFA also used a batch size of 128 with a 50-unit bottleneck, with 20 epochs for source model training and 300 epochs for target adaptation; its balance hyperparameters comprise the Tsallis entropy parameter, the domain weakening probability, and the decision boundary constraint. LSFT implemented feature transformation via iterative subspace projection, governed by the subspace dimensionality and the number of iterations, with discrepancy thresholds controlling intermediate domain generation. EEG-DG leverages a multi-branch convolutional architecture to achieve EEG domain generalization through domain-invariant feature learning and adaptive feature weighting. The core of this framework comprises four parallel temporal convolutional branches, each designed to extract distinct temporal patterns and yield a dedicated feature map, which are subsequently refined by a depthwise convolution block. The integration of an adversarial domain classifier with a dynamic feature weighting module enforces the learning of domain-invariant representations, mitigating domain shifts by prioritizing discriminative features while suppressing domain-specific noise. TransDA used a batch size of 64 with 256-dimensional bottleneck representations and scheduled the target domain adaptation for 15 epochs. Its primary hyperparameters were a classifier weight of 0.3, an entropy weight of 1.0, cosine distance measurement for pseudo-label generation, and an exponential moving average (EMA) momentum of 0.001 for updating the teacher network.
4.3. Cross-Subject EEG Classification Results
The cross-subject EEG classification accuracies for the three motor imagery EEG datasets are presented in
Table 3,
Table 4 and
Table 5. The best accuracy in each task is highlighted in bold, and the second-best one is underlined. It is observed that on average, our proposed T-CMDP model achieved the best performance across all the three datasets.
Across all 35 subject-specific evaluations, our method attains the highest average accuracy on each of the three datasets, i.e., 56.85% on BNCI2014001, 76.34% on BNCI2014002, and 74.49% on BNCI2015001. It outperforms the second-best model by 2.76% on BNCI2014001 (LSFT), 2.19% on BNCI2014002 (LSFT), and 3.08% on BNCI2015001 (EEG-DG), indicating the effectiveness of our proposed framework.
Our method ranks among the top two performers in 24 out of the 35 evaluations. This consistency underscores its reliability in practical scenarios where the EEG characteristics of the target subject are unknown a priori. Moreover, it performs well not only on ‘easy’ subjects (e.g., >95% on subject 1 of BNCI2015001) but also on ‘hard’ ones, striking a good balance between peak performance and worst-case robustness.
More critically, our method exhibits superior robustness in some challenging cases. For example, when subject 2 of the BNCI2014001 dataset serves as the target domain, many baselines collapse to around the chance level of 25% for this four-class task, e.g., LSFT 29.17% and ASFA 25.52%; in contrast, our model still achieves an accuracy of 34.72%. The low accuracies on this subject might be caused by neural patterns that deviate substantially from those of the subjects in the source domain, leading to inter-domain differences that are difficult to reconcile.
To assess the significance of the cross-subject EEG classification performance improvements, we performed paired t-tests between our model and each of the baseline models, aggregating the results across all three datasets (35 evaluation cases in total). The statistical test results in Table 6 confirm that the improvements achieved by our proposed T-CMDP model are statistically significant ($p < 0.05$ for each pairwise comparison). These results support the overall effectiveness of T-CMDP, which combines the Transformer-based feature encoder, the bi-objective target model adaptation, and the CMD pseudo-label updating strategy-based self-supervised learning.
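For readers unfamiliar with the paired t-test, its statistic is the mean of the per-case accuracy differences divided by their standard error. The sketch below uses only the Python standard library; the three accuracy pairs are made-up placeholders chosen so the arithmetic is easy to check, not results from the paper.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(acc_a, acc_b):
    """Return the paired t statistic and its degrees of freedom for two matched accuracy lists."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Placeholder accuracies (percent) for three hypothetical evaluation cases
proposed = [61.0, 72.0, 83.0]
baseline = [60.0, 70.0, 80.0]
t_stat, dof = paired_t_statistic(proposed, baseline)
```

Here the differences are 1, 2, and 3, so the statistic equals 2·sqrt(3) with 2 degrees of freedom. Converting it to a p-value requires the Student t distribution, which a statistics package such as SciPy (`scipy.stats.ttest_rel`) provides.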
4.4. Impact of the Bi-Objective Domain Adaptation Strategy
To validate the efficacy of the proposed domain adaptation framework, we compared the baseline source model with the target model enhanced by the transfer learning strategy introduced in
Section 3.4. Taking the BNCI2015001 dataset as an example, we visualize the experimental results across all 12 evaluation cases in
Figure 2. The blue bars show the accuracy of the model trained exclusively on source-domain data (i.e., without transfer learning), while the orange bars show the accuracy after applying the proposed bi-objective transfer learning method, which combines knowledge distillation (KD) and information maximization (IM).
As observed in
Figure 2, transfer learning improves classification accuracy for almost all subjects. The model with transfer learning maintains its advantage over the baseline particularly for subjects on which the source model exhibits substantial performance drops (e.g., subjects 7, 9, and 12). Specifically, when subject 7 serves as the target, the accuracy improves from approximately 64% to over 68%, and for subject 9 it rises from roughly 68% to 72%. This indicates that our method effectively mitigates the negative impact of the domain shift inherent in cross-subject EEG classification. We attribute these performance gains to the synergy of the two objectives. First, knowledge distillation ensures that the target model inherits the task-aware discriminative power of the source model (i.e., its ability to separate different motor intentions) via soft-label alignment, preventing the catastrophic forgetting often caused by direct fine-tuning. Second, the information maximization loss makes the target EEG sample clusters more compact, pushing the decision boundaries away from high-density regions.
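The two objectives discussed above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the exact losses of Section 3.4: the KD term is written as a KL divergence between source-model (teacher) and target-model (student) soft labels, and the IM term follows the commonly used form of mean per-sample entropy minus the entropy of the batch-mean prediction. The function names `kd_loss` and `im_loss`, the temperature parameter, and the example probabilities are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature yields softer labels."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_probs, student_probs, eps=1e-12):
    """Mean KL(teacher || student) over the batch (soft-label alignment)."""
    total = 0.0
    for p, q in zip(teacher_probs, student_probs):
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(teacher_probs)


def im_loss(student_probs, eps=1e-12):
    """Mean per-sample entropy minus the entropy of the batch-mean prediction.

    Minimizing it favors confident individual predictions (compact clusters)
    while keeping the predicted classes diverse across the batch.
    """
    n = len(student_probs)
    n_cls = len(student_probs[0])
    sample_entropy = sum(
        -sum(p * math.log(p + eps) for p in row) for row in student_probs
    ) / n
    mean_pred = [sum(row[c] for row in student_probs) / n for c in range(n_cls)]
    marginal_entropy = -sum(p * math.log(p + eps) for p in mean_pred)
    return sample_entropy - marginal_entropy


# Confident and class-balanced predictions give the lowest IM loss (-ln 2 here),
# whereas predictions collapsed onto a single class give a loss of about 0.
balanced = [[1.0, 0.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(im_loss(balanced), im_loss(collapsed))
```

In this sketch the diversity term is what discourages the trivial solution of assigning every target trial to one class, which entropy minimization alone would permit.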
4.5. Impact of CMD Self-Supervised Learning
To further evaluate the contribution of the proposed class-balanced multicentric dynamic pseudo-label update strategy within our framework introduced in
Section 3.5, we conducted an ablation study comparing the cross-subject EEG classification performance of two model variants: one incorporating the CMD-based self-supervised learning component (termed “model with self-supervised learning”) and the other excluding it (termed “model without self-supervised learning”). The results on the BNCI2015001 dataset are given in
Figure 3.
As shown in
Figure 3, integrating the CMD-based pseudo-label updating strategy improves classification accuracy for nearly all subjects. The orange bars, corresponding to the model with self-supervised learning, lie above the blue bars of the baseline. Notably, substantial improvements are observed for subjects with moderate or low baseline accuracy, particularly subjects 6, 7, and 10. For example, when subject 6 serves as the target, the accuracy increases from approximately 52% to 59%, while a marked improvement from roughly 82% to nearly 88% is achieved when subject 4 is the target.
This consistent enhancement demonstrates the efficacy of the CMD pseudo-labeling strategy in overcoming key limitations of conventional pseudo-labeling approaches. Without this module, the adaptation process is susceptible to confirmation bias, where noisy or incorrect pseudo-labels generated under domain shifts and class imbalance lead to error propagation during training. In contrast, the proposed CMDP strategy addresses these challenges through two core mechanisms.
Mitigation of class bias. By leveraging a multi-instance learning-based sampling scheme, prototypes are constructed exclusively from high-confidence EEG samples (i.e., the positive bag), rather than from all available EEG samples. This prevents the dominant, easily transferable classes from skewing prototype estimation, thereby preserving representation fidelity for minority or challenging classes.
Dynamic and robust prototype updating. Unlike static prototype assignment, the multicentric prototypes are updated dynamically to reflect the evolving state of the model. Selecting the top-M high-confidence instances (samples) ensures higher reliability of the generated pseudo-labels, facilitating more accurate alignment of target-domain features. This mechanism is particularly beneficial for “hard” subjects (e.g., subjects 10 and 12), where decision boundaries are inherently ambiguous, leading to measurable performance improvements.
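The two mechanisms above can be illustrated with a simplified sketch. For brevity it uses a single center per class, whereas the multicentric variant would presumably cluster the selected instances into several centers; the function names, the toy features, and the toy probabilities are hypothetical and do not reproduce the procedure of Section 3.5.

```python
import math


def build_prototypes(features, probs, top_k):
    """Build one prototype per class from its top_k most confident samples.

    Every class contributes the same number of samples, so classes that are
    easy to transfer cannot dominate the prototype estimate (class balance).
    """
    n_cls = len(probs[0])
    dim = len(features[0])
    prototypes = []
    for c in range(n_cls):
        ranked = sorted(range(len(features)), key=lambda i: probs[i][c], reverse=True)
        chosen = ranked[:top_k]
        proto = [sum(features[i][d] for i in chosen) / len(chosen) for d in range(dim)]
        prototypes.append(proto)
    return prototypes


def assign_pseudo_labels(features, prototypes):
    """Label each sample with the index of its nearest prototype (Euclidean)."""
    labels = []
    for x in features:
        dists = [math.dist(x, p) for p in prototypes]
        labels.append(dists.index(min(dists)))
    return labels


# Toy 2-D latent features and classifier probabilities (illustrative only).
features = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [0.45, 0.5]]
probs = [[0.95, 0.05], [0.90, 0.10], [0.10, 0.90], [0.05, 0.95], [0.55, 0.45]]

prototypes = build_prototypes(features, probs, top_k=2)
labels = assign_pseudo_labels(features, prototypes)
print(prototypes, labels)
```

In a dynamic scheme, both steps would be repeated as training proceeds, so that the prototypes and pseudo-labels track the current state of the target model instead of being fixed once.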
To further examine the influence of the CMDP strategy on target-domain EEG feature adaptation, t-distributed stochastic neighbor embedding (t-SNE) is employed to visualize the latent features of a representative subject (subject 4) in a two-dimensional space. As illustrated in
Figure 4, the feature distribution without adaptation (left) is highly entangled: the motor intention classes overlap heavily and are widely dispersed, which makes them difficult to discriminate. In contrast, after applying the CMDP-based self-supervised adaptation (right), the features are reorganized into two well-separated, high-density clusters with a distinct decision boundary and markedly reduced intra-class variance. This visualization corroborates that the CMDP strategy alleviates both domain shift and class bias by refining the latent feature space, enabling the model to learn discriminative and domain-invariant representations even in the complete absence of labeled target data.
4.6. Ablation Studies
To rigorously evaluate the individual contribution of each component within our proposed T-CMDP framework, we conducted a comprehensive ablation study on the three datasets. The results, as detailed in
Table 7, quantify the impact of the three key strategies: Tangent Space Mapping (TSM)-based feature extraction, bi-objective strategy-based domain adaptation (DA), and CMDP strategy-based self-supervised learning (SL).
Efficacy of the Riemannian manifold-based feature extraction (TSM). The largest performance gain is observed upon the introduction of TSM. As shown by comparing the first row (baseline) with the fifth row, applying TSM alone increases the average accuracy across the three datasets from 42.66% to 61.23%. This substantial improvement underscores the importance of Riemannian geometry for extracting task-relevant features from motor imagery EEG. By mapping the covariance matrices into a Euclidean tangent space, TSM effectively linearizes the data structure, providing the subsequent Transformer encoder with a much more discriminative feature space than raw EEG data.
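For reference, the tangent space mapping mentioned above is commonly written as below. This is the standard formulation and is given here as an assumption, since the exact variant used in this work (in particular, the choice of the reference matrix) is defined in the methods section. Here C_i denotes the n × n spatial covariance matrix of trial i, C_ref a reference point on the manifold (often the Riemannian mean of the training covariances), log the matrix logarithm, and upper(·) the vectorization of the upper triangle with off-diagonal entries scaled by √2.

```latex
S_i = \log\!\left( C_{\mathrm{ref}}^{-1/2} \, C_i \, C_{\mathrm{ref}}^{-1/2} \right),
\qquad
s_i = \operatorname{upper}(S_i) \in \mathbb{R}^{n(n+1)/2}
```

Under this formulation, each trial is represented by a vector of length n(n+1)/2, so that, for example, n = 22 channels would yield 253 features per trial.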
Impact of the bi-objective strategy-based domain adaptation (DA). Building upon the TSM features, the domain adaptation component, which comprises the knowledge distillation and information maximization strategies, further enhances the generalization ability of the model on unseen subjects. Comparing the TSM-only model in the fifth row with the ‘TSM+DA’ configuration in the seventh row, the average accuracy rises from 61.23% to 63.82%. This indicates that the domain adaptation process effectively aligns the EEG feature distributions between the source and target domains, mitigating the inter-subject domain shift. Notably, the improvement is consistent across all three datasets, validating the stability of the proposed losses in achieving domain alignment.
Contribution of the CMD-based self-supervised learning (SL). The SL component, built on the CMD-based pseudo-labeling strategy, also makes a distinct contribution. When added to the TSM baseline, as shown in the sixth row, it achieves an accuracy of 63.93%, which is comparable to that of the DA module (63.82%). More importantly, when integrated into the full framework, the SL component complements the DA module and raises the final average accuracy to 69.23%. This suggests that while DA aligns the global distribution, the SL strategy refines the decision boundaries for individual EEG samples in the target domain by mitigating class bias and noisy pseudo-labels.
Synergistic effects. The optimal performance is achieved when all the three components are employed simultaneously, as shown in the bottom row (Row 8) of
Table 7. The full model outperforms any single-component or dual-component variant, yielding the highest results on all datasets (56.85% on BNCI2014001, 76.34% on BNCI2014002, and 74.49% on BNCI2015001). This confirms that the proposed components are highly synergistic rather than redundant. TSM provides a robust geometric manifold-aware feature foundation, DA bridges the domain gap, and SL fine-tunes the target-specific representations.
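The figures quoted in this subsection can be cross-checked with a few lines of arithmetic. The sketch below uses only accuracies stated in the text; the variable names are illustrative. It shows that the reported 69.23% is consistent with an unweighted mean over the three datasets, and that the joint gain of DA and SL over the TSM-only model exceeds the sum of their individual gains.

```python
# Per-dataset accuracies (%) of the full model, as reported in the text.
full_model = {"BNCI2014001": 56.85, "BNCI2014002": 76.34, "BNCI2015001": 74.49}
macro_avg = round(sum(full_model.values()) / len(full_model), 2)

# Average accuracies (%) of the ablation variants, as reported in the text.
tsm_only, tsm_da, tsm_sl, full = 61.23, 63.82, 63.93, 69.23

gain_da = round(tsm_da - tsm_only, 2)     # adding DA to the TSM-only model
gain_sl = round(tsm_sl - tsm_only, 2)     # adding SL to the TSM-only model
gain_joint = round(full - tsm_only, 2)    # adding both DA and SL

print(macro_avg)                       # 69.23
print(gain_da, gain_sl, gain_joint)    # 2.59 2.7 8.0
```

The joint gain of 8.00 points is larger than the 5.29 points obtained by summing the two individual gains, which is the quantitative basis for describing the components as complementary.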
4.7. Parameter Sensitivity Analysis
To evaluate the robustness of the proposed T-CMDP framework and the impact of hyper-parameters on the cross-subject EEG classification performance, we conducted sensitivity analyses on two primary parameters, i.e., the number of class centers (denoted as multi-center-num) and the number of high-confidence instances selected for class prototype construction (denoted as top-k).
As illustrated in
Figure 5, we investigated the sensitivity of the model accuracy to the parameter
multi-center-num, which was varied from 4 to 20. The experimental results indicate that the accuracy initially exhibits an upward trend, reaching its peak of approximately 72.7% when the number of class centers is 12. This suggests that employing multiple centers for each class is beneficial for capturing the intra-class diversity and the complex manifold structure of EEG signals in the target domain. However, once the number of centers exceeds this optimal value, the accuracy begins to decline slightly. This performance degradation is likely attributable to over-segmentation of the feature space, where an excessive number of centers causes the model to fit local noise or outliers rather than the global distribution, thereby introducing ambiguity in pseudo-label assignments.
The parameter
top-k also plays a vital role in the CMD-based pseudo-label updating strategy, because it determines the number of high-confidence instances used to compute the class-balanced prototypes.
Figure 6 displays the variation of classification accuracy as this parameter ranges from one to nine. The highest accuracy is achieved when
top-k is set to three, indicating that the top-ranked instances provide the most reliable and discriminative information for representing the class distribution. As
k increases beyond this value, the classification accuracy drops slightly, particularly when k is larger than five. This suggests that incorporating more instances into prototype construction tends to introduce noisy samples that lie near the decision boundaries or have lower classification confidence. Such noise pulls the prototypes away from the essential class centroids, leading to the propagation of incorrect pseudo-labels and hindering the domain adaptation process.
In conclusion, the above analysis shows that the performance of the proposed model depends on appropriately chosen parameters. Based on these empirical findings, we set multi-center-num and top-k to 12 and 3, respectively, by default in our experiments to balance representation capability and label reliability.
5. Conclusions
In this work, we presented a privacy-preserving, source-free domain adaptation framework termed T-CMDP for cross-subject EEG classification in the motor imagery BCI paradigm. By leveraging a Transformer-based source model, a bi-objective domain adaptation strategy, and a novel class-balanced multicentric dynamic self-supervised learning mechanism, our approach effectively mitigates inter-subject variability without accessing raw source EEG data. Experimental results on three public motor imagery EEG datasets demonstrated the competitive performance of T-CMDP in comparison with three types of EEG classification models (i.e., traditional, deep learning, and source-free domain adaptation baselines), achieving state-of-the-art average accuracies of 56.85%, 76.34%, and 74.49%, respectively. Ablation studies further confirmed the complementary contributions of Riemannian manifold-based EEG feature extraction, target domain adaptation based on knowledge distillation and information maximization, and dynamic pseudo-label refinement. Critically, our framework enables rapid deployment with minimal subject-specific calibration while adhering to privacy-preserving and regulatory constraints, making it particularly suitable for real-world applications.
Despite these promising outcomes, several practical and theoretical limitations must be acknowledged. Practically, real-world clinical EEG recordings are often plagued by severe physiological artifacts and possible class imbalances, which may degrade the reliability of the multicentric pseudo-labels generated by our current model. Theoretically, while the SFDA paradigm prevents direct raw data leakage, it does not yet provide strict mathematical privacy guarantees. Advanced adversarial threats, such as model inversion or membership inference attacks, could still pose risks to the transferred model weights. Furthermore, the transition of such technologies into clinical BCI deployment necessitates careful ethical considerations. Ensuring algorithmic fairness across diverse patient demographics and maintaining transparent patient consent regarding how pre-trained models are utilized are paramount. Future work will focus on enhancing model robustness against highly noisy and imbalanced clinical data, integrating rigorous differential privacy mechanisms, and establishing ethical guidelines for the responsible deployment of adaptive neuroprosthetics.