Transformer-Based SFDA by Class-Balanced Multicentric Dynamic Pseudo-Labeling for Privacy-Preserving EEG-Based BCI Systems

Liu, Jiangchuan; Zhang, Jiatao; Hu, Cong; Peng, Yong

doi:10.3390/systems14050476


by Jiangchuan Liu ¹, Jiatao Zhang ¹, Cong Hu ² and Yong Peng ³,⁴,*

¹ HDU-ITMO Joint Institute, Hangzhou Dianzi University, Hangzhou 310018, China

² Guangxi Key Laboratory of Automatic Detecting Technology and Instruments, Guilin University of Electronic Technology, Guilin 541004, China

³ School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

⁴ Key Laboratory of Brain-Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(5), 476; https://doi.org/10.3390/systems14050476

Submission received: 19 March 2026 / Revised: 15 April 2026 / Accepted: 24 April 2026 / Published: 28 April 2026

(This article belongs to the Special Issue Computational Methods for Complex Systems: Modeling, Optimization, and Decision Support Across Domains)


Abstract

As a common brain-computer interface (BCI) paradigm, electroencephalogram (EEG)-based motor imagery provides a critical pathway for both assistive technology (restoring communication and control) and active rehabilitation (promoting neural plasticity and functional recovery). Domain adaptation has been shown to effectively enhance the decoding of motor intentions for target subjects by leveraging labeled data from source subjects. However, EEG data from source subjects often contain extensive private personal information, and direct access to source EEG data can easily lead to privacy leakage. An important research topic is therefore to achieve domain adaptation without directly accessing the source subjects’ raw data. To address this challenge, a privacy-preserving source-free domain adaptation framework, termed Transformer-based SFDA with Class-balanced Multicentric Dynamic Pseudo-labeling (T-CMDP), is proposed for cross-subject motor-imagery EEG classification. This framework consists of three coupled stages. In the source model training stage, a Transformer-based encoder combined with Riemannian manifold-aware feature extraction is employed to learn transferable and discriminative EEG feature representations. In the source-free target adaptation stage, only the pretrained source model is transferred to the target domain and adapted through knowledge distillation and information maximization, without accessing raw source EEG data. In the self-supervised learning stage, class-balanced multicentric prototypes and high-confidence pseudo-label updates are introduced to progressively refine the target-domain decision boundaries. Extensive experiments on three motor-imagery EEG datasets demonstrate that the proposed T-CMDP framework consistently outperforms eleven representative baselines from traditional machine learning, deep learning, and source-free transfer approaches, achieving average accuracies of 56.85%, 76.34%, and 74.49%, respectively. These results indicate that T-CMDP effectively alleviates inter-subject EEG distribution discrepancies while preserving the privacy of source subjects, thereby facilitating more reliable and practical deployment of EEG-based BCI systems.

Keywords:

EEG; motor imagery; multi-centric dynamic self-supervised learning; privacy preserving; source-free domain adaptation

1. Introduction

BCI systems provide their users with communication and control channels that do not depend on the brain’s normal output channels of peripheral nerves and muscles [1]. Motor imagery holds considerable significance in motor BCI systems [2], offering a novel avenue for control and interaction, particularly for individuals who have lost voluntary motor function, such as patients with severe motor neuron diseases [3], locked-in syndrome, or other conditions that impair normal movement. Defined as the mental rehearsal of physical movements without actual execution [4], motor imagery enables these individuals to interact with external devices by translating their mentally imagined movements into commands or actions.

In the BCI research community, EEG signals associated with motor imagery are characterized by two fundamental properties [5]: a low signal-to-noise ratio (SNR) and high non-stationarity coupled with complexity. The former stems from the inherently low amplitude of EEG signals and their susceptibility to external interference; the latter arises from the intricate, coordinated neural activities across multiple brain regions engaged in motor imagery, yielding signals that are not only information-rich but also highly complex [6]. To efficiently extract discriminative features from raw EEG data, a variety of techniques have been developed, including time-frequency analysis [7], frequency-domain feature extraction [8], spatial information extraction such as common spatial patterns (CSP) [9,10] and brain connectivity patterns [11], and deep learning methodologies [12]. By leveraging these feature extraction methods, researchers are better equipped to address the inherent challenges posed by motor imagery EEG signals.

In addition to the aforementioned considerations, EEG signals often exhibit notable cross-subject feature distribution discrepancies [13,14], with significant variations in EEG activation patterns between individuals. Even when performing identical tasks, such as imagining the movement of the left hand, different subjects may display distinct activation patterns, frequency characteristics, and spatial distributions in the collected EEG data [15]. A primary obstacle in EEG-based decoding lies in the significant distribution shifts observed between subjects, which often leads to suboptimal decoding performance when generalized to unseen users. Transfer learning offers a robust solution to this challenge by bridging the gap between a data-rich source domain and a data-scarce target domain [16]. Specifically, in the realm of EEG analysis, this approach capitalizes on prior knowledge gained from a group of source participants to accelerate learning for a target individual. This process effectively mitigates the burden of collecting massive labeled datasets for every new user, thereby streamlining the calibration phase required for practical BCI deployment. By integrating such methodologies, researchers can overcome the limitations posed by subject-specific variations, thereby achieving more accurate and reliable semantic decoding from EEG data.

While the landscape of transfer learning is replete with diverse methodologies, a significant limitation persists: most existing frameworks require direct access to source EEG data, thereby compromising the privacy of source subjects [17,18]. This limitation is particularly acute in EEG applications, where signals inherently encapsulate sensitive biometric and pathological markers, raising severe compliance concerns under stringent data protection regulations [19]. To address this critical gap, our primary contribution is the formulation of a privacy-preserving transfer learning paradigm. Specifically, we introduce a source-free domain adaptation strategy that relies solely on the pre-trained source model, strictly prohibiting access to raw source EEG data during the adaptation phase [20]. This architecture not only protects the confidentiality of source EEG data but also improves computational efficiency by eliminating the overhead associated with cross-domain data transmission.

To simultaneously tackle cross-subject EEG adaptation challenges and uphold source data privacy, we introduce a T-CMDP (Transformer-based source-free domain adaptation via Class-balanced Multicentric Dynamic Pseudo-labeling) framework in this paper. This framework orchestrates three synergistic modules into a cohesive pipeline. Initially, during source model construction, we leverage manifold feature extraction to distill task-discriminative representations from raw EEG signals, laying a robust foundation for the subsequent Transformer-based local model training. Building upon this, the adaptation phase implements a parameter transfer mechanism that migrates pre-trained source weights to the target domain, effectively enhancing predictive fidelity while strictly adhering to privacy constraints. Concurrently, a self-supervised refinement loop is integrated to dynamically optimize pseudo-labels for unlabeled target EEG samples, thereby maximizing the model capacity to exploit latent information. Through this intricate interplay of feature distillation, privacy-preserving transfer, and label refinement, T-CMDP significantly elevates both the robustness and generalization performance of cross-subject EEG decoding.

The primary contributions of this study are articulated as follows:

Advancing a privacy-centric adaptation paradigm. We introduce T-CMDP, a novel source-free domain adaptation framework tailored for cross-subject motor-imagery EEG classification. Distinct from conventional approaches, T-CMDP obviates the need for raw source EEG data during target adaptation. By synergizing Transformer-based representation learning with Riemannian manifold-aware feature extraction, our framework significantly enhances transferability while rigorously preserving source subject privacy.
Devising a robust pseudo-label refinement mechanism. We formulate a class-balanced multicentric dynamic pseudo-labeling strategy to address the challenges of source-free adaptation. This mechanism integrates knowledge distillation and information maximization with global inter-class balanced sampling and intra-class multicentric prototype construction. Such a holistic design effectively mitigates class bias and suppresses noisy label propagation, thereby sharpening decision boundaries in the target domain.
Establishing new performance benchmarks. Comprehensive evaluations across three public motor-imagery EEG datasets reveal that T-CMDP consistently outperforms state-of-the-art machine learning, deep learning, and source-free baselines, achieving average accuracies of 56.85%, 76.34%, and 74.49%, respectively. Furthermore, rigorous ablation studies and parameter sensitivity analyses corroborate the complementary efficacy of TSM-based feature extraction, domain adaptation, and self-supervised refinement, while validating the optimality of key hyper-parameters (e.g., the number of class centers and high-confidence instances).

The remainder of this paper is organized as follows. In Section 2, we review some works related to this study, encompassing EEG feature extraction methods and techniques for cross-subject EEG classification. Section 3 provides detailed descriptions of the proposed T-CMDP model architecture and its involved components. In Section 4, we outline the experimental design, present the experimental results, and analyze the indispensability of each model component. Finally, Section 5 concludes the whole paper by summarizing the main contributions and proposing the future work.

2. Related Works

2.1. Review of EEG Feature Extraction Methods

Feature extraction is a critical step in processing and analyzing EEG data, enabling the exploration of task-related meaningful information from raw time-series signals [21]. The commonly used EEG feature extraction methods aim to explore the task-related neural patterns from different EEG domains. For example, the temporal domain features are extracted directly from the raw EEG time-series such as the amplitude statistics and waveform characteristics [22]. For frequency domain features, EEG data is first converted into the frequency domain by the Fourier transform for further analysis [8] and common features include the power spectral density [23], relative power, center frequency, and frequency band power. To achieve time-frequency analysis, techniques such as wavelet transform and Hilbert-Huang transforms, are employed to capture signal variations across both temporal and frequency domains, based on which features such as wavelet coefficients, intrinsic mode functions can be calculated. To leverage the spatial arrangement of EEG electrodes for enhancing classification performance, methods such as common spatial pattern and brain functional (connectivity) networks [11] are widely used for spatial feature extraction.

2.2. Cross-Subject EEG Classification and Privacy-Preserving

As one of the primary data modalities in BCI research and applications, EEG exhibits significant inter-subject variability, posing considerable challenges for decoding models. Therefore, training a model on labeled data from existing subjects and then transferring it to new subjects has become the mainstream approach to addressing the model calibration issue in BCI systems, establishing transfer learning as the primary framework for cross-subject EEG classification. Commonly used methods include feature representation-based, instance-based, and model-based transfer learning. Generally, these methods have no explicit privacy-preserving mechanism during the domain adaptation process and typically involve direct access to source domain EEG data.

Privacy-preserving learning aims at mitigating the risks of personal data leakage during data usage and sharing. With the escalating frequency of data leakage incidents, many regulations have been issued to enforce personal data protection worldwide, such as China’s Personal Information Protection Law (PIPL) and the European Union’s General Data Protection Regulation (GDPR). Since EEG data from source subjects often contain sensitive information, such as subject identities and pathological conditions, the direct use of such data during domain adaptation may lead to privacy breaches. Therefore, enabling effective knowledge transfer while preserving privacy in cross-subject EEG classification has emerged as an important research topic in EEG decoding. By leveraging advanced privacy-preserving techniques like differential privacy [24] and federated learning [25], it is possible to secure data and protect user privacy without compromising the efficiency of model training or its prediction capabilities.

Source-Free Domain Adaptation (SFDA) has emerged as a critical paradigm within domain adaptation, specifically designed to facilitate knowledge transfer from a well-labeled source domain to an unlabeled target domain without requiring access to raw source data. In contrast, conventional domain adaptation techniques hinge upon the simultaneous availability of both source and target datasets to fine-tune model parameters. This requirement has become prohibitive in privacy-sensitive sectors such as healthcare, where data sharing is strictly regulated. Consequently, SFDA circumvents these regulatory barriers by leveraging only pre-trained source models alongside unlabeled target data for adaptation. Recent studies have proposed various SFDA methodologies, each of which aims to address the complexities of source data inaccessibility while maintaining robust transfer performance. For example, the Source HypOthesis Transfer (SHOT) framework introduced a simple and universal representation learning approach that freezes the classifier modules (i.e., hypotheses) in the source model while focusing on learning domain-specific feature extraction modules [26]. Similarly, Niknam et al. developed a passive unsupervised domain adaptation method for single-channel EEG sleep stage classification and further incorporated a weighted diversity loss to achieve better performance, which surpasses existing state-of-the-art techniques without accessing source domain data [27]. Additionally, Zhang et al. proposed a decentralized, privacy-preserving transfer learning scenario within the multi-source domain transfer framework [28], where EEG data and computations from multiple source subjects remain localized and only pre-trained model parameters or prediction results are accessible to ensure privacy protection. Xia et al. proposed an augmentation-based Source-Free Adaptation (ASFA) framework [29] to simultaneously address uncertainty reduction in domain adaptation and enforce consistency regularization during the target model training phase to enhance model robustness.

To address the challenge of adapting models to the target domain without access to source data, recent SFDA methods have adopted diverse strategies. Zhang et al. [30] propose a prototype-based reconstruction approach, where target samples exhibiting high prediction consistency with the pre-trained source model are curated to synthesize a virtual intermediate domain. This synthetic domain serves as a proxy for the unavailable source data, facilitating effective domain alignment. Alternatively, Zhao et al. [31] introduce a statistical modeling perspective for cross-subject seizure prediction. Their method leverages Gaussian Mixture Models (GMM) to jointly characterize the latent structure of the source model and the distribution of target data, enabling robust clustering and adaptation through probabilistic alignment rather than explicit data reconstruction.

3. The Proposed T-CMDP Method

In this section, we provide a detailed introduction to the EEG feature extraction methods and the proposed T-CMDP model with guaranteed privacy preserving ability.

3.1. Problem Definition

Consider a general case with K source subjects, where the k-th subject has $n_{s,k}$ labeled EEG trials. The source domain dataset can be expressed as $D_{s} = \{X_{s,i}, y_{s,i}\}_{i=1}^{n_{s,k}}$, where $X_{s,i} \in \mathbb{R}^{m \times l}$ represents the EEG data matrix for the i-th trial, and $y_{s,i} \in \{1, \dots, C\}$ denotes its corresponding class label. Here, m and l refer to the numbers of EEG channels and time-domain sampling points, respectively. In this study, C represents the total number of motor intentions. The total number of source samples is $n_{s}$. In contrast, the target domain consists of $n_{t}$ unlabeled EEG trials, formulated as $D_{t} = \{X_{t,i}\}_{i=1}^{n_{t}}$, where $X_{t,i} \in \mathbb{R}^{m \times l}$ is the EEG data matrix for the i-th trial in the target domain. Because the target domain lacks labeled data, it is utilized exclusively for domain adaptation and model evaluation. Notably, the raw source EEG data is accessible only during the initial feature extraction and source model training. The main symbols used throughout this paper are summarized in Table 1.

3.2. Manifold Adaptive EEG Feature Extraction

Raw EEG data is typically structured in three dimensions, i.e., trials, channels, and sampling points. To enable our model to extract more informative and discriminative features from raw EEG data, we employ the Tangent Space Mapping (TSM) method for feature extraction, which is based on the Riemannian geometry and aims to linearize EEG data by transforming its covariance matrix into a tangent space. This approach not only reduces computational complexity in subsequent analysis steps but also effectively preserves the intrinsic geometric properties of the original EEG data, enhancing the robustness and discriminability of the extracted features.

Specifically, before applying tangent space mapping, it is first necessary to compute the covariance matrix of each trial $X_{s,i} \in \mathbb{R}^{m \times l}$ by $Q_{i} \triangleq \mathrm{Cov}(X_{s,i}) \in \mathbb{R}^{m \times m}$, which serves as a representation of the brain state during that particular trial. Then, the squared Riemannian distance between two covariance matrices $Q_{1}$ and $Q_{2}$ is defined as

D_{R}(Q_{1}, Q_{2}) = \| \mathrm{upper}(\log(Q_{1}^{-1/2} Q_{2} Q_{1}^{-1/2})) \|_{2}^{2},

(1)

where the function $\log(\cdot)$ is the matrix logarithm, which maps the covariance matrix from the Riemannian manifold to the Euclidean space, and the function $\mathrm{upper}(\cdot)$ extracts the upper triangular elements of the resulting matrix, leveraging its symmetry to reduce redundancy and ensure a compact feature representation (the diagonal elements have weight one and the off-diagonal elements have weight $\sqrt{2}$). Then, the Riemannian mean of the N covariance matrices $Q_{i}$ (with $N = n_{s}$ for the source domain and $N = n_{t}$ for the target domain) is

M_{R} = \arg\min_{Q} \sum_{i=1}^{N} D_{R}(Q, Q_{i}).

(2)

Here, $M_{R}$ defines the common tangent space, based on which tangent space features are obtained by transforming the covariance matrix of each subject’s EEG data into a more discriminative feature representation. Specifically, this process is achieved by

x_{i} = \mathrm{upper}(\log(M_{R}^{-1/2} Q_{i} M_{R}^{-1/2})),

(3)

where $Q' \triangleq M_{R}^{-1/2} Q_{i} M_{R}^{-1/2}$ denotes the aligned covariance matrix of the EEG trial after applying Riemannian geometric transformations, in the same congruence form as in Equation (1). This transformation facilitates downstream processing by linearizing the otherwise non-Euclidean distributed EEG data, thereby enhancing the efficiency and robustness of subsequent classification models.
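The mapping in Equations (1)–(3) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed-point iteration used to approximate the Riemannian mean of Equation (2), and the use of the sample covariance are assumptions made for illustration only.

```python
import numpy as np

def _eig_fun(Q, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(Q)
    return (V * fun(w)) @ V.T

def upper(S):
    """Vectorize a symmetric matrix: diagonal weight 1, off-diagonal weight sqrt(2)."""
    rows, cols = np.triu_indices(S.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * S[rows, cols]

def riemann_mean(covs, n_iter=50, tol=1e-10):
    """Approximate the minimizer of Eq. (2) by a fixed-point iteration (assumed solver)."""
    M = covs.mean(axis=0)  # start from the Euclidean mean
    for _ in range(n_iter):
        M_half = _eig_fun(M, np.sqrt)
        M_ihalf = _eig_fun(M, lambda w: 1.0 / np.sqrt(w))
        # Mean of all trials expressed in the tangent space at the current estimate
        T = np.mean([_eig_fun(M_ihalf @ Q @ M_ihalf, np.log) for Q in covs], axis=0)
        M = M_half @ _eig_fun(T, np.exp) @ M_half
        if np.linalg.norm(T) < tol:
            break
    return M

def tangent_space_features(trials):
    """trials: array of shape (N, m, l). Returns features of shape (N, m(m+1)/2)."""
    covs = np.stack([np.cov(X) for X in trials])           # Q_i, one per trial
    M_ihalf = _eig_fun(riemann_mean(covs), lambda w: 1.0 / np.sqrt(w))
    return np.stack([upper(_eig_fun(M_ihalf @ Q @ M_ihalf, np.log)) for Q in covs])
```

With m channels, each trial yields a vector of length m(m+1)/2, and the $\sqrt{2}$ weighting makes the Euclidean norm of that vector equal to the Frobenius norm of the log-mapped matrix.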

3.3. Source Model Training

The proposed source model comprises three sequential modules, each designed to address specific challenges associated with EEG signal processing and cross-domain (subject) classification. The framework of our proposed T-CMDP model is shown in Figure 1.

To preserve the intrinsic temporal dynamics of EEG signals, sinusoidal positional encoding is incorporated into the input sequence. Although the EEG feature extraction stage produces a trial-level feature vector, it can be reorganized into an ordered set of S tokens without information loss, yielding an input matrix $X \in \mathbb{R}^{S \times d_{x}}$, where each row corresponds to a token and $d_{x}$ denotes its feature dimension. This reformulation enables the Transformer to process the features as a sequence while preserving the structural relationships among tokens. Mathematically, the 1D TSM feature vector of length L is sequentially reshaped into an $S \times d_{x}$ 2D matrix without altering any intrinsic values, ensuring no information loss. Physically, each token represents a subset of pairwise spatial connectivities between EEG channels. Unlike direct MLP inputs, this tokenized structure allows the subsequent Multi-Head Self-Attention mechanism to dynamically model high-order interactions between different localized brain connectivity networks, which is essential for capturing domain-invariant motor imagery patterns across subjects. Specifically, the input feature matrix $X$ is projected into an embedding space of dimension $d_{in}$ through a learnable linear transformation, i.e.,

X_{proj} = X W_{p},

(4)

where $W_{p} \in \mathbb{R}^{d_{x} \times d_{in}}$ is a trainable weight matrix. The resulting projected features $X_{proj} \in \mathbb{R}^{S \times d_{in}}$ are then combined with the positional encoding $PE \in \mathbb{R}^{S \times d_{in}}$ via element-wise addition, i.e.,

PE_{(pos, 2i)} = \sin\left(\frac{pos}{10000^{2i/d_{in}}}\right),

(5)

PE_{(pos, 2i+1)} = \cos\left(\frac{pos}{10000^{2i/d_{in}}}\right),

(6)

X_{embed} = X_{proj} + PE.

(7)

This design ensures dimensional compatibility between the input features and positional encodings, which enables the Transformer encoder to effectively capture both spatial-channel dynamics and relative temporal positions.
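As a rough illustration of Equations (4)–(7), the following NumPy sketch reshapes a trial-level feature vector into tokens, applies a linear projection, and adds the sinusoidal positional encoding. The function names, the random projection matrix, and the example sizes (a 253-dimensional vector split into 11 tokens of dimension 23, embedded into 16 dimensions) are hypothetical choices, not values reported in the paper.

```python
import numpy as np

def sinusoidal_pe(S, d_in):
    """Positional encoding of Eqs. (5)-(6) for S positions and embedding size d_in."""
    pos = np.arange(S)[:, None]                 # token positions 0..S-1
    even = np.arange(0, d_in, 2)[None, :]       # the even indices 2i
    angle = pos / np.power(10000.0, even / d_in)
    pe = np.zeros((S, d_in))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle[:, : d_in // 2])
    return pe

def embed_tokens(x, S, W_p):
    """Reshape a length-L vector into S tokens, project (Eq. 4), add PE (Eq. 7)."""
    if x.size % S != 0:
        raise ValueError("feature length must be divisible by the number of tokens")
    X = x.reshape(S, x.size // S)               # values are only rearranged
    X_proj = X @ W_p                            # shape (S, d_in)
    return X_proj + sinusoidal_pe(S, W_p.shape[1])

# Hypothetical sizes: L = 253, S = 11, d_x = 23, d_in = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(253)
X_embed = embed_tokens(x, 11, rng.standard_normal((23, 16)))
```

Because the reshape only rearranges entries, flattening the token matrix recovers the original feature vector, which is the sense in which the tokenization loses no information.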

In the first module, a Transformer encoder serves as the primary feature extractor, operating directly on the TSM-processed feature vectors derived from each EEG trial. The transformed feature vector $x_{i} \in \mathbb{R}^{L}$, reorganized into the token matrix described above, forms the input sequence to the Transformer encoder. Unlike conventional approaches that process raw time-series EEG data, our model processes these geometrically regularized features, which have been projected into Euclidean space while preserving the underlying Riemannian structure. This enables the Transformer to focus on high-level temporal dynamics among channels rather than low-level noise or non-stationarities present in raw EEG data.

The Transformer encoder consists of a Multi-Head Self-Attention (MHSA) mechanism and a Feed-Forward Network (FFN). Its core computation is defined as

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V,

(8)

where $Q, K, V \in \mathbb{R}^{S \times d_{k}}$ are the query, key, and value matrices obtained by projecting the embedded input sequence $X_{embed} \in \mathbb{R}^{S \times d_{in}}$ via the learnable weight matrices $W_{i}^{Q}$, $W_{i}^{K}$, and $W_{i}^{V}$, respectively. Here, S denotes the sequence length (i.e., the number of constructed tokens), and $d_{k}$ is the dimension per attention head. The multi-head concatenation and the per-head projection are respectively given by

\mathrm{MHSA}(X_{embed}) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{O},

(9)

\mathrm{head}_{i} = \mathrm{Attention}(X_{embed} W_{i}^{Q}, X_{embed} W_{i}^{K}, X_{embed} W_{i}^{V}),

(10)

where $W^{O}$ is the projection matrix that linearly transforms the concatenated outputs of all h attention heads back to the desired dimension.

The feed-forward network (FFN) is defined as

\mathrm{FFN}(Z) = \mathrm{GELU}(Z W_{1} + b_{1}) W_{2} + b_{2}.

(11)

Here, $\mathrm{GELU}(\cdot)$ refers to the Gaussian Error Linear Unit activation function, which introduces non-linearity into the network, and $Z = \mathrm{MHSA}(X_{embed})$ is the output of the Multi-Head Self-Attention layer.
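A minimal NumPy sketch of Equations (8)–(11) may help make the shapes concrete. It covers only the operations written in those equations; residual connections and layer normalization, which a full Transformer encoder layer normally includes, are omitted because they are not spelled out in the text. The function names and the tanh approximation of GELU are assumptions for illustration.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)     # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (8)."""
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def mhsa(X_embed, Wq, Wk, Wv, Wo):
    """Eqs. (9)-(10): Wq, Wk, Wv are lists of per-head (d_in, d_k) matrices."""
    heads = [attention(X_embed @ q, X_embed @ k, X_embed @ v)
             for q, k, v in zip(Wq, Wk, Wv)]
    return np.concatenate(heads, axis=-1) @ Wo

def gelu(z):
    """tanh approximation of GELU (an assumed variant)."""
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))

def ffn(Z, W1, b1, W2, b2):
    """Position-wise feed-forward network, Eq. (11)."""
    return gelu(Z @ W1 + b1) @ W2 + b2

# Hypothetical sizes: S = 11 tokens, d_in = 16, h = 4 heads, d_k = 4
rng = np.random.default_rng(0)
S, d_in, h, d_k = 11, 16, 4, 4
X_embed = rng.standard_normal((S, d_in))
Wq, Wk, Wv = ([rng.standard_normal((d_in, d_k)) for _ in range(h)] for _ in range(3))
Z = mhsa(X_embed, Wq, Wk, Wv, rng.standard_normal((h * d_k, d_in)))
out = ffn(Z, rng.standard_normal((d_in, 32)), np.zeros(32),
          rng.standard_normal((32, d_in)), np.zeros(d_in))
```

Each row of the attention weight matrix sums to one, so every output token is a convex combination of the value vectors.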

The second module incorporates a feature bottleneck layer that serves to reduce the dimensionality of the high-dimensional features generated by the encoder. This dimensionality reduction is achieved by mapping the extracted features into a lower-dimensional latent space, which not only streamlines the feature representation but also mitigates the risk of overfitting by controlling the overall model complexity. Such compression is essential for distilling the most discriminative information while preserving the salient characteristics of the EEG data.

In the final module, a fully connected classifier is employed to produce the final motor intention predictions. The output dimensionality of the classifier is aligned with the number of target classes, ensuring a direct correspondence between the model output and the predefined labels.

To promote a stable training process and enhance convergence, all these modules are initialized using the Xavier uniform distribution. This initialization strategy maintains the variance of activations across layers, thereby mitigating issues such as vanishing or exploding gradients during training.

3.4. Target Domain Adaptation

This step optimizes the model for target domain adaptation using a bi-objective transfer learning framework. The specific process is as follows.

In the first objective, knowledge distillation, source domain knowledge is transferred through soft label alignment. We construct a Transformer network for the target domain EEG data with the same architecture as the source model, and initialize its parameters independently using the Xavier uniform distribution, so that the target model starts from a solution space distinct from that of the source model. Then, we freeze the parameters of the source model and compute the soft label distribution of the target domain EEG samples through forward propagation

p_{source}(X) = \mathrm{Softmax}(netC(netB(netF(X)))) \in \mathbb{R}^{C},

(12)

where $netF(\cdot)$, $netB(\cdot)$, and $netC(\cdot)$ denote the feature extractor (Transformer-based encoder), bottleneck projection layer, and final classifier head, respectively. After this, we optimize the distillation loss by minimizing the mean squared error between the output distribution $p_{target}(X)$ of the target model and the soft labels of the source model, i.e.,

L_{distill} = \mathbb{E}_{X \sim D_{t}} \left[ \| p_{target}(X) - p_{source}(X) \|_{2}^{2} \right].

(13)

The SGD optimizer is used to update the parameters, with the momentum coefficient and $\ell_{2}$ regularization settings consistent with those of the source model.
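The distillation objective of Equation (13) reduces to a batch average of squared Euclidean distances between two probability vectors. The sketch below, with the assumed function name `distill_loss`, shows this on arrays of softmax outputs; in practice the source outputs would come from the frozen source model.

```python
import numpy as np

def distill_loss(p_target, p_source):
    """Eq. (13): mean over samples of the squared L2 distance between class distributions.

    Both inputs have shape (n_samples, n_classes); p_source is treated as a constant.
    """
    return np.mean(np.sum((p_target - p_source) ** 2, axis=1))

# Two samples, two classes: the first disagrees completely, the second agrees
p_t = np.array([[1.0, 0.0], [0.3, 0.7]])
p_s = np.array([[0.0, 1.0], [0.3, 0.7]])
loss = distill_loss(p_t, p_s)
```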

For target domain adaptation, we need to jointly optimize the discriminative power and distribution alignment of target domain EEG features, which is achieved by the second objective, the information maximization (IM) loss, i.e.,

L_{IM} = \underbrace{\mathbb{E}_{X \sim D_{t}} \left[ - \sum_{c=1}^{C} p_{c} \log p_{c} \right]}_{\text{minimize prediction entropy}} + \underbrace{\lambda \sum_{c=1}^{C} q_{c} \log q_{c}}_{\text{maximize class distribution entropy}},

(14)

where $p_{c}$ denotes the predicted probability that sample X belongs to class c, $q_{c} = \mathbb{E}_{X \sim D_{t}}[p_{c}]$ represents the aggregated class probability over the target domain, computed as the average predicted probability of class c over all target samples, and $\lambda$ weights the second term. By minimizing the prediction entropy while maximizing the entropy of the class distribution, the model promotes confident predictions for individual EEG samples and encourages balanced utilization of different classes, thereby improving the compactness and separability of target-domain EEG feature representations.
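The behaviour described for the IM loss (confident per-sample predictions, balanced class usage) can be checked numerically with a small NumPy sketch. It follows the verbal description: the per-sample entropy is minimized and the entropy of the mean prediction is maximized, so the latter enters with a negative sign. The function name `im_loss`, the default `lam=1.0`, and the `eps` guard are assumptions for illustration.

```python
import numpy as np

def im_loss(P, lam=1.0, eps=1e-12):
    """Information maximization loss on softmax outputs P of shape (n_t, C)."""
    sample_entropy = -np.sum(P * np.log(P + eps), axis=1).mean()  # to be minimized
    q = P.mean(axis=0)                                            # mean prediction per class
    marginal_entropy = -np.sum(q * np.log(q + eps))               # to be maximized
    return sample_entropy - lam * marginal_entropy

C = 4
balanced = np.tile(np.eye(C), (5, 1))        # confident predictions, every class used equally
collapsed = np.tile(np.eye(C)[0], (20, 1))   # confident predictions, a single class only
uniform = np.full((20, C), 1.0 / C)          # maximally uncertain predictions
```

In this toy comparison, confident and balanced predictions give the lowest loss (about $-\log C$), whereas both the collapsed and the fully uncertain predictions give a loss near zero, which is the class-collapse behaviour the second term is meant to discourage.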

The final joint optimization objective is

L_{total} = L_{IM} + \beta L_{CE},

(15)

where $\beta$ is a balancing coefficient. The parameters of the target model are updated using the SGD optimizer.

3.5. Self-Supervised Learning

In SFDA-based cross-subject EEG classification, pseudo-labeling is critical to compensate for the absence of labeled target EEG data; however, existing methods often update pseudo-labels at fixed intervals, which introduces two key limitations. First, the static label assignment strategy fails to track the evolution of the model during training, leading to outdated or noisy labels. Second, class bias is amplified: pseudo-labels based on a single prototype per class propagate errors for ambiguous samples, especially under inter-class imbalance and intra-class diversity. To address these issues, we introduce a Class-balanced Multicentric Dynamic Pseudo-labeling (CMDP) strategy, which integrates robust prototype design with network update dynamics to generate adaptive and reliable pseudo-labels.

Existing weighted k-means strategies typically utilize all target domain samples to construct class prototypes, which may result in class imbalance. To prevent easily transferable classes from gradually dominating the prototype generation process, we implement a simple yet effective inter-class balanced sampling strategy that fairly aggregates potential EEG samples for each class. In MIL (multi-instance learning), individual samples are grouped into positive and negative bags. For a specific class c, we treat the target domain as consisting of one positive bag and one negative bag, where each EEG sample

x_{t}

is represented by its feature vector

{\hat{g}}_{t} (x_{t})

and classification score

p (x_{t}) = δ ({\hat{f}}_{t} (x_{t}))

. Consequently, the prototype for class c is derived from the positive bag. Since the top-ranked samples are most likely to belong to the positive bag, we aggregate the top M samples in the target domain with the highest

δ ({\hat{f}}_{t} (x_{t}))

scores as potential instances. The class-balanced feature prototype

{Pro}_{c}

is then constructed by averaging these samples, which in turn facilitates the assignment of pseudo-labels

{\hat{y}}_{t}

. This dynamic updating approach not only mitigates class bias but also enhances the model generalization ability in the target domain. This CMDP-based updating process can be expressed by

$$M_{c} = \underset{x_{i} \in D_{t},\ |M_{c}| = M}{\arg\max}\ \delta_{c}(\hat{f}_{t}(x_{i})), \tag{16}$$

$$\mathrm{Pro}_{c} = \frac{1}{M} \sum_{x_{i} \in M_{c}} \hat{g}_{t}(x_{i}), \tag{17}$$

$$\hat{y}_{t} = \underset{c}{\arg\min}\ D_{f}(\hat{g}_{t}(x_{t}), \mathrm{Pro}_{c}), \tag{18}$$

where $M_{c}$ is the set of the M top-scored target samples for class c, $|\cdot|$ denotes the cardinality of a set, and $D_{f}$ is a Euclidean distance-based metric. This method promotes accurate assignment of the target labels and thereby improves target domain adaptation performance. The strategy is expected to reduce the risk of class bias, improve the reliability of cross-domain knowledge transfer, and provide a more robust foundation for the subsequent learning process.
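The three steps in Eqs. (16)–(18) can be illustrated with a minimal sketch. It assumes a single prototype per class (the multicentric extension is omitted), plain Python lists for features and scores, and toy values that are not taken from the paper; the function names `class_balanced_prototypes` and `assign_pseudo_labels` are hypothetical.

```python
import math


def class_balanced_prototypes(features, scores, top_m):
    """Build one prototype per class from its top_m highest-scoring samples.

    features: list of feature vectors, one per target sample
    scores:   list of per-class score vectors, one per target sample
    """
    num_classes = len(scores[0])
    dim = len(features[0])
    prototypes = []
    for c in range(num_classes):
        # Eq. (16): indices of the top_m samples ranked by the class-c score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i][c], reverse=True)
        selected = ranked[:top_m]
        # Eq. (17): average the feature vectors of the selected samples.
        proto = [sum(features[i][d] for i in selected) / len(selected) for d in range(dim)]
        prototypes.append(proto)
    return prototypes


def assign_pseudo_labels(features, prototypes):
    """Eq. (18): label each sample with its nearest prototype (Euclidean distance)."""
    return [
        min(range(len(prototypes)), key=lambda c: math.dist(x, prototypes[c]))
        for x in features
    ]


# Toy two-class example (illustrative values only).
features = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [0.5, 0.4]]
scores = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.55, 0.45]]
prototypes = class_balanced_prototypes(features, scores, top_m=2)
pseudo_labels = assign_pseudo_labels(features, prototypes)
print(pseudo_labels)
```

Because every class contributes exactly `top_m` samples to its prototype, a class that dominates the predictions cannot pull the other prototypes toward itself, which is the balancing effect described above.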

The pseudo-code of our proposed T-CMDP model is shown in Algorithm 1.

Algorithm 1: The procedure of the proposed T-CMDP model

4. Experiments

4.1. Data Preparation

In the subsequent experiments, three motor imagery EEG datasets were utilized to evaluate the effectiveness of our proposed T-CMDP model, as detailed in Table 2.

Our evaluation leverages three benchmark motor-imagery (MI) EEG datasets. BNCI2014001, sourced from BCI Competition IV [32], features data recorded at 250 Hz and bandpass-filtered between 0.5 and 100 Hz; this dataset encompasses a four-class MI paradigm involving the left hand, right hand, feet, and tongue. In contrast, the BNCI2014002 dataset [33] was acquired using active Ag/AgCl electrodes at a higher sampling frequency of 512 Hz, while BNCI2015001 [34] shares this 512 Hz sampling rate but incorporates additional preprocessing with a 50 Hz notch filter alongside the standard 0.5–100 Hz bandpass filter. Distinct from the four-class setup of BNCI2014001, both BNCI2014002 and BNCI2015001 focus on a binary classification task comprising imagined movements of the right hand and feet.

All three datasets adhere to a standard cue-based motor imagery (MI) protocol. Each trial commences with a fixation cross displayed for an initial 2-s baseline period. Subsequently, a directional arrow cue (indicating left, right, foot, or tongue imagery) is presented to prompt the subject. The participant is required to sustain the specified motor imagery from the cue onset until the trial concludes at the 6-s mark, coinciding with the disappearance of the fixation point. This structure ensures a consistent 4-s window for active mental task execution following the cue.
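As a worked example of the trial timing described above, the sketch below converts the 2-s cue onset and 6-s trial end into sample indices for the two sampling rates mentioned (250 Hz and 512 Hz). The helper `imagery_window` is hypothetical, and the epoching actually applied by the preprocessing toolchain may use different intervals.

```python
def imagery_window(sampling_rate_hz, cue_onset_s=2.0, trial_end_s=6.0):
    """Return (start, stop) sample indices of the post-cue imagery window."""
    start = int(round(cue_onset_s * sampling_rate_hz))
    stop = int(round(trial_end_s * sampling_rate_hz))
    return start, stop


for rate in (250, 512):
    start, stop = imagery_window(rate)
    # The window length in samples is 4 s times the sampling rate.
    print(rate, start, stop, stop - start)
```

At 250 Hz the 4-s window spans 1000 samples, and at 512 Hz it spans 2048 samples.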

For all the datasets listed in Table 2, we used the MOABB library (see Note 1) to download and preprocess the raw EEG data under the motor imagery paradigm. This standardized approach enables efficient extraction of the pertinent information from each participant’s recordings, ensuring that only task-relevant data are retained for subsequent analysis.

4.2. Experimental Setup

To evaluate the performance of the proposed T-CMDP model, we conducted cross-subject motor imagery classification experiments and compared it with three types of models, i.e., classic approaches, deep learning architectures, and source-free domain adaptation techniques. Together, these baselines provide the basis for assessing the effectiveness of T-CMDP in cross-subject EEG-based motor imagery classification. The following items provide an overview of these models.

Traditional models. (1) CSP-LDA (Common Spatial Pattern-Linear Discriminant Analysis) is a two-stage method that extracts discriminative spatial features by identifying filters that maximize inter-class separability, followed by an LDA-based classifier to categorize the extracted EEG features [10]. (2) EA-CSP-LDA (Euclidean Alignment-CSP-LDA) builds on the established CSP-LDA framework and incorporates the EA strategy to reduce inter-subject variability in EEG data [35]. (3) CA-TSM-LDA (Centroid Alignment-Tangent Space Mapping-LDA) combines centroid alignment [36] with tangent space mapping [37], aiming to minimize EEG feature distribution discrepancies across subjects.
Deep learning models. (1) Deep Convolutional Networks (DCNs) have been extensively applied to automatically capture spatio-temporal patterns in EEG data through convolutional layers [38]. (2) Deep Adversarial Networks (DANs) adopt a generative adversarial framework, where a generator and discriminator are trained in tandem to enhance feature representation ability [39]. Similarly, (3) Domain Adversarial Neural Networks (DANNs) reduce domain shift by incorporating domain classifiers into the training process to align source and target data distributions [40].
Source-free domain adaptation techniques. (1) SHOT freezes the classifier of a pretrained source model and adapts the feature extractor to target domain data using information maximization and self-supervised pseudo-labeling strategies [26]. (2) Augmentation-based Source-Free Adaptation (ASFA) leverages data augmentation during source model training, emphasizing uncertainty reduction and consistency regularization to improve robustness in the target domain [29]. (3) Lightweight Source-Free Transfer (LSFT) constructs an intermediate virtual domain consisting of target domain samples on which the trained source models show high prediction consistency, which enables knowledge transfer while preserving privacy [30]. (4) EEG-DG [40] is a multi-source domain generalization framework that addresses cross-subject EEG classification by constructing robust domain-invariant representations. It achieves this through a dual-distribution alignment strategy that simultaneously optimizes both marginal and conditional distributions, thereby minimizing statistical discrepancies across diverse source domains. Crucially, adhering to the standard DG paradigm, EEG-DG performs this optimization entirely without access to target domain data during the training phase, relying solely on the generalizability learned from multiple sources. (5) TransDA [41] is a Transformer-based framework that addresses source-free domain adaptation by injecting Transformer blocks as attention modules into convolutional networks. This mechanism encourages the model to focus on discriminative regions to improve generalization on unseen target domain samples. Furthermore, it adapts the Transformer to the target domain using a self-supervised knowledge distillation approach with target pseudo-labels.

The experiments followed a leave-one-subject-out paradigm, which targets unsupervised knowledge transfer scenarios where the target domain EEG recordings remain completely unannotated. Source domain processing and model training were performed on local infrastructure to maintain data privacy. Classification accuracy served as the primary evaluation metric for assessing model efficacy across domains.
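The leave-one-subject-out protocol can be sketched as follows. This is a minimal illustration, assuming subjects are identified by integers 1 to 12 (matching the 12 evaluation cases reported for BNCI2015001); the generator name `leave_one_subject_out` is hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (source_subjects, target_subject) pairs, one per subject."""
    for target in subject_ids:
        # All remaining subjects form the labeled source domain.
        sources = [s for s in subject_ids if s != target]
        yield sources, target


# One fold per subject: each subject serves once as the unlabeled target.
folds = list(leave_one_subject_out(list(range(1, 13))))
for sources, target in folds[:2]:
    print(f"target={target}, number of source subjects={len(sources)}")
```

Each fold trains on the source subjects only and evaluates on the held-out target subject, so the number of evaluation cases equals the number of subjects.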

The deep learning baselines used distinct configurations. DCN employed a sequential structure of three identical modules, each containing convolutional operations followed by batch normalization, ReLU-based nonlinear activation, max-pooling, and dropout layers (with probability 0.5). DANN incorporated gradient reversal during adversarial training, while DAN employed a multi-kernel distribution matching strategy through MK-MMD metric learning. Hyperparameter settings also differed substantially across architectures. DCN used a smaller batch size of 32 with a lower initial learning rate of 0.002, whereas DAN and DANN shared a larger batch size of 128 and a higher learning rate of 0.01. All of them used bottleneck layer dimensionalities between 50 and 288 units.

Implementation details for the SFDA methods followed established protocols. SHOT used a batch size of 128 with 50-dimensional bottleneck representations and a phased training schedule (20 epochs for source model training versus 300 epochs for target domain adaptation). Its primary parameters were the temperature scaling ($\tau = 0.1$), cosine distance measurement, periodic adjustments (five-epoch intervals), and the regularization balance ($\beta = 0.3$). ASFA used the same batch size of 128 and a 50-unit bottleneck dimensionality, with 20 epochs for source model training and 300 epochs for target adaptation. Its balance hyperparameters were the Tsallis entropy parameter ($a = 2$), the domain weakening probability ($p = 0.1$), and the decision boundary constraint ($\lambda = 0.5$). LSFT implemented feature transformation via iterative subspace projection (dimensionality $p = 20$, $T = 10$ iterations), with a discrepancy threshold ($\mu = 0.1$) controlling intermediate domain generation. EEG-DG leverages a multi-branch convolutional architecture to achieve EEG domain generalization through domain-invariant feature learning and adaptive feature weighting. The core of this framework comprises four parallel temporal convolutional branches, which extract distinct temporal patterns and together yield four dedicated feature maps that are subsequently refined by a depthwise convolution block. The integration of an adversarial domain classifier with a dynamic feature weighting module enforces the learning of domain-invariant representations, mitigating domain shifts by prioritizing discriminative features while suppressing domain-specific noise. TransDA used a batch size of 64 with 256-dimensional bottleneck representations and scheduled the target domain adaptation for 15 epochs. Its primary hyperparameters were a classifier weight of 0.3, an entropy weight of 1.0, cosine distance measurement for pseudo-label generation, and an exponential moving average (EMA) momentum of 0.001 for updating the teacher network.

4.3. Cross-Subject EEG Classification Results

The cross-subject EEG classification accuracies for the three motor imagery EEG datasets are presented in Table 3, Table 4 and Table 5. The best accuracy in each task is highlighted in bold, and the second-best one is underlined. It is observed that on average, our proposed T-CMDP model achieved the best performance across all the three datasets.

Across all 35 subject-specific evaluations, our method attains the highest average accuracy on each of the three datasets, i.e., 56.85% on BNCI2014001, 76.34% on BNCI2014002, and 74.49% on BNCI2015001. It exceeds the second-best model on each dataset, namely LSFT by 2.76% on BNCI2014001, LSFT by 2.19% on BNCI2014002, and EEG-DG by 3.08% on BNCI2015001, indicating the effectiveness of our proposed framework.
Our method ranks among the top two performers in 24 out of the 35 evaluations. This consistency underscores its reliability in practical scenarios where the EEG characteristics of the target subject are unknown a priori. Moreover, it performs well not only on ‘easy’ subjects (e.g., >95% on subject 1 of BNCI2015001) but also on ‘hard’ ones, striking a favorable balance between peak performance and worst-case robustness.
More critically, our method exhibits superior robustness in some challenging cases. For example, when subject 2 of the BNCI2014001 dataset serves as the target domain, many baselines collapse to around the chance level (25% for this four-class task), e.g., LSFT at 29.17% and ASFA at 25.52%; in contrast, our model still achieves an accuracy of 34.72%. A possible explanation is that the neural patterns of this subject deviate substantially from those of the source subjects, leading to inter-domain differences that are difficult to reconcile.

To assess the significance of the cross-subject EEG classification performance improvements, we performed paired t-tests between our model and each of the baseline models, aggregating the results across all three datasets (35 evaluation cases in total). The statistical test results in Table 6 confirm that the improvements achieved by our proposed T-CMDP model are statistically significant (the p-value is less than 0.05 for each pairwise comparison). These results support the overall effectiveness of T-CMDP and, by extension, of its components, i.e., the Transformer-based feature encoder, the bi-objective target model adaptation, and the CMD pseudo-label updating strategy-based self-supervised learning.
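For readers unfamiliar with the test, the statistic behind a paired t-test can be sketched with the standard library alone. The accuracy values below are hypothetical placeholders, not results from Table 6, and the helper `paired_t_statistic` is an illustrative name.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test of a against b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # The test operates on per-case differences between the two methods.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation, n - 1).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


# Hypothetical per-subject accuracies, NOT values from the paper.
proposed = [0.60, 0.72, 0.55, 0.81, 0.67]
baseline = [0.57, 0.70, 0.50, 0.80, 0.63]
t_value, dof = paired_t_statistic(proposed, baseline)
print(f"t = {t_value:.4f}, dof = {dof}")
```

The p-value is then obtained from the Student t distribution with `dof` degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_rel`) performs the same test and returns the p-value directly.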

4.4. Impact of the Bi-Objective Domain Adaptation Strategy

To validate the efficacy of the proposed domain adaptation framework, we conducted a comparative analysis between the baseline source model and the target model enhanced with our transfer learning strategy introduced in Section 3.4. By taking the BNCI2015001 dataset as an example, we visualized the experimental results across all the 12 evaluation cases in Figure 2. The blue bar represents the accuracy obtained by the model trained exclusively on source domain data (i.e., without transfer learning), while the orange bar depicts the performance after applying the proposed bi-objective transfer learning method, i.e., including knowledge distillation (KD) and information maximization (IM).

As observed in Figure 2, transfer learning improves classification accuracy for almost all the subjects. The bars corresponding to the model with transfer learning lie above the baseline in most cases, particularly for subjects where the source model exhibits substantial performance drops (e.g., subject 7, subject 9, and subject 12). Specifically, when subject 7 serves as the target, the accuracy improves from approximately 64% to over 68%, and for subject 9, it rises from roughly 68% to 72%. This indicates that our method effectively mitigates the negative impact of the domain shifts inherent in cross-subject EEG classification. We attribute these performance gains to the synergy of the proposed modules. First, the knowledge distillation stage ensures that the target model inherits the task-aware (i.e., different motor intentions) discriminative power of the source model via soft label alignment, preventing the catastrophic forgetting often caused by direct fine-tuning. Second, the information maximization loss enhances the compactness of the target EEG sample clusters, pushing the decision boundaries away from high-density regions.
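To make the information maximization idea concrete, the following sketch uses a common SHOT-style formulation: the mean per-sample prediction entropy (low when predictions are confident) plus the negative entropy of the average prediction (low when predictions are spread across classes). This is an assumption for illustration and may differ from the exact loss defined in Section 3.4; the knowledge distillation term is omitted.

```python
import math


def information_maximization_loss(probabilities, eps=1e-8):
    """SHOT-style IM loss for a batch of class-probability vectors."""
    n = len(probabilities)
    k = len(probabilities[0])
    # Mean per-sample entropy: minimizing it makes each prediction confident.
    entropy = -sum(p * math.log(p + eps) for row in probabilities for p in row) / n
    # Batch-average (marginal) class distribution.
    marginal = [sum(row[c] for row in probabilities) / n for c in range(k)]
    # Negative marginal entropy: minimizing it keeps predictions diverse.
    neg_diversity = sum(q * math.log(q + eps) for q in marginal)
    return entropy + neg_diversity


confident_balanced = [[1.0, 0.0], [0.0, 1.0]]
confident_collapsed = [[1.0, 0.0], [1.0, 0.0]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
for name, batch in [("balanced", confident_balanced),
                    ("collapsed", confident_collapsed),
                    ("uncertain", uncertain)]:
    print(name, round(information_maximization_loss(batch), 4))
```

Under this formulation the loss is lowest for predictions that are both confident and balanced across classes, which corresponds to compact clusters separated by boundaries in low-density regions.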

4.5. Impact of CMD Self-Supervised Learning

To further evaluate the contribution of the proposed class-balanced multicentric dynamic pseudo-label update strategy introduced in Section 3.5, an ablation study was conducted by comparing the cross-subject EEG classification performance of two model variants, one incorporating the CMD-based self-supervised learning component (termed “model with self-supervised learning”) and the other excluding it (termed “model without self-supervised learning”). The results on the BNCI2015001 dataset are given in Figure 3.

As shown in Figure 3, the integration of the CMD-based pseudo-label updating strategy improves classification accuracy for nearly all the subjects: the orange bars (model with self-supervised learning) lie above the blue bars (baseline) in almost every case. Notably, substantial improvements are observed for subjects with moderate or low baseline accuracy, particularly subjects 6, 7, and 10. For example, when subject 6 serves as the target, the accuracy increases from approximately 52% to 59%, while a marked improvement from roughly 82% to nearly 88% is achieved when subject 4 is the target.

This consistent enhancement demonstrates the efficacy of the CMD pseudo-labeling strategy in overcoming key limitations of conventional pseudo-labeling approaches. Without this module, the adaptation process is susceptible to confirmation bias, where noisy or incorrect pseudo-labels generated under domain shifts and class imbalance lead to error propagation during training. In contrast, the proposed CMDP strategy addresses these challenges through two core mechanisms.

Mitigation of class bias. By leveraging a multi-instance learning-based sampling scheme, prototypes are constructed exclusively from high-confidence EEG samples (i.e., the positive bag), rather than from all available EEG samples. This prevents the dominant, easily transferable classes from skewing prototype estimation, thereby preserving representation fidelity for minority or challenging classes.
Dynamic and robust prototype updating. Unlike static prototype assignment, the multicentric prototypes are updated dynamically to reflect the evolving state of the model. The selection of the top-M high-confidence instances (samples) ensures higher reliability of the generated pseudo-labels $\hat{y}_{t}$, facilitating more accurate alignment of target-domain features. This mechanism is particularly beneficial for “hard” subjects (e.g., subjects 10 and 12), where decision boundaries are inherently ambiguous, leading to measurable performance improvements.

To further examine the influence of the CMDP strategy on target domain EEG feature adaptation, t-distributed stochastic neighbor embedding (t-SNE) is employed to visualize the latent features of a representative subject (subject 4) in a two-dimensional subspace. As illustrated in Figure 4, the feature distribution without adaptation (left) is highly entangled: the motor intention classes overlap substantially and are widely dispersed, which makes them difficult to discriminate. In contrast, after applying the CMDP-based self-supervised adaptation (right), the adapted features are reorganized into two well-separated, high-density clusters with a distinct decision boundary and markedly reduced intra-class variance. This visualization corroborates that the CMDP strategy alleviates both domain shift and class bias by refining the latent feature space, enabling the model to learn discriminative and domain-invariant representations even in the complete absence of labeled target data.

4.6. Ablation Studies

To evaluate the individual contribution of each component within our proposed T-CMDP framework, we conducted an ablation study on the three datasets. The results, detailed in Table 7, show the impacts of the three key strategies: Tangent Space Mapping (TSM)-based feature extraction, bi-objective strategy-based domain adaptation (DA), and CMDP strategy-based self-supervised learning (SL).

Efficacy of the Riemannian manifold-based feature extraction (TSM). The most significant performance leap is observed upon the introduction of TSM. As shown in the comparison between the first row (baseline) and the fifth row, applying TSM alone increases the average accuracy across the three datasets from 42.66% to 61.23%. This substantial improvement underscores the fundamental importance of Riemannian geometry in motor imagery task-aware EEG feature extraction. By mapping the covariance matrices into a tangent Euclidean space, TSM effectively linearizes the data structure, providing a much more discriminative feature space for the subsequent Transformer encoder than raw EEG data.
Impact of the bi-objective strategy-based domain adaptation (DA). Building upon the TSM features, the inclusion of the domain adaptation component, which comprises the knowledge distillation and information maximization strategies, further enhances the model generalization ability on unseen subjects. Comparing the TSM-only model in the fifth row with the ‘TSM+DA’ configuration in the seventh row, the average accuracy rises from 61.23% to 63.82%. This indicates that the domain adaptation process aligns the EEG feature distributions between the source and target domains, mitigating the inter-subject domain shift. Notably, the improvement is consistent across all three datasets, validating the stability of the proposed losses in achieving domain alignment.
Contribution of the CMD-based self-supervised learning (SL). The SL component, which relies on the CMD-based pseudo-labeling strategy, also makes a distinct contribution. When added to the TSM baseline, as shown in the sixth row, it achieves an accuracy of 63.93%, which is comparable to that of the DA module. More importantly, when integrated into the full framework, the SL component complements the DA module to elevate the final average accuracy to 69.23%. This suggests that while DA aligns the global distribution, the SL strategy refines the decision boundaries for individual EEG samples in the target domain by mitigating class bias and noisy pseudo-labels.
Synergistic effects. The optimal performance is achieved when all the three components are employed simultaneously, as shown in the bottom row (Row 8) of Table 7. The full model outperforms any single-component or dual-component variant, yielding the highest results on all datasets (56.85% on BNCI2014001, 76.34% on BNCI2014002, and 74.49% on BNCI2015001). This confirms that the proposed components are highly synergistic rather than redundant. TSM provides a robust geometric manifold-aware feature foundation, DA bridges the domain gap, and SL fine-tunes the target-specific representations.
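The 69.23% figure quoted for the full framework is the unweighted mean of the three per-dataset accuracies, which can be checked directly from the values reported above:

```python
# Per-dataset average accuracies of the full T-CMDP model, as reported in the text.
accuracies = {"BNCI2014001": 56.85, "BNCI2014002": 76.34, "BNCI2015001": 74.49}

# Unweighted mean over the three datasets.
average = sum(accuracies.values()) / len(accuracies)
print(f"{average:.2f}")  # 69.23
```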

4.7. Parameter Sensitivity Analysis

To evaluate the robustness of the proposed T-CMDP framework and the impact of hyper-parameters on the cross-subject EEG classification performance, we conducted sensitivity analyses on two primary parameters, i.e., the number of class centers (denoted as multi-center-num) and the selection of high-confidence instances (denoted as top-k) for class prototype construction.

As illustrated in Figure 5, we investigated the sensitivity of model accuracy to the multi-center-num parameter, which was varied from 4 to 20. The experimental results indicate that the accuracy initially exhibits an upward trend, reaching its peak of approximately 72.7% when the number of class centers is 12. This suggests that employing multiple centers for each class is beneficial for capturing the intra-class diversity and the complex manifold structure of EEG signals in the target domain. However, once the number of centers exceeds this value, the accuracy begins to decline slightly. This performance degradation is likely attributable to over-segmentation of the feature space, where an excessive number of centers causes the model to fit local noise or outliers rather than the global distribution, thereby introducing ambiguity in pseudo-label assignments.

The parameter top-k also plays a vital role in the CMD-based pseudo-label updating strategy, because it determines the number of high-confidence instances used to compute the class-balanced prototypes. Figure 6 displays the variation of classification accuracy as this parameter ranges from one to nine. The highest accuracy is achieved when top-k is set to three, indicating that the top-ranked instances provide the most reliable and discriminative information for representing the class distribution. As k increases beyond this value, the classification accuracy drops slightly, particularly when k is larger than five. This suggests that incorporating more instances into prototype construction tends to introduce noisy samples that are located around decision boundaries or have lower classification confidence. Such noise makes the prototypes deviate from the true class centroids, leading to the propagation of incorrect pseudo-labels and hindering the domain adaptation process.

In conclusion, the above analysis shows that appropriately chosen parameters are necessary for our proposed model to achieve good performance. Based on these empirical findings, we set multi-center-num and top-k to 12 and 3, respectively, by default in our experiments to balance representation capability and label reliability.

5. Conclusions

In this work, we presented a privacy-preserving, source-free domain adaptation framework termed T-CMDP for cross-subject EEG classification in the motor imagery BCI paradigm. By leveraging a Transformer-based source model, a bi-objective domain adaptation strategy, and a novel class-balanced multicentric dynamic self-supervised learning mechanism, our approach mitigated inter-subject variability without accessing raw source EEG data. Experimental results on three public motor imagery EEG datasets demonstrated the competitive performance of T-CMDP in comparison with three types of EEG classification models (i.e., traditional, deep learning, and source-free domain adaptation baselines), achieving state-of-the-art average accuracies of 56.85%, 76.34%, and 74.49%, respectively. Ablation studies further confirmed the complementary contributions of Riemannian manifold-based EEG feature extraction, target domain adaptation based on knowledge distillation and information maximization, and dynamic pseudo-label refinement. Critically, our framework enables rapid deployment with minimal subject-specific calibration while adhering to privacy-preserving and regulatory constraints, making it particularly suitable for real-world applications.

Despite these promising outcomes, several practical and theoretical limitations must be acknowledged. Practically, real-world clinical EEG recordings are often plagued by severe physiological artifacts and possible class imbalances, which may degrade the reliability of the multicentric pseudo-labels generated by our current model. Theoretically, while the SFDA paradigm prevents direct raw data leakage, it does not yet provide strict mathematical privacy guarantees. Advanced adversarial threats, such as model inversion or membership inference attacks, could still pose risks to the transferred model weights. Furthermore, the transition of such technologies into clinical BCI deployment necessitates careful ethical considerations. Ensuring algorithmic fairness across diverse patient demographics and maintaining transparent patient consent regarding how pre-trained models are utilized are paramount. Future work will focus on enhancing model robustness against highly noisy and imbalanced clinical data, integrating rigorous differential privacy mechanisms, and establishing ethical guidelines for the responsible deployment of adaptive neuroprosthetics.

Author Contributions

Conceptualization, J.L. and Y.P.; methodology, J.L.; software, J.L. and J.Z.; validation, J.Z. and C.H.; formal analysis, J.L.; investigation, J.L.; resources, J.L. and C.H.; data curation, J.L. and J.Z.; writing—original draft preparation, J.L.; writing—review and editing, Y.P.; visualization, J.Z.; supervision, Y.P.; project administration, Y.P.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang under Grant 2025C04001, the National Natural Science Foundation of China under Grant 62571171, the MoE Humanities and Social Sciences Project under Grant 24YJCZH225, and the Guangxi Key Laboratory of Automatic Detecting Technology and Instruments under Grant YQ26204.

Institutional Review Board Statement

Ethical review and approval were waived for this study because all the experimental data are publicly available online.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All the datasets used in experiments are publicly available.

Acknowledgments

The authors would like to thank the editors and reviewers for their constructive comments on this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Note

1	https://github.com/NeuroTechX/moabb (accessed on 23 January 2024).

References

Naser, M.Y.; Bhattacharya, S. Towards Practical BCI-Driven Wheelchairs: A Systematic Review Study. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1030–1044. [Google Scholar] [CrossRef]
Ukov, T.; Sharabov, M.; Tsochev, G. A Systematic Review of the Need for Conceptual Models of Imagery Experiences. Systems 2025, 13, 1051. [Google Scholar] [CrossRef]
Yılmaz, S.; Boz, C.; Özsarı, S.H.; Yılmaz, F.; Fidan Türkön, B.; Mete, A.H. Effects of Neurological Disorders on Health Expenditure and Economic Output: Dynamic Panel Analysis for OECD Countries. Systems 2025, 13, 521. [Google Scholar] [CrossRef]
Arpaia, P.; Esposito, A.; Natalizio, A.; Parvis, M. How to successfully classify EEG in motor imagery BCI: A metrological analysis of the state of the art. J. Neural Eng. 2022, 19, 031002. [Google Scholar] [CrossRef] [PubMed]
Wan, Z.; Yang, R.; Huang, M.; Zeng, N.; Liu, X. A Review on Transfer Learning in EEG Signal Analysis. Neurocomputing 2021, 421, 1–14. [Google Scholar] [CrossRef]
Li, W.; Peng, Y. Transfer EEG emotion recognition by combining semi-supervised regression with bipartite graph label propagation. Systems 2022, 10, 111. [Google Scholar] [CrossRef]
Yan, W.; Wu, Y. A Time-Frequency Denoising Method for Single-Channel Event-Related EEG. Front. Neurosci. 2022, 16, 991136. [Google Scholar] [CrossRef] [PubMed]
Li, X.; Zhang, Y.; Tiwari, P.; Song, D.; Hu, B.; Yang, M.; Zhao, Z.; Kumar, N.; Marttinen, P. EEG-Based Emotion Recognition: A Tutorial and Review. ACM Comput. Surv. 2022, 55, 79. [Google Scholar] [CrossRef]
Wu, W.; Chen, Z.; Gao, X.; Li, Y.; Brown, E.N.; Gao, S. Probabilistic Common Spatial Patterns for Multichannel EEG Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 639–653. [Google Scholar] [CrossRef]
Cai, G.; Zhang, F.; Yang, B.; Huang, S.; Ma, T. Manifold learning-based common spatial pattern for EEG signal classification. IEEE J. Biomed. Health Inform. 2024, 28, 1971–1981. [Google Scholar] [CrossRef]
Luo, C.; Li, F.; Li, P.; Yi, C.; Li, C.; Tao, Q.; Zhang, X.; Si, Y.; Yao, D.; Yin, G.; et al. A survey of brain network analysis by electroencephalographic signals. Cogn. Neurodyn. 2022, 16, 17–41. [Google Scholar] [CrossRef]
Altaheri, H.; Muhammad, G.; Alsulaiman, M.; Amin, S.U.; Altuwaijri, G.A.; Abdul, W.; Bencherif, M.A.; Faisal, M. Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: A review. Neural Comput. Appl. 2023, 35, 14681–14722.
Song, Y.; Zheng, Q.; Wang, Q.; Gao, X.; Heng, P.A. Global adaptive transformer for cross-subject enhanced EEG classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 2767–2777.
Chen, Y.; Peng, Y.; Tang, J.; Camilleri, T.A.; Camilleri, K.P.; Kong, W.; Cichocki, A. EEG-based affective brain-computer interfaces: Recent advancements and future challenges. J. Neural Eng. 2025, 22, 031004.
Lopez, K.L.; Monachino, A.D.; Vincent, K.M.; Peck, F.C.; Gabard-Durnam, L.J. Stability, change, and reliable individual differences in electroencephalography measures: A lifespan perspective on progress and opportunities. NeuroImage 2023, 275, 120116.
Hosna, A.; Merry, E.; Gyalmo, J.; Alom, Z.; Aung, Z.; Azim, M.A. Transfer learning: A friendly introduction. J. Big Data 2022, 9, 102.
Liang, G.; Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Methods Programs Biomed. 2020, 187, 104964.
Zhang, H.; Jiang, L.; Yu, L. Attribute and Instance Weighted Naive Bayes. Pattern Recognit. 2021, 111, 107674.
Xia, K.; Duch, W.; Sun, Y.; Xu, K.; Fang, W.; Luo, H.; Zhang, Y.; Sang, D.; Xu, X.; Wang, F.Y.; et al. Privacy-preserving brain–computer interfaces: A systematic review. IEEE Trans. Comput. Soc. Syst. 2023, 10, 2312–2324.
Kundu, J.N.; Venkat, N.; Babu, R.V. Universal Source-Free Domain Adaptation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 4544–4553.
Pooja; Pahuja, S.; Veer, K. Recent Approaches on Classification and Feature Extraction of EEG Signal: A Review. Robotica 2022, 40, 77–101.
Singh, A.K.; Krishnan, S. Trends in EEG Signal Feature Extraction Applications. Front. Artif. Intell. 2023, 5, 1072801.
Ng, W.B.; Saidatul, A.; Chong, Y.; Ibrahim, Z. PSD-Based Features Extraction for EEG Signal During Typing Task. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019; Volume 557, p. 012032.
Zhao, Y.; Chen, J. A Survey on Differential Privacy for Unstructured Data Content. ACM Comput. Surv. (CSUR) 2022, 54, 207.
Li, L.; Fan, Y.; Tse, M.; Lin, K.Y. A Review of Applications in Federated Learning. Comput. Ind. Eng. 2020, 149, 106854.
Liang, J.; Hu, D.; Feng, J. Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation. In Proceedings of the International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2020; pp. 6028–6039.
Niknam, Y. Source-Free Domain Adaptation for Sleep Stage Classification. Master’s Thesis, The University of Western Ontario (Canada), London, ON, Canada, 2023.
Zhang, W.; Wang, Z.; Wu, D. Multi-Source Decentralized Transfer for Privacy-Preserving BCIs. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 30, 2710–2720.
Xia, K.; Deng, L.; Duch, W.; Wu, D. Privacy-Preserving Domain Adaptation for Motor Imagery-Based Brain-Computer Interfaces. IEEE Trans. Biomed. Eng. 2022, 69, 3365–3376.
Zhang, W.; Wu, D. Lightweight Source-Free Transfer for Privacy-Preserving Motor Imagery Classification. IEEE Trans. Cogn. Dev. Syst. 2023, 15, 938–949.
Zhao, Y.; Feng, S.; Li, C.; Song, R.; Liang, D.; Chen, X. Source-free domain adaptation for privacy-preserving seizure prediction. IEEE Trans. Ind. Inform. 2024, 20, 2787–2798.
Tangermann, M.; Müller, K.R.; Aertsen, A.; Birbaumer, N.; Braun, C.; Brunner, C.; Leeb, R.; Mehring, C.; Miller, K.J.; Müller-Putz, G.R.; et al. Review of the BCI Competition IV. Front. Neurosci. 2012, 6, 55.
Carrara, I.; Papadopoulo, T. Pseudo-online framework for BCI evaluation: A MOABB perspective using various MI and SSVEP datasets. J. Neural Eng. 2024, 21, 016003.
Ju, C.; Guan, C. Graph neural networks on SPD manifolds for motor imagery classification: A perspective from the time–frequency analysis. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 17701–17715.
He, H.; Wu, D. Transfer learning for brain–computer interfaces: A Euclidean space data alignment approach. IEEE Trans. Biomed. Eng. 2020, 67, 399–410.
Zhang, W.; Wu, D. Manifold Embedded Knowledge Transfer for Brain-Computer Interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 1117–1127.
Yang, H.; Wu, F.; Zhang, N.; Chen, C.; Wei, A.; Peng, F.; Li, Z. A transfer learning method for motor imagery EEG signals classification based on CCSP and Riemannian tangent space mapping. In Proceedings of the 2023 IEEE International Conference on Real-Time Computing and Robotics (RCAR); IEEE: Piscataway, NJ, USA, 2023; pp. 707–712.
Rabbani, M.H.R.; Islam, S.M.R. Deep learning networks based decision fusion model of EEG and fNIRS for classification of cognitive tasks. Cogn. Neurodyn. 2024, 18, 1489–1506.
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144.
Zhong, X.C.; Wang, Q.; Liu, D.; Chen, Z.; Liao, J.X.; Sun, J.; Zhang, Y.; Fan, F.L. EEG-DG: A Multi-Source Domain Generalization Framework for Motor Imagery EEG Classification. IEEE J. Biomed. Health Inform. 2025, 29, 2484–2495.
Yang, G.; Zhong, Z.; Ding, M.; Sebe, N.; Ricci, E. Self-training transformer for source-free domain adaptation. Appl. Intell. 2023, 53, 16560–16574.

Figure 1. The general framework of the proposed model.

Figure 2. Comparison of classification accuracy between the baseline model (without transfer learning) and the proposed method (with transfer learning) across the 12 subjects in the BNCI2015001 dataset.

Figure 3. Ablation study on the effect of the CMD-based self-supervised learning strategy across the 12 subjects in the BNCI2015001 dataset.

Figure 4. t-SNE visualization of target-domain samples from subject 4 in the BNCI2015001 dataset before and after self-supervised adaptation.

Figure 5. The average cross-subject EEG classification performance in terms of different class center numbers on BNCI2015001.

Figure 6. The average cross-subject EEG classification performance in terms of top-k selected instances in class prototype construction on BNCI2015001.

Table 1. The main notations and descriptions used in this paper.

Notation	Description	Notation	Description
$D_{s}$	source domain	$X$	EEG trials
$D_{t}$	target domain	$X$	feature matrix
$M_{R}$	reference matrix	$x$	feature vector
$Y$	label matrix	$y$	label of $x$
$K$	number of source subjects	$C$	number of classes
$Q$	covariance matrix	$D_{R} (Q_{1}, Q_{2})$	Riemannian distance
$S$	number of tokens	$d_{x}$	token feature dimension
$d_{in}$	embedding space dimension	$PE$	positional encoding matrix

Table 2. The main properties of the three MI datasets used in the experiments (# means ‘the number of’).

Dataset	#subject	#channel	#trial	#class
BNCI2014001	9	22	576	4
BNCI2014002	14	15	120	2
BNCI2015001	12	13	400	2

Table 3. Cross-subject EEG classification accuracies (%) on the BNCI2014001 dataset.

Alg.	CSP-LDA	EA-CSP	CA-TSM	DCN	DAN	DANN	SHOT	ASFA	LSFT	EEG-DG	TransDA	Ours
sub1	38.89	38.89	44.62	55.90	55.73	51.74	65.10	62.33	68.75	63.76	59.29	66.84
sub2	27.26	27.78	27.95	32.64	31.42	28.13	26.22	25.52	29.17	27.10	29.17	34.72
sub3	52.61	52.60	40.28	54.34	64.58	65.45	69.79	81.42	78.47	73.58	71.82	84.03
sub4	38.02	38.02	34.55	31.94	39.76	45.49	35.94	37.50	39.58	42.16	37.12	46.01
sub5	26.56	26.39	26.04	25.00	35.59	37.67	24.48	32.64	32.64	34.10	32.99	35.76
sub6	24.48	24.48	26.22	30.73	38.54	36.63	30.21	31.60	37.85	38.78	34.72	42.01
sub7	34.89	34.90	37.33	30.90	43.92	50.17	43.75	47.40	52.78	56.48	51.23	57.99
sub8	54.69	54.69	44.62	57.64	58.51	59.03	63.19	64.41	78.47	62.83	72.75	76.22
sub9	54.86	55.03	38.54	49.13	54.34	50.52	56.77	52.78	69.10	64.17	62.75	68.06
Avg.	39.14	39.20	35.57	40.91	46.93	47.20	46.16	48.40	54.09	51.44	50.20	56.85
Std.	12.27	12.25	7.38	13.03	11.61	11.63	17.86	18.48	19.96	16.20	17.20	18.07

The best accuracy in each task is highlighted in bold, and the second-best one is underlined.

Table 4. Cross-subject EEG classification accuracies (%) on the BNCI2014002 dataset.

Alg.	CSP-LDA	EA-CSP	CA-TSM	DCN	DAN	DANN	SHOT	ASFA	LSFT	EEG-DG	TransDA	Ours
sub1	55.63	56.25	50.00	53.13	60.63	61.88	67.50	65.20	60.62	60.00	62.50	75.00
sub2	60.00	61.88	60.00	78.75	81.25	81.88	83.75	67.80	83.75	81.42	71.13	85.60
sub3	91.88	93.13	95.63	85.00	88.75	86.88	99.38	70.90	100.0	89.85	88.75	99.38
sub4	81.25	82.50	50.00	50.00	73.75	78.13	85.00	57.90	83.13	79.41	78.75	84.38
sub5	56.88	57.50	53.75	60.63	71.25	66.25	65.00	51.80	72.50	70.68	66.25	73.12
sub6	57.50	66.25	58.75	50.00	74.38	65.00	61.25	49.20	71.88	68.39	60.00	73.12
sub7	83.12	69.38	71.25	50.00	79.38	78.13	86.88	80.70	91.25	80.83	81.25	88.75
sub8	56.88	61.88	56.25	54.38	66.88	60.63	50.63	99.00	65.00	65.62	59.38	66.87
sub9	91.25	90.00	80.00	50.00	77.50	80.63	93.75	69.90	94.37	85.31	86.88	93.75
sub10	65.00	61.88	64.38	63.13	59.38	58.13	60.00	74.90	66.25	61.56	61.25	65.62
sub11	50.63	50.00	58.75	50.00	63.75	66.25	51.88	57.20	62.50	56.44	65.00	78.75
sub12	57.50	56.25	58.75	62.50	65.00	67.50	61.88	65.00	80.00	69.38	67.50	75.62
sub13	50.00	50.00	50.63	50.00	56.25	61.25	50.63	84.90	54.37	57.38	53.75	55.63
sub14	48.13	49.38	50.00	50.00	47.50	48.13	50.00	93.10	52.50	49.63	52.50	53.12
Avg.	64.69	64.73	61.30	57.68	68.98	68.62	69.11	70.54	74.15	69.71	68.21	76.34
Std.	15.39	14.34	13.12	11.42	11.14	10.93	17.25	14.83	14.96	12.14	11.63	13.42

The best accuracy in each task is highlighted in bold, and the second-best one is underlined.

Table 5. Cross-subject EEG classification accuracies (%) on the BNCI2015001 dataset.

Alg.	CSP-LDA	EA-CSP	CA-TSM	DCN	DAN	DANN	SHOT	ASFA	LSFT	EEG-DG	TransDA	Ours
sub1	55.63	56.25	50.00	97.50	86.75	84.25	98.25	98.50	98.75	87.50	85.00	99.00
sub2	60.00	61.88	60.00	95.00	92.25	88.00	95.75	96.00	96.00	90.50	89.13	96.00
sub3	91.88	93.13	95.63	61.50	81.75	80.50	91.75	90.75	91.00	85.00	86.75	90.25
sub4	81.25	82.50	50.00	56.25	80.25	77.75	90.00	78.00	84.75	88.25	82.25	91.25
sub5	56.88	57.50	53.75	60.25	72.50	77.75	78.00	50.25	87.25	85.50	68.00	84.50
sub6	57.50	66.25	58.75	62.25	63.00	59.50	50.00	61.50	55.75	65.25	62.25	59.00
sub7	83.12	69.38	71.25	60.75	63.50	67.75	63.50	61.50	55.25	64.25	69.13	71.50
sub8	56.88	61.88	56.25	50.33	59.17	59.83	50.00	71.67	52.75	70.75	61.33	59.50
sub9	91.25	90.00	80.00	51.33	66.67	71.33	64.83	71.25	63.50	58.50	59.17	76.33
sub10	65.00	61.88	64.38	55.33	63.00	58.33	51.33	50.17	55.50	54.13	62.67	59.00
sub11	50.63	50.00	58.75	49.83	49.67	48.67	50.17	49.00	54.75	52.50	55.17	50.33
sub12	57.50	56.25	58.75	51.75	69.63	54.00	50.30	66.67	55.50	54.75	63.50	57.25
Avg.	67.29	67.24	63.13	62.67	70.68	68.97	69.49	70.44	70.90	71.41	70.36	74.49
Std.	15.09	13.97	13.33	16.32	12.44	12.84	19.99	17.49	18.74	15.02	12.03	17.30

The best accuracy in each task is highlighted in bold, and the second-best one is underlined.

Table 6. Statistical test results between our proposed model and each of the baseline models.

t-Test	t-Statistic	p-Value	Significance Level
Our model vs. CSP-LDA	5.1851	$9.8808 \times 10^{- 6}$	***
Our model vs. EA-CSP	5.3454	$6.1141 \times 10^{- 6}$	***
Our model vs. CA-TSM	5.9984	$8.6541 \times 10^{- 7}$	***
Our model vs. DCN	7.5441	$9.2522 \times 10^{- 9}$	***
Our model vs. DAN	5.6817	$2.2320 \times 10^{- 6}$	***
Our model vs. DANN	7.5021	$1.0437 \times 10^{- 8}$	***
Our model vs. SHOT	7.0697	$3.6470 \times 10^{- 8}$	***
Our model vs. ASFA	2.1262	$4.0831 \times 10^{- 2}$	*
Our model vs. LSFT	3.1208	$3.6667 \times 10^{- 3}$	**
Our model vs. EEG-DG	4.7853	$3.2555 \times 10^{- 5}$	***
Our model vs. TransDA	6.3357	$3.1674 \times 10^{- 7}$	***

*, ** and *** respectively denote that the p-value is less than 0.05, 0.01 and 0.001.
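
Table 6 lists t-statistics and p-values, but the table itself does not show how they are obtained. As a minimal sketch, assuming a paired, two-sided t-test over per-subject accuracies, the statistic could be computed as below. The example uses only the nine BNCI2014001 subjects for "Ours" and LSFT from Table 3, so its value is not expected to match the figures in Table 6; `paired_t_statistic` is a hypothetical helper name, not something defined in the paper.

```python
import math
import statistics

# Per-subject accuracies (%) copied from Table 3 (BNCI2014001).
ours = [66.84, 34.72, 84.03, 46.01, 35.76, 42.01, 57.99, 76.22, 68.06]
lsft = [68.75, 29.17, 78.47, 39.58, 32.64, 37.85, 52.78, 78.47, 69.10]


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom, mean difference) for paired samples.

    Assumption: each index in `a` and `b` refers to the same subject.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample
    # standard deviation (n - 1 in the denominator).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1, mean_diff


t_stat, dof, mean_diff = paired_t_statistic(ours, lsft)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, dof = {dof}")
```

The p-value then follows from the Student t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_rel(ours, lsft)` returns both the statistic and the two-sided p-value in one call.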

Table 7. The ablation study results (%) in evaluating the impact of each component in our proposed T-CMDP framework.

Strategy			Dataset			Average
TSM	DA	SL	BNCI2014001	BNCI2014002	BNCI2015001	Average
✗	✗	✗	25.02	51.88	51.08	42.66
✗	✗	✓	27.36	56.09	52.30	45.25
✗	✓	✗	27.15	53.14	51.28	43.86
✗	✓	✓	28.86	58.54	56.52	47.97
✓	✗	✗	49.16	68.40	66.14	61.23
✓	✗	✓	52.72	71.74	67.32	63.93
✓	✓	✗	51.71	70.40	69.98	63.82
✓	✓	✓	56.85	76.34	74.49	69.23

✓: Component is used; ✗: Component is removed.
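
One way to read Table 7 is as a leave-one-out comparison against the full model. The sketch below takes the reported "Average" column as-is and computes how far the average accuracy drops when a single component (TSM, DA, or SL) is removed. The dictionary layout and variable names are illustrative assumptions, not part of the paper.

```python
# Average accuracies (%) copied from the "Average" column of Table 7,
# keyed by the (TSM, DA, SL) flags of each configuration.
average_acc = {
    (True, True, True): 69.23,   # full T-CMDP
    (False, True, True): 47.97,  # TSM removed
    (True, False, True): 63.93,  # DA removed
    (True, True, False): 63.82,  # SL removed
}

components = ("TSM", "DA", "SL")
full = average_acc[(True, True, True)]

drops = {}
for i, name in enumerate(components):
    # Flags with only component i switched off.
    flags = tuple(j != i for j in range(len(components)))
    drops[name] = round(full - average_acc[flags], 2)

for name, drop in drops.items():
    print(f"removing {name}: -{drop:.2f} percentage points")
```

Under this reading, removing TSM costs far more average accuracy than removing DA or SL, which is in line with the gap between the top four and bottom four rows of the table.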

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, J.; Zhang, J.; Hu, C.; Peng, Y. Transformer-Based SFDA by Class-Balanced Multicentric Dynamic Pseudo-Labeling for Privacy-Preserving EEG-Based BCI Systems. Systems 2026, 14, 476. https://doi.org/10.3390/systems14050476

