A Multi-Constrained Transfer Learning for Cross-Subject Decoding of Motor Imagery-Based BCI

Yu, Boyang; Zhang, Li

doi:10.3390/math14081314

by Boyang Yu and Li Zhang *

State Key Laboratory of Power Transmission Equipment Technology, School of Electrical Engineering, Chongqing University, Chongqing 400044, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(8), 1314; https://doi.org/10.3390/math14081314

Submission received: 7 March 2026 / Revised: 31 March 2026 / Accepted: 7 April 2026 / Published: 14 April 2026

Abstract

Individual differences and long calibration time present significant challenges to the practical implementation of brain–computer interfaces (BCIs). Domain adaptation technology can help mitigate these challenges by leveraging knowledge from existing subjects. Although domain adaptation methods have achieved progress in BCIs, there remains a need for further exploration in class structure and cross-domain dispersion. In this paper, we propose a novel framework, multi-constrained transfer learning with selective pseudo-label update (MCTLP). First, Euclidean alignment is applied to reduce inter-subject variability at the data level. Then, multi-constrained feature alignment (MCFA) is introduced, which iteratively constructs a kernel mapping space and then determines an optimized subspace to align both marginal and conditional distributions at the feature level under class structure and dispersion constraints. Moreover, in this iterative process of feature alignment, a selective pseudo-label update method is proposed to update the pseudo-labels of only the target samples with high classification confidence to realize more reliable conditional distribution alignment. Two benchmark datasets were used to verify the presented MCTLP. The results showed that MCTLP outperformed other existing methods, demonstrating its strong ability for cross-subject transfer.

Keywords: motor imagery; domain adaptation; cross-subject

MSC: 68T10; 92C55; 62H30

1. Introduction

Brain–computer interfaces (BCIs) provide a new interaction channel linking the brain with the outside world [1,2]. Among the various types of input signals, electroencephalography (EEG) stands out as the most commonly used signal in BCIs, driven by its non-invasive nature as well as the portability of its acquisition devices [3].

Compared to other EEG paradigms, motor imagery (MI) simply requires participants to visualize the motion of a body part without external stimulation, making it a widely used paradigm [4,5,6]. In MI tasks, the most relevant EEG changes are mainly reflected in sensorimotor rhythms, especially in the μ and β frequency bands. These rhythm modulations are commonly characterized by event-related desynchronization and event-related synchronization patterns, which are widely regarded as important neurophysiological features for MI analysis and decoding [7,8,9].

Typically, the process of gathering data from new subjects involves a labor-intensive and user-unfriendly calibration procedure. Thus, decreasing subject-specific calibration is essential for EEG-based BCIs.

Transfer learning aims to leverage existing data, models, or knowledge to adapt to new domains and scenarios, and it has been successfully applied to MI-based BCIs [10,11,12]. For instance, Dai et al. [13] introduced a method that integrates kernel common spatial patterns (CSPs) with transfer kernel learning, known as the transfer kernel common spatial pattern approach [14,15]. Jayaram et al. [10] proposed a framework based on multi-task learning to facilitate transfer across subjects and sessions, assigning different weights to EEG channels and features to identify invariant characteristics across subjects.

Recently, data alignment methods have gained popularity. Zanini et al. [16] introduced the Riemannian alignment (RA) method, which assumes that variations in EEG signals across subjects can be captured by their resting-state covariance matrix. This matrix is used as a reference to align the covariance matrices of EEG signals. Rodrigues et al. [17] proposed Riemannian Procrustes analysis, which further reduces inter-subject variability by applying geometrical transformations, such as translation, scaling, and rotation, while preserving the geometry of symmetric positive definite matrices. He and Wu [18] introduced a Euclidean alignment (EA) technique, which aligns EEG signals by utilizing the mean of covariance matrices. In addition to data-level alignment, many approaches have been developed to align feature distributions between source and target domains. Typical examples include joint distribution adaptation (JDA) [19] and transfer component analysis (TCA) [20], which aim to reduce domain shifts in the feature space by aligning marginal and conditional distributions.

More recent studies have further explored transfer learning for cross-subject MI-EEG classification. She et al. [21] proposed an improved domain adaptation network based on Wasserstein distance for MI-EEG classification, showing the effectiveness of more robust distribution-aware adaptation. Zhi et al. [22] emphasized the learning of domain-invariant yet class-discriminative representations for cross-subject MI decoding, while Gong et al. [23] proposed a multi-source discriminant dynamic domain adaptation framework to further improve target domain discriminability in cross-subject MI-EEG recognition. Recent studies have focused not only on reducing domain discrepancy, but also on learning more discriminative feature representations and improving generalization across subjects.

However, existing methods still face several limitations in cross-subject MI classification. Many methods mainly focus on reducing domain discrepancy, while insufficiently considering class structure preservation and cross-domain dispersion, which may lead to reduced intra-class compactness and insufficient inter-class separability after adaptation. In addition, many feature alignment methods rely on pseudo-labels for conditional distribution alignment without explicitly assessing their reliability, which may cause error accumulation and degrade adaptation performance.

To address these issues, this study proposes a multi-constrained transfer learning with selective pseudo-label update (MCTLP) framework to improve knowledge transfer across individuals and enhance generalization to unseen subjects. In the proposed framework, raw EEG signals are first aligned using Euclidean alignment (EA) to alleviate inter-subject variability at the data level. A multi-constrained feature alignment (MCFA) method is then developed to project source and target domain features into a kernel mapping space. Within this space, a shared subspace is learned to jointly align marginal and conditional distributions while incorporating a class structure constraint (CSC) and a domain dispersion constraint (DDC). Since conditional distribution alignment relies on pseudo-labels, a selective pseudo-label update (SPLU) strategy is introduced to update the pseudo-labels of high-confidence target samples, thereby improving pseudo-label reliability and reducing error propagation.

The main contributions of this paper are summarized as follows.

A new feature alignment method, MCFA, is proposed, which minimizes marginal and conditional distribution discrepancies while additionally incorporating a CSC and DDC.
A pseudo-label update method, SPLU, is proposed to improve the reliability of the pseudo-labels of target samples and reduce error propagation.
Cross-subject classification is evaluated using two publicly available MI datasets, and the superior performance of the proposed method demonstrates its effectiveness.

The structure of this paper is as follows: Section 2 presents the materials and methods, including the proposed MCTLP approach. Section 3 reports the results and discussion. Finally, the conclusions are given in Section 4.

2. Materials and Methods

2.1. Problem Definition

The subjects in the dataset were divided into source and target subjects. The data collected from source subjects form the labeled source domain data $D_{s} = \{(x_{s, i}, y_{s, i})\}_{i = 1}^{N_{s}}$, where $x_{s, i}$ represents the EEG signals and $y_{s, i}$ denotes the corresponding label. In contrast, the data collected from target subjects form the unlabeled target domain data $D_{t} = \{x_{t, j}\}_{j = 1}^{N_{t}}$, which consists only of EEG signals $x_{t, j}$ without labels.

Our objective is to reduce cross-domain discrepancies by performing alignment at both the data and feature levels.

2.2. MCTLP

Considering the substantial inter-subject variability in EEG, we present a novel transfer learning framework, MCTLP, which addresses discrepancies at both the data and feature levels: it applies EA for data-level alignment and iteratively implements MCFA for feature adaptation. To enhance the reliability of pseudo-labeled target samples during feature alignment across domains, SPLU is further proposed. The detailed flowchart is shown in Figure 1.

2.2.1. Data Alignment

Data alignment effectively minimizes the discrepancy at the data level across different subjects. EA is a widely used technique in cross-subject analysis, recognized for its efficiency, simplicity, and capacity to align data without the need for label information.

EA involves calculating a reference matrix for each subject. The reference matrix is calculated as follows:

R = \frac{1}{N} \sum_{i = 1}^{N} x_{i} x_{i}^{T}

(1)

where $x_{i}$ represents the i-th EEG trial of a subject and N is the number of trials. Data alignment is then carried out as follows:

{\tilde{x}}_{i} = R^{- 1 / 2} x_{i}

(2)

Then, the Euclidean mean of the covariance matrices of the aligned trials for this subject is calculated as

\begin{aligned} \bar{Σ} &= \frac{1}{N} \sum_{i = 1}^{N} {\tilde{x}}_{i} {\tilde{x}}_{i}^{T} \\ &= \frac{1}{N} \sum_{i = 1}^{N} R^{- 1 / 2} x_{i} x_{i}^{T} R^{- 1 / 2} \\ &= R^{- 1 / 2} \left( \frac{1}{N} \sum_{i = 1}^{N} x_{i} x_{i}^{T} \right) R^{- 1 / 2} \\ &= R^{- 1 / 2} R R^{- 1 / 2} = I \end{aligned}

(3)

EA aligns the EEG data by transforming the mean covariance matrix of each subject to the identity matrix, as shown in Equation (3), thereby standardizing second-order statistics across subjects at the data level. The data from different subjects thus become comparable, which helps reduce the cross-subject discrepancy in covariance structure and, in turn, improve classification accuracy.
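Equations (1)-(3) can be checked numerically with a minimal NumPy sketch. The array layout (trials × channels × samples), the function name `euclidean_alignment`, and the synthetic data are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def euclidean_alignment(trials):
    """Align one subject's trials so that their mean covariance is the identity.

    trials: array of shape (n_trials, n_channels, n_samples) (assumed layout).
    """
    # Reference matrix R: arithmetic mean of x_i x_i^T over trials (Eq. (1)).
    R = np.mean(trials @ trials.transpose(0, 2, 1), axis=0)
    # R^{-1/2} via the eigendecomposition of the symmetric positive definite R.
    eigvals, eigvecs = np.linalg.eigh(R)
    R_inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    # Apply the same transform to every trial (Eq. (2)).
    return R_inv_sqrt @ trials

# Synthetic stand-in for one subject: 40 trials, 8 channels, 200 samples.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(8, 8))
raw_trials = mixing @ rng.normal(size=(40, 8, 200))

aligned_trials = euclidean_alignment(raw_trials)
# Mean covariance of the aligned trials; Eq. (3) says this is the identity.
mean_cov = np.mean(aligned_trials @ aligned_trials.transpose(0, 2, 1), axis=0)
```

Because the transform uses only the subject's own trials and no labels, the same function could be applied separately to each source subject and to the target subject.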

2.2.2. Feature Extraction

CSP learns spatial filters for EEG signals that maximize the variance of the filtered signals for one class while minimizing it for the other class.

For each class, CSP computes the normalized covariance matrix $Σ^{(c)}$ from the aligned trials of that class:

Σ^{(c)} = \frac{{\tilde{X}}^{(c)} {({\tilde{X}}^{(c)})}^{T}}{\mathrm{tr} [{\tilde{X}}^{(c)} {({\tilde{X}}^{(c)})}^{T}]}

(4)

where ${\tilde{X}}^{(c)} = [{\tilde{x}}_{1}^{(c)}, \dots, {\tilde{x}}_{N_{c}}^{(c)}]$ contains the aligned EEG signals from class $c$ ($c = 1, 2$). The following optimization problem is then formed:

max J (w) = \frac{w^{T} Σ^{(1)} w}{w^{T} (Σ^{(1)} + Σ^{(2)}) w}

(5)

The spatial filter w is derived from the generalized eigenvalue problem as follows:

Σ^{(1)} w = λ (Σ^{(1)} + Σ^{(2)}) w

(6)

where λ denotes the generalized eigenvalue and w represents the corresponding eigenvector. The feature f can be obtained by projecting the EEG signal with the CSP filter w as follows:

f = w^{T} \tilde{x}

(7)

The feature matrix F for all trials is then constructed by concatenating the source and target domain features, $F = [f_{s, 1}, \dots, f_{s, N_{s}}, f_{t, 1}, \dots, f_{t, N_{t}}]$, where $N_{s}$ and $N_{t}$ denote the number of trials in the source and target domains, respectively.
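The CSP steps in Equations (4)-(7) can be sketched with NumPy alone. The generalized eigenvalue problem of Equation (6) is solved here by whitening the composite covariance, which is one standard route and not necessarily the one the authors used. The choice of one filter pair, the log-variance feature (a common CSP convention; Equation (7) itself only states the projection), and the synthetic data are all assumptions.

```python
import numpy as np

def class_covariance(trials):
    """Normalized covariance of the concatenated trials of one class (Eq. (4))."""
    x = np.concatenate(list(trials), axis=1)      # channels x (trials * samples)
    c = x @ x.T
    return c / np.trace(c)

def csp_filters(sigma_1, sigma_2, n_pairs=1):
    """Solve sigma_1 w = lam (sigma_1 + sigma_2) w (Eq. (6)) by whitening."""
    d, u = np.linalg.eigh(sigma_1 + sigma_2)
    whitener = u @ np.diag(d ** -0.5)             # whitener^T (S1 + S2) whitener = I
    lam, v = np.linalg.eigh(whitener.T @ sigma_1 @ whitener)
    w_all = whitener @ v                          # columns sorted by ascending lam
    picks = list(range(n_pairs)) + list(range(-n_pairs, 0))
    return w_all[:, picks], lam[picks]

# Synthetic two-class data: class 1 is strong on channel 0, class 2 on channel 1.
rng = np.random.default_rng(0)
scale_1 = np.array([3.0, 1.0, 1.0, 1.0])[None, :, None]
scale_2 = np.array([1.0, 3.0, 1.0, 1.0])[None, :, None]
trials_1 = rng.normal(size=(20, 4, 100)) * scale_1
trials_2 = rng.normal(size=(20, 4, 100)) * scale_2

sigma_1 = class_covariance(trials_1)
sigma_2 = class_covariance(trials_2)
W, lam = csp_filters(sigma_1, sigma_2, n_pairs=1)

# Projection as in Eq. (7), followed by log-variance (assumed feature convention).
features_1 = np.log(np.var(W.T @ trials_1, axis=2))   # shape: (n_trials, 2 * n_pairs)
```

Filters with eigenvalues near 0 or 1 are the most discriminative, which is why the sketch keeps columns from both ends of the spectrum.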

2.2.3. Multi-Constrained Feature Alignment (MCFA)

MCFA maps the extracted features into a kernel mapping space and minimizes the distribution discrepancies between source and target domains by projecting the features from this mapping space to a low-dimensional shared subspace of all source and target samples via a projection matrix. The aligned feature representation is formulated as follows:

Z = A^{T} ϕ (F)

(8)

where $ϕ (\cdot)$ is the feature mapping function that maps the features into a kernel mapping space and Z represents the projected features in the low-dimensional shared subspace.

Optimization Problem of MCFA

Compared to existing distribution alignment methods such as JDA, which aligns marginal and conditional distributions, MCFA further integrates a CSC (denoted by the intra-class compactness $R_{intra}$ and the inter-class separation $R_{inter}$) and a DDC (denoted by the domain dispersion $R_{disp}$) to enhance feature discriminability and domain alignment. Figure 2 illustrates the difference between MCFA and JDA, showing how the introduced constraints influence the distribution alignment process. To find the projection matrix A (in Equation (8)) that determines the shared low-dimensional subspace of a kernel mapping space, the following optimization problem, which simultaneously minimizes the discrepancies in both marginal and conditional distributions while incorporating a CSC and DDC, is described as

\begin{aligned} \min_{A} L (A) = {} & ∥ P (A^{⊤} ϕ (F_{s})) - P (A^{⊤} ϕ (F_{t})) ∥_{2}^{2} + ∥ Q (A^{⊤} ϕ (F_{s} | y_{s})) - Q (A^{⊤} ϕ (F_{t} | {\hat{y}}_{t})) ∥_{2}^{2} \\ & + ρ R_{intra} (A, F_{s}) - μ R_{inter} (A, F_{s}, F_{t}, {\hat{y}}_{t}) + ν R_{disp} (A, F_{s}, F_{t}) + λ ∥ A ∥_{F}^{2} \end{aligned}

(9)

where $F_{s}$ and $F_{t}$ denote the source and target domain features, P and Q represent the marginal distribution and the conditional distribution, respectively, and $∥ \cdot ∥_{2}$ represents the 2-norm. The discrepancies in the marginal and conditional distributions of the features in the source and target domains in Equation (9) are calculated as follows, respectively.

∥ P (A^{⊤} ϕ (F_{s})) - P (A^{⊤} ϕ (F_{t})) ∥_{2}^{2} = tr (A^{⊤} ϕ (F) M_{0} ϕ (F)^{⊤} A)

(10)

∥ Q (A^{⊤} ϕ (F_{s} | y_{s})) - Q (A^{⊤} ϕ (F_{t} | {\hat{y}}_{t})) ∥_{2}^{2} = \sum_{c = 1}^{C} tr (A^{⊤} ϕ (F) M^{(c)} ϕ (F)^{⊤} A)

(11)

where $M_{0}$ and $M^{(c)}$ are the marginal and conditional maximum mean discrepancy (MMD) matrices, defined as

\begin{aligned} {(M_{0})}_{i j} &= \begin{cases} \frac{1}{N_{s}^{2}}, & f_{i}, f_{j} \in D_{s}, \\ \frac{1}{N_{t}^{2}}, & f_{i}, f_{j} \in D_{t}, \\ - \frac{1}{N_{s} N_{t}}, & \text{otherwise} \end{cases} \\ {(M^{(c)})}_{i j} &= \begin{cases} \frac{1}{{(N_{s}^{(c)})}^{2}}, & f_{i}, f_{j} \in D_{s}^{(c)}, \\ \frac{1}{{(N_{t}^{(c)})}^{2}}, & f_{i}, f_{j} \in D_{t}^{(c)}, \\ - \frac{1}{N_{s}^{(c)} N_{t}^{(c)}}, & f_{i} \in D_{s}^{(c)}, f_{j} \in D_{t}^{(c)} \text{ or } f_{i} \in D_{t}^{(c)}, f_{j} \in D_{s}^{(c)}, \\ 0, & \text{otherwise} \end{cases} \end{aligned}

(12)
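The MMD matrices of Equation (12) are rank-one outer products, which the following sketch makes explicit. It assumes that source samples occupy the first $N_{s}$ columns of F, that labels are integer-coded, and that a class with no source or no pseudo-labeled target samples contributes a zero matrix; the last point is a common convention and an assumption here, since the text does not specify it.

```python
import numpy as np

def marginal_mmd_matrix(n_s, n_t):
    """M_0 of Eq. (12): outer product of the domain-mean indicator vector."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

def conditional_mmd_matrix(y_s, y_t_pseudo, c):
    """M^(c) of Eq. (12) for class c, using target pseudo-labels."""
    n_s, n_t = len(y_s), len(y_t_pseudo)
    src = np.flatnonzero(y_s == c)
    tgt = n_s + np.flatnonzero(y_t_pseudo == c)
    e = np.zeros(n_s + n_t)
    if len(src) == 0 or len(tgt) == 0:
        return np.outer(e, e)                  # assumed: skip empty classes
    e[src] = 1.0 / len(src)
    e[tgt] = -1.0 / len(tgt)
    return np.outer(e, e)

# Toy example: 6 source and 4 target samples with 3-dimensional features.
rng = np.random.default_rng(0)
n_s, n_t = 6, 4
F = rng.normal(size=(3, n_s + n_t))            # columns are samples
y_s = np.array([0, 0, 0, 1, 1, 1])
y_t_pseudo = np.array([0, 1, 1, 0])

M0 = marginal_mmd_matrix(n_s, n_t)
M_cond = sum(conditional_mmd_matrix(y_s, y_t_pseudo, c) for c in (0, 1))
# tr(F M0 F^T) equals the squared distance between the two domain means.
marginal_gap = np.trace(F @ M0 @ F.T)
```

With A set to the identity and no kernel mapping, `marginal_gap` is exactly the left-hand side of Equation (10), which gives a quick sanity check of the matrix construction.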

The CSC in Equation (9) includes both intra-class compactness and inter-class separation, aiming to preserve class structures within the source domain and align the distributions across domains.

The intra-class compactness can be formulated as

R_{intra} (A, F_{s}) = tr (A^{⊤} F L F^{⊤} A)

(13)

where L is the intra-class distance matrix, determined as follows:

\begin{aligned} L &= \sum_{c = 1}^{C} α_{c} Π_{s}^{(c)} (N_{s}^{(c)} I - \mathbf{1} \mathbf{1}^{⊤}) Π_{s}^{(c) ⊤} \\ α_{c} &= \frac{2}{C} \cdot \frac{1}{N_{s}^{(c)} (N_{s}^{(c)} - 1)} \end{aligned}

(14)

where $Π_{s}^{(c)}$ is the embedding matrix for source class c.
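One way to read Equation (14) is sketched below: with A set to the identity, tr(F L F^⊤) equals the mean squared pairwise distance within each source class, averaged over classes. The sketch interprets $Π_{s}^{(c)}$ as a 0/1 selection matrix that places the class-c block into the full sample index space; this interpretation, the integer label coding, and the toy data are assumptions.

```python
import numpy as np

def intra_class_matrix(y_s, n_total, n_classes):
    """L of Eq. (14); source samples are assumed to come first in F."""
    L = np.zeros((n_total, n_total))
    for c in range(n_classes):
        idx = np.flatnonzero(y_s == c)
        n_c = len(idx)                          # requires n_c >= 2
        alpha = 2.0 / (n_classes * n_c * (n_c - 1))
        block = n_c * np.eye(n_c) - np.ones((n_c, n_c))
        # Embedding the block at the rows/columns of class c plays the role of Pi.
        L[np.ix_(idx, idx)] += alpha * block
    return L

rng = np.random.default_rng(0)
y_s = np.array([0, 0, 0, 1, 1])
n_t = 3
F = rng.normal(size=(3, len(y_s) + n_t))        # columns are samples
L = intra_class_matrix(y_s, F.shape[1], n_classes=2)
compactness = np.trace(F @ L @ F.T)

# Brute-force reference: mean squared distance over ordered pairs i != j,
# averaged over the two classes.
mean_pairwise = 0.0
for c in range(2):
    Fc = F[:, np.flatnonzero(y_s == c)]
    n_c = Fc.shape[1]
    diffs = Fc[:, :, None] - Fc[:, None, :]
    mean_pairwise += np.sum(diffs ** 2) / (n_c * (n_c - 1)) / 2
```

The agreement between `compactness` and `mean_pairwise` follows from the identity that the sum of squared pairwise distances in a class equals twice tr(F_c (N I - 11^⊤) F_c^⊤).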

The inter-class separation is formulated as

R_{inter} (A, F_{s}, F_{t}, {\hat{y}}_{t}) = tr (A^{⊤} F E F^{⊤} A)

(15)

where E is the inter-class separation matrix, constructed as follows:

E = \begin{bmatrix} e_{s}^{(c)} e_{s}^{(c) ⊤} & - e_{s}^{(c)} e_{t}^{(\neg c) ⊤} \\ - e_{t}^{(\neg c)} e_{s}^{(c) ⊤} & e_{t}^{(\neg c)} e_{t}^{(\neg c) ⊤} \end{bmatrix}

(16)

where the vectors $e_{s}^{(c)}$ and $e_{t}^{(\neg c)}$ are defined as

e_{s}^{(c)} = \frac{y_{s} (:, c)}{N_{s}^{(c)}}, \quad e_{t}^{(\neg c)} = \frac{\sum_{k \neq c} {\hat{y}}_{t} (:, k)}{N_{t} - N_{t}^{(c)}}

The DDC in Equation (9) is formulated as follows:

R_{disp} (A, F_{s}, F_{t}) = tr (A^{⊤} F B F^{⊤} A)

(17)

where B is the dispersion alignment matrix that is constructed by

B = \begin{bmatrix} H_{s} & 0 \\ 0 & - H_{t} \end{bmatrix}

(18)

where $H_{s} = I_{N_{s}} - \frac{1}{N_{s}} \mathbf{1} \mathbf{1}^{T}$ and $H_{t} = I_{N_{t}} - \frac{1}{N_{t}} \mathbf{1} \mathbf{1}^{T}$ are the centering matrices for the source and target domains, respectively.
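To make the role of B in Equations (17) and (18) concrete, the sketch below builds it from the two centering matrices and checks that, with A set to the identity, tr(F B F^⊤) is the source scatter minus the target scatter. The function names and the toy data are illustrative assumptions.

```python
import numpy as np

def centering_matrix(n):
    """H = I - (1/n) 1 1^T for n samples."""
    return np.eye(n) - np.ones((n, n)) / n

def dispersion_matrix(n_s, n_t):
    """Block-diagonal B of Eq. (18); source samples come first."""
    B = np.zeros((n_s + n_t, n_s + n_t))
    B[:n_s, :n_s] = centering_matrix(n_s)
    B[n_s:, n_s:] = -centering_matrix(n_t)
    return B

def scatter(X):
    """Sum of squared deviations of the columns of X from their mean."""
    return np.sum((X - X.mean(axis=1, keepdims=True)) ** 2)

rng = np.random.default_rng(0)
n_s, n_t = 7, 5
F = rng.normal(size=(3, n_s + n_t))             # columns are samples
B = dispersion_matrix(n_s, n_t)

dispersion_gap = np.trace(F @ B @ F.T)
reference_gap = scatter(F[:, :n_s]) - scatter(F[:, n_s:])
```

Seen this way, the DDC term compares the within-domain spread of the projected source and target samples.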

Substituting the marginal and conditional distribution discrepancies (Equations (10) and (11)) along with a CSC (Equations (13) and (15)) and DDC (Equation (17)) into Equation (9) and applying kernel techniques, the optimization problem is reformulated as follows:

\begin{aligned} \min_{A} \quad & tr (A^{⊤} F (M + ρ L - μ E + ν B) F^{⊤} A) + λ ∥ A ∥_{F}^{2} \\ s . t . \quad & A^{⊤} F H F^{⊤} A = I \end{aligned}

(19)

where $M = M_{0} + \sum_{c = 1}^{C} M^{(c)}$, $H = I - \frac{1}{N_{s} + N_{t}} \mathbf{1} \mathbf{1}^{T}$ is the centering matrix, $\mathbf{1}$ is a column vector of all ones with dimension $(N_{s} + N_{t})$, and $λ ∥ A ∥_{F}^{2}$ is a regularization term with hyperparameter λ controlling the complexity of A. $A^{⊤} F H F^{⊤} A$ represents the variance of the projected trials, and the constraint helps reduce the distribution discrepancy while preserving the domain structure. L, E, and B regularize intra-class compactness, inter-class separation, and cross-domain dispersion, respectively.

Solving the optimization problem with the Lagrangian method yields the projection matrix A. The new feature representation Z minimizes the discrepancies between different domains under the constraints of class structure and domain dispersion. The main procedure of MCFA is summarized in Figure 3.
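Setting the derivative of the Lagrangian of Equation (19) to zero gives a generalized eigenvalue problem of the form (F Ω F^⊤ + λ I) A = F H F^⊤ A Φ, with Ω = M + ρ L - μ E + ν B, and the columns of A are the eigenvectors with the smallest eigenvalues. The NumPy sketch below solves such a problem under a loudly stated assumption: the left-hand matrix is symmetric positive definite, which holds when Ω is positive semidefinite and λ > 0 but is not guaranteed once the -μ E and ν B terms are included. The demo therefore uses only the marginal MMD matrix as Ω; the function name and parameter values are illustrative.

```python
import numpy as np

def solve_projection(F, omega, lam, k):
    """Return A (d x k) minimizing tr(A^T F omega F^T A) + lam ||A||_F^2
    subject to A^T F H F^T A = I, assuming the regularized matrix is SPD."""
    d, n = F.shape
    H = np.eye(n) - np.ones((n, n)) / n
    lhs = F @ omega @ F.T + lam * np.eye(d)
    rhs = F @ H @ F.T
    chol = np.linalg.cholesky(lhs)              # lhs = chol @ chol.T (needs SPD)
    inv_chol = np.linalg.inv(chol)
    sym = inv_chol @ rhs @ inv_chol.T
    mu, U = np.linalg.eigh((sym + sym.T) / 2)
    # Smallest phi in lhs a = phi rhs a corresponds to largest mu = 1 / phi.
    top = np.argsort(mu)[::-1][:k]
    A = inv_chol.T @ U[:, top]
    return A / np.sqrt(mu[top])                 # rescale so that A^T rhs A = I

# Toy problem: 5-dimensional features, 20 source and 10 target samples.
rng = np.random.default_rng(0)
n_s, n_t = 20, 10
F = rng.normal(size=(5, n_s + n_t))
e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
omega = np.outer(e, e)                          # marginal MMD matrix only

A = solve_projection(F, omega, lam=0.1, k=2)
H = np.eye(n_s + n_t) - np.ones((n_s + n_t, n_s + n_t)) / (n_s + n_t)
constraint = A.T @ F @ H @ F.T @ A              # should be the 2 x 2 identity
Z = A.T @ F                                     # projected features, as in Eq. (8)
```

In a kernelized setting, F in this sketch would be replaced by the kernel matrix, so that A has one row per sample rather than one row per feature.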

Iterative Adaptation Process of MCFA

MCFA progressively refines feature alignment through an iterative process. Since the alignment of conditional distributions relies on pseudo-labeled target samples, initial pseudo-labels must be assigned to the target samples by a classifier to bootstrap the iterative alignment. The linear discriminant analysis (LDA) classifier $g_{1}$ is initialized with the source domain samples and then updated with the source and target domain samples in the iterative process. In each iteration, we first align the feature distributions of the two domains based on their corresponding labels. Then, SPLU is applied to refine the pseudo-labels. The LDA classifier $g_{1}$ is first updated using the aligned source domain features. The updated $g_{1}$ assigns labels to the target samples, along with a corresponding confidence score $conf_{1}$. Then, a new LDA classifier $g_{2}$ is trained with the source domain samples and the selected target domain samples whose $conf_{1}$ exceeds the predefined threshold $τ_{1}$. The trained classifier $g_{2}$ re-predicts pseudo-labels, along with their confidence scores $conf_{2}$, for the target samples. Similarly, a confidence threshold $τ_{2}$ is set for $conf_{2}$. The pseudo-label of a target sample is then updated only when both classifiers ($g_{1}$ and $g_{2}$) assign the same pseudo-label to it and both confidence scores exceed their respective thresholds, i.e., $conf_{1} > τ_{1}$ and $conf_{2} > τ_{2}$. Otherwise, the previous pseudo-label of the target sample is retained. Once the pseudo-label update is finalized, the next iteration commences.

The SPLU method ensures that only the high-confidence target domain samples are selected for pseudo-label updating, assigning more reliable pseudo-labels to the target domain samples and effectively reducing the error propagation caused by unreliable pseudo-labels in the iterative process.
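The update rule at the core of SPLU can be written in a few lines. The sketch below covers only the final decision step and takes the predictions and confidence scores of $g_{1}$ and $g_{2}$ as given; training the two LDA classifiers is omitted. The function name, the toy arrays, and the threshold value 0.7 are hypothetical, since the text does not report the thresholds used.

```python
import numpy as np

def selective_pseudo_label_update(prev_labels, pred_1, conf_1, pred_2, conf_2,
                                  tau_1, tau_2):
    """Update a pseudo-label only if g1 and g2 agree and both are confident."""
    agree = pred_1 == pred_2
    confident = (conf_1 > tau_1) & (conf_2 > tau_2)
    update = agree & confident
    # Keep the previous pseudo-label wherever the update condition fails.
    return np.where(update, pred_2, prev_labels), update

# Toy example with five target samples.
prev_labels = np.array([0, 0, 1, 1, 0])
pred_1 = np.array([1, 1, 1, 0, 1])
conf_1 = np.array([0.90, 0.90, 0.60, 0.95, 0.80])
pred_2 = np.array([1, 0, 1, 0, 1])
conf_2 = np.array([0.85, 0.90, 0.90, 0.90, 0.65])

new_labels, updated = selective_pseudo_label_update(
    prev_labels, pred_1, conf_1, pred_2, conf_2, tau_1=0.7, tau_2=0.7
)
# new_labels -> [1, 0, 1, 0, 0]; only samples 0 and 3 are updated.
```

Samples 1, 2, and 4 keep their previous labels because the classifiers disagree or one confidence score falls below its threshold.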

The detailed steps of MCTLP are presented in Algorithm 1.

Algorithm 1: MCTLP

2.3. Datasets

To assess the performance of our method, we test it on two publicly available datasets, which are described below.

2.3.1. Dataset 2a

The first MI dataset (Dataset 2a https://www.bbci.de/competition/iv/desc_2a.pdf (accessed on 6 April 2026)) consists of EEG data from 9 subjects, where each subject performed four different MI tasks during the experiment: left hand (Class 1), right hand (Class 2), both feet (Class 3), and tongue (Class 4) MI. Each subject performed two sessions on different dates. Each session was split into 6 runs, with 48 trials in each run. The data were recorded from 22 EEG and 3 EOG channels at a sampling rate of 250 Hz. We selected only the left and right hand MI tasks from session 1 as our experimental dataset.

2.3.2. Dataset 1

The second MI dataset (Dataset 1 https://www.bbci.de/competition/iv/desc_1.html (accessed on 6 April 2026)) was collected from seven subjects. The signals were recorded from 59 channels at a sampling rate of 1000 Hz and then down-sampled to 100 Hz. For each subject, two classes of MI were selected from the three classes (i.e., left hand, right hand, and foot MI). The dataset includes three phases. During the calibration phase, each participant performed 100 trials of MI tasks covering both left hand and right hand MI. The EEG data recorded in the calibration phase were used as our experimental data.

3. Results and Discussion

The EEG signals were preprocessed as follows. For both datasets, a finite impulse response band-pass filter with a frequency range of 8–30 Hz was applied to the EEG data, which covers the μ and β frequency bands that are most relevant to motor imagery tasks. The experimental data from both datasets were extracted using a fixed time window of 0.5–3.5 s following the cue, which marks the onset of the MI task in the experimental paradigm. In both datasets, left hand and right hand MI data were used for the evaluation of the proposed methods.
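The preprocessing described above can be sketched with NumPy only. The text specifies an FIR band-pass of 8–30 Hz and a 0.5–3.5 s window after the cue; the windowed-sinc design, the Hamming window, the 101 taps, the cue position, and the random stand-in signal are all assumptions for illustration.

```python
import numpy as np

def fir_bandpass(low_hz, high_hz, fs, n_taps=101):
    """Windowed-sinc FIR band-pass: difference of two ideal low-pass filters."""
    n = np.arange(n_taps) - (n_taps - 1) / 2

    def lowpass(fc):
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    return (lowpass(high_hz) - lowpass(low_hz)) * np.hamming(n_taps)

def bandpass_filter(signal, taps):
    """Filter each channel; mode='same' keeps the length and centers the filter."""
    return np.stack([np.convolve(ch, taps, mode="same") for ch in signal])

def extract_epoch(signal, cue_sample, fs, t_start=0.5, t_end=3.5):
    """Cut the window [t_start, t_end) seconds after the cue."""
    start = cue_sample + int(round(t_start * fs))
    stop = cue_sample + int(round(t_end * fs))
    return signal[:, start:stop]

def gain(taps, freq_hz, fs):
    """Magnitude response of the filter at a single frequency."""
    n = np.arange(len(taps))
    return np.abs(np.sum(taps * np.exp(-2j * np.pi * freq_hz * n / fs)))

fs = 250                                        # sampling rate of Dataset 2a
rng = np.random.default_rng(0)
eeg = rng.normal(size=(22, 2000))               # stand-in for 22 EEG channels
taps = fir_bandpass(8, 30, fs)
filtered = bandpass_filter(eeg, taps)
epoch = extract_epoch(filtered, cue_sample=500, fs=fs)   # shape (22, 750)
```

At 250 Hz the 3 s window contains 750 samples per channel; for Dataset 1, which is down-sampled to 100 Hz, the same call with `fs=100` would yield 300 samples.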

For each subject, we performed leave-one-subject-out cross-validation for cross-subject experiments within the same dataset, where the data from one subject were used as the target domain and the data from the other subjects were used as the source domain. This procedure was repeated until the data from each subject had served once as the target domain, and the final performance was obtained by averaging the results over all target subjects.
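The leave-one-subject-out protocol can be expressed as a small generator. The subject identifiers below (1 to 9, matching the nine subjects of Dataset 2a) and the function name are illustrative; the evaluation of each fold is left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (source_subjects, target_subject) pairs, one per subject."""
    for target in subject_ids:
        sources = [s for s in subject_ids if s != target]
        yield sources, target

# Nine subjects, as in Dataset 2a: each serves once as the target domain.
folds = list(leave_one_subject_out(list(range(1, 10))))
```

The final accuracy would then be the mean of the per-fold target accuracies.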

We adopted the radial basis function (RBF) kernel to construct a kernel mapping space. The dimensionality of the shared low-dimensional subspace was selected through cross-validation, with the optimal subspace dimensionality being 20 for Dataset 1 and 10 for Dataset 2a. The regularization parameters (ρ, μ, ν) were set to (0.01, 0.1, 0.1) for Dataset 1 and (0.05, 0.1, 0.1) for Dataset 2a.
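A minimal sketch of an RBF kernel matrix over the concatenated source and target features follows. The kernel width is not reported in the text, so the default `gamma = 1 / d` used here is purely an assumption, as are the function name and the toy feature matrix.

```python
import numpy as np

def rbf_kernel(F, gamma=None):
    """K[i, j] = exp(-gamma * ||f_i - f_j||^2) for the columns f_i of F."""
    sq_norms = np.sum(F ** 2, axis=0)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * F.T @ F
    sq_dists = np.maximum(sq_dists, 0.0)        # guard against round-off
    if gamma is None:
        gamma = 1.0 / F.shape[0]                # assumed default, not from the text
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(0)
F = rng.normal(size=(6, 30))                    # 6 features, 30 trials
K = rbf_kernel(F)                               # 30 x 30 kernel matrix
```

The kernel matrix has one row and column per trial, so a 10- or 20-dimensional subspace corresponds to a projection matrix with 10 or 20 columns acting on K.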

3.1. Results

The proposed MCTLP was tested on Dataset 2a and Dataset 1, alongside five existing methods for comparison:

NOTL: It uses CSP to derive spatial representations from EEG data and LDA for classification, without performing data or feature alignment.
RA: It uses RA for data-level alignment and the minimum Riemannian distance to the class mean for classification, without performing feature alignment.
EA: It employs EA for data-level alignment, CSP for EEG spatial representations and then LDA for classification, without feature-level alignment.
EA + TCA: It first employs EA for data-level alignment. After extracting EEG features with CSP, it uses TCA for feature-level alignment and then uses LDA for classification.
EA + JDA: It employs EA for data-level alignment and then CSP to extract spatial features. Subsequently, it utilizes JDA for feature-level alignment before final classification using LDA.

The classification accuracies of MCTLP and the five methods are shown in Figure 4a and Figure 4b on Dataset 1 and Dataset 2a, respectively. The proposed MCTLP outperformed the other five methods. MCTLP outperformed NOTL due to reduced inter-subject variability, with accuracy improvements of 30.62% on Dataset 1 and 21.38% on Dataset 2a. When compared with the methods where only data alignment was performed, such as RA and EA, MCTLP achieved superior performance, outperforming RA by 19.4% on Dataset 1 and 13.74% on Dataset 2a and EA by 9.19% on Dataset 1 and 11.96% on Dataset 2a, which highlights the critical role of feature alignment. Compared to EA + TCA and EA + JDA, both of which involved feature alignment, MCTLP achieved higher accuracy, with 88.69% on Dataset 1 and 85.35% on Dataset 2a, indicating that our feature distribution alignment approach is more effective.

The improved performance of MCTLP can be attributed to the combined effect of Euclidean space data alignment, feature-level alignment, and reliable pseudo-label refinement. EA reduces inter-subject variability at the data level. MCFA further aligns source and target features under a CSC and DDC, where the CSC helps preserve intra-class compactness and inter-class separability and the DDC helps regulate cross-domain dispersion. SPLU helps reduce error propagation during iterative adaptation in MCFA. Consequently, the proposed framework provides more effective cross-subject transfer than the four methods relying only on data alignment or conventional feature distribution alignment.

To evaluate whether the improvements made by the proposed method are statistically significant compared to the other methods, paired t-tests were performed between MCTLP and the other five methods, as shown in Figure 4. The statistical tests show that MCTLP outperformed the other five methods significantly (MCTLP vs. NOTL: p < 0.0001 on Dataset 1 and p < 0.01 on Dataset 2a; MCTLP vs. RA: p < 0.001 on Dataset 1 and p < 0.01 on Dataset 2a; MCTLP vs. EA: p < 0.01 on Dataset 1 and p < 0.05 on Dataset 2a; MCTLP vs. EA + TCA: p < 0.01 on Dataset 1 and p < 0.05 on Dataset 2a; MCTLP vs. EA + JDA: p < 0.01 on Dataset 1 and p < 0.05 on Dataset 2a), confirming its superior transferability for cross-subject MI tasks.

3.2. Impact of Kernel Function and Subspace Dimension

Figure 5 illustrates the impact of kernel function selection and subspace dimensionality on the classification performance for both datasets. Different kernel functions, including the primal kernel, linear kernel, RBF kernel, and SAM kernel, were tested with different subspace dimensions. The results demonstrate that the RBF kernel achieved the highest performance in most subspace dimensions. As the subspace dimensionality increased, the classification performance of all kernel functions initially increased and then declined, with the best performance generally achieved between 10 and 20 dimensions. The highest accuracies were 88.69% and 85.35% on Dataset 1 (RBF kernel, 20-dimensional subspace) and Dataset 2a (RBF kernel, 10-dimensional subspace), respectively.

3.3. Influence of Hyperparameters

To assess the influence of the hyperparameters ρ, μ, and ν on performance, we conducted a sensitivity analysis by testing all combinations over the set {0.01, 0.05, 0.1, 0.5}. As shown in Figure 6, the proposed method exhibited low sensitivity to variations in these hyperparameters, maintaining consistent performance across a wide range of values. The accuracy dropped when the parameters were set too small or too large. Among the three parameters, ρ had the greatest impact on accuracy. The best accuracy was achieved at ρ = 0.01 on Dataset 1 (88.69%) and at ρ = 0.05 on Dataset 2a (85.35%). The accuracy decreased as ρ increased beyond 0.05. When ρ increased from 0.1 to 0.5, the accuracy dropped from 86.59% to 83.27% on Dataset 1 and from 84.95% to 83.79% on Dataset 2a. Compared with ρ, less change was observed when μ and ν were changed. As μ and ν varied from 0.01 to 0.5, the accuracy changed by about 1.5%. The best accuracy was obtained when μ and ν were set around 0.1.
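The grid behind this sensitivity analysis contains 4 × 4 × 4 = 64 combinations of (ρ, μ, ν). The sketch below enumerates them; `best_setting` and the toy scoring function are hypothetical placeholders standing in for a full cross-subject evaluation, and they are not taken from the authors' code.

```python
from itertools import product

grid_values = (0.01, 0.05, 0.1, 0.5)
# All (rho, mu, nu) combinations over the tested set.
grid = list(product(grid_values, repeat=3))

def best_setting(score_fn):
    """Return the (rho, mu, nu) triple with the highest score."""
    return max(grid, key=lambda params: score_fn(*params))

# Placeholder score that peaks at (0.05, 0.1, 0.1); a real run would
# return the mean cross-subject accuracy for each setting instead.
def toy_score(rho, mu, nu):
    return -abs(rho - 0.05) - abs(mu - 0.1) - abs(nu - 0.1)

chosen = best_setting(toy_score)
```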

3.4. Effect of Two Constraints (CSC and DDC)

As shown in Figure 7, determining the projection matrix without a CSC (w/o CSC) or without a DDC (w/o DDC) led to a performance drop for most subjects. Notably, only a few subjects (Subject 4 in Dataset 1; Subjects 2 and 5 in Dataset 2a) remained stable when one constraint was removed. The results demonstrated that the CSC and DDC were both crucial for subspace optimization.

3.5. Effect of SPLU

To evaluate the specific contribution of SPLU to MCTLP, we removed SPLU from MCTLP and then compared it with the full model. In MCTLP without SPLU (w/o SPLU), all target samples are re-labeled in each iteration without confidence filtering. The results in Figure 7 show that removing SPLU led to a consistent performance decline for all subjects, which demonstrates that SPLU mitigated error propagation by filtering out low-confidence predictions, thus avoiding negative transfer caused by erroneous pseudo-labels and stabilizing the adaptation cycle.

3.6. Feature Visualization

To intuitively evaluate the domain discrepancy and the effectiveness of the proposed method in narrowing the distribution discrepancy between source and target domains, we utilized t-distributed stochastic neighbor embedding (t-SNE) to visualize feature distributions before and after domain adaptation. Specifically, the features before MCTLP are the CSP features from the source and target subjects, whereas the features after MCTLP are the new feature representations obtained with MCTLP.

All features were normalized and projected into a two-dimensional space using t-SNE. Figure 8a and Figure 8b show the t-SNE visualizations of feature distributions before and after adaptation for Subject 4 from Dataset 1 and Subject 3 from Dataset 2a, respectively. As shown in Figure 8, before adaptation, the source and target features form clearly separated clusters, indicating a substantial domain discrepancy. After adaptation, the distribution discrepancy between domains is significantly reduced, with features from different domains overlapping. These results demonstrate that our method effectively mitigates domain shift and facilitates cross-subject knowledge transfer.

4. Conclusions

MI data from different subjects exhibit inter-subject variability, which necessitates a longer calibration period for new subjects. Transfer learning can leverage existing source domain knowledge to assist classification tasks for a new subject. In this study, a comprehensive MI transfer framework integrating both data alignment and feature alignment was proposed. Compared to other transfer learning methods, our approach not only incorporates constraints on class structure and dispersion but also mitigates error propagation from pseudo-labels. Experimental results on two widely used public benchmark datasets support the effectiveness of the proposed framework in cross-subject MI classification under commonly used evaluation settings. The present validation is still limited to public offline datasets with a relatively small number of subjects. In future work, we will further evaluate the proposed framework on larger subject groups, multi-class MI decoding, and experimental conditions that are closer to practical BCI applications.

Author Contributions

Conceptualization, B.Y.; methodology, B.Y.; software, B.Y.; validation, B.Y.; formal analysis, B.Y.; investigation, B.Y.; writing—original draft preparation, B.Y.; writing—review and editing, L.Z.; supervision, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Project No. 51977020).

Data Availability Statement

The data presented in this study are derived from public domain resources. The datasets used in this study, including BCI Competition IV Dataset 2a and Dataset 1, are publicly available at the BCI Competition IV download page: https://www.bbci.de/competition/iv/ (accessed on 6 April 2026).

Conflicts of Interest

The authors declare no conflict of interest.

References

Wolpaw, J.R.; Birbaumer, N.; Heetderks, W.J.; McFarland, D.J.; Peckham, P.H.; Schalk, G.; Donchin, E.; Quatrano, L.A.; Robinson, C.J.; Vaughan, T.M.; et al. Brain-computer interface technology: A review of the first international meeting. IEEE Trans. Rehabil. Eng. 2000, 8, 164–173.
Chaudhary, U.; Birbaumer, N.; Ramos-Murguialday, A. Brain–computer interfaces for communication and rehabilitation. Nat. Rev. Neurol. 2016, 12, 513–525.
McFarland, D.J.; Wolpaw, J.R. EEG-based brain–computer interfaces. Curr. Opin. Biomed. Eng. 2017, 4, 194–200.
Khan, M.A.; Das, R.; Iversen, H.K.; Puthusserypady, S. Review on motor imagery based BCI systems for upper limb post-stroke neurorehabilitation: From designing to application. Comput. Biol. Med. 2020, 123, 103843.
Pfurtscheller, G.; Neuper, C.; Flotzinger, D.; Pregenzer, M. EEG-based discrimination between imagination of right and left hand movement. Electroencephalogr. Clin. Neurophysiol. 1997, 103, 642–651.
Kevric, J.; Subasi, A. Comparison of signal decomposition methods in classification of EEG signals for motor-imagery BCI system. Biomed. Signal Process. Control 2017, 31, 398–406.
Wu, X.; Ouyang, R.; Zhang, C. Learning-based EEG rhythm analysis for the enhancement of motor imagery-based brain-computer interface performance. Biomed. Signal Process. Control 2025, 102, 107345.
Pérez-Velasco, S.; Marcos-Martínez, D.; Santamaría-Vázquez, E.; Martínez-Cagigal, V.; Moreno-Calderón, S.; Hornero, R. Unraveling motor imagery brain patterns using explainable artificial intelligence based on Shapley values. Comput. Methods Programs Biomed. 2024, 246, 108048.
Maeder, C.L.; Sannelli, C.; Haufe, S.; Blankertz, B. Pre-stimulus sensorimotor rhythms influence brain–computer interface classification performance. IEEE Trans. Neural Syst. Rehabil. Eng. 2012, 20, 653–662.
Jayaram, V.; Alamgir, M.; Altun, Y.; Scholkopf, B.; Grosse-Wentrup, M. Transfer learning in brain-computer interfaces. IEEE Comput. Intell. Mag. 2016, 11, 20–31.
Wu, D.; Xu, Y.; Lu, B.L. Transfer learning for EEG-based brain–computer interfaces: A review of progress made since 2016. IEEE Trans. Cogn. Dev. Syst. 2020, 14, 4–19.
Li, M.; Xu, D. Transfer learning in motor imagery brain computer interface: A review. J. Shanghai Jiaotong Univ. (Sci.) 2024, 29, 37–59.
Dai, M.; Zheng, D.; Liu, S.; Zhang, P. Transfer Kernel Common Spatial Patterns for Motor Imagery Brain-Computer Interface Classification. Comput. Math. Methods Med. 2018, 2018, 9871603.
Albalawi, H.; Song, X. A study of kernel CSP-based motor imagery brain computer interface classification. In Proceedings of the 2012 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), New York, NY, USA, 7 December 2012; pp. 1–4.
Long, M.; Wang, J.; Sun, J.; Yu, P.S. Domain invariant transfer kernel learning. IEEE Trans. Knowl. Data Eng. 2014, 27, 1519–1532.
Zanini, P.; Congedo, M.; Jutten, C.; Said, S.; Berthoumieu, Y. Transfer learning: A Riemannian geometry framework with applications to brain–computer interfaces. IEEE Trans. Biomed. Eng. 2017, 65, 1107–1116.
Rodrigues, P.L.C.; Jutten, C.; Congedo, M. Riemannian procrustes analysis: Transfer learning for brain–computer interfaces. IEEE Trans. Biomed. Eng. 2018, 66, 2390–2401.
He, H.; Wu, D. Transfer learning for brain–computer interfaces: A Euclidean space data alignment approach. IEEE Trans. Biomed. Eng. 2019, 67, 399–410.
Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2200–2207.
Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 2010, 22, 199–210.
She, Q.; Chen, T.; Fang, F.; Zhang, J.; Gao, Y.; Zhang, Y. Improved domain adaptation network based on Wasserstein distance for motor imagery EEG classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1137–1148.
Zhi, H.; Yu, T.; Gu, Z.; Lin, Z.; Che, L.; Li, Y.; Yu, Z. Supervised contrastive learning-based domain generalization network for cross-subject motor decoding. IEEE Trans. Biomed. Eng. 2024, 72, 401–412.
Gong, Y.; Shi, K.; Niu, X.; Yang, L.; Yang, X.; Zheng, C. Multi-source Discriminant Dynamic Domain Adaptation for Cross-subject Motor Imagery EEG Recognition. IEEE J. Biomed. Health Inform. 2025, 30, 3556–3567.

Figure 1. Framework of the proposed MCTLP method.

Figure 2. Comparison between JDA and MCFA.

Figure 3. Framework of the MCFA algorithm.

Figure 4. Classification accuracy of six methods on two MI datasets: (a) Dataset 1 and (b) Dataset 2a (*: p < 0.05, **: p < 0.01, ***: p < 0.001, ****: p < 0.0001).

Figure 5. Classification accuracy under different kernel functions and dimensions on two MI datasets: (a) Dataset 1 and (b) Dataset 2a.

Figure 6. Classification accuracy under different parameters on two MI datasets: (a) Dataset 1 and (b) Dataset 2a.

Figure 7. Classification accuracy of ablation experiments on two MI datasets: (a) Dataset 1 and (b) Dataset 2a.

Figure 8. t-SNE visualization of features: (a) Dataset 1 and (b) Dataset 2a. Blue indicates the source domain; red indicates the target domain. Circles and triangles denote motor imagery of the left and right hand, respectively.

