2.1. Data Description
Dataset I: This dataset is from BCI Competition III Dataset IVa, containing EEG data recorded from five subjects (aa, al, av, aw, ay) without feedback. Visual cues indicated one of three MI tasks (left hand, right hand, right foot), but only the right-hand and right-foot trials are provided: each subject performed 140 trials per class, for a total of 280 trials. Each set of EEG signals was recorded using 118 channels at a sampling rate of 100 Hz. Each trial consists of a visual cue phase and an MI phase. During the experiment, subjects sat in a chair with their hands placed naturally on the armrests and performed a 3.5 s MI task as cued, followed by a 1.75–2.25 s rest period. The specific experimental timeline is shown in
Figure 1a.
Dataset II: This dataset is from BCI Competition IV Dataset 1, containing EEG data recorded from seven subjects without feedback. Since the EEG data for subjects c, d, and e are artificially synthesized, this paper only discusses the data for subjects a, b, f, and g. For each subject, two MI classes were selected from three candidate tasks (left hand, right hand, foot). Each set of EEG signals was recorded using 59 channels at a sampling rate of 100 Hz. Each subject performed 100 trials for each class, for a total of 200 trials. In each trial, a fixation cross is displayed on the computer screen for 2 s, after which the MI cue is superimposed on the cross, giving a total trial length of 6 s. The specific experimental timeline is shown in
Figure 1b.
Dataset III: This dataset is from BCI Competition IV Dataset 2a, containing EEG data recorded from nine subjects (A01, …, A09) without feedback. The dataset includes four types of MI signals (left hand, right hand, foot, and tongue). Each set of EEG signals was recorded using 22 channels arranged according to the international 10–20 system at a sampling rate of 250 Hz. Each subject completed six runs of MI recording, each run consisting of 12 trials for each of the four MI classes, yielding 288 MI trials per subject as the training dataset; an equal amount of testing data is also available. Each trial consists of a visual cue phase and an MI phase. During the experiment, subjects sat in a chair with their hands placed naturally on the armrests and performed a 4 s MI task as cued, followed by a 1.5 s rest period. The specific experimental timeline is shown in
Figure 1c.
  2.3. Multi-Domain Feature Extraction
Feature extraction aims to obtain, from the raw EEG signals, effective information that facilitates classification. We extract features from several domains of the EEG for the subsequent classification tasks.
  2.3.1. Time Domain Features
Time Domain (TD) features are statistical measures derived from EEG signals that reflect relevant information along the time dimension. These features effectively capture how EEG signals change over time, providing an intuitive picture of the temporal dynamics of brain activity. This paper calculates Higuchi's Fractal Dimension $F_{HFD}$, the mean $\mu$, the variance $\sigma^2$, and the root mean square $F_{RMS}$ of the EEG signal as TD features. Higuchi's Fractal Dimension [24] is a nonlinear metric that assesses the dynamics of time-domain waveforms. It quantitatively evaluates signal complexity under both normal and abnormal conditions, allowing concealed information in neurophysiological time series to be detected with high sensitivity. First, the EEG signal is represented as a sequence of discrete values $X = \{x(1), x(2), \ldots, x(N)\}$. Based on the EEG signal, a new set of self-similar time series $X_k^m$ is constructed, each series defined as:

$$X_k^m = \left\{ x(m),\ x(m+k),\ x(m+2k),\ \ldots,\ x\!\left(m + \left\lfloor \frac{N-m}{k} \right\rfloor k\right) \right\}, \quad m = 1, 2, \ldots, k,$$

where $m$ is the initial time, $k$ is the time interval, $N$ is the number of data points, and $k = 1, 2, \ldots, k_{\max}$, with $k_{\max}$ a free tuning parameter; $\lfloor \cdot \rfloor$ denotes the integer part of $(N-m)/k$. The length $L_m(k)$ of each time series $X_k^m$ is then:

$$L_m(k) = \frac{1}{k} \left[ \left( \sum_{i=1}^{\lfloor (N-m)/k \rfloor} \left| x(m+ik) - x\big(m+(i-1)k\big) \right| \right) \frac{N-1}{\left\lfloor \frac{N-m}{k} \right\rfloor k} \right].$$

Averaging $L_m(k)$ over $m$ gives the mean curve length $L(k)$, which scales as $L(k) \propto k^{-F_{HFD}}$, so the Higuchi's Fractal Dimension feature $F_{HFD}$ can be obtained as:

$$F_{HFD} = \frac{\ln L(k)}{\ln(1/k)}.$$

The final subset of TD features is $F_{TD} = \{F_{HFD}, \mu, \sigma^2, F_{RMS}\}$.
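For concreteness, the following is a minimal NumPy sketch of the TD feature computation for a single channel; the choice $k_{\max} = 10$ and the function names are illustrative assumptions, not values fixed in the text.

```python
import numpy as np

def higuchi_fd(x, k_max=10):
    """Higuchi's Fractal Dimension: slope of ln L(k) versus ln(1/k)."""
    N = len(x)
    log_inv_k, log_L = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # m = 1..k in the text; 0-based here
            n_max = (N - 1 - m) // k
            if n_max < 1:
                continue
            # sum of |x(m+ik) - x(m+(i-1)k)| for i = 1..n_max
            diffs = np.abs(x[m + k::k][:n_max] - x[m::k][:n_max])
            lengths.append(diffs.sum() * (N - 1) / (n_max * k) / k)
        log_inv_k.append(np.log(1.0 / k))
        log_L.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_inv_k, log_L, 1)  # least-squares slope
    return slope

def td_features(x):
    """TD feature vector F_TD = [F_HFD, mean, variance, RMS]."""
    return np.array([higuchi_fd(x),
                     np.mean(x),
                     np.var(x),
                     np.sqrt(np.mean(x ** 2))])
```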
  2.3.2. Frequency Domain Features
In Frequency Domain (FD) analysis, the distribution of the signal across frequency bands can be obtained from its spectrum, and changes in the signal can be inferred from changes in those bands. FD analysis of EEG signals typically requires applying a Discrete Fourier Transform (DFT) to the original time series [25]:

$$X(f) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi f n / N}, \quad f = 0, 1, \ldots, N-1,$$

where $X(f)$ represents the $f$-th frequency component of the EEG FD output, $x(n)$ represents the time-varying EEG signal, and $e^{-j 2\pi f n / N}$ is the negative exponential kernel. After performing the DFT on the EEG signal, this paper uses the FD representation $X(f)$ to calculate the PSD $P(f)$ and the energy feature $E$ of the $\alpha$ band and the $\beta$ band, respectively, as the FD features of the EEG:

$$P(f) = \frac{1}{N} \left| X(f) \right|^2, \qquad E = \sum_{f \in B} \left| X(f) \right|^2,$$

where $B$ denotes the set of frequency components within the corresponding band. The final subset of FD features is $F_{FD} = \{P_\alpha, E_\alpha, P_\beta, E_\beta\}$.
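As an illustration, a short NumPy sketch of these FD features is given below; the band edges (8–13 Hz for $\alpha$, 13–30 Hz for $\beta$) and the 100 Hz sampling rate are assumptions for this example rather than values specified above.

```python
import numpy as np

def fd_features(x, fs=100.0, bands=((8, 13), (13, 30))):
    """PSD and band-energy features [P_alpha, E_alpha, P_beta, E_beta]."""
    N = len(x)
    X = np.fft.rfft(x)                      # DFT of the real-valued signal
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)  # frequency of each DFT bin
    psd = np.abs(X) ** 2 / N                # periodogram-style PSD estimate
    feats = []
    for lo, hi in bands:
        idx = (freqs >= lo) & (freqs < hi)
        feats.append(psd[idx].mean())              # mean PSD within the band
        feats.append((np.abs(X[idx]) ** 2).sum())  # band energy
    return np.array(feats)
```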
  2.3.3. Time-Frequency Domain Features
Because EEG signals are susceptible to various interferences, and because the wavelet transform possesses strong noise resilience, it has become a highly effective method for extracting features from EEG signals [26]. This paper calculates the discrete wavelet energy as the Time-Frequency Domain (TFD) feature. The Discrete Wavelet Transform (DWT) is defined as:

$$W(j, k) = \sum_{n} x(n)\, \psi_{j,k}(n), \qquad \psi_{j,k}(n) = 2^{-j/2}\, \psi\!\left(2^{-j} n - k\right),$$

where $W(j,k)$ is the wavelet coefficient, $\psi(\cdot)$ is the wavelet basis function, and $j$ and $k$ represent the frequency resolution and the time translation, respectively. The signal $x(n)$ is decomposed into a finite number of layers using the Mallat algorithm, resulting in:

$$x(n) = A_L(n) + \sum_{j=1}^{L} D_j(n),$$

where $L$ is the number of decomposition layers, $A_L$ is the low-pass approximation component, and $D_j$ is the detail component at each scale. This paper then utilizes the db4 wavelet basis for a 4-layer wavelet decomposition, choosing $D_3$ to represent the $\beta$ wave and $D_4$ to represent the $\alpha$ wave. The wavelet energy of these two bands is then calculated as the TFD feature, with the wavelet energy feature $E_j$ computed as:

$$E_j = \sum_{k} \left| D_j(k) \right|^2,$$

where $D_j(k)$ represents the wavelet decomposition coefficients at the $j$-th layer. The final set of TFD features is the wavelet energy features $F_{TFD} = \{E_{D_3}, E_{D_4}\}$ of the $\beta$ and $\alpha$ bands.
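A minimal sketch using the PyWavelets package is shown below, assuming a single-channel input; only the D3 and D4 energies are returned, matching the feature set above.

```python
import numpy as np
import pywt

def tfd_features(x):
    """Wavelet energies of the D3 and D4 details (db4, 4-level Mallat)."""
    # wavedec returns [A4, D4, D3, D2, D1] for level=4
    _, d4, d3, _, _ = pywt.wavedec(x, wavelet="db4", level=4)
    return np.array([np.sum(d3 ** 2),   # E_D3 (beta-range details)
                     np.sum(d4 ** 2)])  # E_D4 (alpha-range details)
```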
  2.3.4. Spatial Domain Features
The CSP algorithm, as a spatial domain (SD) feature extraction method, has been widely applied in the processing of EEG signals for MI-based BCIs [27]. The core idea of the CSP algorithm is to use matrix diagonalization to determine an optimal set of spatial projection filters. This approach maximizes the variance difference between the two classes of signals, thereby generating feature vectors with enhanced discriminative capability. The single-trial EEG signal can be represented as a matrix $E$ of size $N \times T$, where $N$ denotes the number of channels and $T$ the number of samples. First, compute the normalized spatial covariance of each class and the mixed spatial covariance matrix:

$$C_i = \frac{E_i E_i^{\mathrm{T}}}{\operatorname{trace}\!\left(E_i E_i^{\mathrm{T}}\right)}, \qquad C_c = \bar{C}_1 + \bar{C}_2,$$

where $i \in \{1, 2\}$ represents the MI category, $\bar{C}_1$ and $\bar{C}_2$ are the mean covariance matrices for the first and second types of imagined movements, respectively, trace(·) denotes the trace of the matrix, and $(\cdot)^{\mathrm{T}}$ represents the matrix transpose. Then, eigen-decomposition is performed on the mixed spatial covariance matrix:

$$C_c = U_c \lambda_c U_c^{\mathrm{T}},$$

where $\lambda_c$ is the diagonal matrix composed of the eigenvalues of the mixed spatial covariance matrix $C_c$ arranged in descending order, and $U_c$ is the eigenvector matrix of $C_c$. The whitening matrix is calculated as follows:

$$P = \lambda_c^{-1/2}\, U_c^{\mathrm{T}}.$$

Applying the whitening matrix to $\bar{C}_1$ and $\bar{C}_2$, we obtain:

$$S_1 = P \bar{C}_1 P^{\mathrm{T}}, \qquad S_2 = P \bar{C}_2 P^{\mathrm{T}},$$

where $S_1$ and $S_2$ share a common eigenvector matrix $B$, which can be used to compute the projection matrix $W$ as $W = B^{\mathrm{T}} P$. The single-trial EEG signal can then be decomposed using the projection matrix into:

$$Z = W E.$$

Transforming the EEG signal according to the above equation, the CSP feature $f_p$ that is ultimately used for training the classifier can be calculated from $Z$ as the log-normalized variance of its projected rows:

$$f_p = \log\!\left( \frac{\operatorname{var}(Z_p)}{\sum_{i} \operatorname{var}(Z_i)} \right),$$

where $Z_p$ denotes the $p$-th row of $Z$.
CSP features are then extracted for each sub-band: the EEG data is segmented into six frequency bands spanning 8–32 Hz at 4 Hz intervals (8–12 Hz, 12–16 Hz, ..., 28–32 Hz), CSP features are extracted from each band, and the features from all sub-bands are concatenated to form the SD feature subset $F_{SD}$.
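The following is a compact NumPy sketch of the two-class CSP computation described above; the number of retained filter pairs (m = 3) is an assumed choice, and band-pass filtering each sub-band before calling these functions is left to the caller.

```python
import numpy as np

def csp_filters(trials_1, trials_2, m=3):
    """CSP projection matrix from two classes of (trials, channels, samples) arrays."""
    def mean_cov(trials):
        # normalized spatial covariance C = E E^T / trace(E E^T), averaged over trials
        return np.mean([E @ E.T / np.trace(E @ E.T) for E in trials], axis=0)

    C1, C2 = mean_cov(trials_1), mean_cov(trials_2)
    evals, U = np.linalg.eigh(C1 + C2)     # eigen-decomposition of C_c
    P = np.diag(evals ** -0.5) @ U.T       # whitening matrix
    _, B = np.linalg.eigh(P @ C1 @ P.T)    # common eigenvectors of S1 and S2
    W = B.T @ P                            # projection matrix W = B^T P
    return np.vstack([W[:m], W[-m:]])      # filters with the largest variance contrast

def csp_features(W, E):
    """Log-normalized variance features f_p of a single trial E (channels x samples)."""
    var = np.var(W @ E, axis=1)
    return np.log(var / var.sum())
```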
  2.4. Feature Selection and Feature Rotation Transformation
To obtain features that are more advantageous for classification, this study employs Recursive Feature Elimination (RFE) [28] to filter the feature set from each domain, thereby reducing feature dimensionality and enhancing the performance of the base classifiers. RFE is a backward search algorithm following the wrapper pattern, and its effectiveness depends on the classifier used during the iterative classification process. Considering the advantages of Random Forest (RF) [29], such as high accuracy and robustness, we choose RF as the iterative classifier for RFE. The primary steps of the resulting RFE-RF algorithm are as follows: first, the RF model is trained using the candidate feature set $F = \{f_1, f_2, \ldots, f_d\}$. The importance measure of each feature in the model is expressed as $v_i$, where $i = 1, 2, \ldots, d$, and the feature set $F$ is ranked by importance. Then, the feature $f_{\min}$ corresponding to the smallest importance $v_{\min}$ is removed, resulting in a new feature subset $F'$. The training, importance calculation, and elimination steps are repeated until the optimal feature set $F^*$ with $d^*$ features is obtained. After RFE-RF feature selection, the significant feature subsets obtained for the four domains are $\{F_{TD}^*, F_{FD}^*, F_{TFD}^*, F_{SD}^*\}$.
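A minimal scikit-learn sketch of this selection step is shown below; the forest size and the number of features to keep are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def select_significant_features(X, y, n_keep=20):
    """RFE-RF: drop the least important feature per iteration until n_keep remain."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    selector = RFE(estimator=rf, n_features_to_select=n_keep, step=1)
    selector.fit(X, y)              # RF feature importances drive the ranking
    return X[:, selector.support_], selector.support_
```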
Rotation Forest is a classification algorithm with a local rotation strategy [
30]. By preserving all principal component information, local rotation helps achieve higher performance [
31,
32]. This paper proposes a method for further optimizing multi-domain feature sets through rotation transformation. Specifically, we perform local rotation to reconstruct the feature space, which reduces the correlation between features while simultaneously enriching their representations, thereby enhancing their discriminative ability. The key steps are outlined below:
(1) Randomly divide the original feature space $F^*$ into $K$ non-overlapping local subspaces $F_k$, each containing $M$ features.
(2) Perform 75% sample resampling on each $F_k$ to construct a new subspace $F_k'$.
(3) Apply a PCA transformation to $F_k'$ to obtain the principal component coefficients $a_k$, where $a_k$ is an $M \times M$ matrix. The principal component coefficients obtained from the local subspaces are denoted as $a_1, a_2, \ldots, a_K$. Construct a coefficient rotation matrix $R$ from these principal component coefficients, arranged as follows:

$$R = \begin{bmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_K \end{bmatrix}$$

The rotation matrix $R$ is generated via PCA, which determines the optimal linear transformation for each feature subset. This maximizes the variance of the rotated features and increases feature diversity, thereby enhancing classifier performance. The columns of $R$ are then reordered to match the order of the original feature set, yielding the matrix $R^a$.
(4) By performing local rotation on the significant features, the rotated features $F_R$ are obtained as $F_R = F^* R^a$.
Based on the research on the Extended Space Forest [33], this study concatenates the significant features with the rotated features to obtain a composite feature subset $F^C = [F^*, F_R]$, where $[\cdot\,, \cdot]$ denotes the concatenation operation. The two feature groups reinforce each other, enhancing the discriminability and diversity of the features. Ultimately, four composite feature sets are obtained: $\{F_{TD}^C, F_{FD}^C, F_{TFD}^C, F_{SD}^C\}$. The detailed process of rotation fusion is shown in Figure 3.
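To make the rotation-fusion procedure concrete, the sketch below implements steps (1)–(4) and the concatenation with NumPy and scikit-learn; the subspace count K, the requirement that the feature dimension be divisible by K, and the handling of PCA mean-centering are simplifying assumptions of this example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def rotation_fusion(F, K=4):
    """Composite features F_C = [F*, F* R^a] via local PCA rotation."""
    n, d = F.shape
    M = d // K                                # features per local subspace
    perm = rng.permutation(d)                 # random non-overlapping split
    R_a = np.zeros((d, d))
    for k in range(K):
        cols = perm[k * M:(k + 1) * M]
        rows = rng.choice(n, size=int(0.75 * n), replace=False)  # 75% resampling
        a_k = PCA(n_components=M).fit(F[np.ix_(rows, cols)]).components_.T
        # placing each M x M block at its original column indices performs
        # the column reordering that turns R into R^a
        R_a[np.ix_(cols, cols)] = a_k
    F_rot = F @ R_a                           # rotated features F_R
    return np.hstack([F, F_rot])              # composite subset F_C = [F*, F_R]
```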
  2.5. Stacking Ensemble Model
In previous research on MI tasks, researchers often relied on single-classifier models. Additionally, the application of ensemble learning was typically limited to homogeneous ensemble methods or simple voting strategies. To address these issues and fully leverage multi-domain features, this paper proposes a stacking model framework that combines the predictions of classifiers trained in each domain, thereby enhancing the accuracy and robustness of recognition tasks related to MI. Stacking ensemble is a method of ensemble learning that merges the predictions of several base learners and incorporates them with a meta-learner to enhance the model’s predictive performance [
34]. This paper chooses Random Forest (RF) as the base classifier in the first layer. The composite feature sets of the various domains are used to train the base RF classifiers, yielding the base models $\{M_{TD}, M_{FD}, M_{TFD}, M_{SD}\}$. Using the time-domain features as an example, the output of the trained base classifier can be represented as a probability estimation vector $P_{TD} = [p_1, p_2, \ldots, p_C]$, where each element $p_c$ corresponds to the probability estimate for class $c$. Here, $C$ denotes the number of target classes, and $P_{TD}$ is a matrix of dimension $n \times C$, where $n$ is the number of training samples. In this way, four probability estimation vectors $\{P_{TD}, P_{FD}, P_{TFD}, P_{SD}\}$ are obtained. In traditional stacking models used for MI tasks, the meta-classifier does not directly handle the original features, which may cause it to overlook information contained in them and thus limit its final performance. Therefore, this paper integrates the original significant features from the various domains and applies LDA for dimensionality reduction; the resulting linear discriminant features $F_{LDA}$ are used as supplementary inputs to the meta-classifier. By maximizing the ratio of inter-class variance to intra-class variance, LDA extracts the most discriminative linear combinations for classification from the high-dimensional original features. This avoids the overfitting risk posed by high-dimensional features and reduces model complexity while retaining the key information, thereby improving the generalization ability of the model. Because the meta-classifier must perform composite optimization during classification, this paper selects the Multi-Layer Perceptron (MLP) [35] as the meta-classifier model. The predicted probability estimates from the base classifiers and the linear discriminant features are used as inputs to the meta-classifier MLP, which is trained to obtain the final classification model $M_{meta}$. The probability estimation vector produced by the stacking model is denoted $P_{final}$, and the final prediction is $\hat{y} = \arg\max_{c} P_{final}(c)$; that is, the category with the highest probability is chosen as the final prediction outcome.
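For illustration, a condensed scikit-learn sketch of this stacking scheme follows; the 5-fold out-of-fold probability generation, the MLP hidden-layer size, and the forest size are assumptions, since the text does not fix these details.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_predict

def fit_stacking(domain_feats, sig_feats, y):
    """domain_feats: dict of composite feature matrices, e.g. {"TD": X_td, ...};
    sig_feats: concatenated significant features feeding the LDA branch."""
    bases, meta_inputs = {}, []
    for name, X in domain_feats.items():
        rf = RandomForestClassifier(n_estimators=100, random_state=0)
        # out-of-fold probabilities keep the meta-features unbiased
        meta_inputs.append(cross_val_predict(rf, X, y, cv=5, method="predict_proba"))
        bases[name] = rf.fit(X, y)
    lda = LinearDiscriminantAnalysis().fit(sig_feats, y)
    meta_inputs.append(lda.transform(sig_feats))     # F_LDA supplementary features
    meta = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0).fit(np.hstack(meta_inputs), y)
    return bases, lda, meta

def predict_stacking(bases, lda, meta, domain_feats, sig_feats):
    Z = np.hstack([bases[n].predict_proba(domain_feats[n]) for n in bases]
                  + [lda.transform(sig_feats)])
    return meta.predict(Z)   # argmax over P_final
```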