2.2.1. Datasets and Preprocessing
The BCI Competition IV Dataset 2a [27] contains EEG data from 9 subjects performing four distinct motor imagery tasks. Data were acquired using 22 electrodes at a 250 Hz sampling rate. Each subject underwent multiple runs, completing a total of approximately 576 trials, with each trial lasting about 4 s. The BCI Competition III Dataset 3a [28] is also used for motor imagery tasks. This dataset contains EEG data from three subjects covering four categories of motor imagery (left hand, right hand, foot, and tongue). It was recorded using 60 channels, with a sampling rate of 250 Hz, a 1–50 Hz bandpass filter, and a power-line notch filter. Each subject underwent at least six runs, each consisting of 40 trials, for a total of approximately 60 trials per category. Each trial lasted approximately 7 s: 0–2 s of rest; at t = 3 s, the arrow cue was given and imagery began, continuing until t = 7 s.
During EEG preprocessing, a fourth-order Butterworth bandpass filter (8–32 Hz) was first applied to remove low-frequency drift and high-frequency noise while retaining the frequency bands associated with motor imagery. This band was chosen on the basis of the characteristics of movement-related electrical activity. Independent component analysis (ICA) was then used to decompose the signal, and the independent components related to eye movement and electromyographic activity were removed manually, which further improved the purity of the EEG signal. Baseline correction was performed by subtracting the mean of a −200 ms to 0 ms pre-stimulus window, which removes baseline drift and keeps the signal stable. For analysis, the preprocessed EEG signal was segmented into fixed-length 2 s time windows with a 0.04 s step between successive windows, giving 26 distinct windows for the EEG data between 1 and 4 s.
To improve classification accuracy, a multimodal feature extraction method is employed that combines spatial discrimination, time–frequency energy structure, and nonlinear dynamic complexity for motor imagery EEG. By integrating filter bank common spatial patterns (FBCSP), wavelet packet decomposition, and nonlinear features, the method captures differences in neural activity across brain regions, frequency bands, and structural levels.
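The filtering, baseline correction, and windowing steps can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names, the 22-channel random epoch, and the `cue_index` convention are assumptions, and the manual ICA step is omitted. Note also that SciPy's `butter` with `N=4` and a bandpass design yields an order-8 filter; the text does not say which convention "fourth-order" refers to.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250  # sampling rate in Hz, as stated for both datasets


def bandpass_8_32(eeg, fs=FS, order=4):
    """Zero-phase Butterworth bandpass (8-32 Hz) along the last (time) axis."""
    # Assumption: `order` is SciPy's design order N (bandpass order is 2 * N).
    sos = butter(order, [8.0, 32.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)


def baseline_correct(epoch, cue_index, fs=FS, baseline_s=0.2):
    """Subtract the per-channel mean of the 200 ms preceding the cue."""
    n_base = int(round(baseline_s * fs))
    base = epoch[..., cue_index - n_base:cue_index].mean(axis=-1, keepdims=True)
    return epoch - base


def sliding_windows(epoch, cue_index, fs=FS, t_start=1.0, t_end=4.0,
                    win_s=2.0, step_s=0.04):
    """Cut 2 s windows with a 0.04 s step from the 1-4 s post-cue segment."""
    start = cue_index + int(round(t_start * fs))
    stop = cue_index + int(round(t_end * fs))
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    offsets = range(start, stop - win + 1, step)
    return np.stack([epoch[..., o:o + win] for o in offsets])


# Synthetic epoch: 22 channels, 200 ms pre-cue baseline plus 4 s post-cue.
rng = np.random.default_rng(0)
cue = 50  # sample index of the cue (hypothetical layout)
epoch = rng.standard_normal((22, cue + 4 * FS))

filtered = bandpass_8_32(epoch)
corrected = baseline_correct(filtered, cue_index=cue)
windows = sliding_windows(corrected, cue_index=cue)
print(windows.shape)  # (n_windows, n_channels, n_samples)
```

Working in integer sample counts (a 10-sample step and 500-sample windows at 250 Hz) avoids floating-point drift in the window boundaries and reproduces the 26 windows stated above.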
2.2.2. Channel Selection Method
In BCI systems, high-density EEG provides rich information but also introduces redundancy, noise, and computational overhead, potentially reducing classification accuracy and stability. In response to this challenge, we design a channel selection method that leverages both physiological priors and statistical data analysis: a core channel retention and candidate channel dynamic selection method, balancing spatial filter stability and channel discriminability. The approach involves the following key steps:
The motor imagery task depends strongly on the activity of the primary motor cortex (PMC), which maps on the scalp EEG mainly to C3 (left sensorimotor area), Cz (central area), and C4 (right sensorimotor area). To maintain the stability and physiological interpretability of spatial feature extraction algorithms (e.g., CSP), the following core channels are always retained in this paper: $C_{\text{core}} = \{\text{C3}, \text{Cz}, \text{C4}\}$. Let the original set of channels be $C_{\text{all}}$; then the set of candidate channels is
$$C_{\text{cand}} = C_{\text{all}} \setminus C_{\text{core}}$$
The symbol "\" denotes set difference. Next, the candidate channels are scored, the channels with the highest scores are selected, and the final channel set comprises the core channels together with the selected candidates.
In motor imagery EEG classification, channel discriminability varies across imagery classes. To reduce redundancy while preserving key information, this paper introduces two statistical scoring metrics that quantify signal distribution and class separability and guide channel selection.
Discriminative Variance Score: EEG signals exhibit task-specific spatial patterns in MI tasks. Channels with high variance in specific tasks and moderate variance across all trials indicate strong selectivity and class discrimination. Such characteristics enable efficient channel screening. The evaluation metric is defined as follows:
In the formula, $\sigma_{i,j}^{2}$ represents the signal variance of channel $i$ in the $j$-th category, which measures the fluctuation of the channel's signal within that category; $\sigma_{i}^{2}$ is the total variance of channel $i$ across all trials, reflecting the channel's overall level of fluctuation; $N$ is the number of categories, which in this task is $N = 4$ (four categories of motor imagery); and $\varepsilon$ is a smoothing factor that prevents a zero denominator and keeps the calculation well defined.
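The exact expression of the discriminative variance score is not reproduced in this text, so the sketch below uses one plausible form built from the same ingredients (per-class variance, total variance, $N$ classes, and a smoothing factor): the mean absolute deviation of the class variances from the total variance, divided by the total variance plus $\varepsilon$. This form, the function name, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def discriminative_variance_score(trials, labels, eps=1e-8):
    """Assumed score: mean_j |var_ij - var_i| / (var_i + eps) per channel.

    trials: array (n_trials, n_channels, n_samples); labels: (n_trials,).
    """
    n_channels = trials.shape[1]
    # Total variance of each channel over all trials and samples.
    total_var = trials.transpose(1, 0, 2).reshape(n_channels, -1).var(axis=1)
    deviations = []
    for c in np.unique(labels):
        x = trials[labels == c]
        class_var = x.transpose(1, 0, 2).reshape(n_channels, -1).var(axis=1)
        deviations.append(np.abs(class_var - total_var))
    return np.mean(deviations, axis=0) / (total_var + eps)


# Synthetic check: channel 0 is three times larger in class 0 only.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 20)
trials = rng.standard_normal((80, 3, 200))
trials[labels == 0, 0, :] *= 3.0

var_scores = discriminative_variance_score(trials, labels)
print(var_scores)  # channel 0 should score highest
```

A plain average of the ratios $\sigma_{i,j}^{2} / \sigma_{i}^{2}$ would be close to 1 for every channel when classes are balanced, which is why the sketch measures deviation from the total variance instead.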
Band Power Significance Score: ERD/ERS (event-related desynchronization/event-related synchronization) responses in the $\mu$ (8–13 Hz) and $\beta$ (13–30 Hz) bands are typically evoked by motor imagery of the left hand, right hand, foot, and tongue, and they carry substantial class-discriminative information. Channels with significant power differences across tasks are more informative. To quantify this, we introduce the ANOVA Power Score [29]. Specifically, for each channel, the mean square power of each trial is computed after filtering, grouped by task label, and evaluated using one-way ANOVA. The steps are as follows.
Power feature extraction: Suppose the signal of channel $i$ in the $t$-th trial is $x_i^{(t)}(k)$. The power of this trial is given by Formula (3):
$$P_i^{(t)} = \frac{1}{T}\sum_{k=1}^{T}\left(x_i^{(t)}(k)\right)^{2} \tag{3}$$
where $T$ is the number of sampling points in the $t$-th trial and $x_i^{(t)}(k)$ is the signal value at the $k$-th sampling point. This power feature reflects the energy level of the channel in the current trial.
Significance testing method: The power features are divided into four groups according to the category label of each trial:
$$G_j = \left\{P_i^{(t)} \mid y_t = j\right\},\quad j = 1, 2, 3, 4$$
where $P_i^{(t)}$ represents power, $y_t$ is the label of the $t$-th trial, and $j$ represents the category of motor imagery EEG (left hand, right hand, foot, and tongue). Analysis of variance (ANOVA) is conducted to determine whether the means of the four power samples differ significantly, under the null hypothesis
$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$
where $\mu_j$ denotes the mean power of group $G_j$. If $H_0$ is rejected, the channel has a statistically significant difference in power distribution between categories and therefore has discriminative power. So that a larger score indicates a more important channel, the negative logarithm of the p-value is used as the score:
$$S_{\text{anova}}(i) = -\log\left(p_i\right)$$
where $p_i$ is the p-value of the ANOVA test for channel $i$. Taking the logarithm stretches the value range and avoids insensitivity to very small p-values.
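The power extraction and ANOVA scoring steps can be sketched as below. The sketch assumes the trials are already band-filtered, uses a base-10 logarithm (the text does not state the base), and clamps the p-value to avoid taking the logarithm of zero; the function names and synthetic data are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway


def trial_power(trials):
    """Mean square power per trial and channel: (n_trials, n_channels)."""
    return np.mean(trials ** 2, axis=-1)


def anova_power_score(trials, labels):
    """Negative log10 of the one-way ANOVA p-value for each channel."""
    power = trial_power(trials)
    classes = np.unique(labels)
    scores = np.empty(power.shape[1])
    for ch in range(power.shape[1]):
        groups = [power[labels == c, ch] for c in classes]
        p_value = f_oneway(*groups).pvalue
        # Clamp so that an underflowed p-value of 0 gives a finite score.
        scores[ch] = -np.log10(max(p_value, np.finfo(float).tiny))
    return scores


# Synthetic check: channel 0 carries more power in class 0 only.
rng = np.random.default_rng(1)
labels = np.repeat(np.arange(4), 20)
trials = rng.standard_normal((80, 3, 200))
trials[labels == 0, 0, :] *= 3.0

anova_scores = anova_power_score(trials, labels)
print(anova_scores)  # channel 0 should score far above the others
```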
Fused Channel Score and Channel Selection: The discriminative variance score and band power significance score assess channel discriminability from complementary perspectives—signal volatility and frequency band energy. To better capture their combined strengths, we design a fusion strategy to compute a composite score for more comprehensive channel importance evaluation.
To eliminate the influence of the different measurement scales and value ranges of the two indicators, the initial variance score $S_{\text{var}}(i)$ and ANOVA score $S_{\text{anova}}(i)$ are first normalized as follows:
$$\tilde{S}(i) = \frac{S(i) - \min_{i} S(i)}{\max_{i} S(i) - \min_{i} S(i)}$$
The normalized scores all fall within the interval $[0, 1]$, which facilitates fusion. The final fusion score is calculated as
$$S_{\text{fused}}(i) = \alpha\,\tilde{S}_{\text{var}}(i) + (1 - \alpha)\,\tilde{S}_{\text{anova}}(i)$$
where $\alpha$ is an adjustable weight parameter representing the proportion of the variance score in the final score. In this paper, an equal-weight strategy is adopted, i.e., $\alpha = 0.5$, indicating that the two types of indicators are equally important.
According to the fused score, all candidate channels are ordered by descending importance, and the top K are selected as the optimal channel subset for feature extraction and classification tasks.
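The fusion and selection procedure can be sketched as follows. The sketch assumes min-max normalization over the candidate channels only, and the channel names, score values, and `k=2` are hypothetical values chosen for illustration.

```python
import numpy as np

CORE = ("C3", "Cz", "C4")  # core channels that are always retained


def min_max(x, eps=1e-12):
    """Scale scores to [0, 1]; eps guards against a zero range."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min() + eps)


def select_channels(names, s_var, s_anova, k, alpha=0.5, core=CORE):
    """Return the core channels plus the top-k candidates by fused score."""
    s_var = np.asarray(s_var, dtype=float)
    s_anova = np.asarray(s_anova, dtype=float)
    # Candidate set = all channels minus the core channels.
    cand = [i for i, name in enumerate(names) if name not in core]
    fused = alpha * min_max(s_var[cand]) + (1 - alpha) * min_max(s_anova[cand])
    top = np.argsort(fused)[::-1][:k]  # descending fused score
    return list(core) + [names[cand[i]] for i in top]


# Hypothetical scores for a 7-channel montage.
names = ["Fz", "C3", "Cz", "C4", "Pz", "FC1", "CP2"]
s_var = [0.1, 2.0, 2.0, 2.0, 0.9, 0.5, 0.2]
s_anova = [1.0, 9.0, 9.0, 9.0, 3.0, 5.0, 0.5]

selected = select_channels(names, s_var, s_anova, k=2)
print(selected)
```

In this toy case Pz ranks first (high variance score, moderate ANOVA score) and FC1 second (moderate variance score, highest ANOVA score), which shows how the equal-weight fusion balances the two criteria.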
2.2.3. EEG Feature Extraction
Multi-band filtering and FBCSP spatial feature extraction: In the motor imagery task, different motor tasks activate different areas of the brain, and these areas show different activity patterns in different frequency bands (such as $\mu$ and $\beta$). The FBCSP (Filter Bank Common Spatial Pattern) method extracts the task-related spatial covariance differences in each frequency band through multi-band filtering and spatial projection [30], thereby improving classification accuracy.
The Common Spatial Pattern (CSP) method seeks a spatial filter $W \in \mathbb{R}^{m \times k}$ ($m$ is the number of channels, and $k$ is the feature dimension retained after filtering) that maximizes the covariance difference between two classes of data. Let the covariance matrices of the two classes of signals be $\Sigma_1$ and $\Sigma_2$. Then the optimization goal is the following:
$$W^{*} = \arg\max_{W} \frac{W^{\top}\Sigma_1 W}{W^{\top}\Sigma_2 W}$$
The obtained filter $W$ projects the signal $X$ to a new space $Z = W^{\top}X$, where the variance distribution of the projected components reflects the class discriminability. The steps of FBCSP feature extraction are as follows. First, the EEG signal is divided into 11 frequency bands (e.g., 8–12 Hz, 10–14 Hz, …, 28–32 Hz). For each frequency band $b$, the covariance matrices $\Sigma_1^{(b)}$ and $\Sigma_2^{(b)}$ of the two classes are calculated. On this basis, the CSP filter $W_b$ is solved, and the frequency band signal $X_b$ is projected into a new space to obtain $Z_b = W_b^{\top}X_b$. Then the $k$ most discriminative components are selected after projection, and the logarithm of the ratio of each component's variance to the total variance is computed. The formula is as follows:
$$f_i^{(b)} = \log\left(\frac{\operatorname{var}\left(Z_i^{(b)}\right)}{\sum_{j=1}^{k}\operatorname{var}\left(Z_j^{(b)}\right)}\right)$$
where $Z_i^{(b)}$ is the $i$th CSP component and $k$ is the number of selected components. After features are extracted from each band, the features from the 11 frequency bands are concatenated in sequence to generate the final feature vector $F_{\text{FBCSP}}$ with 44 dimensions, i.e., four features per band.
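The FBCSP steps above can be sketched for a single pair of classes as follows. This is an illustration under assumptions, not the authors' code: CSP is solved as a generalized eigenvalue problem, two filters are taken from each end of the eigenvalue spectrum to give `k = 4`, the band filter reuses a Butterworth design, and the 8-channel data are synthetic. How the binary CSP is extended to the four-class task is not stated in the text and is not covered here.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, sosfiltfilt

# 11 overlapping 4 Hz bands: 8-12, 10-14, ..., 28-32 Hz.
BANDS = [(lo, lo + 4) for lo in range(8, 29, 2)]


def band_filter(trials, band, fs=250, order=4):
    """Zero-phase bandpass along the time axis."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trials, axis=-1)


def mean_cov(trials):
    """Average trace-normalized spatial covariance over trials."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b, k=4):
    """Return k CSP filters as columns of an (n_channels, k) matrix."""
    s1, s2 = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem s1 w = lambda (s1 + s2) w, ascending lambda.
    _, w = eigh(s1, s1 + s2)
    half = k // 2
    keep = list(range(half)) + list(range(w.shape[1] - (k - half), w.shape[1]))
    return w[:, keep]


def fbcsp_features(trials, filters, fs=250):
    """Log relative variance of the CSP components, concatenated over bands."""
    feats = []
    for band, w in zip(BANDS, filters):
        z = np.einsum("ck,tcs->tks", w, band_filter(trials, band, fs))
        var = z.var(axis=-1)
        feats.append(np.log(var / var.sum(axis=1, keepdims=True)))
    return np.hstack(feats)


# Synthetic two-class data: 20 trials each, 8 channels, 2 s at 250 Hz.
rng = np.random.default_rng(0)
class_a = rng.standard_normal((20, 8, 500))
class_b = rng.standard_normal((20, 8, 500))
class_b[:, 0, :] *= 2.0  # class-dependent variance on one channel

filters = [csp_filters(band_filter(class_a, b), band_filter(class_b, b))
           for b in BANDS]
features = fbcsp_features(np.concatenate([class_a, class_b]), filters)
print(features.shape)  # (n_trials, 11 bands * 4 components)
```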
Wavelet Packet Decomposition (WPD) Time–Frequency Domain Feature Extraction: In the motor imagery task, EEG energy in the $\mu$ and $\beta$ bands exhibits event-related desynchronization/synchronization (ERD/ERS) effects, and the temporal patterns vary from individual to individual. Unlike the Fourier transform, WPD captures frequency and temporal dynamics through multi-level decomposition, extracting the non-stationary features of EEG [31]. WPD recursively divides the signal into low-frequency and high-frequency sub-bands, forming a tree structure in which each node represents a specific frequency band for refined time–frequency analysis. Taking the 5-layer decomposition as an example, the first layer decomposes the original signal into a low-frequency sub-band $A_1$ and a high-frequency sub-band $D_1$; the second layer further decomposes $A_1$ into $AA_2$ and $AD_2$ and divides $D_1$ into $DA_2$ and $DD_2$, and so on. After each layer of decomposition, the number of sub-bands doubles. The $n$th layer contains $2^{n}$ nodes, each of which corresponds to a specific frequency range. This paper performs a 5-layer WPD on the signal of each channel and selects 6 key sub-band nodes for feature extraction, so that the features of the frequency bands related to motor imagery tasks (such as 8–32 Hz) can be extracted. For each selected node $b$, the signal strength of the frequency band is quantified by its energy feature. The calculation formula is
$$E_b = \sum_{n=1}^{N}\left|x_b(n)\right|^{2}$$
where $x_b(n)$ represents the signal value of node $b$ at the $n$-th sampling point, and $N$ represents the length of the coefficient sequence of the current node. This formula is the sum of the squares of the node's signal amplitudes and directly reflects the energy of the signal in this frequency band. For example, if a node corresponds to the $\mu$ frequency band and an ERD phenomenon occurs in this band during a motor imagery task, its energy $E_b$ will be significantly reduced. The obtained 90-dimensional feature is denoted $F_{\text{WPD}}$.
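The tree structure and the node energy can be illustrated with a small wavelet packet decomposition. The text does not name the mother wavelet, so the Haar wavelet is assumed here purely because it can be written in a few lines; the 512-sample test signal is also an assumption. The nodes are returned in natural (tree) order, not in frequency order, so picking the 6 nodes that cover 8–32 Hz would additionally require a frequency-ordered listing.

```python
import numpy as np


def haar_wpd_nodes(x, level=5):
    """Return the 2**level coefficient arrays of a Haar wavelet packet tree."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        next_nodes = []
        for c in nodes:
            c = c[: len(c) // 2 * 2]  # drop a trailing odd sample, if any
            even, odd = c[0::2], c[1::2]
            next_nodes.append((even + odd) / np.sqrt(2))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2))  # high-pass branch
        nodes = next_nodes
    return nodes


def node_energies(nodes):
    """Energy of each node: sum of squared coefficients."""
    return np.array([np.sum(c ** 2) for c in nodes])


# 10 Hz sine sampled at 250 Hz; 512 samples so every level splits evenly.
fs = 250
t = np.arange(512) / fs
signal = np.sin(2 * np.pi * 10 * t)

nodes = haar_wpd_nodes(signal, level=5)
energies = node_energies(nodes)
print(len(nodes), energies.argmax())
```

Because the Haar transform is orthonormal, the node energies sum to the energy of the input signal, which is a convenient sanity check for any WPD implementation. At 250 Hz, each of the 32 level-5 nodes spans about 3.9 Hz, so roughly six nodes cover the 8–32 Hz range.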
Nonlinear Feature Extraction: EEG signals are the output of a complex nonlinear system and contain nonlinear dynamic characteristics such as chaotic behavior and self-similarity. Traditional linear features are constrained by linear assumptions and struggle to characterize the complex dynamic patterns of such signals. To overcome this limitation, nonlinear features such as sample entropy, spectral entropy, and fractal dimension are introduced. By capturing the complex structure and dynamics of the signal, these features can improve EEG classification accuracy and provide richer information for EEG signal processing and analysis.
Sample Entropy (SampEn): Sample entropy quantifies the complexity and randomness within a time series, serving as an indicator of the signal's intrinsic regularity. It is derived by comparing the number of matching template pairs under two embedding dimensions:
$$\mathrm{SampEn}(m, r, N) = -\ln\frac{A}{B}$$
In the formula, $m$ denotes the embedding dimension, which controls the complexity of the sequence reconstruction space; $r$ is the tolerance radius, which defines the threshold for sequence similarity matching; $N$ denotes the length of the signal, representing the amount of analysis data; $A$ and $B$ are the numbers of matching template pairs under the embedding dimensions $m + 1$ and $m$, respectively. The obtained features are denoted $F_{\text{SampEn}}$. The lower the sample entropy, the more regular the signal; the higher the value, the stronger its randomness and irregularity.
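A direct implementation of this definition is sketched below. The defaults `m = 2` and `r = 0.2` times the signal's standard deviation are common choices and are assumptions here, since the text does not give the values used; the Chebyshev distance and the exclusion of self-matches follow the usual definition of sample entropy.

```python
import numpy as np


def sample_entropy(x, m=2, r=None):
    """SampEn(m, r, N) = -ln(A / B) with Chebyshev distance, no self-matches."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # assumed tolerance; not specified in the text
    n = len(x)

    def count_matches(dim):
        # Use the same N - m templates for both dimensions.
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # matches at dimension m
    a = count_matches(m + 1)  # matches at dimension m + 1
    return -np.log(a / b) if a > 0 and b > 0 else np.inf


rng = np.random.default_rng(0)
t = np.arange(300) / 250
se_sine = sample_entropy(np.sin(2 * np.pi * 5 * t))
se_noise = sample_entropy(rng.standard_normal(300))
print(se_sine, se_noise)  # the regular sine should give the lower value
```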
Spectral Entropy (SpecEn): Spectral entropy quantifies the uncertainty in a signal's energy distribution within the frequency domain, reflecting the extent of energy dispersion across the spectrum:
$$\mathrm{SpecEn} = -\sum_{i=1}^{n} p_i \log p_i$$
where $p_i$ is the normalized spectral power, satisfying $\sum_{i=1}^{n} p_i = 1$ (where $n$ is the number of spectral bins). The obtained features are denoted $F_{\text{SpecEn}}$. An increase in spectral entropy corresponds to a more dispersed energy distribution across the frequency domain, which in turn indicates a higher complexity in the signal's frequency-domain characteristics.
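A minimal sketch of this computation follows. It assumes a plain periodogram as the spectral estimate and a base-2 logarithm, and it optionally divides by $\log_2 n$ so the result lies in $[0, 1]$; none of these choices is stated in the text.

```python
import numpy as np


def spectral_entropy(x, normalize=True):
    """Shannon entropy of the normalized periodogram of x."""
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x - x.mean())) ** 2  # periodogram (assumed estimator)
    p = psd / psd.sum()                            # normalized spectral power
    nonzero = p[p > 0]                             # 0 * log(0) is treated as 0
    h = -np.sum(nonzero * np.log2(nonzero))
    return h / np.log2(len(psd)) if normalize else h


rng = np.random.default_rng(0)
t = np.arange(500) / 250
h_sine = spectral_entropy(np.sin(2 * np.pi * 10 * t))
h_noise = spectral_entropy(rng.standard_normal(500))
print(h_sine, h_noise)  # narrowband sine is low, broadband noise is high
```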
Fractal Dimension (FD): The fractal dimension quantifies the complexity or roughness of a signal, depicting the degree of change in the signal's morphology:
$$\mathrm{FD} = \frac{\mathrm{d}\,\ln L(k)}{\mathrm{d}\,\ln (1/k)}$$
In the formula, $L(k)$ is the average length of the signal at step $k$, reflecting the signal's morphology at different scales; $k$ is a variable step length, covering the local and global features of the signal through multi-scale analysis. In practice, FD is estimated as the slope of $\ln L(k)$ against $\ln(1/k)$. The obtained features are denoted $F_{\text{FD}}$. The higher the fractal dimension, the more complex the change in the signal's morphology, and the more prominent the nonlinear features. The final 45-dimensional nonlinear features are $F_{\text{NL}} = [F_{\text{SampEn}}, F_{\text{SpecEn}}, F_{\text{FD}}]$.
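The description of an average curve length at step $k$ matches Higuchi's method, which is assumed in the sketch below; the text does not name the estimator, and `k_max = 10` is likewise an assumed setting. The fractal dimension is taken as the least-squares slope of $\ln L(k)$ against $\ln(1/k)$.

```python
import numpy as np


def higuchi_fd(x, k_max=10):
    """Higuchi fractal dimension (assumed estimator) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ks = np.arange(1, k_max + 1)
    lengths = []
    for k in ks:
        lk = []
        for m in range(k):
            idx = np.arange(m, n, k)   # sub-series starting at m with step k
            n_int = len(idx) - 1       # number of increments in the sub-series
            if n_int < 1:
                continue
            dist = np.abs(np.diff(x[idx])).sum()
            # Normalize the curve length for this offset and step.
            lk.append(dist * (n - 1) / (n_int * k) / k)
        lengths.append(np.mean(lk))
    slope, _ = np.polyfit(np.log(1.0 / ks), np.log(lengths), 1)
    return slope


rng = np.random.default_rng(0)
fd_line = higuchi_fd(np.arange(500, dtype=float))  # smooth ramp
fd_noise = higuchi_fd(rng.standard_normal(500))    # white noise
print(fd_line, fd_noise)
```

A straight line gives a dimension of 1 and white noise a value close to 2, which bracket the range expected for EEG segments.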
The obtained features are concatenated and reduced in dimensionality with Principal Component Analysis (PCA), which maps the high-dimensional features to a set of linearly uncorrelated principal components ordered from largest to smallest variance contribution; the reduced features are then standardized. Finally, Support Vector Regression (SVR) is used to realize the four-category recognition of EEG signals. An SVR model is trained for each category using a one-vs-rest strategy, in which the label of the target category is set to 1 and the labels of the other categories are set to 0. During prediction, the four models output the "closeness" of each sample to each category, and the category whose predicted value is closest to 1 is selected as the classification result for the sample. The overall algorithm steps are given in Algorithm 1.
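The classification stage can be sketched with scikit-learn as follows. The class name `OneVsRestSVR`, the 10 retained components, the linear kernel, and the synthetic 179-dimensional features (44 + 90 + 45) are assumptions for illustration; the text does not give the number of principal components or the SVR hyperparameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


class OneVsRestSVR:
    """PCA -> standardization -> one SVR per class with 0/1 targets."""

    def __init__(self, n_components=10, **svr_kwargs):
        self.pca = PCA(n_components=n_components)
        self.scaler = StandardScaler()
        self.svr_kwargs = svr_kwargs

    def fit(self, x, y):
        z = self.scaler.fit_transform(self.pca.fit_transform(x))
        self.classes_ = np.unique(y)
        # Target is 1 for the class of interest and 0 for all other classes.
        self.models_ = [SVR(**self.svr_kwargs).fit(z, (y == c).astype(float))
                        for c in self.classes_]
        return self

    def predict(self, x):
        z = self.scaler.transform(self.pca.transform(x))
        outputs = np.column_stack([m.predict(z) for m in self.models_])
        # Pick the class whose regression output is closest to 1.
        return self.classes_[np.argmin(np.abs(outputs - 1.0), axis=1)]


# Synthetic, well-separated four-class features with 179 dimensions.
rng = np.random.default_rng(0)
y = np.tile(np.arange(4), 60)
centers = 3.0 * rng.standard_normal((4, 179))
x = centers[y] + rng.standard_normal((240, 179))

model = OneVsRestSVR(n_components=10, kernel="linear").fit(x[:180], y[:180])
accuracy = np.mean(model.predict(x[180:]) == y[180:])
print(accuracy)
```

Selecting the output closest to 1, instead of the largest output, follows the decision rule described above; the two rules differ only when a model overshoots its target.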