Adaptive Time–Frequency Segment Optimization for Motor Imagery Classification

Motor imagery (MI)-based brain–computer interface (BCI) has emerged as a crucial method for rehabilitating stroke patients. However, the variability in the time–frequency distribution of MI-electroencephalography (EEG) among individuals limits the generalizability of algorithms that rely on non-customized time–frequency segments. In this study, we propose a novel method for optimizing time–frequency segments of MI-EEG using the sparrow search algorithm (SSA). Additionally, we apply a correlation-based channel selection (CCS) method that considers the correlation coefficient of features between each pair of EEG channels. Subsequently, we utilize a regularized common spatial pattern method to extract effective features. Finally, a support vector machine is employed for signal classification. The results on three BCI datasets confirmed that our algorithm achieved better accuracy (99.11% vs. 94.00% for BCI Competition III Dataset IIIa, 87.70% vs. 81.10% for Chinese Academy of Medical Sciences dataset, and 87.94% vs. 81.97% for BCI Competition IV Dataset 1) compared to algorithms with non-customized time–frequency segments. Our proposed algorithm enables adaptive optimization of EEG time–frequency segments, which is crucial for the development of clinically effective motor rehabilitation.


Introduction
Brain-computer interface (BCI) technology establishes communication between the brain and computers or other devices without relying on peripheral nerves or muscles [1]. Among the various types of brain signals, EEG is considered the most promising due to its portability, cost-effectiveness, and non-invasiveness. Motor imagery (MI) is a typical electroencephalography (EEG)-based paradigm for BCI development; the voltage fluctuations in the motor cortex elicited by imagining a movement are, to a large degree, similar to those elicited by actually executing it [2]. Specifically, it has been observed that imagining and finishing a movement result in a power decrease and increase, respectively, in the EEG signal within the alpha (8~12 Hz) and beta (14~30 Hz) frequency bands. These phenomena are recognized as event-related desynchronization (ERD) and event-related synchronization (ERS), and they serve as the foundation for MI-BCI classification [3].
Spatial filtering and channel selection are commonly utilized to enhance the performance of MI-BCI. The common spatial pattern (CSP) algorithm and its related algorithms [4][5][6][7] enhance the discernibility of MI-BCI mental states by maximizing the variance of one class while minimizing the variance of the other class. Jin et al. proposed a channel selection method based on brain network parameters, using the Pearson coefficient to quantify the statistical relationship between channels and select the optimal ones [5]. Zhang et al. incorporated an automatic channel selection layer into a neural network, allocating weights based on the EEG channels' contributions to MI signal classification, thereby facilitating adaptive channel selection [8].
The aforementioned algorithms have demonstrated some degree of enhancement in the performance of MI-BCI systems. However, their effectiveness heavily relies on pre-adopted time truncation and frequency filtering. Most of the existing literature has set a broad frequency band (typically 8~30 Hz) and a non-customized time segment (typically a short period after the onset of the MI cue). Nevertheless, the time-frequency characteristics of the ERD/ERS evoked by different subjects vary. Therefore, a non-customized time-frequency segment leads to subpar classification results across subjects. The appropriate time-frequency segments are highly subject-dependent, which is the main concern of this paper.
To address the challenge of selecting appropriate time-frequency segments, Ang et al. [9,10] proposed a band selection algorithm, namely FBCSP, which decomposes signals into several frequency bands. Subsequently, CSP is applied to each band before the CSP features are combined and selected. Sliding windows and overlapping time-frequency segments have been employed in other studies to further increase the optional time-frequency segments [11].
The aforementioned methods achieve time-frequency segment selection. However, they are hindered by the limited number of optional time-frequency segments, which depends on the preset parameters. To address this issue, we propose the sparrow search algorithm (SSA)-correlation-based channel selection (CCS)-regularized common spatial pattern (RCSP) algorithm, which adaptively explores the optimal time-frequency segments and selects appropriate channels to enhance feature extraction for each subject. Initially, the SSA is employed to optimize the time-frequency segments. Subsequently, the CCS algorithm is utilized to identify EEG channels exhibiting significant correlation. Following this, the RCSP algorithm is implemented to extract signal features from the selected channels. Lastly, a support vector machine (SVM) is employed to classify the feature vectors.
Experiments were conducted to verify the effectiveness and generalization of the algorithm across varying subjects and channels, utilizing the publicly available BCI Competition III Dataset IIIa, autonomously collected data from the Chinese Academy of Medical Sciences, and the BCI Competition IV Dataset 1. The results showed that the recognition performance of the proposed algorithm outperformed the algorithms in the comparison group, demonstrating its suitability for diverse subjects and channels. The proposed algorithm is expected to promote the application of MI-BCI in the field of motor rehabilitation by customizing personalized time-frequency segments for subjects.
The major contributions of this study are listed as follows:
• A new approach to optimizing the time-frequency segment is defined, aiming to provide subject-specific time-frequency segments.
• We develop a novel framework for MI decoding. Rather than employing a non-customized time-frequency segment, our approach adaptively explores the optimal time-frequency segment without constraints on limited optional segments.
• To verify the effectiveness of the proposed algorithm, three MI-EEG datasets are included in classification experiments, and the proposed algorithm shows competitive and robust performance.
The rest of the paper is organized as follows: Section 2 reviews related work. Section 3 introduces the details of the proposed methods, including optimization of time-frequency segments, channel selection, feature extraction, and classification. Section 4 elaborates upon data preprocessing, the overall framework of the algorithm, and the experimental results. Section 5 analyzes the experimental results and discusses the proposed algorithm. Section 6 concludes the study and points out future work.

Related Works
Feature extraction and channel selection are key to the MI decoding process. Feature extraction in MI decoding faces the challenges of a low signal-to-noise ratio, inherent non-stationarity, and high inter-subject variability. Many MI feature extraction algorithms have been developed to enhance the performance of MI-BCI [12][13][14]. CSP is a powerful spatial filtering feature extraction method that provides a filter maximizing the variance of one class of signals while minimizing the variance of the other class. Many variations of CSP have been proposed to enhance its performance, among which regularized CSP exhibits promising results [5][6][7]. However, the effectiveness of CSP heavily relies on pre-adopted time truncation and frequency filtering. Moreover, high inter-subject variability means that the optimal time-frequency segments for extracting features may differ between subjects. Hence, a broad frequency range (typically 8~30 Hz) and a non-customized time segment are commonly used [10]. Exhaustive search is another method for addressing this issue; however, its impractical computational time requirements hinder its practical application.
To achieve automatic selection of frequency bands, Ang et al. proposed the Filter Bank Common Spatial Pattern (FBCSP) method, which decomposes the signal into multiple frequency bands through bandpass filtering. Subsequently, CSP is applied to each band, followed by a feature selection algorithm to automatically select frequency bands and their corresponding CSP features [10]. However, some limitations remain. First, there is a lack of attention to optimal time segment selection. Time segment selection is another important issue in the decoding of MI-EEG signals, aiming to cover the interval strongly correlated with motor intention and remove unrelated sampling points. Second, the search space for time-frequency segments depends on the choice of initial parameters, and the limited availability of time-frequency segments makes it difficult to find subject-specific optimal segments.
To address the first concern, Peterson et al. decomposed the EEG signals into multiple time windows. In each window, they extracted FBCSP features, resulting in penalized time-frequency band CSP features [15]. They simultaneously considered the selection of time segments and frequency bands and achieved more stable classification accuracy than FBCSP. To address the second concern, the temporally constrained sparse group spatial pattern applies bandpass filtering to the raw EEG data across a set of overlapping filter bands [16]. This method enhances the variety of selectable time-frequency segments. The utilization of sliding windows and overlapping time-frequency segments serves to augment the pool of available combinations; nonetheless, the selection of time-frequency segments remains contingent upon the initial parameters, limiting the freedom of time-frequency segment selection. Hence, achieving time-frequency band optimization without being limited by the selection space is a problem that needs to be solved.
The main types of channel selection algorithms encompass those employing neural networks and those utilizing functional connectivity methods. Channel selection algorithms utilizing functional connectivity are based on the theory that signals from the same type of motor imagery task are more similar, while signals from different types of motor imagery tasks exhibit more significant differences. By sorting the correlations across different channels, the optimal channels are selected. In [17], cross-correlation was proposed to evaluate the correlation across channels using the Pearson coefficient [18].
Channel selection algorithms employing neural networks have gained attention in recent years. In [19], researchers considered EEG channels as a group of neural networks and studied EEG channel selection. In [20], a convolutional neural network with specific channels was employed as a feature extractor to enhance the separability of classifications; subsequently, a collection of deep sparse autoencoders was utilized to select the optimal channels. Sun et al. [21-24] proposed a graph-based channel selection method, which automatically selects a specified number of optimal channels in a graph convolutional neural network by utilizing the time-domain correlation of signals.
Optimization algorithms have also been applied to enhance channel selection. In [25], a neuro-evolutionary channel selection algorithm based on an improved particle swarm optimization algorithm was proposed, with CSP used for feature extraction.
In this study, we propose an adaptive time-frequency segment optimization algorithm to overcome the selection space limitation and customize personalized time-frequency segments for each subject. Furthermore, we incorporate CCS for channel selection and utilize RCSP for feature extraction to enhance the decoding performance of MI.

Datasets
The performance of the proposed method was validated on three datasets: the BCI Competition III Dataset IIIa, the Chinese Academy of Medical Sciences autonomous dataset, and the BCI Competition IV Dataset 1.

BCI Competition III Dataset IIIa (DS1)
The BCI Competition III Dataset IIIa is a four-class EEG dataset (left hand, right hand, foot, tongue; the first two classes were used for the experiments in this paper) acquired using a 64-channel Neuroscan EEG amplifier, with the left mastoid for reference and the right mastoid as ground. The sampling frequency was set at 250 Hz, and the recording was conducted using 60 electrodes [26,27]. The distribution of the channels is shown in Figure 1.
During the experimental sessions, the subjects sat in chairs with armrests and performed different MI tasks according to cues, and the order of the cues was random. Each trial commenced with a 2 s period of silence, after which an acoustic stimulus and a cross on the screen indicated the start of the trial. From 3 s to 7 s, the subjects were asked to perform the indicated mental task. The timeline is shown in Figure 2.
The dataset was recorded from three subjects (K3b, K6b, and L1b), and the numbers of trials per class were 90 and 60 for participants K3b and L1b, respectively. Due to the considerable variability in the accuracy of the K6b data reported in different studies, as well as suspicions regarding potential errors in the true labels, the credibility of that particular recording was not deemed sufficiently high. Consequently, only the K3b and L1b data were employed as experimental data for the purposes of this paper.

Chinese Academy of Medical Sciences Autonomous Dataset (DS2)
The Chinese Academy of Medical Sciences autonomous dataset is a two-class EEG dataset acquired using a 64-channel Neuracle EEG amplifier, with the REF channel for reference and the GND channel as ground. The sampling frequency was set at 1000 Hz, and the electrode locations adhered to the international 10-20 system. The distribution of the channels is shown in Figure 3.
Subjects performed two MI tasks, right-handed ball bouncing and foot lifting; the movements are shown in Figure 4. Each trial commenced with a rest period of 1 s. Then, a visual stimulus in the form of a movement picture was presented to prompt the subjects to mentally simulate the depicted movement. Following a 6 s period of MI, the subjects were alerted to transition into a 3 s rest period by the activation of a system buzzer and the display of a black screen. Each subject participated in 320 trials, with an equal distribution of 160 trials for each type of movement. The timeline is shown in Figure 5.
The recording process involved nine subjects (A01~A09) in DS2 from the Chinese Academy of Medical Sciences, Tiangong University, and Tianjin Medical University. All subjects were right-handed, with an average age of 22.3 ± 2.9 years, and all provided their informed consent (Supplementary Materials S1).

BCI Competition IV Dataset 1 (DS3)
The BCI Competition IV Dataset 1 was acquired from 59 EEG channels [28,29]. For each subject, two classes of motor imagery were chosen from three classes: left hand, right hand, and foot. Each trial began with a 2 s display of a fixation cross at the center of the screen, followed by the presentation of a cue (an arrow pointing left, right, or down) for 4 s. Participants were instructed to perform the corresponding MI task during this period. Subsequently, a 2 s blank screen ensued. The fixation cross overlaid the cues and was displayed for 6 s. Each of the two runs consisted of 100 trials, and the timeline for each trial is depicted in Figure 6.
DS3 comprises recordings from seven participants engaged in MI tasks without feedback. Data from participants 'a', 'b', 'f', and 'g' were utilized in this study; 'c', 'd', and 'e' were excluded because those recordings were artificially generated.

Method

Sparrow Search Algorithm-Based Time-Frequency Segment Optimization
SSA is a swarm intelligence optimization algorithm mainly inspired by the foraging and anti-predator behavior of sparrows. Since the optimal time-frequency segments of EEG vary from subject to subject [30], non-customized time-frequency segments cannot represent the optimal segments for different individuals. To solve this problem, the SSA-based algorithm is utilized to obtain the optimal time-frequency segments of each individual's EEG signals. During the search process, sparrows are classified into three roles [31]: (1) the producer, responsible for providing appropriate exploration areas or directions for all sparrows seeking to optimize the time-frequency segments of EEG signals; (2) the scrounger, which follows the sparrows that can indicate the direction of better time-frequency segments; and (3) the scout, which is aware of potential shortcomings in the current optimization process and seeks better time-frequency segments by escaping from its current position.
The sparrow position parameters can be represented by the following matrix:

$$
X=\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,d}\\
x_{2,1} & x_{2,2} & \cdots & x_{2,d}\\
\vdots & \vdots & \ddots & \vdots\\
x_{n,1} & x_{n,2} & \cdots & x_{n,d}
\end{bmatrix}
$$

where n is the number of sparrows and d denotes the parameter dimension; each row X_i represents the time-frequency segment corresponding to the ith sparrow, i.e., the frequency start point f_start (X_{i,1}), frequency width f_width (X_{i,2}), time start point t_start (X_{i,3}), and time width t_width (X_{i,4}).
Sparrow roles are classified according to the fitness value, which is derived from the classifier's accuracy. The fitness values can be expressed by the following vector:

$$
F_X=\begin{bmatrix} f(X_1)\\ f(X_2)\\ \vdots\\ f(X_n) \end{bmatrix}
$$

where f(X_i) denotes the fitness value obtained based on X_i. The position of the sparrow with the lowest fitness value corresponds to the current optimal time-frequency segment. The k sparrows with the lowest fitness values are classified as producers. Since producers are responsible for guiding the other sparrows, they search a broad range around themselves for better time-frequency segments. The producer location is updated as follows:

$$
X_{i,j}^{t+1}=\begin{cases}
X_{i,j}^{t}\cdot \exp\!\left(\dfrac{-i}{\alpha\cdot iter_{max}}\right), & R_2<ST\\[2mm]
X_{i,j}^{t}+Q\cdot L, & R_2\ge ST
\end{cases}
$$

where t denotes the current iteration, j = 1, 2, ..., d, iter_max is the maximum number of iterations, α ∈ (0, 1] is a random number, Q is a random number obeying the normal distribution, and X_{i,j}^t denotes the jth dimensional parameter of the ith sparrow at iteration t. L is a 1 × d matrix with all elements equal to 1. R_2 ∈ [0, 1] is the alarm value and ST ∈ [0.5, 1) is the safety threshold; both are random numbers. When R_2 ≥ ST, the producers fly to other areas in search of better time-frequency segments. Conversely, when R_2 < ST, no predator is around and the producers enter extensive search mode. This mechanism prevents the algorithm from falling into local optima.
The scrounger position is updated as follows:

$$
X_{i,j}^{t+1}=\begin{cases}
Q\cdot \exp\!\left(\dfrac{X_{worst}^{t}-X_{i,j}^{t}}{i^{2}}\right), & i>\dfrac{n_2}{2}\\[2mm]
X_{best}^{t+1}+\left|X_{i,j}^{t}-X_{best}^{t+1}\right|\cdot A^{+}\cdot L, & i\le \dfrac{n_2}{2}
\end{cases}
$$

where X_worst^t and X_best^t are the global worst and best positions at iteration t, respectively, A is a 1 × d matrix whose elements are randomly assigned a value of 1 or −1, and A⁺ = Aᵀ(AAᵀ)⁻¹. n_2 denotes the number of scroungers; if i > n_2/2, the current time-frequency segment is not suitable for the subject under investigation and the scrounger leaves its current position. If i ≤ n_2/2, the scrounger approaches the producer corresponding to the optimal time-frequency segment.
Some of the sparrows are aware of danger; these sparrows are scouts, and they abandon their current time-frequency segments and move to new locations. The scout position is updated as follows:

$$
X_{i,j}^{t+1}=\begin{cases}
X_{best}^{t}+\beta\cdot\left|X_{i,j}^{t}-X_{best}^{t}\right|, & f_i\neq f_g\\[2mm]
X_{i,j}^{t}+K\cdot\left(\dfrac{\left|X_{i,j}^{t}-X_{worst}^{t}\right|}{f_i-f_w+\varepsilon}\right), & f_i=f_g
\end{cases}
$$

where β and K are random numbers: β conforms to the standard normal distribution, and K ∈ (0, 1] determines the direction of the sparrow's movement. ε is a small constant that avoids zero-division errors. Here, f_i is the fitness value of the ith sparrow, and f_g and f_w denote the best and worst fitness values, respectively. If f_i = f_g, the sparrow needs to move closer to the others; otherwise, it moves to the vicinity of the current optimal position. The role proportions were determined by a grid search: producers and scroungers are sorted by fitness and constitute 70% and 30% of the population, respectively, while scouts are randomly generated within the population at a ratio of 0.2.
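As a minimal, illustrative sketch (not the authors' implementation), the three position-update rules above can be combined into a single iteration. Here `fitness` is any function scoring a candidate time-frequency segment (lower is better), and the box constraint `bounds` stands in for the allowed parameter ranges:

```python
import numpy as np

def ssa_step(X, fitness, iter_max, n_producers, bounds, ST=0.8):
    """One iteration of the sparrow search algorithm (simplified sketch).

    X       : (n, d) population; in our setting each row encodes a
              time-frequency segment (f_start, f_width, t_start, t_width).
    fitness : maps a row of X to a scalar; lower is better.
    bounds  : (lo, hi) box constraints on all parameters.
    """
    n, d = X.shape
    lo, hi = bounds
    f = np.array([fitness(x) for x in X])
    order = np.argsort(f)                       # ascending: best first
    X, f = X[order].copy(), f[order]
    x_best, x_worst = X[0].copy(), X[-1].copy()
    f_g, f_w = f[0], f[-1]

    # Producers: the fittest sparrows guide the search.
    R2 = np.random.rand()                       # alarm value
    for i in range(n_producers):
        if R2 < ST:                             # no predator: broad search
            alpha = np.random.rand() + 1e-12
            X[i] = X[i] * np.exp(-(i + 1) / (alpha * iter_max))
        else:                                   # predator: fly elsewhere
            X[i] = X[i] + np.random.randn() * np.ones(d)

    # Scroungers: follow the best position or leave a poor one.
    for i in range(n_producers, n):
        if i > n / 2:                           # poorly ranked: move away
            X[i] = np.random.randn() * np.exp((x_worst - X[i]) / (i + 1) ** 2)
        else:                                   # approach the best position
            A = np.random.choice([-1.0, 1.0], size=d)
            A_plus = A / (A @ A)                # A+ = A.T (A A.T)^-1 for 1 x d A
            X[i] = x_best + (np.abs(X[i] - x_best) @ A_plus) * np.ones(d)

    # Scouts: a random 20% sense danger and relocate.
    for i in np.random.choice(n, max(1, int(0.2 * n)), replace=False):
        if f[i] > f_g:                          # move toward the best position
            X[i] = x_best + np.random.randn() * np.abs(X[i] - x_best)
        else:                                   # already the best: step away
            K = np.random.rand()
            X[i] = X[i] + K * np.abs(X[i] - x_worst) / (f[i] - f_w + 1e-12)

    return np.clip(X, lo, hi)
```

The final clip reflects the bounded search ranges used in the paper (a 4 s time window and a 1~40 Hz frequency range); without it, the scout's near-zero fitness denominator could fling positions far outside the feasible region.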

Correlation-Based Channel Selection Algorithm
The channels associated with MI should contain common information for performing the same task [5]. Therefore, removing less correlated channels can improve feature extraction. By calculating the Pearson correlation coefficients between signals from different channels, the CCS algorithm can select the subject-specific best channels associated with MI [32]. Two steps are taken in the CCS algorithm. First, data normalization: z-score normalization is applied to the EEG signals with the assistance of the sklearn package in Python, so that the mean of the EEG data is 0 and the standard deviation is 1. Before normalization, the signal undergoes time segmentation and frequency filtering, but no artifact removal.
Second, based on the Pearson correlation coefficient, the statistical relationship between every pair of channels is quantified. The Pearson correlation coefficient is calculated as below:

$$
\rho_{X,Y}=\frac{\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(Y_i-\bar{Y}\right)}{n\,\sigma_X\,\sigma_Y}
$$

where X and Y are the observed variables, X̄ and Ȳ are their means, n is the sample size, and σ_X and σ_Y are the standard deviations of the variables. The matrix of correlation coefficients of the N channels is defined as:

$$
R=\begin{bmatrix}
\rho_{1,1} & \cdots & \rho_{1,N}\\
\vdots & \ddots & \vdots\\
\rho_{N,1} & \cdots & \rho_{N,N}
\end{bmatrix}
$$

where N is the total number of channels. The mean of the ith row represents the correlation between the ith channel and the other channels; a higher value indicates that channel i is more related to the other channels. Therefore, the N_s most highly correlated channels are selected in each of the N_t trials, and the N_s channels that appear most frequently among these N_t × N_s selections are chosen as the ideal channels.
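The two steps above can be sketched as follows. This is an illustrative reading of the CCS procedure, assuming trials arrive already time-sliced and band-pass filtered; `select_channels` is a hypothetical helper name:

```python
import numpy as np

def select_channels(trials, n_select):
    """Correlation-based channel selection (illustrative sketch).

    trials   : list of (n_channels, n_samples) arrays, one per trial,
               already time-sliced and band-pass filtered.
    n_select : number of channels (N_s) to keep.
    Returns indices of the channels most often ranked as highly
    correlated across the N_t trials.
    """
    n_channels = trials[0].shape[0]
    votes = np.zeros(n_channels, dtype=int)
    for trial in trials:
        # z-score normalization per channel (mean 0, std 1)
        z = (trial - trial.mean(axis=1, keepdims=True)) \
            / trial.std(axis=1, keepdims=True)
        R = np.corrcoef(z)                        # N x N Pearson matrix
        np.fill_diagonal(R, 0.0)                  # ignore self-correlation
        score = np.abs(R).mean(axis=1)            # mean correlation with others
        top = np.argsort(score)[::-1][:n_select]  # this trial's best channels
        votes[top] += 1
    # channels appearing most frequently across the N_t trials
    return np.argsort(votes)[::-1][:n_select]
```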

Feature Extraction Based on Regularized Common Spatial Pattern
The RCSP algorithm is an improved version of the CSP algorithm [5,7]. The CSP algorithm uses a spatial filter to maximize the variance of one class while minimizing the variance of the other class. The RCSP algorithm introduces regularization parameters α and β into the covariance calculation to obtain the regularized mean covariance matrix Q_class(α, β), which is calculated as:

$$
Q_{class}(\alpha,\beta)=(1-\alpha)\,P_{class}(\beta)+\frac{\alpha}{N_s}\,\mathrm{trace}\!\left(P_{class}(\beta)\right)I
$$

where trace(·) represents the matrix trace, I represents the identity matrix, and P_class is the average covariance matrix, which is calculated as follows:

$$
P_{class}(\beta)=\frac{1}{N_t}\sum_{i=1}^{N_t}\left[(1-\beta)\,C_{i}^{class}+\beta\,\hat{C}_{i}^{class}\right]
$$

where C_i^class is the normalized covariance matrix and Ĉ_i^class is the pair-wise covariance, which are defined as:

$$
C_{i}^{class}=\frac{E_i E_i^{T}}{\mathrm{trace}\!\left(E_i E_i^{T}\right)},\qquad
\hat{C}_{i}^{class}=\mathrm{cov}\!\left(E_i\right)
$$

where E_i is the ith EEG trial and cov(·) is the function that calculates the pair-wise covariance of each channel. To maximize the variance of one class while minimizing the variance of the other class, a filter ω is defined as:

$$
\omega=\arg\max_{\omega}\ \frac{\omega^{T}Q_{1}(\alpha,\beta)\,\omega}{\omega^{T}Q_{2}(\alpha,\beta)\,\omega}
$$

The above equation is transformed into a generalized eigenvalue problem by means of the Lagrange multiplier method:

$$
Q_{1}(\alpha,\beta)\,\omega=\lambda\,Q_{2}(\alpha,\beta)\,\omega
$$

The eigenvectors corresponding to the m largest and m smallest eigenvalues are assembled into the optimal spatial filter ω_{2m} ∈ R^{N_s × 2m}, and the EEG signal features are extracted as:

$$
f_{p}=\log\!\left(\frac{\mathrm{var}\!\left(\omega_{p}^{T}E\right)}{\sum_{q=1}^{2m}\mathrm{var}\!\left(\omega_{q}^{T}E\right)}\right),\quad p=1,\dots,2m
$$

For the convenience of comparing the performance of the proposed algorithm on different datasets, the parameters m, α, and β were fixed at 2, 0.4, and 0.01, respectively.
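The feature extraction step can be sketched as follows. This is our reading of the description above (the exact regularized covariance used by the authors may differ), with `scipy.linalg.eigh` solving the generalized eigenvalue problem:

```python
import numpy as np
from scipy.linalg import eigh

def rcsp_filters(trials_1, trials_2, m=2, alpha=0.4, beta=0.01):
    """Regularized CSP spatial filters (illustrative sketch).

    trials_k : list of (n_channels, n_samples) arrays for class k.
    Returns W with 2*m columns: the eigenvectors of the m smallest
    and m largest generalized eigenvalues.
    """
    def reg_cov(trials):
        n = trials[0].shape[0]
        covs = []
        for E in trials:
            C = E @ E.T / np.trace(E @ E.T)   # normalized covariance
            C_hat = np.cov(E)                 # pair-wise channel covariance
            covs.append((1 - beta) * C + beta * C_hat)
        P = np.mean(covs, axis=0)             # average covariance P_class
        # shrink toward a scaled identity (alpha regularization)
        return (1 - alpha) * P + alpha * np.trace(P) / n * np.eye(n)

    Q1, Q2 = reg_cov(trials_1), reg_cov(trials_2)
    vals, vecs = eigh(Q1, Q2)                 # solves Q1 w = lambda Q2 w
    idx = np.argsort(vals)
    keep = np.r_[idx[:m], idx[-m:]]           # m smallest + m largest
    return vecs[:, keep]

def csp_features(W, E):
    """Normalized log-variance features of the spatially filtered trial."""
    var = np.var(W.T @ E, axis=1)
    return np.log(var / var.sum())
```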

Feature Classification Based on Support Vector Machine
SVM is a binary classification model suitable for nonlinear and high-dimensional classification [33]. In this paper, a radial basis function (RBF) kernel is used, which evaluates the inner product of the high-dimensional feature space as a kernel computation in the low-dimensional input space, reducing the computational complexity.
Generalizability is an important indicator of a model's practical applicability. To improve generalizability, in addition to hyperparameter tuning, the model evaluation method can be adjusted. In this paper, five-fold cross-validation, implemented with the sklearn package in Python, was employed to estimate the accuracy of the SVM.
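A minimal example of this classification stage with scikit-learn, using toy feature vectors in place of the RCSP features (the standardization step is our addition, not stated in the paper):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature vectors standing in for RCSP log-variance features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (60, 4)),
               rng.normal(+1.0, 0.5, (60, 4))])
y = np.r_[np.zeros(60), np.ones(60)]

# RBF-kernel SVM; the kernel trick evaluates the high-dimensional
# inner product without forming the feature map explicitly.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5)     # five-fold cross-validation
print(f"mean CV accuracy: {scores.mean():.3f}")
```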

Results
This section describes the experimental data processing framework, the results, and the performance of the proposed model compared with four benchmark models.

Data Preprocessing
In this study, the data were preprocessed using the EEGLAB toolbox [34] in Matlab (The MathWorks Inc., Natick, MA, USA) and the mne package in Python 3.6 (Python Software Foundation, Wilmington, DE, USA). Preprocessing included downsampling, filtering, time slicing, baseline correction, and artifact rejection.
For DS1, the signals were downsampled to 250 Hz by the dataset provider. Since significant power changes occur in the alpha (8~12 Hz) and beta (14~30 Hz) rhythms of EEG, the EEG data were filtered between 1 Hz and 50 Hz after notch filtering. In the experiment, data from 2 to 7 s were utilized, and baseline correction was performed using data from 0 to 2 s. For artifact rejection, source derivations were calculated for the central electrodes based on their neighboring electrodes, and the boundary electrodes were likewise calculated from their neighbors.
For DS2, we retained only the 26 electrodes covering the sensorimotor area. The EEG data were filtered between 1 Hz and 50 Hz after 50 Hz notch filtering. The analysis used 500 samples of data from 2 to 6 s, and baseline correction was performed using data from 0 to 1 s. The 1000 Hz raw EEG signal was then downsampled to 125 Hz to reduce the data volume, enhancing the efficiency of signal processing and analysis. To validate the effectiveness of the algorithm on data containing artifacts, no specific artifact removal was conducted.
For DS3, the EEG data were filtered between 1 Hz and 50 Hz after 50 Hz notch filtering. In the experiment, data from 2 to 6 s were utilized, and baseline correction was performed using data from 0 to 2 s. The 1000 Hz raw EEG signal was then downsampled to 250 Hz, and no specific artifact removal was conducted.
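The shared filtering and downsampling steps can be sketched with SciPy, shown here with the DS2 parameters (50 Hz notch, 1~50 Hz band-pass, 1000 Hz to 125 Hz); this is an illustrative stand-in for the EEGLAB/mne preprocessing actually used, and the filter orders are our own assumptions:

```python
import numpy as np
from scipy import signal

def preprocess(eeg, fs=1000, fs_new=125, band=(1.0, 50.0), notch=50.0):
    """Notch filter, band-pass filter, then downsample (sketch).

    eeg : (n_channels, n_samples) raw EEG at sampling rate fs.
    """
    # 50 Hz notch to suppress power-line interference
    b, a = signal.iirnotch(notch, Q=30.0, fs=fs)
    eeg = signal.filtfilt(b, a, eeg, axis=1)
    # 1-50 Hz band-pass (4th-order Butterworth, zero-phase)
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    eeg = signal.sosfiltfilt(sos, eeg, axis=1)
    # downsample, e.g., 1000 Hz -> 125 Hz for DS2
    return signal.resample_poly(eeg, up=1, down=fs // fs_new, axis=1)
```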
For all datasets, the SSA time optimization range was set to within 4 s after the onset of the MI task, enabling unrestricted exploration of the optimal time segment. The frequency optimization range was 1~40 Hz, encompassing the sensorimotor rhythms alpha (8~12 Hz) and beta (14~30 Hz), which are considered the main frequency bands related to MI. The three datasets differed in channel count, artifact removal, and motor intention. These differences helped validate the effectiveness of the algorithm on data with different channel configurations, with artifacts, and across different motor imagery tasks.

Whole Framework
In this work, we used SSA to determine the optimal time-frequency segments, specifically the time window and frequency band of the EEG. Additionally, optimal channels were selected using the CCS method, and the RCSP method was applied to extract features before SVM classification. The framework of the proposed SSA-CCS-RCSP is shown in Figure 7. The procedure is as follows: (1) Initialize the positions of n (n = 10) sparrows, corresponding to time-frequency segments; (2) Split the experimental data into a training set D t1 and a test set D t2 in a 7:3 ratio; (3) Apply time-frequency filtering to D t1 based on the current time-frequency segment; (4) Use five-fold cross-validation, where D t1 is split into a training set D t3 and a validation set D t4 in each iteration (in a 4:1 ratio); (5) Calculate the optimal channels for D t3 using the CCS algorithm and remove irrelevant channel data from both D t3 and D t4; (6) Compute RCSP spatial filters on D t3 and extract features from both D t3 and D t4; (7) Train an SVM classifier on D t3 and evaluate the accuracy (fitness) on the validation set D t4; (8) Update the time-frequency segments of each sparrow based on the fitness; (9) Repeat steps (3) to (8) until the predefined iteration limit is reached (set to 20 in this experiment); the current optimal result is the best time-frequency segment; (10) Use D t1 as the training set and D t2 as the test set, with ratios determined by the dataset providers' specifications: 1:1 for DS1, 3:1 for DS2, and 3:7 for DS3. Based on the best time-frequency segment obtained, calculate the accuracy of the final model on the test set following steps (5) to (7).
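A simplified sketch of the outer optimization loop is given below. The true SSA uses producer/scrounger/scout update rules, which are replaced here by a generic move-toward-best update, and the fitness function is a toy placeholder for the cross-validated accuracy of the CCS-RCSP-SVM pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Search bounds: [t_start, t_end] within 0-4 s, [f_low, f_high] within 1-40 Hz.
LOW = np.array([0.0, 0.0, 1.0, 1.0])
HIGH = np.array([4.0, 4.0, 40.0, 40.0])

def fitness(seg):
    """Placeholder for the five-fold CV accuracy of the full pipeline.
    Here: a toy function peaking at the segment [0.5, 3.5] s, [8, 30] Hz."""
    target = np.array([0.5, 3.5, 8.0, 30.0])
    return -np.sum((seg - target) ** 2)

def repair(seg):
    """Clip to bounds and enforce start < end for both time and frequency."""
    seg = np.clip(seg, LOW, HIGH)
    seg[:2] = np.sort(seg[:2])
    seg[2:] = np.sort(seg[2:])
    return seg

# Initialize n = 10 sparrows (candidate segments); iterate 20 times as in the paper.
pop = np.array([repair(rng.uniform(LOW, HIGH)) for _ in range(10)])
best = max(pop, key=fitness)
for _ in range(20):
    # Simplified update: perturb each candidate toward the current best.
    pop = np.array([repair(p + 0.3 * (best - p) + rng.normal(0, 0.2, 4)) for p in pop])
    cand = max(pop, key=fitness)
    if fitness(cand) > fitness(best):
        best = cand
print("best time-frequency segment:", np.round(best, 2))
```

In the actual framework, evaluating `fitness` for one sparrow means running steps (3) to (7) once.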
In the process of optimizing time-frequency segments with the SSA, channel selection was accomplished by CCS through the computation of signal similarity across channels. An imperative consideration for effective channel selection is maintaining a high signal-to-noise ratio; consequently, CCS was applied after time segmentation and frequency filtering. Prior to classification, RCSP was employed for feature extraction.
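The channel selection step can be sketched as follows. The paper's CCS considers correlation coefficients of features between channel pairs; this illustrative variant correlates the filtered signals directly and keeps the channels most correlated with the rest:

```python
import numpy as np

def ccs_select(eeg, n_keep):
    """Correlation-based channel selection (illustrative variant).

    eeg: array of shape (n_channels, n_samples), already time-frequency filtered.
    Keeps the n_keep channels with the highest mean absolute correlation
    to all other channels.
    """
    corr = np.corrcoef(eeg)            # (n_ch, n_ch) correlation matrix
    np.fill_diagonal(corr, 0.0)        # ignore self-correlation
    score = np.abs(corr).mean(axis=1)  # mean |r| of each channel
    return np.sort(np.argsort(score)[::-1][:n_keep])

rng = np.random.default_rng(1)
base = rng.normal(size=(1, 1000))
# 6 synthetic channels: the first 4 share a common source, the last 2 are noise.
eeg = np.vstack([base + 0.3 * rng.normal(size=(4, 1000)),
                 rng.normal(size=(2, 1000))])
print(ccs_select(eeg, 3))  # expected to pick among the correlated channels 0-3
```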

Experimental Results
The performance of the proposed algorithm was evaluated through experimental comparisons on the datasets described earlier. The comparison involved standard feature extraction methods such as CSP and RCSP, as well as their fusion with SSA or CCS. Additionally, FBCSP, which integrates frequency band processing, was included for a comprehensive comparison. The classification tasks were executed using an SVM classifier, and the results are presented in Table 1. The accuracy of the fusion methods using SSA or CCS surpassed that of the original algorithms. Moreover, for all datasets, the classification accuracy of our approach was the highest. The results indicate that the proposed algorithm effectively enhanced the efficiency of MI decoding.

Analysis of Variance for Results
We conducted a two-way analysis of variance on the experimental results to determine whether the effects of algorithm and dataset on classification accuracy were statistically significant; the first factor was the algorithm and the second was the dataset. The results in Table 2 show that the interaction between dataset and algorithm was not statistically significant (p > 0.05). However, the algorithm factor had a significant impact on classification accuracy (p < 0.05). The dataset factor also significantly influenced classification accuracy (p < 0.05), potentially stemming from individual differences and variations in collection environments. In Table 3, post hoc analysis with the least significant difference test shows statistical differences between SSA-CCS-RCSP and the three algorithms implementing a CSP variant alone (namely FBCSP, CSP, and RCSP; p < 0.05). This indicates that adding SSA, CCS, and RCSP to the classification chain significantly increases classification performance.
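For readers wishing to reproduce this kind of analysis, a balanced two-way ANOVA (algorithm × dataset, with replicates) can be computed directly from sums of squares; the accuracies below are synthetic, not the paper's results:

```python
import numpy as np
from scipy.stats import f as f_dist

def two_way_anova(data):
    """Balanced two-way ANOVA with replication.
    data: shape (a, b, n) - a algorithm levels, b dataset levels,
    n replicate accuracies per cell. Returns {factor: (F, p)}."""
    a, b, n = data.shape
    grand = data.mean()
    ss_a = b * n * ((data.mean(axis=(1, 2)) - grand) ** 2).sum()
    ss_b = a * n * ((data.mean(axis=(0, 2)) - grand) ** 2).sum()
    cell = data.mean(axis=2)
    ss_ab = n * ((cell - data.mean(axis=(1, 2))[:, None]
                  - data.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
    ss_e = ((data - cell[:, :, None]) ** 2).sum()
    df_a, df_b, df_ab, df_e = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
    out = {}
    for name, ss, df in [("algorithm", ss_a, df_a), ("dataset", ss_b, df_b),
                         ("interaction", ss_ab, df_ab)]:
        F = (ss / df) / (ss_e / df_e)
        out[name] = (F, f_dist.sf(F, df, df_e))
    return out

rng = np.random.default_rng(0)
# Toy accuracies: 3 algorithms x 3 datasets x 5 replicates, main effects only.
algo_eff = np.array([0.0, 2.0, 5.0])[:, None, None]
ds_eff = np.array([0.0, -3.0, 1.0])[None, :, None]
data = 85 + algo_eff + ds_eff + rng.normal(0, 1.0, (3, 3, 5))
res = two_way_anova(data)
for k, (F, p) in res.items():
    print(f"{k}: F = {F:.2f}, p = {p:.4f}")
```

Because the synthetic data contain only main effects, the algorithm and dataset factors come out significant while the interaction typically does not, mirroring the pattern reported in Table 2.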

Optimal Time-Frequency Segment Distributions
By setting different target channel numbers, the CCS algorithm segmented the EEG signals of subjects K3b and L1b in DS1 into 10 subsets with varying channel numbers, and the proposed algorithm was employed to calculate the optimal time-frequency segment. Figure 8a,b illustrate the distributions of the optimal time-frequency bands for the two subjects in DS1. From Figure 9a,b, it can be seen that the signal of subject K3b has significant energy around 10 Hz and 20 Hz; correspondingly, Figure 8a shows that the optimized frequency band for K3b is prominently concentrated around 10 Hz and 20 Hz. The same phenomenon can be observed for subject L1b in Figures 8b and 9c,d. Additionally, the time and frequency distributions of the two subjects differed significantly, indicating individual differences in EEG signals. From Figure 9, it can be observed that L1b exhibited an energy distribution in a significantly lower frequency band than K3b, and this is reflected in Figure 8. It can be inferred that the optimal time-frequency bands obtained by the proposed algorithm effectively reflect the changes in the energy of the EEG signals.

Effect of SSA on Channel Selection
We investigated the impact of SSA on channel selection in CCS. The SSA algorithm was applied to DS1, and 20 channels were selected by the CCS algorithm for K3b and L1b, as shown in Figure 10. The selected channels of both participants cover the primary sensorimotor region, which is considered a significant area for ERD/ERS. The overlap between the selected channels and the significant regions of MI-EEG signals suggests that the chosen channels may be neurophysiologically meaningful.
From Figure 10, it can be observed that channel selection combined with time-frequency optimization yielded results similar to the original CCS. Due to differences in brain activation regions across frequency bands during the motor imagery task, some individual channels still differed. Considering the classification results in Table 1, our proposed method was more effective across subjects.

Effect of SSA on RCSP Feature Distribution
To compare the distributions of features with and without SSA, Figure 11 illustrates the feature distributions of subjects K3b and L1b in DS1. The feature distributions after optimization with SSA became more discriminative, which explains, at the feature distribution level, the improvement in accuracy.
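A minimal sketch of RCSP feature extraction is given below, assuming simple shrinkage regularization of the class covariance matrices (the paper's exact regularization scheme may differ) and standard log-variance features:

```python
import numpy as np
from scipy.linalg import eigh

def rcsp_filters(trials_a, trials_b, gamma=0.1, m=2):
    """Regularized CSP (simplified sketch).
    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    gamma: shrinkage toward a scaled identity; m: filter pairs to keep."""
    def avg_cov(trials):
        C = np.mean([np.cov(t) for t in trials], axis=0)
        # Shrinkage regularization toward a scaled identity matrix.
        return (1 - gamma) * C + gamma * np.trace(C) / C.shape[0] * np.eye(C.shape[0])
    Ca, Cb = avg_cov(trials_a), avg_cov(trials_b)
    # Generalized eigenproblem: Ca w = lambda (Ca + Cb) w.
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    # Keep m filters from each end (the most discriminative directions).
    idx = np.r_[order[:m], order[-m:]]
    return vecs[:, idx].T

def rcsp_features(trial, W):
    """Log-variance features of the spatially filtered trial."""
    z = W @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())

rng = np.random.default_rng(2)
# Synthetic 8-channel trials where class variance differs on one source each.
A = rng.normal(0, 1.0, (30, 8, 250)); A[:, 0] *= 3.0
B = rng.normal(0, 1.0, (30, 8, 250)); B[:, 1] *= 3.0
W = rcsp_filters(A, B, gamma=0.1, m=2)
print(rcsp_features(A[0], W).shape)  # 2m = 4 features per trial
```

With m = 2, each trial is reduced to a 4-dimensional feature vector, matching the RCSP(m = 2) setting noted in the Figure 11 caption.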

Effects of CCS and SSA on Improving Classification
We compared the CSP and RCSP algorithms with their fusion algorithms incorporating CCS or SSA on DS1, DS2, and DS3. The comparison results are presented in Table 4. Compared with the CSP and RCSP algorithms, the mean accuracy of the CCS fusion algorithms improved (by 2.83% and 3.72% for DS1, 2.17% and 2.19% for DS2, and 15.58% and 10.25% for DS3, respectively). Compared with the CSP and RCSP algorithms, the mean accuracy of the SSA-RCSP algorithm improved (by 3.72% and 5.22% for DS1, 3.92% and 3.94% for DS2, and 13.00% and 7.67% for DS3, respectively). The proposed SSA-CCS-RCSP achieved the highest accuracy on all datasets, at 99.11%, 87.70%, and 90.50%, respectively.

The results show that the SSA and CCS fusion algorithms enhanced classification accuracy by optimizing time-frequency segments and channel selection, compared with algorithms using non-customized time-frequency segments and all channels. Furthermore, the fusion algorithms incorporating CCS effectively reduced the number of required channels and the amount of data to be processed.

Performance of SSA with Different Channel Numbers
The channel counts of EEG datasets are usually not uniform, necessitating an examination of the suitability of SSA for datasets with varying channel numbers. The CCS algorithm segmented the EEG signals into subsets with varying channel numbers, and the SSA-based algorithm was employed to obtain the optimal time-frequency segment.
Before employing the CCS-RCSP algorithm to extract signal features, time-frequency filtering was performed separately using the optimized segments and the non-customized parameters (0~4 s and 8~30 Hz). The SVM classification accuracies are illustrated in Figure 12.
It can be observed from Figure 12 that, with data from different channel counts, the accuracy of the SSA-CCS-RCSP method consistently exceeded that of the methods based on non-customized time-frequency ranges. The results indicate that the algorithm can be generalized to data with different channel counts.

Conclusions
To address the poor generalization of algorithms using non-customized time-frequency segments, caused by individual variability, this paper proposed a new approach to optimizing the time-frequency segment. Further, CCS was employed to select channels relevant to the MI tasks, while RCSP was utilized for feature extraction. The performance of the proposed method was evaluated on three datasets: the BCI Competition III Dataset IIIa, the Chinese Academy of Medical Sciences autonomous dataset, and the BCI Competition IV Dataset 1. The results show that the classification performance of the proposed algorithm outperformed the algorithms in the comparison group, effectively addressing the issue of time-frequency segment optimization.
Although this algorithm consumes more computational resources than those based on non-customized time-frequency segments, the computational cost is incurred only during model training (approximately 2000 s for data with 59 channels, 1000 sampling points, and 200 trials), not during testing (which runs in almost real time); thus, it does not impede practical implementation. Accordingly, we believe that the proposed methods can significantly improve MI classification performance. Meanwhile, the proposed methods would potentially contribute

Figure 1 .
Figure 1. Distribution of channels of DS1. The circles and numbers correspond to the positions of the EEG electrodes.

Figure 2 .
Figure 2. Timeline of one trial of DS1.

Figure 5 .
Figure 5. Timeline of one trial of DS2.

Figure 6 .
Figure 6. Timeline of one trial of DS3.

Figure 7 .
Figure 7. Illustration of the proposed framework. The procedure is as follows: (1) Initialize the positions of n (n = 10) sparrows, corresponding to time-frequency segments; (2) Split the experimental data into a training set D t1 and a test set D t2 in a 7:3 ratio; (3) Apply time-frequency filtering to D t1 based on the current time-frequency segment; (4) Utilize a five-fold cross-validation method, where D t1 is split into a training set D t3 and a validation set D t4 in each iteration (in a 4:1 ratio); (5) Calculate the optimal channels for D t3 using the CCS algorithm and remove irrelevant channel data from both D t3 and D t4; (6) Compute RCSP spatial filters on D t3 and extract features from both D t3 and D t4; (7) Train an SVM classifier on D t3 and evaluate the accuracy (fitness) on the validation set D t4; (8) Update the time-frequency segments for each sparrow based on the fitness; (9) Repeat steps (3) to (8) until the predefined iteration limit is reached; in this experiment, the iteration limit was set to 20. The current optimal result obtained is the best time-frequency segment; (10) Use D t1 as the training set and D t2 as the test set. The ratios are determined according to the specifications of the dataset providers: 1:1 for DS1, 3:1 for DS2, and 3:7 for DS3. Based on the best time-frequency segment obtained, calculate the accuracy of the current model on the test set according to steps (5) to (7).

Figure 8 .
Figure 8. Optimal time-frequency segment distributions. (a) Frequency start and end point distribution; (b) time segment start and end point distribution. *** and ** above certain lines denote that the distributions of time and frequency between the two subjects were significantly different at the 0.001 and 0.01 significance levels, respectively. The two closely spaced dashed lines at the top and bottom represent the maximum and minimum values, while the dashed line in the middle represents the median.

Figure 10 .
Figure 10. Channel selection results with and without SSA. (a) Channel selection results for K3b; (b) channel selection results for L1b. The channels selected by CCS are annotated in green.

Figure 11 .
Figure 11. Feature distribution with and without SSA. (a) Feature distribution for K3b; (b) feature distribution for L1b; (c) 3D feature distribution for K3b; (d) 3D feature distribution for L1b. The axes represent the feature values extracted by RCSP (m = 2); each dot corresponds to the features of one trial.

Figure 12 .
Figure 12. Comparison of classification accuracies between optimized and non-customized time-frequency segments. (a) Classification accuracies for K3b; (b) classification accuracies for L1b.

Table 2 .
Two-way analysis of variance for experimental results.

Table 3 .
Post hoc analysis with least significant difference.

Table 4 .
Mean classification accuracies achieved by CSP, RCSP, SSA-RCSP, CCS-CSP, CCS-RCSP, and SSA-CCS-RCSP on DS1, DS2, and DS3, with SVM as the classifier. The abbreviation NC represents the average number of channels selected across subjects by CCS at the highest accuracy level.
