Feature Selection for Motor Imagery EEG Classification Based on Firefly Algorithm and Learning Automata

Motor Imagery (MI) electroencephalography (EEG) is widely studied for its non-invasiveness, easy availability, portability, and high temporal resolution. As for MI EEG signal processing, the high dimensions of features represent a research challenge. It is necessary to eliminate redundant features, which not only create an additional overhead of managing the space complexity, but also might include outliers, thereby reducing classification accuracy. The firefly algorithm (FA) can adaptively select the best subset of features and improve classification accuracy. However, the FA is easily entrapped in a local optimum. To solve this problem, this paper proposes a method of combining the firefly algorithm and learning automata (LA) to optimize feature selection for motor imagery EEG. We employed a method of combining common spatial pattern (CSP) and local characteristic-scale decomposition (LCD) algorithms to obtain a high dimensional feature set, and classified it by using the spectral regression discriminant analysis (SRDA) classifier. Both data from the fourth Brain–Computer Interface Competition (BCI Competition IV) and real-time data acquired in our designed experiments were used to verify the validity of the proposed method. Compared with genetic and adaptive weight particle swarm optimization algorithms, the experimental results show that our proposed method effectively eliminates redundant features and improves the classification accuracy of MI EEG signals. In addition, a real-time brain–computer interface system was implemented to verify the feasibility of our proposed methods being applied in practical brain–computer interface systems.


Introduction
The motor imagery brain-computer interface (BCI) [1] is significant for stroke patients: it can help in the recovery of damaged nerves and plays an important role in assisting with rehabilitation training. The application spectrum of the BCI also extends to navigation, healthcare, military services, robotics, virtual gaming, communication, and controls [2,3]. The classification accuracy of motor imagery is of great importance to the performance of BCI systems. As for EEG signal processing, the high dimensions of features represent a technical challenge. It is necessary to eliminate the redundant features, which not only create an additional overhead of managing the space complexity but also might include outliers, thereby reducing classification accuracy [4].
Numerous feature selection approaches have been proposed, including principal component analysis [5], independent component analysis [6], sequential forward search [7], and kernel principal component analysis.

Data Description
The dataset includes four imagination movements: left hand, right hand, both feet, and tongue. Nine subjects recorded EEG data sets on two different days. During the experiment, the subjects sat in front of a computer, reacting according to computer prompts. A single trial lasted 8 s. The experimental timing scheme of the paradigm is shown in Figure 2 (taken from [17]). The signals were sampled at 250 Hz and band-pass filtered between 0.5 Hz and 100 Hz. The sensitivity of the amplifier was set to 100 μV. An additional 50 Hz notch filter was enabled to suppress power line noise. Each subject's data set consisted of two parts, a training set and an evaluation set, together comprising the 576 trials of data set 2a [17].
Before the feature extraction of EEG signals, the signal should be effectively pre-processed to enhance the signal-to-noise ratio and remove artifacts. According to the characteristics of event-related desynchronization and event-related synchronization, the data were band-pass filtered between 8 Hz and 30 Hz [18] by using a fifth-order Butterworth band-pass filter.
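This filtering step can be sketched with SciPy. The sampling rate below matches the competition data; the (channels, samples) array layout and the toy signal are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_mi(eeg, fs, low=8.0, high=30.0, order=5):
    """Fifth-order Butterworth band-pass (8-30 Hz), applied with
    filtfilt for zero phase shift. eeg has shape (channels, samples)."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=-1)

# toy check: a 10 Hz component passes, a 50 Hz component is rejected
fs = 250.0
t = np.arange(0.0, 4.0, 1.0 / fs)
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
y = bandpass_mi(x[np.newaxis, :], fs)
```

Applying the filter with `filtfilt` (forward and backward) avoids the phase distortion a single causal pass would introduce, at the cost of doubling the effective filter order.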

Experimental Setup
The EEG signals were acquired by using a UE-16B EEG amplifier. It contained 16 channels (Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6) and a low-pass filter with a cut-off frequency of 15-120 Hz. The sampling frequency of the amplifier was 1000 Hz. In our experiment, all 16 channels were selected as the data source, the ground channel was placed at the forehead, and the reference channels were positioned at the mastoids (A1 and A2). The subject wore the EEG cap, which was connected to the signal acquisition device to enable data collection.
Four healthy subjects, two males and two females, right-handed, aged between 22 and 25, and without any neurological diseases, participated in the experiment. In a quiet and dark environment, the subjects kept their muscles relaxed and focused on the experiment to reduce interference from environmental factors and muscle strain. Before the experiment, the subjects were given appropriate motor imagery training to familiarize them with the experimental procedure. All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted with approval from the Wuhan University of Technology.
Four types of motor imagery, including left hand, right hand, both feet, and tongue movements, were used as thinking tasks, and the subjects were asked to carry out the corresponding action imagery according to the screen prompt. The training data acquisition experiment procedure is as follows:
• At t = 0 s, a beep is generated with a duration of 1 s. Meanwhile, the screen displays a gray background image that prompts the subject to prepare.
• At t = 2 s, a prompt picture appears on the screen, prompting the subject to perform the appropriate MI. The subject's imagery continues until the prompt picture disappears.
• At t = 6 s, the prompt image on the screen disappears and a gray background image reappears, prompting the subject to rest for two seconds.
• At t = 8 s, a single trial ends and the next trial begins.
Training data acquisition and experimental process design are shown in Figure 3.

System Design
In order to verify the feasibility of the proposed algorithm, we designed a real-time BCI system based on MI. The system includes three modules: signal acquisition, signal processing, and the human-computer interaction interface. The signal acquisition module collects, filters, and sorts the original electroencephalography (EEG) signal, and then sends the sorted EEG data to the signal processing module through socket communication. The signal processing module processes the EEG data, translates them into the corresponding control instructions, and feeds the results back to the user. The human-computer interaction interface module is used for user information storage, experimental flow control, visual prompts, and control instruction output. The overall structure of the real-time BCI system is shown in Figure 4.

According to the above experimental design, the motor imagery task begins at t = 2 s and ends at t = 6 s. We segmented the data in this duration into several epochs and, through a series of experiments, found that the data between 2.5 s and 3.5 s achieved the best performance, so this epoch was selected for feature extraction and feature selection. Then, a maze game was displayed, as seen in Figure 5. The mapping relationship between the motor imagery tasks and movement directions is shown in Table 1.


Feature Extraction
Common Spatial Pattern
The common spatial pattern (CSP) is the most commonly used method for MI EEG signal feature extraction [19]. It is especially effective for the two-class EEG classification problem. We denote the EEG data of trial i for class A as X_i, a matrix of size N × M, where N represents the number of channels and M represents the number of sample points in the time domain of a trial. Suppose there are two classes, A and B. The covariance matrices of the two kinds of data, R_A and R_B, are computed, and the sum of the covariance matrices, R_A + R_B, is decomposed to obtain the eigenvector matrix U_0 and the eigenvalue matrix ε. The whitening transformation matrix P is constructed as P = ε^(−1/2) U_0^T. R_A and R_B are then whitened and eigenvalue decomposition is performed, yielding eigenvector matrices U_A and U_B that share the same eigenvectors. In general, the first m and last m eigenvectors are extracted from U_A (or U_B) and combined into a matrix U, which represents the spatial features of the two types of signals. The spatial filter W is then constructed as W = U^T P.
Using the projection matrix W, the data from each trial, X_i, can be projected as

Z = W X_i. (1)

After CSP projection, N rows are selected to represent each trial. Let Z_p (p = 1, 2, ..., N) be defined as the variance of row p of Z. Then, usually, the p-th component of the feature vector for the i-th trial is computed as the logarithm of the normalized variance, as in (2):

f_p = log( Z_p / Σ_{q=1}^{N} Z_q ). (2)

The feature vector f = (f_1, f_2, ..., f_N) is then used for designing classifiers for motor imagery tasks.
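The two-class CSP construction above can be sketched end to end. The toy data, the four-channel layout, and the choice of m = 1 retained filter pairs are illustrative assumptions of this sketch.

```python
import numpy as np

def csp_filters(trials_a, trials_b, m=1):
    """Two-class CSP: whiten the composite covariance, diagonalize one
    class's whitened covariance, keep the first and last m eigenvectors."""
    def avg_cov(trials):
        covs = [X @ X.T / np.trace(X @ X.T) for X in trials]
        return np.mean(covs, axis=0)
    Ra, Rb = avg_cov(trials_a), avg_cov(trials_b)
    evals, U0 = np.linalg.eigh(Ra + Rb)
    P = np.diag(evals ** -0.5) @ U0.T          # whitening matrix
    _, Ua = np.linalg.eigh(P @ Ra @ P.T)
    U = np.hstack([Ua[:, :m], Ua[:, -m:]])     # first and last m columns
    return U.T @ P                             # spatial filter W

def csp_features(W, X):
    """Log of normalized row variances after projection, as in Eq. (2)."""
    Z = W @ X
    var = np.var(Z, axis=1)
    return np.log(var / var.sum())

# toy trials: class A has extra variance in channel 0, class B in channel 1
rng = np.random.default_rng(0)
scale_a = np.array([3.0, 1.0, 1.0, 1.0])[:, None]
scale_b = np.array([1.0, 3.0, 1.0, 1.0])[:, None]
trials_a = [scale_a * rng.standard_normal((4, 200)) for _ in range(20)]
trials_b = [scale_b * rng.standard_normal((4, 200)) for _ in range(20)]
W = csp_filters(trials_a, trials_b, m=1)
f = csp_features(W, trials_a[0])
```

Normalizing each covariance by its trace before averaging is a common convention; other normalizations are possible.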
Since this paper addresses four-class MI task classification, the CSP needs to be expanded to satisfy the technical requirements. There are two commonly used expansion strategies: one-versus-one and one-versus-rest. One-versus-rest was chosen to expand the CSP. Therefore, four feature vectors, f^(1), f^(2), f^(3), f^(4), are generated for each trial and concatenated to form the whole spatial feature vector F_2 ∈ R^(1×4N), where N represents the number of all channels. The signal for each trial consists of 22 channels of data and 9 channels of ISC component data, so N = 31.

Local Characteristic-Scale Decomposition
Based on the definition of the intrinsic scale component (ISC), a real-valued signal x(t) (t > 0) can be decomposed into a number of ISCs by using the local characteristic-scale decomposition (LCD) method in the same manner as [20]. The conditions on an ISC component eliminate riding waves, guarantee a single waveform, and ensure the smoothness and symmetry of the ISC component waveform. They ensure that the ISC component possesses a single mode between two adjacent extrema and corresponds locally with a sine curve. Therefore, the instantaneous frequency of the ISC has physical significance.
A signal x(t) is decomposed into n ISCs and a residue u_n(t) as

x(t) = Σ_{i=1}^{n} ISC_i(t) + u_n(t).

To guarantee that the ISC components meet the definitions, a criterion for the sifting process should be determined. The standard deviation (SD) between two consecutive sifting results h_{k−1}(t) and h_k(t) is adopted,

SD = Σ_t [ (h_{k−1}(t) − h_k(t))^2 / h_{k−1}^2(t) ],

and sifting stops when the SD falls below a preset threshold. Since time consumption is still an issue, only the three channels with the maximum contribution, C3, C4 and Cz, are processed with LCD, and the data of each channel are decomposed into three ISCs.
Thus, after processing the EEG signals with LCD, the time-frequency features in a trial are extracted as F_1 = [f_11, f_12, ..., f_1K] ∈ R^(1×KP), where K is the total number of ISC components in each experiment, and P is the number of features selected from each ISC component. In our experiment, K = 9 and the value of P is set to 20.

Feature Fusion
There are two kinds of feature fusion methods: serial feature fusion and parallel feature fusion. Among them, serial feature fusion connects a variety of features after they are normalized, which is simple. In this paper, we employ the serial feature fusion strategy to fuse the features of the spatial and frequency domains.
Assuming the EEG feature vector after processing and serial feature fusion is F ∈ R^(1×(KP+4N)), then F = [F_1, F_2]. We selected three EEG channels, C3, C4 and Cz, that are believed to be more involved in motor imagery tasks for LCD decomposition [21]. Nine ISC components were obtained and the frequency domain features, F_1, were extracted. Then, the original signals of the 22 channels were combined with the nine ISC components obtained by LCD decomposition, and the spatial features F_2 were extracted from the resulting 31 channels by the CSP method. The flow chart of the feature extraction algorithm is shown in Figure 6.
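Serial feature fusion amounts to normalizing each feature vector and concatenating them. The z-score normalization below is one common choice and is an assumption of this sketch; the vector lengths match the paper's sizes (KP = 9 × 20 = 180, 4N = 124).

```python
import numpy as np

def serial_fuse(F1, F2):
    """Serial feature fusion: normalize each feature vector, then
    concatenate frequency-domain (F1) and spatial (F2) features."""
    def zscore(v):
        return (v - v.mean()) / (v.std() + 1e-12)
    return np.concatenate([zscore(F1), zscore(F2)])

# illustrative sizes: 180 LCD features and 124 CSP features
F1 = np.random.default_rng(1).standard_normal(180)
F2 = np.random.default_rng(2).standard_normal(124)
F = serial_fuse(F1, F2)
```

The fused vector has length KP + 4N = 304, matching F ∈ R^(1×(KP+4N)) above.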

Feature Selection
A brief description of our proposed feature selection scheme (FA-LA) is as follows:

Fitness Model Establishment
In order to select a feature subset from the D features, we set a population with N members. Each member was initialized as a vector θ_i = [w_i1, w_i2, ..., w_iD], w_ij ∈ [0, 1], in which i = 1, 2, ..., N. The parameter w_ij represents an activation threshold: a feature is activated when its value exceeds 0.5. The Fitness(i) of the i-th population member was evaluated on the basis of the average classification accuracy (CA_v) on the validation set, using the spectral regression discriminant analysis (SRDA) classifier trained on the training set. The fitness function is Fitness(i) = 1/CA_v, so feature selection is treated as a minimization problem.
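The thresholding and fitness rules can be sketched directly; the weight values below are made up for illustration.

```python
import numpy as np

def active_features(theta, threshold=0.5):
    """A member's weight vector theta (D values in [0, 1]) activates
    exactly the features whose weights exceed the threshold."""
    return np.flatnonzero(theta > threshold)

def fitness(accuracy):
    """Fitness = 1 / CAv, so maximizing accuracy becomes minimization."""
    return 1.0 / accuracy

theta = np.array([0.9, 0.2, 0.7, 0.4, 0.51])
idx = active_features(theta)   # features 0, 2, and 4 are active
```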
According to the obtained fitness values, the state transition probability matrix was updated.

The Firefly Algorithm
The core idea of the FA is that a firefly is attracted by the firefly whose absolute brightness is largest and so updates its location according to the corresponding formula [16]. In the following part, the absolute brightness of the firefly and the attraction between two fireflies are modeled, respectively, and then the updated formula is given.
The standard firefly algorithm proceeds in three stages: initialization, firefly location update, and firefly brightness update. The information flow of the firefly algorithm is demonstrated in Figure 7. In the initial stage, the algorithm parameters are set and the positions of the fireflies are initialized. Then, the position vector of each firefly is fed into the objective function to initialize its brightness. According to the brightness of the fireflies and the rules of attraction between them, the locations of all fireflies are updated in the update stage. During the brightness update, the new position vector of each firefly is fed into the objective function to update its brightness. The algorithm terminates when it reaches the predetermined number of iterations or when it obtains the optimum target value.
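The location update follows the standard FA rule, x_i ← x_i + β0·exp(−γ·r²)·(x_j − x_i) + α·(rand − 0.5), moving firefly i toward a brighter firefly j. A minimal sketch follows; the parameter values are illustrative, and positions are clipped to [0, 1] so they remain valid activation thresholds.

```python
import numpy as np

def firefly_step(xi, xj, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """Move firefly xi toward the brighter firefly xj: attraction
    beta0 * exp(-gamma * r^2) plus a small random walk."""
    if rng is None:
        rng = np.random.default_rng(0)
    r2 = float(np.sum((xi - xj) ** 2))
    beta = beta0 * np.exp(-gamma * r2)
    step = xi + beta * (xj - xi) + alpha * (rng.random(xi.shape) - 0.5)
    return np.clip(step, 0.0, 1.0)   # keep thresholds in [0, 1]

xi = np.array([0.1, 0.9])
xj = np.array([0.8, 0.3])
xnew = firefly_step(xi, xj)
```

The exponential attenuation means distant fireflies exert little pull, which is what γ controls (and what the learning automaton tunes in the proposed FA-LA scheme).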
Adaptive Selection of Parameters for the FA
The state transition probability matrix C is initialized uniformly with 0.05 (due to the lack of a priori information, all values are equally likely) for the parameter γ of the FA at 20 quantized levels in (0, 1]. That is, each row is [C_1, C_2, ..., C_20], so the order of C is N × 20. When selecting γ for the i-th member, we used the roulette method: after generating a random number r in (0, 1], the chosen level j satisfies the inequality

Σ_{x=1}^{j−1} C_ix < r ≤ Σ_{x=1}^{j} C_ix.

Updating the State Transition Probability Matrix
In the m-th generation, we chose C_j for the i-th member, then calculated the fitness value Fitness(i, C_j, m) and compared it with the fitness value Fitness(i, m − 1) of the i-th member of the (m − 1)-th generation. According to the result of the comparison, we used the linear reinforcement scheme to update the state transition probabilities C_ix(m) to obtain C_ix(m + 1).
If Fitness(i, C_j, m) < Fitness(i, m − 1), the chosen action is rewarded:

C_ij(m + 1) = C_ij(m) + a (1 − C_ij(m)),
C_ix(m + 1) = (1 − a) C_ix(m), x ≠ j;

otherwise, it is penalized:

C_ij(m + 1) = (1 − b) C_ij(m),
C_ix(m + 1) = b/(e − 1) + (1 − b) C_ix(m), x ≠ j,

where a ∈ [0, 1] is the reward feedback factor, b ∈ [0, 1] is the punishment feedback factor, and e is the number of automaton actions, set to 20. Generally, parameters a and b are equal. In our experiment, a = b = 0.1 was selected.
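The roulette selection and the linear reward-penalty update can be sketched together. This is a sketch under the standard linear reinforcement scheme; the chosen random number is illustrative.

```python
import numpy as np

def roulette_pick(probs, r):
    """Choose level j with sum(probs[:j]) < r <= sum(probs[:j+1])."""
    return int(np.searchsorted(np.cumsum(probs), r))

def la_update(p, j, rewarded, a=0.1, b=0.1):
    """Linear reward-penalty update of one row of the state transition
    probability matrix after action j was chosen (e = len(p) actions)."""
    p = p.copy()
    e = len(p)
    if rewarded:                       # fitness improved
        p = (1.0 - a) * p              # shrink all probabilities ...
        p[j] += a                      # ... and boost the chosen action
    else:                              # fitness worsened
        pj = (1.0 - b) * p[j]
        p = b / (e - 1) + (1.0 - b) * p  # redistribute toward the others
        p[j] = pj
    return p

row = np.full(20, 0.05)                # uniform initial row of C
j = roulette_pick(row, 0.12)           # cumulative sums 0.05, 0.10, 0.15, ...
rewarded_row = la_update(row, j, rewarded=True)
punished_row = la_update(row, j, rewarded=False)
```

Both branches keep the row summing to 1, so it remains a valid probability distribution over the 20 quantized γ levels.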
Pseudo-Code for Feature Selection
• If gen < gen_max, go to the second step; otherwise, select the member with the best fitness to obtain the final feature subset.

• The feature subset obtained in the previous step is applied to the test dataset, and the classification result is obtained.
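The steps above can be sketched as a toy search loop. The fitness function here is a stand-in for 1/CA_v (the real scheme evaluates an SRDA classifier on a validation set), γ is held fixed rather than chosen by the learning automaton, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_fitness(mask):
    """Stand-in for Fitness = 1/CAv: here, smaller subsets score better."""
    return 1.0 + mask.sum() / mask.size

D, N, gen_max = 12, 6, 30                     # features, fireflies, iterations
pop = rng.random((N, D))                      # threshold vectors in [0, 1]
fit = np.array([toy_fitness(p > 0.5) for p in pop])

for gen in range(gen_max):
    best = int(np.argmin(fit))                # brightest = lowest fitness
    for i in range(N):
        if i == best:
            continue
        r2 = float(np.sum((pop[i] - pop[best]) ** 2))
        beta = np.exp(-1.0 * r2)              # attraction with gamma = 1
        pop[i] = np.clip(pop[i] + beta * (pop[best] - pop[i])
                         + 0.1 * (rng.random(D) - 0.5), 0.0, 1.0)
        fit[i] = toy_fitness(pop[i] > 0.5)

subset = np.flatnonzero(pop[int(np.argmin(fit))] > 0.5)
```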

Spectral Regression Discriminant Analysis for Classification
Linear discriminant analysis (LDA) is widely used in feature classification. It maximizes the between-class covariance of the projected data while minimizing the within-class covariance. It has been widely used in many areas of signal processing, such as machine learning, data mining, and pattern recognition. However, the computation of LDA involves the eigenvalue decomposition of dense matrices, which is computationally costly and time-consuming. This paper adopts the spectral regression discriminant analysis classifier. By using spectral graph analysis, SRDA casts discriminant analysis into a regression framework that facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of regularized least-squares problems, and no eigenvector computation is involved, saving substantial time and memory.

Offline Data Analysis
As for the 2008 BCI competition data set 2a, nine subjects took part in the experiment, and each subject had two data sets: session T is the training set, while session E is the evaluation set. Each of them has 288 trials. In order to lay a good foundation for the classification results, the EEG signal is first band-pass filtered to remove artifacts, as described in Section 2.1.
To illustrate the effectiveness of the feature selection algorithm proposed in this paper, the results of SRDA are compared to SRDA with the feature selection algorithm (FA-LA-SRDA). The results are shown in Table 2. As seen from Table 2, for subjects 1, 3, 5, 6, 7, 8, and 9, FA-LA-SRDA has achieved better results, compared with SRDA. Moreover, the use of the FA-LA algorithm has significantly improved the average accuracy.
In order to further prove the performance of the proposed method, FA-LA was compared with the GA and adaptive weight particle swarm optimization (APSO) under the same conditions. The parameters used in the competitor algorithms are summarized in Table 3. To further evaluate the effectiveness of the FA-LA algorithm, a new effective method of combining CSP and LCD for MI EEG feature extraction was proposed, and the FA-LA algorithm was then employed for feature selection. The classification accuracy and the number of selected features were compared with those of the other two feature selection methods, GA and APSO. The comparison of the performance metrics of the three algorithms is given in Table 4 and Figure 8, where the classification accuracy (CA) and the number of selected features (FS) of each subject are the two critical metrics. By comparing the CA values of the three algorithms in Table 4 and Figure 8, the FA-LA algorithm achieves the highest classification accuracy for all subjects except subject 2, compared with the GA and APSO algorithms. Its average classification accuracy is also the highest among the three algorithms. The average numbers of features selected using FA-LA and APSO are similar: 159 and 161, respectively. Compared with the average of 216 features using the GA, the number of features chosen by this method is significantly reduced.

Comparison to Previous Work
When comparing the results of feature selection and classification in both cases, it can be seen from Table 4 that, although the number of features selected based on CSP and LCD is relatively large, the average classification accuracy of the three algorithms is higher than that based on CSP alone. The results show that using the LCD algorithm to extract time-frequency domain features captures more useful information, thereby improving the classification effect. Among the three feature selection algorithms, the average classification accuracy of the proposed FA-LA algorithm is still the highest, which again verifies the algorithm's performance advantage. Table 5 summarizes the comparison of classification accuracy (CA) and kappa score (K) of the proposed method with existing multi-class methods for each subject in data set 2a of BCI Competition IV. We applied a 10-fold cross-validation procedure for the proposed method. As seen from Table 5, the method proposed in [22] performs best for subject 2, but its average classification accuracy is the worst of the four methods; although an improved classification method is proposed in [22], the improvement is limited. In [23], pre-processing and channel selection were performed on the EEG signal. Subjects 5 and 7 achieved the best performance, but the average classification accuracy was around 66.6% and the kappa score was 0.55, ranking third among the compared methods. The FA-LA method achieved a mean classification accuracy of 70.2% and a kappa score of 0.6, the latter being the same as the score reported in [24]. Subject 3 achieved the best classification accuracy, 88.89%, among all subjects. Overall, the proposed method outperforms the reference methods, which proves that the proposed feature selection algorithm can effectively improve classification accuracy.

Real-Time Data Analysis
In the real-time experiment, according to the parameter settings of the UE-16B EEG amplifier in the data acquisition platform (as described in Section 2.2), the original EEG signal collected by the EEG cap is first coarsely filtered by the amplifier, and the EEG signal output from the amplifier is then band-pass filtered in the same manner as in the offline data preprocessing.
Each subject completed 15 trials for each class of motor imagery task, 60 trials in total. The F-measure was used to analyze the performance of the proposed methods in real-time BCI systems; it is calculated as F = 2 × P × R / (P + R), where P is the precision and R the recall. The average precision, recall, and F-measure of each class of movement are given in Table 6 and Figure 9. In general, both the recall and the precision should be as close to 1 as possible, but the two are sometimes in conflict, so the F-measure is used to evaluate the results comprehensively. From Table 6 and Figure 9, it can be seen that the tongue task performs best of the four MI classes, with higher precision and recall than the other three; in particular, its precision of 0.92 shows that this class is not easily misjudged, i.e., the other tasks are rarely mistaken for the tongue task. The precision of the left hand, right hand, and foot tasks is still relatively high, but their recall is less satisfactory. The precision of the left hand reaches 0.7, yet its recall is the lowest of the four classes, indicating that this MI task is easily mistaken for the others.
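The per-class metrics in Table 6 follow the standard one-vs-rest definitions. A minimal Python sketch, using hypothetical trial labels rather than the recorded data:

```python
def per_class_metrics(y_true, y_pred, cls):
    """One-vs-rest precision, recall and F-measure for a single class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# hypothetical trial labels for the four MI classes
classes = ["left", "right", "foot", "tongue"]
y_true = ["left", "left", "right", "foot", "tongue", "tongue", "right", "foot"]
y_pred = ["left", "foot",  "right", "foot", "tongue", "tongue", "foot",  "foot"]
for c in classes:
    p, r, f = per_class_metrics(y_true, y_pred, c)
    print(f"{c:6s}  P={p:.2f}  R={r:.2f}  F={f:.2f}")
```

A high precision with low recall, as observed for the left-hand task, means the class is rarely predicted incorrectly but its own trials are often assigned to other classes.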
To further and more intuitively verify the performance of the proposed feature selection algorithm, the experimental results were compared with those of the genetic algorithm (GA) and the adaptive weight particle swarm optimization (APSO) algorithm. After the four subjects had adapted to the system, 100 sets of experimental data were collected from the system. In total, 75% of the data was used as training data and 25% as test data; each group of data was processed 10 times with different training/test splits, and the final results were averaged, as shown in Table 7. According to these real-time results, the FA-LA algorithm proposed in this paper brought clear benefits for all four subjects, and selected the fewest features of the three algorithms for every subject except subject 2. On the whole, the genetic algorithm selected the most features, with an average of 332, while APSO and FA-LA reduced the average number to 301.5 and 296, respectively. Among the subjects, subject 2 had both the largest number of selected features and the highest classification accuracy, whereas subject 1, who had performed the fewest experiments, obtained low classification accuracy with every feature extraction method. This shows that, with more training, motor imagery can be performed more reliably to produce the appropriate commands and better classification results. The overall trend is consistent with the results on the competition data, which again verifies the validity of the feature selection algorithm.
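The averaging protocol described above (ten random 75/25 splits, results averaged) can be sketched as follows; the 1-nearest-neighbour classifier and the toy clusters are stand-ins for the actual SRDA classifier and the selected EEG features:

```python
import random

def one_nn(train_X, train_y, test_X):
    """Toy 1-nearest-neighbour classifier, a stand-in for SRDA."""
    return [min(zip(train_X, train_y),
                key=lambda ty: sum((a - b) ** 2 for a, b in zip(x, ty[0])))[1]
            for x in test_X]

def repeated_split_accuracy(X, y, classifier, runs=10, train_frac=0.75, seed=0):
    """Average test accuracy over `runs` random train/test splits."""
    rng = random.Random(seed)
    n, accs = len(y), []
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        train, test = idx[:cut], idx[cut:]
        preds = classifier([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in test])
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / runs

# two well-separated toy "feature" clusters, eight samples each
X = [(0.0 + i, 0.0) for i in range(8)] + [(20.0 + i, 20.0) for i in range(8)]
y = [0] * 8 + [1] * 8
print(repeated_split_accuracy(X, y, one_nn))   # → 1.0
```

Averaging over several random splits, as done in Table 7, reduces the variance introduced by any single unlucky partition of the 100 samples.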
The execution time needed to complete the real-time maze game is a reasonable measure of the real-time performance of the MI signal processing algorithm in this paper. Table 8 shows the total time for each of the four subjects to complete five trials of the maze game. A single data epoch lasts 1.1 s; within this time, the EEG data are collected and processed to obtain the final result. If the classification result is correct, the ball moves in the direction indicated by the picture; if it is wrong, as shown on the right of Figure 5, the ball stops. The subject is prompted to move the ball toward the position of the green arrow, and 19 steps are needed in total to complete the maze game.
Because of individual differences, each subject's analysis results in the training data collection stage differ, and so do their classification models. If the game lasts too long, or the classification accuracy at a particular step along the way is too low, the subject's mood is affected, which in turn degrades their overall performance in the game. The subjects completed the maze game in a total time of around 50 s, with an average of about 2.6 s per step, so the overall real-time performance is acceptable.
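The per-step timing quoted above can be checked directly: with 19 steps and a total game time of about 50 s, the average step takes roughly 2.6 s, which comfortably contains the 1.1 s data epoch (the 50 s total is the approximate value from the text, not an exact measurement):

```python
STEPS = 19        # steps required to finish the maze
EPOCH_S = 1.1     # seconds per data epoch (acquisition + processing)
TOTAL_S = 50.0    # approximate total game time reported above

avg_step = TOTAL_S / STEPS
print(f"average step time: {avg_step:.2f} s")   # → average step time: 2.63 s
assert avg_step >= EPOCH_S                      # each step spans at least one epoch
```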
In terms of classification accuracy (CA) and the number of selected features (FS), the FA-LA feature selection algorithm improved classification performance to a certain extent and achieved the best results of the three compared algorithms. This shows that the proposed FA-LA method can effectively balance the number of features against classification accuracy, and the final maze game confirmed that the algorithm also achieves good real-time performance.

Conclusions
In this paper, we proposed a novel feature selection method based on the firefly algorithm and learning automata for four-class motor imagery EEG signal processing that avoids becoming entrapped in a local optimum. After feature extraction from the EEG data using a combination of CSP and LCD, the proposed FA-LA method is used to obtain the best subset of features, and SRDA is used for classification. Both the fourth brain-computer interface competition data and real-time data acquired in our physical experiments were used to verify the validity of the proposed method. Experimental results show that FA-LA further improves the recognition accuracy of motor imagery EEG, markedly reduces the dimensions of the feature set, and is capable of operating in a real-time BCI system.
In future work, we will further improve the signal preprocessing method to remove eye artifacts from the EEG signal; how to detect the presence of eye artifacts, and how to adaptively remove them, are critical research issues. Our future goal is to apply this method to the real-time control of devices such as robotic arms, which will be helpful in the rehabilitation of patients with functional paralysis.