Research on Channel Selection and Multi-Feature Fusion of EEG Signals for Mental Fatigue Detection

With the rapid development of modern social science and technology, the pace of life is getting faster, and brain fatigue has become a sub-health state that seriously affects the normal life of people. Electroencephalogram (EEG) signals reflect changes in the central nervous system. Using EEG signals to assess mental fatigue is a research hotspot in related fields. Most existing fatigue detection methods are time-consuming or don’t achieve satisfactory results due to insufficient features extracted from EEG signals. In this paper, a 2-back task is designed to induce fatigue. The weight value of each channel under a single feature is calculated by ReliefF algorithm. The classification accuracy of each channel under the corresponding features is analyzed. The classification accuracy of each single channel is combined to perform weighted summation to obtain the weight value of each channel. The first half channels sorted in descending order based on the weight value is chosen as the common channels. Multi-features in frequency and time domains are extracted from the common channel data, and the sparse representation method is used to perform feature fusion to obtain sparse fused features. Finally, the SRDA classifier is used to detect the fatigue state. Experimental results show that the proposed methods in our work effectively reduce the number of channels for computation and also improve the mental fatigue detection accuracy.


Introduction
With rapid economic development, people's life rhythm is getting faster and faster, and the pressure of competition is becoming greater and greater. High-intensity use of the brain for a long time can easily cause mental fatigue. Mental fatigue has become a sub-health state that seriously affects people's normal life, which might cause heart disease, palpitations, and other physiological diseases, and may also cause insomnia, anxiety, depression, and other mental diseases [1]. It is very important and necessary to timely detect mental fatigue and make feedback. There are subjective and objective methods for fatigue detection [2]. The subjective detection method mainly evaluates the fatigue state via the subjects filling out the scale based on their subjective mental state and feelings. The commonly used scales include Karolinska sleepiness scale (KSS), Stanford sleepiness scale (SSS), Chalder fatigue scale, et al. [3,4]. The advantage of the subjective evaluation method is ease of use, but the standards are not uniform, and the results are not comprehensive. The objective evaluation method is to obtain the physiological data parameters or biochemical data parameters of the subject under different brain fatigue states through external experimental equipment to evaluate the fatigue state of the subject. According to the information used, objective evaluation methods are divided into psychological and behavioral evaluations, biochemical evaluations, and physiological evaluations. Itoh et al. used sensors to obtain driver's body movements during driving to assess driver's fatigue [5]. Fernstrom, et al. conducted related research, and the results showed that the subject's serum tryptophan will increase as the subject's fatigue state deepens [6]. Compared with other signals, the electroencephalogram (EEG) signal is considered to be the most accurate and credible assessment of brain fatigue and has been widely recognized by the medical community and scientific research community.
Nowadays, research on brain fatigue detection is not limited to improving the accuracy of fatigue state recognition but also begins to focus on the combination of brain fatigue detection and practical application. This leads to high requirements on the real-time nature of the mental fatigue detection system, the portability of the acquisition equipment, and the complexity of the operation. Considering these factors, it is necessary to reduce the number of brain electrodes to save computation time. Yvonne Trand et al. used 26 channels to obtain EEG data of all subjects before and after fatigue. By analyzing the PSD characteristics and comparing the classification performance of PCA before and after dimensionality reduction, it was found that the results after dimensionality reduction were comparable to the results using the original data [7]. Chen, J et al. used 40 lead channels on the NeuroScan EEG acquisition device to record the EEG signal of the subject in the driving simulation phase on the driving simulator and to study the changes in the mental state of the subject during the simulated driving [8]. Chai, R et al. compared the classification effects of using 32 channels of data and the selected 11 frontal lobe channels (FP1, AF3, F7, F3, FC1, FC2, F4, F8, AF4, FP2, and FZ). The effect of 11-channel data is slightly lower than that of using 32-channel data, but the processing time is reduced [9]. Rifai et al. performed ICA processing on 32-channel data to reduce the channel number to 16. The results showed that the classification effect of the selected 16-channel was comparable to the result using the original 32-channel [10].
In mental fatigue detection based on EEG signals, feature extraction is a crucial step. Whether the extracted features are sensitive to fatigue directly affects the classification performance of fatigue detection. At present, mental fatigue detection methods using EEG signals can be divided into three major categories: power spectrum-based feature analysis, entropy-based feature analysis, and brain network-based feature analysis. Nguyen et al. extracted the power spectrum features of the α-band and β-band, combined with the blink rate, closed-eye rate, and heart rate features to identify the state of the subject. 10 subjects were tested for their proposed method and achieved classification accuracy of 70.5 (±9.9) [11]. Rohit et al. performed spectrum analysis on EEG signals and used support vector machines to predict the state of the 23 subjects. Using EEG spectral features achieved 74% accuracy with SVM and for temporal aggregation of the classifier over a 3 min window improved accuracy to 84% [12]. Wang et al. analyzed the sample entropy value of the EEG signal and combined it with the EMG eigenvalues to establish a mathematical model for fatigue driving evaluation. Combining a PCA-multiple regression model, they got an average accuracy of 95.32% in eight subjects [13]. Zhang et al. calculated the complexity of the approximate entropy of EEG data to distinguish whether the subject had driving experience, combined with the sample entropy characteristics and EOG, EMG signal characteristics to determine the driver's fatigue level [14]. Han, C et al. applied complex network theory to study the changes of brain network characteristics under different fatigue states. Their results demonstrated under fatigue state, the brain network complexity of 16 subjects improved [15]. Zhao, C used the graph theory method to study the functional network reconstruction changes in different frequency bands of six subjects and compared the brain function network in normal and fatigue states [16]. Only performing feature analysis from a single aspect will cause the loss of effective information. Tzimourta et al. extracted six time domain features and five spectral features for analysis and found that the classification results using two fusion features are more ideal and the best accuracy reached 85.5% [17]. Hu, J. et al. extracted four characteristics of sample entropy, fuzzy entropy, approximate entropy, and power spectrum entropy, and fused together as an input to the gradient-enhanced decision tree to determine whether the driver was in a fatigue state. Under performing on 22 subjects, their results had the average highest accuracy up to 94% [18]. Some closely related methods have been recently  [19]. Ruiz, Gomez et al. proposed a CCA-based multiplex feature parameters (multiplex clustering coefficient and strength) method to detect Alzheimer's disease (AD) and showed that the interpretation of the proposed method can be analogous to their frequency-specific counterparts [20].
To solve the problem of complicated calculation induced by channel redundancy when detecting mental fatigue states with multi-channel EEG data, this paper proposes a common channel selection method based on ReliefF algorithm [21]. Considering that the single feature analysis causes the loss of effective information, and multiple feature concatenation causes high dimension of the feature set, this paper suggests a multi-feature fusion method based on sparse representation to improve the classification effect of fatigue state.

Channel Selection Based on ReliefF
Relief is a feature selection algorithm by calculating feature weights, mainly for two classification problems [22]. The importance of different features can be expressed by the following formula: In the above formula, s indicates the sample point. M(S) represents the nearest neighbor sample point of the same kind as s. In the training data set S 1 , S 2 , S 3 . . . . . . S n , sample points S i (1 ≤ i ≤ n) are randomly selected, the nearest neighbor sample point S iN M is calculated as nearest hit in the training data set with the same kind S i , and the nearest neighbor sample point S iN M is calculated as nearest miss in the training data set with different classes S i . For feature sets F, update the weight values of different features according to the following rules: For a certain feature, if the distance between S i and S iNH is smaller than the distance between S i and S iN M , it means that the feature's contribution to distinguishing the samples of the same type and different types is positive, and the feature weight value should be increased. Otherwise, the feature weight value should be reduced. The complexity of the Relief algorithm is positively related to the number of samples and the number of features. The sampling number is m, the original feature number is a, the threshold of feature weight is δ, the weight of each feature is W, the feature subset is T. The algorithm process is as follows Algorithm 1: Obtain a sample S i randomly from the training data set S;

5:
Calculate S i 's nearest neighbor sample point S iNH from homogeneous training sets, and S i 's nearest neighbor sample point S iN M from S i different kinds of training sets;

6:
For j = 1 to a, do In Equation (2), F j is a certain original feature, and S i F j is the value of the sample S i on The ReliefF algorithm is improved on the basis of the Relief algorithm. The principle of handling multi-classification problems is as follows: Sample points S i (1 ≤ i ≤ n) are randomly selected from the training data set S 1 , S 2 , S 3 . . . . . . S n . Calculate the k nearest neighbor sample points of S i nearest hit in the training set of the same kind as S i and record it as S iNH h (h = 1, 2..k). For different features F j , the weight value of each feature is calculated and updated as follows: In the above formula, F j is a certain original feature, W(F j ) is the corresponding weight value, p(C) is the proportion of a certain category C different from S i , p(class(S i )) is the proportion of the randomly selected sample S i corresponding to the category. di f f (F, S 1 , S 2 ) is the distance between sample S 1 and sample S 2 on feature F. Defined as follows: F is discrete: F is continuous: Suppose the sampling frequency is m, the original feature number is a, the nearest neighbor sample number is k, the feature weight threshold is δ, the weight of each feature is W, and the feature subset is T. The specific steps of the ReliefF algorithm are as follows Algorithm 2: In this paper, each EEG channel is regarded as a feature, and then the feature selection is carried out according to the above method, so as to select the channel with a high contribution to the classification effect of fatigue EEG data. There are n channels C 1 , C 2 , C 3 . . . C n and b features F 1 , F 2 . . . F b are extracted from each channel. The weight of channel l can be calculated by the following formula: In the above formula, W(C F i l ) represents the weight corresponding to the feature i on channel l, and b is the number of features extracted on each channel. The value of W(C F i l ) can be obtained by the above ReliefF method.

Common Channel Selection Based on Weight Addition
There are m subjects S 1 , S 2 , S 3 . . . . . . S n . According to the above method, the channel weight value [23] of each subject is W 1 , W 2 , W 3 . . . W m . First normalize the channel weights of each subject. After the normalized channel weights are obtained, the normalized weights of the corresponding channels of each test are summed to obtain the common channel weights, as follows: In the above formula, W(C i ) represents the weight of the i-th channel of the last common channel, W(C S j i ) is the weight of the subject S j channel i after normalization. Finally, the weights W are sorted from large to small, and the channel with the highest rank is the common optimal channel combination.

Common Channel Selection Based on Weighted Classification Performance Using a Single Channel
In this paper, the results of each subject's single-channel performance analysis are combined with the channel weights. The SRDA (Spectral Regression Discriminant Analysis) classifier [24] is used to obtain the single channel classification accuracy rate, and the accuracy rate is weighted to calculate the common channel weight. Suppose there are m subjects S 1 , S 2 , S 3 . . . . . . S m and n channels C 1 , C 2 , C 3 . . . C n . Extract b features F 1 , F 2 . . . F b from each channel, and the calculation method of common channel weights is as follows: In the above formula, W(C l ) represents the weight value of the l-th common channel, W(C F i l S j ) represents the weight value calculated for the i-th feature on the l-th channel of the j-th subject. Acc(C F i l S j ) represents the classification accuracy rate under the i-th feature on the l-th channel of the j-th subject. Calculate the weight value of each channel according to the above formula, and sort each channel by weight. Finally, select the common channel combination according to the order.  Table 1.

Sparse Representation
It is difficult to analyze and process high-dimensional signals, and it is easier and faster to process sparse signals. Sparse signals are signals with few non-zero elements and low dimensions. Most signals are not sparse signals, but in a certain transform domain, the signals can be sparse to obtain a sparse representation of the original signal.
In recent years, the sparse representation theory [25] of signals has attracted the attention of many researchers, a large number of applications are in the field of image signal processing, and there are also a few studies applying sparse representation theory to the classification processing of physiological signals. Sparse representation theory is to represent the original signal with a linear combination of fewer basic signals. Suppose an N-dimensional original signal X ∈ R N , dictionary matrix D ∈ R N * M containing primitive column signals. Then the signal X can be represented by a linear combination of atoms in D, as follows: In the above formula, α ∈ R M = [α 1 , α 2 , ..., α M ] T represents the sparse representation of the original signal X. The sparse representation diagram is shown in Figure 1.  In this paper, in order to reduce the dimension of the feature set, the incomplete dictionary D is used to realize the sparse representation of features.

Multi-Class Feature Fusion Based on K-SVD
The determination of the dictionary in sparse representation theory is a key step. In this paper, K-SVD dictionary learning algorithm [26] is used to design the dictionary matrix.
The K-SVD dictionary learning algorithm is mainly divided into two stages: sparse representation and dictionary update. The sparse representation stage first initializes a dictionary matrix D . Using the OMP algorithm to obtain the sparse representation X of the signal Y is as follows: The results of the initial dictionary for sparse representation are often not optimal, and dictionary update is required. The dictionary matrix D is updated by columns. First fix the sparse coefficient matrix X and dictionary matrix D . Suppose we want to update the k column k d of the dictionary matrix, corresponding k-th row in the sparse coefficient The following formula is available: Keep the non-zero value in k T x , remove the value of 0, and do the corresponding processing on k E , the specific processing is as follows: Define the set  In this paper, in order to reduce the dimension of the feature set, the incomplete dictionary D is used to realize the sparse representation of features.

Multi-Class Feature Fusion Based on K-SVD
The determination of the dictionary in sparse representation theory is a key step. In this paper, K-SVD dictionary learning algorithm [26] is used to design the dictionary matrix.
The K-SVD dictionary learning algorithm is mainly divided into two stages: sparse representation and dictionary update. The sparse representation stage first initializes a dictionary matrix D. Using the OMP algorithm to obtain the sparse representation X of the signal Y is as follows: The results of the initial dictionary for sparse representation are often not optimal, and dictionary update is required. The dictionary matrix D is updated by columns. First fix the sparse coefficient matrix X and dictionary matrix D. Suppose we want to update the k column d k of the dictionary matrix, corresponding k-th row in the sparse coefficient matrix X is x k T .The following formula is available: Keep the non-zero value in x k T , remove the value of 0, and do the corresponding processing on E k , the specific processing is as follows: Define the set Define Ω k as a matrix of N * |ω k |, except that the value at position (ω k (i), i) is set to 1, and all other values are 0. Make Equation (13) as follows: In the above formula, T Ω k respectively show the contraction results of the three after removing 0. Perform SVD decomposition on E k R : The first column of U is denoted by Multiply the first column of V by ∆(1, 1) to get the updated result of x k R . Follow the above steps to update the dictionary column by column to get a new dictionary ∼ D, and then use the new dictionary to sparsely represent the original signal. Determine whether the stop condition is reached. If the stop condition is not met, continue to iterate through the above steps until the direct condition is met. The flow of K-SVD dictionary learning algorithm is shown in Figure 2 below.
In this paper, based on the K-SVD dictionary learning algorithm, the dictionary matrix corresponding to the multi-feature set is constructed. Sparse representations of multiple feature sets are used to obtain sparse fusion features, which are used as input to the SRDA classifier to reduce the dimension of the feature set and improve the classification accuracy. In the above formula, x . Follow the above steps to update the dictionary column by column to get a new dictionary D , and then use the new dictionary to sparsely represent the original signal. Determine whether the stop condition is reached. If the stop condition is not met, continue to iterate through the above steps until the direct condition is met. The flow of K-SVD dictionary learning algorithm is shown in Figure 2 below.

Subject and Experiment Environment
The experimental subjects include eight graduate students without brain disease or other physiological psychological diseases. And they were required not to drink alcohol or take foods, beverages, and drugs that might affect the mental state of the human body during the first 24 h before the experiment. The experiments were conducted in a quiet and closed laboratory, ensuring no external interference, good ventilation, and suitable light. The subjects were asked to sit in a comfortable chair, facing the computer screen, and relax the body to prevent muscle tension from interfering with the results of the experiment. Subjects were instructed and trained about the experimental procedure in advance. And all subjects gave their informed consent for inclusion before they participated in the study. The study was conducted with approval from Wuhan University of Technology.

2-Back Task
The N-back task is a working memory task, which was first proposed by Kirchner and used to examine the age difference in the short-term memory ability of rapidly changing information. In the N-back task, when browsing the stimuli presented sequentially, the subject judges whether the currently presented stimulus matches the N-th stimulus presented before the stimulus. The current stimulation position is marked as M. The subject judges whether it is consistent with the M-1-th stimulation, which is called a 1-back task. The subject judges whether it is consistent with M-2 stimulation, which is a 2-back task. Taking into account the difficulty of the experiment and the experimental experience of the subjects, and studies have shown that 2-back duration of 30 min can induce mental fatigue. In our research, the letter 2-back task is designed to induce mental fatigue. The design diagram is shown in Figure 3.
In this paper, based on the K-SVD dictionary learning algorithm, the dictionar trix corresponding to the multi-feature set is constructed. Sparse representations of tiple feature sets are used to obtain sparse fusion features, which are used as input SRDA classifier to reduce the dimension of the feature set and improve the classifi accuracy.

Subject and Experiment Environment
The experimental subjects include eight graduate students without brain dise other physiological psychological diseases. And they were required not to drink al or take foods, beverages, and drugs that might affect the mental state of the human during the first 24 h before the experiment. The experiments were conducted in a and closed laboratory, ensuring no external interference, good ventilation, and su light. The subjects were asked to sit in a comfortable chair, facing the computer s and relax the body to prevent muscle tension from interfering with the results of t periment. Subjects were instructed and trained about the experimental procedure vance. And all subjects gave their informed consent for inclusion before they partici in the study. The study was conducted with approval from Wuhan University of nology.

2-Back Task
The N-back task is a working memory task, which was first proposed by Kir and used to examine the age difference in the short-term memory ability of rapidly c ing information. In the N-back task, when browsing the stimuli presented sequen the subject judges whether the currently presented stimulus matches the N-th stim presented before the stimulus. The current stimulation position is marked as M. Th ject judges whether it is consistent with the M-1-th stimulation, which is called a 1 task. The subject judges whether it is consistent with M-2 stimulation, which is a 2 task. Taking into account the difficulty of the experiment and the experimental expe of the subjects, and studies have shown that 2-back duration of 30 min can induce m fatigue. In our research, the letter 2-back task is designed to induce mental fatigue design diagram is shown in Figure 3.  Subjects need to judge whether the current letter stimulus is consistent with th vious second letter stimulus based on memory. If they are consistent, the subject ne press the '←' key to respond. If they are not consistent, they need to press the '→' k respond. When the first and second letter stimuli appear, press the '→' key to respo  Subjects need to judge whether the current letter stimulus is consistent with the previous second letter stimulus based on memory. If they are consistent, the subject needs to press the '←' key to respond. If they are not consistent, they need to press the '→' key to respond. When the first and second letter stimuli appear, press the '→' key to respond.

Experimental Design and Procedures
The experimental design in this article is shown in Figure 4.

Experimental Design and Procedures
The experimental design in this article is shown in Figure 4. Step 1: The subject put on an electroencephalogram cap and applied conductive paste to sit on the experimental chair and prepare for the experiment.
Step 2: The subject sat and filled in the Chalder fatigue scale.
Step 3: The subject sat quietly on the experimental chair and EEG data were collected for 2 min.
Step 4: The subject began the 2-back task training stage until the accuracy rate reached 0.8 or above.
Step 5: The subject began the first round of 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 6: The subject rested for 5 min.
Step 7: The subject began the second round of the 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 8: The subject sat quietly on the experimental chair and the subject's EEG data were collected for 2 min.
Step 9: Subject sat and filled in the Chalder fatigue scale.
Step 10: The experiment ended.
In our research, the following 16 channels are used to collect the fatigue EEG data of the subjects: Cz, FCz, F3, F4, C3, C4, P3, P4, FZ, O, F7, F8, T3, T4, T5, T6. The channel distribution is shown in Figure 5.   Step 1: The subject put on an electroencephalogram cap and applied conductive paste to sit on the experimental chair and prepare for the experiment.
Step 2: The subject sat and filled in the Chalder fatigue scale.
Step 3: The subject sat quietly on the experimental chair and EEG data were collected for 2 min.
Step 4: The subject began the 2-back task training stage until the accuracy rate reached 0.8 or above.
Step 5: The subject began the first round of 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 6: The subject rested for 5 min.
Step 7: The subject began the second round of the 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 8: The subject sat quietly on the experimental chair and the subject's EEG data were collected for 2 min.
Step 9: Subject sat and filled in the Chalder fatigue scale.
Step 10: The experiment ended.

Experimental Design and Procedures
The experimental design in this article is shown in Figure 4. Step 1: The subject put on an electroencephalogram cap and applied conductive paste to sit on the experimental chair and prepare for the experiment.
Step 2: The subject sat and filled in the Chalder fatigue scale.
Step 3: The subject sat quietly on the experimental chair and EEG data were collected for 2 min.
Step 4: The subject began the 2-back task training stage until the accuracy rate reached 0.8 or above.
Step 5: The subject began the first round of 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 6: The subject rested for 5 min.
Step 7: The subject began the second round of the 2-back task formal experiment, and the subject's key response and response time were recorded for 25 min.
Step 8: The subject sat quietly on the experimental chair and the subject's EEG data were collected for 2 min.
Step 9: Subject sat and filled in the Chalder fatigue scale.
Step 10: The experiment ended.

2-Back Task Result Analysis
In this paper, the reaction time and accuracy of the response to stimulus during the fatigue-induced experiment of eight subjects who completed the 2-back task with a total duration of 50 min were recorded. Average the response time of eight subjects according to the time series to get the average response time in the 50 min 2-back task. Next, the average response time is calculated by a time window with a step length of 2 min and an overlap interval of 1 min. Accuracy of response to eight subjects was treated in the same time window. Finally, the average response accuracy of eight subjects is averaged. The results are shown in Figure 6.

2-Back Task Result Analysis
In this paper, the reaction time and accuracy of the response to stimulus during the fatigue-induced experiment of eight subjects who completed the 2-back task with a total duration of 50 min were recorded. Average the response time of eight subjects according to the time series to get the average response time in the 50 min 2-back task. Next, the average response time is calculated by a time window with a step length of 2 min and an overlap interval of 1 min. Accuracy of response to eight subjects was treated in the same time window. Finally, the average response accuracy of eight subjects is averaged. The results are shown in Figure 6. It can be seen from the above figure that although the subjects have a more obvious downward trend around the 25th average reaction time during the completion of the 2back task. The overall trend is increasing. However, the average response accuracy of the It can be seen from the above figure that although the subjects have a more obvious downward trend around the 25th average reaction time during the completion of the 2-back task. The overall trend is increasing. However, the average response accuracy of the subjects fluctuated less, and only decreased significantly in the final stage of the experiment. The reasons for the above results combined with the feedback analysis after the subject experiment are as follows: The subject gradually experienced fatigue when completing the 2-back task. In order to resist the impact of fatigue, the subject needs to spend a longer time to respond to the stimulus. This subjective resistance ensures that the accuracy rate does not fluctuate much during the 2-back task. The obvious decrease in the reaction time in the 25th period was caused by the subjects entering the second round of task after taking a 5 min break after completing the first round of 2-back task. By the end of the experiment, the subjects were already in a state of high fatigue, which caused a significant drop in accuracy.

Channel Selection Result Analysis
After the 2-back task, the EEG data were recorded. With the 500 Hz sample rate of EEG recorder, 2 min's data were collected for pre and post fatigue states. Thus, the EEG data size is 16 × 60,000 × 2 of a subject. After segmenting EEG data, the total number of EEG samples was 1264. The datasets were randomly divided into two parts, 75% for training and the rest for testing.
We compare the classification accuracy of eight subjects in the weighted common channel, accuracy weighted common channel, ICA [27] common channel, and full channel. The average accuracy results are shown in Figure 7.
Entropy 2021, 23, x FOR PEER REVIEW 12 of 17 subjects fluctuated less, and only decreased significantly in the final stage of the experiment. The reasons for the above results combined with the feedback analysis after the subject experiment are as follows: The subject gradually experienced fatigue when completing the 2-back task. In order to resist the impact of fatigue, the subject needs to spend a longer time to respond to the stimulus. This subjective resistance ensures that the accuracy rate does not fluctuate much during the 2-back task. The obvious decrease in the reaction time in the 25th period was caused by the subjects entering the second round of task after taking a 5 min break after completing the first round of 2-back task. By the end of the experiment, the subjects were already in a state of high fatigue, which caused a significant drop in accuracy.

Channel Selection Result Analysis
After the 2-back task, the EEG data were recorded. With the 500 Hz sample rate of EEG recorder, 2 min's data were collected for pre and post fatigue states. Thus, the EEG data size is 16 × 60,000 × 2 of a subject. After segmenting EEG data, the total number of EEG samples was 1264. The datasets were randomly divided into two parts, 75% for training and the rest for testing.
We compare the classification accuracy of eight subjects in the weighted common channel, accuracy weighted common channel, ICA [27] common channel, and full channel. The average accuracy results are shown in Figure 7. It can be seen from the figure that, compared with the classification accuracy under the full channel, except that Subject 4 is slightly lower than the classification accuracy under the full channel, the remaining subjects always have one or several channels after selecting basically flat or slightly higher than the classification accuracy under the full channel. Among the three common channel selection methods, the effect of accuracy weighting is overall better than the other two common channel selection methods. While ensuring the classification accuracy, the number of channels is halved, which effectively reduces the complexity of data processing. In addition, the computation time between the methods of full channels and selected channels are shown in Figure 8. It can be seen that the proposed method were less time consuming. It can be seen from the figure that, compared with the classification accuracy under the full channel, except that Subject 4 is slightly lower than the classification accuracy under the full channel, the remaining subjects always have one or several channels after selecting basically flat or slightly higher than the classification accuracy under the full channel. Among the three common channel selection methods, the effect of accuracy weighting is overall better than the other two common channel selection methods. While ensuring the classification accuracy, the number of channels is halved, which effectively reduces the complexity of data processing. In addition, the computation time between the methods of full channels and selected channels are shown in Figure 8. It can be seen that the proposed method were less time consuming.

Feature Fusion Result Analysis
Time and frequency domain features are illustrated in Figure 9 plotted with the EE-GLAB [28] package in MATLAB.

Feature Fusion Result Analysis
Time and frequency domain features are illustrated in Figure 9 plotted with the EEGLAB [28] package in MATLAB. Figure 9c is the PSD spectrum before fatigue, and the X-axis represents the frequency range from 0 to 30 Hz. The four brain topography maps are the energy spectra of the intermediate values 3, 6, 10, and 22 Hz of the four sub bands δ (0.5-4), θ (4-8), α (8-13), and β (13-30), respectively. Before fatigue, the active energy of each sub band was mainly concentrated in the left brain and left temporal lobe, while during fatigue, the energy was mainly distributed in the posterior and right occipital lobe. Under an alert or excited (pre-fatigue) state, it is generally believed that beta waves improve. As the body enters drowsiness or sleep, alpha and delta waves increase in energy. Figure 9e,f are timefrequency graphs of T5 channel within 1000 ms before and after fatigue, respectively. It can be seen that spectrum energy decreases after fatigue compared with pre-fatigue.
In our research, the five extracted single features and sparse fusion features are used as the input of the SRDA classifier. As mentioned above, the total number of EEG samples was 1264. The datasets were randomly divided into two parts, of which 75% were used for training and the rest for testing. The computation was repeated five times. For feature dimension aspect, the dimension of sample entropy and sub-band PSD features is 16 (16 channels*1) for one sample. The dimension of original fused feature is 80 (16 × 5). After K-SVD dictionary learning algorithm, the original dimension was reduced to 20. Classification accuracy of the eight subjects is shown in Table 2.

Feature Fusion Result Analysis
Time and frequency domain features are illustrated in Figure 9 plotted with the EE-GLAB [28] package in MATLAB.  As can be seen from Table 3, the five extracted features and sparse fusion features are input into the SRDA classifier for classification. Compared with the classification accuracy, the classification effect of the sparse fusion feature obtained by the method in this paper is higher than the classification effect of using a certain feature alone. This shows that the use of multi-class sparse fusion features effectively improves the classification accuracy.
In addition, the sparse feature fusion method in this paper is compared with the commonly used concatenated feature fusion method and the PCA feature fusion method. The results are shown in Figure 10.
It can be seen from the figure that the three fusion features are used as the input of the SRDA classifier, and the classification accuracy is compared. Subject 6 has the best accuracy of PCA fusion feature classification. Subject 3 has the same effect on the three fusion feature classifications. Subjects 1, 2, 4, 5, 7, and 8 all used sparse fusion features for classification accuracy better than the other two fusion features. The sparse feature fusion method proposed in this paper effectively reduces the dimension of the feature set and improves the classification accuracy. In addition, the sparse feature fusion method in this paper is compared with the commonly used concatenated feature fusion method and the PCA feature fusion method. The results are shown in Figure 10. It can be seen from the figure that the three fusion features are used as the input of the SRDA classifier, and the classification accuracy is compared. Subject 6 has the best accuracy of PCA fusion feature classification. Subject 3 has the same effect on the three fusion feature classifications. Subjects 1, 2, 4, 5, 7, and 8 all used sparse fusion features for classification accuracy better than the other two fusion features. The sparse feature fusion method proposed in this paper effectively reduces the dimension of the feature set and improves the classification accuracy.

Conclusions
Research on brain fatigue detection based on EEG signal is a hot spot in current research. In this paper, a fatigue induction experiment based on the 2-back task is designed, and fatigue EEG data of eight subjects are collected. The channel selection and multi-type feature fusion method of fatigue EEG signal are deeply researched. Based on the ReliefF algorithm, an improved common channel selection algorithm is proposed. Treat each channel as a feature. Use ReliefF algorithm to calculate the weight value of each channel under each single feature, and analyze and calculate the classification accuracy of each channel under each single feature. Finally, the weighted sum of the accuracy is used to obtain the weight value of each channel, and the channel with the highest weight value is the common channel. On this basis, the multi-feature fusion algorithm is further studied.

Conclusions
Research on brain fatigue detection based on EEG signal is a hot spot in current research. In this paper, a fatigue induction experiment based on the 2-back task is designed, and fatigue EEG data of eight subjects are collected. The channel selection and multi-type feature fusion method of fatigue EEG signal are deeply researched. Based on the ReliefF algorithm, an improved common channel selection algorithm is proposed. Treat each channel as a feature. Use ReliefF algorithm to calculate the weight value of each channel under each single feature, and analyze and calculate the classification accuracy of each channel under each single feature. Finally, the weighted sum of the accuracy is used to obtain the weight value of each channel, and the channel with the highest weight value is the common channel. On this basis, the multi-feature fusion algorithm is further studied. The time domain and frequency domain features are extracted, and multi-class fusion features are obtained as input to the classifier through the sparse representation method.
Experimental results show that the channel selection method proposed in this paper can effectively reduce the 16-channel data to 8 channels while ensuring the classification effect, reducing the amount of data, and reducing the computational complexity. At the same time, the multi-feature sparse fusion method proposed in this paper improves the classification accuracy while ensuring the feature dimension. The brain fatigue detection method based on EEG signal proposed in this paper provides a theoretical basis for the realization of online mental fatigue detection which has important significance.
However, some other problems need to be further researched. The fatigue induction experiment based on the 2-back task in our research is carried out in the laboratory under the shielded environment, which is relatively ideal, and the noise interference of the fatigue EEG data collected is relatively less. How to effectively eliminate the interference of environmental noise and apply mental fatigue detection to the scene closer to real life is worth researching. The fatigue state is a gradual process, and a more detailed quantitative assessment of the fatigue state is needed in the future.  Informed Consent Statement: All subjects gave their informed consent for inclusion before they participated in the study.

Data Availability Statement:
The data presented in this study is available on request from the corresponding author. The data is not publicly available due to the relevant research still being undergoing.