Mental Workload Classiﬁcation Method Based on EEG Cross-Session Subspace Alignment

: Electroencephalogram (EEG) signals are sensitive to the level of Mental Workload (MW). However, the random non-stationarity of EEG signals will lead to low accuracy and a poor generalization ability for cross-session MW classiﬁcation. To solve this problem of the different marginal distribution of EEG signals in different time periods, an MW classiﬁcation method based on EEG Cross-Session Subspace Alignment (CSSA) is presented to identify the level of MW induced in visual manipulation tasks. The Independent Component Analysis (ICA) method is used to obtain the Independent Components (ICs) of labeled and unlabeled EEG signals. The energy features of ICs are extracted as source domains and target domains, respectively. The marginal distributions of source subspace base vectors are aligned with the target subspace base vectors based on the linear mapping. The Kullback–Leibler (KL) divergences between the two domains are calculated to select approximately similar transformed base vectors of source subspace. The energy features in all selected vectors are trained to build a new classiﬁer using the Support Vector Machine (SVM). Then it can realize MW classiﬁcation using the cross-session EEG signals, and has good classiﬁcation accuracy.


Introduction
The Mental Workload (MW) represents the amount of brain activity in a unit of time. It is closely related to the operator's professional knowledge, personality, task type, and physiological variables [1]. Excessive MW can lead to fatigue, distraction, and reduced efficiency, which leads to errors in analysis and decision-making tasks. However, too low MW also leads to a waste of human resources [2]. Hence, the MW level is a key design factor in human-machine collaboration tasks. It is of great significance to identify the level of MW and evaluate operator performance.
The existing methods of MW measurement include subjective measurement, performance measurement, and psychophysiological measurement [3]. Subjective measurement involves some subjective feeling investigations, such as the Subjective Workload Assessment Technique (SWAT), NASA Task Load Index (NASA-TLX), and Workload Profile (WP) [4]. Performance measurement is to measure the operating performance of the subject based on tasks [5]. The indicators of physiological measurement usually include electroencephalogram (EEG), electromyogram (EMG), electrocardiogram (ECG), and electrooculogram (EOG) [6,7].
EEG data can reflect the electrical activity of the brain [8,9]. It is possible to quantify brain responses to variations in task demand [10]. Many studies showed that some features of EEG signals were responsive to the MW change [11][12][13][14]. Most research divided EEG signals into five rhythms, namely Alpha, Beta, Theta, Delta, and Gamma [15,16]. Some studies revealed that the energy features of EEG signals can classify the MW level [17,18].
In recent years, many EEG classification methods were developed based on machine learning and deep learning, such as Logistic Regression (LR) [19], Linear Discriminant Analysis (LDA) [20], Support Vector Machine (SVM) [20,21], and Neural Network (NN) [22]. Qu et al., proposed an MW classification method based on EEG independent component features with an SVM classifier and compared the SVM classifier with others [23]. Pang et al., used a Stochastic Configuration Network (SCN) for MW classification [24]. The obtained results showed that most of these traditional machine learning methods could achieve good classification results when the trained and classified data were collected in a test period with a short time interval.
EEG signals are non-stationary [23,24] and usually change due to personnel fatigue, environmental change, electrode-skin impedance, and instrument change [25]. Mühl et al., mentioned the influence of environment and affective context on the classification results of MW [26]. Non-stationary EEG leads to a decrease in generalization capability and classification accuracy if the classifier only used the data from one previous session as its training set [27]. Therefore, most statistical classification models need to be retrained from scratch using newly collected data because of the distribution change of EEG data [28].
Domain adaptation methods have been utilized in some studies [29,30], including Transfer Component Analysis (TCA) [31], Subspace Alignment (SA) [32], and Geodesic Flow Kernel (GFK) [33]. Some studies applied domain adaptation to emotion classification using EEG data [34,35] and achieved high accuracy. Zheng et al., applied the TCA method to cross-subject emotion recognition [36]. Chai et al., used the Multi-Subject Subspace Alignment (MSSA) method for multi-subject emotion recognition based on the SA method [37]. Sciaraffa N. et al., highlighted the advantage of the use of calibration in cross-subject MW classification [38]. Zhang et al., used the kernel spectral regression and transfer learning techniques to improve the accuracy of cross-subject MW classification [39]. Liu et al., found a workload decoder using data from other subjects as an approach to improve workload classification performance [40]. Yin et al., used an adaptive deep learning model to deal with the cross-session EEG features [41]. Different from the above research, a mental workload classification method based on EEG Cross-Session Subspace Alignment (CSSA) is proposed in this paper. This method tries to solve the tricky problem of cross-session MW classification and improve its accuracy. In this proposed method, the SA method is used to obtain the domain-invariant feature space with different sessions, and the method not only uses a single source domain as a training set but tries to make full use of all data in different sessions and allows adding new data into the training data set. The trained cross-session MW classifier can improve classification accuracy because it builds a personalized MW classification model using the data from multiple sessions including both new and existing sessions.
The rest of this paper is organized as follows. The first section introduces the experiment about collecting EEG data and extracting EEG features. The second section introduces the principle of the presented CSSA method. The third section applies the CSSA method to classify MW and compares it with the existing methods. The fourth section involves the conclusion of this study. The EEG signals were collected from 13 subjects with an engineering knowledge background from Beihang University (aged between 22 and 25 years, with 2 females and 11 males). The Multi-Attribute Task Battery (MATB-II, National Aeronautics and Space Administration (NASA), Washington DC, MD, USA) was used as the platform for the experiment. It is composed of 4 sub-tasks, including system monitoring, tracking, scheduling, and resource management.

Materials and Methods
The details of the four sub-tasks are shown in Table 1. The system monitoring task is presented in the upper left window of the display. The demands of monitoring gauges and warning lights were simulated here, and the demands of manual control were simulated by the tracking task. This task is shown in the upper middle window. The communication task presented pre-recorded auditory messages to the operator at selected intervals during the simulation, and the demands of fuel management were simulated by the resource management task. The four regions corresponded to visual and operational missions. During the experiment, two levels of MW were set; these were Low Mental Workload (LMW) and High Mental Workload (HMW). Resource Management Clicking the corresponding oil pump with the mouse when failure occurs. 1 24 The EEG data of 13 subjects were collected for 7 days. To avoid the learning effect, the participants were trained for 12 days before the experiment to ensure their familiarity with different tasks in the MATB-II task [42]. They were required to complete four sub-tasks of MATB-II under different MW levels. The order of experiments was designed with a Latin square design method to avoid the order effect. Three minutes of EEG data were collected as baseline data before the MW experiment on every subject.

Data Preprocess
Firstly, the average values of M1 and M2 were used as the reference for data processing [43]. M1 and M2 were located at the mastoid position behind the left and right ears, respectively. Then, the 30-channel EEG signals were filtered with a 1-30 Hz Finite Impulse Response (FIR) band-pass filter, where the filter order was 3300 [44].
Two selection criteria for subject EEG data were used in our study. To ensure the reliability of data analysis, the accuracy of the MATB-II task operation on the subject should be greater than 90%. To reduce the interference of the artifact signals, the artifacts of the subject EEG data can not exceed 40% [45]. Finally, the EEG data of 10 participants (sub1-sub10, aged between 22 and 25 years, with 2 females and 8 males) were selected for analysis.
For the 12-min EEG data, only the middle 10 min of EEG data from each MW experiment were used. They were divided into 300 data segments, where the length of a segment was 2 s. The segments were labeled according to the different MW levels induced by the MATB-II task. There were 600 data segments for two MW tasks.

Blind Source Separation of EEG Signals
The EEG signals are usually collected by non-invasive measurement technology through electrode sensors placed on the scalp. As shown in Figure 1, S i (i = 1, 2, . . . , n) represents the signals generated by neural activities in the brain. Due to the mixed effect, the measured mixed signals, X i (i = 1, 2, . . . , m), are different from the signals generated by neurons, S i . This phenomenon will have a negative effect on feature extraction. This mixed effect depends on the distances from neuron sources to electrode sensors, a ij (i = 1, 2, . . . , m; j = 1, 2, . . . , n) [46]. The problem of separating mixed EEG signals is similar to the problem of Blind Source Separation (BSS), which is used to analyze mixing speech signals [47]. Independent Component Analysis (ICA), based on information independence, is used to separate the original signals from their mixtures [23]. The m-dimensional observation vectors can be expressed as a linear combination of n-dimensional ICs, s i (i = 1, 2, . . . , n), as given in Equation (1) [48].
x j = a j1 s 1 + a j2 s 2 + · · · + a jn s n , j = 1, 2, 3, . . . , m This can be expressed as: where S = [s 1 , s 2 , . . . , s n ] T is a set of ICs and the number of ICs is 30, which is the same as the number of channels in the EEG signals. X = [x 1 , x 2 , . . . , x m ] T is a vector of the observed data and A is a mixing matrix.

Feature Extraction
The Power Spectral Densities (PSDs) at different frequencies are extracted as features of EEG signals in the frequency domain. Fourier transform in Equation (3) is performed on each data segment to obtain F s (n).
where F s (n) is the frequency spectrum of EEG signals after the Fourier transform; S is the ICs of EEG signals and shows the Fourier transform of the signals; N is the number of signal samples; n is 0 to N − 1.
The corresponding PSDs of the EEG ICs are calculated with Equation (4).
where p s (n) is the PSD of the EEG ICs and F * s (n) is the conjugate of F s (n). Many studies have shown that the human mental state is related to four rhythms [15]. According to the frequency band distribution, EEG signals can be divided into δ(1~4 Hz), θ(4~8 Hz), α(8~14 Hz), and β(14~30 Hz) [23,24].
where P s, freq refers to the power spectral density at a certain frequency and E s,δ , E s,θ , E s,α , and E s,β are the four energy features of the EEG ICs.
The absolute power spectrums are normalized to obtain the relative power spectrums. (6) where E all refers to the power spectral density at 1-30 Hz and E s,δ * , E s,θ * , E s,α * , and E s,β * are the four relative energy features of the EEG ICs.

Non-Stationarity of EEG Signals
EEG signals are sensitive to several influence factors, such as personnel fatigue, electrode placement, and the impedance value of the EEG cap, etc. All these lead to the non-stationarity characteristic of EEG signals. In order to explain this clearly, two groups of EEG signals are shown in Figure 2. These EEG signals were obtained from the same subject in MW experiments, but their time interval was 24 h. The three principal components of EEG ICs, PC1, PC2, and PC3 with 23.8% variance lost, are obtained with the Principal Components Analysis (PCA) method. In Figure 2a, the principal components are represented in four colors. The data marked in the colors red and blue are the first-day data, while the data marked in yellow and green are the second-day data. The LMW and HMW data are marked with "+" and "×", separately. The top and left views of Figure 2a are shown in Figure 2b. Even for the same subject, the energy features of the EEG are separated after 24 h, that is their marginal distributions have changed, but the conditional distributions in each domain are almost the same. The EEG data from the 1st day and the 2nd day can be used as the source and target domain data, respectively. The traditional machine learning method simply takes the source domain data as training samples to construct a classifier. Due to the distribution difference between the source domain and the target domain, the generalization ability of the traditional method is insufficient.

SA Method
As the middle 10 min of EEG data in each MW task were chosen and divided into 300 data segments, there were 600 segments and 120-dimensional energy features for two MW tasks in each segment. Let X ∈ x be the recorded EEG of a sample (X, y) and x = R C×D , where C is the number of channels and D is the number of samples of time series. In this paper, C = 30 and D = 300. y ∈ Y represents the corresponding MW label. Let P(X) be the marginal probability distribution of X and P(Y|X) be the conditional probability distribution. The source domain is represented as X Si = (x Si,1 , y Si,1 ), . . . , x Si,j , y Si,j , . . . , x Si,D S , y Si,D S , i = 1, 2, . . . , N, j = 1, 2, . . . , D S . Similarly, the target domain is represented as X T = (x T,1 , y T,1 ), . . . , x T,j , y T,j , . . . , x T,D T , y T,D T , j = 1, 2, . . . , D T , where (X Si , X T ) ∈ X is the normalized relative power spectral density.
As shown in Figure 3, although the source domain data and target domain data are in D-dimensional space, their marginal distributions are different, while their conditional distributions are almost the same. Here, assume that P(X S ) = P(X T ) but P(Y S |X S ) = P(Y T |X T ) , which means there are differences between different domains. The purpose of SA is to make P(φ(X S )) = P(X T ) with the linear mapping φ. According to the PCA method, the eigenvectors corresponding to the first l maximum eigenvalues can be found in each domain. Here, l represents the number of principal components and l = 3 in this paper. These eigenvectors are defined as the source domain subspace and target domain subspace, which are represented as Z S and Z T , respectively. Assume that there is a linear relationship between Z S and Z T , since Z S and Z T only depended on X S and X T , separately, the non-stationarity of EEG did not affect the method. Then we can find a linear mapping to align the source subspace with the target subspace.
We use the transformation matrix, S, from Z S to Z T to align the basis vectors. S transforms the source subspace into the target subspace by aligning the source and target basis vectors. S is achieved by minimizing the following Bregman matrix divergence: where · 2 F is the F-norm. Due to the orthogonality of the F-norm, Equation (7) can be rewritten as: The minimum value of Equation (8) is obtained when S * = Z S T Z T . For the coordinate system of the source domain, X a can be written as: With Equation (10)

KL Divergence
The KL divergence, also known as relative entropy, is an asymmetric measure of the difference between two probability distributions. When two distributions are the same, the KL divergence is zero. When the difference between the two distributions increases, the KL divergence also increases. The formula is given in Equation (11).
where p and q represent different distributions and p(x i ) and q(x i ) represent the probability density functions of two distributions, respectively. For the EEG data used in this paper, p(x i ) and q(x i ) represent the probability density function of the source domain data G and the transferred target domain data H. The purpose of SA is to make P(φ(X S )) = P(X T ) through a linear mapping φ. By calculating the KL divergence of the source domain and the target domain, similar transformed source subspace base vectors can be selected.

MW Classification Method Based on CSSA
To improve the MW classification performance for different time sessions, an MW classification method based on the CSSA is proposed. The proposed method consists of two parts, as shown in Figure 4. As shown in Figure 4a, the first step is to calculate the ICs of the EEG and extract their energy features. The filtered labeled EEG signals in N different time intervals or sessions form N labeled sessions, labeled session i, i = 1, 2, . . . , N. New filtered unlabeled EEG signals form a new unlabeled session. Their independent components, IC Si and IC ST , can be obtained using the ICA. Then, their energy features, X Si and X ST , can be extracted.
As shown in Figure 4b, the second part is the CSSA method. The source and target subspaces, Z Si and Z ST , are built using the PCA method. The transfer features of source subspaces, G i , are extracted using the SA method. Meanwhile, the dimension-reduced features of the target subspace, H, are extracted by multiplying Z T and X T . The KL divergences of G i marginal distributions are calculated in order to select approximately similar transformed source subspace base vectors, G Sj , j = 1, 2, . . . , m, m ≤ N. Then, the energy features in the selected vectors are trained to build a new classifier.
The SVM is a classical statistical classification algorithm. It is very effective in dealing with classification and regression problems [23]. In this paper, a linear SVM is finally used to classify the MWs in the target domains, H, and 5-fold cross-validation is used to reduce overfitting and ensure the robustness of the model. The grid search method is adopted to obtain the optimal model parameters. The penalty coefficient, C, indicates the tolerance of the error, and its search space is [0.001, 0.01, 1, 10, 30].

MW Classification Comparison with Different Methods
The MW classification accuracies of different methods are compared with those of different methods in this section.
In our method, the CSSA can adapt the energy features of ICs in a source subspace to its target subspace through linear mapping. In this way, the source domain subspace is aligned with the target domain subspace. Then the SVM classifier is used to obtain the MW classification results.
The CSSA method mentioned in Section 2.2.6 is named Method 1, and the compared methods are named Method 2 to Method 4, respectively. The SA method mentioned in Section 2.2.4 is named Method 2 [32]. The TCA method is named Method 3, as shown in Figure 5. The PCA method is named Method 4 [31,49,50], as shown in Figure 6. For Method 3 and Method 4, the EEG signals are preprocessed in the same way, as shown in Figure 4a.  Here, G T and G i represent the transfer of features from X T and X Si , respectively. Method 3 assumes that φ is the feature map induced by a universal kernel, and the distance of two domains can be obtained with Equation (12), where H is a universal RKHS [51], and D is the number of samples.
Method 3 used the kernel learning: where µ is a trade-off parameter and I is an identity matrix. K and L can be calculated as follows: The PCA method can extract the main feature components of data for the dimensionality reduction of high-dimensional data [52]. In Method 4, assuming that there is an n-dimensional data sample X with D samples, X can be decomposed into [53]: where UU T = VV T = 1, and Σ is a diagonal matrix. Divide Σ into r columns, named Σ r . H can be calculated by Equation (16): Here, the calculation method used for X T and X Si is the same as in Figure 4a. The PCA method is directly used to obtain X T and X Si . H T and H i represent the reduced dimension features from X T and X Si , respectively.
The EEG signals of the 10 subjects over seven days were obtained and preprocessed. Session 0 represents the EEG data obtained on the first day, and the data from this session were only used as training data. Session 1 to Session 6 represents the corresponding EEG data from the second day to the seventh day.
It should be noted that in Methods 2 to 4, the classified EEG data are the ones from the current day and the training data are the ones from the day before in this section. For example, if the EEG data from the fourth day were classified to determine their MW level, then the data from the third day would be used as the training data. In Method 1, the number of source domains corresponds to the days. For example, the fourth day has three source domains, in which the labeled EEG data are those from the first three days.
The classification accuracies of the four methods for the 10 subjects are compared in Figure 7. For the same time and data, the accuracy of Method 1 ranged from 82.0% to 88.2%, and the average classification accuracy was 84.7%. The accuracy of Method 2 ranged from 74.8% to 88.2%, and the average classification accuracy was 80.2%. The accuracy of Method 3 ranged from 66.0% to 87.1%, and the average accuracy was 74.0%. The accuracy of Method 4 ranged from 50.2% to 63.3%, and the average classification accuracy was 52.9%. For Session 2, Method 2 faced the problem that its accuracy decreased due to the orthogonality of the source domain and target domain for Subject 3. Fortunately, this problem could be solved by Method 1 because the number of source domains gradually increased over the days.
Compared with Method 2, the accuracy of Method 1 was improved by 1.4% to 11.3%, for an average of 4.5%. Compared with Method 3, the accuracy of Method 1 was improved by 1.2% to 16.6%, for an average of 10.7%. Compared with Method 4, the accuracy of Method 1 was improved by 23% to 31.5%, for an average of 27.3%.
The average classification accuracies of Method 1 and Method 2 for 10 subjects were higher than those of Method 3. This is because Method 3 only used shared features. Method 1 and Method 2 both used relevant features. In addition, the stability of Method 1 was better than that of the others. Therefore, we chose to use Method 1 to classify the MWs across time.
The Friedman test based on algorithmic sorting was used to prove whether these methods perform equally. The data obtained in each session were considered as a data set, D i (i = 1, 2, . . . , 6). Each dataset contained data for Subject 1 to Subject 10. In comparing k algorithms on N data sets, r i represents the average order value of the i th algorithm, while τ x 2 and τ F were calculated from Equations (17) and (18), respectively.
When α = 0.05, K = 4, and N = 7, the critical value of this test is 3.29. According to the above equations, τ F = 17. The assumption that all algorithms perform equally can therefore be rejected.

MW Classification Comparisons Using Different Source Domains
As CSSA only changes the mapping of the source domain, it can transform multiple source domains into the same subspace to minimize the difference between all source domains and the target domain. In this section, the impact of the multi-source domains on the classification performance will be discussed using only Method 1.
Six classification cases were used, and they contained 1 to 6 source domains. The number of source domains corresponded to the case number. For example, Case 3 had three source domains, in which the labeled EEG data were those obtained on the first three days.
The average classification accuracies of 10 subjects according to the number of source domains are shown in Figure 8. The results show that the classification accuracy was constantly improved as the number of source domains increased.

Discussion
In the traditional MW classification method, the classifier is trained only using the existing given EEG data, and then identifies the level of MW for unlabeled data. However, the distribution of EEG signals usually changes in different sessions due to the EEG signals being non-stationary. This may reduce the generalization ability of the classifier and lead to low classification accuracy. Hence, the above methods cannot be effective for cross-session classification when the distribution of EEG data changes greatly, which might lead to low classification accuracy.
The TCA method maps the source domain and target domain data into a common space, and the number of its source domains is fixed. The SA method maps the source domain to the target domain, but it is difficult to find the domain-invariant feature when there is an orthogonal relationship between the source domain and the target domain. The CSSA method proposed in this paper can handle this problem because it can make full use of existing and new data. With the increase in the number of source domains, the accuracy of the classifier might be improved and become stable when the number of source domains is large enough.

Conclusions
In this study, we propose an MW classification method based on EEG cross-session subspace alignment. In this method, the source and target subspaces are obtained separately using the energy features extracted from the ICs of filtered EEG signals. The source subspace base vectors are aligned with the target subspace coordinate system base vector by linear mapping. The source domains are optimized by KL divergence. This study proposes a method to solve the problem of the marginal distribution of EEG signals being different in the same space due to their random non-stationary characteristics. The method may be able to improve cross-session generalization ability and classification accuracy.
Based on the analysis and discussion of the results, it can be concluded that: (1) In order to study the impact of a migration algorithm on classification performance,