InstanceEasyTL: An Improved Transfer-Learning Method for EEG-Based Cross-Subject Fatigue Detection

Electroencephalogram (EEG) is an effective indicator for the detection of driver fatigue. Due to the significant differences in EEG signals across subjects and the difficulty of collecting sufficient EEG samples during driving, detecting fatigue across subjects using EEG signals remains a challenge. EasyTL is a transfer-learning model that has demonstrated good performance in the field of image recognition, but it has not yet been applied in cross-subject EEG-based applications. In this paper, we propose an improved EasyTL-based classifier, InstanceEasyTL, to perform EEG-based analysis for cross-subject fatigue mental-state detection. Experimental results show that InstanceEasyTL not only requires less EEG data, but also achieves better accuracy and robustness than EasyTL, as well as existing machine-learning models such as the Support Vector Machine (SVM), Transfer Component Analysis (TCA), the Geodesic Flow Kernel (GFK), and Domain-Adversarial Neural Networks (DANN).


Introduction
In recent years, there has been a rapid increase in the number of traffic accidents, leading to huge losses of life and property. A large body of evidence shows that driving in a fatigued state (fatigue driving) is one of the main causes of traffic accidents. Statistics also indicate that fatigue driving leads to 35-45% of road traffic accidents [1][2][3], and directly causes 1550 deaths, 71,000 injuries, and $12.5 billion in economic losses each year according to reports of the U.S. National Highway Traffic Safety Administration (NHTSA) [4]. Therefore, it is of vital importance to design an efficient and accurate analysis model for detecting fatigue over time during driving.
Generally, there are three ways to detect fatigue mental states. The first is video-based detection. Computer vision techniques are used to detect fatigue by analyzing facial cues such as blinking, eye-closure duration, yawning, and so on [5]. In this approach, the blink frequency is one of the key factors for detecting fatigue. However, changes in illumination or the wearing of sunglasses will reduce the reliability of such detection.

Subjects
This experiment is performed with the approval of the local ethics committee of the University of Rome "La Sapienza" (Rome, Italy). 15 subjects (ages ranging from 24 to 30 years, mean ± std = 26.8 ± 3.2 years) are selected to participate in the driving experiment. All the subjects hold valid driving licenses. Before the experiment, all the subjects are informed of the purpose of the experiment and sign written consent forms. In addition, they are not allowed to drink alcohol on the day before the experiment or to take caffeine within 5 h before the experiment. Finally, the experiment is conducted in accordance with the principles outlined in the Helsinki Declaration of 1975, as revised in 2013.

Experimental Protocol
To eliminate the possible effects of circadian rhythms and meals, the experiment is performed between 2 and 5 pm. The driving track is the Spa-Francorchamps (Belgium) circuit, and the vehicle is an Alfa Romeo Giulietta QV (1750 TBi, 4 cylinders, 235 HP) on our driving-simulation platform. Figure 1 shows the experimental driving setup. The experiment is conducted in a quiet environment without any noise; the subjects are asked to sit on a comfortable sofa and drive the car by controlling a steering wheel, following the simulated track displayed 1 m in front of them.
There are 8 tasks during the experiment, as shown in Table 1. To obtain a reference, we define warm-up (WUP), performed at the beginning of the experiment, as the baseline. In WUP, the subject is asked to drive the car for 2 laps without any stimuli and without any errors. The next stage, performance (PERFO), is similar to WUP, but the total driving time must be 2% less than that of WUP. Then the tasks of video "alert" and audio "vigilance" stimuli (TAV) are designed to make the subjects tire more easily; they follow PERFO in a pseudo-random order (TAV3, TAV1, TAV5, TAV2, TAV4). Please note that the subjects receive visual or sound stimuli with different frequencies in the TAV stages. Different stimulus frequencies of "alert" or "vigilance" represent different degrees of stimulation, defined as TAV1, TAV2, TAV3, TAV4, and TAV5 with different stimulus intervals [28][29][30][31]. The duration of each TAV stage depends on the total time spent in WUP. From TAV1 to TAV5, the stimulus intervals are 9800-10,200, 7700-8100, 5900-6300, 4100-4500, and 2300-2700 ms, respectively. The last stage is drowsiness (DROW), driven at a slow speed as if in a crowded city center, without any video or audio stimuli alongside the track, for 2 laps. The subject should press "Button#1" with the left finger when an 'X' appears, which is an "alert" task, and press "Button#2" with the right finger when two consecutive "beeps" sound, which is a "vigilance" task [32][33][34]. "Alert" stimuli are used to mimic actual road conditions, such as traffic lights, pedestrians crossing the road, and other vehicles (as shown in Figure 2) [35], while "vigilance" stimuli are used to simulate the car radio, engine noise, or a phone call (as shown in Figure 3). The whole experiment usually lasts about 2 h: WUP, PERFO, the 5 TAVs, and DROW take about 10 min, 2% less than WUP, 1 h, and 20 min, respectively. There is a break of about 5 min between stages.

WUP
WUP serves as the baseline: the driver drives the car through the whole track without any speed requirements or extra stimuli, but must always keep the car on the path.
PERFO In PERFO, the subject is asked to improve on his previous performance by 2% (total time = baseline − 2%).

TAVs
Five TAVs (TAV1 to TAV5) are presented in a pseudo-random order, TAV3, TAV5, TAV1, TAV2, TAV4, where TAV1 has the longest stimulus interval and TAV5 the shortest.
DROW DROW is the stage in which the subject is required to drive slowly, without any strict speed requirement.
All subjects are required to take a half-hour training session to familiarize themselves with the simulator's commands and interfaces before starting the experiment. The experimental flow is shown in Figure 4.
After finishing each stage, the subjects are required to fill in the NASA-TLX questionnaire to report their subjective workload during the task. Moreover, a run sheet is used to record the subjects' driving performance ("off-road" events) during the experiment. "Off-road" means that the subject has driven out of the track with at least one wheel.

EEG Recording and Preprocessing
EEG is recorded using a digital monitoring system (Brain Products GmbH, Germany), in which all 61 EEG channels are referenced to the earlobe and grounded to FCz, their impedances are kept below 10 kΩ, and the sampling frequency F_s is 200 Hz. We use the EEGLAB toolbox in MATLAB to preprocess and process the EEG data. A band-pass filter is employed to keep EEG signals within the frequency range of 1 to 30 Hz for fatigue-driving analysis [36], and independent component analysis (ICA) [28,37] is used to remove EOG artifacts.
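For illustration, the following is a minimal preprocessing sketch in MNE-Python; the paper itself uses EEGLAB in MATLAB, so the file name, the reference channel names, and the ICA component chosen for exclusion are placeholders rather than the authors' settings.

```python
# A rough MNE-Python equivalent of the preprocessing described above
# (the paper uses EEGLAB/MATLAB; file name, reference channel names, and
# the excluded ICA component are placeholders).
import mne

raw = mne.io.read_raw_brainvision("subject01.vhdr", preload=True)  # hypothetical file
raw.set_eeg_reference(["A1", "A2"])   # earlobe reference (channel names assumed)
raw.filter(l_freq=1.0, h_freq=30.0)   # 1-30 Hz band-pass for fatigue analysis

ica = mne.preprocessing.ICA(n_components=20, random_state=0)
ica.fit(raw)
ica.exclude = [0]                     # EOG-related component, chosen by inspection
ica.apply(raw)                        # remove ocular artifacts
```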

EEG Feature Extraction
Before feeding the EEG data of each subject into the classifier for training, we extract power spectral density (PSD) features from the EEG data of each subject [38,39]; PSD is commonly used for feature extraction in EEG analysis. The detailed PSD feature-extraction procedure consists of the following steps:
1. For the recorded EEG of each channel, a 0.5 s Hamming window without overlap between successive windows is used to divide the EEG into multiple samples. We extract 1400 Hamming windows, and each window has 0.5 s × F_s = 0.5 × 200 = 100 sample points (as shown in Figure 5a). Thus, the number of Hamming windows (HW) is 1400, and each window contains 100 sample points × 61 channels = 6100 points.
2. For each channel in each window, we compute the one-sided PSD estimate of the EEG signal sampled at 200 Hz, which represents the strength, in terms of the logarithm of the power content of the signal, at integer frequencies between 0 and 100 Hz. This produces a 101-dimensional feature vector, i.e., the 100 sample points in each channel become 101 features (as shown in Figure 5b).
3. From the 101 features acquired in step 2, we extract 27 features at the θ band (4-7 Hz), α band (8-13 Hz), and β band (14-30 Hz) [40] (as shown in Figure 5c).
4. The extracted features are then appended together to form feature vectors of dimension D = 61 × 27 = 1647 (as shown in Figure 5d).

5. Consequently, for the HW = 1400 windows/samples, we now have a feature space FS of order HW × D = 1400 × 1647 that will be fed into our proposed model for training (as shown in Figure 5e); a code sketch of this pipeline follows.
Figure 5. EEG feature vector generation process.
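Below is a minimal NumPy/SciPy sketch of this feature pipeline, assuming the raw recording is an array of shape (time points, channels); zero-padding the FFT to 200 points, which places the PSD bins on the integer frequencies 0-100 Hz described in step 2, is our reading of the procedure rather than the authors' code.

```python
# A minimal sketch of the PSD feature extraction described above
# (assumed shapes and helper names; not the authors' exact code).
import numpy as np
from scipy.signal import periodogram

FS = 200                 # sampling frequency (Hz)
WIN = 100                # 0.5 s window -> 100 samples
BANDS = {"theta": (4, 7), "alpha": (8, 13), "beta": (14, 30)}

def psd_features(eeg):
    """eeg: (n_samples, n_channels) raw recording -> (n_windows, n_channels * 27)."""
    n_win = eeg.shape[0] // WIN
    feats = []
    for w in range(n_win):
        seg = eeg[w * WIN:(w + 1) * WIN]          # one 0.5 s window
        win_feats = []
        for ch in range(seg.shape[1]):
            # zero-pad to 200 FFT points so bins fall on integer freqs 0..100 Hz
            f, pxx = periodogram(seg[:, ch], fs=FS, nfft=200, window="hamming")
            logp = np.log(pxx + 1e-12)            # log power, 101 bins
            for lo, hi in BANDS.values():
                win_feats.extend(logp[(f >= lo) & (f <= hi)])  # 4 + 6 + 17 = 27 values
        feats.append(win_feats)                   # 61 channels * 27 = 1647 features
    return np.asarray(feats)                      # (1400, 1647) for a full recording
```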

The Existing EasyTL Method
EasyTL [24] is a transfer-learning method that has been applied in the field of image recognition and achieved good performance. It consists of two parts: intra-domain alignment and intra-domain programming, as shown in Figure 6. Intra-domain alignment [41] aligns the n_s samples of the source domain s, Ω_s = {x_i^s}_{i=1}^{n_s}, with the n_t samples of the target domain t, Ω_t = {x_j^t}_{j=1}^{n_t}, aiming to make the difference between s and t as small as possible. Please note that the source domain is labeled, while the target domain is not. Intra-domain programming builds the classifier by constructing a Probability Annotation Matrix W, whose rows correspond to the class labels c ∈ {1, 2, ..., C} and whose columns correspond to the target samples x_j^t. The element W_cj indicates the annotation probability that x_j^t belongs to class c. Based on the matrix W, EasyTL can predict the target samples: the class label chosen for the target sample x_j^t is the one with the maximum of {W_cj}, j ∈ {1, 2, ..., n_t}. For instance, as shown in Figure 7, the class label of x_1^t will be C_3, since it has the maximum probability.
Figure 7. An example of the probability annotation matrix. The rows denote the class labels, and the columns denote the target samples. The entry W_ij indicates the annotation probability of x_j^t belonging to class C_i, i = 1, 2, 3, 4 and j = 1, 2, ..., n_t. The class labels assigned to each x_j^t are marked in bold.
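As a toy illustration of the annotation-matrix step, the snippet below assigns each target sample the class with the maximum column probability; the matrix values are hypothetical.

```python
# Label assignment from a probability annotation matrix W
# (C classes x n_t target samples; values are made up for illustration).
import numpy as np

W = np.array([[0.1, 0.6, 0.2],
              [0.2, 0.1, 0.5],
              [0.7, 0.3, 0.3]])   # W[c, j]: prob. that target sample j is class c

pred = W.argmax(axis=0)           # pick the class with maximum annotation probability
print(pred)                       # [2 0 1] -> x_1 is assigned to class C_3
```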

InstanceEasyTL
The main reason EasyTL performs well in image recognition is that there is generally not much difference in pixel features between images in the target domain and those in the source domain, which makes it relatively easy for intra-domain alignment to align the target domain with the source domain. However, in cross-subject EEG analysis, the significant differences in EEG among subjects lead to large differences in the distribution of EEG features. The existing intra-domain alignment in EasyTL therefore struggles to align the features of the source and target domains, and it is difficult to obtain an ideal cross-subject analysis result. In this paper, an improved EasyTL-based method, InstanceEasyTL, is proposed to overcome this shortcoming of EasyTL for cross-subject EEG analysis. The main idea of InstanceEasyTL is that, to match the different distributions of EEG signals from different subjects, we adopt a weighted alignment strategy over the EEG samples collected from both source and target domains. To achieve this, InstanceEasyTL "borrows" some EEG samples from the target domain Ω_t and combines them with the original source domain Ω_s (also called T_sd) to form a new source-domain sample set for training, which provides more EEG data and reduces the cost of EEG collection. As shown in Figure 8, the initial target domain Ω_t is divided into two parts, S and T_td, in which T_td is employed as part of the new source domain Ω_s; thus the new Ω_s consists of the initial source domain T_sd and the borrowed T_td, and accordingly the new target domain Ω_t only includes the part S. Mathematically, InstanceEasyTL can be described as follows.
First, we determine T_sd and T_td according to the coefficient λ, as in Equations (1) and (2):

T_sd = {(x_i^sd, y_i^sd)}_{i=1}^{n_s}, x_i^sd ∈ Ω_s, y_i^sd ∈ y_s, (1)

T_td = {(x_i^td, y_i^td)}_{i=1}^{m}, x_i^td ∈ Ω_t, y_i^td ∈ y_t, m = λ · n_t, (2)

where y_s and y_t are the sets of class labels corresponding to Ω_s and Ω_t, (x_i^sd, y_i^sd) and (x_i^td, y_i^td) are the i-th sample and its corresponding class label drawn from the source domain Ω_s and the target domain Ω_t, respectively, and n_s and m are the numbers of samples in T_sd and T_td.
Accordingly, we form the new source domain Ω_s by Equation (3) and the new target domain Ω_t by Equation (4):

Ω_s = T = T_sd ∪ T_td, (3)

Ω_t = S = {x_j^t}_{j=1}^{l}, (4)

where T and S denote the sample sets of the new Ω_s and Ω_t, respectively, and l is the number of samples in the new Ω_t. Algorithm 1 illustrates the proposed InstanceEasyTL method in detail. First, initial weights are assigned to both the training source domain T_sd and the training target domain T_td using Equation (5). Note that, compared with the EasyTL method, T_sd equals the original Ω_s, T_td ⊂ the original Ω_t, m < n_t, and m + l = n_t (see also Figure 8).
Secondly, the assigned weights of both T_sd and T_td are divided by the sum of all weights and stored as p^t (as shown in Equation (6)). Based on the intra-domain programming method (called EasyTL(c) in Algorithm 1), we take the training sample set T in Ω_s (see Equation (3)), p^t, and the new test set S in Ω_t (see Equation (4)) as the input, and calculate the output h^t of EasyTL(c). Here, S is not used for updating the weights, but only for testing after the iterations end.
Thirdly, we calculate the error ε^t between h^t and the real class labels y(x). The weights of T_td and T_sd are then updated by the β^t-based functions (see steps 6 and 7 in Algorithm 1).
Finally, when the number of iterations reaches N, the expected output on S is calculated by Equation (10).

Algorithm 1. InstanceEasyTL
Require: the labeled sample set T in Ω_s (Equation (3)), the unlabeled test set S in Ω_t (Equation (4)), the intra-domain programming method EasyTL(c), and the maximum number of iterations N.
1: Initialize: the initial weights, calculated by Equation (5), and the initial iteration number t = 1:
   W^1 = (w_1^sd, ..., w_{n_s}^sd, w_1^td, ..., w_m^td), w_i^1 = 1 / (n_s + m), (5)
   where w_i^sd and w_i^td are the weights of the i-th samples in Ω_s and Ω_t, and W_sd^1, W_td^1, and W^1 are the sets of weights in Ω_s, in Ω_t, and in both domains after one iteration, respectively.
2: At iteration t ∈ {1, 2, ..., N}, normalize the weights by Equation (6):
   p^t = W^t / sum^t, sum^t = Σ_{i=1}^{n_s+m} w_i^t, (6)
3: where w, W^t, and sum^t represent the weight of one sample, the set of weights of all samples, and the sum of the weights in Ω_s, respectively, and p^t is the set of weights of the samples in Ω_s in proportion to sum^t.
4: Take T, p^t, and S as the input of EasyTL(c) to calculate the expected class labels h^t(x) of the samples x in T and S.
5: Calculate the error ε^t between h^t(x) and y(x) on T_td by Equation (7), where y(x) returns the real label of sample x in T_td:
   ε^t = Σ_{i=1}^{m} w_i^td |h^t(x_i^td) − y(x_i^td)| / Σ_{i=1}^{m} w_i^td, (7)
6: Define the functions β^t and β to update W_td and W_sd, respectively, as shown in Equation (8) [42]:
   β^t = ε^t / (1 − ε^t), β = 1 / (1 + √(2 ln n_s / N)), (8)
7: Obtain the new weight set after iteration t by Equation (9):
   w_i^sd ← w_i^sd · β^{|h^t(x_i^sd) − y(x_i^sd)|}, w_i^td ← w_i^td · (β^t)^{−|h^t(x_i^td) − y(x_i^td)|}, (9)
8: t = t + 1; go back to step 2 until t = N.
Ensure: the final expected class labels h_f(x) in S according to Equation (10), where h^t(x) is the expected class label of sample x in S at iteration t, obtained from step 4:
   h_f(x) = 1 if Π_{t=⌈N/2⌉}^{N} (β^t)^{−h^t(x)} ≥ Π_{t=⌈N/2⌉}^{N} (β^t)^{−1/2}, and 0 otherwise. (10)
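To make the weighting scheme concrete, here is a minimal NumPy sketch of the TrAdaBoost-style loop in Algorithm 1; `easytl_c` stands in for the intra-domain programming classifier and is assumed to accept normalized sample weights, and the uniform initialization, the error clipping, and the binary-label assumption are ours rather than the authors'.

```python
# A sketch of Algorithm 1 with a pluggable base classifier easytl_c
# (hypothetical interface: easytl_c(X, y, weights, X_test) -> train/test labels).
import numpy as np

def instance_easytl(X_sd, y_sd, X_td, y_td, X_S, easytl_c, N=30):
    """X_sd/y_sd: source samples; X_td/y_td: borrowed labeled target samples;
    X_S: unlabeled test samples. Binary labels in {0, 1} are assumed."""
    n_s, m = len(X_sd), len(X_td)
    X = np.vstack([X_sd, X_td])
    y = np.concatenate([y_sd, y_td])
    w = np.full(n_s + m, 1.0 / (n_s + m))                 # Eq. (5): uniform init
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_s) / N))   # Eq. (8): source factor
    H, B = [], []                                          # per-iteration h_t, beta_t
    for t in range(N):
        p = w / w.sum()                                    # Eq. (6): normalize
        h_train, h_test = easytl_c(X, y, p, X_S)           # step 4: weighted classifier
        err_td = np.abs(h_train[n_s:] - y_td)
        eps = (w[n_s:] * err_td).sum() / w[n_s:].sum()     # Eq. (7): error on T_td
        eps = min(max(eps, 1e-10), 0.499)                  # keep beta_t well defined
        beta_t = eps / (1.0 - eps)                         # Eq. (8): target factor
        err_sd = np.abs(h_train[:n_s] - y_sd)
        w[:n_s] *= beta ** err_sd                          # Eq. (9): shrink bad source
        w[n_s:] *= beta_t ** (-err_td)                     # Eq. (9): boost hard target
        H.append(h_test)
        B.append(beta_t)
    # Eq. (10): weighted vote over the second half of the iterations
    H, B = np.array(H[N // 2:]), np.array(B[N // 2:])
    lhs = (-np.log(B)[:, None] * H).sum(axis=0)
    rhs = 0.5 * (-np.log(B)).sum()
    return (lhs >= rhs).astype(int)
```

The design follows TrAdaBoost [42]: weights of misclassified source samples shrink while weights of misclassified borrowed target samples grow, so later iterations concentrate on target-like samples.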

Results
The experiments are performed on a machine with 64 GB of memory and an NVIDIA Titan Xp GPU with 12 GB of graphics memory. Additionally, an Intel i5-4570 CPU with a frequency of 3.2 GHz, 1 TB of storage, and 8 GB of memory is also employed to run these algorithms. InstanceEasyTL and the other compared models except DANN are tested in MATLAB R2016b, and DANN in PyCharm 2018.3.3. Code is available at https://github.com/13etterrnan/InstanceEasyTL.
We compare InstanceEasyTL with traditional machine-learning and transfer-learning methods, including SVM with a linear kernel and intra-domain alignment [41], Transfer Component Analysis (TCA), Geodesic Flow Kernel (GFK), Domain-Adversarial Neural Networks (DANN), and the existing EasyTL.

Selection of Experimental Conditions
To distinguish the experimental conditions used to train the models, we make use of the subjective workload scores (i.e., NASA-TLX) and the subjects' driving performance (i.e., "off-road" events). Across the experimental conditions, TAV3 and DROW show the highest and the lowest workload, respectively. In addition, because TAV3 is the first stage with audio and video stimuli in each experiment, the subjects are most alert during it. Correspondingly, by the DROW stage, the subjects are very likely to feel tired. Therefore, TAV3 and DROW are used as the typical mental states for analysis.
There are a total of 1400 samples for each subject, including 700 samples for TAV3 and 700 samples for DROW, respectively.
In this paper, we set the number of iterations N = 30 for InstanceEasyTL, since we find that the performance converges after 30 iterations. We perform 15 experiments to test InstanceEasyTL; in each experiment, 14 of the 15 subjects are used for training, and the remaining one for testing. Thus, after 15 experiments, each subject has been employed as the target domain exactly once, and the Ω_t in each experiment belongs to a different subject, which ensures a more objective assessment of InstanceEasyTL. Figure 9 shows the classification performance of the different models. In these 15 experiments, InstanceEasyTL has almost always the highest classification accuracy in cross-subject EEG analysis among these classifiers, which is most obvious in the three experiments Sub_1-others, Sub_3-others, and Sub_15-others. Regarding average performance, the average accuracy of InstanceEasyTL is 88.33%, which is significantly higher than that of SVM (61.65%), TCA (58.01%), GFK (56.32%), DANN (70.75%), and EasyTL (70.91%); it is more than 17 percentage points higher than that of the second-ranked classifier.
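A sketch of this leave-one-subject-out protocol is shown below, reusing `instance_easytl` and the hypothetical `easytl_c` from the earlier sketch; the per-subject arrays and the borrowing ratio λ are assumed inputs.

```python
# Leave-one-subject-out evaluation (assumed data layout: feats[s] is the
# (1400, 1647) feature matrix and labels[s] the 0/1 labels -- 0 = TAV3,
# 1 = DROW -- for subject s).
import numpy as np

def loso_accuracy(feats, labels, ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    accs = []
    for s in range(len(feats)):                      # subject s = target domain
        X_sd = np.vstack([feats[k] for k in range(len(feats)) if k != s])
        y_sd = np.concatenate([labels[k] for k in range(len(feats)) if k != s])
        idx = rng.permutation(len(feats[s]))
        m = int(ratio * len(idx))                    # borrow m labeled samples (T_td)
        td, te = idx[:m], idx[m:]
        pred = instance_easytl(X_sd, y_sd, feats[s][td], labels[s][td],
                               feats[s][te], easytl_c, N=30)
        accs.append((pred == labels[s][te]).mean())
    return np.mean(accs)
```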

Statistical Analysis Results
To further analyze the performance of InstanceEasyTL on fatigue-driving detection, we calculate the Recall, Precision, and F1score of each classifier; the results are shown in Table 2. These three indicators are calculated as follows:

Recall = TP / (TP + FN), Precision = TP / (TP + FP), F1score = 2 × Precision × Recall / (Precision + Recall), (11)

where TP is the number of samples correctly predicted as DROW, FP is the number of samples incorrectly predicted as DROW, and FN is the number of DROW samples incorrectly predicted as TAV3.
Therefore, Recall is the ratio of the samples correctly predicted as DROW to all samples in the DROW state, Precision is the ratio of the samples correctly predicted as DROW to all samples predicted as DROW, and F1score is the harmonic mean of Precision and Recall.
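These definitions can be sanity-checked with scikit-learn on toy labels (1 = DROW, 0 = TAV3; the label vectors are made up for illustration):

```python
# A quick check of Eq. (11) with scikit-learn on toy labels.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4
print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
print(f1_score(y_true, y_pred))         # harmonic mean = 0.75
```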
We find that in all 15 experiments InstanceEasyTL outperforms the other methods. The average Recall, Precision, and F1score achieved by InstanceEasyTL on our dataset are 89.08%, 88.02%, and 88.46%, respectively, which significantly outperforms DANN by about 18%, 16%, and 17%.

The Impact of Different Ratios of T_td : Ω_t on InstanceEasyTL
In this section, we evaluate the impact of different ratios of T_td : Ω_t on InstanceEasyTL. Here, we set the ratio of T_td : Ω_t to 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. Different ratios of T_td : Ω_t result in different classification accuracies of InstanceEasyTL, as shown in Figure 10. From Figure 10, the cross-subject recognition accuracy of InstanceEasyTL is above 80% and shows an increasing trend as the ratio of T_td : Ω_t increases, with very few exceptions. Similar results can also be observed in Figure 10p, which reports the average accuracy of InstanceEasyTL. We can then conclude that the higher the ratio of T_td : Ω_t, the higher the accuracy of InstanceEasyTL.
Figure 10. The accuracy of InstanceEasyTL under different ratios of T_td : Ω_t; Sub_i-others (i = 1, 2, ..., 15) has the same meaning as in Figure 9. We also calculate the average accuracy of InstanceEasyTL.
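Under the same assumptions as the earlier sketches, the ratio sweep in Figure 10 could be reproduced as:

```python
# Sweep the borrowing ratio as in Figure 10 (uses the hypothetical
# loso_accuracy, feats, and labels from the sketches above).
for ratio in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(ratio, loso_accuracy(feats, labels, ratio=ratio))
```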

Discussion
Compared to other traditional methods, InstanceEasyTL achieves better classification accuracy for cross-subject EEG analysis, mainly for two reasons. First, from the perspective of data analysis, InstanceEasyTL "borrows" some samples and labels from the target domain Ω_t and treats them as part of the newly generated source domain Ω_s; thus, InstanceEasyTL acquires more information about Ω_t, which raises the classification accuracy. For other traditional methods, although the samples in the target domain can also be used for classification, there are usually no corresponding labels for these samples. Secondly, from the perspective of model training, InstanceEasyTL adjusts the weights of samples over multiple iterations and can adaptively select the samples that are similar to those in the target domain Ω_t, which is helpful for the subsequent training process. Because InstanceEasyTL adopts samples and labels from Ω_t, which come from a different subject, for training, much better cross-subject performance is then acquired. Although other methods, such as TCA and EasyTL, have proven to be effective in image recognition applications because they make the feature distributions of the source and target domains much more similar, the significant differences in EEG signals over time and across subjects make it very difficult to train those models to acquire similar features between the source and target domains for cross-subject EEG analysis.
We then employ Recall, Precision, and F1score for statistical comparison. From Equation (11), the greater the Recall/Precision/F1score, the better the classifier distinguishes DROW from TAV3. From Table 2, InstanceEasyTL has the highest Recall, Precision, and F1score, which illustrates its better classification performance in cross-subject analysis.
Moreover, the impact of different ratios of T_td : Ω_t on InstanceEasyTL is evaluated as well, as shown in Figure 10. Here, S and T_td are used to form the new target domain Ω_t and the new source domain Ω_s, respectively, and the samples in Ω_s and Ω_t are used by InstanceEasyTL for training and testing. In general, the ratio of samples between these two domains affects the performance of InstanceEasyTL. As the ratio of T_td : Ω_t increases, on the one hand, the number of training samples in Ω_s grows, and the performance of InstanceEasyTL improves accordingly. On the other hand, InstanceEasyTL can extract more characteristics of the original target domain Ω_t, which is more suitable for EEG analysis across subjects. Additionally, as the ratio of T_td : Ω_t becomes larger, the number of samples in Ω_s grows, which helps InstanceEasyTL extract the characteristics of Ω_t and further improves its performance. However, a few abnormal results exist. For instance, as shown in Figure 10a, when the ratio of T_td : Ω_t increases from 0.3 to 0.4, the accuracy of InstanceEasyTL decreases. The most likely reason is that we do not completely remove the artifact components from the EEG signals during the preprocessing stage, which results in the inclusion of some artifact components in the new source domain Ω_s or the new target domain Ω_t, thereby reducing the classification accuracy of InstanceEasyTL.

Conclusions
In this paper, we propose an EasyTL-based model, named InstanceEasyTL, for EEG-based fatigue detection across subjects. InstanceEasyTL extracts important characteristic information from some subjects and transfers it to new subjects by assigning weights to the samples and "borrowing" part of the data of the new subject as training samples. To verify the performance of InstanceEasyTL, we compare it with traditional methods such as SVM, TCA, GFK, DANN, and EasyTL. The results show that InstanceEasyTL obtains better cross-subject classification performance. Moreover, the statistical analysis shows that InstanceEasyTL can better distinguish the mental state of DROW from TAV3, so we can further conclude that InstanceEasyTL has a more comprehensive classification performance. In addition, by adjusting the ratio of T_td : Ω_t, we find that the performance of InstanceEasyTL improves as the ratio increases.
In our future work, we will continue to focus on partial transfer learning, where the target-domain label space is a subspace of the source-domain label space.