1. Introduction
Cognitive workload is a measure of the mental effort required to complete a task [1]. Working memory processes information over short periods of time, while long-term memory stores information over long periods [2]. Tasks such as arithmetic operations, reading and learning require efficient use of working memory. Cognitive workload can therefore be defined as the amount of mental activity that working memory devotes to completing a task. Assessing an individual’s cognitive workload is an essential component of most human-machine collaboration tasks. A major application lies in the defense domain: driving under high-stress environmental conditions, monitoring air traffic control, piloting an aircraft and operating an unmanned vehicle are typical examples. An optimal level of cognitive workload is pivotal in high-risk scenarios where important decisions must be made in real time. The rate at which information is processed determines the workload induced in an individual performing a task. Too high a workload can lead to unplanned and disproportionate hazards, while too low a workload can lead to disengagement from the task. Maintaining an optimal cognitive workload is therefore important for performing high-risk tasks satisfactorily. Emotional intelligence and stability are also regarded as essential components with respect to cognitive workload. An individual’s cognitive load is affected by emotional valence, which interferes with parallel cognitive processing. Studies show a positive relation between emotional intelligence and some cognitive tasks [3,4]. Therefore, classification of cognitive workload can be an essential indicator of emotional intelligence and stability.
Although the assessment of cognitive workload is important, it is not a trivial task. Traditional evaluation methods relied on subjective measures such as interviews or questionnaires in which participants self-reported the workload induced during the task. Research groups such as Hart et al. [5] and Malekpour et al. [6] have contributed to the assessment of cognitive workload with subjective methods, primarily self-assessment questionnaires such as NASA-TLX (National Aeronautics and Space Administration Task Load Index), MCH (Modified Cooper-Harper Scale) and SWAT (Subjective Workload Assessment Technique). Such questionnaires generally record metrics involved in performing the task, such as demand (mental, physical and temporal), effort, pressure, concentration and frustration, to evaluate their connection with task performance. However, these methods are subjective to the individual participant and depend on the participant recalling past engagement, so they can be biased and unreliable as a consistent metric for estimating cognitive workload. Another drawback of post-task questionnaires is that they do not allow real-time evaluation of cognitive workload.
In contrast to subjective questionnaire-based methods, evaluation based on neurophysiological signals presents an opportunity for objective, real-time assessment of cognitive workload. However, this method of evaluation comes at the expense of limited availability of equipment and trained operators, and high costs. To obtain better efficacy and efficiency, physiological measures such as Electroencephalography (EEG), Event-Related Potentials (ERP), eye tracking (gaze entropy) and Heart Rate Variability (HRV) can be utilized [7,8,9]. EEG is widely accepted as a measure for assessing cognitive workload in real time [10,11,12]. Time, frequency, time-frequency and spatial domain features extracted from raw EEG data are effective ways to gain information from EEG signals. Time domain features mainly include ERPs [13], statistical features (mean, standard deviation, variance, etc.), higher-order crossing analysis [14] and Hjorth parameters. Frequency domain features are obtained by decomposing the signal into sub-bands such as the delta, theta, alpha, beta and gamma bands, which are mainly associated with states ranging from deep sleep and drowsiness to relaxed, engaged, conscious and active states [15]. Such features are commonly used for workload classification in machine learning experiments. Recent advances in the application of deep learning to domains such as emotion recognition, pattern recognition and prediction make it an excellent choice for classifying EEG signals [16,17,18,19]. EEG signals can be used to decode and classify the human cognitive state, and various studies have combined different EEG features with machine learning models. Bashivan et al. [20] use the fast Fourier transform to convert EEG data into the frequency domain and map the 3D spatial positions of the electrodes to 2D according to their distribution. Using the theta, alpha and beta frequency bands, 3-channel spectral maps are generated and passed to a CNN model for classification of mental load. Kwak et al. [21] propose a multi-level feature fusion method based on CNN to learn spectral and spatial as well as local and global information. Li et al. [22] review deep learning models (e.g., RNN and CNN) and their applications to EEG data for decoding brain activity and diagnosing brain diseases.
Research on estimating cognitive workload from EEG using machine learning and deep learning is limited. Most studies perform binary classification of workload into high and low by extracting computationally expensive EEG features from the raw data, which makes them poorly suited to real-life or real-time use. Das et al. [23] report accuracies of 86.33% and 82.57% for binary and three-class classification, respectively, using a BLSTM-LSTM based architecture in a subject-independent study. Appriou et al. [24] perform subject-specific and subject-independent studies for binary classification of workload, achieving highest mean accuracies of 72.7% and 63.7% using CNN for the subject-specific and subject-independent cases, respectively. Zhang et al. [25] achieved an accuracy of 88.9% in binary classification using a combination of RNN and 3D CNN models with EEG topographic maps as features. Using a similar technique of topographic maps in combination with a modified CNN model, a highest accuracy of 91.9% in subject-specific three-class classification is reported [26]. However, more informative features about an individual’s brain can be obtained from EEG data. Information acquired from signals originating in a specific brain region can be regarded as representing the activity of that region. This allows separate brain regions to be studied in isolation when evaluating characteristics relevant to a specific cognitive state, and this methodology has been adopted by various researchers. However, neuronal activity is not this straightforward: several regions of the brain contribute to the completion of a task, even though particular regions are dominantly responsible for specific functions required by the task. This makes it necessary to examine inter-regional interactions to understand how the different brain regions collaborate. More formally, this analysis is termed brain connectivity.
Brain connectivity has been used to study the nature of the cerebrum in the past. Based on the attributes of the connections, it can be classified into three types: structural connectivity (biophysical connections between neurons or neural elements), functional connectivity (statistical relations between anatomically unconnected cerebral regions) and effective connectivity (directional causal effects from one neural element to another) [27]. This study explores functional brain connectivity as a measure for assessing different levels of workload. Functional brain connectivity has been linked with psycho-physiological diseases involving cognitive deficits. Strong patterns of connectivity in resting-state EEG are evident in autism spectrum disorders [28]. Slower and less efficient connectivity is found in schizophrenia patients [29]. Another study suggested a relation between high-frequency neural connectivity patterns and a recurrent illness course of major depressive disorder [30]. However, few studies have investigated the links between cognitive workload and functional brain connectivity networks. Dimitrakopoulos et al. [31] is one such study that used a brain connectivity measure as a feature for workload classification; it used correlation as the connectivity method and achieved an accuracy of 88% for binary classification using an SVM classifier. Islam et al. [32] explore Mutual Information based functional connectivity for binary classification of drivers’ mental workload using an SVM classifier and obtained an accuracy of 82%. Because only a limited number of studies explore functional connectivity as a feature for workload classification, in this study we explore different functional brain connectivity methods as features for classifying levels of cognitive workload. EEG data is known to have high inter-subject variability [33,34]. Researchers such as Byrne et al. [35] and Pang et al. [36] have studied this inter-subject variability, and Nentwich et al. [37] report the subject-specific nature of EEG-based functional connectivity. Given this evidence, this study targets subject-specific classification of workload. Zhang et al. [38] compared the subject-dependent and subject-independent approaches and highlighted that variations in the feature distribution of EEG across subjects reduce the generalization ability of a classifier, while the subject-dependent approach provides a promising way to solve the problem of personalized classification. Neto et al. [39] discussed various subject-specific characteristics and data splitting techniques for EEG data. A possible advantage of subject-specific classification is that the classifier can learn subject-dependent features, which can be useful in building robust and effective brain-computer interface (BCI) systems [40,41].
The contributions of this paper can be summarized as follows:
A novel method of cognitive workload estimation using EEG, functional brain connectivity and deep learning is proposed. Our pipeline includes cleaning 64-channel EEG data, selecting 16 electrodes based on Brodmann areas, extracting a 16 × 16 connectivity matrix and using deep neural networks to classify workload into low, medium and high classes.
We chose model-free functional connectivity metrics (Mutual Information (MI), Phase Locking Value (PLV) and Phase Transfer Entropy (PTE)) to classify workload using simple yet effective deep learning architectures (CNN, LSTM and Conv-LSTM) in near real-time.
The proposed method achieved state-of-the-art accuracy for three-class workload classification. We achieved an average accuracy of 80.87% using MI and CNN. PLV and PTE also perform better with CNN than with the other architectures, with average classification accuracies of 74.07% and 71.16%, respectively. CNN outperforms the other architectures because of the high spatial information in the input connectivity matrix.
The efficacious results highlight the promise of using functional connectivity features of EEG for real-time workload classification.
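The connectivity-extraction step of the pipeline described in these contributions can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple equal-width histogram (plug-in) estimator of mutual information, and the names `discretize`, `mutual_information`, `mi_matrix` and the default `n_bins=8` are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def discretize(signal, n_bins=8):
    """Map each sample to an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant signal
    return [min(int((x - lo) / width), n_bins - 1) for x in signal]


def mutual_information(x, y, n_bins=8):
    """Histogram (plug-in) estimate of MI between two signals, in bits."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[i] * py[j]))
    return mi


def mi_matrix(channels, n_bins=8):
    """Symmetric channels x channels matrix of pairwise MI estimates."""
    k = len(channels)
    m = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a, k):
            m[a][b] = m[b][a] = mutual_information(channels[a], channels[b], n_bins)
    return m
```

Passing 16 equally long channel time series to `mi_matrix` yields a 16 × 16 matrix of the kind used as classifier input; the diagonal holds each channel's binned entropy, which a real pipeline might normalize or mask.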
The rest of the paper is organized as follows. Section 2 presents the materials and methods used in the experiment. Section 3 discusses the results obtained in the various experiments, and Section 4 presents the implications of the reported results along with possible future directions and extensions of the current work.
3. Results and Discussion
In this research, we investigated the efficacy of three functional brain connectivity analysis methods (MI, PLV and PTE) for classifying cognitive workload into high, medium and low using three deep learning architectures (CNN, LSTM and Conv-LSTM). Nineteen participants performed the modern version of the n-back task on a computer screen at three levels of cognitive workload: high, medium and low.
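For readers unfamiliar with the n-back paradigm, the following sketch shows how targets and response accuracy are commonly defined in a generic n-back task. It is an assumption-laden illustration: the exact stimuli, timing and scoring rules of the version used in this study are not given in this section, and `nback_targets` and `score_responses` are hypothetical helper names.

```python
def nback_targets(stimuli, n):
    """True at position i when the stimulus equals the one shown n steps earlier."""
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def score_responses(stimuli, responses, n):
    """Fraction of scorable trials (i >= n) where the response matches the target status.

    responses[i] is True if the participant reported a match on trial i.
    """
    targets = nback_targets(stimuli, n)
    scorable = range(n, len(stimuli))
    correct = sum(responses[i] == targets[i] for i in scorable)
    return correct / len(scorable)


if __name__ == "__main__":
    letters = list("ABABCA")
    print(nback_targets(letters, 2))
    print(score_responses(letters, [False, False, True, False, False, False], 2))
```

Larger n requires holding more items in working memory, which is why 1-back, 2-back and 3-back are used as low, medium and high workload conditions.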
The input to the deep learning networks was a 16 × 16 connectivity matrix. Sixteen brain regions were chosen from the Brodmann atlas [61] to cover the different brain regions while keeping the computations as fast as possible.
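As an illustration of how one such connectivity matrix could be built, the sketch below computes the Phase Locking Value between channels. It assumes that instantaneous phase is taken from the analytic signal (a Hilbert transform computed with a naive DFT so that the example needs only the standard library); the study's actual phase-extraction procedure is not stated in this section, and all function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short windows)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def instantaneous_phase(x):
    """Phase of the analytic signal: zero negative frequencies, double positive ones."""
    n = len(x)
    half = n // 2
    gain = [0.0] * n
    gain[0] = 1.0  # keep the DC component unchanged
    for k in range(1, half if n % 2 == 0 else half + 1):
        gain[k] = 2.0
    if n % 2 == 0:
        gain[half] = 1.0  # keep the Nyquist component unchanged
    analytic = dft([s * g for s, g in zip(dft(x), gain)], inverse=True)
    return [cmath.phase(z) for z in analytic]


def plv_from_phases(pa, pb):
    """PLV = | mean over time of exp(i * (phase_a - phase_b)) |, in [0, 1]."""
    return abs(sum(cmath.exp(1j * (a - b)) for a, b in zip(pa, pb))) / len(pa)


def plv_matrix(channels):
    """Symmetric channels x channels PLV matrix with ones on the diagonal."""
    phases = [instantaneous_phase(ch) for ch in channels]
    k = len(phases)
    m = [[1.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            m[a][b] = m[b][a] = plv_from_phases(phases[a], phases[b])
    return m
```

Two channels oscillating at the same frequency with a constant phase offset give a PLV near 1, while channels at different frequencies give a PLV near 0. In practice an FFT-based Hilbert transform from a numerical library would replace the naive `dft` for speed.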
Figure 4 shows the differences (for a random participant) between low, medium and high workloads of MI, PTE and PLV, respectively. Although the differences among the three connectivity metrics are visible, there are no explicit and visible differences among the three workload conditions, i.e., low, medium and high.
However, the statistical analysis revealed significant differences among the three conditions. The mean accuracy (in percentage) for the three n-back conditions was 75.42 (SD = 16.10), 62.27 (SD = 15.64) and 37.84 (SD = 14.18) for 1-back, 2-back and 3-back, respectively, and the differences among the groups were significant (F(2, 75) = 40.22, p < 0.01, effect size = 0.56). Similarly, we found significant differences in reaction time (in ms): 1-back = 492.58 (SD = 91.1), 2-back = 673.58 (SD = 150.57), 3-back = 824.84 (SD = 147.32); ANOVA: F(2, 75) = 40.98, p < 0.01, effect size = 0.48. Differences between all pairwise combinations (1 vs. 2, 1 vs. 3, 2 vs. 3) in both mean accuracy (in percentage) and mean reaction time (in ms) were also significant (p < 0.01).
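To make the reported F statistics concrete, the following sketch computes a one-way between-groups ANOVA and the eta-squared effect size from raw scores. It is a generic textbook calculation under stated assumptions: the section does not say which ANOVA variant or effect-size measure was used, the data in the usage example are made up, and `one_way_anova` is a hypothetical name.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within, eta_squared) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [fmean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, eta_squared


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A p-value would additionally require the F distribution's survival function, which is available in statistical libraries but not in the Python standard library.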
Based on the statistical results, we hypothesized that the brain connectivity matrices would differ across the three workload settings (although not visibly to the naked eye) and that the deep learning classifiers would be able to exploit these differences for successful classification. PTE was expected to perform best among the connectivity metrics because it is directed and phase-specific.
Several experiments (an ablation study) were performed to find the best hyperparameter settings for the three deep learning architectures, and the results are compiled in Table 4. As shown in Table 4, for MI, a mean accuracy of 80.87% was achieved with CNN, 71.87% with LSTM and 71.16% with Conv-LSTM. Similarly, for PLV, a mean accuracy of 75.88% was achieved with CNN, 71.82% with LSTM and 69.68% with Conv-LSTM. Lastly, for PTE, a mean accuracy of 71.16% was achieved with CNN, 69.63% with LSTM and 69.74% with Conv-LSTM. The highest accuracy among all subjects, 97.92%, was achieved with PLV in combination with Conv-LSTM and with CNN, followed by MI with CNN at 95.83%. Besides accuracy, the precision, recall and F1-score of the classifiers are reported in Table 5.
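The metrics named above follow directly from a confusion matrix. The sketch below shows the standard per-class definitions for a three-class problem; the matrix in the usage example is invented for illustration and is not taken from Table 5, and `per_class_metrics` is a hypothetical name.

```python
def per_class_metrics(cm):
    """Accuracy plus (precision, recall, F1) per class.

    cm[i][j] is the number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    rows = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1))
    return accuracy, rows


if __name__ == "__main__":
    # Rows and columns ordered as low, medium, high workload (illustrative counts).
    example = [[8, 1, 1], [2, 7, 1], [0, 2, 8]]
    print(per_class_metrics(example))
```

With balanced classes, as in this study, overall accuracy and the macro-average of per-class recall coincide.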
Figure 5 shows box plots of the accuracy and its statistics (standard error, quartiles and outliers) for all the classifiers in combination with the different functional connectivity methods. The combination of CNN and MI gives the best classification performance. The achieved accuracy outperforms the state of the art in multi-class workload classification in the n-back task with various EEG features and machine learning algorithms. A comparison of the proposed method with others is given in Table 6. Since the number of trials for the three workload settings was balanced, accuracy is indicative of classifier performance. Nevertheless, we reinforced the results with an analysis of the confusion matrices and ROC curves. Figure 6 shows the confusion matrices and Figure 7 shows the ROC curves for all combinations of classifiers and connectivity metrics for the best subject. These figures substantiate that the models perform well on the multi-class classification problem, as the true positive rate is high. The high class-wise area under the curve shows that the classifier is able to learn and classify each class separately with high accuracy.
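Class-wise area under the ROC curve for a multi-class classifier is usually obtained one class at a time against the rest. The following sketch assumes that one-vs-rest scheme and uses the rank-based (Mann-Whitney) formulation of AUC; the section does not state how the curves in Figure 7 were computed, and the probabilities in the usage example are invented.

```python
def binary_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probs, y_true, n_classes):
    """One AUC per class, treating that class as positive and all others as negative."""
    return [
        binary_auc([p[c] for p in probs], [y == c for y in y_true])
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Illustrative softmax outputs for classes 0 = low, 1 = medium, 2 = high.
    probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3],
             [0.3, 0.4, 0.3], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6]]
    print(one_vs_rest_auc(probs, [0, 0, 1, 1, 2, 2], 3))
```

An AUC of 0.5 corresponds to chance-level separation of a class from the rest, and 1.0 to perfect separation.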
Figure 8 shows the features learned by the CNN when MI was given as input. MI was chosen because it gave the highest accuracy, and an input image of medium workload was chosen because the recall for medium workload was highest. The filters visibly learn activations similar to those in the input image, indicating that the classifier was successful. Overall, given the consistent performance of the classifiers across all the metrics and the significant differences found in the statistical tests, it can be concluded that the classifier was successful.
Although state-of-the-art results were obtained, the study had some limitations. One important limitation is the hypothesis itself. We hypothesized that the connectivity matrices would differ across the three workload conditions. However, the study was limited to calculating connectivity from raw (cleaned) EEG data. This was done to test whether all-inclusive (not band-limited) connectivity would yield a discernible differentiation in workload. This has implications for keeping the entire framework close to real time, since band-limiting the signals would have increased the computational complexity. In the future, we will compare our approach with connectivity computed in different frequency bands to form a more comprehensive hypothesis. Another limitation was the subject-dependent classification. Subject-dependent classifiers can extract subject-dependent features and can effectively tackle the accuracy and generalization issues encountered in subject-independent EEG classifiers. However, they also give rise to the issues of long collaboration sessions and collection of large quantities of data [38,39]. Lastly, the choice of 16 brain regions for computing the connectivity matrices is a limitation. The regions were chosen in a hypothesis- and use-case-driven manner, whereas the choice could have been empirical. Exhaustive search and feature selection algorithms could be used in the future to validate the selection of brain regions empirically.
4. Conclusions
Workload classification can be used as an indicator of emotional intelligence and stability. The aim of the study was to build a fast and accurate workload classifier that can be extended to real-time workload classification. Real-time workload classification is important for the development of robust BCI systems [62] and useful in several other domains such as Virtual Reality [63] and Human-Machine Teaming [64]. In this research, EEG was chosen as the neuroimaging modality for its advantages of being cheap and portable and having high temporal resolution [65]. Model-free functional connectivity was chosen for feature extraction because it is fast and is associated with cognitive control in the context of mental workload [66]. It has also been shown that there are subject-specific differences in EEG-based functional connectivity measures [37].
Therefore, a combination of directed and non-directed model-free functional brain connectivity algorithms and state-of-the-art deep learning algorithms was utilized for efficient subject-specific classification of cognitive workload into three levels: high, medium and low. Three functional brain connectivity algorithms (Mutual Information, Phase Transfer Entropy and Phase Locking Value) were used to generate the functional connectivity networks, which represent the neuronal interactions between the different regions of the brain. These connectivity networks were used as inputs to the classification models to classify different levels of workload. We employed three deep learning architectures (CNN, LSTM and Conv-LSTM) for classification of cognitive workload. Intra-subject classification was applied to the data of 19 participants. For each of the three connectivity networks, CNN gave better classification performance than LSTM and Conv-LSTM. CNN outperforms the other two architectures because of the spatial information that the connectivity analysis provides in the input data. MI with CNN produces the best classification results with an accuracy of 80.87%, followed by PLV with CNN with an accuracy of 75.88% and MI with LSTM with an accuracy of 71.87%.
We achieved state-of-the-art accuracy for multi-class workload classification using EEG and functional connectivity. From the results, it can be concluded that EEG-based model-free functional connectivity metrics, when combined with deep learning, provide an accurate, reliable and fast method of classifying cognitive workload. Although there is not much literature available on this, it was hypothesized that PTE would outperform MI and PLV, as PTE is the only one of the three connectivity measures that is both phase-specific and directed. However, in our experiments MI outperformed PTE in classification performance. This may be due to the small number of participants in this study and the choice of brain regions. Therefore, no firm conclusions can be drawn about which model-free connectivity measure is the best. A future study with a larger number of participants and different combinations of brain regions could support clearer conclusions about the comparative performance of the different connectivity measures.
Since these brain connectivity methods enable extremely rapid (especially MI) and accurate connectivity matrix generation from raw EEG data, the proposed architecture (a combination of MI/PLV/PTE and a state-of-the-art CNN) can be used for effective and efficient cognitive state monitoring and other BCI applications. In addition, brain connectivity coupled with hybrid deep learning architectures could be used in the future to classify higher-order cognitive processes such as executive functioning and complex decision-making. The subject-specific classification also permits the analysis and extraction of subject-specific features. Together, this could enable BCIs to become more reliable and efficient tools for effective state monitoring in complex real-world scenarios.