1. Introduction
For industrial safety, identifying risks from human error is necessary because unsafe and reckless behaviors of industrial workers and lack of precautions are directly responsible for human-caused problems. Some of the key factors of these unsafe and reckless behaviors include lack of proper sleep, lack of a proper diet, physical defects, and fatigue, which can lead a person into a stressful situation. This situation causes discomfort, anxiety, depression, cardiovascular disease, high heart rate, and several other harmful effects [
1,
2]. In general, stress is the body’s response to mental and physical pain. More formally, stress can be defined as a complex psycho-physiological state initiated by the discrepancy between a person’s perceived exogenous and endogenous demands (stressors) and their perceived competence to cope with these demands [
3,
4].
In recent years, techniques including functional magnetic resonance imaging (fMRI), near-infrared spectroscopy (NIRS), electrocorticography (ECoG), and electroencephalography (EEG) have been used to detect and analyze emotional states [
5,
6]. fMRI and NIRS measure brain activation through blood flow in the brain. fMRI can capture signals inside the brain with excellent spatial resolution, but its measurements lag behind changes in the state of the brain. In contrast, NIRS can only probe the surface of the brain, and its signal is likewise acquired through blood flow. ECoG and EEG measure brain waves. Although ECoG has the advantage of measuring wide-bandwidth signals, placing electrodes on the surface of the brain requires a surgical procedure. EEG, by contrast, only requires wearing a helmet [
7], and therefore EEG can be measured non-invasively. It measures signals from the scalp rather than the brain itself [
8]. Therefore, in this study, EEG is considered. The main purpose of this paper is to identify the mental state of a person by analyzing the EEG signals.
Several studies have demonstrated correlations between EEG patterns and emotional states, e.g., calmness, depression, and excitement [
9,
10,
11]. In [
10], an EEG-based analysis on the frontal alpha asymmetry index with a support vector machine (SVM) algorithm was proposed for stress analysis. In [
11], a frequency domain-based analysis with a k-nearest neighbor (k-NN) algorithm was proposed to analyze the stress state using the EEG signal. Recently, deep learning-based approaches have also been applied to classify different mental states by analyzing EEG signals. These methods automatically learn feature representation from the data to distinguish among different classes. Li et al. [
12] used a spatial and temporal deep learning architecture to learn discriminative spatial-temporal EEG features for the detection of emotional states. Hefron et al. [
13] proposed a novel convolutional recurrent neural model that uses multipath subnetworks for cross-participant EEG-based assessment of cognitive workload. Here, the bi-directional residual recurrent layers yielded a statistically significant increase in predictive accuracy. Kuanar et al. [
14] designed an EEG-based multispectral time-series imaging technique with a recurrent neural network algorithm for cognitive analysis of working memory load. However, these deep networks usually need large amounts of training data. In addition, deep features (i.e., the outputs of intermediate layers of deep networks) may be correlated and non-separable. Moreover, the existing approaches pay little attention to selecting suitable features from particular domains for deeper causal analysis.
In this study, an EEG data-driven mental state identification technique is developed to analyze whether a person is experiencing stress. The pre-processed signals from the Database for Emotion Analysis using Physiological Signals (DEAP) are considered for analysis [
15]. By analyzing these signals, a custom hybrid feature pool is designed, which consists of two types of features: (1) statistical features from the time domain, and (2) wavelet-based features from the time-frequency domain. Once the feature space is available, the next step is to select the reliable features. Feature selection has a great impact on classifier performance. In supervised machine learning approaches, the correct class labels of the input data are known, and a feature evaluation criterion is applied to extract valuable features from the given data. The feature selection process serves two key purposes: (1) selection of relevant features, and (2) reduction of the feature dimension. Traditional feature selection algorithms usually select the non-redundant or relevant attributes from a single dataset only and do not consider variations across input datasets. In other words, certain features may be non-redundant/relevant for one dataset and redundant/irrelevant for another. To address this issue, a wrapper-based approach called Boruta is utilized, which randomly shuffles sub-datasets and selects the relevant features accordingly. Finally, the features selected by the Boruta technique are supplied to the k-NN algorithm for classification. To demonstrate the robustness of the proposed method, it is compared with two cases: (1) dimensionality reduction of the hybrid feature pool by principal component analysis (PCA) followed by k-NN classification, and (2) use of all the features (i.e., without any feature selection technique) followed by k-NN classification.
The main contributions of this paper are summarized as follows: (1) a hybrid feature pool is designed by combining statistical features from the time domain and wavelet-based EEG band-wise features from the time-frequency domain to capture the intrinsic information in the EEG signal despite artifacts, and (2) a wrapper-based feature selection mechanism, Boruta, is deployed to identify all the important attributes of the hybrid feature pool.
The remainder of this paper is arranged as follows.
Section 2 describes the publicly available dataset and the details of each step of the proposed methodology. Data arrangement and in-depth analysis of the experimental results are provided in
Section 3. The limitations and the future aspects of this research are discussed in
Section 4. Finally,
Section 5 concludes this paper.
2. Methodology
In this study, the main goal is to identify signs of stress from EEG recordings. In
Figure 1, a block diagram of the overall proposed method is given. The proposed approach is divided into five blocks: (1) pre-processed data collection from the DEAP dataset [
15], (2) data annotation and arrangement, (3) creation of the hybrid feature pool, (4) discriminant feature selection by a wrapper-based feature-ranking approach, Boruta, and (5) k-NN-based classification.
2.1. Dataset Description and Annotation
EEG signals from the DEAP dataset [
15] are used for this mental stress classification task. This dataset comprises emotional responses induced by music videos. In total, 32 participants from the 19–37-year age group were tested to build this dataset. Each participant watched 40 music videos. While watching, EEG signals were recorded for 1 min with the 10–20 system of electrode placement. Each music video has a separate experiment ID. For each experiment ID, signals from 40 different channels were recorded. Among these 40 channels, 32 are EEG channels and 8 are peripherals. The specific details related to these channel descriptions are well documented in [
15]. In this research, the main objective is to identify the calm or stress state of a person by analyzing the EEG signals. Verma et al. [
16] compared the performance of 40 channel features (32 EEG and 8 peripherals) with that of 32 EEG channel features and observed that the 40-channel features did not bring any significant improvement over the 32 EEG channel features. For similar reasons, Zhang et al. [
17] considered only EEG based feature extraction in identifying the emotional state of a person. Therefore, in this study, the 32 EEG channels are considered for the final analysis.
After considering the 32 EEG channels, the valence and arousal levels are analyzed from the online self-assessment of each participant for each experiment ID [
18,
19]. Every experiment ID has a predefined online rating, by which all the experiment IDs can be categorized as belonging to either the stress genre or the calm genre. However, when a participant provides the self-assessment rating for the same video, from the participant rating list (available in [
15]), it can be observed that a video from the calm genre may induce a feeling of stress, and vice versa. Therefore, in this research, the online self-assessment ratings are used to categorize the experiment IDs (as either calm or stress) for each participant by Equations (1) and (2), derived from [
11,
20,
21].
After the data were separated into the two states, 7 participants (participant numbers 3, 6, 7, 9, 17, 23, and 30) did not show distinct calm and stress states. With the remaining 25 participants, the dataset was arranged for the classification task, as shown in Table 3 of
Section 3.1. During this arrangement, some of the datasets turned out to be imbalanced. To handle this imbalance, the synthetic minority over-sampling technique (SMOTE) [
22] was adopted before applying the feature design and classification steps.
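The interpolation idea behind SMOTE can be illustrated with a minimal sketch. It is not the implementation used in this study: the function name `smote`, the neighbor count `k=3`, and the toy minority samples are assumptions made for illustration only. Each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbors.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-dimensional minority-class samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic sample is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class.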
2.2. Hybrid Feature Pool
The main objective of designing the hybrid feature pool is to obtain reliable information from EEG signals for emotional state identification. To form a reliable feature pool, analyzing the signals from different domain perspectives is necessary. Therefore, the hybrid feature-attributes are measured from two specific domain-based analyses: (a) statistical features from time-domain analysis, and (b) wavelet-based feature analysis from the time-frequency domain. In this study, a total of 19 features is considered, 11 from the time domain and 8 from the wavelet-based time-frequency domain.
2.3. Statistical Features from the Time Domain
From the time domain, the extracted statistical features are root mean square (F1), square mean root (F2), peak to peak (F3), kurtosis (F4), skewness (F5), kurtosis factor (F6), shape factor (F7), crest factor (F8), and impulse factor (F9). In addition to these statistical feature parameters, Hjorth parameters are also considered to compute the mobility and complexity of the signal [
20,
23,
24]. These two parameters contain the information on the frequency spectrum of the signal. Mobility is F10 and complexity is F11. All 11 features are mathematically defined in
Table 1.
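As an illustration, several of these time-domain features can be computed as in the following sketch. Since Table 1 is not reproduced in this section, the formulas below are the commonly used definitions and should be read as assumptions rather than the exact definitions of this study; the kurtosis factor (F6) is omitted because its definition varies between sources. The function names and the toy signal are hypothetical.

```python
import math

def time_domain_features(x):
    """Common statistical features of a one-dimensional signal (assumed definitions)."""
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    smr = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    peak = max(abs(v) for v in x)
    # Central moments used for kurtosis and skewness.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return {
        "rms": rms,
        "smr": smr,
        "peak_to_peak": max(x) - min(x),
        "kurtosis": m4 / m2 ** 2,
        "skewness": m3 / m2 ** 1.5,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
    }

def hjorth(x):
    """Hjorth mobility and complexity from first and second differences."""
    def variance(s):
        m = sum(s) / len(s)
        return sum((v - m) ** 2 for v in s) / len(s)
    dx = [b - a for a, b in zip(x, x[1:])]
    ddx = [b - a for a, b in zip(dx, dx[1:])]
    mobility = math.sqrt(variance(dx) / variance(x))
    complexity = math.sqrt(variance(ddx) / variance(dx)) / mobility
    return mobility, complexity

signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # toy signal, not EEG data
features = time_domain_features(signal)
mobility, complexity = hjorth(signal)
print(features["rms"], features["peak_to_peak"], mobility, complexity)
```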
2.4. Wavelet-Based Feature Analysis from the Time-Frequency Domain
Usually, the EEG signal is divided into five distinct frequency bands of delta, theta, alpha, beta, and gamma [
25,
26]. The dataset considered in this study was downsampled to 128 Hz, bandpass-filtered from 4 to 45 Hz, and cleaned of EEG artifacts. Consequently, the delta band (0–4 Hz) is not present in the dataset. To extract and analyze time-frequency wavelet features from the remaining frequency bands, the discrete wavelet packet transform (DWPT) is used. A level-5 DWPT with the Daubechies 4 (db4) wavelet function is applied to the signals. The resulting DWPT decomposition tree is depicted in
Figure 2.
The frequency bands and the corresponding DWPT packets are detailed in
Table 2.
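The nominal bandwidth of a wavelet packet follows from the sampling rate and the decomposition level, which the short sketch below works out. It assumes ideal half-band filters and lists the nodes in frequency order (the natural filter-bank order of packet nodes differs); the function name `packet_bands` is hypothetical, and the sketch does not state which packets were selected in Table 2.

```python
def packet_bands(fs, level):
    """Nominal frequency range (Hz) of each packet node at `level`, in frequency order."""
    nyquist = fs / 2
    width = nyquist / 2 ** level
    return [(i * width, (i + 1) * width) for i in range(2 ** level)]

bands = packet_bands(fs=128, level=5)
print(len(bands), bands[0], bands[-1])
```

With a 128 Hz sampling rate, the usable spectrum extends to 64 Hz, so a level-5 decomposition yields 32 nodes of 2 Hz each, while nodes at shallower levels are proportionally wider (for example, 8 Hz at level 3).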
To obtain four distinct bandwidths, five DWPT packets are considered. From the wavelet coefficient vector, two features are calculated: energy (F12) and standard deviation (F13). The entropy is calculated for each wavelet band, and the sum of these entropies is considered as a separate feature, denoted as the wavelet sum of entropy (F14). Then, from each DWPT packet, the power spectral density (PSD) is calculated by using the Welch method [
27]. For the five DWPT packets, five power bands are calculated and considered as five separate features from F15 to F19.
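A minimal sketch of the coefficient-based features is given below. The energy and standard deviation follow their usual definitions, whereas the entropy is taken here as the Shannon entropy of the normalized coefficient energies, which is an assumption because the exact entropy definition is not stated in this section. The packet names and coefficient values are hypothetical, and the Welch PSD features (F15 to F19) are not covered.

```python
import math

def packet_features(coeffs):
    """Energy, standard deviation, and Shannon entropy of one coefficient vector."""
    n = len(coeffs)
    energy = sum(c * c for c in coeffs)
    mean = sum(coeffs) / n
    std = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / n)
    if energy == 0:
        return energy, std, 0.0
    # Assumed definition: entropy of the normalized energy distribution.
    probs = [c * c / energy for c in coeffs if c != 0.0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return energy, std, entropy

# Hypothetical coefficient vectors for two packets.
packets = {"theta": [1.0, 1.0, 1.0, 1.0], "alpha": [2.0, 0.0, 0.0, 0.0]}
per_packet = {name: packet_features(c) for name, c in packets.items()}
entropy_sum = sum(entropy for _, _, entropy in per_packet.values())
print(per_packet["theta"], entropy_sum)
```

A packet whose energy is spread evenly over its coefficients has high entropy, whereas a packet dominated by a single coefficient has an entropy close to zero.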
2.5. Feature Selection by Boruta
As discussed in the Introduction, the objective of feature selection is to obtain the minimal-optimal feature set (i.e., the smallest possible feature set). Traditional feature selection algorithms rely on classification accuracy to decide the importance of a feature. Features that improve classification accuracy are regarded as non-redundant; the others are regarded as redundant. The removal of an important feature may decrease the classification accuracy; however, the absence of a significant change in classification accuracy does not imply that the feature is irrelevant or unimportant. Thus, the selection of all relevant attributes instead of just the non-redundant ones is important for preventing the loss of any useful information from the feature set [
28,
29]. A wrapper method usually uses a classifier wrapped around the feature selection process to decide which features are important. The Boruta algorithm adopted in this study is a wrapper-based technique built around the random forest (RF) [
30] classifier. The algorithm consists of the following steps:
First of all, it duplicates the original features to extend the feature information. The duplicated attributes are known as shadow features.
Then it shuffles the values of those shadow features to remove their correlations with the response.
After that, it trains the random forest (RF) classifier on the shadow features and assesses the importance of individual features with the mean decrease impurity (MDI) measure. MDI counts the number of times a feature is used to split a node, weighted by the number of samples it splits, across all the trees of the RF classifier. Thus, MDI decides the importance of each shadow feature. The shadow feature with the highest MDI score is considered the best shadow feature.
Now, the algorithm tests the real feature attributes to determine whether they are important. For this purpose, the Z score is needed. In tree-based machine learning algorithms (e.g., RF), the importance of a feature is measured as the loss of classification accuracy caused by randomly permuting the values of that attribute between samples. This loss is computed individually for every tree in the forest that uses the given attribute for classification. Then the average and standard deviation of the accuracy losses (from individual RF trees) are computed. The Z score is finally computed by dividing the average loss by its standard deviation [
28]. In Boruta, the Z score is used as the importance measure because it accounts for the fluctuations of the mean accuracy loss among the trees in the forest (RF classifier). Since the Z score by itself does not indicate whether the importance of a given attribute is significant, an external reference is needed. Therefore, the most important shadow feature, identified by its MDI score, serves as the external reference for determining the important attributes in the original feature set. Thus, after the RF classifier is applied, the algorithm checks whether each original feature attribute has a higher Z score (ZOF) than the Z score of that most important shadow feature (MZSF). If the ZOF is higher than the MZSF, the algorithm records this event, called a “hit”, as a count in a vector corresponding to the original features. From this hit vector, the important feature set is obtained. The unimportant features are discarded.
Steps 1 to 4 are repeated until an importance decision has been made for every attribute, or until the algorithm reaches the preset limit on RF classifier runs (iteration limit) [
28].
In short, Boruta builds on the same idea that underlies the RF classifier: it adds randomness to the procedure and accumulates results from an ensemble of randomized samples. In particular, the importance of a shadow feature can be non-zero only because of random fluctuations. Thus, the set of important shadow features is used as a reference for determining which original features are truly important. It is therefore necessary to repeat the re-shuffling so that different shadow features are generated each time and statistically reasonable scores are obtained. In this work, a total of 20 iterations are used for this purpose.
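The shadow-feature comparison and hit counting described above can be sketched as follows. This is a simplified illustration and not the Boruta implementation used in this study: the absolute Pearson correlation with the class label is swapped in as a stand-in importance measure in place of the RF-based Z score, and the final decision based on the hit counts is left out. All names (`abs_corr`, `shadow_hits`) and the toy data are assumptions.

```python
import random

def abs_corr(col, y):
    """Stand-in importance: absolute Pearson correlation with the label (not an RF Z score)."""
    n = len(y)
    mc, my = sum(col) / n, sum(y) / n
    cov = sum((a - mc) * (b - my) for a, b in zip(col, y))
    vc = sum((a - mc) ** 2 for a in col)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / (vc * vy) ** 0.5 if vc > 0 and vy > 0 else 0.0

def shadow_hits(columns, y, n_iter=20, seed=0):
    """Count how often each real feature beats the best shuffled (shadow) feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        shadows = []
        for col in columns:
            shadow = col[:]      # duplicate the feature
            rng.shuffle(shadow)  # shuffling removes its relation to the label
            shadows.append(shadow)
        best_shadow = max(abs_corr(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, y) > best_shadow:
                hits[j] += 1
    return hits

rng = random.Random(1)
y = [i % 2 for i in range(100)]                          # toy binary labels
informative = [label + rng.gauss(0, 0.1) for label in y]  # tracks the label closely
noise = [rng.gauss(0, 1) for _ in y]                      # unrelated to the label
hits = shadow_hits([informative, noise], y)
print(hits)
```

The informative feature beats the best shadow feature in every iteration, while the noise feature does so only by chance, which is the contrast that the hit vector is meant to expose.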
2.6. Classification by the k-Nearest Neighbor (k-NN) Algorithm
The ranked feature subset is finally classified by the k-NN algorithm. Its advantages are a simple architecture and low computational complexity for small datasets [
31,
32]. Because it is non-parametric and classifies samples by the votes of their k nearest neighbors, it is efficient for datasets with few features. The k-NN algorithm operates in three main steps: it (a) calculates the distances to the neighbors, (b) finds the
k closest neighbors to deal with the bias-variance trade-off for solving the overfitting/underfitting problem, and (c) votes for labels. From the illustration of
Figure 3, the main mechanism of the k-NN algorithm can be easily explained.
As in
Figure 3, the new data point (inside the circle, yellow color) can be categorized either as class 01 (purple polygon) or as class 02 (blue diamond). When
k = 3, the new data point fits into class 02 because of the higher density of class 02 within the circle, i.e., there are two blue diamonds (class 02) and one purple polygon (class 01) within the second circle. If the value of
k is assigned arbitrarily, e.g.,
k = 5, then the data point fits into class 01 because the outermost black dashed circle encloses three instances from class 01 and two instances from class 02. So, the selection of the
k value is important. With a given
k-value, boundaries of each class can be drawn. These boundaries segregate class 01 from class 02. Therefore, there are two significant parameters that should be selected to establish the classification task of k-NN: (a) the optimal value of
k that defines the number of neighbors and (b) the distance metric, which is calculated by the Euclidean distance from Equation (3).
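The dependence of the predicted class on k can be reproduced with a small sketch of k-NN using the Euclidean distance. The coordinates below are hypothetical and chosen only to mimic the situation described for Figure 3; they are not taken from the figure or the dataset.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Return the majority label among the k training points closest to `query`."""
    ranked = sorted((euclidean(p, query), label) for p, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical layout: two class-02 points close to the query, three class-01 points further out.
train_X = [(1.0, 0.0), (0.0, 1.2), (-1.5, 0.0), (0.0, -2.0), (2.2, 0.0), (5.0, 5.0)]
train_y = ["02", "02", "01", "01", "01", "02"]
query = (0.0, 0.0)

label_k3 = knn_predict(train_X, train_y, query, k=3)
label_k5 = knn_predict(train_X, train_y, query, k=5)
print(label_k3, label_k5)
```

With k = 3 the vote is two to one in favor of class 02, whereas with k = 5 it becomes three to two in favor of class 01.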
To find the optimal value of
k, K-fold cross-validation is used. As depicted in
Figure 4, the K-fold cross-validation (the nearest-neighbor
k is different from this K) involves randomly dividing the training set into 10 groups, or folds. The folds are then split into two sets: training folds A and validation fold B. The model is trained on training folds A and tested against validation fold B. The validation fold B is used to tune the parameters, such as the
k in k-NN. The validation fold is rotated in every iteration (10 times), and the rest of the data is used to train the k-NN. In other words, K-fold cross-validation means splitting the data into K folds, training on (K–1) folds, and testing on the remaining fold as the validation fold.
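The rotation of the validation fold and the selection of k can be sketched as below. The data are synthetic two-dimensional points rather than EEG features, the candidate list of odd k values is an assumption, and the function names (`cv_error`, `best_k`) are hypothetical; the sketch only illustrates the mechanism of K-fold cross-validation.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_error(samples, k, n_folds=10, seed=0):
    """Misclassification rate of k-NN under n_folds-fold cross-validation."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = 0
    for i, validation in enumerate(folds):
        # All folds except fold i form the training folds.
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        errors += sum(knn_predict(training, x, k) != y for x, y in validation)
    return errors / len(data)

def best_k(samples, candidates=range(1, 20, 2)):
    """Pick the candidate k with the lowest cross-validated error."""
    return min(candidates, key=lambda k: cv_error(samples, k))

# Synthetic, well-separated clusters standing in for the two mental states.
rng = random.Random(42)
samples = [((rng.gauss(0, 0.3), rng.gauss(0, 0.3)), "calm") for _ in range(20)]
samples += [((rng.gauss(3, 0.3), rng.gauss(3, 0.3)), "stress") for _ in range(20)]

errors_by_k = {k: cv_error(samples, k) for k in range(1, 20, 2)}
chosen_k = best_k(samples)
print(errors_by_k, chosen_k)
```

Restricting the candidates to odd values avoids tied votes in a two-class problem.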
In this research, each of the considered datasets (datasets 1 to 25 in Table 5, the merged dataset in Table 6) is first divided into training and testing sets at a 60/40 ratio. Then, a 10-fold cross-validation is performed on the test set of every dataset using a generated list of odd ks ranging from 1 to 20. On every iteration of 10-fold cross-validation, the misclassification error vs. k is observed. For most of the datasets, the optimal value for k ranges from 4 to 9 after a 10-fold cross-validation.
4. Discussions
The experimental analyses provide insights for future research approaches. As can be seen in
Table 3, the experimental ID for each individual participant is unique. Therefore, it is difficult to generalize the recorded response signals across all the participants for classifying stress and calm mental states. For example, if all the participants showed a calm state for experimental IDs 4, 20, and 31, and a stress state for experimental IDs 1, 15, and 40, then it would be easy to generalize the full dataset. Moreover, from
Table 3, it is evident that the dataset is imbalanced. Thus, SMOTE is adopted to handle this issue. In addition, the lack of dedicated preprocessing (i.e., the preprocessed data from the DEAP dataset are used directly) affects the classification performance of a few datasets. For example, in
Table 4, datasets 6, 8, 9, and 12 give accuracies between 64% and 66%, whereas the rest of the datasets give accuracies in the range of 79%–96%. The performance degradation of datasets 6, 8, 9, and 12 is depicted in
Figure 6a (several outliers are visible in the boxplot). Therefore, the performance of the proposed method can be further enhanced by using appropriate pre-processing techniques for EEG signals of different mental states. In addition, a comparative analysis of scalp sources can affect the final performance of the proposed approach. However, the focus of this research is to build an effective feature space that can represent the EEG signals accurately. The extraction of appropriate features from the EEG signals is essential for the construction of efficient classification models. As can be seen from
Table 6, the Boruta-based k-NN classifier adopted in this paper outperformed the traditional k-NN classifier based on all features in terms of classification accuracy. In addition, the identification of the electrodes and cortical regions containing the most relevant information can be investigated in the future to increase the performance of the proposed approach.