Real-Time Stress Assessment Using Sliding Window Based Convolutional Neural Network

Mental stress has been identified as a significant cause of several bodily disorders, such as depression, hypertension, neural and cardiovascular abnormalities. Conventional stress assessment methods are highly subjective and tedious and tend to lack accuracy. Machine-learning (ML)-based computer-aided diagnosis systems can be used to assess the mental state with reasonable accuracy, but they require offline processing and feature extraction, rendering them unsuitable for real-time applications. This paper presents a real-time mental stress assessment approach based on convolutional neural networks (CNNs). The CNN-based approach afforded real-time mental stress assessment with an accuracy as high as 96%, the sensitivity of 95%, and specificity of 97%. The proposed approach is compared with state-of-the-art ML techniques in terms of accuracy, time utilisation, and quality of features.


Introduction
Mental stress is a state of mind in which the brain and body respond to pressures from a situation or life event which could be due to internal, external, physical, psychological, or social factors [1]. The coping mechanism of the brain is triggered [2] during which the brain decelerates normal body functions and prioritise blood flow and rapid breathing in an effort to keep the mind and body alert. Recurring stressful conditions can render a person susceptible to other mental disorders, i.e., depression, anxiety, or emotion disorders [3]. Furthermore, stress can increase the risk of heart attack, cardiac arrest, high blood pressure, etc. [4]. It can affect cognitive functions and interfere with memory, resulting in memory weakness [5]. Habib et al. [6] presented an extensive study regarding the effects of stress conditions over different functions of the body.
Diagnosing mental stress at the early stage is crucial to avoid further effects such as body dysfunctions. Current clinical practices involve questionnaire-based evaluations, such as perceived stress scale [7], stress response inventory [8], and life events and coping inventory [9], which can assess

Proposed CNN Based Real-Time Stress Assessment
The proposed real-time stress assessment approach is based on employing a CNN that can handle a large amount of data and unsupervised feature extraction. The proposed approach, as shown in Figure 1, comprises of three main process blocks: data acquisition, frequency band selection, and classification. The proposed approach takes raw EEG data in real-time and does not require individual EEG band extraction other than a single 4-30 Hz band filtering. The features are extracted in an unsupervised manner from a sliding window.
The following subsections elaborate the proposed method and describe the data and experiments. The experiments were conducted to analyse and compare the proposed approach with conventional ML approaches. This includes the use of raw EEG data, suitability for real-time applications, data labels for stress assessment, and quality of unsupervised features.

General CNN Architecture
The ability of CNNs to extract spatial features to a granular level and generalised feature matrices provides a high discrimination power for classification. Typically, CNNs are used for image-related objectives; however, they can also be used on signals [29,30]. A CNN comprises primarily three layers: convolutional, pooling, and fully connected layers as shown in Figure 2. Each layer contains filter windows that slides over the input layer from the preceding layer. Filter size, strides (sliding of widows), and padding (window offset over input) settings are parameterised.

Convolutional Layer
The convolution layer contains kernel; these kernels slide over the input to perform initial unsupervised feature extraction. This phenomenon can be expressed as follows [31]: where the above Equation (1) is repeated for all 0 ≤ d ≤ D = Dl + 1, and for any spatial location (i 1 + 1, j 1 + 1) satisfying 0 ≤ i 1 + 1 < H 1 − H + 1 = H 1 + 1, 0 ≤ jl + 1 < Wl − W + 1 = Wl + 1. In this equation x l i l+1 +i,j l+1 +j,d l refers to the element of x l indexed by the triplets i l+1 + i, j l+1 + j, d l . The convolution layer is followed by the activation function such that the model is more robust and sensitive toward noise in the signal. A leaky-relu is used as the activation function at layers four and ten, which, unlike the relu, allows a small nonzero gradient when the unit is inactive, as shown in Equation (3).

Pooling Layer
This layer reduces the feature map obtained from the convolution layer. Different techniques can be used in this regard; the most popular is the max pooling technique, which extracts the most significant feature within the defined region of a feature map.

Fully Connected Layer
This layer can be an analogy of the fully-connected artificial neural network (ANN) because it comprises every neuron connected to every other neuron of the preceding layer. The features are flattened into one dimension before the fully connected layer. The following Equation (4) expresses the process.
where w, b, x, and y represent the weights, biases, final output, and the output from the previous layer, respectively. The softmax function predicts based on x from Fully-connected layer for the class of patterns, i.e., stress and non-stress.

Data Structuring/ Transformation (CNN)
Stress EEG patterns are instantaneous and last for short time intervals in real-time. Therefore, developing a real-time stress assessment approach requires to process and extract features from the instantaneous signal windows. This makes the batch processing of the entire EEG data for real-time stress assessment inefficient as the features representing stress patterns may be lost. In this paper, a window of 40ms is considered, based on the analysis for maximum feature retention (discussed in Section 2.7.3). In order to employ CNN, signals are segmented into 40 ms windows and eventually into a four-dimensional (4D) matrix in Matlab, as required by CNNs. Figure 3 shows the transformation of a two-dimensional (2D) signal into a 4D conceptual format, which is a prerequisite for CNNs. The signal is divided into 40ms windows, in which each window contains 19 electrode channels, 20 samples, and a depth of 1. Each signal slice is 40 ms apart.

Band Selection
EEG signals contain different frequencies; these frequencies refer to different functions of the brain at different locations. Based on brain activities, each band contains different ranges of these frequencies.
The well-known frequency bands are alpha, beta, gamma, theta, and delta. In conventional machine learning approaches, each band is filtered individually from the signal and then feature extraction is applied over all bands, making it both time and computationally inefficient. In this study, the individual bands are not considered, rather the frequency band between 4-30 Hz is considered from the raw EEG signals. This is also justified as a number of researchers have reported that the stress patterns are in between 4-30 Hz frequency range [32][33][34]. Therefore, the time and computational complexity of the process is reduced as only one filter is required, whereas the conventional machine learning approaches require multiple filters for each frequency band.

CNN Architecture
To classify stress from EEG signals, spatial information is vital. In this study, we investigated the performance of the CNN architecture for stress assessment and classification. The architecture includes a four-layer-deep CNN model with two CNN layers connecting consecutively without an activation function between them. The activation function used in this CNN model was the leaky-relu to avoid the dead neurons during learning. The convolutional layers are followed by a single, fully connected layer, which will provide the outcome of the classification. Furthermore, 50% of the neurons were randomly turned off at each iteration to generalise the model, which was performed by adding a dropout layer, and the dataset was shuffled at each epoch to avoid overfitting. Subsequently, 32 batch sizes and 20 epochs were used for training.
The optimiser used for this training was the Adam optimiser, and the cross-entropy loss function was selected. The dimensions of the feature matrix were reduced using pooling but essential features were maintained using max-pooling after convolution. To avoid overfitting and underfitting in the model, the dataset was divided into training, validation, and testing data. The parameters were obtained after optimising the accuracy of the model, and the details of the model architecture are shown in Table 1.

Description of Dataset
In this research, we have utilized the dataset of Subhani et al. [35] who collected the data aiming to classify stress levels. The data acquisition setup comprised Net Amp 300 amplifier (Electrical Geodesic In. (EGI), USA) with 128 electrodes referenced to Cz. The impedance of all electrodes was kept below 50 KΩ. The signals were recorded at a sampling rate of 500 samples per second with a 50 Hz notch filter to protect signals from line noise. In the experimental setting, a computer-based mental arithmetic task induced stress conditions, which applied the Montreal imaging stress task (MIST) protocol [36], which has three sessions: relaxation, and four levels of each of stress and control. The experiment took place between 3 pm and 7 pm. The experiment designed is approved by the Ethics Commission at Hospital Universiti Sains Malaysia, Malaysia.
In this study the data for 26 subjects (average age: 22.46 ± 0.79, six females, and four left handed) is used. The selected participants had no previous critical medical history and were free from any neurological abnormalities. They exhibited normal or corrected-to-normal vision. The participants were fasting for at least 2 h before starting the experiment. The EEG recording comprised of two sessions: relaxing and stress periods for 5 min each, which resulted in 260 min of EEG recording for the study resulting in 15,000 windows of 40ms duration for each subject. In total, 19 electrodes were used after selection based on 10-20 montage with an average mastoid reference (labelled as Fp1, Fp2,  F3, F4, F7, F8, C3, C4, T3, T4, T5, T6, P3, P4, O1, O2, Fz, Cz, and Pz). Figure 4 shows the different locations of the electrodes.
It is reported in the literature that negative emotions and withdrawal behaviours [37][38][39][40] affect the prefrontal alpha asymmetry. Several researchers have used AAS value for stress assessment [40][41][42][43]. In general, a low value of alpha asymmetry indicates that the participant is in stress [5,19]. Lehrer et al. [44] provided in-depth knowledge regarding the principles and practices for stress management. They also highlighted evidence regarding the effects of relaxation and stress over the alpha band. The characteristics of alpha-band and their correlation with stress and depression have been reported in several papers [45][46][47]. Therefore, AAS is used in this study to assign labels for stress assessment. The windows are labelled using AAS values between Fp1 and Fp2 on prefrontal region as shown in Figure 4. Topomaps for stressed and non stressed subjects based on AAS are shown in Figure 5.

Training, Validation, and Testing
The training phase of the model was performed using the hyperparameters as mentioned in Table 1 with the data structure and architecture discussed in the transformation details in Section 2.2 the transformation details. The data for 20 subjects were used for training with a total of 300,000 windows. The stress and non-stress patterns were labelled based on AAS values. Out of 300,000 (20 × 15,000) windows, a total of 45,000 windows were labelled as stressed based on AAS values. Therefore, 45,000 non-stressed windows were selected, making the entire size of the training dataset to be 90,000 windows. The validation was performed on the entire data for three subjects and the testing was performed on the remaining three subjects that were completely blind during the training and validation phase. The dataset was shuffled at every epoch to reduce the case of overfitting and to generalise the CNN model. The performance metrics were computed from the confusion matrix as follows: The sensitivity of a model represents its ability to identify correct positive cases and interpreted with the classification of true cases/positives, represented in Equation (5). Specificity corresponds to the ability of a model to identify non-cases correctly and is often denoted as true negatives, defined as shown in Equation (6). Conventionally, the confusion matrix represents the accuracy of the trained model (7), which is an average between the sensitivity and specificity.

Stress Assessment Based on ML
The work presented in [21] performs stress assessment using features extracted from EEG. Each EEG recording session of the participant contains 5 min of brain activities. Each session labelled as a stressful or nonstress session. The ML approach can learn different patterns that may be associated with the stress patterns; as such, it can be used to automate the analysis of stress patterns. The features extracted were power, relative power, relative power ratios, coherence, and asymmetries. The data were denoised, thereby removing artefacts that could directly affect the learning process of the machine before feature extraction. The algorithms suggested in this study are support vector machine (SVMs) with the radial basis function. The scope of the study was to only assess the stress level of the participants and does not involve any procedure by which this process can apply to real-time stress assessments. The experimental protocol is suitable for offline assessments because of the length of the signal data, which is 5 min of the EEG signals.

Stress Assessment Based on ML with Raw EEG
To assess the technique presented in [21,35], which used clean data, the ML techniques were assessed with the same numbers of features and length of the signal but raw data that contained both noise and artefacts. The algorithms used in this case were the decision tree (DT), logistic regression (LR), and SVM, which are typically used for signal data classification. Hence, the effect of each process change is more visible over the classification ability by the standard algorithms.

Conventional Batch-Wise vs. Window-Based Feature Extraction
For a real-time application, batch processing of the entire EEG signal data is not suitable. For example, neurofeedback applications require signal evaluation within less than 1 second for content triggering. Hence, feature extracted from the entire data may not represent the stress patterns contained in 40ms window. The advantage of using sliding windows over the batch signal is its compatibility with real-time assessments and prevention of feature loss. To demonstrate the prevention of feature loss, feature extraction is performed on 40ms windows as well as on the entire batch signal. Figure 6 shows that the 40 ms windows retained the features representing the stress patterns. Whereas, the features extracted from the batch data were not suitable while applying on 40ms window for real-time applications. For example one metric for stress assessment is AAS value between channel Fp1 and Fp2 at prefrontal region [5]. Figure 7 shows that the AAS values (used for labelling) may shift from negative to positive when the length of the signal window is incremented gradually. Moreover, the window size of 40ms results in an optimal value of AAS showing that the stress and non-stress patterns are not lost. This shows that the stress patterns can be shadowed by non-stress patterns if a longer window is considered.

Stress Assessment Based on ML and CNN with AAS Labels
To assess the effect of the asymmetric analysis, each window was labelled with AAS. The stress condition was instantaneous because denoting an entire session with a single label was inefficient because many of the patterns would be overlapped with each other, implying that in a single session, both nonstress and stress conditions can appear, thereby affecting the ability of the model to classify signals. The experiment included the training of previous learning algorithms, i.e., SVM, LR, and DT with 40 ms windows of raw data; each window was labelled with a new AAS label. Feature extraction from these windows included the power, relative power, relative power ratios, coherence, and asymmetries (eliminating the AAS between prefrontal cortex FP1-FP2). The experiments relied upon the ML approach, which was dependent on the features extracted for the learning algorithm in a supervised manner. Hence, the ability of the ML algorithm depended on the extracted features. The classification between the classes would be inefficient if the features were inefficient. Hence, deep learning can be pivotal for replacing the dependency of classification over the supervised features. To avoid the process of supervised feature extraction and selection, the concepts of deep learning are useful. The CNN, a technique from the pool of deep learning approaches, can extract spatial features from raw signal windows in an unsupervised manner for classification. After training, the model is validated and tested over unseen data to evaluate its ability to perform efficiently with other blind data, which represents the generalisation capability of the model. Cross-validation includes the classification of features extracted by the CNN from ML algorithms to assess the efficiency of features extracted by the CNN.

Results
The proposed network was trained, validated, and tested with the proposed hyper-parameters presented in Table 1. The performance of the model was tested and measured by the metrics shown in Equations (5)-(7). The results obtained demonstrated the capability of the model in classifying correct stress and nonstress patterns. The model yielded 96% accuracy, 95% sensitivity, and 97% specificity, with raw data and compared with different techniques, i.e., LR, Gaussian SVM, and coarse tree on EEG signal batch-wise, process as detailed in [20,21,48]. In [21], features were extracted from processed EEG signals and classified using the SVM with the radial basis function, which yielded classification results with 79.54%, 78%, and 81% accuracy, sensitivity, and specificity, respectively in [21] whereas 94.6% accuracy is reported in [35] for 2 levels stress assessment.
However, the procedure reported in [21,35] is not suitable for real-time applications owing to the pre-processing of EEG data, large batch sizes of signals, and it included supervised features extraction, which is sensitive; these constraints render it difficult to address stress assessment for real-time applications. The scope of this study is to identify stress in real-time, for that reason in order to make a clear comparison among the existing techniques that address stress level identification from an offline EEG signals and the proposed method, we need to convert existing framework present in literature to make them compatible with the required process flow. The purpose of applying such conversion is the limitation that very few studies are available that address stress assessment in real-time with EEG signals.

Stress Assessment Based on ML with Raw EEG
To investigate the factors that can be improved to achieve a real-time assessment, the procedure was modified with different aspects to obtain the most optimised process. The EEG signals in [21] were processed using automatic cleaning techniques as well as manually removing noise and artefacts; therefore, they would not be compatible with real-time applications. In experiment 1, the same features from the same signal window size and labels were classified. The SVM, DT, and LR were used to classify the signal patterns. The DT produced classification results with 59.20%, 42%, and 76% accuracy, sensitivity, and specificity respectively; meanwhile, LR produced 50%, 50%, and 50% correspondingly, whereas SVM produced the most promising results in experiment 1 with 64.50%, 42%, and 87% accuracy, sensitivity, and specificity, respectively. Table 2 and Figure 8 demonstrate these results with a visualization showing that with uncleaned signal in batch data, the models could not perform correct classifications because the results deteriorated compared with the benchmark results from [21] that are 79.54% accuracy, 81% sensitivity and 78% specificity.

Stress Assessment Based on ML with Sliding Windows
The use of uncleaned signals is inevitable when designing a system that avoided all manual processes for real-time applications. Hence, the signal window length can be changed to a 40 ms size. The feature extraction from sliding windows of signals includes the same features as in experiment 1 and the benchmark study with the same label associated with the batch window. After the extraction, the models were trained and tested over unseen signal data. The overall performance of the models deteriorated; the DT classified 58.8% of the signals correctly, whereas the LR could only identify 44.3% of the signal windows correctly, and the SVM predicted 60.7% of the signal windows correctly. The model's performance demonstrated overfitting toward a single class because of the high sensitivity and low specificity of the models, as shown in Table 3 and Figure 9. This indicates that the performance of the models has slightly deteriorated from the previous experiment; the cleaning of EEG signals and feature extraction from batch data data increased the computational time of the entire process. Therefore, the system must address uncleaned signal data with a sliding window.  The labels provided were inefficient because the batch data may contain patterns belonging to both stress and nonstress conditions. Hence, the learned models could not identify the two classes correctly.

Stress Assessment Based on Ml with AAS Labels
A new marker can resolve problems regarding labels. Each window was labelled using a new label and used for training a model. The performance of each model improved by 20-35%, the results from the DT improved from 58.8% to 84% with 91% sensitivity and 71% specificity compared with the previous experiment with 80% sensitivity and 36% specificity. LR could identify 84% of the signal windows correctly with 91% and 71% sensitivity and specificity, respectively. The SVM yielded results with 84%, 78%, and 90% accuracy, sensitivity, and specificity, respectively, which are shown in Table 4. Experiment 3 provided the best results when using AAS as a label with uncleaned 40 ms signals. However, to make the process compatible with real-time applications, it is essential to calculate the total time response of the process. Hence the computation of time response was for 1 second of the signals; the DT, LR, and SVM consumed 4.41, 6.87, and took 6.87 seconds to perform the prediction, respectively. The time response from all three techniques is not compatible for real-time applications because the system has to weigh for approximately 4-7 s to accept the new signal data. Prior to that, the data will increase with the multiple-frequency sample and time response. Hence, the time response for every second will increase exponentially, rendering the ML technique useless for real-time applications. The time responses of all techniques are compared in Table 4. As shown in Figure 10, the ML techniques consumed the most time during feature extraction based on the nature and quantity of the features, and different models have different time consumption abilities.

Stress Assessment Based on CNN with AAS Labels
The supervised feature extraction process must be replaced with other processes that are compatible with the real-time stress assessment system. The CNN in experiment 4 replaces the supervised feature extraction with an unsupervised extraction of features from the sliding windows with AAS labels. Throughout the experiments, the dataset was consistent; as such, the comparison should not be biased. After the training of the CNN model, it predicted the testing data, the same as in previous experiments. The classification performance of the CNN was 96% accuracy, 95% sensitivity, and 97% specificity with the time response for 1 s signal windows of 0.65 s. The model performed significantly better than all previous techniques discussed and performed all the processes in the minimum amount of time, thereby avoiding an exponential increase in the signal wait time, which occurred in the ML approach the performance is highlighted in Table 4 and the time response in Figure 10.

Feature Extraction Prediction
Base se-e-time CNN does not involve processing time for feature extraction, unlike other techniques; hence, it provides results within the base-time of 1 second and proves to be suitable for real-time stress assessments Figure 10. Performance comparison in terms of assessment time for real-time applications.

Cross-Validation of Features from CNN
The CNN must be validated to demonstrate that the features extracted are efficient. The features from the CNN were collected and used in the ML technique, i.e., DT, LR, and SVM. The classification results proved that the features from the CNN improved the performance capability of these ML techniques. The DT yielded 80.6%, 81%, and 80% accuracy, sensitivity, and specificity, respectively which shows decrement in case of accuracy as compared to the results produced by DT showed in Figure 11, but the model with supervised features is overfitting as the specificity is just 71% whereas with features from CNN made the model more robust. The performance of LR improved to 92.7% in terms of accuracy, with 93% sensitivity and 92.7% specificity. The SVM improved by 10% in terms of accuracy and yielded a total accuracy of 95% with sensitivity and specificity of 95% each, which was similar to the accuracy of the CNN, as demonstrated in Figure 12. The experimental results demonstrated that efficient features affected the performance of learning, and that the models performed well. When using ML techniques, domain experts performed feature extraction; however, after extraction, optimum features were selected during feature selection to ensure the good performance of the models.  Figure 12. Performance of ML algorithms using supervised features as compared with features from CNN in terms of accuracy, sensitivity, specificity and assessment time The computation of the system as the time of model training depended on various factors that affected its operating time, i.e., the number of samples, number of epochs, and depth of the deep neural network. After model training, the minimum amount of time was required to process the data for classification. The signal filters consumed the most time in our process. The experiment results and comparisons were executed over MATLAB release version 2019 on DELL-OPTIPLEX 990 with hardware configuration of 8 GB RAM and Intel i7 3.4 GHz 2nd generation CPU.

Conclusions
This paper presents a real-time mental stress assessment method using sliding window based CNN. The study demonstrates the advantages of using CNNs over conventional ML techniques for real-time stress assessments on raw EEG data. The proposed approach yielded mental stress assessments in real-time and outperformed the conventional ML approaches in terms of accuracy, sensitivity, and specificity. The performance of the proposed approach was also analysed in terms of time utilisation, quality of features, and size of sliding window for real-time assessments applied to raw EEG data. The proposed approach can be utilised in real-time applications, such as neurofeedback training, and mobile and wearable stress assessment devices. Clinicians can also utilize the proposed approach to validate their psychiatric evaluations.