Detection of Mental Stress through EEG Signal in Virtual Reality Environment

Abstract: This paper investigates the use of an electroencephalogram (EEG) signal to classify a subject’s stress level while using virtual reality (VR). For this purpose, we designed an acquisition protocol based on alternating relaxing and stressful scenes in the form of a VR interactive simulation, accompanied by an EEG headset to monitor the subject’s psycho-physical condition. Relaxation scenes were developed based on scenarios created for psychotherapy treatment utilizing bilateral stimulation, while the Stroop test worked as a stressor. The experiment was conducted on a group of 28 healthy adult volunteers (office workers), participating in a VR session. Subjects’ EEG signal was continuously monitored using the EMOTIV EPOC Flex wireless EEG head cap system. After the session, volunteers were asked to re-fill questionnaires regarding the current stress level and mood. Then, we classified the stress level using a convolutional neural network (CNN) and compared the classification performance with conventional machine learning algorithms. The best results were obtained considering all brain waves (96.42%) with multilayer perceptron (MLP) and Support Vector Machine (SVM) classifiers.


Introduction
The feeling of stress is inherent to human beings and cannot be separated from human existence. It is evoked by and associated with a multitude of issues humans struggle with, such as surgical trauma, emotional arousal, mental or physical effort, fatigue, pain, fear, need for concentration, humiliation, frustration, and intoxication with drugs or environmental pollutants [1]. However, it is not only negative emotions or events that trigger the feeling of stress, as it can also accompany unexpected success, a lifestyle change or unpredicted opportunities. Stress can also have a positive and motivating effect, since an increased level of so-called healthy stress brings about higher productivity. A very general definition describes stress as the perception of threat, causing anxiety, discomfort, emotional tension, and difficulty in adjustment [2]. Our ancestors' neural system developed an automatic mechanism called the fight-or-flight response in order to facilitate split-second decisions about the reaction to a perceived attack, harmful event or threat to survival. The sympathetic nervous system immediately releases hormones such as adrenaline and cortisol as a way of preparing the body for an emergency, and the physical symptoms that follow include a faster heartbeat, tightening of muscles, rising blood pressure, quickened breath and the sharpening of senses. These physical changes increase strength and stamina, speed up reaction time and enhance focus, preparing for confrontation [3]. The nervous system reacts in a similar fashion during non-life-threatening adversities such as daily struggles with traffic jams or work. If this automated response is triggered too quickly or exceeds a certain level of intensity, it can be damaging to mental and physical health, alter the emotional state negatively and have a detrimental effect on productivity and the quality of life of an individual [4]. Chronic or exceedingly intensive stress can lead to a variety of negative effects on human health such as anxiety, depression, irritability, sleeping disorders or headaches. What is more, it contributes to many disabilities worldwide and represents a severe economic burden [5]. The longer the period of excessive stress, the more negative the impact is on both mind and body. The annual stress study conducted by the American Institute of Stress reported that the average stress level in the United States grew from 4.9 to 5.1 on a scale from 1 to 10 in 2015, the main stressors being employment and money [6]. Recently, the World Health Organization (WHO) has named stress the "Health Epidemic of the 21st Century" [2].
VR is a computer-generated three-dimensional interactive artificial environment. Its physical expression is a computer simulation presenting a fully explorable environment which allows interaction with objects and avatars. It is usually three-dimensional and aims at creating a mirror image of the real world, with its appearance and physical phenomena [7]. Currently, the most effective way of experiencing VR is by using head-mounted display (HMD) systems, which comprise a device worn on the head or a helmet with a built-in display and lenses. An HMD allows the user to immerse themselves in the virtual world with the help of a wide viewing angle, head and hand movement tracking and controllers [8]. What is more, an HMD not only provides visual feedback, but may also include other modalities such as aural, haptic, etc. [9].
From the review of research using VR in the therapy of mental disorders, it appears that this type of technology may be conducive to alleviating the symptoms of stress-related [10][11][12] and anxiety disorders [13,14]. Since VR-generated environments may reflect real situations, they can be used as a trigger of various thoughts, feelings and behavioral responses in the context of affective computing. For example, in [15], anxiety was triggered by a VR environment scenario presenting a stressful job interview, and in [16] by virtual traffic lights.
The literature regarding stress recognition is extensive and considers different sources of information such as facial expressions [17], speech [18], gestures [19], and psychophysiological signals [20]. Another method that proves reliable in stress recognition is EEG, which tends to provide informative characteristics in response to the subject's mental state.
In general, an EEG records the changes in the electrical potential on the skin surface resulting from the activity of neurons of the cerebral cortex, through electrodes arranged along the scalp [21]. The examination is usually performed for monitoring and diagnosing a number of diseases, although it is increasingly used in biofeedback [22] and in brain-computer interfaces [23], for example, in cognitive state recognition [24,25].
This paper investigates the use of an EEG signal to classify a subject's stress level in a VR environment. For this purpose, we designed an acquisition protocol based on alternating relaxing and stressful scenes in the form of a VR interactive simulation, accompanied by an EEG headset to monitor the subject's psycho-physical condition. Relaxation scenes were developed based on scenarios created for psychotherapy treatment utilizing bilateral stimulation [26][27][28], while the Stroop test [29] worked as a stressor. The experiment was conducted on 28 healthy adult volunteers (office workers) participating in a VR session. Subjects' EEG signal was continuously monitored using the EMOTIV EPOC Flex wireless EEG head cap system. Then, we classified the stress level using a convolutional neural network (CNN) and compared the classification performance with conventional machine learning algorithms.
Several experiments have been conducted to assess emotional responses induced in VR environments using different EEG headsets (for example, [30][31][32]). However, to the best of our knowledge, none of them investigates automatic emotion recognition. This work combines both areas, VR-based emotion induction and automatic response recognition, with the following contributions:

• Investigation of automatic stress recognition induced in a VR space for the purpose of the analysis of brain wave parameters that are the most discriminative for such an analysis.
• Analysis of classification performance for individual electrodes to investigate how their number can be minimized without affecting the recognition quality.
• Analysis of the EMDR method's effectiveness in VR on decreasing stress levels.
The rest of the paper is organized as follows: Section 2 reviews the related work in the field of stress recognition methods, covering both stress recognition approaches in virtual environments and EEG signal analysis in the context of stress recognition. Then, in Section 3, our experiment's protocol as well as details of the proposed method are described, followed by the experimental results and discussion. Finally, conclusions and a comparative analysis of our study are presented in Section 5.

Related Studies
Electrical activity of the brain recorded as an electroencephalogram (EEG) indicates the link between mental states and brain activity by providing rich information about the central nervous system activities [33]. Thus, researchers' attention has been directed towards the recognition of an individual's mental state (emotions, anxiety, stress) based on an EEG. For example, in [34], the authors propose a new approach to classify emotional stress in two main areas of the valence-arousal space, negatively excited and calm-neutral, using an EEG as the main signal. Wavelet coefficients and chaotic invariants were used as the input to the Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers. The average classification accuracy was 80.1% and 84.9% for two categories of emotional stress states using the LDA and SVM, respectively. A similar approach is demonstrated in [35], where the authors present a framework to recognize construction workers' stress levels using a wearable EEG device. The framework applies an SVM algorithm to detect low and high stress levels while working in different conditions.
Hou et al. [36] present an algorithm identifying four different stress levels (relaxed, engrossed, stressful and anxious) induced by a Stroop Color and Word Test (SCWT). An accuracy of 67.06% was obtained by combining fractal dimension and statistical features and using SVM as the classifier. The SCWT was also used as a stressor in [37]. The authors designed an automatic EEG-based stress recognition system relying on an EMOTIV EPOC wireless device. Using the relative difference of beta and alpha power as a feature, along with SVM as a classifier, three levels of stress were recognized with an accuracy of 75%.
In [33], the authors propose an EEG-based stress recognition application, whose training set is continuously updated with new input signals in near real-time. The authors applied different online multitask learning (OMTL) algorithms to recognize the stress level. The EEG signal was collected in two different environments: a controlled lab environment using a wired EEG and the field using a wearable EEG device. Among all tested algorithms, the OMTL-VonNeuman algorithm resulted in the best prediction accuracy on both datasets (71.14% on the first dataset and 77.61% on the second one).
The scope of the use of deep learning has expanded greatly to different application domains, including the classification of various signals representing emotional states [38,39]. This kind of approach is able to determine the most distinctive features based on enormous sets of data. Thus, similar methods are increasingly used in EEG signal analyses in the context of mental state recognition. The effectiveness of such an approach was proven in a subsequent paper by Jebelli [40]. The authors compared two deep learning structures: a convolutional deep learning neural network (CNN) and a Fully Connected Deep Neural Network (FC-DNN). The optimally configured DNN resulted in a maximum of 86.62% accuracy in recognizing workers' stress. It was at least six percentage points more accurate than their previous handcrafted feature-based methods [33]. A similar discussion is presented in [41]. The multi-column CNN-based model delivered an improved performance with respect to the conventional manual feature-based modules. Apart from the aforementioned studies, there has not been much research conducted on the subject, which leaves room for a further examination of using deep learning to classify emotional stress through EEG signals.
In all the above-mentioned studies, stress was induced using short movies, logical tasks and games or triggered by real-world circumstances. Since VR can easily reproduce natural environments, allowing the user to experience a deeper immersion than in front of the screen, evoking emotions in VR becomes a natural approach. For example, in [42], the authors develop an emotion recognition system for affective states evoked through VR. A virtual environment was designed to elicit four possible arousal-valence combinations, according to the Circumplex Model of Affects. The examination involved the recording of the EEG and electrocardiography (ECG) of sixty participants. The features were extracted from these signals using various state-of-the-art metrics that quantify brain and cardiovascular linear and nonlinear dynamics. The SVM classifier predicted the subject's arousal and valence perception with an accuracy of 75% along the arousal dimension and 71.21% along the valence dimension. A similar approach is presented in [43], where frontal EEG data were collected using textile dry electrodes during VR affective stimuli. Based on features extracted from the time, frequency and space domains and three ensemble models, including Gradient-Boosted Decision Trees (GBDT), Random Forest (RF) and SVM, they achieved an accuracy of 81.30%. However, when it comes to stress recognition, only a few papers are relevant. One of the most important studies is presented in [44], where the authors tested 19 healthy young subjects performing a balance beam walking task in VR conditions to determine whether exposure to high altitudes in VR induced stress. They recorded the number of steps on the beam, heart rate, electrodermal activity, response time to an auditory cue, and high-density EEG. Their findings indicate that VR provides realistic experiences that can induce physiological stress in humans.

Materials and Methods
In this section, we present the main components of the proposed research, starting with the data collection protocol, followed by the experiment description and ending with the signal analysis and classification methods. The structures of the proposed mental stress recognition approaches are presented in Figure 1. All steps of the pipelines are presented in detail in the following sections.

Data Collection Protocol
Twenty-eight (9 females and 19 males) healthy volunteers (age: 36.53 ± 10.48 years, mean ± standard deviation (SD)) from bank branches in Łódź were recruited for the purpose of this experiment (see Table 1). They were selected based on the following inclusion criteria: (1) they had no neurological or mental disorders, (2) they had used VR before and did not report VR sickness symptoms, and (3) they had normal or corrected (contact lenses) vision and no color blindness. All participants were informed about the purpose of the study and the procedure of the experiment was explained to them step by step. Subjects were informed about their right to stop the study and withdraw their consent at any moment. They gave informed consent in writing. All participants performed the tests in a quiet and well-lit environment at room temperature (23-25 °C). The process of mounting the EPOC Flex headset and VR HMDs is presented in Figure 2.
Each experiment set took about 20 min and was divided into two main scenes: stressful and relaxing. Each set was repeated three times. Thus, we gathered data representing three mental stress and three relaxation (six in total) conditions per person from the whole experiment. The timing of the whole experiment is described in more detail in Figure 3.
The first scene (stressful) consisted of a Stroop test, comprising typical tasks that have been used to induce stress in many previous studies [45] and whose original publication is one of the most cited works in the field of experimental psychology [46]. The aim of this test was to determine the color of the text. If the color of the text and the text did not match (e.g., the word "green" printed in blue font, or "red" printed in green), a subject who was asked to specify the color of the text performed the task more slowly and with a greater probability of error than if the text and its color were compatible (see Figure 4). Each successive stressful scene was more stressful (more difficult) than the previous one, and it was directly followed by a relaxation scene.
The second scene had a purely relaxing purpose. The VR environment was composed of forest scenery (see Figure 5) enriched by a delicate, instrumental theme with the sounds of chirping birds. The therapeutic exercise involved following a sphere, which was moving horizontally. The sphere moved constantly in the subject's field of view, and a properly performed exercise required the subject to follow it without moving the head. In contrast to the stressful scenes, every relaxation scene was identical.
At the beginning of the experiment, the operator presented the subject with the standard protocol of the experiment. The protocol included information on the procedure of EEG and VR headset installation and the time of VR immersion. All subjects were free to ask any questions they deemed necessary. The sequence of actions was the following: information to the subject, EEG and VR headset mounting, immersion, VR and EEG headset unmounting. The target platform was Oculus Quest [47] (per-eye resolution of 1440 × 1600, refresh rate of 72 Hz, 104° horizontal and 98° vertical FOV).

Compliance with Ethical Standards
All objectives of the experiment were achieved with a low ethics level, according to the Polish National Science Center. Participants were informed (1) about the purpose of the study, (2) that they had the right to stop the experiment at any time without providing any reason and (3) that they could stop the experiment if they felt sick or had any discomfort. All the training sessions were performed under the supervision of a researcher in case of an emergency. All participants signed a participant information sheet and an informed consent form before undertaking the VR training. Additionally, we applied to the Commission on the Ethics of Scientific Research of the Lodz University of Technology and received a positive decision (#2/2021).

EEG System
The EMOTIV EPOC Flex (see Figure 6a) is a flexible 32-channel EEG system, with 16 electrodes terminating in a left wiring harness and 16 in a right wiring harness. There was an additional common-mode sensor (CMS) and driven-right-leg (DRL) sensor, which served as the reference. The electrodes were located at the positions presented in Figure 6b. The device had an internal sampling rate of 1024 Hz and, after filtering, the data collected by the controller were sent to the computer at approximately 128 Hz. The signal was transferred wirelessly, which offered much greater mobility, as well as ease of mounting the headset. EEG resolution in the EPOC Flex was 14 bits, 1 LSB = 0.51 µV (16-bit ADC, 2 bits of instrumental noise floor discarded).
The EMOTIV EPOC Flex was supplied with an application called EmotivPRO, which allowed the user to view data streams and frequency analysis in real time. Raw EEG graphs were displayed as µV per sample, and the frequency analysis view allowed the user to observe frequency information of a particular EEG channel [48]. The Flex has built-in EEG signal pre-processing, namely a high-pass filter of 0.2 Hz and a low-pass filter of 45 Hz. It has been shown that the EMOTIV EPOC Flex is capable of capturing data similar to those of a research-grade EEG system [49]. All of the above-mentioned functionality could be used without a custom application through the supplied control panel. The panel had a tab for each suite with a simple interface for visualizing and controlling, along with a display of the contact quality of each sensor. Additionally, EMOTIV provides an API enabling developers and researchers to create personalized applications using real-time brain data.

Background on EEG Wave Analysis
An EEG signal can be analyzed in terms of how often and where the oscillations occur. The first approach is based on spectrum analysis, in which oscillations are characterized by their frequency. Many previous studies have made it possible to establish norms describing the average values and physiological extremes of basic recording parameters. The basic rhythm of a normal adult EEG signal is the Alpha rhythm, which is registered at the back of the head (in occipital, occipito-parietal and temporo-occipital areas). It is characterized by a signal change frequency from 8 Hz to 12 Hz and an amplitude of approx. 35 µV. Alpha waves are the default 'relaxed and alert' mode of the brain. Comparable conditions are valid for other parts of the brain. In the frontal lobes, high alpha levels can also be measured during relaxation, and they are suppressed when the subject becomes involved in another activity. A common EEG practice is the comparison of Alpha suppression levels between different parts of the brain, so as to identify the areas currently in use [50].
Another distinctive parameter of EEG recording is Beta waves (13-28 Hz), present in frontal parts of the brain in the state of conscious relaxation. Beta activity of multiple and volatile frequencies is an indicator of brain activity, problem solving, anxiety or active concentration.
Theta rhythm (4-8 Hz) with an amplitude reaching 30 µV is commonly observed in the hippocampus, and its generation is dependent on the medial septal area. Theta activity can be recorded in arousal, drowsiness and meditation-like states, and it is often associated with creative, mindful or meditative conditions. Delta waves (less than 4 Hz) with an amplitude of approx. 50 µV should not be observable in a healthy subject's EEG record, and they indicate pathology [49].
Gamma rhythm (greater than 30 Hz) indicates physical activity, motor functions and cognitive processes characterized by heightened activity of the neural network. Gamma waves are typically recorded in the frontal brain parts during quick, coupled processing such as the fight/flight reaction or multitasking and task switching. In the case of the latter, Gamma bursts are easily identified when the ongoing task is moved to short-term memory and a new task is processed concurrently [51,52]. A full description of individual waves is included in Table 2. Numerous studies show that the amplitude of these waves increases when the subject focuses on the source of the stimulus. The duration of a gamma wave is very short and amounts to several dozen ms. Too many of them may indicate anxiety, high agitation and stress, whereas too few may be associated with depression and be an indicator of learning disabilities.
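The band limits quoted in this section can be collected into a lookup table and used to estimate per-band power. The following is a minimal sketch, not the procedure used in the study: it assumes a plain FFT periodogram, and the names `BANDS`, `band_power` and `dominant_band` are hypothetical. The lower Delta limit (0.2 Hz) and the upper Gamma limit (45 Hz) are assumptions borrowed from the built-in filters of the EPOC Flex, since the text only states "less than 4 Hz" and "greater than 30 Hz".

```python
import numpy as np

# Band limits in Hz as quoted in the text; the 0.2 Hz and 45 Hz edges are
# assumptions taken from the device's built-in high-pass and low-pass filters.
BANDS = {
    "delta": (0.2, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (13.0, 28.0),
    "gamma": (30.0, 45.0),
}


def band_power(signal, sfreq, bands=BANDS):
    """Sum the periodogram of a single-channel signal inside each band."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    psd = np.abs(np.fft.rfft(signal)) ** 2 / (sfreq * signal.size)
    return {
        name: float(psd[(freqs >= low) & (freqs < high)].sum())
        for name, (low, high) in bands.items()
    }


def dominant_band(signal, sfreq):
    """Return the name of the band carrying the most power."""
    powers = band_power(signal, sfreq)
    return max(powers, key=powers.get)
```

For instance, a pure 10 Hz oscillation sampled at 128 Hz is assigned to the Alpha band by `dominant_band`.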
One should also mention artifacts registered during EEG signal recording, which can have various causes and can be divided into biological and technical. Biological artifacts stem from organs other than the brain which exhibit electric activity. For example, artifacts connected with muscle tension appear in almost every EEG recording and they demonstrate the highest values in frontal and temporal areas [59]. Thus, before every EEG examination, a closed/open eyes trial was performed. On the other hand, technical artifacts are registered in connection with the activity of electric devices. The most common source is an incomplete contact of an electrode with the skin [52]. It was observable in subjects with thick or long hair and when subjects moved.

Data Pre-Processing
The high-density EEG files (.csv) extracted directly from the EMOTIV EPOC Flex system were converted to .fif files using methods from the MNE-Python library [60]. These are data structures based on Neuromag's FIF file format. As mentioned earlier, the signal had already been processed by the high-pass filter of 0.2 Hz and the low-pass filter of 45 Hz built into the EPOC Flex device.
The data were pre-processed and the signal values were adjusted to the units required in the .fif files (the EMOTIV EPOC device records in µV, whereas the MNE library requires V). Figure 7 shows the raw signals registered for a sample person with the ranges of characteristic events marked. Recorded EEG signals can potentially be influenced by many factors, some of them being the activity of different parts of the brain, heartbeat, blinking, etc. However, as long as these various source signals are statistically independent and non-Gaussian, it is usually possible to separate the sources using independent component analysis (ICA), and then reconstruct the sensor signals after excluding the sources that are undesired [61,62].
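The unit adjustment described above amounts to scaling the recorded samples by 10^-6 before handing them to MNE-Python. The sketch below illustrates this under stated assumptions: the helper names `microvolts_to_volts` and `make_raw` are hypothetical, the 128 Hz default follows the output rate reported for the headset, and the way the .csv export is parsed into an array is left out because the source does not describe it.

```python
import numpy as np

MICROVOLT = 1e-6  # one microvolt expressed in volts


def microvolts_to_volts(data_uv):
    """Scale an array of samples from microvolts to volts."""
    return np.asarray(data_uv, dtype=float) * MICROVOLT


def make_raw(data_uv, ch_names, sfreq=128.0):
    """Wrap a (n_channels, n_times) array in microvolts as an MNE Raw object."""
    import mne  # imported lazily so the conversion helper works without MNE

    info = mne.create_info(ch_names=list(ch_names), sfreq=sfreq, ch_types="eeg")
    return mne.io.RawArray(microvolts_to_volts(data_uv), info)
```

The resulting object can then be written with its `save` method to a file whose name ends in `_raw.fif`; the actual file naming used in the study is not stated.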
The MNE-Python library implements three different ICA algorithms: Infomax, FastICA and Preconditioned ICA for Real Data (Picard). Picard is the most recent of these, and it is more robust than the other algorithms in cases where the sources are not completely independent, which typically happens with real EEG data [81]. Therefore, this algorithm was used to prepare the data in our research. Figure 8 shows a fragment of the signals recorded for a sample participant before and after applying ICA. For all signals from each session, ICA was fitted to the data using 32 channels. Channels were scaled to unit variance ("z-standardized") as a group by channel type prior to the whitening by PCA (Principal Component Analysis). The number of principal components (from the pre-whitening PCA step) passed to the ICA algorithm during fitting was 16. The ICA method used for fitting was Picard, and the maximum number of iterations during the fit was 500.
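To make the pre-whitening step concrete, the sketch below scales all channels by a single group-wide standard deviation and then projects them onto 16 whitened principal components, which is the input an ICA algorithm would receive. It is an illustration in plain NumPy rather than the MNE-Python implementation, and the function name `prewhiten` is hypothetical. In MNE-Python, the settings listed above would presumably be expressed as `mne.preprocessing.ICA(n_components=16, method="picard", max_iter=500)`, although the exact call used in the study is not given.

```python
import numpy as np


def prewhiten(data, n_components=16):
    """Scale channels as one group, then whiten with PCA.

    data: array of shape (n_channels, n_times).
    Returns an array of shape (n_components, n_times) whose rows are
    uncorrelated and have unit variance.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=1, keepdims=True)
    # One scale factor for the whole channel group (all channels are EEG here).
    scaled = centered / centered.std()
    # PCA through the singular value decomposition of the scaled data.
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    n_times = scaled.shape[1]
    # Project onto the leading components and rescale each to unit variance.
    whitener = (u[:, :n_components] / s[:n_components]).T * np.sqrt(n_times - 1)
    return whitener @ scaled
```

Keeping 16 of 32 components halves the dimensionality seen by ICA, which reduces fitting time at the cost of discarding low-variance directions.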
A zero-phase finite impulse response (FIR) filter was applied using the window design method (firwin from the SciPy library; Hamming window). The filters used were one-pass, zero-phase, non-causal high-pass filters. Detailed information about filtering EEG signals can be found in the MNE-Python library documentation and in [82].
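The window design method mentioned above can be illustrated with a hand-written windowed-sinc design. This is a sketch under stated assumptions, not the filter used in the study: it replaces SciPy's `firwin` with an equivalent NumPy construction, the function names are hypothetical, and the cutoff and filter length in the usage note are illustrative because the original filter parameters are not available in the text.

```python
import numpy as np


def design_highpass(cutoff_hz, sfreq, numtaps):
    """Design a linear-phase FIR high-pass filter with a Hamming window."""
    if numtaps % 2 == 0:
        raise ValueError("numtaps must be odd for a symmetric high-pass filter")
    n = np.arange(numtaps) - (numtaps - 1) / 2
    # Windowed ideal low-pass response, normalized to unit gain at 0 Hz.
    lowpass = np.sinc(2.0 * cutoff_hz / sfreq * n) * np.hamming(numtaps)
    lowpass /= lowpass.sum()
    # Spectral inversion: a unit impulse minus the low-pass gives a high-pass.
    highpass = -lowpass
    highpass[(numtaps - 1) // 2] += 1.0
    return highpass


def filter_zero_phase(signal, taps):
    """Apply a symmetric FIR filter once, centered so that no delay remains."""
    return np.convolve(signal, taps, mode="same")
```

With, for example, a 1 Hz cutoff and 513 taps at 128 Hz, a constant offset is removed while a 10 Hz oscillation passes essentially unchanged, apart from about half a filter length of edge effects at each end of the recording.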
The MNE software also includes a number of graphical tools that facilitate data preprocessing and post-analysis. Figure 9a shows the power spectral density (PSD) for each electrode. The PSD analysis, determined by the Welch method [83] in MNE software, provides a quick method of finding spectral artifacts and outlier channels [82].
Rejection based on peak-to-peak amplitude was also used during the analyses. EEG data are inherently noisy, and trials can sometimes be contaminated by high-amplitude artifacts, reducing the effectiveness of the search for the required information. Existing software for processing M/EEG data (including the MNE library) offers a rudimentary solution by marking a trial as bad if the peak-to-peak amplitude in any sensor exceeds a certain threshold. However, the threshold is usually set manually after a visual inspection of the data. We imposed a limitation on data quality: we rejected any epoch where the peak-to-peak signal amplitude exceeded reasonable limits for this type of channel. This was performed using the values in the rejection dictionary. These values depended on the set of equipment and recording conditions. It was necessary to find the correct balance between data quality and loss of power due to too many dropped epochs. Fortunately, we knew the event points to be analyzed and distinguished as individual epochs. Hence, it was easier to select the required rejection threshold for the methods from the MNE library. It was chosen at the level of 600 µV. At this level, about 90% of events were correctly detected, and no additional events resulting from disturbances in the signal of some of the electrodes were detected.
Event markers related to the beginning of the stressful situation (starting to perform indicated tasks) and to the beginning of relaxation between tasks were also included in the data structure. To facilitate the analysis, the stress-related markers were divided into individual signals. Ultimately, a single data structure consisted of EEG signals (32 channels in the international 10-20 system, named according to the standard; see Figure 6b) and the event signals EMDR, Stroop 1, Stroop 2 and Stroop 3. According to the description, a maximum of three stress-inducing events and three relaxations occurred during one session.
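The peak-to-peak rejection rule can be sketched in a few lines of NumPy. The function name `reject_by_peak_to_peak` and the array layout are assumptions made for illustration; only the 600 µV threshold comes from the text. In MNE-Python the same rule is normally expressed by passing a rejection dictionary such as `reject=dict(eeg=600e-6)` (in volts) when constructing `mne.Epochs`.

```python
import numpy as np

REJECT_UV = 600.0  # peak-to-peak threshold in microvolts, as chosen in the text


def reject_by_peak_to_peak(epochs_uv, threshold_uv=REJECT_UV):
    """Drop epochs in which any channel exceeds the peak-to-peak threshold.

    epochs_uv: array of shape (n_epochs, n_channels, n_times) in microvolts.
    Returns the retained epochs and a boolean mask marking which were kept.
    """
    epochs_uv = np.asarray(epochs_uv, dtype=float)
    # Peak-to-peak amplitude of every channel in every epoch.
    ptp = epochs_uv.max(axis=2) - epochs_uv.min(axis=2)
    # An epoch is kept only if all of its channels stay within the limit.
    keep = (ptp <= threshold_uv).all(axis=1)
    return epochs_uv[keep], keep
```

Inspecting the returned mask makes it easy to check how many epochs a given threshold discards before committing to it.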

Classification
The final step of the proposed method was classification, which aimed to assign unknown data to a specific category (in this case: stress and relaxation response). We applied two different approaches, conventional machine learning and a deep learning Neural Network (NN), in order to compare their performances based on the recognition rates.
In the case of the machine learning methods, the feature vector consisted of the average global field power (GFP) values of a given frequency group for the first, second and third minutes of a particular event. The efficiency of the feature subsets (each subset consisting of a separate brain wave group) was verified using several types of classifiers, namely the k-nearest neighbors algorithm (k-NN), support-vector machines (SVM) [84], Multilayer Perceptron (MLP) [85] and Random Forest (RF) [86], using Weka [87] with 10-fold cross-validation [88]. This approach allowed the evaluation of the efficacy of particular feature sets and determined the most efficient ones. In the course of the research, the parameters for each classifier were identified and selected to achieve the highest recognition results.
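The per-minute GFP features can be sketched as follows. This is an illustration under stated assumptions: GFP is taken here as the standard deviation across channels at each time point, which is the common definition but is not spelled out in the text, the input is assumed to be already band-pass filtered to the frequency group of interest, and the function names are hypothetical.

```python
import numpy as np


def global_field_power(data):
    """GFP as the standard deviation across channels at each time point.

    data: array of shape (n_channels, n_times).
    """
    return np.asarray(data, dtype=float).std(axis=0)


def gfp_minute_features(band_data, sfreq, n_minutes=3):
    """Average GFP over each of the first n_minutes of an event."""
    gfp = global_field_power(band_data)
    per_minute = int(round(60 * sfreq))
    if gfp.size < n_minutes * per_minute:
        raise ValueError("event is shorter than the requested number of minutes")
    return np.array(
        [gfp[i * per_minute:(i + 1) * per_minute].mean() for i in range(n_minutes)]
    )
```

Concatenating the three values obtained for each frequency group would give one feature vector per event, which could then be exported to a classifier toolkit.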
In the case of the second approach, we used a CNN based on low-level features (raw signal and frequency information) and assessed its stress recognition efficiency. The application of the CNN for EEG-based stress recognition is illustrated in Figure 10. The input was in the form of time-frequency representation (TFR) images built from values averaged over all electrodes. The images were created with the Morlet wavelet transform, which is equivalent to a complex sinusoid with a Gaussian envelope [89,90] and generates a time-frequency representation of the EEG signal. An example of a set of images can be seen in Figures A1 and A2. When creating the images, the frequency range (Y axis) was set from 1 Hz to 28 Hz with a constant 3 Hz step, chosen because such a division provided practically correct boundaries at the dividing points between alpha, beta waves, etc. The time was set to the first minute from the beginning of the event. According to the EEG record analysis, this was a sufficient time to notice the differences between the analyzed emotions. For the CNN, the following architectures were tested: 2 to 3 convolution layers (each followed by a max pooling layer) followed by 1 to 2 dense layers, with 50 to 400 neurons in the convolution layers and 50 to 200 neurons in the dense layers. For all CNN types, separate models were built, increasing the neuron count on each layer by 25 for each new model. The best results (87.5%) were obtained for a network of 4 layers: 3 convolution layers with 250, 250 and 100 neurons, respectively, and a dense layer of 100 neurons.
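The time-frequency images can be illustrated with a direct implementation of the Morlet transform as a complex sinusoid under a Gaussian envelope, evaluated on the 1 Hz to 28 Hz grid with a 3 Hz step. This is a simplified sketch, not the code used in the study: the function name `morlet_tfr` is hypothetical, and the number of wavelet cycles (5) and the amplitude normalization are assumptions, since the text does not report them.

```python
import numpy as np

# Frequency grid from the text: 1 Hz to 28 Hz in 3 Hz steps.
FREQS = np.arange(1, 29, 3)


def morlet_tfr(signal, sfreq, freqs=FREQS, n_cycles=5.0):
    """Power of a single-channel signal at each frequency and time point.

    Returns an array of shape (n_freqs, n_times). The signal must be longer
    than the longest wavelet (about 5 s at 1 Hz with the default n_cycles).
    """
    signal = np.asarray(signal, dtype=float)
    power = np.empty((len(freqs), signal.size))
    for i, freq in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * freq)  # envelope width in seconds
        half = int(np.ceil(3.0 * sigma_t * sfreq))
        t = np.arange(-half, half + 1) / sfreq
        # Complex sinusoid multiplied by a Gaussian envelope.
        wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet /= np.abs(wavelet).sum()  # equal peak gain at every frequency
        power[i] = np.abs(np.convolve(signal, wavelet, mode="same")) ** 2
    return power
```

Averaging such arrays over electrodes and rendering them as images would yield inputs of the kind shown in Figures A1 and A2.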
We used leave-one-subject-out cross-validation. All CNNs were trained using ADAM [91] for gradient descent optimization and cross-entropy as the cost function, as the latter is a robust method based on well-known classical probabilistic and statistical principles and is, therefore, problem-independent. It is capable of efficiently solving difficult deterministic and stochastic optimization problems [92]. Training was set to 500 epochs with an early stop condition if no loss decrease was detected for more than 30 epochs.
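The evaluation scheme and the stopping rule described above can be sketched in plain Python. Both helpers are hypothetical illustrations: `leave_one_subject_out` yields one fold per participant, and `stopping_epoch` reports when training would halt for a given sequence of loss values. Whether the monitored loss was the training or the validation loss is not stated in the text, so the sketch is agnostic on that point.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def stopping_epoch(losses, max_epochs=500, patience=30):
    """Return the 1-based epoch at which training stops.

    Training halts once the loss has failed to improve on its best value for
    more than `patience` consecutive epochs, or after `max_epochs`.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses[:max_epochs], start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale > patience:
                return epoch
    return min(len(losses), max_epochs)
```

Splitting by subject rather than by sample keeps all recordings of the held-out participant out of training, so the reported accuracy reflects generalization to unseen people.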
To investigate which electrodes from EMOTIV EPOC Flex were redundant for such research, we analyzed the classification performance for each electrode separately.

Results and Discussion
In this section, we summarize the experimental outcomes of the study, followed by their interpretation.

Data Analysis Results
The standard EEG procedure was based on the analysis of a response to specific stimuli/events. The occurrence of such a stimulus was used to define the epoch, which consisted of the signals recorded just before the stimulus and for a specified period of time after it. Signal values could be stored in the form of all channel values (in MNE software, the Epochs class is responsible for this) or in averaged form (the Evoked class).
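The relation between the continuous recording, the epochs and their average can be sketched in NumPy. The helpers `extract_epochs` and `average_epochs` are hypothetical stand-ins for what the MNE Epochs and Evoked classes provide, and the window limits are left as parameters because they depend on the analysis.

```python
import numpy as np


def extract_epochs(data, event_samples, sfreq, tmin, tmax):
    """Cut a window from tmin to tmax seconds around each event sample.

    data: array of shape (n_channels, n_times).
    Returns an array of shape (n_epochs, n_channels, n_window_samples).
    Events whose window would run past the recording are skipped.
    """
    data = np.asarray(data, dtype=float)
    start_offset = int(round(tmin * sfreq))
    stop_offset = int(round(tmax * sfreq))
    epochs = []
    for event in event_samples:
        start, stop = event + start_offset, event + stop_offset
        if start < 0 or stop > data.shape[1]:
            continue
        epochs.append(data[:, start:stop])
    if not epochs:
        return np.empty((0, data.shape[0], stop_offset - start_offset))
    return np.stack(epochs)


def average_epochs(epochs):
    """Average over epochs, giving one evoked-style response per channel."""
    return epochs.mean(axis=0)
```

A negative `tmin` includes samples recorded before the stimulus, which is what allows the pre-stimulus signal to be compared with the response.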
By analyzing successive epochs related to events in the obtained data, only in a few participants was it possible to find a clear change in the amplitude of signals at the beginning of a given event. Two samples showing exemplary signals and topographic maps at specific time points for stress-related epochs are illustrated in Figure 11. This figure also presents the delay in the reaction to the test initiation, characteristic of the remaining cases. The lack of a clear response to the start of the event in most samples might have been due to the absence of a distinct stimulus at the start of the event. It should be noted, in particular, that the initiation of the first of the three stress-related events was often poorly distinguishable from the signals recorded before the event (see the start of Stroop 1 in Figure 7). This may have been because the event occurred within 2-3 min of the start of the session, so the participant may still have been uncomfortable in the clinical situation and, potentially, slightly stressed by the new experience. Only the first cycle of relaxation made it possible to record a stressful situation more clearly.
The other circumstance relevant to the recorded signals was that the stress phases did not end abruptly with the start of the relaxation phase. It was rather a smooth transition, so that there was no clear line separating the events. Therefore, the next step was a thorough analysis of the frequency spectrum in individual events.
Based on the published literature, epoching was performed for each trial over a time window of 3 s, and the resulting files were high-pass filtered at 0.1 Hz. The analyzed frequency range was 0-60 Hz, split into the Alpha, Beta and other bands.
The next step was to analyze specific frequency bands during the occurrence of particular events. In EEG signals, several types of activity, characteristic of certain frequencies and amplitudes, can be distinguished. There are five essential groups based on the different frequency ranges: the Alpha (α), Theta (θ), Beta (β), Delta (δ) and Gamma (γ) bands. The terms and bands presented in Table 2 were used in the analysis, and the global field power (GFP) was used for the comparison.
The idea presented in [93], which is included in the MNE library documentation, was adopted. The first stage was filtering the signals according to individual bands (Figure 9); then, the Hilbert transform was applied. The bands were corrected and averaged, making it easy to compare the final GFP size across bands. A more detailed description of this method is presented in [94]. An example of the result is presented in Figure 12. Such calculations were performed for all participants for all stress-related and relaxation-related events. Regarding stress, when analyzing the averaged GFP values from three stressful situations, greater values were observed for Theta, Alpha, Beta and Gamma waves at the start of the events. Then, after about 1.5-2.5 min, their suppression could be noticed. In three cases, there was no increase in intensity before the event started, although the above suppression was still reported. The remaining 10 cases were inconclusive or not consistent. After the start of the event, changes in the Alpha and Beta waves in seven cases caused a decrease in the Beta/Alpha ratio lasting about 1.5-2 min.
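The last stage of this pipeline can be sketched in a few lines, assuming the band-pass filtering and the Hilbert envelope have already produced one amplitude trace per channel. Two choices below are assumptions rather than facts taken from the paper: the GFP is computed as the sum of squared amplitudes across channels (the classical definition uses the standard deviation across channels instead), and "corrected" is interpreted as subtracting the mean of the pre-event samples. The function names and numbers are hypothetical.

```python
def global_field_power(envelope):
    """Sum of squared amplitudes across channels at each time point.

    envelope[channel][sample] holds band-limited amplitude envelopes.
    """
    n_samples = len(envelope[0])
    return [sum(ch[t] ** 2 for ch in envelope) for t in range(n_samples)]


def baseline_correct(gfp, n_baseline):
    """Subtract the mean of the first n_baseline (pre-event) samples."""
    base = sum(gfp[:n_baseline]) / n_baseline
    return [value - base for value in gfp]


# Toy envelopes for one band: 2 channels, 4 time points (hypothetical values)
alpha_envelope = [[1.0, 1.0, 2.0, 2.0], [2.0, 2.0, 2.0, 4.0]]
alpha_gfp = global_field_power(alpha_envelope)
alpha_gfp_corrected = baseline_correct(alpha_gfp, n_baseline=2)
```

Because every band ends up as a single trace referenced to its own baseline, the traces for Theta, Alpha, Beta and the other bands can be compared directly on one chart.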
The most characteristic feature of the relaxation-related brain wave charts in the studied cases was the presence of Theta and Alpha waves before the start of the event and their gradual suppression within 1-2 min. This occurred in 14 cases, and a potential explanation of this phenomenon is that the relaxation event began just after the stress event and relaxation was not instantaneous but gradual. Only in four cases could one notice changes in brain waves at the start of the event. In nine cases, the gradual suppression of Beta 0 waves was also clearly visible, and in 10 cases, the gradual suppression of Beta 1 waves.
It should be added that this analysis was based on reviewing six waveforms for 28 participants and for two different events, which amounted to 336 waveforms. In the next step, we determined the mean values over four consecutive minutes for each wave and event. The results are collected in Figure A3 (stress) and Figure A4 (relaxation). The means were calculated for one minute before a given event (marked as 0 (min)) and for three consecutive minutes after the start of the event (marked as 1, 2, 3 (min)). The values for individual participants differed; therefore, in the next step, the number of minimums and maximums for a particular participant in a given time range was calculated. The designated minimums and maximums were the basis for further analyses. For relaxation (Figure A4), it can be seen that, for all waves, the maximum values occurred primarily just before the event (e.g., 21 of 28 maximums at time 0 (min) for the Theta wave). This was because relaxation occurred immediately after the stress event. The second and third minutes were characterized by a reduction in the calculated minimum values for a given participant. For stress, the results were more variable (Figure A3). The greatest number of changes in the first minute from the start of the event could be seen for the Alpha wave. For 18 participants, there was a maximum for the Alpha wave in the first minute. The acquired mean values were used to classify the state of stress and relaxation.
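The per-minute averaging and the counting of minimums and maximums can be sketched as below. The helper names and the numbers are hypothetical and serve only to show the bookkeeping; ties are resolved in favor of the earliest minute, which is an assumption.

```python
def block_means(values, block_len, n_blocks=4):
    """Mean of each consecutive block, e.g. minutes 0, 1, 2, 3 of a GFP trace."""
    return [
        sum(values[b * block_len:(b + 1) * block_len]) / block_len
        for b in range(n_blocks)
    ]


def count_extrema(table):
    """Count, per minute, how many participants have their max / min there.

    table[participant] is a list of per-minute means for one wave and event.
    """
    n_minutes = len(table[0])
    max_counts = [0] * n_minutes
    min_counts = [0] * n_minutes
    for row in table:
        max_counts[row.index(max(row))] += 1
        min_counts[row.index(min(row))] += 1
    return max_counts, min_counts


# Hypothetical trace with two samples per "minute"
means = block_means([4.0, 6.0, 3.0, 3.0, 2.0, 0.0, 1.0, 1.0], block_len=2)

# Hypothetical per-minute means for three participants (minutes 0, 1, 2, 3)
table = [[4.0, 3.0, 2.0, 1.0], [5.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 0.5]]
max_counts, min_counts = count_extrema(table)
```

A statement such as "21 of 28 maximums at time 0 (min)" corresponds to the first entry of `max_counts` when the table holds all 28 participants for one wave.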
Overall, in more than half of the cases, traits characteristic of stress-related and relaxation-related events were observed in the recorded brain waves. Sedation usually appeared with a delay of one or two minutes. Additionally, according to the interview with the participants and the previous research of the authors of this work [95], the EMDR method using VR had a calming effect. Traits related to stress or nervousness could be seen much better when analyzing the non-averaged values of individual events. The intensity with which the participants were made nervous was increased, and this was also reflected in the increasing levels of individual brain waves. It should be noted that some of the participants claimed after the session that the stressful situation, because it was related to the performance of a specific task, led to increased attention and concentration rather than to nervousness. This could be seen from the increased level of Beta 1 waves and the simultaneous disappearance of Alpha waves. Due to the purposefully incorrect, stress-inducing feedback introduced while solving tasks, nervousness appeared in some participants only after the stressful situation was over. Its appearance at the end of the task (Beta waves) and its gradual damping during the EMDR method was seen in about 1/3 of the cases.

Classification-Machine Learning
Classification performances (in %) of different feature representations for the set of two classes (stress and relaxation) are presented in Table 3.
It is clearly visible that the best results were achieved for the subsets containing Theta-related features (94.64% using SVM and MLP). The lowest results were obtained for Beta-related features (Beta: 76.85%; Beta 1: 66.07%; Beta 2: 75%), which was noticeable for all types of classifiers. Analyzing the results retrieved from different patterns, a significant improvement in the recognition rate when using the MLP classifier could be observed in most cases. Only in the case of Alpha-related features did SVM give better results than MLP.
An improvement in overall accuracy could be observed when all features were used as one input vector (COMBO) compared to a single source-feature set. In this case, too, the best results were achieved with MLP (96.42%). On the other hand, the lowest increase in results was observed for the k-NN algorithm (83.93%). As expected, the effectiveness of classifiers whose testing and training sets comprised features gathered from different sources was much better than that of classifiers operating on one particular feature set. Increasing the number of features by combining the source sets improved the quality of the classification. However, it has to be emphasized that we obtained results barely 2% lower using only Theta-related features.

Classification-CNN Model
Figure 13 shows the results obtained in the classification. The results were valid for the analysis based on the averaged values for all electrodes (the set of used images is shown in Appendix A: Figures A1 and A2), as well as for each electrode separately. Since there were 28 participants, 56 images were generated for each electrode in two classes (stress and relaxation). The first columns present the quantities of correctly classified images. On this basis, coefficients describing the quality of the classification, i.e., F1-score, precision and recall, were determined separately for each class. The last columns show the class-averaged results.
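To make the relation between the counts of correctly classified images and the reported coefficients explicit, the sketch below derives precision, recall and F1 for each class, their class average, and the accuracy from two counts. The function names and the example counts (24 and 22 correct out of 28 per class) are hypothetical and are not values from Figure 13.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f1


def binary_report(correct_stress, correct_relax, n_per_class=28):
    """Per-class and class-averaged scores for a balanced two-class problem."""
    missed_stress = n_per_class - correct_stress  # stress labeled as relaxation
    missed_relax = n_per_class - correct_relax    # relaxation labeled as stress
    stress = class_metrics(correct_stress, missed_relax, missed_stress)
    relax = class_metrics(correct_relax, missed_stress, missed_relax)
    average = tuple((s + r) / 2.0 for s, r in zip(stress, relax))
    accuracy = (correct_stress + correct_relax) / (2.0 * n_per_class)
    return {"stress": stress, "relaxation": relax, "average": average, "CA": accuracy}


# Hypothetical counts of correctly classified images per class
report = binary_report(correct_stress=24, correct_relax=22)
```

With 28 images in each class, the unweighted and the support-weighted class averages coincide, so the sketch does not need to distinguish between them.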
Two additional factors were determined: CA, the classification accuracy score, and AUC, the area under the ROC curve. Individual values of these two coefficients are also presented in Figure 14. Classification quality scores reached up to 90% (AUC). According to the authors, this was a promising result, all the more so as the entire frequency range was taken into account. However, it is worth noting that equally good results could be obtained based on single, particular brain waves. An important element of this stage was also to determine which of the electrodes allowed for a better classification of the analyzed emotions. The following assessment thresholds were adopted for the classification factors: a value below 70% meant ineffective classification, a value in the range of 70-80% was not assigned to either group, and a value above 80% meant sufficient classification. The indicated criteria were marked on the diagram in Figure 14. Additionally, Figure 15a (CA coefficient) and Figure 15b (AUC coefficient) show the electrode arrangement, with the electrodes that allowed for a sufficient or ineffective classification marked. CA and AUC values were related, although due to slightly higher AUC values, the obtained images were slightly different for the same criteria. What drew attention was the high symmetry in the obtained results. Mid-area electrodes along the entire head (Fz, Cz, Pz, Cp1, Cp2 and Oz) gave unsatisfactory results (blue color of the electrodes). The outer electrodes (FT9, FT10, TP9 and TP10) were characterized by high classification efficiency (red color of the electrodes) for both CA and AUC. For the AUC ratio, electrodes F3, F4, FC5, FC6, CP5, CP6, P7 and P8 also met the assumed criterion. Only the high AUC values obtained with the FP2 electrode disturbed the full symmetry.
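The three-level criterion (below 70%: ineffective; 70-80%: not assigned to any group; above 80%: sufficient) amounts to a simple threshold rule, sketched below. The function name, the label strings and the per-electrode scores are hypothetical and are not values read from Figure 14.

```python
def rate_classification(score_percent):
    """Map a CA or AUC value (in %) to the three assessment groups."""
    if score_percent < 70.0:
        return "ineffective"
    if score_percent > 80.0:
        return "sufficient"
    return "unassigned"  # 70-80%, both boundaries included


# Hypothetical per-electrode scores in %, for illustration only
example_scores = {"Cz": 64.0, "F3": 75.0, "TP9": 86.0}
ratings = {name: rate_classification(value) for name, value in example_scores.items()}
```

Applying the same rule separately to the CA and the AUC values explains why the two maps in Figure 15 can differ slightly: an electrode whose AUC is just above 80% may have a CA that falls inside the 70-80% range.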

Conclusions
In this paper, we investigated the use of EEG signals to classify a subject's stress level while using a VR environment. For this purpose, we created a VR interactive simulation with relaxation scenes based on psychotherapy treatment utilizing bilateral stimulation, and used the Stroop test as a stressor. The experiment was conducted on a group of 28 healthy adult volunteers whose EEG signal was continuously monitored using the EMOTIV EPOC Flex wireless EEG head cap system. Then, we classified the stress level using CNN and conventional machine learning algorithms. The obtained results were promising and a further study is greatly encouraged, as the popularity and widespread use of VR solutions requires a more systematized and automated approach for usability testing. With the current pace of technology development, it is not a theoretical need, but an immediate necessity.
Participants in the post-interview acknowledged that the developed EMDR method used in the virtual environment was excellent for relaxation, which was consistent with our previous research [95]. At the same time, some participants pointed out that the prepared 'stressful' tasks induced a state of focus rather than a state of nervousness. This could have directly affected the deterioration of the later stage classification results. Therefore, a more detailed participant survey should be used in future studies. Another approach might include repeating the session with the same participant, although the acquired experience of the participant could interfere with the results.
The proposed method of using mean values from Global Field Power for individual waves divided into time units (consecutive minutes from the beginning of the event) turned out to be easier to analyze than the detailed analysis of the full distribution for a given wave that was performed at the beginning of the study. At the same time, the obtained values proved to be sufficient to classify two emotions, stress and relaxation, using MLP. The results exceeded 90% classification accuracy. The best results were obtained considering all brain waves (96.42%). When discussing individual waves, the best results were acquired from Theta (94.64%), Gamma (80.36%) and Alpha (76.78%) waves.
The proposed method of image analysis, which utilized a Time-Frequency Representation (TFR) based on Morlet wavelets, also allowed an excellent classification between stress and relaxation. However, the average results were slightly lower than in the earlier stage of analyses. This may have been caused by the amount of data. Since the amount was relatively small, the CNN was more difficult to train and apply. Thus, the MLP gave better results. The method made it possible to determine which electrodes proved useful for successful classification and which were redundant. The symmetry of useful and redundant electrodes was also worth noticing. The electrodes in the middle part of the head gave unsatisfactory results, while the lateral ones classified the analyzed emotions very well. At the same time, the obtained results allowed us to focus in subsequent analyses on individual electrodes that classified the considered emotions most accurately. Decreasing the number of electrodes made it possible to analyze a much lower volume of data, thus reducing the necessary computing resources. Additionally, it might increase the comfort of the study, because participants would be less encumbered by the required equipment.
To summarize, we had a lot of concerns about measuring EEGs with the EMOTIV EPOC Flex while using the Oculus Quest HMD, since it required an intricate process of applying the gel for EEG measurements. Although the VR experience required stillness from the user, some of the recordings were rejected for further analysis due to low quality (instantaneous signal quality was less than 95%). However, the overall results were optimistic and exceeded most results obtained in similar studies related to mental stress classification using EEG signals (for more, see the latest review on stress assessment methods using an EEG [96]). Therefore, one can conclude that, due to the separation from the natural environment, VR causes a more substantial emotional involvement, which leads to a better recognition rate.

Figure 1. The structure of the proposed mental stress recognition approach: (a) using machine learning; (b) using deep learning.

Figure 2. The process of mounting the EPOC Flex headset and VR HMDs.

Figure 3. Timing protocol of the experiment. EMDR: Eye Movement Desensitization and Reprocessing.

Figure 6. The EMOTIV EPOC Flex: (a) a subject in the EEG head cap system; (b) electrode positioning.

Figure 7. Signals recorded for a sample participant (RAW file).

Figure 8. Signals recorded for a sample participant before and after the application of ICA (selected 6 channels limited to 18 s). Disruptions in the signal were related to the exceeding of the maximum range.

Figure 9. Example of (a) the power spectral density for each electrode and (b) the power spectral density after applying filters for individual frequency bands (participant 1).

Figure 10. The process of using a CNN for EEG-based stress recognition: (a) the process of making the heat-map image based on EEG signals; (b) the structure of the CNN classifier, which uses the heat-map images as its input.

Figure 11. Butterfly plots based on EEG signals (32 channels) with scalp topographies for two participants; at time 0, the event with the label Stroop 2 started. Specific events were spread in time (2-4 min) and, in most samples, when analyzing the entire course of the recorded EEG waveform, it was possible to observe a clear difference in the frequency of individual signals for a given event (Figure 7). Observing the signals

Figure 12. Global Field Power for individual brain waves for participant 1, averaged values for a stressful situation.

Figure 13. Classification results, mean value based on analysis of all electrodes, and values based on analysis of individual electrodes. Legend: CA: classification accuracy score; F1: F-score; AUC: area under the ROC curve.

• Value below 70%: classification ineffective;
• Value in the range of 70-80%: classification not assigned to any of the groups;
• Value above 80%: classification sufficient.

Figure 14. Classification results, mean value based on analysis of all electrodes: CA.

Figure 15. Electrodes with minimum (blue) and maximum (red) CA (a) and AUC (b) values in the classification (average over class).

Figure A3. Average GFP values for individual brain waves and for individual participants calculated for stress. The means were counted for four minutes: 0 (min), the average for the minute before the event; 1, 2, 3 (min), the averages for the consecutive minutes of the event. The number of minimum and maximum values for individual participants is summarized under the tables.

Figure A4. Average GFP values for individual brain waves and for individual participants calculated for relaxation. The means were counted for four minutes: 0 (min), the average for the minute before the event; 1, 2, 3 (min), the averages for the consecutive minutes of the event. The number of minimum and maximum values for individual participants is summarized under the tables.

Table 3. Classification performances (in %) of different feature representations for the set of two classes: stress and relaxation. Numbers in bold and red highlight the maximum classification rates achieved in each column. COMBO refers to all extracted parameters in one feature vector.