Article

Audio’s Impact on Deep Learning Models: A Comparative Study of EEG-Based Concentration Detection in VR Games

by Jesus GomezRomero-Borquez 1,†, Carolina Del-Valle-Soto 1, José A. Del-Puerto-Flores 1,*,†, Juan-Carlos López-Pimentel 1, Francisco R. Castillo-Soria 2, Roilhi F. Ibarra-Hernández 2 and Leonardo Betancur Agudelo 3

1 Facultad de Ingeniería, Universidad Panamericana, Álvaro del Portillo 49, Zapopan 45010, Mexico
2 Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, San Luis Potosí 78295, Mexico
3 Facultad de Ingeniería en TIC, Universidad Pontificia Bolivariana, Medellín 050031, Colombia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Inventions 2025, 10(6), 97; https://doi.org/10.3390/inventions10060097
Submission received: 25 August 2025 / Revised: 19 October 2025 / Accepted: 24 October 2025 / Published: 29 October 2025

Abstract

This study investigates the impact of audio feedback on cognitive performance during VR puzzle games using EEG analysis. Thirty participants played three different VR puzzle games under two conditions (with and without audio) while their brain activity was recorded. To analyze concentration levels and neural engagement patterns, we employed spectral analysis combined with a preprocessing algorithm and an optimized Deep Neural Network (DNN) model. The proposed processing stage integrates feature normalization, automatic labeling based on Principal Component Analysis (PCA), and Gamma band feature extraction, transforming concentration detection into a supervised classification problem. Experimental validation was conducted under the two gaming conditions in order to evaluate the impact of multisensory stimulation on model performance. The results show that the proposed approach significantly outperforms traditional machine learning classifiers (SVM, LR) and baseline deep learning models (DNN, DGCNN), achieving 97% accuracy in the audio scenario and 83% without audio. These findings confirm that auditory stimulation reinforces neural coherence and improves the discriminability of EEG patterns, while the proposed method maintains a robust performance under less stimulating conditions.

1. Introduction

The evaluation of video games in virtual environments has gained increasing relevance due to their expanding presence in educational, therapeutic, and training contexts [1]. Beyond entertainment, there exists a clear opportunity to design and implement serious games—digital applications structured with intentional learning objectives—that can foster the development of cognitive and behavioral competencies among players. However, to assess the effectiveness of such games, it is essential to determine whether they engage users cognitively, particularly in terms of concentration and focus. One promising approach for this evaluation involves the use of electroencephalogram (EEG) signals, which provide direct, non-invasive insights into the brain’s electrical activity. By analyzing these signals, it is possible to construct metrics and validation frameworks that classify games—whether serious, standard, or purely recreational—according to their ability to induce measurable cognitive engagement in players. This methodology provides a scientific foundation for evaluating the impact of game design on user attention, potentially guiding the development of more effective serious game interventions.

1.1. Motivation

This research addresses the lack of tools capable of identifying, in real time, when a serious video game induces a cognitive concentration state in the player, a gap that limits the objective evaluation of such games' effectiveness. Accordingly, this study aims to define the most suitable EEG preprocessing pipeline to optimize the training of Deep Neural Networks (DNNs) for concentration detection in virtual reality (VR) environments, and to assess how their performance varies when trained with EEG data obtained under experimental conditions with and without auditory feedback.

1.2. Literature Review

When evaluating the state of the art, recent developments reveal a broader landscape at the intersection of extended reality (XR), electroencephalography (EEG), and neurocognitive analysis. The related works can be classified into six thematic groups: Emotion Recognition via EEG, Multimodal Fusion, Neurocognitive Assessment through XR, Therapeutic and Rehabilitation Applications, Neurological Disorder Analysis, and Real-Time Cognitive Systems.
Emotion Recognition via EEG remains central to multiple contributions. EEG-based emotion classification is addressed through advanced neural architectures, including CNNs, BiLSTMs, and attention mechanisms, aimed at capturing spatial, temporal, and frequency domain patterns in EEG data [2]. Feature engineering techniques such as differential entropy and time–frequency decomposition enhance classification performance, providing robust emotional state predictions applicable in both clinical and commercial contexts.
Multimodal Fusion strategies integrate EEG signals with complementary modalities such as facial expressions, heart rate variability, or galvanic skin response to improve recognition accuracy and generalizability. These multimodal systems leverage synchronized temporal dynamics across signals, enabling affective computing systems with high reliability and ecological validity [2,3,4,5]. Such integration is particularly relevant in adaptive human–computer interfaces and immersive user experience platforms.
Neurocognitive Assessment through XR introduces a rapidly growing domain. Recent systematic reviews and validation studies emphasize the transformative role of XR technologies—particularly virtual reality (VR) and augmented reality (AR)—for evaluating cognitive functions. Bello et al. (2025) demonstrated that immersive VR environments allow for the realistic assessment of attention, memory, and executive functions, while AR systems show potential in early cognitive impairment detection [6]. Similarly, Bhargava et al. (2024) validated VR and 3D mobile games against standardized cognitive tests (ACE-III), showing significant correlations and improved ecological validity [7]. These findings confirm XR’s capacity to provide dynamic, context-aware assessment conditions closer to real-life cognitive performance.
Therapeutic and Rehabilitation Applications extend the utility of immersive systems beyond assessment toward cognitive training and behavioral modification. Park and Jeon (2024) reported that non-immersive VR exercises substantially enhance balance and gait in older adults, especially when feedback and avatar-based interaction are included [8]. Likewise, Ribeiro et al. (2024) demonstrated that AR-based serious games can successfully promote self-examination behaviors for melanoma prevention, increasing both motivation and knowledge retention [9]. For elderly populations with Alzheimer’s disease, You et al. (2024) identified that therapeutic toys and interactive games improve cognitive engagement and emotional well-being, providing a framework for non-pharmacological interventions [10]. These studies collectively support the therapeutic impact of immersive and semi-immersive digital applications.
Neurological Disorder Analysis and Estimation focuses on the detection and classification of cognitive impairments using EEG or combined XR–EEG paradigms. Deep learning models trained on features extracted via ensemble empirical mode decomposition and spectral measures achieve a high accuracy in differentiating mild cognitive impairment (MCI) and Alzheimer’s disease from healthy states [5,11]. Moreover, Datta et al. (2024) demonstrated that immersive digital environments modulate emotional engagement and cognitive load, as measured by brain–computer interfaces, suggesting their potential for diagnostic monitoring [12].
Real-Time Cognitive Systems highlight low-latency frameworks for online adaptation in user-aware environments. These systems are crucial for neuromarketing, interactive learning, and mental state monitoring applications where predictive inference must operate under real-time constraints [4,11,13,14,15]. Optimized models using compact EEG-derived features—such as Power Spectral Density (PSD), Magnitude-Squared Coherence (MSC), and Differential Entropy (DE)—enable efficient on-device computation and seamless deployment in mobile or embedded systems.
Spectral coherence and entropy remain foundational EEG-based indicators for assessing functional connectivity and cognitive complexity, respectively. Coherence quantifies the synchronization between cortical regions during affective processing, while entropy captures the variability and richness of neural oscillations [16,17,18,19,20,21,22,23,24,25,26,27].
Recent machine learning approaches—including Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), Deep Neural Networks (DNNs), and Dynamical Graph Convolutional Neural Networks (DGCNNs)—have demonstrated the feasibility of decoding cognitive and emotional states from EEG-derived features [28,29]. Nevertheless, reported accuracies typically range between 61.56% and 76.60% when using single features such as PSD [3], largely due to the absence of robust preprocessing stages. In the present study, we address this gap by proposing an optimized EEG preprocessing and feature extraction algorithm tailored for DNN models, thereby improving the robustness, reliability, and interpretability of cognitive state classification.

1.3. Contribution and Article Structure

Based on the experimental results obtained, the main contributions of this work can be summarized as follows:
  • A preprocessing stage for EEG signals was developed, specifically adapted for training deep learning models. This stage includes the normalization of neurophysiological features and the automatic labeling of cognitive states (focused/not-focused) through a proposed algorithm based on Principal Component Analysis (PCA).
  • The detection of cognitive concentration was formalized as a supervised classification problem, using a Deep Neural Network (DNN) trained on neuroscientific metrics extracted from EEG signals.
  • An optimized training scheme for DNNs was proposed, achieving performance metrics that surpass those reported in the state of the art.
  • Different configurations of the experimental dataset were evaluated, considering scenarios with and without audio during VR gaming sessions, which allowed the identification of the most suitable conditions for robust DNN training.
The remainder of this paper is organized as follows. Section 2 details the methodology and experimental design, including the construction of the virtual environment and the procedures followed during the data collection phase. Section 2.2 presents the EEG signal processing methods and the analysis techniques applied to the data obtained from the gameplay sessions. Section 2.5 introduces the preprocessing stage and the proposed DNN model. Section 3 reports the experimental results obtained from evaluating the DNN model, highlighting trends and differences with respect to prior work. Section 4 discusses the main findings, and Section 5 provides concluding remarks.

2. Materials and Methods

This study aims to identify the most suitable preprocessing pipeline for electroencephalography (EEG) signals in order to optimize the training of machine learning models applied to the detection of concentration states in video game players within VR environments. To this end, different datasets, feature extraction methods, and labeling strategies are evaluated, comparing their impact on the performance of DNN-based classifiers.
As a case study, VR puzzle-type video games were employed, given their documented effect in inducing cognitive focus states. Previous research has shown that this genre of VR games promotes the stimulation of patterns consistent with sustained attention processes [14,20]: (1) interhemispheric synchronization, (2) increased Power Spectral Density (PSD) in frontoparietal regions, (3) reduced randomness in EEG signals within the Gamma band (30–40 Hz). The following section describes the EEG signal acquisition protocols and the subsequent digital signal processing procedures.

2.1. EEG Dataset Description

2.1.1. Subject Recruitment

This study involved 30 young adults (17 women and 13 men), aged between 18 and 27 years, with a mean age of 22.07 years, recruited from the Universidad Panamericana in Guadalajara, Mexico. Participants engaged with three virtual reality puzzle games—Cubism, Puzzling Places, and Tetris [30,31,32]—under two experimental conditions: with audio and without audio. Gameplay sessions were conducted at Halberd Studios, a controlled environment equipped with specialized spaces and staffed by professionals experienced in video game development (e.g., 9 Years of Shadows [33]). The inclusion criteria were designed to limit prior exposure to virtual reality, allowing a maximum of three previous sessions with VR headsets.
Figure 1 illustrates the methodology followed for EEG data acquisition. The process was carried out under a standardized three-phase protocol: preparation, execution, and repetition. Initially, participants received instructions through video tutorials for each VR game (Cubism, Puzzling Places, and Tetris) [30,31,32], covering key aspects such as the mechanics, objectives, and level structure. Once the VR environment was set, EEG data were acquired following the sequence below:
  • Calibration of the VR equipment (emphasizing comfort and correct visualization).
  • 15 min gameplay session with audio + continuous EEG recording.
  • 60 min resting period (to minimize cognitive fatigue).
  • Switch to the next VR game title.
This cycle was then repeated under identical conditions but omitting the auditory component, thus generating paired EEG datasets that enabled a direct comparative analysis of the impact of audio on neural network models. The EEG signals collected from the 30 participants served as the basis to evaluate the role of audio in shaping EEG data, within the context of training a machine learning classifier designed to identify the cognitive state of concentration.

2.1.2. Experimental Setup

EEG and VR gameplay data were collected using a MUSE 2 EEG headset (4 channels: TP9, AF7, AF8, TP10) and a Meta Quest 2 system. Participants first wore the MUSE 2 device, and EEG signals were monitored in real time via a tablet to ensure stable transmission (sampling rate: 256 Hz). Once signal stability was confirmed, participants used the VR headset. Figure 2 illustrates a participant during two test scenarios: one with integrated auditory stimuli and another in no-audio mode, both using the combined devices. EEG data were recorded with participants seated to minimize motion artifacts during gameplay, and brain activity was captured across the four canonical frequency bands described in Table 1.

2.2. EEG Signal Analysis

Adequate feature extraction from EEG data is essential to highlight relevant information for classification processes and the analysis of cognitive activity, particularly in the study of the focused state in video game players.
Prior to feature extraction, all EEG signals were preprocessed to minimize environmental noise and non-cortical artifacts. A fourth-order Butterworth band-pass filter was applied to isolate the Gamma frequency range (30–40 Hz), which is physiologically associated with sustained attention and cognitive concentration (shown in Table 1). In this context, the estimation of the Power Spectral Density (PSD) in the Gamma band, together with the coherence analysis between the frontal electrodes AF7–AF8 in the 30–40 Hz range, enables the identification of patterns associated with sustained attention. Complementarily, the Spectral Entropy (SpEn) calculated at electrode AF8 within the same band and applied across all participants provides a robust metric for detecting changes in the complexity of the neural signal, thereby reinforcing the characterization of cognitive processes related to the level of concentration.
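As a point of reference, this filtering step can be sketched in Python with scipy; the sampling rate (256 Hz), filter order, and band limits come from the text, while the array layout and function name are illustrative assumptions:

```python
# Minimal sketch of the Gamma-band isolation step. Assumes raw EEG sampled
# at 256 Hz (MUSE 2) as a NumPy array of shape (n_samples, n_channels),
# with channels ordered TP9, AF7, AF8, TP10.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 256  # MUSE 2 sampling rate (Hz)

def gamma_bandpass(eeg, fs=FS, low=30.0, high=40.0, order=4):
    """Fourth-order Butterworth band-pass applied forward-backward
    (zero-phase) along the time axis."""
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg, axis=0)

# Example on a stand-in 60 s four-channel recording:
eeg = np.random.randn(60 * FS, 4)
eeg_gamma = gamma_bandpass(eeg)
```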

2.3. Frequency Domain Analysis

EEG signals were analyzed in the frequency domain (FD) in order to observe their spectral components and examine their distribution across the key frequency bands described in Table 1. This representation in the frequency domain allows for the precise identification of brain activity associated with relevant cognitive processes; in particular, we selected the Gamma band, which is strongly linked to attention and concentration.
This was achieved through the calculation of the Power Spectral Density (PSD) using Welch’s method. This method improves spectral estimation by dividing the signal into overlapping segments and averaging their Fourier transforms. The formulation employed is shown in the following equation:
$$P_{xx}(f) = \frac{1}{N_{\mathrm{FFT}} \cdot \Delta} \sum_{i=0}^{M-1} \frac{|X_i(f)|^2}{N_{\mathrm{avg}}},$$
where $P_{xx}(f)$ corresponds to the Power Spectral Density at frequency $f$, $N_{\mathrm{FFT}}$ is the number of FFT points, $\Delta$ is the interval between segments, $M$ is the total number of segments, $X_i(f)$ is the Fourier transform of the $i$-th segment, and $N_{\mathrm{avg}}$ is the number of averaged segments. This procedure provided a stable and reliable spectral profile, essential for the comparative analysis of cognitive states in video game players.
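Continuing the sketch above, a corresponding Welch estimate with scipy could look as follows; the segment length and overlap are assumptions, since the paper does not report its exact Welch parameters:

```python
# Welch PSD for the AF8 channel (index 2), then the average Gamma-band
# power on a log10 scale, matching the log10(P(f)) axis of Figures 4 and 5.
from scipy.signal import welch

f, pxx = welch(eeg_gamma[:, 2], fs=FS, nperseg=512, noverlap=256)
gamma_mask = (f >= 30) & (f <= 40)
gamma_power = np.log10(pxx[gamma_mask].mean())
```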

2.4. Neuroscientific Metrics

2.4.1. Magnitude-Squared Coherence

To evaluate functional connectivity between brain regions, the Magnitude-Squared Coherence (MSC) metric was applied, which quantifies the degree of spectral synchronization between two EEG signals. In this study, coherence was calculated from the signals recorded at the pair of frontal electrodes AF7–AF8 within the Gamma band, since this frequency range is associated with complex cognitive processes and focused attention states (Table 1). The complex coherence function is defined as the normalized cross-spectral density:
$$C_{xy}(f) = \frac{P_{xy}(f)}{\sqrt{P_{xx}(f)\,P_{yy}(f)}},$$
where $P_{xy}(f)$ corresponds to the cross-spectral density between signals $x(t)$ and $y(t)$, while $P_{xx}(f)$ and $P_{yy}(f)$ represent the autospectra of each signal. Specifically, the Magnitude-Squared Coherence (MSC) is obtained as
$$|C_{xy}(f)|^2 = \frac{|P_{xy}(f)|^2}{P_{xx}(f)\,P_{yy}(f)}.$$
This metric takes values in the range $0 \le |C_{xy}(f)|^2 \le 1$, where values close to 1 indicate a high level of spectral synchronization between signals. In the context of cognitive processes such as problem solving, MSC values derived from EEG provide insights into functional connectivity patterns that reflect the coordination between frontal brain regions during task execution. Such analysis is directly related to the study of the neuronal dynamics underlying concentration and sustained attention states.
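For reference, scipy exposes the magnitude-squared coherence directly; a minimal sketch for the AF7–AF8 pair, under the same assumed channel ordering and Welch parameters as above:

```python
# |C_xy(f)|^2 between AF7 (index 1) and AF8 (index 2), averaged over the
# 30-40 Hz Gamma band. scipy.signal.coherence computes
# |P_xy|^2 / (P_xx * P_yy) per frequency bin.
from scipy.signal import coherence

f_c, cxy2 = coherence(eeg_gamma[:, 1], eeg_gamma[:, 2], fs=FS, nperseg=512)
msc_gamma = cxy2[(f_c >= 30) & (f_c <= 40)].mean()
```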

2.4.2. Spectral Entropy

In addition to functional connectivity, the feature of Spectral Entropy (SpEn) was incorporated to characterize the complexity of the neural signal. This measure quantifies the degree of disorder or randomness in the distribution of energy across different frequencies, providing a sensitive indicator of changes in cognitive states. SpEn is calculated as
$$\mathrm{SpEn} = -\sum_{f} P(f) \log_2 P(f),$$
where $P(f)$ represents the normalized Power Spectral Density at frequency $f$. In the context of this study, the computation of SpEn at electrode AF8 within the Gamma band allowed us to identify variations in the complexity of brain activity associated with the level of attention and concentration of video game players, thereby complementing the information obtained through PSD and MSC.
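Continuing the Welch sketch, Spectral Entropy reduces to a Shannon entropy over the normalized PSD; a minimal implementation:

```python
# Spectral Entropy over the Gamma-band PSD bins of AF8: normalize the
# PSD so it sums to one, then apply Shannon entropy with log base 2.
def spectral_entropy(psd):
    p = psd / psd.sum()   # normalized power distribution P(f)
    p = p[p > 0]          # guard against log2(0)
    return -np.sum(p * np.log2(p))

spen_af8 = spectral_entropy(pxx[gamma_mask])
```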
By integrating PSD, MSC, and SpEn, this study establishes a robust feature set for distinguishing focused from non-focused cognitive states in video game players under VR conditions. Moreover, these features enable the construction of a physiological profile that serves as a basis for the objective detection of cognitive concentration.

2.5. Deep Neural Network Model

2.5.1. Signal Preprocessing

The features extracted from the EEG signals of each video game player—PSD, MSC, and SpEn—were used as the basis for a preprocessing pipeline aimed at training Deep Neural Network (DNN) models. The purpose of this procedure was to transform the neurophysiological descriptors into normalized and labeled representations that could later be efficiently interpreted by supervised classification models.
First, the average metrics were calculated per subject: Gamma power derived from PSD, MSC coherence between the frontal electrodes AF7–AF8, and SpEn. These metrics were then normalized using z-score standardization, and in the case of SpEn, a negative sign was applied to align with the expected physiological direction, i.e., higher values of Gamma power and MSC, together with lower values of SpEn, are associated with a higher level of concentration.
Once the feature matrix was constructed, an automatic labeling procedure based on Principal Component Analysis (PCA), described in Algorithm 1, was implemented. This method projected the features into a one-dimensional space, adjusting the orientation of the principal component to preserve the physiological relationship among the variables. The median value of the principal component was used as the separation threshold, assigning each subject the binary label focused (1) or not-focused (0). The binary labeling was based on the physiological profile described in Table 2.
Algorithm 1: Automatic Labeling of Focused State using PCA
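A minimal sketch of the labeling procedure described above, assuming the three per-subject metrics are available as arrays; the orientation rule (flipping PC1 when its loadings sum to a negative value) is an illustrative reading of the orientation-adjustment step, not the authors' exact criterion:

```python
# Sketch of Algorithm 1: z-score the features (SpEn sign-flipped so all
# three point in the "more focused" direction per Table 2), project onto
# the first principal component, orient it, and threshold at the median.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def auto_label(gamma_power, msc, spen):
    X = np.column_stack([gamma_power, msc, -np.asarray(spen)])
    Xz = StandardScaler().fit_transform(X)
    pca = PCA(n_components=1).fit(Xz)
    scores = pca.transform(Xz).ravel()
    if pca.components_[0].sum() < 0:  # keep loadings physiologically aligned
        scores = -scores
    return (scores > np.median(scores)).astype(int)  # 1 = focused, 0 = not
```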
In this way, the proposed EEG preprocessing pipeline enables the construction of a balanced, normalized, and labeled dataset, ready to be properly used in subsequent phases of training DNN models for the classification of the cognitive state of video game players.

2.5.2. Deep Learning Model

The proposed model consists of a Deep Neural Network (DNN) for the binary classification of the cognitive state focused/not-focused. The training data included the labels obtained from the proposed algorithm (Algorithm 1) together with the PSD of each video game player.
The training procedure is summarized in Algorithm 2. First, a stratified split is performed into training, validation, and test sets, preserving the class distribution. Next, a scaler is fitted on the training set and applied to the remaining partitions to prevent information leakage. On the normalized inputs, the DNN is trained by minimizing the Binary Cross-Entropy (BCE) loss using the Adam optimizer and a callback scheme that includes early stopping, checkpointing of the best model, and adaptive reduction of the learning rate in case of stagnation during validation. When class imbalance is present, weights inversely proportional to the class frequency are applied.    
Algorithm 2: DNN Training for Focused-State Classification
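A sketch of this training loop under the hyperparameters of Table 3; the split ratios, optimizer, and callback settings come from the paper, while the checkpoint file name and function signature are illustrative:

```python
# Sketch of Algorithm 2: stratified 70/15/15 split, leakage-free scaling,
# BCE loss with Adam, and the early-stopping/checkpoint/LR-reduction
# callbacks of Table 3. Class weights (used by the authors when classes
# are imbalanced) are omitted here for brevity.
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def train_dnn(model, X, y, seed=42):
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)

    scaler = StandardScaler().fit(X_tr)  # fit on training data only
    X_tr, X_val, X_te = map(scaler.transform, (X_tr, X_val, X_te))

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.BinaryAccuracy(),
                           tf.keras.metrics.Precision(),
                           tf.keras.metrics.Recall()])
    callbacks = [
        tf.keras.callbacks.EarlyStopping(patience=10,
                                         restore_best_weights=True),
        tf.keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5,
                                             min_lr=1e-6),
        tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                           monitor="val_loss",
                                           save_best_only=True),
    ]
    model.fit(X_tr, y_tr, validation_data=(X_val, y_val),
              epochs=100, batch_size=32, callbacks=callbacks, verbose=0)
    return model, scaler, (X_te, y_te)
```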
The objective function employed is BCE:
$$\mathcal{L}_{\mathrm{BCE}}(y, \hat{y}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right],$$
where $\hat{y}_i = \sigma(z_i)$ is the probability estimated by the network (sigmoid output) for the $i$-th example, $y$ are the binary labels obtained from the automatic labeling procedure (Algorithm 1), and $m$ is the batch size. During training, performance metrics such as accuracy, precision, and recall are monitored on the validation set to guide stopping criteria and hyperparameter selection. Finally, the weights corresponding to the best point (according to the minimum validation loss) are restored for evaluation on the test set and persisted along with the scaler, thus ensuring reproducibility in the inference stage.

2.5.3. DNN Architecture

The architecture employed is illustrated in Figure 3. The model receives as input a single feature corresponding to the average PSD value per subject. The network is composed of three fully connected layers with ReLU activations, each followed by dropout regularization and, in the first two blocks, batch normalization. The final output layer is a single neuron with sigmoid activation, producing the probability of belonging to the focused class. The detailed configuration of each layer, together with the number of trainable parameters, is summarized in Table 3 and Table 4. This compact architecture is designed to capture non-linear patterns in the PSD feature while preserving physiological interpretability: higher Gamma power values are expected to correspond to higher probabilities of being classified as focused. The combination of dropout, early stopping, and adaptive learning rate reduction provides a balance between representational capacity and generalization, effectively reducing the risk of overfitting in a low-dimensional input space.
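Under those specifications, the network can be written in Keras as follows; this mirrors the layer list of Table 4 and the dropout rates of Table 3, and model.summary() should report the same 42,497 trainable parameters:

```python
# Keras definition matching Tables 3 and 4: Dense(256)+BN+Dropout(0.30),
# Dense(128)+BN+Dropout(0.30), Dense(64)+Dropout(0.20), sigmoid output.
# The single input feature is the per-subject average Gamma PSD.
import tensorflow as tf
from tensorflow.keras import layers

def build_dnn(input_dim=1):
    return tf.keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, activation="relu",
                     kernel_initializer="glorot_uniform"),
        layers.BatchNormalization(),
        layers.Dropout(0.30),
        layers.Dense(128, activation="relu",
                     kernel_initializer="glorot_uniform"),
        layers.BatchNormalization(),
        layers.Dropout(0.30),
        layers.Dense(64, activation="relu",
                     kernel_initializer="glorot_uniform"),
        layers.Dropout(0.20),
        layers.Dense(1, activation="sigmoid"),
    ])
```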

2.5.4. Performance Metrics

To evaluate the performance of the DNN, the metrics precision, recall, and F1-score were employed, defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, respectively.
Precision, recall, and F1-score are employed because they provide a more informative evaluation than accuracy in binary classification problems where class imbalance or asymmetric error costs may exist. Precision quantifies the proportion of positive predictions that are correct, thus controlling the number of false positives (important to avoid labeling as focused those states that are not). Recall measures the ability to recover actual positives, thus controlling the number of false negatives (critical for not missing true attention episodes). The F1-score, as the harmonic mean between precision and recall, penalizes solutions that sacrifice one of the two metrics and provides a robust balance when both are relevant. Together, these metrics allow for a balanced assessment of classifier performance under the physiological profile defined, where both avoiding false alarms and not missing true cases are equally critical.
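Assuming the model, scaler, and test split come from the train_dnn sketch above (e.g., model, scaler, (X_te, y_te) = train_dnn(build_dnn(), X, y)), these metrics can be obtained with sklearn's classification_report, which is also the layout of Tables 5 and 7:

```python
# Evaluate the restored best model on the held-out test split.
from sklearn.metrics import classification_report

y_prob = model.predict(X_te).ravel()
y_pred = (y_prob >= 0.5).astype(int)
print(classification_report(y_te, y_pred,
                            target_names=["Not Focused", "Focused"]))
```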

3. Results

For the preprocessing of EEG signals and the training and execution of the DNN, the Python 3.11 libraries scipy, numpy, pandas, sklearn, and tensorflow were used. Specifically, the class labeling stage (focused and not-focused) employed the PCA implementation from sklearn. The DNN model was implemented using keras.Model.

3.1. Experiment One

As a first experiment, we present the results obtained from the labeling of EEG data using the proposed Algorithm 1. The data, preprocessed under a gaming scenario with audio, were analyzed considering EEG coherence between the AF7–AF8 electrode pair in the Gamma band and PSD and SpEn extracted from AF8 in the same Gamma band.
Figure 4 shows the distribution of the labeled data, where a clear separation between both classes can be observed, validating the effectiveness of the proposed approach. The focused class (blue diamonds) clusters in regions with high MSC values (0.7–0.9) and low SpEn values (6.6–6.7), while the not-focused class (black dots) exhibits reduced MSC (0.4–0.6) and higher SpEn (6.8–7.0). In addition, the PCA axis associated with $\log_{10}(P(f))$ reflects a significant difference between groups, ranging from −2.65 to −2.70 for focused vs. −2.80 to −2.85 for not focused.
Additionally, the test was replicated with EEG data obtained under the gaming scenario without audio. Figure 5 reveals a clear differentiation between cognitive states, showing PSD values between −2.60 and −2.85. Focused subjects present higher neural coherence (MSC: 0.7–0.9) and concentrate in the upper range of $\log_{10}(P(f))$ (−2.60 to −2.70), while not-focused subjects show lower synchronization (MSC: 0.3–0.6) and reduced PSD values (−2.75 to −2.85). The marked cluster separation suggests that both MSC and PSD are robust parameters for assessing concentration in VR environments.

3.2. Experiment Two

In a second experiment, the preprocessed EEG data obtained under the gaming environment with audio were used. One hundred training runs of the DNN model described in Figure 3 were performed based on the proposed Algorithm 2. Appendix A.1 shows the results of the top 10 DNN models in terms of accuracy.
For the DNN model, the input vector X corresponds to the PSD values extracted from the EEG data in the Gamma frequency band. The hyperparameters used in the DNN model are shown in Table 3. Table 5 reports the performance of the best-trained DNN model, which achieved 97% accuracy. The classification performance of the selected DNN model for the focused/not-focused classes is shown in Figure 6.
Table 6 compares the accuracy of the proposed model under audio conditions with models reported in the literature, showing that it significantly outperforms all alternatives. This remarkable performance suggests that the innovations introduced, particularly in the proposed preprocessing of the spatial and temporal features of EEG signals, enhance the accuracy of the DNN model. The proposed model demonstrated an advantage over established architectures such as DNN (78.94% ± 12.40) and DGCNN (75.87% ± 18.33), evidencing that the EEG data preprocessing stage is adequate for training the DNN. Notably, it even surpasses the optimized version of DGCNN reported in the literature [3] (76.60% ± 11.83), reinforcing the importance of the proposed auto-labeling algorithm. Among the comparative models, it is observed that deep architectures (DNN and DGCNN) consistently outperform traditional methods such as SVM (71.96% ± 15.02) and LR (67.09% ± 15.92), confirming their ability to capture complex patterns in neural signals. The reduced variance of DNN (σ = 12.40) compared with the baseline DGCNN (σ = 18.33) suggests greater stability; the proposed model goes further, pairing the highest accuracy with an even lower variance (σ = 9.24) in the classification of focused/not-focused cognitive states in gamers.

3.3. Experiment Three

In the third experiment, the proposed model was evaluated under no-audio conditions during gaming sessions, following the same EEG signal preprocessing protocol described previously. Similarly, one hundred training runs of the DNN model shown in Figure 3 were performed using Algorithm 2 for parameter optimization. The objective was to determine the impact of the absence of auditory stimuli on the model's ability to discriminate between the focused and not-focused cognitive states.
In this case, the input vector X corresponds to PSD features extracted in the Gamma band, processed according to the same temporal and spatial scheme as in the previous experiment. However, the absence of audio implies reduced multisensory stimulation, which potentially lowers the discriminability of the neural patterns associated with each class.
Table 7 presents the classification metrics for the best-trained model without audio, achieving an accuracy of 83%. Figure 7 shows the corresponding confusion matrix, where a slight decrease in precision and recall is observed compared with the audio scenario, particularly for the focused class. This suggests that auditory stimuli contribute to greater consistency in discriminative EEG patterns. This result confirms that, although the proposed model maintains a competitive and stable performance without audio, there is a notable 14-percentage-point difference in accuracy compared with the audio scenario (Table 6). This drop can be attributed to the reduction in cognitive load induced by multisensory stimulation, which diminishes the intensity and consistency of the recorded neural responses. Furthermore, the proposed model without audio still outperforms baseline implementations such as DNN (78.94%) and DGCNN (75.87%) reported in the literature, showing that the preprocessing and optimized architecture remain robust under less stimulating conditions. Nevertheless, the results suggest that the inclusion of audio can significantly enhance the ability to discriminate cognitive states.

3.4. Experiment Four

In this experiment, the model’s ability to generalize across participants was evaluated under two experimental conditions: with and without background auditory stimulation. Two cross-subject validation schemes were applied: Group K-Fold (K = 5) and Leave-One-Subject-Out (LOSO). Both procedures were performed using the same spectral, coherence, and entropy features, as well as the same automatic labeling method based on PCA.
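A sketch of the two schemes with sklearn's group-aware splitters, reusing build_dnn from the architecture sketch above; the per-epoch feature matrix, the subject-ID array, and the training settings inside the loop are illustrative assumptions:

```python
# Cross-subject validation: each participant's rows share one group ID,
# so no subject appears in both the training and test folds.
import numpy as np
from sklearn.model_selection import GroupKFold, LeaveOneGroupOut
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

def cross_subject_accuracy(X, y, subjects, cv):
    accs = []
    for tr, te in cv.split(X, y, groups=subjects):
        scaler = StandardScaler().fit(X[tr])
        model = build_dnn(X.shape[1])
        model.compile(optimizer="adam", loss="binary_crossentropy")
        model.fit(scaler.transform(X[tr]), y[tr],
                  epochs=100, batch_size=32, verbose=0)
        y_hat = (model.predict(scaler.transform(X[te])).ravel()
                 >= 0.5).astype(int)
        accs.append(accuracy_score(y[te], y_hat))
    return np.mean(accs), np.std(accs)

# Group K-Fold (K = 5) and LOSO, as in Tables 8 and 9:
# acc_gkf  = cross_subject_accuracy(X, y, subjects, GroupKFold(n_splits=5))
# acc_loso = cross_subject_accuracy(X, y, subjects, LeaveOneGroupOut())
```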
The results are shown in Table 8 and Table 9. Under the audio condition, the model achieved an average accuracy of 0.67 ± 0.20 and a ROC-AUC of 0.79 ± 0.20 in the Group K-Fold scheme, indicating a moderate and consistent classification performance between classes. In the LOSO validation, the average accuracy was 0.63 ± 0.19, with balanced precision, recall, and F1 values around 0.63. Although the AUC is not reported for this case because some participants exhibited only one class in the test set, the results indicate that the model maintains adequate generalization across unseen subjects.
In the condition without audio, the model's performance decreased compared with the audio scenario. In the Group K-Fold scheme, the average accuracy was 0.50 ± 0.12 and the AUC was 0.49 ± 0.24, values close to the random level, suggesting a reduced discriminative ability between classes. In the LOSO validation, the average accuracy was 0.53 ± 0.51, with precision, recall, and F1 metrics around 0.23, indicating high inter-subject variability. These results suggest that, in the absence of auditory stimulation, the extracted features are not sufficient to achieve a consistent classification of the attentional state.

4. Discussion

The results obtained across the four experiments confirm the effectiveness of the proposed model for classifying attentional states from EEG signals during gameplay. The automatic labeling algorithm based on PCA enabled a clear separation between the focused and not focused classes, highlighting the relevance of the spectral, coherence, and entropy features used in this study.
Under the audio condition, the DNN model achieved an accuracy of 97%, clearly outperforming traditional approaches and demonstrating the positive influence of auditory stimulation on the stability of EEG patterns. In the no-audio condition, the model maintained a competitive performance (83%), although a noticeable decrease was observed, which can be attributed to the reduction in multisensory stimulation and, consequently, a decrease in the discriminability of neural activity.
The cross-subject validation tests revealed that the model exhibits moderate generalization across participants. In the audio condition, accuracy and AUC reached 0.67 ± 0.20 and 0.79 ± 0.20, respectively, indicating consistent discrimination between classes. In contrast, under the no-audio condition, performance dropped to near-chance levels (0.50 ± 0.12), suggesting greater inter-subject variability and the reduced stability of EEG features in the absence of auditory input.
Overall, the findings indicate that auditory stimulation enhances the separability and stability of EEG signals associated with attentional states. Based on these results, it can be concluded that the training of future neural network models should be preferentially conducted using data collected in gaming scenarios that include background audio, as this condition promotes more stable and representative neural patterns. Nevertheless, this aspect remains open for future research, where adaptive architectures and normalization strategies could be explored to further improve cross-subject generalization and model robustness.

5. Conclusions

This paper presented a deep learning-based framework for detecting cognitive concentration states in virtual reality (VR) gaming environments using electroencephalography (EEG) signals. The proposed method integrates an optimized preprocessing stage, automatic labeling via Principal Component Analysis (PCA), and a Deep Neural Network (DNN) classifier. Experimental validation with 30 participants demonstrated that the model achieved an accuracy of 97% under auditory feedback conditions and 83% without audio, confirming the positive effect of auditory stimuli on neural synchronization and classification performance. The proposed approach significantly outperformed traditional methods such as SVM and LR, as well as deep architectures such as DGCNN reported in the literature, demonstrating the robustness and efficiency of the designed system.
The results indicate that the proposed EEG–DNN pipeline provides a reliable foundation for real-time concentration monitoring in serious VR games. Future research will focus on expanding the dataset, extending the model to multi-class classification, and evaluating its deployment in neuroadaptive systems for cognitive assessment and human–computer interaction.

Author Contributions

Conceptualization, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., and L.B.A.; methodology, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., J.-C.L.-P., and L.B.A.; software, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., J.-C.L.-P., and L.B.A.; validation, F.R.C.-S., J.-C.L.-P., and R.F.I.-H.; formal analysis, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., J.-C.L.-P., and L.B.A.; investigation, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., J.-C.L.-P., L.B.A., F.R.C.-S., and R.F.I.-H.; resources, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., L.B.A., F.R.C.-S., and R.F.I.-H.; writing, original draft preparation, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., L.B.A., F.R.C.-S., and R.F.I.-H.; writing, review and editing, J.G.-B., J.A.D.-P.-F., C.D.-V.-S., L.B.A., F.R.C.-S., J.-C.L.-P., and R.F.I.-H.; supervision, J.G.-B., J.A.D.-P.-F., J.-C.L.-P., and C.D.-V.-S.; project administration, J.G.-B., J.A.D.-P.-F., and C.D.-V.-S.; funding acquisition, J.G.-B. All authors read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted under the Integrity Code of the Universidad Panamericana, validated by the Social Affairs Committee and approved by the Governing Council through resolution CR 98-22, on 15 November 2022.

Data Availability Statement

The code used to run the simulations presented in this study is openly available through the following GitHub repository: https://github.com/Alberto-Del-Puerto/DNN-Based-Classification-EEG.git (accessed on 27 October 2025). The simulations were performed using version 1.0 of the software.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

Table A1. Top 10 runs of the DNN model (audio scenario), ordered by execution number.

| Run | Class | Precision | Recall | F1-Score | Support |
|-----|--------------|-----------|--------|----------|---------|
| 10 | Not Focused | 0.82 | 0.93 | 0.88 | 15 |
|    | Focused | 0.92 | 0.80 | 0.86 | 15 |
|    | Accuracy | | | 0.87 | 30 |
|    | Macro avg | 0.87 | 0.87 | 0.87 | 30 |
|    | Weighted avg | 0.87 | 0.87 | 0.87 | 30 |
| 13 | Not Focused | 0.82 | 0.93 | 0.88 | 15 |
|    | Focused | 0.92 | 0.80 | 0.86 | 15 |
|    | Accuracy | | | 0.87 | 30 |
|    | Macro avg | 0.87 | 0.87 | 0.87 | 30 |
|    | Weighted avg | 0.87 | 0.87 | 0.87 | 30 |
| 17 | Not Focused | 0.82 | 0.93 | 0.88 | 15 |
|    | Focused | 0.92 | 0.80 | 0.86 | 15 |
|    | Accuracy | | | 0.87 | 30 |
|    | Macro avg | 0.87 | 0.87 | 0.87 | 30 |
|    | Weighted avg | 0.87 | 0.87 | 0.87 | 30 |
| 18 | Not Focused | 1.00 | 0.93 | 0.97 | 15 |
|    | Focused | 0.94 | 1.00 | 0.97 | 15 |
|    | Accuracy | | | 0.97 | 30 |
|    | Macro avg | 0.97 | 0.97 | 0.97 | 30 |
|    | Weighted avg | 0.97 | 0.97 | 0.97 | 30 |
| 25 | Not Focused | 0.67 | 0.93 | 0.78 | 15 |
|    | Focused | 0.89 | 0.53 | 0.67 | 15 |
|    | Accuracy | | | 0.73 | 30 |
|    | Macro avg | 0.78 | 0.73 | 0.72 | 30 |
|    | Weighted avg | 0.78 | 0.73 | 0.72 | 30 |
| 34 | Not Focused | 0.79 | 0.73 | 0.76 | 15 |
|    | Focused | 0.75 | 0.80 | 0.77 | 15 |
|    | Accuracy | | | 0.77 | 30 |
|    | Macro avg | 0.77 | 0.77 | 0.77 | 30 |
|    | Weighted avg | 0.77 | 0.77 | 0.77 | 30 |
| 35 | Not Focused | 0.90 | 0.60 | 0.72 | 15 |
|    | Focused | 0.70 | 0.93 | 0.80 | 15 |
|    | Accuracy | | | 0.77 | 30 |
|    | Macro avg | 0.80 | 0.77 | 0.76 | 30 |
|    | Weighted avg | 0.80 | 0.77 | 0.76 | 30 |
| 39 | Not Focused | 0.82 | 0.93 | 0.88 | 15 |
|    | Focused | 0.92 | 0.80 | 0.86 | 15 |
|    | Accuracy | | | 0.87 | 30 |
|    | Macro avg | 0.87 | 0.87 | 0.87 | 30 |
|    | Weighted avg | 0.87 | 0.87 | 0.87 | 30 |
| 48 | Not Focused | 0.70 | 0.93 | 0.80 | 15 |
|    | Focused | 0.90 | 0.60 | 0.72 | 15 |
|    | Accuracy | | | 0.77 | 30 |
|    | Macro avg | 0.80 | 0.77 | 0.76 | 30 |
|    | Weighted avg | 0.80 | 0.77 | 0.76 | 30 |
| 49 | Not Focused | 0.88 | 0.93 | 0.90 | 15 |
|    | Focused | 0.93 | 0.87 | 0.90 | 15 |
|    | Accuracy | | | 0.90 | 30 |
|    | Macro avg | 0.90 | 0.90 | 0.90 | 30 |
|    | Weighted avg | 0.90 | 0.90 | 0.90 | 30 |
Appendix A.2

Table A2. Top 10 runs of the DNN model (no-audio scenario).

| Run | Class | Precision | Recall | F1-Score | Support |
|-----|--------------|-----------|--------|----------|---------|
| 11 | Not Focused | 0.56 | 1.00 | 0.71 | 15 |
|    | Focused | 1.00 | 0.20 | 0.33 | 15 |
|    | Accuracy | | | 0.60 | 30 |
|    | Macro avg | 0.78 | 0.60 | 0.52 | 30 |
|    | Weighted avg | 0.78 | 0.60 | 0.52 | 30 |
| 22 | Not Focused | 0.65 | 0.87 | 0.74 | 15 |
|    | Focused | 0.80 | 0.53 | 0.64 | 15 |
|    | Accuracy | | | 0.70 | 30 |
|    | Macro avg | 0.73 | 0.70 | 0.69 | 30 |
|    | Weighted avg | 0.72 | 0.70 | 0.69 | 30 |
| 23 | Not Focused | 0.59 | 0.87 | 0.70 | 15 |
|    | Focused | 0.75 | 0.40 | 0.52 | 15 |
|    | Accuracy | | | 0.63 | 30 |
|    | Macro avg | 0.67 | 0.63 | 0.61 | 30 |
|    | Weighted avg | 0.67 | 0.63 | 0.61 | 30 |
| 46 | Not Focused | 0.80 | 0.53 | 0.64 | 15 |
|    | Focused | 0.65 | 0.87 | 0.74 | 15 |
|    | Accuracy | | | 0.70 | 30 |
|    | Macro avg | 0.73 | 0.70 | 0.69 | 30 |
|    | Weighted avg | 0.72 | 0.70 | 0.69 | 30 |
| 48 | Not Focused | 0.56 | 1.00 | 0.71 | 15 |
|    | Focused | 1.00 | 0.20 | 0.33 | 15 |
|    | Accuracy | | | 0.60 | 30 |
|    | Macro avg | 0.78 | 0.60 | 0.52 | 30 |
|    | Weighted avg | 0.78 | 0.60 | 0.52 | 30 |
| 63 | Not Focused | 0.70 | 0.47 | 0.56 | 15 |
|    | Focused | 0.60 | 0.80 | 0.69 | 15 |
|    | Accuracy | | | 0.63 | 30 |
|    | Macro avg | 0.65 | 0.63 | 0.62 | 30 |
|    | Weighted avg | 0.65 | 0.63 | 0.62 | 30 |
| 72 | Not Focused | 0.81 | 0.87 | 0.84 | 15 |
|    | Focused | 0.86 | 0.80 | 0.83 | 15 |
|    | Accuracy | | | 0.83 | 30 |
|    | Macro avg | 0.83 | 0.83 | 0.83 | 30 |
|    | Weighted avg | 0.83 | 0.83 | 0.83 | 30 |
| 81 | Not Focused | 0.50 | 1.00 | 0.67 | 15 |
|    | Focused | 0.00 | 0.00 | 0.00 | 15 |
|    | Accuracy | | | 0.50 | 30 |
|    | Macro avg | 0.25 | 0.50 | 0.33 | 30 |
|    | Weighted avg | 0.25 | 0.50 | 0.33 | 30 |
| 94 | Not Focused | 0.00 | 0.00 | 0.00 | 15 |
|    | Focused | 0.50 | 1.00 | 0.67 | 15 |
|    | Accuracy | | | 0.50 | 30 |
|    | Macro avg | 0.25 | 0.50 | 0.33 | 30 |
|    | Weighted avg | 0.25 | 0.50 | 0.33 | 30 |
| 96 | Not Focused | 0.50 | 0.07 | 0.12 | 15 |
|    | Focused | 0.50 | 0.93 | 0.65 | 15 |
|    | Accuracy | | | 0.50 | 30 |
|    | Macro avg | 0.50 | 0.50 | 0.38 | 30 |
|    | Weighted avg | 0.50 | 0.50 | 0.38 | 30 |

References

  1. GomezRomero-Borquez, J.; Del-Valle-Soto, C.; Del-Puerto-Flores, J.A.; Briseño, R.A.; Varela-Aldás, J. Neurogaming in Virtual Reality: A Review of Video Game Genres and Cognitive Impact. Electronics 2024, 13, 1683. [Google Scholar] [CrossRef]
  2. Chuah, Y.K.; Toa, C.K.; Goh, S.K.; Chan, C.K.; Takada, H. Emotion Classification Through EEG Signals: The EmoSense Model. In Proceedings of the 2024 International Conference on Computing Innovation, Intelligence, Technologies and Education, Sepang, Selangor, Malaysia, 5–7 September 2024; pp. 1–6. [Google Scholar] [CrossRef]
  3. Fernandes, J.V.M.R.; Alexandria, A.R.d.; Marques, J.A.L.; Assis, D.F.d.; Motta, P.C.; Silva, B.R.D.S. Emotion Detection from EEG Signals Using Machine Deep Learning Models. Bioengineering 2024, 11, 782. [Google Scholar] [CrossRef] [PubMed]
  4. Attallah, O.; Mamdouh, M.; Al-Kabbany, A. Cross-Context Stress Detection: Evaluating Machine Learning Models on Heterogeneous Stress Scenarios Using EEG Signals. AI 2025, 6, 79. [Google Scholar] [CrossRef]
  5. Li, G.; Khan, M.A. Deep Learning on VR-Induced Attention. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence and Virtual Reality, San Diego, CA, USA, 9–11 December 2019; pp. 163–1633. [Google Scholar] [CrossRef]
  6. Bello, K.; Aqlan, F.; Harrington, W. Extended reality for neurocognitive assessment: A systematic review. J. Psychiatr. Res. 2025, 184, 473–487. [Google Scholar] [CrossRef]
  7. Bhargava, Y.; Kottapalli, A.; Baths, V. Validation and comparison of virtual reality and 3D mobile games for cognitive assessment against ACE-III in 82 young participants. Sci. Rep. 2024, 14, 23918. [Google Scholar] [CrossRef]
  8. Park, J.H.; Jeon, H.S.; Kim, J.H. Effectiveness of non-immersive virtual reality exercises for balance and gait improvement in older adults: A meta-analysis. Technol. Health Care 2024, 32, 1223–1238. [Google Scholar] [CrossRef]
  9. Ribeiro, N.; Tavares, P.; Ferreira, C.; Coelho, A. Melanoma prevention using an augmented reality-based serious game. Patient Educ. Couns. 2024, 123, 108226. [Google Scholar] [CrossRef] [PubMed]
  10. You, D.; Ramli, S.H.b.; Ibrahim, R.; Chen, L.; Lin, Y.; Zhang, M. A thematic review on therapeutic toys and games for the elderly with Alzheimer’s disease. Disabil. Rehabil. Assist. Technol. 2024, 20, 1–13. [Google Scholar] [CrossRef]
  11. Sridhar, S.; Romney, A.; Manian, V. A Deep Neural Network for Working Memory Load Prediction from EEG Ensemble Empirical Mode Decomposition. Information 2023, 14, 473. [Google Scholar] [CrossRef]
  12. Datta, P.; Kaur, A.; Sassi, N.; Singh, R.; Kaur, M. An evaluation of intelligent and immersive digital applications in eliciting cognitive states in humans through the utilization of Emotiv Insight. MethodsX 2024, 12, 102748. [Google Scholar] [CrossRef] [PubMed]
  13. Shah, S.M.A.; Usman, S.M.; Khalid, S.; Rehman, I.U.; Anwar, A.; Hussain, S.; Ullah, S.S.; Elmannai, H.; Algarni, A.D.; Manzoor, W. An Ensemble Model for Consumer Emotion Prediction Using EEG Signals for Neuromarketing Applications. Sensors 2022, 22, 9744. [Google Scholar] [CrossRef]
  14. GomezRomero-Borquez, J.; Puerto-Flores, J.A.D.; Del-Valle-Soto, C. Mapping EEG Alpha Activity: Assessing Concentration Levels during Player Experience in Virtual Reality Video Games. Future Internet 2023, 15, 264. [Google Scholar] [CrossRef]
  15. Gireesh, E.D.; Gurupur, V.P. Information Entropy Measures for Evaluation of Reliability of Deep Neural Network Results. Entropy 2023, 25, 573. [Google Scholar] [CrossRef] [PubMed]
  16. Panachakel, J.T.; Ganesan, R.A. Decoding Imagined Speech from EEG Using Transfer Learning. IEEE Access 2021, 9, 135371–135383. [Google Scholar] [CrossRef]
  17. Mazher, M.; Abd Aziz, A.; Malik, A.S.; Ullah Amin, H. An EEG-Based Cognitive Load Assessment in Multimedia Learning Using Feature Extraction and Partial Directed Coherence. IEEE Access 2017, 5, 14819–14829. [Google Scholar] [CrossRef]
  18. Sun, J.; Hong, X.; Tong, S. Phase Synchronization Analysis of EEG Signals: An Evaluation Based on Surrogate Tests. IEEE Trans. Biomed. Eng. 2012, 59, 2254–2263. [Google Scholar] [CrossRef]
  19. Hu, S.; Stead, M.; Dai, Q.; Worrell, G.A. On the Recording Reference Contribution to EEG Correlation, Phase Synchrony, and Coherence. IEEE Trans. Syst. Man Cybern. Part B 2010, 40, 1294–1304. [Google Scholar] [CrossRef]
  20. GomezRomero-Borquez, J.; Del-Valle-Soto, C.; Del-Puerto-Flores, J.A.; Castillo-Soria, F.R.; Maciel-Barboza, F.M. Implications for Serious Game Design: Quantification of Cognitive Stimulation in Virtual Reality Puzzle Games through MSC and SpEn EEG Analysis. Electronics 2024, 13, 2017. [Google Scholar] [CrossRef]
  21. Ruiz-Gómez, S.J.; Gómez, C.; Poza, J.; Gutiérrez-Tobal, G.C.; Tola-Arribas, M.A.; Cano, M.; Hornero, R. Automated Multiclass Classification of Spontaneous EEG Activity in Alzheimer’s Disease and Mild Cognitive Impairment. Entropy 2018, 20, 35. [Google Scholar] [CrossRef]
  22. Wang, K.; Tian, F.; Xu, M.; Zhang, S.; Xu, L.; Ming, D. Resting-State EEG in Alpha Rhythm May Be Indicative of the Performance of Motor Imagery-Based Brain–Computer Interface. Entropy 2022, 24, 1556. [Google Scholar] [CrossRef]
  23. Zarjam, P.; Epps, J.; Chen, F. Spectral EEG features for evaluating cognitive load. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 3841–3844. [Google Scholar] [CrossRef]
  24. Vulpe, A.; Zamfirache, M.; Caranica, A. Analysis of Spectral Entropy and Maximum Power of EEG as Authentication Mechanisms. In Proceedings of the 2023 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania, 25–27 October 2023; pp. 111–115. [Google Scholar] [CrossRef]
  25. Padma Shri, T.K.; Sriraam, N. EEG based detection of alcoholics using spectral entropy with neural network classifiers. In Proceedings of the 2012 International Conference on Biomedical Engineering (ICoBE), Penang, Malaysia, 27–28 February 2012; pp. 89–93. [Google Scholar] [CrossRef]
  26. Rusnac, A.L.; Grigore, O. Intelligent Seizure Prediction System Based on Spectral Entropy. In Proceedings of the 2019 International Symposium on Signals, Circuits and Systems (ISSCS), Iasi, Romania, 11–12 July 2019; pp. 1–4. [Google Scholar] [CrossRef]
  27. Xi, X.; Ding, J.; Wang, J.; Zhao, Y.B.; Wang, T.; Kong, W.; Li, J. Analysis of Functional Corticomuscular Coupling Based on Multiscale Transfer Spectral Entropy. IEEE J. Biomed. Health Inform. 2022, 26, 5085–5096. [Google Scholar] [CrossRef] [PubMed]
  28. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying Stable Patterns over Time for Emotion Recognition from EEG. IEEE Trans. Affect. Comput. 2019, 10, 417–429. [Google Scholar] [CrossRef]
  29. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks. IEEE Trans. Affect. Comput. 2020, 11, 532–541. [Google Scholar] [CrossRef]
  30. Meta. Cubism. 2020. Available online: https://www.meta.com/experiences/cubism/2264524423619421 (accessed on 14 July 2025).
  31. Meta. Puzzling Places. 2021. Available online: https://www.meta.com/experiences/3931148300302917 (accessed on 14 July 2025).
  32. Meta. Tetris. 2020. Available online: https://www.meta.com/experiences/tetris-effect-connected/3386618894743567 (accessed on 14 July 2025).
  33. STEAM. 9 Years of Shadows. 2024. Available online: https://store.steampowered.com/app/1402120/9_Years_of_Shadows/ (accessed on 14 July 2025).
Figure 1. Standardized protocol for EEG data acquisition in VR gameplay sessions.
Figure 2. Experimental setup showing a participant using the MUSE 2 EEG headset (4 channels: TP9, AF7, AF8, TP10) together with the Meta Quest 2 VR system under two test conditions: with audio and without audio.
Figure 3. Architecture of the proposed Deep Neural Network (DNN) for focused state classification.
Figure 4. Labeled EEG data distribution with audio.
Figure 5. Labeled EEG data distribution without audio.
Figure 6. Confusion matrix of the best DNN model with audio.
Figure 7. Confusion matrix of the DNN model without audio.
Table 1. Frequency bands of brain oscillations and their functional associations.

| Band | Frequency Range (Hz) | Functional Description |
|-------|------|------------------------------------------------------------|
| Theta | 4–8 | Associated with working memory, cognitive control, and planning; often linked to drowsiness and early stages of sleep. |
| Alpha | 8–12 | Characterized by relaxed wakefulness; typically decreases during states of focused attention or cognitive engagement. |
| Beta | 12–30 | Related to active concentration, alertness, and higher-order cognitive processes; also involved in motor control. |
| Gamma | 30–40 | Associated with higher-level cognitive functions, including feature binding, attentional control, and information integration across brain regions. |
Table 2. Physiological profile used for automatic labeling.

| Label | PSD | MSC | SpEn |
|-----------------|------|------|------|
| Focused (1) | High | High | Low |
| Not Focused (0) | Low | Low | High |
Table 3. Hyperparameters and training setup.

| Item | Value |
|----------------------|--------------------------------------------------------------|
| Architecture | Dense–ReLU–BN–Dropout × 3 hidden layers; Sigmoid output |
| Dropout rates | [0.30, 0.30, 0.20] |
| Batch size | 32 |
| Epochs (max) | 100 |
| Optimizer | Adam (learning_rate = 1 × 10⁻³) |
| LR schedule | ReduceLROnPlateau (factor = 0.5, patience = 5, min_lr = 1 × 10⁻⁶) |
| Early stopping | patience = 10, restore_best_weights = True |
| Checkpoint | save_best_only = True (monitor = val_loss) |
| Weight init | GlorotUniform |
| Train/Val/Test split | 70%/15%/15% |
| Random seeds | numpy = 42, tensorflow = 42 |
Table 4. Layers and number of parameters in the proposed DNN.

| Layer | Number of Parameters |
|------------------------------|--------------------------------------|
| Dense (1, 256, bias = True) | (1 + 1) × 256 = 512 |
| BatchNormalization (256) | 2 × 256 = 512 |
| Dense (256, 128, bias = True) | (256 + 1) × 128 = 32,896 |
| BatchNormalization (128) | 2 × 128 = 256 |
| Dense (128, 64, bias = True) | (128 + 1) × 64 = 8256 |
| Dense (64, 1, bias = True) | (64 + 1) × 1 = 65 |
| Sigmoid | 0 |
| Total | 42,497 parameters ≈ 0.16 MB (float32) |
Table 5. Performance metrics of the best DNN model with audio scenario.

| Class | Precision | Recall | F1-Score | Support |
|---------------|-----------|--------|----------|---------|
| Not Focused | 1.00 | 0.93 | 0.97 | 15 |
| Focused | 0.94 | 1.00 | 0.97 | 15 |
| Accuracy | | | 0.97 | 30 |
| Macro Avg. | 0.97 | 0.97 | 0.97 | 30 |
| Weighted Avg. | 0.97 | 0.97 | 0.97 | 30 |
Table 6. Accuracy comparison between the proposed model and reported approaches.

| Model | Accuracy % / σ |
|-------------------------------|----------------|
| LR | 67.09 / 15.92 |
| KNN | 61.56 / 20.8 |
| SVM | 71.96 / 15.02 |
| SVM [28] | 71.24 / 16.38 |
| DNN | 78.94 / 12.40 |
| DBN [28] | 63.42 / 19.22 |
| DGCNN | 75.87 / 18.33 |
| DGCNN [29] | 76.60 / 11.83 |
| Proposed model with audio | 97 / 9.24 |
| Proposed model without audio | 83 / 11.02 |
Table 7. Performance metrics of the DNN model under the no-audio scenario.

| Class | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| Not Focused | 0.81 | 0.87 | 0.84 | 15 |
| Focused | 0.86 | 0.80 | 0.83 | 15 |
| Accuracy | | | 0.83 | 30 |
| Macro Avg | 0.83 | 0.83 | 0.83 | 30 |
| Weighted Avg | 0.83 | 0.83 | 0.83 | 30 |
Table 8. Cross-subject validation results under the audio condition using Group K-Fold (K = 5) and Leave-One-Subject-Out (LOSO) schemes.

| Validation Strategy | Accuracy (±σ) | Precision (±σ) | Recall (±σ) | F1-Score (±σ) | ROC-AUC (±σ) |
|----------------------|---------------|----------------|--------------|----------------|---------------|
| Group K-Fold (K = 5) | 0.67 ± 0.20 | 0.67 ± 0.20 | 0.60 ± 0.25 | 0.63 ± 0.23 | 0.79 ± 0.20 |
| LOSO (N = 30) | 0.63 ± 0.19 | 0.63 ± 0.07 | 0.63 ± 0.27 | 0.63 ± 0.17 | — |
Table 9. Cross-subject validation results under the without audio condition using Group K-Fold (K = 5) and Leave-One-Subject-Out (LOSO) schemes.

| Validation Strategy | Accuracy (±σ) | Precision (±σ) | Recall (±σ) | F1-Score (±σ) | ROC-AUC (±σ) |
|----------------------|---------------|----------------|--------------|----------------|---------------|
| Group K-Fold (K = 5) | 0.50 ± 0.12 | 0.35 ± 0.33 | 0.35 ± 0.34 | 0.35 ± 0.32 | 0.49 ± 0.24 |
| LOSO (N = 30) | 0.53 ± 0.51 | 0.23 ± 0.43 | 0.23 ± 0.43 | 0.23 ± 0.43 | — |
