Article
Peer-Review Record

Computational Analysis of EEG Responses to Anxiogenic Stimuli Using Machine Learning Algorithms

Appl. Sci. 2026, 16(3), 1504; https://doi.org/10.3390/app16031504
by Felix-Constantin Adochiei 1,2,3, Anamaria Ioniță 1, Ioana-Raluca Adochiei 2,3,4,*, Oana-Isabela Stirbu 1, Gladiola Petroiu 5 and Florin Ciprian Argatu 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 October 2025 / Revised: 11 December 2025 / Accepted: 29 January 2026 / Published: 2 February 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General Overview

The manuscript entitled “Computational Analysis of EEG Responses to Anxiogenic Stimuli Using Machine Learning Algorithms” presents a computational framework for classifying anxiety levels (non-anxious, moderate, severe) from EEG signals using machine learning techniques.

EEG data were collected from 16 participants using the Unicorn Hybrid Black device and supplemented with the public DASPS dataset. After preprocessing steps—band-pass filtering, artifact removal through ICA, and segmentation—various spectral and temporal features were extracted, including power spectral density across standard frequency bands, spectral entropy, Hjorth parameters, and wavelet coefficients.
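For reference, a preprocessing chain of this kind (band-pass filtering, ICA, fixed-length segmentation) can be sketched with MNE-Python. The snippet below runs on synthetic data; the toolchain, sampling rate, and filter cutoffs are assumptions for illustration, since the manuscript's exact settings are not reproduced here.

```python
import numpy as np
import mne

sfreq, n_ch = 250.0, 8                         # assumed rate; 8 channels as on the Unicorn
info = mne.create_info([f"EEG{i}" for i in range(n_ch)], sfreq, ch_types="eeg")
data = np.random.default_rng(0).normal(size=(n_ch, int(sfreq * 60))) * 1e-5
raw = mne.io.RawArray(data, info)

raw.filter(l_freq=1.0, h_freq=40.0)            # band-pass filtering
ica = mne.preprocessing.ICA(n_components=6, random_state=0)
ica.fit(raw)                                   # artifact components would be excluded here
epochs = mne.make_fixed_length_epochs(raw, duration=5.0, preload=True)  # segmentation
```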

Three classifiers were implemented: logistic regression (LR), multilayer perceptron (MLP), and k-nearest neighbors (KNN). Data imbalance was addressed using the SMOTE technique, and model performance was evaluated through accuracy, precision, recall, F1-score, ROC and PR curves, and correlations with HAM-A clinical scores.

The EEG acquisition, preprocessing, and feature extraction procedures are appropriate and well described. The integration of both in-house and publicly available datasets improves data diversity, although potential variability due to different recording systems (number of channels, sampling rates) may affect signal consistency. The overall machine learning pipeline (feature selection → PCA → normalization → model training) is methodologically sound, and the performance evaluation is comprehensive. However, the small sample size and strong dependence on synthetic oversampling (SMOTE) raise concerns regarding model overfitting and generalizability.
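The stated pipeline order maps directly onto a scikit-learn Pipeline. The sketch below is illustrative only; the component settings (k = 20, 10 principal components) are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 40)), rng.integers(0, 3, size=120)  # synthetic stand-ins

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),    # univariate feature selection
    ("pca", PCA(n_components=10)),               # dimensionality reduction
    ("scale", StandardScaler()),                 # normalization, in the order stated above
    ("clf", LogisticRegression(max_iter=1000)),  # one of the three reported classifiers
])
pipe.fit(X, y)
```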

  1. Strengths

Relevance and originality: The topic is clinically and scientifically important, addressing EEG-based biomarkers for anxiety assessment.

Methodological clarity: The data processing and modeling workflow is transparent and reproducible.

Comparative approach: The inclusion of multiple algorithms (LR, MLP, KNN) allows a meaningful comparison between interpretable and nonlinear models.

Clinical linkage: The correlation between model predictions and HAM-A scores provides external validation of the proposed approach.

Visualization quality: The manuscript includes clear graphical representations (confusion matrices, ROC and PR curves), which facilitate result interpretation.

  2. Weaknesses and Limitations

Limited dataset and imbalance: The small sample size and class imbalance significantly restrict the statistical reliability of the findings; SMOTE cannot fully substitute for real data diversity.

Cross-device heterogeneity: Combining datasets from different EEG acquisition systems without domain adaptation or normalization strategies may introduce bias.

Narrow algorithmic scope: Only three relatively simple models are tested; stronger baselines (e.g., SVM, Random Forest, XGBoost, or deep learning models) are not explored.

Lack of feature interpretability: The contribution of specific EEG features or frequency bands is not analyzed, limiting neurophysiological insight.

Absence of independent validation: Results rely solely on internal cross-validation; no external test set or cross-dataset validation is performed.

Simplified experimental stimuli: The study uses only visual stimuli, which reduces ecological validity and limits generalization to real-world anxiety responses.

  3. Specific and Actionable Recommendations

Expand dataset and validation strategy

Increase the number of participants or adopt a leave-one-subject-out cross-validation scheme to better evaluate model generalization.
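A minimal sketch of the suggested leave-one-subject-out scheme, using scikit-learn's LeaveOneGroupOut on synthetic stand-in data (the layout of 16 subjects × 10 epochs is hypothetical):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(160, 12))         # 16 subjects x 10 epochs (placeholder)
labels = rng.integers(0, 2, size=160)         # anxiety class per epoch (placeholder)
subject_ids = np.repeat(np.arange(16), 10)    # one group per participant

scores = cross_val_score(LogisticRegression(max_iter=1000), features, labels,
                         groups=subject_ids, cv=LeaveOneGroupOut())
print(scores.mean(), scores.std())            # generalization across held-out subjects
```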

Provide an analysis of inter-dataset variability (e.g., through t-SNE or PCA visualization) to assess consistency between the in-house and DASPS datasets.
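The suggested consistency check could look like the following PCA projection of standardized features from both corpora (all data here are synthetic placeholders; t-SNE via sklearn.manifold.TSNE would be analogous):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_inhouse = rng.normal(0.0, 1.0, size=(120, 12))   # placeholder in-house features
X_dasps = rng.normal(0.3, 1.1, size=(200, 12))     # placeholder DASPS features

X_all = StandardScaler().fit_transform(np.vstack([X_inhouse, X_dasps]))
scores = PCA(n_components=2).fit_transform(X_all)

plt.scatter(scores[:120, 0], scores[:120, 1], alpha=0.6, label="in-house")
plt.scatter(scores[120:, 0], scores[120:, 1], alpha=0.6, label="DASPS")
plt.xlabel("PC1"); plt.ylabel("PC2"); plt.legend()
plt.show()   # strongly separated clouds would indicate inter-dataset shift
```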

 

Enhance model diversity and benchmarking

Include additional algorithms such as SVM, Random Forest, XGBoost, or simple deep neural networks (CNN/LSTM) to establish stronger baselines.

Report statistical significance tests (e.g., paired t-test or Wilcoxon test) when comparing models to ensure differences are not due to random variation.
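For instance, a paired Wilcoxon signed-rank test on per-fold accuracies (the fold scores below are invented purely for illustration):

```python
from scipy.stats import wilcoxon

# hypothetical accuracies from the same 10 CV folds for two models
mlp_scores = [0.86, 0.88, 0.84, 0.90, 0.85, 0.87, 0.89, 0.83, 0.88, 0.86]
knn_scores = [0.75, 0.78, 0.72, 0.80, 0.74, 0.76, 0.77, 0.71, 0.79, 0.73]

stat, p_value = wilcoxon(mlp_scores, knn_scores)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
```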

Incorporate feature interpretability analysis

Perform feature importance or sensitivity analysis (e.g., SHAP values or permutation importance) to identify the EEG bands and electrode regions most influential in classification.

Discuss possible neurophysiological implications of these findings in the context of anxiety-related brain activity.
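A compact sketch of the suggested permutation-importance analysis (SHAP would be an alternative); the data and feature names below are synthetic placeholders, not the manuscript's features:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)      # features 0 and 2 carry the signal
names = ["alpha_power", "beta_power", "spectral_entropy", "hjorth_mobility"]

model = LogisticRegression().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=30, random_state=0)
for name, drop in zip(names, result.importances_mean):
    print(f"{name}: {drop:.4f}")                    # mean score drop when shuffled
```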

Mitigate potential overfitting

Provide confidence intervals for accuracy and F1-scores.

Consider using nested cross-validation or an external hold-out dataset to validate model robustness.
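A minimal nested cross-validation sketch with an approximate confidence interval over outer-fold scores (settings and data are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 8)), rng.integers(0, 2, size=120)

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)  # tunes hyperparameters
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # estimates performance

search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}, cv=inner)
scores = cross_val_score(search, X, y, cv=outer)
mean = scores.mean()
half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
print(f"accuracy {mean:.3f} +/- {half_width:.3f} (approx. 95% CI)")
```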

Clarify data integration and ethical aspects

Describe in detail how datasets with different sampling rates and channel configurations were harmonized.
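One plausible harmonization step is sketched below; the 128 Hz target rate and the 8-channel shared montage are assumptions for illustration, not the paper's documented settings.

```python
import numpy as np
from scipy.signal import resample_poly

def harmonize(eeg, fs_in, fs_out=128, keep_idx=None):
    """Subset channels and resample a (channels x samples) array to fs_out."""
    if keep_idx is not None:
        eeg = eeg[keep_idx, :]
    # polyphase resampling applies an anti-aliasing filter during rate conversion
    return resample_poly(eeg, up=fs_out, down=fs_in, axis=1)

dasps_like = np.random.default_rng(0).normal(size=(14, 256 * 15))  # 14 ch, 15 s at 256 Hz
shared = [0, 1, 2, 3, 4, 5, 6, 7]                                  # hypothetical shared montage
print(harmonize(dasps_like, fs_in=256, keep_idx=shared).shape)     # (8, 1920) at 128 Hz
```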

Include a brief statement on data privacy, participant consent, and ethical approval, even if formal IRB oversight was not required.

Improve language and figure presentation

Simplify long sentences in the introduction and discussion for better readability.

Ensure that all figures have self-contained captions, labeled axes (including units), and clearly distinguishable color schemes.

Comments on the Quality of English Language

Simplify long sentences in the introduction and discussion for better readability.

Author Response

We thank the Reviewer for their careful reading of our manuscript and for the constructive observations, which have substantially improved the clarity, methodological rigor, and coherence of the study. All comments were addressed in the revised version of the manuscript.

  1. Clarification of motivation and clinical context

The Introduction was expanded to articulate more clearly the rationale for integrating EEG-derived biomarkers with HAM-A–based clinical assessment. Additional recent references were added as requested.

  2. Methodological transparency

The reviewer’s request for additional detail regarding preprocessing, harmonization, and nested cross-validation has been fully implemented. The Methods section now provides explicit descriptions of downsampling, channel alignment, ICA pipelines, SMOTE isolation, and validation logic.

  3. Interpretation of severe-anxiety class

A clear statement was added in both Methods and Discussion to emphasize that the severe category is underrepresented and cannot be modeled reliably within the current dataset.

  4. Coherence of Discussion and Conclusions

Both sections were reorganized to ensure consistency with the revised narrative and to strengthen alignment with the study’s objectives and limitations.

We appreciate the Reviewer’s insightful recommendations, which have significantly improved the scholarly quality of the paper.


Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript tackles an important and timely problem—the development of an objective, EEG-based adjunct to standard anxiety assessment—and offers a clear, well-written account of an end-to-end pipeline from acquisition to classification. Strengths include the use of a portable EEG system (Unicorn Hybrid Black), a thoughtful feature set that combines spectral, entropy and time-frequency descriptors, and the direct comparison of three conceptually different classifiers (logistic regression, MLP, KNN). The reported performance for non-anxious and moderate classes is promising and the authors rightly highlight the potential of MLPs to capture nonlinear structure in EEG. The paper is structured logically and provides useful detail about preprocessing and the modelling pipeline, which makes the work easy to follow.

However, there are several major methodological issues that limit confidence in the conclusions. The in-house sample is very small (n=16), and severe anxiety is so under-represented that no model in the reported test sets learned that class—this fundamentally constrains any claims about clinical utility. Combining the in-house (8-channel, Unicorn) recordings with the DASPS dataset (14 channels, different acquisition characteristics) introduces potential heterogeneity: channel montages, sampling rates, electrode types, and ICA pipelines differ, and the authors' approach to harmonizing these modalities (resampling, adapted montage, different ICA toolchains) is not shown to preserve the equivalence of features. The use of a heuristic, feature-threshold procedure to generate preliminary labels is also concerning because it risks circularity if similar spectral features are then used by the classifiers; the paper should make explicit that these heuristics were not used in final labelling or training, and document safeguards against leakage.

There are also important issues in model evaluation and reporting that need attention. It is unclear whether feature selection (SelectKBest), PCA and SMOTE were applied inside cross-validation folds or on the whole dataset prior to splitting—if the latter, performance estimates will be optimistically biased. The test set is small (20% of an already tiny pooled sample), so single-split metrics and even AUC curves are unstable; reporting variability (CI or results across repeated CV folds) is essential. KNN’s collapse to the majority class and the fact that severe anxiety was never detected in test cases indicate that class-imbalance handling and evaluation thresholds need more careful treatment; SMOTE can help but cannot substitute for real diverse samples. The reported Pearson correlations between model probabilities and HAM-A scores are interesting but should be interpreted cautiously given the limited sample and potential dependence between training and evaluation steps.
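The leakage-safe design this paragraph asks about can be expressed with an imbalanced-learn pipeline, in which SelectKBest, PCA, and SMOTE are fitted on each training fold only. The sketch below uses synthetic data and arbitrary settings, not the authors' configuration:

```python
import numpy as np
from imblearn.pipeline import Pipeline as ImbPipeline   # resampler-aware pipeline
from imblearn.over_sampling import SMOTE
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.default_rng(0).normal(size=(150, 30))
y = np.array([0] * 75 + [1] * 60 + [2] * 15)             # deliberately imbalanced labels

pipe = ImbPipeline([
    ("select", SelectKBest(f_classif, k=15)),
    ("pca", PCA(n_components=8)),
    ("smote", SMOTE(k_neighbors=3, random_state=0)),     # resamples training folds only
    ("clf", MLPClassifier(max_iter=1000, random_state=0)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(pipe, X, y, cv=cv, scoring="f1_macro"))
```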

In lieu of additional computational recipes, I recommend the authors moderate claims about clinical readiness and concentrate on improving data representativeness, harmonization, and transparency. Specifically: (1) if possible, expand and balance the sample—especially severe cases—or frame the present work clearly as a pilot feasibility study; (2) fully document and standardize preprocessing across datasets (same ICA strategy, channel mapping procedures) and show that feature distributions are comparable after harmonization; (3) ensure that all feature selection, PCA and resampling steps are nested inside cross-validation and report variability across repeated folds; (4) present per-fold confusion matrices and calibration plots rather than single-split summaries, and be explicit about IRB/ethics oversight given the human EEG recordings. With these steps and a more cautious framing of clinical implications, the study will provide an important contribution to EEG-based anxiety research.

Author Response

We are grateful to the Reviewer for the valuable comments and suggestions, which have helped improve the methodological detail, interpretability, and clinical contextualization of the study.

The manuscript has been revised as indicated:

1. Figures and structure

All figures were reordered and relabeled for consistency. Their explanatory text was revised to ensure uniform terminology and methodological continuity.

2. Feature extraction and preprocessing

The reviewer requested more precise descriptions of feature domains. The revised manuscript includes expanded explanations for spectral entropy, Hjorth parameters, wavelet coefficients, and the rationale for selecting the db4 wavelet.
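For concreteness, a generic textbook computation of the Hjorth parameters and a db4 wavelet decomposition is sketched below (this is not the authors' implementation; x is a synthetic placeholder signal):

```python
import numpy as np
import pywt

def hjorth(x):
    """Return Hjorth activity, mobility, and complexity of a 1-D signal."""
    dx, ddx = np.diff(x), np.diff(x, n=2)
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / activity)
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

x = np.random.default_rng(0).normal(size=1280)        # placeholder 5 s epoch at 256 Hz
print(hjorth(x))
coeffs = pywt.wavedec(x, "db4", level=4)              # approximation + 4 detail bands
print([len(c) for c in coeffs])
```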

3. Permutation-based feature importance

A dedicated subsection was added as recommended, detailing the procedure and its neurophysiological interpretation.

4. Typographical and formatting corrections

All inconsistencies identified by the reviewer were corrected; references and abbreviations were standardized.

5. Strengthening link to prior work

The Discussion now integrates a more explicit comparative assessment with relevant literature, including portable-EEG studies and high-density clinical EEG baselines.

We sincerely thank the Reviewer for the constructive feedback and for helping us improve the clarity and scientific robustness of the manuscript.


Reviewer 3 Report

Comments and Suggestions for Authors
  1. Overall assessment

The article is devoted to the development of an automatic, portable classification of anxiety levels based on EEG signals, using modern machine learning algorithms. The authors combine their own dataset (16 participants) with the open DASPS dataset (23 participants), which is well justified in the text (lines 37–38).

The topic is relevant, the methodology is described in considerable detail, and the results are clearly visualized. The article has potential for publication after revisions.

 

  2. Strengths

2.1. Clear statement of the problem

The authors argue for the need for objective biomarkers of anxiety (lines 55–57).

2.2. Combining their own data and DASPS

This enhances the generalizability and significance of the work (lines 88–93).

2.3. Correct preprocessing methodology

  • resampling (lines 99–101);
  • ICA for artifacts (lines 102–104);
  • a clear stimulation protocol (lines 96–98).

This demonstrates good technical quality of the work.

2.4. Comprehensive analysis of models

Compared:

  • Logistic Regression (81.25% accuracy, lines 207–213)
  • MLP (87.5% accuracy — best result, lines 209–210)
  • KNN (75% accuracy, but serious imbalance — lines 211–213 and 168–174)

 

2.5. Statistical validity

The correlation with HAM-A is confirmed (r = 0.66–0.71, lines 267–289), which strengthens the clinical significance.

 

  3. Main comments (necessary improvements)

3.1. A very small amount of data for the “severe anxiety” class

The authors admit that the models were unable to classify severe cases at all (lines 159–160, 165–167).

Problem: severe cases are never successfully classified, so the model is effectively two-class.

Recommendation:

  • either exclude the severe class as statistically unreliable,
  • or supplement the sample,
  • or discuss the problem and solutions in the discussion section.

 

3.2. Excessive use of SMOTE on a small dataset

The authors apply SMOTE (lines 125–127), but this is dangerous when n≈16–39.

Artificial samples can worsen generalization.

Recommendation:

Discuss this problem and its solutions in the discussion section.

3.3. Preliminary heuristic labeling (lines 120–124)

This may introduce systematic bias.

There is no evidence that these thresholds (1.4×, 1.3×, 0.8×) have a physiological basis.

Recommendation:

Add:

  • ROC analysis of thresholds,
  • empirical justification of the selected values.

3.4. Ambiguity in stimulation protocol

Six stimuli of 15 s duration are described (lines 96–98), but no examples of the visual stimuli (e.g., the pictures used) or their intensity are provided.

Recommendation:

Provide a list or examples of stimuli to ensure reproducibility.

3.5. No comparison with SOTA

The introduction mentions previous works (lines 1–3, 454–472), but:

  • there is no quantitative comparison of the authors' models with the literature;
  • there is no DASPS baseline metric.

Recommendation:

Add a table “Ours vs. SOTA vs. DASPS baseline”.

 

3.6. The purpose of the work is formulated incorrectly

3.6.1. The purpose may be to improve diagnostics (its accuracy and speed); an automated system is only a means of achieving that goal.

3.6.2. In medicine, and especially in diagnostics, developers are usually very careful: they build decision-support systems. The question is therefore whether the work is planned as a decision-support system or as an automated diagnostic system, because for the latter 16 participants is dangerously few.

 

Recommendation: Accept after revisions, because:

  • the severe class is statistically underpowered,
  • SMOTE may bias results,
  • there is no comparison with SOTA,
  • labeling is partially heuristic.

Author Response

We thank the Reviewer for the thoughtful and detailed evaluation of our work.

Following the reviewer’s evaluation, the manuscript has been revised as indicated:

1. Clarification of model selection

The manuscript now explicitly justifies the focus on logistic regression, MLP, and KNN, emphasizing transparency, reproducibility, and comparability with existing portable-EEG studies.

2. Feature selection and dimensionality reduction

The reviewer requested further detail regarding SelectKBest and PCA. The Methods section has been expanded to describe statistical criteria, retained variance, and the integration of these steps within nested cross-validation.

3. Interpretation of KNN performance

A more nuanced explanation was added, distinguishing between KNN’s limitations for categorical classification under class imbalance and its strong correlation with HAM-A scores, which supports its potential as a continuous severity estimator.

4. Cross-dataset harmonization

The harmonization workflow—sampling-rate alignment, channel matching, uniform filtering, and visual PCA verification—has been described in greater detail as recommended.

5. Expanded limitations

The reviewer’s request for a more comprehensive discussion of limitations was implemented. The revised section addresses dataset size, imbalance, SMOTE constraints, and the need for external validation on clinically enriched datasets.

6. Comparison with state-of-the-art

A more explicit summary of recent studies (portable vs. clinical EEG, two-class vs. three-class classification) was incorporated and consolidated in Table 4.

These revisions address the reviewer’s concerns and substantially strengthen the methodological and interpretative framework of the study.


Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The revised manuscript presents a well-structured study combining wearable EEG systems with machine learning for anxiety-level classification. Improvements in data harmonization, feature extraction explanation, and nested cross-validation notably enhance reproducibility and practical relevance. However, theoretical depth remains limited, severe class imbalance still compromises generalization, and further refinement is needed in figure presentation, writing clarity, and structural coherence.

 

1. The study lacks theoretical discussion concerning the severe-class characterization and model failures. Add:

  • literature-based physiological signatures of severe anxiety;
  • visualization of feature distributions and decision-boundary collapse;
  • analysis of SMOTE-induced overfitting in limited EEG datasets.

Provide stronger justification for excluding more complex architectures (CNN/RNN) based on dataset size and interpretability constraints.

 

2. Experimental evaluation mostly relies on categorical metrics without addressing generalization. Recommended additions:

  • regression-based severity estimation using predicted probability outputs;
  • sensitivity studies under participant exclusion or cross-population testing.

Conclusions should explicitly state that the system is not yet suitable for clinical deployment, and applicability is limited to mild/moderate anxiety estimation.

 

3. Figures lack standardized color schemes, sufficient resolution, and confidence intervals. Confusion matrices should explicitly annotate the absence of severe-class detection. Feature importance should be complemented with EEG topographical visualization to enhance neurophysiological interpretability.

Comments on the Quality of English Language

The manuscript contains overly long sentences, inconsistent terminology, and grammatical inconsistencies. Some paragraphs are excessively dense; restructuring is needed for improved readability. A professional language edit is recommended, particularly for the Abstract and Discussion.

Author Response

Response to Reviewer

We thank the Reviewer for the constructive and detailed assessment of our manuscript. The comments were extremely valuable, and we have revised the paper accordingly. Below, we provide a point-by-point response describing the modifications implemented in the revised version.

  1. Theoretical discussion on the severe-anxiety class, model failures, and SMOTE effects

The manuscript has been updated to include a more substantial theoretical discussion regarding severe-anxiety neurophysiology, the challenges associated with identifying this category, and the implications for classifier performance. In Section 4 (Discussion), we added a dedicated explanation of how altered alpha suppression, heightened beta activity, and increased instability of cortical rhythms are typically reported in severe anxiety, clarifying why the reduced number of participants in this category limits ML models’ ability to delineate a stable decision boundary.

We also introduced a formal explanation of the decision-boundary collapse phenomenon, clarifying how, under extreme class imbalance, minority-class samples are projected into majority regions of the feature space. This section now explicitly cites relevant literature documenting this effect in affective computing.

Additionally, Section 4.3 (Limitations) now discusses potential SMOTE-induced distortions in high-dimensional EEG feature spaces and the rationale for cautious interpretation of synthetic samples.

  2. Generalization analysis, regression-style severity interpretation, and clarification of clinical applicability

To address the reviewer’s request regarding generalization, the HAM-A correlation analysis now includes a clear statement at the end of the section noting that the strong associations observed (r = 0.66–0.71) support the development of a continuous-severity estimator that complements categorical classification. This addition clarifies how probability outputs can serve a regression-style function.
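Read as a regression-style output, this amounts to correlating predicted class probabilities with HAM-A scores; a toy sketch with invented numbers (not the study's data):

```python
import numpy as np
from scipy.stats import pearsonr

predicted_prob = np.array([0.12, 0.45, 0.30, 0.61, 0.52, 0.80])  # hypothetical P(anxious)
ham_a_scores = np.array([6, 11, 15, 18, 22, 25])                 # hypothetical HAM-A values

r, p = pearsonr(predicted_prob, ham_a_scores)
print(f"r = {r:.2f}, p = {p:.4f}")   # the study reports r = 0.66-0.71 on real data
```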

Furthermore, the Conclusions section now states explicitly that the proposed system is not yet suitable for clinical deployment and that its applicability is currently limited to exploratory research settings and to distinguishing non-anxious from moderate-anxiety levels under controlled acquisition conditions.

Discussion and Limitations have been strengthened to emphasize the need for multi-site validation, subject-exclusion studies, and larger datasets.

  3. Figures, clarity, and interpretability

The captions of Figures 4–6 have been revised as requested. Each now contains the explanatory note:

“Note: The severe-anxiety class does not appear in the confusion matrix because no severe instances were correctly predicted under the present dataset conditions.”

This makes the limitation explicit and prevents misinterpretation of the model outputs.

Section 3.3 (Feature Importance Analysis) has also been expanded, providing a more precise neurophysiological interpretation of the most influential features and reinforcing the connection between model-defined relevance and established EEG markers.

Figure captions were also harmonized for consistency and improved readability.

  4. Justification for excluding CNN and RNN architectures

A methodological justification was added to Section 2.1 (Machine Learning Algorithms). The revised text explains that deep neural architectures were not included because low-density portable EEG montages (8 channels) do not provide sufficient spatial–temporal structure for practical CNN/RNN training. Furthermore, the study prioritizes interpretability, which is essential for early biomarker investigations. The justification is now supported by appropriate literature.

This avoids attributing limitations solely to dataset size and aligns the methodological choice with the study’s conceptual objectives.

  5. English-language and structural improvements

The manuscript underwent an extensive language revision. Long sentences were shortened, terminology was made consistent across the entire text, and dense paragraphs—particularly in the Abstract and Discussion—were reorganized to improve readability. Several sections were rewritten for clarity while preserving scientific meaning. The revised version addresses the Reviewer’s linguistic concerns.

We thank the Reviewer once again for the constructive feedback. The manuscript has been substantially improved as a result of these comments, and we trust that the revisions have satisfactorily addressed all concerns.
