Peer-Review Record

Spatial–Temporal EEG Imaging for Dual-Loop Neuro-Adaptive Simulation: Cognitive-State Decoding and Communication Gating in Critical Human–Machine Teams

J. Imaging 2026, 12(5), 208; https://doi.org/10.3390/jimaging12050208
by Rubén Juárez 1,*, Antonio Hernández-Fernández 1, Claudia Barros Camargo 2 and David Molero 1,3,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4:
Reviewer 5:
Submission received: 11 March 2026 / Revised: 27 April 2026 / Accepted: 8 May 2026 / Published: 12 May 2026
(This article belongs to the Section AI in Imaging)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a dual-loop neuro-adaptive simulation framework grounded in real-time spatial–temporal EEG imaging, in which cortical activity is represented as dynamic topographic signal maps and decoded to regulate both operator assistance and team communication. My comments are as follows:
1. The network structure of the hybrid CNN-LSTM and its hyperparameters (convolutional kernel, number of LSTM units) are not clearly defined, resulting in insufficient model reproducibility. 
2. Key parameters such as the number of iterations, reward weights, and exploration rate in the MAPPO training are missing, and the convergence of the strategy has not been verified. 
3. With only 25 subjects, the sample size is relatively small, and the generalization and robustness of the model across individuals are not adequately demonstrated. 
4. The multimodal synchronization is only labeled with jitter < 5 ms, but no synchronization test plan or error distribution data is provided. 
5. The description of the implementation details of the imaging, such as the interpolation algorithm and spatial resolution of the spectral-topological EEG map, is vague. 
6. The safety constraint thresholds (κ_red, τ) are only mentioned to be calibrated for individuals, but no calibration process or selection basis is given. 
7. The real-time inference delay is < 50 ms, but the hardware platform and inference optimization scheme are not specified, and the argumentation for engineering feasibility is weak.

Author Response

Response to Reviewer

We sincerely thank the Reviewer for the careful reading of our manuscript and for the constructive comments. We have revised the manuscript extensively to improve methodological transparency, reproducibility, and engineering clarity. In particular, the revised version now specifies the CNN–LSTM architecture, MAPPO training hyperparameters, reward design, synchronization validation procedure, spectral-topographic map construction process, subject-specific threshold calibration, implementation hardware, and the limitations related to sample size and generalization.

Below, we respond point by point.

Comment 1. The network structure of the hybrid CNN-LSTM and its hyperparameters (convolutional kernel, number of LSTM units) are not clearly defined, resulting in insufficient model reproducibility.

Response:
We appreciate this comment and have revised the manuscript accordingly. The hybrid CNN–LSTM architecture is now explicitly defined in the revised Methods section and summarized in the hyperparameter table. Specifically, the CNN front-end comprises three 2D convolutional layers with $3 \times 3$ kernels and 16, 32, and 64 filters, respectively, together with ReLU activations, max-pooling layers, and spatial dropout. The temporal module is an LSTM with 64 hidden units and temporal dropout of 0.3. We also clarified the temporal context length ($K = 10$ windows, corresponding to an effective lookback of approximately 1.9 s), the output heads, the joint loss function, and the optimizer settings. These additions were made to ensure reproducibility and eliminate ambiguity in the decoder design.
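
As a reading aid, the reported decoder dimensions can be traced with a few lines of arithmetic. This is only a sketch: the response does not state the padding, stride, or pooling size, so "same" padding, stride 1, and a 2 × 2 max-pool after every convolution are assumptions here, as is feeding the flattened CNN output directly to the LSTM. The four input channels and the 32 × 32 grid are taken from the response to Comment 5.

```python
# Assumptions (not stated in the response): "same" padding, stride 1,
# and a 2x2 max-pool after every convolutional layer.
IN_CHANNELS = 4          # theta, alpha, beta, FAA map channels
GRID = 32                # 32 x 32 topographic grid
FILTERS = (16, 32, 64)   # filters per convolutional layer
KERNEL = 3               # 3 x 3 kernels
LSTM_UNITS = 64


def trace_cnn(in_channels=IN_CHANNELS, grid=GRID, filters=FILTERS, kernel=KERNEL):
    """Return per-layer output shapes and the CNN weight count."""
    shapes, params = [], 0
    channels, size = in_channels, grid
    for out_channels in filters:
        # weights + biases of one 2D convolution
        params += out_channels * (channels * kernel * kernel + 1)
        size //= 2  # 2x2 max-pooling halves each spatial side
        channels = out_channels
        shapes.append((channels, size, size))
    return shapes, params


def lstm_params(input_size, hidden=LSTM_UNITS):
    """Weights of a single-layer LSTM: four gates, one bias vector per gate."""
    return 4 * (hidden * (input_size + hidden) + hidden)


shapes, cnn_params = trace_cnn()
features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes)                               # per-layer (channels, height, width)
print(features)                             # flattened feature size fed to the LSTM
print(cnn_params + lstm_params(features))   # total trainable weights in this sketch
```

Under these assumptions most of the weights sit in the LSTM input projection rather than in the convolutions. Frameworks that keep two bias vectors per gate, as PyTorch does, will report a slightly larger LSTM count.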

Comment 2. Key parameters such as the number of iterations, reward weights, and exploration rate in the MAPPO training are missing, and the convergence of the strategy has not been verified.

Response:
Thank you. We have substantially expanded the description of the MAPPO controller and constrained optimization framework. The revised manuscript now reports the main MAPPO hyperparameters, including discount factor $\gamma = 0.99$, GAE parameter $\lambda = 0.95$, PPO clipping ratio $\epsilon = 0.2$, entropy coefficient $0.01$, rollout horizon (2048 steps), mini-batch size (64), and optimization epochs per update (10). We also added the stopping criterion used to verify convergence: training ended when episodic reward variation remained below 5% for 100 consecutive iterations or after $5 \times 10^6$ environment steps. In addition, we now provide the reward weights $(w_s, w_p, w_c, w_l, w_h)$, scaling coefficients $(\alpha, \beta, \eta)$, and the constrained optimization parameters $(\epsilon_{load}, \epsilon_{cbe}, \eta_\lambda)$. These revisions were introduced to clarify both the training dynamics and the safety-oriented structure of the controller.
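
The stopping rule described above can be written down compactly. The sketch below is one possible reading, not the authors' implementation: the response does not define "reward variation", so it is assumed here to mean the relative change in mean episodic reward between successive iterations, and the function name `should_stop` is hypothetical.

```python
MAX_ENV_STEPS = 5_000_000   # hard budget from the response
ROLLOUT_HORIZON = 2048      # environment steps collected per rollout
PATIENCE = 100              # consecutive stable iterations required
TOLERANCE = 0.05            # 5% relative reward variation


def should_stop(rewards, env_steps, tol=TOLERANCE, patience=PATIENCE,
                max_steps=MAX_ENV_STEPS):
    """rewards: mean episodic reward per training iteration, oldest first."""
    if env_steps >= max_steps:
        return True
    if len(rewards) < patience + 1:
        return False
    recent = rewards[-(patience + 1):]
    for prev, curr in zip(recent, recent[1:]):
        # Assumed definition of "variation": relative change between
        # successive iterations, guarded against division by zero.
        if abs(curr - prev) > tol * max(abs(prev), 1e-8):
            return False
    return True


# Upper bound on full rollouts if one rollout consumes 2048 steps.
print(MAX_ENV_STEPS // ROLLOUT_HORIZON)
```

If one training iteration corresponds to one 2048-step rollout (itself an assumption, since parallel environments would change the accounting), the step budget caps training at 2441 full rollouts, so the 100-iteration plateau condition is a small fraction of the available budget.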

Comment 3. With only 25 subjects, the sample size is relatively small, and the generalization and robustness of the model across individuals are not adequately demonstrated.

Response:
We agree with the Reviewer and have revised the manuscript to make this limitation explicit and to avoid overclaiming. The manuscript now states more clearly that the study should be interpreted as a controlled feasibility study conducted in a high-fidelity simulated environment, rather than as definitive evidence of subject-independent deployment readiness. We also clarify that decoder evaluation used a chronological held-out subject-dependent protocol, which supports within-subject temporal generalization but does not by itself establish cross-subject robustness. In both the Discussion and Conclusions, we now explicitly identify the moderate sample size ($N = 25$ pilot–engineer pairs) as a limitation and note that larger cohorts and subject-independent transfer experiments are needed in future work.

Comment 4. The multimodal synchronization is only labeled with jitter < 5 ms, but no synchronization test plan or error distribution data is provided.

Response:
Thank you for highlighting this. We have expanded the synchronization subsection to include the empirical validation procedure. The revised manuscript now states that LSL synchronization jitter was validated by sending a simultaneous TTL hardware pulse to the EEG marker stream and to a photodiode positioned on the simulator display. Cross-modal latency and jitter were then computed across 1000 test events, confirming a temporal standard deviation below 5 ms. This addition provides a concrete synchronization test plan and clarifies that the reported jitter value was empirically measured rather than assumed. We agree that a full latency distribution would further strengthen the engineering characterization, and we have noted this direction for future reporting.
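
A minimal sketch of how such a jitter figure could be computed from paired timestamps is given below. The timestamps are synthetic stand-ins generated for illustration only and are not the study's measurements; treating jitter as the sample standard deviation of per-event latency follows the wording of the response, while the percentile summary is an added assumption pointing toward the fuller distribution the authors mention.

```python
import random
import statistics


def latency_stats(marker_ts, photodiode_ts):
    """Per-event latency (ms) between paired timestamps given in seconds."""
    latencies = [(p - m) * 1000.0 for m, p in zip(marker_ts, photodiode_ts)]
    ordered = sorted(latencies)
    return {
        "mean_ms": statistics.mean(latencies),
        "jitter_sd_ms": statistics.stdev(latencies),  # jitter as sample SD
        "p05_ms": ordered[int(0.05 * (len(ordered) - 1))],
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))],
    }


# Synthetic stand-in for the 1000 TTL test events (NOT the study's data):
# a 20 ms display lag with 2 ms Gaussian timing noise.
rng = random.Random(0)
markers = [i * 0.5 for i in range(1000)]
photodiode = [t + 0.020 + rng.gauss(0.0, 0.002) for t in markers]

stats = latency_stats(markers, photodiode)
print(stats)
```

Reporting the mean alongside the standard deviation separates a constant offset, which can be corrected, from the jitter itself, which cannot.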

Comment 5. The description of the implementation details of the imaging, such as the interpolation algorithm and spatial resolution of the spectral-topological EEG map, is vague.

Response:
We appreciate this observation and have revised the manuscript to make the imaging pipeline explicit. The revised Methods section now specifies that electrode positions are projected from the scalp surface onto the 2D plane using an azimuthal equidistant projection, after which spatial interpolation is performed with biharmonic spline interpolation. We also now report the fixed spatial grid resolution of $32 \times 32$ pixels, selected as a suitable compromise between spatial granularity and computational efficiency for a 14-channel wearable montage. In addition, we clarify that the final representation uses four channels ($\theta$, $\alpha$, $\beta$, and FAA). These additions were introduced precisely to improve reproducibility and methodological transparency.
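
The projection step can be illustrated as follows. This sketch covers only the azimuthal equidistant projection and the mapping onto the 32 × 32 grid; the biharmonic spline interpolation that fills the grid is not shown. Electrode positions are assumed to lie on a unit sphere with the vertex at (0, 0, 1) and +y pointing toward the nose, and placing the equator on the grid border (`r_max`) is a further assumption.

```python
import math

GRID = 32  # side length of the topographic grid in pixels


def azimuthal_equidistant(x, y, z):
    """Project a unit-sphere electrode position onto the plane.

    The vertex (0, 0, 1) maps to the origin, and the planar distance from
    the origin equals the arc length (polar angle) from the vertex.
    """
    polar = math.acos(max(-1.0, min(1.0, z)))
    azimuth = math.atan2(y, x)
    return polar * math.cos(azimuth), polar * math.sin(azimuth)


def to_pixel(px, py, grid=GRID, r_max=math.pi / 2):
    """Map plane coordinates to fractional (row, col) grid indices.

    r_max is the projected radius placed on the grid border; using the
    equator (pi / 2) is an assumption of this sketch.
    """
    col = (px / r_max + 1.0) / 2.0 * (grid - 1)
    row = (1.0 - py / r_max) / 2.0 * (grid - 1)  # row 0 is the anterior edge
    return row, col


vertex = to_pixel(*azimuthal_equidistant(0.0, 0.0, 1.0))
right_equator = to_pixel(*azimuthal_equidistant(1.0, 0.0, 0.0))
print(vertex, right_equator)
```

Because the projection preserves distance from the vertex, electrodes low on the head spread toward the border instead of crowding together as they would under a simple orthographic projection.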

Comment 6. The safety constraint thresholds ($\kappa_{red}$, $\tau$) are only mentioned to be calibrated for individuals, but no calibration process or selection basis is given.

Response:
Thank you. We have now described the calibration procedure explicitly. During a 5-minute warm-up phase, the continuous CLI was recorded while each participant performed the baseline task. For each subject, the intra-subject mean $\mu$ and standard deviation $\sigma$ of the CLI were computed. The moderate overload threshold was then defined as $\tau = \kappa_{yel} = \mu + 1.5\sigma$, whereas the critical threshold for communication blocking was defined as $\kappa_{red} = \mu + 2.0\sigma$. We added this information both in the main Methods section and in the appendix on CLI derivation so that the calibration process and its statistical rationale are fully explicit.
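
The calibration rule amounts to two lines of arithmetic per participant, sketched below. The use of the sample standard deviation, the inclusive comparisons, and the state labels ("green", "yellow", "red") are assumptions of this sketch; the response specifies only the two threshold formulas.

```python
import statistics


def calibrate_thresholds(warmup_cli):
    """Return (kappa_yel, kappa_red) from one participant's warm-up CLI samples."""
    mu = statistics.mean(warmup_cli)
    sigma = statistics.stdev(warmup_cli)  # sample SD; an assumption of this sketch
    return mu + 1.5 * sigma, mu + 2.0 * sigma


def gate(cli, kappa_yel, kappa_red):
    """Classify a CLI value against the fixed participant-specific thresholds."""
    if cli >= kappa_red:
        return "red"      # critical: communication blocking
    if cli >= kappa_yel:
        return "yellow"   # moderate overload
    return "green"


# Toy warm-up trace (illustrative values only).
kappa_yel, kappa_red = calibrate_thresholds([0.2, 0.4, 0.6])
print(kappa_yel, kappa_red, gate(0.75, kappa_yel, kappa_red))
```

If the warm-up CLI were roughly Gaussian, these thresholds would be exceeded in about 7% and 2% of baseline samples respectively, which gives a rough sense of how often each gate would trigger without added workload.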

Comment 7. The real-time inference delay is < 50 ms, but the hardware platform and inference optimization scheme are not specified, and the argumentation for engineering feasibility is weak.

Response:
We agree and have strengthened this part of the manuscript. The revised version now includes a dedicated implementation subsection specifying that the full decoding-and-control pipeline was implemented in Python 3 using PyTorch 2.x and executed on a workstation equipped with an NVIDIA RTX 3090 GPU. We also now report that forward-pass times for the CNN–LSTM decoder and MAPPO policy remained below 10 ms, ensuring that model inference remained well within the 100 ms system update interval and did not constitute a computational bottleneck. Together with the reported end-to-end latency below 50 ms, this provides a stronger engineering argument for real-time feasibility in the present simulation setting.
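
A latency claim of this kind is usually backed by a timing harness along the following lines. The sketch is generic and does not reproduce the authors' measurement: `dummy_step` is a hypothetical placeholder for decoder and policy inference, and the choice of the 95th percentile as the reported statistic is an assumption.

```python
import time

UPDATE_INTERVAL_MS = 100.0  # system update interval from the response


def measure_latency_ms(step, n_runs=200):
    """Time repeated calls to `step` and return sorted latencies in ms."""
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sorted(samples)


def fits_budget(samples_ms, budget_ms=UPDATE_INTERVAL_MS, quantile=0.95):
    """True if the chosen latency quantile stays within the budget."""
    index = int(quantile * (len(samples_ms) - 1))
    return samples_ms[index] <= budget_ms


def dummy_step():
    """Placeholder for decoder + policy inference (hypothetical workload)."""
    sum(i * i for i in range(1000))


samples = measure_latency_ms(dummy_step)
print(fits_budget(samples))
```

When the workload runs on a GPU, kernel launches are asynchronous, so the device has to be synchronized before the clock is read; otherwise the harness measures launch overhead rather than inference time.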

Comment on English language. The English could be improved to more clearly express the research.

Response:
We thank the Reviewer for this suggestion. The manuscript has been revised throughout to improve clarity, precision, and readability in English. We have refined phrasing, reduced ambiguity, strengthened methodological wording, and aligned the narrative tone across the Abstract, Introduction, Methods, Results, Discussion, and Conclusions.

Closing statement

We are grateful for the Reviewer’s insightful comments. We believe that these revisions have significantly improved the manuscript by making the methodology more reproducible, the controller design more transparent, and the claims more appropriately bounded with respect to generalization and deployment.

Reviewer 2 Report

Comments and Suggestions for Authors

1. In Fig. 1, what are the data formats of the Wireless EEG, Gaze Tracking, Telemetry and Events, and Radio / Engineer Inputs streams? The caption in the image is covered; please correct it.
2. Spectral-Topographic EEG Maps contain three circles; the meaning of each circle needs to be explained.
3. The article does not explain Figures 1 and 2.

Author Response

Response to Reviewer 2

We sincerely thank the Reviewer for the careful reading of our manuscript and for the constructive comments. We appreciate the Reviewer’s observations regarding the clarity of the introduction, methods, and the interpretation of the figures. In the revised manuscript, we have improved the English throughout, expanded the methodological description, and revised the figure explanations and captions so that the visual pipeline is now explicitly described in the main text.

Below, we respond point by point.

Comment 1

In Fig. 1, what are the data formats of the Wireless EEG, Gaze Tracking, Telemetry and Events, and Radio / Engineer Inputs streams? The caption in the image is covered; please correct it.

Response:
Thank you for this important observation. We have revised both the figure and the surrounding text to clarify the format and methodological role of each input stream shown in Figure 1. In the revised manuscript, we now explicitly describe that:

  • Wireless EEG is acquired as a multichannel time series (14 channels at 256 Hz), later transformed into spectral-topographic EEG maps.
  • Gaze Tracking is represented as gaze-fixation and pupil-related descriptors sampled at 60 Hz when available.
  • Telemetry and Events are represented as synchronized task-state descriptors, including hazard markers, task phase, kinematic context, and progress indicators.
  • Radio / Engineer Inputs are represented as asynchronous communication-event streams, including push-to-talk onset/offset, intervention timing, and recent communication history.

In addition, we corrected the figure layout and caption formatting so that the caption is no longer covered and is fully readable in the revised version.
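
For readers who think in terms of data structures, the four stream formats listed above could be represented roughly as follows. All class and field names are hypothetical and chosen for illustration; only the channel count, the sampling rates, and the kinds of descriptors come from the response.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EEGChunk:
    timestamp: float                 # stream time of the first sample (s)
    samples: List[List[float]]       # shape: n_samples x 14 channels
    sampling_rate_hz: float = 256.0


@dataclass
class GazeSample:
    timestamp: float
    fixation_x: float
    fixation_y: float
    pupil_diameter: float            # units depend on the tracker
    sampling_rate_hz: float = 60.0


@dataclass
class TelemetryEvent:
    timestamp: float
    task_phase: str
    hazard_marker: bool
    progress: float                  # fraction of the task completed
    kinematics: List[float] = field(default_factory=list)


@dataclass
class RadioEvent:
    timestamp: float
    kind: str                        # e.g. "ptt_onset", "ptt_offset", "intervention"


chunk = EEGChunk(timestamp=0.0, samples=[[0.0] * 14])
ptt = RadioEvent(timestamp=1.25, kind="ptt_onset")
print(chunk.sampling_rate_hz, ptt.kind)
```

Giving every record its own timestamp reflects the distinction drawn in the list: the EEG and gaze streams are regularly sampled, whereas radio events are asynchronous and can only be aligned with the other streams through a shared clock.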

Change in manuscript:
We expanded the text accompanying Figure 1 in the Introduction and Methods sections, and we revised the caption and layout of the figure to ensure full visibility and interpretability.

Comment 2

Spectral-Topographic EEG Maps contain three circles; the meaning of each circle needs to be explained.

Response:
We appreciate this comment and agree that the meaning of the three circular maps must be made explicit. In the revised manuscript, we now clarify that the three circles shown in the conceptual figure are illustrative examples of time-varying spectral-topographic EEG maps, intended to depict consecutive spatial representations of cortical activity over time. They do not correspond to three different participants or three different sensors; rather, they illustrate the temporal evolution of the spatial EEG representation that is later processed by the CNN–LSTM decoder.

To avoid ambiguity, we have revised both the figure caption and the explanatory text to state clearly that the three circles represent a temporal sequence of topographic EEG maps within the online decoding pipeline.

Change in manuscript:
We added an explicit explanation in the main text and revised the figure caption so that the three circular topographies are identified as illustrative consecutive time steps of the spectral-topographic representation.

Comment 3

The article does not explain Figures 1 and 2.

Response:
Thank you. We agree that the original manuscript did not explain Figures 1 and 2 sufficiently in the body text. In the revised version, we have added explicit figure-oriented explanation paragraphs so that both figures are now introduced and interpreted directly in the manuscript narrative.

  • For Figure 1, we now explain the full conceptual flow from multimodal acquisition to spectral-topographic representation learning and dual-loop actuation.
  • For Figure 2, we now explain how the related-work positioning figure situates the present study relative to prior literature along two axes: signal representation (channel-wise to spatial-topographic) and adaptation target (single-user monitoring to team-level coordination).

These additions were made to ensure that the figures are not merely decorative, but fully integrated into the scientific argument of the manuscript.

Change in manuscript:
We inserted explicit textual explanation for Figures 1 and 2 in the Introduction and Related Work sections, respectively, and revised the captions to improve clarity.

Comment on English language

The English could be improved to more clearly express the research.

Response:
We thank the Reviewer for this suggestion. The manuscript has been carefully revised throughout to improve English clarity, sentence structure, and technical readability. We also refined the transitions between sections and ensured that figures, methods, and results are explained more explicitly and consistently.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a dual-loop neuro-adaptive simulation framework that integrates real-time spatiotemporal EEG imaging with multimodal synchronization and multi-agent reinforcement learning to regulate pilot haptic guidance and optimize communication timing in high-pressure operational scenarios. The work is technically sophisticated, the experimental design is rigorous, and the results demonstrate meaningful improvements in team coordination and responsiveness. The manuscript aligns well with the scope of the journal, and I recommend acceptance subject to major revisions. The key issues to be addressed are as follows:

- The term “imaging” is overused throughout the manuscript. The spectral-topographic EEG maps are generated via spatial interpolation from a 14-channel consumer-grade headset, rather than true neuroimaging modalities (e.g., fMRI, high-density EEG source localization). A more precise description such as “topographic representation” or “spatially mapped EEG” is recommended to better match the methodological scope.

- The decoder ablation results in Table 5 indicate that spectral-topographic EEG maps outperform raw channel-wise signals and spectral feature vectors. However, all baseline comparisons adopt the same CNN-LSTM architecture. It remains unclear whether the performance gain arises from the spatial representation or simply the increased input dimensionality. Adding a baseline with flattened topographic maps (which retain dimensionality but eliminate spatial structure) would better validate the claimed contribution of spatial modeling.

- The manuscript claims that cross-modal synchronization jitter is below 5 ms, yet no empirical measurements or validation are provided in the results section. Since this temporal precision is critical to the causal inference, reporting measured jitter values across modality pairs will improve methodological transparency.

- The safety override mechanism in Equation (8) employs participant-specific thresholds determined during calibration, but the calibration procedure is only briefly described. Additional details on threshold determination (e.g., based on warm-up data distributions, percentile selection) and whether thresholds are fixed or updated online will enhance reproducibility.

- The reported 31% reduction in communication breakdown error (CBE) lacks statistical comparison between the fixed dual-loop and safety-constrained conditions. Including statistical tests will clarify the practical value added by the constrained optimization.

- Existing relevant works could be discussed, such as Turbidity-Similarity Decoupling, LSINet rail, MFFENet, PGDENet, IRFR-Net, MMSMCNet, EGFNet AAAI, LSNet salient, WaveNet salient, CCAFNet, ECFFNet, and SPGSNet-S.

- There are inconsistencies between figure citations in the text and the actual figure numbering in the PDF (e.g., “Figure 4” cited in text does not match the labeled figure). This formatting error should be corrected during production.

- The derivation of the Cognitive Load Index (CLI) in Appendix B uses fixed weighting coefficients (λ₁–λ₄) without justification or sensitivity analysis. As CLI is a key control variable, clarifying whether these weights are empirically determined or participant-adaptive will strengthen the methodological foundation.

Author Response

We sincerely thank the reviewer for the careful and constructive evaluation of our manuscript. We greatly appreciate the positive assessment of the technical sophistication, experimental design, and overall relevance of the work. We have revised the manuscript extensively in response to the reviewer’s comments. In particular, we have refined the terminology used to describe the EEG representation, expanded the methodological details on synchronization validation, threshold calibration, and CLI construction, clarified the distinction between the fixed and constrained dual-loop settings, corrected figure-citation inconsistencies, and improved the English throughout the manuscript.

Below, we provide a point-by-point response.

Comment 1

The term “imaging” is overused throughout the manuscript. The spectral-topographic EEG maps are generated via spatial interpolation from a 14-channel consumer-grade headset, rather than true neuroimaging modalities (e.g., fMRI, high-density EEG source localization). A more precise description such as “topographic representation” or “spatially mapped EEG” is recommended to better match the methodological scope.

Response:
We fully agree with this observation and thank the reviewer for raising it. In the revised manuscript, we have substantially reduced the use of the term “imaging” and replaced it, where appropriate, with more precise expressions such as “spectral-topographic EEG representations,” “topographic EEG maps,” and “spatially mapped EEG.” We also added explicit clarification in the Discussion and Conclusions that the proposed maps are interpolated topographic representations derived from a wearable 14-channel EEG system, and should not be interpreted as equivalent to high-density source-resolved neuroimaging. This revision was made precisely to align the terminology with the actual methodological scope of the study.

Comment 2

The decoder ablation results in Table 5 indicate that spectral-topographic EEG maps outperform raw channel-wise signals and spectral feature vectors. However, all baseline comparisons adopt the same CNN-LSTM architecture. It remains unclear whether the performance gain arises from the spatial representation or simply the increased input dimensionality. Adding a baseline with flattened topographic maps (which retain dimensionality but eliminate spatial structure) would better validate the claimed contribution of spatial modeling.

Response:
We thank the reviewer for this insightful suggestion. We agree that a flattened-topographic baseline would provide an even more stringent isolation of the contribution of spatial structure. In the current revision, we have addressed this concern in two ways. First, we have tempered the strength of the claim in the manuscript and now state more carefully that the reported ablation supports the practical usefulness of the spectral-topographic representation relative to the tested alternatives, rather than establishing an exhaustive causal decomposition of all representational factors. Second, we have explicitly acknowledged in the Discussion that additional ablations, including same-dimensionality baselines that remove spatial topology while preserving input size, are an important direction for future work.

We believe this revision makes the claim more methodologically precise. We agree that the suggested flattened-map baseline is valuable and will include it in the next experimental extension of the framework.
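
To make the proposed control concrete, a same-dimensionality baseline could be prepared as sketched below. This is not an experiment reported by the authors. Flattening corresponds to the reviewer's suggestion; the fixed pixel permutation is an additional option assumed here for the case where the convolutional decoder is kept, since it preserves input size while destroying spatial adjacency.

```python
import random


def flatten_map(topo_map):
    """Flatten a channels x height x width nested list into one vector."""
    return [v for channel in topo_map for row in channel for v in row]


def make_permutation(n, seed=0):
    """One fixed pixel permutation, reused for every sample and every split."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


def scramble(vector, order):
    """Reorder a flattened map so neighbouring pixels are no longer adjacent."""
    return [vector[i] for i in order]


# Toy 4 x 32 x 32 map whose values encode their own position.
topo = [[[c * 1024 + r * 32 + col for col in range(32)] for r in range(32)]
        for c in range(4)]
flat = flatten_map(topo)
order = make_permutation(len(flat))
scrambled = scramble(flat, order)
print(len(flat), sorted(scrambled) == flat)  # 4096 True
```

The permutation must be drawn once and applied identically to training and test data; redrawing it per sample would remove all learnable structure rather than only the spatial topology.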

Comment 3

The manuscript claims that cross-modal synchronization jitter is below 5 ms, yet no empirical measurements or validation are provided in the results section. Since this temporal precision is critical to the causal inference, reporting measured jitter values across modality pairs will improve methodological transparency.

Response:
We agree and have revised the manuscript accordingly. The synchronization validation procedure is now explicitly described in the Methods section. Specifically, we report that cross-modal synchronization was validated empirically using a simultaneous TTL hardware pulse routed to the EEG marker stream and to a photodiode positioned on the simulator display. Cross-modal latency and jitter were then computed across 1000 test events, confirming a temporal standard deviation below 5 ms.

In addition, we strengthened the Results section by explicitly linking the controller-level interpretation to this empirically validated synchronization precision. While the current manuscript reports the validated jitter as a global multimodal timing property rather than as a full pairwise distribution table, the revised text makes clear that the value is measured rather than assumed.

Comment 4

The safety override mechanism in Equation (8) employs participant-specific thresholds determined during calibration, but the calibration procedure is only briefly described. Additional details on threshold determination (e.g., based on warm-up data distributions, percentile selection) and whether thresholds are fixed or updated online will enhance reproducibility.

Response:
Thank you. We have expanded the calibration procedure in both the main Methods section and Appendix B. The revised manuscript now explains that during the 5-minute warm-up phase, each participant’s continuous CLI was recorded while performing the baseline task. The intra-subject mean $\mu$ and standard deviation $\sigma$ were then computed, and the thresholds were defined as:

  • $\tau = \kappa_{yel} = \mu + 1.5\sigma$
  • $\kappa_{red} = \mu + 2.0\sigma$

We also now clarify explicitly that these thresholds are participant-specific but fixed during the subsequent online run, rather than adaptively updated online. This clarification was added to improve reproducibility and to make the safety-override mechanism more transparent.

Comment 5

The reported 31% reduction in communication breakdown error (CBE) lacks statistical comparison between the fixed dual-loop and safety-constrained conditions. Including statistical tests will clarify the practical value added by the constrained optimization.

Response:
We appreciate this comment. In the revised manuscript, we simplified the controller-level presentation to avoid ambiguity and to keep the reported inferential comparisons aligned with the four-condition experimental design: Open-loop, Pilot-only, Engineer-only, and Dual-loop. As a consequence, the final reported controller-level inferential analysis focuses on the comparisons supported directly by this design, especially the contrast between the full dual-loop condition and the open-loop baseline.

We agree that a direct statistical comparison between a fixed dual-loop controller and a safety-constrained controller would be valuable for quantifying the marginal benefit of the constrained optimization mechanism itself. However, because the final revised manuscript now centers the empirical analysis on the four-condition within-subject design, we chose not to retain a separate inferential claim based on an additional controller variant unless it could be reported with full consistency throughout the paper. We have therefore reframed the text to avoid overclaiming and to keep the inferential conclusions tied to the explicitly reported experimental conditions.

Comment 6

Existing relevant works could be discussed, such as Turbidity-Similarity Decoupling, LSINet rail, MFFENet, PGDENet, IRFR-Net, MMSMCNet, EGFNet AAAI, LSNet salient, WaveNet salient, CCAFNet, ECFFNet, and SPGSNet-S.

Response:
We thank the reviewer for this extensive list of additional references. We carefully considered these works. Many of the cited studies belong primarily to the domains of computer vision, image enhancement, saliency detection, and visual segmentation, whereas the present manuscript focuses on EEG-based topographic representation learning, multimodal neurophysiological synchronization, and dual-loop control in human–AI teaming. For this reason, we prioritized related work that is more directly aligned with the methodological core of the paper: EEG workload decoding, passive BCI, multimodal fusion, haptic assistance, synchronization frameworks, and multi-agent reinforcement learning.

That said, the reviewer’s broader point is well taken: our work should be positioned more carefully with respect to the idea of structured spatial representation learning. In response, we revised the text to sharpen this positioning conceptually, while keeping the reference set focused on the most directly relevant prior work for the EEG and neuro-adaptive control setting.

Comment 7

There are inconsistencies between figure citations in the text and the actual figure numbering in the PDF (e.g., “Figure 4” cited in text does not match the labeled figure). This formatting error should be corrected during production.

Response:
We thank the reviewer for noticing this issue. We carefully checked and corrected the figure numbering and in-text figure citations throughout the manuscript. The revised version now aligns the textual references with the actual figure numbers in the compiled document.

Comment 8

The derivation of the Cognitive Load Index (CLI) in Appendix B uses fixed weighting coefficients ($\lambda_1$–$\lambda_4$) without justification or sensitivity analysis. As CLI is a key control variable, clarifying whether these weights are empirically determined or participant-adaptive will strengthen the methodological foundation.

Response:
We appreciate this important comment. In the revised Appendix B and Methods section, we now clarify that the CLI weights $\lambda_1$–$\lambda_4$ are calibration weights fixed after the warm-up phase and used thereafter as part of the participant-specific operational calibration. We also clarify that the CLI itself is then normalized robustly to the unit interval and used as a control-oriented workload index rather than as an absolute psychometric measure of “true” cognitive load.

We agree that a formal sensitivity analysis of the CLI weighting scheme would further strengthen the methodological foundation. In the current revision, we have therefore made the interpretive status of the CLI more explicit and identified weight-sensitivity analysis and alternative weighting strategies as important directions for future work.
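
A weight-sensitivity check of the kind deferred to future work could start from something as simple as the sketch below. It rests on several assumptions: the CLI is taken to be a weighted sum of four features whose identities are not given in this record, the robust normalization is assumed to clip and scale between the 5th and 95th warm-up percentiles, and all function names are hypothetical.

```python
def raw_cli(features, weights):
    """Weighted sum of four workload features (feature names are hypothetical)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def percentile(values, q):
    """Lower-rank percentile on a sorted copy, q in [0, 1]."""
    ordered = sorted(values)
    return ordered[int(q * (len(ordered) - 1))]


def robust_unit_scale(value, warmup_values, lo_q=0.05, hi_q=0.95):
    """Clip-and-scale to [0, 1] using warm-up percentiles (assumed scheme)."""
    lo, hi = percentile(warmup_values, lo_q), percentile(warmup_values, hi_q)
    if hi <= lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def weight_sensitivity(features, weights, delta=0.1):
    """Change in raw CLI when each weight is scaled by (1 + delta) in turn."""
    base = raw_cli(features, weights)
    return [raw_cli(features, weights[:i] + [w * (1 + delta)] + weights[i + 1:]) - base
            for i, w in enumerate(weights)]


weights = [0.25, 0.25, 0.25, 0.25]       # illustrative values only
features = [1.0, 2.0, 3.0, 4.0]          # illustrative values only
print(raw_cli(features, weights), weight_sensitivity(features, weights))
```

Because the thresholds are defined relative to each participant's own warm-up distribution, a useful follow-up would be to recompute them after each perturbation and count how many gating decisions change, rather than looking at the raw index alone.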

Comment on English language

The English could be improved to more clearly express the research.

Response:
We thank the reviewer for this suggestion. The manuscript has been revised throughout to improve English clarity, readability, and terminological consistency. We refined phrasing, reduced ambiguity, and improved the explanation of the figures, methods, and inferential interpretation.

We thank the reviewer again for the careful and constructive feedback. We believe the revised manuscript is now substantially clearer, methodologically stronger, and more appropriately scoped in its terminology and claims.

Reviewer 4 Report

Comments and Suggestions for Authors

The sample size used in the study was insufficient.

Only a simulation study is presented in this paper. Real-time implementation is required for this study.

Figure captions are unclear in some figures. Kindly update.

The model generated in this study was generalized. The EEG signal is always different for an individual.

Only a limited scope is discussed.

Statistical Analysis was not sufficient for this study.

The results obtained in this study are appreciated.

Author Response

We sincerely thank the reviewer for the constructive assessment of our manuscript and for recognizing the value of the obtained results. We appreciate the comments regarding sample size, implementation setting, figure clarity, EEG individuality, scope, and statistical analysis. In the revised manuscript, we have addressed these concerns by clarifying the study scope, strengthening the methodological and statistical reporting, revising the figure captions and explanations, and making the limitations and generalization boundaries much more explicit.

Below, we respond point by point.

Comment 1

The sample size used in the study was insufficient.

Response:
We appreciate this important observation. We agree that the sample size is moderate and does not support strong claims of broad cross-subject generalization. In the revised manuscript, we now explicitly acknowledge this point in the Discussion and Conclusions sections. We state more clearly that the study should be interpreted as a controlled feasibility study based on $N = 25$ pilot–engineer pairs, rather than as definitive evidence of population-level deployment readiness. We also clarify that larger cohorts are needed in future work to better characterize inter-individual variability, rare failure cases, and subject-independent transfer.

Comment 2

Only a simulation study is presented in this paper. Real-time implementation is required for this study.

Response:
Thank you. We agree that real-world implementation is an important next step. In the present work, our objective was to establish the feasibility of the framework under a high-fidelity synchronized dual-station simulation, which allows controlled workload induction, precise event labeling, and temporally grounded evaluation of communication timing. To address the reviewer’s concern, we strengthened the manuscript in two ways:

  1. We now make it explicit that the contribution is a real-time-capable simulated implementation, not a live field deployment.
  2. We added a dedicated Implementation and Runtime Environment subsection specifying that the full decoding-and-control pipeline was implemented in Python 3 / PyTorch 2.x and executed on a workstation with an NVIDIA RTX 3090 GPU, with forward-pass times below 10 ms and end-to-end latency below 50 ms.

These additions provide stronger evidence that the framework is computationally compatible with real-time operation, while still making clear that field deployment remains future work.

Comment 3

Figure captions are unclear in some figures. Kindly update.

Response:
We thank the reviewer for this comment. We revised the figure captions throughout the manuscript and also improved the figure-oriented explanations in the main text. In particular, we clarified the meaning of the input streams in the conceptual pipeline, explained the temporal role of the spectral-topographic maps, and ensured that the captions describe the methodological purpose of each figure more explicitly. We also corrected formatting and figure-reference inconsistencies so that the figures are now easier to interpret and properly aligned with the text.

Comment 4

The model generated in this study was generalized. The EEG signal is always different for an individual.

Response:
We agree with the reviewer. Inter-individual variability is a central challenge in EEG-based systems, and the original wording may have sounded too broad. In the revised manuscript, we explicitly clarify that the present study does not claim full subject-independent generalization. Instead, decoder evaluation was conducted under a chronological held-out subject-dependent protocol, which supports within-subject temporal generalization under realistic online conditions. We also emphasize that the framework uses participant-specific calibration, including individualized CLI thresholds and subject-level normalization, precisely because EEG signals differ substantially across individuals. This limitation is now clearly discussed in the manuscript, and cross-subject transfer is identified as an important direction for future work.

Comment 5

Discussed only limited scopes.

Response:
Thank you. We understand this comment as referring to both application scope and discussion breadth. In response, we expanded the Discussion and Future Directions sections to better situate the study within a broader research context. The revised manuscript now discusses:

  • the distinction between controlled feasibility and field deployment,
  • subject-dependent versus subject-independent generalization,
  • limitations of wearable 14-channel EEG,
  • richer future representations and stronger ablations,
  • communication support beyond gating,
  • cross-domain extension to other safety-critical settings, and
  • governance, privacy, and deployment protocols.

These revisions broaden the scope of the discussion while keeping the empirical claims consistent with the presented evidence.

Comment 6

Statistical Analysis was not sufficient for this study.

Response:
We appreciate this comment and have strengthened the statistical reporting substantially. The revised manuscript now includes:

  • explicit description of the one-way repeated-measures ANOVA used across the four experimental conditions,
  • clarification that post-hoc paired t-tests were adjusted using Bonferroni correction,
  • reporting of effect sizes as partial eta-squared (ηp²) and Cohen’s d,
  • an inferential summary table for the principal controller-level outcomes, and
  • confidence intervals for the key paired contrasts.

We also clarified the conditional nature of the peak-load analysis and reported the size of the subset used in that analysis. These additions were made to ensure that the statistical interpretation is more transparent and better aligned with the analysis plan.
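The quantities named in this list can be illustrated with a short standard-library sketch. It is an illustration only, not the authors' analysis code: the function names are hypothetical, the paired effect size uses the d_z convention (mean difference divided by the standard deviation of the differences), which is an assumption because the response does not say which variant of Cohen's d was reported, and the numbers in the usage lines are toy values, not study data.

```python
from statistics import mean, stdev


def bonferroni(p_values):
    """Bonferroni-adjust a family of p-values (multiply by family size, cap at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def cohens_dz(a, b):
    """Paired-samples effect size d_z (assumed variant): mean(diff) / sd(diff)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


# Toy values only. Four conditions give 6 pairwise contrasts,
# so each raw p-value is multiplied by 6.
raw_p = [0.001, 0.004, 0.020, 0.030, 0.200, 0.600]
print(bonferroni(raw_p))
print(cohens_dz([3.0, 5.0, 7.0], [1.0, 2.0, 3.0]))
print(partial_eta_squared(30.0, 70.0))
```

With four within-subject conditions there are six pairwise contrasts, so under Bonferroni correction a raw p-value must fall below 0.05 / 6 ≈ 0.0083 to remain significant at the 0.05 level.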

Comment 7

The results obtained from this study were appreciated.

Response:
We sincerely thank the reviewer for this positive assessment. We appreciate the recognition of the results and have worked carefully to improve the clarity, transparency, and methodological precision of the revised manuscript so that the contribution is more rigorously presented.

Comment on English language

The English could be improved to more clearly express the research.

Response:
We thank the reviewer for this suggestion. The manuscript has been revised throughout to improve English clarity, technical readability, and terminological consistency. We refined figure captions, improved transitions between sections, reduced ambiguity in the methodological description, and aligned the tone of the Discussion and Conclusions with the actual scope of the study.

We thank the reviewer again for the constructive comments and encouraging evaluation. We believe that the revised manuscript now presents the contribution more clearly, with stronger methodological detail, more explicit limitations, and more complete statistical reporting.

Reviewer 5 Report

Comments and Suggestions for Authors

The entire decoder validation hinges on how the three neurocognitive classes (CA, DA, SU) were labeled in the training data, yet this is never explained. The paper must clarify:

  • Were labels derived from scripted task events (e.g., "a perturbation was injected at t=X, therefore all EEG within ±2 s is labeled SU")? If so, the 93.6% accuracy may reflect event-induced EEG stereotypy rather than genuine cognitive-state discrimination, inflating the apparent performance.
  • Were labels assigned by independent raters using physiological or behavioral criteria?
  • What is the temporal window used for label assignment relative to the triggering event?
  • What is the class distribution across participants and sessions?

The CNN-LSTM model achieves 93.6% accuracy. Was the model trained and tested within the same participants (leave-one-session-out, k-fold within subject)? Is there a held-out test set independent of the training and validation data? How was inter-individual variability, which is repeatedly acknowledged as a "major challenge" in the EEG workload literature (references 18, 20, 24), actually handled beyond "participant-specific calibration"?

 

CNN-LSTM Architecture and MAPPO Training Are Insufficiently Specified for Reproducibility. 

 

The study runs four within-subject conditions with multiple outcome variables (RT, CBE, CLI overload, message volume). Several concerns arise:

  • With N=25 pairs, the study is likely underpowered for detecting moderate effect sizes after correction for multiple comparisons. No power analysis or sample-size justification is provided.
  • p-values are reported only in aggregate, not condition-by-condition in Table 7.
  • Effect sizes are mentioned in the methods section but not reported in the results.
  • The 73% reduction in CBE during "peak-load windows" is striking but lacks a confidence interval, the precise window definition, or the number of such windows that contributed to the estimate. This figure may reflect a small subsample and could be highly variable.
  • No correction for multiple comparisons is described despite several simultaneous outcome tests.

The experimental environment is described as a "high-fidelity dual-station simulation" but critical details are absent:

  • What is the actual task domain - motorsport, aviation, emergency response? The paper alludes to pilot-engineer pairs in motorsport/flight contexts but commits to neither.
  • What software constitutes the simulator?
  • What are the specific hazard types and perturbations used to induce peak workload and Surprise/Startle? How were these standardized across sessions?
  • What was the total session duration and the proportion of each EEG-labeled condition?
  • How were the "peak-load windows" formally defined for the 73% CBE reduction analysis?
  • What was the warm-up duration, and how much data was used for participant-specific calibration?

The RT improvement (487 ms → 231 ms) represents a reduction of over 250 ms, which is an unusually large effect for a within-session haptic assistance intervention in a high-fidelity simulation. Similarly, 73% reduction in peak-load interruptions with only 25 pairs is an extraordinary claim. The manuscript should:

  • Report individual-level variability (the SDs are reported in Table 7 but not discussed critically; note that an RT SD of ±156 ms at baseline vs. ±72 ms in the pilot-only condition suggests not just a location shift but a change in distribution shape, which requires explanation).
  • Address whether any participants showed null or adverse effects, and whether responder analysis was performed.
  • Clarify the test used for the 31% CBE reduction (paired t-test? Wilcoxon signed-rank?), as the outcome is count-based and may not be normally distributed.
  • Discuss whether the MAPPO policy learned a trivially conservative strategy (e.g., near-permanent BLOCK) that mechanically reduces CBE without genuine intelligence.

 

 

Author Response

We sincerely thank the reviewer for the very careful, detailed, and constructive evaluation of our manuscript. We greatly appreciate the depth of the review and agree that several parts of the original submission required stronger methodological clarification, tighter statistical reporting, and more carefully bounded claims. In response, we substantially revised the manuscript to clarify the event-anchored labeling procedure, the chronological train/validation/test partition, the CNN–LSTM and MAPPO specifications, the experimental protocol, the inferential reporting, and the limitations regarding generalization and simulation-only validation. We also corrected figure-citation inconsistencies and improved the presentation of the main results and their interpretation.

Below, we provide a point-by-point response.

Comment 1

The entire decoder validation hinges on how the three neurocognitive classes (CA, DA, SU) were labeled in the training data, yet this is never explained. The paper must clarify: Were labels derived from scripted task events? Were labels assigned by independent raters? What is the temporal window? What is the class distribution across participants and sessions?

Response:
We fully agree that the validity of the decoder depends critically on the labeling procedure. In the revised manuscript, we added a dedicated subsection titled “Operational Neurostate Labeling”, where we now explain explicitly how the three classes were defined. The labels were derived from synchronized task events, gaze behavior, and perturbation markers, not from post hoc subjective labeling by independent raters.

More specifically:

  • Channelized Attention (CA) was assigned to windows in which the pilot maintained gaze on the central control region, performed the primary task without secondary interference, and showed nominal task execution.
  • Diverted Attention (DA) was assigned to windows corresponding to secondary-task interference, including auditory n-back prompts and lateral panel-management demands, together with gaze shifts outside the primary task region.
  • Surprise/Startle (SU) was assigned to windows within the 2–3 s interval immediately following unexpected perturbations, such as abrupt high-priority alarms or critical failures.

We also now make explicit that the labels are operational and event-anchored, rather than claims of exhaustive access to latent neurocognitive categories in a strong neuroscientific sense. In addition, we introduced a dedicated subsection on class distribution, reporting the final chronologically partitioned dataset size and the empirical class composition: CA = 325,000 windows (65.0%), DA = 125,000 windows (25.0%), SU = 50,000 windows (10.0%), for a total of 500,000 valid windows. We also now discuss explicitly in the Discussion that the high decoder performance must be interpreted relative to this operational labeling scheme.
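The reported class composition can be checked with a few lines of Python. The sketch below is illustrative only: the variable names are hypothetical, and the majority-class baseline is an added reference point for reading the 93.6% accuracy, not a figure taken from the manuscript.

```python
# Window counts per class, as reported in the response above.
counts = {"CA": 325_000, "DA": 125_000, "SU": 50_000}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent class.
majority_label = max(counts, key=counts.get)
majority_baseline = shares[majority_label]

for label, share in shares.items():
    print(f"{label}: {counts[label]:>7,d} windows ({share:.1%})")
print(f"total: {total:,d} windows")
print(f"majority-class baseline ({majority_label}): {majority_baseline:.1%}")
```

Because the classes are imbalanced, always predicting CA would already score 65% on this composition, which is the floor against which overall accuracy should be read.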

Comment 2

The CNN-LSTM model achieves 93.6% accuracy. Whether the model was trained and tested within the same participants? Whether there is a held-out test set independent of the training and validation data? How inter-individual variability was actually handled beyond participant-specific calibration?

Response:
Thank you. We have clarified this point explicitly in the revised manuscript. The decoder was evaluated under a subject-dependent chronological block partition, not under random shuffling. For each participant/session, the earliest 70% of the recording was used for training, the following 10% for validation, and the final continuous 20% for held-out test evaluation. This design avoids temporal leakage and provides a more realistic estimate of online within-subject deployment than random splitting.

We also now state clearly that this protocol supports within-subject temporal generalization, but does not establish subject-independent transfer. To address inter-individual variability, the framework uses participant-specific calibration, including subject-level normalization and individualized CLI thresholding. In the revised Discussion and Conclusions, we explicitly identify subject-independent transfer as an open challenge and no longer overstate the generalization scope of the current study.
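The chronological 70/10/20 block partition described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name and the use of integer percentages are hypothetical, and windows are assumed to be indexed in recording order for one participant/session.

```python
def chronological_split(n_windows, train_pct=70, val_pct=10):
    """Split window indices 0..n_windows-1 into time-ordered train/val/test ranges.

    No shuffling is applied: the earliest block is used for training, the next
    block for validation, and the final continuous block for held-out testing.
    """
    if n_windows < 0:
        raise ValueError("n_windows must be non-negative")
    if not (0 <= train_pct and 0 <= val_pct and train_pct + val_pct <= 100):
        raise ValueError("percentages must be non-negative and sum to at most 100")
    # Integer arithmetic avoids floating-point rounding at the block boundaries.
    train_end = n_windows * train_pct // 100
    val_end = n_windows * (train_pct + val_pct) // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_windows)


train, val, test = chronological_split(1000)
print(len(train), len(val), len(test))
```

If consecutive windows overlap in time, windows adjacent to a block boundary share samples, so a small gap between blocks may be needed to avoid residual leakage; the response does not state whether such a gap was used.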

Comment 3

CNN-LSTM architecture and MAPPO training are insufficiently specified for reproducibility.

Response:
We agree and have expanded the Methods section substantially. The revised manuscript now provides the CNN–LSTM architecture in explicit form and summarizes the main hyperparameters in a dedicated table. The CNN front-end is now described as three Conv2D layers with 3 × 3 kernels and 16, 32, and 64 filters, followed by ReLU activation, max-pooling, and spatial dropout. The temporal back-end is specified as an LSTM with 64 hidden units and temporal dropout of 0.3. We also clarify the temporal context (K = 10 windows, approximately 1.9 s effective lookback), the dual output heads, the loss functions, and the optimizer settings.
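To give a rough sense of model size, the layer specification above can be turned into a trainable-parameter count. The sketch below is a back-of-the-envelope illustration, not the authors' model. Several values are assumptions that the response does not state: the number of input channels (5, a hypothetical count of spectral-band maps), the LSTM input size (64, as if the last feature map were globally pooled), and the PyTorch convention of two bias vectors per LSTM gate. The output heads are omitted because their sizes are not given.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a 2D convolution."""
    return out_channels * (in_channels * kernel * kernel + 1)


def lstm_params(input_size, hidden_size):
    """Single-layer LSTM with four gates and two bias vectors per gate (PyTorch-style)."""
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


IN_CHANNELS = 5         # assumption: not stated in the response
LSTM_INPUT = 64         # assumption: features per time step after pooling
FILTERS = [16, 32, 64]  # from the response
HIDDEN = 64             # from the response

conv_total = 0
prev = IN_CHANNELS
for out in FILTERS:
    conv_total += conv2d_params(prev, out)
    prev = out

print("conv parameters:", conv_total)
print("LSTM parameters:", lstm_params(LSTM_INPUT, HIDDEN))
```

Only the first convolution depends on the assumed input channel count; the second and third layers contribute 4,640 and 18,496 parameters regardless of that choice.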

For MAPPO, we now report the main training hyperparameters, including the discount factor, GAE parameter, PPO clipping ratio, entropy coefficient, rollout horizon, mini-batch size, optimization epochs per update, and stopping criterion. We also strengthened the description of the constrained optimization logic and the safety-oriented control formulation. These revisions were introduced specifically to improve reproducibility.
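For readers less familiar with the hyperparameters listed here, the sketch below shows where the discount factor, the GAE parameter, and the PPO clipping ratio enter a standard PPO-style update. It is a generic textbook formulation, not the authors' MAPPO implementation; the function names are hypothetical, the default values (0.99, 0.95, 0.2) are common placeholders rather than the values reported in the manuscript, and the rollout is assumed to contain no episode termination.

```python
def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout, computed backwards in time."""
    advantages = [0.0] * len(rewards)
    next_value = last_value  # bootstrap value for the state after the rollout
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # discounted sum of TD errors
        advantages[t] = running
        next_value = values[t]
    return advantages


def ppo_clip_term(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped surrogate objective (to be maximized)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


print(gae_advantages([1.0, 1.0], [0.0, 0.0], last_value=0.0, gamma=1.0, lam=1.0))
print(ppo_clip_term(1.5, 1.0), ppo_clip_term(0.5, -1.0))
```

The rollout horizon sets the length of `rewards` and `values`, and the clipping ratio bounds how far a single update can move the policy away from the one that collected the data.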

Comment 4

With N = 25 pairs, the study is likely underpowered for detecting moderate effect sizes after correction for multiple comparisons. No power analysis or sample-size justification is provided. p-values are reported only in aggregate, not condition-by-condition in Table 7. Effect sizes are mentioned in the methods section but not reported in the results. The 73% reduction in CBE during peak-load windows lacks a confidence interval, precise window definition, or the number of such windows. No correction for multiple comparisons is described.

Response:
We appreciate these important statistical concerns and have revised the manuscript substantially in this respect.

First, we now state more clearly that the study should be interpreted as a controlled feasibility study with a moderate within-subject sample (N = 25 pairs), not as a fully powered population-level validation. We acknowledge explicitly in the Discussion that the sample size limits the precision of moderate-effect inference and cross-subject conclusions.

Second, we strengthened the inferential reporting by:

  • explicitly describing the use of one-way repeated-measures ANOVA across the four experimental conditions,
  • stating that post-hoc paired comparisons were adjusted using Bonferroni correction,
  • reporting effect sizes as partial eta-squared (ηp²) and Cohen’s d, and
  • adding an inferential summary table that reports the principal controller-level statistical results.

Third, regarding the 73% reduction in CBE during peak-load windows, we now define these windows explicitly as intervals in which the participant-specific CLI exceeded the overload threshold τ and/or windows located within the 2–3 s interval following an unexpected perturbation marker. We also report that the peak-load subset comprised 50,000 windows, corresponding to 10.0% of the final chronologically partitioned dataset. To avoid overstating the result, the manuscript now frames the 73% value as a conditional effect on a predefined high-risk subset, rather than as a session-wide average effect.
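The peak-load definition given here can be expressed as a simple selection rule. The sketch below is one possible reading, offered as an illustration rather than the authors' implementation: the function and argument names are hypothetical, each window is represented by a single timestamp in seconds, and the post-perturbation interval is a parameter (`post_s`, defaulting to 3.0 s) because the response gives it only as "2–3 s".

```python
def peak_load_mask(window_times, cli, tau, perturbation_times, post_s=3.0):
    """Flag windows whose CLI exceeds tau and/or that follow a perturbation within post_s."""
    mask = []
    for t, c in zip(window_times, cli):
        over_threshold = c > tau
        after_perturbation = any(0.0 <= t - p <= post_s for p in perturbation_times)
        mask.append(over_threshold or after_perturbation)
    return mask


# Toy example: one overload window at t = 1 s and one perturbation at t = 3 s.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cli = [0.2, 0.9, 0.3, 0.3, 0.3, 0.3, 0.3]
mask = peak_load_mask(times, cli, tau=0.8, perturbation_times=[3.0], post_s=2.0)
print(mask)
print(f"peak-load share: {sum(mask) / len(mask):.1%}")
```

Reporting the share of flagged windows alongside any conditional effect, as the response does with the 10.0% figure, makes clear how small the subset behind the estimate is.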

We agree that a formal prospective power analysis and still richer inferential reporting would further strengthen the study, and we now acknowledge this more explicitly as a limitation and future direction.

Comment 5

The experimental environment is described as a “high-fidelity dual-station simulation” but critical details are absent: what is the actual task domain? what software constitutes the simulator? what hazards and perturbations were used? what was the total session duration? how were peak-load windows defined? what was the warm-up duration?

Response:
Thank you. We have revised the protocol description to make the experimental environment much more explicit. The revised manuscript now describes the study as a synchronized dual-station simulation environment involving continuous pilot control and telemetry-supervision tasks under time pressure. To avoid ambiguity, we now present the task domain as a generic pilot-control / telemetry-support scenario rather than alternating loosely between motorsport and aviation examples.

We also now specify that:

  • secondary-task load consisted of auditory n-back prompts and lateral panel-management demands,
  • perturbation episodes comprised abrupt high-priority alarms and critical failures,
  • each of the four conditions lasted 10 min, yielding 40 min of active task time per participant,
  • the order of conditions was counterbalanced using a balanced Latin-square design,
  • the initial warm-up / calibration phase lasted 5 min, and
  • peak-load windows were defined formally using the participant-specific CLI threshold and/or the 2–3 s post-perturbation interval.
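The balanced Latin-square counterbalancing mentioned in this list can be generated with a standard construction for an even number of conditions. The sketch below is illustrative: the function name and the condition labels A to D are hypothetical placeholders, and the rule for assigning participants to rows is an assumption, since the response does not describe it.

```python
def balanced_latin_square(n):
    """Return n orderings of conditions 0..n-1 (n even).

    Each condition appears once per position, and each condition immediately
    follows every other condition exactly once across the n orderings.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even n >= 2")
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Remaining rows are cyclic shifts of the first row.
    return [[(c + shift) % n for c in first] for shift in range(n)]


labels = ["A", "B", "C", "D"]  # hypothetical names for the four conditions
square = balanced_latin_square(4)
for participant in range(8):
    order = square[participant % 4]  # assumed assignment rule: cycle through rows
    print(participant + 1, [labels[c] for c in order])
```

With four orderings and 25 pairs, the orderings cannot be used equally often, so one ordering necessarily appears at least once more than the others.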

These revisions were introduced to improve transparency and standardization of the experimental design.

With respect to the simulator software itself, the revised manuscript focuses on the functional task structure, synchronized event logic, and closed-loop control setting, which are central to the contribution. The scientific claim of the paper concerns the neuro-adaptive sensing/control pipeline rather than the branding of a simulator engine, and we therefore chose to foreground the operational structure of the environment rather than a software label.

Comment 6

The RT improvement (487 ms → 231 ms) and the 73% reduction in peak-load interruptions are unusually large. The manuscript should report individual-level variability, address whether any participants showed null or adverse effects, clarify the statistical test used for the 31% CBE reduction, and discuss whether the MAPPO policy learned a trivially conservative strategy (e.g., near-permanent BLOCK).

Response:
We appreciate this important point and agree that unusually strong effects must be interpreted with caution. In the revised manuscript, we have strengthened the Discussion to address this explicitly.

First, we now discuss the reported standard deviations more critically and note that the observed reduction in RT variance suggests not only a shift in the mean but also a stabilization effect under adaptive support. At the same time, we no longer present these gains as universally generalizable beyond the present controlled within-subject setting.

Second, we clarify in the Results and Methods that the controller-level statistical comparisons were conducted within the repeated-measures framework described in the paper, with corrected post-hoc contrasts and reported effect sizes. We agree that count-valued outcomes such as CBE deserve careful treatment, and we therefore bound the interpretation of the inferential results more cautiously in the revised manuscript.

Third, regarding the possibility of a trivially conservative policy, we now address this directly. The revised Discussion notes that the dual-loop configuration did reduce overall engineer message volume, but not to near-zero levels. Importantly, the controller simultaneously preserved or improved operational progress, reduced overload exposure, and maintained low RT, which is inconsistent with a degenerate near-permanent BLOCK strategy. In other words, the observed gains are associated with more selective communication timing, not simple communication suppression. We have made this interpretation explicit in the revised text.

We agree that a formal responder analysis and participant-level adverse-effect characterization would be valuable additional analyses. In the revised manuscript, we now acknowledge these as worthwhile future extensions rather than leaving them implicit.

Comment 7

There are inconsistencies between figure citations in the text and the actual figure numbering in the PDF.

Response:
Thank you for noting this. We carefully checked and corrected the in-text figure references and their correspondence with the actual figure numbering in the revised manuscript. We also improved figure captions and integrated the figures more explicitly into the surrounding text.

Comment 8

The derivation of the Cognitive Load Index (CLI) in Appendix B uses fixed weighting coefficients without justification or sensitivity analysis.

Response:
We appreciate this comment. In the revised Appendix B and Methods section, we now clarify that the CLI is an operationally calibrated control variable, not an absolute psychometric measure of “true” cognitive load. The weighting coefficients are introduced after warm-up calibration and then used within the participant-specific normalization framework. We also now make more explicit that the CLI is designed to support real-time gating and overload estimation under the present framework.

We agree that a full sensitivity analysis of the weighting coefficients would be a valuable extension. In the revised manuscript, we therefore clarify the role of the CLI more carefully and identify weight-sensitivity analysis and alternative weighting strategies as future work.

Closing statement

We are very grateful for the rigorous and thoughtful evaluation. The comments helped us strengthen the manuscript substantially. In particular, the revised version now provides a much more transparent account of the event-anchored labeling procedure, chronological held-out testing protocol, architecture and training details, statistical reporting, experimental design, and interpretation boundaries. We believe these revisions have significantly improved the methodological clarity and credibility of the work.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have no problems with this version and recommend acceptance.

Author Response

We sincerely appreciate your positive feedback and your recommendation for acceptance. Thank you for your time and valuable contribution to improving our manuscript.

Sincerely,

The Authors

Reviewer 2 Report

Comments and Suggestions for Authors

To submit a good journal article, please pay attention to the journal's format requirements and double-check before submitting.

Comments on the Quality of English Language

1. The author completed the illustrations through an online preview revision, but the reference numbers throughout the article were all "?", and the article did not meet the journal's standards.
2. No references were cited. Please check.

Author Response

Dear Reviewer,

We sincerely apologize for this issue. You are completely right that the previous document lacked the proper references. This was caused by a technical compilation error during the PDF generation process in the journal's submission system. This software glitch unfortunately broke the citation links, replacing all in-text citations with "?" and entirely omitting our bibliography from the file you reviewed.

We have thoroughly double-checked the current version, and this technical issue has been fully resolved. All in-text citations are now correctly numbered throughout the manuscript, and the complete reference list is properly included at the end of the document. We appreciate your patience and thank you for bringing this to our attention.

Thank you again for your constructive feedback and for your time.

Sincerely,

The Authors

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have thoroughly and satisfactorily addressed all of my previous concerns. I am therefore pleased to recommend this manuscript for publication.

Author Response

Dear Reviewer,

We would like to express our sincere gratitude for your positive feedback and for recommending our manuscript for publication. We deeply appreciate the time and effort you dedicated to reviewing our work. Your constructive comments during the review process have been invaluable in improving the quality, rigor, and clarity of our research.

Thank you again for your support and for your vital contribution to this process.

Sincerely,

The Authors

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

It has met the standards for journal publication.

Comments on the Quality of English Language

It has met the standards for journal publication.
