Bio-Inspired Adaptive Multimodal Decision Fusion for Intelligent Safety Monitoring in Confined Spaces
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. Novelty and Contribution
- The BA-MDF framework appears to be a heuristic, rule-based system rather than a fundamentally new algorithmic contribution.
- The biological inspiration is largely conceptual and not rigorously translated into a mathematical model.
Recommendation:
- Provide a formal mathematical definition of the inverse effectiveness mechanism.
- Compare the proposed method against established adaptive fusion techniques.
- Clarify what distinguishes this approach from standard reliability-weighted fusion.
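Reviewer 1 refers to standard reliability-weighted fusion, and Reviewer 2 describes weights inversely proportional to signal variance. Both point to the same baseline. The sketch below shows that baseline and may help frame the comparison the recommendations ask for. It assumes that each modality supplies a risk score in [0, 1] together with a variance estimate. The function name, modality keys, and numbers are hypothetical placeholders, not values from the manuscript. Dropping an unavailable modality (for example, GPS inside a tunnel) and renormalizing the remaining weights is one simple, assumed way to handle a missing input.

```python
from typing import Dict, Optional, Tuple


def inverse_variance_fusion(
    estimates: Dict[str, Optional[Tuple[float, float]]],
) -> float:
    """Fuse per-modality risk scores with weights proportional to 1 / variance.

    `estimates` maps a modality name to (risk_score, variance), or to None when
    that modality is unavailable. Unavailable modalities are skipped and the
    remaining weights are renormalized so they sum to 1.
    """
    available = {name: est for name, est in estimates.items() if est is not None}
    if not available:
        raise ValueError("no modality available for fusion")
    inv_var = {}
    for name, (_, variance) in available.items():
        if variance <= 0:
            raise ValueError(f"variance for {name!r} must be positive")
        inv_var[name] = 1.0 / variance
    total = sum(inv_var.values())
    return sum(inv_var[name] / total * score for name, (score, _) in available.items())


# Hypothetical (risk_score, variance) pairs; not values from the manuscript.
with_gps = {"imu": (0.8, 0.01), "hrv": (0.6, 0.04), "gps": (0.2, 0.16)}
without_gps = {"imu": (0.8, 0.01), "hrv": (0.6, 0.04), "gps": None}
print(round(inverse_variance_fusion(with_gps), 4))     # 0.7333
print(round(inverse_variance_fusion(without_gps), 4))  # 0.76
```

With these inputs, the GPS estimate carries little weight because its variance is large, so removing it shifts the fused score only slightly. A formal definition of the inverse effectiveness mechanism would presumably need to show where its weights depart from this rule.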
2. Fusion Strategy is Rule-Based and Limited
The BA-MDF module relies on:
- Fixed thresholds (e.g., HRV threshold = 0.3)
- Manually assigned scores (e.g., 0.1, 0.7, 1.0)
- Linear weighted aggregation
This design limits:
- Generalizability across environments
- Adaptability to unseen conditions
- Scientific contribution relative to modern learning-based approaches
Recommendation:
- Justify the choice of thresholds and scoring values (empirical vs theoretical basis).
- Consider replacing or benchmarking against data-driven fusion methods (e.g., neural attention, probabilistic models).
- Provide sensitivity analysis for threshold selection.
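A sensitivity analysis of the kind requested can begin with a simple sweep: vary one threshold over a grid and record how detection quality changes. The sketch below assumes a hypothetical rule that flags an anomaly whenever a normalized HRV feature exceeds the threshold. The grid only brackets the 0.3 value cited above, and the features and labels are invented placeholders.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1-score, with label 1 meaning 'anomaly'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def threshold_sweep(
    values: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (threshold, F1) for a rule that flags an anomaly when value > threshold."""
    results = []
    for threshold in thresholds:
        preds = [1 if v > threshold else 0 for v in values]
        results.append((threshold, f1_score(labels, preds)))
    return results


# Made-up normalized HRV features and ground-truth labels, for illustration only.
values = [0.05, 0.12, 0.25, 0.31, 0.28, 0.45, 0.52, 0.60]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
for threshold, f1 in threshold_sweep(values, labels, [0.1, 0.2, 0.3, 0.4, 0.5]):
    print(f"threshold={threshold:.1f}  F1={f1:.3f}")
```

A flat F1 curve around the chosen value would support the threshold. A sharp peak would suggest that it is tuned to one particular dataset.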
3. Experimental Design and Dataset Limitations
The experimental validation is insufficient to support the strong claims made.
Key issues:
- Small sample size (12 participants)
- Limited dataset (1,550 samples, 6 hours of recording)
- Controlled environment rather than real deployment
- No cross-subject generalization analysis
Additionally, the dataset used for system validation is not publicly available, limiting reproducibility.
Recommendation:
- Expand the dataset or justify its adequacy.
- Include cross-validation across subjects (leave-one-subject-out).
- Provide details on variability across participants.
- Consider releasing the dataset or providing access.
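Leave-one-subject-out validation keeps each participant entirely in the test fold once. The model is therefore never evaluated on a person whose data it saw during training. A minimal stdlib sketch of the fold construction follows. The participant labels are hypothetical, and scikit-learn's `LeaveOneGroupOut` provides equivalent splitting when participant IDs are passed as `groups`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Build (held_out_subject, train_indices, test_indices) folds.

    Every sample of the held-out participant goes to the test fold, so no
    participant contributes to both training and testing in the same fold.
    """
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)
    folds = []
    for held_out in sorted(by_subject):
        test_idx = by_subject[held_out]
        train_idx = sorted(
            i for subject, idxs in by_subject.items() if subject != held_out for i in idxs
        )
        folds.append((held_out, train_idx, test_idx))
    return folds


# Hypothetical per-sample participant labels (the study reports 12 participants).
subjects = ["P01", "P01", "P02", "P02", "P02", "P03"]
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Reporting the per-fold scores, rather than only their average, would also provide the variability across participants that the third recommendation asks for.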
4. Lack of Baseline Comparisons for the Full System
While the HAR module is compared with baseline models (LSTM, FFT-LSTM), the complete BA-MDF system is not compared against alternative fusion strategies.
This is a critical omission.
Missing comparisons:
- Fixed-weight fusion
- Classical decision-level fusion (e.g., majority voting, Bayesian fusion)
- Learning-based multimodal fusion approaches
Recommendation:
- Add comparative experiments for the full pipeline.
- Demonstrate the advantage of adaptive fusion quantitatively.
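The classical decision-level baselines listed above can each be written in a few lines. The sketch below shows what a like-for-like comparison would involve. It assumes binary per-modality decisions for voting, risk scores in [0, 1] for fixed-weight fusion, and per-modality likelihood ratios for naive Bayesian fusion. All weights, priors, and ratios are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def majority_vote(decisions: Sequence[int]) -> int:
    """Decision-level fusion of binary alarms (1 = emergency, 0 = normal).

    Ties resolve to 1, a conservative choice assumed here for a safety monitor.
    """
    counts = Counter(decisions)
    return 1 if counts[1] >= counts[0] else 0


def fixed_weight_fusion(
    scores: Dict[str, float], weights: Dict[str, float], threshold: float = 0.5
) -> int:
    """Non-adaptive weighted average of risk scores, thresholded to an alarm."""
    total_weight = sum(weights[name] for name in scores)
    fused = sum(weights[name] * score for name, score in scores.items()) / total_weight
    return 1 if fused >= threshold else 0


def naive_bayes_fusion(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Posterior P(emergency) assuming conditionally independent modalities.

    Each ratio is P(observation | emergency) / P(observation | normal).
    """
    log_odds = math.log(prior / (1 - prior)) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical inputs for one decision window.
print(majority_vote([1, 0, 0]))  # 0
print(fixed_weight_fusion({"imu": 0.9, "hrv": 0.4, "gps": 0.1},
                          {"imu": 0.5, "hrv": 0.3, "gps": 0.2}))  # 1
print(round(naive_bayes_fusion(0.01, [20.0, 5.0]), 3))  # 0.503
```

On these made-up inputs, majority voting and fixed-weight fusion disagree. A full-pipeline comparison would need to quantify this kind of divergence.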
5. Overstated Performance Claims
The manuscript reports:
- 100% accuracy in emergency detection
- Significant improvements in rescue time
However:
- These results are based on simulated scenarios
- No statistical confidence intervals are provided
- No robustness analysis is included
Such claims are not sufficiently supported and may be misleading.
Recommendation:
- Report confidence intervals and standard deviations
- Clarify that results are from controlled simulations
- Temper claims accordingly
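A confidence interval shows why a perfect score on a small number of trials is weak evidence. Reviewer 2 notes that the emergency-detection result rests on 7 simulated events. The sketch below computes the Wilson score interval for 7 detections out of 7 events. It also computes the exact Clopper-Pearson lower bound, which reduces to (α/2)^(1/n) when every trial succeeds. The function names are illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)


def clopper_pearson_lower_all_correct(n: int, alpha: float = 0.05) -> float:
    """Exact two-sided lower bound when all n trials succeed: (alpha / 2) ** (1 / n)."""
    return (alpha / 2) ** (1 / n)


low, high = wilson_interval(7, 7)
print(f"Wilson 95% CI for 7/7: [{low:.3f}, {high:.3f}]")  # [0.646, 1.000]
print(f"Clopper-Pearson 95% lower bound: {clopper_pearson_lower_all_correct(7):.3f}")  # 0.590
```

Every event was detected. Even so, a true detection rate anywhere from roughly 59% to 100% is consistent with the data under the exact interval. This is why the reported 100% needs its uncertainty stated alongside it.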
6. Insufficient Statistical Rigor
The evaluation lacks:
- Statistical significance testing
- Variance analysis across runs
- Confidence intervals for reported metrics
Recommendation:
- Include statistical tests (e.g., t-test, ANOVA)
- Report variability (mean ± standard deviation)
- Provide multiple experimental runs
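When both fusion variants are evaluated on the same runs, the recommended mean ± standard deviation and a paired t-test follow directly. The sketch below computes the paired t statistic and its degrees of freedom using only the standard library. The per-run F1 values are hypothetical placeholders. `scipy.stats.ttest_rel` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return mean(values), stdev(values)


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-run differences a_i - b_i."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1


# Hypothetical F1-scores from five runs of each method on identical splits.
adaptive = [0.91, 0.93, 0.90, 0.92, 0.94]
fixed = [0.88, 0.90, 0.89, 0.87, 0.91]
m, s = mean_std(adaptive)
print(f"adaptive F1: {m:.3f} ± {s:.3f}")  # adaptive F1: 0.920 ± 0.016
t, df = paired_t_statistic(adaptive, fixed)
print(f"paired t = {t:.2f}, df = {df}")  # paired t = 4.74, df = 4
```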
Minor revision
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Based on a technical review of the manuscript titled "Bio-inspired Adaptive Multimodal Decision Fusion for Intelligent Safety Monitoring Algorithm in Confined Spaces," several major flaws and critical areas for improvement have been identified.
- The inverse effectiveness principle is cited as a primary bio-inspired contribution, but its implementation is just a standard statistical method (uncertainty-weighted fusion) with weights inversely proportional to signal variance, not a new bio-mimetic discovery.
- The framework relies on GPS for spatial context and proximity analysis, but in confined spaces like mines, tanks, or tunnels, GPS signals are often unavailable or unreliable due to multipath effects and obstructions. UWB is mentioned as a substitute but not rigorously integrated into the primary validation.
- The HAR module is mainly assessed with the Reyes dataset, which was collected with waist-mounted smartphones in open areas. This does not reflect the noise, vibrations, or restricted movement usually found in industrial confined spaces.
- Field validation included just 12 participants and a total of 7 simulated emergency events. This sample size is statistically inadequate to justify claiming 100% identification accuracy for safety monitoring in life-critical situations.
- The HAR module is compared to baseline LSTMs, but the BA-MDF fusion module is not evaluated against other common multimodal fusion methods such as Kalman filters, attention-based fusion, or majority voting.
- The scores for activity and heart rate appear to be assigned arbitrary discrete values (e.g., 0.1, 0.7, 0.9) without a data-driven justification for these specific thresholds.
- The HAR module employs a 2.56-second window for classification, whereas the BA-MDF module uses a 20-second window for risk evaluation. The manuscript does not clearly explain how the high-frequency HAR outputs are smoothed or combined into the longer decision window without causing considerable delay.
- The assertion of a 68% efficiency improvement over video surveillance is misleading. Video surveillance does not work in zero-light or smoke-filled confined spaces, so its baseline is often non-operational. Comparing a working sensor system with a non-functional visual system creates an exaggerated perception of improvement.
- The system was tested on a Nuvo-6108GC equipped with an NVIDIA GeForce GTX 1050 Ti GPU. This device is a high-power industrial PC. The discussion does not cover how this architecture might scale down to low-power wearable devices like smartwatches, which have strict battery and computational limitations.
- Provide data or simulations showing system performance when GPS is entirely absent, utilizing only IMU and HRV.
- Conduct an ablation study to quantify the specific contribution of the HRV and spatial modules to the overall F1-score.
- Test the HAR module on datasets containing industrial noise (e.g., heavy machinery vibrations) to substantiate the orientation-resilience claims.
- Include a power consumption and memory footprint analysis for the FFT-LSTM model on an actual wearable ARM-based processor.
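The timing mismatch raised above (2.56-second HAR windows versus a 20-second risk-evaluation window) can be made concrete with a small sketch. Assume, since the manuscript does not state it here, that HAR windows advance with a 1.28 s hop (50% overlap). A 20 s decision window then spans ceil(20 / 1.28) = 16 HAR outputs. One simple way to combine them is a rolling mean of class probabilities, updated after every HAR window. The class name and parameters are hypothetical.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


class DecisionWindowAggregator:
    """Rolling mean of HAR class probabilities over a longer decision window.

    Hypothetical parameters: HAR outputs arrive every hop_s seconds and are
    combined over a decision window of decision_s seconds.
    """

    def __init__(self, decision_s: float = 20.0, hop_s: float = 1.28) -> None:
        if decision_s <= 0 or hop_s <= 0:
            raise ValueError("durations must be positive")
        # Number of HAR outputs that fall inside one decision window.
        self.n_windows = math.ceil(decision_s / hop_s)
        # deque(maxlen=...) silently discards the oldest output once full.
        self.buffer: Deque[List[float]] = deque(maxlen=self.n_windows)

    def update(self, probs: Sequence[float]) -> List[float]:
        """Add one HAR output and return the mean probability vector of the buffer."""
        self.buffer.append(list(probs))
        count = len(self.buffer)
        return [sum(p[c] for p in self.buffer) / count for c in range(len(probs))]


agg = DecisionWindowAggregator()
print(agg.n_windows)  # 16
smoothed: List[float] = []
for probs in ([0.9, 0.1], [0.8, 0.2], [0.2, 0.8]):
    smoothed = agg.update(probs)
print([round(p, 3) for p in smoothed])  # [0.633, 0.367]
```

Because the smoothed vector is emitted after every HAR window rather than once per 20 s block, a decision is available at every hop. The cost is that a change in activity takes several hops to dominate the average. This is the delay the reviewer asks the authors to quantify.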
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This paper proposes an intelligent safety monitoring framework utilizing multimodal data from wearable devices. This work is timely and well-written. Comments to the authors:
1) Explain the reason for using the bio-inspired adaptive multimodal decision fusion (BA-MDF) module over the traditional/conventional approach.
2) How is the LSTM used for temporal sequence modeling? Provide a step-by-step approach.
3) Rewrite the sentences by removing the word "we".
4) Explain how the GPS data are used in this paper.
5) Provide suitable references for equations 2 to 6.
7) Describe the role and working principle of the Softmax classifier.
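In an LSTM-based HAR classifier, the final fully connected layer typically outputs one raw score (logit) per activity class. The softmax function turns those scores into a probability distribution, and the predicted activity is the class with the highest probability. The minimal sketch below illustrates that working principle with made-up scores. It is not the manuscript's implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores (logits) into probabilities that are positive and sum to 1.

    Subtracting the largest logit before exponentiating prevents overflow and
    leaves the result unchanged, because the common factor cancels.
    """
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical final-layer scores for three activity classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
print([round(p, 3) for p in probs], predicted)  # [0.659, 0.242, 0.099] 0
```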
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The paper proposes a safety monitoring framework for confined spaces that integrates an enhanced FFT-LSTM model for human activity recognition with a bio-inspired adaptive multimodal decision fusion module.
While the overall concept of combining inertial sensors, heart rate variability, and geospatial data is reasonable, the manuscript requires revisions based on the comments below.
The motivation for the specific bio-inspired mechanism and its practical advantage over conventional fusion methods is not clearly justified beyond general references to biological systems.
The methodology section lacks sufficient technical depth in several areas. The reliability estimation used for adaptive weighting in the BA-MDF module is only briefly mentioned without providing the exact formula or detailed implementation steps.
The selection of thresholds for activity scoring, heart rate anomalies, and spatial proximity appears arbitrary and is not supported by systematic sensitivity analysis or ablation studies.
Additionally, the integration of AHRS orientation correction with the FFT-LSTM pipeline is described at a high level without clear justification for its contribution relative to simpler approaches.
The experimental validation has notable weaknesses. The system-level testing relies on a limited controlled simulation at a single construction site with only twelve participants, which restricts the generalizability of the results.
The paper does not provide comprehensive comparisons against other multimodal fusion strategies or state-of-the-art baselines beyond basic LSTM variants.
Furthermore, the discussion of false alarm rates and robustness under various sensor degradation scenarios remains superficial.
The related work section adequately surveys individual components but fails to critically analyze how the proposed combination of techniques advances the field.
Claims regarding the superiority of the bio-inspired adaptive fusion are not sufficiently substantiated through rigorous experimentation or theoretical analysis.
The paper also overlooks important practical considerations such as power consumption, computational overhead on wearable devices, and long-term reliability in real industrial deployments.
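Two requests can share one evaluation loop: the ablation study asked for by Reviewer 2 (the contribution of the HRV and spatial modules to the F1-score) and the ablation and sensor-degradation analysis raised in this report. That loop scores the system with every subset of modalities enabled. A minimal sketch follows. The `evaluate` callback and the toy evaluator are hypothetical stand-ins for re-running the fusion on held-out data with the excluded modalities masked.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence


def ablation_table(
    modalities: Sequence[str], evaluate: Callable[[FrozenSet[str]], float]
) -> Dict[FrozenSet[str], float]:
    """Score every non-empty subset of modalities with a caller-supplied evaluate()."""
    table: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(modalities) + 1):
        for subset in combinations(modalities, k):
            table[frozenset(subset)] = evaluate(frozenset(subset))
    return table


def drop_one_contribution(
    table: Dict[FrozenSet[str], float], modalities: Sequence[str]
) -> Dict[str, float]:
    """Score of the full system minus the score with one modality removed."""
    if len(modalities) < 2:
        raise ValueError("need at least two modalities to remove one")
    full = frozenset(modalities)
    return {m: table[full] - table[full - {m}] for m in modalities}


# Placeholder evaluator with made-up additive gains; a real study would return
# the F1-score measured with only the `active` modalities enabled.
def toy_evaluate(active: FrozenSet[str]) -> float:
    gains = {"imu": 0.5, "hrv": 0.25, "gps": 0.125}
    return sum(gains[m] for m in active)


table = ablation_table(["imu", "hrv", "gps"], toy_evaluate)
print(len(table))  # 7
print(drop_one_contribution(table, ["imu", "hrv", "gps"]))  # {'imu': 0.5, 'hrv': 0.25, 'gps': 0.125}
```

The same table also covers the GPS-absent condition requested by Reviewer 2, since the IMU-plus-HRV subset is one of its rows.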
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
All my previous comments were addressed except the one below:
In the cover letter, the authors state, "We have removed the '68% efficiency improvement' claim from the manuscript." However, this text still appears in the revised manuscript.
Author Response
Thank you for bringing this inconsistency to our attention. We have now carefully reviewed the manuscript and located the remaining unrevised text referencing the "68% efficiency improvement." We confirm that it has been corrected in the latest version.
To be specific, as mentioned in our response, we have revised the following statement in the Conclusion:
Original Statement: "...indicating a 68% efficiency improvement compared to conventional video surveillance systems."
Revised Statement: "...Critically, the proposed multimodal monitoring system remains fully operational in environments where conventional video surveillance is inherently non-functional."
Thank you again for your careful review and patience. We have updated the manuscript accordingly.
Reviewer 4 Report
Comments and Suggestions for Authors
The revised manuscript is recommended for publication.
Author Response
Thank you for your positive evaluation and recommendation for publication. We sincerely appreciate the time and expertise you have dedicated to reviewing our manuscript. Your constructive feedback has been instrumental in improving the quality of this work.

