Review Reports
- Omar S. Sonbul and
- Muhammad Rashid*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Round 1
Reviewer 1 Report (Previous Reviewer 1)
Comments and Suggestions for Authors
There are several issues that should be considered, and the following are the comments:
- In conclusion, please consider adding a sentence or two that concisely reiterates the fundamental difference between this taxonomy and prior work.
- Please ensure the findings in Table 14 are thoroughly discussed and referenced in the main text. Explain why certain methods perform better.
- Please consider using a small schematic or flowchart to visually illustrate how statistical methods are embedded within the machine learning pipeline. This would make the concept of their embedded role much clearer.
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 2 Report (New Reviewer)
Comments and Suggestions for Authors
This manuscript presents a systematic literature review (SLR) on anomaly detection for bridge structural health monitoring (SHM) across 36 peer‑reviewed studies (2020–2025). It proposes a four‑dimensional taxonomy (real‑time capability, multivariate support, analysis domain, detection method) and compares detection paradigms (distance‑based, predictive, image‑processing) along deployment‑oriented axes. There are a few shortcomings that the authors may address:
- The narrative indicates ~6,812 initial hits → 1,022 after title/abstract screening → 580 for detailed review → 36 included; however, the text also states that 580 were evaluated in detail and 554 were filtered out, which would leave 26, not 36. Please reconcile these counts.
- The abstract and contributions claim a novel four‑dimensional taxonomy, while Section 3 characterizes it as a practical organizing framework rather than a new theoretical construct.
- Specify latency thresholds (e.g., end‑to‑end sub‑second) and state whether pre‑processing is included. When reporting latency/speed, normalize by hardware (CPU/GPU/edge device) to enable cross‑study comparison. Ensure Table 5 lists hardware and uses consistent units.
- Beyond listing accuracies, consider reporting the median and interquartile range (IQR) of Accuracy/F1 by method family and by anomaly type. Even a descriptive meta‑summary (with appropriate caveats) would substantiate the qualitative conclusions (see the sketch below).
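A minimal sketch of such a descriptive meta-summary, assuming hypothetical per-study F1 scores and a method-family label (both illustrative, not values from the manuscript):

```python
import pandas as pd

# Hypothetical extracted results; families and F1 scores are illustrative only.
records = pd.DataFrame({
    "family": ["distance", "distance", "predictive", "predictive", "image", "image"],
    "f1":     [0.82, 0.78, 0.91, 0.88, 0.95, 0.86],
})

# Median and IQR of F1 per method family (descriptive only, no inference).
summary = (
    records.groupby("family")["f1"]
    .agg(median="median",
         q1=lambda s: s.quantile(0.25),
         q3=lambda s: s.quantile(0.75))
)
summary["iqr"] = summary["q3"] - summary["q1"]
print(summary)
```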
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 3 Report (New Reviewer)
Comments and Suggestions for Authors
The manuscript reviewed, including the comments marked in red, appears complete in its introductory section, as well as in the analytical and concluding parts. The background literature is considered adequate, and the investigations and analyses carried out across the various aspects are appropriately supported.
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 4 Report (New Reviewer)
Comments and Suggestions for Authors
You motivate a “four-dimensional taxonomy”; include a miniature definition line for each dimension in the Introduction for quick reader recall.
RQ3 currently mixes “accuracy, computational efficiency and fault types.” Please define “computational efficiency” (e.g., latency, FLOPs, memory, device class) and name the fault taxonomy used. Then ensure the same terms appear in Results tables.
State the number of reviewers who screened titles, abstracts, and full texts, and report inter-reviewer agreement (e.g., Cohen’s κ). If a single reviewer performed the screening, state that explicitly.
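For reference, the inter-reviewer agreement suggested here can be computed in a few lines with scikit-learn’s `cohen_kappa_score`; the screening decisions below are hypothetical placeholders, not data from the review:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical include/exclude decisions by two screeners on ten abstracts
# (1 = include, 0 = exclude); values are illustrative only.
reviewer_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
reviewer_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]

# Cohen's kappa measures agreement beyond what chance alone would produce.
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")
```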
List all inclusion/exclusion criteria verbatim in a table (e.g., domain = bridges only; years 2020–2025; venue constraint to eight databases; results-oriented validation required) with a brief rationale column.
Define how “bridge-specific” was enforced—were lab-scale structures or generic SHM benchmarks excluded? Add explicit rules/examples.
In Fig. 3 (taxonomy), add example algorithms under each leaf (e.g., distance: MCD/KNN; predictive: BDLM, SARIMA, GPR, LSTM; image: FFT/GAF/CWT→CNN).
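As one concrete instance of the image-processing branch this comment names (signal → time-frequency image → CNN), a 1-D vibration trace can be converted into a 2-D spectrogram; the signal and parameters below are invented for illustration:

```python
import numpy as np
from scipy import signal

# Hypothetical 1-D acceleration trace; sampling rate and tone are illustrative only.
fs = 100.0                          # sampling rate, Hz
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 3 * t) + 0.1 * np.random.randn(t.size)

# Short-time Fourier transform -> 2-D time-frequency image,
# the kind of input an image-based CNN detector would consume.
f, tt, Sxx = signal.spectrogram(x, fs=fs, nperseg=128)
print(Sxx.shape)  # (frequency bins, time frames)
```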
Where you state “only 11 real-time studies,” add a table of those 11 with device class (server/edge), latency (ms or s), and throughput; include missing values as “NR.”
Standardize metrics across tables: include Accuracy, Precision, Recall, F1 and latency (if available) for every study; where missing, mark “NR” and say so in captions.
Clarify how you handled class imbalance when comparing F1 across papers—add a note that macro/micro averaging differ and label which one each study used (if reported).
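To illustrate why the averaging choice matters under class imbalance, the sketch below compares micro- and macro-averaged F1 on a synthetic, heavily imbalanced label set (values are illustrative only): micro-averaging is dominated by the majority class, while macro-averaging exposes poor minority-class (anomaly) performance.

```python
from sklearn.metrics import f1_score

# Synthetic imbalanced labels: anomalies (1) are rare; illustrative only.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 88 + [1] * 2 + [0] * 8 + [1] * 2  # 2 false alarms, 8 missed anomalies

print("micro F1:", f1_score(y_true, y_pred, average="micro"))  # ~0.90
print("macro F1:", f1_score(y_true, y_pred, average="macro"))  # ~0.62
```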
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The literature review analyzes studies on anomaly detection methods for bridge Structural Health Monitoring. The authors propose a four-dimensional taxonomy and classify techniques into distance-based, predictive, and image-processing categories. The following are the comments:
- While the proposed four-dimensional taxonomy is well-structured, it largely reiterates existing classification frameworks in SHM anomaly detection literature without sufficiently differentiating its contribution from prior works. The manuscript should explicitly clarify how this taxonomy advances beyond existing categorizations.
- The performance evaluation in Section 4 lacks rigorous statistical validation to support claims about method superiority.
- The exclusion of statistical methods as a standalone category is noted, but could this oversimplify techniques like PCA-based anomaly detection? How might this affect the taxonomy’s comprehensiveness?
- Section 5.1 notes class imbalance as a challenge. Did any reviewed studies effectively address this, and why is this not highlighted in performance results?
- Table 6 shows only 8 studies use multivariate analysis. Is this due to technical complexity, data scarcity, or a research gap? How might sensor fusion improve detection of latent failures?
- In Section 4.2, predictive models are framed as "interpretable", but neural networks dominate this category. How was interpretability objectively assessed without standardized metrics?
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This is a well-structured and informative review paper that I thoroughly enjoyed reading. The authors have successfully categorized existing methods from various perspectives, which can be particularly valuable for researchers seeking a comprehensive understanding of anomaly detection techniques in SHM. The following comments are intended to help further improve the quality and clarity of the manuscript:
- Consider adding ambient vibration sources alongside excitation in the diagram to reflect both active and passive SHM techniques.
- There is an inconsistency between the figure and the manuscript description. In the current diagram, if no anomaly is detected, the data correction step is applied—whereas the text (lines 43–44) suggests that correction occurs after anomalies are detected. Please revise the diagram to ensure it aligns with the textual explanation.
- Simplify the visuals in Figures 1 and 3 to enhance readability. Avoid excessive visual effects (e.g., glowing edges, gradients, or shadows), and remove highlights from the shapes to maintain a more professional and publication-ready appearance.
- In Tables 10 and 11, it would be beneficial to include the specific neural network architecture (e.g., CNN, Transformer) and the corresponding method type for each entry. This additional detail will make the tables more informative and practical for researchers comparing different techniques.
- It is recommended to rename the section "Other Forms of Input Classes" to "Hybrid Input Classes", which more accurately reflects the integration of multiple input modalities and improves clarity for the reader.
The majority of the text is free from errors, except for the following:
- In the Abstract: "...and detections methods." Change to "...and detection methods."
- Page 1, Introduction: "To mitigate these wear and tear risks..." For better flow, consider: "To mitigate these risks..."
- Page 7, Section 2.2.1: "Selection and Rejection Criterion". The plural form is "Criteria".
- Page 22, Answer to RQ2: "> > Similarly, Table 5 indicates that..." Remove the stray "> >" at the beginning of the sentence.
- Replace "a lot of" with "significant" or "substantial".
- Replace "like" with "such as" when providing examples.
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper presents itself as a systematic literature review of anomaly detection methods for bridge Structural Health Monitoring (SHM), introducing a “novel four-dimensional framework” to classify and evaluate existing methods. This paper should not be published in its current form. The contribution is incremental, the novelty is overstated, and there is no technical rigor. My comments for the authors are as follows:
- The claimed “multi-dimensional taxonomy” (real-time capability, multivariate support, analysis domain, and detection methods) is not new or truly innovative in the context of SHM or anomaly detection.
- Previous reviews in related fields (time-series anomaly detection, general SHM, civil infrastructure monitoring) have used similar multi-faceted frameworks.
- The analysis is based on 36 papers (2020–2025), which is a modest sample given the explosion of SHM+AI research in recent years.
- There is little evidence that the new taxonomy leads to fundamentally new insights. That real-time and multivariate analyses are underutilized is already well known in the literature.
- The review highlights known gaps (e.g., lack of multivariate analysis, real-time capability, computational limitations), but does not offer new perspectives or actionable frameworks.
- The so-called “performance evaluation” is only a summary of the numbers reported in other studies.
- No effort is made to control for differences in datasets, bridges, fault types, or metrics across papers.
- The review does not attempt to perform a true meta-analysis or quantitative synthesis.
The authors should conduct a true meta-analysis with statistical normalization across studies, refocus the review on critical, actionable recommendations for practitioners rather than a mere summary of existing work, clearly state limitations, and avoid overstating novelty.
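As a sketch of what “statistical normalization across studies” could look like in its simplest form, the snippet below z-scores reported accuracies within groups of studies that share a benchmark dataset, so each study is compared only against others evaluated under comparable conditions; the data frame contents are hypothetical:

```python
import pandas as pd

# Hypothetical per-study accuracies grouped by benchmark dataset; illustrative only.
df = pd.DataFrame({
    "dataset":  ["A", "A", "A", "B", "B"],
    "accuracy": [0.90, 0.94, 0.88, 0.75, 0.81],
})

# Normalize within each dataset group (population z-score) so cross-study
# comparisons are not confounded by dataset difficulty.
df["z"] = df.groupby("dataset")["accuracy"].transform(
    lambda s: (s - s.mean()) / s.std(ddof=0)
)
print(df)
```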
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
I have the following comments:
- Clarify the scope of the review. While you mention 36 studies, the selection criteria could be expanded upon: why only 2020–2025? Were older foundational works intentionally excluded?
- Add more critical comparison between methods. The review categorizes methods (distance-based, predictive, image processing) but stops short of a deep comparative analysis in terms of robustness, scalability, and real-world deployment.
- The authors should include a table summarizing all datasets used in the reviewed studies. What kinds of SHM data (e.g., bridge types, sensor networks) were used for anomaly detection?
- The real-time capability section is thin. It states that only 11 of 36 studies addressed real-time processing but does not evaluate how successful these attempts were or what hardware/software challenges existed.
- Real-world SHM relies on embedded devices. What are the computational constraints, and how do the reviewed methods perform under them?
- Many reviewed studies do not use real-world bridge failures or large-scale data, point this out.
- Address the interpretability of AI models. Especially for image processing and neural networks, there is no mention of explainability or of how end users (engineers) can trust the outputs. Also, as shown in some studies, ensemble ML models should be considered (see j.eswa.2024.124897); discuss the capability of ensemble ML models and their higher reported performance.
- Time-frequency domain techniques deserve more attention. Discuss barriers (e.g., complexity, computational load) and potential benefits more clearly.
- Consider adding performance metrics in one consolidated table. In line with comment 7, include the metrics for ensemble models in that table as well.
- Be more specific about the need for hybrid methods, sensor fusion strategies, and transfer learning for SHM anomaly detection.
Author Response
Please find the attached file.
Author Response File:
Author Response.pdf