Review
Peer-Review Record

Smart Surveillance of Structural Health: A Systematic Review of Deep Learning-Based Visual Inspection of Concrete Bridges Using 2D Images

Infrastructures 2025, 10(12), 338; https://doi.org/10.3390/infrastructures10120338
by Nasrin Lotfi Karkan 1, Eghbal Shakeri 1,*, Naimeh Sadeghi 2 and Saeed Banihashemi 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 November 2025 / Revised: 4 December 2025 / Accepted: 5 December 2025 / Published: 8 December 2025
(This article belongs to the Special Issue Modern Digital Technologies for the Built Environment of the Future)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper focuses on the theme of “smart surveillance of concrete bridge structural health based on computer vision” and presents a systematic review of recent applications of deep learning to bridge defect recognition and detection. It first introduces basic deep learning models such as convolutional neural networks, object detection and semantic segmentation frameworks, as well as commonly used evaluation metrics. However, several issues should be addressed before the paper can be considered for publication.

  1. Explicitly formulate 2–3 research questions (e.g., “Which model families work best for each defect type and sensing setup?”), and then structure Sections 4–5 around answering them with quantitative and qualitative comparisons (not just narrative summaries).
  2. Section 2 currently reads like a general computer-vision tutorial (AlexNet, VGG, GoogLeNet, DenseNet, MobileNet, Xception, etc.), with many details that are well known to AI readers and only loosely connected to bridge inspection.
  3. In Section 2.5, the equations for Recall and F1 are labeled again as “Accuracy” in Eqs. (4)–(5), which is mathematically incorrect and confusing. In addition, the discussion of metrics does not highlight that overall “accuracy” is often misleading for highly imbalanced defect datasets (e.g., background vs. small cracks) or that mAP@IoU thresholds, per-class F1, and pixel-wise IoU are more informative (the standard definitions are recalled after this list of comments).
  4. Expand Section 3 to precisely specify inclusion/exclusion criteria (publication type, language, bridge material, imaging modality), duplicate-screening procedures, and any quality or risk-of-bias assessment protocol.
  5. It is recommended to cite more relevant studies from the last five years, such as:

Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks, Journal of Performance of Constructed Facilities, 39(3), 04025007.

  6. Clarify how the visual deep-learning methods relate to structural-health assessment: e.g., which studies actually estimate crack width/length, link defects to condition ratings or safety decisions, or integrate with BMS/BIM/3D models.
  7. Expand the Conclusion/Future Directions to include more forward-looking AI topics (e.g., large vision models for crack segmentation, synthetic data and generative models, privacy-preserving on-device inference, robust domain adaptation across countries and bridge types).
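
For reference, the standard definitions behind these metric comments (recalled here for clarity, not quoted from the manuscript) are:

    Precision = TP / (TP + FP)
    Recall    = TP / (TP + FN)
    F1        = 2 · Precision · Recall / (Precision + Recall)
    Accuracy  = (TP + TN) / (TP + TN + FP + FN)
    IoU       = TP / (TP + FP + FN)   (per class; computed pixel-wise for segmentation)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, and mAP averages the per-class average precision at a chosen IoU threshold.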

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents a review of the use of DL for visual inspection. The paper is interesting, even though some aspects should be improved:

  • As a review paper, it is necessary to define the criteria used to select papers and to specify the protocol followed. This should be stated early in the paper, not only in Section 3.
  • Research questions should be provided in order to guide the reader throughout the paper.
  • Section 2 is a bit confusing. When talking about DL, four main methods should be mentioned, as reported in https://doi.org/10.1016/j.autcon.2024.105719
  • Some additional information is required about the data underlying the algorithms. The authors are invited to consider the main datasets used for the purpose of the study.
  • Regarding YOLO and object detection, the works reported do not include studies using newer versions of YOLO. The authors of ref. 70 used YOLOv11 with attention for multiclass defect detection. Please consider this paper and the references therein.
  • Some additional information is required about the metrics to be used for evaluating the DL models, exploring the trade-off between accuracy/precision/recall and computational effort (see the sketch after this list).
  • In the end, the conclusions should be oriented toward answering the research questions. I suggest considering this modification.
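
To make the metric point concrete: a minimal, hypothetical sketch (not code from the reviewed paper) of how per-class precision/recall/F1 could be tabulated alongside overall accuracy. The confusion-matrix counts are invented to mimic an imbalanced background-vs-crack setting, where a high overall accuracy hides poor crack recall; inference time per image would be reported alongside such metrics to expose the accuracy-versus-computational-effort trade-off.

    import numpy as np

    def per_class_metrics(cm: np.ndarray):
        """Per-class precision, recall and F1 from a square confusion matrix
        (rows = true classes, columns = predicted classes)."""
        tp = np.diag(cm).astype(float)
        fp = cm.sum(axis=0) - tp   # predicted as this class but actually another
        fn = cm.sum(axis=1) - tp   # actually this class but predicted as another
        precision = tp / np.maximum(tp + fp, 1e-12)
        recall = tp / np.maximum(tp + fn, 1e-12)
        f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
        return precision, recall, f1

    # Hypothetical, heavily imbalanced counts: class 0 = background, class 1 = crack.
    cm = np.array([[9900,   20],
                   [  60,   20]])
    precision, recall, f1 = per_class_metrics(cm)
    accuracy = np.diag(cm).sum() / cm.sum()
    print(f"overall accuracy: {accuracy:.3f}")   # 0.992 -- looks excellent
    print(f"crack class: P={precision[1]:.2f}, R={recall[1]:.2f}, F1={f1[1]:.2f}")  # 0.50 / 0.25 / 0.33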

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The introduction correctly notes that prior reviews (e.g., Luo et al., Zhang & Yuen) covered broader computer-vision or AI-based bridge damage detection without a detailed comparative analysis of deep learning algorithms. However, the added value of the present review is not fully crystallized: at several points, the comparison in Section 5 is largely narrative (e.g., “DeepLabv3+ exhibits greater accuracy compared to U-Net”, “YOLO is faster but two-stage detectors are more accurate”). It would significantly strengthen the paper to explicitly formalize the research questions and show how they go beyond previous reviews, e.g.:
  • How have deep learning models for visual inspection of concrete bridges evolved since 2018 in terms of tasks, architectures, and deployment contexts?
  • What are the comparative performance characteristics (accuracy, mIoU/mAP, and inference time) of different algorithm families under similar tasks?
  • What gaps exist in datasets, defect types, and quantification capabilities for practical SHM?
  • Consider adding a short “Contribution and differences w.r.t. prior reviews” subsection to the Introduction, explicitly contrasting your scope, methodology, and deliverables with those of Luo et al. and Zhang & Yuen.
  2. The methodology section mentions following PRISMA guidelines and describes that two reviewers qualitatively assessed risk of bias, focusing on study design, data handling, and reporting transparency. However, no risk-of-bias summary table or figure is provided; readers cannot see which studies had high/low concerns or the criteria used.
  3. The phrase “articles focusing on a specific issue were excluded” in Table 3 is vague and may introduce selection bias. Please clarify what “specific issue” means (e.g., purely camera-placement studies? narrow case studies with no algorithmic contribution?). Consider specifying:
  • Whether a standard tool or checklist (e.g., a tailored version of PRISMA 2020 / QUADAS-2 style criteria) was used.
  • How disagreements between reviewers were quantified (e.g., Cohen’s kappa) before being resolved by consensus.
  • Whether any sensitivity analysis was done to see if excluding high-risk-of-bias studies changes the overall conclusions.
  Adding a concise Risk of Bias table (rows = studies, columns = criteria) would substantially increase methodological rigor.
  4. The title and abstract state “Visual Inspection of Concrete Bridges” and emphasize “image-based techniques”, which commonly include both RGB and other imaging modalities (e.g., IR, multispectral). Yet the inclusion criteria explicitly exclude thermal, 3D, radar, and signal-based images, focusing only on “standard image types”.
  5. Please clarify early (Abstract and Introduction) that the review focuses on RGB (or standard 2D) images used for visual surface inspection, and that non-RGB sensing modalities are outside the scope. Otherwise, the reader might expect a more comprehensive treatment of all visual/optical modalities.
  6. Section 2 provides a long tutorial on CNNs, VGG, GoogLeNet, ResNet, DenseNet, MobileNet, segmentation networks (FCN, U-Net, SegNet, DeepLab, Mask R-CNN, etc.), and detection networks (R-CNN, YOLO, SSD, RetinaNet, EfficientDet), plus meta-learning, transfer learning, image enhancement, and SfM.
  7. Section 5 now aggregates conclusions from individual comparative studies (e.g., Trach, Cardellicchio et al. for classification; Ma et al., Tran et al. for detection; Qiao, Jin, Subedi, Mirzazade et al. for segmentation) and synthesizes them qualitatively (e.g., “DeepLabv3+ is more accurate than U-Net”, “recent YOLO versions outperform Faster R-CNN”).
  8. To strengthen the “systematic” and “comparative” aspects, consider constructing aggregated plots or tables that summarize:
  • Typical ranges of mAP/mIoU/precision/recall for each algorithm family under similar tasks (crack vs. multi-defect, classification vs. segmentation vs. detection).
  • Trade-offs between performance and dataset size or input resolution (for example, highlight where models overfit small datasets).
  9. The title emphasizes “smart surveillance of structural health”, but the main body concentrates on algorithms. The reader would benefit from a dedicated subsection in the Discussion or Conclusion on:
  • How these methods have been integrated with robots, UAVs, or continuous monitoring systems (you already mention some climbing-robot, LiDAR+SLAM, and post-earthquake applications).
  • Practical challenges: field lighting, occlusions, weather, calibration, data storage/annotation burden, latency constraints for real-time inspection, and human–AI interaction (e.g., inspectors validating AI outputs).
  • Gaps between academic benchmarks (e.g., CODEBRIM, SDNET2018) and real inspection workflows, and what is needed to move from “paper performance” to routine deployment by agencies.
  10. In Section 2.5 (Evaluation Metrics), the formulas for Recall and F1-score are preceded by the word “Accuracy” instead of “Recall” and “F1-score”. This is confusing and should be corrected. Please also check that all metric symbols are consistently defined once (e.g., TP, FP, FN) and reused.
  11. In Section 4.2.1, there is an “Error! Reference source not found.” instead of a valid reference to a previous section or figure. This must be fixed.
  12. A number of typographical errors are present, for example: “utlized” → “utilized” in the methodology description; “DensNet” should be “DenseNet”.
  13. Occasionally there are extra spaces or line breaks within words (e.g., “Mo bileNet” instead of “MobileNet”). I recommend a thorough spell-check and a consistency check for algorithm names, defect names (e.g., “spallation” vs. “spalling”), and abbreviations.
  14. In Table 3, please clarify what is meant by “articles focusing on a specific issue were excluded” and “non-standard abstracts”. This is important for reproducibility.
  15. Figures 4–5 and the text around them present frequency statistics of damage types and quantification tasks. Please ensure that terminology is consistent with standard bridge inspection vocabularies (e.g., “spalling” vs. “spallation”, “exposed rebar” vs. “exposed reinforcement”) and with the categories used in CODEBRIM and other datasets.
  16. When you discuss quantification (e.g., crack width, spalling area, defect length), explicitly state whether the values reported in the reviewed papers are in pixels or physical units, and how they are converted (e.g., via SfM or calibration; a common conversion is sketched after this list). Some methods (e.g., Jang et al., McLaughlin et al., Ye et al.) include explicit measurement procedures; highlighting these consistently would be helpful.
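
For context (a standard approximation added here, not a procedure taken from the manuscript), the usual pixel-to-physical conversion for a pinhole camera viewing a roughly fronto-parallel surface is:

    GSD ≈ (Z · p) / f              (surface sampling distance: object-space size of one pixel)
    physical width ≈ width in pixels · GSD

where Z is the camera-to-surface distance, p the sensor pixel pitch, and f the focal length; SfM reconstruction or a calibration target supplies the metric scale when Z is not measured directly.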

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The paper can be accepted

Reviewer 3 Report

Comments and Suggestions for Authors

No other concerns.
