Review

Smart Surveillance of Structural Health: A Systematic Review of Deep Learning-Based Visual Inspection of Concrete Bridges Using 2D Images

1 Department of Civil and Environmental Engineering, Construction Engineering Management, Amirkabir University of Technology, Tehran 15916-34311, Iran
2 Faculty of Civil Engineering, K.N. Toosi University of Technology, Tehran 19967-15433, Iran
3 School of Built Environment, University of Technology Sydney, Sydney, NSW 2007, Australia
* Authors to whom correspondence should be addressed.
Infrastructures 2025, 10(12), 338; https://doi.org/10.3390/infrastructures10120338
Submission received: 4 November 2025 / Revised: 4 December 2025 / Accepted: 5 December 2025 / Published: 8 December 2025
(This article belongs to the Special Issue Modern Digital Technologies for the Built Environment of the Future)

Abstract

Timely and accurate inspection of concrete bridges is critical to ensuring structural integrity and public safety. Traditional visual inspections conducted by human inspectors are labour-intensive, inconsistent, and often limited in their ability to access all structural components, particularly in hazardous or inaccessible areas. Image-based inspection techniques have emerged as a safer and more efficient alternative, and recent advancements in deep learning have significantly enhanced their diagnostic capabilities. This systematic review critically evaluates 77 studies that applied deep learning approaches to the detection and classification of surface defects in concrete bridges using 2D images. Relevant publications were retrieved from major scientific databases, screened for eligibility, and analyzed in terms of model type, training strategies, and evaluation metrics. The reviewed works encompass a wide spectrum of algorithms—spanning classification, object detection, and image segmentation models—highlighting their architectural features, strengths, and trade-offs in terms of accuracy, computational complexity, and real-time applicability. Key findings reveal that transfer learning, data augmentation, and careful dataset composition are pivotal in improving model performance. Moreover, the review identifies emerging research trajectories, such as integrating deep learning with Building Information Modeling (BIM), leveraging edge computing for real-time monitoring, and developing rich annotated datasets to enhance model generalizability. By mapping the current state of knowledge and outlining future research directions, this study provides a foundational reference for researchers and practitioners aiming to deploy deep learning technologies in bridge inspection and infrastructure monitoring.

1. Introduction

Bridges play a crucial role in our society by connecting different regions and serving as key points in the urban transportation network. Hence, substantial financial resources are invested in inspecting and identifying defects in these critical infrastructures [1]. Concrete bridges are more common than steel bridges due to their versatility, resilience, durability, low maintenance, and quick construction [2]. Maintaining bridges in good condition is therefore important, as environmental factors, aging, and heavy loads can cause damage that decreases their lifespan and increases the risk of collapse, with economic and physical consequences. For instance, a bridge collapse in Minneapolis (MN, USA) in 2007 led to the loss of 13 lives and 145 injuries [3].
Inspection serves as the initial phase of maintenance in this respect. Traditional bridge inspection is performed by human inspectors; the process can be time-consuming and imprecise because human judgment is inconsistent and uncertain. It also requires trained inspectors and equipment, such as man lifts, that can be dangerous for both the inspector and the public [4,5]. Using artificial intelligence (AI) for bridge inspections can make the process more efficient and accurate.
Deep learning, a branch of AI, has become increasingly important for bridge defect detection and measurement. Utilizing sensor data and images, these algorithms can be trained to identify and quantify defects in bridges, resulting in quicker inspections and reduced human error. While acceleration, strain, and displacement sensors are commonly used for bridge inspections, they cover only a limited part of the bridge. Furthermore, they are challenging to maintain and access after installation. Consequently, there has been a significant shift towards using images for bridge inspection and defect identification [6].
Despite the significant importance of bridge defect detection, systematic reviews of computer vision and deep learning approaches specifically targeting concrete defects remain limited. Most existing literature predominantly focuses on crack detection [1,7]. Recently, two review studies have addressed broader applications of computer vision techniques in bridge damage detection. Luo, et al. [8] explored three key aspects—surface defect detection, vibration measurement, and vehicle parameter identification—while Zhang and Yuen [9] reviewed various AI-driven techniques, including image-based methods, point clouds, infrared thermography (IRT), ground penetrating radar (GPR), and vibration responses. However, these reviews do not focus specifically on deep learning architectures for concrete bridge surface inspection using standard RGB images, nor do they provide a task-structured comparative synthesis of model families in terms of both performance metrics and computational efficiency.
This study extends the existing body of knowledge in three main ways. First, it focuses explicitly on concrete bridges and standard 2D image data, excluding non-visual sensing modalities, over the period 2018–2023, during which deep learning became the dominant approach for image-based bridge inspection. Second, it follows PRISMA-based systematic literature review procedures, with clearly defined search, screening, and risk-of-bias assessment steps, resulting in a curated set of 77 deep learning studies on concrete bridge surface defects. Third, it provides a task-structured comparative synthesis of major deep learning model families—CNN classifiers, one- and two-stage object detectors, and segmentation networks—organized by task (classification, detection, segmentation) and summarized in terms of accuracy-related metrics (e.g., F1-score, mAP, mIoU), inference time, and architectural characteristics (e.g., deep vs. lightweight backbones, one-stage vs. two-stage detectors). This analysis clarifies speed–accuracy trade-offs and helps relate model structures to practical deployment contexts such as UAV-based surveys, robotic inspection, and detailed defect mapping for SHM and maintenance decisions.
The research questions that this paper aims to address are:
  • How have deep learning-based visual inspection methods for concrete bridge surface defects evolved since 2018 in terms of: (a) defect types, and (b) computer-vision tasks (classification, object detection, segmentation, and hybrid methods)?
  • For each main task (classification, object detection, segmentation), which families of deep learning models are most commonly used for detecting concrete bridge defects, and what trade-offs do they show between performance (e.g., accuracy, F1-score, mAP, mIoU) and computational efficiency (e.g., inference time)?
  • What are the main characteristics and limitations of the datasets used to train and validate deep learning models for concrete bridge visual inspection (e.g., dataset size, defect classes, annotation level)?
A systematic literature review was conducted following the PRISMA guidelines to address these questions. Databases such as Web of Science, Scopus, and Google Scholar were searched for papers published between 2018 and 2023 that used deep learning for defect detection and visual inspection of concrete bridges using standard 2D images. Predefined inclusion and exclusion criteria regarding publication type, language, bridge material, imaging modality, and methodological content were applied as described in Section 3.
The paper is organized into six sections. In Section 2, deep learning and its subsets are briefly described. Section 3 details our search strategy for identifying relevant studies. Section 4 categorizes the deep learning methods utilized in bridge damage detection and provides an analysis. Afterward, Section 5 compares the performance of different deep learning algorithms in the context of bridge damage detection. Finally, Section 6 presents the conclusion and future study recommendations.

2. Background on Deep Learning

Computer vision has several key subsets; this research focuses on image classification, segmentation, and object detection. Image classification determines whether an image contains a specific object. Object detection additionally localizes the object with a bounding box, and image segmentation delineates the object precisely at the pixel level. The emergence of deep learning algorithms has transformed the field of computer vision. Deep learning refers to multi-layer neural networks, which have increasingly been used for computer vision tasks. Various algorithms, such as Convolutional Neural Networks (CNNs), AlexNet, VGG, DeepLab, U-Net, Region-based CNN (R-CNN), and You Only Look Once (YOLO), have been highly effective in this revolution, each offering distinct advantages [10,11]. Moreover, improvements in deep learning and computer vision techniques have enhanced the capabilities of traditional algorithms. This section provides the basic concepts and algorithms of deep learning relevant to this review and highlights their strengths compared to earlier models.

2.1. Deep Learning-Based Image Classification Algorithms

Most algorithms for computer vision tasks are based on CNNs. A CNN has multiple layers: convolutional, pooling, and fully connected. In the convolutional layer, kernels (or filters) convolve with the input images to extract feature maps. These filters have weights that are adjusted during training. An activation function, such as sigmoid, ReLU, or tanh, is then applied to these feature maps to introduce non-linearity and determine which features should be passed to the next layer. The pooling layer reduces the dimensions of the convolved features, typically through operations like max pooling or average pooling, to decrease computational load and control overfitting. Finally, the last convolutional layer is connected to the fully connected layer, which integrates the extracted features to make the final classification or prediction [12].
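To make these layer roles concrete, the following minimal PyTorch sketch stacks convolution, activation, pooling, and a fully connected head. The architecture, input size, and two-class (e.g., crack/no-crack) head are illustrative assumptions, not a model from any reviewed study.

```python
# A toy CNN classifier illustrating the layer types described above.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # kernels extract feature maps
            nn.ReLU(),                                   # non-linear activation
            nn.MaxPool2d(2),                             # pooling shrinks the maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # fully connected head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                 # 224x224 input -> 32 maps of 56x56
        return self.classifier(x.flatten(1))

model = SmallCNN()
logits = model(torch.randn(1, 3, 224, 224))  # one dummy RGB image -> class scores
```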
An important component in the training process of CNNs is the loss function, which evaluates their accuracy in predicting the given labels. Backpropagation minimizes the loss function by calculating the gradient of the loss with respect to each weight in the network, moving backward from the output layer. This gradient is then used to adjust the weights through an optimization algorithm, aiming to minimize the loss and improve the network’s performance [13].
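Continuing the toy model above, one training step might look as follows; the loss function, optimizer, and hyperparameters are placeholder choices for illustration.

```python
# One illustrative training step: the loss scores the predictions,
# backpropagation computes gradients, and the optimizer updates weights.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

images = torch.randn(8, 3, 224, 224)     # dummy batch of 8 images
labels = torch.randint(0, 2, (8,))       # dummy crack / no-crack labels

optimizer.zero_grad()
loss = criterion(model(images), labels)  # evaluate the loss function
loss.backward()                          # backpropagate gradients layer by layer
optimizer.step()                         # gradient-based weight update
```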
Key deep learning models include AlexNet and VGG-16. AlexNet, introduced by Krizhevsky et al. [14], is composed of eight weight layers: five convolutional layers and three fully connected layers with ReLU activations. It was followed in 2014 by the VGG-16 model, developed by Simonyan and Zisserman [15], which increased depth to 16 layers: thirteen convolutional layers with 3 × 3 filters and three fully connected layers. Deeper networks, such as ResNet [16], address the vanishing gradient problem with shortcut connections, enabling networks to have more layers without degradation in performance. DenseNet [17] builds on ResNet by connecting every layer to every subsequent layer, improving information flow across the network; its architecture is displayed in Figure 1.
To address the computational challenges, models like MobileNet [18] and Xception [19] used depthwise separable convolutions, reducing the number of parameters and improving efficiency, making them ideal for real-time inspection tasks such as those performed by UAVs during bridge assessments.
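The parameter savings from depthwise separable convolutions can be shown directly; the channel sizes below are arbitrary.

```python
# A depthwise separable convolution: a per-channel spatial convolution
# followed by a 1x1 pointwise convolution that mixes channels.
import torch.nn as nn

def separable_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise
    )

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nn.Conv2d(64, 128, 3, padding=1)))  # standard 3x3: 73,856 parameters
print(count(separable_conv(64, 128)))           # separable:     8,960 parameters
```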

2.2. Deep Learning-Based Image Segmentation Algorithms

Several approaches exist for image segmentation; two important ones are encoder–decoder structures and fully convolutional networks [20]. The fully convolutional network (FCN), developed by Long et al. [21], and U-Net [22] are two major models used in segmentation tasks. U-Net has a U-shaped architecture consisting of a contracting path (encoder) and an expansive path (decoder). The contracting path is similar to a standard convolutional network, utilizing convolutional and pooling layers for downsampling. The expansive path restores spatial resolution and merges the upsampled features with information from the contracting path to create accurate segmentations.
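A heavily reduced encoder–decoder sketch illustrates the contracting path, the expansive path, and a skip connection; real U-Nets stack several such levels with many more channels.

```python
# A one-level U-Net-style network: downsample, upsample, and merge
# encoder features with the decoder via a skip connection.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, num_classes, 1)  # per-pixel class scores

    def forward(self, x):
        e = self.enc(x)                # contracting-path features (full resolution)
        m = self.mid(self.down(e))     # downsampled bottleneck features
        u = self.up(m)                 # expansive path restores resolution
        u = torch.cat([u, e], dim=1)   # skip connection merges both paths
        return self.head(self.dec(u))  # pixel-wise segmentation map

masks = TinyUNet()(torch.randn(1, 3, 128, 128))  # -> shape (1, 2, 128, 128)
```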
Another important model is DeepLab [23], which uses dilated convolutions to expand the receptive field and capture more context without reducing spatial resolution. This feature is important for accurately identifying defects in concrete surfaces. Additionally, Mask R-CNN [24] extends Faster R-CNN by adding a mask branch, allowing for pixel-wise segmentation of detected objects. DeepLabv3+ further improves on the segmentation accuracy by using atrous spatial pyramid pooling (ASPP), which allows the model to capture features at various scales without losing spatial resolution, making it ideal for identifying defects of different sizes in concrete surfaces.
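In code, a dilated (atrous) convolution differs from a standard one only in its dilation rate; the comparison below (arbitrary channel sizes) shows that resolution and parameter count stay fixed while the receptive field grows.

```python
# Dilation spreads the 3x3 kernel taps apart, widening the context seen.
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)
conv = nn.Conv2d(16, 16, 3, padding=1)                # 3x3 receptive field
atrous = nn.Conv2d(16, 16, 3, padding=2, dilation=2)  # effective 5x5 field
print(conv(x).shape, atrous(x).shape)  # both (1, 16, 64, 64): no downsampling
params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv) == params(atrous))  # True: same parameter count
```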

2.3. Deep Learning-Based Object Detection Algorithms

Object detection algorithms like R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], and YOLO [28] have greatly improved the ability to detect and localize defects in concrete surfaces. The YOLO (You Only Look Once) family is especially useful for real-time defect detection due to its ability to process entire images in a single pass. YOLOv3 and YOLOv5 have improved accuracy and speed, making them ideal for use in UAV-based inspections. YOLOv5 introduced attention mechanisms to improve the detection of small and overlapping defects, which is a common challenge in concrete bridge inspection.
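As a hedged sketch of single-pass detection, the snippet below uses the ultralytics YOLO API; `bridge_defects.pt` is a hypothetical checkpoint fine-tuned on bridge-defect images, not a published model from the reviewed studies.

```python
# Single forward pass over an image; each detection carries a class,
# a confidence score, and a bounding box.
from ultralytics import YOLO

model = YOLO("bridge_defects.pt")  # hypothetical fine-tuned weights
results = model("deck_photo.jpg")  # one pass over the whole image
for r in results:
    for box, cls, conf in zip(r.boxes.xyxy, r.boxes.cls, r.boxes.conf):
        print(f"class={int(cls)} conf={conf:.2f} bbox={box.tolist()}")
```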
Successive deep learning algorithms have addressed the weaknesses of their predecessors by modifying network structures and adding new features, making them progressively more accurate, faster, and more reliable.

2.4. Evaluation Metrics

Deep learning performance is measured using different evaluation metrics. Some common evaluation metrics are described here.
  • Accuracy: Classification algorithms do not always predict correctly, and their performance is measured according to their true and false predictions. True positives (TP) are when the model accurately predicts the positive class, while true negatives (TN) are when it accurately predicts the negative class. False positives (FP) happen when the model wrongly predicts the negative class as positive, and false negatives (FN) occur when the model incorrectly predicts the positive class as negative. Accuracy is the ratio of correct predictions to the total number of predictions, and mean accuracy is the average of accuracy across multiple model runs.
$$\text{Accuracy} = \frac{TP + TN}{\text{Total predictions}}$$
$$\text{Mean Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \text{Accuracy}_i$$
where $n$ is the number of runs and $\text{Accuracy}_i$ is the accuracy of run $i$.
Mean accuracy is obtained by averaging accuracy across multiple runs or folds. However, in highly imbalanced datasets—such as images where one class (e.g., cracks) occupies only a small portion of the pixels or where the “background” class dominates—overall accuracy can be misleading, because a classifier that almost always predicts the majority class may still achieve a high accuracy. Therefore, many studies also report precision, recall, and F1-score to better characterize model performance.
  • Precision:
$$\text{Precision} = \frac{TP}{TP + FP}$$
  • Recall (Sensitivity):
$$\text{Recall} = \frac{TP}{TP + FN}$$
  • F1 Score:
$$F_1\ \text{Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Precision reflects how many predicted defects are correct, recall reflects how many true defects are actually detected, and F1 balances these two aspects. In imbalanced settings, per-class precision, recall, and F1 (e.g., for crack vs. spallation vs. efflorescence) are more informative than a single global accuracy value.
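These quantities follow directly from the raw prediction counts; the helper below is a generic illustration, and the counts in the usage example are invented to show how a high accuracy can coexist with poor recall on a rare defect class.

```python
# Classification metrics from confusion-matrix counts.
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced example: 95% accuracy, but recall of only 0.625 on the defect class.
print(classification_metrics(tp=5, tn=90, fp=2, fn=3))
```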
For object detection, performance is usually summarized using Average Precision (AP) and Mean Average Precision (mAP). AP is the area under the precision–recall curve for a given class and Intersection-over-Union (IoU) threshold, whereas mAP is the mean of AP values over all classes. Many works report mAP at a fixed IoU threshold (e.g., mAP@0.5) or averaged over multiple thresholds (e.g., mAP@[0.5:0.95]) to simultaneously capture localization and classification quality.
For semantic segmentation, the dominant metric is Intersection-over-Union (IoU), defined for a given class as:
$$\text{IoU} = \frac{\text{Area of overlap between prediction and ground truth}}{\text{Area of union between prediction and ground truth}}$$
The mean IoU (mIoU) is the average IoU over all classes.
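A straightforward NumPy implementation of per-class IoU and mIoU for integer label masks might look as follows.

```python
# mIoU over label masks; classes absent from both masks are skipped.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()  # area of overlap
        union = np.logical_or(pred == c, gt == c).sum()          # area of union
        if union:
            ious.append(intersection / union)
    return float(np.mean(ious))
```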
In addition to these accuracy-related metrics, several studies also report computational efficiency, typically in terms of inference time per image, frames per second, or model size (number of parameters).
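A simple, generic way to obtain such figures is to time repeated forward passes and count parameters, as sketched below for any PyTorch model.

```python
# Average per-image latency, frames per second, and parameter count.
import time
import torch

def profile(model: torch.nn.Module, size=(1, 3, 224, 224), runs: int = 50):
    model.eval()
    x = torch.randn(*size)
    with torch.no_grad():
        for _ in range(5):  # warm-up passes
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    latency = (time.perf_counter() - start) / runs
    params = sum(p.numel() for p in model.parameters())
    return latency, 1.0 / latency, params  # s/image, FPS, parameters
```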

3. Research Methodology

Following the PRISMA guidelines, this study conducted a Systematic Literature Review (SLR) to investigate the use of deep learning in automated defect detection for concrete bridges. The review was conducted in three phases: data acquisition, screening, and in-depth analysis, as illustrated in Figure 2.
The main goal of data acquisition was to identify the most relevant academic articles that could provide comprehensive data for our analysis. Scopus, Web of Science (WoS), and Google Scholar were used as search engines, and the keywords in Table 1 were used to construct a comprehensive search query. The search was restricted to peer-reviewed journal articles and conference papers written in English that applied deep learning to visual inspection of concrete bridges using standard 2D images.
In the screening phase, the keywords were used to search the selected databases. The time frame was restricted to papers published from 2018 up to October 2023. Searches were conducted over article titles, abstracts, and keywords within the defined scope, which yielded 1041 papers.
Duplicate entries were identified and removed. After the initial database screening, the set of papers was narrowed to those meeting the research criteria (Table 2) by carefully examining each document's title and abstract. Papers with unclear titles were excluded, and only those related to bridge defect detection were retained.
Articles whose abstracts did not provide sufficient information to determine whether deep learning was used for concrete bridge surface defect detection with standard 2D images were also excluded.
In the full-text screening phase, we further verified that the bridges were primarily concrete and that the input data consisted of standard 2D images or video (excluding thermal, 3D, radar, and other non-visual modalities). Studies that focused only on hardware configuration (e.g., camera placement, UAV trajectory planning) without proposing or evaluating a deep learning defect detection model were excluded, as were studies with missing or incomplete data that could not be reliably included in the synthesis. This selection process resulted in 77 papers being selected for analysis.
Finally, in-depth analysis was used to investigate the use of deep learning algorithms to detect defects in concrete bridges. Content analysis was used to identify patterns in concrete bridge defect detection using AI, while descriptive analysis was employed to address trends and focuses in deep learning for bridge defect detection, as described in Section 4. One reviewer extracted data from each study using a piloted extraction form in Excel, and a second reviewer verified the data. The form collected information on authors, dataset details, defect types, deep learning algorithms, and performance metrics. Studies were grouped according to the type of detection task: classification, segmentation, object detection, and hybrid methods. The results were organized into summary tables to facilitate comparisons across studies. To explore potential causes of heterogeneity in the results, we examined patterns in the summary tables, which included study characteristics such as input image size, dataset size, and defect types.
We prespecified task-appropriate outcomes and consistently extracted them across studies. For semantic segmentation, we collected mean intersection over union (mIoU), precision, and recall when available. For object detection, we gathered precision, recall, and mean average precision (mAP) when available. For classification, we collected accuracy. Results were taken from the test set when available; otherwise, they were obtained from the validation set. When multiple results were reported within an outcome domain (e.g., alternative training procedures or different input sizes), we retained the highest value according to the authors’ stated evaluation protocol. The risk of bias in the studies included in this review was evaluated through a qualitative approach by two independent reviewers. The assessment concentrated on important factors such as the study design, the handling of data, and the transparency of reporting. Any differences in the reviewers’ assessments were addressed through discussion and consensus (Appendix A).

4. Analysis and Results

4.1. Descriptive Analysis

After carefully reviewing the selected articles, it was found that deep learning algorithms were first used in 2018 to detect defects in bridges. Before this, conventional image processing techniques, such as various filters and the Otsu method [29], were utilized. However, these methods had limitations due to the impact of lighting conditions, image resolution, and the level of noise present in the images [1]. Creating an image processing algorithm that could handle these scenarios was a significant challenge. Deep learning algorithms then emerged as a promising solution to overcome these limitations. Figure 3 demonstrates the trend of deep learning usage in bridge damage detection.
Additionally, it is evident from Figure 3 that segmentation algorithms have become increasingly prominent in recent years, possibly because of their ability to identify damage at the pixel level and assess its severity. It is worth noting that only 28% of the researchers focused on measuring bridge defects. Among these, a significant majority (61%) specifically concentrated on measuring cracks, as illustrated in Figure 4.
In the same vein, Figure 5 emphasizes the critical role of crack detection in maintaining the structural integrity of concrete bridges. Cracks are the most commonly studied type of damage, as they can develop into more severe issues if not addressed in time. The figure highlights various other defects requiring attention, including spallation and exposed bars, which can compromise the bridge’s safety and longevity.

4.2. Content Analysis

The traditional method of inspecting concrete bridges involves manual inspection. An inspector or a team of inspectors visits the bridge site to assess the structure visually, identifying any defects and determining their severity [3]. This information is then recorded in bridge inspection reports and transferred into bridge management systems (BMSs). However, this transfer process can result in data loss and is time-consuming [30,31,32].
New technologies have recently been adopted as substitutes for this method. They can rapidly assess bridge conditions, enhance data accuracy, and provide information about inaccessible parts of bridges. Among these technologies are sensors, which record data continuously. However, sensors usually cover limited parts of a bridge and become inaccessible once installed, resulting in poor sensor system maintenance. To overcome these limitations, digital images and computer vision techniques have proved useful [6].
Some image-processing methods relied on filters, morphological analysis, and edge detection techniques without model training. However, these methods were insufficient due to the impact of lighting conditions, image resolution, and the level of noise present in the images. Developing an algorithm that could handle all the unpredictable situations in the real world was not easy [1,33]. Therefore, some researchers used machine learning algorithms such as support vector machines to overcome these limitations. For instance, Prasanna et al. [34] developed a histogram-based classification algorithm to detect cracks in concrete bridge decks, utilizing SVM. However, machine learning methods typically use a limited number of feature extraction levels, making it challenging to address the complexities of real-world image data [35].
As a result, deep learning algorithms emerged in bridge defect detection (Figure 6). Deep learning algorithms can automatically learn features from the data and handle large and complex datasets. Researchers using deep learning to detect bridge defects from images typically classify their approaches into image classification, image segmentation, object detection, and hybrid methods combining multiple algorithms. Figure 7 illustrates the process of detecting bridge defects using deep learning techniques.

4.2.1. Image Classification

Classification algorithms can assign a bridge defect label to each image; the CNN is a pioneering image classification algorithm in this respect. Xu et al. [36] successfully utilized an end-to-end CNN, combined with atrous convolution, Atrous Spatial Pyramid Pooling (ASPP), and depthwise separable convolution, to develop a model for detecting concrete bridge cracks. Atrous convolution provides a wider receptive field without any reduction in resolution, the ASPP module enables the network to extract multi-scale information, and depthwise separable convolution reduces computational time. The CNN is a powerful tool for feature extraction, but feature fusion is more critical, and CNNs rely on fully connected layers to achieve this. Although these layers perform well in data fusion, they are insufficient for high-level information. To address this limitation, Zhang et al. [37] employed Convolutional Neural Networks and long short-term memory (LSTM) to detect concrete bridge deck cracks, with LSTM serving as a feature fusion layer. This approach enabled faster crack detection compared to other CNN algorithms, making it particularly advantageous for real-time applications.
As mentioned in Section 2.1, the VGG algorithm emphasizes expanding the network depth to 16–19 weight layers and takes fixed-size input images. Chen [38] trained a variant of the VGG structure on 1200 images captured by a UAV and a handheld DSLR camera to detect cracks, utilizing a migration (transfer) learning technique to overcome the challenge of insufficient data.
DenseNet, with its dense connections, allows for higher accuracy in fewer epochs, making it more suitable for small datasets. Alfaz et al. [39] utilized the DenseNet algorithm to classify bridge cracks.
Li et al. [40] developed a model called Skip–Squeeze-and-Excitation Networks (SSENets), which has two modules: Skip–Squeeze–Excitation (SSE) and Atrous Spatial Pyramid Pooling (ASPP). The skip connections in the SSE module address the vanishing gradient problem found in deep networks, a technique also employed in the ResNet algorithm. The feature map produced by the SSE module is then fed into the ASPP module to capture multi-scale features. In crack detection, cracks occupy only a small portion of the image and their widths vary significantly; traditional convolutions struggle to analyze cracks of different widths at multiple scales, hindering effective feature capture. The ASPP module addresses this by using atrous convolutions with varying rates to extract multi-scale features from cracks, improving detection accuracy.
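A reduced ASPP sketch shows the core idea: parallel atrous branches with different rates, fused by a 1 × 1 convolution. The rates and channel sizes are illustrative, not those of SSENets.

```python
# Parallel atrous convolutions capture context at multiple scales
# (e.g., thin and wide cracks) without losing spatial resolution.
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    def __init__(self, in_ch: int = 64, out_ch: int = 64, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # merge branches

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MiniASPP()(torch.randn(1, 64, 32, 32))  # resolution preserved: (1, 64, 32, 32)
```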
CNNs were among the most frequently used algorithms for defect classification, but lightweight models like MobileNet became popular to enable real-time defect detection. Chen et al. [41] proposed a lightweight model named MobileNetV3-Large-CBAM, which utilized a sliding window to locate and classify cracks, as sketched after this paragraph. The researchers used two open-source datasets, the first containing 6532 images after data augmentation and the second containing 15,068 images. The model, combined with Focal Loss to handle the imbalanced dataset, achieved 95.90% overall accuracy on the first dataset. They also compared this method with other cutting-edge CNNs, demonstrating the superior performance of their model.
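The sliding-window strategy itself is simple to sketch generically; the window size, stride, and crack-class index below are assumptions, not values from the cited study.

```python
# Crop fixed-size windows, classify each patch, and keep the coordinates
# of windows flagged as containing a crack.
import torch

def sliding_window_detect(classifier, image: torch.Tensor,
                          win: int = 224, stride: int = 112):
    _, h, w = image.shape  # image as a CxHxW tensor
    hits = []
    classifier.eval()
    with torch.no_grad():
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                patch = image[:, y:y + win, x:x + win].unsqueeze(0)
                if classifier(patch).argmax(1).item() == 1:  # class 1 = crack (assumed)
                    hits.append((x, y, win, win))
    return hits  # coarse crack locations as (x, y, width, height) boxes
```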
Researchers have expanded their focus to detect multiple defects in addition to cracks. For instance, in 2019, Hüthwohl et al. [42] introduced a three-stage multi-classifier powered by Inception V3 to identify concrete bridge defects. The study recognized eight defect categories: spallation, cracks, rust staining, efflorescence, scaling, abrasion/wear, exposed bars, and general defects. They considered that some types of defects occur along with others; for example, exposed bars occur with spalling, scaling, and efflorescence, and rust staining occurs with exposed bars. As a result, they proposed a hierarchical three-stage classifier. Also in 2019, Kruachottikul et al. [43] developed a novel approach using image processing and deep learning to facilitate bridge inspection. They utilized a CNN to classify cracks, spallation, erosion, and stains; the model achieved 89% accuracy on a dataset containing 3926 images of bridges across Thailand. Mundt et al. [44] created an open-source dataset named COncrete DEfect BRidge IMage (CODEBRIM) to classify cracks, spallation, exposed bars, efflorescence (calcium leaching), and corrosion (stains). The dataset contains 1590 images and has been utilized in several research studies to train classification algorithms. Two meta-learning approaches, MetaQNN and efficient neural architecture search (ENAS), were then employed to classify the defects in the dataset. Cardellicchio et al. [45] compared five deep learning algorithms, namely InceptionV3, ResNet50V2, DenseNet121, MobileNetV3, and NASNetMobile, for the classification of corroded steel reinforcement, cracks, deteriorated concrete, honeycombs, moisture spots, pavement degradation, and shrinkage cracks. ResNet50V2 outperformed the other models in detecting the majority of these damages.
In 2023, Trach [46] proposed a method based on convolutional neural networks to classify images of concrete bridge elements into four classes: defect-free, crack, spallation, and popout. The model was trained and evaluated on a dataset of real bridge inspection images obtained from 15 years of inspections in various regions of Ukraine. After comparing the performance of eight different CNN architectures, Trach determined that the MobileNet model was the most effective in terms of accuracy, loss function, and training time.
Abubakr et al. [47] compared Xception and Vanilla models based on convolutional neural networks for the classification of bridge defects. They trained the algorithms on the CODEBRIM dataset to detect cracks, corrosion, efflorescence, spallation, and exposed bars on bridge surfaces. The results illustrated the superiority of Xception over Vanilla in terms of accuracy.
Since training deep learning algorithms requires considerable data, scholars have turned to transfer learning given the limited number of concrete bridge defect images. Transfer learning is a significant concept in machine learning, where a model trained on one dataset is used as the starting point for another model. A study by Zhu et al. [5] utilized a CNN and transfer learning to accurately classify spallation, exposed bars, cracks, and pockmarks on concrete bridge surfaces, achieving a testing-set accuracy of 97.8%. In 2022, Bukhsh et al. [48] created a dataset of 3588 images of cracks, spallation, exposed bars, and corrosion stains on Slovenian concrete bridges. They then implemented three separate training methods on the dataset using the VGG16 model: without transfer learning, pretraining VGG16 on the ImageNet dataset, and pretraining VGG16 on the CODEBRIM dataset. The second approach yielded the highest accuracy, likely attributable to the vast number of images in the ImageNet dataset. Zoubir et al. [49] constructed a dataset with more than 6900 images of bridge cracks, spallation, and efflorescence, and trained VGG16 with three different levels of transfer learning to identify and localize bridge defects.
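A typical transfer-learning setup of the kind these studies describe can be sketched with torchvision's ImageNet-pretrained VGG16: freeze the convolutional feature extractor and retrain only a new classification head. The four-class head is an arbitrary example.

```python
# Reuse pretrained features; train only the replaced final layer.
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False               # keep the pretrained feature extractor
model.classifier[6] = nn.Linear(4096, 4)  # new head, e.g., 4 defect classes
# Pass only the trainable parameters to the optimizer when fine-tuning.
```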
In 2021, Kruachottikul et al. [50] introduced an innovative visual monitoring system that utilized deep learning to detect various types of damage in concrete bridge substructures. The system captured images of the substructure, used a modified ResNet-50 algorithm to identify defects, and then classified them accordingly. Lastly, they utilized an ANN model that took 12 input variables related to bridge components and defect types to classify defect severity as severe or non-severe.
Aliyari et al. [51] captured several images of the Skodsberg Bridge and constructed two distinct datasets: one contained heterogeneous and noisy images, and the other contained images with less noise. They pre-trained VGG16, ResNet50, ResNet152v2, Xception, InceptionV3, DenseNet210, NASNetLarge, and EfficientNetB4 on ImageNet and then on the Skodsberg Bridge datasets and SDNET2018 after applying some preprocessing. SDNET2018 is a benchmark dataset for concrete crack detection introduced by Dorafshan et al. [52]. Among the models, EfficientNetB4 and DenseNet210 achieved the highest accuracy of 96%, followed closely by ResNet50 and VGG16, both with an accuracy of 95%. However, VGG16, ResNet50, and Xception outperformed the other models on the UAV-based bridge inspection datasets. The authors also determined the optimal amount of transfer learning, which other researchers had not previously explored.
Table 3 presents a summary of papers that were reviewed in this section. It presents the utilized algorithm, the types of identified defects, preprocessing actions, and the employed dataset.

4.2.2. Object Detection

Conventional computer vision classification algorithms can only assign a single label to an image; however, in bridge defect detection, an image may contain multiple damage types [8]. To address this challenge, object detection algorithms are commonly employed to identify defects using bounding boxes; they also provide an approximate location of the defect, unlike classification algorithms. Among object detection algorithms, YOLO is a particularly effective tool for detecting concrete damage in images. Researchers such as Murao et al. [55], Yu et al. [56], Li et al. [57], Deng et al. [58], and Ji [59] have developed methodologies for detecting bridge cracks using images and YOLO algorithms. For example, Murao et al. [55] employed YOLOv2 on a set of UAV-captured images, but the model accuracy was only 60% due to insufficient training images, underscoring the importance of high-quality datasets. Similarly, Deng et al. [58] modified YOLOv2 to detect bridge cracks and handwriting on bridge surfaces.
Ruggieri et al. [60] also emphasized the importance of datasets and trained YOLOv5 to detect cracks, corroded steel reinforcement, deteriorated concrete, honeycombs, moisture spots, and pavement degradation on concrete bridge surfaces using 2685 images. They noted that their dataset was challenging due to visual similarities among some classes, which could result in misclassifying one defect class as another; moreover, overlap between two defect classes could cause the algorithm to fail. Zhang et al. [61] proposed a method for highway concrete bridge defect detection using YOLOv3, training the model on a dataset of 2206 images of cracks, spallation, exposed bars, and pop-out. Furthermore, in 2023, Liu et al. [62] employed YOLOv5 to detect cracks in concrete bridges and utilized a deep convolutional generative adversarial network (DCGAN) to produce artificial images, increasing the number of dataset images and reducing the cost of data acquisition. Yamane et al. [63] proposed a novel method for detecting exposed bars in images using YOLOv5 and then reflecting them on a 3D bridge model. Structure from Motion (SfM), a technique that creates 3D models from 2D images, was employed to construct a 3D model of the bridge and its defects, which enhanced the integration of damage detection with BIM. In another study employing the YOLOv3 model, Teng et al. [64] successfully identified cracks and exposed bars in concrete bridges. Their study included a comparative analysis of YOLOv2, YOLOv3, and Faster R-CNN. In addition, they implemented transfer learning and data augmentation techniques to enhance the performance of YOLOv3; the transfer learning involved retraining the SqueezeNet backbone, which extracts features in their YOLO model, on bridge defect images. Consequently, the improved YOLOv3 model exhibited superior accuracy compared to the other models.
Other object detection algorithms that can help identify bridge damage include R-CNN (Region-based Convolutional Neural Networks) and Faster R-CNN. A study by Yu et al. [65] utilized Faster R-CNN to develop an object detector capable of detecting cracks, spallation, and exposed bars, achieving an mAP50 of 84.56%. Similarly, Li et al. [66] utilized Faster R-CNN, training it with a dataset of 637 images captured by UAVs; their model was tested on a cable bridge in China to detect three types of cracks (transverse, longitudinal, and alligator cracks), achieving a precision of 92.03%. Lin et al. [67] generated a UAV flight plan by considering various input parameters such as a 3D map, georeferenced inspection areas, target capture resolution, the cameras’ relative orientation, required visibility, completeness, and safety. Subsequently, they generated 3D point clouds from the captured images and employed a feature pyramid network (FPN) in a Faster R-CNN system to identify bridge damage, including cracks, spallation, efflorescence, corrosion stains, and exposed bars, in the images; the results were projected back into the 3D point cloud. In 2023, using UAV images, Gan et al. [68] proposed a method for detecting, quantifying, and visualizing cracks on the underside of concrete bridges. The process utilized deep learning techniques such as Faster R-CNN together with BIM. The authors captured 637 images of the bridge using a UAV and augmented the dataset by flipping and other methods. They then trained the model on the augmented dataset and utilized spline curves and crack box girder families to create realistic crack shapes and map them onto the bridge model. Finally, they employed a comprehensive judgment method to evaluate the health status of a cracked region of the bridge, using crack width as the criterion for the structural health grade, with reference to technical specifications for highway maintenance and bridge evaluation [69]. The Faster R-CNN model achieved an accuracy of 92.03% and a recall of 96.54%.
Image quality can strongly affect algorithm performance. As a result, Hong et al. [70] initially processed the images using a super-resolution (SR) model to improve image quality. They then labeled dataset images based on damage type and bridge member and developed n models, where n denotes the number of bridge members, one for each member-specific dataset, to identify efflorescence, water leaks, concrete sealing, concrete spallation, cracks, and RC corrosion. They trained and evaluated Mask R-CNN, YOLO, and BlendMask and ultimately selected the most effective model. Their rationale was the observation that the shapes of the same type of damage differed across bridge members, prompting them to develop distinct models for each member. Training optimized detection models for each member yielded precise detection results: the BlendMask model demonstrated an 11.31% improvement and the Mask R-CNN model an 8.893% enhancement in accuracy. Notably, BlendMask outperformed Mask R-CNN overall, with a mean average precision (mAP) of 92.87 compared to 90.374, although Mask R-CNN performed better in specific cases such as the rail and road pavement models. In another study, Deng et al. [71] trained Faster R-CNN on a set of images to detect cracks and handwriting on bridges and compared it to YOLOv2; Faster R-CNN achieved higher detection accuracy for concrete crack detection, while YOLOv2 was faster.
Recent advancements in defect detection for concrete bridges have incorporated attention mechanisms into deep learning models. A study [72] proposed an enhanced YOLO11 model that integrated attention mechanisms to better focus on relevant details in bridge defect images. This not only improved model performance but also enabled real-time implementation on constrained devices, making it a practical tool for on-site inspections.
Table 4 presents a summary of papers that were reviewed in this section.

4.2.3. Image Segmentation

Initially, researchers segmented bridge defects using image processing methods, such as threshold-based segmentation [76,77,78]. Recently, deep learning models have transformed this field, achieving remarkable performance improvements [79]. The pioneering deep learning algorithm for image segmentation was the Fully Convolutional Network (FCN) [21]. Since then, multiple traditional deep learning segmentation algorithms have been developed, including U-Net, SegNet, PSPNet, DeepLab, and more [80]. The outputs of segmentation algorithms and object detection algorithms are compared in Figure 8. In 2019, Rubio et al. [81] applied an FCN to segment delamination and exposed bars on concrete bridge decks. Their dataset consisted of 734 images of bridges located in Japan. In addition, they employed transfer learning by training VGG-16 on the PASCAL VOC 2012 dataset. The model achieved a mean accuracy of 89.7% for delamination and 78.4% for exposed bars. In 2022, Lopez Droguett et al. [82] proposed a DenseNet algorithm with 13 layers for crack segmentation on concrete bridges and provided two datasets for crack detection; the model achieved an Intersection over Union (IoU) of 94.51%. Merkle et al. [83] segmented cracks using U-Net, achieving a precision of 56.1%; the precision was limited by the small amount of training data (76 images). By combining photogrammetry and SfM, they reconstructed a metric 3D model of the bridge so that each image pixel could be projected onto the bridge surface with real-world coordinates. The crack masks were post-processed to extract the medial axis and crack contours, which were then back-projected onto the 3D CAD/SfM model. Crack length was computed as the Euclidean distance between the farthest 3D points along each crack centreline, and crack width at each 3D medial-axis point was computed as the distance to the nearest 3D contour point. The local crack width was then defined as
$$W = 2 \times d_{\text{axis-contour}} + \text{GSD}$$
where GSD (ground sampling distance) is the local real-world size of one pixel (e.g., 0.2 mm/pixel), derived from the SfM reconstruction and camera parameters; in other words, the GSD indicates how many millimeters on the bridge correspond to one image pixel at that location. In another study [84], U-Net was applied to multi-class segmentation of concrete defects, and different loss functions were evaluated to compare performance. The study demonstrated that U-Net, when trained with IoU loss functions, achieved the best results for detecting defects in concrete structures.
In 2023, Fukuoka and Fujiu [85] introduced a segmentation method for detecting bridge defects using a transfer learning algorithm called SegFormer to identify delamination and exposed bars. Additionally, they trained SegNet, a convolutional neural network, using the same dataset with images of various dimensions to compare the model results. Their findings revealed that larger input image dimensions resulted in improved model performance. SegFormer exhibited slightly lower precision when detecting delamination than the SegNet model; however, it demonstrated significantly better recall. Conversely, in exposed bars detection, the SegFormer model outperformed the SegNet model in all aspects.
Li et al. [86] presented a neural network for bridge crack detection that used an encoder–decoder structure and an MPM module. The MPM module helped the network to detect long and thin objects like cracks. The authors also collected and processed 2240 original photos of bridge cracks, which were expanded to a bridge crack data set of 10,000 images after image processing, and used it to train and test their model. They compared their model with four other image segmentation networks and showed that their model performs better in detecting bridge cracks. However, they acknowledged that their method does not calculate the width, length, area, and other parameters of the cracks, which are important for assessing the bridge condition.
Yamane et al. [87] proposed integrating and recording damage detected from multiple images into a 3D model using deep learning and structure from motion. The authors used Mask R-CNN with PointRend to detect corrosion damage from bridge images. They trained and evaluated the model on a dataset of 1966 images and demonstrated its high accuracy and precision in detecting damage. They employed SfM to construct a 3D model of the bridge from 2316 images captured by a UAV.
Some scholars modified traditional deep learning segmentation architectures to improve their performance. For instance, Xu et al. [88] modified DeepLabv3+ with a MobileNetV2 backbone, a modified ASPP module, and a piecewise loss function to segment cracks on bridges. They also trained the model on two additional datasets (one containing concrete spallation with exposed bars and the other cable corrosion) to segment spallation, exposed bars, and cable corrosion. Bae et al. [89] introduced “SrcNet”, a CNN-based deep neural network designed to enhance automatic crack detection in raw digital images captured by drones over large civil infrastructure. The network addressed the blurriness in these images caused by motion: SrcNet used deep learning to generate high-resolution images from the raw input and then employed VGG-16 to detect cracks automatically at the pixel level. Deng et al. [90] introduced a novel deep learning method for detecting defects in concrete bridges, including exposed bars and delamination. Their network, LinkASPPNet, integrated a ResNet encoder with an enhanced ASPP module to capture multi-scale information, and the paper also introduced an improved loss function to handle dataset imbalance. The model was trained on a dataset of 732 labeled images annotated by three experts. Its performance was compared to U-Net and LinkNet, and it outperformed both, achieving an F1 score of 93.46% and an accuracy of 92.91%. There are also further studies on bridge defect segmentation using deep learning algorithms [10,91,92,93,94].
Creating a dataset of crack images with pixel-level labels is time-consuming: annotating a single 256 × 256 crack image at the pixel level takes about 1.5 min, so annotating thousands of images is challenging. To deal with this problem, Jin et al. [95] suggested a GAN-based method that simultaneously generated synthesized crack images and their corresponding labels. A dataset containing crack images and their pixel-wise labels was prepared first; then, a Deep Convolutional GAN model was adopted to generate synthesized crack annotations, while the Pixel2Pixel model was used to create the corresponding synthesized crack images. They used 7800 pixel-wise labeled crack images and produced 23,400 synthesized crack annotation images. Finally, they trained PCR-Net on a mixed dataset consisting of both real and synthesized crack images, stating that the proposed method for establishing the synthesized crack image dataset could be helpful for training deep neural networks. Also, Subedi et al. [96] presented a novel method for segmenting bridge components and damage using synthetic images and deep learning models. Their method consisted of a new architecture combining a RegNet encoder with a Nested U-Net decoder, which they claimed to outperform other state-of-the-art architectures (such as DeepLabV3+, U-Net, PSPNet, LinkNet, and RefineNet) in component and damage segmentation tasks. They also employed a Lovasz–Softmax loss function to address the class imbalance problem and weighted softmax ensembling to enhance performance. In another study, Ayele et al. [97] used Mask R-CNN to segment bridge cracks. To overcome the burden of pixel-wise labeling, they used an active learning approach for labeling images captured by UAV from the Skodsberg Bridge in Norway, which led to 90% less manually labeled data. They also measured length and width as Euclidean distances between contour points in the binary mask, then converted those pixel distances to physical crack length/width using the known ground sampling (pixel size) of the orthomosaic. Wang et al. [98] used a refinement network embedded with an attention mechanism (RefineNet-AM) on a synthetic (Tokaido) dataset to detect bridge components and damage after an earthquake. The attention mechanism, inspired by the human ability to pay more attention to important areas of images, increased the weight of important image regions. They demonstrated that the proposed RefineNet-AM could achieve satisfactory accuracy and robustness on the test dataset and outperform a baseline model (U-Net) on several metrics.
Some researchers combined the defect information from deep learning algorithms with 3D models and BIM. Montes et al. [99] developed a unique 3D dataset based on a real bridge structure using a low-cost LiDAR (Light Detection and Ranging)-enabled imaging device. The dataset contained point clouds and RGB images of bridge components and damage, with semantic annotations. They proposed a deep learning method for semantic segmentation of bridge components and damage using an enhanced 3D graph neural network; a 3D Graph Neural Network (3D GNN) is a neural network created to process data structured as graphs in three-dimensional space. First, a colored point cloud image was input into SubNet1 of the enhanced 3D GNN. This image was used to create a directed graph, where each pixel was considered a node connected to its K-nearest neighbors in 3D space. Pixel features were then calculated using color information and a transformed CNN, serving as the initial node information and resulting in a component prediction map. In the second step, initial information and weights were re-assigned to each node based on the results of the first sub-network and the constructed graph network. Color information and node types were combined, and an initial convolution was performed through a CNN, serving as the initial node information for the second network. The final output was a damage-type prediction map, effectively utilizing texture and geometric information throughout the process. Artus et al. [100] conducted a study on applying Building Information Modeling to model bridge defects. Their research established a systematic approach for the automated collection and transfer of defect data and the creation of a defect information model for data exchange, using Industry Foundation Classes (IFC) for this purpose. Additionally, they utilized TernausNet to segment spalled areas in the captured images of the bridge and an SfM-based pipeline to generate a metric point cloud of the bridge components. The segmented spall pixels were then back-projected onto this 3D point cloud so that the defect geometry (area and volume of the spall) was obtained directly in real-world units and stored as voiding features in the BIM model. Finally, using an existing BIM model in the form of an IFC file and point clouds created from photos of the damaged structure, they developed a geometric-semantic as-damaged BIM model including the damage data. Borin and Cavazzini [101] introduced an approach that integrates deep learning, imaging, and BIM to evaluate the state of concrete bridges. They employed Mask R-CNN to identify and segment spallation; the model was trained on a dataset of 575 manually annotated images and tested on a bridge in northern Italy. This study used images to generate point clouds, forming the foundation for constructing the BIM, which then functioned as a platform for organizing and visualizing information regarding the bridge’s condition.
In some studies, measurements of the defects have also been carried out in addition to identifying them. For example, a study conducted by Jang et al. [102] aimed to detect cracks in high-rise bridge piers using a climbing robot and deep learning algorithms. The robot captured high-quality images while climbing the piers, which were then used to train a modified SegNet to segment cracks. To determine the length and width of the cracks, the images were converted to grayscale, and a median filter was applied. Background information was removed by subtracting the filtered image from the grayscale image, and the Euclidean distance transform (EDT) and skeletonization were used to calculate the crack length and width. Using a pinhole-camera scale factor:
$$s = \frac{d_w \, l}{P \, f}$$
where $d_w$ is the working distance, $l$ the physical sensor size, $P$ the image resolution, and $f$ the focal length, they obtained a metric resolution of 0.086 mm per pixel. Crack length was then obtained by summing the lengths of the skeleton pixels and multiplying by this scale, while crack width at each skeleton point was derived from the EDT radius (distance to the crack boundary) multiplied by the same scale. In this way, both crack length and width were expressed in real-world units (millimeters) rather than pixels.
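The overall pixel-to-millimeter pipeline can be sketched with scikit-image and SciPy; the camera values below are placeholders chosen only to reproduce the reported order of magnitude (≈0.086 mm/pixel), and the length estimate is approximate because diagonal skeleton steps are counted as unit pixels.

```python
# Skeletonize a binary crack mask, read widths from the Euclidean
# distance transform (EDT), and convert to millimeters via the scale s.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def measure_crack(mask: np.ndarray, s_mm_per_px: float):
    skeleton = skeletonize(mask > 0)               # crack centreline (medial axis)
    edt = distance_transform_edt(mask > 0)         # distance to crack boundary
    length_mm = skeleton.sum() * s_mm_per_px       # approximate centreline length
    widths_mm = 2.0 * edt[skeleton] * s_mm_per_px  # EDT radius -> local width
    return length_mm, (widths_mm.max() if widths_mm.size else 0.0)

# Pinhole scale factor s = d_w * l / (P * f); all camera values are placeholders.
d_w, l, P, f = 2000.0, 24.0, 4000, 140.0  # mm, mm, pixels, mm
s = d_w * l / (P * f)                     # ~0.086 mm per pixel
```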
In 2020, McLaughlin et al. [103] automated the inspection and quantification of damaged areas in concrete bridges using robotics and deep learning. Their study involved a robot equipped with a camera and LiDAR to capture images of bridges. The approach integrated simultaneous localization and mapping (SLAM) to combine the LiDAR data and the defect information in the images into a labeled 3D map of the bridge structure. Additionally, they employed a CNN architecture to automatically identify spallation and delamination in both regular and infrared images, and they proposed a technique to measure defect areas on the 3D map. For automatic damage assessment, they utilized MobileNetV2 for feature extraction and DeepLabV3 for pixel-level segmentation; this CNN architecture achieved high accuracy in detecting structural defects.
Detecting post-earthquake bridge defects has proven to be another valuable application of deep learning. In a study by Ye et al. [104], the multi-task high-resolution net (MT HRNet) model, a network designed to handle multiple tasks simultaneously while maintaining high-resolution representations by avoiding aggressive downsampling, was utilized to identify concrete bridge defects such as cracks, spallation, and exposed bars and to evaluate bridge safety levels. They trained the model on the Tokaido dataset and achieved an accuracy of 0.9661 in detecting bridge components and 0.9944 in detecting bridge damages. Based on the findings, the authors determined that columns are the most critical component of a bridge. By considering three key indices, namely the damage type of the column (shear/torsional damage and bending damage), the spallation area, and the width of the cracks (in pixels), the post-earthquake bridge safety risk level was assessed.
Recently, researchers have started exploring transformer-based models for concrete defect segmentation, moving beyond traditional CNN architectures. In 2025, a study [105] used the ALF-ViT model, a Vision Transformer-based approach for crack detection in low-light concrete images, which achieved good segmentation performance under challenging illumination and shows promise for real-world inspection scenarios.
Table 5 summarizes the reviewed papers on bridge defect detection using segmentation algorithms.

4.2.4. Hybrid Methods

Segmentation algorithms were often integrated with classification and object detection algorithms to enhance the speed and accuracy of detection, as depicted in Figure 9. For instance, Peng et al. [112] proposed a method based on drones and machine vision for detecting cracks and measuring their width. This approach utilized R-FCN for crack detection and localization, along with the Haar-AdaBoost algorithm for segmentation. Additionally, they incorporated a three-point laser rangefinder to accurately measure the distance to the object, enabling crack width calculation. Flah et al. [113] developed a model based on image processing and deep learning algorithms to detect cracks on concrete surfaces such as buildings and bridges. The method utilized a CNN for crack detection, followed by edge detection and thresholding techniques to accurately measure the crack's length, width, and orientation angle. Crack dimensions were first obtained in pixels and then converted to real-world units by asking the user to input the physical area of the imaged surface; using the image resolution, the algorithm scaled pixel distances to metric values, achieving small quantification errors for crack length, width, and orientation. Kim et al. [114] trained well-known CNNs, including AlexNet, VGG-16, and ResNet-50, to detect cracks in blurry images. The images were cropped into square patches, and the CNNs classified the patches into two classes: crack and non-crack. After a connected crack region was identified in the detection stage, it was visualized as a blob image; in image processing, a blob (Binary Large Object) is a group of connected pixels in a binary image that share specific properties, such as color or intensity. Blobs with low confidence were filtered out, and the remaining refined blobs were transformed into lines using a morphological-thinning algorithm, which simplifies the structure of a binary image while preserving its essential shape. The lengths of these lines were then measured and converted from pixels to real-world units using a pixel-to-centimeter conversion rate.
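The blob-filtering and thinning chain described for Kim et al. can be sketched as follows. This is not the authors' code: an area threshold stands in for their confidence-based blob filtering, and the pixel-to-centimeter rate is an assumed value.

```python
# Label connected crack blobs in a binary mask, drop small blobs (a stand-in
# for the study's confidence filtering), thin the rest to one-pixel lines,
# and convert the measured line length with a pixel-to-centimeter rate.
import numpy as np
from skimage.measure import label, regionprops
from skimage.morphology import thin

def crack_length_cm(binary: np.ndarray, min_area: int = 50,
                    cm_per_pixel: float = 0.05) -> float:
    """binary: crack/non-crack patch classifications stitched into a mask."""
    blobs = label(binary > 0)                 # connected components
    keep = np.zeros_like(binary, dtype=bool)
    for region in regionprops(blobs):
        if region.area >= min_area:           # filter out weak blobs
            keep[blobs == region.label] = True
    lines = thin(keep)                        # morphological thinning
    return lines.sum() * cm_per_pixel         # length in centimeters
```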
Over time, advances in deep learning segmentation algorithms have overtaken traditional segmentation methods. In 2019, Liang [115] introduced an innovative image-based technique for inspecting concrete bridge columns after incidents using deep learning and Bayesian optimization. This method involved three key steps: first, a CNN was utilized to categorize images as damage-free or damaged; second, the Faster R-CNN model was employed to identify bridge columns, the critical load-bearing components, using object detection; finally, damage on the columns was determined at the pixel level. This method enabled engineers and decision-makers to promptly evaluate the bridge condition following an incident and make informed decisions about repairs or replacements. Also, Mirzazade et al. [116] utilized the SegNet model to separate bridge components from the background. They then trained the Inception v3 model to classify images with and without damage. Following this, they employed U-Net to segment the bridge joint openings. For damage quantification, they used 3D point cloud data generated through SfM and compared it with laser scanning data. The dimensions of the detected damage were then calculated directly from the 3D point clouds in real-world units (cm). Other researchers, such as Kun et al. [117], likewise combined classification and segmentation algorithms to detect bridge defects.
In some articles, the defective region was initially identified using object detection algorithms, followed by defect detection at the pixel level using segmentation algorithms. For example, Kim et al. [118] utilized a UAV to capture images of a bridge, created a 3D model of the bridge, and trained an R-CNN to detect crack regions. They then used image processing algorithms to measure the length and thickness of the detected cracks and placed a marker on the bridge to calibrate the image scale, enabling accurate conversion of crack dimensions from pixels to millimetres. The researchers observed that environmental conditions, such as lighting and shadows, could affect the accuracy of crack measurements; as a solution, they suggested using proper lighting equipment during the photography process in future research. Also, Ni et al. [119] developed a deep learning framework for accurately and quickly detecting and measuring concrete cracks from images. They used DCGANs to synthesize crack images from real ones and augment the dataset for training the YOLOv5s model. They also applied the Otsu and edge detection methods to segment the crack images and used the medial axis algorithm to estimate the crack length and width. In their laboratory tests, each crack image was captured together with a checkerboard calibration target of known square size, so that the pixel-based skeleton and distance map could be converted into physical crack dimensions using this pixel-to-millimetre scale.
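The checkerboard calibration used by Ni et al. can be illustrated with OpenCV's standard corner detector, as in the hedged sketch below; the board dimensions and square size are assumptions, and the corner ordering is assumed to be row-major.

```python
# Recover a pixel-to-millimeter scale from a checkerboard of known square
# size that appears in the same image as the crack.
import cv2
import numpy as np

def pixels_per_mm(gray: np.ndarray, pattern=(9, 6), square_mm=25.0) -> float:
    """gray: 8-bit grayscale image containing the checkerboard target.
    pattern: inner-corner grid (columns, rows); square_mm: square edge length."""
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        raise RuntimeError("calibration target not detected")
    corners = corners.reshape(-1, 2)
    cols = pattern[0]
    # Mean distance between horizontally adjacent corners, in pixels
    # (assumes the detector returned corners row by row):
    step = np.linalg.norm(
        np.diff(corners.reshape(-1, cols, 2), axis=1), axis=2).mean()
    return step / square_mm  # pixels per millimeter
```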
Jiang et al. [120] set out to identify bridge cracks in over 100,000 high-resolution images (5120 × 5120 pixels). Since applying segmentation algorithms to every image would have taken too long, they first used YOLOv4 to identify images containing cracks and their approximate locations. They then applied HDCB-Net, a deep learning algorithm with only three downsampling stages, for pixel-level crack identification, mitigating the information loss that downsampling causes for blurred cracks. Similarly, Tran et al. [121], Kao et al. [122], and Ma et al. [123] utilized object detection followed by a segmentation algorithm to detect concrete bridge cracks. Tran et al. [121] compared five advanced networks for object detection and concluded that YOLOv7 was the most effective one for high-quality images. Additionally, they enhanced two networks for object segmentation, U-Net and pix2pix, by adjusting parameters such as network depth, activation function, loss function, and data augmentation. The modified U-Net model outperformed the original U-Net and pix2pix models. Their approach achieved a 92.38% accuracy in measuring the length of cracks and demonstrated a 0.87 R2 value and 91% average accuracy in estimating the width and type of cracks. Furthermore, they addressed the challenge of dealing with huge images (18,000 × 10,000 pixels corresponding to 3.6 m × 2.0 m, i.e., about 0.2 mm per pixel) by splitting them into smaller parts. Kao et al. [122] employed YOLOv4 to locate the cracks and used a combination of thresholding (the Sauvola local threshold method) and edge detection (Canny and morphological methods) to estimate crack width. They collected 1463 images and used an additional 3006 images from the SDNET2018 dataset to train their YOLOv4 model, which achieved a 92% accuracy in crack detection. This study also compared two crack width measurement methods (the planar marker and the total station) for bridges, revealing a slight difference of 0.005 mm between them. However, the planar marker method was deemed unsuitable for bridges over water or roads, whereas the total station method was easier to use and free of the planar marker's limitations, making it the preferred approach for measuring bridge crack sizes. Ma et al. [123] assessed the effectiveness of different deep learning frameworks (DLFs) and convolutional neural networks in identifying bridge cracks from images. The researchers chose three images of bridge cracks with varying orientation characteristics: oblique, horizontal, and vertical. They used these images to measure the performance of different DLFs and CNNs, comparing three object detection models (Faster R-CNN, SSD, and YOLO-v5(x)) across three DLFs (PyTorch, TensorFlow2, and Keras), and two segmentation CNNs (U-Net and PSPNet) using the same frameworks. The study concluded that Faster R-CNN displayed the best performance for object detection and U-Net for segmentation; however, the optimal model choice depended on the DLF employed. Inam et al. [124] developed a method for identifying and localizing concrete bridge cracks using YOLOv5. They evaluated three versions of YOLOv5 (s, m, and l); YOLOv5m outperformed the other versions with an mAP of 99.3%. They used a dataset of 2270 images, consisting of 1370 images of Pakistani bridges together with images from the SDNET2018 dataset.
Afterward, they employed U-Net to segment the cracks and estimate their length, width, and area in pixel units, with an accuracy of 93.4%. In light of their findings, they recommended utilizing LiDAR, or estimating the camera resolution and distance from the surface, to measure the cracks in millimetres in future work. Also, Zakaria et al. [125] designed a method for detecting cracks and spallation on concrete bridges using portable devices such as smartphones and tablets. YOLOv5s was employed to accurately locate the defects, while U-Net measured the defects, including crack width and spallation area. In their framework, U-Net first produces a pixel-level segmentation of the damaged region, and the defect size is calculated in pixel coordinates; these pixel measurements are then converted to physical dimensions on the bridge surface using a homography-based transformation between image pixels and global coordinates. Similarly, Meng et al. [126] utilized a robot to capture images of bridge defects. The robot had advanced sensors that captured high-quality images, with optical anti-shake and thermal anti-fog capabilities, and used the BeiDou Navigation Satellite System for positioning and navigation. Its high-strength mechanical arm, powered by a silent DC rotary motor, could extend up to 6.5 m horizontally and 80 m vertically. They then employed YOLOv3 for defect detection and DeepLab for defect segmentation.
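Returning to the homography-based conversion described for Zakaria et al., the idea can be sketched as follows; the point correspondences below are illustrative, not from the study.

```python
# Map pixel coordinates of a segmented defect onto metric coordinates on the
# (planar) bridge surface via a homography estimated from known points.
import cv2
import numpy as np

# Four image points (pixels) and their known positions on the surface (mm).
px = np.float32([[100, 120], [820, 110], [830, 600], [90, 610]])
mm = np.float32([[0, 0], [1000, 0], [1000, 700], [0, 700]])
H, _ = cv2.findHomography(px, mm)

# Two pixels on opposite crack edges, transformed to surface coordinates:
defect_px = np.float32([[[450, 300]], [[470, 335]]])
defect_mm = cv2.perspectiveTransform(defect_px, H)
width_mm = np.linalg.norm(defect_mm[0, 0] - defect_mm[1, 0])
```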
Using CR-YOLO and PSPNet algorithms, Zhang et al. [80] aimed to detect and segment cracks on bridge surfaces. To train their model, they created a dataset of 800 bridge crack images, which they augmented to 5000 images by adding publicly available images and applying image enhancement techniques. They also evaluated their model on an edge device and found it suitable for real-time crack detection and segmentation. Table 6 summarizes the analyzed hybrid methods.

5. Discussion and Comparison of Algorithms

This section compares deep learning algorithms for image-based bridge defect detection and measurement. Table 7 reviews the existing literature that has evaluated these algorithms. As mentioned earlier, image classification, segmentation, and object detection algorithms, or their combinations, can be used for bridge damage detection. However, employing segmentation algorithms is inevitable if defect measurement is also needed.
Among classification algorithms, the reviewed studies consistently indicate that deeper CNN backbones such as VGG, ResNet, and DenseNet achieve higher accuracy than shallower or handcrafted-feature approaches, whereas lightweight architectures (e.g., MobileNet) tend to offer faster inference. Trach [46] conducted a comprehensive study on classifying cracks, spallation, and pop-outs, comparing various classification algorithms and concluding that ResNet50 outperforms VGG16 in accuracy, which in turn surpasses DenseNet121. The research also highlighted the superior speed of MobileNet and InceptionV3 compared to the other algorithms. Furthermore, Cardellicchio et al. [45] trained five convolutional neural networks (DenseNet121, InceptionV3, ResNet50V2, MobileNetV3, and NASNetMobile) to classify images into seven distinct defect classes. The findings revealed that ResNet performed better in detecting most defect categories, with InceptionV3 and DenseNet121 demonstrating comparable accuracy levels. InceptionV3 is often preferred for identifying defects of various sizes because it applies multiple kernel sizes within the same layer; this allows it to detect both small and large defects accurately while remaining computationally efficient, since the network is not very deep [129]. Taken together, these results suggest that ResNet- and DenseNet-type backbones are generally preferred when accuracy is prioritized, whereas MobileNet- and NASNet-type models are attractive for resource-constrained or near-real-time applications.
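Most of these classification studies rely on transfer learning from ImageNet-pretrained backbones. The following is a minimal sketch of that setup for a ResNet50 classifier; the class count and the freeze-then-fine-tune strategy are assumptions, not taken from any single reviewed study.

```python
# Transfer learning: reuse an ImageNet-pretrained ResNet50 and replace its
# final layer for a bridge-defect classification task.
import torch.nn as nn
from torchvision import models

NUM_DEFECT_CLASSES = 4  # e.g., crack, spallation, pop-out, no defect
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False            # freeze the pretrained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_DEFECT_CLASSES)
# Only model.fc is trained initially; the backbone can be unfrozen later
# and fine-tuned at a lower learning rate.
```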
Object detection algorithms, which localize objects with bounding boxes and then classify them, are divided into one- and two-stage detectors. One-stage detectors perform both steps in a single pass, making them faster. In contrast, two-stage detectors first propose approximate regions likely to contain objects and then classify those regions in a second stage, making them more accurate [130].
Several object detection algorithms have been developed, including R-CNN, Faster R-CNN, YOLO, Single Shot MultiBox Detector (SSD), RetinaNet, and EfficientDet. R-CNN and Faster R-CNN are two-stage object detectors and are generally known for their accuracy, while YOLO, SSD, RetinaNet, and EfficientDet are faster because they are one-stage object detectors [131]. One-stage object detectors, such as early versions of the YOLO algorithm and SSD, might be less suitable for small or irregularly shaped objects, such as thin cracks, due to potential limitations in accuracy [132,133].
Ma et al. [123] conducted a comparison of various object detection algorithms and determined that the Faster R-CNN model demonstrated the highest recall and F1-score compared to other object detection CNNs such as SSD and YOLO-v5(x). Yet, according to Tran et al. [121] and Ni et al. [119], recent versions of the YOLO model have demonstrated superior accuracy and computational speed compared to Faster R-CNN. Overall, the reviewed evidence indicates that modern one-stage detectors (the YOLO family) provide a favorable accuracy-speed trade-off for UAV-based and real-time inspection, while two-stage detectors remain competitive and can still be advantageous when detection of small or overlapping defects is prioritized and higher computational cost is acceptable.
Several algorithms have been widely studied in image segmentation, including DeepLabv3+, U-Net, SegNet, and Fully Convolutional Networks. In a study by Qiao et al. [93], which focused on segmenting cracks and exposed bars, DeepLabv3+, SegNet, and FCN were evaluated. Among these, DeepLabv3+ demonstrated superior performance. Another comparative study conducted by Jin et al. [95] found that U-Net outperformed both DeepLab and FCN regarding precision and accuracy, while DeepLab exhibited better Mean Intersection over Union (MeanIoU) than U-Net. Lastly, Subedi et al. [96] addressed concrete damage and exposed bars using Nested Reg-UNet, comparing it with six other state-of-the-art segmentation algorithms. In this case, DeepLabv3+ surpassed U-Net in terms of precision and MeanIoU. Overall, these studies suggest that DeepLabv3+ tends to provide higher mIoU and more robust segmentation on complex, multi-defect datasets, while U-Net and its variants can yield competitive or superior precision on simpler or more homogeneous datasets. Also, Mirzazade et al. [116] concluded that U-Net outperforms SegNet in small object segmentation, while SegNet performs better in large-scale object segmentation. Furthermore, among the various segmentation algorithms, FCNs stand out for their computational efficiency. By utilizing locally connected layers and avoiding dense layers, FCNs achieve faster training with fewer parameters.
The comparative analysis and evaluation considered critical metrics such as precision, recall, mean average precision (mAP), mean intersection over union (mIoU), and total inference time. Figure 10 arranges the algorithms according to their performance in terms of speed and accuracy.
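For reference, the snippet below shows how the mIoU metric used in this comparison is typically computed from predicted and ground-truth label maps; it is a generic implementation, not tied to any reviewed study.

```python
# Mean intersection over union (mIoU) over the classes present in either map.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """pred, gt: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```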
The comparison of deep learning models for 2D image-based bridge defect detection also reveals several important trends that highlight the trade-offs between performance, dataset size, and model architecture.
In the context of single-class object detection, YOLOv5 and its variants consistently show strong performance, even with moderate dataset sizes. As shown in Figure 11, YOLOv5m achieves a 99.3% mAP with 2270 images, demonstrating that one-stage detectors can offer excellent performance with a balanced speed-accuracy trade-off. This is significant when compared to YOLOv4, which achieves 92% mAP despite using a larger dataset of 4469 images (also in Figure 11), underscoring the effectiveness of YOLOv5m.
As shown in Figure 12, Faster R-CNN outperforms YOLO even with smaller datasets, indicating that two-stage detectors are often more accurate for multi-class detection tasks. For instance, Faster R-CNN achieves 90% mAP with 1660 images, while YOLOv3 achieves 88% mAP with the same number of images. This suggests that Faster R-CNN tends to provide better performance even with smaller datasets when the task involves multiple object classes. Furthermore, YOLOv3 with transfer learning (TL) and data augmentation (DA) achieves the highest mAP in this comparison, 91% at 1660 images, showing that data augmentation and transfer learning significantly enhance YOLO's performance, narrowing the gap with two-stage detectors.
On the other hand, YOLOv5s (with 1780 images) shows a significantly lower mAP of 51%, indicating that lightweight models may struggle in multi-class detection, especially when the dataset is not large enough.
Additionally, models like YOLOv2 and YOLOv3 achieve mAP values of around 70% on larger datasets, suggesting that these YOLO versions perform reasonably well at scale but still fall short of Faster R-CNN in multi-class detection scenarios.
In single-class segmentation, as shown in Figure 13, DeepLabv3+ outperforms other models, achieving 82.37% IoU with 5000 images.
In multi-class segmentation, as shown in Figure 14, U-Net achieves the highest mIoU of 80.19% with 7079 images, slightly outperforming DeepLabv3+, which achieves 80.03% mIoU with the same dataset size. Models like SegNet and FCN, both using 4300 images, show lower mIoU values of 75.03% and 74.75%, respectively.
Models with deeper backbones or more complex decoder structures (e.g., ResNet/DenseNet classifiers, DeepLabv3+ segmenters, and two-stage detectors such as Faster R-CNN) typically achieve higher mAP or mIoU but require longer inference times, whereas lightweight CNNs (e.g., MobileNet, NASNet) and one-stage detectors from the YOLO family provide faster inference at the cost of a modest reduction in accuracy. Therefore, the algorithm selection relies on the particular use case and application domain. Moreover, algorithms can be adjusted to enhance accuracy or speed.

6. Conclusions

The increasing application of AI across diverse industries has encouraged bridge managers to adopt advanced AI methodologies in bridge inspection processes. This paper systematically reviewed recent advancements in deep learning-based detection of concrete bridge surface defects, highlighting the limitations of traditional image-processing methods and outlining the transition towards machine learning and deep learning approaches. Hence, several key findings from this systematic review were derived.
Initially, classification algorithms were predominantly utilized for defect detection; however, their limitations prompted the adoption of object detection algorithms, which can handle multiple defect types within single images more effectively. Among various deep learning techniques, object detection and segmentation algorithms have emerged as the most effective methods for concrete bridge defect detection. Object detection algorithms are particularly suited when identification is the primary requirement. Faster R-CNNs (Region-based Convolutional Neural Networks) are preferable for scenarios prioritizing accuracy, whereas YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector) models are employed for faster detection. Recent versions of YOLO demonstrate superior performance in terms of both accuracy and computational speed.
For precise quantification and measurement of defects, segmentation algorithms, either alone or combined with other deep learning methods, are essential as they provide pixel-level accuracy. Among segmentation models, U-Net and DeepLab have shown higher accuracy, while FCNs (Fully Convolutional Networks) offer advantages in computational efficiency. A key limitation identified with segmentation methods is the representation of defects in pixel units rather than real-world measurements. This issue has been addressed by incorporating known-sized reference targets within images or by calculating real-world dimensions based on camera distances. Within the reviewed literature, YOLO was the most frequently employed deep learning algorithm for bridge defect detection tasks.
Beyond algorithmic performance, the reviewed studies also showed how deep learning methods have begun to be embedded in smart surveillance workflows for concrete bridges. Several works integrated CNN- and transformer-based models into UAV platforms or climbing robots for data acquisition and automated inspection [85,87,100,113]. Other studies combined image-based defect detection with photogrammetry, SfM, LiDAR, and SLAM to localize defects on 3D models or within BIM-like representations of bridges [85,87,99,103]. In post-earthquake scenarios, multi-task networks have been used to detect components and damages and to derive risk levels or safety indices for bridges [98,104]. However, a number of practical challenges remain for real-world deployment, including variable field lighting, view obstruction from vehicles and vegetation, adverse weather conditions, camera calibration and scale conversion for defect quantification, and the substantial effort required for data collection, storage, and pixel-level annotation. Real-time or near-real-time inference on edge devices is still not routinely achieved, and human–AI interaction typically remains at the stage where inspectors manually validate model outputs rather than working with well-designed decision-support interfaces. To move from paper performance to routine deployment by agencies, future research needs to focus on robust domain adaptation, uncertainty-aware predictions, active learning, efficient annotation strategies, and tighter integration with BIM/BMS and asset-management workflows that explicitly support inspectors in interpreting and acting on automated defect detections.
Considering the insights gathered, several gaps and potential areas for future research have been identified. The integration of deep learning-based defect detection with advanced 3D visualization techniques such as SfM and SLAM remains relatively unexplored. Future research could further develop methods to measure bridge condition indices and visualize them effectively using color-coded BIM systems. A significant limitation identified is the lack of readily available, comprehensive datasets containing annotated images of diverse bridge defects, particularly for segmentation tasks. Developing and openly sharing such datasets would significantly benefit the research community by reducing the extensive manual annotation currently required. While the deep learning algorithms reviewed here have demonstrated strong performance, further refinement and modification of architectures remain necessary to optimize performance, particularly when dealing with lower-quality images or complex backgrounds. Also, advancing research on computationally efficient deep learning models is vital to enable real-time defect detection, thereby enhancing usability on portable devices such as smartphones and other edge computing devices.
As we’ve seen in some recent studies, generative models and synthetic data are promising solutions to the problem of data scarcity in bridge defect detection. Since gathering and labeling real-world data is both time-consuming and resource-intensive, focusing more on these methods in the future could significantly streamline the process. Additionally, the use of large-scale vision models, such as transformers, is an exciting emerging trend. These models have the potential to push the boundaries of defect detection accuracy and automation, making them an important area for future research in bridge inspection.

Author Contributions

Conceptualization, N.L.K., E.S., N.S. and S.B.; methodology, N.L.K., N.S. and S.B.; formal analysis, N.L.K., E.S., N.S. and S.B.; data curation, N.L.K., N.S. and S.B.; investigation, N.L.K.; writing—original draft preparation, N.L.K.; writing—review and editing, N.L.K., E.S., N.S. and S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Some or all data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Risk of Bias.
Studies | Study Design | Data Handling | Reporting Transparency
Chen [38] | low | low | Some concerns
Hüthwohl et al. [42] | low | low | low
Zhu et al. [5] | low | low | low
Mundt et al. [44] | low | low | low
Zoubir et al. [49] | low | low | low
Bukhsh et al. [48] | low | low | low
Kruachottikul et al. [50] | low | Some concerns | low
Xu et al. [36] | low | low | low
Zhang et al. [37] | low | low | low
Cardellicchio et al. [45] | low | Some concerns | low
Trach [46] | low | Some concerns | low
Chen et al. [41] | low | low | low
Zhang et al. [54] | low | low | low
Abubakr et al. [47] | low | low | low
Alfaz et al. [39] | low | Some concerns | low
Aliyari et al. [51] | low | low | low
Li et al. [40] | low | Some concerns | low
Murao et al. [55] | Some concerns | Some concerns | Some concerns
Yu et al. [56] | low | low | low
Li et al. [57] | low | low | low
Deng et al. [58] | low | Some concerns | low
Ji [59] | low | Some concerns | low
Deng et al. [71] | low | Some concerns | low
Ruggieri et al. [60] | low | low | low
Zhang et al. [61] | low | low | low
Liu et al. [62] | low | Some concerns | low
Yamane et al. [63] | low | low | low
Teng et al. [64] | low | low | low
Hong et al. [70] | low | low | low
Yu et al. [65] | low | low | low
Gan et al. [68] | low | low | Some concerns
Lin et al. [67] | low | low | low
Ngo et al. [74] | low | low | Some concerns
Lu et al. [75] | low | low | Some concerns
Ruggieri et al. [72] | low | low | low
Rubio et al. [81] | low | low | low
Lopez Droguett et al. [82] | low | Some concerns | low
Merkle et al. [83] | low | low | low
Fukuoka and Fujiu [85] | low | low | low
Li et al. [86] | low | Some concerns | low
Yamane et al. [87] | low | low | Some concerns
Xu et al. [88] | low | low | low
Bae et al. [89] | low | low | Some concerns
Deng et al. [90] | low | low | low
Munawar et al. [92] | low | Some concerns | low
Li et al. [106] | low | low | Some concerns
Fu et al. [91] | low | Some concerns | low
Qiao et al. [93] | low | Some concerns | low
Jin et al. [95] | low | low | low
Subedi et al. [96] | low | low | low
Ayele et al. [97] | low | Some concerns | Some concerns
Wang et al. [98] | low | low | low
Montes et al. [99] | low | Some concerns | low
Artus et al. [100] | low | Some concerns | low
Borin and Cavazzini [101] | low | Some concerns | Some concerns
Jang et al. [102] | low | low | low
McLaughlin et al. [103] | low | low | low
Ye et al. [104] | low | low | low
Shen et al. [105] | low | low | low
Na and Kim [84] | low | low | Some concerns
Peng et al. [112] | low | Some concerns | low
Flah et al. [113] | low | low | low
Kim et al. [114] | low | low | low
Liang [115] | low | low | low
Mirzazade et al. [116] | low | Some concerns | low
Kun et al. [117] | low | Some concerns | low
Kim et al. [118] | low | Some concerns | Some concerns
Yu et al. [128] | low | low | low
Ni et al. [119] | low | low | low
Jiang et al. [120] | low | low | low
Tran et al. [121] | low | Some concerns | low
Kao et al. [122] | low | Some concerns | low
Ma et al. [123] | low | low | Some concerns
Inam et al. [124] | low | Some concerns | low
Zakaria et al. [125] | low | low | low
Meng et al. [126] | low | low | low
Zhang et al. [80] | low | Some concerns | Some concerns

References

  1. Munawar, H.S.; Hammad, A.W.; Haddad, A.; Soares, C.A.P.; Waller, S.T. Image-based crack detection methods: A review. Infrastructures 2021, 6, 115. [Google Scholar] [CrossRef]
  2. American Iron and Steel Institute. Steel Bridge Construction: Myths & Realities; American Iron and Steel Institute: Washington, DC, USA, 2007. [Google Scholar]
  3. Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P. A review on computer vision based defect detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29, 196–210. [Google Scholar] [CrossRef]
  4. Oh, J.-K.; Jang, G.; Oh, S.; Lee, J.H.; Yi, B.-J.; Moon, Y.S.; Lee, J.S.; Choi, Y. Bridge inspection robot system with machine vision. Autom. Constr. 2009, 18, 929–941. [Google Scholar] [CrossRef]
  5. Zhu, J.; Zhang, C.; Qi, H.; Lu, Z. Vision-based defects detection for bridges using transfer learning and convolutional neural networks. Struct. Infrastruct. Eng. 2020, 16, 1037–1049. [Google Scholar] [CrossRef]
  6. Spencer, B.F., Jr.; Hoskere, V.; Narazaki, Y. Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering 2019, 5, 199–222. [Google Scholar] [CrossRef]
  7. Khan, M.A.-M.; Kee, S.-H.; Pathan, A.-S.K.; Nahid, A.-A. Image Processing Techniques for Concrete Crack Detection: A Scientometrics Literature Review. Remote Sens. 2023, 15, 2400. [Google Scholar] [CrossRef]
  8. Luo, K.; Kong, X.; Zhang, J.; Hu, J.; Li, J.; Tang, H. Computer vision-based bridge inspection and monitoring: A review. Sensors 2023, 23, 7863. [Google Scholar] [CrossRef] [PubMed]
  9. Zhang, Y.; Yuen, K.-V. Review of artificial intelligence-based bridge damage detection. Adv. Mech. Eng. 2022, 14, 1–21. [Google Scholar] [CrossRef]
  10. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef] [PubMed]
  11. Kulkarni, S.; Harnoorkar, S.; Pintelas, P. Comparative analysis of CNN architectures. Int. Res. J. Eng. Technol. 2020, 7, 1459–1464. [Google Scholar]
  12. Neu, D.A.; Lahann, J.; Fettke, P. A systematic literature review on state-of-the-art deep learning methods for process prediction. Artif. Intell. Rev. 2022, 55, 801–827. [Google Scholar] [CrossRef]
  13. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
  14. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 84–90. [Google Scholar] [CrossRef]
  15. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar] [CrossRef]
  16. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  17. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar] [CrossRef]
  18. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
  19. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  20. Gurita, A.; Mocanu, I.G. Image segmentation using encoder-decoder with deformable convolutions. Sensors 2021, 21, 1570. [Google Scholar] [CrossRef]
  21. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar] [CrossRef]
  22. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; part III 18. pp. 234–241. [Google Scholar] [CrossRef]
  23. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar] [CrossRef]
  24. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
  25. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  26. Girshick, R. Fast r-cnn. arXiv 2015, arXiv:1504.08083. [Google Scholar] [CrossRef]
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. arXiv 2015, arXiv:1506.01497. [Google Scholar] [CrossRef] [PubMed]
  28. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
  29. Otsu, N. A threshold selection method from gray-level histograms. Automatica 1975, 11, 23–27. [Google Scholar] [CrossRef]
  30. Borrmann, A.; König, M.; Koch, C.; Beetz, J. Building Information Modeling: Why? What? How? Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar] [CrossRef]
  31. Hüthwohl, P.; Borrmann, A.; Sacks, R. Integrating RC bridge defect information into BIM models. J. Comput. Civ. Eng. 2018, 32, 04018013. [Google Scholar] [CrossRef]
  32. Sacks, R.; Eastman, C.; Lee, G.; Teicholz, P. BIM Handbook: A Guide to Building Information Modeling for Owners, Designers, Engineers, Contractors, and Facility Managers; John Wiley & Sons: Hoboken, NJ, USA, 2018. [Google Scholar] [CrossRef]
  33. Kim, B.; Cho, S. Automated vision-based detection of cracks on concrete surfaces using a deep learning technique. Sensors 2018, 18, 3452. [Google Scholar] [CrossRef]
  34. Prasanna, P.; Dana, K.; Gucunski, N.; Basily, B. Computer-vision based crack detection and analysis. In Proceedings of the Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2012, San Diego, CA, USA, 12–15 March 2012; pp. 1143–1148. [Google Scholar] [CrossRef]
  35. Zhou, S.; Canchila, C.; Song, W. Deep learning-based crack segmentation for civil infrastructure: Data types, architectures, and benchmarked performance. Autom. Constr. 2023, 146, 104678. [Google Scholar] [CrossRef]
  36. Xu, H.; Su, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Automatic bridge crack detection using a convolutional neural network. Appl. Sci. 2019, 9, 2867. [Google Scholar] [CrossRef]
  37. Zhang, Q.; Barri, K.; Babanajad, S.K.; Alavi, A.H. Real-time detection of cracks on concrete bridge decks using deep learning in the frequency domain. Engineering 2021, 7, 1786–1796. [Google Scholar] [CrossRef]
  38. Chen, R. Migration learning-based bridge structure damage detection algorithm. Sci. Program. 2021, 2021, 1102521. [Google Scholar] [CrossRef]
  39. Alfaz, N.; Hasnat, A.; Khan, A.M.R.N.; Sayom, N.S.; Bhowmik, A. Bridge crack detection using dense convolutional network (densenet). In Proceedings of the 2nd International Conference on Computing Advancements, Dhaka, Bangladesh, 10–12 March 2022; pp. 509–515. [Google Scholar] [CrossRef]
  40. Li, H.; Xu, H.; Tian, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Bridge crack detection based on SSENets. Appl. Sci. 2020, 10, 4230. [Google Scholar] [CrossRef]
  41. Chen, L.; Yao, H.; Fu, J.; Ng, C.T. The classification and localization of crack using lightweight convolutional neural network with CBAM. Eng. Struct. 2023, 275, 115291. [Google Scholar] [CrossRef]
  42. Hüthwohl, P.; Lu, R.; Brilakis, I. Multi-classifier for reinforced concrete bridge defects. Autom. Constr. 2019, 105, 102824. [Google Scholar] [CrossRef]
  43. Kruachottikul, P.; Cooharojananone, N.; Phanomchoeng, G.; Chavarnakul, T.; Kovitanggoon, K.; Trakulwaranont, D.; Atchariyachanvanich, K. Bridge sub structure defect inspection assistance by using deep learning. In Proceedings of the 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), Morioka, Japan, 23–25 October 2019; pp. 1–6. [Google Scholar] [CrossRef]
  44. Mundt, M.; Majumder, S.; Murali, S.; Panetsos, P.; Ramesh, V. Meta-learning convolutional neural architectures for multi-target concrete defect classification with the concrete defect bridge image dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 11196–11205. [Google Scholar] [CrossRef]
  45. Cardellicchio, A.; Ruggieri, S.; Nettis, A.; Patruno, C.; Uva, G.; Renò, V. Deep learning approaches for image-based detection and classification of structural defects in bridges. In Proceedings of the International Conference on Image Analysis and Processing, Lecce, Italy, 23–27 May 2022; pp. 269–279. [Google Scholar] [CrossRef]
  46. Trach, R. A Model Classifying Four Classes of Defects in Reinforced Concrete Bridge Elements Using Convolutional Neural Networks. Infrastructures 2023, 8, 123. [Google Scholar] [CrossRef]
  47. Abubakr, M.; Rady, M.; Badran, K.; Mahfouz, S.Y. Application of deep learning in damage classification of reinforced concrete bridges. Ain Shams Eng. J. 2024, 15, 102297. [Google Scholar] [CrossRef]
  48. Bukhsh, Z.A.; Anžlin, A.; Stipanović, I. BiNet: Bridge Visual Inspection Dataset and Approach for Damage Detection. In Proceedings of the 1st Conference of the European Association on Quality Control of Bridges and Structures: EUROSTRUCT 2021, Padua, Italy, 29 August–1 September 2021; pp. 1027–1034. [Google Scholar] [CrossRef]
  49. Zoubir, H.; Rguig, M.; El Aroussi, M.; Chehri, A.; Saadane, R.; Jeon, G. Concrete Bridge defects identification and localization based on classification deep convolutional neural networks and transfer learning. Remote Sens. 2022, 14, 4882. [Google Scholar] [CrossRef]
  50. Kruachottikul, P.; Cooharojananone, N.; Phanomchoeng, G.; Chavarnakul, T.; Kovitanggoon, K.; Trakulwaranont, D. Deep learning-based visual defect-inspection system for reinforced concrete bridge substructure: A case of Thailand’s department of highways. J. Civ. Struct. Health Monit. 2021, 11, 949–965. [Google Scholar] [CrossRef]
  51. Aliyari, M.; Droguett, E.L.; Ayele, Y.Z. UAV-based bridge inspection via transfer learning. Sustainability 2021, 13, 11359. [Google Scholar] [CrossRef]
  52. Dorafshan, S.; Thomas, R.J.; Maguire, M. SDNET2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks. Data Brief 2018, 21, 1664–1668. [Google Scholar] [CrossRef]
  53. Li, L.-F.; Ma, W.-F.; Li, L.; Lu, C. Research on detection algorithm for bridge cracks based on deep learning. Acta Autom. Sin. 2019, 45, 1727–1742. [Google Scholar]
  54. Zhang, Y.; Ni, Y.-Q.; Jia, X.; Wang, Y.-W. Identification of concrete surface damage based on probabilistic deep learning of images. Autom. Constr. 2023, 156, 105141. [Google Scholar] [CrossRef]
  55. Murao, S.; Nomura, Y.; Furuta, H.; Kim, C.-W. Concrete crack detection using uav and deep learning. In Proceedings of the 13th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP), Seoul, Republic of Korea, 26–30 May 2019; pp. 1–8. [Google Scholar]
  56. Yu, Z.; Shen, Y.; Shen, C. A real-time detection approach for bridge cracks based on YOLOv4-FPM. Autom. Constr. 2021, 122, 103514. [Google Scholar] [CrossRef]
  57. Li, X.; Sun, H.; Song, T.; Zhang, T.; Meng, Q. A method of underwater bridge structure damage detection method based on a lightweight deep convolutional network. IET Image Process. 2022, 16, 3893–3909. [Google Scholar] [CrossRef]
  58. Deng, J.; Lu, Y.; Lee, V.C.-S. Imaging-based crack detection on concrete surfaces using You Only Look Once network. Struct. Health Monit. 2021, 20, 484–499. [Google Scholar] [CrossRef]
  59. Ji, H. Development of an Autonomous Column-Climbing Robotic System for Real-time Detection and Mapping of Surface Cracks on Bridges. In Proceedings of the 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET), London, UK, 19–21 May 2023; pp. 1–6. [Google Scholar] [CrossRef]
  60. Ruggieri, S.; Cardellicchio, A.; Nettis, A.; Renò, V.; Uva, G. Using machine learning approaches to perform defect detection of existing bridges. Procedia Struct. Integr. 2023, 44, 2028–2035. [Google Scholar] [CrossRef]
  61. Zhang, C.; Chang, C.c.; Jamshidi, M. Concrete bridge surface damage detection using a single-stage detector. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 389–409. [Google Scholar] [CrossRef]
  62. Liu, Y.; Gao, W.; Zhao, T.; Wang, Z.; Wang, Z. A Rapid Bridge Crack Detection Method Based on Deep Learning. Appl. Sci. 2023, 13, 9878. [Google Scholar] [CrossRef]
  63. Yamane, T.; Chun, P.-J.; Honda, R. Detecting and localising damage based on image recognition and structure from motion, and reflecting it in a 3D bridge model. Struct. Infrastruct. Eng. 2022, 20, 594–606. [Google Scholar] [CrossRef]
  64. Teng, S.; Liu, Z.; Li, X. Improved YOLOv3-based bridge surface defect detection by combining High-and low-resolution feature images. Buildings 2022, 12, 1225. [Google Scholar] [CrossRef]
  65. Yu, L.; He, S.; Liu, X.; Ma, M.; Xiang, S. Engineering-oriented bridge multiple-damage detection with damage integrity using modified faster region-based convolutional neural network. Multimed. Tools Appl. 2022, 81, 18279–18304. [Google Scholar] [CrossRef]
  66. Li, R.; Yu, J.; Li, F.; Yang, R.; Wang, Y.; Peng, Z. Automatic bridge crack detection using Unmanned aerial vehicle and Faster R-CNN. Constr. Build. Mater. 2023, 362, 129659. [Google Scholar] [CrossRef]
  67. Lin, J.J.; Ibrahim, A.; Sarwade, S.; Golparvar-Fard, M. Bridge inspection with aerial robots: Automating the entire pipeline of visual data capture, 3D mapping, defect detection, analysis, and reporting. J. Comput. Civ. Eng. 2021, 35, 04020064. [Google Scholar] [CrossRef]
  68. Gan, L.; Liu, H.; Yan, Y.; Chen, A. Bridge bottom crack detection and modeling based on faster R-CNN and BIM. IET Image Process. 2023, 18, 664–677. [Google Scholar] [CrossRef]
  69. Zhejiang-Highway-Administration. JTG H10-2009: Technical Specification for Highway Maintenance. Available online: https://www.chinesestandard.net/PDF/BOOK.aspx/JTGH10-2009 (accessed on 5 November 2024).
  70. Hong, S.-S.; Hwang, C.-H.; Chung, S.-W.; Kim, B.-K. A deep-learning-based bridge damaged object automatic detection model using a bridge member model combination framework. Appl. Sci. 2022, 12, 12868. [Google Scholar] [CrossRef]
  71. Deng, J.; Lu, Y.; Lee, V.C.S. Concrete crack detection with handwriting script interferences using faster region-based convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 373–388. [Google Scholar] [CrossRef]
  72. Ruggieri, S.; Cardellicchio, A.; Nettis, A.; Renò, V.; Uva, G. Using attention for improving defect detection in existing RC bridges. IEEE Access 2025, 13, 18994–19015. [Google Scholar] [CrossRef]
  73. Dwyer, B.; Nelson, J.; Hansen, T.; Solawetz, J. Roboflow, Version 1.0; Roboflow: Des Moines, IA, USA, 2022.
  74. Ngo, L.; Xuan, C.L.; Luong, H.M.; Thanh, B.N.; Ngoc, D.B. Designing image processing tools for testing concrete bridges by a drone based on deep learning. J. Inf. Telecommun. 2023, 7, 227–240. [Google Scholar] [CrossRef]
  75. Lu, G.; He, X.; Wang, Q.; Shao, F.; Wang, J.; Zhao, X. MSCNet: A Framework With a Texture Enhancement Mechanism and Feature Aggregation for Crack Detection. IEEE Access 2022, 10, 26127–26139. [Google Scholar] [CrossRef]
  76. Adhikari, R.; Moselhi, O.; Bagchi, A. Image-based retrieval of concrete crack properties. In Proceedings of the ISARC International Symposium on Automation and Robotics in Construction, Eindhoven, The Netherlands, 26–29 June 2012; p. 1. [Google Scholar]
  77. Tian, Y.; Zhang, X.; Chen, H.; Wang, Y.; Wu, H. A Bridge Damage Visualization Technique Based on Image Processing Technology and the IFC Standard. Sustainability 2023, 15, 8769. [Google Scholar] [CrossRef]
  78. Vivekananthan, V.; Vignesh, R.; Vasanthaseelan, S.; Joel, E.; Kumar, K.S. Concrete bridge crack detection by image processing technique by using the improved OTSU method. Mater. Today Proc. 2023, 74, 1002–1007. [Google Scholar] [CrossRef]
  79. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image segmentation using deep learning: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
  80. Zhang, J.; Qian, S.; Tan, C. Automated bridge surface crack detection and segmentation using computer vision-based deep learning model. Eng. Appl. Artif. Intell. 2022, 115, 105225. [Google Scholar] [CrossRef]
  81. Rubio, J.J.; Kashiwa, T.; Laiteerapong, T.; Deng, W.; Nagai, K.; Escalera, S.; Nakayama, K.; Matsuo, Y.; Prendinger, H. Multi-class structural damage segmentation using fully convolutional networks. Comput. Ind. 2019, 112, 103121. [Google Scholar] [CrossRef]
  82. Lopez Droguett, E.; Tapia, J.; Yanez, C.; Boroschek, R. Semantic segmentation model for crack images from concrete bridges for mobile devices. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 2022, 236, 570–583. [Google Scholar] [CrossRef]
  83. Merkle, D.; Solass, J.; Schmitt, A.; Rosin, J.; Reiterer, A.; Stolz, A. Semi-automatic 3D crack map generation and width evaluation for structural monitoring of reinforced concrete structures. J. Inf. Technol. Constr. 2023, 28, 774–805. [Google Scholar] [CrossRef]
  84. Na, Y.-H.; Kim, D.-K. Deep Learning Strategy for UAV-Based Multi-Class Damage Detection on Railway Bridges Using U-Net with Different Loss Functions. Appl. Sci. 2025, 15, 8719. [Google Scholar] [CrossRef]
  85. Fukuoka, T.; Fujiu, M. Detection of Bridge Damages by Image Processing Using the Deep Learning Transformer Model. Buildings 2023, 13, 788. [Google Scholar] [CrossRef]
  86. Li, G.; Fang, Z.; Mohammed, A.M.; Liu, T.; Deng, Z. Automated Bridge Crack Detection Based on Improving Encoder–Decoder Network and Strip Pooling. J. Infrastruct. Syst. 2023, 29, 04023004. [Google Scholar] [CrossRef]
  87. Yamane, T.; Chun, P.j.; Dang, J.; Honda, R. Recording of bridge damage areas by 3D integration of multiple images and reduction of the variability in detected results. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 2391–2407. [Google Scholar] [CrossRef]
  88. Xu, Y.; Fan, Y.; Li, H. Lightweight semantic segmentation of complex structural damage recognition for actual bridges. Struct. Health Monit. 2023, 22, 3250–3269. [Google Scholar] [CrossRef]
  89. Bae, H.; Jang, K.; An, Y.-K. Deep super resolution crack network (SrcNet) for improving computer vision–based automated crack detectability in in situ bridges. Struct. Health Monit. 2021, 20, 1428–1442. [Google Scholar] [CrossRef]
  90. Deng, W.; Mou, Y.; Kashiwa, T.; Escalera, S.; Nagai, K.; Nakayama, K.; Matsuo, Y.; Prendinger, H. Vision based pixel-level bridge structural damage detection using a link ASPP network. Autom. Constr. 2020, 110, 102973. [Google Scholar] [CrossRef]
  91. Fu, H.; Meng, D.; Li, W.; Wang, Y. Bridge crack semantic segmentation based on improved Deeplabv3+. J. Mar. Sci. Eng. 2021, 9, 671. [Google Scholar] [CrossRef]
  92. Munawar, H.S.; Ullah, F.; Shahzad, D.; Heravi, A.; Qayyum, S.; Akram, J. Civil infrastructure damage and corrosion detection: An application of machine learning. Buildings 2022, 12, 156. [Google Scholar] [CrossRef]
  93. Qiao, W.; Ma, B.; Liu, Q.; Wu, X.; Li, G. Computer vision-based bridge damage detection using deep convolutional networks with expectation maximum attention module. Sensors 2021, 21, 824. [Google Scholar] [CrossRef]
  94. Siddhartha, V.R. Bridge Crack Detection Using Horse Herd Optimization Algorithm. In Proceedings of the 4th International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 20–22 September 2023. [Google Scholar] [CrossRef]
  95. Jin, T.; Ye, X.; Li, Z. Establishment and evaluation of conditional GAN-based image dataset for semantic segmentation of structural cracks. Eng. Struct. 2023, 285, 116058. [Google Scholar] [CrossRef]
  96. Subedi, A.; Tang, W.; Mondal, T.G.; Wu, R.-T.; Jahanshahi, M.R. Ensemble-based deep learning for autonomous bridge component and damage segmentation leveraging Nested Reg-UNet. Smart Struct. Syst. 2023, 31, 335–349. [Google Scholar] [CrossRef]
  97. Ayele, Y.Z.; Aliyari, M.; Griffiths, D.; Droguett, E.L. Automatic crack segmentation for UAV-assisted bridge inspection. Energies 2020, 13, 6250. [Google Scholar] [CrossRef]
  98. Wang, J.; Lei, Y.; Yang, X.; Zhang, F. A refinement network embedded with attention mechanism for computer vision based post-earthquake inspections of railway viaduct. Eng. Struct. 2023, 279, 115572. [Google Scholar] [CrossRef]
  99. Montes, K.; Zhang, M.; Liu, J.; Hajmousa, L.; Chen, Z.; Dang, J. Integrated 3D Structural Element and Damage Identification: Dataset and Benchmarking. In Proceedings of the International Conference on Experimental Vibration Analysis for Civil Engineering Structures, Milan, Italy, 30 August–1 September 2023; pp. 712–720. [Google Scholar]
  100. Artus, M.; Alabassy, M.S.H.; Koch, C. A BIM Based Framework for Damage Segmentation, Modeling, and Visualization Using IFC. Appl. Sci. 2022, 12, 2772. [Google Scholar] [CrossRef]
  101. Borin, P.; Cavazzini, F. Condition assessment of RC bridges. Integrating machine learning, photogrammetry and BIM. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 42, 201–208. [Google Scholar] [CrossRef]
  102. Jang, K.; An, Y.K.; Kim, B.; Cho, S. Automated crack evaluation of a high-rise bridge pier using a ring-type climbing robot. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 14–29. [Google Scholar] [CrossRef]
  103. McLaughlin, E.; Charron, N.; Narasimhan, S. Automated defect quantification in concrete bridges using robotics and deep learning. J. Comput. Civ. Eng. 2020, 34, 04020029. [Google Scholar] [CrossRef]
  104. Ye, X.W.; Ma, S.Y.; Liu, Z.X.; Ding, Y.; Li, Z.X.; Jin, T. Post-earthquake damage recognition and condition assessment of bridges using UAV integrated with deep learning approach. Struct. Control Health Monit. 2022, 29, e3128. [Google Scholar] [CrossRef]
  105. Shen, Q.; Xiao, B.; Mi, H.; Yu, J.; Xiao, L. Adaptive Learning Filters–Embedded Vision Transformer for Pixel-Level Segmentation of Low-Light Concrete Cracks. J. Perform. Constr. Facil. 2025, 39, 04025007. [Google Scholar] [CrossRef]
  106. Li, G.; Liu, Q.; Zhao, S.; Qiao, W.; Ren, X. Automatic crack recognition for concrete bridges using a fully convolutional neural network and naive Bayes data fusion based on a visual detection system. Meas. Sci. Technol. 2020, 31, 075403. [Google Scholar] [CrossRef]
  107. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
  108. Narazaki, Y.; Hoskere, V.; Yoshida, K.; Spencer, B.F.; Fujino, Y. Synthetic environments for vision-based structural condition assessment of Japanese high-speed railway viaducts. Mech. Syst. Signal Process. 2021, 160, 107850. [Google Scholar] [CrossRef]
  109. Yang, L.; Li, B.; Li, W.; Liu, Z.; Yang, G.; Xiao, J. A robotic system towards concrete structure spalling and crack database. In Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, China, 5–8 December 2017; pp. 1276–1281. [Google Scholar]
  110. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar] [CrossRef]
  111. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef]
  112. Peng, X.; Zhong, X.; Zhao, C.; Chen, A.; Zhang, T. A UAV-based machine vision method for bridge crack recognition and width quantification through hybrid feature learning. Constr. Build. Mater. 2021, 299, 123896. [Google Scholar] [CrossRef]
  113. Flah, M.; Suleiman, A.R.; Nehdi, M.L. Classification and quantification of cracks in concrete structures using deep learning image-based techniques. Cem. Concr. Compos. 2020, 114, 103781. [Google Scholar] [CrossRef]
  114. Kim, J.-Y.; Park, M.-W.; Huynh, N.T.; Shim, C.; Park, J.-W. Detection and Length Measurement of Cracks Captured in Low Definitions Using Convolutional Neural Networks. Sensors 2023, 23, 3990. [Google Scholar] [CrossRef]
  115. Liang, X. Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 415–430. [Google Scholar] [CrossRef]
  116. Mirzazade, A.; Popescu, C.; Blanksvärd, T.; Täljsten, B. Workflow for off-site bridge inspection using automatic damage detection-case study of the pahtajokk bridge. Remote Sens. 2021, 13, 2665. [Google Scholar] [CrossRef]
  117. Kun, J.; Zhenhai, Z.; Jiale, Y.; Jianwu, D. A deep learning-based method for pixel-level crack detection on concrete bridges. IET Image Process. 2022, 16, 2609–2622. [Google Scholar] [CrossRef]
  118. Kim, I.-H.; Jeon, H.; Baek, S.-C.; Hong, W.-H.; Jung, H.-J. Application of crack identification techniques for an aging concrete bridge inspection using an unmanned aerial vehicle. Sensors 2018, 18, 1881. [Google Scholar] [CrossRef]
  119. Ni, Y.; Mao, J.; Wang, H.; Xi, Z.; Xu, Y. Toward High-Precision Crack Detection in Concrete Bridges Using Deep Learning. J. Perform. Constr. Facil. 2023, 37, 04023017. [Google Scholar] [CrossRef]
  120. Jiang, W.; Liu, M.; Peng, Y.; Wu, L.; Wang, Y. HDCB-Net: A neural network with the hybrid dilated convolution for pixel-level crack detection on concrete bridges. IEEE Trans. Ind. Inform. 2020, 17, 5485–5494. [Google Scholar] [CrossRef]
  121. Tran, T.S.; Nguyen, S.D.; Lee, H.J.; Tran, V.P. Advanced crack detection and segmentation on bridge decks using deep learning. Constr. Build. Mater. 2023, 400, 132839. [Google Scholar] [CrossRef]
  122. Kao, S.-P.; Chang, Y.-C.; Wang, F.-L. Combining the YOLOv4 deep learning model with UAV imagery processing technology in the extraction and quantization of cracks in bridges. Sensors 2023, 23, 2572. [Google Scholar] [CrossRef] [PubMed]
  123. Ma, K.; Meng, X.; Hao, M.; Huang, G.; Hu, Q.; He, P. Research on the Efficiency of Bridge Crack Detection by Coupling Deep Learning Frameworks with Convolutional Neural Networks. Sensors 2023, 23, 7272. [Google Scholar] [CrossRef]
  124. Inam, H.; Islam, N.U.; Akram, M.U.; Ullah, F. Smart and Automated Infrastructure Management: A Deep Learning Approach for Crack Detection in Bridge Images. Sustainability 2023, 15, 1866. [Google Scholar] [CrossRef]
  125. Zakaria, M.; Karaaslan, E.; Catbas, F.N. Advanced bridge visual inspection using real-time machine learning in edge devices. Adv. Bridge Eng. 2022, 3, 27. [Google Scholar] [CrossRef]
  126. Meng, Q.; Yang, J.; Zhang, Y.; Yang, Y.; Song, J.; Wang, J. A Robot System for Rapid and Intelligent Bridge Damage Inspection Based on Deep-Learning Algorithms. J. Perform. Constr. Facil. 2023, 37, 04023052. [Google Scholar] [CrossRef]
  127. Ozgenel, F. Concrete crack images for classification. Mendeley Data 2019. [Google Scholar] [CrossRef]
  128. Yu, L.; He, S.; Liu, X.; Jiang, S.; Xiang, S. Intelligent crack detection and quantification in the concrete bridge: A deep learning-assisted image processing approach. Adv. Civ. Eng. 2022, 2022, 1813821. [Google Scholar] [CrossRef]
  129. Feng, C.; Zhang, H.; Wang, S.; Li, Y.; Wang, H.; Yan, F. Structural damage detection using deep convolutional neural network and transfer learning. KSCE J. Civ. Eng. 2019, 23, 4493–4502. [Google Scholar] [CrossRef]
  130. Carranza-García, M.; Torres-Mateo, J.; Lara-Benítez, P.; García-Gutiérrez, J. On the performance of one-stage and two-stage object detectors in autonomous vehicles using camera data. Remote Sens. 2020, 13, 89. [Google Scholar] [CrossRef]
  131. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. Proc. IEEE 2023, 111, 257–276. [Google Scholar] [CrossRef]
  132. Srivastava, S.; Divekar, A.V.; Anilkumar, C.; Naik, I.; Kulkarni, V.; Pattabiraman, V. Comparative analysis of deep learning image detection algorithms. J. Big Data 2021, 8, 66. [Google Scholar] [CrossRef]
  133. Wu, W.-H.; Lee, J.-C.; Wang, Y.-M. A study of defect detection techniques for metallographic images. Sensors 2020, 20, 5593. [Google Scholar] [CrossRef]
Figure 1. A five-layer DenseNet structure [18].
Figure 2. Flowchart of the systematic literature review (n = number of papers).
Figure 3. The trend of using deep learning methods.
Figure 4. Frequency of damage detection and quantification methods.
Figure 5. The frequency of detection methods used for each damage type.
Figure 6. The transition from traditional to AI-powered bridge inspections.
Figure 7. Workflow for bridge defect detection using deep learning.
Figure 8. Object detection bounding boxes and segmentation masks.
Figure 9. Hybrid methods for bridge defect detection workflow.
Figure 10. Algorithm performance comparison.
Figure 11. Performance of object detection algorithms for single-class problems across different dataset sizes.
Figure 12. Performance of object detection algorithms for multi-class problems across different dataset sizes.
Figure 13. Performance of segmentation algorithms for single-class problems across different dataset sizes.
Figure 14. Performance of segmentation algorithms for multi-class problems across different dataset sizes.
Table 1. Search query.

String: (“Viaducts” OR “Bridge”) AND (“Artificial intelligence” OR “Image processing” OR “Machine learning” OR “Deep learning” OR “CNN” OR “convolutional neural network”) AND (“Defect detection” OR “Crack detection” OR “Damage detection” OR “spalling” OR “vegetation” OR “scaling” OR “Delamination” OR “Efflorescence” OR “bulging” OR “Pop-outs” OR “Honeycombs” OR “Reinforcing steel corrosion” OR “Exposed bar” OR “Exposed rebar” OR “corrosion stain” OR “Water leak” OR “Water leakage” OR “material loss” OR “Loss of section” OR “displacement” OR “deformation” OR “degradation” OR “damage quantification” OR “damage measurement” OR “damage evaluation” OR “damage severity”) AND (“image” OR “video”)
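Such a query can also be re-applied offline to verify the screening stage against exported database records. The short Python sketch below is a minimal illustration only: the records.csv file, its title and abstract columns, and the abbreviated keyword groups are hypothetical stand-ins for the full string in Table 1.

```python
import csv

# Abbreviated keyword groups mirroring the AND-of-ORs structure of Table 1
# (hypothetical subset; extend each list with the remaining terms).
GROUPS = [
    ["viaduct", "bridge"],
    ["artificial intelligence", "image processing", "machine learning",
     "deep learning", "cnn", "convolutional neural network"],
    ["defect detection", "crack detection", "damage detection", "spalling",
     "delamination", "efflorescence", "exposed rebar", "corrosion stain"],
    ["image", "video"],
]

def satisfies_query(text: str) -> bool:
    """A record matches when every group contributes at least one term."""
    text = text.lower()
    return all(any(term in text for term in group) for group in GROUPS)

# Hypothetical export of candidate records with "title" and "abstract" columns.
with open("records.csv", newline="", encoding="utf-8") as f:
    kept = [row for row in csv.DictReader(f)
            if satisfies_query(row["title"] + " " + row["abstract"])]

print(f"{len(kept)} records satisfy the query")
```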
Table 2. Filtration criteria.

Phase | Criteria
Title and abstract screening | Review papers were excluded. Titles lacking clarity were excluded. Only articles related to visual defect detection on concrete bridges were included. Articles whose abstracts lacked sufficient methodological detail to determine whether deep learning was applied to concrete bridge surface defect detection using standard 2D images were excluded. Non-English articles and non-peer-reviewed documents (e.g., theses, technical reports) were excluded.
Full-text screening | Articles focusing on hardware-only aspects (e.g., camera hardware, UAV trajectory planning, or sensor placement) without proposing or evaluating a deep learning model for defect detection were excluded. Articles focusing on other bridge defects, such as cable defects, rather than concrete damage were excluded.
Table 3. Summary of reviewed papers on bridge defect detection using classification algorithms.

Author | Algorithm | Type of Defect(s) | Preprocessing | Dataset Name (Number of Images) | Performance (%)
Chen [38] | Variant of the VGG structure | Crack | Labeling; data augmentation; cropping | Own data collection using a UAV and a handheld DSLR camera (2000) | Accuracy = 99.1
Hüthwohl et al. [42] | Inception V3 | Cracks; efflorescence; scaling; spallation; general defects; no defect; spallation/scaling; exposed bars; corrosion | Labeling | Own data collection and authority image sets (38,408) | Accuracy = 97.4 (cracks), 96.9 (efflorescence), 94.6 (scaling), 94.3 (spallation), 95.4 (general defects), 97.2 (no defect), 95.2 (spallation/scaling), 87 (exposed bars), 94.9 (corrosion)
Zhu et al. [5] | CNN | Spallation; exposed bars; crack; pockmark | Brightness, saturation, and flip adjustment; labeling | Own data collection using a digital camera (Canon EOS 700D, 18 MP) (1458) | Accuracy = 97.8
Mundt et al. [44] | Efficient Neural Architecture Search (ENAS); MetaQNN | Crack; spallation; exposed bars; efflorescence; corrosion (stains) | Labeling | COncrete DEfect BRidge IMage dataset (CODEBRIM) (1590) | Accuracy = 70.78 (ENAS), 72.19 (MetaQNN)
Zoubir et al. [49] | VGG16 | Efflorescence; spallation; crack | Data augmentation (flipping, rotating); labeling | Own data collection using two 20 MP consumer digital cameras with 5 mm focal length (6952) | Accuracy = 97.13
Bukhsh et al. [48] | CNN; VGG16; pretrained VGG16-ImageNet; pretrained VGG16-CODEBRIM | Crack; spallation; exposed bars; corrosion stain | Labeling | BiNet (3588) | Accuracy = 80 ± 1 (CNN), 80 ± 2 (VGG16), 88 ± 1 (pretrained VGG16-ImageNet), 81 ± 1 (pretrained VGG16-CODEBRIM)
Kruachottikul et al. [50] | ResNet-50 | Crack; erosion; honeycombing; spallation; scaling | Data augmentation (rotation, added noise, flipping); labeling | Own data collection using a mobile application (3618) | Accuracy = 81
Xu et al. [36] | End-to-end CNN | Crack | Filtering, cropping, flipping, resizing | Own data collection using the Phantom 4 Pro's CMOS surface-array camera (6069) | Accuracy = 96.37
Zhang et al. [37] | 1D-CNN-LSTM | Crack | Transformation into the frequency domain; cropping; labeling | SDNET2018 (16,789) | Accuracy = 99.25
Cardellicchio et al. [45] | DenseNet121, InceptionV3, ResNet50V2, MobileNetV3, NASNetMobile (each with transfer learning) | Corroded/oxidized steel reinforcement; cracks; deteriorated concrete; honeycombs; moisture spots; pavement degradation; shrinkage cracks | Labeling; data enrichment and augmentation (horizontal and vertical flipping) | Own data collection (2436) | Best accuracy = 85.19 (crack), 85.4 (corroded/oxidized steel reinforcement), 91.78 (deteriorated concrete), 90.96 (honeycombs), 76.43 (moisture spots), 93.44 (pavement degradation), 96.13 (shrinkage cracks)
Trach [46] | CNN (MobileNet architecture) | Crack; spallation (covering both scaling and spallation); popout | Manual cropping to 256 × 256 pixels; rotating part of the images by 90° to counter class imbalance | Real inspection reports of bridge structures collected over 15 years in different regions of Ukraine (5200) | Accuracy = 94.61
Chen et al. [41] | MobileNetV3-Large-CBAM | Crack | Data augmentation (flipping, mirroring, cropping, rotating); labeling | Bridge crack dataset [53] (6532); open-source datasets and images collected from the web (15,068) | Accuracy = 95.9 on the bridge crack dataset; 99.66 (various-material cracks test set) and 99.69 (huge-width cracks test set) on the combined set
Zhang et al. [54] | Hybrid probabilistic deep convolutional neural network | Crack | Labeling | Own data collection (40,000); own data collection (2600) | Accuracy = 99.09 (40,000-image set), 92.49 (2600-image set)
Abubakr et al. [47] | Xception; vanilla CNN | Cracks; corrosion; efflorescence; spallation; exposed bars | Labeling | CODEBRIM (1590) | Accuracy = 94.95 (Xception), 85.71 (vanilla)
Alfaz et al. [39] | Dense Convolutional Network (DenseNet) | Crack | Cleaning, resizing, and manual categorization | Crack dataset gathered by Xu et al. [36] (6069) | Accuracy = 99.83
Aliyari et al. [51] | CNNs | Crack | Rotation, zooming in, cropping, flipping | SDNET (19,023); heterogeneous own data collection (308); homogeneous own data collection (400) | Accuracy = 97 (DenseNet201 on SDNET), 69 (VGG16 and ResNet152V2 on the heterogeneous set), 74 (Xception on the homogeneous set)
Li et al. [40] | Skip-Squeeze-and-Excitation Networks (SSENets) | Crack | Filtering, cropping, and flipping; labeling | Crack dataset gathered by Xu et al. [36] (6069) | Accuracy = 97.77; precision = 95.45
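A recurring finding in Table 3 is the benefit of transfer learning from ImageNet-pretrained backbones (e.g., the pretrained VGG16 variants evaluated by Bukhsh et al. [48]). The Keras sketch below illustrates that general recipe only; the crack_patches/ directory (one subfolder per class), the frozen-backbone choice, and the hyperparameters are illustrative assumptions, not a reconstruction of any reviewed pipeline.

```python
import tensorflow as tf

# ImageNet-pretrained VGG16 backbone with the classification head removed.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze pretrained features; train only the new head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),            # simplified preprocessing
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # crack vs. no crack
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Hypothetical layout: crack_patches/crack/*.jpg, crack_patches/intact/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "crack_patches", image_size=(224, 224), batch_size=32)

# Augmentation comparable to the flips and rotations reported in Table 3.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(0.1),
])
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))

model.fit(train_ds, epochs=5)
```

Unfreezing the top convolutional blocks after this head converges is the usual second fine-tuning stage when the target dataset is large enough.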
Table 4. Summary of reviewed papers on bridge defect detection using object detection algorithms.

Author | Algorithm | Type of Defect(s) | Preprocessing | Dataset Name (Number of Images) | Performance (%)
Murao et al. [55] | YOLOv2 | System 1: crack; systems 2 and 3: crack, branch-crack, chalk, branch-chalk, and concrete joints with crack-like line forms | Labeling; resizing; inversion; rotation; HSV (hue, saturation, value) adjustment | Own data collection using a camera-equipped UAV plus various images from the Internet (164 for system 1; 166 for system 2; 253 for system 3) | Precision = 27.95 (system 2), 41.63 (system 3)
Yu et al. [56] | YOLOv4-FPM | Crack | Cropping; labeling | Own data collection using a digital SLR camera (Canon EOS M10) and a UAV (2768) | mAP = 97.6
Li et al. [57] | Lite-YOLO-v4 | Crack | Mosaic data augmentation; labeling | Data collected from a bridge underwater inspection company and the Internet, including the Crack500 dataset (8780) | mAP = 77.07; precision = 93.97; recall = 47.98
Deng et al. [58] | YOLOv2 | Cracks and handwriting | Horizontal flipping; labeling | Data collected by inspectors during annual visual inspections (3010) | mAP = 77
Ji [59] | YOLO Autobot | Crack | Labeling | Crack dataset provided by Roboflow [73] (3917) | mAP = 82.6
Deng et al. [71] | Faster R-CNN | Cracks and handwriting | Labeling; horizontal flipping; cropping | Images from inspection records of concrete bridges taken by consumer-grade cameras with complex background information (5009) | mAP = 82
Ruggieri et al. [60] | YOLOv5 (different versions) | Crack; corroded steel reinforcement; deteriorated concrete; honeycombs; moisture spots; pavement degradation | Labeling | Images collected from observations of existing bridges in Italy (2685) | For YOLOv5m6: mAP = 20.66; precision = 43.63; recall = 24.24
Zhang et al. [61] | Modified YOLOv3 | Crack; spallation; exposed bars; pop-out | Downsampling; labeling; data augmentation (scaling and cropping, flipping, motion blur, brightness changes, salt-and-pepper noise) | Data acquired from the Hong Kong Highways Department plus own data collection (2206) | mAP = 79.9
Liu et al. [62] | YOLOv5 | Crack | Labeling | Own data collection: original dataset (180 training and 10 validation sample sets) and an extended dataset produced by DCGAN (180 training and 10 validation sample sets) | Not reported
Yamane et al. [63] | YOLOv5 | Exposed bars | Labeling | Images taken during inspections of bridges managed by the Kanto Regional Development Bureau of Japan's Ministry of Land, Infrastructure, Transport and Tourism from 2004 to 2018, using Skydio 2, a small UAV developed by Skydio (1000) | AP = 64
Teng et al. [64] | Improved YOLOv3 | Cracks; exposed bars | Data augmentation (color jitter, flipping, scaling); labeling | Data from concrete bridges along the Erenhot–Guangzhou and Guangzhou–Kunming Expressways (1660) | mAP = 91
Hong et al. [70] | BDODC-F (Mask R-CNN and BlendMask) | Efflorescence; spallation; crack; corrosion; water leak; concrete scaling | Image-quality enhancement; labeling | Own data collection (100 for aggregated model training) | Accuracy = 92.675 (BlendMask), 98.679 (Mask R-CNN)
Yu et al. [65] | Modified Faster R-CNN | Crack; spallation; exposed bars | Resizing; labeling | Images taken with a Canon EOS 5DS R camera during periodic bridge inspections by CCCC First Highway Consultants Co., Ltd. (1000) | mAP = 84.56
Gan et al. [68] | Faster R-CNN | Crack | Labeling | Own data collection using a DJI M210-RTK (637) | Precision = 92.03; recall = 92.26
Lin et al. [67] | Faster R-CNN | Crack; spallation; efflorescence; corrosion stains; exposed bars | Labeling; data augmentation (flipping, rotation, shearing, brightness and contrast changes) | Own data collection using a UAV (742) | Average precision = 49.2 (crack), 84.6 (spallation), 57.6 (efflorescence), 74.1 (corrosion stains), 84.5 (exposed bars)
Ngo et al. [74] | CNN | Crack | Brightness changes and rotation to enrich the training images | Own data collection plus existing datasets (Kaggle images, SDNET2018, …) (51,000) | Accuracy = 95.19
Lu et al. [75] | MSCNet | Crack | Labeling | SDNET and CCIC datasets (30,000) | Accuracy = 92.7; precision = 93.5; recall = 94.2; F1-score = 93.8
Ruggieri et al. [72] | YOLO11x + attention mechanisms | Crack; corroded steel bar; deteriorated concrete; honeycomb; moisture spot; shrinkage; pavement degradation | Labeling | Dataset proposed by [45] (6580) | mAP = 59.38; precision = 82.87; recall = 52.61; F1-score = 64.36
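Most studies in Table 4 report mAP, i.e., the mean over defect classes of the average precision (AP) obtained by matching predicted boxes to ground truth at an IoU threshold. The sketch below shows one common single-class, single-image formulation under stated assumptions (axis-aligned (x1, y1, x2, y2) boxes, greedy score-ordered matching, IoU ≥ 0.5); published evaluations aggregate detections over an entire test set.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (score, box); gts: list of boxes, one class."""
    preds = sorted(preds, key=lambda p: -p[0])   # greedy, highest score first
    matched, tp, fp = set(), [], []
    for _, box in preds:
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            ov = iou(box, gt)
            if j not in matched and ov > best:
                best, best_j = ov, j
        hit = best >= iou_thr                    # true positive if matched
        tp.append(hit); fp.append(not hit)
        if hit:
            matched.add(best_j)                  # each GT matches at most once
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = np.concatenate([[0.0], tp / max(len(gts), 1)])
    precision = np.concatenate([[1.0], tp / (tp + fp)])
    # area under the precision-recall curve (step integration)
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))

# Toy example: one correct and one spurious detection against one ground truth.
print(average_precision([(0.9, (0, 0, 10, 10)), (0.5, (50, 50, 60, 60))],
                        [(1, 1, 10, 10)]))  # -> 1.0 (the TP is ranked first)
```

mAP is then the mean of these per-class AP values, and mAP@0.5 in Table 4 fixes the matching threshold at IoU = 0.5.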
Table 5. Summary of reviewed papers on bridge defect detection using segmentation algorithms.

Author | Algorithm | Type of Defect(s) | Aim | Quantified Attribute | Preprocessing | Dataset Name (Number of Images) | Performance (%)
Rubio et al. [81] | Fully convolutional networks | Delamination; exposed bars | Detection | — | Labeling; flipping; illumination changes and slight affine transformations | Inspection records of bridges in Niigata Prefecture, Japan (734) | Mean accuracy = 89.7 (delamination), 78.4 (exposed bars)
Tian et al. [77] | Otsu binarization (threshold segmentation) | Crack; other damage such as concrete cavities | Detection and quantification (in physical units) | Width; length; area | Grayscale processing; gray adjustment; adaptive Wiener2 filtering for noise removal | — | Image processing method (no metrics reported)
Vivekananthan et al. [78] | Image-processing segmentation (gray-level discrimination approach) | Crack | Detection and quantification | Area and orientation | Gray-level discrimination | Own data collection using a CMOS cluster camera (2068) | Accuracy = 95
Adhikari et al. [76] | Image-processing segmentation (thresholding by maximum entropy, Otsu, and mean image intensity; erosion and dilation) | Crack | Detection and quantification (in physical units) | Width; length | Image enhancement using point processing, histogram equalization, and mask processing | Own data collection using a SONY DSC-T5 digital camera | Image processing method (no metrics reported)
Lopez Droguett et al. [82] | DenseNet-13 | Crack | Detection | — | Labeling | Own data collection: CRACKV2 (409,432) | mIoU = 92.13
Merkle et al. [83] | U-Net | Crack | Detection and quantification (in physical units) | Width; length | Labeling | Own data collection (126) | Precision = 56.1; recall = 59.9; IoU = 40.8
Fukuoka and Fujiu [85] | SegFormer | Delamination; exposed bars | Detection | — | Data augmentation (flipping, scaling, rotation); labeling | Japanese bridge inspection reports: datasets A, B, and C (17,810 each) | Precision (delamination / rebar exposure) = 80.8/70.8 (A), 77.1/74.7 (B), 76.9/76.5 (C)
Li et al. [86] | Encoder–decoder neural network | Crack | Detection | — | 90° and 180° clockwise rotation and horizontal inversion; labeling | Own data collection using a telephoto camera on an equipment cart (2240, expanded to 10,000) | mIoU = 84.5; precision = 98.3; recall = 97.3
Yamane et al. [87] | Mask R-CNN | Corrosion | Detection | — | Resizing; labeling | Own data collection (1966) | Accuracy = 94; precision = 80
Xu et al. [88] | Modified DeepLabv3+ | Crack | Detection | — | Data augmentation (horizontal and vertical flipping, random rotation, random translation); labeling | Own data collection from several actual bridges in multiple scenes, scales, and resolutions (4303) | mAP = 88.8; mIoU = 77.6
Bae et al. [89] | End-to-end deep super-resolution crack network (SrcNet) | Crack | Detection | — | Rotating; flipping; labeling | Own data collection and web scraping (4055) | Precision = 81.18 and recall = 92.65 on the Jang-Duck bridge
Deng et al. [90] | LinkASPPNet | Delamination; exposed bars | Detection | — | Labeling | Inspection records of Japanese concrete bridges (732) | Mean precision = 73.59; mean IoU = 61.95
Munawar et al. [92] | CycleGAN | Corrosion | Detection | — | Brightness and size adjustment; data augmentation (flipping, rotation, cropping); labeling | Own data collection using a DJI M200 UAV, plus images extracted from public datasets (1300) | mIoU = 87.8; precision = 84.9; recall = 81.8
Li et al. [106] | Fully convolutional network with naive Bayes data fusion (NB-FCN) | Crack (distinguished from handwriting, peel-off, water stains, and repair traces) | Detection and quantification (in physical units) | Crack length and width | Rotating; flipping; labeling; resizing | Own data collection using the Bridge Substructure Detection (BSD-10) image acquisition device (7200) | Accuracy = 97.96; precision = 81.73; recall = 78.97; F1-score = 79.95
Fu et al. [91] | DeepLabv3+ | Crack | Detection | — | Flipping, rotation, scaling; labeling | Own data collection using digital equipment and from the Internet in various environments (5000) | mIoU = 82.37
Qiao et al. [93] | EMA-DenseNet | Cracks; exposed bars | Detection | — | Pixel-level labeling; rotation; cropping | Yang et al. [107] dataset (800); own data collection from bridges in Xuzhou (Zhejiang Province, China) (1800 crack and 2500 rebar images) | Validation mIoU = 87.42 on the [107] dataset; validation mIoU = 79.87 and test mIoU = 80.4 on own data
Jin et al. [95] | PCR-Net-based model | Crack | Detection | — | Labeling | Synthesized crack image dataset named Bridge Crack Library 2.0 (26,600) | Accuracy = 98.9; mIoU = 61.49; precision = 81.63; recall = 71.36
Subedi et al. [96] | Nested Reg-UNet | Concrete damage; exposed bars | Detection | — | Data augmentation (flipping, random brightness and contrast changes); labeling | Tokaido dataset [108] (7079) | mIoU = 84.19; precision = 90.98; recall = 91.04
Ayele et al. [97] | Mask R-CNN | Crack | Detection and quantification (in physical units) | Crack length and width | Labeling | Own data collection using a UAV | Accuracy = 90
Wang et al. [98] | RefineNet with attention mechanism (RefineNet-AM) | Concrete damage; exposed bars | Detection | — | — | Tokaido dataset [108] (8599) | mIoU = 67.3; precision = 84.4; recall = 73.8
Montes et al. [99] | 3D GNN | Corrosion; spallation; cracks; leaking water | Detection | — | Labeling | Own data collection using a lidar | Accuracy = 93.914; mIoU = 33.98
Artus et al. [100] | TernausNet16 | Spallation | Detection and quantification (in physical units) | — | — | Concrete structure spalling and crack (CSSC) database [109] (715) | Accuracy = 91.96; mIoU = 83.26; precision = 87.55; recall = 81.36
Borin and Cavazzini [101] | Mask R-CNN | Spallation | Detection and visualization | — | Labeling | Own data collection (575) | Not reported
Jang et al. [102] | Modified SegNet | Intact; crack; marker | Detection and quantification (in physical units) | Crack length and width | Contrast enhancement; labeling | Own data collection using a ring-type climbing robot at Jang-Duck Bridge in Gangneung City, South Korea (1021) | Precision = 90.92; recall = 97.47
McLaughlin et al. [103] | DeepLab V3 | Spallation; delamination | Detection and quantification (in physical units) | Area | Horizontal and vertical flipping; rotation; width and height shifts; labeling | Own data collection (496 infrared and 600 visual-spectrum images) | Validation mIoU = 71.4 (infrared), 82.7 (visual spectrum)
Ye et al. [104] | Multi-task high-resolution net (MT-HRNet) | Concrete damage (crack and spallation); exposed bars | Detection and quantification | Spallation area and crack width (in pixels) | — | Tokaido dataset [108] (13,956) | Accuracy = 99.44; mIoU = 80.47; precision = 94.66; recall = 83.62
Shen et al. [105] | Adaptive learning filters vision transformer (ALF-ViT) | Crack | Detection | — | Enhancement | CrackForest dataset [110], Crack500 dataset [111], and own data collection using mobile devices (1339) | mIoU = 73.3
Na and Kim [84] | U-Net | Crack; spallation and delamination; water leakage; exposed bars; paint peeling | Detection | — | Data augmentation; labeling | Own data collection using a UAV (14,155) | Accuracy = 95.7; precision = 91.2; recall = 91.4; F1-score = 91.3
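The dominant metric in Table 5 is mean intersection-over-union (mIoU), averaged over the classes present in the label maps. The snippet below is a minimal sketch, assuming integer-encoded prediction and ground-truth masks of equal shape; reported values additionally average over all test images.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over the classes that appear in either label map."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                      # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example: 0 = background, 1 = crack.
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, gt, num_classes=2))  # (1/2 + 2/3) / 2 = 0.583...
```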
Table 6. Summary of reviewed papers on bridge defect detection using hybrid methods.

Author | Task | Algorithm | Type of Defect(s) | Aim | Quantified Attribute | Preprocessing | Dataset Name (Number of Images) | Performance (%)
Peng et al. [112] | Object detection + segmentation | R-FCN + Haar-AdaBoost + local threshold segmentation | Crack | Detection and quantification (in physical units) | Width | Labeling; data augmentation | Own data collection using a UAV (3540) | Average IoU = 90 for crack segmentation; precision > 95 for detection
Flah et al. [113] | Classification + segmentation | CNN + edge detection and thresholding (Otsu method) | Crack | Detection and quantification (in physical units) | Length, width, and crack angle | Manual image selection; labeling; filtering to remove non-uniform background intensity | Concrete crack image classification database [127] (6000) | Classification accuracy = 98.25
Kim et al. [114] | Classification + segmentation | CNN (AlexNet, VGG16, ResNet152) + morphological segmentation | Crack | Detection and quantification (in physical units) | Length | Labeling | Data collected by the Korea Expressway Corporation's bridge monitoring system (192) | Precision = 87.74 (AlexNet), 88.76 (VGG16), 87.59 (ResNet152)
Liang [115] | Classification; semantic segmentation | Pretrained VGG16 for classification; fully deep CNN for segmentation | Bridge damage | Detection | — | Labeling; data augmentation; pixel-wise labeling | Collected from different studies and Google images (492 for classification; 436 for segmentation) | Classification accuracy = 98.98; segmentation accuracy = 93.14 with weighted IoU = 87.65
Mirzazade et al. [116] | Classification + localization + segmentation | Inception V3 for damage-area detection; U-Net and SegNet for pixel-wise defect segmentation | Damage classification and joint segmentation | Detection and quantification (in physical units) | Joint length and width | Cropping; labeling | Own data collection using a UAV (140 images cropped to 8344 slices; 238 images for segmentation) | Inception V3 validation accuracy = 96.2; SegNet mIoU = 49.98; U-Net mIoU = 47.102
Kun et al. [117] | Classification + segmentation | Deep bridge crack classification (DBCC)-Net, a CNN-based crack-patch classifier; DDRNet for segmentation | Crack | Detection | — | Labeling | Own data collection using an I-800 UAV (385) | DBCC-Net F1-score = 72.6; IoU = 83.4
Kim et al. [118] | Object detection + segmentation | R-CNN + image processing | Crack | Detection, localization, and quantification (in physical units) | Thickness and length | Labeling | Images taken by human inspectors and a UAV (384) | Crack quantification relative error = 1–2
Yu et al. [128] | Object detection + segmentation | YOLOv5 + image-processing segmentation | Crack | Detection and quantification (in physical units) | Length and width | Cropping; binary-image extraction; median filtering and graying; mask and ratio filters; labeling; data augmentation (mosaic, random rotation, random cropping, Gaussian noise, manual exposure) | Own data collection using a Canon EOS 5DS R camera (487 images, expanded to 3453 by offline augmentation) | For YOLOv5: mAP = 98.7, precision = 92, recall = 97.5; crack quantification absolute error within 0.05 mm
Ni et al. [119] | Object detection + segmentation | YOLOv5s + Otsu method and the medial-axis algorithm | Crack | Detection and quantification (in physical units) | Crack length and width | Denoising, filtering, and image enhancement (flipping, mirroring, rotation, translation, shearing); dataset augmentation through DCGANs; labeling | Concrete crack dataset, one portion created by the authors and the other obtained from publicly available sources [52] (4000) | For YOLOv5: precision = 83.3, recall = 95.3, mAP = 94
Jiang et al. [120] | Object detection + segmentation | YOLOv4 + HDCB-Net | Crack | Detection | — | Labeling | Own data collection: Blurred Crack, containing five sub-datasets (150,632); Bridge Nonblurred (1536) | Highest YOLOv4 mAP = 80.1 on BridgeXQ48; HDCB-Net precision = 61.72
Tran et al. [121] | Object detection + segmentation | YOLOv7 for crack detection; optimized U-Net for segmentation | Crack | Detection and quantification (in physical units) | Crack length and width | Labeling | Own data collection from bridge decks in South Korea using a 3D vehicle-mounted camera (1441 for the detection algorithm; 800 to train the segmentation algorithm) | mAP@0.5 = 74.5 (YOLOv7); IoU = 64 (optimized U-Net)
Kao et al. [122] | Object detection + segmentation | YOLOv4 + thresholding (Sauvola local thresholding) and edge detection (Canny and morphological) | Crack | Detection and quantification (in physical units) | Crack width | Labeling for detection; for segmentation: cropping to the bounding box, grayscale conversion, image binarization | Own data collection using smartphones and UAVs plus open-source crack data (SDNET2018 [52]) (1463 + 3006 = 4469) | mAP = 92 (YOLOv4)
Ma et al. [123] | Object detection + segmentation | Detection: Faster R-CNN, SSD, YOLOv5(x); segmentation: U-Net, PSPNet | Crack | Detection | — | Labeling | Open-source dataset [53] (2068) | F1-score = 76 (Faster R-CNN), 67 (SSD), 67 (YOLOv5); accuracy = 98.37 (U-Net), 97.86 (PSPNet)
Inam et al. [124] | Object detection + segmentation | YOLOv5 s, m, and l; U-Net | Crack | Detection | Width, height, area | Resizing; data augmentation (rotation, cropping); labeling | Own data collection from Pakistani bridges plus the SDNET2018 dataset [52] (1370; 2270 after augmentation) | mAP = 97.8, 99.3, and 99.1 (YOLOv5 s, m, and l); U-Net accuracy on the SDNET2018 validation set = 93.4
Zakaria et al. [125] | Object detection + segmentation | Detection: YOLOv5s; segmentation: U-Net | Crack and spallation | Detection, localization, and quantification (in physical units) | Maximum crack width or spallation area | Bounding-box labeling; data augmentation | Own data collection plus datasets published by other researchers, including CODEBRIM [44] and SDNET2018 [52] (1600 + 180) | mAP = 51 (YOLOv5s); mIoU = 74 (U-Net)
Meng et al. [126] | Object detection + segmentation | Detection: improved YOLOv3; crack segmentation: DeepLab | Cracks; spallation; exposed bars; efflorescence | Detection and quantification | Crack length and width | Retinex image-denoising algorithm; labeling | Own data collection (20,033) | Improved YOLOv3: mAP = 86.3, precision = 92.9, recall = 86.9, F1-score = 90; DeepLab: accuracy = 85
Zhang et al. [80] | Object detection + segmentation | Detection: CR-YOLO; segmentation: PSPNet | Cracks | Detection | — | Data enhancement (geometric transformation, optical transformation, noise addition) | Own data collection using a digital camera plus a portion of an open-source dataset [36] (5000) | CR-YOLO: precision = 90.88, recall = 88.69; PSPNet: precision = 87.03, recall = 85.45
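Several hybrid pipelines in Table 6 quantify crack geometry by binarizing the detected patch and skeletonizing it, e.g., the Otsu-plus-medial-axis step of Ni et al. [119]. The scikit-image sketch below is a simplified illustration of that post-processing idea; the crack_patch.png crop is a hypothetical detector output, and real systems add denoising and camera calibration.

```python
from skimage import io
from skimage.filters import threshold_otsu
from skimage.morphology import medial_axis

# Hypothetical crop produced by an upstream detector (e.g., a YOLO bounding box).
gray = io.imread("crack_patch.png", as_gray=True)

mask = gray < threshold_otsu(gray)       # cracks are darker than the concrete
skeleton, dist = medial_axis(mask, return_distance=True)

widths_px = 2.0 * dist[skeleton]         # local width = 2 x distance to boundary
length_px = int(skeleton.sum())          # skeleton pixel count ~ crack length
print(f"length = {length_px} px, max width = {widths_px.max():.1f} px")

# Multiplying by the ground sampling distance (mm/pixel) yields physical units,
# which is how the "(in physical units)" entries above are obtained.
```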
Table 7. Algorithm comparison.

Reference | Task | Algorithm | Performance (%)
Mirzazade et al. [116] | Segmentation | SegNet | Mean accuracy = 97.299 (large-scale objects), 50 (small-scale objects)
 | | U-Net | Mean accuracy = 84.144 (large-scale objects), 79.5 (small-scale objects)
Liang [115] | Classification | VGG-16 | Testing accuracy = 98.98
 | | AlexNet | Testing accuracy = 93.88
Tran et al. [121] | Object detection | YOLOv7 | mAP@0.5 = 74.8; 0.022 s to analyze a 1024 × 1024-pixel image
 | | Faster R-CNN-ResNet101 | mAP@0.5 = 72.8; 0.18 s
 | | RetinaNet-ResNet101 | mAP@0.5 = 72.7; 0.065 s
 | | RetinaNet-ResNet50 | mAP@0.5 = 72; 0.055 s
 | | Faster R-CNN-ResNet50 | mAP@0.5 = 71.2; 0.164 s
Trach [46] | Classification | MobileNet | Accuracy = 94.61; mean training time per epoch = 63 s
 | | ResNet50 | Accuracy = 93.1; 79 s
 | | VGG16 | Accuracy = 92.73; 101 s
 | | DenseNet201 | Accuracy = 91.82; 120 s
 | | Inception V3 | Accuracy = 89.54; 63 s
Cardellicchio et al. [45] | Classification | DenseNet121 | Accuracy (crack / corroded and oxidized steel reinforcement / deteriorated concrete / honeycombs / moisture spots / pavement degradation / shrinkage cracks) = 85.19 / 81.92 / 91.10 / 90.37 / 73.94 / 89.89 / 96.05
 | | InceptionV3 | Accuracy (same classes) = 79.7 / 85.03 / 91.78 / 90.37 / 73.26 / 93.44 / 96.04
 | | ResNet50V2 | Accuracy (same classes) = 83.33 / 81.85 / 91.78 / 90.96 / 74.77 / 89.88 / 96.12
 | | MobileNetV3 | Accuracy (same classes) = 84.30 / 83.26 / 90.96 / 89.56 / 76.43 / 92.25 / 96.13
 | | NASNetMobile | Accuracy (same classes) = 81.93 / 85.4 / 91.78 / 90.44 / 76.43 / 89.88 / 96.05
Ma et al. [123] | Object detection | Faster R-CNN | F1-score = 76; precision = 80.53; recall = 71.37
 | | YOLOv5 | F1-score = 67; precision = 87.5; recall = 54.9
 | | SSD | F1-score = 67; precision = 82.76; recall = 56.47
Ni et al. [119] | Object detection | YOLOv5l | mAP = 99.6; inference time = 0.083 s/frame
 | | Faster R-CNN | mAP = 98.3; 0.5 s/frame
 | | YOLOv3 | mAP = 98.3; 0.103 s/frame
 | | YOLOv4 | mAP = 82.8; 0.022 s/frame
 | | SSD | mAP = 71.8; 0.022 s/frame
Qiao et al. [93] | Segmentation | DeepLab v3+ | mIoU = 86.5; 12.8 FPS
 | | FCN | mIoU = 85.77; 15.6 FPS
 | | SegNet | mIoU = 85.35; 18.5 FPS
Jin et al. [95] | Segmentation | DeepLab | mIoU = 54.23; precision = 67.56; recall = 76.18; accuracy = 98.6; F1-score = 71.61
 | | U-Net | mIoU = 52.65; precision = 75.93; recall = 65.39; accuracy = 98.75; F1-score = 70.26
 | | FCN | mIoU = 51.22; precision = 62.35; recall = 77.1; accuracy = 98.72; F1-score = 68.91
Subedi et al. [96] | Segmentation | DeepLabV3+ | Precision = 89.24; recall = 87.27; F1-score = 88.24; mIoU = 80.03; inference time = 32 ms
 | | U-Net | Precision = 88.83; recall = 87.9; F1-score = 88.36; mIoU = 80.19; 32 ms
 | | LinkNet | Precision = 88.8; recall = 87.86; F1-score = 88.33; mIoU = 80.15; 31 ms
 | | RefineNet | Precision = 85.64; recall = 65.65; F1-score = 85.64; mIoU = 76.5; 31 ms
 | | PSPNet | Precision = 85.42; recall = 82.87; F1-score = 84.13; mIoU = 74.42; 12 ms
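Beyond accuracy, Table 7 contrasts inference time and FPS, which determine real-time feasibility on UAVs and edge devices. The PyTorch sketch below shows a simple way to reproduce such timings; it assumes a recent torchvision, uses deeplabv3_resnet50 merely as a stand-in network, and measured values depend heavily on hardware, input size, and batching.

```python
import time
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights=None).eval()  # stand-in segmentation network
x = torch.randn(1, 3, 512, 512)                  # dummy single-frame input

with torch.no_grad():
    for _ in range(3):                           # warm-up iterations
        model(x)
    n = 10
    t0 = time.perf_counter()
    for _ in range(n):
        model(x)
    dt = (time.perf_counter() - t0) / n          # mean seconds per frame

print(f"{dt * 1000:.1f} ms per frame = {1 / dt:.1f} FPS")
```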
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
