Article

Rotating Target Detection Method of Concrete Bridge Crack Based on YOLO v5

1 School of Civil Engineering, Southwest Jiaotong University, Chengdu 610031, China
2 Guangxi Key Laboratory of International Join for China-ASEAN Comprehensive Transportation, Nanning University, Nanning 530000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(20), 11118; https://doi.org/10.3390/app132011118
Submission received: 8 September 2023 / Revised: 27 September 2023 / Accepted: 6 October 2023 / Published: 10 October 2023
(This article belongs to the Section Civil Engineering)

Abstract

Crack detection is a critical and essential aspect of concrete bridge maintenance and management. Manual inspection often falls short of the demands of large-scale crack detection in terms of cost, efficiency, accuracy, and data management. To address the challenges faced by existing generic object detection algorithms in achieving high accuracy or efficiency when detecting cracks with large aspect ratios, overlapping structures, and clear directional characteristics, this paper presents improvements to the YOLO v5 model. These enhancements include the introduction of an angle regression variable, the definition of a new loss function, the integration of PSA-Neck and ECA-Layer attention mechanism modules into the network architecture, consideration of the contribution of each node’s features to the network, and the addition of skip connections within the same feature scale. The result is a novel crack image rotation object detection algorithm named “R-YOLO v5”. After training the R-YOLO v5 model for 300 iterations on a dataset comprising 1628 crack images, the model achieved an mAP@0.5 of 94.03% on the test set, significantly higher than other rotation object detection algorithms such as SASM, S2A Net, and ReDet, as well as the horizontal-box YOLO v5 model. Furthermore, R-YOLO v5 demonstrates clear advantages in model size (4.17 MB) and detection speed (0.01 s per image). These results demonstrate that the designed model effectively detects cracks in concrete bridges and exhibits robustness and minimal memory usage, making it suitable for real-time crack detection on small devices such as smartphones or drones. Additionally, the rotation object detection improvement strategy discussed in this study is potentially applicable to enhancing other object detection algorithms.

1. Introduction

1.1. Motivation

Currently, China’s total highway mileage has reached 5.28 million kilometers, with over 970,000 highway bridges playing a significant role in the national comprehensive transportation system. More than 95% of these highway bridges are constructed from concrete, and cracks serve as a key indicator in the mechanical testing of concrete structures, as well as an important aspect of bridge health assessment. Cracks provide a visual indication of the extent of structural damage and can reveal early signs of most types of defects. They can lead to the detachment of protective layers, resulting in the corrosion of reinforcement bars and diminished structural strength, durability, and stability; severe through-cracks can even pose substantial threats to structural safety [1]. Therefore, regularly inspecting bridge cracks, evaluating their residual performance, identifying potential issues, and implementing appropriate maintenance measures not only ensure smooth traffic flow and enhance transportation efficiency but also play a pivotal role in promoting regional economic development. However, traditional bridge crack detection relies heavily on manual operations, resulting in low efficiency, unstable outcomes, and difficulties in quantification [2]. Manual high-altitude crack inspection also carries significant safety risks, as shown in Figure 1, and requires substantial additional investment to ensure the safety of inspection personnel. Influenced by factors such as lighting conditions and height, the collected data are difficult to present intuitively. In terms of economy, efficiency, accuracy, and data management, traditional methods struggle to meet the demands of extensive bridge crack detection [3].

1.2. Related Work

With the rapid advancement of computer hardware and software technology, an increasing number of researchers are utilizing machine vision algorithms for detecting cracks in concrete bridge structures. Asjodi et al. [5] proposed an arc length method that effectively captures the shape of bridge cracks and identifies features such as crack length, width, and angle. Abdlekader et al. [6] designed a hybrid image filtering protocol based on the moth–flame optimization algorithm, demonstrating excellent denoising performance on bridge defect images. Fan et al. [7] introduced a multi-view geometric 3D reconstruction method, addressing challenges in surface crack identification, damage localization, and deformation recognition. Compared to manual inspection, machine vision offers objectivity, accuracy, and efficiency. It can also be combined with unmanned aerial vehicles for aerial inspections [8] and integrated with climbing robots [9], significantly enhancing the safety and economic benefits of inspections. In recent years, deep learning technology has become mainstream, eliminating the cumbersome preprocessing steps of traditional image processing methods while enhancing detection accuracy. This technology has greatly expanded the capability of vision-based concrete crack detection [10]. Kalfarisi et al. [11] integrated region-based Faster R-CNN (FRCNN) with Structured Random Forest Edge Detection (SRFED), achieving both crack detection with bounding boxes and crack identification within boxes. Cha et al. [12], using the Faster R-CNN approach, successfully identified and located various damage types, including concrete cracks, different levels of corrosion, and layering. Laxman et al. [13] developed a comprehensive automatic crack detection and crack depth assessment framework for concrete structures using images captured with portable devices. Among these approaches, the YOLO series [14,15], one of the most popular families of object detection algorithms, has been employed by numerous researchers in fields such as medicine [16,17], autonomous driving [18,19], and aerospace [20,21]. In the field of concrete crack detection, the inherent irregularity of crack morphology means that the original networks perform less satisfactorily than they do on other object recognition tasks, and many scholars have therefore pursued improvements. Liao et al. [22] employed an enhanced YOLOv3 network and introduced the K-Means clustering algorithm to address the issue of original anchor sizes not being suitable for bridge crack detection. Cai et al. [23] combined YOLO v3 with depthwise separable convolutions and attention mechanisms to propose a lightweight detection network for the real-time detection of surface cracks on bridges. Yu et al. [24] combined YOLO v5 with the UNet3+ algorithm to develop an integrated intelligent bridge crack detection method. Tan et al. [25] improved the DeepLabv3+ model by introducing YOLOF and ResNet modules, significantly enhancing its accuracy compared to the original model. However, the YOLO series has a critical limitation when used for crack detection: horizontally oriented detection boxes enclose a significant amount of background, making it difficult to capture the cracks themselves accurately and complicating subsequent localization and quantification. To address this challenge, this paper explores a novel crack image rotation-based automatic detection method (R-YOLO v5).
It replaces horizontal rectangular anchor boxes with rotated rectangular anchor boxes, introduces rotation regression variables to the original anchor framework, and redefines the loss function. Additionally, it incorporates an attention mechanism module to enhance the model’s recognition accuracy, providing a new approach for crack detection and change monitoring.

2. YOLO v5 Model

The YOLO v5 network architecture consists of five main components: Input, Backbone, Neck, Prediction, and Output, as shown in Figure 2. The Input module uniformly scales images to a specific size using adaptive scaling; data augmentation is performed with both Mosaic and random affine transformations. The Backbone encompasses the Focus, Convolution-Batch Normalization-Leaky ReLU, BottleneckCSP, and Spatial Pyramid Pooling (SPP) modules [26] for feature extraction. The Focus module approximates downsampling while folding width and height information into channels, preserving features. The BottleneckCSP employs residual connections for multi-level feature fusion through branch connections, enhancing feature extraction capabilities. The SPP module aggregates pooling layers of varying scales, enabling multi-receptive-field fusion to improve recognition in complex scenes. The Neck employs a Feature Pyramid Network (FPN) + Path Aggregation Network (PAN) structure [27] to effectively fuse the features extracted by the Backbone, producing three enhanced feature layers of different scales that are passed to the Prediction module. The Prediction module produces raw predictions, which undergo Non-Maximum Suppression (NMS) [28] to obtain the final prediction boxes and thus object category and location information. The model’s activation and loss functions are set based on the network’s outputs and label information, facilitating the updating of network parameters. The latest YOLO v5 employs the Mish activation function [29] and the CIOU loss function [30].
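As an illustration of the slicing operation described above, the following PyTorch sketch reconstructs a Focus block in the spirit of the public YOLO v5 code (a minimal sketch; the channel counts, kernel size, and activation here are illustrative rather than the authors’ exact configuration):

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice each 2x2 pixel neighbourhood into the channel dimension, then
    fuse with a conv: (b, c, h, w) -> (b, 4c, h/2, w/2) -> (b, c_out, h/2, w/2)."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * c_in, c_out, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),   # CBL-style activation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four phase-shifted subsamples; no pixel is discarded.
        patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(patches, dim=1))
```

For a 3-channel 640 × 640 input, the four slices concatenate into a 12-channel 320 × 320 tensor, which the convolution then projects to the desired channel count.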
YOLO v5 possesses powerful real-time object detection capabilities, striking a balance between speed and accuracy, which makes it a mainstream choice for various object detection tasks. However, it requires significant computational resources during training and is sensitive to input image quality, necessitating careful tuning for optimal performance on specific tasks. For example, when applied to concrete crack detection, several challenges persist:
(1) Compared to well-defined objects, crack image targets are characterized by high aspect ratios, overlapping and intersecting regions, and directional significance. Using horizontal bounding boxes fails to capture the directional information of targets and hinders accurate localization.
(2) Crack images possess intricate backgrounds and are often accompanied by noise. YOLO v5 lacks an effective attention mechanism module, leading to suboptimal detection accuracy in complex backgrounds.
(3) Bridge crack images have relatively low resolutions, and feature maps progressively shrink through the multiple downsampling layers in the YOLO v5 architecture. Consequently, the fine-grained details of cracks are severely compromised.

3. Enhanced YOLO v5 Model

3.1. Angle Regression

The precision of crack annotation is correlated with the convergence time of network training: fewer redundant details in annotations help guide the training direction and accelerate the network’s convergence. The original YOLO v5s network employs horizontal rectangular bounding boxes to determine object positions and categories. However, the cracks in concrete structure images are elongated, sometimes spanning the entire structural surface, with intersections and varying orientations. Evidently, rotated rectangular bounding boxes are more suitable for crack annotation, circumventing issues such as overlapping true object boxes caused by horizontal rectangular boxes and situations where larger boxes entirely enclose smaller ones. The annotation effect is depicted in Figure 3. It can be observed that rotated rectangular target boxes contain fewer redundant details and add angle information to represent orientation, compared with the original horizontal bounding boxes. Therefore, this paper selects the YOLO v5s network model as a foundation and introduces modifications to create the new R-YOLO v5, specifically tailored for rotation-aware crack detection in concrete structures.
Common methods for defining rotated boxes often encounter angle boundary issues. Taking the long-edge definition method as an example, the label comprises five parameters (x, y, w, h, α): x and y represent the horizontal and vertical coordinates of the rotated box’s center point, w and h denote the long and short sides of the rotated box, and α is the angle between the long side and the x-axis. The angle is measured within a range of 180°, with a scope of [−90°, 90°), as illustrated in Figure 4.
From Figure 4, it is evident that the boundary values of −90° and 90° describe essentially the same orientation, yet numerically they differ by 180°. This discontinuity makes the Intersection over Union (IoU) computation of rotated boxes non-differentiable at the boundary. Consequently, the regression loss is usually computed with the Smooth-L1 loss function [31], which tolerates discontinuities at boundaries. However, minimizing the Smooth-L1 loss during training does not necessarily correspond to optimal IoU performance, leading to inconsistencies between model loss values and detection accuracy. To address this, a fundamental approach is offered by the Gaussian Wasserstein distance (GWD)-based regression loss [32], which approximates the non-differentiable rotated IoU loss by converting rotated boxes into two-dimensional Gaussian distributions. Even when two rotated bounding boxes do not overlap, GWD can still measure a meaningful loss value. We transform any rotated box (x, y, w, h, α) into a two-dimensional Gaussian distribution N(μ, Σ), where μ, the mean of the distribution, is composed of the center point’s horizontal coordinate x and vertical coordinate y. The calculation is as follows:
$$\Sigma = \Sigma^{1/2}\,\Sigma^{1/2} \tag{1}$$

$$\Sigma^{1/2} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} \frac{w}{2} & 0 \\ 0 & \frac{h}{2} \end{pmatrix} \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix} = \begin{pmatrix} \frac{w}{2}\cos^{2}\alpha + \frac{h}{2}\sin^{2}\alpha & \frac{w-h}{2}\cos\alpha\sin\alpha \\ \frac{w-h}{2}\cos\alpha\sin\alpha & \frac{h}{2}\cos^{2}\alpha + \frac{w}{2}\sin^{2}\alpha \end{pmatrix} \tag{2}$$
where α is the angle of the rotated bounding box, w is the width of the rotated bounding box, h is the height of the rotated bounding box, and Σ is the covariance matrix of the two-dimensional Gaussian distribution.
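For concreteness, Equations (1) and (2) can be implemented in a few lines of NumPy (a minimal sketch, assuming the angle is supplied in radians; np.deg2rad converts from the degree convention above):

```python
import numpy as np

def rbox_to_gaussian(x, y, w, h, alpha):
    """Convert a long-edge rotated box (x, y, w, h, alpha in radians)
    into the mean and covariance of a 2D Gaussian, per Equations (1)-(2)."""
    mu = np.array([x, y], dtype=float)
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    S = np.diag([w / 2.0, h / 2.0])     # half the side lengths on the diagonal
    sigma_half = R @ S @ R.T            # Sigma^(1/2), Equation (2)
    return mu, sigma_half @ sigma_half  # Sigma = Sigma^(1/2) Sigma^(1/2), Equation (1)

# e.g. a 100 x 10 crack-like box centred at (50, 50), tilted 30 degrees:
mu, sigma = rbox_to_gaussian(50, 50, 100, 10, np.deg2rad(30))
```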
$N_1(\mu_1, \Sigma_1)$ and $N_2(\mu_2, \Sigma_2)$ represent two Gaussian distributions, and the squared Wasserstein distance between them is:

$$d^{2} = \lVert \mu_1 - \mu_2 \rVert_2^{2} + \mathrm{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right) \tag{3}$$
where μ₁ and μ₂ are the means of the two-dimensional Gaussian distributions N₁ and N₂, Σ₁ and Σ₂ are their covariance matrices, and Tr(·) denotes the trace of a matrix (the sum of its diagonal elements).
To mitigate the sensitivity of GWD to large losses, a nonlinear transformation is applied to d to render the loss smoother and more expressive. The regression loss function for rotated boxes is as follows:
$$L = 1 - \frac{1}{1 + \ln(1 + d)} \tag{4}$$
where L is the regression loss value of the rotated box.
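Equations (3) and (4) can likewise be sketched in NumPy; the matrix square roots are computed by eigendecomposition, which is valid here because the covariance matrices are symmetric positive semi-definite (an illustration, not the authors’ implementation):

```python
import numpy as np

def sqrtm_spd(M):
    """Matrix square root of a symmetric positive semi-definite matrix
    via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gwd_loss(mu1, sigma1, mu2, sigma2):
    """Gaussian Wasserstein distance regression loss, Equations (3)-(4)."""
    s1_half = sqrtm_spd(sigma1)
    cross = sqrtm_spd(s1_half @ sigma2 @ s1_half)
    # Equation (3): squared Wasserstein distance between the two Gaussians
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(sigma1 + sigma2 - 2.0 * cross)
    d = np.sqrt(max(d2, 0.0))
    # Equation (4): nonlinear transform keeps the loss smooth and bounded
    return 1.0 - 1.0 / (1.0 + np.log1p(d))
```

As a sanity check, two identical boxes yield d = 0 and L = 0, and the loss grows smoothly as the boxes drift apart, even when they no longer overlap.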

3.2. Attention Mechanism

The attention mechanism module, by introducing learnable weights, focuses the model’s attention on crucial regions, thereby mitigating the influence of noise during feature extraction from crack images. Currently, attention mechanisms fall into two categories: channel attention mechanisms (such as SE-Layer [33,34] and ECA-Layer [35,36]), which focus on channel information while disregarding spatial details, and channel–spatial attention mechanisms (such as CBAM [37,38] and SCA [39,40]), which combine both channel and spatial attentions. However, the utilization of spatial information between feature maps of different scales is constrained in the latter, making it challenging to establish long-range channel dependencies.

3.2.1. PSA-Neck Module

To address the inherent characteristics of crack image targets and the limitations of existing attention mechanisms, and inspired by the design of residual edges in the Bottleneck architecture, we replace the 3 × 3 convolutional layer in the ResNet network with a Pyramid Split Attention (PSA) module, creating a novel PSA-Neck, depicted conceptually in Figure 5. The PSA-Neck comprises a master branch and a residual branch. The master branch passes through a 1 × 1 convolutional kernel, followed by Batch Normalization (BN) and a Hard-swish activation function, and finally through the PSA module. When the number of output channels of the master branch matches that of the residual branch, the features of both branches are added and then output.
For input feature information X ∈ R^(C×H×W), the PSA module first employs the Split, Pyramid, and Channel (SPC) module to split the channels and extract multi-scale feature information from the spatial information of each channel group. Next, the SE-Weight module extracts channel attention for the feature maps at each scale, yielding a channel attention vector per scale. Third, Softmax is applied to recalibrate the multi-scale channel attention vectors, producing new attention weights after multi-scale channel interaction. Finally, the recalibrated weights and the corresponding feature maps are multiplied element-wise, generating a feature map weighted by multi-scale attention. The PSA module’s structure is depicted in Figure 6.
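The following PyTorch sketch traces these four steps (a simplified reconstruction: it assumes the channel count is divisible by the number of scales, uses plain rather than grouped convolutions for the pyramid branches, and shares one SE-Weight module across branches):

```python
import torch
import torch.nn as nn

class SEWeight(nn.Module):
    """Squeeze-and-Excitation weights for one pyramid branch."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.fc(self.pool(x))

class PSA(nn.Module):
    """Pyramid Split Attention: SPC split -> per-scale SE weights ->
    Softmax recalibration across scales -> element-wise re-weighting."""
    def __init__(self, channels: int, kernels=(3, 5, 7, 9)):
        super().__init__()
        self.s = len(kernels)
        c = channels // self.s              # channels must divide evenly
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2) for k in kernels)
        self.se = SEWeight(c)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # SPC: split channels into s groups, one convolution scale per group
        feats = torch.stack([conv(xi) for conv, xi in
                             zip(self.convs, x.chunk(self.s, dim=1))], dim=1)
        # per-scale channel attention vectors, recalibrated across scales
        attn = torch.stack([self.se(feats[:, i]) for i in range(self.s)], dim=1)
        out = feats * self.softmax(attn)
        return out.reshape(b, c, h, w)
```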

3.2.2. ECA-Layer Module

Considering the requirements for model detection performance, the efficient channel attention mechanism ECA-Layer is inserted into the network to effectively enhance model performance. Due to its lightweight design, it does not lead to a significant increase in model parameters. Within the ECA-Layer module, the input feature information is first globally average-pooled to obtain weight information corresponding to channels. Subsequently, it is processed through one-dimensional convolution, followed by treatment with a Sigmoid activation function. The obtained channel attention weights are then multiplied with the original input feature information to generate the output feature information, as illustrated in Figure 7. The ECA-Layer mechanism enhances the performance of convolutional neural networks in computer vision tasks such as image classification and object detection by autonomously learning and selecting critical channel information, amplifying the importance of valuable features, reducing computational overhead, and expanding the network’s receptive field.
For the size of the convolution kernel k, we derive the formula as follows. There is a mapping relationship between the one-dimensional convolution kernel size k and the number of channels C, as shown in Equation (5):
$$C = \phi(k) \tag{5}$$
The simplest functional relationship is a linear relationship, as shown in Equation (6):
$$\phi(k) = \gamma \cdot k - b \tag{6}$$
Since the number of channels is typically set to a power of 2, we extend this linear relationship to an exponential (nonlinear) one, as shown in Equation (7):
$$C = \phi(k) = 2^{\gamma \cdot k - b} \tag{7}$$
Thus, we invert this relationship to compute the convolution kernel size k, rounding to the nearest odd number, as shown in Equation (8):
$$k = \psi(C) = \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}} \tag{8}$$
where |·|_odd denotes rounding to the nearest odd number, and γ and b are hyperparameters (γ = 2, b = 1).
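Combining the module description in Section 3.2.2 with Equation (8), a compact PyTorch sketch of the ECA-Layer might look as follows (an illustration under the stated hyperparameters γ = 2, b = 1, not the authors’ exact code):

```python
import math
import torch
import torch.nn as nn

def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |log2(C)/gamma + b/gamma|_odd, Equation (8)."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 else t + 1          # round to the nearest odd number

class ECALayer(nn.Module):
    """Efficient Channel Attention: global average pooling -> 1D convolution
    -> Sigmoid -> re-weight the input channels."""
    def __init__(self, channels: int):
        super().__init__()
        k = eca_kernel_size(channels)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)    # per-channel descriptor
        y = self.sigmoid(self.conv(y))    # local cross-channel interaction
        return x * y.view(b, c, 1, 1)

# e.g. a 256-channel feature map yields k = 5
```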

3.3. Enhanced Feature Fusion

Efficient feature fusion modules thoroughly integrate high- and low-level information, enhancing model performance. YOLO v5 employs a Path Aggregation Network (PAN) structure for feature fusion. Compared with a traditional Feature Pyramid Network (FPN), PAN adds a bottom-up feature fusion path, making more effective use of low-level features, as shown in Figure 8a. However, conventional feature fusion structures like PAN merge nodes indiscriminately, overlooking each node’s contribution. Inspired by BiFPN [41], we enhance R-YOLO v5’s PAN by introducing same-scale skip connections and eliminating low-contribution nodes, thereby improving feature fusion efficiency, as depicted in Figure 8b.
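The node weighting borrowed from BiFPN can be illustrated with its “fast normalized fusion” scheme, in which each incoming node receives a learnable non-negative weight; the module below is a hypothetical sketch of one such fusion node:

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fast normalized fusion: out = sum(w_i * x_i) / (sum w_i + eps),
    with learnable non-negative weights reflecting each node's contribution."""
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, xs):
        w = torch.relu(self.w)              # keep contributions non-negative
        w = w / (w.sum() + self.eps)        # normalize to a convex combination
        return sum(wi * xi for wi, xi in zip(w, xs))

# fusing a lateral feature, a top-down feature, and a same-scale skip input
# (all of identical shape): fuse = WeightedFusion(3); p_out = fuse([p_l, p_td, p_skip])
```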

3.4. Improved Network Architecture

The improved network architecture is illustrated in Figure 9. Some of the BottleneckCSP structures in YOLO v5s have been replaced with PSAneckCSP structures featuring fused pyramid split attention mechanisms. Additionally, an ECA-Layer has been added after each CSP to enhance critical feature extraction. To address the limitations of the original PAN feature fusion, we have enhanced the module by considering node contributions and introducing same-scale skip connections.

4. Experiments

4.1. Datasets

The data for this crack identification task come from a publicly available online dataset [42] that consists of 2036 images of concrete bridge structural cracks captured by drones; all images were manually annotated using the roLabelImg annotation tool to generate corresponding label files. The label files included crack feature categories and coordinate information corresponding to the dataset images. The dataset was randomly divided into training, validation, and test sets using an 8:1:1 ratio, with 1628 images for training, 204 for validation, and 204 for testing. The training and validation sets were employed for model training and assessing individual training results, while the test set was utilized to evaluate the final model’s detection performance.
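For reference, a reproducible 8:1:1 split of the kind described can be written in a few lines of Python (the directory layout and random seed are hypothetical; the paper specifies only the ratio and the resulting counts):

```python
import random
from pathlib import Path

# Hypothetical layout: all 2036 annotated crack images in one folder.
images = sorted(Path("crack_dataset/images").glob("*.jpg"))
random.seed(0)          # fix the seed so the split is reproducible
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.8 * n), round(0.1 * n)   # 1628 / 204 for n = 2036
train = images[:n_train]
val = images[n_train:n_train + n_val]
test = images[n_train + n_val:]                  # the remaining 204 images
```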

4.2. Evaluation Metrics

The experiments utilized mean average precision (mAP) and F1 score as evaluation metrics to assess the performance of the model:
$$P = \frac{N_{TP}}{N_{TP} + N_{FP}} \tag{9}$$

$$R = \frac{N_{TP}}{N_{TP} + N_{FN}} \tag{10}$$

$$F1 = \frac{2PR}{P + R} \tag{11}$$
First, define an Intersection over Union (IoU) threshold, typically 0.5, which is the most commonly used value. This threshold quantifies the degree of overlap between detected bounding boxes and ground-truth bounding boxes. N_TP denotes the number of detections with an IoU greater than or equal to the threshold, i.e., correctly detected targets. N_FP denotes the number of detections with an IoU below the threshold, i.e., false positives whose bounding boxes do not sufficiently overlap a true target. N_FN denotes the number of targets that were not detected at all. Precision (P) and Recall (R) are computed with the formulas above, and the F1 score is their harmonic mean. Next, a Precision–Recall (PR) curve is drawn and sampled with a step size of 0.1, taking the corresponding P values at R = [0, 0.1, 0.2, …, 1]; the average precision (AP) is the average of these P values. To calculate mAP, the AP values of all classes are summed and averaged. Specifically, mAP@0.5 is the mAP at an IoU threshold of 0.5, while mAP@0.5:0.95 averages the mAP over IoU thresholds ranging from 0.5 to 0.95.
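The procedure above can be summarized in plain Python (a sketch; the AP step uses the common 11-point interpolation, taking the best precision at each recall threshold, which is the standard reading of the sampling described):

```python
def precision_recall_f1(n_tp: int, n_fp: int, n_fn: int):
    """Equations (9)-(11): precision, recall, and their harmonic mean F1."""
    p = n_tp / (n_tp + n_fp) if n_tp + n_fp else 0.0
    r = n_tp / (n_tp + n_fn) if n_tp + n_fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision_11pt(precisions, recalls):
    """11-point interpolated AP: the mean of the best precision attainable
    at recall >= t for t in {0.0, 0.1, ..., 1.0}."""
    ap = 0.0
    for t in (i / 10 for i in range(11)):
        ap += max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0)
    return ap / 11
```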

4.3. Training

The algorithm experiments were conducted on a Windows Server 2019 operating system, utilizing an Intel(R) Xeon(R) Gold 5218R CPU 2.10 GHz processor and an Nvidia GeForce RTX 4090 24 GB graphics card. The deep learning framework used was PyTorch 2.0.0, with the programming platform being PyCharm and the programming language being Python 3.8. All comparative algorithms were executed in the same environment.
In the random affine transformation stage of data augmentation in the R-YOLO v5 network, rotations and shearing transformations that alter object angles were disabled before images were fed into the network. To expedite training, model parameters were initialized with weights pre-trained on the COCO dataset and then fine-tuned. Hyperparameters were configured with a batch size of 16, a momentum factor of 0.95, an initial learning rate of 0.01, and a cosine annealing learning rate reduction strategy. The training process consisted of 300 iterations.
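The reported optimizer settings map onto a standard PyTorch configuration along these lines (a schematic sketch with a placeholder module standing in for R-YOLO v5; the full training loop with data loading and the GWD loss is omitted):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 10)   # placeholder standing in for the R-YOLO v5 network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)

for epoch in range(300):    # 300 training iterations (epochs)
    # ... forward pass, GWD loss, and backward pass over batches of 16,
    # starting from COCO-pretrained weights ...
    optimizer.step()
    scheduler.step()        # cosine-annealed learning-rate decay
```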

4.4. Results Analysis

After 300 training iterations, the variations in the metrics during network training are presented in Table 1. Precision (P), Recall (R), and mean Average Precision (mAP) oscillated slightly in the first 100 iterations and gradually increased thereafter. The loss converged rapidly; around 180 iterations, the model reached saturation, with the loss fluctuating around 0.087. The final model achieved a Precision of 91.52%, a Recall of 90.54%, and an mAP at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) of 94.03%.

4.5. Performance Comparison

In this section, we conducted comparisons along two lines. The first compared the improved R-YOLO v5 with other rotation detection models, namely SASM, S2A Net, and ReDet, in terms of predictive performance. The second contrasted the performance of the enhanced algorithm with that of the original algorithm.
(1) Comparison of Rotational Object Detection Algorithms
To assess the detection performance of the R-YOLO v5 network on concrete structure cracks, the same dataset was used to train the rotational object detection algorithms SASM [43], S2A Net [44], ReDet [45], and R-YOLO v5. Following training, the models were evaluated on the test set for performance comparison (all comparative algorithms were executed in the same environment). The performances of SASM, S2A Net, ReDet, and R-YOLO v5 are summarized in Table 2:
According to the results in Table 2, R-YOLO v5 achieves an mAP@0.5 of 94.03, higher than the other three algorithms, indicating a significant improvement in detection accuracy. The model’s memory footprint is 4.17 MB, notably smaller than that of the other algorithms. Additionally, the average detection time for a single image is only 0.01 s. For detecting concrete cracks with large aspect ratios, R-YOLO v5 demonstrates superior overall performance: it meets the precision requirements while offering a lightweight model and fast detection speed, making it suitable for real-time crack detection on compact devices such as smartphones or drones.
(2) Comparison with YOLO v5
To analyze the performance differences between the R-YOLO v5 rotation-based object detection algorithm and the standard YOLO v5 object detection algorithm for detecting concrete cracks, the same 2036 concrete crack images were annotated with conventional horizontal rectangular bounding boxes and fed into the YOLO v5s network for training (under the same training environment). The trained models were then evaluated using the test dataset. YOLO v5 achieved an mAP@0.5 of 72.12 for detecting concrete structure cracks, with a model size of 4.14 MB and an average detection time per image of 0.01 s. Comparing the test results, R-YOLO v5 outperforms YOLO v5 with an mAP@0.5 that is 21.91 percentage points higher (a relative improvement of 30.38%). The model size of R-YOLO v5 is slightly larger, by 0.03 MB, while the average detection time per image is the same.
Precision and F1 score are shown in Figure 10. As seen from the figure, the Precision and F1 scores of the R-YOLO v5 model are significantly higher than those of the YOLO v5 model, and R-YOLO v5 also converges faster: it converged by around the 10th epoch, whereas YOLO v5 converged around the 50th epoch.
The detection results of all algorithms are presented in Figure 11. The figure demonstrates that the R-YOLO v5 algorithm exhibits a higher accuracy in identifying cracks compared to other algorithms. The annotated cracks have fewer background components, and the recall rate is also higher. The evaluation metrics and detection performance collectively indicate that the R-YOLO v5 network demonstrates superior performance and is suitable for concrete structure crack detection.

5. Conclusions

In response to challenges in concrete crack image object detection, an enhancement was made to YOLO v5, resulting in the R-YOLO v5 rotational object detection algorithm. This enhancement was carried out through two primary aspects. First, an angular regression variable was introduced, and the loss function was redefined accordingly. Second, the PSA-Neck (Pyramid Split Attention Neck) and ECA-Layer (Efficient Channel Attention Layer) attention mechanisms were integrated into the network architecture. This integration took into account the contributions of individual node features to the network’s performance, and skip connections were incorporated within the same feature scale.
The experimental results demonstrate that the R-YOLO v5 detection algorithm achieves an mAP@0.5 of 94.03% for detecting concrete structural cracks, surpassing other algorithms such as SASM, S2A Net, and ReDet, as well as the horizontal bounding box-based YOLO v5 model. Furthermore, R-YOLO v5 exhibits notable advantages in terms of model size (4.17 MB) and detection speed (0.01 s per image).
The algorithm proposed in this paper for crack detection offers a higher accuracy and faster detection speed compared to previous research. It can effectively detect concrete bridge cracks in unmanned aerial vehicle (UAV) aerial videos or images, eliminating the need for manual high-altitude inspections and significantly improving detection efficiency. Additionally, the improved strategy for rotating object detection discussed in this study has a certain degree of generality and can provide insights for enhancing other object detection algorithms. However, the current model still has a large computational and parameter load, which increases the training difficulty to some extent. Future work could explore lightweight improvements to the model with the goal of ensuring detection accuracy while enabling deployment on portable devices.

Author Contributions

Methodology, investigation, writing—original draft, visualization and data curation, Y.L.; model experiment, T.Z. and J.X.; writing—review and editing, Y.H.; conceptualization and funding acquisition, Q.P.; provision of study materials, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Cultivation Program for the Guangxi Science and Technology Plan Project of China (No. AA21077011).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Asvitha, V.; Ravi, K. Review on the mechanism and mitigation of cracks in concrete. Appl. Eng. Sci. 2023, 100154. [Google Scholar]
  2. Rosso, M.; Marasco, G.; Aiello, S.; Aloisio, A.; Chiaia, B.; Marano, G. Convolutional networks and transformers for intelligent road tunnel investigations. Comput. Struct. 2023, 275, 106918. [Google Scholar]
  3. Zhou, X.; Zhang, X. Thoughts on the Development of Bridge Technology in China. Engineering 2019, 5, 1120–1130. [Google Scholar] [CrossRef]
  4. Billie, F.; Vedhus, H.; Yasutaka, N. Advances in Computer Vision-Based Civil Infrastructure Inspection and Monitoring. Engineering 2019, 5, 199–222. [Google Scholar]
  5. Asjodi, A.; Daeizadeh, M.; Hamidia, M.; Dolatshahi, K. Arc Length method for extracting crack pattern characteristics. Struct. Control Health Monit. 2020, 28, 2653. [Google Scholar]
  6. Abdlekader, E.; Marzouk, M.; Zayed, T. A self-adaptive exhaustive search optimization-based method for restoration of bridge defects images. Int. J. Mach. Learn. Cybern. 2020, 11, 1659–1716. [Google Scholar]
  7. Liu, Y.; Fan, J.; Kong, S.; Wei, X. Multi-View Geometric 3D Reconstruction Method for Identifying Structural Defects and Deformations. Eng. Mech. 2020, 37, 103–111. [Google Scholar]
  8. Sanchez, P.; Ramon, P.; Arrue, B.; Ollero, A.; Heredia, G. Robotic System for Inspection by Contact of Bridge Beams Using UAVs. Sensors 2019, 19, 305. [Google Scholar]
  9. Keunyoung, J.; Kyu, Y.; Byunghyun, K.; Soojin, C. Automated crack evaluation of a high-rise bridge pier using a ring-type climbing robot. Comput.-Aided Civ. Infrastruct. Eng. 2020, 36, 14–29. [Google Scholar]
  10. Cha, Y.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar]
  11. Kalfarisi, R.; Wu, Z.; Soh, K. Crack Detection and Segmentation Using Deep Learning with 3D Reality Mesh Model for Quantitative Assessment and Integrated Visualization. J. Comput. Civ. Eng. 2020, 34, 04020010. [Google Scholar]
  12. Cha, Y.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous Structural Visual Inspection Using Region-Based Deep Learning for Detecting Multiple Damage Types. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 731–747. [Google Scholar] [CrossRef]
  13. Laxman, K.; Tabassum, N.; Ai, L.; Cole, C.; Ziehl, P. Automated crack detection and crack depth prediction for reinforced concrete structures using deep learning. Constr. Build. Mater. 2023, 370, 130709. [Google Scholar]
  14. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. Computer Vision and Pattern Recognition. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  15. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOx: Exceeding YOLO Series in 2021. Computer Vision and Pattern Recognition. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  16. Zhou, T.; Liu, F.; Ye, X.; Wang, H.; Lu, H. CCGL-YOLOV5: A cross-modal cross-scale global-local attention YOLOV5 lung tumor detection model. Comput. Biol. Med. 2023, 165, 107387. [Google Scholar] [CrossRef]
  17. Klinwichit, P.; Yookwan, W.; Limchareon, S.; Chinnasarn, K.; Jang, J.; Onuean, A. BUU-LSPINE: A Thai Open Lumbar Spine Dataset for Spondylolisthesis Detection. Appl. Sci. 2023, 13, 8646. [Google Scholar] [CrossRef]
  18. Mahaur, B.; Mishra, K. Small-object detection based on YOLOv5 in autonomous driving systems. Pattern Recognit. Lett. 2023, 168, 115–122. [Google Scholar]
  19. Cho, K.; Cho, D. Autonomous Driving Assistance with Dynamic Objects Using Traffic Surveillance Cameras. Appl. Sci. 2022, 12, 6247. [Google Scholar] [CrossRef]
  20. Zhou, Y.; Chang, B.; Zou, H.; Sun, L.; Wang, L.; Du, D. Online visual monitoring method for liquid rocket engine nozzle welding based on a multi-task deep learning model. J. Manuf. Syst. 2023, 68, 1–11. [Google Scholar] [CrossRef]
  21. Zhuang, H.; Xia, Y.; Wang, N.; Dong, L. High Inclusiveness and Accuracy Motion Blur Real-Time Gesture Recognition Based on YOLOv4 Model Combined Attention Mechanism and DeblurGanv2. Appl. Sci. 2021, 11, 998. [Google Scholar]
  22. Liao, Y.; Li, W. Bridged Crack Detection Method Based on Convolutional Neural Networks. Comput. Eng. Des. 2021, 42, 2366–2372. [Google Scholar]
  23. Cai, F.; Zhang, Y.; Huang, J. Bridge Surface Crack Detection Algorithm Based on YOLOv3 and Attention Mechanism. Pattern Recognit. Artif. Intell. 2020, 33, 926–933. [Google Scholar]
  24. Yu, J.; Liu, B.; Yin, D.; Gao, W.; Xie, Y. Bridge Crack Intelligent Recognition and Measurement based on YOLOv5 and UNet3+. J. Hunan Univ. 2023, 50, 65–73. [Google Scholar]
  25. Tan, G.; Ou, J.; Ai, Y.; Yang, R. Bridge Crack Image Segmentation Method based on Improved DeepLabv3+ Model. J. Jilin Univ. 2022, 1–7. [Google Scholar] [CrossRef]
  26. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef] [PubMed]
  27. Trinh, H.; Le, D.; Kwon, Y. PANET: A GPU-based tool for fast parallel analysis of robustness dynamics and feed-forward/feedback loop structures in large-scale biological networks. PLoS ONE 2014, 9, e103010. [Google Scholar]
  28. Neubeck, A.; Gool, L. Efficient Non-Maximum Suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; pp. 850–855. [Google Scholar]
  29. Misra, D. Mish. A Self Regularized Non-Monotonic Neural Activation Function. Machine Learning. arXiv 2019, arXiv:1908.08681. [Google Scholar]
  30. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. Proc. AAAI Conf. Artif. Intell. 2019, 34, 12993–13000. [Google Scholar] [CrossRef]
  31. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  32. Yang, X.; Yan, J.; Ming, Q.; Wang, W.; Zhang, X.; Tian, Q. Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss. In Proceedings of the 38th International Conference on Machine Learning, Virtual Event, 18–24 July 2021; Volume 139, pp. 11830–11841. [Google Scholar]
  33. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  34. Yue, D.; Luo, J.; Li, H. The generative adversarial network improved by channel relationship learning mechanisms. Neurocomputing 2021, 454, 1–13. [Google Scholar]
  35. Li, X.; Wang, C.; Ju, H.; Li, Z. Surface Defect Detection Model for Aero-Engine Components Based on Improved YOLOv5. Appl. Sci. 2022, 12, 7235. [Google Scholar] [CrossRef]
  36. Cao, Y.; Chen, J.; Zhang, Z. A sheep dynamic counting scheme based on the fusion between an improved-sparrow-search YOLOv5x-ECA model and few-shot deepsort algorithm. Comput. Electron. Agric. 2023, 206, 107696. [Google Scholar] [CrossRef]
  37. Woo, S.; Park, J.; Lee, J. CBAM: Convolutional Block Attention Module. Comput. Vis.-ECCV. 2018, 11211, 3–19. [Google Scholar]
  38. Guo, Z.; Yang, J.; Liu, S. Research on Lightweight Model for Rapid Identification of Chunky Food Based on Machine Vision. Appl. Sci. 2023, 13, 8781. [Google Scholar] [CrossRef]
  39. Tang, X.; Huang, F.; Li, C.; Ban, D. SCA-Net: Spatial and channel attention-based network for 3D point clouds. Comput. Vis. Image Underst. 2023, 32, 103690. [Google Scholar] [CrossRef]
  40. Xu, Y.; Liu, J.; Zhao, X.; Zhu, X. AMS-PAN: Breast ultrasound image segmentation model combining attention mechanism and multi-scale features. Biomed. Signal Process. Control 2023, 81, 104425. [Google Scholar]
  41. Tan, M.; Pang, R.; Le, Q. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. [Google Scholar]
  42. Concrete Road Surface, Bridge Construction, Tunnel Crack and Fracture Detection Dataset. Available online: https://blog.csdn.net/Together_CZ/article/details/128290225?utm_medium=distribute.pc_relevant.none-task-blog-2~default~baidujs_utm_term~default-2-128290225-blog-125138837.235^v38^pc_relevant_sort_base2&spm=1001.2101.3001.4242.2&utm_relevant_index=5 (accessed on 22 September 2023).
  43. Hou, L.; Lu, K.; Xue, J.; Li, Y. Shape-Adaptive Selection and Measurement for Oriented Object Detection. Proc. AAAI Conf. Artif. Intell. 2022, 36, 923–932. [Google Scholar] [CrossRef]
  44. Han, J.; Ding, J.; Li, J.; Xia, G. Align Deep Features for Oriented Object Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–11. [Google Scholar]
  45. Han, J.; Ding, J.; Xue, N.; Xia, G. ReDet: A Rotation-equivariant Detector for Aerial Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2786–2795. [Google Scholar]
Figure 1. Inspectors from the US Army Corps of Engineers rappel down a Tainter gate to inspect for surface damage [4].
Figure 2. YOLO v5 network architecture.
Figure 3. Comparison of annotation effects between two types of rectangular boxes. (a) Original rectangular bounding boxes; (b) rotated rectangular bounding boxes.
Figure 4. Boundary issues of rotated boxes.
Figure 5. PSA-Neck structure.
Figure 6. PSA module structure.
Figure 7. ECA-Layer structure.
Figure 8. Feature fusion comparison. (a) R-YOLO v5 PAN structure; (b) enhanced feature fusion structure.
Figure 9. Network architecture of R-YOLO v5.
Figure 10. F1 score and precision.
Figure 11. Comparison of detection results. (a) YOLO v5; (b) R-YOLO v5; (c) S2A Net; (d) ReDet; (e) SASM.
Table 1. Training results of the R-YOLO v5 model.

Epoch | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | F1     | Loss
1     | 0.3741  | 0.1032       | 0.4617    | 0.4332 | 0.4470 | 0.2017
50    | 0.9402  | 0.5973       | 0.9215    | 0.8881 | 0.9045 | 0.0977
100   | 0.9401  | 0.6111       | 0.9133    | 0.9122 | 0.9127 | 0.0935
150   | 0.9413  | 0.6134       | 0.9174    | 0.9034 | 0.9103 | 0.0895
200   | 0.9412  | 0.6123       | 0.9131    | 0.9073 | 0.9102 | 0.0866
250   | 0.9402  | 0.6104       | 0.9163    | 0.9048 | 0.9105 | 0.0843
300   | 0.9403  | 0.6103       | 0.9152    | 0.9054 | 0.9103 | 0.0839
Table 2. Performance comparison of rotation-aware object detection algorithms.

Algorithm | mAP@0.5 | Model Size (MB) | Inference Time (per Image)
SASM      | 80.29   | 276             | 0.22 s
S2A Net   | 78.17   | 291             | 0.25 s
ReDet     | 89.45   | 233             | 0.47 s
R-YOLO v5 | 94.03   | 4.17            | 0.01 s
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
