1. Introduction
Detecting defects on the surface of concrete structures is a vital part of structural health monitoring. Defects such as cracks, exposed bars, and corrosion on bridge surfaces must be detected because they reduce the durability of concrete structures [1]. Cracks are the most common and most consequential defect because they indicate insufficient strength and reduced bridge safety [2]. Crack width is a main criterion for assessing the performance of concrete components and is critical to ensuring bridge performance [3,4,5].
Cracks can be detected through manual visual inspection and assessment. In conventional detection methods, a bridge inspector uses an engineering vehicle or a water vehicle (Figure 1) and judges deterioration on the basis of personal experience; this process is dangerous and subjective [6], as well as labor-intensive and time-consuming [7]. To overcome these challenges, objective and automated methods must be developed, and computer vision techniques can be employed to this end [8]. Unmanned aerial vehicles (UAVs) can easily fly close to bridge components that are difficult for people to access. The UAV-mounted camera can be operated by remote control from the ground to capture images for detecting bridge cracks.
In recent years, structural crack identification and detection technology based on computer vision (CV) has gradually been applied to civil engineering operations and maintenance [9,10,11], including the identification of cracks in concrete. Conventional machine-learning algorithms include linear regression, decision trees, the support vector machine, and the Bayesian algorithm. A disadvantage of computer vision is that it is affected by interfering factors such as lighting, shadows, and rough surfaces [12]. However, various hybrid approaches combining artificial intelligence (AI) and machine learning (ML) techniques can be used to overcome these limitations [13,14,15,16]. Sharma et al. (2018) combined a support vector machine with a convolutional neural network (CNN) to identify cracks in reinforced concrete; their method achieved higher identification accuracy than a CNN alone [17]. Prateek et al. employed computer vision techniques to extract features from images of cracks in concrete and then trained their system to extract information about the cracks [18]. This method can identify only spatial positions, gray values, and saturation; it cannot extract deep features, and its identification accuracy is low. Furthermore, machine learning has weaknesses such as slow convergence and long computation times [19,20]. To achieve more rapid identification, automated feature extraction techniques must be established.
Deep learning is an important branch of machine learning, and the convolutional neural network (CNN) is its most prominent architecture for image recognition. A CNN consists of convolutional layers, pooling layers, flattening layers, and fully connected layers. The number of layers is not fixed and differs from model to model, which leads to different analysis results. The vigorous development of deep learning in recent years has given rise to novel approaches for detecting cracks in concrete surfaces, often by applying CNNs to extract crack features. CNNs can automatically learn deep features from training images and thus achieve high crack detection efficiency and accuracy [7,21]. Cha et al. (2017) proposed a deep CNN for sorting images on the basis of whether they show cracks [7]. Liang et al. (2017) used a deep CNN to classify concrete cracks and spalling; the challenge in their approach, which is based on image segmentation, is finding an appropriate sub-image size when cracks of various sizes are present [22]. Gang Yao et al. (2021) improved the YOLOv4 model to realize real-time detection of concrete surface cracks. Their improved model reduced the number of parameters and the amount of computation by 87.43% and 99.00%, respectively. Although the identification speed improved, the mean average precision (mAP) of crack detection was 94.09%, slightly lower than the 98.52% of the original YOLOv4 model [23]. To improve training efficiency, Qianyun Zhang et al. (2021) first transformed images into the frequency domain during a preprocessing phase and then applied an integrated one-dimensional convolutional neural network (1D-CNN) and long short-term memory (LSTM) method in the image frequency domain. The algorithm was trained using thousands of images of cracked and non-cracked concrete bridge decks. The accuracy of the developed model was 99.05%, 98.9%, and 99.25% for the training, validation, and testing data, respectively, which makes the 1D-CNN-LSTM algorithm a promising tool for real-time crack detection [24].
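The layer sequence described above (convolution, pooling, flattening, fully connected) can be illustrated with a minimal forward pass in plain Python. This is only a sketch under stated assumptions: the kernel, the dense-layer weight, and the bias are hypothetical hand-picked values rather than trained parameters, and none of the function names come from the reviewed studies.

```python
# Minimal forward pass through the four CNN layer types named in the text.
# All weights below are hypothetical, hand-picked values; nothing is trained.

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the operation CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(fmap):
    """Set negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [
        [max(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(fmap[0]) - size + 1, size)]
        for r in range(0, len(fmap) - size + 1, size)
    ]

def flatten(fmap):
    """Turn a 2-D feature map into a 1-D feature vector."""
    return [v for row in fmap for v in row]

def dense(vector, weights, bias):
    """Single-output fully connected layer."""
    return sum(v * w for v, w in zip(vector, weights)) + bias

# 5x5 grayscale patch: a dark vertical line (0) on a bright background (1).
patch = [[1, 1, 0, 1, 1] for _ in range(5)]

# Hypothetical kernel that responds when it is centred on the bright column
# next to the dark line.
kernel = [[1, -2, 1], [1, -2, 1], [1, -2, 1]]

features = flatten(max_pool(relu(conv2d(patch, kernel))))
score = dense(features, weights=[0.5], bias=-1.0)
print(features, score)
```

A positive score here would be read as "crack present". In a real CNN, many kernels and dense weights are learned from labeled images instead of being fixed by hand.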
Target detection models can be divided into one-stage and two-stage models. The SSD [25] and YOLO [26] models are one-stage models that treat target detection as a regression problem. The region-based CNN (R-CNN) [27], fast R-CNN [28], faster R-CNN [29], and spatial pyramid pooling (SPP) network [30] are two-stage models. When a two-stage model is trained, the target region detection network is trained after the region proposal network (RPN) has been trained; consequently, two-stage models are highly accurate but slow. A one-stage model instead uses initial anchors to predict classes and locate target regions, which completes the entire detection process end to end without an RPN. One-stage models are therefore fast but comparatively less accurate. Balancing inference speed against accuracy is a key technical issue in target detection. YOLOv4 achieves this balance because it offers both good processing speed and good detection performance [23]. Therefore, this study uses YOLOv4 as the model for identifying cracks.
Some studies [31,32,33] have verified the effectiveness of using image-processing techniques to extract information about cracks from images. Examples include thresholding to convert cracks and their background into black and white pixels, edge detection to extract the outlines of cracks, mathematical morphology to improve the overall shape of cracks in images, and the Canny algorithm to minimize missed detections of crack edges [34,35]. However, photographs are often affected by complex lighting conditions, shadows, and the random shapes and sizes of cracks. Interference from weather conditions and brightness levels also lowers the performance of concrete damage detection models [36,37].
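The thresholding step mentioned above can be sketched in a few lines of plain Python. The example assumes a tiny hypothetical grayscale patch, a hand-picked global threshold of 128, and a roughly vertical crack, so that the longest horizontal run of dark pixels in each row approximates the crack width in pixels. The function names are illustrative only.

```python
def binarize(gray, threshold):
    """Mark pixels darker than the threshold as crack (1), others as background (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def row_widths(mask):
    """Longest horizontal run of crack pixels in each row, in pixels."""
    widths = []
    for row in mask:
        best = run = 0
        for v in row:
            run = run + 1 if v else 0  # extend the run or reset it
            best = max(best, run)
        widths.append(best)
    return widths

# Hypothetical 3x6 grayscale patch: dark values (about 50-70) are the crack.
gray = [
    [200, 198, 60, 55, 201, 199],
    [202, 70, 58, 52, 197, 200],
    [199, 201, 62, 203, 198, 202],
]

mask = binarize(gray, threshold=128)
print(row_widths(mask))  # prints [2, 3, 1]
```

A single global threshold like this is exactly what breaks down under the uneven lighting and shadows described above, which is why locally adaptive methods are usually preferred for field images.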
Popular approaches for the quantitative analysis of crack width combine deep-learning and image-processing techniques. Kim et al. detected cracks on a bridge’s surface by using an R-CNN and then employed planar markers to quantitatively analyze the cracks; the crack width measurements were found to be precise down to 0.53 mm [38]. Park et al. employed the YOLOv3 model and structured light to identify and quantify crack features. To prevent the data from being affected by laser beam installation and manufacturing errors, the position of the laser beam was calculated and calibrated using a laser distance sensor. The thinnest crack that could be measured had a width of 0.91 mm [39]. Kim et al. (2019) proposed using a Mask R-CNN to detect cracks and morphological operations to quantify crack width. Their results indicated that cracks wider than 0.3 mm could be measured successfully, with errors smaller than 0.1 mm; however, the error was larger for cracks narrower than 0.3 mm [40].
The following shortcomings were identified in the reviewed studies: (1) the accessibility of deteriorated components must be considered when placing planar markers, (2) cracks are generally measured as wider than they actually are, and (3) installing a laser detection system on an unmanned aerial vehicle (UAV) requires additional system calibration. In the present study, a model for effectively identifying cracks under uneven lighting and against complex component backgrounds was trained using YOLOv4. Because most concrete bridges either span a river or carry an elevated expressway, planar markers cannot be placed on them easily. In these situations, a total station can be employed to measure the coordinates of features on the concrete surface, and these coordinates can then be used to calculate spatial distances as an alternative to planar markers. Research has verified the effectiveness of this approach when markers cannot be placed on a bridge. Furthermore, the proposed approach does not require the UAV to be modified or to carry a ranging system, and it can measure crack widths smaller than 0.22 mm.
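The idea of replacing a planar marker with total-station coordinates can be sketched as a simple scale computation: the surveyed distance between two surface features divided by the distance between the same features in the image gives a millimetre-per-pixel factor. The sketch below is an illustration under stated assumptions, not the procedure used in this study: all coordinates are hypothetical, the image plane is assumed to be roughly parallel to the concrete surface, and the scale is assumed to be uniform across the image.

```python
import math

def scale_mm_per_pixel(p1_xyz_m, p2_xyz_m, p1_px, p2_px):
    """Ratio of surveyed distance (mm) to image distance (pixels) between two features."""
    dist_mm = math.dist(p1_xyz_m, p2_xyz_m) * 1000.0  # metres to millimetres
    dist_px = math.dist(p1_px, p2_px)
    return dist_mm / dist_px

# Hypothetical total-station coordinates (metres) of two surface features.
a_xyz, b_xyz = (10.000, 5.000, 2.000), (10.300, 5.400, 2.000)

# Hypothetical pixel coordinates of the same two features in the UAV image.
a_px, b_px = (100, 200), (1600, 2200)

scale = scale_mm_per_pixel(a_xyz, b_xyz, a_px, b_px)

# Convert a crack width measured in the image (3 pixels, hypothetical) to millimetres.
crack_width_px = 3
crack_width_mm = crack_width_px * scale
print(round(scale, 4), round(crack_width_mm, 4))
```

With these made-up numbers the two features are about 500 mm apart on the surface and 2500 pixels apart in the image, so each pixel corresponds to roughly 0.2 mm.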
4. Conclusions
Using transfer learning, an object detection model (the YOLOv4 deep learning model) was trained and found to have an accuracy of 92%. The model performed favorably in identifying cracks in images with uneven lighting and complex backgrounds, indicating that it has good crack detection accuracy.
The results showed that the overall crack measurement accuracy was better than 0.22 mm. The measurement performance of the two edge detection methods was similar. However, the Canny edge detector produces different crack edges when given different thresholds, which resulted in a larger difference between the measured and actual crack widths. By contrast, the morphological edge detector does not require thresholds and hence can produce crack edges close to the truth.
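To illustrate why a morphological edge detector needs no threshold parameters, the sketch below computes an internal morphological gradient: the binary crack mask minus its 3x3 erosion. This is one common form of morphological edge detection and is an assumption here; the exact operator and structuring element used in this study are not restated in this section. The input is assumed to be an already binarized mask, and the function names are illustrative.

```python
def erode(mask):
    """3x3 erosion: a pixel stays 1 only if it and all 8 neighbours are 1.
    Pixels outside the image are treated as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ))
    return out

def morphological_edge(mask):
    """Internal gradient: pixels of the mask that are removed by erosion."""
    eroded = erode(mask)
    return [[m - e for m, e in zip(mrow, erow)]
            for mrow, erow in zip(mask, eroded)]

# Hypothetical binary mask: a 3x3 blob of crack pixels in a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

edge = morphological_edge(mask)
for row in edge:
    print(row)
```

The only choice in this step is the structuring element (3x3 here), so the edge location does not shift with a tunable threshold the way a Canny result does.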
This study also compared the conversion precision of the two scale methods, planar markers and total station measurement. The results showed that the difference between the two was only 0.005 mm. For viaducts and river crossings, personnel cannot approach the bridge to affix a planar marker next to a crack. The total station measurement method proposed in this study can achieve the same crack width measurement accuracy as the planar marker method. Hence, the proposed method removes the limitations of planar marker methods and provides a more convenient operating procedure for measuring bridge crack size.
Subsequent studies could improve this method in the following directions: (1) Sauvola’s local thresholding method, adopted in this study to convert grayscale images to binary images, could be tested on images with different backgrounds or environmental conditions to find the corresponding optimal threshold values. (2) The trained model could be installed in an embedded system that is integrated into the UAV body to realize real-time detection and measurement of bridge cracks. (3) More images of bridge defects could be collected to extend the dataset and further improve the accuracy of the detection method proposed in this study.
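For readers unfamiliar with Sauvola's local thresholding, the standard formula computes a per-pixel threshold T = m * (1 + k * (s / R - 1)), where m and s are the mean and standard deviation of the gray values in a window around the pixel. The plain-Python sketch below is a minimal illustration under stated assumptions: the window size, k = 0.2, and R = 128 are hypothetical tuning values (this section does not report the values used in the study), and windows are simply clipped at the image border.

```python
import math

def sauvola_threshold(gray, window=3, k=0.2, R=128.0):
    """Per-pixel Sauvola threshold T = m * (1 + k * (s / R - 1)).
    m and s are the local mean and standard deviation; windows are clipped at borders."""
    half = window // 2
    h, w = len(gray), len(gray[0])
    thresholds = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [gray[i][j]
                    for i in range(max(0, r - half), min(h, r + half + 1))
                    for j in range(max(0, c - half), min(w, c + half + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            thresholds[r][c] = mean * (1.0 + k * (std / R - 1.0))
    return thresholds

def sauvola_binarize(gray, **params):
    """Mark pixels darker than their local threshold as crack (1)."""
    t = sauvola_threshold(gray, **params)
    return [[1 if gray[r][c] < t[r][c] else 0 for c in range(len(gray[0]))]
            for r in range(len(gray))]

# Hypothetical 3x3 patch: one dark crack pixel on a bright background.
gray = [
    [200, 200, 200],
    [200, 50, 200],
    [200, 200, 200],
]

for row in sauvola_binarize(gray):
    print(row)
```

Because the threshold follows the local mean, a uniformly bright or uniformly shaded region yields no crack pixels, while a locally dark pixel is still picked out. Searching over `window` and `k` for different backgrounds is the kind of tuning suggested in direction (1).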