Review

A Review of Computer Vision-Based Crack Detection Methods in Civil Infrastructure: Progress and Challenges

1 College of Civil Engineering, Nanjing Forestry University, Nanjing 210037, China
2 School of Foreign Languages, Nanjing Forestry University, Nanjing 210037, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 2910; https://doi.org/10.3390/rs16162910
Submission received: 14 July 2024 / Revised: 6 August 2024 / Accepted: 6 August 2024 / Published: 9 August 2024
(This article belongs to the Special Issue Remote Sensing Applications for Infrastructures)

Abstract

Cracks are a common defect in civil infrastructures, and their occurrence is often closely related to structural loading conditions, material properties, design and construction quality, and other factors. Detecting and analyzing cracks in civil infrastructures can therefore effectively determine the extent of damage, which is crucial for safe operation. In this paper, Web of Science (WOS) and Google Scholar were used as literature search tools, with “crack”, “civil infrastructure”, and “computer vision” selected as search terms. The search returned 325 relevant documents published between 2020 and 2024. These documents were then screened against frequently occurring keywords, and 120 were selected for analysis. Based on the main research methods of these 120 documents, we classify them into three categories of crack detection methods: fusion of traditional methods and deep learning, multimodal data fusion, and semantic image understanding. We examine the application characteristics of each method in crack detection and discuss their advantages, challenges, and future development trends.


1. Introduction

Civil infrastructure is a critical component of the long-term stable development of an economy, and regular inspection and maintenance are essential to prevent major accidents. Traditionally, manual visual inspection has been used to assess the condition of infrastructure during operation [1,2]. However, this method is time-consuming, subjective [3], and potentially hazardous. The current trend in civil infrastructure inspection involves integrating state-of-the-art computer vision technologies with traditional methods. This integration has led to numerous studies focused on intelligent detection techniques aimed at enhancing both robustness and efficiency [4,5]. Automatic crack detection is a significant area within intelligent inspection of civil infrastructure [6,7]. Due to the inherent properties of construction materials, most infrastructures develop cracks during their service life [8]. In some cases, especially under harsh conditions and increasing loads, the formation and propagation of cracks can be rapid, leading to a shortened lifespan and potentially severe safety incidents [9,10]. Moreover, cracks occur in large quantities and in numerous types, categorized by factors such as cause, size, and shape, and different types of cracks have varying impacts on structural safety [6]. Therefore, employing traditional manual methods to inspect these cracks individually poses significant challenges in terms of efficiency and cost. Accurately identifying, among this vast number of cracks, those that may affect the safety performance of structures is currently one of the most critical issues in the field.
To solve this problem, researchers are actively exploring intelligent crack identification techniques based on computer vision. By building a large database of crack images and combining image processing and pattern recognition techniques, algorithmic models can be trained to automatically identify different types of cracks. Common crack detection algorithm models include the following: (1) Crack detection by traditional image processing methods mainly relies on techniques such as edge detection, threshold segmentation, and morphological operations to identify cracks. Commonly used algorithms include Canny edge detection [11], the Otsu technique [12], the Hough transform [13], the Gray-Level Co-occurrence Matrix (GLCM) based on texture features [14], the wavelet transform [15], etc. (2) Machine learning-based crack detection uses machine learning algorithms to learn crack features from training data and classify test images to detect cracks. Commonly used algorithms include support vector machines (SVMs) [14,16], Random Forest [17,18], Decision Trees [19], etc. (3) Deep learning-based crack detection builds a convolutional neural network (CNN) model [20] to realize end-to-end crack detection. Frequently used models include U-Net [21,22], FCN [23,24], SegNet [25], etc. Azouz et al. [26] reviewed image processing techniques and vision-based machine learning algorithms and summarized crack detection and analysis techniques for monitoring geometric changes in building structures. Hsieh et al. [7] reviewed 68 machine learning (ML)-based crack detection methods. Hamishebahar et al. [27] reviewed 61 crack detection methods based on deep learning and found that semantic segmentation is one of the popular methods in the field of crack detection in recent years. In addition, advanced technologies such as drones [28,29] and 3D laser scanning [30] can be used to obtain three-dimensional information about the infrastructure surface and, combined with computer vision algorithms, realize automatic identification and quantitative analysis of cracks, greatly improving the efficiency and accuracy of identification. Satellite technologies also play a crucial role in monitoring cracks in civil infrastructure. Key technologies include synthetic aperture radar interferometry [31] (e.g., MT-InSAR [32]), high-resolution optical imaging (e.g., HSR bitemporal satellite images [33]), multispectral and hyperspectral imaging [34], and thermal infrared imaging [35]. These technologies effectively identify and monitor cracks and structural deterioration by providing high-resolution imagery, three-dimensional modeling, and surface temperature data. Combined with change detection techniques and time-series analysis, they significantly enhance the efficiency and accuracy of infrastructure monitoring. Effective crack detection usually involves overcoming problems related to shadows, light, and illumination. To improve the visibility of cracks in images, shadow removal, light compensation, and illumination adjustment techniques are needed. For example, illumination enhancement techniques such as Retinex [36] or adaptive thresholding [37] can improve image quality and ensure accurate crack detection even when lighting conditions change. Deep learning-based algorithms can continuously improve their recognition ability, gradually increasing the accuracy and robustness of identifying various types of cracks.
Intelligent crack detection technology can effectively solve the problems of recognition efficiency and cost caused by traditional manual labor, resulting in a more reliable guarantee for the safe operation of civil infrastructure.
In recent years, the field of crack detection technology based on computer vision has witnessed a remarkable advancement. A number of researchers have employed feature-based deep learning methods and traditional image processing techniques for the extraction and analysis of cracks [6,7,38]. However, in real-world scenarios, cracks typically exist under complex and variable environmental conditions, resulting in detection data that often contain substantial noise, which can severely impact the accuracy of the detection results [39]. Consequently, the advancement of high-precision and highly robust crack detection methodologies has become a shared objective within both the scientific and engineering communities, with the aim of achieving accurate crack detection for civil infrastructure in a variety of scenarios [27]. Researchers in the field of civil engineering employ the distinctive attributes of their discipline to facilitate interdisciplinary, cross-industry, and cross-domain collaboration, aiming to establish an artificial intelligence theoretical framework applicable to this field [40]. They have successfully integrated deep learning-based crack detection algorithms with modern technologies, including advanced sensing equipment, high-definition imaging systems [41], and lightweight robotic platforms [42]. This integration has enabled intelligent crack detection, providing effective decision support for the maintenance and repair of civil infrastructure and advancing the field towards greater levels of automation [43,44]. At present, some researchers are improving deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) [45,46] to enhance the accuracy and robustness of crack detection. In addition, there are also methods based on weakly supervised learning [47,48] and transfer learning [49,50,51] in the field of crack detection. These methods train models using a small number of labeled samples, thereby reducing the workload associated with data annotation. In summary, the development of high-precision and highly robust crack detection methods represents a critical research direction in the field of computer vision. In the future, deep learning-based crack detection technology will continue to evolve and improve, playing an increasingly important role in engineering practice.
For this paper, we utilized the Web of Science (WOS) and Google Scholar as literature search tools, using “crack”, “civil infrastructure”, and “computer vision” as search terms, and obtained 325 relevant papers. Figure 1 shows the distribution of relevant studies by year.
A keyword search was then conducted on the retrieved literature to identify frequent keywords related to the aforementioned terms. The keywords identified were ‘deep learning’, ‘crack detection’, ‘feature extraction’, ‘CNN’, and ‘image processing’, as shown in Figure 2.
To maximize the retrieval of papers relevant to the research, these keywords were combined with the basic set of search terms to comprehensively utilize the databases; a total of 120 papers from the period 2020–2024 were ultimately selected for analysis and summarization. Based on recent research and technology development trends in the field of crack detection in civil engineering infrastructure, this paper proposes a comprehensive classification framework that groups crack detection methods into three categories: the combination of traditional methods and deep learning, multimodal data fusion, and semantic image understanding, and reflects the characteristics and development trends of how these methods have been applied to crack detection in recent years. Next, we summarize the datasets and evaluation metrics applied to crack detection. Finally, the advantages and limitations of existing methods are summarized, and the development trends of intelligent crack detection in civil engineering infrastructure are analyzed and discussed.
The comprehensive classification framework proposed in this paper not only covers a combination of traditional and modern technologies but also integrates various data processing approaches, providing a systematic and multifaceted perspective that helps readers more fully understand current technological advances, strengths, and challenges. Compared to studies that focus only on a specific time period or data source, this paper surveys key papers from the past five years to ensure up-to-date and comprehensive coverage.

2. Crack Detection Combining Traditional Image Processing Methods and Deep Learning

In image processing, feature extraction is used to extract representative image information and distinguishing features from raw data for further analysis and processing. Traditional crack detection methods mainly rely on hand-crafted features obtained through edge detection, thresholding, and morphological operations. These methods can effectively detect cracks under certain conditions. However, they often lack robustness and accuracy when faced with complex backgrounds, changing lighting conditions, and noise interference.
With the rapid development of artificial intelligence, methods represented by deep learning have made a breakthrough in the field of computer vision. Convolutional neural networks (CNNs) have become a central algorithm in deep learning due to their superior performance in image processing and computer vision tasks. Compared to traditional methods, deep learning models not only automatically learn and extract multi-layered complex features from data but are also better able to adapt to lighting variations, noise interference, and complex backgrounds. Deep learning models can be trained and optimized end-to-end directly from the input image to the output result. This approach simplifies the processing workflow, reduces the accumulation of errors in intermediate stages, and improves overall detection efficiency and accuracy. Deep learning models also provide powerful generalization capabilities that can be adapted to a variety of application scenarios as well as training data.
By integrating traditional methods and deep learning for crack detection, we can fully utilize the advantages of both approaches to improve crack detection performance. Traditional methods have rich experience and effectiveness in data preprocessing and feature extraction, while deep learning models excel at learning complex representations of crack features to realize more accurate crack detection.

2.1. Crack Detection Based on Image Edge Detection and Deep Learning

Cracks exhibit distinctive edge features that can be identified through the application of edge detection techniques. Typically, there is a significant change in grayscale value between the areas on either side of the crack edge and the edge pixels themselves. Edge detection algorithms exploit this grayscale gradient information through differential operators to accurately localize crack edges, enabling effective crack identification [52,53]. For example, the gradient for edge detection is calculated using Equation (1):
$$\text{Gradient magnitude} = \sqrt{(I * G_x)^2 + (I * G_y)^2} \tag{1}$$
where $I$ represents the input image, and $G_x$ and $G_y$ are the gradient operators in the horizontal and vertical directions, respectively. This formula calculates the strength of gradients in various directions to detect edges in the image. The most common edge detection operators include gradient operators (first-order differential operators), second-order differential operators, and operators used in Canny edge detection algorithms. First-order differential operators include the Sobel operator, the Prewitt operator, and the Roberts operator. For example, first-order derivative operators (e.g., Sobel and Prewitt operators) define contours by computing local changes in image intensity, as shown in Equation (2):
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \tag{2}$$
Equation (2) shows the computation of the gradient of the image by these convolution kernels for edge detection. Classical second-order derivative operators (e.g., Laplace operator [54]) compute the image change using Equation (3):
$$\text{Laplacian} = \frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2} \tag{3}$$
Equation (3) detects edges and textural changes in the image by computing second-order derivatives, which enhance the response at edges. It is evident that different edge detection operators are suitable for detecting different types of crack edges. Although these operators are relatively simple and fast to compute, a single operator is often unable to meet the requirements of crack segmentation in complex backgrounds. Furthermore, the detection capabilities of these operators are constrained in the presence of substantial noise. Consequently, a considerable number of researchers have recently employed deep learning techniques to integrate and enhance edge detection, thereby further improving the accuracy and robustness of crack detection. Table 1 provides a summary of crack detection methods based on edge detection and deep learning.
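As a concrete illustration of these operators, the following sketch computes the Sobel gradient magnitude of Equation (1), the Laplacian response of Equation (3), and a Canny edge map, as is commonly done when edge maps are used to preprocess crack images. It assumes OpenCV and NumPy are available; the file name "crack.png" is a placeholder.

```python
# Minimal sketch (not from the reviewed papers): first-order, second-order, and Canny
# edge responses of a crack image, corresponding to Equations (1)-(3).
import cv2
import numpy as np

img = cv2.imread("crack.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
img = cv2.GaussianBlur(img, (5, 5), 0)                 # suppress high-frequency noise first

# First-order (Sobel) responses, Equations (1) and (2)
gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)         # horizontal gradient I * Gx
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)         # vertical gradient I * Gy
grad_mag = np.sqrt(gx ** 2 + gy ** 2)                  # gradient magnitude

# Second-order (Laplacian) response, Equation (3)
laplacian = cv2.Laplacian(img, cv2.CV_64F)

# Canny combines smoothing, gradients, non-maximum suppression, and hysteresis thresholding
canny_edges = cv2.Canny(img, threshold1=50, threshold2=150)

cv2.imwrite("grad_mag.png", cv2.convertScaleAbs(grad_mag))
cv2.imwrite("laplacian.png", cv2.convertScaleAbs(laplacian))
cv2.imwrite("canny.png", canny_edges)
```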
Kao et al. [55] first applied YOLOv4 for crack detection and subsequently enhanced the quality of the resulting images by integrating Canny edge detection, yielding a notable improvement in the accuracy of the subsequent detection process. Building upon the GM-ResNet crack detection model, Li et al. [56] extracted additional quantitative information about cracks using Canny edge detection. Choi et al. [57] first enhanced crack feature information using the Sobel algorithm and then achieved high-precision, high-efficiency crack detection with the ResNet50 neural network. Guo et al. [58] employed BARNet for feature learning and prediction, subsequently integrating it with the Sobel algorithm to achieve precise crack detection and localization. Luo et al. [59] integrated the Canny edge detection results with the low-level feature layers of the DeepLabV3+ network. This enhanced the positional and detailed information of road cracks, compensating for the loss of details when merging high-level and low-level feature layers, thereby achieving high-precision crack detection.
Edge detection can effectively extract crack edge information and reduce the interference of background noise. Therefore, edge detection is commonly used as a preprocessing step. Its main purpose is to simplify the input data and reduce the complexity and noise interference by first extracting the edge features from the image. This can improve the learning efficiency and detection efficiency of deep learning models. By reducing the processing load of irrelevant areas and reducing the consumption of computational resources, the model can quickly focus on areas that may contain cracks. This improves the final detection accuracy and efficiency. Edge detection can also be used as a postprocessing step to optimize and improve the output of a deep learning model. In this scenario, edge detection is employed to further extract and refine the edge information of cracks to improve crack detection and localization results. Deep learning models have strong generalization ability and can adapt to different types and shapes of cracks. Combining edge detection with deep learning models not only improves detection accuracy but also allows detailed features such as crack length, width, and shape to be extracted. This integration can improve the accuracy of crack risk assessment and maintenance decision-making.
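One simple way to realize the edge-plus-deep-learning combination described above is to stack the edge map with the raw image as an additional input channel of a CNN. The sketch below is illustrative only; the network, its hyperparameters, and the file name are assumptions, not taken from any of the cited works.

```python
# Minimal sketch: a Canny edge map stacked with the grayscale image as a two-channel
# input to a small crack/non-crack classifier (illustrative architecture).
import cv2
import numpy as np
import torch
import torch.nn as nn

def image_with_edge_channel(path: str) -> torch.Tensor:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    stacked = np.stack([gray, edges]).astype(np.float32) / 255.0  # (2, H, W)
    return torch.from_numpy(stacked).unsqueeze(0)                 # (1, 2, H, W)

class EdgeAwareCrackClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # crack / non-crack logits

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = EdgeAwareCrackClassifier()
logits = model(image_with_edge_channel("crack.png"))  # placeholder file name
```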
However, incorporating edge detection into the preprocessing stage increases the complexity of the detection pipeline and requires additional computational resources and time. The algorithm itself also requires appropriate parameter settings, which greatly affect the efficiency of preprocessing. Moreover, edge detection results depend on the quality of the image and the performance of the algorithm: if the noise level in the image is high or the algorithm is not well tuned, the detection accuracy of subsequent deep learning models may be adversely affected. Deep learning models typically require large amounts of labeled data, placing high demands on data quality and resources, and inadequate data quality can degrade model performance. The key to efficient crack detection lies in reasonably leveraging the advantages of edge detection and deep learning while overcoming the shortcomings of each.

2.2. Crack Detection Based on Threshold Segmentation and Deep Learning

Threshold segmentation is a method that divides an image into several classes on the basis of one or more threshold values in order to separate features of interest from background pixels. The basic principle of threshold segmentation can be expressed in Equation (4) as follows:
$$I_{seg}(x, y) = \begin{cases} 1, & \text{if } I(x, y) > T \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
where $I(x, y)$ represents the pixel intensity at position $(x, y)$ in the image, $T$ is the threshold value, and $I_{seg}(x, y)$ is the segmented image. This formula indicates that pixels with intensity greater than the threshold $T$ are classified as foreground (1), while those with lower intensity are classified as background (0). Low computational cost, fast processing, stable performance, and ease of implementation make this the most basic and commonly used method for detecting crack images in concrete structures. Common threshold-based segmentation methods include the Otsu method [12], adaptive thresholding [64], multi-level thresholding [65], histogram-based thresholding [66], and fuzzy thresholding [67]. The Otsu method determines the optimal threshold $T$ by maximizing the between-class variance of the segmented regions, as shown in Equation (5):
$$\sigma_B^2 = \frac{N_1 N_2}{N^2} (\mu_1 - \mu_2)^2 \tag{5}$$
where $N_1$ and $N_2$ are the numbers of pixels in the two classes, $N$ is the total number of pixels, and $\mu_1$ and $\mu_2$ are the average pixel intensities of the two classes. The threshold $T$ is chosen by maximizing $\sigma_B^2$ for optimum foreground and background separation. The adaptive thresholding method adjusts the threshold according to the local characteristics of the image and is usually calculated by Equation (6):
$$T(x, y) = \operatorname{mean}\big(I(x, y)\big) - C \tag{6}$$
where $C$ is a constant subtracted from the average intensity in the local neighborhood of $(x, y)$. The method handles different illumination conditions by adjusting the threshold according to local image statistics. Thresholding segmentation methods are suitable for images with a constant background gray level, uniform illumination, and high contrast. However, if illumination is irregular or the background contains noise, a single-threshold segmentation method often fails to deliver optimal results. In crack detection, the limitations of single-threshold segmentation methods become more pronounced, especially when dealing with complex images. This is why various improved and integrated threshold segmentation methods are continually emerging. The combination of threshold segmentation and deep learning is currently a major trend in image processing and computer vision. Table 2 summarizes crack detection methods based on threshold segmentation and deep learning.
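For illustration, the following sketch applies Equations (4)-(6) with OpenCV: a global Otsu threshold and a local mean (adaptive) threshold. The inverted binary mode is used because cracks are typically darker than the surrounding surface; the file name and parameter values are placeholders.

```python
# Minimal sketch: global Otsu thresholding (Eq. 5) and adaptive mean thresholding (Eq. 6)
# applied to a crack image, producing binary crack masks as in Equation (4).
import cv2

img = cv2.imread("crack.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
img = cv2.medianBlur(img, 5)                           # reduce speckle noise

# Otsu: threshold T chosen automatically by maximizing between-class variance
T, otsu_mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Adaptive mean thresholding: T(x, y) = local mean - C
adaptive_mask = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV,
    blockSize=31,   # size of the local neighborhood
    C=10,           # constant subtracted from the local mean
)

print(f"Otsu threshold selected: {T:.1f}")
cv2.imwrite("otsu_mask.png", otsu_mask)
cv2.imwrite("adaptive_mask.png", adaptive_mask)
```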
Flah et al. [68] proposed a model for defect detection that integrates convolutional neural networks and advanced Otsu image processing techniques. The model is applicable to the detection of inaccessible areas of concrete structures and enables the classification, localization, segmentation, and quantification of damage in cracked structures. Mazni et al. [69] combined transfer learning and Otsu thresholding to monitor the structural condition of concrete surfaces. He et al. [70] first preprocessed images using the Otsu algorithm and then classified them using the YOLOv7 network to accurately classify crack repair traces (CRTs) or secondary cracks (SCs). Zhang et al. [37] used an adaptive thresholding method to remove the background of the image and highlight the cracked areas; weakly supervised learning models (WSISs) were then used to identify cracks and achieve higher detection accuracy. This approach effectively improved detection performance by reducing background noise and increasing the sensitivity of the model to crack features. He et al. [71] applied adaptive thresholding to segment crack images generated with the U-GAT-IT model and then re-detected the cracks with the model; the proposed weakly supervised method showed excellent performance in crack detection.
When deep learning models are used for crack detection, threshold segmentation is usually used as a preprocessing step. The reason for this is that threshold segmentation effectively filters out background noise, separating regions of possible cracks from the background and retaining only those regions where cracks are likely to be present. By simplifying the input data and reducing the computational burden on the deep learning model, threshold segmentation allows the model to further learn complex features on the preprocessed image and adapt to cracks of different shapes and sizes. Furthermore, by extracting crack features from the image first through threshold segmentation, it provides more distinct and targeted key features for subsequent models. This improves the robustness of the deep learning model under different scenarios and conditions. Furthermore, threshold segmentation can be applied in the postprocessing step. After the deep learning model outputs initial crack detection results, threshold segmentation can be used to further refine these results and improve the clarity and continuity of crack edges. Threshold segmentation techniques can also be used to further filter the detection results output by the deep learning model to eliminate potentially false-positive areas and improve detection accuracy.
Thresholding has drawbacks both as a preprocessing step and as a postprocessing step in crack detection. Performing thresholding as preprocessing can lead to information loss: fixed thresholds affect the input quality of deep learning models because they are difficult to adapt to changing image conditions and cannot effectively handle complex textures. Thresholding as a postprocessing step can oversimplify the model output and lose detail; it also relies on parameter tuning and is susceptible to noise, which can easily be mistaken for cracks. These issues can reduce the accuracy and robustness of the final detection results. Despite its simplicity and ease of implementation, threshold segmentation often performs suboptimally in complex crack detection tasks due to these inherent limitations.
Incorporating a threshold segmentation branch into a deep learning model allows for more flexible and efficient crack detection and alleviates some of the aforementioned problems. This branch can act as an additional module in the network, improving the detection performance and robustness of the model by processing intermediate features with thresholding techniques. By incorporating the threshold segmentation branch before the feature extraction stage of the deep learning model, more image details can be preserved, and the model can further process and optimize these details in subsequent layers to reduce information loss. Adaptive thresholding methods can be incorporated to address the non-adaptive nature of fixed thresholds: by leveraging deep learning models to automatically learn and adjust threshold parameters, models can adapt to different image and lighting conditions and thus improve robustness. To address the issue of oversimplified results, feature maps generated by the deep learning model can be integrated with the initial results of threshold segmentation, allowing the model to refine and optimize results in the early stages; subsequent network layers can then further improve the fine detail of crack edges and increase detection accuracy. The practical effectiveness of these approaches depends on the specific design of the model and the quality of the training data. To achieve the best detection performance, the model structure must be continuously tuned and optimized in real-world applications, an iterative process that includes fine-tuning model parameters, experimenting with different architectures, and ensuring that the training data accurately represent the target problem.
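One possible realization of such a threshold branch is sketched below: a small sub-network predicts a per-image threshold and applies it softly (differentiably), and the resulting mask is concatenated with the image before the segmentation backbone. This is an illustrative design under those assumptions, not an architecture taken from the cited papers.

```python
# Minimal sketch of a learnable, adaptive threshold branch inside a segmentation network.
import torch
import torch.nn as nn

class SoftThresholdBranch(nn.Module):
    """Predicts a threshold from global image statistics and applies it softly."""
    def __init__(self, steepness: float = 10.0):
        super().__init__()
        self.predictor = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                       nn.Linear(1, 8), nn.ReLU(),
                                       nn.Linear(8, 1), nn.Sigmoid())
        self.steepness = steepness

    def forward(self, gray: torch.Tensor) -> torch.Tensor:
        t = self.predictor(gray).view(-1, 1, 1, 1)          # per-image threshold in [0, 1]
        return torch.sigmoid(self.steepness * (t - gray))   # soft "darker than T" mask

class ThresholdAidedSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.threshold_branch = SoftThresholdBranch()
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),                            # per-pixel crack logit
        )

    def forward(self, gray: torch.Tensor) -> torch.Tensor:
        mask = self.threshold_branch(gray)
        return self.backbone(torch.cat([gray, mask], dim=1))

model = ThresholdAidedSegNet()
logits = model(torch.rand(1, 1, 256, 256))                  # dummy grayscale batch
```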

2.3. Crack Detection Based on Morphological Operations and Deep Learning

The crack detection method based on morphological operations [76] involves image processing operations such as erosion, dilation, opening, and closing. These operations effectively remove non-crack edge information and refine the edges of cracks. Subsequently, the features extracted using these morphological operations facilitate crack identification. By combining morphological operations with deep learning, it is possible to detect cracks in complex concrete surface images more effectively, leveraging attributes such as crack shape, connectivity, and curvature. This integration significantly enhances detection accuracy and robustness. Table 3 summarizes the crack detection methods reviewed in this section, highlighting the use of morphological operations and deep learning.
Huang et al. [77] proposed integrating morphological closing operations into the Mask R-CNN model for crack detection in images, which can effectively detect cracks and segment instances. Fan et al. [78] proposed a parallel ResNet for crack recognition and then used mathematical morphology to extract the crack skeleton and obtain information about the length, width, and area of the crack. Kong et al. [79] proposed a dual-scale CNN for crack detection, morphological operations for crack measurement, and shape context for crack monitoring, likewise using morphological operations to extract the crack skeleton and derive its length, width, and area. Dang et al. [80] proposed the TunnelURes framework for segmentation based on basic crack features to realize automatic evaluation of cracks and quantitative growth monitoring. Andrushia et al. [81] proposed a pixel-level thermal image crack detection method based on a U-Net architecture with an encoder–decoder framework. To extract the shape information of the crack and accurately quantify the damage, morphological operations and an improved distance transformation method were used to quantify the crack and calculate its width at the pixel level.
In the crack detection process, the integration of morphological operations and deep learning models can improve detection effectiveness at several levels. Specifically, this integrated approach can be categorized into three aspects. In the preprocessing stage, morphological operations such as erosion and dilation are applied before the image is fed into the deep learning model. These operations significantly reduce noise and emphasize crack features, allowing the deep learning model to focus more on the crack region. This preprocessing effectively improves the learning efficiency and detection performance of the model. In the postprocessing stage, after obtaining the output of the deep learning model, morphological operations such as closing are applied to optimize the detection results. These operations fill gaps along the crack edges and smooth the crack contours to improve the accuracy and precision of the detection results, ensuring the continuity and integrity of the final output. Hybrid models incorporate morphological operations into the structure of the deep learning model itself, for example between layers or at specific points in the network. This approach integrates the morphological processing step inside the deep learning model and leverages the strengths of both technologies to realize more accurate crack detection. Hybrid models not only improve detection accuracy but also enhance the model’s adaptability to a variety of complex crack morphologies.
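The postprocessing route can be illustrated with a short sketch, assuming OpenCV and scikit-image and a placeholder mask file: morphological closing and opening clean a predicted crack mask, and skeletonization yields a length estimate from which a rough mean width follows.

```python
# Minimal sketch: morphological closing fills small gaps in a predicted crack mask,
# opening removes specks, and skeletonization gives crack length and mean width in pixels
# (mean width ~ crack area / skeleton length).
import cv2
import numpy as np
from skimage.morphology import skeletonize

mask = cv2.imread("pred_mask.png", cv2.IMREAD_GRAYSCALE)   # placeholder: model output mask
binary = (mask > 127).astype(np.uint8) * 255

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # fill gaps along the crack
opened = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)   # remove isolated specks

skeleton = skeletonize(opened > 0)            # one-pixel-wide crack centerline
crack_area_px = int((opened > 0).sum())
crack_length_px = int(skeleton.sum())
mean_width_px = crack_area_px / max(crack_length_px, 1)

print(f"length ~ {crack_length_px} px, mean width ~ {mean_width_px:.2f} px")
```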

3. Crack Detection Based on Multimodal Data Fusion

Crack detection is critical in civil engineering, directly affecting the safety and stability of buildings and infrastructure. However, methods based on a single data source face challenges such as sensitivity to illumination, viewing angle, and noise, as well as limitations in capturing crack morphology and depth. To address these issues, multimodal data fusion techniques have emerged as a significant research direction. These techniques integrate information from different sensors or data sources, providing more comprehensive and precise crack detection. This approach enhances detection accuracy, robustness, and reliability. This chapter explores the current application status of multimodal data fusion in crack detection for civil infrastructure inspection, focusing on two key areas: multi-sensor fusion and multisource data fusion.

3.1. Multi-Sensor Fusion

Multi-sensor fusion enhances the adaptability of crack detection systems to environmental changes and interferences by integrating information from various sensor types. By combining data from optical sensors, laser sensors, thermal imaging sensors, and others, it provides more comprehensive and precise detection results. This approach increases information sharing among different sensors and enhances the overall reliability and robustness of the system. Table 4 illustrates the specific application methods of various sensors in multi-sensor fusion, highlighting the advantages and disadvantages of each sensor type.
Jian et al. [85] combined digital photography with LiDAR technology and used YOLOv5s to train and detect cracks in the collected images. This integration achieved temporal and spatial synchronization of multiple sensors, ensuring accurate and efficient crack detection under large-scale data conditions. Liang et al. [86] fused features from optical and infrared sensors to capture images and employed C-Net for pixel-level detection of cracks. The thermal features provided by infrared images compensated for the limitations of visible light images under low-light conditions or when obscured, thereby enhancing the accuracy of road crack detection. Alamdari et al. [87] first used CNNs to localize cracks in images and then combined the geometric shapes and dimensions of cracks obtained from laser scanners with the surrounding environmental information acquired by LiDAR using point cloud registration techniques. This approach enabled comprehensive, accurate, and detailed defect detection. Liu et al. [88] proposed a multifunctional sensor for crack detection by combining optical and capacitive sensors. The optical function of the sensor localizes cracks, while the capacitive function triggers alerts. Through image processing and capacitive measurements, this sensor efficiently detects and quantifies cracks. Park et al. [94] utilized visual sensors and two laser sensors to detect cracks on the surface of concrete structures in real time using the YOLO algorithm. They calculated crack dimensions based on the position of the laser beams and enhanced measurement accuracy through distance sensor and laser alignment calibration algorithms. Simulation and experimental validation demonstrated that this system achieves high-precision real-time detection and quantification of cracks.
Different sensors possess varying sensitivities and error characteristics. By integrating diverse types of sensors such as vision, LiDAR, ultrasound, and infrared thermography, researchers can overcome the limitations imposed by environmental noise or interference on a single sensor. This integration enhances resistance to interference and allows for the acquisition of multi-dimensional information about cracks, providing more comprehensive and detailed detection results. The integration of multiple sensors offers redundant information, ensuring that even if one sensor fails or provides inaccurate data, other sensors can still supply valid information. This redundancy ensures the reliability and continuity of the system. Different sensors capture various characteristics of cracks, and by leveraging their complementary nature, a comprehensive capture of the diverse features and changes of cracks can be achieved. For instance, visual sensors provide high-resolution images that aid in the precise localization and measurement of surface cracks, while LiDAR offers three-dimensional structural data. The integration of these two types of sensors is particularly suitable for detecting cracks in complex and large-scale infrastructure. The combination of ultrasonic sensors and infrared thermography is widely used for detecting internal cracks and material defects. Ultrasonic sensors penetrate materials to detect internal cracks, while infrared thermography identifies potential structural damage through temperature variations. Utilizing the inherent characteristics of these sensors enables real-time monitoring of cracks in civil infrastructure, aiding in the timely detection and assessment of crack development. This facilitates prompt maintenance and repair actions to prevent further crack propagation.
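As one simple illustration of multi-sensor fusion at the input level, the sketch below modifies a pretrained ResNet-18 to accept a fourth, thermal channel, assuming co-registered RGB and thermal images; the model choice and channel-initialization strategy are illustrative assumptions, not taken from the cited studies.

```python
# Minimal sketch of early (input-level) fusion of co-registered RGB and thermal images
# for crack classification: the first convolution of a pretrained ResNet-18 is widened
# from 3 to 4 input channels.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_rgbt_crack_classifier(num_classes: int = 2) -> nn.Module:
    model = resnet18(weights="IMAGENET1K_V1")
    old_conv = model.conv1
    new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
    with torch.no_grad():
        new_conv.weight[:, :3] = old_conv.weight                            # reuse RGB filters
        new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)  # init thermal channel
    model.conv1 = new_conv
    model.fc = nn.Linear(model.fc.in_features, num_classes)                 # crack / non-crack head
    return model

model = build_rgbt_crack_classifier()
rgb = torch.rand(1, 3, 224, 224)       # visible-light image (dummy tensor)
thermal = torch.rand(1, 1, 224, 224)   # co-registered thermal image (dummy tensor)
logits = model(torch.cat([rgb, thermal], dim=1))
```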
With the continuous advancement of artificial intelligence and deep learning technologies, the processing and analysis of multi-sensor data have become increasingly intelligent. Deep learning algorithms enable efficient processing of large-scale data, enhancing the identification and analysis of crack morphology and predicting crack development trends. Concurrently, the application of Internet of Things (IoT) technologies empowers crack detection systems with remote monitoring and automated management capabilities. Monitoring data can be transmitted in real-time to central servers for analysis and processing, facilitating real-time monitoring and management of infrastructure crack conditions.
The integration of multiple sensors in crack detection faces several challenges. Firstly, the synchronization and alignment of sensor data require precise time and spatial calibration. Secondly, the complexity of data processing and fusion algorithms increases the difficulty in system design and implementation. Lastly, addressing the high cost and complex deployment processes of sensors is essential.

3.2. Multi-Source Data Fusion

Multi-source data fusion involves integrating information of different types or origins, such as images, sound, and vibration signals, under specific standards so that they can be logically combined in the spatial or temporal domain. This comprehensive analysis aims to accurately describe the condition of cracks, thereby enhancing the performance and accuracy of crack detection systems. By leveraging the strengths of multiple data sources, this approach overcomes the limitations of individual data streams, enhancing the reliability and precision of detection. It provides detailed and comprehensive crack information, supporting more accurate damage assessment and informed maintenance decisions. Figure 3 compares the performance of multiple data sources in crack detection to help select the most suitable data source or integrate several data sources.
Yan et al. [95] combined RGB images with LiDAR data to identify regions of interest and extract depth information, thereby improving the accuracy of crack detection and quantification. This integration enhances the estimation of actual pixel sizes of cracks. Dong et al. [96] fused point cloud data with grayscale images as inputs to the YOLOv5 model, which enhanced the effectiveness of detecting pavement cracks. Kim et al. [97] addressed the challenge of varying camera angles with concrete surfaces by integrating data from RGB-D cameras and high-resolution digital cameras, resulting in precise crack measurements. Chen et al. [98] reduced false alarms in concrete defect detection by integrating geotagged aerial images with Building Information Modeling (BIM) data. Pozzer et al. [99] evaluated the performance of various deep neural network models in detecting defects in concrete structures by combining thermal imaging with visible light images, providing comprehensive insights into the structural integrity of concrete.
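As a simple illustration of how fused depth information supports crack quantification, the following sketch converts a crack width measured in pixels into a physical width using the standard pinhole camera model; the numeric values are placeholders, and the function name is hypothetical.

```python
# Minimal sketch: depth (e.g., from an RGB-D camera or LiDAR) converts a pixel-level
# crack width into a physical width via the pinhole model: width_m = width_px * depth_m / f_px.

def crack_width_mm(width_px: float, depth_m: float, focal_length_px: float) -> float:
    """Approximate physical crack width in millimetres for a roughly fronto-parallel surface."""
    width_m = width_px * depth_m / focal_length_px
    return width_m * 1000.0

# Example: a 4-pixel-wide crack, surface 1.2 m away, focal length 2200 px (illustrative values)
print(f"estimated width ~ {crack_width_mm(4, 1.2, 2200):.2f} mm")
```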
The fusion of multi-source data offers both significant advantages and challenges in the detection of cracks in civil infrastructure. By integrating various data types, such as image data, laser scanning data, and thermal imaging data, multi-source data fusion provides a comprehensive understanding of crack conditions, enhancing the reliability and accuracy of detection. This approach also improves the system’s adaptability to diverse environmental conditions, expands detection coverage, and uncovers hidden cracks, thereby aiding in infrastructure maintenance and management decisions. However, multi-source data fusion faces several challenges. Data processing is complex, requiring solutions to technical issues such as data alignment, fusion, and analysis, which increase system complexity and costs. The differences between various data sources necessitate the optimization of algorithms and models to improve detection efficiency and accuracy. Additionally, multi-source data fusion relies on support from different types of sensors and equipment, thereby increasing hardware and maintenance costs. Therefore, when applying multi-source data fusion for crack detection in civil infrastructure, it is crucial to carefully select the most suitable approach.
In recent years, the continuous advancement of artificial intelligence, deep learning, and big data technologies has driven new trends in the application of multi-source data fusion for crack detection in civil infrastructure. Firstly, the ability to process and analyze multi-source data using deep learning techniques has significantly improved, leading to enhanced accuracy and efficiency in crack detection systems. Secondly, the ongoing innovation and widespread adoption of sensor technologies have expanded data fusion applications across various sensor types, enriching the methods and content of data fusion in crack detection systems. Furthermore, advancements in cloud computing and edge computing technologies have improved the real-time performance and scalability of multi-source data fusion, making it easier to adapt to the monitoring and management needs of large-scale infrastructure. Overall, the development of multi-source data fusion technology in crack detection is moving towards greater intelligence, integration, and real-time capabilities.
Table A1 in Appendix A compares the advantages and disadvantages of traditional methods with those of deep learning and multimodal data fusion.

4. Crack Detection Based on Image Semantic Understanding

Image understanding (IU) involves in-depth analysis and comprehension of images, focusing on semantic understanding. Its primary objective is to recognize and interpret semantic information within images. Typically, IU encompasses three aspects: classification, detection, and segmentation [100,101]. By employing image understanding techniques, deeper insights into image content can be achieved, facilitating a comprehensive understanding of image information. This capability enables efficient and precise crack detection, providing crucial technical support for safety management and maintenance in civil engineering.

4.1. Crack Detection Based on Classification Networks

Convolutional neural networks (CNNs) were initially developed for image classification problems. They are designed to mimic the workings of the human visual system, with multiple layers of convolution and pooling operations progressively extracting features from images. These features are then fed into fully connected layers for classification. The structure of CNNs enables them to automatically learn features from raw image data and adjust weights during training, leading to accurate classification of images across different categories. Zhang et al. [102] pioneered the application of deep learning in crack detection by using CNNs, achieving superior results compared to traditional methods such as support vector machines (SVMs) and boosting.
A classification network can be used to detect the presence of cracks in an image and classify the image as either a positive or negative sample. When given an image as input, this network analyzes its features and makes a classification judgment. If cracks are identified, the network labels the image as a positive sample; otherwise, it labels it as a negative sample. This algorithm can be applied at both the whole-image level and the image-patch level. Classifying positive and negative samples in crack images is a crucial step in crack detection and analysis. It facilitates the automation of crack detection processes and enhances structural safety and maintenance efficiency. Classification errors can directly impact the accuracy of subsequent crack damage assessment and maintenance strategies.
Some researchers focus on classifying entire images containing structural cracks. This approach involves using the entire image as input and distinguishing between crack and non-crack regions through annotation, feature extraction, model training, and evaluation. Flah et al. [68] proposed a classification model for concrete structural cracks, categorizing them into five types, including the presence of surface cracks and four directional cracks (HR, HL, VR, and VL), achieving high classification accuracy. Yang et al. [103] compared the performance of three convolutional neural networks (AlexNet, VGGNet13, and ResNet18) in recognizing and classifying crack images. The results indicate that ResNet18 outperforms the other two networks, demonstrating superior feature extraction capability and accurate crack identification. Rajadurai et al. [104] combined transfer learning with fine-tuning of the AlexNet model for crack classification tasks, resulting in improved accuracy and practical performance. Kim et al. [105] proposed a crack classification detection model for concrete structural surface cracks, named OLeNet, by fine-tuning the hyperparameters of the LeNet-5 architecture. This model effectively handles low-quality images and collaborates with Internet of Things (IoT) devices without relying on high-performance computing resources. O’Brien et al. [106] employed deep convolutional neural networks and transfer learning techniques to achieve automatic detection and classification of cracks in concrete tunnel lining images.
Additionally, some researchers segment original images containing cracks into multiple small-sized images (patches) and independently classify each patch to determine if it contains a crack. For instance, Chen et al. [107] used high-definition images of facade cracks captured by multi-source drones. They combined CNN models to classify and detect 128 × 128 pixel image blocks, achieving a 94% F1 score, which enabled the classification of minor cracks. Dais et al. [51] utilized transfer learning for image patch-level classification in crack detection. They modified a pre-trained network model by removing the fully connected layer, adding a new fully connected layer as the top layer, and incorporating batch normalization and dropout layers with a dropout probability of 0.5. Finally, they added a fully connected layer with softmax activation to classify images into crack or non-crack categories. This approach enhanced the performance and generalization ability while reducing the need for extensive labeled data and accelerating model training speed, maintaining high classification accuracy. Li et al. [49] proposed a method based on residual neural networks (ResNets) and transfer learning, capable of accurately classifying and recognizing dam structural crack images with a resolution of 200×200 pixels without detailed morphology feature annotation. This method also enables the localization of the lower part and shape of the cracks.
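A minimal sketch in the spirit of the patch-level transfer-learning recipe described above is shown below; the backbone choice, layer sizes, and dropout placement are illustrative assumptions rather than the configurations used in the cited works.

```python
# Minimal sketch: a pretrained backbone with its original classifier removed, followed by
# a new fully connected head with batch normalization and dropout (p = 0.5), used to
# classify fixed-size patches as crack / non-crack.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()              # drop the original fully connected layer
for p in backbone.parameters():
    p.requires_grad = False              # freeze the pretrained feature extractor

head = nn.Sequential(
    nn.Linear(512, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 2),                   # crack / non-crack logits (softmax applied in the loss)
)

model = nn.Sequential(backbone, head)
patches = torch.rand(8, 3, 224, 224)     # a dummy batch of image patches
logits = model(patches)
```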
The advantage of whole-image-level classification lies in its simplicity and effectiveness when cracks are prominent and widely distributed throughout the image. This method relies on the overall image features to determine the presence of cracks. However, in cases where crack locations are unclear or unevenly distributed, classification accuracy may be lower. In contrast, patch-level classification analyzes images at a finer granularity, compensating for the limitations of whole-image classification. This approach divides the image into smaller patches and classifies each one, which can improve accuracy by identifying local regions containing cracks. Table 5 summarizes algorithms of recent years for crack classification.
Currently, research primarily focuses on patch-level classification, as it better identifies local regions within images, determining the position and shape of cracks. This method lays the groundwork for subsequent crack segmentation and quantification. However, patch-level classification can be cumbersome and produces relatively coarse results, making it challenging to perform refined crack feature assessments. In crack classification tasks, networks typically end with fully connected layers, allowing only for patch-level classification. Consequently, classification results are generally binary (crack or non-crack) and do not achieve pixel-level classification or accurately represent specific crack morphologies. To obtain more precise crack morphology information, it is necessary to integrate object detection or segmentation networks with image processing techniques.
By combining these approaches, we can achieve more detailed and accurate crack detection, enhancing the effectiveness of crack hazard assessment and maintenance decision-making.

4.2. Crack Detection Based on Object Detection Networks

Object detection networks are capable of automatically recognizing and localizing cracks in images, typically using bounding boxes. Deep learning-based object detection networks can be categorized into single-stage and two-stage detectors [100]. These networks can effectively handle crack detection tasks under various scales, shapes, and occlusion conditions, providing crucial technical support for crack detection in civil infrastructure.
Two-stage detectors achieve higher detection accuracy by dividing the task into two separate stages: first they generate region proposals, and then they perform classification and bounding box regression on these proposals. Typical two-stage detectors include the R-CNN series (R-CNN [122], Fast R-CNN [123], Faster R-CNN [124]) and Mask R-CNN [125]. Figure 4 illustrates the evolution of two-stage detectors.
Single-stage detectors are a type of object detection method that directly predicts the class and location of objects in an image in a single forward pass. Unlike two-stage detectors, which first generate region proposals and then perform classification and bounding box regression, single-stage detectors accomplish both tasks simultaneously. This makes them faster and suitable for real-time applications. Common single-stage detectors include the YOLO (You Only Look Once) series [126,127,128,129,130], SSD (Single Shot MultiBox Detector) [131], and RetinaNet [132]. Figure 5 illustrates the development process of single-stage detectors.
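To make the two families concrete, the sketch below shows one common way to adapt an off-the-shelf two-stage detector (torchvision's Faster R-CNN) to a single "crack" class; the dataset and training loop are omitted, and the configuration is illustrative rather than taken from any cited study.

```python
# Minimal sketch: adapting Faster R-CNN (two-stage detector) to predict crack bounding boxes.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 2  # background + crack
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)  # new crack head

# Inference on a dummy image; training would fine-tune the new head on labeled crack boxes.
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 512, 512)])   # list of images -> list of dicts
print(predictions[0]["boxes"].shape, predictions[0]["scores"].shape)
```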
Park et al. [94] combined YOLOv3-tiny with laser scanning to achieve high-precision real-time detection and quantification of cracks on structural surfaces. Xu et al. [133] enhanced Mask R-CNN by incorporating Path Aggregation Feature Pyramid Network (PAFPN) and Sobel filter branch, resulting in improved defect detection performance. Zhao et al. [134] proposed a Crack-FPN network integrated with YOLOv5, which achieved higher detection accuracy and computational efficiency under varying lighting conditions and complex backgrounds. Li et al. [135] combined Faster R-CNN with drones for bridge crack detection, enhancing both the efficiency in detecting fine cracks on bridges and the overall detection accuracy. Tran et al. [136] conducted crack detection on bridge decks and compared five different object detection networks. Their results demonstrated that YOLOv7 excelled in detecting cracks on high-resolution bridge deck images, offering superior accuracy and faster analysis speed compared to the other networks. Zhang et al. [137] replaced the feature extraction network in YOLOv4 with a lightweight network, reducing the number of parameters and backbone layers. This approach not only achieved real-time detection but also demonstrated high precision and processing speed. Table 6 summarizes the performance of various object detection networks on their respective datasets.
In the field of crack detection, single-stage detectors are valued for their fast computation speed and simple structure, enabling real-time processing of large-scale data. They can swiftly localize cracks in a single forward pass, making them ideal for scenarios demanding high real-time performance. However, their performance may be limited in complex backgrounds and when detecting fine cracks. In contrast, two-stage detectors, though more computationally complex and slower, offer superior accuracy. They are more precise in detecting fine cracks in complex backgrounds, making them particularly suitable for high-precision detection tasks. In summary, single-stage methods, with their speed advantages, are more suitable for large-scale rapid monitoring and early warning systems. Two-stage methods are better suited for detailed analysis and high-precision crack assessment tasks. In practical applications, the choice of method depends on specific requirements. Alternatively, a hybrid detection system that combines the strengths of both methods can balance speed and accuracy.
Crack object detection networks can intelligently localize crack positions but often fall short in providing detailed information about crack morphology and trajectory. Such detailed information is crucial for crack hazard assessment and maintenance decision-making. To address these shortcomings, recent research has focused on pixel-level crack detection. Pixel-level detection leverages image processing techniques or segmentation networks to extract detailed features of cracks, including their morphology, size, and orientation. This approach not only enhances the accuracy of crack detection but also provides a more reliable basis for engineering management and maintenance. Consequently, it improves the accuracy of crack hazard assessments and the effectiveness of maintenance decisions.
In summary, while object detection networks excel at localizing cracks, pixel-level detection offers the detailed analysis necessary for comprehensive crack assessment and informed maintenance planning.

4.3. Crack Detection Based on Segmentation Networks

To achieve more refined crack detection, researchers have developed pixel-level crack detection methods based on deep learning segmentation algorithms. Segmentation networks have been employed for this purpose. Table 7 summarizes the mainstream semantic segmentation algorithms used in recent years.
Models based on encoder–decoder architectures and the Spatial Pyramid Pooling (SPP) structure are widely applied in crack detection, effectively capturing crack features across different scales and scenes. This approach is particularly useful for handling targets or scenes with significant scale variations. Many scholars have conducted pixel-level detection of structural cracks using network architectures such as FCN, SegNet, and U-Net [84,140]. Some researchers have enhanced the accuracy and robustness of these semantic segmentation networks by embedding various functional modules. These enhancements improve the utilization of information within images, thereby boosting the networks’ performance in detecting cracks [141].
Table 7. Summary of semantic segmentation algorithms.

| Model | Improvement/Innovation | Backbone/Feature Extraction Architecture | Efficiency | Results |
|---|---|---|---|---|
| FCS-Net [140] | Integrating ResNet-50, ASPP, and BN | ResNet-50 | - | MIoU = 74.08% |
| FCN-SFW [142] | Combining fully convolutional network (FCN) and structured forests with wavelet transform (SFW) for detecting tiny cracks | FCN | Computing time = 1.5826 s | Precision = 64.1%; Recall = 87.22%; F1 score = 68.28% |
| AFFNet [143] | Using ResNet101 as the backbone network and incorporating two attention mechanism modules, namely VH-CAM and ECAUM | ResNet101 | Execution time = 52 ms | MIoU = 84.49%; FWIoU = 97.07%; PA = 98.36%; MPA = 92.01% |
| DeepLabv3+ [144] | Replacing ordinary convolution with separable convolution; improved SE_ASSP module | Xception-65 | - | AP = 97.63%; MAP = 95.58%; MIoU = 81.87% |
| U-Net [136] | Parameters optimized (network depth, choice of activation functions, selection of loss functions, and data augmentation) | Encoder–decoder | Analysis speed (1024 × 1024 pixels) = 0.022 s | Precision = 84.6%; Recall = 72.5%; F1 score = 78.1%; IoU = 64% |
| KTCAM-Net [111] | Combined CAM and RCM; integrating classification network and segmentation network | DeepLabv3 | FPS = 28 | Accuracy = 97.26%; Precision = 68.9%; Recall = 83.7%; F1 score = 75.4%; MIoU = 74.3% |
| ADDU-Net [117] | Featuring asymmetric dual decoders and dual attention mechanisms | Encoder–decoder | FPS = 35 | Precision = 68.9%; Recall = 83.7%; F1 score = 75.4%; MIoU = 74.3% |
| CGTr-Net [145] | Optimized CG-Trans, TCFF, and hybrid loss functions | CG-Trans | - | Precision = 88.8%; Recall = 88.3%; F1 score = 88.6%; MIoU = 89.4% |
| PCSN [146] | Using Adadelta as the optimizer and categorical cross-entropy as the loss function for the network | SegNet | Inference time = 0.12 s | mAP = 83%; Accuracy = 90%; Recall = 50% |
| DEHF-Net [147] | Introducing dual-branch encoder unit, feature fusion scheme, edge refinement module, and multi-scale feature fusion module | Dual-branch encoder unit | - | Precision = 86.3%; Recall = 92.4%; Dice score = 78.7%; mIoU = 81.6% |
| Student model + teacher model [148] | Proposed a semi-supervised semantic segmentation network | EfficientUNet | - | Precision = 84.98%; Recall = 84.38%; F1 score = 83.15% |
Li et al. [140] proposed an FCS-Net segmentation network that integrates Atrous Spatial Pyramid Pooling (ASPP) and batch normalization (BN) modules with the original ResNet-50. This enhancement aims to improve the segmentation capability for detecting fine cracks. Wang et al. [142] combined fully convolutional network (FCN) with multiscale structured forests to construct five FCN-based network architectures capable of accurately segmenting tiny cracks. Hang et al. [143] developed an Adaptive Feature Fusion Network (AFFNet) composed of ResNet101 as the backbone and two attention mechanism modules, which enables automatic pixel-level detection of concrete cracks. Sun et al. [144] enhanced the DeepLabv3+ model by replacing standard convolutions with depthwise separable convolutions, adjusting the dilation rates of convolutions, assigning channel-wise weights to the spatial pyramid module, and selecting feature maps to contribute to crack detection. This approach not only improved segmentation accuracy and detail preservation but also enhanced the model’s ability to accurately localize cracks and resist background interference. Tabernik et al. [149] introduced a two-stage deep learning architecture based on segmentation and decision networks, achieving excellent crack segmentation performance with only 25–30 training samples.
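To illustrate the ASPP idea referenced for FCS-Net and DeepLabv3+, the following is a minimal PyTorch sketch of an atrous spatial pyramid pooling block; the channel sizes and dilation rates are illustrative assumptions rather than the exact configurations of the cited networks.

```python
# Minimal sketch of an ASPP block of the kind embedded in FCS-Net and
# DeepLabv3+-style crack segmentation networks (illustrative configuration).
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch=2048, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # One 1x1 branch plus several dilated 3x3 branches at different rates.
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False)
             for r in rates])
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * len(self.branches), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        # Parallel atrous convolutions capture crack context at several scales.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

features = torch.rand(1, 2048, 32, 32)   # e.g., a ResNet-50 backbone feature map
print(ASPP()(features).shape)             # torch.Size([1, 256, 32, 32])
```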
Semantic segmentation networks offer significant advantages in crack detection compared to other detection methods. By classifying each pixel, semantic segmentation can separate cracks from the background, achieving precise localization and recognition. This approach provides fine-grained segmentation results, preserving edge details and enhancing detection accuracy. Additionally, semantic segmentation is adaptable to varying lighting conditions and complex backgrounds, demonstrating robustness in handling diverse crack morphologies. Semantic segmentation effectively filters noise, highlights crack regions, and extracts morphological features such as width, length, and orientation, providing quantitative information about cracks at multiple scales. In monitoring systems, applying semantic segmentation networks enables efficient batch data processing, facilitating automatic and intelligent maintenance and inspection of large-scale civil infrastructure. This method reduces the impact of human errors and allows for continuous learning and optimization, making the detection process more reliable and efficient.
In addition to the aforementioned crack detection models, the Vision Transformer (ViT) has attracted significant attention in recent years. ViT, proposed by Dosovitskiy et al. in 2020, is built on a self-attention mechanism. Unlike traditional convolutional neural networks (CNNs), ViT converts image data into sequential data by dividing the image into fixed-size patches and mapping each patch to a one-dimensional vector through linear embedding combined with position encoding. Thanks to self-attention, ViT captures global contextual information in images and offers strong feature modeling capabilities, allowing it to match or outperform CNNs in tasks such as image classification, object detection, and semantic segmentation; this has motivated its use for crack detection. ViT can be applied directly, extracting global features from crack images through self-attention, or combined with a CNN to pair the CNN's local feature extraction with ViT's global information processing; feature enhancement techniques can also be applied to the features extracted by ViT to improve detection performance. ViT can process images at different scales to capture crack features at various scales. Additionally, self-supervised learning can reduce the dependence on labeled data by pre-training ViT on unlabeled data and then fine-tuning it with labeled data. Finally, crack detection accuracy can be further improved by optimizing the self-attention mechanism or by incorporating other attention mechanisms. In future computer vision tasks related to crack detection, ViT is expected to extend its applications and address the shortcomings of CNNs.
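A minimal sketch of the patch-embedding and self-attention steps described above is given below; the patch size, embedding dimension, and head count follow the common ViT-Base configuration and are assumptions for illustration only.

```python
# Minimal sketch of ViT's patch embedding: split the image into fixed-size
# patches, linearly embed each patch, add a learnable position encoding, and
# feed the token sequence to a Transformer encoder layer.
import torch
import torch.nn as nn

image_size, patch_size, embed_dim = 224, 16, 768
num_patches = (image_size // patch_size) ** 2

# A strided convolution is a standard way to implement "split + flatten + linear".
patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

x = torch.rand(1, 3, image_size, image_size)        # input crack image
tokens = patch_embed(x).flatten(2).transpose(1, 2)  # (1, 196, 768) patch tokens
tokens = tokens + pos_embed                         # add position encoding

# Self-attention in the encoder aggregates global context across the whole image.
encoder = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=12, batch_first=True)
print(encoder(tokens).shape)                        # torch.Size([1, 196, 768])
```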

5. Datasets

While algorithm perfection is critical to the successful implementation of crack detection models, the quality and size of the dataset, as well as the support of high-performance computing hardware, are equally important. The dataset is the basis for information learning and training of the deep learning model and directly affects the final performance of the model. Accurate crack detection not only depends on advanced algorithms but also requires rich and high-quality data support so that the model can learn more features and patterns to improve detection accuracy and reliability. To better evaluate the performance of crack detection models, Table 8 summarizes some key publicly available datasets, including various datasets for crack classification, target detection, and pixel-level segmentation. The details of these datasets include the name of the dataset, number of images included, image resolution, manual labeling information, application area, and limitations.
This information will help researchers select the appropriate dataset for their research needs and evaluate the dataset’s applicability and limitations. Manual labeling of pixel-level labels is a time-consuming and labor-intensive process. As a result, the number of samples in a dataset used for crack segmentation is generally much smaller than in a dataset used for crack classification or target detection. This is primarily because pixel-level labeling not only requires significant time and effort but also requires an extremely high level of labeling accuracy, which significantly increases the cost of building these datasets.
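For concreteness, a minimal sketch of how a pixel-labeled crack dataset is typically loaded for training is shown below; the images/ and masks/ directory layout with matching file names is an assumption for illustration, not the convention of any particular public dataset.

```python
# Minimal sketch of a paired image/mask dataset for crack segmentation.
# The images/ and masks/ layout with identical file names is an assumption.
import os
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset

class CrackSegDataset(Dataset):
    def __init__(self, root):
        self.image_dir = os.path.join(root, "images")
        self.mask_dir = os.path.join(root, "masks")
        self.names = sorted(os.listdir(self.image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        mask = (torch.from_numpy(np.array(mask)) > 0).long()  # 0 = background, 1 = crack
        return image, mask
```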
Datasets form the basis for model training and evaluation, and their size, quality, and diversity directly affect model performance, generalization ability, and application effectiveness. High-quality datasets significantly increase model accuracy and reliability, while inadequate or inappropriate datasets limit model performance and scope. Dataset size has a particularly strong influence on crack detection models. Large, diverse datasets provide rich samples and increase model versatility, since the model can learn more representative features and the risk of overfitting is reduced; they also leave more data for validation and testing, allowing a more accurate assessment of model performance. If the training dataset is too small, the model tends to memorize details of the training data and overfits, performing well on the training data but poorly on new data. A small dataset that lacks diversity also prevents the model from learning enough features to handle different crack types and changing conditions, and the limited sample size makes evaluation during the validation phase unreliable. Complex crack detection models require large amounts of training data to realize their capabilities; with insufficient data they cannot learn effective features and perform poorly. Simple models can be trained effectively on smaller datasets, but their performance ceiling is correspondingly lower.

6. Evaluation Index

To objectively evaluate the performance of crack detection models in terms of accuracy, reliability, and validity, a set of mathematical measures is commonly used to quantify model performance. These measures provide a quantitative basis for comparing and validating the effectiveness of different methods. Table 9 summarizes the metrics commonly used to evaluate the performance of crack detection models.
The selection and calculation of evaluation metrics may vary across different crack detection tasks, but the core quantities are true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs). These counts are used to calculate precision, recall, and the F1 score. Precision measures the proportion of areas predicted by the model to be cracked that actually are cracked. Recall evaluates the model's ability to identify all actual crack areas. The F1 score is the harmonic mean of precision and recall and provides a comprehensive assessment of performance. To keep the F1 score objective, the prediction threshold often needs to be tuned to the specific task, while metrics such as accuracy are only informative when the classes are balanced.
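The sketch below shows how these counts and the derived metrics can be computed for a binary (crack/background) mask; the array shapes and random placeholders are assumptions for illustration.

```python
# Minimal sketch: pixel-wise TP/FP/FN/TN between a predicted binary mask and the
# ground truth, followed by precision, recall, and F1.
import numpy as np

def crack_metrics(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f1 = 2 * precision * recall / (precision + recall + 1e-9)  # harmonic mean
    return precision, recall, f1

pred = np.random.rand(256, 256) > 0.5   # placeholder prediction
gt = np.random.rand(256, 256) > 0.5     # placeholder ground truth
print(crack_metrics(pred, gt))
```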
In crack object detection, intersection over union (IoU) is a commonly used evaluation metric that measures the degree of overlap between the predicted and actual regions. The closer the IoU is to 1, the more accurate the detection result. IoU is computed from the intersection and union of the predicted and ground-truth regions and is only meaningful when both belong to the same category. Average precision (AP) is calculated either by 11-point interpolation [153] or as the area under the precision–recall curve (AUC), and mAP is the mean of the APs over the different categories.
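A minimal sketch of bounding-box IoU, with boxes given as (x1, y1, x2, y2) corner coordinates (an illustrative convention), is shown below; in practice only a prediction and a ground-truth box of the same class would be compared.

```python
# Minimal sketch: IoU between two axis-aligned boxes in (x1, y1, x2, y2) form.
def box_iou(box_a, box_b):
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

print(box_iou((10, 10, 110, 60), (30, 20, 130, 70)))  # ≈ 0.47
```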
For the crack segmentation task, evaluation is based on pixel-level true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs). Due to the subjective nature of manual labeling, some errors are typically allowed at the edges of the actual pixel region, with the range of discrimination error usually being plus or minus 2 to 5 pixels [154]. To minimize the effect of subjective errors, the enhanced Hausdorff distance metric [155] and the newly proposed CovEval criterion [156] are employed to improve the accuracy of evaluation results. These evaluation metrics systematically assess the performance of the crack detection model, providing insights into its strengths and weaknesses and offering a quantitative basis for further research and improvement.
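One way to implement such an edge tolerance is sketched below, under the assumption that the 2–5-pixel margin is realized by dilating the masks before counting matches; this is an illustrative scheme, not the exact procedure of the cited metrics.

```python
# Minimal sketch: tolerance-aware pixel precision/recall. The ground truth (and
# prediction) are dilated by a small margin so predictions within a few pixels
# of the labeled crack edge still count as correct.
import numpy as np
from scipy import ndimage

def tolerant_precision_recall(pred, gt, tol=2):
    pred, gt = pred.astype(bool), gt.astype(bool)
    gt_dilated = ndimage.binary_dilation(gt, iterations=tol)
    pred_dilated = ndimage.binary_dilation(pred, iterations=tol)
    precision = np.logical_and(pred, gt_dilated).sum() / (pred.sum() + 1e-9)
    recall = np.logical_and(gt, pred_dilated).sum() / (gt.sum() + 1e-9)
    return precision, recall
```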
The above evaluation indices and methods provide a scientific basis for systematically evaluating the performance of the crack detection model. They help to understand the model’s strengths and weaknesses and offer data support for subsequent research and technical improvements.

7. Discussion

With the continuous development of technology, computer vision, particularly deep learning techniques, has become an important tool in the field of crack detection in civil infrastructure. These techniques have greatly improved the accuracy and efficiency of detection, but their practical application still faces many challenges. In order to gain a comprehensive understanding of these challenges and strategies to address them, this section reviews the current status, challenges, and future directions of using computer vision for crack detection.
(1) Currently, deep learning is a mainstream trend in crack detection research and applications. However, it faces challenges such as high computational resource requirements, limited end-to-end interpretability, and model accuracy affected by data quality. Traditional image processing methods still offer advantages in specific scenarios. Therefore, integrating traditional methods with deep learning to leverage their respective strengths is an important direction for future research.
(2) Existing deep learning-based crack detection algorithms mainly rely on two-dimensional visible light images to analyze visible crack information, and this approach limits the acquisition of crack depth information. However, the combination of multidimensional remotely sensed data (e.g., high-resolution satellite imagery, 3D laser scanning data, and infrared images) provides new ways to address this problem. These data sources can provide additional support and accuracy in obtaining and analyzing crack depth information. For example, high-resolution satellite imagery can provide information on crack distribution over large areas, and multispectral satellite imagery can help characterize cracks in different materials. Laser scanning and infrared images can provide more accurate spatial depth information, and RGBD (red–green–blue depth) images combine color and depth information, further improving the accuracy of crack detection.
Future research can focus on developing crack detection algorithms that utilize multidimensional remote sensing data combined with deep learning. Combining different data sources to train the model allows different crack characteristics to be more comprehensively covered and analyzed, which greatly improves the accuracy and reliability of the detection algorithm. The application of this method should promote the development of crack detection technology and provide better support for the safe maintenance of civil infrastructure structures.
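As a minimal sketch of this multimodal idea, the example below stacks a depth map as a fourth input channel of a stock DeepLabv3 segmentation model by widening its stem convolution. The model choice and the early-fusion strategy are illustrative assumptions, not a method from the cited studies.

```python
# Minimal sketch: early fusion of RGB and depth for crack segmentation by
# widening the first convolution of a standard backbone to 4 input channels.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.segmentation.deeplabv3_resnet50(
    weights=None, weights_backbone=None, num_classes=2).eval()
# Replace the 3-channel stem with a 4-channel one (RGB + depth).
model.backbone.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                 padding=3, bias=False)

rgb = torch.rand(1, 3, 512, 512)
depth = torch.rand(1, 1, 512, 512)   # e.g., from laser scanning or an RGB-D camera
rgbd = torch.cat([rgb, depth], dim=1)
print(model(rgbd)["out"].shape)      # torch.Size([1, 2, 512, 512])
```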
(3) Utilizing semantic segmentation algorithms for crack detection in civil infrastructure requires extensive pixel-level annotated data, which is both costly and challenging to obtain. Current crack segmentation models often lack robustness and generalization capabilities when working with limited datasets. To address this issue, it is crucial to establish more comprehensive crack segmentation datasets and develop algorithms designed for small-sample training. Crack segmentation tasks also require fine pixel-level annotations, which can lead to subjective discrepancies among annotators, particularly in the edge regions of micro-cracks. Proposing edge evaluation metrics to mitigate the impact of annotation errors is a valuable research direction. By studying these metrics in depth, we can achieve a more objective assessment of crack detection models’ performance, reducing the influence of human factors and improving the reliability and practicality of these models.
Future research should focus on designing effective edge evaluation metrics and incorporating small-sample training algorithms to enhance the performance of crack segmentation models in real-world environments. Additionally, leveraging techniques such as transfer learning and generative adversarial networks can address issues of data scarcity and annotation errors. These approaches can help mitigate the limitations posed by limited data and subjective biases, thereby improving the accuracy and robustness of crack detection models.
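A minimal sketch of the transfer-learning route is given below, under the assumed configuration of an ImageNet-pretrained ResNet-50 backbone in which only the last backbone stage and the segmentation head remain trainable on the small crack dataset; the hyperparameters are illustrative.

```python
# Minimal sketch: transfer learning for limited crack data. The pretrained
# backbone is frozen except its last stage; only that stage and the head train.
import torch
import torchvision

model = torchvision.models.segmentation.fcn_resnet50(
    weights_backbone=torchvision.models.ResNet50_Weights.IMAGENET1K_V1,
    num_classes=2)

for name, param in model.backbone.named_parameters():
    if not name.startswith("layer4"):    # freeze early, generic feature layers
        param.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```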
(4) ViT is demanding in terms of computational resources and training data; its training process is complex, its performance on small datasets is limited, and it is inferior to convolutional neural networks (CNNs) in local feature extraction. While ViT performs well on large datasets, its high computational resource requirements and model complexity remain challenges for future research. In future computer vision tasks related to crack detection, ViT is expected to see broader application and to compensate for the limitations of CNNs in modeling global context. As the technology matures, ViT is expected to contribute significantly to advances in the field of computer vision.
(5) The size, quality, and diversity of the dataset have a direct impact on the performance of the crack detection model. A large and diverse dataset increases the generalizability and accuracy of the model, while a small dataset can lead to over-fitting and limit the model’s performance on new data. Complex models require sufficient data; for simple models, although training on small datasets can be effective, the upper performance limit is lower. Therefore, datasets should be scaled and optimized to improve model performance, and future research should explore data augmentation and synthesis methods for more efficient training and evaluation.
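For example, a minimal augmentation pipeline of the kind discussed could look like the sketch below; the operations and parameters are illustrative assumptions, and for segmentation tasks the same geometric transforms must also be applied to the label mask.

```python
# Minimal sketch of image-level data augmentation for crack images.
# Parameters are illustrative; geometric transforms must be mirrored on the
# mask when the task is pixel-level segmentation.
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomVerticalFlip(),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.3, contrast=0.3),      # simulate lighting changes
    T.RandomResizedCrop(size=448, scale=(0.7, 1.0)),
    T.ToTensor(),
])
```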
(6) In the crack detection task, the choice of the number of epochs (training cycles) and the batch size significantly affects model performance. An appropriate number of epochs ensures that the model fully learns the fine characteristics of cracks and thus improves detection accuracy, but too many epochs can lead to over-fitting, which degrades the model's ability to generalize to new data. The batch size, in turn, affects the stability of learning and computational efficiency. A smaller batch size can improve the generalization ability of the model but may make training less stable; a larger batch size provides more stable gradient estimates and faster training but may reduce generalization. In crack detection, a reasonable configuration of epochs and batch size is therefore important for accurate and efficient model training, and an optimal balance must be found between training stability, computational resources, and model performance.
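The following minimal training-loop sketch makes the roles of epochs and batch size explicit and adds early stopping on the validation loss as one common safeguard against over-fitting; the model, datasets, loss, and hyperparameter values are placeholders.

```python
# Minimal sketch: training loop with explicit epochs/batch size and early stopping.
import torch
from torch.utils.data import DataLoader

def train(model, train_set, val_set, epochs=100, batch_size=16, patience=10):
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()
    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:        # stop before over-fitting sets in
                break
```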
Through these discussions, we aim to provide a comprehensive understanding of the application of computer vision for crack detection, identify bottlenecks in existing technologies, and suggest appropriate research directions and solutions to advance the field.

8. Conclusions

Over time, civil infrastructure projects inevitably develop surface damage such as cracks and spalling, which are common problems for concrete structures. Surface cracks, in particular, are a major contributor to concrete failure. The results of crack detection can clearly show the extent of damage and identify early signs of structural defects. Early detection, assessment, and monitoring of cracks are essential to maintain the integrity and extend the service life of concrete structures. Crack detection using machine vision has several advantages over manual inspection, including increased efficiency, safety, and cost-effectiveness. This technology reduces the risks associated with hazardous conditions and long inspection time, improves inspection efficiency, reduces costs, and is gaining importance in engineering practice. This paper provides a systematic review of current research on crack detection in civil infrastructure. Its conclusions highlight the achievements, challenges, and future directions of computer vision techniques for the inspection and maintenance of concrete structures.
(1) In practice, the choice among image classification, object detection, and semantic segmentation algorithms should be based on the specific requirements of the task at hand. If the goal is to identify images containing cracks, a classification network should be used. This simple yet effective preliminary screening tool provides foundational information for subsequent crack localization and analysis. For crack classification and localization, facilitating rapid inspection of surface cracks on the target object, an object detection network is appropriate. This method delivers more detailed crack information, enabling quicker inspection and analysis, and offering crucial references for maintenance and repair work. To obtain quantitative information about crack morphology, a segmentation network should be employed. Such a network performs pixel-level segmentation of images, accurately identifying crack regions and providing quantitative information regarding the crack's shape and size. In summary, the selection of the algorithm should align with the specific needs of the inspection task to ensure accurate and efficient crack detection and analysis.
(2) Computer vision methods for civil infrastructure face a number of challenges. Data quality and labeling issues (e.g., insufficient datasets and inconsistent labeling) affect model training and performance. Environmental variations such as lighting and weather conditions and different types of cracks increase the detection complexity. High computational resource requirements and real-time processing are important constraints that often cannot be satisfied by traditional methods. Model generalization is important; models may not perform well with new datasets or conditions, and there is a risk of over-fitting. The complexity and robustness of algorithms require models to be more adaptive and stable, and standardization and compatibility issues impede system integration. Real-time processing and integration issues affect detection speed and efficiency. Environmental perturbations require more adaptive methods. Ethical and legal issues, including data protection and compliance, must also be addressed. These challenges emphasize the need for global solutions to ensure effective and sustainable management of civil infrastructures.
(3) Computer vision offers significant benefits in the maintenance and monitoring of civil infrastructure, such as automating the inspection process, improving the completeness and accuracy of inspections, reducing the cost and error of manual inspections, and speeding up the overall inspection. This automation enables real-time or near-real-time monitoring to detect potential problems before they become a serious threat, thereby extending the life of the infrastructure and reducing the frequency of repairs. By combining multimodal data, computer vision can provide detailed information on crack depth, crack width, and material properties, greatly improving the accuracy of the survey and the completeness of the condition assessment. At the same time, the incorporation of satellite remote sensing data allows large areas to be covered and high-resolution surface information to be obtained, which can facilitate large-scale infrastructure condition monitoring and assessment. Modern computer vision techniques also enable large volumes of data to be processed quickly, shortening the overall inspection cycle and allowing them to work in tandem with other monitoring systems, such as sensor networks and unmanned aerial systems, to provide a more comprehensive monitoring solution.
(4) Future directions of computer vision development in civil infrastructure crack detection focus on utilizing advanced techniques to overcome existing challenges. Intelligent data augmentation methods using generative adversarial networks will solve the problem of unbalanced datasets and improve the versatility of models. Lightweight models and data compression techniques will reduce computational resources and improve processing efficiency. Multimodal learning will combine visual images and laser scanning to fully characterize fractures and improve model fitting. Adaptive algorithms and transfer learning will improve performance in different environments, and advanced computing will enable real-time crack detection and repair. Augmented and virtual reality technologies will provide real-time sensor overlay and virtual environment modeling. Autonomous systems, including drones and robots, will facilitate automated inspection and decision-making. The integration of big data and artificial intelligence will enable the development of predictive maintenance models for early warning and prevention of potential problems. These approaches will improve the accuracy, efficiency, and coverage of inspections, supporting the long-term safety of infrastructures.

Author Contributions

Conceptualization, Y.S. and Q.Y.; investigation, Q.Y.; writing—original draft preparation, Q.Y.; writing—review and editing, Y.S. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 42271450.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The authors would like to thank the contributions of the editor and reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Comparison of traditional methods with deep learning fusion and multimodal data fusion.

| Aspect | Combining Traditional Image Processing Methods and Deep Learning | Multimodal Data Fusion |
| --- | --- | --- |
| Processing speed | Moderate—traditional methods are usually fast, but deep learning models may be slower; the overall speed depends on the complexity of the deep learning model | Slower—data fusion and processing can be slow, especially with large-scale multimodal data, involving significant computational and data transfer overhead |
| Accuracy | High—combines the interpretability of traditional methods with the complex pattern handling of deep learning, generally resulting in high detection accuracy | Typically higher—combining different data sources (e.g., images, text, audio) provides comprehensive information, improving overall detection accuracy |
| Robustness | Strong—traditional methods provide background knowledge, enhancing robustness, but deep learning's risk of overfitting may reduce robustness | Very strong—fusion of multiple data sources enhances the model's adaptability to different environments and conditions, better handling noise and anomalies |
| Complexity | High—integrating traditional methods and deep learning involves complex design and balancing, with challenges in tuning and interpreting deep learning models | High—involves complex data preprocessing, alignment, and fusion, handling inconsistencies and complexities from multiple data sources |
| Adaptability | Strong—can adapt to different types of cracks and background variations, with deep learning models learning features from data, though substantial labeled data is required | Very strong—combines diverse data sources, adapting well to various environments and conditions, and handling complex backgrounds and variations effectively |
| Interpretability | Higher—traditional methods provide clear explanations, while deep learning models often lack interpretability; combining them can improve overall interpretability | Lower—fusion models generally have lower interpretability, making it difficult to intuitively explain how different data sources influence the final results |
| Data requirements | High—deep learning models require a lot of labeled data, while traditional methods are more lenient, though deep learning still demands substantial data | Very high—requires large amounts of data from various modalities, and these data need to be processed and aligned effectively for successful fusion |
| Flexibility | Moderate—combining traditional methods and deep learning handles various types of cracks, but may be limited in very complex scenarios | High—handles multiple data sources and different crack information, improving performance in diverse conditions through multimodal fusion |
| Real-time capability | Poor—deep learning models are often slow to train and infer, making them less suitable for real-time detection, though combining with traditional methods can help | Poor—multimodal data fusion processing is generally slow, making it less suitable for real-time applications |
| Maintenance cost | Moderate to high—deep learning models require regular updates and maintenance, while traditional methods have lower maintenance costs | High—involves ongoing maintenance and updates for multiple data sources, with complex data preprocessing and fusion processes |
| Noise handling | Good—traditional methods effectively handle noise under certain conditions, and deep learning models can mitigate noise effects through training | Strong—multimodal fusion can complement information from different sources, improving robustness to noise and enhancing detection accuracy |

References

  1. Azimi, M.; Eslamlou, A.D.; Pekcan, G. Data-driven structural health monitoring and damage detection through deep learning: State-of-the-art review. Sensors 2020, 20, 2778. [Google Scholar] [CrossRef] [PubMed]
  2. Han, X.; Zhao, Z. Structural surface crack detection method based on computer vision technology. J. Build. Struct. 2018, 39, 418–427. [Google Scholar]
  3. Kruachottikul, P.; Cooharojananone, N.; Phanomchoeng, G.; Chavarnakul, T.; Kovitanggoon, K.; Trakulwaranont, D. Deep learning-based visual defect-inspection system for reinforced concrete bridge substructure: A case of thailand’s department of highways. J. Civ. Struct. Health Monit. 2021, 11, 949–965. [Google Scholar] [CrossRef]
  4. Gehri, N.; Mata-Falcón, J.; Kaufmann, W. Automated crack detection and measurement based on digital image correlation. Constr. Build. Mater. 2020, 256, 119383. [Google Scholar] [CrossRef]
  5. Mohan, A.; Poobal, S. Crack detection using image processing: A critical review and analysis. Alex. Eng. J. 2018, 57, 787–798. [Google Scholar] [CrossRef]
  6. Liu, Y.; Fan, J.; Nie, J.; Kong, S.; Qi, Y. Review and prospect of digital-image-based crack detection of structure surface. China Civ. Eng. J. 2021, 54, 79–98. [Google Scholar]
  7. Hsieh, Y.-A.; Tsai, Y.J. Machine learning for crack detection: Review and model performance comparison. J. Comput. Civ. Eng. 2020, 34, 04020038. [Google Scholar] [CrossRef]
  8. Xu, Y.; Bao, Y.; Chen, J.; Zuo, W.; Li, H. Surface fatigue crack identification in steel box girder of bridges by a deep fusion convolutional neural network based on consumer-grade camera images. Struct. Health Monit. 2019, 18, 653–674. [Google Scholar] [CrossRef]
  9. Wang, W.; Deng, L.; Shao, X. Fatigue design of steel bridges considering the effect of dynamic vehicle loading and overloaded trucks. J. Bridge Eng. 2016, 21, 04016048. [Google Scholar] [CrossRef]
  10. Zheng, K.; Zhou, S.; Zhang, Y.; Wei, Y.; Wang, J.; Wang, Y.; Qin, X. Simplified evaluation of shear stiffness degradation of diagonally cracked reinforced concrete beams. Materials 2023, 16, 4752. [Google Scholar] [CrossRef]
  11. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698. [Google Scholar] [CrossRef]
  12. Otsu, N. A threshold selection method from gray-level histograms. Automatica 1975, 11, 23–27. [Google Scholar] [CrossRef]
  13. Sohn, H.G.; Lim, Y.M.; Yun, K.H.; Kim, G.H. Monitoring crack changes in concrete structures. Comput.-Aided Civ. Infrastruct. Eng. 2005, 20, 52–61. [Google Scholar] [CrossRef]
  14. Wang, P.; Qiao, H.; Feng, Q.; Xue, C. Internal corrosion cracks evolution in reinforced magnesium oxychloride cement concrete. Adv. Cem. Res. 2023, 36, 15–30. [Google Scholar] [CrossRef]
  15. Loutridis, S.; Douka, E.; Trochidis, A. Crack identification in double-cracked beams using wavelet analysis. J. Sound Vib. 2004, 277, 1025–1039. [Google Scholar] [CrossRef]
  16. Fan, C.L. Detection of multidamage to reinforced concrete using support vector machine-based clustering from digital images. Struct. Control Health Monit. 2021, 28, e2841. [Google Scholar] [CrossRef]
  17. Kyal, C.; Reza, M.; Varu, B.; Shreya, S. Image-based concrete crack detection using random forest and convolution neural network. In Computational Intelligence in Pattern Recognition: Proceedings of the International Conference on Computational Intelligence in Pattern Recognition (CIPR 2021), Held at the Institute of Engineering and Management, Kolkata, West Bengal, India, on 24–25 April 2021; Springer: Singapore, 2022; pp. 471–481. [Google Scholar]
  18. Jia, H.; Lin, J.; Liu, J. Bridge seismic damage assessment model applying artificial neural networks and the random forest algorithm. Adv. Civ. Eng. 2020, 2020, 6548682. [Google Scholar] [CrossRef]
  19. Park, M.J.; Kim, J.; Jeong, S.; Jang, A.; Bae, J.; Ju, Y.K. Machine learning-based concrete crack depth prediction using thermal images taken under daylight conditions. Remote Sens. 2022, 14, 2151. [Google Scholar] [CrossRef]
  20. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  21. Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using u-net fully convolutional networks. Autom. Constr. 2019, 104, 129–139. [Google Scholar] [CrossRef]
  22. Li, G.; Ma, B.; He, S.; Ren, X.; Liu, Q. Automatic tunnel crack detection based on u-net and a convolutional neural network with alternately updated clique. Sensors 2020, 20, 717. [Google Scholar] [CrossRef] [PubMed]
  23. Chaiyasarn, K.; Buatik, A.; Mohamad, H.; Zhou, M.; Kongsilp, S.; Poovarodom, N. Integrated pixel-level cnn-fcn crack detection via photogrammetric 3d texture mapping of concrete structures. Autom. Constr. 2022, 140, 104388. [Google Scholar] [CrossRef]
  24. Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput.-Aided Civ. Infrastruct. Eng. 2019, 34, 616–634. [Google Scholar] [CrossRef]
  25. Zheng, X.; Zhang, S.; Li, X.; Li, G.; Li, X. Lightweight bridge crack detection method based on segnet and bottleneck depth-separable convolution with residuals. IEEE Access 2021, 9, 161649–161668. [Google Scholar] [CrossRef]
  26. Azouz, Z.; Honarvar Shakibaei Asli, B.; Khan, M. Evolution of crack analysis in structures using image processing technique: A review. Electronics 2023, 12, 3862. [Google Scholar] [CrossRef]
  27. Hamishebahar, Y.; Guan, H.; So, S.; Jo, J. A comprehensive review of deep learning-based crack detection approaches. Appl. Sci. 2022, 12, 1374. [Google Scholar] [CrossRef]
  28. Meng, S.; Gao, Z.; Zhou, Y.; He, B.; Djerrad, A. Real-time automatic crack detection method based on drone. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 849–872. [Google Scholar] [CrossRef]
  29. Humpe, A. Bridge inspection with an off-the-shelf 360 camera drone. Drones 2020, 4, 67. [Google Scholar] [CrossRef]
  30. Truong-Hong, L.; Lindenbergh, R. Automatically extracting surfaces of reinforced concrete bridges from terrestrial laser scanning point clouds. Autom. Constr. 2022, 135, 104127. [Google Scholar] [CrossRef]
  31. Cusson, D.; Rossi, C.; Ozkan, I.F. Early warning system for the detection of unexpected bridge displacements from radar satellite data. J. Civ. Struct. Health Monit. 2021, 11, 189–204. [Google Scholar] [CrossRef]
  32. Bonaldo, G.; Caprino, A.; Lorenzoni, F.; da Porto, F. Monitoring displacements and damage detection through satellite MT-INSAR techniques: A new methodology and application to a case study in rome (Italy). Remote Sens. 2023, 15, 1177. [Google Scholar] [CrossRef]
  33. Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A.; Zhang, L. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote Sens. Environ. 2021, 265, 112636. [Google Scholar] [CrossRef]
  34. Chen, X.; Zhang, X.; Ren, M.; Zhou, B.; Sun, M.; Feng, Z.; Chen, B.; Zhi, X. A multiscale enhanced pavement crack segmentation network coupling spectral and spatial information of UAV hyperspectral imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103772. [Google Scholar] [CrossRef]
  35. Liu, F.; Liu, J.; Wang, L. Deep learning and infrared thermography for asphalt pavement crack severity classification. Autom. Constr. 2022, 140, 104383. [Google Scholar] [CrossRef]
  36. Liu, S.; Han, Y.; Xu, L. Recognition of road cracks based on multi-scale retinex fused with wavelet transform. Array 2022, 15, 100193. [Google Scholar] [CrossRef]
  37. Zhang, H.; Qian, Z.; Tan, Y.; Xie, Y.; Li, M. Investigation of pavement crack detection based on deep learning method using weakly supervised instance segmentation framework. Constr. Build. Mater. 2022, 358, 129117. [Google Scholar] [CrossRef]
  38. Dorafshan, S.; Thomas, R.J.; Maguire, M. Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete. Constr. Build. Mater. 2018, 186, 1031–1045. [Google Scholar] [CrossRef]
  39. Munawar, H.S.; Hammad, A.W.; Haddad, A.; Soares, C.A.P.; Waller, S.T. Image-based crack detection methods: A review. Infrastructures 2021, 6, 115. [Google Scholar] [CrossRef]
  40. Chen, D.; Li, X.; Hu, F.; Mathiopoulos, P.T.; Di, S.; Sui, M.; Peethambaran, J. Edpnet: An encoding–decoding network with pyramidal representation for semantic image segmentation. Sensors 2023, 23, 3205. [Google Scholar] [CrossRef]
  41. Mo, S.; Shi, Y.; Yuan, Q.; Li, M. A survey of deep learning road extraction algorithms using high-resolution remote sensing images. Sensors 2024, 24, 1708. [Google Scholar] [CrossRef]
  42. Chen, D.; Li, J.; Di, S.; Peethambaran, J.; Xiang, G.; Wan, L.; Li, X. Critical points extraction from building façades by analyzing gradient structure tensor. Remote Sens. 2021, 13, 3146. [Google Scholar] [CrossRef]
  43. Liu, Y.; Yeoh, J.K.; Chua, D.K. Deep learning-based enhancement of motion blurred UAV concrete crack images. J. Comput. Civ. Eng. 2020, 34, 04020028. [Google Scholar] [CrossRef]
  44. Flah, M.; Nunez, I.; Ben Chaabene, W.; Nehdi, M.L. Machine learning algorithms in civil structural health monitoring: A systematic review. Arch. Comput. Methods Eng. 2021, 28, 2621–2643. [Google Scholar] [CrossRef]
  45. Li, G.; Li, X.; Zhou, J.; Liu, D.; Ren, W. Pixel-level bridge crack detection using a deep fusion about recurrent residual convolution and context encoder network. Measurement 2021, 176, 109171. [Google Scholar] [CrossRef]
  46. Ali, R.; Chuah, J.H.; Talip, M.S.A.; Mokhtar, N.; Shoaib, M.A. Structural crack detection using deep convolutional neural networks. Autom. Constr. 2022, 133, 103989. [Google Scholar] [CrossRef]
  47. Wang, H.; Li, Y.; Dang, L.M.; Lee, S.; Moon, H. Pixel-level tunnel crack segmentation using a weakly supervised annotation approach. Comput. Ind. 2021, 133, 103545. [Google Scholar] [CrossRef]
  48. Zhu, J.; Song, J. Weakly supervised network based intelligent identification of cracks in asphalt concrete bridge deck. Alex. Eng. J. 2020, 59, 1307–1317. [Google Scholar] [CrossRef]
  49. Li, Y.; Bao, T.; Xu, B.; Shu, X.; Zhou, Y.; Du, Y.; Wang, R.; Zhang, K. A deep residual neural network framework with transfer learning for concrete dams patch-level crack classification and weakly-supervised localization. Measurement 2022, 188, 110641. [Google Scholar] [CrossRef]
  50. Yang, Q.; Shi, W.; Chen, J.; Lin, W. Deep convolution neural network-based transfer learning method for civil infrastructure crack detection. Autom. Constr. 2020, 116, 103199. [Google Scholar] [CrossRef]
  51. Dais, D.; Bal, I.E.; Smyrou, E.; Sarhosis, V. Automatic crack classification and segmentation on masonry surfaces using convolutional neural networks and transfer learning. Autom. Constr. 2021, 125, 103606. [Google Scholar] [CrossRef]
  52. Abdellatif, M.; Peel, H.; Cohn, A.G.; Fuentes, R. Combining block-based and pixel-based approaches to improve crack detection and localisation. Autom. Constr. 2021, 122, 103492. [Google Scholar] [CrossRef]
  53. Dan, D.; Dan, Q. Automatic recognition of surface cracks in bridges based on 2D-APES and mobile machine vision. Measurement 2021, 168, 108429. [Google Scholar] [CrossRef]
  54. Weng, X.; Huang, Y.; Wang, W. Segment-based pavement crack quantification. Autom. Constr. 2019, 105, 102819. [Google Scholar] [CrossRef]
  55. Kao, S.-P.; Chang, Y.-C.; Wang, F.-L. Combining the YOLOv4 deep learning model with UAV imagery processing technology in the extraction and quantization of cracks in bridges. Sensors 2023, 23, 2572. [Google Scholar] [CrossRef] [PubMed]
  56. Li, X.; Xu, X.; He, X.; Wei, X.; Yang, H. Intelligent crack detection method based on GM-ResNet. Sensors 2023, 23, 8369. [Google Scholar] [CrossRef] [PubMed]
  57. Choi, Y.; Park, H.W.; Mi, Y.; Song, S. Crack detection and analysis of concrete structures based on neural network and clustering. Sensors 2024, 24, 1725. [Google Scholar] [CrossRef] [PubMed]
  58. Guo, J.-M.; Markoni, H.; Lee, J.-D. BARNet: Boundary aware refinement network for crack detection. IEEE Trans. Intell. Transp. Syst. 2021, 23, 7343–7358. [Google Scholar] [CrossRef]
  59. Luo, J.; Lin, H.; Wei, X.; Wang, Y. Adaptive canny and semantic segmentation networks based on feature fusion for road crack detection. IEEE Access 2023, 11, 51740–51753. [Google Scholar] [CrossRef]
  60. Ranyal, E.; Sadhu, A.; Jain, K. Enhancing pavement health assessment: An attention-based approach for accurate crack detection, measurement, and mapping. Expert Syst. Appl. 2024, 247, 123314. [Google Scholar] [CrossRef]
  61. Liu, K.; Chen, B.M. Industrial UAV-based unsupervised domain adaptive crack recognitions: From database towards real-site infrastructural inspections. IEEE Trans. Ind. Electron. 2022, 70, 9410–9420. [Google Scholar] [CrossRef]
  62. Wang, W.; Hu, W.; Wang, W.; Xu, X.; Wang, M.; Shi, Y.; Qiu, S.; Tutumluer, E. Automated crack severity level detection and classification for ballastless track slab using deep convolutional neural network. Autom. Constr. 2021, 124, 103484. [Google Scholar] [CrossRef]
  63. Xu, Z.; Zhang, X.; Chen, W.; Liu, J.; Xu, T.; Wang, Z. Muraldiff: Diffusion for ancient murals restoration on large-scale pre-training. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2169–2181. [Google Scholar] [CrossRef]
  64. Bradley, D.; Roth, G. Adaptive thresholding using the integral image. J. Graph. Tools 2007, 12, 13–21. [Google Scholar] [CrossRef]
  65. Sezgin, M.; Sankur, B.l. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 2004, 13, 146–168. [Google Scholar]
  66. Kapur, J.N.; Sahoo, P.K.; Wong, A.K. A new method for gray-level picture thresholding using the entropy of the histogram. Comput. Vis. Graph. Image Process. 1985, 29, 273–285. [Google Scholar] [CrossRef]
  67. Pal, N.R.; Pal, S.K. A review on image segmentation techniques. Pattern Recognit. 1993, 26, 1277–1294. [Google Scholar] [CrossRef]
  68. Flah, M.; Suleiman, A.R.; Nehdi, M.L. Classification and quantification of cracks in concrete structures using deep learning image-based techniques. Cem. Concr. Compos. 2020, 114, 103781. [Google Scholar] [CrossRef]
  69. Mazni, M.; Husain, A.R.; Shapiai, M.I.; Ibrahim, I.S.; Anggara, D.W.; Zulkifli, R. An investigation into real-time surface crack classification and measurement for structural health monitoring using transfer learning convolutional neural networks and otsu method. Alex. Eng. J. 2024, 92, 310–320. [Google Scholar] [CrossRef]
  70. He, Z.; Xu, W. Deep learning and image preprocessing-based crack repair trace and secondary crack classification detection method for concrete bridges. Struct. Infrastruct. Eng. 2024, 20, 1–17. [Google Scholar] [CrossRef]
  71. He, T.; Li, H.; Qian, Z.; Niu, C.; Huang, R. Research on weakly supervised pavement crack segmentation based on defect location by generative adversarial network and target re-optimization. Constr. Build. Mater. 2024, 411, 134668. [Google Scholar] [CrossRef]
  72. Su, H.; Wang, X.; Han, T.; Wang, Z.; Zhao, Z.; Zhang, P. Research on a U-Net bridge crack identification and feature-calculation methods based on a CBAM attention mechanism. Buildings 2022, 12, 1561. [Google Scholar] [CrossRef]
  73. Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.-J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020, 118, 103291. [Google Scholar] [CrossRef]
  74. Lei, Q.; Zhong, J.; Wang, C. Joint optimization of crack segmentation with an adaptive dynamic threshold module. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6902–6916. [Google Scholar] [CrossRef]
  75. Lei, Q.; Zhong, J.; Wang, C.; Xia, Y.; Zhou, Y. Dynamic thresholding for accurate crack segmentation using multi-objective optimization. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Turin, Italy, 18 September 2023; Springer: Cham, Switzerland, 2023; pp. 389–404. [Google Scholar]
  76. Vincent, L.; Soille, P. Watersheds in digital spaces: An efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 583–598. [Google Scholar] [CrossRef]
  77. Huang, H.; Zhao, S.; Zhang, D.; Chen, J. Deep learning-based instance segmentation of cracks from shield tunnel lining images. Struct. Infrastruct. Eng. 2022, 18, 183–196. [Google Scholar] [CrossRef]
  78. Fan, Z.; Lin, H.; Li, C.; Su, J.; Bruno, S.; Loprencipe, G. Use of parallel resnet for high-performance pavement crack detection and measurement. Sustainability 2022, 14, 1825. [Google Scholar] [CrossRef]
  79. Kong, S.Y.; Fan, J.S.; Liu, Y.F.; Wei, X.C.; Ma, X.W. Automated crack assessment and quantitative growth monitoring. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 656–674. [Google Scholar] [CrossRef]
  80. Dang, L.M.; Wang, H.; Li, Y.; Park, Y.; Oh, C.; Nguyen, T.N.; Moon, H. Automatic tunnel lining crack evaluation and measurement using deep learning. Tunn. Undergr. Space Technol. 2022, 124, 104472. [Google Scholar] [CrossRef]
  81. Andrushia, A.D.; Anand, N.; Lubloy, E. Deep learning based thermal crack detection on structural concrete exposed to elevated temperature. Adv. Struct. Eng. 2021, 24, 1896–1909. [Google Scholar] [CrossRef]
  82. Dang, L.M.; Wang, H.; Li, Y.; Nguyen, L.Q.; Nguyen, T.N.; Song, H.-K.; Moon, H. Deep learning-based masonry crack segmentation and real-life crack length measurement. Constr. Build. Mater. 2022, 359, 129438. [Google Scholar] [CrossRef]
  83. Nguyen, A.; Gharehbaghi, V.; Le, N.T.; Sterling, L.; Chaudhry, U.I.; Crawford, S. ASR crack identification in bridges using deep learning and texture analysis. Structures 2023, 50, 494–507. [Google Scholar] [CrossRef]
  84. Dong, C.; Li, L.; Yan, J.; Zhang, Z.; Pan, H.; Catbas, F.N. Pixel-level fatigue crack segmentation in large-scale images of steel structures using an encoder–decoder network. Sensors 2021, 21, 4135. [Google Scholar] [CrossRef] [PubMed]
  85. Jian, L.; Chengshun, L.; Guanhong, L.; Zhiyuan, Z.; Bo, H.; Feng, G.; Quanyi, X. Lightweight defect detection equipment for road tunnels. IEEE Sens. J. 2023, 24, 5107–5121. [Google Scholar]
  86. Liang, H.; Qiu, D.; Ding, K.-L.; Zhang, Y.; Wang, Y.; Wang, X.; Liu, T.; Wan, S. Automatic pavement crack detection in multisource fusion images using similarity and difference features. IEEE Sens. J. 2023, 24, 5449–5465. [Google Scholar] [CrossRef]
  87. Alamdari, A.G.; Ebrahimkhanlou, A. A multi-scale robotic approach for precise crack measurement in concrete structures. Autom. Constr. 2024, 158, 105215. [Google Scholar] [CrossRef]
  88. Liu, H.; Kollosche, M.; Laflamme, S.; Clarke, D.R. Multifunctional soft stretchable strain sensor for complementary optical and electrical sensing of fatigue cracks. Smart Mater. Struct. 2023, 32, 045010. [Google Scholar] [CrossRef]
  89. Dang, D.-Z.; Wang, Y.-W.; Ni, Y.-Q. Nonlinear autoregression-based non-destructive evaluation approach for railway tracks using an ultrasonic fiber bragg grating array. Constr. Build. Mater. 2024, 411, 134728. [Google Scholar] [CrossRef]
  90. Yan, M.; Tan, X.; Mahjoubi, S.; Bao, Y. Strain transfer effect on measurements with distributed fiber optic sensors. Autom. Constr. 2022, 139, 104262. [Google Scholar] [CrossRef]
  91. Shukla, H.; Piratla, K. Leakage detection in water pipelines using supervised classification of acceleration signals. Autom. Constr. 2020, 117, 103256. [Google Scholar] [CrossRef]
  92. Chen, X.; Zhang, X.; Li, J.; Ren, M.; Zhou, B. A new method for automated monitoring of road pavement aging conditions based on recurrent neural network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24510–24523. [Google Scholar] [CrossRef]
  93. Zhang, S.; He, X.; Xue, B.; Wu, T.; Ren, K.; Zhao, T. Segment-anything embedding for pixel-level road damage extraction using high-resolution satellite images. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103985. [Google Scholar] [CrossRef]
  94. Park, S.E.; Eem, S.-H.; Jeon, H. Concrete crack detection and quantification using deep learning and structured light. Constr. Build. Mater. 2020, 252, 119096. [Google Scholar] [CrossRef]
  95. Yan, Y.; Mao, Z.; Wu, J.; Padir, T.; Hajjar, J.F. Towards automated detection and quantification of concrete cracks using integrated images and lidar data from unmanned aerial vehicles. Struct. Control Health Monit. 2021, 28, e2757. [Google Scholar] [CrossRef]
  96. Dong, Q.; Wang, S.; Chen, X.; Jiang, W.; Li, R.; Gu, X. Pavement crack detection based on point cloud data and data fusion. Philos. Trans. R. Soc. A 2023, 381, 20220165. [Google Scholar] [CrossRef] [PubMed]
  97. Kim, H.; Lee, S.; Ahn, E.; Shin, M.; Sim, S.-H. Crack identification method for concrete structures considering angle of view using RGB-D camera-based sensor fusion. Struct. Health Monit. 2021, 20, 500–512. [Google Scholar] [CrossRef]
  98. Chen, J.; Lu, W.; Lou, J. Automatic concrete defect detection and reconstruction by aligning aerial images onto semantic-rich building information model. Comput.-Aided Civ. Infrastruct. Eng. 2023, 38, 1079–1098. [Google Scholar] [CrossRef]
  99. Pozzer, S.; Rezazadeh Azar, E.; Dalla Rosa, F.; Chamberlain Pravia, Z.M. Semantic segmentation of defects in infrared thermographic images of highly damaged concrete structures. J. Perform. Constr. Facil. 2021, 35, 04020131. [Google Scholar] [CrossRef]
  100. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2023, 132, 103812. [Google Scholar] [CrossRef]
  101. Sharma, V.K.; Mir, R.N. A comprehensive and systematic look up into deep learning based object detection techniques: A review. Comput. Sci. Rev. 2020, 38, 100301. [Google Scholar] [CrossRef]
  102. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 3708–3712. [Google Scholar]
  103. Yang, C.; Chen, J.; Li, Z.; Huang, Y. Structural crack detection and recognition based on deep learning. Appl. Sci. 2021, 11, 2868. [Google Scholar] [CrossRef]
  104. Rajadurai, R.-S.; Kang, S.-T. Automated vision-based crack detection on concrete surfaces using deep learning. Appl. Sci. 2021, 11, 5229. [Google Scholar] [CrossRef]
  105. Kim, B.; Yuvaraj, N.; Sri Preethaa, K.; Arun Pandian, R. Surface crack detection using deep learning with shallow CNN architecture for enhanced computation. Neural Comput. Appl. 2021, 33, 9289–9305. [Google Scholar] [CrossRef]
  106. O’Brien, D.; Osborne, J.A.; Perez-Duenas, E.; Cunningham, R.; Li, Z. Automated crack classification for the CERN underground tunnel infrastructure using deep learning. Tunn. Undergr. Space Technol. 2023, 131, 104668. [Google Scholar]
  107. Chen, K.; Reichard, G.; Xu, X.; Akanmu, A. Automated crack segmentation in close-range building façade inspection images using deep learning techniques. J. Build. Eng. 2021, 43, 102913. [Google Scholar] [CrossRef]
  108. Dong, Z.; Wang, J.; Cui, B.; Wang, D.; Wang, X. Patch-based weakly supervised semantic segmentation network for crack detection. Constr. Build. Mater. 2020, 258, 120291. [Google Scholar] [CrossRef]
  109. Buatik, A.; Thansirichaisree, P.; Kalpiyapun, P.; Khademi, N.; Pasityothin, I.; Poovarodom, N. Mosaic crack mapping of footings by convolutional neural networks. Sci. Rep. 2024, 14, 7851. [Google Scholar] [CrossRef] [PubMed]
  110. Zhang, Y.; Zhang, L. Detection of pavement cracks by deep learning models of transformer and UNet. arXiv 2023, arXiv:2304.12596. [Google Scholar] [CrossRef]
  111. Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; Rabea, A.-J.; Zhai, D. A hybrid deep learning pavement crack semantic segmentation. Eng. Appl. Artif. Intell. 2023, 122, 106142. [Google Scholar] [CrossRef]
  112. Shamsabadi, E.A.; Xu, C.; Rao, A.S.; Nguyen, T.; Ngo, T.; Dias-da-Costa, D. Vision transformer-based autonomous crack detection on asphalt and concrete surfaces. Autom. Constr. 2022, 140, 104316. [Google Scholar] [CrossRef]
  113. Huang, S.; Tang, W.; Huang, G.; Huangfu, L.; Yang, D. Weakly supervised patch label inference networks for efficient pavement distress detection and recognition in the wild. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5216–5228. [Google Scholar] [CrossRef]
  114. Huang, G.; Huang, S.; Huangfu, L.; Yang, D. Weakly supervised patch label inference network with image pyramid for pavement diseases recognition in the wild. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 7978–7982. [Google Scholar]
  115. Guo, J.-M.; Markoni, H. Efficient and adaptable patch-based crack detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21885–21896. [Google Scholar] [CrossRef]
  116. König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Weakly-supervised surface crack segmentation by generating pseudo-labels using localization with a classifier and thresholding. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24083–24094. [Google Scholar] [CrossRef]
  117. Al-Huda, Z.; Peng, B.; Algburi, R.N.A.; Al-antari, M.A.; Rabea, A.-J.; Al-maqtari, O.; Zhai, D. Asymmetric dual-decoder-U-Net for pavement crack semantic segmentation. Autom. Constr. 2023, 156, 105138. [Google Scholar] [CrossRef]
  118. Wen, T.; Lang, H.; Ding, S.; Lu, J.J.; Xing, Y. PCDNet: Seed operation-based deep learning model for pavement crack detection on 3d asphalt surface. J. Transp. Eng. Part B Pavements 2022, 148, 04022023. [Google Scholar] [CrossRef]
  119. Mishra, A.; Gangisetti, G.; Eftekhar Azam, Y.; Khazanchi, D. Weakly supervised crack segmentation using crack attention networks on concrete structures. Struct. Health Monit. 2024, 23, 14759217241228150. [Google Scholar] [CrossRef]
  120. Kompanets, A.; Pai, G.; Duits, R.; Leonetti, D.; Snijder, B. Deep learning for segmentation of cracks in high-resolution images of steel bridges. arXiv 2024, arXiv:2403.17725. [Google Scholar]
  121. Liu, Y.; Yeoh, J.K. Robust pixel-wise concrete crack segmentation and properties retrieval using image patches. Autom. Constr. 2021, 123, 103535. [Google Scholar] [CrossRef]
  122. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  123. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  124. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  125. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  126. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  127. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
128. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
129. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
130. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
131. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  132. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  133. Xu, Y.; Li, D.; Xie, Q.; Wu, Q.; Wang, J. Automatic defect detection and segmentation of tunnel surface using modified mask R-CNN. Measurement 2021, 178, 109316. [Google Scholar] [CrossRef]
  134. Zhao, W.; Liu, Y.; Zhang, J.; Shao, Y.; Shu, J. Automatic pixel-level crack detection and evaluation of concrete structures using deep learning. Struct. Control Health Monit. 2022, 29, e2981. [Google Scholar] [CrossRef]
  135. Li, R.; Yu, J.; Li, F.; Yang, R.; Wang, Y.; Peng, Z. Automatic bridge crack detection using unmanned aerial vehicle and faster R-CNN. Constr. Build. Mater. 2023, 362, 129659. [Google Scholar] [CrossRef]
  136. Tran, T.S.; Nguyen, S.D.; Lee, H.J.; Tran, V.P. Advanced crack detection and segmentation on bridge decks using deep learning. Constr. Build. Mater. 2023, 400, 132839. [Google Scholar] [CrossRef]
  137. Zhang, J.; Qian, S.; Tan, C. Automated bridge crack detection method based on lightweight vision models. Complex Intell. Syst. 2023, 9, 1639–1652. [Google Scholar] [CrossRef]
  138. Ren, R.; Liu, F.; Shi, P.; Wang, H.; Huang, Y. Preprocessing of crack recognition: Automatic crack-location method based on deep learning. J. Mater. Civ. Eng. 2023, 35, 04022452. [Google Scholar] [CrossRef]
139. Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN. Autom. Constr. 2023, 146, 104689. [Google Scholar] [CrossRef]
  140. Li, Z.; Zhu, H.; Huang, M. A deep learning-based fine crack segmentation network on full-scale steel bridge images with complicated backgrounds. IEEE Access 2021, 9, 114989–114997. [Google Scholar] [CrossRef]
  141. Alipour, M.; Harris, D.K.; Miller, G.R. Robust pixel-level crack detection using deep fully convolutional neural networks. J. Comput. Civ. Eng. 2019, 33, 04019040. [Google Scholar] [CrossRef]
  142. Wang, S.; Pan, Y.; Chen, M.; Zhang, Y.; Wu, X. FCN-SFW: Steel structure crack segmentation using a fully convolutional network and structured forests. IEEE Access 2020, 8, 214358–214373. [Google Scholar] [CrossRef]
  143. Hang, J.; Wu, Y.; Li, Y.; Lai, T.; Zhang, J.; Li, Y. A deep learning semantic segmentation network with attention mechanism for concrete crack detection. Struct. Health Monit. 2023, 22, 3006–3026. [Google Scholar] [CrossRef]
  144. Sun, Y.; Yang, Y.; Yao, G.; Wei, F.; Wong, M. Autonomous crack and bughole detection for concrete surface image based on deep learning. IEEE Access 2021, 9, 85709–85720. [Google Scholar] [CrossRef]
  145. Wang, Z.; Leng, Z.; Zhang, Z. A weakly-supervised transformer-based hybrid network with multi-attention for pavement crack detection. Constr. Build. Mater. 2024, 411, 134134. [Google Scholar] [CrossRef]
146. Chen, T.; Cai, Z.; Zhao, X.; Chen, C.; Liang, X.; Zou, T.; Wang, P. Pavement crack detection and recognition using the architecture of SegNet. J. Ind. Inf. Integr. 2020, 18, 100144. [Google Scholar] [CrossRef]
  147. Bai, S.; Ma, M.; Yang, L.; Liu, Y. Pixel-wise crack defect segmentation with dual-encoder fusion network. Constr. Build. Mater. 2024, 426, 136179. [Google Scholar] [CrossRef]
  148. Wang, W.; Su, C. Semi-supervised semantic segmentation network for surface crack detection. Autom. Constr. 2021, 128, 103786. [Google Scholar] [CrossRef]
  149. Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. J. Intell. Manuf. 2020, 31, 759–776. [Google Scholar] [CrossRef]
  150. König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Optimized deep encoder-decoder methods for crack segmentation. Digit. Signal Process. 2021, 108, 102907. [Google Scholar] [CrossRef]
151. Wang, C.; Liu, H.; An, X.; Gong, Z.; Deng, F. SwinCrack: Pavement crack detection using convolutional swin-transformer network. Digit. Signal Process. 2024, 145, 104297. [Google Scholar] [CrossRef]
  152. Lan, Z.-X.; Dong, X.-M. Minicrack: A simple but efficient convolutional neural network for pixel-level narrow crack detection. Comput. Ind. 2022, 141, 103698. [Google Scholar] [CrossRef]
  153. Salton, G. Introduction to Modern Information Retrieval; McGraw-Hill: New York, NY, USA, 1983. [Google Scholar]
154. Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A deep convolutional neural network for semantic pixel-wise segmentation of road and pavement surface cracks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2120–2124. [Google Scholar]
  155. Tsai, Y.-C.; Chatterjee, A. Comprehensive, quantitative crack detection algorithm performance evaluation system. J. Comput. Civ. Eng. 2017, 31, 04017047. [Google Scholar] [CrossRef]
  156. Li, H.; Wang, J.; Zhang, Y.; Wang, Z.; Wang, T. A study on evaluation standard for automatic crack detection regard the random fractal. arXiv 2020, arXiv:2007.12082. [Google Scholar]
Figure 1. Year-wise distribution of articles.
Figure 2. Keywords for crack detection.
Figure 3. Performance comparison of multi-source data fusion crack detection.
Figure 4. Two-stage detectors from 2014 to present.
Figure 5. Single-stage detectors from 2014 to present.
Table 1. Crack detection method based on edge detection and deep learning.
Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Canny and YOLOv4 [55] | Crack detection and measurement | Bridges | 1463 images, 256 × 256 pixels | Smartphone and DJI UAV | Accuracy = 92%; mAP = 92% | The Canny edge detector is sensitive to the choice of threshold
Canny and GM-ResNet [56] | Crack detection, measurement, and classification | Road | 522 images, 224 × 224 pixels | Concrete crack sub-dataset | Precision = 97.9%; Recall = 98.9%; F1 measure = 98.0%; Accuracy (shadow conditions) = 99.3%; Accuracy (shadow-free conditions) = 99.9% | Detection performance for complex cracks is not yet perfect
Sobel and ResNet50 [57] | Crack detection | Concrete | 4500 images, 100 × 100 pixels | FLIR E8 | Precision = 98.4%; Recall = 88.7%; F1 measure = 93.2% | -
Sobel and BARNet [58] | Crack detection and localization | Road | 206 images, 800 × 600 pixels | CrackTree200 dataset | AIU = 19.85%; ODS = 79.9%; OIS = 81.4% | Hyperparameter tuning is needed to balance the penalty weights for different types of cracks
Canny and DeepLabV3+ [59] | Crack detection | Road | 2000 × 1500 pixels | Crack500 dataset | MIoU = 77.64%; MAE = 1.55; PA = 97.38%; F1 score = 63% | Detection performance deteriorates in dark environments or when interfering objects are present
Canny and RetinaNet [60] | Crack detection and measurement | Road | 850 images, 256 × 256 pixels | SDNET 2018 dataset | Precision = 85.96%; Recall = 84.48%; F1 score = 85.21% | -
Canny and Transformer [61] | Crack detection and segmentation | Buildings | 11,298 images, 450 × 450 pixels | UAVs | GA = 83.5%; MIoU = 76.2%; Precision = 74.3%; Recall = 75.2%; F1 score = 74.7% | Adds a marginal computational cost to the various network backbones
Canny and Inception-ResNet-v2 [62] | Crack detection, measurement, and classification | High-speed railway | 4650 images, 400 × 400 pixels | Track inspection vehicle | High severity level: Precision = 98.37%, Recall = 93.82%, F1 score = 95.99%; Low severity level: Precision = 94.25%, Recall = 98.39%, F1 score = 96.23% | Only the average width is used to define crack severity; the influence of crack length on the detection result is not considered
Canny and U-Net [63] | Crack detection | Buildings | 165 images | - | SSIM = 14.5392; PSNR = 0.3206; RMSE = 0.0747 | Relies on a large amount of mural data for training and enhancement
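The methods in Table 1 share a two-stage pattern: a classical edge operator first highlights candidate crack pixels, after which a deep network classifies, localizes, or segments the highlighted regions. Below is a minimal sketch of that first stage using OpenCV; the Canny thresholds and the smoothing kernel are illustrative assumptions (Table 1 itself notes that Canny is sensitive to its thresholds), and the downstream deep model is not shown.

```python
import cv2
import numpy as np

def canny_crack_candidates(image_path, low_thresh=50, high_thresh=150):
    """Classical preprocessing stage: denoise, run Canny, and dilate the
    edge map so thin crack responses survive as candidate regions."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)           # suppress surface texture noise
    edges = cv2.Canny(blurred, low_thresh, high_thresh)   # thresholds are illustrative
    kernel = np.ones((3, 3), np.uint8)
    return cv2.dilate(edges, kernel, iterations=1)        # thicken thin crack edges

# A deep detector or segmenter (e.g., YOLO or DeepLabV3+) would then run on the
# original image or on patches cropped around the candidate regions returned here.
```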
Table 2. Crack detection method based on threshold segmentation and deep learning.
Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Otsu and Keras classifier [68] | Crack detection, measurement, and classification | Concrete | 4000 images, 227 × 227 pixels | Open dataset available | Classifier accuracies = 98.25%, 97.18%, 96.17%; Length error = 1.5%; Width error = 5%; Orientation angle error = 2% | Can only accurately quantify a single crack per image
Otsu and TL MobileNetV2 [69] | Crack detection, measurement, and classification | Concrete | 11,435 images, 224 × 224 pixels | Mendeley data—crack detection | Accuracy = 99.87%; Recall = 99.74%; Precision = 100%; F1 score = 99.87% | Dependent on image quality
Otsu, YOLOv7, Poisson noise, and bilateral filtering [70] | Crack detection and classification | Bridges | 500 images, 640 × 640 pixels | Dataset | Training time = 35 min; Inference time = 8.9 s; Target correct rate = 85.97%; Negative sample misclassification rate = 42.86% | Does not provide quantified information such as length and area
Adaptive threshold and WSIS [37] | Crack detection | Road | 320 images, 3024 × 4032 pixels | Photos of cracks | Recall = 90%; Precision = 52%; IoU = 50%; F1 score = 66%; Accuracy = 98% | For small cracks (width below 3 pixels), the model can only confirm their existence and struggles to depict them in detail
Adaptive threshold and U-GAT-IT [71] | Crack detection | Road | 300 training images and 237 test images | DeepCrack dataset | Recall = 79.3%; Precision = 82.2%; F1 score = 80.7% | Further research is needed to address interference from small cracks, road shadows, and water stains
Local thresholding and DCNN [72] | Crack detection | Concrete | 125 images, 227 × 227 pixels | Cameras | Accuracy = 93%; Recall = 91%; Precision = 92%; F1 score = 91% | -
Otsu and Faster R-CNN [73] | Crack detection, localization, and quantification | Concrete | 100 images, 1920 × 1080 pixels | Nikon D7200 camera and Galaxy S9 camera | AP = 95%; mIoU = 83%; RMSE = 2.6 pixels; Length accuracy = 93% | Useful for concrete cracks only; applicability to cracks in other materials may be limited
Adaptive Dynamic Thresholding Module (ADTM) and Mask DINO [74] | Crack detection and segmentation | Road | 395 images, 2000 × 1500 pixels | Crack500 | mIoU = 81.3%; mAcc = 96.4%; gAcc = 85.0% | The ADTM module can only handle binary classification problems
Dynamic Thresholding Branch and DeepCrack [75] | Crack detection and classification | Bridges | 3648 × 5472 pixels | Crack500 | mIoU = 79.3%; mAcc = 98.5%; gAcc = 86.6% | Image-level thresholds lead to misclassification of the background
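The pipelines in Table 2 rely on a global (Otsu) or local adaptive threshold to produce a coarse crack/background split that the deep model then refines or verifies. The snippet below is a minimal sketch of both thresholding variants with OpenCV; the blur size, block size, and constant offset are illustrative assumptions rather than values reported by the cited studies.

```python
import cv2

def threshold_crack_mask(image_path, adaptive=False):
    """Coarse crack/background split. Cracks are usually darker than the
    surrounding surface, so the inverted binary mode marks dark pixels."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    if adaptive:
        # Local threshold computed per 31 x 31 neighborhood (illustrative values).
        return cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                     cv2.THRESH_BINARY_INV, 31, 5)
    # Otsu picks a single global threshold from the image histogram.
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return mask
```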
Table 3. Crack detection method based on morphological operations and deep learning.
Method | Features | Domain | Dataset | Image Device/Source | Results | Limitations
Morphological closing operations and Mask R-CNN [77] | Crack detection | Tunnel | 761 images, 227 × 227 pixels | MTI-200a | Balanced accuracy = 81.94%; F1 score = 68.68%; IoU = 52.72% | The dataset is relatively small compared with the sample size required for universal conditions
Morphological operations and Parallel ResNet [78] | Crack detection and measurement | Road | 206 images (CrackTree200), 800 × 600 pixels, and 118 images (CFD), 320 × 480 pixels | CrackTree200 dataset and CFD dataset | CrackTree200: Precision = 94.27%, Recall = 92.52%, F1 = 93.08%; CFD: Precision = 96.21%, Recall = 95.12%, F1 = 95.63% | The method was only evaluated on accurate static images
Closing and CNN [79] | Crack detection, measurement, and classification | Concrete | 3208 images, 256 × 256 or 128 × 128 pixels | Hand-held DSLR cameras | Relative error = 5%; Accuracy > 95%; Loss < 0.1 | The extraction of crack edges has a large influence on the results
Dilation and TunnelURes [80] | Crack detection, measurement, and classification | Tunnel | 6810 images, sizes ranging from 10,441 × 2910 to 50,739 × 3140 pixels | Night 4K line-scan cameras | AUC = 0.97; PA = 0.928; IoU = 0.847 | The medial-axis skeletonization algorithm produced many errors because it is susceptible to crack intersections and to image edges where the crack's representation changes
Opening, closing, and U-Net [81] | Crack detection, measurement, and classification | Concrete | 200 images, 512 × 512 pixels | Canon SX510 HS camera | Precision = 96.52%; Recall = 93.73%; F measure = 96.12%; Accuracy = 99.74%; IoU = 78.12% | Can only detect other crack types that share the same geometry as thermal cracks
Morphological operations and DeepLabV3+ [82] | Crack detection and measurement | Masonry structure | 200 images, 780 × 355 pixels and 2880 × 1920 pixels | Internet, drones, and smartphones | IoU = 0.97; F1 score = 98%; Accuracy = 98% | The model cannot detect crack features that do not appear in the dataset (complicated cracks, tiny cracks, etc.)
Erosion, texture analysis techniques, and InceptionV3 [83] | Crack detection and classification | Bridges | 1706 images, 256 × 256 pixels | Cameras | F1 score = 93.7%; Accuracy = 94.07% | -
U-Net, opening, and closing operations [84] | Crack detection and segmentation | Bridges | 244 images, 512 × 512 pixels | Cameras | mP = 44.57%; mR = 53.13%; mF1 = 42.79%; mIoU = 64.79% | The model lacks generality, and there are cases of false detection
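In the Table 3 methods, morphological operations typically serve either as pre-processing (removing texture noise before the network) or as post-processing (cleaning up the network's binary prediction). The sketch below illustrates the post-processing case with OpenCV; the elliptical 5 × 5 structuring element is an illustrative assumption and would be tuned to the expected crack width in practice.

```python
import cv2

def refine_crack_mask(binary_mask):
    """Post-process a network's binary crack prediction: closing bridges
    small gaps along a crack, and opening removes isolated false-positive
    specks left by surface texture."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    closed = cv2.morphologyEx(binary_mask, cv2.MORPH_CLOSE, kernel)
    return cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)
```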
Table 4. Multi-sensor fusion for crack detection.
Sensor Type | Fusion Method | Advantages | Disadvantages | Application Scenarios
Optical sensor [85] | Data-level fusion | High resolution, rich in detail | Susceptible to light and occlusion | Surface crack detection, general environments
Thermal sensor [86] | Feature-level fusion | Suitable for nighttime or low-light environments, detects temperature changes | Low resolution, lack of detail | Nighttime detection, heat-sensitive areas, large-area surface crack detection
Laser sensor [87] | Data-level and feature-level fusion | High-precision 3D point cloud data, accurately measures crack morphology | High equipment cost, complex data processing | Complex structures, precise measurements
Strain sensor [88] | Feature-level and decision-level fusion | High sensitivity to structural changes; durable | Requires contact with the material; installation complexity | Monitoring structural health in bridges and buildings; detecting early-stage crack development
Ultrasonic sensor [89] | Data-level and feature-level fusion | Detects internal cracks in materials, strong penetration | Affected by material and geometric shape, limited resolution | Internal cracks, metal material detection
Optical fiber sensor [90] | Feature-level fusion | High sensitivity to changes in material properties, non-contact measurement | Affected by environmental conditions, requires calibration | Surface crack detection, structural health monitoring
Vibration sensor [91] | Data-level fusion | Detects structural vibration characteristics, strong adaptability | Affected by environmental vibrations, requires complex signal processing | Dynamic crack monitoring, bridges and other structures
Multispectral satellite sensor [92] | Data-level fusion | Rich spectral information | Limited spectral resolution, weather- and lighting-dependent, high cost | Pavement crack detection, bridge and infrastructure monitoring, building facade inspection
High-resolution satellite sensor [93] | Data-level and feature-level fusion | High spatial resolution, wide coverage, frequent revisit times, rich information content | Weather dependency, high cost, data processing complexity, limited temporal resolution | Road and pavement crack detection, bridge and infrastructure monitoring, urban building facade inspection, railway and highway crack monitoring
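Table 4 distinguishes data-level, feature-level, and decision-level fusion. Decision-level fusion is the simplest to illustrate: each sensor branch produces its own crack probability map, and the maps are combined into one decision. The sketch below shows a weighted-average variant; the sensor weights and the 0.5 cutoff are illustrative assumptions and would normally be calibrated on validation data.

```python
import numpy as np

def decision_level_fusion(prob_optical, prob_thermal,
                          w_optical=0.6, w_thermal=0.4, cutoff=0.5):
    """Fuse per-pixel crack probabilities from two sensor branches
    (e.g., optical and thermal) into a single binary crack mask."""
    fused = (w_optical * np.asarray(prob_optical, dtype=float)
             + w_thermal * np.asarray(prob_thermal, dtype=float))
    return (fused > cutoff).astype(np.uint8)
```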
Table 5. Summary of relevant details of crack classification.
Scale | Image Size (pixels × pixels) | References
Image-based | 227 × 227 | [68,77,104,105]
Image-based | 224 × 224 | [69]
Image-based | 256 × 256 | [106]
Image-based | 416 × 416 | [103]
Image-based | 512 × 512 | [81]
Patch-based | 128 × 128 | [79,107]
Patch-based | 200 × 200 | [49]
Patch-based | 224 × 224 | [51,56,108,109,110]
Patch-based | 227 × 227 | [111]
Patch-based | 256 × 256 | [83,112]
Patch-based | 300 × 300 | [113,114]
Patch-based | 320 × 480 | [115,116]
Patch-based | 544 × 384 | [117]
Patch-based | 512 × 512 | [84,118,119,120]
Patch-based | 584 × 384 | [121]
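The patch-based entries in Table 5 classify fixed-size crops of a large inspection image rather than the whole image, which keeps the classifier input small while preserving fine crack detail. A minimal sketch of the cropping step is given below; the 224 × 224 patch size matches several entries in the table, while the stride value is an illustrative assumption (a stride smaller than the patch size yields overlapping patches).

```python
import numpy as np

def extract_patches(image, patch_size=224, stride=224):
    """Split a large inspection image into fixed-size patches for
    patch-based crack classification, keeping each patch's origin so
    per-patch predictions can be mapped back onto the full image."""
    h, w = image.shape[:2]
    patches, origins = [], []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            origins.append((y, x))
    return np.array(patches), origins
```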
Table 6. Summary of region-based crack detection.
Model | Improvement/Innovation | Dataset | Backbone | Results
Faster R-CNN [135] | Combined with drones for crack detection | 2000 images, 5280 × 2970 pixels | VGG-16 | Precision = 92.03%; Recall = 96.26%; F1 score = 94.10%
Faster R-CNN [138] | A double-head structure is introduced, comprising an independent fully connected head and a convolution head | 1622 images, 1612 × 1947 pixels | ResNet50 | AP = 47.2%
Mask R-CNN [77] | The morphological closing operation is incorporated into the M-R-101-FPN model to form an integrated model | 761 images, 227 × 227 pixels | ResNets and VGG | Balanced accuracy = 81.94%; F1 score = 68.68%; IoU = 52.72%
Mask R-CNN [133] | A PAFPN module and an edge detection branch are introduced | 9680 images, 1500 × 1500 pixels | ResNet-FPN | Precision = 92.03%; Recall = 96.26%; AP = 94.10%; mAP = 90.57%; Error rate = 0.57%
Mask R-CNN [139] | The FPN structure introduces a side-connection method, FPN is combined with ResNet-101, and the RoI-Pooling layer is replaced with a RoI-Align layer | 3430 images, 1024 × 1024 pixels | ResNet101 | AP = 83.3%; F1 score = 82.4%; Average error = 2.33%; mIoU = 70.1%
YOLOv3-tiny [94] | A structural crack detection and quantification method combined with structured light is proposed | 500 images, 640 × 640 pixels | Darknet-53 | Accuracy = 94%; Precision = 98%
YOLOv4 [137] | Lightweight networks replace the original backbone feature extraction network; DenseNet, MobileNet, and GhostNet are selected as the lightweight networks | 800 images, 416 × 416 pixels | DenseNet, MobileNet v1, MobileNet v2, MobileNet v3, and GhostNet | Precision = 93.96%; Recall = 90.12%; F1 score = 92%
YOLOv4 [55] | - | 1463 images, 256 × 256 pixels | Darknet-53 | Accuracy = 92%; mAP = 92%
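Most of the Table 6 models start from a detector pretrained on a generic object detection dataset and replace its prediction head with a single "crack" class before fine-tuning. The sketch below shows this adaptation for a torchvision Faster R-CNN; it uses the library's ResNet-50 FPN backbone for convenience, which differs from the VGG-16 and ResNet-101 backbones listed in the table, and the weights="DEFAULT" argument assumes a recent torchvision release.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_crack_detector(num_classes=2):
    """Adapt a COCO-pretrained Faster R-CNN to crack detection by
    replacing its box-prediction head (class 0 = background, 1 = crack)."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model  # fine-tune on crack images with bounding-box annotations
```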
Table 8. Open-source crack detection datasets.
Dataset Name | Number of Images | Image Resolution | Manual Annotation | Scope of Applicability | Limitations
CrackTree200 [58] | 206 images | 800 × 600 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | With only about 200 images, the relatively small size can hinder a model's ability to generalize across diverse conditions and may lead to overfitting on the specific examples provided
Crack500 [59] | 500 images | 2000 × 1500 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | Limited number of images compared with larger datasets, which may affect the generalization of models trained on it
SDNET 2018 [60] | 56,000 images | 256 × 256 pixels | Pixel-level annotations for cracks | Crack classification and segmentation | The dataset's focus on concrete surfaces may limit performance when models are applied to other types of surfaces or structures
Mendeley data—crack detection [69] | 40,000 images | 227 × 227 pixels | Pixel-level annotations for cracks | Crack classification | The dataset may not cover all types of cracks or surface conditions, which limits its applicability to a wide range of real-world scenarios
DeepCrack [71] | 2500 images | 512 × 512 pixels | Annotations for cracks | Crack segmentation | The resolution may limit the ability of models to capture very small or subtle crack features
CFD [78] | 118 images | 320 × 480 pixels | Pixel-level annotations for cracks | Crack segmentation | The limited number of samples may restrict the generalization ability of trained models
CrackTree260 [150] | 260 images | 800 × 600 pixels and 960 × 720 pixels | Pixel-level labeling, bounding boxes, or other crack markers | Object detection and segmentation | Because the dataset is small, models can easily overfit the training data, especially complex models
CrackLS315 [151] | 315 images | 512 × 512 pixels | Pixel-level segmentation mask or bounding box | Object detection and segmentation | The small dataset size may cause models to perform poorly in complex scenarios, especially on different or uncommon crack types
Stone331 [152] | 331 images | 512 × 512 pixels | Pixel-level segmentation mask or bounding box | Object detection and segmentation | The relatively small number of images limits generalization; small datasets tend to lead to overfitting in deep learning tasks
Table 9. Performance evaluation index of crack detection model.
Index | Index Value and Calculation Formula | Curve
True positive | $TP$ | -
False positive | $FP$ | -
True negative | $TN$ | -
False negative | $FN$ | -
Precision | $\mathrm{Precision} = \frac{TP}{TP + FP}$ | PRC
Recall | $\mathrm{Recall} = \frac{TP}{TP + FN}$ | PRC, ROC curve
F1 score | $\mathrm{F1\ Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ | F1 score curve
Accuracy | $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ | Accuracy vs. threshold curve
Average precision | $AP = \sum_{i=1}^{n} (\mathrm{Recall}_i - \mathrm{Recall}_{i-1}) \cdot \mathrm{Precision}_i$ | PRC
Mean average precision | $mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$ | -
IoU | $IoU = \frac{\mathrm{Area\ of\ Overlap}}{\mathrm{Area\ of\ Union}}$ | IoU distribution curve, precision-recall curve with IoU thresholds
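For pixel-level crack segmentation, the classification-style indices in Table 9 can be computed directly from a predicted binary mask and its ground-truth annotation by counting pixel-wise TP, FP, TN, and FN. The helper below is a minimal, library-free sketch of those formulas; the function name and its mask inputs are illustrative.

```python
import numpy as np

def pixel_metrics(pred_mask, gt_mask):
    """Evaluate the Table 9 indices for binary masks where 1 marks crack pixels."""
    pred = np.asarray(pred_mask).astype(bool)
    gt = np.asarray(gt_mask).astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0   # overlap / union
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "iou": iou}
```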