A Review of UAV-Based Crack Detection in Civil Infrastructure: A Multi-Level Visual Analysis Framework, Scene Adaptability, and Challenges

Bai, Yue; Quan, Wei; Shi, Xuming; Yan, Zeyi; Yuan, Guoliang

doi:10.3390/rs18111806

Open Access Review

by

Yue Bai ¹, Wei Quan ¹,*, Xuming Shi ², Zeyi Yan ³ and Guoliang Yuan ⁴

¹ School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China

² State Key Laboratory of Infrared Physics, Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200003, China

³ College of Electronic Science and Engineering, Jilin University, Changchun 130012, China

⁴ College of Physics, Northeast Normal University, Changchun 130024, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(11), 1806; https://doi.org/10.3390/rs18111806

Submission received: 21 April 2026 / Revised: 23 May 2026 / Accepted: 25 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Unmanned Aerial Vehicle-Based Inspection in Infrastructure Maintenance)

Highlights

What are the main findings?

A multi-level visual analysis framework for UAV-based crack inspection is established, organizing existing methods into image-level classification, object-level detection, pixel-level segmentation, geometric quantification, and 3D reconstruction.
Our comparative analysis across bridges, pavements, dams, building facades, and wind turbine blades shows that scene-specific differences strongly influence data acquisition strategies, model selection, and method performance.

What are the implications of the main findings?

This paper provides a systematic methodological reference for advancing UAV-based infrastructure crack inspection from algorithm development toward practical multi-scenario engineering applications.
We identify current research bottlenecks, such as limited multi-scenario generalization and multi-source heterogeneous data fusion, while highlighting future directions like visual foundation models to ensure stable structural health monitoring.

Abstract

Civil infrastructure plays a critical role in ensuring societal safety and economic development. However, structural damage such as cracking inevitably occurs during long-term service. Traditional manual inspection methods are insufficient to meet the demands of large-scale and routine monitoring. Unmanned Aerial Vehicle (UAV) remote sensing has become an important approach for Structural Health Monitoring (SHM), owing to its high spatial resolution imaging capability and superior operational flexibility. Nevertheless, existing studies focus on optimizing individual algorithms and lack a systematic analysis oriented toward multi-scenario engineering applications. Therefore, we present a comprehensive review of UAV-based crack detection techniques for infrastructure using remote sensing imagery. First, publicly available datasets, UAV platforms, and evaluation metrics are systematically summarized. Then a multi-level visual analysis framework for UAV inspection is established. The framework categorizes existing methodologies into five levels: image-level classification, object-level detection, pixel-level segmentation, geometric quantification, and three-dimensional (3D) reconstruction, followed by a systematic evaluation of representative methods. Furthermore, the applicability of different methods across diverse scenarios, including bridges, pavements, dams, building facades and wind turbine blades, is systematically explored. Finally, the key challenges and future research directions are discussed. This review aims to provide a systematic theoretical foundation and methodological reference for advancing UAV-based infrastructure crack inspection from algorithm development toward practical multi-scenario engineering applications.

Keywords:

civil infrastructure; UAV remote sensing; Structural Health Monitoring; deep learning; multi-scenario analysis

1. Introduction

1.1. Background

Civil infrastructure, including pavements, bridges, dams and buildings, plays a fundamental role in ensuring the stability of societal operations and supporting sustainable economic development [1]. However, during long-term service, these structures are inevitably subjected to crack damage induced by material degradation, environmental temperature variations and overloading, which significantly compromises their durability and structural integrity [2]. According to their morphology and spatial distribution, infrastructure cracks can be broadly classified as longitudinal, transverse, oblique, block, and networked cracks. These crack patterns are related to different deterioration mechanisms, including traffic loading, thermal shrinkage, fatigue damage, differential settlement, and material aging. Their structural implications also vary across infrastructure scenarios. In reinforced concrete structures, cracks may accelerate the ingress of moisture and chloride ions, which promotes reinforcement corrosion and reduces structural capacity. In pavements, crack propagation can promote water infiltration, weaken the subgrade, and induce secondary distress. These risks establish cracks as early visual indicators of structural deterioration and potential safety hazards. Regular and effective inspection and maintenance of infrastructure surfaces are essential for mitigating structural deterioration and preventing structural failure. Currently, manual inspection remains the primary approach for the maintenance and monitoring of civil infrastructure. However, it suffers from several limitations. First, it incurs high operational costs, as crack detection requires substantial labor input and a variety of specialized tools, making it difficult to meet the demands of comprehensive and routine maintenance. Second, its accessibility and spatial coverage are limited. Constrained by the geographic environment and structural configurations, certain inspection tasks require auxiliary equipment such as vessels or aerial work platforms, which inherently restricts inspection efficiency and coverage. Third, it is susceptible to subjective factors. Manual inspection depends heavily on the experience of inspectors and on environmental conditions, resulting in significant uncertainty in objectivity and precision [3,4,5]. Therefore, to overcome these inherent limitations, infrastructure maintenance is transitioning from manual-dominated surveys toward intelligent monitoring based on digital data acquisition and model-assisted analysis. Unmanned Aerial Vehicle (UAV) platforms offer several advantages, including low deployment cost, flexible data acquisition perspectives, and the capability for unified deployment and reuse across multiple civil infrastructure scenarios. UAVs have therefore become an important remote sensing means of acquiring infrastructure-related information, laying a solid foundation for subsequent crack classification, detection, segmentation and quantification based on UAV remote sensing imagery [6].

1.2. Advantages of UAV-Based Inspection

In practical applications of UAV-based crack monitoring for civil infrastructure, UAV platforms provide high spatial resolution imagery, diverse observation geometries and repeatable temporal acquisition. These advantages effectively compensate for the limitations of traditional manual inspection in continuous and accurate monitoring and in the coverage of complex scenarios [7]. First, high spatial resolution imagery enables fine-grained characterization of morphological changes on infrastructure surfaces, reducing information loss during the imaging process. Cracks on structural surfaces such as pavements, bridges and buildings are typically small in scale, elongated in shape and irregularly distributed. High spatial resolution imagery can preserve these structural details at a finer spatial scale, thereby providing a reliable data basis for subsequent crack analysis [8]. Second, compared with satellite and ground-based sensing systems that are constrained by fixed observation geometries, UAVs provide flexible viewing perspectives and high maneuverability. Through path planning and attitude adjustment, they can perform multi-angle imaging of civil infrastructure with different structural forms in different scenarios, which effectively overcomes the limited accessibility and viewpoints of traditional ground-based inspection [9]. Finally, for multi-temporal data acquisition of civil infrastructure, UAV platforms can use consistent flight parameters, such as flight paths, altitude and acquisition angles, to ensure continuous and consistent data collection over the same target area. This enables the systematic recording and comparative analysis of infrastructure surface conditions over time. Based on multi-temporal UAV imagery, operation and maintenance personnel can quantitatively analyze crack development trends, morphological deterioration rates, and spatial structural changes. Such analysis can improve the accuracy of infrastructure condition assessment, support more reliable decision-making, and enhance the continuity and reliability of long-term infrastructure monitoring [10].
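The link between imaging distance and the crack widths an image can resolve can be illustrated with the standard ground sampling distance (GSD) relation from photogrammetry. The following is a minimal sketch rather than a method taken from the reviewed studies: it assumes a pinhole camera whose optical axis is perpendicular to a flat surface, and the camera parameters are hypothetical values chosen only for illustration.

```python
def ground_sampling_distance_mm(sensor_width_mm, focal_length_mm,
                                distance_m, image_width_px):
    """Approximate width of surface covered by one pixel, in millimetres.

    Pinhole-camera approximation; assumes the optical axis is
    perpendicular to a flat surface at the given distance.
    """
    distance_mm = distance_m * 1000.0
    return (sensor_width_mm * distance_mm) / (focal_length_mm * image_width_px)


def crack_width_px(crack_width_mm, gsd_mm):
    """Number of pixels spanned by a crack of the given physical width."""
    return crack_width_mm / gsd_mm


# Hypothetical camera parameters (assumptions, not from the reviewed studies).
SENSOR_WIDTH_MM = 13.2
FOCAL_LENGTH_MM = 8.8
IMAGE_WIDTH_PX = 5472

gsd_near = ground_sampling_distance_mm(SENSOR_WIDTH_MM, FOCAL_LENGTH_MM,
                                       5.0, IMAGE_WIDTH_PX)
gsd_far = ground_sampling_distance_mm(SENSOR_WIDTH_MM, FOCAL_LENGTH_MM,
                                      50.0, IMAGE_WIDTH_PX)

for label, gsd in (("5 m", gsd_near), ("50 m", gsd_far)):
    span = crack_width_px(2.0, gsd)
    print(f"{label}: GSD = {gsd:.2f} mm/px, a 2 mm crack spans {span:.2f} px")
```

Under these assumed values, a tenfold increase in imaging distance makes each pixel cover ten times more surface, so a 2 mm crack that spans more than one pixel at 5 m falls below one pixel at 50 m.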

1.3. Related Work

With the rapid development of UAV platforms and the continuous advancement of machine learning algorithms in feature modeling and pattern recognition, data acquisition and feature extraction capabilities in various scenarios have been significantly improved. These advances hold strong potential for applications in civil infrastructure monitoring. Between 2015 and 2026, substantial research has emerged regarding image datasets, detection models and optimization algorithms for diverse civil infrastructure scenarios. To systematically analyze the development trends in this field, a literature search was conducted using the Web of Science and Google Scholar databases. By employing “cracks” and “UAV” as search keywords, a total of 713 journal articles were initially retrieved. The keyword “infrastructure” was further incorporated for refined filtering, resulting in a final set of 159 publications. The annual publication trend is shown in Figure 1.

To further analyze research hotspots, a clustering analysis of keyword co-occurrence was conducted based on the 159 selected studies. The results are shown in Figure 2. The high-frequency keywords include “UAV”, “deep learning”, “crack detection”, “inspection” and “convolutional neural network (CNN)”, indicating that UAV-based crack detection for civil infrastructure and deep learning techniques are the main research focuses.

Existing review studies have mainly summarized crack detection research from the perspectives of computer vision methods, sensor technologies and data quality evaluation. Most studies focus on methodological frameworks, technical implementations, and performance assessments. First, existing studies based on computer vision methods [11,12,13] mainly focus on network architectures and data optimization. However, they lack a systematic evaluation of crack detection performance under multi-scale remote sensing imaging conditions, as well as its applicability and generalization capabilities across diverse scenarios. Second, Refs. [14,15,16,17] focus on summarizing the technical principles, data representations and structural damage detection capabilities of various sensors. However, they lack a comprehensive discussion regarding the crack characterization capabilities of different remote sensing sensors under multi-scale imaging conditions, as well as the consistencies and discrepancies in their multi-scenario applications. Third, regarding data characteristics and quality evaluation, Refs. [1,18] focus on analyzing the impact of data characteristics on evaluation performance in civil infrastructure crack datasets. However, they lack further investigation into the relationship between model evaluation results and the practical requirements of civil infrastructure monitoring.

With the gradual maturation of computer vision methods and UAV platform technologies, research on crack detection in civil infrastructure based on UAV remote sensing imagery has shifted from a stage focused on algorithm performance validation to one oriented toward complex engineering requirements and practical deployment. Critical civil infrastructure such as bridges, buildings, and dams is typically constructed using reinforced concrete. Cracks in these structures often appear as elongated or network-like patterns and are characterized by complex surface textures, low contrast and susceptibility to environmental interference [19]. Accordingly, related studies mainly focus on robust feature extraction under complex backgrounds, pixel-level edge segmentation, three-dimensional (3D) localization and geometric quantification methods [5,20,21,22,23,24]. In pavement scenarios, cracks often exhibit complex patterns and are easily confused with background elements such as road markings and shadows. Consequently, research focuses on multi-scale feature modeling and lightweight architecture optimization. These approaches aim to improve the recognition accuracy of small crack targets while maintaining inference efficiency during large-scale inspections [25,26,27,28]. Furthermore, UAV imaging of wind turbine blades at high altitudes is highly susceptible to motion blur. Related studies mainly employ image enhancement, preprocessing, and multi-source information fusion to reduce motion-induced degradation. The combined use of these techniques improves crack detection robustness under complex reflective textures and low-contrast conditions [29,30,31].

1.4. Contributions

Existing reviews lack a unified analytical framework for UAV remote sensing imaging conditions, multi-scenario applications and methodological organization. To address this gap, we provide a systematic analysis of data, methods and scenario-based applications for crack detection in UAV-based civil infrastructure inspection. The main contributions are summarized as follows:

We establish a multi-level visual analysis framework for UAV-based scenarios. From the perspective of visual task hierarchy, we systematically categorize existing methods into five levels: image-level classification, object-level detection, pixel-level segmentation, geometric quantification and 3D reconstruction. The framework offers a clear methodological structure for civil infrastructure inspection.
We comparatively analyze the engineering applicability of crack detection methods in multi-scenario UAV-based inspection. We consider typical civil infrastructure scenarios, including bridges, pavements, dams and wind turbine blades. We compare different methods in practical engineering contexts and reveal how scenario-specific differences influence model selection and performance.
We summarize the key challenges in UAV-based crack inspection and discuss future research directions. We identify current research bottlenecks, including limited multi-scenario generalization, constraints of UAV platforms and challenges in multi-source heterogeneous data fusion. We further discuss future directions in light of advances in remote sensing imaging and intelligent perception technologies.

The remainder of this review is organized as follows. Section 2 summarizes the public datasets, UAV platforms, and evaluation metrics commonly used in UAV-based crack inspection. Section 3 introduces the multi-level visual analysis framework and systematically organizes representative methods under the corresponding task hierarchy. Section 4 analyzes the applicability of these methods across typical civil infrastructure scenarios, while further identifying key challenges and future research directions.

2. Datasets, UAV Platform and Evaluation Metrics

UAV remote sensing has become an important technique for public infrastructure inspection. A common research pipeline starts with training on public datasets, followed by transfer to UAV scenarios and scenario-specific evaluation. Use of well-established public crack datasets helps reduce annotation and model development costs. Moreover, by transferring the trained model to UAV-acquired imagery for validation and engineering applications, it also provides an effective reference for performance comparison and applicability assessment of different methods under multi-scenario conditions. Therefore, this section systematically reviews three aspects: the characteristics of public datasets, the attributes of UAV platforms, and the evaluation metric system, laying the foundation for subsequent discussions on method applicability and comparative performance evaluation in multi-scenario applications.

2.1. Datasets

Existing studies rely on a variety of public crack datasets for model training and performance evaluation. Table 1 summarizes and compares the key information of typical crack datasets commonly used in the field of infrastructure crack detection, such as scale, collection platform, detection task type and application scenario.

The download links, original sources, and availability information for the public datasets are provided in our GitHub repository: https://github.com/Arthasyue/Crack-detection-public-dataset-collection (accessed on 24 April 2026). For datasets without public download links, the corresponding original publications are provided as the primary source.

Comprising 500 high-resolution, pixel-annotated images captured via smartphones, Crack500 [32] reflects authentic road conditions, including illumination variations, water reflections, and complex pavement textures. Consequently, it serves as a standard benchmark for pavement crack semantic segmentation and model robustness evaluation.

CrackTree206 [33] contains 206 high-resolution images of road surfaces acquired in real-world environments. The images were collected under complex and variable lighting conditions, where cracks often share visual similarity with road shadows and are affected by severe shadow interference. As a result, the dataset provides a challenging benchmark for evaluating the multi-scale feature representation, crack detection performance, and cross-scene generalization capability of deep learning models.

DeepCrack [34] is widely used in studies of crack detection and pixel-level semantic segmentation for asphalt pavements and concrete surfaces. The dataset includes 537 crack images collected from different material surfaces, where the cracks present diverse multi-scale morphological patterns. Given its diversity, DeepCrack is commonly used to assess the ability of deep learning models to capture multi-scale crack features, achieve accurate detection, and generalize across different scenes.

CFD [35] contains 118 urban pavement crack images captured under complex background conditions, including shadows, water stains, oil stains, road markings, and other common sources of visual interference in urban pavement inspection. It is frequently used to evaluate and compare the feature representation and crack detection performance of algorithms in challenging urban environments.

GAPs384 [36] consists of 384 high-resolution asphalt pavement images. Crack patterns closely resemble asphalt grain textures, while illumination variations further complicate crack recognition and extraction. It has been widely adopted to evaluate model discrimination capability and recognition accuracy in complex asphalt backgrounds.

HighRPD [37] is a public dataset for UAV-based road surface distress detection. It comprises 11,696 pavement distress images annotated with bounding boxes for representative defect types, such as linear cracks, large-area cracks, and potholes. Because UAV-based imaging enables wide-area coverage across diverse road scenes, the dataset also exhibits substantial scale variation and complex background interference. These characteristics make it a valuable benchmark for evaluating the road distress recognition capability and aerial-scene generalization performance of detection models.

SDNET2018 [38] is an open benchmark dataset for crack detection and classification of concrete structures. The dataset covers 54 bridge decks, 72 walls, and 104 pavements, and contains 56,000 cropped concrete images. Owing to the presence of rough textures, structural edges, and localized damage, the images present considerable visual complexity. As a result, SDNET2018 is widely used to compare the classification and detection performance of different models in cross-structure scenarios.

UAV-pdd2023 [39] is a UAV-based dataset for road surface distress detection. It was built from aerial images collected over highways, provincial roads, and county roads under sunny and post-rain conditions. The dataset includes 2440 pavement images and 11,158 distress instances, with bounding box annotations for six categories: longitudinal cracks, transverse cracks, alligator cracks, oblique cracks, repairs, and potholes.

Bridge Crack Dataset (BCD) [40] is a public dataset for bridge crack classification. It contains 2068 bridge crack images with a resolution of 1024 × 1024, acquired by a Phantom 4 Pro UAV equipped with a CMOS camera. The images include real environmental interference, such as shadows, water stains, and strong illumination. Crack regions often occupy only a small portion of the image, which increases the practical value of the dataset for model evaluation.

TUT [41] is a multi-scene dataset for evaluating the cross-scene generalization of crack segmentation models. It contains 1408 RGB images from eight complex scenes, including bitumen, cement, bricks, plastic runways, tiles, metal, generator blades, and underground pipelines. Large differences in texture and background make TUT suitable for assessing feature decoupling and generalization across materials and scenes.

CUBIT-Det [42] is a multi-scenario infrastructure defect detection dataset collected using UAV and unmanned ground vehicles (UGV). It contains 5527 ultra-high-resolution inspection images from three public infrastructure scenes: buildings, roads, and bridges. Boundary annotations are provided for cracks, spalling, and moisture defects. Rich structural context and texture details make CUBIT-Det useful for training and evaluating model robustness and engineering applicability in ultra-high-resolution and multi-class defect detection tasks.

Crack Database of the Dam Surface (CDDS) [43] is a non-public dataset for pixel-level crack detection on dam surfaces. It contains 1000 high-resolution images acquired by a UAV during field inspection along a predetermined flight path over a large hydropower dam in the Jialing River Basin.

Dam Surface Inspection (DSI) [44] is a multi-class pixel-level segmentation dataset for dam spillway surface defect inspection. It contains 1711 images collected from the spillway sidewall of the Three Gorges Dam using a climbing robot. The dataset captures real surface conditions, including aging, peeling, and repair traces. Pixel-level annotations are provided for typical defect categories, including cracks, erosion, spots, and patched areas. Safety power cables of the climbing robot are also labeled. DSI is useful for evaluating segmentation performance in dam scenes with multi-scale defects and structural interference.

DTU [45] is a UAV-acquired dataset for wind turbine blade surface inspection. It contains 589 ultra-high-resolution images collected between 2017 and 2018. The original DTU dataset did not include annotations. Annotations were later added in [46], and the annotated dataset defines five defect categories, including missing tooth, erosion, damaged lightning receptor, crack, and paint peeling. A total of 889 defect instances are labeled in 324 images. DTU is suitable for evaluating small-scale defect detection in ultra-high-resolution aerial images of wind turbine blades.

Blade30 [47] is a UAV-based dataset for wind turbine blade surface defect inspection. It contains 1302 images covering 30 complete wind turbine blades. The images were collected from the Gobi desert, offshore, farmland, and mountainous environments. Annotations include real defects, such as cracks and erosion. Non-defect interference is also labeled, including bird droppings, fuel stains, and insect residues. Blade30 supports research on defect classification, detection, and segmentation. It is also useful for evaluating image stitching and defect deduplication in full-blade inspection.

2.2. UAV Platform

Structural Health Monitoring (SHM) is a key technical means to ensure public safety, optimize maintenance costs and extend service life [48]. Traditional inspection methods include manual visual inspection and the use of inspection vehicles or industrial ropes. They are often inefficient and costly and may introduce blind spots and safety risks in complex environments [49]. The integration of drone technology, optical photogrammetry, sensor-based perception and imaging, and computer vision methods is driving the intelligent and automated transformation of nondestructive evaluation technologies for infrastructure crack detection [50]. However, different UAV platforms vary in flight characteristics, endurance, and wind resistance. These differences determine the imaging sensor configurations they can carry and their operational flight capabilities, thereby affecting inspection coverage and image quality [51]. As shown in Figure 3 [52], UAV platforms fall into three main categories: multi-rotor UAVs, fixed-wing UAVs, and hybrid UAVs. This section discusses the structural characteristics of each platform and its applicability to infrastructure crack inspection.

Multi-rotor UAVs are the most commonly used and operationally mature platform for infrastructure inspection. Their payload capacity, flight stability, and wind resistance generally increase with rotor number. Typical configurations include quadrotor, hexarotor, and octorotor systems [53]. A multi-rotor UAV usually comprises an airframe, a flight controller, an inertial measurement unit, a global navigation satellite system, a power system, and rotors. Its takeoff, landing, and hovering performance make it well-suited to precise fixed-point observation and high-resolution imaging for fine defect detection [49,51,53]. Given these structural and imaging characteristics, multi-rotor UAVs are often used for close-range inspection scenarios involving complex structures and confined spaces that require local feature extraction, such as towering bridge piers, high-rise building facades, dams, and wind turbine blades [30,48,49]. However, endurance is constrained by battery capacity, airframe size, weather conditions and other factors: the effective operation time of a single multi-rotor flight is usually between 20 and 40 min, and the maximum flight speed is about 60 km/h, which limits the ability and efficiency of these platforms in large-area inspection to some extent [53,54].
Fixed-wing UAVs offer stable flight, low vibration, long endurance, and high speed. These features enable the acquisition of spatially continuous remote sensing imagery with consistent quality. Fixed-wing platforms are often used for large-scale mapping and data collection [51,53,55]. Their configuration follows the aerodynamic layout of conventional aircraft, including a fuselage, fixed wings, an empennage, and a propulsion system. Lift is generated through the interaction of the wing with the airflow, which supports efficient and stable flight [53]. Owing to these aerodynamic characteristics and structural design, fixed-wing UAVs withstand gusts better than multi-rotor UAVs, allowing them to maintain stability and capture high-quality images even under challenging weather conditions. However, fixed-wing UAVs cannot hover. Their imaging operations also depend on dedicated takeoff and landing areas and careful flight path planning. These constraints place higher demands on site conditions and operator experience. They also limit the use of fixed-wing UAVs for precise close-range inspection in confined spaces [52]. Consequently, fixed-wing UAVs are suitable for macroscopic inspection and mapping of large-scale infrastructure such as highways and airport runways [56]. Such scenarios meet the takeoff, landing, and flight requirements of fixed-wing UAVs, and their spatially continuous layout allows the large-area coverage advantage of fixed-wing UAVs to be exploited.
Hybrid UAVs combine the hovering capability of multi-rotor UAVs with the speed and endurance of fixed-wing platforms. They alleviate the endurance limits of multi-rotor systems and reduce the dependence of fixed-wing UAVs on dedicated takeoff and landing sites [57]. Hybrid UAVs integrate both rotor and fixed-wing systems. Rotors support vertical takeoff, landing, and hovering. After transition, fixed wings provide the main lift, and the propulsion system enables efficient cruise flight [58]. Therefore, the endurance and spatial coverage capability of hybrid UAVs are generally superior to those of multi-rotor platforms, but still inferior to those of fixed-wing platforms [59]. Given these characteristics, hybrid UAVs offer significant advantages in infrastructure inspection tasks that require multi-zone, long-endurance, and diverse operations, such as highways and long-span bridges crossing rivers or seas [52]. However, hybrid UAVs have more complex structural designs and power configurations. This increases manufacturing cost and system integration complexity. Flight mode transitions also require more frequent maintenance of key components, which limits their large-scale application to some extent [60].
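
The endurance and speed figures quoted above for multi-rotor UAVs can be turned into a rough upper bound on the path length of a single flight. This is a back-of-the-envelope sketch only: it assumes constant speed and ignores takeoff, hovering, wind and battery reserve, and the inspection speed and route length used in the second part are hypothetical values.

```python
import math


def max_path_length_km(endurance_min, speed_kmh):
    """Upper bound on the distance flown in one flight at constant speed."""
    return speed_kmh * endurance_min / 60.0


def flights_needed(route_km, endurance_min, speed_kmh):
    """Number of flights needed to cover a route of the given length."""
    return math.ceil(route_km / max_path_length_km(endurance_min, speed_kmh))


# Figures quoted in the text: 20 to 40 min endurance, about 60 km/h maximum speed.
best_case_short = max_path_length_km(20, 60)
best_case_long = max_path_length_km(40, 60)

# Hypothetical values: close-range imaging is assumed to be flown
# far more slowly than the maximum speed.
INSPECTION_SPEED_KMH = 10
ROUTE_KM = 15
flights = flights_needed(ROUTE_KM, endurance_min=20,
                         speed_kmh=INSPECTION_SPEED_KMH)

print(f"Best case per flight: {best_case_short:.0f} to {best_case_long:.0f} km")
print(f"Flights for a {ROUTE_KM} km route at {INSPECTION_SPEED_KMH} km/h: {flights}")
```

Even the best-case bound is modest compared with the length of a highway corridor, and the bound shrinks quickly once a slower inspection speed is assumed, which is consistent with the limitation on large-area inspection noted above.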

In summary, as shown in Table 2, conventional visual inspection may take from a few hours to several days, whereas UAV-based inspection efficiency is closely related to platform endurance and scenario suitability [61]. Among these platforms, multi-rotor UAVs support close-range, high-resolution inspection of complex structures through stable hovering and flexible maneuverability. However, their endurance and spatial coverage remain limited. Fixed-wing UAVs can carry out macroscopic inspection and mapping of large-area infrastructure thanks to their high speed and long flight time, but they usually depend on open takeoff and landing areas and experienced operators. Hybrid UAVs combine large-scale inspection efficiency with hovering capability and can therefore balance macroscopic and fine inspection, although their complex system structure and high maintenance cost limit their large-scale application to a certain extent. Therefore, in public infrastructure inspection, the inspection scope, operating environment, and UAV platform performance should be considered together to balance inspection efficiency and operating cost.

2.3. Evaluation Metrics

In computer vision-based infrastructure crack detection, model performance evaluation is key to measuring the reliability and engineering applicability of an algorithm. Infrastructure cracks usually exhibit fine-scale distribution, irregular topology, and complex background interference, so a single evaluation metric can hardly reflect the overall performance of a model across multi-level tasks such as crack classification, recognition, localization and structural representation [62]. Deep learning models show notable variation in detection accuracy, localization capability, and generalization performance in crack analysis tasks. Their evaluation is also highly sensitive to the choice of metrics [63]. Accordingly, we expand conventional single-metric evaluation into a multi-dimensional framework that covers classification accuracy, spatial localization accuracy, pixel-level segmentation quality, and engineering quantification capability. The framework enables a more comprehensive evaluation of crack detection performance. Table 3 summarizes the evaluation metrics commonly used in public infrastructure crack detection and provides a structured basis for analyzing model performance and engineering applicability.
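
As an illustration of pixel-level segmentation quality, the sketch below computes four widely used overlap metrics from a pair of binary masks. The choice of precision, recall, F1 and intersection over union (IoU) is an assumption made here for illustration, since the contents of Table 3 are not reproduced in this section; the toy masks are synthetic.

```python
import numpy as np


def segmentation_metrics(pred_mask, true_mask, eps=1e-9):
    """Pixel-level precision, recall, F1 and IoU for binary crack masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()    # crack pixels correctly predicted
    fp = np.logical_and(pred, ~true).sum()   # background predicted as crack
    fn = np.logical_and(~pred, true).sum()   # crack pixels that were missed
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return {"precision": float(precision), "recall": float(recall),
            "f1": float(f1), "iou": float(iou)}


# Synthetic ground truth: a one-pixel-wide vertical crack (8 pixels).
true_mask = np.zeros((8, 8), dtype=bool)
true_mask[:, 3] = True

# Synthetic prediction: 6 crack pixels found, 2 placed one column off.
pred_mask = np.zeros((8, 8), dtype=bool)
pred_mask[:6, 3] = True
pred_mask[6:, 4] = True

scores = segmentation_metrics(pred_mask, true_mask)

# The same crack predicted one pixel to the right of its true position.
shifted_scores = segmentation_metrics(np.roll(true_mask, 1, axis=1), true_mask)
```

For a one-pixel-wide crack, a prediction shifted by a single pixel has no overlap with the ground truth, so every overlap metric drops to zero even though the crack was essentially located. This is one reason why thin structures call for more than one evaluation metric.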

3. Multi-Level Analysis Framework for UAV-Based Crack Detection

The application of UAVs in civil infrastructure crack detection has driven the development of SHM toward non-contact, automated and intelligent approaches. UAV platforms offer high spatial mobility and flexible deployment capabilities, enabling the safe and efficient acquisition of high-resolution imagery for complex structures such as bridge piers, building facades and dams [72]. However, UAV imagery is frequently compromised by severe illumination variations, complex background textures and multi-scale spatial resolution changes. Fluctuations in flight altitude, diverse imaging viewpoints and motion blur induced by flight instability further increase the difficulty of feature extraction [73,74]. These multi-dimensional imaging factors significantly increase data uncertainty and scale variations. As a result, they induce cross-scenario feature distribution shifts, imposing stringent requirements on the generalization and robustness of existing computer vision models. Therefore, this section systematically categorizes UAV-based crack analysis methods for civil infrastructure from the perspective of the computer vision task hierarchy. A multi-level visual analysis framework is established, encompassing image-level recognition, object-level localization, pixel-level segmentation, geometric quantification and 3D reconstruction. The framework lays the groundwork for method selection and model design in subsequent multi-scenario applications. The multi-level visual analysis framework is illustrated in Figure 4.

3.1. Crack Classification

As a fundamental task in computer vision, image classification is primarily used to determine the presence of cracks within an input image or image patch, thereby formulating it as a binary classification problem.

3.1.1. Traditional Machine Learning and Handcrafted Feature-Based Methods

Before deep learning was widely applied in SHM, early studies on UAV-based crack classification mainly relied on traditional machine learning (ML) and handcrafted feature engineering methods. These methods employ image processing techniques to extract low-level features such as edge structures, texture patterns, and grayscale distributions. The resulting handcrafted features are subsequently fed into statistical learning classifiers to discriminate between crack and non-crack regions. ML methods do not require complex model training and offer advantages such as low computational cost and simple model structures. To address the real-time inspection requirements of low-cost micro multi-rotor UAVs, Ref. [22] proposed a visual detection approach based on the Crack Central Point Method. By extracting the geometric features of the cracks, the method employs a Support Vector Machine classifier to perform binary classification. Moreover, some studies combine traditional image processing techniques for crack classification in aerial images. For instance, in the field of wind turbine blade inspection, Ref. [29] proposed a crack analysis method based on grayscale variation features. The method integrates image processing techniques such as motion deblurring, noise suppression, and morphological enhancement. It can analyze the distribution, severity, and development trends of cracks in UAV imagery without requiring large amounts of training data.

Although these traditional methods offer advantages in computational efficiency and deployment cost, their reliance on handcrafted features limits their effectiveness in complex civil infrastructure environments and multi-scale crack scenarios.

3.1.2. Traditional CNN and Binary Classification

In the early stage of applying deep learning methods to SHM, related studies mainly relied on classical deep CNN architectures, such as AlexNet, VGGNet, and ResNet. These models extract features through multi-layer convolutions with local receptive fields and nonlinear activations, capturing both low-level visual cues and high-level semantic representations. This enables effective discrimination of crack features in image patches. Compared with traditional handcrafted feature-based methods, CNN models demonstrate superior performance in feature representation and recognition accuracy [75,76]. Ref. [75] designed a multi-layer CNN architecture and trained it on a concrete crack dataset containing approximately 40,000 image patches. Without image preprocessing, the model achieved a classification accuracy of approximately 98% under real conditions with varying illumination and complex surface textures. Such performance provides a useful benchmark for subsequent crack detection studies.

In UAV-based inspection scenarios, patch-based classification is a widely used strategy for crack recognition. It divides high-resolution aerial images into fixed-size patches through local cropping, converting the crack detection problem into a patch-level binary classification task. Annotators only need to assign a binary label to each patch, avoiding time-consuming pixel-level annotations. Lower annotation cost also improves dataset construction efficiency. For example, Ref. [50] employed a pre-trained VGG16 network to classify UAV-acquired images of building surfaces and performed transfer learning on the SDNET2018 dataset [38], enabling coarse localization of crack regions. In addition, Ref. [77] conducted experiments using ResNet50 and YOLOv8 on multiple public datasets. The results show that ResNet50 achieved an accuracy of over 99% in classification tasks. This indicates that deep CNNs can effectively learn edge structures and texture features of cracks, thereby improving the discrimination of crack regions.
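The patch-based strategy can be sketched as follows. This is a minimal illustration rather than the procedure of any cited study: the image size, the 256-pixel patch size, and the helper names `patch_grid` and `crack_patches` are assumptions introduced here. The last row and column of patches are shifted to sit flush with the image border so that no pixels are skipped.

```python
def patch_grid(height, width, patch, stride=None):
    """Return (top, left) corners of fixed-size patches covering an image."""
    stride = stride or patch
    tops = list(range(0, max(height - patch, 0) + 1, stride))
    lefts = list(range(0, max(width - patch, 0) + 1, stride))
    # Add a final patch flush with the border when the grid leaves a gap.
    if tops[-1] + patch < height:
        tops.append(height - patch)
    if lefts[-1] + patch < width:
        lefts.append(width - patch)
    return [(top, left) for top in tops for left in lefts]


def crack_patches(corners, labels):
    """Keep the corners of patches that a classifier labelled as crack (1)."""
    return [corner for corner, label in zip(corners, labels) if label == 1]


# Hypothetical 1000 x 1500 pixel aerial image cut into 256 x 256 patches.
corners = patch_grid(1000, 1500, 256)

# Hypothetical classifier output: only the first and last patch contain cracks.
labels = [0] * len(corners)
labels[0] = labels[-1] = 1
coarse_locations = crack_patches(corners, labels)
```

The retained corners give only a coarse, patch-resolution localization, which is why the approach avoids pixel-level annotation but cannot describe crack geometry.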

However, CNN-based crack classification methods still exhibit certain limitations in UAV-based inspection applications. Deep CNN models typically involve high computational complexity, characterized by large numbers of floating-point operations (FLOPs) and parameters. As a result, they often rely on high-performance GPUs for efficient inference, making real-time crack classification difficult on resource-constrained onboard UAV platforms. In addition, patch-based binary classification lacks the ability to model global spatial context. When individual patches contain local texture interference, non-crack structures may be misclassified as cracks, leading to a higher rate of false positives.

3.1.3. Lightweight Classification Networks for UAV Edge Computing

To address the high computational demands of traditional CNN models, recent UAV-based crack image classification studies have increasingly shifted toward lightweight network architectures. These networks redesign convolution operations to reduce model parameters and computational complexity while maintaining effective representation of crack edges and texture features. This makes them suitable for real-time inspection and edge computing on UAV platforms. MobileNet and ShuffleNet are representative lightweight architectures. MobileNet reduces model parameters and FLOPs by replacing standard convolutions with depthwise separable convolutions. However, depthwise convolution processes each channel independently, which limits cross-channel information interaction. To improve feature fusion across channels, ShuffleNet introduces a channel shuffle mechanism, enabling information exchange between channel groups through feature reorganization.
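The parameter saving and the shuffle operation can be illustrated with a short sketch. The layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are hypothetical, bias terms are ignored, and the function names are introduced here for illustration only.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise mix."""
    return k * k * c_in + c_in * c_out


def channel_shuffle(channels, groups):
    """Interleave channels so that each group receives channels from all groups."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


standard = standard_conv_params(3, 64, 128)
separable = depthwise_separable_params(3, 64, 128)
ratio = separable / standard          # equals 1/c_out + 1/k**2

shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard layer, and the shuffle turns the group layout [0, 1, 2 | 3, 4, 5] into an interleaved order so that the next grouped convolution sees channels from both groups.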

Regarding the performance of lightweight classification networks, Ref. [78] evaluated multiple CNNs, including GoogLeNet, MobileNetV2, and ShuffleNet, on a UAV-acquired asphalt runway crack dataset. The results demonstrated that lightweight models can maintain robust classification performance while significantly reducing model parameters and memory usage. Among them, MobileNetV2 achieved the highest accuracy of 92.8%. In addition, Ref. [79] proposed a lightweight convolutional network, MobileCrack, based on the MobileNet architecture for patch-level classification of asphalt pavement cracks, achieving high recognition accuracy. These studies show that lightweight CNNs reduce model size and computational complexity while maintaining strong classification performance. Lower computational demand reduces the burden on onboard UAV platforms.

3.2. Crack Detection

Crack detection aims to localize crack regions in images using two-dimensional (2D) bounding boxes, enabling both identification and spatial localization of cracks. Confidence scores are also generated to indicate the reliability of the predictions. Current UAV-based crack detection research mainly focuses on two-stage detectors, one-stage detectors, and detection frameworks that incorporate attention mechanisms and Transformer architectures. Figure 5 summarizes the distribution of different detection methods in recent UAV-based crack detection studies. Methods based on the YOLO series, Transformer, and attention mechanisms account for a large proportion, indicating that balancing detection accuracy, real-time deployment requirements, and global feature modeling has become a key research direction in UAV inspection scenarios.

3.2.1. Two-Stage Detectors

R-CNN and its variants, including Fast R-CNN and Faster R-CNN, drove the development of deep learning-based object detection and laid the foundation for two-stage detectors based on region proposals. Candidate regions are generated first. Classification and bounding box regression are then performed for object recognition and localization. Faster R-CNN is a representative model of two-stage detectors. It replaces the computationally expensive Selective Search algorithm with a Region Proposal Network (RPN) to generate candidate regions. The RPN shares convolutional feature maps with the subsequent detection network. At each location on the feature map, it predicts objectness scores and bounding box offsets for anchors of different scales and aspect ratios, enabling efficient generation of Regions of Interest (ROI). The ROI Pooling layer then aggregates region features into a fixed size, which are finally fed into fully connected layers for object classification and bounding box regression.
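The anchor mechanism can be sketched in a few lines. The stride of 16 pixels, the three scales, and the three aspect ratios below are the commonly quoted Faster R-CNN defaults and are used here as assumptions; they are not settings reported by the crack detection studies reviewed in this section. The offset decoding follows the usual R-CNN box parameterization.

```python
import math


def anchors_at(col, row, stride, scales, ratios):
    """Anchors (cx, cy, w, h) centred on one feature-map cell, in image pixels."""
    cx = (col + 0.5) * stride
    cy = (row + 0.5) * stride
    boxes = []
    for scale in scales:
        for ratio in ratios:              # ratio = height / width
            w = scale / math.sqrt(ratio)  # keeps the area close to scale**2
            h = scale * math.sqrt(ratio)
            boxes.append((cx, cy, w, h))
    return boxes


def decode(anchor, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor box."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = offsets
    return (xa + dx * wa, ya + dy * ha, wa * math.exp(dw), ha * math.exp(dh))


SCALES = (128, 256, 512)    # assumed anchor sizes in pixels
RATIOS = (0.5, 1.0, 2.0)    # assumed height/width ratios
anchors = anchors_at(col=10, row=5, stride=16, scales=SCALES, ratios=RATIOS)

# A hypothetical prediction that widens the first anchor and shifts it right.
proposal = decode(anchors[0], (0.1, 0.0, 0.2, 0.0))
```

Each feature-map location therefore contributes nine candidate boxes in this configuration. Because cracks are often far more elongated than these ratios, a crack-specific detector might need a different ratio set, which is a design consideration rather than a result stated in the source.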

In UAV-based inspection of civil infrastructure, the imaging environment is often complex and subject to significant texture interference. Faster R-CNN is therefore commonly adopted as a baseline model for crack detection. For example, Ref. [79] applied a Faster R-CNN model with a VGG16 backbone to detect cracks in bridge images acquired from oblique views, achieving a precision of 92.03%. In addition, Ref. [21] proposed an integrated FRCNN-ResNet framework for crack detection in concrete structures. Faster R-CNN was employed to generate candidate regions, followed by classification and bounding box regression for crack localization and identification. ResNet was incorporated to alleviate the vanishing gradient problem in deep networks and enhance feature representation. The model achieved a precision of 93.3% with an inference time of approximately 59.7 ms.

However, two-stage detectors process candidate regions separately for feature extraction, classification, and bounding box regression. Region-wise processing increases computational complexity and results in higher inference latency and slower detection [80,81]. In UAV inspection scenarios, high-speed operations typically require near real-time video processing. Therefore, the processing speed of two-stage detectors becomes a limiting factor in such applications. To balance detection accuracy and computational efficiency, one-stage detectors have been increasingly adopted for crack detection.
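As a rough worked example of this limitation, the inference time of approximately 59.7 ms reported above for the FRCNN-ResNet framework can be converted into a frame rate and compared with a video stream. The 30 frames per second stream rate is an assumption chosen for illustration, not a figure from the source.

```python
def frames_per_second(latency_ms):
    """Throughput when frames are processed one after another."""
    return 1000.0 / latency_ms


def keeps_up(latency_ms, stream_fps):
    """True if each frame is processed within the time budget of the stream."""
    return latency_ms <= 1000.0 / stream_fps


detector_fps = frames_per_second(59.7)
real_time_at_30 = keeps_up(59.7, stream_fps=30)   # assumed 30 fps video stream
```

Under this assumption the detector processes fewer than 17 frames per second, so roughly every second frame of a 30 fps stream would have to be dropped or buffered.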

3.2.2. One-Stage Detectors

Unlike two-stage detectors, one-stage detectors do not generate region proposals. Object detection is formulated as an end-to-end regression problem, with localization and classification performed within a single network. This improves detection efficiency. Representative one-stage detectors include SSD, RetinaNet, and the YOLO series. However, in UAV-based crack detection tasks, SSD is limited by its feature representation capability and shows relatively weak performance on small targets [82]. RetinaNet alleviates the class imbalance problem through Focal Loss. However, its relatively complex architecture makes it difficult to balance computational efficiency and detection accuracy in real-time deployment scenarios. In contrast, the YOLO series has become a dominant framework for UAV-based crack detection due to its high inference speed, lightweight design, and strong adaptability to edge deployment [83].
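The way Focal Loss counteracts class imbalance can be sketched for a single binary prediction. The form below is the standard formulation, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), with the commonly used defaults gamma = 2 and alpha = 0.25 taken as assumptions; the probabilities in the example are hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for a predicted crack probability p and label y (0/1)."""
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background pixel (confidently correct) versus a hard crack pixel.
easy_background = focal_loss(p=0.05, y=0)
hard_crack = focal_loss(p=0.30, y=1)

# With gamma = 0 and alpha = 0.5 the loss reduces to half the cross-entropy.
half_cross_entropy = focal_loss(p=0.9, y=1, gamma=0.0, alpha=0.5)
```

The modulating factor (1 - p_t)^gamma shrinks the contribution of the many easy background samples, so the rarer and harder crack samples carry more of the gradient.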

To address the challenges in UAV-based inspection, particularly small crack targets and strong background interference, recent studies have improved different versions of the YOLO framework to enhance detection accuracy and real-time performance. For example, Ref. [84] proposed a YOLOv4-FPM model by incorporating Focal Loss and channel pruning into YOLOv4. The model achieved 119 FPS and 97.6% mAP in UAV-based bridge crack detection. These results show that redundant parameters can be reduced while maintaining high accuracy, which supports real-time deployment in UAV inspection scenarios. The design also improves the detection of subtle crack features. To further address issues such as cloud shadows, low contrast, and noise in UAV imagery, Ref. [85] proposed a GC-YOLOv5s network for road crack detection. The model is built on YOLOv5 and introduces a Focal-GIoU loss with dynamic cross-entropy weighting.

The evolution of YOLO toward anchor-free detection and dynamic label assignment has improved its feature representation for small target detection. Accordingly, methods built on newer YOLO frameworks have become a major direction of current research. For example, Ref. [25] proposed MUENet based on YOLOX. The network introduces the MADPM module to enhance crack morphology feature extraction and employs the TI-UFS module for multi-level feature fusion. It further adopts an improved dynamic label assignment strategy to optimize label assignment and bounding box regression for sparse small targets. These designs improve both detection accuracy and real-time performance in UAV-based pavement crack detection. In addition, YOLOv8 shows strong capability in feature extraction and multi-scale fusion. For instance, Ref. [86] proposed an improved MRC-YOLOv8 for UAV-based inspection of mountainous roads with complex terrain. The model achieved 92.3% mAP, which is 1.2% higher than YOLOv5s. In recent years, object detection frameworks have further evolved toward end-to-end paradigms. YOLOv10 demonstrates promising performance in extremely small target detection. For example, Ref. [27] proposed YOLO-LSN based on YOLOv10 for detecting micro-settlement cracks in mining areas. Experimental results show clear improvements over several mainstream detection models, highlighting its effectiveness for small crack detection in complex scenarios.

3.2.3. Attention Mechanisms and Transformer-Based Methods

Although the YOLO series is effective in local feature extraction, its convolutional operations rely on a fixed receptive field, which limits its ability to model long-range spatial dependencies. In UAV-based remote sensing imagery, cracks in civil infrastructure often appear as elongated, curved, or intersecting structures. These patterns may extend across large spatial regions. To better capture such spatially extended patterns, recent studies have introduced attention mechanisms and Transformers into detection frameworks. For example, Ref. [87] proposed a CNN-Transformer hybrid architecture. The model adopts a shifted-window self-attention mechanism, enhancing long-range spatial dependency modeling. It achieved 95% precision and 43.5 FPS on a dataset of 1560 UAV road images. In addition, Ref. [88] proposed the LSKA-SPFF module, which constructs a spatial attention mechanism with large-kernel convolution. Crack geometry representation is enhanced during feature fusion.

In summary, UAV-based crack detection models are evolving from traditional CNN architectures toward hybrid frameworks that integrate attention mechanisms and Transformers. These models improve global representation and robustness in complex inspection scenarios by capturing long-range dependencies and enhancing feature representation.

3.3. Crack Segmentation

Crack detection can identify the location and distribution of cracks. However, bounding boxes contain substantial background pixels and cannot accurately represent crack geometry. As a result, pixel-level segmentation has become an important approach for quantitative analysis of cracks in civil infrastructure. Semantic segmentation performs dense prediction by assigning a semantic label to each pixel. It produces a binary mask with the same resolution as the input image, which allows precise representation of crack morphology and complex boundary structures. As shown in Figure 6, U-Net-based architectures are the most widely used in UAV-based crack segmentation studies. To address background interference and discontinuous crack textures, recent methods incorporate attention mechanisms and Transformer-based modules to enhance global feature modeling.

3.3.1. U-Net-Based Crack Segmentation

The U-Net architecture consists of a symmetric encoder–decoder structure with skip connections. The decoder progressively restores spatial resolution through upsampling, while the skip connections pass fine-grained texture and boundary information from shallow encoder layers to the decoder. This helps recover spatial details lost in pooling operations. In UAV-based remote sensing imagery, cracks often exhibit elongated shapes, blurred boundaries, and scale variations. To address these challenges, existing studies improve the U-Net architecture from three aspects: multi-scale feature fusion, network structure redesign, and lightweight design.

In terms of multi-scale feature fusion, Ref. [89] proposed ARD-Unet to address severe scale variations in UAV-based pavement crack detection. The model introduces depthwise separable residual blocks to mitigate gradient vanishing in deep networks. It also incorporates an atrous spatial pyramid attention module to capture multi-scale contextual features. The model achieved 76.41% mIoU and 74.24% F1-score on a self-constructed dataset.

To address the complex structural characteristics of dam cracks, Ref. [90] proposed the U2Net-ECA-AS model, which is built on the nested residual U-block architecture of U2Net. It integrates efficient channel attention and atrous depthwise separable convolutions. These designs enhance feature representation and expand the receptive field. The model achieved an F1-score of 88.88%.

For lightweight deployment on UAV platforms, Ref. [91] proposed RHACrackNet, a lightweight U-shaped segmentation network. The model introduces a hybrid channel-spatial attention module and depthwise separable convolutions to reduce parameters and computational complexity. It achieved 23.8 FPS in UAV-based pavement crack detection scenarios.

3.3.2. Attention-Based Crack Segmentation

In semantic segmentation, attention mechanisms adaptively reweight feature maps. This enables the network to focus on discriminative regions during feature extraction and fusion, which improves feature representation. Attention mechanisms can be categorized into four types: channel attention, spatial attention, mixed attention, and self-attention.

Channel attention encodes global dependencies across channels and assigns adaptive weights to each channel. It enhances the response to elongated crack structures and suppresses background noise in UAV imagery. As a result, feature representation and segmentation accuracy are improved. For example, Ref. [92] proposed the Cascade-FcaHRNet network for UAV-based bridge crack segmentation. The model introduces a frequency-channel attention mechanism to address the limitation of conventional channel attention in capturing subtle crack features. It leverages multi-spectral information and performs frequency-domain compression to model discriminative channel responses more effectively.

Spatial attention focuses on modeling the importance of different spatial locations in feature maps. It aggregates feature information across the channel dimension and generates a two-dimensional spatial weight map using convolution operations. It allows adaptive reweighting of spatial responses in feature maps. As a result, models with spatial attention can enhance crack regions with geometric continuity in complex backgrounds. For example, Ref. [93] proposed the CrackHAM network for UAV-based pavement inspection. The model uses spatial attention to capture spatial contextual dependencies and strengthen the response of linear crack structures.

Mixed attention combines channel attention and spatial attention. Channel attention suppresses background textures that resemble crack patterns, while spatial attention enhances the continuity of crack structures. For example, Ref. [94] proposed IEDSSNet for landslide crack segmentation. The CBAM module is introduced before the pooling layer, which reduces the background interference caused by complex surface textures and enhances the local characteristics of subtle cracks. It also mitigates the loss of detail caused by pooling and improves boundary clarity and localization accuracy for small cracks. For resource-constrained UAV edge devices, Ref. [95] proposed L-DANet, which incorporates mixed attention modules. The model alleviates the loss of fine crack features in lightweight networks and achieves an F1-score of 90.5%.

Self-attention exploits global feature interactions across the entire image. It captures long-range dependencies between spatial regions. In crack segmentation tasks, it leverages long-range semantic information to establish relationships between crack structures, improving the connectivity of crack topology and enhancing overall segmentation performance. For example, Ref. [26] proposed MS-CrackSeg for UAV-based hyperspectral imagery. The model introduces a multi-scale self-attention fusion module, which can capture multi-scale spatial and spectral features. It enables accurate pixel-level segmentation of fine and network-like cracks.
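The long-range interaction described above can be illustrated with a minimal single-head scaled dot-product self-attention. As a simplifying assumption, the query, key, and value projections are taken to be the identity, and the three two-dimensional tokens are hypothetical feature vectors: tokens 0 and 2 stand for two separated crack fragments with similar features, and token 1 stands for background.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (an assumption)."""
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * tokens[j][i] for j in range(len(tokens)))
                        for i in range(dim)])
    return outputs, weights


tokens = [[1.0, 0.0],   # crack fragment A
          [0.0, 1.0],   # background
          [1.0, 0.0]]   # crack fragment B, far from A in the image
outputs, weights = self_attention(tokens)
```

Fragment A assigns more weight to fragment B than to the background token regardless of how far apart they are in the image, which is the property that helps reconnect discontinuous crack segments.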

3.3.3. Transformer-Based Crack Segmentation

Vision Transformer (ViT) extends the Transformer architecture to visual tasks and provides an important basis for subsequent Transformer-based segmentation models, including Swin Transformer and SegFormer. In crack segmentation, Transformer architectures are useful for capturing long-range structural continuity, especially when cracks are elongated, discontinuous, or distributed across large image regions. The Transformer extends self-attention into a global modeling framework. It divides the input image into a set of non-overlapping patches. These patches are flattened and projected into a one-dimensional feature sequence through linear mapping. Multi-head self-attention captures the relationships between each patch and all other patches. This enables global feature interaction and improves the continuity of fine crack structures [96]. For example, to handle cracks that appear fragmented due to imaging interference, Ref. [97] proposed the TransCrack network. The model uses a Transformer to capture structural and semantic relationships across discontinuous crack regions. It produces complete pixel-level segmentation masks. The results demonstrate the effectiveness of Transformer-based modeling for micro-crack features. In addition, Ref. [98] proposed a CNN-ViT model based on cascaded upsampling for crack segmentation. The model combines convolutional local feature extraction with a lightweight Transformer bottleneck. Its cascaded upsampling strategy further improves boundary preservation, while the compact Transformer design reduces the computational burden of dense prediction.

However, Transformer-based models are limited in capturing fine-grained local structures and often produce coarse crack boundaries. Recent studies have explored hybrid architectures that combine CNN-based local feature extraction with Transformer-based global representation. Hybrid CNN–Transformer architectures have consequently improved pixel-level crack segmentation performance. For example, Ref. [99] proposed a dual-path network. The CNN branch extracts high-frequency texture and fine-grained edge features of cracks. The Transformer branch captures global contextual information. The model achieved superior performance compared with ten mainstream methods across three public pavement crack datasets.

In terms of lightweight design, Swin Transformer introduces a shifted-window mechanism. Self-attention is confined to local regions, reducing computational complexity. Building on the shifted-window design, Ref. [23] proposed IBR-Former to balance computational cost and segmentation accuracy. The model supports lightweight deployment on UAV platforms. It also leverages hierarchical features to mitigate scale variation caused by abrupt changes in flight altitude. SegFormer adopts an All-MLP (multilayer perceptron) decoder to reduce model complexity while preserving global context, and it maintains strong segmentation performance in complex crack scenarios. For example, Ref. [100] proposed CrackScopeNet based on the SegFormer architecture for UAV inspection. The model introduces multi-scale branches and a strip-based contextual attention mechanism. These designs enhance both local detail and global feature representation. It achieved 82.12% mIoU on the CrackSeg9k dataset with only 1.05 M parameters and 1.58 G FLOPs.
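The saving from window-based attention can be estimated with the complexity expressions given in the Swin Transformer paper: for a feature map of h x w tokens with C channels, global multi-head self-attention costs about 4hwC^2 + 2(hw)^2 C operations, whereas attention restricted to M x M windows costs about 4hwC^2 + 2M^2 hwC. The sketch below evaluates both; the sizes h = w = 56, C = 96, and M = 7 are illustrative assumptions and are not taken from IBR-Former or CrackScopeNet.

```python
def global_attention_cost(h, w, c):
    """Approximate operation count of global multi-head self-attention."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * tokens * tokens * c


def window_attention_cost(h, w, c, m):
    """Approximate operation count when attention is limited to m x m windows."""
    tokens = h * w
    return 4 * tokens * c * c + 2 * m * m * tokens * c


global_cost = global_attention_cost(56, 56, 96)
window_cost = window_attention_cost(56, 56, 96, 7)
saving = global_cost / window_cost
```

The global term grows quadratically with the number of tokens while the windowed term grows linearly, so the gap widens as image resolution increases, which matters for high-resolution UAV imagery.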

3.3.4. Mamba-Based Crack Segmentation

Mamba introduces selective state space modeling into visual feature representation. Unlike self-attention, Mamba captures long-range dependencies through state transitions and sequence scanning. The computational cost of global modeling is lower than that of self-attention, offering an alternative route for crack segmentation. Long-range modeling is important for crack images because cracks are often thin, discontinuous, and spatially irregular. However, standard visual state space modules may not fully match the requirements of crack segmentation. Crack morphology, boundary texture, and local topology require more targeted feature modeling.
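The idea of scanning a sequence with an input-dependent state transition can be shown with a deliberately simplified scalar recurrence. This toy sketch is not the Mamba architecture: only the step size is made input-dependent here, whereas Mamba also derives its input and output projections from the input, uses vector-valued states, and relies on an efficient parallel scan. All parameter values and names are hypothetical.

```python
import math


def softplus(x):
    """Smooth positive function used here to keep the step size above zero."""
    return math.log1p(math.exp(x))


def selective_scan(xs, a=-1.0, b=1.0, c=1.0, w_delta=1.0):
    """One left-to-right pass: h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t."""
    state = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)   # input-dependent step size
        decay = math.exp(delta * a)     # lies in (0, 1) because a < 0
        state = decay * state + delta * b * x
        ys.append(c * state)
    return ys


# A single strong response followed by zeros: the state carries it forward.
ys = selective_scan([1.0, 0.0, 0.0])
```

The loop touches each element once, so the cost grows linearly with sequence length, and information from an early position persists in the state with a decay that depends on the later inputs.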

Recently, Mamba-based crack segmentation studies have improved crack representation mainly through structure-aware modeling, dynamic feature filtering, and CNN-Mamba feature fusion. Ref. [101] proposed SCSegamba for structure representation in pixel-level crack segmentation. The model introduces a Structure-Aware Visual State Space module. Gated bottleneck convolution models crack morphology with low parameter cost, while a structure-aware scanning strategy strengthens semantic continuity between crack pixels. Crack topology and texture are therefore better represented under complex interference. SCSegamba achieved 83.9% F1-score and 84.79% mIoU on the multi-scenario TUT dataset with only 2.8 M parameters. In contrast to structure-aware scanning, Ref. [102] proposed CCMamba to enhance crack information extraction from multi-level features. The model designs a Multi-Head Criss-Cross Mamba module, where high-level semantic features generate dynamic convolution kernels. The kernels are applied to low-level features along horizontal and vertical directions. Head attention then fuses directional features into representations sensitive to cracks. The mechanism filters background interference while preserving detailed crack responses. Meanwhile, Ref. [103] proposed a lightweight CNN-Mamba hybrid network for local and global feature modeling. The Mamba Spatially Modulated Convolution module couples boundary-sensitive convolutional features with long-range state space representations. Reciprocal co-modulation attention refines skip features and suppresses background responses. Cross-interaction fusion further strengthens consistency across scales during decoding. The network improves fine crack continuity and boundary recovery.

3.4. Geometric Quantification and 3D Reconstruction of Cracks

Geometric quantification and 3D reconstruction of cracks are essential for transforming visual detection results into quantitative assessment. They involve two types of mappings: scale mapping and spatial mapping. Scale mapping converts image pixel measurements into real-world physical dimensions, enabling the millimeter-level geometry quantification of cracks. Spatial mapping projects 2D crack detection results onto a 3D structural coordinate system. It allows accurate spatial localization of cracks within the structure. It also provides a basis for subsequent crack analysis and evolution monitoring.

3.4.1. Crack Geometric Quantification

Crack geometric quantification extracts parameters such as crack length and width from binary segmentation masks. Pixel-level measurements are then converted into real-world dimensions through scale mapping. Irregular crack boundaries often make direct geometric computation on region contours unstable. Most studies apply skeletonization before quantification to extract the crack centerline. Skeletonization reduces complex crack regions to single-pixel-wide structures while preserving topology and spatial orientation [104]. Common skeletonization methods include the Zhang–Suen thinning algorithm and the medial axis transform. The Zhang–Suen algorithm iteratively removes boundary pixels to generate a continuous skeleton [49]. The medial axis transform computes the Euclidean distance field from foreground pixels to the nearest background boundary. It then extracts local maxima to form the centerline skeleton of crack regions [105].
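The iterative boundary removal of the Zhang–Suen algorithm can be sketched in plain Python as follows. This is a compact textbook-style implementation for illustration, not the code used in Ref. [49]; it assumes the mask is a list of 0/1 rows with at least a one-pixel background border, and the small bar-shaped mask is a hypothetical stand-in for a segmented crack.

```python
def zhang_suen(mask):
    """Thin a binary mask (rows of 0/1 with a background border) to a skeleton."""
    img = [row[:] for row in mask]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    count = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    transitions = sum(1 for i in range(8)
                                      if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= count <= 6 and transitions == 1 and ok:
                        to_clear.append((r, c))
            # Delete in parallel so one sub-iteration sees a consistent image.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# Hypothetical crack mask: a bar three pixels thick inside a 5 x 9 image.
mask = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
skeleton = zhang_suen(mask)
skeleton_pixels = [(r, c) for r, row in enumerate(skeleton)
                   for c, value in enumerate(row) if value]
```

For this bar the surviving pixels lie on the central row, which is the centerline used in the subsequent length and width estimation. The skeleton is somewhat shorter than the bar because thinning also erodes the ends.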

Based on the extracted skeletons, geometric parameters can be derived. Due to the curved and irregular shape of cracks, polyline approximation is commonly used to estimate crack length [105]. The skeleton is divided into consecutive line segments. The total length is obtained by summing the Euclidean distances between adjacent points. Crack width is estimated using the Euclidean distance transform. For each skeleton point, the distance to the nearest boundary is computed. The maximum crack width is defined as twice the largest distance among all skeleton points. The average crack width is calculated as the ratio between the crack area and its length [105]. In addition, some studies adopt the maximum inscribed circle method to estimate the maximum crack width, which improves measurement stability [106].
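These definitions translate directly into a few lines of code. The sketch below assumes that ordered skeleton points, the distance from each skeleton point to the nearest boundary, and the crack area in pixels are already available; all numerical values are hypothetical and the results are in pixel units.

```python
import math


def polyline_length(points):
    """Sum of Euclidean distances between consecutive skeleton points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def maximum_width(boundary_distances):
    """Twice the largest skeleton-to-boundary distance."""
    return 2.0 * max(boundary_distances)


def average_width(area, length):
    """Crack area divided by crack length."""
    return area / length


skeleton_points = [(0, 0), (3, 4), (6, 8)]   # ordered (x, y) positions
boundary_distances = [1.0, 1.5, 1.0]         # from a distance transform
crack_area = 20.0                            # foreground pixel count

length_px = polyline_length(skeleton_points)
max_width_px = maximum_width(boundary_distances)
mean_width_px = average_width(crack_area, length_px)
```

For these values the length is 10 pixels, the maximum width 3 pixels, and the average width 2 pixels; conversion to physical units requires the scale factor discussed later in this subsection.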

However, the reliability of quantification based on skeleton extraction decreases when cracks form complex networks. Although skeletonization simplifies crack regions into centerline structures, intersecting or branching cracks may lead to ambiguous junctions or redundant skeleton branches. Adjacent cracks may also be merged into one connected region when segmentation boundaries are unclear. These problems may introduce errors into the estimation of crack length, width, area, and connectivity [107]. Therefore, measurements derived from skeleton extraction should be carefully examined in complex scenarios, especially when crack boundaries are uncertain or when crack branches are difficult to separate.

To obtain real-world measurements, scale mapping is required. The process estimates a scale factor that defines the physical distance represented by a single pixel. Crack dimensions are then converted from image space to physical space using the estimated factor. Existing methods estimate the scale factor in three ways: reference-based calibration, camera-based calibration, and UAV-based ground sample distance (GSD) estimation. Reference-based calibration uses objects with known dimensions, such as checkerboards, placed near the crack. The scale factor is derived by comparing pixel dimensions in the image with known real-world dimensions. Crack measurements can then be converted from pixel units to physical dimensions [108]. Camera-based calibration derives the scale factor from imaging parameters, including focal length, sensor size, image resolution, and object distance [109]. GSD-based methods estimate the scale factor from the relationship between pixel measurements and ground sample distance. High measurement accuracy and low estimation error make such methods suitable for engineering applications [110,111].
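Two of these routes can be sketched as simple formulas. The reference-based factor divides a known physical size by its measured pixel size. The camera-based factor follows the usual ground sample distance relation, sensor width times object distance divided by focal length times image width, which assumes that the optical axis is perpendicular to the inspected surface. The camera parameters and measurements below are hypothetical.

```python
def scale_from_reference(known_mm, measured_px):
    """Millimetres per pixel from an object of known size in the image."""
    return known_mm / measured_px


def scale_from_camera(sensor_width_mm, focal_length_mm, image_width_px, distance_mm):
    """Millimetres per pixel from imaging parameters (fronto-parallel view)."""
    return sensor_width_mm * distance_mm / (focal_length_mm * image_width_px)


def to_physical(length_px, mm_per_px):
    """Convert a pixel measurement into millimetres."""
    return length_px * mm_per_px


# Reference-based: a 25 mm checkerboard square spans 50 pixels.
reference_scale = scale_from_reference(25.0, 50.0)
crack_width_mm = to_physical(4.0, reference_scale)

# Camera-based: assumed 13.2 mm sensor, 8.8 mm lens, 5472 px width, 5 m range.
camera_scale = scale_from_camera(13.2, 8.8, 5472, 5000.0)
```

With the assumed camera, one pixel covers about 1.37 mm at a distance of 5 m, so sub-millimetre cracks would require a shorter standoff distance or a longer focal length.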

3.4.2. Crack 3D Reconstruction

2D crack perception focuses on crack visibility and measurement in image space and remains limited to planar representation. By contrast, 3D crack reconstruction supports spatial localization, mapping, and representation in 3D space. The process relies on multi-view observations, photogrammetric constraints, and substantial computational resources. In UAV-based infrastructure inspection, 3D reconstruction is commonly achieved using Structure from Motion (SfM) and Multi-View Stereo (MVS). Remote sensing images acquired by UAVs are used to recover camera poses and reconstruct dense 3D scenes, which enables accurate modeling of inspection environments.

SfM extracts and matches stable local features across multiple images. The resulting correspondences are used to estimate initial camera poses. Bundle adjustment then jointly refines camera parameters and 3D point coordinates. A sparse point cloud of the scene is finally generated. MVS builds upon the estimated camera poses. It performs dense stereo matching and depth estimation across multiple views. A dense 3D point cloud is generated. The reconstructed point cloud is then processed through surface reconstruction and texture mapping, producing a 3D mesh model for spatial modeling of cracks on infrastructure surfaces [112,113]. For example, Refs. [112,113] applied SfM and MVS to UAV-based bridge inspection and reconstructed high-precision 3D models. Crack detection results were projected onto the reconstructed models. Ref. [112] demonstrated accurate global localization of cracks on bridge piers and reduced geometric errors caused by non-planar structures, which improved crack width estimation. Ref. [113] further constructed a digital twin model to support visualization and interactive inspection in virtual environments.
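
One simple way to picture how 2D crack results are attached to a reconstructed model is to project each 3D point into an image whose pose is known and read the crack mask at that pixel. The sketch below assumes an ideal pinhole camera with a pose already recovered by SfM; the names, the intrinsics, and the toy data are hypothetical, and occlusion handling is deliberately left out. It is not the procedure of Refs. [112,113], only an illustration of the idea.

```python
def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3D world point to pixel coordinates with a pinhole model.

    Returns None when the point lies behind the camera.
    """
    cam = [
        sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        return None
    return fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy

def label_points(points, crack_mask, rotation, translation, fx, fy, cx, cy):
    """Mark each 3D point True if it projects onto a crack pixel.

    Visibility (occlusion) checks are omitted in this sketch.
    """
    labels = []
    for point in points:
        uv = project_point(point, rotation, translation, fx, fy, cx, cy)
        if uv is None:
            labels.append(False)
            continue
        col, row = round(uv[0]), round(uv[1])
        inside = 0 <= row < len(crack_mask) and 0 <= col < len(crack_mask[0])
        labels.append(inside and crack_mask[row][col] == 1)
    return labels

# Hypothetical camera at the origin looking along +z, and a 5x5 crack mask.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
crack_mask = [[0] * 5 for _ in range(5)]
crack_mask[2][3] = 1
points = [(0.01, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
labels = label_points(points, crack_mask, identity, [0.0, 0.0, 0.0],
                      fx=100.0, fy=100.0, cx=2.0, cy=2.0)
```

In practice, labels from several views would likely be fused, and points hidden behind other geometry would need to be excluded before assigning crack semantics.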

3.5. Multi-Level Task Workflow

The multi-level visual analysis framework describes a progressive process from visual crack recognition in UAV imagery to structured damage representation for engineering applications. Crack classification first supports rapid screening by identifying image patches or local regions with crack-related features in large-scale UAV image collections. Crack detection then localizes candidate defects using bounding boxes and confidence scores, reducing the search space for more detailed analysis. Crack segmentation refines candidate crack regions into detailed masks that preserve boundary details and structural continuity. Geometric quantification builds on these masks to derive measurable structural parameters. Crack length, width, area, and topology are then estimated through skeleton extraction, scale mapping, and spatial measurement. 3D reconstruction extends crack information from 2D imagery to the spatial coordinate system of the inspected structure. By combining SfM and MVS, crack semantics and geometric measurements can be projected onto reconstructed point clouds or mesh models, establishing spatial correspondence between visual defects and physical structures.

The proposed framework organizes UAV-based image acquisition, crack recognition, geometric measurement, and structural assessment within a unified analytical workflow. It transforms cracks from visual image targets into spatially localized and quantitatively measurable damage information. In practical UAV inspection, the role of each task level depends on the inspected structure, imaging distance, surface texture, environmental interference, flight stability, and onboard computing resources. The following section further examines how inspection scenarios shape UAV deployment, task selection, and method performance.

4. Scene Analysis and Key Challenges

UAV remote sensing offers a practical solution for civil infrastructure inspection, owing to its high maneuverability and flexible viewing geometry. However, infrastructure types differ markedly in spatial scale, observation conditions, and imaging characteristics. Image quality can also be compromised during UAV surveys due to viewpoint changes, shadows, and occlusions. These factors introduce two key requirements. UAV platforms must accommodate diverse observation needs across different scenarios. Meanwhile, crack analysis methods should remain robust to scale variation, illumination changes, and complex backgrounds. Therefore, this section considers representative infrastructure scenarios, including bridges, building facades, dams, pavements, and wind turbine blades. It examines how different UAV platforms align with vision tasks at different levels. Major challenges of UAV-based crack detection in cross-scene applications are further identified and summarized. A comprehensive discussion then examines the applicability and reliability of UAV remote sensing for multi-scenario civil infrastructure inspection.

4.1. Multi-Scenario UAV-Based Crack Inspection for Civil Infrastructure

4.1.1. Bridge Scenario

Bridges, as large-scale civil infrastructure, present significant challenges for crack detection due to their structural characteristics and surrounding environments. Bridge structures are large in scale, with tall piers and long spans. Critical components are often located in concealed areas, such as the underside of box girders and beam-column connections. These areas are narrow and difficult to inspect. Furthermore, bridges are often located over water or in canyon regions, which limits the accessibility of conventional inspection methods to target areas [49]. UAV-based bridge inspection commonly employs multi-rotor platforms with strong hovering capability. These platforms can capture high-resolution close-range images of key components and support subsequent crack detection, quantification, and 3D reconstruction. In confined or occluded areas, wall-climbing UAVs can travel along structural surfaces and capture images directly. The design reduces the access limitations of conventional aerial platforms in such environments [114]. A bridge inspection example is illustrated in Figure 7.

UAV-based bridge crack inspection is still constrained by flight conditions, imaging conditions, and material-related interference. When UAVs operate close to bridge structures or water surfaces, rotor airflow interacts with the surrounding environment and causes platform vibration. The vibration degrades image quality and may introduce motion blur [49]. In addition, multi-view imaging leads to scale variations of cracks across images, which affects detection accuracy. Bridge structures are exposed to complex environments for long periods. Concrete structures and steel load-bearing structures exhibit different surface conditions. Their surfaces often show rust, water stains, dirt, and coating damage. These non-structural patterns introduce strong background interference in UAV images and increase the difficulty of crack detection [115].

When the imaging distance or shooting attitude changes, fine crack edges may become blurred, while small crack features may be weakened. Therefore, Ref. [79] first evaluated the maximum imaging distance and UAV hovering accuracy before applying Faster R-CNN for bridge crack localization. Prior evaluation of imaging distance and hovering accuracy reduces the impact of image blur and unstable acquisition on region proposal generation. In addition, surface textures such as rust, water stains, dirt, and coating damage may produce visual responses similar to cracks. To suppress background responses caused by these confounding patterns, Ref. [84] introduced focal loss into YOLOv4. The loss function reduces the weight of easily classified background samples during training and increases the relative contribution of samples with crack-like appearances, thereby improving model sensitivity to small crack targets in complex backgrounds. Network pruning was also adopted to reduce model size and inference cost, making the detector more suitable for real-time UAV inspection.
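
The down-weighting effect of focal loss described above can be seen in a small numerical sketch. It uses the standard binary focal loss form; the default `alpha` and `gamma` values follow common practice and are assumptions here, since the settings used in Ref. [84] are not given in the text.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for one prediction; p is the predicted crack probability."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical background samples (label 0).
easy_background = focal_loss(0.05, 0)  # confidently rejected clean concrete
hard_background = focal_loss(0.60, 0)  # crack-like stain scored as likely crack

# Fraction of the plain cross-entropy that survives the focal modulation.
easy_fraction = easy_background / cross_entropy(0.05, 0)
hard_fraction = hard_background / cross_entropy(0.60, 0)
```

The easy sample keeps only a tiny fraction of its cross-entropy, while the crack-like sample keeps far more, so training gradients are dominated by the confusing cases.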

Bridge cracks often have slender structures, irregular boundaries, and local discontinuities under shadows or surface contamination. Ref. [23] proposed IBR-Former for crack segmentation and quantification under complex bridge inspection conditions. The Transformer-based segmentation structure improves contextual representation for cracks of different scales in complex backgrounds, helping distinguish real cracks from background interference. The independent boundary refinement scheme further corrects boundary pixels by referring to more reliable interior crack pixels, thereby alleviating boundary ambiguity in fine crack segmentation. Accurate crack boundary extraction at the pixel level is critical for quantification, as width measurement directly depends on crack edge delineation. In addition, changes in UAV imaging distance and oblique viewing angles can lead to uneven scale factors at different image positions. The full field scale calibration method in [23] indexes local scale factors under different distances and angles, supporting crack geometric measurement without repeated on-site calibration. Finally, 2D image detection remains limited in describing the spatial position of cracks on bridge structures. To overcome the planar limitation of 2D detection, Ref. [113] generated a 3D digital twin of the bridge based on UAV photogrammetry and integrated crack information into the reconstructed model. Crack recognition was achieved through surface difference analysis in the 3D model, linking visual crack features with bridge spatial geometry. Integration of crack information into the reconstructed 3D model provides a more stable basis for crack documentation, spatial localization, and subsequent structural inspection.

4.1.2. Building Facades Scenario

High-rise building facades are directly exposed to the external environment. Their safety, durability, and load-bearing capacity depend on the structural integrity of surface materials. During long-term operation, facade materials degrade under climate change, temperature variation, and structural aging, leading to crack formation, moisture ingress, and local damage [116,117,118,119]. Conventional facade inspection relies on scaffolding or suspended platforms. These methods require high-altitude operations and depend on manual visual inspection or handheld cameras. They are time-consuming and costly, and they involve safety risks. The results are also affected by human experience and often lack accuracy. To improve inspection efficiency and safety, recent studies have applied UAV remote sensing and computer vision to facade inspection. Multi-rotor UAVs equipped with imaging sensors can acquire facade images without physical contact. The approach supports large-scale inspection and improves both coverage and efficiency [118]. In addition, wall-climbing UAVs can attach to structural surfaces and capture images at close range, allowing detailed observation of local areas [5]. As shown in Figure 8, aerial images can be further processed using dedicated algorithms. Automated detection of cracks, spalling, and other visible defects becomes possible, supporting accurate assessment of facade conditions.

UAV-based facade inspection is affected by airflow disturbances around high-rise buildings. Building-induced turbulence, gusts, and atmospheric effects reduce the hovering stability of multi-rotor UAVs. The resulting loss of image quality makes cracks harder to distinguish from facade structures [116]. In addition, facade materials such as coatings and glass components exhibit reflectance and texture properties similar to cracks. Strong illumination, shadows, and water stains further reduce the separability of crack features [119].

When airflow disturbances around tall building facades reduce UAV stability, facade inspection requires stable close-range image acquisition and fast onboard inference. Ref. [5] developed a wall-climbing UAV and selected SSDLite MobileNetV2 for real-time crack detection. The wall attachment mechanism keeps the camera close to the facade surface and reduces changes in imaging distance, while the lightweight detector supports rapid crack screening under limited mobile computing resources. In addition, facade materials such as coatings, glass, and tiles introduce heterogeneous textures that may weaken crack contrast or create patterns similar to cracks. To reduce the dependence on handcrafted visual cues under complex facade textures, Ref. [77] adopted deep networks for building crack detection. Hierarchical features learned by the network improve crack representation in visually heterogeneous facade scenes.

Facade images also contain many structural components, such as windows, edges, columns, and pipes. These components may generate responses similar to cracks, especially during pixel-level crack analysis. To reduce structural background interference, Ref. [119] proposed a CNN and U-Net framework for facade crack detection. The CNN first removes many facade patches without cracks, particularly regions dominated by facade components. U-Net then segments crack pixels within the retained crack patches, improving crack extraction in complex facade backgrounds. Because crack pixels account for only a small proportion of each patch, weighted binary cross-entropy reduces the dominance of background pixels during training and strengthens the learning of fine crack regions. For small targets and scale variation, Ref. [48] improved YOLOv8n with BiFPN, EC2f, and SAC2f. BiFPN improves feature fusion across different resolutions. EC2f strengthens channel feature interaction with limited computational cost. SAC2f combines spatial and channel attention to enhance local responses associated with cracks and suppress redundant background information. Similarly, Ref. [117] inserted SE attention modules into the YOLOv4 backbone. Channel attention strengthens feature channels sensitive to cracks and suppresses responses from less relevant facade textures.
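
To make the class-imbalance argument concrete, the following sketch compares plain and weighted binary cross-entropy on a patch where one pixel in ten is crack. The inverse-frequency weight is a common heuristic assumed here for illustration; the weighting scheme actually used in Ref. [119] is not stated in the text.

```python
import math

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the crack (positive) term scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)

# Hypothetical patch: nine background pixels and one crack pixel.
labels = [0] * 9 + [1]
# Assumed heuristic: weight crack pixels by the background-to-crack ratio.
pos_weight = labels.count(0) / labels.count(1)

# A model that predicts "background" everywhere, including on the crack pixel.
probs = [0.1] * 10
unweighted = weighted_bce(probs, labels, pos_weight=1.0)
weighted = weighted_bce(probs, labels, pos_weight=pos_weight)
```

Without weighting, the all-background prediction is penalized only mildly because the nine correct pixels dominate the average. With weighting, the missed crack pixel contributes most of the loss.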

4.1.3. Dam Scenario

As critical civil infrastructure for water management and energy production, dams play a pivotal role in flood control, power generation, navigation, and resource allocation. During long-term operation, components such as drift spillways and flood discharge spillways are exposed to high-velocity water flow and environmental weathering, leading to cracks and erosion-related damage on their surfaces [44]. Conventional manual inspection is inefficient and poses safety risks, especially on steep dam surfaces. UAVs enable non-contact inspection of high-risk areas and provide high-resolution aerial images. Combined with computer vision methods, these data support crack detection and structural condition assessment of dam structures [88]. The overall workflow of UAV-based dam crack inspection is shown in Figure 9.

However, under large-scale imaging conditions, small cracks on dam surfaces occupy only a few pixels in UAV images. Extracting features of small targets from high-resolution imagery remains a major challenge. In addition, low texture contrast often occurs in wet, reflective, or shadowed regions, making cracks difficult to distinguish from the background in terms of intensity and texture. Dam surfaces also contain repair marks and multi-scale crack defects. These regions are visually similar to cracks and can lead to false detections and missed detections.

In high-resolution UAV images of dam surfaces, small cracks often appear as sparse pixel regions. Direct downsampling may remove weak crack details. To preserve small targets, Ref. [88] used overlapping sliding-window cropping before detection and restored patch-level results to the original image. For cracks with large scale variation, the NWD loss models bounding boxes as Gaussian distributions. This improves box similarity estimation for small targets, which are sensitive to spatial offsets. LSKA-SPFF further strengthens feature fusion by capturing local structures, long-range dependencies, and channel correlations. Small crack responses are enhanced under background interference. The pruning and distillation strategy reduces model complexity for UAV field deployment.
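
The sensitivity of small boxes to spatial offsets can be illustrated by comparing IoU with a normalized Gaussian Wasserstein distance. The sketch below uses the commonly cited closed form in which each box (cx, cy, w, h) is treated as a 2D Gaussian; the constant `c` is dataset dependent, and the value used here is only an illustrative assumption, as are the example boxes.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between boxes given as (cx, cy, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Squared 2-Wasserstein distance between the two box-shaped Gaussians.
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)

def iou(box_a, box_b):
    """Intersection over union for boxes given as (cx, cy, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union

# Hypothetical 4x4-pixel crack box and two predictions shifted by 4 and 8 pixels.
gt = (10.0, 10.0, 4.0, 4.0)
near = (14.0, 10.0, 4.0, 4.0)
far = (18.0, 10.0, 4.0, 4.0)
iou_near, iou_far = iou(gt, near), iou(gt, far)
nwd_near, nwd_far = nwd(gt, near), nwd(gt, far)
```

Both shifted predictions have zero IoU with the ground truth, so IoU gives no signal about which one is closer. The Wasserstein-based score still ranks the nearer prediction higher, which is the property that makes it attractive for tiny targets.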

For wet, reflective, or shadowed hydraulic surfaces, low contrast and noise often weaken crack boundaries. Ref. [120] addressed boundary degradation with LFPA-EAM-Fast-SCNN. The LFPA module enhances context representation across scales, while EAM uses edge gradient information to refine boundary features in subtle or low-contrast damage regions. Weighted cross-entropy further reduces the dominance of background pixels during training. In addition, dam surfaces contain repair marks, patching, complex textures, and blurred backgrounds, which can produce visual responses similar to cracks. Ref. [43] proposed the CDDS network to reduce visual confusion. Its skip branches connect shallow location features with deeper semantic features, enabling crack pixels to be distinguished from visually similar background regions. For blurred cracks with diverse scales and mixed damage, Ref. [24] proposed MPViT-Crack. Multi-scale patch embedding and the multi-path Transformer structure extract fine and coarse crack features, while LKA captures long-range pixel relationships and channel information. Mix-FFN reduces the accuracy loss caused by different UAV image resolutions, and OHEM increases the learning weight of difficult crack samples. Beyond 2D segmentation, Ref. [24] mapped crack width nephograms onto a 3D dam model, providing spatial quantification of crack width and distribution on curved dam surfaces.
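
The idea behind online hard example mining (OHEM) can be sketched as averaging the loss over only the hardest samples. The top-k selection rule, the `keep_ratio` parameter, and the loss values below are simplifying assumptions for illustration; the exact OHEM variant used in MPViT-Crack is not described in the text.

```python
def ohem_mean_loss(losses, keep_ratio=0.25, min_kept=1):
    """Average the loss over only the hardest (highest-loss) samples."""
    k = max(min_kept, int(len(losses) * keep_ratio))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k

# Hypothetical per-pixel losses: six easy background pixels, two hard crack pixels.
losses = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 1.5, 2.1]
plain_mean = sum(losses) / len(losses)
ohem_mean = ohem_mean_loss(losses, keep_ratio=0.25)
```

With a keep ratio of 0.25, only the two hard pixels contribute, so the training signal is no longer diluted by the many easy background pixels.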

4.1.4. Wind Turbine Blade Scenario

Wind turbine blades are critical components of wind energy systems. Their operating condition directly affects energy conversion efficiency and power output [46]. During long-term service, blades are exposed to extreme weather conditions, including lightning strikes, high-speed raindrop impacts, and salt spray corrosion, as well as cyclic mechanical loads that induce mechanical fatigue. These factors cause material degradation, leading to cracks, coating delamination, and pitting on blade surfaces and within internal structures [121]. Compared with conventional manual inspection, UAV-based inspection offers advantages in safety, coverage, and efficiency. By following predefined flight paths to acquire high-resolution images of blade surfaces, UAVs facilitate the non-contact inspection of multiple turbines in a single deployment. This approach improves operational efficiency and helps maintain system reliability and extend the service life of the blades [47]. The overall workflow of UAV-based crack inspection for wind turbine blades is shown in Figure 10.

Constrained by safety stand-off distances, blade cracks manifest at extremely small scales in UAV imagery, which severely degrades their feature responses and signal-to-noise ratios. In addition, blade coatings have strong reflectance. Consequently, aerial images of these reflective coatings are highly susceptible to overexposure and localized texture loss, severely compromising the separability between cracks and background noise. Complex geographic backgrounds and surface contaminants on blades further interfere with reliable crack discrimination.

Safety distance during UAV inspection makes blade cracks appear as small targets in high-resolution images. Standard resizing may further weaken their feature responses or cause missed detections. To reduce scale loss, Ref. [46] adopted a slice-aided inference strategy for ultra-high-resolution blade images. During inference, each image is divided into overlapping local patches, and detected boxes are then merged back into the original image. The relative scale of small and medium defects is increased while their spatial positions on the blade are retained. In addition, reflective coatings and strong illumination introduce another difficulty. Light-colored cracks and low-definition regions often show weak visual contrast against the blade surface. To improve feature extraction under such conditions, Ref. [31] proposed MI-YOLO. Multivariate information fusion integrates feature extraction based on YOLOv5s, MobileNetV3, and GhostNet, strengthening the representation of faint crack features. The C3TR module further improves the extraction of low-definition targets, while Alpha-IoU balances precision and recall during bounding box regression. Moreover, surface dirt and complex geographic backgrounds may produce interference outside or near the blade region. To reduce background disturbance, Ref. [121] proposed KGP-YOLO with a filter and detector structure. KGP-Net first localizes the blade region through keypoint detection and geometric cropping. Background areas outside the blade are removed before defect detection, reducing interference from complex geographic scenes. The BRCA module is then integrated into YOLOv11 to capture long-range dependencies through dynamic sparse attention, improving the discrimination of complex blade surface defects.
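
The slicing step can be sketched as two small operations: generating overlapping tile origins that cover the full image, and shifting each tile-level box back into full-image coordinates. The tile size, overlap, and function names below are hypothetical, and the suppression of duplicate boxes in overlap regions (for example with non-maximum suppression) is omitted.

```python
def tile_origins(size, tile, overlap):
    """Start offsets of overlapping tiles covering the range [0, size)."""
    stride = tile - overlap
    origins = list(range(0, max(size - tile, 0) + 1, stride))
    if origins[-1] + tile < size:
        origins.append(size - tile)  # final tile placed flush with the border
    return origins

def make_tiles(width, height, tile=512, overlap=128):
    """Top-left corners (x, y) of all tiles for an image of the given size."""
    return [
        (x, y)
        for y in tile_origins(height, tile, overlap)
        for x in tile_origins(width, tile, overlap)
    ]

def to_image_coords(box, origin):
    """Shift a tile-level box (x1, y1, x2, y2) back to full-image coordinates."""
    x1, y1, x2, y2 = box
    ox, oy = origin
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)

# Hypothetical 1000x600 image and one detection found in the last tile.
tiles = make_tiles(1000, 600)
restored = to_image_coords((10, 20, 30, 40), tiles[-1])
```

Each tile is passed to the detector at its native resolution, so a crack that would shrink to a few pixels after whole-image resizing keeps its original pixel footprint.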

4.1.5. Pavement Scenario

The stable operation of transportation infrastructure is essential to socioeconomic development. Asphalt and cement concrete pavements are the primary load-bearing structures of road networks. They are subjected not only to fatigue stress induced by traffic loads but also to persistent environmental effects such as temperature variation and moisture infiltration. Consequently, pavement performance gradually deteriorates, driving the accumulation and upward propagation of micro-scale damage until macroscopic pavement distress ultimately manifests [122]. Among different forms of pavement distress, cracks are the most common and exhibit clear propagation characteristics. They are generally classified into longitudinal cracks, transverse cracks, block cracks, and oblique cracks [123]. A highly accurate and efficient crack inspection system is essential for preventing further deterioration. Timely and targeted preventive maintenance can reduce moisture-induced damage and preserve pavement structural integrity. Compared with conventional pavement inspection based on manual surveys or multifunctional inspection vehicles, UAVs effectively compensate for the physical blind spots of traditional methods owing to their high mobility, low cost, and reduced terrain constraints [86], as illustrated in Figure 11. The integration of UAV platforms with deep learning and computer vision enables multi-scale pavement crack detection and segmentation. It provides a practical and scalable solution for large-scale pavement inspection.

However, during UAV-based pavement inspection, cracks appear as extremely small targets in aerial images. Their features are easily weakened or lost during downsampling. In addition, pavement cracks are distributed across multiple scales and directions. Different crack instances therefore often exhibit blurred boundaries and structural overlap. Road markings, tire skid marks, and asphalt repair strips are also highly similar to real cracks. These factors increase background interference and cause false positives.

To address these limitations, existing studies have adapted feature extraction, instance representation, and loss design to the visual degradation in UAV pavement images. Small cracks weakened by downsampling require feature extraction modules that can preserve weak spatial cues. Ref. [89] proposed ARD-Unet to address feature degradation in small crack extraction. DR-Block reduces feature loss through residual connections, while depthwise separable convolution limits computational cost. ASAM introduces multi-scale contextual modeling and channel attention to suppress interference from road markings, stains, and surface bumps. RFB further enlarges the receptive field and improves the localization of small and irregular crack structures. Cracks distributed across multiple scales and directions place higher demands on scale-adaptive feature fusion. Ref. [122] proposed CrackLite-Net by combining lightweight feature extraction with attention-guided feature refinement. GhostPercepC2f improves sensitivity to fine crack structures under limited computation. SAFPN strengthens feature fusion across different resolutions. SC2f suppresses irrelevant pavement textures through selective channel enhancement. When stains, traffic markings, and shadows show spatial appearances similar to cracks, Ref. [26] used UAV hyperspectral imagery to introduce spectral discrimination. Its 3D convolutional encoder learns spatial and spectral features jointly, while MSSA adaptively fuses multi-scale crack features.
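
The cost saving from depthwise separable convolution follows from a simple parameter count, sketched below with biases ignored. The layer dimensions are hypothetical and are not taken from ARD-Unet.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
dws = depthwise_separable_params(3, 64, 128)
ratio = dws / std  # equals 1 / c_out + 1 / k**2
```

For this layer the separable form needs roughly one eighth to one ninth of the weights, which is why it is a frequent choice for onboard models.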

Under complex pavement backgrounds, Ref. [123] proposed Pavement-DETR to improve localization. CSS attention strengthens channel and spatial responses in defect regions, while Conv3XC improves feature fusion and separates foreground defects from background textures. Small and overlapping defects also increase the difficulty of box regression, so a weighted combination of PIoU v2 and NWD is used to further optimize bounding box regression for such targets. Blurred boundaries and structural overlap between adjacent crack instances require more explicit instance representation. Ref. [124] proposed PDIS-Net, where dynamic convolution kernels are generated from predicted kernel positions and weights. Metric learning and kernel fusion then refine the generated kernels, enabling more reliable separation of adjacent or overlapping pavement distress instances. Finally, onboard UAV deployment requires compact models with stable feature representation. Ref. [85] proposed GC-YOLOv5s, in which Ghost and CSPCM modules reduce model parameters while preserving feature representation. Transposed convolution recovers spatial details, and Focal-GIoU reduces the influence of easy background samples while improving localization for irregular crack targets.
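
The GIoU term that underlies losses such as Focal-GIoU can be sketched as follows. The code shows only the generalized IoU itself, using its standard definition with the smallest enclosing box; how Ref. [85] combines it with focal weighting is not specified in the text and is not reproduced here. The example boxes are hypothetical.

```python
def giou(box_a, box_b):
    """Generalized IoU for axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull

target = (0.0, 0.0, 2.0, 2.0)
overlapping = giou(target, (1.0, 0.0, 3.0, 2.0))
disjoint = giou(target, (4.0, 0.0, 6.0, 2.0))
giou_loss_disjoint = 1.0 - disjoint  # a typical regression loss form
```

Unlike plain IoU, the score keeps decreasing as disjoint boxes move apart, so a regression loss built on it still provides a gradient when a prediction misses the target entirely.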

In summary, Table 4 presents a comparative overview of UAV-based crack inspection methods across bridges, building facades, dams, wind turbine blades, and pavement scenarios. It organizes these methods across multiple dimensions, including UAV platforms, task types, dataset construction, and performance metrics. Their specific strengths and limitations are revealed across different application scenarios. Specifically, three interrelated drivers systematically shape methodological choices across scenarios. First, the relative scale of cracks to imaging distance dictates whether the primary challenge is recovering fine details from small objects or suppressing complex background interference when cracks are imaged at sufficient resolution. Second, the nature of background interference determines the most effective feature representation: crack-like confounders abundant in pavements motivate spatial and channel attention mechanisms, whereas the repair traces and structural joints typical of dams and facades favor global context modeling via dilated convolutions or Transformers. Third, endpoint engineering objectives align with task hierarchy—rapid screening of large pavement networks favors efficient one-stage detectors, while the geometric precision required for bridge and dam assessment shifts emphasis toward segmentation and 3D quantification. These patterns underscore that methodological emphases are rational responses to scene-specific constraints, providing a principled basis for cross-scenario method transfer.

4.2. Key Challenges of UAV-Based Inspection Across Diverse Scenarios

UAV remote sensing has shown significant potential for structural health monitoring (SHM) of civil infrastructure, owing to its high-spatial-resolution imaging capability and flexible deployment. However, civil infrastructure is often characterized by large spans, complex structures, and variable environmental conditions. These factors leave UAV perception, control, and data acquisition vulnerable to environmental disturbances and platform limitations during inspection. As a result, several key challenges remain for UAV deployment across diverse scenarios [6].

During close-range UAV inspection, rotor systems are affected by gusts, wind shear, and wake effects. These disturbances induce high-frequency vibrations that exceed the gimbal’s compensation capability and introduce motion blur into the imaging system. Illumination conditions also impose strong constraints on aerial image quality. In low-light environments, long exposure and high ISO settings introduce blur and noise. Conversely, intense illumination triggers specular reflection, which causes local overexposure and irreversibly erases critical crack features [29].
Current battery energy storage and rotor flight efficiency impose a trade-off between payload capacity and endurance, which limits inspection coverage and image quality to some extent. Most UAV platforms currently rely on lithium-polymer batteries. Under normal payload conditions, a single flight usually lasts only about 30 min [125]. As a result, inspection tasks in large-scale infrastructure scenarios often have to be divided into multiple flight segments. Frequent battery replacement or multi-UAV collaboration is therefore required. Such discontinuous data acquisition fragments aerial imagery across time and space. It also aggravates geometric seam artifacts during image stitching and reduces crack detection accuracy.
To overcome the limitations of single-sensor crack detection in inspection scenarios, UAV platforms are increasingly evolving toward the integration of multi-source heterogeneous sensors, including RGB, thermal infrared, multispectral, and LiDAR sensors [17]. However, heterogeneous sensor integration goes beyond mere hardware stacking. Significant differences exist in sensing mechanisms, imaging modalities, sampling frequencies, and spatial resolutions across sensor types [14]. During UAV inspection under dynamic operating conditions, platform vibration and attitude variation further amplify spatial misalignment among sensors. Different sensors produce inconsistent observations over the same target region, which complicates subsequent image processing and feature extraction.
UAV inspection is increasingly moving toward edge-cloud collaborative frameworks and real-time onboard detection. However, the integration of multi-source heterogeneous sensors has rapidly increased the volume of high-resolution optical imagery, LiDAR point clouds, and other data. These massive data streams sharply raise the computational demand of network inference, including floating-point operations (FLOPs) and dynamic memory usage. Meanwhile, UAV platforms are constrained by payload capacity and power consumption, leaving onboard edge computing systems far less capable than ground-based or cloud-based devices in computing power, memory, and bandwidth. The mismatch makes out-of-memory failures and severe real-time performance degradation highly likely [126].
At the regulatory level, aviation authorities such as the Civil Aviation Administration of China, the Federal Aviation Administration, and the European Union Aviation Safety Agency generally require UAVs to operate within Visual Line of Sight. In addition, inspection tasks in some areas are subject to complex approval and safety certification procedures. These constraints limit the operational range of UAVs and reduce the deployability of large-scale engineering inspection [54]. In dense urban areas or near sensitive facilities, high-resolution sensors may also involve invasive data collection and security risks, raising public concern over privacy infringement and creating substantial regulatory and social resistance. Such regulatory and social resistance constrains the large-scale and routine deployment of UAV inspection [53].

5. Conclusions

This review provides a structured account of the technical framework for crack detection in civil infrastructure based on UAV remote sensing, covering dataset construction, UAV platform characteristics, multi-level computer vision methods, and multi-scenario engineering applications. Its main contribution is to set out the hierarchical organization of UAV crack detection methods and their adaptation to different scenarios, thereby clarifying the core challenges and development directions for large-scale deployment of UAV inspection. First, in terms of the technical framework, a hierarchical path for crack analysis is established from remote sensing images acquired by UAVs in various scenarios, covering image classification, object detection, pixel segmentation, geometric quantification, and 3D reconstruction. Second, in terms of multi-scenario analysis, the review systematically examines the differences in spatial scale, imaging conditions, and background environment among infrastructure types. It also examines how different scenarios affect crack feature representation and model performance, and discusses the applicability and limitations of models under complex backgrounds, small-scale targets, and varied imaging conditions. Finally, it provides a systematic methodological basis and evaluation reference for UAV-centered infrastructure inspection that integrates data collection, perception, and analysis.

Current UAV-based crack inspection of public facilities is limited by insufficient cross-scene generalization, by the physical trade-offs among UAV payload, computing power, and endurance, and by the difficulty of fusing multi-source remote sensing data. Future research can address these limitations in the following directions:

Visual foundation models, such as the Segment Anything Model (SAM), may improve the cross-scene generalization of UAV-based crack inspection. However, direct deployment of these models on UAV platforms remains difficult. Cracks in UAV imagery are usually fine-grained, low in contrast, and small in scale, so features learned from general visual data struggle to represent crack boundaries and weak textures accurately. In addition, foundation models usually require large memory, high computational power, and long inference time. These requirements are difficult to satisfy on UAV edge platforms such as NVIDIA Jetson, where payload, battery power, and onboard computing resources are limited. Future work should explore lightweight foundation models, efficient fine-tuning, model compression, and edge-cloud collaborative inference to balance generalization ability with real-time deployment feasibility.
UAV onboard computing remains far less capable than ground stations, which limits the complexity of models that can run in real time. Airborne edge intelligence and multi-UAV cooperative inspection should be advanced so that distributed processing and collaborative scheduling reduce the data-processing load on any single UAV and improve the efficiency of large-scale infrastructure inspection.
Combining RGB, thermal, and LiDAR data can overcome single-modality weaknesses, but dynamic UAV conditions cause severe spatial misalignment. Ensuring the consistency and complementarity of multi-source heterogeneous remote sensing data in dynamic inspection scenarios is therefore essential for reliable detection.
UAV remote sensing should be integrated with digital twins to move infrastructure inspection from apparent damage detection toward structural state assessment, supporting long-term monitoring and stable operation of infrastructure.
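
As an illustration of the edge-cloud collaborative inference mentioned in the first direction, the following minimal sketch shows one possible per-frame routing rule. It is not taken from any of the reviewed studies: the names `LinkState`, `offload_latency_ms`, and `choose_target`, the latency model, and all numbers are hypothetical assumptions. The rule prefers a larger ground-side model when the data link allows it to meet a latency deadline, falls back to a lightweight onboard model otherwise, and defers the frame to post-flight processing when neither option is fast enough.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Hypothetical snapshot of the UAV-to-ground data link."""
    uplink_mbps: float    # usable uplink throughput, megabits per second
    round_trip_ms: float  # network round-trip time, milliseconds


def offload_latency_ms(image_mb: float, cloud_infer_ms: float, link: LinkState) -> float:
    """Estimate end-to-end latency of sending one frame to a ground or cloud model."""
    if link.uplink_mbps <= 0:
        return float("inf")  # no usable link, so offloading can never finish
    # megabytes -> megabits, divided by throughput, converted to milliseconds
    transfer_ms = image_mb * 8.0 / link.uplink_mbps * 1000.0
    return transfer_ms + link.round_trip_ms + cloud_infer_ms


def choose_target(image_mb: float, edge_infer_ms: float, cloud_infer_ms: float,
                  link: LinkState, deadline_ms: float) -> str:
    """Return 'cloud', 'edge', or 'defer' for a single frame.

    Assumed policy: the ground-side model is larger and generalizes better,
    so it is preferred whenever it can meet the deadline.
    """
    if offload_latency_ms(image_mb, cloud_infer_ms, link) <= deadline_ms:
        return "cloud"
    if edge_infer_ms <= deadline_ms:
        return "edge"
    return "defer"  # store the frame and analyze it after the flight


if __name__ == "__main__":
    # Illustrative values only: a 2 MB frame and a 1 s latency budget.
    for mbps in (40.0, 4.0, 0.0):
        link = LinkState(uplink_mbps=mbps, round_trip_ms=50.0)
        print(mbps, choose_target(2.0, 300.0, 100.0, link, 1000.0))
```

In practice the deadline, link measurements, and inference times would have to be profiled on the actual platform, and a real scheduler would also need to account for battery state and queued frames.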

With the collaborative development of UAV platforms, sensor systems, and artificial intelligence algorithms, the detection performance and integration level of UAV-based inspection in complex scenes will continue to improve, ultimately promoting automated, intelligent, and large-scale infrastructure monitoring.

Author Contributions

Conceptualization, Y.B. and W.Q.; methodology, Y.B.; software, Y.B.; validation, Y.B.; formal analysis, Y.B. and X.S.; investigation, Y.B. and Z.Y.; resources, Y.B.; writing—original draft preparation, Y.B.; writing—review and editing, X.S. and G.Y.; visualization, Y.B.; supervision, W.Q.; project administration, W.Q.; funding acquisition, W.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Jilin Provincial Natural Science Foundation, grant number 20260102281JC.

Data Availability Statement

The dataset collection is available at: https://github.com/Arthasyue/Crack-detection-public-dataset-collection (accessed on 24 April 2026).

Acknowledgments

The authors would like to thank the editors and reviewers for their suggestions and revisions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guo, J.; Liu, P.; Xiao, B.; Deng, L.; Wang, Q. Surface defect detection of civil structures using images: Review from data perspective. Autom. Constr. 2024, 158, 105186. [Google Scholar] [CrossRef]
Dong, C.-Z.; Catbas, F.N. A review of computer vision–based structural health monitoring at local and global levels. Struct. Health Monit. 2021, 20, 692–743. [Google Scholar] [CrossRef]
Wang, W.; Su, C. Semi-supervised semantic segmentation network for surface crack detection. Autom. Constr. 2021, 128, 103786. [Google Scholar] [CrossRef]
Kim, H.; Lee, J.; Ahn, E.; Cho, S.; Shin, M.; Sim, S.-H. Concrete crack identification using a UAV incorporating hybrid image processing. Sensors 2017, 17, 2052. [Google Scholar] [CrossRef] [PubMed]
Jiang, S.; Zhang, J. Real-time crack assessment using deep neural networks with wall-climbing unmanned aerial system. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 549–564. [Google Scholar] [CrossRef]
Fan, J.; Saadeghvaziri, M.A. Applications of drones in infrastructures: Challenges and opportunities. Int. J. Mech. Mechatron. Eng. 2019, 13, 649–655. [Google Scholar]
Greenwood, W.W.; Lynch, J.P.; Zekkos, D. Applications of UAVs in civil infrastructure. J. Infrastruct. Syst. 2019, 25, 04019002. [Google Scholar] [CrossRef]
Kerle, N.; Nex, F.; Gerke, M.; Duarte, D.; Vetrivel, A. UAV-based structural damage mapping: A review. ISPRS Int. J. Geo-Inf. 2019, 9, 14. [Google Scholar] [CrossRef]
Rudd, J.D.; Roberson, G.T.; Classen, J.J. Application of satellite, unmanned aircraft system, and ground-based sensor data for precision agriculture: A review. In Proceedings of the 2017 ASABE Annual International Meeting; American Society of Agricultural and Biological Engineers: St. Joseph, MI, USA, 2017; p. 1. [Google Scholar]
Wang, C.; Tang, J. Reliable crack evolution monitoring from UAV remote sensing: Bridging detection and temporal dynamics. Remote Sens. 2025, 18, 51. [Google Scholar] [CrossRef]
Zhou, S.; Canchila, C.; Song, W. Deep learning-based crack segmentation for civil infrastructure: Data types, architectures, and benchmarked performance. Autom. Constr. 2023, 146, 104678. [Google Scholar] [CrossRef]
Ali, R.; Chuah, J.H.; Talip, M.S.A.; Mokhtar, N.; Shoaib, M.A. Structural crack detection using deep convolutional neural networks. Autom. Constr. 2022, 133, 103989. [Google Scholar] [CrossRef]
Ai, D.; Jiang, G.; Lam, S.-K.; He, P.; Li, C. Computer vision framework for crack detection of civil infrastructure—A review. Eng. Appl. Artif. Intell. 2023, 117, 105478. [Google Scholar] [CrossRef]
Kim, S.Y.; Kwon, D.Y.; Jang, A.; Ju, Y.K.; Lee, J.-S.; Hong, S. A review of UAV integration in forensic civil engineering: From sensor technologies to geotechnical, structural and water infrastructure applications. Measurement 2024, 224, 113886. [Google Scholar] [CrossRef]
Aela, P.; Chi, H.-L.; Fares, A.; Zayed, T.; Kim, M. UAV-based studies in railway infrastructure monitoring. Autom. Constr. 2024, 167, 105714. [Google Scholar] [CrossRef]
Feroz, S.; Abu Dabous, S. UAV-based remote sensing applications for bridge condition assessment. Remote Sens. 2021, 13, 1809. [Google Scholar] [CrossRef]
Meira, G.d.S.; Guedes, J.V.F.; Bias, E.d.S. UAV-embedded sensors and deep learning for pathology identification in building façades: A review. Drones 2024, 8, 341. [Google Scholar] [CrossRef]
Hamishebahar, Y.; Guan, H.; So, S.; Jo, J. A comprehensive review of deep learning-based crack detection approaches. Appl. Sci. 2022, 12, 1374. [Google Scholar] [CrossRef]
Yao, Z.; Li, Y.; Fu, H.; Tian, J.; Zhou, Y.; Chin, C.-L.; Ma, C.-K. Research on concrete crack and depression detection method based on multi-level defect fusion segmentation network. Buildings 2025, 15, 1657. [Google Scholar] [CrossRef]
Zhang, Y.; Lu, Y.; Duan, Y.; Wei, D.; Zhu, X.; Zhang, B.; Pang, B. Robust surface crack detection with structure line guidance. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103527. [Google Scholar] [CrossRef]
Kim, B.; Natarajan, Y.; Preethaa, K.S.; Song, S.; An, J.; Mohan, S. Real-time assessment of surface cracks in concrete structures using integrated deep neural networks with autonomous unmanned aerial vehicle. Eng. Appl. Artif. Intell. 2024, 129, 107537. [Google Scholar] [CrossRef]
Lei, B.; Ren, Y.; Wang, N.; Huo, L.; Song, G. Design of a new low-cost unmanned aerial vehicle and vision-based concrete crack inspection method. Struct. Health Monit. 2020, 19, 1871–1883. [Google Scholar] [CrossRef]
Ding, W.; Yang, H.; Yu, K.; Shu, J. Crack detection and quantification for concrete structures using UAV and transformer. Autom. Constr. 2023, 152, 104929. [Google Scholar] [CrossRef]
Zhao, S.; Kang, F.; Li, J. Intelligent segmentation method for blurred cracks and 3D mapping of width nephograms in concrete dams using UAV photogrammetry. Autom. Constr. 2024, 157, 105145. [Google Scholar] [CrossRef]
He, X.; Tang, Z.; Deng, Y.; Zhou, G.; Wang, Y.; Li, L. UAV-based road crack object-detection algorithm. Autom. Constr. 2023, 154, 105014. [Google Scholar] [CrossRef]
Chen, X.; Zhang, X.; Ren, M.; Zhou, B.; Sun, M.; Feng, Z.; Chen, B.; Zhi, X. A multiscale enhanced pavement crack segmentation network coupling spectral and spatial information of UAV hyperspectral imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103772. [Google Scholar] [CrossRef]
An, J.; Dong, S.; Wang, X.; Li, C.; Zhao, W. Research on UAV aerial imagery detection algorithm for Mining-Induced surface cracks based on improved YOLOv10. Sci. Rep. 2025, 15, 30101. [Google Scholar] [CrossRef]
Zhang, Y.; Zuo, Z.; Xu, X.; Wu, J.; Zhu, J.; Zhang, H.; Wang, J.; Tian, Y. Road damage detection using UAV images based on multi-level attention mechanism. Autom. Constr. 2022, 144, 104613. [Google Scholar] [CrossRef]
Peng, L.; Liu, J. Detection and analysis of large-scale WT blade surface cracks based on UAV-taken images. IET Image Process. 2018, 12, 2059–2064. [Google Scholar] [CrossRef]
Wang, L.; Zhang, Z. Automatic detection of wind turbine blade surface cracks based on UAV-taken images. IEEE Trans. Ind. Electron. 2017, 64, 7293–7303. [Google Scholar] [CrossRef]
Xiaoxun, Z.; Xinyu, H.; Xiaoxia, G.; Xing, Y.; Zixu, X.; Yu, W.; Huaxin, L. Research on crack detection method of wind turbine blade based on a deep learning method. Appl. Energy 2022, 328, 120241. [Google Scholar] [CrossRef]
Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef]
Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognit. Lett. 2012, 33, 227–238. [Google Scholar] [CrossRef]
Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019, 338, 139–153. [Google Scholar] [CrossRef]
Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar] [CrossRef]
Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.-M. How to get pavement distress detection ready for deep learning? A systematic approach. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2017; pp. 2039–2047. [Google Scholar]
He, J.; Gong, L.; Xu, C.; Wang, P.; Zhang, Y.; Zheng, O.; Su, G.; Yang, Y.; Hu, J.; Sun, Y. HighRPD: A high-altitude drone dataset of road pavement distress. Data Brief 2025, 59, 111377. [Google Scholar] [CrossRef]
Dorafshan, S.; Thomas, R.J.; Maguire, M. SDNET2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks. Data Brief 2018, 21, 1664–1668. [Google Scholar] [CrossRef]
Yan, H.; Zhang, J. UAV-PDD2023: A benchmark dataset for pavement distress detection based on UAV images. Data Brief 2023, 51, 109692. [Google Scholar] [CrossRef]
Xu, H.; Su, X.; Wang, Y.; Cai, H.; Cui, K.; Chen, X. Automatic bridge crack detection using a convolutional neural network. Appl. Sci. 2019, 9, 2867. [Google Scholar] [CrossRef]
Liu, H.; Jia, C.; Cheng, X.; Liu, X.; Shi, F. Serial Local Patterns and Irregular Dependencies Extract and Cascaded Fusion Network for Structural Crack Segmentation. In Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: New York, NY, USA, 2025; pp. 1–5. [Google Scholar]
Zhao, B.; Zhou, X.; Yang, G.; Wen, J.; Zhang, J.; Dou, J.; Li, G.; Chen, X.; Chen, B.M. High-resolution infrastructure defect detection dataset sourced by unmanned systems and validated with deep learning. Autom. Constr. 2024, 163, 105405. [Google Scholar] [CrossRef]
Feng, C.; Zhang, H.; Wang, H.; Wang, S.; Li, Y. Automatic pixel-level crack detection on dam surface using deep convolutional network. Sensors 2020, 20, 2069. [Google Scholar] [CrossRef]
Hong, K.; Wang, H.; Yuan, B.; Wang, T. Multiple defects inspection of dam spillway surface using deep learning and 3D reconstruction techniques. Buildings 2023, 13, 285. [Google Scholar] [CrossRef]
Shihavuddin, A.; Chen, X. DTU-Drone Inspection Images of Wind Turbine. 2018. Available online: https://orbit.dtu.dk/en/publications/dtu-drone-inspection-images-of-wind-turbine (accessed on 24 May 2026).
Gohar, I.; Halimi, A.; See, J.; Yew, W.K.; Yang, C. Slice-aided defect detection in ultra high-resolution wind turbine blade images. Machines 2023, 11, 953. [Google Scholar] [CrossRef]
Yang, C.; Liu, X.; Zhou, H.; Ke, Y.; See, J. Towards accurate image stitching for drone-based wind turbine blade inspection. Renew. Energy 2023, 203, 267–279. [Google Scholar] [CrossRef]
Ren, W.; Zhong, Z. Building construction crack detection with BCCD YOLO enhanced feature fusion and attention mechanisms. Sci. Rep. 2025, 15, 23167. [Google Scholar] [CrossRef]
Kao, S.-P.; Wang, F.-L.; Lin, J.-S.; Tsai, J.; Chu, Y.-D.; Hung, P.-S. Bridge crack inspection efficiency of an unmanned aerial vehicle system with a laser ranging module. Sensors 2022, 22, 4469. [Google Scholar] [CrossRef]
Choi, D.; Bell, W.; Kim, D.; Kim, J. UAV-driven structural crack detection and location determination using convolutional neural networks. Sensors 2021, 21, 2650. [Google Scholar] [CrossRef] [PubMed]
Boon, M.A.; Drijfhout, A.P.; Tesfamichael, S. Comparison of a fixed-wing and multi-rotor UAV for environmental mapping applications: A case study. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 47–54. [Google Scholar] [CrossRef]
Choi, H.-W.; Kim, H.-J.; Kim, S.-K.; Na, W.S. An overview of drone applications in the construction industry. Drones 2023, 7, 515. [Google Scholar] [CrossRef]
Liang, H.; Lee, S.-C.; Bae, W.; Kim, J.; Seo, S. Towards UAVs in construction: Advancements, challenges, and future directions for monitoring and inspection. Drones 2023, 7, 202. [Google Scholar] [CrossRef]
Mohsan, S.A.H.; Othman, N.Q.H.; Li, Y.; Alsharif, M.H.; Khan, M.A. Unmanned aerial vehicles (UAVs): Practical aspects, applications, open challenges, security issues, and future trends. Intell. Serv. Robot. 2023, 16, 109–137. [Google Scholar] [CrossRef]
Kim, J.-W.; Kim, S.-B.; Park, J.-C.; Nam, J.-W. Development of crack detection system with unmanned aerial vehicles and digital image processing. Adv. Struct. Eng. Mech. (ASEM15) 2015, 33, 25–29. [Google Scholar]
Rahnamayiezekavat, P.; Wang, D.; Chai, J.; Moon, S.; Rashidi, M.; Wang, X. Automated pavement marking integrity assessment using a UAV platform—A test case of public parking. J. Asian Archit. Build. Eng. 2025, 24, 1594–1605. [Google Scholar] [CrossRef]
Hegde, N.T.; George, V.; Nayak, C.G.; Kumar, K. Design, dynamic modelling and control of tilt-rotor UAVs: A review. Int. J. Intell. Unmanned Syst. 2020, 8, 143–161. [Google Scholar] [CrossRef]
Shah, S.F.A.; Mazhar, T.; Al Shloul, T.; Shahzad, T.; Hu, Y.-C.; Mallek, F.; Hamam, H. Applications, challenges, and solutions of unmanned aerial vehicles in smart city using blockchain. PeerJ Comput. Sci. 2024, 10, e1776. [Google Scholar] [CrossRef]
Hengyu, L.; Rongguo, M. Sky’s-Eye Perspective: A Multidimensional Review of UAV Applications in Highway Systems. Appl. Sci. 2025, 15, 11199. [Google Scholar]
Tian, Y.; Lin, F.; Li, Y.; Zhang, T.; Zhang, Q.; Fu, X.; Huang, J.; Dai, X.; Wang, Y.; Tian, C. UAVs meet LLMs: Overviews and perspectives towards agentic low-altitude mobility. Inf. Fusion 2025, 122, 103158. [Google Scholar] [CrossRef]
Ameli, Z.; Aremanda, Y.; Friess, W.A.; Landis, E.N. Impact of UAV Hardware Options on Bridge Inspection Mission Capabilities. Drones 2022, 6, 64. [Google Scholar] [CrossRef]
Yuan, Q.; Shi, Y.; Li, M. A review of computer vision-based crack detection methods in civil infrastructure: Progress and challenges. Remote Sens. 2024, 16, 2910. [Google Scholar] [CrossRef]
Ali, L.; Alnajjar, F.; Jassmi, H.A.; Gocho, M.; Khan, W.; Serhani, M.A. Performance evaluation of deep CNN-based crack detection and localization techniques for concrete structures. Sensors 2021, 21, 1688. [Google Scholar] [CrossRef]
Golding, V.P.; Gharineiat, Z.; Munawar, H.S.; Ullah, F. Crack detection in concrete structures using deep learning. Sustainability 2022, 14, 8117. [Google Scholar] [CrossRef]
Tymoshchuk, D.; Didych, I.; Maruschak, P.; Yasniy, O.; Mykytyshyn, A.; Mytnyk, M. Machine Learning Approaches for Classification of Composite Materials. Modelling 2025, 6, 118. [Google Scholar] [CrossRef]
Mayya, A.M.; Alkayem, N.F. Enhance the concrete crack classification based on a novel multi-stage YOLOV10-ViT framework. Sensors 2024, 24, 8095. [Google Scholar] [CrossRef]
Karimi, N.; Mishra, M.; Lourenço, P.B. Automated surface crack detection in historical constructions with various materials using deep learning-based YOLO network. Int. J. Archit. Herit. 2025, 19, 581–597. [Google Scholar] [CrossRef]
Hu, Y.; Chen, S.; Zhao, Z.; Cheng, S. Dual-Path Framework Analysis of Crack Detection Algorithm and Scenario Simulation on Fujian Tulou Surface. Coatings 2025, 15, 1156. [Google Scholar] [CrossRef]
Kang, D.H.; Cha, Y.-J. Efficient attention-based deep encoder and decoder for automatic crack segmentation. Struct. Health Monit. 2022, 21, 2190–2205. [Google Scholar] [CrossRef]
Tran, T.V.; Nguyen-Xuan, H.; Zhuang, X. Investigation of crack segmentation and fast evaluation of crack propagation, based on deep learning. Front. Struct. Civ. Eng. 2024, 18, 516–535. [Google Scholar] [CrossRef]
Nyathi, M.A.; Bai, J.; Wilson, I.D. Deep learning for concrete crack detection and measurement. Metrology 2024, 4, 66–81. [Google Scholar] [CrossRef]
Gonthina, S.S.; Aditya, S.; Gannoju, A.T.; Das, D.; Sinha, A. CrackScan: Enabling Intelligent Edge Inspection with UAVs for Structural Health Monitoring. IEEE Sens. J. 2025, 25, 42113–42120. [Google Scholar] [CrossRef]
Wang, Y.; Chen, Y.; Xie, Y.; Zhu, J.; Dang, C.; Zhu, H. BSGNet: Vehicle Detection in UAV Imagery of Construction Scenes via Biomimetic Edge Awareness and Global Receptive Field Modeling. Drones 2026, 10, 32. [Google Scholar] [CrossRef]
Lee, J.-H.; Gwon, G.-H.; Kim, I.-H.; Jung, H.-J. A motion deblurring network for enhancing UAV image quality in bridge inspection. Drones 2023, 7, 657. [Google Scholar] [CrossRef]
Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput.-Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [Google Scholar] [CrossRef]
Fan, R.; Bocus, M.J.; Zhu, Y.; Jiao, J.; Wang, L.; Ma, F.; Cheng, S.; Liu, M. Road crack detection using deep convolutional neural network and adaptive thresholding. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV); IEEE: New York, NY, USA, 2019; pp. 474–479. [Google Scholar]
Wang, J.; Wang, P.; Qu, L.; Pei, Z.; Ueda, T. Automatic detection of building surface cracks using UAV and deep learning-combined approach. Struct. Concr. 2024, 25, 2302–2322. [Google Scholar] [CrossRef]
Tapkın, S.; Tercan, E.; Bostan, A.; Şengül, G. Crack detection on asphalt runway using unmanned aerial vehicle data with non-crack object removal and deep learning methods. Rev. Constr. 2025, 24, 603–631. [Google Scholar] [CrossRef]
Li, R.; Yu, J.; Li, F.; Yang, R.; Wang, Y.; Peng, Z. Automatic bridge crack detection using Unmanned aerial vehicle and Faster R-CNN. Constr. Build. Mater. 2023, 362, 129659. [Google Scholar] [CrossRef]
Chen, X.; Liu, C.; Chen, L.; Zhu, X.; Zhang, Y.; Wang, C. A pavement crack detection and evaluation framework for a UAV inspection system based on deep learning. Appl. Sci. 2024, 14, 1157. [Google Scholar] [CrossRef]
Zhou, Q.; Ding, S.; Qing, G.; Hu, J. UAV vision detection method for crane surface cracks based on Faster R-CNN and image segmentation. J. Civ. Struct. Health Monit. 2022, 12, 845–855. [Google Scholar] [CrossRef]
Chen, D.; Chen, D.; Zhong, C.; Zhan, F. NSC-YOLOv8: A Small Target Detection Method for UAV-Acquired Images Based on Self-Adaptive Embedding. Electronics 2025, 14, 1548. [Google Scholar] [CrossRef]
Li, F.; Lv, X.; Zhao, M.; Wu, W. IFD-YOLO: A Lightweight Infrared Sensor-Based Detector for Small UAV Targets. Sensors 2025, 25, 7449. [Google Scholar] [CrossRef]
Yu, Z.; Shen, Y.; Shen, C. A real-time detection approach for bridge cracks based on YOLOv4-FPM. Autom. Constr. 2021, 122, 103514. [Google Scholar] [CrossRef]
Xiang, X.; Hu, H.; Ding, Y.; Zheng, Y.; Wu, S. GC-YOLOv5s: A lightweight detector for UAV road crack detection. Appl. Sci. 2023, 13, 11030. [Google Scholar] [CrossRef]
Chen, X.; Wang, C.; Liu, C.; Zhu, X.; Zhang, Y.; Luo, T.; Zhang, J. Autonomous crack detection for mountainous roads using UAV inspection system. Sensors 2024, 24, 4751. [Google Scholar] [CrossRef] [PubMed]
Xing, J.; Liu, Y.; Zhang, G.-Z. Improved YOLOV5-based UAV pavement crack detection. IEEE Sens. J. 2023, 23, 15901–15909. [Google Scholar] [CrossRef]
Dong, H.; Wang, N.; Fu, D.; Wei, F.; Liu, G.; Liu, B. Precision and Efficiency in Dam Crack Inspection: A Lightweight Object Detection Method Based on Joint Distillation for Unmanned Aerial Vehicles (UAVs). Drones 2024, 8, 692. [Google Scholar] [CrossRef]
Gao, Y.; Cao, H.; Cai, W.; Zhou, G. Pixel-level road crack detection in UAV remote sensing images based on ARD-Unet. Measurement 2023, 219, 113252. [Google Scholar] [CrossRef]
Cheng, H.; Li, Y.; Li, H.; Hu, Q. Embankment crack detection in UAV images based on efficient channel attention U2Net. Structures 2023, 50, 430–443. [Google Scholar] [CrossRef]
Zhu, G.; Liu, J.; Fan, Z.; Yuan, D.; Ma, P.; Wang, M.; Sheng, W.; Wang, K.C. A lightweight encoder–decoder network for automatic pavement crack detection. Comput.-Aided Civ. Infrastruct. Eng. 2024, 39, 1743–1765. [Google Scholar] [CrossRef]
Chu, H.; Chen, W.; Deng, L. Cascade operation-enhanced high-resolution representation learning for meticulous segmentation of bridge cracks. Adv. Eng. Inform. 2024, 61, 102508. [Google Scholar] [CrossRef]
He, M.; Lau, T.L. CrackHAM: A novel automatic crack detection network based on U-Net for asphalt pavement. IEEE Access 2024, 12, 12655–12666. [Google Scholar] [CrossRef]
Xu, H.; Wang, L.; Shu, B.; Zhang, Q.; Li, X. Automatic Detection of Landslide Surface Cracks from UAV Images Using Improved U-Network. Remote Sens. 2025, 17, 2150. [Google Scholar] [CrossRef]
Feng, M.; Xu, J. Lightweight dual-attention network for concrete crack segmentation. Sensors 2025, 25, 4436. [Google Scholar] [CrossRef]
Noh, J.; Jang, J.; Jo, J.; Yang, H. Crack Segmentation Using U-Net and Transformer Combined Model. Appl. Sci. 2025, 15, 10737. [Google Scholar] [CrossRef]
Lin, C.; Tian, D.; Duan, X.; Zhou, J. TransCrack: Revisiting fine-grained road crack detection with a transformer design. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2023, 381, 20220172. [Google Scholar] [CrossRef]
Tibermacine, A.; Tibermacine, I.E.; Kahhoul, Z.S.; Naidji, I.; Rabehi, A.; Habib, M. A Novel CNN–ViT Model with Cascade Upsampling for Efficient Crack Segmentation. Sensors 2026, 26, 1667. [Google Scholar] [CrossRef] [PubMed]
Wang, J.; Zeng, Z.; Sharma, P.K.; Alfarraj, O.; Tolba, A.; Zhang, J.; Wang, L. Dual-path network combining CNN and transformer for pavement crack segmentation. Autom. Constr. 2024, 158, 105217. [Google Scholar] [CrossRef]
Zhang, T.; Qin, L.; Zou, Q.; Zhang, L.; Wang, R.; Zhang, H. CrackScopeNet: A lightweight neural network for rapid crack detection on resource-constrained drone platforms. Drones 2024, 8, 417. [Google Scholar] [CrossRef]
Liu, H.; Jia, C.; Shi, F.; Cheng, X.; Chen, S. SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures. In Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 10–17 June 2025; pp. 29406–29416. [Google Scholar]
Liu, B.; Zhao, Q.; Wang, C.; Xie, H.; Zhang, H. CCMamba: A Multi-Head Criss-Cross Mamba for Pavement Crack Segmentation. IEEE Trans. Intell. Transp. Syst. 2026, 1–15. [Google Scholar] [CrossRef]
Guan, J.; Cui, L.; Chen, Y.; Yang, C.; Wang, J.; Huo, Y. Lightweight CNN–Mamba Hybrid Network for Multi-Scale Concrete Crack Segmentation Using Vision Sensors. Electronics 2026, 15, 1362. [Google Scholar] [CrossRef]
Liu, Z.-L.; Zhou, A.; Ran, X.-R.; Wu, Y.-P.; Zhao, W.-G.; Zhang, H. A crack detection and quantification method using matched filter and photograph reconstruction. Sci. Rep. 2025, 15, 25266. [Google Scholar] [CrossRef]
Xiang, C.; Wang, W.; Deng, L.; Shi, P.; Kong, X. Crack detection algorithm for concrete structures based on super-resolution reconstruction and segmentation network. Autom. Constr. 2022, 140, 104346. [Google Scholar] [CrossRef]
Xu, Z.; Wang, Y.; Hao, X.; Fan, J. Crack detection of bridge concrete components based on large-scene images using an unmanned aerial vehicle. Sensors 2023, 23, 6271. [Google Scholar] [CrossRef]
Pereira, V.; Fukai, H. Automated topological analysis of crack networks for data-driven road maintenance decision-making. Int. J. Transp. Dev. Integr. 2025, 9, 919–935. [Google Scholar] [CrossRef]
Germanese, D.; Leone, G.R.; Moroni, D.; Pascali, M.A.; Tampucci, M. Long-term monitoring of crack patterns in historic structures using UAVs and planar markers: A preliminary study. J. Imaging 2018, 4, 99. [Google Scholar] [CrossRef]
Yoon, J.; Shin, H.; Song, M.; Gil, H.; Lee, S. A crack width measurement method of UAV Images using high-resolution algorithms. Sustainability 2022, 15, 478. [Google Scholar] [CrossRef]
Cho, H.; Yoon, H.-J.; Jung, J.-Y. Effects of the ground resolution and thresholding on crack width measurements. Sensors 2018, 18, 2644. [Google Scholar] [CrossRef]
Zhou, H.-F.; Hu, T.-Y.; Zhang, X.-L.; Lou, Y.-H.; Ni, Y.-Q. UAV-borne thermal imaging and adaptive pixel-level recognition for asphalt pavement crack detection. Infrared Phys. Technol. 2025, 152, 106279. [Google Scholar] [CrossRef]
Liu, Y.F.; Nie, X.; Fan, J.S.; Liu, X.G. Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction. Comput.-Aided Civ. Infrastruct. Eng. 2020, 35, 511–529. [Google Scholar] [CrossRef]
Yiğit, A.Y.; Uysal, M. Virtual reality visualisation of automatic crack detection for bridge inspection from 3D digital twin generated by UAV photogrammetry. Measurement 2025, 242, 115931. [Google Scholar] [CrossRef]
Wang, H.-F.; Zhai, L.; Huang, H.; Guan, L.-M.; Mu, K.-N.; Wang, G.-p. Measurement for cracks at the bottom of bridges based on tethered creeping unmanned aerial vehicle. Autom. Constr. 2020, 119, 103330. [Google Scholar] [CrossRef]
Dorafshan, S.; Maguire, M.; Hoffer, N.V.; Coopmans, C. Challenges in bridge inspection using small unmanned aerial systems: Results and lessons learned. In Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS); IEEE: New York, NY, USA, 2017; pp. 1722–1730. [Google Scholar]
Zhang, L.; Gong, L.; Wang, L.; Wang, Z.; Yan, S. A building crack detection UAV system based on deep learning and linear active disturbance rejection control algorithm. Electronics 2025, 14, 2975. [Google Scholar] [CrossRef]
Tse, K.-W.; Pi, R.; Sun, Y.; Wen, C.-Y.; Feng, Y. A novel real-time autonomous crack inspection system based on unmanned aerial vehicles. Sensors 2023, 23, 3418. [Google Scholar] [CrossRef]
Tan, Y.; Yi, W.; Chen, P.; Zou, Y. An adaptive crack inspection method for building surface based on BIM, UAV and edge computing. Autom. Constr. 2024, 157, 105161. [Google Scholar] [CrossRef]
Chen, K.; Reichard, G.; Xu, X.; Akanmu, A. Automated crack segmentation in close-range building façade inspection images using deep learning techniques. J. Build. Eng. 2021, 43, 102913. [Google Scholar] [CrossRef]
Han, F.; Gu, C. Surface Damage Detection in Hydraulic Structures from UAV Images Using Lightweight Neural Networks. Remote Sens. 2025, 17, 2668. [Google Scholar] [CrossRef]
Li, X.; Deng, B.; He, Y.; Chen, Q.; Zhang, Y.; Zhang, B.; Zhang, X. Wind Turbine Blade Surface Defect Detection via UAV High-Resolution Images. IEEE Sens. J. 2025, 26, 2678–2687. [Google Scholar] [CrossRef]
Pan, R.; Zhang, Y. CrackLite-Net: A Sustainable Transportation-Oriented Real-Time Lightweight Network for Adaptive Road Crack Detection. Sustainability 2025, 17, 10973. [Google Scholar] [CrossRef]
Zuo, C.; Huang, N.; Yuan, C.; Li, Y. Pavement-DETR: A high-precision real-time detection transformer for pavement defect detection. Sensors 2025, 25, 2426. [Google Scholar] [CrossRef]
Zhou, J.; Wang, Y.; Zhou, W. Efficient instance segmentation framework for UAV-based pavement distress detection. Autom. Constr. 2025, 175, 106195. [Google Scholar] [CrossRef]
Martínez-Rozas, S.; Alejo, D.; Carpio, J.J.; Caballero, F.; Merino, L. Long-Duration Inspection of GNSS-Denied Environments with a Tethered UAV-UGV Marsupial System. Drones 2025, 9, 765. [Google Scholar] [CrossRef]
McEnroe, P.; Wang, S.; Liyanage, M. A survey on the convergence of edge computing and AI for UAVs: Opportunities and challenges. IEEE Internet Things J. 2022, 9, 15435–15459. [Google Scholar] [CrossRef]

Figure 1. Annual publication trend of the selected studies.

Figure 2. Keyword co-occurrence network and clustering analysis.

Figure 3. Three mainstream UAV platforms for infrastructure inspection. Reprinted from Ref. [52].

Figure 4. A multi-level visual analysis framework for UAV-based crack detection.

Figure 5. Distribution of different methods in UAV-based crack detection studies.

Figure 6. Distribution of different methods in UAV-based crack segmentation studies.

Figure 7. UAV-based crack inspection in a bridge scenario: (a) bridge scene; (b) datasets; (c) multi-level visual analysis.

Figure 8. UAV-based crack inspection in a building facade scenario: (a) building facade scene; (b) datasets; (c) multi-level visual analysis.

Figure 9. UAV-based crack inspection in a dam scenario: (a) dam scene; (b) constructed image dataset; (c) multi-level visual analysis.

Figure 10. UAV-based crack inspection in a wind turbine blade scenario: (a) wind turbine blade scene; (b) constructed image dataset; (c) crack detection on wind turbine blades.

Figure 11. UAV-based crack inspection in a pavement scenario: (a) pavement scene; (b) constructed image dataset; (c) multi-level visual analysis.

Table 1. Overview of public crack datasets for infrastructure inspection.

Datasets	Dataset Scale	Acquisition Platform	Task	Scene	Description
Crack500	500 images (512 × 512)	Ground camera	Segmentation	Pavements	Crack500 covers environmental disturbances such as shadows, water stains, and asphalt textures.
CrackTree206	206 images (800 × 600)	Ground camera	Detection, Segmentation	Pavements	CrackTree206 contains shadows, uneven illumination, and low-contrast areas, with road cracks distributed in complex networks.
DeepCrack	537 images (544 × 384)	Ground camera	Detection, Segmentation	Pavements & Concrete surface	DeepCrack covers asphalt pavement and concrete surface images, and cracks show rich multi-scale morphological features.
CFD	118 images (480 × 320)	Ground camera	Detection, Segmentation	Pavements	CFD covers complex environmental disturbances such as shadows, water stains, oil stains and road markings that are common in urban roads.
GAPs384	384 images (384 × 384)	Ground camera	Segmentation	Pavements	GAPs384 contains high-resolution asphalt pavement images.
HighRPD	11,696 images (640 × 640)	UAV	Classification, Detection	Pavements	HighRPD is a road pavement distress dataset collected by a high-altitude UAV.
SDNET2018	56,000 images (256 × 256)	Ground camera	Classification, Detection	Multi-scenario (Bridges, Buildings, Pavements)	SDNET2018 is a large-scale concrete dataset containing bridges, walls, and pavements.
UAV-PDD2023	2440 images (2592 × 1944)	UAV	Detection	Pavements	UAV-PDD2023 is a pavement distress dataset collected by UAVs.
BCD	5069 images (224 × 224)	UAV	Classification	Bridges	BCD is a bridge crack classification dataset collected by UAVs.
TUT	1408 images (640 × 640)	Ground camera	Segmentation	Multi-scenario (8 material types)	TUT is a crack segmentation dataset consisting of eight different scenes.
CUBIT-Det	5527 images (4624 × 3472; 8000 × 6000)	UAV, Unmanned Ground Vehicle (UGV)	Detection	Multi-scenario (Buildings, Pavements, Bridges)	CUBIT-Det is a multi-scenario infrastructure defect detection dataset collected by UAV and UGV.
CDDS	1000 images (5472 × 3648)	UAV	Segmentation	Dam	CDDS is a non-public dataset for pixel-level crack detection on hydropower dam surfaces, collected by UAV.
DSI	1711 images (848 × 480; 1280 × 720)	Wall-climbing robot	Segmentation	Dam	DSI is a dataset of dam spillway surface defects collected by a wall-climbing robot, covering aging, spalling, and repair conditions.
DTU	589 images (5280 × 2890)	UAV	Detection	Wind turbine blades	DTU is an ultra-high resolution wind turbine blade inspection dataset collected by UAV.
Blade30	1302 images (5400 × 3600)	UAV	Classification, Detection, Segmentation	Wind turbine blades	Blade30 is a wind turbine blade surface defect dataset constructed by UAV inspection.

Table 2. Comparison of Multi-rotor, Fixed-wing, and Hybrid UAV Platform Characteristics.

Platform Performance	Multi-Rotor UAVs	Fixed-Wing UAVs	Hybrid UAVs
Platform Architecture	Airframe, FCU, IMU, GNSS, Power System, Multi-rotors	Airframe, Fixed wings, Tail, Propulsion system	Airframe, Rotors + Fixed wings, Propulsion system
Flight Characteristic	Vertical take-off and landing, Hovering, High maneuverability	Stable, Low vibration, Long endurance, High-speed inspection	Vertical take-off and landing, Efficient cruise, Mode switching
Endurance and Speed	20–40 min, about 60 km/h	1–6 h [54], about 80 km/h	1–3 h, about 60–90 km/h [59]
Applicable Scenarios	Close-range fine inspection (bridge piers, dams, blades, etc.)	Large-scale macro inspection (highways, runways, etc.)	Cross-regional inspection (highways, long bridges, etc.), Balances wide coverage
Limitation	Short endurance, Limited coverage	Requires runway, No hovering, High pilot skill required	Complex structure, High cost, Difficult integration, High maintenance
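To make the coverage differences in Table 2 concrete, the endurance and speed figures can be turned into a rough nominal flight distance. The sketch below is only illustrative: it assumes each platform cruises at its listed speed for its whole endurance, which ignores hovering, wind, payload, and battery reserve, so real inspection coverage will be lower. The dictionary layout and the function name `nominal_range_km` are hypothetical.

```python
# Endurance (minutes) and cruise speed (km/h) ranges copied from Table 2.
# Assumption: constant cruise for the full endurance, no hover time or reserve.
PLATFORMS = {
    "Multi-rotor": {"endurance_min": (20, 40), "speed_kmh": (60, 60)},
    "Fixed-wing": {"endurance_min": (60, 360), "speed_kmh": (80, 80)},
    "Hybrid": {"endurance_min": (60, 180), "speed_kmh": (60, 90)},
}


def nominal_range_km(endurance_min, speed_kmh):
    """Return (low, high) nominal flight distance in kilometers."""
    low = endurance_min[0] * speed_kmh[0] / 60
    high = endurance_min[1] * speed_kmh[1] / 60
    return low, high


for name, spec in PLATFORMS.items():
    low, high = nominal_range_km(spec["endurance_min"], spec["speed_kmh"])
    print(f"{name}: about {low:.0f} to {high:.0f} km per flight")
```

Under these assumptions a multi-rotor covers tens of kilometers per flight at most, while fixed-wing and hybrid platforms reach hundreds, which is consistent with the "Applicable Scenarios" row.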

Table 3. Commonly Used Evaluation Metrics for Crack Detection in Civil Infrastructure.

Metric	Type	Equation	Parameter Meanings	Description
Precision	Classification	$\mathrm{Precision} = \frac{TP}{TP + FP}$	TP (True Positive): Number of crack pixels correctly classified as crack. FP (False Positive): Number of non-crack pixels incorrectly classified as crack.	Precision measures the proportion of predicted crack pixels that are actually crack, indicating the capability to suppress false positives and handle background interference [63].
Accuracy	Classification	$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$	TN (True Negative): Number of non-crack pixels correctly classified as non-crack. FN (False Negative): Number of crack pixels incorrectly classified as non-crack.	Accuracy measures the proportion of all pixels (crack and non-crack) that are correctly classified and reflects overall model performance [64].
Recall	Classification	$\mathrm{Recall} = \frac{TP}{TP + FN}$	-	Recall represents the proportion of actual crack pixels correctly identified by the model, reflecting its ability to detect crack regions and reduce missed detections [65].
F1-Score (F1)	Classification	$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$	-	F1-Score is the harmonic mean of precision and recall. It ranges from 0 to 1 [66].
IoU	Object Detection	$\mathrm{IoU} = \frac{\lvert B_{p} \cap B_{gt} \rvert}{\lvert B_{p} \cup B_{gt} \rvert}$	B_p represents the predicted bounding box; B_gt represents the ground-truth bounding box annotated by an expert.	IoU measures the overlap between predicted and ground-truth bounding boxes, reflecting localization accuracy [67].
mAP	Object Detection	$\mathrm{mAP} = \frac{1}{N} \sum_{i = 1}^{N} AP_{i}$	N represents the total number of categories; AP_i represents the average precision of category i.	mAP is defined as the mean of average precision across all classes, serving as a comprehensive indicator of detection performance [68].
mIoU	Segmentation	$\mathrm{mIoU} = \frac{1}{K + 1} \sum_{i = 0}^{K} \frac{TP_{i}}{TP_{i} + FP_{i} + FN_{i}}$	K + 1: Total number of classes (K foreground classes plus background). TP_i: Number of class i pixels correctly classified as i. FP_i: Number of non-class i pixels incorrectly classified as i. FN_i: Number of class i pixels incorrectly classified as non-class i.	mIoU measures the average overlap between predicted and ground-truth masks across all classes, reflecting segmentation performance [69].
RMSE	3D reconstruction	$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (P_{i} - Q_{i})^{2}}$	P_i represents the three-dimensional (3D) coordinates of the point cloud predicted by the model or the depth value of the pixel on the depth map; Q_i represents the corresponding actual physical coordinates or depth value; N is the total number of registration points used in the error calculation.	RMSE measures the square root of the mean squared error between predicted and ground-truth 3D coordinates or depth values. Lower values indicate higher accuracy [70].
Crack Width	Engineering quantification	$Cw_{mm} = Cw_{p} \times a_{c}$	Cw_mm is the crack width in millimeters; Cw_p is the crack width in pixels; a_c is the conversion factor (mm/pixel).	Crack width governs the ingress of moisture and chlorides, which accelerates reinforcement corrosion and leads to deterioration in load-bearing capacity [71].
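The formulas in Table 3 can be sketched in a few lines of NumPy. This is a minimal illustration, not the evaluation code of any reviewed study: it assumes binary masks with 1 = crack and 0 = background (so K + 1 = 2 classes for mIoU), axis-aligned boxes given as (x1, y1, x2, y2), and point arrays of shape (N,) or (N, 3). The function names and the toy inputs are hypothetical.

```python
import numpy as np


def pixel_metrics(pred, gt):
    """Pixel-level metrics for binary masks (1 = crack, 0 = background)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.sum(pred & gt))      # crack predicted as crack
    fp = int(np.sum(pred & ~gt))     # background predicted as crack
    fn = int(np.sum(~pred & gt))     # crack predicted as background
    tn = int(np.sum(~pred & ~gt))    # background predicted as background

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero (e.g. empty masks).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    iou_crack = ratio(tp, tp + fp + fn)
    # For the background class, the roles of FP and FN are swapped.
    iou_background = ratio(tn, tn + fn + fp)
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "miou": (iou_crack + iou_background) / 2,
    }


def box_iou(box_p, box_gt):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_p[2], box_gt[2]) - max(box_p[0], box_gt[0]))
    ih = max(0.0, min(box_p[3], box_gt[3]) - max(box_p[1], box_gt[1]))
    inter = iw * ih
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - inter
    return inter / union if union > 0 else 0.0


def rmse(pred_points, true_points):
    """RMSE over N points; inputs have shape (N,) for depths or (N, 3) for coordinates."""
    diff = np.asarray(pred_points, dtype=float) - np.asarray(true_points, dtype=float)
    diff = diff.reshape(len(diff), -1)  # one row per point
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def crack_width_mm(width_px, mm_per_px):
    """Convert a crack width from pixels to millimeters."""
    return width_px * mm_per_px


# Toy example: a 2 x 4 mask with TP = 2, FP = 1, FN = 1, TN = 4.
pred_mask = [[1, 1, 0, 0], [0, 1, 0, 0]]
gt_mask = [[1, 0, 0, 0], [0, 1, 1, 0]]
print(pixel_metrics(pred_mask, gt_mask))
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))    # overlap 1, union 7
print(crack_width_mm(6, 0.25))                # 6 px at 0.25 mm/px -> 1.5
```

In the toy mask, accuracy (0.75) is higher than precision and recall (both 2/3) because the four correctly classified background pixels count toward accuracy. With real crack images, where background pixels dominate, this gap is typically much larger, which is one reason segmentation studies tend to report F1 and IoU alongside accuracy.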

Table 4. Comparison of UAV-based crack detection methods for different infrastructure scenarios.

Method	Scenario	UAV Platform	Task	Dataset	Dataset Scale	Metric Performance
Faster R-CNN [79]	Bridge	Multi-rotor	Detection	Self-collected	637 images	Precision: 92.03%; Recall: 96.26%; F1-score: 94.10%
YOLOv4-FPM [84]	Bridge	Multi-rotor	Detection	Self-collected	376 images	mAP: 0.976
IBR-Former [23]	Bridge	Multi-rotor	Detection	Public datasets & self-collected	2800 images	Precision: 86.32%; Recall: 71.35%; F1-score: 78.12%; IoU: 63.34%
OTSU [114]	Bridge	Creeping UAV	Quantification	None	None	Crack Width
Damage augmented digital twins [113]	Bridge	Multi-rotor	3D Reconstruction	Self-collected	176 images	Crack length RMSE: 0.391 cm
YOLOv8-CBAM [116]	Building	Multi-rotor	Detection	CSD	2663 images	Precision: 97%; Recall: 97.9%; mAP50: 98.4%; mAP50-95: 54.77%
SSDLite-MobileNetV2 [5]	Building	Wall-climbing UAV	Detection	Self-collected	1330 images	Accuracy: 94.48%
BCCD-YOLO [48]	Building	Multi-rotor	Detection	Self-collected	800 images	Precision: 91.5%; Recall: 86.2%; F1-score: 88.7%; mAP: 94.9%
YOLOv4-SE [117]	Building	Multi-rotor	Detection	UAPD	4000 images	mAP: 90.02%
CNN, U-Net [119]	Building	-	Classification, Segmentation	Public datasets & self-collected	6922 images	Classification: Precision: 94%; Recall: 94%; F1-score: 94%. Segmentation: Precision: 96%; Recall: 95%; F1-score: 96%
ResNet50, YOLOv8 [77]	Building	Multi-rotor	Detection, Classification	Public datasets	18,578 images	Classification accuracy: 99%; Detection accuracy: 85%
Drone-Yolov5 [88]	Dam	Multi-rotor	Detection	Self-collected	3157 images	mAP: 80.4%; Precision: 80%; Recall: 77%
LFPA-EAM-Fast-SCNN [120]	Dam	Multi-rotor	Segmentation	Self-collected	2479 images	Precision: 94.9%; Recall: 89.2%; F1-score: 90.6%; IoU: 87.92%
CDDS [43]	Dam	Multi-rotor	Segmentation	CDDS	1000 images	Precision: 80.31%; Recall: 80.45%; F1-score: 79.16%; IoU: 66.76%
MPViT-Crack [24]	Dam	Multi-rotor	Segmentation, Quantification, 3D Reconstruction	Self-collected	3442 images	mIoU: 95.88%; F1-score: 97.86%; Precision: 98.04%; Recall: 97.68%; Accuracy: 97.68%
Slice-aided inference strategy [46]	Wind Turbine Blade	Multi-rotor	Detection	DTU	589 images	YOLOv5 mAP50: 85.1%; Faster R-CNN mAP50: 83.4%
MI-YOLO [31]	Wind Turbine Blade	-	Detection	Self-collected	513 images	mAP: 93.2%; Precision: 93.1%; Recall: 92.2%
Coarse-to-fine strategy [47]	Wind Turbine Blade	Multi-rotor	Detection	Blade30	1302 images	-
KGP-YOLO [121]	Wind Turbine Blade	Multi-rotor	Detection	Public datasets & self-collected	2003 images	SY-PLUS mAP: 87.3%; DTU mAP: 92.4%; Blade30 mAP: 88.5%
ARD-Unet [89]	Pavement	Multi-rotor	Segmentation	CSRD	1046 images	mIoU: 76.41%; Precision: 70.67%; F1-score: 74.24%; Recall: 78.21%
Pavement-DETR [123]	Pavement	-	Detection	UAV-PDD2023	2440 images	Precision: 89.3%; Recall: 83.8%; mAP_0.5: 87.1%; mAP_0.5:0.95: 59.8%
PDIS-Net [124]	Pavement	Multi-rotor	Segmentation	UAPD-Instance	4373 images	mAP: 78.1%; mPrecision: 90.6%; mRecall: 94.1%; mF1: 92.3%; AP: 71.7%
CrackLite-Net [122]	Pavement	Multi-rotor	Detection	Public datasets & self-collected	11,000 images	Precision: 92.4%; Recall: 85.2%; F1-score: 88.7%; mAP: 93.3%
MS-CrackSeg [26]	Pavement	Multi-rotor	Segmentation	Self-collected	1031 images	Precision: 72.47%; Recall: 74.74%; F1-score: 73.59%; mIoU: 78.74%
GC-YOLOv5s [85]	Pavement	Multi-rotor	Detection	UMSC	2056 images	Precision: 76.9%; Recall: 69.8%; mAP_0.5: 74.3%; mAP_0.5:0.95: 44.6%
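When reading Table 4, it can help to recompute F1 from the reported precision and recall using the harmonic-mean definition in Table 3. The sketch below does this for a few rows whose values are copied from Table 4. The helper name `f1_from_pr` is hypothetical. A gap between the recomputed and reported values does not by itself indicate an error: it may reflect rounding, per-image averaging, or metrics reported at different thresholds, and the table alone cannot distinguish these cases.

```python
def f1_from_pr(precision_pct, recall_pct):
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# (method, precision %, recall %, reported F1 %) copied from Table 4.
ROWS = [
    ("Faster R-CNN [79]", 92.03, 96.26, 94.10),
    ("IBR-Former [23]", 86.32, 71.35, 78.12),
    ("MS-CrackSeg [26]", 72.47, 74.74, 73.59),
    ("CDDS [43]", 80.31, 80.45, 79.16),
]

for method, precision, recall, reported in ROWS:
    recomputed = f1_from_pr(precision, recall)
    gap = recomputed - reported
    print(f"{method}: recomputed {recomputed:.2f}%, reported {reported:.2f}%, gap {gap:+.2f}")
```

For the first three rows the recomputed value agrees with the reported one to two decimal places, while the CDDS row differs by more than one percentage point, which suggests its metrics were aggregated differently.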


Share and Cite

MDPI and ACS Style

Bai, Y.; Quan, W.; Shi, X.; Yan, Z.; Yuan, G. A Review of UAV-Based Crack Detection in Civil Infrastructure: A Multi-Level Visual Analysis Framework, Scene Adaptability, and Challenges. Remote Sens. 2026, 18, 1806. https://doi.org/10.3390/rs18111806

