1.1. Background
Urban renewal of existing residential areas is a challenge shared by many countries. As urbanization accelerates, building aging and safety hazards in high-density neighborhoods have become increasingly prominent. These problems affect residents’ quality of life and bear directly on public safety and sustainable urban development. In response, many countries have begun to introduce advanced technologies, such as unmanned aerial vehicles (UAVs), deep learning, and Building Information Modelling (BIM), into the renovation of existing buildings to improve the accuracy of diagnosis and evaluation and the efficiency of maintenance.
As a frontier city of China’s reform and opening-up, Shenzhen has long attracted a large inflow of population and resources, making it a typical high-density city. Scarce urban land has led to the prevalence of high-rise buildings and compact residential patterns. In older residential areas in particular, historical and planning constraints have resulted in high building density, high floor area ratios, and narrow spacing between buildings. These neighborhoods are characterized by crowded living environments, deteriorating building quality, and excessive energy consumption, often compounded by low design standards, outdated materials, and insufficient maintenance. Building safety, functionality, and sustainability have therefore become critical concerns. Statistics show that in Shenzhen, old residential communities with a floor area ratio greater than 2.5 cover about 105 square kilometers, accounting for 21.9% of the city’s residential land. More than 36% of houses were built before 2000, and 1622 residential communities have been listed for renovation.
Traditional diagnostic and evaluation workflows face multiple challenges in high-density urban neighborhoods: limited accessibility for inspectors, low efficiency, the risk of missed defects, safety hazards when working at height, and high equipment deployment and maintenance costs. In addition, limited data analysis and processing capabilities hinder adaptation to the complex and dynamic conditions of dense urban environments. More critically, without data-driven, intelligent methods that turn experience-based assessments into quantitative evaluation and prediction, renovation diagnostics remain limited in objectivity and efficiency. New technologies and methodologies are therefore needed.
In response to these challenges, this study proposes an intelligent diagnostic method that integrates UAV aerial imagery with deep learning for detecting façade defects in aging residential buildings within high-density areas of Shenzhen. A DJI M3T UAV was employed for oblique photography, with carefully planned flight routes to collect both visible-light and thermal infrared imagery. At the algorithmic level, a deep learning-based model was developed to identify two common types of façade defects: cracks and water leakage. Crack detection was performed using visible-light images, while leakage was identified through thermal infrared imaging of abnormal temperature patterns. These two defects are representative and frequently observed under the humid climatic conditions of southern China.
Furthermore, a three-dimensional spatial mapping method was established to project the two-dimensional detection results onto a 3D building model. By employing coordinate transformation, distortion correction, and ray-tracing techniques, the system generates an intuitive visualization of defect distribution. Case studies conducted in Huaqiaocheng East, Shennan Garden, and Huifang Garden in Shenzhen validated the effectiveness and practicality of the proposed method in complex high-density urban environments. This research provides a novel technological pathway for intelligent building diagnostics and informed decision-making in urban renewal.
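The mapping chain described above (distortion correction, coordinate transformation, ray tracing) can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes a pinhole camera with a single radial distortion coefficient `k1`, a known camera pose, and a façade approximated by one plane; all function and parameter names are hypothetical.

```python
import math

def undistort_normalized(xd, yd, k1, iterations=10):
    """Invert the assumed radial model x_d = x * (1 + k1 * r^2) by fixed-point iteration."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2
        x, y = xd / scale, yd / scale
    return x, y

def pixel_to_world_ray(u, v, fx, fy, cx, cy, k1, R_cam_to_world):
    """Return a unit ray direction in world coordinates for pixel (u, v)."""
    xd, yd = (u - cx) / fx, (v - cy) / fy        # normalised image coordinates
    x, y = undistort_normalized(xd, yd, k1)       # distortion correction
    d_cam = (x, y, 1.0)                           # ray in the camera frame
    d_world = [sum(R_cam_to_world[i][j] * d_cam[j] for j in range(3))
               for i in range(3)]                 # coordinate transformation
    norm = math.sqrt(sum(c * c for c in d_world))
    return [c / norm for c in d_world]

def intersect_plane(origin, direction, plane_point, plane_normal):
    """Ray-plane intersection; returns the 3D hit point, or None if there is none."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-12:
        return None                               # ray runs parallel to the facade
    t = sum((p - o) * n
            for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t <= 0:
        return None                               # facade lies behind the camera
    return [o + t * d for o, d in zip(origin, direction)]

# Hypothetical setup: camera at the origin looking along +Z,
# facade plane 10 m away, defect pixel 100 px right of the principal point.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ray = pixel_to_world_ray(2100.0, 1500.0, fx=1000.0, fy=1000.0,
                         cx=2000.0, cy=1500.0, k1=0.0,
                         R_cam_to_world=IDENTITY)
hit = intersect_plane([0.0, 0.0, 0.0], ray,
                      plane_point=[0.0, 0.0, 10.0],
                      plane_normal=[0.0, 0.0, -1.0])
# hit is approximately [1.0, 0.0, 10.0]
```

A real building model would replace the single plane with mesh triangles and test each ray against them, but the sequence of steps stays the same.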
1.2. Literature Review
In recent years, many scholars have advanced building inspection methods by integrating drone imagery, deep learning algorithms, and non-destructive testing (NDT) technologies. Unlike traditional manual inspection, deep learning enables pixel-level defect detection by training semantic segmentation models on various defect features, achieving higher accuracy and consistency. Moreover, lightweight models can be trained for real-time inspection. Several well-established image segmentation architectures have been developed, including the classical U-Net [1], FCN [2], PSPNet [3], K-Net [4], DeepLabv3 [5], and Mask2Former [6]. These segmentation networks not only enhance the precision of building pathology diagnosis but also maintain stable performance under complex environmental conditions. Research has shown that techniques such as drone-based dynamic response monitoring, thermal imaging, and deep convolutional neural networks (DCNNs) have significantly improved the accuracy and efficiency of building inspection. Wang et al. [7] verified that UAV monitoring accuracy can reach 2 cm. Zhong et al. [8] achieved automated diagnosis of façade detachment with an accuracy exceeding 90%. Perez et al. [9] developed a CNN-based defect detection model using VGG-16 and ResNet-50 with class activation mapping (CAM) for object localization, supporting real-time detection via mobile devices and UAVs. Dorafshan et al. [10] demonstrated that DCNNs outperform traditional methods in crack detection accuracy. Additionally, Kung et al. [11] and dos Santos et al. [12] developed CNN- and Faster R-CNN-based models for façade and roof defect detection, respectively. Liu et al. [13] proposed a UAV photogrammetry-based damage identification framework using supervoxel segmentation and random forest algorithms, achieving 90% damage identification accuracy. Goessens et al. [14] validated the feasibility of UAV technology through real building tests, providing a practical reference for subsequent research. El Masri and Rakha [15] reviewed six NDT technologies and their potential applications in building envelope diagnostics. Mayer et al. [16] used a pretrained Swin-T Transformer model to detect roof thermal bridges, achieving a recall rate of over 50%. Akbar et al. [17] proposed a UAV-based structural health monitoring system combining SURF and RANSAC, demonstrating robustness to UAV pose displacement and effective displacement detection in real-world structures. Shin et al. [18] summarized the limitations and improvement directions for UAV–AI hybrid inspection of residential buildings. In the field of infrastructure, Liu and Chou [19] developed an embedded deep learning model for bridge inspection. Li et al. [20] achieved automatic defect identification in photovoltaic systems. Qiu and Lau [21] integrated YOLO into UAV-based real-time pavement crack detection. Yang et al. [22] enhanced wind turbine blade damage detection using Otsu threshold segmentation. Ellenberg et al. [23] and Kulkarni et al. [24] proposed infrared-based methods for detecting pavement voids and bridge deck deterioration, respectively. Tomita and Chew [25] reviewed infrared thermography applications in building delamination detection, evaluating approximately 200 studies and analyzing key factors affecting detection accuracy, providing benchmarks for standardized testing. However, although single-modality techniques are effective in specific scenarios, they exhibit notable limitations in complex building environments. Visible-light imagery can clearly capture crack edges, but it is easily affected by illumination changes, shadows, and reflections, and it struggles to reveal hidden defects such as internal leakage. In contrast, thermal infrared imagery can highlight abnormal temperature regions and is thus suitable for detecting water infiltration and insulation defects, but it suffers from low spatial resolution and environmental sensitivity. Consequently, single-modality approaches cannot comprehensively identify both surface and internal defects, limiting the accuracy and applicability of façade defect detection.
Table 1 compares UAV-based building inspection studies. Existing research mainly focuses on individual defect types, with reported detection performance ranging from about 50% to 90%. In contrast, this study uses RGB-based semantic segmentation for crack detection (92.3% recall) and thermal imaging for leakage detection (86.44% recall) to address two common façade pathologies. Combined with 3D modeling, this approach enables high-precision detection and spatial localization of multiple defect types.
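For reference, recall is the share of true defect content that the model finds, TP / (TP + FN). The sketch below computes it per pixel on binary masks, which is one common convention for segmentation. Whether the figures quoted above were computed per pixel or per defect instance is not stated here, so the pixel-level choice and the function name `pixel_recall` are assumptions made for illustration.

```python
def pixel_recall(pred_mask, true_mask):
    """Recall = TP / (TP + FN), counted over ground-truth defect pixels."""
    tp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred, true in zip(pred_row, true_row):
            if true and pred:
                tp += 1          # defect pixel correctly detected
            elif true:
                fn += 1          # defect pixel missed by the model
    if tp + fn == 0:
        raise ValueError("ground truth contains no defect pixels")
    return tp / (tp + fn)

# Toy masks (1 = defect pixel): the model finds 3 of the 4 true defect pixels.
true_mask = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
pred_mask = [[0, 1, 1, 0],
             [0, 1, 0, 1]]
recall = pixel_recall(pred_mask, true_mask)  # 0.75
```

The false positive in the last column does not change recall; it would lower precision instead, which is why the two metrics are usually reported together.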
In addition, the influence of UAV aerial photography parameters on inspection efficiency and data quality has attracted increasing research attention. Tan et al. [26] proposed a method integrating UAVs with BIM to achieve automated surface inspection of buildings. This approach addresses the challenge of maintaining both completeness and high quality of data acquisition while minimizing flight path length, a key consideration given the limited endurance of UAVs. The coverage path planning problem was solved using a genetic algorithm (GA), with inspection areas extracted from the BIM model of the target building. In a subsequent study, Liu et al. [27] developed a UAV inspection path planning method that integrates 3D reconstruction with BIM. The proposed workflow includes a rough flight for environmental data collection, inspection waypoint calculation, and path optimization, providing a technical foundation for automated building inspection. Similarly, Bolourian and Hammad [28] proposed a LiDAR-equipped UAV path planning method for bridge inspection. Their approach considered the potential locations of defects and employed a genetic algorithm to achieve collision-free trajectories, minimal occlusion, maximum coverage, and shortest flight duration through comprehensive optimization. Ivić et al. [29] developed a multi-UAV trajectory planning algorithm based on the Heat Equation–Driven Area Coverage (HEDAC) method for 3D visual inspection of complex structures. This method demonstrated significant advantages in reducing operation time, enhancing safety, and improving cost-effectiveness. Nap et al. [30] combined terrestrial laser scanning (TLS) with unmanned aircraft systems (UAS) for point cloud-based monitoring of large buildings, successfully identifying façade deformations. Schischmanow et al. [31] proposed a seamless real-time 3D thermal-mapping workflow that integrated visual-inertial navigation with thermal infrared cameras, demonstrating progress toward automated BIM generation. Zheng et al. [32], in research on façade visual inspection, emphasized that UAV flight paths must ensure adequate building coverage, minimal omission and overlap, and safe distances between the UAV and structures, while optimizing efficiency under these constraints. However, existing studies have paid limited attention to optimizing aerial strategies in high-density urban environments. High-rise and multi-story buildings differ significantly in shooting distance, flight time, data volume, and mapping accuracy. How to design differentiated aerial photography strategies that balance efficiency and precision according to building height remains insufficiently explored. In particular, dense and aging residential areas, characterized by severe occlusion and narrow spaces, still lack targeted solutions for effective UAV inspection.
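The trade-off between shooting distance, data volume, and image detail can be made concrete with the standard pinhole relation for ground sampling distance, GSD = distance × pixel pitch / focal length. The Python sketch below applies it to one horizontal strip of a façade. The camera values are hypothetical round numbers chosen for easy arithmetic, not the specifications of any particular UAV, and the function names are illustrative only.

```python
import math

def ground_sampling_distance(distance_m, pixel_pitch_m, focal_length_m):
    """Size of one pixel on the facade (metres per pixel), pinhole approximation."""
    return distance_m * pixel_pitch_m / focal_length_m

def images_per_strip(facade_width_m, distance_m, pixel_pitch_m,
                     focal_length_m, image_width_px, overlap):
    """Images needed to cover one horizontal strip with the given side overlap."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    gsd = ground_sampling_distance(distance_m, pixel_pitch_m, focal_length_m)
    footprint_m = gsd * image_width_px           # facade width seen in one image
    if footprint_m >= facade_width_m:
        return 1
    step_m = footprint_m * (1.0 - overlap)       # advance between exposures
    return math.ceil((facade_width_m - footprint_m) / step_m) + 1

# Hypothetical camera: 2 micrometre pixels, 10 mm lens, 4000 px wide images.
camera = dict(pixel_pitch_m=2e-6, focal_length_m=0.01, image_width_px=4000)

gsd_near = ground_sampling_distance(10.0, 2e-6, 0.01)   # about 2 mm per pixel
gsd_far = ground_sampling_distance(20.0, 2e-6, 0.01)    # about 4 mm per pixel
n_near = images_per_strip(29.0, 10.0, overlap=0.75, **camera)  # 12 images
n_far = images_per_strip(29.0, 20.0, overlap=0.75, **camera)   # 5 images
```

Under these assumptions, doubling the distance halves the resolvable detail but cuts the image count per strip by more than half, which is the kind of efficiency and precision balance that a height-dependent flight strategy has to settle.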
Automated defect detection raises a further challenge: the detected defect information is scattered across a large volume of images, making it difficult to manage and analyze effectively. To address this, several researchers have explored integrating defect information with three-dimensional (3D) models, enabling more efficient management and visualization of large-scale inspection data, which is now a growing research trend. Chen et al. [33] proposed an approach combining BIM and UAV-captured aerial imagery for automatic detection and reconstruction of concrete defects. Their method aligns aerial images with the BIM model using a bundle adjustment algorithm, allowing access to building-related semantic material information. This integration reduces false positives caused by irrelevant background objects and significantly enhances defect detection accuracy. Yang et al. [34] proposed a surface defect-extended BIM generation method combining UAV imagery with deep learning, projecting defects onto BIM models through transfer learning and texture mapping. Tan et al. [35] developed a method for mapping façade defect data from UAV imagery onto a BIM model. The process involves preprocessing UAV-acquired façade images to extract useful information and introducing a simplified coordinate transformation method to convert real-world defect locations into BIM coordinates. A deep learning-based instance segmentation model was employed to detect and extract defect features from the images. Finally, the identified defects were modelled as new BIM objects with detailed attributes and mapped to corresponding BIM components. Similarly, Pantoja-Rosero et al. [36] proposed an end-to-end automated workflow for building damage assessment by generating a Level of Detail 3 (LOD3) digital twin enriched with defect information. This method integrates multi-view stereo (MVS), structure-from-motion (SfM), and machine learning models to automatically generate geometric representations of buildings, segment damage regions, and characterize defects. Unlike traditional workflows, this process requires no manual intervention, produces lightweight models, and can be widely applied to various asset types. However, current approaches largely rely on specialized BIM software and lack lightweight, programmable 3D mapping tools accessible to designers and analysts. Particularly during the building renovation and redesign phase, there is still no mature solution for rapidly integrating defect detection results into parametric design platforms such as Rhino Grasshopper. Linking defect statistics, façade analysis, and renovation planning in a single workflow remains a challenge, which in turn limits how efficiently inspection results can be applied in real-world architectural design.
In summary, although extensive research has been conducted in digital reconstruction of building information and pathological defect recognition and diagnosis, current studies still face several specific technical challenges in façade inspection for existing residential areas in high-density urban environments, including the following: (1) Immature visible–infrared bimodal collaboration methods. Most existing research relies on a single image modality and fails to fully exploit the complementary advantages of visible-light imagery (high spatial resolution) and thermal infrared imagery (temperature sensitivity). In particular, under humid southern climates, methods for jointly identifying two typical façade defects (cracks and water leakage) remain underdeveloped. (2) Insufficient research on differentiated aerial photography strategies in high-density urban contexts. Few studies have systematically examined the trade-offs between data acquisition efficiency and mapping accuracy at different building heights and shooting distances, making it difficult to provide operational guidance for UAV inspections in dense, aging residential areas. (3) Lack of lightweight 3D mapping tools aligned with design workflows. Current BIM-based integration methods primarily depend on specialized software and have not achieved deep interoperability with parametric design platforms such as Grasshopper, thereby limiting the efficiency and flexibility of applying inspection results during the architectural renewal and design phase.
Building upon research on existing building regeneration in Shenzhen and previous field investigations, this study proposes a deep learning–based building defect detection and visualization method that integrates visible–infrared data fusion with Grasshopper-based parametric modelling. Through the systematic integration of multimodal imagery collaboration, differentiated UAV strategies, and parametric 3D mapping, this research provides a methodological reference for enhancing the quality and sustainability of existing residential environments in high-density urban settings.