1. Introduction
Cracks are the earliest signs of structural deterioration that can reduce the lifespan and reliability of concrete structures and lead to severe environmental damage. To ensure the longevity of such structures and predict potential failures, evaluation and monitoring are required [1]. Typical damage types in concrete structures include cracks, peeling/exfoliation, efflorescence/leakage, and material separation, of which cracks are the most common. A crack in a concrete structure can lead to critical losses such as structural defects, reduced durability, exterior damage, corrosion of steel bars, and impaired waterproofing performance [2]. Accessible cracks can be measured with simple instruments such as a tape measure or a crack magnifying glass. Where access is difficult, however, crack length and area must be estimated by visual inspection, and the estimate may differ from the actual crack size [3]. Visual inspection also makes it difficult to assess crack progression when reviewing inspection history. Inconsistent measurements compromise the accuracy of the calculated crack size, and without knowledge of crack progression an appropriate repair plan cannot be formulated, so the reliability of the outcome is not guaranteed [4]. Because most crack investigations rely on the inspector's eyes, results can differ between inspectors, and the method requires considerable inspection manpower and cost [5].
Therefore, as an alternative to the field inspection method of visual inspection using manpower, studies related to video-based safety inspection using unmanned aerial vehicles, image processing, and deep learning technology are being conducted to increase the objectivity and efficiency of crack investigation work. Jeong et al. [
6] conducted a study using Convolutional-Neural-Network (CNN)-based machine learning to identify damage in concrete bridges using images captured using an unmanned aerial vehicle. Liu et al. [
7] conducted a crack detection study that involved image distortion correction and 3D model reconstruction to assess cracks in bridge piers using images obtained from unmanned aerial vehicles. Cho et al. [
8] proposed a safety inspection process for collaborative housing based on unmanned aerial vehicles using the Business Process Modeling Notation (BPMN) technique and verified its practicality using on-site application. Deng et al. [
9] conducted a study on the computer vision-based crack detection and quantification methodology for civil structures and suggested the possibility of automated visual inspection using image processing and deep learning. Avendaño et al. [
10] presented a framework for inspection that combines data collection, crack detection, and quantification of essential parameters of cracks. The framework utilizes machine learning and image analysis algorithms for image-based inspection of concrete cracks using unmanned aerial vehicles. Jung et al. [
11] proposed a concrete crack detection method that employs deep learning and image processing techniques to identify cracks in concrete structures, aiming to enhance the efficiency and objectivity of crack investigation work. Yu et al. [
12] conducted research on vision-based concrete crack detection using a hybrid framework and proposed an automated vision-based method for identifying the surface condition of concrete structures. Maslan et al. [
13] developed an automatic detection and evaluation system for runway surface cracks using UAVs and Deep-CNN. Orinaitė et al. [
14] conducted research on the use of machine learning for the detection of underwater concrete cracks and verified the efficiency and accuracy of the proposed approach. Liu et al. [
15] conducted research on the detection of concrete cracks based on computer vision using U-Net and found it to be more robust and effective compared to CNN-based methods. Park et al. [
16] conducted research on the detection and quantification of concrete cracks using deep learning and structured light. Arbaoui et al. [
17] proposed a methodology for the detection and monitoring of concrete cracks in a concrete material sample and specimen using deep learning-based multi-resolution analysis. Jang et al. [
18] conducted research on deep learning-based concrete crack detection using hybrid image scanning that combines vision and infrared thermography images. Kim et al. [
19] developed a deep learning-based image analysis technique for crack detection and feature analysis in small-scale infrastructure images.
Conducting safety inspections with the image-based methods from these preceding studies can enhance the objectivity of inspection results by minimizing the subjective intervention of the inspector. It can also ease constraints on inspection time and location and improve existing inspection and investigation processes, which in turn is expected to reduce the manpower and costs associated with inspections. However, most image-based safety inspection studies of facilities focus on crack detection, that is, on determining the presence of cracks through crack detection algorithms and deep learning-based models, while research on crack location estimation remains relatively insufficient.
The location information of cracks is a crucial factor in conducting effective safety inspections of actual facilities. It enables the understanding of crack size, shape, and distribution, which are vital for implementing appropriate maintenance and reinforcement measures [
20]. However, as mentioned earlier, most of the research related to cracks has focused on their identification and representation, while studies on crack location estimation are relatively scarce or nonexistent. Zoubir et al. [
21] conducted a study on the identification and localization of concrete bridge defects based on D-CNN and transfer learning. The trained model for defect identification achieved a high accuracy of 97.13%, but for defect localization, it provided only an approximate pixel-level location. Kim et al. [
22] evaluated concrete cracks using two stereo visions with different focal lengths. A crack was detected using a framework based on the crack candidate region, and the location of the crack was simply expressed via the construction of a 3D model using the structure from the motion process. Studies related to such crack location information have also focused on crack detection and identification, and the crack location information has been shown to be at the approximate location expression level. Woo et al. [
23] proposed a methodology for estimating the location of concrete cracks with an accuracy at the level of millimeters using image processing techniques that utilize a reference object. However, a limitation of this methodology is that it cannot be applied when there are no identifiable reference points in the facility for defining the reference object for estimating the location of concrete cracks. Moreover, it was also found that there exists a limitation in the accuracy of the image processing technique when there is an inadequate number of reference objects available as feature points in the image.
The purpose of this study is to address the limitations of previous crack location estimation research based on image processing techniques that utilize reference objects. It aims to conduct a new image-based concrete crack location estimation study, seeking improvements in the existing methods. In the utilization of image processing techniques for crack location estimation, the lack of feature points can lead to instability in the results of spatial information construction using methods such as image-stitching and Point Cloud techniques that rely on feature points. A lack of reference objects can degrade the performance of algorithms used in tasks and increase errors, which can affect the accuracy and reliability of the results. Therefore, in this study, we attempted to estimate the location of concrete cracks in facilities lacking reference objects and feature points, and we propose a methodology for this purpose. In addition, this study is expected to be novel in developing a crack localization technology in a situation where most studies are focused on crack detection and identification.
  2. Materials and Methods
  2.1. Overview
In general, images without reference objects may contain different objects or scenes, making it difficult to match feature points. To find an appropriate match between different objects, a more sophisticated feature point extraction and matching algorithm is needed. Non-reference images can have various transformations and distortions, which can make accurate position estimation challenging when stitching images or creating 3D point clouds. Therefore, in this experiment, we attempted to address these limitations by applying Laser Points to the images. Laser Points were used to correct images with various deformations and distortions and served as corresponding points for image correction using a homography matrix.
In this way, the accuracy of image matching and location estimation can be improved: images are corrected using distortion correction and matching algorithms, and the location of cracks is then estimated. Finally, we conduct a position estimation experiment using spatial information constructed by applying the image processing techniques to aerial images acquired by an unmanned aerial vehicle equipped with the laser pointer model.
Figure 1 depicts the visualized flow chart for estimating the location of cracks in the exterior walls of buildings using unmanned aerial vehicles without the reference objects in the image. The proposed method comprises the following steps: (1) Data Acquisition: UAV-based aerial photography. (2) Construction of analysis data using image processing technique: (a) Distorted image correction using homography matrix and (b) Feature point extraction using SIFT algorithm. (3) Crack localization: (a) Construction of spatial information using image processing techniques, (b) Merging of spatial information layers based on point cloud technique and image stitching technique, and (c) Crack localization and Data validation.
  2.2. Data Acquisition
When acquiring data using unmanned aerial vehicle-based aerial photography, a flight plan must be established after thoroughly considering the filming conditions. Moreover, as the data quality obtained using aerial photography has a significant impact on image processing and spatial information construction, prior planning is crucial. Therefore, in this study, the flight plan was established considering flight safety and the quality of the photographed data.
Before initiating a UAV flight, the pilot must pre-assess factors that could impact flight safety, such as the structural system of the target building, the layout of surrounding terrain and buildings, and the presence of wires near the building. Subsequently, aerial photography is conducted in a location free from potential safety concerns. When capturing images of a building’s walls using a UAV, vertical flight to vary the altitude and horizontal flight to change the UAV’s position are employed. For such aerial photography, a rotary-wing UAV is more suitable than a fixed-wing UAV due to its ability for hovering flight and unrestricted altitude changes during flight. Additionally, in this study, close-up photography of the target building was necessary to capture aerial images containing cracks. To manage unexpected situations with a collision risk, experienced pilots conducted manual flights.
In obtaining aerial photographic data for estimating the location of concrete cracks, several factors must be considered, including flight stability, data quality, and overlap of acquired images. The resolution of an aerial image is influenced by the performance of the mounted camera sensor and the shooting distance. Woo et al. [
24] used an unmanned aerial vehicle with a resolution of 20 Megapixels (MP) to define structural cracks in the exterior walls of concrete buildings. Aerial images were acquired by setting the shooting distance to 2 m. Jeong et al. [
25] conducted crack detection using aerial images with a resolution of 12 MP taken from a shooting distance of 5 to 10 m. However, they confirmed that the detection had low accuracy and a limited detection range. Liu et al. [
7] used a UAV to detect facility cracks and acquired aerial images with a resolution of 20.8 MP by setting the shooting distance to 1~2 m. Kim and Cho [
26] used a UAV for automatic vision-based detection of cracks on concrete surfaces and set the shooting distance to 2 m to acquire aerial images with a resolution of 20 MP. Therefore, this study aimed to acquire aerial images with a resolution of 20 MP. The shooting distance was set to 2 m, considering safety.
The image overlap has a significant impact on the construction of spatial information based on Point Cloud techniques for concrete crack location estimation [
27]. Yonas et al. [
28] constructed spatial information for crack detection in aging bridges by setting the image overlap to 60~70%. Zhu et al. [
29] utilized aerial images with 75% image overlap for deep learning-based crack detection on roads. Yuhan et al. [
30] set the image overlap to 50% to detect defects in buildings and infrastructure. Kim et al. [
31] used aerial images with image overlap of more than 60% to identify cracks in aging concrete bridges using UAVs. Accordingly, this study set the image overlap to at least 65% for data acquisition to build spatial information and estimate the location of concrete cracks.
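The relationship between shooting distance, image overlap, and the spacing between consecutive exposures can be illustrated with a short calculation. This is a minimal sketch under a simple pinhole-camera assumption with the camera facing the wall squarely; the function names and the 70-degree field of view are hypothetical and are not taken from this study.

```python
import math


def footprint_m(distance_m, fov_deg):
    """Wall length covered by one image along one axis (pinhole model)."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def shot_spacing_m(distance_m, fov_deg, overlap):
    """Largest move between exposures that still yields the given overlap ratio."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_m(distance_m, fov_deg) * (1.0 - overlap)


# Values from the text: 2 m shooting distance and 65% overlap.
# The 70-degree field of view is an assumed camera parameter.
spacing = shot_spacing_m(distance_m=2.0, fov_deg=70.0, overlap=0.65)
print(f"move at most {spacing:.2f} m between exposures")
```

A higher overlap shortens the allowed spacing and therefore increases the number of images needed to cover the same wall.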
  2.3. Construction of Analysis Data Using Image Processing Technique
The construction of analysis data using image processing techniques involves two methods: (a) Distorted image correction using a homography matrix and (b) feature point extraction using the SIFT algorithm.
Aerial photographs of building exteriors taken using UAVs can contain various distortions. These distortions are caused by factors such as the UAV's position and attitude during aerial photography and the calibration of the camera sensors [32]. A distorted image can be restored to its original form by applying correction techniques. Homography is one of the matrix-based methods for image conversion and is used for perspective conversion. The homography transformation represents a projective transformation in 3D space and projects the plane of a 2D image onto another plane. The 4-point homography projection selects four pairs of corresponding points, each pair consisting of a point in the original image and its counterpart in the transformed image. The homography matrix is estimated from these correspondence points, and the image is transformed using it [33]. In this study, Laser Points were used as the correspondence points of the homography matrix and applied to the distortion correction of images lacking reference objects and feature points. The use of these laser points can reduce errors that occur when correcting such images.
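The 4-point homography estimation described above can be sketched as follows. This is an illustrative implementation rather than the code used in this study: it fixes the last matrix element to 1 and solves the resulting 8x8 linear system directly, and the laser-dot pixel coordinates and target rectangle are hypothetical values.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Estimate the 3x3 homography mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, with h33 fixed to 1.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical pixel positions of four laser dots in a distorted image,
# and the rectangle they should form after correction.
laser_px = [(102, 95), (498, 120), (510, 410), (90, 380)]
target_px = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = homography_from_4_points(laser_px, target_px)
print(apply_homography(H, laser_px[2]))
```

In practice, a library routine such as OpenCV's `cv2.getPerspectiveTransform` performs the same 4-point estimation and `cv2.warpPerspective` applies it to a whole image.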
Next, feature points were extracted from the images corrected with the homography matrix and matched using the SIFT algorithm, so that the image processing techniques could be applied to cracks in concrete buildings. SIFT is a feature point extraction algorithm widely used in image processing and computer vision, and it analyzes images at multiple scales to detect feature points. To this end, feature point candidates were selected by applying a Laplacian of Gaussian (LoG) filter at various scales and orientations of the image. Next, key points with sufficient sharpness and a local maximum response were selected, and feature point detection was performed using a Difference of Gaussian (DoG) filter, which SIFT uses as an approximation of the LoG. Feature descriptors were calculated for the detected feature points, and feature points were matched by comparing their descriptors using the Euclidean distance [34].
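The descriptor comparison step can be illustrated with a small nearest-neighbor matcher based on the Euclidean distance. This is a sketch only: the ratio test used to reject ambiguous matches is a commonly used addition and is an assumption here, not something stated for this study, and the 4-dimensional toy descriptors stand in for real 128-dimensional SIFT descriptors.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    A match (i, j) is kept only when the nearest distance is clearly
    smaller than the second-nearest distance (ratio test).
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


# Toy descriptors (hypothetical values).
image_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
image_b = [[0.0, 1.0, 0.1, 0.0], [0.9, 0.0, 0.0, 0.1], [0.0, 0.0, 1.0, 0.0]]
print(match_descriptors(image_a, image_b))
```

The third descriptor in `image_a` is almost equally close to two candidates, so the ratio test discards it instead of accepting an unreliable match.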
  2.4. Crack Localization
Crack localization involves three methods: (a) Construction of spatial information using image processing techniques, (b) Merging of spatial information layers based on point cloud technique and image stitching technique, and (c) Crack localization and Data validation.
In this study, spatial information was constructed for estimating the location of cracks in concrete building exterior walls using image stitching and Point Cloud techniques. The Point Cloud technique is one of the methods for constructing 3D data and represents objects or environments as a set of points. Each point represents a position in 3D space, and the combination of points forms an overall model [
35]. The typical process of spatial information construction based on Point Cloud techniques consists of three steps: (1) Initial Processing, (2) Point Cloud and Mesh, and (3) Digital Surface Model (DSM) and Orthomosaic. In the initial processing, the Scale Invariant Feature Transform (SIFT) algorithm is used to identify key points as feature points in images that contain location information. Then, matching is performed to find corresponding key points in different images, and the internal and external parameters of the imaging sensor are calibrated. Next, the generated key points undergo Point Densification to construct the Point Cloud. Based on the constructed Point Cloud, a 3D Textured Mesh can be created, allowing the construction of a 3D model. Using this 3D model as a foundation, DSM and orthomosaic can be generated [
36]. Aerial images acquired using unmanned aerial vehicles (UAVs) contain vectorized location information. The spatial information constructed based on the Point Cloud technique using these aerial images has high accuracy and realism because it includes location information [
37]. However, because the point cloud technique divides aerial images into points and reconstructs them when generating a 3D model, the image quality of the resulting spatial information is lower than that of the raw data [38]. As a result, while the spatial information constructed based on the Point Cloud technique retains location information, it has limitations in defining cracks because cracks at the millimeter level are difficult to detect.
The image stitching technique is a method that combines multiple images by matching common feature points to create a single image, allowing for the acquisition of high-resolution images like panoramic images. Feature-based image stitching involves extracting geometric features such as corners, edges, and lines from the input images and comparing them with the reference image to find corresponding points [
39]. These characteristics have robustness against brightness variations. However, extracting too many feature points increases the computational workload, leading to slower processing speed. Additionally, if incorrect feature points are extracted, errors may occur during the matching process. Therefore, in feature-based image stitching methods, the extraction of feature points is the most crucial factor to consider [
40]. The typical image stitching-based spatial information construction consists of five steps: (1) Feature Point Detection and Matching, (2) Correspondence Point Matching, (3) Estimation of Transformation Model, (4) Image Blending, and (5) Image Generation. Firstly, a feature point detection algorithm is used to detect common feature points among input images. Then, the detected feature points are used to match corresponding points between images. Correspondence points represent points connected between feature points in one image and feature points in another image. Next, using image transformation algorithms, relative position, rotation, and size information between images are utilized to estimate the transformation model based on the correspondence points. Using this process, multiple images are combined according to the transformation model to generate a single spatial information image formed using image stitching [
41].
Generally, spatial information generated using image stitching has a resolution high enough to identify cracks. However, the resulting spatial information does not include the Geotags of the original images used in the stitching process. Geotags refer to the geographical location information contained in the images. During stitching, the images are transformed and combined to create a new image, and the individual metadata of each image, including its Geotags, is typically lost [42].
In this study, the spatial information layer based on the Point Cloud technique, which contains position information but may make cracks difficult to identify, was merged with the spatial information layer based on the image stitching technique, which has a resolution high enough for crack identification but lacks position information. Using the merged layers, we estimated the location of cracks on the building's exterior. Additionally, to validate the accuracy of the crack location information estimated using the proposed methodology, we compared the estimated crack positions with the crack positions obtained using field measurements.
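The comparison between estimated and field-measured crack positions can be summarized with a root mean square error (RMSE), the metric reported in the conclusions. The sketch below assumes the error of each point is its planar Euclidean distance to the measured position; the exact error definition used in this study is not stated here, and the coordinates are hypothetical.

```python
import math


def rmse(estimated, measured):
    """Root mean square of the point-to-point distances between two lists."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [math.dist(e, m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical crack end points in millimeters (not data from this study).
estimated_mm = [(1250.0, 3400.0), (1310.0, 3950.0)]
measured_mm = [(1200.0, 3460.0), (1280.0, 4020.0)]
print(f"RMSE = {rmse(estimated_mm, measured_mm):.2f} mm")
```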
  4. Discussion
In this study, a methodology was proposed for the localization of concrete cracks using unmanned aerial vehicle-based aerial images when there is a lack of reference objects and feature points within the images. The use of unmanned aerial vehicles for image-based safety inspections allows data to be obtained from hard-to-reach areas. It minimizes the subjective intervention of inspectors, ensuring the objectivity of the investigation results. Moreover, it improves constraints on the timing and location of inspections, leading to reduced manpower and cost for conventional inspections and surveys. Due to these advantages, research utilizing unmanned aerial vehicles for safety inspections, including buildings and other facilities, has been actively conducted in recent times. However, current studies related to safety inspection, such as the study of Deng et al. [
9], focus only on the detection of cracks. Although some previous studies addressed the location of cracks, they too concentrated on detection and identification and expressed crack locations only approximately [43,44,45,46,47]. In addition, Woo et al. [23] studied crack localization using an image processing technique based on a reference object, but that methodology cannot be used when the facility contains no feature point that can be defined as a reference object. The methodology proposed in this study overcomes these limitations: through experimentation and validation, it was confirmed that crack localization is possible even when reference objects and feature points are lacking.
In this study, the location of cracks was estimated by applying an unmanned aerial vehicle and image processing techniques, which have recently attracted attention in safety inspection. Compared to previous studies, this study is differentiated in that it provides a methodology that can localize cracks even when reference objects in the target building are insufficient or absent. In addition, because it applies technology that has been attracting attention, it is expected to increase the utilization of unmanned aerial vehicles in safety inspection.
However, this study has a limitation in that it designed a separate Laser Point model to be attached to the unmanned aerial vehicle for the supplementation of reference objects and feature points.
Many researchers have conducted studies using unmanned aerial vehicles and deep learning to detect cracks. It is anticipated that the fusion of this study with crack detection technology in the future could make a significant contribution to safety inspections.
  5. Conclusions
In this study, a new image-based concrete crack localization method was developed to address the limitations of the existing crack localization research that relied on reference objects in image processing techniques. The accuracy of crack localization in facilities lacking reference objects and feature points was verified by comparing the estimated positions with the measured ground truth values.
A total of 107 aerial images were acquired using an unmanned aerial vehicle equipped with a laser pointer model, and analysis data were established using image correction and feature point extraction based on the homography matrix and SIFT algorithm. Next, spatial information based on the point cloud technique and spatial information based on the image stitching technique were constructed and combined by layer merging, and the location of cracks on the exterior of the building was estimated from the merged data. Four cracks were defined in the experiment, and localization was performed. The estimated crack positions were compared with the ground truth values obtained from field measurements, revealing an RMSE ranging from 80.80 to 108.95 mm.
The crack localization methodology proposed in this study was applied to buildings with a common box-shaped form, and it is judged to be sufficiently applicable to buildings with such morphological characteristics. If crack localization in atypical concrete buildings is studied in the future, it will become possible to estimate the location of cracks in concrete regardless of the shape of the building.