1. Introduction
Defining the best strategies for the visual inspection of large-scale industrial buildings remains a challenge for civil infrastructure engineers. Inspection is typically a time-consuming activity with high human risks and financial costs, and its complexity increases with the surveyed area. In recent years, Unmanned Aerial Vehicles (UAVs) have been incorporated into this task, enabling remote and enhanced inspection procedures.
Metallic sandwich panels are a versatile solution whose properties ensure simple on-site installation and durability. In these elements, corrosion represents an early damage state that can be directly or indirectly responsible for critical failure mechanisms (e.g., delamination, debonding, and perforation), as well as for serviceability constraints in the interior spaces of the assets related to loss of impermeability and water leakage.
In recent years, essential developments in software and hardware have led to undeniable advances in Artificial Intelligence (AI) techniques, enabling several novel applications of image pattern recognition [1] that contribute decisively to solving problems such as the automatic detection of corrosion on metallic sandwich panels. Furthermore, combining these advances with UAVs, a technology that allows the remote, real-time survey of large areas and buildings [2], could help reduce the occurrence of falls from height, which were the most critical risk factor associated with construction activities in Great Britain in 2022 [3] and are also an important concern in the rest of the world [4].
Computer vision has progressed by incorporating deep learning techniques for pattern recognition, in this case, anomaly identification. The landmark of this success was the remarkable performance of the Convolutional Neural Network (CNN) architecture developed in 2012 by Krizhevsky et al. [5], widely known as AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [6].
Nowadays, the most advanced deep learning techniques for image analysis can perform one or more of the following image recognition tasks, depending on the framework used and the level of information required [7,8,9]: (i) classification, which identifies whether anomalies exist, based on the classical CNN algorithm; (ii) detection, which additionally localizes each anomaly and specifies its type, based on the so-called Region-based CNN (R-CNN) algorithm; and (iii) segmentation, which additionally specifies which pixels belong to each identified anomaly, based on the Mask R-CNN algorithm. Mask R-CNN performs instance segmentation, which combines classification, detection, and segmentation for each individual anomaly.
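To make the difference between the three tasks concrete, the toy sketch below starts from hypothetical binary instance masks (1 = anomalous pixel) and derives what each task would report for one image. It is framework-independent and purely illustrative; in Mask R-CNN the class, box, and mask are predicted by separate network heads rather than derived from one another, and the names `mask_to_box` and `describe` are hypothetical.

```python
# Illustrative sketch: how the outputs of the three recognition tasks nest.
# Masks are hypothetical binary grids (1 = corroded pixel); no model is run here.

def mask_to_box(mask):
    """Return the tight (x_min, y_min, x_max, y_max) box around a binary mask, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def describe(instance_masks, label="corrosion"):
    """Summarise one image at the three levels of detail."""
    return {
        # (i) classification: is there any anomaly in the image at all?
        "classification": len(instance_masks) > 0,
        # (ii) detection: where is each anomaly, and of which type?
        "detection": [(label, mask_to_box(m)) for m in instance_masks],
        # (iii) segmentation: which pixels belong to each anomaly?
        "segmentation": [
            [(x, y) for y, row in enumerate(m) for x, v in enumerate(row) if v]
            for m in instance_masks
        ],
    }


mask_a = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
result = describe([mask_a])
print(result["classification"])  # True
print(result["detection"])       # [('corrosion', (1, 0, 2, 1))]
```

Each level strictly adds information: the box can be derived from the pixels, and the image-level label from the presence of any box, but not the other way around.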
In the construction sector, these algorithms can be applied in three distinct areas [10]: (i) health and safety, (ii) management and tracking, and (iii) damage assessment. In the first area, a key concern is developing technology that automatically detects when personal protective equipment is not being used. Shen et al. [11] developed a methodology for detecting the use of safety helmets on construction sites. Based on transfer learning and the DenseNet network, these authors created a bounding-box regressor capable of overcoming common challenges of complex backgrounds, such as scale variance and perspective distortion. The authors were the first to successfully apply a deep learning technique to this problem, and the study points out that the proposed solution is competitive with other existing detection methodologies, such as the YOLO family.
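As background on what a bounding-box regressor learns, the sketch below implements the widely used R-CNN box parameterization: the network predicts offsets (dx, dy, dw, dh) that move a region proposal onto the ground-truth box. This is a generic illustration under that assumption, not necessarily the formulation used by Shen et al.; the box coordinates are hypothetical, and practical implementations often scale the deltas by per-coordinate weights, which are omitted here.

```python
import math

# Sketch of the standard (unweighted) R-CNN box parameterization.
# Boxes are (x1, y1, x2, y2) in pixels; the example boxes are hypothetical.

def to_center(box):
    """Convert (x1, y1, x2, y2) into (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(proposal, target):
    """Regression targets (dx, dy, dw, dh) that move a proposal onto a target box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(target)
    return (gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph)


def decode(proposal, deltas):
    """Apply (predicted) deltas to a proposal and return an (x1, y1, x2, y2) box."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = deltas
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


proposal = (10, 10, 50, 90)  # hypothetical region proposal
helmet = (14, 6, 58, 80)     # hypothetical ground-truth helmet box
deltas = encode(proposal, helmet)
restored = decode(proposal, deltas)  # recovers the helmet box up to float rounding
```

Normalizing the offsets by the proposal size and using log-scale width and height ratios makes the targets largely independent of object scale, which is one reason this parameterization helps with the scale variance mentioned above.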
Applications involving asset management and construction progress are more common and broader. For example, Li et al. [12] created a methodology for counting rebars on construction sites, based on an improved version of the YOLOv3 network [13]. They obtained a detection average precision of 99.7% at an IoU threshold of 50%. However, the proposed methodology proved limited, since the counting was based only on the cross-section of the rebars. The very high average precision also raises concern, suggesting that the model's generalization might be compromised when applied to different scenarios with complex backgrounds. Nevertheless, this innovative idea inspired other studies, such as that of Kardovskyi and Moon [14], who proposed a complete methodology for steel rebar assessment using high-performance hardware. In this study, the Mask R-CNN algorithm, supported by a stereo vision system, was upgraded to measure not only the number of rebars but also their spacing, length, and diameter. However, the dataset, containing only 240 images, was the main drawback of the work. Similarly, Xiao and Kang [15] developed a large-scale dataset of machinery operating on construction sites, including a reliable labeling method that enhances detection and classification. Despite this performance, the work could benefit from further improvements, since it only includes segmentation annotations for particular cases. Furthermore, all the images were taken from ground level, which is less efficient and more time-consuming than aerial acquisition.
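Several of the results above are reported "at an IoU of 50%". The minimal sketch below shows what that criterion means for bounding boxes: a detection counts as a true positive only if its Intersection over Union (IoU) with a ground-truth box is at least 0.5. The function names and coordinates are hypothetical.

```python
# Sketch: box IoU and the "IoU of 50%" matching criterion behind AP@50.
# Boxes are (x1, y1, x2, y2); the example coordinates are hypothetical.

def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.5):
    """A detection is correct only if its IoU with the ground truth reaches the threshold."""
    return box_iou(pred, gt) >= threshold


gt_box = (0, 0, 10, 10)
print(box_iou((5, 0, 15, 10), gt_box))           # one third: rejected at 0.5
print(is_true_positive((2, 0, 12, 10), gt_box))  # True (IoU = 2/3)
```

Average precision at this threshold then summarizes precision over all recall levels for detections ranked by confidence; segmentation AP uses the same idea with mask IoU instead of box IoU.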
The indoor tracking of the construction process was addressed by Wei et al. [16], who used the Mask R-CNN algorithm and a stereo camera to capture 738 images and monitor the execution progress of a base floor, including coatings, with the results being transferred to a BIM digital model. This study faced the challenge of extrapolating the learning to other building construction stages. In the area of waste management and disposal, a current topic of concern, Lu et al. [17] applied semantic segmentation based on a DeepLabv3+ network [18] to recognize the composition of construction waste (e.g., rock, stone, packaging, fabric, and wood) and achieved a mean Intersection over Union (mIoU) of 56%. Chen et al. [19] applied Mask R-CNN to estimate the overall built area in rural regions, using open-source satellite images and a transfer learning strategy for the training stage, and UAV-acquired images for the test/inference stage. However, this study did not take full advantage of the UAV images, missing the opportunity to use detailed high-resolution images in the training stage.
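The mIoU reported for semantic segmentation averages the per-class IoU between predicted and reference label maps. A minimal sketch for a single pair of maps follows; the class ids and maps are hypothetical, and in practice intersections and unions are usually accumulated over the whole test set before dividing.

```python
# Sketch: mean Intersection over Union (mIoU) for semantic segmentation.
# Label maps are lists of rows of integer class ids; the example is hypothetical
# (e.g., 0 = background, 1 = wood, 2 = rock).

def mean_iou(pred, target, num_classes):
    """Average over classes of |pred == c AND target == c| / |pred == c OR target == c|."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for prow, trow in zip(pred, target):
            for p, t in zip(prow, trow):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union > 0:  # ignore classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


target = [[0, 0, 1], [0, 2, 2]]
pred = [[0, 1, 1], [0, 2, 0]]
print(mean_iou(pred, target, 3))  # 0.5 (each class has IoU 0.5 here)
```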
Damage assessment is currently a significant concern for infrastructure managers and is the area where most studies involving advanced image processing are performed. Karaaslan et al. [20] proposed a semi-supervised methodology to detect spalling in real time, providing a 30% improvement in precision compared to a human inspector. Moreover, Santos et al. [21] classified images of exposed steel rebars from an industrial building using a CNN and, innovatively, used a UAV to obtain orthomosaic maps with the identified anomalies.
Instance segmentation has also been applied to damage assessment, but this technique is still underused compared to other AI algorithms [10]. Zhan et al. [22] used the Mask R-CNN framework and aerial images to precisely identify damaged buildings after the 2016 Kumamoto earthquake, reaching 88% accuracy but without reporting segmentation metrics. In addition, Hou et al. [23] applied Mask R-CNN to ground penetrating radar images to automatically detect and segment abnormal instances that might indicate corrosion in concrete bridges, reaching average accuracies for detection and segmentation of 58.6% and 47.6%, respectively.
Corrosion defects have also been identified with machine learning in several applications involving bridges and buildings [24,25,26], and other potential applications include the automatic detection of defects in welded joints [27]. However, none of these authors explored the potential of Mask R-CNN to detect corrosion in metallic structures, with the exception of Forkan et al. [28], who developed a platform based on Mask R-CNN, called CorrDetector, to segment corrosion in telecommunication towers.
The present work extends the contributions of AI to the remote inspection of civil infrastructure, addressing the lack of applications of instance segmentation algorithms in this area and developing a methodology capable of identifying corrosion on the sandwich panels of large-scale industrial buildings. It also introduces a novel dataset of more than 8k high-resolution images acquired with a modern UAV. The labeled dataset contains about 18k segmented instances, intended to cope with complex backgrounds, which typically result from the use of aerial images and from the particular locations where these industrial buildings are usually situated.
The proposed nondestructive methodology combines the data analytics capabilities of deep learning, through the still underexplored Mask R-CNN framework with some proposed adjustments, with the versatility of UAVs, helping managers and maintenance planners assess the condition of their buildings. To the best of the authors' knowledge, this is the first methodology fully dedicated to efficiently identifying corrosion on metallic sandwich panels.
6. Conclusions
This article proposes a methodology to automatically detect corrosion in the roofing systems of large-scale industrial buildings. First, the procedure relies on setting up an image database of more than 8k high-resolution images acquired with a UAV vision system. The UAV used was the DJI Mavic Enterprise Advanced, equipped with an RTK system that provides real-time position estimates and therefore allows high-accuracy georeferenced images to be recorded. Second, the image dataset, containing about 18k instances of corrosion, was annotated with the VGG Image Annotator (VIA) software and exported to a JSON file compatible with the AI framework. Finally, the procedure entails the application of advanced image processing techniques based on the Mask R-CNN deep learning framework.
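The conversion from VIA annotations to a format the training framework can consume is not detailed here. As an illustration only, the sketch below turns a VIA polygon export into dictionaries following Detectron2's standard dataset-dict layout (`file_name`, `image_id`, `height`, `width`, and `annotations` with `bbox`, `bbox_mode`, `segmentation`, `category_id`). It assumes the VIA 2.x export structure (regions stored as a list), with a fallback for VIA 1.x (regions stored as a dict); the fixed image size, the single corrosion category, and the function name are hypothetical simplifications, since VIA's `size` field is the file size in bytes rather than the image dimensions.

```python
import json

# Sketch: VIA polygon export -> Detectron2-style dataset dicts (assumptions above).

def via_to_dataset_dicts(via_json, image_size, category_id=0):
    """Convert a parsed VIA export; image_size = (height, width) assumed for all images."""
    height, width = image_size
    records = []
    for idx, entry in enumerate(via_json.values()):
        regions = entry["regions"]
        if isinstance(regions, dict):  # VIA 1.x stores regions in a dict
            regions = list(regions.values())
        annotations = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch only handles polygon regions
            xs, ys = shape["all_points_x"], shape["all_points_y"]
            # Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]
            poly = [float(c) for xy in zip(xs, ys) for c in xy]
            annotations.append({
                "bbox": [min(xs), min(ys), max(xs), max(ys)],
                "bbox_mode": 0,  # value of detectron2.structures.BoxMode.XYXY_ABS
                "segmentation": [poly],
                "category_id": category_id,
            })
        records.append({
            "file_name": entry["filename"],
            "image_id": idx,
            "height": height,
            "width": width,
            "annotations": annotations,
        })
    return records


sample = json.loads("""{
  "roof_001.jpg123": {
    "filename": "roof_001.jpg",
    "size": 123,
    "regions": [
      {"shape_attributes": {"name": "polygon",
                            "all_points_x": [10, 40, 35],
                            "all_points_y": [5, 8, 30]},
       "region_attributes": {"label": "corrosion"}}
    ],
    "file_attributes": {}
  }
}""")
dicts = via_to_dataset_dicts(sample, image_size=(3000, 4000))
print(dicts[0]["annotations"][0]["bbox"])  # [10, 5, 40, 30]
```

A function returning such a list could then be registered with Detectron2's `DatasetCatalog.register`; keeping the conversion in plain Python makes it easy to inspect before training.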
The training of the Mask R-CNN model involved tuning several hyperparameters in Detectron2, the library released by the Facebook AI Research (FAIR) team. The adjusted hyperparameters were the input image size, the data augmentation strategy, the IoU threshold of the RoI heads, and the backbone network. The results are consistent across the training, validation, and test datasets. In terms of metrics, the average precision for detection and segmentation at an IoU threshold of 50% reached 65.1% and 59.2%, respectively. Furthermore, the precision and recall reached 85.8% and 84.0%, respectively, for all instances identified in the labeling process. Visually, the inferences indicate that the model can be trusted, as it identifies anomalies even in the most complex backgrounds and lighting conditions. Indeed, the results of this research suggest a reliable and effective method for detecting corrosion on metallic sandwich panels, enabling long-distance, non-contact, low-cost, and automated inspection that can lead to cost savings in the facility management strategies of large-scale industrial buildings.
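The exact protocol behind the reported precision and recall is not described in this section. The sketch below shows one common way such instance-level figures are computed, assuming greedy one-to-one matching of confidence-ranked predictions to ground-truth masks at IoU ≥ 0.5; masks are represented as sets of pixel coordinates, and all example data and helper names are hypothetical.

```python
# Sketch: instance-level precision and recall at an IoU threshold of 0.5.
# Masks are sets of (x, y) pixel coordinates; all example data are hypothetical.

def mask_iou(a, b):
    """Intersection over Union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of (score, mask) predictions to ground-truth masks."""
    matched = set()
    tp = 0
    # Higher-confidence predictions claim ground-truth instances first.
    for _score, pred in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue  # each ground-truth instance can be matched only once
            iou = mask_iou(pred, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp   # unmatched predictions
    fn = len(ground_truth) - tp  # missed ground-truth instances
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


def square(x0, y0, size):
    """Hypothetical helper: the pixel set of a size-by-size square."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


gt = [square(0, 0, 4), square(10, 10, 4)]
pred = [
    (0.9, {(x, y) for x in range(4) for y in range(3)}),  # IoU 0.75 with gt[0]
    (0.8, square(20, 20, 2)),                              # no overlap: false positive
    (0.6, square(10, 10, 4)),                              # IoU 1.0 with gt[1]
]
p, r = precision_recall(pred, gt)  # p = 2/3, r = 1.0
```

Over a full test set, true positives, false positives, and false negatives would be summed across all images (typically after discarding predictions below a confidence threshold) before computing the two ratios.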
As future improvements, the authors are developing an application to integrate new types of anomalies into the instance segmentation model, such as mechanical damage and water puddle accumulation. Furthermore, a semi-supervised technique to be applied to the existing database is being studied, which will support the automatic annotation of corrosion instances in similar contexts. Finally, the integration of the georeferenced anomalies derived from the AI model into 3D photogrammetric reconstructions of the roofing systems is also planned, as well as real-time inference on embedded UAV hardware (e.g., NVIDIA Jetson Orin).