Automatic Crack Segmentation for UAV-Assisted Bridge Inspection

Bridges are a critical part of road and rail transport infrastructure. Many of the bridges in Norway (in Europe) are at the end of their lifespan; therefore, regular inspection and maintenance are critical to ensure the safety of their operations. However, the traditional inspection procedures and the resources they require are so time-consuming and costly that there exists a significant maintenance backlog. The central thrust of this paper is to demonstrate the significant benefits of adopting Unmanned Aerial Vehicle (UAV)-assisted inspection to reduce the time and costs of bridge inspection, and to establish the research needs associated with the processing of the (big) data produced by such autonomous technologies. In this regard, a methodology is proposed for analysing bridge damage that comprises three key stages: (i) data collection and model training, where experiments and trials are performed to perfect drone flights for inspection using case study bridges, to inform and provide the necessary (big) data for the second key stage; (ii) 3D construction, where 3D models are built that offer a permanent record of element geometry for each bridge asset, which could be used for navigation and control purposes; and (iii) damage identification and analysis, where deep learning-based data analytics and modelling are applied to process and analyse UAV image data and to perform bridge damage performance assessment. The proposed methodology is exemplified via UAV-assisted inspection of Skodsberg bridge, a 140 m prestressed concrete bridge in Viken county in eastern Norway.


Introduction
In the past decade, Europe, North America and South America suffered more than 50 bridge collapses due to deterioration-related issues, such as fatigue fracture and aging of materials, which culminated in more than 150 fatalities and close to 20 billion USD in overall losses, and affected more than a million people. Deterioration accelerates with aging and structural degradation, which can, over time, alter the structural performance and functionality of a bridge [1]. The other key issue that affects bridge performance is inefficient maintenance, which is usually aggravated by technical and economic limitations associated with inspections. Conventional bridge inspection procedures rely on physical site visits and visual inspection for severe and observable damage related to factors such as scouring, corrosion, fatigue and deterioration of materials [2]. However, much of this damage is demanding to detect solely by human vision, for example fractures or cracks in main beams that are not easily accessible from the surface [3].
Neural networks entered the field of damage (object) detection relatively recently, and they are gaining momentum as one of the key methods [24]. Object detection algorithms have evolved from simple image classification to multiple object detection and localization, including Region-Based Convolutional Neural Networks (R-CNN) [25], Fast R-CNN [26], Faster R-CNN [27], You Only Look Once (YOLO) [28], SSD [29], and Mask R-CNN [30]. In addition, Generative Adversarial Networks (GAN) have recently started to be used for object detection; see e.g., [31,32]. However, the lack of labelled data makes it difficult to generalize trained models across a wide variety of structures like bridges. Therefore, these algorithms for the failure detection of civil infrastructures are limited to specific cases, and there are no rich image databases for the failure detection of civil infrastructures; see e.g., [33][34][35][36]. For instance, Mandal et al. [36] proposed an automated pavement distress analysis system based on the YOLO v2 deep learning framework.
Although enormous efforts have been made to secure efficiency in bridge inspection systems, see e.g., Huston et al. [37], Chen et al. [38], Seo et al. [39], Seo et al. [40], Ayele and Droguett [8], and Belcastro et al. [41], challenges in bridge inspection and maintenance remain. To tackle these problems and improve existing bridge management practices, new knowledge with proactive methods and tools is needed. In this connection, this paper proposes a UAV-assisted bridge inspection methodology for improving inspection accuracy and pinpointing defects (such as cracks in steel elements, fractures in concrete elements, etc.) early on. This eventually helps to monitor high-risk bridge elements and reduce failure rates. The applicability of state-of-the-art deep learning techniques, such as Convolutional Neural Networks (CNN), for the automatic per-pixel segmentation of cracks on the structure surface is assessed. Using post-processing image analysis tools and techniques such as OpenCV, Agisoft, SfM and Mask R-CNN, individual cracks are extracted and automatic per-crack measurements, including width, length, perimeter and area computations, are performed.
The rest of the paper is organized as follows: Section 2 presents the proposed methodology for the inspection of bridges via the use of automated UAV image processing for damage detection and performance analysis. Section 3 presents the case study, which describes the investigative methods and results for the UAV-assisted inspection of Skodsberg bridge, a 140 m prestressed concrete bridge in Viken county in eastern Norway. Section 4 provides a discussion of the results of the study. Concluding remarks and suggestions for future work are given in Section 5.

Proposed Methodology
The proposed methodology is an integrated UAV-assisted inspection and automatic damage identification process. Figure 1 illustrates the specific stages, which combine experimental work with a data-driven modelling approach to deliver practical and economically viable UAV-assisted bridge inspection. The methodology comprises three key stages: (i) data collection and model training, where experiments and trials are performed to perfect drone flights for inspection using case study bridges, to inform and provide the necessary (big) data for the second key step; (ii) 3D photogrammetry/construction, where 3D models are built that offer a permanent record of geometry for each bridge asset, which could be used for navigation and control purposes; and (iii) crack identification and segmentation, where deep learning-based data analytics and modelling are applied to process and analyse drone image data and to perform damage assessment.

Stage 1: Data Collection and Model Training
The major task of this stage is to test, trial and perfect drone flights for the inspection of case study bridges, which provide data for all the other steps. As such, a core requirement is the collection of multiple overlapping images of the bridge elements. This allows for the use of Structure from Motion (SfM), where both the exterior orientations of the cameras and the geometry of the bridge elements can be computed simultaneously. In practice, to achieve this during a drone flight, images should be taken at regular intervals to ensure consistent overlap with enough redundancy for the reliable measurement of the bridge elements' geometry.
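As a rough illustration of the regular-interval requirement, the following sketch computes how far, and for how long, the drone may travel between exposures for a target forward overlap. The image footprint length, flight speed and overlap value are hypothetical inputs chosen for illustration; they are not values reported for this study.

```python
def exposure_spacing(footprint_m: float, overlap: float) -> float:
    """Distance the drone may travel between two consecutive exposures.

    footprint_m: along-track length of surface covered by one image
                 (assumed to be known from camera and stand-off distance).
    overlap:     desired forward overlap as a fraction in [0, 1).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return footprint_m * (1.0 - overlap)


def exposure_interval(footprint_m: float, overlap: float, speed_m_s: float) -> float:
    """Time between exposures when flying at a constant speed."""
    if speed_m_s <= 0:
        raise ValueError("speed must be positive")
    return exposure_spacing(footprint_m, overlap) / speed_m_s
```

For example, with an assumed 10 m footprint, 50% overlap and a speed of 2 m/s, `exposure_interval(10.0, 0.5, 2.0)` gives one image every 2.5 s.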
Thereafter, data labelling should be carried out. Labelling typically takes a set of unlabelled drone data and augments each piece of that unlabelled data with meaningful semantic tags. For instance, labels might indicate whether a drone photo contains a crack or not. In principle, in bridge inspections, cracks are one of the failure modes that should be labelled at high precision to train an automatic crack detection and segmentation model. Labelling can be done by using various methods depending on the purpose. For classification, a simple single label is given to the entire image (i.e., crack/no crack), and for detection each image contains bounding box coordinates for all objects in the image.
Afterwards, the size of the cracks can be estimated by creating bounding boxes. This can be achieved, for example, by developing an automated Python script from Portable Network Graphics (PNG) masks as well as tracing and extracting the contours and bounding box extents. Note that PNG is a raster-graphics file-format that supports lossless data compression.
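The idea of deriving bounding boxes from a binary mask can be sketched as follows. This is a minimal illustration and not the script referred to in the paper: it assumes the PNG mask has already been decoded into rows of 0/255 values, and it groups crack pixels with a simple flood fill instead of a library contour tracer.

```python
from collections import deque

CRACK = 255  # assumed mask value for crack pixels; background is 0


def crack_bounding_boxes(mask):
    """Return one (x_min, y_min, x_max, y_max) box per connected crack region.

    mask: list of rows, each a list of ints (0 or 255).
    Pixels are grouped with 8-connectivity using a breadth-first flood fill.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] != CRACK or seen[y0][x0]:
                continue
            # Start a new region and grow it over all connected crack pixels.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and not seen[ny][nx]
                                and mask[ny][nx] == CRACK):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes
```

Box extents are returned in pixel coordinates, in the order in which regions are first met when scanning the mask row by row.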

Stage 2: 3D Construction-Photogrammetry
3D models offer a permanent record of geometry for each bridge asset, which could be used for navigation and control purposes. The addition of 3D capabilities to bridge management allows navigation through a complex structure, providing visual identification of the area of concern rather than relying solely on reference names or numbers. The key task of this stage is thus to compute an orthomosaic of the bridge element geometry. Put simply, an orthomosaic is a mosaic of all images that have been orthorectified (i.e., perspective removed). In this regard, initial geometry reconstruction and camera position estimation should be carried out using one of various techniques, such as Agisoft Metashape; see Agisoft LLC [42]. Agisoft Metashape is a stand-alone software product that performs photogrammetric processing of digital images and generates 3D spatial data to be used in various applications.
Once a coarse model of the bridge element geometry has been calculated, dense image matching can be employed to compute a dense point cloud of the entire bridge element geometry. In this paper, we suggest that all images should be combined into a single composite image. The key purpose of dense point matching is to capture fine-grained bridge element geometry. Resolution is vital here; hence, one has to employ a high resolution, such as up to sub-pixel over multiple views. In addition, the accuracy depends heavily on bundle adjustment. Dense matching can be used as a refinement for measurement accuracy, and it improves the quality of the Digital Surface Model (DSM) and therefore of the resulting orthomosaic. The DSM represents the bridge surface and includes all elements and objects on it.
Energies 2020, 13, 6250
Orthorectification yields the final product of Stage 2. The orthomosaic, as mentioned above, is a single continuous image of the whole bridge element geometry with no redundancy or perspective (i.e., every pixel orthogonal to the height plane). This has two benefits. First, it removes the redundancy of overlapping images, which could otherwise lead to conflicts in an automatic crack segmentation model. Second, the orthomosaic uses the georeferenced 3D point cloud and is therefore georeferenced itself, which means that features located on the orthomosaic have meaningful geospatial coordinates. To compute an orthomosaic, a continuous height surface of the scene is required. As a point cloud consists of discrete observations, we first rasterize the dense point cloud by computing a DSM. This allows every pixel to sample a height value and compute the perspective displacement; for a more detailed example of a typical SfM methodology, see e.g., Westoby et al. [16].
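The rasterization step can be illustrated with a minimal sketch that bins a point cloud into a regular grid and keeps the highest point in each cell. This is a simplified stand-in under stated assumptions (metric x, y, z coordinates, no interpolation of empty cells) and does not reproduce the DSM generation of any particular photogrammetry package.

```python
import math


def rasterize_dsm(points, cell_size):
    """Grid a point cloud into a DSM, keeping the highest z in each cell.

    points:    iterable of (x, y, z) tuples in a metric coordinate system.
    cell_size: ground size of one DSM pixel, in the same unit as x and y.
    Returns (dsm, origin): dsm[row][col] is a height, or None for an empty
    cell; origin is the (x_min, y_min) corner of the grid. Row 0 is the
    row at y_min, so the grid is not flipped into image orientation.
    """
    points = list(points)
    if not points:
        raise ValueError("empty point cloud")
    x_min = min(p[0] for p in points)
    y_min = min(p[1] for p in points)
    x_max = max(p[0] for p in points)
    y_max = max(p[1] for p in points)
    cols = int(math.floor((x_max - x_min) / cell_size)) + 1
    rows = int(math.floor((y_max - y_min) / cell_size)) + 1
    dsm = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        col = int((x - x_min) / cell_size)
        row = int((y - y_min) / cell_size)
        # A surface model keeps the top-most observation in each cell.
        if dsm[row][col] is None or z > dsm[row][col]:
            dsm[row][col] = z
    return dsm, (x_min, y_min)
```

Cells that receive no points stay `None`; a production pipeline would normally fill such gaps by interpolation before using the surface for orthorectification.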

Stage 3: Damage Identification Model
Once the bridge surface images along a predetermined trajectory are collected, the raw images are cropped. Damage regions such as cracks are then labelled on the cropped images to create the damage (crack) dataset. In this stage, effective deep convolutional network architectures, designed specifically with the geometry of crack segmentation in mind, should be developed for the automatic per-pixel segmentation of cracks. Thereafter, the deep convolutional network can be trained, validated and tested using the crack dataset. In this connection, in the illustrative case study, we have developed a deep convolutional network architecture and conducted a number of training experiments to fit the best convolutional architectures for analysing specific drone images and scenarios.

An Illustrative Case Study-UAV-Assisted Inspection of Skodsberg Bridge, Norway
The proposed methodology is exemplified via a drone-assisted inspection of Skodsberg bridge, a 140 m prestressed concrete bridge in Viken county in eastern Norway. The location, structure description, access methods, investigation methods, site-specific safety analysis and imagery results are discussed. Skodsberg bridge is a 2-lane vehicular bridge situated near Aremark, Viken county, eastern Norway. Figure 2 depicts the overall view of the bridge and its key data. It is located at latitude 59.2063° (59°12′22.6″ N) and longitude 11.6932° (11°41′35.4″ E), at an elevation of 115 m (377 feet).
Figure 2. Skodsberg Bridge overall view and key bridge data.

Data Collection
A DJI Matrice 100 drone (Shenzhen Dajiang Baiwang Technology Co., Ltd, Shenzhen, China) with Zenmuse Z3 (Shenzhen Dajiang Baiwang Technology Co., Ltd, Shenzhen, China) aerial zoom cameras with 7X zoom capacity was used for carrying out the drone-assisted inspection. This particular drone was chosen based upon distinctive features such as flight time, camera resolution, video resolution, and others. Autonomous control was tested and trialled using the Z3 cameras and sensors, which can help the drone to autonomously avoid obstacles or simply hold altitude in a GPS-denied environment. Other equipment used includes DJI remote controllers, a landing platform, GPS antenna and handheld, total stations, tripods, spare batteries and blades; an iPad and connection wires to the drone remotes; safety helmets, safety boots and reflective jackets; and tapes and markers. Figure 3 depicts the tools and equipment used during the drone-assisted inspection.
Before the drone flight, two tripod markers were set up at the middle of the bridge as fixed markers. These markers were used as reference points for the drone flights. Thereafter, the drone flight commenced. During the inspection period, the weather was cloudy to start with and ended sunny. Such varying light levels can be an issue for the 3D construction of the bridge. This issue was managed by fixing the histogram level in the DJI Matrice 100 app. It was decided to fly the drone across the bridge and move along it from one end to the other. This allowed us to collect high-resolution images and videos. Thereafter, photos of the bridge columns were taken from both sides; since the bridge supports allowed sufficient height, we also managed to fly the drone underneath the bridge. Once the drone flight was done, some additional total station recordings of the bridge side were taken. These are used as an input to the 3D construction. Figure 4 illustrates the level of detail obtained from the drone-based imaging of Skodsberg bridge.

Model Training
Damage (cracks) needs to be labelled with high precision to train the developed crack segmentation model (see Section 3.4). Once the bridge surface images along a predetermined trajectory are collected, the raw images are cropped. Crack regions are then labelled on the cropped images to create the crack dataset (see Figure 5), and network architectures designed specifically with the geometry of crack segmentation in mind are used for the automatic per-pixel segmentation of cracks; see Supplementary Material, Appendix I. Per-pixel image segmentation requires a full per-pixel mask in which each pixel value denotes that pixel's semantic label. We produced these masks using the open-source GNU Image Manipulation Program (GIMP) and LabelImg. GIMP is an open-source paint tool; LabelImg, on the other hand, is a purpose-made graphical image annotation tool for labelling object bounding boxes in images; see Supplementary Material, Appendix II. The results are obtained as binary masks (i.e., crack = 255 and background = 0).
We have also employed a simple active learning approach for labelling. An active learning approach starts with the same data collection effort as a supervised learning approach. However, instead of naively labelling all drone images, a more specific strategy is employed. First, a small sample of images is manually labelled, and a modern supervised deep learning architecture is developed, trained, validated and tested using the crack dataset; refer to the Supplementary Material, Appendix I. A number of training experiments are conducted to fit the best deep learning techniques for analysing specific UAV images and scenarios. A multimodal dataset, combining results from a UAV-data collection, is used to provide training data. To train the network within reasonable time and other resources for processing the UAV data, NVIDIA Titan V Volta hardware has been used. In addition, a multi-core NVIDIA platform with AI processors, which is the RX/RS 2000 series, CyberpowerPC SLC8780CPG and nVidia GeForce GTX are also employed.
The entire unlabelled drone dataset is then passed through this trained network; see Supplementary Material, Appendix I. The network can then report the images it classified with the least confidence, and therefore the highest uncertainty. These images with the lowest confidence are manually labelled and the model is incrementally retrained. By repeating this process, the network can achieve optimum performance with up to 90% less manually labelled data, reducing labelling costs by the same percentage. Figure 5 depicts the labelling process of the Skodsberg Bridge top view and pillars.
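The selection step of such an active learning loop can be sketched as least-confidence sampling. The sketch assumes that the current model yields one list of class probabilities per image; the function name, the data layout and the file names are illustrative and are not taken from the supplementary code.

```python
def least_confident(predictions, k):
    """Pick the k images whose top-class probability is lowest.

    predictions: dict mapping an image id to a list of class probabilities
                 (e.g. [p_background, p_crack]) from the current model.
    Returns the ids to hand over for manual labelling, most uncertain first.
    """
    # Confidence of a prediction is the probability of its most likely class.
    confidence = {img: max(probs) for img, probs in predictions.items()}
    # Sort by confidence; ties are broken by id so the result is deterministic.
    ranked = sorted(confidence, key=lambda img: (confidence[img], img))
    return ranked[:k]


predictions = {
    "img_a.png": [0.95, 0.05],
    "img_b.png": [0.55, 0.45],
    "img_c.png": [0.40, 0.60],
    "img_d.png": [0.10, 0.90],
}
to_label = least_confident(predictions, 2)  # ["img_b.png", "img_c.png"]
```

The selected images would then be labelled manually, added to the training set, and the model retrained before the next round.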
Thereafter, to estimate the size of the cracks, we created bounding boxes from the PNG masks using the developed automated Python script; see Appendix II in the supplemental file. The contours and bounding box extents are traced. Later, tfrecords, a TensorFlow data format, were created, since tfrecords are required for the automatic crack detection model. Tfrecords are an efficient serialized binary format for loading large datasets and for easy transfer and access. Since a huge quantity of data is collected during drone-assisted inspection, employing a serialized binary format significantly reduces the model training time. In addition, tfrecords allow the storage of sequence data, for instance a time series or word encodings, in a way that allows for very efficient and convenient import of this type of data. Figure 6 depicts the process of extracting the bounding box from the drone photo, which is supported by the tracing of contours.

3D Construction-Orthorectification
The bundle adjustment for Skodsberg bridge is calculated using the Agisoft Metashape (Agisoft LLC, St. Petersburg, Russia) bundle adjustment algorithm. Bundle adjustment is a photogrammetric operation used to solve the inner and outer orientation of each camera, reconstructing the cameras' spatial positions and orientations relative to each other. The performance and accuracy are, in general, dependent on image acquisition: better images with good overlap lead to a more accurate reconstruction. In this regard, we took the images with 60-70% overlap. The initial process of bundle adjustment takes 2D drone images and outputs a 3D sparse point cloud of the scene and the exterior orientations of the cameras, which enables scale for measurements. Thereafter, we carried out dense point matching using Agisoft to capture fine-grained bridge element geometry. A resolution of up to sub-pixel over multiple views and dense matching to create continuous mesh surfaces were employed. Figure 7 illustrates the bundle adjustment and dense matching process.

Afterwards, Digital Surface Models (DSM) were created by orthorectification of the images. While generating a DSM, relief displacement or height distortion can occur; this is the shift in the image position of a bridge element or object caused by its elevation above a particular datum. To remove relief displacement, orthomosaic maps have been applied to the DSM. The DSM then provides the resulting pixel size, and therefore the image scale. Furthermore, an orthorectified image of the whole bridge element geometry is generated, which is a single continuous image of the whole scene with no redundancy or perspective; refer to Supplementary Material, Appendix III for the Python script of the orthorectification process. That means every pixel is orthogonal to the height plane. We have stored the orthomosaic as a Tagged Image File Format (TIFF) file in which each pixel has an associated pixel size in real-world scale (i.e., centimetres). Other information includes the bounding extent (in real-world coordinates), coordinate system, datum and ellipsoid. Figure 8 depicts the DSM and orthomosaic output for Skodsberg bridge.
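Because each orthomosaic pixel carries a real-world size, measurements made in pixels can be scaled directly. The sketch below illustrates this under stated assumptions: square pixels, a north-up raster whose bounding extent gives the top-left corner, and example values that are hypothetical and not taken from the Skodsberg data.

```python
def pixels_to_cm(length_px: float, pixel_size_cm: float) -> float:
    """Convert a length measured in pixels to centimetres."""
    return length_px * pixel_size_cm


def pixel_area_to_cm2(area_px: float, pixel_size_cm: float) -> float:
    """Convert an area measured in pixels to square centimetres."""
    return area_px * pixel_size_cm ** 2


def pixel_to_world(col: int, row: int, x_left: float, y_top: float,
                   pixel_size: float):
    """Real-world coordinates of the top-left corner of pixel (col, row).

    Assumes a north-up raster: x grows with the column index and y
    decreases with the row index. pixel_size is in world units.
    """
    return x_left + col * pixel_size, y_top - row * pixel_size
```

With an assumed pixel size of 0.5 cm, a crack that is 120 px long measures `pixels_to_cm(120, 0.5)`, i.e. 60 cm.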

Crack Segmentation
The central thrust of this stage is detecting and segmenting cracks at the bridge element level. In this regard, we have developed a Mask R-CNN model, an effective deep convolutional network architecture designed specifically with the geometry of crack segmentation in mind, for the automatic per-pixel segmentation of cracks; refer to Supplementary Material, Appendix I. The developed Mask R-CNN is a custom model/software designed for geospatial crack detection and segmentation and is currently an early-development proof of concept. As mentioned above, to train the network within reasonable time and other resources for processing the UAV data, we used NVIDIA Titan V Volta hardware; we also employed a multi-core NVIDIA platform with AI processors like the RX/RS 2000 series and CyberpowerPC SLC8780CPG, nVidia GeForce GTX. For any drone image input, a typical crack detection and segmentation based on Mask R-CNN includes class labelling and the creation of the bounding box coordinates, after which the object mask is returned.
The resulting masks are passed through the crack statistics analysis script; see Appendix II in the supplemental file. The statistics are calculated using the OpenCV image processing library. We use contour approximation to extract individual cracks from the predicted binary mask, and then calculate the contour perimeter and area directly on the contour's vector points. To estimate the length and width of a crack, the Euclidean distances between corners (a1, b1) and between corners (a1, c1) are calculated. The larger distance, i.e., MAX((a1, b1), (a1, c1)), is taken as the length and the smaller as the width. This approximation becomes more accurate as the crack becomes more linear. Figure 9 demonstrates the estimation of crack length and width using Euclidean distance. Figure 10 displays the Graphical User Interface (GUI) outline of the crack segmentation Mask R-CNN analysis for one of the drone images of Skodsberg bridge.

The crack segmentation Mask R-CNN calculates the length and width of each crack from Euclidean distance measurements, and the results (Figure 10) confirm that the approximation is more accurate the more linear the crack is. For instance, the measurement for crack id:004 is of high accuracy, whereas the values for crack id:002 are less accurate because the width of the crack is exaggerated.
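The geometric statistics described above can be illustrated with a minimal, dependency-free sketch. It is not the script from Appendix II: the function names are hypothetical, the contour is assumed to be a closed polygon given as (x, y) pixel coordinates, and b1 and c1 are assumed to be the two corners adjacent to a1 on a rectangle enclosing the crack. In OpenCV, `cv2.arcLength` and `cv2.contourArea` provide the equivalent perimeter and area computations.

```python
import math


def contour_perimeter(points):
    """Perimeter of a closed polygon: sum of distances between consecutive vertices."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def contour_area(points):
    """Area of a closed polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def crack_length_width(a1, b1, c1):
    """Length = larger, width = smaller of the distances a1-b1 and a1-c1.

    Assumes b1 and c1 are the corners adjacent to a1 on the enclosing rectangle.
    """
    d_ab = math.dist(a1, b1)
    d_ac = math.dist(a1, c1)
    return max(d_ab, d_ac), min(d_ab, d_ac)


# Usage: a straight, 10 x 2 pixel crack outlined by its four corners.
contour = [(0, 0), (10, 0), (10, 2), (0, 2)]
perimeter = contour_perimeter(contour)                       # 24.0
area = contour_area(contour)                                 # 20.0
length, width = crack_length_width((0, 0), (10, 0), (0, 2))  # (10.0, 2.0)
```

For a straight crack, the enclosing rectangle follows the crack closely, so both values are meaningful. For a curved or branching crack, the short side of the rectangle spans the whole bend, which is one plausible reason why the width of a crack such as id:002 appears exaggerated. All values are in pixels; converting to physical units would need an image scale that is not given here.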

Result Discussion
The novel deep learning crack segmentation model/toolkit developed here proved beneficial for UAV-assisted inspection of bridges. In the case study illustrated above, UAVs equipped with video cameras were employed as the core of an emerging inspection ecosystem, which also includes post-processing elements such as photogrammetry (taking measurements from photographs and video images) and 3D imagery. The 2D imagery data can be used to quickly establish a basic knowledge of the bridge condition and is usually the first port of call. The developed model/toolkit is designed specifically with crack geometry in mind, for the automatic per-pixel segmentation of cracks and for crack measurements including width, length, perimeter and area.
The findings are as follows. It is demonstrated that the 3D model of the bridge can be used as a baseline for maintenance and asset management purposes. The addition of 3D capabilities to bridge management allows navigation through a complex structure, providing visual identification of the area of concern rather than relying solely on reference names or numbers.

Concluding Remarks and Future Work Suggestions
This work introduced an automatic crack segmentation methodology for bridge inspection based on the segmentation of images obtained from UAVs. The proposed crack segmentation Mask R-CNN detects, locates and quantifies cracks and fractures to a level likely to be impossible for a human inspector, and removes much of the uncertainty and prejudice associated with an inspector's personal judgment of the severity of the structural damage. The developed crack segmentation Mask R-CNN model/toolkit is custom software designed for geospatial defect detection and is currently an early-development proof of concept. Therefore, the results from the case study should be interpreted in light of the current state of knowledge about crack segmentation models. Moreover, the crack measurement values from the illustrative case study should be updated as new data become available, preferably by collecting domain-specific road and bridge images and thereby gradually supplanting the developed Mask R-CNN model. The proposed pipeline can also adapt to and exploit the advantages of UAVs, and can be generalized to failure detection in infrastructures such as railways and overhead power grids.
Our intent is not to provide generalized advice on whether UAV-assisted bridge inspection should replace conventional inspection, since such prescriptions will be specific to the type of bridge and to the accompanying drone-related rules and regulations. Rather, the intent is to highlight the fact that UAV-assisted bridge inspection has huge potential in the years to come. Our conclusion is that the proper use of drones as a key part of bridge inspection will result in more efficient inspection operations and improved safety.

Future Work Suggestions
Crack segmentation models such as Mask R-CNN require very large datasets before generalization is achieved. Such a project would require domain-specific road and bridge images to be collected. The next step could be developing a novel deep/machine learning generative model to predict fatigue crack propagation. Deep learning techniques can also be utilized to identify relevant variables that influence the direction and rate of fatigue crack propagation. Another future plan is, in cooperation with the road and railway industries, to take a first step towards fully autonomous inspections by coupling the proposed model/toolkit. Finally, the proposed crack segmentation model can be generalized to other defects such as corrosion and structural damage.
Supplementary Materials: The following are available online at http://www.mdpi.com/1996-1073/13/23/6250/s1. Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request.