Article

Automated Mapping of Post-Storm Roof Damage Using Deep Learning and Aerial Imagery: A Case Study in the Caribbean

by Maja Kucharczyk 1,*, Paul R. Nesbit 2 and Chris H. Hugenholtz 1

1 Department of Geography, University of Calgary, Calgary, AB T2N 1N4, Canada
2 Department of Environmental Science, University of San Francisco, San Francisco, CA 94117, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(20), 3456; https://doi.org/10.3390/rs17203456
Submission received: 6 August 2025 / Revised: 13 September 2025 / Accepted: 8 October 2025 / Published: 16 October 2025

Highlights

What are the main findings?
  • We developed an end-to-end workflow for automated roof damage mapping using Esri ArcGIS Pro deep learning tools and post-hurricane aerial images acquired by drones and crewed aircraft in the Caribbean.
  • The highest overall F1 scores were 0.88 (roof decking) and 0.80 (roof hole). Accuracies increased when single-class models were trained and when the training datasets were expanded to include representation of all four test areas.
What are the implications of the main findings?
  • The data, tools, and trained models from this study can be downloaded and used to continue model training and testing in additional geographic and imaging contexts.
  • Our deep learning methodology may be useful in real-world situations, at least for generating initial damage maps that are later refined by human analysts.

Abstract

Roof damage caused by hurricanes and other storms needs to be rapidly identified and repaired to help communities recover from catastrophic events and support the well-being of residents. Traditional, ground-based inspections are time-consuming but have recently been expedited via manual interpretation of remote sensing imagery. To potentially accelerate the process, automated methods involving artificial intelligence (i.e., deep learning) can be applied. Here, we present an end-to-end workflow for training and evaluating deep learning image segmentation models that detect and delineate two classes of post-storm roof damage: roof decking and roof holes. Mask2Former models were trained using 2500 roof decking and 2500 roof hole samples from drone RGB orthomosaics (0.02–0.08 m ground sample distance [GSD]) captured in Sint Maarten and Dominica following Hurricanes Irma and Maria in 2017. The trained models were evaluated using 1440 reference samples from 10 test images, including eight drone orthomosaics (0.03–0.08 m GSD) acquired outside of the training areas in Sint Maarten and Dominica, one drone orthomosaic (0.05 m GSD) from the Bahamas, and one orthomosaic (0.15 m GSD) captured in the US Virgin Islands with a crewed aircraft and different sensor. Accuracies increased with a single-class modeling approach (instead of training one dual-class model) and expansion of the training datasets with 500 roof decking and 500 roof hole samples from external areas in the Bahamas and US Virgin Islands. The best-performing models reached overall F1 scores of 0.88 (roof decking) and 0.80 (roof hole). In this study, we provide: our end-to-end deep learning workflow; a detailed accuracy assessment organized by modeling approach, damage class, and test location; discussion of implications, limitations, and future research; and access to all data, tools, and trained models.

1. Introduction

1.1. Background

Storms (e.g., hurricanes, cyclones, typhoons, tornadoes) are the costliest natural hazard-related disaster type, resulting in an average of USD 70 billion in recorded economic losses per year [1]. Physical impacts from storms include flooding, debris, downed trees and power lines, and damage to personal and public property. Storm damage to buildings often affects roofs [2]. Strong winds rip off waterproof materials such as shingles and metallic covers. Sometimes, all layers comprising the roof are torn off, resulting in holes [2]. Once these protective layers are removed, the roof no longer functions and the building and its residents are exposed to the elements, including rain, sun, wind, and debris.
Quick repairs must be made to storm-damaged roofs to allow residents to keep living in their homes. This effort reduces temporary shelter needs and additional (costly) damage to buildings [3]. Keeping residents in their homes also serves an important role for mental health, including maintaining a sense of security, familiarity, and privacy [4].
Permanent repairs to damaged roofs are difficult to complete in an emergency timeframe, so temporary repairs are commonly made in the interim. These include the application of plastic sheeting, tarps, and other materials such as wood and nails to cover exposed roof decking (areas where waterproof layers are missing, but an underlayer is present) and roof holes (areas where no roof layers remain) [5].
Several organizations around the world supply plastic sheeting and tarps to affected residents following major events [6,7,8,9,10,11]. Some organizations also perform repairs. Operation Blue Roof, which is managed by the United States Army Corps of Engineers for the Federal Emergency Management Agency (FEMA), is a program that provides free temporary repairs of residential roofs [12]. To qualify for the program, roofs are assessed for eligibility using factors such as their materials and extent of damage [12]. These inspections have traditionally been performed by ground crews, which can be significantly time-consuming [13].
In recent years, Operation Blue Roof started to expedite inspections by visually assessing aerial and satellite imagery in a GIS environment [13]. To potentially accelerate the process, remote sensing imagery can be combined with artificial intelligence (AI) to train computers to look for and outline roof damage. This automated mapping generates a GIS database that can be used to quantify and manage damage and repairs.
Deep learning is a form of AI that has been growing in popularity in the field of remote sensing image classification since 2014 [14,15,16]. Four major approaches include: (1) scene classification (where an entire image is assigned one or more labels); (2) semantic segmentation (where each pixel in an image is assigned a label); (3) object detection (where objects of interest in an image are detected and their bounding boxes are produced); and (4) instance segmentation (where objects of interest in an image are detected and their boundary masks are produced) [17].
The objectives of this research are to train and evaluate deep learning models that perform automated detection and delineation of two classes of post-storm roof damage: roof decking and roof holes. We apply the semantic segmentation capabilities of a popular image segmentation framework, Mask2Former [18]. Mask2Former and custom variants have been used in numerous related remote sensing studies to detect and delineate buildings [19,20,21,22,23,24,25,26], rooftop photovoltaic panels [27,28,29], and doors and windows on building facades [30].
The present study builds on previous work that delineated roof damage using remote sensing imagery by: (1) using an approach that does not require the isolation of rooftops in the data, (2) observing two classes of roof damage that are relevant to post-storm damage assessment, and (3) evaluating model accuracy in multiple external test locations in the Caribbean. We first trained single-class and dual-class Mask2Former models using 2500 roof decking and 2500 roof hole samples from drone RGB orthomosaics (0.02–0.08 m ground sample distance [GSD]) captured in Sint Maarten and Dominica following Hurricanes Irma and Maria in 2017. The trained models were evaluated using 1440 reference samples from 10 RGB test images, including eight drone orthomosaics (0.03–0.08 m GSD) acquired outside of the training areas in Sint Maarten and Dominica, one drone orthomosaic (0.05 m GSD) from the Bahamas following Hurricane Dorian, and one orthomosaic (0.15 m GSD) derived from images captured in the US Virgin Islands after Hurricane Maria with a crewed aircraft and different sensor. Compared to the dual-class model, the two single-class models resulted in a higher overall F1 score for each damage class. To examine the impact of expanding the training datasets with additional geographic and imaging variety, we then incorporated 500 roof decking and 500 roof hole samples from external areas in the Bahamas and US Virgin Islands. Single-class models trained using the expanded training datasets resulted in the highest overall F1 scores: 0.88 (roof decking) and 0.80 (roof hole). In this study, we provide: our end-to-end deep learning workflow; a detailed accuracy assessment organized by modeling approach, damage class, and test location; discussion of implications, limitations, and future research; and access to all data, tools, and trained models.
This article is organized as follows. First, Section 1.2 presents a literature review of the observation of roof damage using remote sensing data, including a summary of research needs. Section 2 presents our workflow, as well as the process for extending the original results via training dataset expansion. Section 3 and Section 4 contain results and a discussion, respectively. Finally, we summarize our findings and provide conclusions in Section 5.

1.2. Related Work and Research Needs

From a literature review of 83 studies that used remote sensing data to observe roof damage, 42 classified entire roofs as damaged [31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72], while the other 41 performed localization of roof damage [73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113]. Of the 41 localization-focused studies, 14 performed manual interpretation of remote sensing data, including oblique and orthorectified optical imagery, videos, digital surface models (DSMs), and 3D point clouds [73,74,75,76,77,78,79,80,81,82,83,84,85,86]. Meanwhile, four studies used deep learning to detect (i.e., produce bounding boxes around) localized roof damage [87,88,89,90]. The remaining localization-focused studies (23) used (semi-)automated approaches to delineate roof damage [91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113], which is the objective of our study. Damage delineation is important because it allows for areal measurements, which inform repair needs (e.g., how much material is required).
Delineation of roof damage using remote sensing data and (semi-)automated approaches has involved the use of light detection and ranging (lidar) as well as optical imagery products. Lidar-focused studies (6) have performed change detection using pre- and post-event DSMs and point clouds [91,92,93,94], classification of post-event point clouds [95], and thresholding of post-event point cloud intensity values [96].
The remaining studies (17) that performed (semi-)automated delineation of roof damage focused on the use of oblique and orthorectified optical imagery and, occasionally, photogrammetrically derived DSMs and point clouds. Approaches included: unsupervised and supervised classification of pixels [97], superpixels [98,99,100,101,102,103,104], and image objects [105]; image-based texture-wavelet analysis [106,107,108,109]; unsupervised image segmentation followed by the calculation of indexing parameters [110]; clustering and classification of photogrammetric point clouds [111]; and deep learning. The deep learning-related studies applied instance segmentation [112] and semantic segmentation [104,113] frameworks to delineate roof damage.
It is important to note that these 17 image-based damage delineation studies presented approaches that required manual or (semi-)automated isolation of rooftop areas. In other words, non-rooftop areas were generally not considered in the application and evaluation of the damage delineation techniques. These studies isolated rooftops in their data via manual approaches [97,103,110,113], building footprint polygons [102,104,105], edge detection and color invariance properties [106,107,108,109], point cloud filtering [111], vegetation and shadow masking of superpixels [101], the use of a normalized DSM [98,99,100], and deep learning object detection [112]. Rooftop extraction is useful for eventually calculating the percent of damage to each roof, which can be used to rank roofs based on their repair needs [92,93,102,104,105,106,107,108,109] and determine their eligibility for repair [12]. However, it is an extra step that may be time-consuming and inaccurate due to factors such as outdated or misaligned building footprints and classification error.
Furthermore, most of these 17 studies delineated one general roof damage class. Three studies used at least one non-general class, including “hole” [97,113], “deformation” [113], “debris” [113], “collapse” [112], and “deficiency” [112]. In a post-storm temporary repair context, class differentiation is important due to varying labor and repair material needs. For example, exposed roof decking may only require the application of plastic sheeting, while a roof hole may require plywood prior to the application of plastic sheeting [5]. Relatedly, these two classes are separate factors in determining the level of damage to a building, such as in the FEMA Hazus Hurricane Model [114].

2. Materials and Methods

The objectives of this research were to train and evaluate deep learning models that perform automated detection and delineation of two classes of post-storm roof damage: roof decking and roof holes. We used Mask2Former, a universal image segmentation framework that is capable of performing semantic, instance, and panoptic segmentation. Here, we used its semantic segmentation capabilities. The architecture of Mask2Former has three main components. First, a backbone creates a low-resolution feature map from an input image. Then, a pixel decoder gradually upsamples the low-resolution features and generates high-resolution, per-pixel embeddings. These embeddings are then fed to a transformer decoder to predict masks (segments) and corresponding classes [18].
During training of a Mask2Former model, training tiles (each comprising an orthomosaic subset with an accompanying label/mask raster) are systematically input to the model one minibatch (e.g., four tiles) at a time. After a minibatch is passed through the Mask2Former model, a loss (error) is calculated based on the model’s mask and class predictions. The loss is used to guide the model in adjusting its learnable parameters such that the loss is reduced. This iterative learning process continues with each successive minibatch of tiles. For a detailed explanation of Mask2Former, the reader is referred to the original publication [18].
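To make this iterative minibatch process concrete, the following is a minimal conceptual sketch in PyTorch (the framework underlying the segmentation toolbox used in Section 2.5), with a trivial one-layer stand-in for Mask2Former and randomly generated tiles. The real model predicts masks and classes and combines several loss terms [18]; only the minibatch-loss-update cycle is illustrated here.

```python
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=1)   # toy stand-in for Mask2Former
criterion = torch.nn.CrossEntropyLoss()        # stand-in for the mask/class losses
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(3):                          # successive minibatches
    images = torch.rand(4, 3, 512, 512)        # one minibatch of four RGB tiles
    labels = torch.randint(0, 3, (4, 512, 512))  # 0=background, 1=decking, 2=hole
    loss = criterion(model(images), labels)    # error of the model's predictions
    optimizer.zero_grad()
    loss.backward()                            # gradients guide parameter updates
    optimizer.step()                           # adjust parameters to reduce loss
    print(f"minibatch {step}: loss = {loss.item():.3f}")
```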
The following sections describe major steps in our deep learning workflow: study area and images (Section 2.1), image preparation (Section 2.2), training and reference polygon creation (Section 2.3), training data export (Section 2.4), model configuration and training (Section 2.5), model inference and post-processing (Section 2.6), and accuracy assessment (Section 2.7). In Section 2.8, we describe the process of extending the original results by expanding the training datasets with samples from additional geographic and imaging contexts, followed by training and evaluation of new models. Figure 1 shows a flowchart of our deep learning workflow, including data inputs/outputs and Esri ArcGIS Pro tools. All procedures were completed using: (1) Esri ArcGIS Pro v.3.4.3 with Python packages arcpy v.3.4 and arcgis v.2.4.0.1 (also known as the ArcGIS API for Python); (2) Python deep learning dependencies for ArcGIS Pro 3.4 (an installer for which can be downloaded at [115]); and (3) a desktop computer with a 64-bit Windows 11 Pro operating system, 4.5 GHz AMD Ryzen Threadripper PRO 5995WX 64-core CPU, 256 GB RAM (Samsung M393A4K40EB3-CWE), and 24 GB NVIDIA GeForce RTX 4090 GPU. All data, tools, and trained models presented in this study are available for download at [116]. Because the availability of file hosting platforms can change over time, a GitHub repository was created to host the download link. The download link can be updated as needed and continue to be shared (as long as the GitHub repository remains accessible). In the event of file access challenges, the reader is encouraged to contact the corresponding author.
Table 1. Properties of orthomosaics used for model training and testing.

| Location | Category | Number of Orthomosaics | GSD (m) | Total Area (km²) ¹ | Imaging Platform | Source |
|---|---|---|---|---|---|---|
| Dominica | Training | 50 | 0.03–0.08 | 20.29 | SkyRanger ², M200 ³ | GlobalMedic [117] |
| Sint Maarten | Training | 24 | 0.02–0.08 | 11.77 | SkyRanger ², M200 ³ | GlobalMedic [117] |
| All | | 74 | 0.02–0.08 | 32.06 | | |
| Dominica | Testing | 4 | 0.04 | 1.75 | M200 ³ | GlobalMedic [117] |
| Sint Maarten | Testing | 4 | 0.03–0.08 | 1.62 | SkyRanger ² | GlobalMedic [117] |
| The Bahamas | Testing | 1 | 0.05 | 0.51 | M200 ³ | GlobalMedic [117] |
| US Virgin Islands | Testing | 1 | 0.15 | 0.47 | Crewed aircraft ⁴ | US NOAA NGS [118] |
| All | | 10 | 0.03–0.15 | 4.35 | | |

¹ Areas were calculated based on the polygons used for clipping the orthomosaics (Figure 2). ² Multirotor drone: Teledyne FLIR SkyRanger. ³ Multirotor drone: DJI M200 with a Zenmuse X4S sensor. ⁴ Crewed aircraft with a Trimble Digital Sensor System.
Figure 2. Locations of orthomosaics used for model training and testing.

2.1. Study Area and Images

In September 2017, Hurricanes Irma and Maria passed through the Caribbean. Weeks and months after landfall, GlobalMedic (a Canadian disaster-relief charity [117]) acquired drone imagery in locations including Sint Maarten and Dominica. They used small multirotor drones equipped with RGB cameras (i.e., Teledyne FLIR SkyRanger and DJI M200 with a Zenmuse X4S sensor). The RGB images were processed using Pix4D's Pix4Dmapper structure-from-motion photogrammetry software to produce 0.02–0.08 m orthorectified image mosaics (orthomosaics).
Table 1 provides the attributes of these orthomosaics, while Figure 2 shows their locations. Four orthomosaics from Dominica and four from Sint Maarten were designated as test images, while the remainder (50 from Dominica and 24 from Sint Maarten) were used for model training (Table 1 and Figure 2). Test images were chosen based on whether they could be comprehensively labeled, in which each roof decking and roof hole object would be delineated. To evaluate model accuracy in additional geographic and imaging contexts, we designated two other orthomosaics as test images (Table 1 and Figure 2). The first was derived from drone RGB images acquired by GlobalMedic in the Bahamas following Hurricane Dorian in 2019. As in the 2017 campaign, a DJI M200 multirotor drone with a Zenmuse X4S sensor was used and the images were processed in Pix4Dmapper to produce a 0.05 m orthomosaic. The other test orthomosaic was derived from aerial images acquired by the US National Oceanic and Atmospheric Administration (NOAA) National Geodetic Survey (NGS) in the US Virgin Islands following Hurricane Maria in 2017. The images were captured using a crewed aircraft and Trimble Digital Sensor System, processed into 0.15 m orthomosaics, and split into tiles available for download [118]. We used one of the tiles as a test image. Since the Bahamas and US Virgin Islands test images were captured in locations farther from the training areas than the Dominica and Sint Maarten test images, these orthomosaics were used for evaluating model transferability to different geographic settings. The US Virgin Islands test image also provided an opportunity to evaluate model transferability to an imaging context with a higher GSD and different sensor compared to the training data. Furthermore, the US Virgin Islands orthomosaic is part of the vast NOAA NGS Emergency Response Imagery archive that grows with new datasets following major events [119]. Given its size, coverage, and frequency of new data addition, we wanted to explore the potential of incorporating NOAA NGS Emergency Response Imagery into this deep learning application. All orthomosaics used in this study and an accompanying metadata table are available for download at [116].

2.2. Image Preparation

To prepare the downloaded orthomosaics for model training and testing, we first used the Create Features tool in Esri ArcGIS Pro to delineate the portion of each orthomosaic that we wanted to include (Figure 1 and Figure 2). Then, we used a custom ArcGIS Pro tool called Prepare Images, which is part of the Roof Damage Assessment toolbox and is available for download at [116]. To run the tool, we input the downloaded orthomosaics and their boundary polygons (Figure 1 and Figure 2). The following processing was performed on each image: (1) extracting RGB bands, if needed; (2) projecting, if needed; (3) resampling to 0.05 m/px using cubic convolution; (4) clipping by the corresponding boundary polygon; and (5) exporting to an 8-bit unsigned file geodatabase raster.
The US Virgin Islands 4-band test orthomosaic required RGB band extraction and projection, whereas the drone orthomosaics were already 3-band RGB images in appropriate projected coordinate systems. Resampling was done to establish a constant scale. Because each image would eventually be split into 512 px² tiles for either model training or inference, the use of a constant pixel size controlled the scale at which damage objects and surroundings were presented to the model. A pixel size of 0.05 m was chosen because it allows small damage objects to be resolved while supporting computational efficiency, as opposed to lower values and subsequently more pixels. Cubic convolution was the chosen resampling technique because it is suitable for remote sensing imagery, producing sharper results than bilinear interpolation and less geometric distortion than nearest neighbor [120,121]. The orthomosaics were clipped using their corresponding boundary polygons (Figure 2) to remove outer regions of geometric distortion and no data, as well as to reduce file sizes. Just as resampling established a constant spatial scale, exporting each orthomosaic to an 8-bit unsigned raster established a common radiometric scale of 0–255.
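For illustration, the five steps performed by Prepare Images could be scripted with standard arcpy geoprocessing calls roughly as follows. This is a hedged sketch, not the tool's actual source: all paths, the spatial reference, and the NoData value are hypothetical placeholders, and band extraction via arcpy.ia.ExtractBand assumes an Image Analyst license.

```python
import arcpy
from arcpy.ia import ExtractBand

arcpy.env.overwriteOutput = True

src = r"C:\data\usvi_tile.tif"              # hypothetical 4-band input image
rgb = ExtractBand(src, [1, 2, 3])           # (1) extract RGB bands, if needed
rgb.save(r"C:\work\rgb.tif")

arcpy.management.ProjectRaster(             # (2) project, if needed
    r"C:\work\rgb.tif", r"C:\work\proj.tif",
    arcpy.SpatialReference(32620))          # hypothetical UTM zone
arcpy.management.Resample(                  # (3) resample to 0.05 m/px
    r"C:\work\proj.tif", r"C:\work\resampled.tif", "0.05", "CUBIC")
arcpy.management.Clip(                      # (4) clip by the boundary polygon
    r"C:\work\resampled.tif", "#", r"C:\work\clipped.tif",
    r"C:\work\boundary.shp", "255", "ClippingGeometry")
arcpy.management.CopyRaster(                # (5) export 8-bit unsigned raster
    r"C:\work\clipped.tif", r"C:\work\prepared.gdb\ortho",
    pixel_type="8_BIT_UNSIGNED")
```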

2.3. Training and Reference Polygon Creation

To prepare vector data that would be used to export training data (Section 2.4) and evaluate each model (Section 2.7), we imported the prepared orthomosaics into ArcGIS Pro and used the Create Features tool to digitize roof decking and roof hole objects (Figure 1). All test images were comprehensively labeled such that each roof damage object was delineated. For the training images, we labeled damage as comprehensively as possible. Table 2 provides a summary of training and reference (testing) polygons in each location. A total of 5000 training polygons were created, with 2500 in each location and class. For the reference polygons, there was a natural class imbalance in each location, but each class had a similar total number of polygons (700 decking and 740 hole objects).

2.4. Training Data Export

To export training data to the proper Esri format, we created a custom ArcGIS Pro tool called Export Training Data, which is also included in the Roof Damage Assessment toolbox [116]. To run the tool, we input the image boundary polygons, prepared training orthomosaics, and training polygons (Figure 1). Export Training Data uses the ExportTrainingDataForDeepLearning function from arcpy [122]. A metadata_format of Classified_Tiles was specified since this is the required format for training a Mask2Former model. Each exported training tile comprised a 512 px² subset of a prepared training orthomosaic and an accompanying label raster with cell values corresponding to background (0) and damage (1 for roof decking and 2 for roof hole). Figure 3 and Figure 4 show example training tiles (images and labels) for decking and holes, respectively. The tile_size_x and tile_size_y parameters were each set to 512 (px) to allow for tiles to generally capture entire roofs and their surroundings while allocating a suitable areal proportion to damage. The stride_x and stride_y parameters determine the translation when creating new tiles. The chosen value of 128 (px) for both dimensions resulted in multiple, partially overlapping tiles capturing each roof damage object. This strategy was used to introduce more context (background) surrounding roof damage objects and to expand the training dataset with variety in terms of object translation. To avoid creating tiles with small portions of roof damage objects and inadequate surrounding context, we set the min_polygon_overlap_ratio parameter to 0.5. This meant that a labeled damage object would be represented in a training tile only if its corresponding training polygon overlapped with the tile boundary by at least 50%. Finally, the in_mask_polygons parameter was set to the image boundary polygons (Figure 2) to ensure that training data creation only occurred within the imaged areas.
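A single export with the parameter values described above might look like the following sketch; the paths and layer names are hypothetical placeholders, and the actual Export Training Data tool wraps this call with additional handling.

```python
import arcpy

arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\work\prepared.gdb\training_ortho",      # hypothetical path
    out_folder=r"C:\training_data\decking_only",           # hypothetical path
    in_class_data=r"C:\work\labels.gdb\decking_polygons",  # hypothetical path
    image_chip_format="TIFF",
    tile_size_x=512, tile_size_y=512,    # tiles capture whole roofs plus context
    stride_x=128, stride_y=128,          # overlapping tiles add translation variety
    metadata_format="Classified_Tiles",  # required format for Mask2Former training
    in_mask_polygons=r"C:\work\labels.gdb\image_boundaries",  # limit to imaged areas
    min_polygon_overlap_ratio=0.5,       # require >=50% of an object inside a tile
)
```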
To compare the accuracies of single-class and dual-class modeling approaches, we exported three training datasets: (1) decking only, (2) holes only, and (3) dual-class. Each of these datasets was used to train a model (Section 2.5) and is available for download at [116].

2.5. Model Configuration and Training

Model configuration and training were performed using the arcgis Python package (also known as the ArcGIS API for Python) in Jupyter Notebook v.7.2.1 (Figure 1). The notebooks for training models (Train Single-Class Model, Train Dual-Class Model) are available for download at [116]. Each notebook can be opened and run using Jupyter Notebook (which is installed with ArcGIS Pro) or in ArcGIS Pro as an ArcGIS Notebook.
Using the deep learning module of arcgis (arcgis.learn) [123], the prepare_data function was used to create a data object from the exported training dataset. A chip_size of 512 (px²) was specified to match the tile size of the exported data. The val_split_pct parameter was set to 0.1 (default), meaning 10% of the tiles would be set aside as validation tiles for model performance evaluation following each training epoch. For the dual-class model, the stratify parameter was enabled to maintain class balance when splitting the dataset for training and validation. The batch_size parameter was set to 4, such that the tiles were passed through the model one minibatch (four tiles) at a time. The prepare_data function also presents an option to specify which, if any, on-the-fly data augmentations are applied to the tiles (via the transforms parameter). These augmentations are randomly applied to tiles during model training; they are not additive, so the total number of tiles used for training remains the same. After testing various augmentation types and values, we kept Esri's default settings. These settings use the vision.transform module of the fastai Python package [124] and were determined based on good performance on satellite imagery [123]. They include cropping, padding, dihedral affine transform, symmetric warping (range of −0.2–0.2), rotation (range of −90.0–90.0), zooming (range of 1.0–3.0), brightness change (range of 0.25–0.75), and contrast change (range of 0.5–2.0) [124]. For the remainder of the prepare_data function parameters, we used the default values [123].
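A sketch of this data preparation step using the reported values (the dataset path is a hypothetical placeholder):

```python
from arcgis.learn import prepare_data

data = prepare_data(
    path=r"C:\training_data\decking_only",  # hypothetical exported-tiles folder
    chip_size=512,      # match the 512 px tile size of the exported data
    val_split_pct=0.1,  # hold out 10% of tiles for per-epoch validation (default)
    batch_size=4,       # pass tiles through the model four at a time
    # stratify=True,    # enabled only when preparing the dual-class dataset
)
```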
The prepared training data object was used to train a Mask2Former model. First, a Mask2Former model object was created and configured using the MMSegmentation class. This class was built using MMSegmentation v.1.1.2, an open-source semantic segmentation toolbox that uses PyTorch v.2.0.1 [125,126]. The data parameter was set to the prepared training data object, while the model parameter was set to mask2former. This loaded a Mask2Former model from the MMSegmentation repository with a Swin Transformer backbone [127]. The backbone was pre-trained using the full ImageNet dataset with 14 million images and 21,841 classes [128], while the overall Mask2Former model was pre-trained using the Cityscapes dataset with 25,000 images and 30 classes [129]. For the dual-class model, we enabled the class_balancing parameter to weight loss (model error) by the relative proportion of pixels in each class. This was done because, although the training dataset was derived from 2500 objects in each class, holes tended to be smaller and represented by fewer pixels (Table 2). For the remainder of the MMSegmentation class parameters, we used the default values [123].
After creating the model object, we set the training parameters and trained the model using the fit method. The learning rate (lr), which relates to the increment by which the model’s learnable parameters are adjusted after each forward pass of a minibatch, was set to 0.0001 after testing several values. Also based on testing, one-cycle learning rate scheduling (one_cycle) was disabled. We set the maximum number of epochs (epochs) to 100 and validation loss (valid_loss) as the metric to monitor throughout training (monitor). We also enabled both early_stopping and checkpoint. Early stopping is a strategy that mitigates overfitting by ending model training if validation loss does not improve by at least 0.001 in five consecutive epochs. With checkpointing enabled, the final model that is saved is the one with the lowest validation loss (i.e., the one from five epochs prior to early stopping). For the remainder of the fit method parameters, we used the default values [123]. Three models were trained—(1) decking only, (2) holes only, and (3) dual-class—each of which is available for download at [116].
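Combining the configuration and training settings above, the core of the training notebooks plausibly reduces to the following sketch; the argument values are those reported in the text, and the saved model name is a hypothetical placeholder.

```python
from arcgis.learn import MMSegmentation

# Mask2Former (Swin Transformer backbone) loaded via the MMSegmentation class;
# class_balancing=True was enabled only for the dual-class model.
model = MMSegmentation(data, model="mask2former")

model.fit(
    epochs=100,            # maximum number of training epochs
    lr=0.0001,             # learning rate chosen after testing several values
    one_cycle=False,       # one-cycle learning rate scheduling disabled
    monitor="valid_loss",  # metric monitored throughout training
    early_stopping=True,   # stop if valid_loss improves <0.001 over five epochs
    checkpoint=True,       # keep the checkpoint with the lowest valid_loss
)

model.save("decking_only_model")  # hypothetical output name
```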

2.6. Model Inference and Post-Processing

After the models were trained and saved, we used a custom ArcGIS Pro tool to generate predicted polygons of roof decking and roof holes in the test areas. This tool is called Delineate Roof Damage and is also part of the Roof Damage Assessment toolbox [116]. We ran the tool three times, each time inputting one trained model and all prepared test orthomosaics (Figure 1). For each test image, the tool first used the ClassifyPixelsUsingDeepLearning function from arcpy [130] to classify roof decking and/or roof hole pixels. This function split each test image into 512 px² tiles, which were then passed through the model four tiles at a time (batch_size of 4) for inference. We specified a padding value of 128 (default), which meant that classifications along 128-pixel borders of adjacent tiles would be blended to reduce artifacts [130]. The output was a raster with cell classes corresponding to the damage class(es) associated with the model. The tool then converted each group of contiguous, same-class cells to a polygon, with the final output being a feature class containing predicted polygons attributed by damage class. This vector output supported visual analysis, areal calculations and other attributions in a GIS database, as well as accuracy assessment.
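In script form, the inference and vectorization performed by Delineate Roof Damage for one test image might look like this hedged sketch; all paths are hypothetical, and the arguments string follows the name-value convention of the function's documentation.

```python
import arcpy

# Classify roof damage pixels with a trained model (.dlpk package).
predictions = arcpy.ia.ClassifyPixelsUsingDeepLearning(
    in_raster=r"C:\work\prepared.gdb\test_ortho",        # hypothetical path
    in_model_definition=r"C:\models\decking_only.dlpk",  # hypothetical path
    arguments="padding 128;batch_size 4",  # blend 128 px tile borders; 4 tiles/batch
)
predictions.save(r"C:\work\outputs.gdb\decking_raster")  # hypothetical path

# Convert contiguous, same-class cells to predicted damage polygons.
arcpy.conversion.RasterToPolygon(
    in_raster=r"C:\work\outputs.gdb\decking_raster",
    out_polygon_features=r"C:\work\outputs.gdb\predicted_decking",
    simplify="NO_SIMPLIFY",  # keep edges on pixel boundaries for the assessment
)
```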

2.7. Accuracy Assessment

As a final step in the deep learning workflow, we assessed the accuracy of the roof damage predictions generated by each model. We created a custom ArcGIS Pro tool called Calculate Accuracy, which is also part of the Roof Damage Assessment toolbox [116]. For each model, we input the prepared test images (Section 2.2), reference polygons (Section 2.3), and predicted polygons (Section 2.6) (Figure 1). Because the model performed semantic segmentation, the appropriate approach was to assess the pixel-based accuracy of the predictions. Since the predicted polygons were to be compared to the reference polygons on a per-pixel basis, each set of polygons needed to follow the boundaries of the corresponding test image pixels. This was already the case for the predicted polygons, whereas the reference polygons were manually created and did not follow the pixel boundaries. Therefore, for each test image, the Calculate Accuracy tool converted the reference polygons to a raster with cell boundaries matching the pixel boundaries of the image, and then back to a polygon feature class. The tool also dissolved each set of polygons (predicted and reference) by damage class such that each class was represented by one predicted multipart polygon and one reference multipart polygon. This allowed the tool to create a union feature class with the predicted and reference polygons, which was used to calculate the total number of test image pixels associated with the region of overlap (true positives), predicted polygon only (false positives), and reference polygon only (false negatives). Based on the pixel counts in these three regions, standard accuracy metrics were calculated:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP, FP, and FN are the total number of true positives, false positives, and false negatives, respectively [17]. Precision is the proportion of predicted pixels that were correct, while recall is the proportion of reference pixels that were predicted. In other words, precision refers to prediction correctness and recall refers to prediction completeness. Using precision and recall, the F1 score was calculated:
$$F_1\ \mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The F1 score is the harmonic mean of precision and recall, and serves as a measure of overall accuracy [17]. The final accuracy metric that was calculated was the intersection over union (IoU):
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
where TP is the intersection of the predicted pixels and reference pixels, while the sum of TP, FP, and FN is the union. The IoU metric is (non-linearly) correlated with the F1 score, but the F1 score applies twice the weight to true positives [17]. Altogether, these accuracy metrics were used to evaluate each model.
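As a compact reference, the four metrics reduce to the following function of the three pixel counts (a minimal sketch; the Calculate Accuracy tool computes the counts from the union feature class described above):

```python
def accuracy_metrics(tp: int, fp: int, fn: int) -> dict:
    """Pixel-based metrics from true positive, false positive, and
    false negative pixel counts (Section 2.7)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

# Illustrative counts only (not from this study's results):
print(accuracy_metrics(tp=800_000, fp=60_000, fn=160_000))
# {'precision': 0.930..., 'recall': 0.833..., 'f1': 0.879..., 'iou': 0.784...}
```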

2.8. Additional Training Data Incorporation

Section 2.1, Section 2.2, Section 2.3, Section 2.4, Section 2.5, Section 2.6 and Section 2.7 described our deep learning workflow, including the study area and images, image preparation, training and reference polygon creation, training data export, model configuration and training, model inference and post-processing, and accuracy assessment. Three base models (decking only, holes only, and dual-class) were trained using the training data derived from drone imagery captured in Dominica and Sint Maarten. We wanted to simulate the process that practitioners can perform to build upon these base results. Specifically, we envision that future work will include expansion of the training datasets to incorporate samples from additional geographic and imaging contexts, followed by training and evaluation of new models.
After assessing the accuracy of the single-class and dual-class models, we chose the approach with the higher accuracy (single-class modeling) and repeated the workflow described in Section 2.1, Section 2.2, Section 2.3, Section 2.4, Section 2.5, Section 2.6 and Section 2.7 to incorporate additional training data from orthomosaics captured in external areas in the Bahamas and US Virgin Islands (Figure 5 and Table 3). These additional training orthomosaics (21 from the Bahamas and 26 from the US Virgin Islands) were derived from images captured by the same platforms in areas surrounding the corresponding test orthomosaics (Figure 5 and Table 3). Therefore, we hypothesized that model accuracy would be enhanced in the Bahamas and US Virgin Islands test areas. However, it was unclear how accuracy would be impacted in the Dominica and Sint Maarten test areas.
We created image boundary polygons for each of the additional training orthomosaics and used the Prepare Images tool (Section 2.2) to process each image into the proper format. Then, we created training polygons (Section 2.3) by delineating 250 roof decking and 250 roof hole objects in each area for a total of 1000 additional training polygons (Table 4). With the image boundary polygons, prepared training orthomosaics, and training polygons, we then used the Export Training Data tool (Section 2.4) to expand each original class-specific training dataset with training tiles from the Bahamas and US Virgin Islands. To create each expanded dataset, we made a copy of the original class-specific dataset and input its path as the output folder in the tool. Figure 6 and Figure 7 show example training tiles (images and labels) of decking and holes, respectively, from the training areas in the Bahamas and US Virgin Islands.
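A hedged sketch of this copy-and-export expansion for the decking dataset follows; the paths and layer names are hypothetical placeholders, and the assumption is that exporting into the copied folder appends the new tiles alongside the originals, as the Export Training Data tool does.

```python
import shutil
import arcpy

# Copy the original class-specific dataset so the base tiles are preserved.
shutil.copytree(r"C:\training_data\decking_only",
                r"C:\training_data\decking_only_expanded")

# Export the new Bahamas training tiles into the copied folder; the same
# call would be repeated for the US Virgin Islands orthomosaics.
arcpy.ia.ExportTrainingDataForDeepLearning(
    in_raster=r"C:\work\prepared.gdb\bahamas_training_ortho",      # hypothetical
    out_folder=r"C:\training_data\decking_only_expanded",
    in_class_data=r"C:\work\labels.gdb\bahamas_decking_polygons",  # hypothetical
    image_chip_format="TIFF",
    tile_size_x=512, tile_size_y=512,
    stride_x=128, stride_y=128,
    metadata_format="Classified_Tiles",
    in_mask_polygons=r"C:\work\labels.gdb\bahamas_boundaries",     # hypothetical
    min_polygon_overlap_ratio=0.5,
)
```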
Table 4. Properties of training polygons used for expanding the original training datasets.

| Location | Category | Number of Polygons (Decking) | Total Polygon Area (m²) ¹ | Average Polygon Area (m²) ¹ | Number of Polygons (Hole) | Total Polygon Area (m²) ¹ | Average Polygon Area (m²) ¹ |
|---|---|---|---|---|---|---|---|
| The Bahamas | Training | 250 | 2982.58 | 11.93 | 250 | 4300.40 | 17.20 |
| US Virgin Islands | Training | 250 | 14,220.17 | 56.88 | 250 | 3537.09 | 14.15 |
| All | | 500 | 17,202.75 | 34.41 | 500 | 7837.49 | 15.67 |

¹ Planimetric areas are provided to convey the areal representation of each class in the training images. Roof slopes must be factored into the calculation of true surface areas of damage.
Figure 6. Example training tiles (sets of images and labels) exported using the roof decking training polygons (yellow) and prepared orthomosaics captured in (a–e) the Bahamas and (f–j) US Virgin Islands.
Figure 7. Example training tiles (sets of images and labels) exported using the roof hole training polygons (red) and prepared orthomosaics captured in (a–e) the Bahamas and (f–j) US Virgin Islands.
After creating each expanded class-specific training dataset, we used the Train Single-Class Model notebook (Section 2.5) to train a new model for each class. We then performed inference and post-processing with each model using the Delineate Roof Damage tool (Section 2.6). Finally, we assessed the accuracy of each model’s predictions using the Calculate Accuracy tool (Section 2.7) and compared the performances of all the models.

3. Results

In the previous section, we described our deep learning workflow and demonstrated how to expand the original training datasets and train additional models. Table 5 summarizes the accuracy assessment results for each of the five models, along with the number of tiles in each training dataset and the number of training epochs completed before early stopping was triggered. Because F1 score and IoU values are strongly correlated [17], we discuss model accuracy using only one of these metrics (the F1 score, since it has been more commonly reported by comparable studies). The single-class models trained using the expanded training datasets achieved the highest F1 scores (0.88 for decking and 0.80 for holes). In the following sections, we expand upon and interpret the results with respect to: single-class versus dual-class modeling (Section 3.1), additional training data incorporation (Section 3.2), and strengths and weaknesses of the best-performing models (Section 3.3).

3.1. Single-Class Versus Dual-Class Modeling

Among models trained using the original training datasets from Dominica and Sint Maarten, the single-class models had higher accuracies than the dual-class model. With the single-class approach, overall F1 scores increased from 0.73 to 0.79 (6%) for decking and from 0.62 to 0.66 (4%) for holes. Precision decreased with the single-class approach, from 0.95 to 0.87 (8%) for decking and from 0.91 to 0.90 (1%) for holes, but this was outweighed by improvements in recall from 0.59 to 0.73 (14%) for decking and from 0.47 to 0.52 (5%) for holes. Overall, decking accuracy was more strongly affected by the modeling approach than hole accuracy. We suspect that single-class modeling is more accurate for this deep learning application because the complexity of both damage classes may be too high for one model to adequately learn. Furthermore, performing inference with a dual-class model means that each test image pixel can be classified as only one class, whereas with a single-class approach, inference occurs twice on each test image since there are two models. In addition to the complexity of the damage classes, this exclusive classification likely contributed to the dual-class model's lower recall and F1 scores.

3.2. Additional Training Data Incorporation

3.2.1. Impact on Accuracy of Each Damage Class

Since the single-class modeling approach resulted in higher accuracies, we demonstrated the process of expanding each original class-specific training dataset and training additional models. Training single-class models using the expanded datasets increased overall F1 scores from 0.79 to 0.88 (9%) for decking and from 0.66 to 0.80 (14%) for holes. For decking, the precision increased from 0.87 to 0.93 (6%), while for holes, it decreased from 0.90 to 0.78 (12%). Recall improved for both classes, changing from 0.73 to 0.83 (10%) for decking and from 0.52 to 0.81 (29%) for holes. Overall, the accuracy of both classes increased with training dataset expansion, and roof holes were associated with a larger improvement. This might be due to the higher variability of the roof hole class. Whereas roof decking tends to be more predictable in appearance (e.g., brown with linear textures; Figure 3 and Figure 6), holes comprise variously colored interior objects (including furniture and floor materials) and structural elements (such as walls and trusses) (Figure 4 and Figure 7). We suspect that roof holes were associated with lower accuracies than roof decking in all modeling approaches because of their higher diversity. The training dataset expansion likely contributed valuable variety and representation of both classes and their surroundings in two new contexts.

3.2.2. Impact on Accuracy in Each Test Location

Compared to training single-class models with the original datasets, using the expanded datasets increased decking F1 scores in Dominica from 0.63 to 0.70 (7%), in Sint Maarten from 0.84 to 0.86 (2%), in the Bahamas from 0.82 to 0.85 (3%), and in the US Virgin Islands from 0.77 to 0.92 (15%). The largest increase in decking accuracy, observed in the US Virgin Islands (15%), may be related to differences in GSD between the orthomosaics from each location. The Dominica and Sint Maarten orthomosaics had GSDs of 0.02–0.08 m compared to 0.15 m for the US Virgin Islands orthomosaics (Table 1 and Table 3). The coarser 0.15 m GSD resulted in decking being resolved with less detail (Figure 6). Expanding the original training dataset with 250 samples of decking from the US Virgin Islands provided representation of decking and its surroundings at this GSD, which likely contributed to improved detection and delineation in the corresponding test area.
Similarly to decking accuracies, hole F1 scores increased in Dominica from 0.78 to 0.87 (9%), in Sint Maarten from 0.60 to 0.78 (18%), in the Bahamas from 0.53 to 0.77 (24%), and in the US Virgin Islands from 0.60 to 0.70 (10%). The large improvement in the Bahamas may be related to the context surrounding holes in this location. The roofs in the Bahamas orthomosaics tended to be shingled, with damage often manifesting as unique arrangements of exposed decking, roofing underlayment, and holes (Figure 6 and Figure 7). Incorporating 250 samples of holes from the Bahamas into the expanded training dataset evidently improved detection and delineation of this class in the Bahamas test area.

3.2.3. Differences in Scene Complexity Between All Training Areas

As previously stated, we hypothesized that expanding the original training datasets with samples from the Bahamas and US Virgin Islands and training new models would enhance accuracy in the test areas of these two locations. However, it was unclear how accuracy would be impacted in the Dominica and Sint Maarten test areas. From our experiment, decking F1 scores increased by 7% (Dominica) and 2% (Sint Maarten), while hole F1 scores increased by 9% (Dominica) and 18% (Sint Maarten). These results may relate to the high learning capacity of each single-class model and the training data requirements for modeling these complex and variable roof damage classes. Accuracies improved in the Dominica and Sint Maarten test areas with expanded training datasets, despite the new samples coming from less-similar geographic and imaging contexts.
An alternative or complementary interpretation of these results is related to one major benefit of incorporating training samples from the Bahamas and US Virgin Islands: the orthomosaic scenes in these two locations were much less complex, with lower building densities and fewer instances of non-delineable roof damage (Figure 6 and Figure 7). As a result, we were able to label roof damage more comprehensively in these scenes. Conversely, orthomosaics from urban environments in Dominica (e.g., Roseau) and Sint Maarten (e.g., Philipsburg) contained a high density of affected rooftops on which damage could not be comprehensively labeled (Figure 8 and Figure 9). We suspect that these complex scenes adversely impacted model training. As training progressed, the model learned to ignore non-delineable roof damage since it was not labeled in the tiles. Because non-delineable roof damage is similar in appearance to our classes of delineable roof damage, the model likely also learned to ignore some forms of our classes. The training tiles derived from the Bahamas and US Virgin Islands orthomosaics were more comprehensively labeled and consistent, which likely contributed to overall increases in the recall of decking from 0.73 to 0.83 (10%) and recall of holes from 0.52 to 0.81 (29%), as well as higher F1 scores in all four test areas (Table 5).

3.3. Strengths and Weaknesses of the Best-Performing Models

Based on the accuracy assessment results, the best-performing models (i.e., the single-class models trained using the expanded class-specific datasets) were chosen for further evaluation. In the following sections, we illustrate the strengths and weaknesses of the roof decking and roof hole final models. Specifically, we provide examples and themes with respect to true positive (Section 3.3.1), false positive (Section 3.3.2), and false negative (Section 3.3.3) damage delineations. In Section 3.3.4, we summarize the findings.

3.3.1. True Positives

Figure 10 shows true positive model predictions of roof decking. The decking model was able to detect and delineate variably sized, shaped, textured, and colored instances. Small exposures of roof decking on shingled roofs in the Bahamas (Figure 10k–o) were captured in addition to the generally larger instances in the three other test areas. Textures included semiregular patterns (Figure 10a,g,r,t) as well as irregular ones (Figure 10e,h). True positives ranged in shape complexity; some were generally rectangular (Figure 10a,d), whereas others were elongated (Figure 10c,n), jagged (Figure 10c,l), or contained interior gaps where roof holes (Figure 10d,f,h,i) or intact roof materials (Figure 10g,s) interrupted the decking. Finally, true positives spanned variations of brown, including yellow-brown (Figure 10a,i), dark brown (Figure 10e), and light brown (Figure 10b,o).
The roof hole true positives also had different appearances (Figure 11). In addition to size and shape, predicted holes varied with respect to the objects within them. Interior elements included trusses (Figure 11a,g,l,o), walls (Figure 11e,i), variably colored floors (Figure 11b,e,i), and debris (Figure 11b,i). Furthermore, true positives were surrounded by a variety of roof materials, including metal (Figure 11c–e,h,s), wood (Figure 11f,g,j,k,n,p,q,t), and shingles/underlayment (Figure 11l–o).

3.3.2. False Positives

Overall, the best-performing models captured a wide variety of roof decking and roof holes in the test areas. Due to the complexity of post-storm scenes, there was also considerable potential for the models to produce false positive predictions. Figure 12 shows examples of decking false positives in each test area. Common false positives included portions of metal roofs with decking-like colors and textures (Figure 12a,b,f,p,q). Brown shingles were a less-common roof-based source (Figure 12k). Wood materials within, attached to, or next to buildings were another major source, such as interior debris (Figure 12r), attached wood decks (Figure 12c), and outdoor objects (Figure 12h,i,l,m). Brown ground surfaces outside of buildings were also incorrectly delineated (Figure 12d,e,g,j,s), as was wooden debris farther away from buildings (Figure 12n,o,t). Despite the wide variety of false positives, the final model for roof decking had relatively high overall precision (0.93; Table 5).
Like the decking false positives, the roof hole false positives were similar in appearance to the actual damage class (Figure 13). Some roof-based sources included portions of intact roofs (Figure 13f) and dark underlayment (Figure 13k). Outdoor (shadowed) areas next to buildings were the most common false positives (Figure 13a,b,l,p–t), especially in the US Virgin Islands test area, where precision was lower (0.59) than in the other test areas (0.77–0.91; Table 5). The model also incorrectly delineated portions of incomplete structures (Figure 13c–e), including collapsed buildings (Figure 13g–j,n). Some less-common false positives were holes in wood decks attached to buildings (Figure 13m) and holes in vehicles (Figure 13o).

3.3.3. False Negatives

Figure 14 shows example reference polygons of roof decking that were not predicted by the model. False negatives commonly occurred in contexts with inadequate contrast, such as light-colored decking next to light-colored roof covers (Figure 14a,c,g–i,p–t), dark decking next to dark roof covers (Figure 14b,d,n), and light gray-brown decking that is similar in color to its surroundings (Figure 14e,m). Recall was lower in the Dominica test area (0.61) compared to the other test areas (0.80–0.89; Table 5), with low contrast being a common characteristic of false negatives (Figure 14a–e). Another source of false negatives was decking covered by debris (Figure 14o). Figure 14f,j–l show false negatives of a kind that the model ordinarily delineated correctly.
Finally, Figure 15 shows example reference polygons of roof holes that were not predicted by the model. Common false negatives were large roof holes that lacked structural elements (e.g., trusses, walls) (Figure 15h,i,k–n), especially in the Bahamas test area, where recall was lower (0.68) than in the other test areas (0.79–0.87; Table 5). Other false negatives included smaller and elongated holes that may have been represented with insufficient detail (Figure 15a–g,j,o), including small holes in the US Virgin Islands (Figure 15p–t) that were captured at a higher GSD (0.15 m) than holes in the other test areas (0.03–0.08 m).

3.3.4. Summary

The best-performing models were able to detect and delineate variably sized, shaped, textured, and colored roof decking and roof hole objects in different surroundings (Figure 10 and Figure 11). These results suggest that each single-class model has the capacity to learn the complexities of and variations in its target class. The roof decking model had relatively high precision (0.93; Table 5) despite the false positives it produced (Figure 12). On the other hand, the roof hole model had lower precision (0.78; Table 5), mostly due to its incorrect delineation of outdoor areas next to buildings (Figure 13). Since the US Virgin Islands test area had the highest number of roof hole false positives, we suspect that additional training data in this less-represented geographic and imaging context would improve model precision by incorporating a greater variety of building surroundings. Finally, the roof decking and roof hole models had similar recall (0.83 and 0.81, respectively; Table 5). Common false negatives included roof decking in low-contrast situations (Figure 14), large roof holes that lacked structural elements, and small/elongated roof holes (Figure 15). It is unclear whether recall can be improved by strategies such as inclusion of more training tiles with these characteristics or if these false negatives are indicative of modeling limitations.

4. Discussion

In the previous section, we provided accuracy assessment results for the five models we trained using single-class and dual-class modeling approaches as well as training dataset expansion. Accuracy was assessed and interpreted with respect to modeling approach, damage class, and test location. In the following sections, we: recommend a modeling approach for this deep learning application (Section 4.1), compare our approach to previous studies (Section 4.2), discuss the limitations and extensions of this deep learning application (Section 4.3), and provide recommendations for future research (Section 4.4).

4.1. Recommended Modeling Approach

4.1.1. Single-Class Modeling

Based on our experimental results, modeling of roof decking and roof holes should be performed via a single-class approach. We suspect that the complexity of each damage class is too high for one model to adequately learn. This is suggested by the increase in overall F1 score from 0.73 to 0.79 (6%) for decking and from 0.62 to 0.66 (4%) for holes when the original class-specific training datasets were used to train single-class models as opposed to training one dual-class model with a single dataset (Table 5).

4.1.2. Geographic and Imaging Inclusion

The original training datasets were created using 2500 polygons of roof decking and 2500 polygons of roof holes from training areas in Dominica and Sint Maarten. Expanding the class-specific training datasets using 500 polygons of each class from the Bahamas and US Virgin Islands training areas resulted in the highest overall F1 scores (0.88 for decking and 0.80 for holes), including an improvement in all test areas (Table 5). The largest increases occurred in the Bahamas and US Virgin Islands test locations, which suggests that geographic and imaging inclusion in the training data is important. For example, roofs in the Bahamas test area were predominantly shingled, and their unique appearances likely needed to be represented in the training data. In the US Virgin Islands, damage objects were captured with less detail using a GSD of 0.15 m, whereas the remainder of the areas were imaged at sub-decimeter GSDs. The inclusion of samples from the US Virgin Islands was important for representing less-detailed damage objects and surroundings. Overall, when applying pre-trained roof damage delineation models to new geographic and imaging contexts, we suggest that practitioners consider how the appearances of roof damage and surroundings in these settings may deviate from those that were used to train the models.

4.1.3. Comprehensively Labeled Training Data

In addition to geographic and imaging inclusion, another important aspect of training data preparation for this deep learning application is the creation of comprehensively labeled tiles. Training tiles from urban areas in Dominica (e.g., Roseau) and Sint Maarten (e.g., Philipsburg) contained a high density of affected rooftops on which damage could not be comprehensively labeled (Figure 8 and Figure 9). We suspect that model training suffered due to the inclusion of these tiles. As training progressed, the model learned to ignore non-delineable roof damage since it was not labeled in the tiles. Because non-delineable roof damage is similar in appearance to our classes of delineable roof damage, the model likely also learned to ignore some forms of our classes. The additional training tiles from the Bahamas and US Virgin Islands orthomosaics were generally fully labeled since these scenes were much less complex, with lower building densities and fewer instances of non-delineable roof damage (Figure 6 and Figure 7). With expansion of the original training datasets using these tiles, the increase in F1 scores in all four test areas (as opposed to improvements only in the Bahamas and US Virgin Islands) suggests that comprehensive labeling is a crucial aspect of training data preparation.

4.2. Comparison to Previous Studies

Section 1.2 summarized 17 previous studies that presented approaches for image-based (semi-)automated delineation of roof damage [97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113]. In terms of accuracy, we are unable to compare our study to 11 of the 17 previous studies for the following reasons: (1) an accuracy assessment was not performed [110]; (2) compatible accuracy assessment details were not provided [100,112,113]; and (3) entire roofs (labeled based on their damage severity or percent) were used as the units for the accuracy assessment as opposed to damage-classified pixels, superpixels, or objects [98,99,105,106,107,108,109].
The six previous studies that are comparable to ours performed roof damage delineation and accuracy assessment on the basis of: (1) pixel classification using random forest [97] and deep learning semantic segmentation [104]; (2) superpixel classification using random forest [102,104], support vector machine [103], and local indicators of spatial association clustering [101]; and (3) supervoxel (point cluster) classification using random forest [111] (Table 6). In these studies, the precision, recall, and F1 scores of the damage class ranged from 0.72 to 0.97, 0.41 to 0.93, and 0.53 to 0.92, respectively. For comparison, our best-performing models had precision values of 0.93 (decking) and 0.78 (hole), recall values of 0.83 (decking) and 0.81 (hole), and F1 scores of 0.88 (decking) and 0.80 (hole) (Table 6).
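For clarity, the overall scores above follow directly from the summed pixel counts in Table 5. The short, illustrative Python function below reproduces the calculation (the accuracy_metrics name is ours, not from any library):

```python
# Pixel-based metrics from summed true positive (TP), false positive (FP),
# and false negative (FN) pixel counts, as in Table 5.
def accuracy_metrics(tp: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)  # union = TP + FP + FN
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}

# Overall decking result of the best model (expanded training dataset, Table 5):
print(accuracy_metrics(tp=3_553_676, fp=276_119, fn=725_965))
# -> precision ~0.93, recall ~0.83, F1 ~0.88, IoU ~0.78 (rounded)
```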
Our study achieved accuracies similar to or higher than those of the previous studies. Importantly, there are major methodological differences: (1) our approach does not require the isolation of rooftops in the data, and (2) our approach distinguishes two classes of roof damage instead of one general or specific class. In the previous studies, the isolation of rooftops (either before or after damage classification) eliminates false positives located away from buildings that would otherwise lower model precision and F1 scores; our models did not benefit from this constraint. Regarding the distinction between roof decking and roof holes (as opposed to identifying one general or specific damage class), we recommend that future research retain this distinction due to its importance in post-storm assessment for informing repair needs and the severity of building damage (Section 1.2).
Furthermore, when comparing the accuracy of our method to others, direct comparisons are limited by differences in accuracy assessment design, such as how reference samples were created as well as their quantity and location. The previous studies in Table 6 predominantly created reference samples by labeling the superpixels or supervoxels generated by their methods. In contrast, we manually delineated roof damage instances in our test images, which introduces less bias than using method-generated geometry for reference data creation. With respect to reference sample quantity, our study used 1440 roof damage polygons (totaling 8.5 million pixels) on 501 unique rooftops; the previous study with the highest reference sample quantity used 4.4 million damage-classified reference pixels on 100–200 rooftops [104] (Table 6). Finally, our study had four test areas that were separate from the training areas, whereas the previous studies extracted reference samples from their single training areas (Table 6). In summary, we argue that our accuracy assessment procedure is more rigorous because it used: (1) manual reference delineations that are independent of method-generated geometry, (2) a higher number of reference samples from a larger diversity of rooftops, and (3) test areas that are separate from training areas. We suggest that future studies emulate our approach by evaluating damage classifier robustness with a high number of manual labels from diverse and external datasets.
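To illustrate how predicted and reference polygons translate into the pixel-based confusion counts used above, the sketch below shows a hypothetical geopandas equivalent of the tally for one test image (our accuracy assessment used Esri ArcGIS Pro tools; the file names and GSD value are placeholders):

```python
# Hypothetical geopandas sketch of the TP/FP/FN pixel tally for one test image.
import geopandas as gpd

predicted = gpd.read_file("predicted_decking.shp")  # model delineations
reference = gpd.read_file("reference_decking.shp")  # manual delineations

pred_union = predicted.unary_union
ref_union = reference.unary_union

tp_area = pred_union.intersection(ref_union).area   # correctly delineated
fp_area = pred_union.difference(ref_union).area     # over-delineation
fn_area = ref_union.difference(pred_union).area     # missed damage

gsd = 0.05  # test image GSD in meters (placeholder)
print({name: round(area / gsd**2)                   # area (m^2) -> pixel count
       for name, area in [("TP", tp_area), ("FP", fp_area), ("FN", fn_area)]})
```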

4.3. Application Limitations and Extensions

As previously explained, this deep learning application is limited to post-storm roof damage that is discernable and delineable in remote sensing RGB imagery. Even with a suitable GSD, a remote sensing image may contain roofs on which damage is very complex and non-delineable. Furthermore, in this application, trained deep learning models detect and delineate instances of roof decking and roof holes without regard for the geometric quality of the imagery or the geometry of the roof. First, images must be accurately mosaicked and orthorectified to minimize geometric distortions of roofs; residual photogrammetric and post-processing inaccuracies will translate into errors in areal quantification. Second, for roofs that are not flat, roof slopes must be factored into the calculation of true surface areas of damage. This can be done by overlaying damage object delineations with a DSM or by assuming an average roof pitch based on geographic context (a minimal calculation is sketched below). For calculating the percent of damage to each roof, damage delineations would need to be overlaid with up-to-date building footprint polygons. Finally, our application identifies roof damage without regard for building type. If, for instance, residential buildings are the target of emergency roof repair efforts, the damage delineations would need to be overlaid with building information.
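As a minimal sketch of the roof-pitch correction mentioned above, the snippet below converts a delineated damage polygon's planimetric area to an estimated true surface area; the 4:12 pitch is an assumed example value, not a measurement from our study areas.

```python
# Convert planimetric (map) area to true surface area for a planar roof face
# inclined at a given pitch. The 4:12 pitch below is an illustrative assumption.
import math

def true_surface_area(planimetric_area_m2: float, pitch_deg: float) -> float:
    # Planimetric area equals surface area foreshortened by cos(pitch)
    return planimetric_area_m2 / math.cos(math.radians(pitch_deg))

pitch_deg = math.degrees(math.atan(4 / 12))   # 4:12 pitch, about 18.4 degrees
print(true_surface_area(20.0, pitch_deg))     # 20 m^2 planimetric -> ~21.1 m^2
```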
Real-world implementation of this application will have to account for its limitations and add any necessary extensions to achieve the desired information output. With F1 scores of 0.88 for decking and 0.80 for holes in four test areas, we suggest further evaluation of the models before any real-world implementation, especially in a fully automated manner. This application may be suitable as a computer-assisted delineation approach, where human analysts use trained models to automatically generate roof damage delineations and then subsequently intervene by manually deleting false positives and delineating roof damage that was missed by the models.

4.4. Recommended Future Research

The scope of our study was limited by factors including: (1) the evaluation of one deep learning framework (Mask2Former), (2) the evaluation of one software implementation of Mask2Former, and (3) our testing data. Mask2Former is one of many deep learning frameworks applicable to roof damage delineation. Future work can compare the performance of Mask2Former to custom variants (including other software implementations), as well as to other frameworks that are capable of semantic and instance segmentation. Our model evaluation could also be extended by including additional test images. This would increase the rigor of the accuracy assessment by applying the models to a wider variety of geographic and imaging contexts. It is important to understand model accuracy in locations farther from the training areas, especially in regions outside the Caribbean. This type of evaluation would illuminate whether regional models of roof damage are more appropriate than one global model for each damage class. Furthermore, performing evaluation on images with GSDs coarser than 0.15 m would indicate which sources of aerial imagery can be used. For example, the NOAA NGS Emergency Response Imagery archive [119] (the source of the US Virgin Islands orthomosaics used in this study) contains images from various events with GSDs coarser than 0.15 m, so it would be helpful to understand the GSD limitations and which images can be used; one option is to resample fine-GSD test images to coarser cell sizes before inference, as sketched below. Finally, we envision that researchers will gradually expand the roof decking and roof hole training datasets in an effort to increase geographic and imaging inclusion. This process can follow the steps we took to expand our original training datasets with samples from the Bahamas and US Virgin Islands (Section 2.8). We provided all data, tools, and trained models from this study as a starting point for future work [116].
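As referenced above, a simple way to probe GSD limitations is to resample a fine-GSD test orthomosaic to progressively coarser cell sizes before running inference, for example with the ArcGIS Pro Resample tool [120]. The sketch below assumes arcpy is licensed and available; the paths and cell sizes are placeholders.

```python
# Hypothetical GSD-sensitivity test: resample a 0.05 m GSD orthomosaic to
# coarser cell sizes with the ArcGIS Pro Resample tool [120] before inference.
import arcpy

src = r"C:\test\orthomosaic_005m.tif"   # fine-GSD test orthomosaic
for gsd in [0.15, 0.30, 0.50]:          # candidate coarser GSDs (m)
    out = rf"C:\test\orthomosaic_{int(gsd * 100):03d}cm.tif"
    arcpy.management.Resample(src, out, f"{gsd} {gsd}", "BILINEAR")
```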

5. Conclusions

Our study presented an end-to-end workflow for training and evaluating deep learning image segmentation models that detect and delineate two classes of post-storm roof damage: roof decking and roof holes. Compared to previous studies that delineated roof damage using remote sensing imagery, our study: (1) used an approach that does not require the isolation of rooftops in the data, (2) observed two classes of roof damage that are relevant to post-storm damage assessment, and (3) evaluated model accuracy in multiple external test locations in the Caribbean.
Mask2Former models were trained using 2500 roof decking and 2500 roof hole samples from drone RGB orthomosaics (0.02–0.08 m GSD) captured in Sint Maarten and Dominica following Hurricanes Irma and Maria in 2017. The trained models were evaluated using 1440 reference samples from 10 test images, including eight drone orthomosaics (0.03–0.08 m GSD) acquired outside of the training areas in Sint Maarten and Dominica, one drone orthomosaic (0.05 m GSD) from the Bahamas, and one orthomosaic (0.15 m GSD) captured in the US Virgin Islands with a crewed aircraft and different sensor. Accuracies increased with a single-class modeling approach (instead of training one dual-class model) and expansion of the training datasets with 1000 additional samples from external areas in the Bahamas and US Virgin Islands. The best-performing models reached overall F1 scores of 0.88 (roof decking) and 0.80 (roof hole).
Based on our experimental results, we suggested that practitioners: (1) use a single-class modeling approach due to the complexity of each roof damage class; (2) evaluate the trained models in a greater variety of geographic and imaging contexts; and (3) expand the training datasets to increase geographic and imaging inclusion, particularly with comprehensively labeled training tiles. We provided all data, tools, and trained models from this study to support future work [116].

Author Contributions

Conceptualization, M.K. and C.H.H.; methodology, M.K., P.R.N. and C.H.H.; software, M.K. and P.R.N.; validation, M.K. and P.R.N.; formal analysis, M.K., P.R.N. and C.H.H.; investigation, M.K., P.R.N. and C.H.H.; resources, M.K., P.R.N. and C.H.H.; data curation, M.K. and P.R.N.; writing—original draft preparation, M.K.; writing—review and editing, M.K., P.R.N. and C.H.H.; visualization, M.K. and C.H.H.; supervision, C.H.H.; project administration, M.K. and C.H.H.; funding acquisition, M.K. and C.H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Alberta Innovates, Alberta Advanced Education, and the University of Calgary.

Data Availability Statement

The data, tools, and trained models presented in this study are available for download at [116].

Acknowledgments

We gratefully acknowledge GlobalMedic and the US NOAA NGS for the aerial imagery used in this research. We also thank Michelle Clements and Clay Wearmouth for maintaining access to computer resources throughout the study. Jean Slick, Geoffrey J. Hay, and Torsten Geldsetzer provided invaluable guidance on disaster management, remote sensing, and machine learning. Finally, we sincerely thank the editorial team of Remote Sensing and three anonymous reviewers whose thoughtful feedback greatly improved this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
3D: Three-dimensional
API: Application programming interface
CPU: Central processing unit
DSM: Digital surface model
FEMA: Federal Emergency Management Agency
FN: False negative
FP: False positive
GIS: Geographic information system
GPU: Graphics processing unit
GSD: Ground sample distance
IoU: Intersection over union
Lidar: Light detection and ranging
NGS: National Geodetic Survey
NOAA: National Oceanic and Atmospheric Administration
px: Pixel
RAM: Random access memory
RGB: Red, green, blue
TP: True positive
USD: United States dollar

References

  1. CRED; UNDRR. Human Cost of Disasters: An Overview of the Last 20 Years 2000–2019; UNDRR: Geneva, Switzerland, 2020. [Google Scholar]
  2. NOAA. Saffir-Simpson Hurricane Wind Scale. Available online: https://www.nhc.noaa.gov/aboutsshws.php (accessed on 5 July 2025).
  3. Larsen, B.; Graham, T.; Aisbett, B. A survey to identify physically demanding tasks performed during storm damage operations by Australian State Emergency Services personnel. Appl. Ergon. 2013, 44, 128–133. [Google Scholar] [CrossRef]
  4. Webb, S.; Weinstein Sheffield, E. Mindful Sheltering. Available online: https://alnap.org/help-library/resources/mindful-sheltering/ (accessed on 29 June 2022).
  5. USACE. Temporary Roofing Level 2: On-Site Assessment. Available online: https://www.youtube.com/watch?v=Rv5tVFXeyu4&list=PL9jyI6yEwMbiAAmHj4EICLNhlrlsasf-2 (accessed on 5 July 2025).
  6. UN OCHA. Emergency Team Responds to Philippines Devastation. Available online: https://reliefweb.int/report/philippines/emergency-team-responds-philippines-devastation (accessed on 5 July 2025).
  7. UN OCHA. Staying Dry from the Rain: Red Cross Improves Living Conditions for Families Affected by Hurricane Matthew. Available online: https://reliefweb.int/report/haiti/staying-dry-rain-red-cross-improves-living-conditions-families-affected-hurricane (accessed on 5 July 2025).
  8. UN OCHA. Shelter Cluster Vanuatu—TC Harold Situation Report No. 13 (10 July 2020). Available online: https://reliefweb.int/report/vanuatu/shelter-cluster-vanuatu-tc-harold-situation-report-no-13-10-july-2020 (accessed on 5 July 2025).
  9. UN OCHA. Mozambique Cyclone Idai and Cyclone Kenneth Response: Situation Report #13 (1 October–31 December 2019). Available online: https://reliefweb.int/report/mozambique/mozambique-cyclone-idai-and-cyclone-kenneth-response-situation-report-13-1-october (accessed on 5 July 2025).
  10. UNICEF. UNICEF Belize Humanitarian Situation Report #3, 16 August 2016. Available online: https://reliefweb.int/report/belize/unicef-belize-humanitarian-situation-report-3-16-august-2016 (accessed on 5 July 2025).
  11. IOM. IOM to Provide Temporary Roofing Solutions for Houses Affected by Hurricane Dorian in the Bahamas. Available online: https://www.iom.int/news/iom-provide-temporary-roofing-solutions-houses-affected-hurricane-dorian-bahamas (accessed on 5 July 2025).
  12. USACE. Operation Blue Roof. Available online: https://www.usace.army.mil/Missions/Emergency-Operations/Blue-Roof-Info/ (accessed on 5 July 2025).
  13. US Army. Mobile District Host Operation Blue Roof Deployers. Available online: https://www.army.mil/article/280761/mobile_district_host_operation_blue_roof_deployers (accessed on 5 July 2025).
  14. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
  15. Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G.-S. Remote Sensing Image Scene Classification Meets Deep Learning: Challenges, Methods, Benchmarks, and Opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3735–3756. [Google Scholar] [CrossRef]
  16. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef]
  17. Maxwell, A.E.; Warner, T.A.; Guillén, L.A. Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 1: Literature Review. Remote Sens. 2021, 13, 2450. [Google Scholar] [CrossRef]
  18. Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention Mask Transformer for Universal Image Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1280–1289. [Google Scholar]
  19. Gibril, M.B.A.; Al-Ruzouq, R.; Shanableh, A.; Jena, R.; Bolcek, J.; Shafri, H.Z.M.; Ghorbanzadeh, O. Transformer-based semantic segmentation for large-scale building footprint extraction from very-high resolution satellite images. Adv. Sp. Res. 2024, 73, 4937–4954. [Google Scholar] [CrossRef]
  20. Liu, Y.; Li, E.; Liu, W.; Li, X.; Zhu, Y. LFEMAP-Net: Low-Level Feature Enhancement and Multiscale Attention Pyramid Aggregation Network for Building Extraction From High-Resolution Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 2718–2730. [Google Scholar] [CrossRef]
  21. Zheng, L.; Pu, X.; Zhang, S.; Xu, F. Tuning a SAM-Based Model With Multicognitive Visual Adapter to Remote Sensing Instance Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 2737–2748. [Google Scholar] [CrossRef]
  22. Song, L.; Gao, Y.; Gui, Y.; Jiang, D.; Zhang, M.; Liu, H.; Li, W. LHAS: A Lightweight Network Based on Hierarchical Attention for Hyperspectral Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5508012. [Google Scholar] [CrossRef]
  23. Guo, S.; Yang, Q.; Xiang, S.; Wang, S.; Wang, X. Mask2Former with Improved Query for Semantic Segmentation in Remote-Sensing Images. Mathematics 2024, 12, 765. [Google Scholar] [CrossRef]
  24. Wei, R.; Fan, B.; Wang, Y.; Yang, R. A Query-Based Network for Rural Homestead Extraction from VHR Remote Sensing Images. Sensors 2023, 23, 3643. [Google Scholar] [CrossRef]
  25. Cai, W.; Jin, K.; Hou, J.; Guo, C.; Wu, L.; Yang, W. VDD: Varied Drone Dataset for semantic segmentation. J. Vis. Commun. Image Represent. 2025, 109, 104429. [Google Scholar] [CrossRef]
  26. Qian, Z.; Chen, M.; Sun, Z.; Zhang, F.; Xu, Q.; Guo, J.; Xie, Z.; Zhang, Z. Simultaneous extraction of spatial and attributional building information across large-scale urban landscapes from high-resolution satellite imagery. Sustain. Cities Soc. 2024, 106, 105393. [Google Scholar] [CrossRef]
  27. Xiao, Y.; Lin, L.; Ma, J.; Bi, M. Enhancing Rooftop Photovoltaic Segmentation Using Spatial Feature Reconstruction and Multi-Scale Feature Aggregation. Energies 2025, 18, 119. [Google Scholar] [CrossRef]
  28. Tran, M.; De Luis, A.; Liao, H.; Huang, Y.; McCann, R.; Mantooth, A.; Cothren, J.; Le, N. S3Former: A Deep Learning Approach to High Resolution Solar PV Profiling. IEEE Trans. Smart Grid 2025, 16, 2611–2623. [Google Scholar] [CrossRef]
  29. García, G.; Aparcedo, A.; Nayak, G.K.; Ahmed, T.; Shah, M.; Li, M. Generalized deep learning model for photovoltaic module segmentation from satellite and aerial imagery. Sol. Energy 2024, 274, 112539. [Google Scholar] [CrossRef]
  30. Niu, Z.; Xi, K.; Liao, Y.; Tao, P.; Ke, T. A Practical Framework for Estimating Façade Opening Rates of Rural Buildings Using Real-Scene 3D Models Derived from Unmanned Aerial Vehicle Photogrammetry. Remote Sens. 2025, 17, 1596. [Google Scholar] [CrossRef]
  31. Qiao, W.; Shen, L.; Wang, W.; Li, Z. A Weakly Supervised Bitemporal Scene Change Detection Approach for Pixel-Level Building Damage Assessment Using Pre- and Post-Disaster High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5648523. [Google Scholar] [CrossRef]
  32. Fujita, S.; Hatayama, M. Collapsed Building Detection Using Multiple Object Tracking from Aerial Videos and Analysis of Effective Filming Techniques of Drones. In Information Technology in Disaster Risk Reduction, Proceedings of the 7th IFIP WG 5.15 International Conference, ITDRR 2022, Kristiansand, Norway, 12–14 October 2022; Gjøsæter, T., Radianti, J., Murayama, Y., Eds.; IFIP Advances in Information and Communication Technology; Springer Nature: Cham, Switzerland, 2023; Volume 672, pp. 118–135. ISBN 978-3-031-34206-6. [Google Scholar]
  33. Abdi, G.; Esfandiari, M.; Jabari, S. A deep transfer learning-based damage assessment on post-event very high-resolution orthophotos. Geomatica 2022, 75, 237–250. [Google Scholar] [CrossRef]
  34. Zhang, H.; Wang, M.; Zhang, Y.; Ma, G. TDA-Net: A Novel Transfer Deep Attention Network for Rapid Response to Building Damage Discovery. Remote Sens. 2022, 14, 3687. [Google Scholar] [CrossRef]
  35. Pan, K.; Gonsoroski, E.; Uejio, C.K.; Beitsch, L.; Sherchan, S.P.; Lichtveld, M.Y.; Harville, E.W. Remotely sensed measures of Hurricane Michael damage and adverse perinatal outcomes and access to prenatal care services in the Florida panhandle. Environ. Health 2022, 21, 118. [Google Scholar] [CrossRef]
  36. Kalantar, B.; Ueda, N.; Al-Najjar, H.A.H.; Halin, A.A. Assessment of convolutional neural network architectures for earthquake-induced building damage detection based on pre- and post-event orthophoto images. Remote Sens. 2020, 12, 3529. [Google Scholar] [CrossRef]
  37. Jing, Y.; Ren, Y.; Liu, Y.; Wang, D.; Yu, L. Automatic Extraction of Damaged Houses by Earthquake Based on Improved YOLOv5: A Case Study in Yangbi. Remote Sens. 2022, 14, 382. [Google Scholar] [CrossRef]
  38. Valentijn, T.; Margutti, J.; van den Homberg, M.; Laaksonen, J. Multi-hazard and spatial transferability of a CNN for automated building damage assessment. Remote Sens. 2020, 12, 2839. [Google Scholar] [CrossRef]
  39. Scott, P.; Liang, D. Analysis of residential building performance in tornadoes as a function of building and hazard characteristics. In Proceedings of the 9th Asia-Pacific Conference on Wind Engineering, Auckland, New Zealand, 3–7 December 2017; pp. 5–8. [Google Scholar]
  40. Massarra, C.C.; Friedland, C.J.; Marx, B.D.; Dietrich, J.C. Predictive multi-hazard hurricane data-based fragility model for residential homes. Coast. Eng. 2019, 151, 10–21. [Google Scholar] [CrossRef]
  41. Rhee, D.M.; Nevill, J.B.; Lombardo, F.T. Comparison of Near-Surface Wind Speed Estimation Techniques Using Different Damage Indicators from a Damage Survey of Naplate, IL EF-3 Tornado. Nat. Hazards Rev. 2022, 23, 04021052. [Google Scholar] [CrossRef]
  42. Lin, D.; Wang, J.; Li, Y. Unsupervised building damage identification using post-event optical imagery and variational autoencoder. IEICE Trans. Inf. Syst. 2021, E104D, 1770–1774. [Google Scholar] [CrossRef]
  43. Tingzon, I.; Cowan, N.M.; Chrzanowski, P. Fusing VHR Post-disaster Aerial Imagery and LiDAR Data for Roof Classification in the Caribbean. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2–6 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3742–3749. [Google Scholar]
  44. Xu, J.; Zeng, F.; Liu, W.; Takahashi, T. Damage Detection and Level Classification of Roof Damage after Typhoon Faxai Based on Aerial Photos and Deep Learning. Appl. Sci. 2022, 12, 4912. [Google Scholar] [CrossRef]
  45. Zhou, Z.; Gong, J.; Hu, X. Community-scale multi-level post-hurricane damage assessment of residential buildings using multi-temporal airborne LiDAR data. Autom. Constr. 2019, 98, 30–45. [Google Scholar] [CrossRef]
  46. Mohammadi, M.E.; Wood, R.L. Machine Learning-Based Structural Damage Identification Within Three-Dimensional Point Clouds; Cury, A., Ribeiro, D., Ubertini, F., Todd, M.D., Eds.; Structural Integrity; Springer Nature: Cham, Switzerland, 2022; Volume 21, ISBN 978-3-030-81715-2. [Google Scholar]
  47. Boin, J.-B.; Roth, N.; Doshi, J.; Llueca, P.; Borensztein, N. Multi-class segmentation under severe class imbalance: A case study in roof damage assessment. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 6–12 December 2020; pp. 1–8. [Google Scholar]
  48. Naito, S.; Tomozawa, H.; Mori, Y.; Nagata, T.; Monma, N.; Nakamura, H.; Fujiwara, H.; Shoji, G. Building-damage detection method based on machine learning utilizing aerial photographs of the Kumamoto earthquake. Earthq. Spectra 2020, 36, 1166–1187. [Google Scholar] [CrossRef]
  49. Zhou, Z.; Gong, J. Automated Analysis of Mobile LiDAR Data for Component-Level Damage Assessment of Building Structures during Large Coastal Storm Events. Comput. Civ. Infrastruct. Eng. 2018, 33, 373–392. [Google Scholar] [CrossRef]
  50. Gueguen, L.; Pesaresi, M.; Gerhardinger, A.; Soille, P. Characterizing and Counting Roofless Buildings in Very High Resolution Optical Images. IEEE Geosci. Remote Sens. Lett. 2012, 9, 114–118. [Google Scholar] [CrossRef]
  51. Calton, L.; Wei, Z. Using Artificial Neural Network Models to Assess Hurricane Damage through Transfer Learning. Appl. Sci. 2022, 12, 1466. [Google Scholar] [CrossRef]
  52. Fujita, S.; Hatayama, M. Estimation Method for Roof-damaged Buildings from Aero-Photo Images During Earthquakes Using Deep Learning. Inf. Syst. Front. 2021, 25, 351–363. [Google Scholar] [CrossRef]
  53. Mohammadi, M.E.; Watson, D.P.; Wood, R.L. Deep Learning-Based Damage Detection from Aerial SfM Point Clouds. Drones 2019, 3, 68. [Google Scholar] [CrossRef]
  54. Yu, K.; Wang, S.; Wang, Y.; Gu, Z.; Wang, Y. DBA-RTMDet: A High-Precision and Real-Time Instance Segmentation Method for Identification of Damaged Buildings in Post-Earthquake UAV Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 19577–19593. [Google Scholar] [CrossRef]
  55. Ji, M.; Liu, L.; Du, R.; Buchroithner, M.F. A Comparative Study of Texture and Convolutional Neural Network Features for Detecting Collapsed Buildings After Earthquakes Using Pre- and Post-Event Satellite Imagery. Remote Sens. 2019, 11, 1202. [Google Scholar] [CrossRef]
  56. Ji, M.; Liu, L.; Buchroithner, M. Identifying Collapsed Buildings Using Post-Earthquake Satellite Imagery and Convolutional Neural Networks: A Case Study of the 2010 Haiti Earthquake. Remote Sens. 2018, 10, 1689. [Google Scholar] [CrossRef]
  57. Wang, B.; Tan, X.; Song, D.; Zhang, L. Rapid Identification of Post-Earthquake Collapsed Buildings via Multi-Scale Morphological Profiles With Multi-Structuring Elements. IEEE Access 2020, 8, 122036–122056. [Google Scholar] [CrossRef]
  58. Xiu, H.; Shinohara, T.; Matsuoka, M.; Inoguchi, M.; Kawabe, K.; Horie, K. Collapsed Building Detection Using 3D Point Clouds and Deep Learning. Remote Sens. 2020, 12, 4057. [Google Scholar] [CrossRef]
  59. Pi, Y.; Nath, N.D.; Behzadan, A.H. Detection and Semantic Segmentation of Disaster Damage in UAV Footage. J. Comput. Civ. Eng. 2021, 35, 04020063. [Google Scholar] [CrossRef]
  60. Liao, Y.; Mohammadi, M.E.; Wood, R.L. Deep Learning Classification of 2D Orthomosaic Images and 3D Point Clouds for Post-Event Structural Damage Assessment. Drones 2020, 4, 24. [Google Scholar] [CrossRef]
  61. Kakooei, M.; Baleghi, Y. Fusion of satellite, aircraft, and UAV data for automatic disaster damage assessment. Int. J. Remote Sens. 2017, 38, 2511–2534. [Google Scholar] [CrossRef]
  62. Thomas, J.; Kareem, A.; Bowyer, K.W. Automated poststorm damage classification of low-rise building roofing systems using high-resolution aerial imagery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 3851–3861. [Google Scholar] [CrossRef]
  63. He, M.; Zhu, Q.; Du, Z.; Hu, H.; Ding, Y.; Chen, M. A 3D Shape Descriptor Based on Contour Clusters for Damaged Roof Detection Using Airborne LiDAR Point Clouds. Remote Sens. 2016, 8, 189. [Google Scholar] [CrossRef]
  64. Axel, C.; van Aardt, J. Building damage assessment using airborne lidar. J. Appl. Remote Sens. 2017, 11, 1. [Google Scholar] [CrossRef]
  65. Qiao, W.; Shen, L.; Wen, Q.; Wen, Q.; Tang, S.; Li, Z. Revolutionizing building damage detection: A novel weakly supervised approach using high-resolution remote sensing images. Int. J. Digit. Earth 2024, 17, 2298245. [Google Scholar] [CrossRef]
  66. Zhan, Y.; Liu, W.; Maruyama, Y. Damaged Building Extraction Using Modified Mask R-CNN Model Using Post-Event Aerial Images of the 2016 Kumamoto Earthquake. Remote Sens. 2022, 14, 1002. [Google Scholar] [CrossRef]
  67. Fujita, S.; Hatayama, M. Automatic Calculation of Damage Rate of Roofs Based on Image Segmentation. In Information Technology in Disaster Risk Reduction, Proceedings of the 6th IFIP WG 5.15 International Conference, ITDRR 2021, Morioka, Japan, 25–27 October 2021; Sasaki, J., Murayama, Y., Velev, D., Zlateva, P., Eds.; IFIP Advances in Information and Communication Technology; Springer Nature: Cham, Switzerland, 2022; Volume 638, pp. 3–22. ISBN 978-3-031-04169-3. [Google Scholar]
  68. Yu, K.; Wang, S.; Wang, Y.; Gu, Z. High-Quality Damaged Building Instance Segmentation Based on Improved Mask Transfiner Using Post-Earthquake UAS Imagery: A Case Study of the Luding Ms 6.8 Earthquake in China. Remote Sens. 2024, 16, 4222. [Google Scholar] [CrossRef]
  69. Tennant, E.; Jenkins, S.F.; Miller, V.; Robertson, R.; Wen, B.; Yun, S.-H.; Taisne, B. Automating tephra fall building damage assessment using deep learning. Nat. Hazards Earth Syst. Sci. 2024, 24, 4585–4608. [Google Scholar] [CrossRef]
  70. Zou, R.; Liu, J.; Pan, H.; Tang, D.; Zhou, R. An Improved Instance Segmentation Method for Fast Assessment of Damaged Buildings Based on Post-Earthquake UAV Images. Sensors 2024, 24, 4371. [Google Scholar] [CrossRef]
  71. Mittal, P.V.; Bafna, R.; Mittal, A. Unsupervised learning framework for region-based damage assessment on xBD, a large satellite imagery. Nat. Hazards 2023, 118, 1619–1643. [Google Scholar] [CrossRef]
  72. Miura, H.; Aridome, T.; Matsuoka, M. Deep learning-based identification of collapsed, non-collapsed and blue tarp-covered buildings from post-disaster aerial images. Remote Sens. 2020, 12, 1924. [Google Scholar] [CrossRef]
  73. Adams, S.M.; Levitan, M.L.; Friedland, C.J. High resolution imagery collection for post-disaster studies utilizing unmanned aircraft systems (UAS). Photogramm. Eng. Remote Sens. 2014, 80, 1161–1168. [Google Scholar] [CrossRef]
  74. Kijewski-Correa, T.L.; Kennedy, A.B.; Taflanidis, A.A.; Prevatt, D.O. Field reconnaissance and overview of the impact of Hurricane Matthew on Haiti’s Tiburon Peninsula. Nat. Hazards 2018, 94, 627–653. [Google Scholar] [CrossRef]
  75. Kovar, R.N.; Brown-Giammanco, T.M.; Lombardo, F.T. Leveraging Remote-Sensing Data to Assess Garage Door Damage and Associated Roof Damage. Front. Built Environ. 2018, 4, 61. [Google Scholar] [CrossRef]
  76. Amini, M.; Memari, A.M. Review of Literature on Performance of Coastal Residential Buildings under Hurricane Conditions and Lessons Learned. J. Perform. Constr. Facil. 2020, 34, 04020102. [Google Scholar] [CrossRef]
  77. Meloy, N.; Sen, R.; Pai, N.; Mullins, G. Roof Damage in New Homes Caused by Hurricane Charley. J. Perform. Constr. Facil. 2007, 21, 97–107. [Google Scholar] [CrossRef]
  78. Miura, H.; Murata, Y.; Wakasa, H.; Takara, T. Empirical estimation based on remote sensing images of insured typhoon-induced economic losses from building damage. Int. J. Disaster Risk Reduct. 2022, 82, 103334. [Google Scholar] [CrossRef]
  79. Pratt, K.S.; Murphy, R.; Stover, S.; Griffin, C. CONOPS and autonomy recommendations for VTOL small unmanned aerial system based on Hurricane Katrina operations. J. F. Robot. 2009, 26, 636–650. [Google Scholar] [CrossRef]
  80. Aránguiz, R.; Saez, B.; Gutiérrez, G.; Oyarzo-Vera, C.; Nuñez, E.; Quiñones, C.; Bobadilla, R.; Bull, M.T. Damage assessment of the May 31st, 2019, Talcahuano tornado, Chile. Int. J. Disaster Risk Reduct. 2020, 50, 101853. [Google Scholar] [CrossRef]
  81. Rey, T.; Leone, F.; Candela, T.; Belmadani, A.; Palany, P.; Krien, Y.; Cécé, R.; Gherardi, M.; Péroche, M.; Zahibo, N. Coastal Processes and Influence on Damage to Urban Structures during Hurricane Irma (St-Martin & St-Barthélemy, French West Indies). J. Mar. Sci. Eng. 2019, 7, 215. [Google Scholar]
  82. Roueche, D.B.; Chen, G.; Soto, M.G.; Kameshwar, S.; Safiey, A.; Do, T.; Lombardo, F.T.; Nakayama, J.O.; Rittelmeyer, B.M.; Palacio-Betancur, A.; et al. Performance of Hurricane-Resistant Housing during the 2022 Arabi, Louisiana, Tornado. J. Struct. Eng. 2024, 150, 04024029. [Google Scholar] [CrossRef]
  83. Schaefer, M.; Teeuw, R.; Day, S.; Zekkos, D.; Weber, P.; Meredith, T.; van Westen, C.J. Low-cost UAV surveys of hurricane damage in Dominica: Automated processing with co-registration of pre-hurricane imagery for change analysis. Nat. Hazards 2020, 101, 755–784. [Google Scholar] [CrossRef]
  84. Stevenson, S.A.; Miller, C.S.; Sills, D.M.L.; Kopp, G.A.; Rhee, D.M.; Lombardo, F.T. Assessment of wind speeds along the damage path of the Alonsa, Manitoba EF4 tornado on 3 August 2018. J. Wind Eng. Ind. Aerodyn. 2023, 238, 105422. [Google Scholar] [CrossRef]
  85. Calantropio, A.; Chiabrando, F.; Sammartano, G.; Spanò, A.; Teppati Losè, L. UAV strategies validation and remote sensing data for damage assessment in post-disaster scenarios. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, XLII-3/W4, 121–128. [Google Scholar] [CrossRef]
  86. de Bruijn, J.A.; Daniell, J.E.; Pomonis, A.; Gunasekera, R.; Macabuag, J.; de Ruiter, M.C.; Koopman, S.J.; Bloemendaal, N.; de Moel, H.; Aerts, J.C.J.H. Using rapid damage observations for Bayesian updating of hurricane vulnerability functions: A case study of Hurricane Dorian using social media. Int. J. Disaster Risk Reduct. 2022, 72, 102839. [Google Scholar] [CrossRef]
  87. Alzarrad, A.; Awolusi, I.; Hatamleh, M.T.; Terreno, S. Automatic assessment of roofs conditions using artificial intelligence (AI) and unmanned aerial vehicles (UAVs). Front. Built Environ. 2022, 8, 1026225. [Google Scholar] [CrossRef]
  88. Hezaveh, M.M.; Kanan, C.; Salvaggio, C. Roof Damage Assessment using Deep Learning. In Proceedings of the 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 10–12 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 6403–6408. [Google Scholar]
  89. Loerch, A.C.; Stow, D.A.; Coulter, L.L.; Nara, A.; Frew, J. Comparing the Accuracy of sUAS Navigation, Image Co-Registration and CNN-Based Damage Detection between Traditional and Repeat Station Imaging. Geosciences 2022, 12, 401. [Google Scholar] [CrossRef]
  90. Zhang, Y.; Kong, L.; Antwi-Afari, M.F.; Zhang, Q. An Integrated Method Using a Convolutional Autoencoder, Thresholding Techniques, and a Residual Network for Anomaly Detection on Heritage Roof Surfaces. Buildings 2024, 14, 2828. [Google Scholar] [CrossRef]
  91. Gong, J.; Maher, A. Use of Mobile Lidar Data to Assess Hurricane Damage and Visualize Community Vulnerability. Transp. Res. Rec. J. Transp. Res. Board 2014, 2459, 119–126. [Google Scholar] [CrossRef]
  92. Kashani, A.G.; Crawford, P.S.; Biswas, S.K.; Graettinger, A.J.; Grau, D. Automated Tornado Damage Assessment and Wind Speed Estimation Based on Terrestrial Laser Scanning. J. Comput. Civ. Eng. 2015, 29, 04014051. [Google Scholar] [CrossRef]
  93. Kashani, A.G.; Graettinger, A.J.; Dao, T. Lidar-Based Methodology to Evaluate Fragility Models for Tornado-Induced Roof Damage. Nat. Hazards Rev. 2016, 17, 04016006. [Google Scholar] [CrossRef]
  94. Gong, J. A Remote Sensing-based Approach for Assessing and Visualizing Post-Sandy Damage and Resiliency Rebuilding Needs. In Proceedings of the Construction Research Congress 2014, Atlanta, Georgia, 19–21 May 2014; American Society of Civil Engineers: Reston, VA, USA, 2014; pp. 1259–1268. [Google Scholar]
  95. Khoshelham, K.; Oude Elberink, S.; Xu, S. Segment-Based Classification of Damaged Building Roofs in Aerial Laser Scanning Data. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1258–1262. [Google Scholar] [CrossRef]
  96. Kashani, A.G.; Olsen, M.J.; Graettinger, A.J. Laser Scanning Intensity Analysis for Automated Building Wind Damage Detection. In Proceedings of the Computing in Civil Engineering 2015, Austin, TX, USA, 21–23 June 2015; American Society of Civil Engineers: Reston, VA, USA, 2015; pp. 199–205. [Google Scholar]
  97. Fiorillo, F.; Perfetti, L.; Cardani, G. Automated Mapping of the roof damage in historic buildings in seismic areas with UAV photogrammetry. Procedia Struct. Integr. 2023, 44, 1672–1679. [Google Scholar] [CrossRef]
  98. Li, S.; Tang, H. Classification of Building Damage Triggered by Earthquakes Using Decision Tree. Math. Probl. Eng. 2020, 2020, 2930515. [Google Scholar] [CrossRef]
  99. Li, S.; Tang, H. Building damage extraction triggered by earthquake using the UAV imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci.-ISPRS Arch. 2018, 42, 929–936. [Google Scholar] [CrossRef]
  100. Li, S.; Tang, H.; He, S.; Shu, Y.; Mao, T.; Li, J.; Xu, Z. Unsupervised Detection of Earthquake-Triggered Roof-Holes From UAV Images Using Joint Color and Shape Features. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1823–1827. [Google Scholar]
  101. Liu, C.; Sui, H.; Huang, L. Identification of Damaged Building Regions from High-Resolution Images Using Superpixel-Based Gradient and Autocorrelation Analysis. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1010–1024. [Google Scholar] [CrossRef]
  102. Lucks, L.; Bulatov, D.; Thönnessen, U.; Böge, M. Superpixel-wise Assessment of Building Damage from Aerial Images. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2019), Prague, Czech Republic, 25–27 February 2019; SciTePress—Science and Technology Publications: Setúbal, Portugal, 2019; Volume 4, pp. 211–220. [Google Scholar]
  103. Tu, J.; Sui, H.; Feng, W.; Sun, K.; Hua, L. Detection of Damaged Rooftop Areas from High-Resolution Aerial Images Based on Visual Bag-of-Words Model. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1817–1821. [Google Scholar] [CrossRef]
  104. Böge, M.; Bulatov, D.; Lucks, L. Localization and Grading of Building Roof Damages in High-Resolution Aerial Images. In Computer Vision, Imaging and Computer Graphics Theory and Applications, Proceedings of the 14th International Joint Conference, VISIGRAPP 2019, Prague, Czech Republic, 25–27 February 2019; Cláudio, A.P., Bouatouch, K., Chessa, M., Paljic, A., Kerren, A., Hurter, C., Tremeau, A., Farinella, G.M., Eds.; Communications in Computer and Information Science; Springer Nature: Cham, Switzerland, 2020; Volume 1182, pp. 497–519. ISBN 978-3-030-41589-1. [Google Scholar]
  105. McNamara, D.; Mell, W.; Maranghides, A. Object-based post-fire aerial image classification for building damage, destruction and defensive actions at the 2012 Colorado Waldo Canyon Fire. Int. J. Wildl. Fire 2020, 29, 174–189. [Google Scholar] [CrossRef]
  106. Radhika, S.; Tamura, Y.; Matsui, M. Strong Wind-Damaged Roof Detection from Post-Storm Aerial Images. In Proceedings of the Eighth Asia-Pacific Conference on Wind Engineering, Chennai, India, 10–14 December 2013; Research Publishing Services: Singapore, 2013; pp. 1122–1128. [Google Scholar]
  107. Radhika, S.; Tamura, Y.; Matsui, M. Determination of Degree of Damage on Building Roofs Due to Wind Disaster from Close Range Remote Sensing Images Using Texture Wavelet Analysis. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 3366–3369. [Google Scholar]
  108. Radhika, S.; Tamura, Y.; Matsui, M. Application of Remote-Sensing Images for Post-Wind Storm Damage Analysis. In Remote Sensing of Hydrometeorological Hazards; CRC Press: Boca Raton, FL, USA, 2017; pp. 417–435. ISBN 9781498777599. [Google Scholar]
  109. Radhika, S.; Tamura, Y.; Matsui, M. Texture-wavelet analysis for automating wind damage detection from aerial imageries. In Proceedings of the 2013 3rd IEEE International Advance Computing Conference (IACC), Ghaziabad, India, 22–23 February 2013; pp. 1246–1250. [Google Scholar]
  110. Chen, S.C.; Shyu, M.L.; Zhang, C.; Tang, W.Z.; Zhang, K. Damage pattern mining in Hurricane image databases. In Proceedings of the Fifth IEEE Workshop on Mobile Computing Systems and Applications, Las Vegas, NV, USA, 27–29 October 2003; pp. 227–234. [Google Scholar]
  111. Liu, C.; Sui, H.; Huang, L. Identification of Building Damage from UAV-Based Photogrammetric Point Clouds Using Supervoxel Segmentation and Latent Dirichlet Allocation Model. Sensors 2020, 20, 6499. [Google Scholar] [CrossRef] [PubMed]
  112. Qiu, H.; Zhang, J.; Zhuo, L.; Xiao, Q.; Chen, Z.; Tian, H. Research on intelligent monitoring technology for roof damage of traditional Chinese residential buildings based on improved YOLOv8: Taking ancient villages in southern Fujian as an example. Herit. Sci. 2024, 12, 231. [Google Scholar] [CrossRef]
  113. Liu, C.; Sui, H.; Huang, L. Minor Damage Recognition from Postearthquake Buildings with an Improved Generative Adversarial Semantic Segmentation Network. Nat. Hazards Rev. 2025, 26, 04025023. [Google Scholar] [CrossRef]
  114. FEMA. Hazus Hurricane Model Technical Manual: Hazus 7.0. 2025; FEMA: Washington, DC, USA, 2025. [Google Scholar]
  115. Esri. Deep-Learning-Frameworks. Available online: https://github.com/esri/deep-learning-frameworks (accessed on 5 July 2025).
  116. Kucharczyk, M. Roof-Damage-Assessment. Available online: https://github.com/maja-kucharczyk/roof-damage-assessment (accessed on 24 July 2025).
  117. GlobalMedic. RescUAV. Available online: https://globalmedic.ca/rescuav (accessed on 5 July 2025).
  118. NGS. 2017 NOAA NGS Emergency Response Imagery: Hurricane Maria. Available online: https://www.fisheries.noaa.gov/inport/item/52283 (accessed on 5 July 2025).
  119. NGS. NOAA’s Emergency Response Imagery. Available online: https://oceanservice.noaa.gov/hazards/emergency-response-imagery.html (accessed on 5 July 2025).
  120. Esri. Resample (Data Management). Available online: https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/resample.htm (accessed on 5 July 2025).
  121. Esri. Resample function. Available online: https://pro.arcgis.com/en/pro-app/latest/help/analysis/raster-functions/resample-function.htm (accessed on 5 July 2025).
  122. Esri. Export Training Data For Deep Learning (Image Analyst). Available online: https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/export-training-data-for-deep-learning.htm (accessed on 5 July 2025).
  123. Esri. Arcgis.Learn Module. Available online: https://developers.arcgis.com/python/latest/api-reference/arcgis.learn.toc.html (accessed on 5 July 2025).
  124. Fast.Ai. Vision.Transform. Available online: https://fastai1.fast.ai/vision.transform.html (accessed on 5 July 2025).
  125. Esri. Using MMSegmentation with Arcgis.Learn. Available online: https://developers.arcgis.com/python/latest/guide/using-mmsegmentation-with-arcgis-learn (accessed on 5 July 2025).
  126. OpenMMLab. Mmsegmentation. Available online: https://github.com/open-mmlab/mmsegmentation (accessed on 5 July 2025).
  127. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 9992–10002. [Google Scholar]
  128. Stanford Vision Lab. ImageNet. Available online: https://www.image-net.org (accessed on 5 July 2025).
  129. Cityscapes Team. Cityscapes Dataset. Available online: https://www.cityscapes-dataset.com (accessed on 5 July 2025).
  130. Esri. Classify Pixels Using Deep Learning (Image Analyst). Available online: https://pro.arcgis.com/en/pro-app/latest/tool-reference/image-analyst/classify-pixels-using-deep-learning.htm (accessed on 5 July 2025).
Figure 1. Workflow showing inputs/outputs (white) and Esri ArcGIS Pro tools (gray).
Figure 3. Example training tiles (sets of images and labels) exported using the roof decking training polygons (yellow) and prepared orthomosaics captured in (a–e) Dominica and (f–j) Sint Maarten.
Figure 4. Example training tiles (sets of images and labels) exported using the roof hole training polygons (red) and prepared orthomosaics captured in (a–e) Dominica and (f–j) Sint Maarten.
Figure 5. Locations of orthomosaics used for expanding the original training datasets.
Figure 8. Example training tiles from (a–e) Dominica and (f–j) Sint Maarten showing roof decking training polygons (yellow) and affected rooftops on which damage could not be comprehensively labeled.
Figure 9. Example training tiles from (a–e) Dominica and (f–j) Sint Maarten showing roof hole training polygons (red) and affected rooftops on which damage could not be comprehensively labeled.
Figure 10. Example true positive predicted polygons of roof decking (yellow) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands. To support the clear visualization of model predictions, corresponding reference polygons are not shown. However, all predicted and reference polygons are available for download at [116].
Figure 11. Example true positive predicted polygons of roof holes (red) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands. To support the clear visualization of model predictions, corresponding reference polygons are not shown. However, all predicted and reference polygons are available for download at [116].
Figure 12. Example false positive predicted polygons of roof decking (yellow) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands.
Figure 13. Example false positive predicted polygons of roof holes (red) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands.
Figure 14. Example false negative reference polygons of roof decking (yellow) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands.
Figure 15. Example false negative reference polygons of roof holes (red) from test areas in (a–e) Dominica, (f–j) Sint Maarten, (k–o) the Bahamas, and (p–t) US Virgin Islands.
Table 2. Properties of training and reference (testing) polygons in each location.

| Location | Category | Number of Polygons (Decking) | Total Polygon Area (m²) ¹ | Average Polygon Area (m²) ¹ | Number of Polygons (Hole) | Total Polygon Area (m²) ¹ | Average Polygon Area (m²) ¹ |
|---|---|---|---|---|---|---|---|
| Dominica | Training | 1250 | 23,816.82 | 19.05 | 1250 | 25,010.37 | 20.01 |
| Sint Maarten | Training | 1250 | 37,543.34 | 30.03 | 1250 | 18,524.75 | 14.82 |
| All | Training | 2500 | 61,360.16 | 24.54 | 2500 | 43,535.12 | 17.41 |
| Dominica | Testing | 80 | 741.48 | 9.27 | 248 | 3729.63 | 15.04 |
| Sint Maarten | Testing | 155 | 2832.90 | 18.28 | 209 | 3151.51 | 15.08 |
| The Bahamas | Testing | 357 | 2438.71 | 6.83 | 159 | 1978.23 | 12.44 |
| US Virgin Islands | Testing | 108 | 4685.96 | 43.39 | 124 | 1520.02 | 12.26 |
| All | Testing | 700 | 10,699.05 | 15.28 | 740 | 10,379.39 | 14.03 |

¹ Planimetric areas are provided to convey the areal representation of each class in the training and testing images. Roof slopes must be factored into the calculation of true surface areas of damage.
Table 3. Properties of orthomosaics used for expanding the original training datasets.

| Location | Category | Number of Orthomosaics | GSD (m) | Total Area (km²) ¹ | Imaging Platform | Source |
|---|---|---|---|---|---|---|
| The Bahamas | Training | 21 | 0.02–0.09 | 12.35 | M200 ² | GlobalMedic [117] |
| US Virgin Islands | Training | 26 | 0.15 | 124.13 | Crewed aircraft ³ | US NOAA NGS [118] |
| All | Training | 47 | 0.02–0.15 | 136.48 | | |

¹ Areas were calculated based on the polygons used for clipping the orthomosaics (Figure 5). ² Multirotor drone: DJI M200 with a Zenmuse X4S sensor. ³ Crewed aircraft with a Trimble Digital Sensor System.
Table 5. Accuracy results for each trained model. The accuracy assessment summed the number of test image pixels categorized as true positive (TP), false positive (FP), and false negative (FN). These sums were used to calculate union, precision, recall, F1 score, and intersection over union (IoU).

Decking Accuracy: Dual-Class Model, Original Training Dataset (33,195 Tiles), Early Stopping After 49 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 66,294 | 12,193 | 230,359 | 308,846 | 0.84 | 0.22 | 0.35 | 0.21 |
| Sint Maarten | 855,103 | 38,235 | 278,025 | 1,171,363 | 0.96 | 0.75 | 0.84 | 0.73 |
| The Bahamas | 564,192 | 78,497 | 411,281 | 1,053,970 | 0.88 | 0.58 | 0.70 | 0.54 |
| US Virgin Islands | 1,058,496 | 7776 | 815,891 | 1,882,163 | 0.99 | 0.56 | 0.72 | 0.56 |
| All | 2,544,085 | 136,701 | 1,735,556 | 4,416,342 | 0.95 | 0.59 | 0.73 | 0.58 |

Hole Accuracy: Dual-Class Model, Original Training Dataset (33,195 Tiles), Early Stopping After 49 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 1,041,515 | 108,519 | 450,443 | 1,600,477 | 0.91 | 0.70 | 0.79 | 0.65 |
| Sint Maarten | 505,705 | 18,870 | 754,798 | 1,279,373 | 0.96 | 0.40 | 0.57 | 0.40 |
| The Bahamas | 269,750 | 15,502 | 521,534 | 806,786 | 0.95 | 0.34 | 0.50 | 0.33 |
| US Virgin Islands | 117,067 | 44,235 | 490,941 | 652,243 | 0.73 | 0.19 | 0.30 | 0.18 |
| All | 1,934,037 | 187,126 | 2,217,716 | 4,338,879 | 0.91 | 0.47 | 0.62 | 0.45 |

Decking Accuracy: Single-Class Model, Original Training Dataset (20,899 Tiles), Early Stopping After 32 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 166,142 | 67,685 | 130,511 | 364,338 | 0.71 | 0.56 | 0.63 | 0.46 |
| Sint Maarten | 886,534 | 91,195 | 246,594 | 1,224,323 | 0.91 | 0.78 | 0.84 | 0.72 |
| The Bahamas | 758,005 | 108,036 | 217,468 | 1,083,509 | 0.88 | 0.78 | 0.82 | 0.70 |
| US Virgin Islands | 1,294,140 | 201,386 | 580,247 | 2,075,773 | 0.87 | 0.69 | 0.77 | 0.62 |
| All | 3,104,821 | 468,302 | 1,174,820 | 4,747,943 | 0.87 | 0.73 | 0.79 | 0.65 |

Hole Accuracy: Single-Class Model, Original Training Dataset (21,725 Tiles), Early Stopping After 31 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 1,040,745 | 140,252 | 451,213 | 1,632,210 | 0.88 | 0.70 | 0.78 | 0.64 |
| Sint Maarten | 566,164 | 51,416 | 694,339 | 1,311,919 | 0.92 | 0.45 | 0.60 | 0.43 |
| The Bahamas | 291,143 | 20,816 | 500,141 | 812,100 | 0.93 | 0.37 | 0.53 | 0.36 |
| US Virgin Islands | 272,613 | 21,905 | 335,395 | 629,913 | 0.93 | 0.45 | 0.60 | 0.43 |
| All | 2,170,665 | 234,389 | 1,981,088 | 4,386,142 | 0.90 | 0.52 | 0.66 | 0.49 |

Decking Accuracy: Single-Class Model, Expanded Training Dataset (25,659 Tiles), Early Stopping After 39 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 180,446 | 41,560 | 116,207 | 338,213 | 0.81 | 0.61 | 0.70 | 0.53 |
| Sint Maarten | 931,631 | 95,651 | 201,497 | 1,228,779 | 0.91 | 0.82 | 0.86 | 0.76 |
| The Bahamas | 775,782 | 67,011 | 199,691 | 1,042,484 | 0.92 | 0.80 | 0.85 | 0.74 |
| US Virgin Islands | 1,665,817 | 71,897 | 208,570 | 1,946,284 | 0.96 | 0.89 | 0.92 | 0.86 |
| All | 3,553,676 | 276,119 | 725,965 | 4,555,760 | 0.93 | 0.83 | 0.88 | 0.78 |

Hole Accuracy: Single-Class Model, Expanded Training Dataset (26,186 Tiles), Early Stopping After 29 Epochs

| Location | TP | FP | FN | Union | Precision | Recall | F1 Score | IoU |
|---|---|---|---|---|---|---|---|---|
| Dominica | 1,302,837 | 208,708 | 189,121 | 1,700,666 | 0.86 | 0.87 | 0.87 | 0.77 |
| Sint Maarten | 1,000,810 | 297,466 | 259,693 | 1,557,969 | 0.77 | 0.79 | 0.78 | 0.64 |
| The Bahamas | 536,016 | 56,196 | 255,268 | 847,480 | 0.91 | 0.68 | 0.77 | 0.63 |
| US Virgin Islands | 516,733 | 361,370 | 91,275 | 969,378 | 0.59 | 0.85 | 0.70 | 0.53 |
| All | 3,356,396 | 923,740 | 795,357 | 5,075,493 | 0.78 | 0.81 | 0.80 | 0.66 |
Table 6. Reference sample properties and accuracy results of our study and previous studies. The Creation Method, Quantity, and Location columns describe the reference samples; the Precision, Recall, and F1 Score columns report model accuracy.

| Study | Damage Class | Creation Method | Quantity | Location | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|---|
| Our study | Roof decking | Manual delineation | 4.3 million px, 275 rooftops | 4 external test areas | 0.93 | 0.83 | 0.88 |
| Our study | Roof hole | Manual delineation | 4.2 million px, 408 rooftops | 4 external test areas | 0.78 | 0.81 | 0.80 |
| [97] | Roof hole | Manual delineation | 24,342 px, 1 rooftop | Extracted from training area | 0.77 | 0.48 | 0.59 |
| [104] | Roof damage | Classification of superpixels | 33,305 superpixels (4.4 million px), 100–200 rooftops | Extracted from training area | 0.72–0.97 | 0.41–0.85 | 0.53–0.91 |
| [102] | Roof damage | Classification of superpixels | 28,640 superpixels, 100–200 rooftops | Extracted from training area | 0.76–0.89 | 0.72–0.81 | 0.74–0.85 |
| [103] | Roof damage | Classification of superpixels | 50 rooftops | Extracted from training area | 0.91 | 0.88 | 0.89 |
| [101] | Roof damage | Classification of superpixels | 6929 superpixels | Extracted from training area | 0.83 | 0.91 | 0.87 |
| [111] | Roof damage | Classification of supervoxels | <10,000 supervoxels | Extracted from training area | 0.86–0.90 | 0.87–0.93 | 0.86–0.92 |