Using Artificial Neural Network Models to Assess Hurricane Damage through Transfer Learning

Abstract: Coastal hazard events such as hurricanes pose a significant threat to coastal communities. Disaster relief is essential to mitigating damage from these catastrophes; therefore, accurate and efficient damage assessment is key to evaluating the extent of damage inflicted on coastal cities and structures. Historically, this process has been carried out by human task forces that manually take post-disaster images and identify the damaged areas. While this method is well established, current digital tools for computer vision tasks, such as artificial intelligence and machine learning, offer a more efficient and reliable way to assess post-disaster damage. Using transfer learning on three advanced neural networks, ResNet, MobileNet, and EfficientNet, we applied techniques for damage classification and damaged object detection to our post-hurricane image dataset, composed of images of damaged buildings from the coastal region of the southeastern United States. Our dataset included 1000 images for the classification model, with a binary classification structure containing the classes floods and non-floods, and 800 images for the object detection model, with four damaged object classes: damaged roof, damaged wall, flood damage, and structural damage. Our damage classification model achieved 76% overall accuracy for ResNet and 87% overall accuracy for MobileNet. MobileNet also achieved an F1 score of 0.88, 9% higher than that of ResNet. Our damaged object detection models made strong predictions for the four damaged object classes, with MobileNet attaining the highest overall confidence score of 97.58% in its predictions. The object detection results highlight the models' ability to identify damaged areas of buildings and structures from images within seconds, which is necessary for more efficient damage assessment.
Thus, we show that this level of accuracy for our damage assessment using artiﬁcial intelligence is akin to the accuracy of manual damage assessments while also completing the assessment in a drastically shorter time span.


Introduction
Coastal storms and hazard events are often analyzed to address the dangers faced by coastal communities around the world. Many potential threats to communities residing in coastal areas are captured with a comprehensive plan for risk analysis. In 2018, a preliminary risk analysis estimated almost $17 billion in damages across the state of North Carolina in the wake of Hurricane Florence [1]. As a result, accurate and efficient evaluations of damage from coastal hazards such as hurricanes are necessary to provide data for post-disaster relief efforts. Damage assessment is a primary tool for understanding the level of damage to coastal populations in the aftermath of a hazard event. Knowledge of damage is further applied to risk assessment models to mitigate damage from future hazards [2]. Efficient relief plans and proper allocation of relief funding to the affected areas are impractical without accurate data. Traditionally, post-disaster data have been collected by individuals or teams making initial observations and assessments of damage. These people photograph the damage in door-to-door assessments or windshield surveys (e.g., [3,4]). Remote validations are a supplemental tool in the damage assessment process that speeds up the manual evaluations. These desktop assessments replace onsite validations when the risk to preliminary damage assessment staff is high and images of the damaged area are readily available [4]. However, these validations still rely on humans to identify damaged structures and verify damage assessments, making them prone to significant inaccuracy.
In a myriad of classification tasks, artificial neural networks have proven able to perform the same work as humans significantly more efficiently and with higher accuracy. Machine learning techniques have particular advantages over humans in tasks that involve large datasets drawn from multiple events with highly similar characteristics [2]. Hurricanes provide a multitude of events for data collection that artificial neural network models can use to perform damage assessment. Two types of data typically capture hurricane damage to buildings: satellite imagery (e.g., [5,6]) and ground-level images/photos (e.g., [7]) taken by drones or similar means. Both data types have been used for damage assessment. For example, Weber and Kané [8] used Mask R-CNN [9] to predict both building locations and damage levels from pre-disaster and post-disaster images in the xBD database [6]. Furthermore, Hao et al. [10] developed a multi-class deep learning model with an attention mechanism to assess the damage levels of buildings given a pair of satellite images depicting a scene before and after a disaster, using the xView2 dataset [11]. Cheng et al. [12] developed a stacked convolutional neural network architecture trained on an in-house visual dataset from Hurricane Dorian that was collected with an unmanned aerial vehicle. An effective hurricane damage assessment model should train on both aerial and ground-level image data to increase its adaptability for emergency damage assessment of future coastal hazards.
Social media has been explored as a primary source of data for hurricane damage assessment because these platforms can be readily integrated into automated damage assessment (e.g., [13][14][15]). Hao and Wang [16] used five machine learning classifiers that take images from social networking platforms as input and output the damage types and severity levels shown in the images. Leveraging social media platforms to train damage assessment models has shown success, with rapid operation capabilities.
The transfer learning approach to developing artificial neural network models for hurricane damage assessment has also been explored recently. Most of these studies apply transfer learning to pre-trained convolutional neural network (CNN) models using aerial images of hurricane damage to buildings (e.g., [17][18][19][20]). Liao et al. [21] use transfer learning on two well-established CNNs, AlexNet and VGGNet, to create classification models for two-dimensional orthomosaic images gathered from unpiloted aerial systems. These and similar studies limit the source of the training dataset, making the classification models useful only for datasets composed of aerial images taken by satellite or drone. Our work incorporates both aerial and ground-level images for hurricane damage classification and detection of damaged buildings to create a more operational damage assessment framework for future coastal hazards.
The effectiveness of transfer learning for building damage assessment depends on how well the features and information learned in the source domain transfer to the target domain used for testing the model. The need for domain adaptation arises when there are discrepancies among images within the source domain and between the source and target domains (e.g., [22,23]). These discrepancies result from remote sensing capturing images with varying sensors, locations, times, and perspectives. The same domain shift problem extends to transferring information between different coastal hazards. A CNN-based model was shown to reach high classification performance when trained on the same damage type across different disasters [24]. The source and target domains in our study do not present any major discrepancies. Rather, our damage classification and damage detection models focus on a single coastal hazard that causes multiple types of damage, to enhance the efficacy of damage assessment.
There are several challenges in building damage assessment using artificial neural network models. First, machine learning training requires a considerable amount of input data to sufficiently assess the damage or classify damage levels from images (e.g., [5,6]). Second, developing in-house machine learning models requires significant effort to achieve high accuracy. This study focuses on building damage caused by hurricanes in the southeastern U.S., and we improve the efficiency of assessing hurricane damage to buildings by applying neural network models for damage classification and object detection. We address the first challenge by developing our in-house building damage dataset using internet search engines, and we address the second challenge by using advanced artificial intelligence models for computer vision, MobileNet [25], ResNet [26], and EfficientNet [27], through transfer learning. This paper is organized as follows: Section 2 presents the development of our in-house building damage image dataset, including data collection, data statistics, and data pre-processing. Section 3 reviews the background of the three artificial intelligence models used as the base for transfer learning for building damage assessment and explains the transfer learning workflow for both damage classification and damage detection. Section 4 presents the training metrics, the damage classification results, and the damage detection results, and discusses the transfer learning results of the three models. Finally, Section 5 states the conclusions and significance of this study.

Building Damage Dataset
This section first presents the development of our in-house building damage image dataset. Then, we explain the data statistics for damage classification and damage detection.

Data Collection and Preparation
This study primarily focuses on hurricane damage to buildings in the southeastern U.S. We sourced the data through internet searches for photos related to hurricane damage, and a few thousand images of hurricane damage in Florida, Georgia, North Carolina, and South Carolina were prepared for a preliminary data cleanup. Each image in our in-house dataset was further examined for the types of buildings and structures it contained to ensure that they were characteristic of the U.S. east coast region.
The raw dataset was further processed for two tasks: damage classification and damaged object detection. For the first task, we examined the data to be used in the classification model and identified potential classes for image categorization. This step involved additional cleanup to remove remaining images that were duplicates or did not fit any of the image classes. For the second task, we examined the individual images to be used in the object detection model and removed those that did not capture a damaged structure. After a final examination of both versions of the data, the images were ready for pre-processing before being input into the neural network models for transfer learning.

Data Statistics
The next step was to divide the dataset into one set for the classification model and a second set for the object detection model. The main difference between the two sets was that the object detection set contained only images of buildings, whereas the classification set was not restricted to images containing buildings. Images in both sets vary in pixel resolution and are unaltered from the original source.
Historical hurricanes have typically caused significant flood damage due to storm surges and heavy precipitation. Thus, the damage classification research in this work aims to determine whether there are floods in an image. To this end, we selected 1000 images from our dataset and divided them into two categories, floods and non-floods, as indicated in Table 1. The motivation is to examine whether neural network models can perform binary classification on our dataset. Flood damage is characterized by flood waters in the images, and it can occur in various ways. Typical flood damage in our dataset includes (1) flooded buildings, houses, and communities, (2) flooded streets, (3) flooded vehicles, and (4) flooded coastal areas. The non-floods images are related to hurricane damage but do not include floods; these images needed to be characteristic of areas and buildings damaged by hurricanes because the purpose of our classification models is to demonstrate their success in learning from data that would be used for traditional hurricane damage assessment. Finally, the binary classification task does not require additional data processing other than sorting the images into the two categories.
Unlike the data preparation for damage classification, machine learning object detection requires labeled data, which guide the model to learn the common features of a specific type of object. The image labeling in this work was done with the open-source annotation tool LabelImg [28]. This tool allowed us to take an input image from our dataset and draw bounding boxes around the areas of interest, each corresponding to an annotation label. The position of each bounding box and its label were then exported for neural network model training. The object detection dataset consisted of 800 annotated images, and the annotation labels are the damaged objects listed in Table 2. Four categories of objects were identified in our hurricane damage dataset, and the features associated with each are briefly explained below.
• Damaged roof. The bounding box label highlights a roof with damage to the whole roof, some shingles, or parts of the roof. The bounding box typically encompasses the entire roof in the image; however, if the entire roof is not visible, the damaged area and any additional visible parts of the roof were included.
• Damaged wall. The bounding box label highlights a damaged building wall or windows within a wall. Damage to walls/windows could range from minor disintegration of the brick or glass structure to the entire loss of the wall or window structure.
• Flood damage. The bounding box label highlights flood waters in an image. The flood water can occur in various places, as explained for the binary classification dataset. Due to this sporadic nature, multiple bounding box labels were used in some images to encapsulate the entirety of the flood water.
• Structural damage. The bounding box label highlights a building suffering from structural damage, e.g., the disintegration of the roof and/or any floor(s) within the building, the complete loss of multiple walls/structures, or the collapse of the whole building.
It should be pointed out that the total number of samples in Table 2 is 958, which is greater than the total number of annotated images, i.e., 800. The difference is due to the fact that multiple objects were annotated/observed in a single image, resulting in a larger number of objects than the number of images.
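To make the distinction between annotated images and annotated objects concrete, the sketch below parses annotations in the PASCAL VOC XML format (LabelImg's default export format) and counts labeled objects per class. It assumes one XML file per image with `<object>`, `<name>`, and `<bndbox>` elements; the two sample annotations, their file names, and their coordinates are hypothetical and only illustrate how one image can contribute several objects.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical LabelImg-style (PASCAL VOC) annotations: the first image holds two objects.
SAMPLE_A = """<annotation>
  <filename>house_001.jpg</filename>
  <object><name>damaged roof</name>
    <bndbox><xmin>12</xmin><ymin>30</ymin><xmax>410</xmax><ymax>220</ymax></bndbox></object>
  <object><name>flood damage</name>
    <bndbox><xmin>0</xmin><ymin>260</ymin><xmax>640</xmax><ymax>480</ymax></bndbox></object>
</annotation>"""

SAMPLE_B = """<annotation>
  <filename>street_002.jpg</filename>
  <object><name>flood damage</name>
    <bndbox><xmin>35</xmin><ymin>200</ymin><xmax>600</xmax><ymax>470</ymax></bndbox></object>
</annotation>"""


def parse_voc(xml_text):
    """Return (label, (xmin, ymin, xmax, ymax)) for every <object> in one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        label = obj.findtext("name").strip()
        box = obj.find("bndbox")
        coords = tuple(int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((label, coords))
    return objects


def count_objects(xml_texts):
    """Count annotated objects per class across all images."""
    counts = Counter()
    for text in xml_texts:
        counts.update(label for label, _ in parse_voc(text))
    return counts


annotations = [SAMPLE_A, SAMPLE_B]
counts = count_objects(annotations)
print(len(annotations), "images,", sum(counts.values()), "objects:", dict(counts))
# prints: 2 images, 3 objects: {'damaged roof': 1, 'flood damage': 2}
```

Applied to all 800 annotation files, the same per-class tally is what yields an object total (958) larger than the image count.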

Transfer Learning
The previous section showed that our in-house hurricane damage dataset only has about 800-1000 images. To be able to develop effective hurricane damage assessment machine learning models using such a small dataset, we utilize a machine learning technique, transfer learning, in this work. This section first presents the background information about transfer learning. Then, we review the existing neural network models that were used in this study, and we focus on the typical model architecture. Next, we present the transfer learning workflows used in this study.

The Fundamentals of Transfer Learning
Transfer learning is a machine learning technique that leverages feature representations from a pre-trained artificial neural network model to train a new target model on a different, usually smaller, dataset. The crucial step in implementing transfer learning is to reuse the learned weights/parameters of a pre-trained neural network model, i.e., a saved model that was previously trained on a large dataset such as ImageNet [29] or COCO [30]. This choice is justified because, if the original dataset is large and general enough, the spatial hierarchy of features learned by the pre-trained model can effectively act as a generic model of the visual world; its features are thus useful for many different computer vision problems, even when these new problems involve completely different classes from those of the original task [31].
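A minimal feature-extraction sketch of this idea follows, assuming TensorFlow/Keras (the text does not name the framework used). An ImageNet-pre-trained backbone from `tf.keras.applications` is frozen, and only a new sigmoid output layer for the binary floods/non-floods task is trained. The function name `build_flood_classifier`, the 224 × 224 input size, and the dropout rate are illustrative assumptions, not the configuration used in this study.

```python
import tensorflow as tf


def build_flood_classifier(backbone="mobilenet", weights="imagenet", img_size=224):
    """Frozen pre-trained backbone + new trainable binary head (feature extraction)."""
    shape = (img_size, img_size, 3)
    if backbone == "mobilenet":
        base = tf.keras.applications.MobileNet(
            include_top=False, weights=weights, input_shape=shape, pooling="avg")
        preprocess = tf.keras.applications.mobilenet.preprocess_input
    elif backbone == "resnet50":
        base = tf.keras.applications.ResNet50(
            include_top=False, weights=weights, input_shape=shape, pooling="avg")
        preprocess = tf.keras.applications.resnet50.preprocess_input
    else:
        raise ValueError(f"unknown backbone: {backbone}")

    base.trainable = False  # keep the pre-trained ImageNet features fixed

    inputs = tf.keras.Input(shape=shape)
    x = preprocess(inputs)           # scale pixels the way this backbone expects
    x = base(x, training=False)      # run BatchNorm layers in inference mode
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # P(floods)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# model = build_flood_classifier("mobilenet")  # downloads ImageNet weights on first use
```

Each backbone is paired with its own `preprocess_input` because MobileNet and ResNet-50 expect differently scaled inputs; passing `weights=None` builds the same architecture without downloading pre-trained weights, which is useful only for checking the model structure.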

Artificial Neural Network Models
This study uses three well-established neural network models in computer vision, namely, ResNet [26], MobileNet [25], and EfficientNet [32]. These networks were selected for two reasons. First, we aim to explore the efficiency of different neural network architectures for hurricane damage assessment; to that end, collating results from multiple models provides deeper insight than results obtained from a single model. Second, these models have been pre-trained on large image datasets, and their pre-trained weights are freely available. The following subsections briefly review the three selected neural network models, focusing on their typical architectures.

ResNet
The ResNet architecture was developed with a deep residual learning framework to directly address the degradation problem: as deeper networks begin to converge, their training accuracy saturates and then degrades [26]. Identity shortcut connections in ResNet add no parameters and let information flow directly between layers, so that the stacked layers learn residual functions. Thus, the residual framework makes the residual mapping easier to optimize and allows accuracy to increase with depth [26]. The 50-layer ResNet uses a 3-layer bottleneck design that, paired with the identity shortcuts, yields a more efficient model. We incorporated the 50-layer ResNet architecture into our model to match the target input resolution and for its relatively simple structure.
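The identity shortcut can be illustrated with a toy NumPy sketch. As a simplifying assumption, dense matrix products stand in for the 1 × 1, 3 × 3, and 1 × 1 convolutions of the ResNet-50 bottleneck block, batch normalization is omitted, and the dimensions are arbitrary. The point is that the shortcut adds the input back without any parameters, so if the residual branch learns weights near zero, the block reduces to an identity mapping instead of degrading the signal.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def bottleneck_block(x, w_reduce, w_mid, w_expand):
    """Toy residual block y = relu(F(x) + x) with a 3-layer bottleneck branch F."""
    h = relu(x @ w_reduce)   # reduce width d -> b (stands in for a 1x1 conv)
    h = relu(h @ w_mid)      # work at the bottleneck width b (stands in for a 3x3 conv)
    f = h @ w_expand         # restore width b -> d (stands in for a 1x1 conv)
    return relu(f + x)       # identity shortcut: no extra parameters


rng = np.random.default_rng(0)
d, b = 8, 2                                   # block width and bottleneck width
x = relu(rng.normal(size=(4, d)))             # batch of 4 non-negative feature vectors
w1, w2, w3 = rng.normal(size=(d, b)), rng.normal(size=(b, b)), rng.normal(size=(b, d))

y = bottleneck_block(x, w1, w2, w3)
print(y.shape)  # (4, 8)

# With an all-zero residual branch, the block passes x through unchanged.
zeros = [np.zeros_like(w) for w in (w1, w2, w3)]
print(np.allclose(bottleneck_block(x, *zeros), x))  # True
```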

MobileNet
The MobileNet architecture streamlines the convolution layers through depthwise separable convolutions to build a lightweight deep neural network [25]. A depthwise separable convolution splits a standard convolution into two layers: a depthwise convolution that filters each input channel separately, and a 1 × 1 pointwise convolution that combines the outputs of the depthwise convolution. This factorization significantly reduces computation and model size, which generally leads to low latency when MobileNet is used for classification and object detection.
Additionally, the MobileNet architecture uses two global hyper-parameters: a width multiplier and a resolution multiplier. The width multiplier thins each layer of the network uniformly, while the resolution multiplier is applied to the input image, which reduces the internal representation of every subsequent layer by the same factor [25]. We incorporated MobileNet V1 to match the target input resolution and maintain consistency with the choice of primitive model architectures.
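The savings from depthwise separable convolutions and the two multipliers can be checked with the multiply-add cost formulas from the MobileNet paper [25], where D_K is the kernel size, M and N are the input and output channel counts, D_F is the spatial size of the feature map, α is the width multiplier, and ρ is the resolution multiplier. The layer dimensions in the example (a 3 × 3 convolution with 512 input and output channels on a 14 × 14 feature map) are illustrative.

```python
import math


def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard dk x dk convolution mapping M -> N channels on a df x df map."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution with width multiplier alpha
    and resolution multiplier rho."""
    m_a, n_a, df_r = alpha * m, alpha * n, rho * df
    depthwise = dk * dk * m_a * df_r * df_r   # one dk x dk filter per input channel
    pointwise = m_a * n_a * df_r * df_r       # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


dk, m, n, df = 3, 512, 512, 14
std = standard_conv_cost(dk, m, n, df)
sep = separable_conv_cost(dk, m, n, df)
print(std, sep, round(sep / std, 4))  # 462422016 52283392.0 0.1131
```

The ratio equals 1/N + 1/D_K², so 3 × 3 depthwise separable layers need roughly 8 to 9 times fewer operations than standard convolutions; setting α = 0.5 further cuts the dominant pointwise term by about a factor of four.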

EfficientNet
The EfficientNet architecture was designed to prioritize efficiency while maintaining state-of-the-art accuracy. Conventionally, convolutional neural networks are scaled up from a baseline model to improve the accuracy of detections/classifications; more training data and more model layers generally produce more accurate predictions. EfficientNet instead uses compound scaling, which scales the network's width, depth, and input resolution together with fixed coefficients, to achieve high accuracy while striving to be the most efficient CNN [32].
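Compound scaling can be written as depth d = α^φ, width w = β^φ, and resolution r = γ^φ, subject to α · β² · γ² ≈ 2, so that increasing the compound coefficient φ by one roughly doubles the FLOPS. The sketch below uses α = 1.2, β = 1.1, and γ = 1.15, the values reported for the EfficientNet-B0 baseline in the original EfficientNet paper [32]. It illustrates the scaling rule only and is not intended to reproduce any specific published model configuration.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return depth, width, and resolution multipliers and the approximate FLOPS factor."""
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    # FLOPS grow linearly with depth and quadratically with width and resolution.
    flops = (alpha * beta ** 2 * gamma ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```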
The EfficientDet network is an extension of the EfficientNet architecture created specifically for object detection; it uses EfficientNet as its backbone network. This variant combines compound scaling with a weighted bi-directional feature pyramid network (BiFPN) that fuses features across layers of the model, optimizing both model efficiency and accuracy [27].

Figure 1 shows the workflow used to train ResNet and MobileNet for flood damage classification. Both ResNet and MobileNet were pre-trained on the ImageNet dataset, which includes more than 1 million images and 1000+ target classes or labels. Our transfer learning work uses the pre-trained knowledge in these two models, i.e., model weights that characterize typical features of real-world images. Specifically, our hurricane damage classification transfer learning follows the steps of the workflow in Figure 1. Note that ImageNet contains neither a floods nor a non-floods class; as a result, the primary purpose of using these pre-trained models is feature extraction. Our flood damage dataset was split 60/20/20 for training, validation, and testing. Furthermore, data augmentation was used to increase the number of training samples and reduce model overfitting. Data augmentation randomly transforms training samples to yield believable-looking images, which helps expose the model to more aspects of the data for better generalization [31].

Figure 2 shows the object detection workflow for training ResNet, MobileNet, and EfficientNet. All three networks were pre-trained at the same image resolution (640 × 640 pixels) on the COCO 2017 dataset [30], which contains 330,000 images and 80 object categories. Each of the three models could also be configured with the same initial training parameters: the batch size was set to four images, and each model training instance was terminated after the twenty-thousandth epoch. Our damage detection dataset was split 50/50 for training and testing when pre-processed for each model, leaving 400 images and their annotations for model training and the other 400 images and their annotations for testing each model's predictions. Our transfer learning work leverages the pre-trained model weights for features extracted from the typical objects in the COCO 2017 dataset. More specifically, our hurricane damage detection transfer learning follows the steps of the workflow in Figure 2. Because the four object categories from Table 2 do not appear in the COCO 2017 dataset, the primary objective of using the pre-trained models is feature extraction.
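Both partitions described above, 60/20/20 for flood classification and 50/50 for damage detection, can be produced with a seeded shuffle-and-slice. The sketch below is an assumption about how such a split could be done (the text does not say whether the split was random or stratified by class); the helper name `split_dataset` and the file names are placeholders.

```python
import random


def split_dataset(items, fractions, seed=42):
    """Shuffle `items` reproducibly and cut them into consecutive, non-overlapping parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(items)
    random.Random(seed).shuffle(items)
    parts, start = [], 0
    for i, frac in enumerate(fractions):
        # The last part takes whatever remains, so rounding never drops an item.
        end = len(items) if i == len(fractions) - 1 else start + round(frac * len(items))
        parts.append(items[start:end])
        start = end
    return parts


flood_images = [f"flood_cls_{i:04d}.jpg" for i in range(1000)]   # placeholder names
train, val, test = split_dataset(flood_images, (0.6, 0.2, 0.2))
print(len(train), len(val), len(test))  # 600 200 200

damage_images = [f"damage_det_{i:03d}.jpg" for i in range(800)]  # placeholder names
det_train, det_test = split_dataset(damage_images, (0.5, 0.5))
print(len(det_train), len(det_test))  # 400 400
```

Fixing the seed makes the partition reproducible, so all models can be trained and tested on exactly the same images.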

Computing Environment
This machine learning research was conducted using Google Colab, a free Jupyter notebook environment that runs entirely in the cloud. The computing environment was configured as follows.

• The CPU model is Intel(R) Xeon(R) CPU @ 2.00 GHz.
• The CPU clock speed is 2000 MHz, and the CPU cache size is 39,424 KB.
• The graphics processing unit (GPU) is an NVIDIA Tesla P100, based on the NVIDIA Pascal GPU architecture, with 3584 NVIDIA CUDA cores and 16 GB of GPU memory. A single GPU card was used in this study.

Metrics and Prediction Skills
The metrics used to track the training behavior of the classification models and the object detection models differ and are presented in the following sections. Additionally, specific prediction skill metrics are used primarily to evaluate the classification models.

Classification Metrics
The metrics used to track the progress of the classification models during training are loss and accuracy. The loss is the cross-entropy loss in Equation (1), where the index i denotes the i-th training example in the dataset, y_i is the ground-truth label for the i-th training example, and ŷ_i is the prediction for the i-th training example [33]. Cross-entropy loss grows sharply for incorrect predictions made with high confidence, so such predictions are penalized more heavily. Cross-entropy loss is used to train many classifier models, such as MobileNet and ResNet.

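$$\text{Cross-Entropy Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}_i + \left(1-y_i\right)\log\left(1-\hat{y}_i\right)\right] \tag{1}$$

where N is the number of training examples, y_i ∈ {0, 1}, and ŷ_i ∈ (0, 1) is the predicted probability of the positive class.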
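A small NumPy sketch of the binary form of the cross-entropy loss makes the point about confident mistakes concrete: for a floods image (y = 1), the loss grows without bound as the predicted probability approaches 0. The clipping constant `eps` is an implementation assumption that avoids taking log(0), and the predicted probabilities in the example are made up.

```python
import numpy as np


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over N examples; y_pred holds P(positive class)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred)))


# One floods image (label 1) scored with increasingly wrong, increasingly confident predictions.
for p in (0.9, 0.6, 0.4, 0.1, 0.01):
    print(f"p(floods)={p:<5} loss={binary_cross_entropy([1], [p]):.3f}")
```

Moving the prediction from 0.4 to 0.1 costs far more than moving it from 0.9 to 0.6, which is why overconfident errors dominate the training signal.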
Accuracy is defined by Equation (2):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative predictions, respectively. In addition to the training metrics, the metrics used to evaluate the classification models include precision, recall, and the F1 score. Precision, defined by Equation (3), measures the fraction of positive predictions that are correct.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

Recall, defined by Equation (4), measures the fraction of actual positive samples that the model correctly predicts as positive.

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

The F1 score is the harmonic mean of precision and recall, weighting the two equally. Equation (5) gives the F1 score, which accounts for both FP and FN predictions; this characteristic makes it a well-balanced measure of model performance.

$$F_1 = 2\,\frac{\text{Precision}\times\text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$
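Equations (2) to (5) translate directly into code. In the sketch below the confusion counts are hypothetical, chosen only to show the arithmetic; they are not results from this study. Returning 0.0 when a denominator is zero is a convention assumed here for degenerate cases.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts with floods as the positive class.
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.85, 'precision': 0.889, 'recall': 0.8, 'f1': 0.842}
```

Equivalently, F1 = 2TP / (2TP + FP + FN), which makes explicit that both error types lower the score.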

Object Detection Metrics
The training metrics used to track the progress of the object detection models are the loss terms for three distinct training operations: classification, localization, and regularization. Classification loss is associated with determining the target object class [34]. It is computed by combining the multi-class form of the cross-entropy loss in Equation (1) with the softmax activation function in Equation (6), where z is an input vector containing K elements corresponding to the possible object classes, j is the index into z, and z_j is the j-th element of z.

$$\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \qquad j = 1, \ldots, K \tag{6}$$

The denominator of Equation (6) is a normalization term ensuring that σ(z) is a valid probability distribution whose K elements sum to 1, so that the scores for the object classes are converted to probabilities before the cross-entropy loss is computed [33]. The localization loss is associated with bounding box regression, which pinpoints the target object by training a separate head with its own loss function [34]. This loss compares the ground-truth bounding box coordinates y_i with the predicted coordinates ŷ_i, as in Equation (7), where M is the number of coordinate values compared. This localization loss is formulated as a mean squared error (MSE).

$$\text{MSE} = \frac{1}{M}\sum_{i=1}^{M}\left(y_i - \hat{y}_i\right)^2 \tag{7}$$
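The classification and localization terms can be sketched in NumPy as follows. The logits and box coordinates are hypothetical. Subtracting the maximum logit before exponentiating is a standard numerical-stability step that leaves Equation (6) unchanged. Practical detectors usually add a background class to the K object classes; this sketch omits it.

```python
import numpy as np

CLASSES = ["damaged roof", "damaged wall", "flood damage", "structural damage"]


def softmax(z):
    """Equation (6): exponentiate and normalize so the K outputs sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shifting by the max does not change the result
    return e / e.sum()


def classification_loss(z, true_index):
    """Cross-entropy of the softmax probabilities against a one-hot ground-truth label."""
    return float(-np.log(softmax(z)[true_index]))


def localization_mse(true_box, pred_box):
    """Equation (7): mean squared error between ground-truth and predicted coordinates."""
    true_box = np.asarray(true_box, dtype=float)
    pred_box = np.asarray(pred_box, dtype=float)
    return float(np.mean((true_box - pred_box) ** 2))


logits = [2.0, 0.5, 1.0, -1.0]  # hypothetical scores for the four damage classes
print(np.round(softmax(logits), 3))
print(classification_loss(logits, CLASSES.index("damaged roof")))
# Hypothetical normalized boxes as (xmin, ymin, xmax, ymax).
print(localization_mse([0.10, 0.20, 0.50, 0.60], [0.12, 0.18, 0.50, 0.65]))
```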
The third type of loss, regularization loss, aims to reduce overfitting by penalizing large weight values in each layer of the network. The result is a constrained range of weight values, which is intended to reduce the model's capacity to memorize the training data without sacrificing model performance. Regularization is formulated in two distinct ways, L1 and L2 regularization, which are often used in combination. L1 and L2 are shown in Equation (8), with weight values w_i, the total number of weights in a given layer n, and the regularization hyperparameter λ:

$$L_1:\ \text{Loss} + \lambda\sum_{i=1}^{n}\left|w_i\right|, \qquad L_2:\ \text{Loss} + \lambda\sum_{i=1}^{n} w_i^{2} \tag{8}$$

where Loss denotes the unregularized loss. The second term of L1 is a λ-scaled sum of the magnitudes of the weights, and the second term of L2 is a λ-scaled sum of the squared weights. Finally, the total loss is used as a general metric for evaluating training performance. It is a weighted sum of the classification, localization, and regularization losses computed by the model. The weights for the classification and localization losses were both set to 1.0, while the weight for the regularization loss was set to a much smaller value; this was the standard configuration for training all three damage detection models. It should be added that there are two alternative head structures that can be trained to evaluate the losses above: the convolution head and the fully connected head. The former is more appropriate for, and yields better results in, the classification task, while the latter is more advantageous for bounding box regression [35].
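The regularization penalties and the weighted total loss can be sketched as follows. The weight vector, λ, the per-step loss values, and the regularization loss weight of 0.1 are hypothetical values chosen for illustration; the text states only that the classification and localization weights were 1.0 and that the regularization weight was much smaller.

```python
import numpy as np


def l1_penalty(weights, lam):
    """Second term of L1 in Equation (8): lambda * sum(|w_i|)."""
    return lam * float(np.sum(np.abs(weights)))


def l2_penalty(weights, lam):
    """Second term of L2 in Equation (8): lambda * sum(w_i^2)."""
    return lam * float(np.sum(np.square(weights)))


def total_loss(cls_loss, loc_loss, reg_loss, w_cls=1.0, w_loc=1.0, w_reg=1.0):
    """Weighted sum of the classification, localization, and regularization losses."""
    return w_cls * cls_loss + w_loc * loc_loss + w_reg * reg_loss


layer_weights = np.array([0.5, -0.25, 0.1])  # hypothetical weights of one layer
lam = 1e-3                                   # hypothetical regularization strength
print(l1_penalty(layer_weights, lam), l2_penalty(layer_weights, lam))

# Hypothetical per-step losses: classification and localization weighted 1.0, regularization much less.
print(total_loss(cls_loss=0.2, loc_loss=0.1, reg_loss=0.3, w_reg=0.1))
```

Because L2 squares each weight, it shrinks large weights much more strongly than small ones, whereas L1 applies the same per-unit penalty regardless of magnitude.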

Model Training
We present the transfer learning model training and validation metrics in this section. Validation occurs during training to evaluate the model's predictions on the validation dataset, which contains images the model has not encountered during training. This gives an objective estimate of the model's accuracy and loss to compare with the training accuracy and training loss. Figure 3 shows the training metrics for flood damage classification using the cross-entropy loss and accuracy defined in Equations (1) and (2), respectively. Figure 3a shows that the training loss and validation loss of the ResNet model both converge to approximately 0.5, indicating that the model does not overfit. In contrast, the training loss of the MobileNet model is around 0.05, while its validation loss is similar to that of ResNet, about 0.5; this gap between training and validation loss suggests some degree of overfitting. Figure 3b shows the training and validation accuracy of the two base models. Accuracy measures the ratio of correct predictions (true floods predictions plus true non-floods predictions) to the total number of predictions. The accuracy of the ResNet model converges to a value between 0.7 and 0.75. The accuracy of MobileNet differs slightly between training and validation: the training accuracy is close to 0.975, while the validation accuracy is about 0.9. Overall, the comparison shows that flood damage classification with MobileNet has a validation loss similar to that of ResNet but better accuracy. Figure 4 shows the training metrics for hurricane damage detection, which use the cross-entropy, MSE, L1, and L2 loss functions in Equations (1), (7), and (8). Figure 4a shows that the training classification loss of EfficientNet converges to approximately 0.3, that of ResNet to approximately 0.18, and that of MobileNet to the lowest value, approximately 0.05.
All three models reach a classification loss of at most 0.3, which is generally accepted as sufficient for concluding model training. However, ResNet experiences a sharp spike in classification loss between 0 and 500 epochs. This can be attributed to the early stage of training, when the features extracted by the network have not yet been significantly related to the image features of the four classes in our model. Figure 4b shows that ResNet converges to a training localization loss of approximately 0.8, while MobileNet and EfficientNet both converge to approximately 0.01. EfficientNet also follows a distinctive convergence path for this loss: it begins training with a localization loss of approximately 0.05 and finishes at 0.01, so its loss curve is nearly flat relative to those of ResNet and MobileNet. This result can be attributed to the BiFPN structure of the EfficientDet model [27], which optimizes the accuracy of predictions, specifically bounding box regression in this case. Figure 4c shows that the regularization loss of EfficientNet converges to approximately 0.08, that of ResNet to approximately 0.2, and that of MobileNet to the largest value, approximately 0.56. The relatively high value for MobileNet can be attributed to less effective modeling of the regularization loss by the second term in both L1 and L2 of Equation (8); this term likely contributes a larger value for the MobileNet layer weights than for the EfficientNet and ResNet layer weights. Additionally, the MobileNet layer architecture lacks the multiple standard two-dimensional convolutions implemented in the ResNet and EfficientNet layer architectures. Adding supplemental convolution layers would likely further reduce overfitting in each model, and the regularization loss would decrease accordingly.
Figure 4d shows that the total loss of ResNet and EfficientNet converges to approximately 0.4, while that of MobileNet converges to approximately 0.6. The higher total loss of MobileNet can be attributed to its regularization loss, which converges to a relatively large value compared with its localization and classification losses. Thus, the considerably higher regularization loss of MobileNet inflates its total loss, despite the model's considerably lower classification and localization losses.

Damage Classification
After training the transfer learning models with the ResNet and MobileNet base models on our in-house flood damage dataset, we tested the newly trained models on images they did not see during training. The test dataset contains 92 floods images and 107 non-floods images, so the two classes are slightly imbalanced. The same set of floods and non-floods images was used to test both the ResNet-based and the MobileNet-based models. In this study, the floods class is the positive class and the non-floods class is the negative class, resulting in four possible predictions: a true positive (a floods image predicted as floods), a false positive (a non-floods image predicted as floods), a true negative (a non-floods image predicted as non-floods), and a false negative (a floods image predicted as non-floods). Figure 5 shows the confusion matrices for ResNet and MobileNet. Each row in a confusion matrix represents a true label (i.e., an actual class), while each column represents a predicted label. In this study, the first row and first column correspond to non-floods, and the second row and second column correspond to floods. Diagonal elements count predictions that match the true label, while off-diagonal elements are those mislabeled by the classifier. The values in each element are normalized by the total number of images in the corresponding class, so each row sums to one. A perfect classifier would have only true positives (lower right) and true negatives (top left). Figure 5 shows that both ResNet and MobileNet classify most non-floods and floods images correctly, with the larger percentages along the diagonal. An accurate prediction means that a non-floods image was predicted as non-floods, or a floods image was predicted as floods. The prediction accuracy of ResNet is about 76%, and that of MobileNet is about 87%. For both classifiers, the true positive rate (85% for ResNet and 89% for MobileNet) is higher than the true negative rate (66% for ResNet and 85% for MobileNet).
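The row normalization and the overall accuracy described above can be sketched in a few lines of Python with NumPy. The label indices follow the convention stated in the text (index 0 = non-floods, index 1 = floods); the predictions in the example are hypothetical. Note that, with imbalanced classes, the overall accuracy is the class-size-weighted average of the diagonal rates, not their simple mean.

```python
import numpy as np

NON_FLOODS, FLOODS = 0, 1  # row/column order used in Figure 5

def confusion_matrix(y_true, y_pred, n_classes=2):
    """cm[i, j] counts images whose true class is i and predicted class is j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def row_normalize(cm):
    """Divide each row by its class total so that each row sums to 1."""
    return cm / cm.sum(axis=1, keepdims=True)

def overall_accuracy(cm):
    """Fraction of all test images that lie on the diagonal."""
    return np.trace(cm) / cm.sum()

# Hypothetical test set: 4 non-floods and 2 floods images.
y_true = [NON_FLOODS] * 4 + [FLOODS] * 2
y_pred = [NON_FLOODS, NON_FLOODS, FLOODS, FLOODS, FLOODS, FLOODS]

cm = confusion_matrix(y_true, y_pred)
rates = row_normalize(cm)
print(rates)                     # diagonal: true negative rate, true positive rate
print(overall_accuracy(cm))      # weighted by class size
print(np.mean(np.diag(rates)))   # simple mean of the two rates (differs here)
```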
A further comparison shows that the transfer learning model using MobileNet outperforms the one using ResNet for both true positive and true negative predictions. Specifically, the true positive rate of MobileNet is about 89%, compared with about 85% for ResNet, and the true negative rate of MobileNet is about 85%, compared with about 66% for ResNet. Classification performance is further evaluated using precision, recall, and F1 score, as defined in Equations (3)-(5), respectively; the results are summarized in Table 3. Precision measures how reliable the positive (i.e., floods) predictions are: it divides the number of true positive predictions by the sum of true positive and false positive predictions. The precision values for ResNet and MobileNet are 0.75 and 0.87, respectively, indicating that MobileNet's floods predictions are more precise than ResNet's by about 0.12 (12 percentage points). The second metric, recall, measures the true positive rate (i.e., the fraction of floods images that are correctly detected by the classifier). The recall of MobileNet is about 4 percentage points higher than that of ResNet. The last metric examined in this study is the F1 score, the harmonic mean of precision and recall, as defined in Equation (5); consequently, a classifier obtains a high F1 score only if both precision and recall are high. MobileNet obtains an F1 score of 0.88, about 0.09 (9 percentage points) higher than that of the ResNet-based classifier. In short, both the confusion matrix comparison in Figure 5 and the metric comparison in Table 3 show that the flood damage classification models developed through transfer learning are accurate, and that the classifier using MobileNet as the base model performs better than the one developed on the basis of ResNet.
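A minimal sketch of the three metrics for the floods (positive) class, computed from a 2 × 2 count matrix laid out as in Figure 5 (rows are true labels, columns are predictions, ordered non-floods then floods). The counts are hypothetical and are not the study's results; the zero-division guards are a design choice of the sketch.

```python
def binary_metrics(cm):
    """Precision, recall, and F1 for the floods (positive) class.

    cm is a 2x2 count matrix with rows = true labels and columns = predictions,
    ordered (non-floods, floods).
    """
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts (not the study's results).
cm = [[8, 2],   # non-floods: 8 true negatives, 2 false positives
      [1, 9]]   # floods:     1 false negative, 9 true positives
precision, recall, f1 = binary_metrics(cm)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it always lies at or below the arithmetic mean of precision and recall and is pulled toward the smaller of the two.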

Damage Detection
In the following sections, we compare the predictions of each model on a set of four test images that the models had not seen previously, each containing one of the four damage classes. The scores in Tables 4-7 are the confidence (probability) scores associated with each predicted type of damage. The model assigns a confidence score to each bounding box prediction as a measure of how likely it is that the detected object belongs to the predicted class. In other words, the confidence score reflects how well the model has isolated a damaged area within an image and identified that damage using its trained classifiers, and higher confidence scores are generally associated with more accurate predictions of damaged areas. The top three confidence scores were taken from each inference run on the corresponding image, and the bounding box with the top confidence score is the one displayed on the images in Figures 6-9.
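The selection of the top three detections per image can be sketched as follows. The sketch assumes each inference run returns a flat list of (score, class id, box) detections; the class-id mapping, the box format (ymin, xmin, ymax, xmax in normalized coordinates), and the example scores are hypothetical and are not taken from Tables 4-7.

```python
import heapq
from typing import List, NamedTuple, Tuple

# Hypothetical label map; the ids used in the actual training configuration are not given here.
CLASS_NAMES = {
    1: "damaged roof",
    2: "damaged wall",
    3: "flood damage",
    4: "structural damage",
}

class Detection(NamedTuple):
    score: float                            # confidence in [0, 1]
    class_id: int                           # key into CLASS_NAMES
    box: Tuple[float, float, float, float]  # assumed (ymin, xmin, ymax, xmax)

def top_k(detections: List[Detection], k: int = 3) -> List[Detection]:
    """Return the k highest-confidence detections, best first."""
    return heapq.nlargest(k, detections, key=lambda d: d.score)

def as_table_rows(detections: List[Detection]) -> List[Tuple[str, float]]:
    """(class name, confidence in percent) pairs, in the style of the tables."""
    return [(CLASS_NAMES[d.class_id], round(100 * d.score, 2)) for d in detections]

# Hypothetical raw output of one inference run.
raw = [
    Detection(0.42, 4, (0.10, 0.20, 0.60, 0.70)),
    Detection(0.91, 1, (0.05, 0.10, 0.40, 0.90)),
    Detection(0.10, 3, (0.70, 0.00, 1.00, 1.00)),
    Detection(0.55, 2, (0.30, 0.15, 0.80, 0.50)),
]
best = top_k(raw)
print(as_table_rows(best))
print("box drawn on the figure:", best[0].box)
```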

Damaged Roof Comparison
In Table 4, the top predictions for MobileNet and EfficientNet correctly classify the object as a damaged roof, and MobileNet achieves the higher confidence score of 90.93% for that class. However, the bounding box predicted by EfficientNet in Figure 6c encompasses the entire damaged roof structure more accurately than the bounding box predicted by MobileNet in Figure 6b. In contrast, the top two predictions made by ResNet are incorrectly classified as structural damage, and the bounding box in Figure 6a is also misplaced, owing to the different image features associated with the structural damage class. The same applies to ResNet's third prediction (flood damage) and EfficientNet's third prediction (structural damage).
Table 4.

Damaged Wall Comparison
In Table 5, the top predictions for ResNet, MobileNet, and EfficientNet all correctly classify the object as a damaged wall, and MobileNet again achieves the highest confidence score, 97.58%, for that class. The bounding box predicted by MobileNet in Figure 7b encompasses the entire damaged wall structure more accurately than the bounding boxes predicted by ResNet in Figure 7a and by EfficientNet in Figure 7c. However, the second and third predictions made by ResNet and MobileNet are incorrectly classified as either structural damage or a damaged roof, likely because the damaged wall in this image shares features that the models have also learned to extract when predicting the structural damage and damaged roof classes.
Table 5.

Flood Damage Comparison
In Table 6, the top predictions for ResNet, MobileNet, and EfficientNet all correctly classify the object as flood damage, and MobileNet again achieves the highest confidence score, 97.45%, for that class. The bounding box predicted by MobileNet in Figure 8b encompasses the entire flood damage area more accurately than the bounding boxes predicted by ResNet in Figure 8a and by EfficientNet in Figure 8c. However, the second and third predictions made by ResNet and MobileNet are incorrectly classified as either a damaged wall or a damaged roof, likely because the building in the background of the image has features that the models would extract to predict those classes. The same applies to EfficientNet's second prediction (structural damage).
Table 6.

Structural Damage Comparison
In Table 7, the top predictions for ResNet, MobileNet, and EfficientNet all correctly classify the object as structural damage, but MobileNet achieves the lowest confidence score, 42.85%, for that class. ResNet and EfficientNet achieve similar confidence scores of 68.11% and 67.96%, respectively, for their top predictions. The bounding box predicted by MobileNet in Figure 9b encompasses the entire structural damage area more accurately than the bounding boxes predicted by ResNet in Figure 9a and by EfficientNet in Figure 9c. However, the second and third predictions made by ResNet and MobileNet are incorrectly classified as either a damaged wall or a damaged roof, very likely because the structural damage in this image shares features that the models have also learned to extract when predicting the damaged roof and damaged wall classes. The same applies to EfficientNet's third prediction (flood damage).
Table 7.

Overall Performance of the Damage Detection Models
The inference results for each type of damage show that the most accurate damage detector is the model trained on the MobileNet architecture. The MobileNet model most consistently achieved the highest confidence scores across the damage types, with the highest overall confidence score of 97.58% when predicting the damaged wall in Figure 7b. This is likely a result of the distinctive structure of the MobileNet architecture: its depthwise separable convolutions are designed to reduce computation and model size compared with standard convolutions. Additionally, the width and resolution multipliers incorporated into the architecture likely give the MobileNet model a significant advantage in scaling its layers to a more tailored fit for each object class. As a result, once a damaged area is located in the image, the classifiers corresponding to the four object classes predict the type of damage with high accuracy.
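The computational saving from depthwise separable convolutions, and the effect of MobileNet's width multiplier α and resolution multiplier ρ, can be illustrated with the multiply-add counts given in the original MobileNet paper. The sketch counts multiply-adds for a single layer with a D_K × D_K kernel, M input channels, N output channels, and a D_F × D_F feature map; the layer sizes in the example are arbitrary and do not correspond to a specific layer of our models.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard D_K x D_K convolution, M -> N channels, D_F x D_F output."""
    return d_k * d_k * m * n * d_f * d_f

def depthwise_separable_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise (D_K x D_K per channel) + pointwise (1 x 1) convolution.

    alpha: width multiplier, thins the input and output channels.
    rho:   resolution multiplier, shrinks the feature map.
    """
    m_a, n_a, d_r = alpha * m, alpha * n, rho * d_f
    depthwise = d_k * d_k * m_a * d_r * d_r  # one spatial filter per input channel
    pointwise = m_a * n_a * d_r * d_r        # 1 x 1 convolution mixes channels
    return depthwise + pointwise

layer = dict(d_k=3, m=32, n=64, d_f=112)  # arbitrary example layer
ratio = depthwise_separable_cost(**layer) / standard_conv_cost(**layer)
print(f"separable / standard cost: {ratio:.3f}")  # equals 1/N + 1/D_K^2
print(f"with alpha=0.5, rho=0.75: {depthwise_separable_cost(**layer, alpha=0.5, rho=0.75):.3e}")
```

With a 3 × 3 kernel the separable layer costs roughly one-eighth to one-ninth of the standard one, and α and ρ shrink the cost further (α roughly quadratically through the pointwise term, ρ quadratically overall).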
However, the damage detection model with the most consistent classifier was EfficientNet. As Tables 4-7 show, EfficientNet most consistently assigned the correct object label to the predicted damage. Across the top three predictions for each of the four types of damage, EfficientNet misclassified the damage a total of three times, while MobileNet and ResNet misclassified it six and eight times, respectively. This result is likely due to the primary advantage of the EfficientNet architecture: the efficiency of its predictions, which stems from compound scaling. This efficiency is further optimized by the BiFPN mentioned previously. The increased efficiency leads the EfficientNet model to produce more consistently accurate classifications of the damage, but at the cost of some accuracy in predicting the precise location of the damage, which results in generally lower confidence scores for EfficientNet.
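Compound scaling, as introduced in the original EfficientNet paper, scales network depth, width, and input resolution together with a single coefficient φ: depth ∝ α^φ, width ∝ β^φ, and resolution ∝ γ^φ, with α·β²·γ² ≈ 2 so that FLOPS grow by roughly 2^φ. The sketch below uses the base coefficients reported in that paper (α = 1.2, β = 1.1, γ = 1.15). EfficientDet applies its own scaling heuristics to the BiFPN and prediction heads, which are not shown, and the base network sizes in the example are hypothetical.

```python
# Base coefficients reported for the EfficientNet-B0 grid search.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Scale depth (layers), width (channels), and resolution (pixels) jointly by phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution

def flops_multiplier(phi):
    """Approximate FLOPS growth: FLOPS scale with depth * width^2 * resolution^2."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi

# Hypothetical base network: 16 layers, 32 channels, 224 x 224 input.
for phi in range(4):
    d, w, r = compound_scale(phi, base_depth=16, base_width=32, base_resolution=224)
    print(f"phi={phi}: depth~{d:.1f}, width~{w:.1f}, resolution~{r:.0f}, "
          f"FLOPS x{flops_multiplier(phi):.2f}")
```

Real models round these values to whole layers, channel counts, and pixel sizes; the sketch keeps them continuous to show the joint growth.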
Because our in-house damage detection dataset was not part of the original data used to pre-train the three AI models (ResNet, MobileNet, and EfficientNet), our models are subject to a certain level of negative transfer [36,37], which reduces the accuracy of each model's predictions. Lower confidence scores can be attributed to this negative impact of transferring the features learned by each pre-trained model to the target domain of our dataset.

Conclusions
This study has developed transfer-learning-based artificial intelligence models to assess hurricane damage to buildings in the southeastern U.S. We developed an in-house building damage image dataset and divided it into two subsets for (i) damage classification (i.e., floods vs. non-floods) and (ii) damaged object detection covering damaged roofs, damaged walls, flood damage, and structural damage. We developed transfer learning workflows that leverage feature extraction from three advanced neural network models in computer vision (i.e., EfficientNet, ResNet, and MobileNet) and successfully retrained these models for building damage assessment. Finally, we evaluated the classification and object detection performance of the different models. Our major findings and contributions include the following:

• The transfer-learning-based flood damage classification models were developed using ResNet and MobileNet. A binary classification was carried out to distinguish floods images from non-floods images, and several methods were used to evaluate the performance of the transfer learning models. The confusion matrix comparison showed that both ResNet and MobileNet correctly classify floods and non-floods images with relatively high accuracy: the overall accuracy is about 76% using ResNet and 87% using MobileNet. Three metrics (precision, recall, and F1 score) were further calculated and compared between the two models, and the results obtained using MobileNet as the base model are consistently better than those using ResNet. For example, the F1 score, the harmonic mean of precision and recall, is about 0.88 using MobileNet, about 0.09 higher than the F1 score using ResNet (0.79). Overall, this study showed that hurricane flood damage to buildings can be correctly classified using artificial intelligence models developed by applying transfer learning to advanced machine learning models in computer vision.
• The transfer-learning-based damage detection models were developed using ResNet, MobileNet, and EfficientNet. Four damage types were captured in four object classes: damaged roof, damaged wall, flood damage, and structural damage. Two methods were primarily used to evaluate the performance of the transfer learning models for damage detection. First, the top three confidence scores and associated object classes were tabulated for each model, showing that each model's top prediction identified the correct object class in at least three of the four test images; the MobileNet model achieved the highest confidence score for three of the four damage types and proved to be the most accurate model in detecting hurricane damage. Then, the images of each type of damage were displayed with the top bounding box prediction for each model. Likewise, MobileNet achieved the most accurate localization of the detected damage in three of the four test images. Therefore, this study showed that various types of hurricane damage can be accurately detected using artificial intelligence models developed through transfer learning, further advancing machine learning applications in computer vision.
By creating our in-house damage assessment framework, we showed that a substantial level of accuracy for damage classification can be achieved by applying transfer learning techniques to a pre-trained neural network. Given the relatively small but diverse set of images used as input data, our classification model displayed a high degree of versatility, suggesting that it could be used across a spectrum of hurricane and other coastal hazard events. The object detection results highlight the models' ability to identify damaged areas of buildings and structures in test data within seconds, which is necessary for more efficient damage assessment.
Our work can be extended through further research into applying transfer learning to build classification and object detection models trained on post-disaster imagery. Using these machine learning models would significantly reduce the time required for damage assessment, so relief plans created in the wake of a future coastal hazard could save hours to days in determining the total damage incurred. As a result, impacted coastal communities would be able to receive more reliable and prompt relief through the direct implementation of artificial intelligence technology such as our classification and object detection models.
Author Contributions: L.C.: methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft, writing-review and editing, and visualization. Z.W.: conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing-original draft, writing-review and editing, and visualization. All authors have read and agreed to the published version of the manuscript.
Funding: This research was partly supported by the research momentum fund and the faculty start-up fund provided by the University of North Carolina Wilmington.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.