Identifying Collapsed Buildings Using Post-Earthquake Satellite Imagery and Convolutional Neural Networks: A Case Study of the 2010 Haiti Earthquake

Earthquakes are among the most devastating natural disasters that threaten human life. Retrieving building damage status is vital for planning rescue and reconstruction after an earthquake. When completely collapsed buildings are far fewer than intact or less-affected buildings (e.g., after the 2010 Haiti earthquake), it is difficult for a classifier to learn the minority class because of the class imbalance problem. In this study, a convolutional neural network (CNN) was used to identify collapsed buildings from post-event satellite imagery with the proposed workflow. Producer accuracy (PA), user accuracy (UA), overall accuracy (OA), and Kappa were used as evaluation metrics. To overcome the imbalance problem, random over-sampling, random under-sampling, and cost-sensitive methods were tested on two selected regions, test A and test B. The results demonstrated that building collapse information can be retrieved from post-event imagery. SqueezeNet performed well in classifying collapsed and non-collapsed buildings, achieving an average OA of 78.6% for the two test regions. After balancing, the average Kappa value improved from 41.6% to 44.8% with the cost-sensitive approach. Moreover, the cost-sensitive method showed better performance in discriminating collapsed buildings, with a PA value of 51.2% for test A and 61.1% for test B. Therefore, a suitable balancing method should be considered when retrieving the distribution of collapsed buildings from an imbalanced dataset.


Introduction
With advances in sensor and space technology, remote sensing can obtain detailed temporal and spatial information about a target area, and it has been widely used to detect, identify, and monitor the effects of natural disasters [1,2]. It has been adopted in various post-earthquake activities because remotely sensed images usually require minimal fieldwork, which is especially important for earthquake-affected areas that are difficult to access [3]. Building damage information is key to post-earthquake rescue and reconstruction. It has been demonstrated that relatively accurate building damage information can be derived from remotely sensed data [4]. High-resolution remote sensing imagery can be used to generate building-by-building damage maps by interpreting the damage state of each building [5][6][7].
Building damage can be detected using only post-event data thanks to the emergence of very high resolution (VHR) remote sensing imagery, which provides detailed textural and spatial features of the damaged targets [8]. A wide range of remote sensing techniques is applicable to post-earthquake damage evaluation, including optical satellite imagery, synthetic aperture radar (SAR), and light detection and ranging (LiDAR). With the rapid improvement in the spatial resolution of satellite optical sensors (such as WorldView-4, which has a ground sample distance (GSD) of 0.31 m in the panchromatic band), optical data are a promising means of detecting earthquake damage. Visual interpretation, edges and textures, and spectral properties have been used to detect building damage when post-event optical data are available. The distribution of damaged buildings was visually delineated using post-event optical images to support early emergency and rescue planning [9]. A semi-automated approach was applied to identify damaged regions using spectral and textural information from an optical image acquired after the earthquake [10]. Building damage was detected using watershed segmentation of post-event aerial images, assuming that building shape information is available as a stored geographic information system (GIS) layer [11]. More refined automated damage detection methods using optical data were discussed in [12,13]. The advantage of using SAR data to assess damaged buildings is its independence from sun illumination and relative insensitivity to atmospheric conditions [14]. Compared with combined pre- and post-event data [15], single post-event PolSAR data allow quicker and more convenient building damage assessment [16]. The potential of post-event SAR images has been demonstrated for building damage assessment after earthquakes [17][18][19]. LiDAR can provide a three-dimensional visualization of the damaged area, which is useful for automatically generating a damage map [20]. In addition, LiDAR can operate day and night, even in adverse conditions such as poor illumination or through clouds and smoke. A number of studies have used post-event LiDAR data to detect building damage [20][21][22][23]. There are also studies that combine optical imagery with SAR imagery. Different damage types were analyzed in [24] using a post-event TerraSAR-X Spot-Light VHR SAR image, with optical images used to assist the analysis. In [25], a validation map was created from optical imagery to validate and analyze the results, which demonstrated that SAR data have potential for application in urban disaster monitoring and assessment.
Automatic and visual methods are common approaches to generating building damage maps from satellite or aerial imagery [26]. However, the visual method relies on manual sampling and is time-consuming, which is a disadvantage for rescue planning [27]. By contrast, automatic methods can derive change information from satellite images efficiently. Once a classification model has been created by supervised learning, it can predict the class of other unclassified instances automatically [28]. Nevertheless, supervised methods still require time and effort to prepare training samples, which is a disadvantage for rapid damage assessment. Numerous scholars have applied machine learning methods to detect damaged buildings using post-event datasets, with fruitful results. In [29], features were extracted using morphological profiles and texture statistics, and collapsed buildings were then classified using a support vector machine (SVM). Collapsed buildings were detected from post-event LiDAR data using methods based on object-based image analysis (OBIA) and SVM [30]. A support vector selection and adaptation (SVSA) method was applied to two small regions and the entire city of Port-au-Prince (Haiti) to assess the damage using post-event satellite images [31]. A variety of algorithms and parameters were tested on post-event aerial imagery of the earthquake in Christchurch, New Zealand, and the results showed that object-based approaches can produce better results than pixel-based approaches in earthquake damage detection using remotely sensed images [32]. Random forest (RF), SVM, and K-nearest neighbor (K-NN) classifiers were applied to classify collapsed and standing buildings using a post-event SAR image and a building footprint map [33].
Convolutional neural networks (CNNs) have become a hot research topic in the fields of image recognition and speech analysis in recent years. A CNN is a type of neural network architecture that can be used to model spatial and temporal correlations [34][35][36]. Its weight-sharing characteristic reduces the complexity of the network model and the number of weights, and makes it more similar to a biological neural network. It is highly invariant to translation, scaling, and tilting. There are many kinds of CNNs, such as AlexNet [37], VGGNet [38], and GoogleNet [39]. SqueezeNet [40,41], developed by Iandola et al., achieves AlexNet-level accuracy on ImageNet with 50 times fewer parameters. However, there are still few studies that use CNNs to obtain damage information in earthquake-affected areas. In a recent study, deep learning was explored for detecting earthquake-induced building damage using oblique aerial images [42]. It demonstrated that CNN features performed better than 3D point cloud features using a multiple-kernel-learning approach for detecting damaged regions in VHR images. In addition, highly imbalanced class distributions pose a significant challenge for remotely sensed imagery analysis [43,44]. To handle imbalanced classification problems, various methodologies have been proposed, such as resampling, modifying the classifier optimization problem, or introducing a new optimization task on top of the classifier [45]. A number of studies have dealt with imbalanced datasets acquired from remote sensing images. Infinitely imbalanced logistic regression (IILR) was proposed to deal with remote sensing datasets [46]. Oil spills were detected by applying the one-sided selection (OSS) method to satellite radar images [47]. To deal with the imbalance problem in convolutional neural networks, seven approaches were compared in [48], including random over-sampling, random under-sampling, and thresholding with prior class probabilities.
The objective of this study was to explore the performance of SqueezeNet in identifying collapsed buildings using single post-earthquake VHR satellite data. Completely collapsed buildings can be readily identified in VHR imagery from disintegrated roof structures and associated texture features, while lower damage grades are much harder to map, as such damage is largely expressed along the facade, which is not visible in such imagery. The dataset obtained after the 2010 Haiti earthquake was used in this study. As the distribution of building damage grades was imbalanced, three balancing methods were adopted to improve the accuracy of identifying collapsed buildings: random over-sampling, random under-sampling, and a cost-sensitive approach. The rest of this paper is organized as follows. Section 2 describes the study area. Section 3 briefly introduces the basic concepts of convolutional neural networks, data balancing methods, and the metrics used to evaluate the performance. The workflow of using SqueezeNet to classify collapsed buildings caused by the earthquake is also included. Section 4 presents the experimental results and discusses the methodology used in the experiments. Finally, conclusions are drawn in Section 5.

Input Data
A large number of buildings and infrastructure facilities were damaged, and some of them completely collapsed, after an earthquake struck Haiti on 12 January 2010. It was reported that more than 300,000 people lost their lives, and about 105,000 houses were completely destroyed in the Haiti earthquake [49]. The location of the study area is shown in Figure 1. Post-earthquake satellite images were captured on 15 January 2010 by the QuickBird satellite. The data were obtained via the DigitalGlobe open data program with a resampled spatial resolution of 0.5 m, and the NIR band was not included. The building damage information was visually interpreted from high-resolution satellite images and aerial photos by UNITAR/UNOSAT [50]. The building damage level was classified into five categories based on the EMS-98 [51]: G5: Destruction; G4: Very heavy damage; G3: Substantial damage; G2: Moderate damage; and G1: Negligible damage. Selected examples of damaged buildings are shown in Figure 2. Building footprints were manually extracted in the study area using ArcGIS 10.4. To train and validate the proposed method for collapsed building identification, the study area was further separated into three regions: train, test A, and test B. In this study, the numbers of collapsed and non-collapsed buildings are 613 and 1857 for the training region, 129 and 454 for test A, and 322 and 553 for test B.

Methodology
In this study, a CNN-based approach was proposed for building collapse assessment after the earthquake. The workflow is shown in Figure 3. To make use of VHR satellite imagery, a decomposition method should be used to split the large image into small processing patches [52]. Small building patches were extracted from the satellite images according to the building boundary polygons. However, building patches differ in width and length, which makes them unsuitable as direct inputs for CNNs. We adopted a zero-padding operation to make the building patches uniform in size, and discarded building patches whose width or length was smaller than 10 pixels or larger than 96 pixels. The damage grades were reclassified into binary categories, collapsed (G5) and non-collapsed (G1-G4), which were used as the labels of the corresponding building patches for further analysis. Non-collapsed buildings (2864) outnumbered collapsed buildings (1064), which caused an imbalance problem that may affect the classification results. Three balancing methods were considered and compared. Finally, a building collapse map can be derived with the proposed workflow.
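The pre-processing described above (size filtering, zero-padding to 96 × 96, and binary relabeling) can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the patch is synthetic, and centring the building inside the padded patch is an assumption, since the text does not state where the zeros are placed.

```python
import numpy as np

PATCH_SIZE = 96   # target width/length in pixels (from the text)
MIN_SIZE = 10     # patches with a side outside [MIN_SIZE, PATCH_SIZE] are discarded


def binary_label(grade):
    """Map an EMS-98 damage grade (1-5) to 1 = collapsed (G5) or 0 = non-collapsed (G1-G4)."""
    return 1 if grade == 5 else 0


def pad_patch(patch):
    """Zero-pad an (H, W, 3) patch to (96, 96, 3), or return None if it is filtered out.

    Centring the building inside the padded patch is an assumption of this sketch.
    """
    h, w = patch.shape[:2]
    if min(h, w) < MIN_SIZE or max(h, w) > PATCH_SIZE:
        return None
    top = (PATCH_SIZE - h) // 2
    left = (PATCH_SIZE - w) // 2
    pad = ((top, PATCH_SIZE - h - top), (left, PATCH_SIZE - w - left), (0, 0))
    return np.pad(patch, pad, mode="constant", constant_values=0)


patch = np.ones((30, 42, 3), dtype=np.uint8)   # synthetic stand-in for a building patch
padded = pad_patch(patch)
print(padded.shape)                      # (96, 96, 3)
print(pad_patch(np.ones((8, 20, 3))))    # None: one side is shorter than 10 pixels
```

A 30 × 42 patch keeps its 30 × 42 × 3 original values and gains zeros everywhere else, so small buildings end up with a much larger share of padded pixels than large ones.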

Convolutional Neural Networks (CNNs)
CNNs can be viewed as multilayer neural networks in which shift and distortion invariance is ensured by their special architecture: local receptive fields, shared weights and, sometimes, spatial or temporal subsampling. A typical CNN structure normally contains convolution, pooling, and activation function layers, as shown in Figure 4. The first layer represents the input data, while the second layer contains the feature maps produced by the convolution process. The third layer contains the activation maps produced by the activation function. The fourth layer is the pooled feature map produced by the pooling process. Red squares in the figure represent filters, and each subsequent square is the output of the preceding one after the corresponding operation (convolution, ReLU activation, and pooling). The convolutional layer is meant to extract features, and filter weights can be shared across all pixels.
The spatial variation and correlation will be reduced in convolutional layers. There are many kinds of activation functions, such as sigmoid, tanh, and the rectified linear unit (ReLU). The ReLU activation function helps to avoid the vanishing gradient problem, and requires less computation than tanh and sigmoid, as it involves simpler mathematical operations [37]. In the nonlinearity layer, ReLU is applied to each component of a feature map, as shown in Equation (1), in which x is the input to the activation layer. It is a half-wave rectifier function, which can significantly accelerate the training phase and help prevent overfitting.
$$f(x) = \max(0, x) \quad (1)$$
The main function of the pooling layer is to compress the feature maps and reduce their dimensionality [53]. Common methods take the maximum or the average of the input values. The pooling layer can be viewed as a down-sampling of the convolutional feature map [54]. In max pooling, a max operation is applied over a small region G of each feature map.
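The two operations described above can be illustrated on a toy feature map. The sketch below is plain Python for readability; the 4 × 4 values are made up, and the non-overlapping 2 × 2 pooling region is an assumption chosen for illustration, not a setting taken from the study.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every element of a 2-D feature map."""
    return [[max(0.0, x) for x in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size region."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


# Made-up output of a convolution, including negative responses.
conv_output = [
    [1.0, -2.0, 0.5, 3.0],
    [-1.0, 4.0, -0.5, 2.0],
    [0.0, -3.0, 1.5, -1.0],
    [2.0, 1.0, -2.5, 0.5],
]
activated = relu(conv_output)
pooled = max_pool(activated)
print(pooled)   # [[4.0, 3.0], [2.0, 1.5]]
```

The 4 × 4 map is reduced to 2 × 2, which is the down-sampling effect attributed to the pooling layer.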

SqueezeNet
A small CNN architecture named SqueezeNet was proposed in 2016. Compared to AlexNet, SqueezeNet achieves a similar classification accuracy with 50× fewer parameters, which was demonstrated on the ImageNet database [40]. The design goal of SqueezeNet is not to achieve the best possible recognition accuracy, but to simplify the network while retaining the recognition accuracy of a well-known reference network. There are three main strategies in the SqueezeNet architecture. For the first strategy, a Fire module was proposed based on the use of 1 × 1 filters instead of 3 × 3 filters, as a 1 × 1 filter has 9× fewer parameters than a 3 × 3 filter. For the second strategy, a squeeze layer was applied to decrease the number of input channels to the 3 × 3 filters, instead of the 11 × 11 filters adopted by AlexNet. The first and second strategies are designed to reduce the number of parameters in a CNN while maintaining similar inference accuracy. The last strategy is to obtain large activation maps in the convolution layers in order to maximize the accuracy. For SqueezeNet, the stride of the first convolutional layer is 2 × 2, instead of 4 × 4 for AlexNet. If early layers in the network have small strides, the following layers will have large activation maps.
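Two of the claims above are simple arithmetic and can be checked directly: the 9× ratio between 3 × 3 and 1 × 1 filters, and the effect of the stride on the activation-map size. In the sketch below, the channel counts, the 3 × 3 kernel of the first layer, and the absence of padding are assumptions chosen for illustration; only the 96-pixel input and the strides of 2 and 4 come from the text.

```python
def conv_weights(kernel, in_channels, filters):
    """Number of weights (biases excluded) in a kernel x kernel convolutional layer."""
    return kernel * kernel * in_channels * filters


def output_size(input_size, kernel, stride):
    """Side length of the activation map for an unpadded ('valid') convolution."""
    return (input_size - kernel) // stride + 1


# Same channel counts, different kernel sizes: the ratio is 3 * 3 = 9.
print(conv_weights(3, 64, 64) // conv_weights(1, 64, 64))   # 9

# A 96-pixel input and a 3 x 3 kernel (kernel size assumed for illustration).
print(output_size(96, 3, 2))   # 47
print(output_size(96, 3, 4))   # 24
```

Halving the stride roughly doubles the side length of the activation map, so later layers see about four times as many spatial positions.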
The Fire module is the basic building block of the SqueezeNet architecture, as shown in Figure 5. It consists of a squeeze convolution layer with 1 × 1 filters feeding an expand layer with 1 × 1 and 3 × 3 filters. The number of filters per Fire module is gradually increased from the beginning to the end of the network. ReLU [56] serves as the activation function in all Fire modules. The SqueezeNet architecture uses a global-average-pooling layer in place of the fully connected layer, which is easier to interpret and less prone to overfitting than fully connected layers.
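To see why the squeeze layer saves parameters, the parameter count of one Fire module can be compared with that of a plain 3 × 3 convolution producing the same number of output channels. The channel sizes below are hypothetical and are not the values of Table 1; the helper names are likewise illustrative.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a kernel x kernel convolution."""
    return (kernel * kernel * in_channels + 1) * filters


def fire_params(in_channels, squeeze, expand_1x1, expand_3x3):
    """Parameters of one Fire module and its number of output channels."""
    total = (
        conv_params(1, in_channels, squeeze)      # squeeze layer, 1 x 1 filters
        + conv_params(1, squeeze, expand_1x1)     # expand layer, 1 x 1 branch
        + conv_params(3, squeeze, expand_3x3)     # expand layer, 3 x 3 branch
    )
    return total, expand_1x1 + expand_3x3         # the two branches are concatenated


# Hypothetical channel sizes, not the values of Table 1.
params, out_channels = fire_params(in_channels=64, squeeze=16, expand_1x1=64, expand_3x3=64)
print(params, out_channels)      # 11408 128

# The same 64 -> 128 channel mapping with a single plain 3 x 3 convolution.
print(conv_params(3, 64, 128))   # 73856
```

With these illustrative sizes the Fire module needs roughly one sixth of the parameters, because the 3 × 3 filters only see the 16 squeezed channels instead of all 64 input channels.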
The structure of the adopted CNN model is listed in Table 1. The input layer expects building patches with width, length, and band dimensions of (96, 96, 3), and is followed by a convolutional layer with 64 filter kernels and a stride of 2 × 2. Three Fire modules were adopted from SqueezeNet when constructing the CNN model. A max-pooling operation was inserted between the Fire modules. A dropout layer was also added after the Fire modules. Finally, a global-average-pooling layer was used in place of the conventional flatten layer, followed by a softmax layer that classifies whether the input building has collapsed or not. The model has a total of 164,194 parameters, which were trained with the training dataset described above. The CNN model was implemented using Keras 2.1.5 with TensorFlow 1.8 as the backend. Computing was done on Google Cloud Platform with an NVIDIA Tesla K80 GPU and 26 gigabytes of memory.

Data Balancing Methods
There are three main approaches to dealing with class imbalance problems, which can be classified as data-level, algorithm-level, and hybrid methods [57]. Data-level methods mainly use re-sampling to balance the class distribution in the training data. Random over-sampling methods increase the number of samples in the minority class by randomly replicating existing minority samples or generating new ones. In contrast, random under-sampling randomly eliminates majority class instances until the minority and majority classes have the same number of instances. Algorithm-level methods modify existing classification algorithms to improve the sensitivity of the classifier towards minority classes. One of the most popular algorithm-level methods is the cost-sensitive approach [58], which assigns different costs to samples from different classes. In this study, we simply use the class proportion as the loss weight for the different classes. The minority class samples have higher costs, which gives them a greater impact on the weight updates in the neural network [59]. The hybrid method integrates the previously mentioned approaches to improve the performance [60]; it was not considered in this study.
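The three balancing strategies can be sketched on the label counts of the training region (613 collapsed, 1857 non-collapsed). The code below is an illustration, not the implementation used in the study: the function names are hypothetical, over-sampling is done by replication only, and the class weights follow one common convention (inverse class frequency), which is an assumption because the text only says that the class proportion was used as the loss weight.

```python
import random
from collections import Counter


def random_over_sample(labels, rng):
    """Return sample indices in which minority samples are replicated until classes match."""
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, n in counts.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.choices(members, k=target - n)   # draw with replacement
    return indices


def random_under_sample(labels, rng):
    """Return sample indices in which majority samples are dropped until classes match."""
    counts = Counter(labels)
    target = min(counts.values())
    indices = []
    for cls in counts:
        members = [i for i, y in enumerate(labels) if y == cls]
        indices += rng.sample(members, target)          # draw without replacement
    return sorted(indices)


def class_weights(labels):
    """Loss weights inversely proportional to class frequency (assumed convention)."""
    counts = Counter(labels)
    return {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}


# Training-region counts from the text: 613 collapsed (1) and 1857 non-collapsed (0).
labels = [1] * 613 + [0] * 1857
rng = random.Random(0)

over = Counter(labels[i] for i in random_over_sample(labels, rng))
under = Counter(labels[i] for i in random_under_sample(labels, rng))
weights = class_weights(labels)
print(over[0], over[1])      # 1857 1857
print(under[0], under[1])    # 613 613
print(weights)               # the collapsed class receives the larger weight
```

Under-sampling discards about two thirds of the non-collapsed samples, whereas the cost-sensitive variant keeps every sample and only rescales its contribution to the loss. In Keras, a dictionary of this kind can be passed through the `class_weight` argument of `fit`; whether the study did so is not stated.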

Evaluation Metrics
The selected Haiti dataset was separated into the train, test A, and test B regions, as shown in Figure 1. Several metrics are used as the evaluation standards in this study, including producer accuracy (PA), user accuracy (UA), overall accuracy (OA), and Kappa, which are defined on the basis of the confusion matrix (Table 2). OA (Equation (3)) is the percentage of examples correctly classified. OA is often used to measure the performance of learning systems. However, it is not appropriate when the dataset is imbalanced, since it tends to be biased toward the majority class while neglecting the minority class. PA is the probability that a value in a given class was classified correctly. UA is the probability that a value predicted to be in a certain class really belongs to that class; it is the fraction of correctly predicted values among all values predicted to be in the class. The Kappa coefficient of agreement developed by Cohen (1960) is a statistical measure of inter-rater agreement for categorical items. It can be calculated by Equation (4), in which P_0 is the observed proportion of agreement and P_e is the proportion of agreement expected by chance. When dealing with an imbalanced dataset, it is important to pay attention not only to the overall accuracy but also to the corresponding misclassification costs. Thus, Kappa is a better performance measure than OA when facing an imbalanced dataset. Kappa coefficients are interpreted using the guidelines outlined by Landis and Koch [61], who characterized values between 0.01 and 0.20 as slight, between 0.21 and 0.40 as fair, between 0.41 and 0.60 as moderate, between 0.61 and 0.80 as substantial, and between 0.81 and 1.00 as almost perfect.
$$\mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} \quad (3)$$
$$\mathrm{Kappa} = \frac{P_0 - P_e}{1 - P_e} \quad (4)$$
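The four metrics can be computed from a binary confusion matrix as sketched below. The counts used in the example are not taken from the paper's tables: they are back-calculated here from the test A figures reported in the text (129 collapsed and 454 non-collapsed buildings, PA 42.6% and UA 58.5% for the collapsed class), so they should be read as an inferred illustration. The function name is hypothetical.

```python
def evaluate(tp, fn, fp, tn):
    """PA, UA (for the collapsed class), OA and Kappa from a binary confusion matrix."""
    n = tp + fn + fp + tn
    pa = tp / (tp + fn)            # producer accuracy: correct / reference collapsed
    ua = tp / (tp + fp)            # user accuracy: correct / predicted collapsed
    oa = (tp + tn) / n             # observed agreement P_0
    # Chance agreement P_e from the reference and predicted class totals.
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    return pa, ua, oa, kappa


# Counts inferred from the reported test A percentages (an assumption of this sketch).
pa, ua, oa, kappa = evaluate(tp=55, fn=74, fp=39, tn=415)
print(f"PA={pa:.1%} UA={ua:.1%} OA={oa:.1%} Kappa={kappa:.1%}")
# PA=42.6% UA=58.5% OA=80.6% Kappa=37.7%
```

Although four out of five buildings are classified correctly, chance agreement is already about 69% for this class distribution, which is why Kappa ends up far below OA.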

Identifying Collapsed Buildings Using CNNs
The extracted building patches differ in width and length, which vary from several to hundreds of image pixels. The length of the non-collapsed buildings ranges from 10 to 276 pixels, and the width ranges from 9 to 413 pixels. For collapsed buildings, the length falls within the range of 7-194 pixels, and the width within the range of 11-438 pixels. The distributions for collapsed and non-collapsed buildings have similar trends, mainly ranging from 20 to 40 pixels for both width and length, with long tails, as can be seen in Figure 6. Typical CNNs require fixed-size inputs, as pointed out by [62]. Convolutional layers do not require a fixed image size, but a fully connected layer needs fixed-size inputs by definition. Although a global average pooling layer was used instead of a fully connected layer in the network, it is still difficult to implement CNNs with variable-size inputs. In this study, the width and length of building patches were therefore limited to the range from 10 × 10 to 96 × 96 pixels, and building patches with pixel sizes outside this range were filtered out. The retained collapsed and non-collapsed building patches were zero-padded so that all of them have a uniform size of 96 × 96 pixels.
The results of using the CNN model described above to classify collapsed and non-collapsed buildings caused by the Haiti earthquake are given in Table 3. SqueezeNet achieved Kappa values of 37.7% and 45.6% for discriminating collapsed and non-collapsed buildings using the post-earthquake VHR imagery on test A and test B, respectively. The PA values of non-collapsed buildings were very high (>90%) in the two test regions, which means the model performed well in classifying non-collapsed buildings. The low PA values of 42.6% and 50.6% show that collapsed buildings were prone to being misclassified. The UA values for non-collapsed buildings were comparatively higher than those for collapsed buildings. The UA values for collapsed buildings were 58.5% and 78% for test A and test B, indicating that non-collapsed buildings were comparatively more prone to being wrongly identified as collapsed in the test A region. One reason for the low PA values of collapsed buildings is that the classification method tends to favor the majority class when dealing with an imbalanced dataset. The building structures in the study area should also be considered. Concrete buildings with very prominent collapse or damage features, such as totally broken-down roofs, tend to be classified correctly. Steel or wooden frame buildings with metal sheet roofs, where the building physically collapsed without visible deformation or textural change to its roof structure, are hard to classify correctly [63], which decreases the number of correctly classified collapsed buildings. The Kappa values for test A (37.7%) and test B (45.6%) also indicate that SqueezeNet performed better on test B in discriminating collapsed buildings, which could be partly caused by differences in building structures between the two regions. Building structures in test B include concrete structures with flat roofs of varying heights and sizes, wooden or steel frame buildings with corrugated metal sheet roofs, and low-height metal sheet shelters (shanty housing) with very small dwellings [63]. Test A mainly contains densely packed small buildings and informal huts, as relatively poor residents lived there, mostly in makeshift homes, when the devastating earthquake struck Haiti. Such buildings were prone to being misclassified from the imagery; in addition, the extracted patches for small buildings were padded with more zero values in the pre-processing step, which also affects the model's performance. To demonstrate the achieved results, building-by-building evaluation maps are shown in Figure 7.

Performance of Balancing Methods for Identifying Collapsed Buildings
In the training dataset, the numbers of collapsed and non-collapsed buildings were 613 and 1857, respectively. The imbalanced distribution of training labels biases the classifier toward the majority class, so that the CNN model performs worse in identifying collapsed buildings than non-collapsed buildings. Random under-sampling, random over-sampling, and cost-sensitive methods were adopted to deal with the imbalance problem. Table 4 compares the overall performance of the three balancing methods. The highest PA values for collapsed buildings in test A (61.2%) and test B (69.6%) were obtained by the random under-sampling method. However, its UA values were the lowest among the three methods, at 47.6% and 65.7%, respectively. A higher PA for collapsed buildings means that more collapsed buildings were classified correctly, whereas a lower UA for collapsed buildings means a larger number of false positives (FP). Random over-sampling achieved results similar to those of the cost-sensitive method. Among the balancing methods, the highest OA and Kappa were obtained by the cost-sensitive method: 80.1% and 40.6% for test A, and 77.0% and 48.9% for test B.
Therefore, the cost-sensitive method performed better in discriminating buildings. After balancing, the CNN model still achieved a better OA for test A than for test B, and the Kappa values for test B remained comparatively higher. To demonstrate the achieved results, building-by-building evaluation maps obtained with the cost-sensitive method are shown in Figure 8 for test A and test B. The PA values for non-collapsed buildings are higher than those for collapsed buildings with or without the balancing procedure. The PA values of collapsed buildings increased after balancing, which means the balanced model is better at identifying collapsed buildings, which is very important for planning rescue after an earthquake. The random under-sampling method discarded a large number of non-collapsed building samples. More collapsed buildings were correctly classified, and the PA values improved from 42.6% to 61.2% for test A, and from 50.6% to 69.6% for test B. However, the performance for non-collapsed buildings degraded severely, so the OA decreased from 80.6% to 76.5% for test A, and from 76.6% to 75.4% for test B. In this study, random over-sampling and cost-sensitive methods achieved similar results, with the latter performing slightly better. Although the balancing methods did not improve the overall accuracy consistently (with the cost-sensitive method it fell from 80.6% to 80.1% for test A and rose from 76.6% to 77.0% for test B), the Kappa values improved from 37.7% to 40.6% for test A, and from 45.6% to 48.9% for test B with the cost-sensitive method.
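The trade-off between PA and UA under random under-sampling can be made concrete by converting the reported percentages back into building counts. The sketch below does this for test A (129 collapsed, 454 non-collapsed buildings); the recovered counts are inferred from rounded percentages and are therefore an approximation rather than values from Table 3 or Table 4, and the helper name is hypothetical.

```python
def counts_from_pa_ua(n_collapsed, n_non_collapsed, pa, ua):
    """Recover integer TP, FN, FP, TN from rounded PA and UA of the collapsed class."""
    tp = round(pa * n_collapsed)
    predicted_collapsed = round(tp / ua)
    fp = predicted_collapsed - tp
    return tp, n_collapsed - tp, fp, n_non_collapsed - fp


results = {}
for name, pa, ua in [("no balancing", 0.426, 0.585), ("under-sampling", 0.612, 0.476)]:
    tp, fn, fp, tn = counts_from_pa_ua(129, 454, pa, ua)
    oa = (tp + tn) / (tp + fn + fp + tn)
    results[name] = (tp, fp, oa)
    print(f"{name}: TP={tp} FP={fp} OA={oa:.1%}")
# no balancing: TP=55 FP=39 OA=80.6%
# under-sampling: TP=79 FP=87 OA=76.5%
```

Under these inferred counts, under-sampling finds 24 additional collapsed buildings in test A at the price of 48 additional false alarms, which reproduces the reported drop in OA from 80.6% to 76.5%.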

Intra-Class Analysis for Building Damage Assessment
To unify the pixel size of building patches, a zero-padding operation was applied to the input data. If the original building patch contains few pixels, padding it with many zero values will affect the model performance. To analyze the model performance on intra-class samples, the test B dataset was re-classified according to the original width, in pixels, of the extracted building patches, and was used to explore the performance of the CNN and the cost-sensitive method on the re-classified data. The distribution of width pixels for test B building patches is shown in Figure 9, mainly ranging from 20 to 40 pixels. Building patches were classified into five categories according to the number of building width pixels, as shown in Table 5, making the number of samples as equal as possible for each category. The Kappa values for the categories "25-31" and "31-37" were comparatively low before balancing, and the highest Kappa was achieved for buildings with width pixels within the range of 37-46. After balancing, the Kappa values improved for all categories except "<25". For the category of building width larger than 46 pixels, the OA and Kappa values improved from 76.3% to 80.2%, and from 48.9% to 57.8%, respectively. For building patches with width pixels lower than 25, the Kappa value decreased from 44.0% to 40.8%. While the balanced model achieved better Kappa values on building patches with width larger than 25 pixels, the balancing operation could deteriorate the performance for the remaining small buildings in this case study. The confusion matrix values obtained using the re-classified test B dataset are plotted in Figure 10. TN has the highest values in the confusion matrix, as non-collapsed buildings outnumbered collapsed buildings, and the classifier performed well in classifying non-collapsed buildings. The aim of this study is to identify collapsed buildings, which plays a key role in post-event rescue and reconstruction. After balancing, the number of TP samples increased, showing that the balanced classifier performed better in identifying collapsed buildings. However, the TP values were still relatively low. One reason is that only post-event data were considered in this study. The accuracy would likely improve if pre-earthquake data and LiDAR data were considered. The zero-padding pre-processing operation also affected the performance on small buildings. Furthermore, the number of training samples was still too small. Considering that it is not easy to prepare a large number of samples for such a task, transfer learning could be considered, by fine-tuning complex CNN models pre-trained on large-scale datasets with a relatively small amount of new data. The model tended to favor collapsed buildings after balancing, which would increase the number of small non-collapsed buildings that are misclassified. It can be observed from Figure 10 that the number of false positives increased along with the number of true positives for buildings with width pixels smaller than 46. Similar results were also observed in [64]. This is undesirable for rapid damage assessment. It has been demonstrated that the number of false positives can be reduced by taking advantage of ensemble learning [65]. However, it is still a challenge to properly deal with an imbalanced dataset [57]. In addition, the building structures in the study area should be considered when using balancing methods for identifying collapsed buildings after an earthquake, as balancing could deteriorate the performance for small buildings.
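The re-classification of test B by patch width can be sketched as a simple binning step. The five category labels below follow the ranges mentioned in the text (the middle category, 31-37, is inferred from its neighbours), the assignment of boundary values is an assumption, and the example widths are made up.

```python
from collections import Counter


def width_category(width):
    """Assign a building patch to one of the five width categories (in pixels).

    How boundary values such as 31 or 37 are assigned is an assumption of this sketch.
    """
    if width < 25:
        return "<25"
    if width < 31:
        return "25-31"
    if width < 37:
        return "31-37"
    if width <= 46:
        return "37-46"
    return ">46"


widths = [12, 24, 25, 30, 33, 36, 40, 46, 47, 90]   # made-up patch widths
per_category = Counter(width_category(w) for w in widths)
print(per_category)
```

Once each test patch carries such a category, the confusion matrix and Kappa can be computed separately per category, which is how the effect of zero-padding on small buildings is examined above.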

CNNs for Identifying Earthquake-Induced Collapsed Buildings
Several existing studies have used VHR satellite imagery to map earthquake-induced collapsed or affected buildings after the 2010 Haiti earthquake. In [1], texture and structure features were derived from pre- and post-earthquake VHR satellite imagery of the city of Port-au-Prince (Haiti), and overall accuracies of 74.1-77.3% and Kappa values of 30.6-40.2% were obtained using artificial neural networks (ANN), a radial basis function neural network (RBFNN), and RF. A support vector selection and adaptation (SVSA) approach was used to classify post-earthquake QuickBird data into eight land-use classes, and 92 damaged buildings were correctly identified out of a total of 145 damaged samples [31]. The road and building classes were confused with the damage class because of the pixel-based classification. A moderate result was achieved in this study using the CNN approach, which takes advantage of features learnt by the trained network and requires no extra feature extraction. However, the learnt features are difficult to interpret. When post-event LiDAR point cloud data were considered, a better result was achieved in [66] by combining spectral, texture, and height information at the object level. A one-class support vector machine (OCSVM) method was adopted to extract collapsed buildings, and obtained an OA of 88.3% and a Kappa value of 70.8%. LiDAR data can characterize building roof changes through accurate and precise measurement of height information. A 3D shape descriptor was further developed in [3], based on building contour clusters derived from airborne LiDAR point cloud data, and achieved an OA of 87.3% and a Kappa value of 73.8%.
Kappa was used in this study as one of the evaluation metrics, and it is also commonly adopted in previous studies of earthquake-induced collapsed buildings [2,67]. Kappa statistics respond significantly to the class distribution, whereas OA is not a useful measure when evaluating classifiers learned on imbalanced datasets [68,69]. Furthermore, Kappa provides a means to make the achieved result comparable with previous studies. However, it has been pointed out that Kappa has some drawbacks [70]. The major limitations are that randomness may mislead the accuracy assessment, and that Kappa may introduce problems in computation and analysis [71]. Quantity disagreement and allocation disagreement were proposed by Pontius and Millones [70] to replace Kappa, and will be considered in further studies.
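
To make the relationship between these measures concrete, the following Python sketch computes OA, PA, UA, and Kappa for the collapsed class from a binary confusion matrix, together with the quantity and allocation disagreement as they reduce in the two-class case. The function name and the example counts are hypothetical and are not results from this study; the sketch assumes that both classes occur in the reference data and in the predictions.

```python
def binary_accuracy_measures(tp, fp, fn, tn):
    """Accuracy measures for the positive (collapsed) class of a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    oa = (tp + tn) / n                      # overall accuracy
    pa = tp / (tp + fn)                     # producer accuracy (recall)
    ua = tp / (tp + fp)                     # user accuracy (precision)

    # Expected chance agreement from the row and column totals.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (oa - pe) / (1.0 - pe)

    # Two-class form of the quantity and allocation disagreement:
    # quantity reflects the mismatch in class proportions,
    # allocation reflects errors that swap between the two classes.
    quantity = abs(fp - fn) / n
    allocation = 2 * min(fp, fn) / n

    return {
        "OA": oa, "PA": pa, "UA": ua, "Kappa": kappa,
        "quantity_disagreement": quantity,
        "allocation_disagreement": allocation,
    }


# Hypothetical counts, for illustration only.
measures = binary_accuracy_measures(tp=30, fp=10, fn=20, tn=140)
```

In this form, the quantity and allocation disagreement always sum to the total disagreement, 1 - OA, which separates errors caused by predicting the wrong number of collapsed buildings from errors caused by assigning the label to the wrong buildings.
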
In this study, grade 4 damage was classified as non-collapsed, as in [1]. However, it has been pointed out that grade 4 damage is difficult to identify from remotely sensed images [6]. Very heavily damaged buildings might have failure of walls, or partial structural failure of roofs and floors. Pre- and post-event imagery are needed to distinguish them from collapsed buildings. With the availability of aerial oblique imagery, a detailed building damage map for each grade might be achievable [72]. Besides, building footprints in this study were manually extracted from the imagery, which is time-consuming and a disadvantage for planning rescue after an earthquake. In [65], a fractal net evolution approach (FNEA) algorithm was adopted to segment the image into objects. It showed that collapsed buildings and other objects (e.g., intact buildings, vegetation, and shadow areas) were well segmented, and a collapsed building detection method was further proposed. A region-growing-with-smoothness-constraint approach was suggested to segment damaged and undamaged buildings from airborne LiDAR data [73]. Out of 1953 validation buildings, 1890 were correctly segmented, with only 0.03% errors of commission and 0.03% errors of omission. Accurate automatic segmentation methods should be considered to extract building footprints from remotely sensed data.
It should be mentioned that the result achieved in this study is not satisfactory for rapid damage assessment, as indicated by the Kappa values. On the one hand, only post-event imagery was used in the study, whereas pre-event data and LiDAR data are also crucial for identifying collapsed buildings. On the other hand, the performance is also affected by the building structures in the study area: there are many small buildings, which are difficult to classify correctly. Furthermore, the training dataset was too small for the CNNs. However, this study demonstrated that the CNN method is able, to some extent, to distinguish collapsed buildings from non-collapsed buildings using single post-event satellite imagery. The performance is expected to improve when more training data are available or when more advanced deep learning structures are used. Besides, rotating the input image will not significantly impact the output, as CNNs are capable of learning invariant features due to their special architecture, including local receptive fields, shared weights, and spatial subsampling. The model transferability to other areas was not explored in this study. It is worth pointing out that the CNN model is expected to be transferable to new datasets when a few training samples are available, since the pretrained network weights can be fine-tuned by training the network with new data. Once an accurate CNN model has been built, which can be viewed as the pretrained model, the fast availability of post-earthquake VHR imagery would be crucial for rapid damage assessment [74]. Interpretation of the imagery or field observation should also be involved to prepare a small amount of training data, which will then be used to fine-tune the pretrained model. The transferability of CNNs using remotely sensed data has been demonstrated in case studies of land-use classification, SAR target recognition, and soil clay content mapping using airborne hyperspectral data [36,75,76]. Furthermore, it is also possible to fine-tune CNN models (VGG, Xception, ResNet) trained on existing large-scale datasets (such as ImageNet) to identify collapsed buildings caused by earthquakes.

Conclusions
Supervised classification algorithms have been widely used in damage assessment after an earthquake. In this study, convolutional neural networks were proposed for identifying collapsed buildings after the 2010 Haiti earthquake using single post-earthquake VHR satellite imagery. SqueezeNet achieved OA values of 80.6% and 76.6% in classifying collapsed and non-collapsed buildings on the test A and test B datasets. The damage grade distribution of earthquake-affected buildings is often imbalanced, as collapsed buildings are normally fewer than non-collapsed buildings after an earthquake. Three balancing methods were considered and integrated with the CNN model. Although the overall accuracies did not improve significantly for the two test regions, the model's capability of identifying collapsed buildings was enhanced by the balancing methods. With the cost-sensitive method, the SqueezeNet-like CNN model achieved Kappa values of 40.6% and 48.9% for the test A and test B datasets. The zero-padding operation for preparing building patches can improve the model performance on the intra-class dataset. The balanced model achieved a Kappa value of 57.8% for buildings wider than 46 pixels. However, the number of false positives increased along with the number of true positives, especially for small buildings. Thus, the building structures in the study area should be considered when using balancing methods for identifying collapsed buildings after earthquakes. Efficiency is crucial for emergency mapping. Apart from the preparation of training data, it took 95.3 s to train the model using Google Cloud Platform in this study. When more data and more complex deep learning models with millions of parameters are involved, the training time is expected to increase. It would be practical to use a pretrained model, which requires fewer samples for fine-tuning, instead of training from scratch after the earthquake.
Author Contributions: All authors contributed in a substantial way to the manuscript. M.J. conceived, designed and performed the research and wrote the manuscript. L.L. made contributions to the design of the research and data analysis. All authors discussed the basic structure of the manuscript. M.B. reviewed the manuscript and supervised the study at all stages. All authors read and approved the submitted manuscript.
Funding: This research received no external funding.

Figure 1. Location of the study area and regions for train and test datasets.

Figure 3. Workflow of mapping collapsed buildings using very high resolution (VHR) imagery and convolutional neural networks (CNNs).

Figure 5. The structure of the Fire module used in SqueezeNet.
True positive (TP) is the number of positive examples correctly classified, false positive (FP) is the number of negative examples incorrectly classified as positive, false negative (FN) is the number of positive examples incorrectly classified as negative, and true negative (TN) is the number of negative examples correctly classified.

Figure 6. The distribution of width and length pixels for collapsed and non-collapsed building patches.

Figure 7. The performance of the CNN on identifying collapsed buildings using the test A (A) and test B (B) datasets.

Figure 8. The performance of the balanced-CNN on identifying collapsed buildings using the test A (A) and test B (B) datasets.

Figure 9. The distribution of width pixels for collapsed and non-collapsed building patches for the test B dataset.

Figure 10. Plots of the confusion matrix values obtained using the re-classified test B dataset: (A) CNN; (B) Balanced-CNN. FP (non-collapsed buildings misclassified as collapsed ones); FN (collapsed buildings misclassified as non-collapsed ones); TP (collapsed buildings classified correctly); TN (non-collapsed buildings classified correctly).

Table 1. The CNN structure adopted in this study.

Table 3. SqueezeNet performance on the test A and test B datasets.

Table 4. The performance of balancing methods on the test A and test B datasets.

Table 5. SqueezeNet model performance on test B data re-classified by the number of building width pixels.