1. Introduction
Building damage detection is a crucial task after large-scale natural disasters, such as earthquakes, as it provides vital information for humanitarian assistance and post-disaster recovery. Remotely sensed images are often used for earthquake-induced damage mapping, owing to their ability to cover wide areas in a short time [1,2]. In general, two types of approaches are adopted, depending on data availability. The first comprises change detection approaches, which compare pre- and post-event images, whereas the second relies solely on post-event images [3]. The former can usually achieve higher performance, but the latter is often preferred, as the acquisition of pre-event data is not always guaranteed.
Boosted by the rapid development of computer vision and machine learning technologies, more and more damage detection methods based on supervised learning, trained and tested on large-scale datasets, have been developed [4,5]. Among machine learning algorithms, deep learning [6] is a rapidly developing technology that can dramatically outperform conventional image processing methods [7]. The strength of deep learning lies in its automatically optimized feature extraction, a process that is often carried out manually by human experts in conventional machine learning pipelines. The success of deep learning has also led to increased interest in applying deep neural networks (DNNs) to large-scale building damage detection using vertical aerial images [8], where their performance has proven superior to that of conventional supervised learning approaches [9]. Trained DNNs can classify building damage by learning the patterns of roof failure, as well as of surrounding objects (e.g., debris, collapsed walls). Despite this state-of-the-art performance, fundamental limitations of image-based approaches have also been reported. For instance, story-collapsed buildings with only slight roof failure are extremely hard to identify from vertical images alone [10], as shown in Figure 1, because vertical 2D images lack 3D geometric information about buildings. This is extremely problematic, as the partial or complete collapse of a building may greatly endanger its residents [11], and such buildings should therefore be detected with the highest priority. Consequently, technology better suited to the detection of collapsed buildings needs to be introduced.
Airborne light detection and ranging (LiDAR) systems can rapidly obtain position and precise elevation information in the form of 3D point cloud data. Operating on the active sensing principle, these systems work under most weather conditions, making them an attractive choice for building damage detection. As can be observed in Figure 1, the obtained point cloud successfully captures building deformations that are hardly visible in vertical aerial images. LiDAR technology therefore has greater potential for detecting severely damaged (e.g., story-collapsed) buildings, as it obtains more precise height information than vertical images. Similar to image-based methods, both multi-temporal change detection [13] and single-temporal approaches [14] have been used to assess building damage from LiDAR point clouds. As buildings are often represented by their roofs in airborne LiDAR point clouds (the other parts of lower residential buildings are normally invisible or only partially visible), a large number of methods have been proposed to detect surface and structural damage to roofs. For instance, region-growing-based methods detect planes by segmenting 2.5D raster data, which are subsequently classified by comparing post-event data with pre-event wireframe reference building models [15,16]. However, these approaches require multiple parameter-sensitive algorithms for detecting planes and roofs, making it time-consuming to generate optimal results. Furthermore, fixed parameter values in the segmentation process are vulnerable to non-uniform point density and data corruption. Structural damage such as roof inclination can be reliably detected by calculating the angle between the geometric axis of the roof and vertical planes [17]. However, this method places strong assumptions on the type of building roof and is hence unsuitable for large areas with varying roof types. Some approaches utilize the radiometric information of (oblique) images to assist building damage detection in the 3D domain [18,19]. Nevertheless, oblique image-based approaches are difficult to adapt to dense residential areas with low-rise buildings because of the large number of occlusions. Moreover, rich radiometric information from images is not always available to assist 3D building damage detection.
Damage detection using vertical aerial images in deep learning has achieved satisfactory results so far. Following this success, 3D deep learning on point cloud data for damage assessment is naturally promising and worthy of investigation. Indeed, the seminal work of PointNet [20] and its variants [21,22,23], which operate directly on point clouds, has demonstrated superior performance compared to conventional feature-engineering-based approaches [20], as well as to DNNs that require additional input conversions [24,25]. A comprehensive review of recent developments in point cloud-based deep learning, including proposed algorithms, benchmark datasets, and applications to various 3D vision tasks, is provided in [26]. Despite the remarkable progress of 3D point cloud-based DNNs, to the best of our knowledge, only a limited number of studies have applied 3D deep learning to point cloud-based building damage detection. The primary reason we infer for this is the absence of a large-scale airborne LiDAR dataset tailored to 3D building damage detection. Although an SfM-based point cloud dataset was developed and a 3D voxel-based DNN was tested on it in [27], that dataset was relatively small. A large-scale dataset is a crucial and fundamental resource for developing advanced algorithms for targeted tasks, as well as for providing training and benchmarking data for such algorithms [28,29,30,31]; building one requires considerable expertise and can be labor-intensive.
To this end, this study aims to reveal the potential and current limitations of 3D point cloud-based deep learning approaches for building damage detection by creating a dataset tailored to the targeted task. Two types of building data were created: the building roof and the building patch, the latter containing a building and its surroundings. Furthermore, mainstream algorithms are evaluated on the developed dataset under varying data availability scenarios (pre–post building patch, post-building roof, and post-building patch) and reference data. The pre–post scenario detects damage using both pre-event and post-event data, whereas the post-building patch and roof scenarios use only post-event data. In addition, to validate whether the developed models gain correct knowledge of the problem representation, a sensitivity analysis is conducted, considering the nature of building damage. The robustness of the trained models against sample reduction is tested in an ablation study. Finally, the generalization ability of the trained model is examined using LiDAR data obtained with a different airborne sensor; the building point clouds obtained by this sensor have a different point density, and their architectural style (reinforced concrete) is distinctly different from that in the proposed dataset (wooden).
To summarize, this paper provides the following contributions:
We create a large-scale point cloud dataset tailored to building damage detection using deep learning approaches.
We perform damage detection under three data availability scenarios: pre–post building patch, post-building patch, and post-building roof. The obtained individual results are quantitatively analyzed and compared.
We propose a general extension framework that extends single-input networks, which accept a single building point cloud, to the pre–post (i.e., two building point clouds) scenario; a minimal sketch of this extension is given after this list.
We propose a visual explanation method to validate the reliability of the trained models through a sensitivity analysis under the post-only scenario.
We validate the generalization ability of models trained using the created dataset by applying the pre-trained models to data having distinct architectural styles captured by a distinct sensor.
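As referenced in the third contribution above, the sketch below shows one plausible way a single-input backbone could be extended to the pre–post scenario: the pre- and post-event clouds are encoded by shared-weight branches and the pooled features are fused before classification. The fusion by concatenation and the layer sizes are assumptions made for illustration; the actual framework is described in Section 3.

import torch
import torch.nn as nn

class PrePostExtension(nn.Module):
    """Hypothetical two-branch wrapper around a single-input encoder.

    `backbone` is any encoder mapping (B, N, 3) -> (B, feat_dim), e.g., a
    PointNet, PointNet++, or DGCNN trunk with its classification head removed.
    """
    def __init__(self, backbone, feat_dim, num_classes=3):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, pre_points, post_points):
        f_pre = self.backbone(pre_points)    # shared weights for both epochs
        f_post = self.backbone(post_points)
        return self.head(torch.cat([f_pre, f_post], dim=1))

Because the two branches share weights, the wrapper adds only the fusion head on top of the original encoder, which is why the same extension can be applied uniformly to every evaluated backbone.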
The remainder of this paper is organized as follows: Section 2 details the procedures of dataset creation. Section 3 introduces the damage detection framework and the experimental setting. Section 4 presents the experimental results, followed by discussions in Section 5. The conclusions are presented in Section 6, providing a brief summary of our findings and prospects for future studies.
4. Results
In this section, the results of our experiments are presented. The results of two damage detection experiments with distinct purposes are quantitatively analyzed. Subsequently, the results of the sensitivity analyses are shown by visualizing the typical recognition pattern of the trained model using IPSI. The typical misclassification types are introduced using IPSI as well. Then, the results of the ablation study showing the robustness of the models considering sample reduction are presented, followed by the results of the generalization tests. Finally, this section concludes with a discussion regarding our significant findings and the challenges derived from the experiments.
4.1. Results of Emergency Mapping
The experimental results of early-stage mapping are shown in Table 6. As expected, the highest performance was achieved when both pre- and post-event data were available, followed by the building patch and the building roof. The predictions also became more stable, in terms of standard deviation, as more information was available. In addition, the pre–post approach achieved the best performance in every category and performance metric, making it clearly the ideal condition for 3D point cloud-based deep learning for damage detection. Furthermore, our proposed general extension framework proved effective, achieving high performance with all of the chosen modern backbone networks. However, the extended PointNet achieved the highest score, compared to the more advanced architectures. Presumably, the reason is that the translation-invariant convolutional kernels introduced in the latter make it harder for those networks to capture abstract global changes, such as global translation and deformation. Further exploration of this topic is beyond the scope of this study, and we leave it to future work.
To our surprise, the best-performing network using the post-event building patch was inferior to the best-performing pre–post network by only a small margin (0.01). In other words, given this point cloud data, modern deep neural networks can perform almost as well without pre-event data. The PointNet++ architecture demonstrated remarkable performance, surpassing all other networks using building patches by a large margin; it even outperformed the pre–post extended PointNet++ and DGCNN.
The overall performance dropped drastically when the input data were roof only. This trend was consistent except for the recall of “others” and the precision of “collapsed” for DGCNN. Therefore, removing contextual information almost surely decreases classification performance. The largest gap in recall appeared in the “collapsed” class for DGCNN. It was also evident that removing contextual information made the results vary more widely, as demonstrated by the increasing standard deviations among the K runs. This phenomenon indicates that the predictions for some samples become stable only in the presence of contextual information, suggesting a possibly large dependence on surrounding environmental elements.
4.2. Results of Fine-Grained Analysis
The results of the fine-grained analysis closely resembled the pattern of emergency mapping according to Table 7, with slightly reduced performance. The reduction in performance may be due to the ambiguity between “story-collapsed” and “collapsed”, as the latter contains the former according to the definition of G5 in the damage pattern chart [38]. In general, the best-performing models achieved high recall under each data availability scenario, indicating that they could accurately detect story-collapsed buildings among all buildings. In addition, the results proved that point cloud-based DNNs can perform well given reference data created using in situ photographs.
4.3. Results of Sensitivity Analysis
In this section, the results of the sensitivity analysis are presented. The models used were PointNet++ networks trained using either post-building roofs or post-building patches. The damage localization results are shown to verify whether the models make correct predictions by focusing on damaged parts. In addition, failure cases are analyzed to reveal the limitations of modern DNNs.
4.3.1. Damage Localization
One of the most obvious indicators of damage to a building is deformation. The top row of Figure 9b clearly shows that the model could localize the partial deformation occurring in building roofs whose other parts remained relatively intact in the captured point clouds. As shown in Figure 9c (i.e., in the same row), the highlighted part was extremely important for classifying this point cloud, as the result became incorrect when these points were removed. When the input was the building patch, the model prioritized the debris over the other deformed parts of the roof, according to Figure 9d. When the input was a completely collapsed building with a heavily damaged roof, however, the model tended to localize many parts of the building, but the maximum assigned score remained low (0.13); the classification results of such buildings were therefore stable. The building in the bottom row was story-collapsed, with little roof damage and a minor inclination. Although it was correctly classified, its base score (0.67) was lower than those in the upper rows (0.94+). The low base score indicates that the classification result was unstable and could easily flip when some parts were removed, as shown in the bottom row of Figure 9c.
4.3.2. Failure Cases
The typical misclassification patterns using building roofs as input are shown in Figure 10. The negative IPSI scores of the misclassified buildings in Figure 10 indicate that the probability of the correct class increases when the corresponding part is removed. In other words, the highlighted points show the possible causes of misclassification. For the “others” class, as illustrated in the top row, additional structures were one source of misclassification: they tend to make roofs discontinuous and thus more similar to damaged roofs. The second row shows that the model located the deformations correctly but overemphasized them, which led to misclassification. For the “collapsed” class, the first cause of misclassification was that the model was clearly insensitive to roof inclination, indicating that it made decisions mainly based on the local roughness of the surface rather than its global characteristics. Capturing slight but global deformations of buildings was highly challenging for the model; for instance, it failed to detect the collapsed building in the bottom row, whose ridge was non-horizontal without obvious deformation of the roof.
Even though the addition of contextual information improved the overall performance, it also induced new types of errors. In the top row of Figure 11, the model assigned the largest importance score to a slope with a small amount of debris or soil near the captured building; the reason is that irregularly distributed debris is often found in the vicinity of collapsed buildings. Tall trees and low bushes received high scores in the second row. As vegetation appears as irregularly scattered points in a LiDAR scan, resembling debris, it also confused the model and led to misclassification. On the other hand, the model again failed to capture large inclinations given more contextual information, as indicated by Figure 11c. Similar to the case of Figure 10c, the model still failed to capture minor but global deformations, as shown in Figure 11c.
4.4. Results of the Ablation Study
The results of the ablation study are shown in Figure 12. As expected, the performance using pre–post data was extremely robust to data reduction, achieving a mean recall of approximately 0.85 with only 1% (48) of the training samples. This suggests that our proposed framework is effective for damage detection when both pre- and post-event data are available. In contrast, the performance and stability of both the roof and the patch decreased with increasing reduction rates; however, their performance was still acceptable (around 0.83) after removing 90% of the training data. The scores of the building roofs became higher than those of the building patches beyond a 10% reduction. Nonetheless, the scores of the building patches closely followed those of the roofs until the reduction rate reached 1%.
4.5. Results of the Generalization Test
The results of the generalization tests are shown in Table 8. The pre-trained model achieved acceptable performance even without any training data, indicating that it learned features common across different architectural styles. Performance on the “collapsed” class was higher than on “others”; this was expected, as the degree of similarity between the G5 damage of masonry buildings and that of wooden buildings is higher than for G4 buildings. The model achieved the highest mean recall when 50% of the training data were used, yet it outperformed the non-fine-tuned model by only a small margin (0.03), demonstrating the promising universality of the model. Surprisingly, the pre-trained model fine-tuned with 100% of the data scored lower than that using 50%. Although this case is worthy of further investigation, in this work we focused on the best achievable performance under the given conditions rather than on explaining the complex behavior of transfer learning. Furthermore, the usefulness of pre-training is clearly visible, as the pre-trained model without any fine-tuning outperformed the fully trained model (100%) without pre-training.
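For readers interested in this protocol, the sketch below illustrates one plausible fine-tuning setup: load weights pre-trained on the proposed (wooden) dataset, replace the classification head for the target task, and fine-tune on a fraction of the new-sensor samples. It reuses the MiniPointNet sketch from Section 1, and all specifics (checkpoint path, dataset size, class counts, learning rate) are hypothetical; the actual experimental setting is described in Section 3.

import random
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Placeholder target-domain training set; the tensors are random stand-ins
# and the 2-class setup ("collapsed" vs. "others") is assumed for illustration.
target_train_set = TensorDataset(torch.randn(400, 2048, 3),
                                 torch.randint(0, 2, (400,)))

model = MiniPointNet(num_classes=3)                         # sketch from Section 1
model.load_state_dict(torch.load("pretrained_wooden.pt"))   # hypothetical checkpoint path
model.classifier[-1] = torch.nn.Linear(256, 2)              # new head for the target task

# Fine-tune on a fraction (e.g., 50%) of the target training samples.
fraction = 0.5
subset = Subset(target_train_set,
                random.sample(range(len(target_train_set)),
                              int(fraction * len(target_train_set))))
loader = DataLoader(subset, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # small LR preserves pre-trained features
criterion = torch.nn.CrossEntropyLoss()
model.train()
for points, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(points), labels)
    loss.backward()
    optimizer.step()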
5. Discussion
5.1. Potential of LiDAR for Building Damage Detection
Our series of experiments and analyses fully demonstrated the high potential of the proposed LiDAR dataset for building damage detection approaches, in terms of both overall performance and damage recognition patterns. In view of its operational advantages, a LiDAR sensor can operate under most weather conditions thanks to the active sensing principle. Furthermore, the detection of collapsed buildings is of the highest importance, as such damage may endanger residents far more than other damage types. Buildings that are hardly visible or remain highly uncertain in vertical images can be clearly observed using airborne LiDAR 3D point clouds. Therefore, the visual interpretation and automatic detection of collapsed buildings using LiDAR is generally preferable to aerial image-based methods. Moreover, the generalization test results also confirmed that point cloud data can yield acceptable performance without additional training data, even when the buildings have an architectural style very different from those in the dataset used for pre-training.
However, the current limitations of LiDAR, compared to aerial imagery, are also apparent. First of all, the operational cost and deployment priority of LiDAR still lag behind those of aerial imagery, which may be resolved by the recently increasing interest in hardware development and the rapid advancement of LiDAR technology in scientific and industrial fields. In terms of expected performance, aerial images are clearly advantageous for detecting lower grades of damage, owing to their rich radiometric and textural information. Consequently, the wisest decision is to combine these complementary technologies for the highest performance [19,55]. However, the fusion of two distinct data types is not trivial, and it is thus left as future work.
5.2. Effectiveness and Challenges of the Proposed Visual Explanation Method
IPSI intuitively illustrated the decisions of the models with respect to particular building damage. Visualization using IPSI showed that, when the other parts of a roof were less damaged, the models could precisely locate partial damage by assigning high importance scores to the nearby points. Conversely, when roofs were uniformly damaged, the models detected multiple deformations while assigning low scores to them. With contextual information added to building roofs, the models learned to detect debris effectively, similarly to human operators.
More importantly, IPSI revealed the challenges that modern point cloud-based DNNs face in building damage detection. The most prominent failure was that the models could not detect global damage, such as inclination and slight global deformation. These types of damage cannot be detected by merely assessing the surface locally, which is what modern DNNs are good at. In addition, the models tended to assume roofs to be continuous, which led to the misclassification of intact roofs with additional structures. Secondly, the addition of contextual information clearly caused the models to depend too heavily on context, diffusing the focus region away from the buildings. This reduced the reliability of the models, as they did not explicitly localize the buildings.
Despite its effectiveness and simplicity, IPSI has some challenges that need to be addressed. An efficient implementation of IPSI is necessary, as it iterates over all points in the given set; the computational cost therefore surges as the number of points or the density increases. This could be mitigated by introducing sampling methods that uniformly preserve 3D geometry [21] or by parallelizing the algorithm across a large number of CPUs. Additionally, the definition of importance could be improved by incorporating gradient information [45], which has an explicit relationship with the classification scores.
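To make the cost argument concrete, below is a minimal occlusion-style sketch in the spirit of IPSI, under the assumption that a point's importance is the drop in the predicted class probability when that point is removed (so a negative score means removal increases the probability, matching Section 4.3.2); the exact definition of IPSI is given earlier in this paper and may differ in detail.

import torch

@torch.no_grad()
def occlusion_importance(model, points, target_class):
    """Occlusion-style per-point importance for a (N, 3) point cloud.

    Assumption for illustration: importance of point i = base probability of
    `target_class` minus the probability after removing point i.
    Returns a (N,) tensor of scores.
    """
    model.eval()
    base = torch.softmax(model(points[None]), dim=1)[0, target_class]
    scores = torch.empty(len(points))
    for i in range(len(points)):             # O(N) forward passes: the cost bottleneck
        keep = torch.ones(len(points), dtype=torch.bool)
        keep[i] = False
        prob = torch.softmax(model(points[keep][None]), dim=1)[0, target_class]
        scores[i] = base - prob              # negative: removal helps the target class
    return scores

The per-point loop makes the O(N) forward-pass cost explicit; removing small neighborhoods instead of single points, or distributing the loop across workers, would directly address the cost issue noted above.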
5.3. Future Study
With the aim of improving the performance of building damage detection methods, we intend to develop an inclination-sensitive model. Data fusion of aerial images and LiDAR point clouds will also be investigated, to jointly exploit their individual advantages. In terms of improving reliability, one possible direction is to use regularization such that the model is aware of the buildings when building patches are used. Moreover, further improvement of visual explanation methods should also be considered, in order to validate the reliability of model decisions.
6. Conclusions
In this paper, we focus on developing a large-scale dataset tailored to building damage detection using 3D point cloud data. Three forms of output are created to evaluate different data availability scenarios: pre–post building patches, post-building patches, and post-building roofs. The building roofs are separated from the patches such that the recognition of buildings can be assessed independently, without interference from contextual information. Several damage assessment experiments using the developed dataset are implemented with both basic and modern 3D deep learning algorithms under varying data availability scenarios. A general framework extending single-input networks to multiple inputs is proposed, in order to adapt them to the pre–post classification scenario. The results show that the best-performing networks under each data availability scenario achieve satisfactory emergency mapping results. In addition, the experimental results regarding the extraction of a fine-grained damage type (story-collapsed, in this study) show that the trained models achieve accuracy comparable to that of emergency mapping procedures.
A visual explanation method is proposed to qualitatively assess per-point importance with respect to the classification results. Through visualization, we confirm that the trained models are capable of locating deformations occurring on roofs, as well as debris scattered around buildings. The visualization also reveals that the models tend to overemphasize local deformations, as well as debris-like objects (e.g., bushes and piles of soil), leading to misclassifications. Meanwhile, it shows that the cutting-edge models lack the ability to detect inclinations and slight global deformations.
Finally, we conduct an ablation study and a generalization test. By reducing the number of training samples, the ablation study reveals that the proposed pre–post framework is extremely robust to data reduction. Though not comparable to the pre–post approach, the post-building patches and roofs also achieve acceptable performance with only 10% of the training data. The generalization ability of the model is tested using LiDAR data acquired by a distinct sensor over buildings with architectural styles different from those in the dataset used for pre-training. The model trained using the developed dataset achieves moderate performance without additional training data, demonstrating the promising application of point clouds in transfer learning.
Future studies will include the development of an inclination-sensitive model, data fusion with aerial images, and improvement of the proposed visual explanation method by incorporating gradient information.