BDHE-Net: A Novel Building Damage Heterogeneity Enhancement Network for Accurate and Efficient Post-Earthquake Assessment Using Aerial and Remote Sensing Data

Abstract: Accurate and efficient post-earthquake building damage assessment methods enable key building damage information to be obtained more quickly after an earthquake, providing strong support for rescue and reconstruction efforts. Although many methods have been proposed, most have limited effect on accurately extracting severely damaged and collapsed buildings.


Introduction
Earthquakes, as highly destructive natural disasters, can severely impair societal development and jeopardize the safety of human lives and property. Therefore, it is crucial to evaluate the extent of structural damage in buildings promptly and accurately following an earthquake. This assessment holds significant value in supporting government emergency response efforts and facilitating efficient rescue operations [1]. Remote sensing images are characterized by wide coverage, high spatial resolution, and rich spectral information, while UAV data have the advantages of high resolution, flexibility, and low operating costs. Consequently, the utilization of remote sensing imagery and UAV imagery has become prevalent in studies pertaining to building damage assessment.
The conventional approaches for building damage assessment include visual interpretation and field investigation, which are highly accurate but time-consuming and labor-intensive, especially when the affected area is large [2]. Change detection, based on pre- and post-earthquake remote sensing images, is also an effective approach to assessing building damage. Gong [3] used an object-oriented classification method to extract building images before and after an earthquake and used a change detection method to analyze changes in the buildings. Although this method makes full use of multi-temporal features and obtains better accuracy, it requires pre-disaster images of the same location. In addition, it can usually only extract collapsed buildings and cannot determine the damage level of buildings, so it does not meet the need for a refined classification of building damage.
The progress in deep learning technologies has led to the development of various neural network models, including recurrent neural networks (RNNs) [4], convolutional neural networks (CNNs) [5], and graph neural networks (GNNs) [6]. In particular, CNNs have shown promising potential in image classification and semantic segmentation and have therefore been widely used in building damage assessment research. For example, Duarte [7] used three different CNN-based feature fusion methods based on residual connections and dilated convolutions to assess building damage at different resolutions. The results showed that better accuracy and localization capability were obtained when multiple-resolution feature maps were fused and the feature information from the intermediate layers of each resolution-level network was considered. Chowdhury [8] developed RescueNet, a high-resolution dataset tailored for natural disaster analysis. This dataset is meticulously annotated at the pixel level and categorizes features into 11 distinct types, such as debris, water, buildings, vehicles, roads, trees, ponds, and sand. Moreover, RescueNet includes four unique labels for segmenting buildings based on varying degrees of damage. Xie [9] proposed a network that considers the heterogeneous features of damaged buildings. This network utilizes a local-global context attention module, which extracts features from multiple directions. The test results indicated that, compared with strong deep learning baselines, the proposed method improved the intersection over union (IoU) by 0.03-7.39%. Gupta [10] developed an end-to-end model for building segmentation that integrates a unique location-aware loss function. This function combines binary cross-entropy loss with foreground-selective categorical cross-entropy loss to classify damage. This model outperforms those utilizing conventional cross-entropy loss in terms of building segmentation and damage classification. Additionally, it demonstrates enhanced generalization across diverse geographical regions and types of disasters. Shen [11] proposed a two-stage CNN designed for assessing building damage. Initially, a U-Net is employed to identify building locations. Subsequently, the second stage utilizes a dual-branch, multi-scale U-Net architecture as its core framework. Pre-disaster and post-disaster images are input into the network, and a cross-directional attention module is employed to explore correlations between these images. Zheng [12] developed the ChangeOS framework for semantic change detection, utilizing a deep object localization network to accurately identify building structures for damage assessment. Comparative studies demonstrated that ChangeOS outperformed existing methods in terms of speed and accuracy, also showing enhanced generalization to anthropogenic disasters. Shafique [13] proposed a new deep learning algorithm that replaces the upsampling layer in U-Net3+ with a sub-pixel convolutional layer, alleviating the problems of irrelevant change information and inconsistent boundaries in building change detection. Bai [14] employed a U-Net convolutional network for the semantic segmentation of building damage in high-resolution remote sensing imagery. The effectiveness of the U-Net model was evaluated by comparing it with the deep residual U-Net model, with the 2011 Tohoku earthquake and tsunami serving as a benchmark. Rudner [15] presented a new method for fast and accurate disaster loss segmentation, which integrates multi-resolution, multi-sensor, and multi-temporal satellite imagery within a CNN framework. Hong [16] proposed a deep learning-based Multi-View Stereo (MVS) model for reconstructing 3D models of earthquake-damaged buildings, aimed at assisting in building damage assessment tasks. Hong [17] presented EBDC-Net (Earthquake Building Damage Classification Network), which comprises a feature extraction module and a damage classification module and was designed to augment semantic information and differentiate various damage levels. Günen [18] introduced a new framework that accelerates building detection in ultra-high-resolution images. This approach employs the maximum correlation minimum redundancy method for feature selection, resulting in the generation of five distinct feature sets.
While previous studies on building damage classification have yielded valuable insights, numerous challenges remain. Firstly, data sample category imbalance poses a significant problem. In most seismic hazards, significantly fewer buildings are severely damaged or collapsed than are undamaged or slightly damaged. This imbalance biases the model towards learning the majority categories during training, degrading its ability to recognize the smaller but important categories (e.g., severely damaged and collapsed houses). Secondly, current approaches to assessing building damage rely on semantic segmentation models that are standard in computer vision. However, these methods fail to consider the specific characteristics of building damage, resulting in suboptimal assessment outcomes. For instance, after a building collapses, debris, tiles, and other objects may be scattered around, and the collapse direction can be random. Directly applying existing semantic segmentation models may result in incomplete feature extraction and misclassification. Furthermore, there is a significant resolution difference between UAV images and satellite remote sensing images. Consequently, traditional convolutional neural networks may struggle to effectively capture the complete information of a house, leading to decreased accuracy in house classification [19]. Therefore, the objective of this study is to design an enhanced deep learning model that incorporates multi-directional convolution, data augmentation strategies, and deep and shallow feature fusion. The methodology presented in this paper is designed for post-earthquake building damage assessment utilizing UAV and satellite remote sensing imagery. The primary contributions of this study can be summarized as follows: 1. A novel data augmentation module (DAM) is presented. Different from commonly used data augmentation methods such as rotation and size scaling, it integrates oversampling techniques and label polygon dilation techniques, which mitigate the tendency of the model weights to be biased towards the majority categories. 2. A building damage attention module (BDAM) is proposed to enhance the accuracy of the severely damaged and collapsed categories by considering the randomness of the collapse direction of collapsed buildings following earthquakes, as well as the heterogeneity in texture features between damaged houses and the ground. 3. A multilevel feature adaptive fusion module (MFAF) is introduced to search for optimal parameters on feature maps of different scales, focusing on extracting contour integrity information among houses of different sizes and enhancing the model's sensitivity to diverse house sizes.

Data Sources
This study utilizes datasets composed of post-earthquake images from two distinct locations. The first dataset includes UAV images captured after a magnitude 4.5 earthquake in Baoxing County, Ya'an City, Sichuan Province, on 1 June 2022. The second consists of post-earthquake remote sensing images acquired after a magnitude 6.2 earthquake on 22 June 2022 in Khost Province, Afghanistan [20]. The original remote sensing images of Afghanistan can be downloaded (license required) from https://resources.maxar.com/ (accessed on 27 June 2022). Initially, a region of interest (ROI) was extracted from the original images. Subsequently, the ROI was divided into patches of various sizes based on the image resolution and the architectural characteristics of the buildings. Every patch was then uniformly resized to 512 × 512 pixels. Table 1 provides detailed information on the datasets. All buildings were categorized into four levels: intact, slightly damaged, severely damaged, and collapsed. The criteria for the four categories were as follows: slightly damaged buildings are those where the roof exhibits uneven color tones due to partial tile loss, resulting in visible leaking areas. Severely damaged buildings retain an intact overall outline, but one side of the house shows partial wall collapse, forming local ruins in the imagery; the fallen debris displays significant variations in brightness and color tones, indicating a collapse degree of 10-50%.
The outline of collapsed buildings is incomplete, and the roof texture and color tones appear asymmetrical. There is a noticeable contrast in brightness and color tones between the collapsed corners of the walls and the roof texture, indicating a collapse degree of over 50%. The proportions of intact, slightly damaged, severely damaged, and collapsed buildings are 52%, 22%, 16%, and 10%, respectively. Table 2 shows examples of the four building damage levels in the UAV and remote sensing images. Lastly, Table 3 illustrates the disparity in resolution between drone imagery and satellite remote sensing imagery. Notably, within the demarcated region highlighted by the red box, a significant discrepancy exists in the pixel area occupied by a single residential structure in the two image types [21].
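The patch-preparation step described above (dividing an ROI into tiles and resizing each to 512 × 512 pixels) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code; the ROI size, patch size, and nearest-neighbour resampling are assumptions.

```python
import numpy as np

def tile_and_resize(image: np.ndarray, patch: int, out: int = 512) -> list:
    """Split an ROI into non-overlapping patch x patch tiles, then resize each to out x out."""
    tiles = []
    h, w = image.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            t = image[y:y + patch, x:x + patch]
            # nearest-neighbour resize via index mapping (stand-in for a real resampler)
            idx = np.arange(out) * patch // out
            tiles.append(t[idx][:, idx])
    return tiles

roi = np.zeros((1024, 2048, 3), dtype=np.uint8)  # illustrative ROI
patches = tile_and_resize(roi, patch=256)
print(len(patches), patches[0].shape)  # 32 (512, 512, 3)
```

In practice a proper resampler (bilinear or bicubic) would be used; the index-mapping trick only keeps the sketch dependency-free.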

[Table 2: Image / Category / GT examples for UAV and remote sensing imagery.]

Data Augmentation Module Based on Oversampling Techniques and Label Polygon Dilation Techniques
In earthquake disasters, the number of intact houses is commonly much higher than the number of damaged houses, resulting in a serious imbalance in the sample sizes of the categories. When traditional convolutional neural networks are used directly for building damage assessment, the model predictions may be biased towards the majority categories, resulting in a decline in the classification accuracy for damaged houses. Therefore, we designed a data augmentation module to enhance the model's ability to perceive the damaged house categories. The data augmentation module consists of the following two components: oversampling and label polygon dilation techniques.
Considering that the numbers of severely damaged and collapsed buildings are significantly smaller than that of the slightly damaged category, we inspect each input image and copy it once if it contains the slightly damaged category and twice if it contains the severely damaged or collapsed category. The model can therefore learn from these images multiple times and better extract the features of damaged houses.
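The duplication rule above can be sketched as follows. This is one reading of the rule (the extra copies are additive when several damage categories co-occur), and the integer label encoding is an assumption, not taken from the paper.

```python
import numpy as np

# Assumed label encoding: 0 background, 1 intact, 2 slight, 3 severe, 4 collapsed
SLIGHT, SEVERE, COLLAPSED = 2, 3, 4

def oversample(samples):
    """Duplicate each (image, mask) pair according to the damage classes it contains."""
    out = []
    for img, mask in samples:
        present = set(np.unique(mask).tolist())
        copies = 1                        # the original sample is always kept
        if SLIGHT in present:
            copies += 1                   # one extra copy for slightly damaged
        if present & {SEVERE, COLLAPSED}:
            copies += 2                   # two extra copies for severe/collapsed
        out.extend([(img, mask)] * copies)
    return out

mask = np.array([[0, 3], [0, 0]])         # contains a severely damaged pixel
print(len(oversample([(None, mask)])))    # 3: original + two extra copies
```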
Secondly, when a building suffers severe damage or collapse, there will be debris around the house, which is significantly different from intact or slightly damaged buildings. This characteristic is critical for building damage classification. Therefore, we used label polygon dilation techniques based on a dilation factor k. For the severely damaged category, k was set to 5-10% of the labeled area, while for the collapsed category, k was set to 10-15% of the labeled area, because there is more debris and gravel around collapsed buildings than around severely damaged ones. In this way, the characteristics of the severely damaged and collapsed categories can be amplified. Figure 2 shows a comparison between before and after dilation.
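On a raster mask, the area-proportional dilation can be sketched by growing the labeled region until its area has increased by the factor k. This is an illustrative stand-in for the paper's polygon-based dilation; the 4-connected structuring element and the stopping rule are assumptions.

```python
import numpy as np

def dilate_once(mask: np.ndarray) -> np.ndarray:
    """One step of 4-connected binary dilation."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def dilate_label(mask: np.ndarray, k: float) -> np.ndarray:
    """Grow a binary building mask until its area has increased by at least k
    (e.g. k = 0.05-0.10 for severely damaged, 0.10-0.15 for collapsed)."""
    target = mask.sum() * (1 + k)
    out = mask
    while out.sum() < target:
        out = dilate_once(out)
    return out

m = np.zeros((20, 20), dtype=bool)
m[5:15, 5:15] = True                 # 100-pixel labeled building
d = dilate_label(m, k=0.10)
print(m.sum(), d.sum())              # 100 140: one dilation ring added
```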
Appl. Sci. 2024, 14, x FOR PEER REVIEW


Building Damage Attention Module Based on Dilated Convolution and Direction Convolution
The building damage attention module consists of an ASPP (Atrous Spatial Pyramid Pooling) module [22] and a DFE (Direction Feature Extraction) module [23]. Firstly, since our dataset includes both UAV and satellite imagery, the resolution scales of the two image types can differ greatly. Therefore, an ASPP module, inspired by DeepLabv3 [24] and DeepLabv3+ [25], was employed within the feature extraction module to enhance the network's capability to extract features at various scales. The module enlarges the receptive field of the convolutional kernel by incorporating adaptable dilation rates, enabling the capture of features at varying scales and allowing the network to perform comprehensive mapping across scales. Furthermore, the entire feature map is transformed into a fixed-length feature vector that retains global information while minimizing spatial dimensions. This design strengthens the network's capacity to derive both local and global information from images.
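The effect of the dilation rate on the receptive field can be illustrated in one dimension. The helper below is a toy NumPy sketch, not the paper's implementation:

```python
import numpy as np

def dilated_conv1d(x: np.ndarray, w: np.ndarray, dilation: int) -> np.ndarray:
    """Valid 1-D convolution (correlation) with a dilation rate.

    A kernel of length K with dilation d covers a receptive field of
    d * (K - 1) + 1 input samples without adding parameters -- the idea
    ASPP exploits with parallel branches at several dilation rates.
    """
    k = len(w)
    span = dilation * (k - 1) + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(10, dtype=float)
w = np.ones(3)
print(dilated_conv1d(x, w, dilation=1))  # receptive field 3
print(dilated_conv1d(x, w, dilation=2))  # receptive field 5, same 3 weights
```

An ASPP block runs several such branches in parallel (in 2-D, via `dilation` in a standard convolution layer) and concatenates their outputs, which is what lets one module serve both the high-resolution UAV imagery and the coarser satellite imagery.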
Secondly, in addition to the debris accumulated around damaged houses, damaged buildings exhibit random collapse directions and textural characteristics distinct from the surrounding ground. To exploit these features, we incorporated a Direction Feature Extraction (DFE) module. As shown in Figure 3, the DFE module comprises two branches. In the first branch, a 1 × 1 convolution is applied to reduce dimensionality, followed by directional convolutions that employ four 1 × 1 convolutions in the up, down, left, and right directions. This process generates four local spatial feature weight maps, which are then concatenated into a single feature aggregation weight map encompassing local spatial features from all four directions. The second branch applies global average pooling, 1 × 1 convolution, batch normalization (BN), and rectified linear unit (ReLU) activation, yielding a global spatial information weight map. Finally, the local and global spatial information weight maps are integrated by element-wise multiplication (the "mul" operation). F_i represents the input features processed by the convolution layer, and F_j represents the learned damaged-building feature map. By incorporating building damage heterogeneity features such as random collapse directions, texture attributes distinct from the surrounding ground, and debris accumulation around affected structures, the network gains a deeper understanding and makes better use of these features, leading to improved accuracy and robustness in the pixel-level classification of building damage.
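The two-branch structure described above can be sketched with NumPy. The shift-based stand-in for the directional 1 × 1 convolutions and the sigmoid gating below are simplifying assumptions for illustration, not the exact operations of [23]:

```python
import numpy as np

def dfe_sketch(feat: np.ndarray) -> np.ndarray:
    """Toy two-branch direction feature extraction on a (H, W) feature map.

    Branch 1: four directionally shifted copies (up, down, left, right)
    stand in for the four directional 1x1 convolutions; averaging them
    stands in for the concatenation + aggregation step.
    Branch 2: global average pooling followed by a sigmoid produces a
    scalar global weight (stand-in for GAP + 1x1 conv + BN + ReLU).
    The branches are fused by element-wise multiplication ("mul").
    """
    up    = np.roll(feat, -1, axis=0)
    down  = np.roll(feat,  1, axis=0)
    left  = np.roll(feat, -1, axis=1)
    right = np.roll(feat,  1, axis=1)
    local = (up + down + left + right) / 4.0       # branch 1: directional cues
    global_w = 1.0 / (1.0 + np.exp(-feat.mean()))  # branch 2: global gate
    return local * global_w                        # "mul" fusion

feat = np.arange(9, dtype=float).reshape(3, 3)
out = dfe_sketch(feat)
print(out.shape)  # (3, 3)
```

In the real module each direction has its own learned weights, so responses along the actual collapse direction of a building can be amplified relative to the others.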

Multilevel Feature Adaptive Fusion Module Based on Multi-Scale Fusion
The design of the MFAF incorporates information about the integrity of a house at different scales [26]. Figure 4 displays the construction of the MFAF module. F_i represents the input features processed by the convolution layer, and F_j represents the learned house-integrity feature map. Global statistics of feature maps at different resolutions are captured through three strategies. The first modifies channel number and resolution simultaneously by applying a 3 × 3 convolution layer with a stride of 2, downsampling the input feature map to 1/2 resolution. The second compresses the input feature map with a 1 × 1 convolution, preserving image information at the original scale. The third increases the resolution to 2× through a 1 × 1 convolution and interpolation. This design captures global statistics of feature maps across various resolutions, aiding the detection of buildings of different sizes. Following the scaling operation, the output of each branch is processed through a fully connected network, yielding a series of weighted feature vectors, where l denotes the level of the feature weight vector for the three branches. A SoftMax operation then constrains the values of the feature matrix to the range 0-1, and these values multiply the original feature map, weighting the house features to amplify the impact of complete information. The features at the corresponding level l are fused as follows:

y_ij^l = α_ij^l · x_ij^(1→l) + β_ij^l · x_ij^(2→l) + γ_ij^l · x_ij^(3→l), (1)

where x_ij^(n→l) denotes the feature vector at position (i, j) on the feature map, adjusted from level n to level l. The terms α_ij^l, β_ij^l, and γ_ij^l refer to the spatial importance weights from the three levels to level l, which the network learns adaptively. Additionally, y_ij^l denotes the (i, j) vector of the output feature map across channels.
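The per-position fusion can be sketched as follows: level logits pass through a SoftMax so the three weights sum to 1 at every position, then weight the three rescaled maps. A NumPy sketch under these assumptions:

```python
import numpy as np

def adaptive_fuse(x1, x2, x3, logits):
    """Fuse three same-resolution (H, W) feature maps with learned weights.

    logits has shape (3, H, W); a per-position SoftMax turns it into the
    spatial importance weights alpha, beta, gamma (summing to 1 at every
    (i, j)), which then weight the three rescaled feature maps.
    """
    e = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable SoftMax
    w = e / e.sum(axis=0, keepdims=True)                    # alpha, beta, gamma
    return w[0] * x1 + w[1] * x2 + w[2] * x3

h = w_ = 2
x1 = np.full((h, w_), 1.0)
x2 = np.full((h, w_), 2.0)
x3 = np.full((h, w_), 3.0)

# Equal logits -> equal weights -> the fusion is a plain average.
fused = adaptive_fuse(x1, x2, x3, np.zeros((3, h, w_)))
print(fused)  # all entries 2.0
```

During training the logits come from the fully connected branch, so the network can emphasize whichever scale best preserves the integrity of a given house.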

Combination Loss Function Based on Focal and Dice Loss
Cross-entropy loss is a common loss function in deep learning. However, cross-entropy is a global loss function that considers only pixel-level loss. When applied to our post-earthquake building damage dataset, in which the intact category contains far more samples than the other categories, it biases the model weights toward the majority classes, which can easily lead to larger errors when extracting damaged houses. Unlike cross-entropy loss, Dice loss emphasizes the overlap between predicted results and true labels [27], while focal loss concentrates on hard-to-predict samples by adjusting sample weights [28]. Both therefore let the model attend more closely to samples from minority classes, mitigating the bias toward the intact class to some extent. To address the imbalance whereby damaged houses in the post-earthquake dataset are far fewer than undamaged ones, we propose a combined loss function comprising Dice loss and focal loss alongside cross-entropy loss, which is used to optimize the model during training.
The Dice coefficient is a metric that quantifies the degree of overlap between predicted and ground truth regions in semantic segmentation tasks. It serves as an evaluation measure, assessing model performance by comparing the similarity between predicted results and actual labels. The Dice coefficient ranges from 0 to 1, where 1 indicates perfect overlap (the predicted region is identical to the real region) and 0 indicates no overlap. The Dice coefficient offers several advantages over other metrics. Firstly, it exhibits regional correlation: the loss at a given pixel depends not only on its own predicted value but also on the values of neighboring pixels. This enables the Dice coefficient to provide a balanced assessment of the data, particularly under class imbalance. Secondly, because it measures relative overlap rather than absolute pixel counts, it is less affected by data imbalance. In neural network training, the goal is to minimize the loss function, whereas larger Dice coefficients are desirable; the Dice loss is therefore formulated as 1 minus the Dice coefficient, as depicted in Equation (2). This encourages the model to increase the Dice coefficient during training, improving performance in semantic segmentation tasks.
Furthermore, Equation (3) presents the focal loss, which extends the balanced cross-entropy loss to adjust the weights of samples that are easy or difficult to classify. The focal loss introduces an adjustable focusing parameter, γ, to decrease the weight of easily separable samples and emphasize difficult-to-categorize ones. With a γ value greater than 1, the loss further down-weights easily separable samples, directing the model's attention toward more challenging instances; conversely, a γ value less than 1 increases the weight of easily separable samples, ensuring a balanced consideration of all categories. This mechanism directs the model to focus on samples that are challenging to classify, thereby enhancing the classification accuracy of minority categories.
L_Dice = 1 − (2|X ∩ Y|) / (|X| + |Y|), (2)

L_focal = −α (1 − p_t)^γ log(p_t), (3)

where X and Y represent the ground truth and predicted mask of the segmentation, p_t denotes the predicted probability of the true class, α represents the category weight, and γ represents the weight assigned to difficult-to-distinguish samples.
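Under the definitions of Equations (2) and (3), the combined loss can be sketched in NumPy for the binary case. The equal weighting of the three terms is an illustrative assumption, as the mixing weights are not given here:

```python
import numpy as np

EPS = 1e-7

def dice_loss(p, t):
    """Equation (2): 1 - 2|X ∩ Y| / (|X| + |Y|), with soft predictions p."""
    inter = (p * t).sum()
    return 1.0 - 2.0 * inter / (p.sum() + t.sum() + EPS)

def focal_loss(p, t, alpha=0.25, gamma=2.0):
    """Equation (3): -alpha * (1 - p_t)^gamma * log(p_t), averaged."""
    p = np.clip(p, EPS, 1.0 - EPS)
    pt = np.where(t == 1, p, 1.0 - p)  # probability of the true class
    return np.mean(-alpha * (1.0 - pt) ** gamma * np.log(pt))

def cross_entropy(p, t):
    p = np.clip(p, EPS, 1.0 - EPS)
    return np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p)))

def combined_loss(p, t):
    # Equal weights for the three terms are an illustrative assumption.
    return cross_entropy(p, t) + dice_loss(p, t) + focal_loss(p, t)

t = np.array([1.0, 1.0, 0.0, 0.0])
good = np.array([0.95, 0.9, 0.1, 0.05])
bad = np.array([0.2, 0.3, 0.8, 0.9])
print(combined_loss(good, t) < combined_loss(bad, t))  # True
```

The Dice term rewards region overlap, the focal term concentrates gradient on hard pixels, and the cross-entropy term keeps per-pixel calibration, which together counteract the dominance of the intact class.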

Experimental Environment
The experiments were conducted on a single Nvidia GeForce RTX 3090 24 GB GPU and an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00 GHz. Training and testing were implemented on a Windows 10 system. The network model was constructed using the PyTorch 1.9 deep learning framework [29]. PyTorch is a widely adopted open-source machine learning framework that provides a diverse range of pre-trained models and libraries, saving time and computational resources.

Evaluation Metrics
To objectively assess the segmentation performance of the model for building damage assessment and enable effective comparisons with different approaches, four widely employed evaluation metrics for semantic segmentation, namely P (Precision), R (Recall), the F1 score, and IOU (Intersection over Union), are adopted to evaluate the effectiveness of the introduced model. They are computed as follows:

P = TP / (TP + FP), R = TP / (TP + FN), F1 = 2 × P × R / (P + R), IOU = TP / (TP + FP + FN),

where TP, TN, FP, and FN represent the counts of true positive, true negative, false positive, and false negative samples for the respective classes.
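The per-class computation of these four metrics from flat label arrays can be sketched as follows (NumPy, with an illustrative toy example):

```python
import numpy as np

def per_class_metrics(y_true, y_pred, cls):
    """Compute P, R, F1, and IOU for one class from flat label arrays."""
    tp = np.sum((y_pred == cls) & (y_true == cls))
    fp = np.sum((y_pred == cls) & (y_true != cls))
    fn = np.sum((y_pred != cls) & (y_true == cls))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return p, r, f1, iou

# Toy pixel labels over the 4 damage classes (0 intact ... 3 collapsed).
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])
p, r, f1, iou = per_class_metrics(y_true, y_pred, cls=3)
print(p, r, f1, iou)
```

The averaged scores reported later (average F1, average IOU) are the means of these per-class values over the four damage categories.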

Experimental Parameter Setting
The neural network's internal parameters are derived from iterative model training, while certain hyperparameters require manual configuration before training. In our experiments, the training settings included a batch size of 12, the Adam optimizer, an initial learning rate of 0.0001, and a weight decay of 0.00001. We used ImageNet-pretrained ResNeXt-50 parameters as the initial weights to improve the stability and generalization ability of the model [30]. The number of iterations was set to 30. As depicted in Figure 5, the model's loss value declined from an initial 0.924 to 0.165 after 10 epochs of training. Over the course of the 30 training epochs, the model then stabilized and converged.

Comparative Analysis of Segmentation Performance
In this paper, three classic semantic segmentation models, DeepLabv3+, ResNet-50 [31], and U-Net [32], are compared with our proposed model. The comparison results for the satellite remote sensing images from the Afghanistan earthquake are presented in Figure 6, while those for the UAV images from the Baoxing earthquake are displayed in Figure 7. The figures reveal that the U-Net model categorizes damaged houses incorrectly, the ResNet-50 model fails to provide clear boundary information for different building categories, and the DeepLabv3+ model does not predict house outlines completely. In contrast, the proposed BDHE-Net model not only extracts building contours comprehensively but also effectively differentiates between buildings with different damage classes.

To further assess the model's performance, a detailed analysis of satellite and UAV images is presented in Figure 8. The first row shows a satellite remote sensing image; the original image, zoomed into the red box, is shown in the second column. The main body of the house within the red box belongs to the slightly damaged category. However, the U-Net model misidentifies it as severely damaged. The ResNet-50 model struggles to differentiate between slightly and severely damaged, categorizing it as a mix of both. The DeepLabv3+ model classifies it as severely damaged and fails to recognize the complete and regular shape of the main building. In contrast, our BDHE-Net model accurately identifies it as slightly damaged and recognizes the complete and regular shape of the main building. Similarly, the second row of Figure 8 contains a UAV image, with the local area within the red box enlarged in the second column. The left side of the building within the red box corresponds to the collapsed category, while the right side corresponds to the severely damaged category. Both the U-Net and ResNet-50 models predominantly classify the houses on both sides as severely damaged, failing to differentiate between collapsed and severely damaged structures. The DeepLabv3+ model fails to recognize the complete and regular shape of the building. However, our BDHE-Net model accurately identifies the left side as collapsed and the right side as severely damaged while recognizing the complete and regular shape of the building. This demonstrates the effectiveness of our model in predicting building damage level classification.
Table 4 presents a quantitative comparison of building damage classification accuracy among the baseline models on our post-earthquake dataset. As the results in Table 4 indicate, the proposed BDHE-Net surpasses the other models, achieving an average F1 score of 66.35% and an IOU of 47.15%. Compared with U-Net, ResNet-50, and DeepLabv3+, the average F1 score of BDHE-Net improved by 6.57%, 6.19%, and 8.22%, respectively, and the average IOU increased by 6.62%, 6.24%, and 8.09%, respectively. For the intact category, the F1 scores of the four models differed only slightly, indicating that all models performed well in distinguishing intact buildings. However, for the severely damaged category, BDHE-Net improved its F1 score by 3.62% to 6.81% over the other models. This result demonstrates the effectiveness of our proposed DBA module and combined loss function: together they counteract the bias of the model weights toward majority categories, yielding good accuracy on the less numerous slightly damaged and severely damaged categories. In addition, for the collapsed category, our model improves the F1 score by 13.29% to 21.98% over the other models, which shows that the proposed BDAM module can fully utilize building collapse characteristics, enabling the model to distinguish collapsed houses effectively. Finally, in terms of the average F1 score across the four classes, our model improves on the other models by 6.19% to 8.22%, which demonstrates that the proposed MFAF module operates across the different scales in the dataset, enabling BDHE-Net to extract more complete information about houses of different sizes.

Ablation Experiments
To validate the contribution of each module, detailed ablation experiments were conducted using the same data and experimental setup. Table 5 displays the results of the ablation experiments on the post-earthquake dataset, where B represents the BDAM module, M the MFAF module, D the DBA module, and C the combined loss function. With the BDAM, MFAF, and DBA modules and the combined loss function added, the method achieves the highest overall accuracy relative to the baseline. Specifically, adding the MFAF module increased the average F1 score and IOU by 1.03% and 0.91%, respectively, over the baseline, while adding the BDAM module increased them by 1.71% and 1.80%, respectively. These results demonstrate that BDHE-Net offers substantial benefits for building damage assessment. The MFAF module effectively integrates the integrity information of houses at different scales and enhances the ability to identify house integrity across scales. The BDAM module, in turn, fully considers the varied collapse directions and texture characteristics of damaged buildings, improving the model's extraction of damaged houses. In addition, adding the DBA module and the combined loss function increased the average F1 and IOU by 0.67% and 0.51%, respectively, suggesting that the DBA module plays an active role in training. To rectify the disparity in sample distribution among the building damage classes, the DBA module augments the minority classes (e.g., the severely damaged and collapsed categories) while simultaneously expanding the damaged house regions. By introducing more samples of the minority categories and expanding the damaged areas, the model learns to classify the small number of samples better, refining the building damage assessment. The combined loss function combines Dice and focal losses, balancing the weights of different category samples during training. This prevents the model from concentrating excessively on majority-class samples, such as the intact category, while neglecting minority classes. By adjusting the sample weights, the combined loss function ensures the model allocates more attention to samples that are challenging to classify, thereby enhancing the performance of building damage assessment.
The results suggest that BDHE-Net shows significant advantages in the task of building damage assessment. First, the BDAM module fully considers the characteristics of damaged buildings, e.g., varied collapse directions and texture characteristics. Secondly, the DBA and combined loss function modules correct the bias of the model weights toward majority categories. Finally, the MFAF module effectively fuses house information at different scales and enhances the extraction of house integrity across scales. In summary, BDHE-Net demonstrates superior capabilities in conducting building damage assessments across multiple-resolution datasets.

Conclusions
This study first examined the heterogeneity of damaged buildings following an earthquake. Satellite remote sensing and UAV datasets from the Afghanistan and Baoxing earthquakes were compiled, classifying building damage into four categories: intact, slightly damaged, severely damaged, and collapsed. To address the challenge of pixel-level assessment of post-earthquake building damage, a network named BDHE-Net was proposed, significantly enhancing the model's accuracy in classifying severely damaged and collapsed buildings. The method was tested on our dataset and benchmarked against three state-of-the-art methods. Furthermore, the roles of the BDAM, MFAF, and DBA modules and the combined loss function were explored. The experimental results show that introducing these four strategies improves the mean F1 and mean IOU by 3.41% and 3.22%, respectively, compared with the baseline model.
This paper presents the following key contributions:
1. A novel deep learning-based model is proposed to solve the pixel-level classification problem for post-earthquake building damage assessment, which is crucial for earthquake rescue and post-disaster damage assessment.
2. The BDAM, MFAF, and DBA modules and the combined loss function are incorporated into BDHE-Net, enhancing the model's capacity to discern varying levels of damage among buildings.
In the future, we will attempt to use multi-modal images, i.e., the fusion of data acquired by different sensors, such as optical imagery, radar data, and hyperspectral data. By fusing data from different modalities, more comprehensive and multi-angle information can be obtained, thus enhancing the accuracy of building damage assessment. For example, optical images can provide information on the shape and texture of a building, radar data can penetrate clouds and smoke to obtain structural information, and hyperspectral data can provide rich spectral features for distinguishing buildings made of different materials. Pixel-level assessment of building damage under complex conditions can be further investigated to address the requirements of emergency rescue and post-disaster reconstruction.

Figure 1
Figure 1 depicts BDHE-Net, the proposed framework for classifying building damage. The framework incorporates a data augmentation module, a building damage attention module, and a multilevel feature adaptive fusion module, which are employed, respectively, to suppress model weights that favor the intact category, to enhance the extraction of building damage features, and to strengthen the extraction of house integrity information at different scales.

Figure 1 .
Figure 1. The structure of the proposed BDHE-Net framework.


Figure 6. Comparison of different segmentation models for the Afghanistan earthquake.

Figure 7. Comparison of different segmentation models for the Baoxing earthquake.

Figure 8. Detailed comparison of locally magnified results.


Table 1. Detailed information on the image datasets.

The first dataset consists of post-earthquake UAV images acquired after the earthquake that struck Ya'an City, Sichuan Province, on 1 June 2022 (the Baoxing earthquake). The second consists of post-earthquake remote sensing images acquired after a 6.2 magnitude earthquake on 22 June 2022 in Khost Province, Afghanistan [20]. The original remote sensing images of Afghanistan can be downloaded (license required) from the following URL: https://resources.maxar.com/, accessed on 27 June 2022. Initially, a region of interest (ROI) was extracted from the original images. Subsequently, the ROI was divided into patches of various sizes based on the image resolution and the architectural characteristics of the buildings. Every patch was then uniformly resized to 512 × 512 pixels. Table 1 provides detailed information on the datasets.

Table 2. Examples of four building damage levels in UAV and remote sensing images.
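The patch-extraction step described above can be sketched as follows. The tile size and the nearest-neighbour resize are assumptions for illustration; the paper only states that patches of various sizes are taken from the ROI and resized to 512 × 512 pixels.

```python
import numpy as np

def extract_patches(roi, tile, out_size=512):
    """Split a (H, W, C) ROI into non-overlapping tile x tile patches and
    resize each to out_size x out_size with nearest-neighbour interpolation."""
    h, w = roi.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = roi[y:y + tile, x:x + tile]
            # nearest-neighbour index maps from output grid back to the patch
            rows = np.arange(out_size) * tile // out_size
            cols = np.arange(out_size) * tile // out_size
            patches.append(patch[rows][:, cols])
    return patches
```

Choosing `tile` per dataset (smaller tiles for high-resolution UAV imagery, larger for satellite scenes) keeps the apparent building size roughly comparable across the two sources after resizing.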

Table 2 layout: columns for the four damage levels (Intact, Slightly Damaged, Severely Damaged, Collapsed); rows for UAV and remote sensing image examples.

Table 3. Examples of UAV and satellite remote sensing images of the same size.

Table 3 .
Examples of UAV and satellite remote sensing images of the same size.

Table 3 .
Examples of UAV and satellite remote sensing images of the same size.

Table 3 .
Examples of UAV and satellite remote sensing images of the same size.

Table 3 .
Examples of UAV and satellite remote sensing images of the same size.

Table 3 .
Examples of UAV and satellite remote sensing images of the same size.

Table 4. Comparison of results among semantic segmentation networks.

Table 5. The impact of various modules within the BDHE-Net architecture on the accuracy of building damage classification.