Multi-Scale Earthquake Damaged Building Feature Set

Abstract: Earthquake disasters are marked by their unpredictability and potential for extreme destructiveness. Accurate information on building damage, captured in post-earthquake remote sensing images, is critical for an effective post-disaster emergency response. The foundational features within these images are essential for the accurate extraction of building damage data following seismic events. Presently, the availability of publicly accessible datasets tailored specifically to earthquake-damaged buildings is limited, and existing collections of post-earthquake building damage characteristics are insufficient. To address this gap and foster research advancement in this domain, this paper introduces a new, large-scale, publicly available dataset named the Multi-Scale Earthquake Damaged Building Feature Set (MEDBFS). This dataset comprises image data sourced from five significant global earthquakes and captured by various optical remote sensing satellites, featuring diverse scale characteristics and multiple spatial resolutions. It includes over 7000 images of buildings pre- and post-disaster, each subjected to stringent quality control and expert validation. The images are categorized into three primary groups: intact/slightly damaged, severely damaged, and completely collapsed. This paper develops a comprehensive feature set encompassing five dimensions: spectral, texture, edge detection, building index, and temporal sequencing, resulting in 16 distinct classes of feature images. This dataset is poised to significantly enhance the capabilities for data-driven identification and analysis of earthquake-induced building damage, thereby supporting the advancement of scientific and technological efforts for emergency earthquake response.


Introduction
In recent years, natural disasters have increasingly inflicted severe losses of life and property globally [1]. Earthquakes, as one of the most unpredictable natural disasters, are particularly challenging to mitigate, with their widespread impacts and substantial destruction in urban areas [2][3][4]. For instance, the 2010 Yushu earthquake in Qinghai, China, resulted in over 2000 fatalities and injured more than 100,000 individuals. Notably, rescue efforts successfully extracted over 6000 survivors from the rubble [5]. The structural failure of buildings during seismic events frequently accounts for the majority of casualties and significant economic losses. In the context of Turkey's February 2023 earthquakes, reports from the Turkish Enterprise and Business Federation indicate that over 84,000 buildings were either severely damaged or collapsed, contributing to estimated economic losses of approximately USD 70.8 billion and a reduction in national income of USD 10.4 billion [6].
Data 2024, 9, 88

The advancement of high-resolution remote sensing satellites has significantly improved the capability for ground information extraction. Consequently, satellite technology has become essential for large-scale monitoring and post-disaster assessments [7][8][9][10]. The strategic application of remote sensing techniques to evaluate urban damage after earthquakes is thus emerging as a critical area of focus in disaster remote sensing research [11][12][13].
The advent of deep learning techniques has bolstered the theoretical framework for intelligent remote sensing interpretation, enabling precise automatic recognition through training on large and diverse datasets [14][15][16]. To advance research on intelligent interpretation algorithms in remote sensing for assessing building damage following disasters, Gupta et al. [17] introduced the xBD dataset, which spans multiple disaster types and categorizes building damage into four distinct levels, encompassing 384 images of earthquake-affected buildings. Zhe Dong applied deep learning to identify structures damaged by the magnitude-6.4 earthquake that struck Cangshan West Town, Yangbi County, Dali Prefecture, Yunnan Province, China, on 21 May 2021, utilizing high-definition images from Google Earth and drones to compile a dataset of damaged buildings [18]. Furthermore, Yao Sun and colleagues developed the SAR-Optical Dataset for Rapid Damage Building Detection, focusing on radar imagery. These datasets significantly advance research on algorithms for extracting data on buildings damaged by earthquakes. Nonetheless, the reliance on single satellite sources and specific resolutions limits the generalizability of the trained models, posing challenges to their broader application [19]. Additionally, most earthquake emergency datasets suffer from limited coverage areas and data volumes, further complicating model training [20]. The variety of resolutions and satellite sources, along with divergent architectural styles and topography across earthquake regions, presents substantial challenges for training models effectively.
The extraction of features from damaged buildings represents a critical area of focus in post-disaster damage assessment. Historically, numerous scholars have successfully employed threshold techniques or machine learning approaches based on damaged-building features to recognize and extract information pertaining to post-earthquake structural damage [21][22][23][24]. Fundamental image feature algorithms offer significant advantages, including robust interpretability and well-defined mathematical principles. Additionally, shallow image features can effectively augment deep learning algorithms by providing auxiliary recognition capabilities [25].
In response to the outlined challenges, this paper introduces the Multi-Scale Earthquake Damaged Building Feature Set (MEDBFS), a comprehensive large-scale dataset. This feature set amalgamates various satellite remote sensing images and incorporates elements of the xBD dataset as its primary data sources, showcasing a diverse array of scales and resolutions from images collected across multiple significant earthquake events. Based on the visual characteristics of the acquired images, damaged buildings are systematically classified into three categories: intact/slightly damaged, severely damaged, and completely collapsed. The dataset constructs six distinct types of feature images, including spectral, texture, edge, building index, shadow, and temporal sequences, encompassing a total of 7062 images that depict buildings before and after sustaining damage.

Data Source Overview
Data acquisition for this study encompasses five major earthquake-affected regions: Yushu, China, in 2010; Puebla and its environs, Mexico, in 2018; Kahramanmaraş and Gaziantep, Türkiye, in 2023; Marrakech and surrounding areas, Morocco, in 2023; and Herat and surrounding areas, Afghanistan, in 2023. Detailed specifics regarding each earthquake are presented in Table 1 [5,6,26-28]. The optical remote sensing images utilized were acquired promptly post-disaster, ensuring relevance for emergency response analysis. The feature set developed in this study leverages a variety of remote sensing satellite imagery sources, including QuickBird, Jilin-1, and WorldView, to enhance the robustness and comprehensiveness of the data collected. The QuickBird satellites, launched by DigitalGlobe (Longmont, CO, USA), are advanced high-resolution optical remote sensing satellites that deliver multispectral images at a resolution of 2.44 m and panchromatic images at 0.61 m. Fusing these images produces multispectral remote sensing images with an enhanced resolution of 0.61 m. The GF-2 satellites, developed independently by China Aerospace Science and Technology Corporation (Beijing, China), are high-resolution optical remote sensing satellites that provide images at resolutions as fine as 0.8 m at nadir. Additionally, the Jilin-1 satellites, independently developed by Chang Guang Satellite Technology Co., Ltd. (Changchun, Jilin, China), comprise a commercial remote sensing satellite constellation capable of revisiting any point on Earth 35 to 37 times per day, which is instrumental for timely post-disaster data acquisition. The optical data from Jilin-1 utilized in this study have a resolution of 0.75 m. Furthermore, the WorldView satellites, operated by Maxar (Westminster, CO, USA), are high-resolution commercial remote sensing satellites that supply premium optical remote sensing images with a resolution of 0.3 m, pivotal for disaster response applications. The xBD dataset represents the first large-scale endeavor for post-disaster building damage assessment, encompassing imagery from 19 disaster events [17].
In this research, multispectral and panchromatic imagery of the 2010 China earthquake were sourced from the QuickBird satellite. Optical images for the 2023 Türkiye and Afghanistan earthquakes were procured from the Jilin-1 satellites, while imagery of the 2023 Morocco earthquake was captured by the WorldView satellite. Additionally, image data for the 2018 Mexico earthquake, sourced from the xBD dataset and originally acquired by WorldView satellites, served as primary data for this study. The distribution of the samples collected for each earthquake event is depicted in Figure 1, with comprehensive details regarding the data sources provided in Table 2.


Feature Set Overview
To facilitate the utilization of diverse image features for the classification of earthquake-damaged buildings, this paper introduces a feature set divided into three components: original pre- and post-disaster images, classification labels for damaged buildings, and feature maps of building damage. The original images, which are optical satellite remote sensing images, were collected from five major earthquake-stricken regions and total 7062, each with dimensions of 512 × 512 pixels. The classification labels for the damaged buildings reflect multi-class outcomes based on a meticulous analysis by domain experts. This analysis involves a comparative assessment of the imagery before and after the disasters, applying specific interpretive criteria to determine the extent and type of damage inflicted on the structures. Furthermore, this study has developed specific classification standards based on existing remote sensing interpretation protocols and the detectability of the samples at the utilized resolutions.
The damaged-building feature maps within this dataset encompass five categories of features: spectral, texture, edge, building index, and temporal, which are further divided into 16 distinct feature values. The feature set is organized into five compressed packages, one per earthquake-affected region. Each compressed archive comprises three folders (Image, Label, and Feature) and one explanatory text file. Within the Image folder, pre-disaster and post-disaster images are stored as 'tiffpre' and 'tiffpost', respectively. In the Label folder, '0' indicates no obvious damage or a non-building area, '1' denotes severe damage, and '2' signifies complete collapse. The Feature folder houses a total of 16 feature files. All images are in TIFF format, and each package is accompanied by a text file that provides detailed descriptions of its contents. The organizational structure of the feature set, exemplified by the China earthquake, is depicted in Figure 2.
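The archive layout described above can be traversed with a short script. This is a minimal sketch, assuming one region archive has been extracted to a local directory and that the 'tiffpre'/'tiffpost' naming convention appears in the file names; the function name and return shape are illustrative, not part of the dataset.

```python
import os

def index_feature_set(region_dir):
    """Index one extracted region archive into its three folders.

    Assumes the layout described above: Image/ (pre/post TIFFs),
    Label/ (masks with values 0, 1, 2), and Feature/ (16 feature files).
    """
    layout = {}
    for folder in ("Image", "Label", "Feature"):
        path = os.path.join(region_dir, folder)
        layout[folder] = sorted(os.listdir(path)) if os.path.isdir(path) else []
    # Pair pre-/post-disaster images by the naming convention in the text.
    pre = [f for f in layout["Image"] if "tiffpre" in f]
    post = [f for f in layout["Image"] if "tiffpost" in f]
    return layout, pre, post
```

A loader like this makes it easy to verify that every pre-disaster image has a post-disaster counterpart before training.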

Data Preprocessing
Raw data collected in the aftermath of disaster emergencies typically necessitate preprocessing to ensure their applicability and validity. To this end, this study undertook specific preprocessing measures on the original data to validate their integrity and utility for subsequent analyses.
To address the variability in imaging quality and cloud cover that significantly affects the identification of damaged buildings, this paper conducts preliminary screening of the original pre- and post-disaster image data sourced from diverse origins. This process ensures that only high-quality images are retained for further analysis. For images available in both multispectral and panchromatic bands, a fusion technique is employed to enhance the resolution of the multispectral images, thereby improving their recognizability. Significant discrepancies exist in the grayscale values and absolute radiance between original image datasets, and standardization issues arise when comparing images produced by different sensors at different times. To address this, the study performs radiometric calibration on the collected remote sensing images, converting visible-light reflectance data into standardized units, which enhances the data's comparability and credibility. Furthermore, although atmospheric effects on visible light are generally minimal, aerosols and water vapor can still impact light transmission. Accordingly, this paper applies atmospheric correction to the optical remote sensing data to refine image quality.
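The radiometric calibration step converts raw digital numbers to physical radiance. The paper does not reproduce the formula, so the sketch below uses the standard per-band linear gain/offset model; the coefficients shown are placeholders, not real sensor metadata.

```python
import numpy as np

def radiometric_calibration(dn, gain, offset):
    """Convert raw digital numbers (DN) to at-sensor radiance.

    Uses the standard linear model L = gain * DN + offset; gain and
    offset come from the sensor's metadata and differ per band and per
    satellite (the values used below are placeholders, not real
    coefficients for QuickBird, Jilin-1, or WorldView).
    """
    return gain * np.asarray(dn, dtype=np.float64) + offset

# Example with placeholder coefficients for a single band.
dn_band = np.array([[120, 130], [128, 255]], dtype=np.uint16)
radiance = radiometric_calibration(dn_band, gain=0.05, offset=1.2)
```

In practice the calibrated radiance would then be fed into an atmospheric correction step, as the text describes.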
Original datasets may exhibit geographical inaccuracies that lead to coordinate misalignment between pre- and post-disaster remote sensing images. Orthorectification of visible-light images substantially enhances geographical positioning accuracy, mitigates terrain-induced distortions, and facilitates the subsequent registration of dual-time images. Despite these improvements, discrepancies in detail between dual-time images persist following orthorectification. Consequently, it is imperative to conduct geographic matching on the pre- and post-disaster datasets to ensure positional consistency across the images.
In instances where the resolutions of dual-time images differ, this study applies raster downsampling to the higher-resolution remote sensing images to achieve uniform resolution, thereby ensuring pixel-level correspondence between the images. Concurrently, to optimize inputs for subsequent deep learning analyses, this paper systematically crops images to a uniform size of 512 × 512 pixels. During this cropping process, sample images lacking damaged buildings are excluded, thus maintaining a balance between intact and damaged building samples. The detailed workflow of these procedures is depicted in Figure 3.
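The uniform 512 × 512 cropping can be sketched as follows, assuming imagery loaded as a NumPy array. The tiling here is non-overlapping and discards partial edge tiles; the subsequent filtering of tiles without damaged buildings, described above, would use the label masks and is not shown.

```python
import numpy as np

def crop_to_tiles(image, tile=512):
    """Crop an image array (H, W[, C]) into non-overlapping tile x tile patches.

    Edge pixels that do not fill a whole tile are discarded, mirroring the
    uniform 512 x 512 crops described in the text.
    """
    h, w = image.shape[:2]
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(image[y:y + tile, x:x + tile])
    return tiles

# A 1200 x 1100 scene yields a 2 x 2 grid of full 512-pixel tiles.
scene = np.zeros((1200, 1100, 3), dtype=np.uint8)
patches = crop_to_tiles(scene)
```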


Ground Truth Annotation
To facilitate the application of the feature set in deep learning contexts, this paper integrates actual image interpretation features with established international standards and annotates damaged buildings with ground truth classifications. The European Macroseismic Scale (EMS-98), proposed by European scholars in 1998, categorizes earthquake-induced building damage into five levels: slight, moderate, severe, very severe, and destruction [29]. However, the EMS-98 classification, originally based on ground surveys, may not adequately distinguish damage levels in remote sensing images. Consequently, Gupta et al. [17] introduced the xBD dataset, which revises the classification of building damage into four distinct levels (no damage, slight damage, severe damage, and destroyed), better suiting the nuances of remote sensing analysis. See Table 3 for detailed classification rules.

This study integrates the building damage classification from EMS-98 with the visual interpretation criteria for remote sensing assessment of building damage proposed by Gupta et al. Given the limited discernibility of slight damage in lower-resolution satellite images, such as those from QuickBird, and considering the relatively lower risk posed by slight damage to human life and property, this paper proposes a revised classification of post-earthquake building damage into three categories: intact/slight damage, severe damage, and complete collapse. This classification leverages spectral, texture, geometric, and shadow features extracted by comparing pre- and post-disaster images. The specifications of these features are detailed in Table 4.
During the annotation process, rigorous quality control measures are implemented. Initial annotations are performed by laboratory personnel, followed by a secondary review conducted by professional technical staff. Annotations are meticulously based on comparisons with pre-disaster images, and critical inspections of building damage are undertaken in areas reported to be severely affected post-disaster to ensure the authenticity and validity of the annotations. The classification criteria for collapsed buildings are delineated in Table 3. Figure 4 provides an example of the ground truth annotation process.

Table 3. Disaster level and structure description.
0 (No Damage): Undisturbed. No sign of water, structural or shingle damage, or burn marks.
1 (Minor Damage): Building partially burnt, water surrounding the structure, volcanic flow nearby, roof elements missing, or visible cracks.
2 (Major Damage): Partial wall or roof collapse, encroaching volcanic flow, or surrounded by water/mud.
3 (Destroyed): Scorched, completely collapsed, partially/completely covered with water/mud, or otherwise no longer present.

To ascertain the efficacy of our data annotation protocol, we trained the FTN network on the dataset [30]. The FTN network employs a Siamese structure on the encoding side to strengthen dual-phase feature extraction, with the Swin Transformer serving as the foundational backbone of this structure. The feature fusion stage incorporates both feature summation and difference to amplify multi-level visual features. On the decoding side, the Progressive Attention Module (PAM) is incorporated to construct a pyramid structure that enhances the network's representational capacity, and deep supervision is applied to optimize the training outcomes. FTN has demonstrated robust performance across various change detection datasets. However, during our experiments, we observed that convolution in the encoding layers yielded superior results compared to the Swin Transformer on our dataset. Consequently, we substituted ConvNeXt for the Swin Transformer to enhance the training efficacy [31].
The training and validation sets were split at a ratio of 8:2. To ensure an equitable distribution between training and validation samples, the 20% of samples forming the validation set were randomly selected from diverse geographical regions.
In our experiments, we employed two backbones, ConvNeXt-Base and ConvNeXt-Small. The training curves for our dataset are depicted in Figure 5, while the training and validation F1-scores are presented in Figure 6. Cross-entropy loss was selected as the training loss function:

L = -(1/N) Σ_{i=1}^{N} [y_i log(p_i) + (1 - y_i) log(1 - p_i)]

In the formula, L represents the value of the loss function, N denotes the number of samples, y_i is the true label of the i-th sample (for binary classification problems, typically 0 or 1), and p_i is the probability predicted by the model that the i-th sample belongs to the positive class (i.e., the probability value output by the model).

We utilized the F1-score and Mean Intersection over Union (MIoU) derived from multiple categories to gauge the accuracy of the training outcomes. In machine learning, prediction outcomes can be evaluated with a confusion matrix. As shown in Table 5, the confusion matrix depicts the performance of the classification model. In this matrix, True Positive (TP) denotes the number of samples that are positive in both the actual and predicted values, False Positive (FP) the number of samples falsely predicted as positive while being negative, False Negative (FN) the number of samples falsely predicted as negative while being positive, and True Negative (TN) the number of samples that are negative in both the actual and predicted values. Based on this matrix, scholars have developed metrics such as Precision and Recall:

Precision = TP / (TP + FP), Recall = TP / (TP + FN)

Because our dataset contains three categories, we treat the task as three binary classification problems and compute a confusion matrix for each category. The F1-score is the harmonic mean of Precision and Recall:

F1 = 2 × Precision × Recall / (Precision + Recall)

Mean Intersection over Union (MIoU) is a statistical metric used extensively in the evaluation of object detection and segmentation models, particularly in computer vision and image processing. With n representing the number of categories:

MIoU = (1/n) Σ_{i=1}^{n} TP_i / (TP_i + FP_i + FN_i)

Both the F1-score and MIoU are indicative of the overall category discrimination accuracy. The detailed training results are tabulated in Table 6. Experimental results indicate that our annotated dataset achieves a peak validation F1-score of 74% with the FTN network, demonstrating robust performance in the recognition of collapsed buildings. Furthermore, as illustrated in Figure 7, the model effectively discriminates between various conditions of building damage.
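The per-category binary evaluation described above can be sketched as follows. This is an illustrative NumPy implementation of the three-class F1 and MIoU computation, not the authors' evaluation code.

```python
import numpy as np

def per_class_f1_miou(y_true, y_pred, n_classes=3):
    """Treat each damage class as its own binary problem, then compute
    per-class F1 (harmonic mean of precision and recall) and the mean
    IoU, TP / (TP + FP + FN), averaged over classes."""
    f1s, ious = [], []
    for c in range(n_classes):
        tp = np.sum((y_true == c) & (y_pred == c))
        fp = np.sum((y_true != c) & (y_pred == c))
        fn = np.sum((y_true == c) & (y_pred != c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
    return f1s, float(np.mean(ious))

# Toy example: flattened label and prediction maps with classes 0, 1, 2.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
f1_per_class, miou = per_class_f1_miou(y_true, y_pred)
```

In a segmentation setting the same function applies after flattening the label and prediction rasters.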


Construction of Damaged Building Feature Set
This paper focuses on the extraction of shallow features from diverse image types based on fundamental optical remote sensing data, thereby furnishing shallow feature datasets essential for machine learning and deep learning training. The feature set delineated herein encompasses five categories: spectral, texture, edge, index, and temporal features. Spectral features are categorized into red, green, and blue components. Texture features encompass the contrast, energy, correlation, entropy, and angular second moment gray-level texture features calculated via the Gray-Level Co-occurrence Matrix (GLCM) [32], as well as texture features depicting grayscale variation, derived using the Local Binary Pattern (LBP) technique [33]. Edge features include those derived from the Prewitt and Laplace operators [34,35]. Building index features utilize the Multi-Band Index (MBI) for feature representation. Shadow features are quantified using shadow index calculations. Temporal features involve 3D texture features computed through a 3D Gray-Level Co-occurrence Matrix [36]. The comprehensive composition of the feature set is depicted in Figure 8.

Spectral Characteristics
Spectral features, fundamental to the characterization of images, demonstrate varied patterns of spectral absorption, reflection, and radiation across different terrestrial objects. The specific material composition of buildings dictates their unique spectral reflective properties. Post-earthquake, collapsed buildings often expose internal materials, manifesting spectral characteristics that markedly distinguish them from structures that remain intact [31]. In this study, we extract the red, green, and blue spectral components from post-disaster optical images to serve as spectral features, as depicted in Figure 9b-d.


Texture Characteristics
Texture features are adept at capturing the grayscale variation patterns within remote sensing images, effectively extracting the spatial distribution and structural information of objects. These features play a pivotal role in various applications, including remote sensing image classification, target detection, and change detection. Earthquake-induced damage disrupts the regular texture patterns of buildings, resulting in significant alterations to their texture features. In this study, the Gray-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) are utilized to extract texture features from post-disaster images, thereby providing crucial support for the development of damage detection tasks.

Gray Level Co-Occurrence Matrix
The gray-level co-occurrence matrix (GLCM) is a grayscale statistical method used to describe texture features. Each element in the matrix represents the frequency with which a specific pair of grayscale levels occurs at a given direction and distance. The specific calculation method is as follows: (1) select a direction θ and a distance d; (2) traverse all pixels, recording the frequency with which grayscale levels i and j occur as a pair at direction θ and adjacent distance d; (3) accumulate the frequencies of all grayscale-level pairs to form the gray-level co-occurrence matrix. We slide a 5 × 5 window over the post-disaster images to generate the GLCM. The texture feature values are then calculated from the matrix as follows (in Formulas (5)-(9), i and j represent different gray levels, and P(i, j) represents the frequency of occurrence of a gray-level combination).

Angular Second Moment (ASM), also referred to as energy, is employed to describe the uniformity of grayscale distribution and the fineness of texture within images. A higher ASM value indicates a more uniform and distinct image texture, whereas a lower ASM value suggests less uniformity. The ASM feature map is depicted in Figure 10b, and the specific calculation formula is as follows:

ASM = Σ_i Σ_j P(i, j)²    (5)

Entropy (ENT) is a metric used to quantify the randomness of the information content within an image. It correlates positively with the randomness of grayscale information and the complexity of the image's detailed texture: the higher the entropy, the more complex and less predictable the image texture. The entropy feature map is illustrated in Figure 10c, and the specific calculation formula is as follows:

ENT = -Σ_i Σ_j P(i, j) log P(i, j)    (6)

Contrast (Con) serves as an indicator of local variations within images, encapsulating the depth of grooves and wrinkles in the texture. This metric directly correlates with the clarity of the image texture, providing insights into the texture's sharpness and variation. The contrast feature map is displayed in Figure 10d, and the specific calculation formula is as follows:

Con = Σ_i Σ_j (i - j)² P(i, j)    (7)

Inverse Differential Moment (IDM), also known as inverse variance, quantifies the rate of change in differences between distinct grayscale levels within an image. This metric inversely correlates with the magnitude of local grayscale variations: the smaller the variation, the higher the IDM value, underscoring the homogeneity of the image texture. The IDM feature map is illustrated in Figure 10e, and the specific calculation formula is as follows:

IDM = Σ_i Σ_j P(i, j) / (1 + (i - j)²)    (8)

Correlation (Cor) is employed to assess the similarity among grayscale levels along the row and column axes of an image. This feature directly captures the local grayscale correlation, wherein a stronger grayscale correlation corresponds to higher values of the correlation feature. The correlation feature map is depicted in Figure 10f. The specific calculation formula, where μ_i and μ_j represent the mean values in the row and column directions and σ_i and σ_j represent the standard deviations in the row and column directions, is as follows:

Cor = Σ_i Σ_j (i - μ_i)(j - μ_j) P(i, j) / (σ_i σ_j)    (9)
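The GLCM statistics in Formulas (5)-(9) can be computed for a single window as in the sketch below. This is an illustrative single-direction (0°, horizontal, distance d) NumPy implementation with quantization to a small number of gray levels, not the authors' processing pipeline.

```python
import numpy as np

def glcm_features(patch, levels=8, d=1):
    """GLCM texture statistics over one window, direction 0 degrees.

    Builds the normalized co-occurrence matrix P(i, j) for horizontal
    neighbours at distance d, then derives ASM, entropy, contrast,
    IDM, and correlation as in Formulas (5)-(9).
    """
    # Quantize the patch into `levels` gray levels (q < levels always).
    q = (patch.astype(np.float64) * levels / (patch.max() + 1)).astype(int)
    glcm = np.zeros((levels, levels))
    for (a, b) in zip(q[:, :-d].ravel(), q[:, d:].ravel()):
        glcm[a, b] += 1
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)
    ent = -np.sum(p[p > 0] * np.log(p[p > 0]))
    con = np.sum((i - j) ** 2 * p)
    idm = np.sum(p / (1 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    s_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    s_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    cor = (np.sum((i - mu_i) * (j - mu_j) * p) / (s_i * s_j)
           if s_i * s_j else 0.0)
    return {"ASM": asm, "ENT": ent, "Con": con, "IDM": idm, "Cor": cor}
```

Sliding this over a 5 × 5 window, as the text describes, yields one feature map per statistic; libraries such as scikit-image provide optimized multi-direction equivalents.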

Local Binary Pattern
The Local Binary Pattern (LBP) is a texture descriptor extensively utilized in the field of computer vision. It fundamentally employs the pixel intensity values surrounding a central pixel to sample and generate a binary pattern, which effectively encapsulates the texture characteristics of the local area. For this study, LBP was extracted from post-disaster images using the following specific computational steps:
1. Select the 8-neighborhood region around the central pixel.
2. Compare each neighboring pixel to the central pixel: assign a value of 1 if the neighboring pixel's intensity is greater than that of the central pixel, and 0 if less.
3. Form a binary number by arranging the assigned values clockwise based on the comparison results.
4. Convert the binary number to a decimal number as the texture feature value of the current pixel region.
An example of the Local Binary Pattern feature value is shown in Figure 10g.
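The four steps above can be sketched for a single 3 × 3 patch as follows. Two details are our assumptions, since the text does not fix them: the clockwise reading starts at the top-left neighbor, and the tie case (neighbor equal to the center) is assigned 0.

```python
def lbp_value(patch):
    """LBP code of a 3x3 patch given as nested lists.

    The 8 neighbours are compared against the centre (step 2) and read
    clockwise starting at the top-left corner (step 3, starting corner is our
    assumption); equal intensities are treated as 0. The resulting binary
    string is converted to decimal (step 4).
    """
    center = patch[1][1]
    clockwise = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = "".join("1" if patch[r][c] > center else "0" for r, c in clockwise)
    return int(bits, 2)
```

For example, a patch whose bottom row and left column are all brighter than the center produces ones only in the second half of the clockwise reading.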



Edge Characteristics
Edge features denote the linear characteristics of contour lines, which are discerned through a series of image processing methods designed to detect transitions in grayscale values. In intact buildings, edge features typically manifest a regular pattern within the image. However, in the aftermath of an earthquake, the structural form of buildings experiencing severe damage or collapse is significantly altered, producing distinct changes in edge features compared to pre-earthquake images. Consequently, the extraction of edge features is essential for identifying and analyzing the extent of damage in post-earthquake buildings. In this study, we employ the Prewitt operator and the Laplacian operator to extract edge features from images of earthquake-damaged buildings [33,34]. Illustrations of these edge features are presented in Figure 11b,c.

The Prewitt operator, recognized as a straightforward yet effective edge detection tool, computes local grayscale variations by employing first-order differential values. This operator applies a threshold to identify grayscale transition points, thereby pinpointing edge locations effectively. The operator's templates are formalized as Equation (6):

d_x = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \quad d_y = \begin{pmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}    (6)

We use a 3 × 3 window to perform a sliding multiply-and-sum of d_x and d_y over the post-disaster images; the two responses are averaged to obtain the final Prewitt edge strength.

The Laplacian operator, a second-order differential operator, operates fundamentally on the principle of calculating the rate of change across image pixels in various directions using the second-order partial derivatives of the image. This approach facilitates the extraction of edge features from the image. In this study, we employ the eight-neighborhood Laplacian operator to derive edge features from remote sensing images. The specific operator template is delineated in Equation (7); we obtain the Laplacian edge intensity through the sliding multiply-and-sum of the post-disaster image with the operator H:

H = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{pmatrix}    (7)
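The sliding multiply-and-sum described above can be sketched directly in NumPy with the standard Prewitt and eight-neighborhood Laplacian templates. Taking absolute values before averaging the two Prewitt responses is our assumption, since the text does not state the sign convention; function names are ours.

```python
import numpy as np

# Standard Prewitt templates (d_x, d_y) and the 8-neighbourhood Laplacian (H).
DX = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
DY = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]])
H = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])

def slide(img, kernel):
    """3x3 sliding multiply-and-sum over the valid region of the image."""
    rows, cols = img.shape
    out = np.zeros((rows - 2, cols - 2))
    for r in range(rows - 2):
        for c in range(cols - 2):
            out[r, c] = (img[r:r + 3, c:c + 3] * kernel).sum()
    return out

def prewitt_edge(img):
    # Average the horizontal and vertical responses; absolute values are our
    # assumption, as the sign convention is not stated in the text.
    return (np.abs(slide(img, DX)) + np.abs(slide(img, DY))) / 2

def laplacian_edge(img):
    return np.abs(slide(img, H))
```

On a flat region both operators respond with zero; a vertical intensity step triggers d_x but not d_y, which is the behavior the averaging combines.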

Building Index Characteristics
The Morphological Building Index (MBI) serves as a quantitative descriptor of urban architectural morphological features [32]. This index leverages specific spectral properties combined with a sequence of morphological operations to extract pertinent building information. Earthquakes inflict substantial damage on the morphological characteristics of structures, which is reflected in alterations to the MBI in images of destroyed buildings. In this study, the MBI is utilized to extract characteristics of buildings damaged by earthquakes. The methodological approach involves the following steps:
1. Select the maximum pixel value in the optical image for analysis;
2. Perform morphological white top-hat reconstruction on the resultant image;
3. Calculate the Differential Morphological Profiles (DMP);
4. Compute the Morphological Building Index (MBI).
The outcomes of this process are illustrated in Figure 12b.
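The steps above can be illustrated with a deliberately simplified sketch. The standard MBI uses opening-by-reconstruction with linear structuring elements in four directions; here we substitute a plain grayscale opening along only the row and column axes, so this is an approximation of steps 2-4 under stated assumptions, not the paper's implementation, and all names (`directional_opening`, `mbi`, `scales`) are ours.

```python
import numpy as np

def directional_opening(img, length, axis):
    """Grayscale opening (erosion then dilation) with a flat line structuring
    element of the given length along `axis`, using edge padding."""
    pad = length // 2
    def filt(a, reduce_fn):
        pads = [(pad, pad) if ax == axis else (0, 0) for ax in range(a.ndim)]
        padded = np.pad(a, pads, mode="edge")
        out = np.empty_like(a, dtype=float)
        for i in range(a.shape[axis]):
            window = np.take(padded, range(i, i + length), axis=axis)
            idx = [slice(None)] * a.ndim
            idx[axis] = i
            out[tuple(idx)] = reduce_fn(window, axis=axis)
        return out
    return filt(filt(img.astype(float), np.min), np.max)

def mbi(brightness, scales=(3, 5, 7)):
    """Simplified MBI: white top-hat (brightness minus opening) at several
    scales and two directions (step 2), differential profile (step 3), and
    their mean as the index (step 4)."""
    dmp = []
    for axis in (0, 1):
        wth = [brightness - directional_opening(brightness, s, axis) for s in scales]
        dmp += [np.abs(wth[k + 1] - wth[k]) for k in range(len(wth) - 1)]
    return np.mean(dmp, axis=0)
```

A small bright block on a dark background (a building-like structure) receives a noticeably higher index than the flat background, which is the property the MBI exploits.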


Time Characteristics
In this study, the 3D Gray-Level Co-occurrence Matrix (3DGLCM) was employed to capture temporal features for image computation and extraction. Earthquakes induce noticeable spectral and textural variations in damaged buildings, discernible in dual-temporal images. The application of 3D gray-level co-occurrence matrices facilitates the effective extraction of these variations and their integration into machine learning tasks [19]. Conventionally, 2D gray-level co-occurrence matrices statistically assess the probability of adjacent gray levels along a specific direction, as illustrated in Figure 13a. These matrices can be computed in multiple directions on a two-dimensional plane, as shown in Figure 13b. The 3D gray-level co-occurrence matrix extends this concept by aggregating pixel values between dual-temporal images for comprehensive statistical analysis, as depicted in Figure 13c. In the proposed feature set, texture feature values are derived using 3D gray-level co-occurrence matrices, with the features subsequently formatted into raster images to capture 3D temporal texture characteristics. The results of these generated feature values are displayed in Figure 14c-f.
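One simple reading of "aggregating pixel values between dual-temporal images" is to pair the gray level at each pixel in the pre-disaster image with the gray level at the same position in the post-disaster image; the paper's exact pairing (offsets across the two dates) may differ, and the function names here are ours. The usual GLCM texture statistics can then be computed from the resulting matrix.

```python
import numpy as np

def quantize(img, levels=8):
    """Map image values onto `levels` gray levels (simplification)."""
    return (img.astype(float) / (img.max() + 1e-9) * (levels - 1)).astype(int)

def glcm_3d(pre, post, levels=8):
    """Co-occurrence of the gray level at each pixel in the pre-disaster image
    with the gray level at the same position in the post-disaster image --
    one simple reading of the 3D (dual-temporal) extension."""
    P = np.zeros((levels, levels), dtype=float)
    for i, j in zip(quantize(pre, levels).ravel(), quantize(post, levels).ravel()):
        P[i, j] += 1
    return P / P.sum()
```

When the pre- and post-disaster images are identical (no change), all mass lies on the diagonal; off-diagonal mass indicates spectral change between the two dates, which is what damaged buildings produce.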



Sources of Data Noise
In the feature set we propose, a small subset of the post-disaster data is affected by cloud and fog, which obscures the imagery of both intact and damaged buildings, resulting in the loss of architectural features and the introduction of noise across various feature maps. Furthermore, in acquiring pre-disaster data for the Moroccan earthquake, we encountered variability in the timing and quality of image sources, with some data completely obscured by clouds and fog. This inconsistency has substantially impacted the computations involving the 3D Gray-Level Co-occurrence Matrix. In machine learning tasks, if models converge poorly, we suggest excluding the datasets affected by clouds. To facilitate the exclusion of cloud-affected pre-disaster data from research applications, we have cataloged these data in a TXT document. However, if experiments are conducted using only post-disaster data, this issue may be disregarded.
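Excluding the cataloged data can be done with a few lines of Python. The layout of the TXT document is not specified here, so one file name per line is an assumption to be checked against the dataset; both function names are ours.

```python
from pathlib import Path

def load_cloud_affected(txt_path):
    """Read the catalogue of cloud-affected pre-disaster images.

    Assumes one file name per line; verify against the TXT document actually
    shipped with the dataset.
    """
    text = Path(txt_path).read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}

def exclude_cloudy(image_paths, cloud_affected):
    """Drop samples whose file name appears in the catalogue."""
    return [p for p in image_paths if Path(p).name not in cloud_affected]
```

Filtering by file name keeps the rest of the loading pipeline unchanged, which matches the suggestion above of simply excluding the affected datasets.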

Conclusions
The interpretation of building damage through remote sensing is critically important for post-earthquake emergency response and other related applications. In this study, optical images produced by multiple remote sensing satellites were selected, amassing a comprehensive dataset of 7062 pre- and post-disaster images. This dataset was utilized to construct a large-scale feature set aimed at analyzing buildings damaged by multiple major earthquakes. With regard to data annotation, this study classifies post-disaster buildings into three categories and provides annotations for images showing severe damage and complete collapse. For feature generation, the study developed a feature set that includes five branches, encompassing a total of 16 feature values. To aid researchers, each feature image was resized to 512 × 512 pixels, facilitating straightforward integration into subsequent deep learning algorithms. This paper announces the public release of the Multi-scale Earthquake Damaged Building Feature Set (MEDBFS), actively encouraging contributions from the research community to enhance this dataset. Our team is committed to the continuous improvement of the feature set in forthcoming studies. Ultimately, it is anticipated that the availability of this openly accessible feature set will propel advancements in the field of remote sensing-based post-disaster building detection.

Figure 1. Proportion of pre- and post-disaster image samples for each earthquake event.


Figure 2. Folders' structure, taking the Chinese earthquake as an example.



Figure 8. Construction of damaged building features.


Figure 9. Spectral feature set example of the China earthquake. (a) Post-disaster optical image; (b) Red feature; (c) Green feature; (d) Blue feature.


Figure 11. Edge feature set example of the China earthquake. (a) Post-disaster optical image; (b) Prewitt edge feature; (c) Laplacian edge feature.


Figure 12. MBI feature set example of the China earthquake. (a) Post-disaster optical image; (b) MBI feature.


Table 2. Data source details.

Table 3. Damaged building marking standards of xBD.
No damage: Undisturbed. No sign of water, structural or shingle damage, or burn marks.

Table 4. Damaged building marking standards.

                 Predicted Positive     Predicted Negative
Actual Positive  True Positive (TP)     False Negative (FN)
Actual Negative  False Positive (FP)    True Negative (TN)