Convolutional Neural Network-Based Rapid Post-Earthquake Structural Damage Detection: Case Study

Immediately after an earthquake, the structural damage condition of essential buildings must be detected in order to identify safe structures, evacuate occupants, or resume crucial activities. For this reason, a previously proposed CNN methodology for detecting the structural damage condition of a building is improved and validated here for two currently instrumented essential buildings (Tahara City Hall and Toyohashi Fire Station). Three-dimensional frame models are used for the buildings instead of lumped mass models. In addition, a methodology to select records is introduced to reduce the variability of the structural responses. The maximum inter-storey drift and absolute acceleration of each storey are used as damage indicators. The accuracy is evaluated in terms of the usability of the building, total damage condition, storey damage condition, and total comparison of the damage indicators. The maximum accuracy and R² of the responses are 90.0% and 0.825, respectively, for the Tahara City Hall building, and 100% and 0.909, respectively, for the Toyohashi Fire Station building.


Introduction
The damage condition of essential buildings immediately after an earthquake is one of the most critical indicators for subsequent decision-making by the government, owners, and stakeholders. Acceleration or displacement recordings from instrumented buildings during earthquakes offer valuable information to identify and monitor the extent of their damage. Thus, structural health monitoring is an expanding field that allows procedures to be established for screening the structural status of buildings. For example, with federal buildings in mind, Mehmet Çelebi of the United States Geological Survey (USGS) reported guidelines for the seismic instrumentation of structures as part of a US project in 2002 [1]. The structural information obtained from a building after an earthquake can serve different purposes, depending on the level of detail. For instance, the usability of the building, which concerns the protection of its occupants, is needed immediately, whereas a deep understanding of the structural behavior is obtained only after months [2]. However, current conventional post-earthquake procedures report the usability of buildings days or even weeks after the event [3].
Structural responses such as the maximum inter-storey drift and acceleration can be used in order to determine structural integrity. Hazus is a geographic information system-based natural hazard analysis tool, used by the Federal Emergency Management Agency of the USA, and the Hazus earthquake model evaluates the damage probability of buildings and infrastructures considering inter-storey drift and acceleration limits to establish the structural and nonstructural damage states [4].
The accuracy of the structural responses depends mainly on the adopted structural model and the type of structural analysis. For example, a lumped mass model (LMM) is less accurate than a three-dimensional frame model (3D-FM) of high-rise or irregular [...] maximum ductility ratio, the inter-storey drift, and the maximum absolute acceleration of each storey of the LMM using the acceleration record of a single sensor located on the top floor of the building, where the wavelet spectra were obtained from the absolute acceleration of this sensor and used as images for the input maps.
In this study, the authors update and improve the damage identification method proposed by Moscoso Alcantara et al. [20] as follows:
• The structural models are 3D-FM. This allows all buildings to have different lateral force-resisting systems, structural configurations, material types, and elastic and inelastic behavior of their members;
• The validation of the CNN model is applied to two instrumented buildings in Japan;
• The structural responses used as damage identifiers are the maximum inter-storey drift (SD) and the maximum absolute acceleration (AA) of each storey of the target buildings;
• A methodology to select records for each damage identifier is introduced using the Incremental Dynamic Analysis (IDA) responses of each target building, where the ground motions are scaled in order to cover the elastic and inelastic behavior of the target building;
• The input map data for training the CNN model use the Wavelet Power Spectrum (WPS) computed from the absolute acceleration response measured by the sensor located on the top floor of each target building.
The accuracy of the results is evaluated by comparing the damage condition of the building with the reference values.
Although training and validating the CNN model is computationally intensive, once the CNN model is developed, the CNN algorithm trained for the target building can automatically predict the elastic and inelastic structural responses, and detect the damage condition immediately after the earthquake. This paper contains sections as follows: In Section 2, the methodology is presented, including an overview of the proposed research procedure. Section 3 shows the general information about the target buildings. Section 4 establishes the nonlinear structural models used for the target buildings. Damage levels based on the maximum inter-storey drift and absolute acceleration as damage identifiers are defined in Section 5. Furthermore, Section 6 presents the methodology used to select records from the database to reduce the variability of structural responses. Section 7 specifies the wavelet power spectrum used as input data for the CNN model, and its procedure and characteristics are described in Section 8. Additionally, Section 9 defines the training and validation process used to obtain a trained CNN model. This is applied to the target buildings, and their prediction results are shown in Section 10. Finally, in Section 11, a summary of conclusions and a discussion of the research results are presented.

Methodology
As shown in Figure 1, there are two processes for obtaining the ML (CNN) model in this methodology. They are called the training process (TP) and the validation process (VP). Hence, the selected records (see Section 6) are divided into two parts for each process. In total, 90% of the ground motions are used for the TP, and 10% for the VP.
In the TP, there are two subprocesses: the data preparation process and the model training process. In the data preparation process, the input data (WPS) and the output reference (SD or AA on each floor) are obtained from the IDA. In the model training process, the trained CNN model is obtained, which is then used in the VP.
In the VP, the data preparation process is used to obtain new input data and output references. Subsequently, the trained CNN model (from the TP) is validated by comparing its predictions with the reference structural damage identification, and the model with the highest accuracy is adopted.

Target Buildings
Two target buildings are considered to validate a CNN model. They are instrumented buildings in Japan: Tahara City Hall (steel structural system) and Toyohashi Fire Station building (steel-reinforced concrete structural system).

Tahara City Hall Building
The Tahara City Hall is a local government office building located in Tahara city of Aichi prefecture in Japan (see Figure 2). The building is instrumented, and the location of the sensor is shown in Figure 3. The main structural characteristics are as follows:

• The structural system of the building is a moment-resisting frame in steel;
• The X-direction presents an irregular configuration in its elevation (see Figure 3). Only the X-direction is analyzed in this study.

Toyohashi Fire Station Building
The Toyohashi Fire Station is located in Toyohashi city of Aichi prefecture in Japan (see Figure 4). The building is instrumented, and the location of the sensor is shown in Figure 5. The main structural characteristics are as follows:

• The structural system of the building is a moment-resisting frame in steel-reinforced concrete (SRC);
• The number of floors is six with a basement, and the typical storey height is 4.00 m;
• The storey weights are basement = 18,019 kN, 1st storey = 14,570 kN, 2nd storey = 12,483 kN, 3rd storey = 12,470 kN, 4th storey = 13,043 kN, 5th storey = 12,412 kN, 6th storey = 11,834 kN, and 7th storey = 10,588 kN;
• The steel I cross-sections are embedded in RC rectangular beams and columns;
• Both the X- and Y-directions are regular configurations, as shown in Figure 5. Only the X-direction is analyzed in this study.
In this study, the signals of the sensors have been simulated from the 3D structural models, which were constructed element by element based on the structural drawings. During an earthquake, the sensors are activated and record the acceleration once a threshold is reached. Subsequently, all data are automatically stored in the cloud and can be used to assess the damage. For future research, however, methods for reconstructing data that are missing due to anomalies and other factors are recommended [21,22].

Nonlinear Structural Models for the Target Buildings
The structural models for the target buildings consist of three-dimensional elements with elastic and inelastic behavior. The software STERA_3D [23], developed by one of the authors, was used. The frame beam elements are modeled using nonlinear flexural springs at their ends and a nonlinear shear spring in the middle, as shown in Figure 6. Likewise, the frame column elements are modeled as multi-spring models considering a nonlinear interaction between the bidirectional flexural and axial effects (Mx-My-Nz, as shown in Figure 8a). The springs are distributed in the RC and steel cross-sections, as shown in Figure 8b. Moreover, Figure 8c shows the hysteresis model for the steel and concrete springs. The nonlinear shear springs in the X- and Y-directions are defined independently.

Damage Identification of the Target Buildings
The inelastic 3D-FMs for the target buildings are made to obtain structural responses in detail. The damage condition will be identified from the maximum structural responses SD and AA on each storey.

Selection of Ground Motion Records
A record database was generated from the records obtained in the Center for Engineering Strong Motion Data by USGS and the California Geological Survey. This center receives worldwide records from the cooperation of international strong-motion seismic networks [24].
In order to retain high-intensity ground motions while limiting the number of analysis samples (fewer than 3000), records with a PGA greater than 400 gal have been selected, and the time range from 5% to 95% of the Arias intensity has been used for each record [25]. In total, 183 ground motion records have been selected for the database in this study.
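The two screening criteria above can be illustrated with a minimal sketch. It assumes that the 5% to 95% criterion means trimming each record to the window bounded by those fractions of the cumulative Arias intensity; the function names `arias_window` and `select_strong_records` and the record names are hypothetical and are not taken from the authors' implementation.

```python
def arias_window(acc, dt, lower=0.05, upper=0.95):
    """Return (start, end) sample indices bounding the lower-upper Arias range."""
    # Cumulative integral of a(t)^2. The constant pi / (2 g) of the Arias
    # intensity cancels once the curve is normalized by its final value.
    cumulative = []
    total = 0.0
    for a in acc:
        total += a * a * dt
        cumulative.append(total)
    if total == 0.0:
        raise ValueError("record has zero energy")
    start = next(i for i, c in enumerate(cumulative) if c / total >= lower)
    end = next(i for i, c in enumerate(cumulative) if c / total >= upper)
    return start, end


def select_strong_records(records, dt, pga_min=400.0):
    """Keep records with PGA > pga_min (gal), trimmed to their Arias window."""
    selected = {}
    for name, acc in records.items():
        pga = max(abs(a) for a in acc)
        if pga > pga_min:
            start, end = arias_window(acc, dt)
            selected[name] = acc[start:end + 1]
    return selected


# Hypothetical toy records (accelerations in gal), for illustration only.
records = {
    "strong": [0.0, 0.0, 500.0, -500.0, 0.0, 0.0],
    "weak": [0.0, 100.0, -100.0, 0.0],
}
trimmed = select_strong_records(records, dt=0.01)
```

In this toy case only the record named "strong" passes the PGA criterion, and its leading and trailing zero samples are removed.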
In the structural analysis, the variability of the structural responses of the building depends mainly on the ground motion used. Moreover, the prediction accuracy of ML improves when the output variability is reduced. Therefore, a methodology to select the records based on the structural responses has been developed using the IDA of each target building. Figure 9 shows the procedure for determining the ground motion records from the database. For the IDA, the demand measure is either the SD or AA (on the vertical axis), and the intensity measure is the 5% damped spectral acceleration at the fundamental period (Sa(T1, 5%) on the horizontal axis). Sa(T1, 5%) is selected to represent the seismic intensity because the main modal mass contribution is obtained at the fundamental period. In addition, a normal distribution is considered to represent the variability of the structural responses along Sa(T1, 5%). Thus, 68% of the structural responses lie within ±1σ (one standard deviation) of the mean, resulting in a confidence interval from the 16% to the 84% fractile.
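The fractile band described above can be sketched as follows for the responses obtained at a single intensity level. This is an illustration under stated assumptions: a normal distribution is fitted with the sample mean and sample standard deviation, the function names are hypothetical, and the drift values are invented for the example.

```python
from statistics import NormalDist, mean, stdev


def fractile_band(responses, lower=0.16, upper=0.84):
    """Fit a normal distribution and return the (lower, upper) fractile values."""
    mu = mean(responses)
    sigma = stdev(responses)  # sample standard deviation (an assumption)
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(lower), dist.inv_cdf(upper)


def below_upper_fractile(responses, upper=0.84):
    """Indices of the responses that do not exceed the upper fractile."""
    limit = NormalDist(mean(responses), stdev(responses)).inv_cdf(upper)
    return [i for i, value in enumerate(responses) if value <= limit]


# Hypothetical inter-storey drifts of five records at one Sa(T1, 5%) level.
drifts = [0.004, 0.005, 0.006, 0.007, 0.012]
low, high = fractile_band(drifts)
kept = below_upper_fractile(drifts)
```

Here the fifth record lies above the 84% fractile and is left out of `kept`, which is the kind of outlier removal that reduces the output variability.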
Therefore, in order to cover the elastic and inelastic behavior ranges, the IDA curves use a confidence interval from the 0% to the 84% fractile of the structural responses. In addition, 1/50 has been established as the maximum inter-storey drift limit. The selected records and the maximum Sa(T1, 5%) are those that satisfy both of these conditions for SD. The IDA scale factors are chosen such that the spectral acceleration increment (∆Sa) is 25 gal if the maximum Sa(T1, 5%) is less than 1000 gal, and 50 gal if it is greater than 1000 gal. For the IDA curves for AA, the same confidence interval, scale factors, and maximum Sa(T1, 5%) have been used to find the selected ground motion records. These criteria have been developed after several structural analyses carried out in this research.
Figures 12 and 13 show the acceleration response spectra (Sa) of the selected records, subdivided into the training and validation records used for the CNN models for the SD and AA analyses. The validation records for the SD and AA analyses are the same, in order to consider the same earthquake events. Additionally, Sa is 100 gal at the fundamental period of each target building.
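The scaling rule and the two selection conditions can be sketched as below. This is one possible reading rather than the authors' procedure: the helper names are hypothetical, a maximum Sa(T1, 5%) of exactly 1000 gal is assumed to use the 50 gal increment (the text does not specify this case), and a record is assumed to be kept only if its SD stays below both the 84% fractile curve and the 1/50 drift limit at every intensity level.

```python
def ida_levels(sa_max):
    """Target Sa(T1, 5%) levels in gal, from one increment up to sa_max."""
    # Assumption: exactly 1000 gal falls in the 50 gal branch.
    step = 25.0 if sa_max < 1000.0 else 50.0
    count = int(sa_max // step)
    return [step * (k + 1) for k in range(count)]


def scale_factors(sa_record, sa_max):
    """Factors that scale a record with unscaled Sa(T1, 5%) = sa_record."""
    return [level / sa_record for level in ida_levels(sa_max)]


def within_limits(sd_curve, fractile84_curve, drift_limit=1.0 / 50.0):
    """True if the SD curve stays below the 84% fractile and the drift limit."""
    return all(sd <= min(upper, drift_limit)
               for sd, upper in zip(sd_curve, fractile84_curve))


levels = ida_levels(100.0)            # four levels, 25 gal apart
factors = scale_factors(50.0, 100.0)  # record with unscaled Sa(T1, 5%) = 50 gal
```

Each scale factor multiplies the whole acceleration record, so that its spectral acceleration at the fundamental period reaches the corresponding target level.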

Selection of Records for Tahara City Hall Building
After the selection, 130 and 142 records have been chosen for SD and AA, respectively, as shown in Figure 10b,d.
The selected records are split into 120 and 132 records for the training of the CNN models for SD and AA, respectively. In addition, the same ten records are used for the validation process in both analyses.
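A split that reproduces these counts while keeping the validation records common to both analyses can be sketched as follows. How the validation records were actually chosen is not stated in the text, so the seeded random sampling, the function name `split_records`, and the record identifiers below are assumptions made for illustration.

```python
import random


def split_records(sd_records, aa_records, n_validation, seed=0):
    """Pick validation records shared by both analyses; the rest are for training."""
    shared = sorted(set(sd_records) & set(aa_records))
    if len(shared) < n_validation:
        raise ValueError("not enough records common to both analyses")
    # Assumption: validation records are drawn at random from the shared ones.
    validation = set(random.Random(seed).sample(shared, n_validation))
    sd_train = [r for r in sd_records if r not in validation]
    aa_train = [r for r in aa_records if r not in validation]
    return sd_train, aa_train, sorted(validation)


# Hypothetical record identifiers: 130 records for SD and 142 for AA.
sd_selected = [f"rec{i:03d}" for i in range(130)]
aa_selected = [f"rec{i:03d}" for i in range(20, 162)]
sd_train, aa_train, validation = split_records(sd_selected, aa_selected, 10)
```

With these inputs the sketch returns 120 training records for SD, 132 for AA, and ten shared validation records.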

Selection of Records for Toyohashi Fire Station Building
After the selection, 115 and 144 records have been chosen for the CNN models for SD and AA, respectively, as shown in Figure 11b,d.
The selected records have been split into 99 and 126 records for the training of the CNN models for SD and AA, respectively. In addition, the same 16 records are used for the validation process in both analyses.

Wavelet Power Spectrum as Input Data of CNN
The acceleration record of the upper floor is obtained from the sensor installed on the target building. Since these records are non-stationary signals, they are transformed in order to capture their characteristics in the time and frequency domains. In this study, the wavelet transform is used.
Wavelet functions convolve the original signal into a space and scale field. The scale decomposition (related to the frequency domain) is obtained by dilating and contracting the wavelet, whereas the space decomposition comes from shifting the wavelet in time (position) [26,27].
The wavelet function is called the mother wavelet. The wavelet used in this study is the Morlet wavelet (a complex-valued wavelet), which is the product of a sine (complex exponential) wave and a Gaussian envelope, as defined by Equation (1) [27], where ω0 is the nondimensional frequency. In this study, ω0 is taken to be 6 in order to satisfy the admissibility condition, according to [26]. Subsequently, ψ0 is normalized so that its total energy remains constant when it is scaled. Furthermore, the parameters "a" and "b" are included in the Morlet wavelet in order to modify the scale and the position in time (translation), respectively. The normalized Morlet wavelet is defined by Equation (2).
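For reference, the Morlet wavelet and its scaled, translated version are commonly written in the form below. The symbol η (nondimensional time) and the 1/√a normalization are assumptions based on the usual continuous-time convention and may differ in detail from the authors' Equations (1) and (2); for sampled signals the factor is often written as (δt/a)^{1/2}, with δt the sampling interval.

```latex
\psi_0(\eta) = \pi^{-1/4} \, e^{i \omega_0 \eta} \, e^{-\eta^2 / 2}

\psi_{a,b}(t) = \frac{1}{\sqrt{a}} \, \psi_0\!\left( \frac{t - b}{a} \right)
```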
The continuous wavelet transform (CWT) of a discrete signal s(t) is defined by Equation (3), where "N" is the number of samples of the signal and the asterisk (*) indicates the complex conjugate of the wavelet. The CWT is complex-valued because the Morlet wavelet is complex. Therefore, the modulus of the CWT is the wavelet spectrum (WS), defined by Equation (4), and the square of the modulus is the wavelet power spectrum (WPS), defined by Equation (5).
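The chain from signal to WPS described above can be sketched with a direct (slow) summation. This is an illustration under assumptions rather than the authors' implementation: the Morlet wavelet is taken in its usual form π^(-1/4) exp(iω0η) exp(-η²/2), the normalization factor is sqrt(dt/a), and the function names are hypothetical. A practical implementation would normally use an FFT-based transform.

```python
import cmath
import math


def morlet(eta, omega0=6.0):
    """Morlet mother wavelet evaluated at nondimensional time eta."""
    return math.pi ** -0.25 * cmath.exp(1j * omega0 * eta) * math.exp(-0.5 * eta * eta)


def wavelet_power_spectrum(signal, dt, scales, omega0=6.0):
    """Return wps[i][j] = |CWT(a_i, b_j)|^2 with b_j = j * dt, by direct summation."""
    n = len(signal)
    wps = []
    for a in scales:
        norm = math.sqrt(dt / a)  # keeps the wavelet energy constant across scales
        row = []
        for j in range(n):
            coeff = 0j
            for k in range(n):
                eta = (k - j) * dt / a
                coeff += signal[k] * morlet(eta, omega0).conjugate()
            row.append(abs(norm * coeff) ** 2)  # WS is abs(...), WPS is its square
        wps.append(row)
    return wps


# Toy signal: a 2 Hz sine sampled at 50 Hz for 4 s (not a building response).
dt = 0.02
signal = [math.sin(2.0 * math.pi * 2.0 * k * dt) for k in range(200)]
wps = wavelet_power_spectrum(signal, dt, scales=[0.1, 0.484, 2.0])
```

For ω0 = 6 the scale is close to the Fourier period, so the middle scale (about 0.5 s) matches the 2 Hz sine and carries by far the largest power. Each row of `wps` is one row of the image that would be passed to the CNN.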
WS was used by Moscoso et al. for the maximum absolute acceleration response on the upper floor. These results were used as images for the input data of the CNN model [20]. However, since the WPS squares the coefficients of the WS, the main characteristics of the signal are intensified, which helps to train the CNN model. For example, Figure 14 shows a random acceleration response with its WS and WPS in 2D and 3D. Notice that the WPS depicts the main frequencies more clearly than the WS.

Convolutional Neural Network (CNN) Model
Figure 15 shows a general CNN model. The architecture of the CNN model in this study is composed of input, convolutional, pooling, and fully connected layers. Each target building has its own CNN model architecture, which depends on its accuracy after evaluation. The input layer is the set of images; in this study, the CNN uses the images obtained from the WPS. The convolutional layer comprises the convolution operation, padding, and the activation function, as shown in Figure 16. A convolution is a mathematical operation on two functions, which in a CNN are arrays of data. The first array is the set of WPS, and the second is a set of filters used to extract and learn the main features of the first. These filters are called kernels (K) or feature detectors. Since a CNN is a type of artificial neural network (ANN), a kernel is a set of weights that are updated during the training of the CNN model.
Depending on the prediction task, a CNN operates on one-, two-, or three-dimensional data. This study uses two-dimensional data. Equation (6) defines the convolutional operation in the CNN [28], where FM is the feature map, WPS is the wavelet power spectrum used as input data, and K is the kernel array. In order to keep the feature map the same size as the original image, the same-padding (zero-padding) method is used in this study. This method adds zeros along the border of the original image. Furthermore, every value of the feature map is passed through an activation function, which allows the CNN model to learn nonlinear features. The rectified linear unit (ReLU) function is used in this study and is defined by Equation (7):
$$f(x) = \max(0, x) \qquad (7)$$
The pooling layer reduces the resolution and size of the feature maps, resulting in a lower computational cost. Maximum pooling (max-pooling) is used in this research. Max-pooling divides the feature maps into sub-arrays of size P × P and returns the maximum value of each sub-array. The CNN model of Moscoso et al. converged more effectively without max-pooling. Nonetheless, the CNN model used for this study converged more efficiently with max-pooling layers because of the larger amount of data. The number of convolutional and max-pooling layers depends on the architecture of the CNN model (see Table 2).
After the convolutional and max-pooling layers, the outputs are flattened into a one-dimensional array, the fully connected layer. This layer is connected to the dense layer, which provides the prediction results. The prediction results are the structural responses of each storey of the target building, which, as mentioned, are used as damage indicators. Training is an iterative process performed until the lowest Mean Squared Error (MSE) is found. The MSE is the error measure used in this study and is defined by Equation (8):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_{pred,i} - y_{ref,i} \right)^2 \qquad (8)$$
where y_pred and y_ref are the prediction and reference results, respectively, and N is the number of samples. Each iteration is called an "epoch", and 50 epochs are considered in this study. The criteria of the error measure could be modified in order to improve the forecasting accuracy [29].
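The layer operations described in this section can be illustrated with a small sketch on plain nested lists. It is not the authors' model: the function names are hypothetical, the kernel is applied without flipping (cross-correlation, as is common in deep learning libraries, which is an assumption here), and odd kernel sizes are assumed so that the zero padding is symmetric.

```python
def conv2d_same(image, kernel):
    """2D convolution with zero padding, so the output keeps the input size."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for m in range(k_rows):
                for n in range(k_cols):
                    r, c = i + m - pad_r, j + n - pad_c
                    if 0 <= r < rows and 0 <= c < cols:  # outside counts as zero
                        total += image[r][c] * kernel[m][n]
            out[i][j] = total
    return out


def relu(feature_map):
    """Apply max(0, x) to every value of the feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, p):
    """Maximum of each P x P sub-array of the feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r][c]
                 for r in range(i, min(i + p, rows))
                 for c in range(j, min(j + p, cols)))
             for j in range(0, cols, p)]
            for i in range(0, rows, p)]


def mse(y_pred, y_ref):
    """Mean squared error between predictions and reference values."""
    return sum((p - r) ** 2 for p, r in zip(y_pred, y_ref)) / len(y_ref)


# Toy 4 x 4 input and a 3 x 3 kernel that only passes the central pixel.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
pooled = max_pool(relu(conv2d_same(image, identity)), 2)
```

In a trained model the kernel values are the learned weights, and the pooled feature maps are flattened before the dense layer produces the SD or AA of each storey.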
The hyperparameters used in this study are shown in Table 2. They were obtained after several rounds of training and validation. However, dedicated methods are recommended to optimize the hyperparameters [30,31].

Training and Validation Processes
As mentioned in the methodology (Section 2), there are two processes used to obtain a trained CNN model, the training process (TP) and the validation process (VP).
In the TP, the CNN model is trained using the WPS of the absolute acceleration on the top floor of the target building. The prediction results of the CNN model are then compared to the SD and AA of each floor of the target building, which are obtained from the IDA numerical procedure (called the data preparation process in the methodology). The TP provides a model that is ready to make predictions; however, its accuracy should be checked in the validation process.
In the VP, new input data are obtained by the data preparation process. The new WPS is fed into the trained CNN model, which automatically predicts the results (SD or AA). The following results are compared to reference data:

• The usability of the building, in which the availability of the building occupancy is evaluated after an earthquake;
• The total damage condition, in which it is possible to identify the damage state of the target building;
• Storey damage condition, in which it is possible to identify the damage state of each floor of the target building;
• Total comparison of the SD or AA.
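The first three comparisons can be sketched as a mapping from the predicted SD of each storey to discrete damage states. The damage-level definitions of the paper are not reproduced in this text, so everything below is hypothetical: the threshold values are placeholders (only the 1/50 drift limit appears in the text), the total damage condition is assumed to be the worst storey state, and the building is assumed unusable from a chosen state onward.

```python
def damage_state(sd, thresholds):
    """Damage state index of one storey: the number of thresholds reached."""
    return sum(1 for limit in thresholds if sd >= limit)


def assess(sd_by_storey, thresholds, unusable_from):
    """Return (usable, total_state, storey_states) for one earthquake."""
    storey_states = [damage_state(sd, thresholds) for sd in sd_by_storey]
    total_state = max(storey_states)     # assumption: the worst storey governs
    usable = total_state < unusable_from  # assumption: usability cut-off state
    return usable, total_state, storey_states


# Placeholder drift thresholds and predicted drifts, for illustration only.
thresholds = (1.0 / 200.0, 1.0 / 100.0, 1.0 / 50.0)
usable, total_state, storey_states = assess(
    [0.002, 0.012, 0.006], thresholds, unusable_from=2)
```

Running the same function on the reference SD values gives the true labels against which the predicted usability, total state, and storey states are compared.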
In general, one of the greatest advantages of the ML method in SHM is the rapid prediction when an earthquake occurs. In other words, even though the TP and VP take a long time to produce the final CNN model, they are carried out before the earthquake, and the prediction is then obtained automatically. Therefore, it is possible to identify the damage states of actual buildings (3D regular or irregular structural configurations) immediately after the earthquake.

Prediction Results of Target Buildings
The results of predicting the responses and damage levels of the target buildings are presented in Figures 17 to 20.
A confusion matrix is used to evaluate the prediction accuracy of the total and storey damage conditions (see Figures 17b,c, 18b,c, 19b,c and 20b,c). The confusion matrix represents the correct and incorrect predictions through the number of coincidences with the reference data. The rows and columns of the matrix are tagged as the predicted and the true labels, respectively. Therefore, the number of well-matched predictions is located on the diagonal of the matrix.
The total comparison of the damage indicators is evaluated with the coefficient of determination (R²), defined by Equation (9):
$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_{ref,i} - y_{pred,i} \right)^2}{\sum_{i=1}^{N} \left( y_{ref,i} - \bar{y}_{ref} \right)^2} \qquad (9)$$
where y_pred and y_ref are the prediction and reference results, respectively, ȳ_ref is the mean of the reference values, and N is the number of samples.
Table 3 shows the evaluation accuracy of the target buildings in the VP. The results are summarized below:

• For the Tahara City Hall building, the maximum accuracy and R² are 90.0% (usability of the building) and 0.825, respectively;
• For the Toyohashi Fire Station building, the maximum accuracy and R² are 100% (damage condition of the basement) and 0.909, respectively;
• In general, the accuracy of the estimation of SD is the highest.
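The two accuracy measures reported above can be computed as in the following sketch. The function names are hypothetical and the sample labels are invented; the confusion matrix follows the convention stated in this section (rows are predicted labels, columns are true labels), and R² uses the standard coefficient of determination.

```python
def confusion_matrix(predicted, true, n_states):
    """matrix[p][t] counts samples predicted as state p whose true state is t."""
    matrix = [[0] * n_states for _ in range(n_states)]
    for p, t in zip(predicted, true):
        matrix[p][t] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def r_squared(y_pred, y_ref):
    """Coefficient of determination of predictions against reference values."""
    mean_ref = sum(y_ref) / len(y_ref)
    ss_res = sum((r - p) ** 2 for p, r in zip(y_pred, y_ref))
    ss_tot = sum((r - mean_ref) ** 2 for r in y_ref)
    return 1.0 - ss_res / ss_tot


# Invented damage-state labels for four validation records.
matrix = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], n_states=3)
acc = accuracy(matrix)
```

In this toy case three of the four predictions fall on the diagonal, so the accuracy is 0.75.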

Conclusions and Discussion
In this research article, a methodology previously proposed by the authors has been improved and applied to two instrumented buildings in Aichi Prefecture in Japan, Tahara City Hall and Toyohashi Fire Station. The proposed methodology is summarized as follows:
• CNN models are trained per target building using the WPS of the absolute acceleration of the top floor record as input data to predict the SD and AA values. SD and AA are used as indicators to detect the damage state of the structures;
• A methodology to select records in order to reduce the variability of the structural responses using IDA is proposed, wherein the confidence interval between the 0% and 84% fractiles is adopted;
• The evaluation accuracy is discussed in terms of the usability of the building, total damage condition, storey damage condition, and total comparison of the damage indicator;
• The maximum accuracy and R² for the Tahara City Hall building are 90.0% (usability of the building) and 0.825, respectively;
• The maximum accuracy and R² for the Toyohashi Fire Station building are 100% (damage condition of the basement) and 0.909, respectively;
• In general, the accuracy of the estimation of SD is the highest.
Finally, the improved CNN-based methodology detects the structural damage condition of buildings using only one sensor on the top floor. Because the training and validation processes are completed beforehand, a prediction can be obtained immediately after an earthquake.