Assessment of Earthquake Destructive Power to Structures Based on Machine Learning Methods

: This study presents a machine learning-based method for the destructive power assessment of earthquake to structures. First, the analysis procedure of the method is presented, and the backpropagation neural network (BPNN) and convolutional neural network (CNN) are used as the machine learning algorithms. Second, the optimized BPNN architecture is obtained by discussing the inﬂuence of a di ﬀ erent number of hidden layers and nodes. Third, the CNN architecture is proposed based on several classical deep learning networks. To build the machine learning models, 50,570 time-history analysis results of a structural system subjected to di ﬀ erent ground motions are used as training, validation, and test samples. The results of the BPNN indicate that the features extraction method based on the short-time Fourier transform (STFT) can well reﬂect the frequency-/ time-domain characteristics of ground motions. The results of the CNN indicate that the CNN exhibits better accuracy (R 2 = 0.8737) compared with that of the BPNN (R 2 = 0.6784). Furthermore, the CNN model exhibits remarkable computational e ﬃ ciency, the prediction of 1000 structures based on the CNN model takes 0.762 s, while 507.81 s are required for the conventional time-history analysis (THA)-based simulation. Feature visualization of di ﬀ erent layers of the CNN reveals that the shallow to deep layers of the CNN can extract the high to low-frequency features of ground motions. The proposed method can assist in the fast prediction of engineering demand parameters of large-number structures, which facilitates the damage or loss assessments of regional structures for timely emergency response and disaster relief after earthquake.


Introduction
Destructive earthquake is a tremendous threat to the operation and security of civil infrastructures. For example, the Wenchuan earthquake in 2008 caused the collapse of a large number of buildings, resulting in more than 65,000 deaths [1]. Moreover, the Northridge earthquake in 1994 damaged California's transportation infrastructures, which seriously jeopardizes the transportation system [2]. The rapid assessment of earthquake destructive power to structures can provide a scientific basis for decision-makers, to facilitate the post-earthquake rescue and relief work.
The earthquake destructive power to structures is influenced by the frequency-/time-domain characteristics of ground motions and the nonlinear seismic performance of structures. In previous studies, the seismic fragility analysis [3] and capacity spectrum method [4] have been frequently used to assess the earthquake's destructive power to regional infrastructures [5,6]. With the advancement of high-performance computing technologies and the higher accuracy requirement for seismic damage assessment, the time-history analysis (THA)-based method is gradually applied to the seismic damage simulation of regional infrastructures [7][8][9][10][11]. The THA-based seismic damage simulation method can adequately consider the frequency-/time-domain characteristics of ground motions and the nonlinear seismic performance of structures, to reasonably reflect the destructive power of earthquake to structures. Furthermore, the THA-based seismic damage simulation method can be applied to different types of structural systems and reveal the whole process of failure mechanism of a structure [11,12]. Therefore, the THA-based seismic damage simulation is adopted as the benchmark method in this study.
To date, machine learning methods have been widely used in various fields, such as the recognition of visual objects, natural language, and music [13][14][15][16]. Moreover, there are also many applications of machine learning methods in earthquake engineering [17][18][19][20][21][22][23][24][25]. Xu et al. [17] used several machine learning methods to predict structural types based on regional building attribute data. Mangalathu et al. [18] suggested an artificial neural network-based multi-parameter fragility methodology that can generate bridge-specific fragility curves without grouping the bridge classes. Nguyen et al. [19] proposed a hybrid particle-swarm-optimization-based artificial neural network model to calculate horizontal displacement of columns subjected to an earthquake. Zhang et al. [20] proposed a physical-guided convolutional neural network, which can make full use of the physical model and only adopt a small training dataset to realize an accurate simulation of structural responses. Kiani et al. [21] proposed a structural fragility curve generation method based on multiple machine learning methods. Xiong et al. [22] proposed an automated building seismic damage assessment method using an uncrewed aerial vehicle (UAV) and a convolutional neural network (CNN), which can obtain the damage distribution of an area with an accuracy of 89.39%. Moreover, extensive research works have been reported on the machine learning-based seismic response simulation (Table 1). Mangalathu et al. [23,24] adopt various machine learning methods to predict the damage state of a structure. It is a classification problem, and the computational workload of the simulation is light. Nevertheless, in some cases, obtaining engineering demand parameters (EDPs) is more favorable for damage or loss assessment of a structure. In the study of Zhang et al. [25], long-short-term memory (LSTM) [26] network is used to predict the structural time-history response. The LSTM network requires a relatively large computational workload [27] and is not suitable for the response prediction of a large number of structures. This study focuses on the fast prediction of EDP under seismic excitation. Considering that the backpropagation neural network (BPNN) [28] and CNN [13] models are capable of simulating complex nonlinear problems and exhibit good computational efficiency, these two models are used to predict the EDP of structures subjected to earthquake, to reflect the destructive power of earthquake to structures. Specifically, in Section 2, the analysis procedure of the method is introduced; in Section 3, the details of the adopted structural model and the ground motion records are presented; in Section 4, two feature extraction methods of ground motions are proposed, and the BPNN is used to assess the earthquake destructive power to structures; in Section 5, the CNN is adopted to extract the features of ground Appl. Sci. 2020, 10, 6210 3 of 20 motions and predict the earthquake destructive power to structures. The proposed method aims to realize a rapid evaluation of earthquake destructive power to structures based on machine learning methods. It is expected to provide a useful reference for timely emergency response and disaster relief after the earthquake.

Analysis Procedure of the Proposed Method
Nonlinear THA of structures can well capture the frequency-/time-domain characteristics of ground motions and the elasto-plastic mechanical properties of structures-therefore, it is used to calculate the seismic responses of structures which forms the training set. The simulation framework proposed in this study mainly consists of two modules, which are sample preparation and machine learning method, as shown in Figure 1. The sample preparation module presents details of the structural system, ground motion records, and how each sample for model training is obtained. In the machine learning module, the BPNN and CNN are adopted as the specific machine learning methods, of which the model architecture and simulation results are given in Sections 4 and 5, respectively. The detailed simulation procedures are as follows: 1.
Establish the nonlinear finite element model of a structure, and perform nonlinear THA of the structure subjected for a large number of ground motion records. The structural response data (e.g., maximum displacement and acceleration), together with ground motion data, are collected as a training sample.

2.
Process the training sample data according to the machine learning method adopted. For example, extract feature parameters from ground motion records or adjust the ground motion records to identical lengths.

3.
Feed the machine learning model with the processed sample data and obtain the well-tuned machine learning model.

4.
Apply the well-tuned machine learning model and assess the destructive power of a new ground motion record by predicting the EDP, such as the maximum displacement of a structure.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 3 of 20 adopted to extract the features of ground motions and predict the earthquake destructive power to structures. The proposed method aims to realize a rapid evaluation of earthquake destructive power to structures based on machine learning methods. It is expected to provide a useful reference for timely emergency response and disaster relief after the earthquake.

Analysis Procedure of the Proposed Method
Nonlinear THA of structures can well capture the frequency-/time-domain characteristics of ground motions and the elasto-plastic mechanical properties of structures-therefore, it is used to calculate the seismic responses of structures which forms the training set. The simulation framework proposed in this study mainly consists of two modules, which are sample preparation and machine learning method, as shown in Figure 1. The sample preparation module presents details of the structural system, ground motion records, and how each sample for model training is obtained. In the machine learning module, the BPNN and CNN are adopted as the specific machine learning methods, of which the model architecture and simulation results are given in Section 4 and Section 5, respectively. The detailed simulation procedures are as follows: 1. Establish the nonlinear finite element model of a structure, and perform nonlinear THA of the structure subjected for a large number of ground motion records. The structural response data (e.g., maximum displacement and acceleration), together with ground motion data, are collected as a training sample. 2. Process the training sample data according to the machine learning method adopted. For example, extract feature parameters from ground motion records or adjust the ground motion records to identical lengths. 3. Feed the machine learning model with the processed sample data and obtain the well-tuned machine learning model. 4. Apply the well-tuned machine learning model and assess the destructive power of a new ground motion record by predicting the EDP, such as the maximum displacement of a structure. Considering that the damage assessment methods and damage limits of different structures are different [4,29,30], the maximum displacement of a structure is adopted as the EDP to evaluate the earthquake destructive power to structures in this study. For other EDPs, such as maximum acceleration and maximum inter-story drift ratio, the method proposed in this study can also be adopted by retraining the machine learning model with training samples, including the adopted EDP. Considering that the damage assessment methods and damage limits of different structures are different [4,29,30], the maximum displacement of a structure is adopted as the EDP to evaluate Appl. Sci. 2020, 10, 6210 4 of 20 the earthquake destructive power to structures in this study. For other EDPs, such as maximum acceleration and maximum inter-story drift ratio, the method proposed in this study can also be adopted by retraining the machine learning model with training samples, including the adopted EDP.

Sample Preparation
In this study, a relatively simple structural model named elasto-plastic single degree-of-freedom (SDOF) model is adopted, as shown in Figure 2a. The model can capture the elasto-plastic mechanical properties of multi-story buildings and simple girder bridges [31,32]. The elastic parameters of the SDOF model include the natural vibration period and the elastic damping ratio. A tri-linear backbone curve is adopted for the interstory performance of the SDOF model, as shown in Figure 2b [4]. The parameters of the backbone curve parameter include yield strength to weight ratio, peak to yield strength ratio, and the secondary stiffness ratio. The secondary stiffness ratio is defined by the ratio of post-yield stiffness to the elastic stiffness. A single-parameter hysteretic model is adopted for the SDOF system ( Figure 2c). The hysteretic parameters reflect the ratio of the hysteretic area of the pinching model to that of the full bi-linear model ( Figure 2c). The parameters of the elasto-plastic SDOF model adopted in this paper are shown in Table 2.

Sample Preparation
In this study, a relatively simple structural model named elasto-plastic single degree-of-freedom (SDOF) model is adopted, as shown in Figure 2a. The model can capture the elasto-plastic mechanical properties of multi-story buildings and simple girder bridges [31,32]. The elastic parameters of the SDOF model include the natural vibration period and the elastic damping ratio. A tri-linear backbone curve is adopted for the interstory performance of the SDOF model, as shown in Figure 2b [4]. The parameters of the backbone curve parameter include yield strength to weight ratio, peak to yield strength ratio, and the secondary stiffness ratio. The secondary stiffness ratio is defined by the ratio of post-yield stiffness to the elastic stiffness. A single-parameter hysteretic model is adopted for the SDOF system (Figure 2c). The hysteretic parameters reflect the ratio of the hysteretic area of the pinching model to that of the full bi-linear model (Figure 2c). The parameters of the elasto-plastic SDOF model adopted in this paper are shown in Table 2.  Considering that the characteristics of different ground motions vary significantly, to cover different types of ground motions as much as possible, 5057 ground motion records were collected from the pacific earthquake engineering research (PEER) center ground motion database [33]. The obtained ground motions were recorded from 1935 to 2003, of which earthquake magnitude ranges from 4.2 to 7.9. Most of the ground motions were recorded in the US and Japan. Some typical earthquakes are Chi-Chi, El Centro, Northridge, Landers, and Kobe earthquakes. The acceleration response spectra of some selected ground motions are shown in Figure 3. As evident in Figure 3, the randomness of different ground motions is significant. To expand the training data for better generalization of the machine learning model, the ground motion records are normalized and rescaled to ten different peak ground accelerations (PGA) (i.e., PGA = 1 − 10 m/s 2 ), respectively. As a result, 50,570 ground motion acceleration sequences are obtained. Subsequently, THAs are performed to obtain the maximum response displacements of the SDOF model subjected to 50,570 ground motions. Finally, 50,570 ground motion acceleration sequences, together with the corresponding maximum response displacements, are taken as the dataset for model training.    Considering that the characteristics of different ground motions vary significantly, to cover different types of ground motions as much as possible, 5057 ground motion records were collected from the pacific earthquake engineering research (PEER) center ground motion database [33]. The obtained ground motions were recorded from 1935 to 2003, of which earthquake magnitude ranges from 4.2 to 7.9. Most of the ground motions were recorded in the US and Japan. Some typical earthquakes are Chi-Chi, El Centro, Northridge, Landers, and Kobe earthquakes. The acceleration response spectra of some selected ground motions are shown in Figure 3. As evident in Figure 3, the randomness of different ground motions is significant. To expand the training data for better generalization of the machine learning model, the ground motion records are normalized and rescaled to ten different peak ground accelerations (PGA) (i.e., PGA = 1 − 10 m/s 2 ), respectively. As a result, 50,570 ground motion acceleration sequences are obtained. Subsequently, THAs are performed to obtain the maximum response displacements of the SDOF model subjected to 50,570 ground motions. Finally, 50,570 ground motion acceleration sequences, together with the corresponding maximum response displacements, are taken as the dataset for model training. Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 20 The dataset is divided into a training set, a validation set, and a test set. According to the research of Ng [34], the test set should be large enough to give high confidence in the overall performance of the model. In this study, sample split experiments show that a test set size of 1000 can reasonably reflect the performance of the prediction model. Therefore, the original sample set is divided into 48,570, 1000, and 1000 samples for the training, validation, and test sets, respectively. It is worth noting that the original 50,570 sample set is generated using 5057 ground motion records in ten different PGAs. To makes sure that the ground motions in the test and validation sets are completely different from those in the training set, 4857, 100, and 100 ground motion records are randomly selected to generate the training, validation, and test sets, respectively.

Backpropagation Neural Network
The backpropagation neural network (BPNN) is a feed forward neural network, which has a backpropagation process. The backpropagation process repeatedly adjusts the weights of connections in the network, to minimize the difference between the actual output vector and the desired output vector [28].
The structure of the classic the BPNN usually consists of three layers, namely, the input layer, the hidden layer, and the output layer. The log-sigmoid function and the tan-sigmoid function are usually used as activation functions [18,35,36]. The BPNN has been widely used for its advantages of simple structure, stable performance, and easy to implement.

Feature Extractions
Ground motion records contain a large number of acceleration time-history data, which is well beyond the size of input suitable for the BPNN. Therefore, it is necessary to use appropriate methods to extract the features of ground motion data. The extracted features should contain enough information to reflect the destructive power of ground motion to structures. The size of the extracted feature should also be suitable for the implementation of the BPNN.
In the literature, the acceleration response spectrum of ground motion is usually used to characterize the destructive power of ground motion to structures [23]. The calculation of the response spectrum requires a large number of time-history analyses for SDOF systems with different natural vibration periods, which is computationally intensive. In a real post-earthquake scenario, many ground motion sequences will be recorded, and the calculation of the response spectra of many ground motion records is time-consuming. Compared with response spectrum analysis, the fast Fourier transform (FFT) algorithm is efficient and can extract the frequency-domain features of complex signals, which is widely used in various fields [37][38][39][40][41][42]. The dataset is divided into a training set, a validation set, and a test set. According to the research of Ng [34], the test set should be large enough to give high confidence in the overall performance of the model. In this study, sample split experiments show that a test set size of 1000 can reasonably reflect the performance of the prediction model. Therefore, the original sample set is divided into 48,570, 1000, and 1000 samples for the training, validation, and test sets, respectively. It is worth noting that the original 50,570 sample set is generated using 5057 ground motion records in ten different PGAs. To makes sure that the ground motions in the test and validation sets are completely different from those in the training set, 4857, 100, and 100 ground motion records are randomly selected to generate the training, validation, and test sets, respectively.

Backpropagation Neural Network
The backpropagation neural network (BPNN) is a feed forward neural network, which has a backpropagation process. The backpropagation process repeatedly adjusts the weights of connections in the network, to minimize the difference between the actual output vector and the desired output vector [28].
The structure of the classic the BPNN usually consists of three layers, namely, the input layer, the hidden layer, and the output layer. The log-sigmoid function and the tan-sigmoid function are usually used as activation functions [18,35,36]. The BPNN has been widely used for its advantages of simple structure, stable performance, and easy to implement.

Feature Extractions
Ground motion records contain a large number of acceleration time-history data, which is well beyond the size of input suitable for the BPNN. Therefore, it is necessary to use appropriate methods to extract the features of ground motion data. The extracted features should contain enough information to reflect the destructive power of ground motion to structures. The size of the extracted feature should also be suitable for the implementation of the BPNN.
In the literature, the acceleration response spectrum of ground motion is usually used to characterize the destructive power of ground motion to structures [23]. The calculation of the response spectrum requires a large number of time-history analyses for SDOF systems with different natural vibration periods, which is computationally intensive. In a real post-earthquake scenario, many ground motion sequences will be recorded, and the calculation of the response spectra of many ground motion records is time-consuming. Compared with response spectrum analysis, the fast Fourier transform (FFT) algorithm is efficient and can extract the frequency-domain features of complex signals, which is widely used in various fields [37][38][39][40][41][42].
The feature extraction of two ground motions named HECTOR-0515A360 (hereinafter referred to as HECTOR) and IMPVALL-A-E06140 (hereinafter referred to as IMPVALL) (Figure 4a) [33] are discussed. Both ground motion records are normalized to 1 m/s 2 for comparison. HECTOR's vibration is strong in the whole 40 s range, while IMPVALL's vibration mainly concentrates in the range of 3-7 s. By comparing the response spectra of the two ground motions (Figure 4b), it is found that the spectral acceleration of IMPVALL in the range of 0.3-1 s is greater than that of HECTOR. Considering that the natural vibration periods of multi-story buildings are usually within 0.3-1 s [43], the IMPVALL ground motion exhibits greater destructive power to these structures than the HECTOR ground motion. Nevertheless, the FFT spectrum (Figure 4c) indicates that the amplitude of IMPVALL is less than that of HECTOR in almost the entire frequency range. The difference between the response spectrum and FFT spectrum is that the seismic energy of the IMPVALL ground motion is released in a relatively short time. Therefore, the IMPVALL ground motion is very destructive to structures, even if its relatively low FFT spectrum compared to that of the HECTOR ground motion.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 20 The feature extraction of two ground motions named HECTOR-0515A360 (hereinafter referred to as HECTOR) and IMPVALL-A-E06140 (hereinafter referred to as IMPVALL) (Figure 4a) [33] are discussed. Both ground motion records are normalized to 1 m/s 2 for comparison. HECTOR's vibration is strong in the whole 40 s range, while IMPVALL's vibration mainly concentrates in the range of 3-7 s. By comparing the response spectra of the two ground motions (Figure 4b), it is found that the spectral acceleration of IMPVALL in the range of 0.3-1 s is greater than that of HECTOR. Considering that the natural vibration periods of multi-story buildings are usually within 0.3-1 s [43], the IMPVALL ground motion exhibits greater destructive power to these structures than the HECTOR ground motion. Nevertheless, the FFT spectrum (Figure 4c) indicates that the amplitude of IMPVALL is less than that of HECTOR in almost the entire frequency range. The difference between the response spectrum and FFT spectrum is that the seismic energy of the IMPVALL ground motion is released in a relatively short time. Therefore, the IMPVALL ground motion is very destructive to structures, even if its relatively low FFT spectrum compared to that of the HECTOR ground motion.  Previous studies have shown that the short-time Fourier transform (STFT) has a good performance in time-frequency domain analysis [44]. Therefore, this study proposes a ground motion feature extraction method based on the STFT algorithm. In consideration of the duration difference of ground motion records, all ground motions are truncated to the duration of 20 s. The specific process includes the following four steps: 1. Ground motion data truncation. Find the time point of the peak acceleration in a ground motion record, and truncate a 20 s signal with 10 s in both forward and backward. When the signals are less than 10 s in the front or back of the peak acceleration time point, the 20 s signal range should be shifted backward or forward to ensure that the peak acceleration is close to the center of the truncated signal. 2. STFT analysis. Considering that the selection of window size will affect the resolution of the STFT spectrum, the rectangular window length of 5 s and step length of 5 s are selected for the STFT analysis. Therefore, four frequency spectra of the 20 s truncated signal will be obtained. 3. Feature extraction. In this study, the natural vibration period of the adopted SDOF system is 0.5 s ( Table 2). Considering that the period of the structure will be prolonged after entering the plastic state, the frequency range of interest is selected as 0.3-2.3 Hz. The interested frequency range of an STFT spectrum is divided into 10 frequency bands, and the corresponding spectral value of each frequency band is adopted as a feature value. Therefore, the feature extraction of four 5 s-windows yields 40 STFT spectral feature values. 4. Dimension setting of features. Considering that the number of feature values will affect the learning effect of the BPNN, two different dimensions of the feature are discussed in this study. Previous studies have shown that the short-time Fourier transform (STFT) has a good performance in time-frequency domain analysis [44]. Therefore, this study proposes a ground motion feature extraction method based on the STFT algorithm. In consideration of the duration difference of ground motion records, all ground motions are truncated to the duration of 20 s. The specific process includes the following four steps:

1.
Ground motion data truncation. Find the time point of the peak acceleration in a ground motion record, and truncate a 20 s signal with 10 s in both forward and backward. When the signals are less than 10 s in the front or back of the peak acceleration time point, the 20 s signal range should be shifted backward or forward to ensure that the peak acceleration is close to the center of the truncated signal. 2.
STFT analysis. Considering that the selection of window size will affect the resolution of the STFT spectrum, the rectangular window length of 5 s and step length of 5 s are selected for the STFT analysis. Therefore, four frequency spectra of the 20 s truncated signal will be obtained. 3.
Feature extraction. In this study, the natural vibration period of the adopted SDOF system is 0.5 s ( Table 2). Considering that the period of the structure will be prolonged after entering the plastic state, the frequency range of interest is selected as 0.3-2.3 Hz. The interested frequency range of an STFT spectrum is divided into 10 frequency bands, and the corresponding spectral value of each frequency band is adopted as a feature value. Therefore, the feature extraction of four 5 s-windows yields 40 STFT spectral feature values.

4.
Dimension setting of features. Considering that the number of feature values will affect the learning effect of the BPNN, two different dimensions of the feature are discussed in this study.
One is to directly take the 40 STFT spectral values obtained in step (3). The other is to compute the maximum value of feature values in the same frequency range of four windows, which yields 10 feature values.
Based on the above methods, the ground motion features of HECTOR and IMPVALL are extracted, as shown in Figures 5 and 6. the maximum value of feature values in the same frequency range of four windows, which yields 10 feature values.
Based on the above methods, the ground motion features of HECTOR and IMPVALL are extracted, as shown in Figures 5 and 6. Figure 5a shows that the amplitude of IMPVALL in the range of 1.2-2.2 Hz is much higher than that of HECTOR. This result is consistent with the response spectrum results in Figure 4, and well reflects the destructive power of IMPVALL to structures in this frequency range. Figure 5b-d shows that the amplitude of IMPVALL is significantly lower than that of HECTOR. This is because the ground motion acceleration of IMPVALL is significantly lower than that of HECTOR in the time range of 7-20 s. From the above, the proposed feature extraction method based on STFT analysis can well reflect the frequency-/time-domain characteristics of ground motion sequences. The 40 feature values of ground motion are further condensed by computing the maximum amplitude value of feature values in the same frequency band from four STFT spectra, which yields 10 feature values. As shown in Figure 6, the 10 feature values can still reflect the great destructive power of the IMPVALL ground motion to structures in the frequency range of 1.2-1.6 Hz. Therefore, the following section will discuss the training results using 40 and 10 feature values, respectively.

BPNN Architecture
The network structure, together with the input dimension, can greatly influence the prediction accuracy of a BPNN model [45,46]. Therefore, the influences of the number of hidden layers and nodes, as well as the dimension of features, are discussed to obtain the optimal BPNN architecture.
The R 2 and mean absolute error (MAE) are used as evaluation indexes to select the optimal BPNN structure. The random initial weights are used for training, therefore, the results of each training are different. In this section, each BPNN structure was trained four times, respectively, the results are shown in Figures 7 and 8.
For the cases with 40 feature values as input (Figure 7), with the increase of the number of nodes in the hidden layers, the prediction accuracy increases and then decreases. When the number of nodes reaching a certain number, the increase of nodes in the hidden layer can no longer improve the BPNN model.  Figure 5a shows that the amplitude of IMPVALL in the range of 1.2-2.2 Hz is much higher than that of HECTOR. This result is consistent with the response spectrum results in Figure 4, and well reflects the destructive power of IMPVALL to structures in this frequency range. Figure 5b-d shows that the amplitude of IMPVALL is significantly lower than that of HECTOR. This is because the ground motion acceleration of IMPVALL is significantly lower than that of HECTOR in the time range of 7-20 s. From the above, the proposed feature extraction method based on STFT analysis can well reflect the frequency-/time-domain characteristics of ground motion sequences.
The 40 feature values of ground motion are further condensed by computing the maximum amplitude value of feature values in the same frequency band from four STFT spectra, which yields 10 feature values. As shown in Figure 6, the 10 feature values can still reflect the great destructive power of the IMPVALL ground motion to structures in the frequency range of 1.2-1.6 Hz. Therefore, the following section will discuss the training results using 40 and 10 feature values, respectively.

BPNN Architecture
The network structure, together with the input dimension, can greatly influence the prediction accuracy of a BPNN model [45,46]. Therefore, the influences of the number of hidden layers and nodes, as well as the dimension of features, are discussed to obtain the optimal BPNN architecture.
The R 2 and mean absolute error (MAE) are used as evaluation indexes to select the optimal BPNN structure. The random initial weights are used for training, therefore, the results of each training are different. In this section, each BPNN structure was trained four times, respectively, the results are shown in Figures 7 and 8.
For the cases with 40 feature values as input (Figure 7), with the increase of the number of nodes in the hidden layers, the prediction accuracy increases and then decreases. When the number of nodes reaching a certain number, the increase of nodes in the hidden layer can no longer improve the BPNN model.
The BPNN with a single hidden layer achieves the best prediction accuracy when the number of nodes is 3, of which the average R 2 equals 0.5771, and the average MAE equals 0.1537. The BPNN with two hidden layers has the best prediction effect when the number of nodes in each layer is 3, with the average R 2 equals 0.5910, and the average MAE equals 0.1448. Appl. Sci. 2020, 10, x FOR PEER REVIEW 9 of 20 The BPNN with a single hidden layer achieves the best prediction accuracy when the number of nodes is 3, of which the average R 2 equals 0.5771, and the average MAE equals 0.1537. The BPNN with two hidden layers has the best prediction effect when the number of nodes in each layer is 3, with the average R 2 equals 0.5910, and the average MAE equals 0.1448.
For the cases with 10 feature values as input, the result in Figure 8 indicates that with the increase of the number of nodes in the hidden layer, the accuracy of the model increases and then decreases. Moreover, if the number of nodes in the hidden layers further increases, the R 2 , and MAE indexes become unstable. For the cases with 10 feature values as input, the result in Figure 8 indicates that with the increase of the number of nodes in the hidden layer, the accuracy of the model increases and then decreases. Moreover, if the number of nodes in the hidden layers further increases, the R 2 , and MAE indexes become unstable.
The BPNN with a single hidden layer achieves the best prediction accuracy when the number of nodes is 7, of which the average R 2 equals 0.6740, and the average MAE equals 0.1539. The BPNN with two hidden layers has the best prediction effect when the number of nodes in each layer is 4, with the average R 2 equals 0.6848, and the average MAE equals 0.1488. Appl. Sci. 2020, 10, x FOR PEER REVIEW 10 of 20 The BPNN with a single hidden layer achieves the best prediction accuracy when the number of nodes is 7, of which the average R 2 equals 0.6740, and the average MAE equals 0.1539. The BPNN with two hidden layers has the best prediction effect when the number of nodes in each layer is 4, with the average R 2 equals 0.6848, and the average MAE equals 0.1488.

Results of the BPNN
According to the discussion in Section 4.2, the 10-dimensional features are selected as the input. Moreover, the BPNN structure with two hidden layers and four nodes in each layer is selected. The schematic diagram of the feature extraction and network structure is illustrated in Figure 9. In the hidden layer of the BPNN, the nodes adopted the ReLU function as the activation function and Bayesian regularization algorithm as the training algorithm [47,48]. The learning algorithm is

Results of the BPNN
According to the discussion in Section 4.2, the 10-dimensional features are selected as the input. Moreover, the BPNN structure with two hidden layers and four nodes in each layer is selected. The schematic diagram of the feature extraction and network structure is illustrated in Figure 9. In the hidden layer of the BPNN, the nodes adopted the ReLU function as the activation function and Bayesian regularization algorithm as the training algorithm [47,48]. The learning algorithm is gradient descent with momentum [49], the loss function is MAE, the learning rate is 0.01, and the maximum training epochs is set to 500. The program adopts the EarlyStopping function [50], which can stop the training and output of the previous optimal model if the loss of validation cannot be improved in 40 epochs. The loss history of training and validation are shown in Figure 10.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 11 of 20 gradient descent with momentum [49], the loss function is MAE, the learning rate is 0.01, and the maximum training epochs is set to 500. The program adopts the EarlyStopping function [50], which can stop the training and output of the previous optimal model if the loss of validation cannot be improved in 40 epochs. The loss history of training and validation are shown in Figure 10.  Finally, the test set is input into the optimal model, and the prediction results are shown in Figure 11. The statistical indexes of prediction errors are shown in Table 3. Evidently, the prediction results of the BPNN based on the STFT features are significantly better than those based on the FFT features. Nevertheless, both prediction results are poor. This is because the BPNN model is not capable of capturing the strong nonlinearity of the studied scenario. Therefore, a model based on the convolutional neural network is discussed in the subsequent section.  gradient descent with momentum [49], the loss function is MAE, the learning rate is 0.01, and the maximum training epochs is set to 500. The program adopts the EarlyStopping function [50], which can stop the training and output of the previous optimal model if the loss of validation cannot be improved in 40 epochs. The loss history of training and validation are shown in Figure 10.  Finally, the test set is input into the optimal model, and the prediction results are shown in Figure 11. The statistical indexes of prediction errors are shown in Table 3. Evidently, the prediction results of the BPNN based on the STFT features are significantly better than those based on the FFT features. Nevertheless, both prediction results are poor. This is because the BPNN model is not capable of capturing the strong nonlinearity of the studied scenario. Therefore, a model based on the convolutional neural network is discussed in the subsequent section.  Finally, the test set is input into the optimal model, and the prediction results are shown in Figure 11. The statistical indexes of prediction errors are shown in Table 3. Evidently, the prediction results of the BPNN based on the STFT features are significantly better than those based on the FFT features. Nevertheless, both prediction results are poor. This is because the BPNN model is not capable of capturing the strong nonlinearity of the studied scenario. Therefore, a model based on the convolutional neural network is discussed in the subsequent section.

Convolutional Neural Network
Convolutional neural network (CNN) is also a feed forward neural network, which has excellent performance in image recognition [13] and the evaluation of one-dimensional sequences (such as sound signals) [51,52]. Therefore, the CNN is well suited for the feature extraction and evaluation of ground motion records.
The feature extraction process of the CNN is not designed by human engineers, but learned from data in the training process [14]. Therefore, features of ground motions can be learned in the convolutional layers, and manual feature extraction of ground motion data is not required.

Convolutional Neural Network
Convolutional neural network (CNN) is also a feed forward neural network, which has excellent performance in image recognition [13] and the evaluation of one-dimensional sequences (such as sound signals) [51,52]. Therefore, the CNN is well suited for the feature extraction and evaluation of ground motion records.
The feature extraction process of the CNN is not designed by human engineers, but learned from data in the training process [14]. Therefore, features of ground motions can be learned in the convolutional layers, and manual feature extraction of ground motion data is not required.

CNN Architecture
The architecture of some classical the CNN model, such as LeNet-5 [53], AlexNet [13], and Visual Geometry Group (VGG) [54] models are adopted as references to build the CNN architecture for ground motion evaluation. The CNN structure used in this study is illustrated in Figure 12.

CNN Architecture
The architecture of some classical the CNN model, such as LeNet-5 [53], AlexNet [13], and Visual Geometry Group (VGG) [54] models are adopted as references to build the CNN architecture for ground motion evaluation. The CNN structure used in this study is illustrated in Figure 12. All ground motion records are down sampled to 50 Hz and processed to 20 s signal sequences according to the method presented in Section 4.1. Therefore, 1000 data points of each ground motion are taken as the input for the proposed CNN model. The maximum response displacement of the structure under the action of ground motion is taken as the output. Adam algorithm was adopted as the optimization algorithm [55]. The loss function of MAE is adopted, the learning rate is set to 0.05, the maximum training epoch is set to 200, and the batch size is set to 4000.

Results of CNN
The training of the CNN adopts the EarlyStopping function [47]. The loss history of training and validation are shown in Figure 13. The results show that at the beginning of training, with the increase of epoch, the loss of training decreases rapidly. Subsequently, the loss of validation reaches the minimum (MAE = 0.0825) when the epoch is 75, and the program stopped training at the end of the 115th epoch. The prediction based on the CNN model takes 0.762 s using a laptop platform (i5-4210H, RAM 8G, GTX 960M), which is over 650 times faster than that using the THA-based method (507.81 s).  All ground motion records are down sampled to 50 Hz and processed to 20 s signal sequences according to the method presented in Section 4.1. Therefore, 1000 data points of each ground motion are taken as the input for the proposed CNN model. The maximum response displacement of the structure under the action of ground motion is taken as the output. Adam algorithm was adopted as the optimization algorithm [55]. The loss function of MAE is adopted, the learning rate is set to 0.05, the maximum training epoch is set to 200, and the batch size is set to 4000.

Results of CNN
The training of the CNN adopts the EarlyStopping function [47]. The loss history of training and validation are shown in Figure 13. The results show that at the beginning of training, with the increase of epoch, the loss of training decreases rapidly. Subsequently, the loss of validation reaches the minimum (MAE = 0.0825) when the epoch is 75, and the program stopped training at the end of the 115th epoch. The prediction based on the CNN model takes 0.762 s using a laptop platform (i5-4210H, RAM 8G, GTX 960M), which is over 650 times faster than that using the THA-based method (507.81 s).
The test set is input into the optimal model, and the prediction result is shown in Figure 14 and Table 4. As evident in the result, the CNN model exhibits a good prediction accuracy, which outperforms the BPNN model ( Figure 11). LeCun et al. [14] mentioned that conventional machine learning techniques (such as the BPNN used in this study) have limitations in processing raw forms of natural data. Therefore, build pattern recognition and machine learning system requires considerate and comprehensive engineering design of the feature extractor, which converts the raw data into a suitable internal representation or feature vector. When the feature extractor is not effective, the BPNN can learn very limited information. The CNN model, on the other hand, can extract more representative features of raw material through the training process, which can be used to solve more complex problems. Therefore, to further demonstrate the feature extraction process of the proposed CNN model, the visualization of extracted features in convolutional layers is discussed in the following section.
The training of the CNN adopts the EarlyStopping function [47]. The loss history of training and validation are shown in Figure 13. The results show that at the beginning of training, with the increase of epoch, the loss of training decreases rapidly. Subsequently, the loss of validation reaches the minimum (MAE = 0.0825) when the epoch is 75, and the program stopped training at the end of the 115th epoch. The prediction based on the CNN model takes 0.762 s using a laptop platform (i5-4210H, RAM 8G, GTX 960M), which is over 650 times faster than that using the THA-based method (507.81 s).  The test set is input into the optimal model, and the prediction result is shown in Figure 14 and Table 4. As evident in the result, the CNN model exhibits a good prediction accuracy, which outperforms the BPNN model ( Figure 11). LeCun et al. [14] mentioned that conventional machine learning techniques (such as the BPNN used in this study) have limitations in processing raw forms of natural data. Therefore, build pattern recognition and machine learning system requires considerate and comprehensive engineering design of the feature extractor, which converts the raw data into a suitable internal representation or feature vector. When the feature extractor is not effective, the BPNN can learn very limited information. The CNN model, on the other hand, can extract more representative features of raw material through the training process, which can be used to solve more complex problems. Therefore, to further demonstrate the feature extraction process of the proposed CNN model, the visualization of extracted features in convolutional layers is discussed in the following section.

Feature Visualization of the CNN
As mentioned earlier, the CNN has better feature extraction performance compared with the BPNN. The CNN can directly extract effective features from ground motion data and avoid information loss caused by manual feature extraction. The convolutional layer extracts various features from the previous layer, and the pooling layer makes features robust to noise and deformation [14].
Visualizing the output of each convolutional layer (including the process of convolution and pooling) is helpful to understand the features extraction mechanism of the CNN. To clearly demonstrate the extracted feature of each convolutional layer, a synthetic signal is input into the CNN model. The synthetic signal is composed of seven simple harmonics with frequencies range from 0.1-20 Hz (Figure 15a). The output of each convolutional layer is presented in Figure 15b.

Feature Visualization of the CNN
As mentioned earlier, the CNN has better feature extraction performance compared with the BPNN. The CNN can directly extract effective features from ground motion data and avoid information loss caused by manual feature extraction. The convolutional layer extracts various features from the previous layer, and the pooling layer makes features robust to noise and deformation [14].
Visualizing the output of each convolutional layer (including the process of convolution and pooling) is helpful to understand the features extraction mechanism of the CNN. To clearly demonstrate the extracted feature of each convolutional layer, a synthetic signal is input into the CNN model. The synthetic signal is composed of seven simple harmonics with frequencies range from 0.1-20 Hz (Figure 15a). The output of each convolutional layer is presented in Figure 15b. As shown in the right-hand side of Figure 15, the output of the first convolutional layer is similar to the synthetic signal, which not only capture the high-frequency features, but also carries the information of the overall trend (low-frequency). With the increase of convolution depth, the length of output data decreases continuously, and the proportion of low-frequency content increases gradually. For example, the result of the convolutional filter No. 2 in the fourth convolutional layer is very similar to the shape of the simple harmonic with the frequency of 0.1 Hz. The above results show that the shallow convolutional layers of the CNN model mainly extract the high-frequency features of a signal, while the deep convolutional layers primarily extract the low-frequency features of a signal. In addition, the features extracted by the CNN model can also reflect the time-domain features of a signal. Therefore, the CNN model can extract effective features of ground motions and yields better prediction than that of the BPNN model.
Subsequently, the Chichi05_CHY060-e ground motion (hereafter referred to as CHICHI) was input into the well-tuned the CNN model, and the outputs of seven convolutional layers are shown in Figure 16. Similar to the result of the synthetic signal, the outputs of shallow convolutional layers contain more high-frequency information. As the convolutional layer deep in, the outputs contain As shown in the right-hand side of Figure 15, the output of the first convolutional layer is similar to the synthetic signal, which not only capture the high-frequency features, but also carries the information of the overall trend (low-frequency). With the increase of convolution depth, the length of output data decreases continuously, and the proportion of low-frequency content increases gradually. For example, the result of the convolutional filter No. 2 in the fourth convolutional layer is very similar to the shape of the simple harmonic with the frequency of 0.1 Hz. The above results show that the shallow convolutional layers of the CNN model mainly extract the high-frequency features of a signal, while the deep convolutional layers primarily extract the low-frequency features of a signal. In addition, the features extracted by the CNN model can also reflect the time-domain features of a signal. Therefore, the CNN model can extract effective features of ground motions and yields better prediction than that of the BPNN model.
Subsequently, the Chichi05_CHY060-e ground motion (hereafter referred to as CHICHI) was input into the well-tuned the CNN model, and the outputs of seven convolutional layers are shown in Figure 16. Similar to the result of the synthetic signal, the outputs of shallow convolutional layers contain more high-frequency information. As the convolutional layer deep in, the outputs contain more low-frequency information. For example, in Figure 16, filter No. 11 of the second convolutional layer can well capture the location and shape of the feature in the red circles.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 16 of 20 more low-frequency information. For example, in Figure 16, filter No. 11 of the second convolutional layer can well capture the location and shape of the feature in the red circles. It is worth mentioning that the output of filter No. 9 in the fifth convolutional layer is null, which means this ground motion does not contain the corresponding feature. Because the CNN model is learning from a large number of ground motions, and different ground motions are so random that a feature extracted from one ground motion may not necessarily exist in another. This is why multiple filters are adopted in each convolutional layer of the CNN model. Multiple filters enable the model of learning more information on different ground motions and improve the generalization of the model.

Conclusions
In this study, a machine learning-based destructive power assessment method of earthquake to structures is proposed. The method is proposed based on the backpropagation neural network (BPNN) and convolutional neural network (CNN). The results of 50,570 numerical simulations are It is worth mentioning that the output of filter No. 9 in the fifth convolutional layer is null, which means this ground motion does not contain the corresponding feature. Because the CNN model is learning from a large number of ground motions, and different ground motions are so random that a feature extracted from one ground motion may not necessarily exist in another. This is why multiple filters are adopted in each convolutional layer of the CNN model. Multiple filters enable the model of learning more information on different ground motions and improve the generalization of the model.

Conclusions
In this study, a machine learning-based destructive power assessment method of earthquake to structures is proposed. The method is proposed based on the backpropagation neural network (BPNN) and convolutional neural network (CNN). The results of 50,570 numerical simulations are used as samples to train the machine learning models, and some conclusions can be drawn as follow: 1.
For the feature extraction of ground motion time-history data, STFT can well reflect the frequency-/time-domain features of ground motions. The prediction using STFT as feature extractor yield higher accuracy (R 2 = 0.6784) than that using FFT (R 2 = 0.4277).

2.
The selection of the number of layers and nodes of the BPNN is related to the features. In this study, in the case of 40 features as input, the optimal model is the BPNN with two hidden layers and three nodes in each layer. For the case of 10 features as input, the optimal model is the BPNN with two hidden layers and four nodes in each layer. 3.
The model based on the CNN exhibits better prediction accuracy (R 2 = 0.8737) than the BPNN model (R 2 = 0.6784). This is because the convolution layer of the CNN can identify the frequency-/time-domain information of a signal, and avoid the bias caused by artificial feature extraction in the meantime. The advantage of the CNN is magnified for its deep network architecture and multiple filters in each layer.

4.
Feature visualization of convolutional layers reveals that the shallow convolutional layers of the CNN model mainly extract the high-frequency features of a signal, while the deep convolutional layers primarily extract the low-frequency features of a signal. In addition, the features extracted by the CNN model can reflect the time-domain features of a signal. 5.
The CNN model exhibits remarkable computational efficiency compared with THA-based method. For example, using a laptop platform (i5-4210H, 8G of RAM, GTX 960M), the prediction of 1000 structures using the CNN model takes 0.762 s, which is over 650 times faster than that using the THA-based method (507.81 s). The high computational efficiency of the CNN-based seismic response prediction makes it promising for use in the timely assessment of regional structures.