Effects of the Earthquake Nonstationary Characteristics on the Structural Dynamic Response: Based on the BP Neural Networks Modified by the Genetic Algorithm

Abstract: Intensity non-stationarity is one of the basic characteristics of ground motions, and its influence on the dynamic responses of structures is a pressing issue in the field of earthquake engineering. A BP neural network modified by the genetic algorithm was adopted in this research to investigate the influence of intensity non-stationary inputs on the structural dynamic responses from a new perspective. First, a large set of training data was generated from the prediction formula of the dynamic response. The BP neural network was then pre-trained on sparsely selected data to optimize the initial weights and biases. Finally, the BP neural network was trained on all data, and the mean square error between the predicted responses and the target responses was less than 10^−5. A calculation formula for sensitivity was also derived to quantify the influence of a change in the input on the output. The presented method exploits the advantages of neural networks in nonlinear multi-variable fitting and provides a new perspective for the study of earthquake non-stationarity characteristics and their influence on the structural dynamic responses.


Introduction
The intensity non-stationarity of ground motions refers to the change of ground-motion energy over time, which manifests as an acceleration time history whose amplitude gradually increases and finally attenuates to zero. Artificial ground-motion time histories need to be generated because usually only a few recorded ground motions meet the requirements of nonlinear time-history analysis. Research on the intensity non-stationarity of ground motions focuses, first, on generating many ground motions that meet the time-history analysis requirements by using the rule of intensity non-stationarity and, more importantly, on studying the effect of the earthquake non-stationarity characteristics on the structural dynamic responses.
In the conventional research [1][2][3][4][5], the intensity non-stationarity of ground motions is mainly described from the perspective of the intensity envelope functions of ground motions. The shape of these intensity envelope functions is controlled by a number of shape parameters, and the artificial ground motion with intensity non-stationarity can be obtained by modulating a stationary ground motion using the envelope function. The value of the shape parameters is also a focus of the research on the intensity non-stationarity, and the previous methods based on engineering experience no longer meet the needs of engineering application. Huo et al. [6] fit the attenuation relationship of envelope parameters with magnitude, epicentral distance and site conditions by statistical method for three-step intensity envelope functions. Qu et al. [7] fit the change rule of parameters of the three-step intensity envelope function using the three seismic records of SMART-1 array and analyzed the effect of the coordinates and soil thickness of the measuring points on the parameters. Du et al. [8] proposed an intensity envelope function with log-normal distribution, fit the attenuation relationship of envelope parameters with magnitude and epicentral distance under different site conditions by statistical method, and gave the recommended values under different conditions.
The effect of the intensity non-stationarity of ground motions on the structural dynamic responses is not insignificant. Liu et al. [9] analyzed the effect of the envelope parameters of three-step intensity envelope function on the structural responses and obtained the approximate quantitative relationship between the parameters of the intensity non-stationarity of ground motions and the structural responses by fitting, which suggested that the stationary state duration of ground motions had a significant effect on the structural seismic responses. Jiang et al. [10] deduced the analytical expression of structural dynamic responses activated by the intensity non-stationarity based on the damped-sine function, which suggested that the frequency ratio and non-stationarity had a greater effect on the maximum value of structural dynamic responses than the damping ratio.
There are two main kinds of methods for researching earthquake non-stationarity characteristics and their influence on structural dynamic responses: analytical methods and numerical methods. With the analytical methods, the effect rule obtained is accurate and intuitive, but only the analytical expression of the displacement response time history can be obtained, and it is extremely complex. It is difficult to derive the expressions for velocity and acceleration from it, and numerical methods are still needed to obtain the maximum value of the responses. In general numerical methods, the values at multiple discrete points are calculated first, and then a smooth curve is fitted to describe the change rule. This approach is feasible in low-dimensional cases with few parameters but can hardly be implemented when there are many parameters. Therefore, neural networks, which have a strong high-dimensional nonlinear fitting capacity, are proposed in this paper to study the effect of the intensity non-stationarity of ground motions on the structural responses. On this basis, the formula for the sensitivity of the input parameters is derived, and a comparison with the results of the analytical formula shows that this method can accurately capture how the maximum value of the structural responses changes with the parameters of the intensity non-stationarity.

The Intensity Non-Stationarity Model Enveloped by the Damped-Sine Function and the Analytical Solution of Its Dynamic Responses
The damped-sine intensity envelope function was proposed by Shinozuka et al. [1] and is given in Equation (1), where f(t) is the damped-sine intensity envelope function; I_0 is the normalization constant, which scales the peak of the envelope function to 1; α is the parameter of the attenuation term, which mainly affects the peak position and attenuation rate of the envelope function; β is the excitation shape parameter as well as the frequency of the sine function, which mainly controls the descent rate of the envelope function in the descent segment, with little influence on the peak position; and t_0 is the time of the peak of the envelope function. The analytical expression (Equation (2)) of the intensity non-stationary signal F(t) based on the damped-sine function can be obtained by modulating a cosine wave with f(t), where ω is the circular frequency of the analytical signal. A typical F(t) is shown in Figure 1.
The equation of motion shown in Equation (3) is obtained when the modulated non-stationary analytical signal F(t) is input into a single-degree-of-freedom system, where u is the displacement response of the structure, the first- and second-order derivatives of which are the velocity response and acceleration response; ζ is the damping ratio; and ω_n is the natural circular frequency of the structure. Jiang et al. [10] obtained the analytical expression (Equation (4)) of the displacement response u(t) by applying the Laplace transform to both sides of Equation (3).
The above analysis shows that the earthquake non-stationarity characteristics affect the structural dynamic responses, but the analytical expression is too complex for direct use.
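As a numerical counterpart to the analytical route, the sketch below integrates a single-degree-of-freedom system under an envelope-modulated cosine and returns the maximum absolute displacement. Everything specific in it is an assumption made for illustration and is not taken from the paper: the envelope form I_0·exp(−αt)·sin(βt) restricted to the first sine lobe, the choice of α from the condition that the peak occurs at t_0, the unit-mass equation of motion with the excitation on the right-hand side, the damping ratio of 0.05, and the fourth-order Runge-Kutta time stepping. It requires 0 < β·t_0 < π.

```python
import math

def envelope(t, t0, beta):
    """Assumed damped-sine envelope I0*exp(-alpha*t)*sin(beta*t) with peak value 1 at t0."""
    if t <= 0.0 or t >= math.pi / beta:
        return 0.0                                    # keep only the first sine lobe
    alpha = beta / math.tan(beta * t0)                # from the condition f'(t0) = 0
    i0 = math.exp(alpha * t0) / math.sin(beta * t0)   # scales the peak to 1
    return i0 * math.exp(-alpha * t) * math.sin(beta * t)

def max_displacement(t0, beta, omega, period, zeta=0.05, dt=0.01):
    """Maximum |u(t)| of u'' + 2*zeta*wn*u' + wn^2*u = f(t)*cos(omega*t) (assumed form)."""
    wn = 2.0 * math.pi / period

    def accel(t, u, v):
        force = envelope(t, t0, beta) * math.cos(omega * t)
        return force - 2.0 * zeta * wn * v - wn * wn * u

    u = v = t = 0.0
    u_max = 0.0
    for _ in range(int(math.pi / beta / dt)):
        # Classical fourth-order Runge-Kutta step for the state (u, v).
        k1u, k1v = v, accel(t, u, v)
        k2u = v + 0.5 * dt * k1v
        k2v = accel(t + 0.5 * dt, u + 0.5 * dt * k1u, v + 0.5 * dt * k1v)
        k3u = v + 0.5 * dt * k2v
        k3v = accel(t + 0.5 * dt, u + 0.5 * dt * k2u, v + 0.5 * dt * k2v)
        k4u = v + dt * k3v
        k4v = accel(t + dt, u + dt * k3u, v + dt * k3v)
        u += dt * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        t += dt
        u_max = max(u_max, abs(u))
    return u_max

if __name__ == "__main__":
    print(max_displacement(t0=30.0, beta=0.03, omega=0.1, period=0.5))
```

For a stiff structure under a slowly varying excitation, the result should be close to the quasi-static value max|F|/ω_n², which gives a quick sanity check on the integration.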

Response Prediction Based on the GA-BP Neural Networks
In BP neural networks, many input and output data pairs are used to optimize the weights and biases so as to minimize the error between the predicted output and the target output of the network. The training algorithm of neural networks is an optimization algorithm based on the local gradient, so the final training result depends on both the initial weights and the initial biases, and improper initial network parameters may trap the algorithm in a local optimum.
The genetic algorithm introduces the biological evolution principle of "survival of the fittest" into parameter optimization: its basic operators, namely selection, crossover, and mutation, are used to screen the individuals in the population, and the superior ones are retained to produce the next generation. The process is repeated until the convergence condition is met. Unlike BP neural networks, the genetic algorithm is a global search algorithm, which makes up for the poor global search capacity of BP neural networks, and the combined model of neural networks and the genetic algorithm is called the GA-BP neural network [11,12]. In this paper, the GA-BP neural network is adopted. First, the genetic algorithm is used to optimize the initial network parameters on a small sample, and then the optimized weights and biases are assigned to the BP neural network, which is trained on the whole dataset to obtain the final solution. The basic algorithm flowchart of GA-BP is shown in Figure 2.

Hyperparameters of the BP Neural Networks
Unlike the parameters obtained by gradient descent training, the hyperparameters of the neural networks are set before learning. They include the number of layers, the number of neurons at each layer, the choice of cost function, the choice of activation function, the method of weight initialization, and so on.
The intensity non-stationarity model based on the damped-sine function has four independent input parameters and one output parameter. The input parameters are the time of peak (t_0), the excitation shape parameter (β), the excitation frequency (ω), and the structural natural vibration period (T), while the output is the absolute value of the maximum displacement response (u_max) of the structure. Therefore, four neurons are needed at the input layer and one neuron is needed at the output layer of the BP neural network. In this paper, two hidden layers are set in the BP neural network, which contain 16 and 8 neurons, respectively. The structure of the artificial neural network is shown in Figure 3. The activation function of each hidden layer is a hyperbolic tangent function, and the output layer uses a linear transfer function. The hyperparameters of the artificial neural network are shown in Table 1.
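The 4-16-8-1 architecture described above can be sketched as a plain forward pass. This is a minimal illustration, assuming uniformly random initial parameters in [−0.5, 0.5] and a row-per-input weight layout; the names `LAYER_SIZES`, `init_params`, `forward`, and `count_params` are hypothetical and do not come from the paper.

```python
import math
import random

LAYER_SIZES = [4, 16, 8, 1]  # inputs (t0, beta, omega, T) -> 16 -> 8 -> u_max

def init_params(rng):
    """Random (W, b) pairs; W[i][j] links neuron i of one layer to neuron j of the next."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        params.append((weights, biases))
    return params

def forward(params, x):
    """Hyperbolic tangent on the hidden layers, linear transfer on the output layer."""
    a = list(x)
    last = len(params) - 1
    for k, (weights, biases) in enumerate(params):
        z = [sum(a[i] * weights[i][j] for i in range(len(a))) + biases[j]
             for j in range(len(biases))]
        a = z if k == last else [math.tanh(v) for v in z]
    return a[0]

def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)

if __name__ == "__main__":
    net = init_params(random.Random(0))
    print(count_params(net), forward(net, [0.1, -0.2, 0.3, 0.4]))
```

With these layer sizes the network has 4·16 + 16 + 16·8 + 8 + 8·1 + 1 = 225 trainable parameters, which is also the number of genes a real-number chromosome would need to hold.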

Training Dataset
The dynamic response prediction algorithm based on the BP neural network in this paper is a supervised learning algorithm, and the training data are composed of input and output data pairs. First, values of the four input parameters are taken uniformly within a certain range to obtain a set of input data. Then, each set of input parameters is substituted into Equation (4) to obtain the displacement response time history. Finally, the maximum absolute value of the displacement response time history is taken as the output for that set of input parameters.
Because the genetic algorithm must evaluate the fitness function of every individual in a population, using all training data would make the algorithm inefficient. Therefore, the training data are divided into two parts: the sparsely selected sample data in the first part are used by the genetic algorithm to optimize the initial weights and biases, and all data in the second part are used to train the final BP neural network. The detailed methods for taking the small sample training data and the whole training data are shown in Table 2.
The dataset of the BP neural network is divided into a training set, a validation set, and a test set. The training set is used to train the BP neural networks. The validation set is used to evaluate the performance of the BP neural networks, including determining whether there is underfitting or overfitting. The test set is used to evaluate the generalization ability of the final BP neural networks and does not participate in the training of the BP neural networks. In this paper, after the dataset is shuffled, 70% of the total sample is randomly selected as the training set, 15% as the validation set, and 15% as the test set.
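The shuffle-then-split step can be sketched as follows. The function name `split_dataset`, the fixed seed, and the rounding of the subset sizes are assumptions for illustration; the paper only states the 70%/15%/15% proportions.

```python
import random

def split_dataset(samples, seed=0, train_frac=0.70, val_frac=0.15):
    """Shuffle a copy of the samples and cut it into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    n_val = round(val_frac * len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # the remainder, never used for training
    return train, val, test

if __name__ == "__main__":
    train, val, test = split_dataset(range(1000))
    print(len(train), len(val), len(test))
```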

Data Initialization
In general, the distribution of the training dataset affects the calculation efficiency of the gradient descent algorithm, and data preprocessing can improve the solving speed and accuracy. The common normalization methods include linear normalization, mean standardization, and nonlinear normalization (taking the logarithm, exponent, etc.). The distribution characteristics of the training dataset in this paper show that the data are too concentrated in an interval of smaller magnitude. Thus, a combination of nonlinear and linear normalization is selected, that is, taking the logarithm of the dataset and then applying linear normalization. The calculation formulas for the normalization of input and output data are shown in Equations (5) and (6), respectively,
where x and y are the non-normalized and normalized data, respectively; m and n are the maximum and minimum values of linear normalization, respectively; and x_max and x_min are the maximum and minimum values of the dataset, respectively. In Equation (5), the dataset is linearly compressed into the interval [n, m]. In Equation (6), the dataset is linearly compressed into the interval [n, m] after taking the logarithm. In this paper, n and m are set to −1 and 1, respectively, and the u_max histograms of the network output before and after the normalization are shown in Figure 4. It can be seen that many non-normalized data are concentrated in the interval [0, 5000], while the normalized data are close to being normally distributed. Training data that are close to normally distributed can improve the accuracy and efficiency of the gradient descent algorithm.
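The two normalizations can be sketched as below. The formulas are written from the verbal description only (a linear map of [x_min, x_max] onto [n, m], optionally after taking logarithms); the use of the natural logarithm, the function names, and the inverse map are assumptions and are not confirmed by the paper.

```python
import math

def linear_normalize(x, x_min, x_max, n=-1.0, m=1.0):
    """Map x from [x_min, x_max] linearly onto [n, m]."""
    return (m - n) * (x - x_min) / (x_max - x_min) + n

def log_normalize(x, x_min, x_max, n=-1.0, m=1.0):
    """Take logarithms first (natural log assumed), then normalize linearly."""
    return linear_normalize(math.log(x), math.log(x_min), math.log(x_max), n, m)

def log_denormalize(y, x_min, x_max, n=-1.0, m=1.0):
    """Inverse of log_normalize, used to turn a network output back into u_max."""
    log_x = (y - n) * (math.log(x_max) - math.log(x_min)) / (m - n) + math.log(x_min)
    return math.exp(log_x)

if __name__ == "__main__":
    y = log_normalize(50.0, 1.0, 5000.0)
    print(y, log_denormalize(y, 1.0, 5000.0))
```

Because the logarithm spreads out values that are crowded near the lower end of the range, the geometric mean of x_min and x_max lands at the middle of [n, m] rather than near its lower edge.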

Coding
In the genetic algorithm, the chromosomes must first be coded, using methods such as binary coding or real-number coding. For binary coding, the network parameters need to be converted into binary numbers, and the large number of parameters of the neural networks leads to long genes. In addition, Hancock [13] pointed out that binary coding also leads to the permutation problem. For real-number coding [14][15][16], the network parameters are directly coded into a string of real numbers in sequence, which requires no conversion between number systems and does not produce excessively long chromosomes. In this paper, the real-number coding shown in Figure 5 is adopted. The chromosome consists of three sets of weight vectors and three sets of bias vectors, where ω_1, ω_2, and ω_3 are the weight vector from the input layer to the first hidden layer, the weight vector from the first hidden layer to the second hidden layer, and the weight vector from the second hidden layer to the output layer, respectively. ω_{2,i-j} is the weight from the ith neuron at the first hidden layer to the jth neuron at the second hidden layer. b_1, b_2, and b_3 are the bias vector from the input layer to the first hidden layer, the bias vector from the first hidden layer to the second hidden layer, and the bias vector from the second hidden layer to the output layer, respectively. b_{2,k} is the kth bias from the first hidden layer to the second hidden layer.
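Real-number coding amounts to flattening the network parameters into one list and restoring them again. The sketch below assumes the gene order ω_1, ω_2, ω_3, b_1, b_2, b_3 and a row-major weight layout; the actual order in Figure 5 is not recoverable from the text, and the names `encode` and `decode` are hypothetical.

```python
LAYER_SIZES = [4, 16, 8, 1]

def encode(weights, biases):
    """Concatenate all weight matrices (row by row) and then all bias vectors."""
    genes = []
    for matrix in weights:
        for row in matrix:
            genes.extend(row)
    for vector in biases:
        genes.extend(vector)
    return genes

def decode(chromosome):
    """Rebuild the weight matrices and bias vectors from a flat chromosome."""
    expected = sum(a * b + b for a, b in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]))
    if len(chromosome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(chromosome)}")
    pos = 0
    weights, biases = [], []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        matrix = [list(chromosome[pos + i * n_out: pos + (i + 1) * n_out])
                  for i in range(n_in)]
        weights.append(matrix)
        pos += n_in * n_out
    for n_out in LAYER_SIZES[1:]:
        biases.append(list(chromosome[pos: pos + n_out]))
        pos += n_out
    return weights, biases

if __name__ == "__main__":
    w, b = decode([0.0] * 225)
    print(len(encode(w, b)))
```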

Fitness Function
The fitness function is used to rank the individuals in a population. In this paper, the chromosome of each individual is assigned to the BP neural network, the network is trained 20 times, and the sum of the absolute values of the prediction errors is taken as the fitness value of the individual. The calculation formula is shown in Equation (7),
where n is the total number of training data, which is set to 4400 in this paper; y_i is the target output of the ith set of data; and o_i is the predicted output of the ith set of data.
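The fitness described above is a sum of absolute prediction errors. A minimal sketch, assuming no additional scaling coefficient in Equation (7) and using the hypothetical name `fitness`:

```python
def fitness(targets, predictions):
    """Sum of |y_i - o_i| over all samples; smaller values mean a better individual."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum(abs(y - o) for y, o in zip(targets, predictions))

if __name__ == "__main__":
    print(fitness([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

Since this quantity is an error, the genetic algorithm has to favor low values, which is why the selection step weights individuals inversely to it.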

Basic Operators of Genetic Algorithm
(1) Selection operator
The roulette algorithm is used as the selection operator in this paper, and the selection probability (p_i) is assigned to each individual based on Equation (8). In the equation, F_i is the fitness value of an individual. Since the fitness is defined as the sum of the absolute values of prediction errors, individuals with lower fitness values should be more likely to be selected.
(2) Crossover operator
In this paper, the real-number crossover method is adopted, and the crossover probability is set to 0.3. Two parent chromosomes are randomly selected first, and then one crossover position is randomly selected. The value of each offspring chromosome at the crossover position is a randomly weighted sum of the values of the two parent chromosomes at that position, as shown in Equation (9),
where A and B represent the offspring chromosomes generated by crossover; A_j and B_j represent the values at the jth crossover position of chromosomes A and B, respectively; j is a random integer, the maximum value of which is the length of the chromosome; and b is a randomly generated combination weight coefficient, the value range of which is from 0 to 1.
(3) Mutation operator
In this paper, the mutation probability is set to 0.3, and the mutation is based on Equation (10),
where A_j is the gene value at the jth gene position of a certain chromosome in a population; A′_j is the gene value at the jth gene position after mutation; A_{j,max} is the upper limit of the gene value at the jth gene position; A_{j,min} is the lower limit of the gene value at the jth gene position; r_1 and r_2 are two random numbers in [0, 1]; g is the current generation; and G_max is the maximum generation. Φ is a mutation parameter related to the generation; the greater the generation is, the smaller the value of the mutation parameter will be, i.e., the mutation probability will decrease with the increase of the generation.
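The three operators can be sketched as below. Equations (8) to (10) are not reproduced in the text, so the concrete forms here are assumptions chosen to match the verbal description: selection weights proportional to 1/F_i, an arithmetic crossover with weight b at one random position, and a bounded mutation scaled by Φ = r_2·(1 − g/G_max)². The function names are hypothetical.

```python
import random

def selection_probabilities(fitness_values):
    """Assumed form of Equation (8): p_i proportional to 1/F_i (lower error, higher chance)."""
    inverse = [1.0 / f for f in fitness_values]
    total = sum(inverse)
    return [v / total for v in inverse]

def roulette_select(population, fitness_values, rng):
    """Draw a new population of the same size by roulette-wheel sampling."""
    probs = selection_probabilities(fitness_values)
    return [list(c) for c in rng.choices(population, weights=probs, k=len(population))]

def crossover(a, b, rng, p_cross=0.3):
    """With probability p_cross, blend the parents at one random position j."""
    a, b = list(a), list(b)
    if rng.random() < p_cross:
        j = rng.randrange(len(a))
        w = rng.random()                       # random combination weight in [0, 1)
        a[j], b[j] = w * a[j] + (1.0 - w) * b[j], w * b[j] + (1.0 - w) * a[j]
    return a, b

def mutate(chrom, lower, upper, g, g_max, rng, p_mut=0.3):
    """With probability p_mut, move one gene toward a bound by a shrinking step."""
    chrom = list(chrom)
    if rng.random() < p_mut:
        j = rng.randrange(len(chrom))
        r1, r2 = rng.random(), rng.random()
        phi = r2 * (1.0 - g / g_max) ** 2      # assumed generation-dependent factor
        if r1 > 0.5:
            chrom[j] += (upper - chrom[j]) * phi
        else:
            chrom[j] -= (chrom[j] - lower) * phi
    return chrom

if __name__ == "__main__":
    rng = random.Random(0)
    print(selection_probabilities([1.0, 2.0, 4.0]))
    print(crossover([1.0, 2.0], [3.0, 4.0], rng, p_cross=1.0))
    print(mutate([0.0, 0.0], -1.0, 1.0, g=2, g_max=10, rng=rng, p_mut=1.0))
```

Under this assumed mutation form, the step is always a fraction of the distance to a bound, so mutated genes stay inside [A_{j,min}, A_{j,max}], and the step vanishes in the final generation.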

Genetic Algorithm Optimization Results
An initial population containing 20 individuals is randomly generated, and the evolution of the mean fitness values and best fitness values of a population can be obtained after 10 generations, as shown in Figure 6. It can be seen that the fitness of the population decreases constantly with the increase of the generation and finally tends to be stable.
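To make the shape of such a run concrete, the sketch below evolves 20 individuals for 10 generations on a toy problem and records the best and mean fitness per generation. It is not the paper's experiment: `toy_fitness` is a hypothetical stand-in for the network's summed prediction error, the operator forms are assumed, and the elitism step (copying the best individual into the next generation) is an addition made here so that the best fitness cannot get worse.

```python
import random

def toy_fitness(chrom):
    """Hypothetical stand-in for the summed absolute prediction error of a decoded network."""
    return sum(abs(g - 0.25) for g in chrom)

def evolve(pop_size=20, n_genes=6, generations=10, p_cross=0.3, p_mut=0.3, seed=1):
    rng = random.Random(seed)
    lo, hi = -1.0, 1.0
    pop = [[rng.uniform(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    best_hist, mean_hist = [], []
    for g in range(generations + 1):
        fit = [toy_fitness(c) for c in pop]
        best_hist.append(min(fit))
        mean_hist.append(sum(fit) / pop_size)
        if g == generations:
            break
        elite = list(pop[fit.index(min(fit))])
        # Roulette selection with weights 1/F (assumed), copying each pick.
        weights = [1.0 / (f + 1e-12) for f in fit]
        pop = [list(c) for c in rng.choices(pop, weights=weights, k=pop_size)]
        # Arithmetic crossover at one random position for consecutive pairs.
        for a, b in zip(pop[0::2], pop[1::2]):
            if rng.random() < p_cross:
                j, w = rng.randrange(n_genes), rng.random()
                a[j], b[j] = w * a[j] + (1 - w) * b[j], w * b[j] + (1 - w) * a[j]
        # Bounded mutation whose step shrinks as the generation index grows.
        for c in pop:
            if rng.random() < p_mut:
                j = rng.randrange(n_genes)
                phi = rng.random() * (1 - g / generations) ** 2
                if rng.random() > 0.5:
                    c[j] += (hi - c[j]) * phi
                else:
                    c[j] -= (c[j] - lo) * phi
        pop[0] = elite  # elitism (added here, not stated in the paper)
    return best_hist, mean_hist

if __name__ == "__main__":
    best, mean = evolve()
    for g, (b, m) in enumerate(zip(best, mean)):
        print(g, round(b, 4), round(m, 4))
```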

Validation on the Dataset
The predicted output for each input is obtained by feeding the input data into the trained artificial neural network. The validation diagram of the neural network shown in Figure 7 is obtained by taking the target value as the x-coordinate and the predicted value as the y-coordinate. The closer the discrete points in the figure are to the line with a slope of 1, the higher the prediction accuracy. Figure 7a shows the validation results on the test set: the discrete points are distributed close to the line, with a mean square error of only 1.0024 × 10^−5. Because the test set does not participate in the network training, this indicates that the artificial neural network has good generalization ability. Figure 7b shows the validation results on all datasets: the mean square error is less than 10^−5, indicating that the artificial neural network has high prediction accuracy and can meet the needs of the subsequent analysis.

Validation of the Change Rule of Single Parameter
The test set does not participate in the training of the artificial neural network, so checking the accuracy on the test set validates the generalization ability of the network to some extent. However, the data values in the test set still lie between the maximum and minimum values of the training data. To further validate the generalization ability of the artificial neural network outside the range of the datasets, the predicted value of the network is compared with the theoretical value in this section while a single input parameter is varied continuously, as shown in Figure 8.
Figure 8a shows the change of the theoretical output and predicted output with the structural period T, which can also be considered as the displacement response spectrum. When the range of T is extended from the 0.5-4 s of the original data to 0-8 s, the artificial neural network in this paper can still accurately predict the output values for T below 6.6 s. Figure 8b shows the change of the theoretical output and predicted output with the excitation frequency ω. When the range of ω is extended from the 0.1-10 of the original data to 0-20, the artificial neural network can still accurately predict the output values over the entire extended range. Figure 8c shows the change of the theoretical output and predicted output with the excitation shape parameter β. When the range of β is extended from the 0.01-0.05 of the original data to 0-0.1, the artificial neural network can still accurately predict the output values for β below 0.064. Figure 8d shows the change of the theoretical output and predicted output with the time of peak t_0. When the range of t_0 is extended from the 20-40 s of the original data to 0-60 s, the artificial neural network can still accurately predict the output values for t_0 above 6.2 s.

Sensitivity Analysis of Neurons at the Adjacent Layers
The complete structure of the artificial neural network is shown in Figure 9, including the input layer, hidden layers, and output layer. $x_{i,j}$ represents the jth neuron at the ith layer; ReLU is the activation function; and $W_i$ and $\theta_i$ are, respectively, the weight matrix and bias vector from the ith layer to the (i + 1)th layer.
Sensitivity [17,18] is defined as the influence of the input change on the output, which can be obtained by taking the derivative of the output with respect to the input. Consider the mth neuron $x_{k,m}$ at the kth layer and the nth neuron $x_{k+1,n}$ at the (k + 1)th layer. When a neuron at the previous layer undergoes a small disturbance, the resulting change of the neuron at the later layer, $S_{(k,m),(k+1,n)}$, can be obtained by taking the derivative of the output $x_{k+1,n}$ with respect to the input $x_{k,m}$, as shown in Equation (11).
$$S_{(k,m),(k+1,n)} = \frac{\partial x_{k+1,n}}{\partial x_{k,m}} \tag{11}$$
The relationship between $x_{k,m}$ and $x_{k+1,n}$ is shown in Equation (12), where $W_{k,i,n}$ is the weight coefficient from the ith neuron at the kth layer to the nth neuron at the (k + 1)th layer; $\theta_{k,n}$ is the bias coefficient from the neurons at the kth layer to the nth neuron at the (k + 1)th layer; and R(x) is an activation function.
$$x_{k+1,n} = R\left(\sum_{i} W_{k,i,n}\, x_{k,i} + \theta_{k,n}\right) \tag{12}$$
After substituting Equation (12) into Equation (11), we can obtain the sensitivity expression in Equation (13), where R'(x) is the derivative of the activation function.
$$S_{(k,m),(k+1,n)} = R'\left(\sum_{i} W_{k,i,n}\, x_{k,i} + \theta_{k,n}\right) W_{k,m,n} \tag{13}$$
When the activation function is a linear transfer function, R(x) = x and R'(x) = 1; therefore, Equation (13) can be further simplified to Equation (14).
$$S_{(k,m),(k+1,n)} = W_{k,m,n} \tag{14}$$
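The adjacent-layer sensitivity, that is, the derivative of the activation function multiplied by the connecting weight, can be checked numerically with a short sketch. The function names, the layer sizes, the random weights, and the choice of a hyperbolic tangent activation are assumptions made for illustration; the bias is assumed to be added inside the activation.

```python
import numpy as np

def layer_forward(x, W, theta, act=np.tanh):
    """x_{k+1,n} = R(sum_i W[i, n] * x[i] + theta[n])."""
    return act(x @ W + theta)

def adjacent_sensitivity(x, W, theta, act_deriv):
    """S[m, n] = d x_{k+1,n} / d x_{k,m} = R'(z_n) * W[m, n]."""
    z = x @ W + theta
    return W * act_deriv(z)[np.newaxis, :]

def tanh_deriv(z):
    return 1.0 - np.tanh(z) ** 2

def linear_deriv(z):
    return np.ones_like(z)

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # hypothetical outputs of layer k
W = rng.normal(size=(3, 4))     # hypothetical weights, layer k -> layer k + 1
theta = rng.normal(size=4)      # hypothetical biases

S_tanh = adjacent_sensitivity(x, W, theta, tanh_deriv)
S_lin = adjacent_sensitivity(x, W, theta, linear_deriv)  # linear case: S equals W

# Central finite differences as an independent check of S_tanh.
eps = 1e-6
S_fd = np.empty_like(W)
for m in range(x.size):
    dx = np.zeros_like(x)
    dx[m] = eps
    S_fd[m] = (layer_forward(x + dx, W, theta)
               - layer_forward(x - dx, W, theta)) / (2 * eps)

print(np.max(np.abs(S_tanh - S_fd)))
```

For a linear transfer function the sensitivity matrix is simply the weight matrix, which is the simplification stated in the text.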

Sensitivity Analysis of Neurons at Any Layer
Consider now the mth neuron $x_{k,m}$ at the kth layer and the nth neuron $x_{v,n}$ at the vth layer (v > k). Their sensitivity can be obtained by the chain rule of differentiation, as shown in Equation (16).
$$S_{(k,m),(v,n)} = \frac{\partial x_{v,n}}{\partial x_{k,m}} = \sum_{i_{k+1}} \sum_{i_{k+2}} \cdots \sum_{i_{v-1}} S_{(k,m),(k+1,i_{k+1})}\, S_{(k+1,i_{k+1}),(k+2,i_{k+2})} \cdots S_{(v-1,i_{v-1}),(v,n)} \tag{16}$$
where $i_j$ represents the index of a neuron at the jth layer. The chain rule in Equation (16) converts the sensitivity between neurons at any two layers into a sum of products of the sensitivities between neurons at adjacent layers.
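The chain rule between two arbitrary layers can be sketched as a product of adjacent-layer sensitivity matrices, because a matrix product carries out the summation over all intermediate neuron indices. The layer sizes, random weights, and helper names below are hypothetical and chosen only for illustration; they are not the trained network of this paper.

```python
import numpy as np

def tanh_deriv(z):
    return 1.0 - np.tanh(z) ** 2

def identity(z):
    return z

def identity_deriv(z):
    return np.ones_like(z)

def forward(x0, weights, biases, acts):
    """Return the list of layer outputs [x_0, x_1, ..., x_L]."""
    xs = [np.asarray(x0, dtype=float)]
    for W, theta, (act, _) in zip(weights, biases, acts):
        xs.append(act(xs[-1] @ W + theta))
    return xs

def sensitivity(xs, weights, biases, acts, k, v):
    """S[m, n] = d x_{v,n} / d x_{k,m}, accumulated layer by layer."""
    S = np.eye(xs[k].size)
    for layer in range(k, v):
        W, theta = weights[layer], biases[layer]
        _, act_deriv = acts[layer]
        z = xs[layer] @ W + theta
        # Adjacent-layer sensitivity: R'(z_n) * W[m, n]
        S = S @ (W * act_deriv(z)[np.newaxis, :])
    return S

rng = np.random.default_rng(1)
sizes = [4, 6, 5, 1]  # assumed layer widths: 4 inputs, two hidden layers, 1 output
weights = [0.5 * rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [0.1 * rng.normal(size=b) for b in sizes[1:]]
acts = [(np.tanh, tanh_deriv), (np.tanh, tanh_deriv), (identity, identity_deriv)]

x0 = rng.normal(size=sizes[0])
xs = forward(x0, weights, biases, acts)
S_01 = sensitivity(xs, weights, biases, acts, k=0, v=1)
S_13 = sensitivity(xs, weights, biases, acts, k=1, v=3)
S_03 = sensitivity(xs, weights, biases, acts, k=0, v=3)
print(S_03.ravel())
```

Composing the sensitivities of layer 0 to 1 and of layer 1 to 3 reproduces the sensitivity of layer 0 to 3, which is the property the chain rule expresses.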

Analysis of Influence of the Parameters of the Intensity Non-Stationarity Based on Neural Networks
The BP neural network in this paper has three connection layers, of which the first two adopt hyperbolic tangent activation functions and the output layer adopts a linear transfer function. Equation (17) can be obtained by the chain rule in Equation (16),
where $y_{0,i}$ and $x_{0,i}$ are the data values of the ith input before and after the normalization, respectively, and the normalization formula is shown in Equation (5); $y_{3,1}$ and $x_{3,1}$ are the output data values before and after the normalization, respectively, and the normalization formula is shown in Equation (6); $W_{0,i,j}$ is the weight from the ith neuron at the input layer to the jth neuron at the first hidden layer; $W_{1,j,k}$ is the weight from the jth neuron at the first hidden layer to the kth neuron at the second hidden layer; and $W_{2,k,1}$ is the weight from the kth neuron at the second hidden layer to the first neuron at the output layer. According to the normalization formulas of the input and output data, the first two differential terms in Equation (17) can be evaluated as shown in Equation (18),
where $y_{0,i,\max}$ and $y_{0,i,\min}$ are the maximum and minimum values of the ith input data before the normalization, respectively; $y_{3,1,\max}$ and $y_{3,1,\min}$ are the maximum and minimum values of the output data before the normalization, respectively; and m and n are the maximum and minimum values of the linear normalization, which are set to 1 and −1, respectively, in this paper. The data in Figure 8 are substituted into Equation (18), and the corresponding sensitivity analysis figures can be obtained, as shown in Figure 10.
According to Figure 10c,d, the excitation shape parameter (β) and the time of peak ($t_0$) have the least effect on the maximum absolute displacement response; the sensitivity to the structural period and excitation frequency is first positive and then negative, which is consistent with the change rule of the structural response (which first increases and then decreases) near the resonance frequency. A similar rule can also be found in Figure 10e: the sensitivity is greater when the structural period and the excitation frequency are close to resonance, and the higher the excitation frequency is, the smaller the effect of the vibration input on the dynamic response will be.
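The sensitivity in physical units, including the two normalization terms discussed above, can be sketched for a tanh-tanh-linear network as follows. The sketch assumes a linear min-max normalization onto [n, m] = [−1, 1] of the form x = (m − n)(y − y_min)/(y_max − y_min) + n. The weights are random placeholders rather than the trained values, the hidden-layer widths and the output range are hypothetical, and the input ranges are the training ranges quoted for Figure 8.

```python
import numpy as np

def normalize(y, y_min, y_max, n=-1.0, m=1.0):
    """Assumed linear min-max normalization onto [n, m]."""
    return (m - n) * (y - y_min) / (y_max - y_min) + n

def denormalize(x, y_min, y_max, n=-1.0, m=1.0):
    return (x - n) * (y_max - y_min) / (m - n) + y_min

def network(x0, W0, b0, W1, b1, W2, b2):
    x1 = np.tanh(x0 @ W0 + b0)   # first hidden layer
    x2 = np.tanh(x1 @ W1 + b1)   # second hidden layer
    x3 = x2 @ W2 + b2            # linear output layer
    return x1, x2, x3

def physical_sensitivity(y0, params, in_min, in_max, out_min, out_max,
                         n=-1.0, m=1.0):
    """d y_{3,1} / d y_{0,i} for every input i, in physical units."""
    W0, b0, W1, b1, W2, b2 = params
    x1, x2, _ = network(normalize(y0, in_min, in_max, n, m), *params)
    # sum_j sum_k W0[i,j] (1 - x1_j^2) W1[j,k] (1 - x2_k^2) W2[k,0]
    dx3_dx0 = (W0 * (1.0 - x1 ** 2)) @ (W1 * (1.0 - x2 ** 2)) @ W2
    dx0_dy0 = (m - n) / (in_max - in_min)    # input normalization term
    dy3_dx3 = (out_max - out_min) / (m - n)  # output de-normalization term
    return dy3_dx3 * dx0_dy0 * dx3_dx0[:, 0]

def predict(y0, params, in_min, in_max, out_min, out_max):
    x3 = network(normalize(y0, in_min, in_max), *params)[2]
    return denormalize(x3[0], out_min, out_max)

rng = np.random.default_rng(2)
h1, h2 = 8, 8  # hypothetical hidden-layer widths
params = (0.5 * rng.normal(size=(4, h1)), 0.1 * rng.normal(size=h1),
          0.5 * rng.normal(size=(h1, h2)), 0.1 * rng.normal(size=h2),
          0.5 * rng.normal(size=(h2, 1)), 0.1 * rng.normal(size=1))

# Input order assumed: time of peak, shape parameter, excitation frequency, period.
in_min = np.array([20.0, 0.01, 0.1, 0.5])
in_max = np.array([40.0, 0.05, 10.0, 4.0])
out_min, out_max = 0.0, 1.0  # hypothetical output range

y0 = np.array([30.0, 0.03, 5.0, 2.0])
sens = physical_sensitivity(y0, params, in_min, in_max, out_min, out_max)
print(sens)
```

Because the factor (m − n) appears once in the numerator and once in the denominator, the combined scaling reduces to the ratio of the output range to the input range, so inputs with a narrow physical range, such as the shape parameter, receive a large scaling factor.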

Conclusions
The effect of the intensity non-stationarity of ground motions on the structural dynamic responses is one of the scientific problems in the field of earthquake engineering. In this paper, the adoption of GA-BP neural networks in the research of the presented problem is proposed for the first time. Firstly, the initial weights and biases of the BP neural network are optimized by the genetic algorithm based on a small amount of training data. Then, the BP neural network is trained on all training data. Finally, the effect rule of the intensity non-stationarity of ground motions on the structural dynamic responses is obtained by studying the relationship between the BP neural network input and output. The main conclusions in this research are as follows:
(1) The GA-BP neural network in this paper, after being trained on a large amount of data, can predict the maximum displacement response of the structure from four input parameters (time of peak $t_0$, excitation shape parameter β, excitation frequency ω, and structural period T). The results show that the GA-BP neural network has high prediction accuracy, but this accuracy can only be ensured within a certain range because the range of the training data is limited. Owing to the generalization ability of BP neural networks, however, this range is larger than that of the training data.
(2) The sensitivity of the neural network can be defined as the derivative of the output with respect to the input, which is used to measure the effect of the input change on the output. The analytical expression of the derivative of the BP neural network output with respect to the input is obtained by mathematical derivation, and its comparison with the theoretical value shows that the sensitivity calculated from the neural network is consistent with the theoretical value within the range of prediction accuracy. The results of the sensitivity analysis show that the maximum displacement output is most sensitive to changes of the structural period and excitation frequency near the point of resonance.
(3) In the research on the effects of the nonstationary characteristics on the structural dynamic responses, compared with the analytic derivation method, the method based on the GA-BP neural network can not only ensure equivalent accuracy but also avoid situations where the expression is too complex to solve; thus, this method can be extended to the study of the effect rule of nonlinear responses. Compared with numerical methods, the method in this paper can make full use of the advantages of neural networks in high-dimensional, strongly nonlinear fitting and comprehensively consider the effect rule of multiple input parameters.