Prediction of Maximum Story Drift of MDOF Structures under Simulated Wind Loads Using Artificial Neural Networks

The aim of this paper is to investigate the prediction of maximum story drift of Multi-Degree of Freedom (MDOF) structures subjected to dynamics wind load using Artificial Neural Networks (ANNs) through the combination of several structural and turbulent wind parameters. The maximum story drift of 1600 MDOF structures under 16 simulated wind conditions are computed with the purpose of generating the data set for the networks training with the Levenberg–Marquardt method. The Shinozuka and Newmark methods are used to simulate the turbulent wind and dynamic response, respectively. In order to optimize the computational time required for the dynamic analyses, an array format based on the Shinozuka method is presented to perform the parallel computing. Finally, it is observed that the already trained ANNs allow for predicting adequately the maximum story drift with a correlation close to 99%.


Introduction
The maximum story drift of Multi-Degree of Freedom (MDOF) structures under wind loads is an important parameter for the structural design; however, for structural engineers, the estimation of this parameter by dynamic analysis could be impractical. The dynamic analysis of structures is complex and it requires real wind speed records, which are not available in most of the cases. Several studies suggest that the use of spectral density models are adequate to simulate the velocity field of the turbulent wind [1][2][3]. Due to the complexity of obtaining synthetic wind records and the dynamic structural response, the engineers have used non-complex methods such as the equivalent static analysis, which tries to consider the dynamic effects of the turbulent wind supported by the use of the well-known dynamic amplification factor [4]. Some modifications and improvements to the equivalent static method, originally presented by Davenport [5], have been proposed by different authors [6,7]. Although these methods are not complex, they require several calculation steps in order to estimate in tasks as prediction, classification, data processing, robotic and engineering problems [8][9][10][11]. For this reason, the aim of this study is to evaluate the efficiency of ANNs as computational models for prediction of the maximum story drift of MDOF structures in terms of regional wind speeds, terrain categories and structural characteristics such as height, slenderness ratio, fundamental period, distributed mass and stiffness ratio. The Newmark method is employed to resolve the differential equation of motion. This method is frequently used in numerical evaluation of the dynamic response of structures [12][13][14]. The wind loads applied in the dynamic analyses are generated from synthetic records obtained by spectral representation with the well-known Shinozuka method [15], which has been shown to give the best results in terms of overall quality of the signal [16]. The dynamic responses of a considerable amount of MDOF structures are used to build a database about the maximum story drift related to structural characteristics and wind conditions. This database is employed to the training processes of ANNs with the Levenberg-Marquardt (LM) method, which is a standard technique for solving nonlinear least squares problems [17]. The LM method is an optimization procedure with features of stable and fast convergence [18]. The training process is applied in ANN models known as multilayer perceptrons that consist of neurons located in multiple layers and interconnected to form a network with forward propagation (Figure 1a). The multilayer perceptron approach resolve problems that are not linearly separable, which is the main limitation of the standard linear perceptron method. Hecht-Nielsen [8] proved that one hidden layer of neurons suffices to model any solution surface of practical interest. In this paper, some multilayer perceptron models with different amounts of neurons in a single hidden layer are evaluated to predict the maximum story drift of 25,600 case studies.  Figure 1b shows the internal mathematical process of each neuron. The output is given by an activation function f which depends on the sum of the inputs n w : sum = 1w + n w + n w + ⋯ + n w , where n is the output of other neuron and w is the synaptic weight ( Figure 1b). The sigmoid function (values between 0 and 1) is frequently used like activation function and its mathematical form is: (sum) = n = 1 1 + .
An important property is that its derivative can be expressed by the function itself: The paper is organized in the following way: Section 2 exposes the detail of the theory employed in network training, wind simulation and dynamic response. Section 3 describes the set of structural characteristics and wind conditions; then, the results of the application of theoretical framework are shown. Finally, a brief discussion of the main results is presented in Section 4.  Figure 1b shows the internal mathematical process of each neuron. The output is given by an activation function f which depends on the sum of the inputs n i w i : where n i is the output of other neuron and w i is the synaptic weight ( Figure 1b). The sigmoid function (values between 0 and 1) is frequently used like activation function and its mathematical form is: An important property is that its derivative can be expressed by the function itself:

of 16
The paper is organized in the following way: Section 2 exposes the detail of the theory employed in network training, wind simulation and dynamic response. Section 3 describes the set of structural characteristics and wind conditions; then, the results of the application of theoretical framework are shown. Finally, a brief discussion of the main results is presented in Section 4.

Network Training
The Sum Square Error (SSE) is a performance index and it is used in order to evaluate the training of the network: where N p is the number of patterns; N o is the number of outputs; e p

o.
The derivative of the error, as a function of the weights, defines the gradient, which gives information to find a minimum. The mathematical form is as follows: where G is the gradient vector; and w is the weight vector.
The use of the gradient as principal parameter for reducing the error is known as the descendent gradient method. The update of weights to minimize the error by descendent gradient can be written as: where α is the learning constant and ∆w the update for the weights. This method is stable but could be very slow with a lot of interactions due to inefficient values of α. The Newton method allows for reducing the number of interactions using a square matrix of second-order partial derivatives. This matrix can be defined by Taylor series. According to Taylor, it is possible to represent a function from an infinite sum of terms that are calculated by derivatives and powers in a single point. The Taylor polynomial of second degree can be written in its compact form as: where a is the derivation point, G(a) is the gradient vector and H(a) is the Hessian matrix. Using the Taylor approximation, Equation (7), to estimate the next possible value of the error function, i.e., T(x) = E(w + ∆w), and considering the derivation point at a = w, the expression would be: Assuming that the value of error is zero E(w + ∆w) = 0 with the aim to close the error to a minimum, it is possible to reduce the expression with the derivative with respect to ∆w: therefore: written in matrix form : The updating of the weights by Newton method is obtained with the following expression: The Hessian matrix H of second-order partial derivatives of total error function have to be calculated and it could be very complicated. In order to simplify the calculating process, a matrix of first-order partial derivatives is presented as a Jacobian matrix, which has a mathematical relation with a Gradient and Hessian matrix. By combining the error of Equation (4) and the definition of gradient (5), it is possible to determinate that: The error e expressed in vector form is: and the Jacobian of the error e: Therefore, the gradient could be represented for the Jacobian matrix by the following expression: Each element of the Hessian matrix can be expressed in terms of error e p . o resolving the cross partial derivative of the sum square error: where S i,j is: For the Gauss-Newton method, it is assumed that S i,j ≈ 0; therefore, the Hessian matrix could be related with the Jacobian matrix by the following form: The updating of the weights by Gauss-Newton method is obtained as a function of the matrix of first-order partial derivatives: The Gauss-Newton algorithm has better performance than the Newton method because it does not require the calculation of Hessian matrix. However, the Gauss-Newton algorithm could have problems of matrices not invertible. In order to resolve this problem, LM modified the Hessian matrix with an approximation: where µ is a value always positive such with combination to an identity matrix I makes positive definite and therefore can be invertible. The updating of the weights by LM method can be presented as: Calculation of Jacobian Matrix According to Equations (4) and (15), the elements of Jacobian matrix can be calculated as: Considering an ANN of one hidden layer (Figure 2), the backpropagation is used in order to resolve the Equation (23). As shown in Figure 2, the superscript denotes the position of the layer in the network while the subscript describes the position of the neuron in the layer. In the weights, the double subscript indicates the neural connection in backpropagation order. The number of input and hidden neurons are defined by N x and N 2 , respectively. Calculation of Jacobian Matrix According to Equations (4) and (15), the elements of Jacobian matrix can be calculated as: Considering an ANN of one hidden layer (Figure 2), the backpropagation is used in order to resolve the Equation (23). As shown in Figure 2, the superscript denotes the position of the layer in the network while the subscript describes the position of the neuron in the layer. In the weights, the double subscript indicates the neural connection in backpropagation order. The number of input and hidden neurons are defined by and , respectively. For a given pattern, the output could be calculated with backward computation: deriving Equations (24) and (25) with respect to w ( ) and w ( ) , respectively: For the problem with patterns and outputs, the Jacobian and error matrices can be organized as the pseudocode shown in Figure 3. For a given pattern, the output y p . o could be calculated with backward computation: deriving Equations (24) and (25) with respect to w (2) . oj and w (1) ji , respectively: For the problem with patterns and outputs, the Jacobian and error matrices can be organized as the pseudocode shown in Figure 3.
deriving Equations (24) and (25) with respect to w ( ) and w ( ) , respectively: For the problem with patterns and outputs, the Jacobian and error matrices can be organized as the pseudocode shown in Figure 3.  Therefore, the training process using LM method can be planned as:

1.
Calculate the forward computation with initial weights (randomly generated).
the current total error. If the current total error is increased, then reset the weights to the previous values and increase µ by a factor of 10. Then, go to Step 2. Otherwise, keep the new weights and decrease µ by a factor of 10.

4.
Go to Step 2 until the current total error is smaller than the required value.

Wind Simulation
Among various simulation methods, the spectral representation methods appear to be the most widely used because they are fast and conceptually straightforward. Wind synthetic records can be generated from the spectral representation method proposed by Shinozuka and Jan [15]. Considering the case of N p stationary stochastic processes u j (t), j = 1, 2, 3, . . . , N p and discrete time t = i∆t, i = 0, 1, 2, . . . , N s , the mathematical expression for the generation of synthetic records is: where u j (t) is a signal with zero mean of size 1 × N s , representing the turbulent part of the longitudinal component at jth point in the space; H jk (ω n ) is an element of the lower triangular matrix H(ω n ) of size N p × N p , which is defined by the Cholesky factorization process from the cross spectral density matrix; θ kn is an element of the random phase angle matrix Θ of size N p × N f , with uniform distribution between [0, 2π]; ω n is the discrete angular frequency (rad/s); ∆ω is the increment of angular frequency; and N f is the amount of values contained in the discrete spectral density function.
As it was shown in Equation (28), the calculation of the Shinozuka expression requires serial computing. In order to save computing time, an expression in matrix multiplication format based on the Shinozuka method is presented to perform the parallel computing taking into account the advantage of current multicore processors that can solve several vectors of the resulting matrix at the same time. The expression in array format is: where u is a matrix with size N p ×N s , and represents N p stochastic process;Â is a matrix with size N p ×(N p N f ), defined by Equation (30); X is a matrix with size N p N f ×N s , defined by Equation (31) cos cos cos cos Cross-Spectral Density Matrix The cross-spectral density matrix describes the influence of the turbulence components at two points at a given frequency. This influence is due to the spatial dimension of the vortices in the wind field. Considering Cholesky, if the cross spectral density matrix S 0 (ω) is symmetric positive definite, then S 0 (ω) can be decomposed as [1]: where S 0 (ω) is the cross spectral density matrix for angular frequency ω; and H(ω) is a lower triangular matrix. The process of Cholesky factorization is as follows: For two points with vertical separationr y , the S 0 ij (ω) corresponding elements are obtained by the cross-spectral density function S uu z i , z j , n : where S uu z i , z j , n is the cross spectral density function for two longitudinal turbulent components at space points z 1 and z 2 ; S u (z i , n) is the single power density spectrum at z i , defined in Equation (38); coh r y , n is the root-coherence function, defined in Equation (35); n is the frequency in Hz. The root-coherence function defines the static dependence between two turbulent components at two different points. This function tends to zero when the separation r y increases; in other words, the cross-influence of the wind turbulent is inversely proportional to the separation. Davenport [19] suggested an exponential expression as root-coherence function: coh r y , n = e (−C y r y n U ) , where U is the average speed between two points, calculated as 1/2 U(z i ) + U z j ; C y is a non-dimensional decay constant, whose typical value is 10. The variation of mean wind speed to different levels of height can be calculated with an expression known as power law: where f s is a time scale factor, according to Mackey [20] f s = 0.702 in order to scale 3-s to 10-min of mean wind speeds; b is the roughness factor, defined in Table 1; U 10,II is the basic wind speed, and it is a mean wind speed taking at reference height 10 m and terrain category II; α is the exponent, defined in Equation (37). According to Counihanm [21], the exponent α depends on the characteristics of roughness, and can be calculated with the following equation: α = 0.096 log 10 z 0 + 0.016 log 10 z 0 2 + 0.24.
The power spectral density function used in this paper is the von Karman-Harris model [23,24], which has shown a very good approach at high frequency with application to wind codes [25], and it is considering the most mathematically correct wind spectrum according to [26]. The mathematical form of von Karman-Harris model is: where L u (z) is the integral length scale; and σ u is standard deviations for the turbulence components. The standard deviations of wind speed are related with the turbulence intensity. The turbulence intensity is the most basic measure of turbulence, and it is defined by the ratio of the standard deviation of the wind speed to the mean wind speed: The turbulence intensity can be estimated by the following empirical expression [22]: The integral length scales are a measure of the sizes of the vortices in the wind or, in other words, the average size of a gust in a given direction [26]. In wind codes, a typical expression in order to estimate the integral length scales is [22]: where the exponent v depends on the roughness length z 0 , and it can be calculated by: v = 0.67 + 0.05 ln(z 0 ).

Dynamic Response
The Newmark method is one of the numerical integration methods most frequently used in structural dynamic analysis; software packages such as OpeenSees (v2002, CA, USA), Ruaumoko (v2004, Christchurch, New Zealand) and SAP2000 (v2012, CA, USA) [27][28][29] include routines of dynamic analysis by this method. This procedure is based on two types of assumptions in order to simplify and solve the differential equations of motion. The assumptions are focused on determining the acceleration between two time instants, which may be linear or a constant mean value. The idealization of a prismatic building as shear-frame or shear-building ( Figure 4) can be represented by a Multi-Degree-of-Freedom (MDOF) system [30]. The governing equation of motion for MDOF systems subjected to external forces can be given in a matrix form as next: where M, C and K are the mass, damping and stiffness matrices, respectively, d, represented by a Multi-Degree-of-Freedom (MDOF) system [30]. The governing equation of motion for MDOF systems subjected to external forces can be given in a matrix form as next: where , and are the mass, damping and stiffness matrices, respectively, , and are the displacement, velocity and acceleration vectors; and is a matrix with the force vectors. The matrix form of the mass and stiffness matrices for a MDOF structure (Figure 4b) with shear resistant are: The form of the damping matrix depends on the damping characteristic of the structure, which can be classical or nonclassical damping. The classical damping is an appropriate idealization if similar damping mechanisms are distributed throughout the structure, e.g., a multistory building with a similar structural system and structural materials over its height [30]. In this work, the Rayleigh Damping method is used in order to build a classical matrix . The damping matrix can be approached by the following equation of Rayleigh: where and Rayleigh coefficient and could be estimated by: The matrix form of the mass and stiffness matrices for a MDOF structure (Figure 4b) with shear resistant are: . . The form of the damping matrix C depends on the damping characteristic of the structure, which can be classical or nonclassical damping. The classical damping is an appropriate idealization if similar damping mechanisms are distributed throughout the structure, e.g., a multistory building with a similar structural system and structural materials over its height [30]. In this work, the Rayleigh Damping method is used in order to build a classical matrix C. The damping matrix can be approached by the following equation of Rayleigh: where a 0 and a 1 Rayleigh coefficient and could be estimated by: where ζ is the damping ratio for the first mode; and ω i and ω j are two angular frequencies that correspond to two adjacent vibration modes. Assuming a free vibration system without damping, the equation of motion is as follows: Substituting d = Φq(t) = Φ sin(ωt) in Equation (47), the following expression is an eigenvalue problem: where Φ is modal matrix that represents the deflected shape for each vibration modes; q is a vector for each instant time and represents the modal coordinates; and ω is a diagonal matrix with the angular frequency of vibration.
Rewriting the equation of motion to the modal form and multiplying all expressions by Φ T : where M = Φ T MΦ, C = Φ T CΦ, K = Φ T KΦ and F = Φ T F are modal transformation matrices corresponding to the mass, damping, stiffness, and forces matrices, respectively. The procedure of the Newmark method for MDOF structures is shown in Table 2, see Chopra [30], where i indicates the time step. The factors β and γ are constant values taking β = 1/4 and γ = 1/2 for the case of average acceleration and β = 1/6 y γ = 1/2 for the linear acceleration. Table 2. Procedure of the Newmark method for multi-degrees of freedom systems.
Step-by-step procedure: Initial calculations: Repetition for the next time step. Replace i by i + 1 and implement Steps 2.1 and 2.7 for the next time step.

Data Generation
In this paper, several steel shear-buildings discretized as MDOF structures with different structural vibration period T 0 were analyzed using wind loads from synthetic records of 10 min. The case studies are defined by the combinations of some parameters that consider diverse wind conditions and structural characteristics (see Figure 5). The range of values for each parameter was selected to cover a considerable number of sensitive structures to the dynamic effects of the wind. The values of basic wind speed or regional speed are presented in 3-s gust wind speeds because this is more common in this type of design; however, the time scale factor was considered for 10 min. The slenderness relation is defined as R e = H/b, where H is the total height of the structure and b is the plane dimension parallel to the direction of the wind (see Figure 4a). The stiffness ratio corresponds to the reduction of the stiffness along the height between the first and the last story level, i.e., R k = k N m /k 1 (see Figure 4b).
The distributed mass represents the mass concentration per unit area for each floor, and this parameter defines the lumped mass m of each story level.
The dynamic loads are generated from synthetic records, where the turbulent parts of the longitudinal components are obtained by the Shinozuka method. According to Bernoulli, the wind is air mass with kinetic energy. It can be expressed in terms of pressure or force acting on a contact area; the mathematical form for a structure is [31]: where F j (t) is the wind force on the j-th floor; C d is the drag coefficient, for rectangular prismatic C d ≈ 1 [31], which is used to quantify the drag or resistance of an object in a fluid environment; ρ is the density of the air, for this study ρ = 1.25 kg/m 3 ; A j is the contact area on the j-th floor; and U j (t) is the longitudinal component of the wind. It is defined as the sum of the mean velocity and its corresponding turbulent part U j (t) = U j + u j (t); . d j (t) is the velocity of the mass lumped on the jth floor. The dynamic loads are generated from synthetic records, where the turbulent parts of the longitudinal components are obtained by the Shinozuka method. According to Bernoulli, the wind is air mass with kinetic energy. It can be expressed in terms of pressure or force acting on a contact area; the mathematical form for a structure is [31]: where F ( ) is the wind force on the j-th floor; is the drag coefficient, for rectangular prismatic ≈ 1 [31], which is used to quantify the drag or resistance of an object in a fluid environment; is the density of the air, for this study = 1.25 kg/m 3 ; is the contact area on the j-th floor; and U ( ) is the longitudinal component of the wind. It is defined as the sum of the mean velocity and its corresponding turbulent part U ( ) = U + ( ); ( ) is the velocity of the mass lumped on the th floor.
The stiffness matrix, required for the dynamic analysis, was calculated by an iterative process, where the all parameters of structure definition are involved. The pseudo-code used for the creation of the mass and stiffness matrices is shown in Figure 6.  The stiffness matrix, required for the dynamic analysis, was calculated by an iterative process, where the all parameters of structure definition are involved. The pseudo-code used for the creation of the mass and stiffness matrices is shown in Figure 6. The dynamic loads are generated from synthetic records, where the turbulent parts of the longitudinal components are obtained by the Shinozuka method. According to Bernoulli, the wind is air mass with kinetic energy. It can be expressed in terms of pressure or force acting on a contact area; the mathematical form for a structure is [31]: where F ( ) is the wind force on the j-th floor; is the drag coefficient, for rectangular prismatic ≈ 1 [31], which is used to quantify the drag or resistance of an object in a fluid environment; is the density of the air, for this study = 1.25 kg/m 3 ; is the contact area on the j-th floor; and U ( ) is the longitudinal component of the wind. It is defined as the sum of the mean velocity and its corresponding turbulent part U ( ) = U + ( ); ( ) is the velocity of the mass lumped on the th floor.
The stiffness matrix, required for the dynamic analysis, was calculated by an iterative process, where the all parameters of structure definition are involved. The pseudo-code used for the creation of the mass and stiffness matrices is shown in Figure 6.  With the loads calculated and the structures defined, the Newmark method is used to obtain the response of 1600 structures and each one is exposed to 16 scenarios of turbulent wind. Figure 7 shows the behavior of the maximum story drift for a certain group of structures and wind conditions. It is observed that the maximum story drift is directly proportional to the fundamental period and the wind speed and inversely proportional to the terrain roughness. The behavior of these surfaces is appropriate according to the variations of the parameters.
With the loads calculated and the structures defined, the Newmark method is used to obtain the response of 1600 structures and each one is exposed to 16 scenarios of turbulent wind. Figure 7 shows the behavior of the maximum story drift for a certain group of structures and wind conditions. It is observed that the maximum story drift is directly proportional to the fundamental period and the wind speed and inversely proportional to the terrain roughness. The behavior of these surfaces is appropriate according to the variations of the parameters.

Results of Network Trainings
The maximum story drifts obtained for each case study are used as the dataset for the network training process by the LM method. For the network architecture, the parameters of height, slenderness ratio, fundamental period, distributed mass, stiffness ratio, terrain categories and basic wind speed represent seven input values needed to estimate an output, which tends to predict the maximum story drifts. With the aim of evaluating the efficiency of the network against new data (not analyzed), a small percentage (15%) taken randomly of the dataset is excluded in the training process. Table 3 shows the mean square error (MSE) generated for the ANNs with different number of neurons in the hidden layer. It is observed that the error is reduced when the number of neurons in the hidden layer increases. A larger number of hidden neurons allows for modeling any variation in the data; however, the decrees of the error are not significant from 20 hidden neurons.

Results of Network Trainings
The maximum story drifts obtained for each case study are used as the dataset for the network training process by the LM method. For the network architecture, the parameters of height, slenderness ratio, fundamental period, distributed mass, stiffness ratio, terrain categories and basic wind speed represent seven input values needed to estimate an output, which tends to predict the maximum story drifts. With the aim of evaluating the efficiency of the network against new data (not analyzed), a small percentage (15%) taken randomly of the dataset is excluded in the training process. Table 3 shows the mean square error (MSE) generated for the ANNs with different number of neurons in the hidden layer. It is observed that the error is reduced when the number of neurons in the hidden layer increases. A larger number of hidden neurons allows for modeling any variation in the data; however, the decrees of the error are not significant from 20 hidden neurons.
The cross-validation is a technique to evaluate the independence of the results of a prediction model with relation to the training and testing dataset selected. Table 4 shows the cross-validation of the ANN with 20 hidden neurons using, in each iteration, different training and evaluation dataset taken randomly that represent the 85 and 15 percent, respectively. Small variations of MSE between each dataset and iteration are observed. Therefore, the dataset selected generates in an independent way results very close to each other.   The cross-validation is a technique to evaluate the independence of the results of a prediction model with relation to the training and testing dataset selected. Table 4 shows the cross-validation of the ANN with 20 hidden neurons using, in each iteration, different training and evaluation dataset taken randomly that represent the 85 and 15 percent, respectively. Small variations of MSE between each dataset and iteration are observed. Therefore, the dataset selected generates in an independent way results very close to each other.    Table 5 shows the memory and computer time required in the tasks of computational processing for the case H = 150 m, R e = 5, M d = 500 Kg/m 2 , F k = 1, T 0 = 3 s, C t = 2, U 10,II = 150 Km/h. A greater computer time in the training network is observed than the sum of all phases of the standard procedure; however, once the network has been trained, it is prepared to respond to many possible cases of immediately and using little memory.

Conclusions
The maximum story drift of 1600 shear-buildings discretized as MDOF structures were obtained. The Shinozuka and Newmark methods were employed to simulate the synthetic wind records and to obtain the dynamic response of the structures, respectively. The wind loads necessary to compute the dynamic response were developed from wind synthetic records considering 16 turbulent wind conditions. The Artificial Neural Networks (ANNs) were used to generate a computational model capable of estimating the maximum story drift of 25,600 case studies. The Levenberg-Marquardt (LM) method is applied in the process of network training. The parameters of height, slenderness ratio, fundamental period, distributed mass, stiffness ratio, terrain categories and basic wind speed represented the input values in the ANNs. Seven network architectures with 1 to 30 neurons in one hidden layer were studied to evaluate its predictive efficiency. It is observed that errors decrease when the number of neurons in the hidden layer increases. Nevertheless, the decreases of the errors are not significant from 20 hidden neurons. The cross-validation with 10 iterations suggested independence acceptable between the dataset selected for training and the results. The study of computer times required in the tasks of computational processing indicated a major processing time in training networks compared to a standard procedure for a specific case; nevertheless, once the network has been trained, it is prepared to respond to many possible cases of immediately and using little computational memory. Finally, it was demonstrated that with the use of ANNs and the LM method, it is possible to obtain a computational model capable of predicting with a good relationship the maximum story drift of MDOF structures.