1. Introduction
Evaluating the response of building structures subjected to dynamic earthquake loads is a crucial issue, because the seismic design of such buildings is generally based on the results of dynamic analysis. Seismic response evaluation of building structures mainly involves the determination and assessment of displacement demands [1,2,3]. The characteristics of ground excitations with various peak ground accelerations, together with the dynamic properties of structures, are potentially the most important factors affecting seismic response behaviour. Accurate prediction of the dynamic response of structures subjected to ground motions from possible future earthquakes can be considered the first step towards introducing mitigation techniques and preventing structural damage from sudden earthquake hazards. The most suitable method for assessing the response of building structures is nonlinear dynamic analysis. Based on its results, conclusions can be drawn regarding the design capacity, as well as the ductility requirements of each structural element or of the whole structural system [4]. Ductility is considered the main factor controlling the damage of structures, as well as an indication of the seismic vulnerability of newly-designed or existing under-designed buildings [5]. The vulnerability of building structures under earthquake records of known characteristics can be defined as the degree of damage to a structure [6].
The use of artificial intelligence techniques to predict the seismic response of building structures has attracted the interest of many researchers. Parsaei Maram et al. [7] used an artificial neural network to predict the seismic behaviour of reinforced concrete buildings with and without infill walls. Pei and Smyth [8] introduced a new multilayer feedforward neural network architecture for problems involving the nonlinear and hysteretic dynamic behaviour of single-degree-of-freedom oscillators. Several attempts were also made to investigate the use of neural networks for a prestressed concrete bridge subjected to earthquake excitation [9]. However, although most of the studies on earthquake-induced structural response have been conducted using artificial intelligence techniques, the logical analysis of data (LAD) has not yet been applied to study the response behaviour of buildings.
This paper uses the machine learning technique LAD to predict the seismic response of building structures. A MATLAB (Matrix Laboratory) code has been developed to simulate the dynamic response of a single-degree-of-freedom (SDOF) building system. To generate input data for the LAD model, 200 observations are collected from exciting the SDOF building model. A suite of 20 ground motion records from different regions and with different characteristics is used in the dynamic time-history analysis of the SDOF model, whose natural period ranges from 0.1 s to 1 s. In Section 2, the building model and data preparation are presented. In Section 3, the system parameters and input ground motions are introduced. Section 4 and Section 5 introduce LAD and the well-known artificial neural network (ANN) technique used for comparison. Results and discussion are then presented in Section 6. The validation process and the comparison between LAD and the ANN are presented in Section 7, and concluding remarks are given in the last section.
2. Building Model
Consider the structural building model and its idealized single-degree-of-freedom mathematical model, as shown in Figure 1. The symbol $m$ represents the mass of the superstructure of the building model. The structural damping and stiffness coefficients of the building are represented by $c$ and $k$, respectively. The values of the structural stiffness and damping coefficients can be calculated from the following formulas [10]:
$$k = \frac{4\pi^2 m}{T^2}, \qquad c = 2\xi\sqrt{k m} \tag{1}$$
where $T$ and $\xi$ denote the natural structural vibration period and the structural damping ratio, respectively.
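As a brief illustration, the stiffness and damping coefficients can be computed from the mass, natural period, and damping ratio. The sketch below assumes the standard SDOF relations $k = 4\pi^2 m / T^2$ and $c = 2\xi\sqrt{km}$; the function name and the numerical values are hypothetical and are not taken from Table 1.

```python
import math

def sdof_coefficients(m, T, xi):
    """Return (k, c) for an SDOF system.

    m  : mass of the superstructure (kg)
    T  : natural vibration period (s)
    xi : structural damping ratio (dimensionless)
    Assumes the standard relations k = 4*pi^2*m/T^2 and c = 2*xi*sqrt(k*m).
    """
    omega = 2.0 * math.pi / T          # circular natural frequency (rad/s)
    k = m * omega ** 2                 # stiffness (N/m)
    c = 2.0 * xi * math.sqrt(k * m)    # damping coefficient (N s/m)
    return k, c

# Hypothetical example values, for illustration only.
k, c = sdof_coefficients(m=1000.0, T=0.5, xi=0.05)
print(k, c)
```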
The governing equation of motion for the structural building system shown in Figure 1 can be written as follows:
$$m\ddot{u}(t) + c\dot{u}(t) + k u(t) = -m\ddot{u}_g(t) \tag{2}$$
The time derivatives $\ddot{u}(t)$ and $\dot{u}(t)$ represent the relative acceleration and relative velocity of the superstructure, respectively, and $u(t)$ represents the relative displacement of the building model superstructure. The earthquake ground acceleration is denoted by $\ddot{u}_g(t)$. Response analysis of building structures under random dynamic forces, such as earthquake, wind, and blast loads, requires an efficient numerical integration technique whose accuracy is of great importance for practical design. The second-order ordinary differential equation in Equation (2) is solved using the Runge-Kutta (RK) method. With the substitution
$$u_1 = u, \qquad u_2 = \dot{u} \tag{3}$$
the ordinary differential equation in Equation (2) is equivalent to the following first-order system [11]:
$$\dot{u}_1 = u_2 \tag{4}$$
$$\dot{u}_2 = -\frac{c}{m}u_2 - \frac{k}{m}u_1 - \ddot{u}_g(t) \tag{5}$$
Equations (4) and (5), together with the initial condition $U(0) = 0$, provide the following system of first-order ordinary differential equations:
$$\dot{U} = F(t, U), \qquad U = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \qquad U(0) = 0 \tag{6}$$
A MATLAB code based on the explicit fourth-order RK method has been developed to obtain the numerical solution of Equation (6), which is equivalent to the second-order ordinary differential equation in Equation (2). To ensure reasonable accuracy and a quick solution with the fourth-order Runge-Kutta method, the chosen step size Δt must be less than one-tenth of the building’s natural period [12].
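The integration scheme described above can be sketched in a few lines. The following is an illustrative Python version, not the MATLAB code used in the study. It assumes the standard SDOF equation of motion under ground acceleration, a system starting from rest, and linear interpolation of the ground acceleration at the mid-step; the function name `rk4_sdof` and the synthetic sine record are hypothetical.

```python
import math

def rk4_sdof(m, c, k, ag, dt):
    """Integrate m*u'' + c*u' + k*u = -m*ag(t) with the classical RK4 scheme.

    ag : list of ground-acceleration samples spaced dt apart
    Returns the relative displacement at every sample time, starting from rest.
    """
    def accel(u, v, a_g):
        # Relative acceleration from the first-order form of the equation.
        return -(c * v + k * u) / m - a_g

    u, v = 0.0, 0.0
    disp = [u]
    for i in range(len(ag) - 1):
        a0, a1 = ag[i], ag[i + 1]
        am = 0.5 * (a0 + a1)  # assumed linear interpolation at mid-step
        k1u, k1v = v, accel(u, v, a0)
        k2u = v + 0.5 * dt * k1v
        k2v = accel(u + 0.5 * dt * k1u, v + 0.5 * dt * k1v, am)
        k3u = v + 0.5 * dt * k2v
        k3v = accel(u + 0.5 * dt * k2u, v + 0.5 * dt * k2v, am)
        k4u = v + dt * k3v
        k4v = accel(u + dt * k3u, v + dt * k3v, a1)
        u += dt * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        disp.append(u)
    return disp

# Hypothetical system and synthetic 1 Hz excitation, for illustration only.
m, T, xi = 1.0, 0.5, 0.05
k = 4.0 * math.pi ** 2 * m / T ** 2
c = 2.0 * xi * math.sqrt(k * m)
dt = 0.01  # smaller than T/10, as required for accuracy
ag = [0.3 * 9.81 * math.sin(2.0 * math.pi * i * dt) for i in range(1001)]
disp = rk4_sdof(m, c, k, ag, dt)
print(max(abs(u) for u in disp))  # peak relative displacement (m)
```

The peak of the absolute displacement history is the kind of response quantity that could then be assigned to a displacement class.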
3. System Parameters and Input Ground Motion
This section defines the parameters of the considered building model, as well as the set of ground motions used to excite the building model. Details of the aforementioned information are provided in the following two subsections.
3.1. System Parameters
The dynamic parameters of the building structure considered in the current study are taken from an earlier study by Jankowski [
13].
Table 1 shows the basic values describing the structural characteristics used in the analysis.
3.2. Input Ground Motion
A set of 20 earthquake ground motion records from different regions and with different peak ground accelerations (see Table 2) is used in the current study to excite the building model shown in Figure 1. The selected records are from the 1971 Imperial Valley (El Centro), 1994 Northridge, 1971 San Fernando, 1989 Loma Prieta, 1984 Morgan Hill, 1986 Palm Springs, 1992 Cape Mendocino, 1983 Coalinga, 1999 Duzce, and 1999 Chi-Chi earthquakes. Important information on the selected records, such as peak ground acceleration (PGA), peak ground velocity (PGV), peak ground displacement (PGD), magnitude (Mw), site-to-source distance (EPD), soil class (USGS), and characteristic period (Tg), can be found in Table 2. The PGA of the selected records varies between 0.1 g and 1.04 g, where g is the acceleration of gravity, in order to cover low (0.1–0.4 g), moderate (0.4–0.7 g), and high (0.8–1.2 g) PGA levels in the numerical simulations. The selected ground motions also differ in frequency content, total duration, and local soil conditions. The magnitude of the chosen records varies between 5.2 and 7.1, with site-source distances of about 6.1 km to 43.3 km, which implies that the selected ground motions include both far-fault and near-fault motions. It is worth noting that the moment magnitudes of moderate, strong, major, and great earthquakes range from 5 to 5.9, 6 to 6.9, 7 to 7.9, and 8 to 8.9, respectively.
4. Logical Analysis of Data (LAD)
LAD is a data analysis method based on concepts from supervised machine learning, combinatorics, and Boolean functions; it is considered a knowledge discovery technique and was first developed in 1986 [14,15]. The methodology of LAD consists of three steps: binarization of data, pattern generation, and theory formation. For more details about data binarization, see [16]. Several techniques for pattern generation have been adopted by researchers, such as enumeration [16], heuristics [14,17], and integer linear programming [
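18]. The last approach is adopted in this paper. In the last step, theory formation, the discriminant function $\Delta_k(O)$ is used as follows [19]:
$$\Delta_k(O) = \sum_{i=1}^{N_k} w_i^k P_i^k(O)$$
where $k$ denotes the class, $P_i^k$ is the $i$-th pattern in the set of $N_k$ patterns generated for class $k$, $O$ is the observation to be classified, and $w_i^k$ is the coverage rate of pattern $P_i^k$ with respect to the observations of class $k$. $P_i^k(O) = 1$ if pattern $P_i^k$ covers observation $O$, and zero otherwise.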
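To make the role of the discriminant function concrete, the sketch below scores an observation against weighted patterns and picks the class with the highest score. It is a minimal illustration and not the cbmLAD implementation: patterns are assumed to be conjunctions of interval conditions on the attributes, and the patterns, weights, and attribute values shown are hypothetical.

```python
def covers(pattern, obs):
    """Return True if every condition of the pattern holds for the observation.

    pattern : dict mapping attribute name -> (low, high), meaning low <= value < high
    obs     : dict mapping attribute name -> measured value
    """
    return all(low <= obs[name] < high for name, (low, high) in pattern.items())

def class_scores(patterns_by_class, obs):
    """Sum the weights (coverage rates) of the patterns of each class that cover obs."""
    return {
        label: sum(weight for pattern, weight in weighted if covers(pattern, obs))
        for label, weighted in patterns_by_class.items()
    }

def classify(patterns_by_class, obs):
    """Assign the observation to the class with the highest score."""
    scores = class_scores(patterns_by_class, obs)
    return max(scores, key=scores.get)

# Hypothetical patterns and weights for two classes, for illustration only.
hypothetical_patterns = {
    1: [({"Tn": (0.0, 0.3)}, 0.6),
        ({"PGA": (0.0, 0.4), "Tn": (0.0, 0.5)}, 0.4)],
    2: [({"Tn": (0.3, 1.1)}, 0.7),
        ({"PGA": (0.4, 1.3)}, 0.3)],
}
observation = {"Tn": 0.2, "PGA": 0.5}
print(class_scores(hypothetical_patterns, observation))
print(classify(hypothetical_patterns, observation))
```

Because each score is a sum over explicit patterns, the patterns that fired for a given observation can be listed directly, which is what makes the classification traceable.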
The accuracy ($\nu$) of classification is:
$$\nu = \frac{\sum_{k} a_k}{N}$$
where $a_k$ is the number of correctly-classified observations in class $k$, and $N$ is the number of observations in the testing set.
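The accuracy measure can be illustrated with a short computation: count the correctly classified observations in each class and divide their total by the size of the testing set. The function name and the label lists below are hypothetical.

```python
from collections import Counter

def classification_accuracy(true_labels, predicted_labels):
    """Return (accuracy, correct_per_class) for a testing set.

    correct_per_class[k] is the number of observations of class k
    that were assigned to class k by the classifier.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct_per_class = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    accuracy = sum(correct_per_class.values()) / len(true_labels)
    return accuracy, correct_per_class

# Hypothetical testing set of four observations.
acc, per_class = classification_accuracy([1, 1, 2, 3], [1, 2, 2, 3])
print(acc, dict(per_class))
```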
5. Artificial Neural Network (ANN)
The artificial neural network (ANN) is among the best-known machine learning techniques. An ANN is composed of three types of layers, namely, an input layer, hidden layer(s), and an output layer. The input layer accepts the input attributes, so its number of neurons equals the number of attributes. Each hidden layer contains a number of neurons, whereas the output layer consists of a single neuron. It is worth noting that the nonlinearity of the model strongly influences the number of hidden layers and the number of neurons they contain. The neurons in each layer are connected through weighted links to the neurons of the preceding and following layers. Each neuron of the hidden and output layers is offset by a threshold value. An ANN is highly efficient at adaptation and learning.
The accuracy of LAD is compared with that of an ANN, since the latter is the best-known machine learning technique.
The mathematical model of an artificial neuron’s behaviour imitates a mathematical model of a brain’s activity [20]. Figure 2 shows the behaviour of an artificial neuron. The input attributes $x_i$ to the network, weighted by $w_i$, are sent to a neuron. The neuron sums the weighted inputs, passes the result to a transfer function, and provides an output $y$ [21]:
$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$
where $n$ is the number of inputs; $\theta$ is the internal threshold; the inputs $x_i$ in this work are the input attributes, either controllable or uncontrollable (monitoring), as described later in this section; and $f$ is the non-linear transfer function. The most commonly-used $f$ is the sigmoid logistic function:
$$f(z) = \frac{1}{1 + e^{-z}}$$
The output of the sigmoid function always lies between 0 and 1, which can be used to classify the quality outcome as conforming or nonconforming to the specification. In other words, the sigmoid function squashes any value of $z$ into this interval to specify the probability of the binary output given the input $x$. Back-propagation (BP) is used to update the weights dynamically.
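The behaviour of a single artificial neuron can be sketched as a weighted sum followed by the sigmoid transfer function. The sketch assumes that the threshold is subtracted from the weighted sum, which is one common convention; the function names, weights, and inputs are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron_output(inputs, weights, threshold):
    """Weighted sum of the inputs, offset by the threshold, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(z)

# Hypothetical inputs and weights, for illustration only.
print(neuron_output(inputs=[1.0, 2.0], weights=[0.5, 0.25], threshold=1.0))
```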
In Figure 3, the input layer accepts the eight input variables: seven describe the earthquake and one describes the building. The seven earthquake variables are peak ground acceleration, peak ground velocity, peak ground displacement, soil class, characteristic period, magnitude, and site-source distance; the building variable is the natural period.
6. Results and Discussion
The collected data are used to train LAD to determine the displacement class by finding hidden patterns that are specific to each class. The patterns found are expressed in terms of the variables that describe the earthquakes and the building. Each pattern is a rule hidden in the data that is specific to a certain class. The input data are the 200 observations obtained from the simulation analysis using the considered earthquake records. Each observation contains the measurements of eight covariates and one of seven class labels. The covariates are: peak ground acceleration, peak ground velocity, peak ground displacement, soil type, characteristic period, magnitude, site-source distance, and the natural period. The seismic response is classified into seven classes, as shown in Table 3.
The building model presented in Figure 1 has been subjected to a set of 20 earthquake excitation records. The characteristics of these earthquakes in terms of PGA, PGV, PGD, magnitude, epicentre distance, soil class, and characteristic period are listed in Table 2. Several natural periods varying from 0.1 s to 1 s have been assigned to the considered building model. In Table 4, the moment magnitude of the earthquake is considered moderate from 5 to 5.9 and labelled (1); strong from 6 to 6.9 and labelled (2); major from 7 to 7.9 and labelled (3); and, finally, great from 8 to 8.9 and labelled (4). The total number of results produced by the simulation analysis is about 200 (20 results for each natural period, one for each record listed in Table 2). The obtained results have been classified into seven categories. Table 4 exhibits the patterns found by the software cbmLAD [22] and shows the first six observations out of 200 as an example of the collected data.
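The magnitude labelling described above can be expressed as a simple lookup. In the sketch below, each band is treated as a half-open interval (for example, 5 ≤ Mw < 6 for "moderate"), which is an assumption made to handle values between the listed bounds; the function name is hypothetical.

```python
def magnitude_label(mw):
    """Map a moment magnitude Mw to the labels (1) to (4) used for the observations."""
    bands = [
        (5.0, 6.0, 1),  # moderate
        (6.0, 7.0, 2),  # strong
        (7.0, 8.0, 3),  # major
        (8.0, 9.0, 4),  # great
    ]
    for low, high, label in bands:
        if low <= mw < high:
            return label
    raise ValueError(f"Mw={mw} is outside the labelled magnitude bands")

# The magnitude range reported for the selected records is 5.2 to 7.1.
print(magnitude_label(5.2), magnitude_label(7.1))
```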
The sensitivity of the seismic response of the building model with several natural periods to the characteristics of the considered earthquake records, in terms of PGA, PGV, PGD, earthquake magnitude, site-source distance, soil class, and characteristic period, Tg, is tabulated in Table 5. The software cbmLAD has been employed to obtain these results. The results clearly indicate that the fundamental natural period, Tn, is one of the most important parameters affecting the seismic response of the building model, as it appears in all the grouped classes of results. The epicentre distance has also been found to highly affect the simulated response. In contrast, the moment magnitude Mw of the earthquake has been found to only slightly affect the induced seismic responses. This can be due to Mw being considered as a measure of the amount of earthquake energy released and a measure of damage as well. Similarly, the USCS, which is a unified soil classification system used to describe the texture and grain size of a soil, only slightly affects the seismic response results. It is worth noting that the characteristic period, Tg, in other words the dominant period of an earthquake, showed a pronounced effect on the response prediction of structures. Rigid structures are more affected by waves with short Tg (high frequency). On the other hand, flexible structures, such as high-rise buildings, are more affected by long Tg (see classes 2–7). Severe damage can occur when any of the natural frequencies of the building model coincides with the dominant frequency of the applied earthquake.
7. Validation and Comparison
As in many machine learning techniques, the observations have been divided into two sets, namely, a learning (training) set and a testing set. The ten-fold cross-validation procedure used herein randomly divides all of the observations into ten parts, in which the classes are represented in approximately the same proportions as in the full dataset. One part, the testing set T, which represents about 10% of the total observations, is held out. The remaining 90% forms the training set used for the learning process. These percentages indicate that most of the observations are used for training, thus increasing the chance that the classifier found is an accurate one. The average accuracy of learning is found to be equal to 0.84.
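The splitting step of the cross-validation procedure can be sketched as follows. This is an illustrative stratified split using only the Python standard library, not the procedure implemented in cbmLAD or Weka; the function name, the random seed, and the synthetic labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_folds=10, seed=0):
    """Split observation indices into n_folds parts with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        for index in indices:
            folds[position % n_folds].append(index)  # deal out round-robin
            position += 1
    return folds

# Hypothetical labels: 200 observations spread over seven classes.
labels = [i % 7 + 1 for i in range(200)]
folds = stratified_folds(labels)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be trained on train_idx and evaluated on test_idx here;
    # the ten resulting accuracies are then averaged.
print(len(folds[0]), len(train_idx))
```

Dealing the shuffled indices out class by class keeps both the fold sizes and the class proportions balanced, so each held-out part contains about 10% of the observations.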
To achieve the highest prediction accuracy during the validation test, a suitable network architecture must be selected [20]. In this paper, the Weka data mining software [23] is used to obtain the best network structure. Following common practice, several trials are performed to find the best architecture, which is then kept as the recommended one [20]. Figure 4 shows the best architecture found by Weka. It has one hidden layer of four neurons, a learning rate of 0.2, and a momentum of 0.2. The same ten-fold cross-validation procedure is performed; the confusion matrix is shown in Table 6.
$a_k$ and $b_k$ respectively denote the number of correctly and incorrectly classified observations in class $k$. The superscript $l$ refers to class $l$, $l \neq k$. For example, $a_1$, $b_1$, and $b_1^2$ refer to the number of correctly-classified observations in class 1, the number of incorrectly classified observations in class 1, and those of them classified as class 2, respectively. The learning accuracy ($\nu$) has been found to be:
LAD proved to be efficient in terms of accuracy when compared with the ANN. After the training process, when LAD is applied to test or new readings that are not included in the original dataset, it has been found to be powerful in predicting the induced displacement response class based on the patterns found in each new observation. The new observation is assigned to the class with the highest value of the discriminant function. In general, LAD has the advantage of the causality property, which makes it very useful in tackling engineering problems: one can trace results, phenomena, or effects back to their possible causes. This clearly recommends the use of LAD, especially when compared to an ANN, for which the user may face difficulties in determining the required number of nodes and the network structure. Moreover, interpreting the ANN classification process is an additional difficulty: it is a “black box” procedure that classifies new observations without any deep explanation of the causality relationship.
8. Conclusions
The current study presents an efficient and reliable technique based on LAD for predicting the seismic response behaviour of building structures under ground excitations using the captured dynamic responses. An SDOF building model with different natural periods has been subjected to a set of earthquake records with different characteristics. The LAD technique has been employed to capture the sensitivity of the seismic responses to the characteristics of the applied ground motions. From the point of view of the building's dynamic characteristics, the results clearly indicate that variations in the building’s natural period significantly affect the captured dynamic responses. From the point of view of the earthquake characteristics, the epicentre distance has also been found to highly affect the seismic response results. On the other hand, the moment magnitude has been found to only slightly affect the induced seismic responses. Based on the conducted study, the proposed LAD model has been shown to be an efficient technique to learn, simulate, and blindly predict the dynamic response behaviour of building structures subjected to earthquake loads.