Automated Structural Damage Identification Using Data Normalization and 1-Dimensional Convolutional Neural Network

Abstract: In the field of structural-health monitoring, vibration-based structural damage detection techniques have been practically implemented in recent decades for structural condition assessment. With the development of deep-learning networks that make automatic feature extraction and high classification accuracy possible, deep-learning-based structural damage detection has been gaining significant attention. Deep-learning neural networks come with fixed input and output sizes, so input data must be downsampled or cropped to the predetermined input size of the network to obtain the desired output. However, because the length of the input data (i.e., sensing data) is associated with the excitation quality of a structure, adjusting the size of the input data while maintaining the excitation quality is critical to ensuring high accuracy in deep-learning-based structural damage detection. To address this issue, natural-excitation-technique-based data normalization and the use of 1-D convolutional neural networks for automated structural damage detection are presented. The presented approach converts input data to a predetermined size using cross-correlation and uses a convolutional network to extract damage-sensitive features for automated structural damage identification. Numerical simulations were conducted on a simply supported beam model excited by random and traffic loadings, and the performance was validated under various scenarios. The proposed method successfully detected the location of damage on a beam under random and traffic loadings with accuracies of 99.90% and 99.20%, respectively.


Introduction
Bridges are prone to deterioration caused by external loads, such as traffic and environmental loading or natural disasters; therefore, structural-health monitoring (SHM) is critical to ensure safe and reliable operation of the bridges during their service life. Numerous vibration-based damage detection techniques have been studied in an attempt to monitor structural health [1][2][3][4][5][6][7][8][9], and they can be classified into two groups: (1) Parametric model-based methods that utilize the finite-element (FE) model of a structure and update the model parameters using acquired sensor data and (2) data-driven vibration-based damage identification techniques that use a database of measurements to fit a statistical model by extracting features (e.g., natural frequencies, mode shapes, modal flexibility, and curvature), and analyzing the condition of a structure. The parametric-model-based method can optimally calculate structural properties, such as modulus of elasticity and moment of inertia, but the precision of these properties heavily depends on the accuracy of the initial FE model and optimization methods. Data-driven methods have received greater attention because they are simple to implement. However, features from measurements, such as mode shapes and natural frequencies, may not properly be extracted because of measurement noise and the poor excitation quality of a structure, resulting in inaccurate damage detection.
Recently, deep-learning-based damage detection (DLDD) has emerged as an alternative [10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26] to conventional data-driven approaches. DLDD automatically learns complex features from raw measurements, beyond the dynamic characteristics, and conducts nonparametric nonlinear regression for damage detection. Lin et al. [10] proposed a time-series one-dimensional convolutional neural network (1-D CNN) to extract features directly from only 20 s of raw acceleration without any hand-crafted feature extraction process. The training datasets were simulated with an FE beam model subjected to random excitation. The research used two different loss functions to locate damage and to estimate the extent of the damage. Zhang et al. [11] used a deep network structure similar to that of Lin et al. [10], with a 1-D CNN for classifying the states of a bridge. Their research used 0.6-s chunks of time-series measurement as input, and they experimentally validated the performance of their trained network. Lee et al. [12] proposed an autoencoder-based deep neural network for detecting tendon damage in a prestressed girder using 20 s of numerically simulated raw measurements. Utilizing 2-D convolution, Khodabandehlou et al. [13] converted time-series measurements into an image and used a 2-D CNN for damage state identification. Although the results of the abovementioned research have shown the possibility of using CNNs for automated damage detection, ambient vibration testing requires sufficient measurement time for the major vibration modes to be clearly identified. Therefore, increasing the length of the inputs to the deep-learning network is critical for accurate structural damage detection.
To address this issue, Guo et al. [14] and Pathirage et al. [15] used mode shapes and natural frequencies. Because the size of the modal parameters is fixed by the number of sensors, these parameters can be extracted from ambient vibration tests and used without adjusting the size of the input. However, owing to the sparsity of the mode shapes, the accuracy of these methods depends entirely on the number of sensors instrumented on a structure. Table 1 summarizes the type of input and deep-learning model used for each method related to automated damage detection. In this report, a time-series deep neural network is presented for automated structural damage detection using a data normalization technique and a 1-D CNN. The data normalization technique converts raw measurements of arbitrary length to free-vibration responses of a specified size, preserving the excitation quality of the measurements, and the 1-D CNN detects and localizes damage on a structure.
For training and validation, the beam was excited with random and traffic loadings for 20 s, and the damage was simulated as a 20%, 30%, or 50% stiffness loss in a single element. After training, the network was tested with an untrained input size as well as with randomly selected damage severities between 20% and 50%.
The remainder of this report is organized as follows. In Section 2, the proposed data normalization technique and the CNN structure are explained. In Section 3, datasets for training and validation are generated through numerical simulations conducted on a 10-element beam model with 20%, 30%, and 50% damage to a single random element of the beam. Both random and traffic loading excitations were used so that the performance of the proposed network could be validated under nonstationary traffic loading, which simulates the environment of ambient field vibration testing. In Section 4, the application of the trained model to new datasets that have different measurement times and damage severities is described, and the accuracy of the proposed method is discussed. Finally, in Section 5, a summary and the conclusions drawn from the study are provided.

Proposed Method
A deep-neural-network-based approach is presented to detect structural damage from varying sizes of structural acceleration measurements. The proposed method has two components (see

Data Normalization through NExT
The natural excitation technique (NExT) [27] uses auto- and cross-correlation functions for two measurements of a structure to replace an impulse response for modal testing, enabling a structure to be tested in an ambient environment. The correlation between the two measurements is a superposition of decaying oscillations, which are characterized by the natural frequencies and damping ratios of the structure.
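The correlation function $R_{ij}(T)$ between two measurements at locations i and j in terms of time lag T is expressed as

$$R_{ij}(T) = \sum_{r=1}^{N} \frac{\phi_i^r Q_j^r}{m^r \omega_d^r} \exp\left(-\xi^r \omega_n^r T\right) \sin\left(\omega_d^r T + \theta^r\right) \quad (1)$$

where the superscript r denotes a particular mode from a total of N modes, $\phi_i^r$ is the mode shape at location i in the r-th mode, $m^r$ is the modal mass of the r-th mode, $Q_j^r$ is a constant associated with the response at node j, $\omega_n^r$ is the r-th natural frequency, $\omega_d^r = \omega_n^r \sqrt{1 - (\xi^r)^2}$ is the corresponding damped natural frequency, $\xi^r$ is the r-th modal damping ratio, and $\theta^r$ is the phase angle associated with the r-th modal response.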
In this study, the correlation functions for all sensor locations were obtained by the inverse Fourier transform of the cross power spectral density (CPSD) between two measurements at i and j as

$$R_{ij}(T) = \mathcal{F}^{-1}\left[S_{ij}(\omega)\right], \quad i, j = 1, \ldots, m \quad (2)$$

where $S_{ij}$ is the CPSD between measurements at i and j, m is the number of sensors, and n is the number of discrete Fourier transform points. The correlation functions in Equation (2) are used as the input data for a 1-D CNN. The proposed data normalization can accept measurements with different data sizes and sampling rates and normalize them to the same data size for flexible deep neural network applications. In the input layer, the normalized correlation functions are reshaped to $m^2 \times n/2$ for training the proposed deep neural network. For example, if nine accelerations are measured on a beam and the number of discrete Fourier transform points is 2048, the size of the normalized cross-correlation is 81 × 1024, regardless of the number of samples in the measurement. The normalized input is fed into the 1-D CNN that is introduced in the following section.
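The normalization step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the function name `next_normalize`, the Hann window, the 50% segment overlap, and the unscaled Welch-style averaging are choices made here and are not given in the text. The sketch estimates the cross-spectrum for every sensor pair, applies the inverse FFT, and keeps the first n/2 lags, so records of different length map to the same m² × n/2 array.

```python
import numpy as np

def next_normalize(acc, nfft=2048):
    """Map (m, samples) accelerations to (m*m, nfft//2) correlation functions."""
    acc = np.asarray(acc, dtype=float)
    m, n_samples = acc.shape
    seg = min(nfft, n_samples)          # short records are zero-padded to nfft
    step = max(seg // 2, 1)             # assumed 50% overlap between segments
    window = np.hanning(seg)            # assumed Hann window
    spectra = []
    for start in range(0, n_samples - seg + 1, step):
        chunk = acc[:, start:start + seg]
        chunk = chunk - chunk.mean(axis=1, keepdims=True)   # remove the mean
        spectra.append(np.fft.rfft(chunk * window, n=nfft, axis=1))
    spectra = np.stack(spectra)         # (segments, m, nfft//2 + 1)
    # Averaged cross-spectrum S_ij = mean(conj(X_i) * X_j); physical scaling omitted
    cpsd = np.einsum("sif,sjf->ijf", np.conj(spectra), spectra) / len(spectra)
    corr = np.fft.irfft(cpsd, n=nfft, axis=-1)   # inverse FFT gives R_ij(T)
    return corr[..., : nfft // 2].reshape(m * m, nfft // 2)

rng = np.random.default_rng(0)
short = next_normalize(rng.standard_normal((9, 2000)))   # 20 s at 100 Hz
long_ = next_normalize(rng.standard_normal((9, 6000)))   # 60 s at 100 Hz
print(short.shape, long_.shape)
```

Both calls return an array of shape (81, 1024), which is the property the network relies on. Absolute scaling is left out because the input is standardized before training in any case.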

Convolutional Neural Network
A CNN [28] is proposed to extract features automatically from time-series measurements and detect damage locations, as shown in Figure 2. The m-channel acceleration measurements are normalized using Equation (2), reshaped to $m^2 \times n/2$, and used as the input layer, where n is the number of discrete Fourier transform points. Following the input layer, three 1-D convolutional layers, each followed by batch normalization [29] and a rectified linear unit (ReLU) [30], are used as a baseline architecture. Global average pooling (GAP) [31] and fully connected layers are used for the damage localization task. The features of the input layer are extracted through the convolutional layers and aggregated by GAP, and the damage location is classified by the fully connected layer. In addition, nonlinearity is added to the model through the ReLU activation function, and the input of each layer is normalized through batch normalization. The details of the convolutional neural network architecture are described in Table 2.

One-Dimensional Convolution Layer
Convolution is an operator that multiplies one function by a reversed and shifted copy of another function and then integrates the product over the interval to obtain a new function. A CNN slides each kernel over the input data at a specified interval, computes the product channel by channel, and sums the products over all channels to form the feature map H:

$$H_i = H_{i-1} \otimes W_i + b_i \quad (3)$$

where i indicates the index of the layer in the network, $W_i$ and $b_i$ are the kernel weights and bias of layer i, and the ⊗ symbol indicates a convolution operation. The 1-D convolutional layer performs element-by-element multiplication of the input array and the kernel along the temporal axis of the input array. One-dimensional convolution consists of simple array operations in forward propagation and backpropagation, so its computational complexity is lower than that of 2-D convolution. The raw 1-D data are processed and learned to extract features that are used in the classification task.
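A small worked example makes the channel-wise sum concrete. The sketch below is illustrative only: the function name `conv1d`, the stride of 1, and the absence of padding are assumptions, and, as in most deep-learning libraries, the kernel is applied without being reversed.

```python
import numpy as np

def conv1d(x, w, b):
    """x: (c_in, length), w: (c_out, c_in, k), b: (c_out,) -> (c_out, length - k + 1)."""
    k = w.shape[-1]
    # windows[c, t, :] holds x[c, t:t+k] (stride 1, no padding: assumptions)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    # Multiply element by element, then sum over channels and kernel taps
    return np.einsum("ock,ctk->ot", w, windows) + b[:, None]

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 1.0, 0.0, 1.0]])      # 2 channels, 4 time steps
w = np.array([[[1.0, -1.0],
               [2.0, 0.0]]])              # 1 filter, 2 channels, kernel size 2
b = np.array([0.5])
print(conv1d(x, w, b))                    # [[-0.5  1.5 -0.5]]
```

At the first position, channel 0 contributes 1·1 + 2·(−1) = −1 and channel 1 contributes 0·2 + 1·0 = 0, so the output is −1 + 0 + 0.5 = −0.5.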

Batch Normalization
A deep-learning model learns meaningful representations gradually through a series of successive layers. As the network deepens, training instabilities such as vanishing or exploding gradients can occur during backpropagation. Batch normalization is a method that mitigates "internal covariate shift," a phenomenon in which the distribution of the inputs to each layer or activation function changes during training, by normalizing in minibatch units. The mean and standard deviation of each feature are computed over the minibatch, the feature is normalized, and a new value is created using a scale factor and a shift factor:

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$

where $x_i$ is an input in minibatch B, $\hat{x}_i$ is its normalized value, $y_i$ is the output, $\mu_B$ denotes the average of minibatch B, and $\sigma_B$ is the standard deviation of minibatch B. The parameters γ (scale) and β (shift) are learned through backpropagation and adjust the normalized values so that they are not driven to zero. The parameter ε is an extremely small constant that prevents the denominator from becoming zero.
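The normalization can be checked numerically with a short sketch. The function name `batch_norm` and the default value of `eps` are assumptions made for illustration; the sketch covers the training-time computation only, whereas at inference deep-learning libraries typically use running averages of the mean and variance instead of minibatch statistics.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) of a minibatch x with shape (batch, features)."""
    mu = x.mean(axis=0)                      # minibatch mean per feature
    var = x.var(axis=0)                      # minibatch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized values
    return gamma * x_hat + beta              # learned scale and shift

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
print(y.mean(axis=0), y.std(axis=0))
```

With γ = 1 and β = 0, both features end up with a mean of approximately 0 and a standard deviation of approximately 1, even though their original scales differ by a factor of 10.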

Rectified Linear Unit (ReLU)
An activation function takes the input signal of a node and transforms it into an output signal. Several types of activation functions exist, such as sigmoid, tanh, and ReLU, and the choice of activation function considerably affects the results. Here, the activation function is used to add nonlinearity to the output of the convolutional layer.
ReLU, defined as f(x) = max(0, x), is the most commonly used nonlinear activation function in CNNs. ReLU is a nonsaturating function: the gradient is zero if the input is less than zero and one if the input is greater than zero, so learning occurs only for positive inputs. In addition, faster calculations and convergence rates can be achieved using ReLU.
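The behavior of ReLU and its gradient can be illustrated as follows. The function names are hypothetical, and setting the gradient to zero exactly at x = 0 is a common convention assumed here rather than something stated in the text.

```python
import numpy as np

def relu(x):
    """Pass positive inputs unchanged and clamp negative inputs to zero."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 at x = 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.5 2. ]
print(relu_grad(x))   # [0. 0. 1. 1.]
```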

Global Average Pooling
The pooling layer receives the output of the convolution layer as input and is used to reduce the size of the activation map or to emphasize specific features. The pooling operation works by taking the maximum value of a specific area or by averaging the values in that area. In this study, the model uses GAP, for which no window size or stride needs to be specified and which has no trainable parameters, which helps avoid overfitting. Unlike average pooling, which averages over local windows and is applied after every convolutional layer, GAP takes the mean of all node values in each feature map. The GAP layer performs dimensionality reduction: an input of size h × w × d is reduced to 1 × 1 × d by averaging all hw values in each of the d feature maps (see Figure 3). The 1-D GAP block takes a 2-D tensor (data points × channels) and computes the average over all data points for each channel.
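A minimal sketch of the 1-D GAP operation follows; the function name `global_average_pool_1d` and the toy feature maps are made up for illustration.

```python
import numpy as np

def global_average_pool_1d(features):
    """features: (data_points, channels) -> (channels,), one mean per feature map."""
    return features.mean(axis=0)

features = np.arange(12, dtype=float).reshape(4, 3)   # 4 data points, 3 channels
pooled = global_average_pool_1d(features)
print(pooled)                                         # [4.5 5.5 6.5]

# The output size depends only on the number of channels, not on the length
longer = np.ones((100, 3))
print(global_average_pool_1d(longer).shape)           # (3,)
```

Because the result has one value per channel regardless of how many data points each feature map contains, the fully connected layer that follows always sees an input of the same size.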

Fully Connected Layer
After extracting the features including spatial information through the CNN, the extracted feature map is classified through the fully connected layer. A fully connected layer is the same operation as a general neural network, and the input is a 1-D array.
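For multiclass classification, softmax is mainly used; the softmax function is also a type of activation function. Whereas sigmoid or tanh is mainly used for binary classification, softmax is used when one class must be selected from several. Softmax, given in Equation (8), is a function that normalizes all the input values to the range from 0 to 1 such that they sum to 1:

$$\mathrm{softmax}(z_i) = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)} \quad (8)$$

where $z_i$ is the i-th output of the fully connected layer and K is the number of classes. The class with the largest output probability is taken as the prediction.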
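The classification step can be illustrated with a numerically stable softmax. The function name and the example scores are hypothetical; subtracting the maximum before exponentiating is a standard implementation detail that leaves the result unchanged.

```python
import numpy as np

def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())    # shift by the maximum to avoid overflow
    return e / e.sum()

logits = np.zeros(11)          # hypothetical scores for the 11 categories (0 = intact)
logits[3] = 2.0
p = softmax(logits)
print(round(p.sum(), 6), int(np.argmax(p)))   # 1.0 3
```

Here the predicted category is 3 because it has the largest probability, which in the labeling used in this study would correspond to damage in element 3.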

Simulation Model
Numerical simulations were conducted to generate a database for the proposed automated structural damage detection method. A simply supported beam with a length of 50 m was modeled with a 10-element Bernoulli beam (see Figure 4). The cross section of the model was 4 × 2 m (width × height), the modulus of elasticity was 210 GPa, the density was 7850 kg/m³, and the damping ratio was 2%. The three major natural frequencies of the beam without damage were 1.85, 7.52, and 16.8 Hz. Damage was simulated by reducing the flexural rigidity of the damaged element by 20%, 30%, and 50%.
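As a plausibility check on the reported frequencies, the closed-form natural frequencies of a simply supported Euler-Bernoulli beam, f_n = (n²π / 2L²)·√(EI/ρA), can be evaluated for the stated properties. The formula is standard beam theory rather than something taken from the text, and bending about the horizontal axis of the 4 × 2 m rectangular section (I = bh³/12) is assumed.

```python
import math

L = 50.0            # span (m)
E = 210e9           # modulus of elasticity (Pa)
rho = 7850.0        # density (kg/m^3)
b, h = 4.0, 2.0     # width and height of the section (m)

A = b * h                   # cross-sectional area (m^2)
I = b * h**3 / 12.0         # second moment of area, assuming a solid rectangle (m^4)

def natural_frequency(n):
    """n-th bending frequency (Hz) of a simply supported Euler-Bernoulli beam."""
    return (n**2 * math.pi / (2.0 * L**2)) * math.sqrt(E * I / (rho * A))

freqs = [natural_frequency(n) for n in (1, 2, 3)]
print([f"{f:.2f}" for f in freqs])
```

The analytical values are roughly 1.88, 7.51, and 16.89 Hz, within about 2% of the 1.85, 7.52, and 16.8 Hz reported for the 10-element model, which is consistent with the stated geometry and material.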
To excite the structure with different loading cases, two loading conditions, random excitation and traffic loading excitation, were considered. Random excitation was provided to validate the proposed method in an ideal condition in which the full frequency spectrum is excited so that precise modal parameters can be obtained. Traffic loading was considered to represent the actual excitation of a structure, which requires sufficient measurement time for accurate modal analysis. Random excitation was modeled as a normally distributed force with a mean of 0 and a standard deviation of 20 kN, applied at a single node chosen at random on the beam for each simulation case. Traffic loading was modeled as a three-axle truck moving on the beam at different speeds from left to right (nodes 1 to 9). The front axle load was 35 kN and the load on each of the two rear axles was 145 kN, with a 10% random force added to each. The distance between the front and rear axles was 4.3 m, and that between the two rear axles was 9.0 m.
Sensor noise was assumed to be white Gaussian noise with a noise density of 100 µg/√Hz. The sampling rate was determined to be 100 Hz, and responses were generated using MATLAB Simulink with the ODE8 solver.
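To give a sense of the noise level, the sketch below converts the noise density into an RMS value and generates a noise record. It assumes that the quoted density is one-sided and flat up to the Nyquist frequency (fs/2), which the text does not state; under that assumption the RMS noise is 100 µg/√Hz × √(50 Hz) ≈ 707 µg. The function name `sensor_noise` is hypothetical.

```python
import numpy as np

G = 9.80665   # standard gravity (m/s^2)

def sensor_noise(n_samples, fs=100.0, density_ug=100.0, rng=None):
    """White Gaussian noise (m/s^2) for a noise density given in micro-g per sqrt(Hz)."""
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: one-sided density, flat over the bandwidth fs/2
    sigma = density_ug * 1e-6 * G * np.sqrt(fs / 2.0)
    return rng.normal(0.0, sigma, size=n_samples)

noise = sensor_noise(60 * 100, rng=np.random.default_rng(1))   # 60 s at 100 Hz
print(noise.std())   # close to 6.9e-3 m/s^2 (about 707 micro-g)
```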

Data Generation for Training
For training of the proposed model, 24,000 datasets were generated, half with random excitation and half with traffic loading excitation; 20,000 datasets were used for training and 4000 for validation. Two training cases were compared to determine the effect of the data normalization preprocessing step (see Table 3). The first case consisted of datasets with a sampling time of 60 s, whereas sampling times varying from 20 to 60 s were applied in the second case. Each case includes random excitation and traffic loading datasets. Note that the 20 to 60 s data were created by randomly truncating the 60 s data. The two cases were compared to determine how the data length (i.e., the duration of beam excitation) and the data normalization affect the learning outcome.

To label the location of damage, 11 categories were prepared. Label 0 indicates an intact condition, whereas labels 1 to 10 indicate a single damage location that corresponds to the element number. Because there were 10 labels for damage (i.e., elements 1 to 10) and only one label for the intact condition, the number of intact datasets was increased to equal the number of damage datasets to handle data imbalance [32][33][34][35].

Figure 5 shows the random excitation and the resulting acceleration response measured at node 1. To simulate traffic loading, three to five AASHTO trucks were set to pass over the beam with velocities of 30, 40, and 50 km/h, as summarized in Table 3. The trucks depart at random times so that various acceleration amplitudes can be generated (see Figure 6). The simulated acceleration measurements were normalized by NExT to an input layer of 81 × 1024 and fed into a 1-D CNN.
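The random truncation used to build the 20 to 60 s records can be sketched as below. The text does not say how the truncation was performed, so the uniformly random duration, the uniformly random start position, and the function name `random_truncate` are all assumptions made for illustration.

```python
import numpy as np

def random_truncate(acc, fs=100.0, min_s=20.0, max_s=60.0, rng=None):
    """Cut a random window of min_s..max_s seconds out of acc with shape (m, samples)."""
    rng = np.random.default_rng() if rng is None else rng
    n_total = acc.shape[1]
    n_min = int(min_s * fs)
    n_max = min(int(max_s * fs), n_total)
    n_keep = int(rng.integers(n_min, n_max + 1))          # assumed uniform duration
    start = int(rng.integers(0, n_total - n_keep + 1))    # assumed uniform start
    return acc[:, start:start + n_keep]

rng = np.random.default_rng(0)
record = rng.standard_normal((9, 6000))                   # 9 channels, 60 s at 100 Hz
lengths = [random_truncate(record, rng=rng).shape[1] for _ in range(5)]
print(lengths)
```

Each truncated record has a different length between 2000 and 6000 samples, which is why a normalization step to a common input size is needed before the records can be batched together.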

Training and Validation
The proposed deep network was trained on the simulated acceleration measurements for automated damage localization. The nine-channel accelerations were normalized using NExT and then standardized to have a mean of 0 and a standard deviation of 1. The preprocessed input data were fed to a series of three 1-D convolution layers, each with 128 filters, with kernel sizes of 8, 5, and 3. For better convergence, the kernels were initialized by the uniform He initialization scheme proposed by He et al. [36], which samples from a uniform distribution. To avoid overfitting, a batch normalization layer was added after each 1-D convolution layer, and the convolutional layers were followed by a GAP layer. The detailed configurations of training are summarized in Table 4. An Intel i9-9900X CPU and an Nvidia RTX 2080 Ti GPU were used for training, and a single training epoch took 12 s.
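To show how the pieces fit together and how the tensor shapes evolve, the sketch below runs an untrained forward pass in NumPy with the stated layer sizes (128 filters; kernel sizes 8, 5, and 3; 11 output categories). It is a shape illustration under assumptions, not the trained model: the padding mode ("valid"), the omission of convolution biases, the batch-norm parameters fixed at γ = 1 and β = 0, and the initialization of the fully connected layer are not given in the text. The He uniform limit √(6 / fan_in) follows the common definition of that initializer.

```python
import numpy as np

def he_uniform(rng, c_out, c_in, k):
    """He uniform initialization: U(-limit, limit) with fan_in = c_in * k."""
    limit = np.sqrt(6.0 / (c_in * k))
    return rng.uniform(-limit, limit, size=(c_out, c_in, k))

def conv_bn_relu(x, w, eps=1e-5):
    """x: (batch, c_in, length) -> (batch, c_out, length - k + 1)."""
    k = w.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=2)  # (b, c_in, t, k)
    h = np.tensordot(windows, w, axes=([1, 3], [1, 2]))               # (b, t, c_out)
    h = h.transpose(0, 2, 1)                                          # (b, c_out, t)
    mu = h.mean(axis=(0, 2), keepdims=True)   # batch norm per channel (gamma=1, beta=0)
    var = h.var(axis=(0, 2), keepdims=True)
    return np.maximum(0.0, (h - mu) / np.sqrt(var + eps))             # ReLU

def forward(x, kernels, w_fc):
    for w in kernels:
        x = conv_bn_relu(x, w)
    pooled = x.mean(axis=2)                   # GAP: (batch, 128)
    logits = pooled @ w_fc                    # fully connected: (batch, 11)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # softmax probabilities

rng = np.random.default_rng(0)
kernels = [he_uniform(rng, 128, 81, 8),
           he_uniform(rng, 128, 128, 5),
           he_uniform(rng, 128, 128, 3)]
w_fc = rng.uniform(-0.1, 0.1, size=(128, 11))     # placeholder weights (assumption)
probs = forward(rng.standard_normal((2, 81, 1024)), kernels, w_fc)
print(probs.shape)                                # (2, 11)
```

The 81 correlation functions act as input channels and the 1024 lags as the temporal axis, so each sample of size 81 × 1024 is reduced to 11 class probabilities.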

Training Results and Discussion
To present the training results, the classification accuracy for each validation dataset is shown in Table 5. According to the classification results, Case 1 shows better classification capability for both loading conditions. These results suggest that training with longer data, which represent a higher degree of excitation, provides more information to train the model. However, the longer the data, the longer the model takes to train and the harder it is to use in real-world applications. For example, training on 60-s-long raw data with an input size of 9 × 6000 takes 34 s per epoch, whereas training on the normalized 81 × 1024 input takes 13 s. Thus, to secure learning efficiency in time-series deep-learning models for structural damage detection, the proposed data normalization step is essential. A comparison was made between the proposed method and the model proposed by Lin et al. [10]. Because the existing model was originally proposed to train with raw acceleration data, 10.24 s of raw acceleration data with a size of 9 × 1024 was used for its training. Furthermore, to demonstrate the proposed data normalization, the Case 1 dataset was also used with the existing model.
As shown in Table 6, the classification accuracy under random excitation is 99.90% and 81.20% for the proposed method and the existing model, respectively, a difference of 18.70 percentage points. The results under traffic loading show the advantage of the proposed model even more clearly: the proposed method achieved an accuracy of 99.20%, whereas the existing model achieved 59.80%. The difference results from the proposed data normalization, which aggregates frequency-domain information, in contrast to the instantaneous time-domain information used in the existing model. When the data normalization method was applied to the existing model, its classification accuracy improved significantly. However, the training accuracy of the existing model reached 100% at the 98th epoch, whereas its validation accuracy did not exceed 93%. Because the existing model uses max-pooling layers after the convolutional layers, it exhibited overfitting. In contrast, the GAP layer used in the proposed method instead of max-pooling layers addressed the overfitting issue of the existing model, yielding better performance.

Dataset for Application Test
One thousand datasets each for random excitation and traffic loading were generated to evaluate the proposed model on new datasets that differed from the training datasets: the sampling time was extended from 60 s to 120-150 s, and the damage severity was randomly selected to be either 0% (intact) or between 20% and 50%. Additionally, the number and velocity of vehicles were increased. Table 7 compares the trained dataset and the new dataset for validation.

Results and Discussion
The test results for each category that indicates the damage location are given in Tables 8 and 9 for random excitation and traffic loading, respectively.

Table 8. Test results for random excitation (category 0 is intact; categories 1 to 10 are the damaged element).
Category   Correctly classified   Total   Accuracy (%)
0          88                     88      100
1          101                    101     100
2          107                    107     100
3          92                     92      100
4          83                     83      100
5          93                     93      100
6          94                     94      100
7          108                    108     100
8          89                     89      100
9          68                     68      100
10         77                     77      100
Total      1000                   1000    100

According to the test results shown in Tables 8 and 9, the classification accuracy for random excitation was 100%, which is 1.0 percentage point higher than that for traffic loading. Most of the misclassifications in the traffic loading case occurred in determining the existence of damage, whereas the random excitation case shows a perfect classification result. These results suggest that random excitation excites the structure more fully than traffic loading does, which makes it better suited to assessing the condition of the structure. However, considering the application to real structures that are tested with ambient vibration, the test result for traffic loading shows the robustness of the proposed method with a high accuracy of 99.00%.
Tables 10 and 11 summarize the confusion matrix of each case, which represents the capability to classify whether the system is intact or damaged. For analysis, "intact" is considered positive and "damage" is considered negative. From the confusion matrix, recall and fallout, represented by the true positive rate (TPR) and false positive rate (FPR), respectively, can be analyzed. TPR represents the ratio of the number of cases correctly predicted as positive to the number of actual positive cases and can be expressed as

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

FPR represents the ratio of the number of negative cases incorrectly predicted as positive to the number of actual negative cases and is expressed as

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

where TP, FN, FP, and TN are the numbers of true positives, false negatives, false positives, and true negatives, respectively. In both the random excitation and traffic loading cases, the TPR of the trained model was 100%, which shows an excellent capability to categorize "intact." The FPR, which reflects the capability to categorize damage, was 0% for random excitation and 0.65% for traffic loading. These results show that the trained model performs well in classifying whether the condition of the structure is intact or damaged.
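The two rates can be computed directly from the confusion-matrix counts. The sketch below uses the random-excitation test set, for which the counts follow from the per-category results (88 intact cases and 912 damaged cases, none confused between intact and damaged); the function name `rates` is hypothetical.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) with "intact" as the positive class."""
    tpr = tp / (tp + fn)    # intact cases correctly predicted as intact
    fpr = fp / (fp + tn)    # damaged cases wrongly predicted as intact
    return tpr, fpr

# Random excitation: 88 intact and 912 damaged cases, all classified correctly
tpr, fpr = rates(tp=88, fn=0, fp=0, tn=912)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")   # TPR = 100.00%, FPR = 0.00%
```

A false positive in this convention means that a damaged structure is reported as intact, which is the more consequential error for monitoring, so a low FPR is the key figure.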
To show the performance of the trained model, visualization using t-distributed stochastic neighbor embedding (t-SNE) [37] was conducted. t-SNE is a nonlinear dimensionality reduction technique used to visualize high-dimensional data in a low-dimensional space. This technique was used to create a 2-D distribution map of the predicted damage locations and the intact condition from the output of the GAP layer, which is located immediately before the classification layer.
Two 2-D maps, random and traffic loading, are shown in Figure 7, and each map shows 1000 points from test datasets. Figure 7 shows that points in the same category are close to each other and form clusters. Clustering was observed more clearly for random excitation result than for traffic loading.

Conclusions
An automated damage detection method using a deep neural network was presented. The proposed method for automated structural damage detection comprises two main contributions. (1) Data normalization is performed using NExT to compress and normalize the input data length. The acquired data can vary according to the measurement environment and purpose. Normalizing and quantifying the data length are critical to damage detection through deep neural networks because deep neural networks only work for the data lengths on which they were trained. (2) A CNN is used to localize damaged elements. The proposed convolutional network can localize damaged elements from normalized input acceleration signals without any hand-crafted damage-sensitive feature extraction process.
A numerical model of a simply supported beam was excited by random ambient load and traffic loading, and acceleration responses were extracted from nine nodes. Sensor noise was included to make the simulated measurements realistic. Noisy acceleration signals from the nine nodes were correlated with each other to normalize and quantify the data length, and a fully correlated response matrix was generated. The fully correlated response matrix is the input of the deep neural network, and the output is the location of the damaged element.
For the training of the proposed method, datasets were generated using random excitation and traffic loading, and single damage was randomly applied to one of the elements. A total of 20,000 datasets for training (10,000 for each load) and 4000 datasets for validation (2000 for each load) were used to train the model. The number of intact datasets (i.e., category 0) was made larger than that of each individual damage category (categories 1 to 10) so that the intact datasets balanced the total number of damage datasets. To show the effect of data normalization, two training cases were compared. The resulting classification accuracy for Case 1 was 99.90% and 99.20% for the random excitation and traffic loading datasets, respectively. The results show that, by using the data normalization technique in the preprocessing step, longer data can be used for training to enhance the classification capability of the trained model.
The proposed method was validated through tests with different conditions for generating datasets. For random excitation, the sampling time was increased. For traffic loading, the velocity, number of trucks, and sampling time were increased. In addition, 0% or 20-50% of damage was chosen at random for both loads. The classification accuracy of random excitation was 100% and that of traffic loading was 99.00%.
Future work is planned to focus on problems caused by the low severity of damage. The proposed method using different deep neural networks rather than 1-D CNN will be studied to improve classification accuracy. Furthermore, to enhance the effectiveness of the proposed method, the classification of multiple damages and prediction of damage severity will be studied.