A Framework of Structural Damage Detection for Civil Structures Using Fast Fourier Transform and Deep Convolutional Neural Networks

Abstract: In the field of structural health monitoring (SHM), vibration-based structural damage detection is an important technology to ensure the safety of civil structures. By taking advantage of deep learning, this study introduces a data-driven structural damage detection method that combines deep convolutional neural networks (DCNN) and the fast Fourier transform (FFT). In this method, the structural vibration data are fed into the FFT to acquire frequency information reflecting structural conditions. Then, the DCNN is utilized to automatically extract damage features from the frequency information to identify structural damage conditions. To verify the effectiveness of the proposed method, FFT-DCNN is carried out on a three-story building structure and the ASCE benchmark. The experimental result shows that the proposed method achieves high accuracy compared with classic machine-learning algorithms such as support vector machine (SVM), random forest (RF), K-nearest neighbor (KNN), and eXtreme gradient boosting (XGBoost).


Introduction
Under environmental and load effects, civil structures accumulate different levels of damage, such as degradation, corrosion, and fatigue. This damage reduces the service life of buildings and threatens public safety [1]. Structural health monitoring (SHM) is a critical technology for assuring the safety of civil engineering structures [2,3]. SHM methods are mainly classified into two categories: data-driven methods based on statistical pattern recognition and physics-based methods based on finite element model updating [4]. Compared with physics-based methods, which require building a numerical model, data-driven methods have advantages in identifying structural damage under load and environmental influences such as temperature and moisture [5]. Thus, many researchers have focused on data-driven methods for structural damage detection to protect the safety of civil structures. A data-driven method is usually decomposed into several steps: data acquisition, feature extraction, damage detection, and decision-making, among which damage detection remains a major challenge for SHM [6][7][8].
With the development of computing power, machine learning (ML) algorithms have been widely used in the SHM field. Since vibration signals can reflect structural damage conditions, ML algorithms usually utilize structural vibration data to recognize structural damage, especially when only small vibration datasets are available [9,10]. For example, Seyedpoor [11] adopted a support vector machine (SVM) and a differential evolution algorithm (DEA) to identify structural damage in moment frame connections. Damage in the structural connections was simulated with semi-rigid beams, and the vibration data of structures with damaged connections were generated by the analytical model. Then, these data were fed into the SVM model to update the model's parameters and weights. The result showed that the SVM-DEA had an excellent performance for structural damage detection. Guo [12] proposed a method based on Bayesian theory and an immune genetic algorithm (IGA) to recognize structural damage, where Bayesian theory was applied to identify damage sites and an improved IGA was then utilized to identify the damage level. This study provided a two-stage method that can effectively assess damage locations and extent. Wu [13] utilized the Pearson correlation coefficient to select essential information features, and then the features were fed into an Ensemble Generalized Multiclass Support-Vector-Machine (EGMSVM) to recognize structural damage. To verify the effectiveness of the proposed method, different algorithms such as LDA, random forest, and SVM were tested on a simulated hydraulic platform. The result showed that the EGMSVM achieved high accuracy and low variance and deviation.
However, these methods, such as support vector machines, ensemble algorithms, and Bayesian algorithms, belong to "shallow" machine learning. Although "shallow" ML algorithms recognize structural damage and its location with high accuracy on small datasets, they perform poorly in handling the massive vibration data produced by SHM systems.
With the development of sensor technology and data acquisition, SHM systems can collect large amounts of data from various sensors installed on civil structures. Since deep learning methods handle massive data effectively, they have attracted much attention in many fields such as image classification [14,15] and natural language processing [16]. Among these methods, vibration-based convolutional neural network (CNN) algorithms are widely utilized in civil engineering because they are powerful in extracting features from raw vibration data to recognize structural damage. For example, Nur Sila [17] adopted a CNN to capture abstract features and complex classifier boundaries and then classified the damaged and healthy conditions of structures. The experimental result showed that the CNN accomplished real-time damage diagnosis and localization with high accuracy, robustness, and computational efficiency. Bao [18] proposed a computer vision method that detects structural damage using a CNN model. The obtained vibration data were transformed into time- and frequency-domain images via a visualization method, and then these images were fed into the CNN to classify structural damage. The classification accuracy reached 93.5%. Abdeljaber [19] and Zhang [20] proposed a one-dimensional CNN with data-processing techniques that utilized a small training sample to identify structural damage location and mass changes effectively. However, the above CNN methods have limitations in handling contaminated or noisy data because they may regard the contaminated information in acceleration data as fault information. More importantly, contaminated time-sequence data such as acceleration data cannot effectively reflect structural damage conditions. Thus, many scholars have studied methods based on frequency information.
To be specific, the time signal is transformed into frequency information, which can reduce the influence of contaminated data and improve the accuracy of damage identification in noisy environments. For example, Hoshyar [21] converted vibration signals to frequency information and then utilized machine learning algorithms such as support vector machines to localize concrete cracks based on the obtained frequency information. The experimental results showed that the proposed method had high accuracy. Tehrani [22] utilized an artificial intelligence method based on the fast Fourier transform (FFT) to recognize the degree of structural damage. The results showed that FFT was suitable for nonstationary vibration signals. Nguyen [23] proposed a method combining FFT analysis and artificial learning to evaluate damage changes of structures through a discrete model. The result showed higher accuracy compared with using other models or analyses alone. However, the accuracy of FFT combined with traditional methods such as support vector machines needs to be further improved. Although FFT has many advantages in handling time sequences, the existing methods do not fully consider the advantages of combining FFT with deep learning, such as DCNN, for structural damage detection.
In this paper, a high-precision and robust structural damage detection method is proposed based on FFT-DCNN. In this method, FFT is utilized to analyze the frequency information, reducing the influence of contaminated data [24,25]. The DCNN automatically extracts features from the structural frequency information [26,27]. Finally, a three-story building structure [28,29] and the ASCE benchmark [30] are utilized to evaluate the structural damage detection ability of the proposed method.
The following are the primary contributions of this paper: (1) A novel sensor data-driven structural damage detection method is proposed by combining FFT with DCNN, which can effectively handle the vibration signal to recognize the structural damage condition. (2) Compared with traditional damage detection methods such as FFT-SVM, SVM, KNN, random forest, and XGBoost, the experimental result shows that the proposed method achieves higher damage detection accuracy. (3) Since FFT-DCNN takes a short time on the test datasets, the proposed method can be utilized for the online detection of structural damage conditions in the field of SHM.
The rest of the paper is organized as follows. Section 2 introduces the proposed FFT-DCNN architecture. Section 3 describes the structural damage detection method based on the proposed method. The experimental setup is introduced in Section 4. FFT-DCNN is carried out on a three-story building structure and the ASCE benchmark in Sections 5 and 6, respectively. Finally, Section 7 summarizes some conclusions based on FFT-DCNN and potential topics for future research.

Proposed FFT-DCNN Architecture
Figure 1 illustrates the architecture of the designed FFT-DCNN. It has five parts: an input layer, an FFT layer, a convolutional neural network (Conv1, Conv2, MP1, Conv3, Conv4, and MP2), a fully connected network (FC1, FC2), and an output layer. In the CNN, Conv1 represents the first convolutional operation, and MP1 represents the first pooling operation. In this framework, raw vibration data are transformed into frequency information via the FFT method. The DCNN is powerful in handling two-dimensional data and capturing spatial features, but the frequency information produced by the FFT method is one-dimensional, so the DCNN cannot effectively extract features from it directly. Thus, the one-dimensional frequency information is transformed into a two-dimensional feature map via dimension transformation, improving the feature extraction ability of the DCNN. Then, convolutional and max-pooling layers are utilized to capture spatial features from the frequency information. Finally, the extracted features are fed into a fully connected network with a "softmax" activation function to recognize the damage conditions of the structure.

Fast Fourier Transform Layer
The fast Fourier transform (FFT), an algorithm for computing the discrete Fourier transform, was first proposed by Cooley and Tukey in 1965. It reduces the number of computations needed for N points from $O(N^2)$ to $O(N \log N)$. As the number of sampling points increases, this method saves more computational resources.
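The difference in cost can be illustrated with a small pure-Python sketch. It compares a naive discrete Fourier transform with a recursive radix-2 Cooley-Tukey FFT; the function names `dft` and `fft` and the eight-sample test signal are illustrative assumptions, and the recursive version only accepts power-of-two lengths.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform: O(N^2) multiply-adds."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Hypothetical test signal: two cycles of a sine wave over eight samples.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
spectrum = fft(signal)
```

Both functions return the same spectrum; the energy of this signal is concentrated in bins 2 and 6. In practice a library routine would normally be used instead of a hand-written transform.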
Feature extraction with FFT is shown in Figure 2. First, a sliding window acquires n consecutive time-domain data points from the original signal. Then, the data in every window are transformed into frequency information via the FFT method, which is described as:

$$X_k = \sum_{m=0}^{N-1} x_m e^{-i 2\pi k m / N}, \quad k = 0, 1, \ldots, N-1$$

where $x_0, \ldots, x_{N-1}$ are the time-domain data, $X_k$ is the complex-valued frequency component, and N represents the number of sampling points.

Figure 2. Flowchart of features extraction.
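The windowing and transform steps can be sketched as follows. This is a minimal illustration, assuming a hypothetical 8 Hz sine sampled at 64 Hz, a window of 32 points, and a step of 16 points; the source does not state the window overlap, and the helper names are not from the paper.

```python
import cmath
import math


def sliding_windows(signal, size, step):
    """Split a 1-D signal into fixed-length windows; a trailing remainder is dropped."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, step)]


def magnitude_spectrum(window):
    """Magnitude |X_k| of the discrete Fourier transform of one window."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]


fs = 64.0       # assumed sampling frequency in Hz
tone_hz = 8.0   # assumed signal frequency in Hz
signal = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(128)]

windows = sliding_windows(signal, size=32, step=16)
features = [magnitude_spectrum(w) for w in windows]

# Strongest bin in the first half of the spectrum of the first window.
peak_bin = max(range(16), key=lambda k: features[0][k])
```

With these assumed values, the peak bin corresponds to the tone frequency through `peak_bin * fs / 32`.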

Convolutional Layer
In this study, the DCNN has two main types of layers: convolutional layers and pooling layers. The convolutional layer of the DCNN is mainly utilized to extract features from the structural data via the convolution operation. In this procedure, convolutional kernels of the same size slide over the input sample with a fixed step size. After the convolutional operation, the output matrices are obtained, and the number of output matrices is the same as the number of kernels. The output of the convolutional layer is described as:

$$C_j = f(X_i * W_j + B_j)$$

$$C = [C_1, C_2, \ldots, C_{K_C}]$$

where $*$ represents the convolutional operation, $W_j$ and $B_j$ represent the j-th ($j \in \{1, 2, \ldots, K_C\}$) convolution kernel and bias, respectively, $f(\cdot)$ denotes the activation function, and $C_j$ and $C$ are the j-th output and the entire output, respectively. When the input sample $X_i$ undergoes a convolutional operation, the feature dimension changes, and important information can be lost because of this dimensional reduction. To solve the problem, "same padding" is utilized in this study, which keeps the same dimension between input and output. After the convolutional operation, the dimension of the output can be described as:

$$\left(\frac{w_C - n_l}{s_c} + 1\right) \times \left(\frac{h_C - m_l}{s_c} + 1\right) \times K_C$$

where $w_C$ and $h_C$ are the length and width of the frequency information obtained via FFT transformation, respectively, $K_C$ denotes the number of convolutional kernels, $n_l$ and $m_l$ are the length and width of the convolutional kernels, and $s_c$ represents the sliding step.
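The effect of "same padding" can be checked with a small sketch. It applies one kernel to one input map with zero padding and a step of 1, so the output keeps the input size. The ReLU activation, the odd kernel size, and the helper names are assumptions for illustration; as in most deep learning frameworks, the kernel is not flipped.

```python
def conv2d_same(x, kernel, bias=0.0):
    """Single-kernel 2-D convolution with zero 'same' padding and step 1.

    Assumes an odd-sized kernel and applies ReLU as the activation f.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    r, c = i + a - ph, j + b - pw
                    if 0 <= r < h and 0 <= c < w:  # outside the map counts as zero
                        acc += x[r][c] * kernel[a][b]
            out[i][j] = max(acc, 0.0)
    return out


def valid_output_size(size, kernel, step=1):
    """Output length along one axis when no padding is used."""
    return (size - kernel) // step + 1


x = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
y = conv2d_same(x, kernel)
```

Here `y` is still 4 × 4, whereas `valid_output_size(18, 7)` shows that an 18-point axis would shrink to 12 points under a 7-point kernel without padding.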

Pooling Layer
After the convolution operation, the dimensions and parameters of the feature maps can increase, which raises the required computing resources. Thus, a pooling layer is utilized to reduce the dimension while preserving the important information of the extracted features. In addition, a pooling layer can alleviate problems such as overfitting and long training time. The pooling procedure is described as:

$$P_j = \mathrm{maxpool}(C_j)$$

$$P = [P_1, P_2, \ldots, P_{K_C}]$$

where $C_j$ is the j-th output of the convolutional layer, $P_j$ is the pooling result of $C_j$, and $P$ collects all pooling results.
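A minimal sketch of max pooling on one feature map is given below. The 2 × 2 window with a step of 2 and the 4 × 4 example values are assumptions chosen for illustration.

```python
def max_pool2d(x, size=2, step=2):
    """Max pooling over non-overlapping windows; incomplete edge windows are dropped."""
    h, w = len(x), len(x[0])
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, w - size + 1, step)]
            for i in range(0, h - size + 1, step)]


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool2d(feature_map)
```

Each output value is the largest entry of one 2 × 2 block, so the map shrinks from 4 × 4 to 2 × 2 while the strongest responses are kept.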

Fully Connected Layer
After the convolutional and pooling layers, the obtained two-dimensional matrix is transformed into a one-dimensional vector via a flatten operation. Then, the vector is fed into a fully connected layer, which is described as:

$$F = f(W_F P + b)$$

where $P$ represents the input vector, $W_F$ and $b$ represent the weights and bias, respectively, $f(\cdot)$ denotes the activation function, and $F$ is the output of the fully connected layer.

Classification Layer
The classification layer adopts the output features from the fully connected layer to predict different structural damage conditions via a softmax activation function. More specifically, for every input vector $F$, the classification layer predicts the probability of $F$ belonging to each category. Each predicted probability lies between 0 and 1, and the probabilities sum to 1. The j-th predicted category is expressed as:

$$Y_j = \frac{e^{F_j}}{\sum_{k=1}^{n} e^{F_k}}$$

where $F_j$ is the j-th component of $F$, $Y_j$ is the probability of predicting the j-th category via the softmax classifier, and n represents the total number of categories.
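The flatten, fully connected, and softmax steps can be sketched together. The 2 × 2 feature map, the weight values, and the ReLU activation are hypothetical and serve only to show the data flow; the maximum is subtracted inside `softmax` for numerical stability, which does not change the result.

```python
import math


def dense(p, weights, bias, activation=lambda v: max(v, 0.0)):
    """Fully connected layer F = f(W_F P + b) with one weight row per output neuron."""
    return [activation(sum(w * x for w, x in zip(row, p)) + b)
            for row, b in zip(weights, bias)]


def softmax(f):
    """Convert scores into probabilities that sum to 1."""
    m = max(f)
    exps = [math.exp(v - m) for v in f]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [[1.0, 2.0], [3.0, 4.0]]
p = [v for row in feature_map for v in row]  # flatten to a 1-D vector

weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
]
bias = [0.0, 0.0, 0.0]

f = dense(p, weights, bias)
y = softmax(f)
predicted = max(range(len(y)), key=y.__getitem__)
```

The predicted category is the index of the largest probability in `y`.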

Structural Damage Detection Method Using Proposed FFT-DCNN Architecture
This study proposes an FFT-DCNN method in which FFT converts the one-dimensional vibration signal into frequency information that reflects the structural damage, and the dimension of the frequency information is M × 1. Considering that the DCNN is powerful in extracting spatial features of input data, the M × 1 frequency information is converted to a $\sqrt{M} \times \sqrt{M}$ matrix $Fre_M$ via dimension transformation, which can reflect the spatial information. Then, $Fre_M$ is fed into the DCNN model to distinguish different structural damage conditions. Figure 3 shows the specific process of structural damage detection. First, a sliding window with a fixed size is utilized to split the raw acceleration signals to obtain more samples, which serves as a means of data augmentation [31]. Because the window size is fixed, every segment has the same length. To make the dimension transformation more convenient, the sliding window is set to 324 points, considering that the sampling frequency is 322.58 Hz. Frequency information with 324 points is then acquired by utilizing the FFT method, and every segment includes the frequency information of the acceleration data.
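The dimension transformation can be sketched in a few lines. The row-major ordering and the function name `to_square` are assumptions; the source only states that M × 1 frequency information becomes a $\sqrt{M} \times \sqrt{M}$ matrix.

```python
import math


def to_square(features):
    """Reshape a length-M vector into a sqrt(M) x sqrt(M) matrix (row-major)."""
    side = math.isqrt(len(features))
    if side * side != len(features):
        raise ValueError("length must be a perfect square")
    return [features[r * side:(r + 1) * side] for r in range(side)]


frequency_info = list(range(324))  # placeholder values standing in for 324 FFT points
fre_m = to_square(frequency_info)
```

With 324 points the result has 18 rows of 18 values, which is why a window length that is a perfect square makes the transformation convenient.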

The obtained datasets are randomly divided into training, validation, and testing datasets with a ratio of 6:2:2. The frequency feature matrix $Fre_M$ is then fed into the DCNN model. The loss function and optimizer are used to iteratively update the model parameters on the training datasets. To reduce overfitting, the training process and the parameter updates stop when the model obtains a preferred result of structural damage detection on the validation datasets.
During the training process of the FFT-DCNN model, the vibration data are fed into FFT to analyze the frequency information. Then, the $Fre_M$ matrix is fed into the convolutional layers, which extract spatial features, and the max-pooling layers, which reduce the number of trained parameters in the DCNN model. To avoid overfitting and improve the feature extraction ability, dropout is added to the fully connected network [32]. After the four convolutional layers and two max-pooling layers extract features from the vibration signals, the features are fed into two fully connected layers with 300 and 200 neurons to predict structural damage conditions.
In addition, the Adam optimizer is used to update the DCNN model parameters in every iteration. The dropout value is set to 0.5, and the batch size is set to 512. The initial learning rate is 0.001. In this study, the cross-entropy loss function is utilized to assess the training results, which is shown as:

$$Loss = -\sum_{k} P_k \log Q_k$$

where $Q_k$ is the predicted value and $P_k$ represents the real value. Finally, after the training is finished, the testing dataset is fed into the DCNN model to identify the degree of structural damage. A high accuracy metric indicates that the FFT-DCNN has excellent performance and that the proposed method can be applied to actual structural damage detection.
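The cross-entropy loss can be evaluated for a single sample as sketched below. The one-hot label, the predicted probabilities, and the small clipping constant `eps` (added to avoid taking the logarithm of zero) are assumptions for illustration.

```python
import math


def cross_entropy(p_true, q_pred, eps=1e-12):
    """Cross-entropy between a true distribution P and predicted probabilities Q."""
    return -sum(p * math.log(max(q, eps)) for p, q in zip(p_true, q_pred))


p_true = [0.0, 1.0, 0.0]   # one-hot label: the sample belongs to the second category
q_pred = [0.1, 0.8, 0.1]   # hypothetical softmax output
loss = cross_entropy(p_true, q_pred)
```

For a one-hot label the loss reduces to the negative logarithm of the probability assigned to the true category, so it approaches zero as that probability approaches 1.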

Experimental Setups and Data Description
This section mainly introduces the datasets, experimental settings, and evaluation criteria.

Data Description
Figure 4 shows a three-story building structure made of aluminum columns and plates assembled using bolted joints with a rigid. The building structure has three floors, where the top and bottom of the building structure (30.5 cm × 30.5 cm × 2.5 cm) are connected by four aluminum columns (17.7 cm × 2.5 cm × 0.6 cm) to form a four-degree-of-freedom system. Moreover, a center column is installed on the top floor, shown in Figure 4b, which can change the degree of nonlinearity by changing the distance between the column and the bumper [33]. To collect data, the structure has five channels, including a force transducer and four acceleration sensors that obtain vibration signals. The sampling frequency is set to 322.58 Hz. Temperature has a significant influence on the dynamic parameters of a monitored structure. Because the three-story frame structure was tested at Los Alamos National Laboratory, the temperature is assumed to be constant, and its influence can be ignored.

According to the different damage and nonlinearity degrees in Table 1, there are 13 structural conditions. Each scenario is repeated ten times, and the data are recorded by the sensors. Structural conditions (1-5) represent structural damage caused by nonlinearity, which imitates the effects of crack opening and closing. Structural conditions (6-7) are designed as mass and nonlinearity changes, which represent impact-induced damage. Structural conditions (8-13) denote stiffness changes, where structural conditions (11-13) are more severe than (8-10) due to a greater stiffness reduction of the structure.

State#15 Gap = 0.10 mm + mass on the 1st floor 8

Crossvalidation for Datasets
The datasets are divided into training, validation, and testing datasets with a 6:2:2 ratio, and the validation dataset is utilized to evaluate the performance of the proposed deep learning algorithm. However, a single split cannot ensure an optimal result of identifying structural damage conditions among the sub-datasets. Therefore, the K-fold crossvalidation method is adopted to reduce the bias during the testing process. More specifically, the datasets are divided into training and testing datasets with a ratio of 8:2. Then, the training dataset is split into K equal portions, where one portion is used for validation and the remaining portions are used for training. In this study, K = 4 is selected, and the crossvalidation of datasets is shown in Figure 5.
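The split described above can be sketched with index lists. The function name, the fixed random seed, and the interleaved fold assignment are assumptions; the source specifies only the 8:2 split and K = 4.

```python
import random


def split_with_kfold(n_samples, k=4, test_ratio=0.2, seed=0):
    """Hold out a test set, then build K train/validation folds from the rest."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_ratio)
    test, train = indices[:n_test], indices[n_test:]
    folds = []
    for i in range(k):
        validation = train[i::k]  # every k-th remaining sample, offset by i
        held_out = set(validation)
        training = [j for j in train if j not in held_out]
        folds.append((training, validation))
    return test, folds


test_idx, folds = split_with_kfold(100)
```

With 100 samples this yields 20 test samples and, in each fold, 60 training and 20 validation samples, so an 8:2 split followed by four folds reproduces the 6:2:2 ratio in every fold.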

Evaluation Metrics
Four evaluation metrics, including Accuracy, Precision, Recall, and F1-score, are selected to evaluate the proposed method and the compared algorithms. The four metrics are defined as follows:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$\mathrm{Recall}(y, \hat{y}) = \frac{TP}{TP + FN}$$

$$\mathrm{Precision}(y, \hat{y}) = \frac{TP}{TP + FP}$$

$$\mathrm{Accuracy}(y, \hat{y}) = \frac{TP + TN}{TP + FP + TN + FN}$$

where $y$ and $\hat{y}$ represent the true labels and predicted labels. TP and FP are the numbers of true positives and false positives, and TN and FN are the numbers of true negatives and false negatives, respectively.
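The four metrics can be computed from label lists as sketched below. The example treats one category as positive and all others as negative (one-vs-rest), which is an assumption about how the multi-class case is scored; the labels are hypothetical.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN when one category is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Precision, Recall, F1-score, and Accuracy from the four counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 1, 2]
counts = one_vs_rest_counts(y_true, y_pred, positive=1)
scores = metrics(*counts)
```

In this example one sample of category 0 is wrongly assigned to category 1, which lowers the Precision of category 1 while its Recall stays at 1.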

Experimental Results and Discussion for Three-Story Building Structure
This section uses a three-story building structure to evaluate the effectiveness of the proposed method.

Experimental Setup and Data Description
Because the sliding window length is set to 324, every one-dimensional vibration signal segment has a length of 324 sampling points. The vibration signal is transformed into frequency information with a dimension of 324 × 1 via the FFT method. Then, the frequency information is converted to an 18 × 18 feature matrix $Fre_M$ via dimension transformation, considering that the DCNN is adept at extracting spatial features. Finally, the DCNN extracts damage features from the $Fre_M$ matrix to identify structural damage. Figure 6 shows the process of feature transformation for four different damage conditions, with damage degrees of 10%, 20%, 30%, and 40%. The frequency information changes as the damage degree increases.

Table 3 shows the configuration of the proposed method. The size of the convolutional kernel is 7 × 7 in the first convolutional layer and 5 × 5 in the second. These are followed by a 2 × 2 max-pooling layer. The next two convolutional layers both use 3 × 3 kernels, and the final max-pooling layer is 2 × 2.
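The spatial size of the feature map through this layer sequence can be traced with a short sketch. It assumes "same" padding with a step of 1 in every convolutional layer, a pooling step equal to the pooling window, and that incomplete edge windows are dropped; the number of kernels per layer is not tracked because it is not stated in the text.

```python
import math


def same_conv(size, step=1):
    """Output length along one axis for a 'same'-padded convolution."""
    return math.ceil(size / step)


def pool(size, window=2, step=2):
    """Output length along one axis for pooling without padding."""
    return (size - window) // step + 1


layers = [
    ("Conv1", "conv", 7),
    ("Conv2", "conv", 5),
    ("MP1", "pool", 2),
    ("Conv3", "conv", 3),
    ("Conv4", "conv", 3),
    ("MP2", "pool", 2),
]

size = 18  # the 18 x 18 input matrix
trace = [("input", size)]
for name, kind, k in layers:
    size = same_conv(size) if kind == "conv" else pool(size, window=k, step=k)
    trace.append((name, size))
```

Under these assumptions the map stays 18 × 18 through the first two convolutions, becomes 9 × 9 after MP1, and 4 × 4 after MP2.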
Table 4 shows the four-fold crossvalidation result on the three-story building structure using the proposed method. In the four-fold crossvalidation, the accuracy on the training datasets is 93.48% on average, and the accuracy on the testing datasets is 93.29% on average. This shows that FFT-DCNN is suitable for identifying the degree of structural damage effectively.

Figure 7 shows the accuracy curve of the proposed method on the Fold-1 datasets. The training, testing, and validation accuracies reach 0.9 after 200 epochs. The results show that FFT-DCNN presents an excellent feature extraction ability on the three-story building structure. In Figure 8, the loss curve of the FFT-DCNN model is smooth, which indicates a stable training process and a good fit.

Compared with Other Methods
To verify the advantages of our proposed method, classical ML algorithms including support vector machine (SVM) [34], FFT-SVM, random forest (RF) [35], K-nearest neighbor (KNN) [36], and eXtreme gradient boosting (XGBoost) [37] are selected for comparison. The accuracy of KNN, RF, and XGBoost is 67.64%, 70.24%, and 75.78%, respectively, which indicates a low ability to recognize structural damage. For SVM, KNN, RF, and XGBoost, the time-sequence acceleration data are the input datasets. For the FFT-DCNN and FFT-SVM algorithms, the time sequence is transformed into frequency information by the FFT method, and the frequency information is then fed into the DCNN or SVM algorithm. The settings of these algorithms are as follows:
SVM: A Gaussian RBF function is used as the SVM kernel function, and a grid search is used to determine the penalty parameter c and the kernel parameter g. The search ranges of c and g are $[10^{-4}, 10^{4}]$ and $[2^{-4}, 2^{4}]$, respectively. The tolerance for the stopping criterion is set to 1e-3, which is enough to satisfy the error criterion.
KNN: The number of neighbors is set to 5. The weight function is set to uniform, meaning that all points in each neighborhood are weighted equally. The power parameter of the Minkowski metric is set to 1, which is equivalent to using the Manhattan distance. The leaf size, which affects the construction and query speed, is set to 30.
RF: The number of trees in the forest is 100, and the maximum depth is set to 3. The minimum number of samples required to split an internal node (min samples split) is set to 2. The minimum number of training samples required in each of the left and right branches (min samples leaf) is set to 5. The value of max features, which controls the number of features considered when looking for the best split, is set to 'auto'.
XGBoost: The maximum depth of a tree is set to 6, and the minimum child weight is 1. To make the update step more conservative, the max delta step is set to 0.1. The L2 regularization term on weights is set to 1, which can reduce overfitting. The learning rate is set to 0.3. The fraction of columns randomly sampled for each tree is set to 1.

Table 5 shows the comparison of damage detection ability between the algorithms mentioned above and the proposed method. It can be seen from Table 5 that the proposed method has a higher accuracy, 93.38%, than the classical ML methods. In addition, FFT-SVM improves accuracy by 4.56% over SVM when FFT is added to SVM. This shows that a preprocessing method such as FFT can reduce the effect of faulty or noisy data and improve accuracy in structural damage detection.

To observe the classification result of every defined pattern in the test data for the different algorithms, the evaluation criteria Precision, Recall, and F1-score are shown in Figures 9-11. Recall represents the fraction of all positive examples in the dataset that are predicted as positive. It can be seen from Figure 9 that the Recall scores of the 10%, 20%, and 30% damaged conditions are approximately over 0.8 using the FFT-DCNN algorithm, with the exception of the 40% damaged condition, for which the score falls to approximately 0.5. Similar results are achieved for the FFT-DCNN algorithm when Precision is evaluated on the test data. For the 10%, 20%, 30%, and 40% damaged conditions, the Precision is approximately over 0.7, indicating a high recognition performance, which is shown in Figure 10. It can be seen from Figure 11 that the 40% damaged condition achieves a low score, whereas the other patterns (the 10%, 20%, and 30% damaged conditions) obtain a high score using the FFT-DCNN algorithm.
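The baseline settings listed earlier in this subsection can be collected in one place, as sketched below. The dictionaries use the parameter names under which scikit-learn and XGBoost expose these options; that mapping is an assumption, since the source does not name the implementation. The integer-exponent spacing of the SVM grid is also an assumption, because only the search ranges are given.

```python
# Assumed mapping of the reported settings to scikit-learn / XGBoost parameter names.
baseline_settings = {
    "SVM": {"kernel": "rbf", "tol": 1e-3},
    "KNN": {"n_neighbors": 5, "weights": "uniform", "p": 1, "leaf_size": 30},
    "RF": {
        "n_estimators": 100,
        "max_depth": 3,
        "min_samples_split": 2,
        "min_samples_leaf": 5,
        "max_features": "auto",  # value as reported in the text
    },
    "XGBoost": {
        "max_depth": 6,
        "min_child_weight": 1,
        "max_delta_step": 0.1,
        "reg_lambda": 1,
        "learning_rate": 0.3,
        "colsample_bytree": 1,
    },
}

# Candidate values for the SVM grid search over the reported ranges
# (one value per integer exponent is an assumed spacing).
c_grid = [10.0 ** e for e in range(-4, 5)]  # penalty parameter c
g_grid = [2.0 ** e for e in range(-4, 5)]   # kernel parameter g
svm_grid = [(c, g) for c in c_grid for g in g_grid]
```

With this spacing the grid search evaluates 81 combinations of c and g.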
Figure 11. F1-score of the test data using different algorithms.
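As a minimal sketch of how Precision, Recall, and F1-score relate to a confusion matrix, the following pure-Python helper computes the three criteria per class. The function names and the 3 × 3 matrix are hypothetical and purely illustrative; the counts are not those behind Figures 9-12.

```python
def per_class_metrics(confusion):
    """Return one (precision, recall, f1) tuple per class.

    confusion[i][j] counts samples whose true class is i and
    whose predicted class is j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


# Illustrative counts only (assumption); NOT the data of the paper.
example = [
    [90, 10, 0],
    [5, 80, 15],
    [0, 20, 80],
]

for label, (p, r, f1) in enumerate(per_class_metrics(example)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={overall_accuracy(example):.3f}")
```

A class can combine a high Precision with a low Recall when few samples are assigned to it but most of those assignments are correct, which is why the three criteria are reported separately.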
To reveal how the different algorithms classify the four damaged conditions of the three-story building structure, Figure 12 shows the confusion matrices of the proposed approach and the compared algorithms. More specifically, for the FFT-DCNN method, the accuracy for the 10% and 40% damaged conditions remains above 95%. For the 20% damaged condition, 37 samples are misclassified as the 30% damaged condition, and 11 samples are misclassified as the 10% damaged condition. The Precision remains above 86%; for the 30% damaged condition, it is 86.7%. Finally, the total accuracy on the test data for FFT-DCNN is 93.29%, which illustrates the strong ability of the proposed method to recognize structural damage conditions compared with the other methods: FFT-SVM (90.15%), SVM (85.59%), KNN (67.64%), RF (70.27%), and XGBoost (75.78%). In addition, for the FFT-SVM algorithm, the numbers of correctly classified samples in damaged conditions 0-3 are 1003, 239, 468, and 541, respectively (Figure 12b). The corresponding numbers in Figure 12c are 991, 310, 495, and 575, respectively. This indicates that accuracy can be improved when the acceleration data are transformed into frequency information. More importantly, when the structure suffers different degrees of damage, the frequency content of the three-story frame changes more obviously than the time-sequence data used by time-domain methods. The proposed method uses the frequency information of the three-story frame to recognize structural damage with high performance.
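To illustrate why frequency information can separate structural conditions, the sketch below computes a one-sided magnitude spectrum with a direct discrete Fourier transform, which is the same quantity an FFT computes, only more slowly. Everything in it is an assumption made for illustration: the 200 Hz sampling rate, the 200-point window, and the two synthetic sinusoids standing in for a healthy and a damaged response. It is not the preprocessing code of this study.

```python
import cmath
import math


def magnitude_spectrum(window, fs):
    """Return (frequencies, magnitudes) of the one-sided DFT of a real window."""
    n = len(window)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(window)
        )
        freqs.append(k * fs / n)      # centre frequency of bin k in Hz
        mags.append(abs(coeff) / n)   # magnitude normalised by window length
    return freqs, mags


def dominant_frequency(window, fs):
    """Frequency of the largest spectral peak, ignoring the DC bin."""
    freqs, mags = magnitude_spectrum(window, fs)
    peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return freqs[peak]


FS = 200.0  # assumed sampling rate in Hz
N = 200     # assumed window length in samples

# Synthetic stand-ins (assumption): a stiffness loss lowers the
# vibration frequency, modelled here as a shift from 20 Hz to 17 Hz.
healthy = [math.sin(2 * math.pi * 20.0 * t / FS) for t in range(N)]
damaged = [math.sin(2 * math.pi * 17.0 * t / FS) for t in range(N)]

print(dominant_frequency(healthy, FS), dominant_frequency(damaged, FS))
```

In the time domain the two windows look like similar oscillations, whereas in the frequency domain each collapses to a single peak at a different bin, which is a simpler pattern for a classifier to pick up.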
All algorithms are run on the same Windows server, configured with a GeForce RTX 3080 Ti GPU, 32 GB of RAM, and an AMD Ryzen 9 5950X 16-core processor. Figures 13 and 14 and Table 6 show that the proposed method takes more time than the ML algorithms on the training and testing datasets. This is mainly because FFT-DCNN repeatedly updates the network parameters over a large number of data samples. In addition, training finishes only when the network's loss curve begins to converge, which can take a large amount of time, as shown in Figure 13. As computing power increases, the time consumed by the algorithms will decrease in the future. Thus, the accuracy of structural damage detection, which can affect public safety, should receive primary attention.
Moreover, the test time is an important evaluation metric for judging whether an algorithm can be used in actual engineering. The proposed method takes a short time on the test datasets compared with FFT-SVM, SVM, KNN, and RF. Thus, the proposed method can be applied to recognizing structural damage in actual civil engineering.

Data Description and Experimental Setup
To verify the effectiveness of the proposed method in a complex environment, experimental phase II of the ASCE SHM benchmark is used in this study. The datasets and experimental process of the ASCE benchmark were published by the American Society of Civil Engineers (ASCE) SHM task group. The benchmark structure is 3.6 m high with a footprint of 2.5 m × 2.5 m, as shown in Figure 15. The experiment aimed to provide a unified test bed for evaluating the structural damage detection ability of different methods.

The experiment collected acceleration signals from 15 accelerometers under ambient excitation, impact hammer excitation, and 5-50 Hz randomly generated shaker excitation. Case 1 was measured for 120 s, Case 6 for 300 s, and the remaining cases for 360 s. The sampling frequency of all accelerometers was set to 200 Hz. As shown in Table 7 and Figure 16, the degree of damage increases gradually from undamaged in Case 1 to severely damaged in Case 9. In Cases 2-7, inclined supports were removed at specific locations to generate structural damage, and in Cases 8 and 9, bolts were loosened at joint locations. In total, 31,680 samples were acquired by slicing the signals with a window of 200 points. The dataset was then divided into training, verification, and testing datasets in the ratio 6:2:2, giving 19,008, 6336, and 6336 samples, respectively.
Table 7. Description of the structural cases for the benchmark [30].

| Structural Condition | Description |
|---|---|
| Case 1 | Undamaged condition |
| Case 2 | Inclined supports located on the first floor were removed |
| Case 3 | Inclined supports located on the first and fourth floors were removed |
| Case 4 | Inclined supports on all floors were removed in one bay |
| Case 5 | All inclined supports were removed on the west face |
| Case 6 | Case 5 + inclined supports located on the second floor were removed |
| Case 7 | All inclined supports on all faces were removed |
| Case 8 | Loosened bolts on the first and second floors of the beam + Case 7 |
| Case 9 | Loosened bolts on all floors of the beam on the west face + Case 7 |
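The windowing and the 6:2:2 split described before Table 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the windows are taken as consecutive and non-overlapping (the text does not say whether they overlap), and the example does not attempt to reproduce how the total of 31,680 samples arises from the individual records.

```python
def slice_windows(signal, window=200):
    """Cut a 1-D record into consecutive, non-overlapping windows.

    Trailing points that do not fill a complete window are dropped.
    """
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]


def split_counts(total, ratios=(6, 2, 2)):
    """Return (train, verification, test) sizes for an integer ratio split."""
    unit = sum(ratios)
    train = total * ratios[0] // unit
    verification = total * ratios[1] // unit
    test = total - train - verification  # any remainder goes to the test set
    return train, verification, test


# A 120 s record sampled at 200 Hz has 24,000 points per channel.
record = [0.0] * (120 * 200)
windows = slice_windows(record, window=200)
print(len(windows))          # number of 200-point windows in the record

# Split sizes for the total sample count reported in the text.
print(split_counts(31680))
```

With the reported total of 31,680 samples, the 6:2:2 ratio reproduces the stated sizes of 19,008, 6336, and 6336.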

FFT-DCNN Testing Result for ASCE Benchmark
The training history of Fold 4 is presented as an example. Figure 17 shows the curves for the training, testing, and validation datasets at each epoch. The validation and testing accuracies reach 80% after Epoch 250. This indicates that FFT-DCNN has a fast training process and convergence speed.

The four-fold cross-validation results on the ASCE benchmark dataset are shown in Table 8. Across the folds, the verification accuracy is 88.06% on average, and the testing accuracy is 87.30% on average. The experimental result shows that the FFT-DCNN model is suitable for structural damage detection.
Figure 18 shows the confusion matrix for the best result, Fold 4. Label C1 denotes the undamaged condition, and labels C2-C9 denote the damaged conditions. The overall identification accuracy over the 9 structural conditions is 88.10%, and the error rate is 11.90%. The accuracy for C2, C6, C7, and C8 is more than 90%, and the accuracy for C3, C4, and C9 is 88%, 84%, and 89%, respectively. Although C1 and C5 show more misidentifications, their accuracy is still over 70%. In addition, the number of samples, 31,680, is large enough. Thus, the accuracy of FFT-DCNN reaches 87.30% on average, which can satisfy the demand for structural damage detection in an actual environment.
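For readers unfamiliar with k-fold cross-validation, the following sketch shows one common way to generate the index sets for four folds. It is an assumption-laden illustration: the function name is hypothetical, the folds are contiguous and unshuffled, and the text does not specify how the folds were combined with the 6:2:2 split, so this should not be read as the exact procedure behind Table 8.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_idx, held_out_idx) pairs for k contiguous folds.

    Fold sizes differ by at most one sample when n_samples is not
    divisible by k.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, held_out
        start += size


# Small example: 10 samples, 4 folds.
for fold, (train_idx, held_out_idx) in enumerate(k_fold_indices(10, k=4), start=1):
    print(f"fold {fold}: held out {held_out_idx}, train on {len(train_idx)} samples")
```

Each sample is held out exactly once, so averaging the per-fold accuracies, as reported for Table 8, uses every sample for evaluation without ever evaluating on data seen during that fold's training.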

Comparative Analysis for Different Methods
The classic methods can extract sensitive features from acceleration data to recognize structural damage conditions. Table 9 compares the proposed method with the classic methods. The proposed method achieves higher performance than the existing methods, including FFT-SVM, SVM, KNN, RF, and XGBoost. The experimental result shows that the proposed method has a high identification ability in recognizing the damaged conditions of civil structures.

Conclusions and Future Work
This study proposed a novel FFT-DCNN model for structural damage detection, which was verified on a three-story building structure and the ASCE benchmark. The main contributions of this paper are summarized as follows.
(1) A novel data-driven method is proposed that combines FFT with DCNN, effectively handling vibration signals and accurately recognizing structural damage conditions.
(2) Compared with traditional damage detection methods such as FFT-SVM, SVM, KNN, Random Forest, and XGBoost, the FFT-DCNN model automatically extracts features from the structure under different damage conditions and achieves higher accuracy: 93.38% for the three-story building structure and 87.30% for the ASCE benchmark.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest for the present study.