Learning Damage Representations with Sequence-to-Sequence Models

Natural hazards have caused damages to structures and economic losses worldwide. Post-hazard responses require accurate and fast damage detection and assessment. In many studies, the development of data-driven damage detection within the research community of structural health monitoring has emerged due to the advances in deep learning models. Most data-driven models for damage detection focus on classifying different damage states and hence damage states cannot be effectively quantified. To address such a deficiency in data-driven damage detection, we propose a sequence-to-sequence (Seq2Seq) model to quantify a probability of damage. The model was trained to learn damage representations with only undamaged signals and then quantify the probability of damage by feeding damaged signals into models. We tested the validity of our proposed Seq2Seq model with a signal dataset which was collected from a two-story timber building subjected to shake table tests. Our results show that our Seq2Seq model has a strong capability of distinguishing damage representations and quantifying the probability of damage in terms of highlighting the regions of interest.


Introduction
Natural hazards including hurricanes and earthquakes have caused damages to structures and incurred great economic costs in many countries. Post-hazard responses are critical to save lives and mitigate economic losses, requiring accurate and efficient damage assessment. The traditional approach to assessing post-hazard damage is on-site investigations by employing expert inspectors to detect damages. Because of the accessibility to specific locations, such as underneath a bridge deck, is often low, on-site investigations have unavoidable disadvantages in terms of emergency response and post-hazard recovery efforts. Additionally, manual visual inspection is subjective and laborious. Real-time inspection using sensor data to address these drawbacks of on-site investigations have led to the use of emerging technologies within the research community of structural health monitoring (SHM) [1].
The core of real-time inspection technology is dependent on the sensor data. Advancements in sensor technologies make rapidly acquiring rich data possible. Deep learning (DL) models have become a new paradigm in data-driven SHM [2,3]. The key advantage of DL models is that features related to damage patterns can be automatically extracted when detecting damage based on sensor data. The driving forces behind the revolutionary progress of DL-based damage detection can be attributed to the following factors: (1) rich sensor data and powerful computational resources allow large-scale training based on DL models; and (2) the superiority of DL algorithms in extracting features enables data-driven models to outperform conventional inspection approaches in terms of accuracy and efficiency.
Vision-based and vibration-based models are two main data-driven models using DL for detecting damage. Vision-based models implement computer vision technologies to detect damage in structures. Kong and Li [4] proposed a vision-based approach for detecting fatigue cracks. Results show that their method can stably recognize the fatigue cracks irrespectively of ambient lighting conditions. Cha et al. [5] proposes a vision-based method using a convolutional neural network (CNN) for identifying concrete cracks. Their CNN models were trained on a dataset of 40,000 images with 256 × 256 pixel resolutions, achieving approximately 98% accuracy. Cha et al. [6] trained a region-based DL model to recognize four-class damage types, including bolt corrosion, steel corrosion, steel delamination, and concrete crack given a dataset including 2366 images. Gao et al. [7] used transfer learning to avoid overfitting when training Visual Geometry Group (VGGNet) on a relatively small dataset of 2000 images. Their damage detection tasks include component recognition, spalling condition determination, damage level estimate, and damage type classification. Additionally, unmanned aerial vehicles with high-resolution cameras have been deployed with well-trained DL models, achieving the goal of broadening the inspection scope and improving the accessibility [8,9]. These advancements in vision-based methods have successfully addressed the weakness of conventional on-site investigations. However, the main limitation of vision-based models concerns the invisible and internal damage of structures not being recognized. Furthermore, vision-based models aim to identifying different damage classes with high accuracy instead of quantifying corresponding damage states.
Vibration-based models provide promising solutions to the quantification of damage using sensor data such as acceleration responses. These models rely on changes in vibration characteristics due to damage and recognize features that are related to damage from vibration data. Although DL models have been extensively employed to detect damage, limitations are summarized as follows: (1) the probability of damage cannot be quantified according to most proposed methods; (2) most proposed methods rely on a large amount of sensor data, requiring a high computational capacity; (3) analyses regarding damage representations to understand its effects on the quantification of damage are small. Therefore, we propose a Seq2Seq model to quantify the probability of damage given signals from a unknown damage state to address challenges in current data-driven models. We trained the model to learn damage representations with only undamaged signals and then quantified the probability of damage by feeding damaged signals into models. The main contribution of this paper are two-fold: (1) our model only requires the undamaged signals with simple signal processing, improving the computational efficiency; (2) our model is the first to perform quantified damage detection on a 2-story timber building under earthquake excitations.
The rest of paper is organized as follows: Section 2 reviews current progresses of data-driven damage detection in SHM; Section 3 describes the architecture of our proposed Seq2Seq model; Section 4 presents the experiments to verify our Seq2Seq model, including the project overview, training details, and results; Section 5 provides a comprehensive analysis to better understand the model in terms of learning curve, damage representation, and the probability of damage; and Section 6 summarizes the highlights of our Seq2Seq model and conclusions from experiments.

Related Work
In many studies, DL models have become most promising technology in SHM. Various DL models are extensively used to identify or locate damage. Rafiei and Adeli [10] use a deep restricted Boltzman machine (DRBM) to determine the damage states of a building, with DRBM automatically extracting features from acceleration signals and being useful in identifying both global and local damages. Cha and Wang [11] collected acceleration responses from a steel bridge in the laboratory. First, the continuous wavelet transform and fast Fourier transform were performed to transfer the time series data to the frequency domain. They build an AutoEncoder with a CNN architecture as a feature extractor. Then, a one-class support vector machine (OC-SVM) is trained with extracted features to classify levels of damage. Wang and Cha [12] also proposed an end-to-end damage detection workflow. They designed an AutoEncoder to reconstruct acceleration signals and three damage-sensitive features were then computed according to reconstruction losses. Similarly, an OC-SVM was trained with these damage-sensitive features to predict levels of damage, with an accuracy up to 97.4%. Avci et al. [13] used output-only response data to identify the damage, achieving an accuracy of 99.46% when classifying three-class levels of damage. Abdeljaber et al.'s model [14] can predict binary damage states but only obtain the system-level damage information. Li and Sun [15] trained a CNN model to recognize different levels of damage in a bridge, accurately predicting different severity of damage. Ni et al. [16] built an AutoEncoder with a CNN architecture as a feature extractor by feeding time series signals. Their model was validated using the acceleration responses of a long-span bridge under ambient excitations, detecting damage with a high accuracy. Li et al. [17] proposed a probabilistic structural damage detection algorithm based on Sparse Bayesian Learning and model reduction to infer the stiffness degradation from vibration responses. The algorithm is verified with the vibration responses of two beam structures and a long-span cable-stayed bridge and can reliably detect and quantify various damage scenarios. Yang et al. [18] proposed a novel damage recognition network designed as an encoder-decoder-encoder combination for detecting damage in a building. They trained the model using the Fourier spectra of acceleration signals in the undamaged state to recognize the pattern that is related to damage. Then, the Fourier spectra of the acceleration signals from a unknown damage state are fed into the damage recognition network to quantify the level of damage accordingly. Sony et al. [19] used a recurrent neural network to identify and locate damage based on vibration responses of the building. Their proposed model was verified on two benchmark datasets for binary and multi-class damage classification, respectively. Figure 1 illustrates the workflow of quantifying the probability of damage using Seq2Seq models, including signal processing and damage representation learning. In the first step, the signal processing was required to denoise the original signals with wavelet transform. Then, Seq2Seq models were trained using signals in the undamaged state to learn damage representations and quantify the probability of damage accordingly:

Signal Processing
The signal processing consists of two main steps, namely segmentation and denoising. First, we divided each complete time series signal of t seconds that was sampled under a frequency of f s Hz into n segments, with the length of each segment being t × f s /n. A proper n was determined such that each signal segment included sufficient features for learning representations given a damage state. Then, wavelet transform-based signal denoising was performed to eliminate the effect of noises in signals on the damage representation learning.
The process of signal denoising using wavelet transform can be summarized as follows: (1) perform the wavelet transform on each original segments to compute corresponding wavelet coefficients; (2) clean noises by carefully setting a limit to conserve large wavelet coefficients; (3) recover the denoised signals by performing the inverse wavelet transform according to preserved wavelet coefficients from step 2. Denoised signal segments were finally normalized to a range of ±1 for Seq2Seq models.

Damage Representation Learning
A Seq2Seq model is a neural network that computes a conditional probability of p(y|x) of mapping a source sequence, x = {x 1 , . . . , x n }, to a target sequence, y = {y 1 , . . . , y n } [20]. The Seq2Seq models were successfully applied for machine translations [20], exhibiting their strong capability of extracting representations of the sequence data. As illustrated in Figure 2, the basic architecture of a Seq2Seq model was comprised of two sub-networks: (a) an encoder that extracts damage representations h for each signal segments; and (b) a decoder that reconstructs a signal value at each time step and hence decomposes a conditional probability as Here, <sos> marks the start of a signal; and <eos> marks the end of a signal.
A direct option to build a Seq2Seq model is to use a recurrent neural network (RNN) architecture. Alternatively, to prevent the gradient vanishing during the training of long sequences, the long short-term memory (LSTM) unit and gated recurrent unit (GRU) can be used in a Seq2Seq model.
In more detail, one can parameterize the conditional probability of encoding each segmented signal x as where f determines the probability distribution of the damage representation given the signal, often referred to as the posterior probability, which can either be a vanilla RNN, an LSTM, or a GRU. h 0 is the hidden state at the initial time step, which is often set as an all-zero vector. θ p is the parameter set of the encoder. Then, one can parameterize the condition of decoding each segmented signal with a damage representation h from an encoder as with W being the weight matrix of the fully connected layer that outputs a signal-sized vector. · 2 2 denotes the 2nd power of the L 2 norm. Here, h * j is the RNN hidden unit in the decoder, which can be computed as where g recursively outputs the current hidden state according to the previous hidden state.
The initial hidden state of the decoder h * 0 is the damage representation h that is extracted from the encoder. Likewise, g can be either a vanilla RNN, an LSTM, or a GRU. θ q is the parameter set of the decoder.
In this work, our training objective is to reconstruct the signal with a Seq2Seq model and hence the loss function is formulated as follows: with x and y being the original and reconstructed signals, respectively. The procedure of learning damage representations with a Seq2Seq model is given in Algorithm 1. for j ≤ L do 6: Extract hidden states: h * j ← g(x j−1 , h * j−1 ; θ q ) 7: Reconstruct signal: y j ← W h * j 8: end for 9: Update θ p , θ q by optimizing Eq.(5) 10: end for 11: return θ p , θ q

Probability of Damage
We trained a Seq2Seq model with only undamaged signals to learn their representations and hence the model's reconstruction results are supposed to deviate when inputting damaged signals. The probability of damage according to the extent of deviation is defined as suggested in [18]: with e being the reconstruction loss, which can be computed as follows: where θ p and θ q are the parameter sets of a Seq2Seq model that is trained by undamaged signals, whilex is the damaged signal.

Experiment
In this section, we test the feasibility and effectiveness of using a proposed Seq2Seq model to learn damage representations with a two-story timber building subjected to shake table tests [21]. To distinguish the capacity of our model in terms of learning damage representations, we used a vanilla AutoEncoder, a stacking architecture of multilayer perceptron (MLP), as a baseline model.

Project Overview
The test building has a symmetric plan of 6.10 m by 17.68 m. The first floor height is 3.66 m from the base and the roof height is 3.05 m from the first floor. The total building elevation is 6.71 m [21].
The test building was subjected to real ground motions that represent four increasing hazard levels for the San Francisco site including service-level earthquake (SLE), designbasis earthquake (DBE), maximum considered earthquake (MCE), and 1.2×MCE. To understand the effects of the damage inflicted to the test building as excitation intensity increases, white noise tests were conducted before and after each ground motion test. To quantitatively reflect the severity of damage after each hazard level, we used white noise test WN1 (prior to excitation) to learn damage representations and four white noise tests WN2, WN11, WN17, and WN21 to predict the probability of damage for different hazard levels. The white noise excitation consists of a root mean squared acceleration of 0.03 g. The white noise test data are available in DesignSafe-CI [22].
Microelectromechanical system (MEMS) accelerometers that are inertial sensors with a low frequency response range but offer lower noise with an acceleration amplitude range of ±5 g were used to measure the dynamic response of the test building [23]. The results reported in [21] show that the amplitude of vibrations induced by the white noise tests is 0.2 g, which does not exceed the measurement range of MEMS accelerometers. The bi-axial accelerometers were placed at seven locations on the first floor and roof, as illustrated in Figure 3.

Training Details
The overall statistical information of datasets is summarized in Table 1. Signals are sampled under a frequency of 240 Hz. The durations of white noise tests WN1, WN2, WN11, WN17, and WN21 are 295.5, 267.0, 176.0, 160.5, and 166.0 s, with the length of signals being 70,902, 64,080, 42,240, 38,520, and 39,840, respectively. White noises are generated by superimposing multi-frequency sinusoidal waves and random noises. The multi-frequency sinusoidal waves are periodical, and the random noises represent the ambient environment excitations. The first white noise test is the longest so that it is beneficial for our model to learn damage representation because more noise can improve model's robustness and generalization. We divided a signal into 500 segments, as suggested in [10] and pad the last segment with extra zeroes if its length is less than 500 to ensure that each signal segment has the same length. Figure 4 illustrates the white noise data recorded by sensor 1-101 in WN1 and WN21.   Given a segmented signal dataset D = [X 1 , ..., X s ] T ∈ R s×d×n×l , with s being the number of accelerometers (s = 14 in this study), m being the number of measurement directions of an accelerometers (d = 2 in this study), s being the number of signal segments (n = 500 in this study), and l being the length of a signal segment. X i,j,k,: ∈ R l represents a signal of the k th segment in the j th measurement direction of the i th sensor. For the baseline model, we reshaped D to D b ∈ R s×n×dl by connecting signals in two directions end to end. Table 2 summarizes the hyper-parameters of training a Seq2Seq model and a baseline model. Our Seq2Seq model has one layer in both encoder and decoder, each with 500 cells, and 2-dimensional embeddings. We used the following settings in training a Seq2Seq model and a baseline model: (a) weights are uniformly initialized in [−1, 1]; (b) the hidden size (dimensionality of damage representations) is 128; (c) we trained models using SGD with a momentum coefficient of 0.9; (d) a fixed learning rate of 0.1 was employed; (e) our batch size was 256 and (f) the numbers of epochs were 1000 and 10,000 for a Seq2Seq model and a baseline model, respectively. In addition, dropout with a of probability 0.5 was used for a Seq2Seq model and the normalized gradient was rescaled when its norm exceeded 5. Our code was implemented in PyTorch and is available at the repository: https: //github.com/qryang/Damage-representation (accessed on 1 December 2021). When running on an Amazon Web Service g4dn.xlarge instance, we achieved speed of reconstructing 6000 signals per second. It normally takes 5-6 min to complete training a model.

Discussion
In this section, we conducted a comprehensive analysis to better understand our Seq2Seq models with respect to learning curves, damage representation, and the probability of damage. Figure 6 illustrates the learning curves of Seq2Seq models and a baseline model, with losses being transferred to a logarithmic scale. For Seq2Seq models, the loss of a vanilla RNN architecture drops quickly at the early stage and smoothly decreases throughout the training process. The LSTM and GRU architectures have a similar convergence speed, converging after 300 epochs. The GRU architecture achieves the lowest loss among all architectures. The baseline model converges after 10,000 epochs but with higher loss compared to Seq2Seq models. The Seq2Seq model converges faster to lower losses than the baseline model but the trade-off is that its time consumption of an epoch is 10 times that of the baseline model.

Damage Representation
The reliability of quantifying the probability of damage by models is heavily dependent on the damage representation. Learning the damage representations of signals is essential to extract useful information when building a decoder to reconstruct signals. In the case of our Seq2Seq model, a probabilistic model, a good damage representation is often one that captures the posterior distribution of the underlying explanatory factors for original signals. Good damage representation is also effectively condensed information extracted by an encoder [24].
We used a 128-dimensional hidden vector extracted by an encoder as the damage representation for both Seq2Seq models and the baseline model. To visualize the distribution of damage representations, we employed the linear discriminant analysis to obtain 2-dimensional damage representations. We chose the GRU architecture as representative of Seq2Seq models as it achieves the lowest loss during the training. Figure 7 illustrates dimensionally reduced damage representations extracted by a Seq2Seq model with the GRU architecture and a baseline model. The distributions of damage representations in the damaged states (WN2, WN11, WN17, and WN21) are distinct from those in an undamaged state (WN1). However, the distributions of damage representations extracted by Seq2Seq models and the baseline model exhibits different patterns. For a Seq2Seq model with the GRU architecture, damage representations for a heavier damage states clearly scatter with a larger distance to the undamaged state. In contrast, the distribution of damage representations extracted by the baseline model is overlapped without effectively clarifying the different damage states. Damage representations from different damage states contain duplicate information. The better capability of reconstructing signals, especially the high frequency components of our Seq2Seq models, ensures that damage representations contain sufficient information to distinguish different damage states. Therefore, we conclude that a Seq2Seq model behaves better in terms of the discriminability of damage representations than the baseline model.

Probability of Damage
We trained Seq2Seq models to learn damage representations with only undamaged signals and then quantify the probability of damage by feeding damaged signals into models, indicating the severity of damage. Practitioners can reasonably judge the conditions of structures and trace the progress of damage with the assistant of probabilities of damage.
In this section, we also used a Seq2Seq model with the GRU architecture to quantify the probability of damage as comparisons with the baseline model. Table 3 summarizes the results of the probabilities of damage by using our Seq2Seq model and the baseline model. The probabilities of damage computed by our Seq2Seq model show an ascending trend overall, indicating the effectiveness of its capability to quantify damage states. The probabilities of damage computed by our Seq2Seq model and the baseline model are not the exact same. This is due to their different capabilities of learning damage representations.   Table 4 summarizes the actual damage levels in terms of the degradation of stiffness, as suggested in [12]. Given that the natural stiffness is proportional to the frequency squared if the test building is considered as a single-degree-of-freedom system for simplicity, we used the results of natural frequencies that are obtained by the modal analysis in [23] to estimate the degradation of stiffness. The stiffness of test building degrades by 25.9%, 32.5%, 44.2%, 46.2%, and 56.8% compared with the undamaged state for four increasing hazard levels. Conceptually, the percentage of degradation of stiffness corresponds to the probability of damage. The core difference is that the percentage of degradation of stiffness is a global index as the representative of damage state, the probability of damage is a local index which can highlight the severity of damage at different regions.

Conclusions
In this paper, we proposed a Seq2Seq model to quantify the probability of damage given white noise signals. We trained a model with undamaged signals to learn damage representations and then fed damaged signals into a model. Damage representations can recognize different damage states and determine the probability of damage according to the deviation of reconstructed signals.
We verified the effectiveness of using the proposed Seq2Seq model to learn damage representations with a 2-story timber building subjected to shake table tests. To distinguish the capacity of our model in terms of learning damage representations, we used a vanilla AutoEncoder as a baseline model. Results show that our Seq2Seq model can reconstruct signals with a low loss while the baseline model can only reconstruct low frequency components but lose a majority of high frequency components. Compared with the baseline model, our Seq2Seq model has a stronger capability to distinguishing damage representations and quantifying the probability of damage in terms of highlighting the regions of interest.
Our Seq2Seq model is a general model for learning damage representations with the time series data. In future work, one could extend our Seq2Seq model to other areas such as anomaly detection.