A Generative Adversarial Network Based Autoencoder for Structural Health Monitoring †

Abstract: Civil structures, infrastructures and lifelines are constantly threatened by natural hazards and climate change. Structural Health Monitoring (SHM) has therefore become an active field of research, in view of online structural damage detection and long-term maintenance planning. In this work, we propose a new SHM approach leveraging a deep Generative Adversarial Network (GAN), trained on synthetic time histories representing the structural response of a multistory building to earthquake ground motion in both damaged and undamaged conditions. In the prediction phase, the GAN generates plausible signals for different damage states based only on undamaged recorded or simulated structural responses, thus without the need to rely upon real recordings linked to damaged conditions.


Introduction
Bridges, power generation systems, aircraft, buildings and rotating machinery are only a few instances of the structural and mechanical systems that play an essential role in modern society, even though the majority of them are approaching the end of their original design life [1]. Taking into account that their replacement would be unsustainable from an economic standpoint, alternative strategies for early damage detection have been actively developed, so as to extend the service life of these infrastructures. Furthermore, the advent of novel materials whose long-term behaviour is still not fully understood drives the effort towards effective Structural Health Monitoring (SHM), resulting in savings in terms of human lives and resources [1].
SHM consists of three fundamental steps: (i) measurement, at regular intervals, of the dynamic response of the system; (ii) selection of damage-sensitive features from the acquired data; (iii) statistical analysis of those features to assess the current health state of the structure. To characterize the damage state of a system, the hierarchical approach originally proposed by [2] represents the currently adopted standard. The latter prescribes several consecutive identification phases (to be tackled in order): assessing the existence of damage, its location, its type, its extent and, finally, the system's prognosis. Damaged states are identified by comparison with a reference condition, assumed to be undamaged. The detection of the damage location relies upon a wider awareness of the structural behaviour and of the way in which it is influenced by damage. This information, along with the knowledge of how the observed features are altered by different kinds of damage, makes it possible to determine the type of damage. The last two phases require an accurate characterization of the damage mechanisms in order to classify the damage severity and to estimate the Remaining Useful Life (RUL).
All the steps mentioned above rely on continuous data acquisition and processing to obtain information about the current health condition of a system. In the last few years, the concept of the Digital Twin has emerged, combining data assimilation, machine learning and physics-based numerical simulations [1], the latter being essential to fully understand the physics of the structure and of the damage mechanisms. Neural networks [3], especially generative models such as Generative Adversarial Networks (GANs) [4] and Variational Autoencoders (VAEs) [5], represent a suitable tool for extracting the dominant features of a data set.
In this paper, an application of the generative neural network RepGAN, proposed by [6], is presented in the context of SHM. Section 2 provides an overview of existing work. In Section 3, the application of RepGAN to Structural Health Monitoring is presented. In Section 4, extensive numerical results are illustrated, while Section 5 gathers some concluding remarks.

Related Work
Generative Adversarial Networks [4] are well known for their generative capability. Given a multidimensional random variable X ∈ (R^{d_X}, E_X, P_X), where (R^{d_X}, E_X, P_X) denotes the probability space with σ-algebra E_X and probability measure P_X, whose samples are collected in the data set S = {x^(i)}_{i=1}^{N}, with probability density function p_X(X), the GAN generator G attempts to produce synthetic samples x̂, sampled according to a probability density function p_G(X), as similar as possible to the original data; i.e., a GAN trains over data samples in order to match p_G with p_X. G maps a lower-dimensional manifold (R^{d_Z}, E_Z, P_Z) (with d_Z < d_X in general) into the physical space (R^{d_X}, E_X, P_X). In doing so, G learns to pass the critic test, undergoing the judgement of a discriminator D : R^{d_X} → [0, 1], simultaneously trained to recognize the counterfeits x̂^(i). The adversarial training scheme relies on the following two-player Minimax game:

min_G max_D V(D, G) = E_{x ∼ p_X}[ln D(x)] + E_{z ∼ p_Z}[ln(1 − D(G(z)))]    (1)

In practice, G is represented by a neural network G_θ and D by a neural network D_ω, with trainable weights and biases θ and ω, respectively. Moreover, V(D, G) is approximated by the Empirical Risk function L_S(ω, θ), depending on the data set S, defined as:

L_S(ω, θ) = (1/N) Σ_{i=1}^{N} [ln D_ω(x^(i)) + ln(1 − D_ω(G_θ(z^(i))))]    (2)

with z^(i) sampled from a known latent-space probability distribution p_Z (for instance, the normal distribution N(0, I)). The generator G_θ induces a sampling probability p_G(X; θ) so that, when optimized, it passes the critic test, with D being unable to distinguish between x^(i) and G_θ(z^(i)). In other words, x^(i) and G_θ(z^(i)) can be associated with the values of a categorical variable C, with two possible classes: "d" (data) and "g" (generated). x^(i) and G_θ(z^(i)) can therefore be sampled with the mixture probability density p_M = α χ(C = "d") + (1 − α) χ(C = "g"), with χ being the indicator function and α = P(C = "d") [7]. The optimum solution of the Minimax game in Equation (2) induces the mixture probability distribution (1/2)(p_{C="d"} + p_{C="g"}) [4]. The saddle point of V(D, G) corresponds to the minimum (with respect to D) of the conditional Shannon entropy S(C|X) (see Appendix A). Moreover, minimizing the conditional Shannon entropy S(C|X) corresponds to maximizing the Mutual Information I(X, C) = S(C) − S(C|X) (see Appendix B), i.e., to extracting samples x^(i) or x̂^(i) that are indistinguishable (belonging to the same class), with an uninformative mapping X → C.
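As a toy numerical illustration (ours, not part of the original paper), the Empirical Risk above can be evaluated for a discriminator that has been fooled completely, recovering the well-known optimum value −ln 4:

```python
import numpy as np

def empirical_risk(d_real, d_fake):
    """Empirical Risk L_S(omega, theta): batch mean of
    ln D(x_i) + ln(1 - D(G(z_i))), given discriminator outputs
    on real samples (d_real) and on generated samples (d_fake)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return float(np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

# When D cannot tell real from fake, D(.) = 1/2 everywhere and
# L_S = ln(1/2) + ln(1/2) = -ln 4, the optimum of the Minimax game.
v_confused = empirical_risk([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```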
GANs proved useful in various applications, such as generation of artificial data for data set augmentation, filling gaps in corrupted images and image processing. In particular, Deep Convolutional Generative Adversarial Networks (DCGANs) [8] proved useful in the field of unsupervised learning. SHM could benefit from GANs, as they improve the generalization performance of models, extracting general features from data as well as their semantics (damage state, frequency content, etc.). However, the adversarial training scheme in Appendix C does not grant a bijective pair of mappings G_{θ_Z} : Z → X (decoder) and F_{θ_X} : X → Z (encoder), which is crucial in order to obtain a unique representation of the data in the latent manifold. Autoencoders have been developed for image reconstruction, so as to learn the identity operator x̂^(i) = G_{θ_Z}(F_{θ_X}(x^(i))), with ẑ^(i) = F_{θ_X}(x^(i)) belonging to the latent manifold. In order to make the learning process of GANs stable across a range of data sets and to obtain higher-resolution and deeper generative models, Convolutional Neural Networks (CNNs) are employed to define F_{θ_X}, G_{θ_Z} and the discriminators. F_{θ_X} and G_{θ_Z} induce the sampling probability density functions q_{Ẑ|X} = q_{X,Ẑ}/p_X and p_{X̂|Z} = p_{X̂,Z}/p_Z, respectively. p_X is usually unknown (depending on the data set at stake), but p_Z can be chosen ad hoc (for instance, N(0, I)) in order to get a powerful generative tool for realistic data samples x̂^(i). A particular type of Autoencoder, called the Variational Autoencoder (VAE), was introduced by [5]; it consists of a probabilistic and generative version of the standard Autoencoder, where the encoder F_{θ_X} infers the mean μ_Z and variance σ²_Z of the latent manifold. A major contribution of VAEs is the so-called reparametrization trick, a straightforward approach that reorganizes the gradient computation and reduces the variance of the gradient estimates.
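The reparametrization trick can be sketched in a few lines of NumPy (an illustrative stand-in for the actual Keras implementation; the shapes and values below are hypothetical):

```python
import numpy as np

def reparametrize(mu, log_var, rng):
    """VAE reparametrization trick: z = mu + sigma * eps, eps ~ N(0, I).
    The stochasticity is isolated in eps, so gradients can flow through
    mu and log_var deterministically."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 8))        # encoder mean: batch of 4, d_Z = 8
log_var = np.zeros((4, 8))   # encoder log-variance (sigma = 1 here)
z = reparametrize(mu, log_var, rng)   # latent samples, shape (4, 8)
```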
Adversarial Autoencoders (AAEs) [9] employ the adversarial learning framework in Equation (1), with the discriminator acting on the latent codes ẑ^(i) rather than on the data samples, and add to the adversarial GAN loss the Mean Square Error ||x^(i) − x̂^(i)||² as an optimization penalty, in order to assure a good reconstruction of the original signal. However, AAEs do not assure a bijective mapping between (R^{d_X}, E_X, P_X) and (R^{d_Z}, E_Z, P_Z). In order to achieve the bijection (in a probabilistic sense) between (x, ẑ) and (x̂, z) samples, the distance between the joint probability distributions q_{X,Ẑ} = q_{Ẑ|X} p_X and p_{X̂,Z} = p_{X̂|Z} p_Z must be minimized [10]. A suitable distance operator for probability distributions is the so-called Jensen-Shannon distance D_JS(q_{X,Ẑ} || p_{X̂,Z}), defined as [10]:

D_JS(q_{X,Ẑ} || p_{X̂,Z}) = (1/2) D_KL(q_{X,Ẑ} || p_M) + (1/2) D_KL(p_{X̂,Z} || p_M)

with D_KL(p || q) = S(p, q) − S(p) being the Kullback-Leibler divergence (see Appendix B), S(p, q) the cross-entropy, and p_M = (q_{X,Ẑ} + p_{X̂,Z})/2 being the mixture probability distribution [7], i.e., the probability of extracting (X, Ẑ) or (X̂, Z) from a mixed data set, with α = P(C = "d") = 1/2 and entropy of the mixture S(M) = ln 2. The adversarial optimization problem expressed in Equation (1) can be seen as a minimization of the Jensen-Shannon distance for C ∈ {"d", "g"}: at the optimum discriminator, V(D, G) = −ln 4 + 2 D_JS(p_X || p_G) [4], so that minimizing over G minimizes D_JS. This adversarial criterion can be combined with the Autoencoder reconstruction penalty in order to match the joint distributions q_{X,Ẑ} and p_{X̂,Z} [10,11]. In this context, F_{θ_X} learns to map data into a disentangled latent space, generally following the normal distribution; however, a good reconstruction is not ensured unless the cross-entropy between X and Z is minimized too [12].
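The link between the adversarial game and the Jensen-Shannon distance can be checked numerically on two small discrete densities (an illustration of ours, with p and q standing in for p_X and p_G):

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence D_KL(p || q) = S(p, q) - S(p)
    return float(np.sum(p * np.log(p / q)))

def js(p, q):
    # Jensen-Shannon distance, with mixture p_M = (p + q) / 2
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# two discrete densities standing in for p_X and p_G
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

# optimal discriminator D* = p / (p + q); plugging it into V(D, G)
# recovers the classical identity V* = -ln 4 + 2 * D_JS(p || q)
d_star = p / (p + q)
v_star = float(np.sum(p * np.log(d_star)) + np.sum(q * np.log(1.0 - d_star)))
```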
Another crucial aspect of generative models is the semantics of the latent manifold. Most standard GAN models trained according to Equation (1) employ a simple factored continuous input latent vector Z and do not enforce any restriction on the way the generator treats it. The individual dimensions of Z do not correspond to semantic features of the data (uninformative latent manifolds), so Z cannot be effectively used to perform meaningful topological operations in the latent manifold (e.g., describing neighborhoods) or to associate meaningful labels with it. An information-theoretic extension of GANs, called InfoGAN [13], is able to learn meaningful and disentangled representations in a completely unsupervised manner: a Gaussian noise Z is associated with a latent code C that captures the characteristic features of the data distribution (for classification purposes). As a consequence, the generator becomes G_{θ_Z}(Z, C), inducing the probability distribution p_G, and the Mutual Information between the latent codes C and the generated samples, namely I(C, G_{θ_Z}(Z, C)), is forced to be high by penalizing the GAN loss in Equation (1) with the variational lower bound L_I(G, Q), defined by:

L_I(G, Q) = E_{c ∼ p_C, x ∼ G_{θ_Z}(z,c)}[ln q_{C|X}(c|x)] + S(C) ≤ I(C, G_{θ_Z}(Z, C))

with q_{C|X} being the probability distribution approximating the real, unknown posterior probability distribution p_{C|X} (and represented by the neural network Q). L_I(G, Q) can be easily approximated via Monte Carlo simulation, and maximized with respect to q_{C|X} and p_G via the reparametrization trick [13].
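The identity I(X, C) = S(C) − S(C|X), which underpins both the GAN critic argument and the InfoGAN penalty, can be verified on toy discrete joint distributions (our own illustration):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_information(p_joint):
    """I(X, C) = S(C) - S(C|X) for a discrete joint distribution
    p_joint[x, c] over the pair (X, C)."""
    p_x = p_joint.sum(axis=1)
    p_c = p_joint.sum(axis=0)
    s_c = entropy(p_c)
    # S(C|X) = sum_x p(x) * S(C | X = x)
    s_c_given_x = sum(p_x[i] * entropy(p_joint[i] / p_x[i])
                      for i in range(len(p_x)) if p_x[i] > 0)
    return s_c - s_c_given_x

# C fully determined by X: maximally informative mapping, I = S(C) = ln 2
p_det = np.array([[0.5, 0.0],
                  [0.0, 0.5]])

# C independent of X: uninformative mapping, I = 0
p_ind = np.array([[0.25, 0.25],
                  [0.25, 0.25]])
```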

Methods
With the purpose of learning a semantically meaningful and disentangled representation of the SHM time histories, we adopted in this study the architecture called RepGAN, originally proposed in [6]. RepGAN is based on an encoder-decoder structure (both represented by deep CNNs made of stacked 1D convolutional blocks), with a latent space Z = [C, S, N]. C ∈ [0, 1]^{d_C} is a categorical variable representing the damage class(es), with C ∼ p_C, generally chosen as a categorical distribution over d_C classes, i.e., p_C = Cat(d_C). S ∈ R^{d_S} is a continuous variable of dimension d_S, with S ∼ p_S, generally p_S = N(0, I) or the uniform distribution p_S = U(−1, 1). Finally, N ∈ R^{d_N} is a random noise of d_N independent components, with N ∼ p_N, generally p_N = N(0, I). RepGAN adopts the conceptual frameworks of VAEs and InfoGAN, combining the learning of two representations, x → ẑ → x̂ and z → x̂ → ẑ. The x → ẑ → x̂ scheme must learn to map each data instance x^(i) into its image ẑ^(i) = F_{θ_X}(x^(i)) in the latent manifold (via the encoder F_{θ_X}) and back into a distinct instance in data space (via the decoder G_{θ_Z}), providing satisfactory reconstruction. The z → x̂ → ẑ scheme maps latent instances into data space (via G_{θ_Z}) and back (via F_{θ_X}), in order to guarantee good generation and clustering performance. Combining the two surjective mappings, RepGAN performs the two learning tasks x → ẑ → x̂ and z → x̂ → ẑ together, with shared parameters, in order to obtain a bijective mapping x ↔ z. In practice, the training of z → x̂ → ẑ is iterated five times more often than that of x → ẑ → x̂. This ability to learn a bidirectional mapping between the input space and the latent space is achieved through a symmetric adversarial process. The Empirical Loss function (see [6] for its detailed expression) combines the adversarial terms, a reconstruction penalty and two latent-regression terms:

• −E_{p_C} E_{p_X|C} [ln q_{Ĉ|X}], minimizing the conditional entropy S(C|X);
• −E_{p_S} E_{p_X|S} [ln q_{Ŝ|X}], minimizing the conditional entropy S(S|X);

introduced in order to constrain a deterministic and injective encoding mapping (see Appendix B). On the other hand, the reconstruction term penalizes the learning scheme in order to minimize the conditional entropy S(X|(C, S, N)), i.e., to grant a good reconstruction. Following the original RepGAN formulation, E_{p_S} E_{p_X|S} [ln q_{Ŝ|X}] corresponds to the InfoGAN L_I penalty, and it is maximized via the reparametrization trick (structuring the S branch of the encoder-decoder as a VAE, see [5]).
Finally, E_{p_C} E_{p_X|C} [ln q_{Ĉ|X}] is maximized in a supervised way, considering the actual class of the labelled signals x^(i): x^(i)_d corresponding to a damaged structure and x^(i)_u to an undamaged one, respectively. RepGAN thus provides an informative and disentangled latent space associated with the damage class C. The most significant aspect of the approach is its efficiency in generating reasonable signals for different damage states on the sole basis of undamaged recorded or simulated structural responses. Both the generators F_{θ_X}, G_{θ_Z} and the discriminators D_{ω_X}, D_{ω_C}, D_{ω_S} and D_{ω_N} are parametrized via 1D CNNs (and strided 1D CNNs), following [8]. Our RepGAN model has been designed using the Keras API and trained on an Nvidia Tesla K40 GPU (on the supercomputer Ruche, the cluster of the Mésocentre Moulon of Paris-Saclay University).
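A minimal sketch (ours; the dimensions d_C, d_S and d_N are hypothetical, not taken from the paper) of how a RepGAN-style latent vector z = [C, S, N] can be sampled:

```python
import numpy as np

d_C, d_S, d_N = 2, 2, 64   # hypothetical dimensions for C, S and N

def sample_latent(batch, rng):
    """Draw z = [C, S, N]: one-hot damage class C ~ Cat(d_C),
    continuous code S ~ N(0, I), nuisance noise N ~ N(0, I)."""
    c = np.eye(d_C)[rng.integers(0, d_C, size=batch)]  # one-hot rows
    s = rng.standard_normal((batch, d_S))
    n = rng.standard_normal((batch, d_N))
    return np.concatenate([c, s, n], axis=1)

rng = np.random.default_rng(42)
z = sample_latent(8, rng)   # shape (8, d_C + d_S + d_N)
```

In the actual model, z is fed to the 1D-CNN decoder G_{θ_Z}, while the encoder F_{θ_X} is trained to recover the [C, S, N] split from data.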

Results and Discussion
In the following, a case study is considered in order to prove the ability of the new architecture to achieve the three fundamental tasks of semantic generation, clustering and reconstruction. The reference example is a shear building subject to earthquake ground motions whose signals are taken from the STEAD seismic database [14]. STEAD [14] is a high-quality, large-scale and global data set of local earthquake and non-earthquake signals recorded by seismic instruments. In this work, local earthquake waveforms (recorded within 350 km of the earthquakes) have been considered. Seismic data consist of three waveforms of 60 s duration, recorded in the east-west, north-south and vertical directions, respectively. The structure is composed of 39 storeys. The mass and the stiffness of each floor, in undamaged conditions, are m = 625 × 10^3 kg and k = 1 × 10^9 kN/m, respectively. Damage is simulated through a degradation of stiffness; in the present case, the stiffness reduction has been set equal to 50% of the above-mentioned value.
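The shear-building model can be assembled as follows (a sketch of ours, assuming a fixed base and the uniform storey properties listed above); note how the 50% stiffness reduction halves all squared natural frequencies:

```python
import numpy as np

def shear_building_matrices(n_floors, m, k):
    """Lumped-mass shear model: diagonal mass matrix and the classical
    tridiagonal stiffness matrix of a fixed-base multistory frame."""
    M = m * np.eye(n_floors)
    K = np.zeros((n_floors, n_floors))
    for i in range(n_floors):
        K[i, i] = 2.0 * k if i < n_floors - 1 else k  # top floor: one spring
        if i > 0:
            K[i, i - 1] = K[i - 1, i] = -k
    return M, K

m, k = 625e3, 1e12          # 625e3 kg and 1e9 kN/m = 1e12 N/m per storey
M, K = shear_building_matrices(39, m, k)              # undamaged
M_d, K_d = shear_building_matrices(39, m, 0.5 * k)    # 50% stiffness reduction

# squared circular frequencies from K v = w^2 M v (M = m * I here)
w2 = np.linalg.eigvalsh(K) / m
w2_d = np.linalg.eigvalsh(K_d) / m
```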
The structural response of the system is evaluated considering one degree of freedom (dof) per floor. To take damping effects into account, a Rayleigh damping model has been adopted.
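A standard way to calibrate the Rayleigh model C = a0 M + a1 K is to impose the same damping ratio at two reference circular frequencies (the values below are hypothetical, not taken from the paper):

```python
def rayleigh_coefficients(w_i, w_j, zeta):
    """Rayleigh damping C = a0*M + a1*K, with the same damping ratio
    zeta imposed at the two circular frequencies w_i and w_j [rad/s]."""
    a0 = zeta * 2.0 * w_i * w_j / (w_i + w_j)
    a1 = zeta * 2.0 / (w_i + w_j)
    return a0, a1

def modal_damping(w, a0, a1):
    # damping ratio of a mode with circular frequency w
    return 0.5 * (a0 / w + a1 * w)

# e.g. 5% damping anchored at two (hypothetical) modal frequencies
a0, a1 = rayleigh_coefficients(2.0, 12.0, 0.05)
```

Between the two anchor frequencies the modal damping dips slightly below zeta, and it grows for higher modes, which is the usual behaviour of Rayleigh damping.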
The following results have been obtained considering 100 signals in both undamaged and damaged conditions, for a total of 200 samples, with separate training and validation data sets. Each signal is composed of 2048 time steps with dt = 0.04 s. The training process has been performed over 2000 epochs. The reconstruction capability of the proposed network has been evaluated through the Goodness-of-Fit (GoF) criteria [15], which measure both the fit in envelope (EG) and the fit in phase (PG). GoF scores range between 0 and 10: the higher the score, the better the reconstruction. The criteria comprise the Frequency Envelope Goodness (FEG), Time-Frequency Envelope Goodness (EG), Time Envelope Goodness (TEG), Frequency Phase Goodness (FPG), Time-Frequency Phase Goodness (PG) and Time Phase Goodness (TPG). An example is shown in Figure 1: the values of 9.17 and 9.69 obtained for EG and PG, respectively, testify to the excellent reconstruction quality.
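As a rough, simplified stand-in for the time-frequency GoF criteria of [15] (which rely on a full time-frequency decomposition), an envelope-based score on the 0-10 scale GoF = 10·exp(−misfit) can be sketched as follows; this is our own illustration, not the actual criterion:

```python
import numpy as np

def envelope(x):
    """Signal envelope via the analytic signal (discrete Hilbert
    transform, implemented directly with the FFT)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.abs(np.fft.ifft(X * h))

def envelope_goodness(x, y):
    """Simplified envelope Goodness-of-Fit on a 0-10 scale,
    GoF = 10 * exp(-misfit): 10 means a perfect envelope match."""
    ex, ey = envelope(x), envelope(y)
    misfit = np.linalg.norm(ey - ex) / np.linalg.norm(ex)
    return 10.0 * np.exp(-misfit)

# a decaying sine standing in for a recorded/reconstructed response
t = np.linspace(0.0, 2048 * 0.04, 2048, endpoint=False)
x = np.sin(2 * np.pi * 1.5 * t) * np.exp(-0.05 * t)
```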
The capability of reproducing signals for different damage scenarios can be appreciated from Figure 2, which presents the original structural response (black) and the corresponding generated one (orange) in both undamaged (left panel) and damaged (right panel) conditions. Regarding the classification capability, the classification report and the confusion matrix in Figure 3 show that the model is able to correctly assign the damage class to the considered time histories.

Conclusions
In this paper, we introduce an SHM method based on a deep Generative Adversarial Network. Trained on synthetic time histories that represent the structural response of a multistory building in both damaged and undamaged conditions, the new model achieves high classification accuracy (Figure 3) and satisfactory reconstruction quality (Figures 1 and 2), resulting in a good bidirectional mapping between the input space and the latent space. However, the major innovation of the proposed method is the ability to generate reasonable signals for different damage states based only on undamaged recorded or simulated structural responses. As a consequence, real recordings linked to damaged conditions are not required. In future work, we would like to extend our approach to real-time data, and we will consider a dataset comprising a far larger number of time histories.


Figure 1. Time-Frequency Goodness-of-Fit criterion: the black line represents the original time history x^(i), while the red time history depicts the result of the RepGAN reconstruction.

Figure 2. Examples of reconstructed signals for undamaged (left) and damaged (right) time histories. The black lines represent the original time histories x^(i)_u and x^(i)_d, respectively; the orange time histories represent the RepGAN reconstructions G_{θ_Z} ∘ F_{θ_X}(x^(i)_u) and G_{θ_Z} ∘ F_{θ_X}(x^(i)_d), respectively. The proposed examples represent the normalized displacement of the 1st floor of the building under study.

Figure 3. Evaluation of the classification ability of the model. On the left panel, precision, recall, F1-score and accuracy values are reported. A precision of 1.0 for a class C means that every item labelled as belonging to class C does indeed belong to class C, whereas a recall of 1.0 means that every item from class C was labelled as belonging to class C. The F1-score is the harmonic mean of precision and recall; accuracy represents the proportion of correct predictions among the total number of cases examined. On the right panel, the confusion matrix visualizes the performance of the model: each row of the matrix represents the instances in the actual class, while each column depicts the instances in the predicted class.