Vibration-Based Structural Health Monitoring Using Piezoelectric Transducers and Parametric t-SNE

In this paper, we evaluate the performance of parametric t-distributed stochastic neighbor embedding (P-t-SNE), comparing it to that of t-SNE, its non-parametric counterpart. The methodology used in this study is introduced for the detection and classification of structural changes in the field of structural health monitoring. The method is based on the combination of principal component analysis (PCA) and P-t-SNE, and it is applied to an experimental case study of an aluminum plate with four piezoelectric transducers. The basic steps of the detection and classification process are: (i) the raw data are scaled using mean-centered group scaling, and then PCA is applied to reduce their dimensionality; (ii) P-t-SNE is applied to represent the scaled and reduced data as 2-dimensional points, defining a cluster for each structural state; and (iii) the current structure to be diagnosed is associated with a cluster employing two strategies: (a) majority voting and (b) the sum of the inverse distances. The results in the frequency domain demonstrate the strong performance of P-t-SNE, which is comparable to that of t-SNE but outperforms t-SNE in terms of computational cost and runtime. When the method is based on P-t-SNE, the overall accuracy fluctuates between 99.5% and 99.75%.


Introduction
In structural health monitoring (SHM), an important process for engineering structures, many methods have been applied for damage detection. In [1], the use of the Treed Gaussian Process model, a class of powerful switching response surface models, is illustrated in the context of the SHM of bridges. In [2], an SHM methodology based on system-identification techniques is proposed to quantify structural degradation in laminated composite booms used in satellite applications. In [3], SHM is cast in the context of statistical pattern recognition, and damage or structural changes are detected using two techniques based on time series analysis. In [4], three optimization-algorithm-based support vector machines for damage detection in SHM are presented, which are expected to help engineers process high-dimensional data.
Real-world datasets usually have high dimensionality, and their dimensionality may need to be reduced to facilitate data processing. Dimensionality reduction is the process of reducing the number of high-dimensional variables by obtaining a low-dimensional set of variables. This reduced representation must correspond to the intrinsic information of the data. Dimensionality reduction is very important, because it alleviates undesired properties of high-dimensional spaces, such as "the curse of dimensionality" [5]. In the literature, various dimensionality reduction methods have been proposed. In this work, the detection and classification methodology proceeds in six steps:
1. Data preprocessing: Data integration. According to [20], there are six different ways to arrange the raw data collected by multiple sensors that collected measurements for several seconds in different experiments. The type of integration of the raw data affects the posterior analysis. In this work, we have considered type-E unfolding.
2. Data preprocessing: Data normalization. The second step, before data transformation, is data normalization. We perform mean-centered group scaling (MCGS), as detailed in [21].
3. Data preprocessing: Data transformation. We build the PCA model (P) so that the normalized data X̃ are transformed into the projected data T = X̃P. Notably, matrices X̃ and T have equal dimensions; hence, no reduction is performed at this stage.
4. Data preprocessing: Data reduction (phase I). We use PCA for the data reduction. The number of principal components is chosen so that the proportion of the variance explained is greater than or equal to 95%.
5. Data preprocessing: Data reduction (phase II). We then propose P-t-SNE, avoiding the limitations of t-SNE, as a second phase to reduce the dimensionality from the number of retained principal components to 2. This is one of the first approaches to use P-t-SNE in the field of SHM.
6. Data postprocessing: Classification. For the classification, we propose majority voting and the sum of the inverse distances, where each actuation phase casts a vote.
As a summary, the contribution of the present work is, precisely, the combination of existing preprocessing methods with a very promising approach: P-t-SNE.
The remainder of this paper is structured as follows. In Section 2, we present the P-t-SNE method. Section 3 describes the preprocessing of the baseline data, reduction of the global dimension of the data, and creation of the clusters using P-t-SNE. Section 4 describes the damage diagnosis procedure.
Subsequently, the application of the proposed method is presented in Section 5. Section 6 shows the results. Finally, Section 7 provides our conclusions.

Parametric t-SNE (P-t-SNE)
The non-parametric version of t-SNE has a huge optimization cost: to map new data, the optimization has to be run again over the complete dataset. To avoid this heavy optimization, P-t-SNE was proposed. P-t-SNE is an unsupervised dimensionality reduction technique that learns a parametric mapping between the high-dimensional data space and the low-dimensional latent space, preserving the local structure of the data in the latent space as well as possible.
In the P-t-SNE method, the mapping f : X → Y from the high-dimensional space X to the low-dimensional space Y is parameterized through a feed-forward NN with weights W. The NN is trained in such a way that it retains the local structure of the data in the low-dimensional space. There are two main stages in the training procedure:
1. Pretraining with a restricted Boltzmann machine (RBM). An RBM is used to construct a pretrained P-t-SNE network. The main aim of the pretraining stage is to define an initialization of the model parameters for the next stage.
2. Fine-tuning using the cost function of P-t-SNE. In this stage, the weights of the pretrained NN are fine-tuned in such a way that the NN preserves the local structure of the data in the low-dimensional space.
In the feed-forward NN, term q_ij [12,19] of t-SNE is adapted as follows:

q_ij = (1 + ||f(x_i | W) − f(x_j | W)||² / α)^(−(α+1)/2) / Σ_{k≠l} (1 + ||f(x_k | W) − f(x_l | W)||² / α)^(−(α+1)/2),

where α denotes the degrees of freedom of the Student's t-distribution.
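The adapted term q_ij can be illustrated numerically. The sketch below is a hypothetical helper (the name `latent_similarities` and the NumPy implementation are our assumptions): it takes the latent points produced by the network and returns the normalized Student's t-similarities.

```python
import numpy as np

def latent_similarities(Y, alpha=1.0):
    """Pairwise q_ij from low-dimensional points Y (n x d) using a
    Student's t-kernel with `alpha` degrees of freedom, as in P-t-SNE.
    Y would be the network output f(X | W); this is only a sketch."""
    # squared Euclidean distances between all pairs of latent points
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    num = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    np.fill_diagonal(num, 0.0)   # q_ii is defined as 0
    return num / num.sum()       # normalize over all pairs k != l
```

Closer latent points receive larger similarities, which is what the fine-tuning stage tries to match against the high-dimensional neighborhoods.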

Restricted Boltzmann Machine
In this section, a short introduction to RBM [22,23] is given. An RBM is a two-layer stochastic NN. This network consists of a visible, or input, layer (visible nodes v) and a hidden layer (hidden nodes h). The values of the nodes are normally Bernoulli-distributed. Each visible node is connected to all hidden nodes using weighted connections, but there are no intra-layer connections. The structure of the RBM is illustrated in Figure 1. The Boltzmann distribution is specified by the energy function

E(v, h) = − Σ_i Σ_j W_ij v_i h_j − Σ_i b_i v_i − Σ_j a_j h_j,

and this distribution gives the joint distribution over all nodes, P(v, h):

P(v, h) = exp(−E(v, h)) / Σ_{v,h} exp(−E(v, h)),

where W_ij is the weight of the connection between a visible node v_i and a hidden node h_j; and b_i and a_j are the biases of the visible and hidden nodes, respectively. Moreover, the conditional probabilities P(v_i = 1 | h) and P(h_j = 1 | v) are given by the sigmoid function σ(x) = 1 / (1 + exp(−x)):

P(v_i = 1 | h) = σ(b_i + Σ_j W_ij h_j),   (1)
P(h_j = 1 | v) = σ(a_j + Σ_i W_ij v_i).   (2)

The RBM can calculate the values of the visible nodes from the values of the hidden nodes by Equation (1); similarly, the RBM can calculate the values of the hidden nodes from the visible nodes by Equation (2). The model parameters W, b, and a are learned so that the marginal distribution over the visible nodes under the model, P_model(v), is close to the true distribution of the data, P_data(v). In particular, the RBM uses the Kullback-Leibler (KL) divergence to measure the distance between the true distribution P_data(v) and the distribution based on the model, P_model(v). The gradient of the KL divergence with respect to W_ij is given by

∂KL(P_data || P_model) / ∂W_ij = E[v_i h_j]_{P_model} − E[v_i h_j]_{P_data}.

The expectation under the model is intractable; to avoid computing it, we follow an approximation: the gradient of a slightly different objective function called contrastive divergence (CD) [24]. The CD measures how far the model distribution moves away from the true distribution of the data through KL(P_data || P_model) − KL(P_1 || P_model), where P_1(v) is the distribution over the visible nodes obtained when the RBM, initialized according to the true distribution, is run for one iteration (that is, one Gibbs sweep).
Using standard gradient descent techniques, the CD can be minimized efficiently through the weight update

ΔW_ij = η ( E[v_i h_j]_{P_data} − E[v_i h_j]_{P_1} ),

where η is the learning rate. The second term, E[v_i h_j]_{P_1}, is estimated from the samples obtained using Gibbs sampling.
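A minimal CD-1 update for a Bernoulli RBM can be sketched as follows. The batch-based NumPy implementation and the function names are our assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, b, a, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) step for a Bernoulli RBM.
    V: batch of visible vectors (n x n_vis); W: weights (n_vis x n_hid);
    b, a: visible/hidden biases. Returns the updated weight matrix."""
    rng = np.random.default_rng(0) if rng is None else rng
    # positive phase: P(h = 1 | v) computed from the data
    ph_data = sigmoid(V @ W + a)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)
    # one Gibbs sweep: reconstruct v, then recompute P(h = 1 | v)
    pv_model = sigmoid(h_sample @ W.T + b)
    ph_model = sigmoid(pv_model @ W + a)
    # CD-1 gradient: E[v h]_{P_data} - E[v h]_{P_1}
    dW = (V.T @ ph_data - pv_model.T @ ph_model) / V.shape[0]
    return W + lr * dW
```

In practice the biases b and a are updated with the analogous difference of means; only the weight update is shown here for brevity.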

Baseline Data: Preprocessing and Clustering
In this section, data preprocessing is presented briefly, because a more detailed description can be found in [19]. The preprocessing has three stages: data integration (Section 3.1), data transformation (Section 3.2), and data reduction (Section 3.3). Subsequently, data are organized in clusters in Section 3.4.

Data Integration: Unfolding and Scaling
The collected data contain different response signals measured by sensors on a vibrating structure in the time domain. Under different structural states, multiple observations of these responses are measured. Then, using the fast Fourier transform (FFT) algorithm, these response signals are transformed into the frequency domain. All these observations in the frequency domain are collected in a matrix that is defined as follows:

X = ( x^{k,l}_{i,j} ) ∈ R^{(n_1 + ··· + n_J) × (K·L)},   (3)

in which element x^{k,l}_{i,j} is the l-th frequency-domain component measured by the k-th sensor in the i-th observation of the j-th structural state,
where K ∈ N is the number of sensors and k = 1, . . . , K identifies the sensor that is measuring; L ∈ N is the number of components in each signal and l = 1, . . . , L indicates the l-th measurement in the frequency domain; J ∈ N is the number of different structural states that are considered and j = 1, . . . , J represents the structural state that is measured; finally, n_j, j = 1, . . . , J, is the number of observations per structural state and i = 1, . . . , n_j is the i-th observation related to the j-th structural state. Matrix X in Equation (3) is a particular unfolded version of a 3-dimensional (n_1 + · · · + n_J) × K × L data matrix, where the first dimension is the observation, the second dimension is the sensor, and the third dimension is the frequency component. Numerous approaches have been proposed to handle 3-dimensional matrices. The most widely adopted approaches are based on the unfolding of these matrices. There are six alternative ways of arranging a 3-dimensional data matrix [20] that affect the performance of the overall strategy, and we have considered type E in this work, because type-E unfolding simplifies the study of variability among samples.
Matrix X in Equation (3) is rescaled through MCGS [21] because of the different magnitudes and scales in the measurements.
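As an illustration, MCGS can be sketched as follows: each column is centered by its own mean, and all L columns belonging to sensor k are divided by a single per-sensor standard deviation (see also Section 4.1). The function name and the NumPy implementation are our assumptions, not code from [21].

```python
import numpy as np

def mcgs(X, K):
    """Mean-centered group scaling. X is n x (K*L), with the columns of
    each sensor stored contiguously. Every column is centered by its own
    mean mu_{k,l}; all columns of sensor k are divided by one standard
    deviation sigma_k computed from all of that sensor's measurements."""
    n, KL = X.shape
    L = KL // K
    Xs = X - X.mean(axis=0)                 # subtract mu_{k,l} per column
    for k in range(K):
        block = X[:, k * L:(k + 1) * L]
        sigma_k = block.std()               # std of all sensor-k entries
        Xs[:, k * L:(k + 1) * L] /= sigma_k
    return Xs
```

After MCGS, every column has zero mean, so the subsequent PCA model can be built directly on the scaled matrix.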

Data Transformation
Data transformation means applying a particular mathematical function. The transformation that we apply in this study is PCA, because the final aim is dimensionality reduction. We build the PCA model, P, so that the data X, normalized using MCGS into X̃, are transformed into the projected data T = X̃P. Notably, matrices X̃ and T have equal dimensions; hence, no reduction is performed at this stage.

Data Reduction: PCA and P-t-SNE
In this work, we use two methods of data reduction. On the one hand, we apply PCA to represent the normalized matrix X̃ in a new space with reduced dimensions and without a significant loss of information. On the other hand, we apply P-t-SNE as a 2-dimensional representation technique. These two approaches are combined to reduce the data complexity, computational effort, and time.
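A minimal sketch of the PCA stage, assuming the data matrix has already been scaled with MCGS (and is therefore column-centered); the 95% threshold matches step 4 of the Introduction, while the function name is hypothetical.

```python
import numpy as np

def pca_reduce(Xs, var_kept=0.95):
    """PCA via SVD of the scaled data matrix Xs. Returns the loading
    matrix P with the smallest number of principal components whose
    cumulative explained variance is >= var_kept, plus the scores
    T = Xs @ P."""
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    var = s ** 2 / np.sum(s ** 2)            # explained-variance ratios
    n_comp = int(np.searchsorted(np.cumsum(var), var_kept) + 1)
    P = Vt[:n_comp].T                        # loadings, (K*L) x n_comp
    return P, Xs @ P
```

The number of retained components is chosen greedily: the first index at which the cumulative explained variance reaches the threshold.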

Clustering Effect
The first dimensionality reduction is performed using PCA. Specifically, for n = n 1 + · · · + n J observations, the rows of matrix X in Equation (3) under J different structural states are projected and transformed into a lower-dimensional space.
Next, the second dimensionality reduction is applied to the projected and transformed data using P-t-SNE. The aim is to find a collection of 2-dimensional points that represent the data projected and transformed by PCA with no explicit loss of information while preserving the local structure of the dataset. After the application of P-t-SNE, we expect to observe J clusters related to the J different structural states.
As mentioned at the beginning of Section 3, see [19] for more details on these stages.

Structure to Diagnose: Damage Detection and Classification Procedure
In this section, we describe the vibration-based damage detection and classification procedure to diagnose a new structure.
For damage detection and classification, a single observation of the current structure is required to diagnose it. The collected data are composed of different response signals measured by K sensors, with L components in each signal, as in Equation (3). When these measurements are obtained in the frequency domain, we build a new data vector z:

z = (z_1, z_2, . . . , z_{K·L}) ∈ R^{K·L},

where element z_{(k−1)L+l} is the l-th frequency-domain component measured by the k-th sensor.

Scaling (MCGS)
First, we have to scale the row vector z to define a scaled row vector z̃:

z̃_{(k−1)L+l} = ( z_{(k−1)L+l} − μ_{k,l} ) / σ_k,  k = 1, . . . , K,  l = 1, . . . , L,   (4)

where μ_{k,l} is the arithmetic mean of all the elements in the [(k − 1)L + l]-th column of matrix X in Equation (3) (that is, the l-th column of the k-th sensor) and σ_k is the standard deviation of all the measurements of the k-th sensor relative to the mean value μ_k (the arithmetic mean of all the measurements of the k-th sensor).

Projection (PCA)
The scaled row vector z̃ ∈ R^{K·L} is projected onto the space spanned by the first principal components in P through the vector-to-matrix multiplication

x_{n+1} = z̃ P,

where P here contains only the retained principal components. Notably, the vector containing the data of the structure to be diagnosed initially has dimension K · L but, after the projection, its dimension equals the number of retained principal components. We add this new point to the data projected and transformed by PCA, X = {x_1, . . . , x_n}, to define a new set

X = {x_1, . . . , x_n} ∪ {x_{n+1}} = {x_1, . . . , x_n, x_{n+1}},   (5)

where x_i = e_i X̃ P, i = 1, . . . , n, and e_i ∈ R^n is the i-th element of the canonical basis. The network is trained with {x_1, . . . , x_n}; then, x_{n+1} is passed through the trained network.
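The scaling and projection of a new observation (Sections 4.1 and 4.2) can be sketched in a few lines; the helper name and the NumPy-based implementation are assumptions.

```python
import numpy as np

def project_new_observation(z, X, K, P):
    """Scale a raw observation z (length K*L) with the baseline
    statistics of matrix X, as in Equation (4), and project it with the
    reduced PCA loadings P to obtain the point x_{n+1}."""
    n, KL = X.shape
    L = KL // K
    mu = X.mean(axis=0)                      # mu_{k,l}, one per column
    z_tilde = z - mu
    for k in range(K):
        sigma_k = X[:, k * L:(k + 1) * L].std()  # per-sensor sigma_k
        z_tilde[k * L:(k + 1) * L] /= sigma_k
    return z_tilde @ P                       # reduced-dimension point
```

Note that the means and standard deviations come from the baseline matrix X, not from the new observation, so the diagnosis uses exactly the same scaling as the training data.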

P-t-SNE and Final Classification
Finally, we apply P-t-SNE to the reduced set X in Equation (5) to find a collection of 2-dimensional map points Y = {y_1, . . . , y_n, y_{n+1}} ⊂ R² that represent the original set X = {x_1, . . . , x_n} (the data projected and transformed by PCA) with no explicit loss of information while retaining the local structure of this set. Furthermore, the map point y_{n+1}, associated with the data point x_{n+1}, is included. That is, the embedded data are constructed by applying the trained network: input X and output Y. We expect to observe the same J clusters related to the J different structural states.
For each cluster, we calculate its centroid: the mean of the values of the data points in the cluster. For instance, the centroid associated with the first structural state is

Y_1 = (1 / n_1) Σ_{i=1}^{n_1} y_i.

In general, the centroid associated with the j-th structural state, j = 1, . . . , J, is the 2-dimensional point defined as

Y_j = (1 / n_j) Σ_{i = n_0 + ··· + n_{j−1} + 1}^{n_0 + ··· + n_j} y_i,   (6)

where n_0 = 0. As a result, the current structure to diagnose is associated with the j-th structural state if

||Y_j − y_{n+1}||_2 = min_{s = 1, . . . , J} ||Y_s − y_{n+1}||_2,

that is, if the minimum distance between y_{n+1} and each centroid corresponds to the Euclidean distance between Y_j and y_{n+1}. We call this approach the smallest point-centroid distance (see Figure 2). The proposed procedure is shown in Figure 3.
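The smallest point-centroid distance rule can be sketched as follows, assuming the baseline map points are ordered by structural state; the names are hypothetical.

```python
import numpy as np

def classify_by_centroid(Y, counts, y_new):
    """Smallest point-centroid distance rule. Y holds the 2-D baseline
    map points ordered by structural state, `counts` gives n_j per
    state, and y_new is the map point of the structure to diagnose.
    Returns the 0-based index of the closest structural state."""
    centroids, start = [], 0
    for n_j in counts:
        centroids.append(Y[start:start + n_j].mean(axis=0))  # Eq. (6)
        start += n_j
    d = np.linalg.norm(np.array(centroids) - y_new, axis=1)
    return int(np.argmin(d))
```

Applied per actuation phase, this yields the per-phase vote that feeds the final classification.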

Application of the SHM System on an Aluminum Plate with Four PZTs
In this study, we reuse the structure and experiment from [19]. Therefore, we can use this structure as a benchmark to compare our results.

Structure
A square aluminum plate was manufactured to demonstrate the accuracy of the vibration-based method of damage detection and classification presented in Sections 3 and 4. The dimensions of the plate are 40 × 40 × 0.2 cm. The plate is instrumented with four PZTs, and a mass of 17.2916 g is introduced to simulate damage in a non-destructive way, producing changes in the propagated wave (see Figure 4a). Each PZT can work in two modes: exciting the aluminum plate with a burst signal (actuator mode) or detecting a time-varying mechanical response (sensor mode). The location of the mass defines each damage scenario, and J = 4 structural states are considered: the healthy state and three different types of damage (Figure 4b). The plate is isolated from the vibration and noise of the laboratory (Figure 4b).

Scenarios and Actuation Phases
Unlike [19], in this study, the experimental setup includes only two scenarios (the two extreme cases) to determine the performance of the proposed method:
• Scenario 1. The signals are obtained using a short wire (0.5 m) from the digitizer to the piezoelectric sensors. Next, the Savitzky-Golay [25] algorithm is used to filter these signals after adding white Gaussian noise. This filter is used to smooth the data.
• Scenario 2. The signals are obtained using a long wire (2.5 m) from the digitizer to the piezoelectric sensors. The signals are also filtered with the Savitzky-Golay algorithm.
As discussed in Section 5.1, there are four PZTs (S1, S2, S3, and S4) that excite the plate and collect the measured signal. This sensor network works in what we call actuation phases. In each actuation phase, a PZT is used as an actuator (which excites the plate with the burst signal), and the rest of the PZTs are used as sensors (which measure signals). Therefore, we have as many actuation phases as sensors: in actuation phase 1, S1 is used as the actuator and the rest of PZTs are used as sensors; in actuation phase 2, S2 is used as the actuator and the rest of PZTs are used as sensors; and so on.
Therefore, for each actuation phase ϕ, the matrix that collects all observations under the different structural states in the frequency domain has the form of Equation (3):

X[ϕ] ∈ R^{(n_1 + ··· + n_J) × (K·L)},  ϕ = 1, . . . , 4,   (7)

where, in this experiment, K = 3 (the three PZTs acting as sensors in each phase), L = 30001, and J = 4. The damage detection and classification method described in Sections 3 and 4 can be applied to each matrix X[ϕ], ϕ = 1, . . . , 4, in Equation (7), leading to one classification per actuation phase. Then, we use these four classifications to obtain a final decision based on the four actuation phases. This strategy is detailed in Section 5.5.

κ-Fold Nonexhaustive Leave-p-Out Cross-Validation
The proposed approach is evaluated by comparing test data (new observations in an unknown state under the same conditions) with baseline data (data from the structure under the four different structural states). For that purpose, κ-fold nonexhaustive leave-p-out cross-validation [19] is used. Hence, the sum of all elements in the confusion matrices presented in Section 6 is equal to 400. We use the notation X to represent the matrix that is used as the baseline to build the model. Matrix X has 20 rows: five for each structural state.

Damage Detection and Classification
The strategy for damage detection and classification is as follows: the classification is based on the four matrices X[ϕ] in Equation (7), with κ-fold nonexhaustive leave-p-out cross-validation. Each actuation phase casts a vote that determines the final classification.

• Step 1. Each baseline matrix X[ϕ] is scaled using MCGS, obtaining the normalized matrix X̃.
• Step 2. PCA is applied to matrix X̃, obtaining the PCA model P.
• Step 3. The reduced PCA model (the first principal components of P) is chosen such that the proportion of variance explained is at least 95%.
• Step 4. An observation z ∈ R^{3·30001} = R^{90003} of the current structure to diagnose is needed. Then, z is scaled by Equation (4) to obtain z̃, which is projected onto the space spanned by the first principal components retained in Step 3.
• Step 5. The projected observation is added to the PCA-reduced baseline data to define the set X = {x_1, . . . , x_20, x_21}, as in Equation (5).
• Step 6. P-t-SNE is applied to X to find a set of 2-dimensional points Y = {y_1, . . . , y_20, y_21} ⊂ R². Thus, the embedded data are constructed using the trained network: input X and output Y. The baseline points form J = 4 clusters: {y_1, . . . , y_5} ⊂ Y, related to the healthy state; {y_6, . . . , y_10} ⊂ Y, related to damage 1; {y_11, . . . , y_15} ⊂ Y, related to damage 2; and {y_16, . . . , y_20} ⊂ Y, related to damage 3. Centroid Y_j, j = 1, . . . , J, associated with the j-th structural state is calculated by Equation (6).
• Step 7. The map point y_21, associated with the current structure, is assigned to the cluster with the smallest point-centroid distance, yielding one classification per actuation phase.
• Step 8. With the information given by the four actuation phases, the structure that must be diagnosed is finally classified considering two approaches: majority voting and the sum of the inverse distances. For details of both approaches, see [19].

Results
In this section, confusion matrices summarize the results of the damage detection and classification method presented in Sections 3 and 4 and detailed in Sections 5.3-5.5. The results for each scenario are shown in different subsections. Four different structural states are considered in both scenarios:

• The healthy state (D0): the aluminum plate with no damage;
• Three states with damage (D1, D2, and D3): the aluminum plate with an added mass at the positions indicated in Figure 4.
Five iterations (κ = 5) of a nonexhaustive leave-p-out cross-validation, with p = 80, are performed to validate the damage detection and classification method. At each iteration, 80 observations are considered: 20 observations per structural state (D0, D1, D2, and D3). Therefore, the sum of all elements in the confusion matrices is equal to 5 · 80 = 400.
Two different confusion matrices are shown for each of the two scenarios:
• Majority voting. The damage detection and classification method is applied to the four matrices X[ϕ], ϕ = 1, . . . , 4. Each actuation phase casts a vote, and the final classification is made by majority voting [19].
• Sum of the inverse distances. The damage detection and classification method is applied to the four matrices X[ϕ], ϕ = 1, . . . , 4. Each actuation phase casts a vote, and the final classification is made using the maximum sum of the inverse distances [19].
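Both fusion rules can be sketched in a few lines; the tie-breaking behavior of the majority vote (lowest label wins) is our assumption, since [19] gives the full details.

```python
import numpy as np

def fuse_votes(votes):
    """Majority voting over the per-actuation-phase labels."""
    labels, counts = np.unique(votes, return_counts=True)
    return int(labels[np.argmax(counts)])   # ties: lowest label wins

def fuse_inverse_distances(dists):
    """`dists` is a 4 x J array of point-centroid distances, one row
    per actuation phase. The winning structural state maximizes the
    summed inverse distances across the four phases."""
    return int(np.argmax(np.sum(1.0 / np.asarray(dists), axis=0)))
```

The inverse-distance rule weighs each phase by how close the map point sits to a centroid, so a single very confident phase can outvote three marginal ones.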
Finally, we have included the confusion matrices for t-SNE, to compare its performance with P-t-SNE.

Scenario 1
In this section, we describe the results for Scenario 1 from Section 5.2. Tables 1 and 2 show the four confusion matrices. The green background cells correspond to observations that are correctly classified, while the red background cells represent the misclassifications. The color intensity (green or red) is related to the proportion of correct or wrong decisions.
With P-t-SNE and majority voting (Table 1), the overall accuracy is very good. Specifically, 398 out of 400 observations have been correctly classified, which corresponds to an overall accuracy of 99.5%. With t-SNE, the overall accuracy is 100% (Table 1).
Using the sum of the inverse distances (Table 2) for the P-t-SNE-based damage detection and classification, 399 out of 400 observations have been correctly classified; hence, the overall accuracy is 99.75%. For t-SNE, 400 out of 400 observations have been correctly classified (100% accuracy).
Furthermore, other metrics are calculated. The most common metrics for choosing the best solution in a binary classification problem are as follows:
• Accuracy, defined as the proportion of true results among the total number of cases examined.
• Precision, or positive predictive value (PPV), which attempts to answer the question "what proportion of positive identifications is actually correct?".
• Sensitivity, or true positive rate (TPR), which measures the proportion of actual positives that are correctly identified as such.
• F_1 score, defined as the harmonic mean of PPV and TPR.
• Specificity, or true negative rate (TNR), which measures the proportion of actual negatives that are correctly identified as such.
These metrics are straightforward to calculate for both binary and multiclass classification problems [26]. When the classification problem is multiclass, as in the current work, according to [27,28], the reported result is the macro-average: the result of each class is added and the sum is divided by the total number of classes.
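The macro-averaged metrics can be computed directly from a confusion matrix; the sketch below treats each class one-vs-rest, as described above (the function name is hypothetical).

```python
import numpy as np

def macro_metrics(C):
    """Macro-averaged accuracy, PPV, TPR, F1, and TNR from a J x J
    confusion matrix C (rows: true state, columns: predicted state).
    Each class is treated one-vs-rest; per-class values are averaged."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    tp = np.diag(C)
    fp = C.sum(axis=0) - tp
    fn = C.sum(axis=1) - tp
    tn = total - tp - fp - fn
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return {
        "accuracy": float(np.mean((tp + tn) / total)),
        "PPV": float(np.mean(ppv)),
        "TPR": float(np.mean(tpr)),
        "F1": float(np.mean(2 * ppv * tpr / (ppv + tpr))),
        "TNR": float(np.mean(tn / (tn + fp))),
    }
```

For a perfect 4 × 4 confusion matrix (all mass on the diagonal), all five metrics equal 100%, matching the t-SNE columns of Tables 3 and 4.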
In all cases (P-t-SNE, t-SNE, the majority voting, and the sum of the inverse distances), these five metrics vary between 99.5% and 100% (Tables 3 and 4).
As can be seen, both parametric and non-parametric approaches obtain practically the same results. However, P-t-SNE dramatically reduces the processing time: it decreases from 40 min and 15 s (t-SNE) to 2 min and 34 s (P-t-SNE) on an Intel Core i7 4.20 GHz computer with 32 GB RAM. Using P-t-SNE, a decision is made in just a few milliseconds. The reduced time (2 min and 34 s) includes both the pretraining and the fine-tuning of the NN, as well as the classification of the current state of the structure. The total computational cost of P-t-SNE is reduced by approximately 94% compared with t-SNE.

Table 1. Confusion matrix of the application of the P-t-SNE and t-SNE damage detection and classification method presented in Sections 3 and 4: scenario 1 and the majority voting approach.

Table 3. Accuracy, PPV, TPR, F_1 score, and TNR of the application of the P-t-SNE and t-SNE damage detection and classification method presented in Sections 3 and 4: scenario 1 and the majority voting approach.

Table 4. Accuracy, PPV, TPR, F_1 score, and TNR of the application of the P-t-SNE and t-SNE damage detection and classification method presented in Sections 3 and 4: scenario 1 and the sum of the inverse distances approach.

Scenario 2
In this section, we describe the results for Scenario 2 from Section 5.2. Tables 5 and 6 show the four confusion matrices.
With P-t-SNE and majority voting (Table 5), the overall accuracy is also very good. Specifically, 398 out of 400 observations have been correctly classified, which corresponds to an overall accuracy of 99.5%. With t-SNE and majority voting, the overall accuracy is 100% (Table 5).
Using the maximum sum of the inverse distances to make the final decision (Table 6) with P-t-SNE, 399 out of 400 observations have been correctly classified (an overall accuracy of 99.75%). With t-SNE, 400 out of 400 observations have been correctly classified (an overall accuracy of 100%).
In the non-parametric approach, using the majority voting and the sum of the inverse distances, the five metrics presented in Section 6.1 achieve 100%. In contrast, in the parametric approach, these metrics slightly decrease (between 0.1% and 0.5%, see Tables 7 and 8).
Again, as in scenario 1, the parametric and non-parametric methods obtain similar results. However, as before, the P-t-SNE approach dramatically reduces the processing time: from 42 min and 1 s (t-SNE) to 2 min and 32 s (P-t-SNE). As in the previous scenario, the total computational cost of P-t-SNE is reduced by approximately 94% compared with t-SNE.

Table 5. Confusion matrix of the application of the P-t-SNE and t-SNE damage detection and classification method presented in Sections 3 and 4: scenario 2 and the majority voting approach.

Table 7. Accuracy, PPV, TPR, F_1 score, and TNR of the application of the P-t-SNE and t-SNE damage detection and classification method presented in Sections 3 and 4: scenario 2 and the majority voting approach.

Repeatability
We use error bars to give the reader a general idea of the uncertainty in the results. Figure 5 shows the mean of each structural state, with error bars representing the standard error. As can be seen both in Figure 5 and in Tables 9-12, the standard error is very small when the procedure is repeated 10 times.

Table 9. Repeatability of the SHM strategy (10 repetitions): scenario 1 and the majority voting approach. D0: healthy state of the structure; D1, D2, and D3: added masses at the positions indicated in Figure 4.

Conclusions
In this paper, we proposed an SHM strategy for the detection and classification of structural changes combining PCA and P-t-SNE. We evaluated the proposed method with experimental data. The obtained results show that its performance is very good, given its high classification accuracy. In addition, we have compared the parametric version of t-SNE with the non-parametric version.
According to the results shown in Sections 6.1 and 6.2, the performance is very satisfactory and similar in both approaches: P-t-SNE and t-SNE. However, in terms of processing time, the P-t-SNE-based damage detection and classification is preferable to the t-SNE-based method: although the non-parametric approach slightly outperforms the parametric approach in accuracy, the parametric approach reduces the total computational cost by approximately 94%. Hence, P-t-SNE can classify a current structure in just a few milliseconds. This is the first indication that P-t-SNE is preferable to t-SNE. Other advantages of using P-t-SNE are as follows:
1. P-t-SNE can handle large-scale datasets, while t-SNE can only handle datasets up to the order of thousands of samples.
2. The t-SNE method requires an extremely large computational cost for the optimization: to map a new data sample, the optimization has to be run again over the whole dataset. In contrast, P-t-SNE learns from the training data and can be applied when a new observation arises; hence, it can work with real-time observations. Therefore, the parametric approach can make inferences about new samples to be diagnosed without recalculating everything; the model predicts on out-of-sample data.
Based on the foregoing, and seeing the strong performance of the P-t-SNE-based approach, we conclude that it is better to work with the parametric version of t-SNE than with the non-parametric version.
Many possible fields of application exist. For example, in aeronautics, parts of an airplane (wings or fuselage) can be simulated with similar aluminum plates; for wind turbines, this methodology can be applied to detect damage and faults. In general, if a sensor network can be installed in a structure, and various actuation phases can be defined, the proposed approach can be considered.
In our future work, we plan to apply the proposed methodology to imbalanced data, as well as to determine its effectiveness under different environmental and operational conditions.

Author Contributions: D.A. and F.P. developed the idea and designed the exploration framework; D.A. developed the algorithms; F.P. supervised the validation of the results; D.A. and F.P. drafted the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research has been partially funded by the Spanish Agencia Estatal de Investigación (AEI)-Ministerio de Economía, Industria y Competitividad (MINECO), by the Fondo Europeo de Desarrollo Regional (FEDER) through the research project DPI2017-82930-C2-1-R, and by the Generalitat de Catalunya through the research project 2017 SGR 388.