Structural Health Monitoring for Condition Assessment Using Efficient Supervised Learning Techniques

Pattern recognition can be adopted for structural health monitoring (SHM) based on statistical characteristics extracted from raw vibration data. Structural condition assessment is an important step of SHM, since changes in the relevant properties may adversely affect the behavior of any structure. It therefore appears necessary to adopt efficient and robust approaches for the classification of different structural conditions using features extracted from the said raw data. To achieve this goal, it is essential to correctly distinguish the undamaged and damaged states of the structure; the aim of this work is to present and compare classification methods using feature selection techniques to classify the structural conditions. All of the utilized classifiers need a training set pertinent to the undamaged/damaged conditions of the structure, as well as relevant class labels to be adopted in a supervised learning strategy. The performance and accuracy of the considered classification methods are assessed through a numerical benchmark concrete beam.


Introduction
In many engineering areas, assessing the integrity and/or health of a structure is an important and timely topic. Several methods have been developed in the past decades for the relevant structural health monitoring (SHM), which aims to detect any possible damage and classify the conditions of structures. Methods can be distinguished into model-driven [1][2][3][4], see also Farrar and Worden [5], or data-driven ones. The model-driven SHM approach is based on an analytical or finite element model of the structure [6]. Although this approach can successfully detect damage, some limitations are represented by the necessity of a detailed model, of a model updating procedure and of data reduction from raw vibration measurements. The data-driven SHM approach is instead based on monitoring features that must be sensitive to damage, and then on discriminating the normal or undamaged structural state condition from the damaged one by analyzing the features [5,[7][8][9]. To pursue this aim, SHM consists in statistical pattern recognition methodologies arranged into four steps: (1) operational evaluation, (2) data acquisition, (3) feature extraction, and (4) statistical decision making for the classification of features [5].
Feature extraction is an important step in the methods of statistical pattern recognition, since methods may fail to provide an accurate decision using unreliable features. Time series modeling is a suitable tool to extract damage-sensitive features from raw vibration data. In this regard, in [10] coefficients of autoregressive (AR) models were extracted as damage-sensitive features, whereas in [11] the residuals of AR models were employed to quantify the differences between the prediction from the AR model and the actual measurements. A different approach to extract the damage-sensitive features is via principal component analysis (PCA). In [12], PCA was introduced to extract the features used for sub-surface defect detection, and in [13] PCA and independent component analysis (ICA) were compared for selecting the features from the measured data.
Statistical decision making refers to the application of statistical methods for the classification of the extracted features. This step is related to the implementation of machine learning algorithms, to classify the structural state conditions and identify possible damage states [5,[14][15][16]. The basic idea of machine learning relies on identifying a relationship between the features derived from the measured data in the undamaged or damaged conditions, collected as a training data set. In machine learning and statistics, classification is the problem of identifying the class label of a set of observations on the basis of a training set. The classification problem is considered as an instance of supervised learning that trains a classifier using a data set including the features related to both the undamaged and damaged conditions. In [17] a damage classification approach was presented for structures under varying operational and environmental conditions, with a unique combination of time series analysis and artificial neural networks. In [18], a comparative study of various classification algorithms was conducted for fault diagnosis, using different types of signals. The classification methods used were linear discriminant analysis, support vector machines, random forests and the adaptive resonance theory-Kohonen neural network. In [19], the linear discriminant analysis method was adopted as a classifier for the damage detection in composite structures with the aid of a wavelet packet transform-based algorithm. A naïve Bayes classifier was used in [20] and shown to be one of the most promising classification approaches for damage detection. Beyond the studies on successful classification algorithms, there are other techniques that can be used in the context of SHM: most of them require only a simple strategy to become useful approaches in the detection of damage [21,22].
The main objective of this work is to discuss some classification methods and compare their performances in the classification of different structural conditions. Linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), naïve Bayes (NB) classification, and decision tree (DT) are here adopted to recognize the class label of the structural state. AR models and PCA are applied to extract the damage-sensitive features from the raw vibration data. To assess the performances of the classification methods, a numerical benchmark concrete beam is considered.

Method
In the proposed comparative assessment, the features are selected using AR or PCA models and the structural state conditions are accordingly classified. Most of the classification methods require training and test data sets including features related to the undamaged and damaged conditions, as well as class labels. The performance and prediction accuracy of each of these methods depend on the model and algorithm parameters: a comparative assessment is therefore necessary in real-life cases and is here provided. Assume that X ϵ ℜ^(v×n) is the training set, consisting of n-dimensional feature vectors in the undamaged and damaged conditions, where the feature dimension n depends on the number of sensors mounted on the structure: the training set collects the AR coefficients or principal components at each sensor location. Next, if z ϵ ℜ^v represents the vector containing the classification labels for each element in the training data set, the classification methods have to classify the test data through the information extracted from the training and class label sets.
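As a minimal sketch of the data layout assumed above (all sizes are hypothetical, chosen only for illustration), the training matrix X stacks one feature vector per observation, and z holds one class label per row of X:

```python
import numpy as np

# Hypothetical sizes: v feature vectors, n features per vector (not from the paper).
rng = np.random.default_rng(0)
v, n = 70, 30
X = rng.standard_normal((v, n))    # training set X in R^(v x n)
z = np.repeat(np.arange(7), 10)    # class labels: 7 structural states, 10 samples each

# One label per training sample is required by every supervised classifier below.
assert X.shape[0] == z.shape[0]
```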

Feature Selection
Assuming that a linear time-invariant representation can fit the structural response, an AR model for a single-output system reads [23]:

y(t) = θ1 y(t−1) + θ2 y(t−2) + … + θa y(t−a) + e(t), (1)

where y(t) is the measured response at time t; θ = [θ1,θ2,…,θa] is the vector of AR coefficients; a is the model order; e(t) is an uncorrelated residual sequence used to quantify the difference between the measured and predicted responses. PCA is a statistical procedure that is used to convert a set of observations of possibly correlated variables into a set of linearly uncorrelated variables, named principal components. In the context of SHM, PCA can be used to reduce the dimension of high-dimensional data [24], extract damage-sensitive features [13], or discriminate between normal and abnormal conditions of a structure [25]. Hence, PCA linearly transforms matrix X into a low-dimensional matrix T ϵ ℜ^(v×k) using a loading matrix P ϵ ℜ^(n×k) in the following form:

T = X P. (2)

To apply PCA, it is necessary to standardize the original matrix on the basis of mean values and standard deviations of all features for each single sensor. This also helps remove the differences in the ranges of the variables, and provides the same importance to each of them in the statistical analysis.
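Both feature extractors can be sketched in a few lines of numpy; the least-squares AR fit and the SVD-based PCA below are generic illustrations of the two techniques, not the paper's implementation, and all signal sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ar(y, order):
    """Least-squares estimate of the AR coefficients theta of a 1-D signal y,
    for the model y(t) = theta_1 y(t-1) + ... + theta_a y(t-a) + e(t)."""
    N = len(y)
    # Regressor matrix: column i holds the lag-(i+1) samples y(t-i-1).
    Phi = np.column_stack([y[order - 1 - i: N - 1 - i] for i in range(order)])
    theta, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    return theta

def pca_scores(X, k):
    """Standardize X per feature, then project onto the k leading
    principal directions: T = X_std @ P with P the loading matrix."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:k].T                                # loadings, n x k
    return Xs @ P                               # scores, v x k

# Synthetic AR(2) signal with known coefficients, to check the estimator.
theta_true = np.array([0.5, -0.3])
y = np.zeros(2000)
for t in range(2, 2000):
    y[t] = theta_true @ y[t - 2:t][::-1] + 0.01 * rng.standard_normal()
theta_hat = fit_ar(y, 2)

# Project a random 50 x 8 matrix onto its 3 leading principal components.
T = pca_scores(rng.standard_normal((50, 8)), 3)
```

On a clean synthetic signal the estimated coefficients closely match the true ones, which is a convenient sanity check before applying the extractor to measured accelerations.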

Classification Methods
When two or more clusters of data are known a priori and new observations have to be classified based on the measured characteristics, discriminant analysis can be adopted. LDA is a classification method [26] that can be used to find a linear combination of features to characterize or separate the classes of groups. In order to perform the classification, a training data set must be defined and the class with the smallest misclassification cost is then predicted [27]. It is assumed that, in the k-th class, the probability density function of x from the training set X, with mean μk and covariance Σk, is given by:

f_k(x) = (2π)^(−n/2) |Σk|^(−1/2) exp[−(1/2)(x − μk)^T Σk^(−1) (x − μk)]. (3)

The aim is to assign a randomly selected observation to the class for which this probability is largest. To use Equation (3) as a classifier, one needs to estimate the class priors, and the means and covariance matrix from the training data set. In the case of LDA, the Gaussians for each class are assumed to share the same covariance matrix Σk = Σ, and the classification rule is based on a linear score function that is given by:

δk(x) = x^T Σ^(−1) μk − (1/2) μk^T Σ^(−1) μk + ln pk, (4)

where pk is the probability that a randomly selected observation falls in the k-th class. QDA still assumes that the features are normally distributed, but the covariance of each class is no longer identical to that of all the others [28]. Overall, with LDA it is assumed that the trained model has the same covariance matrix for each class and only the mean values vary, whereas with QDA both the mean and covariance of each class vary. QDA classifies a sample set into the cluster that has the largest score function, defined as:

δk(x) = −(1/2) ln|Σk| − (1/2)(x − μk)^T Σk^(−1) (x − μk) + ln pk. (5)

The unknown values of μk, Σk, and pk are again obtained from the training data set. NB is a classification method that is intended to classify the feature values on the basis of the Bayes theorem: it assumes that the values of a particular feature are conditionally independent, given the class.
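The two score functions can be written directly in numpy; the sketch below is a generic illustration with hypothetical class parameters (two classes, unit covariance, equal priors), not the paper's trained model:

```python
import numpy as np

def lda_score(x, mu, Sigma, p):
    """Linear discriminant score; all classes share the covariance Sigma."""
    Sm = np.linalg.solve(Sigma, mu)
    return x @ Sm - 0.5 * mu @ Sm + np.log(p)

def qda_score(x, mu, Sigma, p):
    """Quadratic discriminant score; covariance Sigma is class-specific."""
    d = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * d @ np.linalg.solve(Sigma, d) + np.log(p)

# Two toy classes: a test point near mu0 should score highest for class 0.
mu0, mu1 = np.array([0.0, 0.0]), np.array([3.0, 3.0])
Sigma = np.eye(2)
x = np.array([0.2, -0.1])
best = max((0, 1), key=lambda k: qda_score(x, (mu0, mu1)[k], Sigma, 0.5))
```

With identical covariances, as in this toy setting, LDA and QDA rank the classes the same way; they diverge only when the class covariances differ.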
The method then computes the posterior probability of the features belonging to each class, for any test data: hence, NB classifies the test data set based on the largest posterior probability. To perform classification, the algorithm builds the posterior probability model on the basis of Bayes rule, according to:

P(z = k | x) = pk f_k(x) / Σj pj f_j(x), (6)

where, owing to the conditional independence assumption, the class likelihood f_k(x) factorizes into a product of univariate densities over the features. The algorithm classifies an observation by allocating it to the class yielding the maximum posterior probability.
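A minimal Gaussian naïve Bayes posterior, computed in log space for numerical stability, could look as follows (class means, variances and priors are hypothetical toy values, not fitted to the paper's data):

```python
import numpy as np

def nb_posterior(x, mus, variances, priors):
    """Posterior class probabilities under a Gaussian naive Bayes model.

    mus, variances: (classes x features) arrays; priors: (classes,) array.
    Conditional independence makes each class likelihood a product of
    univariate Gaussians, i.e. a sum of log-densities over the features.
    """
    log_lik = -0.5 * (np.log(2 * np.pi * variances)
                      + (x - mus) ** 2 / variances).sum(axis=1)
    log_post = np.log(priors) + log_lik
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Two classes, two features; a test point near the first class mean.
mus = np.array([[0.0, 0.0], [4.0, 4.0]])
variances = np.ones((2, 2))
post = nb_posterior(np.array([0.1, -0.2]), mus, variances, np.array([0.5, 0.5]))
```

The returned vector sums to one, and the class with the largest entry is the NB prediction.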
DT is a further classification method that applies a decision tree as a predictive model for the process of classification. A decision tree is a decision support tool that employs a tree-like graph as a model of decisions. The goal of this approach is to predict the class of new data by learning simple decision rules, inferred from the data features. If the target variable can take a finite set of values, the models are called classification trees. The classification decision tree splits so-called nodes based on either impurity or node error: common impurity measures are Gini's diversity index and the maximum deviance reduction, which is also known as cross entropy [29]. Given the training data set X and the label vector z, DT recursively partitions the space such that the samples with the same labels are grouped together. In order to train the classifier, it becomes essential to specify the number of branch nodes (decision splits), the minimum number of branch and leaf node observations, and the prior probabilities for each class. A full discussion of the classification decision tree is beyond the scope of this article, and readers are referred to [30] for further details.
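As an illustration of how the four classifiers could be run side by side, the sketch below trains scikit-learn's implementations of LDA, QDA, Gaussian NB and DT on a toy surrogate of the SHM features; the synthetic Gaussian clusters, the sample sizes, and the use of scikit-learn itself are assumptions for this example, not part of the original study:

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# Toy surrogate: 4 structural states as well-separated Gaussian clusters.
means = np.array([[0, 0], [4, 0], [0, 4], [4, 4]], dtype=float)
X_train = np.vstack([m + rng.standard_normal((40, 2)) for m in means])
z_train = np.repeat(np.arange(4), 40)
X_test = np.vstack([m + rng.standard_normal((10, 2)) for m in means])
z_test = np.repeat(np.arange(4), 10)

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
}
accuracy = {name: clf.fit(X_train, z_train).score(X_test, z_test)
            for name, clf in classifiers.items()}
```

On such a linearly separable toy problem all four methods perform well; the differences reported in the paper emerge on the real, less separable SHM features.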

Results and Discussion
To verify the performance and capability of the presented methods, the numerical benchmark model discussed in [31] is considered, see Figure 1. The model is a simply supported beam with length 5 m, height 0.5 m, and width 0.01 m. 15 sensors are assumed to be installed at the top surface of the beam, to provide acceleration time histories in the vertical direction: for each location, the measurement period is assumed to last two seconds, and the measurements thus consist of 4001 data points. A single vertical crack is considered at mid-span, close to the location of sensor #8. Table 1 lists the damage cases allowed for, at varying damage severity, i.e. crack length. Out of all the digital pseudo-experimental data, two acceleration responses, in the undamaged and damaged cases, are chosen for feature extraction with both the considered approaches. Therefore, the acceleration measurements in this study consist of a matrix with 8002 data points, 15 sensors and 7 cases.
Feature selection is used to extract the parameters of the AR model and the components of PCA, using the acceleration data by considering 80% of the data (i.e., 3201 data points) for training, 10% for validation and the remaining 10% for testing, to assess and ensure the accuracy of modelling and of the extracted features. In this study, the Bayesian information criterion is adopted to set the order of the AR model to a = 23, and the least-squares technique is used to estimate the coefficient vector θ of the AR model.
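The BIC-based order selection can be sketched as follows: each candidate order is fitted by least squares, and the order minimizing N·ln(σ²) + a·ln(N) is retained. The synthetic AR(3) signal and the candidate range are illustrative assumptions, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(3)

def ar_bic(y, order):
    """BIC of a least-squares AR(order) fit: N*ln(sigma^2) + order*ln(N)."""
    N = len(y) - order
    # Regressor matrix of lagged samples, as in a standard AR least-squares fit.
    Phi = np.column_stack([y[order - 1 - i: len(y) - 1 - i] for i in range(order)])
    theta, *_ = np.linalg.lstsq(Phi, y[order:], rcond=None)
    resid = y[order:] - Phi @ theta
    return N * np.log(resid @ resid / N) + order * np.log(N)

# Synthetic AR(3) signal: the BIC should drop sharply up to the true order.
theta_true = np.array([0.6, -0.4, 0.2])
y = np.zeros(3000)
for t in range(3, 3000):
    y[t] = theta_true @ y[t - 3:t][::-1] + rng.standard_normal()

bic = {p: ar_bic(y, p) for p in range(1, 9)}
best = min(bic, key=bic.get)
```

The penalty term a·ln(N) discourages over-fitting, so the selected order stays close to the true one instead of growing with every extra lag.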
To adopt the components of PCA as damage-sensitive features, a standardization process has been enforced for the acceleration histories, so as to have data with zero mean and unit variance. To achieve accuracy for the classification task, seven class labels are defined in accordance with the cases gathered in Table 1: hence, in order to classify the structural state, the features obtained with the AR and PCA models in the damaged scenarios 2-7 have been adopted as test data sets.
The process of classification based on LDA, QDA, NB and DT is carried out using the training and test data sets. To summarize the obtained results, it turns out that LDA is not capable of classifying the different damage patterns; in contrast, QDA and NB provide excellent classification results using the AR coefficients as damage-sensitive features. If the PCs are instead used as damage-sensitive features, all the classification methods with the exception of NB fail in giving a reliable output. The results thus demonstrate that NB is the only method that remains reliable and capable regardless of the adopted feature selection technique.
In order to compare in some detail the results of classification using the AR and PCA models, Figure 2a displays the accuracy of the classification methods on the basis of the classification error estimated from the loss function [32], and Figure 2b shows instead the corresponding computing time. The loss function gives a scalar value representing how well the trained model (classifier) classifies the test data. It is shown that NB, independently of the adopted feature selection algorithm, has the highest accuracy and a remarkable computational efficiency; LDA, though rather efficient, cannot provide reliable classification results, with an accuracy never exceeding 20%; DT turns out to be a moderate classification method in terms of accuracy and the best approach with respect to computing time, handling either the AR or the PCA features. Using QDA, the results of classification on the basis of the AR coefficients are rather good, whereas those based on the PCs are not and feature an accuracy smaller than 20%; furthermore, this method turns out to be computationally inefficient.
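The kind of accuracy-versus-time comparison reported in Figure 2 can be reproduced with a zero-one loss and a wall-clock timer; the following sketch does so for two of the classifiers on synthetic data (the data, sizes and choice of scikit-learn are assumptions for illustration only):

```python
import time
import numpy as np
from sklearn.metrics import zero_one_loss
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Three well-separated synthetic classes standing in for the SHM features.
X_train = np.vstack([rng.standard_normal((50, 3)) + 3 * k for k in range(3)])
z_train = np.repeat(np.arange(3), 50)
X_test = np.vstack([rng.standard_normal((15, 3)) + 3 * k for k in range(3)])
z_test = np.repeat(np.arange(3), 15)

report = {}
for name, clf in {"NB": GaussianNB(),
                  "DT": DecisionTreeClassifier(random_state=0)}.items():
    t0 = time.perf_counter()
    pred = clf.fit(X_train, z_train).predict(X_test)
    # Store (classification error, elapsed seconds) for each method.
    report[name] = (zero_one_loss(z_test, pred), time.perf_counter() - t0)
```

The zero-one loss is exactly one minus the accuracy, so it matches the "classification error" plotted against computing time in the paper's comparison.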

Conclusions
In this paper, a comparison has been provided of the performances of some efficient methods to classify the structural conditions of a cracked concrete beam using two well-known feature selection techniques: AR time series modeling, and PCA. For the process of classification, the LDA, QDA, NB and DT classifiers have been adopted. The results showed that LDA is not a reliable classification method in the context of SHM, resulting in a low accuracy of classification and relatively high computational costs for both of the feature selection techniques. By contrast, NB provides classifications with the highest accuracy, coupled with a remarkable computational efficiency. DT is not able to classify the structural conditions as well as NB, yet it outperforms LDA in accuracy and both LDA and NB in terms of computational costs. The results of QDA are affected by the feature selection technique, but it can be concluded that this method has a good classification accuracy, though it is associated with the highest computing time.
The main limitation of the considered classification methods is that they are supervised learning algorithms. The supervised format requires data from the damaged states to always be available, and this represents a practical challenge to the applicability of the classification methods to real-life structures.