Hierarchical Wavelet-Aided Neural Intelligent Identification of Structural Damage in Noisy Conditions

A sophisticated hierarchical neural network model for intelligent assessment of structural damage is constructed by the synergetic action of auto-associative neural networks (AANNs) and Levenberg-Marquardt neural networks (LMNNs). With the model, AANNs aided by the wavelet packet transform are firstly employed to extract damage features from measured dynamic responses and LMNNs are then utilized to undertake damage pattern recognition. The synergetic functions endow the model with a unique mechanism of intelligent damage identification in structures. The model is applied for the identification of damage in a three-span continuous bridge, with particular emphasis on noise interference. The results show that the AANNs can produce a low-dimensional space of damage features, from which LMNNs can recognize both the location and the severity of structural damage with great accuracy and strong robustness against noise. The proposed model holds promise for developing viable intelligent damage identification technology for actual engineering structures.


Introduction
Structural damage identification is an active research focus that has received increasing attention in civil engineering. Structural accidents have once again demonstrated the necessity of developing valid techniques for early-stage structural damage identification [1,2]. In general, structural damage identification can be divided into four progressive levels: judgment of damage occurrence, determination of damage location, discrimination of damage severity, and assessment of remaining structural life [3,4]. Typical damage identification methods found in the literature are usually developed by processing structural dynamic responses. Such methods can be largely categorized into two groups: vibration-based damage detection methods [5][6][7] and wave-propagation-based damage inspection methods [8][9][10].
An active branch of vibration-based damage identification is damage pattern recognition based on artificial intelligence methods [11][12][13][14][15]. The most popular artificial intelligence method used […]
Appl. Sci. 2017, 7, 391
The wavelet packet transform (WPT) is built on wavelet packet functions generated recursively by
$$u_{2n}^{(j)}(t) = \sqrt{2}\sum_{k} h(k)\,u_{n}^{(j)}(2t-k), \qquad u_{2n+1}^{(j)}(t) = \sqrt{2}\sum_{k} g(k)\,u_{n}^{(j)}(2t-k); \qquad n, k = 0, 1, 2, \cdots \quad (1)$$
where $u_{0}^{(0)}(t) = \varphi(t)$ is the scaling function, $u_{1}^{(0)}(t) = \psi(t)$ is the wavelet function, $j$ is the decomposition level, $k$ is the translation parameter, and $n$ is the modulation parameter. The terms $h(k)$ and $g(k)$ are quadrature mirror filters, and the corresponding function sets $H = \{h(k)\}_{k \in \mathbb{Z}}$ and $G = \{g(k)\}_{k \in \mathbb{Z}}$ denote the low-pass filter and the high-pass filter, respectively. With $f_{j}^{i}(t)$ denoting the $i$th component signal at the $j$th level, the recursive relations between the $j$th and the $(j+1)$th level components of the WPT are given in the equations:
$$f_{j}^{i}(t) = f_{j+1}^{2i-1}(t) + f_{j+1}^{2i}(t), \quad (2)$$
$$f_{j+1}^{2i-1}(t) = H f_{j}^{i}(t), \quad (3)$$
$$f_{j+1}^{2i}(t) = G f_{j}^{i}(t). \quad (4)$$
After being decomposed $j$ times, the sum of the component signals can represent the original signal $f(t)$ as:
$$f(t) = \sum_{i=1}^{2^{j}} f_{j}^{i}(t). \quad (5)$$
The process of the WPT for the original signal can be perceived as a binary tree (Figure 1), in which $f(t)$ is at the top layer. At the first decomposition layer, $f(t)$ is decomposed by the filters $H$ and $G$, giving its approximate component signal A and detail component signal D. A and D are then each decomposed by $H$ and $G$ again, leading to two new pairs of approximate and detail component signals, {AA, AD} and {DA, DD}. For the next layer, the filters $H$ and $G$ are used in the same way to process each component signal AA, AD, DA, and DD, creating new pairs of approximate and detail component signals, and so on. In the end, at the $j$th layer, a total of $2^{j}$ component signals are produced. All the component signals generated by the WPT up to the fourth layer are displayed in Figure 1, in which each component signal is termed a wavelet packet node.
In Figure 1, the upper-left part is the wavelet transform (WT) procedure; the component signals with yellow backgrounds, D, AD, AAD, AAAD, and AAAA, are exactly the results of the WT. Hence, the WT is a subset of the WPT. Unlike the WT, which refines only the lower frequency part, the WPT provides an exhaustive decomposition over the whole frequency range, because each detail signal is decomposed again in the same way as the approximation signal. The downsampling operation of the WPT switches the order of the low-pass and high-pass components in a subsequent decomposition, hence producing a frequency content that follows a Gray code sequence [47].
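The binary-tree naming of wavelet packet nodes and the Gray code ordering mentioned above can be illustrated with a short sketch. It manipulates node labels only (no filtering is performed), and the function names `node_labels`, `frequency_order`, and `wt_nodes` are hypothetical helpers introduced here for illustration.

```python
def node_labels(level):
    """Labels of the 2**level wavelet packet nodes in natural (filter-bank) order."""
    labels = []
    for n in range(2 ** level):
        bits = format(n, "0{}b".format(level))  # most significant bit = first split
        labels.append("".join("A" if b == "0" else "D" for b in bits))
    return labels


def frequency_order(level):
    """Reorder the nodes by increasing frequency band using the Gray code."""
    labels = node_labels(level)
    # Band k (k = 0 is the lowest band) is held by the node whose natural
    # index equals the Gray code of k, i.e. k XOR (k >> 1).
    return [labels[k ^ (k >> 1)] for k in range(2 ** level)]


def wt_nodes(level):
    """Nodes kept by the ordinary WT: one detail per layer plus the last approximation."""
    details = ["A" * (j - 1) + "D" for j in range(1, level + 1)]
    return details + ["A" * level]


print(node_labels(2))      # ['AA', 'AD', 'DA', 'DD']
print(frequency_order(2))  # ['AA', 'AD', 'DD', 'DA']
print(wt_nodes(4))         # ['D', 'AD', 'AAD', 'AAAD', 'AAAA']
```

At the second layer the natural order AA, AD, DA, DD becomes AA, AD, DD, DA when sorted by frequency band, which is the swap of low-pass and high-pass outputs caused by downsampling.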

Nonlinear Principal Component Analysis
Nonlinear principal component analysis (NLPCA) is deemed to be a nonlinear generalization of principal component analysis (PCA) [48], generally used for data compression and dimension reduction, and can be realized by the following procedure.
Let $X$ be an original data set of high dimension. $X$ is projected by a nonlinear function vector $U$ into a new data set $T$ of low dimension:
$$T = U(X). \quad (6)$$
$T$ can be projected back into the original space by another nonlinear function vector $V$, giving
$$\hat{X} = V(T), \quad (7)$$
where $\hat{X}$ is the reconstruction of $X$ with the residual error
$$\tilde{X} = X - \hat{X}. \quad (8)$$
The functions $U$ and $V$ are chosen to minimize the residual error. With appropriate $U$ and $V$, Equations (6) and (7) can be used to project a high-dimensional data set $X$ to a low-dimensional data set $T$ with little information loss, which is the essence of NLPCA.
Compared to the PCA, the NLPCA has two distinctive features: (i) it has advantages over the standard PCA in removing not only linear but also nonlinear correlations in variables; (ii) the nonlinear principal components (NPCs) from NLPCA are parallel in importance rather than being in decreasing order like those of the PCA.
Recent studies have reported that artificial neural networks (ANNs) can implement some component analysis methods such as sensitivity analysis [49]. Likewise, they can be used to conduct NLPCA. Auto-associative neural networks (AANNs) have been demonstrated as a sophisticated strategy to carry out NLPCA [50]. The key lies in constructing an auto-associative learning framework by placing the high-dimensional data set $X$ at both the input and target output layers. As shown in Figure 2, AANNs consist of five layers: the input layer, the mapping layer, the bottleneck layer (low-dimensional space), the remapping layer, and the output layer. After being well trained by controlling the residual error defined by Equation (8), AANNs perform Equation (6) by projecting $X$ of high dimension from the input layer to $T$ of low dimension in the bottleneck layer via the mapping layer; meanwhile, AANNs perform Equation (7) by projecting $T$ from the bottleneck layer to $\hat{X}$ in the output layer via the remapping layer.

Alternatively, AANNs can be seen as a combination of two individual networks, NN1 and NN2, with symmetrical structures. NN1 consists of the input layer, the mapping layer, and the bottleneck layer, whereas NN2 consists of the bottleneck layer, the remapping layer, and the output layer. The two networks are integrated by sharing the bottleneck layer. It has been shown that, for a neural network, one hidden layer with a nonlinear activation function is sufficient for modeling arbitrary nonlinear functions, so both NN1 and NN2 are able to model a nonlinear function if the activation functions in their hidden layers are nonlinear [50]. To ensure that the AANNs are capable of NLPCA, the mapping and remapping layers should therefore have nonlinear activation functions.
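The split of an AANN into NN1 and NN2 can be made concrete with a minimal forward-pass sketch. It assumes layer sizes of 64-20-5-20-64 (the sizes used later in the numerical example), sigmoid activations in the mapping and remapping layers, and linear activations in the bottleneck and output layers. The weights are random and untrained, so the sketch only shows the data flow and dimensions, not a working NLPCA; all function names are hypothetical.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x, activation):
    """Apply one layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(v):
    return v


rng = random.Random(0)
M, N, r = 64, 20, 5  # input/output, mapping/remapping, and bottleneck sizes
mapping = make_layer(M, N, rng)
bottleneck = make_layer(N, r, rng)
remapping = make_layer(r, N, rng)
output = make_layer(N, M, rng)


def nn1(x):
    """NN1: input -> mapping (sigmoid) -> bottleneck (linear); plays the role of U."""
    return forward(bottleneck, forward(mapping, x, sigmoid), linear)


def nn2(t):
    """NN2: bottleneck -> remapping (sigmoid) -> output (linear); plays the role of V."""
    return forward(output, forward(remapping, t, sigmoid), linear)


x = [rng.random() for _ in range(M)]  # stand-in for one 64-dimensional sample
t = nn1(x)        # low-dimensional representation
x_hat = nn2(t)    # reconstruction in the original space
print(len(t), len(x_hat))  # 5 64
```

Training would adjust all four layers jointly so that `x_hat` approximates `x`; after training, only `nn1` is needed to produce the low-dimensional features.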

Damage Identification Paradigm
A hierarchical neural intelligent model for structural damage identification is established by integrating the function of AANNs for extracting damage features with that of the LMNNs for recognizing damage patterns.

Wavelet Packet Node Energies (WPNEs)
The WPNEs of structural dynamic responses are damage indices that have recently emerged in intelligent damage identification methods [29,51]. Let $f(t)$ be an acceleration response of a structure subjected to an excitation. The WPNEs can be defined on $f(t)$ as:
$$e_{i} = \sum_{k} \left| d_{j}^{i}[k] \right|^{2}, \quad (9)$$
$$d_{j}^{i}[k] = \int_{-\infty}^{\infty} f(t)\, u_{j,k}^{n}(t)\, \mathrm{d}t, \quad (10)$$
where $i$ labels the $i$th wavelet packet node at the $j$th layer, $d_{j}^{i}[k]$ is the WPT coefficient of the $i$th wavelet packet node, $k$ is the translation parameter, and $u_{j,k}^{n}(t)$ is the wavelet packet function. The principle of using WPNEs to represent damage is that damage modulates the structural dynamic response and therefore changes the WPNE distribution derived from that response. Compared to traditional dynamic features such as natural frequencies or mode shapes, WPNEs are much more sensitive to damage. Moreover, WPNEs are more robust than some newer damage features such as the WPT coefficients. In what follows, WPNEs are used to characterize damage.
The performance of WPNEs in characterizing damage depends on the wavelet packet function and the decomposition level of the WPT. In this study, the wavelet packet function is set to Daubechies 4 (db4) and the number of decomposition layers is set to six. The db4 wavelet is chosen as the mother wavelet by trial and error, which is a general method for determining a proper mother wavelet in WT-based structural damage detection; see, typically, [42,51]. Equations (9) and (10) give rise to the vector of WPNEs, $\mathbf{e}_{p} = [e_{1,p}, e_{2,p}, \cdots, e_{64,p}]^{T}$, where $p$ labels the sample. Although this vector carries sufficient damage information, its high dimensionality causes considerable difficulty in the efficient representation of damage. In particular, it is infeasible to employ the vector of WPNEs directly as damage features in an intelligent damage pattern recognition paradigm. This calls for a procedure for extracting damage features from the WPNEs. The extracted damage features should have low dimensionality while preserving the damage information of the WPNEs.
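As a hedged illustration of how a WPNE vector is obtained, the sketch below runs a six-layer wavelet packet decomposition and sums the squared coefficients of each of the 64 nodes. Two simplifications are assumptions made here, not the paper's settings: the Haar filter pair replaces db4 to keep the code dependency-free, and a synthetic decaying sinusoid of 2048 samples replaces a measured acceleration response.

```python
import math


def haar_split(signal):
    """One filter-bank stage: low-pass (H) and high-pass (G) outputs, downsampled by 2."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet_nodes(signal, level):
    """Coefficients of all 2**level nodes at the given layer, in natural order."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)  # both branches are split again
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes


def node_energies(signal, level):
    """WPNE vector: sum of squared coefficients of each node."""
    return [sum(c * c for c in node) for node in wavelet_packet_nodes(signal, level)]


# Synthetic decaying oscillation sampled at 100 Hz (illustrative stand-in only)
n_samples = 2048  # chosen divisible by 2**6 so every split has even length
signal = [math.exp(-0.2 * k / 100.0) * math.sin(2.0 * math.pi * 3.0 * k / 100.0)
          for k in range(n_samples)]

energies = node_energies(signal, 6)
print(len(energies))  # 64
```

Because the Haar filter pair is orthonormal, the node energies add up to the energy of the signal itself, which is a convenient sanity check on any implementation.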

Damage Feature Extraction
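NLPCA was previously introduced in the field of structural damage detection, where it was mainly used to deal with environmental effects [52,53]. In this article, it is used to build damage features. Considering that the WPNE vector $\mathbf{e}_{p}$ is a column vector, a matrix $E$ is constructed from the WPNE vectors $\mathbf{e}_{p}$ of $m$ samples as follows:
$$E = [\mathbf{e}_{1}, \mathbf{e}_{2}, \cdots, \mathbf{e}_{p}, \cdots, \mathbf{e}_{m}] = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,p} & \cdots & e_{1,m} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,p} & \cdots & e_{2,m} \\ \vdots & \vdots & & \vdots & & \vdots \\ e_{64,1} & e_{64,2} & \cdots & e_{64,p} & \cdots & e_{64,m} \end{bmatrix}. \quad (11)$$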
Extraction of damage features from WPNEs is tackled by the NLPCA strategy implemented by the AANNs, with the structure depicted in Figure 2. The number of neurons at the input layer or output layer ($M$) matches the dimension of the WPNE vector. According to [49], the number of neurons in either the mapping layer or the remapping layer, $N$, should be chosen according to the rule stated as Equation (12). The bottleneck layer produces the NPCs that represent the extracted damage features. Therefore, the number of NPCs ($r$) equals the number of neurons in the bottleneck layer, and $r$ is chosen to ensure that the extracted damage features retain sufficient damage information stored in the WPNEs. The activation functions in the mapping and remapping layers are the sigmoid functions $\sigma$ and $\lambda$, respectively, whereas the activation functions at the bottleneck and output layers are the linear functions $\beta$ and $\delta$, respectively, as indicated in the circle of the corresponding layer in Figure 2. The cost function, measured by the mean squared error (MSE), directs the training of the AANNs, with the MSE defined by [50]:
$$\mathrm{MSE} = \frac{1}{mM} \sum_{p=1}^{m} \sum_{i=1}^{M} \left( e_{i,p} - \hat{e}_{i,p} \right)^{2}, \quad (13)$$
where $m$ is the number of samples, and $e_{i,p}$ and $\hat{e}_{i,p}$ are the target values and prediction values of the AANNs, respectively.
When the inputs and the target outputs are both set to the WPNEs, the AANNs are forced to build a mapping from the inputs to the outputs through the bottleneck layer. This mapping produces a low-dimensional representation of the WPNEs, i.e., the NPCs of the WPNEs. In a well-trained AANN, the matrix of NPCs $C_{r \times m}$ with $r \ll 64$ is generated from the bottleneck layer:
$$C = [\mathbf{c}_{1}, \mathbf{c}_{2}, \cdots, \mathbf{c}_{p}, \cdots, \mathbf{c}_{m}], \quad (14)$$
where $\mathbf{c}_{p}$ is the vector of NPCs for the $p$th damage sample. The NPCs carry almost all the damage information of the WPNEs but have much lower dimensionality. This property makes the NPCs valid damage features for intelligent damage pattern recognition.

Damage Pattern Recognition
With the damage features extracted by AANNs as inputs, LMNNs are utilized to perform damage pattern recognition. The LMNNs are feed-forward backpropagation neural networks trained with the Levenberg-Marquardt algorithm, which endows them with distinctive efficiency and precision in convergence [54]. As shown in Figure 3, the LMNNs consist of three layers: an input layer, a hidden layer, and an output layer. The Greek letters in the neuron circles represent the activation functions of the corresponding layer. In the hidden layer, the activation function $\varepsilon$ is the sigmoid function, and in the output layer, $\theta$ is the linear function. The inputs are the NPCs constructed from the AANNs, $C_{r \times m}$ with $r \ll 64$; the outputs $S_{z \times m}$ describe the spatial element distribution of the structure being inspected (Figure 3). Each entry takes a value from 0 to $a$ that denotes the extent of the damage, with $a$ denoting the upper limit of the severity of possible damage. The position of the entry in $S_{z \times m}$ indicates the damaged element of the structure.
To train the LMNNs, a certain number of damaged cases of the test structure need to be considered to construct the training sample set. Further damaged cases, different from the training ones, are required as testing samples to assess the generalization ability of the trained network. For the particular model proposed here, owing to the validity of the chosen inputs, relatively few training samples are sufficient to train the LMNNs adequately.
During the course of training, the LMNNs learn the underlying relationship between damage features (inputs) and damage states (outputs) by iteratively adjusting the weights and biases of the hidden layer and output layer. The MSE between the outputs and the targets is taken as the cost function of the LMNNs. After being well trained, the LMNNs provide a nonlinear mapping between damage features and damage states. This mapping can then be used to recognize new, unknown structural damage states.
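For readers unfamiliar with the Levenberg-Marquardt algorithm that drives the LMNNs, the following minimal sketch shows its damped Gauss-Newton update on a two-parameter curve fit rather than on a neural network. The model `a * tanh(b * x)`, the synthetic data, the starting point, and the factor-of-ten damping schedule are all illustrative assumptions and not the authors' settings.

```python
import math


def model(params, x):
    a, b = params
    return a * math.tanh(b * x)


def residuals(params, xs, ys):
    return [model(params, x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """Partial derivatives of the model with respect to (a, b) at each x."""
    a, b = params
    rows = []
    for x in xs:
        th = math.tanh(b * x)
        rows.append((th, a * x * (1.0 - th * th)))
    return rows


def mse(res):
    return sum(r * r for r in res) / len(res)


def lm_fit(params, xs, ys, mu=1e-2, iterations=50):
    """Levenberg-Marquardt iterations with a simple accept/reject damping rule."""
    cost = mse(residuals(params, xs, ys))
    for _ in range(iterations):
        res = residuals(params, xs, ys)
        jac = jacobian(params, xs)
        # Damped normal equations: (J^T J + mu I) delta = -J^T r
        h11 = sum(j[0] * j[0] for j in jac) + mu
        h12 = sum(j[0] * j[1] for j in jac)
        h22 = sum(j[1] * j[1] for j in jac) + mu
        g1 = sum(j[0] * r for j, r in zip(jac, res))
        g2 = sum(j[1] * r for j, r in zip(jac, res))
        det = h11 * h22 - h12 * h12
        delta = ((-g1 * h22 + g2 * h12) / det, (-g2 * h11 + g1 * h12) / det)
        trial = (params[0] + delta[0], params[1] + delta[1])
        trial_cost = mse(residuals(trial, xs, ys))
        if trial_cost < cost:
            # Step accepted: move closer to Gauss-Newton behaviour
            params, cost, mu = trial, trial_cost, mu / 10.0
        else:
            # Step rejected: move closer to gradient descent with a shorter step
            mu *= 10.0
    return params, cost


xs = [0.1 * i for i in range(1, 21)]
ys = [model((2.0, 0.5), x) for x in xs]  # noise-free synthetic targets
start = (1.0, 1.0)
initial_cost = mse(residuals(start, xs, ys))
fitted, final_cost = lm_fit(start, xs, ys)
print(initial_cost, final_cost)
```

In a neural network the parameter vector holds all weights and biases and the Jacobian is obtained by backpropagation, but the update has the same form. Training stops once the cost falls below a chosen threshold.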

Hierarchical Neural Network Model
The synergetic action of AANNs and LMNNs constructs a hierarchical neural network model for intelligent damage assessment, as shown in Figure 4. In the process of damage identification, AANNs are responsible for damage feature extraction and LMNNs account for damage pattern recognition. Distinctive features of the proposed model in damage assessment are: (i) WPNEs are much more sensitive to damage than natural frequencies and mode shapes, and more robust than WPT coefficients. (ii) AANNs acting as a smart NLPCA tool can extract damage features from WPNEs. Such extracted damage features have lower dimensionality than WPNEs while preserving enough damage information. (iii) LMNNs can capture the underlying relations between damage features and damage states, on the basis of which they can recognize structural damage patterns. (iv) The special structure of the hierarchical neural network model requires only a small set of training samples of damaged cases to produce accurate damage identification results with great noise robustness. (v) The hierarchical neural network model is easily implemented in a computational language, e.g., Matlab, to create an automatic program for intelligent damage assessment.

Numerical Verification
Three-span continuous bridges are common in civil infrastructure. The effectiveness of the proposed model is verified by detecting damage in a three-span continuous bridge (Figure 5) with span lengths of 15, 20, and 15 m, similar to the engineering case used in [55]. The material properties of the bridge are listed in Table 1, and the bridge is modeled using ten beam elements. An impact force with a maximum magnitude of 100 N is exerted at the midpoint of the bridge, and the acceleration response of this point is measured. The acceleration response is recorded for 20 s at a sampling frequency of 100 Hz. From the acceleration response, the traditional modal properties, including natural frequencies and mode shapes, are first analyzed for damage characterization. Unfortunately, for damage with a stiffness reduction ratio (SRR) below 10%, the maximum change in these traditional parameters induced by the damage is less than 5%, which is insufficient to reflect the damage. This example, illustrating the inability of traditional modal characteristics to portray damage, calls for a more sophisticated method of damage identification for the bridge.

Damage Cases
Damage is simulated by reducing the stiffness of a finite element of the bridge. Damage severity is described by the SRR. For each element, a set of 30 damaged cases is generated with SRRs ranging from 1% to 30% (Table 2). In total, 150 damaged cases are created in which damage occurs independently, with various severities, in one element of half of the bridge; only half of the bridge is considered because of the structure's symmetry.
For each damaged case, an impact force is exerted to vibrate the bridge, with the acceleration responses measured at the midpoint of the bridge.

Damage Feature Extraction
For preprocessing, the WPT is employed to process the acceleration responses and generate WPNEs. Let $f_{p}(t)$ be the acceleration response generated from the $p$th damaged case. The db4 wavelet, selected by trial and error, is used to decompose $f_{p}(t)$ up to the sixth layer, resulting in 64 WPNEs. Figure 6a,b show the WPNEs of the intact case and their normalized forms, where the normalization is conducted as:
$$\tilde{e}_{i,p} = \frac{e_{i,p} - \bar{e}}{\sigma}, \quad (15)$$
where $\bar{e}$ and $\sigma$ are the mean value and the standard deviation of the WPNE vector.
In accordance with the procedure described in Section 3.2, the AANNs are employed to extract damage features from the WPNEs. By the trial-and-error method, the optimal number of neurons in the bottleneck layer of the AANNs is determined as 5; by the rule stated in Equation (12), the number of neurons in the mapping layer as well as the remapping layer is set to 20. With the WPNEs placed at both the input layer and the output layer, the AANNs are trained under the control of the cost function, i.e., the MSE. Once the AANNs are well trained, as indicated by the MSE reaching the stop criterion, the outputs of the bottleneck layer give five NPCs. These NPCs are the damage features extracted from the WPNEs, as illustrated in Figure 6c for the intact case.
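The normalization of a WPNE vector by its mean and standard deviation, described above, can be sketched in a few lines. The population form of the standard deviation (division by the number of entries) is an assumption, since the text does not state which form is used, and the short example vector is made up for illustration.

```python
import math


def normalize(energies):
    """Subtract the mean of the vector and divide by its standard deviation."""
    n = len(energies)
    mean = sum(energies) / n
    # Population standard deviation (assumed; the sample form divides by n - 1)
    std = math.sqrt(sum((e - mean) ** 2 for e in energies) / n)
    return [(e - mean) / std for e in energies]


example = [4.0, 1.0, 0.5, 0.25, 0.25]  # made-up energies, not data from the paper
normalized = normalize(example)
print(normalized)
```

After this step the vector has zero mean and unit variance, so nodes with very different energy magnitudes contribute on a comparable scale when the AANNs are trained.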
The damaged cases listed in Table 2 can be divided into five groups in terms of the damaged elements (locations), from Ele 1 to Ele 5, with Ele being an abbreviation of element. Damaged cases in the same group differ from each other in damage severity. Visualization of the NPCs is used to indicate their ability to characterize damage. The first three NPCs of all damage cases are plotted in Figure 7a, with Figure 7b-d providing progressively zoomed-in displays of the portion covered by the zoom window. In these figures, there are five dotted curves extending in different directions, each comprising a sequence of separated dots. Each curve denotes a group of damage cases with the damage located at the same element; each dot of a curve labels a damage case of specific damage severity. The dispersion of the dotted curves indicates the effectiveness of the NPCs in reflecting different damage locations; the separation of the dots within one curve indicates the ability of the NPCs to characterize different damage severities. Clearly, the NPCs can be used to characterize not only the damage location but also the damage severity, even in damaged cases with an SRR of less than 5%.

Damage Pattern Recognition
Following damage feature extraction using the AANNs, the LMNNs are used to recognize the damage location and severity. The inputs of the LMNNs are the extracted damage features as illustrated in Figure 6c, and the outputs are five-dimensional vectors $[S_{\mathrm{Ele}\,1}, S_{\mathrm{Ele}\,2}, S_{\mathrm{Ele}\,3}, S_{\mathrm{Ele}\,4}, S_{\mathrm{Ele}\,5}]$, in which $S_{\mathrm{Ele}\,k}$ denotes the SRR of the $k$th element of the bridge. $S_{\mathrm{Ele}\,k}$ takes values in the interval [0, 0.3], meaning that the maximum severity of damage considered is SRR = 30%. With the SRR varying from 1% to 30% for each element, 150 damaged cases are generated, as listed in Table 3. These damaged cases and the intact case are divided into a training set and a testing set of samples. The training set consists of 16 cases: the intact case and 15 damaged cases with SRR = 10%, 20%, and 30% for each element. The testing set comprises the 135 damaged cases with SRRs differing from those in the training set. The training sample set is used to train the LMNNs, with the MSE as the cost function controlling the training.
The LMNNs are considered well trained when the MSE reaches the stop criterion of $2 \times 10^{-5}$. This criterion is determined by a trial-and-error method to ensure enough accuracy (avoiding excessive training) and high efficiency. At this stage, fed with the damage feature vectors of the test samples, the LMNNs produce the estimated vectors $[S_{\mathrm{Ele}\,1}, S_{\mathrm{Ele}\,2}, S_{\mathrm{Ele}\,3}, S_{\mathrm{Ele}\,4}, S_{\mathrm{Ele}\,5}]$ for every sample. The difference between the estimated vectors and the target output vectors indicates the error of the damage identification. For instance, for the result corresponding to Ele 1, Figure 8a presents the output of Ele 1 estimated by the LMNNs for all the test samples; Figure 8b presents the associated target outputs; Figure 8c depicts the difference between the estimated values and the target outputs, i.e., the damage identification error. Clearly, the Ele 1 output of the LMNNs approximates the target very well.
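The construction of the target vectors and the split into training and testing sets described above can be sketched as follows. The names `ELEMENTS`, `TRAIN_SRR`, `INTACT`, and `target_vector` are hypothetical, and the sketch covers only the bookkeeping of cases, not the finite element simulation or the network training.

```python
ELEMENTS = 5                    # damaged elements on half of the bridge
SRR_PERCENT = range(1, 31)      # stiffness reduction ratios 1% ... 30%
TRAIN_SRR = {10, 20, 30}        # severities used for training
INTACT = (None, 0)              # the undamaged case


def target_vector(element, srr_percent):
    """Five-dimensional target: SRR at the damaged element, zero elsewhere."""
    vec = [0.0] * ELEMENTS
    if element is not None:
        vec[element - 1] = srr_percent / 100.0
    return vec


cases = [(e, s) for e in range(1, ELEMENTS + 1) for s in SRR_PERCENT]
train = [INTACT] + [c for c in cases if c[1] in TRAIN_SRR]
test = [c for c in cases if c[1] not in TRAIN_SRR]

print(len(cases), len(train), len(test))  # 150 16 135
print(target_vector(3, 20))               # [0.0, 0.0, 0.2, 0.0, 0.0]
```

The position of the nonzero entry encodes the damage location and its value encodes the severity, so a single output vector answers both questions at once.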
For all the test samples, the damage identification results are given in Figure 9a-c for the estimated values, the target outputs, and the identification error, respectively. First, the damage location can be detected with great accuracy; there are few incorrect judgments of the damage location. As for damage severity, some errors prevent a very precise prediction of the severity of damage. However, those errors are within a small range. As seen in Figure 9c, the errors are all below an upper limit of about seven, indicating that the detection may only fail for some minor damage. In short, the results show that the proposed hierarchical neural network model can effectively detect both the location and the severity of damage.
Notably, the proposed model has considerable generalization capability; it can capture the underlying relations between damage features and damage states on the basis of a fairly limited number of training samples.

Robustness Against Noise
Robustness against measurement noise is a key factor in assessing the performance of damage detection methods [56,57]. The capability of the proposed model to identify damage in noisy conditions is examined by using noisy acceleration responses to simulate actual measurements. The noisy acceleration response is generated by adding random Gaussian white noise to the acceleration response obtained from the numerical simulation of the bridge. The noise intensity is quantified by the signal-to-noise ratio (SNR):

$$\mathrm{SNR} = 20 \log_{10}\left(\frac{A_S}{A_N}\right)$$

where $A_S$ and $A_N$ denote the root mean squares of the numerical acceleration response and the noise, respectively. Three high noise levels, SNR = 10 dB, 20 dB, and 30 dB, are considered in the analysis of damage identification. At each noise level, the hierarchical neural network model is implemented, with the identification results presented in Figure 10a-c for SNR = 10 dB, 20 dB, and 30 dB, respectively. The figures show that (i) the damage location can be discriminated by the model, with errors obstructing the judgment only in some cases of minor damage; and (ii) in the noisy conditions the model gives an excellent prediction of damage severity, even better than that for the noise-free condition. The effect of noise on damage identification is indicated by the MSE and the maximum error, given in Table 4. The MSE shows that the model still functions in noisy conditions with quite a low error; the maximum error shows that the errors mostly stay below a medium level, so that they may only interfere with the identification of minor damage. It can therefore be concluded that the proposed hierarchical neural network model can locate and quantify damage in the bridge with great accuracy and strong robustness against noise.
The robustness of the proposed model against noise is primarily attributed to two factors, the WPT and the NLPCA. ① WPT. The WPT is an integral transform that has a denoising function. Moreover, a WPNE resulting from the WPT is formed by summing the squared wavelet packet node coefficients, an operation that assimilates noise. ② NLPCA. WPNEs of lower magnitude are commonly dominated by noise. Because of their insignificance, such WPNEs are almost eliminated when NLPCA is conducted for feature extraction, which reduces the noise. Together, these two factors endow the proposed model with robustness against noise.
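The noise contamination and the WPNE computation discussed above can be illustrated with a short NumPy sketch. It rests on several assumptions that are not taken from the paper: the SNR is taken in its common amplitude form, SNR = 20 log10(A_S / A_N), with A_S and A_N the RMS values of signal and noise; the noise is rescaled so that the target SNR is met exactly for the generated sample; a Haar wavelet packet stands in for whatever wavelet the authors used; and the test signal is a synthetic two-tone sine rather than a bridge response. The function names `add_white_noise` and `haar_wpne` are hypothetical.

```python
import numpy as np


def rms(x):
    """Root mean square of a 1-D signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def add_white_noise(signal, snr_db, rng=None):
    """Add Gaussian white noise so that 20*log10(A_S / A_N) equals snr_db.

    Returns (noisy_signal, noise). The noise sample is rescaled to hit the
    target RMS exactly (an assumption made for reproducibility).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    target_noise_rms = rms(signal) / (10.0 ** (snr_db / 20.0))
    noise *= target_noise_rms / rms(noise)
    return signal + noise, noise


def haar_wpne(signal, level):
    """Wavelet packet node energies (sum of squared coefficients per node).

    Uses an orthonormal Haar wavelet packet; nodes are returned in natural
    (not frequency) order, giving 2**level energies.
    """
    x = np.asarray(signal, dtype=float)
    if x.size % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [x]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            even, odd = node[0::2], node[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    return np.array([np.sum(node ** 2) for node in nodes])


# Synthetic stand-in for an acceleration response (not bridge data).
t = np.arange(2048) / 2048.0
clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
clean_energy = haar_wpne(clean, level=3)

for snr_db in (10, 20, 30):
    noisy, noise = add_white_noise(clean, snr_db, np.random.default_rng(0))
    achieved = 20.0 * np.log10(rms(clean) / rms(noise))
    shift = np.abs(haar_wpne(noisy, level=3) - clean_energy) / clean_energy.sum()
    print(f"SNR target {snr_db} dB, achieved {achieved:.2f} dB, "
          f"max relative WPNE shift {shift.max():.4f}")
```

Because the Haar packet transform is orthonormal, the node energies sum to the total signal energy, which makes it easy to compare how much each WPNE shifts when noise is added at the different SNR levels.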

Comparison with Traditional Methods
The performance of the proposed model is compared with that of a traditional neural-network-based damage assessment method that integrates general linear PCA and back-propagation neural networks (BPNNs) to identify structural damage [58]. By way of illustration, this method is used to repeat the procedure of identifying damage in the bridge under noisy conditions. As an alternative to the NLPCA, the PCA is employed to extract damage features from the WPNEs, and the BPNNs are then utilized to recognize damage patterns. Fed with the damage features extracted using PCA, the BPNNs predict the location and severity of the damage, giving the damage identification results shown in Figure 11. The damage prediction has an MSE of 5.365 and a maximum error of 12.763. Conspicuously, this traditional model does not identify the severity of the damage with great accuracy. In particular, a relatively significant identification error occurs in the cases where the damage is located in Ele 2 and Ele 5. A comparison of Figure 11 with Figure 9 readily shows that the proposed hierarchical identification model assesses the damage with greater accuracy and outperforms the traditional method. This conclusion is further verified quantitatively: for the other cases as well, the proposed model yields a much lower MSE and a smaller maximum error than the traditional one. The comparison between the two methods demonstrates the significance and effectiveness of the proposed hierarchical neural network model.
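For readers who want to reproduce the feature-extraction step of the traditional baseline and the two error measures used in the comparison, a minimal NumPy sketch follows. It is an assumption-laden illustration: linear PCA is computed by singular value decomposition of the mean-centred data, the input matrix is synthetic rather than real WPNEs, the BPNN stage is omitted, and the names `pca_features`, `mse`, and `max_error` are hypothetical.

```python
import numpy as np


def pca_features(X, n_components):
    """Linear PCA via SVD of the mean-centred data matrix.

    X has shape (n_samples, n_variables), e.g. one row of WPNEs per case.
    Returns (scores, components, explained_ratio) for the leading components.
    """
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]            # principal directions (rows)
    scores = centred @ components.T           # low-dimensional features
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    return scores, components, explained[:n_components]


def mse(estimated, target):
    """Mean squared error over all samples and output dimensions."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))


def max_error(estimated, target):
    """Largest absolute deviation between estimate and target."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(target, dtype=float)
    return float(np.max(np.abs(diff)))


# Synthetic stand-in for a WPNE matrix: 40 cases, 8 energies, 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((40, 2))
mixing = rng.standard_normal((2, 8))
wpne_like = latent @ mixing + 0.01 * rng.standard_normal((40, 8))

scores, components, explained = pca_features(wpne_like, n_components=2)
print("feature matrix shape:", scores.shape)
print("variance explained by 2 components:", round(float(explained.sum()), 4))
```

In such a pipeline the `scores` would be fed to the pattern-recognition network, and `mse` and `max_error` would be evaluated on the network outputs against the target SRR vectors.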

Concluding Remarks
This study proposes a hierarchical neural network model for intelligent damage identification of bridge structures. The model combines the function of AANNs in extracting damage features from WPNEs with the function of LMNNs in recognizing damage patterns. The model is applied to damage identification in a three-span bridge structure, which substantiates its high accuracy in identifying the location and severity of damage and its strong robustness against noise. The marked strength of the model in portraying damage is attributed to three factors: (i) The WPNEs contain much richer damage information than some traditional modal features, in that they carry not only modal but also non-modal damage information; (ii) The damage features, i.e., NPCs, extracted from WPNEs using AANNs inherit the advantages of both the WPT in denoising and the NLPCA in accommodating noise. Firstly, the noise is spread during the wavelet packet decomposition, and its effect is greatly weakened in this procedure. Secondly, using NLPCA to process the energy also removes part of the noise, because this technique omits some less important information in the original data. Hence, the damage features can characterize damage in severely noisy conditions; (iii) LMNNs are powerful in capturing the underlying relations between damage features and damage states, making it possible to recognize damage patterns efficiently. The proposed model provides a prototype for creating effective technologies to identify damage intelligently in realistic, complex engineering structures.