In this section, the proposed GW-CNN-based FCD method is introduced. First, GW features extracted from multiple GW exciting-acquisition channels are used to construct the CNN input vector. Then, a CNN model is designed and trained for FCD. Finally, the trained CNN model can be applied to a monitored structure for reliable FCD.
2.1. Multi-Channel and Multi-GW Features Extraction
GW is a kind of elastic wave that propagates in plate-like structures. It can travel a long distance with small energy loss; hence, it can be used to monitor a relatively large structural area. The piezoelectric transducer (PZT) is a conventional kind of sensor used to excite and receive GW in a structure. A typical sensor network configuration is shown in Figure 1, where $J$ exciting-acquisition channels are formed, so that $J$ GW signals are obtained from one sensor network. After the GW is excited in the structure by a PZT, the interaction of the GW with the crack influences the GW propagation. By comparing the signals received under healthy and cracked conditions, the length of the crack can be estimated.
Many kinds of DIs can be defined to evaluate the variations between the baseline signal collected when the structure is healthy and the monitoring signal when the crack propagates in the structure. These DIs are extracted from GW signals in the time domain, frequency domain, and time–frequency domain.
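For illustration, the sketch below shows one commonly used kind of DI, based on the correlation coefficient between the baseline and monitoring signals. The function name and this particular DI definition are illustrative and not necessarily among the DIs used in this paper.

```python
import numpy as np

def correlation_di(baseline: np.ndarray, monitoring: np.ndarray) -> float:
    """Correlation-coefficient-based damage index (DI).

    DI = 1 - |rho|, where rho is the Pearson correlation between the
    baseline signal (healthy structure) and the monitoring signal
    (possibly cracked structure). The DI is 0 for identical signals
    and grows as the crack increasingly distorts the propagating GW.
    """
    rho = np.corrcoef(baseline, monitoring)[0, 1]
    return 1.0 - abs(rho)
```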
As mentioned above, various GW features can be obtained for one crack length, which provides the potential for reliable FCD through the data fusion of CNN. Based on this, a multi-channel and multi-GW feature extraction method is proposed as follows. Firstly, under one crack length, GW signals from the $J$ exciting-acquisition channels are acquired, recorded as $\{s_1, s_2, \ldots, s_J\}$. Then, from one GW signal $s_j$, $d$ kinds of DIs are chosen for GW feature extraction, recorded as $DI_{j1}, DI_{j2}, \ldots, DI_{jd}$, $j \in \{1, 2, \ldots, J\}$. Finally, the extracted DIs are arranged into a feature vector $\boldsymbol{p}$ in order of channel, as shown in Equation (1):

$$\boldsymbol{p} = \left[DI_{11}, \ldots, DI_{1d},\ DI_{21}, \ldots, DI_{2d},\ \ldots,\ DI_{J1}, \ldots, DI_{Jd}\right]^{\mathrm{T}} \tag{1}$$
The feature vector $\boldsymbol{p}$ is standardized to $\bar{\boldsymbol{p}}$ to obtain a similar data distribution [23] for each input feature, which achieves a better learning efficiency and improves the accuracy of FCD.
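A minimal sketch of this preprocessing step is given below, assuming the DIs have already been computed per channel and that standardization means removing the mean and dividing by the standard deviation of each feature over the historical data set; the array shapes and function names are illustrative.

```python
import numpy as np

def build_feature_vector(di_matrix: np.ndarray) -> np.ndarray:
    """Arrange DIs into the input vector p of Equation (1).

    di_matrix has shape (J, d): row j holds the d DIs extracted from
    the GW signal of channel j. Flattening row by row keeps the DIs
    grouped in order of channel.
    """
    return di_matrix.reshape(-1)

def standardize(P: np.ndarray) -> np.ndarray:
    """Standardize each feature over all samples (rows of P)."""
    return (P - P.mean(axis=0)) / P.std(axis=0)

# Example: J = 6 channels, d = 4 DIs per channel -> 24-element vector p.
p = build_feature_vector(np.random.rand(6, 4))
```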
2.2. CNN Based Fatigue Crack Diagnosis
Based on the above data-preprocessing procedure, the DIs from multiple channels are formed into a one-dimensional (1D) feature vector $\bar{\boldsymbol{p}}$, which is designed as the CNN input. Crack lengths are divided into $z$ crack sizes for classification; each crack size is an interval that contains a certain range of crack lengths. A softmax classifier [24] is adopted as the output layer to express the $z$ crack sizes, which makes the CNN output a vector of the form $\boldsymbol{q} = [q_1, q_2, \ldots, q_z]^{\mathrm{T}}$. The classification result corresponds to the class that has the highest output value. For example, the desired output $\boldsymbol{q}$, which denotes the $k$th crack size, is defined as a vector with $z$ elements in which the $k$th element value $q_k = 1$ and the rest are 0.
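The sketch below illustrates this one-hot desired output; the helper name is illustrative, and the encoding itself is standard.

```python
import numpy as np

def desired_output(k: int, z: int) -> np.ndarray:
    """One-hot desired output q for the k-th crack size (1-indexed)
    out of z classes: q_k = 1 and all other elements are 0."""
    q = np.zeros(z)
    q[k - 1] = 1.0
    return q

# e.g., the 3rd of 7 crack sizes -> [0, 0, 1, 0, 0, 0, 0]
print(desired_output(3, 7))
```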
In this study, the GW signals corresponding to different crack lengths are obtained from known structures and are called historical data. The feature vector obtained from the historical data and its corresponding desired output $\boldsymbol{q}$ form an input sample $\{\bar{\boldsymbol{p}}, \boldsymbol{q}\}$. These historical input samples are utilized for CNN training.
CNN is designed to process data that come in the form of multiple arrays, for example, two-dimensional data such as pictures or audio spectrograms, and 1D data such as signals and sequences [19]. There are four key ideas behind CNN that take advantage of the properties of input data: local connections, shared weights, pooling, and the use of multiple layers [16]. Typically, a CNN for classification is composed of five parts [18]: convolutional layers (CL), pooling layers (PL), a flatten layer (FL), fully-connected layers (FCL), and a classification layer, as shown in Figure 2, where $M$ and $L$ represent the total numbers of convolutional layers and fully-connected layers, respectively. The convolutional and pooling layers are alternated layer by layer to extract features from the input data, the flatten layer transforms the outputs of the pooling layer into a 1D feature set, and the last two parts are employed for classification from the learned features.
The convolutional layer, as the name suggests, utilizes the operation of convolution to process input data. One convolutional layer contains several convolutional kernels (or filters), for example, $E_l$ convolutional kernels at the $l$th ($1 \le l \le M$) layer. Each convolutional kernel consists of a certain number of trainable weights. One convolutional feature at the $l$th layer is calculated as follows:

$$y_{ij}^{l} = f\left(\boldsymbol{k}_i^{l} * \boldsymbol{x}_j^{l} + b_i^{l}\right) \tag{2}$$

where the asterisk $*$ denotes the convolutional operation; $y_{ij}^{l}$ represents the convolutional feature of the $i$th convolutional kernel when it slides to the $j$th region of the input vector, where $i \in \{1, 2, \ldots, E_l\}$; $f$ represents the activation function; $\boldsymbol{x}_j^{l}$ represents the vector composed of the elements in the $j$th region of the input vector; and $\boldsymbol{k}_i^{l}$ and $b_i^{l}$ denote the $i$th convolutional kernel and its bias.
The activation function is defined as the Rectified Linear Unit (ReLU) function, which has a simpler derivative than the traditional tanh and sigmoid functions, leading to faster training. The mathematical expression of the ReLU function is as follows:

$$f(x) = \max(0, x) \tag{3}$$
Each convolutional kernel slides over the input vector; all the convolutional features obtained by the $i$th convolutional kernel then constitute its convolutional output $\boldsymbol{y}_i^{l}$:

$$\boldsymbol{y}_i^{l} = \left[y_{i1}^{l},\ y_{i2}^{l},\ \ldots,\ y_{in_l}^{l}\right] \tag{4}$$

where $n_l$ denotes the number of regions that the kernel slides over at the $l$th layer.
Different convolutional kernels attain different convolutional outputs, which fuse the information of the input vector in different ways. That is to say, different perspectives of features that are beneficial to classification can be extracted. Furthermore, the convolutional layer provides the characteristics of local connections and shared weights, which reduce the number of parameters, significantly reduce the computational costs, and provide a certain robustness to local noise [19].
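A minimal NumPy sketch of Equations (2)–(4) for a single 1D convolutional layer is given below; a stride of 1 and "valid" sliding are assumed, and the variable names are illustrative.

```python
import numpy as np

def relu(x):
    """ReLU activation of Equation (3)."""
    return np.maximum(0.0, x)

def conv1d_layer(x: np.ndarray, kernels: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """1D convolutional layer implementing Equations (2) and (4).

    x       : input vector, shape (n,)
    kernels : E_l kernels, shape (E_l, kernel_size)
    biases  : one bias per kernel, shape (E_l,)
    Returns the E_l convolutional outputs, shape (E_l, n - kernel_size + 1).
    """
    E_l, ksize = kernels.shape
    n_regions = x.size - ksize + 1
    out = np.empty((E_l, n_regions))
    for i in range(E_l):                      # i-th convolutional kernel
        for j in range(n_regions):            # j-th region of the input
            region = x[j:j + ksize]           # x_j^l
            out[i, j] = relu(kernels[i] @ region + biases[i])  # y_ij^l
    return out
```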
The pooling layer is usually connected after the convolutional layer and is used to sub-sample features with maximum pooling, average pooling, or other operations. Assuming the pooling size is $c$, the maximum pooling feature of the $j$th region at the $l$th pooling layer can be expressed as:

$$p_{ij}^{l} = \max\left(\boldsymbol{y}_{ij}^{l}\right) \tag{5}$$

where $p_{ij}^{l}$ represents the pooling feature when the pooling window slides to the $j$th region of the $i$th convolutional output, and $\boldsymbol{y}_{ij}^{l}$ represents the $j$th region (of length $c$) of the $i$th convolutional output.
The vector composed of all pooling features after sliding is the pooling output, denoted as $\boldsymbol{p}_i^{l}$. The key idea of pooling is to reduce the amount of data transferred to the next layer. Moreover, pooling takes typical features as its outputs, which gives it the property of invariance [19].
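The following sketch implements Equation (5) for one convolutional output; non-overlapping pooling windows of size c are assumed, which is a common choice but an assumption here.

```python
import numpy as np

def max_pool1d(y: np.ndarray, c: int) -> np.ndarray:
    """Non-overlapping max pooling of Equation (5).

    y : one convolutional output y_i^l, shape (n,)
    c : pooling size; the output keeps the maximum of each
        consecutive, non-overlapping region of length c.
    """
    n_regions = y.size // c                   # drop any incomplete tail region
    return y[:n_regions * c].reshape(n_regions, c).max(axis=1)

# Example: [1, 3, 2, 8, 5, 4] with c = 2 -> [3, 8, 5]
print(max_pool1d(np.array([1., 3., 2., 8., 5., 4.]), 2))
```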
In consequence, the framework of the proposed CNN-based FCD method is described in Figure 3. Notably, the proposed CNN model has a nine-layer configuration, which consists of one input layer, three convolutional layers followed by one pooling layer, a flatten layer, two fully-connected layers, and an output layer. In the CNN model, to obtain features that are less affected by uncertainties, the first convolutional layer (CL1) is used to extract features from a GW signal and, at the same time, to fuse crack information from different channels of GW signals. As mentioned above, the DIs are arranged into the input vector in order of channel; therefore, the crack information from the same channel is contained in $d$ DIs that are arranged sequentially in the input feature vector. Accordingly, based on the above analysis, a set of kernels with a size of $d$ is utilized in CL1. In this study, each element of the feature vector contains useful crack information and the number of elements is rather small; a pooling layer may therefore lead to the loss of valuable information, because it only chooses one typical feature as its output in each pooling area. Hence, no pooling layer follows CL1; instead, two more convolutional layers are added as CL2 and CL3. Taking the pyramid shape mentioned in Leslie's paper [25] into consideration, the kernel size decreases throughout the architecture to obtain better performance. Next, a pooling layer (PL) is added, followed by a flatten layer which transforms the outputs of PL into a 1D feature set. Then, two fully-connected layers, denoted FCL1 and FCL2, are added; the outputs of the flatten layer are employed as the inputs of FCL1. These two fully-connected layers are designed as hidden layers that map the learned features into the CNN outputs and select the features that are less affected by uncertainties, which leads to a higher FCD accuracy. In addition, the L2 regularization method [26] is applied in the fully-connected layers during the training process to reduce the possibility of overfitting.
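A sketch of this nine-layer configuration in Keras is shown below, purely for illustration: the numbers of kernels, the decreasing kernel sizes, the pooling size, the neuron counts of FCL1/FCL2, and the regularization coefficient are all placeholder assumptions; only the layer ordering (input, CL1–CL3, PL, FL, FCL1–FCL2, softmax output) follows the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

J, d, z = 6, 4, 7          # channels, DIs per channel, crack sizes (placeholders)
lam = 1e-3                 # L2 regularization coefficient (placeholder)

model = tf.keras.Sequential([
    layers.Conv1D(16, d, activation="relu",
                  input_shape=(J * d, 1)),              # CL1: kernel size d
    layers.Conv1D(16, 3, activation="relu"),            # CL2: smaller kernel (pyramid shape)
    layers.Conv1D(16, 2, activation="relu"),            # CL3: smallest kernel
    layers.MaxPooling1D(pool_size=2),                   # PL
    layers.Flatten(),                                   # FL
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l2(lam)),  # FCL1 (L2-regularized)
    layers.Dense(16, activation="relu",
                 kernel_regularizer=regularizers.l2(lam)),  # FCL2 (L2-regularized)
    layers.Dense(z, activation="softmax"),              # softmax classification layer
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),  # Adam, Equations (10)-(12)
              loss="categorical_crossentropy")           # cross-entropy, Equation (6)
```

Conveniently, Keras's default glorot_uniform kernel initializer corresponds to the Xavier initialization of Equation (7) below. After training, a standardized feature vector from a monitored structure can be passed to `model.predict` to obtain the diagnosed crack size.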
The training of a CNN is a procedure of optimizing the connection weights and biases, which are trained by minimizing the cost function over the training data. In the proposed method, the training data are the input vectors at different crack lengths collected from the historical data, as mentioned above. The cost function of the proposed model is defined as the cross-entropy function, in accordance with the softmax classifier that forms the output layer of the model, and is given as the following equation:

$$C = -\sum_{k=1}^{z} q_k \ln \hat{q}_k + \frac{\lambda}{2}\sum_{l}\sum_{j=1}^{m_l}\sum_{i=1}^{m_{l+1}} \left(w_{ji}^{l}\right)^{2} \tag{6}$$

where $\hat{q}_k$ denotes the $k$th CNN output; $w_{ji}^{l}$ denotes the weight that connects the $j$th neuron at the $l$th layer and the $i$th neuron at the $(l+1)$th layer; and $m_l$ denotes the neuron number at the $l$th layer. In Equation (6), the first term is the cross-entropy between the CNN output and the desired output, and the second term is the L2 regularization part, in which $\lambda$ denotes the regularization coefficient. Here, the weights are initialized with the Xavier method [27], which makes the weights of each layer obey the following uniform distribution:

$$w^{l} \sim U\left[-\frac{\sqrt{6}}{\sqrt{m_l + m_{l+1}}},\ \frac{\sqrt{6}}{\sqrt{m_l + m_{l+1}}}\right] \tag{7}$$

The bias is initialized to $b^{l} = 0$. Then, the weights and biases are updated with the Adam optimization method [28], in which an independent adaptive learning rate is designed for each parameter by calculating the first- and second-order moment estimations of the gradient. In the Adam optimization method, the first-order moments of the weight gradient and bias gradient are first initialized as $V_{dw} = V_{db} = 0$, and the second-order moments as $I_{dw} = I_{db} = 0$. Next, based on the backpropagation algorithm, the recurrence relation for the sensitivity can be written as follows:

$$\delta_j^{l} = f^{l\prime} \sum_{i=1}^{m_{l+1}} w_{ji}^{l}\, \delta_i^{l+1} \tag{8}$$
where $\delta_j^{l}$ and $\delta_i^{l+1}$ denote the sensitivities of the $j$th neuron at the $l$th layer and the $i$th neuron at the $(l+1)$th layer, respectively, and $f^{l\prime}$ denotes the derivative of the activation function at the $l$th layer. Then, the gradient can be calculated by the following expressions:

$$\frac{\partial C}{\partial w_{ji}^{l}} = a_j^{l}\, \delta_i^{l+1}, \qquad \frac{\partial C}{\partial b_i^{l+1}} = \delta_i^{l+1} \tag{9}$$

where $a_j^{l}$ denotes the output of the $j$th neuron at the $l$th layer.
After that, $V_{dw}$, $V_{db}$, $I_{dw}$, and $I_{db}$ can be updated as follows:

$$V_{dw} = \beta_1 V_{dw} + (1-\beta_1)\,dw, \qquad V_{db} = \beta_1 V_{db} + (1-\beta_1)\,db$$
$$I_{dw} = \beta_2 I_{dw} + (1-\beta_2)\,dw^{2}, \qquad I_{db} = \beta_2 I_{db} + (1-\beta_2)\,db^{2} \tag{10}$$

where $dw$ and $db$ denote the weight gradient and bias gradient, $k$ denotes the number of iterations, and $\beta_1$ and $\beta_2$ are parameters that are generally preset as 0.9 and 0.999, respectively. Then, the updated values are corrected as follows:

$$\hat{V}_{dw} = \frac{V_{dw}}{1-\beta_1^{k}}, \qquad \hat{V}_{db} = \frac{V_{db}}{1-\beta_1^{k}}, \qquad \hat{I}_{dw} = \frac{I_{dw}}{1-\beta_2^{k}}, \qquad \hat{I}_{db} = \frac{I_{db}}{1-\beta_2^{k}} \tag{11}$$
where the hat represents the corrected value. Finally, the weight and bias are updated by the following equations:

$$w = w - \alpha \frac{\hat{V}_{dw}}{\sqrt{\hat{I}_{dw}} + \varepsilon}, \qquad b = b - \alpha \frac{\hat{V}_{db}}{\sqrt{\hat{I}_{db}} + \varepsilon} \tag{12}$$

where $\alpha$ denotes the learning rate and $\varepsilon$ is set to prevent the denominator from being 0, which can be $10^{-8}$.
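The following sketch implements the Adam update of Equations (10)–(12) for a single parameter array, assuming the gradient has already been computed by backpropagation; the paper's V/I notation is kept in place of the more common m/v.

```python
import numpy as np

def adam_step(w, dw, V_dw, I_dw, k,
              alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter array w with gradient dw.

    k is the 1-indexed iteration count; V_dw and I_dw are the running
    first- and second-order moments (initialized to zeros).
    """
    V_dw = beta1 * V_dw + (1 - beta1) * dw          # Equation (10)
    I_dw = beta2 * I_dw + (1 - beta2) * dw ** 2
    V_hat = V_dw / (1 - beta1 ** k)                 # Equation (11)
    I_hat = I_dw / (1 - beta2 ** k)
    w = w - alpha * V_hat / (np.sqrt(I_hat) + eps)  # Equation (12)
    return w, V_dw, I_dw

# The bias update is identical with (b, db, V_db, I_db) in place of
# (w, dw, V_dw, I_dw).
```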
The training of the CNN is finished when the cost function is close to convergence. Eventually, the input feature vector from a similar monitored structure is fed into the trained model to obtain its diagnosed crack size.