1. Introduction
Recent years have seen the growing popularity of nondestructive evaluation (NDE) in the assessment of infrastructure aging. One of the most promising NDE techniques for bridge defects is deep learning, which offers various advantages: the portability of devices and the swiftness of data collection and processing. Hence, deep learning enjoys great application potential in fast defect detection of real-world bridges.
Deep learning is an important field of machine learning [1,2]. Unlike traditional methods for image recognition, deep learning, which mimics the visual perception of humans, expresses image features in an abstract form rather than actively establishing the key features. Based on deep learning, artificial intelligence (AI) [3,4] is realized in computing systems by setting up artificial neural networks (ANNs). Below is a brief review of relevant studies on the application of ANNs in image classification tasks.
Gopalakrishnan et al. [5] introduced deep transfer learning to detect surface cracks of roads made of hot mix asphalt (HMA) and Portland cement concrete (PCC), and observed the best detection effect on a single-layer neural network (NN) after feature training. Based on a convolutional neural network (CNN), Gulgec et al. [6] conducted finite-element simulations of cracked gusset plate connections in steel bridges, and differentiated between defective and healthy bridge samples. Wang et al. [7] proposed a sliding window-based CNN method, which uses the deep learning frameworks AlexNet and GoogLeNet to classify bridge damages.
Chen and Jahanshahi [8] trained a CNN on the track defects in the Photometric Stereo Images Database. The defects are cavities on the track surface, which indicate progressive surface degradation before track breakage. Cha et al. [9] proposed an NN to classify images of bridge surfaces, which compares the expected reflection features of defective bridge surfaces with those of intact bridge surfaces. Specifically, the classical CNN is trained in a purely supervised manner, and the influence of different regularization methods is discussed, including unsupervised layer-wise pre-training and augmentation of bridge defect datasets.
Wu et al. [10] developed a CNN-based approach to detect fatigue cracks from real crack images at triangular plate joints, the most fatigue-prone joints of steel bridges. In this approach, the image datasets are collected from previous fatigue tests and bridge inspections under uncontrolled settings. Next, a benchmark CNN architecture pre-trained on large general image datasets is adopted to transfer the deep learning features to bridge crack classification tasks. Zhang et al. [11] put forward a novel road crack detection algorithm based on deep learning and adaptive image segmentation. First, a deep CNN was trained to judge the presence of cracks in images. Then, the crack-containing images were smoothed by a bilateral filter to minimize the number of noise pixels. Lastly, the cracks were extracted from the pavement through adaptive thresholding.
Pan et al. [12] designed a machine learning algorithm that adapts well to actual scenarios. Fan et al. [13] detected structural defects by machine learning: the signals were collected via nondestructive tests, and defectiveness was evaluated based on these signals. Ding et al. [14] identified bridge cracks using Welch's power spectrum and a generalized regression neural network (GRNN). Guan et al. [15] used surface acoustic waves and an NN to identify pavement cracks of bridges. To determine material damage on bridge surfaces, Yam et al. [16] identified the vibration of composite structures through wavelet transform and a deep learning-based NN.
Peng et al. [17] decomposed gray images of various roads into approximate and detailed sub-images through wavelet transform, created a feature vector based on the energy features of the wavelet, and computed the gray level co-occurrence matrix (GLCM) features of the detailed sub-images. Lastly, a backpropagation neural network (BPNN) was optimized using eight characteristic parameters, and trained to achieve the best results. Sharif Razavian et al. [18] summed up the merits of the CNN over the standard NN, which include the ability to capture the grid topology of road surfaces, the need for fewer computing iterations due to neuron sparsity and the pooling operation, and excellent results in road defect detection.
With the aid of the CNN, Soukup and Huber-Mörk [19] found defects on railway surfaces, which enhanced the manufacturing quality of railway workpieces. Qiao et al. [20] designed a practical method to detect defects on bridge surfaces, which integrates the CNN with pattern recognition. Shang L. [21] combined image processing and the CNN to detect defects on bridge surfaces. In light of deep learning, Yang and Zhao [22] proposed a CNN-based defect detection algorithm for steel surfaces, which establishes a CNN model and datasets to extract and detect steel surface defects automatically. Prasanna et al. [23] achieved 90% detection accuracy through thousands of crack tests on bridge surfaces.
German et al. [24] integrated thresholding and template matching to measure the spalling degree of post-seismic bridges, and observed that the integrated strategy achieved a mean accuracy of 81.1% and a sensitivity of 80.2% on horizontally-exposed concrete. Chen et al. [25] adopted a deep NN to detect different types of defects. Four independent classifiers were trained on one or several types of bridge surface defects, and realized an accuracy of 79.5% and a sensitivity of 65.0%, fully demonstrating the classification ability of the deep NN. You et al. [26] proposed a deep learning method that directly estimates bridge defects from outdoor images without relying on expensive customized sensors. Imani et al. [27] proposed a Bayesian decision-making framework for the control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Xie et al. [28] constructed a classification rule for nonstationary data through linear discriminant analysis (LDA), using a linear Gaussian state space model. Imani et al. [29] proposed an approach for finite-horizon control of partially-observed Boolean dynamical systems (POBDS) with uncertain, continuous controlled inputs and an infinite observation space. Imani et al. [30] derived an optimal Bayesian estimator for damage state and parameters, and created deep learning features based on the CNN and a recurrent neural network (RNN). The CNN and the RNN were connected with shortcuts: the former captures the full view of the detection target, while the RNN mimics the transfer of human attention. The extension of the CNN to the RNN makes defect detection more accurate, providing a novel deep learning approach in the industrial field.
In light of the above analysis, this paper attempts to develop an intelligent classification system for surface defects on cement concrete bridges. First, a complete dataset of concrete bridge surface defects was established, and the tags of seven classes of defects were set up according to the current code in China. Next, the sample images were preprocessed through morphology-based weight adaptive denoising. Lastly, the parameters of the VGG-16, a classic CNN, were optimized to realize intelligent identification and classification of surface defects.
3. Basic Principles of Deep Learning
3.1. The CNN
The CNN is a deep ANN capable of convolutional computation, and a representative algorithm of deep learning. Mimicking the visual perception of humans, the CNN supports supervised learning, unsupervised learning, and semi-supervised learning. Over the years, this network has been successfully applied to fields such as natural language processing, voice recognition, and image classification.
The CNN model generally consists of the convolutional layer, the pooling layer, the fully connected layer, and the Softmax classification layer. Among them, the convolutional and pooling operations can be repeated many times to optimize the filtering effect. The relatively mature CNN models include LeNet, AlexNet, GoogLeNet, VGG-16Net, and ResNet. All of them have performed well in the field of artificial intelligence (AI).
3.1.1. Convolutional Layer
Each convolutional layer carries a kernel. During the operation, the kernel slides over different areas of the input image according to preset parameters (e.g., the stride), and outputs a convoluted feature map. Through the convolutional layer, features can be extracted from the input image by the following convolutional computation:

$y_{a,d} = F\left( \sum_{i} \sum_{j} w_{i,j} \, x_{a+i-1,\, d+j-1} + b \right)$

where $x$ is the 2D vector of the input image; $A$ and $D$ are the length and width of this 2D vector, respectively; $w$ is the convolutional kernel; $j$ and $i$ index the length and width of the kernel, respectively; $b$ is the bias added to each element; $y_{a,d}$ is the convolutional result; and $F$ is the activation function.
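As a concrete illustration of the computation above, a minimal NumPy sketch of a valid-mode 2D convolution follows (the 4 × 4 input, the 3 × 3 averaging kernel, and the identity activation are illustrative assumptions, not values from this paper; as is common CNN practice, the kernel is applied without flipping):

```python
import numpy as np

def conv2d(x, w, b=0.0, activation=lambda z: z):
    """Valid-mode 2D convolution followed by an activation F, per the equation above."""
    A, D = x.shape   # input length and width
    I, J = w.shape   # kernel length and width
    out = np.empty((A - I + 1, D - J + 1))
    for a in range(out.shape[0]):
        for d in range(out.shape[1]):
            out[a, d] = np.sum(x[a:a + I, d:d + J] * w) + b
    return activation(out)

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4 × 4 input
w = np.ones((3, 3)) / 9.0                      # 3 × 3 averaging kernel
y = conv2d(x, w)
print(y.shape)   # (2, 2)
```

Each output element corresponds to one placement of the kernel; with a 4 × 4 input and a 3 × 3 kernel, the resulting feature map is 2 × 2.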
3.1.2. Pooling Layer
After feature extraction through the convolutional layer, the resulting feature map is transferred to the pooling layer for further feature extraction and information filtering. Common pooling methods include max pooling, average pooling, and global average pooling, of which max pooling is the most classic. For instance, take a single-channel 6 × 6 input matrix without padding: max pooling with a filter size of 2 × 2 and a stride of 2 outputs a single-channel 3 × 3 matrix. Therefore, max pooling halves each spatial dimension of the feature map, while the number of information channels remains unchanged.
Figure 5 explains the max pooling method of this paper.
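The 6 × 6 example above can be reproduced directly in NumPy (the input values are arbitrary placeholders):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    """Max pooling: each output element is the maximum of one pooling window."""
    h, w = x.shape
    out = np.empty((h // stride, w // stride))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = x[r*stride:r*stride + size, c*stride:c*stride + size].max()
    return out

x = np.arange(36, dtype=float).reshape(6, 6)   # the 6 × 6 example from the text
y = max_pool(x)
print(y.shape)   # (3, 3) — each spatial dimension halved
```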
3.1.3. Fully Connected Layer
The fully connected layer appears at the end of the CNN. The output of convolution/pooling is flattened into a single vector of values, each representing a probability that a certain feature belongs to a label. In this layer, each neuron is connected to every neuron in the preceding layer, such that the distributed features extracted and learned in the previous layers can be mapped into the tag space. In other words, all the previously obtained features are consolidated into a unified output, which reduces the burden of subsequent classification tasks. To improve nonlinear expression, the fully connected layer is usually followed by an activation function.
Figure 6 presents the structure of a relatively simple, fully connected layer.
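A minimal sketch of this flatten-and-map step, assuming a hypothetical 3 × 3 × 8 pooled feature map and the seven defect tags used later in this paper (the weights are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((3, 3, 8))   # pooled features: 3 × 3 spatial, 8 channels
flat = feature_map.reshape(-1)                 # flattened into a single 72-value vector

n_tags = 7                                     # one output per defect class
W = rng.standard_normal((n_tags, flat.size)) * 0.01
b = np.zeros(n_tags)
logits = W @ flat + b                          # every neuron sees every input value
print(logits.shape)   # (7,)
```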
3.2. Activation Function
In the neural network (NN) architecture, nonlinear functions are often introduced to activate inputs or outputs. These functions are known as activation functions. Stacking these functions improves the nonlinear fitting ability of the CNN. Typical activation functions include the sigmoid, tanh, and ReLU functions.
3.2.1. Sigmoid Function
The sigmoid function can be expressed by the equation below:

$\sigma(x) = \dfrac{1}{1 + e^{-x}}$
The sigmoid function is one of the earliest and most popular activation functions. However, it has two clear defects. First, the function easily leads to vanishing gradients, which prevent neurons from receiving signals or updating weights. Second, the output of the sigmoid function (Figure 7a) is not zero-centered but lies in the interval (0, 1), i.e., the function always outputs positive results. Thus, the weight gradients are either all positive or all negative, which adds to the optimization difficulty.
3.2.2. Tanh Function
The tanh function can be expressed by the equation below:

$\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
As shown in Figure 7b, the tanh function controls the output within [−1, 1], which overcomes the second defect of the sigmoid function and facilitates the optimization process. Nevertheless, the tanh function also faces the problem of a vanishing gradient.
3.2.3. ReLU Function
The rectified linear unit (ReLU) function can be expressed by the equation below:

$\mathrm{ReLU}(x) = \max(0, x)$
Compared with the sigmoid and tanh functions, the ReLU function (Figure 7c) alleviates the vanishing gradient problem and speeds up convergence. Therefore, this function yields smaller errors in the backpropagation process.
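The contrast among the three activation functions can be checked numerically; the sketch below evaluates each gradient at an arbitrary large input to show where the gradient vanishes:

```python
import numpy as np

sigmoid   = lambda x: 1.0 / (1.0 + np.exp(-x))
d_sigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))   # peaks at 0.25, vanishes for large |x|
d_tanh    = lambda x: 1.0 - np.tanh(x) ** 2             # also vanishes for large |x|
relu      = lambda x: np.maximum(0.0, x)
d_relu    = lambda x: float(x > 0)                      # constant 1 on the positive side

x = 10.0   # an arbitrary large pre-activation
print(d_sigmoid(x))   # ~4.5e-05 — the gradient has all but vanished
print(d_tanh(x))      # ~8.2e-09
print(d_relu(x))      # 1.0 — no vanishing for positive inputs
```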
3.3. Transfer Learning
Transfer learning refers to transferring knowledge acquired in one field to related disciplines, which eliminates the need to relearn the entire knowledge system from scratch. Applied to image classification, transfer learning mainly relocates the knowledge acquired from training on the original task to the target task.
In this paper, the surface defects of cement concrete bridges are detected based on the VGG-16. Specifically, the VGG-16 was trained on the ImageNet dataset, and the weights and parameters of the trained model were migrated to our model. Thus, our model could achieve a better detection effect on cement concrete bridges.
4. Model Construction and Experiments
4.1. Construction and Optimization of the Visual Geometry Group Network-16 (VGG-16) Model
The VGG-16 is a CNN with great strength in image classification. A typical VGG-16 consists of 13 convolutional layers and three fully connected layers. Relatively large kernels are replaced with stacks of two or three 3 × 3 kernels. Through this replacement, the nonlinear fitting ability is enhanced, and the same receptive field can be covered with fewer parameters.
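The parameter saving can be verified with simple arithmetic. Assuming C input and output channels per layer (C = 64 here is an illustrative choice) and ignoring bias terms, two stacked 3 × 3 layers cover the same 5 × 5 receptive field as one 5 × 5 layer, and three stacked 3 × 3 layers cover a 7 × 7 receptive field:

```python
C = 64   # assumed number of input and output channels per layer; bias terms omitted

two_3x3   = 2 * (3 * 3 * C * C)   # two stacked 3 × 3 layers → 5 × 5 receptive field
one_5x5   = 5 * 5 * C * C
three_3x3 = 3 * (3 * 3 * C * C)   # three stacked 3 × 3 layers → 7 × 7 receptive field
one_7x7   = 7 * 7 * C * C

print(two_3x3, one_5x5)      # 73728 102400 — 28% fewer parameters
print(three_3x3, one_7x7)    # 110592 200704 — about 45% fewer
```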
In this case, each input image is subjected to feature extraction on the convolutional layers with a stride of 1. Then, the resulting feature map receives max pooling with a stride of 2 and a pooling window of 2 × 2, which halves the planar size of the feature map. Through multiple convolutions, pooling operations, and ReLU activations, the reduced feature map is inputted into three fully connected layers with 4096, 4096, and 1000 channels, respectively. The greater the number of channels, the lower the spatial resolution; thus, the input image can be smoothly converted, in terms of dimension, into the classification vector. Lastly, the obtained data are classified by a Softmax classifier with 1,000 tags. The proposed VGG-16 model is illustrated in Figure 8.
The VGG-16 model contains a large number of weight parameters, which were trained on the large ImageNet image database, so it has a strong ability for deep feature learning and excellent performance in image classification and target detection (Figure 9). The depth of the configurations increases from the left (A) to the right (E), as more layers are added (the added layers are shown in bold). The convolutional layer parameters are denoted as "conv<receptive field size>-<number of channels>". The rectified linear unit (ReLU) activation function is not shown for brevity.
In this paper, the principles of fine tuning and transfer learning are implemented to save the cost and time of training the entire network. The mature VGG-16 model and its trained parameters were directly adapted to recognize the seven classes of surface defects on cement concrete bridges. The improved network can be divided into convolutional, pooling, fully connected, and Softmax classification layers.
To further optimize network performance, the ReLU function was adopted as the activation function. The three fully connected layers of the VGG-16 model were reduced to two, and the Softmax classifier with 1,000 tags was changed into a Softmax classification layer with seven tags. Through these fine tunings, the VGG-16 model can efficiently and accurately recognize the surface defects of cement concrete bridges.
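The seven-tag Softmax classification layer reduces to a short function (the logits below are hypothetical class scores, not outputs of the trained model):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the seven defect-class logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0, 0.1, 0.0, -0.5, 1.2])  # hypothetical scores, one per class
p = softmax(logits)
print(p.sum())          # ≈ 1.0
print(int(p.argmax()))  # 0 — the predicted defect class
```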
Overall, the VGG-16 takes its name from its 16 convolutional or fully connected layers: five groups of convolutional layers containing 2, 2, 3, 3, and 3 layers, respectively, plus three fully connected layers (16 = 2 + 2 + 3 + 3 + 3 + 3).
4.2. Experimental Steps
The parameters of the pre-trained VGG-16 model were migrated to the surface defect detection model for cement concrete bridges, which follows the principles of fine tuning and transfer learning. In addition, the fully connected layers and Softmax classification layer of the VGG-16 model were adjusted. Our experiments were carried out in the following steps.
The images of the seven defect classes were randomly extracted from the database of defect samples of cement concrete bridges, and inputted as the training set.
The images were corrupted by noise during collection and transfer. Hence, the morphology-based weight adaptive denoising method was applied to enhance the image features and eliminate useless noise from the images. Then, the input images were scaled and cropped to the standard size of 224 × 224.
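The morphology-based weight adaptive denoising method is not detailed in this section; as a stand-in, the sketch below illustrates the general idea with a plain grayscale opening (erosion followed by dilation, with an assumed 3 × 3 flat structuring element) plus a center crop to 224 × 224:

```python
import numpy as np

def erode(img, k=3):
    """Grayscale erosion with a k × k flat structuring element (edges padded)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = padded[r:r + k, c:c + k].min()
    return out

def dilate(img, k=3):
    """Grayscale dilation: the dual of erosion (windowed maximum)."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = padded[r:r + k, c:c + k].max()
    return out

def opening(img, k=3):
    """Morphological opening (erosion then dilation) suppresses small bright noise."""
    return dilate(erode(img, k), k)

def center_crop(img, size=224):
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

img = np.zeros((256, 256))
img[100, 100] = 255.0                    # an isolated bright noise pixel
clean = center_crop(opening(img), 224)
print(clean.shape)   # (224, 224)
print(clean.max())   # 0.0 — the noise speck is removed
```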
The multi-class logistic regression objective was optimized through backpropagation with mini-batch gradient descent with momentum (batch size: 256; momentum: 0.9). To avoid overfitting, the weights were regularized by weight decay (L2 penalty factor: 5 × 10−4) and dropout was applied to the first two fully connected layers (dropout ratio: 0.5). The learning rate was initialized as 10−2, and then divided by 10 whenever the accuracy on the test set no longer improved.
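One update of the optimizer described above can be written out explicitly (mini-batch SGD with momentum 0.9, learning rate 10−2, and L2 weight decay 5 × 10−4 from the text; the weight vector and gradient are toy values):

```python
import numpy as np

lr, momentum, weight_decay = 1e-2, 0.9, 5e-4   # values stated in the text
w = np.array([1.0, -2.0])                      # toy weight vector
v = np.zeros_like(w)                           # momentum buffer

def sgd_step(w, v, grad):
    """One mini-batch step: L2 weight decay folded into the gradient, then momentum."""
    g = grad + weight_decay * w
    v = momentum * v - lr * g
    return w + v, v

w, v = sgd_step(w, v, np.array([0.5, -0.5]))
print(w)   # [ 0.994995 -1.99499 ]
# when test accuracy plateaus, the learning rate is divided by 10
```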
Based on the pre-trained VGG-16 model, the three fully connected layers were reduced to two and the Softmax classifier with 1,000 tags was replaced with a Softmax classification layer with seven tags.
The pre-trained parameters of the VGG-16 were migrated to the 13 convolutional layers and pooling layers of the surface defect detection model for cement concrete bridges through fine tuning and transfer learning.
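Framework aside, the migration in this step amounts to copying the convolutional weights and re-initializing a smaller head; a plain-dictionary sketch follows (layer names, shapes, and values are stand-ins for the real ImageNet-trained weights):

```python
import numpy as np

rng = np.random.default_rng(42)

# pretrained parameters keyed by layer name (stand-ins for the real ImageNet weights)
pretrained = {f"conv{i}": rng.standard_normal((3, 3)) for i in range(1, 14)}
pretrained.update({name: rng.standard_normal((10, 10)) for name in ("fc1", "fc2", "fc3")})

# new model: migrate the 13 convolutional layers, re-initialise a smaller two-layer head
model = {name: w.copy() for name, w in pretrained.items() if name.startswith("conv")}
model["fc_new1"] = rng.standard_normal((10, 10)) * 0.01      # trained from scratch
model["fc_softmax7"] = rng.standard_normal((7, 10)) * 0.01   # seven defect tags

print(len(model))   # 15 = 13 migrated conv layers + 2 new head layers
```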
Before training, the momentum, learning rate, and training time were configured manually. The parameters of the first 13 convolutional and pooling layers were kept constant, and the remaining layers were randomly initialized to obtain the initial loss. The model was then iteratively optimized through error backpropagation and gradient descent, which aims to train the parameters of the subsequent two fully connected layers and the Softmax classification layer with seven tags. The optimization was terminated when the value of the loss function reached its minimum. At termination, the weights and biases of the model were optimal, corresponding to the best modeling effect.
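The stopping and learning-rate policy described above can be sketched as a toy schedule (the validation accuracies, `patience`, and `min_lr` below are made-up illustrative values, not settings from this paper):

```python
def train(epochs, initial_lr=1e-2, patience=3, min_lr=1e-5):
    """Divide the learning rate by 10 whenever validation accuracy fails to
    improve for `patience` consecutive epochs; never drop below `min_lr`."""
    lr, best, stale, history = initial_lr, 0.0, 0, []
    fake_accuracy = [0.60, 0.72, 0.80, 0.80, 0.80, 0.80, 0.84, 0.84, 0.84, 0.84]
    for acc in fake_accuracy[:epochs]:
        if acc > best:
            best, stale = acc, 0     # progress: keep the current learning rate
        else:
            stale += 1
            if stale >= patience:    # plateau: divide the learning rate by 10
                lr, stale = max(lr / 10, min_lr), 0
        history.append(lr)
    return history

print(train(10))
```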
The remaining images of the seven defect classes were extracted from the database and used to verify the established surface defect detection model. The accuracy and efficiency of the model were tested through the steps in Figure 10.