Review of Vibration-Based Structural Health Monitoring Using Deep Learning

With rapid progress in deep learning technology, deep learning is increasingly being used for vibration-based structural health monitoring. When vibration is used to extract features for system diagnosis, it is important to correlate the measured signal with the current status of the structure. The measured vibration responses of monitored systems show large deviations in their spectral and transient characteristics. Consequently, diagnosis using vibration requires a thorough understanding of the extracted features so that the influence of the surrounding environment and other unnecessary variations can be discarded. Deep-learning-based algorithms are expected to find increasing application in these complex problems owing to their flexibility and robustness. This review summarizes studies that apply machine learning algorithms to fault monitoring. The studies are categorized according to vibration factors. A brief interpretation of deep neural networks is provided to guide further applications in structural vibration analysis.


Introduction
Vibration originates from many different sources, including fluid flow, magnetic fields, translating and rotating elements, imbalances, interactions and frictional contacts. The vibration mode depends on the structural properties, geometry and boundary conditions, which may change with time. Consequently, the transient variation of the vibration response shows large statistical deviations. Despite the difficulties with interpretation of the measured values, the response provides important information about the status of a structure. Advances in engineering technology have increased the size and complexity of structures. To minimize the costs induced by unexpected failure, the need for continuous structural health monitoring is increasing. Using vibration to monitor structural integrity allows for robust, efficient and straightforward implementation. However, using vibration for fault inspection requires understanding of multidisciplinary research fields. According to Worden and Manson [1], the fundamental issue with structural health monitoring is to identify whether a system has departed from the normal state. Rytter [2] proposed a four-level method to identify damage: (1) determination of the existence of damage to the structure; (2) determination of the location of the damage in the structure; (3) quantification of the degree of damage; (4) estimation of the life expectancy of a structure influenced by damage.
Several attempts have been made to apply machine learning to structural health monitoring. Rapid advances have led to the application of deep learning [3] to vibration-based health diagnosis. Kar and Mohanty [4] and Singh and Sa'ad Mohammed [5] used Fourier and wavelet transforms of the vibration signal to detect defects in gearboxes and induction machines. Konar et al. [6] investigated the efficiency of selecting features and using them as learning patterns for inspection. Data-driven methods have been used for the fault diagnosis of complex engineering systems because they require little understanding of the kinematics and physics of the tested system [7]. The data-driven approach requires two steps: feature extraction and fault classifier design. Time, frequency and time-frequency analyses are widely used to extract features from the vibration responses of a rotating machine. This approach provides very accurate diagnostic performance at the laboratory scale. When it is applied to actual operating systems, however, the accuracy depends on the physical properties, size or position. Vibration has also been used to monitor the health of composite laminate structures. The natural frequency, damping, mode shape curvature, transfer function and frequency response are used to analyze the transient responses. Defects have been diagnosed by using the time of flight of a high-frequency guided wave transmitted through a structure. Low-frequency vibration has disadvantages for localization because the entire structure is excited [8]. High-frequency surface wave propagation can be utilized for precise measurement, but the requirement of a high-voltage power source increases the cost of the monitoring system.
Inverse analysis methods have been proposed to analyze the fracture of structures. The lifespan of a structure decreases as a crack grows. Inverse problems have been used to estimate the location of unknown damage in a structure [9,10]. Khatir et al. [11][12][13][14][15] proposed an optimal technique by applying an inverse problem to FEM. Zenzen et al. [16] used frequency response functions (FRFs) to find the position and quantity of damage in a truss structure based on the genetic algorithm (GA) and bat algorithm (BA). The algorithms with better accuracy and computational time were identified. Tiachacht et al. [17] modified the Cornwell indicator (CI). The modified Cornwell indicator (MCI) improved the accuracy and localization compared with the CI.
Designing a fault classifier requires condition monitoring data for offline training. Failure data from the field are difficult to obtain owing to their limited availability. Labelled training data are therefore generated from laboratory testbeds or from smaller systems. Testbed data are not guaranteed to match those expected from the actual system. To fill this gap, many deep-learning-based approaches have been proposed. Recent progress in machine learning technology has rapidly expanded its application to vibration-based structural health monitoring. This review summarizes recent attempts at actual applications. This information will be useful when proposing relevant methods for specific systems requiring inspection and maintenance. Studies using deep learning for health monitoring are introduced with respect to their vibration characteristics.
Magalhães et al. [51] studied the aging and degradation of a bridge. To carry out an ambient vibration test, twelve force balance accelerometers were installed inside the deck box girder to measure the lateral and vertical accelerations. They monitored the time evolution of the natural frequency with the surrounding temperature, natural frequency variation with the RMS values of the vertical acceleration, average day evolution of the modal damping ratios and natural frequency change during the working days to observe possible changes in the status of the bridge. Regression analysis and PCA permit the detection of very small frequency shifts. Kessler et al. [52] used the frequency response to detect damage in composite materials. They manufactured composite specimens with holes, reduced modulus and delamination and used a laser vibrometer to measure vibrations. Each specimen was adhered to two lead zirconate titanate (PZT) wafers for excitation. The extent of damage and reduction in natural frequency strongly corresponded to the clamped composite plates. The finite element model (FEM) has been used to compare the frequency responses in experiments. To identify the damage to a
$$\sum_{i=1}^{k} \alpha_{ki}^{2} = 1 \qquad (2)$$
or equivalently, $\alpha_k^{\top} \alpha_k = 1$, using standard vector notation.
In agglomerative clustering algorithms, each data point is initially set as its own cluster. When the distance between data is smaller than a specified value, they are merged into the same cluster. Hierarchical and nonhierarchical methods exist for calculating the distance of data in a cluster during the merge process [74]. A hierarchical method, such as agglomerative clustering, retains the memory of the previous clusters when clustering again after a merger, so the relationship between clusters is used in the clustering process. A nonhierarchical method, such as k-means clustering or DBSCAN, treats the clustered data as belonging to exactly the same group. The distance between clusters can be calculated as the minimum, maximum or mean distance between all points in different clusters [75]. The algorithm ends when the number of clusters falls below the initially set value.
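The merging procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited studies: the function names, the `method` argument and the sample feature vectors are all assumptions, and the loop stops once the requested number of clusters is reached.

```python
import math

def linkage_distance(a, b, method="mean"):
    """Distance between two clusters, each given as a list of points (tuples)."""
    d = [math.dist(p, q) for p in a for q in b]
    if method == "min":          # minimum distance between the clusters
        return min(d)
    if method == "max":          # maximum distance between the clusters
        return max(d)
    return sum(d) / len(d)       # mean distance between the clusters

def agglomerative(points, n_clusters, method="mean"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]      # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = linkage_distance(clusters[i], clusters[j], method)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge cluster j into cluster i
        del clusters[j]
    return clusters

# Hypothetical 2-D feature vectors forming two well-separated groups
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9)]
groups = agglomerative(features, n_clusters=2, method="mean")
```

Switching `method` between `"min"`, `"max"` and `"mean"` changes only how the distance between two clusters is measured; the merge loop itself stays the same.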
The DBSCAN algorithm performs clustering based on the density of data [76]. When a certain number of data exist within a given radius of a data point, that point is designated a core point and the surrounding points are designated border points. After the calculation has been applied to all data, if a border point of one core point (c_1) is itself the core point (c_2) of another cluster, the two clusters fall into the same group and share all border points. Data not included in any cluster are considered noise. DBSCAN does not require the number of clusters to be configured. It can find clusters with arbitrary geometric shapes and detect outliers in noisy measurements [77].
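The density-based grouping can be illustrated with the following sketch. It assumes Euclidean distance and uses the hypothetical parameter names `eps` (the radius) and `min_pts` (the required number of points within the radius, counting the point itself); it is a compact textbook-style version, not the code of the cited works.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(points)
    # Neighbourhood of each point (the point itself is included)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n              # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1           # provisional noise; may later become a border point
            continue
        cluster += 1                 # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])   # j is also a core point: keep expanding
    return labels

# Hypothetical feature points: two dense groups and one isolated outlier
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Note that the number of clusters is never passed in; it follows from `eps` and `min_pts`, and the isolated point is reported as noise.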
Clustering is an effective methodology for status identification using vibration responses, as was illustrated for clamping force estimation during the bolt tightening process [72]. Figure 1 shows the vibration responses induced at the nut-runner when the bolt was tightened using different torques. During bolt tightening, the magnitude of the vibration response increases and lasts for a short period of time. With these short vibration responses, the clamping force in the bolt was identified more accurately than with the tightening torque. For this purpose, various features were calculated.
During the clamping process, the axial force and the boundary stiffness of the bolt increase. With the increasing axial force, the resonance frequency of the bolt vibration response shifts. The increasing bolt resonant responses are transmitted to the nut-runner. An accelerometer was used to measure the nut-runner vibration responses. Cepstrum analysis was used to analyze the vibration characteristics of the nut-runner after the harmonic motor rotation components generated during the tightening process were excluded. During the fastening process, the nut-runner responses related to the bolt vibration occur in a relatively high frequency band, as visualized in Figure 2. It is not straightforward to explicitly determine the difference due to a change in the clamping force from the transient responses. This difference is presented more clearly through the cepstrum analysis. The real cepstrum was used as:

Appl. Sci. 2020, 10, 1680
where DCT is the discrete cosine transform and X(f) is the Fourier transform of the transient signal.
The discrete cosine transform is defined as:
where k = 1, 2, ..., N, y is the measured signal, N is the length of the signal and δ is the Kronecker delta.
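As an illustration of a DCT-based cepstrum, the sketch below assumes the common form DCT(log|X(f)|), that is, the discrete cosine transform of the log magnitude spectrum. The exact expression and scaling used in the study are not reproduced here, so the orthonormal DCT-II scaling, the small `floor` that guards the logarithm, and all function names are assumptions.

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude |X(f)| of the discrete Fourier transform (naive O(N^2) version)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]

def dct_ortho(v):
    """Orthonormal DCT-II; the first coefficient carries an extra 1/sqrt(2) scaling."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[m] * math.cos(math.pi * (2 * m + 1) * k / (2 * n)) for m in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_cepstrum(x, floor=1e-12):
    """Assumed form: DCT of the log magnitude spectrum of the transient signal."""
    log_mag = [math.log(max(m, floor)) for m in dft_magnitude(x)]
    return dct_ortho(log_mag)

# Hypothetical short transient: a decaying oscillation
signal = [math.exp(-0.2 * t) * math.sin(0.9 * t) for t in range(32)]
cepstrum = dct_cepstrum(signal)
```

For long records the naive transforms above would normally be replaced by fast library routines; they are written out here only to keep the example self-contained.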
Figure 3 shows the clamping force identified from the distance between the data groups formed by each clustering algorithm, using the data with the largest clamping force as a reference. As can be seen from the identified results, classification of the vibration responses using machine learning was performed with a high accuracy that was not possible with the other approaches. Although the initial setting parameters of each method differ and an empirical approach is needed to derive their optimal values, the suggested algorithms can still give a precise estimation from simple vibration measurements.
In addition to the k-means clustering algorithm introduced in [72], the results in Figure 3 include features discriminated using PCA, agglomerative clustering and DBSCAN. The k-means algorithm and PCA show a relatively small spread in the identified results, and PCA yields only one outlier. Agglomerative clustering shows that the dispersion of the discrimination increases as the tightening force increases. DBSCAN shows an even spread over all tightening forces. The accuracy of the identified results may increase significantly with the use of deep learning algorithms, as discussed in the following sections.

Brief Introduction to Deep Learning and Its Future Applications in Vibration Analysis
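This section introduces categories of deep learning that can be applied to many areas of vibration-based monitoring. Patterson and Gibson [78] defined four major architectures: unsupervised pretrained networks (UPNs), convolutional neural networks (CNNs), recurrent neural networks and recursive neural networks (RNNs).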

In a UPN, the computer is given unlabeled data for training. The computer learns and organizes the data by itself and reveals patterns in the data to the user. The UPN uses unsupervised pretraining on the input data to obtain the weights of the first layer. The performance of this training improves progressively, layer by layer, up to the output layer of the whole network. Typical architectures for UPNs include the autoencoder, deep belief network (DBN) and generative adversarial network (GAN). An autoencoder reduces the dimensions of a dataset and then reconstructs the data at the output. A DBN is made up of multiple restricted Boltzmann machine (RBM) layers that extract higher-level features of the input vector. While an autoencoder uses backpropagation to update its weights, an RBM learns stochastically. A GAN is composed of two networks: the discriminator and the generator. The generator creates results from the input data, and the discriminator distinguishes whether a result is real or created. Over multiple iterations, both networks evolve together.
In machine learning research, many acronyms are used to convey information about the layers. The names and characteristics of several UPN variants are given here. Ng [79] proposed the sparse autoencoder (SAE), which uses more nodes in the hidden layers than in the input layer. Vincent et al. [80] proposed the denoising autoencoder (DAE), which adds noise to the input layer so that the network does not simply learn to copy its input during training. Later, Vincent et al. [81] stacked multiple DAEs to develop the stacked DAE (SDAE). Rifai et al. [82] proposed the contractive autoencoder (CAE), which is robust against small variations and outperforms the DAE at feature extraction. Kingma and Welling [83] proposed a generative model based on autoencoders called the variational autoencoder (VAE).
Many architectures have been proposed to improve upon early GAN models. Radford et al. [84] proposed the deep convolutional generative adversarial network (DCGAN), which uses convolution layers to stabilize GAN training. Mao et al. [85] proposed the least-squares GAN (LSGAN), which uses the least-squares loss instead of the sigmoid cross-entropy loss in the discriminator to overcome the vanishing gradient. Zhao et al. [86] replaced the discriminator network with an autoencoder in what they called the energy-based GAN (EBGAN). Arjovsky et al. [87] adopted the Wasserstein distance for the loss function to address mode collapse.
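The difference between the two discriminator losses mentioned for the LSGAN can be made concrete with a short sketch. The function names and the target coding (`a = 0` for generated samples, `b = 1` for real samples) are assumptions chosen for illustration; other codings are possible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def cross_entropy_d_loss(real_logits, fake_logits):
    """Sigmoid cross-entropy discriminator loss on raw (pre-sigmoid) outputs."""
    real = mean([-math.log(sigmoid(z)) for z in real_logits])
    fake = mean([-math.log(1.0 - sigmoid(z)) for z in fake_logits])
    return real + fake

def least_squares_d_loss(real_out, fake_out, a=0.0, b=1.0):
    """Least-squares discriminator loss with target b for real and a for generated data."""
    real = mean([(d - b) ** 2 for d in real_out])
    fake = mean([(d - a) ** 2 for d in fake_out])
    return 0.5 * real + 0.5 * fake

# A real sample scored far on the "correct" side of the decision boundary
ce = cross_entropy_d_loss([6.0], [-6.0])   # close to zero, so gradients nearly vanish
ls = least_squares_d_loss([6.0], [-6.0])   # still large, since outputs are far from the targets
```

In this example the cross-entropy loss has saturated, whereas the least-squares loss keeps penalizing outputs that lie far from their targets, which is the intuition behind its use against vanishing gradients.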
CNNs are widely used in image recognition. A representative CNN is composed of convolution, pooling and fully connected layers. In the convolution layer, features are obtained from a small part of the data. The pooling layer reduces the number of parameters for a large image. The fully connected layer computes the class scores as the output. There are many variations of the CNN architecture. LeCun et al. [88] proposed the classic LeNet CNN architecture. Krizhevsky et al. [89] successfully applied a CNN for the first time in computer vision with AlexNet. Many attempts have since been made to stack deeper layers. Simonyan and Zisserman [90] showed the importance of depth with VGGNet, a CNN with nineteen layers, and Szegedy et al. [91] proposed GoogLeNet, a twenty-two-layer CNN with several filter variations in one layer. As the number of layers increases, the vanishing gradient problem occurs. He et al. [92] used residual blocks to solve the vanishing gradient problem in ResNet, which has been extended to over 1000 layers. Huang et al. [93] proposed DenseNet; ResNet combines features through summation, whereas DenseNet combines features by concatenation.
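The contrast between summation and concatenation can be shown with a toy example. Here `layer` is only a stand-in for a convolutional layer (a scaling followed by ReLU), and all names are hypothetical; the point is how the layer output is combined with its input.

```python
def layer(x, weight):
    """Stand-in for a convolutional layer: scale each feature and apply ReLU."""
    return [max(0.0, weight * v) for v in x]

def residual_block(x, weight):
    """ResNet-style combination: add the layer output to its input element-wise."""
    return [a + b for a, b in zip(x, layer(x, weight))]

def dense_block(x, weight):
    """DenseNet-style combination: concatenate the input with the layer output."""
    return x + layer(x, weight)

x = [1.0, -2.0, 3.0]
res = residual_block(x, 0.5)   # same length as x
den = dense_block(x, 0.5)      # twice the length of x
```

Summation keeps the feature dimension fixed, while concatenation lets the feature dimension grow with depth so that later layers can still access earlier features directly.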
The main concept of a recurrent neural network is the consideration of time steps. A recurrent neural network has directed connections between hidden layer nodes. Recurrent neural networks can be trained regardless of the sequence length at both the input and output layers. They can be used to process time-sequential data such as voice, text and raw vibration data. Hochreiter and Schmidhuber [94] proposed long short-term memory (LSTM) to solve the vanishing gradient problem of recurrent neural networks. LSTM units control the amount of memory content that is exposed with the output gate. Chung et al. [95] proposed the gated recurrent unit (GRU), which exposes its memory content without any such control.
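The role of the output gate can be seen in a single time step of a scalar LSTM cell, sketched below. This follows the widely used variant with a forget gate, which is an assumption and not necessarily the exact formulation of [94]; the parameter values are arbitrary and serve only to make the example run.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))        # how much of the old memory to keep
    i = sigmoid(pre("input"))         # how much new content to write
    o = sigmoid(pre("output"))        # how much memory content to expose
    g = math.tanh(pre("candidate"))   # new candidate memory content
    c = f * c_prev + i * g            # updated memory cell
    h = o * math.tanh(c)              # the output gate scales what is exposed
    return h, c

# Hypothetical parameters, chosen only for illustration
params = {name: (0.5, 0.1, 0.0) for name in ("forget", "input", "output", "candidate")}
h, c = 0.0, 0.0
for sample in [0.2, -0.1, 0.4]:       # stand-in for raw vibration samples
    h, c = lstm_step(sample, h, c, params)
```

The hidden state `h` is the only quantity passed to the next layer, so the output gate `o` determines how much of the memory cell `c` is visible at each time step.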
A recursive neural network (RNN) is similar to a recurrent neural network in that it is good at dealing with sequential data. Recurrent neural networks are also called RNNs in the literature; to distinguish between the architectures, only the recursive neural network is abbreviated as RNN in this paper. An RNN models hierarchical structures in a tree fashion, which is overly time-consuming and costly. This has led to a lack of attention being given to RNNs. Because a recurrent neural network processes the input sequence step by step, it can be regarded as an RNN with a specific (chain-like) tree structure. Table 1 describes the advantages and drawbacks of each deep learning technique. In deep learning using vibration signals, choosing the appropriate architecture and preprocessing has a large influence on the results.

Health Monitoring Using Machine Vibrations
Power sources are important devices whose maintenance requirements need to be estimated. Vibration occurs because of the magnetic fields generated during power conversion. This harmonic vibration occurs without an external excitation device. The vibration characteristics depend on the efficiency of the power conversion. The rotation of elements also generates harmonic vibrations without excitation devices. Because rotating components have a strong influence on the vibration characteristics, the signal processing needs to consider the relevant variations.
Ambient excitation has been used to monitor large structures. Structural vibration is induced by variations in the surrounding environment (e.g., running vehicles, wind and interaction with nearby structures). Calculating the vibration response requires the transient transfer of vibration energy to be considered. In addition, the vibration characteristics of the surrounding environment should be understood and correlated with those of the monitored structure.
Excitation occurs throughout the frequency band when the system is excited by external forces amplified by an impact or random signal. The excitation can be performed at a predetermined frequency to observe the behavior in a specific mode. This direct excitation provides an advantage in terms of the interpretation of results and assumptions for the desired mechanical deformations.
Zhao et al. [96] reviewed the health monitoring of machines using deep learning. They explained each deep learning technique and described an example application. The purpose of their paper was to provide a vibration approach to monitoring structures and to determine the appropriate method according to the cause and interpretation of the vibration response.

Power Source-Induced Vibration
Research on using deep learning in the health monitoring of moving parts in transportation systems, plants and manufacturing processes is increasing. Wang et al. [97] measured the vibrations of hydraulic cylinders on steel platforms to monitor five states: normal, blockage, leakage, cavitation and impulsion. The vibration signal of the hydraulic system provided sufficient state information for fault diagnosis with the sliding window spectrum feature (SWSF) of a deep belief network (DBN).
Feature extraction was fast and effective with the SWSF. The DBN allowed a large amount of redundant and nonlinear data to be processed. Wen et al. [98] used an accelerometer to measure the vibration at the end face of an axial piston hydraulic pump to compare two conditions: piston shoe and swashplate (PS) wear and valve plate (VP) wear. They used a convolutional neural network (CNN), which could easily distinguish grayscale images in which the transient variation of the vibration response is represented as pixel values, as shown in Figure 4. The patterns of the normal state and of the failure states are visibly different. They compared the results with the speeded up robust features (SURF)-based PNN carried out in [99] and found that their method outperformed it. Two types of vibration occurred during stationary motion, and the spectral characteristics needed to be analyzed.
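Representing a transient vibration response as a grayscale image, as in the CNN study above, can be sketched as follows. The min-max scaling to 0..255 and the row-by-row layout are assumptions made for illustration, and the function name is hypothetical; the mapping used in [98] may differ.

```python
def signal_to_gray_image(signal, size):
    """Map the first size*size samples to a size x size image with pixel values 0..255."""
    segment = signal[: size * size]
    if len(segment) < size * size:
        raise ValueError("signal is shorter than size * size samples")
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0            # avoid division by zero for a constant segment
    pixels = [round(255 * (v - lo) / span) for v in segment]
    # Fill the image row by row
    return [pixels[r * size:(r + 1) * size] for r in range(size)]

# Tiny hypothetical example: four samples become a 2 x 2 image
image = signal_to_gray_image([0.0, 1.0, 2.0, 3.0], size=2)
```

Each row of the resulting image corresponds to a consecutive block of samples, so periodic components of the signal appear as repeating textures that a CNN can pick up.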
Szabó and Bakucz [100] applied a deep learning algorithm to a large gas engine that experienced abnormal vibration due to knocking. Knocking is an abnormal engine combustion process that should be prevented. Vibration was applied for a short time and monitored. They analyzed the chaoticity of the knocking by applying the DBN to distinguish between the vibration caused by the valve and combustion during engine operation and the vibration caused by knocking. Luo et al. [101] detected the impact signal generated by defects to predict the failure of a machine during the production process, as shown in Figure 5. To solve the inverse problem, denoising, deconvolution and interpolation are used [102]. In the presented flow chart, deep learning is represented by the BPNN layer in step 4. If GA and MCI were used instead of the BPNN, the approach would be classified as a general inverse problem, whereas SVM, PCA and k-means would be classified as machine learning. Deep learning was applied to analyze the dynamic characteristics. The impulse responses were difficult to analyze because of the harmonics and noise generated during the machining process. A deep autoencoder was used to distinguish between the normal vibration response and the impact vibration due to defects; 288 days of the transient vibration signal were used for learning and verification. Günnemann and Pfeffer [103] used imbalanced data of structure-borne noise from engine excitation to predict damage to the engine without measuring the vibration of the engine. The measured transient responses were transformed into the frequency domain with the Fourier transform. A 1D CNN was applied to the sequence of frequency values. The magnitude of the vibration due to engine failure was relatively small compared to the normal signal. These signal characteristics induced an imbalance in the labeling of data during the training process. Such an imbalance can be handled either by reducing the large amount of majority-class data or by weighting the small number of minority-class data, and they applied both methods to the same dataset. Huang et al. [104] used the generated sound for effective labeling to evaluate the quality of the vehicle interior noise. To record the interior noise, they set up a microphone above the driver's seat. The interior noise of the vehicle was analyzed with the discrete wavelet transform (DWT) to derive eight sound quality factors (sound pressure level, A-weighted sound pressure level, loudness, sharpness, roughness, fluctuation strength, articulation index and tonality) under idle and running conditions. The prediction accuracy of their method was compared with that of multiple linear regression (MLR), a back-propagation neural network (BPNN), a general regression neural network (GRNN) and an SVM.
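The imbalance between normal and failure data noted for the engine study can be handled in two generic ways, sketched below: weighting each class inversely to its frequency, or randomly undersampling the larger class. These are common textbook remedies written with hypothetical function names; they are not claimed to be the exact procedures of [103].

```python
import random
from collections import Counter

def class_weights(labels):
    """Weight each class inversely to its frequency (weights average to 1 over all samples)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class keeps as many as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    m = min(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        balanced.extend((s, y) for s in rng.sample(items, m))
    return balanced

# Hypothetical dataset: nine normal records and a single fault record
labels = ["normal"] * 9 + ["fault"]
weights = class_weights(labels)
balanced = undersample(list(range(10)), labels)
```

Weighting keeps all of the data but makes each rare sample count more in the loss, whereas undersampling discards part of the majority class; which is preferable depends on how much data can be spared.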
In addition to signal measurement, deep learning has also been proposed for condition diagnosis. Xie et al. [105] utilized the speed variation to identify the vibration status of the locomotive absorber on a high-speed train. Knowing whether the fault features are stable or restricted by the number of samples is a difficult practical problem for approaches based on dynamic models. For high-speed trains, securing enough training data for deep learning models is difficult. To distinguish 208 actual cases of speed data, they generated fourteen thousand input situations. Although the actual measurement accuracy was not very high, they realized a much higher accuracy than with backpropagation (BP) or SVM. Pujani [106] solved a similar problem for helicopter vibration. Numerical modeling was performed using a baseline model and ReLU networks because of the difficulty of securing enough input properties to apply deep learning to differentiating the vibration responses of the helicopter in various states. The proposed model was compared with those from the Stanford Autonomous Helicopter Project [107]. Experience has shown that the acceleration of the helicopter in the up and down directions is the most difficult to predict. Thus, they focused on the vertical acceleration of the helicopter. The model provided more precise predictions than the linear acceleration model, linear lag model and quadratic lag model. The acceleration due to the flight state was nonlinear, which made prediction with conventional models difficult. Deep learning models are appropriate for such nonlinear systems.
So far, cases where deep learning was applied to the discrimination, prediction and diagnosis of moving mechanical systems have been examined. The dynamic characteristics of a system in motion are measured through vibration and sound, features are extracted by various methods, and the features are then fed to a deep learning model to identify the desired response or to distinguish between abnormal and normal responses. Various attempts have been made in vibration signal analysis to combine signal processing methods with deep learning to outperform conventional discrimination methods.

Vibration of a Rotating Object
Many research groups have applied deep learning to rotating systems because of their significance to securing the safety of industrial construction. For rotating parts, directly monitoring the vibration response is not straightforward because of the difficulties with installing equipment and delivering the measured signal. An indirect method is therefore required to monitor the vibration of rotating objects based on that of the supporting structure. Training becomes more feasible and converges more readily when a large amount of data is available, and rotation signals suit deep learning well in this respect: a rotating system in a machine can generate many different signals of the same state in a relatively short time. It is also advantageous that the rotating speed is easily detected in the frequency domain. The important mechanical components for rotation are gears and bearings. This section introduces cases where deep learning was applied to diagnosing gears, bearings, their accessories and attached systems.
Dai et al. [108] proposed a novel diagnosis method that applies a stacked sparse denoising autoencoder (SSDAE) to a rolling bearing. The experimental stand consisted of a fan end bearing, electric motor, drive end bearing, torque transducer and dynamometer, as shown in Figure 6. At Case Western Reserve University (CWRU), Smith and Randall [109] carried out an experiment where the nonlinear and nonstationary vibrational signals were decomposed by ensemble empirical mode decomposition (EEMD). As shown in Figure 7, four types of health conditions with three fault datasets were used to validate the performance of the proposed method and demonstrate its effectiveness compared with other stacked forms. Tao et al. [110] applied various types of deep learning to a spur gear to compare their performance at monitoring failure. Several types of tooth breakage were measured. He et al. [111] used deep learning to diagnose the failure of a transmission chain. They verified the method using bearing and gearbox datasets and reported very high classification accuracy. Zhang et al. [112] proposed deep convolutional neural networks with wide first-layer kernels (WDCNN), which use a wide kernel in the first layer to suppress high-frequency noise and extract features. Small kernels were used for the multilayer nonlinear mapping. Whereas horizontal flips, random crops/scales and color jitter are widely used for data augmentation of images, here the training samples were augmented by cutting the signal into overlapping segments. They used the proposed method to diagnose faults in a motor mechanical system.
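The augmentation by cutting and overlapping can be illustrated with a minimal sketch. It assumes that a long vibration record is sliced into fixed-length windows whose start points advance by a stride shorter than the window length. The function name `overlapping_windows`, the record and the parameter values are illustrative assumptions and are not taken from [112].

```python
def overlapping_windows(signal, window, stride):
    """Slice a 1-D record into fixed-length windows that overlap when stride < window."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(signal) - window
    # Windows that would run past the end of the record are dropped.
    return [signal[s:s + window] for s in range(0, last_start + 1, stride)]


# Hypothetical record of 20 samples (a real record would be far longer).
record = list(range(20))
no_overlap = overlapping_windows(record, window=8, stride=8)
with_overlap = overlapping_windows(record, window=8, stride=2)
print(len(no_overlap), len(with_overlap))
```

With the same record, a shorter stride yields more training samples at the cost of stronger correlation between neighboring samples, so the stride is a trade-off rather than a free gain.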
Qian et al. [113] studied how to overcome the difficulty of applying a trained deep learning model when the measured vibration responses differ slightly from the training data or when the load changes during actual application. Transfer learning is intended to solve these problems so that the diagnosis remains robust across working conditions. Normally, a network must be trained again when a new vibration signal is encountered. Therefore, they replaced the retraining on new target datasets with a method based on adaptive batch normalization (AdaBN), which has three steps: processing the vibration response into regular and differential frequency spectra, pretraining a DNN with a stacked autoencoder (SAE) and modifying the trained network to diagnose samples of the target dataset. They applied this method to find faults in gears and bearings.
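The core idea usually associated with AdaBN can be sketched as follows: the learned scale and shift of a batch normalization layer are kept, while the mean and variance used for normalization are recomputed on data from the target condition. This is a single-layer illustration under assumed values and does not reproduce the three-step procedure of [113]; the helper names and the activation values are hypothetical.

```python
import math


def batch_stats(batch):
    """Per-feature mean and (biased) variance of a batch given as a list of rows."""
    n = len(batch)
    dims = len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(dims)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / n for j in range(dims)]
    return mean, var


def batch_norm(batch, mean, var, gamma, beta, eps=1e-5):
    """Normalize each feature with the given statistics, then apply scale and shift."""
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
         for j in range(len(row))]
        for row in batch
    ]


# Hypothetical activations of one layer; the target condition (e.g., a new load)
# is shifted and rescaled relative to the source condition.
source = [[0.9, 2.1], [1.1, 1.9], [1.0, 2.0], [1.2, 2.2]]
target = [[3.8, 0.4], [4.4, 0.6], [4.1, 0.5], [4.7, 0.7]]
gamma, beta = [1.0, 1.0], [0.0, 0.0]  # assumed learned on the source data, kept unchanged

src_mean, src_var = batch_stats(source)
tgt_mean, tgt_var = batch_stats(target)

stale = batch_norm(target, src_mean, src_var, gamma, beta)    # source statistics on target data
adapted = batch_norm(target, tgt_mean, tgt_var, gamma, beta)  # target statistics (AdaBN-style)

stale_mean, _ = batch_stats(stale)
adapted_mean, _ = batch_stats(adapted)
print(stale_mean, adapted_mean)
```

With the source statistics, the target activations remain far from zero mean, whereas recomputing the statistics on the target batch restores the normalized range that the following layers were trained on, without any gradient-based retraining.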
There have been numerous cases in which various feature extraction methods and deep learning with different layers were applied to the fault diagnosis of gears and bearings [114-124], which have been summarized in several reviews [125-127].
The condition of flying vehicles needs to be predicted for maintenance and early warning of failure. Chen et al. [128] simulated the operation of an aircraft engine up to failure. The turbofan engine is an important element and is the largest and most complex subsystem in an aircraft, so knowing its current status in real time is important. They applied a fault prognosis convolutional neural network (FP-CNN) to the vibration response measured at the NASA Ames laboratory to predict the remaining useful life (RUL).
Intelligent fault diagnosis techniques improve the efficiency of fault diagnosis by replacing time-consuming and unreliable human interpretation, and deep learning has been applied to improve their accuracy. Fu et al. [129] applied deep learning to the vibration response generated by the cutting of an end mill. The vibration generated during aluminum processing with a three-tooth end mill cutter attached to the spindle housing was measured. The specific frequency of the chatter generated during the process was measured and corresponded to the natural frequency measured by the impact test. Fu et al. used the Mel-frequency cepstral coefficient (MFCC) and wavelet as well as the raw data as input features. MFCC uses the concept of the cepstrum, which is an effective method for processing the vibration signals from rotating components in a milling machine. They reported much better performance than that obtained with k-means and DBN.
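The cepstrum underlying MFCC can be illustrated with a short sketch. It computes the real cepstrum (the inverse transform of the log-magnitude spectrum) of a synthetic impulse train, which stands in for the periodic impacts of a rotating component. A naive DFT is used only to keep the sketch dependency-free; the signal, its period and the function names are assumptions, and the Mel filter bank of a full MFCC pipeline is not included.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2); adequate for a short illustrative record."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    n = len(x)
    log_mag = [math.log(abs(c) + eps) for c in dft(x)]
    # The log-magnitude spectrum of a real signal is real and even, so the forward
    # DFT divided by n equals its inverse DFT, and the result is real.
    return [c.real / n for c in dft(log_mag)]


period = 16  # hypothetical repetition period of the impacts, in samples
record = [1.0 if t % period == 0 else 0.0 for t in range(128)]
cep = real_cepstrum(record)

# Search away from quefrency 0, which only reflects the overall signal level.
peak = max(range(period // 2, 3 * period // 2), key=lambda q: cep[q])
print(peak)
```

The peak quefrency coincides with the repetition period, which is why cepstral features are convenient for families of evenly spaced harmonics such as those produced by rotating tools.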
Data-driven methods have been used for health monitoring by sensing and data analysis. Noise and the irregular length of data or sampling make classification with a regression model difficult. Zhao et al. [130] proposed a convolutional bidirectional long short-term memory (CBLSTM) network that combines a CNN and long short-term memory (LSTM). Local information is first extracted by the CNN, and the time series data are then encoded by the LSTM. The bidirectional LSTM extracts long-term dependencies from the time series data and uses bidirectionality to take advantage of both past and future contexts. They predicted actual tool wear from raw data and compared the performance of CBLSTM with that of other methods. Zhao et al. applied the proposed method to computer numerical control (CNC) dry milling and acquired data by measuring vibration responses in different directions.
The bogie is an important component in high-speed trains that directly affects the safety and comfort of the train operation. Depending on the condition of the bogie, large vibrations may occur in the train, which can lead to accidents such as derailment or overturning. Zhao et al. [131] suggested applying deep learning to the existing method for diagnosing bogie faults. The advantages of deep learning were demonstrated by comparison with the firefly artificial neural network (FANN), particle swarm optimization neural network (PSONN) and genetic artificial neural network (GANN). They applied dynamic modeling to high-speed trains and diagnosed eight states of bogie vibrations. Collecting actual vibrations from a train with a failed bogie system is very difficult and dangerous. Therefore, the multibody dynamics software SIMPACK was used to simulate high-speed trains. Many sensors were mounted on the bogie, and measurements were carried out for diagnosis.
So far, examples of deep learning for diagnosing the vibration of a rotating object have been introduced. Gears and bearings were the main focus of study, and breakage was the main cause of failure. The characteristic frequency component generated by such a fault generally does not overlap with the rotation frequency, so the occurrence of failure can be determined relatively easily.
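One common way to see why fault components are separated from the rotation frequency is through the classical kinematic defect frequencies of a rolling bearing. The sketch below evaluates these textbook formulas, which assume a fixed outer race and no slip; they are not taken from the reviewed studies, and the bearing geometry and shaft speed are illustrative assumptions.

```python
import math


def bearing_fault_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Kinematic defect frequencies in Hz (outer race fixed, no slip assumed)."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "BPFO": 0.5 * n_balls * shaft_hz * (1.0 - ratio),  # outer-race defect
        "BPFI": 0.5 * n_balls * shaft_hz * (1.0 + ratio),  # inner-race defect
        "BSF": 0.5 * (pitch_d / ball_d) * shaft_hz * (1.0 - ratio ** 2),  # ball spin
        "FTF": 0.5 * shaft_hz * (1.0 - ratio),  # cage (fundamental train)
    }


# Hypothetical geometry: 9 balls, 7.94 mm ball diameter, 39.04 mm pitch diameter,
# zero contact angle, shaft rotating at 30 Hz.
shaft_hz = 30.0
freqs = bearing_fault_frequencies(shaft_hz, n_balls=9, ball_d=7.94, pitch_d=39.04)
for name, value in freqs.items():
    print(f"{name}: {value:.2f} Hz ({value / shaft_hz:.4f} x shaft frequency)")
```

For this geometry, the defect frequencies are non-integer multiples of the shaft frequency, so their spectral lines fall between the rotation harmonics rather than on top of them.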

Ambient Excitation
Until now, this paper has introduced machine systems that move during health monitoring. This section introduces health monitoring methods that utilize the vibration of the target system caused by its external environment. The fiber recognition system is a hybrid fiberoptic distributed acoustic sensor (DAS) that is used in various fields. Machine learning methods that rely on features extracted manually with expert knowledge are commonly applied to such time-varying data. Wu et al. [132] tried to apply deep learning to recognizing the DAS signal. The background noise was mainly measured in the basement of the farm through which the pipe with the cable passed, but there was some interference from country lanes. Traffic noise passing through the ground was mainly collected, and the noise level of the surrounding factory was measured as shown in Figure 8. Regular impulse signals from the forging plant and small but continuous vibrations from the fabricating plant or sand washing plant were measured. To identify these external excitations, a CNN was introduced, and its performance was compared with that of other discrimination methods, as shown in Figure 9. They did not consider complex excitation sources for the pipe, but each noise source was identified with high accuracy.
Han et al. [133] applied deep learning to diagnosing the condition of wind-driven machines such as wind turbines. The experimental platform consisted of a wind tunnel, direct drive wind turbine test bench, accumulator and DAQ system. The experiment was conducted on 10 defects. The performance of the CNN layer was varied, and the deep adversarial convolutional neural network (DACNN), which combines a CNN with a generative adversarial network (GAN), was applied to the discrimination. Zhang et al. [134] used deep learning to distinguish between coal and rock in the coal mining process. After being mined, the coal is moved and falls on the belt, where the different vibrations and sounds of coal and stones can be examined. They improved the accuracy by discriminating with the signals from accelerometers and microphones attached to the belt, used individually or both at the same time.
Tamilselvan et al. [135] performed aircraft wing structure diagnosis and DBN-based health incident classification to determine the engine condition and compared the performance with SVM. Aircraft wings are exposed to large changes in temperature and humidity and to a high-speed and high-pressure environment, so there is a high risk of breakage. During flight, the wings vibrate because of wind; this was measured with a vibration sensor attached to the wing. They applied DBN to the vibration signal for fault classification. Galloway et al. [136] proposed using deep learning for an automated fault detection method to maintain tidal power equipment. Vibration data were gathered from a tidal turbine's nacelle. Improving on the performance of feature-based methods, their network classified ideal and fault conditions with 100% accuracy.
The fault diagnosis of bogie vibration data is important, but the fault mechanism is very complicated and the signal characteristics are not clear, so applying traditional signal processing methods is difficult. Hu et al. [137] applied deep learning to diagnose faults in bogies. With deep learning, expert knowledge and diagnostic experience are no longer needed to solve this problem. At different speeds, each fault of high-speed trains was diagnosed with very high discrimination accuracy. Dong et al. [138] applied deep learning to the small fault diagnosis of the front-end speed controlled wind generator (FSCWG), for which early diagnosis is difficult owing to the time-varying and nonstationary characteristics. The vibration signal was measured at a wind turbine of a wind farm. For the FSCWG, the complicated implicit layer was diagnosed from the vibration signals generated by various small fault patterns. The results were compared with those of a neural network and SVM, which are frequently used for fault diagnosis. Guo et al. [139] focused on the growing problem of structural deterioration, especially in bridges. Bridges can be easily damaged by persistent traffic, wind loading, material aging, environmental corrosion, earthquakes and so on. These phenomena cause complex structural and environmental noise, so detecting the deformation of a bridge is difficult. To diagnose the damage level and location, a large amount of training data is required. Guo et al. installed wireless sensor networks on a bridge to generate a large number of unlabeled examples to train a feature extractor based on the sparse coding algorithm. Their proposed method demonstrated higher accuracy than existing methods.

External Excitation and Resulting Vibrations
This section introduces several studies that have applied deep learning to health monitoring in which the system was directly excited and its response was measured. Zhao et al. [140] introduced the modal macro strain (MMS), the sensing technology of a long-gauge distributed fiber Bragg grating (FBG) and deep learning. They modeled a basalt fiber-reinforced polymer (BFRP) composite pipeline with the finite element method. Random signals were loaded to the pipe in the direction perpendicular to the wall. The location of the excitation was 0.2 m away from the pipe support, as shown in Figure 10. The longitudinal and circumferential distributions of the MMS were measured to locate the damage, load excitation and support. The schematic diagram of the applied CNN is shown in Figure 11. Conventional image-based damage identification shows only the external damage. The introduced method of using dynamic responses is suited for complicated structures and environments and can overcome this limitation.
Figure 10. Pipeline modeling from [140], Copyright 2018, MDPI.
Abdeljaber et al. [141] used the Qatar University (QU) grandstand simulator to develop a 1-D CNN for a horizontal girder made of steel. The bolt joints were loosened, and a modal shaker was used to vibrate the structure. A deep-learning-based damage detection system was used to localize the damage in real time. Raw vibration data were used to extract the optimal damage-sensitive features.
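How a 1-D CNN operates directly on raw vibration samples can be sketched with a single convolution, activation and pooling stage. This is a generic forward pass under assumed values, not the architecture of [141]: the kernel is a fixed hypothetical difference filter rather than a learned one, and the record is synthetic.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation, as used in CNN layers (no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical raw record with one upward and one downward impulse.
raw = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0]
kernel = [-1.0, 1.0]  # assumed weights; in a trained network these are learned

features = max_pool1d(relu(conv1d(raw, kernel)), size=3)
print(features)
```

In a trained network, many such kernels are learned from labeled records, and stacking several stages lets later layers respond to progressively longer patterns in the signal.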
Monitoring only the natural frequencies of a cantilever beam is not effective because the temperature or vibration amplitude can affect the variation in natural frequencies. Onchis [142] chose deep learning for noninvasive monitoring of the cantilever beam conditions. The vibration data were measured to obtain extended time-frequency signatures. The undamaged beam and two types of damaged beams at different temperatures were characterized. The results showed that the proposed method can eliminate the ambient effect (i.e., temperature) and determine whether the beam is damaged. Lin et al. [143] proposed using a deep CNN for a novel detection method of structural damage to a simply supported beam. In a simulation, a beam was subjected to a burst of random excitation, and different damage levels were simulated by a reduction in the bending rigidity. The results showed that the CNN had high accuracy in a noisy environment; it could identify multiple points of damage and learned hierarchical features layer by layer. Lin et al. pointed out the importance of big data, which can implicitly consider all the factors. Thus, deep learning can be used for classification with excellent performance, reduced computation time and wide applicability. Oliveira et al. [144] used an electromechanical impedance (EMI)-PZT to monitor structural damage in aluminum structures. This application of EMI-PZT to a CNN was the first trial in this field of research. The proposed method outperformed the currently developed structural health monitoring approach. Table 2 summarizes the studies using deep learning for structural health monitoring that were referenced in this review.
A combination of various signal preprocessing methods and deep learning architectures is being studied for vibration-based monitoring. Structural health monitoring using vibration response has the elements necessary for classification by deep learning, such as a clear distinction between normal and abnormal status. Data processing can be used for data augmentation. With ongoing developments in deep learning research, various effective condition diagnostics are expected to be proposed. Khatir et al. [15] presented a damage indicator based on the normalized modal strain energy indicator (nMSEDI) and compared its performance with that of the local frequencies change ratio (LFCR). The computational (CPU) time of the LFCR based on FEM was about 60 s, whereas the nMSEDI took about 0.28 s. Wu et al. [132] used a 1-D CNN + SVM method with a processing speed of 358 samples per second. Inverse analysis has the advantage of short computation time because the amount of computation is smaller than that of deep learning. However, deep learning has the advantage of being able to perform health monitoring on complex structures with high accuracy.

Conclusions
Advances in machine learning technology have rapidly increased its application in vibration-based structural health monitoring, as summarized in this review. Since AlexNet's success in 2012, interest in deep learning has been increasing rapidly. Based on this trend, research on the possibility of applying deep learning to structural health monitoring has also increased. A database search showed that the number of works reported in the area of structural health monitoring with deep learning rose from 279 in 2012 to 323, 402, 440, 433, 524 and 661 in 2013-2018, respectively, with a slight dip only in 2016. A further increase is expected in 2019. The use of structural vibrations for fault diagnosis is advantageous for intrinsic diagnosis of the installation responses. For practical applications, issues related to nonlinearity, nonstationarity and time variation should be resolved. Deep-learning-based algorithms commonly utilize big data and are consequently advantageous in terms of robustness: the amount of data used for discrimination is large enough to filter out unnecessary information. However, it is important to ensure converged results and to overcome the problem that abnormality signals are generated aperiodically. Studies are actively ongoing on using data augmentation or directly applying vibration to a system to detect an abnormal signal.
The vibration responses are obtained in the time domain. In the reviewed studies, the raw data were used for categorization and feature extraction, and signal processing (e.g., Fourier transform, Hilbert transform and wavelet transform) was also used. Deep learning is based on a multilayered neural network that can be constructed in various ways, depending on how the layers are built. Deep learning algorithms establish their own features that characterize the subject, and they are expected to produce high-accuracy results in a fast and efficient manner. Therefore, the use of deep learning to analyze unstructured data such as voice, video, photo and sensor measurements is increasing. Data labeling requires a lot of time and effort in specific applications, but it is not required for unsupervised learning. Deep learning is expected to be a tool for solving complicated vibration problems and contributing to efficient structural health monitoring. This summary is intended to serve as a possible guideline for future studies in which newly emerging machine learning algorithms are used for vibration-based structural integrity monitoring.