Optimization of a bearing fault diagnosis method based on convolutional neural network and wavelet packet transform by simulated annealing

Bearings are widely used in many types of electrical machinery and equipment. Because they are core components, bearing failures often have serious consequences. At present, most parameter tuning is still performed manually. Manual tuning depends heavily on prior knowledge, easily falls into local optima instead of reaching the global optimum, and consumes considerable resources. Therefore, this paper proposes a new bearing fault diagnosis method based on the wavelet packet transform and a convolutional neural network optimized by the simulated annealing algorithm. Unlike traditional manual tuning of artificial neural network parameters, the simulated annealing algorithm adjusts the network parameters automatically, yielding an adaptive bearing fault diagnosis method. To verify the effectiveness of the method, it was tested on the Case Western Reserve University bearing database and compared with traditional intelligent bearing fault diagnosis methods. The experimental results show that the proposed method achieves more accurate feature extraction and fault classification than traditional bearing fault diagnosis methods. The method provides a new approach in the field of bearing fault diagnosis.


Introduction
Bearings are widely used and significant parts of modern machinery and equipment. Bearing failure leads to considerable downtime, increases maintenance costs, and may even reduce productivity [1]. Rolling element bearings are important mechanical parts of rotating machinery. They are also the main cause of failures in basic industrial equipment, such as roll mills in steel mills, paper mills, and wind-turbine power plants, accounting for 51% of all failures [2]. Therefore, efficient and accurate bearing fault diagnosis methods play an important role in ensuring the function of the entire mechanical system, and research on these methods is of great significance.
The study of bearing failure requires a large amount of raw data, most of which relates to the state of motion. The state of motion can be reflected through vibration, sound, heat, electricity, etc. To check the health of a motor comprehensively, a condition monitoring system collects real-time data on the motor; after the motor has been running for a while, a large amount of data is available [3]. However, traditional bearing fault diagnosis methods often process these data inefficiently, so a more efficient and reliable method is required.
Supported by a large amount of data, the traditional method of bearing fault diagnosis, which is based on manual experience, has gradually transitioned to data-driven, intelligent bearing fault diagnosis. Intelligent fault diagnosis can quickly and efficiently process the collected signals and provide accurate diagnoses. Thus, it is a promising tool in mechanical big-data processing. However, traditional intelligent diagnosis methods rely on prior knowledge and expertise for the manual extraction of features. These processes utilize human ingenuity, but they are time-consuming and labor-intensive [4].
One study [5] proposed a bearing fault diagnosis method combining the fuzzy c-means (FCM) method and an optimized k-nearest neighbor (KNN) model. Another [6] proposed a method to detect motor bearing faults that extracts features representing different faults using spectral kurtosis (SK) and cross-correlation, and then combines these features into a health index using principal component analysis (PCA) and a semi-supervised KNN distance measure. Another study [7] proposed a sequential KNN classification method based on the affinity of distance and density to perform classification for bearing fault diagnosis, and a similar method is described in [8]. A further study [9] proposed a diagnosis method based on data-driven random fuzzy evidence collection and Dempster-Shafer evidence theory.
A review of these methods shows that current bearing fault diagnosis methods have clear shortcomings in feature extraction. In response to this problem, some scholars have proposed new bearing fault diagnosis models that include both feature extraction and fault classification methods.
Deep learning, which can discover features from raw data through multiple layers of nonlinear processing units, has become a promising tool for intelligent bearing fault diagnosis; indeed, it has recently played a crucial role in artificial intelligence [10]. Deep convolutional neural networks have strong data-mining and information-integration capabilities and have been widely used in research on monitoring and diagnosing rotating machinery [11]. One study [12] proposed a rolling bearing fault diagnosis method that extracts features with the continuous wavelet transform and classifies them with a convolutional neural network and a support vector machine. Another study [13] proposed a bearing fault diagnosis method based on the short-time Fourier transform and a convolutional neural network. Other studies [14] [15] have proposed bearing fault diagnosis methods based on the stacked inverted residual convolutional neural network (SIRCNN) and on a deep convolutional neural network structure. Another novel fault diagnosis strategy was based on the synchrosqueezing transform (SST) and a deep convolutional neural network [16].
Although bearing fault diagnosis based on convolutional neural networks has achieved good results, setting the network's parameters is difficult, and these settings often have a decisive effect on the final result. Furthermore, manual parameter adjustment requires considerable manpower and computing resources.
To address this problem, this study proposed a simulated annealing (SA) optimization method for bearing fault diagnosis based on a convolutional neural network and wavelet packet transform (WPT). WPT converted the original signal into a spectrogram, which was then used as the input data of the convolutional neural network; the network was trained in a supervised manner on these feature-extracted samples. An SA algorithm was then introduced to continually optimize the parameters of the convolutional neural network. The Case Western Reserve University bearing database was used to verify the proposed bearing fault diagnosis method.
This study also compared the proposed bearing fault diagnosis method with several traditional algorithms [31] [32], including a support vector machine, a BP neural network, and a WPT-convolutional neural network, and the comparison verified the effectiveness of the proposed diagnosis method.
This article is organized as follows: Section 2 introduces the data set used in the experiment; Section 3 introduces the proposed method, which includes WPT, a convolutional neural network, and an SA algorithm; Section 4 describes the experiment and analyzes the results; and Section 5 summarizes the article.

Data set description
The Case Western Reserve University bearing database was employed in this study because a large number of recent studies have been based on this data set and have achieved good results. The experimental equipment is shown in Figure 1. All bearings tested during the experiment were SKF bearings, which can be divided into three classifications: ordinary bearings, drive end (DE) bearings, and fan end (FE) bearings [28].
An accelerometer installed on the drive end of the induction motor was used for data collection. The sampling frequency was 12 kHz, and each vibration signal lasted 10 s. The collected data covered four bearing states: (1) normal, (2) inner ring fault, (3) rolling element fault, and (4) outer ring fault. Figure 2 provides a physical drawing of the bearing. Each state was sampled at different speeds: 1750 rpm, 1772 rpm, and 1797 rpm. Single-point faults were introduced by electro-discharge machining, with fault diameters of 0.007, 0.014, 0.021, and 0.028 inches. The experiment was also repeated under loads of 0, 1, 2, and 3 horsepower (hp) [29].
The number of sampling points often has a great impact on the experimental results. Generally, the more sampling points per sample, the better the final diagnostic result. However, for application in real industrial settings and real-time bearing fault diagnosis, too many sampling points cannot be used. At present, most studies use 1024, 2048, or 4096 points per sample; larger samples greatly improve diagnostic accuracy but impair real-time performance. To evaluate the effectiveness and robustness of the proposed method, 400 sampling points were selected per group. In subsequent experiments, the bearings were divided into 12 categories to verify the proposed method. The classification is shown in Table 1.
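The figures above imply 12,000 × 10 = 120,000 points per recorded signal. The sketch below splits one such signal into 400-point samples. It assumes non-overlapping windows (the text does not say whether windows overlap), and the random input is only a placeholder for a real CWRU record.

```python
import numpy as np

FS_HZ = 12_000      # sampling frequency stated in the text (12 kHz)
DURATION_S = 10     # length of each recorded vibration signal (10 s)
SAMPLE_LEN = 400    # sampling points per group, as chosen in this study

def segment_signal(signal: np.ndarray, sample_len: int = SAMPLE_LEN) -> np.ndarray:
    """Split a 1-D vibration signal into non-overlapping samples.

    Trailing points that do not fill a complete sample are discarded.
    Returns an array of shape (num_samples, sample_len).
    """
    num_samples = len(signal) // sample_len
    return signal[: num_samples * sample_len].reshape(num_samples, sample_len)

# Placeholder for one recorded signal: random noise, NOT real CWRU data.
rng = np.random.default_rng(0)
record = rng.standard_normal(FS_HZ * DURATION_S)   # 120,000 points
samples = segment_signal(record)
print(samples.shape)  # (300, 400)
```

With 1024-point samples the same record yields only 117 samples, which illustrates the trade-off between sample length and the number of available training samples.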

The proposed bearing fault diagnosis method
This section introduces the bearing fault diagnosis method (SWC) based on an SA-optimized convolutional neural network and WPT. Widely used bearing fault diagnosis algorithms mainly consist of feature extraction and pattern recognition. First, the main features of the original diagnostic signal are extracted. Then, a pattern recognition algorithm is trained on the extracted features as a data set, thereby achieving bearing fault diagnosis. This methodology is relatively mature and has achieved good results.
The proposed SWC method also broadly follows this model. On this basis, the SA algorithm was introduced for automatic parameter adjustment, which makes the method self-tuning and thus adaptive.
First, WPT was used for feature extraction to obtain a spectrogram. This spectrogram was then fed to the convolutional neural network for training. Finally, the SA algorithm was used to optimize the convolutional neural network in search of the global optimal solution. Figure 3 briefly describes this process.

Feature extraction based on wavelet packet transform
Machine-learning algorithms based on vibration signal processing are commonly used in bearing fault diagnosis [24]. Feature extraction from bearing signals is key to fault diagnosis, as good features of the bearing vibration signal enable accurate diagnosis of bearing faults. When a fault occurs, both the time-domain and the frequency-domain characteristics of the output signal change, and WPT analysis can accurately locate this fault information. Since WPT appeared in the early 1980s, its popularity has grown, because it has many advantages over the Fourier transform [23]. WPT is an important mathematical tool for analyzing nonlinear and non-stationary signals. It decomposes a signal into multiple sub-signals with different frequency ranges and can also resolve the detailed information in the high-frequency band. Therefore, WPT-based feature extraction is one of the most commonly used approaches for bearing vibration signals.
The WPT algorithm can be described as follows. The detected signal is divided by a pair of low-pass and high-pass filters into an approximation part and a detail part, respectively. The approximation is then further split into a second-level approximation and detail, and, unlike in the ordinary wavelet transform, the first-level detail is likewise split into a second-level approximation and detail. This process continues until a stopping criterion is reached. For an n-level decomposition, the signal is decomposed into $2^n$ narrowband signals, and recombining them yields a signal that carries the same information as the original.
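The filter-bank recursion just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the orthonormal Haar filter pair (the text does not name the wavelet) and returns the terminal nodes in natural tree order rather than frequency order. Because the Haar transform is orthonormal, the total energy of the $2^n$ sub-bands equals that of the input, and merging the nodes back recovers the original signal exactly.

```python
import numpy as np

def haar_split(x):
    """One filter-bank stage: orthonormal Haar low-pass / high-pass
    filtering followed by downsampling by 2."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)   # low-frequency (approximation) part
    detail = (even - odd) / np.sqrt(2)   # high-frequency (detail) part
    return approx, detail

def haar_merge(approx, detail):
    """Exact inverse of haar_split."""
    out = np.empty(2 * len(approx))
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def wavelet_packet(x, levels):
    """Split BOTH the approximation and the detail at every level, giving
    2**levels terminal nodes (natural tree order, not frequency order)."""
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    nodes = [x]
    for _ in range(levels):
        nodes = [part for node in nodes for part in haar_split(node)]
    return nodes

def reconstruct(nodes):
    """Merge sibling pairs level by level back into the original signal."""
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[k], nodes[k + 1]) for k in range(0, len(nodes), 2)]
    return nodes[0]

rng = np.random.default_rng(1)
x = rng.standard_normal(400)          # placeholder for one 400-point sample
bands = wavelet_packet(x, levels=3)
print(len(bands), len(bands[0]))      # 8 50
```

For other wavelet families, PyWavelets' `pywt.WaveletPacket` class provides a general implementation of the same decomposition.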
In SWC, WPT was used as the feature extraction method. At each level of decomposition, WPT splits every frequency band into two sub-bands of equal bandwidth, as shown in the decomposition tree in Figure 4. In this decomposition mode, no information is added to or removed from the original signal, so the signal is retained almost without loss. This facilitates the processing of non-stationary signals and allows good time-frequency analysis of both the high-frequency and low-frequency parts of the original signal. Because the characteristic frequencies of a time-domain signal cannot be identified directly, WPT was used to decompose the original time-domain signal into different frequency bands, as shown in Figure 7. To improve the efficiency of data utilization, the resulting pictures were spliced into a single picture that served as the training data for the convolutional neural network, as shown in Figure 6.
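The text does not specify how the sub-band pictures are spliced. As a hypothetical sketch, the helpers below min-max normalize an array and tile equally sized 2-D arrays row by row into one image; the grid layout, the normalization, and the zero padding of unused cells are all assumptions.

```python
import numpy as np

def to_unit_range(a):
    """Min-max normalize an array to [0, 1] (all zeros if it is constant)."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo) if hi > lo else np.zeros_like(a)

def splice_images(images, cols):
    """Tile equally sized 2-D arrays into one picture, filling rows left to
    right; unused cells in the last row stay zero."""
    h, w = images[0].shape
    rows = -(-len(images) // cols)          # ceiling division
    canvas = np.zeros((rows * h, cols * w))
    for k, img in enumerate(images):
        r, c = divmod(k, cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return canvas

# Placeholder "sub-band images": constant tiles so their positions are easy to see.
tiles = [np.full((32, 32), float(k)) for k in range(4)]
picture = splice_images(tiles, cols=2)
print(picture.shape)   # (64, 64)
```

In a real pipeline each sub-band image would typically be passed through `to_unit_range` first, so that bands with very different amplitudes share one intensity scale.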

Fault classification based on a convolutional neural network
Deep learning has been widely used in various fields and has achieved relatively good results. Among deep learning algorithms, convolutional neural networks have developed the fastest [17]. With the continuous improvement of computer performance, convolutional neural networks now process sound and video data sets far more efficiently than traditional methods.
Similarly, deep learning is widely used in bearing fault diagnosis. Compared with traditional diagnosis methods, bearing diagnosis based on convolutional neural networks offers stronger noise robustness, faster processing, and higher accuracy. The structure of a convolutional neural network is shown in Figure 7.

Convolutional layer
The convolution layer of a convolutional neural network is mainly composed of several convolution kernels, whose parameters are optimized by the backpropagation algorithm [18]. Convolution is a mathematical operation, and in the convolution layer it is carried out entirely by the convolution kernels, which extract features of the input data and are learned during training.
The convolution layer also implements local connection and weight sharing. Local connection means that each unit of the convolution layer is connected only to a local region of the previous layer, so only local features are learned; as a result, similar local patterns in the input produce similar responses. Weight sharing reduces the number of parameters and enables rapid convergence. Convolution can be described by the following formula:
$$y_{i,j} = f\left(\sum_{m}\sum_{n} w_{m,n}\, x_{i+m,\,j+n} + b\right)$$
where f is the activation function, usually the ReLU function described below; $x_{i,j}$ is the element in the i-th row and j-th column of the input data matrix and $y_{i,j}$ is the corresponding output; $w_{m,n}$ is the kernel weight in the m-th row and n-th column; and b is the bias term.
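The convolution described in this subsection, y[i, j] = f(Σ_m Σ_n w[m, n]·x[i+m, j+n] + b), can be transcribed directly into NumPy. The sketch assumes a single input channel, a single kernel, stride 1, and no padding; these simplifications are for illustration only. Reusing the same `w` at every position (i, j) is exactly the weight sharing described above.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def conv2d_single(x, w, b, f=relu):
    """y[i, j] = f(sum_m sum_n w[m, n] * x[i + m, j + n] + b).

    'Valid' convolution with stride 1, one input channel and one kernel.
    As in most deep-learning frameworks, the kernel is not flipped
    (strictly, this is cross-correlation)."""
    H, W = x.shape
    kh, kw = w.shape
    z = np.empty((H - kh + 1, W - kw + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            # The same kernel w is reused at every (i, j): weight sharing.
            z[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return f(z)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.array([[-1.0, 0.0],
              [0.0, 1.0]])   # responds to the diagonal difference x[i+1, j+1] - x[i, j]
y = conv2d_single(x, w, b=0.0)
print(y)   # a 3x3 array filled with 5.0
```

Negating the kernel makes every pre-activation value -5, which ReLU maps to 0, a small demonstration of how the activation function gates the kernel's response.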

Activation function
Without an activation function, each layer is a linear combination of the previous layer, so the whole network can represent only linear mappings. After a non-linear activation function is introduced, the model can approximate almost any non-linear function; the activation function therefore improves the fitting ability of the model [19]. Activation functions are generally continuous and differentiable (ReLU, for example, is differentiable everywhere except at zero), which makes gradient-based training possible. Commonly used activation functions include sigmoid, ReLU, tanh, leaky ReLU, maxout, and ELU.
The ReLU function is defined as
$$f(x) = \max(0, x).$$
Its graph is shown in Figure 8.
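Several of the activation functions listed above are one-liners in NumPy. The sketch also checks the argument made in this subsection: two stacked linear layers with no activation collapse into a single linear map. The random matrices are illustrative placeholders, and the leaky ReLU slope of 0.01 and ELU alpha of 1.0 are common defaults assumed here.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small non-zero slope for negative inputs.
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x, alpha=1.0):
    # Smooth exponential branch for negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

xs = np.array([-2.0, 0.0, 3.0])
print(relu(xs))   # [0. 0. 3.]

# Without an activation, two linear layers equal one linear layer W2 @ W1.
rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
v = rng.standard_normal(3)
collapsed = np.allclose(W2 @ (W1 @ v), (W2 @ W1) @ v)
```

`np.tanh` covers the tanh case directly; maxout is omitted because it needs several learned linear pieces rather than a fixed formula.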

Pooling layer
In simple terms, pooling is a form of non-linear down-sampling. The pooling layer performs two tasks. First, it shrinks the input feature map, which reduces model complexity and computation. Second, it compresses the features, extracts the main features, and reduces over-fitting [20]. The pooling process is illustrated in Figure 9.
Common pooling operations include max pooling and average pooling. Max pooling divides the input matrix into multiple regions and selects the maximum value of each subregion, thereby extracting the most important characteristics of the input data.
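A minimal NumPy sketch of non-overlapping max and average pooling follows. The 2 × 2 window with stride 2 is an assumed, common choice; the text does not give the pool size used.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size).
    Rows/columns that do not fill a complete window are dropped."""
    H, W = x.shape
    h, w = H // size, W // size
    # blocks[a, b, c, d] == x[a * size + b, c * size + d]
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 1]], dtype=float)
print(pool2d(x, mode="max"))
# [[4. 5.]
#  [6. 7.]]
```

Each 4-value window is reduced to a single number, so the feature map shrinks by a factor of four; this is the information loss discussed in the next paragraph.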
Although pooling reduces the dimensionality of the data and simplifies computation, it discards part of the original information, which can harm the final performance of the model. For this reason, some models no longer use pooling layers [21].

Fully connected layer
Typically, after convolution and pooling, the data are sent to the fully connected layer to complete the final operation of the model. The convolution layer only extracts local features of the input data, and judging on local features alone is prone to over-fitting [22]. Therefore, all features must be used: the weight matrix of the fully connected layer combines the previously extracted local features into a complete feature representation.
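To make the role of the fully connected layer concrete, the sketch below flattens a set of feature maps into one vector and applies an affine map followed by softmax, so every class score depends on every extracted feature. The weights are random placeholders (an untrained layer), the output size of 12 simply mirrors the 12 bearing categories described in Section 2, and softmax as the output function is an assumption, since the text does not name one.

```python
import numpy as np

def softmax(z):
    """Convert class scores to probabilities (shifted for numerical stability)."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fully_connected(feature_maps, W, b):
    """Flatten all feature maps into one vector and apply W @ v + b, so each
    output combines every extracted feature rather than a local patch."""
    v = np.concatenate([fm.ravel() for fm in feature_maps])
    return W @ v + b

rng = np.random.default_rng(0)
maps = [rng.standard_normal((2, 2)) for _ in range(2)]   # 8 features in total
num_classes = 12
W = rng.standard_normal((num_classes, 8)) * 0.1          # untrained placeholder weights
b = np.zeros(num_classes)

probs = softmax(fully_connected(maps, W, b))
predicted_class = int(np.argmax(probs))
```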

Fault classification
In the proposed algorithm, the convolutional neural network performs the final pattern recognition to classify faults. Convolutional neural networks have many advantages in image recognition and classification. In this study's comparative analysis, the classification performance of the convolutional neural network was compared with that of a support vector machine and a BP neural network, and the convolutional neural network achieved better fault classification results.
The spectrogram was sent to the convolutional neural network for training, after which the network determined the fault type. As stated above, a convolutional neural network is composed of an input layer, convolution layers, pooling layers, and fully connected layers; the general structure is shown in Figure 7. Bearing states are classified as normal, inner-ring failure, outer-ring failure, and rolling element failure, and Figure 10 shows the frequency spectra of these four states. The signal-processed data were fed to the convolutional neural network as training input. The convolutional neural network used in this study is shown in Figure 11.
Figure 11. Convolutional neural network structure used in the experiment (partial).

Optimization based on simulated annealing algorithm
The idea of SA was first proposed by N. Metropolis et al. in 1953. In 1983, S. Kirkpatrick et al. successfully introduced the idea of annealing into combinatorial optimization. SA is a stochastic optimization algorithm based on a Monte Carlo iterative solution strategy, similar to that used in [26]. Probability theory shows that, with a sufficiently slow cooling schedule, the solution obtained by SA converges to the global optimum with probability approaching one.
The core concept of SA can be described as follows. Starting from an initial temperature, the algorithm repeatedly searches the feasible solution space for the optimal solution as the temperature gradually decreases. Unlike the hill-climbing algorithm, SA accepts a solution worse than the current one with a certain probability, which lets it escape local optima. This probability of accepting a worse solution keeps decreasing as the temperature falls, until SA finally approaches the global optimum. The process is analogous to the annealing of metal [27]. The general process is shown in Figure 12.
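The core loop can be sketched as follows on a one-dimensional toy problem. The cost function (a 1-D Rastrigin function with its global minimum at x = 0 and local minima near every integer), the step size, and the iteration counts are illustrative assumptions. The initial temperature of 1000 and the factor 0.95 follow the values quoted in this section, with 0.95 used here as a geometric cooling factor; that reading is our interpretation.

```python
import math
import random

def rastrigin_1d(x):
    """Toy multimodal cost: global minimum 0 at x = 0, local minima near integers."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def simulated_annealing(cost, x0, step=1.0, t0=1000.0, alpha=0.95,
                        t_min=1e-3, iters_per_t=50, seed=0):
    """Minimize `cost` with SA, using the Metropolis criterion and geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    while t > t_min:
        for _ in range(iters_per_t):
            cand = x + rng.uniform(-step, step)
            fc = cost(cand)
            delta = fc - fx
            # Metropolis criterion: always accept improvements; accept a worse
            # candidate with probability exp(-delta / t), which shrinks as t drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x, fx = cand, fc
                if fx < fbest:
                    best, fbest = x, fx
        t *= alpha   # geometric cooling
    return best, fbest

best, fbest = simulated_annealing(rastrigin_1d, x0=4.0)
```

Early on, T is so large relative to the cost differences that almost every move is accepted; as T shrinks, worse moves are accepted ever more rarely, and the search settles into the deepest basin it has found.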
As a general random search algorithm, SA has been widely used in VLSI design, image recognition, and neural network research. However, it has not been widely applied in bearing fault diagnosis, which suggests considerable untapped potential there. Parameter adjustment of a convolutional neural network is very important, as the quality of the parameter settings determines the performance of the final model. However, such adjustment is often difficult and requires considerable manpower and computing resources. Additionally, because it relies on human experience, manual tuning easily falls into local optima, making the global optimum hard to reach. Therefore, SA was introduced to improve the model, which turned the final algorithm into an adaptive bearing fault diagnosis method. Compared with diagnostic methods that require manual parameter adjustment, this method has a broader scope of application.
As an important optimization algorithm, SA is mainly used to solve optimization problems in engineering. In this study, SA was introduced to optimize the parameters of the convolutional neural network in order to obtain a higher accuracy rate, reduce over-fitting, and accelerate convergence. Some SA parameter settings affect the final experimental results. In the present study, the initial temperature was set to 1000, and the cooling rate was 0.95.
The previous sections described the basic model architecture: WPT analysis was used as the signal processing method for feature extraction, and a convolutional neural network was used as the fault classification algorithm. The SA algorithm was then used to optimize the parameters of the convolutional neural network, helping to find the most suitable parameters quickly. This sped up the convergence of the convolutional neural network and yielded a higher accuracy rate.
The following steps were used to optimize the convolutional neural network with the SA algorithm.
Step 1: Specify the initial solution and set the initial temperature. The initial default settings are used as the initial solution. This solution is treated as a set that includes the learning rate and the parameters that need to be set for each layer. In theory, the higher the initial temperature, the better; an initial temperature of 1000 is chosen here.
Step 2: Define the objective function. The optimization process seeks to maximize the objective function, which here is the final training accuracy of the convolutional neural network.
Step 3: Define the perturbation function. To help find the optimal solution quickly, a perturbation function is needed. It changes the original parameters by 1% to 5% to produce a new candidate solution.
Step 4: Define the acceptance criterion. This experiment uses the Metropolis criterion. Each new solution is first compared with the current solution. If it is better, it is accepted; if it is not better, it is accepted with a certain probability.
This willingness to accept inferior solutions helps the search escape local optima and makes it possible to find the global optimum.
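Steps 1 to 4 can be combined into the following sketch. Everything specific here is hypothetical: the two tuned parameters, their default values, and especially `mock_accuracy`, a cheap analytic stand-in for "train the CNN with these parameters and return its final training accuracy", which in practice would be the expensive part. The initial temperature of 1000 and the 0.95 factor follow the values in the text, with 0.95 interpreted as a geometric cooling factor (an assumption).

```python
import math
import random

def mock_accuracy(params):
    """HYPOTHETICAL stand-in for training the CNN and returning its final
    training accuracy (%). Peaks at learning_rate = 1e-3, momentum = 0.9."""
    lr, mom = params["learning_rate"], params["momentum"]
    return 100.0 - 10.0 * (math.log10(lr) + 3.0) ** 2 - 40.0 * (mom - 0.9) ** 2

def perturb(params, rng):
    """Step 3: move every parameter up or down by 1%-5% of its current value."""
    return {k: v * (1.0 + rng.choice((-1.0, 1.0)) * rng.uniform(0.01, 0.05))
            for k, v in params.items()}

def sa_tune(objective, init, t0=1000.0, alpha=0.95, t_min=1e-2, iters_per_t=20, seed=0):
    rng = random.Random(seed)
    current, f_cur = dict(init), objective(init)   # Step 1: initial solution
    best, f_best = dict(current), f_cur
    t = t0
    while t > t_min:
        for _ in range(iters_per_t):
            cand = perturb(current, rng)
            f_cand = objective(cand)               # Step 2: objective to maximize
            delta = f_cand - f_cur
            # Step 4: Metropolis criterion for maximization: accept better
            # candidates; accept worse ones with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, f_cur = cand, f_cand
                if f_cur > f_best:
                    best, f_best = dict(current), f_cur
        t *= alpha   # geometric cooling
    return best, f_best

init = {"learning_rate": 0.01, "momentum": 0.5}
best, acc = sa_tune(mock_accuracy, init)
print(best, round(acc, 2))
```

Because the objective is an accuracy in percent, differences between neighboring solutions are usually a few points at most, so with T = 1000 nearly every move is accepted at first; in a real run the temperature scale should be matched to the scale of the objective.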

Analysis of results
The proposed bearing fault diagnosis method based on WPT and a convolutional neural network optimized by the SA algorithm was implemented in MATLAB and run on a machine with a GTX 1660 Ti GPU. The Case Western Reserve University bearing database was used for verification.
Bearing fault diagnosis consists of two processes: feature extraction and fault classification. The main feature extraction methods include principal component analysis, empirical mode decomposition, WPT, and the short-time Fourier transform. In bearing fault diagnosis, commonly used fault classification algorithms include the BP neural network and the support vector machine.
To illustrate the effectiveness of the proposed method, it was compared with traditional intelligent algorithms, and it achieved a higher accuracy rate than all of them, supporting its effectiveness. The comparison results are shown in Table 2, and Figure 13 compares the traditional intelligent algorithms. Empirical mode decomposition (EMD), commonly used for feature extraction in traditional intelligent bearing fault diagnosis algorithms, is an adaptive method that decomposes a signal into a set of intrinsic mode functions. These modes are different oscillation patterns extracted from the original signal, and summing them together with the final residue reconstructs the original signal [30].