Deep Learning Network Based on Improved Sparrow Search Algorithm Optimization for Rolling Bearing Fault Diagnosis

In recent years, deep learning has been increasingly used in fault diagnosis of rotating machinery. However, rolling bearing fault signals acquired in practice often contain ambient noise, making it difficult to determine the optimal values of the model parameters. In this paper, a sparrow search algorithm (LSSA) based on lens imaging reverse learning and Gaussian Cauchy variation is proposed. The lens imaging reverse learning strategy enhances the traversal capability of the algorithm and allows for a better balance between exploration and exploitation. The performance of the proposed LSSA is then tested on benchmark functions. Finally, LSSA is used to find the optimal number of modal components K and the optimal penalty factor α in VMD-GRU, which in turn realizes the fault diagnosis of rolling bearings. The experimental results show that the model achieves an accuracy of 96.61% in rolling bearing fault diagnosis, which demonstrates the effectiveness of the method.


Introduction
Rolling bearings are important components of mechanical transmission systems. They are widely used in various types of rotating machinery such as hydroelectric generators and underground exploration equipment [1][2][3][4]. Therefore, real-time monitoring and fault diagnosis of the bearing condition is very important for protecting machinery and equipment. Deep learning methods are commonly used in rolling bearing fault detection, and recurrent neural networks (RNN) give the network a certain degree of memory by introducing memory units [5][6][7]. However, RNN suffers from vanishing and exploding gradients, and the long short-term memory network (LSTM) and the gated recurrent unit neural network (GRU) alleviate these problems. An, Yiyao et al. [8] introduced a sparse attention mechanism into LSTM for the fault diagnosis of rotating machinery, which has significant advantages in reducing random interference and enhancing feature information. Zhong, Cheng et al. [9] proposed a bidirectional long short-term memory fault diagnosis method for rolling bearings based on segmental interception spectrum analysis and information fusion, which reduces the impact of feature redundancy caused by mode mixing on neural network training by comparing the AR spectra of the components corresponding to different fault locations and fusing all features. Wang, Haitao et al. [10] introduced a self-calibrating convolutional module into the residual network and proposed a recurrent neural network based on two-stage attention, achieving good results in bearing fault diagnosis experiments.
In practice, rolling bearings produce abnormal vibration signals when they fail. However, because the equipment usually operates under multiple load conditions, the characteristic information of the fault is often weak and disturbed by the noise of the surrounding environment, and the vibration signal is mixed with a large amount of redundant information [11,12]. This makes it difficult to extract and recognize the characteristic frequency using traditional fault diagnosis techniques. In recent years, many researchers have proposed solutions. Huang et al. [13] proposed empirical mode decomposition (EMD) and applied it to process nonlinear signals, but EMD is subject to problems such as mode mixing and under-enveloping during the decomposition process. To solve this problem, methods such as local mean decomposition (LMD) [14,15] and ensemble empirical mode decomposition (EEMD) [16,17] have been proposed and have achieved a certain degree of effectiveness. Although LMD and EEMD can compensate for the defects of EMD to a certain extent, these methods are still recursive mode decomposition algorithms and cannot fundamentally solve the endpoint effects and the error accumulation that arise during the decomposition process [18]. Dragomiretskiy K. et al. [19] proposed the variational modal decomposition (VMD) algorithm, which can decompose a signal into amplitude- and frequency-modulated components with real physical significance, with high decomposition accuracy and fast convergence [20]. As an emerging time-frequency analysis method, VMD has certain advantages in solving the signal decomposition problem, but its characteristic parameters still need to be set according to the signal, and improperly selected parameters lead to insufficient modal decomposition [21]. Therefore, many researchers have introduced swarm intelligence algorithms into the parameter selection of VMD. Ding, Jiakai et al. [22] proposed a genetic mutation particle swarm algorithm to optimize the variational modal decomposition algorithm (GMPSO-VMD); it finds the optimal combination of VMD parameters with the minimum envelope entropy as the objective function and accurately extracts the frequency characteristics of the four fault states of rolling bearings. Tan, Shuai, et al. [23] introduced the cuckoo algorithm into variational modal decomposition by calculating the envelope entropy with the maximum kurtosis between all components and the original signal as the feature input of the autoencoder, and the experimental results indicate that this method is more effective in extracting the initial fault characteristics of the bearing.
The Sparrow Search Algorithm (SSA) is a metaheuristic algorithm proposed in recent years [24], inspired by the way sparrows search for food and escape from predators. The sparrow search algorithm has many advantages such as simple implementation, few tuning parameters, high search accuracy, robustness, and stability [25,26], and is therefore of interest to a wide range of researchers. However, it also suffers from slow search speed, decreasing population diversity in the late stage of the search, and a tendency to fall into local optima [27,28]. To solve these problems, Chengtian Ouyang et al. [29] proposed a learning sparrow search algorithm that introduces an improved sine and cosine mechanism, which achieves favorable results in path planning. Farhad Soleimanian Gharehchopogh et al. [30] summarized the various improvements of the sparrow search algorithm in recent years and systematically discussed its application to neural networks and deep learning. Chenglong Zhang et al. [31] proposed a sparrow search algorithm based on the chaotic mechanism, with excellent results in randomizing the network configuration parameters. Yanlong Zhu et al. [32] proposed an adaptive sparrow search algorithm for parameter selection in a proton exchange membrane fuel cell model, and the results show that the method can accurately select the unknown parameters in the model.
Although researchers have tried various methods to improve the search ability of SSA, its weak search ability and low population diversity have not been resolved. In this paper, a deep-learning-based rolling bearing fault diagnosis method (LSSA-VMD-GRU) is proposed. The main work of this paper is as follows:
(1) A lens imaging reverse learning strategy is used to find the initial population locations, improve global search, and enhance the quality of the initial population.
(2) The Gauss-Cauchy variation mechanism introduces mutation factors into the population and enhances population diversity.
(3) The proposed LSSA is used to optimize the hyperparameters of the VMD-GRU network to improve the accuracy of rolling bearing fault diagnosis.
The remaining sections of this paper are organized as follows: Section 2 presents the original SSA, the VMD algorithm, and the gated recurrent unit neural network. Section 3 presents the proposed LSSA algorithm. Section 4 presents the benchmark function experiments and the rolling bearing fault diagnosis experiments. Section 5 summarizes the work of this paper.

Basic Sparrow Search Algorithm
Discoverers with higher fitness values find areas that can improve the fitness of the whole population and provide foraging directions and areas for the followers. In the discoverer position update equation, $X_{i,j}^{t}$ represents the current position of the sparrow, W ∈ [0, 1] is the warning value, and ST ∈ [0.5, 1] is the safety threshold. When W < ST, there is no predator in the vicinity and the sparrow can search globally over a wide range. When W ≥ ST, the early-warning sparrows in the population have detected danger and sent a warning signal to the other sparrows; the population quickly flees the area, engages in antipredator behavior, and updates its position.
The follower follows the discoverer in order to increase its fitness value, feeding with the discoverer and increasing its energy reserves. In the follower position update equation, $X_{worst}^{t}$ is the global worst position of the previous generation, $X_{p}^{t+1}$ is the most food-rich position currently occupied by the discoverers, and A is a vector with the same dimension as the individual sparrow. When i > n/2, where n is the population size, the i-th follower is in a food-deficient state (low fitness value) and should immediately fly to a food-rich area to forage.
In the position update equation of the early-warning sparrows, $X_{best}^{t}$ is the most food-rich position of the previous generation, β is the step-size control parameter, K ∈ [−1, 1], and ε is a very small constant. Here $f_i$ is the fitness of the current sparrow and $f_g$ is the current global best fitness. If $f_i > f_g$, the sparrow is at the edge of the population and has to move to the center. If $f_i = f_g$, the sparrow at the center moves closer to the other sparrows.
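The three update rules described above can be sketched in code. The following is a minimal sketch of one SSA iteration for a minimization problem, assuming the standard update rules of the original SSA [24] rather than any detail specific to this paper. The function name `ssa_step`, the proportions `PD` and `SD`, the default `ST`, and the bound clipping are illustrative assumptions.

```python
import numpy as np


def ssa_step(X, fitness, T, lb, ub, ST=0.8, PD=0.2, SD=0.1, eps=1e-50, rng=None):
    """One iteration of a basic SSA for minimization; X has shape (n, d)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Rank the population: index 0 is the best (lowest fitness) sparrow.
    f = np.array([fitness(x) for x in X])
    order = np.argsort(f)
    X, f = X[order].copy(), f[order]
    x_best, x_worst = X[0].copy(), X[-1].copy()
    f_g, f_w = f[0], f[-1]

    # Discoverers: wide search when W < ST, otherwise a random escape move.
    n_disc = max(1, int(PD * n))
    W = rng.random()
    for i in range(n_disc):
        if W < ST:
            alpha = 1.0 - rng.random()  # uniform in (0, 1]
            X[i] = X[i] * np.exp(-(i + 1) / (alpha * T))
        else:
            X[i] = X[i] + rng.normal() * np.ones(d)
    X[:n_disc] = np.clip(X[:n_disc], lb, ub)

    # Best position currently occupied by a discoverer (X_p).
    f_disc = np.array([fitness(x) for x in X[:n_disc]])
    x_p = X[int(np.argmin(f_disc))].copy()

    # Followers: starving ones (rank > n/2) fly elsewhere, others move around X_p.
    for i in range(n_disc, n):
        rank = i + 1
        if rank > n / 2:
            X[i] = rng.normal() * np.exp((x_worst - X[i]) / rank**2)
        else:
            A = rng.choice([-1.0, 1.0], size=d)
            # |X - X_p| . A^+ . L, where A^+ = A^T (A A^T)^-1 = A^T / d
            X[i] = x_p + (np.abs(X[i] - x_p) @ A / d) * np.ones(d)

    # Early-warning sparrows: a random subset, judged by start-of-iteration fitness.
    for i in rng.choice(n, size=max(1, int(SD * n)), replace=False):
        if f[i] > f_g:
            X[i] = x_best + rng.normal() * np.abs(X[i] - x_best)
        else:  # f_i == f_g: the sparrow at the center moves toward the others
            K = rng.uniform(-1.0, 1.0)
            X[i] = X[i] + K * np.abs(X[i] - x_worst) / ((f[i] - f_w) + eps)

    return np.clip(X, lb, ub)
```

Calling `ssa_step` repeatedly for T iterations while tracking the best individual gives a basic optimizer. Clipping to `[lb, ub]` is one simple way to handle bounds and is not prescribed by the text.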

Variational Modal Decomposition Algorithm (VMD)
VMD is an algorithm proposed in 2014 for signal processing and data analysis. The input signal f(t) is decomposed into multiple intrinsic mode functions (IMF), where each IMF is an amplitude- and frequency-modulated component with its own center frequency. A constrained variational problem is obtained by multiplying each mode by an exponential term, where $u_k$ is each IMF component, $\omega_k$ is the center frequency, and δ(t) is the impulse function.
This problem is converted into an unconstrained variational problem by introducing the Lagrange multiplier λ and the quadratic penalty factor α, which yields the final solutions $\hat{u}_k^{n+1}(\omega)$ and $\omega_k^{n+1}$.
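For reference, the constrained problem and the two update expressions can be written out as follows. This is the standard VMD formulation from the original VMD paper [19], stated here under the assumption that this paper uses it unchanged. K is the number of modes, $*$ denotes convolution, and hats denote Fourier transforms.

```latex
% Constrained variational problem
\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)

% Update of each mode in the frequency domain
\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha (\omega - \omega_k)^2}

% Update of each center frequency
\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \, \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}
```

The penalty factor α appears in the denominator of the mode update, so a larger α gives each mode a narrower bandwidth around $\omega_k$. This is why K and α are the two parameters that need tuning.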

Gated Recurrent Unit Neural Network (GRU)
The structures of the GRU and LSTM networks are similar in that both are computed by a gating mechanism; the difference is that GRU has only an update gate and a reset gate. GRU and LSTM perform similarly, but GRU is faster because it has fewer parameters. The network structure of GRU is shown in Figure 1.
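As an illustration of the two gates, the following minimal sketch implements a single GRU step in NumPy and compares parameter counts with LSTM. It uses one common formulation of the GRU equations (some libraries swap the roles of z and 1 − z, or use two bias vectors per gate). The dictionary keys and function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x, h_prev, p):
    """One GRU time step; p maps names to weight matrices and bias vectors."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # reset gate
    # Candidate state: the reset gate scales how much past state is used.
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # The update gate blends the previous state with the candidate state.
    return (1.0 - z) * h_prev + z * h_tilde


def zero_params(n_in, n_hidden):
    """All-zero parameters, only useful for checking shapes and behavior."""
    p = {}
    for g in "zrh":
        p["W" + g] = np.zeros((n_hidden, n_in))
        p["U" + g] = np.zeros((n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p


def gru_param_count(n_in, n_hidden):
    # Three weight groups (update gate, reset gate, candidate state).
    return 3 * (n_hidden * (n_in + n_hidden) + n_hidden)


def lstm_param_count(n_in, n_hidden):
    # Four weight groups (input, forget, output gates and cell candidate).
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)
```

Under this single-bias convention, a GRU layer has three quarters of the parameters of an LSTM layer of the same size, which is consistent with the speed advantage noted above.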

Lens Imaging Reverse Learning Strategy
Lens imaging reverse learning is an improved method that expands the search range by calculating the reverse solution of the current position, as shown in Figure 2. This method expands the scope of the global search and improves the overall quality of individuals in the initial population.
The process can be modeled with the y-axis as a convex lens, where [a, b] is the search interval of the solution. A point P of height h has the projection x on the x-axis. On the other side of the lens, P forms an inverted real image P* of height h*, whose projection on the x-axis is x*. Let k = h/h* be the scaling factor of the lens; the position x* then follows from the imaging geometry. Let $x_j$ represent the current sparrow individual and $x_j^{*}$ the individual after lens imaging reversal; the lens imaging reverse learning strategy can then be applied to the population initialization as shown in Equation (9).
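The mapping can be sketched in code. This sketch assumes the lens sits at the midpoint (a + b)/2 of the search interval, as is usual in lens-imaging opposition learning. Similar triangles then give ((a + b)/2 − x)/(x* − (a + b)/2) = h/h* = k, so that x* = (a + b)/2 + (a + b)/(2k) − x/k. Keeping the best n individuals from the union of the random and reversed populations is an illustrative assumption, as are the function names and the default k.

```python
import numpy as np


def lens_opposite(x, a, b, k):
    """Lens-imaging reverse point of x on [a, b] with scaling factor k = h / h*."""
    x = np.asarray(x, dtype=float)
    return (a + b) / 2.0 + (a + b) / (2.0 * k) - x / k


def init_population(fitness, n, d, a, b, k=2.0, rng=None):
    """Random population plus its lens-imaging reverse; keep the n fittest."""
    rng = np.random.default_rng() if rng is None else rng
    pop = rng.uniform(a, b, size=(n, d))
    opp = np.clip(lens_opposite(pop, a, b, k), a, b)
    both = np.vstack([pop, opp])
    f = np.array([fitness(x) for x in both])
    return both[np.argsort(f)[:n]]  # minimization: lowest fitness first
```

With k = 1 the mapping reduces to ordinary opposition-based learning, x* = a + b − x. Other values of k place the reversed point elsewhere in the interval, which is what widens the coverage of the initial population.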

Gaussian Cauchy Variation Mechanism
In the later iterations of SSA, the sparrow population moves closer to the best individual found so far. This group aggregation behavior leads to insufficient population diversity and premature convergence: once the algorithm is trapped in a local optimum, its computational effort increases, or it may even return the local optimum as the final solution.
When improving swarm intelligence algorithms, the evolutionary process of biological populations is often simulated, and introducing mutation factors into the population evolution is one of the common methods. In this paper, the Gauss-Cauchy variation mechanism is introduced to enhance the diversity of the population.
The Gaussian distribution, which is also known as the normal distribution, is denoted as Gauss(µ, σ²) and its probability density is given in Equation (10):

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{10}$$

where the random variable x obeys a Gaussian distribution with mathematical expectation µ and variance σ²; the distribution is the standard Gaussian distribution when µ = 0 and σ = 1. The mutation factor generated by a Gaussian distribution with a large variance enhances the global search ability of the population: individuals have a larger mutation range, and more individuals can reach different locations in the search space. If the algorithm is stuck in a local optimum, the Gaussian mutation factor gives the algorithm a better escape ability. The mutation factor generated by a Gaussian distribution with a small variance gives the algorithm a better local search ability and can find accurate local extremes earlier, but it also causes the algorithm to lose some of its escape ability, so the algorithm can easily fall into a local optimum.
In order to solve the problems associated with the Gaussian distribution, researchers have also proposed the use of the Cauchy distribution, whose probability density is given in Equation (11):

$$f(x; x_0, \gamma) = \frac{1}{\pi} \cdot \frac{\gamma}{(x - x_0)^2 + \gamma^2} \tag{11}$$

where the random variable x ∈ (−∞, ∞) obeys a Cauchy distribution with scale parameter γ and location parameter $x_0$. It is the standard Cauchy distribution when γ = 1 and $x_0$ = 0. The cumulative distribution function of the standard Cauchy distribution is given in Equation (12):

$$F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan(x) \tag{12}$$
The curves of the Gaussian and Cauchy distributions are similar in shape, but the Gaussian curve is higher in the center than the Cauchy curve, and the Cauchy curve is higher in both tails than the Gaussian curve. A comparison of the Gaussian and Cauchy distributions is shown in Figure 3.
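The difference between the two distributions can be checked numerically. The sketch below evaluates the standard densities and then shows one possible Gauss-Cauchy mutation of the best individual. The mutation form and its weights are an assumption borrowed from commonly used hybrid mutation schemes, because the exact mutation formula of LSSA is not given in this text. All function names are illustrative.

```python
import math

import numpy as np


def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of Gauss(mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))


def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Probability density of the Cauchy distribution with location x0 and scale gamma."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma**2))


def cauchy_cdf_std(x):
    """Cumulative distribution function of the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi


def gauss_cauchy_mutate(x_best, t, T, rng):
    """Hypothetical hybrid mutation: Cauchy-dominated early, Gaussian-dominated late."""
    w_gauss = (t / T) ** 2        # assumed weight, grows with the iteration count t
    w_cauchy = 1.0 - w_gauss      # assumed weight, shrinks with the iteration count t
    noise = w_cauchy * rng.standard_cauchy(x_best.shape) + w_gauss * rng.normal(size=x_best.shape)
    return x_best * (1.0 + noise)
```

The heavy Cauchy tails produce occasional large jumps that help the population leave a local optimum, while the Gaussian term produces small perturbations suited to local refinement. Shifting the weight from the former to the latter over the iterations is one simple way to combine them.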
The pseudo-code of the proposed LSSA is shown in Algorithm 1.
Because it is difficult to choose appropriate variational modal decomposition parameters for rolling bearing vibration signals, this paper uses the proposed LSSA to adaptively find the optimal parameters of VMD. Several modal components containing fault information are selected via the correlation coefficient, and the energy entropy of each selected modal component is calculated and input into the GRU deep learning network as a feature vector for fault classification. The specific implementation steps are as follows:
(1) Acquire rolling bearing signals in different states to collect raw data.

Benchmark Experiments
In order to see more intuitively how LSSA performs in solving simple and complex mathematical problems, this paper compares LSSA with several common and recent heuristic algorithms. These include Particle Swarm Optimization (PSO) [33], the Grey Wolf Optimizer (GWO) [34], the Whale Optimization Algorithm (WOA) [35], the original sparrow search algorithm (SSA), and an improved sparrow search algorithm (NSSA) [36]. The experimental equipment is shown in Table 1. In order to measure the performance of the PSO, GWO, WOA, SSA, NSSA, and LSSA algorithms fairly and to ensure the objectivity of the evaluation, the maximum number of iterations T is uniformly set to 300 for all algorithms. The parameter settings specific to each swarm intelligence algorithm are shown in Table 2. To verify the optimization performance of LSSA, it is tested on the 23 benchmark functions proposed by Xin-She Yang et al. [37], as shown in Tables 3-5. These test functions examine the optimization performance of swarm intelligence algorithms from multiple perspectives, so they have been widely used in algorithm performance testing. The test functions are continuous and can be categorized into single-peak benchmark functions, multi-peak benchmark functions, and multidimensional multimodal benchmark functions. The single-peak benchmark functions have a single extremum and are suitable for testing the convergence accuracy of the algorithm. The multi-peak benchmark functions have many local extremes and can be used to evaluate the performance characteristics of the algorithm. The multidimensional multimodal benchmark functions have a large number of local extremes, so an algorithm easily falls into a local optimum on them, and they can therefore test the algorithm's ability to escape local optima.
Table 6 shows the results of LSSA and the other algorithms in the benchmark function experiments, for which the best value, mean, and standard deviation (std) were recorded separately. The comparison of the means shows that LSSA has stronger exploitation and exploration abilities than the other algorithms, and the comparison of the standard deviations shows that LSSA is more stable. Taken together, these results indicate that LSSA balances global exploration and local exploitation well. To further illustrate the performance of LSSA, the evolutionary curves of LSSA and the above algorithms are compared under the same conditions. The optimization evolutionary curves for each function are shown in Figures 4-7. The solution quality of LSSA is significantly higher than that of the other classical algorithms and NSSA for both the single-peak and multimodal benchmark functions, and the solution speed of LSSA is also improved to varying degrees.
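The statistics reported in such a comparison can be computed as in the following minimal sketch. The sphere and Rastrigin functions are used as typical single-peak and multi-peak examples. Their presence among the 23 functions of Tables 3-5 is an assumption, `random_search` is only a stand-in for the optimizers being compared, and the population standard deviation (ddof = 0) is an assumed convention.

```python
import numpy as np


def sphere(x):
    """Single-peak test function with global minimum 0 at the origin."""
    return float(np.sum(x**2))


def rastrigin(x):
    """Multi-peak test function with global minimum 0 at the origin."""
    return float(np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))


def random_search(func, dim, lb, ub, iters, rng):
    """Placeholder optimizer: best value among uniformly sampled points."""
    best = np.inf
    for _ in range(iters):
        best = min(best, func(rng.uniform(lb, ub, size=dim)))
    return best


def summarize(values):
    """Best, mean, and standard deviation over independent runs."""
    v = np.asarray(values, dtype=float)
    return {"best": float(v.min()), "mean": float(v.mean()), "std": float(v.std())}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runs = [random_search(rastrigin, 10, -5.12, 5.12, 300, rng) for _ in range(30)]
    print(summarize(runs))
```

A lower mean indicates better average solution quality and a lower standard deviation indicates more consistent behavior across runs, which is how the mean and std columns are read in the discussion above.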

Fault Diagnosis Experiment of Rolling Bearing at Case Western Reserve University
In this paper, the bearing dataset (CWRU) [38] from Case Western Reserve University was used to validate the proposed methodology, and the test rig is shown in Figure 8. Drive-end data sampled at 12 kHz are selected, covering the normal state and three kinds of fault signals with a fault diameter of 0.1778 mm: outer ring fault, inner ring fault, and rolling element fault. The composition of the data samples is shown in Table 7.
The obtained data are uniformly decomposed using the LSSA-VMD algorithm, and the sample set is imported into the GRU neural network for training to obtain a well-performing model. The VMD-GRU-based and LSSA-VMD-GRU-based fault diagnosis models are trained for 400 iterations, and the accuracy of each fault diagnosis method is calculated. The training curves are shown in Figure 9.
The results in Figure 9 show that, compared with the VMD-GRU fault diagnosis model, LSSA-VMD-GRU converges rapidly in the early training stage; in the later stage, its accuracy curve stabilizes without significant fluctuation, and the model achieves a high accuracy rate. The accuracy of VMD-GRU shows an overall increasing trend in the early stage, but in the later stage the model is unstable, with large fluctuations in accuracy. It can be concluded that the LSSA-VMD-GRU network model is able to obtain a high accuracy rate when dealing with complex vibration signal fault classification.
In order to better validate the diagnostic method, this paper evaluates the performance of the algorithm using accuracy, precision, recall, and F1 score as the measures. They are calculated as follows:
Accuracy = (TP + TN)/(TP + TN + FP + FN) (13)
Precision = TP/(TP + FP) (14)
Recall = TP/(TP + FN) (15)
F1 = 2 × Precision × Recall/(Precision + Recall) (16)
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Figure 11 shows the confusion matrix obtained by LSSA-VMD-GRU for the classification of faults in rolling bearings. Table 8 shows the average values of accuracy, precision, recall, and F1 score obtained from the two fault diagnosis methods after 20 experiments. The experiments show that LSSA-VMD-GRU improves the accuracy, precision, recall, and F1 score by 38.29%, 34.2%, 66.48%, and 52.25% over CNN, respectively. LSSA-VMD-GRU improves the accuracy, precision, recall, and F1 score by 24.95%, 20.94%, 67.02%, and 48.05% over VMD-GRU, respectively. From the above indicators, it can be concluded that the diagnostic accuracy and stability of the proposed method are significantly stronger and that it can accurately classify the vibration signals under four different conditions in rolling bearing fault diagnosis. LSSA-VMD-GRU is an effective fault diagnosis method.
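For a multi-class problem such as the four bearing conditions, the four measures can be computed from a confusion matrix as in the following minimal sketch. Macro-averaging over classes is an assumption, since the averaging scheme is not stated in the text. The function name is illustrative, and the sketch assumes every class is predicted at least once so that no division by zero occurs.

```python
import numpy as np


def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1.

    cm[i, j] is the number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to the class but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return {
        "accuracy": float(accuracy),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
    }
```

Accuracy is computed over all samples, whereas the other three measures are computed per class and then averaged, so they can differ noticeably from accuracy when some classes are recognized much better than others.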

Fault Diagnosis of Bearing at Paderborn University
In order to verify the general applicability of the proposed methodology in rolling bearing fault diagnosis, a dataset from Paderborn University, Germany [39] is selected for fault diagnosis experiments with the proposed method, and its test bed is shown in Figure 12. The test bed consists of a motor, torque machine, rolling bearings, flywheel, and load motor. Each bearing has a different type of damage and level of failure, and the bearings as a whole can be categorized into three health states: inner ring fault, outer ring fault, and healthy. The rolling bearing faults can be categorized as artificial damage and damage from accelerated lifetime experiments. Due to the limited number of bearing faults caused by artificial damage, the vibration data of a normal bearing at a rotational speed of 1500 r/min, of bearings with outer ring fault degrees 1-2, and of bearings with inner ring fault degrees 1-2 are selected for the experiment in this paper, and the samples are set as shown in Table 9.
As in the Case Western Reserve University rolling bearing experiment, the proposed method is compared with CNN and VMD-GRU. The training curves are shown in Figure 13 and the loss curves are shown in Figure 14. The results show that LSSA-VMD-GRU still achieves the best accuracy with the fastest speed, and the iteration curve of the proposed algorithm is stable. Figure 15 shows the confusion matrix obtained by the proposed method for rolling bearing fault diagnosis on this dataset. The models mentioned above are evaluated with the metrics of Equations (13)-(16), and the experimental results are shown in Table 10. As shown in Table 10, LSSA-VMD-GRU improves the accuracy, precision, recall, and F1 score over CNN by 26.12%, 26.42%, 56.58%, and 44.6%, respectively. LSSA-VMD-GRU improves the accuracy, precision, recall, and F1 score over VMD-GRU by 24.13%, 26.39%, 48%, and 26.82%, respectively. In summary, the two sets of experiments show that the LSSA-VMD-GRU model can better realize the fault identification of rolling bearings, which provides a key technology for the intelligent fault diagnosis of rolling bearings.

Conclusions
In order to solve the problem that the parameters of traditional deep learning models are difficult to select when dealing with complex rolling bearing fault diagnosis, this paper proposes a sparrow search algorithm based on lens imaging reverse learning and Gaussian Cauchy mutation. On the 23 benchmark functions, LSSA has the best optimization performance and excellent stability compared to the other algorithms. In addition, in the rolling bearing fault diagnosis experiment on the Case Western Reserve University dataset, the proposed LSSA is used to search for the optimal parameters of VMD-GRU. The accuracy of this model is 96.61%, the precision is 93.36%, the recall is 98.49%, and the F1 score is 92.19%. All indicators of the proposed method improve significantly compared to the pre-optimization network, and the experiment demonstrates the effectiveness of LSSA in rolling bearing fault diagnosis.
In the future, with the further improvement and development of swarm intelligence algorithms as well as deep learning networks and other related sciences, the proposed method will be explored for applications within the fields of image processing [40], unmanned vehicles [41], production scheduling [42], and so on.

Figure 2. Schematic representation of lens imaging reverse learning strategy.

(2) Find the optimal solution of the objective function by LSSA and obtain the optimal parameter combination of LSSA-VMD.
(3) The optimal parameters are used to obtain IMF components from the variational modal decomposition of the fault signals of the four types of rolling bearings, and the energy entropy is extracted as the feature vector of the classifier by screening the IMF components that contain obvious fault information.
(4) Input the feature vectors into the GRU fault diagnosis model, train to obtain the prediction model of each state, and input the collected test signal dataset into the model to realize the fault diagnosis of rolling bearings.
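The feature extraction in step (3) can be sketched as follows. This is a minimal sketch assuming the usual definition of energy entropy, in which each IMF contributes the term −p_i ln p_i with p_i its share of the total energy. The correlation threshold of 0.3 and the function names are illustrative assumptions, and the IMFs are assumed to come from a VMD implementation that is not shown here.

```python
import numpy as np


def select_imfs(signal, imfs, threshold=0.3):
    """Indices of IMFs whose absolute correlation with the signal reaches the threshold."""
    corr = np.array([abs(np.corrcoef(signal, u)[0, 1]) for u in imfs])
    return np.flatnonzero(corr >= threshold), corr


def energy_entropy(imfs):
    """Per-IMF energy entropy terms and their sum; imfs has shape (K, N)."""
    imfs = np.asarray(imfs, dtype=float)
    energy = np.sum(imfs**2, axis=1)
    p = energy / energy.sum()            # share of the total energy per IMF
    safe = np.where(p > 0, p, 1.0)       # avoids log(0) for empty components
    terms = -p * np.log(safe)
    return terms, float(terms.sum())


if __name__ == "__main__":
    t = np.arange(1000) / 1000.0
    imfs = np.vstack([np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 20 * t)])
    signal = imfs.sum(axis=0)
    keep, corr = select_imfs(signal, imfs)
    terms, total = energy_entropy(imfs[keep])
    print(keep, corr, terms, total)
```

In this toy example the two components carry equal energy, so the total entropy equals ln 2. The vector of per-IMF terms is what would be passed to the GRU classifier in step (4).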

Figure 3. Comparison of Gaussian and Cauchy distributions.

Figure 9. Accuracy training curve for experiment one.

Figure 10 shows the training curves of the loss values of GRU and LSSA-VMD-GRU. The results in Figure 10 show that the loss of LSSA-VMD-GRU decreases sharply in the early training stage, reaches a very small value, and converges stably in the later iterations. The loss of GRU decreases slowly in the early stage and fluctuates strongly in the later iterations, where the curve becomes scattered.

Figure 10. Loss value training curve for experiment one.

Figure 11. Confusion matrix for experiments on the CWRU dataset.

Figure 13. Accuracy training curve for experiment two.

Figure 14. Loss value training curve for experiment two.

Figure 15. Experimental confusion matrix for the Paderborn University dataset.

Table 2. Parameter settings for each swarm intelligence algorithm.

Table 6. Benchmark function experiments of LSSA with other algorithms.

Table 7. Sample composition of the data.

Table 8. Comparison of diagnostic methods.

Table 10. Comparison of different classifiers.