A Fault Diagnosis Method of Rolling Bearings Based on Parameter Optimization and Adaptive Generalized S-Transform

Abstract: For the fault diagnosis of rolling bearings under strong background noise, comprehensive and accurate fault feature extraction is critical, especially for data-driven fault diagnosis methods. To improve the comprehensiveness and accuracy of fault feature extraction, a fault diagnosis method for rolling bearings is proposed based on parameter optimization and the Adaptive Generalized S-Transform (AGST). The AGST is used to solve the problem of incomplete feature extraction of bearing faults. The Particle Swarm Brain Storm Optimization algorithm based on the Discussion Mechanism (PSDMBSO) is used to optimize the parameters of Variational Mode Decomposition (VMD), which allows the complete fault components to be separated more effectively. The effectiveness of the proposed fault diagnosis method is verified by comparison with other methods.

Author Contributions: Conceptualization, Y.P.; methodology, Y.P.; software, validation, Y.P.; formal analysis, Y.P.; investigation, Y.P.; resources, Y.P. X.M.; data curation, Y.P.; writing (original draft preparation), Y.P.; writing (review and editing), Y.P. and X.M.; visualization, Y.P.; supervision, X.M.; administration,


Introduction
Rolling bearings occupy a vital position in most rotary machines. A rolling bearing fault can easily damage other parts of the machine and may even cause the equipment to break down. Therefore, it is extremely important to diagnose the fault category of the bearing promptly and accurately. The current mainstream rolling bearing fault diagnosis methods fall into three types: methods based on expert systems [1,2], methods based on analytical models [3,4], and methods based on data-driven models [5,6]. The accuracy of expert-system-based diagnosis depends on the richness of expert experience and the level of knowledge, while analytical-model-based diagnosis relies on an accurate mathematical model of the diagnosed object. These deficiencies limit the application of both approaches in some practical environments. Compared with these two approaches, the data-driven diagnosis method is more practical, because it diagnoses faults from the historical data of the system and does not rely on accurate analytical models or rich expert knowledge.
For fault diagnosis methods based on data-driven models, the capability of feature extraction has a significant effect on the accuracy of fault diagnosis. In the early days, fault features were manually extracted from the vibration acceleration signals of rolling bearings [7]. At present, vibration signals in different forms are used to achieve more accurate and sufficient feature extraction [8,9]. However, the effect of a signal processing method is often limited by the quality of its parameters, so parameter optimization becomes very important. Li proposed a bearing fault diagnosis method based on Maximum Correlated Kurtosis Deconvolution (MCKD) optimized by Particle Swarm Optimization. The three parameters of MCKD were optimized by Particle Swarm Optimization to effectively enhance the fault characteristics [10]. Tong proposed a rolling bearing fault diagnosis model that combines parameter-optimized VMD with Bayesian networks; the parameters of VMD were optimized by variable-step Particle Swarm Optimization to improve the effectiveness of feature extraction [11]. Li proposed a VMD parameter optimization method based on the maximum envelope kurtosis and frequency band entropy and identified the fault types through the envelope frequency spectrum [12]. Liu proposed a diagnosis model based on parameter-optimized VMD and sample entropy. The genetic variation Particle Swarm Optimization algorithm was used for parameter optimization, which reduced the influence of the parameters to a certain extent [13]. However, the modal components still had to be selected manually. On this basis, Tang proposed the concept of envelope entropy by combining the sparse characteristics of vibration signals and used it as a fitness function to realize the adaptive optimization of modal component selection [14].
In addition, Liu constructed a band-pass filter based on the concept of frequency band entropy, and on this basis proposed a screening criterion for reconstruction components based on band entropy. The experimental results showed the effectiveness of the method [15].
However, with regard to the screening of VMD decomposition results, the key features that distinguish the modal components in the fault state from those in the normal state need to be captured more clearly. To address this, neural-network-based approaches are used to obtain sufficient fault features by making feature extraction more comprehensive [16,17]. Since the one-dimensional convolutional neural network is prone to problems such as overfitting and missing features when dealing with vibration signals, converting vibration signals into other forms of data has become the mainstream trend. Xiao proposed a data processing method that converts the time domain signal into a two-dimensional grayscale image and applied it to the intelligent fault diagnosis of bearings [18]. This method further improves the depth of feature extraction, but the adaptive ability of the model is reduced because of the fixed feature extraction scale. Shao proposed a fault diagnosis method for rotor-bearing systems under variable speed based on two-stage parameter transfer and infrared thermal imaging. The parameter transfer method enables the data (or parameters) acquired in the source domain to be applied in the target domain, which greatly enhances the adaptive ability of the system [19]. Luo proposed using GST to convert the original signal into a two-dimensional time-frequency map, with the time-frequency resolution changing adaptively with frequency to extract more sufficient time-frequency features [20]. However, because it uses a relatively fixed scale factor, this method cannot fully reflect the feature differences in different frequency ranges.
Based on the above analysis, in order to improve the diagnostic accuracy and anti-interference ability of existing fault diagnosis methods, a fault diagnosis method for rolling bearings based on parameter optimization and AGST is proposed. Borrowing the ideas of the Brain Storm Optimization algorithm [21] and the Discussion Mechanism-based Brain Storm Optimization algorithm (MBSO) [22], PSDMBSO is introduced for VMD parameter optimization. The PSDMBSO algorithm optimizes the parameters so that the signals containing fault features are extracted accurately, which addresses the problem of inaccurate feature extraction. At the same time, the AGST achieves comprehensive extraction of features in different frequency intervals by adaptively changing the scale factor, thus solving the problem of incomplete feature extraction. This paper is organized as follows: Section 1 introduces the fault diagnosis method and elaborates its feature extraction process in detail; Section 2 elaborates the PSDMBSO algorithm, the improved optimal band-pass filter design, and signal-image preprocessing; Section 3 verifies the effectiveness of the PSDMBSO algorithm and of the filter screening criteria used in the feature extraction process, and then verifies the effectiveness of the proposed method on real bearing fault data. Finally, the conclusion is given in Section 4.

Fault Diagnosis Method of Rolling Bearings Based on Parameter Optimization and AGST
The proposed fault diagnosis method is based on parameter optimization by the PSDMBSO algorithm and on the Adaptive Generalized S-Transform. The parameters of VMD are optimized by PSDMBSO to reach the optimal solution under different states, and the interference signal (such as noise and harmonics) is completely separated from the signal containing fault information. On this basis, the Adaptive Generalized S-Transform extracts the features of different frequencies more comprehensively by adaptively adjusting the time-frequency resolution in different frequency bands of the original signal, thereby extracting more effective time-frequency features. The proposed method combines these two parts to achieve comprehensive and accurate extraction of bearing fault features. Further feature extraction is then completed by the improved LeNet-5 model, and the output feature vector is processed by an SVM to output the fault category of the bearing. The specific structure of the proposed fault diagnosis method is shown in Figure 1.


Particle Swarm Brain Storm Optimization Algorithm Based on Discussion Mechanism (PSDMBSO)
The algorithm flow of PSDMBSO is as follows:
1. Randomly generate n individuals (feasible solutions);
2. Divide the population into m classes by the clustering method, evaluate the fitness of each individual, and select the class center of each class according to the evaluation result;
3. Generate a random number $r_1$ and judge whether $r_1 < p_{Dis}$. If so, a random individual is generated to replace a randomly selected cluster center before the discussion within the group. If not, the algorithm enters the discussion within the group directly;
4. Discuss within the group, using the inner-group discussion mechanism to update the individuals, and determine whether the current number of discussions within the group has reached the maximum number of discussions. If so, enter the discussion between groups; otherwise, continue the discussion within the group until the maximum number of discussions is reached;
5. Discuss between groups, using the outer-group discussion mechanism to update the individuals, and determine whether the current number of discussions has reached the maximum number. If so, go to the next step; otherwise, continue the discussion between groups until this condition is met;
6. Compare the fitness of the new and old individuals, retain the better individuals, and adjust the upper limits of the number of discussions within and between groups. Then determine whether the maximum number of iterations has been reached. If it has not, check whether the criterion for partial population initialization is satisfied: if it is, part of the population is initialized; in either case, go back to step (2). If the maximum number of iterations has been reached, the algorithm ends and the result is output.
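Steps 1 to 3 of the flow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the text only says "the clustering method", so a plain k-means is assumed; a lower fitness value is assumed to be better; and every function and parameter name is hypothetical. The inner-group and outer-group update rules of steps 4 to 6 are left out.

```python
import random


def init_population(n, lower, upper, rng):
    """Step 1: draw n random feasible solutions inside the box [lower, upper]."""
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(n)]


def cluster(population, m, rng, iterations=10):
    """Step 2 (assumed k-means): return a class index for every individual."""
    centers = rng.sample(population, m)
    labels = [0] * len(population)
    for _ in range(iterations):
        # Assign every individual to its nearest center.
        for i, x in enumerate(population):
            dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
            labels[i] = dists.index(min(dists))
        # Move every center to the mean of its members.
        for k in range(m):
            members = [x for x, lab in zip(population, labels) if lab == k]
            if members:  # keep the old center if a class becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def class_centers(population, labels, m, fitness):
    """Step 2: the class center is the member with the lowest fitness (assumption)."""
    centers = {}
    for k in range(m):
        members = [i for i, lab in enumerate(labels) if lab == k]
        if members:
            centers[k] = min(members, key=lambda i: fitness(population[i]))
    return centers  # maps class index -> index of its center in the population


def maybe_replace_center(population, centers, lower, upper, p_dis, rng):
    """Step 3: if r1 < p_dis, replace one randomly chosen class center."""
    r1 = rng.random()
    if r1 < p_dis:
        k = rng.choice(list(centers))
        population[centers[k]] = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        return k
    return None
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which is convenient when comparing parameter settings.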
In the algorithm, each idea represents a feasible solution to the problem, and the population update is divided into inner-group and outer-group discussions. The specific iteration limits are as follows. In the formula, $N_{t-in}$ is the upper limit of the number of inner-group discussions; $N_{t-ex}$ is the upper limit of the number of outer-group discussions; $N_{c-i}$ is the current number of individual generations; $N_{m-i}$ is the maximum number of individual generations; and $N_{m-t}$ is the maximum value of the upper limit of the number of discussions within and between groups.
This paper proposes a new individual update method that separates the inner-group and outer-group discussions. In the inner-group discussion, the concept of global guidance and local priority is introduced, and in the outer-group discussion the influence of the global optimal value is introduced. New ideas are thus generated with feedback from the characteristics of the current optimal value, and the introduced openness factor represents the unusual ideas that arise in human brainstorming and enhances the search ability of the algorithm by controlling the open element in the innovation. The specific description is as follows. In Formulas (3) and (4), $L_d$ and $H_d$ are the minimum and maximum values of the d-th dimension of the vector to be solved; $x_{pd}$ and $x_{gd}$ represent the d-th dimension of the class-optimal and global-optimal solutions in the current generation, respectively; $x_{ad}$ and $x_{bd}$ are two random feasible solutions in the current generation, with $1 \le a \ne b \le N$, where N is the number of feasible solutions; $x_{id}$ is the original single individual or the individual to be updated formed by a mixture of two solutions; $y_{id}$ represents the value of the d-th dimension of the newly generated solution; $c_1$, $c_2$, and $c_3$ represent the learning factors; and $p_{r-g}$ and $p_{r-p}$ represent the open factors between and within groups. The update methods of the open factors are as follows. In Formulas (5)-(7), $c_{1-p\min}$ and $c_{2-p\min}$ represent the minimum values of the local and global learning factors discussed within the group; $c_{1-p\max}$ and $c_{2-p\max}$ represent the maximum values of the local and global learning factors discussed within the group; $N_{m-p}$ and $N_{m-g}$ represent the current number of discussions within and between groups; and $c_{g\max}$ and $c_{g\min}$ represent the maximum and minimum values of the learning factor in the discussion between groups.
We propose to initialize part of the population to inject new ideas at an appropriate time. Whenever the number of individual generations has not yet reached the maximum number of generations, the following formula is evaluated to decide whether to initialize. In Formula (8), $E_{N_{c-i}}$ represents the optimal value of the fitness function when the current number of individual generations is $N_{c-i}$; $p_{per}$ is the preset generation ratio, meaning that the fitness is compared after $N_{c-i} \times p_{per}$ generations; and η is the initialization parameter. When the above relationship is satisfied, a part of the population is initialized. This part is obtained by sorting the population in ascending order of fitness, discarding the two ends, and taking the middle 50%.
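The selection of the middle 50% can be illustrated with a short sketch. It covers only the reinitialization step itself; the trigger condition of Formula (8) is not modeled, and the function name, the uniform resampling inside the bounds, and the rounding of the quarter boundaries are assumptions made for illustration.

```python
import random


def reinitialize_middle_half(population, fitness, lower, upper, rng):
    """Resample the middle 50% of the population, ranked by ascending fitness.

    Returns the indices (into population) of the individuals that were replaced.
    """
    # Rank individuals from the smallest to the largest fitness value.
    order = sorted(range(len(population)), key=lambda i: fitness(population[i]))
    n = len(order)
    start = n // 4            # skip the lowest-fitness quarter
    stop = start + n // 2     # and leave the highest-fitness quarter untouched
    chosen = order[start:stop]
    for i in chosen:
        population[i] = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
    return chosen
```

Both ends of the ranking are left untouched in this sketch, so the current best individuals survive the reinitialization while half of the population is redistributed over the search range.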
PSDMBSO flexibly adjusts the global and local search capabilities of the algorithm by dividing the individual update into two parts, the inner-group and outer-group discussions, which helps the algorithm avoid falling into a local optimum. To counter the tendency of the algorithm to converge prematurely, a partial population initialization is proposed, which further reduces the possibility of the algorithm falling into a local optimum and increases its search range.

Optimization of VMD Parameters Based on PSDMBSO
The envelope entropy of the signal x(i) with a sampling length of N is described as follows. In Formulas (9) and (10), i = 1, 2, ..., N; a(i) is the envelope signal obtained after Hilbert demodulation of x(i); and $p_i$ is obtained by normalizing a(i).
According to the inner-group and outer-group discussion mechanisms and search mechanisms of PSDMBSO, the following fitness function is established. In Formula (11), the values of µ and γ, selected by simulation, are 0.01 and 0.005, respectively.
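The envelope entropy described above can be illustrated numerically. The sketch below assumes the form commonly used in the literature, $p_i = a(i) / \sum_{j=1}^{N} a(j)$ and $E_p = -\sum_{i=1}^{N} p_i \ln p_i$; the natural logarithm and the function names are assumptions, and the analytic signal is built with a plain FFT rather than a library Hilbert routine.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal; its magnitude is the Hilbert envelope a(i)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC component once
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist component once
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def envelope_entropy(x):
    """Entropy of the normalized Hilbert envelope of x (natural log assumed)."""
    a = np.abs(analytic_signal(x))     # envelope a(i)
    p = a / a.sum()                    # normalized envelope p_i
    p = p[p > 0]                       # skip exact zeros so log is defined
    return float(-np.sum(p * np.log(p)))
```

A signal with a constant envelope attains the maximum value ln N, whereas a sparse, impulsive envelope yields a smaller value.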

Improved Filter Construction Based on Frequency Band Entropy
According to [15], if the original signal is x(t) and the number of modal components obtained after VMD decomposition is $K^*$, with the components denoted $IMF_j$ ($j = 1, 2, \ldots, K^*$), the band entropy distribution of $IMF_j$ is as follows. In Formula (13), j denotes the j-th modal component of the original signal after VMD decomposition, $f_{be-f}$ represents the f-th frequency component in the j-th modal component, and the formula gives the corresponding frequency band entropy. On this basis, the minimum band entropy is defined as follows. From Formula (14), each IMF has a corresponding minimum frequency band entropy, recorded as $f_{E_j\min}$, and the filter is constructed using this as the center frequency.
Secondly, the bandwidth $f_j$ is defined. We propose a new screening criterion for reconstruction components. When the criterion in Equation (15) is not satisfied, a modal component that satisfies Equation (16) can also be used as a reconstruction component, so satisfying either Equation (15) or Equation (16) is regarded as satisfying the reconstruction standard. The components that meet these conditions contain part of the fault information and at the same time have fewer internal interference components. This standard balances the extraction of fault information against the removal of interference components and can better screen the components containing fault information.

Feature Extraction Based on Adaptive Generalized S-Transform
Let $x(t) \in L^2(\mathbb{R})$, where $L^2(\mathbb{R})$ is the space of finite-energy functions. Then, the Adaptive Generalized S-Transform of the signal x(t) is defined as follows. In the formula, p and k are constants between 0 and 1, f represents the frequency, τ represents the time shift factor, ω(τ, f) is a Gaussian window function, i is the imaginary unit, σ(f) is the adaptive scale factor of the Gaussian window function, and $f_k$ is the center frequency of the corresponding band filter of the original signal. When $f = f_k$ and p = k = 1, the Adaptive Generalized S-Transform becomes the standard S-transform.
It can be seen from Equation (19) that when $f < f_k$, the variance of the window function increases as the frequency f decreases, resulting in a higher frequency domain resolution. According to the characteristics of the exponential function, the frequency domain resolution changes with the frequency range, so that the frequency domain resolution is higher in the low frequency range and the time-frequency distribution characteristics of the lower frequency range are reflected more fully. When $f > f_k$, the variance of the window function decreases as the frequency f increases, resulting in a higher time domain resolution, so that the time-frequency distribution characteristics of the higher frequency range are reflected more fully.
In Equation (19), as f changes, the scale factor changes adaptively so that the Gaussian window function maintains a better time-frequency resolution. Introducing the adaptive scale factor thus makes the time-frequency resolution adaptive, which allows more time-frequency features to be extracted.
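The special case mentioned above, the standard S-transform, can be sketched with the usual FFT-based discretization. This is not the AGST itself: the adaptive scale factor of Equation (19) is not reproduced here, and the optional `sigma` callable is a hypothetical hook through which a frequency-dependent window width (in seconds) could be supplied. With `sigma=None`, the window width is $1/f$, which gives the standard S-transform.

```python
import numpy as np


def s_transform(x, fs, sigma=None):
    """Discrete S-transform of a real signal x sampled at fs (Hz).

    Returns an array of shape (n // 2 + 1, n): row k corresponds to the
    frequency k * fs / n, column j to the time j / fs.
    sigma: optional callable f -> Gaussian window width in seconds
           (hypothetical hook; defaults to 1 / f, the standard S-transform).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    alpha = np.fft.fftfreq(n, d=1.0 / fs)      # signed frequency axis in Hz
    out = np.zeros((n // 2 + 1, n), dtype=complex)
    out[0, :] = x.mean()                       # zero-frequency row is the mean
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        width = 1.0 / f if sigma is None else sigma(f)
        # Fourier transform of a unit-area Gaussian window with this width.
        window = np.exp(-2.0 * np.pi ** 2 * alpha ** 2 * width ** 2)
        shifted = np.roll(spectrum, -k)        # X[m + k]
        out[k, :] = np.fft.ifft(shifted * window)
    return out
```

The magnitude `np.abs(s_transform(x, fs))` is the kind of two-dimensional time-frequency map that the later image-based steps operate on. A wider window at a given frequency sharpens the frequency resolution there, and a narrower one sharpens the time resolution, which is the trade-off the adaptive scale factor is meant to control.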

Numerical Experiment and Analysis
In order to verify the effectiveness of the proposed method, we selected the data of Case Western Reserve University's Electrical Engineering Laboratory for simulation experiments. In the experimental platform, the drive end bearing model was SKF6205 and the fan end bearing model was SKF6203. The faults were single-point damages introduced by electrical discharge machining (EDM), with three damage diameters: 0.1778, 0.3556, and 0.5334 mm. The damage point on the outer ring of the bearing was set at three different clock positions: 3 o'clock, 6 o'clock, and 12 o'clock. The vibration signal was collected with a sampling frequency of 12 kHz, and the drive end bearing fault data also included data with a sampling frequency of 48 kHz. In the simulation experiment, the motor was given four loads of 0, 1, 2, and 3 Hp, corresponding to four different speeds of 1797, 1772, 1750, and 1730 r/min. Additionally, the rated power of the motor was 1.5 kW.
The structure of the neural network we used is based on the LeNet-5 model for two-dimensional images that have undergone signal-image preprocessing. The parameter settings of the neural network are shown in Table 1.

Simulation Experiment of PSDMBSO Algorithm
When cracks or other faults occur in the components of a rolling bearing, its vibration state changes. In order to accurately analyze the characteristic information contained in the vibration signal, this paper uses the rolling-element fault data with a motor speed of 1797 r/min and a damage diameter of 0.1778 mm, and compares PSDMBSO with several commonly used optimization algorithms: Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and BSO.
From Figure 2b, it can be seen that the fault signal mainly includes 3 main frequency components. The parameter settings of the PSDMBSO algorithm were divided into BSO parameter settings, update parameter settings, and initialization parameter settings. The BSO parameters were the same as in Reference [22], the initialization parameters (m, n, etc.) were set to common values, and the update parameters were selected as the best values from 10 random parameter experiments within the common range. The update and initialization parameters determined from the final results of the simulation experiment are shown in Table 2.
According to Konstantin Dragomiretskiy's documentation, the parameters that need to be determined for the VMD algorithm are: the number of IMF components K, the penalty factor α, the convex function optimization-related parameter tau, the center frequency initialization setting init, the center frequency update-related parameter DC, and the termination condition ε. Parameters other than K and α have little effect on the decomposition and were set to common values, namely tau = 0, init = 1, DC = 0, and ε = 1e-7. The spectral distribution diagrams of the modal components are shown in Figures 3 and 4.
It can be seen from Figure 3a,b that the original signal is decomposed into 5 and 4 modal components, respectively, and different numbers of false components appear; that is, over-decomposition occurs.
It can be seen from Figure 4b that the original signal is decomposed into 4 modal components, while the result of the algorithm we proposed is 3 modal components with no modal aliasing, indicating that the original signal achieves a better decomposition effect.
In order to get close to the actual situation, we used the bearing failure data from the Bearing Data Center of Case Western Reserve University. The set load was 2 Hp, which corresponds to a speed of 1750 r/min. By compressing the obtained time-frequency diagram to a resolution of 64 × 64 in the same proportion, 7000 training set images and 1000 test set images were finally obtained. The specific fault types are shown in Table 3. For the four categories of vibration acceleration signals of ten types of faults at rated power, the optimal parameter combinations searched by the PSDMBSO algorithm are shown in Table 4 below.
Table 3. Types of rolling bearing failure (columns: bearing status, labels).
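The compression of a time-frequency diagram to 64 × 64 mentioned above can be sketched as a block average. The resampling method is not stated in the text, so block averaging is an assumption, as are the function name and the requirement that both input dimensions be multiples of the target size.

```python
import numpy as np


def block_mean_resize(tf_map, size=64):
    """Downsample a 2-D time-frequency magnitude map to size x size.

    Every output pixel is the mean of one rectangular block of the input.
    """
    tf_map = np.asarray(tf_map, dtype=float)
    rows, cols = tf_map.shape
    if rows % size or cols % size:
        raise ValueError("both dimensions must be multiples of the target size")
    # Split each axis into (size, block length) and average over the blocks.
    blocks = tf_map.reshape(size, rows // size, size, cols // size)
    return blocks.mean(axis=(1, 3))
```

For example, a 128 × 256 magnitude map is reduced by averaging 2 × 4 blocks, which keeps the overall energy distribution of the map while fixing the input size expected by the network.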

Improved Filter Construction Based on Band Entropy
Take the inner-ring fault data with a rolling bearing load of 2 Hp, a damage diameter of 0.1778 mm, a speed of 1750 r/min, and a sampling frequency of 12 kHz as an example. The selection of the reconstruction components proceeds as follows. The frequency band entropy of the original signal under window functions of different lengths is shown in Figure 5a. It can be seen that the center frequency is 2062.5 Hz, and the corresponding optimal window function length is 64. According to the above formula, the pass band of the corresponding band filter of the original signal is [1921.375, 2203.625]. In addition, the parameters obtained by PSDMBSO optimization are K = 8 and α = 249; the corresponding IMF components can then be obtained, and the same operation is performed on each component. The final result is that IMF1 and IMF2 are selected as the optimal reconstruction components. The detailed process is as follows: as shown in Figures 5b and 6a, the corresponding center frequency and optimal window function length can be obtained as [1968.75, 128] [...] .5625], which meets the previously proposed standards; the remaining components do not meet the requirements and are discarded. Take IMF7 as an example: its center frequency is 375 Hz, the corresponding optimal window function length is 64, and the final result is [234.375, 515.625], which obviously does not meet the requirements. The detailed results are shown in Figure 6b.

Experimental Result
In this paper, the bearing fault data set of the drive end was selected, and the onedimensional signal was converted into a time-frequency map using AGST transformation on the reconstructed signal. At the same time, the time-frequency map was compared with the time-frequency map processed by GST, as shown in Figures 7 and 8.

Experimental Result
In this paper, the bearing fault data set of the drive end was selected, and the one-dimensional signal was converted into a time-frequency map using AGST transformation on the reconstructed signal. At the same time, the time-frequency map was compared with the time-frequency map processed by GST, as shown in Figures 7 and 8.  Figure 6b below.

Experimental Result
In this paper, the bearing fault data set of the drive end was selected, and the onedimensional signal was converted into a time-frequency map using AGST transformation on the reconstructed signal. At the same time, the time-frequency map was compared with the time-frequency map processed by GST, as shown in Figures 7 and 8. The upper part of Figure 8b is the outer-ring fault data with the fault diameter equal to 0.5334 mm, and the lower part is the normal signal.
For the optimization problem of SVM, the fitness function was selected as the 5-fold cross-validation method. When the accuracy rate reaches the highest and does not change, the corresponding C and g are the best parameters. In this paper, the range of C and g was set to 0-100.
As shown in Figure 9, the blue curve represents the average fitness, and the red curve represents the best fitness. With the increase of the number of iterations, the red best fitness curve shows an upward trend, and reaches the maximum when the number of iterations reaches 30. Excellent and remaining unchanged, the correct rate of 5-fold crossvalidation is 83.333%, the best parameter C = 79.4712, the best parameter g = 0.0328, and the SVM classification model is established with these two parameters. The upper part of Figure 8b is the outer-ring fault data with the fault diameter equal to 0.5334 mm, and the lower part is the normal signal.
For the optimization problem of SVM, the fitness function was selected as the 5-fold cross-validation method. When the accuracy rate reaches the highest and does not change, the corresponding C and g are the best parameters. In this paper, the range of C and g was set to 0-100.
As shown in Figure 9, the blue curve represents the average fitness, and the red curve represents the best fitness. With the increase of the number of iterations, the red best fitness curve shows an upward trend, and reaches the maximum when the number of iterations reaches 30. Excellent and remaining unchanged, the correct rate of 5-fold crossvalidation is 83.333%, the best parameter C = 79.4712, the best parameter g = 0.0328, and the SVM classification model is established with these two parameters. The upper part of Figure 8b is the outer-ring fault data with the fault diameter equal to 0.5334 mm, and the lower part is the normal signal.
For the optimization problem of SVM, the fitness function was selected as the 5-fold crossvalidation method. When the accuracy rate reaches the highest and does not change, the corresponding C and g are the best parameters. In this paper, the range of C and g was set to 0-100.
It can be seen from Figure 10a that the predicted label and the actual label of the test set completely overlap, which means the diagnosis accuracy of the bearing reaches 100%.
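The relationship between overlapping labels and accuracy can be made concrete with a short sketch. The function name `accuracy` and the label values below are hypothetical and serve only as an illustration; they are not the paper's test set.

```python
def accuracy(predicted, actual):
    """Percentage of test samples whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * matches / len(actual)


# Hypothetical labels for illustration only.
actual_labels = [0, 0, 1, 1, 2, 2]
overlapping = list(actual_labels)   # every prediction matches its actual label
one_miss = [0, 0, 1, 1, 2, 1]       # the last sample is misclassified

print(accuracy(overlapping, actual_labels))
print(accuracy(one_miss, actual_labels))
```

When the two label sequences coincide on every sample, the count of matches equals the test set size and the accuracy is 100%. Each misclassified sample lowers it by one share of the test set.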
We selected the same data and used the Generalized S-Transform to convert the reconstructed signal into a time-frequency map, with all other steps unchanged. The final test set result is shown in Figure 10b. Under the same data, the final result of using improved screening criteria is shown in Figure 11a.
It can be seen from Figure 10b that the accuracy obtained with the Generalized S-Transform is 96.6%, so the diagnostic model we proposed is 3.4 percentage points more accurate at rated power, which shows the superiority of AGST. As shown in Figure 11a, the accuracy of the test set is 95.4%, which is 4.6 percentage points lower than that of the method we proposed and illustrates the effectiveness of the screening criteria we proposed. To further demonstrate the advantages of the feature extraction method we proposed, we decomposed the signal with the traditional VMD method (K = 6, α = 2000) and then applied the same processing.
It can be seen from Figure 11b that the accuracy is 92.60%. Under the same data, the accuracy of the method we proposed is 7.40 percentage points higher than that of the traditional VMD, which shows that the method we proposed can accurately identify different faults with the same damage diameter.
Based on the above comparative experiments, the fault diagnosis results of these methods for different fault positions and different degrees of damage under the rated power are explored further. The method we proposed is named Method 1. The method that uses PSDMBSO-optimized VMD + optimized band filters + AGST + CNN + Softmax is named Method 2. The method that uses PSDMBSO-optimized VMD + unoptimized band filters + AGST + SVM is named Method 3. The method that uses the traditional VMD decomposition is named Method 4.
The network structure and parameters of the CNN in Method 2 are the same as in Method 1, the SVM parameters C and g in Method 3 are randomly selected, and the parameters of the traditional VMD in Method 4 were set to [K, α] = [6, 2000].
To further test the diagnostic accuracy of the method we proposed under non-rated power, data under the same state were selected for experimental comparison. The final results are shown in Tables 5-7. It can be seen that the method we proposed has an accuracy close to 100%. Compared with other methods that currently perform well, the accuracy is improved by 6.2-8.9%. The result shows that the method we proposed is suitable for the complex and changeable working environment of the rolling bearing, which is of great significance for the diagnosis of rolling bearings in practical applications.
To further verify the effectiveness of the method we proposed in dealing with minor faults, we selected 100 groups of fault data (0.1778 mm) in different load states, totaling 2400 test samples. First, the minor faults under different load conditions were tested separately, and the results are shown in Figures 12 and 13. Finally, the experimental results corresponding to different loads are shown in Table 8.
It can be seen from Table 8 that the method we proposed identifies minor faults more accurately than the other three methods. The table also shows that the accuracy of the method we proposed remains above 96% when distinguishing the three specific fault types that belong to the same minor outer-ring fault category but differ in fault location. This greatly improves the accurate identification of minor faults required in actual production.
To further explore the anti-noise performance of the method we proposed in an actual working environment, we introduced the standard noise library NOISEX-92 from the Signal Processing Information Base (SPIB). The noise data in the library were all collected at a sampling frequency of 19.98 kHz, with a duration of 235 s. The noise signals were resampled to 12 kHz to match the sampling frequency of the fault signals, and the calibrated noise signal was then added to the original vibration signal. The experimental results are shown in Table 9.
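The frequency calibration and mixing step can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: polyphase resampling via `scipy.signal.resample_poly` is one reasonable way to convert 19.98 kHz to 12 kHz, the `snr_db` scaling is a hypothetical knob (the mixing level is not specified in the text), and the white noise and sine wave are synthetic stand-ins for the NOISEX-92 recordings and the bearing vibration data.

```python
import math

import numpy as np
from scipy.signal import resample_poly

NOISE_FS = 19_980   # Hz, sampling frequency of the noise recordings
SIGNAL_FS = 12_000  # Hz, sampling frequency of the vibration signal


def resample_noise(noise, fs_in=NOISE_FS, fs_out=SIGNAL_FS):
    """Resample noise from fs_in to fs_out with a polyphase filter."""
    g = math.gcd(fs_in, fs_out)
    # 19980 Hz -> 12000 Hz reduces to the rational factor 200/333.
    return resample_poly(noise, up=fs_out // g, down=fs_in // g)


def add_noise(signal, noise, snr_db):
    """Add noise to the signal, scaled to a chosen SNR in dB (assumed parameter)."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as the signal")
    noise = noise[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise


# Synthetic stand-ins: 1 s of white noise and a 1 s sine "vibration" signal.
rng = np.random.default_rng(0)
raw_noise = rng.standard_normal(NOISE_FS)
t = np.arange(SIGNAL_FS) / SIGNAL_FS
vibration = np.sin(2 * np.pi * 100 * t)

noise_12k = resample_noise(raw_noise)
noisy = add_noise(vibration, noise_12k, snr_db=0.0)
```

A polyphase resampler applies a low-pass filter during the rate change, which limits aliasing when the noise is brought down to the lower sampling frequency.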
As shown in Table 9, the difference between Method 1 and Method 2 is the signal-image processing method. The lowest accuracy of Method 1 is higher than the highest accuracy of Method 2, which shows that the Adaptive Generalized S-Transform we proposed improves noise immunity by extracting more effective features. Comparing Method 1 and Method 3, we can see that our proposed reconstruction component selection criterion improves the anti-interference ability of the whole model by removing the interference components and extracting more fault features. In summary, the accuracy of the method we proposed stays at a high level, and its average accuracy is 7.61-9.60% higher than that of the other methods in the table, which shows that the fault diagnosis method we proposed can better separate fault and noise components.

Discussion of Experimental Results
As can be seen from Tables 5-7, as the fault degree (damage diameter) increases, the diagnostic accuracy of all four models increases. For faults at different positions with a fault diameter of 0.1778 mm, the accuracy of the method we proposed exceeds 97%, which is 5-8% higher than that of the other three models. This shows that the method we proposed improves the recognition accuracy of minor faults when faults of different degrees are considered.
Further experiments on different fault types under minor faults show that the recognition accuracy of the model we proposed is 0.73-1.4% lower than for general faults, whereas for the other three models the drop exceeds 2%. This shows that the method adapts well to the identification of faults of different degrees. The comparative test results under minor faults also show that the method we proposed enhances the sensitivity of the diagnostic model to minor faults.
In the noise experiment under minor faults, the accuracy of the other three models is below 92%, whereas the accuracy of the diagnostic model we proposed is maintained at about 98%. This shows that the parameter optimization and AGST extract more effective fault features and thus improve the diagnostic accuracy in noisy environments.

Conclusions
A rolling bearing fault diagnosis method based on parameter optimization and AGST was proposed. Compared with the traditional VMD methods, the proposed method achieves higher fault diagnosis accuracy for general faults through more comprehensive and accurate feature extraction. For minor faults under different loads, the diagnostic accuracy of the proposed method remains above 98.5%, which is 5.17-9.5% higher than that of the other methods. The diagnostic accuracy of the proposed method remains above 97.96% when some types of noise are present. The comprehensive experimental results show that the proposed method significantly improves the diagnostic accuracy and anti-interference ability of rolling bearing fault diagnosis by extracting fault features accurately and comprehensively.