Time-Frequency Fusion Features-Based GSWOA-KELM Model for Gear Fault Diagnosis

Abstract: To improve the accuracy of gear fault diagnosis and to avoid the accuracy loss caused by manual parameter selection, this study proposes a diagnostic model in which time-frequency fusion features are combined with the improved global search whale optimization algorithm (GSWOA) to optimize the kernel extreme learning machine (KELM). First, the time-domain and frequency-domain features of the gear fault state are extracted separately, and feature vectors are constructed through feature fusion, which overcomes the limitations of single-domain features. Second, the GSWOA based on three strategies is used to optimize the regularization coefficient C and kernel function parameter γ of KELM, and a GSWOA-KELM fault diagnosis model is built to avoid the low diagnostic accuracy caused by the manual selection of KELM parameters. Finally, the public dataset from Southeast University is used to verify the performance of the proposed model by comparing it with the KELM, SSA-KELM, and WOA-KELM models. The experimental results demonstrate that the time-frequency fusion features-based GSWOA-KELM model converges faster and has stronger global search ability. Compared with the KELM, SSA-KELM, and WOA-KELM models, the diagnostic accuracy of the proposed model is higher by 11.33, 8.67, and 1.33 percentage points, respectively.


Introduction
Gears are essential components of rotating machinery and are widely used in industrial applications such as wind turbines and automotive transmissions [1][2][3]. However, gear mesh surfaces often operate under poor lubrication, as evidenced by mechanical impurities in the oil, oil film disruption, etc., which causes friction on the tooth surface and leads to surface deterioration, scuffing, and permanent deformation [4]. In addition, due to quenching, fatigue, grinding, and cyclic loading, gears are often subject to cracks and fractures [5]. Once a gear fails, it not only greatly reduces the safety and reliability of the equipment but may also cause serious production accidents with substantial economic and social consequences. Therefore, it is of great importance to detect gear faults and take timely measures to ensure the safe operation of equipment.
At present, intelligent diagnosis technology based on machine learning is a major research trend in gear fault diagnosis [6][7][8]. Reference [9] used frequency-modulated empirical mode decomposition to extract vibration signal features and calculated the energy entropy as a feature vector, which was input into a support vector machine (SVM) to realize gear fault diagnosis. Reference [10] input the extracted vibration signal characteristic parameters into a K-nearest neighbor (KNN) fault diagnosis model, which effectively achieved the predictive maintenance of rolling bearing faults. Although traditional machine learning algorithms can complete fault diagnosis, they still have shortcomings in training speed and diagnostic accuracy. Therefore, scholars are gradually turning their attention to neural network-based algorithms. Reference [11] combined the SVM with a convolutional neural network (CNN) to build a CNN-SVM fault diagnosis model, avoiding the manual feature extraction process and improving fault diagnosis accuracy and stability. Reference [12] proposed a BP-AdaBoost strong classifier model for gear faults based on the BP neural network and the AdaBoost algorithm and verified it experimentally; the results showed that the proposed method has higher accuracy than traditional fault diagnosis methods. Reference [13] proposed a hierarchical refined composite multiscale fluctuation dispersion entropy (HRCMFDE) feature extraction method, in which the extracted features are reduced in dimension using Relief and input into a regularized extreme learning machine (RELM), effectively improving the practicality and versatility of model fault diagnosis.
The kernel extreme learning machine (KELM), as a novel feed-forward neural network, has higher generalization ability and stability than the BP neural network, the RBF neural network, and the ELM, so it has greater application advantages [14]. However, the performance of KELM is affected by the regularization coefficient C and kernel function parameter γ, and the classification accuracy of KELM is low when these two parameters are selected by manual experience [15]. To solve this problem, researchers have proposed using intelligent optimization algorithms to optimize the KELM parameters and thereby improve the fault diagnosis accuracy. The particle swarm optimization (PSO) algorithm [16], the sparrow search algorithm (SSA) [17], and the Harris hawks optimization (HHO) [18] have been used to optimize the KELM model in order to reduce the errors caused by manual parameter selection. However, the experimental results showed that all of these optimization algorithms have their own shortcomings [19,20]. At the same time, the whale optimization algorithm (WOA) has developed rapidly because of its powerful search capability and small number of parameters [21]. However, WOA also has problems such as slow convergence and insufficient global search capacity [22]. Therefore, an improved whale optimization algorithm is needed to solve these problems. Reference [23] proposed an improved global search whale optimization algorithm (GSWOA) based on three strategies. The adaptive weight strategy, variable spiral position update strategy, and optimal neighborhood perturbation strategy were adopted to improve the global search performance and convergence speed of the whale optimization algorithm.
In addition, feature extraction is an indispensable step before applying machine learning methods to fault diagnosis. Reference [24] extracted 17 time-domain features from circuit breaker vibration signals and input them into an XGBoost model to diagnose the mechanical condition of the circuit breaker. Reference [25] used sparse filtering to automatically extract the frequency-domain features of gear vibration signals and input them into a Softmax classifier as feature vectors in order to realize gear fault diagnosis. However, extracting only time-domain or only frequency-domain features often represents the information in the signal insufficiently, which affects the accuracy of fault diagnosis [26].
Therefore, an improved time-frequency fusion features-based GSWOA-KELM model is proposed in this study. First, the time-domain and frequency-domain features of the gear fault state are extracted separately, and feature vectors are constructed through feature fusion, which overcomes the limitations of single-domain features. Second, the GSWOA based on three strategies is used to optimize the regularization coefficient C and kernel function parameter γ of KELM, and a GSWOA-KELM fault diagnosis model is built to avoid the low diagnostic accuracy caused by the manual selection of KELM parameters. Finally, the public dataset from Southeast University is used to verify the performance of the proposed model by comparing it with the KELM, SSA-KELM, and WOA-KELM models. The experimental results demonstrate that the improved time-frequency fusion features-based GSWOA-KELM model converges faster and has stronger global search ability. Compared with the KELM, SSA-KELM, and WOA-KELM models, the diagnostic accuracy of the proposed model is higher by 11.33, 8.67, and 1.33 percentage points, respectively.
The main innovations and contributions of this paper are as follows: (1) This study proposes the GSWOA-KELM model for the first time. In the new model, the GSWOA is used to find the optimal parameters of the KELM, and the results show that, compared with the existing models, the proposed GSWOA-KELM model has higher diagnostic accuracy, faster convergence, and stronger global search capability; (2) The time-domain and frequency-domain features are extracted and fused in this study, which overcomes the limitations of single-domain features and improves the fault diagnosis ability of the model. Meanwhile, the superiority of multi-domain features in representing signal information is examined, which provides a reference for feature extraction work in other applications.

Time-Domain Features
Since the time-domain signal of a gear tends to change when the gear is faulty, the time-domain characteristic parameters of the gear vibration signal can be analyzed to diagnose the type of fault effectively.
Dimensional and dimensionless time-domain characteristic parameters are the two commonly used kinds of time-domain characteristic parameters. The 13 dimensional and dimensionless time-domain characteristic parameters extracted in this study and their calculation formulas are shown in Table 1 [27].

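Table 1. The 13 time-domain characteristic parameters and their calculation formulas.

Mean value: $\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x(n)$
...

Note: x(n) represents the time-domain sequence of the signal, n = 1, 2, ..., N; N is the number of sample points.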
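As an illustration of how dimensional and dimensionless time-domain parameters can be computed from one sample group, the following minimal sketch may help. The function name `time_domain_features` and the particular statistics chosen are assumptions made for illustration; they are common vibration-analysis indicators and are not claimed to be the exact 13 parameters of Table 1.

```python
import numpy as np

def time_domain_features(x):
    """Return a few common time-domain statistics of a 1-D signal.

    The selection below is illustrative only (an assumption), not the
    exact parameter list of Table 1.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()                      # population standard deviation
    rms = np.sqrt(np.mean(x ** 2))     # root mean square
    peak = np.max(np.abs(x))
    abs_mean = np.mean(np.abs(x))
    return {
        # dimensional parameters
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        # dimensionless parameters
        "kurtosis": np.mean((x - mean) ** 4) / std ** 4,
        "crest_factor": peak / rms,
        "shape_factor": rms / abs_mean,
        "impulse_factor": peak / abs_mean,
    }

# Tiny example: a square-like sequence with zero mean and unit amplitude.
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

Dimensionless parameters such as the crest factor are ratios of dimensional ones, so they are insensitive to the overall amplitude of the signal.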

Frequency-Domain Features
Extracting and analyzing the frequency-domain characteristic parameters of the gear fault vibration signal is also one of the efficient methods for gear fault diagnosis.
Therefore, in this study, the original time-domain vibration signal is transformed into the frequency domain using the Fourier transform to observe the characteristics of the vibration signal from the perspective of frequency. The conversion from the time domain to the frequency domain can be defined as:

$$s(m) = \sum_{k=0}^{N-1} x(k\Delta t)\, e^{-j 2\pi m k / N}, \quad m = 0, 1, \ldots, N-1 \qquad (1)$$

where x(k∆t) represents the sample value; N denotes the number of sample points; ∆t indicates the sampling interval; k is the discrete index of the time-domain signal; and m is the index of the spectral line.
After conversion into the frequency domain, the frequency-domain characteristic parameters can be calculated according to the corresponding frequency-domain statistical index formulas. The five frequency-domain characteristic parameters extracted in this study and their calculation formulas are shown in Table 2 [28].
Table 2. The five frequency-domain characteristic parameters and their calculation formulas.

Amplitude mean: $AM = \frac{1}{K}\sum_{k=1}^{K} s(k)$
...

Note: s(k) stands for the spectrum of the signal x(n), k = 1, 2, ..., K; K is the number of spectral lines; f_k denotes the frequency value of the k-th spectral line.
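The frequency-domain parameters can be sketched in the same way. In the example below, only the amplitude mean follows the formula named in Table 2; the frequency centroid and RMS frequency are common spectral statistics included as assumptions, and the function name `frequency_domain_features` is hypothetical. The sampling frequency of 5120 Hz and the sample length of 2048 points are taken from the experimental section.

```python
import numpy as np

def frequency_domain_features(x, fs):
    """Compute a few spectral statistics of a 1-D signal sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    s = np.abs(np.fft.rfft(x)) / len(x)        # one-sided amplitude spectrum s(k)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)    # frequency f_k of each spectral line
    return {
        "amplitude_mean": s.mean(),                        # AM = (1/K) * sum_k s(k)
        "frequency_centroid": np.sum(f * s) / np.sum(s),   # assumed extra statistic
        "rms_frequency": np.sqrt(np.sum(f ** 2 * s) / np.sum(s)),  # assumed extra statistic
    }

# A pure 100 Hz sine occupies a single spectral line, so the centroid is 100 Hz.
fs, n = 5120, 2048
t = np.arange(n) / fs
spec_feats = frequency_domain_features(np.sin(2 * np.pi * 100.0 * t), fs)
```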

Fusion Features
The above 13 time-domain features and 5 frequency-domain features are extracted to form the time-domain feature matrix T and the frequency-domain feature matrix F, respectively. Assuming that the total number of samples is n, then:

$$T \in \mathbb{R}^{n \times 13} \qquad (2)$$

$$F \in \mathbb{R}^{n \times 5} \qquad (3)$$

The time-domain feature matrix T and the frequency-domain feature matrix F are fused to form the fused feature matrix TF:

$$TF = [\,T \;\; F\,] \in \mathbb{R}^{n \times 18} \qquad (4)$$
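In array terms, the fusion amounts to a column-wise concatenation of the two feature matrices. The short sketch below uses random placeholder values (an assumption, since no feature values are given here) only to show the resulting shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                              # total number of samples (5 states x 100 groups)
T = rng.normal(size=(n, 13))         # placeholder time-domain feature matrix
F = rng.normal(size=(n, 5))          # placeholder frequency-domain feature matrix

# Fusion: each sample's 13 time-domain and 5 frequency-domain features
# are placed side by side in one 18-dimensional row.
TF = np.hstack([T, F])
```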

Kernel Extreme Learning Machine
The KELM is an improved algorithm developed on the basis of the extreme learning machine (ELM). It introduces a kernel function into the ELM and offers better generalization performance and faster learning [29].
The ELM is a feed-forward neural network consisting of input, hidden, and output layers [20], and its typical network structure is given in Figure 1. The mathematical expression of an ELM is as follows:

$$H\beta = T \qquad (5)$$

$$H = \left[\, g(w_l \cdot x_i + b_l) \,\right]_{n \times L} \qquad (6)$$

where H represents the output matrix of the hidden layer, β is the output weight, T denotes the target output matrix, w_l is the weight of the l-th neuron in the hidden layer, b_l denotes the bias of the l-th neuron in the hidden layer, g(·) is the activation function, x_i is the i-th input sample, and L is the number of hidden neurons.
The learning process of an ELM is the process of solving the output weight β, which is obtained using the least squares method:

$$\beta = H^{+} T \qquad (7)$$

where H^+ represents the generalized inverse matrix of H.
In KELM, the regularization coefficient C and kernel function parameter γ are introduced to improve performance, and the kernel function matrix is expressed as:

$$\Omega = HH^{T}, \quad \Omega_{ij} = h(x_i) \cdot h(x_j) = K(x_i, x_j) \qquad (8)$$

Then, the least squares solution of β for the KELM is:

$$\beta = H^{T}\left(\frac{I}{C} + \Omega\right)^{-1} T \qquad (9)$$

where I is the identity matrix. Based on the above equations, the output function of the KELM can be expressed as:

$$f(x) = h(x)\beta = \begin{bmatrix} K(x, x_1) & \cdots & K(x, x_n) \end{bmatrix} \left(\frac{I}{C} + \Omega\right)^{-1} T \qquad (10)$$

In addition, the radial basis function (RBF) is chosen as the kernel function in this research, whose expression is as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \left\| x_i - x_j \right\|^{2}\right) \qquad (11)$$

where γ is the kernel parameter.
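To make the roles of C and γ concrete, a minimal KELM classifier can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the class name `KELM`, the one-hot target encoding, and the kernel form exp(-γ‖a − b‖²) are all choices made for the example.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||a - b||^2) (assumed form)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Minimal kernel extreme learning machine for classification."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.X_train = X
        self.classes_ = np.unique(y)
        # One-hot target matrix: one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        omega = rbf_kernel(X, X, self.gamma)
        # Solve (I/C + Omega) alpha = T instead of forming the inverse.
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.C + omega, T)
        return self

    def predict(self, X):
        K = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return self.classes_[np.argmax(K @ self.alpha, axis=1)]

# Two well-separated toy clusters.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_toy = np.array([0, 0, 1, 1])
pred = KELM(C=10.0, gamma=1.0).fit(X_toy, y_toy).predict([[0.0, 0.5], [5.0, 5.5]])
```

Solving the linear system with `np.linalg.solve` is numerically preferable to computing the explicit inverse, and a larger C weakens the regularization term I/C.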

Whale Optimization Algorithm
The WOA is a swarm intelligence optimization algorithm that imitates the hunting process of whales in nature, which can be divided into three stages: the encircling prey stage, the bubble-net attacking stage, and the random hunting prey stage [30,31]. In each stage, the position of the whale is updated. To solve a problem with the whale optimization algorithm, the position of each whale is treated as a feasible solution, and the optimal solution is obtained by constantly updating the positions of the whales.
During the encircling prey phase, the whale's position update formula can be expressed as:

$$D = \left| C \cdot X^{*}(t) - X(t) \right| \qquad (12)$$

$$X(t+1) = X^{*}(t) - A \cdot D \qquad (13)$$

where t represents the number of iterations; X(t) indicates the current position of the whale; X*(t) represents the optimal whale location; D is the distance between the whale and the prey; and A and C are coefficients, whose expressions are:

$$A = 2a \cdot r_1 - a, \quad C = 2 r_2 \qquad (14)$$

where a decreases linearly from 2 to 0 over the iterations, and r_1 and r_2 are random numbers in the range [0, 1].
During the bubble-net attacking phase, the position update of the whale can be described using two mechanisms, namely, the contraction surround mechanism and the spiral update mechanism. The mathematical expression of the spiral update mechanism is

$$X(t+1) = D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), \quad D' = \left| X^{*}(t) - X(t) \right| \qquad (15)$$

where l represents a random number in the range between 0 and 1; b is a constant that reflects the shape of the helix.
It is worth noting that during the bubble-net attacking stage, the whale not only approaches the prey in a spiral shape but also shrinks the encircling circle, which is mathematically modeled as:

$$X(t+1) = \begin{cases} X^{*}(t) - A \cdot D, & p < 0.5 \\ D' \cdot e^{bl} \cdot \cos(2\pi l) + X^{*}(t), & p \ge 0.5 \end{cases} \qquad (16)$$

where p represents a random number in the range between 0 and 1.
During the prey search phase, whales use a random search mechanism to search for prey globally. At this time, the updating method of the whale position is determined by the range of A: if |A| < 1, the position is updated by spiral encircling; if |A| ≥ 1, the location is updated by random search. The mathematical model of the random search mechanism is

$$X(t+1) = X_{rand}(t) - A \cdot \left| C \cdot X_{rand}(t) - X(t) \right| \qquad (17)$$

where X_rand(t) denotes the position of a random whale.
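The three update mechanisms can be combined into a single population update, as in the following minimal sketch. The function names `woa_step` and `woa_minimize`, the use of scalar coefficients per whale, the clipping to the search bounds, and the sphere test function are assumptions made for illustration.

```python
import numpy as np

def woa_step(X, X_best, t, t_max, rng, b=1.0):
    """One WOA position update for a population X of shape (n_whales, dim)."""
    a = 2.0 * (1.0 - t / t_max)            # convergence factor, 2 -> 0
    X_new = np.empty_like(X)
    for i, x in enumerate(X):
        A = 2.0 * a * rng.random() - a     # coefficient A
        C = 2.0 * rng.random()             # coefficient C
        p = rng.random()
        if p < 0.5:
            # |A| < 1: encircle the best whale; |A| >= 1: follow a random whale.
            ref = X_best if abs(A) < 1 else X[rng.integers(len(X))]
            X_new[i] = ref - A * np.abs(C * ref - x)
        else:
            l = rng.random()               # l drawn from [0, 1], as in the text
            X_new[i] = (np.abs(X_best - x) * np.exp(b * l)
                        * np.cos(2 * np.pi * l) + X_best)
    return X_new

def woa_minimize(f, dim, lo, hi, n_whales=10, t_max=60, seed=0):
    """Minimize f over [lo, hi]^dim, keeping track of the best whale seen."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(n_whales, dim))
    fit = np.array([f(x) for x in X])
    best, best_f = X[fit.argmin()].copy(), float(fit.min())
    for t in range(t_max):
        X = np.clip(woa_step(X, best, t, t_max, rng), lo, hi)
        fit = np.array([f(x) for x in X])
        if fit.min() < best_f:
            best, best_f = X[fit.argmin()].copy(), float(fit.min())
    return best, best_f

sphere = lambda x: float(np.sum(x ** 2))
best, best_f = woa_minimize(sphere, dim=2, lo=-5.0, hi=5.0)
```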

Global Search Whale Optimization Algorithm
In order to improve the convergence speed and global search ability of traditional whale optimization algorithms, an improved global search whale optimization algorithm (GSWOA) is proposed based on three strategies, namely, adaptive weight strategy, variable spiral position update strategy, and optimal neighborhood perturbation strategy [32].
First, the adaptive weight strategy introduces an adaptive inertia weight, which depends on the iteration number t, into the position update of the whale. The weight is defined in Equation (18), where w(t) is the adaptive inertia weight, which varies within [0, 1]; t is the current iteration number; and t_max indicates the maximum iteration number. According to Equation (18), in the early stage of the algorithm, the weight value is small but changes quickly; in the later stage of the algorithm, with the increase in the number of iterations, the weight is large, but its rate of change slows down, thus improving the convergence of the algorithm. The position update formulas of the improved whale optimization algorithm that incorporate the adaptive weight are Equations (19) and (20).
Second, the variable spiral position update strategy changes the constant b, which reflects the spiral shape in the bubble-net attacking stage, into a variable that is adjusted dynamically with the number of iterations, as defined in Equation (21). From Equation (21), it can be seen that in the early phase of the algorithm, the spiral shape range is larger, so the whale can search over a larger range and has a stronger global search ability; as the number of iterations increases, the spiral shape range becomes smaller, and the whale searches in a smaller range to improve the optimization accuracy. With the variable spiral, the position update formula of the improved whale optimization algorithm becomes Equation (22).
Finally, the optimal neighborhood perturbation strategy expands the search scope to the vicinity of the current optimal location when the whale position is updated, so that the nearby space is searched as well instead of only the current optimal location. In this way, the search efficiency of the whale and the convergence speed of the algorithm can be enhanced. The perturbation that generates a new location in the neighborhood of the current optimal location is given by Equation (23), where rand1 and rand2 indicate uniform random numbers in the range [0, 1], and $\tilde{X}(t)$ is the generated new location. If the generated new position is better than the original position, the new position is kept; otherwise, the original position is retained. The formula is expressed as:

$$X^{*}(t) = \begin{cases} \tilde{X}(t), & f(\tilde{X}(t)) < f(X^{*}(t)) \\ X^{*}(t), & f(\tilde{X}(t)) \ge f(X^{*}(t)) \end{cases} \qquad (24)$$

where f(x) represents the fitness value when the position is x. The overall flow of the GSWOA is shown in Figure 2.
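The adaptive weight and the perturbation with greedy selection can be illustrated with a short sketch. Both formulas in the code are assumptions: `adaptive_weight` is merely one function with the described behaviour (small but fast-changing early, large but slow-changing late, within [0, 1]), and the perturbation form in `perturb_best` is a plausible choice using rand1 and rand2; neither is claimed to reproduce Equations (18) or (23). Only the greedy selection step follows directly from the description in the text.

```python
import math
import numpy as np

def adaptive_weight(t, t_max):
    """Assumed weight shape: rises from 0 to 1, quickly at first, slowly later."""
    return math.cos(0.5 * math.pi * (1.0 - t / t_max))

def perturb_best(x_best, f, rng):
    """Perturb the current best position, then keep whichever is fitter."""
    rand1, rand2 = rng.random(), rng.random()
    if rand2 < 0.5:
        candidate = x_best + 0.5 * rand1 * x_best   # assumed perturbation form
    else:
        candidate = x_best.copy()
    # Greedy selection: lower fitness is better (fitness is minimized).
    return candidate if f(candidate) < f(x_best) else x_best

rng = np.random.default_rng(1)
f = lambda x: float(np.sum((x - 3.0) ** 2))   # toy fitness with optimum at (3, 3)
x_best = np.array([2.0, 2.5])
x_new = perturb_best(x_best, f, rng)
```

Because of the greedy selection, the best fitness can never get worse after a perturbation step, whatever perturbation form is used.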

Kernel Extreme Learning Machine Optimized using the Global Search Whale Optimization Algorithm
In this study, GSWOA is used to optimize the regularization coefficient C and kernel function parameter γ of KELM, and a GSWOA-KELM gear fault diagnosis model is constructed to avoid the low fault diagnosis efficiency of KELM caused by manual parameter selection. The process of using GSWOA to optimize the KELM parameters is shown in Figure 3. The specific steps are as follows:
Step 1: Initialize the parameters of the GSWOA: set the whale population size to 10, the maximum number of iterations to 60, the problem dimension to 2, and the whale exploration boundary to [1, 20];
Step 2: Initialize the whale positions and map them to the parameters of KELM: the regularization coefficient C and the kernel function parameter γ;
Step 3: Calculate the fitness value of each whale in the population and find the optimal whale location in the population;
Step 4: Update the current optimal location using Formulas (23) and (24);
Step 5: Randomly generate the update parameter p. If p < 0.5 and |A| < 1, update the whale position using Formula (16); if p < 0.5 and |A| ≥ 1, use Formula (20) to update the position of the whale. If p ≥ 0.5, the whale position is updated using Formula (22), where A is the step coefficient of the convergence factor optimization;
Step 6: Determine whether the maximum number of iterations has been reached. If it has, output the whale position at this moment as the optimal parameters of KELM, input these optimal parameters into the KELM model, and train the model for fault diagnosis; if not, repeat the above steps until the maximum number of iterations is reached.
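Steps 1 to 3 can be sketched as follows: each whale is a pair (C, γ), and its fitness is taken here to be the KELM classification error rate. The function name `kelm_error`, the choice of the error rate on a held-out split as fitness, the RBF kernel form, and the toy two-class data are all assumptions made for illustration; the population size of 10 and the boundary [1, 20] come from Step 1.

```python
import numpy as np

def kelm_error(params, X_tr, y_tr, X_te, y_te):
    """Fitness of one whale: error rate of a KELM with parameters (C, gamma)."""
    C, gamma = params

    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * d2)          # assumed RBF form

    classes = np.unique(y_tr)
    T = (y_tr[:, None] == classes[None, :]).astype(float)   # one-hot targets
    alpha = np.linalg.solve(np.eye(len(X_tr)) / C + kern(X_tr, X_tr), T)
    pred = classes[np.argmax(kern(X_te, X_tr) @ alpha, axis=1)]
    return float(np.mean(pred != y_te))

rng = np.random.default_rng(0)
# Toy data standing in for the fused feature vectors: two Gaussian blobs.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
y = np.repeat([0, 1], 20)
idx = rng.permutation(len(X))
tr, te = idx[:28], idx[28:]                  # 7:3 split

# Step 1-2: 10 whales, each a (C, gamma) pair inside the boundary [1, 20].
whales = rng.uniform(1.0, 20.0, size=(10, 2))
# Step 3: fitness of every whale and the current optimal location.
fitness = np.array([kelm_error(w, X[tr], y[tr], X[te], y[te]) for w in whales])
best_whale = whales[np.argmin(fitness)]
```

The remaining steps would repeatedly move the whales with the GSWOA update rules and re-evaluate this fitness until the iteration limit is reached.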


Experimental Verification and Result Analysis

Data Acquisition and Preprocessing
The open gearbox fault dataset collected from the drivetrain dynamic simulator (DDS) of Southeast University is used to verify the proposed method; its data acquisition test platform is shown in Figure 4. This test platform includes a brake, brake controller, planetary gearbox, reduction gearbox, motor, motor controller, and other components. Gear bearings are mounted on the second-stage drive shaft of the reduction gearbox or the second-stage planetary shaft of the planetary gearbox, and seven vibration sensors of type 608A11 are mounted along the x-, y-, and z-axes of the planetary gearbox and the reduction gearbox as well as along the z-axis of the motor, with a sampling frequency of 5120 Hz. The collected data cover two working conditions, with speed and load of 20 Hz/0 V and 30 Hz/2 V, respectively. There are five gear states: health, chipped, root, miss, and surface. A detailed description of them is given in Table 3. The gears in the different fault states are processed in advance; the speed is varied via the motor controller, and the load is varied via the load controller.
In this study, the data under the 20 Hz/0 V condition are selected; 100 sample groups are taken for each gear state, and each sample group contains 2048 sample points. The dataset is randomly divided into a training set and a test set at a ratio of 7:3, and the labels for the five gear states are established, as shown in Table 3.
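The sample construction and the 7:3 split can be sketched as below. Random placeholder signals are used instead of the actual recordings, and the use of non-overlapping windows and a plain (non-stratified) random split are assumptions; the function name `segment` is hypothetical.

```python
import numpy as np

def segment(signal, n_samples=100, length=2048):
    """Cut a 1-D signal into n_samples non-overlapping groups (assumed scheme)."""
    signal = np.asarray(signal)
    return signal[: n_samples * length].reshape(n_samples, length)

rng = np.random.default_rng(0)
n_states = 5                                   # health, chipped, root, miss, surface
# Placeholder signals standing in for the recorded vibration data.
raw = [rng.normal(size=100 * 2048) for _ in range(n_states)]

X = np.vstack([segment(s) for s in raw])       # 500 sample groups of 2048 points
y = np.repeat(np.arange(n_states), 100)        # one label per sample group

# Random 7:3 division into training and test sets.
perm = rng.permutation(len(X))
n_train = int(round(0.7 * len(X)))
train_idx, test_idx = perm[:n_train], perm[n_train:]
```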

Time-Frequency Features Extraction
The thirteen time-domain features described in Table 1 and the five frequency-domain features described in Table 2 are extracted and normalized. The data distributions of the time-domain features and frequency-domain features in the different gear fault states are shown in Figures 5 and 6, respectively.
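The normalization method is not specified here; one common choice is min-max scaling of each feature column to [0, 1], sketched below as an assumption. The helper names `minmax_fit` and `minmax_apply` are hypothetical. Fitting the scaling on the training set and reusing it on the test set avoids leaking test information into training.

```python
import numpy as np

def minmax_fit(X):
    """Return per-column minimum and span, guarding against constant columns."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return lo, span

def minmax_apply(X, lo, span):
    """Scale columns using previously fitted minimum and span."""
    return (X - lo) / span

features = np.array([[1.0, 10.0], [3.0, 10.0], [2.0, 10.0]])
lo, span = minmax_fit(features)
scaled = minmax_apply(features, lo, span)
```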


Fault Diagnosis and Result Analysis without Feature Fusion
The time-domain feature matrix T, frequency-domain feature matrix F, and fusion feature matrix TF obtained from the above feature extraction are each input into the GSWOA-KELM fault diagnosis model. In this model, the whale population size is set to 10, the maximum number of iterations is 60, the dimension is 2, the lower bound is 1, and the upper bound is 20. The dataset partitioning and label settings are as shown in Table 3. The accuracy rate of fault diagnosis under the three different inputs is shown in Table 4.
As can be seen from Table 4, the classification accuracy of GSWOA-KELM is lower when the inputs are single-domain features, reaching 86.67% for time-domain features and 85.33% for frequency-domain features, whereas with fusion features as inputs, GSWOA-KELM has the highest classification accuracy, which reaches 100%. Compared with using the time-domain and frequency-domain features as separate feature vectors, the accuracy is improved by 13.33 and 14.67 percentage points, respectively. Therefore, extracting multi-domain features of gear faults as inputs to the model can significantly improve the classification accuracy of the fault diagnosis model.

Fault Diagnosis and Result Analysis with Feature Fusion
First, in order to verify the superiority of the GSWOA-KELM model in terms of convergence speed and global search performance, the fusion feature vector TF is input into the SSA-KELM, WOA-KELM, and GSWOA-KELM fault diagnosis models, and the fitness curves of the three models are compared, as shown in Figure 10. During fault diagnosis, the parameters of the three models are set to be consistent: the population size is 10, the maximum number of iterations is 60, and the dimension is 2. The dataset partitioning and label settings are as shown in Table 3.
Lubricants 2024, 12, x FOR PEER REVIEW

Fault Diagnosis and Result Analysis with Feature Fusion
First, in order to verify the superiority of the GSWOA-KELM model in terms of convergence speed and global search performance, the fusion feature vector TF is input into the SSA-KELM, WOA-KELM, and GSWOA-KELM fault diagnosis models, respectively, and the fitness curves of the three models are compared, as shown in Figure 10.During fault diagnosis, the parameters of the three models are set to be consistent, where the population number is 10, the maximum number of iterations is 60, and the dimension is 2. The dataset partitioning and label setting are set as shown in Table 3. Figure 10a shows the algorithm converges after 50 iterations, and the final convergence value is 0.098.Figure 10b shows that when WOA is used to optimize KELM, its fitness curve is a straight line, which is analyzed because the model falls into local optimality from the beginning.As shown in Figure 10c, GSWOA-KELM only iterates twice to find the optimal value, and the final convergence value is 0, which is lower than the final convergence value of SSA-KELM, with faster convergence speed and higher optimization accuracy.Meanwhile, compared with WOA-KELM, GSWOA-KELM avoids premature convergence and local optimization and has a stronger global search ability.
Next, to verify the fault diagnosis accuracy of the GSWOA-KELM model, the KELM, SSA-KELM, and WOA-KELM models are selected for comparison. The fusion feature vector TF is input into each of the four fault diagnosis models; for the SSA, WOA, and GSWOA optimization algorithms, the population size is set to 10, the maximum number of iterations to 60, and the search dimension to 2. The dataset partitioning and label settings are those shown in Table 3. As shown in Table 5, the fault diagnosis classification accuracies of the KELM, SSA-KELM, WOA-KELM, and GSWOA-KELM models are 88.67%, 91.33%, 98.67%, and 100%, respectively. Among them, GSWOA-KELM, established in this study, has the highest accuracy. Compared with the other three models, the classification accuracy of GSWOA-KELM is higher by 11.33, 8.67, and 1.33 percentage points, respectively.
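The reported improvements are absolute differences in accuracy (percentage points) rather than relative gains. A short check, using only the accuracies quoted from Table 5, illustrates the arithmetic; the function name is hypothetical.

```python
# Accuracies (in percent) as quoted in the text from Table 5.
accuracies = {
    "KELM": 88.67,
    "SSA-KELM": 91.33,
    "WOA-KELM": 98.67,
    "GSWOA-KELM": 100.0,
}

def gains_over_baselines(acc, proposed="GSWOA-KELM"):
    # Absolute difference in percentage points between the proposed model and each baseline.
    return {
        name: round(acc[proposed] - value, 2)
        for name, value in acc.items()
        if name != proposed
    }

gains = gains_over_baselines(accuracies)
for name, gain in gains.items():
    print(f"GSWOA-KELM vs {name}: +{gain:.2f} percentage points")
```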
As can be seen from Figures 11-14, the GSWOA-KELM model accurately identifies all fault types without any misclassification. For the other three models, the misclassifications are concentrated mainly in the Miss and Surface fault types. Figure 11 shows that, in the KELM model, three Root samples are misclassified as Miss samples, one Root sample is misclassified as a Surface sample, and two Miss samples are misclassified as Root samples. Figure 12 shows that, in the SSA-KELM model, four Root samples are misclassified as Miss samples, two Miss samples are misclassified as Root samples, and seven Miss samples are misclassified as Surface samples. As can be seen from Figure 13, in the WOA-KELM model, one Miss sample and one Surface sample are each misclassified as Root samples.
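The link between misclassification counts and accuracy can be sketched as follows for SSA-KELM and WOA-KELM. The test-set size of 150 samples is an assumption: it is not stated in this section, but it is consistent with the percentages reported for these two models. The dictionary layout and function name are hypothetical.

```python
# Misclassification counts quoted in the text, keyed by (true label, predicted label).
ssa_kelm_errors = {("Root", "Miss"): 4, ("Miss", "Root"): 2, ("Miss", "Surface"): 7}
woa_kelm_errors = {("Miss", "Root"): 1, ("Surface", "Root"): 1}

# ASSUMPTION: 150 test samples in total; not stated in this section.
N_TEST = 150

def accuracy_from_errors(errors, n_test):
    # Accuracy in percent = correctly classified samples / all test samples.
    wrong = sum(errors.values())
    return round(100.0 * (n_test - wrong) / n_test, 2)

ssa_acc = accuracy_from_errors(ssa_kelm_errors, N_TEST)
woa_acc = accuracy_from_errors(woa_kelm_errors, N_TEST)
print(f"SSA-KELM: {ssa_acc}%  WOA-KELM: {woa_acc}%")
```

Under this assumption, 13 errors give 91.33% and 2 errors give 98.67%, matching the values quoted from Table 5.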

Conclusions
To address the low accuracy caused by manual parameter selection in KELM fault diagnosis, an improved time-frequency fusion features-based GSWOA-KELM model is proposed. The results confirm that the proposed model is effective for gear fault diagnosis. The specific conclusions are as follows:
(1) Compared with KELM, SSA-KELM, and WOA-KELM, GSWOA-KELM has faster convergence speed, stronger global search capability, and higher recognition accuracy;
(2) When constructing a GSWOA-KELM model for gear fault diagnosis, performance can be improved by using the fusion features rather than the time-domain or frequency-domain features alone.

Figure 1. The network structure of ELM.

Figure 4. Southeast University gearbox data acquisition test platform.

Figure 5. Distribution of time-domain feature parameters.

Figure 6. Distribution of frequency-domain feature parameters.

The fault diagnosis results under three different inputs are shown in Figures 7-9, respectively, where Figures 7a-9a show the comparison between the predicted and actual classifications, and Figures 7b-9b show the confusion matrices of the classification results. In Figures 7b-9b, the blue line indicates the number

Figure 9. GSWOA-KELM fault diagnosis results for fusion-domain features as input. (a) Predictive classification results; (b) confusion matrix for the test set.
The fault diagnosis results of each model are shown in Figures 11-14, where Figures 11a-14a compare the predicted and actual classifications of each model, and Figures 11b-14b show the confusion matrices of the classification results of each model.

Figure 11. The fault diagnosis results of KELM. (a) Predictive classification results; (b) confusion matrix for the test set.

Figure 12. The fault diagnosis results of SSA-KELM. (a) Predictive classification results; (b) confusion matrix for the test set.

Figure 13. The fault diagnosis results of WOA-KELM. (a) Predictive classification results; (b) confusion matrix for the test set.

Figure 14. The fault diagnosis results of GSWOA-KELM. (a) Predictive classification results; (b) confusion matrix for the test set.

Table 1. The 13 time-domain characteristic parameters and their calculation formulas.

Table 3. Dataset fault type description and classification label.

Table 4. The accuracy rate for different inputs.

Table 5. The classification accuracy rate of each fault diagnosis model.