Research on the Gearbox Fault Diagnosis Method Based on Multi-Model Feature Fusion

Abstract: The gearbox is an important component of rotating machinery, so gearbox fault diagnosis is of great significance. In this paper, a gearbox fault diagnosis model based on multi-model feature fusion is proposed to address the limitation that a single feature or a few features cannot fully reflect the fault state of a gearbox. Time-frequency features of the vibration signal were extracted, and the sensitive features were selected. Further features were extracted from the original vibration signal using a one-dimensional convolutional neural network. The parallel fusion method was used to fuse the two sets of features as inputs to the support vector machine model. The radial basis kernel parameter and penalty factor of the support vector machine were optimized using an improved particle swarm optimization algorithm. Finally, the gearbox states were identified using the optimized support vector machine model. The results show that the recognition rate of the proposed model is 98.3%, which is higher than that of the other models compared.


Introduction
Gearboxes are widely used in mechanical equipment because of their role in power transmission [1,2]. The working environment of a gearbox is relatively harsh, and various faults often occur during operation; these faults affect the working state of the gearbox and can cause heavy casualties and economic losses [3,4].
Research on gearbox faults began in the early 20th century. By 1968, gearbox fault diagnosis had become an important standard for determining whether gear performance met the requirements, and it attracted the attention of many scholars worldwide [5,6]. Bao et al. [7] described the mathematical model of the faulty gear vibration signal in 1992 and verified the effectiveness of diagnostic methods such as broadband demodulation technology, correlation spectrum analysis, and refined complex envelope analysis. Wang et al. [8] proposed resonance demodulation, wavelet analysis, and model-based autoregressive diagnostic methods in 1999. Wu et al. [9] proposed a gear fault diagnosis and feature extraction method and analyzed and extracted the time-frequency characteristics of gear failure vibration signals. The experimental results indicated that the proposed method had a higher recognition rate. Some recent studies have focused on the classification of dynamic data distributions using different algorithms. Cao et al. [10] presented a novel intelligent technique for tool-wear state recognition using machine spindle vibration signals.
Additionally, that study combined derived wavelet frames and a convolutional neural network. ND et al. [11] proposed condition monitoring of a face milling cutting tool using a multilayer perceptron approach based on an artificial neural network. The results confirmed that the multilayer perceptron approach provides higher classification accuracy.
Because time-frequency domain feature selection has been successful in gearbox fault diagnosis, its integration with machine learning is the next step to consider. Some machine learning algorithms are already used in gearbox fault diagnosis [12,13]. Wang et al. [14] proposed a data-driven fault diagnosis method for wind turbine gearboxes, which was based on three models: a gray wolf optimized variational mode decomposition method (AGWO-VMD), normalized composite multiscale dispersion entropy (NCMDE), and a long short-term memory network (LSTM) [15]. Recently, neural-network techniques have been used for fault diagnosis. Khalil et al. [16] used a fast Fourier transform (FFT) to obtain the fault frequency signature and principal component analysis (PCA) to obtain the most important data with reduced dimensions. The proposed method was validated using two circuits. Khalil et al. [17] proposed self-healing to recover a faulty embryonic cell through the innovative usage of healthy cells, and the researchers achieved high-accuracy fault prediction with a low training time. However, when faced with large numbers of data samples, these shallow learning models often give poor prediction results because of factors such as insufficient generalization ability [18].
Deep learning models combine feature extraction and pattern recognition, which can reduce the dependence on signal processing technology in the fault diagnosis process [19,20]. In addition, because of their powerful representation ability, they can meet the current requirements of industrial big data [21]. Deep learning models that have been applied to gearbox problems include convolutional neural networks (CNN) [22,23], deep belief networks (DBN) [24], and generative adversarial networks (GAN) [25]. Wu et al. [26] used a one-dimensional convolutional neural network (1DCNN) model to analyze the original vibration signal in tank gearbox fault diagnosis, and the results showed that the 1DCNN model can effectively identify the fault state of the tank gearbox. Zhang et al. [27] extracted sensitive features using a 1DCNN and verified its effectiveness. Research on intelligent algorithms has evolved to the point where fault diagnosis is no longer limited to a specific algorithm [28]. Multi-model fusion methods also play an important role in fault diagnosis [29]. Currently, the commonly used multi-model fusion techniques can be divided into three types: data layer fusion, feature layer fusion, and decision layer fusion [30]. Although the aforementioned fault diagnosis methods can correctly diagnose gearbox faults in most cases, the multi-model fusion method can still be improved.
A gearbox fault diagnosis model based on multi-model feature fusion is proposed in this study. The contributions of this study are as follows.
(1) Eight sensitive time-frequency features were extracted and selected, and the 1DCNN was used to extract features from the original vibration signal.
(2) The parallel fusion method was used to fuse the two domain features as the input of the support vector machine (SVM) model.
(3) The improved particle swarm optimization (IPSO) algorithm was used to optimize the SVM classifier to achieve gearbox fault diagnosis and obtain more accurate and effective results.
The remainder of this paper is organized as follows. Section 2 introduces the gearbox fault diagnosis background, including the 1DCNN, SVM, IPSO, and feature fusion. Section 3 introduces the experimental platform construction and data collection. Section 4 introduces the construction of the fault diagnosis model. In Section 5, the experimental results are analyzed and verified. Section 6 concludes the paper and discusses future work.

1DCNN
Convolutional neural networks (CNN) were first proposed by Sercu et al. [31]. CNN models can be used to process and identify images. Through weight sharing and convolution operations, such a model can process two-dimensional images directly, which avoids the tedious feature extraction and data reconstruction processes involved in traditional intelligent algorithms. The 1DCNN model is a feed-forward neural network trained by supervised learning; that is, each input sample needs a corresponding label. It is usually composed of convolutional layers, pooling layers, fully connected layers, and a softmax classifier. An appropriate activation function, optimizer, and learning rate must also be selected.
The main function of the convolutional layer is to extract features from the previous layer's data using convolution kernels of a chosen size [32]. Weight sharing is the main characteristic of the convolutional layers. It effectively reduces the number of parameters required in the training process, accelerates model training, and reduces the occurrence of overfitting. The expression for the convolution operation is as follows:

$$x_j^{l} = f\left(\sum_{i \in m_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{1}$$

where $l$ denotes the $l$th convolutional layer, $x_j^{l}$ is the output of the $l$th layer, $x_i^{l-1}$ is the input of the $l$th layer, $k_{ij}^{l}$ is the weight matrix, $b_j^{l}$ is the bias, $f(\cdot)$ is the activation function, and $m_j$ is the convolutional region of the $(l-1)$th-layer feature map [33].
In the 1DCNN structure, the pooling layer downsamples the feature maps extracted by the convolutional layer, which significantly reduces the number of calculations during model operation and reduces the occurrence of overfitting [34]. Pooling methods include mean pooling, max-pooling, and stochastic pooling. This study adopted the max-pooling method to reduce the feature errors. The general calculation process for pooling is as follows:

$$x_j^{l} = f\left(\beta_j^{l}\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right) \tag{2}$$

where $\beta_j^{l}$ and $b_j^{l}$ are the multiplicative and additive biases of the $j$th feature map of the $l$th layer, respectively, and $\mathrm{down}(\cdot)$ represents the sampling function.
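The convolution and max-pooling steps described above can be illustrated with a minimal single-channel sketch in plain Python. This is not the network used in the study: the kernel values, the ReLU activation, and the function names `conv1d` and `max_pool1d` are assumptions chosen only for illustration, and the multiplicative and additive pooling biases are omitted.

```python
def relu(z):
    """Rectified linear activation (an assumed choice of f)."""
    return max(0.0, z)

def conv1d(x, kernel, bias, activation=relu):
    """Valid 1-D convolution of one channel, written as the sliding
    dot product (cross-correlation) used by most CNN libraries."""
    k = len(kernel)
    return [
        activation(sum(x[i + j] * kernel[j] for j in range(k)) + bias)
        for i in range(len(x) - k + 1)
    ]

def max_pool1d(x, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

# Hypothetical toy signal, not experimental data.
signal = [0.0, 1.0, 3.0, 2.0, -1.0, 0.5, 2.5, 1.0]
features = conv1d(signal, kernel=[1.0, -1.0], bias=0.0)
pooled = max_pool1d(features, size=2)
```

The same kernel is applied at every position of the signal, which is the weight sharing mentioned above, and pooling halves the length of the feature map.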
The main task of the fully connected layer in the 1DCNN model is to summarize the local features extracted by the convolutional and pooling layers. After the local features are fused by the fully connected layer, they are identified using a softmax classifier.

IPSO
Based on the traditional PSO algorithm, the improved particle swarm optimization (IPSO) algorithm uses an adaptive inertia weight and a population shrinkage factor to speed up convergence and improve the search accuracy. This addresses the tendency of the PSO algorithm to fall into a local optimal solution, which degrades the search [35]. In the PSO algorithm, when the inertia weight ω is large, the particle has a strong ability to move in the solution space, so its global search ability is relatively strong. When ω is small, the ability of the particle to search for the optimal solution in a local area is strengthened, making it easier for the algorithm to converge. In the traditional PSO algorithm, ω is fixed: if it is set too large, the local search is weak and the algorithm is slow to converge; if it is set too small, the algorithm easily falls into a local optimum, and the expected search effect is not achieved. It is therefore preferable to use a relatively large ω at the beginning of the iteration, which ensures a powerful global search ability and the ability to jump out of a local optimum, and a smaller ω in the later stages of the iteration, which strengthens the local search and benefits convergence [36]. In this study, ω is given by Equation (3), where f is the value of the objective function for the current particle; ω max and ω min are the maximum and minimum values of the inertia weight factor, respectively; and f min and f avg are the minimum and average values of the particle objective function values, respectively.
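As an illustration only, one widely used adaptive inertia-weight rule that is consistent with the symbols defined above can be sketched as follows. Whether it matches the study's Equation (3) term for term is an assumption; the function name `adaptive_inertia`, the minimization convention, and the handling of the degenerate case f avg = f min are likewise hypothetical choices. The default bounds 0.4 and 0.9 are the ω min and ω max values reported later in the paper.

```python
def adaptive_inertia(f, f_min, f_avg, w_min=0.4, w_max=0.9):
    """Inertia weight of one particle for a minimization problem.

    Particles no worse than the swarm average get a weight between
    w_min and w_max that shrinks as they approach the best value;
    particles worse than average keep the maximum weight.
    """
    # Assumed convention: if all particles share the same value,
    # fall back to the maximum weight to encourage exploration.
    if f > f_avg or f_avg == f_min:
        return w_max
    return w_min + (w_max - w_min) * (f - f_min) / (f_avg - f_min)
```

Under this rule the best particle searches locally with ω min, while below-average particles keep ω max and continue to explore.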
Equation (3) shows that, when the values of f tend to be consistent or tend toward a local optimal solution, ω in the PSO algorithm increases, and when the values of f are relatively scattered, ω decreases. When the f of a particle is better than f avg, its corresponding ω is smaller; thus, the particle is preserved. Conversely, when the f of a particle is worse than f avg, its corresponding ω is larger, which moves the particle toward a better search area. If the diversity of the population gradually decreases during the calculation process of the PSO algorithm, the population moves away from the global optimal position, which is equivalent to applying a "diffusion" operation to the population. If the diversity of the population gradually increases, the population continues to approach the global optimal position, which is equivalent to an "attraction" operation on the population [37]. To address this problem, this study introduces a shrinkage factor in addition to the adaptive weight factor, calculated as shown in Equation (4):

$$\varphi = \frac{2}{\left|2 - C - \sqrt{C^{2} - 4C}\right|}, \qquad C = c_1 + c_2 \tag{4}$$

where $C$ is the balance factor, and $c_1$ and $c_2$ are the learning factors. Clerc et al. [38] proposed that when C = 4.1, the population diversity of PSO can be maintained and the convergence ability is better. In this case, the shrinkage factor is 0.7298, and the population velocity update is given by Equation (5), where v i is the particle velocity, r i is a random number in (0, 1), P i is the global optimal particle, G i is the individual optimal particle, and x i is the current position of the particle.
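The value 0.7298 can be checked numerically with the constriction factor of Clerc and Kennedy, using the learning factors c1 = 1.8 and c2 = 2.3 reported later in the paper (so that C = 4.1). The velocity update below is a hedged sketch: applying both the inertia weight and the constriction factor in a single update, as well as the function and argument names, are assumptions rather than the study's exact formula.

```python
import math

def constriction_factor(c1, c2):
    """Clerc-Kennedy constriction factor; defined for C = c1 + c2 > 4."""
    C = c1 + c2
    if C <= 4:
        raise ValueError("c1 + c2 must exceed 4")
    return 2.0 / abs(2.0 - C - math.sqrt(C * C - 4.0 * C))

def update_velocity(v, x, personal_best, global_best,
                    w, c1, c2, chi, r1, r2):
    """One velocity update for a single particle (lists of floats).

    Assumed form: chi * (w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)).
    """
    return [
        chi * (w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi))
        for vi, xi, pb, gb in zip(v, x, personal_best, global_best)
    ]

chi = constriction_factor(1.8, 2.3)  # approximately 0.7298
```

In practice r1 and r2 would be drawn afresh from a uniform distribution on (0, 1) at every update; they are passed in explicitly here so that the function is deterministic.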

SVM
As a data analysis method developed from statistical learning theory, the SVM can solve data processing problems such as regression and pattern recognition, and it can also be extended to fields such as prediction and comprehensive evaluation [39][40][41][42]. The SVM model was originally applied to binary classification, that is, to finding a hyperplane that separates the positive and negative categories. At the same time, two parallel hyperplanes, spaced as far apart as possible, are constructed on either side of the separating hyperplane to ensure good classification ability. The generalization ability of the SVM improves as the distance between the two hyperplanes increases. The support vectors are the training samples closest to the hyperplane, and they form the basis of the classification function [43].
When nonlinear data are classified using an SVM, the input data are mapped to a high-dimensional space by a kernel function. The choice of kernel function affects the classification. Cheng et al. [44] demonstrated that the radial basis kernel function has a wider application range than the polynomial and sigmoid kernel functions. Therefore, the radial basis kernel function was selected for this study, and its mathematical expression is as follows:

$$K(x, x_k) = \exp\left(-\frac{\lVert x - x_k \rVert^{2}}{2\sigma^{2}}\right) \tag{6}$$

where $x$ is the n-dimensional input vector, $x_k$ is the center of the radial basis kernel function, $\lVert x - x_k \rVert$ is the norm of $x - x_k$, and $\sigma$ is the kernel function parameter [45].
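For concreteness, the radial basis kernel can be evaluated with a few lines of Python. This is a minimal sketch; the function name `rbf_kernel` and the example vectors are illustrative assumptions, not part of the original study.

```python
import math

def rbf_kernel(x, x_k, sigma):
    """Gaussian radial basis kernel exp(-||x - x_k||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_k))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Identical vectors give 1.0; the value decays as the distance grows.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0)
```

Many SVM libraries parameterize the same kernel as exp(-γ‖x − x_k‖²), in which case γ = 1/(2σ²).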
Because the SVM itself can only deal with binary classification problems, the multi-classification SVM model decomposes a multi-classification problem into multiple binary classification problems. The multi-classification SVM model used in this study is the one-against-all SVM. The goal of the one-against-all SVM algorithm is for each SVM to separate the data of one category from the data of all other categories. In this method, one class in the dataset is regarded as the "+1" class, and the rest are regarded as the "−1" class, and the first SVM model is established to separate that class from all the remaining classes. The procedure is repeated until every class has been separated. If the number of categories in a dataset is M, then M SVM classifiers must be established. A flowchart of pattern recognition using a four-class SVM is shown in Figure 1 [45].
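The one-against-all decomposition can be sketched as follows. The relabeling step mirrors the "+1"/"−1" scheme described above; the prediction rule (choosing the class whose binary classifier returns the largest decision value) is a common convention assumed here, and the function names and decision values are hypothetical.

```python
def one_vs_rest_labels(y, classes):
    """For each class, relabel samples as +1 (that class) or -1 (all others)."""
    return {c: [1 if label == c else -1 for label in y] for c in classes}

def predict_one_vs_rest(scores):
    """Pick the class whose binary classifier gives the largest decision value."""
    return max(scores, key=scores.get)

states = ["normal", "wear", "pitting", "broken"]
y = ["normal", "wear", "wear", "broken"]

# One binary target vector per state, i.e. M = 4 binary problems.
binary_targets = one_vs_rest_labels(y, states)

# Hypothetical decision values of the four binary SVMs for one sample.
predicted = predict_one_vs_rest(
    {"normal": -0.8, "wear": 0.3, "pitting": -1.2, "broken": 0.1}
)
```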

Feature Fusion
Serial and parallel fusions are the most commonly used feature-level fusion methods; they are defined by Equations (7) and (8), respectively, where n 1 and n 2 are the weights of the feature vectors e and f, whose dimensions are g and h, respectively. The serially fused feature vector d 1 obtained from Equation (7) has dimension (g + h), whereas the feature vector d 2 obtained by parallel fusion from Equation (8) has the same dimension as the higher-dimensional of the feature vectors e and f. In other words, serial fusion simply splices the feature vectors together, whereas parallel fusion combines them through a fusion algorithm. Serial fusion has drawbacks, such as increasing the dimension of the fused feature vector and causing information conflicts between the features, whereas parallel fusion does not increase the dimension of the fused feature vector. Moreover, studies have shown that the parallel fusion method performs better than the serial fusion method in practical applications [46]. Therefore, the parallel fusion method was adopted to fuse the feature sets in this paper, as expressed by Equation (9). If the dimensions of the two feature sets to be fused are not equal, the lower-dimensional feature set must be padded with zeros.
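The difference between the two strategies can be illustrated with a short sketch. Serial fusion is shown as weighted concatenation. For parallel fusion, the sketch assumes the complex-vector formulation that is common in the feature-fusion literature, in which one feature vector forms the real part and the other the imaginary part; whether this matches the study's Equations (8) and (9) exactly is an assumption, as are the function names.

```python
def zero_pad(vec, length):
    """Append zeros so that vec has the requested length."""
    return list(vec) + [0.0] * (length - len(vec))

def serial_fusion(e, f, n1=1.0, n2=1.0):
    """Weighted concatenation: the result has dimension g + h."""
    return [n1 * a for a in e] + [n2 * b for b in f]

def parallel_fusion(e, f, n1=1.0, n2=1.0):
    """Assumed complex-vector fusion: the result has dimension max(g, h).

    The shorter vector is zero-padded before the two are combined.
    """
    dim = max(len(e), len(f))
    e_pad, f_pad = zero_pad(e, dim), zero_pad(f, dim)
    return [complex(n1 * a, n2 * b) for a, b in zip(e_pad, f_pad)]

e = [1.0, 2.0]        # hypothetical 2-dimensional feature vector
f = [3.0, 4.0, 5.0]   # hypothetical 3-dimensional feature vector
d1 = serial_fusion(e, f)
d2 = parallel_fusion(e, f)
```

With these inputs the serial result has five components and the parallel result has three, which is the dimensional behavior described in the text.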

Experimental Platform Construction and Data Collection
To verify the effectiveness of the proposed method for recognizing the fault state of a gearbox, an experimental scheme for gearbox fault diagnosis was designed using a frequency converter, motor, gearbox, magnetic powder brake, piezoelectric accelerometer, data acquisition card, and PC. A three-phase asynchronous motor (model YE2-100L2-4) was selected, and a frequency converter (model G7R5/P011-T4) provided variable-frequency speed regulation of the motor. The gearbox model used was JZQ250; its reduction ratio is 10.35, and its output speed is 0~145 rpm. The magnetic powder brake model FZ-A-12 was selected. The vibration data were collected using a YE6231 data-collection system produced by Jiangsu Lianneng Electronic Technology Co., Ltd. The CAYD051V piezoelectric accelerometer, whose sensitivity is 100 mV/g, was selected for the acquisition of vibration signals because of its strong anti-interference ability, wide measurement range, and wide frequency range. The connection diagram of the experimental device and a photograph of the platform are shown in Figures 2 and 3, respectively. The construction process of the gearbox fault diagnosis experimental platform is as follows.
(1) To ensure safety during the experiment, an air switch was installed between the power plug and the inverter.
(2) The inverter was connected to the motor, and the motor and gearbox were connected through a belt. The gearbox and the magnetic powder brake were connected through a coupling, and the motor, gearbox, and magnetic powder brake were fixed on the base plate.
(3) A piezoelectric accelerometer was installed at the axial position of the bearing cover of the high-speed shaft of the gearbox, and the sensor, acquisition card, and PC were connected through the signal output line.
(4) Four types of vibration data were obtained from the experiment: normal, wear, pitting, and broken gears. The motor speed was set at 900 rpm, and the sampling frequency was set at 6 kHz. A total of 1.8 million data points were collected for each state. The data groups, data length, sampling frequency, and motor speed of the vibration data obtained from the experiments are listed in Table 1.

Multi-Model Feature Fusion Fault Diagnosis Model Framework
Multi-model fusion combines and integrates multiple models in order to obtain a better model for solving complex problems [47]. The model-fusion strategy can be divided into three levels: sensor-level, feature-level, and decision-level fusion [46]. In this study, a feature-level fusion strategy was used. The specific process of the gearbox fault diagnosis model framework based on multi-model feature fusion is shown in Figure 5. As shown in Figure 5, the vibration data of each state of the gearbox were obtained through vibration sensors, and the obtained data were subjected to time-frequency domain feature extraction and 1DCNN model feature extraction. The resulting feature sets {X1} and {X2} were fused using the parallel fusion method. Finally, the IPSO-SVM model was used to complete the fault recognition of the gearbox. The specific parameters of the gearbox fault diagnosis model with multi-model feature fusion are shown in Figure 6.
To compress the length of the original vibration data and fully extract useful information, the number and size of the convolution kernels of the 1DCNN model were reduced. Figure 6 gives the parameters of each layer of the gearbox fault diagnosis model with multi-model feature fusion: each group of experimental vibration signals, 1024 points long, was processed using four convolution layers, four pooling layers, and two fully connected layers to obtain ten-dimensional feature samples. The selected time-frequency domain parameters were the root mean, kurtosis, peak factor, impulse factor, wavelet factor, margin factor, barycenter frequency, and root mean square frequency. The two feature sets were then fused using feature fusion technology to complete the pattern recognition.
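As a rough illustration of the preprocessing described above, the sketch below cuts a long recording into groups of 1024 points and computes a few time-domain indicators for one group. The formulas are the definitions commonly used in vibration analysis and are assumptions here, because the paper's own indicator table did not survive extraction; the function names are hypothetical, and the frequency-domain indicators are omitted.

```python
import math

def segment(signal, length=1024):
    """Split a recording into consecutive non-overlapping groups;
    a trailing partial group is discarded."""
    n = len(signal) // length
    return [signal[i * length:(i + 1) * length] for i in range(n)]

def time_domain_indicators(x):
    """Commonly used indicators (assumed definitions).

    Note: a constant signal has zero variance and would cause a
    division by zero in the kurtosis term.
    """
    n = len(x)
    mean = sum(x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    var = sum((v - mean) ** 2 for v in x) / n
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * var ** 2)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "rms": rms,
        "kurtosis": kurtosis,
        "peak_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
    }

groups = segment(list(range(2500)))  # placeholder data, not a measurement
indicators = time_domain_indicators([1.0, -1.0, 1.0, -1.0])
```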

Time-Frequency Domain Feature Extraction
Using the experimental parameters in Table 1 and the no-load condition, the time-domain and frequency-domain diagrams of the experimental data of the four gearbox states were obtained, as shown in Figures 7 and 8, respectively.


[Table of time-frequency domain feature indicators; columns: Number, Indicator Name, Equation, Annotation. Annotations: $x_i$ is the $i$th value of the signal $x$; $N$ is the total number of data points; $\bar{x}$ is the signal mean; $\sigma$ is the standard deviation; $P(f)$ is the power spectrum of the signal.]

In the eigenvalue calculation, the eight time-frequency domain eigenvalues differ greatly in magnitude, so new features obtained by fusing them directly with Equation (9) are poorly separated. Therefore, the time-frequency domain features of the vibration data were first normalized and then fused using Equation (9). The normalization operation is given by Equation (10):

$$x^{*} = \frac{x - \min}{\max - \min} \tag{10}$$

where max and min are the maximum and minimum values of the sample data, respectively, so that the data are mapped to the interval [0, 1]. Table 4 lists the normalized eigenvalues of the samples. According to Equation (9), the lower-dimensional feature set is padded with zeros: the eight time-frequency domain features and two zero-valued complementary features are combined into the feature set {X1}.
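The normalization step can be sketched in a few lines of Python. The function name `min_max_normalize` and the convention of returning zeros for a constant feature are illustrative assumptions; the sample values are hypothetical.

```python
def min_max_normalize(values):
    """Map the values of one feature linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant feature: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical values of one feature across three samples.
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Applying this to each of the eight features separately puts them on a common scale, so that no single feature dominates the fused vector because of its units.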

Feature Extraction Analysis
Using the experimental parameters in Table 1 and the no-load condition, the vibration data of the four gearbox states were obtained. Features were extracted from these data using the 1DCNN model, and eight groups of ten-dimensional feature samples were obtained to form the feature set {X2}. The specific features are listed in Table 5. Feature sets {X1} and {X2} were then fused using the parallel fusion method; Table 6 lists some of the eigenvalues obtained after the fusion. In this way, the feature extraction of the original vibration data was completed, and the feature set {D} obtained by feature fusion was used for the subsequent state recognition.

IPSO-SVM Parameter Analysis
After the feature sets {X1} and {X2} were fused in parallel, a 6000 × 10-dimensional feature set {D} was obtained and used as the input sample for the IPSO-SVM model. The training and test datasets were divided at a ratio of 8:2, that is, 1200 training samples for each state (4800 in total) and 300 test samples for each state (1200 in total). The initial parameters of the IPSO algorithm were set as follows: learning factors c1 = 1.8 and c2 = 2.3, maximum inertia weight ω max = 0.9, minimum inertia weight ω min = 0.4, number of particles = 30, and number of iterations = 200. After the SVM model was optimized using the IPSO algorithm, the best-fitness curve was obtained, as shown in Figure 9. Figure 9 shows that the IPSO-SVM model reaches its best fitness value of 96.3 after 17 iterations; it does not fall into a local optimum, and it converges quickly. The recognition rate was 98.6%, the optimal penalty factor C was 4.32, and the kernel parameter γ was 1.91. To rule out chance results in the classification experiment and verify the reliability of the model, the IPSO-SVM model was run 10 times; the recognition accuracy on the 1200 test samples is shown in Figure 10, where the recognition accuracy is the ratio of the number of correctly classified test samples to the total number of test samples. As shown in Figure 10, the average recognition accuracy over the 10 runs was 98.3%; the highest was 98.6% (fifth run), and the lowest was 97.9% (third run), so the recognition performance is relatively stable. The confusion matrix for the recognition results of the fifth run is shown in Figure 11.
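The 8:2 division described above can be reproduced with a simple stratified split. The sketch below is an assumption about how such a split might be done (per-class shuffling with a fixed seed); the paper does not state the procedure, and the function name `stratified_split` is hypothetical.

```python
import random

def stratified_split(labels, train_ratio=0.8, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# 1500 samples per state, 6000 in total, matching the counts in the text.
labels = [s for s in ("normal", "wear", "pitting", "broken") for _ in range(1500)]
train_idx, test_idx = stratified_split(labels)
```

Splitting each state separately guarantees 1200 training and 300 test samples per state, rather than only approximately so.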
To rule out chance results in the classification experiment and verify the reliability of the model, the IPSO-SVM model was run 10 times; the recognition accuracy on the 1200 test samples is shown in Figure 10, where the recognition accuracy is the ratio of correctly classified test samples to the total number of test samples. As shown in Figure 10, the average recognition accuracy over the 10 runs was 98.3%; the highest accuracy, 98.6%, was obtained in the fifth run, and the lowest, 97.9%, in the third run. The recognition performance is therefore relatively stable. The confusion matrix for the recognition results of the fifth run is shown in Figure 11.
From the confusion matrix shown in Figure 11, the recall and precision values of each gear state under the model can be read. The recall value of each state is greater than 97.7%, so a sample of any gear state has a probability of more than 97.7% of being correctly identified. The precision value of each state is also greater than 97.7%, so more than 97.7% of the samples assigned to a given state truly belong to that state. This demonstrates that multi-feature fusion also improves the recognition accuracy of the IPSO-SVM model.
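The recall and precision values discussed above follow directly from the rows and columns of a confusion matrix. The sketch below shows the calculation for four states with 300 test samples each; the counts are illustrative placeholders, not the values in Figure 11, and the function name is hypothetical.

```python
def recall_precision(cm):
    """cm[i][j] is the number of samples of true state i predicted as state j."""
    n = len(cm)
    # Recall: correct predictions divided by the row total (all true samples of a state).
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
    # Precision: correct predictions divided by the column total (all samples assigned to a state).
    precision = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    accuracy = sum(cm[i][i] for i in range(n)) / sum(sum(row) for row in cm)
    return recall, precision, accuracy


# Illustrative counts only: four states, 300 test samples per state.
cm = [
    [296, 2, 1, 1],
    [3, 294, 2, 1],
    [1, 2, 296, 1],
    [0, 2, 1, 297],
]

recall, precision, accuracy = recall_precision(cm)
print(recall, precision, accuracy)
```

Recall answers "how many samples of this state were found", while precision answers "how many samples labelled as this state really are that state", which is the distinction drawn in the paragraph above.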

Model Comparison and Verification
To verify the effectiveness of the proposed gearbox fault diagnosis method based on the multi-model feature fusion model, the recognition results of four models were compared: time-frequency feature + IPSO-SVM, 1DCNN-softmax, 1DCNN-IPSO-SVM, and the proposed multi-model feature fusion model. In the time-frequency feature + IPSO-SVM model, the time-frequency feature set is the feature set {X1} of this study, and the 1DCNN parameters in the 1DCNN-softmax model are the same as those in Figure 9. Each of the four models was run ten times, and the average value was taken as the recognition accuracy of the model. The recognition results are listed in Table 7. As shown in Table 7, the proposed multi-model feature fusion model has the highest recognition rate and the lowest standard deviation of the four models, for the following reasons.
(1) Traditional time-frequency feature extraction involves human intervention, which can easily lead to the loss of valuable information and a lower recognition rate.
(2) Using the 1DCNN to extract features from the original data reduces human intervention, improves the reliability of the extracted features, and helps to improve the recognition rate.
(3) The 1DCNN model takes a long time to run, its pooling layer can discard valuable information, and the commonly used softmax classifier easily falls into a local optimum.
(4) The proposed multi-model feature fusion model fuses the traditional time-frequency sensitive features and the CNN-extracted features through the parallel fusion method, which overcomes the limitation of a single feature type and effectively improves the recognition rate.
These results verify the effectiveness and stability of the proposed model for gearbox fault diagnosis. They also demonstrate that fused features reflect the gearbox fault state more effectively than non-fused features.
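The comparison protocol above (ten runs per model, summarized by mean accuracy and standard deviation) can be sketched as follows. The per-run accuracies and model labels are hypothetical placeholders rather than the values behind Table 7, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical per-run accuracies (%) for two placeholder models; not taken from Table 7.
runs = {
    "fused_features": [98.2, 98.4, 98.0, 98.3, 98.6, 98.3, 98.2, 98.4, 98.3, 98.3],
    "single_feature": [96.0, 97.1, 95.8, 96.9, 96.4, 97.3, 95.9, 96.6, 96.2, 96.8],
}


def summarize(accuracies):
    """Return (mean, sample standard deviation) rounded to two decimals."""
    return round(mean(accuracies), 2), round(stdev(accuracies), 2)


summary = {name: summarize(acc) for name, acc in runs.items()}
for name, (avg, sd) in summary.items():
    print(f"{name}: mean = {avg}%, std = {sd}")
```

A lower standard deviation across runs indicates that a model's accuracy depends less on the random elements of training and optimization.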

Conclusions
A gearbox fault diagnosis method based on a multi-model feature fusion model was proposed and validated in this paper. An experimental platform for gearbox fault diagnosis was constructed, and the raw vibration data were acquired using an accelerometer. Features were extracted from the vibration data using the 1DCNN model to obtain the feature set {X2}. After a comparative analysis with the time-frequency domain index {X1}, the time-frequency index {X1} was normalized. {X1} and {X2} were then fused at the feature layer using the parallel fusion method. The four gearbox states were classified and identified using the proposed multi-feature fusion model. Compared with time-frequency + IPSO-SVM, traditional 1DCNN, and 1DCNN-IPSO-SVM, the proposed multi-feature fusion model reached an average recognition accuracy of 98.3% on the 1200 test samples. The proposed model achieved 1.8%, 1.9%, and 0.8% higher recognition rates than traditional 1DCNN, IPSO-SVM, and 1DCNN-IPSO-SVM, respectively. The test results demonstrated the effectiveness and stability of the multi-model feature fusion model for gearbox fault diagnosis. Uncertainty inevitably exists in gearbox fault vibrations. In future research, uncertainty quantification can be considered to improve the reliability of the diagnosis results, and the degree of fault can be taken into account to eliminate the misclassification of normal and faulty teeth.

Figure 1. Four-class SVM recognition flow chart for gearbox fault diagnosis.

Figure 2. Connection chart of the gearbox experimental device based on the accelerometer.

Machines 2022, 10, x FOR PEER REVIEW 7 of 18

4. Fault Diagnosis Model Construction

4.1. 1DCNN-IPSO-SVM Model
The overall workflow of the 1DCNN-IPSO-SVM model is shown in Figure 4, and the 1DCNN model parameters are shown in Table 2. In the first step, the collected one-dimensional vibration data are input into the established 1DCNN model, which uses a one-dimensional convolution function. During training, the Adam algorithm was selected to optimize the loss function, the learning rate was set to 0.001, and the activation function of each layer of the model was set to the rectified linear unit (ReLU) function. The ReLU is an unsaturated nonlinear function whose output is guaranteed to be non-negative. To prevent the model from overfitting, a dropout layer was introduced, with dropout = 0.8. The number of epochs was set to 30, and the batch size was set to 20. As there were 6000 sets of experimental sample data, including 4800 sets of training samples and 1200 sets of test samples, each epoch had 160 training steps. A trained model was obtained after each training epoch. The model fit was obtained by substituting the training and validation data into the trained model. In the second step, the model was run ten times, the model with the highest accuracy was selected, and its network structure and parameters were retained. Feature extraction was then performed, and the feature samples were used as input samples for the IPSO-SVM algorithm to obtain the recognition results.
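The building blocks named above (one-dimensional convolution, ReLU activation, and pooling) can be illustrated with a minimal forward pass in plain Python. This is a sketch of the operations only, not the network in Table 2: the kernel values, the pooling size, and the function names are assumptions, and no training (Adam, dropout) is shown.

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    """Valid 1D convolution as used in CNNs (sliding dot product, no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(0, len(signal) - k + 1, stride)]


def relu(values):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; keeps the largest value in each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy vibration segment and a hypothetical difference-like kernel.
signal = [1, -2, 3, -4, 5, -6]
kernel = [1, 0, -1]

activations = relu(conv1d(signal, kernel))
features = max_pool1d(activations, size=2)
print(activations, features)
```

Stacking several such convolution and pooling stages, and taking the output of the last stage as a feature vector, corresponds to the feature-extraction role the 1DCNN plays before the IPSO-SVM classifier.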

Figure 4. Workflow chart of gearbox fault diagnosis based on the 1DCNN-IPSO-SVM model.

Figure 5. Framework of the gearbox fault diagnosis model based on multi-model feature fusion.

Figure 6. Parameter flow chart of the gearbox fault diagnosis based on parallel feature fusion.

Figure 7. Time-domain diagram of the four types of gearbox vibration data.

Figure 8. Frequency-domain diagram of the four types of gearbox vibration data.

Figures 7 and 8 show that the time-frequency characteristics of the four states are significantly different. In this study, after denoising the gearbox vibration data with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) combined with wavelet thresholding [48], eight time-frequency domain features were selected: root mean square (RMS), kurtosis, peak factor, impulse factor, waveform factor, margin factor, barycenter frequency, and RMS frequency. The RMS and kurtosis are dimensional time-domain indicators that reflect the vibration amplitude and energy change, respectively. The peak, impulse, waveform, and margin factors are dimensionless time-domain indicators that reflect the distribution of the vibration time series. The barycenter frequency and RMS frequency are frequency-domain indicators that reflect changes in the position of the main frequency band. The calculation equations for the eight time-frequency characteristic indicators are listed in Table 3.
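The eight indicators named above can be computed from a single vibration segment as in the sketch below. Because the equations of Table 3 are not reproduced in this section, the code uses commonly cited textbook definitions, which is an assumption: kurtosis is taken in its dimensional form (the mean of the fourth power), and the two frequency indicators are weighted by the one-sided power spectrum. The function name and the test signal are hypothetical.

```python
import cmath
import math


def time_frequency_features(x, fs):
    """Eight time-frequency indicators of signal x sampled at fs Hz (assumed definitions)."""
    n = len(x)
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)
    mean_abs = sum(abs_x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    root_amplitude = (sum(math.sqrt(a) for a in abs_x) / n) ** 2

    # One-sided power spectrum from a naive DFT (adequate for short segments).
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    total = sum(power)

    return {
        "rms": rms,
        "kurtosis": sum(v ** 4 for v in x) / n,  # dimensional form (assumption)
        "peak_factor": peak / rms,
        "impulse_factor": peak / mean_abs,
        "waveform_factor": rms / mean_abs,
        "margin_factor": peak / root_amplitude,
        "barycenter_frequency": sum(f * p for f, p in zip(freqs, power)) / total,
        "rms_frequency": math.sqrt(sum(f * f * p for f, p in zip(freqs, power)) / total),
    }


# Hypothetical test signal: an 8 Hz sine sampled at 64 Hz for one second.
fs = 64.0
signal = [math.sin(2 * math.pi * 8.0 * t / fs) for t in range(64)]
features = time_frequency_features(signal, fs)
print(features)
```

For a pure sine, the RMS is the amplitude divided by √2 and both frequency indicators coincide with the tone frequency, which gives a quick sanity check before applying the function to real gearbox data.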

Figure 9. IPSO-SVM fitness curves in the model optimization.

Figure 10. Average recognition accuracy by the proposed multi-model feature fusion model.

Figure 11. Confusion matrix of the fifth operation recognition result.

Table 1. Gearbox experimental data information.

Table 6. Eigenvalues from the parallel fusion method.

Table 7. Average accuracy and standard deviation of the four models.