A Fault Diagnosis Scheme for Gearbox Based on Improved Entropy and Optimized Regularized Extreme Learning Machine

Abstract: The performance of a gearbox is sensitive to failures, especially under long-term high-speed and heavy-load operation. However, multi-fault diagnosis in gearboxes is a challenging problem because the measured signal is complex and non-stationary. To obtain fault information more fully and improve the accuracy of gearbox fault diagnosis, this paper proposes a feature extraction method, hierarchical refined composite multiscale fluctuation dispersion entropy (HRCMFDE), to extract the fault features of rolling bearing and gear vibration signals at different layers and scales. On this basis, a novel fault diagnosis scheme for the gearbox based on HRCMFDE, ReliefF and a grey wolf optimizer regularized extreme learning machine is proposed. Firstly, HRCMFDE is employed to extract the original features, so that the multi-frequency time information can be evaluated simultaneously and the fault feature information can be extracted more fully. After that, ReliefF is used to screen the sensitive features from the high-dimensional fault features. Finally, the sensitive features are inputted into the optimized regularized extreme learning machine to identify the fault states of the gearbox. The results of experiments on three different types of gearboxes confirm that the proposed method has better diagnostic performance and generalization: it effectively and accurately identifies the different fault categories of the gearbox and outperforms the other methods used for comparison.


Introduction
As a critical component for transmitting power and motion in mechanical equipment, the gearbox has been widely used in many modern industrial fields such as aerospace, wind power generation, shipping, rail transit and construction machinery. However, due to heavy loads and hostile working environments, it is prone to malfunction during actual operation. These failures inevitably alter the dynamic behavior and can even lead to significant accidents. To avoid losses caused by gearbox failures, accurate and automatic fault detection is of great value for ensuring the safe and stable operation of mechanical equipment [1].
The research on gearbox fault diagnosis is mainly based on expert systems [2], analytical models [3] and data-driven methods [4]. The expert system-based method has substantial limitations because it relies primarily on the experience of experts for diagnosis. The analytical model-based methods need to establish accurate and systematic mathematical models according to a specific mechanical structure, which is not always possible for complex mechanical systems [5]. The data-driven method analyzes the operating state of equipment through sensor data and has received much attention in fault diagnosis. When the gearbox fails, the failed point repeatedly collides with the other parts in contact with it, which produces nonlinear, non-stationary and multi-frequency complex signals. Therefore, extracting fault feature information that represents the running state from this signal has become the key problem [6]. Researchers have proposed various state-of-the-art signal analysis methods and applied them to extract gearbox fault features, such as wavelet packet transform (WPT) [7], squared envelope spectrum (SES) [8], empirical mode decomposition (EMD) [9], variational modal decomposition (VMD) [10], machine learning [11] and entropy theories [12].
As a statistical measure, entropy can quantify complexity and detect the dynamic changes of signals through the nonlinear behavior of time series. It has become an active research topic in many fields, such as image processing [13], mechanical fault diagnosis [14], urban systems [15] and biomedical signals [16]. Owing to the advantages of entropy in extracting features from nonlinear vibration signals, many entropy-based methods have been developed, such as sample entropy [17], fuzzy entropy [18] and permutation entropy [19]. These entropy-based methods and their improved variants have been successfully applied in the field of mechanical equipment fault diagnosis. Feng [20] applies sample entropy to realize the fault diagnosis of planetary gears under non-stationary operational conditions. Wei [21] proposes an improved fuzzy entropy method for feature extraction of rotating machinery and verifies the effectiveness of the method through experiments. Kuai [22] decomposes the original signal into six intrinsic mode functions and uses the permutation entropy of each intrinsic mode function component as the input for gearbox fault diagnosis. However, although sample entropy addresses some shortcomings of earlier measures, the boundary between different categories is fuzzy in practical applications. Fuzzy entropy can effectively solve this problem and improve the stability of the calculation results. Permutation entropy only compares the order of the amplitudes of the time series in the calculation process and ignores the amplitude differences within the same pattern.
To tackle these problems, a method called fluctuation-based dispersion entropy (FDE) is introduced by Azami [23]. A comparative analysis on various kinds of classical signals shows that FDE has clear advantages in terms of stability, calculation cost and noise robustness.
Nevertheless, FDE only measures the randomness and dynamic uncertainty of time series on a single scale. To address this defect, multiscale fluctuation dispersion entropy (MFDE) [24], refined composite multiscale dispersion entropy (RCMDE) [25] and refined composite multiscale fluctuation dispersion entropy (RCMFDE) [26] have been proposed to measure the complexity of time series on multiple scales. However, MFDE, RCMDE and RCMFDE do not comprehensively consider the multiscale feature information of time series at different layers and frequency bands. They also ignore the feature information of different coarse-graining sequences at the same scale during the coarse-graining process, which results in the loss of helpful information and increases the entropy estimation deviation. Meanwhile, Yan [27] introduces hierarchical dispersion entropy (HDE), and Wang [28] proposes hierarchical fluctuation dispersion entropy (HFDE). HFDE and HDE can simultaneously extract the high-frequency and low-frequency features of the signal. Nonetheless, in the face of complex signals, HFDE and HDE are unstable and suffer severe loss of feature information. To address these shortcomings, this paper combines the advantages of the above methods and proposes hierarchical refined composite multiscale fluctuation dispersion entropy (HRCMFDE) to extract fault features from gearbox vibration signals.
The HRCMFDE method extracts the gearbox feature information from the time-domain signals. The obtained high-dimensional feature vectors contain redundant information, which can drown out the sensitive information [29]. In this paper, ReliefF is adopted to screen sensitive information [30], eliminate the correlation among the features and avoid redundancy. In the pattern recognition stage, a regularized extreme learning machine (RELM) [31] is introduced as the classifier. The performance of RELM depends on two parameters, namely, the regularization factor and the number of hidden neurons. To avoid choosing the parameter combination by experience, the grey wolf optimizer (GWO) [32] is used to adaptively determine the best parameter combination of RELM. The resulting GWO-RELM is proposed to bring out the best performance of RELM.
According to the layout of the gearbox, gear trains can be classified into four categories [33]: simple gear train, compound gear train, reverted gear train and planetary gear train. One example of each type of gear train is given in Figure 1. Types (b), (c) and (d) can be formed by combinations of (a). To verify the applicability and generalization of this method in the field of gearbox fault diagnosis, experimental research on gearboxes with the more complex structures (b), (c) and (d) is carried out. The results validate that the proposed method has a better detection ability than the existing four entropy-based approaches.
The rest of this paper is organized as follows. Section 2 presents the mathematical modelling and parameter selection of the HRCMFDE algorithm. Section 3 provides the steps of the proposed method in detail and includes the principle of GWO-RELM. Section 4 is the experimental verification, in which a series of gearbox experiments verify the superiority and generalization of the proposed method. Section 5 draws the conclusions.

HRCMFDE
RCMFDE does not comprehensively consider the multiscale feature information of time series at different layers and frequency bands, which inevitably leads to the loss of potentially useful information. This paper therefore puts forward hierarchical refined composite multiscale fluctuation dispersion entropy (HRCMFDE). Following the process of hierarchical analysis, operators for different frequency bands are constructed so that the multi-frequency information of the time series can be evaluated simultaneously and the feature information can be extracted more fully.

Fluctuation Dispersion Entropy (FDE)
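For a random series $X = \{x_1, x_2, \ldots, x_N\}$, the FDE is calculated as follows:
(1) Obtaining the class series $z^c = \{z_1^c, z_2^c, \ldots, z_N^c\}$ by Equations (1) and (2):
$$y_j = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x_j} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt \tag{1}$$
$$z_j^c = R(c \cdot y_j + 0.5) \tag{2}$$
where σ and µ denote the standard deviation and mean of the series, R represents the rounding function and c stands for the number of classes;
(2) Defining the embedding vectors based on the embedding dimension m and time delay λ by Equation (3):
$$z_i^{m,c} = \{z_i^c, z_{i+\lambda}^c, \ldots, z_{i+(m-1)\lambda}^c\}, \quad i = 1, 2, \ldots, N-(m-1)\lambda \tag{3}$$
The new series formed by the differences between adjacent elements of each embedding vector has length m − 1, and its elements range from −c + 1 to c − 1. Consequently, the number of possible fluctuation dispersion modes is equal to $(2c-1)^{m-1}$. The probability of each mode π can be calculated by:
$$p(\pi) = \frac{\mathrm{Number}\{\, i \mid i \le N-(m-1)\lambda,\ z_i^{m,c} \text{ has mode } \pi \,\}}{N-(m-1)\lambda} \tag{4}$$
(3) The FDE of series X can be computed as follows:
$$FDE(X, m, c, \lambda) = -\sum_{\pi=1}^{(2c-1)^{m-1}} p(\pi) \ln p(\pi) \tag{5}$$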
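The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `fluctuation_dispersion_entropy`, the default parameter values and the clamping of class labels to the range 1..c are assumptions made for the example.

```python
import math
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def fluctuation_dispersion_entropy(x, m=3, c=6, lam=1):
    """FDE of series x in nats (embedding dimension m, c classes, delay lam)."""
    # Step 1: map the samples to (0, 1) with the normal CDF, then to classes 1..c.
    ncdf = NormalDist(fmean(x), pstdev(x))  # a constant series (sigma = 0) is not supported
    z = [min(c, max(1, round(c * ncdf.cdf(v) + 0.5))) for v in x]  # clamp is an assumption

    # Step 2: embed, then keep only the differences between adjacent elements.
    n_vec = len(z) - (m - 1) * lam
    modes = Counter(
        tuple(z[i + (j + 1) * lam] - z[i + j * lam] for j in range(m - 1))
        for i in range(n_vec)
    )

    # Step 3: Shannon entropy of the mode probabilities.
    return -sum((k / n_vec) * math.log(k / n_vec) for k in modes.values())


# A strictly alternating signal has only two fluctuation modes (up and down).
print(fluctuation_dispersion_entropy([0.0, 1.0] * 50, m=2, c=6))
```

Because only differences between neighbouring class labels are counted, a constant offset added to the signal does not change the mode statistics, which is the property that distinguishes FDE from plain dispersion entropy.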

Refined Composite Multiscale Fluctuation Dispersion Entropy (RCMFDE)
The traditional coarse-graining multiscale method intercepts non-overlapping fragments, and the relationship between adjacent elements of each fragment is not fully considered. As the scale factor increases, the stability of the calculated results becomes worse. Therefore, the refined composite multiscale method is introduced, which is summarized as follows:
(1) For the scale factor τ, the original signal $X = \{x_1, x_2, \ldots, x_N\}$ is divided, starting in turn from each initial point q in [1, τ], into consecutive small sequences of length τ, and the average of each small sequence is taken. These means are arranged sequentially to obtain τ coarse-graining time series at scale τ. The q-th coarse-graining time series $x_q^{\tau} = \{x_{q,1}^{\tau}, x_{q,2}^{\tau}, \ldots\}$ is:
$$x_{q,j}^{\tau} = \frac{1}{\tau} \sum_{i=(j-1)\tau+q}^{j\tau+q-1} x_i, \quad 1 \le j \le \left\lfloor \frac{N-q+1}{\tau} \right\rfloor,\ 1 \le q \le \tau$$
(2) Then, for each scale factor, the probability $p_q^{\tau}(\pi)$ of each fluctuation dispersion mode π occurring in the q-th coarse-graining time series $x_q^{\tau}$ is calculated. The average probability of the dispersion mode π over the coarse-graining time series at scale τ is as follows:
$$\bar{p}(\pi) = \frac{1}{\tau} \sum_{q=1}^{\tau} p_q^{\tau}(\pi)$$
(3) The RCMFDE of series X can be computed as follows:
$$RCMFDE(X, m, c, \lambda, \tau) = -\sum_{\pi=1}^{(2c-1)^{m-1}} \bar{p}(\pi) \ln \bar{p}(\pi)$$
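A compact sketch of the refined composite procedure is given here for orientation. It is illustrative only: the helper names `coarse_grain`, `mode_probabilities` and `rcmfde` are hypothetical, and mapping every coarse-grained series to classes with the mean and standard deviation of the original signal is an assumption of this example.

```python
import math
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def coarse_grain(x, tau, q):
    """q-th (1-based) coarse-grained series: means of consecutive windows of length tau."""
    start = q - 1
    n_win = (len(x) - start) // tau
    return [fmean(x[start + j * tau: start + (j + 1) * tau]) for j in range(n_win)]


def mode_probabilities(y, ncdf, m, c, lam):
    """Relative frequency of each fluctuation dispersion mode in series y."""
    z = [min(c, max(1, round(c * ncdf.cdf(v) + 0.5))) for v in y]
    n_vec = len(z) - (m - 1) * lam
    counts = Counter(
        tuple(z[i + (j + 1) * lam] - z[i + j * lam] for j in range(m - 1))
        for i in range(n_vec)
    )
    return {mode: k / n_vec for mode, k in counts.items()}


def rcmfde(x, tau, m=2, c=6, lam=1):
    """Entropy of the mode probabilities averaged over the tau shifted series."""
    ncdf = NormalDist(fmean(x), pstdev(x))  # statistics of the original signal (assumption)
    mean_p = Counter()
    for q in range(1, tau + 1):
        probs = mode_probabilities(coarse_grain(x, tau, q), ncdf, m, c, lam)
        for mode, p in probs.items():
            mean_p[mode] += p / tau
    return -sum(p * math.log(p) for p in mean_p.values() if p > 0)


signal = [math.sin(0.3 * i) + 0.5 * math.sin(1.7 * i) for i in range(512)]
print([round(rcmfde(signal, tau), 3) for tau in (1, 2, 4, 8)])
```

Averaging the probabilities before taking the entropy, instead of averaging τ separate entropy values, is what keeps the estimate stable when the coarse-grained series become short at large τ.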

Hierarchical Refined Composite Multiscale Fluctuation Dispersion Entropy (HRCMFDE)
RCMFDE ignores the feature information of different coarse-graining sequences at the same scale, which results in the loss of useful information and an increase in the entropy estimation deviation. Therefore, hierarchical refined composite multiscale fluctuation dispersion entropy is proposed to extract the fault features of vibration signals at different hierarchical layers and scales. The detailed flow of the HRCMFDE is described below.
(1) For a random series $X = \{x_1, x_2, \ldots, x_N\}$ of length $N = 2^n$, the operators $Q_0(x)$ and $Q_1(x)$ are as follows:
$$Q_0(x) = \left( \frac{x_{2j-1} + x_{2j}}{2} \right), \quad j = 1, 2, \ldots, 2^{n-1}$$
$$Q_1(x) = \left( \frac{x_{2j-1} - x_{2j}}{2} \right), \quad j = 1, 2, \ldots, 2^{n-1}$$
where $Q_0(x)$ and $Q_1(x)$ contain the low-frequency information and high-frequency information of X, respectively;
(2) Then, the matrix form of the operator $Q_t^k$ (t = 0 or 1) at the hierarchical layer k is written as follows:
$$Q_t^k = \begin{bmatrix} \frac{1}{2} & \frac{(-1)^t}{2} & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \frac{1}{2} & \frac{(-1)^t}{2} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & \frac{1}{2} & \frac{(-1)^t}{2} \end{bmatrix}_{2^{n-k} \times 2^{n-k+1}}$$
(3) Moreover, for a given vector $[v_1, v_2, \ldots, v_k]$ of length k, the variable e can be calculated as follows:
$$e = \sum_{m=1}^{k} 2^{k-m} v_m \tag{10}$$
where $v_m \in \{0, 1\}$ (m = 1, 2, ..., k) indicates whether the operator $Q_0$ or $Q_1$ is applied at the m-th layer. According to Equation (10), a unique vector exists for any given nonnegative integer e;
(4) The hierarchical components of the series X are represented as follows:
$$X_{k,e} = Q_{v_k}^{k} \cdot Q_{v_{k-1}}^{k-1} \cdots Q_{v_1}^{1} \cdot X$$
where $X_{k,e}$ represents the hierarchical component at node e of the k-th layer of series X. When k = 3, the hierarchical decomposition process is depicted in Figure 2, where $X_{3,1}$ represents the hierarchical component at node 1 of the third layer and the corresponding unique vector is [0, 0, 1]. $X_{1,0}$ and $X_{1,1}$ represent the low-frequency and high-frequency components in the first layer, respectively;
(5) The RCMFDE value of the hierarchical node component $X_{k,e}$ under the scale factor τ is taken as the HRCMFDE value under that scale factor, which can be expressed as follows:
$$HRCMFDE(X, k, e, m, c, \lambda, \tau) = RCMFDE(X_{k,e}, m, c, \lambda, \tau)$$
It can be seen from the above description that the HRCMFDE algorithm is built on FDE and RCMFDE in succession. The concepts of refined composite multiscale analysis and hierarchical analysis are introduced in turn, which makes it possible to extract information effectively at different hierarchical layers and scales of the original signals. This method has better stability and performance. The flow of the HRCMFDE method is shown in Figure 3.
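The hierarchical decomposition can be illustrated with a short Python sketch. It assumes the pairwise average and difference operators described for Q0 and Q1 and reads the vector [v1, ..., vk] as the k-bit binary expansion of e with v1 as the most significant bit; the function name `hierarchical_node` is hypothetical.

```python
def hierarchical_node(x, k, e):
    """Return ([v_1..v_k], X_{k,e}) for a series whose length is divisible by 2**k."""
    # v_m is the bit of weight 2**(k - m) in e, so that e = sum(2**(k - m) * v_m).
    bits = [(e >> (k - m)) & 1 for m in range(1, k + 1)]
    comp = [float(v) for v in x]
    for v in bits:
        sign = -1.0 if v else 1.0  # Q0 averages neighbouring pairs, Q1 takes half their difference
        comp = [(comp[2 * j] + sign * comp[2 * j + 1]) / 2 for j in range(len(comp) // 2)]
    return bits, comp


x = [1, 3, 5, 7, 9, 11, 13, 15]
print(hierarchical_node(x, 3, 1))  # node 1 of layer 3: Q0, then Q0, then Q1
print(hierarchical_node(x, 1, 0))  # low-frequency component of layer 1
print(hierarchical_node(x, 1, 1))  # high-frequency component of layer 1
```

Each layer halves the length of the component, so a node at layer k holds N / 2^k points. That shortening is one reason the layer count cannot be raised freely when each node must still be coarse-grained up to the largest scale factor.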

Parameters Selection
Six main parameters in HRCMFDE need to be set manually: the series length N, the hierarchical layer k, the embedding dimension m, the class c, the time delay λ and the scale factor τ. Selecting proper parameters allows the original signals to be processed more effectively and the fault information to be extracted more accurately:
(1) If the scale factor τ is too large, redundant information is easily generated. However, if τ is too small, it is difficult to obtain helpful fault feature information from the original signals. If the hierarchical layer k is too small, the high-frequency and low-frequency information of the signals is extracted incompletely, whereas the computational efficiency suffers if k is too large. To extract valuable features, following the literature [34], τ = 8 and k = 3 are set, which meets the requirement of gearbox fault diagnosis. Hence, using the HRCMFDE method, 64 features (the $2^3 = 8$ hierarchical nodes of the third layer, each evaluated at 8 scales) can be extracted from each group of signal samples, and the corresponding feature vector of the sample is constructed;
(2) For the time delay λ and the series length N, the literature [35] indicates that these two parameters have little impact on the feature extraction result;
(3) For the embedding dimension m and class c, the influence of different parameter values is analyzed using a distance measure. Assuming that the gearbox parts have n different health states and that each state has M samples of sample length N, the average Euclidean distance (AED) is introduced as the distance measure. In its definition, x and y denote the AED values of the x-th and the y-th states, and $Value_{AED}$ is the AED value corresponding to the parameter combination (m, c).
Then, the calculation is repeated for different parameter combinations. m is determined according to the criterion $N/\tau_{\max} > (2c-1)^{m-1}$, and, according to the literature [36], c ∈ [4, 8] is set. The (m, c) combination corresponding to the maximum $Value_{AED}$ is the best (m, c) combination.
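As a worked example of the criterion N/τ_max > (2c − 1)^(m−1), the snippet below enumerates the admissible (m, c) pairs for N = 2048 (the sample length used later in Experiment 1) and τ_max = 8. The function name `admissible_pairs` and the candidate range m = 2..4 are assumptions made for illustration.

```python
def admissible_pairs(n_samples, tau_max, c_values, m_values):
    """(m, c) pairs whose number of modes (2c - 1)**(m - 1) stays below N / tau_max."""
    limit = n_samples / tau_max
    return [(m, c) for c in c_values for m in m_values if (2 * c - 1) ** (m - 1) < limit]


# N / tau_max = 2048 / 8 = 256, so for c = 8 the mode count is 15 (m = 2),
# 225 (m = 3) and 3375 (m = 4); only the first two satisfy the criterion.
pairs = admissible_pairs(2048, 8, range(4, 9), range(2, 5))
print(pairs)
```

Under these assumptions every c in [4, 8] admits m = 2 and m = 3 but not m = 4, so the AED comparison only has to rank the combinations that pass this filter.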

The Proposed Gearbox Intelligent Fault Diagnosis Method
The feature vectors obtained with the proposed feature extraction method still contain redundant information, which affects the recognition accuracy and increases the calculation cost. Therefore, this section introduces a feature dimension reduction method that removes redundant high-dimensional information and screens the sensitive features. In addition, an improved classification method is introduced to realize the final fault diagnosis.

Grey Wolf Optimizer
The grey wolf optimizer (GWO), proposed by the Australian scholar Mirjalili, is one of the most popular metaheuristic algorithms of the recent decade. The algorithm is introduced in detail in the literature [37] and will not be described at length in this paper.
The procedure of GWO is described in Algorithm 1.

Algorithm 1: GWO
(1) Initialize the grey wolf population X_i (i = 1, 2, ..., n);
(2) Initialize a, A and C;
(3) Calculate the objective value of each search agent:
    X_α = the best search agent
    X_β = the second-best search agent
    X_δ = the third-best search agent;
(4) for t = 1 : max number of iterations
        for each search agent
            Update the position of the current search agent by Equations (21)-(26)
        end for
        Update a, A and C
        Calculate the fitness of all search agents
        Update X_α, X_β and X_δ
    end for
(5) Return X_α.
The steps of the GWO are as follows. During hunting, the encircling behavior of grey wolves can be defined as:
$$D = |C \cdot X_p(t) - X(t)|, \quad X(t+1) = X_p(t) - A \cdot D$$
$$A = 2a \cdot r_1 - a, \quad C = 2 r_2$$
where t is the current iteration, $X_p$ is the position of the prey, X is the position of a grey wolf, $r_1$ and $r_2$ are random vectors in [0, 1], and the convergence factor a decreases from 2 to 0 as t approaches the maximum number of iterations m.
The mathematical model of an individual grey wolf tracking the prey is described in Equations (21) and (22).
A proportional weight based on the modulus of the guide position vector is introduced. By adjusting the weights, the global and local search abilities of the algorithm are dynamically balanced, and the convergence of the algorithm is accelerated. In the calculation formulas, Equation (21) defines the step length and direction of the grey wolf individuals towards α, β and δ, and Equations (21) and (22) define the final position of X α.
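For orientation, the loop of Algorithm 1 can be sketched in Python as follows. This sketch uses the standard GWO update, in which the new position is the plain average of the three leader-guided positions and a decreases linearly from 2 to 0; it does not include the proportional weighting described above. The function name `gwo`, its parameters and the best-so-far bookkeeping are assumptions of the example.

```python
import random


def gwo(objective, bounds, n_wolves=20, max_iter=100, seed=0):
    """Minimise objective over box bounds with a standard grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best = min(wolves, key=objective)
    for t in range(max_iter):
        # The three fittest wolves guide the rest of the pack.
        alpha, beta, delta = sorted(wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / max_iter  # linear decrease (standard GWO, an assumption here)
        moved = []
        for wolf in wolves:
            pos = []
            for d in range(dim):
                guided = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    guided += leader[d] - A * D
                lo, hi = bounds[d]
                pos.append(min(max(guided / 3.0, lo), hi))  # keep inside the search box
            moved.append(pos)
        wolves = moved
        best = min(wolves + [best], key=objective)  # remember the best position seen so far
    return best, objective(best)


position, value = gwo(lambda p: sum(v * v for v in p), [(-5.0, 5.0)] * 2)
print(position, value)
```

While |A| > 1 the wolves tend to move away from the leaders and explore; as a shrinks, |A| falls below 1 and the pack contracts around the three best positions.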

Regularized Extreme Learning Machine
The regularized extreme learning machine (RELM) is used as the classification algorithm, and the low-dimensional feature vectors of the test samples are inputted to realize the fault diagnosis of the gearbox. RELM is an improved method that introduces the concept of regularization into the extreme learning machine (ELM). ELM is a fast training algorithm for single hidden layer feedforward networks (SLFN) proposed by Huang [38]. SLFN has been widely used in many fields owing to its good learning ability, and its structure is shown in Figure 4.
Assume a training dataset $\{(x_i, t_i)\}$, i = 1, 2, ..., N, where $x_i$ is the i-th input vector and $t_i$ is the corresponding target vector, and the number of hidden nodes is k. The training steps of the ELM algorithm are as follows:
(1) Randomly set the input weights $w_j$ and hidden layer biases $b_j$ (j = 1, 2, ..., k);
(2) The output of the SLFN can be formulated as follows:
$$\sum_{j=1}^{k} \beta_j\, g(w_j \cdot x_i + b_j) = t_i, \quad i = 1, 2, \ldots, N$$
where $\beta_j$ is the set of connection weights between the j-th hidden node and the output layer and g(·) is the activation function. The output equations for the input samples can be represented as Hβ = T, where:
$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_k \cdot x_1 + b_k) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_k \cdot x_N + b_k) \end{bmatrix}_{N \times k}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_k^T \end{bmatrix}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}$$
(3) Obtaining the output weight matrix β as the least-squares solution of Hβ = T:
$$\beta = H^{\dagger} T \tag{32}$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse matrix of H;
(4) Building the model of the regularized extreme learning machine by Equation (33), in which $H^T$ is the transposed matrix of H, θ is the regularization factor and I is the identity matrix. In this model, the generalized inverse $H^{\dagger}$ is replaced by a regularized, non-singular counterpart constructed from $H^T H$, θ and I. RELM can avoid overfitting and enhance the generalization ability of the model, improving the accuracy of the actual prediction. All in all, RELM has a more stable performance than ELM.
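The training steps can be illustrated with a small pure-Python sketch. It assumes the common ridge form β = (HᵀH + θI)⁻¹HᵀT for the regularized solution; other formulations scale the identity matrix by 1/θ instead, so the placement of θ here is an assumption. The sigmoid activation, the uniform initialization in [-1, 1] and the helper names `solve`, `hidden`, `train_relm` and `predict` are also hypothetical choices for the example.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hidden(X, W, b):
    """Hidden layer output matrix H with a sigmoid activation (one row per sample)."""
    return [[1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + bj)))
             for w, bj in zip(W, b)] for x in X]


def train_relm(X, T, n_hidden, theta, seed=0):
    """Random hidden layer, then ridge-regularized least squares for the output weights."""
    rng = random.Random(seed)
    W = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden(X, W, b)
    # Assumed form: (H^T H + theta * I) beta = H^T T
    HtH = [[sum(h[i] * h[j] for h in H) + (theta if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    HtT = [[sum(h[i] * t[o] for h, t in zip(H, T)) for o in range(len(T[0]))]
           for i in range(n_hidden)]
    return W, b, solve(HtH, HtT)


def predict(X, model):
    """Network outputs for the samples in X; the largest output gives the class."""
    W, b, beta = model
    return [[sum(hj * beta_j[o] for hj, beta_j in zip(h, beta)) for o in range(len(beta[0]))]
            for h in hidden(X, W, b)]
```

A typical call would be `model = train_relm(X, T, n_hidden=10, theta=1e-3)` with one-hot target rows in `T`, followed by `predict(X_new, model)`. Because the hidden layer is never trained, the only fitted quantity is β, which is why the number of hidden nodes and θ are the two parameters left for an optimizer to tune.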
The procedure of RELM is described in Algorithm 2.

Algorithm 2: RELM
(1) Randomly set the input weights w_j and hidden layer biases b_j;
(2) Calculate the hidden layer output matrix H;
(3) Calculate the output weights β by Equation (33);
(4) Return β.

Hybrid GWO-RELM
Training RELM requires setting the number n of hidden neurons, which is usually chosen arbitrarily and then adjusted repeatedly in search of a better value. If the value of n is too large, it increases the possibility of overfitting and takes too much time. Conversely, if n is too small, achieving the best accuracy and stability is difficult. Moreover, the value of θ depends on the input samples and needs to be set according to the results of many experiments.
To overcome the problems mentioned above and improve the efficiency of RELM, a hybrid method that combines GWO with RELM is required. The goal of the GWO algorithm is to find the best combination of n and θ, thereby avoiding overfitting and improving the generalization ability. The fitness function is the essential design problem to be solved in the GWO-RELM application. In this paper, the commonly used root mean squared error (RMSE) given in Equation (34) is selected as the fitness function to be minimized:
$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (T_i - P_i)^2} \tag{34}$$
where N is the number of training samples, $T_i$ is the actual value and $P_i$ is the predicted value. The steps of the GWO-RELM are as follows:
(1) Building the fitness function for the optimization parameters n and θ;
(2) Setting the initial parameters and taking [n, θ] as the grey wolf position to generate the initial population;
(3) Calculating the fitness of the individual grey wolves in the population;
(4) Repeating several iterations and constantly updating the optimal fitness value;
(5) Outputting the best parameters and the corresponding accuracy.
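The fitness evaluation in steps (1)-(3) can be sketched as follows. The RMSE function follows Equation (34) directly; the helper `decode_position`, which turns a continuous wolf position [n, θ] into an integer neuron count and a bounded regularization factor, and its bounds are hypothetical and only illustrate one way to handle the integer parameter.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual values T_i and predictions P_i."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(actual, predicted)) / len(actual))


def decode_position(position, n_bounds=(1, 200), theta_bounds=(1e-3, 10.0)):
    """Map a wolf position [n, theta] to a valid (hidden neuron count, regularization factor)."""
    n = int(round(min(max(position[0], n_bounds[0]), n_bounds[1])))
    theta = min(max(position[1], theta_bounds[0]), theta_bounds[1])
    return n, theta


print(rmse([1.0, 0.0, 1.0], [0.9, 0.2, 0.7]))
print(decode_position([113.4, 0.509]))
```

In a full GWO-RELM loop, each wolf position would be decoded, a RELM trained with the resulting (n, θ), and the RMSE of its predictions returned as the fitness of that wolf.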
The flow charts of GWO-RELM are shown in Figure 5.

ReliefF
The high-dimensional feature vectors extracted by the HRCMFDE method are rich in fault feature information but also contain redundant information. If all the feature information is used for fault diagnosis, the accuracy and efficiency of the diagnosis will be affected. Therefore, according to the importance and sensitivity of each feature, it is essential to reduce the dimension of the high-dimensional feature vectors and obtain sensitive low-dimensional feature vectors. This paper uses the ReliefF method for feature dimension reduction; a detailed description of ReliefF is given in reference [39].
A sample $R_i$ is randomly selected from the training set of high-dimensional features. Then, k nearest neighbour samples are chosen from the samples with the same label as $R_i$, and k nearest neighbour samples are chosen from each of the different labels. Finally, the weight of each feature is updated using Equation (35), and the calculation is carried out m times until all the samples have been processed in succession. The final weight of each single feature is thus obtained.
$$W_{i+1}(f_l) = W_i(f_l) - \sum_{j=1}^{k} \frac{diff(f_l, R_i, H_j)}{mk} + \sum_{C \neq label(R_i)} \frac{P(C)}{1 - P(label(R_i))} \sum_{j=1}^{k} \frac{diff(f_l, R_i, M_j(C))}{mk} \tag{35}$$
where $W_i(f_l)$ is the weight of the l-th feature f at the i-th sample; $H_j$ (j = 1, 2, ..., k) is the j-th sample among the k nearest neighbour samples of the same kind as $R_i$; P(C) is the probability of label C; $P(label(R_i))$ is the proportion of samples of the same kind as $R_i$ in the total samples; and $M_j(C)$ is the j-th of the k nearest neighbour samples in class C, which is different from the class of $R_i$. The calculation method of the function $diff(f, R_1, R_2)$ is shown in Equation (36).
$$diff(f, R_1, R_2) = \frac{|R_1^f - R_2^f|}{\max(f) - \min(f)} \tag{36}$$
where $diff(f, R_1, R_2)$ is the normalized distance between sample $R_1$ and sample $R_2$ on the f-th feature, and $R_1^f$ and $R_2^f$ are the f-th features of samples $R_1$ and $R_2$.
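The weight update can be sketched in plain Python as below. This is an illustrative implementation of the standard ReliefF update for continuous features, not the authors' code: the function name `relieff`, the use of the summed normalized feature differences as the neighbour distance, and dividing by the number of neighbours actually found when a class has fewer than k members are assumptions.

```python
import random


def relieff(X, y, n_neighbors=3, n_iter=None, seed=0):
    """Return one ReliefF weight per feature for samples X with labels y."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    n_iter = n_iter or n
    # Feature ranges for the normalized difference; constant features get range 1.
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    prior = {c: sum(1 for lab in y if lab == c) / n for c in set(y)}
    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        r, c_r = X[i], y[i]
        for c in prior:
            members = [X[j] for j in range(n) if y[j] == c and j != i]
            near = sorted(members, key=lambda s: dist(r, s))[:n_neighbors]
            if not near:
                continue
            # Nearest hits lower the weight; nearest misses raise it, scaled by class prior.
            scale = -1.0 if c == c_r else prior[c] / (1.0 - prior[c_r])
            for f in range(d):
                w[f] += scale * sum(diff(f, r, s) for s in near) / (n_iter * len(near))
    return w


# Feature 0 separates the two classes, feature 1 is unrelated to the label.
X = [[0.01 * i, (i * 7 % 10) / 10] for i in range(10)] + \
    [[1.0 + 0.01 * i, (i * 3 % 10) / 10] for i in range(10)]
y = [0] * 10 + [1] * 10
print(relieff(X, y))
```

Features with the largest weights are kept as the sensitive low-dimensional feature set; how many to keep is a separate choice that the weights themselves do not determine.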

The Proposed Fault Diagnosis Method
To ensure high fault classification accuracy for the gearbox, a novel gearbox fault diagnosis method based on HRCMFDE, ReliefF and GWO-RELM is presented in this paper. The detailed process is shown in Figure 6.
(1) Collecting the vibration signals. The vibration signals of the gears and rolling bearings in the gearbox under the various fault states are collected by accelerometers;
(2) Determining the optimal parameters of HRCMFDE. The features under different (m, c) combinations are extracted, and the (m, c) combination corresponding to the maximum Value AED value is taken as the optimal parameter;
(3) Extracting fault features. To extract the fault feature information of the gearbox completely, the HRCMFDE method is employed to calculate the entropy values, and a feature set with a length of 64 is obtained;
(4) Feature dimension reduction. ReliefF is utilized to extract sensitive feature information and remove redundant features;
(5) Fault classification. The obtained low-dimensional sensitive feature information is inputted into GWO-RELM to identify the health conditions of the gearbox.

Experimental Verification
In this section, to verify the diagnostic effectiveness and generalization of the above methods, gearboxes of the three structural types shown in Figure 1b-d are selected for experimental testing.

Experiment 1: Fault Diagnosis of Reverted Gear Train Gearbox
The experimental data come from the 2009 PHM Challenge gearbox composite fault data set [40]. The experimental platform and its structural principle are shown in Figure 7; the platform mainly consists of shafts, bearings, gears and other components. This study analyzes the spur gear data set, which includes a normal operation state, single fault states and compound fault states and thus fully reflects the fault states in the actual operation of the gearbox. The details and time domain waveforms of each state are given in Table 1 and Figure 8, respectively.
The experiment is performed at an input shaft speed of 2400 r/min under low load; the numbers of teeth of spur gears 1, 2, 3 and 4 are 16, 48, 24 and 40, respectively. Vibration signals are collected by two accelerometers installed as shown in Figure 9. This paper uses the vibration data from channel 1, with a sampling frequency of 66.7 kHz and a sampling time of 4 s. For each working status, 60 samples with a length of 2048 are taken, of which 40 samples are used as training samples and 20 samples as testing samples.
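The sample construction described above can be sketched as follows. The text does not state how the 60 segments are cut from each record or how the 40/20 split is drawn, so non-overlapping consecutive segments and a seeded random split are assumptions here; the function names are illustrative.

```python
import numpy as np

def segment_signal(signal, n_samples=60, length=2048):
    """Cut the first n_samples non-overlapping segments of the given length."""
    needed = n_samples * length
    if signal.size < needed:
        raise ValueError("signal is too short for the requested segmentation")
    return signal[:needed].reshape(n_samples, length)

def split_train_test(samples, n_train=40, seed=0):
    """Randomly split segments into training and testing sets (assumed procedure)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    return samples[order[:n_train]], samples[order[n_train:]]

# A 4 s record at 66.7 kHz holds 266,800 points, enough for 60 x 2048 = 122,880.
record = np.zeros(66_700 * 4)
train, test = split_train_test(segment_signal(record))
```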
The performance of the proposed method is verified by experimental data. Firstly, the best (m, c) combination must be selected according to the AED method proposed in Section 2.4, and 50 samples are randomly selected for each working state of the gearbox. The results of $Value_{AED}$ under different (m, c) combinations are listed in Table 2. It can be found that, as (m, c) increases, $Value_{AED}$ also increases, the Euclidean distance between samples of different states becomes larger and the separability is steadily enhanced. Hence, m = 3 and c = 8 are selected. In summary, the final parameters are set to k = 3, τ = 8, λ = 1, N = 2048, m = 3 and c = 8.

The features extracted from the training samples are shown in Figure 10, and the low-dimensional features after feature reduction are displayed in Figure 11. It can be seen that the features of different states differ, but it is difficult to distinguish them directly. Therefore, it is necessary to rely on a classification algorithm to identify the states. The sensitive feature vectors are inputted into GWO-RELM, and the final optimized parameters are n = 113 and θ = 0.509.

To further verify the performance of the method, the training and testing samples are inputted into the optimized RELM model for state recognition. The results are shown in Figure 12. Among all test samples, 159 samples are identified successfully, and only one sample is incorrectly identified ("Status 3" is identified as "Status 1"). The recognition accuracy over all operating states of the gearbox is 99.38%. This indicates that the proposed method can effectively realize the fault diagnosis of the gearbox under various working conditions, such as single fault and compound fault.

Then, HRCMFDE is compared with the existing RCMFDE, MFDE, RCMDE and MDE. The parameters are also set to k = 3, τ = 8, λ = 1, N = 2048, m = 3 and c = 8. For each model, the experiment is repeated 50 times, and the results are shown in Table 3 and Figure 13, where 'Time' in Table 3 refers to the time consumed to extract the high-dimensional features of a single sample. The following conclusions can be drawn:

(1) The SD values reveal the following relationships. The SD of MFDE and RCMFDE is smaller than that of MDE and RCMDE, respectively, which shows that FDE, by considering the fluctuation characteristics of the vibration signals, has better feature evaluation performance than DE. The SD of RCMFDE and RCMDE is smaller than that of MFDE and MDE, respectively, which means that the refined composite entropy-based method has better stability. The SD of HRCMFDE is smaller than that of RCMFDE, indicating that the hierarchical entropy-based method further improves the stability. Among these methods, HRCMFDE has the best stability and apparent advantages. The reason is that the refined composite multiscale entropy only analyzes the low-frequency components of the signal and often ignores the high-frequency components, which considerably limits its feature extraction performance;

(2) The MFDE model has the fastest calculation speed, but its diagnostic accuracy is insufficient, and its large variance indicates a lack of stability. Although the HRCMFDE model has the lowest computational efficiency, it is still acceptable in practical applications. Ultimately, HRCMFDE has the highest accuracy. The reason is that the information it extracts, from low frequency to high frequency, is the most extensive, and the feature information it contains is the richest. Hence, it has the best stability and the highest diagnostic accuracy, at the cost of a longer calculation time.

The above analysis shows that the HRCMFDE model proposed in this paper has apparent advantages in the separability and stability of features. The HRCMFDE model can be effectively applied in gearbox fault diagnosis.

Then, the effectiveness of the dimension reduction method in the proposed approach is studied. The high-dimensional feature vectors are directly inputted into GWO-RELM to identify the health conditions of the gearbox. The results shown in Figure 14 are obtained from Tables 3 and 4. After ReliefF, the SD of the different methods is significantly reduced, and the average recognition accuracy is significantly improved. This indicates that the low-dimensional feature vectors obtained by ReliefF strengthen the stability and accuracy of the recognition and are more suitable for recognizing the operating state of the gearbox. All in all, ReliefF is an essential step in gearbox fault diagnosis.
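The accuracy and SD figures discussed above can be computed from raw counts and repeated runs along the following lines. This is a minimal sketch: the text does not say whether the reported SD is the sample or the population standard deviation, so the sample form (`ddof=1`) is an assumption, and the function names are illustrative.

```python
import numpy as np

def accuracy_percent(n_correct, n_total):
    """Recognition accuracy in percent."""
    return 100.0 * n_correct / n_total

def summarize_runs(accuracies):
    """Mean and sample standard deviation (ddof=1, an assumption) over repeated runs."""
    acc = np.asarray(accuracies, dtype=float)
    return float(acc.mean()), float(acc.std(ddof=1))

# Experiment 1 reports 159 correct and 1 wrong test sample, i.e. 160 in total.
exp1_accuracy = accuracy_percent(159, 160)
```

With 159 of 160 test samples correct, the accuracy is 99.375%, which rounds to the reported 99.38%.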
After that, several classification approaches widely studied in current fault diagnosis research are selected and compared with the method proposed in this paper. The results of the feature extraction model under different classification approaches are shown in Table 5; they indicate that the proposed classification method has better classification performance.

Finally, the HRCMFDE, RCMFDE and RCMDE models, which have better feature extraction performance, are further evaluated. Indicators commonly used to evaluate model performance in fault diagnosis include Precision (P), Recall (R), Accuracy (Acc) and F1 score (F1) [41]. Precision is the ratio of the true positive samples to all samples predicted as positive, i.e., the proportion of real positive samples among the positive predictions of the model. Recall is the ratio of the number of true positive samples predicted by the model to the number of actual positive samples. Accuracy and F1 score measure the overall performance of the model. The higher these indicators, the stronger the fault diagnosis capability of the model and the better its overall performance.
Each state is taken in turn as the positive class, and the four corresponding indicators are calculated. Each group of experiments is carried out 50 times. The average values are recorded in Table 6, where the status corresponds to the fault states in Table 1 and OM is the overall mean. Compared with the RCMDE model, the RCMFDE model has better feature extraction performance and higher comprehensive scores. Compared with the RCMFDE model, the P-means, R-means, Acc-means and F-means of the HRCMFDE model are increased by 3.89%, 4.16%, 1.04% and 4.17%, respectively. This shows that the HRCMFDE model has superior performance, with higher accuracy and stability.
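The per-state indicators can be sketched as follows, using the standard one-vs-rest definitions P = TP/(TP + FP), R = TP/(TP + FN), Acc = (TP + TN)/total and F1 = 2PR/(P + R). The confusion-matrix layout (rows are true states, columns are predicted states) and the function name are assumptions made for illustration.

```python
import numpy as np

def one_vs_rest_metrics(confusion):
    """Per-class precision, recall, accuracy and F1 from a confusion matrix.

    confusion[i, j] counts samples of true state i predicted as state j.
    A class that is never predicted would give a zero denominator for
    precision; that case is not handled in this sketch.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    tp = np.diag(confusion)
    fp = confusion.sum(axis=0) - tp   # predicted as the class, but wrong
    fn = confusion.sum(axis=1) - tp   # belongs to the class, but missed
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```

Averaging each returned array over the states gives an overall mean of the kind reported as OM.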

Experiment 2: Fault Diagnosis of Compound Gear Train Gearbox
In Section 4.1, the validity of the proposed method is verified with the typical vibration signals of the reverted gear train gearbox. The method is then applied in a second experiment to further verify its effectiveness and to provide an effective state diagnosis method for the compound gear train gearbox, which offers a basis for other studies on the experimental platform. The structure of the experimental platform and gearbox is shown in Figure 15; a dual-input single-output fault diagnosis platform is adopted to collect the vibration signals of the gearbox in different working conditions. The platform mainly consists of driving motors, gears, bearings, transmission shafts and other components.

In the experiment, the driving motors provide power, and the two driving wheels transmit the power to the driven gear. The internal structure of the gearbox and the layout of the accelerometers are displayed in Figure 16. In this paper, the sensor data of channel 1 are employed for analysis, and ten different working states are simulated by replacing different fault components.

In the experiment, the sampling frequency is 2048 Hz, the sampling time is 90 s and the driving wheel speed is 1200 r/min. For each working state, 60 samples with a length of 2048 are taken, with 40 samples as training samples and 20 samples as testing samples. The detailed fault information of the gearbox components is shown in Table 7. The components with various gear and bearing faults are shown in Figure 17, which includes single and compound faults of gears and bearings. The time-domain waveforms of the different states are shown in Figure 18, where the differences between the signals of each state can be observed.

Similar to experiment 1, the feature extraction capability of the models is further evaluated, and the results are shown in Table 11. It can be found that the four indexes of the HRCMFDE model are improved, and the model still has better comprehensive performance and stability. In addition, the HRCMFDE model performs better in the feature extraction of bearing faults than of gear faults.
For the compound gear train experimental platform, the results show that this method can effectively identify the faulty running states and provides a diagnosis method for the state monitoring of the platform, which offers a basis for other studies on the experimental platform.

Experiment 3: Fault Diagnosis of Planetary Gear Train Gearboxes
In experiment 1 and experiment 2, the proposed method is used to identify the states of two different types of gearboxes. The results are satisfactory, which demonstrates the application potential of the presented approach in gearbox fault diagnosis. In this experiment, the planetary gearbox data from Southeast University are taken as an example to further verify the effectiveness of the proposed method [42]. These data are collected from the Drivetrain Dynamic Simulator (DDS), a classic planetary gearbox fault simulation platform employed by many scholars to study fault diagnosis methods. A detailed description of this experiment is given in [42]. This paper uses the data of channel 2, and the working condition is 20 Hz-0 V. The different fault types of gears and bearings in the gearbox are shown in Table 12. In the experiment, 60 samples are selected for each group, with 40 samples as training samples and 20 samples as testing samples. The parameter determination process is consistent with that in experiment 1 and experiment 2.

The final parameters are set to k = 3, τ = 8, λ = 1, N = 2048, m = 3, c = 8, n = 113 and θ = 0.72. Using the fault diagnosis method proposed in Section 3.5, the final identification result is shown in Figure 23. It can be seen that, among 180 testing samples, only one sample is misidentified ("GS", wear on the gear surface, is identified as "GR"), and the overall accuracy reaches 99.44%. The comparison results of different methods are shown in Table 13 and Figure 24. In addition, the comparison of the recognition results of the feature extraction model under different classification methods is shown in Table 14. It can be seen that the proposed method still has higher diagnostic accuracy and more stable performance.

The HRCMFDE, RCMFDE and RCMDE models, which have better feature extraction performance, are compared, and the results are shown in Table 15. The proposed method can fully identify the testing samples of six different states, whereas the remaining methods cannot completely identify the testing samples of any state. Compared with the RCMFDE model, the P-means, R-means, Acc-means and F-means are increased by 4.70%, 4.89%, 1.09% and 4.89%, respectively. Once again, the superior feature extraction ability of the HRCMFDE model is highlighted.
This section uses signals from three different types of gearboxes to discuss the performance and generalization of the presented method in gearbox fault diagnosis. In experiment 1, the effectiveness and stability of the proposed method in feature extraction and sensitive information screening for single and compound faults are verified with the vibration signals of the reverted gear train gearbox. In experiment 2, the proposed method is applied to the constructed fault simulation platform, which shows that it still has excellent stability and diagnostic accuracy. In experiment 3, the proposed method is successfully used for planetary gearbox fault diagnosis. The results of the three experiments show that this method has excellent practicability and superior performance and can effectively identify single or compound faults in the gearbox. Meanwhile, the method possesses good generalization performance and is suitable for various structural types of gearboxes.

Conclusions
In this research, a novel fault diagnosis approach based on HRCMFDE, ReliefF and GWO-RELM is developed and applied to gearbox fault diagnosis. The effectiveness and superiority of the proposed method over existing methods are verified, and its generalization is discussed on various gearbox structures. The main conclusions of this paper can be summarized as follows:

(1) To address the problems of poor stability and insufficient feature information extraction, the HRCMFDE feature extraction method is proposed on the basis of existing techniques; it can effectively extract the fault features of gearbox vibration signals at different hierarchical layers and scales;

(2) ReliefF is utilized to screen the sensitive information from the high-dimensional features and remove redundant features, and GWO-RELM is used to identify the health conditions of the gearbox. By combining HRCMFDE, ReliefF and GWO-RELM, a novel gearbox fault diagnosis method is proposed. The method is verified on gearbox fault data sets. The results show that the proposed method has superior fault diagnosis performance and can accurately diagnose the different working states of gearbox bearings and gears. The proposed method has better diagnostic accuracy and stability than the existing RCMFDE, MFDE, RCMDE and MDE methods;

(3) The main structural types of gearboxes in practical applications have been tested. In each experiment, the proposed method effectively identifies the faulty running states and is successfully applied to the fault diagnosis of various gearboxes. The results show that this method has excellent practicability and generality, so it can be widely applied to gearbox fault diagnosis.

In this preliminary study, the proposed approach is satisfactory and promising in the health condition identification of the gearbox. Moreover, it can be extended to other

Figure 1. Four types of gear trains: (a) simple gear train, (b) compound gear train, (c) reverted gear train and (d) planetary gear train.

The main contributions of this paper can be summarized as follows: (1) A novel HRCMFDE method is employed to calculate the entropy values of the original gearbox vibration signals, extracting fault features over multiple scales and hierarchical levels; (2) A novel scheme for gearbox fault diagnosis is proposed based on HRCMFDE, ReliefF and GWO-RELM; (3) Experimental studies of the gearbox with single and compound failures are carried out. The results validate that the proposed method has a better detection ability than the four existing entropy-based approaches.

$\vec{X}$ indicates the position vector of a grey wolf. The vectors $\vec{A}$ and $\vec{C}$ are calculated as follows:
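In the standard grey wolf optimizer, these coefficient vectors are commonly written as A = 2a·r1 − a and C = 2·r2, where r1 and r2 are random vectors in [0, 1] and a decreases linearly from 2 to 0 over the iterations. The sketch below assumes that standard form, which may differ in notation from the equations of this paper; the function names are illustrative.

```python
import numpy as np

def gwo_coefficients(a, dim, rng):
    """Coefficient vectors of the standard GWO: A = 2*a*r1 - a, C = 2*r2."""
    r1 = rng.random(dim)
    r2 = rng.random(dim)
    A = 2.0 * a * r1 - a   # each component lies in [-a, a]
    C = 2.0 * r2           # each component lies in [0, 2)
    return A, C

def encircle_step(position, prey, a, rng):
    """One encircling update: D = |C*X_p - X|, X_new = X_p - A*D."""
    A, C = gwo_coefficients(a, position.size, rng)
    D = np.abs(C * prey - position)
    return prey - A * D

rng = np.random.default_rng(0)
new_position = encircle_step(np.zeros(2), np.array([1.0, -1.0]), a=2.0, rng=rng)
```

As a shrinks toward 0, the components of A shrink with it, so the wolves move from exploring around the prey to converging on it.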

Figure 4. The structure of the SLFN.

Assume a training dataset of input samples with their target outputs, a given activation function and k hidden nodes. The training steps of the ELM algorithm are as follows: (1) Randomly set the input weights and hidden layer biases, as in Equation (27);
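The ELM training steps, together with the regularization used by RELM, can be sketched as follows. This is a minimal illustration under stated assumptions: a sigmoid activation, uniformly random input weights and biases, and the common ridge-style closed form β = (HᵀH + I/C)⁻¹HᵀT for the output weights. The names `train_relm`, `predict_relm`, `n_hidden` and `reg` are hypothetical and do not come from the paper; in the proposed scheme the hyperparameters would be chosen by GWO rather than fixed by hand.

```python
import numpy as np

def _hidden_output(X, W, b):
    """Sigmoid hidden-layer output matrix H (assumed activation)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def train_relm(X, T, n_hidden, reg, rng):
    """Train a regularized ELM.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    reg: regularization coefficient C in beta = (H^T H + I/C)^-1 H^T T.
    """
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = _hidden_output(X, W, b)
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / reg, H.T @ T)
    return W, b, beta

def predict_relm(X, W, b, beta):
    """Network output; take argmax over columns for class labels."""
    return _hidden_output(X, W, b) @ beta
```

Only the output weights are learned, and they come from a single linear solve, which is why the hidden-node count and the regularization coefficient are natural targets for an outer optimizer.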

Figure 6. Flowchart of the proposed fault diagnosis method.

Mathematics 2022, 10, x FOR PEER REVIEW

(1) Collecting the vibration signals. The various fault states of gears and rolling bearings in the gearbox are collected by accelerometers;

Figure 7. The experimental platform and gearbox structure: (a) the experiment platform; (b) the structure of the gearbox.

Figure 8. Waveforms corresponding to different states.

Figure 9. Accelerometer installation locations.

Figure 10. Raw fault features corresponding to different states.

Figure 11. Sensitive fault features of different states.

Figure 12. Identification results of the proposed method.

Figure 13. The accuracy of different approaches.

Figure 14. Comparison of identification accuracy before and after ReliefF.

Figure 15. The experimental platform and gearbox structure.

Firstly, the best (m, c) of HRCMFDE must be selected. Based on the results in Table 8, m = 3 and c = 8 are selected; the other parameters are chosen as in experiment 1, so the final parameters are set to k = 3, τ = 8, λ = 1, N = 2048, m = 3 and c = 8. The high-dimensional fault features of the different states extracted by HRCMFDE are shown in Figure 19, and the low-dimensional features after feature reduction are displayed in Figure 20.

Figure 16. The internal structure of the gearbox and the installation positions of accelerometers.

Figure 17. The components with various faults.

Figure 18. Waveforms corresponding to different states.

Figure 19 .
Figure 19.Raw fault features corresponding to different states.

Figure 20 .
Figure 20.Sensitive fault features of different states.

Figure 19 .
Figure 19.Raw fault features corresponding to different states.

Figure 19 .
Figure 19.Raw fault features corresponding to different states.

Figure 20 .
Figure 20.Sensitive fault features of different states.Figure 20.Sensitive fault features of different states.

Figure 20 .
Figure 20.Sensitive fault features of different states.Figure 20.Sensitive fault features of different states.
Secondly, the sensitive feature vectors of training samples are inp RELM, and the final setting optimization parameters are set to n = 121, Then, the low-dimensional sensitive feature vectors obtained from different states are inputted into the optimized RELM model for trainin final identification result is shown in Figure21.It can be seen that, a samples, only one sample is misidentified ("GTB" is identified as "GW recognition accuracy reaches 99.38%.The comparison results of extraction models and classification methods are shown in Tables 9 and and the conclusions obtained are similar to that of experiment 1. (a) GWO-RELM output.(b) Confusion matrix (%).

Figure 21. Identification results of the proposed method.
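
The classification step described above, in which sensitive feature vectors are fed to a regularized extreme learning machine, can be illustrated with a minimal sketch in plain Python. It assumes a sigmoid hidden layer with uniformly random input weights and the commonly used closed-form output weights β = (HᵀH + I/C)⁻¹HᵀT. The function names (`train_relm`, `predict`), the toy data and the values of `n_hidden` and `C` are hypothetical, and the grey wolf search over the RELM parameters is omitted, so this is not the exact model used in the experiments.

```python
import math
import random


def hidden_layer(X, W, b):
    """Sigmoid hidden-layer outputs H, one row per sample."""
    return [[1.0 / (1.0 + math.exp(-(sum(x * w for x, w in zip(row, W[j])) + b[j])))
             for j in range(len(W))] for row in X]


def solve(A, B):
    """Solve A @ Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [[M[i][n + k] / M[i][i] for k in range(len(B[0]))] for i in range(n)]


def train_relm(X, labels, n_classes, n_hidden, C, seed=0):
    """Fit output weights beta = (H^T H + I/C)^-1 H^T T (assumed RELM form)."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    H = hidden_layer(X, W, b)
    # One-hot target matrix T, one row per training sample.
    T = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]
    A = [[sum(h[i] * h[j] for h in H) + (1.0 / C if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    B = [[sum(h[i] * t[k] for h, t in zip(H, T)) for k in range(n_classes)]
         for i in range(n_hidden)]
    return W, b, solve(A, B)


def predict(X, model):
    """Return the index of the largest output score for each sample."""
    W, b, beta = model
    H = hidden_layer(X, W, b)
    scores = [[sum(h[i] * beta[i][k] for i in range(len(beta)))
               for k in range(len(beta[0]))] for h in H]
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


# Hypothetical two-class toy data standing in for the sensitive features.
X_toy = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
y_toy = [0, 0, 1, 1]
model = train_relm(X_toy, y_toy, n_classes=2, n_hidden=10, C=1000.0)
print(predict(X_toy, model))
```

In this formulation the only trained quantity is `beta`. The number of hidden nodes and the regularization coefficient `C` are the kind of parameters an outer optimizer such as GWO would tune.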

Figure 22. The accuracy of different feature extraction approaches.

Figure 23. Identification results of the proposed method.

Figure 24. The accuracy of different approaches.

Table 1. Detailed information on different working statuses.

Table 2. The AED value under different (m, c) combinations.

Table 3. The performance comparison between different feature extraction models.

HRCMFDE has the best stability and clear advantages. The reason is that refined composite multiscale entropy analyzes only the low-frequency components of the signal and often ignores the high-frequency components, which considerably limits its feature extraction performance; (2) The MFDE model has the fastest calculation speed, but its diagnostic accuracy is insufficient, and its large variance indicates a lack of stability. Although the HRCMFDE model has the lowest computational efficiency, it is still acceptable in practical applications. Ultimately, HRCMFDE has the highest accuracy because it extracts information across the range from low to high frequencies, so the feature information it contains is the richest. Hence, it has the best stability and the highest diagnostic accuracy, at the cost of a longer calculation time. The above analysis shows that the HRCMFDE model proposed in this paper has clear advantages in the separability and stability of features and can be effectively applied in gearbox fault diagnosis.

Table 4. The performance comparison between different methods without ReliefF.

Table 5. Comparison of models under different classification methods.

Table 6. Comparison of different models.

[...] states in Table 1 and OM is the overall mean.

Compared with the RCMDE model, the RCMFDE model has better feature extraction performance and a higher overall score. Compared with the RCMFDE model, the P-means, R-means, Acc-means and F-means of the HRCMFDE model are increased by 3.89%, 4.16%, 1.04% and 4.17%, respectively. This shows that the HRCMFDE model has superior performance, with higher accuracy and stability.
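
To make the four scores above concrete, the following sketch computes per-class precision (P), recall (R), accuracy (Acc) and F-score from a confusion matrix and then averages them over the classes. The one-vs-rest definitions, the unweighted averaging and the 3-class matrix `cm_example` are assumptions for illustration only; the paper's exact definitions and data are not reproduced here.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns one (precision, recall, accuracy, f_score) tuple per class,
    each computed one-vs-rest (an assumed definition).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rows = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        acc = (tp + tn) / total
        rows.append((p, r, acc, f))
    return rows


def overall_means(rows):
    """Unweighted mean of each metric over all classes."""
    n = len(rows)
    return tuple(sum(row[i] for row in rows) / n for i in range(4))


# Hypothetical confusion matrix: one class-1 sample is predicted as class 0.
cm_example = [[20, 0, 0],
              [1, 19, 0],
              [0, 0, 20]]
p_mean, r_mean, acc_mean, f_mean = overall_means(per_class_metrics(cm_example))
print(p_mean, r_mean, acc_mean, f_mean)
```

Under these assumed definitions, the one-vs-rest accuracy counts true negatives, so a single error lowers the Acc mean less than it lowers the P, R and F means. That is one possible reading of why the Acc-means change reported above is smaller than the others.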

Table 7. The detailed fault information of gearbox components.

Table 8. The AED value under different (m, c) combinations.

Table 9. The performance comparison between different feature extraction models.

Table 10. Comparison of models under different classification methods.

Table 11. Comparison of different models.

Table 12. The detailed fault information of gearbox components.

Table 13. The performance comparison between different methods.

Table 14. Comparison of models under different classification methods.

The HRCMFDE, RCMFDE and RCMDE models, which have the better feature extraction performance, are compared, and the results are shown in Table 15. The proposed method correctly identifies all testing samples of the six different states, whereas none of the remaining methods identifies all testing samples of any state. Compared with the RCMFDE model, the P-means, R-means, Acc-means and F-means are increased by 4.70%, 4.89%, 1.09% and 4.89%, respectively. This again highlights the superior feature extraction ability of the HRCMFDE model.

Table 15. Comparison of different models.