Research on Multi-Domain Fault Diagnosis of Gearbox of Wind Turbine Based on Adaptive Variational Mode Decomposition and Extreme Learning Machine Algorithms

Since variational mode decomposition (VMD) was proposed, it has been widely used in condition monitoring and fault diagnosis of mechanical equipment. However, the parameters K and α in the VMD algorithm need to be set before decomposition, which causes VMD to be unable to decompose adaptively and obtain the best result for signal decomposition. Therefore, this paper optimizes the VMD algorithm. On this basis, this paper also proposes a method of multi-domain feature extraction of signals and combines an extreme learning machine (ELM) to realize comprehensive and accurate fault diagnosis. First, VMD is optimized according to the improved grey wolf optimizer; second, the feature vectors of the time, frequency, and time-frequency domains are calculated, which are synthesized after dimensionality reduction; ultimately, the synthesized vectors are input into the ELM for training and classification. The experimental results show that the proposed method can decompose the signal adaptively, which produces the best decomposition parameters and results. Moreover, this method can extract the fault features of the signal more completely to realize accurate fault identification.


Introduction
With the increasing depletion of traditional energies, wind energy, a clean energy, has been widely considered. Because of the instability and randomness of wind speeds in addition to the poor operating environment, the speed and load of a wind-based unit continuously change, which results in a high failure rate for the unit. It is difficult to find faults in the early stages, and once a fault truly occurs, the fault leads to very large economic losses. Therefore, it is urgent to study the early failure of the gearbox of a wind turbine and determine the cause of the failure because these characteristics are of great significance to ensuring the safe and stable operation of the wind turbine and avoiding catastrophic accidents.
Fault diagnosis generally comprises signal acquisition, feature extraction, pattern recognition, and diagnostic decision making. The specific process is shown in Figure 1. The gearbox is an important part of the wind turbine, and its signals are non-linear and non-stationary [1]. Therefore, signal processing methods are needed to preprocess the signal before feature extraction. Time-frequency methods such as the short-time Fourier transform (STFT), Wigner-Ville distribution (WVD), and discrete wavelet transform (DWT) have their own limitations, and it is difficult for them to achieve satisfactory results in actual signal analysis [2][3][4][5]. EMD-based signal processing methods are currently the mainstream. In reference [6], a fault feature extraction method based on local mean decomposition (LMD) and multi-scale entropy is proposed: with LMD as preprocessing, the non-stationary vibration signal of a rolling bearing is decomposed into several product functions, and their multi-scale entropy is taken as the feature vector. A feature extraction method based on the energy entropy of empirical mode decomposition (EMD) is proposed in reference [7]. In reference [8], the vibration signal is decomposed into a set of intrinsic mode functions (IMFs) by ensemble empirical mode decomposition (EEMD). The permutation entropy (PE) values of the first few IMFs are calculated as feature vectors, and the distance between clusters in the feature space is used to optimize a support vector machine (SVM) that classifies fault type and severity. However, these methods only extract the time-frequency characteristic information of the vibration signal, and the extracted information is often not comprehensive enough. Reference [9] decomposes vibration signals into IMFs through EMD, extracts 13 time-domain and 16 frequency-domain feature parameters, and inputs them into an SVM for fault diagnosis.
However, research on multi-domain feature extraction still has deficiencies. For example, traditional signal processing methods often give unsatisfactory results, and multi-domain feature vectors, while extracting features comprehensively, also introduce information redundancy and make fault diagnosis more difficult. EMD is an adaptive signal decomposition method that decomposes the signal into a series of intrinsic mode functions according to the local scale characteristics of the signal itself, revealing its internal characteristics [10][11][12]. In reference [13], a local mean decomposition (LMD) method is proposed that overcomes the over-envelope and under-envelope problems of EMD and has weaker end effects and fewer iterations. However, LMD also suffers from modal mixing. To suppress these problems, ensemble empirical mode decomposition (EEMD) is proposed in reference [14]. Nevertheless, recursive decomposition methods such as EMD all face modal mixing, and the fundamental reason is the limitation of the recursive decomposition principle. Therefore, a new method is needed to solve this problem fundamentally. VMD is a new method that is completely different from recursive mode decomposition [15]. VMD is framed as the solution of a variational problem, has a solid theoretical basis, and can overcome modal mixing. However, before VMD is used for signal decomposition, the decomposition scale K and penalty factor α need to be set in advance. To determine these two parameters, reference [16] uses the central frequency observation method, while references [17,18] set the parameter values according to manual experience. However, these methods do not fundamentally solve the problem of parameter determination.
Recently, intelligent algorithms such as particle swarm optimization (PSO) have been used to optimize VMD parameters with good results [19][20][21]. Pattern recognition is the second key step in fault diagnosis. Compared with traditional pattern recognition algorithms such as the back propagation (BP) neural network and SVM, the extreme learning machine (ELM) has the advantages of fast training speed, high learning efficiency, and strong robustness [16,22,23]. Based on this research, this paper applies an improved grey wolf optimizer (GWO) algorithm [24,25] to VMD parameter optimization to achieve better adaptive VMD decomposition. In addition, combined with the ELM, this paper proposes a multi-domain fault diagnosis method and applies it to the fault diagnosis of the rolling bearing of the gearbox.

Variational Mode Decomposition (VMD)
Unlike recursive modal decomposition methods such as EMD, VMD is a new method for signal decomposition and estimation with better time-frequency distribution. The VMD method can adaptively divide the frequency domain of the signal and effectively separate the components to obtain a series of intrinsic mode functions (IMFs) with sparse characteristics [26]. In essence, the VMD algorithm transforms the decomposition process of the signal into the solution process of the variational problem. Therefore, the algorithm can be divided into two parts, that is, the construction and solution of the variational problem.
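(a) The construction of the variational problem
First, to obtain the one-sided spectrum of each IMF component $u_k(t)$, the Hilbert transform is applied to each component to form its analytic signal:

$$\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \tag{1}$$

where $\delta(t)$ is the impulse function, $j$ is the imaginary unit, and $u_k(t)$ is the k-th decomposed IMF component. Second, the spectrum of each mode is shifted to the corresponding baseband, that is, the analytic signal of the IMF component is multiplied by an exponential tuned to its estimated center frequency:

$$\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \tag{2}$$

Finally, the bandwidth of each IMF is estimated from the squared L2 norm of the gradient of the demodulated signal. The variational problem can be expressed as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{3}$$

where $\omega_k$ is the center frequency of the k-th IMF component $u_k$ and $f$ is the original input signal.
(b) The solution of the variational problem
First, the quadratic penalty factor $\alpha$ and the Lagrange multiplier $\lambda(t)$ are introduced to transform the constrained variational problem into an unconstrained problem as follows:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{4}$$

Second, the alternating direction method of multipliers (ADMM) is used to solve the variational problem [27].
Finally, the update formulas of the IMF component $u_k$ and its center frequency $\omega_k$ are obtained in the frequency domain:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha(\omega - \omega_k)^2} \tag{5}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k(\omega)\right|^2 d\omega} \tag{6}$$

Before the VMD decomposition of the signal, four parameters need to be set manually: the decomposition scale K, the penalty factor α, the noise tolerance τ, and the discrimination accuracy ε. The noise tolerance and discrimination accuracy have little influence on the decomposition results of VMD, and the standard default values are usually adopted [19]. K and α are the two important parameters of VMD decomposition [28]. In this paper, an improved GWO algorithm is selected to adaptively determine these two parameters and obtain the best combination of decomposition parameters. See Section 3 for details.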
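The iterative solution described above can be sketched in a few lines of NumPy. This is a minimal, simplified illustration and not the implementation used in this paper: it works directly on the one-sided spectrum, omits the mirror extension of the signal that common VMD implementations apply, and the function name `vmd_sketch`, the uniform initialization of the center frequencies, and the defaults for `tau`, `tol`, and `max_iter` are assumptions made for the example.

```python
import numpy as np

def vmd_sketch(f, K, alpha, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD. Returns (modes, center frequencies in cycles/sample)."""
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.fftfreq(N)                  # normalized frequency axis
    pos = freqs >= 0                           # keep the one-sided spectrum only
    f_hat = np.where(pos, np.fft.fft(f), 0.0)

    u_hat = np.zeros((K, N), dtype=complex)    # mode spectra
    omega = 0.5 * np.arange(K) / K             # assumed uniform initialization
    lam = np.zeros(N, dtype=complex)           # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Mode update: Wiener-like filter centered at omega[k]
            u_hat[k] = (f_hat - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k, ~pos] = 0.0
            # Center-frequency update: power-weighted mean frequency
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / np.sum(power)
        # Dual ascent on the reconstruction constraint (tau = 0 disables it)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / N
        if change < tol:
            break

    # Real-valued modes from the one-sided spectra (the DC bin is doubled,
    # which is acceptable for zero-mean vibration signals)
    modes = 2 * np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

For a signal sampled at `fs` Hz, multiplying the returned `omega` by `fs` gives the center frequencies in Hz. The quality of the result depends strongly on `K` and `alpha`, which is the motivation for optimizing them.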

Fast Grey Wolf Optimizer (F-GWO)
The grey wolf optimizer (GWO) is inspired by the hunting behavior of wolves and is an optimization algorithm derived from the grey wolf's population mechanism and predatory behavior. The optimal solution can be obtained through continuous iteration. The basic process of this algorithm consists of the following three parts.
(a) Rank Assignment
GWO imitates the hierarchy of grey wolves and divides them into four grades, (α, β, δ, ω). In the process of optimization, ω is responsible for the path search, and (α, β, δ) are responsible for the guidance of the path search (optimization process). According to the rank assignment, α obtains the optimal solution, β obtains the suboptimal solution, and δ obtains the general solution.
(b) Target Encircling
The ω grey wolves find the best way to the target in all directions. To avoid falling into a local optimum, the ω grey wolves need to traverse the whole path.
(c) Target Attacking
At this stage, (α, β, δ) give instructions to guide ω to move and gradually shorten the distance between wolves and prey. Then, the location is updated, and hunting is realized.
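The specific steps and mathematical modeling of the GWO algorithm are as follows:
Step 1: Initialize the parameters, set the number of grey wolves ω, number of iterations, etc.
Step 2: According to the position of each wolf after initialization, calculate the distance between each wolf and its prey, that is, the solution and fitness value. Then, the optimal solution, suboptimal solution and general solution are assigned to α, β and δ according to the fitness value.
Step 3: According to the location of the prey and the distance between a grey wolf and the prey, change the search direction of the other grey wolves and update their location, that is, update the solution. This step corresponds to target encircling and is modeled by Formulas (7) and (8):

$$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \tag{7}$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{8}$$

where $\vec{X}_p$ is the position of the prey, $\vec{X}$ is the position of a grey wolf, $\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}$ and $\vec{C} = 2\vec{r}_2$ are coefficient vectors, $\vec{r}_1$ and $\vec{r}_2$ are random vectors in [0, 1], and $\vec{a}$ is the convergence factor.
Step 4: Through continued iteration, the grey wolves update their position around the prey, and the position of wolf α is finally output, that is, the best position is obtained. This step corresponds to the final target attack and is modeled by Formulas (9)-(11):

$$\vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|, \quad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|, \quad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right| \tag{9}$$

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \tag{10}$$

$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{11}$$

where $\vec{D}_{\alpha,\beta,\delta}$ denotes the distances between the first three grey wolves and their prey, $\vec{X}_{\alpha,\beta,\delta}$ denotes the positions of the first three grey wolves, and $\vec{X}(t+1)$ denotes the positions of the other grey wolves updated by the positions of α, β and δ.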
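The steps above can be sketched as follows. This is a minimal illustration of the standard GWO with a linearly decreasing convergence factor, not the fast variant adopted later in this paper. The function name `gwo_minimize`, the default population size and iteration count, the clipping to the search bounds, and the choice to re-rank α, β and δ from the current population in every iteration are assumptions made for the example.

```python
import numpy as np

def gwo_minimize(fitness, lower, upper, n_wolves=30, n_iter=200, seed=0):
    """Minimize `fitness` over the box [lower, upper] with a basic GWO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Step 1: random initial positions of the pack
    X = rng.uniform(lower, upper, size=(n_wolves, lower.size))
    best_x, best_f = None, np.inf

    for t in range(n_iter):
        # Step 2: evaluate fitness and rank alpha, beta, delta
        fit = np.array([fitness(x) for x in X])
        order = np.argsort(fit)
        if fit[order[0]] < best_f:
            best_x, best_f = X[order[0]].copy(), float(fit[order[0]])
        leaders = X[order[:3]].copy()

        # Convergence factor decreases linearly from 2 to 0
        a = 2.0 * (1.0 - t / n_iter)

        # Steps 3 and 4: move every wolf toward the three leaders
        candidates = np.zeros_like(X)
        for leader in leaders:
            A = 2 * a * rng.random(X.shape) - a
            C = 2 * rng.random(X.shape)
            D = np.abs(C * leader - X)
            candidates += leader - A * D
        X = np.clip(candidates / 3.0, lower, upper)

    return best_x, best_f
```

When the optimizer is applied to VMD, each position vector would hold (K, α) and `fitness` would run the decomposition and score it. K has to be rounded to an integer inside `fitness`, which is a design choice of this sketch rather than something stated in the text.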
Although GWO performs much better than the PSO and differential evolution (DE) algorithms, it converges slowly and has weak global search ability. To solve this problem, the fast GWO (F-GWO) algorithm [25], which improves the convergence factor and introduces a dynamic weight strategy, is used to optimize the VMD parameters in this paper.
Because the convergence process of GWO is nonlinear while the convergence factor $\vec{a}$ decreases linearly from 2 to 0 with the number of iterations, F-GWO replaces the linear schedule with a nonlinear convergence factor. To keep GWO from easily falling into a local optimum, proportional weights ω1, ω2 and ω3 based on the modulus of the guiding position vectors are introduced. They represent the learning rates of the ω grey wolves with respect to wolves α, β and δ, respectively, and they dynamically balance the global search ability of GWO. Finally, the position update of the grey wolves is optimized accordingly.

Adaptive VMD Algorithm
Energies 2020, 13, 1375
In this section, F-GWO was introduced to optimize the parameters of VMD. The fitness value is the core of the optimization algorithm. Because entropy reflects uncertainty, information entropy is an effective index for judging signal sparseness [29]. In this paper, the minimum average envelope entropy (MAEE) was selected as the fitness value of VMD optimization, which is expressed as follows:

$$(\hat{K}, \hat{\alpha}) = \arg\min_{(K,\alpha)} \left\{ \frac{1}{K} \sum_{i=1}^{K} H_{en}(i) \right\} \tag{15}$$

where $(\hat{K}, \hat{\alpha})$ is the optimal combination of K and α, and $H_{en}(i)$ is the envelope entropy of the ith IMF component, calculated by Formulas (16) and (17):

$$H_{en}(i) = -\sum_{n=1}^{N} p_i(n) \log p_i(n) \tag{16}$$

$$p_i(n) = \frac{b_i(n)}{\sum_{n=1}^{N} b_i(n)} \tag{17}$$

In the above formulas, $b_i(n)$ is the envelope of the ith IMF component, N is the number of sampling points, and $p_i$ is the normalized form of the envelope of the ith IMF component.
Based on the theoretical basis in the previous section and MAEE, the algorithm flow is shown in Figure 2.
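The average envelope entropy used as the fitness value can be sketched as follows. This is a minimal illustration under stated assumptions: the envelope is taken as the magnitude of an FFT-based analytic signal, the logarithm base defaults to 2 (the base is not specified in the text), and the function names `envelope`, `envelope_entropy`, and `mean_envelope_entropy` are hypothetical.

```python
import numpy as np

def envelope(x):
    """Envelope of a real signal via an FFT-based analytic signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    # Zero the negative frequencies and double the positive ones
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def envelope_entropy(imf, base=2.0):
    """Shannon entropy of the normalized envelope of one IMF."""
    b = envelope(imf)
    p = b / b.sum()
    p = p[p > 0]                      # skip zero terms, 0 * log(0) := 0
    return float(-np.sum(p * np.log(p)) / np.log(base))

def mean_envelope_entropy(imfs, base=2.0):
    """Average envelope entropy over all IMFs of one decomposition."""
    return float(np.mean([envelope_entropy(u, base) for u in imfs]))
```

A pure tone has a flat envelope and therefore the maximum entropy, log(N), while an impulsive component concentrates its envelope and scores lower. An optimizer would evaluate `mean_envelope_entropy` for each candidate (K, α) and keep the pair with the smallest value.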

Extraction of Multi-Domain Fault Feature
When an equipment fault occurs, its time and frequency domains often have corresponding feature changes. In this paper, the time-domain, frequency-domain and time-frequency-domain were synthesized to comprehensively extract fault features and realize the extraction of multi-domain fault features. According to the adaptive VMD (AVMD) proposed in Section 3.1, the original signal was decomposed to obtain the time-domain information of K IMF components, and the feature vector of the time domain, composed of K singular values, was obtained by singular value decomposition of the matrix formed by the components.
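The singular-value feature vector described above can be sketched as follows. This is a minimal illustration in which the K IMF components are stacked as the rows of a K × N matrix. The function names are hypothetical, and applying the same step to the amplitude spectra of the IMFs for the frequency-domain features is an assumption of this sketch.

```python
import numpy as np

def singular_value_features(imfs):
    """Singular values of the K x N matrix whose rows are the IMFs."""
    matrix = np.atleast_2d(np.asarray(imfs, dtype=float))
    # compute_uv=False returns only the singular values, largest first
    return np.linalg.svd(matrix, compute_uv=False)

def spectrum_singular_value_features(imfs):
    """Same feature, computed on the amplitude spectrum of each IMF."""
    matrix = np.atleast_2d(np.asarray(imfs, dtype=float))
    spectra = np.abs(np.fft.rfft(matrix, axis=1))
    return np.linalg.svd(spectra, compute_uv=False)
```

Both functions return K values when N is at least K, so the length of the feature vector follows the decomposition scale chosen by the optimizer.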
When different signals were decomposed, the number of components K differed, so the time-domain feature vectors had different dimensions. To solve this problem, principal component analysis (PCA) was introduced, which helps analyze multi-dimensional data and reduce their dimension. The PCA used in this paper has the following steps (for m samples of n-dimensional data):
Step 1: Arrange the original data as the columns of the n × m matrix X;
Step 2: Zero-center each row of X, that is, subtract the mean of each row;
Step 3: Compute the covariance matrix;
Step 4: Compute the eigenvalues and eigenvectors of the covariance matrix;
Step 5: Stack the eigenvectors as the rows of a matrix, from top to bottom in order of decreasing eigenvalue, and take the first k rows to form the matrix P;
Step 6: Compute Y = PX, which reduces the m n-dimensional samples to k dimensions.
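The six PCA steps can be sketched as follows. This is a minimal illustration in which the function name `pca_reduce` and the returned contribution rates are assumptions made for the example. How the feature vectors of this paper are arranged into X is not specified in the text, so the sketch takes any n × m matrix.

```python
import numpy as np

def pca_reduce(X, k):
    """PCA on an n x m matrix (n features, m samples).

    Returns the k x m projected data Y, the k x n projection matrix P,
    and the contribution rate of every principal component.
    """
    X = np.asarray(X, dtype=float)
    # Step 2: zero-center each row
    Xc = X - X.mean(axis=1, keepdims=True)
    # Step 3: covariance matrix (n x n)
    cov = Xc @ Xc.T / X.shape[1]
    # Step 4: eigen-decomposition of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 5: sort by decreasing eigenvalue, keep the first k as rows of P
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :k].T
    # Step 6: project the centered data
    Y = P @ Xc
    return Y, P, eigvals / eigvals.sum()
```

The contribution rates returned last are what a contribution-rate plot displays, and they guide the choice of k.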
In this way, information redundancy can be reduced, and the problem of different dimension of K for the feature vector can be solved. The complexity of IMFs decomposed from vibration signals with different faults was different. The greater the complexity was, the greater the uncertainty and the greater the entropy were. In particular, for some specific faults in the gearbox, because fault information was usually concentrated in a sensitive frequency band, once a fault occurred, the complexity in the sensitive frequency band changed accordingly. In reference [30], a method, permutation entropy (PE), was proposed to detect the randomness and dynamic mutation of a time series. Compared with approximate entropy and sample entropy, PE had a faster calculation speed and stronger anti-interference ability. Based on this, the time-frequency characteristics of the signal can be reflected by PE. The specific implementation method was to decompose the vibration signal through the adaptive VMD algorithm. Each decomposed IMF component contained the characteristic information of different frequency bands from the original vibration signal, which can better reflect the local characteristics of the signal. Therefore, the PE of each IMF component can be calculated to form the feature vector in the time-frequency domain.
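Permutation entropy itself is short enough to sketch in plain Python. The following is a minimal illustration of the ordinal-pattern definition. The function name, the default embedding order and delay, and the normalization by log2(order!) are assumptions made for the example, since the text does not state the settings used.

```python
import math
from collections import Counter

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of a sequence from its ordinal patterns."""
    if order < 2:
        raise ValueError("order must be at least 2")
    n = len(x) - (order - 1) * delay
    if n <= 0:
        raise ValueError("series too short for the chosen order and delay")
    patterns = Counter()
    for i in range(n):
        window = [x[i + j * delay] for j in range(order)]
        # Ordinal pattern: indices that sort the window (ties keep their order)
        pattern = tuple(sorted(range(order), key=window.__getitem__))
        patterns[pattern] += 1
    h = -sum((c / n) * math.log2(c / n) for c in patterns.values())
    if normalize:
        h /= math.log2(math.factorial(order))
    return abs(h)  # abs() avoids returning -0.0 for a single pattern
```

Applying `permutation_entropy` to each of the K IMF components yields the K-dimensional time-frequency feature vector described above. A monotonic series contains a single ordinal pattern and has an entropy of zero, while irregular series approach 1 after normalization.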

Method of Multi-Domain Fault Diagnosis
The framework of the method is shown in Figure 3.

Simulation Verification
According to the research [31], the problem of modal mixing in a series of recursive decomposition methods, such as EMD, was usually caused by the interference of abnormal signals, such as noise signal and intermittent signal. Therefore, the simulated signal of Formulas (18) and (19) was established in MATLAB to verify the proposed FGWO-VMD method. The simulation time was 1 s, and the sampling rate was 1000 Hz.
In these formulas, $f_1(t)$ and $f_2(t)$ are sine and cosine signals with amplitudes of 5 and 4 and frequencies of 50 Hz and 100 Hz, respectively; $f_3(t)$ is a high-frequency intermittent signal; and $f_4(t)$ is white noise with a mean of zero and a variance of 4. The four signals were superimposed to form the signal $f(t)$. Figure 4 shows the simulated signal.
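A signal of this kind can be generated as follows. This is only a sketch under stated assumptions: the amplitudes and frequencies of the first two components, the noise variance, the duration, and the sampling rate follow the description above, and the 300 Hz frequency of the intermittent component is taken from the decomposition results reported later in this section. The amplitude of the intermittent component, the interval in which it is active, the assignment of sine and cosine, and the function name `simulated_signal` are hypothetical.

```python
import numpy as np

def simulated_signal(fs=1000, duration=1.0, seed=0):
    """Return (t, f, components) for a four-component test signal."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    f1 = 5 * np.sin(2 * np.pi * 50 * t)          # 50 Hz, amplitude 5
    f2 = 4 * np.cos(2 * np.pi * 100 * t)         # 100 Hz, amplitude 4
    # Hypothetical burst: amplitude 3, active between 0.4 s and 0.6 s
    burst = (t >= 0.4) & (t < 0.6)
    f3 = np.where(burst, 3 * np.sin(2 * np.pi * 300 * t), 0.0)
    f4 = rng.normal(0.0, 2.0, t.size)            # std 2, that is, variance 4
    return t, f1 + f2 + f3 + f4, (f1, f2, f3, f4)
```

The individual components are returned alongside the sum so that a decomposition result can be compared against them.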
First, EMD, LMD and EEMD were used to decompose the signal $f(t)$. The decomposition results and their spectra are shown in Figures 5-7, respectively.
It can be seen from Figures 5 and 6 that EMD decomposed the signal into 9 components, although the actual signal was composed of 3 components and noise, and there was serious modal mixing. LMD decomposed the signal into 5 components, and IMF1 and IMF2 contained 2 different center frequencies. In Figure 7, EEMD decomposed the signal into 6 components and alleviated modal mixing. However, the spectrum of IMF4 shows that EEMD still did not eliminate it. Moreover, all 3 methods over-decomposed the signal.
Next, the VMD method optimized in this paper was used to decompose the signal. Since signal $f(t)$ is composed of 3 signals and white noise, the decomposition parameter K of VMD should be 3.
Then, the original GWO algorithm and the FGWO algorithm were used to optimize the VMD decomposition parameters. The number of grey wolves in both GWO and FGWO was set to 100, and the maximum number of iterations was set to 10. The decomposition parameters of VMD formed the position vector of the grey wolves. Figure 8 shows the convergence curves of the 2 optimization algorithms.
It can be seen from the convergence curves in Figure 8 that the fitness value of GWO reached its minimum of 9.721 in the 5th iteration, while that of FGWO converged to its minimum of 9.72 in the 2nd iteration. Therefore, both FGWO and GWO can converge to the global minimum, and FGWO does so in fewer iterations. The position vector corresponding to the MAEE was (3, 1126), which was used to decompose the simulation signal $f(t)$. The IMF components and their spectra are shown in Figure 9.
Under the optimal parameters obtained from FGWO, VMD decomposed the simulation signal into 3 IMF components with center frequencies of 50 Hz, 100 Hz, and 300 Hz; that is, the simulation signal $f(t)$ was effectively decomposed into $f_1(t)$, $f_2(t)$, and $f_3(t)$. The problem of modal mixing was eliminated. Moreover, VMD removed some of the interference from the white noise $f_4(t)$.
It can be seen that VMD has great advantages in signal processing. In this paper, FGWO was introduced to optimize VMD so that VMD can adaptively determine the optimal decomposition parameters and obtain a better decomposition result.

Experimental Analysis
In this section, the proposed method was applied to process and analyze real laboratory data from the Bearing Data Center of Case Western Reserve University, as shown in Figure 10. Table 1 shows the MATLAB data information collected under the normal bearing. Vibration data were collected through accelerometers, which were attached to the housing with magnetic bases. In addition, signals were collected through a 16-channel DAT recorder and were post-processed in a MATLAB environment.
The data processed and analyzed in this section were the bearing data from the drive end. The following 3 cases of fault diagnosis were studied: (A) different fault types; (B) the same fault under different working conditions; (C) the same fault with different fault degrees. The specific fault data information is shown in Table 2. Each type of data was divided into 50 groups, with 2048 sampling points in each group. The first 30 groups were used for training, and the last 20 groups were used for testing.

Diagnosis of Different Faults
The data used in this subsection are shown in A of Table 2. First, FGWO-VMD decomposition and FFT were performed on the signal to obtain the time and frequency domain information of K IMF components. Taking the inner race fault as an example, the best decomposition parameter combination of VMD optimized by FGWO was (4,2000), and the time-domain and frequency-domain information is shown in Figure 11. VMD decomposed the signal into K simple and easy-to-analyze IMF components and eliminated some noise interference. Each IMF component can jointly represent the characteristics of the original signal.
In Figure 11, the time-domain information of the IMFs was decomposed into singular values, and the vector T1 composed of K singular values was obtained. PCA was performed on T1, as shown in Figure 12. It can be seen from Figure 12 that the contribution rate of the first feature value of T1 after PCA was very high, so it was selected as the feature value $t_1$ of the time-domain information to realize the dimensionality reduction of the time-domain feature vector. In the same way, the frequency-domain vector T2 composed of K singular values and the time-frequency-domain vector T3 composed of K PE values were obtained, and PCA was carried out on them, as shown in Figures 13 and 14. From Figure 13, the first feature value of T2 after PCA was selected as the feature value $t_2$ of the frequency-domain information to complete the dimensionality reduction of the frequency-domain feature vector. It can be seen from Figure 14 that after the feature vector of the time-frequency domain was processed by PCA, the contribution rate of the first 2 values reached 80%. Therefore, these values were selected as the feature values $t_3$ and $t_4$ to represent the characteristics of the signal in the time-frequency domain.
With the multi-domain feature extraction and dimensionality reduction of the signal complete, the time-domain, frequency-domain, and time-frequency-domain feature values were synthesized to form the feature vector $T = [t_1, t_2, t_3, t_4]$, which can fully represent the characteristics of the signal.
Finally, according to this multi-domain feature extraction method, the feature vectors of the 50 groups of signals of each fault were extracted, and a total of 200 feature vectors were obtained. A total of 120 feature vectors were randomly selected to train the ELM, and then 80 feature vectors were randomly selected as the signals to be tested and input into the trained ELM. The feature vectors can then be classified to realize the diagnosis of different faults. The results of the diagnosis are shown in Figure 15.
In Figure 15, 1 represents normal, 2 represents the inner race fault, 3 represents the ball fault, and 4 represents the outer race fault. The training accuracy of the ELM was 95%. When fault diagnosis was carried out with the trained ELM, the accuracy was as high as 98.75%. Only one error occurred in 80 test samples, namely an inner race fault that was misjudged as a ball fault.
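The training and classification step can be sketched with a basic single-hidden-layer ELM. This is a minimal illustration of the general technique and not the configuration used in this paper: the class name `ELMClassifier`, the sigmoid activation, the number of hidden nodes, and the uniform range of the random weights are assumptions made for the example.

```python
import numpy as np

class ELMClassifier:
    """Basic extreme learning machine for multi-class classification."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the randomly weighted hidden layer
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and biases are random and never trained
        self.W = self.rng.uniform(-1, 1, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1, 1, size=self.n_hidden)
        # Output weights by least squares (Moore-Penrose pseudoinverse)
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        H = self._hidden(np.asarray(X, dtype=float))
        return self.classes_[np.argmax(H @ self.beta, axis=1)]
```

Because only the output weights are solved for, and in closed form, training amounts to one pseudoinverse, which accounts for the fast training speed of the ELM. In this setting, `X` would hold the four-dimensional feature vectors and `y` the class labels 1 to 4.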

Diagnosis under Different Working Conditions
The data used in this subsection are shown in B of Table 2. With the method in Section 5.1 applied to the same inner race fault under different working conditions, the results of diagnosis are shown in Figure 16. As shown in Figure 16, for the same fault under different working conditions, the method proposed in this paper was applied to feature extraction and classification, and only 3 misjudgments appeared in 80 test samples. The result was still good, with a diagnostic accuracy of 96.25%. This result showed that the adaptive VMD decomposed the signal well. Combined with the subsequent multi-domain feature extraction, the original signal features were extracted comprehensively, thus increasing the accuracy of diagnosis.

Diagnosis of Different Fault Degrees
The data used in this subsection are shown in C of Table 2. The steps in Section 5.1 were followed to obtain the diagnosis results for different fault degrees, as shown in Figure 17. It can be seen from Figure 17 that, although the classification effect was not as good as in cases A and B, the diagnostic accuracy for the same fault of different degrees was 90%.
To further illustrate the effectiveness of the proposed method, EEMD was combined with 4 pattern recognition methods, namely SVM, the genetic algorithm back propagation (GA-BP) neural network, ELM, and deep convolutional neural networks (Deep-CNN), to obtain the diagnostic accuracy in the 3 cases. The results are shown in Table 3.
It can be seen from Table 3 that in cases A and B, because of the multi-domain feature extraction method proposed in this paper, the accuracies of all 5 fault diagnosis methods were high. Therefore, the improvement of AVMD-ELM was not obvious. In case C, affected by the signal processing method, the accuracies of the 4 EEMD-based methods were low. Owing to the improved signal processing method, the diagnostic accuracy of AVMD-ELM was far higher than those of the other 4 methods.

Conclusions
There are two major contributions to fault diagnosis in this paper. With regard to signal processing before feature extraction, this paper optimizes the VMD algorithm; with regard to feature extraction, this paper proposes a multi-domain fault diagnosis method. Through these two improvements, the accuracy of fault diagnosis is improved. The conclusions are as follows: (a) The experiments show that VMD can eliminate the problem of modal mixing. Through the optimization in this paper, VMD can also adaptively determine the optimal decomposition parameters to obtain a better decomposition effect; (b) For signal feature extraction, PCA is carried out so that information redundancy is eliminated, the feature vectors representing each domain become more concise, and the features of each domain become more prominent, so that the accuracy of the subsequent classification is greatly enhanced; (c) For fault diagnosis accuracy, this paper starts from the fundamental problem that affects the accuracy of classification and synthesizes the extracted three-domain features into a multi-domain feature vector that can comprehensively represent the fault characteristics of signals, which greatly improves the accuracy of fault diagnosis.
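The chain summarized in (b) and (c), PCA on each domain, synthesis of the reduced vectors, and ELM classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic feature matrices, the raw feature counts per domain, the 3 retained components, the standardization step, and the 30 hidden neurons are all assumptions chosen only to make the example run.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the leading principal directions of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

class ELM:
    """Single-hidden-layer ELM: random fixed input weights, least-squares output weights."""
    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights from the Moore-Penrose pseudo-inverse of the hidden output.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return self.classes_[np.argmax(self._hidden(X) @ self.beta, axis=1)]

# Synthetic stand-in for the extracted features (ASSUMED, not experimental data).
rng = np.random.default_rng(42)
classes = np.array([1, 2, 3, 4])
y = np.repeat(classes, 40)
domain_dims = {"time": 10, "frequency": 8, "time_frequency": 12}  # assumed sizes
perm = rng.permutation(len(y))
train, test = perm[:80], perm[80:]

reduced_train, reduced_test = [], []
for name, d in domain_dims.items():
    centers = rng.normal(scale=3.0, size=(len(classes), d))
    X = centers[y - 1] + rng.normal(size=(len(y), d))
    # PCA is fitted on the training part only, then applied to both parts.
    mean, comps = pca_fit(X[train], n_components=3)
    reduced_train.append((X[train] - mean) @ comps)
    reduced_test.append((X[test] - mean) @ comps)

# Synthesize the three reduced vectors into one multi-domain feature vector.
X_train, X_test = np.hstack(reduced_train), np.hstack(reduced_test)
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train, X_test = (X_train - mu) / sigma, (X_test - mu) / sigma

model = ELM(n_hidden=30).fit(X_train, y[train])
accuracy = np.mean(model.predict(X_test) == y[test])
print(f"fused feature size: {X_train.shape[1]}, test accuracy: {accuracy:.3f}")
```

Fitting the PCA on the training samples alone keeps the test samples out of the dimensionality reduction, which is a design choice of this sketch rather than something stated in the paper.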
Finally, some improvements can still be made to this method. Because of the instability of wind speed and the poor operating environment of the wind turbine, the turbine sometimes runs at variable speed. Therefore, on the basis of the research work in this paper, order tracking can be introduced to realize the automatic diagnosis of gearbox faults under variable operating conditions, broadening the application range of this method.
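As an illustration of what order tracking involves, the sketch below shows one common form, angular resampling, in which a signal sampled uniformly in time is re-sampled at uniform shaft-angle increments so that speed-dependent components line up at fixed orders. It is only a sketch under stated assumptions: the function name `angular_resample`, the linear speed ramp, the sampling rate, and the test component at order 5 are hypothetical, and the paper does not specify which order tracking technique would be used.

```python
import numpy as np

def angular_resample(signal, shaft_hz, fs, samples_per_rev):
    """Resample a time-sampled signal at uniform shaft-angle increments.

    signal and shaft_hz are equal-length arrays sampled at fs (Hz);
    shaft_hz is the instantaneous rotational frequency and must stay positive.
    """
    # Cumulative shaft position in revolutions (rectangular integration of speed).
    revs = np.concatenate(([0.0], np.cumsum(shaft_hz[:-1]) / fs))
    # Uniform grid in the angle domain, then linear interpolation onto it.
    grid = np.arange(0.0, revs[-1], 1.0 / samples_per_rev)
    return np.interp(grid, revs, signal)

# Hypothetical run-up: shaft speed ramps from 10 Hz to 20 Hz over 2 s.
fs = 5000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
shaft_hz = 10.0 + 5.0 * t
shaft_revs = 10.0 * t + 2.5 * t**2          # analytic integral of shaft_hz
x = np.sin(2 * np.pi * 5.0 * shaft_revs)    # component locked to order 5

samples_per_rev = 64
x_angle = angular_resample(x, shaft_hz, fs, samples_per_rev)

# Spectrum in the order domain: the frequency axis is now in multiples of shaft speed.
spectrum = np.abs(np.fft.rfft(x_angle * np.hanning(len(x_angle))))
orders = np.fft.rfftfreq(len(x_angle), d=1.0 / samples_per_rev)
peak_order = orders[np.argmax(spectrum)]
print(f"dominant order: {peak_order:.2f}")
```

In the time domain this component sweeps from 50 Hz to 100 Hz, whereas after resampling its energy concentrates near order 5, which is what would allow features to be compared across different speeds.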

Conflicts of Interest:
The authors declare no conflict of interest.