Adaptive Diagnosis for Rotating Machineries Using Information Geometrical Kernel-ELM Based on VMD-SVD

Rotating machineries often work under severe and variable operation conditions, which brings challenges to fault diagnosis. To address this challenge, this paper discusses the concept of adaptive diagnosis, i.e., diagnosing faults under variable operation conditions self-adaptively, with little prior knowledge or human intervention. To this end, a novel algorithm is proposed, the information geometrical extreme learning machine with kernel (IG-KELM). From the perspective of information geometry, the structure and Riemannian metric of Kernel-ELM are specified. Based on this geometrical structure, an IG-based conformal transformation is created to improve the generalization ability and self-adaptability of KELM. The proposed IG-KELM, in conjunction with variational mode decomposition (VMD) and singular value decomposition (SVD), is utilized for adaptive diagnosis: (1) VMD, as a new self-adaptive signal-processing algorithm, is used to decompose the raw signals into several intrinsic mode functions (IMFs). (2) SVD is used to extract the intrinsic characteristics from the matrix constructed with the IMFs. (3) IG-KELM is used to diagnose faults under variable conditions self-adaptively, with no requirement of prior knowledge or human intervention. Finally, the proposed method was applied to fault diagnosis of a bearing and a hydraulic pump. The results show that the proposed method outperforms the conventional method by up to 7.25% and 7.78%, respectively, in classification accuracy.


Introduction
As industrial systems become more and more sophisticated, even slight faults may result in catastrophes. Therefore, fault diagnosis technology has become increasingly significant for the safe and reliable utilization of such systems [1]. As is well known, failures of rotating machineries are common causes of breakdown in industry. The growing demand for safety and reliability in industry requires smart fault diagnosis systems for rotating machineries [2]. Many researchers have studied implementations of fault diagnosis algorithms on mobile devices for wider adoption [3][4][5][6]. Not only can efficient fault diagnosis keep the monitored systems healthy and safe, it can also decrease the cost of repairs or replacements [7].
A lot of research has been done on fault diagnosis for rotating machineries during the past decades. Vibration analysis is the main method for condition monitoring of rotating machineries [8].
An adaptive diagnosis method should have the following two characteristics. (1) Adaptivity to unknown variable conditions. Rotating machineries often work under unknown variable conditions without any prior knowledge or training data. Since the health characteristics of rotating machineries may vary under different conditions, an adaptive diagnosis method ought to adapt itself to unknown variable conditions. (2) Automaticity with little human intervention. An adaptive diagnosis method should be applied automatically and be as independent of human intervention as possible.
Generally, if a fault diagnosis method satisfies the above characteristics, it can be considered as an adaptive diagnosis method. The application of adaptive diagnosis methods can not only reduce the dependence on operators' experiences and skills, but also decrease the complexity and cost of condition monitoring systems. This paper attempts to propose an adaptive diagnosis method for rotating machineries by using vibration analysis.
Feature extraction is the basic step of fault diagnosis. It should be self-adaptive and provide distinctive features with little human intervention, so that valid features can be extracted under variable conditions. Currently, many time-frequency analysis methods have been employed for feature extraction for rotating machineries such as bearings and gearboxes [14][15][16][17][18][19][20][21][22]. The wavelet transform, as a well-known time-frequency analysis tool, has been widely employed to decompose nonstationary signals; it can be considered an improved Fourier transform with adjustable windows [26]. However, it still suffers several unacceptable disadvantages [23][24][25]: in particular, the structure of the wavelet basis function is fixed during the decomposition [23,27]. Obviously, this non-adaptive nature is not appropriate for adaptive diagnosis.
Empirical mode decomposition (EMD), proposed by Huang et al. [28], is another well-known method that can decompose signals into several intrinsic mode functions (IMFs) according to their time and scale characteristics. It is completely self-adaptive and seems suitable for adaptive diagnosis [28][29][30][31][32]. However, it suffers from many problems, such as modal aliasing, pseudo components and the end effect. Therefore, a more efficient self-adaptive method is needed.
Variational mode decomposition (VMD), a novel self-adaptive method, was proposed by Dragomiretskiy et al. in 2014 [33]. It is a non-recursive method that decomposes signals into quasi-orthogonal IMFs, each with a center frequency. It has been shown to overcome the problems of EMD and its extended methods [34]. Therefore, VMD is employed for feature extraction.
The IMFs acquired by VMD can be used to form a matrix, which is too large to be used directly for fault diagnosis. Therefore, a suitable algorithm is needed to extract the intrinsic characteristics of the matrix. Singular value decomposition (SVD), which has been proven able to extract features of periodic impulses, is employed to obtain intrinsic features from the matrix with favorable stability [35,36]. After feature extraction, the next step is fault clustering. As aforementioned, the fault clustering method should adapt to unknown variable conditions automatically, with little prior knowledge or human intervention. Therefore, traditional discriminant analysis methods, such as the Mahalanobis distance [37][38][39] and Fisher discriminant analysis [40,41], are not suitable, because they mainly require a certain level of expertise and threshold settings. In contrast, computational intelligence techniques are preferred in this situation [42]. Computational intelligence techniques are data-driven and have been employed broadly in fault diagnosis, for example Bayesian network classifiers [43], optimization algorithms [44,45] and artificial neural networks [46][47][48]. Deep learning algorithms are efficient and remarkable, and can be employed automatically with little human intervention [49][50][51]; they are even capable of extracting features self-adaptively and seem suitable for adaptive diagnosis. However, deep learning is data-hungry and requires plenty of training data, which are hard to acquire in practice, especially faulty data under different conditions. Extreme learning machine (ELM), proposed by Huang [52,53], has been proven an efficient algorithm for regression and multi-classification [54]. Compared with conventional gradient-based algorithms and the support vector machine (SVM), ELM requires less running time and performs better [55].
On this basis, Kernel-ELM (KELM) [56] was proposed, which uses a kernel function to improve the generalization ability and reduce possible over-fitting.
Similar to SVM, the performance of KELM also relies on the kernel function, but the choice of kernel mainly depends on prior knowledge and expertise [55][56][57]. Different kernel types or parameters may only fit specific datasets, whereas signals measured under different conditions may prefer different kernel types or parameters. Therefore, it is necessary to modify the algorithm self-adaptively, to ensure that the diagnosis method is insensitive to the manual configuration of the kernel and performs acceptably even if the kernel type or parameters are set badly.
Information geometry (IG), proposed by Amari [58], which aims to analyze information theory, statistics and machine learning based on differential geometry, offers a feasible approach. It can analyze and modify machine learning algorithms by convex analysis and constructing differential manifolds. By elucidating the dualistic differential-geometrical structure, information geometry has been widely applied [59][60][61][62][63][64].
In this paper, motivated by information geometry, we propose a novel algorithm, the information geometrical Kernel-ELM (IG-KELM). First, we specify the geometrical structure and Riemannian metric of Kernel-ELM from the perspective of information geometry. Then, a data-dependent conformal transformation is created with the Mahalanobis distance to modify KELM self-adaptively. IG-KELM is insensitive to inappropriate kernel configuration and can adapt itself to signals under variable conditions with little prior knowledge or human intervention. The feasibility and effectiveness of IG-KELM were verified by simulation experiments.
The outline of this paper is as follows: Section 2 introduces VMD, Kernel-ELM, the Riemannian metric of Kernel-ELM, the information geometrical Kernel-ELM, and the scheme of the proposed method; Section 3 describes the simulation experiment performed to verify IG-KELM; Section 4 describes the applications of the proposed method to fault diagnosis of a bearing and a hydraulic pump; and Section 5 presents the conclusions.

Methodology
As aforementioned, VMD is the core algorithm for feature extraction, and Kernel-ELM is the basic algorithm for fault clustering. In this section, VMD and Kernel-ELM are introduced first. Then, the Riemannian metric of Kernel-ELM is analyzed. On this basis, IG-KELM is proposed. Finally, the scheme of adaptive diagnosis based on VMD-SVD and IG-KELM is described.

Variational Mode Decomposition
VMD is capable of estimating the modes and determining the corresponding frequency bands of the fault features at the same time, decomposing the signal into several IMFs [33]. VMD is formulated as the following constrained variational problem:

min_{ {µ_k}, {ω_k} }  Σ_k ‖ ∂_t[ (δ(t) + j/(πt)) ∗ µ_k(t) ] e^{−jω_k t} ‖₂²   s.t.   Σ_k µ_k = f,   (1)

where µ_k and ω_k are the IMF components and their center frequencies.
To obtain the optimal solution of the problem, the augmented Lagrangian function is introduced:

L({µ_k}, {ω_k}, λ) = α Σ_k ‖ ∂_t[ (δ(t) + j/(πt)) ∗ µ_k(t) ] e^{−jω_k t} ‖₂² + ‖ f(t) − Σ_k µ_k(t) ‖₂² + ⟨ λ(t), f(t) − Σ_k µ_k(t) ⟩.   (2)

The mode number k and the quadratic penalty α are set in advance, while the sub-modes µ_k¹, the center frequencies ω_k¹ and the Lagrangian multiplier λ¹ are initialized [33]. Then the modes µ_k and the center frequencies ω_k are updated alternately in the frequency domain by Equations (3) and (4):

µ̂_k^{n+1}(ω) = ( f̂(ω) − Σ_{i≠k} µ̂_i(ω) + λ̂(ω)/2 ) / ( 1 + 2α(ω − ω_k)² ),   (3)

ω_k^{n+1} = ( ∫₀^∞ ω |µ̂_k(ω)|² dω ) / ( ∫₀^∞ |µ̂_k(ω)|² dω ).   (4)

After the modes and center frequencies are updated, the Lagrangian multiplier λ is also updated by Equation (5):

λ̂^{n+1}(ω) = λ̂^n(ω) + τ ( f̂(ω) − Σ_k µ̂_k^{n+1}(ω) ).   (5)

µ_k, ω_k and λ are updated iteratively until the convergence criterion of Equation (6) is satisfied:

Σ_k ‖ µ̂_k^{n+1} − µ̂_k^n ‖₂² / ‖ µ̂_k^n ‖₂² < ε.   (6)
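The alternating updates of Equations (3)-(6) can be sketched as a compact frequency-domain loop. The following is a minimal illustration rather than a faithful reproduction of [33] (it omits the mirror extension and other refinements; the parameter values are illustrative):

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.1, tol=1e-9, max_iter=500):
    """Minimal VMD: alternate the mode, center-frequency and multiplier
    updates of Equations (3)-(5) until the criterion of Equation (6) holds."""
    T = len(signal)
    f_hat = np.fft.fftshift(np.fft.fft(signal))
    freqs = np.fft.fftshift(np.fft.fftfreq(T))      # normalized, in [-0.5, 0.5)
    pos = freqs > 0                                 # positive half-spectrum
    u_hat = np.zeros((K, T), dtype=complex)         # mode spectra
    omega = 0.5 * (np.arange(K) + 1.0) / (K + 1)    # spread initial centers
    lam = np.zeros(T, dtype=complex)                # Lagrangian multiplier
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Equation (3): Wiener-filter update of mode k around omega_k
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Equation (4): center of gravity of the mode's power spectrum
            p = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * p) / np.sum(p)
        # Equation (5): dual ascent on the reconstruction constraint
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        # Equation (6): relative change of the modes
        num = np.sum(np.abs(u_hat - u_prev) ** 2)
        den = max(np.sum(np.abs(u_prev) ** 2), np.finfo(float).tiny)
        if num / den < tol:
            break
    order = np.argsort(omega)
    modes = np.real(np.fft.ifft(np.fft.ifftshift(u_hat[order], axes=-1), axis=-1))
    return modes, omega[order]
```

For a clean two-tone signal, the recovered center frequencies converge to the normalized frequencies of the two tones, and each mode isolates one tone.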

Kernel-ELM
ELM was originally developed for training single-hidden-layer feedforward networks (SLFNs) and was then extended to generalized SLFNs. The architecture of ELM is shown in Figure 1; details can be found in Huang et al. [52,53]. Consider a dataset (x_i, y_i), where x_i (i = 1, 2, ..., N) is the input vector and y_i (i = 1, 2, ..., M) is the output vector. The output function of ELM can be described as:

f(x) = Σ_{i=1}^{L} β_i G(α_i, b_i, x) = h(x)β,   (7)

where L is the number of hidden neurons, α_i is the weight vector connecting the ith hidden node and the input nodes, b_i is the bias of the ith hidden node, β_i is the weight vector connecting the ith hidden node and the output nodes, G(·) is the hidden-node activation function, and h(x) is the hidden layer output matrix of the network.
According to the ELM theory, α_i and b_i are randomly assigned, and the least-squares solution of β is computed from the following objective function:

min_β ‖ Hβ − Y ‖.   (8)

Therefore,

β = H⁺Y,   (9)

where H⁺ is the Moore-Penrose pseudo-inverse of the matrix H. Details can be found in Huang et al. [53].
To improve the generalization ability and reduce possible over-fitting, the training error is not forced to be exactly zero. Therefore, the objective function can be rewritten as:

min_β (1/2)‖β‖² + (C/2) Σ_{i=1}^{N} ξ_i²,   s.t.   h(x_i)β = y_iᵀ − ξ_iᵀ,   (10)

where ξ_i is the training error of the ith input vector and C is the regularization parameter. The optimal value of β can be obtained as:

β = Hᵀ( I/C + HHᵀ )⁻¹ Y.   (11)

If h(x) is unknown, Mercer's conditions are introduced into the ELM model, and a kernel function is defined as:

Ω_ELM = HHᵀ,   Ω_{i,j} = h(x_i)·h(x_j) = K(x_i, x_j).   (12)

Therefore, the output function can be rewritten as:

f(x) = [K(x, x_1), ..., K(x, x_N)] ( I/C + Ω_ELM )⁻¹ Y.   (13)

The regularization parameter C is usually selected by n-fold cross-validation (CV) [56]. Equation (13) shows that the output function is determined by the kernel function. There are several popular kernel functions, such as polynomial functions and radial basis functions. Different kernel functions are appropriate for different situations.
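In code, Equation (13) amounts to a single regularized linear solve against the training Gram matrix. A minimal NumPy sketch for multi-class classification (assuming a Gaussian RBF kernel and one-hot targets; the class and parameter names are illustrative):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

class KELM:
    """Kernel-ELM classifier: output weights (I/C + Omega)^(-1) Y,
    with Omega the training kernel matrix, as in Equation (13)."""
    def __init__(self, C=100.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, Y):
        self.X_train = X
        omega = rbf_kernel(X, X, self.gamma)
        # Solve (I/C + Omega) beta = Y instead of forming the inverse
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + omega, Y)
        return self

    def decision(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta

    def predict(self, X):
        return np.argmax(self.decision(X), axis=1)
```

With one-hot label matrix Y, the predicted class is the argmax of the decision values, matching the multi-classification usage described above.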

Riemannian Metric of Kernel-ELM
For classification, KELM seeks an optimal separating hyperplane, which passes through the origin of the KELM random feature space [55]. To modify the kernel function data-dependently, information geometry is employed to analyze the structure of the kernel mapping geometrically. According to information geometry [59], h(x) is considered an embedding of the input space S into the feature space F as a curved submanifold. The mapped pattern of x is z = h(x), which can be expressed in differential form as:

dz = Σ_i ( ∂h(x)/∂x_i ) dx_i,   (14)

where the squared length of dz = (dz_α) can be described as:

ds² = ‖dz‖² = Σ_i Σ_j g_ij(x) dx_i dx_j,   (15)

where G(x) = (g_ij(x)) is the Riemannian metric tensor induced in the input space:

g_ij(x) = ( ∂h(x)/∂x_i ) · ( ∂h(x)/∂x_j ).   (16)

It follows that G(x) is a positive-definite matrix. According to the theorems introduced by Wu et al. [65], g_ij(x) can be described as:

g_ij(x) = ∂²K(x, x') / ∂x_i ∂x'_j |_{x'=x}.   (17)

It is clear that the Riemannian metric G(x) is directly determined by the kernel. Therefore, for the polynomial kernel K(x, x') = (x·x' + 1)^d, the induced Riemannian metric is:

g_ij(x) = d (x·x + 1)^{d−2} [ (x·x + 1) δ_ij + (d − 1) x_i x_j ].   (18)

For the Gaussian radial basis function, the induced Riemannian metric is:

g_ij(x) = (1/σ²) δ_ij,   (19)

where:

K(x, x') = exp( −‖x − x'‖² / (2σ²) ).   (20)

Therefore, the structure of Kernel-ELM can be analyzed geometrically by using the Riemannian metric.
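The induced metric can be checked numerically: a cross finite-difference of the kernel approximates g_ij(x) = ∂²K(x, x')/∂x_i ∂x'_j at x' = x, which for the Gaussian RBF kernel should come out flat, g_ij(x) = δ_ij/σ². A small sketch (the step size h is an illustrative choice):

```python
import numpy as np

def rbf(x, xp, sigma):
    # Gaussian RBF kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma ** 2))

def induced_metric(kernel, x, h=1e-3):
    """Finite-difference estimate of g_ij(x) = d^2 K(x, x')/dx_i dx'_j at x' = x."""
    n = len(x)
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            # cross-difference in the first and second kernel arguments
            g[i, j] = (kernel(x + ei, x + ej) - kernel(x + ei, x)
                       - kernel(x, x + ej) + kernel(x, x)) / h ** 2
    return g

sigma = 0.5
x = np.array([0.7, -1.2])
g = induced_metric(lambda a, b: rbf(a, b, sigma), x)
# For the Gaussian kernel the metric is flat: g_ij = delta_ij / sigma^2,
# independently of the evaluation point x.
```

The same helper applied to a polynomial kernel reproduces the position-dependent metric stated above, which is what makes the polynomial case geometrically curved.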

Information Geometrical Kernel-ELM
To modify KELM self-adaptively, we have to improve the generalization ability of KELM by using the data rather than prior knowledge or expertise. From the perspective of geometry, it can easily be seen that improving the spatial resolution around the optimal hyperplane will enhance the separability of patterns. In this study, a conformal transformation is utilized:

K̃(x, x') = D(x) K(x, x') D(x'),   (21)

where K̃(x, x') is the modified kernel function and D(x) is a positive scalar function. The new Riemannian metric g̃_ij(x) can be written as:

g̃_ij(x) = D_i(x) D_j(x) + D(x)² g_ij(x),   (22)

where:

D_i(x) = ∂D(x)/∂x_i.   (23)

The factor D(x) should be chosen such that its value is greater when x approaches the boundary and smaller when x is further away from the boundary. In this way, the spatial resolution around the optimal hyperplane is enhanced.
However, the position of the hyperplane is unknown in practice. To solve this problem, we utilize the Mahalanobis distance (MD) to estimate the approximate position of the hyperplane. The MD can be described as:

MD_i^(k) = √( (x_i − u_k)ᵀ Σ_k⁻¹ (x_i − u_k) ),   (24)

where MD_i^(k) represents the distance between x_i and the kth pattern, and u_k and Σ_k represent the mean vector and covariance matrix of the kth pattern.
In view of the aforementioned analysis, the conformal mapping D(x) is constructed from the Mahalanobis distances to the M patterns, where M is the number of patterns. The chosen D(x) is derived directly from the data alone and follows the aforementioned rule. By this approach, this study improves the generalization ability of KELM self-adaptively based on the training data, with no requirement of prior knowledge or expertise. In summary, a novel algorithm called information geometrical Kernel-ELM (IG-KELM) is proposed, as shown in Figure 2:

1. Train the KELM with a primary kernel K;
2. Calculate the Mahalanobis distances to obtain the conformal mapping D(x);
3. Transform the kernel K by Equation (21) to obtain the modified kernel K̃;
4. Retrain the KELM with the new kernel K̃.
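Steps 2 and 3 above can be sketched in NumPy. Since the paper's exact expression for D(x) is not reproduced here, the factor below is a hypothetical surrogate that follows the stated rule (large where the two smallest class-wise Mahalanobis distances are nearly tied, i.e., near the boundary; small deep inside a class); the transformation K̃(x, x') = D(x)K(x, x')D(x') follows Wu et al. [65]:

```python
import numpy as np

def mahalanobis(X, mean, cov_inv):
    # Distance of each row of X to one pattern (class), as in the MD equation
    d = X - mean
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))

def conformal_factor(X, stats):
    """Hypothetical D(x): near 1 close to the boundary (the two smallest
    class distances nearly equal), small deep inside a class.
    stats: list of (mean, inverse covariance) per pattern."""
    dists = np.stack([mahalanobis(X, m, ci) for m, ci in stats])  # (M, n)
    d_sorted = np.sort(dists, axis=0)
    return np.exp(-(d_sorted[1] - d_sorted[0]))

def conformal_kernel(K, D_row, D_col):
    # K~(x, x') = D(x) K(x, x') D(x'), applied to a whole Gram matrix
    return D_row[:, None] * K * D_col[None, :]
```

After computing D on the training set, the KELM is simply retrained with `conformal_kernel` applied to the original Gram matrix, which is all that step 4 requires.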

Adaptive Diagnosis Based on VMD-SVD and IG-KELM
In this study, self-adaptive algorithms are used for both feature extraction and fault clustering; VMD-SVD is used for feature extraction and IG-KELM is used for fault clustering.
1. Feature extraction. VMD decomposes each signal into n IMFs self-adaptively, and the IMFs are assembled into a matrix. Then, SVD is employed to obtain an n-dimensional feature vector of singular values from the matrix.
2. Fault clustering. The proposed IG-KELM is employed for fault diagnosis under variable conditions.
The proposed method is self-adaptive and strongly robust. These advantages imply that it is able to diagnose faults under unknown operation conditions without corresponding training data. The proposed method can be implemented self-adaptively under variable conditions, with little requirement of prior knowledge, expertise, parameter configuration or any other human intervention.
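The feature-extraction step of the scheme reduces each signal to the singular values of its IMF matrix. A minimal sketch of that reduction (the IMF matrix is assumed to come from any VMD implementation producing an n × T mode matrix):

```python
import numpy as np

def svd_features(imf_matrix):
    """Feature vector of one signal: the singular values of the n x T
    matrix whose rows are its IMFs, returned in descending order."""
    return np.linalg.svd(imf_matrix, compute_uv=False)
```

Each signal thus maps to a short n-dimensional feature vector, which is what the IG-KELM classifier consumes in the fault-clustering step.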

Simulation Experiment for IG-KELM
Simulation experiments comparing IG-KELM with the conventional KELM were carried out to verify its efficiency and self-adaptivity under different kernel types and parameters.

Simulation Data
Suppose there is a simulated two-dimensional dataset Z = (x, y) distributed evenly over the region [−2, 2] × [−2, 2]. The dataset is divided into two classes by a predefined boundary curve. Then, IG-KELM or KELM produces a new boundary to cluster the dataset. The classification accuracy can be used to verify the efficiency and self-adaptivity of IG-KELM and KELM with different kernel functions and parameters.
In this simulation, 500 training samples were randomly and uniformly generated, as shown in Figure 3, and another 2000 test samples were generated for verification. Two types of kernel functions were involved in this study: the Gaussian RBF and polynomial kernels.
The classification results of KELM and IG-KELM were calculated, as shown in Table 1. The results show that the test accuracy of KELM is 99.20%, while that of IG-KELM is 99.65%. Therefore, the trained IG-KELM performed better than KELM. The classification results were also calculated for γ = 2^(−9), 2^(−6), 2^0, 2^2 to verify whether IG-KELM performs acceptably even if the kernel parameter is set badly, as shown in Table 1 and Figure 5. In general, when the value of γ moves away from the optimal value (γ = 2^(−3)), the accuracy of KELM drops markedly, whereas IG-KELM always performs well. In particular, when γ was set to 2^2, the test accuracy of KELM dropped to about 93%, while IG-KELM still performed at a high level (97.40%). The results show that IG-KELM is insensitive to inappropriate parameter configuration, compared with KELM.

By Using the Polynomial Kernel

The simulation results of KELM and IG-KELM with a polynomial kernel were obtained for d = 2, 3, 4, 5, as shown in Figure 6 and Table 2. Since the polynomial kernel is obviously inappropriate for this dataset, all test accuracy rates of KELM are below 90%. However, IG-KELM improves the performance substantially even with a badly chosen kernel type. When d was set to 4, the test accuracy rate of IG-KELM reached 93.15%. It is obvious that IG-KELM is strongly robust to inappropriate kernel types, compared with KELM.

Application on Bearing Fault Diagnosis
Rolling bearings are among the most widely used components in rotating machineries. Therefore, the vibration data of bearings were used to verify the proposed method.

Experimental Setup of Bearing
The data from the Bearing Data Center of Case Western Reserve University were used in this study. The test rig consists of a 2 HP motor, a torque transducer/encoder, a dynamometer and control electronics, and is shown in Figure 7 (more details about the test rig can be found at http://www.eecs.case.edu/laboratory/bearing/welcome_overview.htm). The 6205-2RS JEM SKF deep-groove ball bearings were tested under four operation conditions (Conditions A, B, C, D), corresponding to different motor speeds and loads, as shown in Table 3. The vibration signals, including normal, inner race fault, outer race fault and rolling element fault signals, were collected at a sampling rate of 12 kHz under every condition. There are four groups of data (Groups A, B, C, D), corresponding to the four operation conditions. Each group contains 50 training samples and 100 test samples, as shown in Table 3. To verify the proposed diagnosis method under unknown variable conditions, we trained four models corresponding to the four conditions, and tested each of them under every condition without any prior knowledge or human intervention.

Table 3. Operation conditions and sample sizes of the bearing dataset (Training/Test).

| Group | Speed (rpm) | Load (HP) | Normal | Inner Race | Outer Race | Rolling Element |
| A     | 1797        | 0         | 20/40  | 10/20      | 10/20      | 10/20           |
| B     | 1772        | 1         | 20/40  | 10/20      | 10/20      | 10/20           |
| C     | 1750        | 2         | 20/40  | 10/20      | 10/20      | 10/20           |
| D     | 1730        | 3         | 20/40  | 10/20      | 10/20      | 10/20           |
| Total |             |           | 80/160 | 40/80      | 40/80      | 40/80           |


Feature Extraction Based on VMD-SVD
Each data sample was decomposed into n IMFs by VMD. In this study, n was set to 8 by default. Taking a normal signal as an example, the processing result is shown in Figure 8. For comparison, the results of EMD were also obtained, as shown in Figure 9. Comparing the results of VMD and EMD shows that VMD separates the signals more effectively and eliminates the modal aliasing problem.

The IMFs were used to construct a matrix, and the singular values were then acquired by SVD. Taking Condition A as an example, the results are shown in Figure 10. Compared with the results of EMD-SVD (shown in Figure 11), for each fault mode the results obtained by VMD-SVD are more consistent and stable, and different fault modes are easily distinguishable by using VMD-SVD. In contrast, the results of EMD-SVD show more variability, which is obviously a disadvantage for fault diagnosis.


Fault Clustering for Bearing
In this study, IG-KELM with an RBF kernel was employed for fault clustering based on features extracted by VMD-SVD. Four trained models, corresponding to four conditions were obtained and tested under every operation condition, respectively, as shown in Table 4. The conventional KELM was employed for comparison.
In this study, all training errors of KELM and IG-KELM models were calculated to be zero; if the trained model and test samples come from the same condition, all test accuracy rates are 100% (which means the test errors are zero). That is mainly due to the efficiency of the hybrid feature extraction method (VMD-SVD). However, if the trained model and test samples come from different conditions, the test accuracy rates of KELM decrease rapidly. When the trained KELM from Condition B was employed under Condition D, the test accuracy rate was only 89.25%, which is unacceptable in applications. On the contrary, IG-KELM performed much better under unknown variable conditions and the test accuracy rates were no less than 96%. Compared with the strong sensitivity of KELM to the operation condition, the IG-KELM is able to adapt itself in a data-dependent way and improves the performance under unknown variable conditions rapidly. The IG-KELM can be implemented even without any prior knowledge about the current operation condition.
Therefore, the novel method using IG-KELM based on VMD-SVD can be utilized for bearing fault diagnosis under unknown variable conditions with little prior knowledge or human intervention.


Application on Hydraulic Pump Fault Diagnosis
The proposed method was also applied to fault diagnosis of a hydraulic pump. The vibration signals were gathered from a test rig of an SCY hydraulic plunger pump at a sampling rate of 1000 Hz, as shown in Figure 12. The nominal pressure is 31.5 MPa, and the nominal displacement is 1.25-400 mL/r. The pump was running at a fluctuating motor speed of 5280 ± 200 rpm while the signals were gathered, and the vibration signals were collected using a four-channel DAT recorder. Considering that the most crucial fault modes of the plunger pump are slipper loosening and valve plate wear, the dataset consists of three types of states: no trouble (Normal), slipper loosening (Fault 1) and valve plate wear (Fault 2), as shown in Table 5. The proposed method was applied to this dataset, and KELM was also used for comparison, as shown in Table 6, from which we can see that KELM has a high false alarm rate (25/30). That is because some normal data under unknown fluctuating conditions were identified as abnormal by the KELM model. In contrast, IG-KELM is strongly robust to fluctuating conditions, reducing false alarms and improving the classification accuracy. The results show that the application of the proposed method to hydraulic pump fault diagnosis is feasible and efficient.

Conclusions
This paper focuses on adaptive diagnosis for rotating machineries, i.e., diagnosing faults automatically under unknown variable operation conditions with little prior knowledge or human intervention. To this end, this paper proposes a method using IG-KELM based on VMD-SVD for fault diagnosis. First, the VMD-SVD method is employed to extract features from the vibration signals self-adaptively. Second, IG-KELM, which employs information geometry to modify KELM data-dependently, is used for fault clustering. IG-KELM modifies itself self-adaptively and is insensitive to the manual configuration of the kernel. The simulation results show that IG-KELM can increase the accuracy rate by up to 4.85%; therefore, IG-KELM performs efficiently even if the kernel type or parameters are set badly. Finally, the proposed method was applied to fault diagnosis of a bearing and a hydraulic pump under variable conditions. Compared with the conventional method, the proposed method increased the accuracy rates by up to 7.25% and 7.78%, respectively. The results show that the proposed method has strong self-adaptivity and high accuracy during feature extraction and fault clustering.
Considering that the proposed method can be applied automatically and requires little prior knowledge or human intervention, it can be deployed on digital signal processors (DSPs), field-programmable gate arrays (FPGAs) or even smartphones.
However, the operation conditions involved in this study only fluctuated within a narrow region, and larger differences between operation conditions may create new problems. Therefore, additional experiments under more diverse conditions should be done to validate and improve the method. Meanwhile, more attention should be paid to developing further and better adaptive diagnosis methods.