Degradation State Recognition of Piston Pump Based on ICEEMDAN and XGBoost

Rui Guo 1,2,*, Zhiqian Zhao 1,3, Tao Wang 1,4, Guangheng Liu 1,4, Jingyi Zhao 2,4 and Dianrong Gao 3
1 Hebei Provincial Key Laboratory of Heavy Machinery Fluid Power Transmission and Control, Yanshan University, Qinhuangdao 066004, China; zzq623310996@163.com (Z.Z.); m13237195956@163.com (T.W.); lgh18733407119@163.com (G.L.)
2 The State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China; zjy@ysu.edu.cn
3 Key Laboratory of Advanced Forging and Stamping Technology and Science, Yanshan University, Qinhuangdao 066004, China; gaodr@ysu.edu.cn
4 Hebei Key Laboratory of Special Delivery Equipment, Yanshan University, Qinhuangdao 066004, China
* Correspondence: guorui@ysu.edu.cn


Introduction
As the power source, the piston pump affects the function realization of the whole hydraulic system. Based on the performance degradation of the piston pump to date, degradation state recognition is carried out to determine the performance of the piston pump and the whole hydraulic system [1], which provides decision-making information for condition-based maintenance (CBM) and prognostics and health management (PHM) [2]. Due to the influence of oil, temperature, load, and other factors [3,4], the components of the piston pump deteriorate through wear [5], and this degradation eventually changes the main performance indicators of the hydraulic pump. The vibration signal usually exhibits transient and periodic pulse behavior, which contains critical information about the status of mechanical equipment. Scholars have carried out extensive research on the vibration signals of mechanical equipment to extract effective information. Tian et al. [8] improved multi-fractal detrended fluctuation analysis (MF-DFA) to extract performance degradation characteristics from the hydraulic pump vibration signal, which improved the accuracy of degradation state recognition but did not address the noise in the pump's vibration signal. Xiao et al. [9] used the wavelet packet transform to analyze bearing vibration signals and extracted the node energy and its total energy as features; the method suppresses white noise to a certain extent, but its ability to suppress signal pulse interference is limited. Singh et al. [10] applied pseudo-fault signal (PFS) assisted empirical mode decomposition (EMD) on the envelope, which solves the problem that different fault frequencies are not distinguishable due to mode aliasing, but the decomposition efficiency of the algorithm is low. Lei et al.
[11] proposed an improved Hilbert-Huang transform based on ensemble empirical mode decomposition (EEMD) and sensitive IMFs, and achieved good results to a certain extent. EEMD [12,13] reduces the effect of mode aliasing by adding auxiliary white noise; however, different realizations of signal plus noise may produce different numbers of modes, making the final averaging difficult [14]. To solve this problem, Torres et al. [15] proposed complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), which adds adaptive white noise at each stage of decomposition and then obtains the modal components of each layer by calculating a unique residual signal.
For degradation state recognition, Tian et al. [16] judged the degradation state of a hydraulic pump by calculating the Jensen-Renyi divergence (JRD) distance between different characteristic variables. To handle the fuzzy boundaries between degradation states, Wang et al. [17] used the fuzzy c-means (FCM) method for clustering and identified the degradation state of bearing performance according to the maximum membership degree rule; the overall recognition rate reached 96%. Zhang et al. [18] combined neighborhood preserving embedding (NPE) with the self-organizing map (SOM) for bearing degradation state recognition and achieved good results. As classical classifiers, the support vector machine (SVM) [19,20] and artificial neural network (ANN) [21,22] have been widely used in mechanical fault diagnosis and condition recognition. For instance, an incomplete wavelet packet analysis (WPA) model composed of a five-level discrete wavelet transform (DWT) and four-level WPA was established in the literature [23] and applied to multi-layered ANN engine failure classification. In Ref. [24], the Laplacian scoring algorithm was introduced to automatically select sensitive features according to the importance of each feature, and multi-state recognition of rolling bearings was realized with a particle swarm optimization-based support vector machine (PSO-SVM). However, SVM also has shortcomings: after an appropriate kernel function is selected, solving the classification problem requires quadratic programming [25], which consumes considerable storage space. Tree-based ensemble learning is a widely used and efficient method [26-28]. However, traditional tree ensembles are built serially, which inevitably increases computational complexity and time [29]. XGBoost [30], a newer tree-based ensemble learning algorithm, can process data in parallel.
It has minimal requirements on characteristic values, handles missing data intelligently, and helps avoid overfitting [31]. Compared with SVM, it achieves higher prediction accuracy with relatively less parameter-tuning time [32]. Compared with deep learning, XGBoost classifies small data sets more easily [33].
In this paper, a new method for degradation state recognition of the piston pump is proposed, based on a further optimized decomposition method, improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) [34-36], and XGBoost. Firstly, ICEEMDAN is used to decompose the vibration signal into a series of IMFs. This method greatly suppresses the illusive components and mode mixing caused by the initial decomposition process and yields better decomposition results. According to the correlation coefficient method, the IMFs that correlate strongly with the original signal are extracted, and different characteristic values of the IMFs are analyzed and selected. Finally, new characteristic values with low coincidence degree are selected by PCA as the input, and the XGBoost model is trained to complete degradation state identification. The superiority of this method is verified by comparison with different algorithms. An overview of the proposed methods and analysis steps is shown in Figure 2.

Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) Model
ICEEMDAN adds a special white noise $E_k(w^{(i)})$, i.e., the $k$-th IMF component of Gaussian white noise after EMD decomposition [34,37]. Then, by obtaining a unique residual, each IMF is defined as the difference between the existing residual signal and its local mean. As a result, the residual noise in the IMFs is greatly reduced, and the problems of illusive components and mode aliasing in the early stage of decomposition are also alleviated [38].
Let $E_k(\cdot)$ denote the operator that extracts the $k$-th IMF component by EMD, and $M(\cdot)$ denote the operator that computes the local mean of a signal. The first IMF $\tilde{c}_1$ and the residual $r_1$ satisfy

$r_1 = \frac{1}{N}\sum_{i=1}^{N} M(x^{(i)}), \qquad \tilde{c}_1 = x - r_1$

where $x$ is the original signal, $x^{(i)}$ is the $i$-th noise-added copy of $x$, and $N$ is the number of noise realizations.
The specific decomposition process of ICEEMDAN can be described as follows.
a. White noise $E_1(w^{(i)})$ is added to the original signal $x$ to obtain $x^{(i)} = x + \beta_0 E_1(w^{(i)})$, where $w^{(i)}$ represents the $i$-th white noise realization to be added and $\beta_0$ is its amplitude coefficient.
b. EMD is used to calculate the local mean of each $x^{(i)}$, and the first residual $r_1 = \frac{1}{N}\sum_{i=1}^{N} M(x^{(i)})$ is obtained by averaging them; the first IMF is then $\tilde{c}_1 = x - r_1$.
c. For $k \geq 2$, the $k$-th residual is $r_k = \frac{1}{N}\sum_{i=1}^{N} M(r_{k-1} + \beta_{k-1} E_k(w^{(i)}))$, and the $k$-th mode component is $\tilde{c}_k = r_{k-1} - r_k$; for example, the second mode component (IMF2) is $\tilde{c}_2 = r_1 - r_2$. This step is repeated until the obtained residual can no longer be decomposed by EMD. In this way, ICEEMDAN decomposes the IMFs accurately, which lays a foundation for improving the accuracy of state recognition.
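The steps above can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: the EMD local-mean operator $M(\cdot)$ is replaced by a crude spline-envelope stand-in, and the $k$-th noise mode $E_k(w^{(i)})$ is approximated by the raw noise realization. The names `local_mean` and `iceemdan_sketch` are introduced here.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def local_mean(x):
    """Crude stand-in for the EMD local-mean operator M(.):
    mean of cubic-spline envelopes through local maxima/minima."""
    n = np.arange(len(x))
    hi = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    lo = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(hi) < 4 or len(lo) < 4:   # too few extrema to build envelopes
        return np.full_like(x, x.mean())
    upper = CubicSpline(hi, x[hi])(n)
    lower = CubicSpline(lo, x[lo])(n)
    return (upper + lower) / 2.0

def iceemdan_sketch(x, n_real=20, beta=0.2, n_imfs=4, seed=0):
    """Simplified ICEEMDAN loop: r_k = <M(r_{k-1} + eps*w)>, c_k = r_{k-1} - r_k.
    E_k(w) is approximated by the raw noise realization (an assumption)."""
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((n_real, len(x)))
    imfs, residual = [], x.astype(float)
    for _ in range(n_imfs):
        eps = beta * residual.std()
        r = np.mean([local_mean(residual + eps * w) for w in noises], axis=0)
        imfs.append(residual - r)   # c_k = r_{k-1} - r_k
        residual = r
    return np.array(imfs), residual
```

By construction the modes telescope, so the IMFs plus the final residual reconstruct the input exactly, mirroring the completeness property of the real algorithm.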

eXtreme Gradient Boosting (XGBoost) Classifier Design
For the traditional Gradient Boosting Decision Tree (GBDT) algorithm, only first-order derivative information is used. Moreover, due to the dependency between weak learners, it is difficult for GBDT to train data in parallel [39]. XGBoost takes the Taylor expansion of the loss function up to the second order and adds a regularization term to find the optimal solution, which balances the decline of the objective function against the complexity of the model to avoid overfitting [40]. The XGBoost model is

$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$

where $K$ is the number of decision trees, $f_k(x_i)$ is the function of the input in the $k$-th decision tree, $\hat{y}_i$ is the predicted value, and $F$ is the set of all possible CARTs. The objective function of XGBoost includes two parts, training error and regularization:

$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$

where $l(y, \hat{y})$ is the loss function measuring the difference between the predicted value and the real value, $T$ is the number of leaf nodes, $w$ is the vector of leaf-node scores, $\gamma$ is the leaf penalty coefficient, and $\lambda$ ensures that the leaf-node scores are not too large.
The XGBoost algorithm uses the gradient boosting strategy: it adds one new tree at a time instead of obtaining all the trees at once, continuously correcting the previous result by fitting the residual of the last prediction:

$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$

Combining Formulas (4) and (5), for the $t$-th decision tree, the objective function can be updated to

$Obj^{(t)} = \sum_{i} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$

To find the minimum of the objective function, the loss function is expanded in a Taylor series up to the second order:

$Obj^{(t)} \approx \sum_{i} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)$

where $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$. Summing the loss values over the samples falling into each leaf node $j$ (with sample set $I_j$), the objective becomes

$Obj^{(t)} = \sum_{j=1}^{T} \Big[ G_j w_j + \frac{1}{2}\big(H_j + \lambda\big) w_j^2 \Big] + \gamma T$

where $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$. The optimal leaf weight and objective value obtained by the solution are

$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad Obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$

During the training process, the model continuously evaluates node losses to select the split with the largest gain. XGBoost adds new trees by continuously splitting on features; adding a tree each time actually learns a new function $f_k(X, \theta_k)$ to fit the residual of the last prediction.
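The closed-form leaf solution above can be checked numerically. The following is a minimal sketch assuming squared-error loss, for which $g_i = 2(\hat{y}_i - y_i)$ and $h_i = 2$; the helper names `leaf_weight` and `split_gain` are introduced here and are not part of the XGBoost API.

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf score w* = -G / (H + lambda) from the second-order objective."""
    return -np.sum(g) / (np.sum(h) + lam)

def split_gain(g, h, left, lam=1.0, gamma=0.0):
    """Gain of splitting a node into left/right:
    0.5*[G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma."""
    score = lambda gs, hs: np.sum(gs) ** 2 / (np.sum(hs) + lam)
    return 0.5 * (score(g[left], h[left]) + score(g[~left], h[~left])
                  - score(g, h)) - gamma

# squared-error loss with all predictions at 0: g_i = 2*(y_hat - y), h_i = 2
y = np.array([1.0, 2.0, 3.0, 10.0])
y_hat = np.zeros(4)
g, h = 2 * (y_hat - y), 2 * np.ones(4)
w = leaf_weight(g, h, lam=0.0)  # with lam=0 this equals the mean residual, 4.0
```

With squared error and no regularization, the optimal leaf score reduces to the mean residual of the samples in the leaf, which is the familiar GBDT behavior.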
When K trees are obtained after training, the features of prediction samples will have a corresponding leaf node in each tree, and each leaf node corresponds to a score. Finally, the corresponding scores of each tree are added up to obtain the recognition prediction value of the sample [30]. The flow chart of XGBoost is shown in Figure 3.

Performance Degradation Test of Piston Pump
In this paper, a piston pump performance degradation test bench was used to collect vibration signals of the pump under different wear degrees. During the test, slippers with different wear degrees were installed to simulate the gradual increase of piston pump wear in actual operation. Figure 4 shows a photograph of the test bench. The vibration sensor is attached by a magnetic base, and the vibration signal of the pump shell is collected synchronously. To simulate the process of slipper wear degradation affecting the performance of the piston pump, the wear amounts of the slipper were 0 mm (no wear), 0.5 mm (slight wear), 1.5 mm (moderate wear), and 2.5 mm (severe wear) in turn. The test adopted a 20 kHz sampling frequency to collect the vibration signal. The hydraulic system was operated at a pressure of 15 MPa, and the vibration signals of the slipper with different degrees of wear were collected sequentially. In our experiment, the data collected in the last 3 s are taken as effective degradation data, and the data obtained for each degradation degree are divided into 50 groups. Among the 50 sets of data, 30 sets are used for training and 20 sets for testing.

Pre-Processing of the Vibration Signal by ICEEMDAN
The vibration signals with different degradation degrees are shown in Figure 5, from which it can be seen that there are great differences in the vibration waveforms under the same working conditions. As slipper wear increases, the vibration of the piston pump becomes more and more intense, and the amplitude of the vibration signal grows larger. The vibration data of the tested pump under the different states are decomposed by ICEEMDAN, and the results are shown in Figure 6. The ICEEMDAN algorithm decomposes the vibration signals of the four states into respective IMFs and a residual component for model development. The IMFs generated by ICEEMDAN have different sensitivities to the degrees of degradation. If the IMFs are analyzed directly, it is difficult to obtain effective features related to the original signal, and illusive components cause the extraction of useless features, increase the dimension of the feature set, and complicate the subsequent analysis [41]. Therefore, the effective components among the IMFs are screened out by comparing correlation coefficients [42], and the IMF component with the largest correlation coefficient is selected as the research object in this paper. The correlation coefficient is calculated as shown in Formula (11):

$\rho_{xy} = \frac{\sum_{n=1}^{N} [x(n) - \bar{x}][y(n) - \bar{y}]}{\sqrt{\sum_{n=1}^{N} [x(n) - \bar{x}]^2}\,\sqrt{\sum_{n=1}^{N} [y(n) - \bar{y}]^2}}$

where $x(n)$ is the sequence of the IMF component under evaluation and $y(n)$ is the sequence of the original signal.
The correlation coefficient expresses the correlation between a selected IMF component and the original signal; the correlation coefficient values of the IMF components of the four states are shown in Table 1. The larger the correlation coefficient, the greater the correlation between the IMF component and the original signal and the more effective information it contains; conversely, a small coefficient indicates that the IMF component is more likely to be illusive.
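The selection rule can be sketched with NumPy's Pearson correlation; `select_imf` is a name introduced here, and the two-row `imfs` array below is synthetic illustration only.

```python
import numpy as np

def select_imf(imfs, signal):
    """Pick the IMF with the largest (absolute) Pearson correlation
    to the original signal; returns its index and all coefficients."""
    rho = np.array([np.corrcoef(imf, signal)[0, 1] for imf in imfs])
    return int(np.argmax(np.abs(rho))), rho

# synthetic check: the IMF that actually resembles the signal wins
t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2 * np.pi * 25 * t)
imfs = np.vstack([
    np.random.default_rng(0).standard_normal(1000),   # noise-like component
    signal + 0.1 * np.cos(2 * np.pi * 3 * t),         # signal-like component
])
best, rho = select_imf(imfs, signal)
```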

Multi-Domain Feature Selection
The selected time-domain characteristic parameters and their calculation methods are shown in Table 2. Among the time-domain characteristics, the root mean square reflects the irregular continuity of vibration and the energy of the signal. The peak-to-peak value represents the impact vibration caused by wear of the inner edge. The spikiness of the piston pump vibration waveform is expressed by the peak value. The variance indicates the energy and intensity of the vibration signal. The degree to which the vibration signal deviates from a normal distribution is reflected by skewness and kurtosis. The waveform index is a sensitive and stable parameter, which can well represent the slight damage of slippers with different wear degrees. With the aggravation of wear, the impulsion index and tolerance index increase obviously [43-45]. Table 2. Numerical explanation of time domain.

(Table 2 lists each time-domain characteristic, from the root mean square onward, together with its expression in terms of the signal samples x(i).)
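As a sketch of this time-domain feature set, the following uses standard textbook definitions of the named indicators; the exact expressions in Table 2 (in particular the normalizations of the waveform, impulsion, and tolerance indexes) may differ.

```python
import numpy as np

def time_features(x):
    """Common time-domain indicators (standard definitions; the paper's
    Table 2 may use slightly different normalizations)."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    mean_abs = np.mean(np.abs(x))
    mu, sigma = x.mean(), x.std()
    return {
        "rms": rms,
        "peak": peak,
        "peak_to_peak": np.ptp(x),
        "variance": np.var(x),
        "skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "kurtosis": np.mean((x - mu) ** 4) / sigma ** 4,
        "waveform_index": rms / mean_abs,           # shape factor
        "impulsion_index": peak / mean_abs,         # impulse factor
        "tolerance_index": peak / np.mean(np.sqrt(np.abs(x))) ** 2,  # clearance factor
    }
```

For a pure sine of unit amplitude these reduce to known values (RMS of 1/sqrt(2), kurtosis of 1.5, zero skewness), which is a convenient sanity check.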
To obtain degradation information from the frequency domain of the vibration signals, we select F10: average frequency, F11: frequency variance, F12: gravity frequency, and F13: frequency standard deviation; the statistical indexes of the frequency domain taken from reference [46] are F14, F15, and F16 in the table. The average frequency reflects the vibration energy in the frequency domain. The frequency variance, F14, and F15 indicate the dispersion or concentration of the spectrum. The position change of the main frequency band is captured by the gravity frequency, the frequency standard deviation, and F16. The numerical explanation is shown in Table 3. Table 3. Numerical explanation of frequency domain.

(Table 3 lists each frequency-domain characteristic together with its expression.)
In Table 3, s(i) is the spectrum value of the effective IMF, i = 1, 2, 3, ..., N, where N is the total number of spectral lines and f_i is the frequency value of the i-th spectral line.
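A minimal sketch of the frequency-domain statistics over the one-sided spectrum, assuming the standard forms of the average spectral amplitude, gravity (centroid) frequency, and frequency variance; Table 3's exact expressions may differ.

```python
import numpy as np

def freq_features(x, fs):
    """Spectral statistics from the one-sided amplitude spectrum
    (assumed standard forms for F10-F13)."""
    s = np.abs(np.fft.rfft(x))                 # s(i): spectrum values
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)    # f_i: frequency of each line
    mean_amp = s.mean()                        # average spectral amplitude
    gravity = np.sum(f * s) / np.sum(s)        # gravity (centroid) frequency
    f_var = np.sum((f - gravity) ** 2 * s) / np.sum(s)   # frequency variance
    return mean_amp, gravity, f_var, np.sqrt(f_var)

fs = 1000.0
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)                 # single 50 Hz tone
_, gravity, f_var, f_std = freq_features(x, fs)  # gravity near 50 Hz, tiny spread
```

A single tone concentrates the spectrum at its frequency, so the gravity frequency sits at the tone and the frequency spread is near zero, which matches the interpretation given above.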
The characteristic values listed in Tables 2 and 3 represent the degradation trend of the piston pump from different perspectives and vary with the degradation state in both the time domain and the frequency domain. To further extract characteristic information on the different degradation states of the piston pump, the permutation entropy [47], approximate entropy [48], information entropy [49], and energy entropy [50] of the effective IMFs are calculated, forming a feature information set based on the IMF frequency band. The four entropy features of each degradation state are shown in Figure 7.
It can be seen from Figure 7 that the different degradation states are well stratified. As the degree of wear increases, the vibration signal becomes relatively chaotic and complex over the time series, while its entropy values remain comparatively stable.
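Of the four entropies, permutation entropy is the simplest to sketch. Below is a minimal NumPy implementation of the Bandt-Pompe definition, normalized by log(m!) so values lie in [0, 1]; the function name and defaults are choices made here, not taken from the paper.

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalized permutation entropy: Shannon entropy of the ordinal-pattern
    distribution of embedded vectors, divided by log(m!)."""
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        pattern = tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau]))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-np.sum(p * np.log(p)) / math.log(math.factorial(m)))
```

A monotonic signal produces a single ordinal pattern (entropy 0), while white noise spreads probability over all m! patterns (entropy near 1), which is why this measure separates regular from chaotic vibration.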

Feature Filtering by PCA
Excessively high feature dimensionality leads to overfitting, and some of the features are highly correlated, which increases the unnecessary calculation burden [51]. Therefore, the PCA algorithm is used in this paper to obtain new low-dimensional features, remove redundant features, and retain the main information of the original feature vectors while reducing the complexity of the classification and recognition model [52]. The results are shown in Figure 8. For convenience of calculation, the cumulative contribution rates under the different degradation states should coincide closely, and the same number of principal components is needed for each degradation state [43]. It can be seen from Figure 8 that, when the number of principal components is 6, the cumulative contribution rate of the four degradation states is about 96.4%, with a high degree of coincidence, and contains most of the degradation information of the feature set. When there are too many principal components, the explanatory power of the redundant components is smaller than that of a single variable, which only increases the calculation time and complexity of the model. On the contrary, when there are too few principal components, the whole feature set cannot be expressed, and if different degradation states had different numbers of principal components, the calculation would become more complex and reduce the accuracy of model prediction. Therefore, the first six principal components are used to replace the original 20-dimensional degradation feature vector, which reduces the data dimension and calculation cost. The first six principal components are used as new feature vectors to construct the training set and test set of the XGBoost model, respectively.
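The component-count selection can be sketched with a plain SVD-based PCA. This is a stand-in for the paper's toolchain; `pca_reduce`, the 96.4% target, and the rank-3 synthetic matrix are illustrative choices made here.

```python
import numpy as np

def pca_reduce(X, target=0.964):
    """Keep the fewest principal components whose cumulative explained-variance
    ratio reaches `target` (the paper retains 6 PCs at about 96.4%)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)   # cumulative contribution rate
    n_pc = int(np.searchsorted(cum, target) + 1)
    return Xc @ Vt[:n_pc].T, n_pc, cum

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 20))  # rank-3, 20-D
X_red, n_pc, cum = pca_reduce(X)   # at most 3 PCs needed: the data have rank 3
```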

Parameter Optimization for the XGBoost Model
To improve the performance of the model, the XGBoost parameters must be tuned. The XGBoost algorithm mainly includes three types of parameters: general parameters, booster parameters, and task parameters. Each type includes several specific parameters, among which the booster parameters have the greatest impact on algorithm performance. In this paper, with the other parameters fixed, the booster parameters max_depth (the maximum depth of the tree), min_child_weight (minimum sum of instance weight needed in a child), and n_estimators (number of base classifiers) are varied to find the optimal values, which avoids overfitting of the model and improves its accuracy.
The maximum depth of the trees and min_child_weight are two parameters that affect each other; if they are optimized one after the other, the search may only reach a local optimum. Moreover, the amount of data in this paper is small; therefore, grid search can be used to select the maximum tree depth and min_child_weight at the same time. The results are shown in Figure 9. When the maximum depth of the tree is 3 and the minimum weight sum of the leaf nodes is 1, the average state recognition accuracy is the highest, at 0.9917. Thus, max_depth = 3 and min_child_weight = 1. With these two parameters fixed, the number of decision trees was selected using the logarithmic loss function to evaluate the probability output of the XGBoost classifier [53]. The logarithmic loss function is defined as

$logloss = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij} \ln(p_{ij})$

where $N$ is the input sample size, $M$ is the number of categories, $y_{ij}$ indicates the real category of the $i$-th data point, and $p_{ij}$ is the probability, predicted by the XGBoost classifier, that the $i$-th data point belongs to the $j$-th class. The results are shown in Figure 10. The closer the logarithmic loss is to 0, the higher the accuracy of the classifier. In this paper, when the number of trees is 167, the negative logarithmic loss reaches its maximum of -0.0061, i.e., the loss is closest to 0, so the number of trees is set to 167. The parameters for XGBoost are shown in Table 4, and the remaining parameters keep their default values.
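The logarithmic loss above is straightforward to reproduce; here is a minimal sketch with a small epsilon clip for numerical safety (`multiclass_log_loss` is a name introduced here).

```python
import numpy as np

def multiclass_log_loss(y_onehot, p, eps=1e-15):
    """logloss = -(1/N) * sum_i sum_j y_ij * ln(p_ij); closer to 0 is better."""
    p = np.clip(p, eps, 1.0)   # avoid log(0)
    return float(-np.mean(np.sum(y_onehot * np.log(p), axis=1)))

# four degradation states: uniform guessing scores ln(4) ~ 1.386,
# while a confident near-one-hot prediction scores close to 0
y = np.eye(4)
uniform = np.full((4, 4), 0.25)
good = np.full((4, 4), 0.01)
np.fill_diagonal(good, 0.97)
```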

Experimental Results and Analysis
To better verify the efficiency and correctness of this method, K-fold cross validation [54] is used to calculate the recognition accuracy of the XGBoost model. The data set is divided into 10 parts; six parts are taken as training data, and the remaining four parts are taken as test data in turn for the experiments. The average recognition accuracy is taken as the final result. Moreover, the method adopted in this paper is compared with ANN, SVM, and GBDT, as shown in Table 5. It was found that the ANN performs best with 12 hidden-layer neurons, while the SVM uses an RBF kernel with a penalty coefficient of 4 and gamma of 0.1. For GBDT, parameter tuning gives n_estimators = 20, max_depth = 3, and min_samples_split (the minimum number of samples needed to split an internal node) = 10. The table reports the average accuracy and average calculation time over multiple state recognition runs. As the table shows, the average recognition accuracies of the four algorithms differ little, possibly because the feature values extracted by the method proposed in this paper have good discrimination; among them, SVM and XGBoost have the highest recognition accuracy of 0.997. SVM requires a long calculation time because of its kernel function; thus, with the same computer, the average decision time of XGBoost, 0.03 s, is less than that of SVM. Compared with the traditional GBDT model, XGBoost adds control of model complexity and pruning, which makes the trained model less prone to overfitting and reduces calculation time. The average recognition rate of GBDT is 0.994, and its average decision time is lower than those of SVM and ANN. The ANN needs repeated iterations to reach an acceptable classification effect, so its average decision time is higher than those of the other three algorithms.
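The validation loop can be sketched with scikit-learn. Note the assumptions: stratified 5-fold splitting replaces the paper's 10-part/6-train/4-test scheme, `GradientBoostingClassifier` stands in for XGBoost to keep the sketch dependency-light, and the 6-feature data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# synthetic stand-in data: 4 degradation states x 50 groups, 6 PCA features each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 6)) for c in range(4)])
y = np.repeat(np.arange(4), 50)

# GradientBoostingClassifier as a stand-in for the tuned XGBoost model
clf = GradientBoostingClassifier(n_estimators=50, max_depth=3, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)   # per-fold accuracies
mean_acc = scores.mean()                     # averaged final result
```

An `xgboost.XGBClassifier` with the tuned parameters from Table 4 would slot into the same `cross_val_score` call unchanged.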
In a complex working environment, the vibration data of the piston pump are complex and changeable, so a variety of characteristics must be extracted from them for effective diagnosis. Compared with ANN, the diagnostic accuracy of XGBoost is improved by 0.001 to 0.013, with no need for a tedious parameter optimization process. The XGBoost algorithm can not only ensure high diagnostic accuracy but also reduce calculation time. To further prove the effectiveness of the present work, several methods are introduced for comparison, with the results shown in Table 6. Without data preprocessing, the recognition rate is low due to the influence of noise; compared with EEMD, the recognition rate after ICEEMDAN preprocessing is relatively high. Therefore, the proposed method effectively identifies degradation states. It enables timely maintenance of mechanical equipment through prompt judgment, which has high practical value in the fault diagnosis of piston pumps.

Conclusions
In this paper, ICEEMDAN is used to denoise the vibration signal of the piston pump, PCA is used to reduce the feature dimension, and state recognition based on XGBoost is carried out to identify the wear state and category of the piston pump slipper. The experimental results show that: a. ICEEMDAN can decompose the vibration signal of the piston pump adaptively, improve decomposition efficiency, and suppress mode mixing. It is feasible to select the effective IMF component using the correlation coefficient. b. Through time-domain, frequency-domain, and entropy features, the deterioration process of the piston pump can be tracked and identified comprehensively. PCA reduces the data dimension and calculation cost, which improves the accuracy and efficiency of state identification.
c. The average recognition accuracy for the slipper wear state of the piston pump based on ICEEMDAN and XGBoost is 99.7%. Compared with the ANN, GBDT, and SVM algorithms, XGBoost identifies the four wear states better and saves computing time, which highlights the advantages of XGBoost after parameter optimization in pattern recognition.