Piston Wear Detection and Feature Selection Based on Vibration Signals Using the Improved Sparse Support Vector Machine for Axial Piston Pumps

A piston wear fault is a major failure mode of axial piston pumps, which may decrease their volumetric efficiency and service life. Although fault detection based on machine learning theory can achieve high accuracy, the performance mainly depends on the detection model and feature selection. Feature selection in learning has recently emerged as a crucial issue. Therefore, piston wear detection and feature selection are essential and urgent. In this paper, we propose a vibration signal-based methodology using the improved sparse support vector machine, which can integrate feature selection into the piston wear detection learning process. Forty features are defined to capture the piston wear signature in the time domain, frequency domain, and time-frequency domain. The relevance and impact of sparsity in the 40 features are illustrated through single and multiple statistical feature analyses. Model performance is assessed and the sparse features are discovered. The maximum model testing and training accuracies are 97.50% and 96.60%, respectively. Sparse features s10, s12, Ew(8), x7, Ee(5), and Ee(4) are selected and validated. Results show that the proposed methodology is applicable to piston wear detection and feature selection, with high model accuracy and good feature sparsity.


Introduction
Hydraulic transmission systems have been widely utilized in the aviation, aerospace, and marine industries, with the advantages of high power density and output efficiency [1,2]. Axial piston pumps are the key power components in hydraulic systems, and they convert rotating mechanical power into fluid power [3][4][5]. Typical structures of an axial piston pump are shown in Figure 1. Pistons are uniformly distributed around the center of the cylinder block. The swash plate and retainer drive the pistons reciprocating along the cylinder block center. The oil suction and extrusion are accomplished through these reciprocating movements. There are three friction pairs acting as bearing and sealing functions (marked with red lines in Figure 1): the slipper pair, the piston pair, and the valve plate pair. The piston pair has poor bearing and sealing performance due to the large lateral forces from the inclined swash plate [6][7][8], which may cause the piston surface to wear easily and become stuck. Piston wear is caused by surface microprotrusions; different microstructures have been proposed to improve the film characteristics of the piston pair [9]. Proper friction-matching materials can decrease piston wear; 940 stainless steel, F102 engineering plastics, and Al2O3 ceramic were used as matching materials in a water hydraulic pump [10]. Ceramic and 30Cr2MoVA materials were applied to an ultra-high-pressure pump [11]. In addition, an anti-friction coating was added to high-lead bronze, and the mechanical properties and wear mechanism of the coating were investigated.
Numerous types of research have been carried out for the fault detection of axial piston pumps based on vibration signals. Methods based on cluster analysis [13], particle swarm optimization [14], artificial neural networks [15], hidden semi-Markov models [16], and deep belief networks [17,18] have been proposed, and features in the time domain (TD), frequency domain (FD), and time-frequency domain (TFD), including kurtosis, the root mean square, the energy ratio, the coefficients of the wavelet packet transform (WPT), and spectral entropy, are used as model indicators.
Although fault detection based on machine learning theory can achieve satisfactory accuracy, the performance is heavily dependent on model and feature selection [19]. Gaussian process regression (GPR), extreme gradient boosting, and the support vector machine (SVM) have been used for modeling the specific heat capacity in solar energy applications [20]. Linear discriminant analysis, logistic regression, decision trees, random forests, LightGBM, and multi-layer perceptrons have been utilized for the anomaly detection of hydraulic systems [21]. GPR and support vector regression (SVR) have been employed to predict energy storage properties [22]. In addition, a convolutional neural network has been proposed for the fault diagnosis of axial piston pumps [23]. Among the machine learning methods, the SVM, with powerful generalization ability and high learning efficiency, has been widely used as a fault detection model for bearings [24][25][26], motors [27], gearboxes [28,29], and pumps [30]. Improved SVM algorithms include the wavelet transform-based SVM [31], the least squares SVM [32,33], the hyper-sphere-structured SVM [34], the proximal SVM [35], etc. In addition, feature selection in learning has recently emerged as a crucial issue [36,37]. Various forms of regularization have been presented for feature selection during the machine learning process. The ℓ2 regularization does not enforce sparsity, so its feature selection results are poor. Numerous studies based on the ℓ1 form, the ℓ1/ℓ2 form, and the ℓ1 + ℓ2 form have emerged to induce sparsity for feature selection in learning [38]. The ℓ1 regularization is widely utilized to select features during the machine learning process.
The model and feature selection during the fault detection of axial piston pumps are mostly based on expert knowledge. However, the relevance and impact of sparsity in model features are unclear. A piston wear fault is a major failure mode of axial piston pumps, but recent studies are not specific to piston wear detection and feature selection. The classical SVM is unable to complete feature selection during the learning process. In the present study, a methodology using the improved SPARE-SVM is proposed for piston wear detection and feature selection for the first time. This methodology integrates feature selection into the machine learning process. A large set of features is employed in the TD, FD, and TFD. The relevance and impact of sparsity in these features are illustrated through single and multiple statistical feature analyses, model performance evaluation, and feature selection analysis.

Experimental Setup and Procedure
The testing pumps are axial piston pumps with swash-plate-type variable control. Detailed parameters are shown in Table 1. The testing pump has nine pistons. The materials of the piston pair are 38CrMoAl and CuPb15Sn5, and the piston diameter is 17 mm. A large fitting clearance of the piston pair will increase the leakage, while a small fitting clearance may lead to the piston becoming stuck. The normal fitting clearance between the piston and the cylinder for the testing pump is approximately 0.03 mm. A piston wear fault causes the fitting clearance to exceed the tolerance level, and abnormal vibrations occur simultaneously. In this paper, additional clearance is machined with a centerless grinder to simulate the piston wear fault. As shown in Figure 2, the diameter of the three marked pistons (A, B, and C) is reduced by 0.03 mm.

The testing rigs for the testing pump are shown in Figure 3.
The testing pump is driven by an inverter-fed motor through a drum-gear coupling. A torque-speed transducer, located between the testing pump and the motor, monitors the torque and rotating speed, and the frequency converter regulates the motor rotating speed. The discharge pressure is regulated by a pressure relief valve with a maximum pressure of 35 MPa, and it is measured by a pressure transducer and a pressure gauge. The pump output flow is measured by a flow transducer. The vibration signal is used for piston wear fault detection and feature selection. The vibration acceleration at point 1 is collected by a piezoelectric accelerometer (Type 4507-B-001, Brüel & Kjaer, Virum, Denmark) and a data acquisition module (Type LAN-XI 3050, Brüel & Kjaer, Virum, Denmark). The sampling frequency of the vibration signals is 48,000 Hz. A high-pass filter with a cut-off frequency of 7 Hz is utilized to denoise the raw vibration signals.
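The denoising step can be illustrated with a simple single-pole high-pass filter. This is only a sketch under the assumption of a basic first-order design; the paper does not specify which filter was actually used.

```python
import math

def highpass(signal, fs=48000.0, fc=7.0):
    """Single-pole high-pass filter: suppresses DC and very-low-frequency
    drift below the cut-off while passing the vibration content.
    Illustrative only; the paper's actual filter design is unspecified."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * math.pi * fc)
    alpha = rc / (rc + dt)  # close to 1 because fc << fs
    out = [signal[0]]
    for n in range(1, len(signal)):
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        out.append(alpha * (out[-1] + signal[n] - signal[n - 1]))
    return out
```

With fc = 7 Hz and fs = 48,000 Hz, a constant (0 Hz) input decays toward zero while the pump's harmonic content is essentially unchanged.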
A testing pump with normal pistons (TPNP) and a testing pump with worn pistons (TPWP) are tested in the test rig, respectively. Vibration signals are compared in the TD, FD, and TFD. The denoised signals of the vibration acceleration on point 1 for the TPNP and TPWP are shown in Figure 4a,c, respectively. There are 9600 points (0.2 s) of the waveform in the TD. It is difficult to distinguish differences in the TD waveforms in Figure 4a,c. The signal's inherent nature in the FD can be disclosed by Fourier transform (FT) analysis. The FT can reflect the frequency components and distribution of the vibration signal in the FD. There will be some abnormal frequencies in the signal spectra when a mechanical fault occurs. Spectra in the FD of the TPNP and TPWP are shown in Figure 4b,d, respectively. The frequency ranges from 0 Hz to 6000 Hz. Some differences in the spectrum distributions are presented. The amplitude of different harmonics is variable.
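The FT spectra above can be reproduced with a direct DFT; a minimal stdlib sketch follows (in practice an FFT routine would be used for 9600-point records, since the direct form is O(N^2)).

```python
import cmath
import math

def dft_magnitude(x):
    """Magnitude spectrum of a real signal via the direct DFT.
    O(N^2), so only practical for short illustrative signals."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) / N
            for k in range(N)]
```

For a pure tone occupying bin k of an N-point record, the spectrum shows peaks of magnitude 0.5 at bins k and N - k, so a fault-related harmonic shows up as a localized rise in the corresponding bins.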
Signal comparison in the TD and FD is based on stationary signals. The frequency spectrum obtained from the FT may fluctuate with time, because the mechanical fault signal is not stationary when a fault occurs. Signal processing methods in the TFD have been proposed to analyze non-stationary signals, including the short-time Fourier transform, the Wigner-Ville distribution, the wavelet transform (WT), the WPT, empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD). The WT or WPT decomposes a signal into an approximated component and a detailed component. Moreover, the WPT has high frequency-resolution power in both the low and high frequency ranges, because this transform can choose the matched frequency band adaptively. A signal can be divided into a series of intrinsic mode functions (IMFs) and a residue by the EMD or EEMD algorithm. Furthermore, the EEMD algorithm is able to overcome the mode mixing issue by adding white noise with a finite amplitude, and it obtains better performance.
Therefore, apart from signal comparisons in the TD and FD, TFD analyses based on the WPT and EEMD algorithms are conducted on the denoised vibration signals. The wavelet is sym4 and the decomposition level is 3 for the WPT algorithm. The WPT-reconstructed coefficients of the TPNP and TPWP are shown in Figure 5. The maximum amplitude of the fifth frequency sub-band of the TPNP is approximately 2, while the maximum amplitude of the fifth frequency sub-band of the TPWP is approximately 4. The maximum amplitude of the sixth frequency sub-band of the TPNP is approximately 5, while the maximum amplitude of the sixth frequency sub-band of the TPWP is approximately 4. In addition, there are also some differences in the waveforms of the WPT-reconstructed coefficients.
For the EEMD algorithm, the ratio of the standard deviation for the added noise is 0.2. The ensemble number is 50 and the IMF number is 7. The EEMD IMFs of the TPNP and TPWP are shown in Figure 6. The maximum amplitude of the seventh IMF of the TPNP is approximately 0.05, while the maximum amplitude of the seventh IMF of the TPWP is approximately 0.1. In addition, there are also some differences in the waveforms of the IMFs.
Results show that some differences in the TD, FD, and TFD can be found when the pump has a piston wear fault. These differences are caused by waveform distortions and spectrum energy relocation. Therefore, it is feasible to detect the piston wear fault through the vibration signal. This paper proposes an improved SPARE-SVM-based methodology for the piston wear detection and feature selection of axial piston pumps.
The training dataset contains 220 samples of the TPNP and 45 samples of the TPWP, and the testing dataset includes 30 samples of the TPNP and 10 samples of the TPWP. The length of a single sample is 512 data points. The training and testing samples are independent of each other.
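Sample construction from the continuous vibration record can be sketched as non-overlapping 512-point windows. Whether the paper overlaps consecutive samples is not stated; no overlap is assumed here.

```python
def segment(signal, length=512):
    """Split a vibration record into non-overlapping samples of
    `length` points; trailing points that do not fill a full
    sample are discarded."""
    return [signal[i:i + length]
            for i in range(0, len(signal) - length + 1, length)]
```

One second of data at 48,000 Hz yields 93 full 512-point samples under this scheme.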

Improved SPARE-SVM-Based Methodology
The SVM algorithm has been widely used in various applications, with the advantages of uncomplicated principles, easy calculation, and high classification accuracy. This algorithm aims to construct a separating hyper-plane that divides the dataset (xi, i = 1, 2, ···, N) into two classes (yi = ±1). The maximum distance between the upper and lower boundaries of the hyper-plane is called the maximized margin, and the support vectors are the data points located on these margin boundaries. The separating hyper-plane of the SVM algorithm is constructed by structural risk minimization [25].
The optimal hyper-plane is built from the parameters (ω, b), and the parameter estimates are calculated as

(ω, b) = argmin (1/2)‖ω‖₂² + C Σᵢ H(yᵢ(ωᵀxᵢ + b)),

where the parameter ω is an N-dimensional vector and the parameter b is a scalar. ‖ω‖₂² is the ℓ2 form of the parameter ω. H(·) represents the hinge loss function of the i-th data point (xᵢ, yᵢ), and the weight of H(·) is adjusted by the parameter C.
The classical SVM algorithm is unable to complete feature selection during the machine learning process because the ℓ2 form of the parameter ω is used. In addition, imbalanced datasets of the two classes (yᵢ = ±1) may also decrease the parameter sparsity and model accuracy. Therefore, the improved SPARE-SVM algorithm is put forward, using the ℓ1 regularization form, the square hinge loss function H²(·), and the weights of the imbalanced datasets [39]. This algorithm can integrate the feature selection into the fault detection learning process.
The ℓ1 norm can impose sparsity on the parameter ω. Meanwhile, it renders the solution of the optimal hyper-plane difficult, as the ℓ1 norm is not differentiable. The square hinge loss function H²(·) is able to deal with the non-differentiable problem of the ℓ1 norm and solve the optimal hyper-plane. The weights of the imbalanced datasets are imposed to increase the robustness and precision of the models. Parameter estimates of the improved SPARE-SVM algorithm are calculated as follows:

(ω, b) = argmin ‖ω‖₁ + (C/N₋₁) Σ_{yᵢ=−1} H²(yᵢ(ωᵀxᵢ + b)) + (C/N₊₁) Σ_{yᵢ=+1} H²(yᵢ(ωᵀxᵢ + b)),

where N₋₁ is the number of samples in the class (yᵢ = −1) and N₊₁ refers to the number of samples in the other class (yᵢ = +1). The ℓ1 regularization norm of the parameter ω is non-differentiable. The forward-backward splitting algorithm can solve this optimization problem using the gradient descent step of the square hinge loss function H²(·). The proximity operator of the parameter ω, the element-wise soft-thresholding operator of the ℓ1 norm, is defined as

prox_λ(ω)ᵢ = sign(ωᵢ) · max(|ωᵢ| − λ, 0),

where λ is the ℓ1 regularization threshold. The parameter ω is obtained by the iterative calculation of the forward-backward splitting algorithm.
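The proximity operator is a one-line soft-thresholding rule; a minimal sketch:

```python
import math

def soft_threshold(w, lam):
    """Proximity operator of lam * ||.||_1: shrinks every weight
    toward zero by lam and sets weights smaller than lam to
    exactly zero, which is what produces feature sparsity."""
    return [math.copysign(max(abs(wi) - lam, 0.0), wi) for wi in w]
```

Weights whose magnitude falls below the threshold are removed outright, so features with persistently small weights drop out of the model.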
ω(z) = prox_λ(ω(z − 1) − (1/D)∇H²(ω(z − 1))),

where ω(z) and ω(z − 1) are the parameter estimates of the z-th and (z − 1)-th iterations. The parameter D is a constant of the gradient descent order, and ∇H²(·) refers to the Lipschitz-continuous gradient of the square hinge loss function.
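Putting the gradient step and the proximity step together gives the training loop. The sketch below uses synthetic data and illustrative values for C, the step size, and the iteration count; it is not the paper's implementation.

```python
import math

def soft_threshold(w, lam):
    """Proximity operator of lam * ||.||_1 (element-wise shrinkage)."""
    return [math.copysign(max(abs(wi) - lam, 0.0), wi) for wi in w]

def train_sparse_svm(X, y, C=1.0, step=0.01, iters=500):
    """Forward-backward splitting for the l1-regularized, class-weighted
    squared-hinge SVM. C, step and iters are illustrative choices."""
    n_feat = len(X[0])
    n_neg = sum(1 for yi in y if yi == -1)
    n_pos = len(y) - n_neg
    w, b = [0.0] * n_feat, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # active squared-hinge term
                cw = C / (n_neg if yi == -1 else n_pos)  # class weight
                coef = -2.0 * cw * (1.0 - margin) * yi
                gw = [g + coef * xj for g, xj in zip(gw, xi)]
                gb += coef
        # forward (gradient) step on the loss, backward (proximal) step on l1
        w = soft_threshold([wj - step * g for wj, g in zip(w, gw)], step)
        b -= step * gb  # the bias is not regularized
    return w, b
```

On a toy problem where only the first feature carries class information, the iteration both classifies correctly and drives the uninformative weight to exactly zero, which is the feature-selection behavior exploited in the paper.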
The improved SPARE-SVM-based methodology is proposed for the piston wear detection and feature selection of axial piston pumps. The flowchart of this methodology is shown in Figure 7. First, the vibration signals of the axial piston pump are collected by the data acquisition module. Second, a comprehensive set of features is defined in the TD, FD, and TFD. Third, single statistical feature analysis is conducted to investigate the feature correlations and model performance obtained from a single feature. Fourth, multiple statistical feature analysis is used to illustrate the weight variations and model performance of multiple features in the TD, FD, and TFD. Fifth, the weight variations and model accuracy of the large set of features are analyzed through the model performance evaluation. Lastly, feature selection analysis is utilized to validate the feature sparsity. Differences in the waveforms in the TD, FD, and TFD are shown through the signal comparison in Section 2.1. Statistical feature calculation in the TD [40,41], FD [42], and TFD [43,44] is conducted, and 40 features are constructed to capture the fault characteristics from different aspects.
Statistical features used in the TD are listed in Table 2, where xi (i = 1, 2, …, 11) denotes the i-th feature, x(n) (n = 1, 2, …, N) refers to the TD signal, and N is the data length of the vibration signal. The mean value and standard deviation (x1 and x2) reflect the average and deviation of the TD signal. The root amplitude, root mean square, and peak value (x3, x4, and x5) refer to the signal amplitude and energy. The skewness (x6) measures the data asymmetry around the mean value. The kurtosis (x7) shows the outlier proneness of a distribution. The crest factor (x8) and clearance factor (x9) indicate peak severity. The shape factor (x10) reveals the shape of the distribution. The impulse factor (x11) detects the impulse components of the distribution.
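A few of the Table 2 features can be computed directly from a sample. The formulas below are the standard definitions and are assumed to match the paper's (e.g., kurtosis here uses the m4/std^4 convention without subtracting 3).

```python
import math

def td_features(x):
    """Selected time-domain features from Table 2 (standard forms assumed)."""
    N = len(x)
    mean = sum(x) / N                                      # x1: mean value
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / N)   # x2: standard deviation
    rms = math.sqrt(sum(v * v for v in x) / N)             # x4: root mean square
    peak = max(abs(v) for v in x)                          # x5: peak value
    kurt = sum((v - mean) ** 4 for v in x) / N / std ** 4  # x7: kurtosis
    mean_abs = sum(abs(v) for v in x) / N
    crest = peak / rms                                     # x8: crest factor
    shape = rms / mean_abs                                 # x10: shape factor
    impulse = peak / mean_abs                              # x11: impulse factor
    return {"mean": mean, "std": std, "rms": rms, "peak": peak,
            "kurtosis": kurt, "crest": crest, "shape": shape,
            "impulse": impulse}
```

For a unit-amplitude sinusoid, for instance, the RMS is 1/sqrt(2), the crest factor is sqrt(2), and the kurtosis is 1.5, which gives a quick sanity check of the implementation.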
Statistical features in the FD are shown in Table 3, where si (i = 1, 2, …, 14) denotes the i-th feature, s(k) (k = 1, 2, …, K) is the amplitude and f(k) (k = 1, 2, …, K) represents the corresponding frequency of the k-th FT spectrum line, and K refers to the number of FT spectrum lines. The mean frequency (s1) reflects the signal energy in the FD. The frequency center, root 2 order weighting, root 4-2 order weighting, 2-4 order weighting, 2 order center moment, 3 order center moment, 4 order center moment, and root 2 order center moment (s2, s3, s4, s5, s6, s7, s8, and s9) refer to the main frequency position changes in the frequency spectrum. The root 2 order convergence index, 3 order convergence index, 4 order convergence index, 1/2 order convergence index, and root 2 order center moment convergence index (s10, s11, s12, s13, and s14) reveal the convergence characteristics of the frequency spectrum power.

Signal features obtained from the TFD are effective compensations for the statistical features in the TD and FD. The energy ratio of the frequency sub-bands decomposed by the WPT and the energy ratio of the IMFs decomposed by the EEMD are used as statistical features in the TFD. As shown in Table 4, Ew(i) (i = 1, 2, …, 2^m) denotes the i-th frequency sub-band feature of the WPT, xi(j) refers to the i-th frequency sub-band, and m represents the decomposition level (m = 3). For the EEMD algorithm, Ee(i) (i = 1, 2, …, L) denotes the i-th IMF feature, ci(j) refers to the i-th IMF, and L represents the number of IMFs (L = 7).
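Two representative Table 3 features and the Table 4 energy ratios can be sketched as follows. Standard definitions are assumed: s1 here is the mean of the spectrum amplitudes and s2 the amplitude-weighted frequency center, and the exact forms in the paper's tables may differ.

```python
def fd_features(s, f):
    """Mean frequency s1 and frequency center s2 from spectrum
    amplitudes s(k) at frequencies f(k) (assumed standard forms)."""
    K = len(s)
    s1 = sum(s) / K
    s2 = sum(fk * sk for fk, sk in zip(f, s)) / sum(s)
    return s1, s2

def energy_ratio(components):
    """Table 4 features: energy share of each WPT sub-band or
    EEMD IMF relative to the total signal energy."""
    energies = [sum(v * v for v in comp) for comp in components]
    total = sum(energies)
    return [e / total for e in energies]
```

The energy ratios always sum to 1, so a fault that relocates spectrum energy (as observed in Figures 5 and 6) shifts the share between sub-bands or IMFs.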

Single Statistical Feature Analysis
Statistical feature correlations and the model performance obtained from a single feature were investigated via single statistical feature analysis. The correlation analysis of the features of all training samples was carried out. The absolute values of the Pearson correlation coefficients (PCCs) are shown in Figure 8. The diagonal entries are 1 because every statistical feature is perfectly correlated with itself. Statistical features in the TD or FD have strong correlations. For example, the root mean square (x4) is directly correlated to the standard deviation (x2), as the PCC is 1. The PCCs between x9 and x8, x7 are 0.954 and 0.959, respectively. The PCCs between x11 and x9, x8, x7 are 0.996, 0.977, and 0.946, respectively.
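The PCCs reported above can be computed per feature pair; a minimal sketch:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two feature vectors
    (one value of a feature per training sample)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)
```

Values of +1 or -1 indicate a perfect linear relation, which is why directly related features such as x2 and x4 reach a PCC of 1.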

Moreover, some statistical features in the FD are highly correlated with those in the TD. For instance, the PCCs between s1 and x4, x2 equal 0.943. The PCCs between s6 and x4, x2 equal 0.976. The PCCs between s13 and x4, x2 equal 0.930. The PCCs between s14 and x4, x2 equal 0.915. These strong correlations arise because the calculation processes of some statistical features in the TD and FD are related to each other.
In terms of statistical features in the TFD, most features calculated by the WPT and EEMD have lower correlations with each other, owing to the frequency decomposition of the vibration signals. Therefore, statistical features in the TFD, with their lower mutual correlations, have better independence than features in the TD or FD.
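The correlation screening described above can be sketched as follows; the toy feature columns are hypothetical stand-ins for the measured features, used only to illustrate the computation of the |PCC| matrix in Figure 8.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two feature columns."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def abs_pcc_matrix(features):
    """|PCC| matrix for a dict of named feature columns (as in Figure 8)."""
    names = list(features)
    return {(p, q): abs(pearson(features[p], features[q]))
            for p in names for q in names}

# toy columns: x2 (std-like) and x4 (rms-like) move together, x7 does not
cols = {
    "x2": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x4": [2.1, 4.0, 6.2, 8.1, 9.9],   # nearly proportional to x2
    "x7": [3.0, 1.0, 4.0, 1.0, 5.0],
}
m = abs_pcc_matrix(cols)
print(round(m[("x2", "x4")], 3))  # close to 1: strongly correlated
```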
A single statistical feature is then utilized as the model indicator for piston wear detection. The model accuracy on the training dataset (Acctr) and the testing dataset (Accte) is shown in Figure 9. Features Ee(4), Ee(6), and Ee(7) in the TFD yield the best testing performance, with an Accte of 82.50%. Feature Ew(7) has the best training performance, with the maximum Acctr of 83.02%, and its Accte equals 80.00%. In addition, most features in the FD have better model accuracy than features in the TD: features s2, s3, s4, s5, s6, s7, s10, and s11 in the FD have larger Acctr values than the TD features. A single energy-ratio feature is thus able to capture the piston wear information. The results of the single statistical feature analysis show that single features in the TFD have better independence and model accuracy.
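To illustrate how a single feature can be scored as a standalone wear indicator, the sketch below rates each feature by the best accuracy of a one-dimensional threshold rule. This decision stump is a simplified stand-in for the SVM-based detector used in the paper, and the data are synthetic.

```python
def stump_accuracy(values, labels):
    """Best single-threshold accuracy for one feature (labels in {0, 1})."""
    best = 0.0
    cuts = sorted(set(values))
    for i in range(len(cuts) - 1):
        t = (cuts[i] + cuts[i + 1]) / 2   # candidate threshold between samples
        for sign in (1, -1):              # try both orientations of the rule
            pred = [1 if sign * v > sign * t else 0 for v in values]
            acc = sum(p == y for p, y in zip(pred, labels)) / len(labels)
            best = max(best, acc)
    return best

# toy data: a discriminative feature vs. an uninformative one
labels = [0, 0, 0, 0, 1, 1, 1, 1]
good = [0.1, 0.2, 0.15, 0.3, 0.8, 0.9, 0.85, 0.7]   # separable by one cut
poor = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.3, 0.7]   # classes interleaved
print(stump_accuracy(good, labels), stump_accuracy(poor, labels))
```

Ranking the forty features by such a per-feature score is what Figure 9 reports, with the SVM in place of the stump.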

Multiple Statistical Feature Analysis
Weight variations of multiple features in the TD, FD, and TFD are shown in Figure 10. The abscissas are logarithmic coordinates of the parameter C, which ranges from 1.00 × 10−3 to 1.00 × 102 and adjusts the weight of the square hinge loss function H2(·). The vertical coordinates are the feature weights varying with C during the piston wear detection process. Figure 10a shows that x3, x4, x7, and x8 are the first four features selected by the multiple statistical feature analysis using the improved SPARE-SVM algorithm; these features are excluded as C increases. Among the strongly correlated TD features (x7, x8, x9, and x11), only x9 and x11 remain included as C increases. As shown in Figure 10b, features s5, s11, and s13 are selected among the multiple correlated features in the FD. Weight variations of the features calculated by the WPT are shown in Figure 10c: Ew(7) and Ew(8) are the first two features selected, and they remain included throughout the piston wear detection process. The first three features selected from the EEMD are Ee(4), Ee(5), and Ee(6), as shown in Figure 10d, and they continue to be selected as C increases. Multiple statistical features calculated from the TD, FD, WPT, and EEMD are then employed as model indicators for piston wear detection, and the resulting model accuracy Acctr and Accte is shown in Figure 11. Multiple features in the FD have the maximum model accuracy, with Acctr and Accte of 95.85% and 92.50%, respectively.
Features in the TD have the minimum model accuracy, with Acctr and Accte of 71.32% and 72.50%, respectively. The Acctr and Accte for the WPT features are 83.02% and 80.00%, and those for the EEMD features are 80.38% and 85.00%, respectively. The model accuracy of multiple statistical features in the TFD is therefore larger than that of features in the TD. These results show that piston wear detection based on multiple statistical features in the TD alone is not adequate; features in the FD and TFD effectively compensate the detection model, and their model accuracy is higher and satisfactory.
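The sparsity mechanism underlying this analysis can be illustrated with a minimal proximal-gradient solver for an l1-regularized squared hinge loss. This is a simplified stand-in for the improved SPARE-SVM (the paper's exact formulation and solver are not reproduced here), but it shows how the parameter C trades feature sparsity against the data-fit term: with a small C the l1 term drives all weights to zero, while a large C lets the informative feature keep a nonzero weight.

```python
def soft_threshold(w, t):
    """Proximal operator of the l1 norm: shrink each weight toward zero by t."""
    return [max(abs(v) - t, 0.0) * (1 if v > 0 else -1) for v in w]

def sparse_svm(X, y, C, steps=2000, lr=0.01):
    """Minimize ||w||_1 + C * sum(max(0, 1 - y*(w.x))**2) by proximal gradient."""
    d = len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            margin = 1 - yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin > 0:  # squared hinge gradient: -2*C*margin*y*x
                for j in range(d):
                    grad[j] += -2 * C * margin * yi * xi[j]
        w = soft_threshold([wj - lr * gj for wj, gj in zip(w, grad)], lr)
    return w

# toy data: feature 0 carries the label, feature 1 is noise
X = [[1.0, 0.3], [0.9, -0.2], [-1.1, 0.1], [-0.8, -0.4]]
y = [1, 1, -1, -1]
w_small_C = sparse_svm(X, y, C=0.01)  # strong l1 pressure: all weights vanish
w_large_C = sparse_svm(X, y, C=10.0)  # loss dominates: informative weight survives
print(w_small_C, w_large_C)
```

Sweeping C in such a solver and recording which weights are nonzero reproduces, in miniature, the selection paths plotted in Figure 10.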

Figure 11. Model performance obtained from multiple statistical features.

Model Performance Evaluation
The large set of features in the TD, FD, and TFD is used simultaneously as the input of the improved SPARE-SVM model for piston wear detection. All feature weights varying with parameter C during the model learning process are shown in Figure 12a. All statistical features are excluded when C is smaller than 9.11 × 10−3. The energy ratio features Ew(7), Ew(8), and Ee(4) are the first three features selected, owing to their high independence and model accuracy, as shown in Figures 8 and 9, respectively. These variations agree with the results of the multiple statistical feature analysis shown in Figure 10c,d. Most statistical features calculated from the TD, WPT, and EEMD have decreasing weights as parameter C continues to increase, while the weights of some statistical features (s1, s10, s12, and s13) in the FD increase with C, in agreement with the results shown in Figure 10b.

The model accuracy Acctr and Accte varying with parameter C during the model learning process is shown in Figure 12b. There is a sudden change in the model accuracy when C is around 9.11 × 10−3, because below this value the large set of features in the TD, FD, and TFD is not selected during the model learning process. The Accte increases with parameter C and reaches its maximum of 97.50% when C is around 4.75 × 10−1, at which point the Acctr is 96.60%. The Accte is not improved as C increases further, and it begins to decline when C is larger than 6.00 × 10−1, while the Acctr keeps increasing gradually, which indicates that over-fitting occurs in the model learning process. Parameter C is therefore able to regulate the trade-off between feature sparsity and model accuracy, and the optimal C is chosen as 4.75 × 10−1 through the model performance evaluation. The model accuracy obtained with all features is higher than that obtained from the single and multiple statistical feature analysis, and is thus more satisfactory.

Feature Selection Analysis

Considering that the classical SVM algorithm cannot complete feature selection during the machine learning process, the feature selection analysis is conducted only with the improved SPARE-SVM algorithm. The weights of the large set of features in the TD, FD, and TFD with C = 4.75 × 10−1 are shown in Figure 13. Statistical features x1, x3, x7, x11, s1, s4, s6, s10, s12, Ew(2), Ew(4), Ew(7), Ew(8), Ee(1), Ee(3), Ee(4), Ee(5), and Ee(6) are selected by the improved SPARE-SVM algorithm, while features x2, x4, x5, x6, x8, x9, x10, s2, s3, s5, s7, s8, s9, s11, s13, s14, Ew(1), Ew(3), Ew(5), Ew(6), Ee(2), and Ee(7) are excluded with zero weights. Features s10 and s12 in the FD have larger weights than the other features: the weights of s10 and s12 are −0.68 and −0.57, respectively. The weights of features x3 and x7 in the TD are 0.11 and −0.21; the weights of Ew(2), Ew(7), and Ew(8) calculated by the WPT are 0.10, −0.14, and −0.21; and the weights of Ee(4) and Ee(5) calculated by the EEMD are −0.17 and 0.20, respectively. The order of the first six sparse features is s10, s12, Ew(8), x7, Ee(5), and Ee(4).
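Given the reported weights, the sparse feature ranking can be reproduced by discarding zero-weight features and sorting the rest by absolute weight, as in the short sketch below (only the selected features with stated weights are listed; note that x7 and Ew(8) tie at |−0.21|, so their relative order is not determined by the magnitudes alone).

```python
# weights reported for selected features at C = 4.75e-1 (subset with stated values)
weights = {
    "s10": -0.68, "s12": -0.57, "x3": 0.11, "x7": -0.21,
    "Ew(2)": 0.10, "Ew(7)": -0.14, "Ew(8)": -0.21,
    "Ee(4)": -0.17, "Ee(5)": 0.20,
}
# rank the surviving features by the magnitude of their weight
ranked = sorted(weights, key=lambda k: abs(weights[k]), reverse=True)
print(ranked[:6])  # six largest |weights|: s10, s12, then Ew(8)/x7 (tied), Ee(5), Ee(4)
```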
Different cases of additional clearance (0.03 mm, 0.06 mm, and 0.09 mm) are investigated to validate the six sparse features selected by the improved SPARE-SVM algorithm for piston wear detection. Feature comparisons under the different piston wear cases are shown in Figure 14. The abscissa is the sample number, and the sparse features of 55 samples are calculated.
As shown in Figure 14a,b, the different wear cases affect the convergence characteristics of the frequency spectrum power, especially the root 2-order convergence index (s10) and the 4-order convergence index (s12): both indexes become larger as the clearance increases. The case of the normal piston can be identified from the energy ratio features Ew(8), Ee(5), and Ee(4), as shown in Figure 14c,e,f, respectively, and the 0.09 mm case is distinguished from the other three cases by the kurtosis (x7). Owing to these weight differences, the six major sparse features of the piston wear fault are s10, s12, Ew(8), x7, Ee(5), and Ee(4). The results show that the improved SPARE-SVM model is effective for piston wear fault detection and sparse feature selection.

Conclusions
This paper proposes a methodology using the improved SPARE-SVM for piston wear detection and feature selection for the first time. An experimental investigation was carried out, and the following conclusions were drawn.