A Rail Fault Diagnosis Method Based on Quartic C2 Hermite Improved Empirical Mode Decomposition Algorithm

For compound fault detection of high-speed rail vibration signals, which presents a difficult problem, an early fault diagnosis method of an improved empirical mode decomposition (EMD) algorithm based on quartic C2 Hermite interpolation is presented. First, the quartic C2 Hermite interpolation improved EMD algorithm is used to decompose the original signal, and the intrinsic mode function (IMF) components are obtained. Second, singular value decomposition for the IMF components is performed to determine the principal components of the signal. Then, the signal is reconstructed and the kurtosis and approximate entropy values are calculated as the eigenvalues of fault diagnosis. Finally, fault diagnosis is executed based on the support vector machine (SVM). This method is applied for the fault diagnosis of high-speed rails, and experimental results show that the method presented in this paper is superior to the traditional EMD algorithm and greatly improves the accuracy of fault diagnosis.


Introduction
High-speed trains have developed rapidly in the recent years. Compared to the traditional trains, high-speed trains have advantages such as fast speed, strong carrying capacity, green environmental protection, high safety, and good economic benefits. However, when the high-speed train accelerates and brakes, the rails are prone to severe friction, crushing, and impact. Under harsh working conditions, the rails easily fail. If these failures are not discovered in time, the defects in the rails can expand rapidly causing the rails to break, which can result in serious safety concerns such as accidents. Hence great importance is placed on early fault diagnosis studies of the high-speed rails.
Domestic and foreign scholars have conducted a lot of research on the safety of high-speed rails. Yin Kai used image detection and other related technologies to detect faults in the rails [1]. Li Haiqing analyzed high-speed rail vibration signals and used an empirical mode decomposition (EMD) algorithm to extract the rail damage information for fault detection [2]. Xu Peng et al. used the differential eddy current detection system to detect cracks in high-speed rails and obtained the amplitude and phase changes from the crack detection signals; then they deduced the relation between the detection signal and the crack [3]. Xiongxiang Liu used deep convolutional neural networks to identify the surface defects in the two-dimensional images of the rails [4]. However, the deep learning model has high computational complexity and limited recognition accuracy, which may lead to misjudgment. There are other nondestructive testing methods such as magnetic particle testing [5], The rest of the paper is arranged as follows. Section 2 introduced the traditional EMD algorithm and its undershoot problem. Aiming at addressing this problem, an improved EMD algorithm with quartic C 2 Hermite [27] interpolation is presented. Section 3 introduced the singular value decomposition algorithm and the signal reconstruction method used in this paper. Section 4 introduced the fault features used in this paper, which are kurtosis and approximate entropy, and the whole rail fault diagnosis process. Section 5 introduced relevant experiments to verify the effectiveness of the algorithm.

EMD Principle
The EMD method was proposed by Huang et al. in 1988. This method exhibits outstanding performance in dealing with non-linear and non-stationary random signals. Therefore, it is widely used in the fields of signal denoising, signal features extraction, and so on. The EMD method decomposes the original signal into multiple IMF components based on the local feature time scale [12]. The conditions that must be met for IMF are as follows: (1) The numbers of extreme points and zero points must be equal or at most different in one over the length of the data; (2) At any data point, the average of the envelope of the local maximum and that of the local minimum must be zero.
The signal that can be decomposed with the EMD algorithm needs to satisfy the following conditions: (1) Have at least two extreme points, including a maximum point and a minimum point; (2) A signal feature time scale is determined by the time interval between extreme points; (3) Have no extreme points; the first-order or higher-order derivatives of inflection points can be taken as extreme points.
For a signal named X, the EMD steps are as follows: (1) Determine all the local extreme points of the signal, use the cubic spline interpolation to fit all local maximum points to form the upper envelope curve, and fit all local minimum points to form the lower envelope curve.
(2) Calculate h 0 (t) = x(t) − (3) Determine whether h 0 (t) satisfies the conditions of the IMF. If they are satisfied, then h 0 (t) is the first IMF component; otherwise, repeat steps 1 and 2 for h 0 (t) until the IMF conditions are satisfied. The IMF component after the first filtration is recorded as s 1 (t). (4) Calculate the remaining signal r 1 (t) = x(t) − s 1 (t), repeat step 1 to 3 for r 1 (t) to get the second IMF component, and in this way, repeat n times loop to get n IMF components. Terminate the step when the remainder is a monotonic signal or meets a given condition.

Undershoot
The signal filtration process is the most important part of the EMD algorithm and directly affects the decomposition results. Traditionally, when the cubic spline curve is used to fit extreme points to form the envelope curve, the undershoot phenomenon easily occurs. As shown in Figure 1, the upper envelope curve is fitted by cubic spline interpolation. The ideal upper envelope curve should satisfy x c (i) ≥ x(i), ∀i = 1, 2, · · · , N anywhere. However, there are two places where the values of the envelope curve are less than those of the original signal, and this phenomenon is called "undershoot". The envelope curve cannot surround the original signal curve completely, resulting in meaningless IMF components during the decomposition process, or signal distortion during the signal reconstruction process. in meaningless IMF components during the decomposition process, or signal distortion during the signal reconstruction process. The reason for the undershoot phenomenon is that the traditional fitting algorithm with cubic spline interpolation only satisfies C 1 continuous, such that the shape is fixed under the given conditions and lacks flexibility. So the fitting performance is poor when the non-linear and non-stationary signal changes greatly.

Improved EMD Algorithm
In order to solve the drawbacks of the method with cubic spline interpolation, the trigonometric interpolation method was used to fit the envelope curve [24]. Yang proposed a method with piecewise cubic spline interpolation; however, the method has poor continuity and only satisfies C 1 continuous [22]. Oberlin T proposed a method with which the envelope curve can be constructed directly to replace the method with cubic spline interpolation [28]; Zhu Weifang used a method with piecewise cubic Hermite interpolation to fit the envelope curve [23], which mainly used Lagrangian method to get the minimum value and optimized the derivative value at extreme points. However, this method also does not satisfy C 2 continuous. Some common interpolation methods and their pros and cons are shown in Table 1.

Advantages Disadvantages
Cubic Spline (CS) Common algorithm, Overshoot and undershoot, CTCP [24] good flexibility high computational complexity B-Spline (BS) [22] Improve the local characteristics of EMD algorithm The endpoint extension is assumed to be infinite, and the endpoint effect is not considered.
OS [28] Adaptive and better than cubic spline High computational complexity and poor smoothness. The reason for the undershoot phenomenon is that the traditional fitting algorithm with cubic spline interpolation only satisfies C 1 continuous, such that the shape is fixed under the given conditions and lacks flexibility. So the fitting performance is poor when the non-linear and non-stationary signal changes greatly.

Improved EMD Algorithm
In order to solve the drawbacks of the method with cubic spline interpolation, the trigonometric interpolation method was used to fit the envelope curve [24]. Yang proposed a method with piecewise cubic spline interpolation; however, the method has poor continuity and only satisfies C 1 continuous [22]. Oberlin T proposed a method with which the envelope curve can be constructed directly to replace the method with cubic spline interpolation [28]; Zhu Weifang used a method with piecewise cubic Hermite interpolation to fit the envelope curve [23], which mainly used Lagrangian method to get the minimum value and optimized the derivative value at extreme points. However, this method also does not satisfy C 2 continuous. Some common interpolation methods and their pros and cons are shown in Table 1. Table 1. Common interpolation methods and their advantages and disadvantages.

Interpolation Methods Advantages Disadvantages
Cubic Spline (CS) Common algorithm, Overshoot and undershoot, CTCP [24] good flexibility high computational complexity B-Spline (BS) [22] Improve the local characteristics of EMD algorithm The endpoint extension is assumed to be infinite, and the endpoint effect is not considered.
OS [28] Adaptive and better than cubic spline High computational complexity and poor smoothness.

Interpolation Methods Advantages Disadvantages
OPCH [23] Taking the difference between extremes as the cost function, the results are more accurate and reasonable than cubic spline Slow computation speed and do not satisfy C 2 continuous especially when there are many extreme points and cause undershooting.
MPCI [29] Through strict theoretical analysis, it is proved that it can solve the overshoot problem caused by the cubic spline interpolation Only satisfy C 1 continuous, the curve smoothness is poor and undershoot.
Piecewise Power Function Algorithm (PPFA) [30] It is more flexible than cubic spline interpolation, and the curve is smooth.
There is a contradiction between smoothness and flexibility and undershooting occurs.
In this paper, the EMD algorithm is improved by the quartic C 2 Hermite interpolation [27]. Suppose there are n + 1 nodes in the interval [a, b], which satisfy x 0 < x 1 < · · · < x n , then the traditional standard cubic Hermite basis function is . This is convenient for the calculation with the standard cubic Hermite interpolation method, which is a common method used in engineering. However, the shape of the fitting curve obtained from this method is fixed, which lacks adaptability and only satisfies C 1 continuous. In this paper, a quartic C 2 Hermite interpolation method is presented, which adds a control factor to adjust the shape of the curve. The basis function is as follows: where λ is a control factor to adjust the shape of the curve.
The basis function satisfies such conditions as follows: The method with quartic Hermite interpolation has the same properties as the one with traditional cubic Hermite interpolation, but the shape of the curve is more flexible due to the parameter λ. When the parameter λ equals 0, the curve is standard cubic Hermite spline curve, which satisfies C 1 continuous. The quartic Hermite spline curve at the interval of [a, b] is where y i is the function value of curve, . y i is its first derivative value, and ∆x i = x i+1 − x i . If the curve satisfies C 2 continuous, the second derivative must satisfy the condition as follows: And the continuous equation by calculation is as follows When λ satisfies the constraint condition as in Equation (6), the plotted curve will satisfy C 2 continuous and it will be smoother.

The Shape of the Envelope Curve Determination
It is known from Section 2.3 that the shape of the fitting curve with quartic C 2 Hermite interpolation is controlled by the parameter λ, and the curve cluster can be obtained with a different parameter λ. This paper draws the idea from reference [23] to find the curve whose length is the shortest in the curve cluster formed by different λ. Generally speaking, excessive length of curve will cause fitting overshoot. The curve length can be expressed as The conditions for searching the optimal curve are as follows: After many experiments, it is found that the optimal curves will appear only when λ < 2. If λ is too large, the smoothness of the curve will be affected. The shape of the curve will be determined by using the genetic algorithm to find the optimal value of λ. The detailed steps are as follows.

Singular Value Decomposition and Signal Reconstruction
The matrix singular value is usually used for signal feature extraction because of its good stability [20]. The signal matrix named X m×n is decomposed as follows: where U is matrix of left singular vectors and V is matrix of right singular vectors. Σ = diag(δ 1 , δ 2 , · · · , δ n ) is a singular value matrix and satisfies δ 1 ≥ δ 2 ≥ · · · ≥ δ n . The percentage of signal components is calculated as where λ k represents the percentage of the first k signal components, and N is the number of IMF components obtained by EMD. The threshold value of percentage is set to 0.9, which means 90% of the original signal can be obtained. The IMF components are reordered according to the correlation coefficient calculated from each IMF and the original signal X. When λ k > 0.9, the first k IMF components are used for signal reconstruction, and the remaining N−k IMF components are abandoned. The cross-correlation coefficient can be expressed as follows: where cov () represents the covariance function, var () represents the variance function, X(t) is the original signal, and IMF i represents the ith IMF component obtained by EMD. The range of the cross-correlation coefficient is [0,1], which reflects the matching degree between these two signals. The closer the value is to 1, the closer the component is to the original signal.

Fault Feature
Kurtosis is a dimensionless parameter that reflects the characteristics of signal distribution where x is the signal to be analyzed, µ is the signal mean value, and δ is the signal standard deviation. The fault rail vibration signals are usually accompanied by impact components to which the kurtosis is sensitive [31], so the larger the kurtosis value is, the higher the proportion of impact components in the signal, therefore the kurtosis can reflect the rail fault. Approximate entropy can measure the complexity of the signal [32]. The more complex the signal mode is, the more irregular the signal, and the larger the approximate entropy. The approximate entropy calculation method is as follows. (1) The original signal is called X, which is an m-dimensional vector composed of continuous points on a time scale (3) Given a threshold of similar tolerance represented by variable r, calculate each vector distance and get the number of less than r, which is expressed as C m i (r), (4) Calculate the log values of C m i (r) and get their mean value expressed as φ m (r), (5) Calculate the approximate entropy,

Fault Diagnosis Steps
The fault diagnosis steps are as follows and the flow chart is shown in Figure 2.
(1) Use an acceleration sensor to obtain the original vibration signal; (2) Process the vibration signal with the EMD algorithm based on quartic C 2 Hermite interpolation described in Sections 2.3 and 2.4 to obtain the IMF components; (3) Process the acquired IMF components by using the SVD algorithm to obtain a singular value matrix; (4) Determine the number of primary components of the signal and reconstruct the signal; (5) Calculate the kurtosis and approximate entropy of the reconstructed signal as the eigenvalues of fault diagnosis; (6) Train the SVM classifier with kurtosis and approximate entropy values of training samples; (7) Classify by the SVM classifier based on the different eigenvalues.

EMD Envelope Experiment
The algorithms with quartic C 2 Hermite interpolation and cubic Hermite interpolation are applied to the vibration signal processing. Figure 3 shows a part of the vibration signal, where the black curve is the original signal. The red is the upper envelope curve fitted with cubic Hermite interpolation algorithm and the blue is the upper envelope curve with the quartic C 2 Hermite interpolation algorithm. It can be seen from Figure 3 that obvious overshoot will appear with the cubic Hermite interpolation algorithm during non-stationary signal processing, and undershoot phenomenon appear in many positions (shown in the zoom Figure 3). But the quartic C 2 Hermite

EMD Envelope Experiment
The algorithms with quartic C 2 Hermite interpolation and cubic Hermite interpolation are applied to the vibration signal processing. Figure 3 shows a part of the vibration signal, where the black curve is the original signal. The red is the upper envelope curve fitted with cubic Hermite interpolation algorithm and the blue is the upper envelope curve with the quartic C 2 Hermite interpolation algorithm. It can be seen from Figure 3 that obvious overshoot will appear with the cubic Hermite interpolation algorithm during non-stationary signal processing, and undershoot phenomenon appear in many positions (shown in the zoom Figure 3). But the quartic C 2 Hermite interpolation algorithm presented in this paper effectively solves the overshoot and undershoot problem.

EMD Simulation Experiment
The signal in the simulation experiment is as follows: The simulation signal is composed of AM-FM and Gaussian white noise. The simulation platform is MATLAB2017a and the sampling frequency is set to 1000 Hz. In the simulation, the signal is processed separately with the traditional cubic Hermite interpolation algorithm, piecewise cubic Hermite interpolation algorithm, and the quartic C 2 Hermite interpolation algorithm, which are presented in this paper. The IMF components obtained by EMD based on the quartic C 2 Hermite interpolation algorithm and the corresponding spectrograms are shown in Figures 4 and 5, respectively. As can be seen from the figure, the frequency of the first IMF component (IMF1) is distributed in each frequency band, and the extracted IMF1 component approximates Gaussian white noise. The spectrums of IMF2 and IMF3 are concentrated at 100 Hz and 50 Hz, while the spectrum of IMF5 is concentrated at 10 Hz. Therefore, every mode of the signal can be effectively extracted with the proposed algorithm.
Mode mixing may appear in the decomposition [12]. Mode mixing means that an IMF component contains extremely different feature time scales or similar feature time scales distributed in different IMF components, which is caused by uneven distribution of signal extremums, signal interruption, or pulse, etc. The common method to solve mode mixing is to add white noise to the signal to suppress abnormal signal [15,17].

EMD Simulation Experiment
The signal in the simulation experiment is as follows: The simulation signal is composed of AM-FM and Gaussian white noise. The simulation platform is MATLAB2017a and the sampling frequency is set to 1000 Hz. In the simulation, the signal is processed separately with the traditional cubic Hermite interpolation algorithm, piecewise cubic Hermite interpolation algorithm, and the quartic C 2 Hermite interpolation algorithm, which are presented in this paper. The IMF components obtained by EMD based on the quartic C 2 Hermite interpolation algorithm and the corresponding spectrograms are shown in Figures 4 and 5, respectively. As can be seen from the figure, the frequency of the first IMF component (IMF1) is distributed in each frequency band, and the extracted IMF1 component approximates Gaussian white noise. The spectrums of IMF2 and IMF3 are concentrated at 100 Hz and 50 Hz, while the spectrum of IMF5 is concentrated at 10 Hz. Therefore, every mode of the signal can be effectively extracted with the proposed algorithm.     (19) and (20). The IO shows the orthogonality of the IMF component obtained with the EMD algorithm. The closer the value is to 0, the better the orthogonality between the IMF components. The IEC shows the energy conservation after signal decomposition. The closer the value is to 1, the higher the energy conservation by EMD. Mode mixing may appear in the decomposition [12]. Mode mixing means that an IMF component contains extremely different feature time scales or similar feature time scales distributed in different IMF components, which is caused by uneven distribution of signal extremums, signal interruption, or pulse, etc. The common method to solve mode mixing is to add white noise to the signal to suppress abnormal signal [15,17].
Index of orthogonality (IO) and index of energy conservation (IEC) are important indexes for evaluating the EMD results. The expressions are presented in Equations (19) and (20). The IO shows the orthogonality of the IMF component obtained with the EMD algorithm. The closer the value is to 0, the better the orthogonality between the IMF components. The IEC shows the energy conservation after signal decomposition. The closer the value is to 1, the higher the energy conservation by EMD.
where X(t) represents the original signal, f i (t) and f j (t) represent the i-th and j-th IMF components by EMD respectively, and r(t) is a trend item.
Objective evaluation indicators are used to validate the algorithm presented in this paper, the result is shown in Table 2. The IO and IEC obtained with the algorithm presented in this paper are better than those obtained with the traditional cubic Hermite interpolation algorithm and piecewise cubic Hermite interpolation algorithm.

The Effect of Noise on the Quartic C 2 Hermite Improved Empirical Mode Decomposition Algorithm
The influence of noise on the performance of EMD algorithm is analyzed through simulation experiments, and the simulation signal is: The signal is composed of sinusoidal signal with frequency of 20 Hz and 100 Hz superimposed with noise signal. The sampling frequency is set as 1000 Hz, the signal length is 1 s, and the noise variance is 0-1. Under different noises, the algorithm proposed in this paper is used to process the signal respectively. Figure 6 shows the IMF component with noise variance of 0.01, IMF2 is approximately a sinusoidal signal of 100 Hz, and IMF3 is approximately a sinusoidal signal of 20 Hz. The blue lines in the figure are IMF decomposed by EMD algorithm, and the gray lines are sinusoidal signals of 100 Hz and 20 Hz respectively. It can be seen from the figure that the algorithm in this paper can effectively decompose the signal and extract typical components of the signal. Objective evaluation indicators are used to validate the algorithm presented in this paper, the result is shown in Table 2. The IO and IEC obtained with the algorithm presented in this paper are better than those obtained with the traditional cubic Hermite interpolation algorithm and piecewise cubic Hermite interpolation algorithm.

The Effect of Noise on the Quartic C 2 Hermite Improved Empirical Mode Decomposition Algorithm
The influence of noise on the performance of EMD algorithm is analyzed through simulation experiments, and the simulation signal is: The signal is composed of sinusoidal signal with frequency of 20 Hz and 100 Hz superimposed with noise signal. The sampling frequency is set as 1000 Hz, the signal length is 1 s, and the noise variance is 0-1. Under different noises, the algorithm proposed in this paper is used to process the signal respectively. Figure 6 shows the IMF component with noise variance of 0.01, IMF2 is approximately a sinusoidal signal of 100 Hz, and IMF3 is approximately a sinusoidal signal of 20 Hz. The blue lines in the figure are IMF decomposed by EMD algorithm, and the gray lines are sinusoidal signals of 100 Hz and 20 Hz respectively. It can be seen from the figure that the algorithm in this paper can effectively decompose the signal and extract typical components of the signal.    Figure 7 shows the spectrum diagrams of each IMF component decomposed when the variances are 0.01, 0.1, 0.3, 0.5, 0.7, and 1 respectively, and the abscissa is represented by log 10. As can be seen from the figure, the increase of noise has a certain impact on the EMD decomposition and has a more significant impact on the high-frequency region. However, signal frequency characteristics are not affected, even when the noise variance is large.
The noise variance is increased from 0 to 1 with an interval of 0.01, and the EMD decomposition is carried out and the correlation coefficient is calculated. As shown in Figure 8, the red line is the correlation coefficient of IMF2 and 100 Hz sinusoidal signal, and the blue line is the correlation coefficient of IMF3 and 20 Hz sinusoidal signal. It can be seen that, with the increase of noise variance, the correlation coefficients of both components show a decreasing trend, and the reduction range of 20 Hz signal is significantly smaller than that of the 100 Hz signal. Since EMD algorithm has this characteristic, EMD algorithm can be used for signal noise reduction. In particular, when the target signal is distributed in the low-frequency component, the denoising effect is good. If the target signal frequency is high, the denoising effect is generally poor.
Sensors 2019, 19, x www.mdpi.com/journal/sensors coefficient of IMF3 and 20 Hz sinusoidal signal. It can be seen that, with the increase of noise variance, the correlation coefficients of both components show a decreasing trend, and the reduction range of 20 Hz signal is significantly smaller than that of the 100 Hz signal. Since EMD algorithm has this characteristic, EMD algorithm can be used for signal noise reduction. In particular, when the target signal is distributed in the low-frequency component, the denoising effect is good. If the target signal frequency is high, the denoising effect is generally poor. Sensors 2019, 19, x www.mdpi.com/journal/sensors variance, the correlation coefficients of both components show a decreasing trend, and the reduction range of 20 Hz signal is significantly smaller than that of the 100 Hz signal. Since EMD algorithm has this characteristic, EMD algorithm can be used for signal noise reduction. In particular, when the target signal is distributed in the low-frequency component, the denoising effect is good. If the target signal frequency is high, the denoising effect is generally poor.

Fault Eigenvalues Extracted from Rail Vibration Signal
The proposed algorithm is applied to vibration signal analysis to validate its effectiveness. The composition of the test system is shown in Figure 9, the rail is made with 60 kg/m standard steel. The sensors are mounted on the bottom of the rail so that the effect on train operation and track mechanical strength is minimal. The sensor used in the experiment is CYD103 piezoelectric acceleration sensor, its sensitivity is 20 pC/g. The sensor is connected to the YE5853 charge amplifier, which is used to modulate and transform the signals collected by the acceleration sensor for display on the computer and subsequent processing. The signal sampling frequency is 50 kHz. The IMF components of the signal obtained by EMD is as shown in Figure 10. mechanical strength is minimal. The sensor used in the experiment is CYD103 piezoelectric acceleration sensor, its sensitivity is 20 pC/g. The sensor is connected to the YE5853 charge amplifier, which is used to modulate and transform the signals collected by the acceleration sensor for display on the computer and subsequent processing. The signal sampling frequency is 50 kHz. The IMF components of the signal obtained by EMD is as shown in Figure 10. Singular value decomposition is performed for the IMF components obtained; the results are shown in Table 3. Suppose the accumulated percentage of principal components is 90%, then it can be seen that the percentage of the first eight principal component reaches 92.24%. So the signal can be reconstructed by using the first eight IMF components. mechanical strength is minimal. The sensor used in the experiment is CYD103 piezoelectric acceleration sensor, its sensitivity is 20 pC/g. The sensor is connected to the YE5853 charge amplifier, which is used to modulate and transform the signals collected by the acceleration sensor for display on the computer and subsequent processing. The signal sampling frequency is 50 kHz. The IMF components of the signal obtained by EMD is as shown in Figure 10. Singular value decomposition is performed for the IMF components obtained; the results are shown in Table 3. Suppose the accumulated percentage of principal components is 90%, then it can be seen that the percentage of the first eight principal component reaches 92.24%. So the signal can be reconstructed by using the first eight IMF components.  Figure 10. IMF components obtained by EMD. (a) IMF 1 to IMF 5, (b) IMF 6 to IMF 10.
Singular value decomposition is performed for the IMF components obtained; the results are shown in Table 3. Suppose the accumulated percentage of principal components is 90%, then it can be seen that the percentage of the first eight principal component reaches 92.24%. So the signal can be reconstructed by using the first eight IMF components. As is shown in Figure 11, the cross correlation coefficient calculations between the IMF components and the original signal are performed. The first eight IMF components with high cross correlation coefficient are used for reconstruction.
According to a large number of investigations, the main failure forms of high-speed rails in China are rail surface peeling and corrugation, which is shown in Figure 12. 1.95839 0.019708 1 As is shown in Figure 11, the cross correlation coefficient calculations between the IMF components and the original signal are performed. The first eight IMF components with high cross correlation coefficient are used for reconstruction.  Figure 11. Cross correlation for the IMF components and the original signal.
According to a large number of investigations, the main failure forms of high-speed rails in China are rail surface peeling and corrugation, which is shown in Figure 12.

•
Rail surface peeling damage During train operation, due to the large braking, sliding friction is produced by the locked wheels on the rail surface, causing the local temperature to rise rapidly. When the rail cools, the raw surface of the hard and brittle metal gets damaged, resulting in the rail surface peeling phenomenon due to the large cyclical impact between the wheel and rail.
As shown in Figure 13, the signal waveform and frequency spectrum of a typical normal rail surface are mainly distributed in the low frequency part, usually less than 500 Hz. However, for the rail surface with peeling damage, the signal has an obvious distribution in the high frequency band, as shown in Figure 14, and the main frequency increases significantly at 1000 Hz. By calculation, its kurtosis value is 42.3 and approximate entropy is 0.332, which can effectively identify the rail faults.

•
Corrugation damage Figure 11. Cross correlation for the IMF components and the original signal.
Sensors 2019, 19, x www.mdpi.com/journal/sensors As is shown in Figure 11, the cross correlation coefficient calculations between the IMF components and the original signal are performed. The first eight IMF components with high cross correlation coefficient are used for reconstruction.  Figure 11. Cross correlation for the IMF components and the original signal.
According to a large number of investigations, the main failure forms of high-speed rails in China are rail surface peeling and corrugation, which is shown in Figure 12.

•
Rail surface peeling damage During train operation, due to the large braking, sliding friction is produced by the locked wheels on the rail surface, causing the local temperature to rise rapidly. When the rail cools, the raw surface of the hard and brittle metal gets damaged, resulting in the rail surface peeling phenomenon due to the large cyclical impact between the wheel and rail.
As shown in Figure 13, the signal waveform and frequency spectrum of a typical normal rail surface are mainly distributed in the low frequency part, usually less than 500 Hz. However, for the rail surface with peeling damage, the signal has an obvious distribution in the high frequency band, as shown in Figure 14, and the main frequency increases significantly at 1000 Hz. By calculation, its kurtosis value is 42.3 and approximate entropy is 0.332, which can effectively identify the rail faults.

Rail surface peeling damage
During train operation, due to the large braking, sliding friction is produced by the locked wheels on the rail surface, causing the local temperature to rise rapidly. When the rail cools, the raw surface of the hard and brittle metal gets damaged, resulting in the rail surface peeling phenomenon due to the large cyclical impact between the wheel and rail.
As shown in Figure 13, the signal waveform and frequency spectrum of a typical normal rail surface are mainly distributed in the low frequency part, usually less than 500 Hz. However, for the rail surface with peeling damage, the signal has an obvious distribution in the high frequency band, as shown in Figure 14, and the main frequency increases significantly at 1000 Hz. By calculation, its kurtosis value is 42.3 and approximate entropy is 0.332, which can effectively identify the rail faults. Undulating wear is the roughness of the rail surface along the longitudinal direction, which appears most often in mountain areas. The causes are complex, involving rolling contact mechanics, material friction, vehicle coupling dynamics, and vibration load, vibration frequency, material characteristics, foundation, etc.
As shown in Figure 15, the signal under undulating wear usually generates abnormal vibration in the frequency region corresponding to the middle frequency band and the undulating frequency band. Since the excitation frequency is the same as the natural frequency of the rail, resonance happens, so the signal has a large amount of energy. Its kurtosis value is 32.8 and approximate entropy is 0.735.  •

Corrugation damage
Undulating wear is the roughness of the rail surface along the longitudinal direction, which appears most often in mountain areas. The causes are complex, involving rolling contact mechanics, material friction, vehicle coupling dynamics, and vibration load, vibration frequency, material characteristics, foundation, etc.
As shown in Figure 15, the signal under undulating wear usually generates abnormal vibration in the frequency region corresponding to the middle frequency band and the undulating frequency band. Since the excitation frequency is the same as the natural frequency of the rail, resonance happens, so the signal has a large amount of energy. Its kurtosis value is 32.8 and approximate entropy is 0.735. •

Fault identification
A large number of experiments show that the fault signals' kurtosis and approximate entropy calculated from corresponding reconstructed signals are both too large. The approximate entropy of fault signals is usually greater than 0.4 and the kurtosis is usually greater than 20. In this experiment, 100 sets of sample signals are used for identification and classification. The classification results based on the SVM and the fault identification accuracy of the proposed method in the paper are shown in Figures 16 and 17, respectively. It can be seen that all signal types can be identified effectively except one, which is taking the corrugation damage fault as a normal one. The reason may be the influence of the rail surface roughness. When the kurtosis and approximate entropy calculated from original signals are directly used for identification and classification, there appear eight classification errors totally; the reason may be the influence of surface roughness. Therefore this kind of direct classification accuracy is very low. Besides, three classification errors appear with •

Fault identification
A large number of experiments show that the fault signals' kurtosis and approximate entropy calculated from corresponding reconstructed signals are both too large. The approximate entropy of fault signals is usually greater than 0.4 and the kurtosis is usually greater than 20. In this experiment, 100 sets of sample signals are used for identification and classification. The classification results based on the SVM and the fault identification accuracy of the proposed method in the paper are shown in Figures 16 and 17, respectively. It can be seen that all signal types can be identified effectively except one, which is taking the corrugation damage fault as a normal one. The reason may be the influence of the rail surface roughness. When the kurtosis and approximate entropy calculated from original signals are directly used for identification and classification, there appear eight classification errors totally; the reason may be the influence of surface roughness. Therefore this kind of direct classification accuracy is very low. Besides, three classification errors appear with EMD algorithm based on the traditional cubic Hermite interpolation and the piecewise cubic Hermite interpolation. So it can be concluded that the method presented in this paper improves classification accuracy obviously. Figure 18 shows the accuracy of feature extraction and SVM recognition proposed in this paper by the eight methods mentioned in Table 1, which are CS (Cubic Spline), CTCP (Cubic Trigonometric Cardinal Precise spline [24]), BS (B-Spline [22]), OS (Direct constrained optimization [28]), OPCH (Optimized Piecewise Cubic Hermite [23]), MPCI (Monotone Piecewise Cubic Interpolation [29]), PPFA (Piecewise Power Function Algorithm [30]). By comparison, it is found that the comprehensive accuracy of this algorithm is superior to other algorithms.
Sensors 2019, 19, x www.mdpi.com/journal/sensors are shown in Figures 16 and 17, respectively. It can be seen that all signal types can be identified effectively except one, which is taking the corrugation damage fault as a normal one. The reason may be the influence of the rail surface roughness. When the kurtosis and approximate entropy calculated from original signals are directly used for identification and classification, there appear eight classification errors totally; the reason may be the influence of surface roughness. Therefore this kind of direct classification accuracy is very low. Besides, three classification errors appear with EMD algorithm based on the traditional cubic Hermite interpolation and the piecewise cubic Hermite interpolation. So it can be concluded that the method presented in this paper improves classification accuracy obviously. Figure 18 shows the accuracy of feature extraction and SVM recognition proposed in this paper by the eight methods mentioned in Table 1, which are CS (Cubic Spline), CTCP (Cubic Trigonometric Cardinal Precise spline [24]), BS (B-Spline [22]), OS (Direct constrained optimization [28]), OPCH (Optimized Piecewise Cubic Hermite [23]), MPCI (Monotone Piecewise Cubic Interpolation [29]), PPFA (Piecewise Power Function Algorithm [30]). By comparison, it is found that the comprehensive accuracy of this algorithm is superior to other algorithms.

Conclusions
Addressing the difficulties in rail compound fault diagnosis and the undershoot problem in vibration signal analysis based on the traditional EMD algorithms, an EMD algorithm based on the quartic C 2 Hermite interpolation is presented in this paper. It replaces the traditional cubic Hermite interpolation algorithm and improves the performance of the EMD algorithm, and the undershoot problem in signal filtration based on the EMD algorithm is reduced effectively. As described, the principle components of the signal can be determined and reconstructed after SVD and cross correlation coefficient calculation. Then kurtosis and approximate entropy of the reconstructed signal are calculated as fault eigenvalues, which are used to classify the fault based on the SVM. Experimental results show that the EMD algorithm based on quartic C 2 Hermite interpolation presented in this paper improves classification accuracy greatly.

Conclusions
Addressing the difficulties in rail compound fault diagnosis and the undershoot problem in vibration signal analysis based on the traditional EMD algorithms, an EMD algorithm based on the quartic C 2 Hermite interpolation is presented in this paper. It replaces the traditional cubic Hermite interpolation algorithm and improves the performance of the EMD algorithm, and the undershoot problem in signal filtration based on the EMD algorithm is reduced effectively. As described, the principle components of the signal can be determined and reconstructed after SVD and cross correlation coefficient calculation. Then kurtosis and approximate entropy of the reconstructed signal are calculated as fault eigenvalues, which are used to classify the fault based on the SVM. Experimental results show that the EMD algorithm based on quartic C 2 Hermite interpolation presented in this paper improves classification accuracy greatly.

Conflicts of Interest:
The authors declare no conflict of interest.