Fast EEMD Based AM-Correntropy Matrix and Its Application on Roller Bearing Fault Diagnosis

Yunxiao Fu 1,2, Limin Jia 2,3,*, Yong Qin 2,3, Jie Yang 2,4 and Ding Fu 5 1 State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China; yunxiaof2012@163.com 2 School of Electric Engineering, Beijing Jiaotong University, Beijing 100044, China; qinyong2146@126.com (Y.Q.); 13114252@bjtu.edu.cn (J.Y.) 3 Beijing Research Center of Urban Traffic Information Sensing and Service Technologies, Beijing Jiaotong University, Beijing 100044, China 4 School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China 5 School of Business, University of Wollongong Australia, Wollongong 2519, Australia; df361@uowmail.edu.au * Correspondence: jialm@vip.sina.com; Tel./Fax: +86-10-5168-3824


Introduction
As one of the pivotal mechanized devices, roller bearings constantly rotate in harsh industrial environments that often feature high temperatures, variable rotational speeds, and big loads; as such, they have a high breakdown probability [1]. It is difficult to diagnose roller bearing faults because the fault burst of the roller is strong and because the ambient noise and hectic operating conditions in which they generally exist; as a result, the condition of the roller bearings may represent hidden dangers for mechanical and electrical systems. Consequently, it is of great significance to develop identification techniques for determining the condition of roller bearings in order to ensure the safety detection [27], medical signal processing [28], and so on. To achieve state identification, the Support Vector Machine (SVM) method has been widely employed in the machine learning community because of its distinctive generalization ability compared with that of other conventional methods, such as the neural network method [29]. Built on the foundation of the SVM algorithm, the Least Square Support Vector Machine (LSSVM) method possesses not only as good of a generalization ability as the general SVM method, but a better learning ability for small samples than the general SVM method. In addition, the calculation speed of LSSVM is faster than SVM based on the same identification accuracy [30] That is why LSSVM is prevalent in many fields related to pattern recognition [31,32].
Therefore, in terms of bearing fault feature extraction, the correlative information between time-frequency components and primary signals can be potential cachets. To extract the cachets comprehensively, here we propose the IMF correntropy matrix (IMFCM) which is comprised of four dimensions which are: fault state class, total number of sample data, data length, and IMF number in each FEEMD process. The centralized computation of a fault feature set can reflect the unity and logic of the sample set. The consequences of roller bearing fault identification will be obtained through LSSVM. In short, the purpose is to verify the performance of FEEMD and the divisibility and robustness of IMFCM in roller bearing fault diagnosis are better than the existing methods.
This paper contains seven sections. The first part introduces the basis of the study. The second section presents the research framework. The third part considers the instruction of the algorithms used to extract the fault features, and contains two subsections: the fast empirical mode decomposition algorithm and the definition and computation scheme of IMF correntropy matrix. The fourth sections discusses the application of IMFCM in fault identification, and contains two sub-sections: a brief overview of the Least Squares Support Vector Machine method and an evaluation of the identification consequence. The fifth section contains a case study and applicability experiment that considers both a Stationary Operating Situation and Cross-mixed Operating Situation. The sixth section is a discussion that analyzes the results of the experiment. The last part is the conclusion which summarizes the whole research work.

The Method Framework
The method framework contains three sub-procedures: the signal process, the feature extraction, and the state identification. The signal process is composed of a series of processes. In particular, the vibration acceleration signal is collected regardless of the operating condition of the roller bearings. The detailed framework for implementing the roller bearing fault diagnosis is shown in Figure 1. The parameters of s, c, and φ are for the primary vibration signal of roller bearings, the IMF decomposed from the primary signal, and the AM-correntropy, respectively. k, l, m, and n refer to the number of fault style, the number of primary signal samples, the length of each sample, and the amount of IMFs decomposed by a primary sample. The detailed mathematical relationships of the above parameters will be described in the following sections.

Brief Overview of EMD and EEMD Algorithms
To realize the stationary signal processing, the EMD algorithm separates the wave motion or tendency into different scales step by step, and then IMFs, a series of signal sequences with different characteristic scales, will be obtainable. To be an IMF, a sequence should follow two conditions: 1. The number of extreme value points and the number of zero-crossings must either be equal or differ at most by one in the whole primary signal. 2. At any point, the mean value of the envelope defined by local maxima and the envelope defined by the local minima is zero. The upper and lower envelopes are of local symmetry about the timeline.
The IMFs disassembled by EMD that satisfy the above two criterions are all near single frequency components and almost orthogonal signals [9]. The detailed description of the EMD algorithm execution process can be referenced in [9].
To prevent a primary single component signal from losing its physical meaning in the IMF sifting process, the stop norm is added to stop the sifting process. The general stoppage criterion makes use of standard deviation (SD) [16] as the stop condition. Besides the SD, there are many other stoppage criteria in applications such as the three parameters rule [33] and energy difference tracking method [34].
However, a susceptibility to mode mixing is one of the EMD shortcomings, which may reveal the signal's characteristic information incorrectly and may generate the crucial influence of the specific physical background [35].
To assuage the drawback of mode mixing which is intrinsic in the performance of EMD, EEMD was proposed by Wu and Huang [12]. To eliminate the intermittency of the primary signal, and then restrain the generation of mode mixing effectively, it is necessary to add white Gaussian noise with different amplitudes before each decomposed step, as white Gaussian noise has the statistical property of uniform distribution in the whole time-frequency domain, making the signal continuity better and more easily eliminated by superposition. Meanwhile, the statistical mean value of

Brief Overview of EMD and EEMD Algorithms
To realize the stationary signal processing, the EMD algorithm separates the wave motion or tendency into different scales step by step, and then IMFs, a series of signal sequences with different characteristic scales, will be obtainable. To be an IMF, a sequence should follow two conditions: 1.
The number of extreme value points and the number of zero-crossings must either be equal or differ at most by one in the whole primary signal.

2.
At any point, the mean value of the envelope defined by local maxima and the envelope defined by the local minima is zero. The upper and lower envelopes are of local symmetry about the timeline.
The IMFs disassembled by EMD that satisfy the above two criterions are all near single frequency components and almost orthogonal signals [9]. The detailed description of the EMD algorithm execution process can be referenced in [9].
To prevent a primary single component signal from losing its physical meaning in the IMF sifting process, the stop norm is added to stop the sifting process. The general stoppage criterion makes use of standard deviation (SD) [16] as the stop condition. Besides the SD, there are many other stoppage criteria in applications such as the three parameters rule [33] and energy difference tracking method [34].
However, a susceptibility to mode mixing is one of the EMD shortcomings, which may reveal the signal's characteristic information incorrectly and may generate the crucial influence of the specific physical background [35]. To assuage the drawback of mode mixing which is intrinsic in the performance of EMD, EEMD was proposed by Wu and Huang [12]. To eliminate the intermittency of the primary signal, and then restrain the generation of mode mixing effectively, it is necessary to add white Gaussian noise with different amplitudes before each decomposed step, as white Gaussian noise has the statistical property of uniform distribution in the whole time-frequency domain, making the signal continuity better and more easily eliminated by superposition. Meanwhile, the statistical mean value of unrelated random noise has been proved to be zero, so the ensemble mean calculation of the IMFs can almost remove the interference of white noises. The procedure of EEMD can be depicted as follows [12]: Add numerically generated white noise δ i (t) on the primary signal s(t) as demonstrated in the following equation, where i means the i-th trail of adding white noise with n e trails, and i = 1, 2, . . . , n e : Decompose the signal s i (t) by EMD algorithm to get several IMFs c ij (t) and a residue r i (t), in which c ij (t) is the j-th IMF component of the i-th EMD of s i (t) with j = 1, 2, . . . , n, and r i (t) is the i-th EMD residue.
Step One: Repeat the two aforementioned steps, and add different white noise sequences with the same root mean square each time.
Step Two: Calculate the ensemble means c j (t) of the corresponding IMFs of the decompositions as the final IMFs which can be represented by the following equation: where c j (t) is the j-th IMF component decomposed from the primary signal using EEMD.

Brief Overview of Fast EEMD
It is obvious that the process of EEMD indicates a more sophisticated signal processing technique for nonlinear and non-stationary signals than EMD. Hence, EEMD generates more computational complexity [14].
Practically, Wang [15] has proven that the computing process of EMD and EEMD is not intensive and tedious. It has been demonstrated that the computational time complexity (CTC) of EMD is no greater than n s¨( 41¨n¨m), which is proportional to shifting time n s , IMF dimension n, and signal length m, and the CTC of EEMD is no greater than NE¨n s¨( 41¨n¨m), which is proportional to the EMD CTC and ensemble time n e respectively. Generally, getting n e by SD stoppage criterion costs at least 5m CTC. The fast EEMD (FEEMD) algorithm indicates that the fixed value of n s can perform better than the n s obtained by other stoppage criteria. Furthermore, it has been summarized by Wu [12,36] that to get better decomposed results the assignment of n s should generally be 10. Taking n s and n e as constants, the CTC of FEEMD can be described as T(m) = O(m¨logm) [15], where n = log 2 m for EMD is performed as a dyadic filter bank [36]. Nevertheless, using the SD criterion the CTC of EEMD is no less than 205¨n e¨m 2¨l og 2 m = O(m 2¨l ogm). The theoretical curve of the relationship between log 10 (CTC) and data length m is shown in Figure 2.

Definition and Computation Scheme of IMF Correntropy Matrix
The length of primary signal s(t) is m and the sample amount of primary signal is l, to construct primary signal matrix s(t)l × m under k-th fault states separately, it generates the sample matrix (SM) of k × l × m dimensions. Next, after the FEEMD process of each signal si(t), n IMFs are totally generated, then all the IMFs should be blocked in k × l × n × m IMF matrix (IMFM) to be processed more efficiently by the computer.

Brief Overview of Correntropy
For determining more reliable fault features, it is worth clarifying the proportion of each IMF component in the primary signal and the distance of every sampling point in each IMF component that leaves from the corresponding point in the primary signal. In considering this situation, the minutia is similarly significant and needs to be examined.
Correntropy is a generalized similarity measure that estimates the probabilistic similarity between two arbitrary random variables [37]. It is computationally intensive to determine the correntropy of two-dimensional random variable in a high-dimensional data space. The kernel function aims to solve the high-dimension problem of computation without specification of the nonlinear transformation function. Through the kernel function, the obtainment of the linear correlation information can be implemented in the higher dimensional linear space, which is the projection of lower dimensional nonlinear space. Correntropy as a continuous function has been deeply analyzed in the literature [23,24], however the computer can only handle the data discretely. Suppose that X and Y are two discrete random variables with m sampling points. Generally, the joint probability function PXY of X and Y is unknown and the sampling point amount m is a finite integer, thus the correntropy is calculated by the discrete sample estimation instead of the continuous correntropy model. The equation is as follows.
where K is the arbitrary positive definite kernel function which satisfies Mercer's Conditions [24]. The Gaussian kernel (also called the radial basis function (RBF) kernel) is employed here to calculate the correntropy. The sample estimation of correntropy with the Gaussian kernel function is described as follows:

Definition and Computation Scheme of IMF Correntropy Matrix
The length of primary signal s(t) is m and the sample amount of primary signal is l, to construct primary signal matrix s(t) lˆm under k-th fault states separately, it generates the sample matrix (SM) of kˆlˆm dimensions. Next, after the FEEMD process of each signal s i (t), n IMFs are totally generated, then all the IMFs should be blocked in kˆlˆnˆm IMF matrix (IMFM) to be processed more efficiently by the computer.

Brief Overview of Correntropy
For determining more reliable fault features, it is worth clarifying the proportion of each IMF component in the primary signal and the distance of every sampling point in each IMF component that leaves from the corresponding point in the primary signal. In considering this situation, the minutia is similarly significant and needs to be examined.
Correntropy is a generalized similarity measure that estimates the probabilistic similarity between two arbitrary random variables [37]. It is computationally intensive to determine the correntropy of two-dimensional random variable in a high-dimensional data space. The kernel function aims to solve the high-dimension problem of computation without specification of the nonlinear transformation function. Through the kernel function, the obtainment of the linear correlation information can be implemented in the higher dimensional linear space, which is the projection of lower dimensional nonlinear space. Correntropy as a continuous function has been deeply analyzed in the literature [23,24], however the computer can only handle the data discretely. Suppose that X and Y are two discrete random variables with m sampling points. Generally, the joint probability function P XY of X and Y is unknown and the sampling point amount m is a finite integer, thus the correntropy is calculated by the discrete sample estimation instead of the continuous correntropy model. The equation is as follows.V where K is the arbitrary positive definite kernel function which satisfies Mercer's Conditions [24]. The Gaussian kernel (also called the radial basis function (RBF) kernel) is employed here to calculate the correntropy. The sample estimation of correntropy with the Gaussian kernel function is described as follows:V σ pX, Yq " The size of the kernel scale parameter σ determines the metric of similarity in the reproducing kernel Hilbert space (RKHS). Experientially, the kernel scale parameter σ is generated by the Silverman standard [24]. However, to keep the fixed inner product in RKHS, it needs a fixed σ value which is determined to be 0.5 for a balanced effect here. Additionally, it is necessary to keep the same computation for each feature value in order to ensure the fault diagnosis results are meaningful. It can be observed in Equation (4) that correntropy obeys two rules: symmetry (V(X,Y) = V(Y,X)) and boundedness (0 < V(X,Y) < 1/ ? 2πσ). The correlation information can be revealed by Taylor's series expansion of correntropy, which is shown in Equation (5): It contains a two order term of self-correlation information while m = 1 and an independent component analysis correlated term while m = 4. The weighted sum of even order moments reflects that correntropy includes all even order statistic messages, where a two order term makes up the major component and the quick attenuation of high order terms follows the addition of σ. Correntropy is therefore two order statistics with more ascendancy thereby implying more correlation statistical information, less information redundancy, and better correlation robustness than traditional correlation approaches.

Derivation of IMF Correntropy Matrix
Being that it is based upon the accumulation of local similarity, correntropy is considered as a local similarity measurement so that the linear degree of signal is negligible. In addition, each IMF component in the time domain has independent data distributions with partial similarity with the primary signal. As such, correntropy can measure the local similarity between IMF and the primary signal. Hence, it is sensitive to the local shock response. However, linear similarity is another measurement which contains the global similarity information that correntropy lacks. Adding the linear correlation coefficient as an AM operator, results in improved correntropy will which is called IMF-original-signal correntropy (IMFC). Based on the IMFC feature, the different fault styles of roller bearings can be embodied. Supposing ρ is the correlation coefficient between the primary signal and the IMF component.
where φ "V(PF,S). The above part has mentioned that IMFM is kˆlˆnˆm dimensional, thus one signal's IMFC estimate φ 1 is of 1ˆn dimensions. Using Φ = (2mσ)´1¨řexp(´ H ¨(2σ 2 )´1) as the correntropy operator, where ¨ is the Euclidean norm and H = c nˆm´s1ˆm is the input matrix, then the IMFC vector φ 1 1ˆn can be solved as follows.
In order to reduce the calculation sophistication, it takes the followed equation to normalize φ 1 so that the normalized solution φ can be obtained. To extend Equations (7) and (8) to all the primary signal samples in the whole state set, assuming ∆ = kˆl, Ω = ∆ˆn, then the calculation process of the Ω-dimensional IMFC matrix (IMFCM) is described as follows: where c ∆ and s ∆ signify the sample matrix of c kl and s kl , and φ Ω is the initial IMFCM. Furthermore, to express the convenience of calculation, it is necessary to partly transpose φ Ω . Accordingly, drawing support of Γ = (kˆl) 1ˆn , φ Γ is the final target result of IMFCM.

Application of IMFCM in Fault Identification
To implement the fault identification based on IMFCM, the intelligent decision-making algorithm should be involved in this work. There are many studies of roller bearing fault identification that draw support from SVM [38]. As the development of SVM, LSSVM has been widely used recently for its faster calculation ability than SVM. Besides, it is of the same significance to choose an exhaustive index to evaluate the identification consequence. Here, LSSVM and its output evaluation are briefly analyzed accordingly.

Brief Overview on Least Square Support Vector Machine
Based on the structural risk minimization principle, the Support Vector Machine (SVM) method as one of the machine learning approaches can perform well in the analysis of a small capacity, high dimensional, and non-stationary sample set. The core idea of SVM is to build up a hyperplane as a decision surface in order to maximize the isolated edge of sample sets of different types. The linear discriminant function is used to divide the object samples into two classes. Supposing ω is the weight vector and x R [m,n] owns n samples, each of which has m dimensions.
y " sgn ppω¨xq`bq (10) However, nonlinear classification discrimination needs the nonlinear function ϕ(x) to map samples to high-dimensional linear space, and then the optimal classification hyperplane is constructed in the high-dimension linear space. Training sample number affects computation speed and the solution of quadratic programming becomes complex if the training sample number increases. Solving this problem is one of the merits of LSSVM [39], where equality optimization constraints instead of inequality constraints are applied. Furthermore, the low computational cost of LSSVM allows it to be used in a wide variety of applications.
Provided the training set is TR = {(x i , y i ), i = 1, 2, . . . , n} (x i R m ), then the quadratic programming can be converted into the following description: where J is the objective function, γ is the penalty coefficient whose greater value expresses a more severe penalty for erroneous classifications, and ξ is a slack variable which solves the problem in that the largest geometric interval value is negative when the samples can be measured by no means. Moreover, Equation (11) is a typical quadratic programming case. To simplify the calculation, it is necessary to integrate the objective function and constraints into the Lagrange function; the optimal solution will be generated from this function. As such, the corresponding Lagrange function is: where α i is Lagrange multiplier. The conditions for optimality, similarly to the SVM problem, can be given by the partial derivatives of L(ω, b, ξ, α) with respect to ω, b, ξ i , and α i .
To remove ω and ξ i by element elimination, the above equations calculating the process can be described as follows.
. . , ξ n ], and α = [α 1 , α 2 , . . . , α n ]. In addition, the inner product operation of ZZ T can be represented by the Kernel function K(x i , x j ) instead which satisfies the Mercer condition. Suppose Ω = ZZ T , then ( 15) The classification decision function of LSSVM at point x i can be evaluated aŝ For multi-class identification, different association strategies of multi SVM operators, such as one-against-one (OAO), one-against-all (OAA), and directed acyclic graph (DAGSVM), can be employed to carry out rotating machine fault identifications. A detailed discussion of these approaches has been described in [40].

Evaluation of Identification Consequence
Multiple perspectives of evaluation can give an exhaustive examination of algorithm outcomes. The main appraisal principle is the integration of positive and negative evaluation indicators and local and global indicators. Here, correct rate (CR, positive and local indicator), misdiagnosis rate (MR, negative and local indicator), and average-identification rate (AR, positive and global indicator) are employed as the evaluation indicators. The CR index refers to the proportion of correct recognized samples out of the total number of tested samples in this category. MR means the proportion of the tested samples wrongly divided into this category out of the total number of tested samples of the other categories, and AR refers to the average accuracy of all categories.
In which θ is CR, η is MR, µ is AR, and their values all range from 0 to 1. λ is the sample number recognized correctly among the specific category samples, χ refers to the sum of the tested samples of the specific category, τ is the sample number of other categories recognized as the specific category, ζ is the sum of all samples, ψ is the sum of all class identification accuracy, and υ is the number of categories.

Case Study and Applicability Experiment
The objective of the experiment was to inspect if the fault feature IMFCM has superior fault diagnosis performance than the traditional features, both in abiding operating situations and hybrid operating situations. The vibration acceleration signal of the roller bearing is collected from the open source website of Case Western Reserve University whose vibration test platform is shown in Figure 3 [41]. The type of roller bearing is a deeply grooved ball bearing 6205-2RS JEM SKF whose outer ring is fixed with no guard. The format of size AˆSˆW is 25 mmˆ52 mmˆ15 mm, where A, S, and W represent aperture of inner ring, shaft diameter of outer ring, and width of roller bearing,  of the specific category, τ is the sample number of other categories recognized as the specific category, ζ is the sum of all samples, ψ is the sum of all class identification accuracy, and υ is the number of categories.

Case Study and Applicability Experiment
The objective of the experiment was to inspect if the fault feature IMFCM has superior fault diagnosis performance than the traditional features, both in abiding operating situations and hybrid operating situations. The vibration acceleration signal of the roller bearing is collected from the open source website of Case Western Reserve University whose vibration test platform is shown in Figure 3

The Test of FEEMD Computational Time Complexity
First, we chose one vibration signal sample from each fault style of {I, II, III, IV} to check the computational time cost and the storage space cost of EMD, EEMD, and FEEMD. Figure 4 shows the relationship between the data length, IMF number, and the computing time cost of EMD, EEMD, and FEEMD. It is conceivable that FEEMD combines the advantages of EMD and EEMD, in that it computes faster than EEMD and weakens the mode mixing effect. The detailed computational time statistics are presented in Table 1, which shows the quantitative superiority of FEEMD. Figure 5 demonstrates the IMFs in time domain of EMD, EEMD, and FEEMD in a sample of roller bearing vibration signals for ball fault. Although the computing time is cut down, the IMFs still performed well compared with EMD and EEMD.

The Test of FEEMD Computational Time Complexity
First, we chose one vibration signal sample from each fault style of {I, II, III, IV} to check the computational time cost and the storage space cost of EMD, EEMD, and FEEMD. Figure 4 shows the relationship between the data length, IMF number, and the computing time cost of EMD, EEMD, and FEEMD. It is conceivable that FEEMD combines the advantages of EMD and EEMD, in that it computes faster than EEMD and weakens the mode mixing effect. The detailed computational time statistics are presented in Table 1, which shows the quantitative superiority of FEEMD. Figure 5 demonstrates the IMFs in time domain of EMD, EEMD, and FEEMD in a sample of roller bearing vibration signals for ball fault. Although the computing time is cut down, the IMFs still performed well compared with EMD and EEMD.

Stationary Operating Situations
Whatever operating situation the roller bearing is in, the matrix required for calculating the fault characteristic needs to be emphasized. For IMFCM, the dimensional value Γ: lˆkˆn needs to be solved, as well as the length m of each primary signal. Only if these parameters are cleared the high-dimensional experiment data can be conveniently calculated in blocks.
In this section, the fault diagnosis experiment of a roller bearing in a stationary operating situation is implemented. The experiment flow chart is provided in Figure 6 and the detailed steps are as follows.  Step four: fault identification. LSSVM has been proven to be a well-applied classifier when σ = 0.5 [40]. Here, the input set is made up of the four dimensions feature vectors acquired from step three, and the output set  Step one: data collection. The motor operating speed and load are constant. The roller bearing vibration data samples are distributed into three groups (A, B, and C) based on different operating parameters, each of which contains four fault styles of roller bearing expressed by {I, II, III, IV}. The roller bearing operating parameters are shown in Table 2. The sampling frequency is 12 kHz, under which 118 data sample sets have been obtained. Each data sample contains 1024 points. The training data sample number of each fault style is 68, and the testing data sample number is 50. From the above information it is obvious that the total testing sample number is l = 200, the fault style number is k = 4, and the signal length m = 1024. The only parameter still unknown is the IMF number n for each primary signal.
Step two: time-frequency decomposition of vibration signal. Here, the primary signals of three groups are decomposed into IMF components by FEEMD. From Figure 5c it is obvious that the last two IMFs have little wave ingredients so that the meaningful IMFs contains only the first eight components.
Step three: feature extraction. Calculate IMFCM of the roller bearing vibration signal in each group according to the aforementioned process. To verify the fault diagnosis accuracy of IMFCM, the IMF energy moment matrix (IMFEMM) [42], IMF fuzzy entropy matrix (IMFFEM) [43], and IMF spectral kurtosis matrix (IMFSKM) [44] are considered as controlled objects which are extracted in the same process as the IMFCM, as shown in Figure 6. It is obvious that the features under the fourth component possess less feature diversity than the first four components in Figure 7 so they naturally have less contribution to the fault divisibility. Hence, component set {IMF1, IMF2, IMF3, IMF4} is chosen as the whole fault feature set for the roller bearing. Until now the parameters are all clear by lˆkˆnˆm = 200ˆ4ˆ4ˆ1024. By using the matrix form, the computation process will be intelligible and efficient.

Cross-Mixed Operating Situations
The vibration data is the same as in Section 5.2. The purpose of the cross-mixed operating condition is to inspect the fault diagnosis robustness for the rotating condition of the IMFCM.
Step one: data extraction. Two hundred continuous samples are selected randomly from each roller bearing fault style dataset in group A, B, and C, respectively. The first 150 continuous samples of each fault style are intercepted, respectively, in group A, B, and C which are then mixed as a training set while another 150 samples from each group are designated as the testing set. As such, the total training set sample number is 450 with an equally sized testing set sample number. This fashion of cross-mixed rotating conditions follows the principle of distributing the training and testing dataset on the average of each group, which is called a Homogeneous Group (HG). Another method chooses 100 continuous samples of each fault style in group A and B both randomly as the training dataset, and 100 continuous samples of each fault style in group C as the testing dataset. This mixing fashion extracts the training dataset and testing dataset from different operating situations, respectively, so that it is called Biased Group (BG). The roller bearing operating parameter distributions in HG and BG are shown in Figure 8.  I  II  III  IV  I  II  III  IV  I  II  IIII

Cross-Mixed Operating Situations
The vibration data is the same as in Section 5.2. The purpose of the cross-mixed operating condition is to inspect the fault diagnosis robustness for the rotating condition of the IMFCM.
Step one: data extraction. Two hundred continuous samples are selected randomly from each roller bearing fault style dataset in group A, B, and C, respectively. The first 150 continuous samples of each fault style are intercepted, respectively, in group A, B, and C which are then mixed as a training set while another 150 samples from each group are designated as the testing set. As such, the total training set sample number is 450 with an equally sized testing set sample number. This fashion of cross-mixed rotating conditions follows the principle of distributing the training and testing dataset on the average of each group, which is called a Homogeneous Group (HG). Another method chooses 100 continuous samples of each fault style in group A and B both randomly as the training dataset, and 100 continuous samples of each fault style in group C as the testing dataset. This mixing fashion extracts the training dataset and testing dataset from different operating situations, respectively, so that it is called Biased Group (BG). The roller bearing operating parameter distributions in HG and BG are shown in Figure 8. Step two: To execute FEEMD, feature extraction, and intelligent identification processes in turn. The procedure is the same as steps two and three provided in Section 5.2.
Step three: To compare the test performances and explain the consequences.

Discussion
From Figure 4 and Table 1, it can be concluded intuitively that the calculation time cost of FEEMD is the least of the three algorithms. Furthermore, FEEMD can obtain a constant number of IMF so that it shows stable decomposition process. Additionally, the results of FEEMD are as good as those of EEMD, to which it is compared in Figure 5.
Analyzing Figure 7, the first four-dimensional CM, EMM, FEM, and SKM are all stable, and have obvious diversity among different fault styles. However, the last four-dimensional CM, EMM, FEM, and SKM are unstable and own ledivisibility relatively. Hence, it can be concluded qualitatively that to obtain accurate fault identification results, the last four-dimensional feature Step two: To execute FEEMD, feature extraction, and intelligent identification processes in turn. The procedure is the same as steps two and three provided in Section 5.2.
Step three: To compare the test performances and explain the consequences.

Discussion
From Figure 4 and Table 1, it can be concluded intuitively that the calculation time cost of FEEMD is the least of the three algorithms. Furthermore, FEEMD can obtain a constant number of IMF so that it shows stable decomposition process. Additionally, the results of FEEMD are as good as those of EEMD, to which it is compared in Figure 5.
Analyzing Figure 7, the first four-dimensional CM, EMM, FEM, and SKM are all stable, and have obvious diversity among different fault styles. However, the last four-dimensional CM, EMM, FEM, and SKM are unstable and own ledivisibility relatively. Hence, it can be concluded qualitatively that to obtain accurate fault identification results, the last four-dimensional feature should be moved from the feature matrix. Furthermore, Figure 9 shows the quantitative comparison of the four features based on the samples of the 0.007 inches ball fault data from Group A. The results shown in Figure 9 are the LSSVM outputs. The identification results in Figure 9 indicate that the diagnosis accuracy of the first four dimensional feature matrix is higher than the eight dimensional feature matrix. Therefore, it is the first four dimensional IMFs that perform as the more effective feature matrix. The dimension of the testing set is 200ˆ4ˆ4ˆ1024. should be moved from the feature matrix. Furthermore, Figure 9 shows the quantitative comparison of the four features based on the samples of the 0.007 inches ball fault data from Group A. The results shown in Figure 9 are the LSSVM outputs. The identification results in Figure 9 indicate that the diagnosis accuracy of the first four dimensional feature matrix is higher than the eight dimensional feature matrix. Therefore, it is the first four dimensional IMFs that perform as the more effective feature matrix. The dimension of the testing set is 200 × 4 × 4 × 1024.
(a)  The roller bearing fault diagnosis results of the three fault diameters (0.007 inches, 0.014 inches, and 0.021 inches) are revealed in Tables 3-5. The diagnosis accuracy of SKM were less than CM, EMM, and FEM as reflected by θ. Corresponding to column "I", the misdiagnosis rate η of non-normal data is the key reference for reliability in engineering applications. Through contrasting the four features, CM and FEM were observed to have a value of zero for η in all the tests of Tables 3-5; as such, CM and FEM are qualified to measure the reliability of roller bearings. µ reflects the average diagnosis ability.
Having µ values for all the features above 85% means that CM, EMM, FEM, and SKM are capable of implementing roller bearing fault diagnosis work. To summarize what is mentioned above, under the stationary operation condition, CM as the fault feature has almost the same diagnosis ability with EMM and FEM, and is better than SKM. Meanwhile, CM can prevent the fault data from misdiagnosing into the normal state, and thus it can provide the reliability of the application to the roller bearing. Figures 10 and 11 present the intuitive fault diagnosis results based on LSSVM, and Tables 6-11 show the exhaustive quantified results. In analyzing the results, several conclusions can be discovered. CM is the only feature that maintains high accuracy of three fault diameter conditions in group HG and BG. However, the diagnosis accuracy of SKM in each condition is not good enough to implement the fault identification (i.e., SKM has no qualification to diagnose fault of roller bearings in non-stationary operating conditions). With the increase of the fault diameter, the fault diagnosis accuracy of each feature decreases, especially for EMM and FEM. More specifically, CM is sensitive not only to the light faults but deep faults, while EMM and FEM are exceedingly sensitive to light faults and are insensitive to deep faults. In terms of BG, it generates lower diagnosis accuracy for EMM and FEM than HG. This can be explained in that the feature distribution is not balanced in each operating situation of EMM and FEM. Nevertheless, that phenomenon which is weakened by CM proves that CM has the robustness for fault identification in various operating situations. Although the AR µ of CM in Tables 10 and 11 decreased to below 90%, the normal state in each fault diameter condition has been well recognized, and is better than the other features. This proves that correntropy is sensitive to shock response.
All in all, the results verify that CM is the only feature that possesses both the robustness and the high diagnosis accuracy in a variety of different operating situations.     All in all, the results verify that CM is the only feature that possesses both the robustness and the high diagnosis accuracy in a variety of different operating situations.

Conclusions
In order to extract the fault feature and recognize the fault pattern of bearing vibration signals, this paper proposes a novel approach combining the FEEMD, IMFCM, and LSSVM methods. The analysis results from the simulation signal and the experiment data demonstrate the superiority of the approach presented by this paper.
Compared with EMD and EEMD, the test result of FEEMD shows the two advantages, namely the faster computation time and the avoidance of the mode mixing effect. The FEEMD-IMFCM-LSSVM approach framework is proposed here to achieve roller bearing fault identification both in stationary and non-stationary operating situations. Bearing rotating speed and load were the changing operating parameters considered by this paper. The IMFCM was extracted from the IMF component that was decomposed from the roller bearing vibration signal by FEEMD. Exhaustively, it is through the AM-correntropy mathematical model of primary vibration signal and IMF components that IMFCM could be obtained.
Through experiments considering single and cross-mixed operating situations, IMFEMM, IMFFEM, and IMFSKM were compared in their ability to diagnose roller bearing faults. The compared diagnosis results verify that IMFCM has the highest fault divisibility and robustness of its operating state in both single and mixed operating conditions. To be expanded for further consideration, the identification of roller bearing damage degree has great significance in improving the ability of fault warnings. In the future, the application of IMFCM could focus on performance degradation detection and identification of the degree of damage to rotary machines.