Rolling Bearings Fault Diagnosis Based on Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, Nonlinear Entropy, and Ensemble SVM

Abstract: Rolling bearings are fundamental elements that play a crucial role in the functioning of rotating machines; thus, fault diagnosis of rolling bearings is of great significance to reduce catastrophic failures and heavy economic loss. However, the vibration signals of rolling bearings are often nonlinear and nonstationary, resulting in difficulty for feature extraction and fault recognition. In this paper, a hybrid method for multiple fault diagnosis of rolling bearings is presented. The bearing vibration signals are decomposed with the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) to denoise and extract nonlinear entropy features. The nonlinear entropy features are further processed to select the more discriminative fault features and to reduce feature dimension. Then a multi-class intelligent recognition model based on ensemble support vector machine (ESVM) is constructed to diagnose different bearing fault modes as well as fault severities. The effectiveness of the proposed method is assessed via experimental case studies of rolling bearings under multiple operational conditions (i.e., speeds and loads). The results show that our method gives better diagnosis results as compared to some existing approaches.


Introduction
Rolling bearings are fundamental components that are widely used in rotating machinery, for example, in airplanes, machining centers, and wind turbines. Because rolling bearings are a vulnerable link in these machines, their faults can result in breakdown of the whole machine. According to reported statistics, about 40% (for large machines) to about 90% (for small machines) of failures can be attributed to bearing faults [1]. Therefore, rolling bearing fault diagnosis has been a hot research topic in machine health monitoring over the past few decades.
The process of rolling bearing fault diagnosis basically comprises fault feature extraction and fault classification. Extracting the fault features from vibration signals is the first and most essential step. Traditionally, bearing fault features are extracted with time-domain or frequency-domain analysis methods [2,3]. However, due to variations such as friction, damping, and operational conditions, bearing vibration signals are often nonlinear and nonstationary. Thus, bearing fault features are now more frequently extracted with time-frequency/scale methods, including short-time Fourier [...]
[...] network and then categorized bearing faulty states with a simplified fuzzy adaptive resonance theory map. Li et al. [30] compared four representative fuzzy clustering algorithms for bearing fault diagnosis. A sparse autoencoder was proposed in [31] for both feature extraction and fault classification of rolling bearings. The support vector machine (SVM) [32], based on structural risk minimization, is a promising classification method for small samples and has been successfully applied in many engineering fields, such as remote sensing, credit scoring, and mechanical fault diagnosis [33][34][35]. However, because the SVM is designed for binary classification, many SVMs must be combined for multi-class classification. Moreover, SVM learning is inefficient for big data applications. To overcome these issues, the ensemble SVM (ESVM) was proposed and has been shown to give better classification performance than a single SVM for multi-pattern recognition [36][37][38]. An ESVM is a collection of SVMs whose classification results are combined in some way to output the final decision; the generalization performance is thus improved by exploiting the differences among the individual SVMs.
Based on the above literature analysis, a novel method for multi-fault diagnosis of rolling bearings based on ICEEMDAN, nonlinear dynamics entropy, and ensemble SVM is presented. In the proposed method, the nonstationary bearing vibration signal is decomposed by ICEEMDAN, and three kinds of entropy features are extracted from each of the obtained IMFs. Then, a distance-based method is formulated for selecting discriminative features, and a multi-class ESVM is constructed and trained for intelligent classification of multi-fault rolling bearings. The innovative contributions of this work can be summarized as follows: (1) a hybrid method that integrates adaptive decomposition of nonstationary signals, nonlinear entropy analysis, and ensemble learning for multi-fault diagnosis of rolling bearings is systematically presented; (2) nonlinear entropy features, including DisEn, PerEn, and SampEn of the IMFs from ICEEMDAN, are extracted to describe the characteristics of different rolling bearing fault categories; and (3) an ensemble of multi-class SVMs trained with the selected discriminative entropy features is used to accurately and intelligently classify different faults and different severities of the rolling bearing.
The remainder of this paper is organized as follows. Section 2 briefly introduces the theoretical background. Section 3 systematically outlines the proposed approach. Section 4 applies the presented method to rolling bearing experimental data to verify its effectiveness and validity. Finally, Section 5 concludes the paper with a summary and future directions.

Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN)
To enhance the performance of the EMD, some noise-assisted improvements have been proposed. The main idea of these improved versions is to add controllable noise to the signal so that the problems with the original EMD can be solved to some extent. Most recently, ICEEMDAN was proposed to obtain components with less noise and more physical meaning at a lower computational cost [13]. The basic principle of ICEEMDAN is reformulated as follows.
Given a univariate signal $s = [s_1, s_2, \ldots, s_L]^T$ of length $L$, let $w^{(i)}$ be a realization of a zero-mean, unit-variance white noise vector of the same length, $E_k(\cdot)$ be the operator that produces the $k$th mode obtained by EMD, $M(\cdot)$ be the operator that produces the local mean of the signal to which it is applied, and $\langle \cdot \rangle$ be the operator of ensemble averaging over the realizations. Then, the procedure of ICEEMDAN can be briefly described as follows:
Step 1: Calculate the first residue $r_1 = \langle M(s^{(i)}) \rangle$ by EMD from the local means of the realizations $s^{(i)} = s + \beta_0 E_1(w^{(i)})$ $(i = 1, 2, \ldots, I)$.
Step 2: Obtain the first mode as $d_1 = s - r_1$.
Step 3: Estimate the $k$th residue $r_k = \langle M(r_{k-1} + \beta_{k-1} E_k(w^{(i)})) \rangle$ for $k = 2, \ldots, K$.
Step 4: Compute the $k$th mode as $d_k = r_{k-1} - r_k$.
Step 5: Return to Step 3 for the next $k$ until $k = K$.
Through the ICEEMDAN algorithm, the original signal s is adaptively decomposed into a set of meaningful intrinsic mode functions (IMFs) and a residual so that useful information can be obtained by further processing these components.
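The control flow of the five steps can be illustrated with a minimal Python sketch. It is not a full ICEEMDAN implementation: the local-mean operator M(·) and the mode operator E_k(·) are replaced by toy smoothing-based stand-ins (`moving_average` and `toy_mode`, both hypothetical helpers introduced only for illustration), whereas a real implementation would obtain them from EMD sifting. The noise amplitudes follow the rule quoted from [13] (a factor of 0.2 times a standard deviation), and all function and parameter names are assumptions.

```python
import random
import statistics


def moving_average(x, width=5):
    """Toy stand-in for the local-mean operator M(.) (an assumption, not EMD)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def toy_mode(w, k):
    """Toy stand-in for E_k(.): the detail removed by the k-th smoothing pass."""
    current = list(w)
    for _ in range(k - 1):
        current = moving_average(current)
    smoothed = moving_average(current)
    return [a - b for a, b in zip(current, smoothed)]


def iceemdan_sketch(s, n_modes, n_real=100, eps=0.2, seed=0,
                    local_mean=moving_average, emd_mode=toy_mode):
    """Return (modes d_1..d_K, final residue r_K) following Steps 1-5."""
    rng = random.Random(seed)
    # One white-noise realization per ensemble member, reused for every k.
    noises = [[rng.gauss(0.0, 1.0) for _ in s] for _ in range(n_real)]
    std_s = statistics.pstdev(s)

    def averaged_local_mean(base, k):
        # <M(base + beta * E_k(w_i))>, averaged over all noise realizations.
        total = [0.0] * len(base)
        for w in noises:
            e = emd_mode(w, k)
            if k == 1:
                std_e = statistics.pstdev(e)
                beta = eps * std_s / std_e if std_e > 0 else 0.0  # beta_0
            else:
                beta = eps * statistics.pstdev(base)              # beta_{k-1}
            mean_i = local_mean([b + beta * n for b, n in zip(base, e)])
            total = [t + m for t, m in zip(total, mean_i)]
        return [t / n_real for t in total]

    residue = averaged_local_mean(s, 1)                # Step 1: r_1
    modes = [[x - r for x, r in zip(s, residue)]]      # Step 2: d_1 = s - r_1
    for k in range(2, n_modes + 1):                    # Step 5: loop over k
        new_residue = averaged_local_mean(residue, k)  # Step 3: r_k
        modes.append([a - b for a, b in zip(residue, new_residue)])  # Step 4
        residue = new_residue
    return modes, residue
```

Whatever operators are plugged in, the modes and the final residue sum back to the original signal, because the differences in Steps 2 and 4 telescope.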

Dispersion Entropy
Similar to PerEn, DisEn is a symbolic dynamic (i.e., dispersion pattern) entropy that provides a fast and robust measure of time-series complexity. That is, a time series is transformed into a new signal with only a few different elements, and thus the study of the nonlinear dynamics of the time series is simplified to the corresponding distribution of symbol sequences. Although some detailed information may be lost, some of the invariant and robust dynamic properties are kept. For a given time series $d = [d_1, d_2, \ldots, d_L]^T$, its DisEn is calculated in four steps.
Step 1: Map each element $d_j$ of the signal $d$ to one of $c$ classes. To do so, first employ the normal cumulative distribution function to map $d_j$ into $y_j$. Next, assign $y_j$ to an integer from 1 to $c$ by the linear algorithm $z_j^c = \mathrm{round}(c \cdot y_j + 0.5)$.
Step 2: Map embedding vectors to dispersion patterns. For each vector [...]
Step 3: Calculate the relative frequency of potential dispersion patterns [...]
Step 4: Obtain the DisEn in the normalized form as [...]
When all the elements of a signal are assigned to one class (i.e., c = 1), the time series d is perfectly predictable and its DisEn value is equal to zero. In contrast, if all the possible dispersion patterns have equal probability, all values of d are independent and randomly distributed, which leads to the maximum value of DisEn.
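As an illustration of the four steps, the following Python sketch computes a normalized DisEn. The formulas for Steps 2 to 4 are not legible in this text, so the sketch assumes the commonly published definition: embedding vectors of length m and delay τ, the relative frequency of each observed pattern, and a Shannon entropy normalized by ln(c^m). The function name, the default parameters, and the clamping of classes to 1..c are assumptions.

```python
import math
import statistics
from collections import Counter


def dispersion_entropy(d, c=6, m=2, tau=1):
    """Normalized dispersion entropy in [0, 1] (assumed standard definition)."""
    mu, sigma = statistics.fmean(d), statistics.pstdev(d)
    if sigma == 0 or c == 1:
        return 0.0  # a single class or a constant signal gives one pattern only
    # Step 1: the normal CDF maps d_j to y_j in (0, 1), then to a class in 1..c.
    y = [0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0)))) for x in d]
    z = [min(c, max(1, round(c * yj + 0.5))) for yj in y]
    # Step 2: embedding vectors (length m, delay tau) become dispersion patterns.
    n_vec = len(z) - (m - 1) * tau
    patterns = Counter(tuple(z[i + k * tau] for k in range(m)) for i in range(n_vec))
    # Step 3: relative frequency of each observed dispersion pattern.
    probs = [count / n_vec for count in patterns.values()]
    # Step 4: Shannon entropy normalized by ln(c^m), the log of the pattern count.
    return -sum(p * math.log(p) for p in probs) / (m * math.log(c))
```

Under these assumptions, white noise yields a value close to 1, while a slowly varying sinusoid yields a clearly smaller value.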

Ensemble Support Vector Machine
Support vector machine (SVM), a kernel-based learning algorithm, is a state-of-the-art approach that learns a discriminating hyperplane maximizing the margin, which gives a good generalization ability for classification. It works by implicitly mapping the original examples into a high-dimensional space with a nonlinear function and then optimizing the decision function with the maximal margin to separate the classes. SVM was originally designed for two-class problems. Denote the $n$ original examples and the corresponding class labels as the set $\{(x_i, y_i), i = 1, 2, \ldots, n\}$, where $x_i \in \mathbb{R}^d$ is the $d$-dimensional input feature vector of the $i$th sample and $y_i \in \{+1, -1\}$ is the label of the positive or negative class. With the decision function $f(x) = w^T \phi(x) + b$ ($T$ is the transpose operator), the primal problem of the soft-margin SVM is formulated as
$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( w^T \phi(x_i) + b \right) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, 2, \ldots, n,$$
where $w$ and $b$ are the weight vector and bias, $\phi(x)$ is the nonlinear function that maps $x$ to the high-dimensional space, $\xi_i$ are slack variables, and $C$ is the penalty factor. Since $\|w\|^2$ is convex, the above minimization problem can be solved with the method of Lagrange multipliers, and the dual problem follows:
$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \qquad (4)$$
where $\alpha_i$ are the Lagrange multipliers.
The solution of Equation (4) gives $w = \sum_{i=1}^{n} \alpha_i y_i \phi(x_i)$ and the estimated bias $b$. Inserting these back into the linear decision function $f(x) = w^T \phi(x) + b$ and defining $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ as a kernel function to calculate the inner product, the decision function of the soft-margin SVM is finally represented with the sign function $\mathrm{sgn}(\cdot)$ as
$$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right). \qquad (5)$$
In practice, there are abundant multi-class classification applications, so the original SVM must be extended to multi-category problems. For this purpose, the task is decomposed into multiple two-class problems. Initially, the two strategies of one-against-all and one-against-one were adopted to construct a number of binary SVMs for discriminating more than two classes. However, a single SVM may not learn the exact parameters of the global optimum, and SVM learning is time-consuming for large-scale data. To address these problems, the ensemble SVM (ESVM) has been proposed and has been verified to greatly improve the classification performance compared to a single SVM [36].
ESVM is a collection of several SVMs in which individual decisions are combined in some way to output the classification result. Each individual SVM is independently trained with the randomly chosen training samples to form one base learner. In such a way, each base learner learns a certain area of the data samples space and is unlikely to be wrong at the same time, so ensemble of these base learners provides complementary information and thus more accurate classification results can be obtained. To ensure the differences among base learners (i.e., SVMs) for enhanced performance, methods including bootstrap and AdaBoost have been considered for selecting data samples for training individual SVM [37]. After training, the outputs from the trained base learners are aggregated in an appropriate manner to decide the final classification result. Majority voting, winner takes all, and weighted averaging are commonly used aggregating rules in case of the multi-class classification.

The Proposed Method
To ensure proper operation of rotary machinery, a hybrid method for multi-fault diagnosis of rolling bearings is presented in this paper. Figure 1 shows the framework of the method, which includes signal preprocessing and feature extraction, discriminative feature selection, and intelligent recognition.

Preprocessing with ICEEMDAN and Feature Extraction
The vibration-based monitoring technique is one of the most effective approaches for fault diagnosis of rolling bearings. However, vibration signals of rolling bearings are highly nonstationary and nonlinear, which complicates the extraction of fault symptoms. To address the signal nonstationarity, the recently developed ICEEMDAN is introduced to preprocess the acquired accelerometer vibration signals, so that each monitoring signal $s$ is decomposed as described in Section 2.1 and a set of IMFs $d_k$ $(k = 1, 2, \ldots, K)$ is obtained. Before the decomposition, some parameters need to be determined. Firstly, to control the signal-to-noise ratio (SNR) between the added noise and the residue to which the noise is added, the constants $\beta_k$ $(k = 0, 1, \ldots, K)$ should be chosen. To simulate the same increasing SNR as that in the EEMD, it was suggested in [13] that $\beta_0 = 0.2 \times \mathrm{std}(s) / \mathrm{std}(E_1(w^{(i)}))$ and $\beta_k = 0.2 \times \mathrm{std}(r_k)$ $(k \ge 1)$, where $\mathrm{std}(\cdot)$ is the standard deviation operator. This strategy is also adopted in this work. For the number of ensemble realizations $I$ used in the final averaging, a value of 100 usually leads to satisfactory analysis results in real applications.
To further address the nonlinearity of the vibration signals, the IMFs are processed to extract nonlinear dynamic features of entropy as rolling bearing fault features in this work. Specifically, dispersion entropy (DisEn), permutation entropy (PerEn), as well as sample entropy (SampEn) are considered for their effectiveness and stability. The DisEn features for all the IMFs are obtained as detailed in Section 2.2, while PerEn and SampEn for each IMF are calculated following the procedures in Table 1.

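Table 1. Procedures for calculating PerEn and SampEn (only fragments of the table body survive).
[...] where $r$ is the tolerance for the Heaviside function $H(\cdot)$.
Step 3: Calculate frequency of patterns: [...] $L - m$ [...] and $m \leftarrow m + 1$ to repeat Steps 1-2 to further obtain $C^{m+1}(r)$.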
In entropy-based analysis methods for time series, as in other types of nonlinear dynamics analysis, the appropriate choice of the related parameters is an important issue. For DisEn, there are three parameters to be determined, i.e., the number of classes c, the embedding dimension m, and the time delay τ. Theoretically, two amplitude values that are far from each other are more likely to be assigned to the same class if c is too small, while a very small disturbance (such as noise) may change the original classes if c is too large. Additionally, the computational burden increases with the number of classes c. As a tradeoff between detecting useful information and computational efficiency, it is recommended to choose c from 4 to 8 for practical uses of DisEn. If the embedding dimension m is too small, the dynamic changes of the time series might not be detected. In contrast, if m is chosen too large, small variations may go unobserved. Based on these facts, the embedding dimension m is usually set to 2. According to previous studies [23,27], τ has a very small effect on the estimation of DisEn, and aliasing may occur when τ > 1. Hence, τ is usually set to 1 to guarantee computational efficiency and to provide a reliable analysis.
From Table 1, two parameters of embedding dimension m and time delay τ should be chosen in PerEn analysis. For the embedding dimension, it is appropriate to choose m from 3-7 for bearing vibration analysis based on previous literature [20]. To make full use of the original vibrational signal, τ is often set to be 1. Also, there are two parameters in the calculation of SampEn: reconstruction embedding dimension m and similarity comparison tolerance r. Practically, the tolerance r is a value from 0.1-0.25 std(d) and the parameter m is set to be 2 in SampEn analysis.
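To make the roles of the parameters m, τ, and r concrete, the following Python sketch computes PerEn and SampEn. Because the procedures of Table 1 are not fully legible here, the sketch assumes the commonly used definitions: a Shannon entropy of ordinal patterns normalized by ln(m!) for PerEn, and the negative logarithm of the ratio of template matches of lengths m + 1 and m (Chebyshev distance, self-matches excluded) for SampEn. Function names and defaults are assumptions.

```python
import math
import statistics
from collections import Counter


def permutation_entropy(d, m=6, tau=1):
    """Normalized PerEn: Shannon entropy of ordinal patterns divided by ln(m!)."""
    n_vec = len(d) - (m - 1) * tau
    # Each embedding vector is replaced by the index order that sorts it.
    patterns = Counter(
        tuple(sorted(range(m), key=lambda k: d[i + k * tau])) for i in range(n_vec)
    )
    probs = [count / n_vec for count in patterns.values()]
    return -sum(p * math.log(p) for p in probs) / math.log(math.factorial(m))


def sample_entropy(d, m=2, r_factor=0.2):
    """SampEn = -ln(A / B) with tolerance r = r_factor * std(d)."""
    r = r_factor * statistics.pstdev(d)
    n = len(d) - m  # same number of templates for lengths m and m + 1

    def count_matches(length):
        templates = [d[i:i + length] for i in range(n)]
        hits = 0
        for i in range(n):
            for j in range(i + 1, n):  # distinct pairs only, no self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    hits += 1
        return hits

    b, a = count_matches(m), count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no template pairs match
    return -math.log(a / b)
```

The pairwise comparison in `sample_entropy` is quadratic in the signal length, which is acceptable for samples of a few thousand points but would call for a vectorized implementation on long records.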

Discriminative Feature Selection
In general, the extracted entropy features can be directly combined into a longer vector to comprehensively quantify operational status of rolling bearings. However, the high dimensionality and redundancy of the original features will increase the computation time and decrease the pattern recognition accuracy. Thus, it is necessary to select the most representative features to form a low-dimensional discriminative feature vector for fault diagnosis.
The proposed discriminative feature selection is based on the distance principle: a feature is regarded as discriminative for fault diagnosis if it gives a larger distance between different classes and a smaller distance within the same class. To be specific, denote the feature set as $f_n \in \mathbb{R}^J$ $(n = 1, 2, \ldots, N)$, where $N$ is the total number of samples and $J$ is the feature dimensionality. The discriminativity $D(f_j)$ of the $j$th feature is [...], where $Dis_{inter}$ and $Dis_{intra}$ are the interclass distance and the intraclass distance for the $j$th feature, respectively, calculated as in Equation (7) [...], where $f_{i,j}^{q}$ are the samples of the $q$th $(q = 1, 2, \ldots, Q)$ class and $N_q$ is their number.
The discriminativity $D(f_j)$ measures the ease of classifying the $Q$ classes using the $j$th feature. The larger the discriminativity of a feature, the easier it is to classify the $Q$ classes with it. Thus, the values $D(f_j)$ $(j = 1, 2, \ldots, J)$ are sorted in descending order, and the number of features used for fault diagnosis is increased one by one. When the desired accuracy is achieved, the features used so far are selected as the most discriminative features.
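The ranking procedure can be sketched in Python as follows. Since the defining equations of the discriminativity and of the two distances are not legible in this text, the sketch assumes one plausible form: the mean absolute distance between class centers divided by the mean absolute distance of samples to their own class center. This ratio, the function names, and the data layout are assumptions and may differ from the definition used in the paper.

```python
from statistics import fmean


def discriminativity(values, labels):
    """Assumed form: interclass distance divided by intraclass distance."""
    classes = sorted(set(labels))  # at least two classes are required
    groups = {q: [v for v, y in zip(values, labels) if y == q] for q in classes}
    centers = {q: fmean(g) for q, g in groups.items()}
    # Interclass: mean absolute distance between every pair of class centers.
    pairs = [(a, b) for i, a in enumerate(classes) for b in classes[i + 1:]]
    inter = fmean(abs(centers[a] - centers[b]) for a, b in pairs)
    # Intraclass: mean absolute distance of samples to their own class center.
    intra = fmean(abs(v - centers[q]) for q, g in groups.items() for v in g)
    return inter / intra if intra > 0 else float("inf")


def rank_features(samples, labels):
    """Return (feature indices in descending discriminativity, scores)."""
    n_features = len(samples[0])
    scores = [
        discriminativity([row[j] for row in samples], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Given the returned order, features would be added one at a time to the classifier input until the desired accuracy is reached, as described above.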

ESVM for Multi-Fault Diagnosis
After the discriminative feature selection, one ESVM is constructed and trained for multi-fault intelligent diagnosis of rolling bearings. To be consistent, denote the training samples and the corresponding labels as the set $TR = \{(x_i, y_i), i = 1, 2, \ldots, N\}$, where $x_i \in \mathbb{R}^d$ contains the $d$ discriminative features of the $i$th sample and $y_i \in \{1, 2, \ldots, Q\}$ is the label from the $Q$ fault conditions of rolling bearings. From the statistical point of view, the training sample sets of the individual SVMs (i.e., base learners) should be as different as possible to obtain a larger improvement from the ensemble aggregation. Thus, the bagging technique is used to construct the individual SVM classifiers in this paper. Suppose the ensemble number of SVMs is $P$; bootstrapping then builds replicated data sets $TR_p$ $(p = 1, 2, \ldots, P)$ by randomly resampling with replacement from the original training data set $TR$. In this way, each pair $x_i$ and $y_i$ of the $N$ given training samples may appear repeatedly or not at all in any particular replication. With each replicated data set $TR_p$, an individual SVM is independently trained to act as one classifier as in Equation (5) by the one-against-one strategy.
Appl. Sci. 2020, 10, 5542
A test sample $x = (f_1, f_2, \ldots, f_d)$ is input to the trained SVMs, which output a total of $P$ fault labels. To obtain the fault class label of the new sample, an aggregating rule must be chosen. Among the frequently used ensemble rules, majority voting is the simplest one for combining several SVMs and is also used here. Let $f_p(x)$ be the decision function of the $p$th SVM in the ESVM and $\#(p \,|\, f_p(x) = q)$ be the number of SVMs whose decision is the $q$th class label. Then, the final fault class label of the ESVM for the test sample $x$ is the label that receives the most votes:
$$\hat{y}(x) = \arg\max_{q \in \{1, 2, \ldots, Q\}} \#(p \,|\, f_p(x) = q).$$
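The bagging and majority-voting logic can be sketched in Python independently of the base classifier. In the sketch below, `fit_nearest_mean` is a hypothetical stand-in base learner (a nearest-class-mean rule, not an SVM) used only to keep the example self-contained; in practice, a function that trains a multi-class SVM on the replicate and returns its prediction function would be passed as `fit_base`. All names are assumptions.

```python
import random
from collections import Counter


def train_bagged_ensemble(train_x, train_y, fit_base, n_learners=10, seed=0):
    """Train P base learners, each on a bootstrap replicate TR_p of TR."""
    rng = random.Random(seed)
    n = len(train_x)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        learners.append(
            fit_base([train_x[i] for i in idx], [train_y[i] for i in idx])
        )
    return learners


def majority_vote(learners, x):
    """Return the class label chosen by the largest number of base learners."""
    votes = Counter(predict(x) for predict in learners)
    return votes.most_common(1)[0][0]


def fit_nearest_mean(xs, ys):
    """Hypothetical stand-in base learner (NOT an SVM): nearest class mean."""
    means = {}
    for q in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == q]
        means[q] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Pick the class whose mean has the smallest squared Euclidean distance.
        return min(means, key=lambda q: sum((a - b) ** 2 for a, b in zip(x, means[q])))

    return predict
```

Separating the resampling and voting from the base learner keeps the aggregation rule easy to swap, for example for weighted averaging.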

Application to Rolling Bearings Fault Diagnosis
In this section, the proposed multi-fault diagnosis method is applied to the rolling bearing experiments under different operational conditions and is also compared with some existing methods.

Description of the Experimental Setup
To validate the proposed multi-fault diagnosis method, the publicly available rolling bearing vibration data shared by the bearing data center of Case Western Reserve University (CWRU) [39] are used in this paper. Since it became available, this data set has been intensively studied for bearing fault diagnostics [40,41]. The test stand of the rolling bearing experiment is illustrated in Figure 2. The motor shaft (on the left side) support bearings were used as the test bearings, which are SKF deep groove ball bearings of type 6205-2RS JEM. During the experiments, single point faults ranging from 0.007 to 0.028 inches were introduced to the bearing elements (i.e., ball, inner raceway, and outer raceway) using electro-discharge machining (EDM). Faulted bearings were then reinstalled into the test motor, and vibration data were recorded for motor loads of 0 to 3 horsepower (approximate motor speeds of 1797 to 1720 rpm). Specifically, the accelerometer signals of the drive-end bearing were recorded at a sampling frequency of 12 kHz with a duration of one second. In this scenario, there are a total of ten bearing fault classes. Besides the normal state, three defect severities are simulated for each of the ball, the inner raceway, and the outer raceway. For each class, the original vibration signal is divided into a number of training and testing samples with a length of 2048 data points (about 0.171 s). Details of the analyzed multi-fault bearing data set are listed in Table 2.
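As a small worked example of the segmentation, the following Python sketch splits a recorded signal into samples of 2048 points and checks the stated sample duration at 12 kHz. Consecutive, non-overlapping segments are an assumption, since the text does not state whether the samples overlap; the names are hypothetical.

```python
FS_HZ = 12_000                   # sampling frequency stated in the text
SEGMENT_LENGTH = 2048            # data points per sample stated in the text
SEGMENT_SECONDS = SEGMENT_LENGTH / FS_HZ  # 2048 / 12000, about 0.171 s


def segment_signal(signal, length=SEGMENT_LENGTH):
    """Split a signal into consecutive, non-overlapping samples of `length` points."""
    n_segments = len(signal) // length  # a trailing partial segment is dropped
    return [signal[i * length:(i + 1) * length] for i in range(n_segments)]
```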

Results and Analysis
After the above experimental configuration, training and testing vibrational samples for different bearing conditions under four motor loads are prepared. One vibration signal sample for each of the ten fault classes under 0 horsepower motor load is illustrated in Figure 3, which demonstrates that it is problematic to discern each bearing fault class directly from the time waveforms, especially for the ball and inner raceway fault modes.

Then, the ICEEMDAN algorithm is used to preprocess the time waveform signals with the parameters chosen as detailed in Section 3.1, and the results for one sample of bearing fault class 7 (i.e., outer raceway fault mode with a fault size of 0.007 inches) under 0 horsepower motor load are displayed in Figure 4. For comparison purposes, the decompositions by CEEMDAN and EEMD are performed with the same parameters. It can be observed from Figure 4 that the periodic impulses generated by an outer raceway point fault are successfully detected by all three algorithms. However, CEEMDAN and EEMD each produce ten IMFs, while ICEEMDAN produces nine. Similar results are also found for other vibration samples. Thus, more compact decomposition results can be obtained with the ICEEMDAN algorithm. In the following, the obtained IMFs are further processed to extract fault features of the rolling bearing.
Specifically, the nonlinear entropy features DisEn, PerEn, and SampEn are extracted for fault diagnosis of rolling bearings. Before the three entropy features are calculated, the related parameters must be set. Based on the analysis in Section 3.2 and some initial trials, the parameters are chosen for all bearing conditions under the four motor loads as c = 8, m = 2, and τ = 1 for DisEn; m = 6 and τ = 1 for PerEn; and m = 2 and r = 0.2 std(d) for SampEn. In fact, the parameters of DisEn and SampEn have limited effects on the extracted features, whereas the embedding dimension has a relatively greater influence on PerEn. The values above are a compromise between computational complexity and effectiveness.
The scatter plots of the entropy features under 0 horsepower motor load are shown in Figure 5. In previous studies, IMF1 from such a decomposition is usually selected as the component for further analysis; thus, the three kinds of entropy features of IMF1 are also plotted in Figure 5. Some observations can be made. Firstly, different bearing conditions are easier to discern in the extracted entropy feature space than in the original time waveforms (refer to Figure 3). This is especially evident for the inner raceway fault mode, since the three classes with increasing fault sizes are well clustered and demarcated from the remaining seven classes. Nevertheless, such improvements are limited for the three ball fault sizes because of their large dispersion. Secondly, the different fault sizes of the outer raceway are well clustered both with and without discriminative feature selection of the extracted nonlinear entropies. Additionally, the proposed discriminative feature selection method clusters and demarcates the normal condition and the three ball fault conditions better than the IMF1 features do.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 12 of 21
To get a sense of the effect of the number of selected features on the classification accuracy, a multi-class ESVM with an ensemble number of ten is constructed and trained with the extracted nonlinear entropies of the ten bearing conditions. Specifically, the kernel function is a radial basis function with unit bandwidth and the penalty factor is C = 10,000. The results for class 3 (i.e., ball fault mode with fault size of 0.021 inches) are depicted in Figure 6. Although the accuracy under the four motor loads (i.e., 0 to 3 horsepower) varies differently as the number of features increases, perfect recognition can be obtained for this bearing fault class with the first three discriminative entropy features. Furthermore, a detailed analysis of the other nine bearing conditions shows that more than 85% training accuracy can be obtained if the number of discriminative features is set to three. Therefore, the first three extracted entropy features are selected as the discriminative features for each class to diagnose rolling bearing faults.
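The selection of the first few discriminative features can be sketched as a ranking of candidate entropy features by how well each one separates the classes. The sketch below scores each feature by the ratio of an interclass distance to an intraclass distance and keeps the top-ranked ones. The specific score (squared distances of class centers from their mean, divided by the mean within-class squared deviation) and the name `rank_features` are assumptions for illustration; the exact criterion used by the method is the one defined in its feature selection section.

```python
from statistics import mean


def rank_features(samples, labels):
    """Rank feature indices by an interclass / intraclass distance ratio."""
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for f in range(n_features):
        by_class = {
            c: [s[f] for s, y in zip(samples, labels) if y == c] for c in classes
        }
        centers = {c: mean(values) for c, values in by_class.items()}
        overall = mean(centers.values())
        # Interclass distance: spread of the class centers around their mean.
        inter = mean((centers[c] - overall) ** 2 for c in classes)
        # Intraclass distance: average spread of samples around their own center.
        intra = mean(
            mean((v - centers[c]) ** 2 for v in by_class[c]) for c in classes
        )
        scores.append(inter / intra if intra > 0 else float("inf"))
    order = sorted(range(n_features), key=lambda f: scores[f], reverse=True)
    return order, scores


# Toy example: feature 0 separates the two classes, feature 1 does not.
samples = [(0.0, 1.0), (0.1, 5.0), (1.0, 1.2), (1.1, 4.8)]
labels = [0, 0, 1, 1]
order, scores = rank_features(samples, labels)
selected = order[:3]  # keep at most the first three discriminative features
```

With real data, `samples` would hold one row per vibration sample and one column per candidate entropy feature (for example, DisEn, PerEn, and SampEn of each IMF).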
With the above analysis, a multi-fault diagnosis of rolling bearings can then be performed online. To validate the performance of the proposed approach, the training and testing diagnostic results of the method and of the non-discriminative features scenario (i.e., DisEn, PerEn, and SampEn of IMF1) under 0 horsepower motor load are summarized in Table 3. Interestingly, the compromised training and testing accuracies for both scenarios occur in the ball fault mode (i.e., bearing fault classes 1, 2, and 3), which was highlighted for further research in the benchmark literature [41]. For the proposed method with the selected discriminative features, the training and testing accuracies are compromised in fault classes 2 and 3, respectively, whereas for the non-discriminative features scenario the training accuracy is compromised in all three ball fault classes and the testing accuracy is compromised in fault classes 2 and 3.
Table 3 (excerpt: fault classes 5 to 10; training and testing accuracies in % for the two scenarios).
Class   Accuracy (%)
5       100   100   100   100
6       100   100   100   100
7       100   100   100   100
8       100   100   100   100
9       100   100   100   100
10      100   100   100   100
The corresponding misclassified testing samples for the proposed method and the non-discriminative features scenario under 0 horsepower motor load are plotted in Figure 7. For the non-discriminative features scenario, one testing sample from each of fault classes 2 and 3 is misclassified, into fault classes 6 and 8, respectively. With the selected discriminative features, however, only one testing sample from fault class 3 is misclassified into fault class 8. These results can be partially explained by the entropy feature scatters in Figure 5, where the related cluster boundaries overlap to some extent. To draw a comprehensive comparison between the non-discriminative features (NDF) and the proposed discriminative features (DF) scenarios, the testing results of all the bearing fault classes under the four motor loads are averaged. From the average recognition rates in Figure 8, only classes 2 and 3 do not reach 100% accuracy. The recognition rates for class 2 (i.e., ball fault mode with fault size of 0.014 inches) are especially low. For both classes, a higher average recognition rate is obtained with the discriminative features. Thus, the proposed method with discriminative feature selection is validated to demonstrate much better performance.
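The averaging step described above can be made concrete with a short sketch: a per-class recognition rate is computed for each motor load as the percentage of testing samples of that class that are classified correctly, and the rates are then averaged over the loads. The function names and the toy labels below are hypothetical and serve only to show the arithmetic; they are not the experimental data.

```python
def recognition_rates(true_labels, predicted_labels, classes):
    """Per-class recognition rate in percent for one operating condition."""
    rates = {}
    for c in classes:
        idx = [i for i, y in enumerate(true_labels) if y == c]
        if not idx:
            continue  # no testing samples of this class
        correct = sum(1 for i in idx if predicted_labels[i] == c)
        rates[c] = 100.0 * correct / len(idx)
    return rates


def average_over_loads(rates_per_load):
    """Average the per-class rates across operating conditions."""
    classes = rates_per_load[0].keys()
    n = len(rates_per_load)
    return {c: sum(r[c] for r in rates_per_load) / n for c in classes}


# Toy example with two classes and two loads (illustrative values only).
load_a = recognition_rates([1, 1, 2, 2], [1, 1, 2, 3], classes=[1, 2])
load_b = recognition_rates([1, 1, 2, 2], [1, 1, 2, 2], classes=[1, 2])
averaged = average_over_loads([load_a, load_b])
```

In this toy case one of the two class 2 samples is missed under the first load, so the averaged rate for class 2 falls between the two per-load values while class 1 stays at 100%.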

Comparisons and Discussions
In this subsection, our proposed method is further compared with some other related methods.
For vibration signal preprocessing, the ICEEMDAN algorithm is used in our method. The CEEMDAN and EEMD algorithms have also been frequently utilized in the literature for bearing signal decomposition. Therefore, to study the effect of the preprocessing algorithm on multi-fault diagnosis performance, fault feature extraction and selection are also performed with the IMFs from CEEMDAN and EEMD to classify the ten bearing fault conditions.

The results under 0 horsepower motor load are listed in Table 4. All three algorithms correctly classify every class except class 3 of the ball fault mode. For this class under 0 horsepower motor load, two testing samples are misclassified with EEMD and only one is misclassified with CEEMDAN and with ICEEMDAN. From the misclassified testing samples in Figure 9 for EEMD and CEEMDAN and in Figure 7b for ICEEMDAN, all the off-target samples of class 3 are labeled class 8 (i.e., outer raceway fault mode with fault size of 0.014 inches). Indeed, the time waveforms of classes 3 and 8 in Figure 3 are very similar to each other, which complicates the correct classification of class 3. In addition, the average recognition rates obtained with the three preprocessing algorithms for the ten bearing fault classes under four motor loads are summarized in Figure 10. Again, only classes 2 and 3 do not reach perfect recognition, and the best performance is obtained with the ICEEMDAN algorithm, which further verifies the effectiveness of the proposed hybrid method.
Figure 10. The average recognition rates by the three preprocessing algorithms for the ten bearing fault classes.
In the proposed method, the classifier is a multi-class ESVM. To evaluate the performance of the ESVM, three other pattern recognition models are also tested: a neural network (NN), k-Nearest Neighbor (kNN), and a plain multi-class SVM. Regarding the settings, the SVM uses the same parameters as the ESVM, k = 1 for the kNN classifier, and the NN is a multilayer perceptron with 25 hidden nodes. The classification results under 0 horsepower motor load are listed in Table 5. It can be concluded that the NN has the worst performance, with misclassifications in four bearing conditions, while the other three classifiers miss the target only in the hardest ball fault mode. To be specific, kNN misclassifies two testing samples in each of classes 2 and 3, SVM misses two targets in class 3, and ESVM misclassifies only one testing sample in class 3.
Detailed analysis can be seen in Figure 11, where the misclassifications are marked. For the multi-class SVM, the testing samples are erroneously classified into not only class 8 but also class 2, which is a poorer result than that of the ESVM in Figure 7b. As regards the NN model, eight testing samples are misclassified, resulting in the poorest recognition rates. Finally, kNN assigns two testing samples from class 2 to class 6, and two testing samples from class 3 to class 8 and class 1, respectively. To further evaluate the average performance of the classifiers, the recognition rates under four motor loads are averaged and shown in Figure 12.
From Figure 12, similar conclusions can be drawn: the NN classifier obtains the lowest recognition rates. Generally, the reduced recognition rates again occur in fault classes 1-3 (i.e., ball fault mode). The classification accuracy is the highest with ESVM, which further highlights the performance of the proposed method.
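To illustrate why an ensemble can outperform a single classifier, the following is a minimal sketch of one common way to build and combine an ensemble: each member is trained on a bootstrap resample of the training set, and the final label is decided by majority vote. Both the bootstrap resampling and the majority vote are assumptions for this sketch, since the combination rule of the ESVM is not restated here. The base learner is a nearest-centroid stand-in, not an SVM, chosen only to keep the example self-contained; all function names are hypothetical.

```python
import random
from collections import Counter


def train_nearest_centroid(samples, labels):
    """Stand-in base learner (NOT an SVM): label of the nearest class centroid."""
    counts = Counter(labels)
    sums = {}
    for s, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(s))
        for k, v in enumerate(s):
            acc[k] += v
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def train_ensemble(train_base, samples, labels, n_members=10, seed=0):
    """Train n_members base classifiers, each on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(samples)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(
            train_base([samples[i] for i in idx], [labels[i] for i in idx])
        )
    return members


def predict_majority(members, x):
    """Combine member predictions by majority vote (ties: first-seen label)."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters standing in for two fault classes.
samples = [(i * 0.01, 0.0) for i in range(20)] + [(5 + i * 0.01, 0.0) for i in range(20)]
labels = [0] * 20 + [1] * 20
ensemble = train_ensemble(train_nearest_centroid, samples, labels, n_members=10)
```

Replacing `train_nearest_centroid` with a function that trains an RBF-kernel SVM would give an ensemble closer in spirit to the classifier compared above, with the ensemble number of ten corresponding to `n_members=10`.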

Conclusions
For improved fault diagnosis of rolling bearings, a hybrid method based on ICEEMDAN, nonlinear dynamics entropy, and ESVM for multi-fault diagnosis is presented in this paper. The ICEEMDAN algorithm is utilized to decompose the nonlinear and nonstationary bearing vibration signals, and three kinds of entropy features are extracted from the obtained IMFs. Then, the discriminative features are selected based on the interclass and intraclass distances, and the multi-fault conditions of rolling bearings are intelligently recognized using the multi-class ESVM. The effectiveness of the proposed method is validated with extensive experimental case studies of rolling bearings with different fault modes as well as fault severities under multiple operational conditions (i.e., speeds and loads). Comparisons with some other decomposition algorithms and classifiers further demonstrate the performance of the proposed approach. In the future, the application of the above method to multi-fault rolling bearing diagnosis across different operational conditions will be addressed to bridge the gap between research and real applications.
