Rub-Impact Fault Diagnosis Using an Effective IMF Selection Technique in Ensemble Empirical Mode Decomposition and Hybrid Feature Models

The complex nature of rubbing faults makes it difficult to use traditional signal analysis methods for feature extraction. Various time-frequency analysis approaches based on signal decomposition, such as empirical mode decomposition (EMD) and ensemble EMD (EEMD), have been widely utilized recently to analyze rub-impact faults. However, traditional EMD suffers from “mode-mixing”, and in both EMD and EEMD the relevance of the extracted components to rubbing processes must be determined. In this paper, we introduce a new informative intrinsic mode function (IMF) selection method for EEMD and a hybrid feature model for diagnosing rub-impact faults of various intensities. Our method uses a novel selection procedure that combines the degree-of-presence ratio of rub impact and a Kullback–Leibler divergence-based similarity measure into an IMF quality metric with adaptive threshold-based selection to pick the meaningful signal-dominant modes. Signals reconstructed using the selected IMFs contained explicit information about the rubbing faults and are used for hybrid feature extraction. Experimental results demonstrated that the proposed approach effectively defines meaningful IMFs for rubbing processes, and the presented hybrid feature model allows for the classification of rub-impact faults of various intensities with good accuracy.


Introduction
Rotating machines, such as turbines, are widely used for power generation and usually operate under severe operating conditions characterized by high temperatures and high rotational speeds. The goal of the turbine design process is to maintain a small clearance between the rotor blades and the stator to increase torque and reduce air reluctance. Rubbing occurs when the turbine blades interact with the stationary parts, either due to blade expansion resulting from high temperatures or due to faults, such as misalignment of the shaft and self-excited vibrations [1]. If faults are not detected at an early stage, rubbing can cause excessive damage to the rotating machine, significantly increasing the maintenance cost. Therefore, the purpose of rub-impact fault diagnosis is to extract intrinsic information about rub impact and apply the extracted features to a classifier to determine rubbing faults at different intensities.
Rubbing faults are recognized as highly complex nonlinear and nonstationary faults [2,3], which cause a large number of transients to appear in the signal. This makes it difficult to utilize traditional signal-processing techniques for feature extraction, such as time-domain and frequency-domain fast Fourier transform (FFT) analysis, which cannot efficiently detect transient phenomena in a have focused on diagnosing rubbing faults of various intensities, and very few actual feature extraction models with numerical values that can be directly applied to classification techniques for rub-impact fault diagnosis have been proposed. Therefore, this paper proposes a reliable rub-impact fault feature extraction approach that combines EEMD with a new IMF selection procedure and uses the selected components to reconstruct the signal and perform hybrid feature extraction. The signal reconstructed using these selected informative IMFs contains less noise and clear rubbing fault frequency components, which makes the rubbing process easily observable. The proposed hybrid feature model for fault feature extraction allows for the examination of rubbing phenomena in both time and frequency domains and produces well-separable features that can be used for fault diagnosis. It is important to mention that there exist other metrics that can be used to quantify various repetitive transients and select informative portions (or sub-bands) of a signal, such as a kurtosis and its modification: a spectral kurtosis. These metrics were used in combination with other TFA decomposition techniques; however, these metrics appeared to be not exactly proportional to the degree of defectiveness of the system and the result highly depends on the frequency resolution assigned prior to the decomposition process [32,33].
The main contributions of this manuscript are as follows: 1. This paper presents a new informative IMF selection procedure that can be used to select the modes obtained by EEMD in rubbing fault analysis. The proposed method includes a new quality criterion for mode evaluation that combines the degree-of-presence ratio (DPR) of rub impact and the Kullback-Leibler divergence (KLD), a statistical similarity metric [34]. An adaptive selection technique then utilizes a thresholding approach and the aforementioned criterion to adaptively select the most valuable IMFs. The selected informative signal-dominant IMFs carry intrinsic information about the important harmonics of rub-impact faults. 2.
Since the selected IMFs are highly effective at detecting rub-impact phenomena, this paper then extracts features for rub-impact fault diagnosis from the signal reconstructed using the selected components. Thus, a hybrid feature model that well-represents rub-impact fault conditions is presented in this study. Our hybrid feature model consists of four features directly extracted from the reconstructed signal in the time domain and three features extracted from the envelope power spectrum of this signal. This hybrid set of features is highly effective for representing each rub-impact fault condition, so these features are further used in a classifier for diagnosing rubbing faults with various intensities.
The remaining sections of this paper are structured as follows. Section 2 presents the proposed rub-impact fault feature extraction methodology, including the adaptive IMF selection procedure and hybrid feature model for diagnosing rubbing faults of various intensities. Section 3 provides experimental validation of the proposed methods. Finally, Section 4 contains the concluding remarks.

Proposed Rub-Impact Fault Feature Extraction Technique
The block diagram of the proposed rub-impact fault diagnosis framework is presented in Figure 1. The proposed approach consists of four essential steps: data acquisition, signal processing, feature extraction, and fault classification. As shown in Figure 1, after data acquisition, an unknown vibration signal is first decomposed into a finite number of oscillating components by EEMD. Then, the subset of IMFs that are signal-dominant and carry essential information about the rubbing faults is selected using the proposed adaptive selection procedure with our novel IMF evaluation metric. Next, timeand frequency-domain features are extracted from the signal reconstructed using the selected IMFs, which represents a clear rub-impact fault signal. This set of hybrid features can provide sufficient insight into the rubbing process to classify rubbing faults of various intensities. The fault diagnosis procedure is completed by means of a one-against-all multi-class support vector machine (OAA MC SVM) classifier [35,36].  Figure 2 shows the self-designed experimental testbed used to collect rubbing fault data by simulating rub-impact faults of different intensities. Two sensors were installed at the drive end (DE) and non-drive end (NDE) of the shaft to continuously measure vibrations of the rotor. Each sensor records the displacements of the 16-blade rotor in both the vertical and horizontal directions using a different channel for each. Therefore, a total of four channels were used by two vibration sensors to record displacements in the vertical and horizontal directions at both ends of the shaft. A Pulse 3560 C device was used to digitize the acquired signal. Details of the data collection system are provided in Table 1. The experiment was performed at a constant rotational speed of 2580 revolutions per minute (RPM), and the signal was sampled at a rate of 65.5 kHz. A rub-impact fault was simulated by adding extra weight on the NDE to create a shaft imbalance. This imbalance in the shaft caused local interactions between the rotor blades and the stationary part. The appearance and intensity of the rubbing process were validated using a thermal camera mounted on the NDE of the shaft. Adjusting the extra mass at the end of the shaft resulted in various rubbing intensities as shown in Figure 3. The total duration of the recorded signal for each case was 59 s. However, each signal was divided into slices of 1 s each for signal processing and feature extraction.  Figure 2 shows the self-designed experimental testbed used to collect rubbing fault data by simulating rub-impact faults of different intensities. Two sensors were installed at the drive end (DE) and non-drive end (NDE) of the shaft to continuously measure vibrations of the rotor. Each sensor records the displacements of the 16-blade rotor in both the vertical and horizontal directions using a different channel for each. Therefore, a total of four channels were used by two vibration sensors to record displacements in the vertical and horizontal directions at both ends of the shaft. A Pulse 3560 C device was used to digitize the acquired signal. Details of the data collection system are provided in Table 1. The experiment was performed at a constant rotational speed of 2580 revolutions per minute (RPM), and the signal was sampled at a rate of 65.5 kHz. A rub-impact fault was simulated by adding extra weight on the NDE to create a shaft imbalance. This imbalance in the shaft caused local interactions between the rotor blades and the stationary part. The appearance and intensity of the rubbing process were validated using a thermal camera mounted on the NDE of the shaft. Adjusting the extra mass at the end of the shaft resulted in various rubbing intensities as shown in Figure 3. The total duration of the recorded signal for each case was 59 s. However, each signal was divided into slices of 1 s each for signal processing and feature extraction.

Empirical Mode Decomposition and Its Variant
In this subsection, we provide the necessary background on the original EMD and EEMD algorithms.

Empirical Mode Decomposition and Its Variant
In this subsection, we provide the necessary background on the original EMD and EEMD algorithms.

Empirical Mode Decomposition and Its Variant
In this subsection, we provide the necessary background on the original EMD and EEMD algorithms.

Empirical Mode Decomposition
EMD [15] decomposes the original signal x(t) into a finite number of oscillatory components using a sifting process. In order to be defined as an IMF, the function should meet the following two conditions [16,37]: (i) the number of extrema and zero-intersect points are either equal to each other or differ by at most one; and (ii) at any point, the mean of the envelopes defined by local maxima and minima is equal to zero. Each IMF can be considered as a specific frequency band of the original signal, where the first IMFs represent high-frequency bands and the last IMFs correspond to lower frequency bands [15].
After the decomposition process, the input signal x(t) can be defined as shown below: where n is the total number of extracted IMFs and r n (t) is the residue of the signal decomposition. The quality of the decomposition can be evaluated by computing the amplitude error between the original signal and the reconstructed one. EMD is a powerful tool that is capable of extracting nonlinear and nonstationary parts of the original signal. However, its crucial disadvantage is the mode-mixing problem, which leads to multiple oscillating components being presented in a single IMF or similar oscillating components being split in different modes with smaller amplitudes. This drawback causes difficulties in interpreting the physical meaning of each mode for feature extraction and fault diagnosis.

Ensemble Empirical Mode Decomposition
EEMD [16,25] was introduced to overcome the problems of mode-mixing observed in conventional EMD. The concept of EEMD is to obtain precise IMF components by taking the mean of several EMD trials performed on the original signal, with the addition of various realizations of white noise in each trial. The main advantage of EEMD over conventional EMD is that it significantly reduces the chance of mode-mixing and is capable of decomposing the original signal more precisely into a set of "true" IMFs. The implementation steps of EEMD are summarized as Algorithm 1.  Figure 4 shows the IMFs obtained by conventional EMD and EEMD for a signal corresponding to an intensive rubbing condition. Figure 4a shows that IMF 8, which corresponds to the fundamental frequency, contained an additional oscillatory component that affects the quality of information carried by this mode, an example of mode-mixing. In this figure, IMF 9 and IMF 10 correspond to the 1/2× and 1/3× fractional harmonics; however, the behavior of these oscillations looks very similar. Figure 4b demonstrates that EEMD better separated the extracted modes, and the IMF of the fundamental frequency (IMF 10) is not affected by other oscillations or noise. Also, EEMD was able to decompose the original vibration signal more accurately into a larger number of modes. IMF 9, the 3/2× fractional frequency harmonic, became visible after decomposition, while in conventional EMD decomposition the component with this frequency harmonic was absent. Moreover, better separation of the IMFs was observed in the range of low fractional harmonics, such as 1/2× and 1/3×. These results are reasonable because due to its specific features as an improvement to the traditional EMD approach, EEMD is able to better separate oscillating components and deliver more clear IMFs. demonstrates that EEMD better separated the extracted modes, and the IMF of the fundamental frequency (IMF 10) is not affected by other oscillations or noise. Also, EEMD was able to decompose the original vibration signal more accurately into a larger number of modes. IMF 9, the 3/2× fractional frequency harmonic, became visible after decomposition, while in conventional EMD decomposition the component with this frequency harmonic was absent. Moreover, better separation of the IMFs was observed in the range of low fractional harmonics, such as 1/2× and 1/3×. These results are reasonable because due to its specific features as an improvement to the traditional EMD approach, EEMD is able to better separate oscillating components and deliver more clear IMFs.

IMF Selection Procedure for Rubbing Fault Diagnosis
As discussed in Section 1, the extracted IMFs can be either signal-dominant or noise-dominant, so it is crucial to select informative IMFs that contain intrinsic information about rub-impact faults. This paper presents a novel adaptive selection method for informative signal-dominant IMFs that can be applied to the domain of rubbing signals. The following two procedures must be applied in IMF selection: evaluating the extracted components using certain criteria to determine which candidate IMFs contain the most valuable information and creating a subset that contains those chosen IMFs. To define the evaluation criterion, we carefully analyze the rubbing phenomenon using the DPR of rub-impact and the KLD (a statistical similarity metric) of each extracted IMF. Based on this new criterion, all of the components can be sorted by their relevance to the rubbing process. Once the IMFs are sorted and their quality determined, the best adaptively selected candidates are used to reconstruct a signal with reduced noise that contains clear rub-impact fault components and can be used for rubbing fault feature extraction.
Since it is known that the appearance and behavior of fractional harmonics are essential features for rub-impact fault diagnosis [11,12,38], the better extraction and separation of the modes containing these harmonics make EEMD preferable to EMD.
Whenever rub-impact faults occur in a system, they affect the harmonics of the fundamental frequency and some fractional frequencies, which can be clearly observed in the envelope power spectrum of the signal. DPR aims to detect the presence and power of these expected frequencies and their harmonics in the envelope power spectra of the IMFs and quantifies each IMF with respect to the ratio of the presence using the procedure presented in Figure 5. To ensure that the DPR can determine informative and meaningful IMF components, the important frequencies of interest and their harmonics were determined based on various studies of rubbing processes [11,12,38] to be the

IMF Selection Procedure for Rubbing Fault Diagnosis
As discussed in Section 1, the extracted IMFs can be either signal-dominant or noise-dominant, so it is crucial to select informative IMFs that contain intrinsic information about rub-impact faults. This paper presents a novel adaptive selection method for informative signal-dominant IMFs that can be applied to the domain of rubbing signals. The following two procedures must be applied in IMF selection: evaluating the extracted components using certain criteria to determine which candidate IMFs contain the most valuable information and creating a subset that contains those chosen IMFs. To define the evaluation criterion, we carefully analyze the rubbing phenomenon using the DPR of rub-impact and the KLD (a statistical similarity metric) of each extracted IMF. Based on this new criterion, all of the components can be sorted by their relevance to the rubbing process. Once the IMFs are sorted and their quality determined, the best adaptively selected candidates are used to reconstruct a signal with reduced noise that contains clear rub-impact fault components and can be used for rubbing fault feature extraction.
Since it is known that the appearance and behavior of fractional harmonics are essential features for rub-impact fault diagnosis [11,12,38], the better extraction and separation of the modes containing these harmonics make EEMD preferable to EMD.
Whenever rub-impact faults occur in a system, they affect the harmonics of the fundamental frequency and some fractional frequencies, which can be clearly observed in the envelope power spectrum of the signal. DPR aims to detect the presence and power of these expected frequencies and their harmonics in the envelope power spectra of the IMFs and quantifies each IMF with respect to the ratio of the presence using the procedure presented in Figure 5. To ensure that the DPR can determine informative and meaningful IMF components, the important frequencies of interest and their harmonics were determined based on various studies of rubbing processes [11,12,38] to be the following: 1/3×, 1/2×, 2/3×, 1×, 4/3×, 3/2×, and 5/3×. The detailed DPR calculation procedure is as follows.
Sensors 2018, 18, x FOR PEER REVIEW 9 of 23 following: 1/3×, 1/2×, 2/3×, 1×, 4/3×, 3/2×, and 5/3×. The detailed DPR calculation procedure is as follows. Step 1: To compute an envelope power spectrum, the analytical signal of the IMF in the time domain is first calculated. For example, if ( ) x t is the original IMF signal, the analytical signal can be represented as a combination of the original IMF signal and the virtue of the Hilbert transform. The analytical signal can thus be formulated as follows: where ( ) x t  is the Hilbert transform. Convolution of the original IMF signal with the signal 1/ t π yields the following: The power spectrum of the analytical signal is then obtained by taking the square of the absolute value of the Fourier transform,   Step 1: To compute an envelope power spectrum, the analytical signal of the IMF in the time domain is first calculated. For example, if x(t) is the original IMF signal, the analytical signal can be represented as a combination of the original IMF signal and the virtue of the Hilbert transform. The analytical signal can thus be formulated as follows: where x(t) is the Hilbert transform. Convolution of the original IMF signal with the signal 1/πt yields the following: The power spectrum of the analytical signal is then obtained by taking the square of the absolute value of the Fourier transform, F x h (t) 2 , as shown in Step 1 of Figure 5.
Step 2: Since the envelope power spectrum reveals the transient impact of rubbing, a Gaussian mixture model (GMM) window (G window (k, δ)) is constructed around the peaks of the frequencies of interest and their integer multiples to attain residual components in the frequency domain of the envelope power spectrum. The coefficients of G window (k, δ) are computed as follows: ( where FOI n defines the n th harmonic of the frequency of interest; n = 3 was used to calculate the DPR. N r f req is the number of frequency bins in the range FOI n − f range ≤ k ≤ FOI n + f range , as shown below: The value of N w f req can be computed (see Step 2 in Figure 5) for a GMM-DPR as follows: Similarly, δ is a Gaussian random variable that is inversely proportional to the standard deviation and can be defined as follows: Here, N w f req is the frequency bin size around the frequency-of-interest components, and the value of m is constant in the range 0 < m < 1, (m = 0.1). A narrow band frequency range (i.e., f range = 1/3FOI) is used for all the frequencies of interest in this paper.
Step 3: The components of the frequencies of interest are then calculated by multiplying the defined Gaussian window G window (k, δ) around the frequencies of interest and their harmonics in the attained envelope power spectrum.
Step 4: The residual frequency components are computed by subtracting the frequency-of-interest component (from Step 3) from the envelope power spectrum. Once we have the frequency of interest and residual components, the DPR is calculated as the ratio of the frequency-of-interest components and the residual frequency components as shown below: Here, C n,j and R n,j are the magnitudes of the j th frequency bin for the frequency-of-interest components and residual frequency components, respectively, around the n th harmonic of the frequency of interest.
Note that a large DPR of an IMF means that the component contains valuable information about the behavior of the fundamental frequency, its fractional harmonics, and other frequencies that can be observed during the rubbing process. A small DPR value indicates the opposite: the current IMF could be noise-dominant and does not provide valuable information that can be used for rubbing fault diagnosis.
Next, the distances between the probability density functions (PDFs) of the extracted IMFs and the original signal are calculated to discover similarities between the hidden structures and determine how much of the original signal's information is preserved in the selected component.
The distance function utilized in this paper, KLD, is one of the most frequently used information-based distance measures and is a member of the Shannon's entropy family [39]. The function is defined as follows: where PDF(x(t)) is the PDF of the original signal x(t) and PDF(IMF k (t)) is the PDF of the k th extracted IMF. Once the DPR and KLD have been calculated, we define an IMF quality measure by Equation (10), which will be used to determine the relevance of the extracted components, and then select informative modes: Here, DPR k represents the sum of all degrees of presence calculated for each N w f req and their three harmonics for the k th IMF, and KLD k is the measure of how close the PDF of the k th IMF is to the PDF of the original signal.
Finally, the IMF selection procedure is completed using a thresholding technique that compares the ratio IMFval k with a threshold value that is set to 1. The threshold value is assigned to 1 because the relationship between the DPR and KLD for each IMF can be considered as a signal-to-noise ratio (SNR). It is known that when the SNR is greater than 1, the amount of signal in the data is greater than the amount of noise. That is, a small KLD distance value in the IMFval k ratio indicates that the PDF of the mode under evaluation is similar or close to that of the original signal. Therefore, IMFval k grows when the DPR (the metric that reflects the presence of important information) is large compared to KLD. This means that the IMF contains a significant amount of valuable information about the rubbing fault and its PDF is very close to that of the original signal, which indicates that the IMF is signal-dominant. In the opposite direction, IMFval k decreases when the IMF contains a small amount of valuable information (low values of DPR) and a large KLD (the component appears to be noise-dominant). Therefore, if IMFval k is greater than or equal to 1, the IMF will be included in the optimal subset for signal reconstruction; otherwise, it will not be used. Using this proposed IMF quality measure and threshold-based selection approach, all of the extracted components can be evaluated and the most valuable modes are adaptively selected based on their relevance to rubbing processes and the amount of information presumed from the original signal.

Feature Extraction and Configuration of Feature Set
The proposed IMF selection process provides a set of the most meaningful IMFs that carry important information about the rubbing processes present in the original signal. We then utilize these selected IMFs for signal reconstruction (i.e., obtaining a noiseless rubbing signal) and feature extraction. The reconstructed signal can be obtained using the following equation: Here, N is the total number of selected IMFs and l corresponds to the order number of each selected mode.
After signal reconstruction, time-domain statistical features in the reconstructed signal and frequency-domain features in its complex envelope power spectrum are extracted. These features are considered to be discriminative because significant changes in the amplitude of the fundamental frequency and its fractional harmonics are usually observed in envelope power spectrum when a rub-impact fault occurs. The amplitude of the vibration signal in the time-domain also changes due to the increased fluctuations of the signal wave depending on the rubbing fault intensity. Thus, the changes in signal behavior and its energy with variations in rubbing intensity levels can be well-characterized by extracting dimensional time-domain statistical feature parameters, such as the root mean square (RMS), kurtosis, skewness, and square root of the amplitude (SRA) from the signal reconstructed using the sets of selected IMFs. These features are widely used as health indicators of various systems in fault diagnosis problems and can provide a good insight into rubbing processes in the time domain because they are known to be features sensitive to impulse faults [40]. Here, skewness and kurtosis are the third and fourth central moments of standard deviation. These two features are known as statistical indicators sensitive to the degree of peakedness of the signal that can be used well to characterize the variability of the vibration signal in the time domain when a rubbing fault occurs in a system. Also, the RMS and SRA are used in this study to describe the changes in intensity levels of the vibration signal affected by a rub-impact fault. Note that the RMS and SRA are usually both thought of as statistical parameters that reflect the behavior of the signal's amplitude in different scales, so these two features may not provide drastically different information, but they complement each other when used simultaneously.
Additionally, frequency domain analysis can help to discover some information that cannot be observed in the time domain [41]. Features that involve statistical properties of frequency are extracted from the envelope power spectra of the reconstructed signals. As we know, the frequency spectra of the rub-impact signals obtained by conventional frequency-domain signal analysis techniques, such as FFT, fail to present relevant information about rubbing processes. Various studies [20,23] have shown that the frequency spectra of the original rub-impact signals obtained by FFT usually cannot represent the frequency harmonics relevant to rub-impact faults, although the fundamental frequency and some high-order harmonics still may be present. The frequency-domain features extracted in this way cannot be considered as good features for diagnosing rub-impact faults because the behavior of the fundamental frequency and its higher harmonics can be affected by various mechanical faults in rotational machinery as well as properties of the environment in which the machine is installed. It is therefore difficult to verify whether these extracted features are actually related to rubbing faults or not. On the other hand, the frequency-domain features extracted in this study after time-frequency EEMD decomposition of the original vibration signal followed by informative IMF selection are definitely capable of reflecting rub-impact faults because the fault information is highly detectable and well-presented by the envelope spectra of the signals reconstructed using sets of selected IMFs. Moreover, it was observed that the valuable rubbing fault-related harmonics are detectable and well-localized in the envelope power spectrum of the reconstructed signal. Thus, the frequency-domain features extracted in this study are as follows: mean frequency, frequency RMS, and frequency standard deviation (St. Dev). Here, the mean frequency value corresponds to the mean of the frequency of the envelope power spectrum computed for a reconstructed signal, frequency RMS characterizes the intensity of the signal in the frequency domain, and frequency St. Dev describes the deviation of the signal from its main frequency components in the frequency domain.
Due to the extraction of the feature parameters from both time and frequency domains, the proposed feature model can be considered as a hybrid one. The basic idea beyond this model is that each feature extracted from both domains aims to characterize and quantify the specific physical and statistical properties of the denoised vibration signal affected by rub-impact. Furthermore, the features used in this study are well-recognized and frequently used health indicators in other problems of vibration-based condition monitoring, such as rolling-element bearing fault diagnosis [42] and prognosis [43]. Concretely, frequency-domain features, such as mean frequency, frequency RMS, and frequency standard deviation, are frequently applied for rolling element bearings (REB) fault diagnosis, whereas the time-domain RMS, kurtosis, and skewness are popular health indicators used for fault prognosis and remaining useful lifetime estimation. All the extracted features are shown in Table 2. Table 2. Time-and frequency-domain statistical feature parameters. Where x rec is the reconstructed signal in the time domain, f is the spectral component of the reconstructed signal x rec , and σ is the standard deviation of the reconstructed signal x rec . RMS, root mean square.

Parameters Equations Parameters Equations
Root mean square ( f 1 ) Mean frequency ( f 5 )

Experimental Results and Discussion
The effectiveness of the proposed rub-impact fault feature extraction technique with its novel IMF selection procedure is presented along with its possible application to fault diagnosis.

Training and Testing Data Configuration
An appropriate training and testing dataset configuration is important in order to determine the generalized quality of the proposed fault diagnosis approach. In this paper, a rub-impact fault was simulated using a shaft imbalance produced by adding extra weights to the NDE. In total, 10 different weights (i.e., rubbing intensities) were added to the shaft, namely 0, 0.5, 1.0, 1.5, 1.6, 1.7, 1.8, 2.0, 2.4, and 2.8 g. A 59-s-long signal was acquired for each intensity level. Therefore, the created dataset contained 590 signal instances in total. In this paper, the set of features extracted for the experimental part consisted of a total of N c × N s × N f features. Here, N c is the number of bladed rotor intensity classes (conditions) simulated in this study, N s is the number of instances for each condition, and N f is the number of extracted features.
In this paper, the k-fold cross-validation (k = 3) procedure was applied for validating the introduced methodology during each experimental trial. In the k-fold cross-validation method, the entire dataset is randomly split into k-folds, where each fold should be used once as a testing subset applied to a classifier trained on the remaining k − 1 subsets. Specifically, in this paper, 39 randomly chosen data samples from each class were selected as the training subset and the remaining 20 data instances from each class were used to construct the testing subset. Thus, each training subset contained 390 instances and each testing subset consisted of the 200 remaining data samples.

Validation of the Selected IMFs Using the Proposed Approach
The proposed IMF selection method allows for derivation of the specific objective function values that can be used to evaluate the quality of the components extracted by EEMD; these values are presented in Table 3. As the result of the introduced procedure, the sets containing the most valuable components were obtained for each signal class as shown in Table 4. In this study, rubbing faults were caused by imbalance in the testbed shafts, where the imbalance was created by attaching extra weight to the NDE. As shown in Figure 3 in Section 2.1, the increase in weight results in an increased degree of shaft imbalance, which causes the various rubbing conditions observed during the experiment. It was also observed that some rubbing conditions, which were validated using a thermal camera, had different degrees of imbalance due to the additional weight. This observation explains why our IMF selection method adaptively delivered subsets with various components for each class; however, each subset was also shown to contain IMFs that were common for all classes.  Since the modes obtained by the decomposition are amplitude-modulated, the envelope power spectrum of the reconstructed signal must be computed to verify the quality of the subsets provided by IMF selection. Figure 6 demonstrates the envelope power spectra of the original and reconstructed signals using the subsets of selected IMF components corresponding to the intensity classes produced by 0 g and 2.8 g of extra weight. As shown in Figure 6a, it was observed that the envelope power spectrum of the original signal when no rubbing occurred mostly consisted of the fundamental frequency and its high-order harmonics. No fractional harmonics were clearly observed in the power spectrum. Also, the peak amplitude of the fundamental frequency harmonic was drastically higher than the peak amplitudes of other harmonics present in the power spectrum. Figure 6b shows the envelope power spectrum of the reconstructed signal when no rubbing occurred in the system. This power spectrum includes the fundamental frequency 1× and its high-order harmonics, while the majority of fractional harmonics are not present or their amplitudes are very small, which is reasonable for a system in this state. The peak amplitudes of the harmonics present were also relatively small. Figure 6c presents the envelope power spectrum of the original signal acquired during a severe rub-impact fault. From this figure it is seen that the main frequency 1× and its multiple harmonics were clearly present in the envelope power spectrum. Moreover, some of the fractional harmonics that are considered evidence of the rubbing process, 1/3× and 4/3×, were barely seen in this figure. Even though these fractional harmonics were present, the problem with extracting features for rubbing fault diagnosis from this kind of power spectrum is that the amplitude of the fundamental frequency is noticeably higher than the amplitudes of all other harmonics. Therefore, the numerical values of features extracted from this original signal mostly reflect the behavior of the main frequency, while the influence of the fractional frequencies will be negligible. It is known that many various mechanical faults in rotating machinery affect the behavior of the fundamental frequency. Therefore, when the only signs of a fault are amplitude changes of the fundamental frequency and its high-order harmonics, it is not easy to determine whether the extracted features actually reflect the rubbing process or whether they are related to other mechanical faults or the environmental features. In contrast, Figure 6d shows that the frequencies which appeared in the envelope power spectrum of the reconstructed signal of the severe rub-impact fault contained important information corresponding to the various signal harmonics, such as the fundamental frequency 1×, its high-order harmonics 2×, 3×, 4×, and 5×, and the fractional harmonics 1/3×, 2/3×, 4/3×, 5/3×, 10/3×, and 9/2×, which are considered valuable features for rub-impact faults [11,12,38]. Furthermore, the amplitudes of the fundamental frequency component and its high-order harmonics during the severe rubbing process were higher than those shown in Figure 6b. These observations reveal that the selected IMF subsets contain meaningful information and that the features extracted from the reconstructed signals well-represent rubbing faults of various intensities. multiple harmonics were clearly present in the envelope power spectrum. Moreover, some of the fractional harmonics that are considered evidence of the rubbing process, 1/3× and 4/3×, were barely seen in this figure. Even though these fractional harmonics were present, the problem with extracting features for rubbing fault diagnosis from this kind of power spectrum is that the amplitude of the fundamental frequency is noticeably higher than the amplitudes of all other harmonics. Therefore, the numerical values of features extracted from this original signal mostly reflect the behavior of the main frequency, while the influence of the fractional frequencies will be negligible. It is known that many various mechanical faults in rotating machinery affect the behavior of the fundamental frequency. Therefore, when the only signs of a fault are amplitude changes of the fundamental frequency and its high-order harmonics, it is not easy to determine whether the extracted features actually reflect the rubbing process or whether they are related to other mechanical faults or the environmental features. In contrast, Figure 6d shows that the frequencies which appeared in the envelope power spectrum of the reconstructed signal of the severe rub-impact fault contained important information corresponding to the various signal harmonics, such as the fundamental frequency 1×, its high-order harmonics 2×, 3×, 4×, and 5×, and the fractional harmonics 1/3×, 2/3×, 4/3×, 5/3×, 10/3×, and 9/2×, which are considered valuable features for rub-impact faults [11,12,38]. Furthermore, the amplitudes of the fundamental frequency component and its high-order harmonics during the severe rubbing process were higher than those shown in Figure 6b. These observations reveal that the selected IMF subsets contain meaningful information and that the features extracted from the reconstructed signals well-represent rubbing faults of various intensities.  To validate the efficacy of EEMD decomposition followed by informative IMF selection in representing rub-impact faults, we compared the envelope power spectra of the signals reconstructed using the selected IMFs with wavelet maps obtained by applying a continuous wavelet transform (CWT) to the original signals as described in [44]. Figure 7 shows the envelope power spectra obtained after signal reconstruction for rubbing intensities corresponding to classes '0 g', '1.6 g', '2.0 g', and '2.8 g'. The wavelet maps corresponding to the same signals obtained using CWT are shown in Figure 8. To validate the efficacy of EEMD decomposition followed by informative IMF selection in representing rub-impact faults, we compared the envelope power spectra of the signals reconstructed using the selected IMFs with wavelet maps obtained by applying a continuous wavelet transform (CWT) to the original signals as described in [44]. Figure 7 shows the envelope power spectra obtained after signal reconstruction for rubbing intensities corresponding to classes '0 g', '1.6 g', '2.0 g', and '2.8 g'. The wavelet maps corresponding to the same signals obtained using CWT are shown in Figure 8. Comparing Figures 7 and 8, it is clear that the envelope power spectra of signals reconstructed using the selected IMFs after EEMD decomposition provide better information about rubbing faults than the wavelet maps obtained by CWT. Specifically, from Figure 7 it can be seen that the appearance of fractional harmonics and their amplitude behavior, combined with information about the fundamental frequency and its high-order harmonics, allows for the differentiation of rub-impact faults of various intensities as well as differentiation of the normal state of the system with no rubbing faults. Figure 8 shows that it while might be possible to discriminate between the state of the system when a rubbing fault is and is not present in a signal, it might be difficult to differentiate between Comparing Figures 7 and 8, it is clear that the envelope power spectra of signals reconstructed using the selected IMFs after EEMD decomposition provide better information about rubbing faults than the wavelet maps obtained by CWT. Specifically, from Figure 7 it can be seen that the appearance of fractional harmonics and their amplitude behavior, combined with information about the fundamental frequency and its high-order harmonics, allows for the differentiation of rub-impact faults of various intensities as well as differentiation of the normal state of the system with no rubbing faults. Figure 8 shows that it while might be possible to discriminate between the state of the system when a rubbing fault is and is not present in a signal, it might be difficult to differentiate between rubbing faults of various intensities. It is clear that the most significant difference between the wavelet map patterns and the wavelet coefficient scales for the various states is observed between the case where there is no rub-impact fault in a system (Figure 8a) and that in which severe rubbing is present (Figure 8d). The wavelet map patterns and the scales of the wavelet coefficients for classes corresponding to slight rubbing and intensive rubbing were very similar. Moreover, from the wavelet maps it is difficult to verify whether the frequency harmonics inherent to rub-impact faults are present in the FFT spectrum due to significant overlap of the harmonics around the fundamental frequency. Therefore, it can be concluded that the method proposed in this paper is capable of representing and delivering valuable information about rub-impact faults of various intensities better than conventional CWT. rubbing faults of various intensities. It is clear that the most significant difference between the wavelet map patterns and the wavelet coefficient scales for the various states is observed between the case where there is no rub-impact fault in a system ( Figure 8a) and that in which severe rubbing is present (Figure 8d). The wavelet map patterns and the scales of the wavelet coefficients for classes corresponding to slight rubbing and intensive rubbing were very similar. Moreover, from the wavelet maps it is difficult to verify whether the frequency harmonics inherent to rub-impact faults are present in the FFT spectrum due to significant overlap of the harmonics around the fundamental frequency. Therefore, it can be concluded that the method proposed in this paper is capable of representing and delivering valuable information about rub-impact faults of various intensities better than conventional CWT.

Performance Evaluation of the Proposed Rubbing Fault Feature Extraction Scheme with the New IMF Selection Procedure
To evaluate the quality of the features extracted from the reconstructed signal and their ability to provide sufficient information about rub-impact faults with different intensity levels in rotor blades, we compared our feature extraction approach with three TFA feature extraction methods that use numerical-valued features for rub-impact fault diagnosis. The first TFA method utilizes wavelet packet transform and maximum SV computation to extract features from signals containing rubbing faults (referred to as WPT + MSV) [4]. The second TFA combines conventional EMD decomposition and maximum SV extraction to create a set of features for differentiating rubbing faults (referred to as EMD + MSV) [24]. The third TFA approach applies digital wavelet transform (DWT) to the rubbing fault signal and then uses decomposed sample data in the time domain of the third level of transformation as feature vectors for diagnosing rub-impact faults (referred to as DWT + TDSIG) [5]. Also, for comparison purposes, we applied our proposed hybrid feature model to signals reconstructed using sensitive IMFs selected by the method presented in [30] (referred to as SensIMF + HFM). In this paper, the one-against-all multiclass SVM (OAA-MCSVM) classifier [35,36] was

Performance Evaluation of the Proposed Rubbing Fault Feature Extraction Scheme with the New IMF Selection Procedure
To evaluate the quality of the features extracted from the reconstructed signal and their ability to provide sufficient information about rub-impact faults with different intensity levels in rotor blades, we compared our feature extraction approach with three TFA feature extraction methods that use numerical-valued features for rub-impact fault diagnosis. The first TFA method utilizes wavelet packet transform and maximum SV computation to extract features from signals containing rubbing faults (referred to as WPT + MSV) [4]. The second TFA combines conventional EMD decomposition and maximum SV extraction to create a set of features for differentiating rubbing faults (referred to as EMD + MSV) [24]. The third TFA approach applies digital wavelet transform (DWT) to the rubbing fault signal and then uses decomposed sample data in the time domain of the third level of transformation as feature vectors for diagnosing rub-impact faults (referred to as DWT + TDSIG) [5]. Also, for comparison purposes, we applied our proposed hybrid feature model to signals reconstructed using sensitive IMFs selected by the method presented in [30] (referred to as SensIMF + HFM). In this paper, the one-against-all multiclass SVM (OAA-MCSVM) classifier [35,36] was employed to perform the comparison of the above approaches. To ensure the repeatability of the results and exclude the effect of randomness, our experiments were performed 20 times with different combinations of training and testing data.
The classification accuracy for each group of samples was computed using the true positive rate evaluation index (TPR), which is given below: where k is the total number of cross-validation folds (k = 3), N i,l TP is the total number of samples in class l that are correctly classified as class l, N i,l FN is the number of samples within class l that are not recognized as class l, and i indicates the i-th iteration of the k-fold cross-validation procedure. The final TPR for each class presented here is computed as an average of the TPRs achieved over 20 experiments. The standard deviation (St.Dev) of the TPR was also calculated and shown in the results.
The classification accuracy (CA) for each experiment was determined using the relation given below: where M is the total number of groups and N samples is the total number of samples represented in a particular testing subset. The final classification accuracy presented in the results is the average classification accuracy (ACA) achieved over 20 experiments. The standard deviations (St.Dev) of the CA metric were also calculated and presented in the results.
The experimental results are shown in Table 5. These results demonstrate that rub-impact fault feature extraction using EEMD with adaptive IMF selection based on the novel presented mode evaluation metric for rubbing faults and the proposed hybrid feature model outperforms the reference methods in terms of average classification accuracy, with an accuracy of 99.8% over 20 experiments. Table 5 shows that the average TPR values of the proposed method over 20 experiments were over 99.5% and the standard deviations of the TPR did not exceed 1.5%.
Confusion matrices for the proposed framework and the reference methods are presented in Figure 9. The confusion matrix is a robust technique that provides a visualization of the performance of a classifier algorithm in terms of the deviation between actual and predicted results [35]. According to the results in Figure 9a, the proposed method perfectly identified rubbing faults of all intensities with a very low misclassification rate compared to its counterparts: SensIMF + HFM [30] in Figure 9b, WPT + MSV [4] in Figure 9c, EMD + MSV [24] in Figure 9d, and WT + TDSIG [5] in Figure 9e.  These results can be explained as follows. EEMD decomposition is capable of extracting clear 'true' IMFs, which can be easily associated with frequency bands of the original signal. Importantly, the proposed adaptive IMF selection method for rubbing signals, in conjunction with the novel IMF evaluation technique, enables meaningful intrinsic components to be precisely determined. These components include valuable information about the frequency bandwidths containing specific frequency peaks and their harmonics, which are considered to be evidence of rubbing processes in a signal [11,12,38].
Regarding SensIMF + HFM, Table 5 shows that the average classification accuracy achieved by this method for classifying rubbing faults of various intensities was slightly less than one, made by the proposed methodology. Note that SensIMF + HFM is a synthetic method that combines a sensitive IMF selection method for rub-impact fault diagnosis [30] and our proposed hybrid feature model extracted from signals reconstructed using the selected sensitive IMFs. The results presented in Table  5 and Figure 9b demonstrate that this combined approach performs well at differentiating various rubbing conditions, but is still slightly worse than the method proposed in this paper. The difference between the results of the two methods can be explained as follows. Although both IMF selection methods were introduced for rub-impact fault diagnosis, the main concepts behind them are different. The method proposed in this paper aims to select the IMFs that are directly related to rubbing faults in the system by both searching for the specific rub-impact fault harmonics in the envelope power spectra of IMF candidates and computing the KLD similarity between the extracted components and the original non-decomposed signal. On the other hand, the referenced method relies on computing 'sensitivity factors' of IMFs using only the correlation between the extracted components and the original signal as well as the signal acquired under the normal operating conditions of rotating machinery [30]. Considering that both methods utilize the same feature model, it can be concluded that the approach proposed in this paper is able to better highlight the important frequency harmonics of rubbing faults than a method that relies only on correlation properties of signals and IMFs without considering the exact information content carried by those modes. These results can be explained as follows. EEMD decomposition is capable of extracting clear 'true' IMFs, which can be easily associated with frequency bands of the original signal. Importantly, the proposed adaptive IMF selection method for rubbing signals, in conjunction with the novel IMF evaluation technique, enables meaningful intrinsic components to be precisely determined. These components include valuable information about the frequency bandwidths containing specific frequency peaks and their harmonics, which are considered to be evidence of rubbing processes in a signal [11,12,38].
Regarding SensIMF + HFM, Table 5 shows that the average classification accuracy achieved by this method for classifying rubbing faults of various intensities was slightly less than one, made by the proposed methodology. Note that SensIMF + HFM is a synthetic method that combines a sensitive IMF selection method for rub-impact fault diagnosis [30] and our proposed hybrid feature model extracted from signals reconstructed using the selected sensitive IMFs. The results presented in Table 5 and Figure 9b demonstrate that this combined approach performs well at differentiating various rubbing conditions, but is still slightly worse than the method proposed in this paper. The difference between the results of the two methods can be explained as follows. Although both IMF selection methods were introduced for rub-impact fault diagnosis, the main concepts behind them are different. The method proposed in this paper aims to select the IMFs that are directly related to rubbing faults in the system by both searching for the specific rub-impact fault harmonics in the envelope power spectra of IMF candidates and computing the KLD similarity between the extracted components and the original non-decomposed signal. On the other hand, the referenced method relies on computing 'sensitivity factors' of IMFs using only the correlation between the extracted components and the original signal as well as the signal acquired under the normal operating conditions of rotating machinery [30]. Considering that both methods utilize the same feature model, it can be concluded that the approach proposed in this paper is able to better highlight the important frequency harmonics of rubbing faults than a method that relies only on correlation properties of signals and IMFs without considering the exact information content carried by those modes.
Regarding WPT + MSV, the average accuracy was smaller than that of the proposed feature extraction approach. Table 5 and Figure 9c show that in general, the WPT + MSV approach deals with differentiating rubbing faults of various intensities well; however, for the rubbing intensities corresponding to extra weights of 1.5 g, 1.7 g, and 2.0 g attached to the NDE, the TPR values were relatively low with increased standard deviation. These results can be explained because as mentioned above, rubbing faults in this study were simulated by adding extra weight to the NDE. This means that with increasing rotor imbalance, the rubbing becomes more severe and the characteristics of the vibration signal change significantly. The quality of the wavelet packet transform strongly depends on the choice of mother wavelet function. Moreover, since this approach is not entirely adaptive and data-driven, the same mother wavelet function simply cannot perfectly match all of the changing types of vibration signals which contain both rotor imbalance and a complex nonlinear fault, such as a rub-impact fault.
Regarding EMD + MSV, Table 5 and Figure 9d show that the average accuracy and TPR achieved by this method over 20 experiments were low compared to the proposed method, SensIMF + HFM, and WPT + MSV. There might be two main reasons for this. The first is the problem of mode-mixing which is inherent to the conventional EMD approach. A rub-impact fault is known to be a complex fault that induces various nonlinear processes in the signal, and due to the mode-mixing problem, the quality of the extracted modes may not be good enough to extract discriminative features. In the ideal case, each extracted component should represent a specific frequency band so that the extracted features can describe the processes in the system well. However, mode-mixing causes the mixing of harmonics in the frequency spectra of the obtained IMFs. This makes it difficult to clearly understand the physical meaning of each mode, and the quality of the extracted features becomes poor. The second reason for the obtained results is that the EMD + MSV feature extraction approach does not consider the selection of valuable IMFs. The cardinality of extracted IMFs is usually large, and they can be either signal-or noise-dominant. In the case of diagnosing rub-impact faults, it is important to determine which of the extracted components can provide useful and clear information because the features extracted from noise-dominant components can expand the feature models and degrade the overall classification performance by delivering poor-quality or redundant features that are not related to the fault being investigated.
In addition, in both WPT + MSV and EMD + MSV it might be difficult to verify whether the extracted SVs as features actually reflect rub-impact faults or whether their values are more closely related to the properties and setup of the system because these environment properties are also reflected in the acquired signal.
Regarding DWT + TDSIG, its performance for differentiating rub-impact faults of various intensities was not satisfactory in this study. Table 5 shows that this method achieved an average classification accuracy of 22.6% over 20 experiments. Also, the TPR values were noticeably lower and the standard deviations were significantly higher than those demonstrated by the other methods presented. The confusion matrix in Figure 9e shows a similar result. Such a poor performance can be explained as follows. First of all, choosing a wavelet function is not an easy task but is important because the quality of wavelet-based decomposition is directly affected by the chosen mother wavelet. Furthermore, the dataset used in this study contains two mechanical faults: a rub-impact fault and its cause, an imbalance of the shaft. The wavelet function chosen in the referenced method may not match all the signal groups present in the dataset. This method also does not propose any technique for adaptive selection of a proper wavelet level band that contains the most essential information about the mechanical fault after decomposition. Finally, the poor performance seems to be caused by the use of time-domain decomposed sample data as a feature vector to the OAA-MCSVM classifier. Although SVM is known to be a robust classifier that is insensitive to the dimensionality of the feature vector [45], the use of a time-domain signal instead of statistical feature parameters creates a feature vector of huge dimensions. This leads the classifier to attempt to create a very complex hyperplane to try to separate the signal samples, causing the final classification accuracy to be poor. Of course, DWT incorporates some downsampling while decomposing the original signal, but this has benefits only when the original signal is relatively short (e.g., 2048 sampling points [5]). If the original signal is long like those used in this study, even after the downsampling performed by DWT, the remaining signal at the third level of decomposition is still long.
Overall, the proposed feature extraction methodology is highly useful for the diagnosis of rub-impact faults of various intensities because of its main concept: informative IMF selection with a novel IMF quality metric and a hybrid feature extraction procedure based on a signal reconstructed using the selected IMFs in which fault symptoms are highly observable.

Conclusions
This paper presented a new feature extraction method for rub-impact fault diagnosis based on EEMD with an informative IMF selection procedure and hybrid feature model. The proposed method addresses the problems of both efficient feature extraction and the selection of meaningful and proper IMFs suitable for rub-impact fault diagnosis. First, EEMD provided well-separated and well-behaved oscillating components of the original vibration signal and reduced the influence of the mode-mixing that is inherent to the conventional EMD approach. Second, subsets of meaningful signal-dominant modes were selected by the proposed IMF selection method using a novel quality measure based on the DPR and the Kullback-Leibler divergence distance metric. Finally, the adaptively selected IMFs were used to reconstruct a noiseless and clear rubbing fault signal for hybrid feature extraction and construct the feature set. In the experimental part of this study, the proposed set of features was used to differentiate rub-impact faults with various intensities using the OAA-MCSVM classifier. The experimental results demonstrated that the proposed combination of signal decomposition and our IMF selection technique is capable of extracting relevant information about rubbing faults of various intensities. The proposed feature extraction approach outperformed reference methods in terms of classification accuracy and TPR. Fault classification using the features extracted by the proposed method achieved an average accuracy of 99.8% over 20 experiments with various combinations of training and testing data. In our future research work, we would like to elaborate more on the thresholding technique used in the proposed IMF selection approach. Specifically, the use of adaptive thresholding may reduce the number of selected IMF components for various classes, and hence, possibly lead to more 'clean' reconstructed signal. Also, the evaluation of the proposed methodology in other fault detection and diagnosis fields is the scope of future work.