Bearing Fault Diagnosis Considering the Effect of Imbalance Training Sample

To improve the accuracy of the recognition of complicated mechanical faults in bearings, a large number of features containing fault information need to be extracted. In most studies regarding bearing fault diagnosis, the influence of the limitation of fault training samples has not been considered. Furthermore, commonly used multi-classifiers could misidentify the type or severity of faults without using normal samples as training samples. Therefore, a novel bearing fault diagnosis method based on the one-class classification concept and random forest is proposed for reducing the impact of the limitations of the fault training sample. First, the bearing vibration signals are decomposed into numerous intrinsic mode functions using empirical wavelet transform. Then, 284 features including multiple entropy are extracted from the original signal and intrinsic mode functions to construct the initial feature set. Lastly, a hybrid classifier based on a one-class support vector machine trained by normal samples and a random forest trained by imbalanced fault data lacking some specific severities is set up to accurately identify the mechanical state and specific fault type of the bearings. The experimental results show that the proposed method can significantly improve the classification accuracy compared with traditional methods for different diagnostic targets.


Introduction
Bearings are one of the most important components in rotating machinery, and a bearing fault can affect the reliability of wind turbines or other electric equipment. A gearbox fault or other mechanical fault in the drive system of wind turbines is mostly caused by a bearing fault or is reflected in the state of the bearings. Therefore, research into bearing fault diagnosis is crucial for improving the electronic reliability of electrical equipment and reducing downtime [1][2][3].
Vibration analysis has been widely used in the field of bearing fault diagnosis [4][5][6][7]. However, it is difficult to extract features from raw vibration signals with nonlinear and non-stationary characteristics, and thus, the raw signal needs to be pre-processed using time-frequency analysis methods. The commonly used methods for this pre-processing include empirical mode decomposition (EMD) [7,8], wavelet packet transform (WPT) [9], local mean decomposition (LMD) [6], and ensemble of the imbalance of training samples. In this paper, the imbalance of sample number is due to the fact that a certain number of fault samples have been obtained after the occurrence of certain types of bearing fault, while some fault severities, which have not happened or cannot be obtained by experiment for reasons of cost, have no accumulation of samples.
Moreover, when a fault degree that is not included in the training dataset occurs, the diagnosis system may mistake it for a normal condition. Therefore, utilizing only normal samples as training samples for precisely distinguishing the fault condition of bearings has good practical value [22]. Wan designed a hybrid classifier with good diagnostic results for preventing the misidentification of unknown faults [11].
In addition, diagnosis methods should fully account for the diagnostic target. Most studies develop bearing fault diagnosis for three common faults that are classified by fault location: ball fault, inner race fault, and outer race fault [2,22]. However, the requirements differ between application scenarios, so further refinement of bearing fault types is required. In addition to the fault location, the position of the bearing and the fault severity can both be regarded as specific fault types [1]. The optimal feature subset differs between diagnostic targets, which makes it difficult to construct classifiers with optimal feature subsets.
A novel bearing fault diagnosis method is proposed, based on a hybrid classifier constructed using one-class classification and random forest (RF) and considering the imbalance of the training sample. Empirical wavelet transform (EWT) is used to extract the intrinsic mode functions (IMFs) of the bearing vibration signal; 284 features are extracted from the original signal and IMFs to construct the initial feature set. A one-class support vector machine (OCSVM), trained using only normal samples in the hybrid classifier, is used to determine if a bearing fault has occurred. The classifier based on RF with the original feature set is applied for fault diagnosis with unbalanced training samples, and the influence of redundant features is avoided in the ensemble learning process of RF. If a fault has occurred, the RF trained with all known fault severities is used to recognize the specific fault type. The experimental results show that the new method improves the identification accuracy for samples whose fault severity is not included in the training samples and provides a superior result for bearing fault diagnosis.

Empirical Wavelet Transform
EWT overcomes the lack of a theoretical foundation and the mode mixing of EMD [12]. In this method, an orthogonal wavelet filter bank is constructed, by which amplitude modulated-frequency modulated (AM-FM) components with a compactly supported Fourier spectrum are extracted. These AM-FM components describe the intrinsic modes of the original vibration signal. Thus, like EMD, EWT decomposes the original bearing vibration signal $f(t)$ into a series of IMFs denoted by $f_k(t)$:
$$f(t) = \sum_{k} f_k(t) \tag{1}$$
where each $f_k(t)$ is an AM-FM function. The process of EWT includes the following three steps:
Step 1: Process the original bearing signal via fast Fourier transform (FFT).
Step 2: Adaptively segment the Fourier spectrum of the signal.
Step 3: Apply scaling and wavelet functions corresponding to each segment to generate bandpass filters on each segment.
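Step 2 can be illustrated with a small sketch. It assumes a simplified local-maxima rule: keep the strongest spectral peaks and place each interior boundary midway between two consecutive kept peaks. This may differ in detail from the rule in [12]. The function name `segment_spectrum` and the toy spectrum are hypothetical, and boundaries are expressed in frequency-bin units rather than in radians.

```python
def segment_spectrum(magnitude, n_bands):
    """Return interior boundaries (in bins) that split a spectrum into n_bands.

    Simplified rule (an assumption, not necessarily the rule of [12]):
    keep the n_bands largest local maxima and put one boundary midway
    between each pair of consecutive kept maxima.
    """
    # A bin is a local maximum if it exceeds its left neighbour
    # and is at least as large as its right neighbour.
    peaks = [
        i
        for i in range(1, len(magnitude) - 1)
        if magnitude[i - 1] < magnitude[i] >= magnitude[i + 1]
    ]
    # Keep the strongest peaks, then restore ascending frequency order.
    by_height = sorted(peaks, key=lambda i: magnitude[i], reverse=True)
    kept = sorted(by_height[:n_bands])
    # Midpoints between consecutive kept peaks are the interior boundaries.
    return [(a + b) / 2 for a, b in zip(kept, kept[1:])]


# Toy magnitude spectrum with three clear peaks at bins 1, 4 and 7.
spectrum = [0.1, 0.9, 0.2, 0.1, 0.6, 0.1, 0.05, 0.8, 0.1]
boundaries = segment_spectrum(spectrum, n_bands=3)
```

Together with the two ends of the spectrum, the returned boundaries delimit the segments on which the band-pass filters of Step 3 would be built.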
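In [12], Gilles referred to the construction of both Littlewood-Paley and Meyer's wavelets. To choose the appropriate wavelet filter bank, the Fourier spectrum must be split adaptively. Suppose the Fourier support $[0, \pi]$ is split into $N$ successive parts, and let $\omega_l$ ($l = 1, 2, \cdots, N$) denote the boundaries of the parts. The empirical scaling function $\hat{\phi}_l(\omega)$ and the empirical wavelets $\hat{\psi}_l(\omega)$ are defined by Expressions (2) and (3), respectively.
EWT is defined in the same way as the classic wavelet transform. Let $F[\cdot]$ and $F^{-1}[\cdot]$ denote the Fourier transform and its inverse. The detail coefficients are obtained by the inner products of the signal with the empirical wavelets:
$$W_f^{\varepsilon}(l, t) = \langle f, \psi_l \rangle = \int f(\tau)\,\overline{\psi_l(\tau - t)}\,d\tau = F^{-1}\!\left[\hat{f}(\omega)\,\overline{\hat{\psi}_l(\omega)}\right]$$
The approximation coefficients are obtained by the inner product of the signal with the scaling function:
$$W_f^{\varepsilon}(0, t) = \langle f, \phi_1 \rangle = \int f(\tau)\,\overline{\phi_1(\tau - t)}\,d\tau = F^{-1}\!\left[\hat{f}(\omega)\,\overline{\hat{\phi}_1(\omega)}\right]$$
where $\hat{\psi}_l(\omega)$ and $\hat{\phi}_1(\omega)$ are the Fourier transforms of $\psi_l(t)$ and $\phi_1(t)$, respectively, and $\overline{\psi_l(t)}$ and $\overline{\phi_1(t)}$ are the complex conjugates of $\psi_l(t)$ and $\phi_1(t)$, respectively. Then, the empirical modes $f_k(t)$ of the bearing vibration signal in (1) can be obtained by
$$f_0(t) = W_f^{\varepsilon}(0, t) \star \phi_1(t), \qquad f_k(t) = W_f^{\varepsilon}(k, t) \star \psi_k(t)$$
where $\star$ denotes convolution.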

Construction and Classification Process of RF
RF is a classification algorithm based on a collection of decision trees, each built on a bootstrap sample. Both bagging and random feature selection are used for tree building. Compared with SVM and the extreme learning machine (ELM), RF has been reported to have a superior classification ability [25]. The main characteristics of RF are strong robustness to outliers and noise; effective assessment of the generalization error, strength, correlation, and feature importance; and effective prevention of over-fitting [26]. The detailed classification principle of RF can be found in [26], and its classification process is as follows:
(1) K sample sets, selected randomly with replacement by bootstrap, are used to build K decision trees, and the samples left out of each selection are regarded as out-of-bag data.
(2) At each node of the decision trees, $m_{try}$ features are randomly selected. The amount of discriminative information contained in the features is used to estimate the classification ability of the different features. The feature with the strongest classification ability is used as the splitting feature of the node. Usually, $m_{try} = \sqrt{M}$, where $M$ is the total number of features.
(3) To obtain low-bias trees, no pruning is performed on any tree.
(4) RF is constructed with the K decision trees obtained through the above process. For the tested bearing samples, the final classification result of RF is determined by a vote over all decision trees.
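The sampling and voting steps of this process can be sketched without building real trees. The helper names `bootstrap_indices` and `majority_vote` are hypothetical, the tree predictions are made-up labels, and rounding the square root of M to the nearest integer is an assumption.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; unused indices are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Step (1): one bootstrap draw for one tree over a toy set of 10 samples.
rng = random.Random(0)
in_bag, out_of_bag = bootstrap_indices(10, rng)

# Step (2): features tried per node for M = 284 features (rounding assumed).
m_try = round(math.sqrt(284))

# Step (4): five hypothetical trees vote on one test sample.
label = majority_vote(["inner", "ball", "inner", "outer", "inner"])
```

With M = 284 this rounding yields 17 features per split.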

One-Class Support Vector Machine
Because it is trained with only one type of target sample, one-class classification can compensate for the excessive reliance of multi-class classifiers on the training samples. OCSVM is suitable for small-sample, high-dimensional and non-linear problems. In this paper, OCSVM is used to monitor the bearing condition. For a given training set $\{x_i\}$, $i = 1, 2, \ldots, N$, where $N$ is the number of samples in the training set, the aim of OCSVM is to find a hyperplane $f(x) = \langle \omega, \psi(x) \rangle - \rho$ that separates the target samples (that is, the normal samples of bearings) from the origin with a maximal margin in a high-dimensional feature space [29]. The parameters $\omega$ and $\rho$ are the normal vector and intercept of the hyperplane, respectively. A slack variable $\xi_i$ is introduced to allow some outliers in the training samples. $\nu \in (0, 1]$ is called the error limitation and controls the upper limit on the fraction of outliers. The nonlinear mapping $\psi : x \rightarrow \psi(x)$ maps the samples from the input space to the high-dimensional feature space, which leads to the following quadratic programming problem:
$$\min_{\omega, \xi, \rho} \; \frac{1}{2}\|\omega\|^2 + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_i - \rho \quad \text{s.t.} \quad \langle \omega, \psi(x_i) \rangle \geq \rho - \xi_i, \;\; \xi_i \geq 0 \tag{8}$$
By introducing the kernel function and the Lagrange multipliers $\alpha_i$, Equation (8) is transformed into
$$\min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j K(x_i, x_j) \quad \text{s.t.} \quad 0 \leq \alpha_i \leq \frac{1}{\nu N}, \;\; \sum_{i=1}^{N}\alpha_i = 1$$
Here, the kernel function is $K(x_i, x_j) = \langle \psi(x_i), \psi(x_j) \rangle$. The radial basis function (RBF) kernel used in this paper is as follows, where $\sigma$ represents the width of the kernel function:
$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
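After the $\alpha_i$ are obtained, the decision function used to judge the state of the bearings is
$$f(z) = \operatorname{sgn}\!\left(\sum_{i=1}^{N}\alpha_i K(x_i, z) - \rho\right) \tag{11}$$
After training the OCSVM, any bearing vibration sample $z$ can be judged by (11): $f(z) = +1$ indicates a normal sample, and $f(z) = -1$ indicates a fault sample.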
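As an illustration, the decision rule can be evaluated directly once the multipliers and the intercept are known. The sketch below assumes the common Gaussian form of the RBF kernel with a factor of 2 in the denominator. The support vectors, multipliers, intercept and width are made-up values, not results from the paper.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel; the 2*sigma^2 normalisation is an assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def ocsvm_decision(z, support_vectors, alphas, rho, sigma):
    """Return +1 if z falls on the normal side of the hyperplane, else -1."""
    score = sum(
        alpha * rbf_kernel(sv, z, sigma)
        for alpha, sv in zip(alphas, support_vectors)
    ) - rho
    return 1 if score >= 0 else -1


# Hypothetical trained model: two support vectors taken from normal samples.
support_vectors = [(0.0, 0.0), (1.0, 0.0)]
alphas = [0.5, 0.5]  # non-negative and summing to 1, as in the dual problem
rho = 0.5
sigma = 1.0

near_normal = ocsvm_decision((0.5, 0.0), support_vectors, alphas, rho, sigma)
far_away = ocsvm_decision((5.0, 5.0), support_vectors, alphas, rho, sigma)
```

A sample close to the normal training data keeps the kernel sum above the intercept, whereas a distant sample lets every kernel term decay towards zero and is flagged as a fault.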
The proposed method includes feature extraction, the training of the classifier, state detection and fault type recognition. First, EWT is used to extract the IMFs of the bearing vibration signal. Then, 284 features are extracted from the original signal and IMFs to construct the initial feature set. Lastly, a hybrid classifier based on OCSVM and RF is set up. OCSVM is used to determine whether a bearing fault has occurred. If a fault has occurred, the RF trained with all known faults is used to recognize the specific fault type. The flowchart of the proposed method is shown in Figure 1.
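The two-stage decision logic can be sketched as follows. The function `diagnose` and both stand-in models are hypothetical: in the proposed method the first stage would be the trained OCSVM and the second the trained RF, whereas here they are replaced by simple threshold rules with made-up values so that the sketch runs on its own.

```python
def diagnose(features, is_normal, classify_fault):
    """Two-stage diagnosis: state detection first, fault typing second."""
    if is_normal(features):
        return "normal"
    # Only samples rejected by the one-class stage reach the fault classifier.
    return classify_fault(features)


# Stand-in for the OCSVM stage (hypothetical threshold on one feature).
def toy_is_normal(features):
    return features["rms"] < 0.1


# Stand-in for the RF stage (hypothetical rule on one feature).
def toy_classify_fault(features):
    return "inner race fault" if features["kurtosis"] > 5.0 else "ball fault"


healthy = diagnose({"rms": 0.05, "kurtosis": 3.0}, toy_is_normal, toy_classify_fault)
faulty = diagnose({"rms": 0.40, "kurtosis": 7.5}, toy_is_normal, toy_classify_fault)
```

Keeping the two stages as separate callables mirrors the point of the hybrid design: the first stage is trained without any fault data, so a fault of unseen severity can still be rejected as not normal before the second stage assigns a type.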

Construction of the Initial Feature Set
The bearing dataset provided by Case Western Reserve University (CWRU) [1] has been used as benchmark data in the field of bearing fault diagnosis. Thus, this dataset is chosen as the test data for verifying the proposed method. The basic layout of the test rig is shown in Figure 2. It consists of a 2 hp Reliance Electric motor driving a shaft on which a torque transducer and encoder are mounted. Torque is applied to the shaft via a dynamometer and electronic control system. For the tests, faults were seeded on the drive- and fan-end bearings (SKF deep-groove ball bearings: 6205-2RS JEM and 6203-2RS JEM, respectively) of the motor using electro-discharge machining (EDM). The faults were seeded on the rolling elements and on the inner and outer races, and each faulty bearing was reinstalled (separately) on the test rig, which was then run at constant speed for motor loads of 0-3 horsepower (approximate motor speeds of 1797-1730 rpm) [30]. The fault data used in this paper were sampled at 12,000 samples per second.

Condition Classes of the Experimental Data
The CWRU bearing dataset provides machine condition information covering different bearing fault locations (ball, inner race and outer race), fault severities (diameters of the artificially seeded fault of 0.007, 0.014 and 0.021 in., that is, 7, 14 and 21 mils) and positions of the motor bearing (drive end and fan end). Therefore, the database divided according to different diagnostic targets is used to demonstrate the validity of the method. When only the bearing fault locations are identified, the machine condition contains the normal condition and three types of faults: ball fault, inner race fault, and outer race fault. When both the fault locations and the position of the bearing are identified, the machine condition contains the normal condition and six types of faults: the ball fault, inner race fault, and outer race fault, each at the drive and fan ends. When the bearing fault locations, the position of the bearing and the three fault severities are considered simultaneously, the machine condition can be further divided into a normal condition and eighteen types of faults [1].
Each signal in the CWRU database lasts approximately 10 s. To acquire more samples, the signal can be divided into a series of successive intervals that are treated as independent patterns. In studies on bearing fault diagnosis, the length of each interval varies from 1024 to 8000 points [1,6]. In general, the more sampling points a sample contains, the more fault information it carries, which helps improve the classification accuracy. However, considering the efficiency of feature extraction, the number of samples required and the sample lengths used in the relevant literature, the length of each sample is set to 4096 points (that is, about ten shaft rotation periods) [1]. Finally, considering the locations of the bearing faults, the fault severity, and the position of the bearing, 2000 samples, including 200 normal samples, are obtained. Figure 3 shows the normal, ball fault, inner race fault, and outer race fault signal waveforms at 0 hp acquired at the drive end (DE) with a fault diameter of 0.007 in. The effective identifying information is submerged in noise. Therefore, EWT is used to extract effective features.
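The sample length can be checked with a short calculation. The sketch below assumes non-overlapping segments and uses a placeholder 10 s record of zeros; the function name `split_into_samples` is hypothetical, and the sampling rate and shaft speed are the values quoted for the test rig.

```python
def split_into_samples(signal, sample_length):
    """Cut a record into successive non-overlapping samples, dropping the tail."""
    n_samples = len(signal) // sample_length
    return [
        signal[i * sample_length:(i + 1) * sample_length]
        for i in range(n_samples)
    ]


SAMPLING_RATE_HZ = 12_000   # samples per second
SAMPLE_LENGTH = 4096        # points per sample
SHAFT_SPEED_RPM = 1797      # approximate speed at 0 hp load

# Duration of one sample and the number of shaft rotations it spans.
sample_duration_s = SAMPLE_LENGTH / SAMPLING_RATE_HZ
rotations_per_sample = sample_duration_s * SHAFT_SPEED_RPM / 60

# Placeholder 10 s record (all zeros) just to count the resulting samples.
record = [0.0] * (10 * SAMPLING_RATE_HZ)
samples = split_into_samples(record, SAMPLE_LENGTH)
```

At 1797 rpm one sample covers slightly more than ten rotations, and at 1730 rpm slightly fewer, which is consistent with the statement that 4096 points span about ten rotation periods.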

Analysis of the Bearing Vibration Signal Using EWT
In Figure 4, the Fourier spectrum of the original signal is divided into regions, each of which denotes the frequency range of the corresponding IMF in Figure 5. The components between 0 and 4000 Hz account for a large share of the total signal for the ball fault and inner race fault, whereas the main part of the normal signal is concentrated below 2000 Hz. This shows that the energy distribution across frequency bands differs between types of vibration signals. Figure 5 shows that, at the same severity, the amplitude of each IMF differs considerably among the four types of signals. The amplitudes of most IMFs of each fault type also differ considerably between fault severities. For the ball fault at 0.007 in., the maximum amplitude appears in the sixth IMF and is close to 0.5. For the ball fault at 0.021 in., the maximum amplitude appears in the fifth IMF and is close to 0.4. For the inner race fault at 0.007 in., the maximum amplitude appears in the fifth IMF and is close to 1. For the inner race fault at 0.021 in., the maximum amplitude appears in the fourth IMF and is close to 2. For the outer race fault at 0.007 in., the maximum amplitude appears in the sixth IMF and is close to 3. For the outer race fault at 0.021 in., the maximum amplitude appears in the fourth IMF and is close to 3.
When a bearing failure emerges, the frequency distribution of the vibration signal changes, and the energy distribution in different frequency bands changes correspondingly. EWT can decompose a multicomponent signal into IMFs in different frequency bands. The results in Figures 4 and 5 indicate that features of different bearing fault types can be extracted from the EWT results by observing the amplitude of the IMFs and computing the energy distribution in different frequency bands and time domains. To accurately describe the fault characteristics and identify specific fault types such as fault severity, more fault information should be mined from the raw signals and IMFs, and a complete fault diagnosis system can be constructed on those features.
However, owing to the complexity and nature of the various signals, the number of IMFs obtained by EWT may differ between fault signals. Statistical analysis shows that six to nine IMFs are usually obtained, and different IMFs contain different time-frequency energy distribution characteristics for fault diagnosis. The Fourier spectra in Figure 4 and the EWT results in Figure 5 show that the energy of the four types of signals is concentrated mainly in the low-frequency and medium-frequency portions. Therefore, the normalized energy, which is the ratio of the energy of each IMF component to the energy of the raw signal, is used as the selection criterion for useful IMFs. To make the result more convincing, the average energy ratio of 600 signals per type is calculated. Statistical analysis shows that the five IMF components with the most energy contain over 96% of the discriminative information. Therefore, the five IMF components with the most energy are selected as effective components for feature extraction.
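The energy-based selection can be written compactly. In this sketch, energy is taken as the sum of squared samples, which is an assumption about the exact definition. The function names and the three toy "IMFs" are hypothetical; the paper keeps five IMFs, while the toy call keeps two.

```python
def normalized_energies(raw_signal, imfs):
    """Ratio of each IMF's energy to the energy of the raw signal."""
    raw_energy = sum(v * v for v in raw_signal)
    return [sum(v * v for v in imf) / raw_energy for imf in imfs]


def select_top_imfs(raw_signal, imfs, keep=5):
    """Indices of the `keep` IMFs with the largest normalized energy."""
    ratios = normalized_energies(raw_signal, imfs)
    ranked = sorted(range(len(imfs)), key=lambda k: ratios[k], reverse=True)
    # Return the selected indices in their original (frequency) order.
    return sorted(ranked[:keep])


# Toy decomposition: three non-overlapping components that sum to the signal.
imfs = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5],
]
raw_signal = [sum(column) for column in zip(*imfs)]

ratios = normalized_energies(raw_signal, imfs)
selected = select_top_imfs(raw_signal, imfs, keep=2)
```

In this toy case the components do not overlap, so the ratios sum to one. For real IMFs the sum only approximates one, which is why the share captured by the selected components is reported as a percentage.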
If the extracted features are highly sensitive to changes in the mechanical state of the bearings under various operating conditions, the fault diagnosis capability of the system can be enhanced greatly. Time-domain and frequency-domain features each have their own descriptive significance, and thus a combined analysis is required. Therefore, numerous time-domain and frequency-domain features are extracted from both the raw signals and the IMFs to avoid missing important information.
(1) Time domain: Eighteen types of time-domain features are used: maximum amplitude value ($F_{y,1}$ and $F_{y,19}$), minimum amplitude value ($F_{y,2}$ and $F_{y,20}$), mean value ($F_{y,3}$ and $F_{y,21}$), standard deviation ($F_{y,4}$ and $F_{y,22}$), absolute average ($F_{y,5}$ and $F_{y,23}$), skewness value ($F_{y,6}$ and $F_{y,24}$), kurtosis value ($F_{y,7}$ and $F_{y,25}$), peak-to-peak value ($F_{y,8}$ and $F_{y,26}$), square root of the amplitude ($F_{y,9}$ and $F_{y,27}$), root mean square ($F_{y,10}$ and $F_{y,28}$), peak value ($F_{y,11}$ and $F_{y,29}$), shape factor ($F_{y,12}$ and $F_{y,30}$), crest factor ($F_{y,13}$ and $F_{y,31}$), impulse factor ($F_{y,14}$ and $F_{y,32}$), margin factor ($F_{y,15}$ and $F_{y,33}$), skewness factor ($F_{y,16}$ and $F_{y,34}$), coefficient of variation ($F_{y,17}$ and $F_{y,35}$) and kurtosis factor ($F_{y,18}$ and $F_{y,36}$). Here, $y = 0, 1, 2, \cdots, 5$. When $y = 0$, the features are extracted from the raw signal; otherwise, the features are extracted from the $y$-th IMF. The same convention applies below.
For the CWRU bearing data, the sensor installed at the fan end (FE) can detect the bearing faults at the DE, and vice versa. Hence, the number of features is doubled because of cross-detection [1].
For the above time-domain features, $F_{y,1}$ to $F_{y,18}$ are extracted from the bearing signal of the local end collected by the sensor installed at the local end, and $F_{y,19}$ to $F_{y,36}$ are extracted from the bearing fault signal of the local end collected by the sensor installed at the opposite end. The same convention applies below.
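A few of the listed time-domain features can be computed as in the sketch below. The formulas follow definitions that are common in the condition-monitoring literature and are assumptions here, since the exact definitions used in the paper are not given in this section. The function name, the dictionary keys and the example signal are hypothetical.

```python
import math


def time_domain_features(x):
    """Compute a subset of common time-domain features for one signal."""
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    # Square root of the amplitude: squared mean of sqrt(|x|) (assumed form).
    sra = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    # Population standard deviation (division by n is an assumption).
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return {
        "peak_to_peak": max(x) - min(x),
        "rms": rms,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impulse_factor": peak / abs_mean,
        "margin_factor": peak / sra,
        # Normalised fourth central moment.
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
    }


# Example: a short signal with one impulsive spike.
features = time_domain_features([0.0, 0.1, -0.1, 2.0, -0.1, 0.1, 0.0, -0.1])
```

Dimensionless ratios such as the crest and impulse factors grow when a signal contains isolated spikes, which is why they are commonly used as indicators of localized bearing defects.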
(2) Frequency domain: The original vibration signals are transformed into frequency signals using FFT. The frequency signals are divided into several bands, and the mean frequency ($F_{y,37}$ and $F_{y,41}$), root mean square of frequency ($F_{y,38}$ and $F_{y,42}$), frequency center ($F_{y,39}$ and $F_{y,43}$) and root variance frequency ($F_{y,40}$ and $F_{y,44}$) are calculated for each band.
When the failure emerges, the energy distribution in the same bandwidth of different types of signals is different, and the energy distribution in different bandwidths of the same type of signal is also different. Therefore, the normalized energy of the selected IMF ($F_{y,45}$ and $F_{y,46}$) is extracted. Meanwhile, the singular value ($F_{y,47}$ and $F_{y,48}$) [14] is also extracted.
The time-domain and frequency-domain features are extracted from the raw signal and the selected IMF, and the normalized energy features and singular value features are extracted only from the selected IMF. The distribution of features is shown in Figure 6.
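The size of the feature set can be cross-checked against the feature indices above. The breakdown below is inferred from the index ranges rather than stated explicitly in the text: one set of four frequency-domain features per source and sensor is an assumption, and the constant names are hypothetical.

```python
N_SELECTED_IMFS = 5                 # the five IMFs with the most energy
N_SOURCES = N_SELECTED_IMFS + 1     # raw signal (y = 0) plus the selected IMFs
N_SENSORS = 2                       # local-end and opposite-end sensors

TIME_FEATURES = 18 * N_SENSORS      # 36 time-domain features per source
FREQ_FEATURES = 4 * N_SENSORS       # 8 frequency-domain features per source
IMF_ONLY_FEATURES = 2 * N_SENSORS   # normalized energy and singular value, IMFs only

total_features = (
    N_SOURCES * (TIME_FEATURES + FREQ_FEATURES)
    + N_SELECTED_IMFS * IMF_ONLY_FEATURES
)
```

Under these assumptions the count reproduces the 284 features of the initial feature set: 44 from the raw signal and 48 from each of the five selected IMFs.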

Feature Analysis and Classification Ability Analysis of OCSVM and RF
The feature importance under different diagnosis targets and the classification ability of RF are analyzed in this section.
Three diagnosis targets are considered in this paper.
In the experiments, the entire dataset is divided into a training set, a validation set, and a test set. The training set comprises 60% of the entire dataset, and the validation and test sets each comprise 20%. Only the bearing faults are classified by RF under the different diagnostic targets; the number of decision trees, denoted by n_tree, is set to 500, and the number of features considered at each split, denoted by m_try, is set to 17. The Gini importance (GI) of each feature for the different diagnosis targets is shown in Figure 7. Figure 7 shows that the GI of the various features differs greatly between targets. Feature No. 260 has the highest GI, at 14.3, for target 1; Feature No. 46 has the highest GI, at 18.9, for target 2; and Feature No. 186 has the highest GI, at 20.3, for target 3. The feature value distributions of the four features with the highest GI and the four features with the lowest GI are shown in Figure 8. In Figure 8, the values of the first four features for the different types of signals overlap only over a small range (a small cross-field), so their ability to distinguish the various faults is strong. The values of the last four features overlap over a large range, and it is difficult to distinguish the various faults with them. This validates the use of the GI to evaluate the classification ability of the features. On the other hand, the importance of the same feature differs between diagnosis targets (see Figure 7). This means that the optimal feature set will be different for different diagnosis targets.
To improve the classification ability of RF, an experiment with different input feature sets is performed. The 284 features are sorted in descending order of GI and added one at a time to an initially empty set Q. After each addition, the features in Q are used to train an RF classifier, and the accuracy of the RF on the corresponding test set is recorded. The classification accuracy of the various subsets for diagnostic targets 1, 2 and 3 is shown in Figure 9.
Figure 9 shows that the classification accuracy of RF under the various diagnosis targets gradually increases to 100% as the number of features increases. After that, the classification accuracy of RF remains stable as the feature dimension increases further. Therefore, RF can achieve high diagnosis accuracy with a high-dimensional original feature set for different diagnosis targets.
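The incremental feature-set experiment described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `incremental_subset_accuracy` and the callback `evaluate` are hypothetical, and `evaluate` stands in for "train an RF on the features in Q and return its test accuracy". The toy scorer and the GI values are made up for demonstration and are not taken from Figure 7 or Figure 9.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_subset_accuracy(
    gi: Sequence[float],
    evaluate: Callable[[Tuple[int, ...]], float],
) -> Tuple[List[int], List[float]]:
    """Add features to Q in descending GI order and score each subset."""
    # Rank feature indices from highest to lowest Gini importance.
    order = sorted(range(len(gi)), key=lambda i: gi[i], reverse=True)
    q: List[int] = []        # the growing feature set Q
    curve: List[float] = []  # accuracy recorded after each addition
    for idx in order:
        q.append(idx)
        curve.append(evaluate(tuple(q)))
    return order, curve


# Hypothetical stand-in for "train RF on Q, score on the test set":
# accuracy rises by 0.25 per informative feature present, capped at 1.0.
INFORMATIVE = {1, 2}


def toy_evaluate(subset: Tuple[int, ...]) -> float:
    return min(1.0, 0.5 + 0.25 * len(INFORMATIVE.intersection(subset)))


toy_gi = [0.1, 9.0, 3.0, 0.0]  # made-up GI values for four features
order, curve = incremental_subset_accuracy(toy_gi, toy_evaluate)
print(order)  # [1, 2, 0, 3]
print(curve)  # [0.75, 1.0, 1.0, 1.0]
```

In this toy run the accuracy curve saturates once the informative features have entered Q, which mirrors the qualitative behaviour reported for Figure 9.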

Diagnosis Result of Various Scenarios
The following three fault scenarios are set to verify the validity of the method proposed in this paper.

Fault Scenario 1: All Types are Included in the Training Set
To avoid the contingency caused by using classification accuracy (ACC) as the only measurement, the Kappa coefficient, denoted by K, is also used. K measures the consistency between the actual and predicted classifications, and its calculation method can be found in [31]. Therefore, an evaluation index that considers both ACC and K, denoted by η, is adopted. To verify the classification ability of RF, a comparative test is carried out with OCSVM-RF, RF, SVM and BPNN, and the results are shown in Table 1. The SVM and BPNN are built in the same way as in [32]. Table 1 shows that RF has a better classification ability than BPNN and SVM for a high-dimensional original feature set. The diagnosis result of RF is decided by numerous decision trees, which avoids false identification to the greatest extent. RF is therefore more suitable than the other methods for high-dimensional fault diagnosis scenarios.
In practical applications, samples with various fault severities are always insufficient and unbalanced. Traditional multi-classification methods may misidentify a sample with an unknown severity as the wrong fault type, or even as a normal sample. Thus, the classification method should first determine whether the mechanical state of the bearings is normal.
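As an illustration of the two measures used in Fault Scenario 1, the sketch below computes ACC and a Kappa coefficient from a confusion matrix. It assumes that K is the standard Cohen's kappa, K = (p_o - p_e) / (1 - p_e); the exact formulation in [31] and the definition of η are not reproduced here. The function name `accuracy_and_kappa` and the example matrix are hypothetical.

```python
from typing import List, Tuple


def accuracy_and_kappa(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (ACC, Cohen's kappa) for a square confusion matrix.

    cm[i][j] is the number of samples of actual class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: fraction of samples on the diagonal (this is ACC).
    p_o = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement from the row (actual) and column (predicted) totals.
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    # Kappa is undefined when p_e == 1 (all samples in a single class).
    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Made-up two-class example with 100 samples.
acc, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
print(round(acc, 3), round(kappa, 3))  # 0.85 0.7
```

Here ACC is 0.85 while K is 0.7, which shows how K discounts the agreement that would be expected by chance.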
To verify the diagnostic capacity of the proposed method for samples with unknown fault severity, the OCSVM-RF hybrid classifier is compared with SVM, BPNN and RF. For OCSVM, ν = 0.75 and σ = 16.12. In this experiment, the fault location is regarded as the identification target. The ball fault of the bearing at DE, denoted by DE-BAF, is regarded as a special fault type with unknown fault severity: two fault severities, with 50 samples each, are randomly selected from DE-BAF as the test samples and are excluded from the training set. One hundred samples are randomly selected from the remaining fault severities of DE-BAF and combined with 100 normal samples and the remaining five fault types, with 100 samples per type, to construct the training set. Using the feature set described in Figure 5, the OCSVM of the hybrid classifier is trained only on normal samples, and the RF is trained on the remaining fault samples. SVM, BPNN and RF are trained on the entire training set. When the ball fault at FE, denoted by FE-BAF, is regarded as the special type, the classifiers are trained in the same way. The classification results of the various classifiers for the special type are shown in Table 2. As Table 2 shows, when the training set does not cover all fault severities, SVM, BPNN and RF misidentify some samples with an unknown fault severity as the wrong fault type, or even as normal samples. This illustrates that excessive reliance on the training samples weakens the state monitoring ability of multi-class classifiers. The accuracy of RF is between 92% and 100%, and the accuracy of the other classifiers is less than 87%. In contrast to the single multi-class classifiers, the accuracy of OCSVM-RF is 98% to 100%, and all test samples are identified as being in a fault state. OCSVM-RF thus retains the strong classification ability of RF while improving the state monitoring ability.
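The two-stage decision of the hybrid classifier can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `hybrid_predict`, `toy_detector` and `toy_fault_classifier` are hypothetical names, and the two stub functions merely stand in for a trained OCSVM and a trained RF. The detector is assumed to return +1 for samples that look normal and -1 otherwise, which is the convention used by scikit-learn's `OneClassSVM.predict`.

```python
from typing import Callable, Sequence

NORMAL = "normal"


def hybrid_predict(
    x: Sequence[float],
    novelty_detector: Callable[[Sequence[float]], int],
    fault_classifier: Callable[[Sequence[float]], str],
) -> str:
    """Two-stage decision: state monitoring first, then fault typing."""
    # Stage 1: one-class model trained on normal samples only.
    if novelty_detector(x) == 1:
        return NORMAL
    # Stage 2: multi-class model trained on fault samples only, so a
    # sample rejected in stage 1 can never be labelled as normal here.
    return fault_classifier(x)


# Hypothetical stand-ins for the trained models (not real classifiers).
def toy_detector(x: Sequence[float]) -> int:
    return 1 if abs(x[0]) < 1.0 else -1


def toy_fault_classifier(x: Sequence[float]) -> str:
    return "DE-BAF" if x[1] > 0 else "FE-BAF"


print(hybrid_predict([0.2, 5.0], toy_detector, toy_fault_classifier))   # normal
print(hybrid_predict([3.0, 5.0], toy_detector, toy_fault_classifier))   # DE-BAF
print(hybrid_predict([3.0, -1.0], toy_detector, toy_fault_classifier))  # FE-BAF
```

Because the second stage has no normal class, a fault sample with an unseen severity can at worst be assigned the wrong fault type once the first stage has rejected it; whether it is rejected still depends on the one-class model.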

The Comparisons of Diagnostic Results
Because the CWRU bearing dataset is a widely used benchmark in bearing fault diagnosis, the proposed method is compared with methods from published papers that also use the CWRU dataset. The comparative results are shown in Table 3. In Ref. [2], a deep neural network for domain adaptation in fault diagnosis (DAFD) was proposed and applied to identify four types of bearing faults, achieving a recognition accuracy of 94.73%. Amar et al. [5] used vibration spectrum imaging (VSI) and an artificial neural network (ANN) for bearing fault diagnosis and obtained 96.9% accuracy. In Ref. [4], a local connection network (LCN) constructed from a normalized sparse autoencoder (NSAE), namely NSAE-LCN, was used for bearing fault diagnosis, and 99.92% accuracy was obtained. Zhang et al. [9] used EEMD for feature extraction and an optimized SVM to identify six kinds of bearing faults, obtaining 97.04% classification accuracy. In Ref. [19], EMD and wavelet kernel local Fisher discriminant analysis (WKLFDA) were used for feature extraction and dimensionality reduction, and SVM was used to classify ten bearing conditions, with a classification accuracy of 98.80%. In all of the compared methods, only normal conditions and four to ten known fault types were selected to train the multi-classifiers and carry out fault diagnosis. The influence of sample imbalance is not considered in [2][3][4][5], [9] and [19]. In contrast, the proposed method can detect the mechanical state of bearings correctly when the samples are imbalanced. Moreover, the classifiers are trained according to the three different diagnostic targets, and the accuracy of the method is increased greatly.
When the number of classes is eighteen, a more extensive and complicated setting than is usually found in related work, the proposed method still achieves 100% classification accuracy. Table 3 indicates that the new method has a superior ability to diagnose bearing faults, and it may also be applicable to more complicated mechanical systems.

Conclusions
A bearing fault diagnosis method based on a hybrid classifier is proposed that considers various diagnostic targets and imbalanced sample numbers.
The main contributions of this research are as follows:
(1) Common features in the field of bearing fault diagnosis are collected, and a comprehensive feature set is constructed.
(2) Various diagnostic targets based on a practical project are determined. RF classifiers with a high-dimensional comprehensive feature set are constructed, and the optimal feature set and classifier structure are determined automatically in the training process for the different diagnosis targets.
(3) The new method compensates for a shortcoming of traditional methods, which misidentify fault samples as normal samples under scenarios with imbalanced training samples. It does so through a novel hybrid classifier constructed from OCSVM and RF that combines the strong classification ability of RF with the state monitoring ability of OCSVM.

Conflicts of Interest:
The authors declare no conflict of interest.