
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In bearing diagnostics using a data-driven modeling approach, a concern is the need for data from all possible scenarios to build a practical model for all operating conditions. This paper is a study on bearing diagnostics with the concurrent occurrence of multiple defect types. The authors are not aware of any work in the literature that studies this practical problem. A strategy based on one-

In bearing diagnostics using a data-driven modeling approach, a concern is the need for data from all possible scenarios to build a practical model for all operating conditions. This paper is a study on bearing diagnostics with the concurrent occurrence of multiple defect types. The authors are not aware of any work in the literature that studies this practical problem. In this paper, a strategy based on one-

This paper studies vibration-based bearing diagnostics using a data-driven approach. Data mining and knowledge discovery methods are used, with consideration of non-exclusive bearing defect types. The problem of bearing fault diagnostics is formulated to diagnose multiple defects from normal and single-defect training data. The accuracy of the proposed OVA strategy on bearing diagnosis of unseen observations is evaluated with vibration data collected in the laboratory.

The remainder of this section is a brief review of bearing diagnostic analysis of acceleration signals obtained from piezoelectric sensors, using data mining methods. Section 2 briefly summarizes the background of the techniques used in this paper, namely support vector machine (SVM), C4.5 decision tree, and class binarization. Section 3 introduces the hypothesis and logic behind the problem formulation used in modeling. Empirical analyses of two bearing data sets, collected from a bearing fault motor and a multiple bearing mechanical system, are reported in Sections 4 and 5 respectively, each followed by a discussion of the results in the same section. Finally, the paper closes with the conclusions in Section 6.

Bearings are one of the most widely used components in machines and bearing failure is one of the most frequent reasons for machine breakdown. According to [

Bearing faults can refer to localized defects or extended spalls. In rotating machinery, whenever a defect makes contact with another surface, a high-level short-duration vibration impulse is excited. The effect is a damped signal that comprises a sharp rise corresponding to the impact followed by an approximately exponential decay [

Since the rotation of the machinery is periodic, impulses and the damped signals due to a particular defect also tend to be generated periodically. The frequency of impulse occurrence is the characteristic defect frequency of a particular defect and the theoretical value can be calculated from the physics of a component. The existence of a fault is indicated by the occurrence of the impulses at the characteristic defect frequency, and the impulse amplitude provides some information about how serious the fault is. The same information also occurs as sidebands in the high resonant frequency bands.

The goal in fault detection and diagnosis is to extract from vibration signals the presence of impulses and identify the matching characteristic defect frequencies, which act as fault signatures for the defects concerned. In practice, vibration signals due to defect impacts mix together with those generated by normal rotation of the same component, other components of the same rotating machinery and other random vibrations. Another complication is that the actual frequency is affected by the tightness-of-fit of the components and there are generally some differences between the measured frequencies and the theoretical characteristic defect frequency values due to random slip [

The signal processing approach constitutes a major part of the literature in vibration-based bearing fault diagnostics. This approach has been developed over decades, has an extensive literature, and is not the focus of this paper. A useful review of this classic approach is given in [

Another approach to vibration-based bearing diagnostics is the use of data mining techniques. Data mining, first appeared as knowledge discovery in databases [

Data mining techniques are grouped under different names in the machine diagnostics literature. For example, in the review on machinery diagnostics and prognostics [

No matter which approach is used in the fault diagnosis, some common requirements exist for an ideal fault diagnostic system. While the discussion in [

Early detection and diagnosis ability: Achieve a balance in the trade-off between quick response and high false alarm rate.

Fault isolation ability: Be able to discriminate between different failures at different locations and different levels.

Robustness: To maintain performance at an acceptable level under noise and uncertainty.

Novelty identification ability: The ability to decide the state of a system as normal or abnormal, and if abnormal, a known fault or a novel fault state.

Multiple fault identification ability: The ability to identify multiple faults, which is difficult due to the interacting nature of most faults.

Adaptability: The ability to adapt to changes in external inputs or structural changes.

Reasonable storage and computational requirement: Achieve a balance in the trade-off between the computational complexity and system performance.

Explanation facility: To reason about cause and effect relationships in a process and provide possible explanations of how the fault originated and propagated to the current situation, and to justify why certain hypotheses are proposed and others are not, thus helping the operator evaluate and act upon his/her experience.

The focus of this paper lies in the multiple fault identification ability with statistical and data mining techniques. In practice, one or more defect types can occur on a bearing concurrently. A bearing diagnostic method that is able to identify both single and multiple defects is more useful in real-world applications. In classification, training data for all system states to be recognized by a model are usually necessary [

The proposed OVA approach is designed for fault diagnosis in the second phase of the systems health management process. In a real application such as machinery fault diagnostics and prognostics in a factory, the first phase

Algorithms in data mining are often grouped by the type of model generated or the most usual way of formulating a problem.

Classification and regression are the most common data mining formulations, where the classes (categorical in classification and continuous in regression) are utilized in the modeling (supervised learning). Five of the ten popular algorithms are designed for classification and regression.

The most intuitive formulation for fault diagnostics is classification and the simplest formulation for fault prognostics is regression. As in any modeling problem, the actual formulation used in the diagnosis/prognosis depends on the data available and how the analyst formulates the problem. For example, bearing diagnostics can also be modeled as a clustering problem [

In this paper, we stick to the classic formulation of bearing diagnosis with classification. Classification, the most familiar and most popular data mining task, is a mapping from the database to the set of predefined, non-overlapping classes that partition the entire database. SVM is a kernel method which builds a model by constructing a hyperplane that best separates two classes. Decision tree methods recursively generate a tree by constructing hierarchical and non-overlapping classification rules using a greedy algorithm. By taking the best immediate, or local, solution in finding an answer, the algorithm selects an attribute to split on at any given node with any given data set, and decides whether to stop branching and what branches to form. SVM and C4.5 are used in our proposed method due to their popularity in prognostics and systems health management (PHM) related research, including but not limited to bearing diagnostics with vibration data.

Besides the choice of algorithm(s), the way data are utilized in modeling is another critical factor for the success of data-driven methods. For example, when a multi-class real-world problem is modeled with a binary classification algorithm such as SVM, different class binarization strategies can be used. Just as classification algorithms can be useful in regression problems, class binarization strategies may also be used to improve diagnostic performance and possibly enable diagnosis of multiple defects from single-defect training data.

The following notations will be used in both Sections 2.1 and 2.2. Let

SVM is a kernel method first introduced in [

Consider the case of two classes,

If the two classes are completely separable, the data set should satisfy the constraints below:

These can alternatively be expressed in complete equation,

The optimal separating hyperplane is the separating hyperplane with greatest distance between the plane and the nearest data points on both sides,

For the case that is not completely separable, the SVM modeling can be expressed as an optimization problem with slack variables.
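For reference, the textbook soft-margin formulation is sketched below. The symbols used here (weight vector $\mathbf{w}$, bias $b$, slack variables $\xi_i$, penalty parameter $C$, and labels $y_i \in \{-1, +1\}$ for $n$ training points $\mathbf{x}_i$) follow common convention and are assumptions; they may differ from the notation intended by the authors.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
```

Setting every $\xi_i = 0$ recovers the completely separable case, while a larger $C$ penalizes margin violations more heavily.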

For non-linear classification tasks, the application of kernel functions is required to transform the problem into a higher dimensional feature space to make linear separation possible. The choice of the appropriate kernel function is very important. A common choice is a polynomial kernel with the kernel parameters and the penalty margin

For problems with more than two classes, a multi-class classification strategy, such as one-

Recent examples of research papers applying SVM to bearing fault diagnostics include [

The SVM used in this paper is built using sequential minimal optimization (SMO) [

C4.5 decision tree is the most popular method in the ID3 decision tree family. The ID3 decision tree [

In Information theory, a probabilistic method for quantifying information is developed by Shannon [

The entropy ranges from zero to infinity. It is zero if and only if there is only one probable value for

Similarly, the uncertainty of _{j}_{j}_{j}_{j}_{xj}_{=} _{v}_{j}

Information gain Gain (_{j}_{j}_{j}

One drawback of information gain is its bias towards attributes with a large number of possible values [_{j}
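The quantities discussed above (entropy, information gain, and the gain ratio that C4.5 uses to counter the bias of information gain) can be illustrated with a small sketch. The function names and the toy data are hypothetical, and the definitions are the standard textbook ones rather than a reproduction of the authors' equations.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def conditional_entropy(values, labels):
    """Entropy of the labels after partitioning the data by attribute value."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on the attribute."""
    return entropy(labels) - conditional_entropy(values, labels)


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): a useful two-valued attribute and a unique-ID attribute.
labels = ["fault", "fault", "normal", "normal"]
level = ["high", "high", "low", "low"]
sample_id = ["a", "b", "c", "d"]
```

In this toy example both attributes reach the same information gain, but the gain ratio of the many-valued `sample_id` attribute is halved by its larger split information, which is the bias correction described in the text.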

C4.5 decision tree starts modeling with the whole training set

C4.5 decision tree allows the use of fields with missing values and employs post-pruning to avoid overfitting. It is capable of handling different measurement scales as well as both continuous and categorical variables. Advantages of C4.5 include ease of implementation, no underlying assumptions, the ability to generate non-parametric and non-linear models, and the ability to reduce the dimensionality of the input feature space in a way that is interpretable to analysts. On the other hand, the trees produced by C4.5 are very large and complex. C5.0 is an enhancement of C4.5 that offers improved effectiveness and efficiency and supports boosting. However, the C4.5 and C5.0 decision tree algorithms produce models with similar predictive accuracies [

While C4.5 decision trees naturally handle multiple classes, they can only examine a single feature at a time and do not perform well when the features are highly interdependent. However, they generate models that resemble the human decision-making process and have become popular due to their interpretable results [

Real-world problems often involve multiple classes, but some classification algorithms, such as SVM, are inherently binary. Class binarization strategies reduce a k-class problem into a series of binary problems for classification. The two most common strategies in the literature are one-

In the OVO or OAO strategy, classifiers are trained to discriminate between two of the k possible classes. One classifier is trained with a data subset of each possible pair of classes, then the output of the (

In the OVA or OAA strategy, classifiers are trained to determine whether an instance belongs to one of the k classes or not. One classifier is trained with the whole training set, where each time

The differences among the formulations of multi-class classification (

These class binarization strategies can also be useful in other situations. For example, OVA class binarization strategy has been adopted in a general framework for class-specific feature selection using any feature selectors in [

Most work on binarization strategies applies multiple online empirical data sets to evaluate classification performance in general. The authors are not aware of any work in the literature on the application of binarization strategies to the specific context of machinery fault diagnostics for our purpose. In this paper, the OVA class binarization strategy is applied to all single-defect classes, and samples of the healthy class are added to the relabeled class, in order to predict concurrent defects from normal and single-defect training data. This can help reduce the types of training data needed for bearing fault diagnosis of combined defects. More details of how this procedure is applied are discussed in the next section.
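The relabeling step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `relabel_one_vs_all`, the ±1 label encoding, and the toy label list are hypothetical and are not taken from the authors' implementation.

```python
DEFECTS = ("BPFI", "BPFO", "BSF")


def relabel_one_vs_all(labels, target):
    """Return +1 for the target defect and -1 for everything else.

    Healthy ("NORMAL") samples fall into the negative class together with
    the other single-defect classes, as described in the text.
    """
    return [1 if y == target else -1 for y in labels]


# Toy training labels (hypothetical): normal and single-defect samples only.
train_labels = ["NORMAL", "BPFI", "BPFO", "BSF", "BPFO", "NORMAL"]

# One binary target vector per single-defect classifier.
binary_targets = {f"is{d}": relabel_one_vs_all(train_labels, d) for d in DEFECTS}
```

Each entry of `binary_targets` would then be used to train one binary classifier on the same feature matrix; no combined-defect samples are needed at this stage.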

Common types of bearing defects with characteristic frequencies detectable from vibrations are: (1) fundamental train (cage) frequency (FTF); (2) ball pass frequency, outer race (BPFO); (3) ball pass frequency, inner race (BPFI); and (4) ball spin frequency (BSF). In practice, incipient faults first detected from vibration signals are usually BPFO, BPFI, or BSF.
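As an illustration of how these characteristic frequencies follow from bearing geometry, the sketch below uses the standard kinematic formulas, assuming a stationary outer race and a rotating inner race. The function name is hypothetical, and results are expressed as multiples of the shaft rotation frequency. The two parameter sets are the ones listed in this paper's bearing parameter tables.

```python
import math


def characteristic_multipliers(n_balls, ball_diameter, pitch_diameter, contact_angle_deg=0.0):
    """Return defect frequencies as multiples of the shaft rotation frequency.

    Assumes a stationary outer race and a rotating inner race (an assumption
    of this sketch). Ball and pitch diameters must share the same unit.
    """
    ratio = (ball_diameter / pitch_diameter) * math.cos(math.radians(contact_angle_deg))
    return {
        "FTF": 0.5 * (1.0 - ratio),
        "BPFO": 0.5 * n_balls * (1.0 - ratio),
        "BPFI": 0.5 * n_balls * (1.0 + ratio),
        "BSF": 0.5 * (pitch_diameter / ball_diameter) * (1.0 - ratio ** 2),
    }


# Bearing of the motor test bed: 14 balls, ball diameter 8, pitch diameter 46.55.
motor_bearing = characteristic_multipliers(14, 8.0, 46.55)

# Bearing of the machine fault simulator: 8 balls, ball diameter 0.3125, pitch diameter 1.319.
mfs_bearing = characteristic_multipliers(8, 0.3125, 1.319)
```

With these inputs the multipliers round to the tabulated values (8.203, 5.797 and 2.823 for the motor bearing; 4.948, 3.052 and 1.992 for the simulator bearing). Under these formulas, BPFO and BPFI always sum to the number of balls times the shaft frequency.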

Multiple types of defects can develop on the same bearing concurrently

The proposed method is summarized as a flowchart in

As an example, consider the multi-class bearing fault diagnosis of normal and three fault types (BPFI, BPFO, BSF). Applying the proposed OVA diagnostic approach to the three fault types, the 4-class classification problem among Normal, BPFI, BPFO, BSF is transformed into three 2-class classification sub-problems. They can be denoted by classifier isBPFO that handles classes BPFO

In this formulation, a normal bearing is expected to return the negative class (−1 in Section 2 or N in

If the interactions between different defect types are small enough to be ignored in modeling, a combined defect of BPFO and BPFI shall give positive for classifiers isBPFO and isBPFI, and negative for isBSF. In this case, classifiers for combined defects will not be needed and hence training data of combined defects need not be collected. On the other hand, if a combined defect cannot be correctly classified with the single-defect OVA classifiers, the OVA class binarization strategy cannot help simplify the fault diagnosis process. In this case, the interactions between different defect types have to be modeled and vibration data of different combinations of defects need to be collected, if fault diagnostics of multiple defects (which occur at a late stage of bearing degradation) is desired.
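The decision logic above, in which the outputs of the three binary classifiers are combined into one diagnosis, can be sketched as follows. The function name and the boolean encoding of classifier outputs are assumptions made for illustration only.

```python
from itertools import product

DEFECT_ORDER = ("BPFI", "BPFO", "BSF")


def decode_diagnosis(is_bpfi, is_bpfo, is_bsf):
    """Map the three one-vs-all decisions to a single diagnosis string.

    All negative means a normal bearing; each positive flag adds the
    corresponding defect, so concurrent defects appear as combinations.
    """
    flagged = [name for name, hit in zip(DEFECT_ORDER, (is_bpfi, is_bpfo, is_bsf)) if hit]
    return " + ".join(flagged) if flagged else "Normal"


# Enumerate all 2^3 = 8 possible outcomes of the three classifiers.
all_outcomes = {flags: decode_diagnosis(*flags) for flags in product((False, True), repeat=3)}
```

Because each classifier is interpreted independently, the three single-defect models cover eight health states, four of which (the combinations) never appear in the training data.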

From domain expert knowledge, there is a known fixed relationship between the characteristic defect frequencies BPFO and BPFI for bearings due to geometry,

In this section, the proposed OVA strategy is evaluated with experimental data from a bearing motor test bed. The SVM classifier is used as an example of statistical and data mining methods that are commonly used with OVA class binarization for other purposes.

The data set used in this empirical analysis was collected from a bearing fault motor as shown in

In this experiment, the bearing could be in one of five states: normal, BPFO, BPFI, BSF, or combined BPFO and BPFI (COMB). An accelerometer was attached to the bearing in the vertical direction. Vibration data were collected at a sampling rate of 4,000 Hz. Two samples were extracted for each experimental condition. Each sample consists of one temporal signal of 0.5 s length. By the Nyquist sampling theorem, the highest frequency that can be shown in a spectrum is 4,000/2 = 2,000 Hz (or 85.8 ×

Six summary statistics commonly used in the literature, namely median, 75th percentile, maximum, root-mean-square (RMS), skewness and kurtosis, are extracted from the time waveform signals, the time series distributions and the frequency range between the 4th and 10th harmonics of the bearing characteristic frequencies [
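The six summary statistics named above can be computed as in the sketch below. The function name `summary_features`, the population-moment definitions of skewness and kurtosis, and the inclusive quantile method are assumptions of this sketch; the text does not state which estimator variants were used.

```python
import math
import statistics


def summary_features(signal):
    """Compute six summary statistics of a 1-D signal (list of floats).

    Skewness and kurtosis use population central moments; a constant
    signal (zero variance) is not handled and would raise an error.
    """
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return {
        "median": statistics.median(signal),
        # Third quartile, linear interpolation between order statistics.
        "p75": statistics.quantiles(signal, n=4, method="inclusive")[2],
        "max": max(signal),
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Toy signal (hypothetical), symmetric around zero.
features = summary_features([0.0, 1.0, -1.0, 2.0, -2.0])
```

Applying the same function to each of the three signal representations mentioned in the text would give 6 × 3 = 18 features per sample.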

The testing performance, in terms of prediction accuracy, of the proposed method is compared with that of a simple multi-class classification using the same data split. The confusion matrices are shown in the corresponding subsections, followed by discussions of the results. The single-defect classification performance is first compared to evaluate the improvement in diagnostic accuracy for trained (single defect) classes. The concurrent-defect classification performance is then evaluated on the potential to enable multiple-defect bearing diagnostics from single-defect training data. The diagnostic accuracies of random guess classification and worst-case classification are also considered.

In the baseline method, multiple classes are handled by pairwise classification as in typical SVM. The data are neither normalized nor standardized.

The predictors used in the OVA formulation are the same as those in the multi-class classification. The 18 features are fed into three SVMs, each of which makes a binary decision on whether a sample exhibits one particular single defect, as in

The prediction performance of normal and single-defect bearings is first compared. When one-

Next we compare the performance for concurrent defects, using the COMB data. When a multi-class classification is used, there is no way for the classifier to give the correct classification because this class does not exist in the training data and hence the classifier model. When the OVA formulation is used, all of the COMB test samples (two out of two) are classified as the correct combined defect BPFI + BPFO.

The use of OVA class binarization in our proposed way increases the overall bearing fault diagnostic accuracy, from 33.3% to 66.7% for normal and single defect types, and from 0% to 100% for COMB. The overall test accuracy of 75% of the proposed method is also much better than the random guess among eight classes (12.5%) and the worst-case classification of classifying all cases to the majority class,

For this data set, the next best performing classifier is C4.5 decision tree, which produces a classification accuracy of four out of six (66.7%) for normal and single-defect test cases in the multi-class formulation and the slightly lower three out of six (50%) in the OVA approach. However, an examination of the generated model shows that only two of the four types are modeled in the decision tree, probably because the training size is too small.

In this section, the proposed OVA strategy is evaluated with experimental data from a multiple bearing mechanical system, which includes a shaft suspended by two bearings. The popular C4.5 decision tree is an example of statistical and data mining methods that are not normally used with binarization, yet it produces high diagnostic accuracies with our proposed OVA diagnostic approach.

The data set used in this empirical analysis was collected from a machine fault simulator (MFS) as shown in

In this experiment, the bearing again could be in one of five states: normal, BPFO, BPFI, BSF, or combined BPFO and BPFI (COMB). Either of the bearings on the left and right of the shaft can be at fault, and we assume that the faulty bearing has been located in the isolation stage. Vibration data were collected from the accelerometers in the vertical and horizontal directions of the bearing on either the left or the right of the rotors,

The total number of samples used in this analysis is: 5 (health states) × 2 (bearing locations) × 10 (replicates) = 100 observations. An 80-20 split is applied to the normal and single-defect data for training and testing to evaluate the prediction accuracies,

The same six summary statistics in the previous analysis,

The structure of this subsection is the same as Section 4.2 for the other data set, except that the worst-case classification accuracy for this data set is the same as the random guess classification accuracy. This is because an equal number of training samples is used for each bearing defect type in this empirical analysis.

The prediction performance of normal and single-defect bearings is first compared. When one-

Next we consider the performance for concurrent defects, using the COMB data. When the OVA formulation is used, eight of the 20 COMB test samples are correctly classified as BPFO + BPFI. The diagnosis accuracy increases from 0% for the baseline to 40%, also much higher than the random guess among eight classes (12.5%).
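The sample counts and accuracy figures quoted for this data set can be checked with a few lines of arithmetic. The variable names are hypothetical, and the split sizes are derived here from the stated 5 × 2 × 10 design and the 80-20 split rather than quoted from the paper.

```python
health_states, bearing_locations, replicates = 5, 2, 10

# Total observations and the share belonging to the combined-defect state.
n_total = health_states * bearing_locations * replicates
n_comb = 1 * bearing_locations * replicates
n_single_and_normal = n_total - n_comb

# 80-20 split of the normal and single-defect data (integer arithmetic).
n_test_single = n_single_and_normal // 5
n_train = n_single_and_normal - n_test_single

# Eight of the COMB test samples are reported as correctly diagnosed with OVA.
comb_accuracy_ova = 8 / n_comb

# Random guess among the 2^3 = 8 states distinguishable by three binary classifiers.
random_guess = 1 / 2 ** 3
```

Under these assumptions the COMB accuracy of 0.4 is a little more than three times the random-guess level of 0.125.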

Overall, the C4.5 decision tree produces very satisfactory performance with the proposed OVA diagnostic approach. In particular, the diagnostic accuracy for combined fault samples using C4.5 with OVA is much better than that using SVM with OVA on the same data set (which performed worse than the worst-case classification accuracy on the COMB samples). This further suggests that the combined fault classification accuracy is not solely attributable to the use of the polynomial kernel in the SVM.

In real-world applications of bearing diagnostics, multiple defect types may occur at the same time. While many data-driven modeling methods naturally handle mutually exclusive groups of data, no discussions on the handling of non-exclusive concurrent faults in bearing diagnostics are found in the literature. Moreover, the need for data collection from different operating scenarios such as permutations and combinations of fault type, location, size, machine load and speed is a concern in using data-driven methods for bearing diagnosis. In this paper, we have proposed a formulation strategy to improve diagnostics performance and reduce the number of scenarios needed in the training data, with focus on multiple defects that may occur concurrently. Using two sets of test bed data collected from a bearing motor and a multiple component mechanical system, we have shown the potential of a one-

This research was supported in part by the Centre for Systems Informatics Engineering (CSIE) (CityU #9360144); HK RGC CRF (#8730029); HK RGC GRFs (#9041682 and #9041578); and CityU grant SRG-Fd (#CityU122613). The authors would like to thank Dong Wang in the same department for his constructive comments and suggestions to improve this paper.

The authors declare no conflict of interest.

Three different classification strategies illustrated with a 3-class problem (

The proposed bearing fault diagnosis method with one-

The bearing fault motor.

Plots of a normal bearing data sample: (

Plots of a BPFO bearing data sample: (

Plots of a BPFI bearing data sample: (

The machine fault simulator (MFS).

Plots of a normal bearing data sample—vertical (top) and horizontal (bottom) sensor (

Plots of a BPFO bearing data sample—vertical (top) and horizontal (bottom) sensor (

Plots of a BPFI bearing data sample—vertical (top) and horizontal (bottom) sensor (

The top 10 data mining algorithms [

Algorithm | Type of model | Usual problem formulation |
---|---|---|
C4.5 | decision tree | Classification |
k-means | distance-based | Clustering |
SVM (Support Vector Machine) | geometric | Classification |
Apriori | rule-based | Association rules |
EM (Expectation Maximization) | statistical | Clustering |
PageRank | network graph | Ranking |
AdaBoost | boosting | Ensemble |
kNN (k Nearest Neighbor) | distance-based | Classification |
Naïve Bayes | statistical | Classification |
CART (Classification and Regression Tree) | decision tree | Classification; Regression |

One-

Classifier | Positive class | Negative class |
---|---|---|
isBPFI | BPFI | notBPFI (Normal; BPFO; BSF) |
isBPFO | BPFO | notBPFO (Normal; BPFI; BSF) |
isBSF | BSF | notBSF (Normal; BPFI; BPFO) |

Fault diagnostics with a one-

isBPFI | isBPFO | isBSF | Diagnosis |
---|---|---|---|
N | N | N | Normal |
Y | N | N | BPFI |
N | Y | N | BPFO |
N | N | Y | BSF |
Y | Y | N | BPFI + BPFO |
Y | N | Y | BPFI + BSF |
N | Y | Y | BPFO + BSF |
Y | Y | Y | BPFI + BPFO + BSF |

Bearing parameters and characteristic frequencies (BCFs) of each defect.

No. of balls N | 14 |

Ball diameter B | 8 |

Pitch diameter P | 46.55 |

Contact angle | 0 |

BPFI | 8.203 × |

BPFO | 5.797 × |

BSF | 2.823 × |

Bearing fault simulation data for training and testing.

Data sample | Class | Use |
---|---|---|
BPFI_motorA_1s_01_sample_01 | BPFI | Train |
BPFO_motorA_1s_01_sample_01 | BPFO | Train |
BPFO_motorA_1s_01_sample_02 | BPFO | Train |
BPFO_serious_motorA_1s_01_sample_02 | BPFO | Train |
Normal_motorA_1s_02_sample_01 | NORMAL | Train |
Normal_motorA_1s_02_sample_02 | NORMAL | Train |
Normal_motorA_1s_03_sample_01 | NORMAL | Train |
ball_motorA_1s_01_sample_02 | BALL | Train |
BPFO_combine_BPFI_motorA_1s_01_sample_01 | COMB | Test |
BPFO_combine_BPFI_motorA_1s_01_sample_02 | COMB | Test |
BPFI_motorA_1s_01_sample_02 | BPFI | Test |
BPFO_serious_motorA_1s_01_sample_01 | BPFO | Test |
Normal_motorA_1s_01_sample_01 | NORMAL | Test |
Normal_motorA_1s_01_sample_02 | NORMAL | Test |
Normal_motorA_1s_03_sample_02 | NORMAL | Test |
ball_motorA_1s_01_sample_01 | BALL | Test |

Test performance of multi-class classification of single defects.

| ||||
---|---|---|---|---|

BPFI | ||||

BPFO | 1 | |||

BSF | 1 | |||

Normal | 2 | |||

COMB | 2 | |||

| ||||

Correct | ||||

| ||||

0.333 |

Test performance with one-

| |||||
---|---|---|---|---|---|

BPFI | |||||

BPFO | 1 | ||||

BSF | 1 | ||||

Normal | |||||

COMB | |||||

| |||||

Correct | |||||

| |||||

0.667 | 1.000 |

Bearing parameters and characteristic frequencies (BCFs) of each defect.

No. of balls N | 8 |

Ball diameter B | 0.3125 |

Pitch diameter P | 1.319 |

Contact angle | 0 |

BPFI | 4.948 × |

BPFO | 3.052 × |

BSF | 1.992 × |

Test performance of multi-class classification of single defects.

| ||||
---|---|---|---|---|

BPFI | ||||

BPFO | ||||

BSF | ||||

Normal | 1 | |||

COMB | 20 | |||

| ||||

Correct | ||||

| ||||

0.938 |

Test performance with one-

| |||||
---|---|---|---|---|---|

BPFI | |||||

BPFO | |||||

BSF | |||||

Normal | |||||

COMB | 7 | 4 | 1 | ||

| |||||

Correct | |||||

| |||||

1.000 | 0.400 |