Simultaneous-Fault Diagnosis of Gearboxes Using Probabilistic Committee Machine

Zhong, Jian-Hua; Wong, Pak Kin; Yang, Zhi-Xin

doi:10.3390/s16020185


Department of Electromechanical Engineering, University of Macau, Macao, China

Author to whom correspondence should be addressed: Zhi-Xin Yang.

Sensors 2016, 16(2), 185; https://doi.org/10.3390/s16020185

Submission received: 23 October 2015 / Revised: 21 January 2016 / Accepted: 22 January 2016 / Published: 2 February 2016

(This article belongs to the Section Physical Sensors)


Abstract

This study combines signal de-noising, feature extraction, two pairwise-coupled relevance vector machines (PCRVMs) and particle swarm optimization (PSO) for parameter optimization to form an intelligent diagnostic framework for gearbox fault detection. First, the sensor signals are de-noised with the wavelet threshold method to lower the noise level. Then, the Hilbert-Huang transform (HHT) and energy pattern calculation are applied to extract fault features from the de-noised signals. After that, an eleven-dimensional vector, consisting of the energies of nine intrinsic mode functions (IMFs), the maximum value of the HHT marginal spectrum and its corresponding frequency component, is obtained to represent the features of each gearbox fault. The two PCRVMs serve as two different fault detection committee members and are trained using vibration and sound signals, respectively. The individual diagnostic results of the committee members are then combined by a new probabilistic ensemble method, which improves the overall diagnostic accuracy and increases the number of detectable faults compared with individual classifiers acting alone. The effectiveness of the proposed framework is experimentally verified using test cases. The experimental results show that the proposed framework is superior to existing single classifiers in terms of diagnostic accuracy for both single and simultaneous faults in the gearbox.

Keywords:

simultaneous-fault diagnosis; Hilbert-Huang transform; pairwise-coupling probabilistic committee machine

1. Introduction

In rotating machinery, gearboxes are widely used to transmit power from the prime mover to the load. A gearbox failure may interrupt normal machine operation and endanger users. Consequently, it is of great significance to develop a reliable and accurate intelligent system to diagnose the main components of the gearbox, such as gears and bearings. There are two main challenges in gearbox diagnosis. One is the existence of simultaneous faults, that is, multiple single faults that appear concurrently. The other is that no single sensor can detect all machine faults. To detect more faults accurately, several kinds of sensors and signals may have to be used at the same time; however, it is difficult to analyze different kinds of signals simultaneously and reach a decision. Various gearbox diagnostic systems have been proposed in [1,2,3,4,5,6,7]. In these systems, the fault diagnosis procedure is mainly divided into two stages: (1) signal processing and (2) fault identification/classification.

The existing problems in the signal processing stage of these systems are that the signals usually contain high-dimensional data and suffer from background noise interference, which degrades the diagnostic accuracy and lengthens the fault identification time. Besides, a gearbox usually has many rotating components working together, such as bearings, gears and spindles, so gearbox diagnosis is inherently a simultaneous-fault problem. In traditional gearbox fault diagnostic methods, each simultaneous fault is usually treated as an independent label for the classifier, which results in a high cost of acquiring an exponentially increasing number of simultaneous-fault signals. For example, with d single faults (labels) and one normal condition, there are $2^d - (d + 1)$ artificial simultaneous-fault labels [8,9,10]. To solve this problem, an effective signal de-noising method and a proper feature extraction technique that can find single-fault pattern features within simultaneous-fault patterns are studied together.
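The label count can be checked by brute force: every subset of two or more of the d single faults is a distinct simultaneous-fault label, so of the $2^d$ subsets only the empty set (the normal condition) and the d singletons are excluded. The Python sketch below, using a hypothetical helper name, enumerates these subsets and compares the count with the closed form.

```python
from itertools import combinations


def simultaneous_fault_labels(d):
    """Enumerate every combination of two or more of d single faults (labels 1..d)."""
    return [combo
            for k in range(2, d + 1)
            for combo in combinations(range(1, d + 1), k)]


for d in range(1, 9):
    # Brute-force count agrees with the closed form 2^d - (d + 1).
    assert len(simultaneous_fault_labels(d)) == 2 ** d - (d + 1)

print(simultaneous_fault_labels(3))   # [(1, 2), (1, 3), (2, 3), (1, 2, 3)]
```

Already at d = 10 single faults this gives 1013 simultaneous-fault labels, which is why training only on single-fault patterns is attractive.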

Currently, several methods, including spectral subtraction, least squares, and wavelet threshold methods, are widely used for signal de-noising [11,12]. To effectively de-noise the non-stationary signals of a gearbox, a soft threshold method based on the discrete wavelet transform (DWT) is adopted in this study because of its popularity.

References [8,9,10] reported that a simultaneous-fault symptom can be identified by analyzing single-fault patterns only, provided that the classifier is trained with features from a proper feature extraction technique; this saves the considerable resources otherwise needed to collect a large number of simultaneous-fault combinations for training. Candidate feature extraction techniques are therefore reviewed here. At present, many methods exist to extract features from fault signals, such as the Fourier transform, the short-time Fourier transform, and the wavelet transform. The Fourier transform is only suitable for analyzing stationary signals. However, the signals of rotating gears and bearings are non-stationary, which makes the Fourier transform unsuitable for this application. Time-frequency analysis methods, such as the short-time Fourier transform (STFT) and the wavelet transform, can process non-stationary signals, but both have limitations. STFT uses a fixed time window, which makes it impossible to achieve good resolution in the time and frequency domains at the same time. The drawback of the wavelet transform is energy leakage: any signal component that does not correlate well with the shape of the wavelet basis function is masked or completely ignored. In contrast to STFT and the wavelet transform, the Hilbert-Huang transform (H-HT) is a more recent time-frequency technique for analyzing nonlinear and non-stationary signals. A typical H-HT process first employs the empirical mode decomposition (EMD) algorithm to decompose a complicated signal into a series of intrinsic mode functions (IMFs), which contain the local characteristics of the original signal at different time scales, and then applies the Hilbert transform to each intrinsic mode function (IMF) for Hilbert spectrum analysis.
The high time-frequency resolution of the H-HT method can effectively describe how the frequency content changes with time, which makes it a good approach for analyzing non-stationary signals. Although H-HT has been applied in many areas, particularly fault detection and diagnosis [13,14], it has some disadvantages: (1) mode mixing; and (2) redundant intrinsic mode functions easily appear at low frequencies, which can distort the processed result [15]. To overcome these disadvantages, this study applies ensemble empirical mode decomposition (EEMD), an improved EMD method, to deal with the mode mixing problem, and uses the correlation coefficient method to eliminate redundant IMFs. The EEMD-based H-HT is hereafter referred to as HHT. It is well known that different fault conditions show different amplitude- and phase-frequency characteristics in the frequency domain; in other words, fault signal energies in some frequency bands may be enhanced while those in others are suppressed. It is therefore reasonable to assume a correspondence between changes of signal energy in the frequency bands and the fault phenomena. Hence, on the basis of HHT, energy patterns of the selected IMF components are used in this study to extract representative fault features from the gearbox vibration and sound signals.
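The two operations named in the paragraph above, discarding redundant IMFs by their correlation coefficient with the analyzed signal and computing an energy pattern from the retained IMFs, can be sketched as follows. This is an illustrative sketch, not the authors' code: the selection threshold of 0.2, the normalization of each IMF energy by the total energy, and the function names are assumptions, and the synthetic sinusoids merely stand in for IMFs that EEMD would produce.

```python
import numpy as np


def select_imfs(signal, imfs, threshold=0.2):
    """Keep IMFs whose |correlation coefficient| with the signal reaches `threshold`.

    The threshold value is an assumption for illustration, not taken from the paper.
    """
    return [i for i, c in enumerate(imfs)
            if abs(np.corrcoef(c, signal)[0, 1]) >= threshold]


def energy_pattern(imfs):
    """Energy E_i = sum_t c_i(t)^2 of each IMF, normalized by the total energy (assumed)."""
    energies = np.array([np.sum(np.square(c)) for c in imfs])
    return energies / energies.sum()


t = np.arange(1000) / 1000.0                      # 1 s sampled at 1 kHz
comp_5hz = np.sin(2 * np.pi * 5 * t)
comp_40hz = 0.5 * np.sin(2 * np.pi * 40 * t)
spurious = 0.05 * np.sin(2 * np.pi * 1 * t)       # weak low-frequency component absent from the signal
signal = comp_5hz + comp_40hz
imfs = [comp_40hz, comp_5hz, spurious]            # stand-ins for EEMD output

kept = select_imfs(signal, imfs)
print(kept)                                       # [0, 1]
print(energy_pattern([imfs[i] for i in kept]))    # approx. [0.2, 0.8]
```

The spurious component is uncorrelated with the signal and is therefore dropped before the energies are computed.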

In [1,3,5], most existing fault classification systems for rotating machinery are built from a single classifier trained on one type of signal. However, a single-classifier diagnostic system may not give reliable results, because a universal classifier is difficult to develop, especially when the data available for training are not abundant. Furthermore, a single classifier can only be trained on one type of signal, and one type of signal may not cover all the faults. To let a fault classification system generate more reliable diagnostic results and diagnose more faults, this paper proposes a new probabilistic committee machine (PCM) that combines the diagnostic results from vibration and sound signals. For gearboxes, vibration and sound signals are commonly used to identify faults because they are easily acquired and highly related to the gearbox condition [16,17,18,19,20,21]. The committee machine concept combines the results of individual classifiers to obtain a group decision that is superior to that of any individual classifier acting alone [22,23,24], in the same way that a group decision is usually better than a single person's decision.

Moreover, a proper classifier must be able to provide the probabilities of all possible faults, so that when the fault(s) predicted by the classifier are incorrect, the user can at least trace other possible faults according to the ranking of their probabilities. It is therefore logical to use a probabilistic classifier for each member of the committee machine for simultaneous-fault diagnosis of the gearbox. Two common probabilistic classifiers are available in the literature: the probabilistic neural network (PNN) [25,26] and the relevance vector machine (RVM) [27,28]. The main drawback of PNN is the limited number of inputs it can handle, because the network complexity and training time grow heavily with the number of inputs. Hence, RVM is selected as the probabilistic classifier for each committee member in this study. In general, the aforementioned probabilistic classifiers are suited to binary classification, whereas most practical applications are multi-class classification problems. The one-versus-all strategy is usually employed to handle multi-class classification. However, this strategy does not consider the correlation between every pair of faults or labels, and it has been shown to produce a large region of indecision [29]. To solve the multi-class classification problem effectively while still producing probabilities, a suitable pairwise coupling strategy is adopted for the above probabilistic classifiers, yielding a pairwise-coupled probabilistic neural network (PCPNN) and a pairwise-coupled relevance vector machine (PCRVM).

After the methods of signal de-noising, feature extraction and committee members are determined, two major factors still affect the accuracy of the proposed framework: the decision threshold ε and the member weights w. The probabilistic committee machine only produces the probability of occurrence of each fault. To determine which faults occur, a decision threshold must be applied to those probabilities (e.g., for an output probability vector P = [0.35, 0.58, 0.48, 0.83] and ε = 0.5, fault labels (2, 4) are considered as faults). Besides, different committee members usually have different reliabilities, so a fair committee machine should assign different weights to its members. Hence, an efficient search algorithm, particle swarm optimization (PSO) [30,31], is used in the proposed framework to determine the optimal member weights and decision threshold. Finally, the F-measure, a fair performance measure, is employed to evaluate the proposed diagnostic framework.

In a nutshell, this paper proposes a new framework that can diagnose simultaneous faults in a gearbox even though it is trained using only single-fault patterns. Besides, the proposed framework provides the probabilities of all possible faults, so that users can trace other possible faults according to the ranking of probabilities when the diagnostic result is incorrect. Furthermore, it can generate more reliable diagnostic results and diagnose more faults by analyzing vibration and sound signals simultaneously. Although the authors previously proposed a similar framework for simultaneous-fault diagnosis of automotive engines in [21], the present framework targets the gearbox system. Moreover, the signal patterns used in this application are totally different from those in [21]: the proposed framework is designed for vibration and sound signals rather than the air ratio, ignition and acoustic signals used in the previous framework. Besides, the engine signals acquired in [21] do not account for background noise, which can degrade the accuracy of the diagnostic system. Furthermore, the feature extraction and selection methods in [21] rely on EMD with domain knowledge and on sample entropy, which are old, time-consuming, poorly supported by the reference literature, and prone to mode mixing. Finally, the objective function in [21] is not well defined and therefore cannot achieve good diagnostic accuracy. Therefore, the framework in [21] cannot be applied directly and has been modified significantly to suit the gearbox, particularly in the data processing and feature selection phases. Table 1 summarizes the differences between the diagnostic framework in [21] and this study.

**Table 1.** Differences of diagnostic framework between reference [21] and this study.

| Differences | Reference [21] | Present Study |
|---|---|---|
| Application | Automotive engine | Gearbox |
| Signal patterns | Air ratio, ignition and acoustic signals | Vibration and sound signals |
| Signal de-noising | None | Wavelet threshold |
| Feature extraction | EMD and domain knowledge | EEMD-based Hilbert-Huang transform and energy pattern |
| Feature selection (IMF selection) | Value of sample entropy | Correlation coefficient |
| Objective function | $F_{me} \in 0.925 \pm 0.025$ | $F_{me} \geq 0.9$ |

This paper is organized as follows: Section 2 presents the proposed framework and related techniques. The experimental setup and data pre-processing are discussed in Section 3. Section 4 discusses the experimental results and a comparison with other approaches. Finally, conclusions are given in Section 5.

2. Proposed Framework

The proposed PCM framework for gearbox simultaneous-fault diagnosis, together with its construction method and evaluation approach, is illustrated in Figure 1. The framework consists of four sub-modules: (1) data processing; (2) probabilistic committee machine; (3) parameter optimization; and (4) performance evaluation. The details of the four sub-modules are discussed in the following sub-sections.

Figure 1. Proposed framework of gearbox simultaneous-fault diagnosis using probabilistic committee machine.

In this case study, signal features are extracted from two kinds of signals x_k (k = 1, 2), namely the vibration and sound signals, denoted as x₁ and x₂, respectively. Taking the vibration signal as an example, the signal x₁, which includes both single-fault patterns (S) and simultaneous-fault patterns (S_M), goes through de-noising and feature extraction. After data processing, the processed dataset is divided into three independent groups, namely the training, validation and test datasets, which are named x_1-PTra, x_1-PVal and x_1-PTes, respectively. x_1-PVal and x_1-PTes contain both single-fault and simultaneous-fault patterns, while x_1-PTra contains single-fault patterns only. The divided datasets are used to train, validate and test the proposed framework, respectively.

2.1. Data Processing

2.1.1. Signal De-Noising

The acquired signals are subject to interference from background noise. To reduce this interference, the acquired signals have to be de-noised. The discrete wavelet transform (DWT), an effective de-noising technique for non-stationary signals [11,13], is selected in this paper. The DWT can be defined as:

$$\mathrm{DWT}(s, R) = \frac{1}{\sqrt{2^{s}}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\left(\frac{t - 2^{s} R}{2^{s}}\right) dt$$

(1)

where s and R are integers, 2^s and 2^sR represent the scale and translation parameters, respectively, ψ represents the mother wavelet and ψ^* is the complex conjugate of ψ. The original time-domain signal x_k = x(t) passes through a set of low-pass and high-pass filters, emerging as low-frequency (approximation, $a_{n}^{*}$) and high-frequency (detail, $d_{i}^{*}$) signals. Therefore, the original signal x(t) can be written as:

$$x(t) = a_{n}^{*} + \sum_{i=1}^{n} d_{i}^{*}$$

(2)

where n is the number of decomposition levels.

The DWT-based de-noising technique is performed in three steps: (1) signal decomposition; (2) determination of the threshold and nonlinear shrinking of the coefficients; and (3) signal reconstruction. Among mother wavelets, the Daubechies (Db) family is one of the most popular and is therefore employed in this study. The soft-threshold signal is defined as $\operatorname{sign}(x(t))\,(|x(t)| - T)$ if $|x(t)| > T$, and 0 otherwise, where T denotes a universal threshold equal to $\sqrt{2 \log(\operatorname{length}(x(t)))}$. The details of the de-noising are described in Section 3.2.
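A minimal sketch of the three de-noising steps follows. For self-containment it implements the Haar wavelet (db1, the simplest Daubechies wavelet) directly in NumPy instead of calling a wavelet library; the paper does not state here which Daubechies order or decomposition level it uses, so `levels=4` is an assumption. The threshold formula in the text has no noise-scale factor; this sketch scales it by a median-absolute-deviation noise estimate σ of the finest details, as in Donoho and Johnstone's universal threshold (with σ = 1 it reduces to the formula above). All function names are hypothetical.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multi-level Haar (db1) DWT: returns approximation a_n and details [d_1, ..., d_n]."""
    a = np.asarray(x, dtype=float)
    if a.size % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # high-pass output d_i
        a = (even + odd) / np.sqrt(2.0)               # low-pass output for the next level
    return a, details


def haar_idwt(a, details):
    """Inverse of haar_dwt (perfect reconstruction)."""
    for d in reversed(details):
        x = np.empty(2 * a.size)
        x[0::2] = (a + d) / np.sqrt(2.0)
        x[1::2] = (a - d) / np.sqrt(2.0)
        a = x
    return a


def soft_threshold(c, T):
    """sign(c) * (|c| - T) where |c| > T, and 0 otherwise."""
    return np.sign(c) * np.maximum(np.abs(c) - T, 0.0)


def wavelet_denoise(x, levels=4):
    a, details = haar_dwt(x, levels)                        # (1) decomposition
    sigma = np.median(np.abs(details[0])) / 0.6745          # assumed MAD noise estimate
    T = sigma * np.sqrt(2.0 * np.log(len(x)))               # (2) universal threshold
    details = [soft_threshold(d, T) for d in details]       #     shrink detail coefficients
    return haar_idwt(a, details)                            # (3) reconstruction


rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0                 # 1024 samples, divisible by 2**4
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised = wavelet_denoise(noisy, levels=4)
```

Only the detail coefficients are shrunk, so the low-frequency approximation passes through unchanged; a wavelet library such as PyWavelets would add higher-order Daubechies filters and boundary handling that this Haar sketch omits.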

2.1.2. Feature Extraction Based on Hilbert-Huang Transform

The Hilbert-Huang transform (HHT) in this paper combines EEMD and the Hilbert transform. EEMD defines the true IMFs as the ensemble mean of a number of trials, each of which decomposes the signal plus a white noise of finite amplitude. In most cases, the standard deviation of the added white noise ranges from 0.1 to 0.4 [32]. The EEMD algorithm [33] is as follows:

(1): Initialize the ensemble number J and the amplitude of the added white noise, and set j = 1.
(2): Perform the jth trial on the white-noise-added signal. A white noise series with the given amplitude is added to the investigated signal:

$$x'_{j}(t) = x'(t) + n_{j}(t)$$

(3)

where n_j represents the jth added white noise series, x'(t) is the de-noised signal and x'_j denotes the noise-added signal of the jth trial.
(3): Using the EMD method, the noise-added signal x'_j is decomposed into I IMFs $c_{i,j}(t)$, i = 1, 2, …, I, where $c_{i,j}$ represents the ith IMF of the jth trial and I is the number of IMFs.
(4): If j < J, let j = j + 1 and repeat Steps (2) and (3) with a different white noise series each time until j = J.
(5): Calculate the ensemble mean $\bar{c}_{i}$ of the J trials for each IMF:

$$\bar{c}_{i} = \frac{1}{J} \sum_{j=1}^{J} c_{i,j}, \quad i = 1, 2, \dots, I$$

(4)
(6): Report the means $\bar{c}_{i}$ of the I IMFs as the final IMFs.
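Steps (1) to (6) can be sketched in NumPy as below. This is a simplified illustration, not the implementation used in the study: the inner EMD uses linear-interpolation envelopes held constant at the signal ends (standard EMD uses cubic splines with end-effect handling), a fixed number of sifting iterations stands in for a stopping criterion, and every trial is forced to return the same number I of IMFs so that the ensemble mean of Equation (4) is well defined. Expressing the noise amplitude as a fraction (0.2, inside the 0.1 to 0.4 range above) of the signal's standard deviation is an assumption, and all function names are hypothetical.

```python
import numpy as np


def _envelope_mean(h):
    """Mean of upper/lower envelopes; None if h has too few extrema to sift."""
    dh = np.diff(h)
    maxima = np.where((dh[:-1] > 0) & (dh[1:] <= 0))[0] + 1
    minima = np.where((dh[:-1] < 0) & (dh[1:] >= 0))[0] + 1
    if maxima.size < 2 or minima.size < 2:
        return None
    n = np.arange(h.size)
    last = h.size - 1
    # Simplification: linear-interpolation envelopes, held constant at both ends.
    upper = np.interp(n, np.r_[0, maxima, last], np.r_[h[maxima[0]], h[maxima], h[maxima[-1]]])
    lower = np.interp(n, np.r_[0, minima, last], np.r_[h[minima[0]], h[minima], h[minima[-1]]])
    return (upper + lower) / 2.0


def emd(x, n_imfs, n_sift=10):
    """Simplified EMD returning exactly n_imfs IMFs (zero rows if it stops early)."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, residue.size))
    for i in range(n_imfs):
        if _envelope_mean(residue) is None:   # residue has too few extrema: stop
            break
        h = residue.copy()
        for _ in range(n_sift):               # fixed number of sifting passes (assumed)
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs[i] = h
        residue = residue - h
    return imfs, residue


def eemd(x, n_imfs, n_trials=50, noise_std=0.2, seed=0):
    """Average the IMFs of n_trials white-noise-added copies of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    total = np.zeros((n_imfs, x.size))
    for _ in range(n_trials):                                            # Steps (1), (4)
        noisy = x + noise_std * x.std() * rng.standard_normal(x.size)    # Step (2), Eq. (3)
        imfs, _ = emd(noisy, n_imfs)                                     # Step (3)
        total += imfs
    return total / n_trials                                              # Steps (5)-(6), Eq. (4)
```

By construction, the IMFs returned by `emd` plus the final residue add back up to the input exactly, which is a useful sanity check when replacing the envelope or stopping rules with the standard ones.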

Applying the Hilbert transform to each IMF and calculating the instantaneous frequency ω_j(t) and amplitude A_j(t), the Hilbert spectrum of x'(t), H(ω, t), is then calculated by the following equation:

$$H(\omega, t) = \mathrm{Re} \sum_{j=1}^{I} A_{j}(t) \exp\left(i \int \omega_{j}(t)\, dt\right)$$

(5)

Accordingly, the marginal spectrum of the Hilbert-Huang transform, h(ω), is defined as the integral of the Hilbert spectrum with respect to time t:

$$h(\omega) = \int_{0}^{l} H(\omega, t)\, dt$$

(6)

where h(ω) reflects how the amplitude varies with frequency over the entire frequency range, and l is the length of the signal x'(t). The instantaneous frequency of each IMF, obtained from the Hilbert transform, is well localized in the time-frequency domain and reveals important characteristics of the signal.
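Equations (5) and (6) can be approximated discretely by computing each IMF's analytic signal, taking the instantaneous amplitude $A_j(t)$ from its magnitude and the instantaneous frequency $\omega_j(t)$ (here in Hz) from the derivative of its unwrapped phase, and accumulating $A_j(t)\,\Delta t$ into frequency bins. The NumPy sketch below does this; the FFT-based analytic signal mirrors the construction used by `scipy.signal.hilbert`, while the bin count and function names are assumptions. The peak of the resulting marginal spectrum and its frequency correspond to the two HHT-based features mentioned in the abstract.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal x + i*H{x} (same construction as scipy.signal.hilbert)."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


def marginal_spectrum(imfs, fs, n_bins=256):
    """Discrete version of Eq. (6): add A_j(t) * dt to the bin of omega_j(t)."""
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    h = np.zeros(n_bins)
    for c in imfs:
        z = analytic_signal(np.asarray(c, dtype=float))
        amplitude = np.abs(z)[:-1]                                     # A_j(t)
        freq = np.diff(np.unwrap(np.angle(z))) * fs / (2.0 * np.pi)    # omega_j(t) in Hz
        bins = np.clip(np.digitize(freq, edges) - 1, 0, n_bins - 1)
        np.add.at(h, bins, amplitude / fs)                             # integrate over time
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, h


fs = 1000
t = np.arange(fs) / fs
centers, h = marginal_spectrum([np.sin(2 * np.pi * 50 * t)], fs)
peak_value, peak_freq = h.max(), centers[np.argmax(h)]   # the two spectrum features
```

For the 50 Hz test tone, the peak lands in the bin centred near 49.8 Hz; a finer frequency grid (larger `n_bins`) trades resolution against noisier bins.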

2.2. Probabilistic Committee Machine

PCM is a group decision method that combines the results of individual classifiers and outperforms any individual classifier acting alone. As mentioned previously, RVM is selected to construct the probabilistic fault classifier. To solve the multi-label classification problem effectively, RVM is combined with a pairwise coupling strategy, and the resulting classifier is named PCRVM. Moreover, a new ensemble method is proposed to combine the outputs of the committee members. In the proposed ensemble method, the committee members should be assigned suitable weights, since every member/classifier in the group usually has its own strengths. The details of the PCRVM algorithm and the ensemble method are described in the following sections.

2.2.1. Relevance Vector Machine

RVM is a statistical learning method that uses a Bayesian learning framework and popular kernels. In this research, RVM is used to predict, from experimental data, the posterior probability of each fault t_n for unseen symptoms f. Consider a set of training data (f, t) = {f_n, t_n}, n = 1 to N, where $t_n \in \{0, 1\}$ and N is the number of training data. Following statistical convention, the linear model is generalized by applying the logistic sigmoid function $\sigma(y(f)) = 1 / (1 + \exp(-y(f)))$ to the predicted decision y(f) and adopting the Bernoulli distribution for P(t|F); the likelihood of the data is then written as:

$$P(t \mid F, \theta) = \prod_{n=1}^{N} \sigma\{y(f_{n}; \theta)\}^{t_{n}} \left[1 - \sigma\{y(f_{n}; \theta)\}\right]^{1 - t_{n}}, \quad \text{where } y(f; \theta) = \sum_{i=1}^{N} \theta_{i} K(f, f_{i}) + \theta_{0}$$

(7)

where $\theta = (\theta_{0}, \theta_{1}, \dots, \theta_{N})^{T}$ is a weight vector and K is a kernel function. In the open literature, three kinds of kernel functions are available: radial basis function (RBF), polynomial, and Gaussian kernels. Among them, the Gaussian kernel is the most popular kernel function in RVM for classification in industrial applications [34].

The optimal weight vector $\theta^{*}$ for the given dataset is the one that maximizes the posterior probability $P(\theta \mid t, F, \alpha) \propto P(t \mid F, \theta)\, P(\theta \mid \alpha)$, with α = [α₀, α₁, …, α_N] a vector of N + 1 hyperparameters. However, the weights cannot be determined analytically. Thus, the following approximation procedure, based on Laplace's method, is adopted:

(1): For the current fixed values of α, the most probable weights $\theta_{MP}$ are found. Since $P(\theta \mid t, F, \alpha) \propto P(t \mid F, \theta)\, P(\theta \mid \alpha)$, this step is equivalent to the following maximization:

$$\begin{aligned} \theta_{MP} &= \arg\max_{\theta} \log\{P(t \mid F, \theta)\, P(\theta \mid \alpha)\} \\ &= \arg\max_{\theta} \left\{\sum_{n=1}^{N} \left[t_{n} \log d_{n} + (1 - t_{n}) \log(1 - d_{n})\right] - \frac{1}{2} \theta^{T} A \theta\right\} \end{aligned}$$

(8)

where $d_{n} = \sigma\{y(f_{n}; \theta)\}$ and $A = \mathrm{diag}(\alpha_{0}, \alpha_{1}, \dots, \alpha_{N})$.
(2): Laplace's method is simply a Gaussian approximation to the posterior around the mode of the weights $\theta_{MP}$. Differentiating Equation (8) twice gives:

$$\nabla_{\theta} \nabla_{\theta} \log P(\theta \mid t, F, \alpha) \big|_{\theta_{MP}} = -(\Phi^{T} B \Phi + A)$$

(9)

where $B = \mathrm{diag}(\beta_{1}, \beta_{2}, \dots, \beta_{N})$ is a diagonal matrix with $\beta_{n} = \sigma\{y(f_{n}; \theta)\}[1 - \sigma\{y(f_{n}; \theta)\}]$, and Φ is the N × (N + 1) design matrix with $\Phi_{n0} = 1$ and $\Phi_{nm} = K(f_{n}, f_{m})$ for n = 1 to N and m = 1 to N. Negating and inverting Equation (9) gives the covariance matrix $\Sigma = (\Phi^{T} B \Phi + A)^{-1}$.
(3): The hyperparameter vector α is updated using an iterative re-estimation equation. First, α_i is initialized by a random guess; then $\gamma_{i} = 1 - \alpha_{i} \Sigma_{ii}$ is calculated, where $\Sigma_{ii}$ is the ith diagonal element of the covariance matrix Σ. Then, α_i is re-estimated as follows:

$$\alpha_{i}^{new} = \frac{\gamma_{i}}{u_{i}^{2}}$$

(10)

where $u = \theta_{MP} = \Sigma \Phi^{T} B t$. Next, $\alpha_{i} \leftarrow \alpha_{i}^{new}$ is set, and $\theta_{MP}$, Σ, $\gamma_{i}$ and $\alpha_{i}^{new}$ are re-estimated until convergence. Finally, $\theta = \theta_{MP}$ is set, so that the classification model $y(f; \theta) = \sum_{i=1}^{N} \theta_{i} K(f, f_{i}) + \theta_{0}$ is obtained.
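The three-step procedure can be sketched for a single binary RVM as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the Gaussian kernel width is fixed at 1, α is initialized to ones rather than guessed randomly, step (1) uses Newton's method on Equation (8), a fixed number of outer iterations replaces a convergence test, $u$ in Equation (10) is taken as $\theta_{MP}$, and α is capped at $10^9$, after which the corresponding weight is treated as pruned. Function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(F1, F2, width=1.0):
    """K(f, f') = exp(-||f - f'||^2 / (2 width^2)); the width is an assumed value."""
    sq_dist = ((F1[:, None, :] - F2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * width ** 2))


def sigmoid(y):
    return 0.5 * (1.0 + np.tanh(0.5 * y))        # equals 1 / (1 + exp(-y)), no overflow


def design_matrix(F, F_train, width):
    """Rows [1, K(f, f_1), ..., K(f, f_N)]; column 0 multiplies theta_0."""
    return np.hstack([np.ones((len(F), 1)), gaussian_kernel(F, F_train, width)])


def rvm_fit(F, t, width=1.0, n_outer=30, n_newton=20, alpha_max=1e9):
    Phi = design_matrix(F, F, width)             # N x (N + 1)
    alpha = np.ones(Phi.shape[1])                # initial hyperparameters (assumed)
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_outer):
        # Step (1): Newton iterations maximizing Eq. (8) for fixed alpha.
        for _ in range(n_newton):
            d = sigmoid(Phi @ theta)
            grad = Phi.T @ (t - d) - alpha * theta
            hess = Phi.T @ (Phi * (d * (1 - d))[:, None]) + np.diag(alpha)
            theta = theta + np.linalg.solve(hess, grad)
        # Step (2): Laplace approximation; covariance from Eq. (9).
        d = sigmoid(Phi @ theta)
        Sigma = np.linalg.inv(Phi.T @ (Phi * (d * (1 - d))[:, None]) + np.diag(alpha))
        # Step (3): re-estimate alpha with Eq. (10), keeping capped (pruned) weights pruned.
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        new_alpha = np.minimum(gamma / np.maximum(theta ** 2, 1e-300), alpha_max)
        alpha = np.where(alpha >= alpha_max, alpha_max, new_alpha)
    return theta


def rvm_predict_proba(F_new, F_train, theta, width=1.0):
    return sigmoid(design_matrix(F_new, F_train, width) @ theta)


rng = np.random.default_rng(1)
F = np.vstack([rng.normal(-1.0, 1.0, (40, 2)), rng.normal(1.0, 1.0, (40, 2))])
t = np.r_[np.zeros(40), np.ones(40)]
theta = rvm_fit(F, t)
p = rvm_predict_proba(F, F, theta)               # posterior probability of label 1
```

In a PCRVM, one such binary model would be trained for every pair of fault labels, using only the training vectors that carry one of the two labels.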

2.2.2. Pairwise-Coupled Relevance Vector Machine as Committee Member

Many traditional machine learning methods are designed only for binary classification, in which the output is either positive (+1) or negative (−1). However, most practical problems are multi-class problems that also require probabilistic output. One-versus-all is usually employed to deal with multi-class problems. The one-versus-all strategy constructs a group of classifiers l_class = [C₁, C₂, …, C_d] for a d-label classification problem. It is simple and easy to implement; however, it generally gives poor results [29,35], since it does not consider the pairwise correlations, which causes a much larger indecisive region than the pairwise coupling strategy (using one-versus-one), as shown in Figure 2. The pairwise coupling strategy also constructs a group of classifiers l_class = [C₁, C₂, …, C_d] for a d-label classification problem. However, each C_i = [C_i₁, C_i₂, …, C_id] is composed of a set of d − 1 different pairwise classifiers C_ij, i ≠ j. Since C_ij and C_ji are complementary, there are d(d − 1)/2 classifiers in total in l_class, as shown in Figure 3. To solve the multi-class and probabilistic output problems, a pairwise coupling strategy is adopted for the RVM and PNN classifiers. The strategy combines the outputs of every pair of classes to re-estimate the overall probability for a new instance.

Figure 2. Indecisive regions (shaded regions) using one-vs-all (left) and pairwise coupling (right).

Figure 3. Pairwise coupling strategy of probabilistic classification.

Several methods are available for the pairwise coupling strategy [29]; however, they are unsuitable for simultaneous-fault diagnosis because of the constraint $\sum_{i} \rho_{i} = 1$, where $\rho_{i}$ is the probability of the ith label. By the nature of simultaneous-fault diagnosis, $\sum_{i} \rho_{i}$ need not equal 1. Therefore, the following simple pairwise coupling strategy for simultaneous-fault diagnosis is proposed. Every $\rho_{i}$ is calculated as:

$$\rho_{i} = C_{i}(x) = \frac{\sum_{j=1, j \neq i}^{d} n_{ij} C_{ij}(x)}{\sum_{j=1, j \neq i}^{d} n_{ij}} = \frac{\sum_{j=1, j \neq i}^{d} n_{ij} \rho_{ij}}{\sum_{j=1, j \neq i}^{d} n_{ij}}$$

(11)

where n_ij is the number of training feature vectors with either the ith or jth label. Hence, the probability can be accurately estimated from $\rho_{ij} = C_{ij}(x)$, because the pairwise correlation between the labels is taken into account. With this pairwise coupling strategy, the proposed probabilistic committee member, PCRVM, can estimate the probability vector ρ with a high level of accuracy.
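Equation (11) can be sketched in plain Python as below, assuming the d(d − 1)/2 pairwise outputs are stored once per unordered pair and the complementary output is recovered as $\rho_{ji} = 1 - \rho_{ij}$. Labels are 0-based in the code; the pairwise probabilities and the counts $n_{ij}$ are made-up numbers for illustration, and the function name is hypothetical.

```python
def couple_pairwise(pair_prob, n_pair, d):
    """Eq. (11): rho_i = sum_{j != i} n_ij * rho_ij / sum_{j != i} n_ij.

    pair_prob[(i, j)] with i < j holds rho_ij = C_ij(x); the complementary classifier
    gives rho_ji = 1 - rho_ij. n_pair[(i, j)] counts training vectors labelled i or j.
    """
    rho = []
    for i in range(d):
        num = den = 0.0
        for j in range(d):
            if j == i:
                continue
            key = (min(i, j), max(i, j))
            rho_ij = pair_prob[key] if i < j else 1.0 - pair_prob[key]
            num += n_pair[key] * rho_ij
            den += n_pair[key]
        rho.append(num / den)
    return rho


# Hypothetical outputs of the d(d - 1)/2 = 3 pairwise classifiers for one input x.
pair_prob = {(0, 1): 0.9, (0, 2): 0.8, (1, 2): 0.4}
n_pair = {(0, 1): 20, (0, 2): 20, (1, 2): 20}
print(couple_pairwise(pair_prob, n_pair, 3))   # approx. [0.85, 0.25, 0.4]
```

The coupled probabilities in the example sum to 1.5 rather than 1, which is exactly the freedom that simultaneous-fault diagnosis requires.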

After the pairwise coupling strategy is designed for each probabilistic classifier, a new ensemble method is proposed to combine the results of the committee members with optimal weights.

2.2.3. Ensemble Method

One of the most frequently used ensemble methods is weighted averaging, in which every committee member has an appropriate weight related to its ability. However, the weighted averaging method cannot give a fair result when committee members have unbalanced sensitivities to faults. For example, when committee member 1 is not trained with a dataset containing fault d₅, fault d₅ usually cannot be predicted by committee member 1, as demonstrated in Table 2. Nevertheless, the weighted averaging method still uses this unpredictable output to calculate the overall average, resulting in an unfair or unpredictable result.

To overcome the above problem, a novel ensemble method with optimal weights and predefined null outputs is proposed, given by Equation (12). In Equation (12), $\rho_{j\text{-}i}$ is set to zero when the jth classifier cannot make a diagnosis for the ith fault label (i.e., the jth classifier is not trained with the ith single fault). In this way, the proposed method overcomes the problem of the traditional weighted averaging method, which is one of the main contributions of this research. The probability of the ith fault is expressed as:

$$P_{i} = \frac{\sum_{j=1}^{k} w_{j\text{-}opt}\, \rho_{j\text{-}i}}{\sum_{j=1}^{k} f(w_{j\text{-}opt})}, \quad i = 1, 2, \dots, d, \quad j = 1, 2, \dots, k, \qquad \text{subject to } f(w_{j\text{-}opt}) = \begin{cases} w_{j\text{-}opt}, & \text{if } \rho_{j\text{-}i} \neq 0 \\ 0, & \text{if } \rho_{j\text{-}i} = 0 \end{cases}$$

(12)

where $w_{j\text{-}opt} \in [0, 1]$ is the optimal weight for the jth committee member, j = 1 to k, k is the number of committee members, and the weights need not sum to 1. $\rho_{j\text{-}i} \in [0, 1]$ is the probability estimated by the jth classifier for the ith single fault, i = 1 to d, where d is the total number of detectable single faults. Finally, the probabilistic outputs of the classifiers are combined with the optimal weights to generate the probability vector P = [P₁, P₂, ..., P_d].
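Equation (12) can be sketched as follows, using the paper's convention that an output of exactly zero marks a member that was not trained for that fault. The function name and the numbers are hypothetical; they mirror the unbalanced case of Table 2, where committee member 1 cannot diagnose one of the faults.

```python
def committee_probability(weights, member_probs):
    """Eq. (12): combine k members' outputs for d faults with null-output handling.

    weights[j] is w_j-opt and member_probs[j][i] is rho_{j-i}. An output of exactly 0
    marks a member not trained for fault i, so f(w_j-opt) = 0 drops its weight from
    the denominator.
    """
    d = len(member_probs[0])
    P = []
    for i in range(d):
        num = sum(w * probs[i] for w, probs in zip(weights, member_probs))
        den = sum(w for w, probs in zip(weights, member_probs) if probs[i] != 0)
        P.append(num / den if den > 0 else 0.0)
    return P


weights = [0.6, 0.9]                  # hypothetical optimal weights of members 1 and 2
member_probs = [[0.7, 0.0],           # member 1: not trained for the second fault
                [0.5, 0.8]]           # member 2: trained for both faults
print(committee_probability(weights, member_probs))   # approx. [0.58, 0.8]
```

For the second fault, plain weighted averaging would give 0.9 · 0.8 / (0.6 + 0.9) = 0.48, dragged down by the untrained member; removing that member's weight from the denominator yields 0.8. A caveat of the zero convention is that a trained member legitimately outputting exactly 0 is also excluded; a separate marker for untrained members would avoid this.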

**Table 2.** Issue of weighted averaging method for balanced and unbalanced committee member sensitivities to gearbox faults.

| Balanced member sensitivities to gearbox faults | Committee member 1 | Committee member 2 | Average output probability (P₃) for d₃ |
|---|---|---|---|
| Fault d₃ | Trained | Trained | $P_{3} = \frac{w_{1} \rho_{1-3} + w_{2} \rho_{2-3}}{w_{1} + w_{2}} \in [0, 1]$; P₃ is a reasonable result |
| Output probability for d₃ for an unseen case | $\rho_{1-3} \in [0, 1]$ | $\rho_{2-3} \in [0, 1]$ | |
| **Unbalanced member sensitivities to gearbox faults** | **Committee member 1** | **Committee member 2** | **Average output probability (P₅) for d₅** |
| Fault d₅ | Unable to train | Trained | $P_{5} = \frac{w_{1} \rho_{1-5} + w_{2} \rho_{2-5}}{w_{1} + w_{2}}$; P₅ is an unfair/unpredictable result |
| Output probability for d₅ for an unseen case | $\rho_{1-5}$ is unpredictable | $\rho_{2-5} \in [0, 1]$ | |

Remark: w₁ and w₂ are weights for Committee members 1 and 2 respectively; P₃ and P₅ are average output probabilities for d₃ and d₅ respectively.

In this application, the processed training datasets x_k-PTra (k = 1, 2) are employed to train the probabilistic classifiers (PCRVMs) of the two committee members, respectively. The workflow of the PCM is shown in Figure 4.

Figure 4. Procedure for training probabilistic committee machine.

2.3. Parameter Optimization

The probability vector P = [P₁, P₂, …, P_d] can be provided to the user as a quantitative measure for reference and further processing. However, human experts generally cannot identify the number of simultaneous-faults directly based on the output probability of each fault. Therefore, a decision threshold (DT) ε is introduced to identify the simultaneous-faults from P such that:

$$y_{i} = \begin{cases} 1, & \text{if } P_{i} \geq \varepsilon \\ 0, & \text{otherwise} \end{cases}$$

(13)

where ε ∈ [0, 1] and y_i = 1 denotes that the corresponding fault occurs. For example, given an unseen input x, if P = [0.72, 0.42, 0.51, 0.81, 0.39] and ε = 0.5, then y = DT(P) = [1, 0, 1, 1, 0]. Therefore, the unseen x is diagnosed as a simultaneous fault with labels (1, 3, 4).
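Equation (13) and the worked example above translate directly into code; the sketch below reproduces the example, with a hypothetical function name.

```python
def decision_threshold(P, eps):
    """Eq. (13): y_i = 1 if P_i >= eps, else 0."""
    return [1 if p >= eps else 0 for p in P]


P = [0.72, 0.42, 0.51, 0.81, 0.39]
y = decision_threshold(P, 0.5)
faults = [i + 1 for i, yi in enumerate(y) if yi]   # 1-based fault labels
print(y, faults)                                   # [1, 0, 1, 1, 0] [1, 3, 4]
```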

Clearly, the weights and the decision threshold are the major factors affecting the classification accuracy. The literature [30,31] shows that PSO is as effective as genetic algorithms, a typical optimization method, at finding the global optimum, but with better computational efficiency. Hence, PSO is adopted to determine the best weights w_opt and decision threshold ε_opt in this study.

Particle Swarm Optimization

PSO is a population-based optimizer. The population is regarded as a swarm and the individuals as particles. For a z-dimensional search space and a swarm of H particles, the ith particle can be represented by a z-dimensional position vector $u_{i} = (u_{i1}, u_{i2}, \dots, u_{iz})$, its velocity by a z-dimensional vector $v_{i} = (v_{i1}, v_{i2}, \dots, v_{iz})$, and the best previous position encountered by this particle by $p_{i} = (p_{i1}, p_{i2}, \dots, p_{iz})$. Let g be the index of the particle that attains the best previous position among all particles in the swarm. The swarm is then updated according to the following equations:

v_{i} (j + 1) = W_{f} v_{i} (j) + q_{1} r_{1} [p_{i} (j) - u_{i} (j)] + q_{2} r_{2} [p_{g} (j) - u_{i} (j)]

(14)

u_{i} (j + 1) = u_{i} (j) + v_{i} (j + 1)

(15)

where i = 1, 2, …, H is the particle index, j is the iteration (generation) index, W_f is the weight factor, q₁ and q₂ are positive constants, and r₁ and r₂ are random numbers drawn from [0, 1]. The selection of these parameters was discussed in [36]; with reference to that work, Table 3 lists the PSO parameters selected for this case study.

Table 3. PSO parameters.

Parameter	Value
Number of generations	1000
Population size	50
W_f	0.9
q₁	2
q₂	2

To evaluate the fitness in each iteration, the F-measure [37] and the objective function described in Section 2.4 are employed. The proposed PSO procedure, illustrated in Figure 5, consists of three steps:

(1): Initializing the parameters of PSO: the candidate weights (w₁, w₂) and decision threshold are randomly initialized in the interval [0, 1].
(2): Calculating the F-measure: following the procedure in Figure 5, the candidate weights and decision threshold are entered into the PCM model and Equation (13), respectively, and the F-measure of the resulting decisions is computed with Equation (16).
(3): Comparing the F-measure with the objective function: if the F-measure satisfies the objective function, the corresponding weights and decision threshold are taken as the optimal parameters; otherwise, PSO updates the weights and decision threshold according to Equations (14) and (15) and repeats Steps 2 and 3. Once the preset number of generations is reached or the objective function is satisfied, the weights and decision threshold that yield the highest F-measure are taken as the optimal parameters.
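The three steps above, together with Equations (14) and (15) and the Table 3 parameters (1000 generations, population 50, W_f = 0.9, q₁ = q₂ = 2), can be sketched as a basic PSO loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: each particle is a candidate (w₁, w₂, ε) in [0, 1]³; the names `pso_maximize` and `toy_fitness` are hypothetical; clipping positions to [0, 1] and clamping velocities to ±`v_max` are assumptions not stated in the text; and in the actual framework `fitness` would evaluate the PCM output thresholded by Equation (13) with the F-measure of Equation (16), for which a toy function stands in here.

```python
import random


def pso_maximize(fitness, dim=3, target=0.9, n_generations=1000, pop_size=50,
                 w_f=0.9, q1=2.0, q2=2.0, v_max=0.5, seed=0):
    """Basic PSO over [0, 1]^dim; stops early once the best fitness reaches target."""
    rng = random.Random(seed)
    # Step 1: random initial positions in [0, 1], zero initial velocities.
    u = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    v = [[0.0] * dim for _ in range(pop_size)]
    p = [ui[:] for ui in u]                # personal best positions p_i
    p_fit = [fitness(ui) for ui in u]      # fitness of each personal best
    g = max(range(pop_size), key=lambda k: p_fit[k])  # index of the swarm best

    for _ in range(n_generations):
        if p_fit[g] >= target:             # objective F_me >= B satisfied
            break
        for i in range(pop_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (14): velocity update, clamped to [-v_max, v_max].
                vel = (w_f * v[i][d]
                       + q1 * r1 * (p[i][d] - u[i][d])
                       + q2 * r2 * (p[g][d] - u[i][d]))
                v[i][d] = max(-v_max, min(v_max, vel))
                # Equation (15): position update, kept inside [0, 1].
                u[i][d] = max(0.0, min(1.0, u[i][d] + v[i][d]))
            # Step 2: evaluate the candidate and update its personal best.
            f = fitness(u[i])
            if f > p_fit[i]:
                p[i], p_fit[i] = u[i][:], f
        # Step 3: refresh the swarm-best index g before the next generation.
        g = max(range(pop_size), key=lambda k: p_fit[k])

    # Best (w1, w2, eps) found and its fitness, whether or not target was met.
    return p[g][:], p_fit[g]


def toy_fitness(x):
    """Hypothetical stand-in for F_me; peaks at (0.7, 0.3, 0.6) with value 1."""
    optimum = (0.7, 0.3, 0.6)
    return 1.0 - sum((xi - oi) ** 2 for xi, oi in zip(x, optimum))


best, best_fit = pso_maximize(toy_fitness, dim=3, target=0.99, seed=1)
print(best, best_fit)
```

Returning the best personal position in both cases mirrors Step 3: the loop ends either when the objective is met or when the generation budget runs out, and the highest-fitness candidate is kept.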

Figure 5. Procedure for optimization of committee member weights and decision threshold.

2.4. Performance Evaluation

The traditional performance evaluation of classifiers considers only exact matches between the decision vector y and the true vector t. Such an evaluation is, however, unsuitable for simultaneous-fault diagnosis, where partial matching is preferred. The F-measure is widely used to evaluate information retrieval systems, in which a document may carry a single tag or multiple tags simultaneously, a setting very similar to the current study. With the F-measure, both single-fault and simultaneous-fault test cases can be evaluated fairly. The F-measure is defined in Equation (16); the larger its value, the higher the diagnostic accuracy:

F_{m e} = \frac{2 \sum_{j = 1}^{d} \sum_{i = 1}^{N_{t}} y_{i j} t_{i j}}{\sum_{j = 1}^{d} \sum_{i = 1}^{N_{t}} y_{i j} + \sum_{j = 1}^{d} \sum_{i = 1}^{N_{t}} t_{i j}} \in [0, 1]

(16)

where y_i = [y_i₁, y_i₂, …, y_id] and t_i = [t_i₁, t_i₂, …, t_id] are the predicted and true decision vectors of the ith test pattern, respectively, with j = 1, …, d, i = 1, …, N_t, and y_ij, t_ij ∈ {0, 1}; N_t is the total number of single-fault and simultaneous-fault test patterns. F_me also serves as the key quantity in the objective function used to optimize the weights and decision threshold. To avoid over-fitting to the validation dataset while still achieving high diagnostic accuracy, the objective function is defined as:

F_{m e} \geq B

(17)

where B ∈ [0, 1] is the preset target value of the F-measure. In this study, B is set to 0.9 as a trial value. Figure 6 summarizes the evaluation process of the proposed diagnostic framework.
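Equations (16) and (17) translate directly into code. The sketch below assumes the predicted and true decision vectors are stored as equal-length lists of 0/1 values, one per test pattern; the function names and sample vectors are hypothetical, and the value returned when no fault is predicted or present is a convention chosen here, since Equation (16) is undefined in that case.

```python
def f_measure(Y, T):
    """Equation (16): F-measure pooled over all N_t patterns and d faults.

    Y and T are lists of binary decision vectors (predicted and true),
    one vector per test pattern.
    """
    if len(Y) != len(T) or any(len(y) != len(t) for y, t in zip(Y, T)):
        raise ValueError("Y and T must have the same shape")
    hits = sum(yij * tij for y, t in zip(Y, T) for yij, tij in zip(y, t))
    n_pred = sum(sum(y) for y in Y)
    n_true = sum(sum(t) for t in T)
    if n_pred + n_true == 0:
        return 1.0  # convention chosen here: nothing predicted, nothing present
    return 2.0 * hits / (n_pred + n_true)


def meets_objective(f_me, B=0.9):
    """Equation (17): the objective is satisfied when F_me >= B."""
    return f_me >= B


Y = [[1, 0, 1], [0, 1, 0]]  # predicted decision vectors (hypothetical)
T = [[1, 0, 0], [0, 1, 1]]  # true decision vectors (hypothetical)
f = f_measure(Y, T)
print(round(f, 4), meets_objective(f))  # 0.6667 False
```

Because the counts are pooled over all patterns and faults, a partially correct simultaneous-fault prediction still earns partial credit: here each pattern gets one of its two labels right, giving 2·2/(3 + 3) ≈ 0.667 rather than zero.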

Figure 6. Evaluation of proposed framework.

3. Experimental Setup and Data Preprocessing

To verify the effectiveness of the proposed framework, experiments were carried out; the details of the experimental setup are presented in the following subsections. All the proposed methods were implemented in MATLAB R2008a and executed on a computer with a Core 2 Duo E6750 @ 2.13 GHz and 4 GB RAM.

3.1. Test Rig and Sample Data Acquisition

The experiments were performed on the test rig shown in Figure 7, which can simulate most gearbox faults. In this study, common gearbox faults, including gear faults, bearing faults, and structural faults, are introduced. The gear faults comprise a broken tooth (whole tooth damaged), a chipped tooth (1/4 of the tooth damaged), and a gear crack (a 5 mm crack on the tooth face), whereas the bearing faults comprise medium wear on the rolling elements and on the outer race. The structural faults are unbalance, looseness, and misalignment, which are simulated, respectively, by adding an eccentric mass to the output shaft, loosening some of the gearbox screws, and adjusting the height of one side of the gearbox with shims. In the test rig, a signal acquisition module (NI 9234) with accelerometers and a microphone acquires the vibration and sound signals, respectively; the accelerometer records vibration along the vertical direction.

In total, 12 cases, eight single faults and four simultaneous faults (described in Table 4), are simulated on the test rig to generate the sample training and test datasets. From practical experience, a machine cannot be operated with too many faults present at the same time, so the simultaneous-fault combinations used in this case study were selected experimentally. Table 5 gives the relationship between the simulated faults and the signal types and shows that each type of signal can detect only a limited number of faults. For example, previous experiments have found that the vertical vibration signal cannot be used to detect d₄ and d₅ because the loading on the tapered roller bearing along the vertical direction is insignificant. Moreover, the sound signal is relatively unaffected by structural resonance [38], so the structural faults (d₁, d₂ and d₃) cannot easily be detected using the sound signal. To extend the number of detectable faults and enhance the reliability of the diagnostic system, the vibration and sound signals are therefore employed simultaneously to diagnose the simultaneous faults in the gearbox.

Figure 7. Collection of fault patterns from rotating machinery.

Table 4. Description of single-faults and simultaneous-faults.

Case No.	Single-Faults	Case No.	Simultaneous-Faults
d₁	Unbalance	si₉	Broken gear tooth & Chipped tooth
d₂	Looseness	si₁₀	Chipped tooth & Bearing with worn outer race
d₃	Mechanical misalignment	si₁₁	Broken gear tooth & Bearing with worn rolling elements
d₄	Bearing with worn rolling elements	si₁₂	Bearing with worn rolling elements & Bearing with worn outer race
d₅	Bearing with worn outer race
d₆	Broken gear tooth
d₇	Gear crack
d₈	Chipped tooth

Table 5. Relationship of single-faults and signal types.

Signal Type	d₁	d₂	d₃	d₄	d₅	d₆	d₇	d₈
Vertical vibration	√	√	√	-	-	√	√	√
Sound	-	-	-	√	√	√	√	√

To construct and test the proposed diagnostic framework, 200 samples were acquired for each single fault and each simultaneous fault under two testing conditions (800 rpm and 1500 rpm). In each trial, 1 s of raw vibration and sound signals was recorded simultaneously at a sampling rate of 25.6 kHz, so each sample of each signal type contains 25,600 data points. For each type of signal x_k (k = 1, 2), there are 1600 single-fault samples (eight single faults × 200 samples) and 800 simultaneous-fault samples (four simultaneous faults × 200 samples). To evaluate the diagnostic performance for both single and simultaneous faults, the sample data are divided into the subsets shown in Table 6.

Table 6. Division of sample dataset into different subsets.

Data	Type of Dataset	Single-Faults (1600)	Simultaneous-Faults (800)
Raw sample data (x_k)	Validation dataset	D_k-Val (800)	D_k-Val (600)
	Training dataset	D_k-Tra (600)	-
	Test dataset	D_k-Tes (200)	D_k-Tes (200)
After feature extraction	Validation dataset	D_k-PVal (800)	D_k-PVal (600)
	Training dataset	D_k-PTra (600)	-
	Test dataset	D_k-PTes (200)	D_k-PTes (200)

3.2. Data Processing and Signal De-Noising in Case Study

In order to obtain the feature vector, the IMF energy pattern based on HHT is calculated with the following steps: (1) signal de-noising; (2) IMF component selection; and (3) IMF energy pattern calculation.

(1) Signal de-noising. In the signal de-noising phase, the mother wavelet and the decomposition level L are selected by trial and error. In this case study, four Daubechies wavelets (Db3, Db4, Db5, and Db6) are tried, and L ranges from 3 to 5. The soft threshold T is set to 4.476 according to the equation T = \sqrt{2 \log(\mathrm{length}(x(t)))}. The effectiveness of de-noising with the Db wavelets is verified by the signal-to-noise ratio (SNR), which is given as follows:

SNR = 10 \times \log_{10} (\frac{S_{σ}}{N_{σ}})

(18)

where S_σ and N_σ are the standard deviations of the de-noised signal and the noise signal, respectively. A larger SNR means that more noise is eliminated. Taking the sound signal of d₆ as an example, the de-noising results are shown in Table 7. Db5 with decomposition level 3 gives the highest SNR, so this combination is selected to de-noise the signals.

Table 7. Signal to noise ratio under different combinations of Db wavelets.

SNR (dB)	Level 3	Level 4	Level 5
Db3	12.689	11.041	10.191
Db4	12.690	11.090	10.207
Db5	12.847	11.126	10.271
Db6	12.720	11.118	10.272
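Soft thresholding and Equation (18) can be sketched as follows. The sketch operates on already-computed wavelet coefficients: the Db5, level-3 decomposition and reconstruction are assumed to come from a wavelet library and are omitted. Taking the noise signal as the difference between the raw and de-noised signals is an assumption, since the text does not define it. The function names and sample values are hypothetical; T = 4.476 is the value used above.

```python
import math
import statistics


def soft_threshold(coeffs, T):
    """Soft thresholding: shrink each wavelet coefficient toward zero by T."""
    return [math.copysign(max(abs(c) - T, 0.0), c) for c in coeffs]


def snr_db(original, denoised):
    """Equation (18) as written: 10 * log10(S_sigma / N_sigma).

    S_sigma is the standard deviation of the de-noised signal; the noise
    signal is taken as original - denoised (an assumption).
    """
    noise = [o - d for o, d in zip(original, denoised)]
    s_sigma = statistics.pstdev(denoised)
    n_sigma = statistics.pstdev(noise)
    return 10.0 * math.log10(s_sigma / n_sigma)


# Coefficients smaller than T in magnitude are zeroed; larger ones shrink by T.
print(soft_threshold([5.0, -1.0, -6.0], 4.476))
# A de-noised square wave with +/-0.1 residual noise: S_sigma / N_sigma = 10.
print(snr_db([1.1, -0.9, 0.9, -1.1], [1.0, -1.0, 1.0, -1.0]))
```

Population and sample standard deviations give the same ratio here because both signals have the same length, so the choice of `pstdev` does not affect the SNR.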

(2) IMF component selection. After de-noising, the IMFs of all de-noised signals are calculated by using EEMD, in which the ensemble number is set to 100 and the white-noise amplitude to 0.3 times the standard deviation of the investigated signal [33]. In this case study, EEMD decomposes the de-noised sound signal into ten IMFs and a residual signal. To select the proper number of IMFs, the correlation coefficient method [13] is used. The correlation coefficient between an IMF component I_i(t) and the de-noised signal x(t)' is defined as:

Coe_{x(t)', I_{i}(t)} = \frac{\sum_{t = 1}^{M} (x(t)' - \bar{x})(I_{i}(t) - \bar{I_{i}})}{\sqrt{\sum_{t = 1}^{M} (x(t)' - \bar{x})^{2}} \sqrt{\sum_{t = 1}^{M} (I_{i}(t) - \bar{I_{i}})^{2}}}

(19)

where \bar{x} and \bar{I_{i}} are the mean values of x(t)' and I_i(t), respectively, and M is the number of data points in each signal. A large Coe_{x(t)', I_i(t)} value means a high correlation between I_i(t) and x(t)', and also implies that I_i(t) contains more fault information. As a demonstration, the correlation coefficients of the IMFs of a de-noised sound signal of d₆ are presented in Table 8; the correlation coefficient of I₁₀ (0.0274) is clearly smaller than those of the other IMFs. Thus, only IMFs I₁ to I₉ are used to extract the energy pattern in this case study.

Table 8. Correlation coefficients of each IMF component for an example of de-noised signal of d₆.

IMF Component	I₁	I₂	I₃	I₄	I₅	I₆	I₇	I₈	I₉	I₁₀
Correlation coefficient	0.2054	0.2089	0.2132	0.2375	0.2489	0.3475	0.3134	0.2876	0.2273	0.0274
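Equation (19) and the IMF selection step can be sketched as below. Equation (19) is the Pearson correlation coefficient computed over the M samples of each signal. The text selects IMFs by inspection (I₁₀ is clearly smaller than the rest), so the cut-off rule in `select_imfs`, keeping IMFs whose coefficient is at least 30% of the largest one, is a hypothetical stand-in; with the Table 8 values it retains I₁ to I₉. EEMD itself is not shown, and all function names are hypothetical.

```python
import math


def corr_coeff(x, imf):
    """Equation (19): correlation between the de-noised signal x(t)' and one IMF I_i(t)."""
    M = len(x)
    if M != len(imf) or M < 2:
        raise ValueError("signals must have equal length >= 2")
    x_bar = sum(x) / M
    i_bar = sum(imf) / M
    num = sum((a - x_bar) * (b - i_bar) for a, b in zip(x, imf))
    den = (math.sqrt(sum((a - x_bar) ** 2 for a in x))
           * math.sqrt(sum((b - i_bar) ** 2 for b in imf)))
    return num / den


def select_imfs(coeffs, ratio=0.3):
    """Hypothetical rule: keep IMFs whose |coefficient| >= ratio * max |coefficient|.

    Returns 1-based IMF indices.
    """
    cutoff = ratio * max(abs(c) for c in coeffs)
    return [k + 1 for k, c in enumerate(coeffs) if abs(c) >= cutoff]


# Correlation coefficients of I1..I10 from Table 8.
table8 = [0.2054, 0.2089, 0.2132, 0.2375, 0.2489,
          0.3475, 0.3134, 0.2876, 0.2273, 0.0274]
print(select_imfs(table8))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```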

(3) IMF energy pattern calculation. In this case study, the energy patterns of selected IMFs are considered to extract the fault features. The energy of the ith IMF, E_i, can be calculated by using the following equation:

E_{i} = \sum_{j = 1}^{n} [(j \cdot Δ t) \cdot {| I_{i} (j \cdot Δ t) |}^{2}]

(20)

where Δt is the time interval, n and j are the total number and the index of data points, respectively, and I_i(j·Δt) denotes the decomposition coefficient of the ith IMF at time j·Δt. A nine-dimensional energy feature vector E = [E₁, E₂, …, E₉] is thus extracted. Furthermore, under different fault conditions, the HHT marginal spectra exhibit different maximum values and corresponding frequencies. To enrich the fault information, the maximum amplitude of the HHT marginal spectrum, A_m, and its corresponding frequency, f_m, are added to the feature vector E, which is thereby extended to the eleven-dimensional vector E = [E₁, E₂, …, E₉, A_m, f_m]. The data processing procedure is illustrated in Figure 8.
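Equation (20) and the assembly of the eleven-dimensional feature vector can be sketched as follows. The IMFs are assumed to be already computed by EEMD, and the HHT marginal spectrum is assumed to be supplied as amplitude values on a frequency grid; computing it from the Hilbert transform of each IMF is outside this sketch. The function names and toy values are hypothetical. Note that the (j·Δt) factor in Equation (20) weights later samples more heavily, which is how the feature reflects the distribution of energy over time.

```python
def imf_energy(imf, dt):
    """Equation (20): E_i = sum_{j=1}^{n} (j * dt) * |I_i(j * dt)|^2."""
    return sum((j * dt) * abs(c) ** 2 for j, c in enumerate(imf, start=1))


def feature_vector(imfs, dt, marginal_spectrum, freqs):
    """Build [E_1, ..., E_9, A_m, f_m] from the selected IMFs and an HHT marginal spectrum.

    marginal_spectrum[k] is assumed to be the marginal amplitude at freqs[k];
    in the case study, imfs holds the nine selected IMFs I_1..I_9.
    """
    energies = [imf_energy(imf, dt) for imf in imfs]
    k_max = max(range(len(marginal_spectrum)), key=lambda k: marginal_spectrum[k])
    return energies + [marginal_spectrum[k_max], freqs[k_max]]


# Toy usage with hypothetical values: nine identical two-sample IMFs, dt = 0.5 s.
features = feature_vector([[1.0, 2.0]] * 9, 0.5, [0.1, 0.7, 0.3], [10.0, 20.0, 30.0])
print(len(features), features[0], features[-2:])  # 11 4.5 [0.7, 20.0]
```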

Figure 8. Flowchart of proposed feature extraction approach.

4. Experimental Results and Discussion

4.1. Performance of Various Combinations of Feature Extraction Techniques

In the experiments, two typical feature extraction methods, fast Fourier transform (FFT) and wavelet packet transform with principal component analysis (WPT + PCA), are compared with HHT. These methods require some settings. For WPT, the Daubechies wavelet, the most popular choice, is employed; after many trials, Db4 with level-4 decomposition is used in this case study. In addition, two classification techniques, PCPNN and PCRVM, are compared with the proposed PCM framework. PCPNN and PCRVM each require one kernel hyper-parameter to be defined: the spread S* and the width W*, respectively. Since PCRVM serves as a committee member of PCM, PCM and PCRVM share the same width W*. By trial and error, S* and W* are set to 0.3 and 0.64, respectively.

After the configurations of the feature extraction and classification techniques are determined, the combinations of feature extraction and classification techniques are tested as shown in Figure 9, with the weight of each committee member predefined as 1 (i.e., w₁ = w₂ = 1) and the decision threshold as 0.5. Note that PCPNN and PCRVM use all the features extracted from the vibration and sound signals, combined into a single input vector, to determine their F-measures, whereas PCM employs two PCRVM committee members to analyze the features extracted from each signal separately.

Figure 9 shows that the feature extraction techniques are effective. Taking the proposed PCM framework as an example, FFT, WPT + PCA, and Hilbert-Huang transform with energy pattern (HHT + E) improve the diagnostic accuracy by 14.12%, 18.18%, and 21.48%, respectively, compared with using no feature extraction. With PCPNN and PCRVM as classifiers, the feature extraction methods also improve the diagnostic accuracy, by 16.06% to 21.16%, compared with using no feature extraction. Note that the classifiers are constructed using only a training set of single-fault patterns, while their performance is evaluated using simultaneous-fault test patterns. Figure 9 also indicates that HHT + E gives the best performance regardless of the classification technique. The reason is that the energy extracted from HHT reflects not only the amount of energy in each IMF but also how that energy is distributed over time, which provides more information about the faulty components. This result also verifies that the proposed feature extraction technique (HHT + E) effectively extracts the features of single faults from the simultaneous-fault patterns of the gearbox.

Figure 9. Diagnostic accuracies of different combinations of feature extraction techniques.

4.2. Result and Discussion of Optimization Approach

After HHT + E is selected as the feature extraction technique, the extracted features are employed to construct and train the committee machine. Then, PSO together with Equations (16) and (17) is used to determine the optimal weight w_opt of each committee member and the optimal decision threshold ε_opt. The optimized weights and threshold, together with their corresponding F_me, are shown in Table 9, in which the optimal weight of the first committee member (w₁ = 0.7752) is higher than that of the second (w₂ = 0.6991). In other words, the committee member trained with the vertical vibration signal has a greater influence on the simultaneous-fault diagnosis. The main reason is that the sound signal is easily corrupted by background noise, so PSO assigns a greater weight to the first committee member in order for the output to satisfy the objective function. Table 9 also shows that the proposed optimization framework improves the diagnostic accuracy by 3.82% compared with the empirical decision threshold of 0.5 and identical weights (w₁ = w₂ = 1), under the same feature extraction technique and simultaneous-fault test dataset, which demonstrates that the proposed optimization framework is effective.

Table 9. Selection of optimal weights and decision threshold using PSO.

Classifier	No. of Features	Optimization Method	Decision Threshold	Weights	F_me Based on Simultaneous-Fault Test Dataset
PCM	Vibration = 11, Sound = 11	-	0.5	w₁ = 1, w₂ = 1	0.7890
PCM	Vibration = 11, Sound = 11	PSO	0.7583	w₁ = 0.7752, w₂ = 0.6991	0.8272

Remark: Feature extraction method is based on HHT + E.

4.3. Overall Evaluation of Proposed Framework

To verify the effectiveness of the proposed PCM diagnostic framework, it is compared with the two single probabilistic classifiers described above, using the optimal weights and decision threshold obtained by PSO. The resulting F-measures are shown in Table 10. Compared with PCPNN and PCRVM, PCM has the longest training time and average fault detection time, 36.189 s and 17.8574 s, respectively, but its diagnostic accuracy exceeds that of PCPNN and PCRVM by 5.24% and 4.18%, respectively, on the same simultaneous-fault test dataset. Note that the training time of PCM is measured on the single-fault training dataset only, and its average fault detection time is the average over the single-fault, simultaneous-fault and overall test datasets. Table 10 also shows that the proposed framework achieves the best accuracy for single faults (94.60%) and for overall faults (89.24%), which include both single- and simultaneous-fault patterns. The main reason is that the committee members in the proposed framework are trained with different types of signals, which makes them differ from each other and thereby improves the classification accuracy of the ensemble. For example, consider an ensemble of k trained classifiers [C₁, C₂, …, C_k]: if the classifiers are trained on different subsets and their errors are uncorrelated, then even when C_i is wrong, most of the other classifiers C_j (j ≠ i) may still be correct.
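The diversity argument above can be made concrete with the classic majority-vote calculation used in ensemble analyses such as Hansen and Salamon [24]. This sketch is illustrative only: it assumes k independent classifiers combined by simple majority vote, each correct with the same probability p, whereas the PCM combines its two members through weighted probabilities, so the numbers do not describe the PCM itself. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(k, p):
    """Probability that a majority of k independent classifiers is correct.

    Each classifier is correct with probability p and errors are assumed
    independent; k must be odd so that ties cannot occur.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd")
    return sum(comb(k, m) * p ** m * (1 - p) ** (k - m)
               for m in range(k // 2 + 1, k + 1))


for k in (1, 3, 5):
    print(k, round(majority_vote_accuracy(k, 0.8), 4))
# 1 0.8
# 3 0.896
# 5 0.9421
```

With p = 0.8, three independent members raise the majority accuracy to about 0.896 and five to about 0.942; correlated errors would erode this gain, which is why training the members on different signals matters.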

In summary, the proposed framework is an effective approach for detecting simultaneous faults without costly simultaneous-fault training patterns. Moreover, the proposed method employs vibration and sound signals to train diverse committee members, which makes the diagnostic result more reliable and accurate. Therefore, it can be concluded that the proposed framework is an effective technique for overcoming both challenges in gearbox fault diagnosis: simultaneous-fault diagnosis and multiple-signal analysis.

Table 10. Evaluation result of PCM, PCPNN and PCRVM.

Classifier	Feature Number	Decision Threshold	Optimal Weight	F_me (Single Faults)	F_me (Simultaneous Faults)	F_me (Overall Faults)	Average Fault Detection Time (s)
PCPNN	11 + 11 = 22	0.6830	-	0.9163	0.7717	0.8563	8.8014
PCRVM	11 + 11 = 22	0.6754	-	0.9141	0.7823	0.8642	9.7685
PCM	Vibration = 11, Sound = 11	0.7583	w₁ = 0.7752, w₂ = 0.6991	0.9460	0.8241	0.8924	17.8574

Remark: Feature extraction method is based on HHT + E.

5. Conclusions

In this paper, a new framework combining signal de-noising, feature extraction, a probabilistic committee machine, parameter optimization and F-measure evaluation has been successfully developed to overcome the challenges of simultaneous-fault diagnosis and multiple-signal analysis in a gearbox. Considering the characteristics of the vibration and sound signals in this application, DWT and HHT + E are used for signal de-noising and feature extraction, respectively, so that the diagnostic system can effectively capture the single-fault components from noise-polluted simultaneous-fault patterns. This implies that acquiring a large number of simultaneous-fault signals can be avoided. Moreover, PSO is effective for optimizing the weight of each committee member and the decision threshold in the PCM framework. To verify the effectiveness of the proposed probabilistic committee machine, the single probabilistic classifiers PCPNN and PCRVM are also employed for comparison to diagnose the simultaneous faults. Although the results show that these machine learning methods can diagnose the simultaneous faults in the gearbox, the proposed PCM framework is found to be superior to the single classifiers. Therefore, the proposed PCM framework is suitable for detecting simultaneous faults in the gearbox.

In practice, most mechanical faults can be diagnosed by analyzing vibrations, sounds, currents, oil debris and temperature signals. As the number and type of committee members in the proposed framework can be adjusted by the user, the proposed framework can be applied to other similar diagnostic applications.

Acknowledgments

The authors would like to thank the University of Macau for financial support under grant numbers MYRG2014-00178-FST, MYRG079(Y1-L2)-FST13-YZX, and MYRG2015-00077-FST. The authors would also like to thank Yueqiao Chen for his support.

Author Contributions

Pak Kin Wong and Jian-Hua Zhong conceived and designed the experiments; Jian-Hua Zhong performed the experiments; Jian-Hua Zhong and Zhi-Xin Yang analyzed the data; Zhi-Xin Yang contributed reagents/materials/analysis tools; Pak Kin Wong and Jian-Hua Zhong wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hu, Q.; He, Z.; Zhang, Z.; Zi, Y. Fault diagnosis of rotating machinery based on improved wavelet package transform and SVMs ensemble. Mech. Syst. Signal Process. 2007, 21, 688–705.
Lei, Y.; He, Z.; Zi, Y.; Hu, Q. Fault diagnosis of rotating machinery based on multiple ANFIS combination with GAs. Mech. Syst. Signal Process. 2007, 21, 2280–2294.
Sanz, J.; Perera, R.; Huerta, C. Fault diagnosis of rotating machinery based on auto-associative neural networks and wavelet transforms. J. Sound Vib. 2007, 302, 981–999.
Widodo, A.; Yang, B.-S. Application of nonlinear feature extraction and support vector machines for fault diagnosis of induction motors. Expert Syst. Appl. 2007, 33, 241–250.
Widodo, A.; Yang, B.-S. Support vector machine in machine condition monitoring and fault diagnosis. Mech. Syst. Signal Process. 2007, 21, 2560–2574.
Wong, P.K.; Yang, Z.; Vong, C.M.; Zhong, J. Real-time fault diagnosis for gas turbine generator systems using extreme learning machine. Neurocomputing 2014, 128, 249–257.
Santos, P.; Villa, L.F.; Reñones, A.; Bustillo, A.; Maudes, J. An SVM-based solution for fault detection in wind turbines. Sensors 2015, 15, 5627–5648.
Vong, C.M.; Wong, P.K.; Ip, W.F. A new framework of simultaneous-fault diagnosis using pairwise probabilistic multi-label classification for time-dependent patterns. IEEE Trans. Ind. Electron. 2013, 60, 3372–3385.
Yang, Z.; Wong, P.K.; Vong, C.M.; Zhong, J.; Liang, J. Simultaneous-fault diagnosis of gas turbine generator systems using a pairwise-coupled probabilistic classifier. Math. Probl. Eng. 2013, 2013.
Yélamos, I.; Escudero, G.; Graells, M.; Puigjaner, L. Simultaneous fault diagnosis in chemical plants using support vector machines. In Computer Aided Chemical Engineering; Valentin, P., Paul Şerban, A., Eds.; Elsevier: Philadelphia, PA, USA, 2007; Volume 24, pp. 1253–1258.
Wang, Y.S.; Lee, C.M.; Kim, D.G.; Xu, Y. Sound-quality prediction for nonstationary vehicle interior noise based on wavelet pre-processing neural network model. J. Sound Vib. 2007, 299, 933–947.
Ahn, J.H.; Kwak, D.H.; Koh, B.H. Fault detection of a roller-bearing system through the EMD of a wavelet denoised signal. Sensors 2014, 14, 15022–15038.
Wang, Y.; Ma, Q.; Zhu, Q.; Liu, X.; Zhao, L. An intelligent approach for engine fault diagnosis based on Hilbert–Huang transform and support vector machine. Appl. Acoust. 2014, 75, 1–9.
Soualhi, A.; Medjaher, K.; Zerhouni, N. Bearing health monitoring based on Hilbert–Huang transform, support vector machine, and regression. IEEE Trans. Instrum. Meas. 2015, 64, 52–62.
Jiang, L.L.; Li, B.B.; Li, X.J. An improved HHT method and its application in fault diagnosis of roller bearing. Appl. Mech. Mater. 2013, 273, 264–268.
Wu, J.D.; Liu, C.H. Investigation of engine fault diagnosis using discrete wavelet transform and neural network. Expert Syst. Appl. 2008, 35, 1200–1213.
Wu, J.D.; Chan, J.J. Faulted gear identification of a rotating machinery based on wavelet transform and artificial neural network. Expert Syst. Appl. 2009, 36, 8862–8875.
Loutas, T.H.; Sotiriades, G.; Kalaitzoglou, I.; Kostopoulos, V. Condition monitoring of a single-stage gearbox with artificially induced gear cracks utilizing on-line vibration and acoustic emission measurements. Appl. Acoust. 2009, 70, 1148–1159.
Yang, Y.; Yu, D.; Cheng, J. A fault diagnosis approach for roller bearing based on IMF envelope spectrum and SVM. Measurement 2007, 40, 943–950.
Cerrada, M.; Sánchez, R.V.; Cabrera, D.; Zurita, G.; Li, C. Multi-stage feature selection by using genetic algorithms for fault diagnosis in gearboxes based on vibration signal. Sensors 2015, 15, 23903–23926.
Wong, P.K.; Zhong, J.; Yang, Z.; Vong, C.M. Sparse Bayesian extreme learning committee machine for engine simultaneous fault diagnosis. Neurocomputing 2016, 174, 331–343.
Tresp, V. A Bayesian committee machine. Neural Comput. 2000, 12, 2719–2741.
Chen, S.; Wang, W.; van Zuylen, H. Construct support vector machine ensemble to detect traffic incident. Expert Syst. Appl. 2009, 36, 10976–10986.
Hansen, L.K.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 993–1001.
Wu, J.D.; Chiang, P.H.; Chang, Y.W.; Shiao, Y.J. An expert system for fault diagnosis in internal combustion engines using probability neural network. Expert Syst. Appl. 2008, 34, 2704–2713.
Wang, C.; Zhou, J.; Qin, H.; Li, C.; Zhang, Y. Fault diagnosis based on pulse coupled neural network and probability neural network. Expert Syst. Appl. 2011, 38, 14307–14313.
Wang, G.; Yang, Y.; Xie, Q.; Zhang, Y. Force based tool wear monitoring system for milling process based on relevance vector machine. Adv. Eng. Softw. 2014, 71, 46–51.
Zio, E.; Di Maio, F. Fatigue crack growth estimation by relevance vector machine. Expert Syst. Appl. 2012, 39, 10681–10692.
Wu, T.F.; Lin, C.J.; Weng, R.C. Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 2004, 5, 975–1005.
Robinson, J.; Rahmat-Samii, Y. Particle swarm optimization in electromagnetics. IEEE Trans. Antennas Propag. 2004, 52, 397–407.
Trelea, I.C. The particle swarm optimization algorithm: Convergence analysis and parameter selection. Inf. Process. Lett. 2003, 85, 317–325.
Wu, Z.; Huang, N.E. Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal. 2009, 1, 1–41.
Lei, Y.; He, Z.; Zi, Y. Application of the EEMD method to rotor fault diagnosis of rotating machinery. Mech. Syst. Signal Process. 2009, 23, 1327–1338.
Widodo, A.; Yang, B.-S. Application of relevance vector machine and survival probability to machine degradation assessment. Expert Syst. Appl. 2011, 38, 2592–2599.
Hastie, T.; Tibshirani, R. Classification by pairwise coupling. Ann. Stat. 1998, 26, 451–471.
Wong, P.K.; Tam, L.M.; Li, K.; Vong, C.M. Engine idle-speed system modelling and control optimization using artificial intelligence. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2010, 224, 55–72.
Hripcsak, G.; Rothschild, A.S. Agreement, the F-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 2005, 12, 296–298.
Qu, Y.; He, D.; Yoon, J.; van Hecke, B.; Bechhoefer, E.; Zhu, J. Gearbox tooth cut fault diagnostics using acoustic emission and vibration sensors—A comparative study. Sensors 2014, 14, 1372–1393.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
