Article

An Ensemble Learning and RUL Prediction Method Based on Bearings Degradation Indicator Construction

School of Management, Guizhou University, Huaxi District, Guiyang 550025, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(1), 346; https://doi.org/10.3390/app10010346
Submission received: 5 November 2019 / Revised: 15 December 2019 / Accepted: 28 December 2019 / Published: 2 January 2020

Abstract

The prediction of the remaining life of a bearing plays a vital role in reducing the accident-related maintenance costs of machinery and in improving the reliability of machinery and equipment. In bearing remaining useful life (RUL) prediction, the abilities of different statistical characteristics to reflect the bearing degradation state differ, and a single prediction model has low generalization ability and a poor prediction effect. An ensemble robust prediction method is proposed here to predict bearing RUL based on the construction of a bearing degradation indicator set: the initial bearing degradation indicator subsets were constructed using the Fast Correlation-Based Filter with Approximate Markov Blankets (FCBF-AMB) and Maximal Information Coefficient (MIC) selection methods. Through the cross-operation of the obtained subsets, we obtained a set of robust degradation indicators. These selected degradation indicators were fed into a long short-term memory (LSTM) neural network prediction model enhanced by the AdaBoost algorithm. We found through calculation that the average prediction accuracy of the proposed method is 91.40%, 92.04%, and 93.25% at 2100, 2250, and 2400 rpm, respectively. Compared with other methods, the proposed method improves the prediction accuracy by between 1.8% and 14.87%. Therefore, the method proposed in this paper is more accurate than the other methods in terms of RUL prediction.

1. Introduction

Rolling bearings are among the key components supporting rotating shafts in rotating mechanical equipment. Bearing failure is often considered one of the most common causes of mechanical equipment failure [1,2]. Bearing reliability is critical for the reliability, durability, and efficiency of mechanical equipment. Any accidental failure of a bearing may have various negative effects [3,4], ranging from production downtime to casualties or even catastrophic environmental pollution. To address these issues, online detection of bearing health is urgently required to effectively enhance the safety of mechanical equipment operation [5,6,7], predict bearing remaining useful life (RUL), and implement an action plan to prevent catastrophic events and extend the bearing life cycle [7]. Advances in bearing RUL prediction technology have provided increasingly powerful technical support for intelligent bearing RUL prediction and health management [8,9,10]. Over the past few decades, research in this area has produced theoretical results that have been widely applied. Most bearing RUL prediction methods apply a model-based or data-driven approach [11,12].
The model-based method mainly relies on an accurate mathematical model of bearing degradation, but bearing degradation is a complicated and difficult problem [13]. The data-driven approach uses data mining and artificial intelligence [14] to explore the potential relationship between current bearing state data and RUL. Data-driven methods have become a promising approach. Data-driven analysis methods can be used as objective and rational tools to understand the data and make decisions [15,16]. For example, in the field of life sciences [17], the data-driven approach was used to conduct diagnosis [18,19]. In the field of engineering applications, the data-driven method has been used to obtain information on road lighting infrastructure. Based on the feature selection technology of supervised and unsupervised filters, the dimension of feature space was reduced to classify and identify lamps, which was ultimately used to evaluate and optimize the performance of the street lighting at night [20]. Some experts used ensemble data-driven statistical models to map comparative shallow landslide susceptibility to obtain the relationship between heavy rain and shallow landslides [21].
The deep learning approach [22] has advantages for bearing RUL prediction [6], providing new opportunities for this research field [10,23]. A typical deep learning framework consists of four phases: data acquisition and processing, feature extraction and calculation, learning model building, and prediction. In today’s big data era, the premise of accurate bearing RUL prediction is to extract as much effective information as possible from massive amounts of monitoring data [24]. However, the data are increasingly complicated and high-dimensional. The irrelevant and redundant features in these high-dimensional data increase the complexity of the learning model, and can even reduce the prediction accuracy, which creates the problem known as the “dimensional disaster” [25]. In the feature extraction and calculation stage, deep learning has some shortcomings: the time-domain features are less able to reflect the details of the bearing degradation process, the frequency-domain features are not sensitive to medium-term bearing degradation, and the time–frequency-domain characteristics of wavelets can cause information loss. These three problems usually lead to information redundancy and increase the number of neural network nodes, which in turn leads to difficult training and over-fitting of the model [6]. In this process, the traditional algorithm mainly finds a set of features with a high contribution rate. Some authors [26,27,28] defined different feature types based on the contribution of features to degradation information (DI). In this case, feature correlation is a measure of the degradation-stage-related information.
A feature that does not contain information about the bearing DI is considered insignificant and therefore unnecessary for the prediction task. Removing such features can improve the prediction model and speed up the learning algorithm. Conversely, the relevant features are those that can reflect bearing DI. To minimize the prediction error, it may not be necessary to select all relevant features, but instead only select the feature subset with the highest contribution rate and the strongest prediction ability. Feature subsets with these properties may not be unique due to redundancy effects. Redundancy is usually measured by feature correlation; if the values of two features are correlated, then they are redundant.
With the feature selection method, a representative feature subset is selected, the features with a high contribution rate and sensitivity that are favorable for prediction are retained, and the complete set is replaced to construct and train the learning model. Experts and scholars have studied this field, especially using artificial intelligence and statistical methods such as feature compression or similar monotonic methods [6,10]. The optimal feature selection method should not only reduce data dimensions, but also eliminate redundant and irrelevant features. Therefore, considering correlations in feature selection plays a crucial role in reducing data dimensions [29]. However, in the construction of feature subsets, relying on only a single correlation or sensitivity measurement method will bias the calculation results to some extent, which reduces the robustness of the feature subsets. Therefore, we aimed to use a three-stage feature selection method to extract sensitive features and construct a bearing degradation indicator set. Based on two different feature extraction methods, the initial subsets of bearing degradation indicators were constructed, and the cross-operation of these subsets was applied to obtain a robust set that can fully reflect bearing degradation information.
In the deep learning model establishment and prediction stage, scholars introduced a bearing RUL prediction method based on a recurrent neural network (RNN). However, for practical problems, gradient disappearance occurs. Hochreiter et al. [30] proposed a long short-term memory (LSTM) model in 1997 to overcome the RNN problem of gradient disappearance.
Some experts have proposed using the LSTM neural network to predict bearing RUL based on the bearing degradation bottleneck feature, the waveform entropy (WFE) indicator, the time factor, or the deep feature representation method [26,31,32,33]. Compared with previous artificial intelligence algorithms, the predictive ability of the LSTM is significantly improved. The above research used a single artificial intelligence algorithm to predict bearing RUL; however, a single artificial intelligence algorithm has weak generalization ability and low robustness, so the bearing RUL cannot be predicted well outside the sample. To address this problem, we enhanced the LSTM neural network prediction model using the AdaBoost algorithm.
To overcome the aforementioned shortcomings, we propose an ensemble robust prediction method to predict bearing RUL based on the construction of a bearing degradation indicator set.
The main contributions of this paper are summarized as follows:
(1)
To reveal the state of bearing degradation more fully, we integrated the selected high contribution rate and sensitive features to form a more representative and robust feature set, defined as the bearing degradation indicator set.
(2)
To ensure the robustness of the constructed set of bearing degradation indicators, a new framework for three-stage feature selection is proposed for bearing RUL prediction, which more comprehensively considers the correlation between features and bearing degradation state.
(3)
The AdaBoost algorithm is proposed to enhance the prediction ability, the prediction accuracy, and the generalization ability of the LSTM prediction model.
The rest of the paper is organized as follows: Section 2 introduces the basic LSTM prediction model theory and two kinds of feature selection methods. Section 3 presents the detailed implementation process of this three-stage feature selection method that was applied to the construction of a bearing degradation indicator set and an improved LSTM-AdaBoost prediction model. The performance of the proposed method was verified using the XJTU-SY bearing datasets from Xi’an Jiaotong University (XJTU, Xi’an, China) and compared with other methods in Section 4. Finally, conclusions are drawn in Section 5.

2. Basic Theory and Algorithm

This section lists the relevant theories used to address the three main problems encountered: the feature correlation measurement standard in the feature extraction and calculation process, the computational complexity of the predictive modelling process, and the generalization ability of the prediction model.
First, the initial reference degradation indicator subset $F^*$ was screened using the fast correlation-based filter (FCBF) solution and the approximate Markov blanket to construct an initial subset of reference degradation feature indicators that can characterize the bearing degradation process. Secondly, the maximum information coefficient (MIC) was used to measure the correlation among features, as well as the correlation between features and the bearing degradation state, to construct the initial reference degradation indicator subset $F_{FR}$ with maximum correlation between features and the real RUL, and the subset $F_{FF}$ with minimum redundancy among features. Thirdly, a cross-operation was applied to the initial reference bearing degradation indicator subsets to reduce the computation load, shorten the training time of the prediction model, and reduce the computational complexity of the prediction modelling process. The reason for choosing different correlation measurement methods to construct the bearing degradation indicator subsets was to prevent a single correlation measurement method from being affected by outliers in the data set, which would bias the constructed bearing degradation indicator set and affect the prediction accuracy.
The results obtained using the different correlation measurement methods were cross-operated to retain the effective indicators to the maximum extent. Finally, the AdaBoost algorithm was used to enhance the LSTM neural network prediction model, and multiple weak predictors were assembled into a strong predictor to predict the bearing remaining useful life.

2.1. LSTM

A recurrent neural network (RNN) is a type of neural network dedicated to processing time-series data samples. Each layer passes its output both to the next layer and to a hidden state, which the current layer uses when processing the next sample, as shown in Figure 1. Module M of the RNN reads the input $x(t)$ and produces the output $h(t)$. The recurrent connection passes information from the current step to the next.
The above chain network structure reveals that the RNN is essentially sequence-dependent. However, in practical applications, problems of gradient disappearance and gradient explosion occur. To solve these problems, Hochreiter et al. [30] constructed an LSTM architecture that involves a memory cell. This model resembles a standard RNN with a hidden layer in which each repeating module has a simple tanh layer. The LSTM has the same overall structure, but the structure inside each module differs: each node in the ordinary hidden layer is replaced by a storage unit. The specific structure [34] of the model is shown in Figure 2. This structure gives the RNN model long short-term memory in the form of weights and ephemeral activations.
$x(t)$ is the input vector at the current time, $h(t-1)$ is the hidden layer state value at the previous time $(t-1)$, and the memory unit is the memory of the neuron state, which records the current state. The forget gate in the LSTM decides what information is retained or forgotten and is calculated with the sigmoid function. The input gate decides whether to update the state of the LSTM using the current input; the output gate decides whether to pass the hidden state on to the next iteration.
$$
\begin{aligned}
g_t &= \tanh(W_{gx} x_t + W_{gh} h_{t-1} + b_g),\\
i_t &= \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i),\\
f_t &= \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f),\\
o_t &= \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o),\\
s_t &= g_t \times i_t + s_{t-1} \times f_t,\\
h_t &= \tanh(s_t) \times o_t,
\end{aligned}
$$
where the $W$ and $b$ values are the layer weights and biases, respectively; $\sigma$ and $\tanh$ represent the sigmoid activation function and the hyperbolic tangent activation function, respectively; $x_t$ and $h_{t-1}$ are the input layer at time $t$ and the hidden layer at time $t-1$, respectively; $g_t$, $i_t$, $f_t$, and $o_t$ are the output values of the input node, the input gate, the forget gate, and the output gate, respectively; and $s_t$ is the internal state at the current time.
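As a concrete illustration, the six equations above can be sketched as one forward step of the cell in NumPy (the parameter names `Wgx`, `Wgh`, `bg`, etc. are illustrative conventions, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One forward step of the LSTM cell equations above.
    p maps names like 'Wgx', 'Wgh', 'bg' to the weights/biases of the
    input node (g), input gate (i), forget gate (f), and output gate (o)."""
    g = np.tanh(p["Wgx"] @ x_t + p["Wgh"] @ h_prev + p["bg"])   # input node g_t
    i = sigmoid(p["Wix"] @ x_t + p["Wih"] @ h_prev + p["bi"])   # input gate i_t
    f = sigmoid(p["Wfx"] @ x_t + p["Wfh"] @ h_prev + p["bf"])   # forget gate f_t
    o = sigmoid(p["Wox"] @ x_t + p["Woh"] @ h_prev + p["bo"])   # output gate o_t
    s = g * i + s_prev * f        # internal state s_t
    h = np.tanh(s) * o            # hidden state h_t
    return h, s
```

In practice, the paper's prediction model would use a full LSTM implementation from a deep learning framework; this sketch only mirrors the cell equations.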

2.2. Feature Selection

To reduce the computational burden and improve the prediction accuracy, it is necessary to select the sensitive features of the bearing degradation indicators that clearly represent the bearing degradation state information, and to eliminate the irrelevant or redundant features that are useless or even harm the prediction accuracy of bearing RUL [35]. In this paper, we propose a three-stage feature selection method based on FCBF-AMB and MIC, which reduces both the feature redundancy and the dimension of the feature data through the bearing degradation indicator subsets fusion method.

2.2.1. FCBF Feature Selection Method and Markov Blanket

The fast correlation-based filter (FCBF) feature selection method is based on the idea of significance and adopts a backward sequential search strategy to find a feature subset quickly and effectively. Symmetrical uncertainty (SU) is used as the correlation metric to select relevant features and remove redundant features [36].
The symmetric uncertainty of each feature is calculated as
$$SU(f_i, R) = \frac{2\, IG(R \mid f_i)}{H(R) + H(f_i)},$$
where $H(R)$ and $H(f_i)$ represent the information entropy of the real RUL value $R$ and of feature $f_i$, respectively [37]; $IG(R \mid f_i)$ represents the information gain ($IG$) and measures the reduction in uncertainty about the real RUL value $R$ given the value of feature $f_i$.
Given a threshold value $\lambda$, if $SU(f_i, R) \ge \lambda$, then $f_i$ is a strongly correlated feature for the real RUL value $R$ and should be retained; otherwise, it is deleted.
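For discretized features, SU can be computed directly from entropies, as in the following NumPy sketch (the discretization scheme is an assumption, since the paper does not specify one; entropies are in bits):

```python
import numpy as np

def entropy(values):
    """Shannon entropy (bits) of a discrete variable (or of rows, for 2-D input)."""
    _, counts = np.unique(values, return_counts=True, axis=0)
    prob = counts / counts.sum()
    return -np.sum(prob * np.log2(prob))

def symmetric_uncertainty(f, r):
    """SU(f, R) = 2 * IG(R|f) / (H(R) + H(f)) for discretized f and r."""
    h_f, h_r = entropy(np.asarray(f)), entropy(np.asarray(r))
    h_joint = entropy(np.column_stack([f, r]))   # joint entropy H(f, R)
    ig = h_r - (h_joint - h_f)                   # IG(R|f) = H(R) - H(R|f)
    return 2.0 * ig / (h_f + h_r) if h_f + h_r > 0 else 0.0
```

A feature identical to $R$ yields $SU = 1$, while an independent one yields $SU \approx 0$.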
In this paper, the symmetric uncertainty SU of the FCBF feature selection method is adopted as the metric used to approximate the Markov blanket. We applied the approximate Markov blanket [25,36] to identify and delete redundant features. Feature redundancy can be determined using the Markov blanket [38] concept, which is formally defined as follows:
Definition 1 (Markov Blankets).
In the feature set $F$, for a given feature $f_i \in F$, if $f_i \perp \{F - M_i - \{f_i\}, R\} \mid M_i$, then the feature subset $M_i \subseteq F$ (with $f_i \notin M_i$) is the Markov Blanket of $f_i$.
In the above definition, $\perp$ denotes independence and $\mid M_i$ denotes conditioning on $M_i$. In other words, the Markov Blanket condition states that the feature set $F$ is divided into three mutually exclusive parts: feature $f_i$, feature subset $M_i$, and feature subset $F - M_i - \{f_i\}$. These three subsets have no intersection, and their union is the feature set $F$. Given the feature subset $M_i$, the feature $f_i$ is independent of the feature subset $F - M_i - \{f_i\}$ and the real RUL value $R$.
Definition 2 (Approximate Markov Blankets).
For two features $f_i$ and $f_j$ ($i \neq j$), the condition for $f_i$ being the approximate Markov blanket of $f_j$ is: $SU(f_i, R) > SU(f_j, R)$ and $SU(f_i, f_j) \ge SU(f_j, R)$.
The approximate Markov Blanket (AMB) is determined by comparing the correlation $SU(f_i, f_j)$ between features $f_i$ and $f_j$ with the $SU$ values of each feature and the real RUL value $R$. If the correlation $SU$ between the two features is large enough, then $f_i$ is an AMB of $f_j$ and $f_j$ is redundant.
Definition 3 (Predominant feature).
A feature $f_i$ is a predominant feature of $F$ if it does not have any approximate Markov blanket in $F$. Predominant features are not removed at any stage.
The process of using the AMB feature selection method to find and delete redundant features is as follows:
FCBF consists of two stages: obtaining the subset of relevant features and selecting the predominant features from that subset. A relevant feature $f_i$ is predominant if no other relevant feature $f_j$ exists such that $f_j$ is an AMB for $f_i$. The feature subset composed of all predominant features is the initial bearing degradation indicator subset $F^*$, which represents the degradation state of the bearings.

2.2.2. Maximum Information Coefficient (MIC)

Reshef et al. [39] proposed the MIC theory and solution method, which focuses on the linear and nonlinear metric relationships between variables and further explores the non-functional dependencies between variables through this metric relationship. The MIC mainly uses mutual information as an indicator of the degree of correlation between variables, and grid-partitioning methods are used for the calculation.
Given variables $A = \{a_i\}$ and $B = \{b_i\}$, $i = 1, 2, \dots, n$, where $n$ is the number of samples, the mutual information ($MI$) is defined as follows:
$$MI(A, B) = \sum_{a \in A} \sum_{b \in B} p(a, b) \log \frac{p(a, b)}{p(a)\,p(b)},$$
where $p(a, b)$ is the joint probability density of $A$ and $B$, and $p(a)$ and $p(b)$ are the marginal probability densities of $A$ and $B$, respectively.
Suppose the set $D = \{(a_i, b_i)\}$, $i = 1, 2, \dots, n$, is a set of finite ordered pairs. A division $G$ is defined that divides the value range of variable $A$ into $x$ segments and the value range of variable $B$ into $y$ segments, so $G$ is a grid of size $x \times y$. $MI(A, B)$ is calculated within each grid partition obtained; since the same grid can be divided in several ways, the maximum value of $MI(A, B)$ over the different division methods is chosen as the $MI$ value of division $G$.
The maximum mutual information of $D$ under a division is defined as $MI^*(D, x, y) = \max MI(D|G)$, where $D|G$ denotes that the data $D$ are divided by $G$. The maximum information coefficient ($MIC$) uses $MI$ to indicate the quality of the grid; a feature matrix is formed by the maximum normalized $MI$ values under different divisions. The feature matrix is defined as $M(D)_{x,y}$, where $M(D)_{x,y} = \frac{MI^*(D, x, y)}{\log \min(x, y)}$.
$MIC$ is defined as $MIC(D) = \max_{xy < B(n)} \{M(D)_{x,y}\}$, where $n$ is the sample size and $B(n)$ is a function of the sample size that sets the upper limit on the grid size $x \times y$. Generally, $\omega(1) \le B(n) \le O(n^{1-\varepsilon})$ for $0 < \varepsilon < 1$; we set $B(n) = n^{0.6}$ in the experiment [39].
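A simplified, illustrative MIC computation is sketched below. Unlike the full algorithm of Reshef et al., it only tries equal-frequency grids rather than optimizing the partition (so it generally underestimates MIC); the grid limit follows $B(n) = n^{0.6}$:

```python
import numpy as np

def grid_mi(a, b, x, y):
    """Mutual information (natural log) of a and b after equal-frequency
    binning into an x-by-y grid."""
    a_bins = np.searchsorted(np.quantile(a, np.linspace(0, 1, x + 1)[1:-1]), a)
    b_bins = np.searchsorted(np.quantile(b, np.linspace(0, 1, y + 1)[1:-1]), b)
    joint, _, _ = np.histogram2d(a_bins, b_bins, bins=(x, y))
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / outer[mask])))

def mic(a, b):
    """Simplified MIC: max over grids with x*y <= B(n) = n**0.6 of
    MI normalized by log(min(x, y))."""
    a, b = np.asarray(a), np.asarray(b)
    limit = len(a) ** 0.6
    best = 0.0
    for x in range(2, int(limit) + 1):
        for y in range(2, int(limit / x) + 1):
            best = max(best, grid_mi(a, b, x, y) / np.log(min(x, y)))
    return best
```

For a deterministic relationship the normalized MI approaches 1, while for independent variables it stays close to 0.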
Suppose the feature set is $F = \{f_1, f_2, \dots, f_m, R\}$, where the number of features is $m$ and $R$ is the real RUL value. $MIC$ is used to define the correlation between feature $f_i$ and the real RUL value $R$ as $MIC(f_i, R)$; similarly, the correlation between features $f_i$ and $f_j$ is defined as $MIC(f_i, f_j)$. We prefer to select features with larger $MIC(f_i, R)$ and smaller $MIC(f_i, f_j)$ to form the set of bearing degradation indicators.
To reduce the dimension of the bearing degradation indicator set feature data, we propose the following three-stage bearing degradation indicator set construction framework based on the feature subsets fusion method.

3. Methodology

3.1. Proposed Degradation Indicator Set

The structure of the proposed bearing RUL prediction model is shown in Figure 3. The original data used for bearing RUL prediction include the bearing vibration signal. First, different features are extracted from the vibration signal data, including time-domain features and frequency-domain features. Secondly, a three-stage feature selection method is used to extract and reduce the sensitive features of the feature data to construct the indicator set for bearing degradation. Then, the most sensitive features selected in the degradation indicator set are input into LSTM-AdaBoost for RUL prediction.
This section describes the procedure for construction of the proposed bearing degradation indicator set. As shown in Figure 3, the procedure is mainly composed of three stages: feature extraction, selection of sensitive features, and construction of the degradation indicator set.
Several feature subsets of a given data set can produce predictive models with similar performance, but their predictive power may differ. Depending on the algorithm’s search strategy or sample bias, different features may be selected [40]. In general, features extracted by different feature extraction methods with similar performance are highly correlated. We therefore calculated and extracted the relevant features separately to ensure their independence in the search process.
FCBF is a fast, correlation-based filter feature selection algorithm. It adopts a feature ranking method to delete irrelevant or weakly correlated features. This method has low time complexity, but it cannot remove redundant features. Some experts and scholars addressed this problem by applying an approximate Markov blanket de-redundancy step to the FCBF feature selection result, using the MIC as the measurement standard [25].
In this study, the FCBF-AMB feature selection method [40] was used to construct the initial bearing degradation indicator subset. To ensure the independence of the initial bearing degradation indicator subsets, we used the FCBF-AMB and MIC algorithms to extract features and form different initial bearing degradation indicator subsets, respectively. First, the subset of relevant features is obtained and arranged in descending order of SU with respect to the real RUL, which identifies and deletes the irrelevant and weakly correlated features and adds the strongly correlated features to the feature subset $F'$. The predominant feature $f_i$ is then selected from $F'$; by definition, $f_i$ is a predominant feature with respect to each of the remaining relevant features $f_j$. For each $f_j$, check whether $f_i$ is an AMB for $f_j$; if so, $f_j$ is removed from the subset $F'$. This process is repeated until no features remain in $F'$, yielding the initial bearing degradation indicator subset $F^*$. The details are outlined in Algorithm 1.
Algorithm 1 FCBF-AMB feature selection method.
Input: Original feature set $F = \{f_1, f_2, \dots, f_m\}$, real RUL values $R$, SU threshold value $\lambda$.
Output: Initial bearing degradation indicator subset $F^*$.
1: for each $f_i \in F$ do
2:   Calculate the symmetric uncertainty $SU(f_i, R)$ between feature $f_i$ and the real RUL values $R$,
3:   if $SU(f_i, R) > \lambda$ then
4:     Add feature $f_i$ to feature subset $F'$, keeping $F'$ ranked in descending order of $SU(f_i, R)$,
5:   end if
6: end for
7: while $F'$ is not empty do
8:   Take the first feature $f_i$ of $F'$ as a predominant feature and add it to $F^*$,
9:   for each $f_j \in F' \setminus \{f_i\}$ do
10:     if $f_i$ is an AMB for $f_j$ then
11:       Remove $f_j$ from $F'$,
12:     end if
13:   end for
14:   Remove the predominant feature $f_i$ from $F'$.
15: end while
Steps 1–6 remove the irrelevant and weakly correlated features using the symmetric uncertainty feature ordering method, finally producing a feature subset $F'$ that is strongly correlated with the bearing degradation state. $F'$ still contains many redundant features, which are deleted by the approximate Markov blanket method in the remaining steps: the predominant feature $f_i$ is selected from $F'$, added to the initial bearing degradation indicator subset, and removed from $F'$ together with the features for which it is an AMB. The above process is repeated until the feature subset $F'$ is empty.
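The FCBF-AMB procedure can be sketched as follows, given precomputed SU values (`su_fr[i]` for $SU(f_i, R)$ and `su_ff[i][j]` for $SU(f_i, f_j)$; these names are illustrative). The redundancy test here uses the standard FCBF approximate-Markov-blanket condition, $SU(f_i, R) > SU(f_j, R)$ and $SU(f_i, f_j) \ge SU(f_j, R)$:

```python
def fcbf_amb(su_fr, su_ff, lam):
    """Return the indices of the predominant features (subset F*).
    su_fr: list of SU(f_i, R); su_ff: matrix of SU(f_i, f_j); lam: SU threshold."""
    # Stage 1: keep relevant features, ranked by descending SU with R.
    f_prime = sorted((i for i, su in enumerate(su_fr) if su > lam),
                     key=lambda i: -su_fr[i])
    f_star = []
    # Stage 2: peel off predominant features, deleting the remaining features
    # for which each predominant feature is an approximate Markov blanket.
    while f_prime:
        fi = f_prime.pop(0)            # most relevant remaining feature
        f_star.append(fi)
        f_prime = [fj for fj in f_prime
                   if not (su_fr[fi] > su_fr[fj] and su_ff[fi][fj] >= su_fr[fj])]
    return f_star
```

For example, a feature highly correlated with a more relevant feature is dropped as redundant, while weakly relevant features never enter $F'$ in the first place.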
After the first stage of processing yields a smaller subset of bearing degradation indicators, two further subsets of bearing degradation indicators based on the $MIC$ feature selection method are constructed in the second stage.
Calculate the correlation $MIC(f_i, f_j)$ between features and the correlation $MIC(f_i, R)$ between features and the real RUL values. $MIC_{FF}$ refers to the matrix that measures the correlation between features, whereas $MIC_{FR}$ refers to the matrix that measures the correlation between features and the real RUL values.
We find the minimum value of each column in the $MIC_{FF}$ matrix and combine these minimum values into a set $min_{FF} = \{min_{FF_0}, min_{FF_1}, \dots, min_{FF_{24}}\}$, where each column corresponds to one feature and there are 25 columns in this matrix. We take the maximum of these values as the threshold $FF_{threshold}$. Then, we count the number of elements in each column that are less than the threshold value and combine these counts into a set $Num_{FF} = \{Num_{FF_0}, Num_{FF_1}, \dots, Num_{FF_{24}}\}$, and sort the counts to find the median. If the count for a column is greater than the median, the feature corresponding to that column is weakly correlated with the other features and is selected as a strongly irrelevant feature, becoming an element of the bearing degradation indicator subset $F_{FF}$.
Similarly, the $MIC(f_i, R)$ values in the $MIC_{FR}$ matrix are sorted in descending order to find the median, which is used as the threshold $FR_{threshold}$. The features whose values are greater than this threshold are selected as strongly relevant features and become the elements of the bearing degradation indicator subset $F_{FR}$. In the process of feature extraction for the bearing training sets, we found that the median value of $MIC(f_i, R)$ was 0.5 and the maximum value of $min_{FF_i}$ in the set $min_{FF}$ was 0.1. We therefore set $FR_{threshold}$ to 0.5 and $FF_{threshold}$ to 0.1.
The steps for obtaining the bearing degradation indicator subsets are shown in Algorithm 2.
Algorithm 2 MIC feature selection method.
Input: Original data set $D$, original feature set $F = \{f_1, f_2, \dots, f_m\}$, real RUL values $R$.
Output: Initial bearing degradation indicator subsets $F_{FF}$ and $F_{FR}$.
1: for each pair of features $f_i, f_j \in F$ do
2:   Calculate the maximum information coefficient $MIC(f_i, f_j)$, obtaining the $MIC_{FF}$ matrix,
3: end for
4: for every column of the $MIC_{FF}$ matrix do
5:   Find the minimum value, obtaining the set $min_{FF} = \{min_{FF_0}, min_{FF_1}, \dots, min_{FF_{24}}\}$,
6: end for
7: Find the maximum value in the set $min_{FF}$ and use it as $FF_{threshold}$,
8: for every column of the $MIC_{FF}$ matrix do
9:   Count the number of elements that are less than $FF_{threshold}$, obtaining the set $Num_{FF} = \{Num_{FF_0}, Num_{FF_1}, \dots, Num_{FF_{24}}\}$,
10: end for
11: Find the median $Num_{med}$ of the set $Num_{FF}$,
12: for each $Num_{FF_i}$ in the set $Num_{FF}$ do
13:   if $Num_{FF_i} > Num_{med}$ then select the corresponding feature into the subset $F_{FF}$,
14: end for
15: for each $f_i \in F$ do
16:   Calculate the maximum information coefficient $MIC(f_i, R)$, obtaining the $MIC_{FR}$ matrix,
17: end for
18: Rank the $MIC(f_i, R)$ values and find the median $FR_{med}$, which serves as $FR_{threshold}$,
19: for each $f_i \in F$ do
20:   if $MIC(f_i, R) > FR_{med}$ then select $f_i$ into the subset $F_{FR}$.
21: end for
The third stage is the feature subsets fusion method. The bearing degradation indicator subset $F^*$ constructed in the first stage based on FCBF-AMB and the subsets $F_{FF}$ and $F_{FR}$ constructed in the second stage are cross-operated to construct the optimal indicator set $F_{opt}$, which characterizes the bearing degradation state.
In the above stages, three subsets of bearing degradation indicators are obtained. Subset $F^*$ is an initial subset of degradation indicators with strong correlation and low redundancy. Subset $F_{FF}$ is a strongly uncorrelated subset composed of features with low redundancy. Subset $F_{FR}$ is a strongly correlated subset consisting of features that have strong correlations with failure modes.
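The paper describes the fusion only as a "cross-operation" of the three subsets without fixing the exact set algebra. Purely as an illustration (the choice of operators below is our assumption, not the authors' specification), one plausible reading retains the FCBF-AMB features that either MIC subset also supports:

```python
def fuse_subsets(f_star, f_ff, f_fr):
    """Hypothetical cross-operation of the three indicator subsets:
    intersect the FCBF-AMB subset with the union of the two MIC subsets.
    The exact set algebra is an assumption for illustration only."""
    return set(f_star) & (set(f_ff) | set(f_fr))
```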

3.2. LSTM-AdaBoost Ensemble Learning and Prediction Model

After constructing the LSTM neural network model described in Section 2.1, the prediction ability of the model did not meet the requirements for robust prediction. AdaBoost is an iterative algorithm that was originally used mainly for classification problems, and it is sensitive to abnormal features. We therefore used the AdaBoost algorithm to enhance the LSTM network model and achieve robust prediction.
Suppose we want to make an m-step-ahead prediction for a time series. The iterative prediction strategy implemented in this paper can be expressed as $\hat{x}_{t+m} = f(x_t, x_{t-1}, \dots, x_{t-(p-1)})$, where $\hat{x}$ is the predicted value, $x_t$ is the actual value in period $t$, and $p$ denotes the lag order.
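The iterative strategy feeds each one-step prediction back in as the newest lag until the m-step horizon is reached. A minimal sketch, with a generic one-step `model` callable standing in for the trained LSTM predictor:

```python
def iterative_forecast(model, history, m, p):
    """m-step-ahead iterative prediction with lag order p.
    model: callable mapping a length-p window to a one-step prediction."""
    window = list(history[-p:])        # most recent p observations
    preds = []
    for _ in range(m):
        x_hat = model(window)          # one-step prediction from current lags
        preds.append(x_hat)
        window = window[1:] + [x_hat]  # slide window: prediction becomes a lag
    return preds
```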
In this study, the AdaBoost algorithm was introduced to integrate a set of LSTM predictors. The proposed LSTM-AdaBoost ensemble learning approach consists of seven steps as shown in Algorithm 3.
Algorithm 3 LSTM-AdaBoost algorithm.
Input: Training data set $S = \{(x_{t_1}, \hat{x}_{t_1}), (x_{t_2}, \hat{x}_{t_2}), \dots, (x_{t_N}, \hat{x}_{t_N})\}$, LSTM weak predictor.
Output: Strong predictor $P(x)$.
1: Initialize the weight vector. The weight distribution of the training data is initialized to $W = (\frac{1}{N}, \frac{1}{N}, \dots, \frac{1}{N})$, for $k = 1, 2, \dots, K$,
2: Given the weight distribution $W_k$, calculate the prediction error of predictor $P_k$ on the training data set as $\varepsilon_k^i = |P_k(x_i) - y_i| / E_k$, where $E_k = \sup_i |P_k(x_i) - y_i|$, so that the errors lie in the interval $[0, 1]$,
3: Calculate the total error on the training sample set: $\varepsilon_k = \sum_{i=1}^{N} W_k^i \varepsilon_k^i$,
4: Calculate the weight of the current predictor: $a_k = \frac{1}{2} \ln \frac{1}{\beta_k}$, where $\beta_k = \frac{\varepsilon_k}{1 - \varepsilon_k}$,
5: Update the weight distribution of the training data: $W_{k+1}^i = \frac{W_k^i \, \beta_k^{\varepsilon_k^i}}{Z(k)}$, where $Z(k) = \sum_{i=1}^{N} W_k^i \beta_k^{\varepsilon_k^i}$ is a normalization factor,
6: Repeat steps 2–5 until all $K$ LSTM predictors are obtained. Record the connection weights of the LSTM predictors $W = (w_1, w_2, \dots, w_K)$, where $w_k = a_k / \sum_{i=1}^{K} a_i$,
7: Build the final predictor by integrating the above trained predictors according to the connection weights to obtain the final strong predictor:
$$P(x) = w_1 P_1(x) + w_2 P_2(x) + \dots + w_K P_K(x).$$
Through the LSTM-AdaBoost ensemble learning approach, multiple weak predictors are integrated into one strong predictor, and each feature of the degradation indicator set is predicted by the strong predictor. Finally, the prediction results are ensembled to obtain the remaining useful life of the bearing at the next moment. The main steps are as follows:
First, the feature extraction method proposed above is used to construct the bearing degradation indicator set F_opt. Next, for each feature of F_opt, the LSTM-AdaBoost ensemble learning approach is adopted to obtain the predicted value f̂_{i,t+1} of that feature at moment t + 1.
Finally, the above prediction results are ensembled to obtain the predicted bearing RUL at moment t + 1, that is, RÛL_{t+1} = Σ_{i=1}^{n} f̂_{i,t+1}.
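The weighting scheme of Algorithm 3 can be sketched in an AdaBoost.R2 style as below. This is an illustrative implementation, not the authors' code: a generic weighted weak-learner trainer `fit_weak` stands in for the LSTM weak predictor, and the numerical guards (clamping ε_k, the zero-error fallback) are our own additions:

```python
import math

def adaboost_r2(train_x, train_y, fit_weak, K):
    """Illustrative sketch of Algorithm 3: boost K weak regressors into a
    strong predictor P(x) = sum_k w_k * P_k(x).

    fit_weak(train_x, train_y, W) must return a callable predictor trained
    under the sample-weight distribution W (an LSTM in the paper).
    """
    N = len(train_x)
    W = [1.0 / N] * N                               # step 1: uniform weights
    predictors, alphas = [], []
    for _ in range(K):
        P_k = fit_weak(train_x, train_y, W)
        abs_err = [abs(P_k(x) - y) for x, y in zip(train_x, train_y)]
        E_k = max(abs_err) or 1.0                   # step 2: scale errors into [0, 1]
        eps_i = [e / E_k for e in abs_err]
        eps_k = sum(w * e for w, e in zip(W, eps_i))  # step 3: total weighted error
        eps_k = min(max(eps_k, 1e-12), 1 - 1e-12)   # guard against 0 or 1
        beta_k = eps_k / (1 - eps_k)
        alphas.append(0.5 * math.log(1 / beta_k))   # step 4: predictor weight a_k
        W = [w * beta_k ** e for w, e in zip(W, eps_i)]  # step 5: re-weight samples
        Z = sum(W)
        W = [w / Z for w in W]
        predictors.append(P_k)
    total = sum(alphas)
    weights = [a / total for a in alphas]           # step 6: connection weights
    return lambda x: sum(w * P(x) for w, P in zip(weights, predictors))  # step 7
```

Any callable that trains on weighted samples and returns a predictor can be plugged in as `fit_weak`; the boosting loop itself is independent of the base learner.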

4. Experiment and Analysis

The run-to-failure data acquired from accelerated degradation tests of rolling element bearings were used to demonstrate the effectiveness of the proposed prediction approach. The proposed approach was compared with two other feature selection methods.

4.1. Data Description

The bearing testbed is shown in Figure 4. The XJTU-SY bearing datasets were provided by the Institute of Design Science and Basic Component at Xi’an Jiaotong University (XJTU), Shaanxi, China, and the Changxing Sumyoung Technology Co., Ltd. (SY), Zhejiang, China. The datasets contain complete run-to-failure data of 15 rolling element bearings acquired through numerous accelerated degradation experiments, in which the bearing faults occurred naturally rather than being seeded. The testbed was designed to conduct accelerated degradation tests of rolling element bearings under different operating conditions (different radial forces and rotating speeds). The tested bearings were of type LDK UER204.
This platform can conduct accelerated degradation tests of bearings to provide real experimental data that characterize the degradation of bearings during their whole operating life.
To acquire the run-to-failure data of the tested bearings, two type PCB 352C33 accelerometers were horizontally and vertically mounted on the bearing to monitor its vibration. The sampling frequency was set to 25.6 kHz. As shown in Figure 5, a total of 32,768 data points (i.e., 1.28 s) were recorded for each sampling, and the sampling period was 1 min. Detailed information about the platform and experiments can be found in [41].
As tabulated in Table 1, 15 rolling element bearings were tested under three different operating conditions. The first two bearings under each operating condition were used as the training set and the others as the testing set. Figure 6 shows the vibration signal of test bearing 1_1 over its whole life cycle. The amplitude of the vibration signal increases with time, indicating that the vibration signal carries useful information for bearing performance degradation assessment.

4.2. Experiment

4.2.1. Data Preprocessing and Feature Extraction

Because the vibration signal collected by the sensor contains important degradation information, an appropriate transformation of the vibration signal can reflect the degradation state of the bearings. To avoid information loss, multiple features in the time and frequency domains are extracted to form a feature set for selection. In addition, to accelerate the convergence of the prediction model and improve the prediction accuracy, all the features are normalized. The data preprocessing details are shown in Algorithm 4.
Algorithm 4 Data preprocessing.
Input: Data sample S = {s_1, s_2, …, s_n}, where n is the number of samples.
Output: Original feature set F after data preprocessing.
1: for s_i in S do
2:   Calculate each feature of s_i in the time and frequency domains; normalize the calculated features into [0, 1] to form the original feature set F = {f_1, f_2, …, f_m}, where m is the number of features.
3: end for
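The normalization step in Algorithm 4 can be sketched as plain min-max scaling into [0, 1], which is one common reading of the normalization applied here; the helper name is ours:

```python
def normalize_features(feature_matrix):
    """Min-max scale each feature column into [0, 1], as in Algorithm 4.

    feature_matrix: list of samples, each a list of m raw feature values.
    """
    cols = list(zip(*feature_matrix))        # transpose: one tuple per feature
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0              # guard against constant features
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_cols)]  # transpose back to samples
```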
When the bearings in mechanical equipment fail, the amplitude and probability distribution of the time-domain signal change, as do the signal frequency components, the energy of different frequency components, and the position of the main energy in the spectrum. These changes can effectively characterize the bearing health state and provide information about the noise in the bearing vibration signal [6,42]. Some features are useless, so choosing appropriate time-domain and frequency-domain features is the key to effectively predicting the bearing RUL. To obtain more degradation indicators (DIs) and fully reflect the running state of the bearings, feature parameters in both the time and frequency domains are used here.
Each vibration signal is processed to extract 12 time-domain features, such as the mean, variance, and kurtosis, and 13 frequency-domain features that characterize the degradation of bearing performance, as shown in Table 2. In this study, the time-domain and frequency-domain features were calculated using the feature parameters listed in Table 2 [43].
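A few of the Table 2 time-domain parameters can be computed directly from a raw sample. The sketch below (the helper name and dictionary keys are ours) covers the mean f0, standard deviation f1, RMS f3, peak f4, kurtosis f6, and crest factor f7 = f4/f3:

```python
import math

def time_domain_features(x):
    """Compute a subset of the Table 2 time-domain features for one
    vibration sample x(n), n = 1..N."""
    N = len(x)
    f0 = sum(x) / N                                            # mean
    f1 = math.sqrt(sum((v - f0) ** 2 for v in x) / (N - 1))    # standard deviation
    f3 = math.sqrt(sum(v * v for v in x) / N)                  # root mean square
    f4 = max(abs(v) for v in x)                                # peak amplitude
    f6 = sum((v - f0) ** 4 for v in x) / ((N - 1) * f1 ** 4)   # kurtosis
    return {"mean": f0, "std": f1, "rms": f3, "peak": f4,
            "kurtosis": f6, "crest": f4 / f3}
```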

4.2.2. Construction of Bearing Degradation Indicator Set

The features mentioned above represent bearing degradation from different perspectives. However, if all of these features were taken as input parameters to the model, the model might over-fit. Thus, before using these features as model inputs, it is desirable to select the most sensitive features from the feature set and remove the less indicative ones to improve the model accuracy [44].
In this paper, the three-stage feature selection method is used to select the sensitive features that characterize the bearing degradation state and to construct the bearing degradation indicator set. Taking operating condition A as an example, the construction process of the degradation indicator set is as follows. Figure 7 shows the sensitive features extracted by the first-stage FCBF feature extraction method. The symmetric uncertainty SU values are sorted in descending order; the threshold λ used in this paper is 0.1, i.e., features with an SU value greater than 0.1 are placed in the feature subset F. Then, the AMB de-redundancy method is applied to the features in subset F, yielding the initial bearing degradation indicator subset F* shown in Figure 8.
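The first-stage relevance score can be sketched as follows for discretized feature values, where SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). `fcbf_filter` is a hypothetical helper of ours that only applies the SU threshold λ = 0.1; the AMB de-redundancy step is omitted:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for discretized values."""
    h_x, h_y = entropy(x), entropy(y)
    mi = h_x + h_y - entropy(list(zip(x, y)))     # I(X; Y) via the joint entropy
    return 2 * mi / (h_x + h_y) if (h_x + h_y) else 0.0

def fcbf_filter(features, target, lam=0.1):
    """Keep features whose SU with the target exceeds lam, sorted descending."""
    scored = [(name, symmetric_uncertainty(col, target))
              for name, col in features.items()]
    return sorted([s for s in scored if s[1] > lam], key=lambda s: -s[1])
```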
In the second stage, the MIC method mentioned above is used to measure the correlations between features and failure modes and among the features themselves, constructing a subset F_FR with strong feature-failure mode correlations and a weakly redundant subset F_FF consisting of mutually uncorrelated features. The two subsets of bearing degradation indicators are shown in Figure 9a,b.
In the third stage, the three bearing degradation indicator subsets F*, F_FR, and F_FF are cross-operated using the fusion method to obtain the optimal degradation indicator set F_opt, which has strong correlation and low redundancy. The final bearing degradation indicator set consists of the eight features shown in Figure 10, which are applied to bearing remaining useful life prediction as the degradation indicators.
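The text does not spell out the exact cross-operation, so the sketch below shows one plausible set-based reading, not the authors' definition: a feature is kept if it is in the FCBF-AMB subset F* or supported by both MIC subsets (strong relevance F_FR and low redundancy F_FF):

```python
def fuse_subsets(f_star, f_fr, f_ff):
    """Illustrative fusion of the three indicator subsets by set operations:
    F_opt = F* union (F_FR intersect F_FF). One plausible reading only."""
    return sorted(set(f_star) | (set(f_fr) & set(f_ff)))
```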
The features selected by the proposed feature selection method are shown in Figure 11.

4.2.3. Train Prediction Model

After obtaining the optimal degradation indicator set revealing the state of bearing degradation, the prediction model is trained. The model is an LSTM network, and the AdaBoost algorithm is used to optimize the LSTM prediction model to form a strong predictor. The input of the model is the degradation indicators of the optimal set; the output is the bearing RUL. After training, the model is used to predict the bearing RUL.

4.3. Results and Analysis

To demonstrate the advantages of constructing the bearing degradation indicator set with the three-stage feature selection method proposed in this paper, different feature selection methods were applied to the bearings under the three different operating conditions. Each bearing in the test set was run 10 times, and the average prediction accuracy of the three test bearings under the same operating condition was computed. Figure 12a–c depict the average prediction accuracy and the number of selected features for operating conditions A, B, and C, respectively; there were three test bearings in each condition. As Figure 12 shows, the proposed method extracts fewer features than the other two feature selection methods while achieving relatively high accuracy. This is mainly because the proposed method cross-operates different subsets of degradation indicators, ensuring the robustness of the degradation indicator set while reducing the feature dimension.
The feature selection method proposed in this paper is based on correlation and redundancy measurements, aiming to reduce model complexity while ensuring the sensitivity and high contribution rate of the features. Minimum redundancy maximum relevance (mRMR) is likewise based on correlation and redundancy measurement; principal component analysis (PCA) is a dimensionality-reduction method with a significant effect on reducing the data dimension; and FCBF + Markov Blanket has also been applied [36]. Therefore, these methods were compared with the proposed method in this study. Based on the bearing degradation indicator sets constructed above, we used the LSTM neural network and the LSTM-AdaBoost ensemble algorithm to predict the RUL of bearing 1_3, bearing 2_3, and bearing 3_3 under the three operating conditions.
Figure 13a,b show the results of predicting the RUL of bearing 1_3 using different feature selection methods and prediction models. In the early stage of prediction, the results of all three methods deviate considerably, and the non-feature-selection method deviates most from the real RUL value, indicating that feature selection is necessary for RUL prediction. The proposed method is the first to fit near the real curve. Comparing the prediction effects of the different prediction models, we conclude that the AdaBoost algorithm improves the prediction accuracy. The prediction results of the other two bearings, shown in Figure 13c–f, are similar to those of bearing 1_3, further demonstrating that the proposed method is more robust and produces a better prediction effect under different operating conditions and degradation stages, and thereby confirming its effectiveness.
The accuracy of the model prediction is measured using the mean square error (MSE). Table 3 compares the MSE of no feature selection, PCA, the mRMR method, the FCBF + Markov Blanket method, and the proposed method. We predicted the bearings under the different operating conditions in the test set 10 times, and the MSE values of the three bearings under each operating condition were averaged to represent the results for that condition. To show the prediction effects of the different methods more clearly, taking bearing 1_3 as an example, we selected the prediction results at 12 moments, averaged over the 10 runs of the prediction model, and list for each method the absolute error = |Predicted RUL − Real RUL|. Table 4 provides the details of the predicted results.
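The two error measures reported in Tables 3 and 4 are straightforward; a minimal sketch (the function names are ours, and the test values below are taken from the bearing 1_3 rows of Table 4):

```python
def mse(real, predicted):
    """Mean square error, the accuracy measure used in Table 3."""
    return sum((r - p) ** 2 for r, p in zip(real, predicted)) / len(real)

def absolute_errors(real, predicted):
    """Absolute error |Predicted RUL - Real RUL| reported per moment in Table 4."""
    return [abs(p - r) for r, p in zip(real, predicted)]
```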
In Table 3 and Table 4, the MSE and absolute error of the non-feature-selection method are the largest because irrelevant and redundant data, and even noise, exist in the original data set. Without a feature selection or filtering process, outliers also contribute to the prediction model, which biases it considerably and reduces the prediction accuracy. This further proves that feature selection is critical in the bearing RUL prediction process. The two tables also show that PCA and mRMR do not perform well: at the initial stage of bearing degradation, the prediction model has not yet fully learned the degradation characteristics, resulting in a large error, and even afterwards the model is affected to some extent by weakly indicative or low-contribution degradation features. The prediction error of the FCBF + Markov Blanket method is higher than that of our proposed method, which shows that the more comprehensive the selected features, the more accurate the prediction results. Comparing the two prediction models, the error of the LSTM-AdaBoost model is significantly lower, which confirms that the LSTM-AdaBoost ensemble prediction method improves the prediction accuracy.
The proposed method can approximate the real RUL curve of the bearings because it is more robust: it avoids relying on a single feature selection method, whose sensitivity bias may inflate or deflate the contribution rate of some features under certain measurement standards. The AdaBoost algorithm is a robust boosting algorithm that further guarantees the generalization ability of the prediction model. The experimental results show that the proposed method has good practical value for bearing RUL prediction.

5. Conclusions

The bearing RUL prediction accuracy largely depends on the performance of the degradation indicator set. This paper proposed an ensemble learning method to improve the prediction accuracy of bearing RUL, focusing on the feature extraction and prediction modelling phases of the RUL prediction process. For the feature extraction phase, a three-stage feature selection method was proposed to construct the bearing degradation indicator set; for the prediction modelling phase, the AdaBoost algorithm was used to enhance the LSTM neural network. Finally, the features of the bearing degradation indicator set were input into the LSTM-AdaBoost prediction model for ensemble learning and robust prediction. The proposed method was verified on the XJTU-SY bearing datasets and compared with different feature selection methods and different prediction modelling methods. The results showed that the method can effectively predict bearing RUL.

Author Contributions

H.W. designed research and conceptualization; Q.T. performed research and verified the proposed method; H.W. and Q.T. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by National Natural Science Foundation of China under grant 71962004.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Boskoski, P.; Gasperin, M.; Petelin, D. Bearing fault prognostics based on signal complexity and Gaussian process models. In Proceedings of the 2012 IEEE Conference on Prognostics and Health Management (PHM), Denver, CO, USA, 18–21 June 2012. [Google Scholar]
  2. Rasmussen, C.E. Gaussian Processes in Machine Learning. In Advanced Lectures on Machine Learning; Bousquet, O., von Luxburg, U., Ratsch, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3176. [Google Scholar]
  3. Amirhossein, G.; Lee, H.H. Probabilistic frequency-domain discrete wavelet transform for better detection of bearing faults in induction motors. Neurocomputing 2016, 188, 206–216. [Google Scholar]
  4. Liu, J.; Wang, W.; Golnaraghi, F. An enhanced diagnostic scheme for bearing condition monitoring. IEEE Trans. Instrum. Meas. 2010, 59, 309–321. [Google Scholar]
  5. Sikorska, J.Z.; Hodkiewicz, M.; Ma, L. Prognostic modelling options for remaining useful life estimation by industry. Mech. Syst. Signal Process. 2011, 25, 1803–1836. [Google Scholar] [CrossRef]
  6. Guo, L.; Li, N.P.; Jia, F.; Lei, Y.G. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017, 240, 98–109. [Google Scholar] [CrossRef]
  7. Wang, D.; Shen, C.Q. An equivalent cyclic energy indicator for bearing performance degradation assessment. J. Vib. Control 2016, 22, 2380–2388. [Google Scholar] [CrossRef]
  8. Guo, L.; Lei, Y.G.; Li, N.P.; Yan, T. Machinery health indicator construction based on convolutional neural networks considering trend burr. Neurocomputing 2018, 292, 142–150. [Google Scholar] [CrossRef]
  9. Loutas, T.H.; Roulias, D.; Georgoulas, G. Remaining Useful Life Estimation in Rolling Bearings Utilizing Data-Driven Probabilistic E-Support Vectors Regression. IEEE Trans. Reliab. 2013, 62, 821–832. [Google Scholar] [CrossRef]
  10. Ren, L.; Sun, Y.Q.; Cui, J.; Zhang, L. Bearing remaining useful life prediction based on deep autoencoder and deep neural networks. J. Manuf. Syst. 2018, 48, 71–77. [Google Scholar] [CrossRef]
  11. Sutrisno, E.; Oh, H.; Vasan, A.S.S.; Pecht, M. Estimation of remaining useful life of ball bearings using data driven methodologies. In Proceedings of the 2012 IEEE Conference on Prognostics and Health Management, Denver, CO, USA, 18–21 June 2012; pp. 1–7. [Google Scholar]
  12. Caesarendra, W.; Widodo, A.; Thom, P.H.; Yang, B.S. Combined Probability Approach and Indirect Data-Driven Method for Bearing Degradation Prognostics. IEEE Trans. Reliab. 2011, 60, 14–20. [Google Scholar] [CrossRef]
  13. Si, X.S.; Wang, W.B.; Hu, C.H.; Zhou, D.H. Remaining useful life estimation—A review on the statistical data driven approaches. Eur. J. Oper. Res. 2011, 213, 1–14. [Google Scholar] [CrossRef]
  14. Liu, F.; Shen, C.Q.; He, Q.B.; Zhang, A. Wayside bearing fault diagnosis based on a data-driven Doppler effect eliminator and transient model analysis. Sensors 2014, 14, 8096–8125. [Google Scholar] [CrossRef] [Green Version]
  15. Krush, M.T.; Agnihotri, R.; Trainor, K.J. A Contingency Model of Marketing Dashboards and Their Influence on Marketing Strategy Implementation Speed and Market Information Management Capability. Eur. J. Mark. 2016, 50, 2077–2102. [Google Scholar] [CrossRef]
  16. Wilson, R.D. Using Clickstream Data to Enhance Business-to-Business Web Site Performance. J. Bus. Ind. Mark. 2010, 25, 177–187. [Google Scholar] [CrossRef]
  17. Laukens, K.; Eyckmans, M.; Neuter, N.D. Preparing students for the data-driven life science era through a real-world viral infection case. J. Biol. Educ. 2019, 1–10. [Google Scholar] [CrossRef]
  18. Irit, H.; Deeb, D.; Naim, S.; Elad, Y.T. Can internet search engine queries be used to diagnose diabetes? Analysis of archival search data. Acta Diabetol. 2019, 56, 1149–1154. [Google Scholar]
  19. Fu, C.; Chang, W.J.; Liu, W.Y.; Yang, S.L. Data-driven group decision making for diagnosis of thyroid nodule. Science China Inf. Sci. 2019, 62, 1–23. [Google Scholar] [CrossRef] [Green Version]
  20. Kumar, S.; Deshpande, A.; Ho, S.S.; Ku, J.S.; Sarma, S.E. Urban Street Lighting Infrastructure Monitoring using a Mobile Sensor Platform. IEEE Sens. J. 2016, 16, 4981–4994. [Google Scholar] [CrossRef]
  21. Dou, J.; Yunus, A.P.; Xu, Y.R.; Zhu, Z.F.; Chen, C.W. Torrential rainfall-triggered shallow landslide characteristics and susceptibility assessment using ensemble data-driven models in the Dongjiang Reservoir Watershed, China. Nat. Hazards 2019, 97, 579–609. [Google Scholar] [CrossRef]
  22. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436. [Google Scholar] [CrossRef]
  23. Ren, L.; Cui, J.; Sun, Y.Q.; Cheng, X.J. Multi-bearing remaining useful life collaborative prediction: A deep learning approach. J. Manuf. Syst. 2017, 43, 248–256. [Google Scholar] [CrossRef]
  24. Hu, C.H.; Pei, H.; Wang, Z.Q.; Si, X.S. A new remaining useful life estimation method for equipment subjected to intervention of imperfect maintenance activities. Chin. J. Aeronaut. 2018, 31, 514–528. [Google Scholar] [CrossRef]
  25. Sun, G.L.; Li, J.B.; Dai, J.; Song, Z.C. Feature selection for IoT based on maximal information coefficient. Future Gener. Comput. Syst. 2018, 89, 606–616. [Google Scholar] [CrossRef]
  26. Wang, F.T.; Liu, X.F.; Deng, G. Remaining Life Prediction Method for Rolling Bearing Based on the Long Short-Term Memory Network. Neural Process. Lett. 2019, 50, 2437–2454. [Google Scholar] [CrossRef]
  27. Zhao, L.; Wang, X. A deep feature optimization fusion method for extracting bearing degradation features. IEEE Access 2018, 6, 19640–19653. [Google Scholar] [CrossRef]
  28. Wang, D.; Tsui, K.; Miao, Q. Prognostics and Health Management: A Review of Vibration Based Bearing and Gear Health Indicators. IEEE Access 2018, 6, 665–676. [Google Scholar] [CrossRef]
  29. Tang, X.H.; Wang, J.C.; Lu, J.G.; Liu, G.K. Improving bearing fault diagnosis using maximum information coefficient based feature selection. Appl. Sci. Basel 2018, 8, 2143. [Google Scholar] [CrossRef] [Green Version]
  30. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  31. Tang, G.; Zhou, Y.G.; Wang, H.Q.; Li, G.Z. Prediction of bearing performance degradation with bottleneck feature based on LSTM network. In Proceedings of the 2018 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Houston, TX, USA, 14–17 May 2018; pp. 1–6. [Google Scholar]
  32. Zhang, B.; Zhang, S.H.; Li, W.H. Bearing performance degradation assessment using long short-term memory recurrent network. Comput. Ind. 2019, 106, 14–29. [Google Scholar] [CrossRef]
  33. Mao, W.T.; He, J.L.; Tang, J.M.; Li, Y. Predicting remaining useful life of rolling bearings based on deep feature representation and long short-term memory neural network. Adv. Mech. Eng. 2019, 10, 1–18. [Google Scholar] [CrossRef]
  34. Zhang, Y.Z.; Xiong, R.; He, H.W.; Pecht, M.G. Long short-term memory recurrent neural network for remaining useful life prediction of lithium-ion batteries. IEEE Trans. Veh. Technol. 2018, 67, 5695–5705. [Google Scholar] [CrossRef]
  35. Lei, Y.G.; Li, N.P.; Guo, L.; Li, N.B. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834. [Google Scholar] [CrossRef]
  36. Yu, L.; Liu, H. Efficient Feature Selection via Analysis of Relevance and Redundancy. J. Mach. Learn. Res. 2004, 5, 1205–1224. [Google Scholar]
  37. Yang, Y.M.; Pedersen, J.O. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, 8–12 July 1997; pp. 412–420. [Google Scholar]
  38. Koller, D.; Sahami, M. Toward optimal feature selection. In Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; pp. 284–292. [Google Scholar]
  39. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R. Detecting Novel Associations in Large Data Sets. Science 2011, 334, 1518–1524. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  40. Miguel, G.H.; Francisco, G.V.; Belen, M.B.; Moreno-Vega, J.M. High-dimensional feature selection via feature grouping: A variable neighborhood search approach. Inf. Sci. 2016, 326, 102–118. [Google Scholar]
  41. Wang, B.; Lei, Y.G.; Li, N.P.; Li, N.B. A Hybrid Prognostics Approach for Estimating Remaining Useful Life of Rolling Element Bearings. IEEE Trans. Reliab. 2018, 1–12. [Google Scholar] [CrossRef]
  42. Yu, J.B. Adaptive hidden Markov model-based online learning framework for bearing faulty detection and performance degradation monitoring. Mech. Syst. Signal Process. 2017, 83, 149–162. [Google Scholar] [CrossRef]
  43. Lei, Y.G.; He, Z.J.; Zi, Y.Y. A new approach to intelligent fault diagnosis of rotating machinery. Expert Syst. Appl. 2008, 35, 1593–1600. [Google Scholar] [CrossRef]
  44. Kundu, P.; Chopra, S.; Lad, B.K. Multiple failure behaviors identification and remaining useful life prediction of ball bearings. J. Intell. Manuf. 2017, 7, 1–13. [Google Scholar]
Figure 1. Recurrent neural network (RNN) expansion structure.
Figure 2. The architecture of a long short-term memory (LSTM) memory cell.
Figure 3. A flowchart of the proposed bearing degradation indicator set.
Figure 4. Bearing testbed.
Figure 5. Sampling setting for vibration signals.
Figure 6. Typical vertical vibration signals.
Figure 7. The sensitive features extracted and sorted in descending order in the feature subset F.
Figure 8. Initial bearing degradation indicator subset F*.
Figure 9. Initial bearing degradation indicator (a) subset F_FR and (b) subset F_FF.
Figure 10. Optimal degradation indicator set F_opt of operating condition A.
Figure 11. Features selected by the proposed method.
Figure 12. Average prediction accuracy and number of features selected under three operating conditions: (a) condition A, (b) condition B, (c) condition C.
Figure 13. RUL prediction results using three different feature selection methods for three bearings.
Table 1. Operating Conditions of the Tested Bearings.
Operating Condition | Rotating Speed (rpm) | Radial Force (kN) | Bearings Dataset
Condition A | 2100 | 12 | Bearing1_1, Bearing1_2, Bearing1_3, Bearing1_4, Bearing1_5
Condition B | 2250 | 11 | Bearing2_1, Bearing2_2, Bearing2_3, Bearing2_4, Bearing2_5
Condition C | 2400 | 10 | Bearing3_1, Bearing3_2, Bearing3_3, Bearing3_4, Bearing3_5
Table 2. The feature parameters.
Feature | Time-Domain Feature Parameter
F0 | f0 = (1/N) Σ_{n=1}^{N} x(n)
F1 | f1 = sqrt( Σ_{n=1}^{N} (x(n) − f0)² / (N − 1) )
F2 | f2 = ( (1/N) Σ_{n=1}^{N} sqrt(|x(n)|) )²
F3 | f3 = sqrt( (1/N) Σ_{n=1}^{N} x(n)² )
F4 | f4 = max |x(n)|
F5 | f5 = Σ_{n=1}^{N} (x(n) − f0)³ / ((N − 1) f1³)
F6 | f6 = Σ_{n=1}^{N} (x(n) − f0)⁴ / ((N − 1) f1⁴)
F7 | f7 = f4 / f3
F8 | f8 = f4 / f2
F9 | f9 = f3 / ( (1/N) Σ_{n=1}^{N} |x(n)| )
F10 | f10 = f4 / ( (1/N) Σ_{n=1}^{N} |x(n)| )
F11 | f11 = Σ_{n=1}^{N} x(n)²

Feature | Frequency-Domain Feature Parameter
F12 | f12 = (1/K) Σ_{k=1}^{K} s(k)
F13 | f13 = Σ_{k=1}^{K} (s(k) − f12)² / (K − 1)
F14 | f14 = Σ_{k=1}^{K} (s(k) − f12)³ / (K (sqrt(f13))³)
F15 | f15 = Σ_{k=1}^{K} (s(k) − f12)⁴ / (K f13²)
F16 | f16 = Σ_{k=1}^{K} f̃_k s(k) / Σ_{k=1}^{K} s(k)
F17 | f17 = sqrt( Σ_{k=1}^{K} (f̃_k − f16)² s(k) / K )
F18 | f18 = sqrt( Σ_{k=1}^{K} f̃_k² s(k) / Σ_{k=1}^{K} s(k) )
F19 | f19 = sqrt( Σ_{k=1}^{K} f̃_k⁴ s(k) / Σ_{k=1}^{K} f̃_k² s(k) )
F20 | f20 = Σ_{k=1}^{K} f̃_k² s(k) / sqrt( Σ_{k=1}^{K} s(k) · Σ_{k=1}^{K} f̃_k⁴ s(k) )
F21 | f21 = f17 / f16
F22 | f22 = Σ_{k=1}^{K} (f̃_k − f16)³ s(k) / (K f17³)
F23 | f23 = Σ_{k=1}^{K} (f̃_k − f16)⁴ s(k) / (K f17⁴)
F24 | f24 = Σ_{k=1}^{K} |f̃_k − f16|^{1/2} s(k) / (K sqrt(f17))

Here x(n) is the time-domain signal series for n = 1, 2, …, N, with N the number of sample points; s(k) is the frequency spectrum for k = 1, 2, …, K, with K the number of spectral lines; and f̃_k is the frequency value of the k-th spectral line.
Table 3. Comparison of prediction accuracy of LSTM and LSTM-AdaBoost at motor speeds of 2100, 2250, and 2400 rpm using different feature selection methods.
Operating Condition | Feature Selection Method | LSTM (MSE) | LSTM-AdaBoost (MSE)
Condition A, 2100 rpm | No feature selection | 471.28 | 317.43
Condition A, 2100 rpm | PCA | 359.44 | 244.06
Condition A, 2100 rpm | mRMR | 221.64 | 162.90
Condition A, 2100 rpm | FCBF + Markov Blanket | 82.96 | 67.09
Condition A, 2100 rpm | Proposed method | 19.68 | 10.02
Condition B, 2250 rpm | No feature selection | 322.81 | 263.26
Condition B, 2250 rpm | PCA | 206.92 | 141.32
Condition B, 2250 rpm | mRMR | 186.46 | 127.24
Condition B, 2250 rpm | FCBF + Markov Blanket | 46.62 | 41.18
Condition B, 2250 rpm | Proposed method | 13.29 | 7.06
Condition C, 2400 rpm | No feature selection | 419.36 | 179.02
Condition C, 2400 rpm | PCA | 143.52 | 97.21
Condition C, 2400 rpm | mRMR | 193.77 | 102.06
Condition C, 2400 rpm | FCBF + Markov Blanket | 64.27 | 28.64
Condition C, 2400 rpm | Proposed method | 21.49 | 15.06
Table 4. Comparison of prediction results of bearing 1_3 using different methods.
Each cell gives Predicted RUL (Absolute Error), both in minutes; "No FS" denotes no feature selection and "FCBF + MB" denotes FCBF + Markov Blanket.
Moment | Real RUL | No FS LSTM | No FS LSTM-AdaBoost | PCA LSTM | PCA LSTM-AdaBoost | mRMR LSTM | mRMR LSTM-AdaBoost | FCBF + MB LSTM | FCBF + MB LSTM-AdaBoost | Proposed LSTM | Proposed LSTM-AdaBoost
45 | 113 | 102.43 (10.57) | 124.7 (11.7) | 122.27 (9.27) | 119.3 (6.3) | 99.35 (13.65) | 118.92 (5.92) | 119 (6) | 110.32 (2.68) | 107 (6) | 116.61 (3.61)
55 | 103 | 113.7 (10.7) | 108.23 (5.23) | 98.3 (4.7) | 98.11 (4.89) | 95.87 (7.13) | 111.4 (8.4) | 99.3 (3.7) | 105.25 (2.25) | 105 (2) | 101 (2)
65 | 93 | 101.09 (8.09) | 99.64 (6.64) | 105 (12) | 97.04 (4.04) | 88.3 (4.7) | 98.18 (5.18) | 89.6 (3.4) | 95.77 (2.77) | 95.9 (2.9) | 95.28 (2.28)
75 | 83 | 92.14 (9.14) | 89.96 (6.96) | 86.6 (3.6) | 78.15 (4.85) | 79.17 (3.83) | 90.96 (7.96) | 90.2 (7.2) | 87.13 (4.13) | 86.5 (3.5) | 81.32 (1.68)
85 | 73 | 68.22 (4.78) | 67.8 (5.2) | 79.1 (6.1) | 69.32 (3.68) | 77 (4) | 75.82 (2.82) | 76.31 (3.31) | 74.29 (1.29) | 75.13 (2.13) | 72.91 (0.09)
95 | 63 | 67.04 (4.04) | 58.39 (4.61) | 69.4 (6.4) | 65.51 (2.51) | 60.4 (2.6) | 59.79 (3.21) | 61.4 (1.6) | 64.6 (1.6) | 64.02 (1.02) | 63.29 (0.29)
105 | 53 | 42.93 (10.07) | 56.77 (3.77) | 55.8 (2.8) | 54.91 (1.91) | 47.2 (5.8) | 57.03 (4.03) | 56 (3) | 54.94 (1.94) | 54.72 (1.72) | 54.41 (1.41)
115 | 43 | 37.91 (5.09) | 46.47 (3.47) | 47 (4) | 46.76 (3.76) | 40.8 (2.2) | 45.19 (2.19) | 42.11 (0.89) | 45.79 (2.79) | 42.09 (0.91) | 43.55 (0.55)
125 | 33 | 36.25 (3.25) | 38.06 (5.06) | 30.58 (2.42) | 31.77 (1.23) | 30.2 (2.8) | 35.27 (2.27) | 36.5 (3.5) | 34.18 (1.18) | 35.2 (2.2) | 34.29 (1.29)
135 | 23 | 28.03 (5.03) | 26.48 (3.48) | 24.9 (1.9) | 25.73 (2.73) | 24.6 (1.6) | 22.39 (0.61) | 26.17 (3.17) | 22.16 (0.84) | 24.8 (1.8) | 23.53 (0.53)
145 | 13 | 15.23 (2.23) | 15.5 (2.5) | 15.02 (2.02) | 14.7 (1.7) | 14.37 (1.37) | 15.91 (2.91) | 15.41 (2.41) | 12.4 (0.6) | 14.33 (1.33) | 13.6 (0.6)
155 | 3 | 4.29 (1.29) | 3.73 (0.73) | 4.02 (1.02) | 3.79 (0.79) | 3.77 (0.77) | 3.72 (0.72) | 3.68 (0.68) | 2.49 (0.51) | 2.63 (0.37) | 2.87 (0.13)
