A Novel Hybrid Model for the Prediction and Classification of Rolling Bearing Condition

Abstract: Rotating machinery is a key piece of equipment in many engineering operations, and vibration analysis is a powerful tool for monitoring its condition. Because vibration signals have the characteristics of time series, monitoring the condition of the vibration signal series is necessary to avoid catastrophic failure. To this end, this paper proposes an effective condition monitoring strategy under a hybrid method framework. First, we apply variational mode decomposition (VMD) to decompose the time-ordered data points into subseries, namely intrinsic mode functions (IMFs). Then, a hybrid prediction model, namely the autoregressive moving average (ARMA)-artificial neural network (ANN), is adopted to forecast the IMF series. Next, we select the sensitive modes that contain the prime information of the original signal and reflect the condition of the machinery. Subsequently, we apply a support vector machine (SVM) classification model to identify the multiple condition patterns based on the multi-domain features extracted from the sensitive modes. Finally, vibration signals from the Case Western Reserve University (CWRU) laboratory are used to verify the effectiveness of the proposed method. The comparison results demonstrate advantages in prediction and condition monitoring.


Introduction
In practical operation, machinery control technology [1–4], fault diagnosis, and condition monitoring [5,6] have attracted extensive research. Vibration analysis is a widely used tool for monitoring the condition of rotating machinery in industrial operation [7,8].
Abundant information in the time or frequency domain reflects the characteristics of a machine's condition; however, some of this information is not suitable for direct use. Signal decomposition methods split the original signal into subseries so that the signal can be analyzed more comprehensively, and many researchers have therefore been attracted to the field. A considerable number of techniques have been developed to address this issue, such as the wavelet transform (WT) [9,10]; however, the WT relies on a wavelet basis function that must be selected empirically, which makes the method non-adaptive. In contrast to the WT, empirical mode decomposition (EMD) was presented by Huang et al. [11,12]. EMD decomposes the original signal empirically, based on the inherent properties of the signal itself. Therefore, data from nonlinear and non-stationary processes can be decomposed by EMD, and physically meaningful subseries can be obtained thanks to its adaptive characteristics [13]. Regrettably, the endpoint effect in EMD is still unresolved. Ensemble empirical mode decomposition (EEMD), a typical noise-assisted data analysis method that has been widely applied in time-frequency analysis, was introduced by Wu and Huang [14] to alleviate the mode mixing problem that occurs in EMD [15]. Furthermore, Wang et al. [16] validated the superiority of EEMD over EMD. Because of their recursive decomposition, both EMD and EEMD impose a heavy computational load and are not suitable for decomposing signals that contain large amounts of data [17]. In 2013, Dragomiretskiy and Zosso proposed a new algorithm for decomposing signals adaptively, called variational mode decomposition (VMD) [18]. VMD is a fully non-recursive method with a more solid mathematical foundation [19].
Several studies have verified that VMD performs better than WT, EMD, and EEMD in terms of time-frequency analysis [20–22]. Thus, the original signal is decomposed into several subseries through VMD, with the aim of improving the accuracy and convenience of the downstream feature extraction and classification processing.
The vibration signal has the characteristics of a typical time series. Time series model-based techniques have been found to play a pivotal role in monitoring the condition of machinery based on vibration signals [23]. Therefore, an effective time series forecasting model, which can uncover potential statistical patterns by exploring functional relationships, is the key to monitoring the condition of industrial equipment. Furthermore, such patterns can give a valuable early warning. However, acquiring reliable information in practical operations is a challenge. For time series analysis and modeling, ARMA is one of the most popular approaches and has made considerable achievements in forecasting in various fields, such as finance [24,25], engineering [26], and many others [27,28]. In addition, ARMA has been reported to be more accurate in prediction than some other popular machine learning methods, such as the multi-layer perceptron, SVM, and long short-term memory [29,30]. Although ARMA copes well with linear analyses, based on the assumption that the current value is a linear function of past values and white noise, its limitations in nonlinear analysis hinder its application. Considerable research has demonstrated that ARMA performs poorly in modeling nonlinear real-world problems [31]. For nonlinear time series modeling, ANN is one of the most widely used algorithms [32,33]; however, the ANN model alone cannot give full consideration to both linear and nonlinear patterns simultaneously [31,34]. In summary, no single model performs well in every situation. Consequently, considering the strength of ARMA in linear modeling and of ANN in nonlinear time series modeling, their combination, namely ARMA-ANN, is constructed as the prediction model for the subseries in this paper. The motivation for applying the hybrid model is as follows [34].
First, instability and uncertainty are common in engineering applications. Second, real-world time series demonstrate complex characteristics. Third, an individual model rarely performs as well as a hybrid technique, which integrates the advantages of each type of model. Therefore, using the hybrid technique improves forecasting performance.
Various measurement indicators have been considered in vibration signal analysis and fault feature extraction [35,36]. Feature extraction plays an important role in comprehensively describing the condition information of the original signal and has therefore been widely studied [37,38]. The multi-domain features extracted from the sensitive modes contain the main information of the original signal. The motivation for selecting sensitive IMFs is to avoid the interference of redundant information [39]. SVM is a promising method that can solve the classification problem of small samples [40,41]. Therefore, SVM has been extensively used in mechanical fault classification [39,42].
This work proposes an effective hybrid condition monitoring strategy that achieves state-of-the-art performance and thereby supports predictive maintenance [43]. First, we apply VMD to decompose the original signals into several subseries and predict the IMFs by ARMA-ANN. Then, the time-frequency domain (T-F) features are extracted from the sensitive IMFs. Finally, an SVM classifier is used to identify the condition patterns of the rolling bearing based on the T-F features. The performance of our method is verified through experimental signals from the CWRU laboratory.
Our work mainly provides the following contributions: First, the key information is sufficiently captured by decomposing the original signal into subseries through VMD, which improves the accuracy of feature extraction and classification. Then, the subseries are fed into the hybrid prediction model to provide a reliable trend of the original signal. Next, excellent accuracy in identifying the patterns of the rolling bearing is obtained based on the comprehensive T-F features. Finally, a comparative experimental analysis verifies that our method achieves excellent performance while simultaneously relieving time resource problems.
The rest of this study is organized as follows: Section 2 presents the theoretical background. Section 3 introduces the proposed framework in detail. Experimental results and comparisons are presented in Section 4, and conclusions are given in Section 5.

Variational Mode Decomposition
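VMD is an adaptive, fully non-recursive signal decomposition method. Its core idea is to determine the K decomposed IMFs, together with their center frequencies and bandwidths, by iteratively searching for the optimal solution of a variational model. The constrained variational model of VMD is as follows:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = x(t) \tag{1}$$
where $x(t)$ is the original signal, and $\{u_k\} := \{u_1, u_2, \cdots, u_K\}$ and $\{\omega_k\} := \{\omega_1, \omega_2, \cdots, \omega_K\}$ represent the set of all modes and the corresponding center frequencies, respectively. $\delta(t)$ is the Dirac distribution, $\partial_t$ is the partial derivative with respect to $t$, $*$ denotes convolution, and $K$ is the number of modes to be decomposed (a positive integer). To solve the constrained model, the augmented Lagrangian method is introduced to transform the constrained problem into an unconstrained one:
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| x(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, x(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$
where $\alpha$ is the quadratic penalty factor and $\lambda(t)$ is the Lagrange multiplier. The alternating direction method of multipliers is adopted to find the optimal $u_k$, $\omega_k$, and $\lambda(t)$, whose updates can be written as:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{x}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha (\omega - \omega_k^{n})^2}, \qquad \omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}, \qquad \hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{x}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{3}$$
where $\hat{x}(\omega)$, $\hat{u}_k(\omega)$, and $\hat{\lambda}(\omega)$ denote the Fourier transforms of the corresponding variables, $\hat{u}_i(\omega)$ in the sum is the most recent estimate of mode $i$, $\tau$ is the update step of the Lagrange multiplier, and $n$ is the iteration number. Ref. [18] provides more details on the VMD algorithm.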
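The iterative search described above can be illustrated with a minimal NumPy sketch. It is a simplification written for this explanation, not the authors' implementation: it works on the one-sided spectrum, omits the mirror extension used by the reference algorithm in [18], and the function name `vmd_sketch`, its default parameter values, and the toy two-tone signal are all assumptions.

```python
import numpy as np

def vmd_sketch(x, K=2, alpha=2000.0, tau=0.0, n_iter=500, tol=1e-9):
    """Simplified VMD on the one-sided spectrum (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n)                  # cycles per sample, 0 .. 0.5
    x_hat = np.fft.rfft(x)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = (np.arange(K) + 0.5) * 0.5 / K      # evenly spaced initial centers
    lam_hat = np.zeros(freqs.size, dtype=complex)

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Mode update: filter the residual around the current center
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (x_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center update: power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-30)
        # Multiplier update on the reconstruction constraint (tau = 0 disables it)
        lam_hat = lam_hat + tau * (x_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (
            np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=n, axis=1)    # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]

# Toy signal (assumption): two tones at 0.05 and 0.30 cycles per sample
t = np.arange(1000)
signal = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.30 * t)
modes, centers = vmd_sketch(signal, K=2)
```

On this toy input the two recovered center frequencies should settle near the two tone frequencies. Real vibration data would need the boundary handling and parameter choices of the full algorithm.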

Prediction Model
This section introduces two popular forecasting models, ARMA and ANN, which are used to predict the IMFs in our research.

Autoregressive Moving Average Model
The ARMA model is designed for variables whose future value is a linear function of past observations. ARMA is first used to predict the IMF series. Because the response of an IMF, $Y_{lt}$, at time $t$ is related to its responses at previous times, the ARMA(p, q) model, written here in its standard form, provides high-accuracy prediction:
$$Y_{lt} = \sum_{i=1}^{p} \beta_i Y_{l(t-i)} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{4}$$
where $\beta_i$ ($i = 1, 2, \ldots, p$) and $\theta_j$ ($j = 1, 2, \ldots, q$) are the autoregression (AR) and moving average (MA) coefficients, respectively, and $\varepsilon_t$ is white noise. $p$ and $q$ are the orders of the ARMA model, which are identified based on Akaike's information criterion and the Bayesian information criterion in this paper. When $q = 0$, the model is reduced to an AR model of order $p$.
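The order selection by information criteria mentioned above can be sketched as follows. To stay short and dependency-light, the sketch covers only the pure AR case (q = 0) fitted by least squares, with the Gaussian forms of AIC and BIC. The helper names, the maximum order, and the synthetic AR(2) series standing in for an IMF are assumptions, not details from the paper.

```python
import numpy as np

def lagged_matrix(y, p, start):
    """Rows [y[t-1], ..., y[t-p]] for t = start .. len(y)-1."""
    return np.column_stack([y[start - i:len(y) - i] for i in range(1, p + 1)])

def fit_ar(y, p, start):
    """Least-squares AR(p) fit; returns coefficients and residual sum of squares."""
    X = lagged_matrix(y, p, start)
    target = y[start:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ beta) ** 2))
    return beta, rss

def select_ar_order(y, max_p=8):
    """Score AR(1..max_p) on the same targets with Gaussian AIC and BIC."""
    y = np.asarray(y, dtype=float)
    n = len(y) - max_p                      # common number of fitted samples
    scores = {}
    for p in range(1, max_p + 1):
        _, rss = fit_ar(y, p, start=max_p)
        aic = n * np.log(rss / n) + 2 * p
        bic = n * np.log(rss / n) + p * np.log(n)
        scores[p] = (aic, bic)
    best_aic = min(scores, key=lambda p: scores[p][0])
    best_bic = min(scores, key=lambda p: scores[p][1])
    return best_aic, best_bic, scores

# Synthetic AR(2) series (assumption) standing in for one IMF
rng = np.random.default_rng(0)
y = np.zeros(2000)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

best_aic, best_bic, scores = select_ar_order(y)
beta, _ = fit_ar(y, 2, start=2)
```

All candidate orders are scored on the same target samples so that the criteria are comparable. A full ARMA fit with q > 0 would need a maximum-likelihood routine from a time series library.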

Artificial Neural Network Model
ANN is a tool that can learn complex interactions between the input and output from the provided training data. It also provides a flexible architecture for nonlinear modeling in fault detection, diagnosis [44], and prediction [45]. The framework of an ANN depends on the characteristics of the data. Thus, the numbers of layers and neurons are configured based on the nature of the input data. The widely used three-layer ANN model presented in Figure 1 is applied to model time series data, in which the neurons are acyclically linked. The nonlinear function $f$ mapping the sequence $Y_{n(t-1)}, \ldots, Y_{n(t-N)}$ to $Y_{nt}$ is expressed as follows:
$$\hat{Y}_{nt} = \omega_0 + \sum_{j=1}^{H} \omega_j f\left( \omega_{0j} + \sum_{i=1}^{N} \omega_{ij} Y_{n(t-i)} \right) + e_t \tag{5}$$
where $\hat{Y}_{nt}$ is the prediction result at any given time $t$; $\omega_{ij}$ ($i = 0, 1, \ldots, N$; $j = 1, 2, \ldots, H$) and $\omega_j$ ($j = 0, 1, \ldots, H$) are the connection weights of the network; $H$ is the number of hidden neurons; and $N$ is the number of input nodes. $f$ represents a sigmoid function in this research, while $e_t$ is a noise or error term.
Additionally, to mitigate local minima and overfitting, k-fold cross-validation is applied when training the ANN. Finally, we evaluate the trained ANN on the test data [46].
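As an illustration of the three-layer structure described above, the following sketch evaluates a single forward pass: N lagged inputs, H sigmoid hidden neurons, and one linear output. The weights are random placeholders rather than trained values, and the function name `ann_forward`, the sizes N = 4 and H = 3, and the toy series are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(lags, w_in, b_hidden, w_out, b_out):
    """One-step prediction of a three-layer network.

    lags: (N,) past values, w_in: (N, H), b_hidden: (H,),
    w_out: (H,), b_out: scalar output bias.
    """
    hidden = sigmoid(lags @ w_in + b_hidden)   # hidden-layer activations
    return float(hidden @ w_out + b_out)       # linear output neuron

rng = np.random.default_rng(0)
N, H = 4, 3                                    # hypothetical layer sizes
w_in = rng.normal(size=(N, H))
b_hidden = rng.normal(size=H)
w_out = rng.normal(size=H)
b_out = 0.1

series = np.sin(0.3 * np.arange(50))           # toy series (assumption)
lags = series[-1:-N - 1:-1]                    # most recent value first
pred = ann_forward(lags, w_in, b_hidden, w_out, b_out)

# Sanity check: with zero input weights every hidden neuron outputs 0.5
check = ann_forward(lags, np.zeros((N, H)), np.zeros(H), np.ones(H), 0.25)
```

In practice the weights would be fitted by a training procedure, for instance the cross-validated training the paper mentions, which this sketch does not attempt.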

Hybrid Prediction Model
We treat the original time series $Y_t$ as the sum of two components, a linear component $Y_{lt}$ and a nonlinear component $Y_{nt}$. The residual $r_t = Y_t - Y_{lt}$ is the difference between the original data and the linear component. Thus, the hybrid model is constructed as:
$$\hat{Y}_t = \hat{Y}_{lt} + \hat{Y}_{nt} \tag{6}$$
where $\hat{Y}_t$ is the final prediction result, $\hat{Y}_{lt}$ is the ARMA prediction result, $Y_{nt}$ is the residual of the ARMA model, and $\hat{Y}_{nt}$ is the prediction result of the ANN model.
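The two-stage idea, a linear model followed by a nonlinear model on its residuals, can be sketched as below. Two substitutions are made to keep the example compact and deterministic, and neither is the paper's procedure: the linear stage is a least-squares AR(p) fit instead of a full ARMA fit, and the nonlinear stage is a sigmoid hidden layer with random fixed weights whose output weights are solved by least squares instead of being trained iteratively. All names and the synthetic series are assumptions.

```python
import numpy as np

def lag_matrix(y, p):
    """Rows [y[t-1], ..., y[t-p]] for t = p .. len(y)-1."""
    return np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])

def fit_linear(y, p):
    """Least-squares AR(p), a stand-in for the ARMA stage."""
    X = lag_matrix(y, p)
    beta, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return X @ beta                            # in-sample prediction, t >= p

def fit_nonlinear(r, n_lags, hidden, rng):
    """Random sigmoid hidden layer with least-squares output weights."""
    X = lag_matrix(r, n_lags)
    w = rng.normal(size=(n_lags, hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    H = np.column_stack([H, np.ones(len(H))])  # output bias column
    out, *_ = np.linalg.lstsq(H, r[n_lags:], rcond=None)
    return H @ out                             # in-sample residual prediction

# Synthetic series (assumption) with a linear and a nonlinear dependence
rng = np.random.default_rng(1)
y = np.zeros(600)
for t in range(2, len(y)):
    y[t] = 0.5 * y[t - 1] + 0.8 * np.sin(y[t - 2]) + 0.1 * rng.normal()

p, n_lags = 2, 2
y_lin = fit_linear(y, p)                       # linear-stage prediction
r = y[p:] - y_lin                              # residual of the linear stage
r_hat = fit_nonlinear(r, n_lags, hidden=4, rng=rng)

actual = y[p + n_lags:]
hybrid = y_lin[n_lags:] + r_hat                # sum of the two predictions
rmse_linear = np.sqrt(np.mean((actual - y_lin[n_lags:]) ** 2))
rmse_hybrid = np.sqrt(np.mean((actual - hybrid) ** 2))
```

Because the output weights are a least-squares solution, the in-sample hybrid error cannot exceed the linear-stage error here. That guarantee is a property of this shortcut and says nothing about out-of-sample accuracy, which would have to be checked on held-out data.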

Feature Extraction
Both time-domain and frequency-domain indicators are affected by the equipment's operating condition [19]. Table 1 presents the features extracted from the sensitive IMFs, which are the subseries that contain 90% of the energy of the original signal. The energy of each IMF is calculated as described in Ref. [47].
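A minimal sketch of the energy-based selection and of a few typical indicators follows. Table 1 is not reproduced in this text, so the four features computed here (RMS, kurtosis, peak-to-peak value, spectral centroid) are only common examples and not necessarily the paper's feature set. The energy is taken as the sum of squared samples, the function names are hypothetical, and the three toy modes are assumptions.

```python
import numpy as np

def select_sensitive_modes(modes, threshold=0.90):
    """Smallest set of modes whose summed energy ratio reaches the threshold."""
    energy = np.sum(np.asarray(modes, dtype=float) ** 2, axis=1)
    ratio = energy / energy.sum()
    order = np.argsort(ratio)[::-1]                  # largest energy first
    cumulative = np.cumsum(ratio[order])
    n_keep = int(np.searchsorted(cumulative, threshold) + 1)
    return np.sort(order[:n_keep]), ratio

def time_frequency_features(mode, fs):
    """A few example time-domain and frequency-domain indicators."""
    mode = np.asarray(mode, dtype=float)
    centered = mode - mode.mean()
    spectrum = np.abs(np.fft.rfft(mode)) ** 2
    freqs = np.fft.rfftfreq(mode.size, d=1.0 / fs)
    return {
        "rms": float(np.sqrt(np.mean(mode ** 2))),
        "kurtosis": float(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2),
        "peak_to_peak": float(mode.max() - mode.min()),
        "freq_centroid": float(np.sum(freqs * spectrum) / np.sum(spectrum)),
    }

# Toy modes (assumption): three sinusoids sampled at 12 kHz
fs = 12000
t = np.arange(1200) / fs
modes = np.vstack([
    0.2 * np.sin(2 * np.pi * 100 * t),
    1.0 * np.sin(2 * np.pi * 1000 * t),
    0.8 * np.sin(2 * np.pi * 3000 * t),
])

keep, ratio = select_sensitive_modes(modes)
feats = time_frequency_features(modes[1], fs)
```

For the toy modes, the weakest sinusoid carries only a small share of the total energy, so the selection keeps the other two.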

Support Vector Machine
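SVM is one of the most popular classification algorithms in the field of multi-classification [39,40,48]. The prime purpose of SVM is to find the separating hyperplane that maximizes the margin between classes. The training set of SVM is denoted by
$$T = \{(x_i, y_i)\}_{i=1}^{n} \tag{7}$$
where $x_i$ and $y_i$ are the sample data and category, respectively ($y_i \in \{-1, +1\}$ in the binary case). The hyperplane should separate the data correctly with the maximum margin. The optimal separating hyperplane is described by:
$$y_i (\omega \cdot x_i + b) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, n \tag{8}$$
where $b$ is the bias, $\omega$ denotes the weight vector, and $\xi_i \ge 0$ ($i = 1, 2, \ldots, n$) represents a relaxing (slack) factor, which tolerates samples that violate the margin. We rewrite the problem (8) as the following minimization problem:
$$\min_{\omega, b, \xi} \; \frac{1}{2} \|\omega\|^2 + \lambda \sum_{i=1}^{n} \xi_i \tag{9}$$
where $\lambda$ is the penalty factor, which balances the accuracy of the SVM classifier against its complexity. The classical hinge loss function is applied in this research, that is, $\xi_i = \max\left(0, 1 - y_i(\omega \cdot x_i + b)\right)$. Thus, the dual optimization problem of SVM is described as:
$$\max_{a} \; \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} a_i y_i = 0, \; 0 \le a_i \le \lambda \tag{10}$$
where $a_i$ are the dual variables, $K(x_i, x_j)$ denotes a kernel function, $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, and $\cdot$ is the inner product operation. The SVM classifier with a radial basis function (RBF) kernel, $K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)$, $\gamma > 0$, is applied in this work, where $\gamma$ is a hyper-parameter that affects the performance of the SVM. Thus, the classification is described as:
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} a_i y_i K(x_i, x) + b \right) \tag{11}$$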
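To make the role of the RBF kernel and of γ concrete, the sketch below evaluates the kernel matrix and the standard kernel decision rule for a binary problem. The dual coefficients, bias, support vectors, and γ value are hand-picked placeholders, not the result of solving the dual problem, and the function names are hypothetical. A trained classifier would obtain these values from an SVM solver.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clip tiny negative round-off

def svm_decide(x_new, support_x, support_y, coef, b, gamma):
    """Sign of the kernel expansion sum_i coef_i * y_i * K(x_i, x) + b."""
    k = rbf_kernel(np.atleast_2d(x_new), support_x, gamma)
    return np.sign(k @ (coef * support_y) + b)

# Placeholder model (assumption): two support vectors, one per class
support_x = np.array([[0.0, 0.0], [2.0, 2.0]])
support_y = np.array([-1.0, 1.0])
coef = np.array([1.0, 1.0])
b, gamma = 0.0, 0.5

K = rbf_kernel(support_x, support_x, gamma)
labels = svm_decide(np.array([[0.1, 0.0], [1.9, 2.1]]),
                    support_x, support_y, coef, b, gamma)
```

A larger γ makes the kernel decay faster with distance, so each support vector influences a smaller neighborhood.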

Proposed Architecture
An architecture for the condition monitoring of the rolling bearing is shown in Figure 2. It is divided into four submodules: VMD decomposition, ARMA-ANN prediction, T-F feature extraction, and SVM classification.
The specific details are as follows.

Data Description
In this section, the motor bearing data set from the CWRU laboratory is used to demonstrate the advantages of our proposed method. More details on the experiment can be found in [19,49].

Experimental Results and Analysis
In this paper, data sets for four classes of bearing operating conditions were used: normal (NS), inner race fault (IF), outer race fault (OF), and ball fault (BF). Each data set consisted of 12,000 points. The working speed, fault size, load, and sampling frequency were 1730 r/min, 0.007 in, 3 HP, and 12 kHz, respectively.

Decomposing the Vibration Signal Using VMD
Following the steps in Section 3, a group of IF data was analyzed as an example. Table 2 shows the relevant VMD parameters, set according to [19]. The time waveforms of the original signal and of the modes obtained by VMD are presented on the left of Figure 3, and the corresponding frequency spectra are presented on the right of the figure.

Prediction Based on Hybrid Prediction Model
Based on the aforementioned VMD, the original signal was decomposed into several IMF time series, and the hybrid prediction model was used to obtain the trend of each subseries. First, the six IMF series obtained by VMD were subjected to ARMA. We used the AR model of order 12 (AR(12)) and the MA model of order 16 (MA(16)) for IMF1. Then, the three-layered 1 × 4 × 2 ANN architecture was used for the residuals. To further quantify the performance of the prediction models, Table 3 compares three quantitative criteria: the root-mean-square error (RMSE), $\sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}$; the mean absolute percentage error (MAPE), $\frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{e_t}{Y_t}\right|$; and the mean absolute error (MAE), $\frac{1}{n}\sum_{t=1}^{n}|e_t|$, where $e_t$ denotes the error between the expected value and the prediction. We further compared the prediction of the subseries with that of the original data series. The prediction of the IF original data series for the ARMA-ANN obtained by VMD is shown in Figure 10. The quantitative evaluation criteria shown in Table 4 demonstrate that predicting the subseries obtained by VMD achieves better performance than predicting the original data series. This result motivated us to decompose the original data series into subseries to improve the accuracy of the results. Furthermore, our hybrid prediction method with VMD could provide a superior scheme for machinery condition monitoring.
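The three criteria named above can be computed as in the following sketch. The function name and the three-point example are assumptions chosen so that the values can be checked by hand.

```python
import numpy as np

def prediction_errors(actual, predicted):
    """RMSE, MAPE (in percent), and MAE of a prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    e = actual - predicted                      # error between expectation and prediction
    return {
        "RMSE": float(np.sqrt(np.mean(e ** 2))),
        # MAPE divides by the actual value, so it is undefined where actual == 0
        "MAPE": float(100.0 * np.mean(np.abs(e / actual))),
        "MAE": float(np.mean(np.abs(e))),
    }

# Hand-checkable example: errors are 1, 0, and -2
scores = prediction_errors([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
```

Here the errors are 1, 0, and −2, so the MAE is 1, the RMSE is the square root of 5/3, and the MAPE is 30%. Because vibration signals oscillate around zero, MAPE values on such data can be dominated by samples close to zero and should be read with that in mind.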

Feature Extraction
Next, the features were extracted from the sensitive IMFs. The power spectral density (PSD) distribution of the original signal is shown in Figure 11, which demonstrates that the sensitive modes are the 2nd to the 6th IMFs, whereas the 1st IMF is redundant for the downstream feature extraction and classification processing. Then, we extracted the T-F features from the 2nd to the 6th IMFs to form a new feature set, which was then normalized.
Figure 11. The energy ratio distribution of subseries for IF data by VMD decomposition.

Classification Based on SVM
In this subsection, the condition patterns are identified using SVM based on the normalized feature set. The classification performance depends strongly on the feature set. Consequently, the time-domain features and the frequency-domain features were combined to construct the input data vectors. Then, a normalized 10-dimensional input vector was fed into the SVM for training and the downstream classification processing. Figures 12-14 show the recognition results based on the features of different domains. The quantitative classification performance of the SVM is compared in Table 5, which shows the classification results of the condition patterns using the features of different domains.

Discussion
Deep learning-based methods have made considerable achievements in time series forecasting [52] and classification [53]; however, these methods depend heavily on the scale of the input. It is worth noting that the accuracy of deep learning-based methods is associated with the scale level: the larger the scale, the more accurate and stable the performance becomes. Therefore, achieving high performance is costly in terms of training and testing time, and the computational burden grows with the scale level. Moreover, most deep learning-based methods cannot deal with small data samples; in other words, accuracy and stability cannot be guaranteed in this case.
This research uses 12,000 sample points to verify the performance of the prediction and classification. The biggest challenge is that much less data are available for training, because the proposed method is designed to monitor the operating condition of the rolling bearing. Moreover, our strategy for condition monitoring is also effective while simultaneously relieving time resource problems. The parameters of the comparative methods are as follows. The LSTM [52] is used to forecast the future trend of the subseries, with "Sigmoid" and "Adam" as the activation function and optimizer, respectively. The network is updated by minimizing the mean square error of e t with a gradient descent algorithm. The configurations of the three one-dimensional convolutional layers of the CNN [53] used for classification are: Conv1D (16, 64, 16), Conv1D (32, 3, 1), Conv1D (64, 3, 1). Batch normalization is carried out after each convolution operation, and "ReLU" is applied as the activation function. Then, a (2 × 1) kernel is employed in the max pooling operation. To illustrate more clearly the advantages of our proposed method on time series forecasting using the same small sample data set, a comparison of quantitative criteria for the prediction results is presented in Table 6. Our proposed method outperforms the CNN-based method in classification, whose average accuracy is higher than 91.88%. Therefore, the performance of both prediction and classification confirms the advantages of our proposed method.

Conclusions
In this paper, an effective condition monitoring scheme is designed for rolling bearings. First, the original signal can be sufficiently described by IMFs obtained by VMD. Next, each IMF is forecasted through the hybrid prediction model ARMA-ANN. Moreover, the sensitive IMFs, which contain over 90% of the energy of the original signal, are selected to provide downstream feature extraction and classification processing. Then, the T-F features, which can represent the comprehensive condition characteristics of the rolling bearing, are extracted and normalized. Finally, the normalized T-F features are fed into the SVM classifier to identify the operating patterns of the rolling bearing. The performance of the proposed condition monitoring method is verified by the motor-bearing data set of the CWRU laboratory.
Note that, first, decomposing the original signal into subseries by VMD makes the analysis of the operating condition more thorough and convenient. Second, the hybrid prediction model enhances the accuracy of prediction. Third, the condition patterns are identified accurately using an SVM classifier based on the T-F features. Additionally, the proposed method balances performance and time resources well. Thus, it can be concluded that our proposed method under the hybrid framework has potential value for monitoring machinery condition.
Author Contributions: Conceptualization, A.W.; methodology, A.W.; software, A.W. and B.X.; validation, A.W. and B.X.; formal analysis, A.W.; investigation, A.W.; data curation, A.W. and B.X.; writing-original draft preparation, A.W. and B.X.; writing-review and editing, Y.L., Z.Y., C.Z. and Z.G.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found here: https://engineering.case.edu/bearingdatacenter (accessed on 5 April 2022).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: