A Comparative Study to Predict Bearing Degradation Using Discrete Wavelet Transform (DWT), Tabular Generative Adversarial Networks (TGAN) and Machine Learning Models

Abstract: Prognostics and health management (PHM) is a framework for identifying impending damage before it occurs, which reduces both maintenance costs and safety hazards. Based on the data collected in condition monitoring, the degradation of the part is predicted. Studies show that most failures are caused by faults in rolling element bearings, which highlights that the bearing is one of the most important mechanical components of any machine. Thus, it becomes important to monitor bearing degradation to make sure that the bearing is utilized properly. Generally, machine learning (ML) or deep learning (DL) techniques are utilized to predict bearing degradation using a data-driven approach, where signals are captured from the machine. A large amount of data is needed to apply either ML or DL techniques, but it is difficult to collect that amount of data directly from any machine. In this study, health assessment is carried out using the correlation coefficient to divide the bearing life into two degradation stages. The raw signal is processed using discrete wavelet transform (DWT), where mutual information (MI) is used to rank and select the base wavelet, after which tabular generative adversarial networks (TGAN) are used to generate artificial coefficients. Statistical features are calculated from the real data (DWT coefficients) and the artificial data (generated by TGAN). The constructed feature vector is then used as an input to train machine learning models, namely ensemble bagged tree (EBT) and Gaussian process regression with the squared exponential kernel function (SEGPR), to estimate bearing degradation conditions. Both machine learning models were validated on the publicly available experimental data of the FEMTO bearing dataset. The results showed that the developed EBT and SEGPR models accurately predicted the bearing degradation conditions, with a lowest average RMSE of 0.0045 and a lowest average MAE of 0.0037.


Introduction
In the past decade, prognostics and health management (PHM) methods have received wide attention due to an increase in prediction accuracy, which results in a reduction of maintenance costs [1,2]. Vibration in the bearing generates a significant amount of noise and can lead to degradation of the product quality. These vibrations are generated by two types of defects: distributed defects, where surface roughness, off-size rolling elements and misaligned races play a role, and local defects, where cracks, spalls and corrosion play a role [3,4]. Bearing degradation is a part of prognostics, in which degradation performance is studied using a data-driven approach, a model-based approach, or a combination of both (hybrid), in order to identify the forthcoming failure of a machine part [5,6].
As per the finding by P.F. Albrecht et al. [7], around 45% of machine failure incidents are caused by rolling element bearings alone, so it becomes very important to monitor and predict bearing degradation. In a model-based approach, a mathematical model is constructed for prediction. Yaguo Lei et al. [8] proposed a two-stage model; in the first stage, the weighted minimum quantization error is calculated as a health indicator using mutual information (MI) to correlate with the degradation process; in the second stage, particle filtering is used to predict the remaining useful life (RUL). However, it is very difficult to build a mathematical model of a machine, as the model depends on the complexity of the machine and on its operation in different environments.
In the case of a data-driven approach, along with the machine's complexity and the working environment, data collected from sensors such as vibration signals, acoustic signals or temperature reveals the health conditions of the bearing [9,10]. Various statistical features are extracted from the time domain, frequency domain and time-frequency domain and are used to form a feature vector. The constructed feature vector is then used as an input to train various ML models such as support vector machines (SVM), decision trees, k-nearest neighbors (K-NN), or deep learning models such as artificial neural networks (ANN), for finding either the type of fault or the bearing health condition [11][12][13][14]. Xiang Li et al. [15] proposed the usage of a convolutional neural network (CNN) as a multifeature identifier, as well as a predictor for the estimation of the degradation of the bearing. Jun Zhu et al. [16] proposed a multi-scale convolutional neural network (MSCNN) on a time-frequency representation made using wavelets to predict the RUL. Mehdi Behzad et al. [17] proposed to use features, namely, the root mean square (RMS) and kurtosis, along with a special feature known as high-frequency root mean square (HFRMS), to train a feed-forward neural network and estimate the degradation of the bearing. Wentao Mao et al. [18] proposed an LSTM model on time-frequency images generated using the Hilbert-Huang transform for predicting bearing degradation. Youngji Yoo and Jun-Geol Baek [19] proposed a CNN applied to the wavelet power spectrum image generated using the continuous wavelet transform. Biao Wang et al. [20] proposed a hybrid prognostic approach where degradation data are sparsely represented using relevance vector machine regression and bearing degradation is estimated using the exponential degradation model. The study conducted by Xinlai Ye et al. [21] focused on the development of a novel health index for the successful identification of bearing degradation. Their results showed improvements in the accuracy of detecting incipient bearing degradation. B Savić et al. [22] implemented a non-linear regression model for analyzing the condition of rolling bearings. Testing of their model showed a prediction error within the limits. Blaut et al. [23] used the Teager-Kaiser method to evaluate rotor imbalance in hydrodynamic bearings. In another study, Kubik et al. [24] developed test methods for diagnosing the technical condition of rolling bearings in road wheels.
It is very difficult to capture the vibration signal for every possible type of failure of a machine component, because components operate under varying conditions. Generative adversarial networks (GAN) have gained popularity in the past decade, as they can generate artificial data from the original data. David Verstraete et al. [25] predicted bearing degradation using a generative adversarial network (GAN), variational autoencoders (VAE) and an adversarial-variational model, and compared them. Xiang Li et al. [26] divided the whole bearing life into two stages and calculated the first prediction time (FPT), and then a GAN was used to generate artificial data. The real and the artificial data were then fed to a CNN to extract the features and predict bearing degradation.
Several studies have been carried out for the prediction of bearing degradation. In the present study, bearing degradation is divided into two stages using correlation coefficients, and then discrete wavelet transform is applied to raw vibration signals to calculate statistical features. As a large amount of data is needed to predict bearing degradation using ML models, the authors have utilized tabular generative adversarial networks (TGAN) to generate artificial data for prediction. Finally, the feature vector formed from the DWT and TGAN data was fed into ML models to estimate bearing degradation. TGAN is a data augmentation technique specifically applied to tabular data. It would be applicable to a variety of manufacturing applications where the available experimental dataset is limited, such as the prediction of material removal rate (MRR), surface roughness, tool wear rate, etc. Figure 1 shows the flow chart of the proposed methodology. In this paper, as shown in Figure 1, the first section describes the FEMTO bearing dataset. The second section covers the preprocessing of the data, where a health assessment is carried out to identify bearing degradation. The second part of section two covers discrete wavelet transform (DWT) and the selection of the base wavelet using mutual information (MI). The next part of section two covers the generation of artificial data using a tabular generative adversarial network (TGAN). The last parts of section two cover the extraction of time-domain statistical features calculated on both the real and the artificial data, followed by a short description of the machine learning models. In the seventh section, bearing degradation is estimated using the machine learning models, followed by the conclusion in the last section.

Dataset
The FEMTO bearing dataset of the IEEE prognostics and health management (PHM) 2012 challenge, provided by the National Aeronautics and Space Administration (NASA) [27], is used. Figure 2 shows the experimental setup of PRONOSTIA, used to test and validate bearing fault detection and prognostics. Table 1 shows the FEMTO bearing dataset generated by PRONOSTIA, which contains acceleration values collected at a sampling frequency of 25.6 kHz for a total of seventeen bearings, of which seven were tested under each of conditions 1 and 2, and three under condition 3. In this study, the vibration data of six bearings operated at a radial load of 4000 N and a speed of 1800 rpm is used. Figure 3 shows the plot of the vibration signal against time for bearing 11 and bearing 13.

Health Assessment
In this study, the complete bearing life was first divided into two health stages: the normal degradation stage and the fast degradation stage, as shown in Figure 4 for all the bearing conditions. The criterion used was the singular value decomposition (SVD) normalized correlation coefficient, as proposed by Wentao Mao et al. [18]. The coefficient is calculated from x and y, the singular value vectors of the signal, with j = 1…N, where q represents the length of the singular value vectors.
As per Mao et al. [18], the normal and fast degradation stages can be identified on the basis of the correlation coefficient: if its value lies consistently below 0.95, the bearing can be considered to be in the fast degradation state. The idea is to exploit the robustness of SVD, as the correlation coefficient of the singular values is comparatively higher in the normal stage than in the faulty stage. SVD also makes the health assessment tolerant of noise and minor deviations, because the singular values change only slightly when the deviation is small and change substantially when the signal varies greatly. Thus, Wentao Mao et al. [18] concluded that the SVD normalized correlation coefficient can be used to accurately find the change in a bearing's health state when the vibration signal varies drastically.
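The staging rule above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of [18]: the trajectory-matrix construction, the cosine-style form of the normalized correlation, the `rows` and `patience` parameters and all function names are hypothetical choices made here. Only the 0.95 threshold and the "consistently below" criterion come from the text.

```python
import numpy as np

def singular_values(segment, rows=8):
    """Singular values of a trajectory matrix built from a 1-D signal segment."""
    segment = np.asarray(segment, dtype=float)
    cols = segment.size - rows + 1
    # Each row is the segment shifted by one sample (assumed embedding).
    trajectory = np.stack([segment[i:i + cols] for i in range(rows)])
    return np.linalg.svd(trajectory, compute_uv=False)

def normalized_correlation(x, y):
    """Normalized correlation of two singular value vectors (assumed form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(abs(x @ y) / np.sqrt((x @ x) * (y @ y)))

def first_fast_degradation_index(segments, threshold=0.95, patience=3):
    """Index of the first segment that opens a run of `patience` consecutive
    coefficients below `threshold`, or None if no such run exists."""
    reference = singular_values(segments[0])  # earliest segment taken as healthy
    below = 0
    for k, segment in enumerate(segments):
        r = normalized_correlation(reference, singular_values(segment))
        below = below + 1 if r < threshold else 0
        if below == patience:
            return k - patience + 1
    return None
```

Requiring several consecutive values below the threshold is one simple way to express "consistently below 0.95"; a single noisy segment then does not trigger the fast degradation stage.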

Tabular Generative Adversarial Networks
Machine learning algorithms require ample, adequate and reliable data, especially for unconventional and intricate applications performed by complex techniques [28,29]. For this study, collecting an ample amount of data was a major challenge, as collecting vibration data for different parameters and different types of faults/defects is very time-consuming, costly and laborious. In recent years, generative adversarial networks (GAN), first developed by Goodfellow et al. [30], have become a very popular and effective deep learning algorithm to generate feasible "fake data" from original data. Two neural networks, a generator and a discriminator, compete with each other in a min-max game, where the discriminator tries to identify the difference between the original data and the artificial data, while the generator tries to generate the artificial data in such a way that the discriminator cannot distinguish between them. GANs are widely used to generate realistic and high-quality images in the field of computer vision. Various GANs such as the least squares GAN (LSGAN) [31], Wasserstein GAN (WGAN) [32], conditional GAN (CGAN) [33], information maximizing GAN (InfoGAN) [34], auxiliary classifier GAN (AC-GAN) [35], semi-supervised GAN (SGAN) [36], and tabular GAN (TGAN) [37] were developed and used for specific tasks, including fault detection, fraud detection, improving healthcare and improving cybersecurity [38]. In this study, tabular generative adversarial networks (TGAN), developed by Lei Xu and Kalyan Veeramachaneni [37], were used; TGAN takes a single table consisting of numerical as well as categorical variables, learns the data distribution, and generates "fake data." The synthetic table, Tsynth, consists of continuous variables (C1, C2, C3 ……… ,Cnr) with multinomial discrete random variables (Dmr1, Dmr2, Dmr3 ……… ,Dnr) and should follow a joint probability distribution and be sampled independently. 
The goal was to develop a generative model G (P C1, P Dmr1) such that the sample generated represented a synthetic table Tsynth which essentially resembled the original table, with slight variations. In this study, the authors have used 100 hidden units and 100 hidden layers to characterize LSTM cells.
As observed in Figure 5, the generator was developed using a long short-term memory (LSTM) neural network. A hidden vector was obtained by applying tanh to a learned linear transformation of the LSTM output. The input to the LSTM was the random noise z, the hidden vector x and the weight vector y. For discrete variables, the output was calculated by applying a softmax to a learned transformation of the hidden vector. A cross-entropy loss, along with KL divergence, was used as a loss function. For the discriminator, a multi-layer perceptron (MLP), an n-layer fully connected neural network, was used, and a1:n, b1:n and c1:n were fed in a concatenated manner. The first layer was calculated by applying batch normalization (BN) and the Leaky ReLU activation function to a learned linear transformation of the concatenated input, and the i-th layer by applying the same operations to the output of the previous layer concatenated (⊕) with its mini-batch discrimination vector (diversity). A conventional cross-entropy loss was used as a loss function. In this study, the artificial coefficients of the DWT were generated using TGAN.
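For reference, the min-max game between the generator and the discriminator described in this section is usually written in the form introduced with the original GAN [30]. The symbols G, D, p_data and p_z below are the standard ones from that formulation, not notation taken from this study, and TGAN [37] trains with its own variant of these losses.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Here D(x) is the discriminator's estimate of the probability that x is real, and G(z) is the artificial sample generated from the noise z. The discriminator maximizes V, while the generator minimizes it.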

Feature Extraction
Non-linearity is generally observed when a vibration signal is captured from a bearing using sensors, as noise can be generated either by the surrounding environmental conditions or by faults in the bearing itself. Thus, it becomes very important to select appropriate signal processing techniques to extract useful information (features) about the health stage of the bearing. Three types of features can be extracted from captured vibration signals: time-domain features, frequency-domain features and time-frequency-domain features. Time-domain statistical features are extracted from the time waveforms of the captured vibration signals of the bearing [39][40][41]. In the current study, 13 time-domain statistical features, including the mean, sum and skewness, were calculated from the original db1 wavelet coefficients and from the coefficients generated by TGAN. The equations used to calculate the features are shown in Table 2.

Name | Description | Formula
Mean | The average value of the vibration signal |
Standard Deviation | Deviation from the mean value of the vibration signal |
Variance | |
Kurtosis | A measure of the spikiness of the signal relative to a normal distribution |
Minimum Amplitude | Value of the minimum amplitude of the signal | min(x(i))
Crest Factor | The ratio of peak value to RMS value | peak value / RMS value
Shape Factor | The ratio of RMS value and mean value | RMS value / mean value
Impulse Factor | The ratio of max value and mean value | max value / mean value
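A minimal sketch of the feature extraction step follows, assuming a single-level db1 (Haar) decomposition and a subset of the time-domain features named in Table 2. The function names are hypothetical. The population form of the variance and the use of the mean absolute value in the shape and impulse factors are common conventions assumed here, not definitions confirmed by the study.

```python
import math

def haar_dwt(signal):
    """Single-level db1 (Haar) DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def time_domain_features(x):
    """Subset of time-domain statistical features for a list of coefficients."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n  # population form (assumption)
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_mean = sum(abs(v) for v in x) / n  # mean absolute value (assumption)
    peak = max(abs(v) for v in x)
    fourth_moment = sum((v - mean) ** 4 for v in x) / n
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "variance": variance,
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else 0.0,
        "min": min(x),
        "crest_factor": peak / rms if rms > 0 else 0.0,
        "shape_factor": rms / abs_mean if abs_mean > 0 else 0.0,
        "impulse_factor": peak / abs_mean if abs_mean > 0 else 0.0,
    }
```

A typical use would be `approx, detail = haar_dwt(raw_signal)` followed by `time_domain_features(approx)`, applied in the same way to real coefficients and to coefficients generated by TGAN so that both feed one feature vector.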

Ensemble Bagged Trees
Ensemble methods combine several decision trees so as to obtain better predictions than a single tree. The core idea behind building an ensemble model is that a set of weak learners join together to become a strong learner. As a result, we have a collection of various models, and the average of the predictions from all trees is utilized, which is more reliable than the prediction of a single decision tree. In EBT, bagging is applied to minimize the variance of the decision tree. Given N readings and F features in a feature vector, a subset of M features is selected, and the feature which gives the best split is chosen to split the nodes iteratively.
The procedure is repeated for each tree in the ensemble, and the final prediction is obtained by aggregating the predictions of all trees. In Figure 6, the color represents the various extracted features applied for training, testing and cross-validation of ML models.
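The bagging idea can be illustrated with a deliberately small sketch. It is not the EBT implementation used in the study: each "tree" here is a depth-1 regression stump, every feature is considered at each split rather than a random subset, and the function names, the number of trees and the seed are hypothetical choices.

```python
import random

def fit_stump(X, y):
    """Depth-1 regression tree: choose the feature/threshold with the lowest squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    f, t, left_mean, right_mean = stump
    return left_mean if row[f] <= t else right_mean

def fit_bagged(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_bagged(trees, row):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_stump(t, row) for t in trees) / len(trees)
```

Because each stump sees a different bootstrap sample, their individual errors partly cancel when averaged, which is the variance reduction that bagging aims for.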

Squared Exponential Gaussian Processes Regression (SEGPR)
This method is a supervised learning algorithm designed to solve regression problems. Rather than estimating the parameters of one specific function, Gaussian processes regression calculates a probability distribution over all the functions that fit the data. In this study, the squared exponential kernel (also called the exponentiated quadratic or radial basis function) was used with Gaussian processes regression, as it is infinitely differentiable, which means that, when used as a covariance function, it yields a very smooth process with mean square derivatives of all orders [42,43]. It is based on the equation
$$k(u_i, u_j) = \sigma_f^2 \exp\left(-\frac{\lVert u_i - u_j \rVert^2}{2\sigma_l^2}\right)$$
where $\sigma_l$ is the characteristic length scale, $\sigma_f$ is the standard deviation of the signal, and $u_i$ and $u_j$ denote the values of the signal at locations i and j.
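As an illustration, Gaussian process regression with the standard squared exponential kernel can be sketched in a few lines. This is a sketch under stated assumptions rather than the model configuration used in the study: the zero prior mean, the fixed hyperparameters (`length_scale`, `signal_std`, `noise_std`) and the function names are all hypothetical, and hyperparameter optimization is omitted.

```python
import math

def se_kernel(u_i, u_j, length_scale=1.0, signal_std=1.0):
    """Squared exponential covariance between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u_i, u_j))
    return signal_std ** 2 * math.exp(-sq_dist / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def gpr_predict(X_train, y_train, X_test, noise_std=1e-3, **kernel_args):
    """Posterior mean of a zero-mean Gaussian process at the test inputs."""
    n = len(X_train)
    # Covariance of the training inputs plus observation noise on the diagonal.
    K = [[se_kernel(X_train[i], X_train[j], **kernel_args)
          + (noise_std ** 2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, list(y_train))
    return [sum(se_kernel(x, X_train[i], **kernel_args) * alpha[i] for i in range(n))
            for x in X_test]
```

With a small noise term the posterior mean passes almost exactly through the training targets and varies smoothly between them, which reflects the smoothness property of the kernel discussed above.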

Results and Discussions
Features extracted using the wavelet transform are very useful in detecting abrupt variations in the captured vibration signals. Owing to the wide availability of wavelet base functions, it is advantageous to use them to identify the degradation of the bearing [40]. It is very important to select the appropriate base function to extract features of the captured vibration signals [44]. Discrete wavelet transform is performed as it is one of the best tools for signal analysis and signal processing tasks such as noise reduction and data compression. A technique known as mutual information (MI) [45], which measures the dependency between two variables, was used in this study to select the appropriate base function for DWT-based feature extraction. It is given by the equation

MI(X;Y) = E(X) - CE(X|Y)
where MI(X;Y) is the mutual information for X and Y, E(X) is the entropy for X, CE(X|Y) is the conditional entropy for X given Y, and X and Y are the variables. A function is selected as the base function when its MI shows the maximum amount of dependency. In this study, the wavelet base functions compared were Daubechies (db1), Symlet (sym2), Coiflet (coif1), and reverse Biorthogonal (rbio1.1). All six bearings were considered at a speed of 1800 rpm and a load of 4000 N to select the base function of the wavelet. It is clear from Figure 7 that Daubechies (db1) shows the highest MI compared to the other wavelet functions, and it was therefore selected to calculate DWT-based statistical features.
Availability of experimental data is critical for building machine-learning-based degradation models to better understand the non-linear connection between various extracted statistical features. With small datasets, the limited amount of data available for testing raises concerns about the accuracy of ML models. Therefore, the authors have proposed the use of a tabular generative adversarial network (TGAN) for enhancing the dataset, which in turn can be useful for improving accuracy, robustness, and generalisation to future experimental data that is previously unknown to the developed model. Figure 8 compares sample root mean square (RMS) values of the real data and the artificial data generated through TGAN. It can be observed that there were only slight variations in the RMS values between the actual experimental data and the generated data. This shows the utility of the features generated through TGAN. Table 3 compares the statistical properties of the actual features to those of the generated features. As can be seen from Table 3, there were only slight deviations in the statistical parameters between the real features and the generated features.
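The mutual information criterion used for wavelet selection can be sketched for discretized signals as follows. This is an assumption-laden illustration: the histogram discretization, the number of bins and the function names are hypothetical, and the text does not state which pair of variables (for example raw signal and wavelet coefficients) was compared, so the caller has to choose X and Y.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy E(X) in bits of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """CE(X|Y): entropy of X within each value of Y, weighted by p(Y = y)."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    n = len(x)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(x, y):
    """MI(X;Y) = E(X) - CE(X|Y)."""
    return entropy(x) - conditional_entropy(x, y)

def discretize(signal, bins=8):
    """Map a real-valued signal to equal-width bin indices (assumed preprocessing)."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant signals
    return [min(int((v - lo) / width), bins - 1) for v in signal]
```

Under this scheme, each candidate base wavelet would be scored with `mutual_information` on its discretized inputs, and the wavelet with the highest score would be kept.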
In the current study, two ML models were used to evaluate the degradation of the six bearings. Initially, training was executed using the real and generated feature vectors, and afterwards, five-fold, ten-fold and fifteen-fold cross-validation were performed on the ML models. Cross-validation is a resampling procedure used to evaluate machine learning models, so that bias and overfitting can be avoided and generalization capability can be improved. In five-fold cross-validation, the dataset is first divided into five equal parts; for the first fold, one part is used for testing and the other four parts are used for training. In the second fold, a different part is used for testing and the remaining four parts are used for training. The procedure repeats until every part has been used for testing once, and the final result is obtained by averaging over all folds. In the present study, for five-fold, ten-fold and fifteen-fold cross-validation, the mean RMSE and MAE were computed after applying the EBT and SEGPR models.
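A minimal sketch of k-fold cross-validation, in which every part of the data serves as the test set exactly once and the scores are averaged over the folds. The function names and the `fit`, `predict` and `metric` callables are hypothetical placeholders for whichever model and error measure are being evaluated.

```python
import random

def k_fold_indices(n_samples, k, seed=None):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    if seed is not None:
        random.Random(seed).shuffle(indices)  # optional random split
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(X, y, k, fit, predict, metric, seed=None):
    """Average a metric over k folds for a model given by fit/predict callables."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        predictions = [predict(model, X[i]) for i in test_idx]
        scores.append(metric([y[i] for i in test_idx], predictions))
    return sum(scores) / len(scores)
```

Calling `cross_validate` with k set to 5, 10 and 15 mirrors the three settings compared in this study.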
Two performance metrics, the root mean square error (RMSE) and mean absolute error (MAE), are considered to evaluate the performance of ML models for the predictions of bearing degradation. Performance metrics were calculated as follows:

Mean Absolute Error (MAE)
MAE reflects the average deviation obtained from the differences between the experimental/calculated values and the predicted values [41]. It is represented by the equation:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $\hat{y}_i$ represents the predicted values obtained using ML models, $y_i$ shows the experimental/calculated values, and n is the number of observations.

Root Mean Square Error (RMSE)
RMSE is the square root of the mean of the squared difference between the experimental/calculated values and the predicted values. It is a significant parameter widely used to evaluate the performance of regression models and is represented mathematically as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$
where $\hat{y}_i$ represents the predicted values obtained using ML models, $y_i$ shows the experimental/calculated values, and n is the number of observations.
In this study, bearing degradation was predicted using two machine learning algorithms, namely ensemble bagged tree (EBT) and squared exponential Gaussian processes regression (SEGPR), on the features extracted from the actual experimental data and from the artificial data generated by TGAN. The feature vector was randomly split into training and testing sets. To avoid bias and to minimize prediction errors, the authors implemented k-fold cross-validation. RMSE and MAE were calculated after applying five-fold, ten-fold and fifteen-fold cross-validation to the ML models. Figure 9a-c shows the RMSE obtained when the EBT and SEGPR models are applied with five-fold, ten-fold and fifteen-fold cross-validation, respectively. From Figure 9a, it can be seen that the SEGPR model performed better than the EBT model: its average RMSE over all the bearing degradation conditions, for both original and generated data, was 0.0046, compared to 0.0047 for the EBT model. Figure 9b represents the average RMSE obtained with ten-fold cross-validation, which was 0.0047 and 0.0048 for the SEGPR and EBT models, respectively. Similarly, Figure 9c shows the results when fifteen-fold cross-validation was applied; the average RMSE values were 0.0045 and 0.0047 for the SEGPR and EBT models, respectively. It can be seen that the SEGPR model predicted bearing degradation better than the EBT model with both original and generated bearing degradation data, under all cross-validation conditions.
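The two performance metrics can be computed directly from paired lists of experimental and predicted values. The sketch below uses the standard definitions of MAE and RMSE; the function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

Because RMSE squares the errors before averaging, it is never smaller than MAE on the same predictions and penalizes occasional large deviations more heavily, which is why the two metrics are reported together.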
To justify the utility of the proposed methodology, MAE values were calculated and plotted, as shown in Figure 10a-c. From Figure 10a, it can be seen that the SEGPR model performed better than the EBT model: its average MAE over all the bearing degradation conditions, for both original and generated data, was 0.0038, compared to 0.0039 for the EBT model. Figure 10b represents the average MAE obtained with ten-fold cross-validation, which was 0.0039 and 0.0040 for the SEGPR and EBT models, respectively. Similarly, Figure 10c shows the results when fifteen-fold cross-validation was applied; the average MAE values were 0.0037 and 0.0039 for the SEGPR and EBT models, respectively. It can be seen that, once again, the SEGPR model predicted bearing degradation better than the EBT model, based on the MAE performance metric. It should be noted that all the RMSE and MAE values were very small and were under the permissible limit of industry and research conditions. One of the reasons for the better performance of the SEGPR model is that the squared exponential kernel function is infinitely differentiable and therefore generates a smooth curve, which enables easy fitting of the data. To verify the effectiveness of the proposed methodology, a comparison with previously published literature in which different authors used the same bearing degradation dataset was prepared (Table 4). It can be seen that, even with smaller datasets, the prediction results are comparatively better when TGAN is utilized with wavelet transform and the SEGPR model.

Conclusions
In the current study, the degradation of six bearings, computed from original data and from artificial data generated using TGAN, has been predicted using two machine learning models: ensemble bagged tree (EBT) and squared exponential Gaussian processes regression (SEGPR). Initially, the raw vibration signals were captured and pre-processed using the DWT base function selected on the basis of the MI criterion. Thirteen statistical features were calculated, and to demonstrate the utility of TGAN for degradation prediction, an artificial feature vector was generated. After applying the two ML models with five-, ten- and fifteen-fold cross-validation, the observations were as follows:
1. The effectiveness of the ML models was assessed with two standard performance metrics: RMSE and MAE. Low errors were observed in predicting bearing degradation with both the EBT and SEGPR models.
2. The lowest RMSE value of 0.0045 was observed with the SEGPR model when fifteen-fold cross-validation was implemented, whereas with the EBT model, the lowest RMSE was observed as 0.0047 when five-fold cross-validation was implemented.
3. With the SEGPR model, the lowest MAE value observed was 0.0037 when fifteen-fold cross-validation was implemented, whereas with the EBT model, the lowest MAE was observed as 0.0038 with five-fold cross-validation.
4. The methodology developed based on hybrid TGAN-SEGPR, which is little explored for bearing degradation, can be useful in various applications, including fault diagnosis, fault severity, and manufacturing parameter assessments, when the availability of experimental data is limited, which makes it difficult to develop ML models.
The authors expect that the additional data generated through TGAN will be extremely useful in a variety of interdisciplinary applications, for classification as well as regression analysis, with ML models.

Acknowledgments:
We would like to thank Patrick Nectoux and his colleagues for conducting the experiments at the FEMTO Institute, and the National Aeronautics and Space Administration (NASA) for providing the dataset.

Conflicts of Interest:
The authors declare no conflict of interest.