1. Introduction
Rolling bearings are essential components of rotating machinery and a central concern in fault detection and diagnosis, since they account for one-third of the total defects in induction machine failures [1]. The quality and performance of these indispensable parts directly affect the reliability, efficiency and downtime of electrical machines. Four main faults can occur in bearings under variable and high loads, namely inner race, outer race, ball (rolling element) and cage faults, resulting in economic costs and, if they escalate, even safety accidents. Fault detection and diagnosis in rolling bearings have been widely studied using model-based methods [2], signal processing approaches [3] and data-driven techniques [4]. Developing a mathematical model of bearing faults is not always feasible, especially in complex dynamic systems. The continuously increasing availability of data has shifted the research focus toward data-driven techniques, in which knowledge is extracted by applying feature engineering to raw data acquired from diverse sensor measurements. These sensing modalities include the following [5]: stator current measurements, vibration signals, sound or acoustic emission signals, and thermal analysis. Establishing a real-world testbed to collect measurements from bearing faults of different types requires major effort. Fortunately, several organizations publicly provide bearing fault datasets that contain stator current signals, vibration signals or both. The most popular among them are CWRU [6], IMS [7], Paderborn University [8] and PRONOSTIA [9]. These bearing datasets serve as standard references, since they are essential for validating the performance of different models and approaches in the field of fault detection and diagnosis.
Conventional signal processing techniques for rolling-element bearing fault detection using vibration signals include time-domain [10], frequency-domain [11] and time–frequency-domain analysis [12]. In time-domain analysis, statistical features are calculated directly from the temporal vibration signal. These features include root mean square (RMS), peak value, peak-to-peak value, skewness, kurtosis, crest factor, form factor, standard deviation and min–max values [13]. Time-domain vibration signals can be converted to frequency components using the fast Fourier transform (FFT). Accordingly, the FFT and discrete Fourier transform (DFT), spectrum analysis and envelope analysis are the most common frequency-domain tools for detecting the specific frequency components of interest [14,15]. Time–frequency-domain analysis combines both domains using approaches such as the short-time Fourier transform (STFT) [16], wavelet analysis (continuous wavelet transform, CWT, and discrete wavelet transform, DWT) [17,18], wavelet packet decomposition [19], empirical mode decomposition [20], variational mode decomposition [21], the Hilbert transform [22] and stochastic resonance [23].
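As an illustration of the time-domain features listed above, the following is a minimal sketch in plain Python. The function name `time_domain_features` and the choice of population (1/n) moments are assumptions made for this example and are not taken from the cited works; kurtosis is returned in its non-excess form, so a Gaussian signal would give a value near 3.

```python
import math

def time_domain_features(x):
    """Statistical features of one vibration segment (sequence of floats).

    Assumes the segment is not constant; otherwise the ratios are undefined.
    """
    n = len(x)
    mean = sum(x) / n
    # Population (1/n) central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    std = math.sqrt(m2)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "rms": rms,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,      # non-excess kurtosis
        "crest_factor": peak / rms,    # impulsiveness indicator
        "form_factor": rms / mean_abs,
    }

# Usage: one full period of a unit sine wave.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
features = time_domain_features(sine)
```

For a pure sine wave the crest factor is the square root of 2 and the kurtosis is 1.5; impulsive bearing defects typically push both values upward, which is why these two features are popular fault indicators.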
A long list of machine learning (ML) and deep learning (DL) methods has been applied to rolling bearing fault diagnosis [17]. ML approaches find patterns in the extracted features and produce predictions of bearing fault types. DL methods, on the other hand, extract features automatically, learn high-level features in their hidden layers and classify fault types. Moreover, as more data become available, the performance of DL techniques can improve significantly compared with standard ML models. Indicative ML-based approaches that have been reported in the literature for bearing fault detection are support vector machine (SVM) [24], k-nearest neighbor (k-NN) [25], principal component analysis (PCA) [26], singular value decomposition (SVD) [27] and fuzzy cognitive networks with functional weights (FCN-FW) [28]. Approaches of particular importance are those based on optimization methods such as particle swarm optimization (PSO) [29], mayfly optimization algorithm (MMA) [30], whale optimization algorithm (WOA) and gray wolf optimization (GWO) [31]. Widely practiced DL implementations in this application are convolutional neural networks (CNNs) [32,33], auto-encoders (AEs) [34,35], deep belief networks (DBNs) [36], recurrent neural networks (RNNs) [37], long short-term memory (LSTM) [38] and generative adversarial networks (GANs) [39]. This class of models can handle 1D signals, as well as 2D images converted from the raw vibration signal or from the output of a feature extraction method such as the continuous wavelet transform.
A challenging task is to identify the presence and type of a bearing fault under noisy conditions, especially when faults are at their incipient stage. In real-world applications, and especially in industrial processes, electrical machines operate in constantly noisy environments. Background noise is an inherent characteristic of industrial sites and is practically unavoidable. In the case of rolling bearings, the acquired vibration signals may contain noise due to lack of lubrication, improper installation, imprecise manufacturing, high rotational speed or vibration caused by other parts of the machine. For this reason, denoising methods have been proposed to remove the noisy part from vibration signals [40], but they often require prior and expert knowledge [41]. Deep learning has attracted increasing interest in recent years for bearing fault diagnosis in noisy environments, and different implementations have been presented in the literature that aim at an accurate learning model for detecting such faults [42,43,44,45].
This work evaluates different conventional learning models combined with different preprocessing methods. It also follows a low-cost orientation; evaluations are therefore made on the 12 kHz data of the CWRU dataset, taking into account the total number of trainable parameters. For example, instead of using large segments of 1D vibration signals to produce larger input images in pursuit of higher overall performance, we study the performance of all adopted learning approaches from the preprocessing perspective under more feasible computational workloads. More specifically, we investigate the reliability and resiliency of conventional ML and DL models for rolling bearing fault detection, simulating data that correspond to noisy industrial environments. Diverse preprocessing methods have been applied in order to study the performance of SVM, LeNet-5, 1D-CNN and 2D-CNN from the feature extraction perspective. These feature extraction methods include statistical features from time-domain analysis (TDA), wavelet packet decomposition (WPD), continuous wavelet transform (CWT) and signal-to-image conversion (SIC), applied to raw vibration signals acquired under varying load conditions of a 2 hp induction motor at a sampling frequency of 12 kHz.
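One common way to simulate a noisy environment is to add white Gaussian noise to a clean signal at a prescribed signal-to-noise ratio (SNR). The sketch below illustrates that idea only; the additive Gaussian noise model, the SNR-in-dB parameterization and the names `add_white_noise` and `snr_db` are assumptions for this example, and the exact noise procedure of the study is the one described in its experimental section.

```python
import math
import random

def add_white_noise(signal, target_snr_db, rng=None):
    """Return signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    rng = rng or random.Random()
    signal_power = sum(v * v for v in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]

def snr_db(clean, noisy):
    """Measured SNR in dB between a clean signal and its noisy version."""
    p_signal = sum(v * v for v in clean) / len(clean)
    p_noise = sum((a - b) ** 2 for a, b in zip(noisy, clean)) / len(clean)
    return 10 * math.log10(p_signal / p_noise)

# Usage: a synthetic tone stands in for a vibration segment (illustrative only).
clean = [math.sin(2 * math.pi * k / 50) for k in range(20000)]
noisy = add_white_noise(clean, 0.0, random.Random(0))  # 0 dB: noise power equals signal power
```

A lower SNR means the noise occupies a larger fraction of the overall signal, which is the regime in which classification is expected to be harder.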
The paper is organized as follows:
Section 2 presents, in a brief and comprehensive manner, a review of the bearing fault detection problem from basic notions and theoretical background to the main diagnostic workflow needed, embodying feature extraction methods and learning models.
Section 3 is devoted to the development of the adopted implementations, as well as to their evaluation in different simulated noise environments.
Section 4 covers the comparison of different cases, in terms of preprocessing methods as well as different learning models, in the bearing fault detection problem.
Section 5 provides an analytical discussion of the conducted study, while conclusions are given in Section 6.
2. Bearing Fault Detection Workflow, Problem Description and Review
Various parts of a rotating electrical machine, such as the stator, rotor and rolling bearings, are susceptible to significant issues [46]. Notably, rolling bearing defects are among the most frequent types of failures in electrical motors, occurring at a rate of 30–40%. Because the bearing secures the proper rotation of the rotor on the machine shaft and serves as a mechanical connection point of the electric motor, it is crucial to the lifespan of an electrical machine. A rolling bearing is made up of several components, including the rolling balls, the inner and outer races and the cage, which keeps the distance between the balls equal. Figure 1 illustrates a typical geometry of a rolling bearing.
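Furthermore, bearing faults are caused by a variety of factors such as insufficient lubrication, misalignment of the rotor and mechanical stress. Each kind of bearing fault produces a pulse in the frequency spectrum, which is known as the bearing characteristic frequency. The frequencies for ball fault ($f_b$), inner race fault ($f_i$), outer race fault ($f_o$) and cage fault ($f_c$), respectively, can be mathematically described by the following equations:
$$f_b = \frac{D}{d}\, f_r \left[ 1 - \left( \frac{d}{D} \right)^{2} \cos^{2}\phi \right]$$
$$f_i = \frac{N}{2}\, f_r \left[ 1 + \frac{d}{D} \cos\phi \right]$$
$$f_o = \frac{N}{2}\, f_r \left[ 1 - \frac{d}{D} \cos\phi \right]$$
$$f_c = \frac{1}{2}\, f_r \left[ 1 - \frac{d}{D} \cos\phi \right]$$
where $d$ is the ball diameter, $D$ is the pitch diameter, $\phi$ is the contact angle of the ball with the races, $N$ is the number of rolling bearing balls and $f_r$ is the rotor frequency [47].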
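The characteristic frequencies described above can be evaluated numerically. The following is a minimal sketch that assumes the standard no-slip kinematic expressions for a bearing with a stationary outer race; the function name, argument names and the example geometry are illustrative assumptions, not values taken from the text. The ball fault frequency is computed here as twice the ball spin frequency, since a defect on a ball strikes both races once per ball revolution.

```python
import math

def bearing_fault_frequencies(f_r, n_balls, d_ball, d_pitch, contact_angle_deg=0.0):
    """Characteristic fault frequencies (same unit as f_r).

    f_r: rotor (shaft) frequency; d_ball and d_pitch must share one length unit.
    """
    ratio = (d_ball / d_pitch) * math.cos(math.radians(contact_angle_deg))
    return {
        "ball": (d_pitch / d_ball) * f_r * (1 - ratio ** 2),
        "inner_race": 0.5 * n_balls * f_r * (1 + ratio),
        "outer_race": 0.5 * n_balls * f_r * (1 - ratio),
        "cage": 0.5 * f_r * (1 - ratio),
    }

# Usage with f_r = 1, so the results are multiples of the shaft speed.
# Geometry is an assumed example: 9 balls, ball diameter 0.3126, pitch diameter 1.537.
orders = bearing_fault_frequencies(1.0, 9, 0.3126, 1.537)
```

Expressing the results as multiples of the shaft speed makes them easy to rescale for any operating speed: multiplying each value by the rotor frequency in Hz gives the frequency at which the corresponding fault signature is expected in the spectrum.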
2.1. General Perception of the Bearing Fault Detection Workflow
The general working procedure for bearing fault detection involves several operational stages: data acquisition, preprocessing, feature extraction and selection, the learning mechanism and, finally, the diagnostic decision. In a preparatory stage, a set of sensors is placed at specific locations of the machine under examination. Usually, vibration data are collected, and then a preprocessing stage is applied to extract features of different domains and textures. Traditionally, the most used feature extraction methods include short-time Fourier transform (STFT), empirical mode decomposition or an extension such as ensemble empirical mode decomposition (EEMD), continuous wavelet transform (CWT), signal-to-image conversion (SIC) and statistical methods. Statistical methods provide the most compressed representation of the original signal and are therefore paired strictly with machine learning approaches. Training mechanisms that incorporate deep neural learning algorithms utilize either 1D raw vibration signals or 2D representations produced by the aforementioned feature extraction processes. Convolutional neural network (CNN) architectures are widely used for signal processing and especially fault detection, since they extract features encapsulated in signals and detect local information during training.
Figure 2 illustrates the general flow procedure for bearing fault detection adopting either 1D or 2D CNN as training candidate algorithms under different feature extraction methods.
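To make the signal-to-image conversion (SIC) step of this workflow concrete, the sketch below shows one widely used variant: a segment of size × size samples is min-max scaled to the range 0 to 255 and filled row by row into a square grayscale image. This particular scaling and layout, and the name `signal_to_image`, are assumptions for illustration; the conversion actually adopted in the study may differ in detail.

```python
def signal_to_image(segment, size):
    """Convert size*size consecutive samples into a size x size grayscale image.

    Pixel values are integers in 0..255 obtained by min-max normalization.
    """
    if len(segment) != size * size:
        raise ValueError("segment must contain exactly size*size samples")
    lo, hi = min(segment), max(segment)
    span = hi - lo
    if span == 0:
        pixels = [0] * len(segment)  # constant segment: all-black image
    else:
        pixels = [round((v - lo) / span * 255) for v in segment]
    # Fill the image row by row.
    return [pixels[r * size:(r + 1) * size] for r in range(size)]

# Usage: a 4-sample toy segment becomes a 2 x 2 image.
image = signal_to_image([0.0, 1.0, 2.0, 3.0], 2)
```

The image side length fixes the segment length (for example, a 32 × 32 image consumes 1024 samples), so larger images require longer segments and increase the computational workload of the 2D models.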
2.2. A Short Review of Bearing Fault Datasets
One of the most challenging tasks in the field of artificial intelligence is the creation of descriptive and coherent benchmark datasets. Large-scale datasets often call for multidisciplinary perspectives to ensure that a dataset built under specific conditions and parameters is sound and can be relied upon to solve a real-world problem. This process is even more demanding in the case of electrical machines and bearing faults, because degradation occurs gradually over a long operating horizon, passing from incipient stages and malfunctions towards severe conditions and eventually total degradation.
For this reason, a usual practice for data collection is to either include artificially induced faults or perform testing methods that accelerate the life-cycle of components. Apart from being time consuming, this process is prohibitively expensive and requires expert assistance to ensure that all intermediate fault states have been acquired smoothly and accurately. The following are well-known bearing fault datasets that are publicly available from different organizations: (a) Case Western Reserve University (CWRU) bearing dataset (https://engineering.case.edu/bearingdatacenter (accessed on 1 July 2023)); (b) Intelligent Maintenance Systems (IMS) (https://www.nasa.gov/content/prognostics-center-of-excellence-data-set-repository (accessed on 1 July 2023)); (c) Paderborn University bearing dataset (https://mb.uni-paderborn.de/kat/forschung/ (accessed on 1 July 2023)); and (d) IEEE PHM 2012 Prognostic Challenge (PRONOSTIA). A brief comparison of the aforementioned datasets is presented in Table 1, illustrating the differences among fault mode, sensor type, sampling frequency and fault type. Note that the PRONOSTIA and IMS datasets are preferred for use in remaining useful life (RUL) prediction problems [48,49].
2.3. Feature Extraction and Selection
Diverse signal processing methods can be applied to obtain the required useful information from vibration data. These methods span time-domain, frequency-domain and time–frequency-domain analysis [50]. A comprehensive review of the signal processing techniques utilized in rolling element bearing fault detection is given in [51]. The most common approaches are (a) time-domain or temporal analysis: statistical features; (b) frequency-domain analysis: fast Fourier transform (FFT), power spectrum, cepstrum and envelope spectrum; and (c) time–frequency-domain techniques: short-time Fourier transform (STFT), wavelet-based approaches such as continuous wavelet transform (CWT), discrete wavelet transform (DWT), wavelet packet transform (WPT) and tunable Q-factor wavelet transform (TQWT), empirical mode decomposition (EMD) and its extensions, empirical wavelet transform (EWT) and morphological filters.
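As a small illustration of the frequency-domain family, the sketch below computes a one-sided magnitude spectrum with a direct discrete Fourier transform and reports the strongest non-DC component. It is written in plain Python for self-containment and costs O(n²) operations; in practice an FFT routine would be used instead. The function names and the synthetic test tone are assumptions for this example.

```python
import cmath
import math

def magnitude_spectrum(x, fs):
    """One-sided DFT magnitude spectrum of x sampled at fs (Hz).

    Returns (freqs, mags) for bins 0..n//2; direct O(n^2) evaluation.
    """
    n = len(x)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        mags.append(abs(acc) / n)
    return freqs, mags

def dominant_frequency(x, fs):
    """Frequency (Hz) of the largest spectral magnitude, ignoring the DC bin."""
    freqs, mags = magnitude_spectrum(x, fs)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return freqs[k]

# Usage: a 100 Hz tone sampled at 1200 Hz (bin spacing 1200 / 240 = 5 Hz).
fs = 1200
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(240)]
peak_hz = dominant_frequency(tone, fs)
```

In a diagnostic setting, the peaks found this way (usually in the envelope spectrum rather than the raw spectrum) are compared against the bearing characteristic frequencies to identify the fault type.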
Feature selection is a usual practice during preprocessing, in which attributes are divided into informative, redundant and irrelevant ones. This operation reduces the dimension of the feature vector by keeping the most relevant and important features and removing the redundant and irrelevant ones, which helps avoid overfitting and reduces the workload. Different feature selection strategies have been proposed in the literature to choose the most discriminant features using the CWRU dataset. For example, there are approaches based on particle swarm optimization (PSO) [52], principal component analysis (PCA) [53] or conventional search of the feature space by greedy methods [54].
2.4. Machine Learning and Deep Learning Models
In the field of fault detection and diagnosis in electrical machines, machine learning algorithms play a crucial role, offering data-driven approaches for identifying anomalies, malfunctions, degradation levels and defects. These models are trained to recognize patterns associated with normal and faulty behavior based on historical data. Feature extraction and feature engineering are essential steps in preparing the data for training machine learning models in fault detection. They involve transforming raw data into meaningful and informative features that capture relevant patterns and characteristics related to the fault type. However, these preprocessing steps may involve complex feature engineering approaches or may require domain-related expertise. Machine learning approaches that have been reported in the literature regarding bearing fault detection include mainly artificial neural networks (ANNs) [55], support vector machines (SVMs) [56] and k-nearest neighbor (k-NN) [57].
Consequently, deep learning algorithms with automated feature extraction capabilities have gained popularity in bearing fault diagnostics. Deep learning is a subset of machine learning that excels in representing the problem under examination through nested hierarchies of concepts. The transition from classical machine learning to deep learning is driven by factors such as data explosion, algorithm evolution and hardware advancements. The advantages of deep learning over conventional machine learning include better performance, automatic feature extraction and transferability to different domains. As a result, deep learning has witnessed exponential growth in applications, including machine health monitoring and fault diagnostics, with bearing fault detection being a prominent example. Indicative methodologies include auto-encoder implementations [58], 2D CNN structures [59], 1D CNN classifiers [60], deep belief networks (DBNs) [61] and attention mechanisms [62]. Extensive review studies of bearing fault detection from the perspective of learning models have been reported in the literature [63,64,65].
5. Discussion
Fault diagnosis in electric machines has attracted significant interest from the research community. In this work, an evaluation study was conducted to find the most suitable signal preprocessing techniques and the most effective model for fault diagnosis of 16 conditions/classes, from a low-workload perspective, using the well-known CWRU dataset. The data were preprocessed in various ways, including feature extraction in the time domain, wavelet packet decomposition, signal-to-image conversion and continuous wavelet transform. The processed data served as inputs to the classification models, which were then evaluated in terms of accuracy, noise resiliency and complexity. As expected, the learning models perform better when noise makes up a smaller fraction of the overall signal.
Table 16 reports the number of trainable parameters and the training time for each neural network implementation. Generally, the 1D CNN has lower computational complexity and workload than the 2D CNN with its 2D operations, which also translates into faster training times for the 1D CNN models. It is worth mentioning that, in the no-noise scenario, CWT is the best preprocessing method for all neural network implementations, both on the datasets with individual loading conditions (see Table 9, Table 10 and Table 11) and on the merged dataset (Table 12). For the machine learning candidate (SVM), TDA is a better preprocessing technique than WPD. In the noisy environment scenario, TDA is again preferred over WPD for the SVM model (Table 13), and CWT leads to better classification accuracy for LeNet-5 and the other 2D CNN models (Table 13 and Table 14). For 1D CNN-4L, the best-performing input is the raw vibration signal (Table 15). However, 1D CNN-2L using CWT performs better than when fed with raw vibration signals (Table 15), showing that, as the number of layers increases in 1D CNNs, raw signals are exploited more efficiently to classify bearing faults under heavy noise. Similar behavior is observed when all loading conditions are merged into a unified dataset, as summarized in Figure 11.
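The gap in complexity between 1D and 2D models can be made tangible by counting trainable parameters per layer. The sketch below uses the standard counting rules for convolutional and fully connected layers; the layer sizes in the usage lines are hypothetical and do not reproduce the architectures or the figures reported in Table 16.

```python
def conv1d_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters of a 1D convolution: (kernel * c_in + bias) * c_out."""
    return (kernel * c_in + (1 if bias else 0)) * c_out

def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Trainable parameters of a 2D convolution: (k_h * k_w * c_in + bias) * c_out."""
    return (k_h * k_w * c_in + (1 if bias else 0)) * c_out

def dense_params(n_in, n_out, bias=True):
    """Trainable parameters of a fully connected layer: (n_in + bias) * n_out."""
    return (n_in + (1 if bias else 0)) * n_out

# Usage with hypothetical layer sizes: same kernel width and channel counts.
p1d = conv1d_params(5, 16, 32)       # 1D layer, kernel of 5 samples
p2d = conv2d_params(5, 5, 16, 32)    # 2D layer, 5 x 5 kernel
```

With the same kernel width and channel counts, the 2D layer carries roughly five times as many weights as the 1D layer in this example, and it also slides over two spatial dimensions instead of one, which is consistent with the lower workload and shorter training times observed for the 1D CNN models.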
Finally, in a “lessons learned” context, we summarize recommended practices, in terms of preferred preprocessing methods and training models, under different load and noise conditions:
No-noise under individual load conditions:
- Preprocessing method: For machine learning candidates, TDA is preferred over WPD. Apart from yielding better performance, TDA typically involves straightforward computations directly in the time domain, which are often simpler to implement and computationally less intensive than frequency-domain methods such as the Fourier transform. In the deep learning context, CWT appears to be a strong approach, as it consistently produced high accuracy across different neural network architectures, including LeNet-5, 1D CNN-2L, 1D CNN-4L, 2D CNN-2L and 2D CNN-4L.
- Training model: It appears that the 1D CNN model with four layers consistently performed very well across the different load conditions. This model is also relatively easier to implement compared to deep 2D convolutional architectures like LeNet-5 and 2D CNNs, making it an attractive choice in terms of both performance and simplicity.
No-noise with all load conditions considered in a merged dataset:
- Preprocessing method: In this case, TDA is again preferred over WPD, while, in the deep learning context, CWT is again the dominant approach for all training models. It should be noted that raw signals deviate only slightly from the best-performing approach, so they can be used in 1D implementations when the lowest computational burden is needed from the signal processing perspective.
- Training model: The most dominant model is 1D CNN-4L, while 1D CNN-2L and both 2D CNNs can also be used. However, the 1D CNN models seem to be more effective at capturing the relevant features of rolling bearing fault data compared with 2D CNNs. Also, 1D CNN architectures are generally simpler than 2D CNN architectures, both in terms of the model architecture and the number of parameters.
Noisy environment under individual load conditions:
- Preprocessing method: For machine learning candidates, WPD seems to perform slightly better under weak noise conditions only; thus, TDA is preferable in general in this case. For deep learning models, CWT is clearly the most suitable in all cases, while raw signals also prove noise-resilient in the 1D CNN models.
- Training model: It appears that the 1D CNN model with two layers performs better than 1D CNN-4L under weak noise conditions, but the latter is more resilient in medium and strong noise environments. Among the 2D CNN implementations, the one with two layers performs consistently better than the one with four layers. However, 1D CNN is preferred over 2D CNN in terms of performance and complexity.
Noisy environment with all load conditions considered in a merged dataset:
- Preprocessing method: In this case, WPD performs slightly better than TDA under strong noise but performs worse than TDA in all other cases. In deep learning models, CWT provides a reliable preprocessing approach in all cases. However, the most resilient option is to feed raw signals to deeper 1D CNN architectures.
- Training model: Within the 2D CNN family, accuracy decreases as the number of layers increases. In general, 1D CNN models seem to be well-suited to processing such data because they are designed to capture patterns along a single dimension, making them a natural choice for time series analysis. The deeper 1D CNN seems to work better with raw data in this generalized scenario under all noisy conditions. This model is also the best choice from the computational burden perspective.