Enhanced Fault Detection in Bearings Using Machine Learning and Raw Accelerometer Data: A Case Study Using the Case Western Reserve University Dataset
Abstract
1. Introduction
Machine Learning Algorithms for Fault Identification and Diagnosis
2. Paper Contribution
3. Data Description
- In pursuit of this goal, this research analyzed bearing-state classification data by separating it into three distinct load-level datasets; a minimal splitting sketch follows this list.
- Targeted Monitoring: In real-world applications, bearings often experience consistent loads. Training models on specific load datasets allows for deployment that is tailored to those conditions. This leads to increased accuracy and reliability for continuous monitoring.
- Detailed Analysis of Load Effects: Separating the data allows for a finer-grained analysis of how load impacts bearing faults. This unveils subtle trends and relationships that might be obscured when analyzing the combined data.
- Simplified Modeling: Assuming statistical independence between datasets based on load can simplify the modeling process. This reduces the complexity of the analysis and minimizes potential biases that might arise from combining data with inherent differences.
- Preserving Unique Characteristics: By analyzing each load level separately, we avoid masking unique load-specific features within the data. This ensures that the model captures the nuances of fault signatures under different load conditions.
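As a concrete illustration, the load-wise split can be expressed in a few lines of pandas. This is a hedged sketch rather than the authors' code: the file name and the `load_hp` / `fault_label` columns are assumed placeholders, not the actual schema.

```python
import pandas as pd

# Minimal sketch of the load-wise split. The CSV name and the column
# names ("load_hp", "fault_label") are illustrative assumptions.
records = pd.read_csv("cwru_segments.csv")

# One independent dataset per load level (0, 1, and 2 hp).
load_datasets = {
    hp: df.reset_index(drop=True)
    for hp, df in records.groupby("load_hp")
}

# Each load level is now modeled separately, as motivated in the list above.
for hp, df in load_datasets.items():
    print(f"{hp} hp: {len(df)} rows, {df['fault_label'].nunique()} classes")
```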
4. Methodology—Fault Diagnosis
Data Division Representation
5. Data Exploration
5.1. Raw Accelerometer Data Exploration
5.2. Principal Component Analysis (PCA)
5.3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
6. Model Description
6.1. Multi-Layer Perceptron (MLP)
6.2. Recurrent Neural Network
- Deep LSTM: This model stacks multiple LSTM layers to create a deeper network architecture. This setup allows the model to learn higher-level temporal representations at each layer, making it more robust and accurate. This is particularly beneficial for complex time-series prediction tasks, where long-range dependencies and patterns are important. Each LSTM layer captures different aspects of the temporal data, and the deep architecture usually ends with a fully connected layer that maps the final LSTM layer’s output to the desired output shape, for example a specific number of classes in classification tasks. Deep LSTMs are typically trained using backpropagation through time (BPTT) and are often optimized with algorithms such as Adam or RMSprop, leveraging the categorical cross-entropy loss function for classification problems. A compact Keras sketch of this architecture appears after this list.
- Bidirectional LSTM (Bi-LSTM): Bi-LSTM networks enhance the standard LSTM by introducing another layer that processes the input sequence in reverse. This allows Bi-LSTMs to capture dependencies from both past and future contexts, resulting in a more comprehensive understanding of the sequence. This architecture is particularly useful for tasks where comprehending each part requires understanding the entire sequence’s context, such as natural language processing and speech recognition. In Bi-LSTMs, the forward and backward LSTM layers run in parallel, and their outputs are combined at each time step through concatenation or summing. This combined output is then passed through additional layers or directly to the output layer, depending on the complexity of the task. Bi-LSTM networks benefit from the same deep learning optimizations and loss functions as deep LSTMs, ensuring effective training and convergence for various sequence modeling tasks.
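For concreteness, the deep LSTM specification in the model-specification table below maps onto a compact Keras definition. The sketch is one interpretation of that specification rather than the authors' released code; the segment shape is an assumed placeholder.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

N_CLASSES = 14
TIMESTEPS, FEATURES = 1024, 1  # assumed segment shape, not stated explicitly

# Deep LSTM (D-LSTM): two stacked LSTM layers with dropout, followed by
# a dense head, per the model-specification table.
model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(TIMESTEPS, FEATURES)),
    Dropout(0.2),
    LSTM(32, return_sequences=False),
    Dropout(0.2),
    Dense(32, activation="relu"),
    Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The bidirectional variant follows the same pattern, wrapping each recurrent layer in Keras' `Bidirectional()` wrapper so the sequence is processed in both directions, as described above.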
6.3. Convolutional Neural Network
- One-dimensional CNN (1D-CNN): a supervised learning model used mainly for processing data sequences such as time series, text, and speech. A set of learnable filters is applied to the input, which in this configuration is a one-dimensional sequence of values [22]. The architecture consists of 1D convolutional layers, each followed by a max pooling layer to reduce the spatial dimensions of the feature maps. The flattened output is then passed through a fully connected layer with 100 units, followed by a final output layer with SoftMax activation to produce a probability distribution over the 14 classes. The model is trained with the categorical cross-entropy loss function and optimized using the Adam optimizer.
- Multi Kernel 1D-CNN: This is the second CNN variant, developed with three different input signals processed through three separate 1D convolutional layers. The three convolutional layers are assigned kernel sizes of 200, 100, and 50, respectively, and each is followed by a max pooling layer to reduce the spatial dimensions of the feature maps. The resulting feature maps are flattened and concatenated, then passed through a fully connected layer with 100 units and a final output layer with a SoftMax activation function to produce the probability distribution of the input belonging to each class. The model is trained using the categorical cross-entropy loss function and the Adam optimizer (see the functional-API sketch after this list).
- Two-dimensional (2D) CNN: In this model, raw accelerometer signals are first transformed into 28 × 28 pixel grayscale images for each class. The 2D-CNN consists of two convolutional layers, each followed by a max pooling layer to down-sample the feature maps. The first convolutional layer has 32 filters with a kernel size of 3 × 3, while the second has 64 filters with the same kernel size; both use the ReLU activation function. The padding parameter is set to “same” so that the output feature maps retain the spatial dimensions of their input. The output of the last max pooling layer is flattened and passed through a dense layer with 128 ReLU neurons, followed by a 14-neuron output layer with SoftMax activation that predicts the probability of each class. By using a 2D CNN, the model can capture spatial correlations between adjacent pixels in the grayscale images, which helps in detecting patterns in the raw accelerometer signals [23,24].
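Of the three CNN variants, the multi-kernel design is the least conventional, so a short functional-API sketch may help. It is a hedged reconstruction from the description above and the model-specification table, not the authors' released code: the input length is a placeholder, and the sketch feeds one shared input to all three branches, whereas the three input signals mentioned above may instead be separate views of the data.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Concatenate, Conv1D, Dense, Dropout,
                                     Flatten, MaxPooling1D)

N_CLASSES = 14
SEGMENT_LEN = 1024  # assumed input length; not fixed by the text above

def branch(x, kernel_size, pool_size):
    """One multi-kernel branch: Conv1D -> Dropout -> MaxPooling1D -> Flatten."""
    x = Conv1D(64, kernel_size, activation="relu")(x)
    x = Dropout(0.5)(x)
    x = MaxPooling1D(pool_size)(x)
    return Flatten()(x)

inputs = Input(shape=(SEGMENT_LEN, 1))
merged = Concatenate()([
    branch(inputs, 200, 20),  # widest kernel: coarse temporal patterns
    branch(inputs, 100, 10),  # intermediate scale
    branch(inputs, 50, 5),    # narrowest kernel: fine-grained detail
])
x = Dense(100, activation="relu")(merged)
outputs = Dense(N_CLASSES, activation="softmax")(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```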
7. Results
7.1. Recall, Specificity, F1-Score and Precision
- Accuracy: measures the percentage of instances a model classifies correctly out of the total number of instances. It is calculated as the ratio of the number of correctly classified instances to the total number of instances, as per the formulas collected after this list.
- Recall: the proportion of true-positive instances (i.e., instances correctly classified as positive) out of the total number of positive instances. It is calculated as the ratio of true-positive instances to the sum of true-positive and false-negative instances.
- Precision: the proportion of true-positive instances out of the total number of instances the model classified as positive. It is calculated as the ratio of true-positive instances to the sum of true-positive and false-positive instances.
- F1-score: the harmonic mean of precision and recall. It is a more informative metric than accuracy when the classes are imbalanced, i.e., when one class has far more instances than the others, because it accounts for both precision and recall simultaneously.
- Specificity: also referred to as the true-negative rate, the proportion of actual negatives that are correctly identified as negatives. In a medical test, for example, it measures the percentage of healthy individuals who are correctly identified as not having the condition. It is a key metric in evaluating the performance of a classification model, especially on imbalanced datasets or when the cost of false positives is high.
- No-Load: The model achieves near-perfect accuracy for most classes, suggesting strong discriminative power despite the absence of load-induced stress signatures. Minimal off-diagonal values indicate a low number of false positives.
- Loaded Conditions: Both the 1 hp and 2 hp scenarios show a slight increase in misclassifications (higher off-diagonal values) compared to the no-load model. This suggests the model struggles to differentiate between certain fault types, possibly due to inherent data complexity or similar signatures under load.
- Impact of Load: Increased load introduces a trade-off. While the model maintains good accuracy for some fault types and the normal state, others exhibit lower precision under higher loads. This highlights the varying influence of load on different fault signatures.
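The per-load precision, specificity, recall, and F1 values reported in the results tables can all be derived from a confusion matrix. The sketch below is illustrative rather than the authors' evaluation code: it assumes integer-encoded labels and macro averaging across the 14 classes, and it computes specificity by hand, since scikit-learn does not expose it directly.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def kpi_report(y_true, y_pred, n_classes=14):
    """Macro-averaged precision, specificity, recall, and F1-score."""
    labels = list(range(n_classes))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0)

    # Per-class specificity TN / (TN + FP), derived from the matrix.
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    specificity = tn / (tn + fp)

    return {"precision": precision.mean(),
            "specificity": specificity.mean(),
            "recall": recall.mean(),
            "f1": f1.mean()}
```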
8. Conclusions and Remarks
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kompella, K.C.D.; Rao, M.V.G.; Rao, R.S. Bearing fault detection in a 3 phase induction motor using stator current frequency spectral subtraction with various wavelet decomposition techniques. Ain Shams Eng. J. 2018, 9, 2427–2439. [Google Scholar] [CrossRef]
- Morris, D.; Sadeghi, F.; Singh, K.; Voothaluru, R. Residual stress formation and stability in bearing steels due to fatigue induced retained austenite transformation. Int. J. Fatigue 2020, 136, 105610. [Google Scholar] [CrossRef]
- Rajabi, S.; Azari, M.S.; Santini, S.; Flammini, F. Fault diagnosis in industrial rotating equipment based on permutation entropy, signal processing and multi-output neuro-fuzzy classifier. Expert. Syst. Appl. 2022, 206, 117754. [Google Scholar] [CrossRef]
- Yang, C.; Ma, J.; Wang, X.; Li, X.; Li, Z.; Luo, T. A novel based-performance degradation indicator RUL prediction model and its application in rolling bearing. ISA Trans. 2022, 121, 349–364. [Google Scholar] [CrossRef] [PubMed]
- Liu, S.; Fan, L. An adaptive prediction approach for rolling bearing remaining useful life based on multistage model with three-source variability. Reliab. Eng. Syst. Saf. 2022, 218, 108182. [Google Scholar] [CrossRef]
- Saha, D.K.; Hoque, M.E.; Badihi, H. Development of Intelligent Fault Diagnosis Technique of Rotary Machine Element Bearing: A Machine Learning Approach. Sensors 2022, 22, 1073. [Google Scholar] [CrossRef] [PubMed]
- Ding, Y.; Jia, M.; Miao, Q.; Cao, Y. A novel time–frequency Transformer based on self–attention mechanism and its application in fault diagnosis of rolling bearings. Mech. Syst. Signal Process 2022, 168, 108616. [Google Scholar] [CrossRef]
- Zhang, S.; Zhang, S.; Wang, B.; Habetler, T.G. Machine Learning and Deep Learning Algorithms for Bearing Fault Diagnostics—A Comprehensive Review. IEEE Access 2020, 8, 29857–29881. [Google Scholar] [CrossRef]
- Lu, C.; Wang, Z.Y.; Qin, W.L.; Ma, J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2017, 130, 377–388. [Google Scholar] [CrossRef]
- Zhang, T.; Liu, S.; Wei, Y.; Zhang, H. A novel feature adaptive extraction method based on deep learning for bearing fault diagnosis. Measurement 2021, 185, 110030. [Google Scholar] [CrossRef]
- Singh, S.; Kumar, A.; Kumar, N. Motor Current Signature Analysis for Bearing Fault Detection in Mechanical Systems. Procedia Mater. Sci. 2014, 6, 171–177. [Google Scholar] [CrossRef]
- Bearing Fault Diagnosis Using Motor Current Signature Analysis and the Artificial Neural Network. Available online: https://www.researchgate.net/publication/339366382_Bearing_Fault_Diagnosis_Using_Motor_Current_Signature_Analysis_and_the_Artificial_Neural_Network (accessed on 3 April 2023).
- Alonso-González, M.; Díaz, V.G.; Pérez, B.L.; G-Bustelo, B.C.P.; Anzola, J.P. Bearing Fault Diagnosis With Envelope Analysis and Machine Learning Approaches Using CWRU Dataset. IEEE Access 2023, 11, 57796–57805. [Google Scholar] [CrossRef]
- Neupane, D.; Seok, J. Bearing Fault Detection and Diagnosis Using Case Western Reserve University Dataset with Deep Learning Approaches: A Review. IEEE Access 2020, 8, 93155–93178. [Google Scholar] [CrossRef]
- Apparatus & Procedures. Case School of Engineering, Case Western Reserve University. Available online: https://engineering.case.edu/bearingdatacenter/apparatus-and-procedures (accessed on 4 April 2024).
- Jiang, L.; Fu, X.; Cui, J.; Li, Z. Fault detection of rolling element bearing based on principal component analysis. In Proceedings of the 2012 24th Chinese Control and Decision Conference, CCDC 2012, Taiyuan, China, 23–25 May 2012; pp. 2944–2948. [Google Scholar] [CrossRef]
- Mezni, Z.; Delpha, C.; Diallo, D.; Braham, A. Performance of Bearing Ball Defect Classification Based on the Fusion of Selected Statistical Features. Entropy 2022, 24, 1251. [Google Scholar] [CrossRef] [PubMed]
- Raj, K.K.; Joshi, S.H.; Kumar, R. A state-space model for induction machine stator inter-turn fault and its evaluation at low severities by PCA. In Proceedings of the 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2021, Brisbane, Australia, 8–10 December 2021. [Google Scholar] [CrossRef]
- Xu, X.; Xie, Z.; Yang, Z.; Li, D.; Xu, X. A t-SNE Based Classification Approach to Compositional Microbiome Data. Front. Genet. 2020, 11, 1633. [Google Scholar] [CrossRef]
- Tian, H.; Fan, H.; Feng, M.; Cao, R.; Li, D. Fault Diagnosis of Rolling Bearing Based on HPSO Algorithm Optimized CNN-LSTM Neural Network. Sensors 2023, 23, 6508. [Google Scholar] [CrossRef] [PubMed]
- Sun, H.; Zhao, S. Fault Diagnosis for Bearing Based on 1DCNN and LSTM. Shock. Vib. 2021, 2021, 1221462. [Google Scholar] [CrossRef]
- Ince, T.; Malik, J.; Devecioglu, O.C.; Kiranyaz, S.; Avci, O.; Eren, L.; Gabbouj, M. Early Bearing Fault Diagnosis of Rotating Machinery by 1D Self-Organized Operational Neural Networks. IEEE Access 2021, 9, 139260–139270. [Google Scholar] [CrossRef]
- Toma, R.N.; Piltan, F.; Im, K.; Shon, D.; Yoon, T.H.; Yoo, D.S.; Kim, J.M. A Bearing Fault Classification Framework Based on Image Encoding Techniques and a Convolutional Neural Network under Different Operating Conditions. Sensors 2022, 22, 4881. [Google Scholar] [CrossRef] [PubMed]
- Yang, J.; Liu, J.; Xie, J.; Wang, C.; Ding, T. Conditional GAN and 2-D CNN for Bearing Fault Diagnosis with Small Samples. IEEE Trans. Instrum. Meas. 2021, 70, 3525712. [Google Scholar] [CrossRef]
- Yacouby, R.; Axman, D. Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Online, 20 November 2020. [Google Scholar]
| Fault Type | Fault Diameter (inch) | Fault Position Relative to Load Zone | Fault Label | Data Points (0 hp) | Data Points (1 hp) | Data Points (2 hp) |
|---|---|---|---|---|---|---|
| Normal | 0 | - | N | 243,938 | 485,643 | 485,643 |
| Ball | 0.007 | - | 7_BA | 243,538 | 486,804 | 488,545 |
| Ball | 0.014 | - | 14_BA | 249,146 | 487,384 | 486,804 |
| Ball | 0.021 | - | 21_BA | 243,938 | 487,384 | 491,446 |
| Inner Race | 0.007 | - | 7_IR | 243,938 | 485,643 | 485,643 |
| Inner Race | 0.014 | - | 14_IR | 63,788 | 487,964 | 485,063 |
| Inner Race | 0.021 | - | 21_IR | 244,339 | 491,446 | 486,804 |
| Outer Race | 0.007 | @ 6:00 | 7_OR1 | 244,739 | 486,804 | 487,964 |
| Outer Race | 0.007 | @ 3:00 | 7_OR2 | 124,602 | 485,643 | 486,224 |
| Outer Race | 0.007 | @ 12:00 | 7_OR3 | 129,969 | 483,323 | 484,483 |
| Outer Race | 0.014 | @ 6:00 | 14_OR1 | 245,140 | 486,804 | 488,545 |
| Outer Race | 0.021 | @ 6:00 | 21_OR1 | 246,342 | 487,964 | 489,125 |
| Outer Race | 0.021 | @ 3:00 | 21_OR2 | 128,663 | 487,384 | 484,483 |
| Outer Race | 0.021 | @ 12:00 | 21_OR3 | 130,549 | 486,804 | 486,224 |
| Total Data Points | | | | 2,782,629 | 6,816,994 | 6,816,996 |
| Fault Label | Load 0: Training | Load 0: Validation | Load 0: Test | Load 1: Training | Load 1: Validation | Load 1: Test | Load 2: Training | Load 2: Validation | Load 2: Test |
|---|---|---|---|---|---|---|---|---|---|
| N | 680 | 170 | 362 | 1359 | 338 | 724 | 1359 | 338 | 724 |
| 7_BA | 681 | 167 | 364 | 1361 | 337 | 730 | 1367 | 337 | 732 |
| 14_BA | 698 | 172 | 370 | 1362 | 339 | 730 | 1362 | 337 | 729 |
| 21_BA | 680 | 169 | 365 | 1363 | 338 | 730 | 1378 | 342 | 731 |
| 7_IR | 680 | 170 | 364 | 1356 | 339 | 726 | 1355 | 341 | 727 |
| 14_IR | 177 | 43 | 370 | 1362 | 343 | 729 | 1358 | 337 | 725 |
| 21_IR | 681 | 169 | 366 | 1376 | 344 | 732 | 1361 | 337 | 730 |
| 7_OR1 | 683 | 169 | 366 | 1363 | 338 | 727 | 1366 | 340 | 728 |
| 7_OR2 | 347 | 85 | 185 | 1356 | 339 | 728 | 1357 | 339 | 729 |
| 7_OR3 | 362 | 88 | 194 | 1353 | 335 | 722 | 1355 | 338 | 724 |
| 14_OR1 | 684 | 171 | 364 | 1360 | 337 | 731 | 1362 | 340 | 734 |
| 21_OR1 | 688 | 169 | 368 | 1363 | 338 | 732 | 1366 | 339 | 734 |
| 21_OR2 | 359 | 88 | 191 | 1365 | 338 | 728 | 1354 | 338 | 724 |
| 21_OR3 | 363 | 90 | 194 | 1360 | 339 | 729 | 1359 | 339 | 724 |
| Total | 7763 | 1920 | 4423 | 19,059 | 4742 | 10,198 | 19,059 | 4742 | 10,195 |
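The segment counts above imply that each raw accelerometer record was sliced into fixed-length windows before the training/validation/test split. A minimal windowing sketch follows; both the window length (1024, matching the MLP input layer in the model table below) and the overlapping stride are assumptions inferred from the counts, not values stated by the authors.

```python
import numpy as np

def segment(signal: np.ndarray, window: int = 1024, stride: int = 200):
    """Slice a 1-D accelerometer stream into fixed-length windows.

    The window length matches the 1024-unit MLP input layer in the model
    table; the overlapping stride is an assumption inferred from the
    segment counts above, not a value stated by the authors.
    """
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride: i * stride + window]
                     for i in range(n)])

# Example: roughly 1.2k windows from the 243,938-point Normal record at 0 hp.
x = np.random.randn(243_938)  # stand-in for a raw CWRU record
windows = segment(x)
print(windows.shape)          # (1215, 1024) under these assumptions
```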
| Model Label | Classifier | Specification |
|---|---|---|
| MLP | MLP NN | Architecture: 1024IN (ReLU) → 512DL (ReLU) → 256DL (ReLU) → 14OUT (Softmax); Optimizer: adam; Loss: categorical_crossentropy |
| Bidirectional LSTM | Bi-LSTM | Architecture: [1BL—LSTM, 128 Units, Return Sequences: True, Input Shape: Specified] → [1DO—Dropout, Rate: 0.2] → [1FD—Dense, 32 Units, Activation: tanh] → [2BL—LSTM, 64 Units, Return Sequences: False] → [2DO—Dropout, Rate: 0.1] → [1FD—Dense, 16 Units, Activation: tanh] → [1FD—Dense, 256 Units, Activation: ReLU] → [2FD—Dense, Number of Classes, Activation: Softmax]; Optimizer: adam; Loss: categorical_crossentropy; Metrics: accuracy |
| Deep LSTM | D-LSTM | Architecture: [1LD—LSTM, 64 Units, Return Sequences: True, Input Shape: Specified] → [1DO—Dropout, Rate: 0.2] → [2LD—LSTM, 32 Units, Return Sequences: False] → [2DO—Dropout, Rate: 0.2] → [1FD—Dense, 32 Units, Activation: ReLU] → [2FD—Dense, Number of Classes, Activation: Softmax]; Optimizer: adam; Loss: categorical_crossentropy; Metrics: accuracy |
| 1D-CNN | 1D Convolutional NN | Architecture: [1FD—Conv1D, 64 Filters, Kernel Size 100, Activation: ReLU] → [2FD—Conv1D, 32 Filters, Kernel Size 50, Activation: ReLU] → [1PL—MaxPooling1D, Pool Size 4] → [1FL—Flatten] → [1FC—Dense, 100 Units, Activation: ReLU]; Optimizer: adam; Loss: categorical_crossentropy |
| MK-CNN | 1D Multi-Kernel CNN | Branch 1: [Conv1D, 64 Filters, Kernel Size 200, Activation: ReLU, Dropout 0.5] → [MaxPooling1D, Pool Size 20] → [Flatten]; Branch 2: [Conv1D, 64 Filters, Kernel Size 100, Activation: ReLU, Dropout 0.5] → [MaxPooling1D, Pool Size 10] → [Flatten]; Branch 3: [Conv1D, 64 Filters, Kernel Size 50, Activation: ReLU, Dropout 0.5] → [MaxPooling1D, Pool Size 5] → [Flatten]; Concatenate → [1DL—Dense, 100 Units, Activation: ReLU] → [1OUT—Softmax]; Optimizer: adam; Loss: categorical_crossentropy |
| 2D-CNN | 2D CNN | Architecture (Sequential): [1FD—Conv2D, 32 Filters, Kernel Size (3, 3), Activation: ReLU] → [1PL—MaxPooling2D (2, 2), Strides (2, 2)] → [1CL—Conv2D, 64 Filters, Kernel Size (3, 3), Activation: ReLU] → [2PL—MaxPooling2D (2, 2), Strides (2, 2)] → [1DL—Dense, 128 Units, Activation: ReLU] → [1OUT—Softmax]; Optimizer: adam; Loss: categorical_crossentropy |
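The 2D-CNN row above, together with the description in Section 6.3, translates into the following Keras sketch, including the 28 × 28 grayscale encoding of raw signal segments. The min–max scaling used to form pixel values is an assumption; the layer stack follows the table.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D

def to_image(segment: np.ndarray) -> np.ndarray:
    """Encode a 784-point signal segment as a 28 x 28 grayscale image.
    Min-max scaling to [0, 1] is an assumed normalization."""
    s = (segment - segment.min()) / (segment.max() - segment.min() + 1e-12)
    return s.reshape(28, 28, 1)

# 2D-CNN per the table: two Conv2D/MaxPooling2D stages, then dense layers.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", padding="same",
           input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2), strides=(2, 2)),
    Conv2D(64, (3, 3), activation="relu", padding="same"),
    MaxPooling2D((2, 2), strides=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(14, activation="softmax"),  # 14 fault classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```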
| Classifier | Validation Accuracy 0 hp (%) | Validation Accuracy 1 hp (%) | Validation Accuracy 2 hp (%) | Test Accuracy 0 hp (%) | Test Accuracy 1 hp (%) | Test Accuracy 2 hp (%) |
|---|---|---|---|---|---|---|
| MLP | 73.94 | 55.74 | 51.30 | 68.38 | 55.73 | 54.66 |
| D-LSTM | 81.32 | 81.51 | 78.93 | 70.50 | 81.50 | 82.54 |
| Bi-LSTM | 89.84 | 81.90 | 80.33 | 78.96 | 81.89 | 81.70 |
| 1D-CNN | 90.26 | 89.14 | 89.20 | 85.02 | 89.27 | 90.30 |
| MK-CNN | 95.72 | 92.36 | 90.67 | 94.26 | 91.87 | 93.44 |
| 2D-CNN | 98.40 | 93.62 | 95.63 | 99.32 | 91.16 | 94.97 |
| Classifier | Loading | Precision | Specificity | Recall | F1-Score |
|---|---|---|---|---|---|
| MLP | 0 hp | 0.693 | 0.975 | 0.683 | 0.685 |
| MLP | 1 hp | 0.565 | 0.966 | 0.557 | 0.558 |
| MLP | 2 hp | 0.534 | 0.914 | 0.545 | 0.551 |
| D-LSTM | 0 hp | 0.615 | 0.977 | 0.705 | 0.638 |
| D-LSTM | 1 hp | 0.735 | 0.985 | 0.815 | 0.763 |
| D-LSTM | 2 hp | 0.843 | 0.983 | 0.8253 | 0.8115 |
| Bi-LSTM | 0 hp | 0.6902 | 0.983 | 0.789 | 0.724 |
| Bi-LSTM | 1 hp | 0.816 | 0.915 | 0.892 | 0.813 |
| Bi-LSTM | 2 hp | 0.865 | 0.923 | 0.817 | 0.821 |
| 1D-CNN | 0 hp | 0.865 | 0.988 | 0.850 | 0.846 |
| 1D-CNN | 1 hp | 0.897 | 0.991 | 0.983 | 0.892 |
| 1D-CNN | 2 hp | 0.863 | 0.964 | 0.902 | 0.876 |
| MK-CNN | 0 hp | 0.942 | 0.995 | 0.942 | 0.940 |
| MK-CNN | 1 hp | 0.915 | 0.895 | 0.9167 | 0.882 |
| MK-CNN | 2 hp | 0.923 | 0.996 | 0.932 | 0.910 |
| 2D-CNN | 0 hp | 0.993 | 0.999 | 0.993 | 0.998 |
| 2D-CNN | 1 hp | 0.913 | 0.993 | 0.911 | 0.910 |
| 2D-CNN | 2 hp | 0.950 | 0.996 | 0.949 | 0.949 |