Figure 1.
Hybrid RBSO–MRFO algorithm workflow for automated hyperparameter optimization of the Transformer-LSTM model.
Figure 2.
Model architecture of the proposed hybrid Transformer-LSTM for binary classification.
Figure 3.
Model architecture of the proposed hybrid Transformer-LSTM for multi-class classification.
Figure 4.
Visual comparison of signal distributions across operating conditions for the CWRU dataset.
Figure 5.
Visual comparison of signal distributions across operating conditions for the TMFD dataset.
Figure 6.
Visual comparison of signal distributions across operating conditions for the MaFaulDa dataset.
Figure 7.
Confusion matrix for binary classification of the Transformer-LSTM model on the CWRU dataset: (a) pre-optimization, (b) post-optimization.
Figure 8.
Test accuracy of deep learning models pre-optimization and post-optimization for binary classification on the CWRU dataset.
Figure 9.
Pre-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the CWRU dataset.
Figure 10.
Post-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the CWRU dataset.
Figure 11.
Test accuracy of deep learning models pre-optimization and post-optimization for multi-class classification on the CWRU dataset.
Figure 12.
Confusion matrix for binary classification of the Transformer-LSTM model on the TMFD dataset: (a) pre-optimization, (b) post-optimization.
Figure 13.
Test accuracy of deep learning models pre-optimization and post-optimization for binary classification on the TMFD dataset.
Figure 14.
Pre-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the TMFD dataset.
Figure 15.
Post-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the TMFD dataset.
Figure 16.
Test accuracy of deep learning models pre-optimization and post-optimization for multi-class classification on the TMFD dataset.
Figure 17.
Confusion matrix for binary classification of the Transformer-LSTM model on the MaFaulDa dataset: (a) pre-optimization, (b) post-optimization.
Figure 18.
Test accuracy of deep learning models pre-optimization and post-optimization for binary classification on the MaFaulDa dataset.
Figure 19.
Pre-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the MaFaulDa dataset.
Figure 20.
Post-optimization confusion matrix of the Transformer-LSTM model for multi-class classification on the MaFaulDa dataset.
Figure 21.
Test accuracy of deep learning models pre-optimization and post-optimization for multi-class classification on the MaFaulDa dataset.
Table 1.
Hyperparameter search spaces for the evaluated models, optimized with the hybrid RBSO–MRFO algorithm for binary and multi-class classification.
| Model | Hyperparameter | Range | Type |
|---|---|---|---|
| MLP [62,63,64] | Hidden Units | (2, 16) | Discrete |
| | Dropout Rate | (0.0, 0.5) | Continuous |
| | Learning Rate | (1 × 10^−5, 1 × 10^−2) | Continuous |
| LSTM [62,65,66] | LSTM Units | (8, 128) | Discrete |
| | Dropout Rate | (0.0, 0.5) | Continuous |
| | Learning Rate | (1 × 10^−5, 1 × 10^−2) | Continuous |
| GRU–TCN [62,65,66] | GRU Units | (8, 256) | Discrete |
| | TCN Filters | (16, 64) | Discrete |
| | Dropout Rate | (0.0, 0.5) | Continuous |
| | Learning Rate | (1 × 10^−5, 1 × 10^−2) | Continuous |
| CNN–BiLSTM [64,65,66] | CNN Filters | (16, 128) | Discrete |
| | LSTM Units | (8, 128) | Discrete |
| | Dropout Rate | (0.0, 0.5) | Continuous |
| | Learning Rate | (1 × 10^−5, 1 × 10^−2) | Continuous |
| Transformer-LSTM [62,66,67] | Number of Heads | (1, 8) | Discrete |
| | Key Dimension | (8, 128) | Discrete |
| | FFN Units | (8, 512) | Discrete |
| | LSTM Units | (8, 256) | Discrete |
| | Dropout Rate | (0.0, 0.5) | Continuous |
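
The search space in Table 1 mixes discrete and continuous variables. A minimal sketch of how a candidate for the Transformer-LSTM model could be encoded and sampled (the bounds and types come from the table; the dictionary encoding and helper names are illustrative, not from the paper):

```python
import random

# Table 1 search space for the Transformer-LSTM model.
# Bounds are inclusive; "discrete" variables are integers.
SEARCH_SPACE = {
    "num_heads":    {"range": (1, 8),     "type": "discrete"},
    "key_dim":      {"range": (8, 128),   "type": "discrete"},
    "ffn_units":    {"range": (8, 512),   "type": "discrete"},
    "lstm_units":   {"range": (8, 256),   "type": "discrete"},
    "dropout_rate": {"range": (0.0, 0.5), "type": "continuous"},
}

def sample_candidate(space, rng=random):
    """Draw one candidate configuration: integers for discrete
    variables, uniform floats for continuous ones."""
    candidate = {}
    for name, spec in space.items():
        lo, hi = spec["range"]
        if spec["type"] == "discrete":
            candidate[name] = rng.randint(lo, hi)
        else:
            candidate[name] = rng.uniform(lo, hi)
    return candidate
```

In the hybrid RBSO–MRFO loop, each such candidate would be scored by briefly training the model and reading off validation accuracy as fitness.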
Table 2.
Transformer-LSTM model architecture for binary classification.
| Layer | Configuration |
|---|---|
| Model type | Hybrid Transformer-LSTM network |
| Input layer | Input shape (number of features) |
| Reshape layer | Reshape to (number of features, 1) |
| Conv1D layer | 64 filters, kernel size 1, ReLU activation, same padding |
| Gaussian noise | Standard deviation 0.01 |
| Transformer block | Multi-head attention 1 head, key dimension 4, dropout 0.9, Add & LayerNorm; feed-forward Dense 16, dropout, Dense d_model, dropout, Add & LayerNorm |
| LSTM layer | 8 units, return sequences False, dropout 0.9 |
| Dense layer | 128 units, ReLU activation, dropout 0.2 |
| Output layer | 1 unit, Sigmoid activation |
| Output | Binary classification |
Table 3.
Transformer-LSTM model hyperparameters for binary classification.
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Loss function | Binary cross-entropy |
| Metrics | Accuracy |
| Batch size | 128 |
| Learning rate | 0.001 |
| Learning rate schedule | ReduceLROnPlateau, patience 3, factor 0.5, min_lr 1 × 10^−6, monitored on validation loss |
| Callbacks | Confusion matrix visualization |
| Transformer heads | 1 |
| Key dimension | 4 |
| Feed-forward units | 16 |
| LSTM units | 8 |
| Dropout rate | 0.9 (Transformer + LSTM), 0.2 (Dense layer) |
| Dense layer units | 128 |
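
The binary-classification configuration in Tables 2 and 3 can be sketched in Keras roughly as follows. This is an illustrative reconstruction, not the authors' released code: the feed-forward block's output width (tied here to the 64 Conv1D filters so the residual Add is valid) is our reading of "Dense d_model" in Table 2.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_binary_transformer_lstm(n_features: int) -> tf.keras.Model:
    """Hybrid Transformer-LSTM for binary classification, per Tables 2-3."""
    inputs = layers.Input(shape=(n_features,))
    x = layers.Reshape((n_features, 1))(inputs)
    x = layers.Conv1D(64, kernel_size=1, activation="relu", padding="same")(x)
    x = layers.GaussianNoise(0.01)(x)

    # Transformer block: 1 head, key dimension 4, dropout 0.9 (per Table 2).
    attn = layers.MultiHeadAttention(num_heads=1, key_dim=4, dropout=0.9)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))
    ffn = layers.Dense(16, activation="relu")(x)
    ffn = layers.Dropout(0.9)(ffn)
    ffn = layers.Dense(64)(ffn)  # project back to d_model (= 64 Conv1D filters)
    ffn = layers.Dropout(0.9)(ffn)
    x = layers.LayerNormalization()(layers.Add()([x, ffn]))

    x = layers.LSTM(8, return_sequences=False, dropout=0.9)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)

    model = models.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The multi-class variant (Tables 4-5) differs only in the widths (key dimension 16, FFN 64, LSTM 32, dropout 0.5) and in ending with a Softmax output over the number of classes with categorical cross-entropy loss.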
Table 4.
Transformer-LSTM model architecture for multi-class classification.
| Layer | Configuration |
|---|---|
| Model type | Hybrid Transformer-LSTM network |
| Input layer | Input shape (number of features) |
| Reshape layer | Reshape to (number of features, 1) |
| Conv1D layer | 64 filters, kernel size 1, ReLU activation, same padding |
| Gaussian noise | Standard deviation 0.01 |
| Multi-head attention | 1 head, key dimension 16, dropout 0.5, Add & LayerNorm |
| Feed-forward network | Dense 64, ReLU, dropout 0.5, Dense d_model, dropout 0.5, Add & LayerNorm |
| LSTM layer | 32 units, return sequences False, dropout 0.5 |
| Dense layer | 128 units, ReLU activation, dropout 0.2 |
| Output layer | Number of classes units, Softmax activation |
| Output | Multi-class classification |
Table 5.
Transformer-LSTM model hyperparameters for multi-class classification.
| Hyperparameter | Value |
|---|---|
| Optimizer | Adam |
| Loss function | Categorical cross-entropy |
| Metrics | Accuracy |
| Batch size | 128 |
| Learning rate | 0.001 |
| Learning rate schedule | ReduceLROnPlateau, patience 3, factor 0.5, minimum 1 × 10^−6, monitored on validation loss |
| Callbacks | Confusion matrix visualization |
| Transformer heads | 1 |
| Key dimension | 16 |
| Feed-forward units | 64 |
| LSTM units | 32 |
| Dropout rate | 0.5 (Transformer + LSTM), 0.2 (Dense layer) |
| Dense layer units | 128 |
Table 6.
Sample distribution per operating condition on the CWRU dataset.
| Fault Type | Number of Samples |
|---|---|
| Normal_1 | 230 |
| IR_014_1 | 230 |
| IR_007_1 | 230 |
| IR_021_1 | 230 |
| OR_007_6_1 | 230 |
| OR_014_6_1 | 230 |
| OR_021_6_1 | 230 |
| Ball_007_1 | 230 |
| Ball_014_1 | 230 |
| Ball_021_1 | 230 |
Table 7.
Statistical characterization of operating conditions on the CWRU dataset.
| Fault Type | Samples | Mean | Std | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|
| Ball_014_1 | 230 | 2.24 | 4.46 | −2.49 | 0.013 | 0.168 | 3.066 | 40.87 |
| Ball_021_1 | 230 | 2.36 | 6.05 | −2.82 | 0.008 | 0.175 | 0.715 | 73.78 |
| Normal_1 | 230 | 0.94 | 1.94 | −0.43 | −0.119 | 0.064 | 0.222 | 11.70 |
| IR_014_1 | 230 | 1.37 | 2.26 | −1.18 | 0.032 | 0.207 | 1.50 | 8.12 |
| Ball_007_1 | 230 | 1.24 | 2.46 | −0.63 | 0.017 | 0.137 | 0.508 | 11.31 |
| IR_021_1 | 230 | 6.93 | 18.43 | −3.11 | 0.014 | 0.605 | 2.561 | 162.79 |
| OR_014_6_1 | 230 | 1.70 | 3.76 | −1.35 | 0.009 | 0.132 | 0.519 | 21.47 |
| IR_007_1 | 230 | 2.60 | 4.37 | −1.57 | 0.021 | 0.286 | 4.501 | 16.90 |
| OR_007_6_1 | 230 | 11.41 | 29.94 | −5.25 | 0.058 | 1.11 | 4.679 | 313.74 |
| OR_021_6_1 | 230 | 7.04 | 14.12 | −6.29 | 0.014 | 0.699 | 6.923 | 104.54 |
Table 8.
Sample distribution per operating condition on the TMFD dataset.
| Operating Condition | Number of Samples |
|---|---|
| Normal | 18,026 |
| Steady-State Overload | 320 |
| Transient Overload | 220 |
Table 9.
Statistical characterization of operating conditions on the TMFD dataset.
| Operating Condition | Samples | Mean | Std | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|
| Normal | 18,026 | 73.52 | 125.45 | 0.0 | 1.0 | 3.7 | 100.0 | 580.0 |
| Steady-State Overload | 320 | 70.04 | 138.70 | 0.0 | 0.0 | 1.2 | 20.93 | 500.0 |
| Transient Overload | 220 | 70.38 | 138.53 | 0.0 | 0.0 | 2.0 | 21.75 | 500.0 |
Table 10.
Sample distribution per operating condition on the MaFaulDa dataset.
| Fault Type | Number of Samples |
|---|---|
| Normal | 12,250,000 |
| 6 g | 12,250,000 |
| 10 g | 12,000,000 |
| 15 g | 12,000,000 |
| 20 g | 12,250,000 |
| 25 g | 11,750,000 |
| 30 g | 11,750,000 |
Table 11.
Statistical characterization of operating conditions on the MaFaulDa dataset.
| Fault Type | Samples | Mean | Std | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|---|---|
| 6 g | 12,250,000 | 0.009379 | 0.753013 | −5.0265 | −0.25145 | −0.009956 | 0.12983 | 5.3841 |
| Normal | 12,250,000 | 0.007108 | 0.746563 | −4.4835 | −0.29003 | −0.014158 | 0.16688 | 5.1078 |
| 10 g | 12,000,000 | 0.014068 | 0.801364 | −4.8189 | −0.28728 | −0.010551 | 0.14267 | 6.4163 |
| 15 g | 12,000,000 | 0.009017 | 0.873038 | −7.7810 | −0.34193 | −0.011001 | 0.14749 | 7.3737 |
| 20 g | 12,250,000 | 0.006443 | 0.924449 | −154.930 | −0.38188 | −0.012361 | 0.15468 | 35.2620 |
| 25 g | 11,750,000 | 0.009178 | 0.961216 | −204.120 | −0.40610 | −0.011347 | 0.16391 | 126.8600 |
| 30 g | 11,750,000 | −0.001049 | 1.143868 | −193.770 | −0.46202 | −0.018592 | 0.19607 | 130.4000 |
Table 12.
Optimized hyperparameter values for deep learning models in binary classification on the CWRU dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 15 |
| | Dropout Rate | 0.0619 |
| | Learning Rate | 0.0098 |
| LSTM | LSTM Units | 128 |
| | Dropout Rate | 0.1039 |
| | Learning Rate | 0.0100 |
| GRU-TCN | GRU Units | 93 |
| | TCN Filters | 16 |
| | Dropout Rate | 0.2318 |
| | Learning Rate | 0.0085 |
| CNN-BiLSTM | CNN Filters | 67 |
| | LSTM Units | 102 |
| | Dropout Rate | 0.0998 |
| | Learning Rate | 0.0051 |
| Transformer-LSTM | Number of Heads | 3 |
| | Key Dimension | 32 |
| | FFN Units | 431 |
| | LSTM Units | 161 |
| | Dropout Rate | 0.2067 |
Table 13.
Optimized hyperparameter values for deep learning models in multi-class classification on the CWRU dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 14 |
| | Dropout Rate | 0.0170 |
| | Learning Rate | 0.0100 |
| LSTM | LSTM Units | 127 |
| | Dropout Rate | 0.4771 |
| | Learning Rate | 0.0092 |
| GRU-TCN | GRU Units | 156 |
| | TCN Filters | 54 |
| | Dropout Rate | 0.2579 |
| | Learning Rate | 0.0065 |
| CNN-BiLSTM | CNN Filters | 30 |
| | LSTM Units | 120 |
| | Dropout Rate | 0.3741 |
| | Learning Rate | 0.0078 |
| Transformer-LSTM | Number of Heads | 2 |
| | Key Dimension | 95 |
| | FFN Units | 339 |
| | LSTM Units | 256 |
| | Dropout Rate | 0.0000 |
Table 14.
Optimized hyperparameter values for deep learning models in binary classification on the TMFD dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 8 |
| | Dropout Rate | 0.0302 |
| | Learning Rate | 0.0077 |
| LSTM | LSTM Units | 91 |
| | Dropout Rate | 0.4810 |
| | Learning Rate | 0.0090 |
| GRU-TCN | GRU Units | 122 |
| | TCN Filters | 53 |
| | Dropout Rate | 0.2888 |
| | Learning Rate | 0.0099 |
| CNN-BiLSTM | CNN Filters | 57 |
| | LSTM Units | 113 |
| | Dropout Rate | 0.0000 |
| | Learning Rate | 0.0081 |
| Transformer-LSTM | Number of Heads | 4 |
| | Key Dimension | 74 |
| | FFN Units | 143 |
| | LSTM Units | 250 |
| | Dropout Rate | 0.5000 |
Table 15.
Optimized hyperparameter values for deep learning models in multi-class classification on the TMFD dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 13 |
| | Dropout Rate | 0.0638 |
| | Learning Rate | 0.0097 |
| LSTM | LSTM Units | 66 |
| | Dropout Rate | 0.1040 |
| | Learning Rate | 0.0100 |
| GRU-TCN | GRU Units | 184 |
| | TCN Filters | 50 |
| | Dropout Rate | 0.1490 |
| | Learning Rate | 0.0088 |
| CNN-BiLSTM | CNN Filters | 98 |
| | LSTM Units | 111 |
| | Dropout Rate | 0.5000 |
| | Learning Rate | 0.0082 |
| Transformer-LSTM | Number of Heads | 2 |
| | Key Dimension | 94 |
| | FFN Units | 251 |
| | LSTM Units | 145 |
| | Dropout Rate | 0.0553 |
Table 16.
Optimized hyperparameter values for deep learning models in binary classification on the MaFaulDa dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 16 |
| | Dropout Rate | 0.0414 |
| | Learning Rate | 0.0061 |
| LSTM | LSTM Units | 89 |
| | Dropout Rate | 0.4077 |
| | Learning Rate | 0.0100 |
| GRU-TCN | GRU Units | 50 |
| | TCN Filters | 60 |
| | Dropout Rate | 0.1740 |
| | Learning Rate | 0.0086 |
| CNN-BiLSTM | CNN Filters | 81 |
| | LSTM Units | 84 |
| | Dropout Rate | 0.2304 |
| | Learning Rate | 0.0053 |
| Transformer-LSTM | Number of Heads | 4 |
| | Key Dimension | 80 |
| | FFN Units | 8 |
| | LSTM Units | 144 |
| | Dropout Rate | 0.1267 |
Table 17.
Optimized hyperparameter values for deep learning models in multi-class classification on the MaFaulDa dataset.
| Model | Hyperparameters | Best Value |
|---|---|---|
| MLP | Hidden Units | 14 |
| | Dropout Rate | 0.0034 |
| | Learning Rate | 0.0100 |
| LSTM | LSTM Units | 75 |
| | Dropout Rate | 0.1150 |
| | Learning Rate | 0.0092 |
| GRU-TCN | GRU Units | 190 |
| | TCN Filters | 53 |
| | Dropout Rate | 0.0251 |
| | Learning Rate | 0.0089 |
| CNN-BiLSTM | CNN Filters | 67 |
| | LSTM Units | 86 |
| | Dropout Rate | 0.1978 |
| | Learning Rate | 0.0100 |
| Transformer-LSTM | Number of Heads | 1 |
| | Key Dimension | 11 |
| | FFN Units | 29 |
| | LSTM Units | 243 |
| | Dropout Rate | 0.4965 |
Table 18.
Performance comparison of deep learning models for binary classification before and after optimization on the CWRU dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 96.96% | 97.56% | 96.96% | 97.10% |
| LSTM | 97.24% | 97.24% | 97.24% | 97.12% |
| GRU-TCN | 95.59% | 96.75% | 95.59% | 95.87% |
| CNN-BiLSTM | 98.07% | 98.11% | 98.07% | 98.00% |
| Transformer-LSTM | 98.35% | 98.54% | 98.35% | 98.39% |
| Optimized MLP | 99.17% | 99.22% | 99.17% | 99.18% |
| Optimized LSTM | 98.89% | 98.98% | 98.89% | 98.91% |
| Optimized GRU-TCN | 99.17% | 99.23% | 99.17% | 99.19% |
| Optimized CNN-BiLSTM | 99.45% | 99.45% | 99.45% | 99.44% |
| Optimized Transformer-LSTM | 99.72% | 99.73% | 99.72% | 99.72% |
Table 19.
Performance comparison of deep learning models for multi-class classification before and after optimization on the CWRU dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 96.14% | 96.51% | 96.14% | 96.05% |
| LSTM | 95.31% | 95.34% | 95.31% | 95.29% |
| GRU-TCN | 98.07% | 98.10% | 98.07% | 98.07% |
| CNN-BiLSTM | 96.14% | 96.65% | 96.14% | 96.06% |
| Transformer-LSTM | 97.21% | 97.29% | 97.21% | 97.16% |
| Optimized MLP | 99.44% | 99.46% | 99.44% | 99.45% |
| Optimized LSTM | 98.62% | 98.70% | 98.62% | 98.62% |
| Optimized GRU-TCN | 99.17% | 99.20% | 99.17% | 99.17% |
| Optimized CNN-BiLSTM | 99.45% | 99.47% | 99.45% | 99.45% |
| Optimized Transformer-LSTM | 99.72% | 99.73% | 99.72% | 99.72% |
Table 20.
Performance comparison of deep learning models for binary classification before and after optimization on the TMFD dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 93.88% | 97.30% | 93.88% | 95.18% |
| LSTM | 97.38% | 98.59% | 97.38% | 97.76% |
| GRU-TCN | 97.39% | 98.60% | 97.39% | 97.76% |
| CNN-BiLSTM | 98.55% | 99.02% | 98.55% | 98.68% |
| Transformer-LSTM | 99.52% | 99.53% | 99.52% | 99.52% |
| Optimized MLP | 99.00% | 99.24% | 99.00% | 99.07% |
| Optimized LSTM | 99.94% | 99.94% | 99.94% | 99.94% |
| Optimized GRU-TCN | 99.35% | 99.47% | 99.35% | 99.38% |
| Optimized CNN-BiLSTM | 99.41% | 99.50% | 99.41% | 99.43% |
| Optimized Transformer-LSTM | 99.97% | 99.97% | 99.97% | 99.97% |
Table 21.
Performance comparison of deep learning models for multi-class classification before and after optimization on the TMFD dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 93.21% | 97.87% | 93.21% | 95.10% |
| LSTM | 98.51% | 98.56% | 98.51% | 98.53% |
| GRU–TCN | 69.17% | 95.81% | 69.17% | 79.39% |
| CNN–BiLSTM | 98.33% | 98.94% | 98.33% | 98.51% |
| Transformer-LSTM | 98.57% | 98.62% | 98.57% | 98.58% |
| Optimized MLP | 99.11% | 99.12% | 99.11% | 99.07% |
| Optimized LSTM | 99.91% | 99.91% | 99.91% | 99.91% |
| Optimized GRU–TCN | 89.74% | 94.69% | 89.74% | 92.00% |
| Optimized CNN–BiLSTM | 99.92% | 99.92% | 99.92% | 99.92% |
| Optimized Transformer-LSTM | 99.97% | 99.97% | 99.97% | 99.97% |
Table 22.
Performance comparison of deep learning models for binary classification before and after optimization on the MaFaulDa dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 90.97% | 90.39% | 90.97% | 90.03% |
| LSTM | 97.60% | 97.91% | 97.60% | 97.67% |
| GRU-TCN | 97.52% | 97.78% | 97.52% | 97.58% |
| CNN-BiLSTM | 95.98% | 96.48% | 95.98% | 96.11% |
| Transformer-LSTM | 98.18% | 98.37% | 98.18% | 98.23% |
| Optimized MLP | 99.66% | 99.66% | 99.66% | 99.66% |
| Optimized LSTM | 99.30% | 99.32% | 99.30% | 99.31% |
| Optimized GRU-TCN | 99.31% | 99.33% | 99.31% | 99.32% |
| Optimized CNN-BiLSTM | 99.50% | 99.50% | 99.50% | 99.50% |
| Optimized Transformer-LSTM | 99.98% | 99.98% | 99.98% | 99.98% |
Table 23.
Performance comparison of deep learning models for multi-class classification before and after optimization on the MaFaulDa dataset.
| Prediction Model | Accuracy | Precision | Recall | F-Score |
|---|---|---|---|---|
| MLP | 88.05% | 87.93% | 88.05% | 87.84% |
| LSTM | 75.57% | 74.93% | 75.57% | 74.97% |
| GRU–TCN | 91.17% | 91.07% | 91.17% | 91.10% |
| CNN–BiLSTM | 90.46% | 90.33% | 90.46% | 90.37% |
| Transformer-LSTM | 92.82% | 93.03% | 92.82% | 92.82% |
| Optimized MLP | 92.23% | 92.16% | 92.23% | 92.18% |
| Optimized LSTM | 97.44% | 97.43% | 97.44% | 97.43% |
| Optimized GRU–TCN | 95.98% | 95.96% | 95.98% | 95.96% |
| Optimized CNN–BiLSTM | 96.23% | 96.22% | 96.23% | 96.22% |
| Optimized Transformer-LSTM | 98.60% | 98.60% | 98.60% | 98.60% |
Table 24.
Detailed computational cost metrics of the RBSO–MRFO algorithm.
| Parameter | Value |
|---|---|
| Population Size | 14 |
| Number of Iterations/Generations | 12 |
| Total Fitness Evaluations | 168 (population size × iterations) |
| Stopping Criteria | Fixed number of iterations |
| Hyperparameter Encoding | Mixed continuous and discrete |
| Training Epochs per Evaluation | 6 |
| Batch Size | 128 |
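
The fitness-evaluation budget in Table 24 follows from a short calculation; the sketch below reproduces it (the assumption that every candidate is re-evaluated each iteration, with 6 short training epochs per evaluation, is our reading of the table):

```python
# Search budget implied by Table 24.
population_size = 14        # candidate configurations per generation
iterations = 12             # RBSO-MRFO generations
epochs_per_evaluation = 6   # short training run used to score one candidate

# Each of the 14 candidates is scored once per iteration.
total_evaluations = population_size * iterations          # 14 * 12 = 168
total_short_epochs = total_evaluations * epochs_per_evaluation  # 168 * 6 = 1008
```

So the optimizer trains 168 throwaway models for 6 epochs each (1008 short epochs in total) before the best configuration is retrained in full.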
Table 25.
Training time for the optimized Transformer-LSTM model.
| Dataset | Classification Type | Training Time per Batch (Seconds) | Training Time per Sample (Milliseconds) |
|---|---|---|---|
| CWRU | Binary classification | 0.082 | 0.64 |
| | Multi-class classification | 0.084 | 0.65 |
| TMFD | Binary classification | 0.076 | 0.59 |
| | Multi-class classification | 0.079 | 0.62 |
| MaFaulDa | Binary classification | 0.045 | 0.35 |
| | Multi-class classification | 0.048 | 0.37 |
Table 26.
Inference time for the optimized Transformer-LSTM model.
| Dataset | Classification Type | Inference Time per Batch (Seconds) | Inference Time per Sample (Milliseconds) |
|---|---|---|---|
| CWRU | Binary classification | 0.071 | 0.55 |
| | Multi-class classification | 0.074 | 0.57 |
| TMFD | Binary classification | 0.069 | 0.54 |
| | Multi-class classification | 0.075 | 0.59 |
| MaFaulDa | Binary classification | 0.036 | 0.28 |
| | Multi-class classification | 0.039 | 0.30 |
Table 27.
Memory consumption for the optimized Transformer-LSTM model.
| Dataset | Classification Type | Memory Consumption (MB) |
|---|---|---|
| CWRU | Binary classification | 1.29 |
| | Multi-class classification | 2.04 |
| TMFD | Binary classification | 1.87 |
| | Multi-class classification | 1.08 |
| MaFaulDa | Binary classification | 1.13 |
| | Multi-class classification | 1.63 |
Table 28.
Comparison of prior methods and the proposed method for binary classification across the CWRU, TMFD, and MaFaulDa datasets.
| Dataset | Model | Accuracy |
|---|---|---|
| CWRU | KNN [73] | 94.7% |
| | MLP-BP [73] | 99.5% |
| | MLP-BP + SVM [73] | 98.8% |
| | CWT + ANN [73] | 99.6% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.72% |
| TMFD | DNN [47] | 99.29% |
| | CNN [47] | 98.51% |
| | LSTM [47] | 99.11% |
| | GRU [47] | 99.27% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.97% |
| MaFaulDa | Unoptimized SVM [74] | 85.9% |
| | Optimized SVM [74] | 90.4% |
| | Oversampled optimized SVM [74] | 95.4% |
| | Unoptimized KNN [74] | 87.4% |
| | Optimized KNN [74] | 89.8% |
| | Oversampled optimized KNN [74] | 92.8% |
| | Time-domain based DNN [74] | 95% |
| | FFT based DNN [74] | 99.7% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.98% |
Table 29.
Comparison of prior methods and the proposed method for multi-class classification across the CWRU, TMFD, and MaFaulDa datasets.
| Dataset | Model | Accuracy |
|---|---|---|
| CWRU | CNN-LSTM [75] | 94.20% |
| | HPSO-CNN-LSTM [75] | 99.20% |
| | TSFFCNN-PSO-SVM [76] | 98.50% |
| | 1-D CNN-PSO-SVM [76] | 98.20% |
| | CNN-LSTM with Gated Recurrent Unit [76] | 99.29% |
| | CNN-BiLSTM with Grid Search [76] | 99.28% |
| | Optimized 1-D CNN-LSTM [76] | 99.35% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.72% |
| TMFD | DNN [47] | 99.67% |
| | CNN [47] | 99.86% |
| | LSTM [47] | 97.09% |
| | GRU [47] | 97.09% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.97% |
| MaFaulDa | DNN [47] | 97.04% |
| | CNN [47] | 90.51% |
| | LSTM [47] | 95.71% |
| | GRU [47] | 96.64% |
| | Transformer-DNN [47] | 98.39% |
| | RBSO–MRFO + Transformer-LSTM (Proposed method) | 98.60% |
Table 30.
Ablation results for binary classification across the CWRU, TMFD, and MaFaulDa datasets.
| Experimental Method | CWRU Accuracy | TMFD Accuracy | MaFaulDa Accuracy |
|---|---|---|---|
| Transformer-LSTM | 98.35% | 99.52% | 98.18% |
| RBSO + Transformer-LSTM | 99.45% | 99.87% | 99.43% |
| MRFO + Transformer-LSTM | 99.17% | 99.84% | 99.37% |
| RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.72% | 99.97% | 99.98% |
Table 31.
Ablation results for multi-class classification across the CWRU, TMFD, and MaFaulDa datasets.
| Experimental Method | CWRU Accuracy | TMFD Accuracy | MaFaulDa Accuracy |
|---|---|---|---|
| Transformer-LSTM | 97.21% | 98.57% | 92.82% |
| RBSO + Transformer-LSTM | 98.35% | 80.56% | 97.71% |
| MRFO + Transformer-LSTM | 98.07% | 88.83% | 97.55% |
| RBSO–MRFO + Transformer-LSTM (Proposed method) | 99.72% | 99.97% | 98.60% |