Deep Transfer Learning Framework for Bearing Fault Detection in Motors

Abstract: The domain of fault detection has seen tremendous growth in recent years. Driven by the growing demand for uninterrupted operation across industrial sectors, prognostics and health management (PHM) has become a key enabling technology. Bearings are an essential component of a motor, and their PHM is crucial for uninterrupted operation. Conventional artificial intelligence techniques require manual feature extraction and selection for fault detection, a process that often restricts their performance. Deep learning, in contrast, enables autonomous feature extraction and selection. Given these advantages, this article presents a transfer learning-based method for bearing fault detection. The pretrained ResNet50V2 model is used as the base model to develop an effective detection strategy for bearing faults, covering the outer race fault, inner race fault, and ball defect. The proposed method reduces the need for manual feature extraction and selection; in addition, a straightforward 1D-to-2D data conversion is suggested that eliminates this requirement altogether. Different performance metrics are estimated to confirm the efficacy of the proposed strategy, and the results show that the proposed technique effectively detects bearing faults.


Introduction
Electrical motors are the prime movers of modern industries. However, unwanted faults in a motor can lead to the complete shutdown of a plant. Bearings are critical components of electrical motors (EMs): a bearing separates the stationary part from the moving part and is generally fitted at both ends of the motor, i.e., on both the drive and nondrive sides. The bearing fault (BF) is the most common fault in electrical motors and is responsible for more than half of all motor defects. Bearing faults include the inner race fault (IRF), the outer race fault (ORF), and the bearing ball defect (BBD) [1,2]. The PHM of the bearing is crucial for the uninterrupted operation of the motor, and timely diagnosis of BFs in EMs can help avoid the complete shutdown of a production or manufacturing facility. BFs can be caused by harsh operating conditions, overloading, overspeeding, manufacturing defects, overheating, dielectric stress, and aging. Symptoms of the faults include unbalanced voltages and line currents, excessive vibration, unwarranted heating, increased losses, efficiency decline, a reduction in the mean torque, and higher torque pulsation. To keep the EMs in industry operating without interruption, timely fault detection (FD) is crucial.
The conventional bearing FD approaches include current and vibration monitoring techniques. Motor current signature analysis is the favored method for fault diagnosis in EMs, and vibration-based approaches are also widely used for bearing FD. Both methods require considerable human expertise and prior knowledge about EMs. Researchers have therefore turned to artificial intelligence (AI) for FD in EMs; in recent years, the use of AI in bearing FD has improved detection accuracy while making the detection approaches more reliable [3]. Conventional machine-learning (ML) algorithms, such as the support vector machine (SVM), decision tree (DT), random forest (RF), and k-nearest neighbor (kNN), have been extensively used to develop bearing FD approaches for electrical motors and rotating machines [4,5]. These approaches rely on efficient input data, i.e., input features, so feature extraction and selection (FES) are essential. The features can be derived from either frequency-domain or time-domain signals; statistical features, such as the standard deviation, mean, and variance, among many others, can be used. Feature selection is a major challenge for these conventional ML approaches, and their overall performance depends on the chosen features. Manual feature extraction and selection require prior knowledge and human expertise. In addition, optimization algorithms are used to tune the models [6]. Konar et al. [7] proposed bearing FD using the continuous wavelet transformation (CWT) and an SVM model; the CWT was used to analyze the frame vibration of an induction motor (IM) during the starting operation. Li et al. [8] developed an FD strategy for bearings using an SVM model and ant colony optimization. Gryllias et al. [9] suggested an FD methodology for bearings operating in industrial environments using an SVM model. An FD strategy for bearing faults was developed in [10] using permutation entropy, ensemble empirical mode decomposition, and an SVM. Amarnath et al. [11] used sound signals for bearing FD with a DT-based model, which was fed statistical features acquired from the acoustic signals. Sugumaran et al. [12] developed a bearing FD strategy in which the DT algorithm selected features from vibration signals that were then fed to a proximal SVM model. Tian et al. [13] proposed a bearing FD strategy using spectral kurtosis and cross-correlation for FES from the vibration signals; these features were applied to develop a health index with the help of principal component analysis and kNN. Pandya et al. [14] developed a rolling-element-bearing FD strategy using empirical mode decomposition and kNN, with the features extracted from acoustic signals using the Hilbert-Huang transformation. Sharma et al. [15] developed a bearing FD approach using vibration-based time-domain features fed to a weighted kNN classifier. In summary, conventional ML-based bearing FD strategies require efficient features as inputs. Feature extraction, as well as feature selection, is a tedious and often challenging task, and manual FES requires prior knowledge and human expertise. It is difficult to develop generic features for traditional ML-centered FD approaches, which often restricts their performance.
Given the problems associated with the abovementioned approaches, researchers have started to use deep-learning (DL) algorithms in the PHM domain. In recent years, DL algorithms have drawn considerable attention for FD in EMs. DL algorithms such as the convolutional neural network (CNN), autoencoder (AE), deep Boltzmann machine (DBM), deep belief network (DBN), and recurrent neural network (RNN) have proven useful in the FD domain [15-18]. These algorithms are popular in image classification, biomedical applications, and many other fields [18,19]. DL-based FD approaches overcome one of the major lacunae of conventional ML-based approaches, namely manual FES: DL algorithms such as the CNN facilitate autonomous FES. The CNN is extensively used in image classification problems, and its inherent advantages have attracted researchers to apply it in the PHM domain. Several papers have efficiently used CNN-based models for bearing FD in electrical motors, as well as in other rotating machines. Janssens et al. [20] proposed bearing FD in rotating machinery using a CNN-based model; vibration signals collected under various bearing states were used for the analysis and development of the model. Magar et al. [21] proposed an FD approach employing a CNN-based model, which was tested on the Case Western Reserve University bearing dataset [22]. Mukesh et al. [23] developed a bearing defect identification and classification methodology using a CNN-based FD model; three bearing conditions, i.e., the normal condition, ORF, and IRF, were considered and efficiently detected. Zhao et al. [24] proposed a deep CNN model for planet-bearing FD; the model was composed of multiple CNN layers and achieved reasonable accuracy, with the Hilbert transformation applied to the input signal to generate the model input. Wang et al. [25] proposed a bearing FD approach using a multiscale neural network comprising one- and two-dimensional convolution channels; three bearing states, namely healthy, ORF, and IRF, were included in the analysis. Islam et al. [26] developed a bearing FD scheme using an adaptive deep CNN model that utilized 2D input information from acoustic signals. Wang et al. established an FD model for rolling element bearings combining hidden Markov models (HMMs) and a CNN: the CNN layers were employed for FES from the vibration data, and the HMM was used as a tool for fault classification. Zhang et al. [27] proposed bearing FD using transfer learning (TL) under variable operating conditions. Zhu et al. [28] developed a TL-based method for bearing FD under changing working conditions. A locomotive bearing FD approach has been proposed using TL from laboratory bearings to bearings in practical use [29]. Shao et al. [30] proposed machine FD using wavelet-based images and transfer learning. Zheng et al. [31] proposed a TL-based bearing FD framework using a sufficient dataset of bearings from different sources. Hasan and Kim [32] developed a bearing FD approach using vibration data converted by the Stockwell transformation and transfer learning. Lu et al. [33] proposed intelligent bearing FD using a CNN and TL. Wen et al. [34] used the VGG19 model with TL for bearing FD.
Deep models offer advantages such as better domain adaptability and generalization capability [35]. Most previously used models have been relatively shallow in nature, and the potential of deep models with over 50 layers has not previously been investigated. In addition, most papers have used dedicated signal-processing tools to convert 1D data into 2D data before the analysis [30]. Moreover, the challenges of training such deep models can be effectively addressed by transfer learning (TL) [36]. Only a limited number of papers have used both transfer learning and a deeper network for bearing FD. The present work therefore applies TL to a deep model for bearing FD. The developed approach uses the state-of-the-art ResNet50V2 (RNV2) [37] model for FES from the input signals, and the final classification is conducted with fully connected layers and the SoftMax function. The key contributions of the present work may be summarized as follows:

1. A deep model with transfer learning is proposed, which can perform efficient feature extraction and fault detection.
2. Without using any particular signal-processing transformation, such as a wavelet or short-time Fourier transformation, the input signals are transformed into 2D images.
3. Transfer learning removes the need to train the deep model from scratch, which helps the model converge more quickly.
4. To the best of the authors' knowledge, this is the first work to use the ResNet50V2 model for bearing FD.
The rest of this article is structured as follows: Section 2 describes transfer learning and the ResNet50V2 model in detail; Section 3 presents the proposed methodology; Section 4 provides the results and their assessment; finally, Section 5 concludes the work.

Transfer Learning (TL)
TL is a popular method in image classification problems, text classification, building utilization, and spam filtering. It consists of learning features on one task and applying them to another task. To avoid having to train a new model from scratch, TL reuses the features of a previously trained model. The pretrained framework is generally trained on a large dataset, such as ImageNet. The application of TL gives the new model a lower training time and a lower generalization error. Figure 1 shows the fundamental layout of the standard TL process. The transfer from the source model can consist of weights, which can be used for the weight initialization of the target model. The transfer learning process involves steps such as selecting the pretrained model, creating the base model, transferring the weights, training the new layers on the target dataset, and improving the model via fine-tuning. Figure 2 shows a flowchart of the TL process.
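The steps above can be sketched in code. The following is a minimal, hypothetical illustration (not the exact architecture of the paper) using the Keras ResNet50V2 backbone: the base model's transferred weights are frozen and only a small new classification head is trained on the target data. Here `weights=None` is used so the sketch runs offline; in practice `weights="imagenet"` would transfer the pretrained features, and the input size and head layout are assumptions.

```python
import tensorflow as tf

# Backbone: ResNet50V2 without its 1000-class ImageNet head.
# weights=None keeps this sketch offline; use weights="imagenet"
# to actually transfer the pretrained features.
base = tf.keras.applications.ResNet50V2(
    weights=None,
    include_top=False,
    input_shape=(64, 64, 3),  # assumed input image size
)
base.trainable = False  # freeze the transferred weights

# New top layers, trained from random initialization on the target data.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 bearing states
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Fine-tuning, the last step in the flowchart, would then unfreeze some (or all) of the base layers and continue training with a small learning rate.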





ResNet50V2 (RNV2)
ResNet50V2 [37] is a modified version of ResNet50, and it performs better than ResNet50 and ResNet101 on the ImageNet dataset. ResNet stands for residual neural network. A ResNet is a form of deep neural network (DNN) and draws inspiration from the pyramidal cells of the cerebral cortex. ResNet implements this functionality by employing shortcuts, or skip connections, to jump over particular layers. ResNet models are typically run with double- or triple-layer skips that have batch normalization and nonlinearities (ReLU) in between. The skip weights can also be learned using an additional weight matrix; such models are referred to as highway nets. DenseNets are models that include numerous parallel skips. ResNet allows the training of deep networks of more than 150 layers. Before ResNet, training DNNs was a challenging task owing to the problem of vanishing gradients. The core of ResNet is the residual block. Assume that the input is x, the underlying mapping to be learned is f(x), and f(x) is the input to the activation function. The portion in the dotted-line box in Figure 3a must learn the mapping f(x), whereas in Figure 3b it must learn the mapping f(x) − x; the term "residual block" is derived in this manner. Given that the necessary underlying mapping is the identity function f(x) = x, the residual mapping is significantly easier to train because it simply has to set the weights and biases of the top weight layer (such as the fully connected layer or convolutional layer) to zero. The solid line in Figure 3b that connects the layer input x to the addition operator is referred to as a residual connection (or shortcut connection). The residual blocks provide faster input propagation across layers via the residual connections. In RNV2, the propagation formulation of the links between the blocks was modified relative to ResNet50. The core idea of RNV2 is the application of an identity shortcut connection that skips one or more layers.
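The identity-mapping argument above can be made concrete with a minimal sketch of a residual block (in the original post-activation formulation, for simplicity; the weight shapes and names are illustrative, not the paper's):

```python
import numpy as np

def relu(v):
    return np.maximum(0.0, v)

def residual_block(x, W1, W2):
    """Minimal residual block: the stacked weight layers learn the
    residual mapping f(x), and the skip connection adds the input
    back, so the block computes f(x) + x."""
    f_x = W2 @ relu(W1 @ x)   # two weight layers with a nonlinearity
    return relu(f_x + x)      # skip (shortcut) connection, then activation

# If both weight layers are zero, f(x) = 0 and the block reduces to
# the identity mapping for nonnegative inputs, which is why residual
# blocks are easy to train toward the identity function.
x = np.array([1.0, 2.0, 3.0])
Z = np.zeros((3, 3))
out = residual_block(x, Z, Z)  # equals x here
```

ResNetV2 rearranges this into a pre-activation form (batch normalization and ReLU before each weight layer, with a clean identity path), but the core f(x) + x structure is the same.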

The Proposed Methodology
The bearing fault detection model is crucial for the uninterrupted operation of motors. The core idea of the proposed method is to apply transfer learning to minimize the computational burden of the deep fault detection model. The proposed method combines transfer learning with deep learning to develop an efficient fault detection approach for bearings. A simple 1D-to-2D conversion technique is proposed, and the details of the fault detection model are described in this section.



Data Preprocessing
Timely bearing fault detection is crucial to avoid unwanted downtime in industry. Generally, the data obtained from the sensors are time-series data, and model training depends on the condition of those data: if the data include noise and irrelevant information, model training becomes a challenging task. Data preparation involves steps such as data integration, data cleaning, data segmentation, dimensionality reduction, and transformation, which help clean the raw data. Because data-driven approaches struggle to handle raw signals, the traditional ML-based FD approach necessitates extensive preprocessing before analysis. Data conditioning involves the optimal FES from the raw signals and relies on human expertise and prior knowledge. FD is strongly reliant on FES, as incorrect FES can lead to fault misclassification; this dependence makes FES both a difficult and a crucial task. Xu et al. [38] proposed wavelet transformation-based images for motor imagery (MI) EEG signals. Saucedo-Dorantes et al. [39] extracted deep features for FD in bearings, using a stacked autoencoder to extract the fault features. Azamfar et al. [40] developed a gearbox FD approach with the help of a 2D CNN and current signature analysis, fusing data from multiple sensors before applying them to the CNN model. Dedicated signal-processing tools such as the wavelet transformation (WT) or the short-time Fourier transformation (STFT) offer advantages such as localization in the time and frequency domains, efficient data representation, and time-frequency information. However, applying a dedicated signal-processing tool is also challenging. The WT has disadvantages too, such as being computationally intensive, shift sensitivity, and poor directionality, while the fixed window of the STFT is a serious flaw that often restricts efficient representation of the data. Moreover, applying any dedicated signal-processing tool requires expertise and extensive knowledge of the data. Many other authors have also used 1D-to-2D data conversion for fault diagnosis [41,42]. The proposed TL-based FD model takes images as input, and a simple yet effective technique has been utilized to transform the 1D vibration data into 2D images. To build the 2D images, the acquired 1D data are divided to obtain signal samples for the various physical conditions of the bearing. Color images are in RGB format, which means that each image has a height, a width, and three channels: red (R), green (G), and blue (B). An RGB (color) image is made up of many pixels, and each pixel is a triplet of R, G, and B elements, each with the range 0-255. Pixel(p, q, c) represents the RGB matrix, where p = 1, ..., d, q = 1, ..., d, and the third dimension c indexes the red (c = 1), green (c = 2), and blue (c = 3) channels. With the aid of Equation (1), segments are created from the time-domain signal samples to form a root matrix (RM). The root matrix is normalized using the maximum and minimum values of the sample, as given by Equation (2). Finally, Equation (3) is computed to estimate Pixel(p, q, c) from the normalized root matrix. The red, green, and blue components of Pixel(p, q, c) are identical to one another, and their pixel values lie in the range 0-255. This technique was implemented in Python, and Figure 4 illustrates the proposed method.
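The conversion described above can be sketched as follows. This is a hedged reconstruction under the stated description (segment of d × d samples reshaped into a root matrix, min-max normalized to 0-255, identical R, G, and B channels); the function name, the choice d = 64, and the rounding step are illustrative assumptions, not the paper's exact Equations (1)-(3).

```python
import numpy as np

def signal_to_image(signal, d=64):
    """Sketch of the 1D-to-2D conversion: a segment of d*d consecutive
    time-domain samples is reshaped into a d x d root matrix, min-max
    normalized to the 0-255 pixel range, and replicated across the
    R, G, and B channels (so the three channel values of each pixel
    are identical, as stated in the text)."""
    segment = np.asarray(signal[: d * d], dtype=float)
    rm = segment.reshape(d, d)                          # root matrix
    rm_norm = (rm - rm.min()) / (rm.max() - rm.min())   # min-max normalization
    pixels = np.round(rm_norm * 255).astype(np.uint8)
    return np.stack([pixels] * 3, axis=-1)              # shape (d, d, 3)

rng = np.random.default_rng(0)
vibration = rng.standard_normal(64 * 64)  # stand-in vibration samples
img = signal_to_image(vibration)          # img.shape == (64, 64, 3)
```

Because all three channels are identical, the resulting image is effectively a grayscale texture of the vibration segment, in a format the pretrained RGB backbone can consume directly.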


Fault Detection Model
In the FD field, it is challenging to generate a considerable amount of labeled fault data. The availability of large labeled fault datasets is a challenging issue, and such data are quite scarce compared with the ImageNet dataset. The performance of deep CNN architectures for FD is constrained by this lack of large quantities of labeled fault data, which makes training difficult. Transfer learning can mitigate these issues and helps in the overall improvement of fault detection models. The pretrained RNV2 model is used for knowledge transfer and fault diagnosis; RNV2 is an example of excellent image classification performance, and its layers can perform FES from the input, as illustrated in Figure 4.

A complete analysis has been conducted on the vibration data of the CWRU bearing dataset [22]. Figure 6 shows a view of the test setup, which comprises a motor, a torque transducer/encoder (center), a dynamometer (right), and control circuits (not shown). The bearing faults were induced with the help of electro-discharge machining, with fault diameters of 0.007, 0.014, and 0.021 inches. The input vibration data were acquired for different loading conditions (0, 1, 2, and 3 hp) and for the various bearing conditions, namely IRF, ORF, BBD, and healthy bearings. The vibration signal was collected at a sampling frequency of 12 kHz using sensors fitted on the drive side of the motor. These data were processed using the method described in Section 3.

The 1D signals have been converted into images for FD with the RNV2 model; samples are shown in Table 1. The randomness of the data-point distribution helps to avoid bias and gives every sample an equal chance of selection. Four bearing states are under consideration, namely healthy, IRF, ORF, and BBD. Overall, 24,000 images were generated for the proposed TL-based FD approach. Of these, 70% (16,800) were employed for training, 15% (3600) for validation, and 15% (3600) for model testing, so each bearing state contributes 4200 images for training, 900 for validation, and 900 for testing.
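The per-class split described above can be sketched as follows. This is an illustrative reconstruction only: the shuffled per-state assignment matches the stated counts, but the seed and the index-based bookkeeping are assumptions.

```python
import numpy as np

# Each of the four bearing states contributes 6000 images,
# divided 70/15/15 into training, validation, and test subsets.
rng = np.random.default_rng(42)
states = ["healthy", "IRF", "ORF", "BBD"]
split = {"train": [], "val": [], "test": []}

for state in states:
    indices = rng.permutation(6000)  # shuffle this state's image indices
    split["train"].extend((state, i) for i in indices[:4200])
    split["val"].extend((state, i) for i in indices[4200:5100])
    split["test"].extend((state, i) for i in indices[5100:])

# Totals: 16,800 training, 3600 validation, and 3600 test images.
```

Splitting each state separately keeps the subsets balanced, so every bearing condition is equally represented in training, validation, and testing.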

Results and Discussions
Training a deep architecture from scratch, without TL, is a challenging task. The ResNet blocks of the RNV2 were utilized as feature extractors, and the extracted features were given to the fully connected layers (FCLs), with the SoftMax layer as the last layer. The pretrained RNV2 framework was utilized as the base model, its weights were frozen, and new top layers were added and trained with random weight initialization on the features extracted by the ResNet blocks. The fact that the base model was run only once on the training data, rather than once per training epoch, was a substantial advantage of the proposed method, making it much faster and cheaper. The model was trained multiple times and fine-tuned for efficient performance. The model's performance was evaluated by estimating the confusion matrix (CM) and different performance metrics, namely the accuracy, precision, sensitivity (recall), and F1-score. The CM is a tabular arrangement that visualizes and encapsulates the performance of a classification model. In terms of the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) counted from the CM, these metrics can be calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Sensitivity (Recall) = TP / (TP + FN)
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
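These standard metrics can be computed directly from a confusion matrix. The sketch below uses an illustrative two-class matrix (not results from this work) to show the per-class calculation:

```python
import numpy as np

def per_class_metrics(cm):
    """Compute accuracy, per-class precision, recall (sensitivity),
    and F1-score from a confusion matrix cm, where cm[i, j] counts
    samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class, but wrong
    fn = cm.sum(axis=1) - tp  # truly this class, but missed
    accuracy = tp.sum() / cm.sum()
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative 2-class confusion matrix.
cm = np.array([[90, 10],
               [ 5, 95]])
acc, prec, rec, f1 = per_class_metrics(cm)
# acc = 185/200 = 0.925; recall per class = [0.90, 0.95]
```

For the four bearing states, the same function applies to the 4 × 4 confusion matrix, and the per-class values can be averaged to summarize overall performance.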


Table 1. Samples of images created for various states of the bearing.

Results and Discussions
Deep architecture training from scratch without the TL is a challenging task.The Res-Net blocks of the RNV2 were utilized as feature extractors.These features were given to the FCLs, with the SoftMax layer as the last layer.The pretrained RNV2 framework was utilized as the base model, their weights were frozen, and new top layers were added and trained.These top layers were trained with random weight initialization on extracted features from the ResNet blocks.The fact that the base model was run only once on the training data, rather than once for each training period, was a substantial advantage of the proposed method.Thus, it is much faster and cheaper.The model was trained multiple times and fine-tuned for efficient performance.The model's performance was evaluated by estimating the confusion matrix (CM) and different performance metrics such as the accuracy (), precision (), sensitivity (), and 1-score.The CM is a tabular arrangement that visualizes and encapsulates the performance of a classification model.The values of these performance metrics, such as , , and 1-score, can be calculated as:

Results and Discussions
Deep architecture training from scratch without the TL is a challenging task.The Res-Net blocks of the RNV2 were utilized as feature extractors.These features were given to the FCLs, with the SoftMax layer as the last layer.The pretrained RNV2 framework was utilized as the base model, their weights were frozen, and new top layers were added and trained.These top layers were trained with random weight initialization on extracted features from the ResNet blocks.The fact that the base model was run only once on the training data, rather than once for each training period, was a substantial advantage of the proposed method.Thus, it is much faster and cheaper.The model was trained multiple times and fine-tuned for efficient performance.The model's performance was evaluated by estimating the confusion matrix (CM) and different performance metrics such as the accuracy (), precision (), sensitivity (), and 1-score.The CM is a tabular arrangement that visualizes and encapsulates the performance of a classification model.The values of these performance metrics, such as , , and 1-score, can be calculated as:

Results and Discussion
Training a deep architecture from scratch, without TL, is a challenging task. The ResNet blocks of the RNV2 were utilized as feature extractors, and the extracted features were passed to the FCLs, with the SoftMax layer as the last layer. The pretrained RNV2 framework was used as the base model, its weights were frozen, and new top layers were added and trained with random weight initialization on the features extracted by the ResNet blocks. A substantial advantage of the proposed method is that the base model is run only once over the training data, rather than once per training epoch, making training much faster and cheaper. The model was trained multiple times and fine-tuned for efficient performance. The model's performance was evaluated by estimating the confusion matrix (CM) and different performance metrics, namely the accuracy (acc), precision (p), sensitivity (s), and F1-score. The CM is a tabular arrangement that visualizes and encapsulates the performance of a classification model. In terms of the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), these metrics can be calculated as:

acc = (TP + TN)/(TP + TN + FP + FN)
p = TP/(TP + FP)
s = TP/(TP + FN)
F1 = 2 × p × s/(p + s)

It is evident from Table 2 that the values of the various performance metrics exceed 99%, which demonstrates that all the states of the bearing are efficiently classified. The CM (Figure 9) shows that the healthy, IRF, ORF, and BBD states are each classified with more than 99% accuracy.
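As a minimal illustration, the metrics above can be computed directly from a confusion matrix. The sketch below is hedged: the four-class matrix (healthy, IRF, ORF, BBD) uses made-up counts, not the paper's actual results.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Per-class precision, sensitivity (recall), and F1-score, plus overall
    accuracy, from a square confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp    # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp    # actually class c but predicted as another class
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, sensitivity, f1

# Illustrative 4-class matrix (healthy, IRF, ORF, BBD) -- counts are invented.
cm = [[50, 0, 0, 0],
      [0, 49, 1, 0],
      [0, 0, 50, 0],
      [0, 1, 0, 49]]
acc, p, s, f1 = metrics_from_confusion(cm)
print(round(acc, 3))  # 0.99
```

Because the diagonal holds the correctly classified samples, a near-diagonal matrix such as this one yields all four metrics above 98%, mirroring the behavior reported in Table 2.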
In addition, a comparative study was conducted between the proposed and existing DL-based FD methods. Multiple DL-based FD methods were assessed to inspect the functioning of the proposed IRNV2-based FD model. The proposed model was compared with two DL-based FD methods employing sparse filters [43] and DBNs [44]. Lei et al. [43] employed sparse filters for FES from the vibration data and SoftMax to classify faults. Gan and Wang [44] developed an FD network using DBNs for FD in mechanical systems. Comparisons were also made with several CNN-based FD models, such as the hierarchical CNN (HCNN) model [45], adaptive deep CNN (ADCNN) model [46], multiple-sensor-based CNN (MCNN) model [47], 1D CNN model [48], CNN model using vibration images (CNNVM) [49], stacked autoencoder and DBN (SAE-DBN) model [50], and CNN long short-term memory (CNN-LSTM) model [51]. Lu et al. [45] developed a bearing FD approach using a DL method employing CNNs. Guo et al. [46] used a hierarchical adaptive deep CNN for bearing FD. Xia et al. [47] incorporated sensor fusion and CNNs for efficient fault detection in rotating machines. Eren [48] proposed a 1D CNN model for bearing FD. Hoang and Kang [49] developed a CNN-based FD methodology for bearing FD utilizing vibration images. Chen and Li [50] developed an SAE-DBN model for bearing FD using multisensory feature fusion. Wang et al. [51] developed a motor fault diagnosis model using multilevel information fusion and a combination of CNN and LSTM. Table 3 lists the accuracy of the various models and shows that the developed model surpasses the FD techniques utilizing sparse filters, DBNs, and the other CNN models, achieving a superior accuracy of 99.50%. The developed method shows higher accuracy than the other DL-based FD procedures, as reflected in Table 3, owing to its greater depth, autonomous FES, and fault categorization capabilities. Moreover, the depth of the model facilitates better domain adaptability and generalization capability.
Deep architectures, such as the proposed model, learn distributed and sparse representations. These features are more efficient than those learned by shallow ML frameworks, so it is expedient to use a deep framework for better data representation. The depth of the network enables more effective FES, and the deep framework makes domain adaptation simple. The benefits of the proposed approach over standard ML-based fault detection techniques include automatic FES and good domain adaptation. The proposed method outperforms existing methods and offers accurate fault analysis with minimal human interaction. Additionally, the results show that modeling with TL combined with fine-tuning produces superior accuracy within a small number of training epochs compared with a model created from scratch. Training a CNN model of this depth from scratch takes a long time to process; the current study efficiently applies TL to overcome this shortcoming of deep CNNs. Despite being a deep network, the proposed technique reduces the computational cost.

Conclusions
This article proposed a TL-based fault detection approach for bearing faults. A thorough analysis was conducted of the bearing dataset from the CWRU bearing data center. The proposed model effectively performed bearing fault detection with reasonable accuracy, and the depth of the model aids in efficient domain learning. In addition, the proposed model mitigated the need for manual FES, which is a cumbersome task. Despite the high depth of the model, owing to transfer learning, it did not need to be trained from scratch, which saved time and allowed it to run with low computational power. The average accuracy of the model was more than 99%, and the values of the other performance metrics were also high. These results justify the performance of the proposed IRNV2-based bearing FD model. It can thus be established that the developed model provides an intelligent and computationally viable solution for bearing fault detection.
Figure 1 shows the fundamental layout of the standard TL process. The transfer from the source model can be the weights, which could be used for the weight initialization of the target model. These pretrained models are generally trained on large datasets, such as ImageNet. The transfer learning process involves steps such as selecting the pretrained model, creating the base model, transferring the weights, training the new layers on the target dataset, and improving the model via fine-tuning. Figure 2 shows a flowchart of the TL process.
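The freeze-and-train-top workflow described above can be sketched in miniature. The following is a hedged, framework-free illustration, not the paper's actual implementation: a tiny "pretrained" feature extractor whose weights stay frozen while only a new softmax head is trained by gradient descent; all dimensions, data, and hyperparameters here are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-3: a "pretrained" base whose weights are transferred and frozen.
W_base = rng.standard_normal((8, 16)) / np.sqrt(8)  # pretend source-task weights

def extract_features(x):
    """Frozen base model: ReLU random projection, never updated."""
    return np.maximum(x @ W_base, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy target-task data: label is the sign of the first raw feature.
X = rng.standard_normal((64, 8))
y = (X[:, 0] > 0).astype(int)

# Key TL economy: run the frozen base ONCE over the data, not once per epoch.
F = extract_features(X)

# Step 4: train only the new top (softmax) layer on the extracted features.
W_top = np.zeros((16, 2))
onehot = np.eye(2)[y]
for _ in range(1000):
    probs = softmax(F @ W_top)
    W_top -= 0.1 * F.T @ (probs - onehot) / len(X)  # cross-entropy gradient step

acc = (softmax(F @ W_top).argmax(axis=1) == y).mean()
print(acc)
```

Only `W_top` is ever updated; `W_base` plays the role of the locked ResNet blocks, and fine-tuning (step 5) would correspond to later unfreezing part of the base at a small learning rate.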

Figure 1. Basic layout of the standard transfer learning process.

Figure 2. Flowchart of the transfer learning process.

Figure 5 shows the composition of the proposed RNV2-based FD model. The fault detection dataset's class labels were fitted by using the ResNet layers of RNV2. Two types of blocks are present in the structure, namely ResNet-Block-1 and ResNet-Block-2. Both blocks comprise convolutional layers, batch normalization, ReLU activation, and shortcut connections. The difference between the two blocks lies in the shortcut connection: ResNet-Block-1 has an identity function in the shortcut connection, whereas ResNet-Block-2 has convolutional layers and batch normalization in the shortcut connection. The depth of the network and the efficient feature extraction layers of the RNV2-based FD model help improve the performance and achieve high accuracy in FD. The proposed model extracts features from images generated from a 1D signal using RNV2. The mined bottleneck features were used as an input to the classifier, comprising one FCL and a SoftMax activation function with four output nodes. Weights were randomly initialized for the fully connected layers (FCLs), and hyperparameter tuning was conducted for these layers. The ResNet blocks utilize the weights of the pretrained RNV2 model (i.e., the weights are locked, represented by a lock symbol), and the remaining block is fine-tuned (represented by an unlock symbol), as shown in Figure 5. The FCLs were trained with the Adam optimizer with an initial learning rate of 0.001.
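The distinction between the two block types can be sketched as follows. This is a deliberately simplified NumPy illustration in which the convolutions are reduced to matrix multiplies and batch normalization is omitted; the real RNV2 blocks are deeper, but the shortcut logic is the same: an identity shortcut preserves the feature dimension, while a projection shortcut may change it.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def resnet_block_1(x, W):
    """Identity-shortcut block: output = ReLU(transform(x) + x).
    The transform must preserve the feature dimension so x can be added back."""
    return relu(x @ W + x)

def resnet_block_2(x, W, W_shortcut):
    """Projection-shortcut block: the shortcut itself applies a learned
    transform, so the output dimension may differ from the input's."""
    return relu(x @ W + x @ W_shortcut)

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 8))  # batch of 4 samples with 8 features each
out1 = resnet_block_1(x, rng.standard_normal((8, 8)) * 0.1)
out2 = resnet_block_2(x, rng.standard_normal((8, 16)) * 0.1,
                      rng.standard_normal((8, 16)) * 0.1)
print(out1.shape, out2.shape)  # (4, 8) (4, 16)
```

With the transform weights set to zero, the identity block reduces to ReLU(x), which is the property that makes very deep residual stacks trainable: each block only has to learn a perturbation of the identity mapping.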

Figure 5. Block diagram of the proposed FD methodology.


Figure 6. Test setup for the analysis comprising a 2 hp motor.


Table 1. Samples of images created for various states of the bearing.


Table 2. Performance metrics of the proposed model.

Table 3. Comparison of the proposed model with various DL-based FD approaches.