LSTM-Based Condition Monitoring and Fault Prognostics of Rolling Element Bearings Using Raw Vibrational Data

The Industry 4.0 revolution and the prevailing technological advancements have made industrial units more intricate. These complex electro-mechanical units now aim to improve efficiency and increase reliability. Downtime of such essential units in the current competitive age is unaffordable. The paradigm of fault diagnostics is shifting from conventional to proactive predictive approaches. As a result, Condition-based Monitoring and prognostics are now essential components of complex industrial systems. This research is focused on developing a fault prognostic system using Long Short-Term Memory for rolling element bearings because they are a critical component of industrial systems and have one of the highest fault frequencies. Compared to other research, feature engineering is minimized by using raw time series sensor data as an input to the model. Our model achieved the lowest root mean square error and outperformed similar research models where time domain, frequency domain, or time-frequency domain features were used as input to the model. Furthermore, using raw vibration data also enabled better generalization of the model. This has been confirmed by evaluating the performance of the developed model against vibration data generated by distinct sources, including hydro and wind power turbines.


Introduction
Bearings are among the most important components in industrial machines; they normally operate in a very stressful environment and, hence, are continuously prone to degradation [1]. Thus, bearings are among the few critical and foremost fault-causing components in electro-mechanical systems [2]. As bearings age, faults such as wear and tear, and symptoms such as abnormal vibrations, high temperatures, and misalignments, are introduced into the system. Therefore, a predictive approach toward fault identification is necessary in modern industrial settings to avoid forced downtime.
The existing Condition-based Monitoring (CBM) and system health monitoring approaches are generically classified into (1) physics-based models, which require extensive domain knowledge and are constructed using mathematical equations, and (2) data-driven models, which are trained using historic data generated by the sensors [3,4]. Because data-driven models work with historic sensory data collected from the machines, such data can be recorded online and the model's parameters can be updated in real time. This makes data-driven models more attractive for the predictive maintenance of electro-mechanical machines.
Various models have been developed for the effective condition monitoring of bearings, including acoustic emission-based models [5,6]. However, vibration signal analysis [7,8] is one of the most widely used and most effective methods to conduct the prognosis of rotating machines. The occurrence of faults in rotating machines can result in machine downtime and is directly related to higher operation and maintenance (O&M) costs, serious accidents, and economic losses [9,10]. Fault diagnosis in rotating machines can be conducted by measuring the speed variations using speed sensors [11,12] or by quantifying the vibration signals using accelerometers [13], with the vibration-based approaches being the most widely used. A detailed review of the feature extraction and selection techniques, different classifiers, and deep learning models using the Case Western Reserve University bearing center (CWRU) dataset was conducted in [14]. Similarly, a future roadmap for intelligent fault diagnostics has been provided [15], and a systematic review of the recent advancements in mechanical fault diagnostics and prognostics was carried out [16].
Several techniques have been adopted to identify faults in the vibration data, including short-time Fourier transform (STFT) and wavelet transform [17][18][19][20]. Among the other adopted approaches are band-pass filtering, phase demodulation, Kalman filters, and deep learning techniques based on deep neural networks [21][22][23][24][25]. A total of 17 different classifiers using the MATLAB Classification Learner toolbox, along with support vector machine (SVM), K-nearest neighbors (KNN), and ensemble methods, have been used to evaluate the performance of classifiers for diagnosing the faults in induction motors [26]. Likewise, classical machine learning algorithms have also been used to detect leakages in the waterwall tube of a steam power plant [27].
As computational resources advance, deep learning has been in the spotlight, especially in the area of prognostics, because of its effectiveness in modeling complex systems [28]. Artificial neural networks (ANNs) are the most common deep learning methods used for prognostics because of their outstanding performance on complex non-linear multi-dimensional systems. Their ability to effectively process non-linear information makes them more robust to noise. Many ANN architectures, such as Feed-forward, Single and Multilayer Perceptron, Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), Modular Neural Networks (MNNs), and Convolutional Neural Networks (CNNs), are being used to conduct the CBM and prognostics of industrial and renewable energy systems [29]. To diagnose bearing faults, a large memory storage retrieval (LAMSTAR) neural network based on an optimized deep learning structure has been proposed [30]. A Bayesian deep learning model that characterizes the latent structure between RULs and degradation features has been used to describe prognostic uncertainties [31]. Based thereon, dynamic decisions pertinent to maintenance and spare-part ordering are taken. The performance of that framework has been validated through comparison with various benchmarking policies based on a C-MAPSS turbofan engine data set. Extensive research has been conducted on integrating Artificial Intelligence, Big Data, and the Industrial Internet of Things in smart manufacturing, modern dynamic industrial processes, and energy generation units under the umbrella of Industry 4.0 [32]. The issue of data imbalance has been addressed using virtual sensors to artificially induce different health states in the vibration data [33].
This research, therefore, focused on developing a dynamic and robust prognostic model that can efficiently predict the degradation and faults of the bearings based on the raw vibration data recorded from the sensors. All analyses in this research were conducted in the time domain, and the need for feature engineering was minimized by taking actual sensor data as input. The proposed model was developed to predict the bearing degradation and faults ahead of time. Because the raw vibration data generated by the sensors were used for model training and testing, the model could be effectively generalized. To evaluate the generalization capabilities of the developed model, it was tested on bearing vibration data from both hydro and wind power turbines. It was observed that the model effectively predicted the bearing vibration values irrespective of the data source. Using raw vibration values instead of time-domain features also has an advantage in cases where condition monitoring is carried out online in real time. Since only partial data is available at any instant in time, effectively acquiring time-domain features becomes difficult.
The remaining part of the paper is organized as follows. Section 1.1 discusses the novelty of the research. Section 2 explains the overall methodology, including the justification and architecture of the proposed model. Section 3 explains the process of data acquisition, gives details about the dataset, and elaborates on the experimental setup. Section 3 also explains the evaluation metrics and results, compares the results with those achieved by similar research, and discusses the generalization ability of the model. Finally, the paper is concluded in Section 4.

Related Research and Novelty
Previous research conducted on bearing vibration datasets mostly focused on the frequency domain [34,35]. At first, the time-series data were converted into the frequency domain, followed by data pre-processing and feature extraction [34]. Such analysis is based on the premise that the data components with higher frequency are actually added system noise. Subsequently, low-pass filters were used to remove these higher-frequency components, segregating the system and bearing vibrations. However, in a realistic environment, the vibration signals generated by the machinery (other than the bearings) are an integral part of the data. Segregation of these system vibration signals from the bearing vibration signals requires extensive domain knowledge and, hence, is highly prone to errors.
Furthermore, in similar research, either time domain features, frequency domain features, or time-frequency domain features, such as mean, standard deviation, kurtosis, and skewness, were considered as input(s) to the model [36][37][38][39][40][41]. However, in cases where prognostics are conducted in real time, where only partial data generated by the Supervisory Control and Data Acquisition (SCADA) system are available at any time, such statistical feature extraction cannot be carried out effectively.
Therefore, this research concentrated on conducting the model training and testing using the actual raw sensor data as an input to the model. This approach helped reduce the need for extensive feature engineering and domain knowledge and also improved the performance of the model in terms of root mean square error (RMSE). Furthermore, because the model is trained using raw sensor data, it can be effectively generalized. An in-depth analysis of the results and generalization capabilities, along with a comparison of the results with other similar models, is given in later sections.

Methodology
The model was developed and tested using the Python programming language. The prediction of the vibration values was carried out by first acquiring the sensory data containing raw vibration values recorded by a test rig. Subsequently, this raw data was pre-processed to remove any outliers. The pre-processed data were normalized and then used to train and test a prognostic model based on a machine-learning algorithm. Next, hyper-parameter testing and fine-tuning of the model were conducted. Finally, the generalization capability of the model was verified by testing it against real vibration data generated by different sources.
Figure 1 shows the flow chart of the methodology, comprising the experimental steps.

Model Selection
The vibration signals generated by the installed sensors represent time-series data expressed sequentially. One of the limitations of conventional models based on multi-domain feature extraction, including kurtosis, spectral skewness, and wavelet coefficients, is their inability to model the inherent sequential characteristics of the sensory data. Moreover, selecting features for these models requires extensive domain knowledge and feature engineering skills.
Furthermore, sequential models, including traditional ANNs, hidden Markov models (HMMs), Kalman filters, and conditional random fields, despite their ability to handle sequential data, are incapable of addressing long-term dependencies. In machine condition monitoring based on sensory data, many noisy or non-discriminative signals may exist between two consecutive informative or discriminative signals. Hence, a long delay on the time scale is induced between important data points. This leads to reduced efficiency and performance of the aforementioned models when working with time series data.

To address the issues of long-term data dependency, the use of RNNs for handling sequential data has increased substantially [42]. However, one of the shortcomings of RNNs is the problem of exploding and vanishing gradients. Although RNNs can store the previous inputs in the network and can be trained using backpropagation, their ability to cater to long-term dependencies in sequential data is reduced because of the vanishing gradient problem.
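The vanishing gradient effect mentioned above can be illustrated with a toy example. The following is a minimal sketch, not the model used in this research: it assumes a hypothetical scalar RNN of the form h_t = tanh(w * h_{t-1} + x_t) and accumulates the chain-rule product that backpropagation through time would compute. The function name and the chosen weight and input values are illustrative assumptions.

```python
import math

def rnn_gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the toy scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule for one step: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

# Hypothetical weight and constant input, chosen only for illustration.
inputs = [0.1] * 50
grad_short = rnn_gradient_through_time(0.5, inputs[:5])
grad_long = rnn_gradient_through_time(0.5, inputs)
print(f"gradient after 5 steps:  {grad_short:.3e}")
print(f"gradient after 50 steps: {grad_long:.3e}")
```

Because every factor in the product has magnitude below 1 when |w| < 1, the influence of an early data point on a late hidden state shrinks geometrically with the number of time steps, which is why a plain RNN struggles with long delays between informative signals.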
Consequently, the LSTM algorithm was introduced [43]. This algorithm mitigates the vanishing and exploding gradient problem and also addresses long-term data dependencies by introducing forget gates in the architecture. LSTMs can simultaneously carry out representation learning and model training without additional domain knowledge. Given its strong performance on time-series data, the LSTM algorithm was used to develop the prognostic model.

Long Short-Term Memory (LSTM)
LSTM was first presented in 1997 by Hochreiter and Schmidhuber [43]. It is a type of RNN that can effectively establish a correlation between a priori information and the current state based on a time series. The proposed LSTM network comprises three basic components, the Forget gate, Input gate, and Output gate, as shown in Figure 2. These components are used to (1) forget irrelevant information, (2) add/update new information, and (3) pass on the updated information.
In Figure 2, $C_{t-1}$ represents the cell state of the previous timestamp; $H_{t-1}$ represents the hidden state of the previous timestamp; $C_t$ represents the current cell state; and $H_t$ represents the current hidden state.
The cell state carries the information, in our case the vibration data points along with all timestamps, and is known as long-term memory. The hidden state is known as short-term memory, hence the name Long Short-Term Memory.
Along with some minor linear interactions, the cell state $C_t$ carries the data points straight through the entire chain, acting as a conveyor belt. The addition of significant data points to the cell state and the removal of less significant ones is regulated by the gates depicted in Figure 2. These gates comprise a sigmoid ($\sigma$) neural network layer and a point-wise multiplication operation, as depicted in Figure 3. The model training is conducted in accordance with these nodes.
In the first step of training the model, the Forget gate decides whether to keep the previous data points with the time step or to forget them. This process is governed by Equation (1).

$$f_t = \sigma\left(w_{fx}\, x_t + w_{fh}\, h_{t-1}\right) \qquad (1)$$

where: $x_t$ represents the input to the current timestamp; $w_{fx}$ represents the weight matrix associated with the input; $h_{t-1}$ represents the hidden state of the previous timestamp; $w_{fh}$ represents the weight matrix associated with the hidden state.
A sigmoid function is applied to change the value of $f_t$ to between 0 and 1. The cell state of the previous timestamp is then multiplied with $f_t$ to determine how much information to forget or keep using Equations (2) and (3).
The network will forget the cell state of the previous timestamp if the value of $f_t$ is 0 and will retain it if the value is 1.
In the second step of the training, the input gate quantifies the new data points and decides which of them are stored in the cell state. This step has two parts. First, the sigmoid input gate layer decides which values are updated. Second, the tanh layer creates a vector of the new values, $N_t$, to be added to the state. The tanh activation function transforms the values to between −1 and 1. The process is conducted using Equations (4) and (5).
Both of these steps are then integrated to update the cell state of the network using Equation (6).
Based on the value of $N_t$ in the above equation, information is either added to or subtracted from the cell state. For a negative value of $N_t$, the input data points are subtracted from the cell state; if the value of $N_t$ is positive, the input information is added to the cell state.
In the final step of the training, it is decided which values of the cell state go to the output. The output is based on a filtered version of the cell state. First, a sigmoid activation function is applied, as shown in Equation (7).
Second, a tanh activation function applied to the cell state is multiplied with the output of the sigmoid layer using Equation (8) to calculate the current hidden state of the network and decide the relevant points to be sent as an output.
The current hidden state $h_t$ in Equation (8) is a function of the long-term memory $C_t$ and the current output $O_t$.
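The three gate steps described above can be sketched as a single time step in NumPy. This is a minimal illustration that assumes the standard LSTM formulation without bias terms, matching the form of Equation (1); the weight names in the dictionary `w` (for example `"fx"` and `"fh"`), the layer sizes, and the sample inputs are hypothetical and are not taken from the trained model. In this standard form the output gate, like the other gates, is computed from the current input and the previous hidden state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w):
    """One LSTM time step; `w` maps hypothetical names to weight matrices."""
    f_t = sigmoid(w["fx"] @ x_t + w["fh"] @ h_prev)  # forget gate, as in Equation (1)
    i_t = sigmoid(w["ix"] @ x_t + w["ih"] @ h_prev)  # input gate: which values to update
    n_t = np.tanh(w["nx"] @ x_t + w["nh"] @ h_prev)  # candidate values N_t in (-1, 1)
    c_t = f_t * c_prev + i_t * n_t                   # forget old content, add new content
    o_t = sigmoid(w["ox"] @ x_t + w["oh"] @ h_prev)  # output gate
    h_t = o_t * np.tanh(c_t)                         # new hidden state (short-term memory)
    return h_t, c_t

# Illustrative sizes: one vibration value per time step, four hidden units.
rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
w = {}
for gate in "fino":
    w[gate + "x"] = rng.normal(scale=0.5, size=(n_hidden, n_in))
    w[gate + "h"] = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for value in [0.02, -0.01, 0.03]:  # made-up normalized vibration samples
    h, c = lstm_step(np.array([value]), h, c, w)
print(h.shape, c.shape)
```

The line computing `c_t` shows the two limiting cases discussed in the text: a forget-gate value of 0 discards the previous cell state, a value of 1 retains it, and the sign of the candidate `n_t` determines whether new information is added to or subtracted from the cell state.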

Experiments and Results
This section provides a detailed discussion of the dataset used, the experiments conducted, and the results.

Dataset
In this research study, we used the dataset from the Prognostic Data Repository of NASA, which was made publicly available by the Center for Intelligent Maintenance Systems (IMS), University of Cincinnati [44]. Details of the data acquired from the IMS database are given in Table 1. The plot of the vibrational signals of Bearing 1 in Dataset 02 for the entire run-to-failure experiment can be seen in Figure 6a, whereas the plot of the vibrational signals of Bearing 3 in Dataset 03 for the entire run-to-failure experiment can be observed in Figure 6b. In Figure 6, the x-axis represents the time taken until the occurrence of a fault in the bearings, and the y-axis represents the vibration values recorded by the accelerometers in m/s².

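Experimental Setup
The dataset provided by IMS comprises three (03) datasets, each describing an independent run-to-failure experiment. Dataset 02, with a total of 984 files, was used for the training and testing of the model. The dataset was split into training and testing sets: 70% of the data was used for training the model, and 30% was used for testing. The results were further verified by testing the trained model using Dataset 03 with 4448 files. At the end of the run-to-failure experiments, Bearing 1 in Dataset 02 and Bearing 3 in Dataset 03 developed an outer race fault. Therefore, the results were verified against Dataset 03 after performing the training and testing of the model on Dataset 02.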

Data Pre-Processing
The data is first pre-processed to remove any outliers. After the data cleansing, it is normalized. During normalization, the data is rescaled to fall between −1 and 1. The data is normalized using the min-max scaler function given in Equation (9),
where $x$ is the original value and $x'$ is the normalized value.
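As an illustration of this normalization step, the sketch below applies the usual min-max formula with a target range of −1 to 1. It is an assumption that this matches the exact form of Equation (9); the function name `min_max_scale` and the sample values are hypothetical.

```python
import numpy as np

def min_max_scale(values, low=-1.0, high=1.0):
    """Rescale `values` linearly so that min -> low and max -> high."""
    values = np.asarray(values, dtype=float)
    v_min, v_max = values.min(), values.max()
    if v_max == v_min:
        raise ValueError("cannot scale a constant signal")
    unit = (values - v_min) / (v_max - v_min)  # maps the data to [0, 1]
    return low + unit * (high - low), (v_min, v_max)

# Made-up vibration readings, for illustration only.
raw = np.array([-0.25, 0.0, 0.10, 0.75])
scaled, (v_min, v_max) = min_max_scale(raw)
print(scaled)
```

The function also returns the minimum and maximum so that the same scaling can be reused on unseen data; computing these bounds on the training portion only is a common precaution against information leaking from the test set.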

Hyper-Parameter Testing and Fine Tuning
During the fine-tuning and hyper-parameter testing, the model was trained for various numbers of epochs, and the training and validation losses were calculated. Subsequently, the training and validation losses were plotted to visually analyze the model's performance. For 50 epochs, the graph shows that the model is underfit since the validation and training plots do not converge. For 150 epochs, the model overfit and could not be effectively generalized. The best fit was obtained with 100 epochs and a batch size of 50, with the sequence size set to 10.
The optimizer used for the model was Adam. Adam was selected because its default learning rate of 0.001 is neither too small nor too large and works well in the case of LSTM. A stacked LSTM model, detailed in Table 2, was then defined.
To make the model work effectively in real scenarios, all analyses were conducted on the actual time series data and in the time domain. Additionally, the cleansed data from the sensors were directly input into the model, contrary to other studies where either time-domain features or frequency-domain features were used as input to the model [37][38][39][40][41]. These statistical analyses and statistical feature extractions cannot be effectively carried out on partial data, especially in cases where prognostics are performed in real time and limited SCADA data is available at any time. Furthermore, LSTM has a competitive advantage in that the need for feature engineering and extensive domain knowledge is minimized [43]. Hence, instead of extracting features, raw data from the sensors were directly input into the LSTM model.
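Feeding raw values to an LSTM with a sequence size of 10 requires turning the one-dimensional signal into overlapping input windows with a next-value target. The sketch below shows one common way to do this, together with a 70/30 split. It assumes that the split is chronological and that each window predicts the immediately following sample; neither detail is stated explicitly above, and the function names and the synthetic sine signal are hypothetical.

```python
import numpy as np

def make_sequences(series, seq_len=10):
    """Build (samples, seq_len, 1) input windows and next-value targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - seq_len
    if n <= 0:
        raise ValueError("series is shorter than seq_len + 1")
    X = np.stack([series[i:i + seq_len] for i in range(n)])
    y = series[seq_len:]              # the value that follows each window
    return X[..., np.newaxis], y      # add a feature axis for the LSTM input

def chronological_split(X, y, train_fraction=0.7):
    """Keep time order: the first part trains, the remainder tests."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# Synthetic stand-in for a normalized vibration signal.
signal = np.sin(np.linspace(0.0, 20.0, 110))
X, y = make_sequences(signal, seq_len=10)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
print(X.shape, train_X.shape, test_X.shape)
```

Keeping the split chronological avoids training on samples that come after the test period, which matters for run-to-failure data where the degradation trend appears late in the record.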

Evaluation Metrics
The performance of the proposed methodology was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Normalized Mean Absolute Error (NMAE), and Mean Absolute Percentage Error (MAPE) metrics given by Equations (10)-(13), respectively.
Here, $y_i$ denotes the actual values and $\hat{y}_i$ denotes the predicted values. During the prognostics of bearing vibrational data, large errors can often result in undesirable outcomes. Because RMSE squares the difference between the actual and predicted values, it gives greater weight to the larger error values and hence is a very useful metric for the performance evaluation of the proposed model. Therefore, the comparison of the model with other similar research is carried out in terms of RMSE. However, to further verify and evaluate the performance of the model, MAE, NMAE, and MAPE have also been used as evaluation metrics.
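The four metrics can be computed as in the following sketch, which uses their common textbook definitions. Two points are assumptions rather than facts from this paper: NMAE is normalized here by the range of the actual values (normalizing by the mean is another convention), and MAPE is expressed in percent. The function name and the sample arrays are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, NMAE, and MAPE for actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # Assumption: normalize MAE by the range of the actual values.
    nmae = mae / (y_true.max() - y_true.min())
    # MAPE is undefined if any actual value is exactly zero.
    mape = 100.0 * np.mean(np.abs(err / y_true))
    return {"RMSE": rmse, "MAE": mae, "NMAE": nmae, "MAPE": mape}

# Made-up values with a single large error on the last sample.
actual = np.array([1.0, 2.0, 4.0])
predicted = np.array([1.0, 2.0, 1.0])
metrics = regression_metrics(actual, predicted)
print(metrics)
```

In this toy case the RMSE is noticeably larger than the MAE because the single large error is squared before averaging, which is the property that motivates RMSE as the primary comparison metric.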

Results and Discussion
The performance of the developed LSTM model with the proposed approach was first analyzed on Dataset 02. Figure 8a shows the plot of the predicted and actual bearing vibration values. The blue color represents the actual bearing vibrations, whereas the orange color represents the predicted values. The proposed model precisely predicted the normal values and followed the degradation trend until a fault, which was also efficiently predicted. The RMSE value recorded for Dataset 02 was 0.0145. Considering the mean vibration value, this RMSE indicates that the error between the predicted and actual values is small. The results in Table 3 reflect that the model performed extremely well while predicting the bearing vibration values. Removing noise in the frequency domain requires in-depth domain knowledge and can lead to the removal of significant data points that may prove useful for the model while analyzing patterns. Additionally, with LSTM, the requirement for extensive feature engineering is minimized, whereas when statistical time domain features are input into a model, they can add a bias, thereby affecting the prediction accuracy and limiting the generalization of the model. These issues have been efficiently addressed in this research.
The results achieved by the proposed model were then compared with similar research conducted on the IMS bearing dataset. Table 4 compares the RMSE of our research with the RMSE achieved by various models in other research that were trained and tested on the same dataset. The researchers who achieved close RMSE values, such as [34], first extracted the time-frequency features from the data and then used those features as input to predict the future state of the bearings. However, in this research, normalized raw vibration values of the bearings were used as input to the LSTM model for predicting the future vibration values. This not only helped reduce the need for feature engineering but also resulted in a better generalization of the model.
Table 5 compares the prediction accuracy of other machine learning and deep learning models with this research.The results reflect that our proposed methodology and model achieved better results in terms of RMSE than other research conducted on the same dataset.Also, the model developed in this research achieved better prediction accuracy than other deep learning and machine learning models when tested against the same dataset.

Generalization Capability of the Model
To evaluate the generalization capability of the model, it was tested against real vibration data generated by other electro-mechanical systems, including hydro and wind power turbines.The first set of bearing vibration data was acquired from the SCADA system installed at Neelum-Jehlum Hydro Power Project (NJHPP) Pakistan, having a total generation capacity of 969 MW.The second data set, publicly available [56], was acquired from a wind power project (WPP) operating in northern Sweden.
The plots depicted in Figure 9a,b again show that the model not only effectively predicted the bearing vibration values but also followed the trend in the data. Furthermore, low RMSE values of 0.11 and 0.12 were recorded for the NJHPP and WPP datasets, respectively. The test results show that, irrespective of the data source, the model can effectively predict the bearing vibration values.

Conclusions
In view of the continual advancement of power generation units, the need to deploy efficient O&M procedures has increased. Hence, prognostics have become an important component of modern electro-mechanical systems. Therefore, this research described an effective fault prognostics system for rolling element bearings based on univariate time series analysis using LSTM. The developed model was trained and tested using the bearings' vibrational data. All analyses were conducted in the time domain, and raw sensor data was directly input into the model, thereby minimizing the need for feature engineering. The proposed approach was experimentally validated, and the model's performance was analyzed in terms of RMSE. The results were compared with other research conducted on the same dataset, and our model outperformed existing models and achieved a lower RMSE. The generalization capability of the model was verified by evaluating the performance of the model against real-time data generated by the SCADA system of a wind or hydropower turbine.

Figure 1. Steps of the methodology.

Figure 2. Basic layout of the proposed LSTM network.

Figure 3. Detailed structure of the proposed LSTM network.

Figure 4 depicts the layout of the test rig. The data were collected for 1 s every 10 min. Each file has 20,480 data points, and the name of the file indicates the time at which the data was recorded. The test rig was lubricated using an oil circulation mechanism. The extent of debris stuck to a magnetic switch installed in the oil feedback pipe of the lubrication system indicated bearing degradation.
Machines 2023, 11

Figure 5a depicts Bearing 3 with an inner race fault at the end of the experiment recorded as Dataset 1. Figure 5b shows Bearing 1 with an outer race fault at the end of the experiment recorded as Dataset 2.

The plot of the vibrational signals of Bearing 1 in Dataset 02 for the entire run-to-failure experiment can be seen in Figure 6a, whereas the plot of the vibrational signals of Bearing 3 in Dataset 03 for the entire run-to-failure experiment can be observed in Figure 6b. In Figure 6, the x-axis represents the time elapsed until the occurrence of a fault in the bearings, and the y-axis represents the vibration values recorded by the accelerometers in m/s².


Figure 5. Physical condition of bearings after the run-to-failure experiment [45].

Figure 6. Plot of Bearing Vibrational Signals after Run-to-Failure Experiment.

[…] datasets, each describing an independent run-to-failure experiment. Dataset 02, with a total of 984 files, was used for the training and testing of the model: 70% of the data was used for training and 30% for testing. The results were further verified by testing the trained model on Dataset 03, which has 4448 files. At the end of the run-to-failure experiments, Bearing 1 in Dataset 02 and Bearing 3 in Dataset 03 each developed an outer race fault. Therefore, the results were verified against Dataset 03 after the model had been trained and tested on Dataset 02.

3.2.1. Data Pre-Processing
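The 70/30 split of Dataset 02 can be sketched as follows. The text does not state whether the split was chronological or random, nor how the raw series was framed for the LSTM, so both the time-ordered split and the sliding-window helper below are assumptions made for illustration; `chronological_split` and `make_windows` are hypothetical names.

```python
def chronological_split(series, train_fraction=0.7):
    """Split a time-ordered sequence into a leading training part and a
    trailing test part (assumed ordering; no shuffling is applied)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def make_windows(series, window):
    """Frame a univariate series as (inputs, target) pairs for one-step-ahead
    prediction: `window` past values predict the next value (assumption)."""
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs


# Stand-in for the 984 recordings of Dataset 02 (indices, not real data).
file_indices = list(range(984))
train_files, test_files = chronological_split(file_indices)
print(len(train_files), len(test_files))

# Tiny illustration of the windowing on placeholder numbers.
print(make_windows([1, 2, 3, 4, 5], window=2))
```

Keeping the split in time order avoids placing late-life samples in the training set while earlier ones are held out, which matters for run-to-failure data; a random split would be a different design choice.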

Machines 2023, 11, 531
3.2.2. Hyper-Parameter Testing and Fine Tuning

During fine-tuning and hyper-parameter testing, the model was trained for various numbers of epochs, and the training and validation losses were calculated. Subsequently, the training and validation losses were plotted to visually analyze the model's performance. Figure 7 depicts the training and validation loss for 50, 100, and 150 epochs, respectively.
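The procedure of recording training and validation loss for several epoch budgets can be sketched as below. To stay self-contained, the example substitutes a toy linear model trained by gradient descent on synthetic data for the LSTM; the data, the model, and the learning rate are all hypothetical stand-ins, and only the bookkeeping (one loss pair per epoch, repeated for 50, 100, and 150 epochs) mirrors the text.

```python
import math


def make_toy_series(n=20):
    """Synthetic, deterministic data standing in for real vibration samples."""
    xs = [i / n for i in range(n)]
    ys = [2.0 * x + 1.0 + 0.05 * math.sin(i) for i, x in enumerate(xs)]
    return xs, ys


def mse(w, b, xs, ys):
    """Mean squared error of the linear model y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs_tr, ys_tr, xs_val, ys_val, epochs, lr=0.1):
    """Full-batch gradient descent; returns per-epoch loss histories."""
    w, b = 0.0, 0.0
    history = {"loss": [], "val_loss": []}
    n = len(xs_tr)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs_tr, ys_tr)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs_tr, ys_tr)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history["loss"].append(mse(w, b, xs_tr, ys_tr))
        history["val_loss"].append(mse(w, b, xs_val, ys_val))
    return history


def compare_epoch_budgets(budgets=(50, 100, 150)):
    """Train once per epoch budget and collect the loss curves."""
    xs, ys = make_toy_series()
    cut = int(len(xs) * 0.7)  # 70/30 split, in order
    results = {}
    for epochs in budgets:
        results[epochs] = train(xs[:cut], ys[:cut], xs[cut:], ys[cut:], epochs)
    return results


for epochs, hist in compare_epoch_budgets().items():
    print(epochs, hist["loss"][-1], hist["val_loss"][-1])
```

Plotting `loss` against `val_loss` for each budget gives curves of the kind shown in Figure 7; a validation curve that flattens or rises while the training curve keeps falling would suggest that additional epochs are no longer helping.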

Figure 8. Plot of bearing vibrational signals: actual (blue) vs. predicted (orange) values.

The model's performance was further verified by testing it against Dataset 03. Figure 8b depicts the predicted and actual values of the bearing vibrations. The model performed even better and achieved a remarkably low RMSE value of 0.0102. During the testing on Dataset 03, the model was also evaluated using MAE, NMAE, and MAPE. The error values in Table 3 reflect that the model performed extremely well while predicting the bearing vibration values.
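For reference, the four error metrics named above can be computed as in the following sketch. RMSE, MAE, and MAPE follow their usual definitions; NMAE has more than one convention, and the version here normalizes MAE by the range of the actual values, which is an assumption and may differ from the definition used for Table 3. The sample values are placeholders, not results from this paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def nmae(actual, predicted):
    """MAE divided by the range of the actual values (assumed convention)."""
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("actual values have zero range")
    return mae(actual, predicted) / value_range


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which can occur for vibration
    signals that oscillate around zero, so that case is rejected here.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Placeholder values for illustration only.
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
print(rmse(actual, predicted), mae(actual, predicted),
      nmae(actual, predicted), mape(actual, predicted))
```

Because RMSE and MAE are expressed in the units of the signal, they are only comparable across studies that use the same scaling of the vibration data; NMAE and MAPE are scale-free but depend on the normalization chosen.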


Table 1. Details of the dataset.

Table 2. Characteristics of the LSTM model.


Table 3. Model evaluation using various metrics.
Table 4 compares the RMSE of our research


Table 4. Comparison of RMSE with similar research.

Table 5. Comparison of prediction accuracy with other machine learning models.