Leveraging Label Information in a Knowledge-Driven Approach for Rolling-Element Bearings Remaining Useful Life Prediction

Abstract: Since bearing deterioration patterns are difficult to collect from real, long-lifetime scenarios, data-driven research has turned to recovering them through accelerated life tests. Consequently, features insufficiently recovered due to rapid damage propagation are likely to lead to poorly generalized learning machines. Knowledge-driven learning offers a solution by providing prior assumptions through transfer learning. Likewise, the absence of true labels can create inconsistency problems between samples, and teacher-given label behaviors lead to ill-posed predictors. Therefore, in an attempt to overcome these incomplete, unlabeled data drawbacks, a new autoencoder has been designed as an additional source that correlates inputs and labels by exploiting label information in a completely unsupervised learning scheme. Its stacked denoising version recovers these representations more robustly for new unseen data. Due to the non-stationary and sequentially driven nature of the samples, the recovered representations are fed into a transfer-learning convolutional long short-term memory neural network for further meaningful representation learning. The assessment procedures were benchmarked against recent methods under different training datasets, and the obtained results confirm the efficiency of the new learning path.


Introduction
A successful condition-based preventive maintenance program relies entirely on precise real-time monitoring capable of early detection of any failure suspicion, and stands on a well-structured prognosis policy [1]. Remaining useful life (RUL) is crucial for prognosis, serving as the primary measure of health assessment [2]. It is mainly based either on the estimation of the useful time until the complete failure of a system or on the provision of a probability or any other important information indicating its current operational performance. Its evaluation involves different modeling paradigms, depending on the complexity of the system as well as the availability of the operating history, including all anomaly events [3]. If it is not difficult to represent the operating behavior using physical interpretations, then a very authentic modeling process is possible that can lead to an accurate RUL prediction. Likewise, data-driven evaluation is a common, promising solution when a physical modeling process is unavailable. In data-driven training procedures, precise RUL estimation depends on two main characteristics, namely: (i) complete run-to-failure historical sensor measurements, and (ii) labels truly attributed to each event. However, for some machines such as bearings, collecting these large deterioration patterns seems most likely to be unachievable in practice. The main contributions of this work are as follows:

• A new autoencoder (AE) capable of seeding label information in hidden layers without affecting the learning inputs or the reconstruction process, while also enabling knowledge transfer, is designed.
• This AE is stacked and strengthened with a denoising scheme to ensure more robust extraction as well as a more homogeneous mixture of label and feature mapping.
• Due to the sequentially driven and non-stationary nature of vibration signals, a convolutional LSTM (C-LSTM) has been designed to fit both time-varying adaptive learning and accurate extraction, similar to the work done in [26].
• Two main datasets, namely IMS and WTHS, have been involved in an attempt to produce more generalization by transferring knowledge between them using the same learning framework.
• Unlike previously mentioned works that mostly deal with a single prediction problem (either HI or HS), both HI and HS have been investigated in this work.
• We also used an exponential HI deterioration function, which proves more compatible with the extracted deterioration features than a linear one.
• Concerning the HS splitting, we have involved the Gaussian mixture model (GMM) based silhouette coefficient.
The remainder of this paper is organized as follows: Section 2 describes the studied bearing datasets and the proposed methodology. Section 3 is devoted to experimental results and discussions. Section 4 concludes with perspectives.

Materials and Methods
This section provides important descriptions related to the used datasets in this work (IMS bearing and WTHS), as well as the details of the followed methodology during RUL prediction.



IMS Bearing Data
First, the IMS bearing dataset was made available by the IMS center of the University of Cincinnati [11]. Since then, it has been provided by NASA in a prognostics data repository so that the public can evaluate their health assessment models [27]. The experiment incorporated four bearings aligned on a single feed shaft (Figure 1) under a 6000-pound load and a speed of 2000 rpm. The IMS data contain three subsets, the first of which was released from two installed accelerometers, while the other two sets were gathered from a single sensor. Data acquisition was performed with a sampling rate of 20 kHz, and each file was stored separately every second. This means that each file could contain at least 20,000 samples, which is very difficult to manipulate as a single observation when training models. For each experiment, the associated files were gathered in a single folder and named after the date and time of acquisition. The vibration signals in Figure 2 show how bearing health deteriorates over time in each experiment. The bearing tests lasted approximately 35 days until the appearance of certain significant symptoms of bearing failure.
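As a concrete illustration of the acquisition format described above, the following sketch loads one IMS test folder into chronologically ordered records. The folder layout, file format, and column count are assumptions based on the description; this is not the authors' code.

```python
import os
import numpy as np

def load_ims_records(folder, n_channels=1):
    """Load one IMS test folder into a list of per-file vibration records.

    Files are named after the acquisition date/time, so sorting the
    names restores chronological order. Each file is assumed to hold
    whitespace-separated columns, one per accelerometer channel.
    """
    records = []
    for name in sorted(os.listdir(folder)):
        data = np.loadtxt(os.path.join(folder, name))
        # Single-channel files load as 1D arrays; keep a 2D shape throughout.
        records.append(data[:, :n_channels] if data.ndim > 1 else data[:, None])
    return records
```

Sorting by filename is what makes the date-and-time naming convention useful: no separate index file is needed to recover the degradation timeline.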

WTHS Bearing Data
In an attempt to obtain more realistic conclusions from the analysis of bearing deterioration, an experiment was carried out to record the real-time health indicators of a high-speed shaft with a 20-tooth pinion gear driven by a 2 MW wind turbine. Figure 3 is a simple graphical illustration of the studied system, indicating the position of the main bearings as well as the dimensions of the tapered roller bearings.


The vibration measurements were recorded by an actually deployed monitoring system in the United States [10]. A single accelerometer was installed perpendicular to the high-speed shaft in the gearbox bearing to detect the progressive spread of damage. The monitoring system was programmed to store 6 s of vibration samples each day with a sampling rate of 100 kHz. Fifty files were stored separately over 50 days, where approximately 585,936 samples per file were treated as a single health indicator [28]. It was observed that the collected data varied exponentially over time due to the changes that occurred in the physical health conditions of the bearings, as shown in Figure 4. Consequently, within 50 days under normal operating conditions, the bearings stopped working due to the occurrence of an inner race fault.

Proposed Knowledge-Driven Methodology
In order to provide a more reliable conclusion on the RUL prediction of such a system, a new methodology involving multiple sources of knowledge has been built into a unique learning scheme, designed according to the flow diagram of Figure 5. Both data-driven and knowledge-driven methodologies have been utilized within this new learning procedure by exploiting TL information, deep learning labels, as well as previous hypotheses generated from pre-trained models.


Data Preprocessing
To guarantee an optimal feature space for representation learning algorithms, a well-structured feature extraction was considered. In total, 15 statistical features were extracted from both time-domain and frequency-domain for both datasets.
Eleven time-domain features were calculated for each time window {mean, standard deviation (Std), skewness, kurtosis, peak-to-peak (Peak2Peak), root mean square (RMS), crest factor, shape factor, impulse factor, margin factor, energy}, in addition to four frequency-domain features {spectral kurtosis mean (SKMean), SKStd, SKSkewness, SKKurtosis}. More information about the mathematical background of these features can be obtained from [10]. These parameters were then scaled to the interval [0, 1] using min-max normalization and fed directly into the learning models. Regarding smoothing filters for the extracted features, as in [27], we do not think they would be beneficial. Such filtering may be applicable when the training and testing data are pre-defined; however, in a real application one cannot guess how far the samples should be scaled, especially when new samples arrive one by one. Therefore, to avoid any misleading representations, we both trained and tested the learning models on samples exactly as they resulted from min-max normalization. Figure 6 illustrates the behavior of the features extracted from the two datasets. It can be seen that the WTHS dataset exhibits a complete deterioration lifecycle, which seems very adequate for the learning process in terms of the compatibility of the degradation form. Conversely, the IMS dataset is more difficult because it could expose the training program to all kinds of ill-posed problems.
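The time-domain part of the feature set above can be sketched as follows; the formulas follow the usual textbook definitions of these indicators (see [10]), not necessarily the authors' exact implementation.

```python
import numpy as np

def time_domain_features(x):
    """The eleven time-domain indicators named in the text, for one window."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    abs_mean = np.mean(np.abs(x))
    std = np.std(x)
    peak = np.max(np.abs(x))
    return {
        "mean": np.mean(x),
        "std": std,
        "skewness": np.mean((x - np.mean(x)) ** 3) / std ** 3,
        "kurtosis": np.mean((x - np.mean(x)) ** 4) / std ** 4,
        "peak2peak": np.max(x) - np.min(x),
        "rms": rms,
        "crest": peak / rms,
        "shape": rms / abs_mean,
        "impulse": peak / abs_mean,
        "margin": peak / np.mean(np.sqrt(np.abs(x))) ** 2,
        "energy": np.sum(x ** 2),
    }

def minmax_scale(F):
    """Column-wise min-max scaling of a feature matrix to [0, 1]."""
    F = np.asarray(F, dtype=float)
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / (hi - lo + 1e-12)
```

Note that `minmax_scale` is fitted on the available samples only, consistent with the remark above that one cannot know in advance how far newly arriving samples will range.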

HI Identification
Unlike traditional works, which present the deterioration path as a linear function, in our work we used an exponential function, illustrated by Equation (1), which seems more adaptable to the approximation process. The similarity between the shape of the deterioration and the exponential degradation function creates a kind of compatibility between the labels and the extracted patterns.
Here, t stands for the time instant, and (a, b, d) are hyperparameters that control the divergence characteristics of the exponential degradation. These parameters are tuned according to the best results of the loss function. Figure 7 is an illustrative example indicating the HI identification process for the two studied datasets.
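A minimal sketch of one plausible parameterization of Equation (1), assuming the common form HI(t) = a·exp(b·t) + d; the paper's exact expression may differ, so treat the formula below as an illustration of how (a, b, d) shape the label curve.

```python
import numpy as np

def exponential_hi(t, a, b, d):
    """Exponential degradation label: HI(t) = a * exp(b * t) + d.

    (a, b, d) control the divergence of the curve; b > 0 gives the
    accelerating deterioration shape described in the text.
    """
    return a * np.exp(b * np.asarray(t, dtype=float)) + d
```

In practice, (a, b, d) would be tuned so that the resulting curve best matches the loss observed against the extracted deterioration features.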



HS Splitting
Among many clustering machines, we have chosen the Gaussian mixture model (GMM) to divide the training patterns according to three distinct operating phases of the bearings (healthy, deteriorated, critical). The reason for this choice lies in the learning procedure of the GMM, which uses more statistical characteristics, such as the mean and standard deviation, than alternatives such as K-nearest neighbor (KNN) variants [5], which use only the mean value for subspace division. The KNN variants divide the data into subspaces by identifying circular boundaries centered at the centroid of each class with a radius equal to the mean value of the samples of that class. On the other hand, the GMM provides more flexibility and smoothness by giving an ellipsoidal shape to the decision classes [29,30]. In the GMM, the clustering probability can be defined as expressed by Equation (2), where x refers to the training sample, λ = {w, u, Σ} are the statistical components of the GMM model, and (u, Σ) are the mean and covariance, respectively.
Since clustering is an unsupervised learning process, it is very difficult to judge whether the model really classifies the data correctly, due to the lack of ground-truth labels. However, we can still assess whether the model is able to group this kind of data into a certain number of classes. In our work, we used silhouette analysis for this purpose. The silhouette coefficient is a metric that measures how well a clustering separates the data into groups [31]. Equation (4) illustrates the computation of this parameter α using the largest and smallest average distances between learning samples, (ω, γ), respectively.
In our work, this simple test proves that the GMM is indeed capable of doing so on these datasets, as shown in Figure 8.
Each clustering component can be defined as expressed by Equation (3), where the dimension parameter stands for the number of features in each observation (e.g., in our case, we have 15 features).
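The GMM-based splitting with a silhouette check can be sketched with scikit-learn, assuming its `GaussianMixture` and `silhouette_score` implementations; the hyperparameters below are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def split_health_states(features, n_states=3, seed=0):
    """Cluster normalized features into health states with a GMM and
    report the silhouette coefficient of the resulting partition.

    n_states=3 matches the healthy / deteriorated / critical phases
    described in the text; full covariances give the ellipsoidal
    decision classes that motivate the GMM choice.
    """
    gmm = GaussianMixture(n_components=n_states,
                          covariance_type="full",
                          random_state=seed)
    labels = gmm.fit_predict(features)
    return labels, silhouette_score(features, labels)
```

A silhouette value close to 1 indicates that the three health states are well separated; a value near 0 would suggest the chosen number of states does not fit the data.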


Denoising Autoencoder for Label Embedding
In the proposed learning scheme, the integration of labels was carried out via a stack of autoencoders connected in series. The main objective was to seed patterns similar to the labels into the learning representations of the hidden layers of the neural network. We believe that the more the shape of the inputs reflects the targets, the better the approximation will be (as will be experimentally proven later). As the illustrations in Figure 9 show, a denoising scheme was incorporated to strengthen the representations, attempting to lead to more meaningful patterns. Additionally, the stack of autoencoders was designed to homogenize the labels as much as possible with the original mapped patterns without distorting them.
Due to the local connections between the hidden layer and the input layer resulting from the additive labels, and also due to the deep, complex architecture, we adopted a tuning algorithm capable of accurately and quickly adjusting only the output weights. To the best of our knowledge, the only algorithm capable of doing so is the extreme learning machine (ELM) [32]. Therefore, the learning steps of the proposed stacked denoising autoencoder with embedded labels (SD-AEEL) can be presented as follows:
• Retrieve corrupted inputs x_η using noise generated from any distribution with a specific magnitude ζ and rate ψ, and use it to corrupt the inputs x, as in Equation (5).
• Activate the hidden layer h using any activation transfer function f that holds both the ordinary full-rank mapping of x_η and the seeded labels y, as explained by Equation (6), where (w, b) are the input weights and biases, respectively.
In fact, Equation (6) is our main contribution in this work, and as far as we know, it is the only one of its kind.

• Determine the reconstruction weights β using the original inputs and the new feature mapping, as described by Equation (7).
• Repeat the learning process by considering the hidden layers as inputs of the next autoencoders, as explained by Equation (8), where m is the index of the autoencoder.
• After the training process is finished, one can construct a fully trained network for robust feature mapping using the transpose of the output weights β^T, as in Equation (9).
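The steps above can be sketched as a single SD-AEEL layer with ELM-style random features and a closed-form solve. The noise model, activation, and label-seeding rule below are our reading of Equations (5)-(7), not the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sdaeel_layer(X, y, n_hidden, noise_rate=0.2, noise_mag=0.1, seed=0):
    """One layer of the stacked denoising AE with embedded labels.

    Returns the hidden mapping H (with the label column seeded in)
    and the output weights beta that reconstruct the clean inputs.
    """
    rng = np.random.default_rng(seed)
    # Eq. (5): corrupt a random fraction (rate psi) of the entries
    # with additive noise of magnitude zeta.
    mask = rng.random(X.shape) < noise_rate
    X_eta = X + mask * rng.normal(0.0, noise_mag, X.shape)
    # Eq. (6): random full-rank mapping of the corrupted inputs, with
    # the label column appended alongside the hidden activations.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.hstack([sigmoid(X_eta @ W + b), np.asarray(y).reshape(-1, 1)])
    # Eq. (7): ELM closed form -- only the output weights are solved,
    # via the Moore-Penrose pseudoinverse, to reconstruct the clean X.
    beta = np.linalg.pinv(H) @ X
    return H, beta
```

Stacking (Equation (8)) amounts to feeding the hidden mapping H of one layer in as the input of the next; Equation (9) then maps new data through the transposed output weights β^T of each trained layer.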


Data-Driven Network and Transfer Learning
The convolutional LSTM (C-LSTM) network showcased in Figure 10 represents the main data-driven approach used both for training in the source domain and for knowledge transfer to the target domain. As we can see, it consists of three main layers: the convolutional layer, the pooling layer, and the LSTM layer. In the convolutional layer, the training samples are mapped through local one-dimensional (1D) receptive filters into several hierarchical slices. Each resulting slice of the feature mapping is then sub-sampled using maximum pooling, and the pooling output is fed into the LSTM layer for approximation and generalization. Regarding the TL process, the learning weights of the C-LSTM are used to provide further generalization for the learning models in the target domain.
The reason we chose the LSTM network is its powerful capability in adaptive sequential learning without suffering from vanishing-gradient problems [33]. It uses a set of learning gates, namely the input gate g_t^i, the forget gate g_t^f, and the output gate g_t^o, to control the amount of memorization of any driven patterns, as demonstrated by Equations (10)-(12). In the meantime, the hidden state h_t and the cell state C_t are determined using Equations (13)-(15).
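The gate equations can be sketched as one NumPy LSTM step; this is the standard formulation, which Equations (10)-(15) appear to follow (the paper's exact variant may differ).

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """A single LSTM step: input, forget, and output gates plus the
    candidate cell update. W, U, b each stack the four gate
    parameter sets (input, forget, output, candidate)."""
    def sig(z):
        return 1.0 / (1.0 + np.exp(-z))
    W_i, W_f, W_o, W_c = W
    U_i, U_f, U_o, U_c = U
    b_i, b_f, b_o, b_c = b
    g_i = sig(x_t @ W_i + h_prev @ U_i + b_i)   # input gate,  Eq. (10)
    g_f = sig(x_t @ W_f + h_prev @ U_f + b_f)   # forget gate, Eq. (11)
    g_o = sig(x_t @ W_o + h_prev @ U_o + b_o)   # output gate, Eq. (12)
    c_tilde = np.tanh(x_t @ W_c + h_prev @ U_c + b_c)
    c_t = g_f * c_prev + g_i * c_tilde          # cell state update
    h_t = g_o * np.tanh(c_t)                    # hidden state
    return h_t, c_t
```

Because the forget gate can hold the cell state close to its previous value, gradients propagate through long sequences without vanishing, which is the property exploited for the sequentially driven vibration features.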

Results and Discussion
This section provides results and discussion related to the performances of the designed algorithm, starting with label seeding towards both HI and HS predictions.

Labels Recovering Process
Unlike traditional unsupervised learning autoencoders, which follow only the input reconstruction path, our designed stacked denoising version performs two main tasks simultaneously, namely label regeneration and input reconstruction. Therefore, to prove its ability to keep the inputs as they are without affecting them in any way, a simple experiment was performed using both datasets and their HI labels. The accuracy of label retrieval and feature reconstruction in Figure 11 indicates that an acceptable number of hidden layers for both datasets lies between 6 and 10, which can be adjusted by a search mechanism over a grid of hyperparameters.


HI Prediction Results
Prior to the training process, the two datasets were divided into training and testing sets by k-fold cross-validation (k = 4). To prove the capability of the designed learning scheme on the HI regression problem, we first tested each contribution separately. We compared the full TL algorithm that leverages label information (TLC-LSTM) against its transfer learning version (LC-LSTM), the C-LSTM, and the LSTM, respectively. The curve-fitting results of Figure 12 on the test sets prove its higher approximation ability. In addition, they show the benefits of each individual contribution by making the improvements at each stage observable.

HI Prediction Results
Prior to the training process, the two datasets were divided into training and testing sets by cross-validation over k folds of the training samples (k = 4). To prove the capability of the designed learning scheme on the HI regression problem, we first tested each contribution separately, comparing the full algorithm that leverages label information with transfer learning (TLC-LSTM) against LC-LSTM, C-LSTM, and LSTM, respectively. The curve-fitting results of Figure 12 on the test sets prove its higher approximation ability and show the benefit of each individual contribution through the observable improvement at each stage. If we use, for instance, an HI accuracy evaluation metric S, such as the one expressed by Equation (16) [34], we can obtain more information on the accuracy of the learning algorithm by studying the dispersion of the predicted HI values, where E is an estimation error calculated according to Equation (17).
This accuracy formula was designed to track errors in late and early predictions, because late predictions are harmful while early ones place more demand on maintenance resources. It also reflects the concentration of scores towards the value 1: the higher the concentration, the higher the accuracy.
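Equations (16) and (17) follow [34] and are not reproduced in this text. Purely as an illustration of this kind of asymmetric score, where a perfect prediction scores 1 and late errors are penalized more harshly than early ones, one common exponential form can be sketched as follows; the decay constants are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Illustration only: a per-sample score that equals 1 for a perfect
# prediction and decays faster for late predictions (E < 0) than for
# early ones (E > 0). The constants 20 and 5 are assumed, not from [34].
def asymmetric_scores(hi_true, hi_pred, early_tau=20.0, late_tau=5.0):
    E = hi_true - hi_pred                                 # estimation error
    return np.where(E >= 0,
                    np.exp(-np.log(2) * E / early_tau),   # early: mild penalty
                    np.exp( np.log(2) * E / late_tau))    # late: harsh penalty

s = asymmetric_scores(np.array([50.0, 50.0, 50.0]),
                      np.array([50.0, 45.0, 55.0]))
print(s)  # exact prediction scores 1.0; early (E = 5) beats late (E = -5)
```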
For instance, Figure 13 shows that all of the designed predictors tend to be early estimators of HI. However, the designed algorithm (TLC-LSTM) achieves a better approximation and a better accuracy, showing more concentration towards the value 1 than the other algorithms.
In terms of HI prediction, and given that available works in the literature use different assessment parameters, HI representations, and data divisions, it is clearly difficult to gather the information needed for a direct comparison (i.e., in a single table). However, if we consider the regularity of the HI curve fit as a criterion, our proposal offers a higher approximation quality than recent works such as [15,16,35,36] on the WTHS data. Concerning the IMS data, previous works such as [9,12-14,20,37] mainly dealt with the HS classification problem, whereas our study proposes HI approximation for the IMS data, which can be considered a new contribution to RUL prediction. In addition, the proposed model demonstrates the robustness of knowledge integration, specifically when no data processing, such as correlation or monotonicity analysis, is applied after feature extraction.

HS Prediction Results
Due to the nature of vibration signals, which resemble consecutively driven sequences in time, the best representation of the HI classes is an ordinal encoding. Therefore, the groups resulting from the GMM were re-sorted and represented with consecutive integers. After that, we followed the same procedure as in the previous regression problem to feed the learning samples to the learning models. As a result, the area under the curve (AUC) of the receiver operating characteristic (ROC) curves in Figure 14 illustrates the classification capacity of each algorithm. The benefits of each contribution are again observable in the classification process, and our proposed algorithm obtains the largest AUC, which explains its strong ability to separate HSs. The classification rates in Table 2 also confirm the ability of the new knowledge-driven TLC-LSTM method in health threshold detection, showing higher classification accuracy during the testing phase on unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and clearer, as they address both HI and HS at the same time. Accordingly, we do not rely on a single numerical measure of accuracy: this work uses several metrics (RMSE, accuracy, classification accuracy, and the Silhouette coefficient) together with graphical interpretations (AUC and ROC curves), which strongly supports the ability of the learning algorithm. Additionally, our work reuses the same extracted features whose extraction was considered one of the main contributions of previous works.
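The re-sorting step can be sketched as follows, assuming the GMM fit has already produced raw (unordered) cluster ids; renumbering the clusters by their mean HI, healthiest state first, is one plausible reading of the described procedure.

```python
import numpy as np

# Sketch of the ordinal re-encoding step: GMM cluster ids carry no order,
# so clusters are re-sorted by their mean HI (healthiest first) and
# renumbered 0, 1, 2, ... The GMM fit itself is assumed already done;
# `labels` stands in for its raw output.
def ordinal_encode(gmm_labels, hi_values):
    clusters = np.unique(gmm_labels)
    mean_hi = [hi_values[gmm_labels == c].mean() for c in clusters]
    order = clusters[np.argsort(mean_hi)[::-1]]    # healthiest cluster first
    remap = {c: rank for rank, c in enumerate(order)}
    return np.array([remap[c] for c in gmm_labels])

labels = np.array([2, 2, 0, 0, 1, 1])              # arbitrary GMM ids
hi     = np.array([0.9, 0.8, 0.5, 0.4, 0.1, 0.2])  # degrading HI over time
print(ordinal_encode(labels, hi))                  # → [0 0 1 1 2 2]
```

With this encoding, consecutive integers follow the degradation order, which is what allows the classifier outputs to be read as consecutive health stages.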
Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained from the acoustic signals used in [38], a further improved feature space can be obtained, ultimately leading to increased accuracy. Table 3 shows that our followed path is the only one among the compared methods that can provide an optimal conclusion on the RUL.

Figure 14. Area under the curve (AUC) of the receiver operating characteristic (ROC) curves of the HI classifiers.
Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accuracy during the testing phase for the unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accuracy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algorithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our followed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data.  [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. 
In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data.  [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. 
Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.   Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accuracy during the testing phase for the unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accuracy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algorithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our followed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data. 
 [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data.  [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. 
Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.   Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accuracy during the testing phase for the unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accuracy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algorithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our followed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data. 
 [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating charac-teristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accu-racy during the testing phase for the unseen data.  [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accu-racy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algo-rithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. 
Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our fol-lowed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating characteristic (AUC-ROC) curves. Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accuracy during the testing phase for the unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accuracy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algorithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our followed path is the only one that could provide an optimal conclusion on the RUL among the compared methods.  Figure 14. HI classifiers ofarea under the probability curve beneath the receiver operating characteristic (AUC-ROC) curves. 
Table 2 of classification rates also explains the ability of the new knowledge-driven TLC-LSTM method in health threshold detection by showing more classification accuracy during the testing phase for the unseen data. Compared to recent data-driven methods, such as [9,10,14,37], our contributions are more general and even clearer as they were able to study both HI and HS at the same time. Therefore, we do not consider numerical proof only as a single parameter of accuracy, for example. In fact, in this work, we have used many metrics (RMSE, accuracy, classification accuracy, and Silhouette coefficient), including graphical interpretations (AUC and ROC), which is strong evidence that supports the ability of the learning algorithm. Additionally,our work also used the same extracted features as an additional contribution, which was considered as one the main contributions of previous works. Moreover, if more additive characteristics are considered for bearing HS splitting, such as those obtained with acoustic signals used in [38], one can find further improved feature space results that ultimately lead to an increased accuracy. Table 3 shows that our followed path is the only one that could provide an optimal conclusion on the RUL among the compared methods. Generally speaking, with respect to the current knowledge-based framework, the present study proposed a design to solve the problem of incomplete data in the absence of true labels for one-dimensional bearing characteristics. A set of training samples was already obtained from a real life test (WTHS) where further training samples were obtained from an accelerated life test (IMS). The purpose of this hybrid data selection was to collect important patents from real experiences and use them to transfer knowledge to fill the gaps produced by the lake of patterns. 
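The evaluation metrics discussed above (RMSE, classification accuracy, AUC-ROC, and the Silhouette coefficient) can all be computed with standard tooling. The following is a minimal sketch using scikit-learn on synthetic stand-in data; all arrays here are hypothetical placeholders, not the paper's actual HI/HS outputs:

```python
import numpy as np
from sklearn.metrics import (mean_squared_error, accuracy_score,
                             roc_auc_score, silhouette_score)

rng = np.random.default_rng(0)

# Hypothetical stand-ins: true vs. predicted health indicator (HI)
# values, and binary health-state (HS) labels derived from them.
hi_true = rng.random(100)
hi_pred = hi_true + rng.normal(0.0, 0.05, 100)
hs_true = (hi_true > 0.5).astype(int)
hs_score = hi_pred                        # classifier score for ROC/AUC
hs_pred = (hs_score > 0.5).astype(int)

rmse = mean_squared_error(hi_true, hi_pred) ** 0.5  # HI regression error
acc = accuracy_score(hs_true, hs_pred)              # HS classification
auc = roc_auc_score(hs_true, hs_score)              # area under ROC

# Silhouette: how well the HS classes separate in a feature space.
feats = np.column_stack([hi_true, hi_pred])
sil = silhouette_score(feats, hs_true)
```

Reporting several such metrics together, as done in this study, guards against a model that looks strong under one criterion but weak under another.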
In addition, the learning process was managed in reverse so as to produce additional generalization in cases of insufficient samples due to the short recording period (6 s per day).
Bearing measurements have a particular sequential nature: they form a time series that is delivered piece by piece. Therefore, the proposed methodology can clearly be applied in other areas involving any type of time-series data with an incomplete set of patterns, as well as with missing labels.
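Because such measurements arrive piece by piece, a sliding-window framing is the usual first step before feature extraction or autoencoding. A minimal sketch follows; the window width, step, and degradation signal are hypothetical choices for illustration only:

```python
import numpy as np

def make_windows(signal, width, step):
    """Split a 1-D degradation signal into overlapping windows,
    mirroring the piece-by-piece form in which bearing
    measurements are typically recorded."""
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])

# Hypothetical monotonically degrading signal with additive noise.
t = np.linspace(0.0, 1.0, 1000)
signal = t ** 2 + 0.01 * np.random.default_rng(1).normal(size=t.size)

windows = make_windows(signal, width=64, step=32)
print(windows.shape)  # (30, 64): 30 windows of 64 samples each
```

Each window can then be passed to a feature extractor or an autoencoder, so the same pipeline transfers directly to other piecewise time-series domains.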

Conclusions
In this work, a new multi-source, knowledge-based approach was proposed for bearing degradation prognosis. This new scheme combined a stack of series-connected autoencoders specially designed to integrate labels as a source of knowledge. These autoencoders were used to feed data-driven algorithms that integrate a C-LSTM for RUL prediction. In addition, this hybridization transferred knowledge from a source domain to a specific target domain via TL procedures. This knowledge-driven approach was tested on two different datasets, namely, IMS and WTHS. Unlike previous works, which focused on only one of the themes, whether HI or HS prediction, in a single benchmark, our work studied both in the same experiments. The prediction results were assessed through numerous measures, including numerical and graphical interpretations. The evaluation process proved the strength and credibility of the designed algorithm, even when compared to a set of recent works. Regarding future work, and in an attempt to reduce algorithmic complexity, one possible direction will be the integration of label seeding inside the C-LSTM itself, rather than the current feature mapping. In addition, one can investigate other approaches that make label mining even easier and increase retrieval capacity by testing other types of feature reconstruction, such as compressed sensing. Additionally, neuron pruning techniques, such as sparse coding, dropout, and contractive autoencoding, could be involved to keep only important, meaningful descriptions from the learning samples.
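To illustrate the denoising principle underlying the stacked autoencoders summarized above, the following is a toy single-layer sketch in NumPy: the input is corrupted with noise and the network is trained to reconstruct the clean version. The dimensions, learning rate, and data are hypothetical, and the sketch omits the stacking and label-integration of the actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean features and their noise-corrupted copies.
X = rng.random((200, 16))
X_noisy = X + 0.1 * rng.normal(size=X.shape)

d, h = 16, 8                               # input / hidden sizes
W1 = rng.normal(0, 0.1, (d, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, d)); b2 = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(500):
    H = sigmoid(X_noisy @ W1 + b1)         # encode the corrupted input
    X_hat = H @ W2 + b2                    # linear decoder
    err = X_hat - X                        # target is the CLEAN input
    # Backpropagation of the squared reconstruction loss.
    gW2 = H.T @ err / len(X); gb2 = err.mean(0)
    dH = err @ W2.T * H * (1.0 - H)
    gW1 = X_noisy.T @ dH / len(X); gb1 = dH.mean(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

loss = ((X_hat - X) ** 2).mean()           # final reconstruction error
```

Training against the clean target is what makes the learned code robust to perturbed, unseen inputs; the paper's design additionally injects label information into this reconstruction, which this toy sketch does not attempt.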