Given the complexities and associated uncertainty of the fault diagnostic and prognostic problem, the proposed methodology should be flexible enough to incorporate new sets of information as they become available. Expert opinion, black swan events, abnormal operating conditions, knowledge of the underlying failure modes, physics-of-failure models, and partially relevant information can all be included within the remaining useful life estimation. While this information can be valuable, the methodology should also adequately generalize from these data; for example, relevant features, even when known, may not account for noisy sensor signals or operating conditions outside the norm. To this end, we propose the methodology shown in Figure 4.
The methodology has two distinct phases: (1) unsupervised learning assessment of RUL; (2) semi-supervised learning assessment of RUL. It starts with the raw data signal fed into the unsupervised variational adversarial filter. Without knowledge of labeling (e.g., the system health states) at the start of operation of the system, this stage of development requires the use of unsupervised remaining useful life estimation. Once the system has accumulated operational time, the engineer can start labeling data in a semi-supervised iterative loop, i.e., identifying the system's health states that correspond to input sensor data patterns. As it may not be feasible (time- and cost-wise) to do so for all the available data, experiments have shown that semi-supervised methodologies with only a few percent of the data set labeled can substantially improve on unsupervised methods [30]. Therefore, as the engineer labels data, the framework is robust enough to handle this percentage of labeled data, as shall be demonstrated in Section 4.
3.1. Unsupervised Remaining Useful Life Formulation
In this work, we propose a mathematical formulation that encapsulates the following features: both unsupervised and semi-supervised feature learning; adversarial-variational state-space modeling with non-Markovian transitions (i.e., it is not assumed that all information regarding past observations is contained within the last system state); an adversarial training mechanism for the training of the recognition model; and variational Bayes for the inference and generative models. As shown in Figure 5 and Figure 6, we set $x$ as the observed sensor data, $z$ as the latent system health state (e.g., crack length, degradation), and $y$ as the target domain relevant to the adversarial training. Blue lines represent the adversarial mechanism, dashed lines indicate inference processes, and solid lines indicate a generative process. The transition parameters, $\theta_t$, are inferred via a neural network. Past observations are directly included in the inferential model output. The proposed mathematical framework does not assume that all the information relevant to the parameters $\theta_t$ is encoded within the last system state.
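The conditioning of the inference model on the full observation history, rather than on the current observation alone, can be illustrated with a toy recurrent recognition network. This is only a sketch: the weights are random placeholders, the dimensions are hypothetical, and it is not the trained architecture used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_recognition(x_seq, n_z=2, n_h=8):
    """Toy recurrent recognition model: the posterior mean for z_t depends
    on the whole observation history through a hidden state h_t, rather
    than on x_t alone (non-Markovian conditioning). Weights are random
    placeholders, not trained parameters."""
    n_x = x_seq.shape[1]
    W_h = rng.normal(scale=0.1, size=(n_h, n_h))
    W_x = rng.normal(scale=0.1, size=(n_h, n_x))
    W_mu = rng.normal(scale=0.1, size=(n_z, n_h))
    h = np.zeros(n_h)
    mus = []
    for x_t in x_seq:
        h = np.tanh(W_h @ h + W_x @ x_t)  # h_t summarizes x_{1:t}
        mus.append(W_mu @ h)              # posterior mean for z_t
    return np.array(mus)

# 50 time steps of 3-dimensional sensor data (synthetic).
mu_z = rnn_recognition(rng.normal(size=(50, 3)))
```

Because $h_t$ is a function of $h_{t-1}$ and $x_t$, each posterior mean implicitly depends on all observations up to time $t$.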
To establish the training optimization, we denote the latent sequence as $z_{1:T} = (z_1, \ldots, z_T)$, $z_t \in \mathbb{R}^{n_z}$, and the observations as $x_{1:T} = (x_1, \ldots, x_T)$, $x_t \in \mathbb{R}^{n_x}$. Now $x_{1:T}$ can be, but is not limited to, a multi-dimensional sensor data set from a large asset. The observations, $x_{1:T}$, are not constrained to a Markovian transition assumption. For engineering problems (e.g., crack growth and environmental effects on RUL), these transitions can be complex and non-Markovian. Therefore, the degradation sequences generated by the discrete multi-dimensional sensor data sequences $x_{1:T}$ and latent sequences $z_{1:T}$ are of interest to the engineer. This is shown in Equation (4):

$$p_\theta(x_{1:T}) = \int p_\theta(x_{1:T} \mid z_{1:T})\, p_\theta(z_{1:T})\, dz_{1:T}, \tag{4}$$

where $z_{1:T}$ denotes the latent sequence. The basis of the latent dynamical system is assumed to have an emission model $p_\theta(x_t \mid z_{1:t}, x_{1:t-1})$ and a transition model $p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})$. Two assumptions are classically imposed on the emission and transition models, as shown in Equations (5) and (6):

$$p(x_t \mid z_{1:t}, x_{1:t-1}) = p(x_t \mid z_t), \tag{5}$$

$$p(z_{t+1} \mid z_{1:t}, x_{1:t}) = p(z_{t+1} \mid z_t). \tag{6}$$
These equations capture the assumption that the current state, $z_t$, holds complete information for the observations, $x_t$, and the subsequent state, $z_{t+1}$. For noisy multidimensional sensor data sets with complex non-Markovian transitions, this assumption is insufficient. The proposed mathematical formulation characterizes the state-space model without these assumptions.
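The difference between the two transition structures can be made concrete with a small simulation. The dynamics below are purely illustrative (the coefficients and the history-interaction term are hypothetical, loosely mimicking retardation effects in crack growth), not a model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100

def markov_step(z_t):
    """Markovian transition: the next state depends only on the current state."""
    return 0.95 * z_t + rng.normal(scale=0.01)

def non_markov_step(z_hist):
    """Non-Markovian transition: the next state also depends on the whole
    history z_{1:t} (hypothetical load-history interaction term)."""
    history_effect = 0.05 * np.mean(z_hist)  # uses z_{1:t}, not just z_t
    return z_hist[-1] + history_effect + rng.normal(scale=0.01)

z_m, z_nm = [1.0], [1.0]
for _ in range(T - 1):
    z_m.append(markov_step(z_m[-1]))
    z_nm.append(non_markov_step(np.array(z_nm)))
```

In the Markovian case, resetting $z_t$ erases all influence of the past; in the non-Markovian case, the accumulated history continues to shape every future step, which is the behavior the proposed formulation is designed to capture.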
Therefore, to derive the mathematical framework of the proposed methodology, we first put forward the variational lower bound objective function from Equation (4), given that we do not make the Markov assumptions from Equations (5) and (6). Thus, we have:

$$\log p_\theta(x_{1:T}) \ge \mathbb{E}_{q_\phi(z_{1:T} \mid x_{1:T})}\left[\log \frac{p_\theta(x_{1:T}, z_{1:T})}{q_\phi(z_{1:T} \mid x_{1:T})}\right], \tag{7}$$

with the generative and inference models factorized without the Markov assumption as

$$p_\theta(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\, p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1}), \tag{8}$$

$$q_\phi(z_{1:T} \mid x_{1:T}) = \prod_{t=1}^{T} q_\phi(z_t \mid z_{1:t-1}, x_{1:T}). \tag{9}$$
Substituting Equations (8) and (9) into Equation (7), we get:

$$\log p_\theta(x_{1:T}) \ge \mathbb{E}_{q_\phi}\left[\log \frac{\prod_{t=1}^{T} p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\, p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})}{\prod_{t=1}^{T} q_\phi(z_t \mid z_{1:t-1}, x_{1:T})}\right]. \tag{10}$$
Applying the product rule on (10), we have:

$$\log p_\theta(x_{1:T}) \ge \mathbb{E}_{q_\phi}\left[\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{1:t-1}, z_{1:t}) + \sum_{t=1}^{T} \log \frac{p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})}{q_\phi(z_t \mid z_{1:t-1}, x_{1:T})}\right]. \tag{11}$$
Applying the quotient rule on (11), we have:

$$\log p_\theta(x_{1:T}) \ge \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\log p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\right] - \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\log \frac{q_\phi(z_t \mid z_{1:t-1}, x_{1:T})}{p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})}\right]. \tag{12}$$
Therefore, we have:

$$\mathcal{L}_{\mathrm{rec}} = \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\log p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\right], \tag{13}$$

$$\mathcal{L}_{\mathrm{KL}} = \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\mathrm{KL}\left(q_\phi(z_t \mid z_{1:t-1}, x_{1:T}) \,\|\, p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})\right)\right], \tag{14}$$

$$\log p_\theta(x_{1:T}) \ge \mathcal{L}_{\mathrm{rec}} - \mathcal{L}_{\mathrm{KL}}, \tag{15}$$
where we simultaneously want to minimize the Kullback–Leibler (KL) divergence and maximize the variational (evidence) lower bound (ELBO), $\mathcal{L}(\theta, \phi; x_{1:T})$, as shown in Equation (16):

$$\log p_\theta(x_{1:T}) = \mathcal{L}(\theta, \phi; x_{1:T}) + \mathrm{KL}\left(q_\phi(z_{1:T} \mid x_{1:T}) \,\|\, p_\theta(z_{1:T} \mid x_{1:T})\right). \tag{16}$$
Now, rearranging Equation (16), we have the non-Markovian variational lower bound derived for time series data in Equation (17):

$$\mathcal{L}(\theta, \phi; x_{1:T}) = \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\log p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\right] - \sum_{t=1}^{T} \mathbb{E}_{q_\phi}\left[\mathrm{KL}\left(q_\phi(z_t \mid z_{1:t-1}, x_{1:T}) \,\|\, p_\theta(z_t \mid z_{1:t-1}, x_{1:t-1})\right)\right]. \tag{17}$$
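For a single time step with diagonal Gaussian posterior, prior, and emission models, both terms of such a bound have closed forms. The sketch below uses hypothetical parameter values purely to show the computation of one reconstruction term and one KL term.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var_p / var_q)
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def gaussian_log_lik(x, mu, var):
    """log N(x; mu, var) for a diagonal Gaussian emission model."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# One time step t of the bound: reconstruction term minus KL term,
# with hypothetical posterior/prior/emission parameters.
x_t = np.array([0.2, -0.1])
recon = gaussian_log_lik(x_t, mu=np.zeros(2), var=np.ones(2))
kl = gaussian_kl(mu_q=np.array([0.1, 0.0]), var_q=np.array([0.5, 0.5]),
                 mu_p=np.zeros(2), var_p=np.ones(2))
elbo_t = recon - kl
```

Summing such per-step terms over $t = 1, \ldots, T$ yields the time-series bound; in practice the expectations are estimated with reparameterized samples rather than evaluated in closed form.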
To add in adversarial training, we follow Goodfellow et al. [28] and rewrite the optimization function from Equation (17) to Equation (18) as follows:

$$\min_{\theta, \phi}\, \max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log\left(1 - D(G(z))\right)\right] - \mathcal{L}(\theta, \phi; x_{1:T}), \tag{18}$$

where $D$ is the discriminator and $G$ the generative model. We now have an objective function which gives us an expressive recognition model, $q_\phi(z_{1:T} \mid x_{1:T})$; that is, we have a mathematical framework that characterizes the state-space model without the restrictive assumptions outlined in Equations (5) and (6). Additionally, this mathematical framework contains both the generative and inference models of the system state, which allows us to perform fault diagnostics and prognostics as well as to assess the RUL of the system.
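The adversarial part of such an objective can be evaluated empirically from discriminator outputs. The toy values below are hypothetical and only illustrate the minimax value function of Goodfellow et al. [28], without the variational term.

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Empirical value of the minimax objective:
    E[log D(x_real)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Hypothetical discriminator outputs (probabilities of "real").
d_real = np.array([0.9, 0.8, 0.95])  # on real sensor windows
d_fake = np.array([0.1, 0.2, 0.05])  # on generated windows
v = gan_value(d_real, d_fake)
```

A confident discriminator drives this value toward its maximum of 0, while a maximally confused one (all outputs 0.5) yields $2\log 0.5 \approx -1.39$; the generator is trained to push the value back down.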
3.2. Semi-Supervised Loss Function
Semi-supervised initialization involves training the chosen model architecture with an incrementally increasing set of labeled data. This is an important aspect to explore because, as the engineer gains more knowledge about a new system, one can label small sets of data known to correspond to system degradation versus healthy operation, improving the prediction of the system's health state [29]. This approach can improve the quality of the results via a semi-supervised loss function, $L$, given by Equation (19):

$$L = L_{\mathrm{supervised}} + L_{\mathrm{unsupervised}}. \tag{19}$$
In the context of the proposed adversarial framework, during the unsupervised training the discriminator learns features to avoid classifying the generated data as real data, but these features might not be the best representation. To improve the discriminator and develop more meaningful features for the system's health states over time, labels are used. This is possible by writing the loss function, $L$, within training up to some predetermined number of epochs, as follows:

$$L_{\mathrm{supervised}} = -\mathbb{E}_{x, y \sim p_{\mathrm{data}}(x, y)}\left[\log p_{\mathrm{model}}(y \mid x, y < K + 1)\right],$$

$$L_{\mathrm{unsupervised}} = -\left\{\mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \log\left[1 - p_{\mathrm{model}}(y = K + 1 \mid x)\right] + \mathbb{E}_{x \sim G} \log p_{\mathrm{model}}(y = K + 1 \mid x)\right\},$$
where $x$ and $y$ are the same as defined previously, and $p_{\mathrm{model}}$ corresponds to the trained model. This cost function adds a cross-entropy loss for the first $K$ discriminator outputs. The unsupervised cost is the same as in the original GAN (see Equation (2)); however, there is a slight change, as now $K + 1$ corresponds to the probability of the sample being false [31]. In the context of the proposed mathematical framework, the discriminator is thus used as both a competent classifier and a feature extractor, given a subset of the dataset, to improve the system's health state identification results.
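A minimal sketch of this $(K+1)$-class discriminator loss is given below. The shapes, class layout, and random inputs are hypothetical; the point is the split between a cross-entropy term over the first $K$ outputs (labeled data) and a real-versus-fake term on output $K+1$ (unlabeled and generated data).

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def semi_supervised_loss(logits_labeled, y, logits_unlabeled, logits_generated, K):
    """Sketch of a (K+1)-class discriminator loss: classes 0..K-1 are health
    states, class K (index K) is 'fake'. Logits have shape (N, K+1)."""
    # Supervised: cross-entropy over the first K outputs, renormalized
    # to condition on y < K + 1 (i.e., the sample being real).
    p = softmax(logits_labeled)[:, :K]
    p = p / p.sum(axis=1, keepdims=True)
    l_sup = -np.mean(np.log(p[np.arange(len(y)), y]))
    # Unsupervised: real data should avoid class K; generated data should hit it.
    p_fake_real = softmax(logits_unlabeled)[:, K]
    p_fake_gen = softmax(logits_generated)[:, K]
    l_unsup = -np.mean(np.log(1.0 - p_fake_real)) - np.mean(np.log(p_fake_gen))
    return l_sup + l_unsup

K = 3
rng = np.random.default_rng(1)
loss = semi_supervised_loss(rng.normal(size=(4, K + 1)),   # few labeled samples
                            np.array([0, 1, 2, 1]),        # their health-state labels
                            rng.normal(size=(5, K + 1)),   # unlabeled real data
                            rng.normal(size=(5, K + 1)), K)  # generated data
```

Only the supervised term requires labels, which is why even a small labeled subset can sharpen the features the discriminator extracts for health-state identification.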