Multistate Diagnosis and Prognosis of Lubricating Oil Degradation Using Sticky Hierarchical Dirichlet Process–Hidden Markov Model Framework

Lubricating Oil Diagnostics and Prognostics. Abstract: In this study, we present a state-based diagnostic and prognostic methodology for lubricating oil degradation based on a nonparametric Bayesian approach, i.e., sticky hierarchical Dirichlet process–hidden Markov model (HDP-HMM). An accurate health state-space assessment for diagnostics and prognostics has always been unobservable and hypothetical in the past. The lubrication condition monitoring (LCM) data is generally segregated as “healthy or unhealthy”, representing a binary state-based perspective to the problem. This two-state performance-based formulation poses limitations to the precision and accuracy of the diagnosis and prognosis for real data wherein there may be multiple states of discrete performance that are characteristic of the system functionality. In particular, the reversible and nonlinear time-series trends of degradation data increase the complexity of state-based modeling. We propose a multistate diagnostic and prognostic framework for LCM data in the wear-out phase (i.e., the unhealthy portion of degradation data), accounting for irregular oil replenishment and oil change effects (i.e., nonlinearity in the degradation signal). The LCM data is simulated for an elementary mechanical system with four components. The sticky HDP sets the prior for the HMM parameters. The unsupervised learning over infinite observations and emission reveals four discrete health states and helps estimate the associated state transition probabilities. The inferred state sequence provides information relating to the state dynamics, which provides further guidance to maintenance decision making. The decision making is further backed by prognostics based on the conditional reliability function and mean residual life estimation.


Introduction
The use of statistical models is on the rise for condition-based maintenance (CBM) over system-specific models. Lubricating oil degradation is a slow process that involves irregular oil replenishment and oil change.

Lubricating Condition Monitoring (LCM)
The degradation data from oil wear condition monitoring (CM) can provide information on the health status of both the oil and the machine itself. In the literature, wear-based health status identification has been widely explored for several industrial applications. To assess machine health status, LCM is primarily based on a subjective evaluation of wear debris quantity, size, morphology, and chemical structure that requires specific domain expertise, thereby limiting the utility and application of LCM [1]. In addition, the wear debris concentration (WDC) in lubricating oil is affected by oil changes and replenishments, resulting in negative spikes in the degradation signal [2].
Various researchers have harnessed the importance of wear state identification for effective diagnosis, prognosis, and health management. One of the studies by Wu et al. [3] proposed a wear state identification model considering wear states as degradation features. Two states, i.e., normal and abnormal wear, were classified using a binary-class method, and a support vector data description method was used to study the evolution of these states. Cao et al. [4] also proposed a model to identify wear states using multiple sensor information fusion with the binary class method for the normal and abnormal states. The wear state identification approach of Peng et al. [5] combined wear rate and wear mechanism using the mean shift algorithm. The sample data were clustered, and a chart was prepared for wear rate and wear mechanisms to define and classify the traits of each wear state. This wear state identification method was further extended by considering the severity of wear and the wear rate and wear mechanism by Wang et al. [6].
An adaptive mean shift algorithm with a binary class method was then proposed to model wear-state evolution. Run-in, quasi-stationary, and breakdown phases were considered for wear data quantification and segregation based on the debris particle size. The quasi-stationary phase provided information on the dominant wear mode, whereas wear particle quantity governed the transition from the quasi-stationary phase to the breakdown phase [7]. In another attempt, three operational states based on flow in dual filters were proposed by Henneberg et al. [8]. Mohamed et al. [9] predicted the remaining useful life of lubricating oil, considering the first passage time approach for the degradation trajectory and a continuous-time Markov chain for the dynamic operating environment. The study also assumed that the number of operating states is known. The work of the researchers above highlights the limitations of the existing approaches: a predefined state space, an inability to accurately estimate the state space, binary classification models that force the system to dwell in only two states, the nonlinearity of the degradation signal, which complicates the analysis, and the necessity for data segregation/clustering with the mean shift algorithm for any state classification. The "system-based" wear state identification models may provide effective diagnosis, but they cannot be used for prognostics and health management. Given that the wear state cannot be directly observed, the wear model may not accurately identify the health status in the context of LCM.
The use of data-driven state-space models provides an effective solution to this problem. For example, the state-space model of oil degradation data traverses through a discrete set of states with a certain set of transition probabilities. Thus, the oil degradation data provides an observation sequence for an unobservable stochastic process or hidden states.

Diagnostics and Prognostics Using Hidden Markov Model (HMM)
The hidden Markov model (HMM) has been applied to forecast health status based on observations in Refs. [10][11][12][13]. HMM has also been a vital modeling tool in the fields of bioinformatics [14], speech [15], image [16], and handwriting recognition [17], as well as financial time series prediction [18]. Scholars have lately explored HMM for its diagnostic and prognostic (DP) capabilities as well. In an early attempt at HMM for diagnostics, Smyth et al. [19] introduced HMM as part of a hybrid model with a neural network providing prior temporal correlations to develop a probabilistic framework for antenna health diagnosis. The Markov model's ability to capture probabilistic changes in the degradation parameters was then realized by Monplaisir and Arumugadasan in Ref. [20]. They used a discrete-time Markov chain to estimate the probable transitions through seven degradation states to assess the lubricant condition to support maintenance decision making. Given the apparent gain in using Markov-based modeling for degrading systems, researchers started exploring the tool for mechanical system DP. Figure 1 shows the count of research articles published relating to HMM-based DP of mechanical components, namely, bearings, rotating shafts, drill bits, wind turbines, turbofans, boilers, etc. Two pertinent review works are included in this count. The first review by Li et al. [21] on rotating machinery prognostics discussed the Kalman filter, particle filter, stochastic filter, artificial neural network, HMM, etc. In addition, the authors briefly reviewed HMM with its computational limitations and application to remaining useful life (RUL) prediction. The second review by Mor et al. [22] discussed HMM in the context of the systematic evaluation of HMM variants and their applications over the years. However, the review does not deal with any studies on applications of HMM in the field of diagnosis and prognosis.
There has been increasing interest in applying HMM to fault diagnosis and failure prognosis over the past decade [22]. However, HMM for lubricating oil degradation modeling has not yet been fully explored. The recent studies by Du et al. [23] and Zhu et al. [24] on lubricating oil RUL prediction used wear debris level as the lubricating oil degradation feature and provided a physical model for the relationship between oil degradation and particle contamination. A vector autoregressive model [23] was considered as an observation process to generate input for the three-state HMM. The authors of that study focused on a healthy portion of the data and classified the states as healthy, unhealthy, and failed. This experience- and assumption-based discrete state classification may not provide adequate maintenance decision support in a practical scenario, more so for relatively new technologies with limited data history. A detailed review by Wakiru et al. [25] listed the various diagnostic and prognostic approaches for lubricating oil condition monitoring, but again, it cited few studies on the use of HMM. Khaleghai and Makis [26] presented a prognostic model for dependent failure modes of a system. The authors considered a standard three-state HMM with the Marshall-Olkin bivariate exponential distribution for the joint distribution of the dependent failure modes. Their model, though quite powerful, cannot track low degradation level increments. A review of the data-driven approaches by Si et al. [27] can be referred to for work relating to HMM for prognostics. The authors also reviewed the different variants of HMM, including the hidden semi-Markov model (HSMM), hierarchical hidden Markov model (HHMM), factorial hidden Markov model (FHMM), etc., for numerous applications. However, their work did not address HMM's application in lubricating oil diagnosis and prognosis. A recent work by Kim et al. [28] also recommended using the healthy portion of the oil data for the nonstationary time series, owing to the unavailability of optimal segmentation techniques. There exists a research gap on optimal state-space identification in the context of fault diagnosis and failure prognosis. Based on existing research explorations, the limitations of HMM-based lubricating oil diagnosis and prognosis may be listed as follows:

1. The state space is prespecified based on experience, assumption, or data segmentation;
2. There is no possible update to the state space based on the trend of the degradation data;
3. The number of parameters is limited;
4. The analysis is based purely on the healthy portion of the oil data with minimal inherent nonlinearity.
Though advanced variants of HMM have been widely explored in the fields of speech [15] and handwriting recognition [17], music recognition and classification [29], profile analysis in structural biology [30], etc., studies that focus on their application to diagnostic and prognostic case studies are still lacking. In this study, we propose a state-based lubricating oil diagnosis and prognosis using the hierarchical Dirichlet process (HDP)-HMM to overcome the limitations of the standard HMM. HDP-HMM is a nonparametric Bayesian approach that solves the identification of the number of states, state updating, and the limitation on the number of parameters. Our work here provides a practical LCM approach, compared to other nonparametric Bayesian methods, i.e., multi-output Gaussian process regression (MO-GPR), which we have explored earlier in Ref. [31]. Our previous work based on the MO-GPR method to predict lubricating oil RUL faced the following limitations:
• RUL prediction requires one or more historical degradation time-series patterns;
• Nonlinearity and nonmonotonicity of degradation trends affect the RUL prediction accuracy;
• Degradation states and state evolution trends cannot be extracted and estimated.
The HDP-HMM overcomes all of the above limitations by estimating the degradation states even for nonlinear trends without the availability of any historical data sets. In machine learning, HDP is a nonparametric Bayesian method to cluster data, which uses the Dirichlet process for clustering. The Dirichlet process is a generalization of the Dirichlet distribution, which is defined as a distribution over distributions of N outcomes. Therefore, any draw from a Dirichlet distribution is a probability distribution in itself. The Dirichlet process also follows the same characteristic, except that the number of outcomes, N, can reach infinity.
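As a small illustration of the statement that a draw from a Dirichlet distribution is itself a probability distribution, the following Python sketch draws one sample over N = 4 outcomes by normalizing Gamma variates. The helper name `sample_dirichlet`, the symmetric concentration values, and the seed are assumptions made for this example only and are not taken from the study.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw one sample from Dirichlet(concentrations) by normalizing Gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)  # fixed seed (assumption) for repeatability
# Symmetric concentrations over N = 4 outcomes (illustrative values only).
draw = sample_dirichlet([1.0, 1.0, 1.0, 1.0], rng)

# The draw is a vector of nonnegative weights that sums to one.
print(draw, sum(draw))
```

Each call returns a different probability vector, which is the sense in which the Dirichlet distribution is a distribution over distributions.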
Initially, a speaker diarization problem was solved by Fox et al. using HDP-HMM [32]. The authors used a sticky HDP-HMM to segment the audio recording without prior information on the number of speakers. The sticky HDP-HMM resolved the over-segmentation problem posed by the standard HDP-HMM. The authors in Ref. [32] further developed the blocked Gibbs sampler to estimate the state sequence considering a nonparametric emission distribution. The emission or output probability for the hidden states, $Z_t$, is defined as $P(y_t \in A \mid Z_t = z_t)$, where $y_t$ is the observation at time $t$. The latest work by Sun and Zhang [33] discusses bearing health status identification using HDP-HMM. The HDP-HMM is improved by replacing the ergodic topology, in which a state can transition to all other states, with a left-to-right model in which each state transitions only to itself and to one other unique state.
The proposed work aims at estimating the hidden number of states for a given oil degradation trajectory using sticky HDP-HMM. It uses HDP-HMM to model unknown state space learned from degradation data. A range of hyperparameters is then set to train the model with a truncated approximation for HDP. The model then estimates the optimal number of hidden states for the given oil degradation trajectory. The number of states will vary according to the observed trajectory.
The wear debris concentration (WDC) in lubricating oil provides a vital degradation feature that indicates the lube oil and system health status. Here, we simulate realistic traces of the degradation signal based on WDC by utilizing the wear model proposed by Fan et al. [34]. The relation of wear rate with wear debris concentration is defined using a system-based model in their work. The authors [34] simulated WDC for the severe wear rate phase of the operating system. The severe wear rate (wear-out) phase of the operating system generates more wear debris and will require more oil replenishments and oil changes to avoid breakdown. The other two phases in a typical lifecycle bathtub curve, i.e., burn-in, and steady-state wear rates see fewer oil replenishments and oil changes, generating an almost linear or monotonic oil degradation trajectory with random noise. The previous work of several researchers [26,32,33,35] has already proven the efficacy of HDP-HMM with different trajectories. The focus of our work is to check the ability of the HDP-HMM model to identify hidden states of lubricating oil degradation with irregular replenishments and oil changes. The criticality of the operating system and lubricating oil degradation in the wear-out phase motivated this study.
This study also introduces degradation drops due to external interference events, namely, oil replenishment and oil change, to the simulated data. The effect of these events is evident in the experimental work by Wu et al. [2]. In addition to oil replenishment and oil change, lubricating oil degradation can be influenced by contamination, operating conditions (air, moisture, water, gas, radiation, etc.), storage conditions, operating temperature and pressure, speed of tribological component, oil selection, change in chemical properties, etc. The proposed method considers lubricating oil contamination, i.e., wear debris, to assess the lubricating oil remaining useful life, and it can be utilized for any system where contamination is continuously monitored, and lubricant is under regular replenishments.
Based on the above discussions and the context of our work here, the subsequent sections address the following aspects. Section 2 presents the simulation methodology for data generation, along with the theory and application of the Dirichlet process (DRP), HMM, and sticky HDP-HMM in the context of the oil wear data. In Section 3, the remaining useful life (RUL) to oil replacement is predicted using the conditional reliability function (CRF) and mean residual life (MRL). Finally, Section 4 summarizes the study and its outcomes and presents some ideas for future work leveraging our proposed framework.

Model Framework and Methodology
This section starts with the simulation model for oil wear degradation phenomena and then introduces the formulation and application of the HMM, DRP, and sticky HDP-HMM using the simulated oil data set that represents the real-life nonidealities due to replenishment and oil change.

Simulation of Degradation Data
The wear of contact surfaces in mechanical components is measured in terms of wear debris concentration (WDC) in lubricating oil. The online monitoring of WDC provides information on the wear rate of the component. WDC and wear rate follow a relationship in the lubrication system. The lubrication system has a cyclic oil flow with continuous removal of wear debris due to oil filtration, oil loss, sedimentation, and debris sticking to the surface. Therefore, the WDC formulation assumes the wear debris production rate to be equal to the wear rate. The evolution of WDC comprises three stages with different wear rates, as described in detail by Fan et al. [34]. Lubricating oil is assumed to be frequently added to sustain the oil level in the lubrication system. Therefore, WDC degradation data need to be simulated, accounting for the effect of frequent oil replenishment and oil change. Our study here considered a mechanical system comprising four basic components: tribological component, oil filter, oil pump, and oil tank. The assumptions that we made to carry out the WDC simulation are listed as follows:

1. Oil replenishment is an external event, and oil consumption is precisely equal to oil replenishment;
2. Wear production rate is equal to wear rate;
3. Wear debris are homogeneously distributed in the oil with negligible mixing time;
4. The system is in the wear-out (abnormal) phase of its life cycle;
5. Oil is changed every time the WDC reaches a threshold of 40 ppm.
Appl. Sci. 2021, 11, 6603 6 of 16
The quantity WDC is defined as $C(t)$, a function of the wear rate, and is given by the following equation based on the model presented in Ref. [34]: where we define $V_r = V_q$, and $V_r$ is the oil replenishment quantity, $V_q$ is the oil consumption quantity, $V_0$ is the initial volume of oil in the tank, $c$ is the wear mode ($c > 0$ for the wear-out phase), $m_s$ is the steady-state wear rate, $W_F(t)$ denotes the white Gaussian noise, and $k$ is the attenuation coefficient. Therefore, the expression for WDC becomes Here, $\beta_x$ is the beta ratio, i.e., filter efficiency, a ratio of the WDC before the filter to the WDC after the filter, $\Psi$ is the wear debris loss factor, and $Q$ is the oil flow through the filter.
The oil replenishment and oil change cause abrupt negative jumps (spikes) in the degradation trajectory, and the WDC trend continuously evolves. The degradation pattern is simulated for the abnormal state, using experimental parameters for the lubrication system given by Fan et al. [34]. The model parameters are $m_s = 25$ mg/min, $V_0 = 40$ L, $m_0 = 0.4975$, $W_F(t) \sim N(0, 1.5)$, $c = 0.028$, $k = 0.0001$, $V_r = V_q = 0$, and $t_1 = 0$ s, and the oil replenishment effects are added as negative ppm values in the model.
The complete life cycle of the system comprises several oil life cycles (OLCs). In this work, we considered four OLCs for the wear-out phase of the oil and the system. As mentioned earlier, Kim et al. [28] studied a healthy portion of the oil data, citing limitations for optimal data segregation. Figure 2 illustrates the complexity of the WDC data generated for the four OLCs, with oil changes occurring every time the WDC reaches the 40-ppm threshold criterion as well as random oil replenishments based on the minimal oil quantity to be maintained in the operating system. One of the simulated WDC time-series traces, with negative jumps due to oil replenishment and oil change, is plotted in Figure 2. The simulated degradation trajectory is used as the input to the state-based Dirichlet process. In the following subsection, we discuss the state-based stochastic model, i.e., HMM.
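The qualitative shape of such a trace (a rising, noisy WDC signal with small negative jumps at random replenishments and a reset at every oil change) can be reproduced with a deliberately simplified toy simulation. The sketch below is not the wear model of Ref. [34]: the constant drift, the noise level, the replenishment probability and drop size, and the function name `simulate_wdc` are all hypothetical placeholders chosen for illustration. Only the 40-ppm oil change threshold is taken from the text.

```python
import random


def simulate_wdc(n_steps, threshold=40.0, growth=0.08, noise_sd=0.3,
                 replenish_prob=0.01, replenish_drop=3.0, seed=0):
    """Toy WDC trace in ppm. All dynamics are hypothetical, not the model of Ref. [34]."""
    rng = random.Random(seed)
    wdc = 0.0
    trace, change_times = [], []
    for t in range(n_steps):
        # Placeholder wear debris build-up: constant drift plus Gaussian noise.
        wdc += growth + rng.gauss(0.0, noise_sd)
        # Random oil replenishment dilutes the debris (negative jump).
        if rng.random() < replenish_prob:
            wdc -= replenish_drop
        wdc = max(wdc, 0.0)  # a concentration cannot be negative
        # Oil change: reset the concentration once the threshold is reached.
        if wdc >= threshold:
            change_times.append(t)
            wdc = 0.0
        trace.append(wdc)
    return trace, change_times


trace, change_times = simulate_wdc(n_steps=10000)
print(len(change_times), "oil changes; max WDC =", round(max(trace), 2))
```

Each interval between consecutive entries of `change_times` plays the role of one oil life cycle in this toy setting.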

Hidden Markov Model (HMM)
A lubricating oil degradation trajectory with a prespecified number of states is widely modeled using a stochastic technique called the hidden Markov model (HMM). In general, the state space for lubrication degradation is discretized based on experience [23,26]. HMM has proved its efficiency in the field of diagnostics. HMM assumes that the states underlying the observed values are hidden. The hidden state $Z_t$ for observation $Y_t$ at time $t$ is assumed to evolve through $K$ possible states. The initial state, $Z_0 = 0$, depicts the new oil and the initial state of degradation. The chain rule then gives the state sequence probability [10]

$$P(Z_{1:T}) = P(Z_1) \prod_{t=2}^{T} P(Z_t \mid Z_{t-1}),$$

where $\lambda$ is the state transition matrix given by

$$\lambda = [\lambda_{ij}], \quad \lambda_{ij} = P(Z_t = j \mid Z_{t-1} = i), \quad i, j \in \{1, \dots, K\}.$$

HMM hypothesizes that the observation or emission sequence is generated from hidden states traversing over time. In the case of lubrication degradation, the hidden state can traverse to "any state" with "continuous observations". However, HMM requires a "discrete state space" for hidden states that needs to be "defined and specified in advance". Considering these inherent restrictions, HMM cannot directly provide an optimal prognostic and health management (PHM) strategy for practical real-case scenarios, more so for new systems with insufficient knowledge. An advanced variant of the HMM with an infinite-state transition matrix needs to be considered to model this. A prior over the transition matrix is achieved using the hierarchical Dirichlet process (HDP), which will be discussed in the next subsection.
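To make the chain-rule factorization of the state-sequence probability concrete, the following sketch evaluates it in log space for a short sequence. The three-state transition matrix, the initial distribution, the example sequence, and the function name `state_sequence_log_prob` are illustrative assumptions and are not values estimated in this study.

```python
import math


def state_sequence_log_prob(states, initial, transition):
    """Log-probability of a state sequence under a first-order Markov chain."""
    probs = [initial[states[0]]]
    probs += [transition[prev][cur] for prev, cur in zip(states, states[1:])]
    if any(p == 0.0 for p in probs):
        return float("-inf")  # the sequence is impossible under this chain
    return sum(math.log(p) for p in probs)


# Hypothetical 3-state example: the chain starts in state 0 (new oil).
initial = [1.0, 0.0, 0.0]
transition = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]
sequence = [0, 0, 1, 1, 2]

log_p = state_sequence_log_prob(sequence, initial, transition)
print(math.exp(log_p))  # 1.0 * 0.90 * 0.08 * 0.80 * 0.10
```

Working in log space avoids numerical underflow for long sequences, where the product of many probabilities becomes very small.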

Hierarchical Dirichlet Process (HDP)-HMM
The HDP-HMM is a nonparametric Bayesian approach that extends the HMM to an infinite state space. The HDP prior over an infinite state space requires a brief overview of the Dirichlet process. Figure 3 shows the generative model for HDP-HMM, where $Z_t$ denotes the hidden state sequence, and $Y_t$ represents the observation sequence and is drawn from the observation distribution, $f$. The HDP prior over infinite transition matrices has each $\pi_j$ as a Dirichlet process draw depicting the transition distribution from state $j$. The Dirichlet process, DRP($\gamma$, $H$), is a stochastic process consisting of a set of discrete probability distributions. Here, $\gamma$ is a scalar parameter known as the concentration parameter, and $H$ is the base distribution with parameter space $\theta$. A sample from the Dirichlet process is a distribution over $\theta$ and is defined as $G_0$, given by

$$G_0 = \sum_{k=1}^{\infty} \beta_k\, \delta(\theta - \theta_k), \quad \theta_k \sim H, \quad k = 1, \dots, \infty, \quad \beta \sim \mathrm{GEM}(\gamma).$$

To obtain a discrete set of locations, repeated sampling of the base measure, $H$, is carried out, and the location weights are obtained using the stick-breaking process denoted as GEM($\gamma$), where GEM stands for Griffiths, Engen, and McCloskey [32], and $\theta_k$ is the state-specific observation distribution parameter. The weight vector, $\beta$, given by Equation (10), represents the division of a unit-length stick into weights $\beta_k$:

$$\beta_k = \beta'_k \prod_{l=1}^{k-1} (1 - \beta'_l), \quad \beta'_k \sim \mathrm{Beta}(1, \gamma). \qquad (10)$$

The $k$th weight is a random proportion $\beta'_k$ of the stick remaining after the previous $(k-1)$ weights have been defined. Subsequently, the random draw $G \sim \mathrm{DRP}(\gamma, H)$ can, with probability one, be expressed as [32]

$$G = \sum_{k=1}^{\infty} \beta_k\, \delta(\theta - \theta_k), \qquad (11)$$

where the notation $\delta(\theta - \theta_k)$ indicates a Dirac delta at $\theta = \theta_k$. From this definition, it is evident that the DRP defines a distribution over discrete probability measures. The HDP then consists of (at least) two layers of DRP, as shown in Figure 3, which can be expressed as

$$G_0 \mid \gamma, H \sim \mathrm{DRP}(\gamma, H), \qquad G_j \mid \alpha, G_0 \sim \mathrm{DRP}(\alpha, G_0),$$

where $\gamma$ is the concentration parameter for the upper-level DRP and $\alpha$ is the concentration parameter for the lower-level DRP. Since $G_0$ is discrete, each $G_j$ is also discrete.
Equations (12)-(15) represent the hierarchical sampling of $G_j$ values from $G_0$. The DRP and HDP are used to generate priors for the data likelihood parameters. These priors are further used for estimating the values of the HMM parameters. The state transition distribution in the sticky HDP-HMM is then given by Equation (16), where $s$ is the sticky parameter. The next section explains how the sticky HDP-HMM overcomes the limitations (i.e., parametric treatment for transition and emission distributions) of the standard HMM and HDP-HMM.
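The stick-breaking process GEM(γ) described above can be sketched in a few lines, assuming the standard construction in which each proportion is drawn from a Beta(1, γ) distribution. Because the true process has infinitely many weights, the sketch stops after a finite number of sticks and reports the unassigned remainder. The truncation level, the value of γ, the seed, and the function name `stick_breaking_weights` are assumptions made for illustration.

```python
import random


def stick_breaking_weights(gamma, n_sticks, rng):
    """First n_sticks weights of a GEM(gamma) draw, plus the leftover stick length."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(n_sticks):
        proportion = rng.betavariate(1.0, gamma)  # random fraction of what is left
        weights.append(proportion * remaining)
        remaining *= 1.0 - proportion
    return weights, remaining


rng = random.Random(42)  # fixed seed (assumption) for repeatability
weights, leftover = stick_breaking_weights(gamma=2.0, n_sticks=20, rng=rng)

# The assigned weights plus the leftover always account for the whole unit stick.
print(round(sum(weights) + leftover, 12))
```

A smaller γ concentrates most of the mass on the first few weights, which corresponds to fewer effective states, while a larger γ spreads the mass over many weights.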

Sticky (HDP)-HMM
The HDP-HMM is also known as a Bayesian HMM or infinite HMM because it represents the same framework that models the transitions among hidden states given the observations generated by those states. The HMM assumes that the state at any time t is independent of all states before time (t − 1) and depends only on the state at time (t − 1). In the HMM, the observations belonging to the discrete hidden states can be either discrete or continuous. The need for the sticky HDP-HMM over the HMM and HDP-HMM is discussed next. Knowing the number of hidden states (K) is a prerequisite for learning an HMM.
The HDP-HMM works with an infinite state space but has unrealistically fast dynamics, allowing state sequences to have a much wider posterior probability. In other words, the HDP-HMM does not differentiate self-transitions from transitions between different states. In such a case, the identification of a model from the observations can be hindered by redundant states. In contrast, the DRP-HMM relaxes this limitation by providing a prior over the infinite states with infinite transition matrices. Thus, given the limitations of HMM and HDP-HMM, the sticky HDP-HMM appears to be an elegant Bayesian model for an infinite number of states for any given practical data set.
Any draw from DRP (γ, H) is a discrete distribution that helps to build the DRP-HMM such that the emission parameters define the distribution, H, and the DRP (γ, H) defines the infinite subsets of the parameters. Thus, DRP (γ, H) represents a prior over each infinite transition matrix row as a separate draw from the Dirichlet process (DP).
However, if H is a continuous distribution, the transition matrix may progress through unique states (states that have never occurred before) at every transition, which limits the suitability of the DRP for many applications. Therefore, the HDP replaces the continuous distribution H with a discrete distribution drawn from the upper-level DRP, which forces the lower-level DRPs to generate output distributions over the same parameters. In the resulting HDP interpretation of a Markov chain, each state $Z_t$ is drawn from the row of the transition matrix indexed by the preceding state $Z_{t-1}$, where $Z_t$ denotes the state of the oil at time t and T is the number of observations. Each row $\lambda_j$ of the transition matrix is in turn drawn from the lower-level DRP.

Although the HDP-HMM can assign specific priors to the transition matrices, this is not sufficient, because in some cases the changes in the hidden states are negligible. The HDP-HMM also has limitations, namely, oversegmentation of data, creation of redundant states, rapid switching among states, and an inability to differentiate self-transitions from transitions between different states [32]. These limitations might cause erroneous estimation of the state space and transition probabilities when the data set is small. Therefore, a weightage parameter s is introduced to avert rapid switching among redundant states, as shown in Equation (16). From Equations (11) and (16), the more intuitive transition distribution can now be written as

$$\lambda_j \mid \alpha, \beta, s \sim \mathrm{DRP}\left(\alpha + s,\ \frac{\alpha\beta + s\,\delta_j}{\alpha + s}\right)$$

where $\delta_j$ denotes a unit point mass at state j, so that s adds prior weight to the self-transition. The above formulation is the sticky HDP-HMM and is based on the stick-breaking process as defined in the previous DRP section. Figure 4 illustrates the state-space diagram for the sticky HDP-HMM with DRP emissions. Equation (19) infers the states for the simulated degradation data set plotted in Figure 2. In the HDP and the HDP-HMM, inference and learning are usually achieved simultaneously by estimating the posterior distribution of the parameters and hidden variables. Our aim is to decode the hidden variables $Z_t$. The sampling structure used in this work explicitly samples the parameters β, $\pi_j$ (the distribution over the initial states of the Markov chain), $\theta_k$, γ, and α by way of hyperpriors, targeting the joint posterior $p(Z_{1:T}, \theta_{1:K}, \pi_{1:J}, \beta, \alpha, \gamma \mid y_{1:T})$.

Sampling the above parameters jointly is impractical; therefore, Gibbs sampling offers an alternative. Several Gibbs samplers for the HDP are available [32,35,36]. In this work, we use the approach introduced by Fox et al. [36]. The data likelihoods are chosen to be Gaussian, and all priors are conjugate to their respective likelihoods to allow for closed-form computation of the posteriors.
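The effect of the weightage parameter s on the transition prior can be illustrated with a minimal sketch. It assumes the finite weak-limit approximation commonly used with this kind of sampler, in which the upper-level draw becomes a symmetric Dirichlet over L states and each transition row becomes a Dirichlet with extra mass s on its own state. The function name and the numerical values below are hypothetical and are not taken from this study.

```python
import numpy as np

def sample_sticky_transitions(alpha, gamma, s, L=20, rng=None):
    """Draw an L x L transition matrix from a weak-limit sticky HDP prior.

    Assumed finite approximation (not the exact infinite model):
      beta     ~ Dirichlet(gamma / L, ..., gamma / L)
      lambda_j ~ Dirichlet(alpha * beta + s * delta_j)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Global state weights shared by every row of the transition matrix.
    beta = rng.dirichlet(np.full(L, gamma / L))
    trans = np.empty((L, L))
    for j in range(L):
        conc = alpha * beta   # shared base measure scaled by alpha
        conc[j] += s          # extra prior mass s on the self-transition
        trans[j] = rng.dirichlet(conc)
    return beta, trans

rng = np.random.default_rng(0)
_, sticky = sample_sticky_transitions(alpha=5.0, gamma=5.0, s=50.0, rng=rng)
_, plain = sample_sticky_transitions(alpha=5.0, gamma=5.0, s=0.0, rng=rng)
print("mean self-transition with s = 50:", np.diag(sticky).mean())
print("mean self-transition with s = 0 :", np.diag(plain).mean())
```

With s = 0 the rows reduce to the non-sticky HDP-HMM prior, so comparing the two diagonals gives a quick feel for how strongly s suppresses rapid switching.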

Model Evaluation
The oil degradation data set described in Section 2.1 was used to experiment with the proposed sticky HDP-HMM methodology.

Hyperparameter Optimization
In this work, the HDP-HMM hyperparameters (α, γ, s) were optimized using a global optimization technique, i.e., Bayesian optimization. The hyperparameter ranges are fixed before model training. The log-likelihood function is used as the convergence metric, and the Mann-Kendall entropy [37] of the log-likelihood at each iteration is calculated as the objective function. The Gamma distribution is chosen as the prior for α (the lower-level DRP concentration parameter) and γ (the upper-level DRP concentration parameter). The sticky parameter, s, has a significant impact on the convergence time and the quality of convergence of the model. The objective function, an arg max over (α, γ, s), and the Mann-Kendall entropy are given in Equations (20) and (21), where d is the length of the log-likelihood sequence.
The hyperparameters are then used to train the HDP-HMM on the degradation dataset. Here, we consider a multivariate Gaussian distribution for the emission distributions and an individual Gaussian distribution for the prior over the mean and the variance parameter. The weak limit parameter, L, i.e., the state truncation level, is set to 20. The hyperparameter search range is set to (1,10), (1,10), and (1,50) for α, β, and γ, respectively, with 300 iterations, and the log-likelihood convergence is shown in Figure 5.
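As a rough illustration of a trend-based convergence check on a log-likelihood trace, the sketch below computes the classical Mann-Kendall S statistic and a normalized version of it. This is a swapped-in, simpler quantity and not the Mann-Kendall entropy of Equations (20) and (21); the function names and the example trace are hypothetical.

```python
import numpy as np

def mann_kendall_s(x):
    """Classical Mann-Kendall S: sum of sign(x[j] - x[i]) over all pairs i < j."""
    x = np.asarray(x, dtype=float)
    diffs = x[None, :] - x[:, None]        # diffs[i, j] = x[j] - x[i]
    upper = np.triu_indices(len(x), k=1)   # index pairs with i < j
    return int(np.sign(diffs[upper]).sum())

def normalized_trend(x):
    """Scale S by the number of pairs so the result lies in [-1, 1]."""
    d = len(x)
    return mann_kendall_s(x) / (d * (d - 1) / 2)

# Hypothetical log-likelihood trace that rises and then plateaus.
trace = [-120.0, -90.0, -70.0, -65.0, -65.0, -65.0]
print("S statistic     :", mann_kendall_s(trace))
print("normalized trend:", normalized_trend(trace))
```

A normalized trend close to 1 over a recent window suggests the log-likelihood is still climbing, whereas values near 0 suggest it has flattened out.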

State-Space Estimation
After training the HDP-HMM, we obtained the state sequence shown in Figure 6. The scatter plots in Figure 6a,b illustrate the data after sampling, as described in Section 2.3. The x-axis represents the observations that form clusters, and the y-axis represents the distances at which these clusters merge. The colored bars represent the individual states. Because the wear rate is related to the WDC as described by Equation (1), the wear states should also capture the WDC fluctuation in the lubricating oil. The state sequence shows a unique starting state. The effects of oil replenishments and oil changes on the state sequence are visible as changes in the color of the bars.
In Figure 6a, the blue bars indicate the states related to oil replenishments and oil changes, the red bars depict the initial degradation state, and the cyan and yellow bars depict the least degraded and most degraded intermediate states between oil replenishments and oil changes. Figure 6b shows the estimated state sequence at the 100th iteration, and the scatter plot shows a simulation of 300 observations drawn from the HDP. Each observation within a cluster is drawn independently from the multivariate Gaussian distribution. Figure 7a shows the inferred hidden states corresponding to the observations. The impact of irregular oil replenishments and oil changes is evident from the transitions between the degradation states. The corresponding state transition probabilities for the estimated state space are plotted as a matrix in Figure 7b. The matrix quantifies all the probable transitions from a given state to the next state. The rows of the matrix represent pretransition states, and the columns represent post-transition states. Since state 1 is the default initial state, there is a high probability of transition from state 1 to states 2 and 3. The high probability values along the diagonal reflect the bias of the sticky HDP-HMM algorithm towards self-transition. The remaining values in the matrix indicate the possible nonzero, non-negligible, and nonsequential state transitions. Using the inferred state sequence shown in Figure 7a, the next section presents our approach to predicting the residual life of the system.
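The row and column convention of the transition matrix can be made concrete with a small sketch that counts transitions in a decoded state sequence and normalizes each row. This is a plain empirical estimate and not the posterior transition matrix produced by the Gibbs sampler; the function name and the short example sequence are hypothetical, and states are zero-based here rather than numbered 1 to 4.

```python
import numpy as np

def empirical_transition_matrix(states, n_states):
    """Row-normalized transition counts: rows = pretransition, columns = post-transition."""
    counts = np.zeros((n_states, n_states))
    for prev, nxt in zip(states[:-1], states[1:]):
        counts[prev, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions are left as all zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical decoded sequence over four states (0..3).
z = [0, 0, 1, 1, 1, 2, 0, 0, 3, 3]
P = empirical_transition_matrix(z, 4)
print(np.round(P, 3))
```

Persistent states show up as large diagonal entries, which is the pattern the sticky prior is designed to favor.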
Figure 6. (a) Observed "true" state sequences corresponding to the simulated WDC evolution trend in Figure 2; the scatter plot shows the observed data. (b) The observation distribution scatter plot and the estimated states sampled at the 100th iteration.

Prediction of Residual Life
As described in detail above, the degradation process is defined by HDP-HMM with an estimated state space consisting of Z = {1, 2, 3, 4}. The state sequence can be inferred from Figures 6b and 7a, and the corresponding state transition matrix is obtained from Figure 7b. This procedure of state estimation and representation is rare in LCM DP since most of the focus in the past has been towards HMM, as mentioned earlier in Section 1.
Following the state diagnosis above, for prognosis, we used the posterior probability to define the conditional reliability function (CRF) and the mean residual life (MRL) function. Since degradation data usually require filtering to remove random noise, the filtered samples were then considered for DP. The present work used the DRP and a stick-breaking process for sampling. After state estimation, we considered the data points as sampling epochs. A total of n = 300 samples were collected at time intervals of ∆ = 1 h, and $Y_{n\Delta}$ denotes the observation process. We assume that the lubricating oil has not failed at decision epoch n, i.e., ξ > n∆, where ξ is the observation time.
The posterior probability statistic, $\pi_n$, is then used to express the CRF and the MRL. It represents the probability that the oil is in a warning state given all the information available up to time n∆ and is defined in Equation (23) [23]. The CRF, which denotes the probability of non-failure by time n∆ + t, is defined in Equation (24). Using the CRF expression, the MRL (µ) at the nth epoch is then obtained from Equation (25). We can predict the RUL by making use of the CRF and the MRL value, as suggested by Khaleghei and Makis in Ref. [26], estimated by applying Equations (24) and (25) to the degradation data set.

Figure 8a plots the conditional reliability estimated at all the sampling epochs for the degradation data. The reliability drops significantly around the 120th, 180th, and 240th sampling epochs, when the oil is changed. The reliability fluctuates between 0.965 and 0.865 because the degradation is constantly affected by irregular oil replenishments and oil changes. The effect of oil replenishments and oil changes is evident from the smaller reliability transitions in the plot. The reliability may decline drastically in the absence of oil replenishments.

Figure 8b plots the posterior probability for the lubricating oil degradation data presented in Figure 2. The posterior probability is estimated using Equation (23), taken from Ref. [23]. It may be inferred from the plot that the healthy state of the lubricating oil persists for a prolonged time between the 10th and 250th sampling epochs. Moreover, the oil transitions to the warning states several times between the 100th and 300th sampling epochs prior to oil changes. This shows a high probability that the lubricating oil state can change and that oil replenishments need to be initiated following such inspections.
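To show how a conditional reliability curve and a mean residual life follow from a belief over states and a transition matrix, the sketch below uses a simplified discrete-time Markov chain with an absorbing failure state. This is an assumption made for illustration and is not the formulation of Refs. [23] and [26]; the matrix Q (transitions among non-failed states only), the belief vector, and the function names are all hypothetical.

```python
import numpy as np

def conditional_reliability(belief, Q, horizon):
    """R(t) = belief @ Q^t @ 1 for t = 0..horizon (probability of no failure by step t)."""
    v = np.asarray(belief, dtype=float).copy()
    Q = np.asarray(Q, dtype=float)
    curve = np.empty(horizon + 1)
    for t in range(horizon + 1):
        curve[t] = v.sum()   # mass still in non-failed states after t steps
        v = v @ Q            # advance one sampling interval
    return curve

def mean_residual_life(belief, Q, delta=1.0):
    """delta * sum over t >= 0 of R(t), computed via the fundamental matrix (I - Q)^-1."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    expected_steps = np.linalg.solve(np.eye(n) - Q, np.ones(n))
    return delta * float(np.asarray(belief, dtype=float) @ expected_steps)

# Hypothetical two-state example: state 0 = healthy, state 1 = warning.
# Each row sums to less than 1; the missing mass is the per-step failure probability.
Q = [[0.95, 0.04],
     [0.00, 0.90]]
belief = [0.7, 0.3]          # assumed posterior over (healthy, warning) at epoch n
curve = conditional_reliability(belief, Q, horizon=5)
mrl = mean_residual_life(belief, Q, delta=1.0)
print("reliability curve:", np.round(curve, 4))
print("mean residual life (h):", round(mrl, 2))
```

Summing the reliability curve over all future steps is the discrete analogue of integrating the CRF, which is why the MRL can be read off from the fundamental matrix in this simplified setting.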

Conclusions
Online condition monitoring technology is commonly used to examine the extent of oil degradation, providing real-time information on the oil state and the equipment wear state without disassembling the equipment. As documented in this study, research on state-based oil degradation modeling has been limited. The key contribution of this work lies in introducing a multistate modeling approach to the lubrication degradation trajectory, given the complexity of that trajectory and the difficulty of tracing it with a stochastic model [31]. The proposed approach can effectively classify the multiple states, and the state-based prognostic model also provides the probabilistic progression of the wear state under frequent oil replenishments and oil changes. The hidden Markov model (HMM) with a preassigned state count (N) is widely used for multistate modeling to quantify the probabilistic changes in the degradation parameters. However, this state count is based purely on experience and assumption, which limits its application to complex systems with nonmonotonic and singular (spiky) degradation traces and frequent intentional or unintentional external interference.
With this scenario in mind, and given the limitations of the HMM and the HDP-HMM, the sticky HDP-HMM appears to be an elegant Bayesian model for an infinite number of states [32]. We presented the hierarchical Dirichlet process-hidden Markov model (HDP-HMM) framework, whose nonparametric properties allow the unknown size of the system's state space to be inferred purely from the degradation time-series data. The sticky HDP-HMM further refined the learning capability of the model on degradation data (i.e., wear debris concentration in oil) simulated for a system with multiple oil replenishments and oil changes. The remaining useful life (RUL) is evaluated indirectly in terms of the CRF and MRL to estimate the time for an oil change. Our analysis indicates that classifying states on the basis of experience and assumption is not effective for real-time oil diagnosis and prognosis.
The state-based Dirichlet process for lubrication condition monitoring needs further research. The HDP-HMM can improve state-based DP for various industrial systems. Although the sticky HDP-HMM follows a state persistence approach, our work needs to be extended by considering the HDP-hidden semi-Markov model (HSMM) to learn nongeometric state durations with natural priors over state duration. A comparative analysis of the estimated state space, the state evolution over time, and the estimated RUL will need to be carried out with the HDP-HMM, the HDP-HSMM, and the HDP-hierarchical hidden Markov model (HHMM) for the same WDC time-series data set. Various sampling algorithms will also be explored alongside the sticky algorithm with respect to their state estimation capability. While the present work estimates the instantaneous wear state space based on current WDC data, the work will also be extended to investigate cumulative wear state-space estimation to detect the system wear state. The RUL will be predicted using the CRF and MRL for comparative analysis with another nonparametric Bayesian approach, i.e., MO-GPR.