Assessment of Early Stopping through Statistical Health Prognostic Models for Empirical RUL Estimation in Wind Turbine Main Bearing Failure Monitoring

Details about a fault’s progression, including the remaining-useful-lifetime (RUL), are key features in monitoring, industrial operation, and maintenance (O&M) planning. To avoid increases in O&M costs caused by subjective human involvement and over-conservative control strategies, this work presents models that estimate the RUL for wind turbine main bearing failures. The RUL is predicted from a likelihood function based on concepts from prognostics and health management and from survival analysis. The model is trained on run-to-failure wind turbines, from which a parametrization of a probability density function is extracted. To ensure analytical moments, a Weibull distribution is assumed. Alongside the RUL model, the fault’s progression is abstracted as discrete states that follow the bearing stages from damage detection, through overtemperature warnings, to overtemperature alarms and failure; these states are integrated in a separate assessment model. Assuming a naïve O&M plan (wind turbines are run as close to failure as possible without regard for infrastructure or supply chain constraints), 67 non run-to-failure wind turbines, i.e., turbines that were stopped by the operator prior to their failure, are assessed with respect to early stopping, revealing the potentially lost RUL. On average, wind turbines were found to be stopped 13 days prior to their failure, accumulating 786 days of potentially lost operation across the 67 wind turbines.


Introduction
Health prognostics in asset assessment, including remaining-useful-lifetime (RUL) estimation, are key elements of operation and maintenance (O&M) strategies, where they can help to increase production and/or reduce O&M costs. As wind power is a leading renewable energy source, availability, reliability, and lifetime are increasingly taken into account by investors. In this work, we focus on slowly developing faults, which, when not addressed, can cause unwanted or unnecessarily costly downtime [1][2][3][4][5][6][7][8]. Specifically, we (i) investigate main bearing monitoring, (ii) give a full account of the neural network (NN) approach presented by Herp et al. [9] on different timescales, (iii) present how this ties into O&M efforts, and (iv) compare the expected main bearing RUL with model predictions. The latter gives rise to a discussion on early stopping of wind turbines and the potential waste of RUL when O&M planning requires stopping the wind turbine ahead of time.
Models for assessing a system's current and future condition are often data-driven, and different approaches can thus be used depending on the type of data at hand. Trappey et al. [10] use a combined statistical and NN approach, in which principal component analysis is used to extract relevant features; an NN trained with back-propagation on these features learns and predicts the condition of power transformers. You et al. [11] address the use of NNs by including temporal dependencies, proposing a diagnostic approach for electric car batteries based on recurrent neural networks (RNNs). On the other hand, Herp et al. [12] propose a purely statistical approach based on the history and descriptive statistics of the observed data, using a Gaussian process to predict the future of bearing failure time-series in wind turbines. Hong et al. [13] similarly address bearing monitoring by extracting features and combining an NN with a self-organizing map to estimate the confidence of the bearing health states. In contrast to the aforementioned, Si [14] proposed an approach based on the stochastic properties of the time-series: assuming an underlying driving Brownian motion, a closed-form predictive distribution can be calculated for the time it takes the signal to reach a threshold. Medical research uses survival analysis to understand the relations between patients' clinical features and the effectiveness of treatment options. Khan et al. [15] and others, such as Katzman et al. [16], have recently shown that prognostics and health management and NNs can be combined to outperform existing state-of-the-art survival analysis models. A comprehensive overview of advances in RUL prediction can be found in Si et al. [17], Chapter 1. For a summary of wind turbine monitoring techniques, we refer to Márquez [18].
Catering to the need for health prognostics, the model proposed by Herp et al. [9] draws on concepts from the above-mentioned work. It adopts the closed-form approach of Si et al. [17] by assuming that the RUL can be quantified by a closed-form expression, and embeds it in an NN to make confidence calculations of the predictions easier. A Weibull distribution has been chosen as the lifetime distribution, since the literature has already shown its use in connection with NNs [19][20][21][22]. In contrast to the deep exponential families model of Ranganath et al. [19], this work limits itself to Weibull distributions only. Yang et al. [23] and Aggarwal et al. [24] developed similar Weibull-based NNs and RNNs, respectively, but focus, amongst others, on disciplines other than wind turbine monitoring.
For the sake of completeness, we refer the interested reader to Herp et al. [9] for a comparative study between different RUL estimation frameworks in empirical bearing fault prediction, as this study is not concerned with that topic. This paper is organized as follows: Section 2 describes the methods for RUL estimation in detail. The methodology is applied in Section 3, showing the potential gain from re-evaluating early stopping. The results and the underlying assumptions are discussed in Section 4, and the paper closes with final remarks in Section 5.

Methodology
As this study is concerned with both the empirical estimation of the remaining-useful-lifetime of main bearings and the assessment of early stopping, each topic has a dedicated section. Even though they are treated separately here, we will later show that these methods are closely intertwined when it comes to monitoring and operating one or more wind turbines. Figure 1 shows how a monitoring and operation workflow might look in order to make educated decisions based on estimations of the RUL. Fault monitoring is embedded in a back-end operations monitoring framework, which facilitates fault detection. We do not consider fault detection further in this work and assume that one of the many fault detection approaches described in the literature can be applied. On the fault monitoring level, once data regarding the fault-to-be have been collected, we distinguish between three main strategies for later decision making: (1) assessing the situation at the time of detection, drawing a conclusion, and deciding on an O&M action; (2) deriving a new assessment at given condition monitoring (CM) points spaced out over the time until failure; and (3) providing an assessment at any given time. Strategies (2) and (3) are thus assessments over time. For the remainder of this work, we refer to (1) Initial Assessment, (2) Discrete Assessment, and (3) Continuous Assessment collectively as Empirical Remaining-Useful-Lifetime Estimation. The conclusions of the assessments are then fed back into the Operations Monitoring, while actions are left outside the monitoring framework as they are a matter of O&M management. In the conclusion step, one would estimate the benefit of continuing or stopping operations. We refer to this as Assessment of Early Stopping.

Figure 1. Monitoring and operation workflow. Conclusion: interpreting the results from the assessment(s). Action(s): the operator decides what action(s) to take for its O&M management plan(s).

Data and Notation
This study is concerned with two types of data: time-series data and event data. The time-series data are composed of a subset of SCADA data and other health indicators, while the event data contain an event description together with its start and stop time. Since a main bearing failure is investigated as a case study, the SCADA and event data are associated with that failure (including events for the initial damage detection, failure warnings, and failure alarms) and stem from 132 wind turbines of the same type, of which 35 are operated until failure.
Regarding the time-series data, we let $\{\mathbf{x}_1, \dots, \mathbf{x}_T\}$ be a process of measurement vectors $\mathbf{x}_t$, $t = 1, \dots, T$, each containing measurements of m variables, i.e., $\mathbf{x}_t = [x_1, \dots, x_m]$. Successive samples on an interval from a to b are denoted $\mathbf{x}_{[a,b]}$. The feature space available in this study is the same as that used by Bach-Andersen et al. [25] and Herp et al. [9], namely active power, generator rpm, gearbox oil temperature, ambient temperature, wind speed, nacelle temperature, and a bearing health indicator. Apart from the bearing health indicator, all features are sampled every 10 min as averages over the preceding 10 min interval. The bearing health indicator is based on energy bands in vibration spectra and is provided as an event-based measurement. For the remainder of this work, the SCADA and health indicator data are re-sampled to hourly timestamps. The time-series data are the foundation for the predictive models described in Section 2.2 and employed in Section 3.
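A minimal standard-library sketch of the hourly re-sampling, assuming the data arrive as (timestamp, value) pairs: the 10 min SCADA averages are binned by the hour they fall in and averaged, and the event-based health indicator is carried forward to each hourly timestamp. Labeling bins by the start of the hour and forward filling the health indicator are our assumptions about how the re-sampling is done, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def resample_hourly_mean(samples):
    """Average (timestamp, value) samples into hourly bins keyed by the start of the hour."""
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(bins.items())}


def forward_fill_hourly(events, start, end):
    """Carry the latest event-based reading forward to every hour in [start, end]."""
    events = sorted(events)
    filled, i, last = {}, 0, None
    hour = start
    while hour <= end:
        # consume all readings recorded up to and including this hour
        while i < len(events) and events[i][0] <= hour:
            last = events[i][1]
            i += 1
        filled[hour] = last  # None until the first reading arrives
        hour += timedelta(hours=1)
    return filled


# Usage with synthetic 10 min samples (hypothetical values).
t0 = datetime(2020, 1, 1)
scada = [(t0 + timedelta(minutes=10 * k), float(k)) for k in range(12)]
hourly_power = resample_hourly_mean(scada)
```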
The event data are a set of discrete data points: we let $\mathcal{E} = \{E_1, \dots, E_K\}$ be the set of all events $E_k$, $k = 1, \dots, K$, and we let $\mathcal{T} = \{1, \dots, t, \dots, T\}$ be the times at which an event can occur in the reference frame of the time-series data. An excerpt of these data is shown in Table 1. Starting with the initial detection of the damage, hundreds of events are recorded until failure. The failure of the wind turbine in Table 1 is indicated by the red vertical line (see graph), and the event associated with the failure is the bearing overtemperature alarm. Wind turbines for which this event is recorded are referred to as run-to-failure wind turbines. These wind turbines also contain the bearing overtemperature warning event, shown by the orange vertical line in Table 1. Other combinations of events, different from the bearing overtemperature warning and the bearing overtemperature alarm, might likewise carry valuable information between the initial detection of the damage and the wind turbine failure. Considering a chain of successive and/or simultaneous events, the dependency between two sets of events, $\mathbf{E}_l \Rightarrow \mathbf{E}_l$, is defined by Equation (1). The found sets $\mathbf{E}_k$ can then be used to establish a state as defined later on in Equation (15).

Empirical Remaining-Useful-Lifetime Estimation
As we are concerned with the time between detecting a fault and the failure itself, we define the RUL to be non-negative, i.e., RUL ∈ [0, ∞). Following the lead of probabilistic estimations of the remaining-useful-lifetime in the literature, this section provides a detailed account of the framework underlying Herp et al. [9] for wind turbine bearing failure.
In this study, η(t) is referred to as the hazard function and H(t) as the cumulative hazard function. Following textbooks on survival analysis [26], H(t) defines both a cumulative distribution function and a probability density function for the non-negative random variable RUL. As a consequence, for each cumulative distribution function P(RUL ≤ t) there exists an H(t) such that H(t) = −log(1 − P(RUL ≤ t)). In terms of remaining-useful-lifetime, large values of the hazard function indicate a higher chance of failure at time t, given that no failure has occurred up to t, i.e., a higher failure rate in terms of fault prediction. For the remainder of the study, the focus is on the right tail of the distribution under consideration, which defines the RUL distribution [26]. As the time it takes to observe the remaining-useful-lifetime is the remaining lifetime of the wind turbine itself, the true remaining-useful-lifetime, $\widehat{\mathrm{RUL}}$, cannot be observed until it is too late. This premise is known as censored observations in survival analysis [26]. Let RUL be a non-negative random variable indicating the remaining-useful-lifetime. An observation of RUL is said to be censored whenever it has not been observed pointwise, i.e., only a bound on it is known. In this framework, one distinguishes between the true $\widehat{\mathrm{RUL}}$ and the observed RUL. The true $\widehat{\mathrm{RUL}}$ is always contained in the observation; when $\widehat{\mathrm{RUL}}$ equals the observed value, the observation is referred to as non-censored. More specifically, we distinguish between (i) right-censored data, Δ = 1, for which RUL is known to be above a threshold t (i.e., RUL ∈ (t, ∞)), and (ii) non-censored data, Δ = 0, for which $\widehat{\mathrm{RUL}}$ equals the observed value (i.e., RUL = t). Censoring of the remaining-useful-lifetime of wind turbines prohibits simply using the mean value of the already failed wind turbines, as this would lead to underestimating the RUL.
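For reference, the standard survival-analysis relations [26] behind these definitions can be written as below. We assume these textbook forms correspond to the hazard, cumulative hazard, distribution, density, and RUL (survival) functions used in this section; the likelihood follows this document's convention that Δ_i = 1 marks a right-censored observation t_i.

```latex
H(t) = \int_0^t \eta(\tau)\, d\tau, \qquad
P(\mathrm{RUL} \le t) = 1 - e^{-H(t)}, \qquad
p(t) = \eta(t)\, e^{-H(t)}, \qquad
P(\mathrm{RUL} > t) = e^{-H(t)}

L = \prod_i p(t_i)^{1-\Delta_i}\, P(\mathrm{RUL} > t_i)^{\Delta_i}
\quad\Rightarrow\quad
\log L = \sum_i \Big[ (1-\Delta_i) \log \eta(t_i) - H(t_i) \Big]
```

The −H(t_i) term, present for both censored and non-censored observations, is the negative contribution that grows with t in the log-likelihood.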
A full mathematical proof of the following description can be found in Patti et al. [27]. Figure 2 illustrates the censoring idea: the parametrized probability density function tightens around the failure time when the failure is observed, and is pushed beyond the current observation point when the failure is censored.
Under the assumption of non-informative censoring [26], the likelihood and the log-likelihood can be written in terms of the hazard and cumulative hazard functions. For both non-censored and censored failures, the log-likelihood always has a negative contribution with increasing t, penalizing as time progresses. Following Herp et al. [9], we consider a Weibull distribution for its tractable properties, intuitive parametrization, and use in other related studies [19][20][21][22]. Here, t ∈ [0, ∞), α ∈ (0, ∞), and β ∈ (0, ∞), where α and β are referred to as the scale and shape of the distribution, respectively. Besides other features, such as analytic moments, the Weibull distribution has an analytic hazard and cumulative hazard function, which can be obtained from Equations (2) and (3). Substituting these into the log-likelihood of Equation (9) gives the cost associated with the Weibull distribution, Equation (14).

Initial Assessment
The initial assessment contains only one estimation of the remaining-useful-lifetime, based on the descriptive statistics and on maximizing the likelihood described by Equation (14). Figure 3 shows the Weibull models for run-to-failure wind turbines (bearing overtemperature alarm) and for wind turbines running into a bearing overtemperature warning. Wind turbines excluded from the model construction, as they do not run into either of these events, are illustrated by their cumulative distribution as a scatter plot (non-failing). For later comparison, a stopped-by-operator model is obtained for the non-failing wind turbines. Furthermore, the Kaplan-Meier [28] estimate (KM) is provided for reference, as it makes no assumption about the shape of the probability distribution.
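With the standard Weibull forms η(t) = (β/α)(t/α)^(β−1) and H(t) = (t/α)^β, the initial assessment reduces to a censored maximum-likelihood fit of (α, β). The sketch below is a minimal pure-Python illustration under that assumption, not the authors' implementation: it uses the standard profile-likelihood trick (for fixed β the optimal α has the closed form α^β = Σ t_i^β / d, with d the number of observed failures), which leaves a monotone one-dimensional equation in β that is solved by bisection. The function names and the boolean `censored` flag (True for Δ = 1) are our own.

```python
import math
import random


def weibull_loglik(alpha, beta, times, censored):
    """Censored Weibull log-likelihood: sum of (1 - Delta) * log(hazard) - cumulative hazard."""
    ll = 0.0
    for t, is_censored in zip(times, censored):
        if not is_censored:  # Delta = 0: the failure time t was observed
            ll += math.log(beta / alpha) + (beta - 1.0) * math.log(t / alpha)
        ll -= (t / alpha) ** beta  # H(t) = (t / alpha)^beta, for every observation
    return ll


def fit_weibull_censored(times, censored, tol=1e-10):
    """Maximum-likelihood (alpha, beta) for right-censored data via the profile likelihood."""
    obs_logs = [math.log(t) for t, c in zip(times, censored) if not c]
    d = len(obs_logs)  # number of observed (non-censored) failures
    if d == 0:
        raise ValueError("at least one non-censored failure is required")
    mean_obs_log = sum(obs_logs) / d
    t_max = max(times)
    logs = [math.log(t) for t in times]
    scaled = [t / t_max for t in times]  # keeps t**beta from overflowing

    def score(beta):
        # minus the profile log-likelihood's derivative in beta, divided by d;
        # increasing in beta, so its root is the maximum-likelihood shape
        w = [u ** beta for u in scaled]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / beta - mean_obs_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0:  # expand the bracket until the root is enclosed
        lo, hi = hi, 2.0 * hi
        if hi > 1e6:
            raise ValueError("no finite shape estimate (degenerate data)")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # closed-form scale for the fitted shape: alpha^beta = sum(t^beta) / d
    alpha = t_max * (sum(u ** beta for u in scaled) / d) ** (1.0 / beta)
    return alpha, beta


# Usage on synthetic data (hypothetical): failures after 120 days are right-censored.
random.seed(0)
raw = [random.weibullvariate(100.0, 2.0) for _ in range(5000)]
censored = [t > 120.0 for t in raw]
times = [min(t, 120.0) for t in raw]
alpha_hat, beta_hat = fit_weibull_censored(times, censored)
```

Bisection is used only because the profile score is monotone in β; a general-purpose optimizer over (log α, log β) would serve equally well.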

State Abstraction and Discrete Assessment
The same principle as in Figure 3 applies to the discrete assessment. However, in order to perform the discrete assessment, condition monitoring (CM) points will need to be taken into account. How these CM points are extracted is described in this section.
Between initial detection and failure, a damaged bearing undergoes transitions from minor fatigue to advanced fatigue and damage. In combination with Equation (1), this section addresses how to identify the individual stages of bearing fatigue, laying the foundation for the discrete assessment later on.
Consider an event $E_k$ or a set of events $\mathbf{E}_k$ from a library of events $\mathcal{E} = \{E_1, \dots, E_K\}$ (abstracted by Equation (1)), linked to the operation of a wind turbine at time t, to depend on the data collected up to time t and on a hidden state variable $s^{(t)}$ representing the current state of the wind turbine. The probability of $E_k$ is then expressed in terms of $s^{(t)}$ and $\mathbf{x}_{[1,t]}$, as given by Equation (15). Further, $S_m$ refers to the hidden states, defined by the separability of the process $\{\mathbf{x}_1, \dots, \mathbf{x}_T\}$ into $S \le T$ states. The transition between two states is called a state transition, and $s^{(t)}$ is the length of the current state with samples $\mathbf{x}_{[S_S, t]}$, $S_S$ being the time of the last state transition. Following Herp et al. [29,30] and Prescott Adam et al. [31], a state in a set of time-series can be abstracted by considering only the maximum likelihood of $P(s^{(t)}, \mathbf{x}_{[1,t]})$, which is computed recursively from the conditional prior $P(s^{(t)} \mid s^{(t-1)})$ and the sample model $P(\mathbf{x}_t \mid s^{(t-1)}, \mathbf{x}_{[1,t-1]})$. Here, the conditional prior and the sample model depend implicitly on known hyper-parameters $\boldsymbol{\beta} = [\beta_c, \beta_m]$, where $\beta_c$ and $\beta_m$ parametrize the conditional prior and the sample model, respectively.
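In the standard Bayesian online changepoint detection formulation (Adams and MacKay), the recursion combining the conditional prior and the sample model reads as follows. We assume this is the form used here, rewritten in this document's notation; normalizing the joint gives the distribution over the current state.

```latex
P\big(s^{(t)}, \mathbf{x}_{[1,t]}\big) = \sum_{s^{(t-1)}}
  P\big(s^{(t)} \mid s^{(t-1)}\big)\,
  P\big(\mathbf{x}_t \mid s^{(t-1)}, \mathbf{x}_{[1,t-1]}\big)\,
  P\big(s^{(t-1)}, \mathbf{x}_{[1,t-1]}\big),
\qquad
P\big(s^{(t)} \mid \mathbf{x}_{[1,t]}\big) =
  \frac{P\big(s^{(t)}, \mathbf{x}_{[1,t]}\big)}{\sum_{s^{(t)}} P\big(s^{(t)}, \mathbf{x}_{[1,t]}\big)}
```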
Given the nature of the data, we seek a sample model that addresses changes in the first and second moments of the time-series. This is achieved by characterizing each $s^{(t)}$ by the two moments E[x] and E[x²]. Given sufficient statistics for each $s^{(t)}$, the posterior distribution for $\mathbf{x}_t$ follows from an iterative update [32,33], with $\mu_0$, $\kappa_0$, $\alpha_0$, $\gamma_0$ being the previous statistics. A Student's t-distribution with ν degrees of freedom, and µ and σ as mean and variance, can then be used for the posterior distribution [34]. The conditional prior is chosen as a hazard function, as motivated in Section 2.2; here, η is the hazard function of a geometric sequence and is given by the elapsed time since the last state transition. For the remainder of this work, the algorithm is implemented with a supervised probability update as proposed by Herp et al. [29].
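A compact sketch of this state-length filter, assuming a univariate series, the standard Normal-Gamma conjugate update of the statistics (µ, κ, α, γ), its Student's t posterior predictive with ν = 2α degrees of freedom, and a constant hazard η, which is what a geometric prior on state lengths yields. It omits the supervised probability update of Herp et al. [29], and the hyper-parameter defaults are illustrative rather than values from this study. Note that α here is the Gamma shape statistic, not the Weibull scale.

```python
import math


def student_t_logpdf(x, nu, mu, var):
    """Log-density of a Student's t with nu dof, location mu and squared scale var."""
    z = (x - mu) ** 2 / (nu * var)
    return (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
            - 0.5 * math.log(nu * math.pi * var) - 0.5 * (nu + 1.0) * math.log1p(z))


def state_length_filter(xs, hazard=1.0 / 100.0, mu0=0.0, kappa0=1.0, alpha0=1.0, gamma0=1.0):
    """Return, for each sample, the posterior over the current state length s(t)."""
    run_probs = [1.0]                          # P(s(t) | x[1,t]) over state lengths
    stats = [(mu0, kappa0, alpha0, gamma0)]    # sufficient statistics per state length
    history = []
    for x in xs:
        # sample model: Student's t posterior predictive for every state length
        preds = [math.exp(student_t_logpdf(x, 2.0 * a, m, g * (k + 1.0) / (a * k)))
                 for (m, k, a, g) in stats]
        joint = [r * p for r, p in zip(run_probs, preds)]
        growth = [j * (1.0 - hazard) for j in joint]  # current state continues
        change = sum(joint) * hazard                   # state transition occurs
        new_probs = [change] + growth
        total = sum(new_probs)
        run_probs = [p / total for p in new_probs]
        # conjugate update of every state's statistics; a fresh prior for a new state
        stats = [(mu0, kappa0, alpha0, gamma0)] + [
            ((k * m + x) / (k + 1.0), k + 1.0, a + 0.5,
             g + k * (x - m) ** 2 / (2.0 * (k + 1.0)))
            for (m, k, a, g) in stats]
        history.append(run_probs)
    return history


# Usage (synthetic): a shift in the mean after 50 samples.
xs = [0.1 * (-1) ** i for i in range(50)] + [5.0 + 0.1 * (-1) ** i for i in range(50)]
history = state_length_filter(xs)
```

The arg max of each entry traces the maximum-likelihood state length; when it drops back towards zero, a state transition has been detected.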
To quantify states, a piecewise constant regression with a finite number of states, $\sum_{s} \alpha_s \chi_{A_s}(t)$, is performed on the cumulative probability of state transitions, subject to a constraint on each interval $A_s$ (Equation (21)). Here, $\alpha_s \in \mathbb{R}$ is the state's cumulative probability, $A_s = [S_{m-1}, S_m)$ is the interval of the s-th state, and $\chi_{A_s}$ is an indicator function that is 1 if $t \in A_s$ and 0 otherwise. The constraint on $A_s$ comes from the delay in detecting a new state.
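One way to compute such a piecewise constant fit under a minimum-length constraint is dynamic programming over segment end points. The sketch assumes a least-squares criterion (each α_s is then the mean of its segment) and a fixed number of states; both are our assumptions, and the function name is hypothetical.

```python
def piecewise_constant_fit(y, n_states, min_len):
    """Least-squares piecewise constant regression with a minimum segment length.

    Returns the segments as (start, end) index pairs (end exclusive) and their levels.
    """
    n = len(y)
    if n_states * min_len > n:
        raise ValueError("series too short for the requested number of states")
    s1, s2 = [0.0], [0.0]  # prefix sums of y and y**2
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        """Squared error of y[i:j] around its mean."""
        m = j - i
        mean = (s1[j] - s1[i]) / m
        return (s2[j] - s2[i]) - m * mean * mean

    inf = float("inf")
    # cost[k][j]: best error for y[:j] split into k segments; back[k][j]: start of the last one
    cost = [[inf] * (n + 1) for _ in range(n_states + 1)]
    back = [[0] * (n + 1) for _ in range(n_states + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_states + 1):
        for j in range(k * min_len, n + 1):
            for i in range((k - 1) * min_len, j - min_len + 1):
                if cost[k - 1][i] == inf:
                    continue
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    segments, j = [], n
    for k in range(n_states, 0, -1):  # backtrack the optimal boundaries
        i = back[k][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    levels = [(s1[j] - s1[i]) / (j - i) for i, j in segments]
    return segments, levels
```

With hourly samples, the 5-day minimum state length used in the case study would correspond to min_len = 120.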
Figure 4 shows the state abstraction for the health indicator in detail. From top to bottom, (a) shows the health indicator for a wind turbine, (b) shows the log-likelihood for the state transitions (gray-scale) and its maximum likelihood, and (c) shows the state transition probability density function and cumulative distribution function, in addition to the abstracted states in accordance with Equation (21). The state abstraction gives us a measure of the length of each state. Given the time for each turbine to any state transition, the RUL probabilities can be calculated in the same way as for the initial assessment, for any combination of state transitions. The implementation and results of this approach are discussed in the case study of Section 3, which provides the RUL probabilities from initial detection to any of the state transitions (Figure 10).

Continuous Assessment: RUL Recurrent Neural Network
We employ a recurrent neural network (RNN) for the continuous assessment, maximizing the cost function given by Equation (14) at each t. In general, this is an optimization problem of finding a function or distribution in the space of hazard functions, $F \in \mathcal{H}$, that maximizes the model's likelihood given historical data, $\mathbf{x}_{[1,t]}$. The space of all possible closed-form distributions in $\mathcal{H}$ is too large to obtain solutions easily. Furthermore, as $\mathbf{x}_{[1,t]}$ can contain any information up to time t, including information on previous states [30], optimizing over all $\mathbf{x}$ is computationally infeasible. However, constraining the problem to a Weibull distribution whose parameters depend on $\mathbf{x}_{[1,t]}$, it follows from Equations (12) and (13), as well as Equation (14), that the Weibull log-likelihood is to be maximized over parameters that depend on each sample's history $\mathbf{x}_{[1,i]}$. This optimization is still not feasible, as for a large dataset each parameter needs to be obtained by considering all historic data. In the context of Figure 5, θ can be written as a vector θ = [α, β]. Omitting node-level mathematical details, which can be found in textbooks such as Goodfellow et al. [35], the network of Figure 5 maps $\mathbf{x}_{[1,t]}$ to θ through its weights ω, which are obtained as $\arg\max_{\omega} \log L(\omega, \mathrm{RUL}, \Delta, \mathbf{x}_{[1,t]})$. For the remainder of this study, we use a Long Short-Term Memory (LSTM) RNN [35] and maintain the topology, feature space, and optimization of Herp et al. [9] (see Figure 6). The sequence length is set to 7 days. An example of the predictions can be found in Figure 7, showing the assumed true $\widehat{\mathrm{RUL}}$ and the model prediction. The prediction is shown in terms of the first moment, median, and arg max of the Weibull distribution at each time t, and the distribution itself is illustrated as shaded areas.
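The point predictions plotted in Figure 7 follow directly from the predicted (α, β) at each time step. The sketch below computes the first moment αΓ(1 + 1/β), the median α(ln 2)^(1/β), and the arg max (mode) of a Weibull distribution, plus a quantile band of the kind that could be shaded. It also shows one hypothetical way of cutting hourly inputs into 7-day sequences (168 samples). The windowing and the 5 %/95 % band are our assumptions, not settings from Herp et al. [9].

```python
import math


def weibull_point_predictions(alpha, beta):
    """First moment, median and mode (arg max) of Weibull(scale=alpha, shape=beta)."""
    mean = alpha * math.gamma(1.0 + 1.0 / beta)
    median = alpha * math.log(2.0) ** (1.0 / beta)
    # for beta <= 1 the density is largest at t = 0
    mode = alpha * ((beta - 1.0) / beta) ** (1.0 / beta) if beta > 1.0 else 0.0
    return mean, median, mode


def weibull_band(alpha, beta, lower=0.05, upper=0.95):
    """Quantile band, e.g. for shading the predicted distribution (levels are illustrative)."""
    def quantile(p):
        return alpha * (-math.log1p(-p)) ** (1.0 / beta)
    return quantile(lower), quantile(upper)


def sliding_windows(series, length=7 * 24):
    """Hypothetical windowing: each input ends at time t and spans 7 days of hourly samples."""
    return [series[t - length + 1 : t + 1] for t in range(length - 1, len(series))]
```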

Figure 6. Illustration of the RNN used to predict the RUL. Selected SCADA time-series are shown as input [9].

Assessment of Early Stopping-A Bearing Failure Study
This section combines the aforementioned methodology and applies it to 67 non run-to-failure wind turbines. For each assessment, all 67 turbines are investigated with respect to early stopping, showing the RUL lost after turbine operations were stopped. The remaining turbines are used for model construction.
Consider the example of Figure 3, where models for the bearing overtemperature warning and the bearing overtemperature alarm are illustrated. Comparing each non run-to-failure wind turbine to either of the event models shows a mismatch δ, defined as the distance in time between a given wind turbine and the predictive models. This mismatch is interpreted as the lost remaining RUL.
Under continuous assessment, Figure 7 shows the mismatch between the presumably true $\widehat{\mathrm{RUL}}$ and the model prediction for a non run-to-failure wind turbine. In this case, we consider the mismatch averaged over time for each wind turbine, $D^{(\mathrm{turbine})}$, which is then a measure of that turbine's potential remaining RUL. In the following, we employ the methodology of this section in a case study of main bearing failures.
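As a minimal sketch of the averaged measure, assume δ(t) is the model's predicted RUL minus an assumed true RUL that counts down to the turbine's stop time, i.e., the stop is treated as if it were the failure. D(turbine) is then the mean of δ(t) over the prediction steps, and the fleet statistics reported in the case study (largest, smallest, mean, median, and combined mismatch) follow from the per-turbine values. The names and the sign convention are our assumptions.

```python
import statistics


def average_mismatch(predicted_rul, times, stop_time):
    """D for one turbine: mean over t of delta(t) = predicted RUL(t) - (stop_time - t).

    Assumption: the assumed true RUL at time t is stop_time - t, so positive
    values suggest the turbine was stopped before the model expected failure.
    """
    deltas = [p - (stop_time - t) for p, t in zip(predicted_rul, times)]
    return sum(deltas) / len(deltas)


def fleet_summary(mismatches):
    """Summary statistics of per-turbine mismatches D, in days."""
    return {
        "largest": max(mismatches),
        "smallest": min(mismatches),
        "mean": statistics.mean(mismatches),
        "median": statistics.median(mismatches),
        "combined": sum(mismatches),
    }
```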
Whether or not to stop a wind turbine and perform maintenance is a topic of its own. Many factors, e.g., weather conditions and the availability of equipment, spare parts, and manpower, have to be taken into consideration when deciding on and planning O&M tasks. In the following, we simplify this to a purely time-based approach: many wind turbines are stopped before they run into failure, and, based on the proposed RUL estimations, the potential remaining time until failure can be estimated.

Initial Assessment
Model construction can be carried out up to different events, including the bearing overtemperature warning and the bearing overtemperature alarm. For the wind turbines in this study, the respective models are shown in Figure 3. Assuming that all turbines experience the same fault, the difference between the two models (warning and alarm) should be an offset in the predictive horizon. However, variability in environmental features and in the development of the bearing failure can change the time it takes to transition from the bearing overtemperature warning to the bearing overtemperature alarm event. This is illustrated in Figure 8, labeled warning vs. alarm. The low curvature of the graph indicates that the difference between the two models is relatively small and relatively constant throughout the RUL space. Given the limited number of wind turbines that are operated for more than 250 days, or that run into either event within less than 100 days, broader confidence bounds can be observed in these RUL ranges. Besides the time between warning and alarm, we can consider a model for the 67 non run-to-failure wind turbines, referred to as the stopped model in Figure 3. The mismatch δ of this model with respect to the warning and alarm models is shown in Figure 8. Comparing the models, turbines that had run for 100 days and were stopped too early would have had 30 days of operation left until running into failure, i.e., the bearing overtemperature alarm, or 20 days until the bearing overtemperature warning if a more conservative stopping criterion is desired. Variability in wind turbine operations leads to a point where the expected mismatch becomes negative; in this regime, the wind turbine is operated in the right tail of the RUL probability curve of Figure 3. An operator might interpret this as the point where the wind turbine has overextended the confidence of continuing operations compared with the ensemble of already failed turbines.
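One plausible way to turn two fitted Weibull models into a mismatch curve like the one in Figure 8 is quantile matching: for a turbine stopped at time t, take its cumulative probability under the stopped model and read off the time at which the reference (warning or alarm) model reaches the same probability. This is our assumption, not necessarily the procedure behind Figure 8, and the parameter values in the usage line are hypothetical. A negative result corresponds to the right-tail regime described above.

```python
import math


def weibull_cdf(t, alpha, beta):
    """P(RUL <= t) = 1 - exp(-(t / alpha)^beta)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_quantile(p, alpha, beta):
    """Inverse CDF: the time at which the cumulative probability reaches p."""
    return alpha * (-math.log1p(-p)) ** (1.0 / beta)


def quantile_mismatch(t, stopped, reference):
    """delta(t): reference-model time at the stopped model's probability level, minus t.

    stopped and reference are (alpha, beta) pairs; a positive value means days of
    operation left relative to the reference model.
    """
    p = weibull_cdf(t, *stopped)
    return weibull_quantile(p, *reference) - t


# Hypothetical models: the reference reaches each probability level later.
print(quantile_mismatch(100.0, stopped=(120.0, 2.0), reference=(240.0, 2.0)))
```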
The mismatch to failure over all individual wind turbines can be seen in Figure 9a as a histogram. The largest mismatch is 57 days, while the smallest is −40 days. The mean value and median are 11 and 15 days, respectively. The bulk of turbines is concentrated in the quantiles between 3 and 23 days. The combined mismatch accumulates to 782 days.

Discrete Assessment
At its core, the discrete assessment with CM points at each state transition is the same as the initial assessment, but applied at each CM point. As the failure of a component can go through different stages, the operator might want to re-evaluate whenever the component undergoes the next state transition.
The state detection and abstraction are performed as outlined in Section 2.2.2. As the bearing failures in this case study are slowly developing faults, we impose a minimum state length of $|A_s| \ge 5$ days, as in the constraint of Equation (21). This is done to prevent fast-switching states in case of random outliers.
In Figure 10, the cumulative state lengths are shown for each turbine, and for each state transition, the RUL for that state is provided. This gives the operator information on when to expect the next state transition of a damaged component. Remark: as the number of states increases, the models become less well defined, as the confidence bounds widen. This is caused by the limited sample size of turbines that experience three or more state transitions. The expected RUL is obtained by comparing the wind turbine's cumulative state length with the state lengths of wind turbines abstracted with the event pattern $\mathbf{E}_l \Rightarrow \mathbf{E}_l$, where $\mathbf{E}_l$ contains the bearing overtemperature alarm. The mismatch to failure over all individual wind turbines can be seen in Figure 9b as a histogram. The largest mismatch is 38 days, while the smallest is −15 days; the mean value and median are 16 and 17 days, respectively. The bulk of turbines is concentrated in the quantiles between 10 and 20 days. The combined mismatch accumulates to 836 days.

Continuous Assessment
As shown in Figure 7, the continuous assessment is concerned with predicting the RUL at each time step. Focusing on either the first moment, median, or arg max of the Weibull distribution, it becomes apparent that there is a discrepancy between the model's prediction and the expected $\widehat{\mathrm{RUL}}$. This is the aforementioned mismatch δ. The average mismatch to failure over all individual wind turbines can be seen in Figure 9c as a histogram. The largest mismatch is 41 days, while the smallest is −8 days; the mean value and median are 11 and 12 days, respectively. The bulk of turbines is concentrated in the quantiles between 10 and 15 days. The combined mismatch accumulates to 739 days.

Discussion
Common to all assessments is a high count of wind turbine mismatches at the respective mean values. These peaks can contain more than twice as many wind turbines as their neighboring bins, indicating not only the most likely mismatch but also a consistent tendency to stop too early. However, two points are worth mentioning: (i) the different assessments do not yield the same results, and (ii) the accumulated mismatch is based on the naïve O&M assumption.

Discrepancy Between Assessments
The discrepancy between the different assessments stems from the underlying nature of the descriptive statistics. As the initial assessment applies to the total length of the failure, no iterative adjustment occurs in the monitoring effort; thus, wind turbines that are further into a fault process are not represented by the initial assessment. On the other hand, as the name implies, it gives an initial assessment of the expected length of the failure state and its relative probability based on an ensemble of wind turbines.
The discrete assessment provides a step closer to a real representation of the RUL. As the failure state can be split further into shorter states, the RUL and the probability of failure are obtained by propagating non run-to-failure wind turbines through the RUL probabilities for each state transition. Moreover, the discrete assessment can provide a measure of the RUL until the next state transition, helping an operator to know when to stop a wind turbine's operation under more conservative O&M strategies. For both the initial and the discrete assessment, the RUL of individual wind turbines and, from there, the mismatch δ are ensemble measures, i.e., the RUL of a single wind turbine is addressed with respect to a cumulative distribution obtained from all other turbines that underwent the same failure.
The initial and discrete assessments are, at their core, curve-fitting problems for one or more datasets of wind turbine RULs. When the number of CM points approaches the number of sample vectors, S → T, the discrete and continuous assessments can be considered equivalent. While the initial and discrete assessments are ensemble measures, the continuous assessment offers a real-time monitoring approach for individual wind turbines, where the RUL is updated at each successive time step. This makes the prediction of the RUL increasingly reliable the closer a wind turbine is to failure, but it provides poor or misleading assessments early in the fault process, where the predicted distributions are wide [9]. The further a wind turbine progresses into a fault state, the more desirable a continuous assessment becomes.
Altogether, when combined as proposed in Figure 1, the proposed assessments offer an operator information at the desired stages of a main bearing fault.

The Naïve O&M Assumption
The cumulative potential loss of RUL has to be interpreted with caution. Firstly, the naïve O&M assumption disregards all other considerations that might play a role in stopping a turbine, and secondly, wind turbines might continue to operate reliably even if models suggest otherwise. Apart from the predictive error due to model constraints, it becomes apparent that the RUL, i.e., a measure of time, is difficult to relate to the physical parameters of a wind turbine's operation. If a wind turbine is not operated due to a lack of wind or other external factors, predictions will underestimate the RUL in any of the suggested frameworks, no matter how good the model.

Conclusions
We have proposed three fault monitoring and assessment concepts, namely initial assessment, discrete assessment, and continuous assessment, and have shown that in each of these frameworks, wind turbines are stopped earlier than necessary compared with a global ensemble. Based on the descriptive statistics of 67 non-run-to-failure wind turbines and averaged over the three assessments, wind turbines are stopped 13 days too early, accumulating 786 potentially lost days of operation under the naïve O&M assumption made in this study. Future work will likely focus on including supply chain and weather constraints to represent a more realistic model mismatch behavior.

Funding:
This project was funded by Siemens Gamesa Renewable Energy and the University of Southern Denmark.

Acknowledgments:
The authors would like to take the chance to thank Siemens Gamesa Renewable Energy for providing us with the necessary data.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Nomenclature:
$P(s^{(t)} \mid s^{(t-1)}) \equiv P(s^{(t)} \mid s^{(t-1)}, \beta_c)$: independent conditional prior
$P(s^{(t)} \mid \mathbf{x}_{[1,t]})$: probability distribution over the current state
$P(\mathbf{x}_t \mid s^{(t-1)}, \mathbf{x}_{[1,t-1]}) \equiv P(\mathbf{x}_t \mid s^{(t-1)}, \mathbf{x}_{[1,t-1]}, \beta_m)$: sample model
δ: mismatch between RUL and $\widehat{\mathrm{RUL}}$
D: average mismatch