A Reliability Growth Process Model with Time-Varying Covariates and Its Application

: The nonhomogeneous Poisson process model with power law intensity, also known as the Army Materiel Systems Analysis Activity (AMSAA) model, is commonly used to model the reliability growth process of many repairable systems. In practice, it is necessary to test the reliability of the product under different operational environments. In this paper we introduce an AMSAA-based model considering the covariate effects to measure the inﬂuence of the time-varying environmental condition. The parameter estimation of the model is typically performed using maximum likelihood on the failure data. The statistical properties of the estimation in the model are comprehensively derived by the martingale theory. Further inferences including conﬁdence interval estimation and hypothesis tests are designed for the model. The performance and properties of the method are veriﬁed in a simulation study, compared with the classical AMSAA model. A case study is used to illustrate the practical use of the model. The proposed approach can be adapted for a wide class of nonhomogeneous Poisson process based models.


Introduction
A repairable system is a system that can undergo reparation by mending or replacing some system components rather than re-establishing the entire system after failing. Most of the contemporary complex systems are repairable-for instance, communication systems, software systems, automobile engines, helicopters, and aircraft generators all belong to the repairable category. The reliability of some repairable systems can be improved by a test, analyze, and fix process, where system failures are identified and corrective actions are implemented until the reliability reaches a prespecified level. This procedure of improvement is called the reliability growth process, or the reliability growth test.
The reliability growth test is a powerful tool to improve repairable system's reliability by identifying and repairing failure units, which has been widely used in many industries [1]. In the testing, data of failure times are collected to assess the changing trend of the product's reliability growth. A variety of reliability models have been developed to study these testing data. Duane [2] first introduced the reliability growth model for aircraft engines by using the learning curve to describe the relationship between the cumulative failure rate and the cumulative testing time. Crow [3] further delved into the stochastic extension of Duane's model, where failures can be represented as a nonhomogeneous Poisson process (NHPP) with Weibull intensity, which is also called a power law process [4]. The developed model is now known as the Army Materiel Systems Analysis Activity (AMSAA) model and has been popular, with multiple applications in industrial areas including automobiles [5], health care [6], power systems [7], and coal mining [8]. Subsequently, more forms of the NHPP model have been introduced to characterize the time-varying rates, such as the log-linear intensity model [9] and the exponential-trigonometric intensity model [10]. The statistical theory of the classical AMSAA model has been intensively studied as well. Some of these efforts focused on confidence interval estimation [11,12], future reliability estimation [13], first passage time problem [14], model selection [15] and goodness-of-fit test [16].
More recently, an increasing number of extensions of the AMSAA model have been proposed to foster product development. For instance, Somboonsavatdee et al. [17] investigated the statistical inference of the AMSAA model under competing risks; Peng et al. [18] proposed a novel method to evaluate and predict the dynamic reliability of a repairable system subjected to the interval-censoring problem based on the AMSAA model; Hu et al. [19] designed a robust decision method for the planning of reliability growth testing by integrating information gap decision theory and the AMSAA model. A compelling improvement to the AMSAA model is the incorporation of heterogeneity-an inclusion of individualspecific factors or covariates that may affect the reliability growth process. To measure the impact of operational and environmental factors, it is essential to take covariate effects into consideration, and several studies have already displayed this awareness. Van Dyck and Verdonck [20] extended the AMSAA model by introducing a multiplicative covariate factor with the interpretation of a known scaling of the system's failure character. Similarly, Slimacek and Lindqvist [21] introduced a novel approach for the statistical modeling of failures where nonparametric frailty and covariates were considered in the NHPP model. In addition to the NHPP-based models, proportional hazard models are also able to delineate the covariate effects; however, they make no distributional assumptions about the failure process [22].
Among all the contemporary analyses of repairable systems with covariates or environmental effects, the majority are conducted over reliability growth tests for multiple experimental individuals, machines, or systems. However, with the present rising reliability requirements and the increasing product useful life, the cost of conducting reliability growth tests often becomes exorbitant. Because test costs and the availability of test subjects can sometimes be a barrier to the progression of reliability growth tests, it is more practical and economical to conduct tests with a single test subject under different experimental conditions; in other words, it would be more desirable if the covariates were time-dependent and the data were collected from one sample path. These time-varying covariates can include local climatic conditions like temperature and humidity or operational factors like current intensity and rotating speed, which can all be dynamically monitored during the test. To our best knowledge, there have been few comprehensive investigations on the AMSAA model with such test settings. In this paper, we model the data of a reliability growth test in a time-varying environment through an extended AMSAA model.
In a classical AMSAA model, the parameters can be estimated by using maximum likelihood estimation (MLE) with analytic forms, whose exact distributions can be derived as chi-square distributions [3]. However, since there is no analytic form for MLE in the extension, it is challenging to discuss the properties of MLE. Moreover, most NHPP-based works on repairable systems develop estimations and inferences from the classical MLE theory, which is not readily applicable to small sample data with dependence. With these previous limitations in mind, the statistical properties of MLE and further inferences of the model are derived comprehensively from the martingale theory in the present study. The rest of this paper is organized as follows. In Section 2, we establish the AMSAA model with time-varying covariates of environmental effects. The method and property of the parameter estimation are discussed in Section 3. More statistical inference paradigms are developed in Section 4. A comprehensive simulation study and a case study are described in Sections 5 and 6. A final discussion and an overview of the study are presented in Section 7.

AMSAA Model with Time-Varying Covariates
When evaluating the reliability of complex repairable systems with counting process models, each event in the process represents one instance of a failure of the system. The counting process {N(t), t ≥ 0} is said to be a nonhomogeneous Poisson process with intensity function λ(t), t ≥ 0 if it satisfies: 1.
Under the conditions in the definition of NHPP, the distribution of the number of events that occurred in (s, t], N(t) − N(s), can be obtained from: where Let N(t − ) denote the number of events that occurred before time t. If we observe n events in the interval [0, T] and the events occurred at times t 1 , t 2 , · · · , t n , the desired joint probability density function with t 0 = 0 takes the form, Several forms of the intensity function have been used in modeling the process of reliability growth. In the present study, we considered the celebrated power law intensity adopted in AMSAA model, that is, λ(t) = γαt α−1 . The expected event number up to time t is γt α .
When operating the testing machine in environmental or operational conditions that are dynamic, external factors (e.g., temperature and stress) and internal factors (e.g., speed) will influence the reliability growth process of the machine. Now suppose that for the system there is a d-dimensional time-dependent covariate vector X(t) that affects the intensity of events' occurrence. Note that X(t) may be deterministic or random, continuous or discrete about testing time. A deterministic and discrete situation is the most common due to the multi-stage experiment design, where the experimental conditions are changed in each stage. To quantify the reliability growth process with time-dependent covariate effects, some assumptions are made for the model.  Assumption 1 is satisfied in most cases in reality. Usually, X(t) is the monitoring data, which can be observed up to time t. The bounding condition restricts the covariate effect within a reasonable range so that the expected failure will tend to infinity as the operational time tends to infinity. Assumption 2 is also a common assumption that characterizes evolving pattern of a product's reliability with normal use. In Assumption 3, the model we consider for the inclusion of X(t) is of the Cox type. This gives us the intensity and the expectation function with covariate effects respectively: In the model, α, γ, and β are unknown parameters, with α and γ being positive. As a well-known property of the counting process, we have the following martingale property for the model.
is a square integrable martingale with respect to the filtration F t = σ{N(s), s ≤ t}, and its predictable variation process is Λ(t).
Further properties and theories of the counting process and martingale can be found in [23].

Maximum Likelihood Estimation
The maximum likelihood method is considered for estimating the unknown parameters γ, α, and β. The log-likelihood of (2) is given by Then, the estimation values can be obtained by the optimization problem, The first-order partial derivatives of the log-likelihood are The parameter estimation can be computed by setting the partial derivatives equal to zero and solving the equations. However, there is one key problem that requires discussion before the computation-the existence and uniqueness of the solution. We will prove the property by constructing conditional likelihood.
Consider the conditional distribution of t 1 , t 2 , · · · , t n given N(T) = n. The conditional distribution is given by the distribution of the order statistics of the random variable t whose density function is Λ(T) . Note that the distribution function does not contain the parameter γ. As a result, the conditional log-likelihood function is given by We have the following lemma for the conditional log-likelihood. Lemma 1. The maximum likelihood estimator for the log-likelihood (9) exists and is unique with probability 1 as the time tends to infinity. Moreover, the estimator is strongly consistent.
Proof. The problem is equal to discussing the existence and uniqueness of the maximum likelihood estimation of the parameters for n independent and identically distributed random variables with density function is an exponential distribution family with natural parameter α and β. Note that N(T) → ∞ with probability 1 as T → ∞. Then, the property in the lemma is derived by the theory of classical repeated sampling from an exponential family [24].
The estimator in the lemma gives the same solution with that of (5) for α and β. We will prove the existence, uniqueness and consistency for the estimator of full log-likelihood in the following theorem. Theorem 1. The maximum likelihood estimator from (5) exists and is unique with probability 1 as the time tends to infinity. Moreover, the estimator is strongly consistent.
Proof. The estimation forα andβ in optimization (5) is equal to maximizing the conditional likelihood (9), which derives from the direct calculations shown in Appendix A . Due to the analytic form ofγ with α and β given by (10), the existence and uniqueness are held for (5). The strong consistency ofα andβ is given by Lemma 1.

Statistical Inference
In this section, we will discuss the distribution of the maximum likelihood estimation, which is expected to be asymptotically normal in most cases, as well as some inferences for the proposed model.

Asymptotic Normality of the Parameter Estimation
Before discussing normality, we present the martingale property of the score function. (γ, α, β) T by θ. The score function ∂l T ∂θ denoted byl T (θ) is a square integrable martingale with mean zero with respect to the filtration F T = σ{N(s), s ≤ T}.

Property 2. Denote
The property follows from the fact thatl T (θ) can be formulated as . The specific form of H(t) can be extracted from (6), (7), and (8). With this property, the asymptotic normality of the maximum likelihood estimator can now be deduced by standard arguments.

Remark 1. I T (θ)
and J T (θ) are well known as the expected information matrix and observed information matrix, respectively. The concrete forms are displayed in Appendix B. Due to the asymptotic equivalence of these two information matrices, I T (θ) in the theorem can be replaced by J T (θ).

Inference of MTBF
The mean time between failures (MTBF) describes the expected time between two failures for a repairable system, and is a very important indicator in the process of reliability growth. Under the setting of NHPP, the mean time between failures at time t, MTBF t , is 1 λ(t) . With the parameters estimated, the estimation of MTBF t can be given by MTBF t is a continuous and differentiable function ofθ. The distribution of MTBF t can be given by the delta method as: where Then, the 95% confidence interval estimation of MTBF t can be computed in the form by plugging inθ, where q 97.5% is the 97.5th percentile of the standard normal distribution. Note that the MTBF t is a transient index computed under the time dimension of the accelerated reliability growth test, where the accelerated factor is decided by the covariate effects. In practice, we also want to know what the reliability of the machine would be like if there have been Λ(t) faults fixed in normal use; such a reliability can be inferred from the MTBF at t (0) in a standard condition test, where the expected cumulative number of failures at t under covariate effects is equal to the expected cumulative number of failures at t (0) under the standard conditions.
To distinguish between the two MTBF types, we will use MTBF (0) to denote the latter. There exists a measurable map ψ from the test time t with covariate effects to the equivalent time t (0) in the standard condition so that the transformed process denoted by {N(t (0) ), t ≥ 0} is an NHPP with intensity function λ (0) (t (0) ) = γα(t (0) ) α−1 and expectation function Λ (0) (t 0 ) = γ(t (0) ) α . The aforementioned setup yields: Subsequently, the estimated MTBF (0) for the testing time t in the test with covariate effects is The 95% confidence interval is given by where g * (θ) = ∂θ . Furthermore, we generalize the MTBF to the situation where Λ(t) faults have been fixed for the machine under the covariate effects of e β T Y Y(t) , with parameter β Y being known. A new map ψ Y (t) for the testing time is constructed in a similar manner to that of (28), where Note that the map is one-to-one due to the monotonically increasing property of Λ (Y) (·). Therefore, the estimation for MTBF . (32) In addition, the 95% confidence interval can be given by the delta method, as shown in (27) and (30).

Hypothesis Test for Covariate Effects
The inference of the covariate effects can be accomplished by using likelihood ratio tests. Consider the hypothesis test problem, where S ⊆ {1, 2, · · · , d}. The test aims to check whether the covariates in the subset S influence the reliability growth process significantly. The likelihood ratio test is designed to achieve this purpose. The likelihood ratio statistic can be given by where β −S represents the vector excluding β S . By Wilks' theorem, under H 0 , −2 log LR has an asymptotic distribution of χ 2 |S| , where |S| is the cardinal number of the set S. Then, the null hypothesis will be rejected at a significance level of 0.05 if −2 log LR is greater than the 95th percentile of χ 2 |S| , indicating that the covariate effects of X S (t) are significant.

Simulation Study
The functionalities of the proposed model and estimation method are validated in the simulation study. Data of time points when failures happen are simulated by a non-homogeneous Poisson process whose intensity function is defined as (3). The timedependent covariate effect is considered as a step function which occurs when the environmental factor (e.g., temperature) is changed as designed at some points during the test. The values of the step function are defined in Table 1.
The parameters in the model are set as, γ = 1, α = 0.5, and β = 1. The failure points of the process are generated by the thinning approach [27], which applies thinning to the points generated from a homogeneous Poisson process with a rate parameter equal to the maximum of the intensity function. MLE is done for the simulated data and the procedure is repeated 1000 times. The estimation results are summarized in Table 2. From the results, we can conclude that the MLE results are close to the true values, with relatively small mean square errors (MSEs). The standard deviations of the estimates almost perfectly match the theoretical standard deviations computed from the square root of the inverse of the expected information matrix. As a means of comparison, the classical AMSAA model without considering the covariate effects is fitted as well. This model has a mean of 0.66 and a standard deviation of 0.1991 forγ; the mean ofα is 0.5728 with a standard deviation of 0.0263. The deviation between the estimation of the classical AMSAA model and the true parameters (γ, α) in the base intensity is perceptibly large. In order to further compare the performance in fitting the reliability growth trend, the fitted intensity functions are plotted with the means of MLEs in Figure 1. The curve of the intensity estimated by the AMSAA model with covariate effects is within the 95% confidence band, while the curve of the classical AMSAA model deviates considerably large from the true curve. The comparison result again tellingly emphasizes the indispensability of considering covariate effects.  [15,000,100, 000] in different models. The grey area is the 95% confidence band constructed by the true parameters. For the sake of comparison and neatness, the confidence band is constructed around the true curve rather than the two estimated curves. The band width is the same as that of the proposed model, which is computed with the delta method by plugging in the true parameter values.
Another aspect in the simulation study that calls for verification is the normality of the estimators. The quantile-quantile (Q-Q) plot method is employed to show the distributions ofγ,α, andβ, which are presented in Figure 2. As expected, the scattered points are distributed around the line y = x except for the most extreme values, and this pattern indicates a good fit. Therefore, it is plausible to conclude that the estimator is approximately normally distributed in practice. The relationship between MSE, which is computed by the mean square l 2 norm ofθ − θ, and testing time is studied as well. The maximum testing time is increased from 100, 000 to 600, 000 with a step size of 50, 000. The testing time of each phase is increased proportionally. For each testing time, the procedure of simulation and estimation is repeated 1000 times. As seen in Figure 3, MSE decreases with the increase of testing time, which illustrates the consistency of the estimates.

Case Study
The data used in the case study is obtained from a real dataset collected in a reliability growth test during the development phase of an engine. In the test, the engine was tested in four phases with different operational conditions, and the failure times were recorded. Some summary statistics are provided in Table 3. In this particular case, it is impossible to monitor all aspects of the environment. So, the covariate in table is generated from scoring by experts, and is the evaluation for the stress intensity of the environment. More specially, the testers rate the stress intensity of the experimental environment as {0, 1, 2} based on the sensitive parameters of the engine, such as gas pressure, temperature, and rotation speed.
We estimate the parameters by maximizing the log-likelihood. The estimation results are shown in Table 4. The valueα is less than 1, which suggests that the reliability of the machine is improved in the testing procedure. The positiveβ implies that the increase in the stress intensity of the environment accelerates the process of the fault exposure. With the estimated parameters, we estimate MTBF t and its 95% confidence interval at the failure points. The estimation results are plotted in Figure 4. As shown in the figure, MTBF t grows as testing time increases in each phase, and the growth rate is attenuating. The MTBFs for other operational conditions at the end of the test are also calculated through (32). The estimations and their 95% confidence intervals are presented in Table 5. The first row of the table presents the reliability computed in the experiment condition; the other three rows are the transformation results estimated for the testing machine put in each of the remaining operational conditions, respectively. However, due to the short test duration in the present study, the variance of the estimation is relatively large. This limitation can be addressed in the future by incorporating a longer testing time so that a more accurate set of data can be obtained.  In the model, the stress intensity of the experimental environment is reflected by the covariate scored by experts. An alternative method is to treat the stress intensity as a latent variable and incorporate it into β. In this case, we can set X(t) as a 2-dimensional dummy variable, that is, With the set of X(t), the values of β 1 and β 2 in the coefficient vector represent the environmental stress intensity of Phases 2 and 3 during the test, respectively. The estimation results for the alternative model are shown in Table 6. It is shown that the stress intensity of Phase 3, β 2 , is the highest, which is in line with the expert score. Note that β 1 is close to 0; we use the likelihood ratio test (33) to check whether β 1 = 0. The test statistic −2 log LR takes the value 0.0328, which is less than the 95th percentile of χ 2 1 . So, the null hypothesis of β 1 = 0 cannot be rejected. The hypothesis test result fails to correspond to the actual setting of the test, where the environmental stress of Phase 2 is significantly stronger than that of the normal case. The increase in the number of parameters and the absence of experimental information render the model more susceptible to data fluctuations, which leads to greater estimation variance and insignificant estimation results. In future tests, this can be improved by increasing the length of testing time and collecting more monitoring information on the sensitive parameters of the tested engine.

Discussion
In this paper, we investigate an extension of the AMSAA model where the influence of the environment is considered as a time-dependent covariate vector. The covariate effects are incorporated into the AMSAA model as a proportional factor in the intensity function. To estimate the parameters in the NHPP-based model, the maximum likelihood estimation method is adopted. The properties of MLE, including existence, uniqueness, consistency, and asymptotic normality, are proved with several martingale properties of the designed counting process. Based on the MLE properties, the confidence interval estimation of the reliability measurement, mean time between failures, is deduced by the delta method. Moreover, a likelihood ratio test is designed to test whether any covariates influence the reliability growth process significantly. We verify the property and the efficiency of the proposed model in the simulation study, where a comparison is made with the classical AMSAA model. The results of the simulation demonstrate the necessity of taking covariate effects into account. A case study is conducted on the data collected from a multi-phase reliability growth test. We illustrate the practical use of the model by estimating the parameters and the MTBFs for the testing machine.
The proposed model is formulated based on the AMSAA model, in which failures are modeled by an NHPP whose intensity function corresponds to the power law. Nevertheless, the modeling approach and the proofs of its properties can be applied to NHPP models with an exponential family intensity function, which enables our model to be readily extended to these models. Furthermore, the covariate effects can be incorporated into the reliability growth model in more generalized forms in future work. As for application, our model can play an essential role in the design and planning of future reliability growth experiments, especially multi-stage reliability experiments.