Article

Using the Weibull Accelerated Failure Time Regression Model to Predict Time to Health Events

1 Mary MacKillop Institute for Health Research, Australian Catholic University, Melbourne, VIC 3000, Australia
2 College of Medicine and Public Health, Flinders University, Adelaide, SA 5042, Australia
3 Australian Institute of Family Studies, Melbourne, VIC 3006, Australia
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(24), 13041; https://doi.org/10.3390/app132413041
Submission received: 30 October 2023 / Revised: 29 November 2023 / Accepted: 5 December 2023 / Published: 6 December 2023
(This article belongs to the Special Issue Applied Biostatistics for Health Science and Epidemiology)

Abstract

Clinical prediction models are commonly utilized in clinical practice to screen high-risk patients. This enables healthcare professionals to initiate interventions aimed at delaying or preventing adverse medical events. Nevertheless, the majority of these models focus on calculating probabilities or risk scores for medical events. This information can pose challenges for patients to comprehend, potentially causing delays in their treatment decision-making process. Our paper presents a statistical methodology and protocol for the utilization of a Weibull accelerated failure time (AFT) model in predicting the time until a health-related event occurs. While this prediction technique is widely employed in engineering reliability studies, it is rarely applied to medical predictions, particularly in the context of predicting survival time. Furthermore, we offer a practical demonstration of the implementation of this prediction method using a publicly available dataset.

1. Introduction

Clinical prediction models are commonly utilized in clinical practice to screen high-risk patients. This enables healthcare professionals to initiate interventions aimed at delaying or preventing adverse medical events. However, in the realm of the medical literature, most prediction models focus on estimating the probability of an event occurring or a condition developing over a specified time frame. For instance, there are well-known models like the Framingham 10-year risk of general cardiovascular disease [1] and FRAX, a tool for estimating 10-year fracture risk [2]. This information can pose challenges for patients to comprehend, potentially causing delays in their treatment decision-making process. In contrast, in engineering reliability research, it is commonplace to employ Weibull accelerated failure time (AFT) models to predict “time to failure”. This is relevant in scenarios like determining the lifespan of machinery, identifying when a component requires replacement, and optimizing maintenance schedules to enhance overall system reliability [3]. Weibull AFT models also find application in forecasting the shelf life of perishable goods and warranty periods for products [4,5].
This statistical methodology estimates when an event will occur without being restricted to a predefined time frame (e.g., when a component will need replacement, as opposed to its 10-year risk of replacement). Additionally, this statistical approach is not limited to predicting engineering or mechanical events; it may also prove valuable in predicting medical events such as fractures, myocardial infarctions, and fatalities. In this paper, our intention is not to develop and present a prediction tool. Instead, we aim to demonstrate how to utilize the Weibull AFT model and evaluate its accuracy in a medical context.

2. Weibull Distribution

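The Weibull distribution is also referred to as the type III extreme value distribution [6]. This distribution is characterized by three parameters: the location parameter $\mu$, the scale parameter $\rho$, and the shape parameter $\gamma$. The location parameter $\mu$ is typically set as the minimum value in the distribution. In the context of survival or failure analysis, it is common to select $\mu$ as 0, which results in a two-parameter distribution.
The cumulative distribution function (CDF) for a two-parameter Weibull distributed random variable is denoted as:
$$F_T(t;\rho,\gamma) = 1 - \exp\left\{-\left(\frac{t}{\rho}\right)^{\gamma}\right\} \tag{1}$$
where $t \ge 0$, $\rho > 0$, and $\gamma > 0$.
The probability density function (PDF) of the Weibull distribution is given as:
$$f_T(t;\rho,\gamma) = F'(t;\rho,\gamma) = \frac{\gamma}{\rho}\left(\frac{t}{\rho}\right)^{\gamma-1}\exp\left\{-\left(\frac{t}{\rho}\right)^{\gamma}\right\} \tag{2}$$
The survival function of the Weibull distribution is given as:
$$S_T(t) = 1 - F_T(t) = \exp\left\{-\left(\frac{t}{\rho}\right)^{\gamma}\right\} \tag{3}$$
The mean survival time or mean time to failure (MTTF) is given as:
$$
\begin{aligned}
E(T) &= \int_0^{\infty} S(t)\,dt = \int_0^{\infty} \exp\left\{-\left(\frac{t}{\rho}\right)^{\gamma}\right\} dt && \text{let } \left(\frac{t}{\rho}\right)^{\gamma} = u \Rightarrow t = \rho u^{\frac{1}{\gamma}} \\
&= \int_0^{\infty} e^{-u}\,\rho\,\frac{1}{\gamma}\,u^{\frac{1}{\gamma}-1}\,du \\
&= \rho\,\frac{1}{\gamma}\,\Gamma\left(\frac{1}{\gamma}\right) && \text{note } \frac{1}{\gamma}\Gamma\left(\frac{1}{\gamma}\right) = \Gamma\left(\frac{1}{\gamma}+1\right) \\
&= \rho\,\Gamma\left(\frac{1}{\gamma}+1\right)
\end{aligned}
\tag{4}
$$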
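As a quick numerical check of Equation (4), the following R sketch integrates the survival function in Equation (3) and compares the result with the closed form. The values of `rho` and `shape` are arbitrary illustrative assumptions, not estimates from this paper.

```r
# Illustrative (assumed) Weibull parameters; not taken from the paper
rho   <- 8      # scale parameter
shape <- 1.13   # shape parameter (gamma in the text)

# Survival function from Equation (3)
surv <- function(t) exp(-(t / rho)^shape)

# Mean survival time as the area under the survival curve
mttf_numeric <- integrate(surv, lower = 0, upper = Inf)$value

# Closed form from Equation (4): rho * Gamma(1/gamma + 1)
mttf_closed <- rho * gamma(1 / shape + 1)

print(c(numeric = mttf_numeric, closed_form = mttf_closed))
```

The two numbers should agree up to the tolerance of the numerical integration.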

3. Log-Weibull Distribution

The log-Weibull distribution is also known as the Gumbel distribution, or type I extreme value distribution [7].
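Let us consider a random variable $T$, which follows a Weibull distribution $W(\rho,\gamma)$, and we have a one-to-one transformation $Y = \log(T)$ that maps support $\{T = t \mid t > 0\}$ to $\{Y = y \mid -\infty < y < \infty\}$. The inverse of $Y$ is given by:
$$T = g^{-1}(Y) = e^{Y}$$
The Jacobian is calculated as:
$$|J| = \left|\frac{d\,g^{-1}(Y)}{dY}\right| = e^{Y}$$
Using Equation (2), we can derive the PDF of $Y$:
$$f_Y(y) = f_T(g^{-1}(y))\,|J| = \frac{\gamma}{\rho}\left(\frac{e^{y}}{\rho}\right)^{\gamma-1}\exp\left\{-\left(\frac{e^{y}}{\rho}\right)^{\gamma}\right\} e^{y}$$
Simplifying further:
$$f_Y(y) = \frac{\gamma e^{\gamma y}}{\rho^{\gamma}}\exp\left\{-\frac{e^{\gamma y}}{\rho^{\gamma}}\right\} = \gamma e^{\gamma(y-\log\rho)}\exp\left\{-e^{\gamma(y-\log\rho)}\right\}$$
Here, we let $\gamma = \frac{1}{b}$ and $\log\rho = a$:
$$f_Y(y) = \frac{1}{b}\exp\left(\frac{y-a}{b}\right)\exp\left\{-\exp\left(\frac{y-a}{b}\right)\right\}, \quad \text{where } -\infty < y < \infty \tag{5}$$
This demonstrates that the log-Weibull distribution corresponds to a Gumbel distribution $G(a,b)$, where $a = \log\rho$ and $b = \frac{1}{\gamma}$.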
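The CDF $F_Y(y)$ of the log-Weibull distribution can be derived as:
$$F_Y(y) = P(Y \le y) = P(\log(T) \le y) = P(T \le e^{y}) = F_T(e^{y})$$
By Equation (1), we obtain
$$
\begin{aligned}
F_Y(y) = F_T(e^{y}) &= 1 - \exp\left\{-\left(\frac{e^{y}}{\rho}\right)^{\gamma}\right\} = 1 - \exp\left\{-\frac{e^{\gamma y}}{\rho^{\gamma}}\right\} = 1 - \exp\left\{-\frac{e^{\gamma y}}{e^{\gamma\log\rho}}\right\} \\
&= 1 - \exp\left\{-e^{\gamma(y-\log\rho)}\right\} = 1 - \exp\left\{-\exp\left(\frac{y-a}{b}\right)\right\}
\end{aligned}
$$
where $\gamma = \frac{1}{b}$ and $\log\rho = a$.
The survival function $S_Y(y)$ is given by
$$S_Y(y) = 1 - F_Y(y) = \exp\left\{-\exp\left(\frac{y-a}{b}\right)\right\} \tag{7}$$
The hazard function $h_Y(y)$ is given by
$$h_Y(y) = \frac{f_Y(y)}{S_Y(y)} = \frac{1}{b}\exp\left(\frac{y-a}{b}\right)$$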
These equations provide a comprehensive understanding of the log-Weibull distribution and its relationship to the Gumbel distribution, including its PDF, CDF, survival function, and hazard function.
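The correspondence between the Weibull and Gumbel forms can be checked numerically. The R sketch below evaluates $F_T(e^{y})$ with base R's `pweibull` and compares it with the Gumbel CDF derived above; the parameter values are illustrative assumptions only.

```r
# Illustrative (assumed) Weibull parameters; not taken from the paper
rho   <- 8
shape <- 1.13

# Gumbel parameters implied by the derivation: a = log(rho), b = 1/gamma
a <- log(rho)
b <- 1 / shape

y <- seq(-3, 4, by = 0.5)

# CDF of Y = log(T) obtained through the Weibull CDF, F_T(e^y)
cdf_weibull <- pweibull(exp(y), shape = shape, scale = rho)

# CDF of the Gumbel G(a, b) form derived in this section
cdf_gumbel <- 1 - exp(-exp((y - a) / b))

max_abs_diff <- max(abs(cdf_weibull - cdf_gumbel))
print(max_abs_diff)
```

The maximum absolute difference should be at the level of floating-point rounding.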

4. Weibull AFT Regression Model

In the Weibull AFT regression model, let $T$ represent survival time. Consider a random sample of size $n$ from a target population. For each subject $i$ $(i = 1, 2, \ldots, n)$, we have observed values of covariates $x_{i1}, x_{i2}, \ldots, x_{ip}$ and possibly censored survival time $t_i$. The Weibull AFT model can be expressed as:
$$\log(t_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \sigma\epsilon_i = x_i\beta + \sigma\epsilon_i, \quad i = 1, 2, \ldots, n \tag{8}$$
Here, $\beta = (\beta_0, \ldots, \beta_p)$ represents the regression coefficients of interest, $\sigma$ is a scale parameter, and $\epsilon_1, \ldots, \epsilon_n$ are i.i.d. according to a Gumbel distribution with the PDF
$$f_\epsilon(x) = \exp(x)\exp\{-\exp(x)\} \tag{9}$$
and the CDF
$$F_\epsilon(x) = 1 - \exp\{-\exp(x)\} \tag{10}$$
It is important to note that this Gumbel distribution corresponds to a $G(0,1)$ distribution or a standard Gumbel distribution.
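Now, we can derive the PDF of $T$ from Equation (8):
$$\log(T) = x\beta + \sigma\epsilon \;\Rightarrow\; T = e^{x\beta+\sigma\epsilon}$$
$$\epsilon = g^{-1}(T) = \frac{\log(T) - x\beta}{\sigma} \tag{11}$$
$$|J| = \left|\frac{d(g^{-1}(T))}{dT}\right| = \frac{1}{\sigma T} \tag{12}$$
Substituting Equations (11) and (12) into Equation (9), we obtain:
$$
\begin{aligned}
f_T(t) = f_\epsilon(g^{-1}(t))\,|J| &= \exp\left(\frac{\log(t) - x\beta}{\sigma}\right)\exp\left\{-\exp\left(\frac{\log(t) - x\beta}{\sigma}\right)\right\}\frac{1}{\sigma t} \\
&= \left(\frac{t}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\exp\left\{-\left(\frac{t}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\}\frac{1}{\sigma t} \\
&= \frac{1/\sigma}{\exp(x\beta)}\left(\frac{t}{\exp(x\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left\{-\left(\frac{t}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\}
\end{aligned}
\tag{13}
$$
Comparing Equation (13) with Equation (2) and letting $\gamma = \frac{1}{\sigma}$ and $\rho = \exp(x\beta)$, we can see $T$ has a Weibull distribution, $T \sim W(\exp(x\beta), \frac{1}{\sigma})$.
As shown in Equation (3), the survival function of $T \sim W(\exp(x\beta), \frac{1}{\sigma})$ can be written as
$$S_T(t) = \exp\left\{-\left(\frac{t}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\} \tag{14}$$
Referring to Equations (3) and (4), replacing $\rho$ with $\exp(x\beta)$, and replacing $\gamma$ with $\frac{1}{\sigma}$, the expected survival time is given as:
$$E(T) = \exp(x\beta)\,\Gamma(\sigma + 1) \tag{15}$$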
Since most statistical software uses $\log(T)$ to calculate the parameters, let us show the distribution and characteristics of $\log(T)$. Let
$$Y = \log(T) = x\beta + \sigma\epsilon$$
$$\epsilon = g^{-1}(Y) = \frac{Y - x\beta}{\sigma} \tag{16}$$
$$|J| = \left|\frac{d(g^{-1}(Y))}{dY}\right| = \frac{1}{\sigma} \tag{17}$$
Substituting Equations (16) and (17) into Equation (9), we obtain:
$$f_Y(y) = f_\epsilon(g^{-1}(y))\,|J| = \frac{1}{\sigma}\exp\left(\frac{y - x\beta}{\sigma}\right)\exp\left\{-\exp\left(\frac{y - x\beta}{\sigma}\right)\right\} \tag{18}$$
If we compare Equation (18) to Equation (5), we can see $Y$ (i.e., $\log(T)$) has a $G(x\beta, \sigma)$ distribution. We can also observe the use of the error term $\epsilon$, which follows a $G(0,1)$ distribution in Equation (8). This is analogous to the error term in a simple linear regression, which has an $N(0, \sigma^2)$ distribution.
Referring to Equations (13) and (18), we can see that in the Weibull AFT model, $T$ has a Weibull $W(\exp(x\beta), \frac{1}{\sigma})$ distribution, and $\log(T)$ has a Gumbel $G(x\beta, \sigma)$ distribution.
From Equation (7), the survival function of $Y$ (i.e., $\log(T)$) is given as:
$$S_Y(y) = \exp\left\{-\exp\left(\frac{y - x\beta}{\sigma}\right)\right\}$$
and the expectation of $Y$ (i.e., $\log(T)$) is calculated as:
$$E(Y) = x\beta - \sigma\xi$$
where $\xi \approx 0.57721$ is the Euler–Mascheroni constant.
It is important to note that by Jensen’s inequality, $E(\log(T)) \le \log(E(T))$ since $\log(x)$ is a concave function. Therefore, it is not appropriate to use $\exp(x\beta - \sigma\xi)$ to calculate the expected survival time. Equation (15) provides the correct formula for calculating the expected survival time.
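The size of the gap implied by Jensen's inequality can be illustrated with a small R sketch. The linear predictor `xb` and scale `sigma` below are assumed values chosen only for illustration.

```r
# Assumed illustrative values for the linear predictor and the scale
xb    <- 2
sigma <- 0.885

# Euler-Mascheroni constant, obtained as -digamma(1)
euler <- -digamma(1)

# Back-transforming E(log T) underestimates the mean survival time
naive_mean <- exp(xb - sigma * euler)

# Expected survival time from Equation (15)
mttf <- exp(xb) * gamma(sigma + 1)

print(c(naive = naive_mean, mttf = mttf))
```

Under these assumed values the back-transformed quantity is clearly smaller than the MTTF, which is why Equation (15) is used for prediction.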

5. Estimating Weibull AFT Model Parameters

The parameters of the Weibull AFT model can be estimated using the maximum likelihood method. The likelihood function for the observed $\log(t)$ times, $y_1, y_2, \ldots, y_n$, is given by:
$$L(\beta, \sigma; y_i) = \prod_{i=1}^{n} f_Y(y_i)^{\delta_i}\, S_Y(y_i)^{1-\delta_i} \tag{20}$$
Here, $\delta_i$ is the event indicator for the $i$th subject, where $\delta_i = 1$ if an event has occurred, and $\delta_i = 0$ if the observation is censored. The maximum likelihood estimation (MLE) involves estimating $p + 2$ parameters: $\sigma, \beta_0, \beta_1, \ldots, \beta_p$. Taking the natural logarithm of the likelihood function allows the use of the Newton–Raphson method to compute these parameters. Most statistical software packages can perform these calculations.
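To make the likelihood in Equation (20) concrete, here is a minimal R sketch that maximizes it on simulated data. Everything in it is an assumption for illustration: the simulated covariate, the true parameter values, the fixed censoring time, and the use of base R's `optim` with BFGS in place of the Newton–Raphson routine used by statistical packages.

```r
set.seed(1)

# Simulated data under assumed true values (illustration only)
n          <- 2000
x          <- rnorm(n)
beta_true  <- c(2, -0.5)
sigma_true <- 0.8
eps        <- log(rexp(n))            # standard Gumbel errors with CDF 1 - exp(-exp(x))
t_event    <- exp(beta_true[1] + beta_true[2] * x + sigma_true * eps)
c_time     <- 15                      # assumed fixed censoring time
delta      <- as.numeric(t_event <= c_time)
y          <- log(pmin(t_event, c_time))
X          <- cbind(1, x)

# Negative log-likelihood built from f_Y (events) and S_Y (censored)
negloglik <- function(par) {
  beta  <- par[1:2]
  sigma <- exp(par[3])                # optimize log(sigma) so that sigma > 0
  z     <- (y - drop(X %*% beta)) / sigma
  log_f <- -log(sigma) + z - exp(z)   # log of Equation (18)
  log_S <- -exp(z)                    # log of the survival function of Y
  -sum(ifelse(delta == 1, log_f, log_S))
}

fit <- optim(c(mean(y), 0, 0), negloglik, method = "BFGS")
est <- c(beta0 = fit$par[1], beta1 = fit$par[2], sigma = exp(fit$par[3]))
print(round(est, 3))
```

With a sample of this size, the estimates should land close to the assumed true values; fitting the same data with `survreg` from the survival package would be the usual route in practice.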

6. Calculating Expected Survival Time by the Weibull AFT Model

In reliability research, the expected survival time is often referred to as the mean time to failure (MTTF) or mean time between failures (MTBF) [8].
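To predict an individual’s mean survival time $t_i$ using the Weibull AFT model, we first use the MLE method, as described in Equation (20), to calculate the estimates $\hat{\beta}$ and $\hat{\sigma}$. Then, by the invariance property of the MLE, we can directly compute the predicted MTTF using Equation (15):
$$\hat{t}_i = \exp(x_i\hat{\beta})\,\Gamma(\hat{\sigma} + 1)$$
After calculating the MTTF, we can apply the Delta method to establish a confidence interval for the MTTF. This method treats the predicted MTTF as a function of $\hat{\beta}$ and $\hat{\sigma}$. The standard error of the MTTF can be calculated as:
$$SE = \left[\begin{pmatrix}\dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\sigma}}\end{pmatrix}^{t} \Sigma_{\hat{\sigma}\hat{\beta}} \begin{pmatrix}\dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\sigma}}\end{pmatrix}\right]^{\frac{1}{2}} \tag{21}$$
where $\Sigma_{\hat{\sigma}\hat{\beta}}$ is the variance–covariance matrix of $\hat{\beta}$ and $\hat{\sigma}$. It can be estimated by the observed Fisher information of the Weibull AFT model. The $100(1-\alpha)\%$ confidence interval is given as:
$$\hat{t}_i - z_{1-\frac{\alpha}{2}}\,SE < t_i < \hat{t}_i + z_{1-\frac{\alpha}{2}}\,SE \tag{22}$$
Here, $\alpha$ represents the type I error rate, and $z_{1-\frac{\alpha}{2}}$ is the corresponding quantile of the standard normal distribution.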
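The Delta method calculation can be sketched in R for the simplest case of an intercept-only model. The estimates in `theta_hat` and the covariance matrix `Sigma_hat` are hypothetical numbers made up for illustration; they are not results from this paper.

```r
# Hypothetical estimates (beta0, sigma) and covariance matrix, for illustration only
theta_hat <- c(beta0 = 2.0, sigma = 0.9)
Sigma_hat <- matrix(c(0.010, 0.001,
                      0.001, 0.004), nrow = 2, byrow = TRUE)

# MTTF from Equation (15) for an intercept-only model
mttf <- function(theta) exp(theta[[1]]) * gamma(theta[[2]] + 1)

# Analytic gradient: d/d(beta0) = MTTF; d/d(sigma) = MTTF * digamma(sigma + 1)
grad <- c(mttf(theta_hat),
          mttf(theta_hat) * digamma(theta_hat[["sigma"]] + 1))

# Standard error as in Equation (21) and 95% interval as in Equation (22)
se <- sqrt(drop(t(grad) %*% Sigma_hat %*% grad))
z  <- qnorm(0.975)
ci <- mttf(theta_hat) + c(-1, 1) * z * se

print(c(mttf = mttf(theta_hat), se = se, lower = ci[1], upper = ci[2]))
```

With covariates, the gradient with respect to each $\hat{\beta}_j$ is simply the predicted MTTF multiplied by the corresponding covariate value.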

7. Calculating Median Survival Time by the Weibull AFT Model

In survival analysis, another crucial statistic is the median survival time or percentile survival time. The $p$th percentile of the survival time can be computed from the survival function. For an individual $i$, the $p$th percentile of survival time is determined by:
$$S_T(t_i(p)) = \frac{100 - p}{100}$$
For the Weibull AFT model, Equation (14) is used to calculate the $p$th percentile survival time for an individual $i$:
$$S_T(t_i) = \exp\left\{-\left(\frac{t_i}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\} = \frac{100 - p}{100}$$
This leads to the following expression for the estimated $p$th percentile survival time after obtaining $\hat{\beta}$ and $\hat{\sigma}$ using the MLE method:
$$t_i = \left[-\log\left(\frac{100 - p}{100}\right)\right]^{\sigma}\exp(x_i\beta)$$
The calculation of the median survival time corresponds to $p = 50$, which can be specifically determined as:
$$t_i(50) = (\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta})$$
Similarly, we can use the Delta method to calculate the standard error of the predicted $p$th percentile survival time when $p$ is fixed, following the approach detailed in Equations (21) and (22).
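The percentile formula can be cross-checked against base R's Weibull quantile function, using the correspondence shape $= 1/\sigma$ and scale $= \exp(x\beta)$ established earlier. The helper name `pct_time` and the parameter values below are assumptions for illustration.

```r
# pth percentile survival time under the Weibull AFT model (p on the 0-100 scale)
pct_time <- function(p, xb, sigma) {
  (-log((100 - p) / 100))^sigma * exp(xb)
}

# Assumed illustrative values for the linear predictor and the scale
xb    <- 2
sigma <- 0.885
p     <- c(25, 50, 75)

closed_form <- pct_time(p, xb, sigma)

# Same quantiles from the Weibull W(exp(xb), 1/sigma) distribution
via_qweibull <- qweibull(p / 100, shape = 1 / sigma, scale = exp(xb))

print(rbind(closed_form, via_qweibull))
```

For `p = 50` the helper reduces to `log(2)^sigma * exp(xb)`, the median formula given above.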

8. Minimum Prediction Error Survival Time (MPET)

Both mean and median survival time estimates can be biased when a small sample is used, especially in models that incorporate censoring [8]. Henderson et al. proposed a method to find the optimum prediction time with the minimum prediction error [9]. They suggested that if an observed survival time $t$ falls in the interval $p/k < t < kp$, where $p$ is the predicted survival time and $k > 1$, then the prediction should be considered accurate. The probability of prediction error $E_k$ conditional on the predicted time $p$ is given by:
$$P(E_k \mid p) = P(T < p/k) + P(T > kp)$$
Differentiating this probability with respect to $p$ and setting the derivative to zero shows that when
$$f_T(p/k) = k^2 f_T(kp) \tag{24}$$
the probability of prediction error $P(E_k \mid p)$ achieves its minimum value.
Now, let us calculate the minimum prediction error for the Weibull AFT model. Referring to Equation (13), we have:
$$
\begin{aligned}
f_T(p/k) &= \frac{1/\sigma}{\exp(x\beta)}\left(\frac{p/k}{\exp(x\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left\{-\left(\frac{p/k}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\} \\
k^2 f_T(kp) &= k^2\,\frac{1/\sigma}{\exp(x\beta)}\left(\frac{kp}{\exp(x\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left\{-\left(\frac{kp}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\}
\end{aligned}
$$
Substituting the above equations into Equation (24) and canceling the common parts, we obtain:
$$k^{1-\frac{1}{\sigma}}\exp\left\{-\left(\frac{p/k}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\} = k^{1+\frac{1}{\sigma}}\exp\left\{-\left(\frac{kp}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}\right\}$$
We then take the natural logarithm of both sides:
$$\left(1 - \frac{1}{\sigma}\right)\log(k) - \left(\frac{p/k}{\exp(x\beta)}\right)^{\frac{1}{\sigma}} = \left(1 + \frac{1}{\sigma}\right)\log(k) - \left(\frac{kp}{\exp(x\beta)}\right)^{\frac{1}{\sigma}}$$
Rearranging these terms, we can solve for $p$ to calculate the minimum prediction error survival time:
$$p = \left[\frac{\frac{2}{\sigma}\log(k)}{k^{\frac{1}{\sigma}} - k^{-\frac{1}{\sigma}}}\right]^{\sigma}\exp(x\beta) \tag{26}$$
Here, $p$ represents the minimum prediction error survival time. To estimate its standard error, the Delta method can be employed, and bootstrap methods can also be used to obtain a confidence interval for the minimum prediction error survival time.
This approach helps to minimize prediction errors and enhance the accuracy of survival time predictions in the Weibull AFT model, especially when dealing with censored data and small sample sizes.
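The closed form in Equation (26) can be compared with a direct numerical minimization of the prediction error probability. In the R sketch below, the function names `mpet` and `pred_error`, the parameter values, and the search interval are all assumptions for illustration.

```r
# Closed-form MPET from Equation (26)
mpet <- function(xb, sigma, k = 2) {
  ((2 / sigma) * log(k) / (k^(1 / sigma) - k^(-1 / sigma)))^sigma * exp(xb)
}

# Probability of prediction error: P(T < p/k) + P(T > k p)
pred_error <- function(p, xb, sigma, k = 2) {
  pweibull(p / k, shape = 1 / sigma, scale = exp(xb)) +
    pweibull(k * p, shape = 1 / sigma, scale = exp(xb), lower.tail = FALSE)
}

# Assumed illustrative values for the linear predictor and the scale
xb    <- 2
sigma <- 0.885

closed_form <- mpet(xb, sigma)

# One-dimensional numerical search over an assumed interval
numeric_min <- optimize(pred_error, interval = c(0.01, 50),
                        xb = xb, sigma = sigma)$minimum

print(c(closed_form = closed_form, numeric = numeric_min))
```

The numerical minimizer should match the closed form up to the tolerance of `optimize`.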

9. An Example to Predict the Survival Time

We use a publicly available larynx cancer dataset to illustrate the process of making survival time predictions. This dataset consists of records for 90 male larynx cancer patients, each with five variables: the stage of the disease (stage: 1, 2, 3, 4), the time to death or the duration of on-study time in months (time), the age at the diagnosis of larynx cancer (age), the year of diagnosis of larynx cancer (diagyr), and a death indicator (death: 0 = alive, 1 = dead). We added a new variable ID into the dataset and changed the variable name delta to death. The dataset can be downloaded from https://vincentarelbundock.github.io/Rdatasets/datasets.html.
The larynx cancer data are structured as follows: [image of the data listing]
We used two predictor variables to make survival time predictions: the stage of the disease and the age at the diagnosis of larynx cancer. Since the “stage” is a categorical variable, we created three dummy variables for stages 2, 3, and 4, with stage 1 as the default reference group. The survival probability of patients at various stages and time intervals can be observed in the following Kaplan–Meier plot (Figure 1):
The Weibull AFT model can be expressed as follows:
$$\log(T) = \beta_0 + \beta_1\,stage2 + \beta_2\,stage3 + \beta_3\,stage4 + \beta_4\,age + \sigma\epsilon, \quad \epsilon \sim G(0,1)$$
Most statistical software, such as R, can be used to run the Weibull regression model. In R, we can use the following code:
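library(survival)
# Read the larynx data (adjust the path to where the CSV file is saved)
larynx <- read.csv("D:/larynx.csv")
# Fit the Weibull AFT model; dist = "w" selects the Weibull distribution
wr <- survreg(Surv(time, death) ~ factor(stage) + age,
              data = larynx, dist = "w")
summary(wr)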
The following results were obtained from the model:
Call:
survreg(formula = Surv(time, death) ~ factor(stage) + age, data = larynx,
    dist = "w")
                 Value Std. Error      z        p
(Intercept)     3.5288     0.9041  3.903 9.50e-05
factor(stage)2 -0.1477     0.4076 -0.362 7.17e-01
factor(stage)3 -0.5866     0.3199 -1.833 6.68e-02
factor(stage)4 -1.5441     0.3633 -4.251 2.13e-05
age            -0.0175     0.0128 -1.367 1.72e-01
Log(scale)     -0.1223     0.1225 -0.999 3.18e-01
Scale = 0.885
Weibull distribution
Loglik(model) = -141.4   Loglik(intercept only) = -151.1
Chisq= 19.37 on 4 degrees of freedom, p = 0.00066
Number of Newton-Raphson Iterations: 5
n = 90
Suppose we want to predict the survival time for a patient with ID = 46, who is at larynx cancer stage 2 and is 74 years old. We can use the following equations:
1. To calculate the mean time to failure (MTTF):
$$MTTF_{46} = \widehat{E(t_{46})} = \exp(x_i\hat{\beta})\,\Gamma(\hat{\sigma} + 1) = \exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74)\,\Gamma(1.885) = 7.7 \text{ (months)}$$
2. To calculate the median survival time:
$$Median_{46} = (\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta}) = (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) = 5.8 \text{ (months)}$$
3. To calculate the minimum prediction error survival time (MPET) using Equation (26) with a fixed $k = 2$:
$$MPET_{46} = \left[\frac{\frac{2}{\sigma}\log(k)}{k^{\frac{1}{\sigma}} - k^{-\frac{1}{\sigma}}}\right]^{\sigma}\exp(x\beta) = \left[\frac{\frac{2}{0.885}\log(2)}{2^{\frac{1}{0.885}} - 2^{-\frac{1}{0.885}}}\right]^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) = 7.4 \text{ (months)}$$
It seems that these prediction methods yield results quite close to the real survival time of patient ID = 46, which was 6.2 months.
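The three predictions for patient ID = 46 can be reproduced in R from the rounded coefficients printed in the model summary. This is a sketch based on those rounded values, so small differences from calculations on the unrounded fit are expected; the variable names are illustrative.

```r
# Rounded estimates copied from the survreg summary above
beta_hat  <- c(intercept = 3.5288, stage2 = -0.1477, stage3 = -0.5866,
               stage4 = -1.5441, age = -0.0175)
sigma_hat <- 0.885

# Covariate vector for patient 46: intercept, stage 2 indicator, age 74
x46 <- c(1, 1, 0, 0, 74)
xb  <- sum(x46 * beta_hat)

# Mean time to failure, Equation (15)
mttf <- exp(xb) * gamma(sigma_hat + 1)

# Median survival time
median_t <- log(2)^sigma_hat * exp(xb)

# Minimum prediction error survival time, Equation (26), with k = 2
k    <- 2
mpet <- ((2 / sigma_hat) * log(k) /
           (k^(1 / sigma_hat) - k^(-1 / sigma_hat)))^sigma_hat * exp(xb)

print(round(c(MTTF = mttf, Median = median_t, MPET = mpet), 1))
```

Rounded to one decimal place, these agree with the 7.7, 5.8, and 7.4 months reported above.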

10. Calculating the 95% Confidence Interval of the Predicted Time

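First, we used Equation (21) to calculate the standard error of the survival time:
$$
\begin{aligned}
SE &= \left[\begin{pmatrix}\dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\sigma}}\end{pmatrix}^{t} \Sigma_{\hat{\sigma}\hat{\beta}} \begin{pmatrix}\dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\widehat{E(t_i)}}{\partial\hat{\sigma}}\end{pmatrix}\right]^{\frac{1}{2}} \\
&= \left[\begin{pmatrix}\dfrac{\partial\,(\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta})}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\,(\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta})}{\partial\hat{\sigma}}\end{pmatrix}^{t} \Sigma_{\hat{\sigma}\hat{\beta}} \begin{pmatrix}\dfrac{\partial\,(\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta})}{\partial\hat{\beta}} \\[6pt] \dfrac{\partial\,(\log 2)^{\hat{\sigma}}\exp(x_i\hat{\beta})}{\partial\hat{\sigma}}\end{pmatrix}\right]^{\frac{1}{2}} \\
&= \left[\begin{pmatrix}(\log 2)^{\hat{\sigma}}\exp(x\hat{\beta}) \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage2 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage3 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage4 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,age \\ (\log 2)^{\hat{\sigma}}\log(\log 2)\exp(x\hat{\beta})\end{pmatrix}^{t} \Sigma_{\hat{\sigma}\hat{\beta}} \begin{pmatrix}(\log 2)^{\hat{\sigma}}\exp(x\hat{\beta}) \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage2 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage3 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage4 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,age \\ (\log 2)^{\hat{\sigma}}\log(\log 2)\exp(x\hat{\beta})\end{pmatrix}\right]^{\frac{1}{2}}
\end{aligned}
$$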
The variance–covariance matrix $\Sigma_{\hat{\sigma}\hat{\beta}}$ can be calculated by the observed Fisher information of the Weibull AFT model. In most statistical software, this variance–covariance matrix can be computed directly. In R, we used the following code to obtain the $\Sigma_{\hat{\sigma}\hat{\beta}}$ matrix:
wr$var
which produces the following:
            (Intercept)   stage2   stage3  stage4      age Log(scale)
(Intercept)       0.817 -0.09049 -0.08479 -0.0444 -0.01114    0.02591
stage2           -0.090  0.16611  0.05319  0.0507  0.00057    0.00016
stage3           -0.085  0.05319  0.10237  0.0567  0.00042   -0.00731
stage4           -0.044  0.05068  0.05668  0.1320 -0.00020   -0.01070
age              -0.011  0.00057  0.00042 -0.0002  0.00016   -0.00026
Log(scale)        0.026  0.00016 -0.00731 -0.0107 -0.00026    0.01501
Note that in the results above, the last row represents the log(scale), denoted as $\log(\hat{\sigma})$, and what we obtained is the covariance of the $\hat{\beta}$s and $\log(\hat{\sigma})$. For $\Sigma_{\hat{\sigma}\hat{\beta}}$, we needed to change $\log(\hat{\sigma})$ back to $\hat{\sigma}$, which required some extra calculations. To make this adjustment, we can refer to the formulas found on page 401 of Klein and Moeschberger’s book [10]. Our calculations were:
$$
\begin{aligned}
Cov(\beta_0, \sigma) &= Cov(\beta_0, e^{\log(\sigma)}) = Cov(\beta_0, \log(\sigma))\,\sigma = 0.02292735 \\
Cov(\beta_1, \sigma) &= Cov(\beta_1, e^{\log(\sigma)}) = Cov(\beta_1, \log(\sigma))\,\sigma = 0.0001403178 \\
Cov(\beta_2, \sigma) &= Cov(\beta_2, e^{\log(\sigma)}) = Cov(\beta_2, \log(\sigma))\,\sigma = -0.006469443 \\
Cov(\beta_3, \sigma) &= Cov(\beta_3, e^{\log(\sigma)}) = Cov(\beta_3, \log(\sigma))\,\sigma = -0.009470604 \\
Cov(\beta_4, \sigma) &= Cov(\beta_4, e^{\log(\sigma)}) = Cov(\beta_4, \log(\sigma))\,\sigma = -0.0002297781 \\
Var(\sigma) &= Var(e^{\log(\sigma)}) = (e^{\log(\sigma)})^2\,Var(\log(\sigma)) = \sigma^2\,Var(\log(\sigma)) = 0.0117501
\end{aligned}
$$
We replaced the last row and last column of the variance–covariance matrix from R with these six values to obtain the $\Sigma_{\hat{\sigma}\hat{\beta}}$ matrix needed to calculate the standard error in Equation (21).
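The same conversion can be written as a single matrix operation. The R sketch below re-enters the printed `wr$var` values by hand and rescales the log(scale) row and column with a diagonal Jacobian; the object names `V_log`, `J`, and `V_sigma` are illustrative, and because the inputs are rounded the results match the values above only to about four decimal places. With a fitted model one would presumably start from `wr$var` and `wr$scale` directly.

```r
# Variance-covariance matrix on the log(scale) parameterization,
# typed in from the printed wr$var output (rounded values)
V_log <- matrix(c(
   0.817, -0.09049, -0.08479, -0.0444, -0.01114,  0.02591,
  -0.090,  0.16611,  0.05319,  0.0507,  0.00057,  0.00016,
  -0.085,  0.05319,  0.10237,  0.0567,  0.00042, -0.00731,
  -0.044,  0.05068,  0.05668,  0.1320, -0.00020, -0.01070,
  -0.011,  0.00057,  0.00042, -0.0002,  0.00016, -0.00026,
   0.026,  0.00016, -0.00731, -0.0107, -0.00026,  0.01501),
  nrow = 6, byrow = TRUE)

sigma_hat <- 0.885

# Jacobian of (beta, sigma) with respect to (beta, log sigma):
# identity for the betas, sigma for the last element
J <- diag(c(rep(1, 5), sigma_hat))

# First-order (Delta method) covariance on the sigma scale
V_sigma <- J %*% V_log %*% J

print(round(V_sigma[, 6], 5))
```

The last column of `V_sigma` holds the covariances of each $\hat{\beta}_j$ with $\hat{\sigma}$, followed by the variance of $\hat{\sigma}$.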
If we use SAS software (SAS Institute Inc., Cary, NC, USA), we can directly obtain the variance–covariance matrix of $\hat{\beta}$ and $\hat{\sigma}$ by using the following statements:
proc lifereg data=larynx order=data COVOUT outest=est;
class stage;
model time*death(0)=stage age/dist=weibull;
run;
proc print data=est;
run;
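The column vector on the right side of $\Sigma_{\hat{\sigma}\hat{\beta}}$ in Equation (21) can be calculated as follows:
$$
\begin{pmatrix}(\log 2)^{\hat{\sigma}}\exp(x\hat{\beta}) \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage2 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage3 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,stage4 \\ (\log 2)^{\hat{\sigma}}\exp(x\hat{\beta})\,age \\ (\log 2)^{\hat{\sigma}}\log(\log 2)\exp(x\hat{\beta})\end{pmatrix}
=
\begin{pmatrix}(\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \\ (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \times 1 \\ (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \times 0 \\ (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \times 0 \\ (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \times 74 \\ (\log 2)^{0.885}\log(\log 2)\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74)\end{pmatrix}
=
\begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}
$$
Now, with all the necessary components in place, we can calculate the standard error of the median survival time:
$$
SE = \left[\begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}^{t}
\begin{pmatrix}
0.817 & -0.0905 & -0.0848 & -0.0444 & -0.0111 & 0.0229 \\
-0.090 & 0.1661 & 0.0532 & 0.0507 & 0.00057 & 0.00014 \\
-0.0859 & 0.0532 & 0.1024 & 0.0567 & 0.00042 & -0.0065 \\
-0.044 & 0.0507 & 0.0567 & 0.1320 & -0.00020 & -0.0095 \\
-0.011 & 0.00057 & 0.0004 & -0.00020 & 0.00016 & -0.00023 \\
0.023 & 0.00014 & -0.0065 & -0.0095 & -0.00023 & 0.01175
\end{pmatrix}
\begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}\right]^{\frac{1}{2}} = 2.156133
$$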
This calculation yields a standard error of approximately 2.156133. Consequently, the 95% confidence interval for the median survival time is given by:
$$95\%\ CI: (5.83 - 1.96 \times 2.16 < Median_{46} < 5.83 + 1.96 \times 2.16) = (1.60 \text{ to } 10.06) \text{ months}$$
which means we are 95% confident that the median survival time lies between 1.60 and 10.06 months. Alternatively, we can employ the built-in R function predict to estimate the median survival time as follows:
Median46<-predict(wr, newdata=data.frame(stage=2,age=74),type="quantile",
p=0.5,se.fit=TRUE)
Median46
This results in:
$fit
  5.838288
$se.fit
2.095133
The standard error differs slightly from our calculations because R uses Greenwood’s formula to calculate the standard error of the survival function [11].
Note that in R’s built-in predict function for the Weibull AFT model, type = “response” calculates $\exp(x\hat{\beta})$ without considering $\Gamma(1 + \hat{\sigma})$, and type = “lp” computes $x\hat{\beta}$ only; thus, we should not use them to predict MTTF. Additionally, to the best of our knowledge, there is no available software for calculating the minimum prediction error survival time.

11. Assessing Point Prediction Accuracy

Henderson et al. [9], inspired by Parkes [12], introduced a simple approach to assess the accuracy of predicted survival times. Let $t$ represent the observed survival time and $p$ represent the predicted time. If $p/k \le t \le kp$, then the point prediction $p$ is considered “accurate”; otherwise, it is labeled “inaccurate”.
Alternatively, Christakis and Lamont proposed a “33 percent rule” to measure accuracy. In that method, the observed time is divided by the predicted survival time, and a prediction is considered “accurate” if that quotient falls between 0.67 and 1.33. Values less than 0.67 or greater than 1.33 are categorized as “errors” [13]. That method is essentially equivalent to setting  k = 3  in Parkes’s method. For our accuracy assessment, we chose to use  k = 2 . The accuracy rate was defined as the proportion of “accurate” predictions relative to the total sample size. The results are presented in Table 1.
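The accuracy rule is straightforward to express in R. In the sketch below, the helper name `is_accurate` is an assumption, and the three observed and predicted values are taken from the rows for IDs 1, 7, and 13 of Table 1 purely as an illustration of the rule with $k = 2$.

```r
# A prediction is "accurate" if predicted / k <= observed <= k * predicted
is_accurate <- function(observed, predicted, k = 2) {
  observed >= predicted / k & observed <= k * predicted
}

# Observed times and predicted medians for IDs 1, 7, and 13 in Table 1
observed    <- c(0.6, 3.3, 4.3)
median_pred <- c(6.42, 6.54, 5.49)

acc <- is_accurate(observed, median_pred, k = 2)
print(acc)

# Accuracy rate: proportion of accurate predictions in this small subset
print(mean(acc))
```

Applying the same function to all patients and each prediction type would give the accuracy rates discussed in the text.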

12. Discussion

In this paper, we introduced how to use the Weibull AFT model to predict when an event will occur. We utilized mean survival time (mean time to failure, or mean time between failures), median survival time, and minimum prediction error survival time to make predictions about the time from the baseline to the event. We also assessed prediction accuracy using Parkes’s method. When we fixed $k = 2$, the accuracy was 55.6% for the median, 50% for the MTTF, and 51.1% for the MPET. However, by setting $k = 3$, as suggested by Christakis and Lamont, the accuracy rate increased to 77.8%, 66.7%, and 67.8%, respectively. It is worth noting that our sample size was relatively small, and we only used two predictors. With a larger sample and more predictors, the accuracy rate could potentially be even higher. If there are many covariates that could be included in the model, various variable selection methods, such as backward elimination, forward selection, stepwise selection, and all possible subset selection, can be employed. These methods may incorporate different stopping rules, such as p-values, Akaike information criterion (AIC), Bayesian information criterion (BIC), and Mallows’s Cp statistic, to construct clinical prediction models [14]. Additionally, in this sample, we did not observe that the MPET had a significantly better accuracy rate than the median survival time.
Parametric survival models offer advantages in predicting survival time compared to the semiparametric Cox regression model. The Cox regression model, which can be specified as $S_i(t \mid x_i) = S_0(t)^{\exp(x_i\beta)}$, cannot directly predict time. Instead, it requires first specifying a certain period of time and then calculates the probability of an event within that period of time. The lognormal model is an alternative parametric model that can be employed to fit survival data. However, it comes with a drawback: parametric survival models, including the lognormal model, necessitate stronger assumptions compared to semiparametric models. Other models, like the logistic regression model or neural network models, are typically utilized to model binary events, regardless of when those events occurred. Poisson regression models can also be applied to model survival data with count data types (0, 1, 2, 3, and so forth). Nevertheless, similar to the Cox model, a prespecified period of time is required to calculate the probability of an event. The choice of which model to use should be guided by the specific questions we aim to address and the type of available data [15].
Currently, most clinical prediction models calculate a patient’s probability of having or developing a specific disease or risk scores based on these probabilities [16]. However, providing a probability can be challenging to understand for the general population, and probability itself can be defined in various ways [17]. In practice, the time axis remains the most natural measure for both clinicians and patients. Predicting when an event will occur can offer a practical and concrete guide to clinicians and healthcare providers for managing their patients [18]. It can also assist families and patients in making suitable plans for the remaining lifespan.
In this paper, our intention was not to utilize the publicly available larynx cancer dataset for the development of an actual prediction tool. Rather, we employed the dataset to illustrate the application of statistical methods and evaluate point accuracy. Developing a real prediction tool would require a much larger dataset and rigorous internal and external validations. Readers interested in the steps to develop such a tool can refer to the book Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating [19].

Author Contributions

Conceptualization, E.L.; methodology, E.L.; validation, K.L. and R.Y.L.; writing—original draft preparation, E.L.; writing—review and editing, K.L. and R.Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Example data are publicly available at https://vincentarelbundock.github.io/Rdatasets/datasets.html.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sullivan, L.M.; Massaro, J.M.; D’Agostino, R.B., Sr. Presentation of multivariate data for clinical use: The Framingham Study risk score functions. Stat. Med. 2004, 23, 1631–1660. [Google Scholar] [CrossRef] [PubMed]
  2. Vandenput, L.; Johansson, H.; McCloskey, E.V.; Liu, E.; Åkesson, K.E.; Anderson, F.A.; Azagra, R.; Bager, C.L.; Beaudart, C.; Bischoff-Ferrari, H.A.; et al. Update of the fracture risk prediction tool FRAX: A systematic review of potential cohorts and analysis plan. Osteoporos. Int. 2023, 33, 2103–2136. [Google Scholar] [CrossRef] [PubMed]
  3. Ali, J.B.; Chebel-Morello, B.; Saidi, L.; Malinowski, S.; Fnaiech, F. Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network. Mech. Syst. Signal Process. 2015, 56, 150–172. [Google Scholar]
  4. Fu, B.; Labuza, T.P. Shelf-life prediction: Theory and application. Food Control 1993, 4, 125–133. [Google Scholar] [CrossRef]
  5. Li, X.; Lu, W.F.; Zhai, L.; Er, M.J.; Pan, Y. Remaining life prediction of cores based on data-driven and physical modeling methods. In Handbook of Manufacturing Engineering and Technology; Springer: Berlin/Heidelberg, Germany, 2015; pp. 3239–3264. [Google Scholar]
  6. Gorgoso-Varela, J.J.; Rojo-Alboreca, A. Use of Gumbel and Weibull functions to model extreme values of diameter distributions in forest stands. Ann. For. Sci. 2014, 71, 741–750. [Google Scholar] [CrossRef]
  7. Lai, C.-D. Generalized Weibull Distributions; Springer: Berlin/Heidelberg, Germany, 2014; pp. 23–75. [Google Scholar]
  8. Ho, L.; Silva, A. Unbiased estimators for mean time to failure and percentiles in a Weibull regression model. Int. J. Qual. Reliab. Manag. 2006, 23, 323–339. [Google Scholar] [CrossRef]
  9. Henderson, R.; Jones, M.; Stare, J. Accuracy of point predictions in survival analysis. Stat. Med. 2001, 20, 3083–3096. [Google Scholar] [CrossRef] [PubMed]
  10. Klein, J.P.; Moeschberger, M.L. Survival Analysis: Techniques for Censored and Truncated Data; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  11. Collett, D. Modelling Survival Data in Medical Research; Chapman and Hall/CRC: Boca Raton, FL, USA, 2015. [Google Scholar]
  12. Parkes, C.M. Accuracy of predictions of survival in later stages of cancer. Br. Med. J. 1972, 2, 29. [Google Scholar] [CrossRef] [PubMed]
  13. Christakis, N.A.; Smith, J.L.; Parkes, C.M.; Lamont, E.B. Extent and determinants of error in doctors’ prognoses in terminally ill patients: Prospective cohort studyCommentary: Why do doctors overestimate? Commentary: Prognoses should be based on proved indices not intuition. BMJ 2000, 320, 469–473. [Google Scholar] [CrossRef] [PubMed]
  14. Chowdhury, M.Z.; Turin, T.C. Variable selection strategies and its importance in clinical prediction modelling. Fam. Med. Community Health 2020, 8, e000262. [Google Scholar] [CrossRef] [PubMed]
  15. Nardi, A.; Schemper, M. Comparing Cox and parametric models in clinical studies. Stat. Med. 2003, 22, 3597–3610. [Google Scholar] [CrossRef] [PubMed]
  16. Lee, Y.-H.; Bang, H.; Kim, D.J. How to establish clinical prediction models. Endocrinol. Metab. 2016, 31, 38–44. [Google Scholar] [CrossRef] [PubMed]
  17. Saunders, S. What is Probability? In Quo Vadis Quantum Mechanics?; Springer: Berlin/Heidelberg, Germany, 2005; pp. 209–338. [Google Scholar]
  18. Liu, E.; Killington, M.; Cameron, I.D.; Li, R.; Kurrle, S.; Crotty, M. Life expectancy of older people living in aged care facilities after a hip fracture. Sci. Rep. 2021, 11, 20266. [Google Scholar] [CrossRef] [PubMed]
  19. Steyerberg, E.W. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
Figure 1. Kaplan–Meier plot of survival probability.
Table 1. Prediction results and accuracy (last digit in predicted time: 0, inaccurate; 1, accurate).
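| ID | Stage | Age | Death | Time | Median (95% CI) | MTTF | MPET |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 77 | 1 | 0.6 | 6.42 (3.16, 9.68), 0 | 8.47, 0 | 8.11, 0 |
| 2 | 1 | 53 | 1 | 1.3 | 9.77 (4.07, 15.46), 0 | 12.9, 0 | 12.34, 0 |
| 3 | 1 | 45 | 1 | 2.4 | 11.23 (3.07, 19.39), 0 | 14.84, 0 | 14.19, 0 |
| 4 | 1 | 57 | 0 | 2.5 | 9.11 (4.32, 13.89), 0 | 12.03, 0 | 11.5, 0 |
| 5 | 1 | 58 | 1 | 3.2 | 8.95 (4.36, 13.54), 0 | 11.82, 0 | 11.3, 0 |
| 6 | 1 | 51 | 0 | 3.2 | 10.11 (3.88, 16.34), 0 | 13.36, 0 | 12.78, 0 |
| 7 | 1 | 76 | 1 | 3.3 | 6.54 (3.29, 9.78), 1 | 8.62, 0 | 8.25, 0 |
| 8 | 1 | 63 | 0 | 3.3 | 8.2 (4.37, 12.03), 0 | 10.83, 0 | 10.36, 0 |
| 9 | 1 | 43 | 1 | 3.5 | 11.63 (2.72, 20.54), 0 | 15.36, 0 | 14.7, 0 |
| 10 | 1 | 60 | 1 | 3.5 | 8.64 (4.4, 12.89), 0 | 11.41, 0 | 10.91, 0 |
| 11 | 1 | 52 | 1 | 4 | 9.94 (3.98, 15.89), 0 | 13.13, 0 | 12.55, 0 |
| 12 | 1 | 63 | 1 | 4 | 8.2 (4.37, 12.03), 0 | 10.83, 0 | 10.36, 0 |
| 13 | 1 | 86 | 1 | 4.3 | 5.49 (1.96, 9.01), 1 | 7.24, 1 | 6.92, 1 |
| 14 | 1 | 48 | 0 | 4.5 | 10.66 (3.52, 17.79), 0 | 14.08, 0 | 13.47, 0 |
| 15 | 1 | 68 | 0 | 4.5 | 7.52 (4.12, 10.91), 1 | 9.92, 0 | 9.49, 0 |
| 16 | 1 | 81 | 1 | 5.3 | 5.99 (2.63, 9.34), 1 | 7.9, 1 | 7.56, 1 |
| 17 | 1 | 70 | 0 | 5.5 | 7.26 (3.96, 10.56), 1 | 9.58, 1 | 9.16, 1 |
| 18 | 1 | 58 | 0 | 5.9 | 8.95 (4.36, 13.54), 1 | 11.82, 0 | 11.3, 1 |
| 19 | 1 | 47 | 0 | 5.9 | 10.84 (3.38, 18.31), 1 | 14.33, 0 | 13.7, 0 |
| 20 | 1 | 75 | 1 | 6 | 6.65 (3.41, 9.89), 1 | 8.78, 1 | 8.39, 1 |
| 21 | 1 | 77 | 0 | 6.1 | 6.42 (3.16, 9.68), 1 | 8.47, 1 | 8.11, 1 |
| 22 | 1 | 64 | 0 | 6.2 | 8.06 (4.34, 11.77), 1 | 10.64, 1 | 10.18, 1 |
| 23 | 1 | 77 | 1 | 6.4 | 6.42 (3.16, 9.68), 1 | 8.47, 1 | 8.11, 1 |
| 24 | 1 | 67 | 1 | 6.5 | 7.65 (4.19, 11.1), 1 | 10.1, 1 | 9.66, 1 |
| 25 | 1 | 79 | 0 | 6.5 | 6.2 (2.9, 9.5), 1 | 8.18, 1 | 7.83, 1 |
| 26 | 1 | 61 | 0 | 6.7 | 8.49 (4.4, 12.58), 1 | 11.21, 1 | 10.73, 1 |
| 27 | 1 | 66 | 0 | 7 | 7.78 (4.25, 11.31), 1 | 10.27, 1 | 9.83, 1 |
| 28 | 1 | 68 | 1 | 7.4 | 7.52 (4.12, 10.91), 1 | 9.92, 1 | 9.49, 1 |
| 29 | 1 | 73 | 0 | 7.4 | 6.89 (3.65, 10.12), 1 | 9.09, 1 | 8.69, 1 |
| 30 | 1 | 56 | 0 | 8.1 | 9.27 (4.28, 14.26), 1 | 12.24, 1 | 11.71, 1 |
| 31 | 1 | 73 | 0 | 8.1 | 6.89 (3.65, 10.12), 1 | 9.09, 1 | 8.69, 1 |
| 32 | 1 | 58 | 0 | 9.6 | 8.95 (4.36, 13.54), 1 | 11.82, 1 | 11.3, 1 |
| 33 | 1 | 68 | 0 | 10.7 | 7.52 (4.12, 10.91), 1 | 9.92, 1 | 9.49, 1 |
| 34 | 2 | 86 | 1 | 0.2 | 4.73 (0.68, 8.78), 0 | 6.25, 0 | 5.97, 0 |
| 35 | 2 | 64 | 1 | 1.8 | 6.95 (2.37, 11.54), 0 | 9.18, 0 | 8.78, 0 |
| 36 | 2 | 63 | 1 | 2 | 7.07 (2.4, 11.75), 0 | 9.34, 0 | 8.93, 0 |
| 37 | 2 | 71 | 0 | 2.2 | 6.15 (1.96, 10.34), 0 | 8.12, 0 | 7.77, 0 |
| 38 | 2 | 67 | 0 | 2.6 | 6.6 (2.22, 10.97), 0 | 8.71, 0 | 8.33, 0 |
| 39 | 2 | 51 | 0 | 3.3 | 8.72 (2.28, 15.17), 0 | 11.52, 0 | 11.02, 0 |
| 40 | 2 | 70 | 1 | 3.6 | 6.26 (2.03, 10.49), 1 | 8.26, 0 | 7.9, 0 |
| 41 | 2 | 72 | 0 | 3.6 | 6.05 (1.89, 10.2), 1 | 7.98, 0 | 7.63, 0 |
| 42 | 2 | 81 | 1 | 4 | 5.17 (1.13, 9.21), 1 | 6.82, 1 | 6.52, 1 |
| 43 | 2 | 47 | 0 | 4.3 | 9.36 (1.98, 16.73), 0 | 12.36, 0 | 11.82, 0 |
| 44 | 2 | 64 | 0 | 4.3 | 6.95 (2.37, 11.54), 1 | 9.18, 0 | 8.78, 0 |
| 45 | 2 | 66 | 0 | 5 | 6.71 (2.28, 11.15), 1 | 8.86, 1 | 8.48, 1 |
| 46 | 2 | 74 | 1 | 6.2 | 5.84 (1.73, 9.94), 1 | 7.7, 1 | 7.37, 1 |
| 47 | 2 | 62 | 1 | 7 | 7.2 (2.43, 11.97), 1 | 9.51, 1 | 9.09, 1 |
| 48 | 2 | 50 | 0 | 7.5 | 8.88 (2.22, 15.54), 1 | 11.73, 1 | 11.22, 1 |
| 49 | 2 | 53 | 0 | 7.6 | 8.42 (2.38, 14.47), 1 | 11.13, 1 | 10.64, 1 |
| 50 | 2 | 61 | 0 | 9.3 | 7.33 (2.46, 12.2), 1 | 9.67, 1 | 9.25, 1 |
| 51 | 3 | 49 | 1 | 0.3 | 5.83 (2.41, 9.24), 0 | 7.69, 0 | 7.36, 0 |
| 52 | 3 | 71 | 1 | 0.3 | 3.97 (2.19, 5.75), 0 | 5.24, 0 | 5.01, 0 |
| 53 | 3 | 57 | 1 | 0.5 | 5.07 (2.68, 7.45), 0 | 6.69, 0 | 6.4, 0 |
| 54 | 3 | 79 | 1 | 0.7 | 3.45 (1.56, 5.34), 0 | 4.55, 0 | 4.35, 0 |
| 55 | 3 | 82 | 1 | 0.8 | 3.27 (1.32, 5.23), 0 | 4.32, 0 | 4.13, 0 |
| 56 | 3 | 49 | 1 | 1 | 5.83 (2.41, 9.24), 0 | 7.69, 0 | 7.36, 0 |
| 57 | 3 | 60 | 1 | 1.3 | 4.81 (2.68, 6.94), 0 | 6.35, 0 | 6.07, 0 |
| 58 | 3 | 64 | 1 | 1.6 | 4.48 (2.58, 6.39), 0 | 5.92, 0 | 5.66, 0 |
| 59 | 3 | 74 | 1 | 1.8 | 3.76 (1.96, 5.56), 0 | 4.97, 0 | 4.75, 0 |
| 60 | 3 | 72 | 1 | 1.9 | 3.9 (2.12, 5.68), 0 | 5.14, 0 | 4.92, 0 |
| 61 | 3 | 53 | 1 | 1.9 | 5.43 (2.6, 8.27), 0 | 7.17, 0 | 6.86, 0 |
| 62 | 3 | 54 | 1 | 3.2 | 5.34 (2.63, 8.05), 1 | 7.05, 0 | 6.74, 0 |
| 63 | 3 | 81 | 1 | 3.5 | 3.33 (1.4, 5.27), 1 | 4.39, 1 | 4.2, 1 |
| 64 | 3 | 52 | 0 | 3.7 | 5.53 (2.56, 8.49), 1 | 7.3, 1 | 6.98, 1 |
| 65 | 3 | 66 | 0 | 4.5 | 4.33 (2.49, 6.17), 1 | 5.71, 1 | 5.47, 1 |
| 66 | 3 | 54 | 0 | 4.8 | 5.34 (2.63, 8.05), 1 | 7.05, 1 | 6.74, 1 |
| 67 | 3 | 63 | 0 | 4.8 | 4.56 (2.61, 6.51), 1 | 6.02, 1 | 5.76, 1 |
| 68 | 3 | 59 | 1 | 5 | 4.89 (2.69, 7.1), 1 | 6.46, 1 | 6.18, 1 |
| 69 | 3 | 49 | 0 | 5 | 5.83 (2.41, 9.24), 1 | 7.69, 1 | 7.36, 1 |
| 70 | 3 | 69 | 0 | 5.1 | 4.11 (2.32, 5.89), 1 | 5.42, 1 | 5.19, 1 |
| 71 | 3 | 70 | 1 | 6.3 | 4.04 (2.26, 5.82), 1 | 5.33, 1 | 5.1, 1 |
| 72 | 3 | 65 | 1 | 6.4 | 4.41 (2.54, 6.27), 1 | 5.82, 1 | 5.56, 1 |
| 73 | 3 | 65 | 0 | 6.5 | 4.41 (2.54, 6.27), 1 | 5.82, 1 | 5.56, 1 |
| 74 | 3 | 68 | 1 | 7.8 | 4.18 (2.38, 5.98), 1 | 5.52, 1 | 5.28, 1 |
| 75 | 3 | 78 | 0 | 8 | 3.51 (1.64, 5.38), 0 | 4.63, 1 | 4.43, 1 |
| 76 | 3 | 69 | 0 | 9.3 | 4.11 (2.32, 5.89), 0 | 5.42, 1 | 5.19, 1 |
| 77 | 3 | 51 | 0 | 10.1 | 5.63 (2.52, 8.73), 1 | 7.43, 1 | 7.11, 1 |
| 78 | 4 | 65 | 1 | 0.1 | 1.69 (0.77, 2.61), 0 | 2.23, 0 | 2.14, 0 |
| 79 | 4 | 71 | 1 | 0.3 | 1.52 (0.7, 2.34), 0 | 2.01, 0 | 1.92, 0 |
| 80 | 4 | 76 | 1 | 0.4 | 1.4 (0.61, 2.18), 0 | 1.84, 0 | 1.76, 0 |
| 81 | 4 | 65 | 1 | 0.8 | 1.69 (0.77, 2.61), 0 | 2.23, 0 | 2.14, 0 |
| 82 | 4 | 78 | 1 | 0.8 | 1.35 (0.56, 2.13), 1 | 1.78, 0 | 1.7, 0 |
| 83 | 4 | 41 | 1 | 1 | 2.57 (0.3, 4.84), 0 | 3.4, 0 | 3.25, 0 |
| 84 | 4 | 68 | 1 | 1.5 | 1.6 (0.74, 2.47), 1 | 2.12, 1 | 2.03, 1 |
| 85 | 4 | 69 | 1 | 2 | 1.58 (0.73, 2.42), 1 | 2.08, 1 | 1.99, 1 |
| 86 | 4 | 62 | 1 | 2.3 | 1.78 (0.78, 2.79), 1 | 2.35, 1 | 2.25, 1 |
| 87 | 4 | 74 | 0 | 2.9 | 1.44 (0.65, 2.24), 0 | 1.91, 1 | 1.82, 1 |
| 88 | 4 | 71 | 1 | 3.6 | 1.52 (0.7, 2.34), 0 | 2.01, 1 | 1.92, 1 |
| 89 | 4 | 84 | 1 | 3.8 | 1.21 (0.42, 2.01), 0 | 1.6, 0 | 1.53, 0 |
| 90 | 4 | 48 | 0 | 4.3 | 2.28 (0.57, 3.99), 1 | 3.01, 1 | 2.87, 1 |
| Accuracy rate (%) | | | | | 55.6% (50/90) | 50% (45/90) | 51.1% (46/90) |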
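The accuracy rates in the last row of Table 1 are the share of the 90 patients whose prediction carries a trailing flag of 1. The following is a minimal sketch of that tally, not the authors' code: the flag strings are transcribed from the table in ID order, and the names `FLAGS` and `accuracy_rate` are illustrative assumptions.

```python
# Accuracy flags (1 = accurate, 0 = inaccurate) transcribed from Table 1 in ID order.
# Each string concatenates four stage blocks: IDs 1-33, 34-50, 51-77, 78-90.
# FLAGS and accuracy_rate are hypothetical names used only for this sketch.
FLAGS = {
    "Median": (
        "0" * 6 + "1" + "0" * 5 + "1" + "0" + "1" * 19   # stage 1
        + "0" * 6 + "111" + "0" + "1" * 7                # stage 2
        + "0" * 11 + "1" * 13 + "00" + "1"               # stage 3
        + "0000" + "1" + "0" + "111" + "000" + "1"       # stage 4
    ),
    "MTTF": (
        "0" * 12 + "1" + "00" + "11" + "00" + "1" * 14   # stage 1
        + "0" * 8 + "1" + "00" + "1" * 6                 # stage 2
        + "0" * 12 + "1" * 15                            # stage 3
        + "0" * 6 + "1" * 5 + "0" + "1"                  # stage 4
    ),
    "MPET": (
        "0" * 12 + "1" + "00" + "111" + "0" + "1" * 14   # stage 1
        + "0" * 8 + "1" + "00" + "1" * 6                 # stage 2
        + "0" * 12 + "1" * 15                            # stage 3
        + "0" * 6 + "1" * 5 + "0" + "1"                  # stage 4
    ),
}


def accuracy_rate(flags: str) -> tuple[int, int, float]:
    """Return (number accurate, number of patients, percentage accurate)."""
    hits = flags.count("1")
    total = len(flags)
    return hits, total, 100.0 * hits / total


for method, flags in FLAGS.items():
    hits, total, pct = accuracy_rate(flags)
    print(f"{method}: {pct:.1f}% ({hits}/{total})")
```

Run as written, the tallies come to 50/90, 45/90 and 46/90, which agree with the accuracy rates reported in the last row of the table.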

Share and Cite

MDPI and ACS Style

Liu, E.; Liu, R.Y.; Lim, K. Using the Weibull Accelerated Failure Time Regression Model to Predict Time to Health Events. Appl. Sci. 2023, 13, 13041. https://doi.org/10.3390/app132413041