Estimating Covariances and Goodness of Fit Plots for Accelerated Failure Time Models

David J. Olive; Sanjuka Johana Lemonge

doi:10.3390/axioms15010015

¹ School of Mathematical & Statistical Sciences, Southern Illinois University, Carbondale, IL 62901, USA

² Department of Business, Black Hills State University, Spearfish, SD 57799, USA

* Author to whom correspondence should be addressed.

Axioms 2026, 15(1), 15; https://doi.org/10.3390/axioms15010015

This article belongs to the Special Issue Probability Theory and Stochastic Processes: Theory and Applications

Abstract

Let the response variable Y be the time until an event such as death. Assume that there are p predictors $x_{1}, \dots, x_{p}$ and that the response variable is right censored. Several survival regression models, including accelerated failure time models, have the form $Z = \log(Y) = α_{Z} + x^{T} β_{Z} + e$, where $x = (x_{1}, \dots, x_{p})^{T}$. This paper gives a simple method for estimating the covariances $\mathrm{Cov}(x_{i}, Z)$ for some of these models. Plots are given for checking the goodness of fit of accelerated failure time models. Plots for checking the proportional hazards regression model are often harder to use.

Keywords:

Buckley James estimator; proportional hazards regression; Weibull regression

MSC:

62N01; 62N02

1. Introduction

This section reviews some survival regression models, including plots for checking the models. The response variable $Y > 0$ is the time until an event such as death. Let the $p \times 1$ vector of predictor variables be $x = (x_{1}, \dots, x_{p})^{T}$. Let the sufficient predictor be $SP = α + x^{T} β$, and let the estimated sufficient predictor be $ESP = \hat{α} + x^{T} \hat{β}$, where it is possible that $α = \hat{α} = 0$. The ESP is sometimes called the estimated risk score.

Assume that the response variable is right censored, so $Y_{i}$ is not always observed. Instead, the right censored survival time $T_{i} = \min(Y_{i}, W_{i})$ is observed, where $Y_{i}$ is independent of the censoring time $W_{i}$. Also, $δ_{i} = 0$ if $T_{i} = W_{i}$ is censored and $δ_{i} = 1$ if $T_{i} = Y_{i}$ is uncensored. Hence the data are $(T_{i}, δ_{i}, x_{i})$ for $i = 1, \dots, n$.
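The construction of the observed pairs $(T_{i}, δ_{i})$ can be sketched in a few lines of Python. This is only an illustration: the function name `right_censor`, the exponential event and censoring distributions, and the convention that a tie $Y_{i} = W_{i}$ counts as uncensored are assumptions made here, not part of the paper.

```python
import random

def right_censor(event_times, censor_times):
    """Return (T_i, delta_i) pairs with T_i = min(Y_i, W_i)."""
    data = []
    for y, w in zip(event_times, censor_times):
        t = min(y, w)
        # delta = 1 if the event time is observed, 0 if censored.
        # Treating a tie y == w as uncensored is an assumption of this sketch.
        delta = 1 if y <= w else 0
        data.append((t, delta))
    return data

rng = random.Random(0)
Y = [rng.expovariate(1.0) for _ in range(5)]   # hypothetical event times
W = [rng.expovariate(0.5) for _ in range(5)]   # hypothetical censoring times
observed = right_censor(Y, W)                  # list of (T_i, delta_i)
```

Only `observed` and the predictors would be available to the analyst; the uncensored `Y` is known here only because the data are simulated.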

For an accelerated failure time model, the log transformation of the response variable results in a multiple linear regression model. Hence multiple linear regression models will be useful. For the multiple linear regression discussion that follows, let Y denote the response variable, so Y need not be a nonnegative time until an event. A useful multiple linear regression model is $Y | x^{T} β = α + x^{T} β + e$ or

$$Y_{i} = α + x_{i,1} β_{1} + \dots + x_{i,p} β_{p} + e_{i} = α + x_{i}^{T} β + e_{i} \qquad (1)$$

for $i = 1, \dots, n$.

Assume that the $e_{i}$ are independent and identically distributed (iid) with expected value $E(e_{i}) = 0$ and variance $V(e_{i}) = σ^{2}$. In matrix form, this model is

$$Y = X ϕ + e, \qquad (2)$$

where $Y$ is an $n \times 1$ vector of dependent variables, $X$ is an $n \times (p + 1)$ matrix with ith row $(1, x_{i}^{T})$, $ϕ = (α, β^{T})^{T}$ is a $(p + 1) \times 1$ vector, and $e$ is an $n \times 1$ vector of unknown errors. Also $E(e) = 0$ and $\mathrm{Cov}(e) = σ^{2} I_{n}$, where $I_{n}$ is the $n \times n$ identity matrix.

For a multiple linear regression model with heterogeneity, assume model (1) holds with $E(e) = 0$ and $\mathrm{Cov}(e) = Σ_{e} = \mathrm{diag}(σ_{i}^{2}) = \mathrm{diag}(σ_{1}^{2}, \dots, σ_{n}^{2})$, an $n \times n$ positive definite matrix. When the $σ_{i}^{2}$ are known, weighted least squares is often used. Under regularity conditions, the ordinary least squares (OLS) estimator $\hat{ϕ}_{OLS} = (X^{T} X)^{-1} X^{T} Y$ can be shown to be a consistent estimator of $ϕ$.

For estimation with ordinary least squares, let the $p \times p$ covariance matrix of x be $\mathrm{Cov}(x) = Σ_{x} = E[(x - E(x))(x - E(x))^{T}]$ and let the $p \times 1$ vector $\mathrm{Cov}(x, Y) = Σ_{xY} = E[(x - E(x))(Y - E(Y))] = (\mathrm{Cov}(x_{1}, Y), \dots, \mathrm{Cov}(x_{p}, Y))^{T}$. Let

$$\hat{Σ}_{x} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})(x_{i} - \bar{x})^{T} \quad \text{and} \quad \hat{Σ}_{xY} = \frac{1}{n - 1} \sum_{i = 1}^{n} (x_{i} - \bar{x})(Y_{i} - \bar{Y}).$$

Then the OLS estimators for model (1) are $\hat{ϕ}_{OLS} = (X^{T} X)^{-1} X^{T} Y$, $\hat{α}_{OLS} = \bar{Y} - \hat{β}_{OLS}^{T} \bar{x}$, and $\hat{β}_{OLS} = \hat{Σ}_{x}^{-1} \hat{Σ}_{xY}$.

For a multiple linear regression model with iid cases, $\hat{β}_{OLS}$ is a consistent estimator of $β_{OLS} = Σ_{x}^{-1} Σ_{xY}$ under mild regularity conditions, while $\hat{α}_{OLS}$ is a consistent estimator of $E(Y) - β_{OLS}^{T} E(x)$.
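The covariance form of the OLS estimators can be illustrated with a small sketch. For brevity it covers only the case $p = 1$, where $\hat{Σ}_{x}$ and $\hat{Σ}_{xY}$ are scalars; the function name `ols_from_covariances` and the toy data are assumptions made for illustration.

```python
def ols_from_covariances(x, y):
    """OLS intercept and slope for p = 1 via sample (co)variances."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Sample variance of x and sample covariance of (x, Y), divisor n - 1.
    s_xx = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
    beta_hat = s_xy / s_xx               # Sigma_hat_x^{-1} Sigma_hat_xY
    alpha_hat = ybar - beta_hat * xbar   # Ybar - beta_hat * xbar
    return alpha_hat, beta_hat

# Toy data lying exactly on the line Y = 1 + 2x (hypothetical values).
alpha_hat, beta_hat = ols_from_covariances([1.0, 2.0, 3.0, 4.0],
                                           [3.0, 5.0, 7.0, 9.0])
```

For data lying exactly on a line, the sketch recovers that line's intercept and slope up to floating point error.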

For a parametric accelerated failure time (AFT) model,

$$Z_{i} = \log(Y_{i}) = α + β_{A}^{T} x_{i} + σ e_{i} \qquad (3)$$

where the $e_{i}$ are iid from a location scale family. The parameters are estimated by maximum likelihood.

The Weibull proportional hazards regression model or Weibull regression model is $Y | SP \sim W(γ = 1/σ, λ_{0} \exp(SP))$, where Y has a Weibull $W(γ, λ)$ distribution if the probability density function of Y is $f(y) = λ γ y^{γ - 1} \exp[-λ y^{γ}]$ for $y > 0$, $γ > 0$, and $λ > 0$. This regression model can also be fit using the Cox (1972) [1] proportional hazards regression model. Let the sufficient predictor $SP = x^{T} β_{P}$. If $Y | x^{T} β_{P}$ satisfies a Weibull regression model, then $Z = \log(Y) = α + x^{T} β_{A} + σ e$ satisfies a Weibull AFT of the form (3) with $λ_{0} = \exp(-α/σ)$ and $β_{P} = -β_{A}/σ$. Exponential regression is the special case where $σ = 1$.
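The link between the two parameterizations can be checked with a short derivation, sketched here under the Weibull density given above, with $λ = λ_{0} \exp(x^{T} β_{P})$ and survival function $\exp(-λ y^{γ})$. The symbol $μ$ is introduced only for this sketch.

```latex
\begin{align*}
P(Z > z \mid x) &= P(Y > e^{z} \mid x) = \exp(-λ e^{γ z})
   = \exp\!\left[-\exp\!\left(\frac{z - μ}{σ}\right)\right],\\
σ &= 1/γ, \qquad μ = -σ \log(λ) = -σ \log(λ_{0}) - σ\, x^{T} β_{P}.
\end{align*}
```

Matching $μ = α + x^{T} β_{A}$ term by term gives $α = -σ \log(λ_{0})$, that is, $λ_{0} = \exp(-α/σ)$, and $β_{A} = -σ β_{P}$, in agreement with the relations stated above. The error then has the standard smallest extreme value distribution with $P(e > w) = \exp(-e^{w})$.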

Two other important AFTs are (i) the lognormal AFT where $\log(Y) | x^{T} β_{A} \sim N(α + x^{T} β_{A}, σ^{2})$, where the $Y_{i}$ are lognormal and the $e_{i} \sim N(0, 1)$ are normal, and (ii) the loglogistic AFT where $\log(Y) | x^{T} β_{A} \sim L(α + x^{T} β_{A}, σ)$, where the $Y_{i}$ are loglogistic and the $e_{i} \sim L(0, 1)$ are logistic. For the loglogistic AFT, Y follows a proportional odds model. Y does not follow a proportional hazards regression model for the loglogistic and lognormal AFTs.

The Buckley and James (1979) [2] estimator $(\hat{α}_{BJ}, \hat{β}_{BJ})$ is a nonparametric survival regression method for models of the form (3), and is a competitor for the parametric AFTs. When there is no censoring, this estimator is equivalent to the ordinary least squares estimator for multiple linear regression.

Often the log transformation results in a linear model with heterogeneity:

$$Z_{i} = \log(Y_{i}) = α_{Z} + x_{i}^{T} β_{Z} + e_{i} \qquad (4)$$

where the $e_{i}$ are independent with expected value $E(e_{i}) = 0$ and variance $V(e_{i}) = σ_{i}^{2}$. For the AFT and the Buckley James estimator, the variance is constant: $V(e_{i}) = σ^{2}$ does not depend on i.

The Cox (1972) [1] proportional hazards regression model is a semiparametric model with $SP = β_{C}^{T} x$ and hazard function $h_{Y|SP}(t) = \exp(β_{C}^{T} x) h_{0}(t) = \exp(SP) h_{0}(t)$, where the baseline hazard function $h_{0}(t)$ is left unspecified. The survival function is

$$S_{Y|SP}(t) = P(Y > t | SP = β_{C}^{T} x) = [S_{0}(t)]^{\exp(β_{C}^{T} x)} = [S_{0}(t)]^{\exp(SP)} \qquad (5)$$

where $S_{0}(t)$ is the baseline survival function. If $x = 0$ is within the range of the predictors, then the baseline survival and hazard functions correspond to the survival and hazard functions of a case with $x = 0$. The cumulative hazard function is $H_{Y|SP}(t) = -\log(S_{Y|SP}(t))$ for $t > 0$, and the hazard function is $h_{Y|SP}(t) = \frac{d}{dt} H_{Y|SP}(t)$ for $t > 0$. High hazard implies short survival times, while low hazard implies long survival times.
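As a numerical illustration of Equation (5), the following sketch evaluates the Cox survival function at a single time point and confirms that the cumulative hazard is scaled by $\exp(SP)$. The baseline survival value 0.8 and the function names are hypothetical choices for illustration.

```python
import math

def cox_survival(baseline_survival, sp):
    """S_{Y|SP}(t) = [S_0(t)]^{exp(SP)} for a given value of S_0(t)."""
    return baseline_survival ** math.exp(sp)

def cumulative_hazard(survival):
    """H(t) = -log S(t)."""
    return -math.log(survival)

s0 = 0.8   # hypothetical baseline survival probability at some fixed t
ratios = {}
for sp in (-1.0, 0.0, 1.0):
    s = cox_survival(s0, sp)
    # Under the Cox model, H_{Y|SP}(t) / H_0(t) equals exp(SP).
    ratios[sp] = cumulative_hazard(s) / cumulative_hazard(s0)
```

A larger SP gives a larger cumulative hazard and hence a smaller survival probability at the same t.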

The literature for checking the goodness of fit of the proportional hazards model is fairly large. See, for example, Arjas (1988) [3], Feng et al. (2017) [4], Gill and Schumaker (1987) [5], Kay (1984) [6], Lin and Wei (1991) [7], Marzec and Marzec (1997) [8], Ng’andu (1997) [9], Quantin et al. (1996) [10], and Yu et al. (2008) [11].

Grambsch and Therneau (1994) [12] give a useful graphical check. Suppose the ith case had an uncensored survival time $t_{i}$. Let the scaled Schoenfeld residual for the ith observation and jth variable $x_{j}$ be $r_{P,j}^{*}(t_{i})$. For each variable, plot the $t_{i}$ versus the $r_{P,j}^{*}(t_{i}) + \hat{β}_{j}$ and add the loess curve. If the loess curve is approximately horizontal for each of the p plots, then the proportional hazards assumption is reasonable. Alternatively, fit a line to each plot and test that each of the p slopes is equal to 0. The R function cox.zph makes both the plots and tests.

2. Materials and Methods

2.1. Estimating $Σ_{x Z}$ for Some Censored Survival Regression Models

This subsection derives an estimator for $Σ_{xZ} = \mathrm{Cov}(x, Z)$ when the $Z_{i}$ are right censored and hence not all observed. Let the ordinary least squares (OLS) estimator be $\hat{β}_{OLS}$. Assume that the cases $(x_{i}, Y_{i})$ are iid. Since model (4) is a multiple linear regression model, under mild regularity conditions, the population parameter vectors are equal: $β_{Z} = β_{OLS} = Σ_{x}^{-1} Σ_{xZ}$. Thus $Σ_{xZ} = \mathrm{Cov}(x) β_{Z} = Σ_{x} β_{Z}$. When the response $Y_{i}$ is censored, several survival regression models give consistent estimators $\hat{β}_{Z}$ of $β_{Z}$. Hence

$$\hat{Σ}_{xZ} = \hat{Σ}_{x} \hat{β}_{Z}. \qquad (6)$$
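A minimal sketch of the plug-in estimator in Equation (6) follows. It assumes that a slope estimate $\hat{β}_{Z}$ has already been obtained from some survival regression fit; the values in `beta_hat_z`, the toy predictor matrix, and the function names are hypothetical.

```python
def sample_cov_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of an n x p list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]

def plug_in_cov_xz(rows, beta_hat):
    """Equation (6): Sigma_hat_xZ = Sigma_hat_x times beta_hat."""
    sigma_x = sample_cov_matrix(rows)
    return [sum(s_jk * b_k for s_jk, b_k in zip(row, beta_hat))
            for row in sigma_x]

# Hypothetical predictors (uncensored) and hypothetical fitted slopes.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
beta_hat_z = [-0.5, 0.25]
cov_xz = plug_in_cov_xz(X, beta_hat_z)   # one entry per predictor
```

Only the predictors and the fitted slopes enter the computation, so censoring of the response affects the result solely through $\hat{β}_{Z}$.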

The Buckley James estimator is equivalent to the OLS estimator when there is no censoring, and censoring does not change the population parameter. Although the population parameters are the same, the estimators are very different. The OLS estimator is not a consistent estimator of $β_{Z} = β_{OLS}$ when censoring occurs. Hence $\hat{Σ}_{x} \hat{β}_{OLS}$ is not a consistent estimator of $Σ_{xZ}$ if the response variable is censored.

If an accelerated failure time model is used, then two estimators are $\hat{Σ}_{xZ}(A) = \hat{Σ}_{x} \hat{β}_{A}$ and $\hat{Σ}_{xZ}(B) = \hat{Σ}_{x} \hat{β}_{BJ}$. These two estimators require consistent estimators of $β_{Z} = Σ_{x}^{-1} Σ_{xZ}$, which occurs, for example, if the cases $(x_{i}, Y_{i})$ are iid from some population with covariance matrix $Σ_{x}$ and covariance vector $Σ_{xZ}$. The survival times $Y_{i}$ can be right censored, but the predictor variables $x_{1}, \dots, x_{p}$ are not censored. Note that the predictor variables that have the highest absolute correlation with Z have the highest values of $|\widehat{\mathrm{Cov}}(x_{i}, Z)| / \sqrt{\hat{V}(x_{i})}$, where $x_{i}$ here denotes the ith predictor.
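The reason this ratio ranks the predictors by absolute correlation can be written out in one line. The following is a sketch that uses only the definition of correlation, with $x_{i}$ denoting the ith predictor; $\hat{V}(Z)$ stands for any estimate of the variance of Z and need not be computed.

```latex
\[
\widehat{\mathrm{Cor}}(x_{i}, Z)
  = \frac{\widehat{\mathrm{Cov}}(x_{i}, Z)}{\sqrt{\hat{V}(x_{i})}\,\sqrt{\hat{V}(Z)}}
\quad\Longrightarrow\quad
|\widehat{\mathrm{Cor}}(x_{i}, Z)| \;\propto\;
  \frac{|\widehat{\mathrm{Cov}}(x_{i}, Z)|}{\sqrt{\hat{V}(x_{i})}} .
\]
```

Since the factor $\sqrt{\hat{V}(Z)}$ is the same for every predictor, the ordering of the predictors does not depend on it, which is convenient because the variance of Z is not directly estimable from the censored responses.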

Although this technique is a simple plug in estimator, to our knowledge, the technique has not been suggested previously for survival regression models. In the literature, there are several estimators for the correlation $\mathrm{Cor}(X, Y)$, where X and Y are survival times. These estimators usually use maximum likelihood estimation or multiple imputation assuming that $(X, Y)$ are iid from a bivariate normal distribution. See, for example, Barchard and Russell (2024) [13], Li et al. (2018) [14], and Lyles, Fan, and Chuachoowong (2001) [15]. For covariance estimation, see references in Pesonen, Pesonen, and Nevalainen (2015) [16].

2.2. The EE Plot

Before using Equation (6) with a parametric AFT model, it is important to check that the model is reasonable by comparing it with the Buckley James estimator. Make an EE plot of $ESPBJ = x^{T} \hat{β}_{BJ}$ versus $ESPA = x^{T} \hat{β}_{A}$. For the Weibull AFT, also plot $ESPPH = -\hat{σ} x^{T} \hat{β}_{P}$ versus the above two ESPs, where PH stands for the Cox proportional hazards estimator. The plotted points in the EE plot should scatter tightly about the identity line with zero intercept and unit slope. Lack of fit is suggested if the plotted points do not cluster tightly about the identity line. The identity line is included in the EE plots as a visual aid. Suppose the two estimators of $β = β_{Z}$ are consistent. Then as $n \to \infty$, the correlation of the plotted points goes to 1 in probability for any finite interval, e.g., from the 1st percentile to the 99th percentile of $x^{T} \hat{β}_{BJ}$.
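The quantities behind an EE plot can be sketched numerically. The code below computes two ESPs from two slope estimates and summarizes how closely the points follow the identity line. The summary statistics (sample correlation and maximum vertical deviation) and all names are choices made for this illustration; the paper itself relies on visual inspection of the plot.

```python
import math

def esp(rows, beta_hat):
    """x^T beta_hat for each case (no intercept, as in the EE plot)."""
    return [sum(xj * bj for xj, bj in zip(r, beta_hat)) for r in rows]

def correlation(u, v):
    """Pearson sample correlation of two equal-length lists."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def ee_summary(rows, beta_1, beta_2):
    """Correlation of the two ESPs and largest deviation from the identity line."""
    e1, e2 = esp(rows, beta_1), esp(rows, beta_2)
    max_dev = max(abs(a - b) for a, b in zip(e1, e2))
    return correlation(e1, e2), max_dev

X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]   # hypothetical predictors
beta_bj = [-0.50, 0.25]                                # hypothetical BJ slopes
beta_a = [-0.48, 0.27]                                 # hypothetical AFT slopes
corr, max_dev = ee_summary(X, beta_bj, beta_a)
```

A correlation near 1 together with a small maximum deviation corresponds to points that cluster tightly about the identity line. A high correlation alone is not enough, since points can be highly correlated along a line with a different slope or intercept.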

For the Exponential regression model, $σ = 1$ and $β_{P} = -β_{A}$. Since the Exponential regression model with $\hat{β}_{E}$ is a special case of the Weibull regression model with $\hat{β}_{W}$, plots can use $ESPE = x^{T} \hat{β}_{E}$, $ESPBJ$, $ESPPH$ or $-x^{T} \hat{β}_{P}$, and $ESPA = ESPW$, where $A = W$ denotes the Weibull regression, which estimates σ by $\hat{σ}$.

The EE plots are much easier to use than other plots in the literature, since tight clustering about the identity line suggests goodness of fit.

3. Some Other Plots

Let $\log(T_{i}) = \hat{α} + \hat{β}_{A}^{T} x_{i} + r_{i}$, where the $r_{i}$ are residuals. Collett (2003, p. 231) [17] defines a standardized residual $r_{Si} = r_{i} / \hat{σ}$. For accelerated failure time models, a log censored response (LCR) plot is a plot of $\hat{α} + \hat{β}_{A}^{T} x_{i}$ versus $\log(T_{i})$ on the vertical axis, with plotting symbol 0 for censored cases and + for uncensored cases. The identity line with unit slope and zero intercept is added to the plot, and the vertical deviations from the identity line are the residuals $r_{i}$. This plot is useful to check for cases with unusual survival times. The plot is also useful for checking for linearity (which is often rather weak due to the censored response and skewed data).
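The coordinates and plotting symbols of an LCR plot can be sketched as follows. The function name `lcr_points` and its argument layout are assumptions; `fitted` stands for the values $\hat{α} + \hat{β}_{A}^{T} x_{i}$ from whatever AFT fit is being checked.

```python
import math

def lcr_points(times, deltas, fitted):
    """Return (ESP, log T, residual, symbol) for each case of an LCR plot."""
    points = []
    for t, d, f in zip(times, deltas, fitted):
        log_t = math.log(t)
        symbol = "+" if d == 1 else "0"   # '+' uncensored, '0' censored
        # The residual is the vertical deviation from the identity line.
        points.append((f, log_t, log_t - f, symbol))
    return points

# Hypothetical observed times, censoring indicators, and fitted values.
pts = lcr_points([1.0, 20.0, 55.0], [0, 1, 1], [0.5, 3.2, 3.9])
```

Cases with large absolute residuals, especially uncensored ones, are the candidates for unusual survival times.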

Ritov (1990) [18] and Tsiatis (1990) [19] consider the censored survival regression model $Z = t(Y) = α_{Z} + x^{T} β_{Z} + e$, where t is a known monotone function. Then a plot of $\hat{α}_{Z} + x_{i}^{T} \hat{β}_{Z}$ versus $t(T_{i})$ is useful. A power transformation has the form $W = t_{λ}(T) = T^{λ}$ for $λ \neq 0$ and $W = t_{0}(T) = \log(T)$ for $λ = 0$, where $λ \in Λ_{L} = \{-1, -1/2, -1/3, 0, 1/3, 1/2, 1\}$. The modified power transformation family is

$$t_{λ}(T_{i}) \equiv T_{i}^{(λ)} = \frac{T_{i}^{λ} - 1}{λ} \qquad (7)$$

for $λ \neq 0$ and $T_{i}^{(0)} = \log(T_{i})$.
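The two transformation families translate directly into code. In this sketch the function names and the constant `LAMBDA_GRID` (the set $Λ_{L}$) are names chosen here for illustration.

```python
import math

# The grid Lambda_L of candidate powers.
LAMBDA_GRID = (-1, -1/2, -1/3, 0, 1/3, 1/2, 1)

def power_transform(t, lam):
    """t_lambda(T) = T^lambda for lambda != 0 and log(T) for lambda = 0."""
    return math.log(t) if lam == 0 else t ** lam

def modified_power_transform(t, lam):
    """Equation (7): (T^lambda - 1) / lambda, with log(T) at lambda = 0."""
    return math.log(t) if lam == 0 else (t ** lam - 1.0) / lam

# Transformed values of one hypothetical survival time over the whole grid.
transformed = [modified_power_transform(4.0, lam) for lam in LAMBDA_GRID]
```

The modified family is increasing in T for every λ, including negative powers, and approaches $\log(T)$ as λ tends to 0, which makes the resulting plots easier to compare across the grid.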

A graphical method for response transformations makes the transformation plot of $\hat{α} + x_{i}^{T} \hat{β}$ versus $t_{λ}(T_{i})$ using the Buckley James estimator with $W = t_{λ}(T)$ as the “response” for $λ \in Λ_{L}$. Then a candidate response transformation $Z = t_{λ^{*}}(T)$ is one whose transformation plot suggests that a linear model is appropriate.

Smith (2002, p. 191) [20] notes that the Buckley James estimator replaces the censored $T_{i}$ by their estimated conditional expected values, resulting in a renovated response variable $T_{i}^{*}$. Using the $T_{i}^{*}$ instead of the $T_{i}$ may increase the linearity of the plot.

The slice survival plot divides the ESP into J groups of roughly the same size. For each group j with $n_{j}$ cases, the model estimated survival function $\hat{S}_{j}(t)$ is computed using the x corresponding to the “median ESP” of the group (the kth order statistic of the ESP in group j, where $k = 1 + \lfloor (n_{j} - 1)/2 \rfloor$). Let $\hat{S}_{KMj}(t)$ be the Kaplan Meier estimator computed from the survival times $(T_{i}, δ_{i})$ in the jth group. For each group, $\hat{S}_{j}(t)$ is plotted and $\hat{S}_{KMj}(t_{i})$ is plotted as circles at the uncensored event times $t_{i}$. The survival regression model is reasonable if the circles “track $\hat{S}_{j}$ well” in each of the J plots.
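The bookkeeping behind a slice survival plot can be sketched as follows: order the cases by ESP, split them into J slices, pick the “median ESP” case of each slice, and compute a Kaplan Meier curve per slice. This is a minimal sketch with hypothetical function names; it uses the standard product-limit formula and does not compute the model based curve $\hat{S}_{j}(t)$, which depends on the fitted model.

```python
def kaplan_meier(times, deltas):
    """Return [(t, S_hat(t))] at the distinct uncensored event times."""
    event_times = sorted({t for t, d in zip(times, deltas) if d == 1})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, d in zip(times, deltas) if u == t and d == 1)
        surv *= 1.0 - events / at_risk      # product-limit update
        curve.append((t, surv))
    return curve

def slice_by_esp(esp, n_slices):
    """Split case indices, ordered by ESP, into groups of similar size."""
    order = sorted(range(len(esp)), key=lambda i: esp[i])
    n = len(order)
    bounds = [(j * n) // n_slices for j in range(n_slices + 1)]
    return [order[bounds[j]:bounds[j + 1]] for j in range(n_slices)]

def median_esp_index(group):
    """Case index of the k-th order statistic, k = 1 + floor((n_j - 1) / 2)."""
    k = 1 + (len(group) - 1) // 2
    return group[k - 1]                     # group is already sorted by ESP

# Hypothetical ESPs, observed times, and censoring indicators.
esp_values = [0.3, -1.0, 2.0, 0.1]
groups = slice_by_esp(esp_values, 2)
km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In a full implementation, the Kaplan Meier values of each slice would be drawn as circles on top of the model curve evaluated at the x of the case returned by `median_esp_index`.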

If the slice widths go to zero, but the number of cases per slice increases to ∞ as $n \to \infty$, then the Kaplan Meier estimator and the model estimator converge to $S_{Y|SP}(t)$ if the model holds. Simulations suggest that the two survival functions are “close” for moderate n and nine slices. For small n and skewed predictors, some slices may be too wide in that the model is correct but $\hat{S}_{KMj}(t)$ is not a good approximation of $S_{Y|SP}(t)$, where SP corresponds to the x used to compute $\hat{S}_{j}(t)$.

For the Cox model, if pointwise confidence interval (CI) bands are added to the plot, then $\hat{S}_{KMj}$ “tracks $\hat{S}_{j}$ well” if most of the plotted circles do not fall very far outside the pointwise CI bands, since these pointwise bands are not as wide as simultaneous bands. Collett (2003, pp. 241–243) [17] places several observed Kaplan Meier curves with fitted curves on the same plot.

Survival regression is the study of the conditional survival $S_{Y|SP}(t)$, and the slice survival plot is a useful tool for visualizing $S_{Y|SP}(t)$ in the background of the data. Suppose the jth slice is narrow so that $ESP \approx w_{j}$. If the model is reasonable, $ESP \approx SP$, and the number of uncensored cases in the jth slice is not too small, then

$$S_{Y|SP = w_{j}}(t) \approx \hat{S}_{j}(t) \approx \hat{S}_{KMj}(t).$$

(These quantities approximate $[\hat{S}_{0}(t)]^{\exp(w_{j})}$ for the Cox model.) Thus the nonparametric Kaplan Meier estimator is used to check the model estimator $\hat{S}_{j}(t)$ in each slice.

The slice survival plot tailored to the Cox model is closely related to the May and Hosmer (1998) [21] test. Also, van Houwelingen et al. (2006) [22] use similar ideas but place the J Kaplan Meier curves on one plot and the J Cox survival curves on another plot. The ESP is a scalar while x is a $p \times 1$ vector. Using the ESP instead of x in plots is an important dimension reduction technique (and is similar to using a scalar valued minimal sufficient statistic instead of the p-dimensional sufficient statistic x). Plots have been suggested by several authors with x divided into J groups instead of the ESP. For example, see Miller (1981, p. 168) [23]. Hosmer and Lemeshow (1999, pp. 141–145) [24] suggest making plots based on the quartiles of the ith predictor $x_{i}$, and note that a problem with Cox survival curves, $\hat{S}_{x}(t) = [\hat{S}_{0}(t)]^{\exp(\hat{β}_{C}^{T} x)} = [\hat{S}_{0}(t)]^{\exp(ESP)}$, is that they may use inappropriate extrapolation. Using the ESP results in narrow slices with many cases, and adding Kaplan Meier curves shows whether there is extrapolation.

3.1. Examples and Simulations

Example 1.

The ovarian cancer data are from Collett (2003, pp. 187–190) [17] and Edmunson et al. (1979) [25]. The response variable is the survival time of $n = 26$ ovarian cancer patients in days, with predictors age in years and treat (1 for cyclophosphamide alone and 2 for cyclophosphamide combined with adriamycin). See Figure 1 for the three EE plots for the ovarian cancer data, where ESPW = ESPA. The Weibull AFT appears to be appropriate for this data set. Then $\widehat{\mathrm{Cov}}(age, Z) = -0.1286$, $\widehat{\mathrm{Cov}}(treat, Z) = -7.90408$, $\widehat{\mathrm{Cov}}(age, Z) / \sqrt{\hat{V}(age)} = -0.2522$, and $\widehat{\mathrm{Cov}}(treat, Z) / \sqrt{\hat{V}(treat)} = -0.7840$. Hence $|\widehat{\mathrm{Cor}}(treat, Z)| \approx 3 |\widehat{\mathrm{Cor}}(age, Z)|$. Figure 2 shows the EE plots when an Exponential AFT is used instead of the Weibull AFT. Now the plotted points do not cluster about the identity line. Hence the Exponential AFT should not be used. Figure 3 shows the LCR plot using the Weibull AFT.

Figure 1. Three EE plots for the ovarian cancer data.

Figure 2. Three EE plots with the exponential AFT.

Figure 3. LCR plot for ovarian cancer data.

Example 2.

R contains a data set nwtco where the response variable Y is the time until relapse with $n = 4028$. The model used predictors histol = tumor histology from central lab, instit = tumor histology from local institution, age in months, and stage of disease from 1 to 4 (treated as a continuous variable). In Figure 4, the Grambsch and Therneau (1994) [12] plots suggest that the Cox model is not valid since not all of the loess curves are flat, and the global test has p-value $\approx 0$. The slice survival plot in Figure 5 shows that the Cox survival estimators and Kaplan Meier estimators are nearly identical in the six slices, suggesting that the Cox model is a reasonable approximation to the data. The greatest contributors to lack of fit seem to be the predictors age and stage corresponding to the bottom two plots of Figure 4, and survival for small ESP corresponding to the upper left plot in Figure 5.

Figure 4. Grambsch and Therneau plots for NWTCO data.

Figure 5. Slice survival plot for NWTCO data: dashes are pointwise CI bands.

3.2. ${\hat{Σ}}_{x Z}$ Simulation

R code similar to that of Zhou (2001) [26] was used to generate a Weibull regression data set with parameter vector $β_{P}$. Then the Weibull AFT parameter vector is $β = β_{Z} = β_{A} = -σ β_{P} = -(1/γ) β_{P}$. Hence $Σ_{xZ} = Σ_{x} β_{A} = -(1/γ) \mathrm{Cov}(x) β_{P}$. The simulation used $β_{A} = -(1/γ, \dots, 1/γ, 0, \dots, 0)^{T}$ with $p - k$ zeroes and $β_{P} = (1, \dots, 1, 0, \dots, 0)^{T}$ with k ones and $p - k$ zeroes. The population $Σ_{xZ} = Σ_{x} β_{A}$ was computed.
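One way to generate a right censored Weibull regression case is by inverting the survival function $\exp(-λ y^{γ})$ with $λ = λ_{0} \exp(x^{T} β_{P})$. The sketch below is not the code used for the paper's simulation: the standard normal predictors, the exponential censoring distribution, the default $λ_{0} = 1$, and all function names are assumptions made for illustration.

```python
import math
import random

def weibull_ph_time(u, sp, gamma, lambda0=1.0):
    """Solve exp(-lambda * y^gamma) = u for y, with lambda = lambda0 * exp(SP)."""
    lam = lambda0 * math.exp(sp)
    return (-math.log(u) / lam) ** (1.0 / gamma)

def simulate_case(rng, beta_p, gamma, censor_rate):
    """Return (x, T, delta) for one right censored Weibull regression case."""
    x = [rng.gauss(0.0, 1.0) for _ in beta_p]            # hypothetical predictors
    sp = sum(xj * bj for xj, bj in zip(x, beta_p))
    y = weibull_ph_time(1.0 - rng.random(), sp, gamma)   # u lies in (0, 1]
    w = rng.expovariate(censor_rate)                     # hypothetical censoring law
    return x, min(y, w), int(y <= w)

rng = random.Random(1)
beta_p = [1.0, 0.0, 0.0, 0.0]          # k = 1 one and p - k = 3 zeroes
sample = [simulate_case(rng, beta_p, gamma=1.0, censor_rate=0.2)
          for _ in range(100)]
```

Under this construction $\log(Y)$ has slope vector $-(1/γ) β_{P}$ in x, which matches the relation $β_{A} = -(1/γ) β_{P}$ used above.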

In the simulations, for $i = 1, \dots, n$, we generated $w_{i} \sim N_{p}(0, I)$, where the p elements of the vector $w_{i}$ are iid N(0,1). Let the $p \times p$ matrix $A = (a_{ij})$ with $a_{ii} = 1$ and $a_{ij} = ψ$ for $i \neq j$, where $0 \leq ψ < 1$. Then the vector $x_{i} = A w_{i}$, so that $\mathrm{Cov}(x_{i}) = Σ_{x} = A A^{T} = (σ_{ij})$, where the diagonal entries $σ_{ii} = 1 + (p - 1) ψ^{2}$ and the off diagonal entries $σ_{ij} = 2ψ + (p - 2) ψ^{2}$. Hence the correlations are $\mathrm{cor}(x_{i}, x_{j}) = ρ = (2ψ + (p - 2) ψ^{2}) / (1 + (p - 1) ψ^{2})$ for $i \neq j$. If $ψ = 1/\sqrt{cp}$, then $ρ \to 1/(c + 1)$ as $p \to \infty$, where $c > 0$. As ψ gets close to 1, the predictor vectors $x_{i}$ cluster about the line in the direction of $(1, \dots, 1)^{T}$.
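The covariance structure of the simulated predictors can be checked by forming $A A^{T}$ directly. In this sketch the function name `predictor_covariance` is an assumption, and the values $p = 4$ and $ψ = 0.1$ are chosen to match the scenario of Table 3.

```python
def predictor_covariance(p, psi):
    """Sigma_x = A A^T for A with ones on the diagonal and psi elsewhere."""
    a = [[1.0 if i == j else psi for j in range(p)] for i in range(p)]
    # Entry (i, j) of A A^T is the inner product of rows i and j of A.
    return [[sum(a[i][m] * a[j][m] for m in range(p)) for j in range(p)]
            for i in range(p)]

sigma_x = predictor_covariance(4, 0.1)
rho = sigma_x[0][1] / sigma_x[0][0]   # common correlation of the predictors
```

For $p = 4$ and $ψ = 0.1$ the diagonal entries are 1.03, the off diagonal entries are 0.22, and $ρ \approx 0.2136$, which is the magnitude of the last three entries of $Σ_{xZ}$ reported for Table 3.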

Then 5000 runs are used to obtain the estimators. The means and standard deviations of the estimators are given. In the simulation, the uncensored values of Z are known. Hence the first estimator is the usual sample covariance vector $\hat{Σ}_{xZ}$. For real data, this estimator cannot be computed since only censored values of Z are known. The second estimator is $\hat{Σ}_{xZ}(A) = \hat{Σ}_{x} \hat{β}_{A}$ from the Weibull AFT. The third estimator is $\hat{Σ}_{xZ}(BJ) = \hat{Σ}_{x} \hat{β}_{BJ}$ using the Buckley James estimator. Let $Σ_{xZ} = (σ_{1Z}, \dots, σ_{pZ})^{T}$. Table 1 gives 2 lines per simulation scenario. The first line gives the means while the second line gives the standard deviations. A value of $0+$ means the absolute value was less than 0.0005. Table 2 gives similar results with $p = 5$. Table 3 used correlated predictors. The simulation needed roughly $n > 25p$ to avoid convergence problems in the 5000 runs.

Table 1. $Σ_{xZ} = (-1, 0, 0, 0)^{T}$.

Table 2. $Σ_{xZ} = (-0.2, -0.2, -0.2, -0.2, -0.2)^{T}$.

Table 3. $Σ_{xZ} = (-1, -0.2136, -0.2136, -0.2136)^{T}$.

All three estimators worked well. It is not surprising that a correctly specified AFT would slightly outperform the Buckley James estimator (have the smallest standard deviations).

4. Discussion

The Harrell (2015) [27] rms library is useful for the Buckley James estimator. For more on estimators for model (4), see, for example, Heller and Simonoff (1990) [28], James and Smith (1984) [29], Lai and Ying (1991) [30], Lin and Wei (1992) [31], and Zeng and Lin (2007) [32]. The Kaplan Meier estimator is due to Kaplan and Meier (1958) [33].

Next, we provide some directions for further research.

(a) Under iid cases, $β_{OLS} = Σ_{x}^{-1} Σ_{xY}$ even if heterogeneity is present. Hence $\hat{Σ}_{xZ} = \hat{Σ}_{x} \hat{β}_{Z}$ where, for example, $\hat{β}_{Z}$ is one of the estimators studied by Yu, Liu, and Chen (2024) [34].

(b) Similar ideas can be used for other censored regression models, such as a Tobit regression, provided that the cases are iid, and the population parameter vector $β = β_{OLS}$.

(c) The Buckley James estimator can also be used for the survival regression model $Z = t(Y) = α_{Z} + x^{T} β_{Z} + e$, where t is a known monotone function and Z is right censored.

(d) Better large sample theory would be useful so confidence intervals can be made for $\mathrm{Cov}(x_{i}, Y)$.

(e) More resistance to outliers would be useful, although the LCR plot is somewhat useful for detecting outliers.

(f) When the Grambsch and Therneau test and the slice survival plot give contradictory evidence on the proportional hazards assumption, how should that evidence be used?

Software

The R software 4.4.1 was used in the simulations. See R Core Team (2024) [35]. Programs are in the collection of R functions survpack.txt, available from (http://parker.ad.siu.edu/Olive/survpack.txt, accessed on 29 October 2025). The function BJcovxz generates a Weibull regression data set with right censored survival times. The function BJcovxzsim was used for Table 1. The function vnwtco was used to produce plots for Example 2. The function vovar is useful for Example 1. The function bphgfit makes a slice survival plot if there is a single covariate with $x = 1$ for group (treatment) 1 and $x = 0$ otherwise. The function bphsim3 simulates the above function. The function phgfit2 makes a slice survival plot when x is a $p \times 1$ vector. Both of these functions are for the Cox model. The function phsim5 simulates the above function. The function wphsim simulates the slice survival plot for Weibull regression. The function wregsim2 simulates EE plots for Weibull regression.

Some R code for producing the simulation and Figure 1 appears in Johana Lemonge (2025) [36].

Author Contributions

Validation, S.J.L.; Writing—original draft, D.J.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

One data set is available from (http://parker.ad.siu.edu/Olive/survdata.txt, accessed on 29 October 2025). The other data set comes with R.

Acknowledgments

The authors thank the editors and referees for their work.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AFT	accelerated failure time
ESP	estimated sufficient predictor
iid	independent and identically distributed
LCR	log censored response
OLS	ordinary least squares

References

1. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. B 1972, 34, 187–220.
2. Buckley, J.; James, I. Linear regression with censored data. Biometrika 1979, 66, 429–436.
3. Arjas, E. A graphical method for assessing goodness of fit in Cox’s proportional hazards model. J. Am. Stat. Assoc. 1988, 83, 204–212.
4. Feng, C.; Wang, H.; Zhang, Y.; Han, Y.; Liang, Y.; Tu, X.M. On testing proportionality in the Cox regression model by Andersen’s plot. Commun. Stat.-Theory Methods 2017, 46, 3489–3500.
5. Gill, R.D.; Schumaker, M. A simple test of the proportional hazards assumption. Biometrika 1987, 74, 289–300.
6. Kay, R. Goodness-of-fit methods for the proportional hazards model: A review. Rev. Epidemiol. Santé Publ. 1984, 32, 185–198.
7. Lin, D.Y.; Wei, L.J. Goodness-of-fit tests for the general Cox regression model. Stat. Sin. 1991, 1, 1–17.
8. Marzec, L.; Marzec, P. Generalized martingale-residual processes for goodness-of-fit inference in Cox’s type regression models. Ann. Stat. 1997, 25, 683–714.
9. Ng’andu, N.H. An empirical comparison of statistical tests for assessing the proportional hazards assumption of Cox’s model. Stat. Med. 1997, 16, 611–626.
10. Quantin, C.; Moreau, T.; Asselain, B.; Maccario, J.; Lellouch, J. A regression survival model for testing the proportional hazards hypothesis. Biometrics 1996, 52, 874–885.
11. Yu, Q.; Chappell, R.; Wong, G.Y.C.; Hsu, Y.; Mazur, M. Relationship between Cox, Lehmann, Weibull, and accelerated lifetime models. Commun. Stat.-Theory Methods 2008, 37, 1458–1470.
12. Grambsch, P.M.; Therneau, T.M. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994, 81, 515–526.
13. Barchard, K.A.; Russell, J.A. Distorted correlations among censored data: Causes, effects, and correction. Behav. Res. Methods 2024, 56, 1207–1228.
14. Li, Y.; Gillespie, B.W.; Shedden, K.; Gillespie, J.A. Profile likelihood estimation of the correlation coefficient in the presence of left, right or interval censoring and missing data. R J. 2018, 10, 159–179.
15. Lyles, R.H.; Fan, D.; Chuachoowong, R. Correlation coefficient estimation involving a left censored laboratory assay variable. Stat. Med. 2001, 20, 2921–2933.
16. Pesonen, M.; Pesonen, H.; Nevalainen, J. Covariance estimation for left-censored data. Comput. Stat. Data Anal. 2015, 92, 13–25.
17. Collett, D. Modelling Survival Data in Medical Research, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2003.
18. Ritov, Y. Estimation in a linear regression with censored data. Ann. Stat. 1990, 18, 303–328.
19. Tsiatis, A.A. Estimating regression parameters using linear rank tests for censored data. Ann. Stat. 1990, 18, 354–372.
20. Smith, P.J. Analysis of Failure and Survival Data; Chapman and Hall/CRC: Boca Raton, FL, USA, 2002.
21. May, S.; Hosmer, D.W. A simple method for calculating a goodness-of-fit test for the proportional hazards model. Lifetime Data Anal. 1998, 4, 109–120.
22. van Houwelingen, H.C.; Bruinsma, T.; Hart, A.A.M.; Veer, L.J.; Wessels, L.F.A. Cross-validated Cox regression on microarray gene expression data. Stat. Med. 2006, 25, 3201–3216.
23. Miller, R. Survival Analysis; Wiley: New York, NY, USA, 1981.
24. Hosmer, D.W.; Lemeshow, S. Applied Survival Analysis: Regression Modeling of Time to Event Data; Wiley: New York, NY, USA, 1999.
25. Edmunson, J.H.; Fleming, T.R.; Decker, D.G.; Malkasian, G.D.; Jorgenson, E.O.; Jeffries, J.A.; Webb, M.J.; Kvols, L.K. Different chemotherapeutic sensitivities and host factors affecting prognosis in advanced ovarian carcinoma versus minimal residual disease. Cancer Treat. Rep. 1979, 63, 241–247.
26. Zhou, M. Understanding the Cox regression models with time–change covariates. Am. Stat. 2001, 55, 153–155.
27. Harrell, F.E. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, 2nd ed.; Springer: New York, NY, USA, 2015.
28. Heller, G.; Simonoff, J.S. A comparison of estimators for regression with a censored response variable. Biometrika 1990, 77, 515–520.
29. James, I.R.; Smith, P.J. Consistency results for linear regression with censored data. Ann. Stat. 1984, 12, 590–600.
30. Lai, T.L.; Ying, Z. Large-sample theory of a modified Buckley-James estimator for regression analysis with censored data. Ann. Stat. 1991, 19, 1370–1402.
31. Lin, J.S.; Wei, L.J. Linear regression analysis based on Buckley-James estimating equation. Biometrics 1992, 48, 679–681.
32. Zeng, D.; Lin, D.Y. Efficient estimation for the accelerated failure time model. J. Am. Stat. Assoc. 2007, 102, 1387–1396.
33. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
34. Yu, L.; Liu, L.; Chen, D.G. Extending Buckley-James method for heteroscedastic survival data. J. Stat. Comput. Sim. 2024, 94, 1776–1792.
35. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: www.R-project.org (accessed on 29 October 2025).
36. Johana Lemonge, S. OLS Testing with Predictors Scaled to Have Unit Sample Variance. Ph.D. Thesis, Southern Illinois University, Carbondale, IL, USA, 2025. Available online: http://parker.ad.siu.edu/Olive/sSanjuka.pdf (accessed on 29 October 2025).

Table 1. $Σ_{xZ} = (-1, 0, 0, 0)^{T}$.

$(n, ψ, k)$	est	$σ_{1 Z}$	$σ_{2 Z}$	$σ_{3 Z}$	$σ_{4 Z}$
(100,0,1)	samp	−0.999	−0.002	−0.002	−0.001
	SD	0.194	0.162	0.162	0.163
(100,0,1)	AFT	−1.003	−0.001	−0.001	$0 +$
	SD	0.186	0.149	0.150	0.151
(100,0,1)	BJ	−1.002	−0.002	−0.002	−0.001
	SD	0.202	0.169	0.169	0.169
(500,0,1)	samp	−1.000	0.001	0.002	$0 +$
	SD	0.087	0.073	0.073	0.074
(500,0,1)	AFT	−1.001	0.001	0.001	$0 +$
	SD	0.082	0.065	0.066	0.066
(500,0,1)	BJ	−1.000	0.001	0.002	$0 +$
	SD	0.090	0.075	0.075	0.076
(1000,0,1)	samp	−1.001	$0 +$	0.001	$0 +$
	SD	0.061	0.051	0.051	0.051
(1000,0,1)	AFT	−1.001	0.001	0.001	$0 +$
	SD	0.056	0.047	0.047	0.046
(1000,0,1)	BJ	−1.000	$0 +$	0.001	0.020
	SD	0.064	0.053	0.053	0.053

Table 2. $Σ_{xZ} = (-0.2, -0.2, -0.2, -0.2, -0.2)^{T}$.

$(n, ψ, k)$	est	$σ_{1 Z}$	$σ_{2 Z}$	$σ_{3 Z}$	$σ_{4 Z}$	$σ_{5 Z}$
(150,0,5)	samp	−0.1999	−0.2002	−0.2015	−0.1984	−0.1998
	SD	0.0613	0.0622	0.0615	0.0615	0.0622
(150,0,5)	AFT	−0.2004	−0.2004	−0.2018	−0.1998	−0.2008
	SD	0.0569	0.0574	0.0572	0.0570	0.0569
(150,0,5)	BJ	−0.2005	−0.2002	−0.2019	−0.1988	−0.2001
	SD	0.0639	0.0646	0.0642	0.0640	0.0646
(500,0,5)	samp	−0.2001	−0.1999	−0.2003	−0.2000	−0.2000
	SD	0.0339	0.03366	0.0342	0.0337	0.0338
(500,0,5)	AFT	−0.2006	−0.1999	−0.2006	−0.2004	−0.2002
	SD	0.0312	0.0309	0.0311	0.0311	0.0307
(500,0,5)	BJ	−0.2000	−0.1999	−0.2003	−0.2001	−0.2000
	SD	0.0351	0.0349	0.0355	0.0348	0.0350
(1000,0,5)	samp	−0.2003	−0.1999	−0.2002	−0.2001	−0.2010
	SD	0.0234	0.0239	0.0239	0.0237	0.0237
(1000,0,5)	AFT	−0.2004	−0.2002	−0.2002	−0.2000	−0.2007
	SD	0.0215	0.0218	0.0217	0.0216	0.0216
(1000,0,5)	BJ	−0.2003	−0.1998	−0.2001	−0.1999	−0.2009
	SD	0.0243	0.0247	0.0247	0.0244	0.0245

Table 3. $Σ_{xZ} = (-1, -0.2136, -0.2136, -0.2136)^{T}$.

$(n, ψ, k)$	est	$σ_{1 Z}$	$σ_{2 Z}$	$σ_{3 Z}$	$σ_{4 Z}$
(100,0.1,5)	samp	−0.9977	−0.2116	−0.2121	−0.2136
	SD	0.1913	0.1608	0.1655	0.1640
(100,0.1,5)	AFT	−1.0026	−0.2122	−0.2124	−0.2137
	SD	0.1834	0.1470	0.1526	0.1528
(100,0.1,5)	BJ	−1.0003	−0.2119	−0.2123	−0.2148
	SD	0.2001	0.1670	0.1718	0.1707

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
