Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers

Chen, Li-Pang

doi:10.3390/math13030424

Open AccessArticle

Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers

by

Li-Pang Chen

Department of Statistics, National Chengchi University, Taipei City 116, Taiwan

Mathematics 2025, 13(3), 424; https://doi.org/10.3390/math13030424

Submission received: 13 December 2024 / Revised: 20 January 2025 / Accepted: 25 January 2025 / Published: 27 January 2025

(This article belongs to the Special Issue Statistical Analysis and Data Science for Complex Data)

Download Versions Notes

Abstract

Cure models and receiver operating characteristic (ROC) curve estimation are two important issues in survival analysis and have received attention for many years. In the development of biostatistics, these two topics have been well discussed separately. However, a rare development in the estimation of the ROC curve has been made available based on survival data with the cure fraction. On the other hand, while a large body of estimation methods have been proposed, they rely on an implicit assumption that the variables are precisely measured. In applications, measurement errors are generally ubiquitous and ignoring measurement errors can cause unexpected bias for the estimator and lead to the wrong conclusion. In this paper, we study the estimation of the ROC curve and the area under curve (AUC) when variables or biomarkers are subject to measurement error. We propose a valid procedure to handle measurement error effects and estimate the parameters in the cure model, as well as the AUC. We also make an effort to establish the theoretical properties with rigorous justification.

Keywords:

area under curve; bias correction; cure; censoring; EM algorithm; incomplete response; insertion method; mismeasurement

MSC:

62Nxx; 62N02

1. Introduction

Survival analysis has been an important research topic in biostatistics. The main purpose of survival analysis is to understand the time at which a specific event or disease happens among patients and make a prediction. The key feature of survival data is that the time-to-event variable either is observed at a finite time (i.e., the event, such as death, happens during the observation period) or cannot be observed due to censoring. A more detailed introduction can be found in some monographs, such as [1]. However, in the context of survival analysis, there is a situation in which patients never experience the failure event (e.g., death) in the study period; this phenomenon is called cure, and the corresponding model and data structure is the well-known cure model. There are some estimation methods to analyze the cure model in the literature. For example, [2] considered the transformation model for survival data and implemented the logistic regression model to characterize the probability of cure. Ref. [3] proposed the conditional likelihood function given truncation variables. Ref. [4] explored the cure model with a mixture of single-index and Cox models. More comprehensive discussions are summarized in [5].

In applications, noisy data are usually inevitable due to the data collection or sampling mechanism. One characteristic of typical noisy data is the measurement error, which indicates that the observed variables cannot fully reflect what they should be. This phenomenon is usually caused by machines that are imprecise or incorrect records by researchers. In the framework of survival data with cure models, some methods have also been developed to deal with measurement error problems. To name a few, ref. [6] proposed the simulation and extrapolation (SIMEX) method to correct the measurement errors. Ref. [7] developed the conditional expectation approach to correct the error effects. Unlike [6,7], who focused on the Cox proportional hazards model, ref. [8] studied the transformation model and adopted the SIMEX method to handle measurement error.

The other important issue in survival analysis is the estimation of the receiver operating characteristic (ROC) curve, whose primary goal is to display the sensitivity and the specificity of a continuous marker for a given disease. From the existing literature, the estimation procedures lie in the censored data, including [9,10,11,12,13]. However, to our knowledge, rare methods have been available to estimate the ROC curve when survival data contain the cured group and variables or biomarkers suffer from measurement error, except for [14,15], who considered the Cox model and the mixture cure model with the decomposition of sensitivity and specificity, respectively. The other challenging feature in analysis is the measurement error in biomarkers, which frequently appears in real-world applications. For example, [16] studied the Mayo Clinic primary biliary cirrhosis dataset with measurement errors in serum bilirubin, serum albumin, and prothrombin time; ref. [17] pointed out the biomarker systolic blood pressure (SBP), which is one of the key prognostic factors for cardiovascular risk scores, is possibly subject to measurement error; ref. [18] implemented three error-prone biomarkers, ER, PR, and Ki67 proteins, to analyze health study data on breast cancer in nurses. All the literature reveals that measurement error in biomarkers possibly affects the performance of standard diagnostic measures, such as ROC curves and area under the ROC curve (AUC). While error-prone biomarkers were discussed in survival analysis or the estimation of ROC curves (e.g., [16,17]), it seems that estimation methods for the cure fraction and error-prone biomarkers are unavailable.

To fill this research gap, we aim to develop a new estimation procedure in this paper. Specifically, we first consider the transformation model with the cure fraction in the survival data and the measurement error in covariates or biomarkers. Our setting generalizes the conventional implementation of the Cox model. Unlike existing methods that adopted the SIMEX method to correct for measurement error (e.g., [6,8]), we propose the insertion method, whose key spirit is to construct the new function so that its conditional expectation can be recovered to the true function based on the unobserved covariates or biomarkers. The corrected estimation function ensures the consistency of the estimator. After that, we further explore the estimation of the ROC curve when the survival data contain cured samples and mismeasured variables. We develop the time-independent and time-dependent estimation for the ROC curves and the corresponding estimation of the area under curve (AUC). Theoretically, we also establish the consistency and the asymptotic normality of the AUC estimator. The contribution of this paper is a valid estimation strategy, and the main focus is its theoretical establishment with rigorous justification.

The remainder is organized as follows. In Section 2, we introduce notation and models. In Section 3, we first present the proposed method to correct the measurement error effect and derive the estimator. Moreover, the corresponding theoretical properties are presented. In Section 4, we present numerical experiments and their results. We conclude the paper with discussion in Section 5. Finally, the proofs of the theoretical properties are placed in Appendix A.

2. Notation and Models

2.1. Cure Model

Let T and C denote the (uncured) failure time and the censoring time, respectively. With subjects cured and uncured, the failure time is determined by

\tilde{T} ≜ A T + (1 - A) \infty

, where

A \in {0, 1}

indicates whether a subject is cured (

A = 0

) or uncured (

A = 1

). To characterize A, we consider the logistic regression

\begin{matrix} π (X) ≜ P (A = 1 | X) = \frac{exp (α^{⊤} X)}{1 + exp (α^{⊤} X)}, \end{matrix}

(1)

where X is the p-dimensional vector of covariates or biomarkers and

α

is the p-dimensional vector of the parameters associated with X.

On the other hand, regarding the uncured failure time T, we consider the survivor function that follows the transformation model (e.g., [19,20])

S_{u} (t | X_{i}) = \frac{exp (- β^{⊤} X_{i} / ρ)}{{\{exp (- β^{⊤} X_{i}) + ρ H (t)\}}^{1 / ρ}},

(2)

where

β

is the p-dimensional vector of the parameters associated with X,

H (t)

is the unknown strictly increasing function, and

ρ

is the known parameter. In particular, when

ρ = 1

, (2) follows the proportional odds (PO) model; when

ρ = 0

, then (2) becomes the proportional hazards (PH) model. Given (2), the survivor function of

\tilde{T}

is given by

\begin{matrix} S (t | X) & ≜ & P (\tilde{T} > t | X) \\ = & P (\tilde{T} > t | X, A = 1) P (A = 1 | X) + P (\tilde{T} > t | X, A = 0) P (A = 0 | X) \\ = & P (T > t | X) π (X) + P (\infty > t | X) {1 - π (X)} \\ = & S_{u} (t | X) π (X) + 1 - π (X) . \end{matrix}

Finally, let C denote the censoring time, and we further define

Y ≜ min {T, C}

as the observed survival time and

δ = I (T \leq C)

, with

I (\cdot)

being the indicator function. Let

{{Y_{i}, δ_{i}, X_{i}} : i = 1, \dots n}

denote the independent and identically distributed sample as

{Y, δ, X}

.

To develop the method in the following sections, some typical assumptions are imposed in the context of survival analysis, including noninformative censoring times, and the independence of failure time is independent of the censoring time, given the covariates.

2.2. Measurement Error Model

In applications, the covariates or biomarkers are possibly subject to measurement error. Here, we consider the case where

X_{i}

is error-contaminated, and let

X_{i}^{*}

denote the observed or surrogate measurement of

X_{i}

. Consistent with most work in the literature (e.g., [21]), we consider that the

X_{i}

are continuous and linked with

X_{i}^{*}

by the additive measurement error model:

X_{i}^{*} = X_{i} + ϵ_{i},

(3)

where

ϵ_{i}

is independent of

{Y_{i}, δ_{i}, X_{i}}

,

ϵ_{i} \sim N (0_{p}, Σ_{ϵ})

with the covariance matrix

Σ_{ϵ}

, and

0_{p}

represents the

p \times 1

zero vector. Let

ω_{j k} = \frac{cov (X_{i j}, X_{i k})}{cov (X_{i j}^{*}, X_{i k}^{*})}

denote the reliability ratio of the covariances of jth and kth covariates and their surrogates for

j, k = 1, \dots, p

, which reflects the magnitude of variation between the unobserved and observed covariates, and higher values indicate a minor measurement error. Note that

cov (X_{i j}^{*}, X_{i k}^{*}) = cov (X_{i j}, X_{i k}) + cov (ϵ_{i j}, ϵ_{i k})

, which implies that

ω_{j k} = \frac{cov (X_{i j}, X_{i k})}{cov (X_{i j}, X_{i k}) + cov (ϵ_{i j}, ϵ_{i k})}

. In applications, we may assume that the reliability ratio

ω_{j k}

is not a small value for all

j, k = 1, \dots, p

(e.g., [22]), which implies that the values in

Σ_{ϵ}

are not too large.

In situations where the parameters for the measurement error model (3) are unknown, we may utilize the information carried by additional data sources, such as repeated measurements or validation subsamples. As a result, to highlight key ideas, we focus our attention on the estimation of parameters associated with the survival model and assume that the parameter

Σ_{ϵ}

for measurement error model (3) is known. This assumption is reasonable, typically arising in two circumstances: (i) prior studies provide the information on the covariate mismeasurement and offer an estimate of

Σ_{ϵ}

, and (ii) in conducting sensitivity analyses, different values of

Σ_{ϵ}

are specified to understand how mismeasurement effects may affect inference results about the parameters associated with the survival model.

3. Methodology

Suppose that we have a sample of n subjects and that for

i = 1, \dots, n

,

{Y_{i}, δ_{i}, X_{i}^{*}}

has the same distribution as

{Y, δ, X^{*}}

. We first estimate the unknown parameters

(α, β, H (\cdot))

and the survivor function of

\tilde{T}

based on formulation (2) with suitable error correction. After that, we develop the error-corrected method to estimate the time-independent/time-dependent AUC. Moreover, theoretical results are also established after presenting the proposed estimation procedures.

3.1. Construction of the Error-Corrected Likelihood Function

To develop our approach, we start the discussion by pretending that

X_{i}

is available. After that, we replace

X_{i}

with

X_{i}^{*}

and propose the error-corrected method to adjust error effects.

For cured patients with

δ_{i} = 0

and

A_{i} = 0

, we have

P {({\tilde{T}}_{i} = \infty)}^{1 - A_{i}} = {\{1 - π (X_{i})\}}^{1 - A_{i}}

. In addition, for those patients who encounter the failure event, i.e.,

δ_{i} = 1

and

A_{i} = 1

, we have

{\{f_{T} (y | X_{i}) π (X_{i})\}}^{δ_{i} = 1}

, where

f_{T} (\cdot)

represents the density function of uncured failure time. Moreover, for observations who are censored but not cured (

δ_{i} = 0

and

A_{i} = 1

), we can obtain

{\{S_{u} (y | X_{i}) π (X_{i})\}}^{I (δ_{i} = 0, A_{i} = 1)}

. Therefore, for subject

i = 1, \dots, n

, the complete likelihood function is given by

\begin{matrix} L & ≜ & \prod_{i = 1}^{n} {\{1 - π (X_{i})\}}^{1 - A_{i}} {\{π (X_{i})\}}^{A_{i}} \\ \times {\{\frac{d H (Y_{i})}{exp (- β^{⊤} X_{i}) + ρ H (Y_{i})}\}}^{δ_{i}} {\{\frac{exp (- β^{⊤} X_{i})}{exp (- β^{⊤} X_{i}) + ρ H (Y_{i})}\}}^{A_{i} / ρ}, \end{matrix}

(4)

where

d H (\cdot)

is the derivative of

H (\cdot)

, and the corresponding log-likelihood function of (4) is

\begin{matrix} ℓ & = & \sum_{i = 1}^{n} (A_{i} X_{i}^{⊤} α - log \{1 + exp (X_{i}^{⊤} α)\} \\ + δ_{i} [log \{d H (Y_{i})\} - log \{exp (- β^{⊤} X_{i}) + ρ H (Y_{i})\}] \\ - \frac{A_{i}}{ρ} [β^{⊤} X_{i} + log \{exp (- β^{⊤} X_{i}) + ρ H (Y_{i})\}]) . \end{matrix}

(5)

However, note that we usually have

X_{i}^{*}

in (5) instead of

X_{i}

. It is crucial to make a suitable correction. Our strategy is to find the new likelihood function based on

X_{i}^{*}

, denoted by

ℓ^{*}

, such that the conditional expectation

E (ℓ^{*} | X_{i}) = ℓ

is satisfied. In this way, we have

E (ℓ^{*}) = E (ℓ)

, showing that

ℓ^{*}

is an error-corrected likelihood function and provides the same optimizer as those obtained from ℓ. By measurement error model (3) and the moment-generating function under the normal distribution, it is direct to have

E (X_{i}^{*} | X_{i}) = X_{i}

and

E {exp (α^{⊤} X_{i}^{*}) | X_{i}} = exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)

. According to (5), however, the challenge is that the error-prone covariate appears in

log \{1 + exp (X_{i}^{* ⊤} α)\}

and

log \{exp (- β^{⊤} X_{i}^{*}) + ρ H (Y_{i})\}

, which are nonlinear terms.

To address this issue, we first consider the function

\frac{X_{i}^{*} exp (α^{⊤} X_{i}^{*})}{1 + exp (α^{⊤} X_{i}^{*})}

. Since the expectation of a ratio random variable is sometimes difficult to compute, we may adopt the approximation. Specifically, let

Ξ_{1 i} ≜ X_{i}^{*} exp (α^{⊤} X_{i}^{*})

and

Ξ_{2 i} ≜ 1 + exp (α^{⊤} X_{i}^{*})

. By the second-order Taylor series expansion (e.g., [23], pp. 69–72), we have the following approximation:

\begin{matrix} \frac{Ξ_{1 i}}{Ξ_{2 i}} & \approx & \frac{E (Ξ_{1 i} | X_{i})}{E (Ξ_{2 i} | X_{i})} + \frac{1}{E (Ξ_{2 i} | X_{i})} {Ξ_{1 i} - E (Ξ_{1 i} | X_{i})} - \frac{E (Ξ_{1 i} | X_{i})}{{E (Ξ_{2 i} | X_{i})}^{2}} {Ξ_{2 i} - E (Ξ_{2 i} | X_{i})} \\ - \frac{1}{2} \frac{1}{{E (Ξ_{2 i} | X_{i})}^{2}} {Ξ_{1 i} - E (Ξ_{1 i} | X_{i})} {Ξ_{2 i} - E (Ξ_{2 i} | X_{i})} \\ + \frac{1}{2} \frac{E (Ξ_{1 i} | X_{i})}{{E (Ξ_{2 i} | X_{i})}^{3}} {Ξ_{2 i} - E (Ξ_{2 i} | X_{i})}^{2} . \end{matrix}

Taking the conditional expectation gives that

\begin{matrix} E \{\frac{X_{i}^{*} exp (α^{⊤} X_{i}^{*})}{1 + exp (α^{⊤} X_{i}^{*})} | X_{i}\} & \approx & \frac{(X + Σ_{ϵ} α) exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)}{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)} \\ - \frac{1}{2} \frac{cov (Ξ_{1 i}, Ξ_{2 i} | X_{i})}{{\{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)\}}^{2}} \\ + \frac{1}{2} \frac{(X + Σ_{ϵ} α) exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)}{{\{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)\}}^{3}} var (Ξ_{2 i} | X_{i}) . \end{matrix}

(6)

On the other hand, the conditional variance of

Ξ_{2 i}

and conditional covariance of

Ξ_{1 i}

and

Ξ_{2 i}

, given

X_{i}

, are respectively derived as

\begin{matrix} var (Ξ_{2 i} | X_{i}) & = & var \{1 + exp (α^{⊤} X_{i}^{*}) | X_{i}\} \\ = & var \{exp (α^{⊤} X_{i}^{*}) | X_{i}\} \\ = & E \{exp (2 α^{⊤} X_{i}^{*}) | X_{i}\} - {[E \{exp (α^{⊤} X_{i}^{*}) | X_{i}\}]}^{2} \\ = & exp (2 α^{⊤} X_{i} + 2 α^{⊤} Σ_{ϵ} α) - exp (2 α^{⊤} X_{i} + α^{⊤} Σ_{ϵ} α) \end{matrix}

and

\begin{matrix} cov (Ξ_{1 i}, Ξ_{2 i} | X_{i}) & = & E \{X_{i}^{*} exp (α^{⊤} X_{i}^{*}) + X_{i}^{*} exp (2 α^{⊤} X_{i}^{*}) | X_{i}\} \\ - E \{X_{i}^{*} exp (α^{⊤} X_{i}^{*}) | X_{i}\} E \{1 + X_{i}^{*} exp (α^{⊤} X_{i}^{*}) | X_{i}\} \\ = & (X_{i} + 2 Σ_{ϵ} α) exp (2 α^{⊤} X_{i} + 2 α^{⊤} Σ_{ϵ} α) - (X_{i} + Σ_{ϵ} α) exp (2 α^{⊤} X_{i} + α^{⊤} Σ_{ϵ} α) . \end{matrix}

Provided that the reliability ratio is not too small or, equivalently, that the variance of the noise term

Σ_{ϵ}

is not too large, the two values

var (Ξ_{2 i} | X_{i})

and

cov (Ξ_{1 i}, Ξ_{2 i} | X_{i})

are close to zero, yielding that the second and last terms in (6) are close to zero; thus, we have that

\begin{matrix} E \{\frac{X_{i}^{*} exp (α^{⊤} X_{i}^{*})}{1 + exp (α^{⊤} X_{i}^{*})} | X_{i}\} \approx \frac{(X + Σ_{ϵ} α) exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)}{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)}, \end{matrix}

(7)

and performing integration on (7) with respect to

α

yields

\begin{matrix} E [log \{1 + exp (α^{⊤} X_{i}^{*})\} | X_{i}] & \approx & log \{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)\} \\ = & log \{1 + exp (α^{⊤} X_{i})\} + \frac{1}{2} α^{⊤} Σ_{ϵ} α, \end{matrix}

(8)

where the last approximation is due to that “plus one” being almost ignorable. Thus, (8) suggests that the suitable error correction of

log \{1 + exp (α^{⊤} X_{i})\}

is

\begin{matrix} log \{1 + exp (α^{⊤} X_{i}^{*})\} - \frac{1}{2} α^{⊤} Σ_{ϵ} α . \end{matrix}

(9)

Further, similar derivations of (7) and (8) enable us to obtain the following equality:

\begin{matrix} E [log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} | X_{i}] \\ = & log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)\}, \end{matrix}

yielding that the suitable error correction of

log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i})\}

is

\begin{matrix} log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} - \frac{1}{2} β^{⊤} Σ_{ϵ} β . \end{matrix}

(10)

Consequently, combining (5) with (9) and (10) yields the corrected log-likelihood function

\begin{matrix} ℓ^{*} & = & \sum_{i = 1}^{n} (A_{i} α^{⊤} X_{i}^{*} - log \{1 + exp (α^{⊤} X_{i}^{*})\} + \frac{1}{2} α^{⊤} Σ_{ϵ} α \\ + δ_{i} [log \{d H (Y_{i})\} - log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} + \frac{1}{2} β^{⊤} Σ_{ϵ} β .] \\ - \frac{A_{i}}{ρ} [β^{⊤} X_{i}^{*} + log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} - \frac{1}{2} β^{⊤} Σ_{ϵ} β .]) . \end{matrix}

(11)

3.2. Estimation of Parameters and Functions

After establishing the corrected log-likelihood function (11), we aim to estimate unknown parameters

α

and

β

, as well as unknown function

H (\cdot)

.

However, the other concern is that for those censored observations (

δ_{i} = 0

), whether they are cured or not is unknown; thus,

A_{i}

can be regarded as a missing value. To address this concern, we apply the expectation–maximization (EM) algorithm with error correction. Specifically, in the E-step, we start by considering

E (A_{i} | δ_{i}, {\tilde{T}}_{i}, X_{i})

to replace

A_{i}

. Recall that we have

X_{i}^{*}

only instead of

X_{i}

; then, it is crucial to make a suitable adjustment. Let

ξ_{i} ≜ E (A_{i} | δ_{i}, {\tilde{T}}_{i}, X_{i})

. We aim to find

ξ_{i}^{*} ≜ E (ξ_{i} | X_{i}^{*})

, which is in terms of the observed covariate

X_{i}

, such that the measurement error would be adjusted, i.e.,

E (ξ_{i}^{*}) = E (ξ_{i})

.

When

δ_{i} = 1

, we have

E (A_{i} | δ_{i} = 1, {\tilde{T}}_{i}, X_{i}) = 1

, and it is easy to check that

ξ_{i}^{*} = 1

. When

δ_{i} = 0

, then

E (A_{i} | δ_{i} = 0, {\tilde{T}}_{i}, X_{i}) = \frac{π (X_{i}) S_{u} (Y_{i} | X_{i})}{1 - π (X_{i}) + π (X_{i}) S_{u} (Y_{i} | X_{i})}

. Let

{\tilde{X}}_{i} ≜ E (X_{i} | X_{i}^{*})

, which satisfies

E ({\tilde{X}}_{i}) = E (X_{i})

. In particular, we apply the idea in [22] to express

{\tilde{X}}_{i}

by the linear approximation

\begin{matrix} {\tilde{X}}_{i} ≜ μ_{X} + Σ_{X}^{1 / 2} Σ_{X^{*}}^{- 1 / 2} (X_{i}^{*} - μ_{X^{*}}) \end{matrix}

(12)

with

μ_{X} ≜ E (X)

,

μ_{X^{*}} ≜ E (X^{*})

,

Σ_{X} ≜ var (X)

, and

Σ_{X^{*}} ≜ var (X^{*})

.

Following the derivation of [22], we apply the second-order Taylor series expansion for

ξ_{i}

and

{\tilde{ξ}}_{i} ≜ \frac{π ({\tilde{X}}_{i}) S_{u} (Y_{i} | {\tilde{X}}_{i})}{1 - π ({\tilde{X}}_{i}) + π ({\tilde{X}}_{i}) S_{u} (Y_{i} | {\tilde{X}}_{i})}

around

μ_{X}

with respect to

X_{i}

and

{\tilde{X}}_{i}

, respectively; then, we can obtain the following two approximations:

\begin{matrix} ξ_{i} & \approx & \frac{π (μ_{X}) S_{u} (Y_{i} | μ_{X})}{1 - π (μ_{X}) + π (μ_{X}) S_{u} (Y_{i} | μ_{X})} + {(\frac{\partial ξ_{i}}{\partial X_{i}} |_{X_{i} = μ_{X}})}^{⊤} (X_{i} - μ_{X}) \\ + \frac{1}{2} {(X_{i} - μ_{X})}^{⊤} \frac{\partial^{2} ξ_{i}}{\partial X_{i} \partial X_{i}^{⊤}} |_{X_{i} = μ_{X}} (X_{i} - μ_{X}) \end{matrix}

(13)

and

\begin{matrix} {\tilde{ξ}}_{i} & \approx & \frac{π (μ_{X}) S_{u} (Y_{i} | μ_{X})}{1 - π (μ_{X}) + π (μ_{X}) S_{u} (Y_{i} | μ_{X})} + {(\frac{\partial {\tilde{ξ}}_{i}}{\partial {\tilde{X}}_{i}} |_{{\tilde{X}}_{i} = μ_{X}})}^{⊤} ({\tilde{X}}_{i} - μ_{X}) \\ + \frac{1}{2} {({\tilde{X}}_{i} - μ_{X})}^{⊤} \frac{\partial^{2} {\tilde{ξ}}_{i}}{\partial {\tilde{X}}_{i} \partial {\tilde{X}}_{i}^{⊤}} |_{{\tilde{X}}_{i} = μ_{X}} ({\tilde{X}}_{i} - μ_{X}) . \end{matrix}

(14)

Combining (13) and (14) gives that

\begin{matrix} ξ_{i} & \approx & {\tilde{ξ}}_{i} + {(\frac{\partial ξ_{i}}{\partial X_{i}} |_{X_{i} = μ_{X}})}^{⊤} (X_{i} - {\tilde{X}}_{i}) + \frac{1}{2} {(X_{i} - μ_{X})}^{⊤} \frac{\partial^{2} ξ_{i}}{\partial X_{i} \partial X_{i}^{⊤}} |_{X_{i} = μ_{X}} (X_{i} - μ_{X}) \\ - \frac{1}{2} {({\tilde{X}}_{i} - μ_{X})}^{⊤} \frac{\partial^{2} {\tilde{ξ}}_{i}}{\partial {\tilde{X}}_{i} \partial {\tilde{X}}_{i}^{⊤}} |_{{\tilde{X}}_{i} = μ_{X}} ({\tilde{X}}_{i} - μ_{X}) . \end{matrix}

(15)

Then, taking the expectation on (15) yields that

\begin{matrix} E (ξ_{i}) \approx E ({\tilde{ξ}}_{i}), \end{matrix}

(16)

where the expectation of the subtraction of two quadratic forms in (16) is given by

\begin{matrix} E \{{(X_{i} - μ_{X})}^{⊤} \frac{\partial^{2} ξ_{i}}{\partial X_{i} \partial X_{i}^{⊤}} |_{X_{i} = μ_{X}} (X_{i} - μ_{X}) - {({\tilde{X}}_{i} - μ_{X})}^{⊤} \frac{\partial^{2} {\tilde{ξ}}_{i}}{\partial {\tilde{X}}_{i} \partial {\tilde{X}}_{i}^{⊤}} |_{{\tilde{X}}_{i} = μ_{X}} ({\tilde{X}}_{i} - μ_{X})\} \\ = & trace (\frac{\partial^{2} ξ_{i}}{\partial X_{i} \partial X_{i}^{⊤}} |_{X_{i} = μ_{X}} var (X_{i})) - trace (\frac{\partial^{2} ξ_{i}}{\partial {\tilde{X}}_{i} \partial {\tilde{X}}_{i}^{⊤}} |_{{\tilde{X}}_{i} = μ_{X}} var ({\tilde{X}}_{i})) \\ = & 0 \end{matrix}

since

var ({\tilde{X}}_{i}) = Σ_{X}^{1 / 2} Σ_{X^{*}}^{- 1 / 2} var (X_{i}^{*}) Σ_{X^{*}}^{- 1 / 2} Σ_{X}^{1 / 2} = Σ_{X} = var (X_{i})

. Consequently, (16) implies that

\begin{matrix} ξ_{i}^{*} = E (ξ_{i} | X_{i}^{*}) \approx \frac{π ({\tilde{X}}_{i}) S_{u} (Y_{i} | {\tilde{X}}_{i})}{1 - π ({\tilde{X}}_{i}) + π ({\tilde{X}}_{i}) S_{u} (Y_{i} | {\tilde{X}}_{i})} . \end{matrix}

(17)

Therefore, by the given data and the fact

Σ_{X^{*}} = Σ_{X} + Σ_{ϵ}

from measurement error model (3), (12) can be estimated by

\begin{matrix} {\hat{X}}_{i} ≜ {\hat{μ}}_{X^{*}} + {({\hat{Σ}}_{X^{*}} - Σ_{ϵ})}^{1 / 2} {\hat{Σ}}_{X^{*}}^{- 1 / 2} (X_{i}^{*} - {\hat{μ}}_{X^{*}}), \end{matrix}

where

{\hat{μ}}_{X^{*}}

and

{\hat{Σ}}_{X^{*}}

are the empirical estimates of

μ_{X^{*}}

and

Σ_{X^{*}}

, respectively. Thus, replacing

{\tilde{X}}_{i}

in (17) with

{\hat{X}}_{i}

yields

\begin{matrix} {\hat{ξ}}_{i}^{*} = \frac{π ({\hat{X}}_{i}) S_{u} (Y_{i} | {\hat{X}}_{i})}{1 - π ({\hat{X}}_{i}) + π ({\hat{X}}_{i}) S_{u} (Y_{i} | {\hat{X}}_{i})} . \end{matrix}

(18)

When the E-step is established, we examine the M-step. Specifically, replacing

A_{i}

in (11) with

ξ_{i}^{*}

gives

\begin{matrix} {\hat{ℓ}}^{*} & = & \sum_{i = 1}^{n} (ξ_{i}^{*} α^{⊤} X_{i}^{*} - log \{1 + exp (α^{⊤} X_{i}^{*})\} + \frac{1}{2} α^{⊤} Σ_{ϵ} α) \\ + \sum_{i = 1}^{n} (δ_{i} [log \{d H (Y_{i})\} - log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} + \frac{1}{2} β^{⊤} Σ_{ϵ} β] \\ - \frac{ξ_{i}^{*}}{ρ} [β^{⊤} X_{i}^{*} + log \{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*})\} - \frac{1}{2} β^{⊤} Σ_{ϵ} β]) \\ ≜ & {\hat{ℓ}}_{1}^{*} + {\hat{ℓ}}_{2}^{*}, \end{matrix}

(19)

where

{\hat{ℓ}}_{1}^{*}

contains information on

α

only and

{\hat{ℓ}}_{2}^{*}

includes

β

and

H (\cdot)

, which reflect the survival process. Thus, the maximization of

{\hat{ℓ}}^{*}

for

(α, β, H (\cdot))

can be performed separately. More specifically, the estimator of

α

, denoted by

\hat{α}

, is determined by

\hat{α} = \underset{α}{argmax} {\hat{ℓ}}_{1}^{*}

. Due to the unavailability of the closed form, one can adopt the Newton–Raphson approach to derive

\hat{α}

.

Naturally, the estimator of

(β, H (\cdot))

can also be determined by

\begin{matrix} (\hat{β}, \hat{H} (\cdot)) = \underset{(β, H (\cdot))}{argmax} {\hat{ℓ}}_{2}^{*} . \end{matrix}

(20)

However, the challenge comes from nonparametric function

H (\cdot)

. To address it, we follow the same line as [19] and “parametrize” function

H (\cdot)

by

\sum_{j = 1}^{d_{i}} exp (γ_{j})

, where

d_{i} ≜ max {j : t_{(j)} \leq t_{i}}

represents the index of the latest jump point before

t_{i}

and

γ_{1}, \dots, γ_{m}

are the corresponding logs of jump sizes of

H (\cdot)

at

t_{(1)}, \dots, t_{(m)}

, which are the sorted times. Based on such representation,

{\hat{ℓ}}_{2}^{*}

can be expressed as

\begin{matrix} {\hat{ℓ}}_{2}^{*} & = & \sum_{i = 1}^{n} (δ_{i} [γ_{i} - log \{ρ \sum_{j = 1}^{d_{i}} exp (γ_{j}) + exp (- β^{⊤} X_{i}^{*})\} + \frac{1}{2} β^{⊤} Σ_{ϵ} β] \\ - \frac{ξ_{i}^{*}}{ρ} [β^{⊤} X_{i}^{*} + log \{ρ \sum_{j = 1}^{d_{i}} exp (γ_{j}) + exp (- β^{⊤} X_{i}^{*})\} - \frac{1}{2} β^{⊤} Σ_{ϵ} β]) . \end{matrix}

(21)

Note that (21) is a concave function; then, taking the negative sign on (21) makes it become convex. To derive the estimators of

β

and

γ ≜ (γ_{1}, \dots, γ_{m})

from (21), we employ the coordinate-descent algorithm, which is frequently implemented in the optimization problem (e.g., [24]). Detailed descriptions of the steps are given below:

Step 1:: Choose an initial value for $β$ and $γ$ , and denote them by $β^{(0)}$ and $γ^{(0)}$ .
Step 2:: For $r = 1, \dots, p$ , given $γ^{(k - 1)} ≜ (γ_{1}^{(k - 1)}, \dots, γ_{p}^{(k - 1)})$ and $β_{1}^{(k - 1)}, \dots, β_{r - 1}^{(k - 1)}$ , $β_{r + 1}^{(k - 1)}, \dots, β_{p}^{(k - 1)}$ , update $β_{r}$ by finding

$\begin{matrix} β_{r}^{(k)} = \underset{β_{r}}{argmin} - {\hat{ℓ}}_{2}^{*} (β_{r} | γ^{(k - 1)}, β_{1}^{(k - 1)}, \dots, β_{r - 1}^{(k - 1)}, β_{r + 1}^{(k - 1)}, \dots, β_{p}^{(k - 1)}) . \end{matrix}$
Step 3:: For $s = 1, \dots, m$ , given $β^{(k)} ≜ (β_{1}^{(k)}, \dots, β_{p}^{(k)})$ and $γ_{1}^{(k - 1)}, \dots, γ_{r - 1}^{(k - 1)}, γ_{r + 1}^{(k - 1)}, \dots, γ_{p}^{(k - 1)}$ , update $γ_{s}$ by finding

$\begin{matrix} γ_{s}^{(k)} = \underset{γ_{r}}{argmin} - {\hat{ℓ}}_{2}^{*} (γ_{r} | β^{(k)}, γ_{1}^{(k - 1)}, \dots, γ_{r - 1}^{(k - 1)}, γ_{r + 1}^{(k - 1)}, \dots, γ_{m}^{(k - 1)}) . \end{matrix}$
Step 4:: Repeat Steps 2 and 3 until convergence, and let $\hat{β}$ and $\hat{γ}$ denote the limit of $β^{(k)}$ and $γ^{(k)}$ as $k \to \infty$ .

We comment that the coordinate-descent algorithm is a common strategy to obtain the minimizer (e.g., [24]). Its key idea is to iteratively update the scale parameter with other values given under the previous step. This approach enables us to obtain a stable numeric scheme and avoid estimating high-dimensional vectors of parameters. In the literature, the alternative strategy to address the optimization in (20) is called the minimization–maximization (MM) algorithm (e.g., [19,25]). While the MM algorithm is also an iterative approach based on the historical values that is similar to the coordinate-descent algorithm, the key step of this approach is to find and optimize the surrogate function that improves the value of the objective function or leave it unchanged, but in general, such suitable surrogate functions are not easy to find. Instead, the coordinate-descent algorithm enables us to directly examine (21).

Finally, to assess the validity of the proposed method, we present the following theoretical results, and the proof is postponed to Appendix A.2.

Theorem 1.

Let

α_{0}

,

β_{0}

, and

H_{0} (\cdot)

denote the true values of parameters. Under regularity conditions, the proposed estimator is consistent. That is, as

n \to \infty

,

\begin{matrix} {∥\hat{α} - α_{0}∥}_{2} \overset{p}{⟶} 0, {∥\hat{β} - β_{0}∥}_{2} \overset{p}{⟶} 0, and {∥\hat{H} (t) - H_{0} (t)∥}_{\infty} \overset{p}{⟶} 0, \end{matrix}

where

{∥ \cdot ∥}_{2}

and

{∥ \cdot ∥}_{\infty}

are the

L_{2}

- and supremum norms, respectively.

Let

V = {ν = (ν_{1}, ν_{2}, ν_{3})}

, and let us define

V_{p} = {ν \in V : ∥ ν_{1} ∥, ∥ ν_{2} ∥, ∥ ν_{3} ∥_{BV} < p}

as the collection of directions, where

ν_{1}

and

ν_{2}

are p-dimensional vectors,

ν_{3}

is a function defined in

[0, τ]

with

τ

being the end time of the study, and

{∥ \cdot ∥}_{BV}

is the total variation, defined as

\begin{matrix} ∥ ν_{3} ∥_{BV} = sup_{0 = t_{0} < t_{1} < \dots < t_{n} = τ} \sum_{j = 1}^{n} |ν_{3} (t_{j}) - ν_{3} (t_{j - 1})| . \end{matrix}

The next theorem shows the convergence in distribution of the proposed method with the rate

\sqrt{n}

.

Theorem 2.

Under regularity conditions, the proposed estimator has an asymptotic distribution. That is, as

n \to \infty

,

\sqrt{n} (\hat{α} - α_{0}, \hat{β} - β_{0}, \hat{H} (t) - H_{0} (t))

converges in distribution to a mean-zero Gaussian process in the functional space

l_{\infty} (V_{p})

on

V_{p}

.

Let

ϕ ≜ {(α^{⊤}, β^{⊤})}^{⊤}

. According to (19), the parameters

α

and

β

appear in

{\hat{ℓ}}_{1}^{*}

and

{\hat{ℓ}}_{2}^{*}

. With

H (\cdot)

fixed, by the theory of M-estimation, the marginal asymptotic distribution of

\hat{ϕ} ≜ {({\hat{α}}^{⊤}, {\hat{β}}^{⊤})}^{⊤}

is given by

\begin{matrix} \sqrt{n} (\hat{ϕ} - ϕ_{0}) \overset{d}{⟶} N (0, Σ_{ϕ}), \end{matrix}

(22)

where

Σ_{ϕ} = I^{- 1} J I^{- 1}

, with

\begin{matrix} I = diag (\frac{\partial^{2}}{\partial α \partial α^{⊤}} {\hat{ℓ}}_{1}^{*}, \frac{\partial^{2}}{\partial β \partial β^{⊤}} {\hat{ℓ}}_{2}^{*}), J = E (a^{\otimes 2}), \end{matrix}

and

a^{\otimes 2} = a a^{⊤}

for a vector

a ≜ {({(\frac{\partial {\hat{ℓ}}_{1}^{*}}{\partial α})}^{⊤}, {(\frac{\partial {\hat{ℓ}}_{2}^{*}}{\partial β})}^{⊤})}^{⊤}

.

3.3. Estimation of ROC and AUC

When the estimators of

H (\cdot)

,

α

, and

β

are obtained, we now propose the estimation procedure for the time-independent and time-dependent ROC curves, as well as the area under curve (AUC) with measurement error, where the former says that the event status and marker value for an individual are independent of the change in the time and the latter reflects that the curve varies at different time points and seems more flexible than the former. In real-world applications, it is common for the disease status to change over time, so the time-dependent ROC curve is more widely explored. However, in the presence of the cure fraction in survival time and measurement error in the covariates, both time-independent and time-dependent cases may not be fully discussed. To make the discussion on measurement error effects more comprehensive, we still study two different types of ROC curve estimation in the following two subsections.

3.3.1. Time-Independent AUC

We first discuss the time-independent case. Let

C_{1} (X)

be the binary and time-independent classifier with biomarker X, which classifies patients into the uncured group with 1. On the other hand, as discussed by [26], the linear composite biomarker is an optimal classifier among all functions of biomarkers. As a result, the event

C_{1} (X) = 1

is equivalent to the inequality

β^{⊤} X > u

for some constant u. Then, the time-independent true-positive rate (TPR) and false-positive rate (FPR), denoted by

T (α, β, u)

and

F (α, β, u)

, are defined as

\begin{matrix} T (α, β, u) = P (β^{⊤} X > u | \tilde{T} < \infty) and F (α, β, u) = P (β^{⊤} X > u | \tilde{T} = \infty) . \end{matrix}

(23)

Moreover, (23) can be further expressed as

\begin{matrix} T (α, β, u) & = & \frac{P (β^{⊤} X > u, \tilde{T} < \infty)}{P (\tilde{T} < \infty)} \\ = & \frac{\int P (\tilde{T} < \infty | X = x) I (β^{⊤} X > u) f_{X} (x) d x}{\int P (\tilde{T} < \infty | X = x) f_{X} (x) d x} \\ = & \frac{E \{π (X) I (β^{⊤} X > u)\}}{E \{π (X)\}} \end{matrix}

(24)

and

\begin{matrix} F (α, β, u) & = & \frac{P (β^{⊤} X > u, \tilde{T} = \infty)}{P (\tilde{T} = \infty)} \\ = & \frac{\int P (\tilde{T} = \infty | X = x) I (β^{⊤} X > u) f_{X} (x) d x}{\int P (\tilde{T} = \infty | X = x) f_{X} (x) d x} \\ = & \frac{E [\{1 - π (X)\} I (β^{⊤} X > u)]}{E \{1 - π (X)\}} \end{matrix}

(25)

respectively, with

f_{X} (\cdot)

and

E (\cdot)

being the density function of X and the expectation evaluated with respect to X, respectively, where the last equality in (24) comes from

\begin{matrix} P (\tilde{T} < \infty | X = x) & = & P (\tilde{T} < \infty | A = 1, X = x) P (A = 1 | X = x) \\ = & P (T < \infty | X = x) π (X) \\ = & π (X) \end{matrix}

with

P (T < \infty | X = x) = \int_{0}^{\infty} f (t | x) d t = 1

and

f (t | x)

being the conditional density function of T given X; similar derivations show that

\begin{matrix} P (\tilde{T} = \infty | X = x) & = & P (\tilde{T} = \infty | A = 0, X = x) P (A = 0 | X = x) \\ = & 1 - π (X) \end{matrix}

with

P (\tilde{T} = \infty | A = 0, X = x) = 1

, yielding the last equality of (25).

Therefore, based on (24) and (25), the AUC, denoted by

A (α, β)

, is given by (e.g., [11])

\begin{matrix} A (α, β) = \frac{E [π (X_{1}) \{1 - π (X_{2})\} I (β^{⊤} X_{1} > β^{⊤} X_{2})]}{E \{π (X)\} E \{1 - π (X)\}} . \end{matrix}

(26)

In practice, however, biomarker X may also be measured with error. Let

X^{*}

denote the surrogate version of X. We adopt measurement error model (3) to build the relationship between X and

X^{*}

.

In (24),

X^{*}

appears in

π (X)

and

π (X) I (β^{⊤} X > u)

. To correct for error effects, our strategy is to follow a similar approach to that in Section 3.2 and find the surrogate functions in terms of

X^{*}

, say

π^{*} (X^{*})

and

φ^{*} (X^{*})

, such that

π^{*} (X^{*}) = E \{π (X) | X^{*}\}

and

φ^{*} (X^{*}) = E \{π (X) I (β^{⊤} X > u) | X^{*}\}

. In this way, we have

E \{π^{*} (X^{*})\} = E \{π (X)\}

and

E \{φ^{*} (X^{*})\} = E \{π (X) I (β^{⊤} X > u)\}

; thus,

π (X)

and

π (X) I (β^{⊤} X > u)

can be replaced by

π^{*} (X^{*})

and

φ^{*} (X^{*})

, respectively. A similar strategy is also adopted for the numerator and denominator terms in (25) and (26). To this end, we develop

π^{*} (X^{*})

and

φ^{*} (X^{*})

.

According to the Mean Value Theorem, there exists

X^{'}

between X and

\tilde{X} ≜ E (X | X^{*})

, such that

\begin{matrix} π (X) = π (\tilde{X}) + \frac{\partial π (X)}{\partial X} |_{X = X^{'}} (X - \tilde{X}), \end{matrix}

(27)

and taking the conditional expectation

E (\cdot | X^{*})

gives

\begin{matrix} E \{π (X) | X^{*}\} = π (\tilde{X}) . \end{matrix}

(28)

This indicates that the corrected version

π^{*} (X^{*})

is given by the original function

π (\cdot)

with X being replaced by

\tilde{X}

. In particular, this result is essentially similar to the result in [27], which pointed out the satisfactory approximation of implementing

\tilde{X}

in the logit function.

On the other hand, note that based on the measurement error model, if

β^{⊤} X > u

, then we have

β^{⊤} X^{*} > β^{⊤} X > u - ∥ β ∥ ∥ ϵ ∥

. This indicates that

I (β^{⊤} X > u)

implies

I (β^{⊤} X^{*} > u)

. Therefore,

π (X) I (β^{⊤} X > u)

takes the value

π (X)

when

I (β^{⊤} X > u) = 1

, and by a similar strategy to (27), taking the conditional expectation yields

\begin{matrix} E \{π (X) I (β^{⊤} X > u) | X^{*}\} = π (\tilde{X}) I (β^{⊤} \tilde{X} > u) \end{matrix}

(29)

thus,

φ^{*} (X^{*})

is taken as

π (X) I (β^{⊤} X > u)

with X being replaced by

\tilde{X}

.

Consequently, the corrected TPR, FPR, and AUC are given by (24), (25), and (26), respectively. Moreover, with the consistent estimator

\hat{α}

in Section 3.2, the corresponding estimators of the corrected TPR, FPR, and AUC are

\begin{matrix} {\hat{T}}^{*} (\hat{α}, \hat{β}, u) = \frac{\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i}) I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > u)}{\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i})}, {\hat{F}}^{*} (\hat{α}, \hat{β}, u) = \frac{\frac{1}{n} \sum_{i = 1}^{n} \{1 - \hat{π} ({\tilde{X}}_{i})\} I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > u)}{\frac{1}{n} \sum_{i = 1}^{n} \{1 - \hat{π} ({\tilde{X}}_{i})\}}, \end{matrix}

and

\begin{matrix} \hat{A} (\hat{α}, \hat{β}) = \frac{\frac{1}{n (n - 1)} \sum_{i \neq j} [\hat{π} ({\tilde{X}}_{i}) \{1 - \hat{π} ({\tilde{X}}_{j})\} I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j})]}{\{\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i})\} [\frac{1}{n} \sum_{i = 1}^{n} \{1 - \hat{π} ({\tilde{X}}_{i})\}]}, \end{matrix}

(30)

where

\hat{π} (\cdot)

represents

π (\cdot)

with

α

being replaced by

\hat{α}

.

3.3.2. Time-Dependent AUC

We now explore the time-dependent case. Denote the time-dependent TPR and FPR, respectively, by

\begin{matrix} T_{t} (α, β, u) = P (β^{⊤} X > u | \tilde{T} < \infty, T \leq t) and F_{t} (α, β, u) = P (β^{⊤} X > u | \tilde{T} < \infty, T > t) . \end{matrix}

(31)

Similar to the discussion on (23), one can further express (31) as

\begin{matrix} T_{t} (α, β, u) = \frac{E [π (X) \{1 - S_{u} (t | X)\} I (β^{⊤} X > u)]}{E [π (X) \{1 - S_{u} (t | X)\}]} \end{matrix}

(32)

and

\begin{matrix} F_{t} (α, β, u) = \frac{E \{π (X) S_{u} (t | X) I (β^{⊤} X > u)\}}{E \{π (X) S_{u} (t | X)\}} . \end{matrix}

(33)

Thus, the time-dependent AUC is defined as

\begin{matrix} A_{t} (α, β) = \frac{E [π (X_{1}) \{1 - S_{u} (t | X_{1})\} π (X_{2}) S_{u} (t | X_{2}) I (β^{⊤} X_{1} > β^{⊤} X_{2})]}{E [π (X) \{1 - S_{u} (t | X)\}] E \{π (X) S_{u} (t | X)\}} . \end{matrix}

(34)

Moreover, following a similar discussion to that in Section 3.3.1, the corrected time-dependent TPR, FPR, and AUC are given by (32), (33), and (34) with X being replaced by

\tilde{X}

, respectively. With the consistent estimators

(\hat{α}, \hat{β}, \hat{H} (\cdot))

, the corresponding estimators are determined by

\begin{matrix} {\hat{T}}_{t} (\hat{α}, \hat{β}, u) = \frac{\frac{1}{n} \sum_{i = 1}^{n} [\hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\} I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > u)]}{\frac{1}{n} \sum_{i = 1}^{n} [\hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}]} \end{matrix}

\begin{matrix} {\hat{F}}_{t} (\hat{α}, \hat{β}, u) = \frac{\frac{1}{n} \sum_{i = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) {\hat{S}}_{u} (t | {\tilde{X}}_{i}) I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > u)\}}{\frac{1}{n} \sum_{i = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}} \end{matrix}

and

\begin{matrix} {\hat{A}}_{t} (\hat{α}, \hat{β}) = \frac{\frac{1}{n (n - 1)} \sum_{i \neq j} [\hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\} \hat{π} ({\tilde{X}}_{j}) {\hat{S}}_{u} (t | {\tilde{X}}_{j}) I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j})]}{[\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}] [\frac{1}{n} \sum_{i = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}]} . \end{matrix}

Finally, at the end of this section, we establish the following theorems to justify the proposed estimators of time-independent/-dependent AUC, including consistency and asymptotic normality.

Theorem 3.

Suppose that the conditions in Theorem 1 hold. With a fixed value u, as

n \to \infty

, the following applies:

(a): For the time-independent result in Section 3.3.1,

$\begin{matrix} {\hat{T}}^{*} (\hat{α}, \hat{β}, u) \overset{p}{⟶} T (α_{0}, β_{0}, u), {\hat{F}}^{*} (\hat{α}, \hat{β}, u) \overset{p}{⟶} F (α_{0}, β_{0}, u), and \hat{A} (\hat{α}, \hat{β}) \overset{p}{⟶} A (α_{0}, β_{0}); \end{matrix}$
(b): For the time-dependent result in Section 3.3.2,

$\begin{matrix} {\hat{T}}_{t} (\hat{α}, \hat{β}, u) \overset{p}{⟶} T_{t} (α_{0}, β_{0}, u), {\hat{F}}_{t} (\hat{α}, \hat{β}, u) \overset{p}{⟶} F_{t} (α_{0}, β_{0}, u), \end{matrix}$

and

$\begin{matrix} {\hat{A}}_{t} (\hat{α}, \hat{β}) \overset{p}{⟶} A_{t} (α_{0}, β_{0}) . \end{matrix}$

Theorem 4.

Suppose that conditions in Theorems 1 and 2 hold. Then as

n \to \infty

, the following applies:

(a): For the time-independent result in Section 3.3.1,

$\begin{matrix} \sqrt{n} \{\hat{A} (\hat{α}) - A (α_{0})\} \overset{d}{⟶} N (0, σ_{TI}), \end{matrix}$

where $σ_{TI}$ is the asymptotic variance whose exact form is given in Appendix A.5;
(b): For the time-dependent result in Section 3.3.2,

$\begin{matrix} \sqrt{n} \{{\hat{A}}_{t} (\hat{α}, \hat{β}) - A_{t} (α_{0}, β_{0})\} \overset{d}{⟶} N (0, σ_{TD}), \end{matrix}$

where $σ_{TD}$ is the asymptotic variance whose exact form is given in Appendix A.5.

4. Numerical Studies

Let

n = 250

or 500 denote the sample size. For a subject

i = 1, \dots, n

, let

X_{i}

be generated by the standard normal distribution. Let

α = 0.5

,

ρ = 2

, and

β = 1

be the true values of the parameters in (1) and (2), respectively. We consider the function

H (t) = t

. We then generate

A_{i}

and

T_{i}

by (1) and (2), respectively. We further independently generate the censoring time

C_{i}

from the uniform distribution in an interval

[0, τ_{0}]

, where

τ_{0}

is a pre-specified constant, such that the censoring rate is approximately 30%. In addition, we further apply measurement error model (3) to generate

X_{i}^{*}

for

i = 1, \dots, n

, where

ϵ_{i}

is independently generated by the normal distribution with mean zero and variance

σ_{ϵ}^{2} = 0.15, 0.35

, and 0.55, which reflect minor, moderate, or severe measurement error effects, respectively. Consequently, the collected dataset is denoted by

{{Y_{i}, δ_{i}, X_{i}^{*}} : i = 1, \dots, n}

. For each setting, we run the simulation 500 times repeatedly.

There are two goals in this study. First, we aim to estimate two parameters

α

and

β

in (1) and (2), respectively. Based on two estimates

\hat{α}

and

\hat{β}

, the next goal is to estimate the time-independent AUC (26) and the time-dependent AUC (34) at the time points

t = 5

and 10. In addition to implementing the proposed method with measurement error correction, to see the impact of the measurement error on estimation, we also examine the naive method, which is given by the same estimation procedure in Section 3 but with error-prone covariates

X_{i}^{*}

equipped.

The simulation results are summarized in Table 1, where we report the bias, the standard error (S.E.) obtained by repeated simulations, and the mean squared error (MSE). In general, we find that biases are increasing when

σ_{ϵ}^{2}

becomes large, which implies that the measurement error may affect the estimation even though the correction is taken into account. In particular, larger biases of the estimated AUC indicate that the shape of the ROC curves slightly moves to the 45-degree straight line. For the comparisons of two estimation methods, the proposed method generally outperforms the naive method due to the smaller biases of the estimators, regardless of the change of the values of

σ_{ϵ}^{2}

and sample sizes n. This indicates that the proposed method is valid to produce precise estimators. Note that the S.E. of the naive method is relatively smaller than the proposed method, as discussed in [22]; it is due to removing biases in point estimators and is a typical phenomenon in the framework of measurement error analysis. By the trade-off between bias and variation, we observe from the MSE that the proposed method is still better than the naive method with smaller MSE. In summary, the numerical results verify the validity of the proposed method and the measurement error correction.

5. Summary

In this paper, we study the transformation model to analyze the survival data with the cure fraction and its extension to the estimation of the ROC curve. In addition, we also explore the measurement error in covariates or biomarkers, which is one type of noisy data. To deal with measurement error effects and derive a reliable estimator, we propose a corrected likelihood function whose expectation recovers to the likelihood function under the true covariates. In addition, we also propose a measurement error correction strategy to estimate the time-independent/time-dependent AUC. To derive the optimization of (19), we introduce the EM algorithm with finite stepwise iterations to obtain the estimators, but one should keep in mind that the computational implementation is not unique, and some alternative strategies can be adopted, such as quasi-Newton methods or damped Anderson acceleration, when the dimension of variables is large or the settings are complex. The other contribution of this paper is the focus on the establishment of the asymptotic properties for the proposed estimator with rigorous derivations. While we have no real data application in the current stage, it is also expected that the proposed method is valid to handle the relevant data, since simulation studies show that the performance of the proposed method is satisfactory.

The current development focuses on the estimation of the parameters, as well as the AUC, and the handling of measurement error problems, but some further issues related to the AUC could be explored in depth. For example, the current setting involves the cure fraction, which may induce long follow-up periods for the survival time. An interesting but unsolved concern is to understand whether the dynamic AUC estimation is stabilized and the corresponding impact caused by measurement error. Exploring this issue in depth might make the diagnosis of the disease more precise. In the next stage of our research, we wish to explore this issue from a theoretical perspective and provide rigorous justification. In addition to the time-dependent AUC, as mentioned by a referee, the other challenging feature in applications is the time-varying covariates. While this issue was studied in the literature (e.g., [28]), the corresponding applications to AUC estimation seem to be rarely discussed. Moreover, in the presence of measurement error in covariates, the other challenge includes the characterization of time-dependent covariates and the corresponding measurement error correction. We expect that the current method can be extended to handle this complex structure in the near future.

Funding

This research was funded by National Science and Technology Council, Taiwan, with grant number 112-2118-M-004-005-MY2.

Data Availability Statement

All the data in this paper are simulated data generated by the procedure in Section 4.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Theoretical Justification

Appendix A.1. Regularity Conditions

(C1): $Θ$ is a compact set, and the true parameter value $(α_{0}, β_{0})$ is an interior point of $Θ$ .
(C2): Let $τ$ be the finite maximum support of the failure time.
(C3): $\{A_{i}, Y_{i}, X_{i}\}$ are independent and identically distributed for $i = 1, \dots, n$ .
(C4): The covariates or biomarkers $X_{i}$ are bounded.
(C5): Censoring time $C_{i}$ is noninformative. That is, the failure time $T_{i}$ and the censoring time $C_{i}$ are independent, given the covariate $X_{i}$ .

Condition (C1) is a basic condition that is used to derive the maximizer of the target function. (C2) to (C5) are standard conditions for survival analysis, which allow us to obtain the sum of i.i.d. random variables and hence to derive the asymptotic properties of the estimators.

Appendix A.2. Proof of Theorem 1

Proof.

Note that the function

H (\cdot)

is parameterized as

H^{*} (t) ≜ \frac{1}{n} \sum_{j = 1}^{n} I (Y_{j} \leq t) exp (γ_{j})

, and the corresponding estimator is given by

\hat{H} (t) = \frac{1}{n} \sum_{j = 1}^{n} I (Y_{j} \leq t) exp ({\hat{γ}}_{j})

. Following Helly’s Lemma (e.g., [29], Section 2.1) and a similar discussion in [30], we can show that

H^{*} (t)

converges to

H_{0} (t)

, provided that

γ_{j}

is replaced by the true value

γ_{j 0}

for

j = 1, \dots, p

. As a result, to show the consistency of

\hat{H} (t)

and

H_{0} (t)

, it suffices to examine the relationship between

\hat{H} (t)

and

H^{*} (t)

; thus, we can explore the asymptotic behavior of

{\hat{γ}}_{j}

.

In the following derivation, we aim to prove that

\begin{matrix} {∥{({\hat{α}}^{⊤}, {\hat{β}}^{⊤}, {\hat{γ}}^{⊤})}^{⊤} - {(α_{0}^{⊤}, β_{0}^{⊤}, γ_{0}^{⊤})}^{⊤}∥}_{2} = O_{p} (\frac{1}{\sqrt{n}}) . \end{matrix}

(A1)

Specifically, if (A1) holds, then we conclude that

{∥\hat{α} - α_{0}∥}_{2} \overset{p}{⟶} 0

,

{∥\hat{β} - β_{0}∥}_{2} \overset{p}{⟶} 0

and

{∥\hat{H} (t) - H^{*} (t)∥}_{\infty} \overset{p}{⟶} 0

as

n \to \infty

.

To prove (A1), we follow the strategy in [24] and start by showing that

\begin{matrix} P \{sup_{{∥ U ∥}_{2} = B} {\hat{ℓ}}^{*} (α_{0} + \frac{u_{1}}{\sqrt{n}}, β_{0} + \frac{u_{2}}{\sqrt{n}}, γ_{0} + \frac{u_{3}}{\sqrt{n}}) < {\hat{ℓ}}^{*} (α_{0}, β_{0}, γ_{0})\} > 1 - ϵ \end{matrix}

(A2)

for every

ϵ > 0

and a given value

B

, where

U = {(u_{1}^{⊤}, u_{2}^{⊤}, u_{3}^{⊤})}^{⊤}

, with

u_{1} = \sqrt{n} (\hat{α} - α_{0})

,

u_{2} = \sqrt{n} (\hat{β} - β_{0})

, and

u_{3} = \sqrt{n} (\hat{γ} - γ_{0})

.

We write

Ψ (u_{1}, u_{2}, u_{3}) = {\hat{ℓ}}^{*} (α_{0} + \frac{u_{1}}{\sqrt{n}}, β_{0} + \frac{u_{2}}{\sqrt{n}}, γ_{0} + \frac{u_{3}}{\sqrt{n}}) - {\hat{ℓ}}^{*} (α_{0}, β_{0}, γ_{0})

. By the Taylor series expansion around

u_{1} = u_{2} = u_{3} = 0

, we have

\begin{matrix} Ψ (u_{1}, u_{2}, u_{3}) & = & \frac{u_{1}^{⊤}}{\sqrt{n}} S_{α} (α_{0}, β_{0}, γ_{0}) + \frac{u_{2}^{⊤}}{\sqrt{n}} S_{β} (α_{0}, β_{0}, γ_{0}) + \frac{u_{3}^{⊤}}{\sqrt{n}} S_{γ} (α_{0}, β_{0}, γ_{0}) \\ + \frac{1}{n} u_{1}^{⊤} I_{α} (α_{0}, β_{0}, γ_{0}) u_{1} + \frac{1}{n} u_{2}^{⊤} I_{β} (α_{0}, β_{0}, γ_{0}) u_{2} + \frac{1}{n} u_{3}^{⊤} I_{γ} (α_{0}, β_{0}, γ_{0}) u_{3} \\ + \frac{1}{n} u_{2}^{⊤} I_{β γ} (α_{0}, β_{0}, γ_{0}) u_{3} + o_{p} (1), \end{matrix}

(A3)

where

S_{α} (\cdot)

,

S_{β} (\cdot)

, and

S_{γ} (\cdot)

are the first-order derivatives of

Ψ (\cdot)

with respect to

u_{1}

,

u_{2}

, and

u_{3}

, respectively;

I_{α} (\cdot)

,

I_{β} (\cdot)

, and

I_{γ} (\cdot)

are the second-order derivatives of

Ψ (\cdot)

with respect to

u_{1}

,

u_{2}

, and

u_{3}

, respectively; and

I_{β γ}

is the derivative of

Ψ (\cdot)

with respect to

u_{2}

and

u_{3}

.

We now separately examine each term in (A3). Note that

(α_{0}, β_{0}, γ_{0})

is the maximizer of

E (ℓ)

, and

E (ℓ) = E ({\hat{ℓ}}^{*})

according to our development. This implies that

(α_{0}, β_{0}, γ_{0})

is also the maximizer of

E ({\hat{ℓ}}^{*})

or, equivalently, the solutions of

E \{S_{α} (α_{0}, β_{0}, γ_{0})\} = 0

,

E \{S_{β} (α_{0}, β_{0}, γ_{0})\} = 0

, and

E \{S_{γ} (α_{0}, β_{0}, γ_{0})\} = 0

. Therefore, we have

\begin{matrix} \frac{u_{1}^{⊤}}{\sqrt{n}} S_{α} (α_{0}, β_{0}, γ_{0}) & = & u_{1}^{⊤} O_{p} (1), \\ \frac{u_{2}^{⊤}}{\sqrt{n}} S_{β} (α_{0}, β_{0}, γ_{0}) & = & u_{2}^{⊤} O_{p} (1), and \\ \frac{u_{3}^{⊤}}{\sqrt{n}} S_{γ} (α_{0}, β_{0}, γ_{0}) & = & u_{3}^{⊤} O_{p} (1) . \end{matrix}

For the second-order derivative

I_{α} ≜ \frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial α \partial α^{⊤}}

, its explicit form is given by

\begin{matrix} I_{α} & = & - (\sum_{i = 1}^{n} [\frac{X_{i}^{*} X_{i}^{* ⊤} exp (α^{⊤} X_{i}^{*}) \{1 + exp (α^{⊤} X_{i}^{*})\} - X_{i}^{*} X_{i}^{* ⊤} exp (α^{⊤} X_{i}^{*})}{{\{1 + exp (α^{⊤} X_{i}^{*})\}}^{2}}] + Σ_{ϵ}) \\ = & \frac{\partial}{\partial α^{⊤}} (\frac{\partial {\hat{ℓ}}^{*}}{\partial α}) . \end{matrix}

(A4)

By the Law of Large Numbers, we have that

\frac{1}{n} I_{α} \overset{p}{⟶} I_{α} = E (\frac{1}{n} I_{α}) = E \{E (\frac{1}{n} I_{α} | X_{i})\}

as

n \to \infty

. As a result, to find

E (I_{α})

, it suffices to compute

E (I_{α} | X_{i}) = E \{\frac{\partial}{\partial α^{⊤}} (\frac{\partial {\hat{ℓ}}^{*}}{\partial α}) | X_{i}\} = \frac{\partial}{\partial α^{⊤}} E (\frac{\partial {\hat{ℓ}}^{*}}{\partial α} | X_{i})

, where the last equality is based on the interchange of the derivative and integral.

Taking the derivative of (19) with respect to

α

and then taking the conditional expectation give

\begin{matrix} E (\frac{\partial {\hat{ℓ}}^{*}}{\partial α} | X_{i}) = \sum_{i = 1}^{n} \{ξ_{i} X_{i} - \frac{exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α) \times (X_{i} + Σ_{ϵ} α)}{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)}\} + Σ_{ϵ} α, \end{matrix}

which also implies that

\begin{matrix} E (\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial α \partial α^{⊤}} | X_{i}) & = & \frac{\partial}{\partial α^{⊤}} E (\frac{\partial {\hat{ℓ}}^{*}}{\partial α} | X_{i}) \\ = & - \sum_{i = 1}^{n} \frac{exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α) \{{∥X_{i} + Σ_{ϵ} α∥}_{2}^{2} - Σ_{ϵ}\} - Σ_{ϵ}}{{\{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)\}}^{2}} . \end{matrix}

As a result, by the Law of Large Numbers and Condition (C3),

I_{α}

is determined by

\begin{matrix} I_{α} = E [\frac{- exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α) \{{∥X_{i} + Σ_{ϵ} α∥}_{2}^{2} - Σ_{ϵ}\} + Σ_{ϵ}}{{\{1 + exp (α^{⊤} X_{i} + \frac{1}{2} α^{⊤} Σ_{ϵ} α)\}}^{2}}] . \end{matrix}

By Conditions (C1) and (C4), the covariates

X_{i}

and the parameter

α

are bounded, and

Σ_{ϵ}

is positive definite, which indicates that

I_{α}

is bounded componentwise and its negative definite ensures that the maximizer of

α

exists.

Next, we explore

I_{β}

. By taking the derivative of

{\hat{ℓ}}^{*}

with respect to

β

, we have

\begin{matrix} \frac{\partial {\hat{ℓ}}^{*}}{\partial β} & = & \sum_{i = 1}^{n} [δ_{i} \frac{X_{i}^{*} exp (- β^{⊤} X_{i}^{*})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i}^{*})} + Σ_{ϵ} β - \frac{A_{i}}{ρ} X_{i}^{*} \\ + \frac{A_{i}}{ρ} \frac{X_{i}^{*} exp (- β^{⊤} X_{i}^{*})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i}^{*})} + \frac{A_{i}}{ρ} Σ_{ϵ} β] . \end{matrix}

Note that the moment-generating function gives

\begin{matrix} E \{exp (- β^{⊤} X_{i}^{*}) | X_{i}\} = exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β) \end{matrix}

(A5)

and

\begin{matrix} E \{- X_{i}^{*} exp (β^{⊤} X_{i}^{*}) | X_{i}\} = exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β) \times (- X_{i} + Σ_{ϵ} β), \end{matrix}

then, the conditional expectation of

\frac{\partial {\hat{ℓ}}^{*}}{\partial β}

given

X_{i}

is

\begin{matrix} E (\frac{\partial {\hat{ℓ}}^{*}}{\partial β} | X_{i}) & = & \sum_{i = 1}^{n} [δ_{i} \frac{(X_{i} - Σ_{ϵ} β) exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} + Σ_{ϵ} β - \frac{A_{i}}{ρ} X_{i} \\ + \frac{A_{i}}{ρ} \frac{(X_{i} - Σ_{ϵ} β) exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} + \frac{A_{i}}{ρ} Σ_{ϵ} β] . \end{matrix}

This further shows that

\begin{matrix} E (\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial β \partial β^{⊤}} | X_{i}) = \sum_{i = 1}^{n} \{(δ_{i} + \frac{A_{i}}{ρ}) D_{1} + \frac{A_{i}}{ρ} Σ_{ϵ}\} and E (\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial β \partial γ_{j}^{⊤}} | X_{i}) = \sum_{i = 1}^{n} \{(δ_{i} + \frac{A_{i}}{ρ}) D_{2}\}, \end{matrix}

where

\begin{matrix} D_{1} & = & \frac{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) (∥ - X_{i} + Σ_{ϵ} {β ∥}_{2}^{2} + Σ_{ϵ}) exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \\ + \frac{exp (- 2 β^{⊤} X_{i} + β^{⊤} Σ_{ϵ} β) Σ_{ϵ}}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \end{matrix}

and

\begin{matrix} D_{2} = \frac{ρ I (Y_{i} < Y_{j}) exp (γ_{j}) (- X_{i} + Σ_{ϵ} β) exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} . \end{matrix}

Consequently, by the Law of Large Numbers, we have that

\begin{matrix} I_{β} = E (\frac{1}{n} \frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial β \partial β^{⊤}} | X_{i}) = E \{(δ_{i} + \frac{A_{i}}{ρ}) D_{1} + \frac{A_{i}}{ρ} Σ_{ϵ}\} \end{matrix}

and

I_{β γ}

is the matrix of

E (\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial β \partial γ_{j}^{⊤}})

with stack in row for

j = 1, \dots, n

, where

\begin{matrix} E (\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial β \partial γ_{j}^{⊤}}) = E \{(δ_{i} + \frac{A_{i}}{ρ}) D_{2}\}, \end{matrix}

and they are bounded elementally due to positive definite matrix

Σ_{ϵ}

and Conditions (C1) and (C4).

Finally, we examine

I_{γ}

. Taking the first derivative of

{\hat{ℓ}}^{*}

with respect to

γ_{j}

yields

\begin{matrix} \frac{\partial {\hat{ℓ}}^{*}}{\partial γ_{j}} = \sum_{i = 1}^{n} \{δ_{i} - \frac{ρ I (Y_{i} < Y_{j}) exp (γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i}^{*})} \times (δ_{i} + \frac{ξ_{i}^{*}}{ρ})\} . \end{matrix}

Additionally, by (A5), we have

\begin{matrix} E \{\frac{\partial {\hat{ℓ}}^{*}}{\partial γ_{j}} | X_{i}\} = \sum_{i = 1}^{n} \{δ_{i} - \frac{ρ I (Y_{i} < Y_{j}) exp (γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \times (δ_{i} + \frac{A_{i}}{ρ})\} . \end{matrix}

This also implies that

\begin{matrix} E \{\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial γ_{j}^{2}} | X_{i}\} = - \sum_{i = 1}^{n} \{\frac{ρ^{2} I (Y_{i} < Y_{j}) exp (2 γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \times (δ_{i} + \frac{A_{i}}{ρ})\} \end{matrix}

and

\begin{matrix} E \{\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial γ_{j} \partial γ_{l}} | X_{i}\} = - \sum_{i = 1}^{n} \{\frac{ρ^{2} I (Y_{i} < Y_{j}) I (Y_{i} < Y_{l}) exp (γ_{l} + γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \times (δ_{i} + \frac{A_{i}}{ρ})\} \end{matrix}

for

j \neq l

. Therefore, by the Law of Large Numbers, we have

I_{γ}

with diagonal entries

\begin{matrix} E \{\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial γ_{j}^{2}}\} = E \{- \frac{ρ^{2} I (Y_{i} < Y_{j}) exp (2 γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \times (δ_{i} + \frac{A_{i}}{ρ})\} \end{matrix}

and off-diagonal elements

\begin{matrix} E \{\frac{\partial^{2} {\hat{ℓ}}^{*}}{\partial γ_{j} \partial γ_{l}}\} = E \{- \frac{ρ^{2} I (Y_{i} < Y_{j}) I (Y_{i} < Y_{l}) exp (γ_{l} + γ_{j})}{ρ \sum_{j = 1}^{n} I (Y_{i} < Y_{j}) exp (γ_{j}) + exp (- β^{⊤} X_{i} + \frac{1}{2} β^{⊤} Σ_{ϵ} β)} \times (δ_{i} + \frac{A_{i}}{ρ})\} \end{matrix}

for

j \neq l

.

Finally, when n goes to infinity, the quadratic terms in (A3) converge in probability to

\begin{matrix} {(u_{1}^{⊤}, u_{2}^{⊤}, u_{3}^{⊤})}^{⊤} (\begin{matrix} I_{α} & 0 & 0 \\ 0 & I_{β} & I_{β γ} \\ 0 & I_{β γ} & I_{γ} \end{matrix}) (\begin{matrix} u_{1} \\ u_{2} \\ u_{3} \end{matrix}) = : U^{⊤} I U, \end{matrix}

which is equal to

O (B^{2})

because

I

is bounded and negative definite. Since

B^{2}

dominates

B

in the linear terms of (A3), for sufficiently large

B

and

{∥ U ∥}_{2} = B

, we have

\begin{matrix} {\hat{ℓ}}^{*} (α_{0} + \frac{u_{1}}{\sqrt{n}}, β_{0} + \frac{u_{2}}{\sqrt{n}}, γ_{0} + \frac{u_{3}}{\sqrt{n}}) - {\hat{ℓ}}^{*} (α_{0}, β_{0}, γ_{0}) < 0 . \end{matrix}

Therefore, (A2) holds; thus, (A1) is verified. □

Appendix A.3. Proof of Theorem 2

Proof.

In this proof, we adopt Theorem 3.3.1 in [31] to derive the asymptotic distribution. A similar discussion was also presented by [32].

Let

θ_{0} ≜ (α_{0}, β_{0}, H_{0})

and

\hat{θ} ≜ (\hat{α}, \hat{β}, \hat{H})

. Moreover, define

S = \frac{\partial {\hat{ℓ}}^{*}}{\partial θ}

and

S = E (S)

. According to Theorem 3.3.1 in [31], we need to check the following:

(a): $\sqrt{n} (S - S) (θ_{0}) \overset{d}{⟶} Z$ , where Z is a tight random element;
(b): The map $θ \to S (θ)$ is Fréchet differentiable at $θ_{0}$ with a continuously invertible derivative $\nabla S (θ_{0})$ , where $\nabla f (x)$ denotes an operator of the derivative of f with respect to x;
(c): $S (θ_{0}) = 0$ , and $\hat{θ}$ satisfies $S (\hat{θ}) = o_{p} (\frac{1}{\sqrt{n}})$ .

If Conditions (a)–(c) are verified, then we conclude that

\begin{matrix} \sqrt{n} (\hat{θ} - θ_{0}) \overset{d}{⟶} - {\{\nabla S (θ_{0})\}}^{- 1} Z as n \to \infty . \end{matrix}

Lastly, we examine Conditions (a)–(c) separately.

Check Condition (a):

Since

θ_{0}

is the true value, we automatically have

S (θ_{0}) = 0

. Then,

(S - S) (θ_{0}) = \frac{1}{n} \sum_{i = 1}^{n} (D_{i, 1} + D_{i, 2} + D_{i, 3})

, where

\begin{matrix} D_{i, 1} = X_{i}^{*} - \frac{X_{i}^{*} exp (α_{0}^{⊤} X_{i}^{*})}{1 + exp (α_{0}^{⊤} X_{i}^{*})} + Σ_{ϵ} α_{0}, \end{matrix}

\begin{matrix} D_{i, 2} = (\frac{X_{i}^{*} exp (- β_{0}^{⊤} X_{i}^{*})}{ρ H_{0} (Y_{i}) + exp (- β_{0}^{⊤} X_{i}^{*})} + Σ_{ϵ} β_{0}) (\frac{A_{i}}{ρ} + δ_{i}), \end{matrix}

and

\begin{matrix} D_{i, 3} = δ_{i} d H_{0} (Y_{i}) - \frac{ρ d H_{0} (Y_{i})}{ρ H_{0} (Y_{i}) + exp (- β_{0}^{⊤} X_{i}^{*})} (\frac{A_{i}}{ρ} + δ_{i}) . \end{matrix}

By Conditions (C1) and (C4),

D_{i, 1}

is Donsker bounded because it belongs to the finite-dimensional class of a measurable score function. In addition,

D_{i, 2}

and

D_{i, 3}

are bounded real valued functions on the bounded support due to Conditions (C2) and (C4). This indicates that

D_{i, 2}

and

D_{i, 3}

are Donsker (e.g., [29], p. 270). Therefore, by Example 2.10.7 in [31], we conclude that

(S - S) (θ_{0})

is also Donsker. Consequently, by Section 2.8.2 in [31] and by Theorem 19.3 in [29], we obtain that as

n \to \infty

,

\sqrt{n} (S - S) (θ_{0})

converges in distribution to a tight random element, denoted by Z.

Check Condition (b):

The first part is to show Fréchet differentiability. That is, it suffices to prove that (e.g., [29], p. 297)

\begin{matrix} ∥ S (θ_{0} + v) - S (θ_{0}) - \nabla S (v) ∥ = o (∥ v ∥) as ∥ v ∥ \to 0 . \end{matrix}

(A6)

Let

θ ≜ (α, β, H)

, and let

θ_{0}

denote the true value of

θ

. We write

θ_{r} = (α + ν_{1}, β + ν_{2}, H_{r} (ν_{3}))

, with

H_{r} (ν_{3}) (t) = \int_{0}^{t} (1 + r ν_{3}) d H (u)

, where

ν_{1}, ν_{2} \in R^{p}

and

ν_{3}

satisfies bound variation on

[0, τ]

. Define

ν \equiv (ν_{1}, ν_{2}, ν_{3})

.

On one hand, by the Taylor series expansion on

S (θ_{0} + r ν)

around

r = 0

, we have

\begin{matrix} S (θ_{0} + r ν) - S (θ_{0}) \\ = & ν_{1} E [X_{i}^{*} - \frac{X_{i}^{*} exp \{{(α_{0} + r^{*} ν_{1})}^{⊤} X_{i}^{*}\}}{1 + exp \{{(α_{0} + r^{*} ν_{1})}^{⊤} X_{i}^{*}\}} + Σ_{ϵ} (α_{0} + r ν_{1})] (r - 0) \\ + ν_{2} E [\frac{X_{i}^{*} exp \{- {(β_{0} + r^{*} ν_{2})}^{⊤} X_{i}^{*}\}}{ρ H_{0} (Y_{i}) + exp \{- {(β_{0} + r^{*} ν_{2})}^{⊤} X_{i}^{*}\}} \cdot (δ_{i} + \frac{A_{i}}{ρ}) \\ + Σ_{ϵ} (β_{0} + r^{*} ν_{2}) \cdot (δ_{i} + \frac{A_{i}}{ρ})] (r - 0) \\ + ν_{3} E [δ_{i} ν_{3} (Y_{i}) - \frac{ρ \int_{0}^{Y_{i}} ν_{3} (u) d H_{0} (u)}{ρ H_{r^{*}} (Y_{i}) + exp (- β_{0}^{⊤} X_{i}^{*}\}} \cdot (δ_{i} + \frac{A_{i}}{ρ})] (r - 0) + o (r) \\ ≜ & r \cdot \{ν_{1} S_{α} + ν_{2} S_{β} + S_{H} (ν_{3})\} + o (r) . \end{matrix}

(A7)

On the other hand, by taking the derivative of

S

based on the definition in p. 296 of [29], we have

\begin{matrix} \nabla S (ν) & = & ν_{1} E [X_{i}^{*} - \frac{X_{i}^{*} exp \{α_{0}^{⊤} X_{i}^{*}\}}{1 + exp \{α^{⊤} X_{i}^{*}\}} + Σ_{ϵ} α] \\ + ν_{2} E [\frac{X_{i}^{*} exp \{- β^{⊤} X_{i}^{*}\}}{ρ H (Y_{i}) + exp \{- β^{⊤} X_{i}^{*}\}} \cdot (δ_{i} + \frac{A_{i}}{ρ}) + Σ_{ϵ} β \cdot (δ_{i} + \frac{A_{i}}{ρ})] \\ + ν_{3} E [δ_{i} ν_{3} (Y_{i}) - \frac{ρ \int_{0}^{Y_{i}} ν_{3} (u) d H (u)}{ρ H (Y_{i}) + exp (- β^{⊤} X_{i}^{*}\}} \cdot (δ_{i} + \frac{A_{i}}{ρ})] . \end{matrix}

(A8)

Finally, by combining (A7) and (A8), we obtain (A6) as

r \to 0

, because of Theorem 1 and because

r^{*} \to 0

as

r \to 0

.

The second part is to discuss the continuous invertibility of

\nabla S (θ_{0})

. Similar to the discussion in [32], the continuous invertibility of the Fréchet derivative can be justified by showing that there exists a constant

ζ

such that

\begin{matrix} inf_{θ} \frac{∥ \nabla S (θ_{0}) ∥}{∥ θ ∥} > ζ . \end{matrix}

(A9)

Since

\nabla S (θ_{0})

can be expressed as a linear combination of the three

σ

-operators, where three terms in

\nabla S (θ_{0})

are continuously differentiable functions that map to a finite-dimensional space. Therefore, we conclude that

\nabla S (θ_{0})

is the summation of invertible and compact operators; thus, (A9) holds by the similar arguments in [32]. Consequently,

\nabla S (θ_{0})

is verified to be continuously differentiable.

Check Condition (c):

The first claim,

S (θ_{0}) = 0

, has been verified by the argument of Condition (a). In addition,

\hat{θ}

is the maximizer of

S (θ)

, and

\hat{θ}

is the consistent estimator due to Theorem 1. Therefore, the second claim,

S (\hat{θ}) = o_{p} (n^{- 1 / 2})

, holds. □

Appendix A.4. Proof of Theorem 3

Proof.

In this appendix, we mainly prove results (a) and (b) separately.

Proof of part (a)

In this proof, we only show that, as

n \to \infty

,

\begin{matrix} \hat{A} (\hat{α}, \hat{β}) \overset{p}{⟶} A (α_{0}, β_{0}), \end{matrix}

(A10)

and we omit the proofs of

{\hat{T}}^{*} (\hat{α}, \hat{β}, u)

and

{\hat{F}}^{*} (\hat{α}, \hat{β}, u)

because the derivations are similar to those for (A10).

We write

\hat{A} (\hat{α}, \hat{β}) = \frac{{\hat{A}}_{n} (\hat{α}, \hat{β})}{{\hat{A}}_{d} (\hat{α}, \hat{β})}

, where

\begin{matrix} {\hat{A}}_{d} (\hat{α}, \hat{β}) = \{\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i})\} [\frac{1}{n} \sum_{i = 1}^{n} \{1 - \hat{π} ({\tilde{X}}_{i})\}] \end{matrix}

and

\begin{matrix} {\hat{A}}_{n} (\hat{α}, \hat{β}) = \frac{1}{n (n - 1)} \sum_{i \neq j} [\hat{π} ({\tilde{X}}_{i}) \{1 - \hat{π} ({\tilde{X}}_{j})\} I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j})] . \end{matrix}

In the following, we examine

{\hat{A}}_{d} (\hat{α}, \hat{β})

and

{\hat{A}}_{n} (\hat{α}, \hat{β})

separately.

Claim 1: Let

A_{d} (α_{0}, β_{0}) = E \{π (X)\} E \{1 - π (X)\}

. Show that as

n \to \infty

,

\begin{matrix} {\hat{A}}_{d} (\hat{α}, \hat{β}) \overset{p}{⟶} A_{d} (α_{0}, β_{0}) . \end{matrix}

Note that by (28), we have

E \{π (X)\} = E \{π (\tilde{X})\}

, yielding that

A_{d} (α_{0}, β_{0}) = E \{π (\tilde{X})\} E \{1 - π (\tilde{X})\}

. By Theorem 1, which shows that

\hat{α}

is the consistent estimator of

α_{0}

, we have

\hat{π} (X_{i}) \overset{p}{⟶} π (X_{i})

as

n \to \infty

. On the other hand, by Condition (C3), which shows that

X_{i}

is independent, applying the Law of Large Number gives that as

n \to \infty

,

\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i}) \overset{p}{⟶} E \{π (\tilde{X})\}

and

\frac{1}{n} \sum_{i = 1}^{n} \{1 - \hat{π} ({\tilde{X}}_{i})\} \overset{p}{⟶} E \{1 - π (\tilde{X})\}

. Thus, Claim 1 holds.

Claim 2: Let

A_{n} (α_{0}, β_{0}) = E [π (X_{1}) \{1 - π (X_{2})\} I (β^{⊤} X_{1} > β^{⊤} X_{2})]

. Show that as

n \to \infty

,

\begin{matrix} {\hat{A}}_{n} (\hat{α}, \hat{β}) \overset{p}{⟶} A_{n} (α_{0}, β_{0}) . \end{matrix}

Similar to Claim 1, we can obtain that

A_{n} (α_{0}, β_{0})

is equal to

E [π ({\tilde{X}}_{1}) \{1 - π ({\tilde{X}}_{2})\} I (β^{⊤} {\tilde{X}}_{1} > β^{⊤} {\tilde{X}}_{2})]

due to (29). By adding and subtracting an additional term, we have

\begin{matrix} {\hat{A}}_{n} (\hat{α}, \hat{β}) - A_{n} (α_{0}, β_{0}) & = & \{{\hat{A}}_{n} (\hat{α}, \hat{β}) - {\tilde{A}}_{n} (α_{0}, β_{0})\} + \{{\tilde{A}}_{n} (α_{0}, β_{0}) - A_{n} (α_{0}, β_{0})\} \\ ≜ & A_{1} + A_{2}, \end{matrix}

where

\begin{matrix} {\tilde{A}}_{n} (α_{0}, β_{0}) = \frac{1}{n (n - 1)} \sum_{i \neq j} [π ({\tilde{X}}_{i}) \{1 - π ({\tilde{X}}_{j})\} I (β_{0}^{⊤} {\tilde{X}}_{i} > β_{0}^{⊤} {\tilde{X}}_{j})] . \end{matrix}

(A11)

A_{1}

can be expressed as

\begin{matrix} A_{1} = \frac{1}{n (n - 1)} \sum_{i \neq j} W_{i j} + o_{p} (1) \end{matrix}

(A12)

with

W_{i j} = \hat{π} ({\tilde{X}}_{i}) \{1 - \hat{π} ({\tilde{X}}_{j})\} I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j}) - π ({\tilde{X}}_{i}) \{1 - π ({\tilde{X}}_{j})\} I (β_{0}^{⊤} {\tilde{X}}_{i} > β_{0}^{⊤} {\tilde{X}}_{j})

By a similar discussion to that for Claim 1, we have

\hat{π} (X_{i}) \overset{p}{⟶} π (X_{i})

as

n \to \infty

. In addition, applying Theorem 1 gives

{\hat{β}}^{⊤} (X_{i} - X_{j}) \overset{p}{⟶} β_{0}^{⊤} (X_{i} - X_{j})

. By

P \{β_{0}^{⊤} (X_{i} - X_{j}) = 0\} = 0

and the continuous mapping theorem (e.g., [33], Theorem 3.2.4), we have that as

n \to \infty

,

\begin{matrix} E [I \{{\hat{β}}^{⊤} (X_{i} - X_{j}) > 0\}] \overset{p}{⟶} E [I \{β_{0}^{⊤} (X_{i} - X_{j}) > 0\}] . \end{matrix}

(A13)

Moreover, similar derivations show that

\begin{matrix} I \{{\hat{β}}^{⊤} (X_{i} - X_{j}) \leq 0\} \overset{p}{⟶} I \{β_{0}^{⊤} (X_{i} - X_{j}) \leq 0\} . \end{matrix}

(A14)

By (A14) and the fact that

I (E) I (E^{c}) = 0

for any event

E

, we have

\begin{matrix} E [I \{{\hat{β}}^{⊤} (X_{i} - X_{j}) > 0\} I \{{\hat{β}}^{⊤} (X_{i} - X_{j}) \leq 0\}] \overset{p}{⟶} 0 . \end{matrix}

(A15)

Combining (A13) and (A15) yields

E (| \frac{1}{n (n - 1)} \sum_{i \neq j} W_{i j} |) = o (1)

; thus, by Chebyshev’s inequality, we have

\begin{matrix} A_{1} \overset{p}{⟶} 0 as n \to \infty . \end{matrix}

(A16)

On the other hand, since (A11) is formulated as the U-statistics with bounded kernel, then applying the similar derivation of [34] gives

\begin{matrix} A_{2} \overset{p}{⟶} 0 as n \to \infty . \end{matrix}

(A17)

Therefore, combining (A16) and (A17) shows that Claim 2 and thus (A10) hold.

Proof of part (b)

The proof of part (b) is similar to that of part (a) except for the involvement of the estimates of the function

H (\cdot)

and the parameter

β

. According to Theorem 1, since

\hat{β}

and

\hat{H} (\cdot)

are consistent with the true values, then applying a similar argument in part (a) yields the desired result. Thus, the proof is completed. □

Appendix A.5. Proof of Theorem 4

Proof.

In this appendix, we mainly prove results (a) and (b) separately.

Proof of part (a)

Define

π_{0} (\cdot)

as (1) with

α

being replaced by

α_{0}

. Let

\hat{A} (α_{0}, β_{0})

denote (30) with

\hat{α}

and

\hat{β}

being replaced by

β_{0}

and

β_{0}

, respectively. By adding and subtracting an additional term

\hat{A} (α_{0}, β_{0})

, we have

\begin{matrix} \{\hat{A} (\hat{α}, \hat{β}) - \hat{A} (α_{0}, β_{0})\} + \{\hat{A} (α_{0}, β_{0}) - A (α_{0}, β_{0})\} ≜ R_{1} + R_{2} . \end{matrix}

(A18)

Note that by applying the Law of Large Numbers and Theorem 1 to the denominator terms of

R_{1}

and

R_{2}

, we have that as

n \to \infty

,

\begin{matrix} \{\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i})\} \{\frac{1}{n} \sum_{i = 1}^{n} (1 - \hat{π} ({\tilde{X}}_{i}))\} \overset{p}{⟶} E \{π_{0} ({\tilde{X}}_{i})\} E \{1 - π_{0} ({\tilde{X}}_{i})\} . \end{matrix}

(A19)

To complete this proof, we examine the numerator terms of

R_{1}

and

R_{2}

separately.

Step 1: Examine $R_{1}$

The numerator term of

R_{1}

can be expressed as

\begin{matrix} \frac{1}{n (n - 1)} \sum_{i \neq j} [\hat{π} ({\tilde{X}}_{i}) \{1 - \hat{π} ({\tilde{X}}_{j})\} \hat{β} I ({\tilde{X}}_{i} > {\tilde{X}}_{j}) - π_{0} ({\tilde{X}}_{i}) \{1 - π_{0} ({\tilde{X}}_{j})\} β_{0} I ({\tilde{X}}_{i} > {\tilde{X}}_{j})] \\ = & \frac{1}{n^{2}} \sum_{i, j = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) \hat{β} - π_{0} ({\tilde{X}}_{i}) β_{0}\} I ({\tilde{X}}_{i} > {\tilde{X}}_{j}) \\ - \frac{1}{n^{2}} \sum_{i, j = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) \hat{π} ({\tilde{X}}_{j}) \hat{β} - π_{0} ({\tilde{X}}_{i}) π_{0} ({\tilde{X}}_{j}) β_{0}\} I ({\tilde{X}}_{i} > {\tilde{X}}_{j}) \end{matrix}

(A20)

since

I ({\tilde{X}}_{i} > {\tilde{X}}_{j}) = 0

if

i \neq j

. By the Taylor series expansion, we have

\begin{matrix} \hat{π} ({\tilde{X}}_{i}) \hat{β} - π_{0} ({\tilde{X}}_{i}) β_{0} = (\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) β |_{ϕ = ϕ_{0}}) (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}), \end{matrix}

then, the first term in the right-hand side of (A20) gives that

\begin{matrix} \frac{1}{n^{2}} \sum_{i, j = 1}^{n} (\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) β |_{ϕ = ϕ_{0}}) (\hat{ϕ} - ϕ_{0}) I ({\tilde{X}}_{i} > {\tilde{X}}_{j}) + o_{p} (n^{- 1 / 2}) \\ = & \frac{1}{n} \sum_{i = 1}^{n} q_{i} (\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) β |_{ϕ = ϕ_{0}}) (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}) \\ = & E \{q_{i} (\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) β |_{ϕ = ϕ_{0}})\} (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}), \end{matrix}

(A21)

where

q_{i}

in the second step is defined as

\frac{1}{n} \sum_{j = 1}^{n} I ({\tilde{X}}_{i} > {\tilde{X}}_{j})

, and the last step is due to the Law of Large Numbers.

In addition, by the Taylor series expansion, we have

\begin{matrix} \hat{π} ({\tilde{X}}_{i}) \hat{π} ({\tilde{X}}_{j}) \hat{β} - π_{0} ({\tilde{X}}_{i}) π_{0} ({\tilde{X}}_{j}) β_{0} = (\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) π ({\tilde{X}}_{j}) β |_{ϕ = ϕ_{0}}) (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}) . \end{matrix}

Then, the second term of (A20) can be expressed as

\begin{matrix} \frac{1}{n^{2}} \sum_{i, j = 1}^{n} \{(\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) π ({\tilde{X}}_{j}) β |_{ϕ = ϕ_{0}}) I ({\tilde{X}}_{i} > {\tilde{X}}_{j})\} (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}) \\ = & E \{(\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) π ({\tilde{X}}_{j}) β |_{ϕ = ϕ_{0}}) I ({\tilde{X}}_{i} > {\tilde{X}}_{j})\} (\hat{ϕ} - ϕ_{0}) + o_{p} (n^{- 1 / 2}), \end{matrix}

(A22)

where the equality holds due to the Law of Large Numbers in the U-statistics. Therefore, combining (A21) and (A22) yields

\begin{matrix} \sqrt{n} \{\hat{A} (\hat{α}, \hat{β}) - \hat{A} (α_{0}, β_{0})\} = (E_{1} + E_{2}) \sqrt{n} (\hat{ϕ} - ϕ_{0}) + o_{p} (1), \end{matrix}

(A23)

where

\begin{matrix} E_{1} = \frac{E [q_{i} \{π_{0} ({\tilde{X}}_{i}) + \frac{\partial}{\partial α} π_{0} ({\tilde{X}}_{i}) β_{0}\}]}{E \{π_{0} ({\tilde{X}}_{i})\} E \{1 - π_{0} ({\tilde{X}}_{i})\}} \end{matrix}

and

\begin{matrix} E_{2} = \frac{E [\{π_{0} ({\tilde{X}}_{i}) π_{0} ({\tilde{X}}_{j}) + \frac{\partial}{\partial α} π_{0} ({\tilde{X}}_{i}) π_{0} ({\tilde{X}}_{j}) β_{0}\} I ({\tilde{X}}_{i} > {\tilde{X}}_{j})]}{E \{π_{0} ({\tilde{X}}_{i})\} E \{1 - π_{0} ({\tilde{X}}_{i})\}} . \end{matrix}

Step 2: Examine $R_{2}$

Similarly to Step 1, we only examine the numerator term of

R_{2}

because the denominator term comes from (A19).

For the numerator term of

R_{2}

, we have

\begin{matrix} \frac{2}{n (n - 1)} \sum_{i \neq j} [\frac{1}{2} π_{0} ({\tilde{X}}_{i}) \{1 - π_{0} ({\tilde{X}}_{j})\} β_{0} I ({\tilde{X}}_{i} > {\tilde{X}}_{j})] - E [π_{0} (X_{1}) \{1 - π_{0} (X_{2})\} β_{0} I (X_{1} > X_{2})] . \end{matrix}

ψ_{i j} ≜ \frac{1}{2} π_{0} ({\tilde{X}}_{i}) \{1 - π_{0} ({\tilde{X}}_{j})\} β_{0} I ({\tilde{X}}_{i} > {\tilde{X}}_{j})

is the kernel and

\frac{2}{n (n - 1)} \sum_{i \neq j} ψ_{i j}

forms the U-statistics. By Conditions (C3) and (C4),

E (ψ_{i j}^{2}) < \infty

, and by the construction in Section 3.3.1, we have

E (ψ_{i j}) = E [π_{0} (X_{1}) \{1 - π_{0} (X_{2})\} β_{0} I (X_{1} > X_{2})]

. Then, applying Theorem 12.3 in [29] and Slutsky’s theorem yields that, as

n \to \infty

,

\begin{matrix} \sqrt{n} \{\hat{A} (α_{0}, β_{0}) - A (α_{0}, β_{0})\} \\ = & \frac{1}{E \{π_{0} ({\tilde{X}}_{i})\} E \{1 - π_{0} ({\tilde{X}}_{i})\}} \sqrt{n} (\frac{2}{n (n - 1)} \sum_{i \neq j} ψ_{i j} - E (ψ_{i j})) \\ \overset{d}{⟶} & N (0, 2^{2} E_{3}) \end{matrix}

(A24)

with

E_{3} = \frac{var (ψ_{i j})}{{[E \{π_{0} ({\tilde{X}}_{i})\} E \{1 - π_{0} ({\tilde{X}}_{i})\}]}^{2}}

.

Finally, combining (A23) and (A24) gives that as

n \to \infty

,

\begin{matrix} \sqrt{n} \{\hat{A} (\hat{α}, \hat{β}) - A (α_{0}, β_{0})\} \overset{d}{⟶} N (0, σ_{TI}) \end{matrix}

with

σ_{TI} = {(E_{1} + E_{2})}^{⊤} Σ_{ϕ} (E_{1} + E_{2}) + 4 E_{3}

and

Σ_{ϕ}

being the asymptotic variance of

\sqrt{n} (\hat{ϕ} - ϕ_{0})

defined after Theorem 2.

Proof of part (b)

Following a similar discussion to that in part (a), we consider the decomposition

\begin{matrix} {\hat{A}}_{t} (\hat{α}, \hat{β}) - A_{t} (α_{0}, β_{0}) & = & \{{\hat{A}}_{t} (\hat{α}, \hat{β}) - {\hat{A}}_{t} (α_{0}, β_{0})\} + \{{\hat{A}}_{t} (α_{0}, β_{0}) - A_{t} (α_{0}, β_{0})\} \\ = & K_{1} + K_{2} . \end{matrix}

(A25)

By applying the Law of Large Numbers and Theorem 1 to the denominator terms of

K_{1}

and

K_{2}

, we have that as

n \to \infty

,

\begin{matrix} [\frac{1}{n} \sum_{i = 1}^{n} \hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}] [\frac{1}{n} \sum_{i = 1}^{n} \{\hat{π} ({\tilde{X}}_{i}) {\hat{S}}_{u} (t | {\tilde{X}}_{i})\}] \\ \overset{p}{⟶} & E [π_{0} ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\}] E \{π_{0} ({\tilde{X}}_{i}) S_{u} (t | {\tilde{X}}_{i})\} . \end{matrix}

(A26)

On the other hand, the numerator term of

K_{1}

can be expressed as

\begin{matrix} \frac{1}{n (n - 1)} \sum_{i \neq j} [\hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\} \hat{π} ({\tilde{X}}_{j}) {\hat{S}}_{u} (t | {\tilde{X}}_{j}) I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j})] \\ - \frac{1}{n (n - 1)} \sum_{i \neq j} [π_{0} ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\} π_{0} ({\tilde{X}}_{j}) S_{u} (t | {\tilde{X}}_{j}) I (β^{⊤} {\tilde{X}}_{i} > β^{⊤} {\tilde{X}}_{j})] \\ = & \frac{1}{n^{2}} \sum_{i, j = 1}^{n} [\hat{π} ({\tilde{X}}_{i}) \{1 - {\hat{S}}_{u} (t | {\tilde{X}}_{i})\} \hat{π} ({\tilde{X}}_{j}) {\hat{S}}_{u} (t | {\tilde{X}}_{j}) I ({\hat{β}}^{⊤} {\tilde{X}}_{i} > {\hat{β}}^{⊤} {\tilde{X}}_{j})] \\ - \frac{1}{n^{2}} \sum_{i, j = 1}^{n} [π_{0} ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\} π_{0} ({\tilde{X}}_{j}) S_{u} (t | {\tilde{X}}_{j}) I (β^{⊤} {\tilde{X}}_{i} > β^{⊤} {\tilde{X}}_{j})] \\ = & \frac{1}{n} \sum_{i = 1}^{n} q_{i} [\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\} π ({\tilde{X}}_{j}) S_{u} (t | {\tilde{X}}_{j}) |_{ϕ = ϕ_{0}}] (\hat{ϕ} - ϕ_{0}) + o_{p} (1) . \end{matrix}

(A27)

In addition, the numerator term of

K_{2}

can be written as

\begin{matrix} \frac{2}{n (n - 1)} \sum_{i \neq j} Γ_{i j} - E (Γ_{i j}), \end{matrix}

(A28)

where

Γ_{i j} = \frac{1}{2} π_{0} ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\} π_{0} ({\tilde{X}}_{j}) S_{u} (t | {\tilde{X}}_{j}) I (β^{⊤} {\tilde{X}}_{i} > β^{⊤} {\tilde{X}}_{j})

is the kernel of the U-statistic in the numerator term of

K_{2}

.

Finally, combining (A25)–(A28) with the central limit theorem in

\hat{ϕ} - ϕ_{0}

and

Γ_{i j}

and using Slutsky’s theorem give that as

n \to \infty

,

\begin{matrix} \sqrt{n} \{{\hat{A}}_{t} (\hat{α}, \hat{β}) - A_{t} (α_{0}, β_{0})\} \overset{d}{⟶} N (0, σ_{TD}), \end{matrix}

where

σ_{TD} = M_{1}^{⊤} Σ_{ϕ} M_{1} + 4 M_{2}

, with

M_{2} = \frac{var (Γ_{i j})}{E [π_{0} ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\}] E \{π_{0} ({\tilde{X}}_{i}) S_{u} (t | {\tilde{X}}_{i})\}}

and

M_{1} = E (q_{i} [\frac{\partial}{\partial ϕ} π ({\tilde{X}}_{i}) \{1 - S_{u} (t | {\tilde{X}}_{i})\} π ({\tilde{X}}_{j}) S_{u} (t | {\tilde{X}}_{j}) |_{ϕ = ϕ_{0}}])

. □

References

Lawless, J.F. Statistical Models and Methods for Lifetime Data; Wiley: New York, NY, USA, 2003. [Google Scholar]
Lu, W.; Ying, Z. On semiparametric transformation cure models. Biometrika 2004, 91, 331–343. [Google Scholar] [CrossRef]
Chen, C.-M.; Shen, P.-S.; Wei, J.C.-C.; Lin, L. A semiparametric mixture cure survival model for left-truncated and right-censored data. Biom. J. 2017, 59, 270–290. [Google Scholar] [CrossRef]
Amico, M.; Keilegom, I.V.; Legrand, C. The single-index/Cox mixture cure model. Biometrics 2019, 75, 452–462. [Google Scholar] [CrossRef]
Amico, M.; Keilegom, I.V. Cure models in survival analysis. Annu. Rev. Stat. Its Appl. 2018, 5, 311–342. [Google Scholar] [CrossRef]
Bertrand, A.; Legrand, C.; Carroll, R.J.; Meester, C.D.; Keilegom, I.V. Inference in a survival cure model with mismeasured covariates using a simulation-extrapolation approach. Biometrika 2017, 104, 31–50. [Google Scholar] [CrossRef][Green Version]
Ma, Y.; Yin, G. Cure rate model with mismeasured covariates under transformation. J. Am. Stat. Assoc. 2008, 103, 743–756. [Google Scholar] [CrossRef]
Chen, L.-P. Semiparametric estimation for cure survival model with left-truncated and right-censored data and covariate measurement error. Stat. Probab. Lett. 2019, 154, 108547. [Google Scholar] [CrossRef]
Beyene, K.M.; El Ghouch, A. Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Stat. Med. 2020, 39, 3373–3396. [Google Scholar] [CrossRef]
Heagerty, P.J.; Lumley, T.; Pepe, M.S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000, 56, 337–344. [Google Scholar] [CrossRef]
Kamarudin, A.N.; Cox, T.; Kolamunnage-Dona, R. Time-dependent ROC curve analysis in medical research: Current methods and applications. BMC Med. Res. Methodol. 2017, 17, 53. [Google Scholar] [CrossRef]
Li, L.; Greene, T.; Hu, B. A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data. Stat. Methods Med. Res. 2018, 27, 2264–2278. [Google Scholar] [CrossRef]
Song, X.; Zhou, X.-H. A semiparametric approach for the covariate specific roc curve with survival outcome. Stat. Sin. 2008, 18, 947–965. [Google Scholar]
Zhang, Y.; Han, X.; Shao, Y. The ROC of Cox proportional hazards cure models with application in cancer studies. Lifetime Data Anal. 2021, 27, 195–215. [Google Scholar] [CrossRef]
Amico, M.; Keilegom, I.V.; Han, B. Assessing cure status prediction from survival data using receiver operating characteristic curves. Biometrika 2021, 108, 727–740. [Google Scholar] [CrossRef]
Kolamunnage-Dona, R.; Kamarudin, A.N. Adjustment for the measurement error in evaluating biomarker performances at baseline for future survival outcomes: Time-dependent receiver operating characteristic curve within a joint modelling framework. Res. Methods Med. Health Sci. 2021, 2, 51–60. [Google Scholar] [CrossRef]
Crowther, M.J.; Lambert, P.C.; Abrams, K.R. Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: A joint modelling approach. BMC Med. Res. Methodol. 2013, 13, 146. [Google Scholar] [CrossRef]
Nevo, D.; Zucker, D.M.; Tamimic, R.M.; Wang, M. Accounting for measurement error in biomarker data and misclassification of subtypes. Stat. Med. 2016, 35, 5686–5700. [Google Scholar] [CrossRef]
Mao, M.; Wang, J.-L. Semiparametric efficient estimation for a class of generalized proportional odds cure models. J. Am. Stat. 2010, 105, 302–311. [Google Scholar] [CrossRef]
Chen, L.-P. Semiparametric estimation for the transformation model with length-biased data and covariate measurement error. J. Stat. Comput. Simul. 2020, 90, 420–442. [Google Scholar] [CrossRef]
Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Model; CRC Press: New York, NY, USA, 2006. [Google Scholar]
Chen, L.-P.; Yi, G.Y. Semiparametric estimation methods for left-truncated and right-censored survival data with covariate measurement error. Ann. Inst. Stat. Math. 2021, 73, 481–517. [Google Scholar] [CrossRef]
Elandt-Johnson, R.C.; Johnson, N.L. Survival Models and Data Analysis; John Wiley & Sons: New York, NY, USA, 1980. [Google Scholar]
Chen, L.-P.; Yi, G.Y. Analysis of noisy survival data with graphical proportional hazards measurement error models. Biometrics 77, 956–969. [CrossRef]
Hunter, D.R.; Lange, K. Computing estimates in the proportional odds model. Ann. Inst. Stat. Math. 2002, 54, 155–168. [Google Scholar] [CrossRef]
Zheng, Y.; Cai, T.; Feng, Z. Application of the time-dependent ROC curves for prognostic accuracy with multiple biomarkers. Biometrics 2006, 62, 279–287. [Google Scholar] [CrossRef]
Carroll, R.J.; Spiegelman, C.H.; Lan, K.K.G.; Bailey, K.T.; Abbot, R.D. On errors-in-variables for binary regression model. Biometrika 1984, 71, 19–25. [Google Scholar] [CrossRef]
Webb, A.; Ma, J. Cox models with time-varying covariates and partly-interval censoring—A maximum penalised likelihood approach. Stat. Med. 2023, 42, 815–833. [Google Scholar] [CrossRef]
van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: New York, NY, USA, 1998. [Google Scholar]
Kim, J.P.; Lu, W.; Sit, T.; Ying, Z. A unified approach to semiparametric transformation models under general biased sampling schemes. J. Am. Stat. Assoc. 2013, 108, 217–227. [Google Scholar] [CrossRef][Green Version]
van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996. [Google Scholar]
Su, Y.-R.; Wang, J.-L. Modeling left-truncated and right-censored survival data with longitudinal covariates. Ann. Stat. 2012, 40, 1465–1488. [Google Scholar] [CrossRef]
Durrett, R. Probability: Theory and Examples; Cambridge University Press: New York, NY, USA, 2010. [Google Scholar]
Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]

Table 1. Simulation results. “Parameter” indicates the some parameters of interest; “Method” refers to the proposed and naive methods; “Bias” is the bias of the estimators; “S.E.” is the standard error; and “MSE” is the mean squared error.

$σ_{ϵ}^{2}$	Parameter	Methods	$n = 250$			$n = 500$
			Bias	S.E.	MSE	Bias	S.E.	MSE
0.15	$α$	Naive	0.163	0.308	0.121	0.159	0.230	0.078
		Proposed	0.012	0.325	0.106	0.009	0.238	0.057
	$β$	Naive	0.170	0.231	0.082	0.165	0.208	0.070
		Proposed	0.018	0.266	0.071	0.013	0.220	0.049
	$A (α, β)$	Naive	0.108	0.033	0.013	0.096	0.030	0.010
		Proposed	0.014	0.056	0.003	0.007	0.047	0.002
	$A_{5} (α, β)$	Naive	0.115	0.051	0.016	0.108	0.047	0.014
		Proposed	0.016	0.060	0.004	0.013	0.057	0.003
	$A_{10} (α, β)$	Naive	0.114	0.049	0.015	0.103	0.041	0.012
		Proposed	0.013	0.062	0.004	0.009	0.056	0.003
0.35	$α$	Naive	0.195	0.321	0.141	0.178	0.255	0.097
		Proposed	0.020	0.345	0.119	0.017	0.269	0.073
	$β$	Naive	0.195	0.264	0.108	0.181	0.229	0.085
		Proposed	0.023	0.281	0.080	0.020	0.245	0.060
	$A (α, β)$	Naive	0.125	0.047	0.018	0.114	0.042	0.015
		Proposed	0.017	0.060	0.004	0.014	0.055	0.003
	$A_{5} (α, β)$	Naive	0.123	0.060	0.019	0.112	0.055	0.016
		Proposed	0.020	0.071	0.005	0.017	0.066	0.005
	$A_{10} (α, β)$	Naive	0.121	0.054	0.018	0.116	0.050	0.016
		Proposed	0.019	0.069	0.005	0.016	0.063	0.004
0.55	$α$	Naive	0.214	0.348	0.167	0.193	0.285	0.118
		Proposed	0.027	0.362	0.132	0.023	0.307	0.095
	$β$	Naive	0.226	0.287	0.133	0.201	0.266	0.111
		Proposed	0.028	0.301	0.091	0.025	0.278	0.078
	$A (α, β)$	Naive	0.133	0.066	0.022	0.128	0.059	0.020
		Proposed	0.021	0.078	0.006	0.019	0.069	0.005
	$A_{5} (α, β)$	Naive	0.136	0.068	0.023	0.124	0.063	0.019
		Proposed	0.026	0.079	0.007	0.024	0.074	0.006
	$A_{10} (α, β)$	Naive	0.134	0.060	0.022	0.128	0.057	0.020
		Proposed	0.025	0.066	0.005	0.021	0.069	0.005

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, L.-P. Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers. Mathematics 2025, 13, 424. https://doi.org/10.3390/math13030424

AMA Style

Chen L-P. Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers. Mathematics. 2025; 13(3):424. https://doi.org/10.3390/math13030424

Chicago/Turabian Style

Chen, Li-Pang. 2025. "Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers" Mathematics 13, no. 3: 424. https://doi.org/10.3390/math13030424

APA Style

Chen, L.-P. (2025). Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers. Mathematics, 13(3), 424. https://doi.org/10.3390/math13030424

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Analysis of Receiver Operating Characteristic Curves for Cure Survival Data and Mismeasured Biomarkers

Abstract

1. Introduction

2. Notation and Models

2.1. Cure Model

2.2. Measurement Error Model

3. Methodology

3.1. Construction of the Error-Corrected Likelihood Function

3.2. Estimation of Parameters and Functions

3.3. Estimation of ROC and AUC

3.3.1. Time-Independent AUC

3.3.2. Time-Dependent AUC

4. Numerical Studies

5. Summary

Funding

Data Availability Statement

Conflicts of Interest

Appendix A. Theoretical Justification

Appendix A.1. Regularity Conditions

Appendix A.2. Proof of Theorem 1

Appendix A.3. Proof of Theorem 2

Appendix A.4. Proof of Theorem 3

Appendix A.5. Proof of Theorem 4

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI