Nonparametric Sieve Maximum Likelihood Estimation of Semi-Competing Risks Data

Xifen Huang; Jinfeng Xu

doi:10.3390/math10132248

School of Mathematics, Yunnan Normal University, Kunming 650092, China

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(13), 2248; https://doi.org/10.3390/math10132248

This article belongs to the Special Issue Recent Advances in Computational Statistics


Abstract

In biomedical studies involving time-to-event data, a subject may experience distinct types of events. We consider the problem of estimating the transition functions of a semi-competing risks model under the illness–death model framework. We propose to estimate the intensity functions by maximizing a B-spline based sieve likelihood. The method yields smooth estimates without parametric assumptions. The proposed approach facilitates easy computation of the covariance of the model parameters and yields estimates with a direct interpretation. Compared with existing approaches, our method requires neither a subjective specification of the frailty distribution nor the Markov or semi-Markov assumption, which may not hold in real applications. We establish the consistency, the convergence rate, and the asymptotic normality of the proposed estimators under some regularity conditions. We also provide simulation studies to assess the finite-sample performance of the proposed modeling and estimation strategy. A real data application further illustrates the proposed methodology.

Keywords:

asymptotics; B-spline; illness-death model; Markov model; proportional hazards; semi-competing risks data

MSC:

46N30; 65C60

1. Introduction

In survival analysis, a subject may experience several distinct types of failure. If, apart from censoring, follow-up ends at the occurrence of the first event, such data are often referred to as competing risks data. This framework covers survival data in which failure may be due to one of several competing causes. In some applications, where additional information is available, this notion can be extended to that of semi-competing risks ([1,2]), in which one type of event (the terminal event, e.g., death) may censor the other (the non-terminal event, e.g., relapse of the disease), but not vice versa. The framework of semi-competing risks data has been discussed previously in [1,3]. Furthermore, competing risks data can also be regarded as a special type of multitask prediction problem, in which multiple outcomes are predicted simultaneously from the same set of predictors. A stacking algorithm that borrows information across multiple prediction tasks to improve multivariate prediction performance (MTPS) was recently proposed by [4] and was shown to outperform existing multivariate prediction methods.

Recently, [5] suggested that semi-competing risks data can also be analyzed using the conventional illness–death compartment model, by subjectively specifying a frailty distribution and postulating the Markov or semi-Markov assumption for the conditional transition functions given the covariates and the frailty ([6,7]). However, the specified frailty distribution or the Markov or semi-Markov assumption may not hold in some practical applications, leading to inconsistent estimators. In such cases, alternative (non-Markov) estimators are needed. Furthermore, their nonparametric maximum likelihood estimation approach may be computationally demanding when the sample size is large.

To address the theoretical and numerical challenges in semiparametric estimation of the semi-competing risks model, we employ a B-spline based sieve maximum likelihood approach to estimate the regression parameters and transition functions simultaneously. Covariates are incorporated naturally via proportional hazards assumptions. This approach facilitates easy calculation of the covariance of the model parameters. The proposed spline estimation algorithm requires much less computation than the isotonic-type algorithm used in [5], because the step-function estimator used there has far more parameters (one per jump) than our B-spline based approach. Under certain regularity conditions, we prove that the estimators of the regression parameters are root-n consistent, asymptotically normal, and semiparametrically efficient.

The rest of the paper is organized as follows. In Section 2, we introduce the proposed model and estimation approach. In Section 3, we study the asymptotic properties of the proposed estimators. In Section 4, we provide simulation results. An application to colon cancer data is given in Section 5. We conclude with some discussion in Section 6. All proofs are relegated to Appendix A.

2. Methodology

2.1. Model and Likelihood Function

For the ith subject, let $C_{i}$, $X_{i}$, $T_{i1}$, and $T_{i2}$ denote the censoring time, covariate vector, non-terminal event time, and terminal event time, respectively. Define $Y_{i2} = T_{i2} \wedge C_{i}$, $δ_{i2} = I(T_{i2} \leq C_{i})$, $Y_{i1} = T_{i1} \wedge Y_{i2}$, and $δ_{i1} = I(T_{i1} \leq Y_{i2})$. We observe $(Y_{i1}, δ_{i1}, Y_{i2}, δ_{i2}, X_{i})$, $i = 1, \dots, n$. The hazard functions are defined as follows.

λ_{1} (t_{1}) = lim_{Δ \to 0} P [T_{1} \in [t_{1}, t_{1} + Δ) | T_{1} \geq t_{1}, T_{2} \geq t_{1}] / Δ,

(1)

λ_{2} (t_{2}) = lim_{Δ \to 0} P [T_{2} \in [t_{2}, t_{2} + Δ) | T_{1} \geq t_{2}, T_{2} \geq t_{2}] / Δ,

(2)

λ_{12} (t_{2} | t_{1}) = lim_{Δ \to 0} P [T_{2} \in [t_{2}, t_{2} + Δ) | T_{1} = t_{1}, T_{2} \geq t_{2}] / Δ,

(3)

where $0 < t_{1} < t_{2}$. In general, $λ_{12}(t_{2} | t_{1})$ can depend on both $t_{1}$ and $t_{2}$ (see Remark 1 for a more detailed discussion). Let $Λ_{1}(t) = \int_{0}^{t} λ_{1}(x)\,dx$ and $Λ_{2}(s) = \int_{0}^{s} λ_{2}(x)\,dx$. In the unconditional case, the probability measure P refers to the joint distribution of $(T_{1}, T_{2}, C)$; in the conditional case, it refers to the joint distribution of $(T_{1}, T_{2}, C)$ given X. For the unconditional case, the likelihood function $L(θ)$ takes the form

\prod_{i=1}^{n} λ_{1}(Y_{i1})^{δ_{i1}}\, λ_{2}(Y_{i2})^{(1 - δ_{i1})δ_{i2}}\, λ_{12}(Y_{i2} | Y_{i1})^{δ_{i1}δ_{i2}} \exp\Big\{-Λ_{1}(Y_{i1}) - Λ_{2}(Y_{i1}) - \int_{Y_{i1}}^{Y_{i2}} λ_{12}(s | Y_{i1})\,ds\Big\},

(4)

where $θ = (β_{1}, β_{2}, β_{3}, λ_{10}, λ_{20}, λ_{30})$ is specified below.

For the case with q-dimensional covariates X, the conditional transition rate functions are defined as follows:

λ_{1} (t_{1} | X = x) = λ_{10} (t_{1}) exp (β_{1}^{T} x),

(5)

λ_{2} (t_{2} | X = x) = λ_{20} (t_{2}) exp (β_{2}^{T} x),

(6)

λ_{12} (t_{2} | t_{1}, X = x) = λ_{12, 0} (t_{2} | t_{1}) exp (β_{3}^{T} x) .

(7)

Note that x and X both refer to the covariates: X denotes the random vector and x its observed value. Equations (5)–(7) are the conditional transition functions of $T_{1}$ and $T_{2}$ given $X = x$, while Equations (1)–(3) are the unconditional transition functions of $T_{1}$ and $T_{2}$.

To simplify the notation, denote $λ_{3}(t, s) = λ_{12}(s | t)$, $λ_{30}(t, s) = λ_{12,0}(s | t)$, and $β = (β_{1}^{T}, β_{2}^{T}, β_{3}^{T})^{T}$, and let $β_{0} = (β_{10}^{T}, β_{20}^{T}, β_{30}^{T})^{T}$ denote its true value.

Note that in our modeling approach, $λ_{30}$ depends on two arguments, t and s.

2.2. Sieve Space $Θ_{n}$ for the Parameters $(β_{1}, β_{2}, β_{3}, λ_{10}, λ_{20}, λ_{30})$

We propose a sieve space consisting of B-splines for $λ_{j0}$ (j = 1, 2, 3) in maximizing (4). We suppose that $Y_{1}$ and $Y_{2}$ have compact supports (say $[0, 1]$) and that $∥β∥ \leq M$ for a known constant M.

We reparametrize $λ_{10}(t) = \exp(g_{10}(t))$, $λ_{20}(s) = \exp(g_{20}(s))$, and $λ_{30}(t, s) = \exp(g_{30}(t, s))$. Let $ψ = (g_{1}, g_{2}, g_{3})$ and $ψ_{0} = (g_{10}, g_{20}, g_{30})$.

A sieve space consisting of B-splines is defined for these new parameters as follows. First, we construct an extended partition of the interval $[0, 1]$ with equal subinterval length $1/K_{n}$:

Δ = \{s_{-m} = \dots = s_{-1} = 0 = s_{0} < s_{1} < \dots < s_{K_{n}} = 1 = \dots = s_{K_{n}+m}\},

where m (independent of the sample size n) and $K_{n} = O(n^{ν})$ ($0 < ν < 1/2$) are two integers to be chosen later. These are the two tuning parameters commonly used in B-spline modeling: m governs the smoothness of the basis functions, and $K_{n}$ is the number of subintervals. Let $N_{n} = K_{n} + m$ and let $\{N_{j}^{m}(s)\}_{j=1}^{N_{n}}$ be the normalized B-spline basis associated with Δ (see [8]). Then the sieve space for the parameters $θ = (β, ψ)$ is defined as

\begin{aligned} Θ_{n} = \Big\{ θ_{n} = (β, ψ_{n}) :\ & ψ_{n} = (g_{1n}, g_{2n}, g_{3n}),\ ∥β∥ \leq M, \\ & g_{1n}(t) = \sum_{i=1}^{N_{n}} α_{i} N_{i}^{m}(t),\quad g_{2n}(s) = \sum_{i=1}^{N_{n}} η_{i} N_{i}^{m}(s), \\ & g_{3n}(t, s) = \sum_{i_{1}, i_{2}=1}^{N_{n}} γ_{i_{1} i_{2}} N_{i_{1}}^{m}(t) N_{i_{2}}^{m}(s), \\ & \max_{1 \leq i \leq N_{n}} |α_{i}| \leq M_{n},\ \max_{1 \leq i \leq N_{n}} |η_{i}| \leq M_{n},\ \max_{1 \leq i_{1}, i_{2} \leq N_{n}} |γ_{i_{1} i_{2}}| \leq M_{n} \Big\}, \end{aligned}

(8)

where $M_{n} \leq (2m - 1)/\{2m'(2m + 1)\}$ with a constant $m'$ arbitrarily close to m.
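The sieve elements in (8) are linear combinations of B-splines on the extended partition Δ, and $g_{3n}$ is a tensor product of two such bases. The following is a minimal sketch of how such elements could be evaluated, using the Cox–de Boor recursion. It assumes (our reading, not stated explicitly above) that the boundary knots 0 and 1 are each repeated m + 1 times, so that there are $K_{n} + m$ basis functions of degree m; with m = 3 this gives cubic B-splines. The function names `extended_knots`, `bspline_basis`, `g1_spline`, and `g3_spline` are hypothetical.

```python
def extended_knots(K_n, m):
    """Equally spaced knots on [0, 1]; 0 and 1 each repeated m + 1 times (assumption)."""
    interior = [k / K_n for k in range(1, K_n)]
    return [0.0] * (m + 1) + interior + [1.0] * (m + 1)


def bspline_basis(x, knots, degree):
    """Evaluate all B-splines of the given degree at x via the Cox-de Boor recursion."""
    # Degree-0 splines: indicators of the half-open knot intervals [k_i, k_{i+1}).
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    if x == knots[-1]:
        # Close the last non-degenerate interval so that x = 1 is covered.
        last = max(i for i in range(len(knots) - 1) if knots[i] < knots[i + 1])
        basis[last] = 1.0
    for p in range(1, degree + 1):
        raised = []
        for i in range(len(knots) - p - 1):
            value = 0.0
            left = knots[i + p] - knots[i]
            if left > 0:
                value += (x - knots[i]) / left * basis[i]
            right = knots[i + p + 1] - knots[i + 1]
            if right > 0:
                value += (knots[i + p + 1] - x) / right * basis[i + 1]
            raised.append(value)
        basis = raised
    return basis  # length len(knots) - degree - 1 = K_n + m


def g1_spline(t, alpha, knots, degree):
    """Univariate sieve element g_{1n}(t) = sum_i alpha_i N_i(t)."""
    return sum(a * b for a, b in zip(alpha, bspline_basis(t, knots, degree)))


def g3_spline(t, s, gamma, knots, degree):
    """Tensor-product element g_{3n}(t, s) = sum_{i,j} gamma[i][j] N_i(t) N_j(s)."""
    bt = bspline_basis(t, knots, degree)
    bs = bspline_basis(s, knots, degree)
    return sum(gamma[i][j] * bt[i] * bs[j]
               for i in range(len(bt)) for j in range(len(bs)))


if __name__ == "__main__":
    knots = extended_knots(K_n=4, m=3)
    values = bspline_basis(0.3, knots, degree=3)
    print(len(values), round(sum(values), 12))  # 7 1.0
```

For example, with $K_{n} = 4$ and m = 3 the basis has $K_{n} + m = 7$ functions that sum to one at every point of $[0, 1]$, so constant coefficients reproduce a constant $g$. The bivariate element needs $N_{n}^{2}$ coefficients, which is why the tensor-product part dominates the size of the sieve.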

For any $θ_{i} = (β_{i}, ψ_{i}) \in Θ$ (i = 1, 2), we define the distance

d(θ_{1}, θ_{2}) = ∥β_{1} - β_{2}∥ + ∥ψ_{1} - ψ_{2}∥_{2}.

Remark 1.

Here we assume that the transition intensity $λ_{30}(\cdot)$ depends on both $t_{1}$ and $t_{2}$. A Markov process specifies that $λ_{30}(t_{1}, t_{2}) = h_{1}(t_{2})$, and a semi-Markov process specifies that $λ_{30}(t_{1}, t_{2}) = h_{2}(t_{2} - t_{1})$. In either case, $λ_{30}$ depends on only one argument, so both are special cases of our modeling approach, in which $λ_{30}$ can depend flexibly on both arguments.

2.3. Maximization

Let $P_{n}$ and P denote the empirical measure and the true probability measure of $W = (δ_{1}, δ_{2}, Y_{1}, Y_{2}, X)$, respectively. We maximize

\begin{aligned} l_{n}(β, ψ) = P_{n} l(θ; W) = P_{n}\Big\{ & δ_{1}[X^{T}β_{1} + g_{1}(Y_{1})] + (1 - δ_{1})δ_{2}[X^{T}β_{2} + g_{2}(Y_{2})] + δ_{1}δ_{2}[X^{T}β_{3} + g_{3}(Y_{1}, Y_{2})] \\ & - e^{X^{T}β_{1}} \int_{0}^{Y_{1}} \exp(g_{1}(u))\,du - e^{X^{T}β_{2}} \int_{0}^{Y_{1}} \exp(g_{2}(u))\,du - e^{X^{T}β_{3}} \int_{Y_{1}}^{Y_{2}} \exp(g_{3}(Y_{1}, s))\,ds \Big\} \end{aligned}

(9)

over the sieve space $Θ_{n}$.
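To see how each observation enters the criterion, here is a minimal sketch of the per-subject log-likelihood $l(θ; W)$ under the proportional hazards forms (5)–(7): the covariate factor $e^{X^{T}β_{j}}$ multiplies each cumulative baseline hazard, and both transitions out of the initial state are at risk up to $Y_{1}$, as in (4). The cumulative hazards are computed with a composite trapezoidal rule, which is our choice rather than anything prescribed above; `loglik_one`, `average_loglik`, and `trapezoid` are hypothetical names, and `g1`, `g2`, `g3` can be any callables (for instance, spline elements of $Θ_{n}$).

```python
import math


def trapezoid(f, a, b, n_steps=200):
    """Composite trapezoidal rule for the integral of f over [a, b] (0 if b <= a)."""
    if b <= a:
        return 0.0
    h = (b - a) / n_steps
    inner = sum(f(a + k * h) for k in range(1, n_steps))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def loglik_one(obs, beta1, beta2, beta3, g1, g2, g3, n_steps=200):
    """Log-likelihood contribution of one observation W = (y1, d1, y2, d2, x)."""
    y1, d1, y2, d2, x = obs
    eta1, eta2, eta3 = dot(x, beta1), dot(x, beta2), dot(x, beta3)
    ll = 0.0
    if d1:                       # non-terminal event observed at y1
        ll += eta1 + g1(y1)
    if not d1 and d2:            # terminal event with no prior non-terminal event
        ll += eta2 + g2(y2)
    if d1 and d2:                # terminal event after the non-terminal event
        ll += eta3 + g3(y1, y2)
    # Both transitions out of the initial state are at risk on [0, y1].
    ll -= math.exp(eta1) * trapezoid(lambda u: math.exp(g1(u)), 0.0, y1, n_steps)
    ll -= math.exp(eta2) * trapezoid(lambda u: math.exp(g2(u)), 0.0, y1, n_steps)
    # After the non-terminal event, the terminal transition is at risk on [y1, y2]
    # (this interval is empty when d1 = 0, because then y1 = y2).
    ll -= math.exp(eta3) * trapezoid(lambda s: math.exp(g3(y1, s)), y1, y2, n_steps)
    return ll


def average_loglik(data, beta1, beta2, beta3, g1, g2, g3, n_steps=200):
    """l_n = P_n l(theta; W): the average contribution over the sample."""
    return sum(loglik_one(obs, beta1, beta2, beta3, g1, g2, g3, n_steps)
               for obs in data) / len(data)
```

In practice `average_loglik` would be maximized over $β$ and the spline coefficients with a general-purpose optimizer; that step is not shown.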

For knot selection, we let $m = 3$ and choose $K_{n}$ to minimize the Bayesian information criterion

BIC(N_{n}) = -2\, l_{n}(\hat{β}, \hat{ψ}) + \frac{\log n}{n}(3N_{n} + 3q).
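Knot selection by BIC amounts to refitting the model for each candidate $K_{n}$ and keeping the minimizer. A minimal sketch follows, assuming the criterion is written as $-2\, l_{n}$ plus the penalty $(\log n / n)(3N_{n} + 3q)$ so that a larger maximized log-likelihood lowers it; this sign convention is our assumption. The fitting routine `fit` is hypothetical and must be supplied by the user, and `bic` and `select_num_subintervals` are illustrative names.

```python
import math


def bic(avg_loglik, n, N_n, q):
    """BIC(N_n) = -2 * l_n + (log n / n) * (3 N_n + 3 q)."""
    return -2.0 * avg_loglik + math.log(n) / n * (3 * N_n + 3 * q)


def select_num_subintervals(fit, candidates, n, m=3, q=1):
    """Return the K_n minimizing BIC together with all scores.

    `fit` is a hypothetical user-supplied routine: fit(K_n) must return the
    maximized average log-likelihood l_n(beta_hat, psi_hat) over the sieve
    space built with K_n subintervals.
    """
    scores = {}
    for K_n in candidates:
        scores[K_n] = bic(fit(K_n), n, K_n + m, q)
    best = min(scores, key=scores.get)
    return best, scores
```

Because every extra subinterval adds three coefficients to the penalty here, the criterion stops growing $K_{n}$ once the gain in $l_{n}$ falls below about $3 \log n / n$ per step.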

3. Theoretical Properties

In this section, we establish the theoretical properties of our spline-based modeling strategy under the following regularity conditions.

Assumptions

(A1) $Y_{1}$ and $Y_{2}$ have compact supports (say $[0, 1]$), and X has bounded support in $R^{q}$, where q is the dimension of X. Moreover, if there exist a constant $c_{0}$ and a constant vector $\tilde{γ}$ such that $\tilde{γ}^{⊤} X = c_{0}$ almost surely, then $c_{0} = 0$ and $\tilde{γ} = 0$.
(A2) $β_{0} \in B$, where B is a compact set of $R^{3q}$ with nonempty interior; $λ_{10}, λ_{20} \in H_{r}$ and $λ_{30} \in C_{r}$.
(A3) $K_{n} = O(n^{ν})$, where ν satisfies $0.25/r < ν < 0.5$.
(A4) $r \geq 2$, where r is the smoothness index of $λ_{j0}$ in the definitions of $H_{r}$ and $C_{r}$.

We first establish the strong consistency of the estimated model parameters.

Theorem 1.

Under Assumptions A1–A3, $\hat{β}$ is a strongly consistent estimator of the true coefficient vector $β_{0}$, and

∥\hat{λ}_{1} - λ_{10}∥_{2} \to 0, \quad ∥\hat{λ}_{2} - λ_{20}∥_{2} \to 0, \quad ∥\hat{λ}_{3} - λ_{30}∥_{2} \to 0

almost surely.

Next, we obtain the convergence rates for the proposed estimators.

Theorem 2.

Under Assumptions A1–A3, it holds that

∥ {\hat{λ}}_{1} - λ_{10} ∥_{2} + ∥ {\hat{λ}}_{2} - λ_{20} ∥_{2} + {∥ {\hat{λ}}_{3} - λ_{30} ∥}_{2} = O_{p} (n^{- r ν} + n^{- (1 / 2 - ν)}) .

This theorem implies that if $ν = 1/(2 + 2r)$, then $∥\hat{λ}_{3} - λ_{30}∥_{2} = O_{p}(n^{-r/(2r + 2)})$, which is the optimal convergence rate for bivariate function estimation in nonparametric regression established by [9].
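The choice of ν balances the two terms of Theorem 2: the spline approximation error $n^{-rν}$ and the estimation error $n^{-(1/2 - ν)}$ (these are the two pieces combined in the proof in Appendix A.2). A short worked derivation, which also checks that this ν is admissible under Assumption A3:

```latex
\begin{aligned}
r\nu = \tfrac{1}{2} - \nu
  &\iff \nu = \frac{1}{2(r+1)} = \frac{1}{2 + 2r},\\
n^{-r\nu} = n^{-(1/2 - \nu)} &= n^{-r/(2r+2)},\\
\frac{0.25}{r} < \frac{1}{2 + 2r} &\iff 4r > 2r + 2 \iff r > 1 .
\end{aligned}
```

So the balancing ν lies in the range required by A3 whenever r > 1, in particular under A4, and it is always below 1/2. The resulting exponent r/(2r + 2) is Stone's rate $n^{-r/(2r + d)}$ with dimension d = 2.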

To establish the asymptotic normality of the proposed estimators, we compute directional derivatives of the log-likelihood in the associated function spaces, as follows.

Let V denote the linear span of $Θ_{0} - θ_{0}$, where $θ_{0}$ denotes the true value of $θ = (β, ψ)$ and $Θ_{0}$ denotes the true parameter space. Let $l(θ; W)$ be the log-likelihood for a sample of size one, and let $δ_{n} = n^{-rν} + n^{-(1/2 - ν)}$.

For any $θ \in \{θ \in Θ_{0} : ∥θ - θ_{0}∥ = O(δ_{n})\}$, define the first-order directional derivative of $l(θ; W)$ in the direction $v \in V$ as

\dot{l}(θ; W)[v] = \frac{d\, l(θ + s v; W)}{d s}\Big|_{s = 0},

and the second-order directional derivative in the directions $v, \tilde{v} \in V$ as

\ddot{l}(θ; W)[v, \tilde{v}] = \frac{\partial^{2} l(θ + s v + \tilde{s} \tilde{v}; W)}{\partial \tilde{s}\, \partial s}\Big|_{s = 0, \tilde{s} = 0} = \frac{d\, \dot{l}(θ + \tilde{s} \tilde{v}; W)[v]}{d \tilde{s}}\Big|_{\tilde{s} = 0}.

Define the Fisher inner product on V as

\langle v, \tilde{v} \rangle = P\{\dot{l}(θ_{0}; W)[v]\, \dot{l}(θ_{0}; W)[\tilde{v}]\}

and the Fisher norm of $v \in V$ as $∥v∥ = \langle v, v \rangle^{1/2}$. Let $\bar{V}$ be the closed linear span of V under the Fisher norm. Then $(\bar{V}, ∥\cdot∥)$ is a Hilbert space.

Define the smooth functional of θ as

γ(θ) = b^{'}β + \int_{0}^{1} ϕ_{1}(t) λ_{1}(t)\,dt + \int_{0}^{1} ϕ_{2}(s) λ_{2}(s)\,ds + \int_{0}^{1}\int_{0}^{1} ϕ_{3}(t, s) λ_{3}(t, s)\,dt\,ds,

where b is any 3q-dimensional vector with $∥b∥ \leq 1$, $ϕ_{i} \in H_{r}[0, 1]$ (i = 1, 2), and $ϕ_{3} \in C_{r}[0, 1]^{2}$.

For any $v \in V$, we denote

\dot{γ}(θ_{0})[v] = \frac{d\, γ(θ_{0} + s v)}{d s}\Big|_{s = 0}

whenever the right-hand-side limit is well defined, and assume:

(A5) for any $v \in \bar{V},$ $γ (θ_{0} + s v)$ is continuously differentiable in $s \in [0, 1]$ near $s = 0,$ and

$∥ \dot{γ} (θ_{0}) ∥ = sup_{v \in \bar{V} : ∥ v ∥ > 0} \frac{| \dot{γ} (θ_{0}) [v] |}{∥ v ∥} < \infty .$

Note that

γ (θ) - γ (θ_{0}) = \dot{γ} (θ_{0}) [θ - θ_{0}] .

Under Assumption A5, by the Riesz representation theorem, there exists $v^{*} \in \bar{V}$ such that $\dot{γ}(θ_{0})[v] = \langle v^{*}, v \rangle$ for all $v \in \bar{V}$ and $∥v^{*}∥^{2} = ∥\dot{γ}(θ_{0})∥^{2}$.
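The Riesz representer can be made concrete in a purely parametric special case, which is our simplification and not the setting of the theorem: suppose θ = β, the score is $\dot{l}(θ_{0}; W)[v] = v^{⊤} S(W)$ for a score vector S, the Fisher information is $I = P\{S S^{⊤}\}$, and $γ(θ) = b^{'}β$. Then

```latex
\begin{aligned}
\langle v, \tilde v \rangle &= P\{v^{\top} S(W)\, S(W)^{\top} \tilde v\} = v^{\top} I\, \tilde v,\\
\dot\gamma(\theta_0)[v] &= b^{\top} v = (I^{-1} b)^{\top} I\, v = \langle I^{-1} b,\, v \rangle
  \quad\Longrightarrow\quad v^{*} = I^{-1} b,\\
\|v^{*}\|^{2} &= b^{\top} I^{-1} I\, I^{-1} b = b^{\top} I^{-1} b
  = \sup_{v \neq 0} \frac{(b^{\top} v)^{2}}{v^{\top} I\, v} = \|\dot\gamma(\theta_0)\|^{2}.
\end{aligned}
```

The last supremum follows from the Cauchy–Schwarz inequality in the $I$-inner product. In this case the asymptotic variance $b^{'} I^{-1} b$ is the familiar inverse-information variance, which is the finite-dimensional counterpart of $∥\dot{γ}(θ_{0})∥^{2}$ in Theorem 3.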

Theorem 3.

Suppose that $r > 2$ and Assumptions A1–A3 and A5 hold. Then

n^{1/2}(γ(\hat{θ}) - γ(θ_{0})) \to N(0, ∥\dot{γ}(θ_{0})∥^{2})

in distribution, and $γ(\hat{θ})$ is semiparametrically efficient.

Remark 2.

Inference about $\hat{β}$. Theorem 3 makes inference straightforward, especially for the regression parameter β. Setting $ϕ_{j}(\cdot) = 0$ (j = 1, 2, 3), Theorem 3 yields

n^{1/2} b^{'}(\hat{β} - β_{0}) \to N(0, b^{'} Σ_{ββ} b)

for every b, and thus, by the Cramér–Wold device,

n^{1/2}(\hat{β} - β_{0}) \to N(0, Σ_{ββ}),

with $\hat{β}$ semiparametrically efficient. The covariance matrix $Σ_{ββ}$ can be consistently estimated using the inverse of the Hessian matrix.
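A minimal sketch of the inverse-Hessian standard errors, assuming the total (not averaged) log-likelihood is available as a function of the stacked parameter vector (for example, β together with the spline coefficients), and that its Hessian is approximated by central finite differences. Both the finite-difference approach and the names `numerical_hessian`, `invert`, and `hessian_standard_errors` are our illustrative choices; under this assumption the standard errors of $\hat{β}$ are read off the corresponding diagonal entries of the inverse observed information $-H^{-1}$.

```python
import math


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    k = len(theta)

    def f_shifted(shifts):
        point = list(theta)
        for idx, delta in shifts:
            point[idx] += delta
        return f(point)

    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            value = (f_shifted([(i, h), (j, h)]) - f_shifted([(i, h), (j, -h)])
                     - f_shifted([(i, -h), (j, h)]) + f_shifted([(i, -h), (j, -h)])) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value
    return hess


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (intended for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hessian_standard_errors(total_loglik, theta_hat, h=1e-4):
    """Covariance = inverse of the observed information -H; returns (SEs, covariance)."""
    hess = numerical_hessian(total_loglik, theta_hat, h)
    cov = invert([[-v for v in row] for row in hess])
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))], cov
```

As a sanity check, for an exponential model with d events and total exposure T, the log-likelihood $d \log θ - θ T$ gives $\hat{θ} = d/T$ and a standard error of $\hat{θ}/\sqrt{d}$, which this routine recovers numerically.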

Remark 3.

Inference about $λ_{j}(\cdot)$ (j = 1, 2, 3). For $λ_{j}(\cdot)$ (j = 1, 2), let $b = 0$ and $ϕ_{k} = 0$ for $k \neq j$; then Theorem 3 yields

n^{1/2} \int_{0}^{1} ϕ_{j}(w)(\hat{λ}_{j}(w) - λ_{j0}(w))\,dw \to N(0, σ_{λ_{j}}^{2}),

where $σ_{λ_{j}}^{2}$ (j = 1, 2) can be consistently estimated using the delta method or resampling methods. Inference for $λ_{3}(t, s)$ is similar: letting $b = 0$, $ϕ_{1}(\cdot) = 0$, and $ϕ_{2}(\cdot) = 0$, Theorem 3 yields

n^{1/2} \int_{0}^{1}\int_{0}^{1} ϕ_{3}(t, s)(\hat{λ}_{3}(t, s) - λ_{30}(t, s))\,dt\,ds \to N(0, σ_{λ_{3}}^{2}),

where $σ_{λ_{3}}^{2}$ can also be consistently estimated using the delta method or resampling methods. These results can be used to check for a linear (or quadratic) effect of $t_{j}$ (j = 1, 2), or to check whether $λ_{3}(t_{1}, t_{2})$ is additive in $t_{1}$ and $t_{2}$.

4. Simulation Study

We conducted simulations to investigate the finite-sample performance of the proposed estimator. In the simulation, we let

λ_{10}(t_{1}) = \frac{1}{1 + 2t_{1}}, \quad λ_{20}(t_{2}) = \frac{1}{1 + 2t_{2}}, \quad λ_{30}(t_{1}, t_{2}) = \frac{2}{1 + t_{1} + t_{2}}.

Because $λ_{30}$ depends on $t_{1}$ and $t_{2}$ through $t_{1} + t_{2}$, it is neither of the Markov form (a function of $t_{2}$ alone) nor of the semi-Markov form (a function of $t_{2} - t_{1}$ alone), so these transition functions do not follow the frailty-based Markov or semi-Markov models ([1,5]). It is therefore of interest to examine whether the proposed spline-based procedure still yields reliable and accurate estimates in this scenario, which existing approaches cannot handle. We report results with one covariate X, uniformly distributed on $[0, 0.5]$.

We consider three settings, $β_{1} = β_{2} = β_{3} = 1$, $-1$, and $0.5$, with sample sizes $n = 200$ and $400$. The censoring time was simulated from a uniform distribution on $(0, τ)$ with $τ = 50$.
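One way to generate data from this design is by inverse-transform sampling, since all three cumulative hazards invert in closed form. The sketch below is an illustration, not the authors' code: the two transitions out of the initial state are drawn as independent latent times (a standard way to simulate cause-specific hazards), with $λ_{10}(t) = λ_{20}(t) = 1/(1 + 2t)$, $λ_{30}(t_{1}, s) = 2/(1 + t_{1} + s)$, a scalar covariate $X \sim U(0, 0.5)$, and $C \sim U(0, τ)$. The names `simulate` and `simulate_subject` are hypothetical; each record is $(Y_{1}, δ_{1}, Y_{2}, δ_{2}, X)$.

```python
import math
import random


def inverse_cumhaz_baseline(e, c):
    """Solve c * log(1 + 2t) / 2 = e for t (baseline hazard 1 / (1 + 2t))."""
    return (math.exp(2.0 * e / c) - 1.0) / 2.0


def simulate_subject(beta, tau, rng):
    b1, b2, b3 = beta
    x = rng.uniform(0.0, 0.5)                       # one covariate, X ~ U(0, 0.5)
    c1, c2, c3 = math.exp(b1 * x), math.exp(b2 * x), math.exp(b3 * x)
    # Latent times for the two transitions out of the initial state.
    t1_latent = inverse_cumhaz_baseline(rng.expovariate(1.0), c1)
    t2_direct = inverse_cumhaz_baseline(rng.expovariate(1.0), c2)
    if t1_latent < t2_direct:
        t1 = t1_latent
        # lambda_3(t1, s | x) = 2 c3 / (1 + t1 + s) for s > t1, with cumulative
        # 2 c3 log((1 + t1 + s) / (1 + 2 t1)); invert it at an Exp(1) draw.
        e = rng.expovariate(1.0)
        t2 = (1.0 + 2.0 * t1) * math.exp(e / (2.0 * c3)) - 1.0 - t1
    else:
        t1, t2 = math.inf, t2_direct                # terminal event without illness
    c = rng.uniform(0.0, tau)                       # censoring time C ~ U(0, tau)
    y2, d2 = min(t2, c), int(t2 <= c)
    y1, d1 = min(t1, y2), int(t1 <= y2)
    return y1, d1, y2, d2, x


def simulate(n, beta, tau=50.0, seed=None):
    """Draw n i.i.d. records (Y1, delta1, Y2, delta2, X)."""
    rng = random.Random(seed)
    return [simulate_subject(beta, tau, rng) for _ in range(n)]
```

For example, `simulate(200, (1.0, 1.0, 1.0), seed=1)` produces one replicate of the first setting; by construction $Y_{1} = Y_{2}$ whenever $δ_{1} = 0$.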

We compute the spline-based semiparametric maximum likelihood estimate using cubic B-splines and estimate the standard errors of the estimated regression parameters using the inverse of the Hessian matrix. The number of knots $K_{n}$, or equivalently $N_{n} = K_{n} + m$, is chosen by the BIC defined in Section 2.3. Table 1, Table 2 and Table 3 present the estimation bias (BIAS), the standard deviation of the estimates (STD), the mean of the estimated standard errors (ESE), and the coverage proportion of the 95% confidence intervals (CP), based on 500 replicates.
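The four summaries reported in the tables can be computed from the replicate estimates and their estimated standard errors. A minimal sketch follows, assuming Wald intervals $\hat{β} \pm 1.96\,\widehat{SE}$ for CP and the sample standard deviation across replicates for STD; these conventions are our assumptions, and `summarize_replicates` is a hypothetical name.

```python
import statistics


def summarize_replicates(estimates, std_errors, truth, z=1.959963984540054):
    """BIAS, STD, ESE and CP for one regression parameter across replicates."""
    bias = statistics.mean(estimates) - truth
    std = statistics.stdev(estimates)           # empirical SD of the estimates
    ese = statistics.mean(std_errors)           # mean estimated standard error
    covered = sum(1 for est, se in zip(estimates, std_errors)
                  if est - z * se <= truth <= est + z * se)
    return {"BIAS": bias, "STD": std, "ESE": ese, "CP": covered / len(estimates)}
```

Comparing STD with ESE checks the inverse-Hessian variance estimator, and CP near 0.95 checks the normal approximation of Theorem 3.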

Table 1. Simulation results for $(β_{10}, β_{20}, β_{30}) = (1, 1, 1)$.

Table 2. Simulation results for $(β_{10}, β_{20}, β_{30}) = (-1, -1, -1)$.

Table 3. Simulation results for $(β_{10}, β_{20}, β_{30}) = (0.5, 0.5, 0.5)$.

From Table 1, Table 2 and Table 3, we see that (a) the proposed estimates have small biases; (b) the standard deviations of the estimates shrink at approximately the $\sqrt{n}$ rate; (c) the mean estimated standard errors are generally close to the empirical standard deviations; and (d) the 95% confidence intervals provide adequate coverage. These results suggest that the proposed modeling strategy and estimation procedure yield reliable and accurate estimates that are directly interpretable in practice.

5. A Real Data Example

Because our B-spline based modeling strategy neither involves a subjective specification of the frailty distribution nor requires the Markov or semi-Markov assumption, which may not hold in real applications, it is more flexible than existing approaches in practice. To illustrate this point, we apply the illness–death model presented in Section 2 to the colon cancer data. It is of interest to examine whether the time spent in state 1 (the past) is related to the transition function from state 2 to state 3. To answer this question, we consider the working model

λ_{3}(t, s) = \exp(ξ t) λ(s),

so that the question amounts to testing $H_{0}: ξ = 0$. This can be done using the usual likelihood ratio statistic. For the colon cancer study, the effect of the time spent in state 1 is significant (p-value $< 0.05$). We therefore conclude that the Markov assumption may be unsatisfactory for the colon cancer data set. This further demonstrates that the stringent assumptions required by existing approaches may not hold in practice, which motivates our proposed methodology.
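Since the working model adds a single parameter ξ, the likelihood ratio statistic is compared with a $χ^{2}_{1}$ distribution. A minimal sketch of that final step, assuming the two maximized total log-likelihoods (with ξ fixed at 0 and with ξ free) have already been obtained by some fitting routine that is not shown; `lrt_1df` is a hypothetical name, and the colon cancer results are not reproduced here.

```python
import math


def lrt_1df(loglik_null, loglik_alt):
    """Likelihood ratio test of H0: xi = 0 (one restricted parameter).

    loglik_null / loglik_alt are maximized total log-likelihoods with xi fixed
    at 0 and with xi free. Returns (statistic, p-value) using the chi-square(1)
    reference distribution: P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)  # clip optimizer noise
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

The closed form uses the fact that a $χ^{2}_{1}$ variable is the square of a standard normal, so no special-function library is needed; a statistic of about 3.84 corresponds to p = 0.05.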

For illustrative purposes, we consider only one covariate: Lev+5-FU treatment. Our interest centers on understanding the effect of Lev+5-FU treatment and on modeling the transition functions between the states nonparametrically. Table 4 reports the estimates of the regression coefficients along with their standard errors and p-values. Figure 1 shows the compartment model, and Figure 2 plots the estimated transition functions, which have a direct interpretation: they describe quantitatively how the hazards of the terminal and non-terminal events evolve over time, and they shed light on disease progression and death risks for colon cancer patients with and without relapse of the cancer.

Table 4. Estimated regression coefficients and their standard errors for the colon data.

Figure 1. Compartment model for semicompeting risks data.

Figure 2. Estimated transition functions for the colon cancer data.

Furthermore, to illustrate the computational advantage of our approach in this application, the existing frailty-model approach requires $3 + 413 + 1 = 417$ parameters, whereas our B-spline approach requires only $(m + K_{n}) \times 3 + 3 = (4 + 8) \times 3 + 3 = 39$ parameters. Hence, the computational cost is substantially reduced, while our approach is also more flexible because it requires neither a subjective specification of the frailty distribution nor the Markov or semi-Markov assumption.

6. Concluding Remarks

In this paper, we proposed a spline-based sieve semiparametric maximum likelihood method for semi-competing risks data. The method reduces the dimensionality of the estimation problem through splines and therefore eases the numerical burden of computation. It also allows easy inference for both the regression parameters and the transition functions. It should be straightforward to extend the method to allow for nonlinear relationships between continuous predictors and survival in the multi-state framework ([6,10] and others). Simulations showed that the new estimator performs well. For illustration, we used a real data set from a clinical trial for colon cancer. Competing risks data can also be regarded as a special type of multitask prediction problem. In that field, a state-of-the-art method is MTPS [4], which currently does not support predicting survival outcomes. Following that approach, it would be worthwhile to study stacked algorithms for prediction with multivariate survival outcomes, including competing risks and semi-competing risks data.

Author Contributions

Conceptualization, J.X.; methodology, X.H. and J.X.; software, X.H.; formal analysis, X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Theorem 1, Theorem 2, and Theorem 3

This section contains the proofs of Theorems 1–3. Some empirical process results developed in [11] are used repeatedly. Throughout the proofs, we denote $Pf = \int f(x)\,dP(x)$ and $P_{n}f = n^{-1}\sum_{i=1}^{n} f(X_{i})$, the empirical measure applied to a function $f(X)$.

Appendix A.1. Proof of Theorem 1

By applying the inequality (31) in [12] (p. 31), we have

sup_{θ \in Θ_{n}} | P_{n} l (θ; W) - P l (θ; W) | \to 0, a . s .

(A1)

Let

ζ_{1n} = \sup_{θ \in Θ_{n}} |P_{n} l(θ; W) - P l(θ; W)|,

(A2)

ζ_{2n} = P_{n} l(θ_{0}; W) - P l(θ_{0}; W).

(A3)

Denote $K_{ϵ} = \{θ : d(θ, θ_{0}) \geq ϵ, θ \in Θ_{n}\}$. Then

\inf_{K_{ϵ}} P l(θ; W) = \inf_{K_{ϵ}} \{P l(θ; W) - P_{n} l(θ; W) + P_{n} l(θ; W)\} \leq ζ_{1n} + \inf_{K_{ϵ}} P_{n} l(θ; W).

(A4)

If $\hat{θ}_{n} \in K_{ϵ}$, we have

\inf_{K_{ϵ}} P_{n} l(θ; W) = P_{n} l(\hat{θ}; W) \leq P_{n} l(θ_{0}; W) = P_{n} l(θ_{0}; W) - P l(θ_{0}; W) + P l(θ_{0}; W) = ζ_{2n} + P l(θ_{0}; W).

(A5)

By condition A3, we obtain that

\inf_{K_{ϵ}} P l(θ; W) - P l(θ_{0}; W) = δ_{ϵ} > 0.

Combining (A4) and (A5), on the event $\{\hat{θ}_{n} \in K_{ϵ}\}$ we have $δ_{ϵ} \leq ζ_{1n} + ζ_{2n}$. Since $ζ_{1n} \to 0$ almost surely by (A1) and $ζ_{2n} \to 0$ almost surely by the strong law of large numbers, this event occurs for only finitely many n, almost surely. Because ϵ > 0 is arbitrary, $d(\hat{θ}_{n}, θ_{0}) \to 0$ almost surely, which completes the proof.

Appendix A.2. Proof of Theorem 2

Note that

E_{P} ∥n^{1/2}(P_{n} - P)∥_{F_{η}} \leq C J_{η}(ε, F_{η}, ∥\cdot∥_{2}) \Big\{1 + \frac{J_{η}(ε, F_{η}, ∥\cdot∥_{2})}{η^{2} n^{1/2}}\Big\},

(A6)

where

J_{η}(ε, F_{η}, ∥\cdot∥_{2}) = \int_{0}^{η} \{1 + \log N_{[\,]}(ε, F_{η}, ∥\cdot∥_{2})\}^{1/2}\,dε \leq C N^{1/2} η.

The right-hand side of (A6) yields

ϕ_{n}(η) = C(N^{1/2} η + N/n^{1/2}).

It is easy to see that $ϕ_{n}(η)/η$ is decreasing in η, and

r_{n}^{2} ϕ_{n}(1/r_{n}) = C(r_{n} N^{1/2} + r_{n}^{2} N/n^{1/2}) = 2C n^{1/2},

where $r_{n} = N^{-1/2} n^{1/2} = n^{1/2 - ν}$, $0 < ν < 1/2$; the last equality holds because the number of spline coefficients N is of order $N_{n}^{2} \asymp n^{2ν}$, the tensor-product coefficients of $g_{3n}$ being dominant. Hence

n^{1/2 - ν} d(\hat{θ}, θ_{n0}) = O_{P}(1)

by Theorem 3.2.5 of [11]. This, together with $d(θ_{n0}, θ_{0}) = O_{p}(n^{-rν})$ (see Theorem 12.7 in [8]), yields

d(\hat{θ}, θ_{0}) = O_{p}(n^{-(1/2 - ν)} + n^{-rν}).

This completes the proof.

Appendix A.3. Proof of Theorem 3

Let $ε_{n}$ be any positive sequence satisfying $ε_{n} = o(n^{-1/2})$. For any $v^{*} \in Θ_{0}$, by Theorem 12.7 of [8], there exists $Π_{n} v^{*} \in Θ_{n}$ such that $∥Π_{n} v^{*} - v^{*}∥ = o(1)$ and $δ_{n} ∥Π_{n} v^{*} - v^{*}∥ = o(n^{-1/2})$. Also define

r[θ - θ_{0}; W] = l(θ; W) - l(θ_{0}; W) - \dot{l}(θ_{0}; W)[θ - θ_{0}].

Then, by the definition of $\hat{θ}$, we have

By (A1), Chebyshev's inequality, the assumption of independent and identically distributed data, and $∥Π_{n} v^{*} - v^{*}∥ = o(1)$, we have $I_{1} = o_{p}(n^{-1/2})$.

For $I_{2}$, we have

\begin{aligned} I_{2} &= (P_{n} - P)\{l(\hat{θ}; W) - l(\hat{θ} \pm ε_{n} Π_{n} v^{*}; W) \pm ε_{n} \dot{l}(θ_{0}; W)[Π_{n} v^{*}]\} \\ &= \mp ε_{n} (P_{n} - P)\{\dot{l}(\tilde{θ}; W)[Π_{n} v^{*}] - \dot{l}(θ_{0}; W)[Π_{n} v^{*}]\}, \end{aligned}

where $\tilde{θ}$ lies between $\hat{θ}$ and $\hat{θ} \pm ε_{n} Π_{n} v^{*}$.

The class $\{\dot{l}(θ; W)[Π_{n} v^{*}] : ∥θ - θ_{0}∥ = O(δ_{n})\}$ is a Donsker class. Therefore, by Theorem 2.11.23 of [11], we have $I_{2} = ε_{n} \times o_{p}(n^{-1/2})$.

Moreover, $δ_{n} ∥Π_{n} v^{*} - v^{*}∥ = o(n^{-1/2})$ and $∥Π_{n} v^{*}∥^{2} \to ∥v^{*}∥^{2}$. Combining the above facts with $P \dot{l}(θ_{0}; W)[v^{*}] = 0$, we can establish that

\begin{aligned} 0 &\leq P_{n}\{l(\hat{θ}; W) - l(\hat{θ} \pm ε_{n} Π_{n} v^{*}; W)\} \\ &= \mp ε_{n} P_{n} \dot{l}(θ_{0}; W)[v^{*}] \pm ε_{n} \langle \hat{θ} - θ_{0}, v^{*} \rangle + ε_{n} \times o_{p}(n^{-1/2}) \\ &= \mp ε_{n} (P_{n} - P)\{\dot{l}(θ_{0}; W)[v^{*}]\} \pm ε_{n} \langle \hat{θ} - θ_{0}, v^{*} \rangle + ε_{n} \times o_{p}(n^{-1/2}). \end{aligned}

Therefore, we obtain

\sqrt{n}\, \langle \hat{θ} - θ_{0}, v^{*} \rangle = \sqrt{n} (P_{n} - P)\{\dot{l}(θ_{0}; W)[v^{*}]\} + o_{p}(1) \to N(0, ∥v^{*}∥^{2}),

where the asymptotic normality follows from the central limit theorem, and the asymptotic variance equals $∥v^{*}∥^{2} = P\{\dot{l}(θ_{0}; W)[v^{*}]\}^{2}$ by the definition of the Fisher norm. This, together with A5, implies

n^{1/2}(γ(\hat{θ}) - γ(θ_{0})) = n^{1/2} \langle \hat{θ} - θ_{0}, v^{*} \rangle + o_{p}(1) \to N(0, ∥v^{*}∥^{2})

in distribution. The semiparametric efficiency can be established by applying the result of [13].

References

1. Fine, J.P.; Jiang, H.; Chappell, R. On semi-competing risks data. Biometrika 2001, 88, 907–919.
2. Wang, W. Estimating the association parameter for copula models under dependent censoring. J. R. Stat. Soc. Ser. B Stat. Methodol. 2003, 65, 257–273.
3. Day, R.; Bryant, J.; Lefkopoulou, M. Adaptation of bivariate frailty models for prediction, with application to biological markers as prognostic indicators. Biometrika 1997, 84, 45–56.
4. Xing, L.; Lesperance, M.L.; Zhang, X. Simultaneous prediction of multiple outcomes using revised stacking algorithms. Bioinformatics 2020, 36, 65–72.
5. Xu, J.; Kalbfleisch, J.D.; Tai, B. Statistical analysis of illness–death processes and semicompeting risks data. Biometrics 2010, 66, 716–725.
6. Andersen, P.K.; Borgan, O.; Gill, R.D.; Keiding, N. Statistical Models Based on Counting Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
7. Kalbfleisch, J.D.; Prentice, R.L. The Statistical Analysis of Failure Time Data; John Wiley & Sons: Hoboken, NJ, USA, 2011.
8. Schumaker, L. Spline Functions: Basic Theory; Cambridge University Press: Cambridge, UK, 2007.
9. Stone, C.J. Optimal global rates of convergence for nonparametric regression. Ann. Stat. 1982, 10, 1040–1053.
10. Meira-Machado, L.; de Uña-Álvarez, J.; Cadarso-Suárez, C.; Andersen, P.K. Multi-state models for the analysis of time-to-event data. Stat. Methods Med. Res. 2009, 18, 195–222.
11. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes: With Applications to Statistics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
12. Pollard, D. Convergence of Stochastic Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
13. Bickel, P.J.; Kwon, J. Inference for semiparametric models: Some questions and an answer. Stat. Sin. 2001, 11, 863–886.


Table 1. Simulation results for $(β_{10}, β_{20}, β_{30}) = (1, 1, 1)$.

$n$	Parameter	BIAS	STD	ESE	CP
200	$β_{1} = 1$	0.021	0.233	0.219	0.953
200	$β_{2} = 1$	−0.016	0.230	0.263	0.954
200	$β_{3} = 1$	0.026	0.281	0.219	0.986
400	$β_{1} = 1$	0.017	0.166	0.159	0.963
400	$β_{2} = 1$	−0.013	0.167	0.164	0.960
400	$β_{3} = 1$	0.018	0.122	0.141	0.965

Table 2. Simulation results for $(β_{10}, β_{20}, β_{30}) = (-1, -1, -1)$.

$n$	Parameter	BIAS	STD	ESE	CP
200	$β_{1} = -1$	−0.015	0.244	0.225	0.956
200	$β_{2} = -1$	0.019	0.232	0.239	0.962
200	$β_{3} = -1$	−0.014	0.269	0.284	0.961
400	$β_{1} = -1$	−0.013	0.144	0.165	0.961
400	$β_{2} = -1$	0.014	0.158	0.164	0.945
400	$β_{3} = -1$	−0.013	0.197	0.185	0.980

Table 3. Simulation results for $(β_{10}, β_{20}, β_{30}) = (0.5, 0.5, 0.5)$.

$n$	Parameter	BIAS	STD	ESE	CP
200	$β_{1} = 0.5$	0.017	0.230	0.205	0.966
200	$β_{2} = 0.5$	−0.013	0.221	0.219	0.965
200	$β_{3} = 0.5$	0.016	0.182	0.218	0.945
400	$β_{1} = 0.5$	0.008	0.172	0.155	0.941
400	$β_{2} = 0.5$	−0.011	0.132	0.152	0.954
400	$β_{3} = 0.5$	0.012	0.125	0.157	0.938

Table 4. Estimated regression coefficients and their standard errors for the colon data.

Transition	Parameter	Estimate	Standard Error	p-Value
1 → 2	$β_{1}$	−0.513	0.119	$1.6 \times 10^{-5}$
1 → 3	$β_{2}$	−0.028	0.379	0.469
2 → 3	$β_{3}$	0.738	0.130	$7.0 \times 10^{-9}$


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
