1. Introduction
Patient responses to pharmacological treatments vary considerably, influenced by genetics, disease characteristics, comorbid conditions, and other individual-specific factors. This variability highlights the importance of personalized treatment approaches over traditional uniform protocols: standardized treatment plans may fail to yield optimal clinical results and can even cause adverse effects in some patients [1,2]. A relevant example is flexible sigmoidoscopy, a common screening method for colorectal cancer. While beneficial for patients expected to live ten or more years, the procedure carries risks, including perforation and infection, that may outweigh its benefits for those with shorter life expectancy. Deciding on such interventions therefore requires a careful assessment of each patient’s health status and prognosis [3].
Given the complexity inherent in personalized care, devising individualized treatment decision rules has become essential. These rules give clinicians a systematic way to tailor therapies to clinical, demographic, and genetic information, thereby reducing treatment-related risks and improving therapeutic outcomes [4,5,6]. The emerging field of precision medicine embodies this concept by aiming to match treatments to the genetic and clinical profile of each patient. By delivering the appropriate medication at the optimal dose and timing, precision medicine seeks to improve outcomes while minimizing side effects and inefficiencies in healthcare delivery [7,8,9]. This individualized strategy not only increases the likelihood of favorable patient outcomes but also promotes more efficient use of healthcare resources.
Various approaches have been developed to estimate optimal treatment regimes, including regression-based methods [10,11,12], value-search methods [13,14,15], and model-based planning frameworks [16,17]. Traditional survival models such as the Cox proportional hazards (PH) model [18] have been widely used to evaluate treatment effects, but they often have limited capacity to capture complex interactions between treatments and patient features. Moreover, censored observations complicate the estimation of optimal treatment strategies, because the event of interest is not fully observed for all individuals. To address these challenges, several extensions of survival analysis techniques have been proposed. For instance, Fang et al. [19] introduce a semiparametric accelerated failure time model for optimal treatment rule estimation, employing augmented inverse probability weighted estimators to handle treatment effect heterogeneity in censored data. Wang et al. [20] extend the Cox model and propose an iterative alternating optimization algorithm to determine optimal treatment rules for censored survival outcomes. Although these extensions improve the capacity of traditional survival models to accommodate complex data structures and treatment heterogeneity, their flexibility is still limited by reliance on pre-specified model forms. In this context, machine learning has opened new opportunities for optimizing treatment regimes, offering the flexibility and predictive power to capture complex, nonlinear relationships without the restrictive assumptions of traditional parametric models [21,22]. A major limitation, however, is the reduced interpretability of many machine learning models. Their “black-box” nature makes it difficult to interpret individual covariate effects and treatment interactions, hindering clinicians’ ability to relate results to clinical knowledge and thereby reducing trust. The absence of explicit parameters also precludes formal statistical inference, such as constructing confidence intervals or conducting hypothesis tests, obscuring the reliability and significance of estimated treatment effect heterogeneity. In contrast, statistical models offer improved interpretability through explicitly estimated parameters with direct clinical meaning, enabling direct quantification of covariate effects. This transparency facilitates clinician understanding, evaluation, and practical application.
Interval censoring frequently occurs in clinical research involving periodic follow-up. A well-studied special case is current status data, where the event of interest is only known to have occurred before or after a single inspection time [23]. More generally, interval censoring refers to situations where the event is only known to occur between two time points, resulting in more complicated data patterns. This type of censoring poses computational and theoretical difficulties and calls for statistical methods that balance model complexity and estimation efficiency. Numerous studies have addressed various aspects of inference with interval-censored data [24,25,26,27]. Despite these efforts, methods specifically designed to estimate optimal treatment regimes under interval censoring remain limited. Developing robust techniques tailored to interval-censored survival data would advance personalized clinical decision-making and contribute to better patient care and precision medicine.
In this research, we introduce a novel semiparametric single-index model to investigate treatment effects on survival outcomes within the framework of interval-censored data. Our approach links the treatment variable to a linear combination of covariates through a flexible monotonic function, enabling the capture of intricate relationships between treatments and patient features. This design also maintains clinical interpretability, offering meaningful guidance for personalized therapy development. For parameter estimation, we employ a combination of nonparametric maximum likelihood and sieve estimation methods. In particular, monotone splines are utilized to approximate the cumulative baseline hazard function, while B-splines are applied to model the unknown link function. The estimation procedure is carried out via an expectation-maximization (EM) algorithm that leverages data augmentation to simplify computations. Using empirical process theory, we establish the asymptotic properties of the estimators. Additionally, simulation studies confirm that the proposed algorithm is robust, computationally efficient, and easy to implement in practice.
The rest of the paper is organized as follows. Section 2 introduces the notation, describes the proposed model, and derives the likelihood function based on the observed data. Section 3 formulates the sieve maximum likelihood estimation and presents the accompanying EM algorithm. Section 4 establishes the asymptotic properties of the proposed estimators. Section 5 and Section 6 present simulation results and apply the method to real-world data, including a case study of the ACTG320 clinical trial. Finally, Section 7 offers concluding remarks and discusses future research directions.
2. Notation, Models and Likelihood
Let
denote a vector of bounded baseline covariates, and let
be the treatment indicator, where 1 corresponds to the treatment group and 0 to the control group. We consider a semiparametric single-index model for the conditional cumulative hazard function of survival time
T, given
and
A, specified as
where
represents the unspecified cumulative baseline hazard function,
is an unknown strictly increasing link function,
denotes a
q-dimensional subset of
,
and
are vectors of unknown regression parameters. Under model (
1), the conditional cumulative distribution function of
T given
and
A can be written as
For identifiability, we impose the constraint
along with the sign restriction
, where
denotes the Euclidean norm and
is the first component of the vector
. When the link function
is linear, model (
1) simplifies to the classical Cox proportional hazards model incorporating a treatment–covariate interaction term [
18].
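For concreteness, the following is a hedged sketch of what model (1) and the induced distribution function look like under this description. The symbols below (X for the covariates, A for the treatment indicator, X-tilde for its q-dimensional sub-vector, beta and gamma for the regression parameters, Lambda for the cumulative baseline hazard, and psi for the link) follow the verbal definitions above; the exact notation is an assumption rather than a verbatim restatement of the original displays.

```latex
% Hedged reconstruction, consistent with the verbal description of model (1):
\Lambda(t \mid \mathbf{X}, A)
   \;=\; \Lambda(t)\,\exp\bigl\{\mathbf{X}^{\top}\boldsymbol{\beta}
       + A\,\psi\bigl(\widetilde{\mathbf{X}}^{\top}\boldsymbol{\gamma}\bigr)\bigr\},
\qquad
F(t \mid \mathbf{X}, A) \;=\; 1 - \exp\bigl\{-\Lambda(t \mid \mathbf{X}, A)\bigr\},
```

with the identifiability constraints read off from the text as a unit Euclidean norm for gamma and a positive first component of gamma; a linear psi then reduces the model to a Cox PH model with a treatment–covariate interaction, as stated above.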
To derive valid estimators under model (
1) that facilitate the identification of optimal treatment rules, we first introduce the notation
to represent the potential survival time an individual would experience if assigned treatment
a, with
. Following Wang et al. [
20], we impose two commonly adopted assumptions from the causal inference framework [
28]:
- (A1)
Stable Unit Treatment Value Assumption: The survival time T equals the potential survival time associated with the received treatment .
- (A2)
No Unmeasured Confounders Assumption: Conditional on the covariates , the treatment assignment A is independent of the potential outcomes .
Assumption (A1) guarantees that one individual’s treatment does not affect the potential outcomes of others. It also requires that there be no hidden variations of a given treatment, so that the potential outcome under the received treatment is well defined for every individual. Assumption (A2) is fundamental but remains unverifiable in observational studies. This assumption implies that there are no unmeasured confounders that simultaneously influence treatment decisions and survival outcomes after accounting for covariates
[
29]. If this assumption is violated, meaning that there exist unmeasured confounders that affect both treatment assignment and outcomes, the estimated treatment effects may be biased. Such bias can consequently lead to suboptimal or even misleading individualized treatment rules. This ultimately reduces the reliability and generalizability of the derived treatment regimes when applied to new patient populations. Given these assumptions, one can evaluate and contrast the potential survival outcomes associated with various treatment strategies and determine the best possible treatment regimen. In particular, the optimal treatment strategy can be formulated as
, where
denotes the indicator function. For better interpretability, this is equivalently expressed as
. The key motivation behind this rule lies in leveraging clinically meaningful contrasts in survival probabilities: when
, the model predicts that the patient’s survival probability is higher under treatment
than under
. Conversely, if
, the survival probability is higher if the patient receives treatment
. This thresholding reflects a direct comparison of estimated survival benefits, grounded in patient-specific covariates summarized by the index
. For each patient, following the treatment assignment defined by this decision rule ensures that the treatment selected is the one predicted to yield the highest survival probability based on their individual characteristics. This maximization is not merely a statistical abstraction but reflects a concrete comparison of patient-specific survival outcomes under competing treatment options. By explicitly basing treatment choice on which regimen offers the most favorable survival chance, the rule provides a clinically interpretable contrast: it indicates when one treatment clearly outperforms the other in terms of expected patient survival.
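To make the rule operational, the short R sketch below applies a decision of the form I{psi(gamma' x-tilde) < 0}, matching the description above. Here `psi_hat` and `gamma_hat` are illustrative placeholders rather than output of the actual fitting procedure, and the coding 1 = treatment, 0 = control follows the definition in Section 2.

```r
# Hedged sketch: applying the estimated single-index rule to new patients.
# `psi_hat` (fitted link) and `gamma_hat` (unit-norm index coefficients) are
# illustrative placeholders, not output from the actual fitting procedure.
assign_treatment <- function(X_tilde, gamma_hat, psi_hat) {
  index <- as.numeric(as.matrix(X_tilde) %*% gamma_hat)  # patient-specific single index
  as.integer(psi_hat(index) < 0)                         # 1 = assign treatment, 0 = control
}

# Illustrative usage with made-up estimates:
gamma_hat <- c(0.8, -0.6)                 # satisfies ||gamma|| = 1 with a positive first entry
psi_hat   <- function(u) u - 0.3          # a hypothetical estimated (linear) link
assign_treatment(data.frame(x1 = c(0.1, 0.9), x2 = c(0.9, 0.1)), gamma_hat, psi_hat)
```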
We consider interval-censored data, where the failure time is known only to fall within a specific time range and cannot be observed exactly. For each subject , the failure time is observed only to fall within the interval , with . Define the indicator variables as follows: if is left-censored, meaning and ; if is interval-censored, i.e., with and ; and if is right-censored, indicating and . Note that for each subject i, exactly one of these indicators is equal to 1, so that . In a randomized clinical trial setting with n subjects, the observed data can be summarized as
Under the two aforementioned assumptions and assuming that the observed failure times are conditionally independent given the covariates, the likelihood function for the observed data in relation to the model parameters can be written as follows:
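As a hedged sketch of the standard form this likelihood takes, writing F(· | X_i, A_i) for the conditional distribution function implied by model (1), (L_i, R_i] for the observation interval of subject i, and (delta_1i, delta_2i, delta_3i) for the left-, interval-, and right-censoring indicators defined above (notation assumed, not quoted):

```latex
L_n \;=\; \prod_{i=1}^{n}
   \bigl[F(R_i \mid \mathbf{X}_i, A_i)\bigr]^{\delta_{1i}}\,
   \bigl[F(R_i \mid \mathbf{X}_i, A_i) - F(L_i \mid \mathbf{X}_i, A_i)\bigr]^{\delta_{2i}}\,
   \bigl[1 - F(L_i \mid \mathbf{X}_i, A_i)\bigr]^{\delta_{3i}}.
```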
To handle the nuisance functions
and
, we approximate them with spline-based functions. In particular, since
is a positive, increasing function satisfying
, it is modeled using monotone splines [
30]. The approximation takes the following form:
where
represents the integrated spline basis functions. Each
is non-decreasing and takes values within the interval
, with corresponding spline coefficients
. To ensure the monotonicity of
, we enforce non-negativity constraints on the coefficients
. The total number of basis functions, denoted by
, is determined once the spline degree and the placement of interior knots are fixed. Here,
represents the number of interior knots, and
refers to the spline degree, which can be set to 1, 2, or 3 for linear, quadratic, or cubic splines, respectively. In practical applications, the interior knots may be chosen either as equally spaced points between the minimum and maximum observed times or according to selected quantiles of the observational time distribution.
On the other hand, we approximate
using
B-splines [
31], which provide a flexible and smooth approximation while ensuring the preservation of the function’s monotonicity. The approximation is given by:
where
represents the quadratic
B-spline basis functions, and
’s denote the corresponding spline coefficients. To ensure the monotonicity of
and to maintain numerical stability in the estimation process, we impose the condition
for some constant
. Similar to monotone splines,
B-splines require specifying the spline degree
and the set of interior knots. The total number of basis functions,
, is given by
where
denotes the number of interior knots. These knots may be placed either uniformly spaced between the minimum and maximum of
, or selected based on quantiles of these values for any unit vector
.
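As a concrete illustration of these two spline systems, the sketch below builds an integrated (monotone) spline basis for the cumulative baseline hazard and a quadratic B-spline basis for the link function using the `splines2` package; the package choice, knot placement, and variable names are illustrative assumptions rather than the authors' implementation.

```r
# Hedged sketch of the two spline systems using the `splines2` package.
library(splines2)
set.seed(1)

## Monotone (integrated) spline basis for the cumulative baseline hazard Lambda(t):
## cubic I-splines with interior knots at quantiles of pooled observation times.
obs_times <- runif(200, 0.1, 3)                          # placeholder inspection times
knots_t   <- quantile(obs_times, probs = c(0.2, 0.4, 0.6, 0.8))
I_basis   <- iSpline(obs_times, knots = knots_t, degree = 3, intercept = TRUE)
## Lambda(t) is then approximated by I_basis %*% eta with eta >= 0 (monotonicity).

## Quadratic B-spline basis for the unknown link function psi(u), u = gamma' x-tilde:
index_vals <- runif(200, -1, 1)                          # placeholder index values
knots_u    <- quantile(index_vals, probs = c(0.25, 0.50, 0.75))
B_basis    <- bSpline(index_vals, knots = knots_u, degree = 2, intercept = TRUE)
## psi(u) is approximated by B_basis %*% alpha with ordered coefficients (monotonicity).
```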
With these approximations, the likelihood in Equation (
2) can be expressed as:
where
Maximizing the likelihood function in (
3) is often computationally challenging due to its intractable form and the large number of parameters involved. Even for relatively simple models, such as the PH model, the optimization task remains inherently complex. In such cases, traditional methods like the Newton–Raphson algorithm can encounter significant numerical issues, including non-convergence or convergence to local extrema rather than the global optimum [
32,
33]. To address these challenges and simplify the computation of the sieve maximum likelihood, we develop an EM algorithm as a more efficient and tractable solution. This approach incorporates a three-stage data augmentation procedure, which effectively reduces the computational complexity and mitigates the numerical instability commonly associated with direct maximization. By introducing Poisson latent variables and iteratively refining the parameter estimates, the EM algorithm offers a more stable and efficient solution for estimating the parameters in the model.
3. Estimation Procedure
To improve the efficiency and stability of the estimation procedure, we propose an EM algorithm augmented with a three-stage data augmentation scheme. The EM algorithm alternates between two key steps: in the E-step, conditional expectations of latent variables are computed given the observed data and current parameter values; in the M-step, parameter estimates are updated by maximizing the expected log-likelihood obtained from the E-step. This iterative process continues until convergence, yielding stable and reliable parameter estimates. Incorporating the three-stage data augmentation further enhances computational efficiency and numerical stability by simplifying the calculations involved.
To facilitate maximization, we augment the dataset by introducing two layers of independent Poisson latent variables. Specifically, for each subject
i, two independent Poisson random variables are defined as
and
. Here, the notation
denotes the Poisson distribution with mean
. We express the terms
and
as follows:
and
if
. Using the spline-based representation of
, we can decompose the random variables
and
into the following sums
where
represent independent Poisson random variables. Each
has an expected value of
, while the expected value for each
is given by
, for
. We use the notation
to denote the probability mass function of a Poisson random variable
u with mean
. Additionally, we express
as
. Considering the latent variables as if they were fully observed, the complete data likelihood function can be expressed as
with the constraints:
,
,
if
,
and
if
, and
and
if
, where
for
.
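For readers less familiar with this construction, the identity that makes the augmentation work is the elementary Poisson fact sketched below (background only, with assumed notation, not the authors' exact latent-variable definitions): survival-type terms exp{−Λ} and distribution-type terms 1 − exp{−Λ} are, respectively, the probabilities that a Poisson variable with mean Λ equals zero or is positive, and the spline expansion of Λ splits that variable into independent components.

```latex
% Background sketch of the Poisson augmentation identity (assumed notation):
Z \sim \mathrm{Poisson}(\lambda)
  \;\Longrightarrow\;
  P(Z = 0) = e^{-\lambda},
  \qquad
  P(Z \ge 1) = 1 - e^{-\lambda};
% writing the spline expansion \lambda = \sum_{l} \lambda_l,
Z \overset{d}{=} \sum_{l} Z_l,
  \qquad
  Z_l \overset{\mathrm{ind}}{\sim} \mathrm{Poisson}(\lambda_l).
```

Because the latent counts decompose in this way, the conditional expectation of each component given the observed data is available in closed form, which is exactly what the E-step exploits.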
To estimate the parameters
,
,
, and
, we implement the EM algorithm. Specifically, in the E-step, we calculate the conditional expectation of the complete-data log-likelihood
with respect to the latent variables, given the current parameter estimates. This step yields
In the above equation,
,
,
, and
represent the
mth update of
,
,
, and
, respectively. Furthermore, the function
depends on these current iterates but does not vary with respect to the optimization variables
,
,
, and
. For the sake of brevity, we omit the explicit conditioning on the observed data and current parameter estimates in the conditional expectations. Let
,
, and
. The expressions for the above conditional expectations are given by
Furthermore, the conditional expectations
and
can be expressed as
In this process, terms independent of the unknown parameters can be omitted, allowing us to focus on the key contributions from the latent variables. This simplification yields a more tractable expression for optimization. Consequently, maximizing (
4) reduces to
The computational procedure proceeds as follows. Following the initialization approach described in Wang et al. [
20], we begin by fitting a PH model [
32] with covariates arranged as
. From this model, initial estimates
,
, and
are obtained. Specifically,
corresponds to the coefficient estimates for
, while
is derived from the interaction term coefficients associated with
. The initial values
are extracted from the spline coefficients of the estimated baseline cumulative hazard function, as detailed in Wang et al. [
32]. For the
B-spline coefficients
, we employ a least squares fitting procedure assuming a linear link function, where the intercept and slope correspond respectively to the coefficient for
A and the coefficients for
obtained from the PH model.
At iteration
, the conditional expectations
are computed using the current parameter estimates
, and
, along with the observed data. These expectations are derived from the expressions provided earlier. In the M-step, we obtain a closed-form update for each
(
) by setting the partial derivative of the function
with respect to
to zero. The resulting solution is expressed in terms of
,
, and
as follows:
Importantly, when each is initialized to a nonnegative value, all conditional expectations involved in the expression for are nonnegative. Consequently, remains nonnegative throughout the iterations. This desirable property of the proposed EM algorithm obviates the need for constrained optimization to enforce nonnegativity of , thereby simplifying the computational procedure.
Substituting
into (
5) yields the following objective function:
To update the parameter
, we first approximate the objective function using a first-order Taylor expansion around the current estimate
. This yields a surrogate function
in
, which facilitates optimization.
Subject to the constraint
and
,
is the first derivative of the link function
. The optimal solution
is obtained using established nonlinear optimization techniques, such as the
solnp() function from the “
Rsolnp” package [
34].
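As an illustration of how the unit-norm constraint can be handled with `Rsolnp::solnp()`, the sketch below minimizes a placeholder surrogate objective subject to a unit Euclidean norm and a nonnegative first component; `neg_surrogate` is hypothetical and stands in for the Taylor-expanded surrogate described above, so only the constraint handling is the point of this illustration.

```r
# Hedged sketch: unit-norm constrained update of the index coefficients via Rsolnp::solnp().
library(Rsolnp)

neg_surrogate <- function(gamma) {
  sum((gamma - c(0.6, -0.5, 0.4, 0.2))^2)        # placeholder for the negative surrogate objective
}

fit <- solnp(
  pars  = c(0.5, 0.5, 0.5, 0.5),                 # current estimate of the index coefficients
  fun   = neg_surrogate,
  eqfun = function(gamma) sum(gamma^2),          # equality constraint: ||gamma||^2 = 1
  eqB   = 1,
  LB    = c(0, rep(-1, 3)),                      # sign restriction: first component >= 0
  UB    = rep(1, 4)
)
gamma_new <- fit$pars
```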
In the subsequent step, we update the parameters
and
while fixing
at its current estimate
. Substituting this value into the objective function (
6) yields a new surrogate objective, denoted
, which depends solely on
and
.
We then maximize
subject to the ordered bound constraints
for
and
. This optimization is performed using the “
nloptr” package in
R [
35], which is particularly effective for optimization problems of this form.
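Similarly, a hedged sketch of how ordered-bound (monotonicity) constraints on the B-spline coefficients could be passed to `nloptr` is given below; the objective is again a placeholder for the surrogate in the text, the constraints are written in the g(alpha) <= 0 form that `nloptr` expects, and any additional unconstrained parameters updated in this step are omitted.

```r
# Hedged sketch: monotonicity-constrained update of B-spline coefficients with nloptr.
library(nloptr)

neg_surrogate_alpha <- function(alpha) {
  sum((alpha - seq(-1, 1, length.out = length(alpha)))^2)   # placeholder objective
}
ordered_constraint <- function(alpha) -diff(alpha)          # <= 0 iff alpha is non-decreasing
                                                            # (a strictly positive gap could be used instead)
res <- nloptr(
  x0          = rep(0, 6),                                  # current coefficient estimates
  eval_f      = neg_surrogate_alpha,
  lb          = rep(-10, 6),
  ub          = rep(10, 6),
  eval_g_ineq = ordered_constraint,
  opts        = list(algorithm = "NLOPT_LN_COBYLA", xtol_rel = 1e-6, maxeval = 2000)
)
alpha_new <- res$solution
```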
The proposed EM algorithm can be summarized as follows:
- Step 1:
Fit a standard PH model to obtain initial estimates , and . Initialize via least-squares regression. Set iteration counter .
- Step 2:
At iteration , compute the conditional expectations based on the current estimates , , , and , as well as the observed data.
- Step 3:
Update by maximizing subject to the unit-norm constraint and .
- Step 4:
Update and by maximizing using nonlinear optimization methods, subject to the ordered bounds .
- Step 5:
Calculate for .
- Step 6:
Set . Go to Step 2 and iterate until convergence.
The algorithm is considered to have converged when the maximum absolute change in the log-likelihood between consecutive iterations falls below a prespecified tolerance, such as 0.001. During initialization, we obtain the initial parameter estimates using the standard PH model as proposed in Wang et al. [
32]. To balance computational efficiency and estimation accuracy, we adopt a convergence criterion of 0.005 for the initial model fitting, which is less stringent than the commonly used 0.001 threshold. This relaxation significantly reduces computation time without compromising the quality of the initial estimates. In generating these initial values, the regression coefficients are initialized at zero, and the spline coefficients are set to one without multiple restarts. This approach provides a stable starting point, and subsequent optimization effectively refines the estimates, helping to avoid suboptimal local maxima. The simulation results presented below demonstrate the effectiveness of this initialization approach.
To facilitate readers who may be less familiar with the multi-layer data augmentation framework, we provide a concise pseudo-code representation of the proposed EM algorithm, which is presented as Algorithm 1. This pseudo-code complements the detailed mathematical derivations by clearly outlining the iterative steps involved in parameter estimation, thereby enhancing the overall clarity and accessibility of the algorithm.
Algorithm 1: Proposed EM algorithm.
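As a complement to Algorithm 1, the following R-style skeleton sketches the iteration in Steps 1–6; every helper function (`init_from_cox`, `e_step`, `update_gamma`, `update_alpha_beta`, `update_eta`, `loglik`) is a trivial stub with an illustrative name and signature, not the authors' implementation.

```r
# Hedged R-style skeleton of the EM iteration in Steps 1-6 (stub helpers only).
init_from_cox     <- function(data) list(beta = rep(0, 4), gamma = c(1, 0),
                                         alpha = rep(0, 6), eta = rep(1, 5))
e_step            <- function(par, data) list()                    # conditional expectations (Step 2)
update_gamma      <- function(par, expct, data) par$gamma          # unit-norm constrained update (Step 3)
update_alpha_beta <- function(par, expct, data) par[c("alpha", "beta")]  # ordered-bound update (Step 4)
update_eta        <- function(par, expct, data) pmax(par$eta, 0)   # closed-form nonnegative update (Step 5)
loglik            <- function(par, data) 0                         # observed-data log-likelihood

em_single_index_ic <- function(data, tol = 0.001, max_iter = 500) {
  par    <- init_from_cox(data)                 # Step 1: initial values from a PH fit
  ll_old <- loglik(par, data)
  for (m in seq_len(max_iter)) {                # Steps 2-6: iterate until convergence
    expct     <- e_step(par, data)
    par$gamma <- update_gamma(par, expct, data)
    par[c("alpha", "beta")] <- update_alpha_beta(par, expct, data)
    par$eta   <- update_eta(par, expct, data)
    ll_new    <- loglik(par, data)
    if (abs(ll_new - ll_old) < tol) break       # convergence check on the log-likelihood
    ll_old <- ll_new
  }
  par
}
```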
For spline estimation, the numbers of interior knots and are selected using the Akaike Information Criterion (AIC) or analogous model selection criteria. The link function is estimated as . Then the optimal treatment rule is given by . Finally, the average treatment effect (ATE) is estimated by .
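A hedged sketch of this AIC-based selection is shown below; `fit_single_index_ic` is a stub standing in for the estimation routine of this section, and only the grid-search logic is the point of the illustration.

```r
# Hedged sketch: AIC-based choice of the two interior-knot numbers.
# `fit_single_index_ic()` is a stub; the real version would return the maximized
# log-likelihood and the number of free parameters for a given knot configuration.
fit_single_index_ic <- function(data, k_monotone, k_bspline) {
  list(loglik = -500 - runif(1), n_par = 10 + k_monotone + k_bspline)
}

grid <- expand.grid(k_monotone = 1:8, k_bspline = 1:8)
grid$aic <- mapply(function(k1, k2) {
  fit <- fit_single_index_ic(NULL, k1, k2)
  2 * fit$n_par - 2 * fit$loglik                 # AIC = 2p - 2 * loglik
}, grid$k_monotone, grid$k_bspline)
grid[which.min(grid$aic), ]                      # selected knot configuration
```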
4. Asymptotic Behavior and Variance Estimation
In this section, we aim to establish the theoretical properties of the estimators proposed for the model parameters. To begin, let us clearly specify the estimators, the true model parameters, and the relevant function spaces. Let
denote the estimator of the parameter vector
, where
,
,
and
are the proposed estimators of
,
and
, respectively. The true parameter values are represented by
,
,
and
. Collectively, we write the true parameter vector as
. For notational convenience, let
and
denote the Euclidean norm and the sup-norm, respectively. Specifically, for a vector
, the Euclidean norm is defined by
, whereas the sup-norm is
. Let
be the maximum follow-up time considered in the study, and
be the space of bounded sequences on the interval
. We introduce the function space for the cumulative hazard function as
is monotone increasing with
and
. Moreover, we define the
-norm on
with respect to the Lebesgue measure, defined as
where
denotes the distribution function of
t. Let
denote the union of the supports of
over all
satisfying
. We then define
and
as the
-norm and
-norm over
, respectively, such that
and
where
is the distribution function of the variable
y.
Define the knot sequence
, satisfying
where the interval
is divided into
subintervals.
Similarly, define the knot sequence
, with points
, such that
where the interval
is partitioned into
subintervals.
To establish the asymptotic behavior of the proposed estimators, we impose the following regularity assumptions.
- (C1)
The true parameter vectors and , where and are compact sets. The function , possesses a strictly positive, continuously differentiable first derivative over the interval . The function is strictly increasing with respect to y and has continuous derivatives up to third order on its domain .
- (C2)
The vector of covariates is supported within a bounded subset of , and its probability density function is bounded away from zero.
- (C3)
For , the th derivative of satisfies the Lipschitz continuity condition on . Specifically, there exists a constant such that for all , . Similarly, the th derivative of satisfies the Lipschitz condition on . That is, there exists a constant such that for all , .
- (C4)
Define the maximal knot spacings as , . Additionally, the ratios of maximum to minimum knot spacings and are uniformly bounded, where .
- (C5)
If there exists a unit vector such that almost surely, then it must hold that .
- (C6)
The number of interior knots satisfies that
and
. Let
be the upper bound of the
B-spline coefficients and
the upper bound for the monotone spline coefficients. These satisfy:
Additionally, the distance between adjacent interior knots lies in the interval
for some constant
.
- (C7)
If for any in a certain support set with probability 1, then and in this support set.
Conditions (C1) and (C2) ensure that both the true parameter values and covariates lie within bounded regions, which is a standard requirement in the analysis of interval-censored data [
36,
37]. These conditions also guarantee the requisite smoothness of the functions
and
. Meanwhile, condition (C3) is introduced to facilitate the derivation of the convergence rate of the estimators. Condition (C4) is essential for establishing asymptotic normality [
38]. Condition (C5) serves as an identifiability condition for single-index models, and it is satisfied when
follows a multivariate normal distribution [
20]. Under condition (C6), one may choose the number of knots as
, and the spline coefficient bound as
for any constant
. Condition (C7) is used to prove that the matrix
is nonsingular. Notably, the aforementioned regularity conditions may appear somewhat strong or restrictive at first glance. However, such assumptions are standard in the theoretical analysis of interval-censored and semiparametric models. They are crucial for guaranteeing key theoretical properties of the proposed estimators, including consistency and asymptotic normality. Moreover, these conditions are practically reasonable and often satisfied in real applications. For instance, parameter spaces are typically bounded due to prior scientific knowledge or data domain restrictions; smoothness and monotonicity assumptions naturally align with the typical behavior of cumulative hazard functions; and the bounded support of covariates ensures stable estimation. Therefore, the imposed regularity conditions strike a balance between mathematical tractability and practical applicability.
Theorem 1. Suppose that conditions (C1)–(C6) are satisfied. Then the estimators satisfy , and in probability, where for any differentiable function f with derivative , the norm is defined as .
In the above theorem,
denotes the norm in the Sobolev space
, which incorporates both the function itself and its first-order derivatives [
39]. This norm quantifies the uniform boundedness and smoothness of functions in the space. By constraining the estimator under this norm, we effectively restrict its complexity and prevent excessive local oscillations, which promotes estimator stability and favorable convergence properties such as consistency.
To characterize the asymptotic distribution, we impose the assumption without loss of generality. For a q-dimensional vector , we denote by the vector consisting of its first components.
Theorem 2. Assume conditions (C1)–(C7) hold, and let the combined parameter vector be defined as with true value . Then, as , where denotes convergence in distribution, and is the information matrix evaluated at . Consequently, converges in distribution to a mean-zero normal random vector with covariance matrix . Furthermore, the inverse information matrix achieves the semiparametric efficiency bound. The comprehensive proofs of Theorems 1 and 2 can be found in
Appendix A. For conducting inference on the true parameter vectors
and
, it is vital to obtain consistent estimates of the covariance matrices of the estimators
and
. We recommend a nonparametric bootstrap procedure, which proceeds in three main steps. First, generate
S bootstrap samples by resampling the original dataset with replacement, where
S is a large integer (typically
or greater). Second, for each bootstrap replicate
, compute the parameter estimates
,
,
, and
using the same estimation procedure applied to the original data. Third, approximate the asymptotic covariance matrices of
and
by the sample covariance matrices of the bootstrap estimates
and
, respectively. Since the asymptotic distributions of
and
are not well characterized under interval censoring, we construct 95% pointwise confidence bands using the 2.5% and 97.5% quantiles of the corresponding bootstrap replicates
and
. We note that these confidence bands provide valid coverage only at each individual point (i.e., pointwise coverage) but do not guarantee simultaneous coverage over the entire range of the functions. Developing methods for constructing simultaneous confidence bands under interval censoring remains an important direction for future research.
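As an illustration, a minimal R sketch of this resampling scheme is given below; `fit_fun` is a placeholder for the full estimation procedure of Section 3 and is assumed to return the finite-dimensional parameter estimates as a numeric vector.

```r
# Hedged sketch: nonparametric bootstrap for standard errors and 95% percentile intervals.
bootstrap_inference <- function(data, fit_fun, S = 100) {
  n   <- nrow(data)
  est <- replicate(S, {
    boot_data <- data[sample.int(n, n, replace = TRUE), , drop = FALSE]  # resample subjects
    fit_fun(boot_data)                                                   # refit on the bootstrap sample
  })
  list(se = apply(est, 1, sd),                                    # bootstrap standard errors
       ci = t(apply(est, 1, quantile, probs = c(0.025, 0.975))))  # percentile confidence limits
}
```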
5. Simulation Studies
This section reports the results of simulation experiments designed to assess the finite-sample behavior of the proposed method. For each subject
i, four independent covariates,
, were generated from a uniform distribution over the interval
to represent the main effects. The influence of treatment on survival times depended solely on the first two covariates,
and
. The failure time
was generated according to the model in Equation (
1), with main-effect parameters
and interaction-effect parameters
. Treatment assignment
was generated independently of the covariates
and followed a Bernoulli distribution with probability 0.5. Simulations were conducted for sample sizes of 500, 800, and 1000, each repeated 500 times to evaluate performance. The baseline hazard function
was defined as a Weibull hazard with shape parameter
and scale parameter
, yielding the cumulative baseline hazard
. Two scenarios were examined, differing only by the choice of the link function: (i)
; and (ii)
. The follow-up period was fixed at
, with no observations beyond this time. To induce interval censoring, two inspection times were generated for each individual:
and
. These inspection times divided the timeline into three intervals:
,
, and
. Under these settings, approximately 12–17% of observations were left-censored and 34–41% were right-censored.
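The following R sketch illustrates one way such data could be generated, assuming the hazard structure Lambda(t | X, A) = Lambda0(t) exp{X'beta + A psi(X-tilde' gamma)} consistent with the model description in Section 2; the numerical values (true parameters, Weibull shape and scale, follow-up length, and inspection-time distributions) are illustrative placeholders, not the settings used in the paper.

```r
# Hedged sketch of the simulation design, with illustrative (not the paper's) values.
set.seed(2024)
n      <- 500
X      <- matrix(runif(n * 4, -1, 1), n, 4)          # four baseline covariates (illustrative range)
A      <- rbinom(n, 1, 0.5)                          # randomized treatment assignment
beta   <- c(0.5, -0.5, 0.5, -0.5)                    # illustrative main-effect parameters
gamma  <- c(1, -1) / sqrt(2)                         # unit-norm index coefficients on X1, X2
psi    <- function(u) u                              # scenario (i): linear link (illustrative)

shape <- 2; scale <- 1                               # illustrative Weibull baseline hazard
eta    <- exp(X %*% beta + A * psi(X[, 1:2] %*% gamma))
U      <- runif(n)
T_true <- scale * (-log(U) / eta)^(1 / shape)        # invert Lambda0(t) = (t / scale)^shape

tau <- 3                                             # fixed end of follow-up (illustrative)
U1  <- runif(n, 0, tau / 2); U2 <- U1 + runif(n, 0, tau / 2)   # two inspection times
d1  <- as.integer(T_true <= U1)                      # left-censored
d2  <- as.integer(T_true > U1 & T_true <= U2)        # interval-censored
d3  <- as.integer(T_true > U2)                       # right-censored
```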
We explored a variety of models with the number of knots ranging from 1 to 8. Through this exploration, the cumulative baseline hazard function was approximated using cubic monotone splines equipped with 5 interior knots, which were positioned at equally spaced quantiles of the observed event times. For the link function , quadratic B-splines with 3 interior knots were employed, with knots located at equally spaced quantiles of the values . The model incorporating 5 interior knots for the monotone splines and 3 interior knots for the B-splines achieved the lowest Akaike information criterion (AIC) and was thus selected. In the simulation studies, we also investigated the effect of varying the tuning parameter on the estimation performance. The empirical results demonstrate that once surpasses roughly 13, the estimates stabilize, showing very little sensitivity to further increases. For practical simplicity, we fixed as the default choice for all simulations.
Table 1 presents the finite-sample performance metrics of the proposed estimator under the linear link setting
. For each parameter, we summarize four statistics computed over 500 replications: Bias (defined as the average difference between the estimated and true parameter values), sample standard error of the 500 estimates (SSE), the average of the 500 standard error estimates (SEE), and 95% coverage probability (CP95) based on normal approximation. Standard errors were obtained through a nonparametric bootstrap procedure with 100 bootstrap samples per replication. As shown in
Table 1, the proposed approach yields nearly unbiased estimates for both main-effect parameters
and interaction parameters
. The SSE and SEE closely align, and the 95% coverage probabilities remain close to the intended nominal level, indicating the bootstrap variance estimator is reliable. Furthermore, improvements in bias and variability are observed as the sample size increases from 500 to 1000. The lower part of
Table 1 reports inference results for the ATE, denoted
. Across sample sizes, the ATE estimator displays negligible bias, well-calibrated standard errors, and coverage probabilities near 95%. Collectively, these findings validate the practical effectiveness of the proposed method for finite samples. We conducted a comparison between our proposed method and the approach of Wang et al. [
32], hereafter referred to as the “Cox method.” As shown in
Table 1, Wang’s method exhibited considerable bias in the estimation of the
parameters, while the bias for the
parameters remained relatively moderate. The poor performance of the Cox model, despite the linearity assumption, is likely due to the binary nature of the treatment variable
A (taking values 0 or 1), which limits the model’s ability to accurately estimate
, resulting in biased estimates. Furthermore, Wang’s method faced considerable challenges in variance estimation, with many variance estimates reported as NA. This issue may be partly attributed to the EM algorithm employed by Wang et al. [
32], which relies on
optim() and requires convergence at every iteration step; failure to converge at any step can cause the procedure to crash. Consequently, we only present bias and SSE for this comparison. It is worth noting that the frequency of these NA values decreased as the sample size increased, suggesting that Wang’s method may be more appropriate for large-sample scenarios. Overall, these results demonstrate the enhanced accuracy and robustness of our proposed method, particularly in small to moderate sample settings.
Figure 1 displays the estimated baseline survival function
and the linear link function
. The estimated
and
closely align with their true counterparts, demonstrating high accuracy. With the visualization of the estimated link function
, clinicians can compute the patient-specific risk score, defined as the index variable
derived from patient covariates, and then use the plotted curve to more intuitively determine the optimal treatment. Specifically, by locating the patient’s risk score on the horizontal axis of the
plot, clinicians can observe whether the function value is below zero, which indicates that treatment
is preferable, or above zero, indicating treatment
is preferable. This visual tool enhances interpretability and facilitates transparent, personalized treatment decisions based on the model’s estimates. These findings underscore the robustness and reliability of the proposed method, particularly for larger datasets.
Table 2 reports simulation results for parameter estimates obtained using the exponential link function
. Consistent with the findings in
Table 1, the proposed method exhibits low bias, accurate variance estimation, and 95% coverage probabilities near the nominal level. As anticipated, estimation accuracy improves and bias diminishes with increasing sample sizes.
Figure 2 presents the mean estimates of the link function
and the baseline survival function
. As shown in
Table 2, the Cox method continues to exhibit substantial bias in estimating the
parameters; however, the bias is notably reduced compared to the scenario where the function
is assumed linear. Interestingly, under this setup, we observed that Wang et al. [
32]’s method encountered far fewer NA values in variance estimation than in previous settings. This suggests that the Cox method may not be well-suited for the simulation scenario where
. These results further confirm accurate estimation and excellent agreement with the true functions.
To assess the performance of the derived treatment decision rules, we created an independent test dataset consisting of 1000 subjects. Individualized treatment recommendations were generated by applying the estimated coefficients from the proposed model. These recommendations were then compared against the true optimal treatment assignments.
Table 3 reports the average rates of treatment assignment errors under both the linear and exponential link functions. The model misclassified treatment assignments at very low rates—0.8% for the linear link and 0.2% for the exponential link—highlighting the high accuracy of the proposed method in identifying appropriate treatments. Collectively, these findings reinforce the method’s robustness in tailoring treatment strategies on a personalized level and its potential to enhance clinical outcomes.
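For reference, the error rate reported here is simply the proportion of disagreements between recommended and truly optimal assignments on the test set; the R sketch below illustrates the computation with made-up true and estimated quantities under the assumed rule form I{psi(gamma' x) < 0}.

```r
# Hedged sketch: treatment-assignment error rate on an independent test set (all values illustrative).
set.seed(3)
X_test     <- matrix(runif(1000 * 2, -1, 1), 1000, 2)
gamma_true <- c(1, -1) / sqrt(2);  psi_true <- function(u) u          # illustrative truth
gamma_hat  <- c(0.72, -0.69);      psi_hat  <- function(u) 1.05 * u   # illustrative estimates

d_opt <- as.integer(psi_true(X_test %*% gamma_true) < 0)   # true optimal assignments
d_rec <- as.integer(psi_hat(X_test %*% gamma_hat) < 0)     # recommended assignments
mean(d_rec != d_opt)                                       # treatment assignment error rate
```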
To comprehensively evaluate and compare the predictive performance of the proposed model against the Cox model, patients in the test dataset were stratified into three distinct subgroups according to their “risk scores”, defined as
. This stratification aims to assess how well each model performs across different levels of patient risk, allowing for a more nuanced comparison beyond aggregate measures. The three subgroups correspond to low-risk patients with
, moderate-risk patients with
, and high-risk patients with
.
Figure 3 illustrates the average survival probability over time for each subgroup under the exponential link function scenario with a sample size of
. This breakdown enables us to examine the treatment rule effectiveness and survival predictions within clinically relevant risk categories, highlighting areas where the proposed model may offer improved individualized prognostication relative to the Cox model. Notably, in subgroups (i) and (iii), which correspond to patients with
less than 0.2 and greater than 0.5 respectively, the survival probabilities predicted by both the proposed model and the Cox model closely match the true survival curves. This alignment indicates that, for low-risk and high-risk patients, both models capture the underlying survival patterns with comparable accuracy. However, a distinct difference emerges in subgroup (ii), which includes 176 patients with intermediate
between 0.2 and 0.5. As illustrated in
Figure 3, the proposed model’s predicted average survival probabilities more closely approximate the true survival function compared to those generated by the Cox model. Specifically, at time
, patients receiving treatments based on the proposed model have an average survival probability of 0.33, which is notably closer to the optimal treatment rule’s survival probability of 0.36. In contrast, patients treated according to the Cox model’s recommendations exhibit a lower average survival probability of 0.31. These findings highlight the superior ability of the proposed model to personalize treatment decisions within this moderate-risk subgroup. The improved performance likely stems from the model’s flexible structure, which better captures complex nonlinear relationships and interactions that may be overlooked by the Cox model.
We assessed the computational efficiency of our approach as follows. In terms of computation speed, the proposed method generally requires less than 2 min to calculate the point estimates based on one simulated dataset of size 1000. For variance estimation, we employed the bootstrap approach with resamples, balancing accuracy and efficiency. To mitigate the increased computation time, we implemented parallel processing utilizing 20 cores to accelerate both the bootstrap procedure and the overall algorithm. Regarding memory usage, monitoring with system tools indicated that the main R process running the computation consumes approximately 4 GB of RAM for the dataset of this size. This memory usage is moderate and considered feasible for typical modern desktop environments.
6. Application
We applied the proposed approach to data from the AIDS Clinical Trials Group (ACTG) 320 study, a randomized, double-blind, placebo-controlled trial designed to assess the efficacy of a three-drug antiretroviral regimen—indinavir (IDV) combined with open-label zidovudine (ZDV) or stavudine plus lamivudine (IDV group)—compared to a two-drug regimen of ZDV (or stavudine) plus lamivudine alone (non-IDV group). The trial enrolled HIV-1-infected individuals who had undergone at least three months of prior zidovudine therapy and had baseline CD4 counts not exceeding 200 cells/mm³ [40]. Participants were administered open-label ZDV and lamivudine and were randomized to receive either indinavir treatment or placebo every eight hours. The primary composite outcome, denoted by
T, was defined as the time until the first occurrence of an AIDS-defining event, a reduction of at least 50% in CD4 cell count from baseline, or death due to any cause. Further details on the trial’s design and procedures are available in Hosmer et al. [
41]. Our objective was to explore baseline patient features that modify treatment effects on survival and to develop individualized treatment rules that optimally allocate either the three-drug IDV-containing regimen or the two-drug non-IDV regimen to maximize expected survival benefit. A key analytic challenge stemmed from the periodic nature of clinical follow-up assessments in the trial. Since AIDS-defining events, substantial CD4 declines, or deaths were detected only at scheduled visits, typically occurring every 4 to 8 weeks, the exact event time could not be precisely determined and was only known to lie between two consecutive visits. Consequently,
T is interval-censored, meaning it is known to fall within the interval between the last event-free visit and the first visit at which the event was confirmed. This interval censoring complicates standard survival analysis and necessitates specialized semiparametric modeling frameworks to ensure valid inference and optimal treatment allocation.
After removing participants with missing covariates, the ACTG320 dataset comprised 1080 individuals, of whom 65 had interval-censored event times and 992 had right-censored observations. While most observations are right-censored because no event had occurred by the last follow-up, about 6% are interval-censored as a direct result of the study design rather than data limitations. Treating these interval-censored cases as exact event times or as right-censored observations may lead to biased survival estimates. Although interval-censored observations represent a smaller portion of the dataset, properly incorporating them improves inference accuracy and fully utilizes the available information, thus enhancing model validity compared to methods that consider only right censoring. We incorporated 10 covariates as main effects:
Sex (1 = male, 0 = female),
Race (1 = white Non-Hispanic, 0 = otherwise),
Ivdrug (1 = never used IV drugs, 0 = current or previous IV drug use),
Hemophil (1 = hemophiliac, 0 = otherwise),
Weight (weight at enrollment in kilograms),
Karnof (Karnofsky Performance Scale, where 100 indicates normal functioning without complaints or disease signs; 90 denotes minor symptoms with normal activity; 80 reflects some symptoms with activity requiring effort; and 70 means the ability to care for oneself but no normal or active work),
AveCD4 (baseline CD4 count),
AveCD8 (baseline CD8 count),
Priorzdv (months of prior ZDV use), and
Age (participant age at enrollment in years). All continuous covariates were standardized to have a mean of 0 and a standard deviation of 1. Previous studies by Jiang et al. [
42] and Geng et al. [
43] suggested potential interactions between treatment effects and the covariates
Karnof,
Weight,
AveCD4, and
Age, leading us to include these four variables as interaction terms. We evaluated the influence of treatment–covariate interactions on survival time and derived the optimal treatment strategies by fitting the specified model (
1).
We assumed that the failure time followed a semiparametric single-index model characterized by the conditional hazard function:
where
represent the ten main effects mentioned earlier, with
corresponding to
Karnof,
Weight,
AveCD4, and
Age, respectively. Following the simulation settings, we used cubic monotone splines to approximate
and quadratic
B-splines to approximate the link function, with 5 interior knots for the monotone splines and 3 for the
B-splines. The interior knots were positioned at equally spaced quantiles covering the range from the smallest to the largest observed times.
Figure 4 displays the estimated link function, revealing a clear nonlinear pattern.
Table 4 presents a comparison between our proposed approach and the traditional Cox proportional hazards model in assessing main covariate effects, the average treatment effect, and treatment–covariate interaction terms. Displayed results include parameter estimates (EST), estimated standard errors (SE), and corresponding
p-values. For our method, the standard errors were obtained through nonparametric bootstrapping using 100 resamples. The estimated interaction coefficients quantify the relative contributions of each treatment–covariate interaction to modifying treatment effects. Age (0.683) and Karnofsky score (0.297) have positive coefficients, indicating that higher values increase the single-index score
. Weight (0.153) also shows a positive, though smaller, effect, while average CD4 count has a negative coefficient (−0.649), suggesting higher CD4 levels decrease the index. Since our treatment rule assigns the IDV regimen when the estimated link function
is less than zero, patients with lower single-index values, who are often younger or have lower Karnofsky scores but favorable profiles in other covariates, are recommended for treatment. Our model identified both race and age as significant predictors of survival, whereas the Cox model only found race to be significant. A key finding from both models is the significant interaction between the baseline CD4 count (
AveCD4) and treatment, indicating that patients with lower initial CD4 levels derive greater benefit from the IDV-inclusive therapy, experiencing higher survival probabilities or extended survival times. Crucially, our method uncovered an additional significant interaction: age modifies treatment efficacy, with older patients experiencing a greater survival advantage from the IDV regimen than younger patients. Although the average treatment effect was significant in both analyses, our method yielded a far more stable estimate (SE = 0.026) compared to the Cox model (SE = 0.395). Unlike the Cox model, which relies on the proportional hazards assumption, our proposed method provides robust analysis even when this assumption is violated, offering a more reliable representation of treatment effects across diverse patient populations. In summary, our analysis demonstrates that the proposed flexible model reveals significant effects, such as the treatment-by-age interaction, that are missed by the Cox model, which imposes a linear structure on the link function. As demonstrated in
Figure 4, the link function clearly exhibits nonlinear patterns, highlighting the advantage of our approach’s flexibility and robustness to potential model misspecification. This makes it particularly well-suited for analyzing complex datasets with interval-censored outcomes, enabling more personalized and effective treatment strategies for HIV patients.
The results presented in
Table 5 compare the mean survival probabilities at specific time points across three treatment arms: Treatment A (IDV combined with open-label ZDV and lamivudine for all patients), Treatment B (ZDV and lamivudine alone for all patients), and Treatment C (a personalized treatment regimen tailored to individual patient characteristics based on the proposed model). The findings clearly indicate that Treatment C consistently achieves the highest survival probabilities, underscoring the effectiveness of personalized treatment plans driven by individual patient characteristics. Among the three arms, Treatment A demonstrates the lowest survival probabilities, suggesting that the addition of IDV to ZDV and lamivudine does not yield improved outcomes compared to the other regimens. Treatment B, while outperforming Treatment A, shows survival probabilities that are slightly lower than those of Treatment C. This indicates that the standard combination of ZDV and lamivudine alone is more effective than adding IDV but falls short of the benefits provided by a tailored approach. These findings emphasize the importance of personalized medicine in improving patient outcomes.
We further visualized the estimated link function
against age to provide a more clinically interpretable illustration of treatment effect heterogeneity.
Figure 5 exhibits a nonlinear relationship that aligns with the overall pattern observed between
and the index variable. Notably, the link function dips below zero primarily at younger ages, which, according to our treatment rule, corresponds to recommending the IDV regimen. This pattern suggests that treatment decisions are not driven by age alone but by the combined effect of age and other covariates encapsulated in the single-index
.
In particular, some younger patients with favorable profiles in other covariates may have low overall index values and thus be assigned to IDV treatment. This observation underscores the complexity and multivariate nature of the personalized treatment rule derived from our model.
7. Discussion and Concluding Remarks
In this study, we introduced a new framework for estimating optimal individualized treatment rules by modeling treatment–covariate interactions within a flexible single-index model. Our method utilizes a sieve maximum likelihood estimation technique specifically designed for interval-censored survival data, incorporating monotone splines to model the cumulative baseline hazard and
B-splines to approximate the link function. The adaptability of the resulting treatment rules arises from this nonparametric link function, which effectively captures complex, potentially nonlinear interactions between treatment assignment and patient covariates. To efficiently compute the sieve estimators, we developed an easy-to-implement EM algorithm. Building on empirical process theory, we derived the asymptotic properties of our estimators, thereby providing rigorous theoretical support for the proposed methodology. In this work, we utilized
bootstrap replications for inference, which provides reliable standard error estimation. While increasing the number of replications could further improve the accuracy of confidence interval calibration, this was not feasible in the current study due to computational time constraints. Future research will focus on enhancing computational efficiency, allowing for a larger number of bootstrap replications and more precise inference. To promote wider use of the proposed method, the R code is openly accessible at
https://github.com/ssshyyy0411/single-index-IC (accessed on 11 March 2026).
Several promising directions merit further investigation. First, extending the current approach to high-dimensional covariate settings is essential, given the increasing prevalence of biomarkers and genomic data in modern precision medicine. Direct application of the existing method without proper regularization or variable selection may lead to overfitting and reduced interpretability in such settings. Efficiently handling high-dimensional data thus requires incorporating sparsity-inducing techniques, such as LASSO [
44], SCAD [
45], or other regularization methods. These approaches enable simultaneous variable selection and treatment–covariate interaction modeling, thereby helping to maintain estimation accuracy and improve model interpretability. Developing and integrating these mechanisms represents a crucial direction for future methodological enhancements. Moreover, many clinical decisions involve not a single binary treatment, but rather multiple competing treatments or dynamic sequences of interventions. Therefore, to broaden the applicability of our method, it is essential to develop extensions that can handle multi-arm and longitudinal treatment settings. Second, incorporating advanced machine learning architectures, including deep neural networks, could substantially enhance flexibility, as such models can capture complex, nonlinear relationships between covariates and outcomes. These methods have proven effective in a range of semiparametric regression settings, including those involving right-censored survival data [
46] and interval-censored survival data [
47,
48]. Their ability to automatically learn patterns from large datasets could complement traditional statistical methods and enhance predictive accuracy. Third, accommodating time-varying treatment effects is critical for chronic disease management and longitudinal interventions, where treatment efficacy may evolve with disease progression or cumulative exposure [
49,
50]. Extending the framework to dynamic treatment regimes would broaden its clinical relevance. Fourth, an important extension involves incorporating cluster-level random effects to capture unobserved heterogeneity across centers or clusters. In multi-center studies or clustered designs, fixed or random effects are commonly employed to model center-specific or cluster-specific variability [
51]. Integrating such random effects into the proposed framework would broaden its applicability to multi-center trial data, thereby improving both inference and predictive performance. In addition, while the current work is developed within a sieve maximum likelihood framework, situating the methodology within the broader frequentist-Bayesian context would further enrich its scope. The Bayesian approach offers complementary advantages such as natural uncertainty quantification through posterior distributions, the incorporation of prior knowledge, and flexible modeling of complex hierarchical structures. Recent advances in Bayesian degradation modeling and inference [
51,
52] provide useful methodological references for such an extension. Furthermore, in the current ACTG320 analysis, subjects with missing baseline covariates were excluded for simplicity and to maintain a complete-case analysis framework. We acknowledge that this approach may introduce potential selection bias and reduce statistical efficiency. Incorporating principled missing-data methods such as multiple imputation [
53] would help to reduce bias and increase the robustness of the results. We plan to explore these approaches in future work to improve the analysis. Finally, generalizing the methodology to other semiparametric survival models, such as accelerated failure time models [54] or transformation models [55], would enable richer modeling of the underlying survival distribution and treatment–covariate dependencies. Such extensions could also accommodate cure rate structures or competing risks, thereby enhancing applicability across diverse medical and public health contexts.