Article

Optimal Model Averaging for Semiparametric Partially Linear Models with Censored Data

1 School of Mathematics and Statistics, Hefei Normal University, Hefei 230601, China
2 Faculty of Science, Beijing University of Technology, Beijing 100124, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 734; https://doi.org/10.3390/math11030734
Submission received: 29 December 2022 / Revised: 25 January 2023 / Accepted: 28 January 2023 / Published: 1 February 2023
(This article belongs to the Special Issue Statistical Methods and Models for Survival Data Analysis)

Abstract

In the past few decades, model averaging has received extensive attention and has been regarded as a feasible alternative to model selection. However, most of this work has been carried out under a parametric model framework with fully observed data. This paper develops a frequentist model-averaging estimation for semiparametric partially linear models with censored responses. The nonparametric function is approximated by B-splines, and the weights in the model-averaging estimator are chosen by minimizing a leave-one-out cross-validation criterion. The resulting model-averaging estimator is proved to be asymptotically optimal in the sense of achieving the lowest possible squared error. A simulation study demonstrates that the proposed method is superior to traditional model-selection and model-averaging methods. Finally, as an illustration, the proposed procedure is applied to two real datasets.

1. Introduction

The semiparametric partially linear model (PLM), proposed by [1], has attracted extensive attention in statistics because it combines the interpretability of the linear model with the flexibility of the nonparametric model. A large body of literature has explored estimation methods for this model, covering both the parametric part and the nonparametric function; see, e.g., [2,3,4,5]. The methods in the aforementioned literature presume that the model is correctly specified. However, in real data analysis, researchers can usually collect a variety of variables and are not sure which of them should be included in the true model. This kind of uncertainty is generally referred to as model uncertainty, and it greatly complicates statistical analysis.
Refs. [6,7,8] pointed out that model selection and model averaging are the two mainstream approaches to dealing with model uncertainty. Model selection, which has a long history, chooses a single model from a set of candidate models through a selection criterion, for example, the Akaike information criterion (AIC [9]), the Bayesian information criterion (BIC [10]), or the focused information criterion (FIC [11]). In addition, shrinkage-estimation-based variable selection has also been applied to determine which variables are needed to build a PLM (see, e.g., [12,13,14], among others). These model-selection methods can be viewed as combining a series of candidate models while assigning a weight of 1 to the selected model and 0 to all others.
As an important alternative to model selection, model averaging incorporates model uncertainty into statistical analysis by assigning a vector of nonzero weights to a set of candidate models, which frequently leads to more effective results (see [15]). Bayesian model averaging, an important branch of model averaging, has been fully developed over the past decades; see [16] for details. In the current paper, we focus on the model-averaging approach for the PLM from a frequentist perspective. Since ref. [17] pioneered the use of the Mallows criterion for weight choice in model averaging, there has been rapidly growing research on asymptotically optimal model averaging. Various optimal model-averaging methods have been proposed, including jackknife model averaging (JMA [7]), Kullback–Leibler model averaging [18], generalized least-squares model averaging [19], leave-subject-out cross-validation [20], and K-fold cross-validation [21]. In addition, optimal model-averaging methods have been extended to quantile regression [22], semiparametric models [23,24], missing data [25,26], functional data [27], measurement error data [28], and high-dimensional data [29,30].
Censored data are ubiquitous in biomedicine, industry, econometrics, and other fields. For example, in biomedicine, when some sampled individuals are lost to follow-up before the end of the study or drop out during the study, the survival time is subject to censoring. Compared with complete data, censoring prevents some observations from being fully observed, which increases the difficulty of statistical analysis. Although an extensive body of literature has developed estimation methods in the presence of censored data, such as [31,32], there is little work on model-averaging approaches for censored data. Based on the FIC advocated in [11], refs. [33,34,35] developed model-averaging methods for different regression models with censored data under the local misspecification framework, where the weights of the model-averaging estimators were constructed from information criterion values rather than selected in a data-driven fashion. Moreover, this framework requires the distance between each candidate model and the true model to be $O(1/\sqrt{n})$, which means that every candidate model is close to the true model when the sample size is large; this is often unrealistic.
Without the local misspecification framework, ref. [36] constructed an optimal model-averaging estimator for a high-dimensional linear model with censored data by adapting a leave-one-out cross-validation criterion. Ref. [37] studied the Mallows model-averaging method for linear models with censored responses, and the resulting model-averaging estimator was proved to be asymptotically optimal in terms of minimizing the squared error loss. The optimal model-averaging methods for censored data mentioned above are all based on classical linear models. The primary objective of the current paper is to construct an optimal model-averaging estimator for the semiparametric PLM with censored responses, in which the weight vector is selected by minimizing a leave-one-out cross-validation criterion. Compared with [36,37], we confront two major challenges. Firstly, the nonparametric function in the PLM significantly complicates the construction of the model-averaging estimator and the development of the weight choice criterion. Secondly, our proof of optimality cannot follow the approach used for the linear model, since those proof techniques cannot be directly applied when a nonparametric part is present.
The plan of this article is as follows. In Section 2, we describe the model setup and introduce the parameter estimation method for the candidate PLMs. Section 3 constructs the model-averaging estimator and proposes a weight choice criterion. Section 4 establishes the asymptotic optimality of the model-averaging estimator. Section 5 explores the finite-sample performance of our method through a simulation study. Section 6 applies the proposed method to two real datasets. Section 7 gives some conclusions. Proofs are given in Appendix A.

2. Model Setup and Parametric Estimation

To facilitate presentation, we first list the basic notations used in this paper in Table 1. Then, we consider the following PLM:
$$Y_i = \mu_i + \epsilon_i = \sum_{j=1}^{\infty} x_{ij}\beta_j + g(U_i) + \epsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where $Y_i$ is a response variable with a continuous distribution function $F(\cdot)$, the countably infinite vector $X_i = (x_{i1}, x_{i2}, \ldots)^\top$ is linearly related to $Y_i$, $U_i$ is a covariate nonlinearly related to $Y_i$, $g(\cdot)$ is an unknown smooth function, and $\epsilon_i$ is the model error with $E(\epsilon_i \mid X_i, U_i) = 0$ and $E(\epsilon_i^2 \mid X_i, U_i) = \sigma_i^2$. The covariate $U_i$ is distributed on a compact interval $[a, b]$; without loss of generality, we take $[a, b] = [0, 1]$. The conditional expectation of the response is denoted by $E(Y_i \mid X_i, U_i) = \mu_i = \sum_{j=1}^{\infty} x_{ij}\beta_j + g(U_i)$.
In survival analysis, we assume $Y_i$ to be a known monotonic transformation of the survival time $T_i$, for example, the commonly used logarithm $Y_i = \log T_i$. $Y_i$ may be censored by a censoring time $C_i$ and hence cannot always be observed completely. We consider a sample of independent observations $(Z_i, \delta_i, X_i, U_i)$, $i = 1, \ldots, n$, where $Z_i = \min(Y_i, C_i)$ and $\delta_i = I(Y_i \le C_i)$ is the censoring indicator.
Let $G(\cdot)$ be the cumulative distribution function of the censoring time, and let $Z_{G,i} = Z_i \delta_i / \{1 - G(Z_i)\}$ be a synthetic response. Then, following [38], it is not difficult to verify that $E(Z_{G,i} \mid X_i, U_i) = E(Y_i \mid X_i, U_i) = \mu_i$. Therefore, under model (1), we obtain
$$Z_{G,i} = \mu_i + e_{G,i} = \sum_{j=1}^{\infty} x_{ij}\beta_j + g(U_i) + e_{G,i}, \quad i = 1, \ldots, n, \qquad (2)$$
where $E(e_{G,i} \mid X_i, U_i) = 0$ and $E(e_{G,i}^2 \mid X_i, U_i) = \sigma_{G,i}^2$. Model (2) can be expressed in matrix form as
$$Z_G = \mu + e_G = X\beta + M + e_G, \qquad (3)$$
where $Z_G = (Z_{G,1}, \ldots, Z_{G,n})^\top$ is the $n$-dimensional synthetic response vector, $\mu = (\mu_1, \ldots, \mu_n)^\top$ is the $n$-dimensional conditional mean vector, $M = \{g(U_1), \ldots, g(U_n)\}^\top$, $X = (X_1, \ldots, X_n)^\top$ is the linear covariate matrix, $U = (U_1, \ldots, U_n)^\top$, and $e_G = (e_{G,1}, \ldots, e_{G,n})^\top$ is an $n$-dimensional error vector satisfying $E(e_G \mid X, U) = 0$ and $E(e_G e_G^\top \mid X, U) = \Omega_G = \mathrm{diag}(\sigma_{G,1}^2, \ldots, \sigma_{G,n}^2)$.
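To make the synthetic-response transformation concrete, the following R sketch computes $Z_{G,i}$ with $G$ replaced by the Kaplan–Meier estimate of the censoring distribution (the feasible version used from Section 3 onward). The function names and the small guard constant are illustrative, not part of the paper.

```r
library(survival)

# Kaplan-Meier estimate of the censoring distribution G: flip the indicator so
# that censoring times are treated as the "events".
estimate_G <- function(z, delta) {
  fit <- survfit(Surv(z, 1 - delta) ~ 1)
  surv_fun <- stepfun(fit$time, c(1, fit$surv))   # step function for 1 - G(t)
  1 - surv_fun(z)                                 # G_hat evaluated at each observed time
}

# Koul-Susarla-Van Ryzin synthetic responses Z_{G,i} = Z_i * delta_i / {1 - G(Z_i)}
synthetic_response <- function(z, delta) {
  G_hat <- estimate_G(z, delta)
  ifelse(delta == 1, z / pmax(1 - G_hat, 1e-8), 0)  # guard against division by ~0
}
```

Uncensored observations are inflated by the inverse estimated censoring-survival probability, while censored observations are set to zero, so that the synthetic responses retain the conditional mean $\mu_i$.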
Assume that we have a total of S candidate PLMs to approximate the true data generating process, where S is allowed to go to infinity. Suppose the sth candidate model is
$$Z_G = \mu_{(s)} + e_{G,(s)} = X_{(s)}\beta_{(s)} + M_{(s)} + e_{G,(s)}, \qquad (4)$$
where $X_{(s)} = (X_{(s),1}, \ldots, X_{(s),n})^\top$ is an $n \times p_s$ covariate matrix of full column rank consisting of $p_s$ columns of $X$, $\beta_{(s)} = (\beta_{(s)1}, \ldots, \beta_{(s)p_s})^\top$ is the corresponding $p_s \times 1$ unknown linear regression coefficient vector, $M_{(s)} = \{g_{(s)}(U_1), \ldots, g_{(s)}(U_n)\}^\top$ is an $n \times 1$ unknown nonparametric function vector, and $e_{G,(s)}$ is the model error.
To obtain the estimator of $\mu$ under the sth candidate model, we first estimate the coefficient vector $\beta_{(s)}$ and the nonparametric function vector $M_{(s)}$. Many estimation methods are available for model (4), including kernel smoothing and polynomial spline smoothing. Recently, ref. [39] pointed out that using B-splines to approximate nonparametric functions has great advantages in the model-averaging setting; therefore, in this paper, we adopt the spline technique to estimate the unknowns in model (4).
Denote by $\psi_n = \{0 = u_0 < u_1 < \cdots < u_{J_n} < u_{J_n+1} = 1\}$ a partition of $[0, 1]$, where $J_n$ is the number of interior knots, and let $\zeta_n$ be the polynomial spline space of degree $r$ on the interval $[0, 1]$. From [40], the nonparametric function in the PLM can be well approximated by a B-spline expansion. Then, for $g_{(s)}(u)$, one can write
$$g_{(s)}(u) \approx B_{(s)}^\top(u)\, \alpha_{(s)}, \qquad (5)$$
where $B_{(s)}(\cdot) = (B_{(s)1}(\cdot), \ldots, B_{(s)k_n}(\cdot))^\top$ is the normalized B-spline basis function vector in the sth candidate model, $k_n = J_n + r + 1$, and $\alpha_{(s)} = (\alpha_{(s)1}, \ldots, \alpha_{(s)k_n})^\top$ is the vector of spline coefficients. Define the $n \times k_n$ matrix $B_{(s)} = (B_{(s)}(U_1), \ldots, B_{(s)}(U_n))^\top$. Therefore, there exist a design matrix $X_{(s)}^* = (X_{(s)}, B_{(s)})$ and a corresponding unknown parameter vector $\gamma_{(s)} = (\beta_{(s)}^\top, \alpha_{(s)}^\top)^\top$ such that
$$\mu_{(s)} \approx X_{(s)}^* \gamma_{(s)}, \qquad (6)$$
where $X_{(s)}^*$ is assumed to be of full column rank. By regressing $Z_G$ on $X_{(s)}^*$, we obtain the least-squares estimators of $\beta_{(s)}$ and $\alpha_{(s)}$:
$$\hat{\beta}_{G,(s)} = \{X_{(s)}^\top (I - Q_{(s)}) X_{(s)}\}^{-1} X_{(s)}^\top (I - Q_{(s)}) Z_G, \qquad (7)$$
and
$$\hat{\alpha}_{G,(s)} = (B_{(s)}^\top B_{(s)})^{-1} B_{(s)}^\top (Z_G - X_{(s)} \hat{\beta}_{G,(s)}), \qquad (8)$$
where $Q_{(s)} = B_{(s)} (B_{(s)}^\top B_{(s)})^{-1} B_{(s)}^\top$. Therefore, the estimator of $\mu$ under the sth candidate model is given by
$$\hat{\mu}_{G,(s)} = X_{(s)} \hat{\beta}_{G,(s)} + B_{(s)} \hat{\alpha}_{G,(s)} = \{Q_{(s)} + \tilde{X}_{(s)} (\tilde{X}_{(s)}^\top \tilde{X}_{(s)})^{-1} \tilde{X}_{(s)}^\top\} Z_G = P_{(s)} Z_G, \qquad (9)$$
where $\tilde{X}_{(s)} = (I - Q_{(s)}) X_{(s)}$ and $P_{(s)} = Q_{(s)} + \tilde{X}_{(s)} (\tilde{X}_{(s)}^\top \tilde{X}_{(s)})^{-1} \tilde{X}_{(s)}^\top$. From Equation (9), we see that $\hat{\mu}_{G,(s)}$ depends linearly on $Z_G$.
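As an illustration of how a single candidate estimator $\hat{\mu}_{G,(s)} = P_{(s)} Z_G$ can be computed, the R sketch below builds the B-spline basis with splines::bs (the function the paper uses later in its simulations) and projects the synthetic responses onto the columns of $X_{(s)}^* = (X_{(s)}, B_{(s)})$; this projection equals $Q_{(s)} + \tilde{X}_{(s)}(\tilde{X}_{(s)}^\top \tilde{X}_{(s)})^{-1}\tilde{X}_{(s)}^\top$ when $X_{(s)}^*$ has full column rank. The function name and default knot settings are illustrative.

```r
library(splines)

# Least-squares fit of the s-th candidate PLM on synthetic responses z_g.
# x_s: n x p_s matrix of the selected linear covariates; u: covariate on [0, 1].
fit_candidate <- function(z_g, x_s, u, n_knots = 1, degree = 3) {
  inner_knots <- seq(0, 1, length.out = n_knots + 2)[-c(1, n_knots + 2)]
  # normalized B-spline basis B_(s): k_n = n_knots + degree + 1 columns
  b_s <- bs(u, knots = inner_knots, degree = degree, intercept = TRUE,
            Boundary.knots = c(0, 1))
  x_star <- cbind(x_s, b_s)                        # design matrix X*_(s)
  # projection (hat) matrix P_(s) = X* (X*'X*)^{-1} X*'
  p_s <- x_star %*% solve(crossprod(x_star), t(x_star))
  list(mu_hat = drop(p_s %*% z_g),                 # fitted conditional means
       hat_matrix = p_s)
}
```

The returned hat matrix is what enters the model-averaging weights and the shortcut formula of Section 3.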

3. Model-Averaging Estimator and Weight Choice Criterion

Let $\omega = (\omega_1, \ldots, \omega_S)^\top$ be a weight vector belonging to the set $W = \{\omega \in [0, 1]^S : \sum_{s=1}^S \omega_s = 1\}$; then the model-averaging estimator of $\mu$ can be formulated as
$$\hat{\mu}_G(\omega) = \sum_{s=1}^S \omega_s \hat{\mu}_{G,(s)} = P(\omega) Z_G, \qquad (10)$$
where $P(\omega) = \sum_{s=1}^S \omega_s P_{(s)}$.
Motivated by [7], we propose a leave-one-out cross-validation criterion to select the weight vector $\omega$ in Equation (10) under the PLM framework. Let $X_{(s)}^{[i]}$, $B_{(s)}^{[i]}$, and $Z_G^{[i]}$ be the matrices/vectors $X_{(s)}$, $B_{(s)}$, and $Z_G$ with the $i$th row deleted. The leave-one-out estimator of $\mu_i$ in the sth candidate model is given by
$$\tilde{\mu}_{G,(s),i} = X_{(s),i}^\top \hat{\beta}_{G,(s)}^{[i]} + B_{(s)}^\top(U_i)\, \hat{\alpha}_{G,(s)}^{[i]}, \qquad (11)$$
where
$$\hat{\beta}_{G,(s)}^{[i]} = \big\{ X_{(s)}^{[i]\top} (I - \tilde{Q}_{(s)}) X_{(s)}^{[i]} \big\}^{-1} X_{(s)}^{[i]\top} (I - \tilde{Q}_{(s)}) Z_G^{[i]},$$
$$\hat{\alpha}_{G,(s)}^{[i]} = \big( B_{(s)}^{[i]\top} B_{(s)}^{[i]} \big)^{-1} B_{(s)}^{[i]\top} \big( Z_G^{[i]} - X_{(s)}^{[i]} \hat{\beta}_{G,(s)}^{[i]} \big),$$
and $\tilde{Q}_{(s)} = B_{(s)}^{[i]} (B_{(s)}^{[i]\top} B_{(s)}^{[i]})^{-1} B_{(s)}^{[i]\top}$. Denote the sth jackknife estimator and the corresponding jackknife version of the averaging estimator by $\tilde{\mu}_{G,(s)} = (\tilde{\mu}_{G,(s),1}, \ldots, \tilde{\mu}_{G,(s),n})^\top$ and $\tilde{\mu}_G(\omega) = \sum_{s=1}^S \omega_s \tilde{\mu}_{G,(s)}$, respectively. The leave-one-out cross-validation weight choice criterion is
$$CV_G(\omega) = \| Z_G - \tilde{\mu}_G(\omega) \|^2, \qquad (12)$$
and minimizing $CV_G(\omega)$ over the space $W$ yields the optimal weight vector. However, in practice, such a minimization is infeasible because the cumulative distribution function $G(\cdot)$ in Equation (12) is unknown and needs to be estimated. Similar to [41], we can estimate $G(\cdot)$ by the commonly used Kaplan–Meier estimator
$$\hat{G}_n(z) = 1 - \prod_{i=1}^n \left( \frac{n-i}{n-i+1} \right)^{I[Z_{(i)} \le z,\; \delta_{(i)} = 0]}, \qquad (13)$$
where $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denote the order statistics of $Z_1, Z_2, \ldots, Z_n$, and $\delta_{(i)}$ is the censoring indicator corresponding to $Z_{(i)}$. In what follows, a quantity subscripted by $\hat{G}_n$ is obtained by replacing $G$ with $\hat{G}_n$ in the corresponding quantity; for instance, $Z_{\hat{G}_n}$ is obtained by replacing $G$ with its estimator $\hat{G}_n$ in $Z_G$. A feasible counterpart of $CV_G(\omega)$ is then given by
$$CV_{\hat{G}_n}(\omega) = \| Z_{\hat{G}_n} - \tilde{\mu}_{\hat{G}_n}(\omega) \|^2, \qquad (14)$$
where $\tilde{\mu}_{\hat{G}_n}(\omega) = \sum_{s=1}^S \omega_s \tilde{\mu}_{\hat{G}_n,(s)}$ and $\tilde{\mu}_{\hat{G}_n,(s)} = (\tilde{\mu}_{\hat{G}_n,(s),1}, \ldots, \tilde{\mu}_{\hat{G}_n,(s),n})^\top$. Minimizing $CV_{\hat{G}_n}(\omega)$ with respect to $\omega$ over the set $W$ leads to the jackknife choice of weight vector
$$\hat{\omega} = \arg\min_{\omega \in W} CV_{\hat{G}_n}(\omega). \qquad (15)$$
Plugging G ^ n and ω ^ into Equation (10) yields the model-averaging estimator of μ , written as μ ^ G ^ n ( ω ^ ) , which is named the censored partially linear model averaging (CPLMA) estimator hereafter.
However, minimizing the weight choice criterion (14) is not easy, because the computation of $\tilde{\mu}_{\hat{G}_n,(s)}$ requires $n$ separate regressions, which is especially cumbersome when the number of candidate models and the sample size are large. Motivated by the computationally efficient cross-validation criterion introduced by [7] for the linear regression model, we express $\tilde{\mu}_{\hat{G}_n,(s)}$ in a simple form that yields an enormous reduction in computation time. Let $\phi_{ii(s)}$ be the $i$th diagonal entry of $P_{(s)}$. From [20,42], $\tilde{\mu}_{\hat{G}_n,(s)}$ can be conveniently written as
$$\tilde{\mu}_{\hat{G}_n,(s)} = \{ P_{(s)} - D_{(s)} A_{(s)} \}\, Z_{\hat{G}_n} = \tilde{P}_{(s)} Z_{\hat{G}_n}, \qquad (16)$$
where $D_{(s)} = \mathrm{diag}(D_{11(s)}, \ldots, D_{nn(s)})$ with $D_{ii(s)} = \phi_{ii(s)} + \phi_{ii(s)}^2 + \phi_{ii(s)}^3 + \cdots = \phi_{ii(s)}/(1 - \phi_{ii(s)})$, and $A_{(s)} = I - P_{(s)}$. The shortcut formula (16) indicates that all elements of $\tilde{\mu}_{\hat{G}_n,(s)}$ can be computed simultaneously from the full sample, which is much more convenient and time-saving than the standard approach based on Equation (11). Let $\Lambda_{(s)} = D_{(s)} A_{(s)}$, $\Lambda(\omega) = \sum_{s=1}^S \omega_s \Lambda_{(s)}$, and $A(\omega) = \sum_{s=1}^S \omega_s A_{(s)} = I - P(\omega)$. The corresponding computational shortcut for the feasible jackknife criterion (14) then follows as
$$CV_{\hat{G}_n}(\omega) = \big\| Z_{\hat{G}_n} - \{P(\omega) - \Lambda(\omega)\} Z_{\hat{G}_n} \big\|^2 = \big\| \{A(\omega) + \Lambda(\omega)\} Z_{\hat{G}_n} \big\|^2 = Z_{\hat{G}_n}^\top \{A(\omega) + \Lambda(\omega)\}^\top \{A(\omega) + \Lambda(\omega)\} Z_{\hat{G}_n} = \omega^\top H_{\hat{G}_n}^\top H_{\hat{G}_n}\, \omega, \qquad (17)$$
where $H_{\hat{G}_n} = \big\{ (A_{(1)} + \Lambda_{(1)}) Z_{\hat{G}_n}, \ldots, (A_{(S)} + \Lambda_{(S)}) Z_{\hat{G}_n} \big\}$ is an $n \times S$ matrix. Equation (17) shows that minimizing $CV_{\hat{G}_n}(\omega)$ is a standard quadratic programming problem, which can be solved by various existing software packages, for example, the quadprog package in R [43].
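For concreteness, here is a minimal R sketch of this quadratic-programming step, assuming the matrix `H` already holds the columns $(A_{(s)} + \Lambda_{(s)}) Z_{\hat{G}_n}$ for $s = 1, \ldots, S$; the small ridge term is an illustrative numerical safeguard, not part of the criterion.

```r
library(quadprog)

# Minimize w' (H'H) w subject to sum(w) = 1 and w >= 0.
cplma_weights <- function(H) {
  S    <- ncol(H)
  Dmat <- crossprod(H) + diag(1e-8, S)   # H'H, slightly ridged for positive definiteness
  dvec <- rep(0, S)
  # first column: equality constraint sum(w) = 1; remaining columns: w_s >= 0
  Amat <- cbind(rep(1, S), diag(S))
  bvec <- c(1, rep(0, S))
  solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
}
```

Plugging the resulting weights into $\hat{\mu}_{\hat{G}_n}(\omega)$ gives the CPLMA estimator $\hat{\mu}_{\hat{G}_n}(\hat{\omega})$ defined above.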

4. Asymptotic Optimality

In this section, we demonstrate that the resulting weight vector, which is obtained by minimizing the weight choice criterion C V G ^ n ( ω ) , is asymptotically optimal under some mild conditions.
Define the squared loss as $L_G(\omega) = \| \hat{\mu}_G(\omega) - \mu \|^2$ and the corresponding risk function as $R_G(\omega) = E(L_G(\omega) \mid X, U) = \| A(\omega)\mu \|^2 + \mathrm{tr}\{ P(\omega)\, \Omega_G\, P^\top(\omega) \}$. Let $\bar{p} = \max_{1 \le s \le S} p_s$, $\bar{q} = \max_{1 \le s \le S} \mathrm{rank}(B_{(s)})$, $\xi_G = \inf_{\omega \in W} R_G(\omega)$, and let $\omega_s^0$ be the $S \times 1$ vector whose sth entry is 1 and whose other entries are 0. To prove the asymptotic optimality of the model-averaging estimator $\hat{\mu}_{\hat{G}_n}(\hat{\omega})$, we list the following regularity conditions, where all limiting processes are as $n \to \infty$.
  • (Condition (C.1)) $\tau_F < \tau_G$, where $\tau_L = \inf\{ t : L(t) = 1 \}$ for any distribution function $L$.
  • (Condition (C.2)) $\bar{\lambda}(\Omega_G) \le C_G$, where $\bar{\lambda}(\cdot)$ denotes the maximum singular value of a matrix and $C_G$ is a constant.
  • (Condition (C.3)) $\xi_G^{-2} \sum_{s=1}^S R_G(\omega_s^0) \to 0$, a.s.
  • (Condition (C.4)) $\bar{p}\, \xi_G^{-1} \to 0$ and $\bar{q}\, \xi_G^{-1} \to 0$, a.s.
  • (Condition (C.5)) $\| \mu \|^2 = O(n)$, a.s.
  • (Condition (C.6)) $\phi_{ii(s)} \le C_s\, n^{-1} \mathrm{tr}(P_{(s)})$, a.s., where $C_s$ is a constant.
  • (Condition (C.7)) The function $g$ belongs to a class of functions $\mathcal{A}$ whose $r$th derivative $g^{(r)}$ exists and is Lipschitz of order $\alpha_0$; that is,
    $$\mathcal{A} = \{ g(\cdot) : | g^{(r)}(t_1) - g^{(r)}(t_0) | \le M | t_1 - t_0 |^{\alpha_0} \ \text{for} \ t_1, t_0 \in \mathcal{U} \},$$
    for some positive constant $M$, where $\mathcal{U}$ is the support of $U$, $r$ is a nonnegative integer, and $\alpha_0 \in (0, 1]$ such that $r + \alpha_0 > 0.5$.
Condition (C.1), which is the same as condition (C5) in [35], is widely used to ensure the uniform convergence of the Kaplan–Meier estimator $\hat{G}_n(\cdot)$ in studies of censored data. Condition (C.2) imposes a mild restriction on the maximum singular value of the covariance matrix $\Omega_G$, and is also used by [44]. Condition (C.3), taken from condition (21) of [45], is less restrictive than the condition $S\, \xi_G^{-2N} \sum_{s=1}^S \{ R_G(\omega_s^0) \}^N \to 0$, a.s., for some constant $N \ge 1$, which is commonly used in the model-averaging literature. Condition (C.4) places constraints on the growth rates of $\bar{p}$ and $\bar{q}$, and is similar to condition (22) in [45]. Condition (C.5) bounds the sum of the $\mu_i^2$ and is frequently used in the model-averaging literature; see, e.g., [23,24]. Condition (C.6) is a common assumption used to guarantee the asymptotic optimality of cross-validation; see [7,24], for instance. Conditions (C.3)–(C.6) require almost sure convergence, which ensures that result (18) holds whether the covariates $X$ and $U$ are random or not. Specifically, when $X$ and $U$ are nonstochastic, we only need to assume convergence in probability in Conditions (C.3)–(C.6); see [46]. Otherwise, we impose almost sure convergence to guarantee that the proof technique used in the nonstochastic case remains valid. Condition (C.7) is required for the B-spline approximation in the PLM; see [39,47].
Theorem 1 indicates that the CPLMA estimator proposed in this paper is asymptotically optimal in the sense that its squared error loss is asymptotically equivalent to that of the infeasible best possible model-averaging estimator in the PLM framework. The proof of Theorem 1 is given in Appendix A.
Theorem 1.
Under Conditions (C.1)–(C.7), we have
$$\frac{L_{\hat{G}_n}(\hat{\omega})}{\inf_{\omega \in W} L_{\hat{G}_n}(\omega)} \to 1 \qquad (18)$$
in probability as n .

5. A Simulation Study

In this section, a simulation experiment is conducted to investigate the finite sample performance of the CPLMA estimator, which arises from the proposed leave-one-out cross-validation weight choice approach, in PLM with censored responses. We compare it with several popular information-criterion-based model-selection methods as well as other model-averaging procedures.

5.1. The Design of Simulation

The data-generating process in this part is similar to the infinite-order regression model proposed by [17], except that responses are subject to censoring and a nonparametric function is included in addition to the linear part. Specifically, the data are generated by the following regression model:
$$Y_i = \mu_i + \epsilon_i = \sum_{j=1}^{200} x_{ij}\beta_j + g(U_i) + \epsilon_i,$$
where $X_i = (x_{i1}, \ldots, x_{i200})^\top$, the covariate vector of the linear component, follows a multivariate normal distribution with mean $0$ and covariance $0.5^{|j_1 - j_2|}$ between $x_{ij_1}$ and $x_{ij_2}$. The coefficients of the linear part are set as $\beta_j = 1/j^{\alpha}$, and the parameter $\alpha$ is varied between $2$ and $0.5$; a larger $\alpha$ means the coefficients decay more quickly as $j$ increases. The nonparametric function is $g(U_i) = \sin(2\pi U_i^2)$, where $U_i$ is generated from the uniform distribution on $[0, 1]$. The model error $\epsilon_i$ follows a normal distribution $N(0, \eta^2 (x_{i2}^2 + 0.01))$. We choose the value of $\eta$ so that $R^2 = \mathrm{var}(\mu_i)/\mathrm{var}(Y_i)$ varies from $0.1$ to $0.9$, where $\mathrm{var}(\cdot)$ denotes the sample variance. In addition, the censoring variable $C_i$ is generated from a uniform distribution on an interval $[a_1, a_2]$, where different values of $a_1$ and $a_2$ are selected to yield a censoring rate (CR) of either $20\%$ or $40\%$. To evaluate the methods as comprehensively as possible, we consider the two designs below (an R sketch of the data-generating process follows the design descriptions) and set the sample size to $n = 50$, 75, 100, 200, 300, and 400.
Design 1
(non-nested setting). The linear part of each candidate model is a subset of $\{x_{i1}, \ldots, x_{i5}\}$ (the remaining components of $X_i$ are ignored), so the number of candidate models is $2^5 = 32$.
Design 2
(nested setting). The sth candidate model includes the first $s$ linear variables. The number of candidate models is determined by $S = \mathrm{INT}(3 n^{1/3})$, where $\mathrm{INT}(b)$ denotes the integer nearest to $b$. Therefore, $S = 11$, 13, 14, 18, 20, and 22 for $n = 50$, 75, 100, 200, 300, and 400, respectively.
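The following R sketch implements the data-generating process described above. The censoring interval endpoints a1 and a2 are placeholders (in the actual experiments they are tuned to hit the target censoring rates), and the covariance exponent follows the $0.5^{|j_1 - j_2|}$ reading of the design.

```r
library(MASS)   # mvrnorm

simulate_plm <- function(n, alpha = 2, eta = 1, a1 = 0, a2 = 6) {
  p     <- 200
  Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))          # cov(x_{ij1}, x_{ij2}) = 0.5^|j1 - j2|
  x     <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  beta  <- 1 / (1:p)^alpha                        # coefficients decay in j
  u     <- runif(n)
  mu    <- drop(x %*% beta) + sin(2 * pi * u^2)   # linear part plus g(U)
  eps   <- rnorm(n, sd = sqrt(eta^2 * (x[, 2]^2 + 0.01)))  # heteroscedastic error
  y     <- mu + eps
  cens  <- runif(n, a1, a2)                       # censoring times
  list(z = pmin(y, cens), delta = as.numeric(y <= cens),
       x = x, u = u, mu = mu)
}
```

The returned observed times and censoring indicators are then passed through the synthetic-response transformation of Section 2 before fitting the candidate models.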

5.2. Estimation and Comparison

Following [35], the cubic B-spline is used to approximate the nonparametric function, and the spline basis matrix is generated by bs(·) in the splines package in R [48]. To select the number of knots, we set η = 1 and α = 2 and investigate the impact of the number of knots on the risk of the CPLMA estimator under different scenarios. Figure 1 shows how the mean risk varies with the number of knots, over 500 replications, for the four combinations of designs and censoring rates considered. From Figure 1, we see that one knot yields the smallest mean risk in almost all cases; in the lower-left panel, one knot is second only to two knots but better than all other choices. In addition, in all cases, the mean risk increases with the number of knots once the number of knots exceeds 2. This observation coincides with the finding in [39] that a larger number of knots results in a more serious overfitting effect. Therefore, the number of knots is set to 1 in the simulation studies.
We compare the performance of the CPLMA method with two traditional model-selection methods (AIC and BIC) and two model-averaging methods based on scores of information criteria (SAIC and SBIC). For the sth model, we calculate AIC and BIC scores by
$$AIC_s = \log(\hat{\sigma}_{\hat{G}_n,(s)}^2) + 2 n^{-1}\, \mathrm{tr}(P_{(s)}),$$
and
$$BIC_s = \log(\hat{\sigma}_{\hat{G}_n,(s)}^2) + n^{-1}\, \mathrm{tr}(P_{(s)}) \log(n),$$
respectively, where $\hat{\sigma}_{\hat{G}_n,(s)}^2 = n^{-1} \| Z_{\hat{G}_n} - \hat{\mu}_{\hat{G}_n,(s)} \|^2$ and $\hat{\mu}_{\hat{G}_n,(s)}$ is obtained by replacing $G$ in Equation (9) with $\hat{G}_n$. Both methods pick the model with the smallest information criterion score.
For SAIC and SBIC, the weights of the sth model are defined as
$$\omega_s^{AIC} = \exp(-AIC_s/2) \Big/ \sum_{s=1}^S \exp(-AIC_s/2),$$
and
$$\omega_s^{BIC} = \exp(-BIC_s/2) \Big/ \sum_{s=1}^S \exp(-BIC_s/2),$$
respectively. To evaluate these five methods, we draw 500 independent samples of size $n$ and compute the risk of each estimator of $\mu$. For ease of comparison, the risks of all estimators are normalized by the risk of the AIC method.
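The R sketch below computes these information criterion scores and the smoothed weights, assuming `fits` is a list of candidate fits each carrying the fitted values and hat matrix (as returned, for example, by a function like fit_candidate above); the list structure is an assumption of this sketch.

```r
ic_weights <- function(z_g, fits) {
  n      <- length(z_g)
  scores <- sapply(fits, function(f) {
    sigma2 <- mean((z_g - f$mu_hat)^2)            # sigma_hat^2 for the s-th model
    df     <- sum(diag(f$hat_matrix))             # tr(P_(s))
    c(aic = log(sigma2) + 2 * df / n,
      bic = log(sigma2) + df * log(n) / n)
  })
  # smoothed weights; subtracting the minimum score leaves the weights unchanged
  # but avoids numerical underflow in exp()
  saic <- exp(-(scores["aic", ] - min(scores["aic", ])) / 2)
  sbic <- exp(-(scores["bic", ] - min(scores["bic", ])) / 2)
  list(aic_scores = scores["aic", ], bic_scores = scores["bic", ],
       saic = saic / sum(saic), sbic = sbic / sum(sbic))
}
```

AIC and BIC then select the model with the smallest score, while SAIC and SBIC average the candidate fits with the returned weight vectors.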

5.3. Results

The simulation results for Design 1 are presented in Figure 2 and Figure 3 for a censoring rate of 20% and in Figure 4 and Figure 5 for a censoring rate of 40%. These four figures show that our CPLMA method leads to the smallest risk in most cases; the exceptions are that SAIC and SBIC sometimes have a marginal advantage over CPLMA when $R^2$ is large, with the advantage of SBIC being more obvious when $n$ is small. In particular, comparing the results for $\alpha = 0.5$ and $\alpha = 2$, we find that CPLMA performs better when $\alpha$ is small. As expected, SAIC and SBIC invariably produce vastly more accurate outcomes than their respective model-selection counterparts.
The simulation results for Design 2 are depicted in Figure 6, Figure 7, Figure 8 and Figure 9 for censoring rates of 20% and 40%, from which we see that in most cases our proposed CPLMA method still outperforms its rivals in terms of risk. The superiority of CPLMA over the other methods is more apparent than in Design 1. Additionally, BIC-based model-selection and model-averaging estimators have much worse risk performance than the other three estimators when $R^2$ is small, which differs from the results in Design 1. We also note that SAIC and CPLMA perform almost equally well when $R^2$ is very large.
In summary, whether or not the candidate models are nested, our proposal, CPLMA, is superior to the traditional model-selection and model-averaging methods for all combinations of censoring rates and sample sizes considered.

6. Real Data Analysis

In this section, we apply the proposed CPLMA method to analyze two real datasets using R. The first dataset can be found in the R package "survival" [49], and the second is available at http://llmpp.nih.gov/MCL (accessed on 18 January 2023).

6.1. Primary Biliary Cirrhosis Dataset Study

The primary biliary cirrhosis (PBC) dataset includes information on 424 patients, collected at the Mayo Clinic from January 1974 to May 1984, and has been extensively explored by [34,35,37,50,51]. Following the related literature, we restrict our attention to the n = 276 patients without missing observations, each of whom has complete data on 17 covariates. There are 111 deaths among the 276 patients, which gives a censoring rate of about 60%.
In this dataset, the dependent variable is the log number of days between registration and the earlier of death or the study analysis time in 1986. The 17 covariates are age (in years), albumin (serum albumin in g/dL), alk.phos (alkaline phosphatase in U/L), bili (serum bilirubin in mg/dL), chol (serum cholesterol in mg/dL), copper (urine copper in ug/day), platelet (platelet count), protime (standardized blood clotting time in seconds), ast (aspartate aminotransferase, once called SGOT, in U/mL), trig (triglycerides in mg/dL), ascites (presence of ascites, 0 = no, 1 = yes), edema (0 = no; 0.5 = yes, but responded to diuretic treatment; 1 = yes, did not respond to treatment), hepato (presence of hepatomegaly, 0 = no, 1 = yes), sex (0 = male, 1 = female), spiders (presence of spiders, 0 = no, 1 = yes), stage (histologic stage of disease, graded 1, 2, 3, or 4), and trt (treatment code, 1 = D-penicillamine, 2 = placebo). The first 10 variables are continuous and are standardized to have mean 0 and variance 1 in the analysis.
A total of 17 covariates leads to a huge number of candidate models, which brings a heavy computational burden. Ref. [50] pointed out that only eight covariates, namely age, edema, bili, albumin, copper, ast, protime, and stage, have a significant impact on the response variable, and ref. [35] found that albumin has a functional impact on the response variable. Thus, we only consider these eight significant covariates. Specifically, we assign albumin to the nonparametric part and include the others in the linear part of the PLM, and we run model selection and model averaging over the covariates in the linear part. Accordingly, there are $2^7 = 128$ candidate models. Similar to [35], we also use the cubic B-spline with two knots to approximate the nonparametric component.
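A sketch of the corresponding data preparation in R uses the pbc data frame shipped with the survival package. The complete-case step and variable recodings below are assumptions made to mirror the description above (in survival::pbc, status == 2 codes death); albumin is rescaled to [0, 1] so that it can enter the spline basis.

```r
library(survival)

data(pbc, package = "survival")
pbc_cc <- pbc[complete.cases(pbc), ]          # patients with no missing covariates

z     <- log(pbc_cc$time)                     # log days to death or censoring
delta <- as.numeric(pbc_cc$status == 2)       # 1 = death, 0 = censored/transplant

# linear covariates: continuous ones standardized, categorical ones kept as coded
cont  <- c("age", "bili", "copper", "ast", "protime")
x_lin <- cbind(scale(as.matrix(pbc_cc[, cont])),
               edema = pbc_cc$edema, stage = pbc_cc$stage)

# nonparametric covariate: albumin, rescaled to [0, 1] for the spline basis
alb <- pbc_cc$albumin
u   <- (alb - min(alb)) / diff(range(alb))
```

From here, the 128 candidate models are formed by taking subsets of the columns of x_lin, and each is fitted to the synthetic responses built from z and delta.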
To evaluate the prediction effect of two model-selection methods (AIC and BIC) and three model-averaging methods (SAIC, SBIC, and CPLMA), we randomly separate the data into a training sample and a test sample. Let n 0 be the size of the training sample, and n 1 = n n 0 be the size of the test sample. We set n 0 to 140 , 160 , 180 , 200 , 220 , and 240. The mean-squared prediction error (MSPE) is used to describe the out-of-sample prediction performance of the proposed CPLMA and its competitors. We further calculate the mean and the median of the MSPE for each method based on 1000 replications. Specifically,
$$\mathrm{MSPE}_{\mathrm{mean}} = \frac{1}{1000} \sum_{d=1}^{1000} \mathrm{MSPE}^{(d)},$$
and
$$\mathrm{MSPE}_{\mathrm{median}} = \mathop{\mathrm{median}}_{d = 1, 2, \ldots, 1000} \mathrm{MSPE}^{(d)},$$
where
$$\mathrm{MSPE}^{(d)} = \frac{1}{n_1} \sum_{i = n_0 + 1}^{n} \big( Z_{\hat{G}_n, i}^{(d)} - \hat{\mu}_i^{(d)} \big)^2,$$
and $\hat{\mu}_i^{(d)}$ is the predicted value of $\mu_i$ in the $d$th replication.
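The evaluation loop can be sketched in R as follows; `predict_fn` is a placeholder for any of the five procedures (it should train on the synthetic responses of the training sample and return predictions for the test covariates), and is an assumption of this sketch.

```r
evaluate_mspe <- function(z_g, x, u, n0, n_rep = 1000, predict_fn) {
  n <- length(z_g)
  replicate(n_rep, {
    train  <- sample(n, n0)                         # random training indices
    test   <- setdiff(seq_len(n), train)
    mu_hat <- predict_fn(z_g[train], x[train, , drop = FALSE], u[train],
                         x[test, , drop = FALSE], u[test])
    mean((z_g[test] - mu_hat)^2)                    # MSPE for one replication
  })
}
```

Running this once per method and dividing by the AIC results gives the relative prediction errors summarized in the tables below.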
To facilitate comparison, we calculate the ratio of the MSPE of a given method to the MSPE produced by AIC, which is referred to as the relative MSPE (RMSPE). Table 2 reports the mean and median of the RMSPE across 1000 repetitions. Our proposed CPLMA always yields the lowest mean and median RMSPE for all considered training sample sizes. In all cases, the RMSPE values for BIC are larger than 1 and those for the three model-averaging methods are smaller than 1, which indicates that, in terms of prediction performance, AIC is clearly the better of the two model-selection methods and that model-averaging methods outperform model-selection methods.
Table 3 presents the Diebold and Mariano (DM) [52] test results for the differences in MSPE, where a positive DM statistic implies that the method in the numerator yields a larger MSPE than the method in the denominator. The results in columns 6, 9, 11, and 12 indicate that the differences between CPLMA and its competitors are statistically significant and that our method always produces a smaller MSPE than the other four methods, which again demonstrates the superiority of our proposal. The results in column 3 show that AIC is significantly better than BIC, which coincides with the findings in Table 2. Columns 4 and 8 indicate that SAIC and SBIC are significantly different from their respective model-selection counterparts.

6.2. Mantle Cell Lymphoma Data Analysis

The mantle cell lymphoma (MCL) dataset contains 92 patients who were classified as having MCL based on established morphologic and immunophenotypic criteria. Since 2003, this dataset has been widely studied; see, e.g., [53,54]. The response variable of interest is time (follow-up time in years), and the variable status denotes the patient status at follow-up (1 = death, 0 = censored). The six covariates are an indicator of INK/ARF deletion (1 = yes, 0 = no), an indicator of ATM deletion (1 = yes, 0 = no), an indicator of P-53 deletion (1 = yes, 0 = no), the cyclinD-1 taqman result, BMI expression, and the proliferation signature averages. After removing seven records with missing covariates, we focus on the remaining 85 patients; the censoring rate is 29.41%.
Ref. [54] found that BMI expression has a functional impact on the response variable; therefore, we build a full PLM with BMI expression as the nonparametric variable and the other covariates as linear variables. This gives $2^5 = 32$ candidate models on which to conduct model selection and model averaging. We let the size of the training sample be $n_0 = 55$ or 65; the mean and median of the RMSPE across 1000 repetitions are shown in Table 4. It can be seen from Table 4 that, in terms of both mean and median, our CPLMA method is clearly superior to the other competing methods. Figure 10 shows that the variation of the MSPE for CPLMA is small relative to that of the other methods, regardless of whether $n_0$ is 55 or 65.

7. Conclusions

In the context of semiparametric partially linear models with censored responses, we develop a jackknife model-averaging method that selects the weights by minimizing a leave-one-out cross-validation criterion, in which B-splines are used to approximate the nonparametric function and least-squares estimation is applied to estimate the unknown parameters in each candidate model. The resulting model-averaging estimator, the CPLMA estimator, is shown to be asymptotically optimal. A simulation study and two real data examples indicate that our method possesses some advantages over other model-selection and model-averaging methods.
Based on the results in this paper, we can further explore the optimal model averaging for the semiparametric partially linear quantile regression models with censored data. In addition, it is worthwhile to apply other optimal model-averaging methods, such as the model-averaging method based on Kullback–Leibler distance, to generalized partially linear models with censored responses.

Author Contributions

Conceptualization, W.C.; methodology, G.H., W.C. and J.Z.; software, G.H. and J.Z.; supervision, W.C. and J.Z.; writing—original draft, G.H.; writing—review and editing, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Hu is supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0930). The work of Zeng is supported by the Important Natural Science Foundation of Colleges and Universities of Anhui Province (No. KJ2021A0929).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The PBC dataset is available in the R package "survival", and the MCL dataset is available at http://llmpp.nih.gov/MCL (accessed on 18 January 2023).

Acknowledgments

The authors would like to thank the reviewers and editors for their careful reading and constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

To prove Theorem 1, we first give some notation used in what follows. Similar to $L_G(\omega)$ and $R_G(\omega)$, we define the loss function of $\tilde{\mu}_G(\omega)$ as $\tilde{L}_G(\omega) = \| \tilde{\mu}_G(\omega) - \mu \|^2$, and the risk function as $\tilde{R}_G(\omega) = E(\tilde{L}_G(\omega) \mid X, U)$. A straightforward calculation yields
$$\tilde{R}_G(\omega) = \| \tilde{A}(\omega)\mu \|^2 + \mathrm{tr}\{ \tilde{P}(\omega)\, \Omega_G\, \tilde{P}^\top(\omega) \}, \qquad (A1)$$
where $\tilde{P}(\omega) = \sum_{s=1}^S \omega_s \tilde{P}_{(s)}$ and $\tilde{A}(\omega) = I - \tilde{P}(\omega)$.
Lemma A1.
Under Conditions (C.2)–(C.7), we have the following results:
$$\sup_{\omega \in W} \left| \frac{R_G(\omega)}{\tilde{R}_G(\omega)} - 1 \right| = o_p(1), \qquad (A2)$$
$$\sup_{\omega \in W} \left| \frac{\tilde{L}_G(\omega)}{\tilde{R}_G(\omega)} - 1 \right| = o_p(1), \qquad (A3)$$
$$\sup_{\omega \in W} \left| \frac{L_G(\omega)}{R_G(\omega)} - 1 \right| = o_p(1). \qquad (A4)$$
Proof. 
According to the proof of (A.45), (A.48) and (A.44) in [45], we know that Equations (A2)–(A4) are satisfied. □
Lemma A2.
If Conditions (C.4)–(C.7) hold, then
$$\sup_{\omega \in W} \bar{\lambda}\{ P(\omega) \} \le 2, \qquad (A5)$$
$$\sup_{\omega \in W} \bar{\lambda}\{ \Lambda(\omega) \} = o_p(1), \qquad (A6)$$
and
$$\sup_{\omega \in W} \bar{\lambda}\{ \tilde{P}(\omega) \} = O_p(1). \qquad (A7)$$
Proof. 
By the inequalities for maximum singular value, we obtain
$$\sup_{\omega \in W} \bar{\lambda}\{ P(\omega) \} = \sup_{\omega \in W} \bar{\lambda}\Big\{ \sum_{s=1}^S \omega_s P_{(s)} \Big\} \le \max_{1 \le s \le S} \bar{\lambda}\{ P_{(s)} \} \le \max_{1 \le s \le S} \Big[ \bar{\lambda}\{ Q_{(s)} \} + \bar{\lambda}\big\{ \tilde{X}_{(s)} (\tilde{X}_{(s)}^\top \tilde{X}_{(s)})^{-1} \tilde{X}_{(s)}^\top \big\} \Big] \le 2, \qquad (A8)$$
where the last inequality holds because the matrices $Q_{(s)}$ and $\tilde{X}_{(s)} (\tilde{X}_{(s)}^\top \tilde{X}_{(s)})^{-1} \tilde{X}_{(s)}^\top$ are symmetric and idempotent.
By the definition of Λ ( s ) , we have
$$\sup_{\omega \in W} \bar{\lambda}\{ \Lambda(\omega) \} = \sup_{\omega \in W} \bar{\lambda}\Big\{ \sum_{s=1}^S \omega_s \Lambda_{(s)} \Big\} \le \max_{1 \le s \le S} \bar{\lambda}\{ \Lambda_{(s)} \} = \max_{1 \le s \le S} \bar{\lambda}\{ D_{(s)} A_{(s)} \} \le \max_{1 \le s \le S} \bar{\lambda}\{ D_{(s)} \}\, \bar{\lambda}\{ I - P_{(s)} \} \le \max_{1 \le s \le S} \bar{\lambda}\{ D_{(s)} \} \big[ 1 + \bar{\lambda}\{ P_{(s)} \} \big] \le 3 \max_{1 \le s \le S} \bar{\lambda}\{ D_{(s)} \} = \max_{1 \le s \le S} \max_{1 \le i \le n} \frac{3\, \phi_{ii(s)}}{1 - \phi_{ii(s)}} = O_p\Big( \frac{\bar{p} + \bar{q}}{n} \Big), \qquad (A9)$$
where the last equality is obtained based on Condition (C.6). Then, Equation (A6) is derived by Condition (C.4).
From (A5) and (A6), it can be shown that
$$\sup_{\omega \in W} \bar{\lambda}\{ \tilde{P}(\omega) \} \le \max_{1 \le s \le S} \bar{\lambda}\{ \tilde{P}_{(s)} \} = \max_{1 \le s \le S} \bar{\lambda}\{ P_{(s)} - \Lambda_{(s)} \} \le \max_{1 \le s \le S} \bar{\lambda}\{ P_{(s)} \} + \max_{1 \le s \le S} \bar{\lambda}\{ \Lambda_{(s)} \} \le 2 + O_p\Big( \frac{\bar{p} + \bar{q}}{n} \Big) = O_p(1). \qquad (A10)$$
The proof of Lemma A2 is completed. □
Lemma A3.
Assuming that Conditions (C.1), (C.2), and (C.5) are satisfied, we obtain
$$\| Z_G - Z_{\hat{G}_n} \|^2 = O_p(1). \qquad (A11)$$
Proof. 
This result is from Lemma 6.2 in [37] directly; we omit the proof procedure. □
Proof of Theorem 1. 
According to [44,45], Theorem 1 is valid if we can prove
$$\sup_{\omega \in W} \left| \frac{R_G(\omega)}{\tilde{R}_G(\omega)} - 1 \right| = o_p(1), \qquad (A12)$$
$$\sup_{\omega \in W} \left| \frac{\tilde{L}_G(\omega)}{\tilde{R}_G(\omega)} - 1 \right| = o_p(1), \qquad (A13)$$
$$\sup_{\omega \in W} \left| \frac{L_{\hat{G}_n}(\omega)}{R_G(\omega)} - 1 \right| = o_p(1), \qquad (A14)$$
and
$$\frac{\tilde{L}_G(\hat{\omega})}{\inf_{\omega \in W} \tilde{L}_G(\omega)} - 1 = o_p(1). \qquad (A15)$$
By Lemma A1 and Conditions (C.2)–(C.6), Equations (A12) and (A13) are satisfied. Next, we present the proofs of (A14) and (A15), which completes the proof of Theorem 1.
For (A14), because
$$\sup_{\omega \in W} \left| \frac{L_{\hat{G}_n}(\omega)}{R_G(\omega)} - 1 \right| = \sup_{\omega \in W} \left| \frac{\| \hat{\mu}_{\hat{G}_n}(\omega) - \mu \|^2}{R_G(\omega)} - 1 \right| = \sup_{\omega \in W} \left| \frac{\| \mu - \hat{\mu}_G(\omega) + \hat{\mu}_G(\omega) - \hat{\mu}_{\hat{G}_n}(\omega) \|^2}{R_G(\omega)} - 1 \right| \le \sup_{\omega \in W} \left| \frac{L_G(\omega)}{R_G(\omega)} - 1 \right| + 2 \sup_{\omega \in W} \left| \frac{\{ L_G(\omega) \}^{1/2}\, \| \hat{\mu}_G(\omega) - \hat{\mu}_{\hat{G}_n}(\omega) \|}{R_G(\omega)} \right| + \sup_{\omega \in W} \left| \frac{\| \hat{\mu}_G(\omega) - \hat{\mu}_{\hat{G}_n}(\omega) \|^2}{R_G(\omega)} \right|, \qquad (A16)$$
it is sufficient to verify that
$$\sup_{\omega \in W} \left| \frac{L_G(\omega)}{R_G(\omega)} - 1 \right| = o_p(1), \qquad (A17)$$
and
$$\sup_{\omega \in W} \left| \frac{\| \hat{\mu}_G(\omega) - \hat{\mu}_{\hat{G}_n}(\omega) \|^2}{R_G(\omega)} \right| = o_p(1). \qquad (A18)$$
Equation (A17) can be directly obtained by Lemma A1. As for (A18), by Cauchy–Schwarz inequality, we have
$$\sup_{\omega \in W} \left| \frac{\| \hat{\mu}_G(\omega) - \hat{\mu}_{\hat{G}_n}(\omega) \|^2}{R_G(\omega)} \right| = \sup_{\omega \in W} \left| \frac{\| P(\omega) Z_G - P(\omega) Z_{\hat{G}_n} \|^2}{R_G(\omega)} \right| \le \xi_G^{-1} \sup_{\omega \in W} \bar{\lambda}^2\{ P(\omega) \}\, \| Z_G - Z_{\hat{G}_n} \|^2 \le 4\, \xi_G^{-1} \| Z_G - Z_{\hat{G}_n} \|^2 = o_p(1), \qquad (A19)$$
where the last equality follows from Lemma A3 and $\xi_G \to \infty$, which is implied by Condition (C.3). Then, Equation (A14) is obtained.
A simple calculation yields
$$CV_{\hat{G}_n}(\omega) = \| Z_{\hat{G}_n} - \tilde{\mu}_{\hat{G}_n}(\omega) \|^2 = \| Z_{\hat{G}_n} - \mu + \mu - \tilde{\mu}_G(\omega) + \tilde{\mu}_G(\omega) - \tilde{\mu}_{\hat{G}_n}(\omega) \|^2 = \| Z_{\hat{G}_n} - \mu \|^2 + \tilde{L}_G(\omega) + \Phi(\omega), \qquad (A20)$$
where the term $\| Z_{\hat{G}_n} - \mu \|^2$ is unrelated to $\omega$, and
$$\Phi(\omega) = \| \tilde{\mu}_G(\omega) - \tilde{\mu}_{\hat{G}_n}(\omega) \|^2 + 2 (Z_{\hat{G}_n} - Z_G)^\top \{ \mu - \tilde{\mu}_G(\omega) \} + 2 e_G^\top \tilde{A}(\omega) \mu - 2 e_G^\top \tilde{P}(\omega) e_G + 2 (Z_{\hat{G}_n} - Z_G)^\top \{ \tilde{\mu}_G(\omega) - \tilde{\mu}_{\hat{G}_n}(\omega) \} + 2 e_G^\top \tilde{P}(\omega) (Z_G - Z_{\hat{G}_n}) + 2 \{ \mu - \tilde{\mu}_G(\omega) \}^\top \{ \tilde{\mu}_G(\omega) - \tilde{\mu}_{\hat{G}_n}(\omega) \}. \qquad (A21)$$
Considering (A13), (A15) is implied by
$$\sup_{\omega \in W} \frac{| \Phi(\omega) |}{\tilde{R}_G(\omega)} = o_p(1). \qquad (A22)$$
By the Cauchy–Schwarz inequality, Lemma A1, Lemma A3, and the arguments in [45], to establish (A22) we only need to show
$$\sup_{\omega \in W} \frac{\| \tilde{\mu}_{\hat{G}_n}(\omega) - \tilde{\mu}_G(\omega) \|^2}{\tilde{R}_G(\omega)} = o_p(1). \qquad (A23)$$
By Equation (A7) and Lemma A3, we observe
$$\sup_{\omega \in W} \frac{\| \tilde{\mu}_{\hat{G}_n}(\omega) - \tilde{\mu}_G(\omega) \|^2}{\tilde{R}_G(\omega)} = \sup_{\omega \in W} \frac{\| \tilde{P}(\omega) (Z_{\hat{G}_n} - Z_G) \|^2}{\tilde{R}_G(\omega)} \le \sup_{\omega \in W} \frac{\bar{\lambda}^2\{ \tilde{P}(\omega) \}\, \| Z_{\hat{G}_n} - Z_G \|^2}{\tilde{R}_G(\omega)} = o_p(1). \qquad (A24)$$
Thus, we can obtain (A15). This concludes the proof. □

References

  1. Engle, R.F.; Granger, C.W.J.; Rice, J.; Weiss, A. Semiparametric estimates of the relation between weather and electricity sales. J. Am. Stat. Assoc. 1986, 81, 310–320.
  2. Speckman, P. Kernel smoothing in partial linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 1988, 50, 413–436.
  3. Heckman, N.E. Spline smoothing in a partly linear model. J. R. Stat. Soc. Ser. B Stat. Methodol. 1986, 48, 244–248.
  4. Shi, J.; Lau, T. Empirical likelihood for partially linear models. J. Multivar. Anal. 2000, 72, 132–148.
  5. Härdle, W.; Liang, H.; Gao, J. Partially Linear Models; Springer Science & Business Media: Berlin, Germany, 2000.
  6. Claeskens, G.; Hjort, N.L. Model Selection and Model Averaging; Cambridge University Press: Cambridge, UK, 2008.
  7. Hansen, B.E.; Racine, J.S. Jackknife model averaging. J. Econom. 2012, 167, 38–46.
  8. Racine, J.S.; Li, Q.; Yu, D.; Zheng, L. Optimal model averaging of mixed-data kernel-weighted spline regressions. J. Bus. Econ. Stat. 2022, in press.
  9. Akaike, H. Statistical predictor identification. Ann. Inst. Statist. Math. 1970, 22, 203–217.
  10. Schwarz, G. Estimating the dimension of a model. Ann. Statist. 1978, 6, 461–464.
  11. Claeskens, G.; Hjort, N.L. The focused information criterion. J. Am. Stat. Assoc. 2003, 98, 900–916.
  12. Ni, X.; Zhang, H.; Zhang, D. Automatic model selection for partially linear models. J. Multivar. Anal. 2009, 100, 2100–2111.
  13. Raheem, S.E.; Ahmed, S.E.; Doksum, K.A. Absolute penalty and shrinkage estimation in partially linear models. Comput. Stat. Data Anal. 2012, 56, 874–891.
  14. Xie, H.; Huang, J. SCAD-penalized regression in high-dimensional partially linear models. Ann. Statist. 2009, 37, 673–696.
  15. Peng, J.; Yang, Y. On improvability of model selection by model averaging. J. Econom. 2022, 229, 246–262.
  16. Hoeting, J.A.; Madigan, D.; Raftery, A.E.; Volinsky, C.T. Bayesian model averaging: A tutorial. Statist. Sci. 1999, 14, 382–417.
  17. Hansen, B.E. Least squares model averaging. Econometrica 2007, 75, 1175–1189.
  18. Zhang, X.; Zou, G.; Carroll, R.J. Model averaging based on Kullback-Leibler distance. Stat. Sin. 2015, 25, 1583–1598.
  19. Liu, Q.; Okui, R.; Yoshimura, A. Generalized least squares model averaging. Economet. Rev. 2016, 35, 1692–1752.
  20. Gao, Y.; Zhang, X.; Wang, S.; Zou, G. Model averaging based on leave-subject-out cross-validation. J. Econom. 2016, 192, 139–151.
  21. Zhang, X.; Liu, C. Model averaging prediction by K-fold cross-validation. J. Econom. 2022, in press.
  22. Lu, X.; Su, L. Jackknife model averaging for quantile regressions. J. Econom. 2015, 188, 40–58.
  23. Zhang, X.; Wang, W. Optimal model averaging estimation for partially linear models. Stat. Sin. 2019, 29, 693–718.
  24. Zhu, R.; Wan, A.T.K.; Zhang, X.; Zou, G. A Mallows-type model averaging estimator for the varying-coefficient partially linear model. J. Am. Stat. Assoc. 2019, 114, 882–892.
  25. Xie, J.; Yan, X.; Tang, N. A model-averaging method for high-dimensional regression with missing responses at random. Stat. Sin. 2021, 31, 1005–1026.
  26. Wei, Y.; Wang, Q.; Liu, W. Model averaging for linear models with responses missing at random. Ann. Inst. Statist. Math. 2021, 73, 535–553.
  27. Zhang, X.; Chiou, J.; Ma, Y. Functional prediction through averaging estimated functional linear regression models. Biometrika 2018, 105, 945–962.
  28. Zhang, X.; Ma, Y.; Carroll, R.J. MALMEM: Model averaging in linear measurement error models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2019, 81, 763–779.
  29. Ando, T.; Li, K.C. A model-averaging approach for high-dimensional regression. J. Am. Stat. Assoc. 2014, 109, 254–265.
  30. Ando, T.; Li, K.C. A weight-relaxed model averaging approach for high-dimensional generalized linear models. Ann. Statist. 2017, 45, 2654–2679.
  31. Zeng, D.; Lin, D. Efficient estimation for the accelerated failure time model. J. Am. Stat. Assoc. 2007, 102, 1387–1396.
  32. Wang, H.J.; Wang, L. Locally weighted censored quantile regression. J. Am. Stat. Assoc. 2009, 104, 1117–1128.
  33. Hjort, N.L.; Claeskens, G. Focused information criteria and model averaging for the Cox hazard regression model. J. Am. Stat. Assoc. 2006, 101, 1449–1464.
  34. Du, J.; Zhang, Z.; Xie, T. Focused information criterion and model averaging in censored quantile regression. Metrika 2017, 80, 547–570.
  35. Sun, Z.; Sun, L.; Lu, X.; Zhu, J.; Li, Y. Frequentist model averaging estimation for the censored partial linear quantile regression model. J. Statist. Plann. Inference 2017, 189, 1–15.
  36. Yan, X.; Wang, H.; Wang, W.; Xie, J.; Ren, Y.; Wang, X. Optimal model averaging forecasting in high-dimensional survival analysis. Int. J. Forecast. 2021, 37, 1147–1155.
  37. Liang, Z.; Chen, X.; Zhou, Y. Mallows model averaging estimation for linear regression model with right censored data. Acta Math. Appl. Sin. E. 2022, 38, 5–23.
  38. Koul, H.; Susarla, V.; Ryzin, J.V. Regression analysis with randomly right-censored data. Ann. Statist. 1981, 9, 1276–1288.
  39. Xia, X. Model averaging prediction for nonparametric varying-coefficient models with B-spline smoothing. Stat. Pap. 2021, 62, 2885–2905.
  40. De Boor, C. A Practical Guide to Splines; Springer: New York, NY, USA, 2001.
  41. Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481.
  42. Hu, G.; Cheng, W.; Zeng, J. Model averaging by jackknife criterion for varying-coefficient partially linear models. Comm. Statist. Theory Methods 2020, 49, 2671–2689.
  43. Turlach, B.A.; Weingessel, A.; Moler, C. Quadprog: Functions to Solve Quadratic Programming Problems. R Package Version 1.5-8. 2019. Available online: https://CRAN.R-project.org/package=quadprog (accessed on 16 December 2022).
  44. Wei, Y.; Wang, Q. Cross-validation-based model averaging in linear models with response missing at random. Stat. Probab. Lett. 2021, 171, 108990.
  45. Zhang, X.; Wan, A.T.K.; Zou, G. Model averaging by jackknife criterion in models with dependent data. J. Econom. 2013, 174, 82–94.
  46. Wan, A.T.; Zhang, X.; Zou, G. Least squares model averaging by Mallows criterion. J. Econom. 2010, 156, 277–283.
  47. Fan, J.; Ma, Y.; Dai, W. Nonparametric independence screening in sparse ultra-high-dimensional varying coefficient models. J. Am. Stat. Assoc. 2014, 109, 1270–1284.
  48. Bates, D.M.; Venables, W.N. Splines: Regression Spline Functions and Classes. R Package Version 3.6-1. 2019. Available online: https://CRAN.R-project.org/package=splines (accessed on 15 December 2022).
  49. Therneau, T.M.; Lumley, T.; Elizabeth, A.; Cynthia, C. Survival: Survival Analysis. R Package Version 3.4-0. 2022. Available online: https://CRAN.R-project.org/package=survival (accessed on 15 December 2022).
  50. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395.
  51. Shows, J.H.; Lu, W.; Zhang, H.H. Sparse estimation and inference for censored median regression. J. Statist. Plann. Inference 2010, 140, 1903–1917.
  52. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263.
  53. Rosenwald, A.; Wright, G.; Wiestner, A.; Chan, W.C.; Connors, J.M.; Campo, E.; Gascoyne, R.D.; Grogan, T.M.; Muller-Hermelink, H.K.; Smeland, E.B.; et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell 2003, 3, 185–197.
  54. Ma, S.; Du, P. Variable selection in partly linear regression model with diverging dimensions for right censored data. Stat. Sin. 2012, 22, 1003–1020.
Figure 1. The curves of the mean of risk with the number of knots over 500 replications.
Figure 2. Risk comparisons for Design 1 when α = 2 and the censoring rate is about 20%.
Figure 3. Risk comparisons for Design 1 when α = 0.5 and the censoring rate is about 20%.
Figure 4. Risk comparisons for Design 1 when α = 2 and the censoring rate is about 40%.
Figure 5. Risk comparisons for Design 1 when α = 0.5 and the censoring rate is about 40%.
Figure 6. Risk comparisons for Design 2 when α = 2 and the censoring rate is about 20%.
Figure 7. Risk comparisons for Design 2 when α = 0.5 and the censoring rate is about 20%.
Figure 8. Risk comparisons for Design 2 when α = 2 and the censoring rate is about 40%.
Figure 9. Risk comparisons for Design 2 when α = 0.5 and the censoring rate is about 40%.
Figure 10. Boxplots for MSPEs of five methods for the MCL data.
Table 1. The basic notations used in this paper.
  T_i: the survival time of the ith subject
  Y_i: the response variable, a transformation of T_i
  X_i: the covariate vector of the ith subject
  δ_i: the censoring indicator of the ith subject
  C_i: the last follow-up time of the ith subject
  Z_i: the observed time, equal to min(Y_i, C_i)
  G(·): the cumulative distribution function of C_i
  Z_G: the n × 1 synthetic response vector
  μ: the n × 1 conditional mean vector of the response
  B_(s): the n × k_n B-spline basis matrix for the sth model
  β_(s): the p_s × 1 linear regression coefficient vector for the sth model
  α_(s): the k_n × 1 spline coefficient vector for the sth model
  Ĝ_n(·): the Kaplan–Meier estimator of G(·)
  β̂_{G,(s)}: the estimator of β_(s) with G(·)
  α̂_{G,(s)}: the estimator of α_(s) with G(·)
  μ̂_{G,(s)}: the estimator of μ for the sth model with G(·)
  μ̂_G(ω): the model-averaging estimator of μ with G(·)
  μ̃_{G,(s)} or μ̃_{Ĝ_n,(s)}: the sth jackknife estimator of μ with G(·) or Ĝ_n(·)
  μ̃_G(ω) or μ̃_{Ĝ_n}(ω): the jackknife model-averaging estimator of μ with G(·) or Ĝ_n(·)
Table 2. The mean and median of RMSPE across 1000 repetitions.

  n0            BIC     SAIC    SBIC    CPLMA
  140  mean     1.005   0.987   0.983   0.979
       median   1.014   0.992   0.995   0.987
  160  mean     1.005   0.987   0.983   0.982
       median   1.006   0.986   0.988   0.984
  180  mean     1.011   0.991   0.991   0.986
       median   1.011   0.990   0.990   0.981
  200  mean     1.014   0.993   0.995   0.989
       median   1.012   0.984   0.990   0.976
  220  mean     1.012   0.995   0.997   0.994
       median   1.020   0.994   1.003   0.993
  240  mean     1.008   0.995   0.998   0.993
       median   1.017   0.996   0.999   0.988
Table 3. Diebold–Mariano test results for the differences in MSPE.

  n0              AIC/BIC  AIC/SAIC  AIC/SBIC  AIC/CPLMA  BIC/SAIC  BIC/SBIC  BIC/CPLMA  SAIC/SBIC  SAIC/CPLMA  SBIC/CPLMA
  140  DM         −3.013   15.130    12.335    14.858     16.743    25.341    16.633     4.973      6.361       2.834
       p-value    0.003    0.000     0.000     0.000      0.000     0.000     0.000      0.000      0.000       0.005
  160  DM         −3.014   16.607    12.490    14.862     17.196    30.331    18.474     4.995      5.538       0.942
       p-value    0.003    0.000     0.000     0.000      0.000     0.000     0.000      0.000      0.000       0.347
  180  DM         −8.082   12.355    7.874     11.238     22.554    34.561    21.914     −0.348     4.439       5.679
       p-value    0.000    0.000     0.000     0.000      0.000     0.000     0.000      0.728      0.000       0.000
  200  DM         −11.473  12.320    4.744     11.393     23.962    32.286    22.550     −3.721     5.288       8.690
       p-value    0.000    0.000     0.000     0.000      0.000     0.000     0.000      0.000      0.000       0.000
  220  DM         −9.509   8.500     2.308     5.587      19.004    22.316    17.152     −4.011     1.085       5.059
       p-value    0.000    0.000     0.021     0.000      0.000     0.000     0.000      0.000      0.278       0.000
  240  DM         −5.484   7.332     1.441     5.848      12.901    15.427    12.998     −4.175     2.110       6.561
       p-value    0.000    0.000     0.150     0.000      0.000     0.000     0.000      0.000      0.035       0.000
Table 4. The mean and median of RMSPE across 1000 repetitions.

  n0            BIC     SAIC    SBIC    CPLMA
  55   mean     1.011   0.982   0.988   0.923
       median   0.951   0.965   0.937   0.918
  65   mean     0.992   0.976   0.987   0.947
       median   0.982   0.970   0.973   0.939
