Article

Variable Selection for Generalized Linear Models with Interval-Censored Failure Time Data

1 Center for Applied Statistical Research, College of Mathematics, Jilin University, Changchun 130012, China
2 National Applied Mathematical Center (Jilin), Changchun 130012, China
3 School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
4 Department of Statistics, University of Missouri, Columbia, MO 65211, USA
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(5), 763; https://doi.org/10.3390/math10050763
Submission received: 18 January 2022 / Revised: 22 February 2022 / Accepted: 24 February 2022 / Published: 27 February 2022
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract

Variable selection is often needed in many fields and has been discussed by many authors in various situations. This is especially the case under linear models and when one observes complete data. Among others, one common situation where variable selection is required is to identify important risk factors from a large number of covariates. In this paper, we consider the problem when one observes interval-censored failure time data arising from generalized linear models, for which there does not seem to exist an established method. To address this, we propose a penalized least squares method with the use of an unbiased transformation, and the oracle property of the method is established along with the asymptotic normality of the resulting estimators of the regression parameters. Simulation studies were conducted and demonstrated that the proposed method performs well in practical situations. In addition, the method was applied to a motivating example concerning children's mortality data from Nigeria.

1. Introduction

Variable selection is an important field in statistics, and there is a large literature on it, especially in the context of linear models with complete data. Examples include stepwise regression, ridge regression, Bayesian variable selection, the least absolute shrinkage and selection operator (LASSO), model averaging, the smoothly clipped absolute deviation (SCAD), the elastic net, the adaptive LASSO (ALASSO), the minimax concave penalty (MCP), the seamless-$L_0$ (SELO) and the broken adaptive ridge (BAR) (Goldberger and Jochems, 1961 [1]; Hoerl and Kennard, 1970 [2]; Mitchell and Beauchamp, 1988 [3]; Tibshirani, 1996 [4]; Raftery et al., 1997 [5]; Zou and Hastie, 2005 [6]; Zou, 2006 [7]; Fan and Li, 2001 [8]; Zhang, 2010 [9]; Dicker et al., 2013 [10]; Liu and Li, 2016 [11]; Dai et al., 2018 [12]; Zheng et al., 2021 [13]). Among others, one type of method that has recently attracted a great deal of attention is the penalized method, for which various penalty functions have been proposed. These include the LASSO by Tibshirani (1996) [4], the SCAD by Fan and Li (2001) [8], the elastic net by Zou and Hastie (2005) [6], the ALASSO by Zou (2006) [7], the MCP by Zhang (2010) [9], the SELO by Dicker et al. (2013) [10], and the BAR by Liu and Li (2016) [11].
Variable selection has also been investigated by many authors for incomplete data such as right-censored and interval-censored failure time data (Cai et al., 2005 [14]; Fan and Li, 2002 [15]; Tibshirani, 1997 [16]; Zhang and Lu, 2007 [17]; Khan et al., 2018 [18]; Zhao et al., 2020 [19]; Li et al., 2019 [20]; Du and Sun, 2021 [21]; Ali et al., 2021 [22]). By interval-censored data, we mean that the failure time of interest is known or observed only to belong to an interval rather than being observed exactly. Such data typically arise in periodic follow-up studies such as clinical trials, and they include right-censored failure time data as a special case. Notably, the analysis of interval-censored data is much more challenging than that of right-censored data. In the case of the well-known Cox model, for example, the classical partial likelihood method for right-censored data is no longer available under interval censoring, because one has to estimate not only the regression parameters of interest but also the nuisance parameters simultaneously. Recently, many authors have discussed the analysis of interval-censored data in various situations. For example, Zhao et al. (2015) [23], Wang et al. (2016) [24] and Li et al. (2018) [25] studied inference procedures for the Cox model, the additive hazards model, and the linear transformation model, respectively. Sun (2006) [26] provided a relatively complete review of the literature on interval-censored failure time data analysis.
As mentioned above, several authors have discussed variable selection for interval-censored failure time data ([19,20]). However, the existing methods cannot be applied directly to linear or generalized linear models. In the variable selection procedure proposed below, we employ the unbiased transformation approach, whose main idea is to transform the two variables representing the interval-censored observation into a new, single variable that has the same conditional expectation. One of the early applications of this type of approach was given by Buckley and James (1979) [27] for right-censored data. Among others, one advantage of the proposed method over the existing methods is that it can be implemented relatively easily, since one can make use of existing variable selection programs for complete data with simple modifications.
Deng (2004) [28] and Deng et al. (2012) [29] discussed the use of the unbiased transformation approach for the analysis of interval-censored data. In particular, the latter considered a situation similar to the one discussed here, although not variable selection, under the assumption that the joint density function of the two variables representing the observed data is known. It is easy to see that this may not be true in many applications. To address this, we adopt the kernel estimation approach [30,31] to estimate the needed density function and develop a unified approach to variable selection with different penalty functions for generalized linear models based on interval-censored data.
The rest of the article is organized as follows. In Section 2, we first introduce the notation and assumptions used throughout the paper and present the proposed variable selection procedure. For its implementation, a coordinate descent algorithm is developed in Section 3. In Section 4, the asymptotic properties of the proposed method under the BAR penalty are established, and Section 5 gives some numerical results obtained from a simulation study, which suggest that the method works well in practical situations. In Section 6, the method is applied to the children's mortality data from Nigeria that motivated this study, and Section 7 contains some discussion and concluding remarks.

2. Unbiased Transformation Variable Selection Procedure

Consider a failure time study that consists of $n$ independent subjects, and for subject $i$, let $T_i$ denote the failure time of interest and $X_i$ a $p$-dimensional vector of covariates. Suppose that for each $T_i$, two observations are available at the observation times $U_i$ and $V_i$, which divide the axis $(0, \infty)$ into the three parts $(0, U_i]$, $(U_i, V_i]$ and $(V_i, \infty)$, and we know which part $T_i$ falls in. Thus, the observed data on subject $i$ have the form $O_i = \{U_i, V_i, \delta_{1i}, \delta_{2i}, X_i\}$, where $\delta_{1i} = I(T_i \le U_i)$ and $\delta_{2i} = I(U_i < T_i \le V_i)$ with $I(\cdot)$ being the indicator function,

$$I_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A, \end{cases}$$

$i = 1, \ldots, n$. In the following, we will assume that $T_i$ is independent of $U_i$ and $V_i$ given $X_i$.
To describe the covariate effects, we will assume that given $X_i$, $T_i$ follows the model

$$H(T_i) = \beta_0 + X_i^{\top}\beta + \epsilon_i, \qquad (1)$$

where $H(\cdot)$ is a known function, $\epsilon_i$ is a zero-mean random error with an unknown distribution, and $\beta_0$ and $\beta$ are unknown parameters. Note that for the estimation of the model above, if $T_i$ were exactly observed, a simple method would be to take $H(T_i)$ as a new response variable and transform model (1) into a linear model. For the situation here, where $T_i$ is interval-censored, this method cannot be carried out directly.
To overcome this problem, we adopt the unbiased transformation approach and first convert $H(T_i)$ into the variable

$$h_i^* = \phi_1(U_i, V_i)\,\delta_{1i} + \phi_2(U_i, V_i)\,\delta_{2i} + \phi_3(U_i, V_i)\,(1 - \delta_{1i} - \delta_{2i}) + H(0),$$

where $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$ and $\phi_3(\cdot,\cdot)$ are some continuous functions with finite continuous partial derivatives that do not depend on the distribution of $T_i$. Let $g(\cdot,\cdot)$ denote the joint density function of $U_i$ and $V_i$, and assume that $\phi_1(u,v)$, $\phi_2(u,v)$ and $\phi_3(u,v)$ satisfy the following conditions:

$$\int_{0}^{\infty}\!\!\int_{0}^{v} \phi_1(u,v)\, g(u,v)\, du\, dv = 0, \qquad \int_{y}^{\infty} \{\phi_2(y,v) - \phi_1(y,v)\}\, g(y,v)\, dv + \int_{0}^{y} \{\phi_3(u,y) - \phi_2(u,y)\}\, g(u,y)\, du = H'(y). \qquad (2)$$

Then according to Deng et al. (2012) [29], we have that

$$E\{ h_i^*(U_i, V_i, \delta_{1i}, \delta_{2i}) \} = E\{ H(T_i) \}$$

for $i = 1, \ldots, n$.
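As a quick numerical sanity check of this identity, one can verify that both sides agree for a concrete case. The sketch below (in Python) uses choice (I) of the $\phi$ functions introduced later in this section, $\phi_3(u,v) = H'(v)/g_v(v)$, with $H(t) = \log(1+t)$ and $T \sim \mathrm{Exp}(1)$; these distributional choices are our own illustration, not from the paper. After taking expectations, the marginal density $g_v$ cancels, so both sides reduce to one-dimensional integrals.

import numpy as np

# Check E(h*) = E(H(T)) for choice (I): phi_3(u, v) = H'(v)/g_v(v).
# Taking expectations, g_v cancels, so E(h*) = int_0^inf P(T > v) H'(v) dv,
# which by integration by parts equals E(H(T)) - H(0), and H(0) = log(1) = 0.
v = np.linspace(0.0, 60.0, 600_000)
dv = v[1] - v[0]
surv = np.exp(-v)                          # P(T > v) for T ~ Exp(1)
lhs = np.sum(surv / (1.0 + v)) * dv        # E(h*), with H'(v) = 1/(1+v)
rhs = np.sum(np.log(1.0 + v) * surv) * dv  # E(H(T)) over the Exp(1) density
print(lhs, rhs)                            # both approximately 0.5963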
Under Equation (2), one can see that $\phi_1(U_i,V_i)$, $\phi_2(U_i,V_i)$ and $\phi_3(U_i,V_i)$ depend not only on $U_i$ and $V_i$ but also on $g(U_i,V_i)$. More specifically, one can rewrite $h_i^*$ as

$$h_i^*(U_i, V_i, g(U_i,V_i), \delta_{1i}, \delta_{2i}) = \phi_1(U_i, V_i, g(U_i,V_i))\,\delta_{1i} + \phi_2(U_i, V_i, g(U_i,V_i))\,\delta_{2i} + \phi_3(U_i, V_i, g(U_i,V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0).$$
Thus, for estimating model (1) or $\beta$, if $g(U_i,V_i)$ were known, it would be natural to apply the least squares method and minimize the mean squared residual after the unbiased transformation,

$$n^{-1} \sum_{i=1}^{n} \big( h_i^* - \beta_0 - X_i^{\top}\beta \big)^2. \qquad (3)$$

Of course, in practice, $g(U_i,V_i)$ is unknown, and for this reason, we propose to first estimate it by a kernel density estimator $\hat g(U_i,V_i)$ and then consider

$$\hat h_i^* = \phi_1(U_i, V_i, \hat g(U_i,V_i))\,\delta_{1i} + \phi_2(U_i, V_i, \hat g(U_i,V_i))\,\delta_{2i} + \phi_3(U_i, V_i, \hat g(U_i,V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0).$$
Note that $\hat h_i^*$ given above involves the estimation of the two-dimensional function $g$; according to Section 5 of Deng (2004) [28], one can equivalently replace it by

$$\hat h_i^* = \phi_1(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,\delta_{1i} + \phi_2(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,\delta_{2i} + \phi_3(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0),$$

where $\hat g_u(U_i)$ and $\hat g_v(V_i)$ denote the kernel estimators of the marginal density functions $g_u(U_i)$ and $g_v(V_i)$. In Lemma A1 in Appendix A, we show that $\hat h_i^*$ converges to $h_i^*$ in probability. Since $U_i$ and $V_i$ are positive variables, we adopt the log-transformation technique and employ the log kernel density estimators
$$\hat g_u(u) = (nh)^{-1} \sum_{i=1}^{n} u^{-1} K\big( (\log(u) - \log(U_i))/h \big)$$

and

$$\hat g_v(v) = (nh)^{-1} \sum_{i=1}^{n} v^{-1} K\big( (\log(v) - \log(V_i))/h \big)$$
(Parzen, 1962 [30]). Then, by following Deng (2004) [28], one can estimate $\beta$ by minimizing the mean squared residual after kernel estimation and unbiased transformation,

$$n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2. \qquad (4)$$
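For concreteness, the following is a minimal Python sketch of the log-kernel density estimator above, using the Epanechnikov kernel adopted later in Section 5. The function names are our own, and the code is illustrative rather than the authors' implementation.

import numpy as np

def epanechnikov(t):
    """Quadratic kernel K(t) = 3/4 (1 - t^2) on |t| <= 1 (the kernel of Section 5)."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def log_kernel_density(u, obs, h):
    """Log-kernel estimate g_hat(u) = (nh)^{-1} sum_i u^{-1} K((log u - log U_i)/h).

    u   : positive points at which to evaluate the density (array)
    obs : observed positive times U_1, ..., U_n (array)
    h   : bandwidth
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    n = len(obs)
    # One row per evaluation point, one column per observation.
    t = (np.log(u)[:, None] - np.log(obs)[None, :]) / h
    return epanechnikov(t).sum(axis=1) / (n * h * u)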
Now we consider variable selection, or the selection of important covariates. For this, and motivated by (4), we propose the penalized least squares estimation method, which minimizes the mean squared residual after kernel estimation and unbiased transformation plus a penalty,

$$n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2 + \sum_{j=1}^{p} p_{a,\lambda}(|\beta_j|), \qquad (5)$$
where $p_{a,\lambda}(|\beta_j|)$ denotes a penalty function with tuning parameters $a$ and $\lambda$ (some comments on these are given below). In the following, several commonly used penalty functions are considered, including the LASSO penalty $p_\lambda(|\beta_j|) = \lambda\,|\beta_j|$ proposed by Tibshirani (1996) [4] and the SCAD penalty
$$p_\lambda(\beta_j; a) = \begin{cases} \lambda\,|\beta_j|, & \text{if } |\beta_j| \le \lambda, \\[4pt] -\dfrac{\beta_j^2 - 2a\lambda|\beta_j| + \lambda^2}{2(a-1)}, & \text{if } \lambda < |\beta_j| \le a\lambda, \\[4pt] \dfrac{(a+1)\lambda^2}{2}, & \text{if } |\beta_j| > a\lambda, \end{cases}$$
with $a > 2$ by Fan and Li (2001) [8]; for $a$ in the SCAD penalty, we set $a = 3.7$ as suggested by Fan and Li (2001) [8]. Furthermore, we investigate the use of the MCP penalty

$$p_\lambda(\beta_j; a) = \lambda \int_{0}^{|\beta_j|} \frac{(a\lambda - x)_+}{a\lambda}\, dx$$
with $a > 1$ given in Zhang (2010) [9], and the BAR penalty $p_\lambda(\beta_j) = \lambda\,\beta_j^2/\tilde\beta_j^2$ discussed in Zhao et al. (2020) [19], where $\tilde\beta_j$ ($j = 1, \ldots, p$) denotes a nonzero "good" estimator of $\beta_j$.
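The penalty functions above can be coded directly. The following is a hedged Python sketch; the function names and the default MCP parameter a = 3 are our own choices (the paper fixes a = 3.7 only for SCAD), and the closed forms follow from evaluating the MCP integral piecewise.

import numpy as np

def lasso_pen(b, lam):
    # LASSO: p(b) = lam * |b|
    return lam * np.abs(b)

def scad_pen(b, lam, a=3.7):
    # SCAD (Fan and Li, 2001) with the suggested a = 3.7
    ab = np.abs(b)
    mid = -(ab ** 2 - 2 * a * lam * ab + lam ** 2) / (2 * (a - 1))
    return np.where(ab <= lam, lam * ab,
                    np.where(ab <= a * lam, mid, (a + 1) * lam ** 2 / 2))

def mcp_pen(b, lam, a=3.0):
    # MCP (Zhang, 2010): lam * int_0^{|b|} (a*lam - x)_+ / (a*lam) dx, a > 1
    ab = np.abs(b)
    return np.where(ab <= a * lam, lam * ab - ab ** 2 / (2 * a),
                    a * lam ** 2 / 2)

def bar_pen(b, lam, b_tilde):
    # BAR: lam * b^2 / b_tilde^2 for a nonzero pilot estimate b_tilde
    return lam * b ** 2 / b_tilde ** 2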
Note that for the application of the method described above, one needs to choose the functions $\phi_1$, $\phi_2$, and $\phi_3$ satisfying Equation (2). Many functions can be used, and for simplicity, we suggest employing (I) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = 0$, $\phi_3(U_i,V_i) = H'(V_i)/\hat g_v(V_i)$; (II) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = H'(U_i)/\hat g_u(U_i)$, $\phi_3(U_i,V_i) = H'(U_i)/\hat g_u(U_i)$; or (III) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = H'(U_i)/\{2\hat g_u(U_i)\}$, $\phi_3(U_i,V_i) = H'(U_i)/\{2\hat g_u(U_i)\} + H'(V_i)/\{2\hat g_v(V_i)\}$. The numerical study below indicates that they give satisfactory and robust results, and more discussion on this can be found in Deng et al. (2012) [29].
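Putting the pieces together, here is a sketch of the transformed response $\hat h_i^*$ under choice (I) when $H(t) = \log(1+t)$, so that $H'(v) = 1/(1+v)$ and $H(0) = 0$. It reuses the log_kernel_density sketch given earlier; the function name is ours.

import numpy as np

def h_hat_choice1(U, V, d1, d2, h_bw):
    """Transformed responses under choice (I):
    h*_i = d3_i * H'(V_i) / g_v(V_i) + H(0), with H(t) = log(t + 1)."""
    d3 = 1.0 - d1 - d2                      # indicator of T_i > V_i
    g_v = log_kernel_density(V, V, h_bw)    # marginal density of V at each V_i
    return d3 / ((1.0 + V) * g_v)           # H'(v) = 1/(1+v); H(0) = log(1) = 0

Regressing the centered $\hat h_i^*$ on $X_i$ by penalized least squares then yields the criterion in (5).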

3. Penalized Least Squares Coordinate Descent Algorithm

Let $\hat\beta$ denote the estimator of $\beta$ obtained by minimizing the penalized criterion function in (5). In this section, we discuss the determination of $\hat\beta$ and develop a coordinate descent algorithm that updates each element $\beta_j$ of $\beta$ in turn while keeping all the other elements fixed at their current estimates.
Define

$$M(\beta_j) = n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2.$$
Then, at the $k$th iteration, $\hat\beta_j^{(k)}$ is determined by minimizing $Q(\beta_j) = M(\beta_j) + p_\lambda(|\beta_j|)$. Note that by the locally quadratic approximation idea discussed in Fan and Li (2001) [8], the derivative of the penalty can be approximated as

$$[\,p_\lambda(|\beta_j|)\,]' = p_\lambda'(|\beta_j|)\,\mathrm{sgn}(\beta_j) \approx \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\}\,\beta_j$$

when $\beta_j \ne 0$. In other words, a quadratic function can locally approximate $p_\lambda(|\beta_j|)$ at $|\hat\beta_j^{(k-1)}|$ as

$$p_\lambda(|\beta_j|) \approx p_\lambda(|\hat\beta_j^{(k-1)}|) + \tfrac{1}{2}\, \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\} \big( \beta_j^2 - (\hat\beta_j^{(k-1)})^2 \big).$$
Meanwhile, $M(\beta_j)$ can be approximated by a second-order Taylor expansion,

$$M(\beta_j) \approx M(\hat\beta_j^{(k-1)}) + M'(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)}) + \tfrac{1}{2}\, M''(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)})^2,$$

where $M'$ and $M''$ denote the first and second derivatives of $M(\cdot)$, respectively. Therefore, minimizing $Q(\beta_j)$ is equivalent to minimizing

$$M(\hat\beta_j^{(k-1)}) + M'(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)}) + \tfrac{1}{2}\, M''(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)})^2 + p_\lambda(|\hat\beta_j^{(k-1)}|) + \tfrac{1}{2}\, \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\} \big( \beta_j^2 - (\hat\beta_j^{(k-1)})^2 \big)$$
with respect to $\beta_j$. A closed-form solution is given by

$$\hat\beta_j^{(k)} = \frac{ \hat\beta_j^{(k-1)}\, M''(\hat\beta_j^{(k-1)}) - M'(\hat\beta_j^{(k-1)}) }{ M''(\hat\beta_j^{(k-1)}) + p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| }. \qquad (6)$$
Note that the resulting solution (6) and the approximation used above apply to any penalty function. The BAR penalty, however, does not require a locally quadratic approximation, since it is already a quadratic function of the coefficients. For that case, we can use the closed-form iterative solution proposed by Wu et al. (2020) [32],

$$\hat\beta_j^{(k)} = \hat\beta_j^{(k-1)} - \frac{ Q'(\hat\beta_j^{(k-1)}) }{ Q''(\hat\beta_j^{(k-1)}) }, \qquad (7)$$

where $Q'(\hat\beta_j^{(k-1)})$ and $Q''(\hat\beta_j^{(k-1)})$ are the first and second derivatives of $Q(\beta_j)$ with respect to $\beta_j$ evaluated at $\hat\beta_j^{(k-1)}$, respectively. Combining the discussion above, the algorithm can be implemented as follows:
Step 1: Set $k = 0$ and choose the initial estimate

$$\hat\beta^{(0)} = \arg\min_\beta\; n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2.$$
Step 2: Use the coordinate descent update (6) to determine $\hat\beta^{(k)}$ for the LASSO, SCAD and MCP penalties, and use update (7) for the BAR penalty.
Step 3: Repeat Step 2 until convergence or until $k$ exceeds a given large number.
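A compact Python sketch of Steps 1–3 for penalties handled by update (6) follows; dpen stands for the derivative $p_\lambda'(\cdot)$ of the chosen penalty, the closed-form derivatives of the quadratic $M$ are used, and all names (and the convention of freezing a coefficient once it reaches zero) are our own illustration choices rather than the authors' code.

import numpy as np

def coord_descent(X, y, lam, dpen, max_iter=500, tol=1e-5):
    """Coordinate descent for (5) with update (6).

    X    : n x p design matrix; y : centered transformed responses h_hat
    dpen : derivative of the penalty, called as dpen(abs_beta, lam)
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # Step 1: least squares start
    for _ in range(max_iter):                     # Step 3: iterate to tolerance
        beta_old = beta.copy()
        r = y - X @ beta                          # current residuals
        for j in range(p):                        # Step 2: update each beta_j
            if beta[j] == 0.0:                    # avoid dividing by |beta_j| = 0
                continue
            m1 = -2.0 * (X[:, j] @ r) / n         # M'(beta_j)
            m2 = 2.0 * (X[:, j] @ X[:, j]) / n    # M''(beta_j)
            new = (beta[j] * m2 - m1) / (m2 + dpen(abs(beta[j]), lam) / abs(beta[j]))
            r += X[:, j] * (beta[j] - new)        # keep residuals in sync
            beta[j] = new
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

For instance, for SCAD one would pass the derivative $p_\lambda'(b) = \lambda$ for $b \le \lambda$, $(a\lambda - b)_+/(a-1)$ for $b > \lambda$, as dpen.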
Various criteria can be used to check convergence in Step 3 above. In the simulation studies below, the proposed algorithm is declared to have converged if the maximum of the absolute differences of the estimates between two successive iterations is less than $10^{-5}$ (Sun et al., 2019 [33]). To implement the algorithm above, one also needs to choose the tuning parameter $\lambda_n$. For the results given below, we use the Bayesian information criterion (BIC) proposed by Schwarz (1978) [34], which is data-dependent and defined as

$$\mathrm{BIC}_\lambda = 2\, n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\hat\beta \Big)^2 + df_\lambda\, \log(n).$$

In the above, $\hat\beta$ denotes the final estimator of $\beta$, and $df_\lambda$ represents the total number of nonzero estimates in $\hat\beta$, which serves as the degrees of freedom. Alternatively, one could employ other methods such as K-fold cross-validation (CV) (Verweij and van Houwelingen, 1993 [35]). For each given $\lambda_n$, we compute $\mathrm{BIC}_\lambda$ as above and then choose the value of $\lambda_n$ that minimizes it. For variance estimation of the proposed estimators, we suggest the nonparametric bootstrap method, which seems to work well, as the numerical study below indicates.
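The BIC-based tuning just described amounts to a one-dimensional grid search. A sketch, reusing the coord_descent sketch above (the grid and names are illustrative):

import numpy as np

def select_lambda(X, y, lambdas, dpen):
    """Pick the lambda minimizing BIC_lambda = 2 * RSS/n + df * log(n)."""
    n = X.shape[0]
    best = (np.inf, None, None)
    for lam in lambdas:
        beta = coord_descent(X, y, lam, dpen)
        rss = np.sum((y - X @ beta) ** 2) / n
        df = np.count_nonzero(beta)
        bic = 2.0 * rss + df * np.log(n)
        best = min(best, (bic, lam, beta), key=lambda t: t[0])
    return best  # (BIC value, selected lambda, fitted coefficients)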

4. Asymptotic Properties

In this section, we establish the asymptotic properties of the variable selection procedure, or the estimator proposed above, with the BAR penalty function. For this, let $\beta_0 = (\beta_{01}, \ldots, \beta_{0p_n})^{\top}$ denote the true parameter value. Note that here we replace $p$ by $p_n$ to emphasize the dependence of $p$ on $n$, and we assume that $p_n$ can diverge to infinity but $p_n < n$. For simplicity, we write $\beta_0 = (\beta_{01}^{\top}, \beta_{02}^{\top})^{\top}$ and assume that $\beta_{01} \ne 0$ and $\beta_{02} = 0$, where $\beta_{01} \in \mathbb{R}^{q_n}$ and $\beta_{02} \in \mathbb{R}^{p_n - q_n}$. Let $\hat\beta^* = (\hat\beta_1^{*\top}, \hat\beta_2^{*\top})^{\top}$ denote the BAR estimator of $\beta$ corresponding to the same partition. Set $X_\alpha = (x_1, \ldots, x_{q_n})$, $X_\gamma = (x_{q_n+1}, \ldots, x_{p_n})$, $\Sigma_{n1} = X_\alpha^{\top} X_\alpha/n$ and $\Sigma_n = X^{\top} X/n$, where $x_j$ is the $j$th column of $X$ for $j = 1, \ldots, p_n$. Let $\hat Y_i^* = \hat h_i^* - n^{-1}\sum_{i=1}^n \hat h_i^*$ and $\hat Y^* = (\hat Y_1^*, \ldots, \hat Y_n^*)^{\top}$, and let $Y_i^* = h_i^* - n^{-1}\sum_{i=1}^n h_i^*$ and $Y^* = (Y_1^*, \ldots, Y_n^*)^{\top}$, $i = 1, \ldots, n$. For the asymptotic properties, we need the following regularity conditions.
C1. For every $t \in (0, \infty)$, $H'(t)$ exists, where $H'(t)$ is the derivative of $H(t)$, and $H(0) < \infty$.
C2. The $U_i$ and $V_i$ are positive i.i.d. random variables with uniformly continuous density functions $g_u(u)$ and $g_v(v)$, respectively.
C3. $T_i$ and $(U_i, V_i)$ are independent given $X_i$.
C4. $\mathrm{Var}(h^*) < \infty$.
C5. $C_n = n^{-1}\sum_{i=1}^n X_i X_i^{\top} \to C$ with probability 1 as $n$ tends to $\infty$, where $C$ is a positive definite matrix ([19]).
C6. $K(t)$ is uniformly continuous and of bounded variation, with $\int |K(t)|\, dt < \infty$ and $K(t) \to 0$ as $|t| \to \infty$; $\int K(t)\, dt = 1$ and $\int |x \log|x||^{1/2}\, |dK(x)| < \infty$; and $\lim_{n\to\infty} h_n = 0$ with $\lim_{n\to\infty} n h_n/\log n = \infty$.
C7. There exists a constant $E > 1$ such that $0 < 1/E < \lambda_{\min}(C_n) \le \lambda_{\max}(C_n) < E < \infty$ for every integer $n$.
C8. Let $a_{0n} = \min_{1 \le j \le q_n} |\beta_{0j}|$ and $a_{1n} = \max_{1 \le j \le q_n} |\beta_{0j}|$. As $n \to \infty$, $p_n q_n/n \to 0$, $(p_n/n)^{1/2}/a_{0n} \to 0$, $p_n/\lambda_n \to 0$, and $\lambda_n a_{1n} (q_n/n)^{1/2}/a_{0n}^2 \to 0$.
Note that Conditions C1–C5 are needed to obtain an unbiased transformation ([29]); in particular, the uniformly continuous density functions $g_u(u)$ and $g_v(v)$ are required so that the kernel estimators $\hat g_u(u)$ and $\hat g_v(v)$ converge to $g_u(u)$ and $g_v(v)$ almost surely, and Condition C6 guarantees that $\hat h_i^*$ converges to $h_i^*$ in probability ([31]). Condition C7 assumes that $C_n$ is positive definite almost surely, with eigenvalues bounded away from zero and infinity. Condition C8 gives some sufficient, but not necessary, conditions needed to prove the convergence and asymptotic properties of the BAR estimator, and the nonzero coefficients are assumed to be uniformly bounded away from zero and infinity ([12]). Define $\beta = (\alpha^{\top}, \gamma^{\top})^{\top}$, where $\alpha$ and $\gamma$ are $q_n \times 1$ and $(p_n - q_n) \times 1$ vectors, respectively. The following theorem gives the asymptotic properties.
Theorem 1.
Assume that Conditions C1–C8 given above hold and that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives. Then we have that:
(i) The fixed point of $f(\alpha) = \{X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha)\}^{-1} X_\alpha^{\top} \hat Y^*$ exists and is unique, where $D_1(\alpha) = \mathrm{diag}(\alpha_1^{-2}, \ldots, \alpha_{q_n}^{-2})$.
(ii) (Oracle property) The BAR estimator $\hat\beta^* = (\hat\beta_1^{*\top}, \hat\beta_2^{*\top})^{\top}$ exists and is unique, where $\hat\beta_2^* = 0$ and $\hat\beta_1^*$ is the unique fixed point of $f(\alpha)$.
(iii) (Asymptotic normality) $\sqrt{n}\,(\hat\beta_1^* - \beta_{01}) \to N(0, \Sigma^{(1)})$ in distribution, where $\Sigma^{(1)}$ is defined in Appendix A.

5. Simulation Study

Now we present some results obtained from a simulation study conducted to assess the finite-sample performance of the variable selection procedure presented in the previous sections. In the study, we generated the covariate vector $X_i$ from $N(0, \Sigma_X)$, where the $(l,m)$ element of the covariance matrix $\Sigma_X$ is $0.5^{|l-m|}$. Given the $X_i$'s, the true failure times were generated from model (1) with $H(T) = \log(T+1)$ and the $\epsilon_i$'s following the standard normal distribution. For the generation of the observed interval-censored data, the two observation times for each subject were taken from a homogeneous Poisson process whose inter-examination times were independently and identically distributed as an exponential distribution with mean 0.4 (Li et al., 2019 [20]).
For the application of the proposed variable selection procedure, by following Deng et al. (2012) [29], we set

$$h^* = \delta_1 \cdot 0 + \delta_2 \cdot 0 + \delta_3\, \frac{1}{(1+v)\, g_v(v)} \qquad (8)$$

for choice (I). Meanwhile, for choices (II) and (III), we set

$$h^* = \delta_1 \cdot 0 + \delta_2\, \frac{1}{(1+u)\, g_u(u)} + \delta_3\, \frac{1}{(1+u)\, g_u(u)} \qquad (9)$$

and

$$h^* = \delta_1 \cdot 0 + \delta_2\, \frac{1}{2(1+u)\, g_u(u)} + \delta_3 \left( \frac{1}{2(1+u)\, g_u(u)} + \frac{1}{2(1+v)\, g_v(v)} \right), \qquad (10)$$

where $\delta_3 = 1 - \delta_1 - \delta_2$.
For the kernel estimators $\hat g_u(u)$ and $\hat g_v(v)$, we considered the quadratic (Epanechnikov) kernel function

$$K(t) = \begin{cases} \frac{3}{4}\,(1 - t^2), & |t| \le 1, \\ 0, & \text{otherwise}, \end{cases}$$
and several different bandwidths: (a) $n^{-1/5}$, (b) $1.06\,\hat\sigma\, n^{-1/5}$, (c) $1.06 \min(\hat\sigma, \hat R/1.34)\, n^{-1/5}$ with $\hat R$ being the interquartile range (the 0.75 quantile minus the 0.25 quantile), and (d) $c_1\, n^{-1/5}$, where $\hat\sigma$ denotes the sample standard deviation and $c_1$ is selected by the CV method over 20 equally spaced values in $(0.5, 1.5)$. The simulation results for these four settings are given in the Supplementary Materials. We also considered selecting the tuning parameter $\lambda_n$ and the bandwidth choice (d) jointly, based on the BIC described above with $\lambda_n$ ranging over 50 equally spaced values from 0.001 to 0.01 and $c_1$ over 10 equally spaced values in $(0.5, 1.5)$. The results given below are based on the sample size $n = 300$ or $n = 500$ with 100 replications and $p = 10$, 30, 50, or 100.
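For reference, a Python sketch of this data-generating mechanism follows (the intercept $\beta_0$ is taken to be 0 and exactly two examination times are generated per subject, which is our reading of the setup):

import numpy as np

def simulate(n, p, beta0, rng):
    """Generate interval-censored data under model (1) with H(t) = log(t + 1)."""
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # (l, m) entry 0.5^{|l-m|}
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    eps = rng.standard_normal(n)
    T = np.exp(X @ beta0 + eps) - 1.0                    # invert H(T) = X'beta + eps
    gaps = rng.exponential(scale=0.4, size=(n, 2))       # inter-examination times
    U = gaps[:, 0]
    V = U + gaps[:, 1]                                   # two observation times
    d1 = (T <= U).astype(float)                          # T in (0, U]
    d2 = ((T > U) & (T <= V)).astype(float)              # T in (U, V]
    return X, U, V, d1, d2

# Example: n = 300, p = 10, beta0 = (0.5, 0.5, 0, ..., 0)
rng = np.random.default_rng(1)
beta0 = np.zeros(10); beta0[:2] = 0.5
X, U, V, d1, d2 = simulate(300, 10, beta0, rng)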
Table 1, Table 2 and Table 3 are based on (8), (9) and (10), respectively. They present the results on covariate selection with $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$ and $\beta_0 = (0.5, 0.7, \mathbf{0}_{p-2})$, corresponding to relatively moderate and weak signals, respectively, as well as with $\beta_0 = (0.5, 0.5, 0.5, 0.5, \mathbf{0}_{p-4})$. The results include the average number of nonzero estimates among the parameters whose true values are not zero (TP) and the average number of nonzero estimates among the parameters whose true values are zero (FP). In addition, we calculated and included in the tables the median of the mean squared errors (MMSE), given by $(\hat\beta - \beta_0)^{\top} \Sigma_X (\hat\beta - \beta_0)$ and measuring the prediction accuracy, and the standard deviation of the MSE (SD), where $\Sigma_X$ denotes the population covariance matrix of the covariates. In the tables, in addition to the BAR penalty function, we also considered the LASSO, MCP, and SCAD penalty functions, and the joint selection of the tuning parameter $\lambda_n$ and the bandwidth based on the BIC was used. For $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$ in Table 1, we also added backward stepwise variable selection based on the BIC.
In addition, Table 4 considers the smaller and larger sample sizes $n = 100$ and $n = 5000$ with $p = 5$ and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$. Furthermore, we conducted an extra simulation to demonstrate how the method works in the presence of noncontinuous covariates; that is, the last covariate was generated from a Bernoulli distribution with success probability 0.5. The results for this setup are presented in Table 5 with $n = 300$, $p = 10, 30, 50$, and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$. Finally, we considered a toy example in which left endpoint imputation is used (Sun, 2006) [26]; Table 6 shows the error that would be made if the data were treated as uncensored, with $n = 100$, $p = 5$ and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$.
One can see from Table 1, Table 2 and Table 3 that the proposed approach performs well with all penalty functions considered and that, in general, the method with the BAR penalty gave smaller FP values and thus more parsimonious models. Meanwhile, the results are similar for choices (I), (II), and (III). As expected, the method with the LASSO penalty gave slightly larger FP values and tended to select more noise variables than the other penalties, and all methods gave better results on both variable selection and prediction accuracy as the sample size increased. Furthermore, as expected, important covariates with weak effects were more difficult to identify than those with moderate effects. The stepwise variable selection gave the largest MMSE and SD. One can see from Table 4 that the results with a larger sample size have smaller MMSE and SD; for the settings in Table 4, the proposed method needed only 17 s, 15 s, 20 s, and 5 s on average for LASSO, SCAD, MCP and BAR, respectively. Table 5 shows that the proposed procedure also works in the presence of noncontinuous covariates. As shown by the comparison with left endpoint imputation in Table 6, the unbiased transformation method improves accuracy.

6. An Application

In this section, we apply the methodology proposed in the previous sections to a set of children's mortality data arising from the 2003 Nigeria Demographic and Health Survey (Kneib, 2006) [36]. The data set consists of 5730 children with their survival information and six covariates: AGE (the age of the child's mother at birth), BMI (the mother's body mass index at birth), HOSP (1 if the baby was delivered in a hospital and 0 otherwise), GENDER (1 for boys and 0 otherwise), EDU (1 if the mother received higher education and 0 otherwise), and URBAN (1 if the family lived in an urban area and 0 otherwise). Among others, one of the objectives of the study was to identify the covariates that have a significant effect on children's mortality in Nigeria.
In the study, for each subject, if death occurred within the first two months after birth, the failure time was observed exactly; otherwise, only interval-censored observations were obtained based on the interview times of the mothers. Among the 5730 children, 233 gave exact failure times, 430 gave interval-censored observations, and the others provided right-censored observations. To apply the proposed approach, as in the simulation study, we set $H(T) = \log(T+1)$ and used the same four penalty functions, together with the three choices (I), (II), and (III) of the functions $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$ and $\phi_3(\cdot,\cdot)$ given in Section 2. Note that for exact failure times, we used $H(T)$ as the response and applied no unbiased transformation.
The results on covariate selection and the estimated covariate effects are presented in Table 7, Table 8 and Table 9 for choices (I)–(III), respectively. In addition, the estimated standard errors, obtained by the nonparametric bootstrap method with 100 bootstrap samples, are included in the tables. One can see from these tables that the results seem to be robust with respect to the three choices, and all methods selected the factor AGE, suggesting that the age of the mother at birth has a significant effect on the mortality risk of the children. The factor EDU was also selected by the LASSO, SCAD, and MCP penalty functions, indicating that the mother's education level also seems to have a significant effect on the mortality risk. Xu et al. (2020) [37] analyzed the same data and likewise found, using the LASSO, ALASSO, and SCAD penalty functions, that EDU had a significant effect on the mortality risk of the children. In contrast, the results suggest that the mother's body mass index, the child's gender, the place of delivery, and the family's location did not appear to have significant effects on children's mortality.

7. Discussion and Concluding Remarks

This paper discussed variable selection for generalized linear models when only interval-censored failure time data are available, and for this problem, a new unbiased-transformation-based approach was proposed. One advantage of the unbiased transformation is that it allows one to employ the simple penalized least squares approach for estimation. The proposed approach can accommodate any penalty function, such as the LASSO, SCAD, MCP, and BAR penalties, and a coordinate descent algorithm was developed for its implementation. In addition, the asymptotic properties of the resulting estimators were established, and the simulation study indicated that the proposed methodology works well in practical situations.
There exist several directions for future research. One is that the proposed method assumed $H(0) < \infty$, which clearly may not hold in practice; one such example is the accelerated failure time (AFT) model with $H(t) = \log(t)$, for which $H(0) = -\infty$ and the proposed method would not be valid. In other words, one needs to modify the proposed unbiased transformation or develop some new methods that can be applied to the AFT model. Another direction concerns multivariate failure time data, which one may encounter in practical applications; for that scenario, Cai et al. (2005) [14] proposed a penalized marginal likelihood method for right-censored data with a large number of covariates, and it would be helpful to develop some flexible and reliable methods to handle other types of multivariate failure time data, including interval-censored data. A third direction is that it has been assumed in the previous sections that the dimension of covariates $p_n$ can diverge to infinity but is smaller than the sample size $n$. Obviously, there may be cases where $p_n$ is greater than $n$, such as in genetic or biomarker studies, so some new methods that allow for $p_n > n$ need to be developed.
As one anonymous reviewer pointed out, neutrosophic statistics (Smarandache, 1998 [38]; Smarandache, 2013 [39]) is an extension of classical statistics that applies when the data come from a complex process or an uncertain environment. The current study could be extended using neutrosophic statistics in future research.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/math10050763/s1, Table S1: Simulation results based on the bandwidth selection method (a), Table S2: Simulation results based on the bandwidth selection method (b), Table S3: Simulation results based on the bandwidth selection method (c), Table S4: Simulation results based on the bandwidth selection method (d).

Author Contributions

Formal analysis, R.L.; Investigation, S.Z.; Methodology, T.H.; Software, R.L.; Supervision, T.H. and J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Natural Science Foundation (Z210003) and the National Natural Science Foundation of China (Grant Nos. 12171328 and 11971064). Shishun Zhao's work was supported by the National Natural Science Foundation of China (NSFC) (12071176) and the Science and Technology Developing Plan of Jilin Province (Nos. 20200201258JC, 20190902012TC, 20190304129YY).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. Interested researchers can obtain the data used in the application section of this paper from the official website of the Demographic and Health Surveys (DHS) Program: https://dhsprogram.com/data/available-datasets.cfm (accessed on 19 February 2022). One can register for a download account and apply for the data.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we sketch the proof of Theorem 1. For completeness, we first introduce a few lemmas and some notation useful for the proof. Let $C_n = X^{\top} X/n$. When we consider the BAR penalty, the objective function is

$$\arg\min_{\beta}\; n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2 + \lambda_n \sum_{j=1}^{p_n} \beta_j^2/\tilde\beta_j^2. \qquad (A1)$$

From Section 2, we obtain the iterative solution

$$\beta^{(k)} = g(\beta^{(k-1)}) = \{ X^{\top} X + \lambda_n D(\beta^{(k-1)}) \}^{-1} X^{\top} \hat Y^* = \big( \alpha^*(\beta^{(k-1)})^{\top}, \gamma^*(\beta^{(k-1)})^{\top} \big)^{\top}. \qquad (A2)$$

For simplicity, we write $\alpha^*(\beta)$ and $\gamma^*(\beta)$ as $\alpha^*$ and $\gamma^*$. $C_n^{-1}$ can be partitioned as

$$C_n^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^{\top} & A_{22} \end{pmatrix},$$

where $A_{11}$ is a $q_n \times q_n$ matrix. Multiplying Equation (A2) by $(X^{\top}X)^{-1}\{X^{\top}X + \lambda_n D(\beta)\}$ gives

$$\begin{pmatrix} \alpha^* - \beta_{01} \\ \gamma^* \end{pmatrix} + \frac{\lambda_n}{n} \begin{pmatrix} A_{11} D_1(\alpha)\,\alpha^* + A_{12} D_2(\gamma)\,\gamma^* \\ A_{12}^{\top} D_1(\alpha)\,\alpha^* + A_{22} D_2(\gamma)\,\gamma^* \end{pmatrix} = (X^{\top}X)^{-1} X^{\top} \hat\epsilon^* = \hat\beta_{least} - \beta_0, \qquad (A3)$$

where $\hat\epsilon_i^* = \epsilon_i^* + ( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* )$, $\hat\beta_{least} = (X^{\top}X)^{-1} X^{\top} \hat Y^*$, $D_1(\alpha) = \mathrm{diag}(\alpha_1^{-2}, \ldots, \alpha_{q_n}^{-2})$, $D_2(\gamma) = \mathrm{diag}(\gamma_1^{-2}, \ldots, \gamma_{p_n-q_n}^{-2})$, and $\epsilon_i^* = h_i^* - n^{-1}\sum_{q=1}^n h_q^* - X_i^{\top}\beta_0$.
Lemma A1.
Assume that Conditions C2 and C6 hold and that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives. Then

$$\sup_{u,v} | \hat h^*(\hat g_u(u), \hat g_v(v)) - h^*(g_u(u), g_v(v)) | \to 0 \quad a.s. \text{ as } n \to \infty.$$

Proof.
Under Conditions C2 and C6, we have $\sup_u |\hat g_u(u) - g_u(u)| \to 0$ a.s. and $\sup_v |\hat g_v(v) - g_v(v)| \to 0$ a.s. as $n \to \infty$ according to Theorem A in Silverman (1978) [31]. Using Taylor's expansion together with the fact that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives yields

$$\sup_{u,v} | \hat h^*(\hat g_u(u), \hat g_v(v)) - h^*(g_u(u), g_v(v)) | \to 0 \quad a.s. \text{ as } n \to \infty.$$

The proof of Lemma A1 is completed. □
Lemma A2.
Let $\hat\beta_{least}$ denote the least squares estimator obtained by minimizing (4), let $\beta_0$ denote the true value of $\beta$, and suppose that Conditions C1 to C8 hold. Then we have

$$\| \hat\beta_{least} - \beta_0 \| = O_p\big( \sqrt{p_n/n}\, \big).$$

Proof.
When we consider (A1), according to C4 we get $\mathrm{Var}(\epsilon^*) < \infty$. Substituting $\hat h_i^*$ into (4) gives

$$n^{-1} \sum_{i=1}^{n} \Big( h_i^* - n^{-1}\sum_{q=1}^n h_q^* - X_i^{\top}\beta + \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* \Big)^2.$$

After some simple algebraic manipulations, we get

$$\Big( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* + \epsilon_i^* \Big)^2 \le 2 \Big( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* \Big)^2 + 2\,\epsilon_i^{*2}.$$

Under Conditions C1 to C8, we have $2 ( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* )^2 = o_p(1)$. According to Lemma A1, $nE( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* + \epsilon_i^* )^2 = nE(\epsilon_i^{*2}) + n\,o_p(1)$. Then

$$nE\| \hat\beta_{least} - \beta_0 \|^2 = nE\big( \| (X^{\top}X)^{-1} X^{\top} (\hat Y^* - Y^* + \epsilon^*) \|^2 \big) = \{ E(\epsilon_i^{*2}) + o_p(1) \} \cdot \mathrm{trace}(C_n^{-1}) = O_p(p_n),$$

so that $\| \hat\beta_{least} - \beta_0 \|^2 = O_p(p_n/n)$. □
Lemma A3.
Suppose that Conditions C2, C4, and C6 hold. Then, for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\big( | \mathrm{Var}(\hat Y^*) - \mathrm{Var}(Y^*) | > \epsilon \big) = 0.$$

Proof.
According to C4, we know $\mathrm{Var}(Y^*) = E(\epsilon^{*\top}\epsilon^*) < \infty$. According to Lemma A1, $\hat h_i^*$ converges to $h_i^*$ in probability for $i = 1, \ldots, n$. Then

$$\mathrm{Var}(\hat Y^*) = E\{ (\hat Y^* - Y^* + \epsilon^*)^{\top} (\hat Y^* - Y^* + \epsilon^*) \} = E\{ \epsilon^{*\top}\epsilon^* + 2\,\epsilon^{*\top}(\hat Y^* - Y^*) + \| \hat Y^* - Y^* \|^2 \} = E(\epsilon^{*\top}\epsilon^*) + o_p(1).$$

So Lemma A3 is proved. □
Proof of Theorem 1 (Oracle Property).
For the zero components, according to conclusion (i) of Lemma 1 in [12], let $\delta_n$ be a sequence of positive real numbers satisfying $\delta_n \to \infty$ and $\delta_n^2 p_n/\lambda_n \to 0$. Define $H_n = \{ \beta \in \mathbb{R}^{p_n} : \| \beta - \beta_0 \| \le \delta_n \sqrt{p_n/n}\, \}$ and $H_{n1} = \{ \alpha \in \mathbb{R}^{q_n} : \| \alpha - \beta_{01} \| \le \delta_n \sqrt{p_n/n}\, \}$. Then, with probability tending to 1, we have

$$\| \gamma^* \| < \| \gamma \|\, \delta_n \sqrt{p_n/n} \to 0.$$

For the nonzero components, according to Lemma 2 in [12], $f(\alpha)$ is a contraction mapping from $H_{n1}$ to itself, where

$$f(\alpha) = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} \hat Y^*.$$

Let $\hat\alpha$ be the unique fixed point of $f(\alpha)$, defined by $\hat\alpha = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\hat\alpha) \}^{-1} X_\alpha^{\top} \hat Y^*$.
Hence, to prove the consistency of the nonzero part in Theorem 1, it is sufficient to show that

$$\Pr\Big( \lim_{k\to\infty} \| \hat\alpha^{(k)} - \hat\alpha \| = 0 \Big) \to 1. \qquad (A4)$$

Define $\gamma^* = 0$ if $\gamma = 0$. It is easy to see from (A3) that, for any $\alpha \in H_{n1}$,

$$\lim_{\gamma\to 0} \gamma^*(\alpha, \gamma) = 0. \qquad (A5)$$

Combining (A5) with the fact that

$$\begin{pmatrix} X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) & X_\alpha^{\top} X_\gamma \\ X_\gamma^{\top} X_\alpha & X_\gamma^{\top} X_\gamma + \lambda_n D_2(\gamma) \end{pmatrix} \begin{pmatrix} \alpha^* \\ \gamma^* \end{pmatrix} = \begin{pmatrix} X_\alpha^{\top} \hat Y^* \\ X_\gamma^{\top} \hat Y^* \end{pmatrix},$$

we find that, for any $\alpha \in H_{n1}$,

$$\lim_{\gamma\to 0} \alpha^*(\alpha, \gamma) = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} \hat Y^* = f(\alpha). \qquad (A6)$$

According to conclusion (b) in Lemma 1 of Dai et al. (2018) [12], $g$ in (A2) is a mapping from $H_n$ to itself. This, together with (A3) and (A6), implies that, as $k \to \infty$,

$$\eta_k \equiv \sup_{\alpha \in H_{n1}} \big\| f(\alpha) - \alpha^*\big( \alpha, \hat\gamma^{(k)} \big) \big\| \to 0 \qquad (A7)$$

with probability tending to 1. Note that

$$\| \hat\alpha^{(k+1)} - \hat\alpha \| = \| \alpha^*(\hat\beta^{(k)}) - \hat\alpha \| \le \| \alpha^*(\hat\beta^{(k)}) - f(\hat\alpha^{(k)}) \| + \| f(\hat\alpha^{(k)}) - \hat\alpha \| \le \eta_k + \frac{1}{E}\, \| \hat\alpha^{(k)} - \hat\alpha \|,$$

where the last step follows from $\| f(\hat\alpha^{(k)}) - \hat\alpha \| = \| f(\hat\alpha^{(k)}) - f(\hat\alpha) \| \le \| \hat\alpha^{(k)} - \hat\alpha \|/E$. Let $a_k = \| \hat\alpha^{(k)} - \hat\alpha \|$ for every integer $k \ge 0$. By (A7), for any $\epsilon > 0$ there exists a positive integer $N$ such that $\eta_k < \epsilon$ for every integer $k > N$. Thus

$$a_{k+1} \le \frac{a_k}{E} + \eta_k \le \frac{a_{k-1}}{E^2} + \frac{\eta_{k-1}}{E} + \eta_k \le \cdots \le \frac{a_1}{E^k} + \frac{\eta_1}{E^{k-1}} + \cdots + \frac{\eta_N}{E^{k-N}} + \frac{\eta_{N+1}}{E^{k-N-1}} + \cdots + \frac{\eta_{k-1}}{E} + \eta_k \le \frac{a_1 + \eta_1 + \cdots + \eta_N}{E^{k-N}} + \frac{1 - (1/E)^{k-N}}{1 - 1/E}\,\epsilon$$

with probability 1 as $n$ tends to $\infty$. The first term on the right-hand side tends to 0 as $k \to \infty$, and the second term can be made arbitrarily small since $\epsilon$ is arbitrary. This proves (A4). Therefore, $\lim_{k\to\infty} \beta^{(k)} = (\hat\alpha^{\top}, 0^{\top})^{\top}$. □
Proof of Theorem 1 (Asymptotic Normality).
We next prove the asymptotic normality of the nonzero components of the BAR estimator. Based on (A6), we have

$$\sqrt{n}\,( \hat\alpha^* - \alpha ) = \Pi_1 + \Pi_2,$$

where

$$\Pi_1 = \sqrt{n}\, \big[ \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} X_\alpha - I_{q_n} \big] \alpha,$$
$$\Pi_2 = \sqrt{n}\, \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} \big( X_\alpha^{\top} \hat Y^* - X_\alpha^{\top} X_\alpha\, \alpha \big).$$

According to the first-order resolvent expansion formula, we obtain

$$\{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} = ( X_\alpha^{\top} X_\alpha )^{-1} - \lambda_n\, ( X_\alpha^{\top} X_\alpha )^{-1} D_1(\alpha) \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1}.$$

This yields

$$\Pi_1 = -\frac{\lambda_n}{\sqrt{n}} \Big( \frac{X_\alpha^{\top} X_\alpha}{n} \Big)^{-1} D_1(\alpha) \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} X_\alpha\, \alpha.$$

Under Conditions C7 and C8, we get

$$\| \Pi_1 \| = O_p\big( \lambda_n \sqrt{q_n/n}\, \big) \to 0$$

and

$$\Pi_2 = \sqrt{n}\, \big\{ n^{-1} X_\alpha^{\top} X_\alpha + o_p( n^{-1/2} ) \big\}^{-1} \big( n^{-1} X_\alpha^{\top} \hat Y^* - n^{-1} X_\alpha^{\top} X_\alpha\, \alpha \big).$$

Therefore, we get

$$\sqrt{n}\,( \hat\alpha^* - \alpha ) = \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}\, n^{-1/2} \big( X_\alpha^{\top} \hat Y^* - X_\alpha^{\top} X_\alpha\, \alpha \big) + o_p(1).$$

According to Lemma A3, $\mathrm{Var}(\hat Y^*)$ converges to $\mathrm{Var}(Y^*)$ in probability. Then we get

$$\sqrt{n}\, \Sigma^{(1)\,-1/2} ( \hat\alpha^* - \alpha ) \to N( 0, I_{q_n} )$$

in distribution, where $\Sigma^{(1)} = \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}\, \mathrm{Var}( X_\alpha^{\top} Y^* )\, \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}$. This completes the proof. □

References

1. Goldberger, A.S.; Jochems, D.B. Note on stepwise least squares. J. Am. Stat. Assoc. 1961, 56, 105–110.
2. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
3. Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1041.
4. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
5. Raftery, A.E.; Madigan, D.; Hoeting, J.A. Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 1997, 92, 179–191.
6. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
7. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
8. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
9. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
10. Dicker, L.; Huang, B.; Lin, X. Variable selection and estimation with the seamless-L0 penalty. Stat. Sin. 2013, 23, 929–962.
11. Liu, Z.; Li, G. Efficient regularized regression with L0 penalty for variable selection and network construction. Comput. Math. Methods Med. 2016, 2016, 3456153.
12. Dai, L.; Chen, K.; Sun, Z.; Liu, Z.; Li, G. Broken adaptive ridge regression and its asymptotic properties. J. Multivar. Anal. 2018, 168, 334–351.
13. Zheng, X.; Rong, Y.; Liu, L.; Cheng, W. A more accurate estimation of semiparametric logistic regression. Mathematics 2021, 9, 2376.
14. Cai, J.; Fan, J.; Li, R.; Zhou, H. Variable selection for multivariate failure time data. Biometrika 2005, 92, 303–316.
15. Fan, J.; Li, R. Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 2002, 30, 74–99.
16. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395.
17. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox's proportional hazards model. Biometrika 2007, 94, 691–703.
18. Khan, N.; Aslam, M.; Raza, S.M.; Jun, C. A new variable control chart under failure-censored reliability tests for Weibull distribution. Qual. Reliab. Eng. Int. 2018, 35, 572–581.
19. Zhao, H.; Wu, Q.; Li, G.; Sun, J. Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression. J. Am. Stat. Assoc. 2020, 115, 204–216.
20. Li, S.; Wei, Q.; Sun, J. Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease. Stat. Methods Med. Res. 2019, 29, 2151–2166.
21. Du, M.; Sun, J. Variable selection for interval-censored failure time data. Int. Stat. Rev. 2021, accepted.
22. Ali, S.; Raza, S.M.; Aslam, M.; Butt, M.M. CEV-hybrid DEWMA charts for censored data using Weibull distribution. Commun. Stat. Simul. Comput. 2021, 50, 446–461.
23. Zhao, S.; Hu, T.; Ma, L.; Wang, P.; Sun, J. Regression analysis of informative current status data with the additive hazards model. Lifetime Data Anal. 2015, 21, 241–258.
24. Wang, P.; Zhao, H.; Sun, J. Regression analysis of case K interval-censored failure time data in the presence of informative censoring. Biometrics 2016, 72, 1103–1112.
25. Li, S.; Hu, T.; Wang, P.; Sun, J. A class of semiparametric transformation models for doubly censored failure time data. Scand. J. Stat. 2018, 45, 682–698.
26. Sun, J. The Statistical Analysis of Interval-Censored Failure Time Data; Springer: New York, NY, USA, 2006.
27. Buckley, J.; James, I. Linear regression with censored data. Biometrika 1979, 66, 429–436.
28. Deng, W. Some Issues on Interval Censored Data. Ph.D. Dissertation, Fudan University, Shanghai, China, 2004.
29. Deng, W.; Tian, Y.; Lv, Q. Parametric estimator of linear model with interval-censored data. Commun. Stat. Simul. Comput. 2012, 41, 1794–1804.
30. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
31. Silverman, B.W. Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 1978, 6, 177–184.
32. Wu, Q.; Zhao, H.; Zhu, L.; Sun, J. Variable selection for high-dimensional partly linear additive Cox model with application to Alzheimer's disease. Stat. Med. 2020, 39, 3120–3134.
33. Sun, L.; Li, S.; Wang, L.; Song, X. Variable selection in semiparametric nonmixture cure model with interval-censored failure time data: An application to the prostate cancer screening study. Stat. Med. 2019, 38, 3026–3039.
34. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
35. Verweij, P.J.M.; van Houwelingen, H.C. Cross-validation in survival analysis. Stat. Med. 1993, 12, 2305–2314.
36. Kneib, T. Mixed model-based inference in geoadditive hazard regression for interval-censored survival times. Comput. Stat. Data Anal. 2006, 51, 777–792.
37. Xu, Y.; Zhao, S.; Tao, H.; Sun, J. Variable selection for generalized odds rate mixture cure models with interval-censored failure time data. Comput. Stat. Data Anal. 2020, 156, 107115.
38. Smarandache, F. Neutrosophic Probability, Set, and Logic; American Research Press: Rehoboth, NM, USA, 1998.
39. Smarandache, F. Introduction to Neutrosophic Measure, Neutrosophic Integral, and Neutrosophic Probability; Sitech-Education Publisher: Craiova, Romania, 2013.
Table 1. Simulation results based on the joint selection of the tuning parameter and bandwidth for (I).

                                n = 300                         n = 500
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)

β = (0.5, 0.5, 0, …, 0)
10    Stepwise   1.49   0.12   0.179 (0.097)    1.96   1.58   0.144 (0.060)
      LASSO      2.00   1.03   0.174 (0.071)    2.00   1.21   0.164 (0.065)
      SCAD       1.95   0.92   0.145 (0.070)    1.97   0.90   0.132 (0.063)
      MCP        1.92   0.77   0.139 (0.064)    1.95   0.76   0.131 (0.061)
      BAR        1.89   0.49   0.242 (0.090)    1.95   0.48   0.134 (0.058)
30    Stepwise   1.90   5.65   0.279 (0.115)    1.98   5.05   0.208 (0.069)
      LASSO      1.99   3.17   0.216 (0.145)    2.00   2.86   0.186 (0.056)
      SCAD       1.93   2.97   0.187 (0.085)    1.99   2.51   0.145 (0.059)
      MCP        1.90   2.68   0.187 (0.092)    1.98   2.01   0.137 (0.056)
      BAR        1.98   1.14   0.143 (0.055)    1.98   1.14   0.143 (0.054)
50    Stepwise   1.84   9.90   0.400 (0.140)    1.96   9.69   0.292 (0.083)
      LASSO      1.97   4.86   0.233 (0.063)    2.00   4.66   0.193 (0.054)
      SCAD       1.90   4.54   0.206 (0.079)    1.96   4.25   0.151 (0.063)
      MCP        1.88   4.32   0.220 (0.099)    1.95   3.78   0.155 (0.068)
      BAR        1.73   1.45   0.205 (0.079)    1.93   1.76   0.158 (0.066)
100   Stepwise   1.87   25.96  0.875 (0.277)    1.96   21.61  0.542 (0.137)
      LASSO      1.99   9.83   0.255 (0.079)    2.00   8.67   0.211 (0.064)
      SCAD       1.86   9.55   0.228 (0.103)    1.97   8.12   0.173 (0.081)
      MCP        1.87   8.85   0.299 (0.133)    1.95   7.32   0.189 (0.095)
      BAR        1.75   2.99   0.253 (0.101)    1.91   3.27   0.196 (0.082)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   1.13   0.278 (0.086)    2.00   1.25   0.225 (0.074)
      SCAD       1.94   0.85   0.225 (0.080)    1.97   0.86   0.209 (0.070)
      MCP        1.94   0.84   0.220 (0.076)    1.97   0.77   0.204 (0.067)
      BAR        1.89   0.29   0.229 (0.082)    1.95   0.45   0.214 (0.067)
30    LASSO      2.00   3.06   0.311 (0.096)    2.00   2.85   0.276 (0.069)
      SCAD       1.90   2.80   0.248 (0.099)    1.99   2.53   0.209 (0.069)
      MCP        1.91   2.69   0.258 (0.109)    1.99   2.12   0.207 (0.063)
      BAR        1.85   0.91   0.254 (0.102)    1.98   1.08   0.214 (0.065)
50    LASSO      1.98   4.78   0.337 (0.082)    2.00   4.13   0.278 (0.059)
      SCAD       1.90   4.64   0.284 (0.092)    2.00   3.55   0.214 (0.069)
      MCP        1.93   4.32   0.292 (0.115)    1.98   3.25   0.214 (0.067)
      BAR        1.82   2.07   0.373 (0.578)    1.97   1.48   0.221 (0.063)
100   LASSO      1.99   9.44   0.343 (0.092)    1.99   8.14   0.312 (0.069)
      SCAD       1.94   9.50   0.298 (0.121)    1.98   8.04   0.241 (0.079)
      MCP        1.90   8.45   0.346 (0.137)    1.99   7.19   0.249 (0.082)
      BAR        1.83   2.72   0.309 (0.113)    1.99   3.12   0.262 (0.079)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.92   1.03   0.649 (0.134)    3.97   1.10   0.588 (0.109)
      SCAD       3.60   0.81   0.591 (0.132)    3.81   0.87   0.530 (0.120)
      MCP        3.39   0.58   0.586 (0.137)    3.67   0.64   0.533 (0.125)
      BAR        3.24   0.25   0.615 (0.138)    3.61   0.37   0.552 (0.126)
30    LASSO      3.89   2.79   0.705 (0.127)    3.99   2.58   0.643 (0.112)
      SCAD       3.56   2.62   0.630 (0.139)    3.76   2.59   0.557 (0.121)
      MCP        3.38   2.27   0.614 (0.136)    3.67   2.07   0.540 (0.110)
      BAR        3.15   0.75   0.630 (0.140)    3.64   0.96   0.547 (0.109)
50    LASSO      3.90   4.49   0.754 (0.119)    3.98   4.40   0.663 (0.108)
      SCAD       3.51   4.30   0.679 (0.132)    3.70   4.38   0.578 (0.122)
      MCP        3.45   3.82   0.660 (0.144)    3.57   3.60   0.555 (0.123)
      BAR        3.08   1.19   0.673 (0.138)    3.57   1.68   0.566 (0.111)
100   LASSO      3.84   9.20   0.793 (0.139)    3.97   7.62   0.695 (0.086)
      SCAD       3.38   9.15   0.724 (0.157)    3.64   7.80   0.616 (0.094)
      MCP        3.05   8.04   0.742 (0.191)    3.52   6.50   0.600 (0.100)
      BAR        3.00   2.57   0.735 (0.172)    3.53   2.99   0.607 (0.101)
Table 2. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 300 for (II).

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      2.00   0.52   0.174 (0.047)
      SCAD       1.99   0.38   0.129 (0.048)
      MCP        1.99   0.32   0.127 (0.044)
      BAR        1.96   0.08   0.132 (0.047)
30    LASSO      2.00   1.47   0.197 (0.053)
      SCAD       1.99   1.20   0.151 (0.055)
      MCP        1.99   1.12   0.133 (0.049)
      BAR        1.92   0.31   0.150 (0.057)
50    LASSO      2.00   1.96   0.204 (0.048)
      SCAD       2.00   1.99   0.159 (0.054)
      MCP        1.96   1.57   0.140 (0.050)
      BAR        1.96   0.29   0.144 (0.048)
100   LASSO      2.00   4.01   0.215 (0.058)
      SCAD       2.00   1.99   0.159 (0.054)
      MCP        1.98   3.60   0.149 (0.057)
      BAR        1.97   0.70   0.159 (0.053)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   0.55   0.277 (0.055)
      SCAD       1.99   0.35   0.218 (0.053)
      MCP        1.99   0.28   0.213 (0.050)
      BAR        1.95   0.10   0.222 (0.055)
30    LASSO      2.00   1.40   0.304 (0.070)
      SCAD       1.98   1.07   0.232 (0.064)
      MCP        1.99   1.14   0.216 (0.062)
      BAR        1.95   0.29   0.231 (0.057)
50    LASSO      2.00   2.25   0.320 (0.061)
      SCAD       2.00   1.94   0.238 (0.062)
      MCP        2.00   1.85   0.227 (0.057)
      BAR        1.98   0.34   0.233 (0.056)
100   LASSO      2.00   3.76   0.325 (0.060)
      SCAD       1.99   3.82   0.251 (0.067)
      MCP        1.98   3.42   0.236 (0.062)
      BAR        1.94   0.68   0.249 (0.061)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.97   0.40   0.673 (0.102)
      SCAD       3.71   0.33   0.599 (0.100)
      MCP        3.58   0.23   0.594 (0.090)
      BAR        3.32   0.06   0.634 (0.106)
30    LASSO      3.99   1.43   0.731 (0.106)
      SCAD       3.77   1.20   0.645 (0.105)
      MCP        3.61   1.30   0.600 (0.107)
      BAR        3.44   0.26   0.626 (0.106)
50    LASSO      3.99   2.05   0.751 (0.092)
      SCAD       3.74   2.22   0.648 (0.103)
      MCP        3.63   1.81   0.618 (0.103)
      BAR        3.41   0.38   0.624 (0.112)
100   LASSO      3.96   3.55   0.775 (0.081)
      SCAD       3.76   3.77   0.692 (0.102)
      MCP        3.64   3.25   0.627 (0.118)
      BAR        3.38   0.55   0.644 (0.112)
Table 3. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 300 for (III).

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      2.00   0.63   0.165 (0.057)
      SCAD       1.98   0.42   0.125 (0.054)
      MCP        1.92   0.77   0.139 (0.064)
      BAR        1.95   0.14   0.128 (0.053)
30    LASSO      2.00   1.82   0.191 (0.057)
      SCAD       1.98   1.44   0.142 (0.058)
      MCP        1.90   2.68   0.187 (0.092)
      BAR        1.94   0.37   0.139 (0.055)
50    LASSO      2.00   2.83   0.200 (0.046)
      SCAD       1.99   2.31   0.153 (0.058)
      MCP        1.88   4.32   0.220 (0.099)
      BAR        1.96   0.42   0.137 (0.050)
100   LASSO      2.00   5.25   0.214 (0.055)
      SCAD       1.95   5.45   0.171 (0.071)
      MCP        1.96   1.57   0.140 (0.050)
      BAR        1.92   1.04   0.161 (0.066)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   0.78   0.261 (0.068)
      SCAD       1.98   0.53   0.204 (0.060)
      MCP        1.98   0.47   0.202 (0.056)
      BAR        1.94   0.13   0.211 (0.063)
30    LASSO      2.00   1.80   0.275 (0.067)
      SCAD       2.00   1.20   0.217 (0.064)
      MCP        2.00   1.42   0.201 (0.065)
      BAR        1.95   0.31   0.222 (0.064)
50    LASSO      2.00   2.54   0.305 (0.062)
      SCAD       1.95   2.36   0.223 (0.066)
      MCP        1.97   2.25   0.220 (0.061)
      BAR        1.96   0.58   0.225 (0.063)
100   LASSO      2.00   4.8    0.231 (0.081)
      SCAD       1.97   4.45   0.231 (0.066)
      MCP        1.97   4.70   0.158 (0.063)
      BAR        1.91   0.90   0.232 (0.070)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.96   0.58   0.618 (0.110)
      SCAD       3.74   0.40   0.553 (0.108)
      MCP        3.63   0.32   0.544 (0.104)
      BAR        3.43   0.11   0.578 (0.114)
30    LASSO      3.96   1.36   0.694 (0.106)
      SCAD       3.77   1.79   0.588 (0.104)
      MCP        3.59   1.07   0.571 (0.109)
      BAR        3.42   0.32   0.594 (0.113)
50    LASSO      3.97   2.19   0.702 (0.096)
      SCAD       3.69   2.54   0.627 (0.095)
      MCP        3.63   1.94   0.576 (0.102)
      BAR        3.39   0.58   0.600 (0.106)
100   LASSO      3.96   4.93   0.738 (0.093)
      SCAD       3.63   4.79   0.642 (0.102)
      MCP        3.57   4.70   0.612 (0.099)
      BAR        3.35   1.17   0.615 (0.117)
Table 4. Simulation results based on the joint selection of the tuning parameter and bandwidth for n = 100 or n = 5000 with p = 5.

                                n = 100                         n = 5000
β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)
5     LASSO      1.81   0.68   0.234 (0.104)    2.00   0.74   0.084 (0.065)
      SCAD       1.60   0.58   0.220 (0.103)    2.00   0.36   0.077 (0.018)
      MCP        1.55   0.50   0.218 (0.107)    2.00   0.30   0.076 (0.019)
      BAR        1.21   0.14   0.267 (0.117)    2.00   0.03   0.079 (0.021)
Table 5. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 100 in the presence of a noncontinuous covariate.

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      1.98   1.25   0.188 (0.074)
      SCAD       1.93   1.04   0.156 (0.076)
      MCP        1.90   0.94   0.155 (0.078)
      BAR        1.81   0.38   0.162 (0.085)
30    LASSO      1.99   3.42   0.229 (0.077)
      SCAD       1.93   3.07   0.199 (0.087)
      MCP        1.89   2.69   0.199 (0.093)
      BAR        1.82   1.26   0.200 (0.090)
50    LASSO      1.99   5.60   0.232 (0.074)
      SCAD       1.92   5.32   0.202 (0.086)
      MCP        1.90   4.83   0.218 (0.092)
      BAR        1.79   1.61   0.206 (0.087)
Table 6. Comparison of the proposed method to the left endpoint imputation method.

                           Proposed Method                 Left Endpoint Imputation
β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)
5     LASSO      1.81   0.68   0.234 (0.104)    1.89   0.10   0.313 (0.086)
      SCAD       1.60   0.58   0.220 (0.103)    1.58   0.06   0.305 (0.084)
      MCP        1.55   0.50   0.218 (0.107)    1.52   0.13   0.313 (0.071)
      BAR        1.21   0.14   0.268 (0.117)    0.97   0.02   0.385 (0.089)
Table 7. Analysis results of children's mortality data based on choice (I); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.234 (0.110)   0.248 (0.110)   0.248 (0.132)   0.162 (0.134)
BMI       0.000 (0.132)   0.000 (0.111)   0.000 (0.154)   0.000 (0.110)
HOSP      0.000 (0.123)   0.000 (0.130)   0.000 (0.118)   0.000 (0.094)
GENDER    0.000 (0.109)   0.000 (0.108)   0.000 (0.102)   0.000 (0.088)
EDU       0.092 (0.119)   0.115 (0.143)   0.098 (0.146)   0.000 (0.115)
URBAN     0.000 (0.126)   0.000 (0.122)   0.000 (0.101)   0.000 (0.120)
Table 8. Analysis results of children's mortality data based on choice (II); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.238 (0.109)   0.252 (0.115)   0.252 (0.105)   0.167 (0.090)
BMI       0.000 (0.108)   0.000 (0.123)   0.000 (0.118)   0.000 (0.100)
HOSP      0.000 (0.131)   0.000 (0.107)   0.000 (0.144)   0.000 (0.109)
GENDER    0.000 (0.121)   0.000 (0.119)   0.000 (0.122)   0.000 (0.119)
EDU       0.110 (0.102)   0.133 (0.161)   0.133 (0.118)   0.000 (0.105)
URBAN     0.000 (0.127)   0.000 (0.114)   0.000 (0.104)   0.000 (0.091)
Table 9. Analysis results of children's mortality data based on choice (III); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.243 (0.093)   0.257 (0.111)   0.257 (0.105)   0.162 (0.134)
BMI       0.000 (0.119)   0.000 (0.127)   0.000 (0.154)   0.000 (0.110)
HOSP      0.000 (0.145)   0.000 (0.148)   0.000 (0.118)   0.000 (0.094)
GENDER    0.000 (0.123)   0.000 (0.101)   0.000 (0.102)   0.000 (0.088)
EDU       0.128 (0.097)   0.149 (0.109)   0.149 (0.119)   0.000 (0.115)
URBAN     0.000 (0.117)   0.000 (0.118)   0.000 (0.105)   0.000 (0.120)