Article

Variable Selection for Generalized Linear Models with Interval-Censored Failure Time Data

1 Center for Applied Statistical Research, College of Mathematics, Jilin University, Changchun 130012, China
2 National Applied Mathematical Center (Jilin), Changchun 130012, China
3 School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
4 Department of Statistics, University of Missouri, Columbia, MO 65211, USA
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(5), 763; https://doi.org/10.3390/math10050763
Submission received: 18 January 2022 / Revised: 22 February 2022 / Accepted: 24 February 2022 / Published: 27 February 2022
(This article belongs to the Special Issue Computational Statistics and Data Analysis)

Abstract

Variable selection is often needed in many fields and has been discussed by many authors in various situations. This is especially the case under linear models and when one observes complete data. Among others, one common situation where variable selection is required is to identify important risk factors from a large number of covariates. In this paper, we consider the problem when one observes interval-censored failure time data arising from generalized linear models, for which there does not seem to exist an established method. To address this, we propose a penalized least squares method with the use of an unbiased transformation, and the oracle property of the method is established along with the asymptotic normality of the resulting estimators of the regression parameters. Simulation studies were conducted and demonstrated that the proposed method performs well in practical situations. In addition, the method was applied to a motivating example concerning children's mortality data from Nigeria.

1. Introduction

Variable selection is an important field in statistics, and there is a large literature on it, especially in the context of linear models with complete data. Examples include stepwise regression, ridge regression, Bayesian variable selection, the least absolute shrinkage and selection operator (LASSO), model averaging, the smoothly clipped absolute deviation (SCAD), the elastic net, the adaptive LASSO (ALASSO), the minimax concave penalty (MCP), the seamless-$L_0$ (SELO) and the broken adaptive ridge (BAR) (Goldberger and Jochems, 1961 [1]; Hoerl and Kennard, 1970 [2]; Mitchell and Beauchamp, 1988 [3]; Tibshirani, 1996 [4]; Raftery et al., 1997 [5]; Zou and Hastie, 2005 [6]; Zou, 2006 [7]; Fan and Li, 2001 [8]; Zhang, 2010 [9]; Dicker et al., 2013 [10]; Liu and Li, 2016 [11]; Dai et al., 2018 [12]; Zheng et al., 2021 [13]). Among others, one type of method that has recently attracted a great deal of attention is the penalized method, for which various penalty functions have been proposed. These include the LASSO by Tibshirani (1996) [4], the SCAD by Fan and Li (2001) [8], the elastic net by Zou and Hastie (2005) [6], the ALASSO by Zou (2006) [7], the MCP by Zhang (2010) [9], the SELO by Dicker et al. (2013) [10], and the BAR by Liu and Li (2016) [11].
Variable selection has also been investigated by many authors for incomplete data such as right-censored and interval-censored failure time data (Cai et al., 2005 [14]; Fan and Li, 2002 [15]; Tibshirani, 1997 [16]; Zhang and Lu, 2007 [17]; Khan et al., 2018 [18]; Zhao et al., 2020 [19]; Li et al., 2019 [20]; Du and Sun, 2021 [21]; Ali et al., 2021 [22]). By interval-censored data, we mean that the failure time of interest is known or observed only to belong to an interval rather than being observed exactly. Such data typically arise in periodic follow-up studies such as clinical trials, and they include right-censored failure time data as a special case. Notably, the analysis of interval-censored data is much more challenging than that of right-censored data. In the case of the well-known Cox model, for example, the classical partial likelihood method for right-censored data is no longer available under interval censoring, because one has to estimate not only the regression parameters of interest but also the nuisance parameters simultaneously. Recently, many authors have discussed the analysis of interval-censored data in various situations. For example, Zhao et al. (2015) [23], Wang et al. (2016) [24] and Li et al. (2018) [25] studied inference procedures for the Cox model, the additive hazards model, and the linear transformation model, respectively. Sun (2006) [26] provided a relatively complete review of the literature on interval-censored failure time data analysis.
As mentioned above, several authors have discussed variable selection for interval-censored failure time data ([19,20]). However, the existing methods cannot be applied directly to linear or generalized linear models. In the variable selection procedure proposed below, we employ the unbiased transformation approach, whose main idea is to transform the two variables representing the interval-censored observation into a new, single variable that has the same conditional expectation. One of the early applications of this type of approach was given by Buckley and James (1979) [27] for right-censored data. Among others, one advantage of the proposed method over the existing methods is that it can be implemented relatively easily, since one can make use of existing variable selection programs for complete data with simple modifications.
Deng (2004) [28] and Deng et al. (2012) [29] discussed the use of the unbiased transformation approach for the analysis of interval-censored data. In particular, the latter considered a situation similar to the one discussed here, although not variable selection, under the assumption that the joint density function of the two variables representing the observed data is known. It is easy to see that this may not be true in many applications. To address this, we adopt the kernel estimation approach [30,31] to estimate the needed density function and develop a unified approach to variable selection with different penalty functions for generalized linear models based on interval-censored data.
The rest of the article is organized as follows. In Section 2, we first introduce the notation and assumptions used throughout the paper and present the proposed variable selection procedure. For its implementation, a coordinate descent algorithm is developed in Section 3. In Section 4, the asymptotic properties of the proposed method under the BAR penalty are established, and Section 5 gives some numerical results obtained from a simulation study, which suggest that the method works well in practical situations. In Section 6, the method is applied to the children's mortality data from Nigeria that motivated this study, and Section 7 contains some discussion and concluding remarks.

2. Unbiased Transformation Variable Selection Procedure

Consider a failure time study that consists of $n$ independent subjects, and for subject $i$, let $T_i$ denote the failure time of interest and $X_i$ a $p$-dimensional vector of covariates. Suppose that for each $T_i$, two observations are available at the observation times $U_i$ and $V_i$, which divide the axis $(0, \infty)$ into the three parts $(0, U_i]$, $(U_i, V_i]$ and $(V_i, \infty)$, and we know which part $T_i$ falls in. Thus, the observed data on subject $i$ have the form $O_i = \{U_i, V_i, \delta_{1i}, \delta_{2i}, X_i\}$, where $\delta_{1i} = I(T_i \le U_i)$ and $\delta_{2i} = I(U_i < T_i \le V_i)$ with $I(\cdot)$ being the indicator function,

$$I_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A, \end{cases}$$

$i = 1, \ldots, n$. In the following, we will assume that $T_i$ is independent of $U_i$ and $V_i$ given $X_i$.
To describe the covariate effects, we will assume that given $X_i$, $T_i$ follows the model

$$H(T_i) = \beta_0 + X_i^{\top}\beta + \epsilon_i, \qquad (1)$$

where $H(\cdot)$ is a known function, $\epsilon_i$ is a zero-mean random error with an unknown distribution, and $\beta_0$ and $\beta$ are unknown parameters. Note that for the estimation of the model above, if $T_i$ were exactly observed, a simple method would be to take $H(T_i)$ as a new response variable and transform model (1) into a linear model. For the situation here, where $T_i$ is interval-censored, this method cannot be carried out directly.
To overcome this problem, we adopt the unbiased transformation approach and first convert $H(T_i)$ into the variable

$$h_i^* = \phi_1(U_i, V_i)\,\delta_{1i} + \phi_2(U_i, V_i)\,\delta_{2i} + \phi_3(U_i, V_i)\,(1 - \delta_{1i} - \delta_{2i}) + H(0),$$

where $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$ and $\phi_3(\cdot,\cdot)$ are some continuous functions with finite continuous partial derivatives that do not depend on the distribution of $T_i$. Let $g(\cdot,\cdot)$ denote the joint density function of $U_i$ and $V_i$, and assume that $\phi_1(u,v)$, $\phi_2(u,v)$ and $\phi_3(u,v)$ satisfy the following conditions:

$$\int_{0}^{\infty}\!\!\int_{0}^{v} \phi_1(u,v)\, g(u,v)\, du\, dv = 0, \qquad \int_{y}^{\infty} \{\phi_2(y,v) - \phi_1(y,v)\}\, g(y,v)\, dv + \int_{0}^{y} \{\phi_3(u,y) - \phi_2(u,y)\}\, g(u,y)\, du = H'(y). \qquad (2)$$

Then according to Deng et al. (2012) [29], we have that

$$E\{ h_i^*(U_i, V_i, \delta_{1i}, \delta_{2i}) \} = E\{ H(T_i) \}$$

for $i = 1, \ldots, n$.
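As a quick numerical sanity check of this identity, one can verify that both sides agree for a concrete case. The sketch below (in Python) uses choice (I) of the $\phi$ functions introduced later in this section, $\phi_3(u,v) = H'(v)/g_v(v)$, with $H(t) = \log(1+t)$ and $T \sim \mathrm{Exp}(1)$; these distributional choices are our own illustration, not from the paper. After taking expectations, the marginal density $g_v$ cancels, so both sides reduce to one-dimensional integrals.

import numpy as np

# Check E(h*) = E(H(T)) for choice (I): phi_3(u, v) = H'(v)/g_v(v).
# Taking expectations, g_v cancels, so E(h*) = int_0^inf P(T > v) H'(v) dv,
# which by integration by parts equals E(H(T)) - H(0), and H(0) = log(1) = 0.
v = np.linspace(0.0, 60.0, 600_000)
dv = v[1] - v[0]
surv = np.exp(-v)                          # P(T > v) for T ~ Exp(1)
lhs = np.sum(surv / (1.0 + v)) * dv        # E(h*), with H'(v) = 1/(1+v)
rhs = np.sum(np.log(1.0 + v) * surv) * dv  # E(H(T)) over the Exp(1) density
print(lhs, rhs)                            # both approximately 0.5963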
Under Equation (2), one can see that $\phi_1(U_i,V_i)$, $\phi_2(U_i,V_i)$ and $\phi_3(U_i,V_i)$ depend not only on $U_i$ and $V_i$ but also on $g(U_i,V_i)$. More specifically, one can rewrite $h_i^*$ as

$$h_i^*(U_i, V_i, g(U_i,V_i), \delta_{1i}, \delta_{2i}) = \phi_1(U_i, V_i, g(U_i,V_i))\,\delta_{1i} + \phi_2(U_i, V_i, g(U_i,V_i))\,\delta_{2i} + \phi_3(U_i, V_i, g(U_i,V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0).$$
Thus, for estimating model (1) or $\beta$, if $g(U_i,V_i)$ were known, it would be natural to apply the least squares method and minimize the mean squared residual after the unbiased transformation,

$$n^{-1} \sum_{i=1}^{n} \big( h_i^* - \beta_0 - X_i^{\top}\beta \big)^2. \qquad (3)$$

Of course, in practice, $g(U_i,V_i)$ is unknown, and for this reason, we propose to first estimate it by a kernel density estimator $\hat g(U_i,V_i)$ and then consider

$$\hat h_i^* = \phi_1(U_i, V_i, \hat g(U_i,V_i))\,\delta_{1i} + \phi_2(U_i, V_i, \hat g(U_i,V_i))\,\delta_{2i} + \phi_3(U_i, V_i, \hat g(U_i,V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0).$$
Note that $\hat h_i^*$ given above involves the estimation of the two-dimensional function $g$; according to Section 5 of Deng (2004) [28], one can equivalently replace it by

$$\hat h_i^* = \phi_1(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,\delta_{1i} + \phi_2(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,\delta_{2i} + \phi_3(U_i, V_i, \hat g_u(U_i), \hat g_v(V_i))\,(1 - \delta_{1i} - \delta_{2i}) + H(0),$$

where $\hat g_u(U_i)$ and $\hat g_v(V_i)$ denote the kernel estimators of the marginal density functions $g_u(U_i)$ and $g_v(V_i)$. In Lemma A1 in Appendix A, we show that $\hat h_i^*$ converges to $h_i^*$ in probability. Since $U_i$ and $V_i$ are positive variables, we adopt the log-transformation technique and employ the log kernel density estimators
$$\hat g_u(u) = (nh)^{-1} \sum_{i=1}^{n} u^{-1} K\big( (\log(u) - \log(U_i))/h \big)$$

and

$$\hat g_v(v) = (nh)^{-1} \sum_{i=1}^{n} v^{-1} K\big( (\log(v) - \log(V_i))/h \big)$$
(Parzen, 1962 [30]). Then, by following Deng (2004) [28], one can estimate $\beta$ by minimizing the mean squared residual after kernel estimation and unbiased transformation,

$$n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2. \qquad (4)$$
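For concreteness, the following is a minimal Python sketch of the log-kernel density estimator above, using the Epanechnikov kernel adopted later in Section 5. The function names are our own, and the code is illustrative rather than the authors' implementation.

import numpy as np

def epanechnikov(t):
    """Quadratic kernel K(t) = 3/4 (1 - t^2) on |t| <= 1 (the kernel of Section 5)."""
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def log_kernel_density(u, obs, h):
    """Log-kernel estimate g_hat(u) = (nh)^{-1} sum_i u^{-1} K((log u - log U_i)/h).

    u   : positive points at which to evaluate the density (array)
    obs : observed positive times U_1, ..., U_n (array)
    h   : bandwidth
    """
    u = np.atleast_1d(np.asarray(u, dtype=float))
    n = len(obs)
    # One row per evaluation point, one column per observation.
    t = (np.log(u)[:, None] - np.log(obs)[None, :]) / h
    return epanechnikov(t).sum(axis=1) / (n * h * u)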
Now we consider variable selection, or the selection of important covariates. For this, and motivated by (4), we propose the penalized least squares estimation method, which minimizes the mean squared residual after kernel estimation and unbiased transformation plus a penalty,

$$n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2 + \sum_{j=1}^{p} p_{a,\lambda}(|\beta_j|), \qquad (5)$$
where $p_{a,\lambda}(|\beta_j|)$ denotes a penalty function with tuning parameters $a$ and $\lambda$ (some comments on these are given below). In the following, several commonly used penalty functions are considered, including the LASSO penalty $p_\lambda(|\beta_j|) = \lambda\,|\beta_j|$ proposed by Tibshirani (1996) [4] and the SCAD penalty
$$p_\lambda(\beta_j; a) = \begin{cases} \lambda\,|\beta_j|, & \text{if } |\beta_j| \le \lambda, \\[4pt] -\dfrac{\beta_j^2 - 2a\lambda|\beta_j| + \lambda^2}{2(a-1)}, & \text{if } \lambda < |\beta_j| \le a\lambda, \\[4pt] \dfrac{(a+1)\lambda^2}{2}, & \text{if } |\beta_j| > a\lambda, \end{cases}$$
with $a > 2$ by Fan and Li (2001) [8]; for $a$ in the SCAD penalty, we set $a = 3.7$ as suggested by Fan and Li (2001) [8]. Furthermore, we investigate the use of the MCP penalty

$$p_\lambda(\beta_j; a) = \lambda \int_{0}^{|\beta_j|} \frac{(a\lambda - x)_+}{a\lambda}\, dx$$
with $a > 1$ given in Zhang (2010) [9], and the BAR penalty $p_\lambda(\beta_j) = \lambda\,\beta_j^2/\tilde\beta_j^2$ discussed in Zhao et al. (2020) [19], where $\tilde\beta_j$ ($j = 1, \ldots, p$) denotes a nonzero "good" estimator of $\beta_j$.
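The penalty functions above can be coded directly. The following is a hedged Python sketch; the function names and the default MCP parameter a = 3 are our own choices (the paper fixes a = 3.7 only for SCAD), and the closed forms follow from evaluating the MCP integral piecewise.

import numpy as np

def lasso_pen(b, lam):
    # LASSO: p(b) = lam * |b|
    return lam * np.abs(b)

def scad_pen(b, lam, a=3.7):
    # SCAD (Fan and Li, 2001) with the suggested a = 3.7
    ab = np.abs(b)
    mid = -(ab ** 2 - 2 * a * lam * ab + lam ** 2) / (2 * (a - 1))
    return np.where(ab <= lam, lam * ab,
                    np.where(ab <= a * lam, mid, (a + 1) * lam ** 2 / 2))

def mcp_pen(b, lam, a=3.0):
    # MCP (Zhang, 2010): lam * int_0^{|b|} (a*lam - x)_+ / (a*lam) dx, a > 1
    ab = np.abs(b)
    return np.where(ab <= a * lam, lam * ab - ab ** 2 / (2 * a),
                    a * lam ** 2 / 2)

def bar_pen(b, lam, b_tilde):
    # BAR: lam * b^2 / b_tilde^2 for a nonzero pilot estimate b_tilde
    return lam * b ** 2 / b_tilde ** 2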
Note that for the application of the method described above, one needs to choose the functions $\phi_1$, $\phi_2$, and $\phi_3$ satisfying Equation (2). Many functions can be used, and for simplicity, we suggest employing (I) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = 0$, $\phi_3(U_i,V_i) = H'(V_i)/\hat g_v(V_i)$; (II) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = H'(U_i)/\hat g_u(U_i)$, $\phi_3(U_i,V_i) = H'(U_i)/\hat g_u(U_i)$; or (III) $\phi_1(U_i,V_i) = 0$, $\phi_2(U_i,V_i) = H'(U_i)/\{2\hat g_u(U_i)\}$, $\phi_3(U_i,V_i) = H'(U_i)/\{2\hat g_u(U_i)\} + H'(V_i)/\{2\hat g_v(V_i)\}$. The numerical study below indicates that they give satisfactory and robust results, and more discussion on this can be found in Deng et al. (2012) [29].
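Putting the pieces together, here is a sketch of the transformed response $\hat h_i^*$ under choice (I) when $H(t) = \log(1+t)$, so that $H'(v) = 1/(1+v)$ and $H(0) = 0$. It reuses the log_kernel_density sketch given earlier; the function name is ours.

import numpy as np

def h_hat_choice1(U, V, d1, d2, h_bw):
    """Transformed responses under choice (I):
    h*_i = d3_i * H'(V_i) / g_v(V_i) + H(0), with H(t) = log(t + 1)."""
    d3 = 1.0 - d1 - d2                      # indicator of T_i > V_i
    g_v = log_kernel_density(V, V, h_bw)    # marginal density of V at each V_i
    return d3 / ((1.0 + V) * g_v)           # H'(v) = 1/(1+v); H(0) = log(1) = 0

Regressing the centered $\hat h_i^*$ on $X_i$ by penalized least squares then yields the criterion in (5).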

3. Penalized Least Squares Coordinate Descent Algorithm

Let $\hat\beta$ denote the estimator of $\beta$ obtained by minimizing the penalized criterion function in (5). In this section, we discuss the determination of $\hat\beta$ and develop a coordinate descent algorithm that updates each element $\beta_j$ of $\beta$ in turn while keeping all the other elements fixed at their current estimates.
Define

$$M(\beta_j) = n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2.$$
Then, at the $k$th iteration, $\hat\beta_j^{(k)}$ is determined by minimizing $Q(\beta_j) = M(\beta_j) + p_\lambda(|\beta_j|)$. Note that by the locally quadratic approximation idea discussed in Fan and Li (2001) [8], the derivative of the penalty can be approximated as

$$[\,p_\lambda(|\beta_j|)\,]' = p_\lambda'(|\beta_j|)\,\mathrm{sgn}(\beta_j) \approx \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\}\,\beta_j$$

when $\beta_j \ne 0$. In other words, a quadratic function can locally approximate $p_\lambda(|\beta_j|)$ at $|\hat\beta_j^{(k-1)}|$ as

$$p_\lambda(|\beta_j|) \approx p_\lambda(|\hat\beta_j^{(k-1)}|) + \tfrac{1}{2}\, \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\} \big( \beta_j^2 - (\hat\beta_j^{(k-1)})^2 \big).$$
Meanwhile, $M(\beta_j)$ can be approximated by a second-order Taylor expansion,

$$M(\beta_j) \approx M(\hat\beta_j^{(k-1)}) + M'(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)}) + \tfrac{1}{2}\, M''(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)})^2,$$

where $M'$ and $M''$ denote the first and second derivatives of $M(\cdot)$, respectively. Therefore, minimizing $Q(\beta_j)$ is equivalent to minimizing

$$M(\hat\beta_j^{(k-1)}) + M'(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)}) + \tfrac{1}{2}\, M''(\hat\beta_j^{(k-1)})\,(\beta_j - \hat\beta_j^{(k-1)})^2 + p_\lambda(|\hat\beta_j^{(k-1)}|) + \tfrac{1}{2}\, \big\{ p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| \big\} \big( \beta_j^2 - (\hat\beta_j^{(k-1)})^2 \big)$$
with respect to $\beta_j$. A closed-form solution is given by

$$\hat\beta_j^{(k)} = \frac{ \hat\beta_j^{(k-1)}\, M''(\hat\beta_j^{(k-1)}) - M'(\hat\beta_j^{(k-1)}) }{ M''(\hat\beta_j^{(k-1)}) + p_\lambda'(|\hat\beta_j^{(k-1)}|)/|\hat\beta_j^{(k-1)}| }. \qquad (6)$$
Note that the resulting solution (6) and the approximation used above apply to any penalty function. The BAR penalty, however, does not require a locally quadratic approximation, since it is already a quadratic function of the coefficients. For that case, we can use the closed-form iterative solution proposed by Wu et al. (2020) [32],

$$\hat\beta_j^{(k)} = \hat\beta_j^{(k-1)} - \frac{ Q'(\hat\beta_j^{(k-1)}) }{ Q''(\hat\beta_j^{(k-1)}) }, \qquad (7)$$

where $Q'(\hat\beta_j^{(k-1)})$ and $Q''(\hat\beta_j^{(k-1)})$ are the first and second derivatives of $Q(\beta_j)$ with respect to $\beta_j$ evaluated at $\hat\beta_j^{(k-1)}$, respectively. Combining the discussion above, the algorithm can be implemented as follows:
Step 1: Set $k = 0$ and choose the initial estimate

$$\hat\beta^{(0)} = \arg\min_\beta\; n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2.$$
Step 2: Use the coordinate descent update (6) to determine $\hat\beta^{(k)}$ for the LASSO, SCAD and MCP penalties, and use update (7) for the BAR penalty.
Step 3: Repeat Step 2 until convergence or until $k$ exceeds a given large number.
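A compact Python sketch of Steps 1–3 for penalties handled by update (6) follows; dpen stands for the derivative $p_\lambda'(\cdot)$ of the chosen penalty, the closed-form derivatives of the quadratic $M$ are used, and all names (and the convention of freezing a coefficient once it reaches zero) are our own illustration choices rather than the authors' code.

import numpy as np

def coord_descent(X, y, lam, dpen, max_iter=500, tol=1e-5):
    """Coordinate descent for (5) with update (6).

    X    : n x p design matrix; y : centered transformed responses h_hat
    dpen : derivative of the penalty, called as dpen(abs_beta, lam)
    """
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # Step 1: least squares start
    for _ in range(max_iter):                     # Step 3: iterate to tolerance
        beta_old = beta.copy()
        r = y - X @ beta                          # current residuals
        for j in range(p):                        # Step 2: update each beta_j
            if beta[j] == 0.0:                    # avoid dividing by |beta_j| = 0
                continue
            m1 = -2.0 * (X[:, j] @ r) / n         # M'(beta_j)
            m2 = 2.0 * (X[:, j] @ X[:, j]) / n    # M''(beta_j)
            new = (beta[j] * m2 - m1) / (m2 + dpen(abs(beta[j]), lam) / abs(beta[j]))
            r += X[:, j] * (beta[j] - new)        # keep residuals in sync
            beta[j] = new
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta

For instance, for SCAD one would pass the derivative $p_\lambda'(b) = \lambda$ for $b \le \lambda$, $(a\lambda - b)_+/(a-1)$ for $b > \lambda$, as dpen.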
Various criteria can be used to check convergence in Step 3 above. In the simulation studies below, the proposed algorithm is declared to have converged if the maximum of the absolute differences of the estimates between two successive iterations is less than $10^{-5}$ (Sun et al., 2019 [33]). To implement the algorithm above, one also needs to choose the tuning parameter $\lambda_n$. For the results given below, we use the Bayesian information criterion (BIC) proposed by Schwarz (1978) [34], which is data-dependent and defined as

$$\mathrm{BIC}_\lambda = 2\, n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\hat\beta \Big)^2 + df_\lambda\, \log(n).$$

In the above, $\hat\beta$ denotes the final estimator of $\beta$, and $df_\lambda$ represents the total number of nonzero estimates in $\hat\beta$, which serves as the degrees of freedom. Alternatively, one could employ other methods such as K-fold cross-validation (CV) (Verweij and van Houwelingen, 1993 [35]). For each given $\lambda_n$, we compute $\mathrm{BIC}_\lambda$ as above and then choose the value of $\lambda_n$ that minimizes it. For variance estimation of the proposed estimators, we suggest the nonparametric bootstrap method, which seems to work well, as the numerical study below indicates.
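The BIC-based tuning just described amounts to a one-dimensional grid search. A sketch, reusing the coord_descent sketch above (the grid and names are illustrative):

import numpy as np

def select_lambda(X, y, lambdas, dpen):
    """Pick the lambda minimizing BIC_lambda = 2 * RSS/n + df * log(n)."""
    n = X.shape[0]
    best = (np.inf, None, None)
    for lam in lambdas:
        beta = coord_descent(X, y, lam, dpen)
        rss = np.sum((y - X @ beta) ** 2) / n
        df = np.count_nonzero(beta)
        bic = 2.0 * rss + df * np.log(n)
        best = min(best, (bic, lam, beta), key=lambda t: t[0])
    return best  # (BIC value, selected lambda, fitted coefficients)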

4. Asymptotic Properties

In this section, we establish the asymptotic properties of the variable selection procedure, or the estimator proposed above, with the BAR penalty function. For this, let $\beta_0 = (\beta_{01}, \ldots, \beta_{0p_n})^{\top}$ denote the true parameter value. Note that here we replace $p$ by $p_n$ to emphasize the dependence of $p$ on $n$, and we assume that $p_n$ can diverge to infinity but $p_n < n$. For simplicity, we write $\beta_0 = (\beta_{01}^{\top}, \beta_{02}^{\top})^{\top}$ and assume that $\beta_{01} \ne 0$ and $\beta_{02} = 0$, where $\beta_{01} \in \mathbb{R}^{q_n}$ and $\beta_{02} \in \mathbb{R}^{p_n - q_n}$. Let $\hat\beta^* = (\hat\beta_1^{*\top}, \hat\beta_2^{*\top})^{\top}$ denote the BAR estimator of $\beta$ corresponding to the same partition. Set $X_\alpha = (x_1, \ldots, x_{q_n})$, $X_\gamma = (x_{q_n+1}, \ldots, x_{p_n})$, $\Sigma_{n1} = X_\alpha^{\top} X_\alpha/n$ and $\Sigma_n = X^{\top} X/n$, where $x_j$ is the $j$th column of $X$ for $j = 1, \ldots, p_n$. Let $\hat Y_i^* = \hat h_i^* - n^{-1}\sum_{i=1}^n \hat h_i^*$ and $\hat Y^* = (\hat Y_1^*, \ldots, \hat Y_n^*)^{\top}$, and let $Y_i^* = h_i^* - n^{-1}\sum_{i=1}^n h_i^*$ and $Y^* = (Y_1^*, \ldots, Y_n^*)^{\top}$, $i = 1, \ldots, n$. For the asymptotic properties, we need the following regularity conditions.
C1. For every $t \in (0, \infty)$, $H'(t)$ exists, where $H'(t)$ is the derivative of $H(t)$, and $H(0) < \infty$.
C2. The $U_i$ and $V_i$ are positive i.i.d. random variables with uniformly continuous density functions $g_u(u)$ and $g_v(v)$, respectively.
C3. $T_i$ and $(U_i, V_i)$ are independent given $X_i$.
C4. $\mathrm{Var}(h^*) < \infty$.
C5. $C_n = n^{-1}\sum_{i=1}^n X_i X_i^{\top} \to C$ with probability 1 as $n$ tends to $\infty$, where $C$ is a positive definite matrix ([19]).
C6. $K(t)$ is uniformly continuous and of bounded variation, with $\int |K(t)|\, dt < \infty$ and $K(t) \to 0$ as $|t| \to \infty$; $\int K(t)\, dt = 1$ and $\int |x \log|x||^{1/2}\, |dK(x)| < \infty$; and $\lim_{n\to\infty} h_n = 0$ with $\lim_{n\to\infty} n h_n/\log n = \infty$.
C7. There exists a constant $E > 1$ such that $0 < 1/E < \lambda_{\min}(C_n) \le \lambda_{\max}(C_n) < E < \infty$ for every integer $n$.
C8. Let $a_{0n} = \min_{1 \le j \le q_n} |\beta_{0j}|$ and $a_{1n} = \max_{1 \le j \le q_n} |\beta_{0j}|$. As $n \to \infty$, $p_n q_n/n \to 0$, $(p_n/n)^{1/2}/a_{0n} \to 0$, $p_n/\lambda_n \to 0$, and $\lambda_n a_{1n} (q_n/n)^{1/2}/a_{0n}^2 \to 0$.
Note that Conditions C1–C5 are needed to obtain an unbiased transformation ([29]); in particular, the uniformly continuous density functions $g_u(u)$ and $g_v(v)$ are required so that the kernel estimators $\hat g_u(u)$ and $\hat g_v(v)$ converge to $g_u(u)$ and $g_v(v)$ almost surely, and Condition C6 guarantees that $\hat h_i^*$ converges to $h_i^*$ in probability ([31]). Condition C7 assumes that $C_n$ is positive definite almost surely, with eigenvalues bounded away from zero and infinity. Condition C8 gives some sufficient, but not necessary, conditions needed to prove the convergence and asymptotic properties of the BAR estimator, and the nonzero coefficients are assumed to be uniformly bounded away from zero and infinity ([12]). Define $\beta = (\alpha^{\top}, \gamma^{\top})^{\top}$, where $\alpha$ and $\gamma$ are $q_n \times 1$ and $(p_n - q_n) \times 1$ vectors, respectively. The following theorem gives the asymptotic properties.
Theorem 1.
Assume that Conditions C1–C8 given above hold and that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives. Then we have that:
(i) The fixed point of $f(\alpha) = \{X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha)\}^{-1} X_\alpha^{\top} \hat Y^*$ exists and is unique, where $D_1(\alpha) = \mathrm{diag}(\alpha_1^{-2}, \ldots, \alpha_{q_n}^{-2})$.
(ii) (Oracle property) The BAR estimator $\hat\beta^* = (\hat\beta_1^{*\top}, \hat\beta_2^{*\top})^{\top}$ exists and is unique, where $\hat\beta_2^* = 0$ and $\hat\beta_1^*$ is the unique fixed point of $f(\alpha)$.
(iii) (Asymptotic normality) $\sqrt{n}\,(\hat\beta_1^* - \beta_{01}) \to N(0, \Sigma^{(1)})$ in distribution, where $\Sigma^{(1)}$ is defined in Appendix A.

5. Simulation Study

Now we present some results obtained from a simulation study conducted to assess the finite-sample performance of the variable selection procedure presented in the previous sections. In the study, we generated the covariate vector $X_i$ from $N(0, \Sigma_X)$, where the $(l,m)$ element of the covariance matrix $\Sigma_X$ is $0.5^{|l-m|}$. Given the $X_i$'s, the true failure times were generated from model (1) with $H(T) = \log(T+1)$ and the $\epsilon_i$'s following the standard normal distribution. For the generation of the observed interval-censored data, the two observation times for each subject were taken from a homogeneous Poisson process whose inter-examination times were independently and identically distributed as an exponential distribution with mean 0.4 (Li et al., 2019 [20]).
For the application of the proposed variable selection procedure, by following Deng et al. (2012) [29], we set

$$h^* = \delta_1 \cdot 0 + \delta_2 \cdot 0 + \delta_3\, \frac{1}{(1+v)\, g_v(v)} \qquad (8)$$

for choice (I). Meanwhile, for choices (II) and (III), we set

$$h^* = \delta_1 \cdot 0 + \delta_2\, \frac{1}{(1+u)\, g_u(u)} + \delta_3\, \frac{1}{(1+u)\, g_u(u)} \qquad (9)$$

and

$$h^* = \delta_1 \cdot 0 + \delta_2\, \frac{1}{2(1+u)\, g_u(u)} + \delta_3 \left( \frac{1}{2(1+u)\, g_u(u)} + \frac{1}{2(1+v)\, g_v(v)} \right), \qquad (10)$$

where $\delta_3 = 1 - \delta_1 - \delta_2$.
For the kernel estimators $\hat g_u(u)$ and $\hat g_v(v)$, we considered the quadratic (Epanechnikov) kernel function

$$K(t) = \begin{cases} \frac{3}{4}\,(1 - t^2), & |t| \le 1, \\ 0, & \text{otherwise}, \end{cases}$$
and several different bandwidths: (a) $n^{-1/5}$, (b) $1.06\,\hat\sigma\, n^{-1/5}$, (c) $1.06 \min(\hat\sigma, \hat R/1.34)\, n^{-1/5}$ with $\hat R$ being the interquartile range (the 0.75 quantile minus the 0.25 quantile), and (d) $c_1\, n^{-1/5}$, where $\hat\sigma$ denotes the sample standard deviation and $c_1$ is selected by the CV method over 20 equally spaced values in $(0.5, 1.5)$. The simulation results for these four settings are given in the Supplementary Materials. We also considered selecting the tuning parameter $\lambda_n$ and the bandwidth choice (d) jointly, based on the BIC described above with $\lambda_n$ ranging over 50 equally spaced values from 0.001 to 0.01 and $c_1$ over 10 equally spaced values in $(0.5, 1.5)$. The results given below are based on the sample size $n = 300$ or $n = 500$ with 100 replications and $p = 10$, 30, 50, or 100.
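For reference, a Python sketch of this data-generating mechanism follows (the intercept $\beta_0$ is taken to be 0 and exactly two examination times are generated per subject, which is our reading of the setup):

import numpy as np

def simulate(n, p, beta0, rng):
    """Generate interval-censored data under model (1) with H(t) = log(t + 1)."""
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])   # (l, m) entry 0.5^{|l-m|}
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    eps = rng.standard_normal(n)
    T = np.exp(X @ beta0 + eps) - 1.0                    # invert H(T) = X'beta + eps
    gaps = rng.exponential(scale=0.4, size=(n, 2))       # inter-examination times
    U = gaps[:, 0]
    V = U + gaps[:, 1]                                   # two observation times
    d1 = (T <= U).astype(float)                          # T in (0, U]
    d2 = ((T > U) & (T <= V)).astype(float)              # T in (U, V]
    return X, U, V, d1, d2

# Example: n = 300, p = 10, beta0 = (0.5, 0.5, 0, ..., 0)
rng = np.random.default_rng(1)
beta0 = np.zeros(10); beta0[:2] = 0.5
X, U, V, d1, d2 = simulate(300, 10, beta0, rng)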
Table 1, Table 2 and Table 3 are based on (8), (9) and (10), respectively. They present the results on covariate selection with $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$ and $\beta_0 = (0.5, 0.7, \mathbf{0}_{p-2})$, corresponding to relatively moderate and weak signals, respectively, as well as with $\beta_0 = (0.5, 0.5, 0.5, 0.5, \mathbf{0}_{p-4})$. The results include the average number of nonzero estimates among the parameters whose true values are not zero (TP) and the average number of nonzero estimates among the parameters whose true values are zero (FP). In addition, we calculated and included in the tables the median of the mean squared errors (MMSE), given by $(\hat\beta - \beta_0)^{\top} \Sigma_X (\hat\beta - \beta_0)$ and measuring the prediction accuracy, and the standard deviation of the MSE (SD), where $\Sigma_X$ denotes the population covariance matrix of the covariates. In the tables, in addition to the BAR penalty function, we also considered the LASSO, MCP, and SCAD penalty functions, and the joint selection of the tuning parameter $\lambda_n$ and the bandwidth based on the BIC was used. For $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$ in Table 1, we also added backward stepwise variable selection based on the BIC.
In addition, Table 4 considers the smaller and larger sample sizes $n = 100$ and $n = 5000$ with $p = 5$ and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$. Furthermore, we conducted an extra simulation to demonstrate how the method works in the presence of noncontinuous covariates; that is, the last covariate was generated from a Bernoulli distribution with success probability 0.5. The results for this setup are presented in Table 5 with $n = 300$, $p = 10, 30, 50$, and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$. Finally, we considered a toy example in which left endpoint imputation is used (Sun, 2006) [26]; Table 6 shows the error that would be made if the data were treated as uncensored, with $n = 100$, $p = 5$ and $\beta_0 = (0.5, 0.5, \mathbf{0}_{p-2})$.
One can see from Table 1, Table 2 and Table 3 that the proposed approach performs well with all penalty functions considered and that, in general, the method with the BAR penalty gave smaller FP values and thus more parsimonious models. Meanwhile, the results are similar for choices (I), (II), and (III). As expected, the method with the LASSO penalty gave slightly larger FP values and tended to select more noise variables than the other penalties, and all methods gave better results on both variable selection and prediction accuracy as the sample size increased. Furthermore, as expected, important covariates with weak effects were more difficult to identify than those with moderate effects. The stepwise variable selection gave the largest MMSE and SD. One can see from Table 4 that the results with a larger sample size have smaller MMSE and SD; for the settings in Table 4, the proposed method needed only 17 s, 15 s, 20 s, and 5 s on average for LASSO, SCAD, MCP and BAR, respectively. Table 5 shows that the proposed procedure also works in the presence of noncontinuous covariates. As shown by the comparison with left endpoint imputation in Table 6, the unbiased transformation method improves accuracy.

6. An Application

In this section, we apply the methodology proposed in the previous sections to a set of children's mortality data arising from the 2003 Nigeria Demographic and Health Survey (Kneib, 2006) [36]. The data set consists of 5730 children with their survival information and six covariates: AGE (the age of the child's mother at birth), BMI (the mother's body mass index at birth), HOSP (1 if the baby was delivered in a hospital and 0 otherwise), GENDER (1 for boys and 0 otherwise), EDU (1 if the mother received higher education and 0 otherwise), and URBAN (1 if the family lived in an urban area and 0 otherwise). Among others, one of the objectives of the study was to identify the covariates that have a significant effect on children's mortality in Nigeria.
In the study, for each subject, if death occurred within the first two months after birth, the failure time was observed exactly; otherwise, only interval-censored observations were obtained based on the interview times of the mothers. Among the 5730 children, 233 gave exact failure times, 430 gave interval-censored observations, and the others provided right-censored observations. To apply the proposed approach, as in the simulation study, we set $H(T) = \log(T+1)$ and used the same four penalty functions, together with the three choices (I), (II), and (III) of the functions $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$ and $\phi_3(\cdot,\cdot)$ given in Section 2. Note that for exact failure times, we used $H(T)$ as the response and applied no unbiased transformation.
The results on covariate selection and the estimated covariate effects are presented in Table 7, Table 8 and Table 9 for choices (I)–(III), respectively. In addition, the estimated standard errors, obtained by the nonparametric bootstrap method with 100 bootstrap samples, are included in the tables. One can see from these tables that the results seem to be robust with respect to the three choices, and all methods selected the factor AGE, suggesting that the age of the mother at birth has a significant effect on the mortality risk of the children. The factor EDU was also selected by the LASSO, SCAD, and MCP penalty functions, indicating that the mother's education level also seems to have a significant effect on the mortality risk. Xu et al. (2020) [37] analyzed the same data and likewise found, using the LASSO, ALASSO, and SCAD penalty functions, that EDU had a significant effect on the mortality risk of the children. In contrast, the results suggest that the mother's body mass index, the child's gender, the place of delivery, and the family's location did not appear to have significant effects on children's mortality.

7. Discussion and Concluding Remarks

This paper discussed variable selection for generalized linear models when only interval-censored failure time data are available, and for this problem, a new unbiased-transformation-based approach was proposed. One advantage of the unbiased transformation is that it allows one to employ the simple penalized least squares approach for estimation. The proposed approach can accommodate any penalty function, such as the LASSO, SCAD, MCP, and BAR penalties, and a coordinate descent algorithm was developed for its implementation. In addition, the asymptotic properties of the resulting estimators were established, and the simulation study indicated that the proposed methodology works well in practical situations.
There exist several directions for future research. One is that the proposed method assumed $H(0) < \infty$, which clearly may not hold in practice; one such example is the accelerated failure time (AFT) model with $H(t) = \log(t)$, for which $H(0) = -\infty$ and the proposed method would not be valid. In other words, one needs to modify the proposed unbiased transformation or develop some new methods that can be applied to the AFT model. Another direction concerns multivariate failure time data, which one may encounter in practical applications; for that scenario, Cai et al. (2005) [14] proposed a penalized marginal likelihood method for right-censored data with a large number of covariates, and it would be helpful to develop some flexible and reliable methods to handle other types of multivariate failure time data, including interval-censored data. A third direction is that it has been assumed in the previous sections that the dimension of covariates $p_n$ can diverge to infinity but is smaller than the sample size $n$. Obviously, there may be cases where $p_n$ is greater than $n$, such as in genetic or biomarker studies, so some new methods that allow for $p_n > n$ need to be developed.
As one anonymous reviewer pointed out, neutrosophic statistics (Smarandache, 1998 [38]; Smarandache, 2013 [39]) is an extension of classical statistics that applies when the data come from a complex process or an uncertain environment. The current study could be extended using neutrosophic statistics in future research.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/math10050763/s1, Table S1: Simulation results based on the bandwidth selection method (a), Table S2: Simulation results based on the bandwidth selection method (b), Table S3: Simulation results based on the bandwidth selection method (c), Table S4: Simulation results based on the bandwidth selection method (d).

Author Contributions

Formal analysis, R.L.; Investigation, S.Z.; Methodology, T.H.; Software, R.L.; Supervision, T.H. and J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Beijing Natural Science Foundation (Z210003) and the National Natural Science Foundation of China (Grant Nos. 12171328 and 11971064). Shishun Zhao's work was supported by the National Natural Science Foundation of China (NSFC) (12071176) and the Science and Technology Developing Plan of Jilin Province (Nos. 20200201258JC, 20190902012TC, 20190304129YY).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. Interested researchers can obtain the data used in the application section of this paper from the official website of the Demographic and Health Surveys (DHS) Program: https://dhsprogram.com/data/available-datasets.cfm (accessed on 19 February 2022). One can register for a download account and apply for the data.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we sketch the proof of Theorem 1. For completeness, we first introduce a few lemmas and some notation useful for the proof. Let $C_n = X^{\top} X/n$. When we consider the BAR penalty, the objective function is

$$\arg\min_{\beta}\; n^{-1} \sum_{i=1}^{n} \Big( \hat h_i^* - n^{-1} \sum_{q=1}^{n} \hat h_q^* - X_i^{\top}\beta \Big)^2 + \lambda_n \sum_{j=1}^{p_n} \beta_j^2/\tilde\beta_j^2. \qquad (A1)$$

From Section 2, we obtain the iterative solution

$$\beta^{(k)} = g(\beta^{(k-1)}) = \{ X^{\top} X + \lambda_n D(\beta^{(k-1)}) \}^{-1} X^{\top} \hat Y^* = \big( \alpha^*(\beta^{(k-1)})^{\top}, \gamma^*(\beta^{(k-1)})^{\top} \big)^{\top}. \qquad (A2)$$

For simplicity, we write $\alpha^*(\beta)$ and $\gamma^*(\beta)$ as $\alpha^*$ and $\gamma^*$. $C_n^{-1}$ can be partitioned as

$$C_n^{-1} = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^{\top} & A_{22} \end{pmatrix},$$

where $A_{11}$ is a $q_n \times q_n$ matrix. Multiplying Equation (A2) by $(X^{\top}X)^{-1}\{X^{\top}X + \lambda_n D(\beta)\}$ gives

$$\begin{pmatrix} \alpha^* - \beta_{01} \\ \gamma^* \end{pmatrix} + \frac{\lambda_n}{n} \begin{pmatrix} A_{11} D_1(\alpha)\,\alpha^* + A_{12} D_2(\gamma)\,\gamma^* \\ A_{12}^{\top} D_1(\alpha)\,\alpha^* + A_{22} D_2(\gamma)\,\gamma^* \end{pmatrix} = (X^{\top}X)^{-1} X^{\top} \hat\epsilon^* = \hat\beta_{least} - \beta_0, \qquad (A3)$$

where $\hat\epsilon_i^* = \epsilon_i^* + ( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* )$, $\hat\beta_{least} = (X^{\top}X)^{-1} X^{\top} \hat Y^*$, $D_1(\alpha) = \mathrm{diag}(\alpha_1^{-2}, \ldots, \alpha_{q_n}^{-2})$, $D_2(\gamma) = \mathrm{diag}(\gamma_1^{-2}, \ldots, \gamma_{p_n-q_n}^{-2})$, and $\epsilon_i^* = h_i^* - n^{-1}\sum_{q=1}^n h_q^* - X_i^{\top}\beta_0$.
Lemma A1.
Assume that Conditions C2 and C6 hold and that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives. Then

$$\sup_{u,v} | \hat h^*(\hat g_u(u), \hat g_v(v)) - h^*(g_u(u), g_v(v)) | \to 0 \quad a.s. \text{ as } n \to \infty.$$

Proof.
Under Conditions C2 and C6, we have $\sup_u |\hat g_u(u) - g_u(u)| \to 0$ a.s. and $\sup_v |\hat g_v(v) - g_v(v)| \to 0$ a.s. as $n \to \infty$ according to Theorem A in Silverman (1978) [31]. Using Taylor's expansion together with the fact that $\phi_1(\cdot,\cdot)$, $\phi_2(\cdot,\cdot)$, and $\phi_3(\cdot,\cdot)$ are continuous functions with finite continuous partial derivatives yields

$$\sup_{u,v} | \hat h^*(\hat g_u(u), \hat g_v(v)) - h^*(g_u(u), g_v(v)) | \to 0 \quad a.s. \text{ as } n \to \infty.$$

The proof of Lemma A1 is completed. □
Lemma A2.
Let $\hat\beta_{least}$ denote the least squares estimator obtained by minimizing (4), let $\beta_0$ denote the true value of $\beta$, and suppose that Conditions C1 to C8 hold. Then we have

$$\| \hat\beta_{least} - \beta_0 \| = O_p\big( \sqrt{p_n/n}\, \big).$$

Proof.
When we consider (A1), according to C4 we get $\mathrm{Var}(\epsilon^*) < \infty$. Substituting $\hat h_i^*$ into (4) gives

$$n^{-1} \sum_{i=1}^{n} \Big( h_i^* - n^{-1}\sum_{q=1}^n h_q^* - X_i^{\top}\beta + \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* \Big)^2.$$

After some simple algebraic manipulations, we get

$$\Big( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* + \epsilon_i^* \Big)^2 \le 2 \Big( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* \Big)^2 + 2\,\epsilon_i^{*2}.$$

Under Conditions C1 to C8, we have $2 ( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* )^2 = o_p(1)$. According to Lemma A1, $nE( \hat h_i^* - n^{-1}\sum_{q=1}^n \hat h_q^* - h_i^* + n^{-1}\sum_{q=1}^n h_q^* + \epsilon_i^* )^2 = nE(\epsilon_i^{*2}) + n\,o_p(1)$. Then

$$nE\| \hat\beta_{least} - \beta_0 \|^2 = nE\big( \| (X^{\top}X)^{-1} X^{\top} (\hat Y^* - Y^* + \epsilon^*) \|^2 \big) = \{ E(\epsilon_i^{*2}) + o_p(1) \} \cdot \mathrm{trace}(C_n^{-1}) = O_p(p_n),$$

so that $\| \hat\beta_{least} - \beta_0 \|^2 = O_p(p_n/n)$. □
Lemma A3.
Suppose that Conditions C2, C4, and C6 hold. Then, for any $\epsilon > 0$,

$$\lim_{n\to\infty} P\big( | \mathrm{Var}(\hat Y^*) - \mathrm{Var}(Y^*) | > \epsilon \big) = 0.$$

Proof.
According to C4, we know $\mathrm{Var}(Y^*) = E(\epsilon^{*\top}\epsilon^*) < \infty$. According to Lemma A1, $\hat h_i^*$ converges to $h_i^*$ in probability for $i = 1, \ldots, n$. Then

$$\mathrm{Var}(\hat Y^*) = E\{ (\hat Y^* - Y^* + \epsilon^*)^{\top} (\hat Y^* - Y^* + \epsilon^*) \} = E\{ \epsilon^{*\top}\epsilon^* + 2\,\epsilon^{*\top}(\hat Y^* - Y^*) + \| \hat Y^* - Y^* \|^2 \} = E(\epsilon^{*\top}\epsilon^*) + o_p(1).$$

So Lemma A3 is proved. □
Proof of Theorem 1 (Oracle Property).
For the zero components, according to conclusion (i) of Lemma 1 in [12], let $\delta_n$ be a sequence of positive real numbers satisfying $\delta_n \to \infty$ and $\delta_n^2 p_n/\lambda_n \to 0$. Define $H_n = \{ \beta \in \mathbb{R}^{p_n} : \| \beta - \beta_0 \| \le \delta_n \sqrt{p_n/n}\, \}$ and $H_{n1} = \{ \alpha \in \mathbb{R}^{q_n} : \| \alpha - \beta_{01} \| \le \delta_n \sqrt{p_n/n}\, \}$. Then, with probability tending to 1, we have

$$\| \gamma^* \| < \| \gamma \|\, \delta_n \sqrt{p_n/n} \to 0.$$

For the nonzero components, according to Lemma 2 in [12], $f(\alpha)$ is a contraction mapping from $H_{n1}$ to itself, where

$$f(\alpha) = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} \hat Y^*.$$

Let $\hat\alpha$ be the unique fixed point of $f(\alpha)$, defined by $\hat\alpha = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\hat\alpha) \}^{-1} X_\alpha^{\top} \hat Y^*$.
Hence, to prove the consistency of the nonzero part in Theorem 1, it is sufficient to show that

$$\Pr\Big( \lim_{k\to\infty} \| \hat\alpha^{(k)} - \hat\alpha \| = 0 \Big) \to 1. \qquad (A4)$$

Define $\gamma^* = 0$ if $\gamma = 0$. It is easy to see from (A3) that, for any $\alpha \in H_{n1}$,

$$\lim_{\gamma\to 0} \gamma^*(\alpha, \gamma) = 0. \qquad (A5)$$

Combining (A5) with the fact that

$$\begin{pmatrix} X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) & X_\alpha^{\top} X_\gamma \\ X_\gamma^{\top} X_\alpha & X_\gamma^{\top} X_\gamma + \lambda_n D_2(\gamma) \end{pmatrix} \begin{pmatrix} \alpha^* \\ \gamma^* \end{pmatrix} = \begin{pmatrix} X_\alpha^{\top} \hat Y^* \\ X_\gamma^{\top} \hat Y^* \end{pmatrix},$$

we find that, for any $\alpha \in H_{n1}$,

$$\lim_{\gamma\to 0} \alpha^*(\alpha, \gamma) = \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} \hat Y^* = f(\alpha). \qquad (A6)$$

According to conclusion (b) in Lemma 1 of Dai et al. (2018) [12], $g$ in (A2) is a mapping from $H_n$ to itself. This, together with (A3) and (A6), implies that, as $k \to \infty$,

$$\eta_k \equiv \sup_{\alpha \in H_{n1}} \big\| f(\alpha) - \alpha^*\big( \alpha, \hat\gamma^{(k)} \big) \big\| \to 0 \qquad (A7)$$

with probability tending to 1. Note that

$$\| \hat\alpha^{(k+1)} - \hat\alpha \| = \| \alpha^*(\hat\beta^{(k)}) - \hat\alpha \| \le \| \alpha^*(\hat\beta^{(k)}) - f(\hat\alpha^{(k)}) \| + \| f(\hat\alpha^{(k)}) - \hat\alpha \| \le \eta_k + \frac{1}{E}\, \| \hat\alpha^{(k)} - \hat\alpha \|,$$

where the last step follows from $\| f(\hat\alpha^{(k)}) - \hat\alpha \| = \| f(\hat\alpha^{(k)}) - f(\hat\alpha) \| \le \| \hat\alpha^{(k)} - \hat\alpha \|/E$. Let $a_k = \| \hat\alpha^{(k)} - \hat\alpha \|$ for every integer $k \ge 0$. By (A7), for any $\epsilon > 0$ there exists a positive integer $N$ such that $\eta_k < \epsilon$ for every integer $k > N$. Thus

$$a_{k+1} \le \frac{a_k}{E} + \eta_k \le \frac{a_{k-1}}{E^2} + \frac{\eta_{k-1}}{E} + \eta_k \le \cdots \le \frac{a_1}{E^k} + \frac{\eta_1}{E^{k-1}} + \cdots + \frac{\eta_N}{E^{k-N}} + \frac{\eta_{N+1}}{E^{k-N-1}} + \cdots + \frac{\eta_{k-1}}{E} + \eta_k \le \frac{a_1 + \eta_1 + \cdots + \eta_N}{E^{k-N}} + \frac{1 - (1/E)^{k-N}}{1 - 1/E}\,\epsilon$$

with probability 1 as $n$ tends to $\infty$. The first term on the right-hand side tends to 0 as $k \to \infty$, and the second term can be made arbitrarily small since $\epsilon$ is arbitrary. This proves (A4). Therefore, $\lim_{k\to\infty} \beta^{(k)} = (\hat\alpha^{\top}, 0^{\top})^{\top}$. □
Proof of Theorem 1 (Asymptotic Normality).
We next prove the asymptotic normality of the nonzero components of the BAR estimator. Based on (A6), we have

$$\sqrt{n}\,( \hat\alpha^* - \alpha ) = \Pi_1 + \Pi_2,$$

where

$$\Pi_1 = \sqrt{n}\, \big[ \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} X_\alpha - I_{q_n} \big] \alpha,$$
$$\Pi_2 = \sqrt{n}\, \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} \big( X_\alpha^{\top} \hat Y^* - X_\alpha^{\top} X_\alpha\, \alpha \big).$$

According to the first-order resolvent expansion formula, we obtain

$$\{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} = ( X_\alpha^{\top} X_\alpha )^{-1} - \lambda_n\, ( X_\alpha^{\top} X_\alpha )^{-1} D_1(\alpha) \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1}.$$

This yields

$$\Pi_1 = -\frac{\lambda_n}{\sqrt{n}} \Big( \frac{X_\alpha^{\top} X_\alpha}{n} \Big)^{-1} D_1(\alpha) \{ X_\alpha^{\top} X_\alpha + \lambda_n D_1(\alpha) \}^{-1} X_\alpha^{\top} X_\alpha\, \alpha.$$

Under Conditions C7 and C8, we get

$$\| \Pi_1 \| = O_p\big( \lambda_n \sqrt{q_n/n}\, \big) \to 0$$

and

$$\Pi_2 = \sqrt{n}\, \big\{ n^{-1} X_\alpha^{\top} X_\alpha + o_p( n^{-1/2} ) \big\}^{-1} \big( n^{-1} X_\alpha^{\top} \hat Y^* - n^{-1} X_\alpha^{\top} X_\alpha\, \alpha \big).$$

Therefore, we get

$$\sqrt{n}\,( \hat\alpha^* - \alpha ) = \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}\, n^{-1/2} \big( X_\alpha^{\top} \hat Y^* - X_\alpha^{\top} X_\alpha\, \alpha \big) + o_p(1).$$

According to Lemma A3, $\mathrm{Var}(\hat Y^*)$ converges to $\mathrm{Var}(Y^*)$ in probability. Then we get

$$\sqrt{n}\, \Sigma^{(1)\,-1/2} ( \hat\alpha^* - \alpha ) \to N( 0, I_{q_n} )$$

in distribution, where $\Sigma^{(1)} = \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}\, \mathrm{Var}( X_\alpha^{\top} Y^* )\, \{ E( X_\alpha^{\top} X_\alpha ) \}^{-1}$. This completes the proof. □

References

1. Goldberger, A.S.; Jochems, D.B. Note on stepwise least squares. J. Am. Stat. Assoc. 1961, 56, 105–110.
2. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
3. Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1041.
4. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
5. Raftery, A.E.; Madigan, D.; Hoeting, J.A. Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 1997, 92, 179–191.
6. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 2005, 67, 301–320.
7. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
8. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
9. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
10. Dicker, L.; Huang, B.; Lin, X. Variable selection and estimation with the seamless-L0 penalty. Stat. Sin. 2013, 23, 929–962.
11. Liu, Z.; Li, G. Efficient regularized regression with L0 penalty for variable selection and network construction. Comput. Math. Methods Med. 2016, 2016, 3456153.
12. Dai, L.; Chen, K.; Sun, Z.; Liu, Z.; Li, G. Broken adaptive ridge regression and its asymptotic properties. J. Multivar. Anal. 2018, 168, 334–351.
13. Zheng, X.; Rong, Y.; Liu, L.; Cheng, W. A more accurate estimation of semiparametric logistic regression. Mathematics 2021, 9, 2376.
14. Cai, J.; Fan, J.; Li, R.; Zhou, H. Variable selection for multivariate failure time data. Biometrika 2005, 92, 303–316.
15. Fan, J.; Li, R. Variable selection for Cox's proportional hazards model and frailty model. Ann. Stat. 2002, 30, 74–99.
16. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395.
17. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox's proportional hazards model. Biometrika 2007, 94, 691–703.
18. Khan, N.; Aslam, M.; Raza, S.M.; Jun, C. A new variable control chart under failure-censored reliability tests for Weibull distribution. Qual. Reliab. Eng. Int. 2018, 35, 572–581.
19. Zhao, H.; Wu, Q.; Li, G.; Sun, J. Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression. J. Am. Stat. Assoc. 2020, 115, 204–216.
20. Li, S.; Wei, Q.; Sun, J. Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease. Stat. Methods Med. Res. 2019, 29, 2151–2166.
21. Du, M.; Sun, J. Variable selection for interval-censored failure time data. Int. Stat. Rev. 2021, accepted.
22. Ali, S.; Raza, S.M.; Aslam, M.; Butt, M.M. CEV-hybrid DEWMA charts for censored data using Weibull distribution. Commun. Stat. Simul. Comput. 2021, 50, 446–461.
23. Zhao, S.; Hu, T.; Ma, L.; Wang, P.; Sun, J. Regression analysis of informative current status data with the additive hazards model. Lifetime Data Anal. 2015, 21, 241–258.
24. Wang, P.; Zhao, H.; Sun, J. Regression analysis of case K interval-censored failure time data in the presence of informative censoring. Biometrics 2016, 72, 1103–1112.
25. Li, S.; Hu, T.; Wang, P.; Sun, J. A class of semiparametric transformation models for doubly censored failure time data. Scand. J. Stat. 2018, 45, 682–698.
26. Sun, J. The Statistical Analysis of Interval-Censored Failure Time Data; Springer: New York, NY, USA, 2006.
27. Buckley, J.; James, I. Linear regression with censored data. Biometrika 1979, 66, 429–436.
28. Deng, W. Some Issues on Interval Censored Data. Ph.D. Dissertation, Fudan University, Shanghai, China, 2004.
29. Deng, W.; Tian, Y.; Lv, Q. Parametric estimator of linear model with interval-censored data. Commun. Stat. Simul. Comput. 2012, 41, 1794–1804.
30. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
31. Silverman, B.W. Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Stat. 1978, 6, 177–184.
32. Wu, Q.; Zhao, H.; Zhu, L.; Sun, J. Variable selection for high-dimensional partly linear additive Cox model with application to Alzheimer's disease. Stat. Med. 2020, 39, 3120–3134.
33. Sun, L.; Li, S.; Wang, L.; Song, X. Variable selection in semiparametric nonmixture cure model with interval-censored failure time data: An application to the prostate cancer screening study. Stat. Med. 2019, 38, 3026–3039.
34. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
35. Verweij, P.J.M.; van Houwelingen, H.C. Cross-validation in survival analysis. Stat. Med. 1993, 12, 2305–2314.
36. Kneib, T. Mixed model-based inference in geoadditive hazard regression for interval-censored survival times. Comput. Stat. Data Anal. 2006, 51, 777–792.
37. Xu, Y.; Zhao, S.; Tao, H.; Sun, J. Variable selection for generalized odds rate mixture cure models with interval-censored failure time data. Comput. Stat. Data Anal. 2020, 156, 107115.
38. Smarandache, F. Neutrosophic Probability, Set, and Logic; American Research Press: Rehoboth, NM, USA, 1998.
39. Smarandache, F. Introduction to Neutrosophic Measure, Neutrosophic Integral, and Neutrosophic Probability; Sitech-Education Publisher: Craiova, Romania, 2013.
Table 1. Simulation results based on the joint selection of the tuning parameter and bandwidth for (I).

                                n = 300                         n = 500
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)

β = (0.5, 0.5, 0, …, 0)
10    Stepwise   1.49   0.12   0.179 (0.097)    1.96   1.58   0.144 (0.060)
      LASSO      2.00   1.03   0.174 (0.071)    2.00   1.21   0.164 (0.065)
      SCAD       1.95   0.92   0.145 (0.070)    1.97   0.90   0.132 (0.063)
      MCP        1.92   0.77   0.139 (0.064)    1.95   0.76   0.131 (0.061)
      BAR        1.89   0.49   0.242 (0.090)    1.95   0.48   0.134 (0.058)
30    Stepwise   1.90   5.65   0.279 (0.115)    1.98   5.05   0.208 (0.069)
      LASSO      1.99   3.17   0.216 (0.145)    2.00   2.86   0.186 (0.056)
      SCAD       1.93   2.97   0.187 (0.085)    1.99   2.51   0.145 (0.059)
      MCP        1.90   2.68   0.187 (0.092)    1.98   2.01   0.137 (0.056)
      BAR        1.98   1.14   0.143 (0.055)    1.98   1.14   0.143 (0.054)
50    Stepwise   1.84   9.90   0.400 (0.140)    1.96   9.69   0.292 (0.083)
      LASSO      1.97   4.86   0.233 (0.063)    2.00   4.66   0.193 (0.054)
      SCAD       1.90   4.54   0.206 (0.079)    1.96   4.25   0.151 (0.063)
      MCP        1.88   4.32   0.220 (0.099)    1.95   3.78   0.155 (0.068)
      BAR        1.73   1.45   0.205 (0.079)    1.93   1.76   0.158 (0.066)
100   Stepwise   1.87   25.96  0.875 (0.277)    1.96   21.61  0.542 (0.137)
      LASSO      1.99   9.83   0.255 (0.079)    2.00   8.67   0.211 (0.064)
      SCAD       1.86   9.55   0.228 (0.103)    1.97   8.12   0.173 (0.081)
      MCP        1.87   8.85   0.299 (0.133)    1.95   7.32   0.189 (0.095)
      BAR        1.75   2.99   0.253 (0.101)    1.91   3.27   0.196 (0.082)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   1.13   0.278 (0.086)    2.00   1.25   0.225 (0.074)
      SCAD       1.94   0.85   0.225 (0.080)    1.97   0.86   0.209 (0.070)
      MCP        1.94   0.84   0.220 (0.076)    1.97   0.77   0.204 (0.067)
      BAR        1.89   0.29   0.229 (0.082)    1.95   0.45   0.214 (0.067)
30    LASSO      2.00   3.06   0.311 (0.096)    2.00   2.85   0.276 (0.069)
      SCAD       1.90   2.80   0.248 (0.099)    1.99   2.53   0.209 (0.069)
      MCP        1.91   2.69   0.258 (0.109)    1.99   2.12   0.207 (0.063)
      BAR        1.85   0.91   0.254 (0.102)    1.98   1.08   0.214 (0.065)
50    LASSO      1.98   4.78   0.337 (0.082)    2.00   4.13   0.278 (0.059)
      SCAD       1.90   4.64   0.284 (0.092)    2.00   3.55   0.214 (0.069)
      MCP        1.93   4.32   0.292 (0.115)    1.98   3.25   0.214 (0.067)
      BAR        1.82   2.07   0.373 (0.578)    1.97   1.48   0.221 (0.063)
100   LASSO      1.99   9.44   0.343 (0.092)    1.99   8.14   0.312 (0.069)
      SCAD       1.94   9.50   0.298 (0.121)    1.98   8.04   0.241 (0.079)
      MCP        1.90   8.45   0.346 (0.137)    1.99   7.19   0.249 (0.082)
      BAR        1.83   2.72   0.309 (0.113)    1.99   3.12   0.262 (0.079)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.92   1.03   0.649 (0.134)    3.97   1.10   0.588 (0.109)
      SCAD       3.60   0.81   0.591 (0.132)    3.81   0.87   0.530 (0.120)
      MCP        3.39   0.58   0.586 (0.137)    3.67   0.64   0.533 (0.125)
      BAR        3.24   0.25   0.615 (0.138)    3.61   0.37   0.552 (0.126)
30    LASSO      3.89   2.79   0.705 (0.127)    3.99   2.58   0.643 (0.112)
      SCAD       3.56   2.62   0.630 (0.139)    3.76   2.59   0.557 (0.121)
      MCP        3.38   2.27   0.614 (0.136)    3.67   2.07   0.540 (0.110)
      BAR        3.15   0.75   0.630 (0.140)    3.64   0.96   0.547 (0.109)
50    LASSO      3.90   4.49   0.754 (0.119)    3.98   4.40   0.663 (0.108)
      SCAD       3.51   4.30   0.679 (0.132)    3.70   4.38   0.578 (0.122)
      MCP        3.45   3.82   0.660 (0.144)    3.57   3.60   0.555 (0.123)
      BAR        3.08   1.19   0.673 (0.138)    3.57   1.68   0.566 (0.111)
100   LASSO      3.84   9.20   0.793 (0.139)    3.97   7.62   0.695 (0.086)
      SCAD       3.38   9.15   0.724 (0.157)    3.64   7.80   0.616 (0.094)
      MCP        3.05   8.04   0.742 (0.191)    3.52   6.50   0.600 (0.100)
      BAR        3.00   2.57   0.735 (0.172)    3.53   2.99   0.607 (0.101)
Table 2. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 300 for (II).

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      2.00   0.52   0.174 (0.047)
      SCAD       1.99   0.38   0.129 (0.048)
      MCP        1.99   0.32   0.127 (0.044)
      BAR        1.96   0.08   0.132 (0.047)
30    LASSO      2.00   1.47   0.197 (0.053)
      SCAD       1.99   1.20   0.151 (0.055)
      MCP        1.99   1.12   0.133 (0.049)
      BAR        1.92   0.31   0.150 (0.057)
50    LASSO      2.00   1.96   0.204 (0.048)
      SCAD       2.00   1.99   0.159 (0.054)
      MCP        1.96   1.57   0.140 (0.050)
      BAR        1.96   0.29   0.144 (0.048)
100   LASSO      2.00   4.01   0.215 (0.058)
      SCAD       2.00   1.99   0.159 (0.054)
      MCP        1.98   3.60   0.149 (0.057)
      BAR        1.97   0.70   0.159 (0.053)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   0.55   0.277 (0.055)
      SCAD       1.99   0.35   0.218 (0.053)
      MCP        1.99   0.28   0.213 (0.050)
      BAR        1.95   0.10   0.222 (0.055)
30    LASSO      2.00   1.40   0.304 (0.070)
      SCAD       1.98   1.07   0.232 (0.064)
      MCP        1.99   1.14   0.216 (0.062)
      BAR        1.95   0.29   0.231 (0.057)
50    LASSO      2.00   2.25   0.320 (0.061)
      SCAD       2.00   1.94   0.238 (0.062)
      MCP        2.00   1.85   0.227 (0.057)
      BAR        1.98   0.34   0.233 (0.056)
100   LASSO      2.00   3.76   0.325 (0.060)
      SCAD       1.99   3.82   0.251 (0.067)
      MCP        1.98   3.42   0.236 (0.062)
      BAR        1.94   0.68   0.249 (0.061)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.97   0.40   0.673 (0.102)
      SCAD       3.71   0.33   0.599 (0.100)
      MCP        3.58   0.23   0.594 (0.090)
      BAR        3.32   0.06   0.634 (0.106)
30    LASSO      3.99   1.43   0.731 (0.106)
      SCAD       3.77   1.20   0.645 (0.105)
      MCP        3.61   1.30   0.600 (0.107)
      BAR        3.44   0.26   0.626 (0.106)
50    LASSO      3.99   2.05   0.751 (0.092)
      SCAD       3.74   2.22   0.648 (0.103)
      MCP        3.63   1.81   0.618 (0.103)
      BAR        3.41   0.38   0.624 (0.112)
100   LASSO      3.96   3.55   0.775 (0.081)
      SCAD       3.76   3.77   0.692 (0.102)
      MCP        3.64   3.25   0.627 (0.118)
      BAR        3.38   0.55   0.644 (0.112)
Table 3. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 300 for (III).

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      2.00   0.63   0.165 (0.057)
      SCAD       1.98   0.42   0.125 (0.054)
      MCP        1.92   0.77   0.139 (0.064)
      BAR        1.95   0.14   0.128 (0.053)
30    LASSO      2.00   1.82   0.191 (0.057)
      SCAD       1.98   1.44   0.142 (0.058)
      MCP        1.90   2.68   0.187 (0.092)
      BAR        1.94   0.37   0.139 (0.055)
50    LASSO      2.00   2.83   0.200 (0.046)
      SCAD       1.99   2.31   0.153 (0.058)
      MCP        1.88   4.32   0.220 (0.099)
      BAR        1.96   0.42   0.137 (0.050)
100   LASSO      2.00   5.25   0.214 (0.055)
      SCAD       1.95   5.45   0.171 (0.071)
      MCP        1.96   1.57   0.140 (0.050)
      BAR        1.92   1.04   0.161 (0.066)

β = (0.5, 0.7, 0, …, 0)
10    LASSO      2.00   0.78   0.261 (0.068)
      SCAD       1.98   0.53   0.204 (0.060)
      MCP        1.98   0.47   0.202 (0.056)
      BAR        1.94   0.13   0.211 (0.063)
30    LASSO      2.00   1.80   0.275 (0.067)
      SCAD       2.00   1.20   0.217 (0.064)
      MCP        2.00   1.42   0.201 (0.065)
      BAR        1.95   0.31   0.222 (0.064)
50    LASSO      2.00   2.54   0.305 (0.062)
      SCAD       1.95   2.36   0.223 (0.066)
      MCP        1.97   2.25   0.220 (0.061)
      BAR        1.96   0.58   0.225 (0.063)
100   LASSO      2.00   4.8    0.231 (0.081)
      SCAD       1.97   4.45   0.231 (0.066)
      MCP        1.97   4.70   0.158 (0.063)
      BAR        1.91   0.90   0.232 (0.070)

β = (0.5, 0.5, 0.5, 0.5, 0, …, 0)
10    LASSO      3.96   0.58   0.618 (0.110)
      SCAD       3.74   0.40   0.553 (0.108)
      MCP        3.63   0.32   0.544 (0.104)
      BAR        3.43   0.11   0.578 (0.114)
30    LASSO      3.96   1.36   0.694 (0.106)
      SCAD       3.77   1.79   0.588 (0.104)
      MCP        3.59   1.07   0.571 (0.109)
      BAR        3.42   0.32   0.594 (0.113)
50    LASSO      3.97   2.19   0.702 (0.096)
      SCAD       3.69   2.54   0.627 (0.095)
      MCP        3.63   1.94   0.576 (0.102)
      BAR        3.39   0.58   0.600 (0.106)
100   LASSO      3.96   4.93   0.738 (0.093)
      SCAD       3.63   4.79   0.642 (0.102)
      MCP        3.57   4.70   0.612 (0.099)
      BAR        3.35   1.17   0.615 (0.117)
Table 4. Simulation results based on the joint selection of the tuning parameter and bandwidth for n = 100 or n = 5000 with p = 5.

                                n = 100                         n = 5000
β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)
5     LASSO      1.81   0.68   0.234 (0.104)    2.00   0.74   0.084 (0.065)
      SCAD       1.60   0.58   0.220 (0.103)    2.00   0.36   0.077 (0.018)
      MCP        1.55   0.50   0.218 (0.107)    2.00   0.30   0.076 (0.019)
      BAR        1.21   0.14   0.267 (0.117)    2.00   0.03   0.079 (0.021)
Table 5. Simulation results based on the joint selection of the tuning parameter and bandwidth with n = 100 in the presence of a noncontinuous covariate.

β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)
10    LASSO      1.98   1.25   0.188 (0.074)
      SCAD       1.93   1.04   0.156 (0.076)
      MCP        1.90   0.94   0.155 (0.078)
      BAR        1.81   0.38   0.162 (0.085)
30    LASSO      1.99   3.42   0.229 (0.077)
      SCAD       1.93   3.07   0.199 (0.087)
      MCP        1.89   2.69   0.199 (0.093)
      BAR        1.82   1.26   0.200 (0.090)
50    LASSO      1.99   5.60   0.232 (0.074)
      SCAD       1.92   5.32   0.202 (0.086)
      MCP        1.90   4.83   0.218 (0.092)
      BAR        1.79   1.61   0.206 (0.087)
Table 6. Comparison of the proposed method to the left endpoint imputation method.

                           Proposed Method                 Left Endpoint Imputation
β = (0.5, 0.5, 0, …, 0)
p     Method     TP     FP     MMSE (SD)        TP     FP     MMSE (SD)
5     LASSO      1.81   0.68   0.234 (0.104)    1.89   0.10   0.313 (0.086)
      SCAD       1.60   0.58   0.220 (0.103)    1.58   0.06   0.305 (0.084)
      MCP        1.55   0.50   0.218 (0.107)    1.52   0.13   0.313 (0.071)
      BAR        1.21   0.14   0.268 (0.117)    0.97   0.02   0.385 (0.089)
Table 7. Analysis results of children's mortality data based on choice (I); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.234 (0.110)   0.248 (0.110)   0.248 (0.132)   0.162 (0.134)
BMI       0.000 (0.132)   0.000 (0.111)   0.000 (0.154)   0.000 (0.110)
HOSP      0.000 (0.123)   0.000 (0.130)   0.000 (0.118)   0.000 (0.094)
GENDER    0.000 (0.109)   0.000 (0.108)   0.000 (0.102)   0.000 (0.088)
EDU       0.092 (0.119)   0.115 (0.143)   0.098 (0.146)   0.000 (0.115)
URBAN     0.000 (0.126)   0.000 (0.122)   0.000 (0.101)   0.000 (0.120)
Table 8. Analysis results of children's mortality data based on choice (II); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.238 (0.109)   0.252 (0.115)   0.252 (0.105)   0.167 (0.090)
BMI       0.000 (0.108)   0.000 (0.123)   0.000 (0.118)   0.000 (0.100)
HOSP      0.000 (0.131)   0.000 (0.107)   0.000 (0.144)   0.000 (0.109)
GENDER    0.000 (0.121)   0.000 (0.119)   0.000 (0.122)   0.000 (0.119)
EDU       0.110 (0.102)   0.133 (0.161)   0.133 (0.118)   0.000 (0.105)
URBAN     0.000 (0.127)   0.000 (0.114)   0.000 (0.104)   0.000 (0.091)
Table 9. Analysis results of children's mortality data based on choice (III); entries are estimates with bootstrap standard errors in parentheses.

Factor    LASSO           SCAD            MCP             BAR
AGE       0.243 (0.093)   0.257 (0.111)   0.257 (0.105)   0.162 (0.134)
BMI       0.000 (0.119)   0.000 (0.127)   0.000 (0.154)   0.000 (0.110)
HOSP      0.000 (0.145)   0.000 (0.148)   0.000 (0.118)   0.000 (0.094)
GENDER    0.000 (0.123)   0.000 (0.101)   0.000 (0.102)   0.000 (0.088)
EDU       0.128 (0.097)   0.149 (0.109)   0.149 (0.119)   0.000 (0.115)
URBAN     0.000 (0.117)   0.000 (0.118)   0.000 (0.105)   0.000 (0.120)