Article

A Partitioning-Based Approach to Variable Selection in WLW Model for Multivariate Survival Data

Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China
* Author to whom correspondence should be addressed.
Axioms 2025, 14(5), 348; https://doi.org/10.3390/axioms14050348
Submission received: 28 March 2025 / Revised: 28 April 2025 / Accepted: 30 April 2025 / Published: 30 April 2025

Abstract

In this paper, we propose a new variable selection method using a partitioning-based estimating equation for multivariate survival data to simultaneously perform variable selection and parameter estimation. The main idea of the partitioning-based estimating equation is to partition the score function into small blocks. We construct our method using the SCAD penalty function and achieve the purpose of directly selecting variables through the estimating equation. We further establish asymptotic normality and prove that our method achieves the oracle property. Moreover, we use a simple approximation of the penalty function such that our method can be implemented algorithmically. We conducted simulation studies to validate the performance of our method and analyzed the dataset from the Colon Cancer Study.

1. Introduction

Biomedical research often involves multivariate survival data, such as cancer patients facing local recurrence and repeated hospitalizations in the context of chronic disease management. A significant characteristic of these data is that the survival times of the same individual are correlated. Theoretical development is relatively slow due to the complexity of the dependency structure among survival times, and researchers have mainly focused on modeling the marginal distributions of survival times (see Liang [1], Lin [2], and Spiekerman [3]), where the dependence is not specified.
The most widely used approach is the WLW method [4]. Suppose that there is a random sample of $n$ subjects. Let $T_{ki}$ be the $k$th survival time of the $i$th subject, where $k = 1, \ldots, K$ and $i = 1, \ldots, n$. In the WLW method, the marginal hazard function of $T_{ki}$ is assumed to take the form
$$\lambda_{ki}(t \mid Z_{ki}) = \lambda_{k0}(t) \exp\{\beta_{k0}^T Z_{ki}(t)\}, \quad t \ge 0,$$
where $Z_{ki}$ is a $p$-dimensional, possibly time-dependent, covariate vector, $\lambda_{k0}(t)$ is the unspecified baseline hazard function, and $\beta_{k0}$ represents the true values of the unknown regression coefficients. Let $\beta_0 = (\beta_{10}^T, \ldots, \beta_{K0}^T)^T$ and let $C_{ki}$ be the censoring time. Let $\tilde T_{ki} = T_{ki} \wedge C_{ki}$ and $\delta_{ki} = I(T_{ki} \le C_{ki})$. Assume that $T_{ki}$ and $C_{ki}$ are independent given the covariates $Z_{ki}$. Then, the marginal partial likelihood ([5,6]) is
$$L_k(\beta_k) = \prod_{i=1}^{n} \left\{ \frac{\exp\{\beta_k^T Z_{ki}(\tilde T_{ki})\}}{\sum_{l \in R_k(\tilde T_{ki})} \exp\{\beta_k^T Z_{kl}(\tilde T_{ki})\}} \right\}^{\delta_{ki}},$$
and the corresponding score function is
$$\frac{\partial \log L_k(\beta_k)}{\partial \beta_k} = \sum_{i=1}^{n} \int_0^{\infty} \{ Z_{ki}(t) - \bar Z_k(\beta_k, t) \} \, dN_{ki}(t),$$
where $R_k(t) = \{ i : \tilde T_{ki} \ge t \}$ is the risk set at $t$, $Y_{ki}(t) = I(\tilde T_{ki} \ge t)$, $N_{ki}(t) = \delta_{ki} I(\tilde T_{ki} \le t)$, and
$$\bar Z_k(\beta_k, t) = \frac{\sum_{i=1}^{n} Z_{ki}(t) Y_{ki}(t) \exp\{\beta_k^T Z_{ki}(t)\}}{\sum_{i=1}^{n} Y_{ki}(t) \exp\{\beta_k^T Z_{ki}(t)\}}.$$
Thus, we have $K$ sets of estimating equations as follows:
$$\sum_{i=1}^{n} \int_0^{\infty} \{ Z_{ki}(t) - \bar Z_k(\beta_k, t) \} \, dN_{ki}(t) = 0, \quad k = 1, 2, \ldots, K.$$
Solving these equations gives the estimators $\hat\beta_1, \ldots, \hat\beta_K$. Let $\hat\beta_W = (\hat\beta_1^T, \ldots, \hat\beta_K^T)^T$, which is known as the WLW estimator.
Cui [7] proposed a new method to improve the efficiency of the WLW method, which is called the partition method or partition-estimating equation in this paper. The main idea of the method is to partition the score function into small blocks. Cui [7] further explored the situation where the number of blocks increases with the sample size and established the asymptotic normality of the resulting estimators. The method makes use of the dependency information among survival times, and the simulation results showed that it performed better than the WLW method.
In practice, it is difficult for investigators to identify significant covariates when the number of covariates is large, and variable selection studies increasingly involve the analysis of survival data with high-dimensional covariates to address this difficulty. Tibshirani [8] proposed the application of the $L_1$ penalty function (LASSO) in the Cox model; Zou [9] proposed the adaptive LASSO (AdpLASSO), which Zhang [10] studied in the Cox model; and Zhang [11] proposed the minimax concave penalty. Several studies have focused on variable selection for multivariate survival data. For example, Cai [12] proposed a variable selection method for a growing number of regression coefficients; Liu [13] proposed a multivariate varying-coefficient hazard model; and Sun [14] developed a variable selection technique for multivariate interval-censored data.
Fan and Li [15] proposed a new type of penalty function called Smoothly Clipped Absolute Deviation (SCAD). The SCAD method combines characteristics of the LASSO and least squares. It compresses the coefficients of the model through a penalty function such that some coefficients are compressed to 0, thereby achieving variable selection, and the larger coefficients can also achieve asymptotically unbiased estimates. Moreover, Fan and Li proposed the oracle property and then introduced the SCAD penalty function into the Cox model [16].
In this paper, we aim to further improve the partition method to propose a new variable selection method for multivariate survival data. Based on the partition method, we make better use of the dependency information among survival times compared to the WLW method. Moreover, we directly achieve the purpose of variable selection using estimating equations. We construct our method with the SCAD penalty function and prove that the obtained estimators possess the oracle property. Numerical studies show that the proposed method performs well.
The rest of this paper is organized as follows. In Section 2, we present the notation and assumptions. In Section 3, we introduce our method and present its asymptotic and oracle properties. We address implementation issues in Section 4, while simulations and an application of our method to real data are given in Section 5 and Section 6, respectively. The proofs are given in Appendix A.

2. Notation and Assumptions

The quantities $T_{ki}$, $Z_{ki}(t)$, $C_{ki}$, $\tilde T_{ki}$, $N_{ki}(t)$, $Y_{ki}(t)$, $\lambda_{ki}(t \mid Z_{ki})$, and $\lambda_{k0}(t)$ are the same as in Section 1. To facilitate the notation, let
$$M_{ki}(t) = N_{ki}(t) - \int_0^t Y_{ki}(u) \lambda_{k0}(u) \exp\{\beta_{k0}^T Z_{ki}(u)\} \, du, \quad i = 1, \ldots, n, \ k = 1, \ldots, K,$$
which is a martingale with respect to the $\sigma$-filtration $\mathcal{F}_{t,ki} = \sigma\{ N_{ki}(s), Y_{ki}(s), Z_{ki}(s) : 0 \le s \le t \}$.
For $k = 1, \ldots, K$, let
$$\begin{aligned}
S_k^{(d)}(\beta_k, t) &= \frac{1}{n} \sum_{i=1}^{n} Y_{ki}(t) Z_{ki}(t)^{\otimes d} \exp\{\beta_k^T Z_{ki}(t)\}, & s_k^{(d)}(\beta_k, t) &= E\, S_k^{(d)}(\beta_k, t), \\
\bar Z_k(\beta_k, t) &= \frac{S_k^{(1)}(\beta_k, t)}{S_k^{(0)}(\beta_k, t)}, & \mu_k(\beta_k, t) &= \frac{s_k^{(1)}(\beta_k, t)}{s_k^{(0)}(\beta_k, t)}, \\
V_k(\beta_k, t) &= \frac{S_k^{(2)}(\beta_k, t)}{S_k^{(0)}(\beta_k, t)} - \bar Z_k(\beta_k, t)^{\otimes 2}, & v_k(\beta_k, t) &= \frac{s_k^{(2)}(\beta_k, t)}{s_k^{(0)}(\beta_k, t)} - \mu_k(\beta_k, t)^{\otimes 2},
\end{aligned}$$
where $E$ denotes expectation. For a column vector $\alpha$, $\alpha^{\otimes 0} = 1$, $\alpha^{\otimes 1} = \alpha$, and $\alpha^{\otimes 2} = \alpha \alpha^T$.
The preliminary estimators of the baseline cumulative hazard functions are given by
$$\hat\Lambda_{k0}(t) = \int_0^t \frac{dN_{k\cdot}(u)}{n S_k^{(0)}(\hat\beta_k, u)},$$
where $N_{k\cdot}(t) = \sum_{i=1}^{n} N_{ki}(t)$. We obtain the following estimated martingales:
$$\hat M_{ki}(t) = N_{ki}(t) - \int_0^t Y_{ki}(u) \exp\{\hat\beta_k^T Z_{ki}(u)\} \, d\hat\Lambda_{k0}(u), \quad i = 1, \ldots, n, \ k = 1, \ldots, K,$$
where $\hat\beta_1, \ldots, \hat\beta_K$ are the WLW estimators.
We consider only events up to $\tau$ such that $\Pr(\tilde T_k > \tau) = 0$ for all $k$. Let
$$U_k(\beta_k, t) = \sum_{i=1}^{n} \int_0^t \{ Z_{ki}(u) - \bar Z_k(\beta_k, u) \} \, dN_{ki}(u), \quad t \in [0, \tau].$$
Cui [7] introduced a partition. For the $k$th event, $[0, \tau]$ is partitioned into $L_k$ intervals as follows:
$$0 = t_0^{(k)} < t_1^{(k)} < \cdots < t_{L_k}^{(k)} = \tau.$$
Let $L = L_1 + \cdots + L_K$, and define the partition $\Pi$ as follows:
$$\Pi = \{ (t_{i_1}^{(1)}, \ldots, t_{i_K}^{(K)}) : t_{i_k}^{(k)} \ge 0, \ i_k = 0, 1, \ldots, L_k, \ 1 \le k \le K \}.$$
Following Cui [7], we break $U_k(\beta_k, t)$ into $L_k$ pieces as follows:
$$\Delta U_\Pi^{(k)}(\beta_k) = \left( \Delta U_\Pi^{(k)}(\beta_k, t_1^{(k)})^T, \ldots, \Delta U_\Pi^{(k)}(\beta_k, t_{L_k}^{(k)})^T \right)^T,$$
where, for $l = 1, \ldots, L_k$,
$$\Delta U_\Pi^{(k)}(\beta_k, t_l^{(k)}) = U_\Pi^{(k)}(\beta_k, t_l^{(k)}) - U_\Pi^{(k)}(\beta_k, t_{l-1}^{(k)}) = \sum_{i=1}^{n} \int_{t_{l-1}^{(k)}}^{t_l^{(k)}} \{ Z_{ki}(t) - \bar Z_k(\beta_k, t) \} \, dN_{ki}(t).$$
Let $\Delta U_\Pi = ( \Delta U_\Pi^{(1)}(\beta_1)^T, \ldots, \Delta U_\Pi^{(K)}(\beta_K)^T )^T$. Cui [7] introduced the following notation:
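The block decomposition of the score can be illustrated with a minimal sketch for a single event type. It assumes time-fixed covariates, strictly positive failure times no larger than the last cut-point, and intervals that are closed on the right; the function names `z_bar` and `score_blocks` and the toy data are hypothetical and are not taken from Cui [7].

```python
import math
from bisect import bisect_left


def z_bar(beta, t, times, Z):
    """Risk-set weighted covariate mean Z-bar(beta, t) for time-fixed covariates."""
    num = [0.0] * len(beta)
    den = 0.0
    for t_i, z_i in zip(times, Z):
        if t_i >= t:  # Y_i(t) = 1: subject i is still at risk at time t
            w = math.exp(sum(b * z for b, z in zip(beta, z_i)))
            den += w
            num = [a + w * z for a, z in zip(num, z_i)]
    return [a / den for a in num]


def score_blocks(beta, times, events, Z, cuts):
    """Split the score into one block per interval (cuts[l], cuts[l + 1]]."""
    blocks = [[0.0] * len(beta) for _ in range(len(cuts) - 1)]
    for t_i, d_i, z_i in zip(times, events, Z):
        if d_i == 1:  # only observed failures contribute to dN_i(t)
            l = bisect_left(cuts, t_i) - 1  # index of the interval containing t_i
            zb = z_bar(beta, t_i, times, Z)
            blocks[l] = [u + z - m for u, z, m in zip(blocks[l], z_i, zb)]
    return blocks


# Toy data for one event type (hypothetical values).
times = [0.5, 1.2, 0.8, 2.0, 1.5]
events = [1, 1, 0, 1, 1]
Z = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
beta = [0.1, -0.2]

blocks = score_blocks(beta, times, events, Z, [0.0, 1.0, 2.0])
full = score_blocks(beta, times, events, Z, [0.0, 2.0])[0]
total = [sum(block[j] for block in blocks) for j in range(len(beta))]
```

With a single interval the function returns the ordinary WLW score, and summing the blocks of any finer partition recovers it, which is why the partition keeps all of the information in the score while exposing the per-interval pieces.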
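$$\Delta\hat\Psi_\Pi(\hat\beta) = -\frac{1}{n} \left. \frac{\partial \Delta U_\Pi(\beta)}{\partial \beta^T} \right|_{\beta = \hat\beta} =
\begin{pmatrix}
\Delta\hat\Psi_{11}(\hat\beta_1) & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
\Delta\hat\Psi_{1L_1}(\hat\beta_1) & 0 & \cdots & 0 \\
0 & \Delta\hat\Psi_{21}(\hat\beta_2) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & \Delta\hat\Psi_{2L_2}(\hat\beta_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \Delta\hat\Psi_{K1}(\hat\beta_K) \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \Delta\hat\Psi_{KL_K}(\hat\beta_K)
\end{pmatrix}_{pL \times pK},$$
$$\begin{aligned}
\Delta\hat\Psi_{kl}(\hat\beta_k) &= \Delta\hat\Psi_\Pi(\hat\beta_k, t_l^{(k)}) - \Delta\hat\Psi_\Pi(\hat\beta_k, t_{l-1}^{(k)}) = \int_{t_{l-1}^{(k)}}^{t_l^{(k)}} V_k(\hat\beta_k, u) S_k^{(0)}(\hat\beta_k, u) \, d\hat\Lambda_{k0}(u), \\
\Delta\hat\Sigma_\Pi(\hat\beta) &= \left( \hat b_{ll'}^{(kj)}(\hat\beta) \right)_{1 \le l \le L_k, \ 1 \le l' \le L_j, \ 1 \le k, j \le K}, \\
\hat b_{ll'}^{(kj)}(\hat\beta) &= \frac{1}{n} \sum_{i=1}^{n} \int_{t_{l-1}^{(k)}}^{t_l^{(k)}} \{ Z_{ki}(t) - \bar Z_k(\hat\beta_k, t) \} \, d\hat M_{ki}(t) \int_{t_{l'-1}^{(j)}}^{t_{l'}^{(j)}} \{ Z_{ji}(t) - \bar Z_j(\hat\beta_j, t) \}^T \, d\hat M_{ji}(t), \quad \hat\beta = \hat\beta_W.
\end{aligned}$$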
Cui [7] focused on the situation in which the partition $\Pi_n$ varies with the sample size $n$. Let
$$\Delta t_{\Pi_n}^{(k)} = \max_{1 \le l \le L_k} \{ t_l^{(k)} - t_{l-1}^{(k)} \},$$
such that
$$\lim_{n \to \infty} \Delta t_{\Pi_n}^{(k)} = 0, \quad k = 1, \ldots, K.$$
In addition, Cui [7] proposed the following estimating equations:
$$\Delta\hat\Psi_{\Pi_n}(\hat\beta_W)^T \, \Delta\hat\Sigma_{\Pi_n}(\hat\beta_W)^{-1} \, \Delta U_{\Pi_n}(\beta) = 0.$$
We impose the following conditions:
  • $\Pr( Y_{ki}(t) = 1, \ \forall t \in [0, \tau] ) > 0$, $i = 1, \ldots, n$, $k = 1, \ldots, K$.
  • $|Z_{kij}(0)| + \int_0^{\tau} |dZ_{kij}(t)| < B$ a.s. for $j = 1, \ldots, p$, $i = 1, \ldots, n$, $k = 1, \ldots, K$, and some constant $B < \infty$.
  • λ k 0 ( t ) is a continuous function of t [ 0 , τ ] , and there exist constants C 1 > 0 and γ > 0 such that λ k 0 ( t ) C t γ , t [ 0 , τ ] .
  • There exists a neighborhood $\mathcal{B}$ of $\beta_0$ such that, for $d = 0, 1, 2$ and $k = 1, \ldots, K$,
    $$\sup_{t \in [0, \tau], \ \beta \in \mathcal{B}} \left\| S_k^{(d)}(\beta_k, t) - s_k^{(d)}(\beta_k, t) \right\| = O_p\!\left( \frac{1}{\sqrt{n}} \right).$$
  • $s_k^{(d)}(\beta_k, t)$ ($k = 1, \ldots, K$; $d = 0, 1, 2$) is a continuous function of $\beta \in \mathcal{B}$ uniformly in $t \in [0, \tau]$ and is bounded on $\mathcal{B} \times [0, \tau]$; $s_k^{(0)}(\beta_k, t) \ge s_0 > 0$ ($k = 1, \ldots, K$) for all $\beta \in \mathcal{B}$ and $t \in [0, \tau]$, where $s_0$ is a constant; and
    $$s_k^{(1)}(\beta_k, t) = \frac{\partial}{\partial \beta_k} s_k^{(0)}(\beta_k, t), \quad s_k^{(2)}(\beta_k, t) = \frac{\partial^2}{\partial \beta_k^2} s_k^{(0)}(\beta_k, t),$$
    for $k = 1, \ldots, K$, $\beta \in \mathcal{B}$, and $t \in [0, \tau]$.
  • $v_k(\beta_k, t)$ ($k = 1, \ldots, K$) is a positive definite matrix on $\mathcal{B} \times [0, \tau]$,
    $$\inf_{\mathcal{B} \times [0, \tau]} \lambda_{\min}( v_k(\beta_k, t) ) = \lambda_{\min}^{(k)} > 0,$$
    where $\lambda_{\min}(\cdot)$ represents the minimum eigenvalue of a matrix.
  • For all sufficiently large $n$, $\Delta\hat\Sigma_{\Pi_n}(\hat\beta_W)^{-1}$ exists. We use $\eta_n$ to denote the partition index for the partition $\Pi_n$; $\eta_n$ is an increasing positive sequence, and there exist constants $C_3 > C_2 > 0$ such that
    $$C_2 \eta_n \le \min\{ L_1, \ldots, L_K \} \le \max\{ L_1, \ldots, L_K \} \le C_3 \eta_n,$$
    where we assume that $\eta_n \to \infty$ and $\eta_n^{4 + 2\gamma} / n \to 0$.
  • Let $\Delta\Psi_\Pi(\beta) = -\frac{1}{n} \frac{\partial \Delta U_\Pi(\beta)}{\partial \beta^T}$ and $\Delta\Sigma_\Pi(\beta) = ( b_{ll'}^{(kj)}(\beta) )_{1 \le l \le L_k, \ 1 \le l' \le L_j, \ 1 \le k, j \le K}$. Let
    $$A(\beta) = \Delta\Psi_{\Pi_n}^T(\beta) \, \Delta\Sigma_{\Pi_n}^{-1}(\beta) \, \Delta\Psi_{\Pi_n}(\beta),$$
    and assume that there exists a constant $C_4$ such that
    $$\lambda_{\max}\{ A(\beta) \} < C_4 < \infty.$$
  • Assume that the penalty function $p_\lambda(|\beta_j|)$ satisfies
    $$\lim_{n \to \infty} \liminf_{\beta_j \to 0^+} p'_\lambda(\beta_j) / \lambda > 0$$
    for all $j = 1, \ldots, d_n$. Furthermore, we assume that there exists a constant $C_5$ such that, for nonzero $\theta_1$ and $\theta_2$, $| p'_\lambda(\theta_1) - p'_\lambda(\theta_2) | \le C_5 | \theta_1 - \theta_2 |$.
Remark 1.
Conditions 1 and 2 are also used by Andersen et al. [17]. Conditions 4–8, which are adapted from Cai et al. [18] and Cui et al. [7], guarantee the asymptotic normality of the penalized partition estimator. Condition 3 is satisfied for most commonly used distributions in survival analysis [19]. Condition 9 is also used by Cai et al. [12].

3. Main Results

3.1. Construction of Estimators

We introduce the penalized partition-estimating equation (PPEE) as follows:
$$\Delta\hat\Psi_{\Pi_n}(\hat\beta_W)^T \, \Delta\hat\Sigma_{\Pi_n}(\hat\beta_W)^{-1} \, \Delta U_{\Pi_n}(\beta) - n \, p'_\lambda(|\beta|) \, \mathrm{sgn}(\beta) = 0, \tag{6}$$
where $p'_\lambda(|\beta|) = ( p'_\lambda(|\beta_1|), \ldots, p'_\lambda(|\beta_d|) )^T$, and $p_\lambda(\cdot)$ is a penalty function. We consider the differential form of the SCAD penalty proposed by Fan and Li [15,20], defined by
$$p'_\lambda(|\theta|) = \lambda \left\{ I(|\theta| < \lambda) + \frac{(a\lambda - |\theta|)_+}{(a - 1)\lambda} I(|\theta| \ge \lambda) \right\},$$
for some $a > 2$.
Then, we can obtain the estimators $\hat\beta$ by solving the penalized partition-estimating Equation (6). The SCAD penalty function involves two tuning parameters, $a$ and $\lambda$. We explain how to choose these parameters and how to obtain the estimators $\hat\beta$ in Section 4.
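The derivative of the SCAD penalty is simple to evaluate directly. The following minimal sketch implements the formula above for a scalar argument; the function name `scad_derivative` is hypothetical, and the default $a = 3.7$ is the value adopted later in Section 4.3.

```python
def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lambda(|theta|) of the SCAD penalty, for a > 2 and lam > 0."""
    t = abs(theta)
    if t < lam:
        return lam  # constant LASSO-like rate for small coefficients
    # lam * (a*lam - t)_+ / ((a - 1) * lam) simplifies to (a*lam - t)_+ / (a - 1)
    return max(a * lam - t, 0.0) / (a - 1.0)


small = scad_derivative(0.5, lam=1.0)   # inside |theta| < lambda
middle = scad_derivative(2.0, lam=1.0)  # between lambda and a * lambda
large = scad_derivative(4.0, lam=1.0)   # beyond a * lambda: no penalty gradient
```

The three regimes show the behavior described in Section 1: small coefficients are shrunk at the constant rate $\lambda$, the shrinkage tapers off linearly, and coefficients larger than $a\lambda$ are not shrunk at all.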

3.2. Asymptotic and Oracle Properties of the Proposed Estimator

Fan and Li proposed the oracle property [21], which means that the estimator has the same limiting distribution as an estimator that knows the true model a priori. In this section, we will provide the asymptotic properties for β ^ and show that it achieves the oracle property.
We consider the situation mentioned by Cai [12], where the number of regression coefficients varies with the sample size $n$, that is, $\beta = (\beta_1, \ldots, \beta_{d_n})^T$, where $d_n \to \infty$ as $n \to \infty$ and $d_n^4 / n \to 0$. Let $\beta_0 = (\beta_{01}, \ldots, \beta_{0 d_n})^T$ denote the true value of $\beta$. Furthermore, we let $\beta_{10}$ and $\beta_{20}$, respectively, denote the nonzero and zero components of $\beta_0$. Without loss of generality, we write $\beta_0 = (\beta_{10}^T, \beta_{20}^T)^T$ and suppose that $\beta_{0j} \ne 0$ for $j \le s_n$ and $\beta_{0j} = 0$ for $j > s_n$, which means that $\beta_{10}$ consists of the $s_n$ nonzero components of $\beta_0$. Let
$$a_n = \max\{ | p'_{\lambda_n}(|\beta_{0j}|) | : \beta_{0j} \ne 0 \},$$
and
$$b_n = \max\{ | p''_{\lambda_n}(|\beta_{0j}|) | : \beta_{0j} \ne 0 \}.$$
In this section, we use $\lambda_n$ instead of $\lambda$ to emphasize that $\lambda$ depends on $n$. For simplicity, we let $f(\beta) = \Delta\hat\Psi_{\Pi_n}(\hat\beta_W)^T \, \Delta\hat\Sigma_{\Pi_n}(\hat\beta_W)^{-1} \, \Delta U_{\Pi_n}(\beta)$ and $g(\beta) = f(\beta) - n \, p'_{\lambda_n}(|\beta|) \, \mathrm{sgn}(\beta)$.
Remark 2.
Here, we discuss the compatibility of the given conditions. We first require the SCAD penalty function to possess the following property: for any fixed nonzero $\theta$, $\lim_{n \to \infty} \sqrt{n / d_n} \, p'_{\lambda_n}(|\theta|) = 0$ and $\lim_{n \to \infty} p''_{\lambda_n}(|\theta|) = 0$. This can be satisfied by appropriately choosing the regularization parameter $\lambda_n$. If we choose $\lambda_n \to 0$ and $\sqrt{n / d_n} \, \lambda_n \to \infty$, the above property holds because $\sqrt{n / d_n} \, p'_{\lambda_n}(|\theta|) = p''_{\lambda_n}(|\theta|) = 0$ for $\theta \ne 0$ once $n$ is large enough that $|\theta| > a \lambda_n$; thus, $a_n \to 0$ and $b_n \to 0$ can be satisfied. Furthermore, for any given constant $M$, $\inf_{|\theta| \le M n^{-1/2}} p'_{\lambda_n}(|\theta|) = \lambda_n$ for sufficiently large $n$, which means that condition (I) can be satisfied. Therefore, the conditions given in the upcoming theorem do not contradict each other.
Based on Cui’s research [7], we have the following theorem and lemma:
Theorem 1.
Under conditions 1–8, let $\{ \Pi_n \}$ be a partition sequence that satisfies the following condition:
$$\lim_{n \to \infty} \Delta t_{\Pi_n}^{(k)} = 0, \quad k = 1, \ldots, K.$$
Then, there exists a matrix $W$ such that
$$\lim_{n \to \infty} \Delta\Psi_{\Pi_n}(\beta_0)^T \, \Delta\Sigma_{\Pi_n}(\beta_0)^{-1} \, \Delta\Psi_{\Pi_n}(\beta_0) = W,$$
and
$$\frac{1}{\sqrt{n}} f(\beta_0) \xrightarrow{d} N(0, W).$$
Lemma 1.
Under conditions 1–8, we have
$$\frac{1}{n} f'(\beta) + A(\beta) = o_p(1).$$
The above theorem and lemma were proven by Cui [7], and the proofs are not repeated in this paper. With Lemma 1, we can obtain the following theorem, which shows that there exists a penalized partition estimate that converges at the rate $O_p\{ \sqrt{d_n} ( n^{-1/2} + a_n ) \}$.
Theorem 2.
Under conditions 1–8, if $a_n \to 0$, $b_n \to 0$, $d_n^4 / n \to 0$, and $\eta_n^{4 + 2\gamma} / n \to 0$ as $n \to \infty$, there exists an approximate zero-crossing $\hat\beta$ of $g(\beta)$ such that $\| \hat\beta - \beta_0 \| = O_p\{ \sqrt{d_n} ( n^{-1/2} + a_n ) \}$.
From Theorem 2, if $a_n = O(n^{-1/2})$, which can be achieved by selecting an appropriate $\lambda_n$, then there exists a root-$(n / d_n)$-consistent approximate zero-crossing of $g(\beta)$. Let
$$\Sigma = \mathrm{diag}\{ p''_{\lambda_n}(|\beta_{01}|), \ldots, p''_{\lambda_n}(|\beta_{0 s_n}|) \},$$
and
$$b = \left( p'_{\lambda_n}(|\beta_{01}|) \, \mathrm{sgn}(\beta_{01}), \ldots, p'_{\lambda_n}(|\beta_{0 s_n}|) \, \mathrm{sgn}(\beta_{0 s_n}) \right)^T.$$
Then, we can obtain Theorem 3, which shows that the proposed estimator achieves the oracle property.
Theorem 3.
Under conditions 1–9, if $a_n \to 0$, $b_n \to 0$, $d_n^4 / n \to 0$, and $\eta_n^{4 + 2\gamma} / n \to 0$ as $n \to \infty$, and if $\lambda_n \to 0$, $\sqrt{n / d_n} \, \lambda_n \to \infty$, and $a_n = O(n^{-1/2})$, then, with probability tending to 1, the root-$(n / d_n)$-consistent approximate zero-crossing $\hat\beta = (\hat\beta_1^T, \hat\beta_2^T)^T$ in Theorem 2 satisfies the following:
1. (Sparsity) $\hat\beta_2 = 0$;
2. (Asymptotic normality)
$$\sqrt{n} \, (A_{11} + \Sigma) \left\{ \hat\beta_1 - \beta_{10} + (A_{11} + \Sigma)^{-1} b \right\} \xrightarrow{d} N(0, W_{11}),$$
where $A_{11}$ and $W_{11}$ are the first $s_n \times s_n$ sub-matrices of $A(\beta_{10}, 0)$ and $W(\beta_{10}, 0)$, respectively.
Remark 3.
The two theorems above indicate that, with the SCAD penalty function, for which $a_n = 0$, $b = 0$, and $\Sigma = 0$ for sufficiently large $n$, we have
$$\sqrt{n} \, A_{11} ( \hat\beta_1 - \beta_{10} ) \xrightarrow{d} N(0, W_{11}).$$
Thus, $\hat\beta_1$ possesses the same sampling property as the oracle estimate. The oracle estimator knows in advance which components of $\beta$ are zero, and $\hat\beta_2 = 0$ agrees with it. Hence, the penalized partition estimator that we propose achieves the oracle property.

4. Implementation

4.1. Solution of Penalized Partition-Estimating Equation

In Section 3, we constructed the penalized partition-estimating equation. Here, we describe how to solve it. First, we need to establish a reasonable partition $\Pi_n$. We use the same number of intervals for each event, that is, $L_1 = \cdots = L_K = L$. For the penalized partition-estimating equation to be effective, each interval of the partition must contain a certain number of event (failure) times. Hence, a reasonable partition method is as follows: for the $k$th event, sort the failure times of the subjects in increasing order, and use the $n/L$-th, $2n/L$-th, $\ldots$, $(L-1)n/L$-th failure times as the cut-points. If $jn/L$ is not an integer ($j = 1, \ldots, L-1$), round it to an integer.
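The cut-point rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name `partition_cut_points` is hypothetical, the count used in the ranks is taken to be the number of observed failure times passed in, ties among failure times are assumed absent, and Python's built-in `round` (which rounds halves to the nearest even integer) stands in for the unspecified rounding rule.

```python
def partition_cut_points(failure_times, L, tau):
    """Cut-points 0 = t_0 < t_1 < ... < t_L = tau from ranks of the failure times."""
    ordered = sorted(failure_times)
    m = len(ordered)
    cuts = [0.0]
    for j in range(1, L):
        # 1-based rank j*m/L, rounded when it is not an integer and kept in [1, m]
        rank = max(1, min(m, round(j * m / L)))
        cuts.append(ordered[rank - 1])
    cuts.append(tau)
    return cuts


# Twelve hypothetical failure times split into L = 4 intervals.
example_times = [float(i) for i in range(1, 13)]
example_cuts = partition_cut_points(example_times, L=4, tau=13.0)
```

Because the cut-points are order statistics of the failure times, every interval receives roughly the same number of failures, which is the property the partition is meant to guarantee.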
As the derivative of the SCAD penalty, $\frac{\partial}{\partial \beta} p_\lambda(|\beta|) = p'_\lambda(|\beta|) \, \mathrm{sgn}(\beta)$, is discontinuous at 0, we need an approximate differential penalty function $\tilde p'_\lambda(\theta)$. We rewrite the penalized partition-estimating equation as follows:
$$f(\beta) - n \, \tilde p'_\lambda(|\beta|) \, \mathrm{sgn}(\beta) = 0. \tag{10}$$
Then, we need to solve the equation above to obtain the penalized partition estimator. In this study, we use the gradient descent method to solve (10).

4.2. Abnormal Condition Handling Within Zero Neighborhood

We need to address the issue mentioned above, that is, the approximation of the differential penalty function when a component $\beta_j$ of $\beta$ approaches 0 (abnormal condition handling within the zero neighborhood). In practice, for a very small $\epsilon$, $0 < \epsilon \ll \lambda$, it can be assumed that $\beta = 0$ if $\beta \in (-\epsilon, \epsilon)$. For the SCAD penalty $p_\lambda(|\beta|)$, the derivative is discontinuous at 0. We use a linear function in a small neighborhood $(-\epsilon, \epsilon)$ of 0 to obtain the approximate differential penalty function $\tilde p'_\lambda$:
$$\tilde p'_\lambda(|\beta|) = \lambda \left\{ \frac{|\beta|}{\epsilon} I(|\beta| < \epsilon) + I(\epsilon \le |\beta| < \lambda) + \frac{(a\lambda - |\beta|)_+}{(a - 1)\lambda} I(|\beta| \ge \lambda) \right\}.$$
This approximation has little impact on the SCAD penalty function. Since there are random errors in the data and model, it will not have a significant impact on the model’s performance as long as the impact of the small neighborhood ( ϵ , ϵ ) on the SCAD penalty does not exceed the random errors.
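A minimal sketch of the approximated derivative and of the penalty term that enters the estimating equation is given below. The function names are hypothetical, and the default $a = 3.7$ follows Section 4.3; the only change from the exact SCAD derivative is the linear ramp on $|\beta| < \epsilon$.

```python
def scad_derivative_approx(beta_j, lam, eps, a=3.7):
    """Approximate SCAD derivative: linear on |beta_j| < eps, exact SCAD elsewhere."""
    t = abs(beta_j)
    if t < eps:
        return lam * t / eps  # ramp from 0 up to lam removes the jump at 0
    if t < lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1.0)


def penalty_term(beta_j, lam, eps, a=3.7):
    """Signed penalty term: approximate derivative times sgn(beta_j)."""
    sign = (beta_j > 0) - (beta_j < 0)
    return scad_derivative_approx(beta_j, lam, eps, a) * sign


inside_pos = penalty_term(0.005, lam=1.0, eps=0.01)
inside_neg = penalty_term(-0.005, lam=1.0, eps=0.01)
at_zero = penalty_term(0.0, lam=1.0, eps=0.01)
```

The signed term now passes continuously through 0 at $\beta_j = 0$ instead of jumping between $-\lambda$ and $\lambda$, which is what makes an iterative solver applicable.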
However, there are still issues with this approximation. When $\beta_j \in (-\epsilon, 0) \cup (0, \epsilon)$, although $\beta_j$ can be approximated as 0 in the analysis of the estimation, the variation in $\beta_j$ may cause significant variation in $g(\beta) = f(\beta) - n \, \tilde p'_\lambda(|\beta|) \, \mathrm{sgn}(\beta)$, thus resulting in variation in the components of $\beta$ other than $\beta_j$, which makes the zero point unstable. If we set all $\beta_j \in (-\epsilon, \epsilon)$ to 0 and prohibit their variation, then $g(\beta)$ remains stable, but another deficiency arises: all $\beta_j$ close to 0 can no longer escape $(-\epsilon, \epsilon)$ once they fall into this interval. This can cause a deviation in the zero point and can even make it impossible to solve for the zero point. Therefore, we introduce the following Algorithm 1 to solve this problem.
Algorithm 1 Abnormal condition handling within zero neighborhood
  • Input: $\beta$, $g$
  • Output: $\beta$, $g$
  $I_j = I(\beta_j \in (-\epsilon, \epsilon))$, $j = 1, \ldots, d$; $I = (I_1, \ldots, I_d)$
      // $I$ is a vector indicating whether each component of $\beta$ is within a small neighborhood of 0.
  $\Delta\beta = \alpha g$
      // $\Delta\beta$ denotes the variation in $\beta$ in the gradient descent method.
  $M = \mathrm{which}(I == 1)$
      // $M$ denotes the indices of the components of $\beta$ that fall into the small neighborhood of 0.
  if $M \ne \mathrm{NULL}$ then
      if $\max_{j \in M} |\beta_j - \Delta\beta_j| < \epsilon$ then
          $\beta_j = 0$, $j \in M$
          $g(\beta) = f(\beta) - n \, \tilde p'_\lambda(|\beta|) \, \mathrm{sgn}(\beta)$
          $g_j = 0$, $j \in M$
      end if
  end if
  return $\beta$, $g$
This method creates a small neighborhood with “attraction” around 0. When $\beta_j$ falls into this neighborhood, it is temporarily “attracted” to 0. If $g(\beta)$ has a large value in this component, so that the component can leave this special zero neighborhood in the next step, then the component is not approximated as 0. Otherwise, when none of the components “attracted” to 0 can leave this special zero neighborhood in the next step, we set these components to 0, recalculate $g(\beta)$, and set the corresponding components of $g$ to 0, so that the next gradient descent step leaves these components unchanged.
Condition handling within the zero neighborhood was also mentioned by Fan and Li [21]. They used a quadratic function to approximate the penalty function, resulting in the situation where coefficients are forced to be 0 and can no longer leave the small neighborhood [21]. Our proposed method for abnormal condition handling within the zero neighborhood can solve this problem by setting an appropriate threshold for the “attraction” near 0.
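The zero-neighborhood check can be sketched as a single function. This is a sketch under stated assumptions, not the authors' code: the function name and the callable `g_fn` (which recomputes $g$ at a given $\beta$) are hypothetical, and the candidate next value of each component is assumed to be $\beta_j - \Delta\beta_j$ with $\Delta\beta = \alpha g$; if the update is written as $\beta + \alpha g$ instead, the sign in the test should be flipped.

```python
def handle_zero_neighborhood(beta, g, g_fn, alpha, eps):
    """Snap components to 0 only if none of those near 0 would escape (-eps, eps)."""
    near_zero = [j for j, b in enumerate(beta) if -eps < b < eps]  # index set M
    delta = [alpha * g_j for g_j in g]  # proposed change in beta
    if near_zero and max(abs(beta[j] - delta[j]) for j in near_zero) < eps:
        beta = [0.0 if j in near_zero else b for j, b in enumerate(beta)]
        g = g_fn(beta)  # recompute g at the snapped beta
        g = [0.0 if j in near_zero else g_j for j, g_j in enumerate(g)]
    return beta, g


# Case 1: the small component would stay near 0, so it is snapped to 0.
g_stay = lambda b: [b[0] - 1.0, b[1]]
beta_a, g_a = handle_zero_neighborhood(
    [1.0, 0.001], g_stay([1.0, 0.001]), g_stay, alpha=0.1, eps=0.01
)

# Case 2: a large g pushes the component out of the neighborhood, so nothing changes.
g_escape = lambda b: [b[0] - 1.0, -5.0]
beta_b, g_b = handle_zero_neighborhood(
    [1.0, 0.005], g_escape([1.0, 0.005]), g_escape, alpha=0.1, eps=0.01
)
```

Zeroing the corresponding entries of $g$ freezes the snapped components for the next step only; because the check is repeated at every iteration, a component can still leave 0 later if its entry of $g$ becomes large enough.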

4.3. Tuning Parameter Selection

To achieve effective variable selection when solving (10), appropriate regularization parameters must first be selected. In their simulations, Fan and Li found that, when $a \approx 3.7$, the SCAD method provided the best variable selection and coefficient estimation performance in penalized least-squares estimation [21]. They also pointed out that, from a Bayesian statistical point of view, $a \approx 3.7$ is a suitable choice, as the Bayes risk cannot be reduced much with other choices of $a$. Subsequently, many scholars have continued to use $a \approx 3.7$ in the field of multivariate survival analysis (see Cai [12], Liu [13], and Cai [22]). In this paper, we also take $a \approx 3.7$. Therefore, only the value of $\lambda$ remains to be chosen.
Next, we use k-fold cross-validation (k-fold CV) to select λ . Specifically, we use 10 folds. As it is hard to determine the concrete representation of the objective function of our method, we cannot establish the cross-validation statistic. However, we can apply the 10-fold CV method to the WLW method with the SCAD penalty and select the appropriate regularization parameter from it. The basis for doing so is that the WLW method is a special case of partition estimation when the number of partitions L is set to 1. Therefore, the estimators obtained using the WLW method and the partition-estimating equation have values and errors on the same scale. Hence, it is reasonable to assume that the corresponding optimal regularization parameters have values of similar scale and will not have significant differences.

5. Simulation Study

In this section, we describe our evaluation of the performance of the proposed method based on the results of simulation studies. Raftery [23,24] introduced a bivariate exponential model in 1984. Based on their model, we generate bivariate survival data as follows:
$$T_1 = \frac{(1 - q_1) Y_1 - I_1 \log(U)}{\exp(\beta_1^T Z)}, \quad T_2 = \frac{(1 - q_2) Y_2 - I_2 \log(1 - U)}{\exp(\beta_2^T Z)},$$
where the covariate $Z$ is a $p$-dimensional 0–1 vector that indicates the presence or absence of the feature corresponding to each variable, $T_1$ and $T_2$ are the bivariate failure times, $Y_1$ and $Y_2$ follow a standard exponential distribution, $U \sim U(0, 1)$, $I_1$ and $I_2$ are independent binary random variables with $P(I_i = 1) = q_i$ ($i = 1, 2$), and $q_1$ and $q_2$ are pre-set parameters used to adjust the correlation between the two exponential failure times. In addition, $Y_1$, $Y_2$, $U$, $I_1$, and $I_2$ are independent.
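The data-generating mechanism can be sketched with the standard library alone. The function name `simulate_pair`, the seed, and the fixed covariate vector below are illustrative assumptions; the distribution of $Z$ across subjects is not specified in the text, so the sketch takes $Z$ as given.

```python
import math
import random


def simulate_pair(beta1, beta2, z, q1, q2, rng):
    """Draw one pair (T1, T2) of correlated failure times for covariate vector z."""
    y1 = rng.expovariate(1.0)  # Y1 ~ standard exponential
    y2 = rng.expovariate(1.0)  # Y2 ~ standard exponential
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); keep log(u) finite
        u = rng.random()
    i1 = 1 if rng.random() < q1 else 0  # P(I1 = 1) = q1
    i2 = 1 if rng.random() < q2 else 0  # P(I2 = 1) = q2
    e1 = (1.0 - q1) * y1 - i1 * math.log(u)        # shared U links the two times
    e2 = (1.0 - q2) * y2 - i2 * math.log(1.0 - u)
    t1 = e1 / math.exp(sum(b * x for b, x in zip(beta1, z)))
    t2 = e2 / math.exp(sum(b * x for b, x in zip(beta2, z)))
    return t1, t2


rng = random.Random(0)
zero = [0.0, 0.0, 0.0]
draws = [simulate_pair(zero, zero, [1, 0, 1], 0.98, 0.98, rng) for _ in range(20000)]
mean_t1 = sum(t1 for t1, _ in draws) / len(draws)
mean_t2 = sum(t2 for _, t2 in draws) / len(draws)
```

Each numerator is itself standard exponential, since mixing $(1 - q) Y$ with an independent exponential term that is present with probability $q$ reproduces the exponential Laplace transform. The marginal model is therefore a Cox model with unit baseline hazard, and with all coefficients equal to 0 both sample means should be close to 1.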

5.1. Different Numbers of Partitions

This section presents simulation experiments to evaluate the performance of the proposed method in parameter estimation for different numbers of partitions $L$. We considered settings with $p = 8$, $n = 1000$, and $q_1 = q_2 = q = 0.98$. In each simulation, each component of $\beta_1$ and $\beta_2$ took the values $(-0.5, -0.25, 0, 0.25, 0.5)$ with probabilities $(\frac{1}{6}, \frac{1}{6}, \frac{1}{3}, \frac{1}{6}, \frac{1}{6})$. No censoring was applied in this experiment, and the simulation was repeated 1000 times. Figure 1 shows the simulation results for different numbers of partitions. When the number of partitions is 1, the method reduces to the WLW method. The results show that the larger the number of partitions, the smaller the MSE of parameter estimation, indicating that the performance of our method improves as the number of partitions grows. Initially, as the number of partitions increases, the MSE of parameter estimation decreases significantly, and the advantage of the proposed method expands rapidly. When the number of partitions is already large, further increases reduce the MSE more slowly, and the improvement in parameter estimation becomes marginal.

5.2. Different Correlations Between Various Events in Multivariate Survival Data

This section presents simulation experiments to evaluate the performance of the proposed method in parameter estimation and variable selection for different correlations between the events in multivariate survival data. We considered settings with $p = 8$, $n = 200, 1000$, and $L = 8$. Let the true values be $\beta_0 = (0.5, 0.25, 0, 0, 0, 0, 0.25, 0, 0, 0.5, 0, 0.25, 0, 0, 0, 0)^T$. In a simulation study, Cui [7] observed that the partition-estimating function method was superior to the WLW method when there was a significant correlation between the events in multivariate survival data. We let $q_1 = q_2 = q = 0.98, 0.8, 0.25$. Censoring times were generated from a uniform distribution over $(0, c)$, and we chose $c = 1, 5$ to change the censoring rate. Each configuration had 1000 replications.
We assess the performance of our method using the model error (ME), similar to Fan and Li [16]:
$$\mathrm{ME} = E\{ \exp(\hat\beta^T Z) - \exp(\beta_0^T Z) \}^2.$$
In addition, we use the oracle estimator $\hat\beta_{OR}$ to define the relative model error (RME) as follows:
$$\mathrm{RME} = \frac{\mathrm{ME}(\hat\beta_{OR})}{\mathrm{ME}(\hat\beta)}.$$
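Both criteria can be estimated by averaging over a sample of covariate vectors. The sketch below is one way to do this, assuming the expectation in ME is approximated by a sample mean over draws of $Z$; the function names and the toy inputs are hypothetical.

```python
import math


def model_error(beta_hat, beta_0, Z_sample):
    """Sample-mean estimate of E{exp(beta_hat^T Z) - exp(beta_0^T Z)}^2."""
    total = 0.0
    for z in Z_sample:
        fitted = math.exp(sum(b * x for b, x in zip(beta_hat, z)))
        truth = math.exp(sum(b * x for b, x in zip(beta_0, z)))
        total += (fitted - truth) ** 2
    return total / len(Z_sample)


def relative_model_error(beta_hat, beta_oracle, beta_0, Z_sample):
    """RME: model error of the oracle estimator divided by that of beta_hat."""
    return model_error(beta_oracle, beta_0, Z_sample) / model_error(beta_hat, beta_0, Z_sample)


# Toy check with two covariate vectors and a zero true coefficient vector.
Z_sample = [[1, 0], [0, 1]]
beta_0 = [0.0, 0.0]
me_hat = model_error([math.log(2.0), 0.0], beta_0, Z_sample)
rme = relative_model_error([math.log(2.0), 0.0], [math.log(1.5), 0.0], beta_0, Z_sample)
```

With this orientation of the ratio, an RME close to 1 indicates that the estimator performs about as well as the oracle, while smaller values indicate a larger model error than the oracle.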
We compared our method (PPEE) with the WLW-with-LASSO method and the WLW-with-SCAD method. The results for 1000 simulated datasets are given in Table 1. The column labeled “RME” provides the median of the 1000 RMEs, the column labeled “C” reports the average number of coefficients correctly estimated as 0, and the column labeled “IC” presents the average number of coefficients erroneously estimated as 0. From Table 1, we can see that our method performs well in terms of variable selection. When there is a high correlation between $T_1$ and $T_2$, our method performs particularly well. When the correlation is weak, its performance is only slightly better than that of WLW+SCAD but still much better than that of WLW+LASSO. This indicates that our method performs better when the correlation between $T_1$ and $T_2$ is greater. In practical analysis, there is usually some correlation between failure times in multivariate survival data, which is the advantage of our method over classical methods. In addition, none of the methods sets the coefficients of significant variables to 0, so no under-fitting occurs; this also holds when the sample size $n$ is large.

6. The Colon Cancer Study

In this section, we report the results of applying our method to the dataset collected in the Colon Cancer Study [25]. This study was initiated in the 1980s and included 929 patients with stage C disease randomly assigned to observation, levamisole alone, or levamisole combined with fluorouracil. The observation (Obs), levamisole alone (Lev), and levamisole combined with fluorouracil (Lev + 5-FU) groups comprise 315, 310, and 304 patients, respectively. There are multiple failure outcomes, such as the time to cancer recurrence and the survival time. By the end of the study, 155 patients in Obs, 144 in Lev, and 103 in Lev + 5-FU had experienced recurrences, and 114, 109, and 78 had died, respectively. We are interested in the following risk factors: sex, where 1 is male and 0 is female; age; obstruction of the colon by the tumor (obstruct); adherence to nearby organs (adhere); differentiation of the tumor (differ); perforation of the colon (perfor); number of lymph nodes with detectable cancer (nodes); extent of local spread (extent); more than four positive lymph nodes (node4), coded as 1 if true and 0 otherwise; and time from surgery to registration (surg), coded as 1 for long and 0 for short. In addition, similar to Lin [2], we created two dummy variables as follows: Lev, coded as 1 for the levamisole-alone treatment group and 0 for others, and Lev + 5FU, coded as 1 for the levamisole combined with the fluorouracil treatment group and 0 otherwise.
Table 2 shows the estimated coefficients and standard errors for the Colon Cancer Study using different methods, including the unpenalized method (UNM), LASSO, and our method (PPEE). From Table 2, we can see that our method only keeps five significant variables. In the column “LASSO”, we observe that certain variables may have an impact on one failure event but not on another. According to our method, “Lev + 5FU”, “Extent”, and “Node4” all have a significant impact on the two failure events. “Lev + 5FU” has a negative impact on both death and recurrence, which is consistent with Moertel’s study [26]; that is, levamisole combined with fluorouracil is effective in reducing the mortality rate of colon cancer. In addition, Lin [2] found that “extent” and “node4” increased the risk of colon cancer, and our method is consistent with this.

7. Discussion and Conclusions

We proposed a penalized partition-estimating equation for variable selection in multivariate survival data. The partition-estimating equation was originally proposed by Cui [7]. We extended Cui’s method and proposed the penalized partition-estimating equation to simultaneously estimate the parameters and select variables. Compared with the classic Cox regression method, our method selects variables and estimates coefficients more effectively, as reflected in our simulation experiments. Our method also makes use of the dependency information among survival times and performs better when the correlation between failure times is greater. This is an advantage over classical methods, as failure times in multivariate survival data are usually correlated in practical analysis. Finally, we proved the asymptotic and oracle properties of the proposed method.
Future studies can supplement this work. In this study, our method performed well when failure events have a strong correlation. Therefore, one future task is to consider whether our method still performs well when there is no obvious correlation between events and, if not, explore how this problem can be solved. Another interesting task is ultrahigh-dimensional variable selection.

Author Contributions

Methodology, W.C.; Validation, W.C. and W.T.; formal analysis, W.C. and W.T.; writing—original draft preparation, W.T.; writing—review and editing, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China: 71873128.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proofs

For simplicity, we let $\nabla F(\beta) = f(\beta)$ and $G(\beta) = F(\beta) - n\sum_{j=1}^{d_n} p_{\lambda}(|\beta_j|)$.
Proof of Theorem 2.
To prove Theorem 2, it is sufficient to show that $\hat{\beta}$ is a local maximizer of $G(\beta)$. To prove this, let $\alpha_n = \sqrt{d_n}\,(n^{-1/2} + a_n)$; it is sufficient to show that for any $\epsilon > 0$, there exists a large constant $C$ such that
$$P\Big\{\sup_{\|u\| = C} G(\beta_0 + \alpha_n u) < G(\beta_0)\Big\} \ge 1 - \epsilon. \tag{A1}$$
This means that the probability of a local maximum existing in the ball $\{\beta_0 + \alpha_n u : \|u\| \le C\}$ is at least $1 - \epsilon$. Therefore, a local maximizer exists such that $\|\hat{\beta} - \beta_0\| = O_p(\alpha_n)$, and $\hat{\beta}$ is a zero-crossing of $g(\beta)$.
Note that $p_{\lambda}(0) = 0$ and $p_{\lambda}(\cdot) \ge 0$; then,
$$L_n(u) = G(\beta_0 + \alpha_n u) - G(\beta_0) \le \{F(\beta_0 + \alpha_n u) - F(\beta_0)\} - n\sum_{j=1}^{s_n}\{p_{\lambda_n}(|\beta_{0j} + \alpha_n u_j|) - p_{\lambda_n}(|\beta_{0j}|)\} \equiv I_1 + I_2.$$
Firstly, we consider $I_1$. Let
$$\nabla f(\beta) = \frac{\partial}{\partial \beta^{T}} f(\beta).$$
According to the Taylor expansion,
$$I_1 = \alpha_n u^{T} f(\beta_0) + \frac{1}{2}\alpha_n^2 u^{T} \nabla f(\beta_n^{*})\, u \equiv I_{11} + I_{12},$$
where $\beta_n^{*}$ lies between $\beta_0$ and $\beta_0 + \alpha_n u$. According to the Cauchy–Schwarz inequality,
$$|I_{11}| = |\alpha_n u^{T} f(\beta_0)| \le \alpha_n \|f(\beta_0)\|\,\|u\| = O_p(\alpha_n\sqrt{n d_n})\,\|u\| = O_p(n\alpha_n^2)\,\|u\|.$$
Next, we consider $I_{12}$. We can obtain
$$f(\beta) = n\,\Delta\hat{\Psi}_{\Pi_n}(\hat{\beta}_W)^{T}\,\Delta\hat{\Sigma}_{\Pi_n}(\hat{\beta}_W)^{-1}\,\Delta\hat{\Psi}_{\Pi_n}(\beta).$$
From Lemma 1, we can obtain
$$\Big\|\frac{1}{n}\nabla f(\beta) + A(\beta)\Big\| = o_p(1). \tag{A2}$$
Consequently,
$$I_{12} = -\frac{1}{2} n\alpha_n^2\, u^{T} A(\beta_0)\, u\,\{1 + o_p(1)\}.$$
Thus, $I_{12}$ dominates $I_{11}$ uniformly in $\|u\| = C$ when choosing a sufficiently large $C$. Then, using the Taylor expansion, we can obtain
$$I_2 = -\sum_{j=1}^{s_n}\Big[n\alpha_n p'_{\lambda_n}(|\beta_{0j}|)\,\mathrm{sgn}(\beta_{0j})\,u_j + n\alpha_n^2 p''_{\lambda_n}(|\beta_{0j}|)\,u_j^2\{1 + o(1)\}\Big] \equiv -I_{21} - I_{22}.$$
Using the Cauchy–Schwarz inequality,
$$|I_{21}| \le \sqrt{s_n}\, n\alpha_n a_n \|u\| \le \sqrt{d_n}\, n\alpha_n a_n \|u\| \le n\alpha_n^2 \|u\|.$$
Furthermore,
$$|I_{22}| = n\alpha_n^2\,\Big|\sum_{j=1}^{s_n} p''_{\lambda_n}(|\beta_{0j}|)\,u_j^2\{1 + o(1)\}\Big| \le 2 b_n n\alpha_n^2 \|u\|^2.$$
Here $b_n \to 0$, and therefore, we can choose a sufficiently large $C$ so that $I_{12}$ also dominates both $I_{21}$ and $I_{22}$. This means that (A1) holds, and the proof is complete. □
To prove Theorem 3, we first introduce a lemma.
Lemma A1.
Under the conditions of Theorem 3, with probability tending to 1, for any given $\beta_1$ satisfying $\|\beta_1 - \beta_{10}\| = O_p(\sqrt{d_n/n})$ and any constant $C$, the following holds:
$$G\{(\beta_1^{T}, 0)^{T}\} = \max_{\|\beta_2\| \le C\sqrt{d_n/n}} G\{(\beta_1^{T}, \beta_2^{T})^{T}\}.$$
Proof. 
It is sufficient to show that for any $\beta_1$ and $\beta_2$ satisfying the conditions above, $\partial G(\beta)/\partial \beta_j$ and $\beta_j$ have different signs for all $\beta_j \in (-C\sqrt{d_n/n},\, C\sqrt{d_n/n})$, $j = s_n + 1, \ldots, d_n$. We have
$$\frac{\partial G(\beta)}{\partial \beta_j} = \frac{\partial F(\beta)}{\partial \beta_j} - n p'_{\lambda_n}(|\beta_j|)\,\mathrm{sgn}(\beta_j).$$
Using the Taylor expansion, we have
$$\frac{\partial F(\beta)}{\partial \beta_j} = \frac{\partial F(\beta_0)}{\partial \beta_j} - n\sum_{l=1}^{d_n} A_{jl}(\beta_0)(\beta_l - \beta_{0l})\{1 + o_p(1)\} \equiv I - II,$$
where $A_{jl}(\beta_0)$ is the $(j, l)$-element of $A(\beta_0)$. According to the standard argument, it follows that
$$I = O_p(\sqrt{n}) = O_p(\sqrt{n d_n}).$$
Next, we consider $II$. According to the Cauchy–Schwarz inequality, it follows that
$$|II| \le n\Big\{\sum_{l=1}^{d_n} A_{jl}^2(\beta_0)\Big\}^{1/2} \|\beta - \beta_0\|,$$
and then we have
$$\sum_{l=1}^{d_n} A_{jl}^2(\beta_0) = e_j^{T} A(\beta_0) A(\beta_0)^{T} e_j \le \lambda_{\max}^2\{A(\beta_0)\} = O(1),$$
where $e_j$ is a $d_n \times 1$ vector whose $j$th element is 1 and whose other elements are 0. Since $\|\beta_1 - \beta_{10}\| = O_p(\sqrt{d_n/n})$ and $\|\beta_2\| \le C\sqrt{d_n/n}$, we can obtain $\|\beta - \beta_0\| = O_p(\sqrt{d_n/n})$. Hence,
$$II = O_p(\sqrt{n d_n}), \qquad \frac{\partial F(\beta)}{\partial \beta_j} = O_p(\sqrt{n d_n}).$$
Thus,
$$\frac{\partial G(\beta)}{\partial \beta_j} = n\lambda_n\Big[-\lambda_n^{-1} p'_{\lambda_n}(|\beta_j|)\,\mathrm{sgn}(\beta_j) + O_p\big\{1/(\lambda_n\sqrt{n/d_n})\big\}\Big].$$
Given the assumption that $\lambda_n\sqrt{n/d_n} \to \infty$, we have $O_p\{1/(\lambda_n\sqrt{n/d_n})\} = o_p(1)$, so the sign of $\partial G(\beta)/\partial \beta_j$ is determined by, and is opposite to, the sign of $\beta_j$. Thus, we complete the proof. □
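The sign argument in Lemma A1 can be checked numerically for the SCAD penalty. The sketch below is an illustration only: it implements the standard SCAD penalty and its derivative as given by Fan and Li [21], with the commonly used constant $a = 3.7$ taken here as an assumption, and the function names are hypothetical. For $|\beta_j| \le \lambda_n$ the ratio $\lambda_n^{-1} p'_{\lambda_n}(|\beta_j|)$ equals 1, so the penalty term keeps the derivative of $G$ away from zero near the origin, while for $|\beta_j| > a\lambda_n$ the derivative of the penalty vanishes.

```python
import math


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) in the form given by Fan and Li."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lambda(|theta|) with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


lam = 0.5  # illustrative tuning parameter, not a value from the paper
for beta_j in (0.01, 0.3, 1.0, 3.0):
    ratio = scad_derivative(beta_j, lam) / lam
    print(f"beta_j = {beta_j:4.2f}  p'(|beta_j|) / lambda = {ratio:.3f}")
```

Small coefficients receive the full penalty rate and are pushed to zero, whereas large coefficients are not penalized at the margin; this is the behavior that the oracle property relies on.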
Proof of Theorem 3.
We can immediately prove part 1 through Lemma A1. To prove part 2, it is sufficient to show that
$$f_1(\beta_0) - n(A_{11} + \Sigma)\big\{\hat{\beta}_1 - \beta_{10} + (A_{11} + \Sigma)^{-1} b\big\} + o_p(\sqrt{n}) = 0, \tag{A4}$$
where $f_1(\beta_0)$ denotes the first $s_n$ components of $f(\beta_0)$. Then,
$$\sqrt{n}\,(A_{11} + \Sigma)\big\{\hat{\beta}_1 - \beta_{10} + (A_{11} + \Sigma)^{-1} b\big\} = \frac{1}{\sqrt{n}} f_1(\beta_0) + o_p(1).$$
According to condition (F) and Cui [7], we have
$$\frac{1}{\sqrt{n}} f_1(\beta_0) \xrightarrow{d} N(0, W_{11}).$$
Thus, we need to prove that (A4) is valid. By Theorem 2, there exists a $\hat{\beta}_1$ that is a $\sqrt{n/d_n}$-consistent approximate zero-crossing of $g\{(\beta_1^{T}, 0)^{T}\}$, and it satisfies
$$\partial G\{(\hat{\beta}_1^{T}, 0)^{T}\}/\partial \beta_j = 0, \quad j = 1, \ldots, s_n.$$
Applying the Taylor expansion to $\partial G\{(\hat{\beta}_1^{T}, 0)^{T}\}/\partial \beta_1$ at $\beta_{10}$, it follows that
$$f_1(\beta_0) + \nabla f_1(\beta_1^{*})(\hat{\beta}_1 - \beta_{10}) - n b - n\Sigma^{*}(\hat{\beta}_1 - \beta_{10}) = 0,$$
where $\beta_1^{*}$ lies between $\hat{\beta}_1$ and $\beta_{10}$, and $\Sigma^{*} = \mathrm{diag}\{p''_{\lambda_n}(|\beta_1^{*}|), \ldots, p''_{\lambda_n}(|\beta_{s_n}^{*}|)\}$. From (A2), we have
$$\big\|\{\nabla f_1(\beta_1^{*}) + n A_{11}\}(\hat{\beta}_1 - \beta_{10})\big\| \le n\,\big\|n^{-1}\nabla f_1(\beta_1^{*}) + A_{11}\big\|\,\|\hat{\beta}_1 - \beta_{10}\| \le n\, o_p(1)\, O_p(\sqrt{d_n/n}) = o_p(\sqrt{n/d_n}) = o_p(\sqrt{n}).$$
From condition (I), it follows that
$$\big\|n(\Sigma^{*} - \Sigma)(\hat{\beta}_1 - \beta_{10})\big\| \le n C_5 \|\hat{\beta}_1 - \beta_{10}\|^2 = C_4\, n\, O_p(d_n/n) = O_p(d_n) = o_p(\sqrt{n}).$$
Therefore, (A4) holds. Thus, we complete the proof. □
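One practical reading of this limit is that $\hat{\beta}_1$ is approximately normal around $\beta_{10} - (A_{11} + \Sigma)^{-1} b$ with sandwich covariance $n^{-1}(A_{11} + \Sigma)^{-1} W_{11} (A_{11} + \Sigma)^{-T}$. The sketch below illustrates that calculation for a two-dimensional case. It is an illustration under assumptions, not the authors' implementation: this section does not state how the standard errors in Table 2 were computed, the matrices `M` and `W` are hypothetical stand-ins for $A_{11} + \Sigma$ and $W_{11}$, and all function names are invented for the example.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Multiply two small matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(col) for col in zip(*m)]


def sandwich_cov(M, W, n):
    """Approximate covariance (1/n) * M^{-1} W M^{-T} implied by the limit above."""
    M_inv = inv2(M)
    core = matmul(matmul(M_inv, W), transpose(M_inv))
    return [[v / n for v in row] for row in core]


# Hypothetical inputs, for illustration only (not estimates from the paper).
M = [[2.0, 0.5], [0.5, 1.0]]  # stands in for A11 + Sigma
W = [[1.0, 0.2], [0.2, 0.8]]  # stands in for W11
n = 500
cov = sandwich_cov(M, W, n)
std_errors = [cov[i][i] ** 0.5 for i in range(2)]
print(std_errors)
```

When the penalty derivatives vanish at the true nonzero coefficients, as they do for SCAD once those coefficients exceed $a\lambda_n$ in absolute value, both $b$ and $\Sigma$ are zero and the expression reduces to the unpenalized sandwich form.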

References

  1. Liang, K.Y.; Self, S.G.; Chang, Y.C. Modelling marginal hazards in multivariate failure time data. J. R. Stat. Soc. Ser. B Stat. Methodol. 1993, 55, 441–453.
  2. Lin, D.Y. Cox regression analysis of multivariate failure time data: The marginal approach. Stat. Med. 1994, 13, 2233–2247.
  3. Spiekerman, C.F.; Lin, D.Y. Marginal regression models for multivariate failure time data. J. Am. Stat. Assoc. 1998, 93, 1164–1175.
  4. Wei, L.J.; Lin, D.Y.; Weissfeld, L. Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Am. Stat. Assoc. 1989, 84, 1065–1073.
  5. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B Methodol. 1972, 34, 187–202.
  6. Cox, D.R. Partial likelihood. Biometrika 1975, 62, 269–276.
  7. Cui, W.Q.; Ying, Z.L.; Zhao, L.C. A simple construction of optimal estimation in multivariate marginal Cox regression. Sci. China Math. 2012, 55, 1827–1857.
  8. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 1997, 16, 385–395.
  9. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  10. Zhang, H.H.; Lu, W. Adaptive Lasso for Cox’s proportional hazards model. Biometrika 2007, 94, 691–703.
  11. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  12. Cai, J.W.; Fan, J.Q.; Li, R.Z.; Zhou, H.B. Variable selection for multivariate failure time data. Biometrika 2005, 92, 303–316.
  13. Liu, J.C.; Zhang, R.Q.; Zhao, W.H.; Lv, Y.Z. Variable selection in semiparametric hazard regression for multivariate survival data. J. Multivar. Anal. 2015, 142, 26–40.
  14. Sun, L.; Li, S.; Wang, L.; Song, X.; Sui, X. Simultaneous variable selection in regression analysis of multivariate interval-censored data. Biometrics 2022, 78, 1402–1413.
  15. Fan, J.Q.; Li, R.Z. Variable Selection via Penalized Likelihood; Department of Statistics, UCLA: Los Angeles, CA, USA, 1999.
  16. Fan, J.Q.; Li, R.Z. Variable selection for Cox’s proportional hazards model and frailty model. Ann. Stat. 2002, 30, 74–99.
  17. Andersen, P.K.; Gill, R.D. Cox’s regression model for counting processes: A large sample study. Ann. Stat. 1982, 10, 1100–1120.
  18. Cai, J.W. Hypothesis testing of hazard ratio parameters in marginal models for multivariate failure time data. Lifetime Data Anal. 1999, 5, 39–53.
  19. Cui, W.Q. Analysis of Multivariate Survival Data by Marginal Proportional Hazards Regression Models. Ph.D. Thesis, University of Science and Technology of China, Hefei, China, 2004. (In Chinese).
  20. Fan, J.Q.; Li, R.Z. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Am. Stat. Assoc. 2004, 99, 710–723.
  21. Fan, J.Q.; Li, R.Z. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  22. Cai, K. Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics. Ph.D. Thesis, University of Calgary, Calgary, AB, Canada, 2019.
  23. Raftery, A.E. A continuous multivariate exponential distribution. Commun. Stat. Theory Methods 1984, 13, 947–965.
  24. Raftery, A.E. Some properties of a new continuous bivariate exponential distribution. Stat. Decis. Suppl. Issue 1985, 2, 53–58.
  25. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Goodman, P.J.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.; et al. Levamisole and fluorouracil for adjuvant therapy of resected colon carcinoma. N. Engl. J. Med. 1990, 322, 352–358.
  26. Moertel, C.G.; Fleming, T.R.; Macdonald, J.S.; Haller, D.G.; Laurie, J.A.; Tangen, C.M.; Ungerleider, J.S.; Emerson, W.A.; Tormey, D.C.; Glick, J.H.; et al. Fluorouracil plus levamisole as effective adjuvant therapy after resection of stage III colon carcinoma: A final report. Ann. Intern. Med. 1995, 122, 321–326.
Figure 1. Mean squared error (MSE) of the parameter estimates of β for different numbers of partitions.
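Table 1. Relative model error (RME) and number of zeros (columns C and IC) for c = 1 and c = 5.

| n | q | Method | RME (c = 1) | C (c = 1) | IC (c = 1) | RME (c = 5) | C (c = 5) | IC (c = 5) |
|---|---|---|---|---|---|---|---|---|
| 200 | 0.98 | LASSO | 0.473 | 10.458 | 0.036 | 0.531 | 10.513 | 0.006 |
| 200 | 0.98 | SCAD | 0.677 | 10.703 | 0.024 | 0.745 | 10.783 | 0.004 |
| 200 | 0.98 | PPEE | 0.733 | 10.712 | 0.024 | 0.801 | 10.785 | 0.004 |
| 200 | 0.8 | LASSO | 0.469 | 10.471 | 0.030 | 0.518 | 10.543 | 0.005 |
| 200 | 0.8 | SCAD | 0.691 | 10.710 | 0.030 | 0.753 | 10.770 | 0.006 |
| 200 | 0.8 | PPEE | 0.703 | 10.705 | 0.036 | 0.783 | 10.776 | 0.004 |
| 200 | 0.25 | LASSO | 0.479 | 10.443 | 0.036 | 0.520 | 10.412 | 0.004 |
| 200 | 0.25 | SCAD | 0.682 | 10.702 | 0.026 | 0.747 | 10.773 | 0.002 |
| 200 | 0.25 | PPEE | 0.691 | 10.719 | 0.024 | 0.756 | 10.775 | 0.002 |
| 1000 | 0.98 | LASSO | 0.583 | 10.498 | 0.000 | 0.612 | 10.532 | 0.000 |
| 1000 | 0.98 | SCAD | 0.752 | 10.811 | 0.000 | 0.796 | 10.848 | 0.000 |
| 1000 | 0.98 | PPEE | 0.813 | 10.819 | 0.000 | 0.857 | 10.855 | 0.000 |
| 1000 | 0.8 | LASSO | 0.590 | 10.473 | 0.000 | 0.593 | 10.528 | 0.000 |
| 1000 | 0.8 | SCAD | 0.743 | 10.802 | 0.000 | 0.790 | 10.842 | 0.000 |
| 1000 | 0.8 | PPEE | 0.792 | 10.811 | 0.000 | 0.824 | 10.847 | 0.000 |
| 1000 | 0.25 | LASSO | 0.591 | 10.482 | 0.000 | 0.588 | 10.541 | 0.000 |
| 1000 | 0.25 | SCAD | 0.755 | 10.807 | 0.000 | 0.781 | 10.837 | 0.000 |
| 1000 | 0.25 | PPEE | 0.761 | 10.811 | 0.000 | 0.792 | 10.841 | 0.000 |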
Table 2. Estimated coefficients and standard errors (in parentheses) for the Colon Cancer Study using different methods.

| Event | Effect | UNM | LASSO | SCAD | PPEE |
|---|---|---|---|---|---|
| Recurrence | Lev | −0.026 (0.111) | 0.000 | 0.000 | 0.000 |
| Recurrence | Lev + 5FU | −0.499 (0.122) | −0.441 (0.108) | −0.416 (0.108) | −0.428 (0.107) |
| Recurrence | Sex | −0.138 (0.096) | 0.000 | 0.000 | 0.000 |
| Recurrence | Age | −0.003 (0.004) | 0.000 | 0.000 | 0.000 |
| Recurrence | Obstruct | 0.194 (0.119) | 0.061 (0.095) | 0.050 (0.134) | 0.048 (0.103) |
| Recurrence | Perfor | 0.211 (0.257) | 0.000 | 0.000 | 0.000 |
| Recurrence | Adhere | 0.161 (0.130) | 0.028 (0.137) | 0.028 (0.136) | 0.000 |
| Recurrence | Nodes | 0.038 (0.015) | 0.037 (0.017) | 0.000 | 0.000 |
| Recurrence | Differ | 0.153 (0.098) | 0.118 (0.108) | 0.036 (0.106) | 0.024 (0.105) |
| Recurrence | Extent | 0.451 (0.119) | 0.414 (0.120) | 0.393 (0.119) | 0.532 (0.116) |
| Recurrence | Surg | 0.240 (0.104) | 0.072 (0.110) | 0.084 (0.108) | 0.000 |
| Recurrence | Node4 | 0.591 (0.141) | 0.641 (0.103) | 0.772 (0.146) | 0.751 (0.106) |
| Death | Lev | −0.041 (0.114) | 0.000 | 0.000 | 0.000 |
| Death | Lev + 5FU | −0.362 (0.122) | −0.294 (0.109) | −0.209 (0.108) | −0.226 (0.107) |
| Death | Sex | 0.007 (0.097) | 0.000 | 0.000 | 0.000 |
| Death | Age | 0.008 (0.004) | 0.006 (0.004) | 0.002 (0.004) | 0.000 |
| Death | Obstruct | 0.269 (0.120) | 0.118 (0.135) | 0.098 (0.135) | 0.094 (0.131) |
| Death | Perfor | 0.017 (0.270) | 0.000 | 0.000 | 0.000 |
| Death | Adhere | 0.170 (0.131) | 0.138 (0.145) | 0.130 (0.141) | 0.135 (0.126) |
| Death | Nodes | 0.044 (0.015) | 0.043 (0.014) | 0.000 | 0.000 |
| Death | Differ | 0.138 (0.101) | 0.106 (0.110) | 0.007 (0.106) | 0.003 (0.110) |
| Death | Extent | 0.446 (0.118) | 0.420 (0.114) | 0.377 (0.111) | 0.427 (0.112) |
| Death | Surg | 0.240 (0.106) | 0.021 (0.113) | 0.079 (0.110) | 0.000 |
| Death | Node4 | 0.667 (0.143) | 0.641 (0.128) | 0.657 (0.143) | 0.899 (0.153) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tian, W.; Cui, W. A Partitioning-Based Approach to Variable Selection in WLW Model for Multivariate Survival Data. Axioms 2025, 14, 348. https://doi.org/10.3390/axioms14050348
