Article

Subgroup Identification in Survival Outcome Data Based on Concordance Probability Measurement

1 Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou 510515, China
2 Otsuka Pharmaceutical Development & Commercialization Inc., Rockville, MD 20878, USA
3 Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(13), 2855; https://doi.org/10.3390/math11132855
Submission received: 31 March 2023 / Revised: 11 June 2023 / Accepted: 24 June 2023 / Published: 26 June 2023
(This article belongs to the Special Issue Current Research in Biostatistics)

Abstract

Identifying a subgroup of patients who may have an enhanced treatment effect in a randomized clinical trial has received increasing attention recently. For time-to-event outcomes, it is a challenge to define the effectiveness of a treatment and to choose a cutoff time point for identifying subgroup membership, especially in trials in which the two treatment arms do not differ in overall survival. In this paper, we propose a mixture cure model to identify a subgroup for a new treatment that was compared to a classical treatment (or placebo) in a randomized clinical trial with respect to survival time. Using the concordance probability measurement (K-index), we propose a statistic to test the existence of subgroups with effective treatments in the treatment arm. Subsequently, the subgroup is defined by a limited number of covariates based on the estimated area under the curve (AUC). The performance of this method in different scenarios is assessed through simulation studies. A real data example is also provided for illustration.

1. Introduction

Randomized clinical trials provide conclusive information on treatments by comparing a new treatment with the existing standard treatment, with the aim of increasing the time to failure of treated patients. Although phase II trials provide sufficient information, they often fail to find an effective treatment that benefits all patients. A new treatment is sometimes shown to be slightly better than the standard therapy, but not sufficiently better for the entire population. In such cases, there may be a subgroup of patients for whom the new treatment provides a substantial benefit: the survival time of this subgroup improves significantly under the new treatment, while other patients may experience no improvement or even decreased survival in the treatment arm. In such situations, it is challenging to identify the subgroup and to demonstrate that the benefit is likely to be real for future patients [1].
The motivation for this analysis comes from an AIDS clinical trial in which HIV patients were randomized to a control group receiving monotherapy with didanosine or a treatment group receiving combination therapy with didanosine and zidovudine [2]. In this trial, the effectiveness of the new combination therapy for HIV infection in patients with CD4+ T-cell counts between 200 and 500 per cubic millimeter was evaluated by comparing it with monotherapy. The survival time was defined as the elapsed time from the initial treatment to an event indicating progression to AIDS. A total of 18 demographic and clinical characteristics of patients were considered, and there were no significant differences between the control and treatment groups for any of them (p-value > 0.2), indicating that the two arms were balanced. The log-rank test showed no significant difference between the combination therapy of didanosine and zidovudine and the monotherapy of didanosine (p-value = 0.181, see Figure 1), although the combination therapy was slightly better than the monotherapy for the whole population. Although the covariates were balanced between the monotherapy and combination therapy groups, it is of interest to explore whether there is a subgroup for whom the combination therapy of didanosine and zidovudine is more effective.
The search for subgroups with differential treatment effects is known as subgroup identification (see [3,4,5,6]). When outcomes from competing treatments are observed for all subjects in both arms, subgroup identification can be achieved using classification or supervised learning algorithms (see, for example, [7,8,9]). However, in common clinical trials, each patient receives only one treatment, resulting in an imbalance between the new treatment and placebo or an existing treatment for certain subgroups. Consequently, various statistical approaches to subgroup identification have been developed. Lipkovich and Dmitrienko [10] proposed a two-stage procedure in which a small number of biomarkers with the highest predictive ability, based on an appropriate variable importance score, were selected. Subsequently, subgroups with enhanced treatment effects were identified based on the selected biomarkers. Shen and He [11] introduced a structured logistic-normal mixture model for subgroup analysis. Ballarini et al. [12] provided a comparative analysis of different modeling strategies to estimate the predicted individual treatment effect. Additionally, Lipkovich et al. [13] presented a comprehensive review of a broad class of statistical methods used in subgroup identification.
The methods mentioned above for treatment noncompliance in observational data have primarily focused on continuous or binary outcomes, and survival outcomes have received relatively less scrutiny. There are relatively few methods available specifically for identifying subgroups with right-censored survival endpoints (see [14,15,16,17]). Loeys and Goetghebeur [18] proposed the structural accelerated failure time (SAFT) models for estimating the causal effects of treatment. Zhang et al. [19] presented a nonparametric method for value-function-guided subgroup identification. Hu et al. [20] introduced nonparametric Bayesian additive regression trees within the framework of accelerated failure time models. Altstein and Li [21] proposed a semiparametric accelerated failure time mixture model for estimating treatment effects in a subgroup of interest with a time-to-event outcome in randomized clinical trials. However, to the best of our knowledge, there are currently no methods available for testing the existence of a subgroup with a significant treatment effect when the survival time under a new treatment is not sufficiently better than that in the control group.
In this paper, our objective is to develop a method for identifying a subgroup of patients who respond effectively to a new treatment that may not benefit all patients in a clinical trial. Specifically, we consider a Cox proportional hazards cure model [22,23], in which patients with an effective treatment response are defined through a latent logistic model. Wu et al. [24] proposed a likelihood ratio test procedure to assess the existence of subgroups based on the Cox proportional hazards cure model, but their procedure only determines the presence of a cured subgroup. Since the concordance probability (K-index) has been used to measure prognostic accuracy in survival settings [25,26], we propose a statistic based on it to test the existence of subgroups with differential treatment effects. Additionally, we aim to identify the specific subgroup based on its association with subject-specific variables.
The rest of the paper is organized as follows. In Section 2, we introduce the Cox proportional hazards cure model for identifying subgroups with effective treatment in a randomized clinical trial, where differential treatment effects exist. We also propose a statistical test procedure using the K-index to determine the presence of such subgroups. Section 3 examines the performance of our proposed test through numerical simulations. Furthermore, in Section 4, we apply the proposed methodology to the aforementioned clinical trial, highlighting its potential usefulness. We conclude the paper in Section 5 and provide a brief discussion in Section 6.

2. Methods

2.1. Structured Cure Models

Consider a two-arm clinical trial with a random sample of $n$ subjects, each receiving one of two pre-specified treatments. For patient $i$, we observe $(X_i, \delta_i, Z_i, TR_i)$ for $i = 1, 2, \ldots, n$, where $X_i = \min(T_i, C_i)$ is the observed time, and $T_i$ and $C_i$ are the survival and censoring times, respectively. $\delta_i = I(T_i < C_i)$ is the censoring indicator, $Z_i \in \mathbb{R}^q$ is the vector of observed $q$-dimensional covariates of patient $i$, and $TR_i$ is the treatment indicator, with $TR_i = 0$ if patient $i$ is in the control group and $TR_i = 1$ if patient $i$ is in the treatment group. Let $T_0$ be the study end time point, i.e., $X_i \in [0, T_0]$. Usually, the Cox proportional hazards model is used for the patient survival time $T_i$:

$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(Z_i^\top \beta + TR_i\, \beta_{TR}), \quad t \in \mathbb{R}^+ = [0, \infty), \tag{1}$$

for $i = 1, 2, \ldots, n$, where $\beta$ and $\beta_{TR}$ are unknown coefficients to be estimated. When the estimated $\beta_{TR}$ is significantly negative, patients in the treatment group have a significant treatment effect. In this paper, we want to know whether there is a subgroup in the treatment group whose patients have significantly longer survival times than those in the control group.
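As a minimal illustration of this data structure (a sketch, not the authors' code), the observed pair $(X_i, \delta_i)$ and the hazard in model (1) can be computed as follows. The function names `observed_data` and `cox_hazard`, and the constant baseline hazard used as a default, are illustrative assumptions.

```python
import numpy as np


def observed_data(T, C):
    """Return X_i = min(T_i, C_i) and delta_i = I(T_i < C_i)."""
    T = np.asarray(T, dtype=float)
    C = np.asarray(C, dtype=float)
    X = np.minimum(T, C)
    delta = (T < C).astype(int)
    return X, delta


def cox_hazard(t, Z, TR, beta, beta_tr, baseline=lambda t: 1.0):
    """Hazard of model (1): lambda_0(t) * exp(Z_i' beta + TR_i * beta_TR).

    `baseline` is a hypothetical user-supplied lambda_0; the default is the constant 1.
    """
    Z = np.atleast_2d(np.asarray(Z, dtype=float))  # shape (n, q)
    eta = Z @ np.asarray(beta, dtype=float) + np.asarray(TR, dtype=float) * beta_tr
    return baseline(t) * np.exp(eta)


# Two patients with identical covariates; only the second one is treated.
X, delta = observed_data([2.0, 5.0], [3.0, 4.0])                  # X = [2., 4.], delta = [1, 0]
h = cox_hazard(1.0, [[0.0], [0.0]], [0, 1], [0.5], np.log(0.5))  # h is about [1.0, 0.5]
```

A negative $\beta_{TR}$ (here $\log 0.5$) halves the hazard of the treated patient, which is the sense in which a significantly negative estimate of $\beta_{TR}$ indicates a treatment effect.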
For patient $i$ in the treatment group ($TR_i = 1$) with covariates $Z_i$, a longer survival time $T_i$ (or observed time $X_i$) is taken to indicate a better treatment effect. To investigate the treatment effect, let $Y_i$ be a latent subgroup indicator that depends on the covariates $Z_i$, with $Y_i = 0$ if patient $i$ has the treatment effect and $Y_i = 1$ otherwise. For patient $i$ in the treatment group, the following logistic model is used to identify the subgroup with treatment effects:

$$P(Y_i = 1 \mid Z_i) = \pi_\gamma(Z_i) = \frac{\exp(W_i^\top \gamma)}{1 + \exp(W_i^\top \gamma)}, \qquad P(Y_i = 0 \mid Z_i) = 1 - \pi_\gamma(Z_i), \tag{2}$$
where $W_i = (1, Z_i^\top)^\top$. Conditional on $Y_i = 1$, the survival time $T_i$ follows the Cox proportional hazards model

$$\lambda_u(t \mid Z_i) = \lambda_0(t) \exp(Z_i^\top \beta), \quad t \in \mathbb{R}^+ = [0, \infty), \tag{3}$$
where $\gamma$ and $\beta$ are unknown coefficients to be estimated. The covariates $Z_i$ in (2) and (3) may share some, none, or all components. This gives the semiparametric logistic/proportional-hazards mixture model

$$S(t \mid Z_i) = \pi_\gamma(Z_i)\, S_u(t \mid Z_i) + 1 - \pi_\gamma(Z_i), \tag{4}$$
for $i = 1, 2, \ldots, n$, where $S_u(t \mid Z_i)$ is the survival function corresponding to the hazard function $\lambda_u(t \mid Z_i)$ in (3). Unlike in the cure models defined in [22,23], the subgroup indicator $Y_i$ is unobserved and depends on the covariates $Z_i$, which makes estimating the parameters in model (4) more complicated. Since a longer survival time is taken to indicate a better treatment effect, we define a new time-dependent indicator $Y_i(s)$ to search for subgroups. For a given time $s \in (0, T_0]$, we define

$$Y_i(s) = \begin{cases} 0 & \text{if } T_i > s \quad (\text{patient } i \text{ has the treatment effect}), \\ 1 & \text{if } T_i \le s \quad (\text{patient } i \text{ has no treatment effect}), \end{cases} \tag{5}$$
which replaces $Y_i$ in model (2). The estimates of the corresponding parameters $\gamma$ and $\beta$, defined as in (2) and (3), can be obtained via the EM algorithm, as for the cure models in [22,23]. Note that the estimators $\hat\gamma$ and $\hat\beta$ depend on the chosen time point $s$.
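The following sketch evaluates the incidence probability (2), the mixture survival function (4), and the indicator (5). It also includes the posterior weight used in the E-step of the standard EM algorithm for mixture cure models [22]: an observed event gives $P(Y_i = 1 \mid \text{data}) = 1$, and a censored observation gives $\pi S_u / (\pi S_u + 1 - \pi)$. Whether the authors' EM implementation uses exactly this weight together with the time-dependent indicator $Y_i(s)$ is an assumption; the cumulative baseline hazard $\Lambda_0$ is supplied by the caller, and all function names are hypothetical.

```python
import numpy as np


def incidence(W, gamma):
    """pi_gamma(Z_i) = P(Y_i = 1 | Z_i) from model (2); rows of W are (1, Z_i')."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(W, float) @ np.asarray(gamma, float))))


def uncured_survival(t, Z, beta, cum_base_hazard):
    """S_u(t | Z_i) = exp(-Lambda_0(t) * exp(Z_i' beta)) under the Cox model (3)."""
    return np.exp(-cum_base_hazard(t) * np.exp(np.asarray(Z, float) @ np.asarray(beta, float)))


def mixture_survival(t, W, gamma, Z, beta, cum_base_hazard):
    """S(t | Z_i) = pi * S_u(t | Z_i) + 1 - pi, model (4)."""
    pi = incidence(W, gamma)
    return pi * uncured_survival(t, Z, beta, cum_base_hazard) + 1.0 - pi


def subgroup_indicator(T, s):
    """Y_i(s) from (5): 0 if T_i > s (treatment effect), 1 if T_i <= s."""
    return (np.asarray(T, float) <= s).astype(int)


def e_step_weights(delta, pi, S_u):
    """Posterior P(Y_i = 1 | X_i, delta_i, Z_i) in a standard mixture cure model."""
    pi = np.asarray(pi, float)
    S_u = np.asarray(S_u, float)
    censored = pi * S_u / (pi * S_u + 1.0 - pi)
    return np.where(np.asarray(delta) == 1, 1.0, censored)
```

For example, with $\pi = 0.5$ and $S_u = 0.2$ at the censoring time, a censored patient has posterior probability $0.1 / 0.6 \approx 0.167$ of belonging to the no-effect group, whereas an observed event always has probability 1.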

2.2. Hypothesis on Existence of Subgroups

To assess the strength of a risk classification system in survival analysis, the concordance probability is used as a global measure of discrimination. Under the Cox proportional hazards model, the larger the risk score $R_i = Z_i^\top \beta$, the shorter the survival time. We consider the concordance probability, or K-index [25],

$$K = P(T_i > T_j \mid R_j > R_i,\ Y_i = Y_j = 1). \tag{6}$$

Clearly, $1/2 \le K \le 1$: $K = 1$ means that $R_j > R_i$ perfectly predicts $T_i > T_j$, whereas $K = 1/2$ is equivalent to coin tossing and has poor predictive value. The K-index is thus a predictive probability for patients without treatment effects. Because definition (6) involves the subgroup indicator, which depends on the time point through (5), the K-index depends on time $t$ and is denoted by $K(t)$. Let $\theta = (\beta, \gamma)$. If there is no subgroup for a given time $t$, the parameter $\gamma$ in model (2) should be zero, and the corresponding index $K(t; \theta)$ equals the index $K_0(t; \beta_0)$ based on the Cox proportional hazards model
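$$\lambda(t \mid Z_i) = \lambda_0(t) \exp(Z_i^\top \beta_0), \quad t \in \mathbb{R}^+ = [0, \infty), \tag{7}$$

fitted to the data $\{(\tilde X_i, \tilde\delta_i, Z_i);\ i = 1, \ldots, n\}$, where

$$(\tilde X_i, \tilde\delta_i) = \begin{cases} (X_i, \delta_i) & \text{if } X_i \le t, \\ (X_i, 0) & \text{if } X_i > t. \end{cases} \tag{8}$$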
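Construction (8) only changes the event indicator: events after $t$ are recoded as censored while the observed time is kept. A minimal sketch (the function name `truncate_events` is hypothetical):

```python
import numpy as np


def truncate_events(X, delta, t):
    """Return (X_tilde, delta_tilde) from (8): keep X_i, set delta_i = 0 when X_i > t."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=int)
    delta_tilde = np.where(X <= t, delta, 0)
    return X.copy(), delta_tilde


X_t, d_t = truncate_events([1.0, 3.0, 5.0], [1, 1, 1], t=2.0)
# X_t = [1., 3., 5.], d_t = [1, 0, 0]
```

The reduced Cox model (7) is then fitted to these data to obtain $\hat\beta_0$ with any standard partial-likelihood routine (not shown). Keeping $X_i$ rather than truncating it at $t$ does not change the partial likelihood, because only event times up to $t$ enter it and the risk sets at those times are the same either way.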
Therefore, we propose the following hypothesis for testing the existence of subgroups,
$$H_0: K(t; \theta) = K_0(t; \beta_0) \quad \text{for all } t \in (0, T_0], \tag{9}$$

against the alternative hypothesis

$$H_1: K(t_0; \theta) \neq K_0(t_0; \beta_0) \quad \text{for some } t_0 \in (0, T_0]. \tag{10}$$
For a given time point $t$, Gonen and Heller [25] proposed estimating the index $K_0(t; \beta_0)$ by

$$\hat K_{0,n}(t; \hat\beta_0) = \frac{2}{n(n-1)} \sum_{i \ne j} \frac{I(Z_i^\top \hat\beta_0 < Z_j^\top \hat\beta_0)}{1 + \exp(Z_i^\top \hat\beta_0 - Z_j^\top \hat\beta_0)}, \tag{11}$$

where $\hat\beta_0$ is the estimate of $\beta_0$. $\hat K_{0,n}(t; \hat\beta_0)$ is a consistent estimate of $K_0(t; \beta_0)$, is asymptotically normal, and does not depend on the unknown censoring distribution. For the treatment effects defined in (5), let $\hat\theta = (\hat\beta, \hat\gamma)$ be the parameter estimates for the mixture model (4). The K-index based on the logistic/proportional-hazards mixture model is then estimated with marginal probabilities as
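$$\hat K_n(t; \hat\theta) = \frac{\sum_{i \ne j} G_{ij}(\hat\beta)\, I(Z_i^\top \hat\beta < Z_j^\top \hat\beta)\, \pi_{\hat\gamma}(Z_i)\, \pi_{\hat\gamma}(Z_j)}{\sum_{i \ne j} I(Z_i^\top \hat\beta < Z_j^\top \hat\beta)\, \pi_{\hat\gamma}(Z_i)\, \pi_{\hat\gamma}(Z_j)}, \tag{12}$$

where

$$G_{ij}(\hat\beta) = \frac{1}{1 + \exp(Z_i^\top \hat\beta - Z_j^\top \hat\beta)}.$$

The estimator $\hat K_n(t; \hat\theta)$ does not depend on the baseline hazard function $\lambda_0(t)$ and, under some regularity conditions, is consistent and asymptotically normal (Theorem 2 in [26]):

$$\frac{\sqrt{n}\,\{\hat K_n(t; \hat\theta) - K(t)\}}{\sigma_K} \xrightarrow{d} N(0, 1),$$

where $\sigma_K^2 / n = \operatorname{var}\{\hat K_n(t; \hat\theta)\}$.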
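Both K-index estimators reduce to sums over ordered pairs of risk scores. The sketch below implements (11) and (12) directly, taking $\hat\beta_0$, $\hat\beta$, and the fitted incidences $\pi_{\hat\gamma}(Z_i)$ as given inputs (how they are estimated is outside the sketch). Tied risk scores contribute nothing to either sum, as in the formulas as written; the function names are illustrative.

```python
import numpy as np


def _pair_terms(Z, beta):
    """Return I(R_i < R_j) and G_ij = 1 / (1 + exp(R_i - R_j)) with R = Z beta."""
    r = np.asarray(Z, float) @ np.asarray(beta, float)
    diff = r[:, None] - r[None, :]          # diff[i, j] = R_i - R_j
    concordant = diff < 0                   # I(R_i < R_j); the diagonal is False
    # Where the indicator is 1, diff < 0, so clipping at 0 only avoids overflow elsewhere.
    G = 1.0 / (1.0 + np.exp(np.minimum(diff, 0.0)))
    return concordant, G


def k0_gonen_heller(Z, beta0):
    """Estimator (11): 2 / (n (n - 1)) * sum_{i != j} I(R_i < R_j) G_ij."""
    concordant, G = _pair_terms(Z, beta0)
    n = concordant.shape[0]
    return 2.0 * np.sum(G * concordant) / (n * (n - 1))


def k_mixture(Z, beta, pi):
    """Estimator (12): pair terms weighted by pi(Z_i) * pi(Z_j)."""
    concordant, G = _pair_terms(Z, beta)
    pi = np.asarray(pi, float)
    weights = concordant * pi[:, None] * pi[None, :]
    return np.sum(G * weights) / np.sum(weights)


Z = [[0.0], [np.log(3.0)], [np.log(7.0)]]
k0 = k0_gonen_heller(Z, [1.0])             # (0.75 + 0.875 + 0.7) / 3, about 0.775
k1 = k_mixture(Z, [1.0], [1.0, 1.0, 0.0])  # only the first two patients form a weighted pair: about 0.75
```

With equal weights and no ties, (12) coincides with (11), because exactly $n(n-1)/2$ ordered pairs satisfy the indicator; the incidence weights in (12) down-weight pairs involving patients likely to belong to the treatment-effect subgroup.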
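Let $0 < t_1 < t_2 < \cdots < t_m \le T_0$ be the ordered time points of $X_i$, $i = 1, 2, \ldots, n$. To test the hypothesis $H_0$ in (9), we propose the statistics, for $j = 1, 2, \ldots, m$,

$$D_{j,n} = \frac{\sqrt{n}\,\{\hat K_{0,n}(t_j; \hat\beta_0) - \hat K_n(t_j; \hat\theta)\}}{\sqrt{\sigma^2_{\hat K_{0,n}(t_j; \hat\beta_0)} + \sigma^2_{\hat K_n(t_j; \hat\theta)}}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty \text{ under } H_0, \tag{13}$$

where

$$\frac{1}{n}\,\sigma^2_{\hat K_{0,n}(t_j; \hat\beta_0)} = \operatorname{var}\{\hat K_{0,n}(t_j; \hat\beta_0)\}$$

and

$$\sigma^2_{\hat K_n(t_j; \hat\theta)} = n \operatorname{var}\{\hat K_n(t_j; \hat\theta)\}.$$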
For a given significance level $\alpha$, define

$$P_n = \min\{1 - \Phi(D_{j,n});\ j = 1, 2, \ldots, m\}, \tag{14}$$

where $\Phi(\cdot)$ is the distribution function of $N(0, 1)$. If $P_n \ge \alpha/2$, we cannot reject the null hypothesis $H_0$. Otherwise, there is some $j_0$ such that $1 - \Phi(D_{j_0,n}) < \alpha/2$, and there is a subgroup with treatment effects at the time point $t_{j_0}$ in (5).
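A sketch of the testing rule in (13) and (14) follows. It computes each $D_{j,n}$ as the difference of the two K-index estimates divided by the square root of the sum of their estimated variances, which is the standardization that gives an asymptotic $N(0, 1)$ limit under $H_0$. How those variances are obtained (for example, by a bootstrap) is not specified here and is an assumption of the sketch; the normal distribution function is written with `math.erf` to keep the example dependency-free beyond NumPy.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def subgroup_test(k0, kn, var_k0, var_kn, alpha=0.05):
    """Compute D_{j,n} over t_1 < ... < t_m, P_n from (14), and the first rejecting index.

    var_k0 and var_kn are variance estimates of the two K-index estimators
    (hypothetical inputs, e.g., from a bootstrap).
    """
    k0, kn = np.asarray(k0, float), np.asarray(kn, float)
    D = (k0 - kn) / np.sqrt(np.asarray(var_k0, float) + np.asarray(var_kn, float))
    pvals = np.array([1.0 - norm_cdf(d) for d in D])
    P_n = float(pvals.min())
    rejecting = np.flatnonzero(pvals < alpha / 2)
    first = int(rejecting[0]) if rejecting.size else None  # index of t_{j_cut}, or None
    return D, P_n, first


D, P_n, first = subgroup_test(k0=[0.70, 0.72], kn=[0.69, 0.60],
                              var_k0=[4e-4, 4e-4], var_kn=[4e-4, 4e-4])
# D is about [0.35, 4.24]; H_0 is rejected at the second time point, so first == 1.
```

As an arithmetic check against Section 4, $1 - \Phi(2.364) \approx 0.009$, which matches the p-value reported there for $D_{j_0,n} = 2.364$.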
When $H_0$ is rejected, we consider the area under the ROC curve (AUC) for treatment effects at the time point $t_{j_{\text{cut}}}$, defined as the first time point, in ascending order of survival time, at which $H_0$ is rejected:

$$\mathrm{AUC} = P(W_i^\top \gamma > W_j^\top \gamma \mid Y_i = 1,\ Y_j = 0). \tag{15}$$

A consistent estimator of the AUC is given by [26]

$$\widehat{\mathrm{AUC}}(\hat\theta, \hat\lambda_0(t)) = \frac{\sum_{i \ne j} I(W_i^\top \hat\gamma > W_j^\top \hat\gamma)\, \hat v_i (1 - \hat v_j)}{\sum_{i \ne j} \hat v_i (1 - \hat v_j)}, \tag{16}$$

where $\hat v_i$ is the estimate of $v_i = P(Y_i = 1 \mid X_i, t_{j_{\text{cut}}}, \delta_i, Z_i)$, the conditional probability that patient $i$ has no treatment effect. The estimator $\hat v_i$ is a byproduct of the E-step of the EM algorithm for the mixture model (4). If $\widehat{\mathrm{AUC}}(\hat\theta, \hat\lambda_0(t))$ is significantly greater than 0.5, a subgroup with effective treatment can be identified based on the mixture model (4).
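Estimator (16) is an ordinary pairwise AUC in which each pair is weighted by the estimated probabilities $\hat v_i (1 - \hat v_j)$ instead of known labels. A minimal sketch, assuming $\hat\gamma$ and the E-step weights $\hat v_i$ are already available (the function name `weighted_auc` is hypothetical):

```python
import numpy as np


def weighted_auc(W, gamma, v):
    """Estimator (16): sum_{i != j} I(s_i > s_j) v_i (1 - v_j) / sum_{i != j} v_i (1 - v_j)."""
    s = np.asarray(W, float) @ np.asarray(gamma, float)  # linear predictors W_i' gamma
    v = np.asarray(v, float)
    higher = s[:, None] > s[None, :]                     # I(W_i' gamma > W_j' gamma)
    w = v[:, None] * (1.0 - v[None, :])                  # v_i (1 - v_j)
    np.fill_diagonal(w, 0.0)                             # restrict both sums to i != j
    return np.sum(higher * w) / np.sum(w)


W = [[1.0, 3.0], [1.0, 2.0], [1.0, 1.0], [1.0, 0.0]]
auc = weighted_auc(W, [0.5, 1.0], v=[1.0, 0.0, 1.0, 0.0])  # 3 of 4 pairs correctly ordered: 0.75
```

With 0/1 weights, as in this example, the estimator reduces to the usual empirical AUC; fractional $\hat v_i$ let censored patients contribute in proportion to their posterior probability of having no treatment effect.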

3. Simulation Studies

To examine the finite-sample performance of the proposed test in (13), we conducted simulation studies. For uncured subjects (without treatment effect), we generated survival times from the model

$$\lambda(t \mid Z_{1i}, Z_{2i}) = \lambda_0(t) \exp(\beta_1 Z_{1i} + \beta_2 Z_{2i} + \beta_{TR}\, TR_i), \tag{17}$$

for $t \in \mathbb{R}^+ = [0, \infty)$, where the baseline distribution is Weibull with baseline hazard function $\lambda_0(t) = 1$ (a Weibull distribution with unit shape, i.e., the unit exponential). The covariates $Z_{1i}$, $i = 1, \ldots, n$, were generated from $Z_1 \sim N(0, 2^2)$, and $Z_{2i}$, $i = 1, \ldots, n$, were generated from $Z_2 \sim B(n, p)$ with $p = 0.5$. We assume that survival time does not differ between the control and treatment groups and set $\beta_{TR} = 0$ in (17). Both $\beta_1$ and $\beta_2$ were set to $\log 2$. Censoring times were generated from a uniform distribution $U(0, q)$, where $q$ was chosen to achieve the target percentage of censored observations among uncured subjects. We also considered exponential and other distributions for the censoring times; the results were similar and are not presented here.
For the cure status (patients with treatment effects), we used the following logistic model, with covariates $W_1$ and $W_2$ generated in the same way as $Z_1$ and $Z_2$:

$$\pi_b(Z_i) = \frac{\exp(b_0 + b_1 Z_{1i} + b_2 Z_{2i})}{1 + \exp(b_0 + b_1 Z_{1i} + b_2 Z_{2i})}. \tag{18}$$

Both $b_1$ and $b_2$ were set to $\log 2$, and $b_0$ was chosen to achieve the target percentage of cured observations in each replication. The uncured status $Y$ was assumed to follow a Bernoulli distribution with success probability $\pi_b$ in (18). Without loss of generality, the simulation studies considered the case where the censoring rate exceeds the cure rate. As the cure rate increased, the power of the test also increased. Two sample sizes, $n = 200$ and $n = 400$, were used. For each simulated dataset, the statistic $D_{j,n}$ was calculated using Formula (13), and $H_0$ was tested using Formula (14). The procedure was repeated 1000 times to assess the type I error and power of the proposed method.
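One replicate of this design can be sketched as follows. Several details are assumptions not stated in the text: 1:1 randomization of $TR_i$, $Z_2$ drawn as a Bernoulli(0.5) variable per subject, the cure mechanism applied only in the treatment arm, and the covariates of (18) taken to be $Z_1$ and $Z_2$. With $\lambda_0(t) = 1$ and $\beta_{TR} = 0$, model (17) gives exponential survival times with rate $\exp(\beta_1 Z_{1i} + \beta_2 Z_{2i})$. Tuning $b_0$ and $q$ to reach target cure and censoring rates is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def simulate_replicate(n, b0, q, seed=None):
    """Simulate (X, delta, Z1, Z2, TR, Y) for one replicate of the design in Section 3."""
    rng = np.random.default_rng(seed)
    beta1 = beta2 = b1 = b2 = np.log(2.0)
    TR = rng.integers(0, 2, size=n)          # assumption: 1:1 randomization
    Z1 = rng.normal(0.0, 2.0, size=n)        # Z1 ~ N(0, 2^2)
    Z2 = rng.binomial(1, 0.5, size=n)        # assumption: Bernoulli(0.5) per subject
    # Model (17) with lambda_0(t) = 1 and beta_TR = 0: exponential survival times.
    rate = np.exp(beta1 * Z1 + beta2 * Z2)
    T = rng.exponential(scale=1.0 / rate)
    # Model (18): Y = 1 (uncured, no treatment effect) with probability pi_b.
    pi_b = 1.0 / (1.0 + np.exp(-(b0 + b1 * Z1 + b2 * Z2)))
    Y = np.where(TR == 1, rng.binomial(1, pi_b), 1)  # assumption: cure only in the treatment arm
    T = np.where(Y == 1, T, np.inf)          # cured subjects never experience the event
    C = rng.uniform(0.0, q, size=n)          # censoring times ~ U(0, q)
    X = np.minimum(T, C)
    delta = (T < C).astype(int)
    return X, delta, Z1, Z2, TR, Y


X, delta, Z1, Z2, TR, Y = simulate_replicate(n=200, b0=0.0, q=5.0, seed=1)
```

Representing cured subjects by an infinite event time makes them censored automatically, which is why the censoring proportion in this design always exceeds the cure proportion.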
When there are no cured individuals (i.e., no subgroups with treatment effects), Table 1 shows that the type I error increases with the censored proportion. Nevertheless, all type I errors remain within the nominal significance level of 5%.
For a fixed cured proportion (e.g., 30%), the power decreases as the censored rate increases. Even so, the power remains sufficiently high (e.g., 85%) as long as the difference between the cured and censored proportions is <15% with a sample size of n = 200. With a larger sample size of n = 400, the power can still be maintained at >80% even with higher censored rates (see Table 2). When the cured proportion is 50%, the power remains high only when the sample size is large (e.g., n = 400) and the difference between the cured and censored proportions is <15%.
With a fixed difference between the cured and censored proportions (e.g., 10%), the power decreases as the cured proportion increases. However, if the sample size is sufficiently large (e.g., n = 400), the power remains high (>90%). The simulation results are summarized in Table 3.
With the proposed method, we first determine the cutoff time $t_{j_{\text{cut}}}$ and then obtain the predicted subgroup memberships and their accuracy (the proportion classified correctly, PCC). A logistic model can then be fitted to the predicted subgroups and the associated covariates for further subgroup identification. Table 4 presents the accuracy and the area under the curve (AUC) obtained from the logistic models, which remain adequate across most scenarios. A higher censored proportion results in lower accuracy, and for a fixed difference between the cured and censored proportions, the accuracy decreases as the cured proportion increases.

4. Application to Clinical Trial Data

We applied the proposed method to the ACTG 175 trial data [2]. In this study, the sample sizes of the "combination therapy of zidovudine and didanosine" and "monotherapy didanosine" arms are $n_{tr} = 522$ and $n_{con} = 559$, respectively. The test indicated that subgroups may exist in the treatment group ($D_{j_0,n} = 2.364$, p-value = 0.009) at the time point $t_{j_0}$ = day 486. The fitted model for identifying subgroups is $\operatorname{logit} P(Y = 1) = -3.5506 + 0.5783 \times \text{symptom} + 2.4904 \times \text{offtrt}$, which indicates that subgroup membership is associated with symptom (symptomatic indicator: 0 = asymptomatic, 1 = symptomatic) and offtrt (indicator of going off treatment before 96 ± 5 weeks: 0 = no, 1 = yes). The AUC based on the logistic model is 0.7903 (95% CI: 0.7342, 0.8465).
The best probability threshold was 0.1530, with corresponding sensitivity and specificity of 81.7% and 72.9%. The log-rank test shows that the treatment effect in the beneficial subgroup ($Y = 0$) is better than in the control group (p-value = 0.000253), in contrast to the comparison of the entire treatment group with the control group (p-value = 0.181). Within the treatment group (combination therapy of zidovudine and didanosine), the beneficial subgroup ($Y = 0$) also showed a better outcome than the non-beneficial subgroup ($Y = 1$) (p-value < 0.0001, see Figure 2).
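As a worked check of the fitted model, the predicted probability of non-benefit ($Y = 1$) for each combination of symptom and offtrt can be computed directly from the reported coefficients. The sketch uses an intercept of −3.5506 (a negative intercept is the reading consistent with the 0.1530 threshold, since a positive one would place every patient above it) and assumes a patient is classified as $Y = 1$ when the probability is at least the threshold; both are assumptions, and the names are illustrative.

```python
import math

# Coefficients of the fitted logistic model in Section 4 (intercept read as negative).
INTERCEPT, B_SYMPTOM, B_OFFTRT = -3.5506, 0.5783, 2.4904
THRESHOLD = 0.1530  # reported probability threshold


def prob_no_benefit(symptom, offtrt):
    """Fitted P(Y = 1 | symptom, offtrt) from the logistic model."""
    eta = INTERCEPT + B_SYMPTOM * symptom + B_OFFTRT * offtrt
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_subgroup(symptom, offtrt):
    """Assumed rule: Y = 1 (non-beneficial) when the probability reaches the threshold."""
    return int(prob_no_benefit(symptom, offtrt) >= THRESHOLD)


for symptom in (0, 1):
    for offtrt in (0, 1):
        p = prob_no_benefit(symptom, offtrt)
        print(symptom, offtrt, round(p, 3), predicted_subgroup(symptom, offtrt))
```

Under this reading, the threshold separates patients by offtrt alone: the predicted probabilities are about 0.03 and 0.05 for patients who stayed on treatment and about 0.26 and 0.38 for those who went off treatment.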

5. Conclusions

In clinical trials, it is recognized that certain patients may respond differently to treatment based on their individual characteristics. Existing methods for identifying effective subgroups with time-to-event outcomes often rely on estimating the survival function for the overall population using a predefined cutoff time point as the maximum observed follow-up time (e.g., [12,19]). However, since patients with different biomedical characteristics may exhibit varying treatment efficacies in the early and later time periods, this paper proposes a testing procedure for identifying potential subgroups using survival function estimates at different cutoff time points with right-censored time-to-event data.
The proposed test utilizes two types of K-indexes based on the semiparametric Logistic–Cox mixture model. Through simulation studies and real data analysis, it is demonstrated that the proposed approach effectively detects subgroups and identifies the time point at which survival statuses differ. These methods can also be adapted to other fields such as cancer prevention, public management, and marketing, where a particular public policy or marketing strategy may have varying effects on different groups of individuals. The results of this study contribute to the understanding of subgroup-specific treatment effects and provide a valuable tool for tailoring interventions to specific patient groups.

6. Discussion

There has been a recent surge in the development of methods for subgroup identification, recognizing the importance of associated covariates in this process. It is valuable to explore the role of covariates in identifying subgroups and to construct models for predicting subgroup identification in practical applications. Our research focuses on investigating effective semiparametric models in causal inference with time-dependent outcomes, such as commonly encountered survival data and dynamic treatment regimes.
However, it is essential to note that our proposed method relies on correct specification of the assumed statistical models. In particular, the Cox proportional hazards assumptions must hold for the survival component of the mixture model, so appropriate model diagnostic procedures should be used to ensure the validity of the proposed method. Furthermore, our method requires a relatively large sample size, and its power depends on the difference between the cured and censored proportions.
In observational epidemiological studies or clinical trials without completely random treatment assignments, naive estimators of treatment effects based on treatment group data can be biased due to confounding. Consequently, causal inference methods are necessary to obtain unbiased estimators and establish causal relationships ([27]). These challenges and limitations present opportunities for further study and improvement in the future. Overall, addressing these issues and advancing the field of causal inference in the context of time-dependent outcomes will contribute to the development of more robust and reliable methods for subgroup identification and causal inference in clinical and epidemiological research.

Author Contributions

Conceptualization, P.Z. and H.-B.F.; methodology, S.A. and H.-B.F.; validation, P.Z.; formal analysis, S.A. and P.Z.; writing—original draft preparation, S.A.; and writing—review and editing, H.-B.F. and P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research of An and Fang is partially supported by the National Cancer Institute (NCI) grants R01CA164717 and P30CA051008. An conducted this work at Georgetown University as a visiting scholar, partially supported by the China State Scholarship Foundation.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data included in this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

1. Ruberg, S.; Chen, L.; Wang, Y. The Mean Does Not Mean as Much Anymore: Finding Sub-Groups for Tailored Therapeutics. Clin. Trials 2010, 7, 574–583.
2. Hammer, S.; Katzenstein, D.; Hughes, M.; Gundacker, H.; Schooley, R.; Haubrich, R.; Henry, W.; Lederman, M.; Phair, J.; Niu, M.; et al. A Trial Comparing Nucleoside Monotherapy with Combination Therapy in HIV-Infected Adults with CD4 Cell Counts from 200 to 500 per Cubic Millimeter. N. Engl. J. Med. 1996, 335, 1081–1090.
3. Follmann, D. On the Effect of Treatment among Would-Be Treatment Compliers: An Analysis of the Multiple Risk Factor Intervention Trial. J. Am. Stat. Assoc. 2000, 95, 1101–1109.
4. Foster, J.; Taylor, J.; Ruberg, S. Subgroup Identification from Randomized Clinical Trial Data. Stat. Med. 2011, 30, 2867–2880.
5. Lipkovich, I.; Dmitrienko, A.; Denne, J.; Enas, G. Subgroup Identification Based on Differential Effect Search-A Recursive Partitioning Method for Establishing Response to Treatment in Patient Subpopulations. Stat. Med. 2011, 30, 2601–2621.
6. Zhao, L.; Tian, L.; Cai, T.; Claggett, B.; Wei, L. Effectively Selecting a Target Population for a Future Comparative Study. J. Am. Stat. Assoc. 2013, 108, 527–539.
7. Bonetti, M.; Gelber, R. Patterns of Treatment Effects in Subsets of Patients in Clinical Trials. Biostatistics 2004, 5, 465–481.
8. Xiao, S.; Pepe, M. Evaluating Markers for Selecting a Patient’s Treatment. Biometrics 2004, 60, 874–883.
9. Loh, W.; He, X.; Man, M. A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects. Stat. Med. 2015, 34, 1818–1833.
10. Lipkovich, I.; Dmitrienko, A. Strategies for identifying predictive biomarkers and subgroups with enhanced treatment effect in clinical trials using SIDES. J. Biopharm. Stat. 2014, 24, 130–153.
11. Shen, J.; He, X. Inference for Subgroup Analysis with a Structured Logistic-Normal Mixture Model. J. Am. Stat. Assoc. 2015, 110, 303–312.
12. Ballarini, N.M.; Rosenkranz, G.K.; Jaki, T.; König, F.; Posch, M. Subgroup identification in clinical trials via the predicted individual treatment effect. PLoS ONE 2018, 13, e0205971.
13. Lipkovich, I.; Dmitrienko, A.; D’Agostino Sr, R.B. Tutorial in biostatistics: Data-driven subgroup identification and analysis in clinical trials. Stat. Med. 2017, 36, 136–196.
14. Ciampi, A.; Negassa, A.; Lou, Z. Tree-structured Prediction for Censored Survival-data and the Cox model. J. Clin. Epidemiol. 1995, 48, 675–689.
15. Goetghebeur, E.; Molenberghs, G. Causal Inference in a Placebo-Controlled Clinical Trial with Binary Outcome and Ordered Compliance. J. Am. Stat. Assoc. 1996, 91, 928–934.
16. Kehl, V.; Ulm, K. Responder Identification in Clinical Trials with Censored Data. Comput. Stat. Data Anal. 2006, 50, 1338–1355.
17. Cuzick, J.; Sasieni, P.; Myles, J.; Tyrer, J. Estimating the Effect of Treatment in a Proportional Hazards Model in the Presence of Non-Compliance and Contamination. J. R. Stat. Soc. Ser. B 2007, 69, 565–588.
18. Loeys, T.; Goetghebeur, E. A Causal Proportional Hazards Estimator for the Effect of Treatment Actually Received in a Randomized Trial with All-or-Nothing Compliance. Biometrics 2003, 59, 100–105.
19. Zhang, P.; Ma, J.; Chen, X.; Shentu, Y. A nonparametric method for value function guided subgroup identification via gradient tree boosting for censored survival data. Stat. Med. 2020, 39, 4133–4146.
20. Hu, L.; Ji, J.; Li, F. Estimating heterogeneous survival treatment effect in observational data using machine learning. Stat. Med. 2021, 40, 4691–4713.
21. Altstein, L.; Li, G. Latent Subgroup Analysis of a Randomized Clinical Trial through a Semiparametric Accelerated Failure Time Mixture Model. Biometrics 2013, 69, 52–61.
22. Sy, J.; Taylor, J. Estimation in a Cox Proportional Hazards Cure Model. Biometrics 2000, 56, 227–236.
23. Fang, H.B.; Li, G.; Sun, J. Maximum Likelihood Estimation in a Semiparametric Logistic/Proportional-Hazards Mixture Model. Scand. J. Stat. 2005, 32, 59–75.
24. Wu, R.; Zheng, M.; Yu, W. Subgroup Analysis with Time-to-Event Data under a Logistic-Cox Mixture Model. Scand. J. Stat. 2016, 43, 863–878.
25. Gonen, M.; Heller, G. Concordance Probability and Discriminatory Power in Proportional Hazards Regression. Biometrika 2005, 92, 965–970.
26. Zhang, Y.; Shao, Y. Concordance Measure and Discriminatory Accuracy in Transformation Cure Models. Biostatistics 2018, 19, 14–26.
27. Bai, X.; Tsiatis, A.A.; O’Brien, S.M. Doubly robust estimators of treatment-specific survival distributions in observation studies with stratified sampling. Biometrics 2013, 69, 830–839.
Figure 1. The comparison of combination therapy (treatment) with monotherapy (control).
Figure 2. The comparison of two subgroups (combination therapy) with monotherapy (control), Y = 0 means the beneficial subgroup, and Y = 1 means otherwise.
Table 1. Type I error under different scenarios.

| n | Censored proportion 0.10 | Censored proportion 0.30 | Censored proportion 0.50 |
|---|---|---|---|
| 200 | 0.001 | 0.002 | 0.012 |
| 400 | 0.004 | 0.008 | 0.049 |

n: sample size; number of simulations = 1000.
Table 2. Power analysis with different scenarios.

Columns: cured proportion (censored proportion).

| n | 0.1 (0.2) | 0.2 (0.3) | 0.3 (0.4) | 0.4 (0.5) | 0.5 (0.6) |
|---|---|---|---|---|---|
| 200 | 0.909 | 0.952 | 0.918 | 0.864 | 0.668 |
| 400 | 0.995 | 0.999 | 0.999 | 0.984 | 0.934 |

n: sample size; number of simulations = 1000.
Table 3. Power with fixed differences between cured and censored proportions.

| n | Cured rate | Censored 0.35 | Censored 0.40 | Censored 0.45 | Censored 0.50 |
|---|---|---|---|---|---|
| 200 | 0.30 | 0.968 | 0.929 | 0.850 | 0.696 |
| 400 | 0.30 | 1.000 | 0.999 | 0.989 | 0.955 |

| n | Cured rate | Censored 0.55 | Censored 0.60 | Censored 0.65 | Censored 0.70 |
|---|---|---|---|---|---|
| 200 | 0.50 | 0.812 | 0.661 | 0.475 | 0.293 |
| 400 | 0.50 | 0.981 | 0.923 | 0.809 | 0.571 |

Censored proportion − cured proportion = 0.10.
Table 4. Accuracy of the proposed method.

| Cured proportion | Censored proportion | PCC | AUC |
|---|---|---|---|
| 0.3 | 0.4 | 0.813 | 0.779 |
| 0.3 | 0.5 | 0.727 | 0.737 |
| 0.5 | 0.6 | 0.788 | 0.774 |
| 0.5 | 0.7 | 0.681 | 0.703 |

PCC: proportion classified correctly.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

An, S.; Zhang, P.; Fang, H.-B. Subgroup Identification in Survival Outcome Data Based on Concordance Probability Measurement. Mathematics 2023, 11, 2855. https://doi.org/10.3390/math11132855

Chicago/Turabian Style

An, Shengli, Peter Zhang, and Hong-Bin Fang. 2023. "Subgroup Identification in Survival Outcome Data Based on Concordance Probability Measurement" Mathematics 11, no. 13: 2855. https://doi.org/10.3390/math11132855
