Abstract
Clinical trials with rare or distant outcomes are usually designed to be large in size and long term. The resource-demand and time-consuming characteristics limit the feasibility and efficiency of the studies. There are motivations to replace rare or distal clinical endpoints by reliable surrogate markers, which could be earlier and easier to collect. However, statistical challenges still exist to evaluate and rank potential surrogate markers. In this paper, we define a generalized proportion of treatment effect for survival settings. The measure’s definition and estimation do not rely on any model assumption. It is equipped with a consistent and asymptotically normal non-parametric estimator. Under proper conditions, the measure reflects the proportion of average treatment effect mediated by the surrogate marker among the group that would survive to mark the measurement time under both intervention and control arms.
1. Introduction
The HPTN (HIV Prevention Trial Network) 052 study is an HIV prevention trial conducted across several continents. The primary clinical endpoint of interest is that HIV infection is estimated to have a rate of 3–5% in modern clinical trial settings. In addition to this, the time from viral exposure to infection is long. The median infection time on average is more than one year. It is desirable to replace the clinically meaningful endpoint by an earlier and more easily accessible alternative endpoint.
A surrogate marker in clinical trials is considered to be “a laboratory measurement or physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions, or survives and that is expected to predict the effect of the therapy” [1]. It is considered to be valid if one could correctly conclude treatment effect on the clinical endpoint by using the marker [2,3]. In this context, how to validate surrogate markers for a clinically meaningful endpoint are especially important. Zhuang and Chen [4] review the surrogate measures in clinical research on their strengths and limitations in details.
A surrogate marker is considered to be valid if one could correctly conclude the treatment effect on the clinical endpoint by using that marker. In the language of hypothesis testing, that is, departure from the null hypothesis is captured by departure from the null hypothesis , where Z, S, and T represent the intervention, marker, and clinical endpoint, respectively. Prentice [5] operationalized the idea to test The conditional independence of T and Z represents an ideal situation when the marker fully mediates the treatment effect. However, many candidate markers may only capture part of the treatment effect, so that to which extent a marker captures the treatment effect is of great practical importance. Freedman et al. [6] further extended Prentice’s criterion and evaluated the strength of surrogate markers by comparing the treatment effect with and without adjusting for the marker. For a binary endpoint T and logistic models:
the proportion of treatment effect explained (PTE) is defined as However, the adjusted and unadjusted models do not hold simultaneously in general model classes [7,8,9], and the assumption of no interaction in the adjusted model is not necessarily true. To avoid model dependence, Wang and Taylor [9] proposed the F-measure in a general setting as where and Here, and are the distributions of surrogate marker S in the treatment group A and the control group B, respectively. The functions and are functions of the conditional distribution of the primary endpoint given S in the two groups. The functions , and are chosen such that is the desired measure of treatment effect on the primary endpoint. The F-measure framework is flexible while preserving the flavor of comparing the marginal treatment effect and adjusted treatment effect.
In this paper, we bring in the time dimension and define a generalized F-measure for time-to-event outcomes and time-varying internal surrogate markers explicitly. In Section 2, we introduce the time-varying F-measure. We show the measure can be estimated using a non-parametric estimator that is consistent and asymptotically normal. In Section 3, we give examples to visualize the change of F-measure with time and conduct Monte Carlo simulation studies to evaluate the proposed non-parametric estimation and inference. In Section 4, we apply the time-varying F-measure to an HIV prevention trial for illustration. Finally, we conclude the paper with a discussion in Section 5 and a conclusion in Section 6.
2. The Time-Varying F-Measure
2.1. Definition
We introduce the time-varying F-measure in this section. The new measure does not rely on any model assumption. In addition, it reflects Prentice’s criterion and describes the degree to which a marker captures the treatment effect on the clinical endpoints.
We consider intervention groups and . Let T represent the time-to-event outcome and represents the value of a candidate marker measured at time point t (after randomization). The time-varying F-measure is formulated to evaluate the marker when survival status at time point is of primary interest. We choose . Then, the F-measure for a time-to-event outcome T is:
where
It is a function of time point c when the survival status is of primary interest, and time point t when the surrogate marker is measured. The definition and estimation do not necessarily rely on any model assumption and are exempt from model misspecification.
The time-varying F-measure reflects Prentice’s criterion. Namely, the scenarios of perfect markers, in which a marker mediates all the treatment effect, lead to ; the scenarios of useless markers, in which a marker does not mediate any treatment effect or is independent of intervention in the group of interest, leads to In addition, when the treatment effect mediated by the marker is consistent with the direct treatment effect, the F-measure for a partial marker is guaranteed to be bounded within (0,1). A value outside the ideal bound indicates treatment effects via different pathways are not in the same direction so that the marker is not an appropriate surrogate. (Theoretic results are deferred to Section 2.3.)
In summary, the time-varying F-measure evaluates the relative position of the survival probability adjusted by eliminating the treatment effect on a biomarker. It serves as a model-free metric for assessing the proportion of treatment effect explained by the marker.
2.2. Estimation and Inference
In the time-varying F-measure, survival probabilities can be estimated by the non-parametric Kaplan–Meier estimator [10]. Under the assumption of random censoring, the conditional probability can be estimated by the empirical distribution. Naturally, we propose a plug-in estimator for the defined time-varying F-measure:
where and are the Kaplan–Meier estimator for and , respectively. Let be the ordered, distinct times observed on arm z; be the number of subjects at risk set at time on arm z; and be the number of events at time on arm z. The Kaplan–Meier estimator of survival probabilities reads:
Similarly, can be estimated by the Kaplan–Meier estimator in the strata by and as:
Under the assumption of random censoring, can be estimated by the empirical distribution as:
where .
We show the proposed estimator converges weakly to a Gaussian process under the following regularity assumptions (Proof of the theorem is deferred to Appendix A).
Assumption A1.
The time c is in a range of for some constant , such that and , where H is the distribution function of time-to-event T and G is the distribution function of censoring time U.
Assumption A2.
Survival probabilities on .
Assumption A3.
Random censoring: The censoring time U is independent of both the failure time T and time-varying covariates on .
Theorem 1.
Under regularity Assumptions A1–A3, given a time t, converges weakly to a zero-mean Gaussian process with covariance function between time points c and , where:
and
In the above equations, denote the observed counting process and the at-risk process. The covariance function can be consistently estimated by where is the sample versions of .
2.3. Ranges of F-Measure
2.3.1. Perfect Marker
When the marker mediates the entire treatment effect, we have It implies and furthermore,
2.3.2. Useless Marker
When the marker does not mediate any treatment effect, we have , when the intervention is independent of in the risk set at time point t, we have Either of the above useless marker conditions leads to and furthermore,
2.3.3. Partial Marker
Without loss of generality, we consider the case Theorem 2 stated below and its proof at Appendix B are naturally extendable for the counterpart case To give interpretability and links to common instances in clinical trials, we impose three mild assumptions:
Assumption A4.
in the treatment group and that in the control group are stochastically ordered, or
Assumption A5.
is monotone with x in the same direction for any given z.
Assumption A6.
is monotone with z in the same direction for any given x.
In addition, we formulate three conditions:
- C1.
- .
- C2.
- and is increasing with
- C3.
- and is decreasing with
Theorem 2.
With Assumptions A4–A6, if Condition C1 is satisfied, then ; if either Condition C2 or C3 is satisfied, then
2.4. Causal Interpretation
The F-measure is closely related with the concept of natural indirect effect, which is defined in the counterfactual framework [11,12]. We formulate Theorem 3 revealing the link with detailed proof in Appendix C.
Assumption A7.
.
Assumption A8.
.
Assumption A9.
.
Theorem 3.
Under Assumptions A7–A9, it holds that:
For the subgroup of F-measure’s numerator describes the natural indirect effect mediated by the surrogate marker while the denominator describes the average treatment effect. The ratio reflects the proportion of the average treatment effect mediated by the surrogate measure (in the sense of natural direct effect) for the subgroup surviving to marker measurement anyway. However, we also note that the causal interpretation does not apply in general [13].
3. Numerical Studies
To assess the proposed surrogate measure, we conduct numerical studies motivated by the HIV Prevention Trial Network. The plasma HIV-1 viral load represents the degree of viral burden and is believed to play a crucial role in mediating the benefit of antiretroviral therapy (ART) on HIV-related disease progression and transmission. We consider a viral load measurement dichotomized by a threshold of 1000 copies per cubic millimeter as the biomarker of interest. In a hypothetical scenario, participants have some HIV-1 exposure at the enrollment. The viral load level may increase fast in the follow-up while an effective intervention could delay the virus proliferation and further suspend the failure time. We express the above scenario in the following mathematical models. The dichotomous viral load level at time t is modeled as , where denotes the time when one’s viral load shifts from level 0 to 1 after enrollment. We assume follows an exponential distribution with mean in intervention group z, and a time-varying Cox–Weibull model:
where Z is Bernoulli with success probability of 0.5.
3.1. Numerical Examples
In Figure 1, we explore the numerical behavior of the time-varying F-measure under the motivation scenario described above. In particular, we assume Model (2) with a constant baseline hazard where Without loss of generality, we assume and . With the model assumption, the F-measure has a closed-form formula with details in the Appendix D. Figure 1a has varying visualized F-measures curves from an useless marker with to partial markers with Figure 1b has varying gives the F-measure curves from a perfect marker with to partial markers with
Figure 1.
F-measure curves describing the surrogacy level for survival status at Year 5.
3.2. Monte–Carlo Simulation
In this section, we describe our Monte–Carlo simulation to evaluate the proposed non-parametric estimator. We generate failure times for each z-group based on a closed-form approach described in Austin [14]: First, a random value u is generated from the Uniform(0, 1) distribution and the subject-specific shift time is generated from an exponential distribution with mean ; second, if , we let failure time otherwise, In addition, we generate the censoring times from Uniform , in which is chosen to give a censoring rate of . The censoring is independent of the failure time T, the covariates Z and .
Table 1 summarizes the simulation results. We consider the scale parameter v to be 0.8, 1, and 1.2, representing when the hazard is decreasing, constant, and increasing with time, respectively. We are interested in the surrogacy level of the t-th year marker measurement for the treatment effect on the c-th year survival probability. For each setting of v, we choose (years) and (years). We show typical scenarios when a surrogate marker is perfect, useless, or partial. A perfect surrogate marker explains all the treatment effect on the clinical endpoint, i.e., an useless maker is conditionally independent of the failure time given Z, i.e., ; a partial marker, the most common scenario in practice, is beyond the above extreme situations. Without loss of generality, we consider a treatment delaying the failure time by both directly affecting the clinical endpoint and suppressing a harmful marker. That is, , Specifically, here are the configurations for the three scenarios in Table 1: (1) A perfect marker: (2) a useless marker: and (3) a partial marker: We replicate 1000 times with 20,000 subjects. Under a large sample size, the estimator is unbiased; its variance accurately reflects the sampling variation; the coverage of 95% Wald-type confidence intervals is close to the nominal probability. One limitation for the non-parametric estimator is its lack of efficiency, which is the price for avoiding model misspecification.
Table 1.
Simulation results under Cox–Weibull distribution. The sample size of the study is 20,000 subjects and the coverage probability is obtained by 1000 replicates.
4. Data Analysis
We apply the proposed time-varying F-measure to the HIV Prevention Trial Network (HPTN) 052 study [15]. The study enrolled 1763 serodiscordant couples in which one participant was HIV-positive, and the other was HIV-negative. The HIV-positive patients were randomly assigned to receive either immediate or delayed ART in a 1:1 ratio. Patients on the delayed arm started ART when two consecutive CD4+ cell count measurements fell below 250 per cubic millimeter or an indicator of AIDS developed. The study monitored the earlier occurrence of severe clinical outcomes in HIV-positive patients or HIV transmission to HIV-negative partners as a key endpoint. It is believed that plasma viral load mediates the effect of ART on HIV-associated disease progression and transmission [16].
In this application, we consider the plasma viral load as a candidate marker and evaluate its surrogacy level on the composite monitoring endpoint in a 3-year follow-up. To explain the idea in a simple way, we dichotomize the viral load using a threshold of 1000 copies per cubic millimeter. More specifically, we set the marker value to be 1 for a viral load greater than 1000. We estimate the time-varying F-measure for the viral load measured at each of the 2nd to 7th quarter after randomization. Table 2 shows results of the application. Comparing the prevalence of a high viral load between the two arms reveals that ART was very effective in suppressing viral proliferation. In addition, a low viral load significantly decreases the hazard of the composite endpoint on the immediate arm before the treatment effect kicks in on the delayed arm. The time-varying F-measure gradually increases until reaching its maximum at the 6th quarter. This temporal pattern reflects the fact that the surrogacy level is a combination of the treatment effect on the marker and the marker effect on the clinical endpoint. On the one hand, it takes time to realize the effect of viral load suppression. On the other hand, as an increasing number of patients on the delayed arm began ART, the difference in marker distribution between two arms become smaller. The time-varying F-measure correctly reflects the temporal pattern and the biological mechanism of ART.
Table 2.
Application to an HIV prevention trial HPTN 052. The proposed time-varying F-measure captures the proportion of treatment effect explained by the plasma HIV-1 viral load.
5. Discussion
In this paper, we consider a definition of time-varying F-measure based on three aspects. First, there is the question of whether there is a sound interpretation for comparisons. Second, do the typical marker types, such as perfect or useless markers, correspond to reasonable values. Third, is the defined F-measure model-free and equipped with a non-parametric estimation? Guided by the three questions, we define the time-varying F-measure in Section 2. In addition, we explore two alternative definitions. Both of them do not conduct an appropriate comparison. With the F-measure can be defined as:
When the availability of an internal marker depends on the failure time (e.g., event is death-related), should include “not applicable” as a possible value for subjects with In this case, is determined by both the treatment effect on the marker and that on the primary endpoint. Compared to , the adjusted survival probability actually removes a portion of the direct treatment effect. This definition does not reflect the proportion of treatment effect explained by the surrogate marker in general. With the F-measure can be defined as:
A closer look at the marker distribution reveals that:
If there is no interaction between the marker and intervention, the independence of and Z in the risk set at time point t could translate to the independence at time point In other words, only if then is equivalent to and is constant with Assumption of no interaction is, unfortunately, necessary for the appropriateness of the definition with hazard functions. As a contrast, the time-varying F definition introduced in Section 2 has a sound interpretation, reasonable ranges, and model-free definition and estimation. Moreover, numerical studies and practical data analysis verify the measure’s numerical behavior.
The time-varying F-measure is a generalization of the PTE [6] and F-measure [9]. All three measures are quantitative ones based on the qualitative Prentice Criterion [5]. While Prentice Criterion tests and requires a surrogate marker to capture the treatment effect fully, the three quantitative measures compare the treatment effects unadjusted and adjusted by the marker distribution on the treatment arm. Beyond the similarities, PTE is defined for binary endpoints and relies on logistic regressions for definition and estimation; the F-measure is a model-free version of PTE, however it does not cover how to assess surrogate markers for time-to-event outcomes. The time-varying F-measure brings in the time dimension and extends the measure for time-to-event outcomes in survival settings.
6. Conclusions
This paper introduces a generalized proportion of treatment effect for survival settings, called the time-varying F-measure. Without relying on any model assumption, the measure reflects the proportion of the average treatment effect mediated by the surrogate marker. In addition, the paper introduces a non-parametric estimator to maximize the measure’s model-free characteristics. One limitation of the current estimation method is its lack of efficiency, which can be a future research direction. We applied the generalized F-measure to assess the viral load as a surrogate marker for HIV progression and transmission in the HPTN052 study. The time-varying F-measure increased from 0.18 in the 2nd quarter after randomization and reached 1.12 in the 6th quarter. It correctly captured the temporal pattern and biological mechanism of how ART regulates HIV progression and transmission by suppressing viral replication.
Author Contributions
Conceptualization, R.Z. and Y.-Q.C.; Formal analysis, R.Z.; Investigation, R.Z., F.X. and Y.W.; Methodology, R.Z.; Resources, Y.-Q.C.; Software, R.Z.; Supervision, Y.-Q.C.; Validation, R.Z.; Writing—original draft, R.Z.; Writing—review & editing, Y.-Q.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partly supported by the grants from NIH/NICHD R01 HD094682 and NIH/NIAID R56 AI140953.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of Theorem 1
Proof.
The proof consists of three steps. First, we decompose as multiple empirical processes. Second, the convergence of each empirical process is derived. Third, we combine the asymptotic results and conclude the proof.
Step 1: We first write as:
and tackle the parts one by one. For the first part, we plug in the estimator (1) and obtain:
Rearranging the terms yields:
Then we collect the terms and write the equation in the form of , and as:
Similarly, we write the second part as:
Step 2: In this step, we derive the convergence of each empirical process. Under the assumption of random censoring, Kaplan–Meier estimator satisfies [17]:
where represents the counting process martingale. Therefore, the Kaplan–Meier estimator , , and satisfy:
where is an indicator function for subjects’ group.
Next we write in the form of:
It can be readily shown that:
Let and . Since is a consistent estimator of , we obtain:
as a consequence of Slutsky’s theorem.
Step 3: In this step, we combine the above results and conclude the convergence of . We first introduce some notation,
Combining the above results and apply Slutsky’s theorem, we can write:
It follows that converges weakly to a zero-mean Gaussian process with covariance function between time points c and , where:
The covariance function can be consistently estimated by with:
where , and is the subject i’s realization of (A1)–(A4), respectively. The specific forms are:
□
Appendix B. Proof of Theorem 2
Proof.
The proof consists of two steps. First, we show the sufficiency of Condition C1 for Second, we show the sufficiency of either Condition C2 or C3 for .
Step 1: We first expand as If Condition C1 is satisfied, we have Simple algebra reveals Given we conclude:
Step 2: If in the treatment group is stochastically greater than that in the control group, for bounded and increasing function When is increasing with we have that is, Use the same argument, if in the control group is stochastically greater than that in the treatment group and is decreasing with we have that is Given , we conclude:
In summary, if Condition C1 and C2 (or C3) are satisfied, the F-measure is bounded within (0,1). □
Appendix C. Proof of Theorem 3
Proof.
Step 1. In this step, we show We first write:
By Assumption A7, we have:
Further, Assumption A8 yields:
In a similar way, we can show
Step 2. In this step, we show By definition,
In the following, we work on the two probability components one by one.
The cross-world independence described in Assumption A7 leads to:
Furthermore, Assumption A8 yields:
The remaining probability component:
Then we have:
Assumption A9 gives so that:
Collecting the equations for the two probability components, we show:
□
Appendix D. F-Measure under the Time-Varying Cox–Weibull Model
To facilitate the exploration and understanding of the F-measure, we calculate its true value under a time-varying Cox–Weibull model as an illustrative example. We follow the notation described in the main paper: Z denotes treatment assignment (control = 0; treatment = 1); denotes the value of the marker at time t; T denotes the failure time; and c denotes the pre-specified time of interest for survival.
We consider the time-varying Cox–Weibull model:
where the marker value satisfies and follows an exponential distribution with mean . The definition of the F-measure reads:
With the Bayes rule, the conditional survival probability can be written in the form of:
In general, for
The first term of (A5):
Plugging in and the density leads to:
For a general Cox–Weibull model with , there is no closed form formula and we need to refer to a numerical evaluation. Similarly, the second term of (A5):
Next, the adjusted probability in the numerator of the F-measure , We use Bayes rule and obtain:
Plugging in and the density leads to:
Following the same lines, we work on:
To simplify the math, we introduce the following notation:
With the above notation, Equation (A5) writes:
The remaining parts in the F-measure definition are:
Gathering all the pieces together, we have:
When (i.e., the failure time follows an exponential distribution), terms A and C are equipped with a closed-form formula in the form of:
References
- FDA. New drug, antibiotic and biological drug product regulations: Accelerated approval. Fed. Regist. 1992, 57, 13234–13242. [Google Scholar]
- Baker, S.G.; Kramer, B.S. A perfect correlate does not a surrogate make. BMC Med. Res. Methodol. 2003, 3, 16. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Fleming, T.R.; Powers, J.H. Biomarkers and surrogate endpoints in clinical trials. Stat. Med. 2012, 31, 2973–2984. [Google Scholar] [CrossRef] [PubMed]
- Zhuang, R.; Chen, Y.Q. Measuring Surrogacy in Clinical Research. Stat. Biosci. 2020, 12, 295–323. [Google Scholar] [CrossRef] [PubMed]
- Prentice, R.L. Surrogate endpoints in clinical trials: Definition and operational criteria. Stat. Med. 1989, 8, 431–440. [Google Scholar] [CrossRef] [PubMed]
- Freedman, L.S.; Graubard, B.I.; Schatzkin, A. Statistical validation of intermediate endpoints for chronic diseases. Stat. Med. 1992, 11, 167–178. [Google Scholar] [CrossRef] [PubMed]
- Lin, D.Y.; Fleming, T.R.; De Gruttola, V. Estimating the proportion of treatment effect explained by a surrogate marker. Stat. Med. 1997, 16, 1515–1527. [Google Scholar] [CrossRef]
- Bycott, P.W.; Taylor, J.M. An evaluation of a measure of the proportion of the treatment effect explained by a surrogate marker. Control. Clin. Trials 1998, 19, 555–568. [Google Scholar] [CrossRef] [Green Version]
- Wang, Y.; Taylor, J.M. A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics 2002, 58, 803–812. [Google Scholar] [CrossRef] [PubMed]
- Kaplan, E.L.; Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457–481. [Google Scholar] [CrossRef]
- Robins, J.M.; Greenland, S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992, 3, 143–155. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Pearl, J. Direct and indirect effects. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA, USA, 2–5 August 2001; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2001; pp. 411–420. [Google Scholar]
- Taylor, J.M.; Wang, Y.; Thiébaut, R. Counterfactual Links to the Proportion of Treatment Effect Explained by a Surrogate Marker. Biometrics 2005, 61, 1102–1111. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Austin, P.C. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat. Med. 2012, 31, 3946–3958. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Cohen, M.S.; Chen, Y.Q.; McCauley, M.; Gamble, T.; Hosseinipour, M.C.; Kumarasamy, N.; Hakim, J.G.; Kumwenda, J.; Grinsztejn, B.; Pilotto, J.H.; et al. Antiretroviral therapy for the prevention of HIV-1 transmission. N. Engl. J. Med. 2016, 375, 830–839. [Google Scholar] [CrossRef] [PubMed]
- Murray, J.S.; Elashoff, M.R.; Iacono-Connors, L.C.; Cvetkovich, T.A.; Struble, K.A. The use of plasma HIV RNA as a study endpoint in efficacy trials of antiretroviral drugs. Aids 1999, 13, 797–804. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Fleming, T.R.; Harrington, D.P. Counting Processes and Survival Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 169. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).