1. Introduction
Clinical trials often collect intermediate, or surrogate, endpoints other than their true endpoint of interest. Surrogate endpoints are chosen because they occur more frequently, are easier to measure, or occur more proximally to the treatment time. The use of surrogate endpoints can result in a reduction in the required sample size for a trial, leading to shorter trial duration, as well as reduced costs of conducting clinical trials. A good surrogate endpoint is one that accurately reflects the effect of a given treatment on the true endpoint of interest while incurring lower cost or taking less time to measure. Some examples of surrogate endpoints include tumor progression as a surrogate endpoint for cancer-specific mortality, or CD4 counts in blood as a surrogate endpoint for AIDS mortality.
There exist several approaches for evaluating the strength of proposed surrogate endpoints. The first formalized approach to surrogate endpoint validation was presented by Prentice in 1989, who suggested that, among other criteria, a good surrogate should be highly correlated with the true endpoint [1]. He proposed testing a surrogate by including it, together with the treatment, in a regression model for the true endpoint and checking whether it eliminates the association between the treatment and the true endpoint [1]. Later work pointed out that this approach does not support causal claims about surrogate validity, since it ignores potential confounding between the surrogate endpoint and the true endpoint. Such confounding is possible despite randomization, because the surrogate endpoint is measured after treatment [2].
Since then, several approaches have been proposed to evaluate surrogates in a causal inference framework when data are available from a single trial in which both outcomes are measured. These methods can be categorized into two major types: “causal effects” and “causal association” [2,3,4]. The causal effects paradigm uses the potential outcomes framework, which considers all the outcomes that would potentially be observed if both the treatment and the placebo were applied to each subject (a combination of the observed outcomes and the counterfactual outcomes had a subject been assigned to the treatment they did not actually receive) [5]. Once the potential outcomes are defined, both the treatment and the surrogate endpoint are considered separately manipulable, and potential outcomes of the true endpoint are defined for all possible combinations of treatment and surrogate values [2]. This allows the total effect of the treatment to be decomposed into the direct effect of the treatment on the true endpoint and the indirect effect of the treatment that operates through the surrogate endpoint. For an ideal surrogate, the indirect effect would account for most of the total effect of the treatment on the true outcome, leaving little direct effect. In the causal association framework, only the treatment, not the surrogate, is considered manipulable. To account for the fact that the surrogate endpoint is measured after treatment, the causal association framework conditions on the joint counterfactual values of the surrogate endpoint under both treatment and control. Both the causal effects and causal association approaches use models that are not fully identifiable, since the joint counterfactual distribution is never completely observed. An alternative causal association approach, presented by Buyse et al. in 2000, applies to the meta-analytic setting, where data are available from multiple trials of the same treatment and surrogate combination [6]. This approach leverages data from multiple randomized trials to assess the effectiveness of a surrogate endpoint, allowing all parameters to be identified from the observed data [6]. This is the setting we consider in this paper.
The goal of assessing the validity of a surrogate is to ensure that the surrogate endpoint accurately captures the effect of the treatment on the true endpoint of interest. There are several examples of surrogate endpoints that were positively associated with both the treatment and the true endpoint of interest but did not accurately predict the treatment effect on the true endpoint. One notable example is the development of a drug to suppress ventricular arrhythmias, which were considered a surrogate for cardiac death. The drug was found to reduce ventricular arrhythmias, and ventricular arrhythmias were positively associated with cardiac death, which led to the approval of the drug. Subsequent follow-up trials found that the drug was associated with a significantly increased risk of cardiac death [7]. This phenomenon is known as the “surrogate paradox” [8]. The surrogate paradox occurs when the treatment has a beneficial effect on the surrogate outcome and the surrogate outcome is positively associated with the true outcome, yet the overall effect of the treatment on the true outcome is negative (harmful), leading to incorrect conclusions that can be dangerous to public health. It has been shown that establishing the validity of a surrogate endpoint under either the causal association or the causal effects framework is not enough to rule out the surrogate paradox [8]. The surrogate paradox may arise in several situations [9]. The first is when the direct effect of the treatment on the true outcome runs in the opposite direction to the indirect effect of the treatment through the surrogate. The second is when there is unmeasured confounding between the surrogate and true endpoints. The third is when the effects of the treatment on the surrogate and true endpoints differ at the individual level, meaning that the beneficial effect of the treatment on the surrogate endpoint is experienced by some patients and the beneficial effect on the true endpoint by a different set of patients. VanderWeele [9] discusses ways of assessing the risk of the surrogate paradox and concludes that the meta-analytic approach [6] is the most effective, since it studies the performance of a surrogate across multiple trials. Elliott et al. proposed measures to assess the risk of the surrogate paradox in the meta-analytic causal association framework [10].
Treatments may have different effects in different patient subpopulations, and some subpopulations in a study may be at different risks of experiencing the surrogate paradox. To address this possibility, we propose extensions of the surrogate paradox risk measures of Elliott et al. [10] that incorporate covariate information. If covariate information is ignored when measuring the risk of the surrogate paradox, a new trial in a population whose covariate distribution differs from that of past studies could expose its patients to a higher risk of surrogate paradox than expected. Incorporating covariate information may allow us to identify groups at particular risk of experiencing the surrogate paradox and help design future trials that use the surrogate. In the following sections, we describe the meta-analytic causal association setting of Buyse et al. [11], the surrogate paradox risk measures proposed by Elliott et al. [10], and then our methods for incorporating covariate information.
2. Background
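For surrogate marker $S_{ij}$ and outcome measure $T_{ij}$, where $i = 1, \ldots, N$ indexes the trials and $j = 1, \ldots, n_i$ indexes the subjects in the $i$th trial, Buyse et al. [6] considered the following distributions:
$$S_{ij} = \mu_S + m_{Si} + (\alpha + a_i) Z_{ij} + \varepsilon_{Sij}, \tag{1}$$
$$T_{ij} = \mu_T + m_{Ti} + (\beta + b_i) Z_{ij} + \varepsilon_{Tij}, \tag{2}$$
where $Z_{ij}$ is an indicator of treatment assignment, the residuals satisfy $(\varepsilon_{Sij}, \varepsilon_{Tij})^\top \sim N(\mathbf{0}, \Sigma)$ with $\Sigma = \begin{pmatrix} \sigma_{SS} & \sigma_{ST} \\ \sigma_{ST} & \sigma_{TT} \end{pmatrix}$, and the random effects satisfy $(m_{Si}, m_{Ti}, a_i, b_i)^\top \sim N(\mathbf{0}, D)$ with $D = (d_{kl})_{k,l = 1, \ldots, 4}$.
From this distribution, we can calculate the causal effect of a treatment $Z$ on the surrogate marker in the $i$th trial as
$$E(S_{ij} \mid Z_{ij} = 1, i) - E(S_{ij} \mid Z_{ij} = 0, i) = \alpha + a_i.$$
Similarly, the causal effect of a treatment $Z$ on the outcome measure in the $i$th trial is
$$E(T_{ij} \mid Z_{ij} = 1, i) - E(T_{ij} \mid Z_{ij} = 0, i) = \beta + b_i.$$
Buyse et al. used the above distribution to suggest a trial-level measure of surrogate validity called $R^2_{\text{trial}}$ [6]. $R^2_{\text{trial}}$ is the proportion of variance in $b_i$ explained by the trial-level random effects associated with the surrogate, $(m_{Si}, a_i)$, and is defined by
$$R^2_{\text{trial}} = \frac{\begin{pmatrix} d_{14} \\ d_{34} \end{pmatrix}^{\top} \begin{pmatrix} d_{11} & d_{13} \\ d_{13} & d_{33} \end{pmatrix}^{-1} \begin{pmatrix} d_{14} \\ d_{34} \end{pmatrix}}{d_{44}}.$$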
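To make the definition concrete, the sketch below evaluates $R^2_{\text{trial}}$ for a given $4 \times 4$ between-trial covariance matrix $D$, assuming the random effects are ordered $(m_{Si}, m_{Ti}, a_i, b_i)$ as in (1) and (2). The function name `r2_trial` and the example matrix are hypothetical; in an analysis, $D$ would be replaced by its estimate.

```python
def r2_trial(D):
    """Trial-level R^2 of Buyse et al. computed from a 4x4 covariance matrix D.

    Assumes random effects ordered (m_S, m_T, a, b), so with 0-based indices
    D[0][0] = d11, D[0][2] = d13, D[2][2] = d33, D[0][3] = d14, D[2][3] = d34,
    and D[3][3] = d44.
    """
    d11, d13, d33 = D[0][0], D[0][2], D[2][2]
    d14, d34, d44 = D[0][3], D[2][3], D[3][3]
    det = d11 * d33 - d13 ** 2
    if det <= 0 or d44 <= 0:
        raise ValueError("D must be positive definite")
    # Inverse of M = [[d11, d13], [d13, d33]], the covariance of (m_S, a).
    m_inv = [[d33 / det, -d13 / det], [-d13 / det, d11 / det]]
    v = [d14, d34]  # covariances of (m_S, a) with b
    quad = sum(v[r] * m_inv[r][c] * v[c] for r in range(2) for c in range(2))
    return quad / d44


# Hypothetical D in which only a and b are correlated (d34 = 0.5),
# so R^2 reduces to d34**2 / (d33 * d44) = 0.25.
D_example = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
print(r2_trial(D_example))
```

When $d_{14}$ and $d_{13}$ are zero, the expression collapses to the squared correlation between $a_i$ and $b_i$, which is a convenient sanity check.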
Elliott et al. use the joint distribution of $\alpha + a_i$ and $\beta + b_i$,
$$\begin{pmatrix} \alpha + a_i \\ \beta + b_i \end{pmatrix} \sim N\left( \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix} \right),$$
to develop several measures of surrogate paradox risk [10]. To motivate these measures, consider the contour plots of this joint distribution in Figure 1. Throughout the paper, we assume, without loss of generality, that the qualitative effects of the treatment on the surrogate marker and the true outcome are in the same direction, with positive effects beneficial and negative effects harmful. Each scenario in Figure 1 shows the joint distribution for a different set of trials. Based on the location of the joint distribution in the Cartesian plane, we can infer the risk of the surrogate paradox. If the distribution falls mostly in the first or third quadrant, there is little risk of surrogate paradox, since $\alpha + a_i$ and $\beta + b_i$ give the same qualitative conclusion. However, if the distribution falls in the second or fourth quadrant, the treatment effects on the surrogate and true outcomes are in opposite directions. By calculating the probability that the joint distribution falls in each quadrant, Elliott et al. obtain measures of the risk of surrogate paradox [10]. These measures depend on both the correlation between $\alpha + a_i$ and $\beta + b_i$ and the size of the treatment effects on both outcomes. For example, in Scenario 1, although there is a strong correlation between the treatment effects on the surrogate and true outcomes, there is still some risk of surrogate paradox because of the relatively small treatment effect on the true outcome. In Scenario 2, there is some risk that the treatment effect on the surrogate outcome is negative while the true treatment effect is positive; however, the larger true treatment effect means that there is a lower risk of the more dangerous surrogate paradox (i.e., a positive treatment effect on the surrogate with a negative true treatment effect). In Scenario 3, despite the very strong correlation between the treatment effects on the two outcomes, there is some risk of surrogate paradox because of the small treatment effect sizes. Finally, in Scenario 4, the correlation between the two treatment effects is low, but the risk of surrogate paradox is essentially precluded by the large treatment effects on both outcomes. In the remainder of this section, we describe Elliott et al.’s measures of surrogate paradox risk based on this joint distribution [10].
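2.1. $P_{13}$: Estimating the Probability That an Outcome and Marker Will Have the Same Direction of Treatment Effects in a New Trial
The first surrogate paradox measure is the probability that the $(N+1)$th trial will yield treatment effects on the marker and the outcome in the same direction. This probability is given by
$$P_{13} = \Phi_2\left(\mathbf{0}; -\begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}\right) + \Phi_2\left(\mathbf{0}; \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}\right),$$
where $\Phi_k(\cdot\,; \mu, V)$ is the cumulative distribution function of a $k$-variate normal distribution with mean $\mu$ and variance $V$, and $\mathbf{0} = (0, 0)^\top$; the two terms are the probabilities of the first and third quadrants, respectively. The subscript 13 in $P_{13}$ refers to the first and third quadrants of the Cartesian plane, the region in which the marker gives a qualitatively correct prediction of the treatment effect.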
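2.2. Estimating the Probability of Avoiding Dangerous Surrogate Paradox
A second measure of surrogate paradox risk focuses on the particularly dangerous situation in which the surrogate marker suggests a beneficial treatment effect but the treatment effect on the outcome measure is harmful. The probability of avoiding this situation is given by
$$1 - \Phi_2\left(\mathbf{0}; \begin{pmatrix} -\alpha \\ \beta \end{pmatrix}, \begin{pmatrix} d_{33} & -d_{34} \\ -d_{34} & d_{44} \end{pmatrix}\right),$$
where $\Phi_2$ is the bivariate normal cumulative distribution function and the subtracted term equals $\Pr(\alpha + a_{N+1} > 0, \beta + b_{N+1} < 0)$. This measure estimates the probability that the $(N+1)$th trial lies outside the fourth quadrant of the Cartesian plane (see Figure 1). It is the probability that a future trial will not result in a setting where the surrogate marker suggests the treatment is helpful when, in fact, it is harmful.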
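The same numerical approach gives a sketch of this second measure. It assumes the bivariate normal model of Section 2 with known (or plug-in) parameters; `bvn_cdf` is redefined so the block runs on its own, and `prob_avoid_dangerous` is a hypothetical name.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def bvn_cdf(h, k, rho, n=2000):
    """P(U <= h, V <= k), standard bivariate normal with correlation rho (Simpson's rule)."""
    lo, hi = -10.0, min(h, 10.0)
    if hi <= lo:
        return 0.0
    r = math.sqrt(1.0 - rho * rho)
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * _STD.pdf(u) * _STD.cdf((k - rho * u) / r)
    return total * step / 3.0


def prob_avoid_dangerous(alpha, beta, d33, d34, d44):
    """1 - P(alpha + a > 0, beta + b < 0): probability of lying outside quadrant 4."""
    sa, sb = math.sqrt(d33), math.sqrt(d44)
    rho = d34 / (sa * sb)
    # P(X > 0, Y < 0) = P(-X < 0, Y < 0), and (-X, Y) has correlation -rho.
    fourth = bvn_cdf(alpha / sa, -beta / sb, -rho)
    return 1.0 - fourth


# Zero means, rho = 0.5: quadrant 4 has probability 1/4 - asin(0.5)/(2*pi) = 1/6.
print(prob_avoid_dangerous(0.0, 0.0, 1.0, 0.5, 1.0))
```

Flipping the sign of the surrogate effect turns the fourth-quadrant probability into a lower-orthant probability, which is why the off-diagonal covariance changes sign in the formula above.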
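2.3. Estimating the Probability That an Outcome and Marker Will Have the Same Direction of Treatment Effects in a New Trial When Partial Data Have Been Collected
The first two measures apply when drawing inferences, based on N historic trials that have completed data collection, about a future trial that has not yet collected any data. In practice, a trial may have already begun data collection, and investigators may be interested in the risk of observing the surrogate paradox in their ongoing trial, conditional on the data from the historic trials. In particular, they may have collected data on the surrogate outcome but little or no data on the true outcome of interest. We consider the situation where partial data have been collected for the $N$th trial and we want to estimate the measures of surrogate paradox risk in this ongoing trial, conditional on both these partial data and the data from the first $N - 1$ trials.
Let $\mathbf{y}_{ij} = (S_{ij}, T_{ij})^\top$ constitute the surrogate marker and outcome for each subject, $\mathbf{X}_{ij} = \begin{pmatrix} 1 & 0 & Z_{ij} & 0 \\ 0 & 1 & 0 & Z_{ij} \end{pmatrix}$ be the fixed effect matrix associated with the parameters $\gamma = (\mu_S, \mu_T, \alpha, \beta)^\top$, and let $\mathbf{W}_{ij} = \mathbf{X}_{ij}$ be the random effect matrix associated with $\mathbf{u}_i = (m_{Si}, m_{Ti}, a_i, b_i)^\top$. Let $\mathbf{y}_N$, $\mathbf{X}_N$, and $\mathbf{W}_N$ represent the stacked elements of $\mathbf{y}_{Nj}$, $\mathbf{X}_{Nj}$, and $\mathbf{W}_{Nj}$, respectively, over the subjects $j = 1, \ldots, n_N$ in the $N$th trial (where $n_N$ is the total sample of the $N$th trial so far), and let $\mathbf{u}_N$ represent the trial-level random effects of the $N$th trial.
The conditional distribution of $\mathbf{u}_N \mid \mathbf{y}_N$ can be found by considering the joint distribution of $\mathbf{u}_N$ and $\mathbf{y}_N$, with $\mathbf{u}_N \sim N(\mathbf{0}, D)$ and $\mathbf{y}_N \mid \mathbf{u}_N \sim N(\mathbf{X}_N \gamma + \mathbf{W}_N \mathbf{u}_N, R)$, where $R$ is a $2n_N \times 2n_N$ block-diagonal matrix with diagonal blocks $\Sigma$ representing the individual-level residual variance. Then, we have
$$\mathbf{u}_N \mid \mathbf{y}_N \sim N\left(\mu_{u \mid y}, V_{u \mid y}\right),$$
where $\mu_{u \mid y} = D \mathbf{W}_N^\top \left(\mathbf{W}_N D \mathbf{W}_N^\top + R\right)^{-1} \left(\mathbf{y}_N - \mathbf{X}_N \gamma\right)$ and $V_{u \mid y} = D - D \mathbf{W}_N^\top \left(\mathbf{W}_N D \mathbf{W}_N^\top + R\right)^{-1} \mathbf{W}_N D$. From here, the measure of surrogate paradox risk is given by
$$\Phi_2\left(\mathbf{0}; -\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} v_{33} & v_{34} \\ v_{34} & v_{44} \end{pmatrix}\right) + \Phi_2\left(\mathbf{0}; \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} v_{33} & v_{34} \\ v_{34} & v_{44} \end{pmatrix}\right),$$
where $\mu_1 = \hat{\alpha} + \hat{u}_3$, for $\hat{u}_3$ corresponding to the third element of the maximum likelihood (ML) or restricted maximum likelihood (REML) estimate of $\mu_{u \mid y}$ and $\hat{\alpha}$ corresponding to the third element of the ML/REML estimate of $\gamma$; $\mu_2 = \hat{\beta} + \hat{u}_4$, for $\hat{u}_4$ corresponding to the fourth element of the ML/REML estimate of $\mu_{u \mid y}$ and $\hat{\beta}$ corresponding to the fourth element of the ML/REML estimate of $\gamma$; and $v_{kl}$ corresponds to the $(k, l)$ element of the ML/REML estimate of $V_{u \mid y}$. Similarly, we can derive the partial-data analogue of the measure in Section 2.2.
This measure allows the risk of surrogate paradox to be assessed after some data have been collected in the trial. This could be useful once the surrogate outcome has been measured for some patients but there are not yet many (or any) measurements of the true endpoint, which may occur later in the study. When $T_{Nj}$ is missing, $\mathbf{y}_{Nj}$ can be replaced with $S_{Nj}$ in the above calculations, while leaving placeholder rows for the missing $T_{Nj}$.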
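The conditioning step can be sketched with NumPy, assuming known or plug-in (e.g., ML/REML) values of $\gamma$, $D$, and $R$ under the Gaussian model $\mathbf{y} = \mathbf{X}\gamma + \mathbf{W}\mathbf{u} + \mathbf{e}$. The helper names and all example numbers are hypothetical. As an assumption of this sketch, rows for missing outcomes are simply omitted from $\mathbf{y}$, $\mathbf{X}$, $\mathbf{W}$, and $R$, which is one way to handle a trial in which only the surrogate has been observed so far.

```python
import numpy as np


def buyse_design(z, observed_t=None):
    """Stack the design rows of (1)-(2) for one trial; here X_ij = W_ij.

    Columns follow (mu_S, mu_T, alpha, beta). Each subject contributes the row
    [1, 0, z, 0] for S and, only if its outcome is observed, [0, 1, 0, z] for T.
    """
    if observed_t is None:
        observed_t = [True] * len(z)
    rows = []
    for zj, has_t in zip(z, observed_t):
        rows.append([1.0, 0.0, zj, 0.0])
        if has_t:
            rows.append([0.0, 1.0, 0.0, zj])
    return np.array(rows, dtype=float)


def conditional_random_effects(y, X, W, gamma, D, R):
    """Mean and covariance of u | y for y = X gamma + W u + e, u ~ N(0, D), e ~ N(0, R)."""
    V = W @ D @ W.T + R              # marginal covariance of y
    K = np.linalg.solve(V, W @ D).T  # equals D W' V^{-1} because V and D are symmetric
    mean = K @ (y - X @ gamma)
    cov = D - K @ W @ D
    return mean, cov


# Hypothetical ongoing trial: surrogate observed for 4 subjects, no outcomes yet.
z = [1, 1, 0, 0]
Xn = buyse_design(z, observed_t=[False] * 4)
y = np.array([1.2, 0.9, 0.1, -0.2])          # S values only
gamma = np.array([0.0, 0.0, 0.8, 0.5])       # (mu_S, mu_T, alpha, beta), plug-in values
D = np.array([[0.2, 0.0, 0.0, 0.0],
              [0.0, 0.2, 0.0, 0.0],
              [0.0, 0.0, 0.1, 0.05],
              [0.0, 0.0, 0.05, 0.1]])
R = 0.5 * np.eye(len(y))                     # sigma_SS = 0.5 for the S-only rows
mean, cov = conditional_random_effects(y, Xn, Xn, gamma, D, R)
effect_mean = gamma[2:4] + mean[2:4]         # (mu_1, mu_2) in the text
effect_cov = cov[2:4, 2:4]                   # [[v33, v34], [v34, v44]]
print(effect_mean, effect_cov)
```

Because $d_{34} \neq 0$, the surrogate-only data also update the outcome effect $b_N$; `effect_mean` and `effect_cov` can then be passed to a bivariate normal orthant calculation such as the one sketched for Section 2.1.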
2.4. s: Estimating the Size of the Beneficial Treatment Effect on the Marker Required to Preclude a Harmful Treatment Effect on the Outcome
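In the fourth surrogate paradox measure, Elliott et al. consider the minimum observed beneficial treatment effect on the marker that reduces the probability of a harmful true treatment effect on the outcome to a preset level. Let $\hat{\delta}_S = \bar{S}_1 - \bar{S}_0$ represent the difference between the observed surrogate marker means under treatment and control in the $(N+1)$th trial, and let $s$ denote its observed value. Note that $\hat{\delta}_S$ estimates the true effect $\alpha + a_{N+1}$ with sampling error. Then, the joint distribution of the true treatment effect on the outcome and the observed treatment effect on the surrogate marker is given by
$$\begin{pmatrix} \beta + b_{N+1} \\ \hat{\delta}_S \end{pmatrix} \sim N\left(\begin{pmatrix} \beta \\ \alpha \end{pmatrix}, \begin{pmatrix} d_{44} & d_{34} \\ d_{34} & d_{33} + \tau^2 \end{pmatrix}\right),$$
where $\tau^2 = \sigma_{SS}\left(1/n_1 + 1/n_0\right)$, $n_1$ and $n_0$ are the numbers of subjects assigned to treatment and control, and $\sigma_{SS}$ is the residual variance of the surrogate. From here, they find that the distribution of the true treatment effect on the outcome, $\beta + b_{N+1}$, conditional on a given observed treatment effect $\hat{\delta}_S = s$ is
$$\beta + b_{N+1} \mid \hat{\delta}_S = s \sim N\left(\mu_{T \mid s}, \sigma^2_{T \mid s}\right), \tag{3}$$
where
$$\mu_{T \mid s} = \beta + \frac{d_{34}}{d_{33} + \tau^2}(s - \alpha)$$
and
$$\sigma^2_{T \mid s} = d_{44} - \frac{d_{34}^2}{d_{33} + \tau^2}.$$
The authors propose two ways to proceed from here. If data are collected to determine $s$, we can calculate the probability that the true effect on the outcome in the trial will be non-negative by replacing the parameters in (3) with their estimates from the data. Alternatively, we can determine the value of $s$ that ensures that the probability that $\beta + b_{N+1}$ is negative is less than or equal to a preset level $\epsilon$:
$$\Pr\left(\beta + b_{N+1} < 0 \mid \hat{\delta}_S = s\right) = \Phi\left(-\frac{\mu_{T \mid s}}{\sigma_{T \mid s}}\right) \le \epsilon,$$
where $\Phi$ is the standard normal cumulative distribution function. When $d_{34} > 0$, this holds for all $s \ge \alpha + \frac{d_{33} + \tau^2}{d_{34}}\left(\sigma_{T \mid s}\, z_{1-\epsilon} - \beta\right)$, where $z_{1-\epsilon}$ is the $(1-\epsilon)$ quantile of the standard normal distribution.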
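The threshold can be sketched in a few lines, assuming the bivariate normal model for (true outcome effect, observed surrogate effect), $d_{34} > 0$, and a sampling variance of the observed surrogate effect equal to `tau2` $= \sigma_{SS}(1/n_1 + 1/n_0)$. The function names and the example inputs are hypothetical; the round trip through `harm_probability` confirms that the returned $s$ attains the preset level $\epsilon$.

```python
import math
from statistics import NormalDist


def harm_probability(s, alpha, beta, d33, d34, d44, tau2):
    """P(beta + b < 0 | observed surrogate effect s) under the conditional normal model."""
    denom = d33 + tau2
    mu = beta + d34 / denom * (s - alpha)
    sd = math.sqrt(d44 - d34 ** 2 / denom)
    return NormalDist().cdf(-mu / sd)


def s_threshold(eps, alpha, beta, d33, d34, d44, tau2):
    """Smallest observed surrogate effect s with harm probability <= eps (needs d34 > 0)."""
    if d34 <= 0:
        raise ValueError("this closed form assumes d34 > 0")
    denom = d33 + tau2
    sd = math.sqrt(d44 - d34 ** 2 / denom)
    z = NormalDist().inv_cdf(1.0 - eps)
    return alpha + denom / d34 * (sd * z - beta)


# Hypothetical inputs: tau2 = sigma_SS * (1/n1 + 1/n0) with sigma_SS = 1 and n1 = n0 = 100.
params = dict(alpha=0.3, beta=0.2, d33=0.1, d34=0.05, d44=0.1, tau2=1.0 * (1 / 100 + 1 / 100))
s_eps = s_threshold(0.05, **params)
print(s_eps, harm_probability(s_eps, **params))  # second value is 0.05 up to rounding
```

Larger trials shrink `tau2`, which tightens the link between the observed and true surrogate effects and, other things equal, lowers the required threshold.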
3. Incorporating Covariates
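Treatments may have heterogeneous effects on surrogate and true endpoints in different patient populations, exposing some subpopulations to an increased risk of surrogate paradox. It is therefore important that measures of surrogate paradox risk can account for patient-level factors. To address this concern, a natural extension of Elliott et al. [10] is to incorporate covariate information by conditioning on a set of covariates, making the measures above (Sections 2.1, 2.2, 2.3, and 2.4) functions of covariates $X$. We consider a situation where the surrogate and outcome measures depend on a set of covariates in addition to the treatment, and we extend (1) and (2) to incorporate covariates:
$$S_{ij} = \mu_S + m_{Si} + (\alpha + a_i) Z_{ij} + \sum_{k=1}^{p} \left(\theta_{Sk} + t_{Sik}\right) X_{ijk} + \sum_{k=1}^{p} \left(\omega_{Sk} + w_{Sik}\right) Z_{ij} X_{ijk} + \varepsilon_{Sij},$$
$$T_{ij} = \mu_T + m_{Ti} + (\beta + b_i) Z_{ij} + \sum_{k=1}^{p} \left(\theta_{Tk} + t_{Tik}\right) X_{ijk} + \sum_{k=1}^{p} \left(\omega_{Tk} + w_{Tik}\right) Z_{ij} X_{ijk} + \varepsilon_{Tij},$$
where $k = 1, \ldots, p$ indexes the covariates, $\theta_{Sk}$ and $\theta_{Tk}$ are covariate main effects, $\omega_{Sk}$ and $\omega_{Tk}$ are treatment-by-covariate interactions, and $t_{Sik}$, $t_{Tik}$, $w_{Sik}$, and $w_{Tik}$ are the corresponding trial-level random effects.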
This model may be difficult to fit once $p$ is large, because the number of random effects grows with $p$. We consider two simplified scenarios that can be extended to a larger number of covariates if enough data are available:
Scenario 1: The effects of covariates on surrogate and outcome are constant across trials (i.e., no random effects related to the covariates X).
Scenario 2: The effects of covariates on surrogate and outcome are not constant across trials. To keep the notation manageable, we focus on the setting with only one scalar or binary covariate X (i.e., $p = 1$, and all random effects related to the covariate X are included), but the approach extends to higher-dimensional covariates.
Although it is theoretically possible to consider a larger number of covariates, doing so is often infeasible, statistically or computationally, when the covariate effects are expected to differ by study, since the size of the random effect covariance matrix grows rapidly with the number of covariates.
In the following two sections, we recreate the surrogate paradox measures from Elliott et al. under each of the above scenarios.
3.1. Scenario 1
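Under scenario 1, we assume the effects of covariates on the surrogate and outcome measures are constant across trials:
$$S_{ij} = \mu_S + m_{Si} + (\alpha + a_i) Z_{ij} + \theta_S^\top X_{ij} + \omega_S^\top X_{ij} Z_{ij} + \varepsilon_{Sij},$$
$$T_{ij} = \mu_T + m_{Ti} + (\beta + b_i) Z_{ij} + \theta_T^\top X_{ij} + \omega_T^\top X_{ij} Z_{ij} + \varepsilon_{Tij},$$
where $X_{ij} = (X_{ij1}, \ldots, X_{ijp})^\top$, $\theta_S$ and $\theta_T$ are covariate main effects, $\omega_S$ and $\omega_T$ are treatment-by-covariate interactions, and the random effects $(m_{Si}, m_{Ti}, a_i, b_i)^\top \sim N(\mathbf{0}, D)$ are as in (1) and (2).
Then, we can choose a level $x_k$ for each $X_k$ in $X = (X_1, \ldots, X_p)^\top$ and calculate the causal effect of a treatment $Z$ on the surrogate marker among subjects with $X = x$ in the $i$th trial as
$$\Delta_{Si}(x) = \alpha + a_i + \omega_S^\top x.$$
Similarly, the causal effect of a treatment $Z$ on the outcome measure among subjects with $X = x$ in the $i$th trial is
$$\Delta_{Ti}(x) = \beta + b_i + \omega_T^\top x.$$
Thus, $\Delta_{Si}(x)$ and $\Delta_{Ti}(x)$ have the joint distribution:
$$\begin{pmatrix} \Delta_{Si}(x) \\ \Delta_{Ti}(x) \end{pmatrix} \sim N\left(\begin{pmatrix} \alpha + \omega_S^\top x \\ \beta + \omega_T^\top x \end{pmatrix}, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}\right).$$
This distribution is a mean shift of the non-covariate-adjusted distribution, while the variance is the same as in the original, no-subgroup distribution. To visualize this, refer to Scenario 1 in Figure 2. The risk of surrogate paradox may therefore differ between covariate groups, and these differences can be identified by calculating the probability of falling into each quadrant at each covariate level.
3.1.1. Scenario 1: $P_{13}$
Using the new joint distribution, the probability that the $(N+1)$th trial will yield treatment effects on the marker and outcome in the same direction among subjects with $X = x$ is given by
$$P_{13}(x) = \Phi_2\left(\mathbf{0}; -\mu_x, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}\right) + \Phi_2\left(\mathbf{0}; \mu_x, \begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}\right), \quad \mu_x = \begin{pmatrix} \alpha + \omega_S^\top x \\ \beta + \omega_T^\top x \end{pmatrix},$$
where $\Phi_k(\cdot\,; \mu, V)$ is the cumulative distribution function of a $k$-variate normal distribution with mean $\mu$ and variance $V$.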
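Under scenario 1 the covariate only shifts the mean, so the Section 2.1 calculation can be reused with shifted means. The sketch below assumes known (or plug-in) values of $\alpha$, $\beta$, the interaction vectors $\omega_S$ and $\omega_T$, and $d_{33}, d_{34}, d_{44}$; the function name `p13_given_x` and the example values are hypothetical, and `bvn_cdf` is redefined so the block runs on its own.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def bvn_cdf(h, k, rho, n=2000):
    """P(U <= h, V <= k), standard bivariate normal with correlation rho (Simpson's rule)."""
    lo, hi = -10.0, min(h, 10.0)
    if hi <= lo:
        return 0.0
    r = math.sqrt(1.0 - rho * rho)
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * _STD.pdf(u) * _STD.cdf((k - rho * u) / r)
    return total * step / 3.0


def p13_given_x(x, alpha, beta, omega_s, omega_t, d33, d34, d44):
    """Scenario 1: P13(x), where the covariate vector x shifts only the mean of the effects."""
    mean_s = alpha + sum(w * xi for w, xi in zip(omega_s, x))
    mean_t = beta + sum(w * xi for w, xi in zip(omega_t, x))
    sa, sb = math.sqrt(d33), math.sqrt(d44)
    rho = d34 / (sa * sb)  # unchanged across covariate levels in scenario 1
    return (bvn_cdf(mean_s / sa, mean_t / sb, rho)
            + bvn_cdf(-mean_s / sa, -mean_t / sb, rho))


# Hypothetical binary covariate with interaction effects omega_S = 0.4 and omega_T = 0.3.
params = dict(alpha=0.1, beta=0.05, omega_s=[0.4], omega_t=[0.3], d33=0.1, d34=0.05, d44=0.1)
for level in (0, 1):
    print(level, p13_given_x([level], **params))
```

At the reference level $x = 0$ the result coincides with the unadjusted $P_{13}$ evaluated at $(\alpha, \beta)$.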
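3.1.2. Scenario 1: Probability of Avoiding Dangerous Surrogate Paradox
Under the new joint distribution, the probability of avoiding the setting in which the treatment effect on the marker is beneficial but the treatment effect on the outcome is harmful, among subjects with $X = x$, is given by
$$1 - \Phi_2\left(\mathbf{0}; \begin{pmatrix} -(\alpha + \omega_S^\top x) \\ \beta + \omega_T^\top x \end{pmatrix}, \begin{pmatrix} d_{33} & -d_{34} \\ -d_{34} & d_{44} \end{pmatrix}\right).$$
This measure estimates the probability that a future trial will not result in a setting where the surrogate marker suggests the treatment will be helpful when it is, in fact, harmful.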
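3.1.3. Scenario 1: Same-Direction Probability When Partial Data Have Been Collected
For this section, we consider the simplest case of one covariate for illustrative purposes. This can easily be extended to multiple covariates by extending the $\mathbf{X}_{ij}$ and $\mathbf{W}_{ij}$ matrices and the $\gamma$ and $\mathbf{u}_i$ vectors. Let $\mathbf{y}_{ij} = (S_{ij}, T_{ij})^\top$ constitute the surrogate marker and outcome for each subject,
$$\mathbf{X}_{ij} = \begin{pmatrix} 1 & 0 & Z_{ij} & 0 & X_{ij} & 0 & Z_{ij}X_{ij} & 0 \\ 0 & 1 & 0 & Z_{ij} & 0 & X_{ij} & 0 & Z_{ij}X_{ij} \end{pmatrix}$$
be the fixed effect matrix associated with the parameters $\gamma = (\mu_S, \mu_T, \alpha, \beta, \theta_S, \theta_T, \omega_S, \omega_T)^\top$, and let $\mathbf{W}_{ij} = \begin{pmatrix} 1 & 0 & Z_{ij} & 0 \\ 0 & 1 & 0 & Z_{ij} \end{pmatrix}$ be the random effect matrix associated with $\mathbf{u}_i = (m_{Si}, m_{Ti}, a_i, b_i)^\top$. Let $\mathbf{y}_N$, $\mathbf{X}_N$, and $\mathbf{W}_N$ represent the stacked elements of $\mathbf{y}_{Nj}$, $\mathbf{X}_{Nj}$, and $\mathbf{W}_{Nj}$, $j = 1, \ldots, n_N$.
Consider the vector of random effects $\mathbf{u}_N$ of the $N$th trial. The conditional distribution of $\mathbf{u}_N \mid \mathbf{y}_N$ can be found by considering the joint distribution of $\mathbf{u}_N \sim N(\mathbf{0}, D)$ and $\mathbf{y}_N \mid \mathbf{u}_N \sim N(\mathbf{X}_N \gamma + \mathbf{W}_N \mathbf{u}_N, R)$, with $R$ representing a $2n_N \times 2n_N$ block-diagonal matrix with diagonal blocks $\Sigma$, as before:
$$\mathbf{u}_N \mid \mathbf{y}_N \sim N\left(\mu_{u \mid y}, V_{u \mid y}\right),$$
where $\mu_{u \mid y} = D \mathbf{W}_N^\top (\mathbf{W}_N D \mathbf{W}_N^\top + R)^{-1} (\mathbf{y}_N - \mathbf{X}_N \gamma)$ and $V_{u \mid y} = D - D \mathbf{W}_N^\top (\mathbf{W}_N D \mathbf{W}_N^\top + R)^{-1} \mathbf{W}_N D$. From here, the measure of surrogate paradox risk among subjects with $X = x$ is given by
$$\Phi_2\left(\mathbf{0}; -\begin{pmatrix} \mu_1(x) \\ \mu_2(x) \end{pmatrix}, \begin{pmatrix} v_{33} & v_{34} \\ v_{34} & v_{44} \end{pmatrix}\right) + \Phi_2\left(\mathbf{0}; \begin{pmatrix} \mu_1(x) \\ \mu_2(x) \end{pmatrix}, \begin{pmatrix} v_{33} & v_{34} \\ v_{34} & v_{44} \end{pmatrix}\right),$$
where $\mu_1(x) = \hat{\alpha} + \hat{\omega}_S x + \hat{u}_3$, for $\hat{\alpha}$ and $\hat{\omega}_S$ corresponding to the third and seventh elements of the estimate of $\gamma$ and $\hat{u}_3$ corresponding to the third element of the estimate of $\mu_{u \mid y}$; $\mu_2(x) = \hat{\beta} + \hat{\omega}_T x + \hat{u}_4$, for $\hat{\beta}$ and $\hat{\omega}_T$ corresponding to the fourth and eighth elements of the estimate of $\gamma$ and $\hat{u}_4$ corresponding to the fourth element of the estimate of $\mu_{u \mid y}$; and $v_{kl}$ corresponds to the $(k, l)$ element of the estimate of $V_{u \mid y}$. Similarly, we can derive the partial-data analogue of the measure in Section 3.1.2.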
3.1.4. Scenario 1: s Value
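In the fourth surrogate paradox measure, Elliott et al. consider the minimum observed beneficial treatment effect on a marker that reduces the probability of a harmful true treatment effect on the outcome to a preset level [10]. When considering covariate subgroups, we can compute the observed surrogate effect separately for each covariate level and call it $\hat{\delta}_S(x)$:
$$\hat{\delta}_S(x) = \bar{S}_{1x} - \bar{S}_{0x}$$
represents the difference between the observed surrogate marker means under treatment and control within a fixed level $X = x$. Then, the joint distribution of the true treatment effect on the outcome and the observed treatment effect on the surrogate marker is given by
$$\begin{pmatrix} \Delta_{T,N+1}(x) \\ \hat{\delta}_S(x) \end{pmatrix} \sim N\left(\begin{pmatrix} \beta + \omega_T^\top x \\ \alpha + \omega_S^\top x \end{pmatrix}, \begin{pmatrix} d_{44} & d_{34} \\ d_{34} & d_{33} + \tau_x^2 \end{pmatrix}\right),$$
where $\Delta_{T,N+1}(x) = \beta + b_{N+1} + \omega_T^\top x$, $\tau_x^2 = \sigma_{SS}(1/n_{1x} + 1/n_{0x})$, and $n_{1x}$ and $n_{0x}$ are the numbers of treated and control subjects with $X = x$. So, the distribution of the true treatment effect on the outcome $\Delta_{T,N+1}(x)$ conditional on a given observed treatment effect $\hat{\delta}_S(x) = s$ within the group having $X = x$ is $N(\mu_{T \mid s, x}, \sigma^2_{T \mid s, x})$, where
$$\mu_{T \mid s, x} = \beta + \omega_T^\top x + \frac{d_{34}}{d_{33} + \tau_x^2}\left(s - \alpha - \omega_S^\top x\right)$$
and
$$\sigma^2_{T \mid s, x} = d_{44} - \frac{d_{34}^2}{d_{33} + \tau_x^2}.$$
The value of $s$ that ensures that the probability that $\Delta_{T,N+1}(x)$ is negative is less than or equal to a preset level $\epsilon$ satisfies
$$\Phi\left(-\frac{\mu_{T \mid s, x}}{\sigma_{T \mid s, x}}\right) \le \epsilon, \quad \text{i.e., for } d_{34} > 0, \quad s \ge \alpha + \omega_S^\top x + \frac{d_{33} + \tau_x^2}{d_{34}}\left(\sigma_{T \mid s, x}\, z_{1-\epsilon} - \beta - \omega_T^\top x\right),$$
where $z_{1-\epsilon}$ is the $(1-\epsilon)$ quantile of the standard normal distribution.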
3.2. Scenario 2
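Under scenario 2, we assume the effects of the covariates on the surrogate and outcome are not constant across trials. For simplicity, we consider only one scalar or binary covariate $X$:
$$S_{ij} = \mu_S + m_{Si} + (\alpha + a_i) Z_{ij} + (\theta_S + t_{Si}) X_{ij} + (\omega_S + w_{Si}) Z_{ij} X_{ij} + \varepsilon_{Sij},$$
$$T_{ij} = \mu_T + m_{Ti} + (\beta + b_i) Z_{ij} + (\theta_T + t_{Ti}) X_{ij} + (\omega_T + w_{Ti}) Z_{ij} X_{ij} + \varepsilon_{Tij},$$
where
$$\mathbf{u}_i = (m_{Si}, m_{Ti}, a_i, b_i, t_{Si}, t_{Ti}, w_{Si}, w_{Ti})^\top \sim N(\mathbf{0}, D),$$
with $D = (d_{kl})$ now an $8 \times 8$ matrix.
Now, we can choose a level $x$ for the covariate $X$ and calculate the causal effect of a treatment $Z$ on the surrogate marker among subjects with $X = x$ in the $i$th trial as
$$\Delta_{Si}(x) = \alpha + a_i + (\omega_S + w_{Si}) x.$$
Similarly, the causal effect of a treatment $Z$ on the outcome measure among subjects with $X = x$ in the $i$th trial is
$$\Delta_{Ti}(x) = \beta + b_i + (\omega_T + w_{Ti}) x.$$
Thus, $\Delta_{Si}(x)$ and $\Delta_{Ti}(x)$ have the joint distribution:
$$\begin{pmatrix} \Delta_{Si}(x) \\ \Delta_{Ti}(x) \end{pmatrix} \sim N\left(\begin{pmatrix} \alpha + \omega_S x \\ \beta + \omega_T x \end{pmatrix}, D_x\right),$$
where
$$D_x = \begin{pmatrix} d_{33} + 2x d_{37} + x^2 d_{77} & d_{34} + x(d_{38} + d_{47}) + x^2 d_{78} \\ d_{34} + x(d_{38} + d_{47}) + x^2 d_{78} & d_{44} + 2x d_{48} + x^2 d_{88} \end{pmatrix}.$$
Compared with the original, no-subgroup distribution, this distribution involves both a mean shift and a change in variance. To visualize this, refer to Scenario 2 in Figure 2 and Figure 3. The risk of surrogate paradox therefore changes with the covariate level through both the location and the spread of the joint distribution. We can use this distribution to construct the four surrogate paradox measures proposed by Elliott et al.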
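The mean and covariance of the covariate-specific effects can be computed by applying a $2 \times 8$ selection matrix $L_x$ to $\gamma$ and $D$, so that the covariance is $L_x D L_x^\top$. The sketch below assumes the orderings $\gamma = (\mu_S, \mu_T, \alpha, \beta, \theta_S, \theta_T, \omega_S, \omega_T)$ and $\mathbf{u}_i = (m_{Si}, m_{Ti}, a_i, b_i, t_{Si}, t_{Ti}, w_{Si}, w_{Ti})$; the function name and example values are hypothetical.

```python
def effect_distribution(x, gamma, D):
    """Scenario 2: mean and 2x2 covariance of (Delta_S(x), Delta_T(x)).

    Assumed 0-based ordering:
      gamma = (mu_S, mu_T, alpha, beta, theta_S, theta_T, omega_S, omega_T)
      u_i   = (m_S, m_T, a, b, t_S, t_T, w_S, w_T), so D is 8 x 8.
    """
    # Row 0 selects a + x * w_S (and alpha + x * omega_S); row 1 selects b + x * w_T.
    L = [[0, 0, 1, 0, 0, 0, x, 0],
         [0, 0, 0, 1, 0, 0, 0, x]]
    mean = [sum(L[r][k] * gamma[k] for k in range(8)) for r in range(2)]
    cov = [[sum(L[r][k] * D[k][l] * L[c][l] for k in range(8) for l in range(8))
            for c in range(2)]
           for r in range(2)]
    return mean, cov


# Hypothetical 8x8 D: identity plus a covariance between a_i and w_Si (d37 = 0.02).
D_demo = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
D_demo[2][6] = D_demo[6][2] = 0.02
gamma_demo = [0.0, 0.0, 0.5, 0.3, 0.1, 0.1, 0.2, -0.1]
for level in (0, 1):
    print(level, effect_distribution(level, gamma_demo, D_demo))
```

For $x = 0$ this reproduces the unadjusted mean $(\alpha, \beta)$ and the block $\begin{pmatrix} d_{33} & d_{34} \\ d_{34} & d_{44} \end{pmatrix}$; for $x = 1$ the diagonal entries match $d_{33} + 2d_{37} + d_{77}$ and $d_{44} + 2d_{48} + d_{88}$.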
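3.2.1. Scenario 2: $P_{13}$
Using the new joint distribution, the probability that the $(N+1)$th trial will yield treatment effects on the marker and outcome in the same direction among subjects with $X = x$ is given by
$$P_{13}(x) = \Phi_2\left(\mathbf{0}; -\mu_x, D_x\right) + \Phi_2\left(\mathbf{0}; \mu_x, D_x\right), \quad \mu_x = \begin{pmatrix} \alpha + \omega_S x \\ \beta + \omega_T x \end{pmatrix},$$
where $\Phi_k(\cdot\,; \mu, V)$ is the cumulative distribution function of a $k$-variate normal distribution with mean $\mu$ and variance $V$.
3.2.2. Scenario 2: Probability of Avoiding Dangerous Surrogate Paradox
Under the new joint distribution, the probability of avoiding the setting in which the treatment effect on the marker is beneficial but the treatment effect on the outcome is harmful, among subjects with $X = x$, is given by
$$1 - \Phi_2\left(\mathbf{0}; \begin{pmatrix} -(\alpha + \omega_S x) \\ \beta + \omega_T x \end{pmatrix}, J D_x J\right), \quad J = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},$$
where $J D_x J$ is $D_x$ with the sign of its off-diagonal element reversed.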
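3.2.3. Scenario 2: Same-Direction Probability When Partial Data Have Been Collected
Let $\mathbf{y}_{ij} = (S_{ij}, T_{ij})^\top$ constitute the surrogate marker and outcome for each subject,
$$\mathbf{X}_{ij} = \begin{pmatrix} 1 & 0 & Z_{ij} & 0 & X_{ij} & 0 & Z_{ij}X_{ij} & 0 \\ 0 & 1 & 0 & Z_{ij} & 0 & X_{ij} & 0 & Z_{ij}X_{ij} \end{pmatrix}$$
be the fixed effect matrix associated with the parameters $\gamma = (\mu_S, \mu_T, \alpha, \beta, \theta_S, \theta_T, \omega_S, \omega_T)^\top$, and let $\mathbf{W}_{ij} = \mathbf{X}_{ij}$ be the random effect matrix associated with $\mathbf{u}_i = (m_{Si}, m_{Ti}, a_i, b_i, t_{Si}, t_{Ti}, w_{Si}, w_{Ti})^\top$. Let $\mathbf{y}_N$, $\mathbf{X}_N$, and $\mathbf{W}_N$ represent the stacked elements of $\mathbf{y}_{Nj}$, $\mathbf{X}_{Nj}$, and $\mathbf{W}_{Nj}$, $j = 1, \ldots, n_N$.
Consider the vector of random effects $\mathbf{u}_N$ of the $N$th trial. The conditional distribution of $\mathbf{u}_N \mid \mathbf{y}_N$ can be found by considering the joint distribution of $\mathbf{u}_N \sim N(\mathbf{0}, D)$ and $\mathbf{y}_N \mid \mathbf{u}_N \sim N(\mathbf{X}_N \gamma + \mathbf{W}_N \mathbf{u}_N, R)$, with $R$ representing a $2n_N \times 2n_N$ block-diagonal matrix with diagonal blocks $\Sigma$, as before:
$$\mathbf{u}_N \mid \mathbf{y}_N \sim N\left(\mu_{u \mid y}, V_{u \mid y}\right),$$
where $\mu_{u \mid y} = D \mathbf{W}_N^\top (\mathbf{W}_N D \mathbf{W}_N^\top + R)^{-1} (\mathbf{y}_N - \mathbf{X}_N \gamma)$ and $V_{u \mid y} = D - D \mathbf{W}_N^\top (\mathbf{W}_N D \mathbf{W}_N^\top + R)^{-1} \mathbf{W}_N D$. From here, the measure of surrogate paradox risk among subjects with $X = x$ is given by
$$\Phi_2\left(\mathbf{0}; -\begin{pmatrix} \mu_1(x) \\ \mu_2(x) \end{pmatrix}, V_x\right) + \Phi_2\left(\mathbf{0}; \begin{pmatrix} \mu_1(x) \\ \mu_2(x) \end{pmatrix}, V_x\right),$$
where $\mu_1(x) = \hat{\alpha} + \hat{\omega}_S x + \hat{u}_3 + \hat{u}_7 x$, for $\hat{\alpha}$ and $\hat{\omega}_S$ corresponding to the third and seventh elements of the estimate of $\gamma$ and $\hat{u}_3$ and $\hat{u}_7$ corresponding to the third and seventh elements of the estimate of $\mu_{u \mid y}$; $\mu_2(x) = \hat{\beta} + \hat{\omega}_T x + \hat{u}_4 + \hat{u}_8 x$, for $\hat{\beta}$ and $\hat{\omega}_T$ corresponding to the fourth and eighth elements of the estimate of $\gamma$ and $\hat{u}_4$ and $\hat{u}_8$ corresponding to the fourth and eighth elements of the estimate of $\mu_{u \mid y}$; and
$$V_x = \begin{pmatrix} v_{33} + 2x v_{37} + x^2 v_{77} & v_{34} + x(v_{38} + v_{47}) + x^2 v_{78} \\ v_{34} + x(v_{38} + v_{47}) + x^2 v_{78} & v_{44} + 2x v_{48} + x^2 v_{88} \end{pmatrix},$$
with $v_{kl}$ corresponding to the $(k, l)$ element of the estimate of $V_{u \mid y}$. Similarly, we can derive the partial-data analogue of the measure in Section 3.2.2.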
3.2.4. Scenario 2: s Value
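In the fourth surrogate paradox measure, Elliott et al. consider the minimum observed beneficial treatment effect on a marker that reduces the probability of a harmful true treatment effect on the outcome to a preset level. When considering covariate subgroups, we can compute the observed surrogate effect for each covariate level and call it $\hat{\delta}_S(x)$:
$$\hat{\delta}_S(x) = \bar{S}_{1x} - \bar{S}_{0x}$$
represents the difference between the observed surrogate marker means under treatment and control within a fixed level $X = x$. Then, the joint distribution of the true treatment effect on the outcome and the observed treatment effect on the surrogate marker is given by
$$\begin{pmatrix} \Delta_{T,N+1}(x) \\ \hat{\delta}_S(x) \end{pmatrix} \sim N\left(\begin{pmatrix} \beta + \omega_T x \\ \alpha + \omega_S x \end{pmatrix}, \begin{pmatrix} (D_x)_{22} & (D_x)_{12} \\ (D_x)_{12} & (D_x)_{11} + \tau_x^2 \end{pmatrix}\right),$$
where $\Delta_{T,N+1}(x) = \beta + b_{N+1} + (\omega_T + w_{T,N+1})x$, $(D_x)_{kl}$ is the $(k, l)$ element of the covariance matrix $D_x$ of Section 3.2, $\tau_x^2 = \sigma_{SS}(1/n_{1x} + 1/n_{0x})$, and $n_{1x}$ and $n_{0x}$ are the numbers of treated and control subjects with $X = x$. So, the distribution of the true treatment effect on the outcome $\Delta_{T,N+1}(x)$ conditional on a given observed treatment effect $\hat{\delta}_S(x) = s$ within the group having $X = x$ is $N(\mu_{T \mid s, x}, \sigma^2_{T \mid s, x})$, where
$$\mu_{T \mid s, x} = \beta + \omega_T x + \frac{(D_x)_{12}}{(D_x)_{11} + \tau_x^2}\left(s - \alpha - \omega_S x\right)$$
and
$$\sigma^2_{T \mid s, x} = (D_x)_{22} - \frac{(D_x)_{12}^2}{(D_x)_{11} + \tau_x^2}.$$
The value of $s$ that ensures that the probability that $\Delta_{T,N+1}(x)$ is negative is less than or equal to a preset level $\epsilon$ satisfies
$$\Phi\left(-\frac{\mu_{T \mid s, x}}{\sigma_{T \mid s, x}}\right) \le \epsilon, \quad \text{i.e., for } (D_x)_{12} > 0, \quad s \ge \alpha + \omega_S x + \frac{(D_x)_{11} + \tau_x^2}{(D_x)_{12}}\left(\sigma_{T \mid s, x}\, z_{1-\epsilon} - \beta - \omega_T x\right),$$
where $z_{1-\epsilon}$ is the $(1-\epsilon)$ quantile of the standard normal distribution.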
4. Bayesian Estimation
In this section, we describe how to obtain estimates and inferences for the proposed measures using a Bayesian framework for scenario 2, which generalizes scenario 1 by allowing the covariate effects and the treatment-by-covariate interactions to differ by study. It is also possible to estimate the measures using a maximum likelihood (ML) or restricted maximum likelihood (REML) approach, but this is often computationally infeasible in practice without large sample sizes, so we focus on Bayesian estimation in this paper. Details of the ML/REML estimation approach are provided in Appendix A.
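We conduct the estimation using a fully Bayesian approach, with priors placed on $\gamma$, $D$, and $\Sigma$. We obtain draws of the parameters from a Markov chain Monte Carlo (MCMC) sampler and transform each draw into the corresponding values of the surrogate paradox measures, which yields draws from their posterior distributions. We place a multivariate normal prior on the fixed effects, $\gamma$, such that $\gamma \sim N(\gamma_0, \Gamma_0)$. We place Wishart priors on the inverses of the variance parameters $D$ and $\Sigma$ such that $D^{-1} \sim \mathcal{W}(\nu_D, \Psi_D)$ and $\Sigma^{-1} \sim \mathcal{W}(\nu_\Sigma, \Psi_\Sigma)$, where $\mathcal{W}(\nu, \Psi)$ denotes a Wishart distribution with $\nu$ degrees of freedom and scale matrix $\Psi$. Then, writing $\mathbf{y}_i$, $\mathbf{X}_i$, and $\mathbf{W}_i$ for the stacked data and design matrices of trial $i$, $R_i$ for its block-diagonal residual covariance, and $\mathbf{e}_{ij} = \mathbf{y}_{ij} - \mathbf{X}_{ij}\gamma - \mathbf{W}_{ij}\mathbf{u}_i$, the conditional posterior distributions of the parameters are
$$\mathbf{u}_i \mid \cdot \sim N\left(A_i \mathbf{W}_i^\top R_i^{-1} (\mathbf{y}_i - \mathbf{X}_i \gamma), A_i\right), \quad A_i = \left(\mathbf{W}_i^\top R_i^{-1} \mathbf{W}_i + D^{-1}\right)^{-1},$$
$$\gamma \mid \cdot \sim N\left(B\Big(\Gamma_0^{-1}\gamma_0 + \sum_{i=1}^{N} \mathbf{X}_i^\top R_i^{-1} (\mathbf{y}_i - \mathbf{W}_i \mathbf{u}_i)\Big), B\right), \quad B = \Big(\Gamma_0^{-1} + \sum_{i=1}^{N} \mathbf{X}_i^\top R_i^{-1} \mathbf{X}_i\Big)^{-1},$$
$$D^{-1} \mid \cdot \sim \mathcal{W}\Big(\nu_D + N, \big(\Psi_D^{-1} + \sum_{i=1}^{N} \mathbf{u}_i \mathbf{u}_i^\top\big)^{-1}\Big),$$
$$\Sigma^{-1} \mid \cdot \sim \mathcal{W}\Big(\nu_\Sigma + \sum_{i=1}^{N} n_i, \big(\Psi_\Sigma^{-1} + \sum_{i=1}^{N} \sum_{j=1}^{n_i} \mathbf{e}_{ij} \mathbf{e}_{ij}^\top\big)^{-1}\Big).$$
Using these conditional posterior distributions in a Gibbs sampling routine, we obtain draws from the posterior distributions of each of the parameters of interest.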
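The transformation from parameter draws to a posterior for a measure can be sketched as follows, using $P_{13}$ as the example. The draws here are hypothetical stand-ins for MCMC output (a real analysis would use the Gibbs draws of $\alpha$, $\beta$, and $D$, and, for scenario 2, the covariate-specific mean and $D_x$); the dictionary keys, function names, and the equal-tailed interval rule are our own choices.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def bvn_cdf(h, k, rho, n=2000):
    """P(U <= h, V <= k), standard bivariate normal with correlation rho (Simpson's rule)."""
    lo, hi = -10.0, min(h, 10.0)
    if hi <= lo:
        return 0.0
    r = math.sqrt(1.0 - rho * rho)
    step = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        u = lo + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * _STD.pdf(u) * _STD.cdf((k - rho * u) / r)
    return total * step / 3.0


def p13(alpha, beta, d33, d34, d44):
    """Same-direction probability for one parameter draw."""
    sa, sb = math.sqrt(d33), math.sqrt(d44)
    rho = d34 / (sa * sb)
    return bvn_cdf(alpha / sa, beta / sb, rho) + bvn_cdf(-alpha / sa, -beta / sb, rho)


def posterior_summary(draws):
    """Posterior mean and equal-tailed 95% interval of P13 from a list of parameter draws."""
    values = sorted(p13(**draw) for draw in draws)
    n = len(values)
    lower = values[int(0.025 * (n - 1))]
    upper = values[math.ceil(0.975 * (n - 1))]
    return sum(values) / n, lower, upper


# Hypothetical "posterior draws" standing in for Gibbs sampler output.
rng = random.Random(7)
draws = [dict(alpha=rng.gauss(0.5, 0.05), beta=rng.gauss(0.3, 0.05),
              d33=0.1, d34=0.05, d44=0.1) for _ in range(100)]
print(posterior_summary(draws))
```

Evaluating the measure draw by draw propagates the joint posterior uncertainty in the means and variance components into the credible interval, rather than plugging in a single point estimate.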
5. Testing
To determine which scenario best fits a particular analysis, we need some intuition as to whether the effect of a covariate X on the outcome differs by study and whether that effect also differs by treatment. Without such intuition, it may be of interest to test which scenario is most appropriate for the observed meta-analytic data. This amounts to jointly testing the null hypothesis that all of the variances and covariances associated with the covariate random effects are equal to zero.
Since variances are non-negative, testing whether they equal zero means testing a null hypothesis on the boundary of the parameter space, and the usual chi-square reference distribution of the likelihood ratio statistic under this null hypothesis is incorrect. Drikvandi et al. propose a test statistic based on the variance least squares estimator of the variance components, together with a permutation test to approximate its finite-sample distribution [12]. Under the Bayesian framework, Ariyo et al. recommend using the marginal deviance information criterion (DIC) or the marginal widely applicable information criterion (WAIC) to evaluate the need for random effects [13]; here, this means comparing the criterion value between the model including the covariate-related random effects and a model excluding all of them.
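For reference, the standard WAIC computation from a matrix of pointwise log-likelihood draws is sketched below. It assumes the log-likelihoods have already been evaluated for each posterior draw; for the marginal WAIC recommended by Ariyo et al., each unit would be a trial and its log-likelihood would integrate out the random effects, a step not shown here. The function name and the example matrices are hypothetical.

```python
import math


def waic(loglik):
    """WAIC from pointwise log-likelihoods, on the deviance scale.

    loglik[s][i] is log p(y_i | theta^(s)) for posterior draw s and unit i.
    Returns (waic, lppd, p_waic) with waic = -2 * (lppd - p_waic).
    """
    n_draws = len(loglik)
    if n_draws < 2:
        raise ValueError("need at least two posterior draws")
    lppd = 0.0
    p_waic = 0.0
    for i in range(len(loglik[0])):
        col = [row[i] for row in loglik]
        top = max(col)  # log-sum-exp for numerical stability
        lppd += top + math.log(sum(math.exp(v - top) for v in col) / n_draws)
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return -2.0 * (lppd - p_waic), lppd, p_waic


# Hypothetical log-likelihood draws (3 draws x 2 trials) for the two candidate models.
loglik_scenario1 = [[-10.2, -11.0], [-10.5, -10.8], [-10.1, -11.3]]
loglik_scenario2 = [[-9.1, -9.9], [-9.4, -9.6], [-9.0, -10.2]]
print(waic(loglik_scenario1)[0] - waic(loglik_scenario2)[0])  # positive values favor scenario 2
```

Lower WAIC indicates better expected predictive fit, so a positive difference (scenario 1 minus scenario 2) supports keeping the covariate-related random effects.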
6. Simulations
We perform simulations under several surrogacy scenarios to examine the properties of the proposed estimators as a function of a binary covariate X. We generate data under scenario 1 (the effect of X on the surrogate and outcome is constant across trials) and scenario 2 (the effect of X on the surrogate and outcome is not constant across trials). For scenario 1, we generate data under fixed values of the regression parameters and of the variance components D and Σ. For scenario 2, we generate data using the same parameters as in scenario 1, together with additional variance components for the covariate-related random effects, with all of the new off-diagonal components set to a common value. Under each scenario, we simulate 200 replicate datasets with 30 or 100 clusters, each of size 20, 50, or 500, representing 30 or 100 repeated trials of the same treatment, surrogate, and true endpoint combination, each with 20, 50, or 500 participants. In each trial, half of the participants are randomly assigned to treatment and half to control.
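A data-generating sketch for scenario 1 with a binary covariate is shown below. Because the parameter values used in the simulations are not reproduced here, every number in the example is hypothetical; the function names are our own, and the model is the scenario 1 extension of (1) and (2) with 1:1 randomization within each trial.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def mvn_draw(L, rng):
    """One draw from N(0, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def simulate_scenario1(n_trials, n_per_trial, gamma, D, Sigma, p_x=0.5, seed=1):
    """Rows (trial, z, x, S, T); gamma = (mu_S, mu_T, alpha, beta, theta_S, theta_T, omega_S, omega_T)."""
    rng = random.Random(seed)
    LD, LS = cholesky(D), cholesky(Sigma)
    mu_s, mu_t, alpha, beta, th_s, th_t, om_s, om_t = gamma
    rows = []
    for i in range(n_trials):
        m_s, m_t, a, b = mvn_draw(LD, rng)  # trial-level random effects
        arms = [1] * (n_per_trial // 2) + [0] * (n_per_trial - n_per_trial // 2)
        rng.shuffle(arms)                   # 1:1 randomization within the trial
        for z in arms:
            x = 1 if rng.random() < p_x else 0
            e_s, e_t = mvn_draw(LS, rng)    # correlated individual-level residuals
            s = mu_s + m_s + (alpha + a) * z + th_s * x + om_s * z * x + e_s
            t = mu_t + m_t + (beta + b) * z + th_t * x + om_t * z * x + e_t
            rows.append((i, z, x, s, t))
    return rows


# Hypothetical parameter values (not those used in the paper's simulations).
gamma_sim = (0.0, 0.0, 0.5, 0.3, 0.2, 0.1, 0.2, 0.1)
D_sim = [[0.2, 0.0, 0.0, 0.0], [0.0, 0.2, 0.0, 0.0],
         [0.0, 0.0, 0.1, 0.08], [0.0, 0.0, 0.08, 0.1]]
Sigma_sim = [[1.0, 0.5], [0.5, 1.0]]
data = simulate_scenario1(30, 20, gamma_sim, D_sim, Sigma_sim)
print(len(data), data[0])
```

Scenario 2 would add the random effects $t_{Si}, t_{Ti}, w_{Si}, w_{Ti}$ to the trial-level draw and use an $8 \times 8$ matrix $D$.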
We used a Gibbs sampling routine, as described in Section 4, with a multivariate normal prior for the fixed effects and Wishart priors for the inverses of the covariance matrices, specified in terms of q, the length of the associated vector of random effects. We sample from the derived conditional posterior distributions to obtain draws of the proposed estimators.
Table 1 and Table 2 contain the point estimates, standard errors, bias, and coverage rates for the proposed measures and s, with 30 and 100 trials, respectively. The true value of s assumes an equal distribution of subjects across the treatment and covariate categories. To estimate the partial-data measure, we considered the final study to have only half as much data as the other trials. Although it is also possible to conduct this analysis with an ML/REML estimation approach, as described in Appendix A, we ran into computational issues when estimating the large number of random effects with reasonable sample sizes and therefore present simulation results only for the Bayesian approach.
We observed minimal bias in estimating the three probability measures with either 30 or 100 trials, each of size 20, 50, or 500 subjects. For s, however, fewer trials and fewer subjects per trial resulted in unstable estimates with very large bias and variance. The observed coverage rates of the credible intervals were below the nominal level for some of the probability measures in both scenarios, demonstrating the need for large numbers of trials and of subjects per trial when the goal is to identify the risk of surrogate paradox in subpopulations.
As a sensitivity analysis, we also considered two simulation settings with non-normally distributed data to assess the robustness of the proposed method to model misspecification. We generated data using a t distribution with 15 degrees of freedom, as well as a skew normal distribution centered at 0 with shape parameter equal to 0.1 times the location and scale parameters. The data generated under the t distribution allow us to assess whether the method is robust when the normality assumption is violated in the tails of the distribution [14]. The data generated under the skew normal distribution represent a situation in which the data are asymmetric, as in similar prior sensitivity analyses [15]. For each sensitivity analysis, we generated 30 trials, each with 50 subjects, and considered the bias, standard error, and coverage of $P_{13}$ and of the probability of avoiding dangerous surrogate paradox. The true value of each of these quantities was estimated empirically by taking one million draws of the trial-level treatment effects on the surrogate and true endpoints and computing the proportion of draws that fell into each of the relevant quadrants. The results of the sensitivity analysis are shown in Table 3. Under these deviations from normality, we observed small increases in bias and standard error but still maintained high coverage rates. As expected, coverage rates decreased as the number of parameters increased in scenario 2.
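The empirical "true values" described above can be approximated by Monte Carlo quadrant counting. The sketch below is illustrative only: the bivariate t sampler (correlated normals divided by a shared $\sqrt{\chi^2_{15}/15}$) and all of its location, scale, and correlation values are hypothetical, and 200,000 draws are used instead of one million to keep the example quick.

```python
import math
import random


def quadrant_probabilities(sample_effects, n_draws, seed=2024):
    """Monte Carlo P(Q1), ..., P(Q4) for draws (delta_s, delta_t) = sample_effects(rng).

    Q1: both effects >= 0; Q2: delta_s < 0 <= delta_t; Q3: both < 0;
    Q4: delta_t < 0 <= delta_s (axis ties have probability zero for continuous draws).
    """
    rng = random.Random(seed)
    counts = [0, 0, 0, 0]
    for _ in range(n_draws):
        ds, dt = sample_effects(rng)
        if ds >= 0 and dt >= 0:
            counts[0] += 1
        elif ds < 0 and dt >= 0:
            counts[1] += 1
        elif ds < 0:
            counts[2] += 1
        else:
            counts[3] += 1
    return [c / n_draws for c in counts]


def t_effects(rng, df=15, loc=(0.5, 0.3), scale=(0.3, 0.3), rho=0.8):
    """Hypothetical bivariate t draw: correlated normals divided by a shared sqrt(chi2_df / df)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    w = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)  # Gamma(df/2, scale 2) is chi-square(df)
    return loc[0] + scale[0] * z1 / w, loc[1] + scale[1] * z2 / w


q = quadrant_probabilities(t_effects, n_draws=200_000)
p13_true = q[0] + q[2]      # same-direction probability
p_avoid_true = 1.0 - q[3]   # probability of avoiding the dangerous quadrant
print(q, p13_true, p_avoid_true)
```

With one million draws, as in the sensitivity analysis, the Monte Carlo standard error of each quadrant probability is at most about 0.0005.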
7. Applications
7.1. Collaborative Initial Glaucoma Treatment Study
We apply the proposed method to data from the Collaborative Initial Glaucoma Treatment Study (CIGTS) [
16]. The CIGTS trial was a multicenter randomized clinical trial that contrasted initial surgical therapy versus initial medical therapy to treat glaucoma, with reduction in intraocular pressure (IOP) as one of its outcome measures. A total of 607 patients were enrolled in the study, and 307 were randomized to the drug arm. IOP was recorded in mmHg at baseline, 3 months, 6 months, and every 6 months thereafter. We consider the measurement of IOP at 18 months after beginning treatment as a surrogate for the true endpoint of interest: IOP at 96 months. We consider the 14 centers at which the study was conducted to be the trial-level replicates. Missing data were imputed using single imputation with a linear mixed model with a random effect for trial, a quadratic trend for time, an effect for treatment, and an interaction between time and treatment. The estimates of the between-trial covariance matrix, D, are not positive definite, so only the results (estimates and 95% credible intervals(CIs)) from the Bayesian estimation procedure are presented. As in the simulation study, we used a Gibbs sampling routine, as described in
Section 4, with a multivariate normal prior for the fixed effects, such that
, and Wishart priors for the inverse of the covariance matrices of the form
, where
q is the length of the associated vector of covariance effects. The
measure of surrogacy is 0.49, indicating a moderate-quality surrogate by the Buyse criteria [
6].
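A symmetric covariance estimate such as D is positive definite exactly when it admits a Cholesky factorization with strictly positive pivots. This gives a simple diagnostic for the problem described above. The sketch below uses plain Python lists and assumes a symmetric input. The tolerance is an arbitrary illustrative choice. In practice, `numpy.linalg.cholesky` serves the same purpose: it raises `LinAlgError` when the factorization fails.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Return True if the (assumed symmetric) matrix has a Cholesky
    factorization with every pivot strictly greater than tol."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:  # zero or negative pivot: not positive definite
                    return False
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True


print(is_positive_definite([[2.0, 1.0], [1.0, 2.0]]))  # positive definite
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))  # has a negative eigenvalue
```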
To illustrate the proposed methods, we consider two covariates, sex (female, male) and age (<60, ≥60), and compute
and
for each variable category under both proposed scenarios. The results are shown in
Table 4.
In scenario 1, we exclude all of the random effects for the included covariates. Overall, the risk of the surrogate paradox when using early IOP as a surrogate for later IOP in this trial is small, since the 95% credible intervals of the measures are close to 1. This conclusion changes little when the overall estimates are compared with the covariate-adjusted estimates, so there is no evidence of a difference in the risk of the surrogate paradox by age or sex. In scenario 2, we estimate all of the random effects for the included covariates, allowing both the covariate effect and the covariate-by-treatment interaction to vary by study center. In this scenario, we observe some differences in the risk of the surrogate paradox by subgroup. Notably, males and people aged 60 or over appear to be at higher risk of experiencing the surrogate paradox in a new trial than females and people under 60, respectively. However, the difference in their risk of the dangerous surrogate paradox is minimal. In both scenarios, the measure of s is too unstable to provide useful inference.
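Subgroup summaries of this kind can be read directly from MCMC output. The following is a minimal sketch. It assumes that posterior draws of a paradox measure for each subgroup come from the same chain, so that draws can be paired by iteration. The function names are hypothetical and not taken from the authors' code. Pairing the draws yields the posterior of the between-group difference, which can complement a check of whether two credible intervals overlap.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_draws(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC draws."""
    s = sorted(draws)
    alpha = 1 - level
    return sum(draws) / len(draws), quantile(s, alpha / 2), quantile(s, 1 - alpha / 2)


def compare_subgroups(draws_a, draws_b, level=0.95):
    """Posterior summary of the difference between two subgroup measures,
    pairing draws that come from the same MCMC iteration."""
    diffs = [a - b for a, b in zip(draws_a, draws_b)]
    return summarize_draws(diffs, level)
```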
Using WAIC for model selection, the WAIC difference between the scenario 1 and scenario 2 models is 380 for the model with sex as a covariate and 815 for the model with age as a covariate. We conclude that the models including the additional random effects (scenario 2) fit these data better. The data for this trial are not publicly available.
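For reference, WAIC can be computed from a matrix of pointwise log-likelihood values evaluated at posterior draws. The standard definition is WAIC = −2(lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC is the sum over observations of the posterior variance of the pointwise log-likelihood. This is a generic sketch on the deviance scale, where lower values indicate better fit; it is not the authors' implementation. How the pointwise likelihood is defined for the bivariate mixed model (for example, per subject or per trial) is left as an assumption.

```python
import math


def waic(log_lik):
    """WAIC on the deviance scale from log_lik[s][i] = log p(y_i | theta_s),
    with at least two posterior draws (rows) and n observations (columns)."""
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        col = [log_lik[s][i] for s in range(n_draws)]
        m = max(col)
        # log of the posterior-mean likelihood, computed stably via log-sum-exp
        lppd += m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # posterior sample variance of the pointwise log-likelihood
        mean = sum(col) / n_draws
        p_waic += sum((v - mean) ** 2 for v in col) / (n_draws - 1)
    return -2.0 * (lppd - p_waic)
```

The model with the smaller WAIC is preferred, so a WAIC difference is read as the amount by which the better-fitting model improves on the other.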
7.2. Trial of Preventing Hypertension
Our second illustrative example comes from the Trial of Preventing Hypertension (TROPHY) [
17]. This multicenter randomized trial compared the effects of two years of treatment with candesartan versus the standard of care on the incidence of hypertension in patients with prehypertension. Blood pressure and hypertension status were collected at baseline, 1 month and 3 months post-randomization, and then every 3 months for a total of two years of follow-up. To illustrate our proposed methods, we consider the average of systolic and diastolic pressure at 1 month as a surrogate for the average of systolic and diastolic pressure at 12 months. Although the primary endpoint of the original trial was a binary indicator of developing hypertension, we used the average of systolic and diastolic pressure at 12 months as the true endpoint, because our method has so far been developed only for normally distributed outcomes. Patients who developed hypertension were switched to a new treatment regimen, resulting in some missing data in both the surrogate measured at 1 month and the true endpoint measured at 12 months. These missing data were imputed using a model that was stratified by treatment and gender and included the following baseline covariates: age, race, weight, body mass index, systolic blood pressure, diastolic blood pressure, total cholesterol, high-density lipoprotein cholesterol (HDL), low-density lipoprotein cholesterol (LDL), HDL:LDL ratio, triglycerides, fasting glucose, total insulin, and creatinine. For missing outcome values at 12 months, the imputation model also included the blood pressure measurements up to the 12th month. We treat the 69 centers at which the study was conducted as the trial-level replicates. A total of 772 patients were included in the original analysis. After removing centers with patients in only one treatment arm, 62 centers and 764 patients remained, of whom 389 received the treatment. The remaining centers ranged in size from 2 to 46 patients.
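The center-exclusion step described above is a simple filter. This sketch assumes a hypothetical record layout of (center, arm) pairs, with arm coded 0 for control and 1 for treatment. In the actual analysis, each record would also carry the surrogate and true-endpoint values.

```python
from collections import defaultdict


def drop_single_arm_centers(records):
    """Keep only records from centers that have patients in both arms.

    records: iterable of (center_id, arm) pairs, arm coded 0 (control) or 1 (treatment).
    """
    records = list(records)
    arms_by_center = defaultdict(set)
    for center, arm in records:
        arms_by_center[center].add(arm)
    keep = {c for c, arms in arms_by_center.items() if arms == {0, 1}}
    return [r for r in records if r[0] in keep]


# Center "B" has only treated patients, so its records are dropped.
print(drop_single_arm_centers([("A", 0), ("A", 1), ("B", 1), ("C", 0), ("C", 1)]))
```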
When the REML estimation method was applied, the estimated covariance matrix was not positive definite (likely because of the small sample sizes at some centers), so we present only the results (estimates and 95% credible intervals (CIs)) from the Bayesian estimation procedure.
To illustrate the proposed methods, we consider two covariates, sex (female, male) and age (<50, ≥50), and compute
and
for each variable category under both proposed scenarios. The results are shown in
Table 5.
The results indicate that, overall, there is very little risk of the surrogate paradox when the effect of candesartan on the average of systolic and diastolic blood pressure at 1 month is used as a surrogate for its effect on the average of systolic and diastolic blood pressure at 12 months. Although there are minor differences in the risk of the surrogate paradox (as measured by both paradox measures) by sex and age, the credible intervals overlap between the groups, indicating no significant difference in their risk of the surrogate paradox. As in the previous example, the measure of s is too unstable to provide useful inference. This is consistent with our simulation study, which indicated that a large number of trials would be required to obtain useful inference for this quantity.
Using WAIC for model selection, the WAIC difference between the scenario 1 and scenario 2 models is 120 for the model with sex as a covariate and 83 for the model with age as a covariate. We conclude that the models including the additional random effects (scenario 2) fit these data better. However, the results from the two scenarios are qualitatively similar, so a simpler model may be preferred. The data for this trial are not publicly available.
8. Discussion
Surrogate outcomes are commonly used in clinical trials, and their prevalence has led to the development of innovative trial designs that aim to efficiently use the additional information provided by surrogate outcomes [
18,
19,
20]. Despite the valuable additional information that surrogate outcomes provide, their use also comes with risk. Evaluating the quality of a chosen surrogate to prevent the surrogate paradox should be an important step in both the design and analysis of clinical trials.
There are several existing approaches for evaluating surrogate outcomes, but some surrogates that appear "good" under these methods may still be subject to the "surrogate paradox", in which the treatment has a positive effect on the surrogate endpoint but a negative effect on the true endpoint. The meta-analytic causal association approach to surrogate validation is particularly useful for assessing the risk of the surrogate paradox. In this paper, we develop methods to measure the risk of the surrogate paradox in subpopulations when data are available from multiple trials of similar treatments with the same surrogate and true endpoints. Measures of surrogate paradox risk can help prevent the surrogate paradox from occurring in new trials and thereby protect the health of study participants.
Incorporating covariate information can provide valuable insights into the mechanism of the surrogate paradox and identify groups that are particularly vulnerable to the paradox. This additional information can tell us about the transferability of surrogates from one trial to the next, depending on their study population. It can also help assess the risk of using a proposed surrogate in a new trial depending on the demographic distribution of the new study population. Researchers can incorporate their understanding of whether certain subpopulations are at a higher risk of experiencing the surrogate paradox into the design of new clinical trials of similar treatments that plan to use the same surrogate and true endpoints.
Both our simulations and our examples explored whether the surrogate paradox risk varied with a single scalar covariate. In principle, this could easily be extended to a multiple-covariate setting. In practice, however, this would typically require a fairly large number of trials to obtain stable estimates, especially in the "scenario 2" setting, where both the fixed and random effects are associated with multiple covariates. Our simulation study showed that the estimation of some measures can be unstable when there is a small number of trials and subjects. We also considered simulations under mild deviations from normality and retained relatively high coverage rates. The proposed method derives the probabilities of interest assuming normally distributed variables, an assumption that may not hold in practice. Future work will consider further violations of the normality assumption, as well as how to account for them when estimating the risk of the surrogate paradox.
This work has the potential to be extended to non-normal surrogate and true endpoints. By using a copula model instead of the bivariate normal assumption in this paper, we may be able to consider a wider range of distributions for the surrogate and true endpoints, including binary or time-to-event distributions. We may also be able to consider the situation in which the proposed surrogate and true endpoints have differing distributional forms (e.g., an indicator of hypertension as a surrogate for time to cardiac death). Another potential extension is to apply meta-analytic methods to estimate the risk of the surrogate paradox when individual-level data from the prior studies are not available. For example, we might have only the parameter estimates from a series of published papers on the same treatment and endpoint combination and want to use them to estimate the risk of the surrogate paradox in a newly designed study.
Finally, we note that while we focused on conditional surrogacy paradox estimates (interactions with covariates), this method can also be used to address non-normality in the multiple-trial setting. In that case, the conditional surrogacy paradox measures are averaged to obtain marginal results, using the sample distribution of the covariates to approximate the population density. Thus,
variance estimates could be obtained by bootstrapping for the REML approaches or via posterior distributions of draws of
obtained by averaging the draws of
.
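The averaging step can be sketched as follows. The sketch assumes a discrete covariate and posterior draws of the conditional measure stored per MCMC iteration; this data layout is hypothetical. The empirical covariate frequencies serve as weights. Averaging within each draw yields posterior draws of the marginal measure, from which an interval can be read off in the usual way. For REML fits, the same averaging would be repeated within each bootstrap resample.

```python
def marginal_draws(conditional_draws, covariate_values):
    """Average conditional paradox measures over the sample covariate distribution.

    conditional_draws: list over MCMC iterations; each element maps a covariate
        value (e.g., 0/1 for a binary covariate) to the conditional measure in that draw.
    covariate_values: observed covariate values, one per subject, used as an
        empirical stand-in for the population covariate density.
    Returns one marginal draw per MCMC iteration.
    """
    n = len(covariate_values)
    counts = {}
    for x in covariate_values:
        counts[x] = counts.get(x, 0) + 1
    weights = {x: c / n for x, c in counts.items()}  # empirical P(X = x)
    return [sum(w * draw[x] for x, w in weights.items()) for draw in conditional_draws]
```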
The code for implementing these methods is available at github.com/fatemashafie.
Author Contributions
Conceptualization, F.S.K. and M.R.E.; methodology, F.S.K. and M.R.E.; software, F.S.K.; validation, F.S.K.; formal analysis, F.S.K.; investigation, F.S.K. and M.R.E.; resources, J.M.G.T. and N.K.; data curation, N.K.; writing—original draft preparation, F.S.K.; writing—review and editing, F.S.K., M.R.E. and J.M.G.T.; visualization, F.S.K., M.R.E. and J.M.G.T.; supervision, M.R.E. and J.M.G.T.; project administration, F.S.K., M.R.E. and J.M.G.T.; funding acquisition, J.M.G.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded in part by the US National Institutes of Health grant CA83654 and by the National Cancer Institute Award Number T32CA083654.
Institutional Review Board Statement
Ethical review and approval were waived for this study since we conducted secondary analyses on previously collected and de-identified data.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data used in this study are not publicly available. Code for implementing the methods is available at
github.com/fatemashafie.
Conflicts of Interest
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A. Maximum Likelihood Estimation
To estimate
, we can use the best linear unbiased estimators from a linear mixed model using either maximum likelihood (ML) or restricted maximum likelihood (REML) estimation. Let
constitute the surrogate marker and outcome for each subject,
be the fixed effect matrix associated with the parameters
, and let
be the random effect matrix associated with
. Let
,
, and
represent the stacked elements of
,
, and
. Then, consider the model:
where
,
,
is the number of observations in the
ith trial, and ⊗ is the Kronecker product operator. Then,
,
,
, and
are the third, fourth, seventh, and eighth elements of the ML or REML estimator of
. Similarly, we can obtain estimates of the needed variance components from the ML or REML estimators of
D. Then, we have
where
Similarly, we can estimate .
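The Kronecker structure can be illustrated concretely. Suppose that subjects within a trial contribute independent (surrogate, outcome) pairs sharing a 2 × 2 residual covariance Σ. This is an assumption on our part, although it is a common form for stacked bivariate outcomes. Under it, the within-trial residual covariance of the stacked vector is I_{n_i} ⊗ Σ. The sketch builds that block-diagonal matrix with plain Python lists; the function names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def within_trial_covariance(n_i, sigma):
    """Block-diagonal residual covariance I_{n_i} (x) Sigma for n_i subjects,
    each contributing a (surrogate, outcome) pair with 2 x 2 covariance Sigma."""
    return kron(identity(n_i), sigma)


for row in within_trial_covariance(2, [[1.0, 0.3], [0.3, 2.0]]):
    print(row)
```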
References
- Prentice, R.L. Surrogate endpoints in clinical trials: Definition and operational criteria. Stat. Med. 1989, 8, 431–440.
- Frangakis, C.E.; Rubin, D.B. Principal Stratification in Causal Inference. Biometrics 2002, 58, 21–29.
- Joffe, M.M.; Greene, T. Related Causal Frameworks for Surrogate Outcomes. Biometrics 2009, 65, 530–538.
- Lauritzen, S.L.; Aalen, O.O.; Rubin, D.B.; Arjas, E. Discussion on Causality [with Reply]. Scand. J. Stat. 2004, 31, 189–201.
- Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; John Wiley & Sons: Hoboken, NJ, USA, 1987.
- Buyse, M.; Molenberghs, G.; Burzykowski, T.; Renard, D.; Geys, H. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics 2000, 1, 49–67.
- Fleming, T.R.; DeMets, D.L. Surrogate End Points in Clinical Trials: Are We Being Misled? Ann. Intern. Med. 1996, 125, 605–613.
- Chen, H.; Geng, Z.; Jia, J. Criteria for surrogate end points. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2007, 69, 919–932.
- VanderWeele, T.J. Surrogate Measures and Consistent Surrogates. Biometrics 2013, 69, 561–565.
- Elliott, M.R.; Conlon, A.S.; Li, Y.; Kaciroti, N.; Taylor, J.M. Surrogacy marker paradox measures in meta-analytic settings. Biostatistics 2015, 16, 400–412.
- Buyse, M.; Molenberghs, G. Criteria for the Validation of Surrogate Endpoints in Randomized Experiments. Biometrics 1998, 54, 1014–1029.
- Drikvandi, R.; Verbeke, G.; Khodadadi, A.; Partovi Nia, V. Testing multiple variance components in linear mixed-effects models. Biostatistics 2013, 14, 144–159.
- Ariyo, O.; Quintero, A.; Muñoz, J.; Verbeke, G.; Lesaffre, E. Bayesian model selection in linear mixed models for longitudinal data. J. Appl. Stat. 2020, 47, 890–913.
- McCulloch, C.E.; Neuhaus, J.M. Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics 2011, 67, 270–279.
- Sheng, Y.; Yang, C.; Curhan, S.; Curhan, G.; Wang, M. Analytical methods for correlated data arising from multicenter hearing studies. Stat. Med. 2022, 41, 5335–5348.
- Musch, D.C.; Lichter, P.R.; Guire, K.E.; Standardi, C.L. The collaborative initial glaucoma treatment study: Study design, methods, and baseline characteristics of enrolled patients. Ophthalmology 1999, 106, 653–662.
- Julius, S.; Nesbitt, S.D.; Egan, B.M.; Weber, M.A.; Michelson, E.L.; Kaciroti, N.; Black, H.R.; Grimm, R.H.; Messerli, F.H.; Oparil, S.; et al. Feasibility of Treating Prehypertension with an Angiotensin-Receptor Blocker. N. Engl. J. Med. 2006, 354, 1685–1697.
- Lawless, J.F.; Kalbfleisch, J.D.; Wild, C.J. Semiparametric methods for response-selective and missing data problems in regression. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1999, 61, 413–438.
- Chatterjee, N.; Chen, Y.H.; Breslow, N.E. A Pseudoscore Estimator for Regression Problems with Two-Phase Sampling. J. Am. Stat. Assoc. 2003, 98, 158–168.
- Yang, C.; Diao, L.; Cook, R.J. Adaptive response-dependent two-phase designs: Some results on robustness and efficiency. Stat. Med. 2022, 41, 4403–4425.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).