# Biomarker-Guided Non-Adaptive Trial Designs in Phase II and Phase III: A Methodological Review

Correction published on 7 May 2018, see *J. Pers. Med.* **2018**, *8*(2), 17.

^{1} MRC North West Hub for Trials Methodology Research, Liverpool L69 3GL, UK

^{2} Department of Biostatistics, Institute of Translational Medicine, University of Liverpool, Liverpool L69 3GL, UK

^{*} Author to whom correspondence should be addressed.

Academic Editor: Stephen B. Liggett

Received: 8 October 2016 / Revised: 6 December 2016 / Accepted: 11 January 2017 / Published: 25 January 2017

Biomarker-guided treatment is a rapidly developing area of medicine, where treatment choice is personalised according to one or more of an individual’s biomarker measurements. A number of biomarker-guided trial designs have been proposed in the past decade, including both adaptive and non-adaptive trial designs which test the effectiveness of a biomarker-guided approach to treatment with the aim of improving patient health. A better understanding of them is needed as challenges occur both in terms of trial design and analysis. We have undertaken a comprehensive literature review based on an in-depth search strategy with a view to providing the research community with clarity in definition, methodology and terminology of the various biomarker-guided trial designs (both adaptive and non-adaptive designs) from a total of 211 included papers. In the present paper, we focus on non-adaptive biomarker-guided trial designs for which we have identified five distinct main types mentioned in 100 papers. We have graphically displayed each non-adaptive trial design and provided an in-depth overview of their key characteristics. Substantial variability has been observed in terms of how trial designs are described and particularly in the terminology used by different authors. Our comprehensive review provides guidance for those designing biomarker-guided trials.

The rapidly developing field of ‘personalized medicine’ [1], also known as ‘individualized medicine’, ‘stratified medicine’, or ‘precision medicine’ is allowing scientists to treat patients by providing them with a specific regimen according to their individual demographic, genomic or biological characteristics. The latter two aforementioned characteristics are collectively known as biomarkers [2]. The terms ‘personalized medicine’ and ‘individualized medicine’ often create confusion in literature, as in reality, the objective of this approach is to identify demographic- or biomarker-defined subgroups. Thus, as it still remains a population and not an individualized approach, the terms ‘stratified’ or ‘precision’ medicine are often considered to be more accurate. The National Institutes of Health Biomarkers Definitions Working Group [3] defined a biomarker to be “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” [1,4,5,6,7]. Biomarkers related to clinical outcome which are measured before treatment commences can be classified as either prognostic or predictive biomarkers. Prognostic biomarkers provide information regarding the likely progression of a disease without taking into account any specific treatment, whilst predictive biomarkers provide information about the patient’s outcome given a certain treatment, i.e., their likely response to the treatment [4,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34]. 
Prior to utilizing a patient’s biomarker information in clinical practice, it is necessary that the biomarkers have been robustly tested in terms of analytical validity (the results of testing a specific biomarker or biomarkers can be trusted), clinical validity (the results obtained from the test correlate with important clinical information) and clinical utility (the test will be useful in ameliorating patients’ health) [9,13,19,25].

A number of phase II and phase III trial designs have been proposed for testing the clinical utility of prognostic biomarkers. Due to the large amount of literature in this field, we have split our review into two broad categories, i.e., the biomarker-guided non-adaptive trial designs which are presented in the current study and the biomarker-guided adaptive trial designs. The latter are extensively discussed in our published paper “Biomarker-Guided Adaptive Trial Designs in Phase II and Phase III: a Methodological Review”, Antoniou et al., 2016 [35].

In this review we aim to describe the different non-adaptive biomarker-guided trial designs proposed in the literature so far, which can be either randomized or non-randomized (e.g., single-arm designs), and to report on the potential advantages and weaknesses of each. Some designs discussed in the current paper, although not adaptive in the traditional sense, contain an adaptive element; these designs were not included in the paper by Antoniou et al., 2016 [35], which describes and discusses adaptive designs.

We undertook a search of the MEDLINE (Ovid) database, restricted to papers published in the English language within the previous ten years, aiming to identify articles which describe and discuss biomarker-guided trial designs. Traditional trial designs, i.e., designs which do not incorporate biomarkers to aid treatment decisions, were also part of our search strategy in order to help us identify and distinguish any potential reference to biomarker-guided designs, as finding appropriate keywords for biomarker-guided designs in the MEDLINE database was challenging. Furthermore, the restriction to papers published within the past decade was made not only because of the large amount of literature in this field, but also to identify the most recent trial designs. Two separate strategies, as illustrated in Figure 1, were used to identify relevant articles, and the keywords utilized in the search are presented in S1 Keywords. Our initial search resulted in 9412 and 5024 relevant titles for biomarker-guided clinical trial designs and traditional trial designs, respectively. From the 9412 papers, 104 articles were included based on their title and abstract. From the 5024 papers, 40 articles were included based on their title and abstract, after removing inaccessible articles and those already identified in the search for biomarker-guided trial designs. An additional 67 eligible papers were identified from searching both the reference lists of included articles and the internet (the internet searches were performed using the same keywords as those for the Ovid strategy), making a total of 211 included papers. Of these 211 included papers, biomarker-guided non-adaptive trial designs were referred to in 100 papers; the 107 papers referring to biomarker-guided adaptive trial designs were reviewed in our published paper, Antoniou et al., 2016 [35].
Among the 211 papers, some refer to both adaptive and non-adaptive designs. Articles from references and internet searches which did not provide further information on each broad category of biomarker-guided designs were not included. Cited books, web pages for actual trials and papers published before 2005 are also not included in these numbers. For each included paper, the following details were extracted: definition of the trial design(s) referred to in the paper, how patients were screened and/or randomized based on their biomarker status, the treatment groups to which patients were randomized, as well as other key information relating to the trial design and methodology, including advantages and limitations. Where reference was made in the included papers to an actual trial which had adopted a particular biomarker-guided non-adaptive trial design, the clinical field with which the trial was associated was also recorded. However, a review of all implementations of the different trial designs in practice is beyond the scope of this paper and is an area for potential future work. Therefore, it is important to highlight that even where no evidence of the implementation of a particular design was found in the papers included in our review, the design may well be currently in use in ongoing trials.

In our review, we identified five main biomarker-guided non-adaptive trial designs, namely: (i) single-arm designs; (ii) enrichment designs; (iii) randomize-all designs; (iv) biomarker-strategy designs and (v) other designs. Within each main design several subtypes and extensions were also identified. Graphical representations of the main designs and subtypes are given in Figures 2-16. Graphical representations of the extensions are given in Figures S1-S4 included in File S1-S4. The characteristics and methodology of the main design types and subtypes are discussed below and are summarized in Table 1, whilst information on the extensions is discussed in File S1-S4. Furthermore, sample size formulae for each biomarker-guided design are provided in Table 2.

Single arm designs were referred to in seven papers (7%). In the context of biomarkers, these designs (Phase II designs) include the whole study population to which the same experimental treatment is prescribed, without taking into consideration biomarker status.

Enrichment designs are described in 71 papers (71%), either in Phase II or Phase III clinical trials, and involve randomizing only the biomarker-positive patients and comparing the experimental treatment versus the standard treatment only in this particular biomarker-defined subgroup.

$$E\left({D}_{i,enrichment}\right)=\frac{nT{\lambda}_{i}}{2\left({\lambda}_{i}+{\phi}_{i}\right)}\left\{1-\frac{{e}^{-\left({\lambda}_{i}+{\phi}_{i}\right)t}}{\left({\lambda}_{i}+{\phi}_{i}\right)T}[1-{e}^{-\left({\lambda}_{i}+{\phi}_{i}\right)T}]\right\},$$

$${D}_{enrichment}=4{\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2},$$

$${D}_{enrichment}=\frac{{\left(R+1\right)}^{2}}{R}{\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2}.$$
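The two event-count formulas above can be evaluated numerically. The following is a minimal sketch, assuming that $\theta_1$ is the hazard ratio to be detected in the biomarker-positive subgroup and that $R$ is the allocation ratio (experimental to control); the function name `events_enrichment` and the example values are hypothetical and only serve as an illustration.

```python
from math import ceil, log
from statistics import NormalDist


def events_enrichment(hazard_ratio, alpha=0.05, power=0.80, alloc_ratio=1.0):
    """Total number of events, D_enrichment, for allocation ratio R."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # upper alpha/2 point of the standard normal
    z_beta = z(power)           # upper beta point, with beta = 1 - power
    scale = (alloc_ratio + 1) ** 2 / alloc_ratio  # equals 4 when R = 1
    return scale * ((z_alpha + z_beta) / log(hazard_ratio)) ** 2


# Hypothetical example: hazard ratio 0.5, two-sided alpha 0.05, 80% power.
d_equal = events_enrichment(0.5)                  # 1:1 allocation
d_2to1 = events_enrichment(0.5, alloc_ratio=2.0)  # 2:1 allocation
print(ceil(d_equal), ceil(d_2to1))
```

With $R=1$ the factor ${(R+1)}^{2}/R$ equals 4, so the second formula reduces to the first; unequal allocation always increases the required number of events.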

In a survival study, the total number of patients to be enrolled in the two treatment groups (experimental and control) in order to yield the aforementioned total number of events depends on the probability of an event over the duration of the study [120]. Consequently, the actual number of patients required in a survival study can be given by

$${N}_{enrichment}=\frac{{D}_{enrichment}}{\mathrm{Pr}\left(event\right)},$$

where $\mathrm{Pr}\left(event\right)$ is the probability of observing an event in the two treatment groups in the study and ${D}_{enrichment}$ is the required total number of events. $\mathrm{Pr}\left(event\right)$ in a survival study can be given by

$$\mathrm{Pr}\left(event\right)={\pi}_{A}P{r}_{A}\left(event\right)+{\pi}_{B}P{r}_{B}\left(event\right),$$

where

$${\pi}_{A}=\frac{R}{R+1}\text{ }\mathrm{and}\text{ }{\pi}_{B}=\frac{1}{R+1}$$

are the proportions of patients who are randomized to the experimental and control treatment groups respectively, and $P{r}_{A}\left(event\right)$ and $P{r}_{B}\left(event\right)$ are the probabilities of an event in the experimental and control arms respectively [121]. Freedman, 1982 [122] provided an approximation of the probability of an event for each treatment group assuming equal follow-up for all patients, and thus simultaneous accrual of all patients, whereas Schoenfeld, 1983 [123] provided a more accurate approximation of the expected event rate. More precisely, according to Freedman’s approach,

$$P{r}_{i}\left(event\right)\approx 1-{S}_{i}\left(\tau \right),$$

and according to Schoenfeld’s approach,

$$P{r}_{i}\left(event\right)\approx 1-\left\{{S}_{i}\left(\tau \right)+4{S}_{i}\left(T/2+\tau \right)+{S}_{i}\left(T+\tau \right)\right\}/6,$$

where $i$ denotes the corresponding treatment group (either experimental or control), ${S}_{i}$ denotes the survival function in group $i$, $\tau $ denotes the follow-up time and $T$ the accrual period, $T/2+\tau $ denotes the median follow-up time and $T+\tau $ denotes the total duration of the study. Another approximation of the probability of an event could be

$$P{r}_{i}\left(event\right)\approx 1-{S}_{i}\left(T/2+\tau \right),$$

considering that the survival probability can be approximated as the probability that a patient survives past the median follow-up time (i.e., $T/2+\tau $) [121].
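The three approximations of the event probability, and the resulting conversion from events to patients, can be compared in a short sketch. It assumes exponential survival, $S_i(t)=e^{-\lambda_i t}$, which is an illustrative choice that the text does not prescribe; the function names, hazard rates and time units are hypothetical.

```python
from math import exp


def pr_event(rate, accrual, follow_up, method="schoenfeld"):
    """Approximate event probability in one arm.

    Assumption: exponential survival S(t) = exp(-rate * t).
    """
    def surv(t):
        return exp(-rate * t)

    if method == "freedman":    # equal follow-up tau for every patient
        return 1 - surv(follow_up)
    if method == "schoenfeld":  # Simpson's rule over the accrual window
        return 1 - (surv(follow_up)
                    + 4 * surv(accrual / 2 + follow_up)
                    + surv(accrual + follow_up)) / 6
    if method == "median":      # survival at the median follow-up time
        return 1 - surv(accrual / 2 + follow_up)
    raise ValueError(f"unknown method: {method}")


def patients_enrichment(events, rate_exp, rate_ctrl, accrual, follow_up,
                        alloc_ratio=1.0, method="schoenfeld"):
    """N_enrichment = D_enrichment / Pr(event)."""
    pi_a = alloc_ratio / (alloc_ratio + 1)  # share in the experimental arm
    pi_b = 1 / (alloc_ratio + 1)            # share in the control arm
    p = (pi_a * pr_event(rate_exp, accrual, follow_up, method)
         + pi_b * pr_event(rate_ctrl, accrual, follow_up, method))
    return events / p


# Hypothetical inputs: 66 events, hazards 0.05 and 0.10 per year,
# 2 years of accrual and 3 years of additional follow-up.
for m in ("freedman", "schoenfeld", "median"):
    n = patients_enrichment(66, 0.05, 0.10, accrual=2.0, follow_up=3.0, method=m)
    print(m, round(n, 1))
```

Because Freedman's approximation evaluates survival at the shortest follow-up time, it gives the smallest event probability and therefore the largest number of patients of the three.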

The web-based interface offers two options. If the first option is chosen, the treatment effects for assay-negative and assay-positive patients must be specified in order to evaluate the relative efficiency of the enrichment design and the untargeted design, i.e., the marker-stratified design (see next section for further information), in which biomarker-negative patients are included in addition to the biomarker-positive patients. If the second option is chosen, it is possible to account for error in the assaying of the study population; thus, the treatment effects for target-negative and target-positive patients must be specified, as well as the assay’s sensitivity and specificity.

The sample size calculation using binary data is based on the formulas described by Simon and Maitournam [65,111,112], and the same two options offered for a time-to-event outcome are available, i.e., with and without accounting for error in assaying the biomarker status of the study population. When a binary outcome is assumed and the allocation ratio is 1:1, the number of randomized patients required in each treatment arm (experimental and control) can be given as

$${N}_{enrichment/arm}=2{\overline{p}}_{Q}\left(1-{\overline{p}}_{Q}\right){\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{{p}_{A}^{Q}-{p}_{B}}\right]}^{2},$$

where ${p}_{A}^{Q}$ and ${p}_{B}$ are the response probabilities in the experimental and control groups respectively,

$${\overline{p}}_{Q}=\frac{{p}_{A}^{Q}+{p}_{B}}{2},$$

and ${z}_{\alpha /2},\text{}{z}_{\beta}$ denote the upper $\alpha /2$- and upper $\beta $-points respectively of a standard normal distribution, where $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively. The response probability in the experimental group can be found by

$${p}_{A}^{Q}={p}_{B}+{\delta}_{+},$$

where ${\delta}_{+}$ denotes the improvement in response probability for biomarker-positive patients. Consequently, the total sample size of randomized patients will be

$${N}_{enrichment}=2{N}_{enrichment/arm}.$$
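As a worked illustration of the binary-outcome formulas above, the following sketch computes the per-arm and total number of randomized patients. The function name `n_per_arm_binary` and the response probabilities are hypothetical; the calculation assumes 1:1 allocation and no assay error, as stated in the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_binary(p_control, delta_pos, alpha=0.05, power=0.80):
    """Patients per arm in an enrichment design with a binary outcome."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)  # z_{alpha/2} + z_{beta}
    p_exp = p_control + delta_pos        # p_A^Q = p_B + delta_+
    p_bar = (p_exp + p_control) / 2      # average response probability
    return 2 * p_bar * (1 - p_bar) * (z_sum / (p_exp - p_control)) ** 2


# Hypothetical example: control response 20%, improvement of 20 points
# in biomarker-positive patients.
n_arm = n_per_arm_binary(p_control=0.20, delta_pos=0.20)
n_total = 2 * n_arm
print(ceil(n_arm), ceil(n_total))
```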

For continuous response endpoints the aforementioned formula for ${N}_{enrichment/arm}$ changes to

$${N}_{enrichment/arm}=\frac{2{\sigma}^{2}{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left({\mu}_{A+}-{\mu}_{B+}\right)}^{2}},$$

where ${\sigma}^{2}$ denotes the anticipated common variance, and ${\mu}_{A+}$ and ${\mu}_{B+}$ the mean responses for biomarker-positive patients in the experimental and control treatment arms respectively. These formulae are the standard formulae used for a standard randomized trial.

In addition, if we want to account for error in the assaying of the study population, the number of patients to be randomized in each arm of the enrichment trial when using continuous response endpoints can be given by the following formula

$${N}_{enrichment/arm}=2{\sigma}^{2}{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}{\left\{{\lambda}_{1}\left[\left(1-\omega \right)\zeta +\omega \right]\right\}}^{-2},$$

where $\omega $ measures the accuracy of the assay and corresponds to the PPV (positive predictive value of the assay, i.e., the proportion of patients who are assigned the biomarker-positive status according to the assay who are truly biomarker positive), ${\lambda}_{1}$ is the treatment effect in the biomarker-positive patients and $\zeta ={\lambda}_{0}/{\lambda}_{1}$ (where ${\lambda}_{0}$ is the treatment effect in the biomarker-negative patients) [55].
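The effect of assay error on the continuous-endpoint sample size can be seen in a small sketch. It assumes that ${\lambda}_{1}$ plays the role of the mean difference ${\mu}_{A+}-{\mu}_{B+}$ when the assay is perfect ($\omega =1$); the function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist


def n_per_arm_continuous(sigma, effect_pos, alpha=0.05, power=0.80,
                         ppv=1.0, zeta=0.0):
    """Patients per arm for a continuous endpoint, allowing for assay error.

    ppv  : omega, the positive predictive value of the assay
    zeta : lambda_0 / lambda_1, the relative effect in biomarker-negatives
    """
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    # The effect seen among assay-positive patients is a PPV-weighted mix of
    # the true-positive effect and the (smaller) false-positive effect.
    diluted_effect = effect_pos * ((1 - ppv) * zeta + ppv)
    return 2 * sigma ** 2 * z_sum ** 2 / diluted_effect ** 2


# Hypothetical example: sigma = 1, effect 0.5 in biomarker-positive patients.
perfect = n_per_arm_continuous(sigma=1.0, effect_pos=0.5)
imperfect = n_per_arm_continuous(sigma=1.0, effect_pos=0.5, ppv=0.8, zeta=0.0)
print(round(perfect, 1), round(imperfect, 1))
```

When the treatment has no effect in biomarker-negative patients ($\zeta =0$), the sample size is inflated by a factor of $1/{\omega}^{2}$ relative to a perfect assay.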

Simon and Maitournam [65,111,112] considered that, apart from the number of patients to be randomized, the number of patients who need to be screened should also be reported. Thus, they stated that the expected number of patients to be screened in the enrichment design is ${N}_{enrichment}/k$, where $k$ corresponds to the proportion of biomarker-positive patients. The online tool developed by Zhao and Simon provides both the number of patients to be screened and the number to be randomized.

$$\frac{{N}_{stratified}}{{N}_{enrichment}}\approx \frac{1}{{\left[k+\left(1-k\right)\frac{{\delta}_{-}}{{\delta}_{+}}\right]}^{2}}={\left[\frac{{\delta}_{+}}{k{\delta}_{+}+\left(1-k\right){\delta}_{-}}\right]}^{2},$$

$$\frac{{N}_{stratified}}{{N}_{enrichment}}\approx \frac{1}{{k}^{2}},$$

$$\frac{{N}_{stratified}}{{N}_{enrichment}}\approx \frac{4}{{\left(k+1\right)}^{2}}.$$
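The relative-efficiency ratios above, together with the screening burden ${N}_{enrichment}/k$, can be checked numerically. The sketch below is a hedged illustration that reads ${\delta}_{-}$ as the treatment effect in biomarker-negative patients, so that the second and third ratios are the special cases ${\delta}_{-}=0$ and ${\delta}_{-}={\delta}_{+}/2$ of the first; the function names and inputs are hypothetical.

```python
def stratified_to_enrichment_ratio(k, delta_neg, delta_pos):
    """Approximate N_stratified / N_enrichment for prevalence k."""
    return 1 / (k + (1 - k) * delta_neg / delta_pos) ** 2


def patients_screened(n_enrichment, k):
    """Expected number of patients screened to randomize n_enrichment."""
    return n_enrichment / k


k = 0.25  # hypothetical proportion of biomarker-positive patients
no_effect_in_negatives = stratified_to_enrichment_ratio(k, 0.0, 0.2)
half_effect_in_negatives = stratified_to_enrichment_ratio(k, 0.1, 0.2)
same_effect = stratified_to_enrichment_ratio(k, 0.2, 0.2)
print(no_effect_in_negatives, half_effect_in_negatives, same_effect)
print(patients_screened(165, k))
```

The ratio concerns randomized patients only; an enrichment trial that randomizes fewer patients may still need to screen many more when $k$ is small.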

Randomize-all designs (also named as all-comers/untargeted/unselected/non-targeted/simple randomization designs) allow the inclusion of the entire population as eligible for randomization. Consequently, the whole study population who meet the eligibility criteria, is randomly assigned to the different treatment groups (experimental and control treatment group) regardless of biomarker status. This design allows assessment of treatment benefit for the entire population irrespective of biomarker status whilst at the same time allowing for treatment benefit to be tested in the two biomarker-defined subgroups separately.

Generally, these designs are useful when: we are uncertain about the benefit of the experimental treatment in the overall population versus the biomarker-defined subgroups; the targeted treatment may benefit both biomarker-positive and biomarker-negative patients; the goal is to test the predictive ability of a biomarker; the assay reproducibility and accuracy are questionable; the turnaround time for biomarker assessment is long; and the biomarker prevalence is high.

Randomize-all designs are composed of two main subtypes: the Marker-stratified designs and the Hybrid designs, which are discussed separately below.

These designs (prospective validation Phase III trials) were identified in 45 papers (45%) of our review.

The marker-by-treatment interaction design using separate tests was referred to in 15 papers (15%) of our review [4,11,12,15,29,42,45,53,57,60,80,82,84,87,88] and is also referred to as ‘separate randomization design’ and ‘separate by treatment interaction design’. This analysis plan is based on separate superiority tests in each biomarker-defined subgroup in order to detect the treatment efficacy in each subset. Two examples of actual trials which use this testing plan are the National Cancer Institute (NCI)-sponsored North Central Cancer Treatment Group Study N0975 [29] and the MARVEL trial [29].

The ‘marker-by-treatment interaction design using separate tests’ is a testing plan which determines whether the novel treatment is superior to the control treatment separately within each biomarker-defined subgroup. Consequently, the hypothesis to be tested, the calculation of the number of patients required for the trial, the estimation of the statistical power of the design and the randomization procedure of patients to different treatments are independent among the different subgroups [12]. The sample size of the trial should be calculated in such a way so as to yield adequate statistical power when testing whether the experimental treatment is superior to the control treatment separately in the two biomarker-defined subgroups. Hence, this approach is not widely used due to the required large sample size as essentially two separate trials are being conducted. Another limitation of this approach is that when multiple biomarker-defined subsets and treatments are to be investigated, it is difficult to implement in practice.

The ‘marker-by-treatment interaction using interaction test’ uses a test for interaction between the biomarker status and treatment assignment and was identified in 12 papers (12%) of our review [4,12,15,42,53,57,60,82,84,87,88,94]. A marker stratified design which uses this testing plan is also referred to in the literature as an ‘interaction design’ or ‘genomic signature stratified design’. First, a formal statistical test for interaction between biomarker status and treatment assignment is undertaken. If this interaction is not significant, the study continues by comparing the treatments in the overall population at a two-sided significance level of 0.05; otherwise, the treatments are compared within each biomarker-defined subpopulation at a two-sided 0.05 significance level (i.e., the same as in the marker-by-treatment interaction design using separate tests). The sample size for this second testing plan is calculated with reference to the treatment effect in the entire study population. Therefore, it might not provide sufficient power for detecting the treatment effect in each biomarker-defined subset individually. More precisely, if the sample size is calculated for the overall analysis and the proportion of the biomarker-defined subpopulation which responds to the novel treatment is very small, the statistical power for the subgroup analysis may be inadequate. In addition, when several biomarker-defined subpopulations and treatments are to be investigated, this strategy is not easy to implement.

For the case of binary outcomes, Eng, 2014 [92] provided the formula for the required sample size to power the biomarker-positive and biomarker-negative subgroups separately. It is assumed that $Y$ is a binary variable which corresponds to a patient’s response to their randomly assigned treatment and $P\left(Y|Trt=i,\text{}M=j\right)={r}_{ij}$, where $i$ corresponds to either the experimental or control treatment and $j$ corresponds to either the biomarker-positive patients or the biomarker-negative patients. Hence,

$${r}_{ij}={\beta}_{0}+{\beta}_{A}I\left(Trt=A\right)+{\beta}_{+}I\left(M={M}^{+}\right)+{\beta}_{I}I\left(Trt=A,\text{}M={M}^{+}\right),$$

where ${\beta}_{0}$ denotes a baseline effect, ${\beta}_{A}$ denotes the added effect of the experimental treatment, ${\beta}_{+}$ denotes the biomarker-positive effect and ${\beta}_{I}$ denotes the nonadditive effect. Consequently, the proposed formula for the required sample size can be given by

$${N}_{stratified}=2{\left({z}_{\alpha}+{z}_{1-\beta}\right)}^{2}\left\{\frac{{r}_{A+}\left(1-{r}_{A+}\right)+{r}_{B+}\left(1-{r}_{B+}\right)}{{\left({\beta}_{A}+{\beta}_{I}\right)}^{2}}+\frac{{r}_{A-}\left(1-{r}_{A-}\right)+{r}_{B-}\left(1-{r}_{B-}\right)}{{\left({\beta}_{A}\right)}^{2}}\right\},$$

where $\alpha $ corresponds to the target significance level and $1-\beta $ corresponds to the power. Also, ${r}_{A+},\text{}{r}_{B+}$ are the assumed response rates of biomarker-positive patients receiving the experimental and the control treatment respectively. Additionally, ${r}_{A-},\text{}{r}_{B-}$ are the assumed response rates of biomarker-negative patients receiving the experimental and the control treatment respectively.
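Under the additive model for ${r}_{ij}$, the two effect sizes in the sample size formula are simply differences of response rates: ${\beta}_{A}+{\beta}_{I}={r}_{A+}-{r}_{B+}$ and ${\beta}_{A}={r}_{A-}-{r}_{B-}$. The sketch below uses this to evaluate the formula. Reading the first normal quantile as the upper $\alpha $ point is an assumption about the notation, not something the text confirms, and the function name and response rates are hypothetical.

```python
from statistics import NormalDist


def n_stratified_binary(r_a_pos, r_b_pos, r_a_neg, r_b_neg,
                        alpha=0.05, power=0.80):
    """Total patients needed to power both biomarker subgroups separately."""
    z = NormalDist().inv_cdf
    # Assumption: z_alpha is the upper alpha point, z_{1-beta} the power quantile.
    z_sum = z(1 - alpha) + z(power)
    effect_pos = r_a_pos - r_b_pos  # beta_A + beta_I
    effect_neg = r_a_neg - r_b_neg  # beta_A
    var_pos = r_a_pos * (1 - r_a_pos) + r_b_pos * (1 - r_b_pos)
    var_neg = r_a_neg * (1 - r_a_neg) + r_b_neg * (1 - r_b_neg)
    return 2 * z_sum ** 2 * (var_pos / effect_pos ** 2
                             + var_neg / effect_neg ** 2)


# Hypothetical response rates: a 20-point gain in biomarker-positive
# patients and a 10-point gain in biomarker-negative patients.
n_total = n_stratified_binary(r_a_pos=0.5, r_b_pos=0.3,
                              r_a_neg=0.4, r_b_neg=0.3)
print(round(n_total, 1))
```

In this example most of the total comes from the biomarker-negative term, because the smaller effect in that subgroup enters the denominator squared.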

Mandrekar and Sargent, 2009 [31] provide a formula to calculate the required number of events when the trial has a survival outcome with 1:1 randomization to treatment arms, i.e.,

$${D}_{stratified}=\frac{4{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left[\mathrm{log}\left(\frac{{m}_{A+}}{{m}_{B+}}\right)\right]}^{2}}+\frac{4{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left[\mathrm{log}\left(\frac{{m}_{A-}}{{m}_{B-}}\right)\right]}^{2}},$$

where ${m}_{A+},{m}_{A-},{m}_{B+},{m}_{B-}$ indicate the median overall survival for biomarker-positive and biomarker-negative patients receiving control and experimental treatment, respectively, and

$${\theta}_{1}=\frac{{m}_{A+}}{{m}_{B+}}=H{R}_{bio{m}^{+}},$$

$${\theta}_{2}=\frac{{m}_{A-}}{{m}_{B-}}=H{R}_{bio{m}^{-}}$$

correspond to the hazard ratios of the biomarker-positive and biomarker-negative subgroups, and ${z}_{\alpha /2},\text{}{z}_{\beta}$ denote the upper $\alpha /2$- and upper $\beta $-points respectively of a standard normal distribution, where $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively. More precisely, the total number of events is the sum of the required numbers of events for the biomarker-negative and biomarker-positive subpopulations. Freidlin et al., 2010 [61] stated that the number of events required to compare the experimental to the control treatment among the biomarker-positive patients, for detecting a given effect size in this subpopulation, is identical to the number of events needed by an enrichment design (i.e., ${D}_{enrichment}$).

Another potential formula for the required total number of events when 1:1 randomization to treatment arms is assumed is given by

$${D}_{stratified}=\frac{4{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left[k\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)\mathrm{log}\left({\theta}_{2}\right)\right]}^{2}}.$$

Although the formula proposed by Mandrekar and Sargent, 2009 [31] achieves a specific power $\left(1-\beta \right)$ for each biomarker-defined subgroup separately, the aforementioned formula proposed in the book of Harrington, 2012 [114] aims to reach a power $\left(1-\beta \right)$ for the overall population. According to Harrington, 2012, the required total number of patients to be entered into a stratified trial can be given by

$${N}_{stratified}=\frac{4{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left\{\frac{kP{r}_{(+)}\left(event\right)\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)P{r}_{(-)}\left(event\right)\mathrm{log}\left({\theta}_{2}\right)}{\sqrt{kP{r}_{(+)}\left(event\right)+\left(1-k\right)P{r}_{(-)}\left(event\right)}}\right\}}^{2}},$$

where $P{r}_{(+)}\left(event\right)$ and $P{r}_{(-)}\left(event\right)$ are the probabilities of an event in the biomarker-positive and biomarker-negative subsets respectively. If we divide the aforementioned formula for the required total number of events for the stratified design by the required total number of events for the enrichment design, we obtain the following ratio

$$\frac{{D}_{stratified}}{{D}_{enrichment}}=\frac{{\left[\mathrm{log}\left({\theta}_{1}\right)\right]}^{2}}{{\left[k\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)\mathrm{log}\left({\theta}_{2}\right)\right]}^{2}}=\frac{1}{{\left[k+\left(1-k\right)\frac{\mathrm{log}\left({\theta}_{2}\right)}{\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2}}.$$

Further, Zhao and Simon [19,28,53,57,60] have developed an online tool for the calculation of sample size for biomarker stratified randomized designs with binary or time-to-event endpoints, which is available at http://brb.nci.nih.gov/brb/samplesize/sdpap.html [115]. More precisely, the sample size calculation for both binary and time-to-event endpoints can be performed with three different analysis plans: A, B and C. Before choosing one of these analysis plans on the web site, for binary endpoints we need to specify the probability of treatment response in the control arm as well as the proportion of biomarker-positive patients. For survival endpoints, the hazard ratio of biomarker-positive versus biomarker-negative control patients, which corresponds to the hazard ratio of the prognostic effect, must be specified, as well as the proportion of biomarker-positive patients.

Analysis plan A is performed when there is confidence that an overall treatment effect exists. It determines the sample size on the basis of first comparing the experimental treatment to the control treatment in the entire randomized population at a reduced two-sided significance level $a<0.05$. If the overall test is not significant, then the experimental treatment is compared to the control treatment in the biomarker-positive patients using the type I error $a=0.05$. Analysis Plan A is similar to the ‘Biomarker-positive and overall strategies design’ with fall-back analysis described later in this paper; the two differ in the significance levels used. In order for the sample size to be estimated, the anticipated overall effect estimate, the reduced two-sided significance level and the power for the overall test need to be specified.

Analysis plan B is performed when there is confidence that there is a treatment effect in the biomarker-positive subpopulation. It determines the sample size on the basis of first comparing the experimental treatment to the control treatment in the biomarker-positive subgroup at a two-sided significance level of $a=0.05$. If the treatment effect is found to be significant at this level, then the treatment effect is evaluated in the biomarker-negative subgroup, again at a two-sided significance level of 0.05. This analysis plan is identical to the ‘Sequential subgroup specific design’ described later in this paper. In order for the sample size to be estimated, apart from the fixed significance level set to 0.05, the anticipated effect estimate in the biomarker-positive subpopulation and the power need to be specified.

Analysis plan C first tests whether there is a statistically significant interaction between treatment and biomarker [60] at a significance level $a\le 0.05$. If the interaction is not significant, then the treatments are compared in the overall study population at a two-sided significance level of 0.05. Otherwise, the treatments are compared within the two biomarker subgroups separately at a two-sided 0.05 significance level for each subgroup. Analysis Plan C therefore follows the marker-by-treatment interaction testing plans (using the interaction test, followed where appropriate by separate tests) described above. In order for the sample size to be estimated, the anticipated treatment effect in the overall study population, the one-sided significance level for the interaction test and the power for testing the treatment effect in the overall population need to be specified.
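The decision flow of the three analysis plans can be written down compactly. The sketch below is only an illustration of the order of the tests as described above; it is not the implementation behind the online tool. The function names are hypothetical, p-values are taken as given inputs, and the two significance levels of plan A are left as required arguments because their values are chosen by the investigator.

```python
def plan_a(p_overall, p_pos, alpha_overall, alpha_pos):
    """Overall test first at a reduced level, then biomarker-positive subgroup."""
    if p_overall < alpha_overall:
        return "benefit in overall population"
    if p_pos < alpha_pos:
        return "benefit in biomarker-positive subgroup"
    return "no benefit shown"


def plan_b(p_pos, p_neg, alpha=0.05):
    """Biomarker-positive subgroup first; negatives only if positives succeed."""
    if p_pos >= alpha:
        return "no benefit shown"
    if p_neg < alpha:
        return "benefit in both subgroups"
    return "benefit in biomarker-positive subgroup only"


def plan_c(p_interaction, p_overall, p_pos, p_neg,
           alpha_interaction, alpha=0.05):
    """Interaction test first; it selects overall or subgroup-wise testing."""
    if p_interaction >= alpha_interaction:
        return {"overall": p_overall < alpha}
    return {"positive": p_pos < alpha, "negative": p_neg < alpha}


# Hypothetical p-values, for illustration only.
print(plan_a(p_overall=0.20, p_pos=0.01, alpha_overall=0.03, alpha_pos=0.02))
print(plan_b(p_pos=0.01, p_neg=0.30))
print(plan_c(p_interaction=0.02, p_overall=0.04, p_pos=0.01, p_neg=0.40,
             alpha_interaction=0.05))
```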

Marker stratified designs include three designs which differ in terms of their statistical testing strategies, i.e., (i) Subgroup-specific designs (i.e., sequential subgroup-specific design, parallel subgroup-specific design); (ii) Biomarker-positive and overall strategies (i.e., biomarker-positive and overall strategies with parallel assessment, biomarker-positive and overall strategies with sequential assessment, biomarker-positive and overall strategies with fall-back analysis); and (iii) Marker sequential test design (MaST). They are discussed in the following sections.

$${N}_{Sequential\text{}subgroup-specific}^{+}={N}_{enrichment}$$

As Simon, 2008 [60] stated, the total number of patients will be approximately

$${N}_{Sequential\text{}subgroup-specific}=\frac{{N}_{enrichment}}{k},$$

where $k$ is the proportion of biomarker-positive patients, and the number of biomarker-negative patients will be approximately

$${N}_{Sequential\text{}subgroup-specific}^{-}=\frac{\left(1-k\right){N}_{enrichment}}{k}.$$

For the conduct of this design, it is important to ensure that there is also an adequate number of biomarker-negative patients for analysis purposes. For time-to-event outcomes, the required number of events for biomarker-positive patients is the same as the required number of events in the enrichment design, i.e.,

$${D}_{Sequential\text{}subgroup-specific}^{+}={D}_{enrichment}.$$

At the time that there are ${D}_{enrichment}$ events among biomarker-positive patients, the required number of events among biomarker-negative patients is given by

$${D}_{Sequential\text{}subgroup-specific}^{-}={D}_{enrichment}\left(\frac{{\lambda}_{-}}{{\lambda}_{+}}\right)\left(\frac{1-k}{k}\right),$$

where ${\lambda}_{-}$, ${\lambda}_{+}$ are the event rates in the biomarker-negative and biomarker-positive control subsets at the time when there are ${D}_{enrichment}$ events in the biomarker-positive subgroup [60].
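The event formula for the biomarker-negative subgroup is easy to evaluate directly. The following is a minimal sketch with a hypothetical function name and made-up inputs; it simply scales the enrichment event count by the ratio of event rates and the odds of being biomarker-negative.

```python
def events_biomarker_negative(d_enrichment, lambda_neg, lambda_pos, k):
    """Events expected among biomarker-negative patients (hypothetical helper).

    d_enrichment: events required in the biomarker-positive subgroup
    lambda_neg, lambda_pos: control-arm event rates in the two subgroups
    k: proportion of biomarker-positive patients, 0 < k < 1
    """
    if not 0 < k < 1:
        raise ValueError("k must lie strictly between 0 and 1")
    return d_enrichment * (lambda_neg / lambda_pos) * ((1 - k) / k)


# Equal event rates and 50% prevalence give the same number of events in both subgroups.
d_negative = events_biomarker_negative(100, lambda_neg=0.1, lambda_pos=0.1, k=0.5)
```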

The significance levels $a$ can also be considered as one-sided significance levels in situations where our alternative hypothesis is not that there is just a treatment effect but that the treatment benefit in the experimental group is greater than that of the control group.

As in the sequential subgroup-specific design, the probability of rejecting either the null hypothesis of no treatment effect in the biomarker-positive subset or in the biomarker-negative subset under the global null hypothesis is less than or equal to the overall type I error rate $a$. Additionally, the probability of rejecting the null hypothesis of no treatment effect in the biomarker-negative subpopulation when the treatment benefit is restricted to biomarker-positive patients is less than or equal to $a$. The significance levels $a$ can be considered as one-sided or two-sided significance levels.

Despite the fact that the biomarker-positive subgroup and overall strategy design allows the treatment effect to be tested in the biomarker-positive subpopulation and provides good statistical power when the treatment effect is homogeneous across subgroups, this design is usually considered problematic and its use is not often recommended. More precisely, a major concern is that when the benefit of the novel treatment is limited to the biomarker-positive patients, it is possible that the design might lead to a wrong recommendation of treatment for the biomarker-negative patients. This might happen because when there is no treatment effect in the biomarker-negative subgroup, there might be an observed effect in the entire population due to the potentially large effect in the biomarker-positive patients. This concern is particularly pronounced in the sequential version of the design, which first tests the biomarker-positive subgroup and then, if it is positive, it tests the overall population.

When the experimental treatment is compared to the control treatment within the overall population and the overall treatment effect is significant, then the test has high statistical power. If we are testing only the biomarker-positive subgroup and the treatment effect in this subgroup is significant, the statistical power is again high. This prospective subset analysis plan is based on testing both the overall study population and the biomarker-positive subgroup using significance levels which are chosen in such a way that the overall significance level is less than or equal to $a$ (type I error). A simple way is to split $a$ so that the significance level for the entire population and the significance level for the biomarker-positive subset sum to the overall significance level $a$ (typically $a=0.05$). For example, the SATURN trial (NCT00556712) [96], which employed a prospective subset strategy, used 0.03 as the level of significance to test the treatment effect in the entire population and 0.02 to test the treatment effect in the biomarker-positive subset; therefore, the overall level of significance was preserved at 0.05. This approach can be overly conservative, as in the SATURN trial, because of the correlation between the global and subgroup tests. Other approaches [98,125,126,127,128] have been proposed for adjusting the level of significance of both tests in a more accurate and less conservative way.
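The conservativeness of a simple split of $a$ can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption for illustration rather than part of the SATURN analysis: both tests are taken to be two-sided z-tests, the prevalence `k` is set to 0.5, and the overall statistic is modelled as $\sqrt{k}\,Z_{+}+\sqrt{1-k}\,Z_{-}$, which makes it positively correlated with the subgroup statistic under the global null hypothesis.

```python
import random
from statistics import NormalDist


def simulate_familywise_error(k=0.5, alpha_overall=0.03, alpha_positive=0.02,
                              n_sim=200_000, seed=1):
    """Estimate P(reject overall or biomarker-positive test) under the global null."""
    rng = random.Random(seed)
    crit_overall = NormalDist().inv_cdf(1 - alpha_overall / 2)
    crit_positive = NormalDist().inv_cdf(1 - alpha_positive / 2)
    rejections = 0
    for _ in range(n_sim):
        z_pos = rng.gauss(0, 1)  # biomarker-positive subgroup statistic
        z_neg = rng.gauss(0, 1)  # biomarker-negative subgroup statistic
        # Assumed form of the overall statistic (weights follow the prevalence).
        z_all = k ** 0.5 * z_pos + (1 - k) ** 0.5 * z_neg
        if abs(z_all) > crit_overall or abs(z_pos) > crit_positive:
            rejections += 1
    return rejections / n_sim


fwer = simulate_familywise_error()
```

Under these assumptions the estimated familywise error rate falls below the nominal 0.05, because the two tests often reject together; this unused margin is what the less conservative adjustments aim to recover.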

Regarding the sample size for such a design where there is prior evidence indicating strong predictive ability of the biomarker, a standard sample size calculation (i.e., the same sample size calculation as for the enrichment designs) can be used for the biomarker-positive subpopulation or, alternatively, researchers can use the sample size calculation used for the sequential subgroup-specific design. However, in order to have a sufficient number of biomarker-positive patients to detect treatment effectiveness in that biomarker-defined subset and consequently to reach the desired power, the sample size should be calculated using the reduced level ${a}_{1}\in \left[0,a\right]$ instead of the global significance level $a$ which is used in the sample size formulae of the enrichment and sequential subgroup-specific designs. This will result in a small increase in the number of patients as compared to the enrichment and sequential subgroup-specific designs. If the reduced significance level ${a}_{1}$ is not used, there will be a minor loss of power.

Hybrid designs (Phase III) were identified in 14 papers (14%) of our review and they can be included in the all-comers designs, where the entire population is firstly screened for biomarker status and all individuals enter the trial. A graphical illustration of this design is given in Figure 11.

Generally, with biomarker-strategy designs, the study population is randomized to treatment strategies as opposed to treatments per se. More precisely, patients are randomized to either a biomarker-based treatment strategy arm where the biomarker is used in deciding on approach to treatment, or to an arm that does not use the biomarker to guide treatment. Consequently, biomarker-strategy designs make a comparison between two strategies—one which uses biomarker information to inform treatment approach and the other that does not.

These designs are also known as biomarker-based strategy designs or signature-based strategy designs and they comprise four subtypes: (i) biomarker-strategy designs with biomarker assessment in the control arm; (ii) biomarker-strategy designs without biomarker assessment in the control arm; (iii) biomarker-strategy designs with treatment randomization in the control arm and (iv) reverse marker-based strategy designs. Whilst patients randomized to the non-biomarker-based strategy arm in the first two design subtypes are allocated the control treatment, in the third design subtype those patients undergo secondary randomization to either the control or experimental treatment. The fourth design subtype differs from the other three as the non-biomarker-based strategy arm is replaced by the reverse marker-strategy arm. The first and second subtypes are similar, differing only in terms of ethical/feasibility issues regarding the acquisition of biomarker status at the beginning of the trial.

This approach is preferred when the study is planned for a confirmatory phase of a certain biomarker-based strategy allowing for comparison between the biomarker-based strategy and non-biomarker-based strategy.

This approach is described in 21 (21%) papers of our review.

According to Freidlin et al., 2010 [61], assuming a survival outcome, the required sample size in terms of number of events for this type of biomarker-strategy design in order to reach power $\left(1-\beta \right)$ at significance level $\alpha $ (type I error) can be given by

$${D}_{strategy\text{}I}=4\text{}{\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{k\text{}\mathrm{log}\mathsf{\theta}}\right]}^{2},$$

where $k$ denotes the prevalence of biomarker-positive patients, $\mathsf{\theta}<1$ denotes the assumed hazard ratio in the biomarker-positive subpopulation and ${z}_{\alpha /2},\text{}{z}_{\beta}$ denote the upper $\alpha /2$- and upper $\beta $-points respectively of a standard normal distribution where $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively. According to Freidlin et al., 2010 [61], it is assumed that there is no treatment effect in the biomarker-negative subpopulation (corresponding to a hazard ratio of experimental treatment versus control treatment of 1) and that there is no prognostic effect of the biomarker under the control treatment. Consequently, the overall hazard ratio between experimental and control arms in biomarker-positive patients and biomarker-negative patients can be approximated by $\mathrm{exp}\left[k\text{}\mathrm{log}\mathsf{\theta}+\left(1-k\right)\text{}\mathrm{log}1\right]={\mathsf{\theta}}^{k}$ [61] and this is the reason why the formula which gives the required total number of events $\left({D}_{strategy\text{}I}\right)$ contains only the hazard ratio of biomarker-positive patients. Freidlin et al., 2010 [61] provided the aforementioned formula assuming that all random assignments use 1:1 randomization.
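To give a feel for how strongly the prevalence $k$ drives this event count, the sketch below evaluates the formula using the standard normal quantiles from Python's standard library. The function name, the defaults and the example inputs are illustrative assumptions, not values taken from Freidlin et al.

```python
import math
from statistics import NormalDist


def strategy_design_events(k, theta, alpha=0.05, power=0.90):
    """Total events D = 4 * [(z_{alpha/2} + z_beta) / (k * log(theta))]^2.

    k: prevalence of biomarker-positive patients, 0 < k <= 1
    theta: assumed hazard ratio in biomarker-positive patients, 0 < theta < 1
    """
    if not 0 < k <= 1 or not 0 < theta < 1:
        raise ValueError("need 0 < k <= 1 and 0 < theta < 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point
    z_beta = NormalDist().inv_cdf(power)           # upper beta point
    return 4 * ((z_alpha + z_beta) / (k * math.log(theta))) ** 2


events_half = strategy_design_events(k=0.5, theta=0.5)
events_all = strategy_design_events(k=1.0, theta=0.5)
```

Because $k$ enters the denominator squared, halving the prevalence quadruples the required number of events, which is one way to see why strategy designs are often regarded as inefficient when the biomarker-positive group is small.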

Additionally, Young et al., 2010 [26] determined the total sample size needed for this type of biomarker-strategy design when using continuous clinical endpoints by

$${N}_{strategy\text{}I}=\frac{2{\left({z}_{1-\alpha /2}+{z}_{1-\beta}\right)}^{2}\left({\tau}_{m}^{2}+{\tau}_{n}^{2}\right)}{{\left({v}_{m}-{v}_{n}\right)}^{2}},$$

where ${z}_{1-\alpha /2}$, ${z}_{1-\beta}$ denote the lower $1-\alpha /2$- and lower $1-\beta $-points respectively of a standard normal distribution, $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively, ${v}_{m}$ and ${v}_{n}$ denote the mean response from the biomarker-based strategy arm and the non-biomarker-based strategy arm respectively, and ${\tau}_{m}^{2},\text{}{\tau}_{n}^{2}$ denote the variance of response for the biomarker-based strategy arm and non-biomarker-based strategy arm respectively. Young et al., 2010 [26] also provided formulae for the aforementioned variances which depend on sensitivity and specificity of the assay, such that any error in the evaluation of the biomarker in the biomarker-based strategy can be accounted for.
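The continuous-endpoint formula can be evaluated in the same way. The sketch below takes the arm means and variances as given inputs; the function name, defaults and example values are assumptions, and the assay-dependent variance formulae of Young et al. are not reproduced here.

```python
from statistics import NormalDist


def strategy_design_total_n(v_m, v_n, tau2_m, tau2_n, alpha=0.05, power=0.80):
    """Total N = 2 (z_{1-alpha/2} + z_{1-beta})^2 (tau2_m + tau2_n) / (v_m - v_n)^2."""
    if v_m == v_n:
        raise ValueError("the two strategy arms must differ in mean response")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * (tau2_m + tau2_n) / (v_m - v_n) ** 2


# Illustrative inputs: unit variances and a mean difference of 0.5 between arms.
n_total = strategy_design_total_n(v_m=1.5, v_n=1.0, tau2_m=1.0, tau2_n=1.0)
```

Since only a fraction of patients receive different treatments in the two arms, the difference ${v}_{m}-{v}_{n}$ is typically small, and the squared term in the denominator makes the required sample size grow accordingly.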

For the case of binary outcomes, Eng, 2014 [92] provided the formula for the required sample size for each arm in a test of proportions between the two randomization arms (biomarker-based strategy arm and non-biomarker-based strategy arm). This formula can be given by

$${N}_{strategy\text{}I/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{2}\left(1-{g}_{2}\right)\right]}{{\Delta}_{2}^{2}},$$

where $\alpha $ corresponds to the target level, $1-\beta $ corresponds to the power, ${g}_{1}$ is the expected response rate in the biomarker-based strategy arm, ${g}_{2}$ is the expected response rate in the non-biomarker-based strategy arm and ${\Delta}_{2}={g}_{1}-{g}_{2}$. The expected response rates ${g}_{1},{g}_{2}$ can be found by calculating $k{r}_{A+}+\left(1-k\right){r}_{B-}$ and ${r}_{B}$ respectively. Here $k$ is the prevalence of biomarker-positive patients, ${r}_{A+},{r}_{B-}$ are the assumed response rates of biomarker-positive patients receiving the experimental treatment and biomarker-negative patients receiving the control treatment, and ${r}_{B}$ denotes the marginal effect of treatment B (control treatment).
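A short sketch of the binary-outcome calculation follows. One point is an interpretation rather than something stated in the text: ${z}_{a}$ is read here as the upper $\alpha$-point of the standard normal distribution (a one-sided level). The function name and the example response rates are likewise hypothetical.

```python
from statistics import NormalDist


def strategy_design_n_per_arm(k, r_a_pos, r_b_neg, r_b, alpha=0.05, power=0.80):
    """Per-arm sample size for a binary outcome in a biomarker-strategy design."""
    g1 = k * r_a_pos + (1 - k) * r_b_neg  # response rate, biomarker-based strategy arm
    g2 = r_b                              # response rate, non-biomarker-based strategy arm
    delta = g1 - g2
    if delta == 0:
        raise ValueError("expected response rates must differ between arms")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # assumed one-sided reading of z_a
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = g1 * (1 - g1) + g2 * (1 - g2)
    return (z_alpha + z_beta) ** 2 * variance_sum / delta ** 2


# Illustrative inputs: 50% prevalence, response 0.6 on A in biomarker-positive
# patients, and 0.3 on B both in biomarker-negative patients and marginally.
n_per_arm = strategy_design_n_per_arm(k=0.5, r_a_pos=0.6, r_b_neg=0.3, r_b=0.3)
```

In this example the strategy arms differ by only 0.15 in response rate even though the treatment effect in biomarker-positive patients is 0.30, which is the dilution that the prevalence $k$ introduces.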

This strategy was identified in 14 papers (14%) of our review.

Sargent and Allegra [108] proposed another version of Biomarker-strategy designs where there is a second randomization between experimental and control treatment in the non-biomarker guided strategy arm. This strategy is referred to in 17 papers (17%) of our review.

$${D}_{strategy\text{}III}=\frac{4{\left({z}_{a/2}+{z}_{\beta}\right)}^{2}}{{\left\{\mathrm{log}\left[\frac{2k{m}_{B+}\text{}+\text{}2\left(1-k\right){m}_{A-}}{k\left({m}_{A+}\text{}+\text{}{m}_{B+}\right)\text{}+\text{}\left(1-k\right)\left({m}_{A-}\text{}+\text{}{m}_{B-}\right)}\right]\right\}}^{2}},$$

Additionally, Young et al., 2010 [26], considering continuous clinical outcomes, calculated the total sample size by

$${N}_{strategy\text{}III}=\frac{2{\left({z}_{1-\alpha /2}+{z}_{1-\beta}\right)}^{2}\left({\tau}_{m}^{2}+{\tau}_{nr}^{2}\right)}{{\left({v}_{m}-{v}_{nr}\right)}^{2}},$$

where ${z}_{1-\alpha /2}$, ${z}_{1-\beta}$ denote the lower $1-\alpha /2$- and lower $1-\beta $-points respectively of a standard normal distribution, $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively, ${v}_{m}$ and ${v}_{nr}$ denote the mean response from the biomarker-based strategy arm and the non-biomarker-based strategy arm respectively, and ${\tau}_{m}^{2},\text{}{\tau}_{nr}^{2}$ denote the variance of response for the biomarker-based strategy arm and non-biomarker-based strategy arm respectively. The only differences in the mathematical formula for the total sample size ${n}_{t}$ between this type of biomarker-strategy design and the first and second types mentioned above are the values of ${v}_{nr}$ and ${\tau}_{nr}^{2}$, to reflect the fact that in the non-biomarker-based strategy arm patients are randomly assigned to either the experimental or control treatment. Again, the formulae can be adjusted to account for uncertainty in biomarker assessment.

For the case of binary outcomes, Eng, 2014 [92] provided the formula for the required sample size for each arm in a test of proportions between the two randomization arms (biomarker-based strategy arm and non-biomarker-based strategy arm). This formula can be given by

$${N}_{strategy\text{}III/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{3}\left(1-{g}_{3}\right)\right]}{{\Delta}_{3}^{2}},$$

where $\alpha $ corresponds to the target level, $1-\beta $ corresponds to the power, ${g}_{1}$ is the expected response rate in the biomarker-based strategy arm, ${g}_{3}$ is the expected response rate in the non-biomarker-based strategy arm and ${\Delta}_{3}={g}_{1}-{g}_{3}$. The expected response rates ${g}_{1},\text{}{g}_{3}$ can be found by calculating $k{r}_{A+}+\left(1-k\right){r}_{B-}$ and ${r}_{A}/2+{r}_{B}/2$ respectively, where ${r}_{A}$ and ${r}_{B}$ denote the marginal effect of treatment A (experimental treatment) and treatment B (control treatment) respectively. ${r}_{A+},\text{}{r}_{B-}$ are the assumed response rates of biomarker-positive patients receiving the experimental treatment and biomarker-negative patients receiving the control treatment. The prevalence of biomarker-positive patients corresponds to $k$.

Eng, 2014 [92] proposed another version of biomarker-strategy designs where the non-biomarker-based strategy arm which is included in the three aforementioned subtypes of biomarker-strategy designs is replaced by the reverse marker-strategy arm. This strategy is referred to in four papers (4%) of our review.

$${N}_{strategy\text{}IV/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{4}\left(1-{g}_{4}\right)\right]}{{\Delta}_{4}^{2}}$$

Freidlin et al., 2012 [71] proposed a biomarker-guided Phase II clinical trial design which, on completion, recommends which type of Phase III trial should be used. The possible recommendations are the following: (i) enrichment design; (ii) marker-stratified design; (iii) a traditional trial design without a biomarker; or (iv) drop consideration of the experimental treatment. A graphical illustration of this design is given in Figure 16.

The design starts by comparing the experimental treatment with the control treatment in the biomarker-positive subgroup using a one-sided level of significance ${a}_{1}=0.10$. The null hypothesis is that the progression-free survival for biomarker-positive patients is the same for both the experimental and control treatment arms ($H{R}_{0,\text{}biom+}\le 1$ vs. $H{R}_{1,\text{}biom+}>1$). Next, if the null hypothesis is rejected, which means that the experimental treatment is better than the control treatment in the biomarker-positive subgroup, we continue with the calculation of an 80% two-sided confidence interval (CI) for the hazard ratio (control vs. experimental) in the biomarker-negative subpopulation. One of three decisions is made according to the values of the CI: (i) if the entire CI is less than 1.3 then we can continue with a Phase III enrichment design; (ii) if the CI includes the values 1.3 or 1.5 then we can continue with a Phase III marker-stratified design and (iii) if the entire CI is greater than 1.5 then it seems that the biomarker is not useful as the novel treatment also benefits the biomarker-negative patients; thus, the biomarker should be dropped and a traditional randomized Phase III design should be conducted. Otherwise, if the null hypothesis is not rejected at the one-sided significance level ${a}_{1}=0.10$ (meaning that the experimental treatment is not better than the control treatment in the biomarker-positive subgroup), then we continue with the comparison of treatments in the overall study population at a one-sided level of significance $a=0.05$. If the null hypothesis of no treatment effect in the entire population is rejected, then the authors recommend dropping the biomarker and continuing with a traditional randomized Phase III trial, because the biomarker does not appear to be useful. On the other hand, if the null hypothesis is not rejected, the experimental treatment should not be tested further as it does not seem to be effective.
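The decision rule described above can be written as a small function. This is a hedged sketch rather than the authors' implementation: the function name and arguments are hypothetical, the p-values and CI are assumed to be computed elsewhere, and a CI lying strictly between 1.3 and 1.5 (a case the description does not cover) is mapped here to the marker-stratified design as an assumption.

```python
def recommend_phase3_design(p_positive, ci_negative, p_overall,
                            alpha_positive=0.10, alpha_overall=0.05,
                            lower_cut=1.3, upper_cut=1.5):
    """Phase III recommendation after the biomarker-guided Phase II trial.

    p_positive: one-sided p-value in the biomarker-positive subgroup
    ci_negative: (low, high) 80% CI for the control vs. experimental hazard
                 ratio in the biomarker-negative subgroup
    p_overall: one-sided p-value in the overall population
    """
    if p_positive < alpha_positive:
        low, high = ci_negative
        if high < lower_cut:
            return "enrichment design"
        if low > upper_cut:
            return "traditional design without biomarker"
        # CI includes 1.3 or 1.5 (or, by assumption, lies between them).
        return "marker-stratified design"
    if p_overall < alpha_overall:
        return "traditional design without biomarker"
    return "drop experimental treatment"
```

For example, a significant biomarker-positive result together with a biomarker-negative CI of (0.8, 1.2) would point to an enrichment design, since the interval lies entirely below 1.3.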

A number of biomarker-guided trial designs have been proposed in the past decade, including both biomarker-guided adaptive and non-adaptive trial designs. We have undertaken a comprehensive review of the literature using an in-depth search strategy to report on the biomarker-guided designs proposed to date, with a view to providing the research community with clarity in definition, methodology and terminology of the various trial designs. The review is split in two parts due to its size; the first part of the review is focused on adaptive designs which are extensively discussed in our published paper “Biomarker-Guided Adaptive Trial Designs in Phase II and Phase III: a Methodological Review”, Antoniou et al., 2016 [35], whereas, herein we focus on non-adaptive designs which incorporate biomarkers.

The review has demonstrated ambiguity and confusion regarding the biomarker-guided non-adaptive designs proposed by different authors. In this review, we focus on five main types of such designs including their subtypes and variations. Knowledge of how to implement and analyse these designs is essential in testing the effectiveness of a biomarker-guided approach to treatment; hence, a comprehensive review giving this knowledge is essential for the research community. In our in-depth study, we provide researchers with analytical information on these study designs not only in terms of their utility, advantages and limitations but also in terms of their methodology. In addition, a graphical illustration for each biomarker-guided design is given. A guidance document by Tajik et al., 2012 [117] regarding the evaluation of putative biomarkers in randomized clinical trials came to our knowledge by personal communication as we were not able to identify it during our literature search.

Non-adaptive designs do not allow modifications of important aspects of the trial such as refinement of the existing study population, treatment assignment, study endpoints, study duration, etc. In non-adaptive designs, all these factors are defined before the initiation of the study and they are kept fixed during the course of the clinical trial. However, there is considerable potential for failure when implementing such conventional designs, because assumptions about key aspects of the study made before the conduct of the trial may turn out to be wrong. Hence, an adaptive design which allows on-going adaptations based on accumulating study data from interim analyses might hold an advantage over the non-adaptive trial design due to its flexibility. However, before implementing an adaptive design, many issues should be carefully considered by research teams in order to establish that there are good reasons for conducting such a design. These include regulatory and logistical issues, the additional effort required to carry out the design, potential difficulties, possible increases in cost and time, statistical challenges including a potential increase in the chance of a false conclusion that the treatment is effective (inflation of Type I error), and whether the adaptation process has led to positive study results that are difficult to interpret irrespective of having control of Type I error [130]. A recent paper by Dimairo et al., 2015 [131] refers to a number of obstacles and barriers when implementing adaptive designs in practice. Several key stakeholders in clinical trials research were interviewed (i.e., UK Clinical Trials Units directors, funding board and panel members, statisticians, regulators, chief investigators, data monitoring committee members and health economists) and expressed difficulties with adaptive designs. Lack of appropriate knowledge of and familiarity with these designs in the scientific community, insufficient time and funding structure, the additional work required due to the complexity of such designs, and the statistical expertise and appropriate software needed are some of the difficulties highlighted in the paper of Dimairo et al., 2015 [131]. In addition, that study characterises the potential benefits of an adaptive design to patients, clinical trials and funders.

The different biomarker-guided designs proposed so far need to be further explored by the research community. This applies both to non-adaptive designs, which remain an appealing approach to a great extent mainly due to their simplicity, and to adaptive designs, which are more flexible. The proper choice and use of such designs can result in a great increase in the efficiency of a trial and expedite the development of novel treatments.

The characteristics and methodology of the five main designs and their subtypes are discussed in the current paper, whilst information on their variations is summarized in File S1-S4. Additional references for these variations and the literature review search strategy are provided in [132,133].

The following are available online at www.mdpi.com/2075-4426/7/1/1/s1, File S1-S4: Extensions of Biomarker-guided non-adaptive trial designs, Keywords S1: Literature review search strategies for both biomarker-guided clinical trial designs and for traditional trial designs.

The authors express their gratitude to the two anonymous reviewers for their suggestions.

M.A., R.K.-D. and A.L.J. conceived and designed the experiments; M.A. performed the experiments; M.A. analyzed the data; M.A., R.K.-D. and A.L.J. contributed reagents/materials/analysis tools; M.A., R.K.-D. and A.L.J. wrote the paper. First author (M.A.) screened the available titles and abstracts, and second and third authors (R.K.-D., A.L.J.) were consulted where it was questionable whether a paper should be included or not. A data extraction form was designed to collect all necessary information, and the summary of the extracted data was reviewed by the second and third authors (R.K.-D, A.L.J.).

The authors declare no conflict of interest.

- George, S.L. Statistical issues in translational cancer research. Clin. Cancer Res. **2008**, 14, 5954–5958.
- Chabner, B. Advances and challenges in the use of biomarkers in clinical trials. Clin. Adv. Hematol. Oncol. **2008**, 6, 42–43.
- Group, B.D.W. Biomarkers and surrogate endpoints: Preferred definitions and conceptual framework. Clin. Pharmacol. Ther. **2001**, 69, 89–95.
- Shi, Q.; Mandrekar, S.J.; Sargent, D.J. Predictive biomarkers in colorectal cancer: Usage, validation, and design in clinical trials. Scand. J. Gastroenterol. **2012**, 47, 356–362.
- Pihlstrom, B.L.; Barnett, M.L. Design, operation, and interpretation of clinical trials. J. Dent. Res. **2010**, 89, 759–772.
- Rigatto, C.; Barrett, B.J. Biomarkers and surrogates in clinical studies. Methods Mol. Biol. **2009**, 473, 137–154.
- Mandrekar, S.J.; An, M.-W.; Sargent, D.J. A review of phase II trial designs for initial marker validation. Contemp. Clin. Trials **2013**, 36, 597–604.
- Karuri, S.W.; Simon, R. A two-stage bayesian design for co-development of new drugs and companion diagnostics. Stat. Med. **2012**, 31, 901–914.
- Matsui, S. Genomic biomarkers for personalized medicine: Development and validation in clinical studies. Comput. Math. Methods Med. **2013**, 2013, 865980.
- Buyse, M.; Michiels, S. Omics-based clinical trial designs. Curr. Opin. Oncol. **2013**, 25, 289–295.
- Wu, W.; Shi, Q.; Sargent, D.J. Statistical considerations for the next generation of clinical trials. Semin. Oncol. **2011**, 38, 598–604.
- Sargent, D.J.; Conley, B.A.; Allegra, C.; Collette, L. Clinical trial designs for predictive marker validation in cancer treatment trials. J. Clin. Oncol. **2005**, 23, 2020–2027.
- Chen, J.J.; Lu, T.-P.; Chen, D.-T.; Wang, S.-J. Biomarker adaptive designs in clinical trials. Transl. Cancer Res. **2014**, 3, 279–292.
- Freidlin, B.; Sun, Z.; Gray, R.; Korn, E.L. Phase III clinical trials that integrate treatment and biomarker evaluation. J. Clin. Oncol. **2013**, 31, 3158–3161.
- Gosho, M.; Nagashima, K.; Sato, Y. Study designs and statistical analyses for biomarker research. Sensors **2012**, 12, 8966–8986.
- Ming-Wen An, S.J.M.; Daniel, J.S. Biomarkers-guided targeted drugs: New clinical trials design and practice necessity. Adv. Personal. Cancer Manag. **2011**, 30–41.
- Buyse, M. Towards validation of statistically reliable biomarkers. Eur. J. Cancer Suppl. **2007**, 5, 89–95.
- Lee, C.K.; Lord, S.J.; Coates, A.S.; Simes, R.J. Molecular biomarkers to individualise treatment: Assessing the evidence. Med. J. Aust. **2009**, 190, 631–636.
- Simon, R. Clinical trial designs for evaluating the medical utility of prognostic and predictive biomarkers in oncology. Personal. Med. **2010**, 7, 33–47.
- Fraser, G.A.M.; Meyer, R.M. Biomarkers and the design of clinical trials in cancer. Biomark. Med. **2007**, 1, 387–397.
- Mandrekar, S.J.; Sargent, D.J. Design of clinical trials for biomarker research in oncology. Clin. Investig. **2011**, 1, 1629–1636.
- Simon, R. Advances in clinical trial designs for predictive biomarker discovery and validation. Curr. Breast Cancer Rep. **2009**, 1, 216–221.
- Polley, M.-Y.C.; Freidlin, B.; Korn, E.L.; Conley, B.A.; Abrams, J.S.; McShane, L.M. Statistical and practical considerations for clinical evaluation of predictive biomarkers. J. Natl. Cancer Inst. **2013**, 105, 1677–1683.
- Bradley, E. Incorporating biomarkers into clinical trial designs: Points to consider. Nat. Biotechnol. **2012**, 30, 596–599.
- Beckman, R.A.; Clark, J.; Chen, C. Integrating predictive biomarkers and classifiers into oncology clinical development programmes. Nat. Rev. Drug Discov. **2011**, 10, 735–748.
- Young, K.Y.; Laird, A.; Zhou, X.H. The efficiency of clinical trial designs for predictive biomarker validation. Clin. Trials **2010**, 7, 557–566.
- Lee, J.J.; Xuemin, G.; Suyu, L. Bayesian adaptive randomization designs for targeted agent development. Clin. Trials **2010**, 7, 584–596.
- Simon, R. Clinical trials for predictive medicine: New challenges and paradigms. Clin. Trials **2010**, 7, 516–524.
- Buyse, M.; Sargent, D.J.; Grothey, A.; Matheson, A.; de Gramont, A. Biomarkers and surrogate end points—The challenge of statistical validation. Nat. Rev. Clin. Oncol. **2010**, 7, 309–317.
- Mandrekar, S.J.; Sargent, D.J. Clinical trial designs for predictive biomarker validation: Theoretical considerations and practical challenges. J. Clin. Oncol. **2009**, 27, 4027–4034.
- Mandrekar, S.J.; Sargent, D.J. Clinical trial designs for predictive biomarker validation: One size does not fit all. J. Biopharm. Stat. **2009**, 19, 530–542.
- Hoering, A.; Leblanc, M.; Crowley, J.J. Randomized phase III clinical trial designs for targeted agents. Clin. Cancer Res. **2008**, 14, 4358–4367.
- Kelloff, G.J.; Sigman, C.C. Cancer biomarkers: Selecting the right drug for the right patient. Nat. Rev. Drug Discov. **2012**, 11, 201–214.
- Chow, S.-C. Adaptive clinical trial design. Annu. Rev. Med. **2014**, 65, 405–415.
- Antoniou, M.; Jorgensen, A.L.; Kolamunnage-Dona, R. Biomarker-guided adaptive trial designs in phase II and phase III: A methodological review. PLoS ONE **2016**, 11, e0149803.
- Tajik, P.; Zwinderman, A.H.; Mol, B.W.; Bossuyt, P.M. Trial designs for personalizing cancer care: A systematic review and classification. Clin. Cancer Res. **2013**, 19, 4578–4588.
- Lader, E.W.; Cannon, C.P.; Ohman, E.M.; Newby, L.K.; Sulmasy, D.P.; Barst, R.J.; Fair, J.M.; Flather, M.; Freedman, J.E.; Frye, R.L.; et al. The clinician as investigator: Participating in clinical trials in the practice setting: Appendix 1: Fundamentals of study design. Circulation **2004**, 109, e302–e304.
- Stingl Kirchheiner, J.C.; Brockmöller, J. Why, when, and how should pharmacogenetics be applied in clinical studies? Current and future approaches to study designs. Clin. Pharm. Ther. **2011**, 89, 198–209.
- Sambucini, V. A bayesian predictive two-stage design for phase II clinical trials. Stat. Med. **2008**, 27, 1199–1224.
- Ang, M.-K.; Tan, S.-B.; Lim, W.-T. Phase II clinical trials in oncology: Are we hitting the target? Expert Rev. Anticancer Ther. **2010**, 10, 427–438.
- Farley, J.; Rose, P.G. Trial design for evaluation of novel targeted therapies. Gynecol. Oncol. **2010**, 116, 173.
- Kaplan, R.; Maughan, T.; Crook, A.; Fisher, D.; Wilson, R.; Brown, L.; Parmar, M. Evaluating many treatments and biomarkers in oncology: A new design. J. Clin. Oncol. **2013**, 31, 4562–4568.
- Hodgson, D.R.; Wellings, R.; Harbron, C. Practical perspectives of personalized healthcare in oncology. New Biotechnol. **2012**, 29, 656–664.
- Mandrekar, S.J.; Sargent, D.J. Predictive biomarker validation in practice: Lessons from real trials. Clin. Trials **2010**, 7, 567–573.
- Galanis, E.; Wu, W.; Sarkaria, J.; Chang, S.M.; Colman, H.; Sargent, D.; Reardon, D.A. Incorporation of biomarker assessment in novel clinical trial designs: Personalizing brain tumor treatments. Curr. Oncol. Rep. **2011**, 13, 42–49.
- Van Schaeybroeck, S.; Allen, W.L.; Turkington, R.C.; Johnston, P.G. Implementing prognostic and predictive biomarkers in CRC clinical trials. Nat. Rev. Clin. Oncol. **2011**, 8, 222–232.
- Buyse, M.; Michiels, S.; Sargent, D.J.; Grothey, A.; Matheson, A.; de Gramont, A. Integrating biomarkers in clinical trials. Expert Rev. Mol. Diagn. **2011**, 11, 171–182.
- Sparano, J.A.; Paik, S. Development of the 21-gene assay and its application in clinical practice and clinical trials. J. Clin. Oncol. **2008**, 26, 721–728.
- Freidlin, B.; Korn, E.L. Biomarker enrichment strategies: Matching trial design to biomarker credentials. Nat. Rev. Clin. Oncol. **2014**, 11, 81–90.
- Simon, R.; Polley, E. Clinical trials for precision oncology using next-generation sequencing. Personal. Med. **2013**, 10, 485–495.
- Baker, S.G.; Kramer, B.S.; Sargent, D.J.; Bonetti, M. Biomarkers, subgroup evaluation, and clinical trial design. Discov. Med. **2012**, 13, 187–192.
- Buch, M.H.; Pavitt, S.; Parmar, M.; Emery, P. Creative trial design in RA: Optimizing patient outcomes. Nat. Rev. Rheumatol. **2013**, 9, 183–194.
- Simon, R. Clinical trials for predictive medicine. Stat. Med. **2012**, 31, 3031–3040.
- Scher, H.I.; Nasso, S.F.; Rubin, E.H.; Simon, R. Adaptive clinical trial designs for simultaneous testing of matched diagnostics and therapeutics. Clin. Cancer Res. **2011**, 17, 6634–6640.
- Sato, Y.; Laird, N.M.; Yoshida, T. Biostatistic tools in pharmacogenomics—Advances, challenges, potential. Curr. Pharm. Des. **2010**, 16, 2232–2240.
- Mandrekar, S.J.; Sargent, D.J. Genomic advances and their impact on clinical trial design. Genome Med. **2009**, 1, 69.
- Simon, R. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics. Expert Opin. Med. Diagn. **2008**, 2, 721–729.
- Dobbin, K.K. Statistical design and evaluation of biomarker studies. Methods Mol. Biol. **2014**, 1102, 667–677.
- Ananthakrishnan, R.; Menon, S. Design of oncology clinical trials: A review. Crit. Rev. Oncol./Hematol.
**2013**, 88, 144–153. [Google Scholar] [CrossRef] [PubMed] - Simon, R. The use of genomics in clinical trial design. Clin. Cancer Res.
**2008**, 14, 5984–5993. [Google Scholar] [CrossRef] [PubMed] - Freidlin, B.; McShane, L.M.; Korn, E.L. Randomized clinical trials with biomarkers: Design issues. J. Natl. Cancer Inst.
**2010**, 102, 152–160. [Google Scholar] [CrossRef] [PubMed] - Johnson, D.R.; Galanis, E. Incorporation of prognostic and predictive factors into glioma clinical trials. Curr. Oncol. Rep.
**2013**, 15, 56–63.
- Sparano, J. TAILORx: Trial assigning individualized options for treatment (Rx). Clin. Breast Cancer **2006**, 7, 347–350.
- Di Maio, M.; Gallo, C.; De Maio, E.; Morabito, A.; Piccirillo, M.C.; Gridelli, C.; Perrone, F. Methodological aspects of lung cancer clinical trials in the era of targeted agents. Lung Cancer **2010**, 67, 127–135.
- Maitournam, A.; Simon, R. On the efficiency of targeted clinical trials. Stat. Med. **2005**, 24, 329–339.
- Collette, L.; Bogaerts, J.; Suciu, S.; Fortpied, C.; Gorlia, T.; Coens, C.; Mauer, M.; Hasan, B.; Collette, S.; Ouali, M.; et al. Statistical methodology for personalized medicine: New developments at EORTC headquarters since the turn of the 21st century. Eur. J. Cancer Suppl. **2012**, 10, 13.
- Mandrekar, S.J.; Sargent, D.J. All-comers versus enrichment design strategy in phase II trials. J. Thorac. Oncol. **2011**, 6, 658–660.
- Simon, R. Development and validation of biomarker classifiers for treatment selection. J. Stat. Plan. Inference **2008**, 138, 308–320.
- Freidlin, B.; Korn, E.L.; Gray, R. Marker sequential test (MaST) design. Clin. Trials **2014**, 11, 19–27.
- Wason, J.; Marshall, A.; Dunn, J.; Stein, R.C.; Stallard, N. Adaptive designs for clinical trials assessing biomarker-guided treatment strategies. Br. J. Cancer **2014**, 110, 1950–1957.
- Freidlin, B.; McShane, L.M.; Polley, M.-Y.C.; Korn, E.L. Randomized phase II trial designs with biomarkers. J. Clin. Oncol. **2012**, 30, 3304–3309.
- Ziegler, A.; Koch, A.; Krockenberger, K.; Grosshennig, A. Personalized medicine using DNA biomarkers: A review. Hum. Genet. **2012**, 131, 1627–1638.
- Freidlin, B.; Korn, E.L. Biomarker-adaptive clinical trial designs. Pharmacogenomics **2010**, 11, 1679–1682.
- Eickhoff, J.C.; Kim, K.; Beach, J.; Kolesar, J.M.; Gee, J.R. A Bayesian adaptive design with biomarkers for targeted therapies. Clin. Trials **2010**, 7, 546–556.
- Ferraldeschi, R.; Attard, G.; de Bono, J.S. Novel strategies to test biological hypotheses in early drug development for advanced prostate cancer. Clin. Chem. **2013**, 59, 75–84.
- Coyle, V.M.; Johnston, P.G. Genomic markers for decision making: What is preventing us from using markers? Nat. Rev. Clin. Oncol. **2010**, 7, 90–97.
- Chen, C.F.; Lin, J.R.; Liu, J.P. Statistical inference on censored data for targeted clinical trials under enrichment design. Pharm. Stat. **2013**, 12, 165–173.
- Liu, J.P.; Lin, J.R. Statistical methods for targeted clinical trials under enrichment design. J. Formos. Med. Assoc. **2008**, 107, 35–42.
- Scheibler, F.; Zumbé, P.; Janssen, I.; Viebahn, M.; Schröer-Günther, M.; Grosselfinger, R.; Hausner, E.; Sauerland, S.; Lange, S. Randomized controlled trials on PET: A systematic review of topics, design, and quality. J. Nucl. Med. **2012**, 53, 1016–1025.
- An, M.-W.; Mandrekar, S.J.; Sargent, D.J. A 2-stage phase II design with direct assignment option in stage II for initial marker validation. Clin. Cancer Res. **2012**, 18, 4225–4233.
- Zheng, G.; Wu, C.O.; Yang, S.; Waclawiw, M.A.; DeMets, D.L.; Geller, N.L. NHLBI clinical trials workshop: An executive summary. Stat. Med. **2012**, 31, 2938.
- Bria, E.; Di Maio, M.; Carlini, P.; Cuppone, F.; Giannarelli, D.; Cognetti, F.; Milella, M. Targeting targeted agents: Open issues for clinical trial design. J. Exp. Clin. Cancer Res. **2009**, 28, 66.
- French, B.; Joo, J.; Geller, N.L.; Kimmel, S.E.; Rosenberg, Y.; Anderson, J.L.; Gage, B.F.; Johnson, J.A.; Ellenberg, J.H. Statistical design of personalized medicine interventions: The Clarification of Optimal Anticoagulation through Genetics (COAG) trial. Trials **2010**, 11, 108.
- Lin, J.-A.; He, P. Reinventing clinical trials: A review of innovative biomarker trial designs in cancer therapies. Br. Med. Bull. **2015**, 114, 17–27.
- Renfro, L.A.; Mallick, H.; An, M.-W.; Sargent, D.J.; Mandrekar, S.J. Clinical trial designs incorporating predictive biomarkers. Cancer Treat. Rev. **2016**, 43, 74–82.
- Ondra, T.; Dmitrienko, A.; Friede, T.; Graf, A.; Miller, F.; Stallard, N.; Posch, M. Methods for identification and confirmation of targeted subgroups in clinical trials: A systematic review. J. Biopharm. Stat. **2016**, 26, 99–119.
- Simon, R.; Wang, S.J. Use of genomic signatures in therapeutics development in oncology and other diseases. Pharmacogenom. J. **2006**, 6, 166–173.
- European Medicines Agency. Reflection Paper on Methodological Issues Associated with Pharmacogenomic Biomarkers in Relation to Clinical Development and Patient Selection. Available online: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2011/07/WC500108672.pdf (accessed on 10 October 2015).
- Lai, T.L.; Liao, O.Y.-W.; Kim, D.W. Group sequential designs for developing and testing biomarker-guided personalized therapies in comparative effectiveness research. Contemp. Clin. Trials **2013**, 36, 651–663.
- Foley, R.N. Analysis of randomized controlled clinical trials. Methods Mol. Biol. **2009**, 473, 113–126.
- Tajik, P.; Bossuyt, P.M. Genomic markers to tailor treatments: Waiting or initiating? Hum. Genet. **2011**, 130, 15–18.
- Eng, K.H. Randomized reverse marker strategy design for prospective biomarker validation. Stat. Med. **2014**, 33, 3089–3099.
- Baker, S.G. Biomarker evaluation in randomized trials: Addressing different research questions. Stat. Med. **2014**, 33, 4139–4140.
- Matsui, S.; Choai, Y.; Nonaka, T. Comparison of statistical analysis plans in randomize-all phase III trials with a predictive biomarker. Clin. Cancer Res. **2014**, 20, 2820–2830.
- Cappuzzo, F.; Ciuleanu, T.; Stelmakh, L.; Cicenas, S.; Szczésna, A.; Juhász, E.; Esteban, E.; Molinier, O.; Brugger, W.; Melezínek, I.; et al. Erlotinib as maintenance treatment in advanced non-small-cell lung cancer: A multicentre, randomised, placebo-controlled phase 3 study. Lancet Oncol. **2010**, 11, 521.
- Hoffmann-La Roche. A Randomized, Double-Blind Study to Evaluate the Effect of Tarceva or Placebo Following Platinum-Based CT on Overall Survival and Disease Progression in Patients with Advanced, Recurrent or Metastatic NSCLC Who Have Not Experienced Disease Progression or Unacceptable Toxicity during Chemotherapy. Available online: https://clinicaltrials.gov/ct2/show/NCT00556712?term=NCT00556712&rank=1 (accessed on 10 October 2015).
- Choai, Y.; Matsui, S. Estimation of treatment effects in all-comers randomized clinical trials with a predictive marker: Estimating treatment effects in marker-based randomized trials. Biometrics **2015**, 71, 25.
- Wang, S.-J.; O’Neill, R.T.; Hung, H.M.J. Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharm. Stat. **2007**, 6, 227–244.
- Cree, I.A.; Kurbacher, C.M.; Lamont, A.; Hindley, A.C.; Love, S. A prospective randomized controlled trial of tumour chemosensitivity assay directed chemotherapy versus physician’s choice in patients with recurrent platinum-resistant ovarian cancer. Anti-Cancer Drugs **2007**, 18, 1093.
- Cobo, M.; Isla, D.; Massuti, B.; Montes, A.; Sanchez, J.M.; Provencio, M.; Viñolas, N.; Paz-Ares, L.; Lopez-Vivanco, G.; Muñoz, M.A.; et al. Customizing cisplatin based on quantitative excision repair cross-complementing 1 mRNA expression: A phase III trial in non-small-cell lung cancer. J. Clin. Oncol. **2007**, 25, 2747.
- Lijmer, J.G.; Bossuyt, P.M.M. Various randomized designs can be used to evaluate medical tests. J. Clin. Epidemiol. **2009**, 62, 364.
- Wang, S.-J. Biomarker as a classifier in pharmacogenomics clinical trials: A tribute to 30th anniversary of PSI. Pharm. Stat. **2007**, 6, 283–296.
- Cho, D.; McDermott, D.; Atkins, M. Designing clinical trials for kidney cancer based on newly developed prognostic and predictive tools. Curr. Urol. Rep. **2006**, 7, 8–15.
- Af Geijerstam, J.L.; Oredsson, S.; Britton, M. Medical outcome after immediate computed tomography or admission for observation in patients with mild head injury: Randomised controlled trial. Br. Med. J. **2006**, 333, 465.
- Ferrante di Ruffano, L.; Davenport, C.; Eisinga, A.; Hyde, C.; Deeks, J.J. A capture-recapture analysis demonstrated that randomized controlled trials evaluating the impact of diagnostic tests on patient outcomes are rare. J. Clin. Epidemiol. **2012**, 65, 282.
- Mandrekar, S.J.; Grothey, A.; Goetz, M.P.; Sargent, D.J. Clinical trial designs for prospective validation of biomarkers. Am. J. Pharmacogenom. **2005**, 5, 317–325.
- Therasse, P.; Carbonnelle, S.; Bogaerts, J. Clinical trials design and treatment tailoring: General principles applied to breast cancer research. Crit. Rev. Oncol. Hematol. **2006**, 59, 98–105.
- Sargent, D.; Allegra, C. Issues in clinical trial design for tumor marker studies. Semin. Oncol. **2002**, 29, 222–230.
- Tanniou, J.; van der Tweel, I.; Teerenstra, S.; Roes, K.C.B. Subgroup analyses in confirmatory clinical trials: Time to be specific about their purposes. BMC Med. Res. Methodol. **2016**, 16, 20.
- Rubinstein, L.V.; Gail, M.H.; Santner, T.J. Planning the duration of a comparative clinical trial with loss to follow-up and a period of continued observation. J. Chronic Dis. **1981**, 34, 469–479.
- Simon, R.; Maitournam, A. Evaluating the efficiency of targeted designs for randomized clinical trials: Supplement and correction. Clin. Cancer Res. **2006**, 12, 3229.
- Simon, R.; Maitournam, A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clin. Cancer Res. **2004**, 10, 6759.
- Biomarker targeted randomized design. Available online: http://brb.nci.nih.gov/brb/samplesize/td.html (accessed on 15 September 2016).
- Harrington, R.A. Applied Bioinformatics and Biostatistics in Cancer Research. In Designs for Clinical Trials: Perspectives on Current Issues; Springer Science+Business Media, LLC: New York, NY, USA, 2012.
- Biomarker stratified randomized design. Available online: https://brb.nci.nih.gov/brb/samplesize/sdpap.html (accessed on 15 September 2016).
- Freidlin, B. Randomized phase II trial designs with biomarkers. Available online: http://brb.nci.nih.gov/Data/FreidlinB/RP2BM (accessed on 15 September 2016).
- Tajik, P.; Zwinderman, A.H.; Mol, B.W.; Bossuyt, P.M. Evaluating putative predictive biomarkers in randomized clinical trials. Available online: http://www.zonmw.nl/fileadmin/documenten/DO_Farmacotherapie_Dure_Weesgeneesmiddelen/HTA_pharmacotherapy_predictive_markers_guidance_document.pdf (accessed on 15 September 2016).
- Zaslavsky, B.G.; Scott, J. Sample size estimation in single-arm clinical trials with multiple testing under frequentist and Bayesian approaches. J. Biopharm. Stat. **2012**, 22, 819–835.
- Wittes, J. Sample size calculations for randomized controlled trials. Epidemiol. Rev. **2002**, 24, 39–53.
- Collette, L. Chapman & Hall/CRC Texts in Statistical Science Series. In Modelling Survival Data in Medical Research, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2003.
- Kleinbaum, D.G.; Klein, M. Statistics for Biology and Health. Survival Analysis: A Self-Learning Text, 3rd ed.; Springer: New York, NY, USA, 2012.
- Freedman, L.S. Tables of the number of patients required in clinical trials using the logrank test. Stat. Med. **1982**, 1, 121–129.
- Schoenfeld, D.A. Sample-size formula for the proportional-hazards regression model. Biometrics **1983**, 39, 499–503.
- Bland, J.M.; Altman, D.G. Multiple significance tests: The Bonferroni method. Br. Med. J. **1995**, 310, 170.
- Jiang, W.; Freidlin, B.; Simon, R. Biomarker-adaptive threshold design: A procedure for evaluating treatment with possible biomarker-defined subset effect. J. Natl. Cancer Inst. **2007**, 99, 1036–1043.
- Wang, S.-J.; Hung, H.M.J.; O’Neill, R.T. Adaptive patient enrichment designs in therapeutic trials. Biom. J. **2009**, 51, 358–374.
- Alosh, M.; Huque, M.F. A flexible strategy for testing subgroups and overall population. Stat. Med. **2009**, 28, 3.
- Spiessens, B.; Debois, M. Adjusted significance levels for subgroup analyses in clinical trials. Contemp. Clin. Trials **2010**, 31, 647.
- Song, Y.; Chi, G.Y.H. A method for testing a prespecified subgroup in clinical trials. Stat. Med. **2007**, 26, 3535.
- Chang, M. Chapman & Hall/CRC Biostatistics Series. In Adaptive Design Theory and Implementation Using SAS and R, 2nd ed.; CRC Press: London, UK, 2014.
- Dimairo, M.; Boote, J.; Julious, S.A.; Nicholl, J.P.; Todd, S. Missing steps in a staircase: A qualitative study of the perspectives of key stakeholders on the use of adaptive designs in confirmatory trials. Trials **2015**, 16, 430.
- Spira, A.; Edmiston, K.H. Clinical trial design in the age of molecular profiling. Methods Mol. Biol. **2012**, 823, 19–34.
- Medline Plus basic course manual 2012. Available online: http://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&frm=1&source=web&cd=1&ved=0ahUKEwjS7_OmodvJAhWGVhQKHZr0AZMQFggdMAA&url=http%3A%2F%2Fbma.org.uk%2F-%2Fmedia%2Ffiles%2Fpdfs%2Fabout%2520the%2520bma%2Flibrary%2Fmedline%2520plus%2520basic%2520course%2520manual%25202012.pdf&usg=AFQjCNGFxcWiS11CJsroeeIETAWjW0neUA (accessed on 15 September 2016).

Types of Biomarker-Guided Non-Adaptive Trial Designs | Utility | Advantages | Limitations |
---|---|---|---|
Single arm designs (7 papers) [30,36,37,38,39,40,41] (see Figure 2) | Useful for initial identification and/or validation of a biomarker. | (A1) Considered a simple statistical design as there is no need for randomization of patients. | (L1) No distinction can be made between prognostic and predictive biomarkers as patients are not randomized to experimental and control treatment arms. |
Also called: Nonrandomized clinical trial design, Uncontrolled Cohort Pharmacogenetic Study design | | (A2) Simple logistics. | |
Examples of actual trials: None identified ^{a} | | (A3) Not a complex statistical design. | |
| | | (A4) In some cases, these designs may be viewed as ethical as all patients are given the opportunity to receive the experimental treatment. However, they may be viewed as unethical if the novel treatment does not benefit a subgroup of patients or causes adverse events. | |
Enrichment designs (71 papers) [1,4,7,8,9,11,13,15,16,18,19,21,23,25,26,27,28,29,30,31,32,33,36,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86] (see Figure 3) | Useful when we aim to test the treatment effect only in the biomarker-positive subset for which there is prior evidence that the novel treatment is beneficial, but the candidate biomarker requires prospective validation. | (A5) Evaluates the effect of the experimental treatment in the biomarker-positive subgroup in a simple and efficient way. | (L2) Does not assess whether the experimental treatment benefits the biomarker-negative patients, so no information is obtained about this subgroup. Also unable to demonstrate whether the targeted treatment is beneficial in the entire study population. |
Also called: Targeted design, Selection design, Efficient Targeted design, Biomarker-Enrichment design, Marker-enrichment design, Gene enrichment design, Enriched design, Clinically enriched Phase III study design, Clinically Enriched Trial design, Biomarker-Enriched design, Biomarker Enriched design, Biomarker Selected trial design, Screening enrichment design, Randomized Controlled Trial (RCT) of test positive design, Population enrichment design | Useful when it is not ethical to assign biomarker-negative patients to the novel treatment because there is prior evidence that it will not be beneficial for this subpopulation, or that it will harm them. | (A6) Provides clear information about whether the novel treatment is effective for the biomarker-positive subgroup, thus these designs can identify the best treatment for these patients and confirm the usefulness of the biomarker. | (L3) Does not inform us directly about whether the biomarker is itself predictive, because the relative treatment efficacy may be the same in the unevaluated biomarker-negative patients. Since these designs only enrol a subgroup of patients, they do not allow for full validation of the marker’s predictive ability. For full validation, a trial would need to randomize all patients in order to test for a treatment–biomarker interaction. |
Examples of actual trials: CRYSTAL [49], BRIM 3 [49,50,51], EURTAC [49], CLEOPATRA [49], PROFILE 1007 [49,50], LUX-Lung [49], NSABP B-31 and NCCTG N9831 [4,15,16,18,19,28,29,30,31,36,44,46,52,53,54,55,56,57,58,59,60], CALGB-10603 [61], CATNON [62], CODEL [62], Evaluation of epidermal growth factor receptor variant III (EGFRvIII) peptide vaccination [62], N0923 [7,21], Flex study [64], TOGA trial [47], IPASS [33,43], N0147 [29], PetaCC-8 [29,47], C80405 [29], ECOG E5202 [29] | Recommended when both the cut-off point for determining the biomarker status of patients and the analytical validity of the biomarker are well established. | (A7) Reduced sample size as the assessment of treatment effect is restricted to the biomarker-positive subgroup. Therefore, if the selected biomarker is “biologically correct” and reliably measured, the enrichment strategy could result in a large saving of randomized patients. | (L4) Researchers should carefully decide whether or not to follow this strategy as it may be of limited value due to the exclusion of biomarker-negative patients. It may be that the entire population could benefit from the experimental treatment equally irrespective of biomarker status, in which case enrolling only the biomarker-positive patients will result in slower trial accrual, increased expense and unnecessary limitation of the size of the indicated patient population. |
| | | (A8) Enables rapid accumulation of efficacy data. | (L5) Raises an ethical and regulatory concern highlighted by the US Food and Drug Administration (FDA) [50]: individuals cannot be included in a clinical trial if the treatment is believed to be ineffective for them, yet the experimental treatment can only be approved for a particular biomarker-defined subpopulation (i.e., biomarker-positive patients) if a companion diagnostic test is also approved, and it is unclear how the test can be approved if the Phase III trial does not show that the novel treatment fails to benefit the biomarker-negative patients. |
| | | (A9) Because biomarker-negative patients are excluded, potential dilution of the results is avoided. For example, if the design had included the biomarker-negative population and the biomarker-positive rate was low compared with the biomarker-negative rate, then the estimate of the overall treatment effectiveness could be diluted as it would be driven by the biomarker-negative subset. | (L6) Diagnostic devices used to identify the biomarkers, e.g., biomarker assays, are not always accurate [45]. This can result in patients being incorrectly classified as biomarker-positive and erroneously enrolled in the trial, yielding biased treatment effect estimates. For example, even when the experimental treatment works well for a specific subgroup, if the biomarker assay is not able to identify this subgroup robustly then a promising treatment may be abandoned. |
| | | (A10) Can be attractive in terms of speed and cost, meaning that patients are provided with tailored treatment sooner. | |
Marker Stratified designs (45 papers) [4,10,12,13,15,16,17,18,19,21,25,26,27,30,31,33,44,45,46,49,50,51,53,58,61,62,66,68,71,72,73,74,79,80,81,84,85,86,87,88,89,90,91,92,93] (see Figure 4) | Useful when there is evidence that the novel treatment is more effective in the biomarker-positive subgroup than in the biomarker-negative subgroup, but there are insufficient compelling data indicating that the experimental treatment does not benefit the biomarker-negative patients. | (A11) Ability to assess the treatment effect not only in the entire population but also in each biomarker-defined subgroup. Thus, this design can find the optimal treatment in the entire population and in each biomarker-defined subgroup. | (L7) In situations where there are several biomarkers and treatments this design may not be feasible as it involves randomization of patients between all possible treatment options and may require a large sample size. |
Also called: Marker-stratified design, Biomarker-stratified design, Stratified-Randomized design, Stratification design, Stratified design, Stratified Analysis design, Marker by treatment – interaction design, Marker-by-treatment interaction design, Treatment by marker interaction design, Treatment-by-marker interaction design, Marker × treatment interaction design, Treatment-marker interaction design, Biomarker-by-treatment interaction design, Non-targeted RCT (stratified by marker) design, Genomic Signature stratified designs, Signature-Stratified design, Randomization or analysis stratified by biomarker status design, marker-interaction design. | | (A12) An ethical design even in situations where the biomarker is not useful, as no treatment decisions are made based on biomarker status; all treatment allocation is random. Consequently, if the biomarker’s value is in doubt, this design may be preferred. | (L8) May not be feasible when the prevalence of the biomarker is low. |
Examples of actual trials: MARVEL (N0723) [4,16,30,31,33,44,61,89], CALGB-30506 [15,61], RTOG0825 [45], EORTC 10994 p53 [12,66], IBCSG trial IX [18], MINDACT [18] | | | (L9) Might be expensive to test the entire population for its biomarker status. |
| | | | (L10) Measuring the biomarker up front may be logistically difficult. |
| | | | (L11) There is no guarantee of balanced groups for analysis. |
Sequential Subgroup-Specific design (11 papers) [13,14,19,22,53,57,58,60,69,91,94] (see Figure 5) | Recommended when prior evidence indicates that the biomarker-positive subpopulation benefits more from the novel treatment than the biomarker-negative subpopulation. | (A13) Allows for the estimation of treatment effect in biomarker-positive and biomarker-negative subgroups. | (L12) Has less power than the overall/biomarker-positive designs when the treatment effect is homogeneous across the biomarker-defined subgroups. |
Also called: sequential design, Fixed-sequence 2 design, hierarchical fixed sequence testing procedure | | (A14) Preserves the overall type I error rate and allows for a smaller sample size than the parallel version mentioned below. | (L13) Needs a much larger sample size than the overall/biomarker-positive designs if we assume that the treatment effect is relatively homogeneous across the biomarker-defined subsets. |
Examples of actual trials: PRIME [49], MARVEL [49] | | (A15) Considered to provide the best direct evidence for clinical decision making as it tests the treatment effectiveness in both the biomarker-positive and biomarker-negative subsets sequentially. | |
| | | (A16) Does not require a larger sample size than the overall/biomarker-positive designs when the prevalence of biomarker-positive patients is small. | |
Parallel Subgroup-Specific design (3 papers) [14,49,69] (see Figure 6) | Appropriate when the aim of the study is to give treatment recommendations for each biomarker-defined subgroup separately at the same time. | (A17) Same as (A13), (A16) | (L14) Same as (L12) |
Also called: Phase III Biomarker-Stratified design | | | (L15) Allocates the overall significance level $\alpha$ between the two biomarker-defined subgroup tests, which means that it will be more difficult to achieve statistical significance in the biomarker-positive subgroup. |
Examples of actual trials: None identified ^{a} | | | |
Biomarker-positive and overall strategies with parallel assessment (8 papers) [1,14,36,47,49,69,95,96] (see Figure 7) | Recommended when the aim of the study is to assess the treatment effect in both the entire population and in the biomarker-positive subset but not in the biomarker-negative population. | (A18) Can control the overall type I error rate $\alpha$. | (L16) Can be overly conservative, as in the SATURN trial, because the split of the significance level does not account for the correlation between the tests of treatment effect in the overall study population and in the biomarker-defined subgroup. |
Also called: Overall/biomarker-positive design with parallel assessment, prospective subset design, hybrid design | | (A19) Can require a smaller sample size than the subgroup-specific designs, especially when we assume that the novel treatment equally benefits both biomarker-defined subgroups. | (L17) Cannot control the probability of rejecting the null hypothesis of no treatment effect in the biomarker-negative subset when the treatment benefit is restricted to biomarker-positive patients. Consequently, there is a high risk of inappropriately recommending the novel treatment for biomarker-negative patients due to the large treatment effect in the biomarker-positive subset. |
Examples of actual trials: S0819 [14,49], SATURN [14,36,47,49,95,96], MONET1 [14,49], ARCHER [14,49], ZODIAC [49], MERiDiAN [49] | | | |
Biomarker-positive and overall strategies with sequential assessment (11 papers) [13,14,30,44,49,69,80,84,85,88,94] (see Figure 8) | Might be useful in cases where the experimental treatment is expected to be effective in the overall population. | (A20) Same as (A18), (A19) | (L18) Can be problematic for determining whether the treatment is beneficial in the biomarker-negative subgroup. |
Also called: Overall/biomarker-positive design with sequential assessment, sequential design, Fixed-sequence 2 design, hierarchical fixed sequence testing procedure | | | (L19) Same as (L17) |
Examples of actual trials: Trial of letrozole plus lapatinib versus letrozole plus placebo in breast cancer, with the biomarker defined by human epidermal growth factor receptor 2 (HER2) [14], N0147 [30,49] | | | |
Biomarker-positive and overall strategies with fall-back analysis (15 papers) [10,30,36,44,47,49,53,57,60,69,84,88,94,96,97] (see Figure 9) | Recommended when there is insufficient confidence in the predictive value of the biomarker and the novel treatment is assumed to probably benefit all patients. | (A21) Can assess the treatment effect in the biomarker-positive patients, if no benefit is detected in the overall population. | (L20) Same as (L17), (L18) |
Also called: Biomarker-stratified design with fall-back analysis, fall-back design, prospective subset design, sequential design, other analysis plan design, Fallback design | | (A22) Same as (A18), (A19) | |
Examples of actual trials: None identified ^{a} | | | |
Marker Sequential test design (4 papers) [14,49,69,94] (see Figure 10) | Recommended when biomarkers with strong credentials are available and we have convincing evidence that the novel treatment is more effective in biomarker-positive than in biomarker-negative patients. | (A23) Can provide clear evidence of treatment benefit in the biomarker-positive subgroup and in the biomarker-negative subgroup. | (L21) In situations where biomarker status is not available for some of the patients included in the study, this design can either exclude these patients or include them in the global test, however, further statistical adjustments might be required in that case. |

Also called: MaST design, hybrid design | Appropriate when we can assume that the treatment will not be beneficial in the biomarker-negative subpopulation unless it is effective for the biomarker-positive subpopulation. | (A24) Enables sequential testing of the treatment effect in the entire study population and in the biomarker-defined subgroups to restrict testing of the treatment effect in the entire population when there is no significant result in the biomarker-positive subset, while controlling the appropriate type I error rates. | (L22) Does not decrease the sample size of the study as it was developed in order to increase the power compared to the sequential subgroup-specific design in situations where the novel treatment benefits equally both biomarker-negative and biomarker-positive patients. |

Examples of actual trials: ECOG E1910 [14,49] | (A25) Results in higher power as compared to the sequential subgroup-specific design in cases where the treatment effect is homogeneous across the biomarker-defined subgroups. | ||

(A26) Preserves the power in situations where the treatment effect is restricted only to the biomarker-positive patients and at the same time it controls the relevant type I error rates. | |||

(A27) Controls the type I error rate for the biomarker-negative subgroup over all possible prevalence values. | |||

(A28) The probability of erroneously concluding that the novel treatment is beneficial for the entire population when the global effect is driven by the biomarker-positive patients is minimized since the design only tests the treatment effect in the entire population when no significant effect is detected in the biomarker-positive subgroup. | |||

Hybrid designs (14 papers) [1,13,15,29,30,31,36,46,48,55,66,84,88,98] (see Figure 11) | Can be used when there is prior evidence indicating that only a particular treatment is beneficial to a biomarker-defined subgroup which makes it unethical to randomize patients with that specific biomarker status to other treatment options. | (A29) The feasibility of a prognostic biomarker can be tested. | None found. |

Also called: Mixture design, Combination of trial designs, hybrid biomarker design | (A30) Allows for better risk assessment and improved individualized treatment since it assigns patients to treatments based on risk assessment scores instead of their biomarker status (biomarker-positive and biomarker-negative patients). | ||

Examples of actual trials: TAILORx [15,48,55,58,63,66], EORTC MINDACT [15,48,55,66], ECOG 5202 study [30,46] | |||

Biomarker-strategy designs with biomarker assessment in the control arm (21 papers) [15,25,26,32,33,36,45,61,62,64,79,82,85,86,92,93,99,100,101,102,103] (see Figure 12) | Useful when we want to test the hypothesis that the treatment effect based on the personalized approach is superior to that of the standard of care. | (A31) Biomarker can be validated without including all possible biomarker–treatment combinations [26] as in the non-biomarker-based arm all patients receive only the control treatment. | (L23) Unable to inform us whether the biomarker is predictive, as these designs can only answer the question of whether the biomarker-based strategy is more effective than standard treatment, irrespective of the biomarker status of the study population. |

Also called: Marker strategy design, Biomarker-strategy design, Strategy design, Marker-based strategy design, Marker-based design, Random disclosure design, Customized strategy design, Parallel controlled pharmacogenetic study design, Marker-based strategy design I, Biomarker-guided design, Biomarker-based assignment of specific drug therapy design, Marker-based strategy I design, Biomarker-strategy design with a standard control, Marker strategy design for prognostic biomarkers | (A32) Have the option of testing the biomarker status of patients in the non-biomarker-strategy arm which can aid secondary analyses [26]. | (L24) The evaluation of the true biomarker by treatment effect is not possible as the biomarker-positive patients receive only the experimental treatment and not the alternative treatment (control treatment). Consequently, this design cannot detect the case in which the control treatment might be more beneficial for the entire population. | |

Examples of actual trials: GILT docetaxel [15]; a randomized phase III trial conducted in Spain in patients with advanced Non-Small Cell Lung Cancer (NSCLC) who were candidates for first-line chemotherapy [32,64,100]; studies evaluating the effect of Magnetic Resonance Imaging (MRI) on patient outcome in patients with low back pain, and Doppler ultrasound (US) of the umbilical artery in the management of women with intrauterine growth retardation (IUGR); a randomized controlled trial in recurrent platinum-resistant ovarian carcinoma [101] | (A33) Able to inform us whether the biomarker is prognostic. | (L25) If the number of biomarker-positive patients is very small, the treatment received will be similar in the biomarker-strategy arm and the non-biomarker-strategy arm. Consequently, the trial might give little information regarding the efficacy of the experimental treatment or it might not be able to detect it. As a result, this type of design should be used when there is an adequate number of biomarker-positive and biomarker-negative patients. | |

(A34) Can be expanded to investigate several biomarkers and treatments [103]. Additionally, these designs can be attractive when evaluating multiple biomarkers or the predictive value of molecular profiling between several treatment options is to be assessed [45]. | (L26) Unable to compare directly experimental treatment to control treatment as the aim is to compare not the treatments but the biomarker-strategies. | ||

(A35) Might be used more frequently in the future due to the wide variety of molecular biomarkers, complexity of gene expression arrays, and several treatments directed at similar targets [103]. | (L27) Less efficient than biomarker-stratified designs [4,73] and a poor substitute for clinical trials which aim to compare the experimental treatment to control treatment, since some patients in both the biomarker-based strategy arm and the non-biomarker-based strategy arm can be assigned to the same treatment (because biomarker-negative patients are present in both strategy arms, the treatment effect can be diluted) [51]. Consequently, when a large proportion of patients in the two arms receive the same treatment, the comparison of the two biomarker-strategy arms yields a hazard ratio which is forced towards unity (i.e., towards no treatment effect), as the effect of experimental versus control treatment is diluted by the biomarker-based treatment selection. For this reason, a large sample size is needed to detect even a small overall difference in outcomes between the two biomarker-strategy arms. | ||

(L28) Should be used only if you want to evaluate a complex biomarker-guided strategy with a variety of treatment options or biomarker categories [73]. | |||

Biomarker-strategy design without biomarker assessment in the control arm (14 papers) [9,13,17,18,20,25,36,38,61,74,101,104,105,106] (see Figure 13) | Useful in situations where it is infeasible or unethical to test the biomarker in the entire population. | (A36) Galanis et al., 2011 [45] stated that these designs can be attractive when multiple biomarkers are to be evaluated or when the predictive value of molecular profiling between several treatment options is to be assessed. Also, Freidlin and Korn, 2010 [73] claimed that these biomarker-strategy designs should be used only if researchers want to evaluate a complex biomarker-guided strategy with a variety of treatment options or biomarker categories. | (L29) Criticized for their potential cost increase due to the fact that patients without the predicted responsive biomarker are double enrolled in the trial (biomarker-negative patients receive control treatment in both strategy arms). |

Also called: Biomarker-strategy design with standard control, Direct-predictive biomarker-based, RCT of testing, Test-treatment, Parallel controlled pharmacogenetic diagnostic study, Marker strategy, Marker-based with no randomization in the non-marker-based arm, Classical, Marker-based strategy, Marker strategy design for prognostic biomarkers | (A37) Same as (A31), (A32), (A33) | (L30) Biomarker-positive and biomarker-negative subpopulations might be more imbalanced as compared with the first type of biomarker-strategy design due to the fact that the randomization to different treatment strategies is performed before the evaluation of the biomarker status (balancing the randomization is useful to ensure that all randomized patients have tissue available). This can happen especially when the number of patients is very small. | |

Examples of actual trials: A study, which evaluated the use of immediate computed tomography in patients with acute mild head injury [101,104]. | (L31) Same as (L23), (L24), (L25), (L26), (L27) | ||

Biomarker-strategy design with treatment randomization in the control arm (17 papers) [15,17,26,27,32,36,45,62,64,66,74,86,92,93,106,107,108] (see Figure 14) | In cases where we want to know whether the biomarker is not only prognostic but also predictive, these designs are preferable as compared to the two previously mentioned biomarker-strategy designs. | (A38) These designs have the ability to inform researchers about the potential superiority of the control treatment in the whole population or among a particular biomarker-defined subpopulation. | (L32) Generally require a larger sample size as compared to the marker-stratified designs. |

Also called: Biomarker-strategy design with a randomized control, Modified marker-based strategy design (for predictive biomarkers), Biomarker-strategy design with randomized control, Marker-based design with randomization in the non-marker-based arm, Marker-based strategy design II, Marker-strategy design, Augmented strategy design, Trial design allowing the evaluation of both the treatment and the marker effect | (A39) Able to inform us whether the biomarker is prognostic or predictive. | (L33) Same as (L27) | |

Examples of actual trials: None identified ^{a} | (A40) Allow clarification of whether the results which indicate efficacy of the biomarker-directed approach to treatment are caused due to a true effect of the biomarker status or to an improved treatment irrespective of the biomarker status. | ||

(A41) Same as (A36) | |||

Reverse marker-based strategy (4 papers) [86,92,93,109] (see Figure 15) | Enables testing the interaction hypothesis of treatment and biomarker in a more efficient way as compared to the first (i.e., Biomarker-strategy design with biomarker assessment in the control arm) and third (i.e., Biomarker-strategy design with treatment randomization in the control arm) biomarker-strategy subtype designs and to the marker stratified design. | (A42) Can estimate directly the marker-strategy response rate. | (L34) It has been claimed by Baker, 2014 [93] that designs other than the reverse marker-based strategy are more appropriate for investigating questions which include both the treatment effect in biomarker-defined subgroups and the biomarker-strategy treatment effect. These designs should allow the estimation of treatment effects within biomarker-defined subgroups as well as the estimation of the global treatment effect. |

Also called: None found | (A43) Allows the estimation of the effect size of the experimental treatment compared to the control treatment for each biomarker-defined subset separately. | ||

Examples of actual trials: None identified ^{a} | (A44) There is no chance that the same treatment will be tailored to biomarker-positive patients who are randomized either to the biomarker-based strategy arm or the reverse marker strategy. Also, there is no possibility of the same treatment assignment to biomarker-negative patients who are randomly assigned to the two biomarker-based strategy arms. | ||

(A45) It has been demonstrated by Eng, 2014 [92] that this new type of design is more than four times more efficient for testing the interaction between treatment and biomarker compared to Biomarker-strategy design with biomarker assessment in the control arm, Biomarker-strategy design with randomization in the control arm and the marker stratified design. | |||

A specific randomized phase II trial design that can be used to guide decision making for further development of an experimental therapy (1 paper) [71] (see Figure 16) | Recommended when we want to conduct a Phase II randomized trial which allows decisions to be made about which type of Phase III biomarker-guided trial should be used. | (A46) Works well in providing recommendations for phase III trial design. | None found. |
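The testing sequence described for the Marker Sequential test (MaST) design in the table above can be sketched as a small decision function. This is a minimal illustration only, assuming one-off p-values for each population and assuming that the biomarker-negative subgroup is tested at the full level only after a significant result in the biomarker-positive subgroup. The function name `mast_decision` and the default split `alpha1=0.04` are hypothetical choices for illustration and are not taken from the reviewed papers.

```python
def mast_decision(p_pos, p_neg, p_overall, alpha=0.05, alpha1=0.04):
    """Sketch of a MaST-style testing sequence (illustrative assumptions only).

    p_pos, p_neg, p_overall: p-values for the treatment effect in the
    biomarker-positive subgroup, the biomarker-negative subgroup and the
    entire population. alpha1 is the reduced level for the biomarker-positive
    test; the overall test uses alpha2 = alpha - alpha1.
    Returns the list of populations in which benefit would be claimed.
    """
    if not 0.0 < alpha1 < alpha:
        raise ValueError("alpha1 must lie strictly between 0 and alpha")
    alpha2 = alpha - alpha1

    # Step 1: biomarker-positive subgroup at the reduced level alpha1.
    if p_pos <= alpha1:
        claims = ["biomarker-positive"]
        # Step 2a: biomarker-negative subgroup at the global level alpha.
        if p_neg <= alpha:
            claims.append("biomarker-negative")
        return claims

    # Step 2b: the entire population is tested only when the
    # biomarker-positive test was not significant, at level alpha2.
    if p_overall <= alpha2:
        return ["overall"]
    return []
```

Because the overall test is reached only after a non-significant biomarker-positive result, a global effect driven purely by biomarker-positive patients is unlikely to be declared for the entire population, which is the property listed under (A28).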

Types of Biomarker-Guided Non-Adaptive Trial Designs | Sample Size Formula | Definition |
---|---|---|

Single arm designs | A standard sample size formula can be used; more information can be found in the ‘methodology’ part of the ‘Single arm designs’ section in the main text. | |

Enrichment designs [55,61,65,110,111,112] | Online tool for sample size calculation when using either binary or time-to-event endpoints is available on the following website: http://brb.nci.nih.gov/brb/samplesize/td.html [113]. | |

$E\left({D}_{i,enrichment}\right)=\frac{nT{\lambda}_{i}}{2\left({\lambda}_{i}+{\phi}_{i}\right)}\left\{1-\frac{{e}^{-\left({\lambda}_{i}+{\phi}_{i}\right)\tau}}{\left({\lambda}_{i}+{\phi}_{i}\right)T}\left[1-{e}^{-\left({\lambda}_{i}+{\phi}_{i}\right)T}\right]\right\}$ | $E\left({D}_{i,enrichment}\right)$ denotes the expected number of events per treatment arm (time-to-event outcome), $i$ corresponds to either the experimental or the control treatment group, a $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${\lambda}_{i}$ corresponds to the event hazard rate, ${\phi}_{i}$ is the loss to follow-up rate, $T$ denotes the accrual time (patients enter the trial according to a Poisson process with rate $n$ per year over the accrual period of $T$ years), and $\tau $ corresponds to the follow-up period. | |

${D}_{enrichment}=4{\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2}$ | ${D}_{enrichment}$ denotes the required total number of events (time-to-event outcome), a $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${z}_{\alpha /2}$ and ${z}_{\beta}$ denote the upper $\alpha /2$- and upper $\beta $-points respectively of a standard normal distribution, $\alpha $ and $\beta $ denote the assumed type I error and type II error respectively, ${\theta}_{1}$ denotes the assumed hazard ratio between the two treatment groups (control vs experimental) in the biomarker-positive subset. | |

${N}_{enrichment/arm}=2{\overline{p}}_{Q}\left(1-{\overline{p}}_{Q}\right){\left[\frac{\left({z}_{\alpha /2}+{z}_{\beta}\right)}{\left({p}_{A}^{Q}-{p}_{B}\right)}\right]}^{2}$ | ${N}_{enrichment/arm}$ denotes the required number of patients per treatment arm (binary outcome), a $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${p}_{A}^{Q}$ and ${p}_{B}$ are the response probabilities in the experimental and control groups respectively, ${\overline{p}}_{Q}=\left({p}_{A}^{Q}+{p}_{B}\right)/2$. | |

${N}_{enrichment/arm}=\frac{2{\sigma}^{2}{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}}{{\left({\mu}_{A+}-{\mu}_{B+}\right)}^{2}}$ | ${N}_{enrichment/arm}$ denotes the required number of patients per treatment arm (continuous response endpoints), a $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${\sigma}^{2}$ denotes the anticipated common variance, ${\mu}_{A+}$ and ${\mu}_{B+}$ denote the mean responses for biomarker-positive patients in the experimental and control treatment arm respectively. | |

${N}_{enrichment/arm}=2{\sigma}^{2}{\left({z}_{\alpha /2}+{z}_{\beta}\right)}^{2}{\left\{{\lambda}_{1}\left[\left(1-\omega \right)\text{}\zeta +\omega \right]\right\}}^{-2}$ | ${N}_{enrichment/arm}$ is referred to the required total number of patients per treatment arm (continuous response endpoints when accounting for error in the assaying of the study population), $1:1$ ratio between the two treatment arms (experimental:control) is assumed, $\omega $ measures the accuracy of the assay and corresponds to the PPV (positive predictive value of the assay, i.e., the proportion of patients who are assigned biomarker positive status according to the assay who are truly biomarker positive), ${\lambda}_{1}$ is the treatment effect in the biomarker-positive patients and $\zeta ={\lambda}_{0}/{\lambda}_{1}$ (where ${\lambda}_{0}$ is the treatment effect in the biomarker-negative patients). | |

Marker Stratified designs [31,53,60,92,111,112,114] | Online tool for sample size calculation when using either binary or time-to-event endpoints is available on the following website: http://brb.nci.nih.gov/brb/samplesize/sdpap.html [115]. | |

${D}_{stratified}=4\frac{{\left({z}_{{a}_{1}}+{z}_{\beta}\right)}^{2}}{{\left[\mathrm{log}\left({\theta}_{1}\right)\right]}^{2}}+4\frac{{\left({z}_{{a}_{2}}+{z}_{\beta}\right)}^{2}}{{\left[\mathrm{log}\left({\theta}_{2}\right)\right]}^{2}}$ | ${D}_{stratified}$ is referred to the required total number of events for the achievement of sufficient power in each biomarker-defined subgroup separately (time-to-event endpoint), $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${\theta}_{2}$ corresponds to the hazard ratio of biomarker-negative subgroup, ${a}_{1}={a}_{2}=a/2$. | |

${D}_{stratified}=\frac{4{\left({z}_{a/2}+{z}_{\beta}\right)}^{2}}{{\left[k\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)\mathrm{log}\left({\theta}_{2}\right)\right]}^{2}}$ | ${D}_{stratified}$ is referred to the required total number of events for the achievement of sufficient power in the overall population (time-to-event endpoint), $k$ is the proportion biomarker-positive patients, $1:1$ ratio between the two treatment arms (experimental:control) is assumed. | |

${N}_{stratified}=\frac{4{\left({z}_{a/2}+{z}_{\beta}\right)}^{2}}{{\left\{\left[kP{r}_{(+)}\left(event\right)\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)P{r}_{(-)}\left(event\right)\mathrm{log}\left({\theta}_{2}\right)\right]/\sqrt{kP{r}_{(+)}\left(event\right)+\left(1-k\right)P{r}_{(-)}\left(event\right)}\right\}}^{2}}$ | ${N}_{stratified}$ is referred to the required total number of patients for the achievement of sufficient power in the overall population (time-to-event endpoint), $1:1$ ratio between the two treatment arms (experimental:control) is assumed, $P{r}_{(+)}\left(event\right)$, $P{r}_{(-)}\left(event\right)$ are the probabilities of an event in biomarker-positive subset and biomarker-negative subset respectively. | |

$\frac{{D}_{stratified}}{{D}_{enrichment}}=\frac{{\left[\mathrm{log}\left({\theta}_{1}\right)\right]}^{2}}{{\left[k\mathrm{log}\left({\theta}_{1}\right)+\left(1-k\right)\mathrm{log}\left({\theta}_{2}\right)\right]}^{2}}=\frac{1}{{\left[k+\left(1-k\right)\frac{\mathrm{log}\left({\theta}_{2}\right)}{\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2}}$ | $\frac{{D}_{stratified}}{{D}_{enrichment}}$ is referred to the ratio of the required number of events between marker stratified and enrichment design (time-to-event endpoint). | |

$\frac{{N}_{stratified}}{{N}_{enrichment}}\approx \frac{1}{{\left[k+\left(1-k\right)\frac{{\delta}_{-}}{{\delta}_{+}}\right]}^{2}}$ | $\frac{{N}_{stratified}}{{N}_{enrichment}}$ is referred to the ratio of the required number of patients between marker stratified and enrichment design (binary outcome), ${\delta}_{-}$, ${\delta}_{+}$, correspond to the treatment effectiveness in biomarker-negative and biomarker-positive subgroup respectively. | |

${N}_{stratified}=2{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left\{\frac{{r}_{A+}\left(1-{r}_{A+}\right)+{r}_{B+}\left(1-{r}_{B+}\right)}{{\left({\beta}_{A}+{\beta}_{I}\right)}^{2}}+\frac{{r}_{A-}\left(1-{r}_{A-}\right)+{r}_{B-}\left(1-{r}_{B-}\right)}{{\left({\beta}_{A}\right)}^{2}}\right\}$ | ${N}_{stratified}$ is referred to the required total number of patients (binary outcome), ${\beta}_{0}$ denotes a baseline effect, ${\beta}_{A}$ denotes the added effect of the experimental treatment, ${\beta}_{+}$ denotes the biomarker-positive effect and ${\beta}_{I}$ denotes the nonadditive effect, $\alpha $ corresponds to the target level, $1-\beta $ corresponds to the power, ${r}_{A+},\text{}{r}_{B+}$ are the assumed response rates of biomarker-positive patients receiving the experimental and the control treatment respectively, ${r}_{A-},\text{}{r}_{B-}$ are the assumed response rates of biomarker-negative patients receiving the experimental and the control treatment respectively. | |

Sequential Subgroup-Specific design [57] | ${N}_{sequential\text{}subgroup-specific}^{+}={N}_{enrichment}$ | ${N}_{sequential\text{}subgroup-specific}^{+}$ is referred to the required number of biomarker-positive patients (binary outcome), ${N}_{enrichment}$ is the required number of biomarker-positive patients (binary outcome) in the enrichment design. |

${N}_{sequential\text{}subgroup-specific}=\frac{{N}_{enrichment}}{k}$ | ${N}_{sequential\text{}subgroup-specific}$ is referred to the required total number of patients (binary outcome), ${N}_{enrichment}$ is the required number of biomarker-positive patients (binary outcome) in the enrichment design. | |

${N}_{sequential\text{}subgroup-specific}^{-}=\frac{\left(1-k\right){N}_{enrichment}}{k}$ | ${N}_{sequential\text{}subgroup-specific}^{-}$ is referred to the required number of biomarker-negative patients (binary outcome), ${N}_{enrichment}$ is the required number of biomarker-positive patients (binary outcome) in the enrichment design. | |

${D}_{sequential\text{}subgroup-specific}^{+}={D}_{enrichment}$ | ${D}_{sequential\text{}subgroup-specific}^{+}$ is referred to the required number of events for biomarker-positive patients (time-to-event outcome), ${D}_{enrichment}$ is the required number of events for biomarker-positive patients (time-to-event outcome). | |

${D}_{sequential\text{}subgroup-specific}^{-}={D}_{enrichment}\left(\frac{{\lambda}_{-}}{{\lambda}_{+}}\right)\left(\frac{1-k}{k}\right)$ | ${D}_{sequential\text{}subgroup-specific}^{-}$ is referred to the required number of events for biomarker-negative patients (time-to-event outcome), ${D}_{enrichment}$ is the required number of events for biomarker-positive patients (time-to-event outcome), ${\lambda}_{-}$, ${\lambda}_{+}$, are the event rates in biomarker-negative and biomarker-positive control subgroups. | |

Parallel Subgroup-Specific design | The same formula proposed for marker stratified designs could be considered to achieve sufficient power in each biomarker-defined subgroup simultaneously. However, in order to control the overall type I error rate of the design at the overall level of significance $\alpha $, this overall $\alpha $ must be allocated between the test for the biomarker-positive subgroup and the test for the biomarker-negative subgroup. Consequently, the reduced significance level ${a}_{1}=a-{a}_{2}$ can be used for the biomarker-positive subgroup, whereas the reduced significance level ${a}_{2}=a-{a}_{1}$ can be used for the biomarker-negative subgroup. | |

Biomarker-positive and overall strategies with parallel assessment | If there is significant confidence that the biomarker is predictive, the sample size estimation is aimed at having a sufficient number of biomarker-positive individuals to enable the treatment effect in the biomarker positive subgroup to be detected. Standard formula for sample size calculation of biomarker-positive subgroup proposed for the enrichment designs could be considered by using the reduced significance level ${a}_{1}=a-{a}_{2}$. On the other hand, if there is no confidence in the predictive value of the biomarker, the sample size estimation is aimed at having a sufficient number of patients to detect a treatment effect in the overall study population; consequently, for the sample size calculation, the same formula proposed for marker stratified designs aiming to achieve sufficient power in the overall population could be applied by using the reduced significance level ${a}_{2}=a-{a}_{1}$. | |

Biomarker-positive and overall strategies with sequential assessment | At the first stage, the standard formula for a traditional randomized trial, which is the same as the formula proposed for enrichment designs, can be applied for the biomarker-positive subgroup. At the second stage, the sample size formula proposed for marker stratified designs aiming to yield appropriate power for the entire population could be considered. | |

Biomarker-positive and overall strategies with fall-back analysis | At the first stage, the sample size formula proposed for marker stratified designs aiming to yield appropriate power for the entire population could be considered by using the reduced significance level ${a}_{1}=a-{a}_{2}$. At the second stage, the formula proposed for enrichment designs could be applied for the biomarker-positive subgroup by using the reduced significance level ${a}_{2}=a-{a}_{1}$. | |

Marker Sequential test design (MaST) | A standard sample size calculation (i.e., the same sample size calculation as for the enrichment designs) can be applied for the biomarker-positive subpopulation. However, in order to have a sufficient number of biomarker-positive patients to detect treatment effectiveness in that particular biomarker-defined subset and consequently to reach the desired power, the sample size should be calculated by using the reduced significance level ${a}_{1}\in \left[0,a\right]$ instead of the global significance level $\alpha $ which is used in the sample size formulae of the enrichment designs. The same formula could be considered for the sample size calculation of the biomarker-negative subgroup; however, the corresponding hazard ratio of that subgroup and the global significance level $\alpha $ should be used. For the sample size calculation of the entire population, the same formula proposed for marker stratified designs aiming to achieve sufficient power in the overall population could be considered by using the reduced significance level ${a}_{2}=a-{a}_{1}$. | |

Biomarker-strategy design with biomarker assessment in the control arm [26,61,92] | ${D}_{strategy\text{}I}=4{\left[\frac{{z}_{\alpha /2}+{z}_{\beta}}{k\mathrm{log}\left({\theta}_{1}\right)}\right]}^{2}$ | ${D}_{strategy\text{}I}$ denotes the required total number of events (time-to-event outcome), a $1:1$ ratio between the two treatment arms (experimental:control) is assumed. |

${N}_{strategy\text{}I}=\frac{2{\left({z}_{1-\alpha /2}+{z}_{1-\beta}\right)}^{2}\left({\tau}_{m}^{2}+{\tau}_{n}^{2}\right)}{{\left({v}_{m}-{v}_{n}\right)}^{2}}$ | ${N}_{strategy\text{}I}$ is referred to the required total sample size (continuous clinical endpoints), $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${z}_{1-\alpha /2}$, ${z}_{1-\beta}$ denote the lower $1-\alpha /2$- and lower $1-\beta $-points respectively of a standard normal distribution, ${v}_{m}$ and ${v}_{n}$ denote the mean response from the biomarker-based strategy arm and the non-biomarker-based strategy arm respectively, and ${\tau}_{m}^{2},\text{}{\tau}_{n}^{2}$ denote the variance of response for the biomarker-based strategy arm and non-biomarker-based strategy arm respectively. | |

${N}_{strategy\text{}I/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{2}\left(1-{g}_{2}\right)\right]}{{\Delta}_{2}^{2}}$ | ${N}_{strategy\text{}I/arm}$ denotes the required number of patients per arm (binary outcome), ${g}_{1}$ is the expected response rate in the biomarker-based strategy arm, ${g}_{2}$ is the expected response rate in the non-biomarker-based strategy arm, ${\Delta}_{2}={g}_{1}-{g}_{2}$; ${g}_{1}$ and ${g}_{2}$ can be calculated as $k{r}_{A+}+\left(1-k\right){r}_{B-}$ and ${r}_{B}$ respectively, where ${r}_{B}$ denotes the marginal effect of treatment B (control treatment). | |

Biomarker-strategy design without biomarker assessment in the control arm | Same formulae as for the ‘Biomarker-strategy design with biomarker assessment in the control arm’ can be considered. | |

Biomarker-strategy design with treatment randomization in the control arm [26,31,92] | ${D}_{strategy\text{}III}=\frac{4{\left({z}_{a/2}+{z}_{\beta}\right)}^{2}}{{\left\{\mathrm{log}\left[\frac{2k{m}_{B+}+2\left(1-k\right){m}_{A-}}{k\left({m}_{A+}+{m}_{B+}\right)+\left(1-k\right)\left({m}_{A-}+{m}_{B-}\right)}\right]\right\}}^{2}}$ | ${D}_{strategy\text{}III}$ is referred to the required total number of events (time-to-event outcome), $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${m}_{A+},{m}_{A-},\text{}{m}_{B+},{m}_{B-}$, denote the median survival for biomarker-positive and biomarker-negative patients receiving control and experimental treatments respectively. |

${N}_{strategy\text{}III}=\frac{2{\left({z}_{1-\alpha /2}+{z}_{1-\beta}\right)}^{2}\left({\tau}_{m}^{2}+{\tau}_{nr}^{2}\right)}{{\left({v}_{m}-{v}_{nr}\right)}^{2}}$ | ${N}_{strategy\text{}III}$ denotes the required total sample size (continuous clinical endpoints), a $1:1$ ratio between the two treatment arms (experimental:control) is assumed, ${v}_{nr}$ denotes the mean response from the non-biomarker-based strategy arm, and ${\tau}_{nr}^{2}$ denotes the variance of response for the non-biomarker-based strategy arm. | |

${N}_{strategy\text{}III/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{3}\left(1-{g}_{3}\right)\right]}{{\Delta}_{3}^{2}}$ | ${N}_{strategy\text{}III/arm}$ is referred to the required total number of patients per arm (binary outcome), ${g}_{3}$ is the expected response rate in the non biomarker-based strategy arm and ${\Delta}_{3}={g}_{1}-{g}_{3}$, the expected response rate ${g}_{3}$ can be found by calculating the formula ${r}_{A}/2+{r}_{B}/2$, ${r}_{A}$ denotes the marginal effect of treatment A (experimental treatment). | |

Reverse marker-based strategy [92] | ${N}_{strategy\text{}IV/arm}=\frac{{\left({z}_{a}+{z}_{1-\beta}\right)}^{2}\left[{g}_{1}\left(1-{g}_{1}\right)+{g}_{4}\left(1-{g}_{4}\right)\right]}{{\Delta}_{4}^{2}}$ | ${N}_{strategy\text{}IV/arm}$ is referred to the required total number of patients per arm (binary outcome), ${g}_{4}$ is the expected response rate in the reverse biomarker-based strategy arm and ${\Delta}_{4}={g}_{1}-{g}_{4}$, the expected response rate ${g}_{4}$ can be found by calculating the formula $k{r}_{B+}+\left(1-k\right){r}_{A-}$, ${r}_{B+},\text{}{r}_{A-}$ are the assumed response rates of biomarker-positive patients receiving the control treatment and biomarker-negative patients receiving the experimental treatment. |

Randomized Phase II trial design with biomarkers [71] | Online tool for sample size calculation is available on the following website: http://brb.nci.nih.gov/Data/FreidlinB/RP2BM [116]. |
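Several of the closed-form expressions in the table above are straightforward to evaluate numerically. The following is a minimal sketch, assuming the upper-point convention for $z$ used in the enrichment rows and a $1:1$ allocation. The function names and the default values `alpha=0.05` and `beta=0.20` are illustrative assumptions and do not come from the reviewed papers or from the online tools cited in the table.

```python
from math import log
from statistics import NormalDist


def z_upper(p):
    """Upper p-point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - p)


def events_enrichment(theta1, alpha=0.05, beta=0.20):
    """Total events: 4 * [(z_{alpha/2} + z_beta) / log(theta1)]^2."""
    z = z_upper(alpha / 2) + z_upper(beta)
    return 4.0 * (z / log(theta1)) ** 2


def events_stratified_overall(theta1, theta2, k, alpha=0.05, beta=0.20):
    """Total events for the overall test in a marker stratified design.

    k is the proportion of biomarker-positive patients; theta1 and theta2 are
    the hazard ratios in the biomarker-positive and -negative subgroups.
    """
    z = z_upper(alpha / 2) + z_upper(beta)
    effect = k * log(theta1) + (1.0 - k) * log(theta2)
    return 4.0 * (z / effect) ** 2


def n_enrichment_per_arm_binary(p_a, p_b, alpha=0.05, beta=0.20):
    """Patients per arm for a binary outcome in an enrichment design."""
    z = z_upper(alpha / 2) + z_upper(beta)
    p_bar = (p_a + p_b) / 2.0
    return 2.0 * p_bar * (1.0 - p_bar) * (z / (p_a - p_b)) ** 2


def n_total_sequential_subgroup(n_enrichment, k):
    """Total patients in a sequential subgroup-specific design: N_enrichment / k."""
    return n_enrichment / k


if __name__ == "__main__":
    d_enr = events_enrichment(theta1=0.5)
    d_str = events_stratified_overall(theta1=0.5, theta2=1.0, k=0.5)
    print(d_enr, d_str / d_enr)
```

With a hazard ratio of 0.5 in biomarker-positive patients, no effect in biomarker-negative patients and a prevalence of 0.5, the enrichment design needs about 65 events, and the ratio of stratified to enrichment events equals $1/{k}^{2}=4$, in line with the ratio formula in the table. In practice the results would be rounded up to whole events or patients.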

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license ( http://creativecommons.org/licenses/by/4.0/).