How to Balance Prognostic Factors in Controlled Phase II Trials: Stratified Permuted Block Randomization or Minimization? An Analysis of Clinical Trials in Digestive Oncology

In controlled phase II trials, major prognostic factors need to be well balanced between arms. The main procedures used are Stratified Permuted Block Randomization (SPBR) and minimization. First, we provide a systematic review of the treatment allocation procedures used in gastrointestinal oncology controlled phase II trials published in 2019. Second, we performed simulations using data from six phase II studies to measure the impacts of imbalances and bias on the efficacy estimations. From the 40 articles analyzed, all mentioned randomization in both the title and abstract, the median number of patients included was 109, and 77.5% were multicenter. Of the 27 studies that reported at least one stratification variable, 10 included the center as a stratification variable, 10 used minimization, 9 used SPBR, and 8 were unspecified. In real data studies, the imbalance increased with the number of centers. The total and marginal imbalances were higher with SPBR than with minimization, and the difference increased with the number of centers. The efficacy estimates per arm were close to the original trial estimate in both procedures. Minimization is often used in cases of numerous centers and guarantees better similarity between arms for stratification variables for total and marginal imbalances in phase II trials.


Introduction
After phase I, which is the safety exploration in clinical oncology research, phase II trials are designed to provide preliminary evidence that a new therapeutic strategy seems to be effective enough to proceed to a phase III study for efficacy comparison with the standard of care. Phase II trials in oncology have typically been single-arm trials, but controlled designs are recommended for assessing the current efficacy in both control (standard of care) and experimental groups in the same setting (diverse cancer parameters, multiple centers if possible, etc.). Ideally, for trials with a 1:1 ratio, the distribution of prognostic factors must be similarly balanced to prevent selection bias [1,2]. Randomization, aiming to produce treatment groups in which the distribution of prognostic factors, known and unknown [3,4], is well balanced, introduces randomness into the assignment of treatments to participants [3] but does not always guarantee that the groups will be similar with regard to patient characteristics [5].
Initially, simple randomization often led to imbalances in participant groups, particularly in small trials. To address this, block randomization was introduced, ensuring a balance by dividing participants into blocks for random assignment, although it neglected important covariates. Usually, major prognostic factors are known before the study, so stratification was added to balance participants across treatment groups within specific covariates (e.g., smoking status, mutational status). Combining these methods, stratified block randomization allocates participants by key covariates and then applies block randomization within each stratum. The permuted block size in the randomization prevents predictability in treatment assignment, ensuring that the allocation remains truly random while still achieving balanced group sizes, which reduces selection bias and maintains the integrity of the trial. This approach, called "Stratified Permuted Block Randomization" (SPBR), enhances the overall and within-stratum balance, improving the results' robustness. Before the first patient inclusion (Figure 1), the predefined random sequence for all possible stratifying variables is supposed to ensure a balance in the number of participants between treatment groups in each stratum [6]. If the effect of treatment is expected to vary substantially in magnitude across clinically relevant subgroups, stratifying these subgroups can help demonstrate the treatment's effect and consistency across these subgroups (Figure 1). In practice, the main limitation of this procedure is that if there are too many strata and/or too few subjects included, many blocks will be incomplete, and an imbalance can occur [7], which may affect the study's results.
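The mechanism described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function name `spbr_allocate`, its parameters, and the lazy generation of blocks (a block is drawn when a stratum needs one, rather than predefining the whole sequence before the first inclusion) are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def spbr_allocate(patients, arms=("A", "B"), block_multiplier=2, seed=0):
    """Allocate patients (in inclusion order) by stratified permuted blocks.

    patients: list of tuples; each tuple is the patient's stratum, i.e. one
              value per stratification variable (hypothetical data layout).
    Block size = block_multiplier * number of arms.
    Returns a list with one arm label per patient.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> unused slots of its current block
    allocation = []
    for stratum in patients:
        if not pending[stratum]:
            # Open a new block for this stratum: every arm appears equally
            # often, in a random order.
            block = list(arms) * block_multiplier
            rng.shuffle(block)
            pending[stratum] = block
        allocation.append(pending[stratum].pop())
    return allocation

# Illustrative strata only (center, smoking status).
example = [("center 1", "smoker")] * 8 + [("center 2", "non-smoker")] * 4
print(spbr_allocate(example, seed=1))
```

Arms are balanced within a stratum only when its blocks are complete; with many strata and few patients, most blocks stay incomplete, which is the limitation noted above.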
As assignment solely by chance does not guarantee group similarity, other procedures that attempt to balance treatment groups with respect to baseline covariates, such as minimization, can be used for this purpose [8]. Minimization is a non-random method for treatment allocation, initially proposed by Taves and then by Pocock and Simon [9,10] and validated by Altman [11]. The minimization mechanism lies in assigning the treatment arm to a newly included patient, based on their clinical characteristics, by calculating the differences in these characteristics between arms depending on which arm the patient is assigned to. The arm yielding the smallest possible difference (imbalance) is chosen (Figure 1). This dynamic adaptive method is designed to reduce differences in the distribution of prognostic factors between treatment groups to guarantee the similarity of groups even in small samples [12]. The treatment allocated to the next participant enrolled in the trial depends (wholly or partly) on the characteristics of the participants who are already enrolled (see Figure 1). One of the most significant limitations of minimization is that the allocation of the treatment arm is predictable when all the parameters of the patients who were previously included in the study are known. Therefore, it is recommended that the minimization includes some randomness. For example, assuming a 20% randomness factor (also known as a random element), the probability of a patient being assigned to the arm that increases the imbalance the most is, at worst, 20%. This will make the overall procedure difficult to predict, especially in a multicenter trial [13]. In general and in practice, the proportion of randomness chosen varies between 10 and 25% [14,15].
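A minimal Python sketch of this mechanism follows. It is an illustration under stated assumptions, not the algorithm of any specific package: the function name `minimization_allocate` is hypothetical, the imbalance score is assumed to be the sum, over stratification variables, of the range of arm counts at the new patient's level, and the random element is assumed to mean that a non-minimizing arm is chosen with probability `p_random`.

```python
import random

def minimization_allocate(patients, arms=("A", "B"), p_random=0.2, seed=0):
    """Allocate patients (in inclusion order) by minimization.

    patients: list of tuples, one value per stratification variable.
    p_random: probability of NOT choosing the imbalance-minimizing arm
              (assumed interpretation of the random element).
    """
    rng = random.Random(seed)
    n_factors = len(patients[0]) if patients else 0
    # counts[arm][f][level] = patients already in `arm` with that level of factor f
    counts = {arm: [dict() for _ in range(n_factors)] for arm in arms}
    allocation = []
    for patient in patients:
        # Score each candidate arm by the marginal imbalance it would produce.
        scores = {}
        for candidate in arms:
            score = 0
            for f, level in enumerate(patient):
                per_arm = [counts[a][f].get(level, 0) + (1 if a == candidate else 0)
                           for a in arms]
                score += max(per_arm) - min(per_arm)
            scores[candidate] = score
        best_score = min(scores.values())
        best = [a for a in arms if scores[a] == best_score]
        others = [a for a in arms if scores[a] != best_score]
        if not others:
            chosen = rng.choice(arms)    # complete tie: pure chance
        elif rng.random() < p_random:
            chosen = rng.choice(others)  # random element
        else:
            chosen = rng.choice(best)    # arm minimizing the imbalance
        for f, level in enumerate(patient):
            counts[chosen][f][level] = counts[chosen][f].get(level, 0) + 1
        allocation.append(chosen)
    return allocation
```

With `p_random=0`, the next assignment is fully determined by the previous ones whenever the arms are not tied, which is the predictability concern raised above.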
Whatever the treatment allocation method used in a clinical trial (SPBR or minimization), the choice of allocation procedure is not involved in the sample size calculation. The European Medicines Agency [16] recommends adjusting the treatment effect for the prognostic variables used for stratification to improve the power.
In this educational manuscript, jointly written by statisticians and digestive oncologists, we first evaluate the use of these two allocation procedures (minimization and SPBR) in a systematic review of published phase II trials in gastrointestinal oncology. Second, we illustrate the advantages and disadvantages of these procedures by simulating the arm allocation from six of these trials and endpoints from one, using both procedures.

Literature Search Strategy and Selection Criteria
We conducted a systematic review of controlled phase II clinical trials in a gastrointestinal oncology setting. We limited our search to digestive cancers, as one co-author (JLR) is involved in gastrointestinal oncology and is a member of the FFCD (Fédération Francophone de Cancérologie Digestive) and PRODIGE (Partenariat de Recherche en Oncologie DIGEstive) intergroups. PRODIGE data were chosen because patient-level data can be obtained after a request is made through a committee. We searched MEDLINE (PubMed) for English-language articles to identify controlled phase II trials published between 1 January 2019 and 31 December 2019. Our search algorithm included medical subject heading terms for gastrointestinal neoplasms; the date of publication; and publication type (for a controlled clinical trial, phase II) (the search algorithm can be found in File S1 List S1). We excluded protocol publications, post hoc analyses, historic controls, phase I/II or phase III studies, and single-arm studies. All records retrieved were assessed independently by two authors (EM, VS). First, titles and abstracts were screened to identify obvious exclusions. Second, full-text reports were retrieved to determine whether they met the selection criteria. Any disagreements were resolved through discussion. Data extraction was carried out independently by the two authors (EM and VS) using a pre-designed data extraction form prior to data entry. Information extracted included the following items: the method and parameters used for patient allocation, number of patients, arms, centers, and stratification variables.

Simulating Arm Allocation
From the aforementioned real clinical trial data, we used the individual baseline characteristics that corresponded to the stratification variables described in the protocols of each study [16][17][18][19][20][21] to simulate 1000 allocations of treatment arms using SPBR and minimization successively. The patients' order of inclusion was re-shuffled for each data set simulation. For the minimization simulation, 20% of randomness was used [13], and for SPBR, the block size was set at twice the number of treatment arms.

Imbalance Measurements
Imbalance measures the difference between arms for predefined characteristics. We used three levels of precision: total, marginal, and within-stratum imbalances, which were calculated for each allocation method in each of the 1000 simulated data sets in each of the 6 trials.
Table 1 illustrates these three types of imbalances between arms with 2 stratification variables:
1. Total imbalance is the difference measured between arms, calculated using the total number of patients assigned to each arm, 0 in our example.
2. Marginal imbalance (or covariable margin imbalance) is calculated as the sum of the differences between arms for each modality of the variables, 2 in our example.
3. Within-stratum imbalance is calculated as the total differences between treatment arms for each combination of stratification variables, 16 in our example.
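The three definitions above can be written as a short Python sketch for the two-arm case. The function name `imbalances` and the toy data are hypothetical and do not reproduce Table 1; differences are assumed to be taken in absolute value before summing.

```python
from collections import Counter

def imbalances(patients, allocation, arms=("A", "B")):
    """Return (total, marginal, within_stratum) imbalance for a two-arm trial.

    patients:   list of tuples, one value per stratification variable.
    allocation: list of arm labels, in the same order as `patients`.
    """
    first_arm = arms[0]
    total = abs(allocation.count(arms[0]) - allocation.count(arms[1]))
    margin_diff = Counter()   # (variable index, modality) -> n_A - n_B
    stratum_diff = Counter()  # full combination of modalities -> n_A - n_B
    for patient, arm in zip(patients, allocation):
        sign = 1 if arm == first_arm else -1
        stratum_diff[patient] += sign
        for f, level in enumerate(patient):
            margin_diff[(f, level)] += sign
    marginal = sum(abs(d) for d in margin_diff.values())
    within_stratum = sum(abs(d) for d in stratum_diff.values())
    return total, marginal, within_stratum

# Toy data: 2 centers x 2 smoking statuses, one patient per stratum.
toy = [("c1", "smoker"), ("c1", "non-smoker"), ("c2", "smoker"), ("c2", "non-smoker")]
print(imbalances(toy, ["A", "B", "B", "A"]))  # (0, 0, 4)
print(imbalances(toy, ["A", "A", "B", "B"]))  # (0, 4, 4)
```

The first toy allocation shows that arms can be perfectly balanced in total and on every margin while each individual stratum remains unbalanced.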

Simulation of Endpoints
To explore whether, and by how much, the arm allocation method would affect the efficacy estimation, we calculated the efficacy in each arm for each of the 1000 simulations for both methods, and we measured the difference between these estimations and the efficacy measured in the real data from the clinical trial. The PRODIGE 35 and 37 trials [21,22] were chosen to simulate their primary endpoint in relation to the 2 methods studied. The primary outcome of these studies, the 6-month Progression-Free Survival (PFS) rate, was considered a binary variable. The Bernoulli distribution was used to regenerate the endpoint for each patient using the probability of success observed in each stratum. All variables used in the stratification process, except centers, were used to estimate the treatment's effect in each stratum (any possible combination of stratification variable modalities, e.g., bottom of Table 1). We did not use centers given their high number (52 and 36 centers, respectively) and the small number of patients in each stratum (276 patients divided into 624 possible combinations for PRODIGE 35 and 127 patients divided into 432 possible combinations for PRODIGE 37); this did not provide enough information to make a convincing estimate of the efficacy in each stratum. The impact of the arm allocation method on the efficacy estimation was described by calculating the bias on the efficacy estimate, defined as the difference between the actual clinical efficacy obtained in the article and the measured effect in each simulated data set.
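The endpoint regeneration and the bias calculation described above can be sketched in Python as follows. This is an illustration only: the function name `simulate_bias` is hypothetical, and the success probabilities are assumed to be indexed by stratum and arm, which the text does not state explicitly.

```python
import random

def simulate_bias(patients, allocation, p_success, observed_rate, seed=0):
    """Regenerate a binary endpoint and return the bias per arm.

    patients:      list of strata (tuples of stratification modalities).
    allocation:    simulated arm of each patient.
    p_success:     dict (stratum, arm) -> success probability observed in
                   the real data (assumed indexing).
    observed_rate: dict arm -> success rate reported for the real trial.
    Bias = observed rate - simulated rate, as defined in the text.
    """
    rng = random.Random(seed)
    successes, totals = {}, {}
    for stratum, arm in zip(patients, allocation):
        # Bernoulli draw with the stratum-specific success probability.
        outcome = 1 if rng.random() < p_success[(stratum, arm)] else 0
        successes[arm] = successes.get(arm, 0) + outcome
        totals[arm] = totals.get(arm, 0) + 1
    return {arm: observed_rate[arm] - successes[arm] / totals[arm]
            for arm in totals}
```

Repeating this over many simulated allocations, and summarizing the bias per arm and per method, gives the kind of comparison described in this section. A positive bias means that the simulated rate is lower than the published one.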

Software
All simulations were performed using R version 4.2.2 (2022-10-31 ucrt), a language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria; URL https://www.R-project.org/, accessed on 5 September 2023).
For studies with two arms, the PocSimMIN function was used to mimic the minimization procedure and StrPBR for stratified block randomization (SPBR), both of which are available in the carat [15] package. For studies with 3 arms, the Minirand [23] and blockrand [24] packages were used.

Review of Our Selected Articles
The PRISMA flow diagram of the selected articles is presented in Figure 2. After reconciliation, 40 articles were included in the analyses (File S1 Table S1). Of these 40 articles, 14 had a last-author affiliation from Asia (35%), 16 from Europe (40%), and 10 from North America (25%). All 40 manuscripts used the term "randomization" to describe their treatment allocation in the title and/or abstract. In the full text, 16 articles were identified as using the randomization allocation procedure (9 used stratified block randomization, 7 had no stratification variable mentioned), 10 as using minimization, and the remaining 14 did not clearly report the method used for allocation.
Table 2 summarizes the characteristics of the studies included. The median number of patients was 109 (range: 24-376), and the median number of arms was 2 (range: 2-4). Twenty-seven trials used stratification variable(s) in the allocation procedure: nine of the sixteen that used randomization, ten that used minimization, and eight of the fourteen in the unspecified method group. Overall, the median number of stratification variables was 1 (0-3) for studies using randomization as the allocation procedure and 3 (1-7) for those using minimization.
Notes to Table 2. ¥: one study did not report the details of the stratification variables or their number (id 41); §: one study indicated a multicenter trial but did not report the exact number of centers (id 20).

Out of the 40 studies included, 31 were multicenter. The median numbers of centers were, respectively, 3 (1-36) and 29 for the randomization and minimization groups. The center was a stratification variable in 10 studies (3 using randomization, 6 using minimization, and 1 using an unspecified method). In these 10 studies, the median number of centers was high (18 for randomization, 39 for minimization studies, and 9 for the unspecified-method studies).
Of the 40 studies included, 33 (82.5%) reported comparisons between arms, 14 out of the 16 that used randomization and 8 out of the 10 that used minimization. Of them, eight mentioned statistical methods for including stratification variables in their comparisons (using adjustment or subgroup comparisons on stratification variables).
Manuscripts using SPBR or minimization were published in journals with similar impact factors and the same rankings.

Imbalance in Real Clinical Trial Data
From the individual characteristics of each of the six real trials (14-19), we simulated the arm allocation 1000 times, using the minimization and SPBR methods, to observe the imbalance in the baseline characteristics between arms. The parameters and objectives of each study are available in the Supplemental Materials (File S2, Table S1).
In all studies, we observed similar patterns of imbalance distribution: using SPBR gave higher total and marginal imbalances.SPBR produced a lower within-stratum imbalance than minimization did (Figure 3 and File S2, Table S2), as it is primarily based on controlling the within-stratum imbalance as much as possible.
In the studies with the highest number of centers, the marginal and within-stratum imbalances increased with both methods; in such situations, the total imbalance only increased when using SPBR. For the studies with the highest number of centers, the total and marginal imbalances in the SPBR group were higher than with minimization, while the within-stratum imbalance was higher with minimization. We noted that the boxes were increasingly separated when the number of centers increased, without overlapping.
Figure 3. Simulation of re-randomized real databases. The impact on imbalance depending on the selected method. For each trial, the distribution of the imbalances (total, marginal, and within-stratum) calculated in the 1000 simulated data sets for the 2 allocation arm methods (minimization and SPBR) is presented: boxes correspond to the interquartile range of the imbalance (25th-75th percentiles), the central segment corresponds to the median imbalance, the whiskers are the lines that extend from the top or the bottom of the box to 1.5 times the interquartile range, and bullets correspond to outliers.
When centers were not included as stratification variables in the simulated data sets, both imbalance values and differences in imbalance between SPBR and minimization were reduced for all three levels of imbalance (File S2, Figure S1, and Table S3).

Impact on Endpoint Evaluation in Real Clinical Trial Data
When we simulated the primary endpoint based on the individual data from PRODIGE 37, the treatment effects within each stratum and the efficacy results were similar between the two methods.For PRODIGE 35, the bias ranged from −1.2 to 0.8, with no method systematically obtaining a bias closer to 0. For PRODIGE 37, the bias was only positive; in each arm, the 6-month PFS rate assessed in the simulations was therefore on average lower than that published, without any change in the trial conclusion.Neither method outperformed the other in these two simulations with regard to the evaluation of the endpoint (File S2, Figure S2, and Table S4).

Literature Review (Main Finding)
Randomization appears to be the best known and most popular way of ensuring that any baseline differences between groups are solely the result of randomness. This is essential for trusting the results observed in different groups. It is so ingrained in the scientific reasoning in medical clinical research that the absence of the word "randomized" in the title may be viewed with suspicion. We wonder if the absence of the word "randomized" may lead to manuscripts being rejected or grants not being awarded if reviewers are not well aware of biostatistics.
In our literature review, we showed that the allocation method is often insufficiently described, especially when stratification variables are used. In studies with stratification, 30% did not specify the method used to implement the stratification in allocating the treatment arm. Only 1 out of 10 reported the random factor value in case of minimization. We observed that sometimes, the word "minimization" does not appear in the article itself but only in the Statistical Analysis Plan. The term "randomization" may not be appropriate for describing the allocation of arms when minimization is used without a random component. A moderate proportion of randomness (between 10 and 25% is recommended) should be included and described in the methods section with the help of a biostatistician. A recent simulation study [65] showed that minimization with a random element of 35% insufficiently controls serious covariate imbalances.
One limitation of this literature review is that it is restricted solely to gastrointestinal oncology studies. We cannot exclude the possibility that different results could be observed for other primary tumor sites; for instance, rare site localizations may impact the recruitment and design selection.

Imbalance in Real Clinical Trial Data
We found that the total and marginal imbalances were greater with SPBR than with minimization, except when centers were not taken into account (File S2, Figure S1, and Table S3). Minimization decreased these imbalances but also appeared to increase the within-stratum imbalance.
Considering prognostic factors when estimating efficacy is essential in phase II trials. By ensuring that the control arm is as close as possible to the experimental arm, one must guarantee that the effects of prognostic factors other than treatment are identified. More specifically, when the number of patients per arm is different (total imbalance), the precision of the estimate (confidence interval) is influenced. When there is an imbalance between arms for a stratification covariable (marginal imbalance), the estimate in one arm may be biased by the prognostic effect of this covariable (such as the performance status, center, etc.). When there is an imbalance within a stratum, the estimate in one arm may be biased by the interaction between the covariables defining that stratum. In practice, we believe that in phase II clinical trials, the most important imbalances to avoid are the total imbalance and, in the case of stratification, the marginal imbalance. In our clinical trial data analyses, we found that minimization outperforms SPBR regarding these two points, especially when stratification by center is desired.
We encountered difficulties simulating the efficacy within each stratum from real clinical data: the number of patients in each stratum was sometimes very small (1 or 0 patients). As we treated the endpoint as a binary variable (and not as a rate from a survival distribution), the proportion of patients that reached success in some strata was sometimes 1/1 = 100%. The precision of the efficacy estimate was not sufficient to draw conclusions on the differences in efficacy estimates for each method.

Center as a Stratification Variable
The center is frequently considered a prognostic factor by itself. We found that 78% of the trials reported in our review were multicenter, with a median of 11 centers. In the case of multicenter trials, centers must thus be considered a stratification variable. When numerous centers are used as a stratification factor, we have shown that the marginal imbalance of arm allocation is high, and the efficacy estimation may thus be influenced by the specificity of these centers. In practice, it may be difficult to include centers in the stratification variables if SPBR is used (if there are too many, this would generate too many incomplete or empty blocks), and that will negatively affect the total imbalance of the study and thus the similarity of both arms. In situations with multiple centers, minimization has better results, with a systematically lower imbalance; we therefore consider that minimization must be recommended in the case of multicenter studies.
We found that SPBR performs better than minimization regarding within-stratum imbalance. These results warn us about a potential bias due to interactions of stratified variables on the treatment effect. Looking at treatment effects at the intersection of different categories of variables is not relevant in phase II trials, as these studies are not designed for this purpose. As phase II trials have fewer participants than phase III trials, the estimation of efficacy within a stratum is not possible. However, in larger phase III clinical trials, when there are more patients in each subgroup, the estimation of efficacy within a stratum may be possible, and we thus recommend using SPBR instead of minimization.
Recently, in a journal specializing in methodology and statistics, Coart et al. [66] published a tutorial giving an overview of the methodological and statistical issues related to the choice of arm allocation method, especially with minimization. Specifically, they addressed the two most common limitations attributed to the minimization method: predictability and Type-I error control. In detail, using a simulation study and a review of 50 trials using minimization conducted in their center, they showed that the predictability of the allocation arm (the probability of correctly guessing the next treatment allocation) is only an issue in the uncommon scenario where investigators participating in the trial have knowledge of the treatment allocation algorithm and information, by treatment group, on the patients allocated so far. In this specific situation, modified algorithms are available to reduce predictability. Callegaro et al. [67] showed that the randomization test preserves the Type-I error, and Coart et al. showed that the randomization test and usual asymptotic tests provide similar inferences. Finally, they concluded that minimization is especially useful when many baseline factors are known to have an impact on the patient prognosis and in small or medium sample sizes (up to 100 patients), which is a common situation when planning phase II studies.

Other Methods
In this report, our primary focus is on SPBR and minimization; however, there are various approaches employed in clinical trial designs. Among these, "pick-the-winner" is a response-adaptive design that adjusts treatment assignment probabilities based on observed outcomes during the trial, with the aim of allocating more participants to treatments demonstrating superior efficacy. Biomarker-Adaptive Randomization tailors treatment allocation according to participants' biomarker profiles, potentially enhancing treatment efficacy by targeting specific biological characteristics. Bayesian Adaptive Randomization utilizes Bayesian statistical methods to update treatment allocation probabilities, integrating prior knowledge and offering a flexible framework for adjusting randomization based on accumulating trial data. Each method presents distinct advantages and considerations, contributing to the comprehensive toolkit that is available for optimizing clinical trial designs and execution. It would be of interest in future studies to explore these alternative methods of arm allocation further.

Conclusions
The minimization method offers a better opportunity to guarantee similarity between treatment arms, while Stratified Permuted Block Randomization may be less effective, particularly in terms of the total and marginal imbalances. This is particularly true in cases of multicenter trials and in phase II clinical trials with a relatively small sample size. As phase II trials do not focus on demonstrating the superiority of one arm, or on the effects of the intersection of variables or modeling efficacy, the total and marginal imbalances should be minimized as a priority. Minimization is easily understandable and directly applicable to clinical studies, as it is implemented in most statistical or clinical database management software programs.

Figure 1. Decision regarding the arm allocation of the 8th patient depending on the 2 procedures: Stratified Permuted Block Randomization (left side) and minimization (right side). The characteristics of the 7 patients who were already randomized in the study are shown at the top left of the diagram. The study's stratification variables are represented using colors and pictograms (see top right of diagram).


Figure 2. Literature review flowchart: PRISMA flow diagram of our systematic analysis of controlled phase II trials in digestive oncology published in 2019.



Table 1. Example of distribution of patients (n = 100) using 2 stratification variables (center, smoking). The 3 types of imbalances are shown in gray.

Table 2. Description of studies in the literature review.
