We propose a sequential procedure with a closed and adaptive structure. It selects a random-size subset of the k experimental treatments in such a way that, with a guaranteed minimum probability, every treatment superior to the control is included. All the experimental treatments and the control are assumed to produce two binary endpoints, and the procedure is based on those two endpoints. A treatment is considered superior if both of its success probabilities are larger than those of the control. While responses across treatments are assumed to be independent, dependence between endpoints within each treatment is allowed and modeled via an odds ratio. The proposed procedure comprises explicit sampling, stopping, and decision rules. We demonstrate that, for any sample size n and parameter configuration, the probability of correct selection remains unchanged when switching from the fixed-sample-size procedure to the sequential one. We use the bivariate binomial and multinomial distributions in the computation and derive design parameters under three scenarios: (i) independent endpoints, (ii) dependent endpoints with known association, and (iii) dependent endpoints with unknown association. We provide tables with the sample size savings achieved by the proposed procedure compared to its fixed-sample-size counterpart. Examples are given to illustrate the procedure.
As both efficacy and safety are crucial in Phase II evaluations, we propose a closed adaptive sequential procedure to compare k experimental treatments with a control group. Each treatment, including the control, is assessed using two binary endpoints: one for efficacy and one for safety. An experimental treatment is considered superior to the control if it demonstrates higher success probabilities on both endpoints. Let the control treatment be denoted by π_0 and the k experimental treatments by π_1, …, π_k. The outcomes for each treatment consist of two binary endpoints that are modeled marginally as Bernoulli random variables with unknown success probabilities. Specifically, for the control, the success probabilities are denoted by p_{01} and p_{02}, and for each experimental treatment π_i, the corresponding success probabilities are p_{i1} and p_{i2}. Comparisons are made between each experimental treatment and the control by evaluating whether p_{i1} > p_{01} and p_{i2} > p_{02}. The proposed procedure incorporates curtailment, allowing for early termination of sampling when sufficient evidence has been gathered to make a decision, thus potentially stopping before reaching the maximum sample size n per treatment. The term closed highlights the existence of this upper bound on the sample size, whereas adaptive indicates that continuation decisions are guided by interim outcomes.
In medical and pharmaceutical research, a common statistical problem involves selecting the best among k (k ≥ 2) treatments, each yielding Bernoulli outcomes. Classical methods addressing this issue include the fixed-sample Indifference Zone approach of [1], and the Subset Selection strategy of [2], which aims to identify a group likely to contain the best treatment. Later studies by [3] and [4] further explored this framework for comparisons involving a control. A notable limitation of these approaches is their exclusive focus on single binary endpoints under non-adaptive, fixed-sample designs.
In contrast to earlier fixed-sample approaches, we consider a curtailed selection procedure involving two Bernoulli endpoints. Under this design, each treatment is sampled sequentially, up to a pre-specified maximum of n observations per treatment. Sampling for a treatment is terminated early if there is sufficient evidence that it is no longer a viable candidate, or once the sample limit n is reached. The formal definition of a “contending treatment” is given in Procedure R (Section 4). This early stopping strategy can substantially reduce the total number of observations required for treatment selection.
Curtailment has previously been applied in clinical trials with Bernoulli outcomes, particularly in the context of hypothesis testing ([5]) and treatment selection ([6,7,8]). However, these studies have largely focused on a single Bernoulli endpoint. In this work, we extend the notion of curtailment to the selection problem involving two Bernoulli endpoints. Related investigations include [9] for normally distributed outcomes, and [10,11,12] for binary endpoint designs.
Recent developments in ranking and selection (R&S) have highlighted the importance of incorporating covariates into selection procedures to support personalized decision-making. Ref. [13] introduced a two-stage R&S procedure with covariates (R&S-C) based on a linear regression framework, addressing scenarios where the mean performance of treatments varies systematically with patient-specific covariates. Their approach ensures a predefined average probability of correct selection across the covariate space, enabling tailored treatment recommendations in heterogeneous populations. Although their framework differs from ours by considering a single continuous endpoint and not incorporating curtailment, their emphasis on individual-level heterogeneity underscores the broader need for flexible experimental designs that adapt to diverse patient characteristics.
In addition, Bayesian sequential approaches such as the Knowledge Gradient (KG) method have also been proposed to incorporate covariates into ranking and selection problems ([14]). These methods provide theoretical consistency guarantees and computationally efficient decision-making frameworks, which are particularly valuable when dealing with large covariate spaces or high-dimensional decision scenarios. Nevertheless, Bayesian methods often require stronger assumptions and more complex computations than frequentist procedures. Our study, while not directly modeling covariates, provides a simpler, frequentist alternative designed specifically for dual binary endpoints that facilitates easier implementation in clinical trial settings.
More recently, ref. [15] discussed a curtailed procedure for subset selection involving two Bernoulli endpoints. However, their approach compares each experimental treatment to a well-established standard treatment. This design is most appropriate when a widely accepted reference treatment exists. In contrast, our procedure compares new treatments against a control treatment, which may or may not be a recognized standard.
There are many situations in which our approach is more applicable. For example, in the absence of a universally accepted standard treatment—such as when placebo is the only baseline option—it becomes necessary to evaluate new treatments in relation to a control. Similarly, even when a standard treatment exists, it may have been validated only in limited populations (e.g., specific age groups, races, or genders). In such cases, it is important to assess whether the standard treatment continues to perform well in broader or different patient populations. Our design, which explicitly includes the control treatment in the experiment, enables such comparisons and provides a flexible and inclusive framework for decision-making.
This study tackles the selection problem involving two binary endpoints through a subset selection framework. We propose a curtailed, closed sequential sampling design, where the total number of observations obtained from each of the k + 1 candidate populations (including the control) is a bounded random variable. We assume that treatment effects manifest quickly relative to the overall trial duration. The proposed procedure is constructed with reference to a fixed-sample-size method, which will be detailed in later sections. Our results show that the sequential design achieves the same probability of correctly identifying superior treatments as the fixed-sample method, while requiring fewer samples from inferior alternatives. Section 2 sets forth the modeling assumptions, trial goals, and statistical criteria for studies involving dual binary endpoints. Section 3 introduces the fixed-sample approach used as the basis for comparison in evaluating the curtailed procedure.
In Section 4, we introduce a curtailed sequential selection method that fulfills the objective of this study. It is shown that the proposed design achieves the same probability of correct selection as the corresponding fixed-sample-size method across the entire parameter space. Section 5 compares the expected sample sizes between the curtailed and non-curtailed versions of the procedure. In Section 6, we present two numerical examples that demonstrate the implementation of the proposed approach. Lastly, Section 7 concludes the paper with final remarks.
2. Assumptions, Goal, and Probability Requirements
Assumptions: Assume that n independent subjects receive a treatment, and two binary outcomes—typically interpreted as therapeutic efficacy (“response”) and safety (“nontoxicity”)—are recorded for each individual. Following the notation introduced by [11], let X_{ij} denote the number of subjects experiencing outcome i on the first endpoint and outcome j on the second, where i, j ∈ {1, 2}, with 1 indicating “success” and 2 indicating “failure.” The observed data can be organized into a 2 × 2 contingency table (see Table 1).
We assume that the random vector (X_{11}, X_{12}, X_{21}, X_{22}) follows a multinomial distribution with index n and cell probabilities (p_{11}, p_{12}, p_{21}, p_{22}), where:
p_{11} is the probability of success on both endpoints,
p_{12} is the probability of success on endpoint 1 and failure on endpoint 2,
p_{21} is the probability of failure on endpoint 1 and success on endpoint 2,
p_{22} is the probability of failure on both endpoints.
The use of a multinomial distribution to model two dependent binary endpoints is standard in clinical trial design literature. In particular, both [11] and [12] adopt the multinomial formulation in their Phase II design studies. We follow the same convention in this work, leveraging its flexibility in capturing both marginal probabilities and the association structure between the two endpoints.
All developments in this paper are carried out under a joint multinomial (equivalently, bivariate–binomial) model for two dependent binary endpoints, a modeling choice that is widely used in Phase II clinical trial design. The real-data examples are intended to illustrate implementation rather than to validate distributional assumptions. In applied use, investigators should justify or check the adequacy of the multinomial model for their specific data before adopting the procedure.
Let X_1 = X_{11} + X_{12} and X_2 = X_{11} + X_{21} represent the number of successes on endpoints 1 and 2, respectively. The marginal probabilities of success are given by p_1 = p_{11} + p_{12} and p_2 = p_{11} + p_{21}, respectively. Consequently, X_1 ∼ B(n, p_1) and X_2 ∼ B(n, p_2). We denote the binomial probability mass function with parameters n and p by b(x; n, p).
The joint distribution of (X_1, X_2) depends not only on p_1 and p_2, but also on the association between the two endpoints. To quantify this association, we use the odds ratio ψ = (p_{11} p_{22})/(p_{12} p_{21}), which is a natural and widely used measure of association in 2 × 2 tables. Notably, ψ is independent of the marginal probabilities p_1 and p_2. When ψ = 1, the two endpoints are independent; ψ > 1 indicates a positive association, and ψ < 1 indicates a negative association.
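As a small illustration (our own sketch, not code from the paper), the odds ratio of a 2 × 2 cell-probability table can be computed directly from its cells; under independence the cells factor into the marginals and ψ = 1. The marginal values below are illustrative, not figures from the paper.

```python
def odds_ratio(p11, p12, p21, p22):
    """Odds ratio psi = (p11 * p22) / (p12 * p21) of a 2x2 table."""
    return (p11 * p22) / (p12 * p21)

# Independent endpoints: each cell is a product of marginal probabilities,
# e.g. p11 = p1 * p2, so the odds ratio equals 1.
p1, p2 = 0.3, 0.2
cells = (p1 * p2, p1 * (1 - p2), (1 - p1) * p2, (1 - p1) * (1 - p2))
print(odds_ratio(*cells))                  # 1 up to floating-point error

# A positively associated table (psi > 1): success on one endpoint makes
# success on the other more likely.
print(odds_ratio(0.10, 0.20, 0.10, 0.60))  # 3 up to floating-point error
```

Note that ψ is unchanged if the marginals change while the cross-product structure of the cells is preserved, which is why it is a convenient association parameter here.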
This study investigates two binary outcomes—efficacy and safety—for each experimental treatment, comparing them against those of a control. The success probabilities associated with the control treatment on these two endpoints are denoted by p_{01} and p_{02}, respectively. Let π_i, for i = 1, …, k, denote the k experimental treatments being evaluated, with π_0 representing the control group. Each treatment is associated with two binary outcomes. To distinguish between the two endpoints within each treatment, we use a second subscript j in the notation, where j = 1 corresponds to the efficacy endpoint and j = 2 to the safety endpoint. Thus, the success probabilities for treatment π_i are denoted by p_{i1} and p_{i2} for the efficacy and safety endpoints, respectively, for i = 1, …, k. We assume that the treatment groups are statistically independent, indicating that responses from different treatment arms do not influence one another. In contrast, within any single treatment arm, dependence between the two endpoints may exist. To classify treatments based on their performance, we partition the parameter space using four prespecified constants: δ_1, δ_1^*, δ_2, and δ_2^*. These constants satisfy 0 ≤ δ_1 < δ_1^* with p_{01} + δ_1^* < 1, and 0 ≤ δ_2 < δ_2^* with p_{02} + δ_2^* < 1. In this framework, a treatment π_i is considered ineffective if p_{i1} ≤ p_{01} + δ_1 or p_{i2} ≤ p_{02} + δ_2, and considered effective if p_{i1} ≥ p_{01} + δ_1^* and p_{i2} ≥ p_{02} + δ_2^*, where p_{01} and p_{02} are success probabilities of the control treatment, and we assume that these two probabilities are known prior to conducting the selection procedure. Our objective is to classify the k experimental treatments into two groups: those that are effective and those that are ineffective. We now describe the formal selection goal.
The threshold parameters δ_1, δ_1^*, δ_2, δ_2^* are pre-specified quantities that reflect the minimal clinically meaningful differences in efficacy and safety between an experimental treatment and the control. In practice, these values are typically determined by subject-matter experts based on prior clinical knowledge, historical studies, or the expected magnitude of benefit that would justify advancing a treatment to further development stages. From a statistical perspective, the role of these thresholds is analogous to the effect size specified in power calculations for hypothesis testing. Just as the effect size quantifies the magnitude of difference that a study aims to detect with sufficient power, the values δ_1^* and δ_2^* define the selection region where a treatment is considered promising in terms of both endpoints. Importantly, the framework allows these thresholds to be flexibly tuned to reflect the specific objectives or priorities of a trial. For example, if the primary goal is to achieve a substantial improvement in efficacy while maintaining comparable safety, one may set δ_1^* to a larger value and δ_2^* close to zero. Conversely, if safety improvement is the main concern—such as in cases where toxicity is a known issue—one may choose a small δ_1^* and a larger δ_2^*. A detailed illustration of how these thresholds can be specified based on the desired benefit-risk trade-off is provided in Example 2 (Chemotherapy of Acute Leukemia), where the clinical objective is to improve safety while maintaining efficacy.
Goal: Select a subset consisting of those treatments π_i for which p_{i1} ≥ p_{01} + δ_1^* and p_{i2} ≥ p_{02} + δ_2^*; that is, include all experimental treatments that demonstrate superiority over the control treatment with respect to both efficacy and safety. If no such treatment exists—i.e., if no π_i satisfies both p_{i1} ≥ p_{01} + δ_1^* and p_{i2} ≥ p_{02} + δ_2^*—then none of the k experimental treatments should be selected.
Probability requirements: Let P_1^* and P_2^* be pre-specified probability constants satisfying 0 < P_1^* < 1 and 0 < P_2^* < 1. Let CS_1 denote the event that the selected subset correctly includes all effective treatments, provided such treatments exist. Specifically, CS_1 occurs when every treatment π_i satisfying p_{i1} ≥ p_{01} + δ_1^* and p_{i2} ≥ p_{02} + δ_2^* is included in the selected subset. Similarly, let CS_2 denote the event that no treatment is selected when none are truly effective; that is, CS_2 occurs when p_{i1} ≤ p_{01} + δ_1 or p_{i2} ≤ p_{02} + δ_2 holds for all i = 1, …, k and no experimental treatment is selected. The probability requirements for the selection procedure are defined as follows:
P(CS_1) ≥ P_1^* whenever at least one effective treatment exists, (1)
and
P(CS_2) ≥ P_2^* whenever all experimental treatments are ineffective, (2)
where P_1^* and P_2^* represent the minimum acceptable probabilities for correctly identifying effective treatments and correctly excluding ineffective treatments, respectively.
Remark 1.
A selection is considered correct when at least one effective experimental treatment exists and the chosen subset successfully includes all such effective options. The rationale for selecting a subset—rather than identifying a single best treatment—is that no natural ordering can be established among the pairs of success probabilities unless one endpoint is explicitly prioritized over the other. Since this paper does not assume any preference between the two endpoints, we adopt a subset selection approach.
3. Fixed Sample Size Procedure
This section introduces the fixed-sample-size selection method, which serves as a benchmark for evaluating the curtailed procedure outlined in Section 4. This fixed-sample-size procedure was derived by [16]. We also present results concerning the determination of design parameters that guarantee the fixed-sample-size procedure meets the probability criteria specified in Probability requirements (1) and (2).
For prespecified design parameters (n, c_1, c_2), the selection procedure is defined as follows:
Procedure H:
Suppose we obtain n observations from each of the k Bernoulli experimental treatments and the control treatment. Let X_{i1} and X_{i2} denote the number of observed successes for the first and second binary endpoints of treatment π_i, where i = 0, 1, …, k. For fixed positive constants c_1 and c_2, Procedure H proceeds as follows:
(1)
Include in the selected subset all treatments π_i such that X_{i1} ≥ X_{01} + c_1 and X_{i2} ≥ X_{02} + c_2;
(2)
If no treatment satisfies both conditions, select the control treatment π_0.
In ranking and selection problems, it is common practice to derive an explicit formula for the probability of a correct selection, denoted by P(CS), and then identify the least favorable configuration (LFC), under which P(CS) reaches its minimum. Design parameters are then determined such that the minimum of P(CS) exceeds a pre-specified threshold P^*.
However, in this particular subset selection problem, no closed-form expression for P(CS) could be obtained. Instead, a lower bound was developed, together with the configuration that minimizes it. If this minimum bound exceeds P^*, then P(CS) will necessarily be no less than P^* across all parameter settings.
We denote by Ω_1 the configuration where p_{i1} = p_{01} + δ_1^* and p_{i2} = p_{02} + δ_2^* for i = 1, …, k, and by Ω_2 the parameter configuration where p_{i1} = p_{01} + δ_1 and p_{i2} = p_{02} + δ_2, i = 1, …, k. Then Ω_1 and Ω_2 are the configurations under which the lower bounds LB_1 and LB_2 of the probabilities of correct selection P(CS_1) and P(CS_2), respectively, were computed. LB_1 also depends on the odds ratios between the two endpoints of each of the k treatments, while LB_2 does not. It is assumed that the degree of association between the two endpoints is identical across all k treatments. Three scenarios are considered: (1) the endpoints are independent; (2) the endpoints are dependent with a known association structure; and (3) the endpoints are dependent but the association is unknown. When the association is not known, it was shown that the minimum of LB_1 is attained when the odds ratio is zero. However, numerical computations showed that the sample size varies very little with the odds ratio. We now present the theorems establishing the lower bounds LB_1 and LB_2 for the correct selection probabilities P(CS_1) and P(CS_2), respectively. The corresponding proofs are provided in [16] and also included in Appendix A for completeness.
Case 1: ψ_i = 1, i = 0, 1, …, k. We begin with the scenario where the two endpoints are independent. Specifically, we assume that ψ_i = 1 for all i. Under this assumption, X_{i1} and X_{i2} are mutually independent binomial random variables with parameters (n, p_{i1}) and (n, p_{i2}), respectively, for each i.
Theorem 1
(Adapted from [16]). For fixed k, δ_1, δ_1^*, δ_2, and δ_2^*, the probability requirements (1) and (2) are satisfied by choosing values of (n, c_1, c_2) for which the corresponding lower bounds on P(CS_1) and P(CS_2) under independent endpoints are at least P_1^* and P_2^*, respectively; the explicit binomial expressions for these bounds are given in [16] and reproduced in Appendix A.
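Since the exact lower-bound expressions are given in [16] and Appendix A, a quick Monte Carlo run can serve as an independent sanity check of a candidate design. The sketch below is our own illustration, not the paper's computation: it assumes the selection rule of Procedure H compares each treatment's success counts with the control's counts plus the margins c_1 and c_2, takes independent endpoints (Case 1), and uses placeholder parameter values throughout.

```python
import random

def run_H(n, c1, c2, probs):
    """One replication of Procedure H. probs[0] = control (p1, p2);
    probs[1:] = experimental arms; returns the selected subset of indices."""
    counts = [(sum(random.random() < p1 for _ in range(n)),
               sum(random.random() < p2 for _ in range(n)))
              for (p1, p2) in probs]
    y1, y2 = counts[0]
    return {i for i in range(1, len(probs))
            if counts[i][0] >= y1 + c1 and counts[i][1] >= y2 + c2}

def estimate(n, c1, c2, probs, reps=2000):
    """Monte Carlo estimates of P(all experimental arms selected) and
    P(no experimental arm selected)."""
    k = len(probs) - 1
    all_sel = none_sel = 0
    for _ in range(reps):
        sel = run_H(n, c1, c2, probs)
        all_sel += (len(sel) == k)
        none_sel += (len(sel) == 0)
    return all_sel / reps, none_sel / reps

# Placeholder design and configurations (k = 2 experimental arms).
random.seed(1)
p01, p02, d1s, d2s = 0.3, 0.6, 0.2, 0.2
omega1 = [(p01, p02)] + [(p01 + d1s, p02 + d2s)] * 2   # all arms effective
omega2 = [(p01, p02)] + [(p01, p02)] * 2               # no arm effective
pcs1, _ = estimate(25, 1, 1, omega1)   # estimate of P(CS1) at omega1
_, pcs2 = estimate(25, 1, 1, omega2)   # estimate of P(CS2) at omega2
```

In practice one would increase `reps` and scan over (n, c_1, c_2) until both estimates comfortably exceed the targets P_1^* and P_2^*.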
Case 2: ψ_i = ψ is specified for i = 0, 1, …, k. We next consider the scenario where the two endpoints within each treatment arm exhibit dependence, with the common association parameter ψ known in advance.
Theorem 2
(Adapted from [16]). For fixed values of k, δ_1, δ_1^*, δ_2, δ_2^*, and ψ, the probability requirements (1) and (2) are satisfied by choosing values of (n, c_1, c_2) for which the corresponding lower bounds on P(CS_1) and P(CS_2) are at least P_1^* and P_2^*, respectively, where the bounds are now computed from the joint multinomial distribution whose cell probabilities p_{11}, p_{12}, p_{21}, and p_{22} are determined by the marginal probabilities and the common odds ratio ψ; the explicit expressions are given in [16] and reproduced in Appendix A.
Case 3: ψ_i unspecified for i = 1, …, k, and ψ_0 specified. We now examine the setting in which the dependence structure between the two endpoints is unknown for the experimental treatments, while the association for the control treatment remains specified.
Theorem 3
(Adapted from [16]). For fixed k, δ_1, δ_1^*, δ_2, δ_2^*, and ψ_0, the probability requirements (1) and (2) are satisfied by choosing values of (n, c_1, c_2) for which the corresponding lower bounds on P(CS_1) and P(CS_2) are at least P_1^* and P_2^*, respectively, where the bound on P(CS_1) is evaluated at odds ratio zero for the experimental treatments (its minimizing value; see Remark 3) and the bound on P(CS_2) is computed using the specified control odds ratio ψ_0; the explicit expressions are given in [16] and reproduced in Appendix A.
Remark 2.
The lower bound on P(CS_2) depends only on the odds ratio ψ_0 for the control treatment.
Remark 3.
Ref. [16] demonstrated that the lower bound on P(CS_1) increases with the odds ratios ψ_i of the experimental treatments π_i, for i = 1, …, k. Therefore, the minimum value of this lower bound is achieved when the odds ratios of all experimental treatments are zero. To obtain a lower bound for P(CS_1), we evaluated it under the assumption that all treatments tested have odds ratios equal to zero. As a result, the scenario with unspecified odds ratios is effectively handled by considering the scenario where the odds ratios of the tested treatments are zeros.
Remark 4.
The theoretical results presented here are obtained under the assumption that all k treatments share the same type of dependence between the two endpoints. For instance, the endpoints may be mutually independent across all treatments, or they may follow a common, known dependence structure. Nevertheless, due to the generality of our derivations, the framework can also accommodate settings in which different treatments exhibit heterogeneous dependence types—such as independence for some and unspecified association for others.
Remark 5.
Under configuration Ω_1, where p_{i1} = p_{01} + δ_1^* and p_{i2} = p_{02} + δ_2^* for all i = 1, …, k, the quantity LB_1 represents a lower bound on the probability of correctly selecting all experimental treatments when they are truly superior to the control. In this sense, LB_1 is analogous to a lower bound on the power in classical hypothesis testing. On the other hand, under configuration Ω_2, where p_{i1} = p_{01} + δ_1 and p_{i2} = p_{02} + δ_2 for all i = 1, …, k, the quantity LB_2 denotes a lower bound on the probability of correctly selecting only the control treatment π_0, reflecting the scenario in which none of the experimental treatments are truly superior to the control. In this case, 1 − LB_2 is conceptually analogous to the upper bound of the family-wise error rate (FWER) in multiple hypothesis testing, as it represents the probability of erroneously selecting at least one ineffective treatment.
4. Proposed Curtailment Procedure
To address the objective described in Section 2, we introduce a curtailed procedure, denoted by R. This sequential approach utilizes curtailment to reduce the required sample size for treatments that are evidently ineffective or demonstrate sufficient efficacy. Let n represent the maximum number of observations allowed per treatment.
Curtailment Procedure R:
A treatment is considered contending if it has not yet been eliminated from the study. The procedure begins with all k + 1 populations classified as contending. Sampling follows a vector-at-a-time approach. At “Step M”, where 1 ≤ M ≤ n, a total of M vectors have been drawn. Let X_{i1}(M) and X_{i2}(M) denote the cumulative numbers of successes observed for endpoints 1 and 2 of treatment π_i through Step M.
Sampling Rule. A vector-at-a-time sampling strategy is adopted, subject to the following constraints:
(a) A maximum of n observations may be collected from each of the k + 1 populations. Sampling proceeds sequentially from each contending treatment, one observation at a time, until either the treatment accumulates n observations or sampling from it is terminated early based on criteria (b) or (c) below.
(b) At any step M, if the numbers of successes for the two endpoints, X_{i1}(M) and X_{i2}(M), of treatment π_i satisfy
X_{i1}(M) + (n − M) < X_{01}(M) + c_1 or X_{i2}(M) + (n − M) < X_{02}(M) + c_2,
then eliminate treatment π_i and stop sampling from it; under this condition, π_i cannot satisfy the selection condition of Procedure H regardless of the remaining outcomes.
(c) At any step M, if the numbers of successes for the two endpoints, X_{i1}(M) and X_{i2}(M), of treatment π_i satisfy
X_{i1}(M) ≥ X_{01}(M) + (n − M) + c_1 and X_{i2}(M) ≥ X_{02}(M) + (n − M) + c_2,
then terminate sampling for treatment π_i; its selection under Procedure H is already guaranteed regardless of the remaining outcomes.
Stopping Rule:
The experiment is terminated at the first step M at which any one of the following three conditions is met:
(i)
There exists a partition {A, B} of the set {π_1, …, π_k}, with A nonempty, such that every treatment in A has stopped sampling under condition (c) and every treatment in B has been eliminated under condition (b);
(ii)
For all i = 1, …, k, treatment π_i has been eliminated under condition (b);
(iii)
M = n.
Decision Rule:
(a) If sampling terminates under condition (i) of the above Stopping Rule, then all treatments contained in set A are included in the selected subset.
(b) If sampling terminates under condition (ii) of the above Stopping Rule, we conclude that none of the experimental treatments demonstrates a statistically significant improvement over the control treatment π_0.
(c) If sampling terminates under condition (iii) of the above Stopping Rule, we include in the selected subset all treatments π_i satisfying X_{i1}(n) ≥ X_{01}(n) + c_1 and X_{i2}(n) ≥ X_{02}(n) + c_2. If no such treatments exist—i.e., the selected subset is empty—we conclude that none of the experimental treatments is significantly better than the control π_0.
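The interplay of the sampling, stopping, and decision rules can be sketched in code. The sketch below is our own illustration, not the paper's implementation: it assumes the selection condition of Procedure H is X_{i1}(n) ≥ X_{01}(n) + c_1 and X_{i2}(n) ≥ X_{02}(n) + c_2, writes the curtailment conditions in the worst-case-bound form, and uses illustrative names and values throughout.

```python
import random

def procedure_H(data, c1, c2):
    """Fixed-sample rule on the full data; data[0] is the control arm and
    each data[i] is a list of n (endpoint1, endpoint2) binary outcome pairs."""
    tot = [(sum(e1 for e1, _ in arm), sum(e2 for _, e2 in arm)) for arm in data]
    y1, y2 = tot[0]
    return {i for i in range(1, len(data))
            if tot[i][0] >= y1 + c1 and tot[i][1] >= y2 + c2}

def procedure_R(data, c1, c2):
    """Curtailed rule on the same data; returns (subset, observations used)."""
    n, k = len(data[0]), len(data) - 1
    sel, elim, used = set(), set(), 0
    x = [[0, 0] for _ in data]                  # running success counts
    for M in range(1, n + 1):
        for i in range(k + 1):                  # vector-at-a-time sampling
            if i == 0 or (i not in sel and i not in elim):
                e1, e2 = data[i][M - 1]
                x[i][0] += e1
                x[i][1] += e2
                used += 1
        y1, y2 = x[0]
        for i in range(1, k + 1):
            if i in sel or i in elim:
                continue
            if x[i][0] + (n - M) < y1 + c1 or x[i][1] + (n - M) < y2 + c2:
                elim.add(i)                     # selection no longer possible
            elif x[i][0] >= y1 + (n - M) + c1 and x[i][1] >= y2 + (n - M) + c2:
                sel.add(i)                      # selection already guaranteed
        if len(sel) + len(elim) == k:           # stopping rules (i)/(ii)/(iii)
            break
    return sel, used

# Quick check of Theorem 4 on simulated data: identical subsets, fewer draws.
random.seed(7)
data = [[(int(random.random() < 0.5), int(random.random() < 0.6))
         for _ in range(12)] for _ in range(4)]   # control + k = 3 arms
subset_R, used = procedure_R(data, 2, 2)
assert subset_R == procedure_H(data, 2, 2)
```

Running both procedures on the same outcome sequences makes the equivalence argument of Theorem 4 concrete: every early decision of R is one that H is already committed to.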
Theorem 4.
Given fixed k and n, Procedures H and R produce the same subset of treatments if they are applied with identical threshold parameters c_1 and c_2. This equivalence holds uniformly over the entire parameter space.
Proof.
Decision Rule (c) of Procedure R, which applies exclusively when sampling terminates under Stopping Rule (iii), coincides with the decision rule of Procedure H in the case where M = n. Consequently, for any sampling outcome in which each of the treatments reaches the maximum of n observations, Procedures R and H will yield the same selected subset. It therefore suffices to examine only those cases in which Procedure R makes a decision under Rule (a) or (b).
Decision Rules (a) and (b) are applied exclusively when sampling terminates under Stopping Rules (i) and (ii), respectively. Observe that stopping under either of these rules always occurs before the maximum number of observations is reached; that is, M < n, and hence n − M > 0.
If sampling terminates under Stopping Rule (i), then under Procedure R, Decision Rule (a) selects all treatments π_i that satisfy the condition:
X_{i1}(M) ≥ X_{01}(M) + (n − M) + c_1 and X_{i2}(M) ≥ X_{02}(M) + (n − M) + c_2.
Now suppose the experiment were to continue as it would under Procedure H. Let X_{i1}(n) and X_{i2}(n) denote the total numbers of successes for treatment π_i at endpoints 1 and 2, respectively, after n observations. Then, by Rule (1) of Procedure H, treatment π_i would be selected if
X_{i1}(n) ≥ X_{01}(n) + c_1 and X_{i2}(n) ≥ X_{02}(n) + c_2.
Observe that, for j = 1, 2:
X_{ij}(n) ≥ X_{ij}(M) ≥ X_{0j}(M) + (n − M) + c_j ≥ X_{0j}(n) + c_j,
since treatment π_i can only gain successes after step M, while the control can gain at most n − M further successes. Hence, the same subset of treatments would be selected by Procedure H.
Similarly, if sampling terminates under Stopping Rule (ii), then Decision Rule (b) of Procedure R results in the selection of no experimental treatments. Since X_{ij}(n) ≤ X_{ij}(M) + (n − M) and X_{0j}(n) ≥ X_{0j}(M), no eliminated treatment could satisfy the selection condition after n observations, so this outcome coincides precisely with the decision specified by Rule (2) of Procedure H.
This concludes the proof of the theorem. □
5. Tables
In this section, we assess the performance of the curtailment procedure by examining its potential for sample size reduction compared to the corresponding non-curtailment procedure H. Throughout the analysis, we assume a common association structure between the two endpoints across all treatments. Nevertheless, our results indicate that the parameter derivations for the curtailment procedure are also applicable to settings in which the association structures vary across treatments.
We examine the same configurations of k, the control success probabilities p_{01} and p_{02}, the threshold constants δ_1, δ_1^*, δ_2, δ_2^*, the odds ratio ψ, and the probability requirements P_1^* and P_2^* as those adopted in [16] for generating Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 under the fixed-sample-size procedure H. For each configuration, the minimum required sample size per treatment, n, along with the corresponding thresholds c_1 and c_2, was determined to ensure that the specified probability constraints were satisfied. If multiple combinations met these constraints, the design yielding the highest probability of selecting a single effective treatment was chosen. This n is then used as the maximum number of observations per treatment under the curtailment procedure R. As established in Theorem 4, selecting the same values of n, c_1, and c_2 ensures that the curtailment procedure meets the same probability requirements as Procedure H. This result ensures that the theoretical properties derived under the fixed-sample framework—such as those in Theorems 1–3—can be validly transferred to the curtailed procedure without requiring separate proofs. In this sense, Theorem 4 provides the theoretical foundation for leveraging fixed-sample thresholds within a sequential, curtailed sampling design.
We define N as the total number of observations required by the fixed-sample-size procedure to meet the prescribed probability constraints, with N = (k + 1)n. This value of N also serves as an upper limit on the total number of observations under the curtailment procedure. Let E_1(N_R) and E_2(N_R) represent the expected sample sizes of Procedure R under the configurations Ω_1 and Ω_2, respectively, as described in Section 3. These configurations were also used to compute the lower bounds LB_1 and LB_2 for selecting a correct subset under the alternative and null hypotheses, and to determine the design parameters for Procedure H under the fixed-sample-size setting.
We define the average expected sample size under the curtailment procedure as T = [E_1(N_R) + E_2(N_R)]/2, following the approach of Thall, Simon, and Ellenberg, to account for performance under both configurations. The quantities E_1(N_R) and E_2(N_R) are estimated via simulation (10,000 repetitions), implemented in R. To generate bivariate binary data with marginal probabilities p_1, p_2 and odds ratio ψ, we compute p_{11} = p_1 p_2 when ψ = 1, and otherwise
p_{11} = [a − √(a² − 4ψ(ψ − 1) p_1 p_2)] / [2(ψ − 1)], where a = 1 + (p_1 + p_2)(ψ − 1),
and then simulate binary outcomes from a 2 × 2 table with cell probabilities (p_{11}, p_1 − p_{11}, p_2 − p_{11}, 1 − p_1 − p_2 + p_{11}).
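The generation step just described can be sketched as follows. This is our own Python code (the paper's simulations were run in R); the root formula for p_{11} is the standard one for a 2 × 2 table with fixed marginals and odds ratio.

```python
import math
import random

def cell_probs(p1, p2, psi):
    """Cell probabilities (p11, p12, p21, p22) of the 2x2 table with
    marginal success probabilities p1, p2 and odds ratio psi."""
    if psi == 1.0:
        p11 = p1 * p2                       # independent endpoints
    else:
        a = 1.0 + (p1 + p2) * (psi - 1.0)
        # admissible root of (psi - 1) p11^2 - a p11 + psi p1 p2 = 0
        p11 = (a - math.sqrt(a * a - 4.0 * psi * (psi - 1.0) * p1 * p2)) \
              / (2.0 * (psi - 1.0))
    return (p11, p1 - p11, p2 - p11, 1.0 - p1 - p2 + p11)

def draw_pair(p1, p2, psi):
    """One bivariate binary observation (endpoint 1, endpoint 2)."""
    p11, p12, p21, _ = cell_probs(p1, p2, psi)
    u = random.random()
    if u < p11:
        return (1, 1)
    if u < p11 + p12:
        return (1, 0)
    if u < p11 + p12 + p21:
        return (0, 1)
    return (0, 0)
```

For example, `cell_probs(0.3, 0.2, 2.0)` returns a table whose implied odds ratio is 2 and whose row and column sums reproduce the marginals 0.3 and 0.2.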
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 summarize, for each parameter setting, the total sample size N required by the fixed-sample-size procedure, the expected sample sizes E_1(N_R) and E_2(N_R) under the curtailment procedure, and the percentage reduction in observations achieved via curtailment, computed as 100(1 − T/N)%. These results clearly demonstrate that Procedure R requires significantly fewer observations than Procedure H to achieve equivalent performance guarantees.
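For concreteness, the tabulated summaries combine as follows; this is a small arithmetic sketch with placeholder numbers, not values taken from the tables.

```python
def percent_saving(e1, e2, n_total):
    """Percentage reduction 100 * (1 - T / N), where T = (e1 + e2) / 2 is the
    average expected sample size of Procedure R and n_total is the fixed
    total N of Procedure H."""
    t = (e1 + e2) / 2.0
    return 100.0 * (1.0 - t / n_total)

# Placeholder illustration: if E1 = 80, E2 = 60 and N = 100, T = 70 and the
# curtailed design saves 30% of the observations on average.
print(percent_saving(80.0, 60.0, 100.0))
```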
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 also demonstrate that the odds ratio has little impact on the expected sample size when the ψ values are relatively close, under both Ω_1 and Ω_2, for the curtailment procedure. Greater variability is observed under Ω_2. This pattern is consistent with the findings of [12], who considered only moderate odds ratios in the context of hypothesis testing, and observed minimal sample size variation under curtailment, with larger variability under the null hypothesis. In our procedure, however, when ψ varies substantially—for example, from 1 to 100—we observe a marked decrease in expected sample size under both Ω_1 and Ω_2. Ref. [12] did not report average expected sample sizes, but instead presented results under both the null and alternative hypotheses and calculated percentage savings. Their results showed modest sample size savings under the alternative hypothesis and more pronounced savings under the null, consistent with our own findings.
6. Examples
6.1. Immunotherapy in Elderly Patients with Non-Small Cell Lung Cancer
To demonstrate the practical utility of our method, we construct the following example. It is intended to illustrate how to implement the proposed design and how to use the simulation-based reference tables; it is not intended to validate the underlying distributional assumptions on this dataset.
This example considers an experimental trial involving two immunotherapy-based treatments for elderly patients (≥75 years old) diagnosed with advanced non-small cell lung cancer (NSCLC). The trial compares two immunotherapy strategies—PD1-A (anti-PD-1 monotherapy) and PD1-B (anti-PD-1 combined with low-dose chemotherapy)—against the standard chemotherapy regimen consisting of carboplatin and pemetrexed, which serves as the control treatment.
While carboplatin plus pemetrexed is considered the standard of care in general NSCLC populations, this regimen has not been adequately studied in patients aged 75 and above. Historical data in younger NSCLC populations provide the objective response rate (ORR) of this standard chemotherapy and the proportion of patients experiencing grade 3 or higher treatment-related adverse events. These outcomes establish the benchmark efficacy and safety rates for the control treatment, p_{01} and p_{02}, respectively. Prior analyses in younger patients suggest an odds ratio of approximately 2 between efficacy and safety, indicating that patients who do not experience toxicity are more likely to respond to treatment.
This study aims to assess whether PD1-A or PD1-B outperforms the control treatment with respect to efficacy and safety. Specifically, the experimenter seeks an increase in the response rate of at least and a reduction in high-grade toxicity of at least , corresponding to and . If both experimental treatments fail to demonstrate improvements over the control, the standard chemotherapy will be selected, with thresholds . This means that whenever a tested treatment exceeds the control by at least in efficacy and in safety, the procedure guarantees a probability of at least of selecting all such treatments. Conversely, if none of the experimental treatments surpass the control by more than in either endpoint, the procedure ensures a probability of at least of correctly selecting only the control treatment.
When and , Table 3 shows that the fixed sample size procedure requires observations per treatment, with corresponding critical values and . Therefore, a total of observations is needed under the fixed sample size procedure to meet the specified probability constraints. In contrast, the curtailment procedure, while also using at most observations per treatment and the same critical values , , is expected to achieve the same probability guarantees with fewer observations on average. According to Table 3, the expected relative sample size saving from using the curtailment procedure is approximately .
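The tabulated total sample size and percentage saving are simple functions of n, k, and the average expected sample size; a minimal sketch (function names are ours), checked against the first row of Table 3 for k = 2:

```python
def total_sample_size(n, k):
    # n observations per arm, for k experimental treatments plus one control
    return (k + 1) * n

def relative_saving(total_n, expected_n):
    # percentage reduction of the curtailed procedure's average expected
    # total sample size relative to the fixed total
    return 100.0 * (total_n - expected_n) / total_n

# First row of Table 3 (k = 2): n = 87, average expected sample size
# 230.23, tabulated saving 11.79%
N = total_sample_size(87, 2)                    # 261
saving = round(relative_saving(N, 230.23), 2)   # 11.79
```

The same arithmetic reproduces the last two columns of every row in Tables 2–7.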
6.2. Chemotherapy of Acute Leukemia
This example is adapted from a clinical trial scenario described in [17], with minor modifications to align with our design framework. This example likewise serves to demonstrate implementation rather than to validate the distributional assumptions for this dataset.
This example involves an experimental trial comparing two different combinations of Gemcitabine and Cyclophosphamide—denoted as GemCy1 and GemCy2—each with varying dosage proportions, against the standard Ara-C regimen for treating patients with good-prognosis acute myelogenous leukemia (AML) or myelodysplastic syndrome. Historically, the standard treatment Ara-C yields approximately of patients achieving complete remission (CR), while around either die or experience severe myelosuppression within the first five weeks. These historical outcomes establish the efficacy and safety rates for the control treatment as and , respectively. Additionally, the odds ratio between efficacy and safety is estimated to be , indicating that patients who do not experience toxicity are more likely to achieve complete remission.
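The dependence between remission and toxicity can be pinned down from the two marginal rates and the odds ratio. A sketch, assuming the usual odds-ratio parameterization of a bivariate Bernoulli distribution (`joint_cell_prob` is our name, not the paper's), which returns the probability that both endpoints equal 1:

```python
import math

def joint_cell_prob(p1, p2, psi):
    """P(both endpoints = 1) for a bivariate Bernoulli with marginal
    success probabilities p1, p2 and odds ratio psi (psi = 1
    corresponds to independent endpoints)."""
    if psi == 1.0:
        return p1 * p2
    s = 1.0 + (psi - 1.0) * (p1 + p2)
    disc = s * s - 4.0 * psi * (psi - 1.0) * p1 * p2
    # the root of the defining quadratic lying in [0, min(p1, p2)]
    return (s - math.sqrt(disc)) / (2.0 * (psi - 1.0))

# Sanity check: recover the odds ratio from the implied 2x2 table
p11 = joint_cell_prob(0.5, 0.5, 2.0)
p10, p01 = 0.5 - p11, 0.5 - p11
p00 = 1.0 - 0.5 - 0.5 + p11
odds_ratio = (p11 * p00) / (p10 * p01)   # ~2.0
```

Once the four cell probabilities are determined, the multinomial counts used by the procedure follow directly.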
The goal of the trial is to determine whether either GemCy1 or GemCy2 surpasses the control treatment in terms of benefit-risk profile. Since the toxicity associated with Ara-C is relatively severe, our design prioritizes safety improvement while maintaining efficacy. Specifically, we consider a benefit-risk trade-off scenario in which the efficacy threshold is relaxed—i.e., we set , requiring only that the experimental treatments match the control in efficacy—while demanding a safety improvement of at least . This reflects the objective of identifying a new treatment that is comparably effective but significantly safer. If neither experimental treatment demonstrates superiority in safety, the control treatment is selected. To enforce this, we use and . The negative value of indicates that an experimental treatment will be excluded if its efficacy falls more than 0.2 below that of the control (i.e., if ). This ensures that a treatment cannot be selected solely on the basis of improved safety if it sacrifices too much efficacy.
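The asymmetric margins can be read as a simple admissibility predicate; a hypothetical illustration (names and numeric arguments are ours, not the trial's):

```python
def admissible(eff, safe, ctrl_eff, ctrl_safe, d_eff, d_safe):
    """A treatment stays in contention only if its efficacy is at least
    ctrl_eff + d_eff and its safety is at least ctrl_safe + d_safe.
    A negative d_eff tolerates a bounded efficacy shortfall in
    exchange for a required safety gain."""
    return eff >= ctrl_eff + d_eff and safe >= ctrl_safe + d_safe

# With d_eff = -0.2: efficacy may fall up to 0.2 below control,
# but no further, regardless of how much safer the treatment is.
ok = admissible(0.35, 0.90, 0.50, 0.70, -0.2, 0.15)   # True
bad = admissible(0.25, 0.99, 0.50, 0.70, -0.2, 0.15)  # False
```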
When and , the grid search results show that the fixed sample size procedure requires observations per treatment, with corresponding critical values and . Consequently, implementing the fixed sample size procedure requires a total of observations to meet the required probabilistic conditions. In contrast, the curtailment procedure, while also using at most observations per treatment and the same critical values , , is expected to achieve the same probability guarantees with fewer observations on average. According to simulations of the curtailed procedure, the expected relative sample size saving from using curtailment is approximately .
Remark 6.
The thresholds are design parameters specified by the investigator to reflect the clinical priorities of the trial. If the goal is to prioritize efficacy, one may choose a relatively large value for (requiring a strong efficacy improvement) and a small or even zero value for (allowing safety to remain comparable to control). Conversely, higher values of can be used to enforce stricter safety improvements. This flexibility allows the proposed procedure to accommodate various benefit-risk trade-offs tailored to specific therapeutic contexts. In this framework, the probability of correctly selecting a truly effective treatment—i.e., one that exceeds both thresholds and —is guaranteed to exceed , which serves a role analogous to the power in hypothesis testing.
Remark 7.
The thresholds and play a complementary role by specifying minimal margins below which a treatment is considered insufficiently effective or safe. Treatments that fail to exceed either or are excluded from selection, ensuring that only those with meaningful improvements over the control are considered. This design component helps control the probability of incorrectly selecting inferior treatments, serving a role analogous to controlling the type I error rate in hypothesis testing.
Remark 8.
The values of and for the control group are typically known or can be reliably estimated from historical data, since the control treatment usually corresponds to an existing standard-of-care therapy with well-documented efficacy and safety profiles. In situations where the exact values of and are uncertain, a conservative design strategy can be employed. Specifically, one may perform a grid search over a plausible range of pairs and identify the worst-case configuration, i.e., the combination that yields the largest required sample size n. The study can then be designed for this worst-case scenario, ensuring robust performance across all reasonable control assumptions.
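The conservative strategy above can be sketched directly. In the code below, `toy_required_n` is a made-up stand-in for the paper's actual grid-search computation of n, included only to show the worst-case selection logic:

```python
def worst_case_design(control_grid, required_n):
    """Evaluate the design at every plausible control pair (p1, p2)
    and keep the configuration demanding the largest per-arm n."""
    worst = max(control_grid, key=required_n)
    return worst, required_n(worst)

def toy_required_n(pq):
    # made-up surrogate: n peaks when both rates are near 0.5
    p1, p2 = pq
    return int(100 * (1.0 - abs(p1 - 0.5) - abs(p2 - 0.5)))

grid = [(p1 / 10, p2 / 10) for p1 in range(3, 7) for p2 in range(2, 5)]
worst, n = worst_case_design(grid, toy_required_n)
```

Designing at `worst` guarantees the probability requirements hold for every control configuration on the grid.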
7. Conclusions
This study introduces a curtailment procedure for selecting a random-sized subset that includes the best treatment, provided it demonstrates a significant improvement over the control. The comparison is based on two Bernoulli endpoints observed for each of the k experimental treatments and the control group. The proposed method builds upon the fixed sample size approach outlined by [16]. While meeting the same probability constraints for successful selection as the original method, the curtailment approach requires fewer observations from the experimental arms. Simulation results show that this method can yield a relative total sample size reduction of approximately 10– compared to the fixed sample size procedure. The adaptive nature of curtailment is especially advantageous, as it not only decreases the overall sample size but also avoids unnecessary sampling from less promising treatments. However, to ensure its practical feasibility, the time between treatment administration and response observation should be short relative to the experiment duration.
Furthermore, simulation results suggest that variations in the odds ratio have negligible influence on the required sample size. The insensitivity to odds ratios contributes to the robustness of the proposed method, particularly when the assumption of independence between endpoints is violated. In our study, we assumed a consistent pattern of association between the two endpoints across all treatment arms. Nevertheless, this assumption can be readily relaxed to allow for differing association structures among treatments.
As a potential direction for future research, when the true success probabilities are unknown at the design stage, one may estimate the association parameter from historical or pilot data and obtain a confidence set . Rather than reporting ranges for , a robust, worst-case calibration could be performed to choose a single triplet meeting the selection-probability guarantees for all . Our design tables suggest that the resulting robust triplet often coincides with the design at one endpoint of , reflecting the empirical stability of across nearby values. A systematic development and comparison of such interval-based robust calibration is left for future study.
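Using the n values tabulated in Table 3 (first block) as an example, the worst-case calibration over an interval of odds ratios reduces to a maximum over the candidate values, attained at an endpoint when n is monotone in the odds ratio; the interval bounds below are hypothetical:

```python
# Per-arm n from Table 3, first block, indexed by odds ratio psi
n_for_psi = {0.0: 87, 0.01: 87, 0.1: 85, 1.0: 85,
             2.0: 81, 4.0: 81, 8.0: 81, 100.0: 78}

def robust_n(psi_low, psi_high, table):
    """Worst-case per-arm sample size over tabulated odds ratios
    falling inside the interval [psi_low, psi_high]."""
    return max(n for psi, n in table.items()
               if psi_low <= psi <= psi_high)

n = robust_n(1.0, 8.0, n_for_psi)   # 85, attained at psi = 1.0
```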
In applied use, practitioners should justify or check the adequacy of the joint multinomial (equivalently, bivariate–binomial) assumption for their specific datasets prior to adopting the procedure. The present work focuses on the methodological framework and guarantees under this assumption. As another avenue for future work, it would be valuable to investigate the empirical validity of the underlying distributional assumptions—such as the bivariate binomial or multinomial model—when applying the proposed procedure to real-world clinical datasets.
Additionally, a reader may wonder about selecting the single most efficacious treatment from the identified subset. That task falls under the select-the-best framework, which has objectives and methodological requirements distinct from those of the subset selection problem studied here; each approach has its own advantages and limitations. The two formulations are known as the Subset Selection Approach and the Indifference Zone Approach, respectively; see [18]. We view these two directions as complementary rather than contradictory, and believe they are best addressed in separate articles.
Author Contributions
Conceptualization, P.C. and L.H.; methodology, E.M.B. and C.Y.; software, C.Y.; validation, E.M.B., P.C. and L.H.; formal analysis, C.Y.; investigation, E.M.B. and C.Y.; resources, C.Y.; data curation, E.M.B. and C.Y.; writing—original draft preparation, C.Y.; writing—review and editing, E.M.B., P.C. and L.H.; visualization, C.Y.; supervision, P.C.; project administration, L.H.; funding acquisition, P.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The raw data supporting the conclusions of this article will be made available by the authors on request.
Acknowledgments
The authors thank the anonymous reviewers for their valuable comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Appendix A. Proofs of Theorems 1–3
Theorems 1–3 and their corresponding proofs are adapted from our submitted manuscript ([16]). They are included here for the reader’s convenience and to ensure the completeness of the current work.
To prove Theorems 1–3, we introduce the following monotonicity properties of the joint probability
Let be fixed constants.
(a) For fixed ϕ and , is increasing in .
(b) For fixed ϕ and , is increasing in .
(c) For fixed and , is increasing in ϕ.
We now provide the proof of Theorem 1 from Section 3.
Proof.
Suppose that there are treatments superior to the control. Without loss of generality, assume that treatments are better than the control. Since , efficacy and safety are independent, and each endpoint follows a binomial distribution with success probabilities and , respectively. Let denote the probability mass function of a binomial distribution, i.e.,
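In standard notation, the binomial probability mass function referred to above is (the symbols b and m are ours):

```latex
b(x; m, p) \;=\; \binom{m}{x}\, p^{x} (1-p)^{m-x},
\qquad x = 0, 1, \ldots, m .
```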
Then by the definition of , we have:
Then, if
, we obtain that
Now we show that .
By the definition of , we have:
Now consider a typical term in the product of the above expression:
Therefore,
and so if
we obtain that .
□
We now provide the proof of Theorem 2 from Section 3.
Proof.
Suppose that there are treatments better than the control treatment. Without loss of generality, we can assume that treatments are better than the control.
The joint probability distribution of and can be written as:
where
By the monotonicity properties of the joint probability introduced by [19], we have:
Since
We obtain that
So
□
We now provide the proof of Theorem 3 from Section 3.
Proof.
Suppose that there are treatments better than the control treatment. Without loss of generality, we can assume that treatments are better than the control.
By the monotonicity properties of the joint probability introduced by [19], we have:
The first inequality now follows if we use the expression of the joint probability distribution of and when = 0, described in Section 3.
The proof of the second inequality follows similarly to the proof of the second inequality in the previous theorem. □
References
Sobel, M.; Huyett, M.J. Selecting the Best One of Several Binomial Populations. Bell Syst. Tech. J. 1957, 36, 537–576.
Gupta, S.S.; Sobel, M. On Selecting a Subset which Contains all Populations Better than a Standard. Ann. Math. Stat. 1958, 29, 235–244.
Dunnett, C.M. Selection of the Best Treatment in Comparison to a Control with an Application to a Medical Trial. In Design of Experiments, Ranking, and Selection; Santner, T.J., Tamhane, A.C., Eds.; Marcel Dekker: New York, NY, USA, 1984; pp. 47–66.
Carsten, C.; Chen, P. Curtailed Two-Stage Matched Pairs Design in Double-Arm Phase II Clinical Trials. J. Biopharm. Stat. 2016, 26, 816–822.
Bechhofer, R.E.; Kulkarni, R.V. Closed Adaptive Sequential Procedures for Selecting the Best of k ≥ 2 Bernoulli Populations. No. TR510. 1981. Available online: https://apps.dtic.mil/sti/html/tr/ADA115653/ (accessed on 2 September 2025).
Buzaianu, E.M.; Chen, P. Curtailment Procedure for Selecting Among Bernoulli Populations. Commun. Stat. Theory Methods 2008, 37, 1085–1102.
Jennison, C. Equal Probability of Correct Selection for Bernoulli Selection Procedures. Commun. Stat. Theory Methods 1983, 12, 2887–2896.
Jennison, C.; Turnbull, B.W. Group sequential tests for bivariate response: Interim analysis of clinical trials with both efficacy and safety endpoints. Biometrics 1993, 49, 741–752.
Bryant, J.; Day, R. Incorporating Toxicity Considerations into the Design of Two-Stage Phase II Clinical Trials. Biometrics 1995, 51, 1372–1383.
Conway, M.R.; Petroni, G.R. Bivariate Sequential Designs for Phase II Trials. Biometrics 1995, 51, 656–664.
Chen, C.M.; Chi, Y. Curtailed two-stage designs with two dependent binary endpoints. Pharm. Stat. 2012, 11, 57–62.
Shen, H.; Hong, L.J.; Zhang, X. Ranking and selection with covariates for personalized decision making. INFORMS J. Comput. 2021, 33, 1500–1519.
Ding, L.; Hong, L.J.; Shen, H.; Zhang, X. Knowledge Gradient for Selection with Covariates: Consistency and Computation. In Proceedings of the 2019 Winter Simulation Conference, National Harbor, MD, USA, 8–12 December 2019; pp. 3425–3436.
Buzaianu, E.M.; Chen, P.; Hsu, L. A Curtailed Procedure for Selecting Among Treatments With Two Bernoulli Endpoints. Sankhya B 2022, 84, 320–339.
Yin, C.; Buzaianu, E.M.; Chen, P.; Hsu, L. A Design for Selecting Among Treatments with Two Binary Endpoints. Manuscript submitted to Communications in Statistics—Theory and Methods; Taylor & Francis: Philadelphia, PA, USA; under review.
Thall, P.F.; Cheng, S.C. Treatment comparisons based on two-dimensional safety and efficacy alternatives in oncology trials. Biometrics 1999, 55, 746–753.
Gupta, S.S.; Panchapakesan, S. Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations; SIAM: Philadelphia, PA, USA, 2002.
Buzaianu, E.M.; Chen, P.; Hsu, L. Selecting among treatments with two Bernoulli endpoints. Commun. Stat. Theory Methods 2024, 53, 1964–1984.
Table 1. Classification table.

| First Endpoint \ Second Endpoint | 1 | 2 | Total |
|---|---|---|---|
| 1 |  |  |  |
| 2 |  |  |  |
| Total |  |  | n |
Table 2. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0 | 81 | 14 | 13 | 243 | 195.96 | 232.41 | 214.18 | 11.86 |
|  | 0.01 | 81 | 14 | 13 | 243 | 196.61 | 232.10 | 214.36 | 11.79 |
|  | 0.1 | 78 | 14 | 12 | 234 | 190.27 | 222.84 | 206.55 | 11.73 |
|  | 1 | 77 | 14 | 12 | 231 | 189.83 | 219.01 | 204.42 | 11.51 |
|  | 2 | 75 | 13 | 12 | 225 | 186.30 | 212.83 | 199.56 | 11.3 |
|  | 4 | 75 | 13 | 12 | 225 | 187.11 | 212.41 | 199.76 | 11.22 |
|  | 8 | 71 | 13 | 11 | 213 | 177.07 | 200.73 | 188.90 | 11.32 |
|  | 100 | 69 | 12 | 11 | 207 | 174.27 | 194.16 | 184.22 | 11.01 |
| 0.5 | 0 | 80 | 14 | 13 | 240 | 193.65 | 229.38 | 211.52 | 11.87 |
|  | 0.01 | 80 | 14 | 13 | 240 | 193.97 | 229.26 | 211.61 | 11.83 |
|  | 0.1 | 77 | 14 | 12 | 231 | 187.35 | 220.16 | 203.76 | 11.79 |
|  | 1 | 76 | 14 | 12 | 228 | 186.77 | 216.31 | 201.54 | 11.61 |
|  | 2 | 74 | 13 | 12 | 222 | 183.31 | 210.23 | 196.77 | 11.37 |
|  | 4 | 74 | 13 | 12 | 222 | 184.22 | 209.72 | 196.97 | 11.27 |
|  | 8 | 70 | 13 | 11 | 210 | 174.25 | 198 | 186.12 | 11.37 |
|  | 100 | 67 | 12 | 11 | 201 | 169.16 | 188.56 | 178.86 | 11.01 |
| 0.6 | 0 | 75 | 14 | 12 | 225 | 180.99 | 215.14 | 198.06 | 11.97 |
|  | 0.01 | 75 | 14 | 12 | 225 | 181.04 | 215.12 | 198.08 | 11.97 |
|  | 0.1 | 73 | 13 | 12 | 219 | 177.48 | 208.88 | 193.18 | 11.79 |
|  | 1 | 73 | 13 | 12 | 219 | 179.61 | 207.89 | 193.75 | 11.53 |
|  | 2 | 71 | 14 | 11 | 213 | 173.74 | 202.21 | 187.97 | 11.75 |
|  | 4 | 69 | 13 | 11 | 207 | 170.59 | 195.87 | 183.23 | 11.48 |
|  | 8 | 67 | 12 | 11 | 201 | 167.16 | 189.63 | 178.40 | 11.25 |
|  | 100 | 62 | 12 | 10 | 186 | 155.61 | 174.58 | 165.10 | 11.24 |
Table 3. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0 | 87 | 15 | 13 | 261 | 212.04 | 248.41 | 230.23 | 11.79 |
|  | 0.01 | 87 | 15 | 13 | 261 | 212.71 | 248.08 | 230.40 | 11.73 |
|  | 0.10 | 85 | 14 | 13 | 255 | 209.71 | 241.68 | 225.70 | 11.49 |
|  | 1 | 85 | 14 | 13 | 255 | 212.24 | 240.44 | 226.34 | 11.24 |
|  | 2 | 81 | 14 | 12 | 243 | 202.46 | 228.73 | 215.60 | 11.28 |
|  | 4 | 81 | 14 | 12 | 243 | 203.30 | 228.21 | 215.75 | 11.21 |
|  | 8 | 81 | 14 | 12 | 243 | 204.02 | 227.76 | 215.89 | 11.16 |
|  | 100 | 78 | 13 | 12 | 234 | 198.24 | 218.67 | 208.46 | 10.92 |
| 0.5 | 0 | 86 | 15 | 13 | 258 | 209.73 | 245.41 | 227.57 | 11.79 |
|  | 0.01 | 86 | 15 | 13 | 258 | 209.97 | 245.28 | 227.63 | 11.77 |
|  | 0.10 | 84 | 14 | 13 | 252 | 206.76 | 238.98 | 222.87 | 11.56 |
|  | 1 | 84 | 14 | 13 | 252 | 209.18 | 237.80 | 223.49 | 11.31 |
|  | 2 | 80 | 14 | 12 | 240 | 199.46 | 226.11 | 212.79 | 11.34 |
|  | 4 | 80 | 14 | 12 | 240 | 200.36 | 225.57 | 212.97 | 11.26 |
|  | 8 | 78 | 13 | 12 | 234 | 196.99 | 219.37 | 208.18 | 11.03 |
|  | 100 | 73 | 13 | 11 | 219 | 185.24 | 204.25 | 194.75 | 11.07 |
| 0.6 | 0 | 81 | 15 | 12 | 243 | 196.90 | 231.23 | 214.06 | 11.91 |
|  | 0.01 | 81 | 15 | 12 | 243 | 197.00 | 231.18 | 214.09 | 11.90 |
|  | 0.10 | 81 | 15 | 12 | 243 | 197.70 | 230.81 | 214.25 | 11.83 |
|  | 1 | 78 | 14 | 12 | 234 | 192.70 | 221.34 | 207.02 | 11.53 |
|  | 2 | 78 | 14 | 12 | 234 | 193.54 | 220.89 | 207.22 | 11.45 |
|  | 4 | 77 | 13 | 12 | 231 | 193.19 | 217.16 | 205.18 | 11.18 |
|  | 8 | 74 | 14 | 11 | 222 | 184.30 | 208.86 | 196.58 | 11.45 |
|  | 100 | 70 | 12 | 11 | 210 | 178.72 | 195.91 | 187.32 | 10.80 |
Table 4. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0 | 96 | 15 | 14 | 288 | 237.24 | 272.11 | 254.67 | 11.57 |
|  | 0.01 | 96 | 15 | 14 | 288 | 237.93 | 271.76 | 254.85 | 11.51 |
|  | 0.10 | 96 | 15 | 14 | 288 | 239.33 | 271.17 | 255.25 | 11.37 |
|  | 1 | 92 | 15 | 13 | 276 | 231.39 | 258.66 | 245.02 | 11.22 |
|  | 2 | 92 | 15 | 13 | 276 | 232.31 | 258.15 | 245.23 | 11.15 |
|  | 4 | 90 | 14 | 13 | 270 | 228.93 | 251.92 | 240.42 | 10.95 |
|  | 8 | 90 | 14 | 13 | 270 | 229.75 | 251.42 | 240.58 | 10.90 |
|  | 100 | 85 | 14 | 12 | 255 | 217.44 | 236.74 | 227.09 | 10.94 |
| 0.5 | 0 | 95 | 15 | 14 | 285 | 234.97 | 269.09 | 252.03 | 11.57 |
|  | 0.01 | 95 | 15 | 14 | 285 | 235.24 | 268.96 | 252.10 | 11.54 |
|  | 0.10 | 93 | 16 | 13 | 279 | 230.01 | 263.14 | 246.57 | 11.62 |
|  | 1 | 91 | 15 | 13 | 273 | 228.35 | 256.03 | 242.19 | 11.29 |
|  | 2 | 91 | 15 | 13 | 273 | 229.28 | 255.49 | 242.39 | 11.21 |
|  | 4 | 89 | 14 | 13 | 267 | 225.99 | 249.27 | 237.63 | 11.00 |
|  | 8 | 89 | 14 | 13 | 267 | 226.93 | 248.71 | 237.82 | 10.93 |
|  | 100 | 82 | 13 | 12 | 246 | 211.21 | 227.98 | 219.59 | 10.73 |
| 0.6 | 0 | 89 | 15 | 13 | 267 | 219.34 | 252.27 | 235.80 | 11.68 |
|  | 0.01 | 89 | 15 | 13 | 267 | 219.46 | 252.20 | 235.83 | 11.67 |
|  | 0.10 | 89 | 15 | 13 | 267 | 220.23 | 251.86 | 236.04 | 11.59 |
|  | 1 | 88 | 14 | 13 | 264 | 221.17 | 247.47 | 234.32 | 11.24 |
|  | 2 | 87 | 16 | 12 | 261 | 216.51 | 245.11 | 230.81 | 11.57 |
|  | 4 | 85 | 15 | 12 | 255 | 213.46 | 238.66 | 226.06 | 11.35 |
|  | 8 | 83 | 14 | 12 | 249 | 210.29 | 232.34 | 221.31 | 11.12 |
|  | 100 | 78 | 14 | 11 | 234 | 198.57 | 217.13 | 207.85 | 11.17 |
Table 5. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.00 | 95 | 16 | 15 | 380 | 308.89 | 362.28 | 335.59 | 11.69 |
|  | 0.01 | 94 | 17 | 14 | 376 | 305.01 | 357.95 | 331.48 | 11.84 |
|  | 0.10 | 93 | 17 | 14 | 372 | 303.28 | 354.08 | 328.68 | 11.65 |
|  | 1.00 | 91 | 16 | 14 | 364 | 300.94 | 344.40 | 322.67 | 11.35 |
|  | 2.00 | 89 | 15 | 14 | 356 | 296.73 | 335.82 | 316.27 | 11.16 |
|  | 4.00 | 88 | 15 | 14 | 352 | 293.36 | 331.98 | 312.67 | 11.17 |
|  | 8.00 | 86 | 14 | 14 | 344 | 288.88 | 323.93 | 306.41 | 10.93 |
|  | 100.00 | 82 | 14 | 13 | 328 | 277.11 | 307.69 | 292.40 | 10.85 |
| 0.5 | 0.00 | 92 | 17 | 14 | 368 | 297.22 | 351.12 | 324.17 | 11.91 |
|  | 0.01 | 92 | 17 | 14 | 368 | 297.68 | 351.02 | 324.35 | 11.86 |
|  | 0.10 | 92 | 17 | 14 | 368 | 299.18 | 350.45 | 324.82 | 11.73 |
|  | 1.00 | 90 | 16 | 14 | 360 | 296.63 | 340.58 | 318.61 | 11.50 |
|  | 2.00 | 88 | 15 | 14 | 352 | 292.18 | 332.41 | 312.30 | 11.28 |
|  | 4.00 | 87 | 15 | 14 | 348 | 289.80 | 328.60 | 309.20 | 11.15 |
|  | 8.00 | 85 | 16 | 13 | 340 | 282.10 | 320.37 | 301.24 | 11.40 |
|  | 100.00 | 80 | 14 | 13 | 320 | 270.15 | 300.19 | 285.17 | 10.88 |
| 0.6 | 0.00 | 88 | 16 | 14 | 352 | 284.93 | 335.77 | 310.35 | 11.83 |
|  | 0.01 | 88 | 16 | 14 | 352 | 285.05 | 335.56 | 310.30 | 11.85 |
|  | 0.10 | 86 | 15 | 14 | 344 | 280.29 | 327.42 | 303.86 | 11.67 |
|  | 1.00 | 86 | 17 | 13 | 344 | 279.82 | 326.74 | 303.28 | 11.84 |
|  | 2.00 | 85 | 16 | 13 | 340 | 279.78 | 321.82 | 300.80 | 11.53 |
|  | 4.00 | 82 | 15 | 13 | 328 | 271.86 | 309.96 | 290.91 | 11.31 |
|  | 8.00 | 80 | 14 | 13 | 320 | 267.28 | 301.55 | 284.42 | 11.12 |
|  | 100.00 | 74 | 13 | 12 | 296 | 250.71 | 277.28 | 263.99 | 10.81 |
Table 6. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.00 | 101 | 17 | 15 | 404 | 330.13 | 383.44 | 356.78 | 11.69 |
|  | 0.01 | 101 | 17 | 15 | 404 | 330.85 | 382.99 | 356.92 | 11.65 |
|  | 0.10 | 101 | 17 | 15 | 404 | 333.10 | 382.08 | 357.59 | 11.49 |
|  | 1.00 | 98 | 16 | 15 | 392 | 326.96 | 369.43 | 348.19 | 11.18 |
|  | 2.00 | 97 | 17 | 14 | 388 | 323.57 | 364.62 | 344.09 | 11.32 |
|  | 4.00 | 95 | 16 | 14 | 380 | 319.33 | 356.19 | 337.76 | 11.12 |
|  | 8.00 | 94 | 16 | 14 | 376 | 316.11 | 352.35 | 334.23 | 11.11 |
|  | 100.00 | 91 | 15 | 14 | 364 | 309.03 | 340.14 | 324.59 | 10.83 |
| 0.5 | 0.00 | 99 | 17 | 15 | 396 | 323.11 | 376.07 | 349.59 | 11.72 |
|  | 0.01 | 99 | 17 | 15 | 396 | 323.68 | 375.74 | 349.71 | 11.69 |
|  | 0.10 | 99 | 17 | 15 | 396 | 325.23 | 375.14 | 350.18 | 11.57 |
|  | 1.00 | 97 | 16 | 15 | 388 | 322.80 | 365.79 | 344.29 | 11.26 |
|  | 2.00 | 96 | 17 | 14 | 384 | 319.40 | 361.15 | 340.28 | 11.39 |
|  | 4.00 | 93 | 16 | 14 | 372 | 311.32 | 349.43 | 330.38 | 11.19 |
|  | 8.00 | 93 | 16 | 14 | 372 | 312.71 | 348.76 | 330.74 | 11.09 |
|  | 100.00 | 86 | 15 | 13 | 344 | 291.61 | 320.66 | 306.13 | 11.01 |
| 0.6 | 0.00 | 94 | 17 | 14 | 376 | 306.26 | 357.10 | 331.68 | 11.79 |
|  | 0.01 | 94 | 17 | 14 | 376 | 306.41 | 356.78 | 331.60 | 11.81 |
|  | 0.10 | 94 | 17 | 14 | 376 | 307.53 | 356.55 | 332.04 | 11.69 |
|  | 1.00 | 92 | 16 | 14 | 368 | 305.01 | 347.17 | 326.09 | 11.39 |
|  | 2.00 | 91 | 16 | 14 | 364 | 302.20 | 343.25 | 322.72 | 11.34 |
|  | 4.00 | 90 | 15 | 14 | 360 | 301.71 | 338.37 | 320.04 | 11.10 |
|  | 8.00 | 87 | 16 | 13 | 348 | 290.26 | 326.94 | 308.60 | 11.32 |
|  | 100.00 | 84 | 15 | 13 | 336 | 284.41 | 313.77 | 299.09 | 10.98 |
Table 7. Design parameters when .

|  | Odds Ratio | n | c1 | c2 | N | E(N) (1) | E(N) (2) | Avg. E(N) | Saving (%) |
|---|---|---|---|---|---|---|---|---|---|
| 0.4 | 0.00 | 111 | 18 | 16 | 444 | 365.66 | 419.29 | 392.48 | 11.60 |
|  | 0.01 | 111 | 18 | 16 | 444 | 366.71 | 418.74 | 392.73 | 11.55 |
|  | 0.10 | 110 | 17 | 16 | 440 | 366.62 | 413.66 | 390.14 | 11.33 |
|  | 1.00 | 108 | 18 | 15 | 432 | 361.99 | 404.50 | 383.24 | 11.29 |
|  | 2.00 | 106 | 17 | 15 | 424 | 358.00 | 395.96 | 376.98 | 11.09 |
|  | 4.00 | 106 | 17 | 15 | 424 | 359.44 | 395.26 | 377.35 | 11.00 |
|  | 8.00 | 104 | 16 | 15 | 416 | 354.66 | 387.29 | 370.97 | 10.82 |
|  | 100.00 | 101 | 17 | 14 | 404 | 343.97 | 375.04 | 359.50 | 11.01 |
| 0.5 | 0.00 | 109 | 19 | 15 | 436 | 357.96 | 411.50 | 384.73 | 11.76 |
|  | 0.01 | 109 | 19 | 15 | 436 | 358.33 | 411.49 | 384.91 | 11.72 |
|  | 0.10 | 109 | 19 | 15 | 436 | 360.19 | 410.60 | 385.40 | 11.61 |
|  | 1.00 | 107 | 18 | 15 | 428 | 358.00 | 401.18 | 379.59 | 11.31 |
|  | 2.00 | 105 | 17 | 15 | 420 | 354.04 | 392.32 | 373.18 | 11.15 |
|  | 4.00 | 104 | 17 | 15 | 416 | 351.23 | 388.19 | 369.71 | 11.13 |
|  | 8.00 | 103 | 16 | 15 | 412 | 350.99 | 383.49 | 367.24 | 10.86 |
|  | 100.00 | 97 | 16 | 14 | 388 | 332.10 | 359.83 | 345.97 | 10.83 |
| 0.6 | 0.00 | 104 | 18 | 15 | 416 | 341.64 | 392.59 | 367.11 | 11.75 |
|  | 0.01 | 103 | 17 | 15 | 412 | 340.19 | 388.10 | 364.14 | 11.62 |
|  | 0.10 | 103 | 17 | 15 | 412 | 341.38 | 387.88 | 364.63 | 11.50 |
|  | 1.00 | 102 | 19 | 14 | 408 | 337.08 | 383.75 | 360.41 | 11.66 |
|  | 2.00 | 100 | 18 | 14 | 400 | 332.66 | 375.32 | 353.99 | 11.50 |
|  | 4.00 | 98 | 17 | 14 | 392 | 329.37 | 366.73 | 348.05 | 11.21 |
|  | 8.00 | 97 | 16 | 14 | 388 | 328.98 | 361.58 | 345.28 | 11.01 |
|  | 100.00 | 91 | 16 | 13 | 364 | 309.59 | 337.64 | 323.62 | 11.09 |
Yin, C.; Buzaianu, E.M.; Chen, P.; Hsu, L. Subset Selection with Curtailment Among Treatments with Two Binary Endpoints in Comparison with a Control. Mathematics 2025, 13, 3067. https://doi.org/10.3390/math13193067