Hypothesis Testing for Two-Arm Proportions with Two Binary Endpoints

Chen, Pinyuen; Yin, Chishu; Buzaianu, Elena M.

doi:10.3390/axioms15060435


by Pinyuen Chen ¹, Chishu Yin ¹,* and Elena M. Buzaianu ²

¹ Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA

² Department of Mathematics and Statistics, University of North Florida, Jacksonville, FL 32224, USA

* Author to whom correspondence should be addressed.

Axioms 2026, 15(6), 435; https://doi.org/10.3390/axioms15060435

Submission received: 17 March 2026 / Revised: 25 May 2026 / Accepted: 1 June 2026 / Published: 11 June 2026

(This article belongs to the Special Issue Probability Theory and Stochastic Processes: Theory and Applications)


Abstract

Many studies require evidence that a new treatment improves efficacy and maintains or improves safety. Composite endpoints can obscure trade-offs and complicate interpretation. We propose a single-stage hypothesis test that directly evaluates two binary endpoints against a concurrent control, offering a transparent alternative to composite endpoints. The test rejects only if the observed improvements on both endpoints exceed pre-specified paired thresholds. The joint distribution of efficacy and safety is modeled with a four-category multinomial, yielding probabilities for all the outcome combinations. This enables exact computation of rejection probabilities and identification of least-favorable parameter configurations to control type I error at the nominal level while retaining adequate power. Design tables map the target significance level and power, together with predefined effect sizes for each endpoint, to the required sample size and decision thresholds. Simulations and one case study illustrate design selection and interpretation. The proposed test provides an exact and practical tool for early-phase trials with dual binary endpoints, particularly when efficacy and safety must be evaluated simultaneously.

Keywords: two binary endpoints; hypothesis testing; multinomial distribution; clinical trial design; type I error control; sample size determination

MSC: 62F03

1. Introduction

Clinical trials increasingly require evidence that new treatments not only improve efficacy but also maintain or enhance safety. Regulatory agencies such as the U.S. Food and Drug Administration [1] explicitly emphasize the importance of multiple endpoints to ensure that therapeutic benefit is not achieved at the expense of unacceptable adverse effects. In many modern therapeutic areas—such as oncology, immunotherapy, and cardiovascular medicine—both efficacy and toxicity are regarded as co-primary outcomes, reflecting the dual objective of maximizing patient benefit while minimizing harm. Composite endpoints are frequently used to summarize multiple outcomes into a single measure; however, they can obscure trade-offs between efficacy and safety and complicate interpretation [2]. Consequently, there is growing interest in transparent and statistically rigorous frameworks that jointly evaluate efficacy and safety without conflating their effects.

From a methodological standpoint, the prevailing paradigm for co-primary binary endpoints is the intersection–union testing (IUT) framework, in which overall success is declared only when significant improvement is demonstrated on all the endpoints [3,4]. This approach provides strong control of the familywise type I error rate and aligns with regulatory expectations. A natural modeling strategy is to represent two binary outcomes jointly as a four-category multinomial variable, explicitly accounting for their correlation and enabling exact computation of joint probabilities and rejection regions.

A substantial body of research has focused on the design of clinical trials with dual binary endpoints. The seminal work of Bryant and Day [5] introduced a two-stage single-arm phase II design that simultaneously monitors efficacy and toxicity under a multinomial model. Their framework defined admissible regions for early termination but was limited to settings in which the experimental treatment was compared with a fixed standard rather than against a control treatment. Building on this idea, Conaway and Petroni [6] extended the single-arm framework to bivariate sequential settings, still assuming fixed standard values. Subsequent developments by Stallard, Thall, and Whitehead [7], Ivanova et al. [8], and Chen and Chi [9] further advanced early-phase designs with dual binary endpoints under single-arm or fixed-standard settings. Thall and Cheng [10] proposed a related decision-theoretic design for evaluating efficacy and safety in which clinically meaningful improvements of an experimental treatment over a standard treatment are elicited from the physician. Their testing procedure was developed using a large-sample normal approximation, whereas the present study provides an exact two-arm hypothesis testing framework under a multinomial model, allowing rejection probabilities and type I error rates to be computed exactly. Therefore, despite substantial progress, a methodological gap remains for exact two-arm hypothesis tests that jointly evaluate efficacy and safety when a clear decision is reached.

The present study addresses this methodological gap by proposing an exact frequentist hypothesis test for two-arm trials with two binary endpoints—efficacy and safety. The goal of the proposed method is twofold. First, it aims to directly evaluate the simultaneous improvement of both endpoints relative to a concurrent control rather than against fixed pre-specified standard values. Second, it ensures exact control of the type I error rate at the nominal significance level by analytically identifying least favorable parameter configurations (LFCs) or generalized least favorable parameter configurations (GLFCs). The LFCs and GLFCs are obtained by using monotonicity results to identify boundary parameter settings and then exactly evaluating the rejection probabilities under the multinomial model.

To achieve these objectives, the joint outcomes of efficacy and safety are modeled using a four-category multinomial distribution, which enables complete enumeration of all possible response combinations across treatment arms. Within this framework, the study focuses on the following four key methodological questions:

How can we construct a single-stage rejection rule that declares superiority only when both endpoints exceed pre-specified paired thresholds?
How can rejection probabilities be computed exactly under the four-category multinomial model?
How can LFCs or GLFCs be identified to guarantee exact control of the type I error rate and adequate power?
How can design tables be developed to map desired significance levels, power, and effect sizes to the required sample size and decision thresholds?

Since the two binary endpoints generate four possible joint response categories, the proposed formulation is naturally connected to the classical analysis of $2 \times 2$ contingency tables; see, for example, Everitt [11] (Section 2.8). In particular, the condition $p_{11} p_{22} = p_{12} p_{21}$ corresponds to the classical no-association structure for a $2 \times 2$ table, a testing problem discussed by Kendall and Stuart [12] (Section 33.16). The present work builds on this classical framework but focuses on a different objective: constructing an exact two-arm hypothesis testing procedure for simultaneous efficacy and safety improvement relative to a concurrent control. This work makes several distinct contributions. First, it extends the single-arm joint-endpoint framework of Bryant and Day [5] and Chen and Chi [9] to a two-arm comparative design, allowing direct inference about relative efficacy and safety between treatments rather than against fixed standards. Second, the proposed test provides an exact frequentist formulation that allows analytical computation of rejection probabilities and formal identification of least favorable configurations. Finally, the framework produces design tables that map nominal type I error, power and effect sizes to required sample sizes and thresholds, facilitating practical implementation with simulations and exact calculations.

More recently, Homma and Yoshida [13] and Jung et al. [14] considered two-arm clinical trial designs with two binary co-primary endpoints under joint binary-outcome frameworks. Homma and Yoshida [13] developed exact power and sample size calculations for two-arm superiority trials with two co-primary binary endpoints by combining the bivariate binomial distribution with endpoint-wise testing procedures, such as Fisher’s exact test, Fisher’s mid-p test, Pearson’s chi-square test, the Z-pooled exact unconditional test, and Boschloo’s exact unconditional test. Thus, their framework is primarily built on marginal tests for each endpoint within an intersection–union structure, whereas the present study directly formulates a joint multinomial hypothesis testing procedure for simultaneous efficacy and safety improvement based on the paired treatment–control differences in $(p_{e}, p_{s})$.

Jung et al. [14] proposed a related two-arm, two-stage phase II design for two binary co-primary endpoints, which allows early termination for futility and is therefore useful for screening ineffective treatments. Their approach is closer in spirit to the present work because it also models correlated binary endpoints jointly. However, their operating characteristics are calibrated at pre-specified null and alternative configurations: the type I error rate is evaluated under selected point configurations of the null hypothesis, and the power is calculated at a specified alternative point. In contrast, the proposed method identifies least favorable or generalized least favorable configurations for type I error control and power evaluation so that the resulting design guarantees the desired operating characteristics over the corresponding parameter regions rather than only at selected design points.

By uniting the intersection–union test principle and multinomial modeling, this work offers an exact and transparent testing framework for clinical trials requiring simultaneous improvement in efficacy and safety. The proposed testing procedure is calibrated to guarantee the desired type I error control and power requirements over the corresponding parameter regions rather than only at selected point configurations. It provides statistical rigor through precise error control and clinical interpretability by maintaining separate explicit evaluation of efficacy and safety. ChatGPT-5.5 was used to assist in converting simulation output tables into LaTeX format.

2. Formulation for Two Endpoints

We consider a procedure for comparing an experimental treatment with a control. Let $π_{0}$ denote the control treatment and $π_{1}$ the experimental treatment. Each treatment yields two binary endpoints, referred to as efficacy and safety, although other binary outcomes could be accommodated within the same framework. For treatment j ($j = 0, 1$), let $X_{e}^{j}$ and $X_{s}^{j}$ denote the numbers of successes in efficacy and safety among n patients, with corresponding success probabilities $p_{e}^{j}$ and $p_{s}^{j}$. The two endpoints within a treatment arm may be correlated, whereas responses across different treatments are assumed to be independent. In this paper, the association between the efficacy and safety endpoints is characterized through a pre-specified odds ratio $ϕ$. This specification is needed because, under the four-category multinomial model, the marginal efficacy probability, the marginal safety probability, and the odds ratio together determine the joint cell probabilities $(p_{11}, p_{12}, p_{21}, p_{22})$; conversely, the joint cell probabilities determine the two marginal probabilities and the odds ratio. Thus, a working specification of the association structure is required to fully define the multinomial probabilities used in the design. In practice, information on the control treatment may be available from historical studies or previous clinical experience, so the marginal response rates and the association between efficacy and safety can often be estimated or elicited for the control arm. For the experimental treatment, however, the association structure is typically less well established at the design stage. Following common practice in phase II designs with dual binary endpoints, we therefore use a common odds ratio across treatment arms as a working design assumption. A similar specification of the association structure through a pre-specified odds ratio has also been used in the dual-endpoint phase II trial literature; see, for example, Chen and Chi [9].

To simultaneously assess clinical efficacy and safety, one may formulate the null hypothesis as stating that the new treatment is not sufficiently better than the control in terms of either efficacy or safety, while the alternative hypothesis requires improvement in both dimensions. Specifically, we consider the following hypothesis testing problem:

$$H_{0}: p_{e}^{1} \leq p_{e}^{0} \ \text{or} \ p_{s}^{1} \leq p_{s}^{0} \quad \text{vs.} \quad H_{1}: p_{e}^{1} > p_{e}^{0} \ \text{and} \ p_{s}^{1} > p_{s}^{0}.$$

(1)

Based on the null and alternative hypotheses in (1), the following probability requirements are imposed:

$$P(\text{Reject } H_{0} \mid H_{0} \text{ is true}) \leq α,$$

(2)

and

$$P(\text{Reject } H_{0} \mid H_{1} \text{ is true}) \geq 1 - β.$$

(3)

Here, condition (2) controls the type I error probability at the nominal level $α$, which represents the maximum acceptable probability of falsely declaring the experimental treatment superior when it does not achieve simultaneous improvement in both efficacy and safety. In potential clinical applications, the choice of $α$ reflects the tolerance for such a false-positive conclusion and may depend on the trial phase, disease severity, available alternatives, and regulatory considerations. Condition (3) requires the power of the test to be at least $1 - β$ under clinically meaningful alternatives, where $β$ denotes the maximum acceptable probability of failing to detect a treatment that improves both endpoints by the pre-specified effect sizes. Thus, $α$ controls the risk of incorrectly advancing an insufficient treatment, whereas $1 - β$ measures the probability of correctly identifying a treatment with meaningful joint efficacy–safety benefit.

Joint Probability of Two Endpoints

Let $j = 0, 1$ index the two treatment arms, where $j = 0$ corresponds to the control treatment $π_{0}$ and $j = 1$ corresponds to the experimental treatment $π_{1}$. For patient i in treatment arm j, let $(E_{j i}, S_{j i})$ denote the two binary endpoints, where $E_{j i} = 1$ indicates efficacy success and $S_{j i} = 1$ indicates safety success. Thus, each patient falls into one of four joint response categories, $(1, 1)$, $(1, 0)$, $(0, 1)$, or $(0, 0)$, corresponding, respectively, to efficacious and safe, efficacious but unsafe, nonefficacious but safe, and neither efficacious nor safe.

Accordingly, for treatment j, we denote the vector of cell counts by

$${\vec{X}}^{j} = (X_{11}^{j}, X_{12}^{j}, X_{21}^{j}, X_{22}^{j}),$$

where

$$X_{11}^{j} = \sum_{i = 1}^{n} I(E_{j i} = 1, S_{j i} = 1), \quad X_{12}^{j} = \sum_{i = 1}^{n} I(E_{j i} = 1, S_{j i} = 0),$$

$$X_{21}^{j} = \sum_{i = 1}^{n} I(E_{j i} = 0, S_{j i} = 1), \quad X_{22}^{j} = \sum_{i = 1}^{n} I(E_{j i} = 0, S_{j i} = 0).$$

We assume that this vector follows a multinomial distribution with total sample size n and cell probabilities

$${\vec{p}}^{j} = (p_{11}^{j}, p_{12}^{j}, p_{21}^{j}, p_{22}^{j}),$$

for $j = 0, 1$. Here, $p_{k l}^{j}$ ($k, l = 1, 2$) denotes the probability corresponding to each efficacy–safety combination in treatment arm j. The sample proportions $X_{k l}^{j} / n$ provide empirical estimates of the corresponding cell probabilities $p_{k l}^{j}$, for $k, l = 1, 2$.

This notation is equivalent to the standard three-dimensional contingency-table notation $n_{k l j}$ for observed frequencies and $p_{k l j}$ for cell probabilities, where the third index corresponds to the treatment arm. We retain the notations $X_{k l}^{j}$ and $p_{k l}^{j}$ because, in clinical trial testing and ranking-and-selection designs, n is conventionally used for the sample size, while X denotes observed response counts.

The marginal probability of observing efficacy under treatment j is

$$p_{e}^{j} = p_{11}^{j} + p_{12}^{j},$$

which aggregates all outcomes classified as efficacious.

Similarly, the marginal safety probability is

$$p_{s}^{j} = p_{11}^{j} + p_{21}^{j},$$

representing the chance that a patient satisfies the safety criterion regardless of efficacy.

The corresponding marginal cell counts are

$$X_{e}^{j} = X_{11}^{j} + X_{12}^{j}, \quad X_{e^{c}}^{j} = X_{21}^{j} + X_{22}^{j},$$

and

$$X_{s}^{j} = X_{11}^{j} + X_{21}^{j}, \quad X_{s^{c}}^{j} = X_{12}^{j} + X_{22}^{j},$$

where $e^{c}$ and $s^{c}$ denote the complements of efficacy success and safety success, respectively.

Outcomes for treatment $π_{j}$ are listed in Table 1.

For each treatment arm, the dependence between efficacy and safety is quantified by the treatment-specific odds ratio

$$ϕ_{j} = \frac{p_{11}^{j} p_{22}^{j}}{p_{12}^{j} p_{21}^{j}}, \quad j = 0, 1.$$

The odds ratio is a classical measure of association for $2 \times 2$ contingency tables; see, for example, Everitt [11] (Section 2.8). Equivalently, the condition $p_{11}^{j} p_{22}^{j} = p_{12}^{j} p_{21}^{j}$ corresponds to the classical no-association case in treatment arm j, which has long been studied as a testing problem for $2 \times 2$ tables [12] (Section 33.16). For multiple $2 \times 2$ tables, such as those arising after stratification by treatment arm, the treatment-specific log odds ratio is a standard way to describe conditional association. The closely related problems of partial association and conditional independence in stratified $2 \times 2$ tables have been studied in the classical contingency-table literature; see Birch [15] and Kendall and Stuart [12] (Section 33.62). These formulations are also closely connected to the common-odds-ratio, or homogeneous-association, setting used for multiple $2 \times 2$ tables. This classical formulation provides a natural justification for using the odds ratio as an association measure for the two binary endpoints. Following its use in bivariate binary-endpoint designs such as Conaway and Petroni [6], we use the odds ratio to parameterize the association between efficacy and safety.

In the present paper, we adopt the working assumption of a known common odds ratio across the two treatment arms, namely

$$ϕ_{0} = ϕ_{1} = ϕ.$$

This common-odds-ratio assumption is not a mathematical necessity but a design simplification that allows the four cell probabilities in each treatment arm to be determined by the two marginal probabilities $p_{e}^{j}$ and $p_{s}^{j}$ together with a common association parameter $ϕ$. It is also related to the classical common-association framework for several $2 \times 2$ contingency tables, often expressed in terms of the log odds ratio; see Kendall and Stuart [12] (Section 33.62). In applications, $ϕ$ may be specified using prior studies, pilot data, or sensitivity analyses.

The value of $ϕ$ has a direct interpretation in terms of the association between the two binary endpoints. When $ϕ = 1$, the efficacy and safety endpoints are independent within a treatment arm, corresponding to the no-association case discussed above. When $ϕ > 1$, the two endpoints are positively associated: patients who achieve efficacy are more likely to also achieve safety, and patients who fail to achieve efficacy are more likely to also fail to achieve safety. Equivalently, for fixed marginal probabilities, larger values of $ϕ$ place more probability mass on the concordant cells $(1, 1)$ and $(0, 0)$ and less probability mass on the discordant cells $(1, 0)$ and $(0, 1)$. When $0 < ϕ < 1$, the two endpoints are negatively associated, indicating a tendency toward discordant outcomes; for example, achieving efficacy may be less likely to occur together with achieving safety. The boundary case $ϕ = 0$ represents an extreme form of negative association and is mainly included for sensitivity assessment rather than as a typical clinical setting. Two situations are considered:

Case 1: $ϕ = 1$ (independence). When the two endpoints are independent, the joint distribution of $(X_{e}, X_{s})$ factorizes as

$$P(X_{e} = x_{e}, X_{s} = x_{s} \mid p_{e}, p_{s}, ϕ = 1) = b(n, p_{e}, x_{e})\, b(n, p_{s}, x_{s}),$$

where $b(\cdot)$ denotes the binomial pmf.

Case 2: $ϕ \neq 1$ (dependence). For specified $p_{e}$, $p_{s}$, and $ϕ$, Bryant and Day [5] derived the corresponding four cell probabilities:

$$p_{11} = \frac{a - \sqrt{a^{2} + d}}{2 (ϕ - 1)},$$

(4)

$$p_{12} = p_{e} - p_{11},$$

(5)

$$p_{21} = p_{s} - p_{11},$$

(6)

$$p_{22} = 1 - p_{11} - p_{12} - p_{21},$$

(7)

with

$$a = 1 + (ϕ - 1)(p_{e} + p_{s}), \quad d = -4 ϕ (ϕ - 1) p_{e} p_{s}.$$

Given these cell probabilities, the joint probability of $(X_{e}, X_{s})$ is

$$
\begin{aligned}
P(X_{e} = x_{e}, X_{s} = x_{s} \mid p_{e}, p_{s}, ϕ) = {} & \sum_{i = \max(0,\, x_{e} + x_{s} - n)}^{\min(x_{e},\, x_{s})} \frac{n!}{i!\,(x_{e} - i)!\,(x_{s} - i)!\,(n - x_{e} - x_{s} + i)!} \\
& \times p_{11}^{i}\, p_{12}^{x_{e} - i}\, p_{21}^{x_{s} - i}\, p_{22}^{n - x_{e} - x_{s} + i}.
\end{aligned}
$$

(8)

Remark 1. When $ϕ = 1$, expression (8) continues to hold, with the simplifying identity

$$p_{11} = p_{e} p_{s}.$$
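Equations (4)–(8) translate fairly directly into code. The following Python sketch is only an illustration under stated assumptions: the function names `cell_probs` and `joint_pmf` are hypothetical (they do not come from the article), and the clipping of tiny negative values is an added numerical safeguard rather than part of the formulas.

```python
from math import comb, sqrt, isclose

def cell_probs(p_e, p_s, phi):
    """Return (p11, p12, p21, p22) from the marginals and odds ratio, Eqs. (4)-(7)."""
    if isclose(phi, 1.0):
        p11 = p_e * p_s                          # Remark 1: independence
    else:
        a = 1.0 + (phi - 1.0) * (p_e + p_s)
        d = -4.0 * phi * (phi - 1.0) * p_e * p_s
        p11 = (a - sqrt(max(a * a + d, 0.0))) / (2.0 * (phi - 1.0))
    # Clip rounding noise so that no cell is slightly negative (added safeguard).
    p12 = max(p_e - p11, 0.0)
    p21 = max(p_s - p11, 0.0)
    p22 = max(1.0 - p11 - p12 - p21, 0.0)
    return p11, p12, p21, p22

def joint_pmf(x_e, x_s, n, p_e, p_s, phi):
    """P(X_e = x_e, X_s = x_s) via Eq. (8); i is the count in cell (1, 1)."""
    p11, p12, p21, p22 = cell_probs(p_e, p_s, phi)
    total = 0.0
    for i in range(max(0, x_e + x_s - n), min(x_e, x_s) + 1):
        # n! / (i! (x_e-i)! (x_s-i)! (n-x_e-x_s+i)!) written as a product of binomials
        coef = comb(n, i) * comb(n - i, x_e - i) * comb(n - x_e, x_s - i)
        total += (coef * p11**i * p12**(x_e - i)
                  * p21**(x_s - i) * p22**(n - x_e - x_s + i))
    return total

# Illustrative values only: recover the odds ratio from the computed cells.
p11, p12, p21, p22 = cell_probs(0.6, 0.5, 4.0)
print(p11 * p22 / (p12 * p21))
```

A quick sanity check on such a sketch is that the pmf sums to one over all $(x_{e}, x_{s})$ and reduces to the product of two binomial pmfs when $ϕ = 1$.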

3. Fixed Sample Size Design

Consider fixed design constants n, e, and s. The decision rule is constructed as follows. Here, n denotes the sample size per treatment arm, while e and s are positive integer thresholds representing the minimum required excess numbers of successes for efficacy and safety, respectively, in the experimental treatment compared with the control. These thresholds are chosen together with n so that the resulting test satisfies the desired type I error and power requirements.

Procedure H:

Collect n observations from each of the two treatments (the control and the new treatment). For treatment j ($j = 0, 1$), let $X_{e}^{j}$ and $X_{s}^{j}$ denote the observed numbers of successes for the efficacy and safety endpoints, respectively. With positive thresholds e and s, the rule in Procedure H is given by:

Reject the null hypothesis if both inequalities $X_{e}^{1} - X_{e}^{0} \geq e$ and $X_{s}^{1} - X_{s}^{0} \geq s$ hold.
Otherwise (i.e., if either $X_{e}^{1} - X_{e}^{0} < e$ or $X_{s}^{1} - X_{s}^{0} < s$ ), retain the null hypothesis.
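The decision rule can be written as a small function of the observed four-cell counts in each arm. This is a minimal sketch; the function name `procedure_h` and the counts in the usage lines are hypothetical and chosen only to show the mechanics.

```python
def procedure_h(cells_treat, cells_ctrl, e, s):
    """Return True when Procedure H rejects H0.

    Each cells_* argument is (X11, X12, X21, X22) for one arm, i.e. the counts of
    (efficacious, safe), (efficacious, unsafe), (nonefficacious, safe), (neither).
    """
    def marginals(cells):
        x11, x12, x21, _ = cells
        return x11 + x12, x11 + x21              # (X_e, X_s)

    xe1, xs1 = marginals(cells_treat)
    xe0, xs0 = marginals(cells_ctrl)
    return (xe1 - xe0 >= e) and (xs1 - xs0 >= s)

# Hypothetical data with n = 63 per arm and thresholds (e, s) = (7, 7).
print(procedure_h((30, 12, 14, 7), (18, 10, 12, 23), e=7, s=7))
print(procedure_h((30, 12, 14, 7), (25, 12, 14, 12), e=7, s=7))
```

In the first call the excess successes are 14 on both endpoints, so both inequalities hold; in the second they are 5 and 5, so the null hypothesis is retained.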

Probability Requirements:

The design constants n, e, and s for Procedure H are chosen so that the procedure meets the following probabilistic criteria:

$$\sup_{H_{0}} P(X_{e}^{1} - X_{e}^{0} \geq e, X_{s}^{1} - X_{s}^{0} \geq s \mid ϕ) \leq α,$$

(9)

$$\inf_{H_{1}} P(X_{e}^{1} - X_{e}^{0} \geq e, X_{s}^{1} - X_{s}^{0} \geq s \mid ϕ) \geq 1 - β.$$

(10)

The left-hand side of (9) is the largest probability of incorrectly rejecting $H_{0}$ over the null space and is therefore bounded by the type I error level $α$. Likewise, the left-hand side of (10) is the smallest probability of correctly declaring the new treatment superior over the alternative space, so the power under $H_{1}$ is at least $1 - β$.
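For a given pair of parameter configurations, the rejection probability appearing in (9) and (10) can be evaluated by direct enumeration, because the two arms are independent and each arm's joint distribution of $(X_{e}, X_{s})$ follows from the four-cell multinomial. The sketch below is one possible way to organize that computation, not necessarily the authors' implementation; `joint_table` and `reject_prob` are hypothetical names, and the cell probabilities are passed in directly as $(p_{11}, p_{12}, p_{21}, p_{22})$.

```python
from math import comb

def joint_table(n, cells):
    """table[x_e][x_s] = P(X_e = x_e, X_s = x_s) for one arm, following Eq. (8)."""
    p11, p12, p21, p22 = cells
    table = [[0.0] * (n + 1) for _ in range(n + 1)]
    for x_e in range(n + 1):
        for x_s in range(n + 1):
            for i in range(max(0, x_e + x_s - n), min(x_e, x_s) + 1):
                coef = comb(n, i) * comb(n - i, x_e - i) * comb(n - x_e, x_s - i)
                table[x_e][x_s] += (coef * p11**i * p12**(x_e - i)
                                    * p21**(x_s - i) * p22**(n - x_e - x_s + i))
    return table

def reject_prob(n, e, s, cells_ctrl, cells_treat):
    """Exact P(X_e^1 - X_e^0 >= e, X_s^1 - X_s^0 >= s) for positive thresholds e, s."""
    t0 = joint_table(n, cells_ctrl)
    t1 = joint_table(n, cells_treat)
    # upper[a][b] = P(X_e^1 >= a, X_s^1 >= b), zero-padded beyond n.
    upper = [[0.0] * (n + 2) for _ in range(n + 2)]
    for a in range(n, -1, -1):
        for b in range(n, -1, -1):
            upper[a][b] = (t1[a][b] + upper[a + 1][b]
                           + upper[a][b + 1] - upper[a + 1][b + 1])
    total = 0.0
    for x0 in range(n + 1):
        for y0 in range(n + 1):
            a, b = x0 + e, y0 + s                # treatment must reach at least (a, b)
            if a <= n and b <= n:
                total += t0[x0][y0] * upper[a][b]
    return total

# Illustrative independent cells (phi = 1): control (0.4, 0.4), treatment (0.6, 0.6).
ctrl = (0.16, 0.24, 0.24, 0.36)
treat = (0.36, 0.24, 0.24, 0.16)
print(reject_prob(20, 2, 2, ctrl, treat))
```

The upper-orthant table keeps the final double sum at order $n^{2}$; building each joint table costs order $n^{3}$, which is modest for the sample sizes considered here.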

To guarantee a meaningful distinction between the null and alternative spaces, we require a minimum effect separation that specifies how far the true parameters under $H_{1}$ must lie from those under $H_{0}$. In particular, we assume the alternative satisfies

$$p_{e}^{1} \geq p_{e}^{0} + δ_{e}, \quad p_{s}^{1} \geq p_{s}^{0} + δ_{s},$$

(11)

where $δ_{e}$ and $δ_{s}$ denote the smallest clinically relevant differences in efficacy and safety. These effect-size constraints ensure that the testing procedure can reliably discriminate between $H_{0}$ and $H_{1}$ when the new treatment exhibits meaningful improvement.

We now derive the values of the procedure parameters such that the probability constraints (9) and (10) are satisfied.

Theorem 1 (Monotonicity of the rejection probability). Let ϕ denote the odds ratio. The rejection probability

$$P(\text{Reject } H_{0}) = P(X_{e}^{1} - X_{e}^{0} \geq e, X_{s}^{1} - X_{s}^{0} \geq s \mid ϕ)$$

satisfies the following monotonicity properties with respect to the marginal efficacy and safety probabilities.

1. Non-increasing in $p_{e}^{0}$: for fixed $p_{e}^{1}$, $p_{s}^{0}$, $p_{s}^{1}$, and ϕ, $P(\text{Reject } H_{0})$ decreases (or remains constant) as $p_{e}^{0}$ increases.
2. Non-increasing in $p_{s}^{0}$: for fixed $p_{e}^{0}$, $p_{e}^{1}$, $p_{s}^{1}$, and ϕ, $P(\text{Reject } H_{0})$ decreases (or remains constant) as $p_{s}^{0}$ increases.
3. Non-decreasing in $p_{e}^{1}$: for fixed $p_{e}^{0}$, $p_{s}^{0}$, $p_{s}^{1}$, and ϕ, $P(\text{Reject } H_{0})$ increases (or remains constant) as $p_{e}^{1}$ increases.
4. Non-decreasing in $p_{s}^{1}$: for fixed $p_{e}^{0}$, $p_{e}^{1}$, $p_{s}^{0}$, and ϕ, $P(\text{Reject } H_{0})$ increases (or remains constant) as $p_{s}^{1}$ increases.

The detailed proof is provided in Appendix A.

Theorem 1 immediately implies that, when $p_{e}^{0}$ and $p_{s}^{0}$ are known, the power is minimized at the boundary of the effect-size constraints, namely when

$$p_{e}^{1} = p_{e}^{0} + δ_{e}, \quad p_{s}^{1} = p_{s}^{0} + δ_{s}.$$

Theorem 2 (Least favorable configuration for power control under $ϕ = 1$). When the odds ratio satisfies $ϕ = 1$ and the critical values satisfy $e \leq n δ_{e}$ and $s \leq n δ_{s}$, the minimal power under the alternative is attained at the boundary of the effect-size constraints, specifically when

$$p_{e}^{1} = p_{e}^{0} + δ_{e}, \quad p_{s}^{1} = p_{s}^{0} + δ_{s}, \quad p_{e}^{0} = \frac{1 - δ_{e}}{2}, \quad p_{s}^{0} = \frac{1 - δ_{s}}{2}.$$

The proof is provided in Appendix A.
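Theorem 2 can be explored numerically. Under $ϕ = 1$ the two endpoints are independent, so the rejection probability factorizes into two binomial-difference tail probabilities, and the boundary power can be scanned over the control rate. The sketch below is a numerical illustration, not a proof; the helper names are hypothetical, and the values $n = 63$, $(e, s) = (7, 7)$, $δ_{e} = δ_{s} = 0.2$ are used purely as example inputs.

```python
from math import comb

def diff_tail(n, p1, p0, c):
    """Exact P(X1 - X0 >= c) for independent X1 ~ Bin(n, p1), X0 ~ Bin(n, p0), c >= 0."""
    pmf1 = [comb(n, k) * p1**k * (1 - p1)**(n - k) for k in range(n + 1)]
    pmf0 = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    cdf0, running = [], 0.0
    for q in pmf0:
        running += q
        cdf0.append(running)                     # cdf0[k] = P(X0 <= k)
    # X1 - X0 >= c  is equivalent to  X0 <= X1 - c
    return sum(pmf1[x1] * cdf0[x1 - c] for x1 in range(c, n + 1))

def power_at_boundary(n, e, s, p_e0, p_s0, delta_e, delta_s):
    """Rejection probability at p^1 = p^0 + delta on both endpoints, assuming phi = 1."""
    return (diff_tail(n, p_e0 + delta_e, p_e0, e)
            * diff_tail(n, p_s0 + delta_s, p_s0, s))

n, e, s, delta = 63, 7, 7, 0.2
for k in range(1, 16):                           # control rates 0.05, 0.10, ..., 0.75
    p0 = 0.05 * k
    pw = power_at_boundary(n, e, s, p0, p0, delta, delta)
    print(f"p0 = {p0:.2f}  boundary power = {pw:.4f}")
```

The printed values are symmetric about $p^{0} = (1 - δ)/2 = 0.4$, since replacing $p^{0}$ by $1 - δ - p^{0}$ leaves the distribution of the difference in successes unchanged; this is consistent with the midpoint being the least favorable control rate stated in Theorem 2.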

Theorem 3. If the baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$ are fixed and known, then the maximal type I error is attained at one of the following configurations:

$$p_{e}^{1} = 1, \quad p_{s}^{1} = p_{s}^{0},$$

or

$$p_{e}^{1} = p_{e}^{0}, \quad p_{s}^{1} = 1.$$

This result follows directly from the monotonicity properties in Theorem 1. To attain the maximal type I error, the rejection probability should be made as large as possible while the null hypothesis remains true. Since the rejection probability is non-decreasing in $p_{e}^{1}$ and $p_{s}^{1}$, the maximum occurs on the boundary of the null hypothesis: either $p_{s}^{1} = p_{s}^{0}$ with $p_{e}^{1}$ as large as possible or $p_{e}^{1} = p_{e}^{0}$ with $p_{s}^{1}$ as large as possible.

Consequently, Theorem 3 implies that the constraint in (9) can be equivalently expressed as

$$
\max \left\{
\begin{aligned}
& P(X_{e}^{1} - X_{e}^{0} \geq e, X_{s}^{1} - X_{s}^{0} \geq s \mid p_{e}^{1} = p_{e}^{0}, p_{s}^{1} = 1, ϕ), \\
& P(X_{e}^{1} - X_{e}^{0} \geq e, X_{s}^{1} - X_{s}^{0} \geq s \mid p_{e}^{1} = 1, p_{s}^{1} = p_{s}^{0}, ϕ)
\end{aligned}
\right\} \leq α.
$$

(12)
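The two boundary probabilities in (12) can also be estimated by simulation, which is convenient when $ϕ \neq 1$ and the control rates are specified. The following sketch uses only the Python standard library; the function names, the default number of replications, the seed, and the example inputs are all assumptions made for illustration, and the cell probabilities are computed from Eqs. (4)–(7).

```python
import random
from math import sqrt, isclose

def cell_probs(p_e, p_s, phi):
    """(p11, p12, p21, p22) from marginals and odds ratio, Eqs. (4)-(7)."""
    if isclose(phi, 1.0):
        p11 = p_e * p_s
    else:
        a = 1.0 + (phi - 1.0) * (p_e + p_s)
        d = -4.0 * phi * (phi - 1.0) * p_e * p_s
        p11 = (a - sqrt(max(a * a + d, 0.0))) / (2.0 * (phi - 1.0))
    p12 = max(p_e - p11, 0.0)                    # clip rounding noise (added safeguard)
    p21 = max(p_s - p11, 0.0)
    return p11, p12, p21, max(1.0 - p11 - p12 - p21, 0.0)

def mc_reject_prob(n, e, s, ctrl, treat, phi, reps=20000, seed=1):
    """Monte Carlo estimate of P(X_e^1 - X_e^0 >= e, X_s^1 - X_s^0 >= s | phi).

    ctrl and treat are (p_e, p_s) pairs for the control and experimental arms.
    """
    rng = random.Random(seed)
    cats = [(1, 1), (1, 0), (0, 1), (0, 0)]      # (efficacy, safety) outcome per cell
    w0, w1 = cell_probs(*ctrl, phi), cell_probs(*treat, phi)

    def draw(weights):
        sample = rng.choices(cats, weights=weights, k=n)
        return sum(c[0] for c in sample), sum(c[1] for c in sample)

    hits = 0
    for _ in range(reps):
        xe0, xs0 = draw(w0)
        xe1, xs1 = draw(w1)
        hits += (xe1 - xe0 >= e) and (xs1 - xs0 >= s)
    return hits / reps

def max_type1(n, e, s, p_e0, p_s0, phi, **kw):
    """Larger of the two boundary rejection probabilities in (12)."""
    a = mc_reject_prob(n, e, s, (p_e0, p_s0), (p_e0, 1.0), phi, **kw)
    b = mc_reject_prob(n, e, s, (p_e0, p_s0), (1.0, p_s0), phi, **kw)
    return max(a, b)

# Illustrative inputs only: n = 30, (e, s) = (5, 5), control rates 0.4, phi = 2.
print(max_type1(30, 5, 5, 0.4, 0.4, 2.0, reps=5000))
```

Because the estimate carries Monte Carlo error, a design whose estimated type I error sits very close to $α$ would merit either more replications or an exact evaluation.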

Theorem 4 (Least favorable configuration for type I error control under $ϕ = 1$). When the odds ratio satisfies $ϕ = 1$ and the critical values satisfy $e, s > 0$, the maximal type I error under the null hypothesis is attained at one of two configurations,

$$p_{e}^{1} = p_{e}^{0} = \frac{1}{2}, \quad p_{s}^{1} = 1, \quad p_{s}^{0} = 0,$$

or, symmetrically,

$$p_{s}^{1} = p_{s}^{0} = \frac{1}{2}, \quad p_{e}^{1} = 1, \quad p_{e}^{0} = 0.$$

The proof is provided in Appendix A.

The above results characterize the least favorable configurations for both power and type I error control in Procedure H. Theorem 1 shows that the rejection probability is monotone in the marginal efficacy and safety probabilities. This monotonicity reduces the search for worst-case configurations to boundary points of the parameter space. Theorem 2 identifies the least favorable alternative for power calculation when $ϕ = 1$. Theorems 3 and 4 characterize the worst-case null configurations for type I error control.

A key implication is that, when $ϕ = 1$ and the efficacy and safety endpoints are therefore independent, a universal least favorable configuration (LFC) exists. The design parameters $(n, e, s)$ can then be determined without specifying the nuisance parameters $p_{e}^{0}$ and $p_{s}^{0}$. By contrast, when $ϕ \neq 1$, the two endpoints are dependent and a universal LFC is generally unavailable. Determining $(n, e, s)$ then requires the baseline values $p_{e}^{0}$ and $p_{s}^{0}$ to be specified.

If $p_{e}^{0}$ and $p_{s}^{0}$ are known in advance, the resulting design is less conservative. It may also require a substantially smaller sample size. This reduction is illustrated in the numerical results reported in the subsequent tables.

4. Tables and Discussion

In this section, we report the design parameters required to implement the proposed fixed-sample procedure. Throughout, we assume a common association structure between the two binary endpoints across the control and the tested treatment; i.e., all the treatments share the same odds ratio $ϕ$. Under this assumption, bivariate binary outcomes can be characterized by the marginal success probabilities and the odds ratio, which jointly determine the $2 \times 2$ cell probabilities used in simulation.

We consider the following configurations: $ϕ \in \{0, 1, 2, 4, 8\}$, $δ_{e} \in \{0.1, 0.2, 0.3\}$, and $δ_{s} \in \{0.1, 0.2, 0.3\}$. The target operating characteristics are specified by

$$(\text{Power}, \text{Type I error}) \in \{(0.75, 0.15), (0.85, 0.15)\}.$$

The type I error level of 0.15 is used here for illustrative purposes in an exploratory early-phase setting, where a less stringent error level may be considered acceptable for screening promising treatments. In practice, the choice of type I error rate and power should be determined according to the clinical objective, regulatory context, and input from clinical investigators.

For each configuration, we determine the minimum required sample size per treatment, n, together with the corresponding critical values $(e, s)$ such that the probability constraints in (9) and (10) are satisfied. The reported operating characteristics in the tables are estimated by Monte Carlo simulation using 100,000 replications for each configuration. For example, when $ϕ = 1$, $δ_{e} = δ_{s} = 0.2$, and the target operating characteristics are $(\text{Power}, \text{Type I error}) = (0.75, 0.15)$, we evaluate candidate sample sizes n sequentially. For each n, all feasible integer threshold pairs $(e, s)$ are checked by Monte Carlo estimation of the corresponding rejection probabilities. The smallest sample size for which at least one pair $(e, s)$ satisfies both requirements is selected. In this setting, based on 100,000 Monte Carlo replications, $n = 63$ with $(e, s) = (7, 7)$ gives an estimated power of 0.75241 and an estimated type I error of 0.12335, so it satisfies the desired constraints.

The simulation procedure used to determine $(n, e, s)$ can be summarized as follows:

1. Specify the target type I error level $α$, target power $1 - β$, effect sizes $δ_{e}$ and $δ_{s}$, odds ratio $ϕ$, and, when needed, baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$.
2. For a candidate sample size n, enumerate all feasible positive integer threshold pairs $(e, s)$.
3. For each pair $(e, s)$, estimate the type I error and power using 100,000 Monte Carlo replications under the corresponding LFC or GLFC.
4. Retain the threshold pairs $(e, s)$ that satisfy both the type I error constraint and the power constraint.
5. Increase n sequentially until at least one feasible pair $(e, s)$ is found.
6. Select the smallest such n. If multiple threshold pairs satisfy the constraints for this minimum n, choose the pair with the smallest thresholds $(e, s)$.
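For the special case $ϕ = 1$ with unknown control rates, the steps above can be sketched with exact binomial calculations in place of Monte Carlo estimation. This substitution is an assumption of the sketch, not the procedure described in the text, and the function names are hypothetical. The sketch relies on the least favorable configurations of Theorems 2 and 4 and on independence of the endpoints, under which the type I error at each null configuration reduces to a single binomial-difference tail.

```python
from math import comb

def diff_tails(n, p1, p0):
    """tails[c] = P(X1 - X0 >= c), c = 0..n, for independent Bin(n, p1) and Bin(n, p0)."""
    pmf1 = [comb(n, k) * p1**k * (1 - p1)**(n - k) for k in range(n + 1)]
    pmf0 = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    cdf0, running = [], 0.0
    for q in pmf0:
        running += q
        cdf0.append(running)
    return [sum(pmf1[x] * cdf0[x - c] for x in range(c, n + 1)) for c in range(n + 1)]

def search_design_phi1(alpha, power, delta_e, delta_s, n_max=400):
    """Smallest n (with thresholds) meeting both constraints at the phi = 1 LFCs."""
    for n in range(1, n_max + 1):
        # Theorem 4: one endpoint has p^1 = p^0 = 1/2, the other always clears its threshold.
        null = diff_tails(n, 0.5, 0.5)
        feasible = [c for c in range(1, n + 1) if null[c] <= alpha]
        if not feasible:
            continue
        e = s = feasible[0]                      # smallest thresholds give the largest power
        if e > n * delta_e or s > n * delta_s:   # Theorem 2 requires e <= n*delta_e, s <= n*delta_s
            continue
        # Theorem 2: p^0 = (1 - delta)/2 and p^1 = p^0 + delta = (1 + delta)/2.
        pw = (diff_tails(n, (1 + delta_e) / 2, (1 - delta_e) / 2)[e]
              * diff_tails(n, (1 + delta_s) / 2, (1 - delta_s) / 2)[s])
        if pw >= power:
            return n, e, s, null[e], pw
    return None

n, e, s, type1, pw = search_design_phi1(0.15, 0.75, 0.2, 0.2)
print(f"n = {n}, (e, s) = ({e}, {s}), type I error = {type1:.5f}, power = {pw:.5f}")
```

The text reports $n = 63$ with $(e, s) = (7, 7)$ for this setting based on Monte Carlo estimates; an exact search of this kind may differ slightly when a constraint is met only marginally. In this independent case the feasible thresholds form an upper set, so choosing the smallest thresholds also maximizes power.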

The numerical study is divided into two cases.

Case 1: $ϕ = 1$ with unknown baseline control rates. When $ϕ = 1$, the efficacy and safety endpoints are independent. In this case, we assume that the baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$ are unknown. By Theorem 2, the minimum power under the alternative is attained at a specific least favorable configuration when $ϕ = 1$. Similarly, by Theorem 4, the maximal type I error under the null is attained at a least favorable configuration. These least favorable configurations together make it possible to calibrate the design parameters $(n, e, s)$ without specifying $p_{e}^{0}$ and $p_{s}^{0}$. The resulting designs are reported in Table 2 and Table 3, corresponding to the two target operating characteristics $(0.75, 0.15)$ and $(0.85, 0.15)$, respectively.

Case 2: known baseline control rates. We next consider the case where the baseline control rates are known, with $p_{e}^{0}, p_{s}^{0} \in \{0.2, 0.4, 0.6\}$, yielding all possible combinations of $(p_{e}^{0}, p_{s}^{0})$. For this case, we fix $δ_{e} = δ_{s} = 0.2$ and consider $ϕ \in \{0, 1, 2, 4, 8\}$. Using the generalized least favorable configuration (GLFC) characterized by Theorems 1 and 3, we determine the corresponding minimum sample size n and critical values $(e, s)$ for each parameter setting. The results are presented in Table 4 and Table 5, corresponding to (Power, Type I error) $= (0.75, 0.15)$ and $(0.85, 0.15)$, respectively.

The design parameters $(n, e, s)$ are obtained via simulation. Specifically, for each candidate value of n, we search over feasible integer thresholds $(e, s)$ and select the smallest n for which at least one pair $(e, s)$ meets the constraints. When multiple $(e, s)$ pairs satisfy the requirements for the same minimal n, we adopt the pair that yields the largest empirical power while maintaining the type I error constraint. The resulting parameters provide a direct lookup table for implementing the procedure under the considered settings.

When the baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$ are unknown, Table 2 and Table 3 summarize, under $ϕ = 1$, the minimum per-treatment sample size n and the corresponding critical values $(e, s)$ that achieve the desired operating characteristics. As expected, larger effect thresholds (i.e., larger $δ_{e}$ and/or $δ_{s}$) generally lead to smaller required sample sizes since stronger separation between the null and alternative hypotheses makes it easier to satisfy the power constraint while controlling type I error. These tabulated values enable straightforward implementation of the procedure without re-running the design search for each new study.

In addition, Table 4 and Table 5 report the case where the baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$ are known in advance and $δ_{e} = δ_{s} = 0.2$. The results show that, as the odds ratio $ϕ$ increases, the required sample size n is non-increasing across all the reported configurations, and the overall variation in n is relatively small. The proposed design is therefore not highly sensitive to the dependence parameter $ϕ$ over a clinically relevant range: the required sample size changes only slightly for moderate values of $ϕ$, so moderate misspecification of the working odds ratio has limited impact on the resulting design. The case $ϕ = 0$ represents an extreme boundary scenario and is included mainly for sensitivity assessment rather than as a typical clinical setting. Moreover, for the setting $δ_{e} = δ_{s} = 0.2$, the sample sizes in Table 2 and Table 3 are generally no smaller than those in Table 4 and Table 5, reflecting the fact that the designs obtained under unknown baseline control rates are more conservative. In particular, in Table 2, the required sample size is $n = 63$, which exceeds all the corresponding values reported in Table 4 for $δ_{e} = δ_{s} = 0.2$. In Table 3, the required sample size is $n = 76$, which is larger than nearly all the corresponding values in Table 5; the only exception is one configuration with $n = 77$ when $ϕ = 0$ and $p_{e}^{0} = p_{s}^{0} = 0.4$. Overall, these findings suggest that the values reported in Table 2 and Table 3 provide a reasonably conservative design benchmark. Therefore, when $p_{e}^{0}$ and $p_{s}^{0}$ are unknown and $ϕ \neq 1$, or even when $ϕ$ itself is unknown, the design calibrated under the $ϕ = 1$ least favorable configuration can still serve as a practical and reliable approximation for trial planning.

To further examine the robustness of the proposed design to possible misspecification of the odds ratio $ϕ$, we conducted an additional sensitivity calculation using the first configuration in Table 4, where $p_{e}^{0} = p_{s}^{0} = 0.2$. For this setting, the design obtained under $ϕ = 8$ is $(n, e, s) = (45, 5, 5)$. We then fixed this design and estimated the corresponding power and type I error rate under several different values of the true odds ratio. When $ϕ = 0$, the estimated power is $0.7132$ and the estimated type I error rate is $0.11864$; when $ϕ = 1$, the estimated power is $0.73223$ and the estimated type I error rate is $0.11806$; and, when $ϕ = 2$, the estimated power is $0.74129$ and the estimated type I error rate is $0.11623$. The cases $ϕ = 4$ and $ϕ = 8$ lead to the same design parameters in the table and therefore do not require separate design calibration. These results suggest that, when the misspecification of $ϕ$ is moderate, its impact on the rejection probabilities is relatively small. Even in the boundary case $ϕ = 0$, which represents an extreme scenario on the odds-ratio scale, the type I error rate remains below the nominal level $0.15$, while the reduction in power is moderate. Moreover, when $p_{e}^{0}$ and $p_{s}^{0}$ are unknown and the design from Table 2 with $δ_{e} = δ_{s} = 0.2$ is used, namely $(n, e, s) = (63, 7, 7)$, the estimated power and type I error rate under $ϕ = 0$ and $p_{e}^{0} = p_{s}^{0} = 0.2$ are $0.77839$ and $0.07363$, respectively. Thus, when there is substantial uncertainty about the value of $ϕ$, the designs reported in Table 2 and Table 3 can serve as conservative and practical choices for trial planning.
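
A sensitivity check of this kind can be sketched in Python as follows. The sketch assumes that the odds ratio is parameterized as $ϕ = p_{11} p_{22} / (p_{12} p_{21})$ in the four-category multinomial and is common to both arms, and that power is evaluated on the boundary $p^{1} = p^{0} + δ$; the function names, the seed, and the number of replications are illustrative choices, not the authors' code. Type I error would additionally require the GLFC of Theorem 3, which is not reproduced here.

```python
import random
from math import sqrt


def cell_probs(p_e, p_s, phi):
    """Cell probabilities (p11, p12, p21, p22) with marginals p_e, p_s and
    odds ratio phi = p11 * p22 / (p12 * p21) (assumed parameterization)."""
    if phi == 1:
        p11 = p_e * p_s  # independence
    else:
        big_s = 1 + (p_e + p_s) * (phi - 1)
        disc = max(big_s**2 - 4 * phi * (phi - 1) * p_e * p_s, 0.0)
        p11 = (big_s - sqrt(disc)) / (2 * (phi - 1))
    cells = (p11, p_e - p11, p_s - p11, 1 - p_e - p_s + p11)
    return tuple(max(c, 0.0) for c in cells)  # clip rounding noise at the boundary


def sample_counts(n, cells, rng):
    """Draw (X_e, X_s) for one arm of n patients."""
    draws = rng.choices(range(4), weights=cells, k=n)
    x_e = sum(1 for c in draws if c in (0, 1))  # efficacy successes: cells 11 and 12
    x_s = sum(1 for c in draws if c in (0, 2))  # safety successes: cells 11 and 21
    return x_e, x_s


def rejection_rate(n, e, s, ctrl, trt, phi, reps=5000, seed=1):
    """Monte Carlo estimate of P(X_e1 - X_e0 >= e and X_s1 - X_s0 >= s)."""
    rng = random.Random(seed)
    cells0, cells1 = cell_probs(*ctrl, phi), cell_probs(*trt, phi)
    hits = 0
    for _ in range(reps):
        xe0, xs0 = sample_counts(n, cells0, rng)
        xe1, xs1 = sample_counts(n, cells1, rng)
        hits += (xe1 - xe0 >= e) and (xs1 - xs0 >= s)
    return hits / reps


if __name__ == "__main__":
    # Fixed design (n, e, s) = (45, 5, 5); vary the true odds ratio.
    for phi in (0, 1, 2):
        print(phi, rejection_rate(45, 5, 5, (0.2, 0.2), (0.4, 0.4), phi))
```

With only a few thousand replications the estimates carry visible Monte Carlo noise, so they should be read as a rough check rather than a replacement for the reported values.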

The design tables also reveal several structural patterns among the design parameters. First, as expected, a higher target power generally requires a larger sample size because stronger probability guarantees require more information from each treatment arm. Second, the clinically meaningful effect sizes $δ_{e}$ and $δ_{s}$ have a substantial impact on the required sample size. When either the efficacy threshold or the safety threshold becomes larger, the required sample size tends to decrease since larger treatment differences are easier to detect. Conversely, smaller values of $δ_{e}$ or $δ_{s}$ lead to more demanding designs and therefore require larger sample sizes. Third, for fixed baseline rates and effect-size thresholds, the required sample size tends to be non-increasing as the odds ratio $ϕ$ increases. This pattern indicates that a stronger positive association between efficacy and safety can reduce the amount of information needed to jointly demonstrate improvement on both endpoints. Overall, the numerical results are consistent with intuition: sample size is most sensitive to target power, the type I error requirement, and clinically meaningful thresholds $δ_{e}$ and $δ_{s}$, while the dependence parameter $ϕ$ appears to have little impact over the practical range considered in the tables.

5. Example

This example considers an experimental trial involving an immunotherapy-based treatment for elderly patients (≥75 years old) diagnosed with advanced non-small-cell lung cancer (NSCLC). The trial compares the immunotherapy strategy, PD1-A (anti-PD-1 monotherapy), against the standard chemotherapy regimen, consisting of carboplatin and pemetrexed, which serves as the control treatment.

The goal of the trial is to determine whether PD1-A demonstrates superiority over the control treatment in terms of both efficacy and safety. Specifically, the study targets an improvement of at least $0.30$ in the response rate and a reduction of at least $0.20$ in high-grade toxicity, corresponding to $δ_{e} = 0.30$ and $δ_{s} = 0.20$. If PD1-A fails to show meaningful improvement over the control, the standard chemotherapy will be selected.

Under this setting, by Theorem 2, the least favorable configuration (LFC) for power occurs when

$$p_{e}^{0} = \frac{1 - δ_{e}}{2}, \quad p_{s}^{0} = \frac{1 - δ_{s}}{2}, \quad p_{e}^{1} = \frac{1 + δ_{e}}{2}, \quad p_{s}^{1} = \frac{1 + δ_{s}}{2}.$$

By Theorem 4, the LFC for the type I error is attained when

$$p_{e}^{1} = p_{e}^{0} = \frac{1}{2}, \quad p_{s}^{1} = 1, \quad p_{s}^{0} = 0,$$

or, symmetrically,

$$p_{s}^{1} = p_{s}^{0} = \frac{1}{2}, \quad p_{e}^{1} = 1, \quad p_{e}^{0} = 0.$$

When the probability constraints are set at power $= 0.85$ and type I error $= 0.15$, Table 3 indicates that the fixed-sample procedure requires $n = 56$ observations per treatment arm, with associated critical values $e = 6$ and $s = 6$. Therefore, the total number of observations required to meet the design criteria is $2 \times 56 = 112$.

After the trial is completed, the procedure is applied by comparing the observed numbers of efficacy and safety successes between the PD1-A arm and the control arm. Let $X_{e}^{1}$ and $X_{s}^{1}$ denote the observed efficacy and safety successes in the PD1-A arm, and let $X_{e}^{0}$ and $X_{s}^{0}$ denote the corresponding observed successes in the control arm. With the selected critical values $e = 6$ and $s = 6$, the null hypothesis is rejected only if both $X_{e}^{1} - X_{e}^{0} \geq 6$ and $X_{s}^{1} - X_{s}^{0} \geq 6$. Otherwise, the trial does not provide sufficient evidence that PD1-A is superior to the control treatment in both efficacy and safety. For instance, if the observed differences are $X_{e}^{1} - X_{e}^{0} = 7$ and $X_{s}^{1} - X_{s}^{0} = 6$, then both criteria are met and the null hypothesis is rejected; if either difference is less than 6, the null hypothesis is retained.
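
The decision rule can be written as a short Python function. This is only an illustration: the function name `reject_null` and the arm-level counts are hypothetical, chosen so that the differences match the values quoted in the example.

```python
def reject_null(x_e1, x_e0, x_s1, x_s0, e, s):
    """Reject H0 only if both observed differences reach their thresholds."""
    return (x_e1 - x_e0 >= e) and (x_s1 - x_s0 >= s)


if __name__ == "__main__":
    # Hypothetical counts out of n = 56 per arm, giving differences 7 and 6.
    print(reject_null(x_e1=30, x_e0=23, x_s1=40, x_s0=34, e=6, s=6))
    # Efficacy difference 7 but safety difference only 5: H0 is retained.
    print(reject_null(x_e1=30, x_e0=23, x_s1=39, x_s0=34, e=6, s=6))
```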

6. Conclusions

This paper develops an exact frequentist fixed sample size hypothesis testing procedure for two-arm clinical trials with two binary endpoints, motivated by settings in which a new treatment must demonstrate improvement in both efficacy and safety relative to a concurrent control. Instead of relying on a composite endpoint, the proposed method evaluates the two endpoints jointly and rejects the null hypothesis only when the observed treatment differences exceed pre-specified paired thresholds on both dimensions. By modeling the joint efficacy–safety outcomes through a four-category multinomial distribution, the procedure allows calculation of rejection probabilities and provides a clear framework for benefit–risk evaluation.

The theoretical results characterize the least favorable configurations for both power and type I error control. The monotonicity properties of the rejection probability reduce the search for worst-case configurations to the boundary of the parameter space. When the two endpoints are independent ($ϕ = 1$), the least favorable configurations can be identified explicitly, allowing the design parameters $(n, e, s)$ to be determined without specifying the baseline control rates $p_{e}^{0}$ and $p_{s}^{0}$. When $ϕ \neq 1$, a universal least favorable configuration is generally unavailable, and calibration requires the baseline control rates to be specified. The numerical results further suggest that the required sample size is not highly sensitive to the odds ratio across the settings considered.

The proposed framework can be implemented through design tables that map the target power, type I error rate, and clinically meaningful effect sizes to the required sample size and decision thresholds. These tables provide a convenient tool for study planning and reduce the need for repeated design searches in new applications. In particular, when the baseline control rates are unknown, designs calibrated under the $ϕ = 1$ least favorable configuration may provide a practical approximation even when the true odds ratio differs from 1.

Several extensions of the proposed framework are possible. First, a limitation of the current framework is that the odds ratio $ϕ$ is treated as a pre-specified working design parameter and is assumed to be common across the control and experimental treatments. Although sensitivity analyses indicate that the design is relatively robust over practical ranges of $ϕ$, future work may consider adaptive or data-driven approaches that allow different association structures between efficacy and safety across treatment arms. Second, although the present study focuses on a fixed-sample design, the relatively large sample sizes required in some settings suggest that sequential or curtailed versions of the proposed procedure may provide meaningful gains in efficiency.

Overall, the proposed procedure offers a rigorous and interpretable approach for two-arm trials with co-primary binary endpoints. By preserving separate assessments of efficacy and safety while controlling type I error and maintaining adequate power, the method provides a useful alternative to composite-endpoint analyses and helps to bridge the gap between early-phase bivariate endpoint designs and comparative two-arm confirmatory testing.

Author Contributions

Conceptualization, P.C., C.Y. and E.M.B.; Methodology, C.Y. and E.M.B.; Software, C.Y.; Validation, C.Y.; Formal analysis, E.M.B.; Investigation, P.C. and E.M.B.; Resources, E.M.B.; Data curation, C.Y.; Writing—original draft, C.Y.; Writing—review & editing, P.C., C.Y. and E.M.B.; Visualization, C.Y.; Supervision, P.C.; Project administration, P.C.; Funding acquisition, P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

Generative ChatGPT (GPT-5.5, OpenAI) was used in a limited and supportive manner during manuscript preparation. Specifically, AI was used to assist in converting simulation output tables into LaTeX format, as well as to help check grammar and improve language clarity. All scientific content, methodology, results, and interpretations were developed and verified by the authors.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

Lemma A1.

Let $X_{e}$ and $X_{s}$ be two random variables with joint probability mass function $P(X_{e} = x_{e}, X_{s} = x_{s} \mid p_{e}, p_{s}, ϕ)$. If $ψ = ψ(x_{e}, x_{s})$ is a non-decreasing (non-increasing) function of $x_{e}$ when $x_{s}$ is held fixed, then $E ψ(X_{e}, X_{s})$ is a non-decreasing (non-increasing) function of $p_{e}$ when $p_{s}$ and $ϕ$ are held fixed. If $ψ = ψ(x_{e}, x_{s})$ is a non-decreasing (non-increasing) function of $x_{s}$ when $x_{e}$ is held fixed, then $E ψ(X_{e}, X_{s})$ is a non-decreasing (non-increasing) function of $p_{s}$ when $p_{e}$ and $ϕ$ are held fixed.

This lemma was established by [16] in a related treatment-selection framework with two binary endpoints.

Theorem A1.

Let $ϕ$ denote the odds ratio. The probability of rejecting the null hypothesis, $P(\text{Reject } H_{0})$, satisfies the following monotonicity properties:

1. Non-increasing in $p_{e}^{0}$: for fixed $p_{e}^{1}, p_{s}^{0}, p_{s}^{1}, ϕ$, $P(\text{Reject } H_{0})$ is non-increasing in $p_{e}^{0}$.
2. Non-increasing in $p_{s}^{0}$: for fixed $p_{e}^{0}, p_{e}^{1}, p_{s}^{1}, ϕ$, $P(\text{Reject } H_{0})$ is non-increasing in $p_{s}^{0}$.
3. Non-decreasing in $p_{e}^{1}$: for fixed $p_{e}^{0}, p_{s}^{0}, p_{s}^{1}, ϕ$, $P(\text{Reject } H_{0})$ is non-decreasing in $p_{e}^{1}$.
4. Non-decreasing in $p_{s}^{1}$: for fixed $p_{e}^{0}, p_{e}^{1}, p_{s}^{0}, ϕ$, $P(\text{Reject } H_{0})$ is non-decreasing in $p_{s}^{1}$.

Proof.

Let $ψ(X_{e}^{0}, X_{s}^{0}, X_{e}^{1}, X_{s}^{1})$ be an indicator function, where $(X_{e}^{j}, X_{s}^{j})$ follows the joint distribution

$$f_{X_{e}^{j}, X_{s}^{j}}(x_{e}^{j}, x_{s}^{j}) = P(X_{e}^{j} = x_{e}^{j}, X_{s}^{j} = x_{s}^{j}),$$

and the pairs $(X_{e}^{j}, X_{s}^{j})$ are independent of each other for $j = 0, 1$. The indicator is defined by

$$ψ = \begin{cases} 1 & \text{if } x_{e}^{1} - x_{e}^{0} \geq e \text{ and } x_{s}^{1} - x_{s}^{0} \geq s, \\ 0 & \text{otherwise.} \end{cases} \tag{A1}$$

That is, $ψ = 1$ if $H_{0}$ is rejected and $ψ = 0$ otherwise.

We first show that $ψ$ is a non-increasing function of $x_{e}^{0}$ when $x_{s}^{0}$ and $(x_{e}^{1}, x_{s}^{1})$ are held fixed. Assume $x_{e}^{0 *} > x_{e}^{0}$, and let $ψ(\dots x_{e}^{0})$ be shorthand for $ψ(x_{e}^{0}, x_{s}^{0}, x_{e}^{1}, x_{s}^{1})$.

If $ψ(\dots x_{e}^{0 *}) = 0$, then $ψ(\dots x_{e}^{0}) \geq ψ(\dots x_{e}^{0 *}) = 0$.

If $ψ(\dots x_{e}^{0 *}) = 1$, then, by the definition of the indicator function and $x_{e}^{0 *} > x_{e}^{0}$, we have

$$e \leq x_{e}^{1} - x_{e}^{0 *} < x_{e}^{1} - x_{e}^{0}. \tag{A2}$$

By (A2), and because the safety condition $x_{s}^{1} - x_{s}^{0} \geq s$ does not involve $x_{e}^{0}$, we have $ψ(\dots x_{e}^{0}) = 1 \geq ψ(\dots x_{e}^{0 *})$. So $ψ$ is a non-increasing function of $x_{e}^{0}$. Similarly, we can show that $ψ$ is non-increasing in $x_{s}^{0}$.

Next, we show that $ψ$ is non-decreasing in $x_{e}^{1}$ when the other variables are held fixed. Assume $x_{e}^{1 *} > x_{e}^{1}$. If $ψ(\dots x_{e}^{1}) = 0$, then $ψ(\dots x_{e}^{1 *}) \geq 0 = ψ(\dots x_{e}^{1})$.

If $ψ(\dots x_{e}^{1}) = 1$, then we have

$$x_{e}^{1 *} - x_{e}^{0} > x_{e}^{1} - x_{e}^{0} \geq e, \quad x_{s}^{1} - x_{s}^{0} \geq s. \tag{A3}$$

By (A3), we can conclude that $ψ(\dots x_{e}^{1 *}) = 1 \geq ψ(\dots x_{e}^{1})$. So $ψ$ is a non-decreasing function of $x_{e}^{1}$. Similarly, we can show that $ψ$ is non-decreasing in $x_{s}^{1}$.

So $ψ$ is monotone with respect to $x_{e}^{j}$ and $x_{s}^{j}$ for $j = 0, 1$.

Finally, we show that $P(\text{Reject } H_{0})$ is monotone with respect to $p_{e}^{0}$. By the law of iterated expectations,

$$\begin{aligned} P(\text{Reject } H_{0}) &= E\, ψ((X_{e}^{0}, X_{s}^{0}), (X_{e}^{1}, X_{s}^{1})) \\ &= E\left\{ E\left[ ψ((X_{e}^{0}, X_{s}^{0}), (X_{e}^{1}, X_{s}^{1})) \mid x_{s}^{0}, (x_{e}^{1}, x_{s}^{1}) \right] \right\}. \end{aligned} \tag{A4}$$

The inner conditional expectation is

$$\begin{aligned} & E\left[ ψ((X_{e}^{0}, X_{s}^{0}), (X_{e}^{1}, X_{s}^{1})) \mid x_{s}^{0}, (x_{e}^{1}, x_{s}^{1}) \right] \\ &= \sum_{x_{e}^{0}} ψ_{x_{s}^{0}, (x_{e}^{1}, x_{s}^{1})}(x_{e}^{0}) \times P(X_{e}^{0} = x_{e}^{0} \mid X_{s}^{0} = x_{s}^{0}) \\ &= \sum_{x_{e}^{0}} ψ_{x_{s}^{0}, (x_{e}^{1}, x_{s}^{1})}(x_{e}^{0}) \times \frac{P(X_{e}^{0} = x_{e}^{0}, X_{s}^{0} = x_{s}^{0})}{P(X_{s}^{0} = x_{s}^{0})} \\ &= \frac{1}{P(X_{s}^{0} = x_{s}^{0})} \times \sum_{x_{e}^{0}} ψ_{x_{s}^{0}, (x_{e}^{1}, x_{s}^{1})}(x_{e}^{0}) \times P(X_{e}^{0} = x_{e}^{0}, X_{s}^{0} = x_{s}^{0}). \end{aligned} \tag{A5}$$

Since (A5) is a non-increasing function of $p_{e}^{0}$ by Lemma A1, the order-preserving property of expectations implies that $P(\text{Reject } H_{0})$, which is the expectation of (A5), is a non-increasing function of $p_{e}^{0}$. Similarly, we can show that $P(\text{Reject } H_{0})$ is a monotone function of $p_{e}^{j}$ and $p_{s}^{j}$ for $j = 0, 1$. □

Appendix A.2. Proof of Theorem 2

Theorem A2.

When the odds ratio satisfies $ϕ = 1$ and the critical values satisfy $e \leq n δ_{e}$ and $s \leq n δ_{s}$, the minimal power under the alternative is attained at the boundary of the effect-size constraints, specifically when

$$p_{e}^{0} = \frac{1 - δ_{e}}{2}, \quad p_{s}^{0} = \frac{1 - δ_{s}}{2}, \quad p_{e}^{1} = \frac{1 + δ_{e}}{2}, \quad p_{s}^{1} = \frac{1 + δ_{s}}{2}.$$

Proof.

Based on Theorem 1, the power is minimized at the boundary configuration $p_{e}^{1} = p_{e}^{0} + δ_{e}$ and $p_{s}^{1} = p_{s}^{0} + δ_{s}$. It remains to determine the location of $(p_{e}^{0}, p_{s}^{0})$ that minimizes the power.

We apply the variance-stabilizing normal approximation (arcsine square-root transform):

$$Z_{e}^{j} = \sqrt{4 n} \arcsin \sqrt{\frac{X_{e}^{j}}{n}}, \quad Z_{s}^{j} = \sqrt{4 n} \arcsin \sqrt{\frac{X_{s}^{j}}{n}}, \quad j = 0, 1.$$

Under this approximation,

$$Z_{e}^{j} \approx N\left(\sqrt{4 n} \arcsin \sqrt{p_{e}^{j}}, 1\right), \quad Z_{s}^{j} \approx N\left(\sqrt{4 n} \arcsin \sqrt{p_{s}^{j}}, 1\right).$$

Moreover, when $ϕ = 1$, $X_{e}^{j}$ is independent of $X_{s}^{j}$ for $j = 0, 1$; hence, $(Z_{e}^{0}, Z_{e}^{1})$ is independent of $(Z_{s}^{0}, Z_{s}^{1})$.

After the transformation, the rejection criteria can be written as

$$T_{e} = \frac{Z_{e}^{1} - Z_{e}^{0}}{\sqrt{2}} \geq y_{e}, \quad T_{s} = \frac{Z_{s}^{1} - Z_{s}^{0}}{\sqrt{2}} \geq y_{s}.$$

Under $p_{e}^{1} = p_{e}^{0} + δ_{e}$ and $p_{s}^{1} = p_{s}^{0} + δ_{s}$, define

$$Δ_{e}(p_{e}^{0}) = \arcsin \sqrt{p_{e}^{0} + δ_{e}} - \arcsin \sqrt{p_{e}^{0}}, \quad Δ_{s}(p_{s}^{0}) = \arcsin \sqrt{p_{s}^{0} + δ_{s}} - \arcsin \sqrt{p_{s}^{0}}.$$

Then

$$T_{e} \approx N\left(\sqrt{2 n}\, Δ_{e}(p_{e}^{0}), 1\right), \quad T_{s} \approx N\left(\sqrt{2 n}\, Δ_{s}(p_{s}^{0}), 1\right),$$

and $T_{e}$ is independent of $T_{s}$. Therefore the power (under $ϕ = 1$) factorizes:

$$P(\text{Reject } H_{0}) = P(T_{e} \geq y_{e})\, P(T_{s} \geq y_{s}) \approx \left[1 - Φ\{y_{e} - \sqrt{2 n}\, Δ_{e}(p_{e}^{0})\}\right] \left[1 - Φ\{y_{s} - \sqrt{2 n}\, Δ_{s}(p_{s}^{0})\}\right].$$

For any fixed threshold $y \in \mathbb{R}$, the function $g(Δ) = 1 - Φ\{y - \sqrt{2 n}\, Δ\}$ is strictly increasing in $Δ$ since

$$g'(Δ) = \sqrt{2 n}\, φ\{y - \sqrt{2 n}\, Δ\} > 0.$$

Hence, minimizing the power with respect to $p_{e}^{0}$ (resp. $p_{s}^{0}$) is equivalent to minimizing $Δ_{e}(p_{e}^{0})$ (resp. $Δ_{s}(p_{s}^{0})$).

Next, differentiate $Δ_{e}(p)$ for $p \in (0, 1 - δ_{e})$:

$$Δ_{e}'(p) = \frac{1}{2 \sqrt{p + δ_{e}} \sqrt{1 - p - δ_{e}}} - \frac{1}{2 \sqrt{p} \sqrt{1 - p}}.$$

Setting $Δ_{e}'(p) = 0$ yields

$$\sqrt{p (1 - p)} = \sqrt{(p + δ_{e})(1 - p - δ_{e})} \iff p (1 - p) = (p + δ_{e})(1 - p - δ_{e}) \iff p = \frac{1 - δ_{e}}{2}.$$

Moreover, $Δ_{e}'(p) < 0$ for $p < \frac{1 - δ_{e}}{2}$ and $Δ_{e}'(p) > 0$ for $p > \frac{1 - δ_{e}}{2}$, so $Δ_{e}(p)$ is minimized at $p_{e}^{0} = \frac{1 - δ_{e}}{2}$. The same argument gives that $Δ_{s}(p_{s}^{0})$ is minimized at $p_{s}^{0} = \frac{1 - δ_{s}}{2}$. Therefore the power is minimized at

$$p_{e}^{0} = \frac{1 - δ_{e}}{2}, \quad p_{s}^{0} = \frac{1 - δ_{s}}{2}, \quad p_{e}^{1} = \frac{1 + δ_{e}}{2}, \quad p_{s}^{1} = \frac{1 + δ_{s}}{2}.$$

Remark A1 (flip condition under an untransformed difference approximation). The above conclusion relies on the variance-stabilizing transform, under which the variance is approximately constant (equal to 1) and the power is monotone in $Δ$ for any threshold $y_{e}, y_{s} \in \mathbb{R}$. If instead one works directly with the count difference $D_{e} = X_{e}^{1} - X_{e}^{0}$ and a rejection rule of the form $D_{e} \geq a_{e}$, then, under $p_{e}^{1} = p_{e}^{0} + δ_{e}$,

$$E[D_{e}] = n δ_{e}, \quad \operatorname{Var}(D_{e}) = n \left( p_{e}^{0} (1 - p_{e}^{0}) + (p_{e}^{0} + δ_{e})(1 - p_{e}^{0} - δ_{e}) \right) =: σ_{e}^{2}(p_{e}^{0}),$$

and a normal approximation gives

$$P(D_{e} \geq a_{e}) \approx 1 - Φ\left(\frac{a_{e} - n δ_{e}}{σ_{e}(p_{e}^{0})}\right).$$

The derivative with respect to $σ_{e}$ satisfies

$$\frac{\partial}{\partial σ_{e}} P(D_{e} \geq a_{e}) = φ\left(\frac{a_{e} - n δ_{e}}{σ_{e}}\right) \cdot \frac{a_{e} - n δ_{e}}{σ_{e}^{2}},$$

whose sign is determined by $a_{e} - n δ_{e}$. Consequently:

If $a_{e} \leq n δ_{e}$, then $P(D_{e} \geq a_{e})$ decreases with $σ_{e}$, so the worst case (minimum power) occurs at the maximal-variance point, i.e., $p_{e}^{0} = (1 - δ_{e}) / 2$.
If $a_{e} > n δ_{e}$, the monotonicity flips: $P(D_{e} \geq a_{e})$ increases with $σ_{e}$, so the worst case shifts toward minimal variance, i.e., boundary values $p_{e}^{0} \to 0$ or $p_{e}^{0} \to 1 - δ_{e}$ (and similarly for the safety endpoint).

An analogous statement holds for $D_{s} = X_{s}^{1} - X_{s}^{0}$ with threshold $a_{s}$ and mean shift $n δ_{s}$. □
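
The location of the minimizer of $Δ_{e}(p)$ can also be checked numerically. The following Python sketch, with illustrative function names and an arbitrary grid resolution, evaluates $Δ(p) = \arcsin \sqrt{p + δ} - \arcsin \sqrt{p}$ on a grid over $(0, 1 - δ)$ and returns the grid point with the smallest value, which should sit at $(1 - δ) / 2$ up to the grid spacing.

```python
from math import asin, sqrt


def delta_gap(p, delta):
    """Delta(p) = arcsin(sqrt(p + delta)) - arcsin(sqrt(p))."""
    return asin(sqrt(p + delta)) - asin(sqrt(p))


def argmin_gap(delta, steps=1000):
    """Grid point in (0, 1 - delta) that minimizes delta_gap."""
    grid = [(1 - delta) * i / steps for i in range(1, steps)]
    return min(grid, key=lambda p: delta_gap(p, delta))


if __name__ == "__main__":
    for delta in (0.1, 0.2, 0.3):
        print(delta, argmin_gap(delta), (1 - delta) / 2)
```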

Appendix A.3. Proof of Theorem 4

Theorem A3.

When the odds ratio satisfies $ϕ = 1$ and the critical values satisfy $e, s > 0$, the maximal type I error under the null hypothesis is attained when

$$p_{e}^{1} = p_{e}^{0} = 0.5, \quad p_{s}^{1} = 1, \quad p_{s}^{0} = 0,$$

or

$$p_{s}^{1} = p_{s}^{0} = 0.5, \quad p_{e}^{1} = 1, \quad p_{e}^{0} = 0.$$

Proof.

Let the rejection region be

$$R = \{T_{e} \geq e, T_{s} \geq s\},$$

where $T_{e} = X_{e}^{1} - X_{e}^{0}$ (resp. $T_{s} = X_{s}^{1} - X_{s}^{0}$) is an increasing function of the efficacy (resp. safety) evidence in arm 1 relative to arm 0. When $ϕ = 1$, efficacy and safety are independent within each arm, and the two arms are independent; hence,

$$P(\text{Reject } H_{0}) = P(T_{e} \geq e)\, P(T_{s} \geq s). \tag{A6}$$

The null hypothesis is composite:

$$H_{0} = \{p_{e}^{1} \leq p_{e}^{0}\} \cup \{p_{s}^{1} \leq p_{s}^{0}\},$$

i.e., at least one endpoint is not improved. Since, by Theorem 1, $P(R)$ is non-decreasing in $p_{e}^{1}$ and $p_{s}^{1}$ and non-increasing in $p_{e}^{0}$ and $p_{s}^{0}$, the rejection probability under each sub-null is maximized on the boundary:

$$\sup_{p_{e}^{1} \leq p_{e}^{0}} P(T_{e} \geq e) = \sup_{p_{e}^{1} = p_{e}^{0}} P(T_{e} \geq e), \quad \sup_{p_{s}^{1} \leq p_{s}^{0}} P(T_{s} \geq s) = \sup_{p_{s}^{1} = p_{s}^{0}} P(T_{s} \geq s).$$

Therefore, to maximize the overall type I error $\sup_{H_{0}} P(R)$, it suffices to consider the two boundary cases:

Case 1: $p_{e}^{1} = p_{e}^{0}$ (efficacy on the null boundary). Then, by (A6),

$$P(R) = P(T_{e} \geq e \mid p_{e}^{1} = p_{e}^{0}) \cdot P(T_{s} \geq s).$$

The second factor is at most 1, so the product is maximized by making the safety constraint as easy as possible. In particular, taking $(p_{s}^{0}, p_{s}^{1}) = (0, 1)$ yields $X_{s}^{0} \equiv 0$ and $X_{s}^{1} \equiv n$; hence, $P(T_{s} \geq s) = 1$ for any critical value $s \leq n$. Thus, the maximal type I error in this case reduces to maximizing $P(T_{e} \geq e \mid p_{e}^{1} = p_{e}^{0} = p)$ over $p \in [0, 1]$.

For the commonly used improvement-type rules based on a positive difference in binomial counts (equivalently, a positive threshold on an increasing transform), this maximization occurs at $p = 1 / 2$. A convenient statement is the following lemma (stated for the raw count difference; it applies to any increasing rejection rule with a positive critical value).

Lemma A2.

Let $X_{0}, X_{1} \overset{iid}{\sim} \operatorname{Bin}(n, p)$ be independent. For any integer $a \geq 1$, $f(p) := P(X_{1} - X_{0} \geq a)$ is maximized at $p = \frac{1}{2}$.

Proof.

Write $X_{1} - X_{0} = \sum_{i = 1}^{n} (A_{i} - B_{i})$ with $(A_{i}, B_{i})$ i.i.d. Bernoulli$(p)$ pairs, and set $Y_{i} = A_{i} - B_{i} \in \{-1, 0, 1\}$. Then

$$P(Y_{i} = 1) = P(Y_{i} = -1) = p (1 - p) =: t, \quad P(Y_{i} = 0) = 1 - 2 t,$$

so the law depends on p only through $t \in [0, 1 / 4]$. Let $K = \#\{i : Y_{i} \neq 0\} \sim \operatorname{Bin}(n, 2 t)$. Conditional on $K = k$, the nonzero increments are $\pm 1$ with equal probability, so $X_{1} - X_{0} \mid K = k \overset{d}{=} S_{k} := \sum_{j = 1}^{k} ε_{j}$, where $ε_{j} \in \{\pm 1\}$ are i.i.d. symmetric. Hence

$$f(p) = \sum_{k = 0}^{n} P(K = k)\, P(S_{k} \geq a).$$

For $a \geq 1$, the tail probability $P(S_{k} \geq a)$ is non-decreasing in k, and $K \sim \operatorname{Bin}(n, 2 t)$ is stochastically increasing in t. Therefore $f(p)$ is increasing in $t = p (1 - p)$, which is maximized at $p = \frac{1}{2}$. □
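
Independently of the argument above, the statement of Lemma A2 can be checked numerically for specific values of n and a. The Python sketch below is an illustration with assumed function names and an assumed grid; it computes $f(p) = P(X_{1} - X_{0} \geq a)$ exactly from the two binomial mass functions and reports the grid point at which it is largest.

```python
from math import comb


def tail_equal_p(n, p, a):
    """Exact P(X1 - X0 >= a) for independent X0, X1 ~ Bin(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return sum(
        pmf[x0] * pmf[x1]
        for x0 in range(n + 1)
        for x1 in range(x0 + a, n + 1)
    )


def argmax_p(n, a, steps=20):
    """Grid point p = i / steps, 0 < i < steps, that maximizes tail_equal_p."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: tail_equal_p(n, p, a))


if __name__ == "__main__":
    print(argmax_p(20, 3))
    # Worst-case one-endpoint rejection probability for n = 63 and threshold 7.
    print(tail_equal_p(63, 0.5, 7))
```

For $n = 63$ and $a = 7$, the value at $p = 1 / 2$ should be close to the simulated type I error of $0.12335$ reported for the design $(n, e, s) = (63, 7, 7)$ in Table 2.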

Applying the lemma (or its transformed-rule analogue) yields that the maximum over p is attained at $p_{e}^{0} = p_{e}^{1} = \frac{1}{2}$. Together with $(p_{s}^{0}, p_{s}^{1}) = (0, 1)$, this gives the first claimed configuration:

$$p_{e}^{1} = p_{e}^{0} = \frac{1}{2}, \quad p_{s}^{0} = 0, \quad p_{s}^{1} = 1.$$

Case 2: $p_{s}^{1} = p_{s}^{0}$ (safety on the null boundary). By symmetry of the argument, the product (A6) is maximized by taking $(p_{e}^{0}, p_{e}^{1}) = (0, 1)$ so that $P(T_{e} \geq e) = 1$ and then maximizing $P(T_{s} \geq s \mid p_{s}^{1} = p_{s}^{0} = p)$, which is attained at $p = \frac{1}{2}$. This yields the second configuration:

$$p_{s}^{1} = p_{s}^{0} = \frac{1}{2}, \quad p_{e}^{0} = 0, \quad p_{e}^{1} = 1.$$

Combining the two cases proves the theorem. □

References

1. U.S. Food and Drug Administration. Multiple Endpoints in Clinical Trials: Guidance for Industry; Food and Drug Administration: Silver Spring, MD, USA, 2022. Available online: https://www.fda.gov/media/162416/download (accessed on 24 May 2026).
2. Pocock, S.J.; Ariti, C.A.; Collier, T.J.; Wang, D. The win ratio: A new approach to the analysis of composite endpoints in clinical trials. Eur. Heart J. 2012, 33, 176–182.
3. Marcus, R.; Peritz, E.; Gabriel, K.R. On closed testing procedures with special reference to ordered analysis of variance. Biometrika 1976, 63, 655–660.
4. Berger, R.L.; Hsu, J.C. Bioequivalence trials, intersection–union tests and equivalence confidence sets. Stat. Sci. 1996, 11, 283–319.
5. Bryant, J.; Day, R. Incorporating toxicity considerations into the design of two-stage phase II clinical trials. Biometrics 1995, 51, 1372–1383.
6. Conaway, M.R.; Petroni, G.R. Bivariate sequential designs for phase II trials. Biometrics 1995, 51, 656–664.
7. Stallard, N.; Thall, P.F.; Whitehead, J. Decision-theoretic designs for phase II clinical trials with multiple outcomes. Biometrics 1999, 55, 971–977.
8. Ivanova, A.; Qaqish, B.F.; Schell, M.J. Continuous toxicity monitoring in phase II trials in oncology. Biometrics 2005, 61, 540–545.
9. Chen, C.-M.; Chi, Y. Curtailed two-stage designs with two dependent binary endpoints. Pharm. Stat. 2012, 11, 57–62.
10. Thall, P.F.; Cheng, S.-C. Treatment comparisons based on two-dimensional safety and efficacy alternatives in oncology trials. Biometrics 1999, 55, 746–753.
11. Everitt, B.S. The Analysis of Contingency Tables; Chapman and Hall: London, UK, 1977.
12. Kendall, M.G.; Stuart, A. The Advanced Theory of Statistics, Volume 2: Inference and Relationship, 3rd ed.; Charles Griffin: London, UK, 1973.
13. Homma, G.; Yoshida, T. Exact power and sample size in clinical trials with two co-primary binary endpoints. Stat. Methods Med. Res. 2025, 34, 2183–2201.
14. Jung, H.; Mitani, A.A.; Husain, M.I.; Ma, C. Exact randomized two-stage phase 2 clinical trial designs for two binary co-primary endpoints. Stat. Med. 2026, 45, e70424.
15. Birch, M.W. The detection of partial association, I: The 2 × 2 case. J. R. Stat. Soc. Ser. B Methodol. 1964, 26, 313–324.
16. Yin, C.; Buzaianu, E.M.; Chen, P.; Hsu, L. A design for selecting among k treatments with two binary endpoints in comparison to a control treatment. Sankhya B 2026, 1–40.

Table 1. Observed outcomes for treatment $π_{j}$.

		Safety
		Yes	No	Sum
Efficacy	Yes	$X_{11}^{j}$	$X_{12}^{j}$	$X_{e}^{j}$
	No	$X_{21}^{j}$	$X_{22}^{j}$	$X_{e^{c}}^{j}$
	Sum	$X_{s}^{j}$	$X_{s^{c}}^{j}$

Table 2. Target power $= 0.75$, target type I error $= 0.15$, $ϕ = 1$, and $p_{e}^{0}, p_{s}^{0}$ unknown.

$δ_{e}$	$δ_{s}$	$p_{e}^{0}$	$p_{s}^{0}$	n	e	s	Power	Type I Error
0.1	0.1	0.45	0.45	235	12	12	0.75452	0.14436
	0.2	0.45	0.40	156	10	10	0.75276	0.14103
	0.3	0.45	0.35	154	10	10	0.75091	0.13948
0.2	0.1	0.40	0.45	156	10	10	0.75417	0.14103
	0.2	0.40	0.40	63	7	7	0.75241	0.12335
	0.3	0.40	0.35	46	6	6	0.75714	0.12566
0.3	0.1	0.35	0.45	153	16	10	0.75072	0.13869
	0.2	0.35	0.40	46	6	6	0.75521	0.12566
	0.3	0.35	0.35	29	5	5	0.76641	0.11852

Table 3. Target power $= 0.85$, target type I error $= 0.15$, $ϕ = 1$, and $p_{e}^{0}, p_{s}^{0}$ unknown.

$δ_{e}$	$δ_{s}$	$p_{e}^{0}$	$p_{s}^{0}$	n	e	s	Power	Type I Error
0.1	0.1	0.45	0.45	312	14	14	0.85141	0.13987
	0.2	0.45	0.40	225	12	12	0.85153	0.13912
	0.3	0.45	0.35	224	12	17	0.85112	0.13859
0.2	0.1	0.40	0.45	225	12	12	0.85081	0.13912
	0.2	0.40	0.40	76	7	7	0.85473	0.14583
	0.3	0.40	0.35	56	6	6	0.85093	0.14930
0.3	0.1	0.35	0.45	224	14	12	0.85054	0.13859
	0.2	0.35	0.40	56	6	6	0.85183	0.14930
	0.3	0.35	0.35	34	5	5	0.85353	0.13750

Table 4. Target power = 0.75, target type I error = 0.15, $δ_{e} = δ_{s} = 0.2$, and $ϕ \in \{0, 1, 2, 4, 8\}$.

$p_{s}^{0}$	$p_{e}^{0}$	$ϕ$	n	e	s	Power	Type I Error
0.2	0.2	0	48	5	5	0.75495	0.12505
		1	47	5	5	0.75505	0.12295
		2	46	5	5	0.75396	0.11976
		4	45	5	5	0.75135	0.11693
		8	45	5	5	0.76221	0.11774
	0.4	0	53	6	5	0.75059	0.13623
		1	52	6	5	0.75415	0.13770
		2	51	6	5	0.75134	0.13305
		4	50	6	5	0.75022	0.12924
		8	50	6	5	0.75559	0.13262
	0.6	0	52	6	5	0.76262	0.13648
		1	50	6	5	0.75010	0.13051
		2	50	6	5	0.75735	0.13080
		4	50	6	5	0.76275	0.13154
		8	49	6	5	0.75629	0.12770
0.4	0.2	0	54	5	6	0.76188	0.13951
		1	52	5	6	0.75205	0.13523
		2	52	5	6	0.76051	0.13511
		4	51	5	6	0.75904	0.13437
		8	50	5	6	0.75575	0.13115
	0.4	0	58	6	6	0.75214	0.14951
		1	57	6	6	0.75607	0.14663
		2	56	6	6	0.75733	0.14505
		4	55	6	6	0.75392	0.14443
		8	54	6	6	0.75765	0.14107
	0.6	0	57	6	6	0.76124	0.14765
		1	55	6	6	0.75283	0.14256
		2	54	6	6	0.75011	0.14013
		4	54	6	6	0.76298	0.14010
		8	53	6	6	0.75934	0.13641
0.6	0.2	0	52	5	6	0.76290	0.13554
		1	50	5	6	0.75125	0.13153
		2	50	5	6	0.75556	0.13236
		4	49	5	6	0.75186	0.12765
		8	49	5	6	0.75558	0.12815
	0.4	0	57	6	6	0.76215	0.14689
		1	55	6	6	0.75404	0.14279
		2	55	6	6	0.76194	0.14152
		4	54	6	6	0.75950	0.14007
		8	53	6	6	0.76019	0.13837
	0.6	0	55	6	6	0.76030	0.14275
		1	54	6	6	0.76235	0.13779
		2	53	6	6	0.75781	0.13831
		4	52	6	6	0.75438	0.13652
		8	51	6	6	0.75506	0.13410
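
Table 4 varies the association parameter $ϕ$ between the two endpoints. Assuming, as the range {0, 1, 2, 4, 8} and the four-category multinomial model suggest, that $ϕ$ is the odds ratio of the 2 × 2 efficacy-by-safety table, the four cell probabilities follow from the two marginal rates and $ϕ$ by solving a quadratic equation for the (yes, yes) cell. The sketch below illustrates that standard construction; the function name `cell_probs` is hypothetical and the article's own parameterization may differ.

```python
from math import sqrt


def cell_probs(p_e, p_s, phi):
    """Cell probabilities (p11, p12, p21, p22) of a 2 x 2 table.

    p_e and p_s are the marginal efficacy and safety rates, and phi is the
    ASSUMED odds ratio p11 * p22 / (p12 * p21). Cell indices follow Table 1:
    first index efficacy, second index safety (1 = yes, 2 = no).
    """
    if phi == 1:
        p11 = p_e * p_s  # independence
    else:
        a = 1 + (p_e + p_s) * (phi - 1)
        disc = a * a - 4 * phi * (phi - 1) * p_e * p_s
        # The smaller root keeps all four cells inside [0, 1].
        p11 = (a - sqrt(disc)) / (2 * (phi - 1))
    return p11, p_e - p11, p_s - p11, 1 - p_e - p_s + p11


# Marginal rates 0.4 and 0.4 with an odds ratio of 2.
p11, p12, p21, p22 = cell_probs(0.4, 0.4, 2)
```

For this input the cells are approximately 0.2, 0.2, 0.2, and 0.4, which sum to 1 and give an odds ratio of 2. Setting `phi` to 0 pushes the (yes, yes) cell to its smallest feasible value, and large values of `phi` push it toward the smaller of the two marginal rates.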

Table 5. Target power = 0.85, target type I error = 0.15, $δ_{e} = δ_{s} = 0.2$, and $ϕ \in \{0, 1, 2, 4, 8\}$.

$p_{s}^{0}$	$p_{e}^{0}$	$ϕ$	n	e	s	Power	Type I Error
0.2	0.2	0	58	5	5	0.85928	0.14710
		1	57	5	5	0.85588	0.14647
		2	56	5	5	0.85261	0.14387
		4	55	5	5	0.85124	0.14296
		8	55	5	5	0.85713	0.14121
	0.4	0	71	7	6	0.85598	0.13195
		1	70	7	6	0.85405	0.13131
		2	70	7	6	0.85598	0.13254
		4	69	7	6	0.85427	0.12807
		8	68	7	6	0.85286	0.12577
	0.6	0	68	7	6	0.85027	0.12720
		1	67	7	6	0.85180	0.12478
		2	67	7	6	0.85443	0.12593
		4	67	7	6	0.85791	0.12723
		8	66	7	6	0.85109	0.12466
0.4	0.2	0	71	6	7	0.85386	0.13086
		1	70	6	7	0.85311	0.13067
		2	70	6	7	0.85510	0.13046
		4	69	6	7	0.85481	0.13033
		8	68	6	7	0.85297	0.12787
	0.4	0	77	7	7	0.85411	0.14499
		1	76	7	7	0.85368	0.14162
		2	75	7	7	0.85262	0.13935
		4	74	7	7	0.85120	0.13735
		8	73	7	7	0.85252	0.13745
	0.6	0	74	7	7	0.85361	0.13814
		1	73	7	7	0.85441	0.13641
		2	72	7	7	0.85028	0.13375
		4	72	7	7	0.85345	0.13483
		8	72	7	7	0.85966	0.13623
0.6	0.2	0	68	6	7	0.85159	0.12723
		1	67	6	7	0.85175	0.12547
		2	67	6	7	0.85383	0.12457
		4	67	6	7	0.85748	0.12530
		8	66	6	7	0.85126	0.12497
	0.4	0	74	7	7	0.85541	0.13949
		1	73	7	7	0.85350	0.13631
		2	72	7	7	0.85037	0.13508
		4	72	7	7	0.85189	0.13507
		8	71	7	7	0.85313	0.13311
	0.6	0	72	7	7	0.85746	0.13336
		1	71	7	7	0.85887	0.13302
		2	70	7	7	0.85303	0.13095
		4	69	7	7	0.85038	0.12930
		8	69	7	7	0.85707	0.13020


Share and Cite

MDPI and ACS Style

Chen, P.; Yin, C.; Buzaianu, E.M. Hypothesis Testing for Two-Arm Proportions with Two Binary Endpoints. Axioms 2026, 15, 435. https://doi.org/10.3390/axioms15060435

