Testing Homogeneity of Odds Ratio for Stratified Bilateral Correlated Data

Shen, Xi; Zhang, Xueqing; Ma, Chang-Xing

doi:10.3390/axioms15020155


Xi Shen ¹, Xueqing Zhang ² and Chang-Xing Ma ¹,*

¹ Department of Biostatistics, The State University of New York at Buffalo, Buffalo, NY 14214, USA

² Department of Statistics, Yanshan University, Qinhuangdao 066004, China

* Author to whom correspondence should be addressed.

† Statement: This article is a revised and expanded version of Dr. Xi Shen’s PhD thesis entitled “Statistical Inference of Association Parameters for Stratified Bilateral Correlated Data”, which was completed at University at Buffalo in 2018.

Axioms 2026, 15(2), 155; https://doi.org/10.3390/axioms15020155

Submission received: 18 December 2025 / Revised: 12 February 2026 / Accepted: 18 February 2026 / Published: 20 February 2026

(This article belongs to the Special Issue New Perspectives in Mathematical Statistics, 2nd Edition)


Abstract

In clinical studies such as ophthalmologic or otolaryngologic research, bilateral correlated data frequently arise when outcomes are collected from paired organs or body parts. Because measurements from paired organs are usually highly correlated, appropriate analysis must account for the intra-class correlation. Methods for analyzing bilateral data, including both inferential procedures and computational strategies, have been studied extensively over the past several decades. In some analyses, center effects or confounding factors can lead to imbalance among treatment arms, making it necessary to adjust for stratification or confounding factors in the analysis. In this article, we develop three testing procedures for assessing the homogeneity of odds ratios in stratified bilateral correlated data under the assumption of a common correlation structure. Monte Carlo simulation studies are conducted to evaluate the performance of the proposed methods. The results indicate that the Wald-type test based on a log-linear hypothesis and the score test maintain robust type I error rates and achieve high power across a range of scenarios; they are therefore recommended for practical application. The proposed methods are further illustrated using two real data examples.

MSC: 62F03

1. Introduction

In randomized controlled trials, bilateral data commonly arise when participants receive treatment or surgery on paired organs or body parts. For example, in ophthalmologic studies, patients are randomly assigned to one of two treatment groups to evaluate whether a new therapy is more effective than an established treatment or a control with no intervention. After the treatment period, outcomes are often summarized into three categories: cure in both organs, cure in only one organ, or no cure. A similar data structure appears in case–control studies investigating the association between smoking and age-related macular degeneration (AMD), where clinical measurements are collected from both the left and right eyes. In such settings, responses can likewise be classified as bilateral, unilateral, or absent, and observations from paired organs are generally positively correlated.

Recent studies have demonstrated that failing to account for within-pair dependence can result in invalid statistical inferences. To address this issue, paired correlated data are commonly analyzed using one of three parametric frameworks: the R model, Dallal’s model, and the $\rho$ model. (1) “R model”: Rosner [1] introduced a model incorporating a constant R to capture intra-pair correlation, where the conditional probability of a response in one organ given a response in the paired organ is proportional to the group-specific prevalence. Nevertheless, Dallal [2] noted that this model may provide a poor fit when bilateral responses occur with high certainty and prevalence varies substantially across groups. (2) “Dallal’s model”: To overcome this limitation, Dallal [2] proposed a model that assumes a constant conditional probability of response in one organ given a response in the other, independent of group prevalence. (3) “$\rho$ model”: Alternatively, Donner [3] suggested a formulation in which all treatment groups share a common intra-class correlation coefficient to characterize within-pair dependence. Collectively, these models provide practical strategies for handling paired correlated data. Building on these frameworks, a range of asymptotic and exact testing procedures has been developed over the past two decades, with empirical evidence indicating satisfactory performance [4,5,6,7].

Beyond these parametric models, marginal modeling approaches have also been developed for correlated binary outcomes. For example, Zou and Donner [8] extended modified Poisson regression to prospective studies with correlated data to estimate marginal risk ratios using sandwich variance estimators. Within the generalized estimating equation framework, Westgate [9] proposed improved intra-cluster correlation estimation methods that reduce bias relative to moment-based estimators. Moreover, Li and Tong [10] derived sample size formulas for modified Poisson regression in cluster randomized trials. Together, these approaches offer flexible alternatives for analyzing correlated binary data.

Stratified designs are widely employed to adjust for multi-center effects or other potential confounders and have attracted increasing attention in the recent statistical literature [11,12]. In stratified bilateral-sample studies with two treatment groups per stratum, valid analysis of paired observations must simultaneously address within-subject correlation and between-stratum heterogeneity. Motivated by this structure, a variety of statistical methods have been proposed that explicitly incorporate the dependence of bilateral outcomes within stratified designs.

Three primary measures are commonly used to characterize the association between two proportions in a population: the risk difference, the risk ratio, and the odds ratio (OR) [13]. Owing to its intuitive interpretation, the risk difference has been used extensively to compare treatment effects. For example, Tang and Qiu [14] proposed a modified score test for assessing homogeneity of the risk difference and constructed confidence intervals (CIs) for a common risk difference under the R model. Subsequently, Shen et al. [15] developed tests based on maximum likelihood estimates (MLEs) for evaluating homogeneity of risk differences in stratified correlated bilateral data under the $\rho$ model, and further proposed multiple testing procedures and CI methods for a common risk difference within the same framework.

In certain applications, the risk ratio also provides meaningful insight. Pei et al. [16] introduced a homogeneity test for proportion ratios in stratified bilateral data using the R model, while Zhuang et al. [17] derived several CI estimators for proportion ratios under the $\rho$ model within each stratum.

The odds ratio, however, is often the preferred measure of association in prospective, retrospective, and cross-sectional studies, and it serves as a fundamental parameter in the analysis of multiway contingency tables [18]. Despite the extensive literature on risk differences and risk ratios for stratified bilateral designs, asymptotic testing procedures for the OR remain largely unexplored. Unlike the risk difference or risk ratio, the OR is defined through a nonlinear transformation and is constrained to be nonnegative, rendering standard methods developed for independent or unpaired data inappropriate. This added complexity underscores the methodological challenges associated with OR-based inference in correlated bilateral settings.

Motivated by these gaps, this article focuses on developing asymptotic test procedures for assessing homogeneity of the OR in stratified correlated paired binary data under the $\rho$ model. We address these challenges by proposing testing approaches based on maximum likelihood estimation. The finite-sample performance of the proposed methods is evaluated through simulation studies, and their practical utility is demonstrated using data from an otolaryngologic study and a multi-center, two-arm clinical trial.

The remainder of this article is organized as follows. In Section 2, we briefly describe the data structure. Section 3 presents the MLEs of the model parameters along with three testing procedures. Simulation studies evaluating the performance of these tests are reported in Section 4. In Section 5, two real data examples are analyzed to illustrate the proposed methods. Finally, Section 6 provides concluding remarks and discusses potential directions for future research.

2. Data Structure

We describe the data structure and statistical model under a prospective study design. Let $m_{lij}$ denote the number of subjects in the $i$th group ($i = 1, 2$) and $j$th stratum ($j = 1, \dots, J$) who exhibit $l$ responses ($l = 0, 1, 2$). Let $m_{\cdot ij} = \sum_{l=0}^{2} m_{lij}$ (also written $m_{ij}$) represent the total number of subjects in the $i$th group and $j$th stratum.

Define $Z_{hijk}$ as an indicator variable for the response of the $h$th ($h = 1, 2$) eye (or other paired organ) of the $k$th subject ($k = 1, \dots, m_{\cdot ij}$) in the $i$th group and $j$th stratum, where $Z_{hijk} = 1$ indicates a response and $Z_{hijk} = 0$ otherwise. We assume a common marginal response probability $\Pr(Z_{hijk} = 1) = \pi_{ij}$ ($0 \leq \pi_{ij} \leq 1$) for both organs within the same group and stratum.
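Because the analysis uses only how many organs of each subject responded, the per-subject indicators $(Z_{1ijk}, Z_{2ijk})$ collapse to the counts $m_{lij}$ with $l = Z_{1ijk} + Z_{2ijk}$. A minimal Python sketch of this reduction for a single group-by-stratum cell; the function name `tabulate_pairs` and the example pairs are hypothetical.

```python
from collections import Counter


def tabulate_pairs(pairs):
    """Collapse per-subject (Z_1, Z_2) organ indicators into (m_0, m_1, m_2).

    m_l is the number of subjects with exactly l responding organs.
    """
    for z1, z2 in pairs:
        if z1 not in (0, 1) or z2 not in (0, 1):
            raise ValueError("indicators must be 0 or 1")
    counts = Counter(z1 + z2 for z1, z2 in pairs)
    return tuple(counts.get(l, 0) for l in range(3))


# Hypothetical subjects from one group and stratum: (left eye, right eye).
pairs = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 0), (1, 1)]
print(tabulate_pairs(pairs))  # (2, 2, 3)
```

Which organ responded is discarded on purpose: under the model below both organs share the same marginal probability, so only the number of responses per subject matters.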

To account for within-subject dependence, we adopt a stratified version of the equal-correlation (or $\rho$) model. Specifically, let $\rho_j$ ($-1 \leq \rho_j \leq 1$) denote the intra-class correlation coefficient for subjects in the $j$th stratum. The correlation is assumed to be identical across treatment groups within each stratum but is allowed to vary between strata.

Under this formulation, the probabilities of observing no response, a unilateral response, or a bilateral response in the $i$th group and $j$th stratum are

$$p_{0ij} = (1 - \pi_{ij})(1 - \pi_{ij} + \rho_j \pi_{ij}), \qquad p_{1ij} = 2\pi_{ij}(1 - \rho_j)(1 - \pi_{ij}), \qquad p_{2ij} = \pi_{ij}^2 + \rho_j \pi_{ij}(1 - \pi_{ij}),$$

respectively. The corresponding observed frequencies and probabilities are summarized in Table 1.
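The three cell probabilities can be evaluated directly; a quick check is that they sum to one and reduce to the independent-binomial case $((1-\pi)^2, 2\pi(1-\pi), \pi^2)$ when $\rho_j = 0$. The Python sketch below handles a single group-by-stratum cell; `cell_probs` and `rho_lower_bound` are hypothetical helper names, and the lower bound is not stated in the source: it simply restates the conditions $p_{0ij} \ge 0$ and $p_{2ij} \ge 0$, which can be tighter than $-1$.

```python
def cell_probs(pi, rho):
    """Return (p0, p1, p2) under the rho model for marginal probability pi."""
    p0 = (1 - pi) * (1 - pi + rho * pi)
    p1 = 2 * pi * (1 - rho) * (1 - pi)
    p2 = pi ** 2 + rho * pi * (1 - pi)
    return p0, p1, p2


def rho_lower_bound(pi):
    """Smallest rho with p0 >= 0 and p2 >= 0, for 0 < pi < 1 (added helper)."""
    # p0 >= 0  <=>  rho >= -(1 - pi) / pi ;  p2 >= 0  <=>  rho >= -pi / (1 - pi)
    return max(-(1 - pi) / pi, -pi / (1 - pi))


print(round(sum(cell_probs(0.3, 0.4)), 12))            # 1.0
print([round(p, 6) for p in cell_probs(0.3, 0.0)])     # [0.49, 0.42, 0.09]
print(round(rho_lower_bound(0.2), 6))                  # -0.25
```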

3. Proposed Methods

In this article, we aim to test whether the odds ratios (ORs) between the two groups are equal across all strata. Let $\theta_j$ denote the OR between the two groups in the $j$th stratum,

$$\theta_j = \frac{\pi_{2j}/(1 - \pi_{2j})}{\pi_{1j}/(1 - \pi_{1j})},$$

and let $\theta$ denote an unspecified common value. The null hypothesis of interest is

$$H_0: \theta_1 = \dots = \theta_J \triangleq \theta \quad \text{versus} \quad H_a: \text{the } \theta_j \text{ are not all equal}.$$

Let $m_j = \{m_{01j}, m_{11j}, m_{21j}; m_{02j}, m_{12j}, m_{22j}\}$ denote the observed data for the $j$th stratum, as shown in Table 1, where each column follows a multinomial distribution. Following Ma and Liu [19], the log-likelihood function of the observed data $m_j$ is

$$\begin{aligned} l_j(\pi_{1j}, \pi_{2j}, \rho_j \mid m_j) = \sum_{i=1}^{2} \Big\{ & m_{0ij} \log[(1 - \pi_{ij})(\rho_j \pi_{ij} - \pi_{ij} + 1)] \\ & + m_{1ij} \log[2\pi_{ij}(1 - \rho_j)(1 - \pi_{ij})] \\ & + m_{2ij} \log[\pi_{ij}^2 + \rho_j \pi_{ij}(1 - \pi_{ij})] \Big\} + \text{constant}. \end{aligned} \tag{1}$$

Thus, the overall log-likelihood function of the observed data is

$$l(\pi_1, \pi_2, \rho) = \sum_{j=1}^{J} l_j(\pi_{1j}, \pi_{2j}, \rho_j \mid m_j), \tag{2}$$

where $\pi_i = (\pi_{i1}, \dots, \pi_{iJ})^T$ for $i = 1, 2$, and $\rho = (\rho_1, \dots, \rho_J)^T$.
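Equations (1) and (2) translate directly into code. A minimal Python sketch, assuming each stratum's data are stored as `((m01j, m11j, m21j), (m02j, m12j, m22j))`; the function names and counts are hypothetical, the multinomial coefficients (the additive constant) are dropped, and zero-count terms are skipped so that $0 \cdot \log 0$ never arises.

```python
import math


def loglik_stratum(counts, pis, rho):
    """Stratum log-likelihood l_j of Eq. (1), without the additive constant.

    counts: ((m0, m1, m2) for group 1, (m0, m1, m2) for group 2)
    pis:    (pi_1j, pi_2j);  rho: the common correlation rho_j
    """
    total = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        probs = ((1 - pi) * (rho * pi - pi + 1),
                 2 * pi * (1 - rho) * (1 - pi),
                 pi ** 2 + rho * pi * (1 - pi))
        for m, p in zip((m0, m1, m2), probs):
            if m == 0:
                continue
            if p <= 0:
                return -math.inf  # parameters outside the feasible region
            total += m * math.log(p)
    return total


def loglik_total(data, pi1, pi2, rho):
    """Overall log-likelihood l of Eq. (2): the sum of stratum terms."""
    return sum(loglik_stratum(c, (a, b), r)
               for c, a, b, r in zip(data, pi1, pi2, rho))


data = [((10, 5, 15), (12, 8, 10)), ((8, 6, 6), (5, 7, 12))]  # hypothetical counts
print(loglik_total(data, [0.55, 0.4], [0.45, 0.6], [0.5, 0.3]))
```

Returning `-math.inf` outside the admissible region lets any maximizer treat infeasible parameter values as simply worse than every feasible one.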

(a): Unconstrained MLEs

We first derive the unconstrained MLEs. Taking the partial derivatives of $l$ (or $l_j$) with respect to $\pi_{ij}$ and $\rho_j$ and setting them equal to zero gives the score equations below; their solutions are the MLEs, denoted by $\hat{\pi}_{ij}$ and $\hat{\rho}_j$, respectively.

$$\frac{\partial l}{\partial \pi_{ij}} = \frac{(2\pi_{ij} - 1) m_{1ij}}{\pi_{ij}(\pi_{ij} - 1)} + \frac{m_{2ij}(\rho_j + 2\pi_{ij} - 2\rho_j \pi_{ij})}{\pi_{ij}(\rho_j + \pi_{ij} - \rho_j \pi_{ij})} - \frac{m_{0ij}(\rho_j + 2\pi_{ij} - 2\rho_j \pi_{ij} - 2)}{(\pi_{ij} - 1)(\rho_j \pi_{ij} - \pi_{ij} + 1)}, \quad i = 1, 2, \tag{3}$$

$$\frac{\partial l}{\partial \rho_j} = \sum_{i=1}^{2} \left[ \frac{m_{1ij}}{\rho_j - 1} - \frac{(\pi_{ij} - 1) m_{2ij}}{\rho_j + \pi_{ij} - \rho_j \pi_{ij}} + \frac{\pi_{ij} m_{0ij}}{\rho_j \pi_{ij} - \pi_{ij} + 1} \right]. \tag{4}$$
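Equations (3) and (4) can be checked against numerical derivatives of Equation (1). The Python sketch below, with hypothetical function names and counts, codes both score expressions and compares them with central finite differences; agreement to several digits is a quick guard against transcription errors. Here `a` and `b` abbreviate $\rho_j + \pi_{ij} - \rho_j\pi_{ij}$ and $\rho_j\pi_{ij} - \pi_{ij} + 1$, so that $p_{2ij} = \pi_{ij} a$ and $p_{0ij} = (1 - \pi_{ij}) b$.

```python
import math


def group_loglik(m, pi, rho):
    """One group's contribution to Eq. (1), without the constant."""
    m0, m1, m2 = m
    return (m0 * math.log((1 - pi) * (rho * pi - pi + 1))
            + m1 * math.log(2 * pi * (1 - rho) * (1 - pi))
            + m2 * math.log(pi ** 2 + rho * pi * (1 - pi)))


def score_pi(m, pi, rho):
    """dl/dpi_ij, Eq. (3), for one group's counts m = (m0, m1, m2)."""
    m0, m1, m2 = m
    a = rho + pi - rho * pi   # p2 = pi * a
    b = rho * pi - pi + 1     # p0 = (1 - pi) * b
    return ((2 * pi - 1) * m1 / (pi * (pi - 1))
            + m2 * (rho + 2 * pi - 2 * rho * pi) / (pi * a)
            - m0 * (rho + 2 * pi - 2 * rho * pi - 2) / ((pi - 1) * b))


def score_rho(counts, pis, rho):
    """dl/drho_j, Eq. (4), summed over the two groups."""
    total = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        total += (m1 / (rho - 1)
                  - (pi - 1) * m2 / (rho + pi - rho * pi)
                  + pi * m0 / (rho * pi - pi + 1))
    return total


counts, pis, rho, h = ((10, 5, 15), (12, 8, 10)), (0.55, 0.45), 0.4, 1e-6
for m, pi in zip(counts, pis):
    numeric = (group_loglik(m, pi + h, rho) - group_loglik(m, pi - h, rho)) / (2 * h)
    print(score_pi(m, pi, rho), numeric)
numeric = sum(group_loglik(m, pi, rho + h) - group_loglik(m, pi, rho - h)
              for m, pi in zip(counts, pis)) / (2 * h)
print(score_rho(counts, pis, rho), numeric)
```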

There are no closed-form solutions to this system of equations. Moreover, direct application of a global iterative algorithm is often computationally intensive and may suffer from convergence issues, especially in high-dimensional settings. Instead, the MLEs can be obtained using the procedure proposed by Ma and Liu [19], an alternating iterative scheme that updates $\pi_{ij}$ and $\rho_j$ in turn. In particular, multiplying Equation (3) by $\pi_{ij}(\pi_{ij} - 1)(\rho_j + \pi_{ij} - \rho_j\pi_{ij})(\rho_j\pi_{ij} - \pi_{ij} + 1)$ and collecting terms yields the cubic Equation (5):

$$\begin{aligned} &(4\rho_j - 2\rho_j^2 - 2) m_{ij} \pi_{ij}^3 + \left[3\rho_j^2 m_{ij} - \rho_j(5m_{0ij} + 6m_{1ij} + 7m_{2ij}) + 2m_{0ij} + 3m_{1ij} + 4m_{2ij}\right] \pi_{ij}^2 \\ &\quad + \left[(4\rho_j - \rho_j^2) m_{ij} - 2\rho_j m_{0ij} - m_{1ij} - 2m_{2ij}\right] \pi_{ij} - \rho_j (m_{1ij} + m_{2ij}) = 0. \end{aligned} \tag{5}$$

For a given $\rho_j$, the MLE of $\pi_{ij}$ is a real root of Equation (5). The iterative procedure is initialized using estimates obtained from the counts pooled across the two groups. Given an initial value of $\rho_j$, $\pi_{ij}$ is first updated by solving Equation (5) and selecting the real root within $(0, 1)$ that maximizes the stratum-specific log-likelihood. Then $\rho_j$ is updated using the Fisher scoring algorithm; the $(t+1)$th approximation of $\rho_j$ is

$$\rho_j^{(t+1)} = \rho_j^{(t)} - \left[\frac{\partial^2 l(\pi_{1j}^{(t)}, \pi_{2j}^{(t)}; \rho_j^{(t)})}{\partial \rho_j^2}\right]^{-1} \frac{\partial l(\pi_{1j}^{(t)}, \pi_{2j}^{(t)}; \rho_j^{(t)})}{\partial \rho_j}, \qquad j = 1, \dots, J.$$

Subsequently, the $(t+1)$th update of $\pi_{ij}$ is obtained by solving the cubic equation with $\rho_j$ replaced by $\rho_j^{(t+1)}$. These two steps are repeated until convergence, defined as $|\rho_j^{(t+1)} - \rho_j^{(t)}| < \varepsilon$, where $\varepsilon$ is a pre-specified tolerance level. The expressions for the second-order derivatives are provided in Appendix A.
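A minimal Python sketch of the alternating scheme for one stratum, under stated assumptions: the starting value for $\rho_j$ is a clipped moment-type estimate from the pooled counts (the source does not specify one), roots of Equation (5) in $(0, 1)$ are located by a grid scan plus bisection rather than a closed-form cubic solver, and the $\rho_j$ update is the Newton-type step displayed above with step halving added as a safeguard so that the likelihood never decreases. All function names are hypothetical.

```python
import math


def cubic_coeffs(m, rho):
    """Coefficients (c3, c2, c1, c0) of the cubic Eq. (5) in pi_ij."""
    m0, m1, m2 = m
    n = m0 + m1 + m2
    c3 = (4 * rho - 2 * rho ** 2 - 2) * n
    c2 = (3 * rho ** 2 * n - rho * (5 * m0 + 6 * m1 + 7 * m2)
          + 2 * m0 + 3 * m1 + 4 * m2)
    c1 = (4 * rho - rho ** 2) * n - 2 * rho * m0 - m1 - 2 * m2
    c0 = -rho * (m1 + m2)
    return c3, c2, c1, c0


def group_loglik(m, pi, rho):
    """One group's term of Eq. (1); -inf outside the feasible region."""
    probs = ((1 - pi) * (rho * pi - pi + 1), 2 * pi * (1 - rho) * (1 - pi),
             pi ** 2 + rho * pi * (1 - pi))
    if any(p <= 0 for k, p in zip(m, probs) if k > 0):
        return -math.inf
    return sum(k * math.log(p) for k, p in zip(m, probs) if k > 0)


def update_pi(m, rho, grid=2000):
    """Real root of Eq. (5) in (0, 1) that maximizes the group log-likelihood."""
    c3, c2, c1, c0 = cubic_coeffs(m, rho)
    f = lambda x: ((c3 * x + c2) * x + c1) * x + c0
    xs = [1e-9 + (1 - 2e-9) * k / grid for k in range(grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if f(lo) * f(hi) > 0:
            continue
        for _ in range(60):  # bisection on a bracketing interval
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    if not roots:
        raise ValueError("no root of Eq. (5) found in (0, 1)")
    return max(roots, key=lambda r: group_loglik(m, r, rho))


def rho_derivs(counts, pis, rho):
    """First (Eq. 4) and second (Appendix A) derivatives of l_j in rho_j."""
    d1 = d2 = 0.0
    for (m0, m1, m2), pi in zip(counts, pis):
        a, b = rho + pi - rho * pi, rho * pi - pi + 1
        d1 += m1 / (rho - 1) - (pi - 1) * m2 / a + pi * m0 / b
        d2 -= m1 / (rho - 1) ** 2 + pi ** 2 * m0 / b ** 2 + (pi - 1) ** 2 * m2 / a ** 2
    return d1, d2


def fit_stratum(counts, tol=1e-10, max_iter=500):
    """Unconstrained MLEs ([pi_1j, pi_2j], rho_j) by alternating updates."""
    m0, m1, m2 = (counts[0][l] + counts[1][l] for l in range(3))
    n = m0 + m1 + m2
    pi0 = (m1 + 2 * m2) / (2 * n)  # pooled organ-level proportion
    # Moment-type start from p2 = pi^2 + rho*pi*(1 - pi), clipped (assumption).
    rho = min(max((m2 / n - pi0 ** 2) / (pi0 * (1 - pi0)), 0.0), 0.95)
    for _ in range(max_iter):
        pis = [update_pi(m, rho) for m in counts]
        d1, d2 = rho_derivs(counts, pis, rho)
        step = -d1 / d2
        ll = lambda r: sum(group_loglik(m, p, r) for m, p in zip(counts, pis))
        base = ll(rho)
        while ll(rho + step) < base and abs(step) > 1e-14:
            step /= 2  # step halving: added safeguard, not in the source
        rho += step
        if abs(step) < tol:
            break
    return [update_pi(m, rho) for m in counts], rho


(pi1_hat, pi2_hat), rho_hat = fit_stratum(((10, 5, 15), (12, 8, 10)))  # hypothetical
print(round(pi1_hat, 4), round(pi2_hat, 4), round(rho_hat, 4))
```

Because Equation (5) is Equation (3) times a factor that is nonzero inside the feasible region, every root it returns is a stationary point in $\pi_{ij}$; picking the one with the largest likelihood guards against the other real roots.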

(b): Constrained MLEs

We next derive the constrained MLEs. Under the null hypothesis, solving $\theta = [\pi_{2j}/(1 - \pi_{2j})]/[\pi_{1j}/(1 - \pi_{1j})]$ for $\pi_{2j}$ gives

$$\pi_{2j} = \frac{\theta \pi_{1j}}{\theta \pi_{1j} - \pi_{1j} + 1},$$

so that the parameter space is reduced to $\rho_j$, $\pi_{1j}$ ($j = 1, \dots, J$), and the common parameter $\theta$. Setting the partial derivatives of $l$ (or $l_j$) with respect to $(\rho_j, \pi_{1j}, \theta)$ equal to zero yields the constrained MLEs, denoted by $(\hat{\rho}_{jH_0}, \hat{\pi}_{1jH_0}, \hat{\theta}_{H_0})$.
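As a quick numerical check of this reparameterization: for $\theta > 0$ and $0 < \pi_{1j} < 1$ the denominator equals $1 + \pi_{1j}(\theta - 1) > 1 - \pi_{1j} > 0$, so the implied $\pi_{2j}$ always lies in $(0, 1)$, and it reproduces the odds ratio exactly. The function names below are hypothetical.

```python
def pi2_from_theta(pi1, theta):
    """pi_2j implied by pi_1j and the common odds ratio theta under H0."""
    return theta * pi1 / (theta * pi1 - pi1 + 1)


def odds_ratio(pi1, pi2):
    """theta_j = [pi2 / (1 - pi2)] / [pi1 / (1 - pi1)], as defined in Section 3."""
    return (pi2 / (1 - pi2)) / (pi1 / (1 - pi1))


pi2 = pi2_from_theta(0.3, 2.5)
print(round(pi2, 6), round(odds_ratio(0.3, pi2), 6))  # 0.517241 2.5
```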

There are no closed-form solutions for $(\hat{\rho}_{jH_0}, \hat{\pi}_{1jH_0}, \hat{\theta}_{H_0})$. We adopt the iterative approach of Shen et al. [15] to obtain the constrained MLEs under the null hypothesis. The common odds ratio $\theta$ is initialized at 1, and $\pi_{1j}$ and $\rho_j$ are initialized using counts pooled across the two groups. Given the current values of $\pi_{1j}$ and $\rho_j$, $\theta$ is updated via the Newton–Raphson algorithm; then $\pi_{1j}$ and $\rho_j$ are updated jointly for each stratum using the Fisher scoring algorithm with $\theta$ held fixed, selecting the feasible solution with $\pi_{1j} \in (0, 1)$ that maximizes the likelihood. This process is repeated until the changes in $\theta$, $\pi_{1j}$, and $\rho_j$ between successive iterations are all smaller than a pre-specified tolerance $\varepsilon$, i.e., $\max\{|\theta^{(t+1)} - \theta^{(t)}|, |\pi_{1j}^{(t+1)} - \pi_{1j}^{(t)}|, |\rho_j^{(t+1)} - \rho_j^{(t)}|\} < \varepsilon$ for all $j$.
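The scheme above alternates a Newton–Raphson update of $\theta$ with joint Fisher scoring updates of $(\pi_{1j}, \rho_j)$. As a simpler, derivative-free stand-in (not the authors' algorithm), the Python sketch below maximizes the same constrained log-likelihood by block coordinate ascent: each sweep maximizes over $\log\theta$ with the stratum parameters fixed, then over each $\pi_{1j}$ (on the logit scale) and $\rho_j$ in turn, each by golden-section search. It assumes every one-dimensional slice is unimodal; for the $\rho_j$ slices this holds because $\partial^2 l/\partial\rho_j^2 < 0$ (Appendix A). The search ranges, the clipped moment-type starting value for $\rho_j$, the counts, and all function names are illustrative assumptions.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # golden-section ratio, about 0.618


def golden_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    x1, x2 = hi - INV_PHI * (hi - lo), lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
        else:        # maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def stratum_ll(counts, pi1, rho, theta):
    """l_j under H0, with pi_2j = theta*pi_1j / (theta*pi_1j - pi_1j + 1)."""
    pi2 = theta * pi1 / (theta * pi1 - pi1 + 1)
    total = 0.0
    for m, pi in zip(counts, (pi1, pi2)):
        probs = ((1 - pi) * (rho * pi - pi + 1), 2 * pi * (1 - rho) * (1 - pi),
                 pi ** 2 + rho * pi * (1 - pi))
        for k, p in zip(m, probs):
            if k > 0:
                if p <= 0:
                    return -math.inf
                total += k * math.log(p)
    return total


def expit(u):
    return 1 / (1 + math.exp(-u))


def fit_null(data, tol=1e-9, max_sweeps=2000):
    """Constrained MLEs (theta, [pi_1j], [rho_j]) by block coordinate ascent."""
    log_theta, pi1, rho = 0.0, [], []  # theta starts at 1
    for counts in data:  # pooled-count starting values (assumption)
        m0, m1, m2 = (counts[0][l] + counts[1][l] for l in range(3))
        n = m0 + m1 + m2
        p = (m1 + 2 * m2) / (2 * n)
        pi1.append(p)
        rho.append(min(max((m2 / n - p * p) / (p * (1 - p)), 0.0), 0.95))
    for _ in range(max_sweeps):
        old = [log_theta] + pi1 + rho
        # 1) common log odds ratio, stratum parameters held fixed
        log_theta = golden_max(lambda t: sum(
            stratum_ll(c, p, r, math.exp(t)) for c, p, r in zip(data, pi1, rho)),
            -10.0, 10.0)
        theta = math.exp(log_theta)
        # 2) each stratum's pi_1j (logit scale), then rho_j
        for j, counts in enumerate(data):
            pi1[j] = expit(golden_max(
                lambda u: stratum_ll(counts, expit(u), rho[j], theta), -15.0, 15.0))
            rho[j] = golden_max(
                lambda r: stratum_ll(counts, pi1[j], r, theta), -0.999, 0.999)
        if max(abs(a - b) for a, b in zip(old, [log_theta] + pi1 + rho)) < tol:
            break
    return math.exp(log_theta), pi1, rho


data = [((10, 5, 15), (12, 8, 10)), ((8, 6, 6), (5, 7, 12))]  # hypothetical counts
theta_hat, pi1_hat, rho_hat = fit_null(data)
print(round(theta_hat, 4), [round(p, 4) for p in pi1_hat], [round(r, 4) for r in rho_hat])
```

Coordinate ascent never decreases the likelihood, which makes it easy to reason about, but it converges only linearly; the Newton–Raphson and Fisher scoring updates described in the text should need far fewer iterations.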

With all MLEs obtained, we consider the following test procedures.

3.1. Likelihood Ratio Test ( $T_{L}$ )

The likelihood ratio test (LRT) statistic is

$$T_L = 2\left[ l(\hat{\pi}_{11}, \hat{\pi}_{21}, \dots, \hat{\pi}_{1J}, \hat{\pi}_{2J}, \hat{\rho}_1, \dots, \hat{\rho}_J) - l(\hat{\pi}_{11H_0}, \hat{\pi}_{21H_0}, \dots, \hat{\pi}_{1JH_0}, \hat{\pi}_{2JH_0}, \hat{\rho}_{1H_0}, \dots, \hat{\rho}_{JH_0}) \right], \tag{6}$$

where $\hat{\pi}_{2jH_0} = \hat{\theta}_{H_0}\hat{\pi}_{1jH_0} / (\hat{\theta}_{H_0}\hat{\pi}_{1jH_0} - \hat{\pi}_{1jH_0} + 1)$. Following Wilks [20], under the null hypothesis $T_L$ asymptotically follows a chi-square distribution with $J - 1$ degrees of freedom, the difference between the $3J$ free parameters of the unrestricted model and the $2J + 1$ free parameters of the null model.
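Once both maximized log-likelihoods are available, $T_L$ and its asymptotic p-value follow directly. The self-contained Python sketch below evaluates the chi-square upper tail for an integer number of degrees of freedom with the standard recursion $Q(x; k + 2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$, started from $Q(x; 1) = \operatorname{erfc}(\sqrt{x/2})$ or $Q(x; 2) = e^{-x/2}$. The function names are hypothetical, and the log-likelihood values in the usage line are placeholders, not results from this article.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2:  # odd df: start from df = 1
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:       # even df: start from df = 2
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def likelihood_ratio_test(loglik_unrestricted, loglik_null, n_strata):
    """T_L of Eq. (6), its degrees of freedom J - 1, and the asymptotic p-value."""
    t_l = 2 * (loglik_unrestricted - loglik_null)
    df = n_strata - 1
    return t_l, df, chi2_sf(t_l, df)


# Hypothetical maximized log-likelihoods for J = 2 strata.
t_l, df, p = likelihood_ratio_test(-100.0, -102.5, n_strata=2)
print(t_l, df, round(p, 4))  # 5.0 1 0.0253
```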

3.2. Wald-Type Log-Linear Test ( $T_{W}$ )

Log transformation is widely used in biomedical research to handle skewed data. The odds ratio, as a ratio of two estimated odds that are bounded below by zero and unbounded above, has a highly skewed sampling distribution when sample sizes are small to moderate, whereas its logarithm is much closer to normally distributed. The null hypothesis $\theta_1 = \dots = \theta_J \triangleq \theta$ can therefore be written as $\log\theta_1 = \dots = \log\theta_J \triangleq \log\theta$, where, with $\theta_j$ as defined above,

$$\log\theta_j = \log[\pi_{2j}/(1 - \pi_{2j})] - \log[\pi_{1j}/(1 - \pi_{1j})].$$

Equivalently, the null hypothesis is

$$\log(O_{21}) - \log(O_{11}) = \dots = \log(O_{2J}) - \log(O_{1J}) \triangleq \log(O_2) - \log(O_1),$$

where $\log(O_{ij}) = \log[\pi_{ij}/(1 - \pi_{ij})]$ is the logit for the $i$th group in the $j$th stratum.

Let $\beta = (\rho_1, \pi_{11}, \pi_{21}, \dots, \rho_J, \pi_{1J}, \pi_{2J})^T$, with corresponding unconstrained MLE $\hat{\beta} = (\hat{\rho}_1, \hat{\pi}_{11}, \hat{\pi}_{21}, \dots, \hat{\rho}_J, \hat{\pi}_{1J}, \hat{\pi}_{2J})^T$.

Let $\gamma = (\log\rho_1, \log O_{11}, \log O_{21}, \dots, \log\rho_J, \log O_{1J}, \log O_{2J})^T$; then the null hypothesis can be rewritten as $H\gamma = 0$, where

$$H = \begin{pmatrix} c & -c & 0 & \cdots & 0 \\ 0 & c & -c & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & c & -c \end{pmatrix}_{(J-1) \times 3J}, \qquad c = (0, 1, -1).$$

Row $r$ of $H$ contrasts strata $r$ and $r + 1$, $(H\gamma)_r = (\log O_{1r} - \log O_{2r}) - (\log O_{1,r+1} - \log O_{2,r+1})$, and the zero entries ensure that the $\rho_j$ components do not enter the hypothesis.

By the asymptotic normality of the MLE under standard regularity conditions,

$$\sqrt{N}(\hat{\beta} - \beta) \xrightarrow{d} \text{Normal}(0, I_{(\beta)}^{-1}),$$

where $N$ is the sample size and $I_{(\beta)}$ is the information matrix for $\beta$. This matrix is block diagonal, $I_{(\beta)} = \text{diag}(I_1, \dots, I_J)$, where $I_j$ is the information matrix for $(\rho_j, \pi_{1j}, \pi_{2j})$ from the $j$th stratum (Appendix A lists its entries in the order $(\pi_{1j}, \pi_{2j}, \rho_j)$). Using the delta method, the asymptotic distribution of $\hat{\gamma}$ is

$$\sqrt{N}(\hat{\gamma} - \gamma) \xrightarrow{d} \text{Normal}(0, I_{(\gamma)}^{-1}),$$

where $I_{(\gamma)}$ is the information matrix for $\gamma$.

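By the delta method, the asymptotic covariance matrix $I_{(\gamma)}^{-1}$ is block diagonal,

$$I_{(\gamma)}^{-1} = \text{diag}\left(g_1 I_1^{-1} g_1, \dots, g_J I_J^{-1} g_J\right),$$

where $g_j$ is the diagonal Jacobian of the logit transformation,

$$g_j = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac{1}{\pi_{1j}} + \frac{1}{1 - \pi_{1j}} & 0 \\ 0 & 0 & \frac{1}{\pi_{2j}} + \frac{1}{1 - \pi_{2j}} \end{bmatrix},$$

since $\frac{d}{d\pi}\log[\pi/(1 - \pi)] = \frac{1}{\pi} + \frac{1}{1 - \pi}$. The first diagonal entry of $g_j$ is set to zero because $H$ places zero weight on the $\rho_j$ components, which therefore do not affect the test statistic.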

Therefore, the Wald-type log-linear test statistic is

$$T_W = (\gamma^T H^T)(H I_{(\gamma)}^{-1} H^T)^{-1}(H\gamma)\big|_{\gamma = \hat{\gamma}}. \tag{7}$$

Under the null hypothesis, $T_W$ is asymptotically distributed as a chi-square distribution with $J - 1$ degrees of freedom.
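To make Equation (7) concrete, the Python sketch below builds $H$, forms the delta-method covariance of each transformed stratum block as $g_j I_j^{-1} g_j$ ($g_j$ is diagonal, hence symmetric), and solves the resulting $(J-1) \times (J-1)$ system. Its inputs are the unconstrained MLEs and, for each stratum, a 3×3 asymptotic covariance matrix of $(\hat{\rho}_j, \hat{\pi}_{1j}, \hat{\pi}_{2j})$, i.e., the inverse-information block; the numbers in the usage lines are hypothetical. Because the strata are independent and the $\rho_j$ entries receive zero weight, $T_W$ also equals the weighted heterogeneity form $\sum_j w_j(d_j - \bar{d})^2$, with $d_j = \log\hat{\theta}_j$, $w_j = 1/\widehat{\mathrm{Var}}(d_j)$ and $\bar{d} = \sum_j w_j d_j / \sum_j w_j$; the second function computes that form as a cross-check. This equivalence is a standard algebraic identity for zero-sum contrasts, not a claim made in the source.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def wald_loglinear(pi1, pi2, cov):
    """T_W of Eq. (7); cov[j] is the 3x3 covariance of (rho_j, pi_1j, pi_2j)."""
    J = len(pi1)
    gamma, blocks = [], []
    for p1, p2, s in zip(pi1, pi2, cov):
        gamma += [0.0, logit(p1), logit(p2)]  # log(rho_j) slot: weight 0 in H
        g = [0.0, 1 / (p1 * (1 - p1)), 1 / (p2 * (1 - p2))]  # diagonal of g_j
        blocks.append([[g[a] * s[a][b] * g[b] for b in range(3)] for a in range(3)])
    H = []
    for j in range(J - 1):  # row j: (c, -c) on strata j, j + 1 with c = (0, 1, -1)
        row = [0.0] * (3 * J)
        row[3 * j + 1], row[3 * j + 2] = 1.0, -1.0
        row[3 * j + 4], row[3 * j + 5] = -1.0, 1.0
        H.append(row)

    def cov_gamma(a, b):  # block-diagonal covariance of gamma-hat
        return blocks[a // 3][a % 3][b % 3] if a // 3 == b // 3 else 0.0

    hg = [sum(h * x for h, x in zip(row, gamma)) for row in H]
    hsh = [[sum(ra[a] * cov_gamma(a, b) * rb[b]
                for a in range(3 * J) for b in range(3 * J)) for rb in H] for ra in H]
    return sum(u * v for u, v in zip(hg, solve(hsh, hg)))


def wald_via_weights(pi1, pi2, cov):
    """Equivalent inverse-variance weighted heterogeneity form of T_W."""
    d, w = [], []
    for p1, p2, s in zip(pi1, pi2, cov):
        g1, g2 = 1 / (p1 * (1 - p1)), 1 / (p2 * (1 - p2))
        d.append(logit(p2) - logit(p1))  # log theta_j-hat
        w.append(1 / (g1 * g1 * s[1][1] + g2 * g2 * s[2][2] - 2 * g1 * g2 * s[1][2]))
    d_bar = sum(wi * di for wi, di in zip(w, d)) / sum(w)
    return sum(wi * (di - d_bar) ** 2 for wi, di in zip(w, d))


S = [[0.010, 0.001, 0.001], [0.001, 0.004, 0.0008], [0.001, 0.0008, 0.005]]  # hypothetical
pi1, pi2 = [0.40, 0.50, 0.35], [0.55, 0.60, 0.50]
print(wald_loglinear(pi1, pi2, [S] * 3), wald_via_weights(pi1, pi2, [S] * 3))
```

The explicit-$H$ version mirrors Equation (7) term by term; the weighted form is cheaper and shows that, for a given stratum, only $\widehat{\mathrm{Var}}(\log\hat{\theta}_j)$ matters.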

In particular, to compare a given pair of strata $i \neq j$ (here $i$ and $j$ index strata), i.e., to test $H_0: \theta_i = \theta_j$ versus $H_a: \theta_i \neq \theta_j$, the pairwise Wald-type log-linear test statistic is

$$T_{Wa} = (\gamma^T h^T)(h I_{(\gamma)}^{-1} h^T)^{-1}(h\gamma)\big|_{\gamma = \hat{\gamma}}, \tag{8}$$

where $h = (0, \dots, 0, 1, -1, \dots, 0, -1, 1, \dots, 0)$ is a $1 \times 3J$ vector with $(0, 1, -1)$ in elements $3i - 2$ to $3i$, $(0, -1, 1)$ in elements $3j - 2$ to $3j$, and 0 otherwise. Under the null hypothesis, $T_{Wa}$ is asymptotically distributed as a chi-square distribution with 1 degree of freedom.
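Because $h\gamma = (\log O_{1i} - \log O_{2i}) - (\log O_{1j} - \log O_{2j})$ and the strata are independent, Equation (8) reduces to the squared difference of two estimated log odds ratios divided by the sum of their estimated variances. A minimal Python sketch, assuming $\log\hat{\theta}_j$ and $\widehat{\mathrm{Var}}(\log\hat{\theta}_j)$ have already been computed (for instance via the delta method); the function name and numbers are hypothetical, and strata are indexed from 0 in the code.

```python
import math


def pairwise_wald(log_or, var, i, j):
    """T_Wa of Eq. (8) for strata i and j, with its chi-square(1) p-value."""
    t = (log_or[i] - log_or[j]) ** 2 / (var[i] + var[j])
    return t, math.erfc(math.sqrt(t / 2))  # P(chi^2_1 > t)


log_or = [0.2, 0.9, 0.5]   # hypothetical log odds ratio estimates
var = [0.10, 0.15, 0.12]   # hypothetical variance estimates
t, p = pairwise_wald(log_or, var, 0, 1)
print(round(t, 4), round(p, 4))  # 1.96 0.1615
```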

3.3. Score Test ( $T_{S C}$ )

The score test statistic $T_{SC}$ uses the MLEs of the parameters under the null hypothesis. Let $\delta = (\theta_1, \pi_{11}, \rho_1, \dots, \theta_J, \pi_{1J}, \rho_J)^T$, where $\pi_{2j}$ is expressed through $\theta_j$ and $\pi_{1j}$ as in the constrained parameterization. The score function $U$ is a row vector whose $j$th block is $U_j = (\partial l/\partial\theta_j, \partial l/\partial\pi_{1j}, \partial l/\partial\rho_j)$, and $I_{(\delta)}$ denotes the information matrix for $\delta$, which is block diagonal with blocks $I_j^{(\delta)}$, i.e., $I_{(\delta)} = \text{diag}(I_1^{(\delta)}, \dots, I_J^{(\delta)})$.

Therefore, the score test statistic is

$$T_{SC} = U I_{(\delta)}^{-1} U^T \big|_{H_0} = \sum_{j=1}^{J} U_j \left[I_j^{(\delta)}(\theta_j, \pi_{1j}, \rho_j)\right]^{-1} U_j^T \Big|_{\theta_j = \hat{\theta}_{H_0},\, \pi_{1j} = \hat{\pi}_{1jH_0},\, \rho_j = \hat{\rho}_{jH_0}}. \tag{9}$$

Here, $\theta_j$ is the parameter of interest, while $\pi_{1j}$ and $\rho_j$ are nuisance parameters. Because the constrained MLEs solve $\partial l/\partial\pi_{1j} = 0$ and $\partial l/\partial\rho_j = 0$, the score components corresponding to the nuisance parameters vanish at the constrained MLEs. That is, the $j$th block of the score function reduces to $U_j = (\partial l_j/\partial\theta_j, 0, 0)|_{\theta_j = \hat{\theta}_{H_0}}$, and the test statistic simplifies to

$$T_{SC} = \sum_{j=1}^{J} \left(\frac{\partial l_j}{\partial\theta_j}\right)^2 \left[I_j^{(\delta)}\right]^{-1}(1,1) \Big|_{\theta_j = \hat{\theta}_{H_0},\, \pi_{1j} = \hat{\pi}_{1jH_0},\, \rho_j = \hat{\rho}_{jH_0}}, \tag{10}$$

where $[I_j^{(\delta)}]^{-1}(1,1)$ denotes the $(1,1)$th entry of $[I_j^{(\delta)}]^{-1}$. Under the null hypothesis, $T_{SC}$ has an asymptotic chi-square distribution with $J - 1$ degrees of freedom.
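Since $\pi_{2j} = \theta_j\pi_{1j}/(\theta_j\pi_{1j} - \pi_{1j} + 1)$ in the $\delta$ parameterization, $l_j$ depends on $\theta_j$ only through $\pi_{2j}$, and the chain rule gives $\partial l_j/\partial\theta_j = (\partial l/\partial\pi_{2j}) \cdot \pi_{1j}(1 - \pi_{1j})/(\theta_j\pi_{1j} - \pi_{1j} + 1)^2$, with $\partial l/\partial\pi_{2j}$ taken from Equation (3). The Python sketch below codes this derivative and Equation (10), taking each $I_j^{(\delta)}$ as a given 3×3 matrix (the sketch does not compute it) and extracting the $(1,1)$ entry of its inverse by cofactors. Function names and inputs are hypothetical.

```python
import math


def score_pi(m, pi, rho):
    """dl/dpi_ij, Eq. (3), for one group's counts m = (m0, m1, m2)."""
    m0, m1, m2 = m
    a, b = rho + pi - rho * pi, rho * pi - pi + 1
    return ((2 * pi - 1) * m1 / (pi * (pi - 1))
            + m2 * (rho + 2 * pi - 2 * rho * pi) / (pi * a)
            - m0 * (rho + 2 * pi - 2 * rho * pi - 2) / ((pi - 1) * b))


def score_theta(counts, pi1, rho, theta):
    """dl_j/dtheta_j via the chain rule through pi_2j (counts[1] is group 2)."""
    denom = theta * pi1 - pi1 + 1
    pi2 = theta * pi1 / denom
    return score_pi(counts[1], pi2, rho) * pi1 * (1 - pi1) / denom ** 2


def inv11(info):
    """(1,1) entry of the inverse of a 3x3 matrix, via cofactors."""
    (a, b, c), (d, e, f), (g, h, i) = info
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return (e * i - f * h) / det


def score_statistic(scores, infos):
    """T_SC of Eq. (10): sum_j (dl_j/dtheta_j)^2 * [I_j^(delta)]^{-1}(1,1)."""
    return sum(u * u * inv11(info) for u, info in zip(scores, infos))


print(inv11([[2, 1, 0], [1, 2, 1], [0, 1, 2]]))  # 0.75
```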

For each of the three proposed test procedures, the null hypothesis is rejected at the nominal significance level $\alpha$ if the observed value of the test statistic exceeds $\chi^2_{J-1,\alpha}$, the upper $\alpha$ quantile of the chi-square distribution with $J - 1$ degrees of freedom.
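The critical values $\chi^2_{J-1,\alpha}$ for the numbers of strata considered in the simulation study ($J = 2, 4, 8$) can be obtained by inverting the chi-square upper tail. A self-contained Python sketch using bisection; the upper tail is computed for integer degrees of freedom by the recursion $Q(x; k + 2) = Q(x; k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$. Function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    q, k = (math.erfc(math.sqrt(x / 2)), 1) if df % 2 else (math.exp(-x / 2), 2)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_critical(df, alpha=0.05):
    """Upper-alpha quantile chi^2_{df, alpha}, by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2  # expand until the quantile is bracketed
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject(statistic, n_strata, alpha=0.05):
    """Reject H0 when the statistic exceeds chi^2_{J-1, alpha}."""
    return statistic > chi2_critical(n_strata - 1, alpha)


for J in (2, 4, 8):
    print(J, round(chi2_critical(J - 1), 3))
# 2 3.841
# 4 7.815
# 8 14.067
```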

4. Simulation Studies

In this section, we use Monte Carlo simulations to examine the empirical type I error rates and power of the three test statistics introduced in the previous section.

First, we conduct simulation studies to evaluate the type I error rate under various parameter configurations. We focus on balanced data with equal sample sizes in the two groups, $m = m_{\cdot 11} = m_{\cdot 21} = \dots = m_{\cdot 1J} = m_{\cdot 2J} = 25$, 50, or 100, across $J = 2$, 4, or 8 strata; boxplots in Figure 1 illustrate the distribution of the empirical type I error rates of all tests under these settings. The specific parameter settings are summarized in Table 2. For each configuration, 10,000 samples are generated under the null hypothesis, and the empirical type I error rate is computed as the proportion of samples in which the null hypothesis is rejected. All tests are conducted at the 5% significance level. Following Tang et al. [21], a test is considered liberal if its empirical type I error rate exceeds 0.06, conservative if it is below 0.04, and robust otherwise.
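One replicate of the simulation draws, for each group and stratum, $m$ subjects from the multinomial distribution with cell probabilities $(p_{0ij}, p_{1ij}, p_{2ij})$. The Python sketch below shows that data-generation step and the classification rule quoted above; the function names, seed, and parameter values are hypothetical, and the tests themselves are not re-implemented here. It also computes the binomial Monte Carlo standard error $\sqrt{\alpha(1-\alpha)/R}$: with $R = 10{,}000$ replicates and $\alpha = 0.05$ it is about 0.0022, so the band from 0.04 to 0.06 spans roughly ±4.6 standard errors around the nominal level.

```python
import math
import random


def cell_probs(pi, rho):
    """(p0, p1, p2) under the rho model."""
    return ((1 - pi) * (1 - pi + rho * pi),
            2 * pi * (1 - rho) * (1 - pi),
            pi ** 2 + rho * pi * (1 - pi))


def simulate_counts(m, pi, rho, rng):
    """Draw (m0, m1, m2) for m subjects from the rho-model multinomial."""
    draws = rng.choices((0, 1, 2), weights=cell_probs(pi, rho), k=m)
    return tuple(draws.count(l) for l in range(3))


def classify(rate, lower=0.04, upper=0.06):
    """Tang et al. [21] rule: liberal above 0.06, conservative below 0.04."""
    if rate > upper:
        return "liberal"
    if rate < lower:
        return "conservative"
    return "robust"


def mc_standard_error(rate, n_rep):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1 - rate) / n_rep)


rng = random.Random(2026)
print(simulate_counts(50, 0.4, 0.3, rng))                  # one group, one stratum
print(classify(0.063), classify(0.048), classify(0.037))   # liberal robust conservative
print(round(mc_standard_error(0.05, 10_000), 5))           # 0.00218
```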

Generally, the results (Table 3, Table 4 and Table 5 and Figure 1) indicate that the Wald-type test based on the log-linear hypothesis maintains satisfactory type I error rates across all simulation configurations. The score test performs well when the sample size is moderate to large ($m = 50$ or 100) but tends to be slightly liberal in the small-sample scenario ($m = 25$), suggesting that the asymptotic approximation is less accurate when few observations are available. Interestingly, the performance of the score test improves as the number of strata increases, likely because additional strata provide more information and stabilize the variance estimates.

In contrast, the likelihood ratio test exhibits pronounced liberal behavior in small samples. Small-sample bias in the maximum likelihood estimates and departures of the likelihood surface from its quadratic approximation can inflate the likelihood ratio test statistic, leading to higher-than-nominal rejection rates. In such settings, exact testing procedures may provide a more reliable alternative because they do not rely on large-sample approximations. However, exact methods can become computationally intensive in stratified correlated settings, and finite-sample exact inference under the proposed framework remains a topic for future study.

As expected, as the sample size increases, the type I error rates of all three tests converge toward the nominal level ($\alpha = 0.05$). Overall, these findings suggest that the Wald-type log-linear test is robust across a wide range of sample sizes and configurations, that the score test is reliable in moderate to large samples or with more strata, and that the likelihood ratio test can be unreliable in small samples, so its results should be interpreted with caution.

Then, we evaluate the empirical power of the three proposed test statistics under various parameter settings. Specifically, we use the same sample sizes and parameter configurations as in the type I error simulations. Table 6, Table 7 and Table 8 present the empirical power for the likelihood ratio test, Wald-type log-linear test, and score test across different scenarios.

Overall, the three tests exhibit similar power under the same parameter settings, with no substantial differences among them. As expected, the empirical power increases as the differences among the true $\theta$ values across strata become larger, reflecting the greater detectability of the alternative hypothesis. The power also improves with increasing sample size and, to a lesser extent, with the number of strata. In addition, following the confidence interval construction method described in [22], we conducted a limited set of supplementary simulations under selected parameter configurations to assess the empirical coverage probabilities of confidence intervals for the odds ratio. The observed coverage rates ranged from approximately 94% to 96%, close to the nominal 95% level, suggesting reasonable finite-sample interval performance. These findings further suggest that the relatively low power observed in some configurations is likely attributable to modest effect sizes and limited sample sizes rather than to substantial bias or instability of the estimator.

Considering both type I error control and empirical power, the Wald-type log-linear test and the score test are generally preferable for practical applications. They consistently maintain type I error close to the nominal level while achieving competitive power, making them reliable choices across a wide range of sample sizes and parameter configurations.

5. Real-World Examples

In this section, we analyze two real-world datasets to illustrate the application of the proposed test statistics.

The first example is a double-blind randomized clinical trial investigating the efficacy of cefaclor versus amoxicillin in treating acute otitis media with effusion (OME) in children [23]. A total of 31 children with bilateral tympanocentesis were randomly assigned to one of the two treatment groups, and each child received a 14-day course of the assigned antibiotic. Treatment outcomes were assessed by recording the number of cured ears at the end of the study. Stratification was based on age, and the resulting data structure is summarized in Table 9. A goodness-of-fit test supports the adequacy of the common $\rho$ model for these data [24], justifying the use of the proposed testing procedures.

Maximum likelihood estimates of the model parameters are reported in Table 10, while the values of the test statistics and corresponding p-values are presented in Table 11. All p-values exceed the nominal significance level $\alpha = 0.05$, indicating insufficient evidence to reject the null hypothesis of homogeneous odds ratios across strata for any of the proposed tests.

The second example comes from a multi-center, two-arm randomized trial involving 168 patients with diffuse scleroderma [25]. Participants were randomized to receive either native collagen or placebo, and treatment response was evaluated using the modified Rodnan skin score (MRSS). The MRSS assigns ordinal scores (0–3) to body parts to reflect disease severity. Following Tang et al. [26], the MRSS was dichotomized at the body-part level, with improvement defined as either a score of zero at follow-up or a decrease of at least two units from baseline. To account for disease duration, patients were stratified into early-phase (≤3 years) and late-phase (4–10 years) groups.

The corresponding data structure, parameter estimates, and test results are summarized in Table 12, Table 13 and Table 14. Similar to the first example, all p-values are greater than 0.05, providing no evidence against the null hypothesis of homogeneous odds ratios across strata.

These two examples demonstrate the practical applicability of the proposed methods to stratified bilateral or multi-center paired binary data. In both cases, the procedures can be readily implemented to compute the MLEs, evaluate the test statistics, and obtain asymptotic p-values, providing a useful framework for inference on the homogeneity of odds ratios in real-world clinical studies.

6. Conclusions

In this article, we consider the problem of testing homogeneity of odds ratios of two proportions under the “$\rho$ model” assumption in stratified bilateral designs. Three MLE-based test procedures (the likelihood ratio test, the Wald-type log-linear test, and the score test) are investigated. Classical algorithms, such as the Fisher scoring and Newton–Raphson methods, can be computationally demanding, particularly when there are many parameters. We simplify the algorithm and computational process to improve efficiency, making the methods more convenient for practical use.

Simulation studies indicate that the Wald-type log-linear test and the score test generally maintain acceptable type I error and exhibit reasonable power under the parameter configurations considered in this study. The likelihood ratio test shows adequate power but can yield inflated type I error for small sample sizes. In smaller samples, the Wald-type log-linear test appears more stable, whereas the score test tends to perform better in moderate to large samples. These results suggest that both tests are suitable for practical application, with the choice guided by sample size and study design.

Finally, we note that the $\rho$ model assumes equal intra-class correlation between groups within each stratum. While this assumption simplifies estimation and inference, it may be violated in practice, potentially affecting variance estimates and test statistics. Therefore, caution is warranted when interpreting results, particularly for small samples or in the presence of highly heterogeneous correlations. Moreover, the current study focuses on comparisons between two groups within each stratum. When more than two groups are involved, the problem naturally extends to a many-to-one or pairwise comparison setting, which requires additional methodological development. Extending the proposed methods to handle multiple groups via pairwise comparisons thus represents an important topic for future research. In addition, developing exact tests for small sample sizes remains a promising direction for future work, complementing the asymptotic methods studied here and addressing scenarios in which the current approaches may have limited accuracy.

Author Contributions

Conceptualization, X.S. and C.-X.M.; methodology, X.S. and C.-X.M.; software, X.S. and C.-X.M.; validation, X.S., X.Z. and C.-X.M.; formal analysis, X.S.; investigation, X.S., X.Z. and C.-X.M.; writing—original draft preparation, X.S.; writing—review and editing, X.S., X.Z. and C.-X.M.; visualization, X.S.; supervision, C.-X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available in references [23,25].

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Information Matrix and Formula Derivation

The second-order partial derivatives of the log-likelihood $l$ in the $j$th stratum with respect to $π_{ij}$ ($i = 1, 2$) and $ρ_{j}$ are

\begin{aligned}
\frac{\partial^{2} l}{\partial π_{ij}^{2}} ={}& -\frac{(2π_{ij}^{2} - 2π_{ij} + 1)\, m_{1ij}}{π_{ij}^{2}(π_{ij} - 1)^{2}} \\
&- \frac{(2ρ_{j}^{2}π_{ij}^{2} - 2ρ_{j}^{2}π_{ij} + ρ_{j}^{2} - 4ρ_{j}π_{ij}^{2} + 2ρ_{j}π_{ij} + 2π_{ij}^{2})\, m_{2ij}}{π_{ij}^{2}(ρ_{j} + π_{ij} - ρ_{j}π_{ij})^{2}} \\
&- \frac{(2ρ_{j}^{2}π_{ij}^{2} - 2ρ_{j}^{2}π_{ij} + ρ_{j}^{2} - 4ρ_{j}π_{ij}^{2} + 6ρ_{j}π_{ij} - 2ρ_{j} + 2π_{ij}^{2} - 4π_{ij} + 2)\, m_{0ij}}{(π_{ij} - 1)^{2}(ρ_{j}π_{ij} - π_{ij} + 1)^{2}}, \\
\frac{\partial^{2} l}{\partial π_{ij}\,\partial ρ_{j}} ={}& \frac{m_{0ij}}{(ρ_{j}π_{ij} - π_{ij} + 1)^{2}} - \frac{m_{2ij}}{(ρ_{j}π_{ij} - π_{ij} - ρ_{j})^{2}}, \\
\frac{\partial^{2} l}{\partial π_{ij}\,\partial π_{kj}} ={}& 0, \quad i \neq k, \\
\frac{\partial^{2} l}{\partial ρ_{j}^{2}} ={}& -\sum_{i=1}^{2}\left[\frac{m_{1ij}}{(ρ_{j} - 1)^{2}} + \frac{π_{ij}^{2}\, m_{0ij}}{(ρ_{j}π_{ij} - π_{ij} + 1)^{2}} + \frac{(π_{ij} - 1)^{2}\, m_{2ij}}{(ρ_{j} + π_{ij} - ρ_{j}π_{ij})^{2}}\right].
\end{aligned}
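The closed-form second derivatives above can be checked numerically. The following Python sketch compares the single-group terms of $\partial^{2} l/\partial π_{ij}^{2}$, $\partial^{2} l/\partial π_{ij}\partial ρ_{j}$ and $\partial^{2} l/\partial ρ_{j}^{2}$ with central finite differences of the log-likelihood. It assumes the ρ-model cell probabilities $p_{0ij} = (1 - π_{ij})(1 - π_{ij} + ρ_{j}π_{ij})$, $p_{1ij} = 2π_{ij}(1 - π_{ij})(1 - ρ_{j})$ and $p_{2ij} = π_{ij}(ρ_{j} + π_{ij} - ρ_{j}π_{ij})$; the function names, counts, and parameter values are arbitrary illustrations.

```python
import math


def loglik_group(m, pi, rho):
    """One group's log-likelihood term, assuming rho-model cell probabilities."""
    m0, m1, m2 = m
    p0 = (1 - pi) * (1 - pi + rho * pi)
    p1 = 2 * pi * (1 - pi) * (1 - rho)
    p2 = pi * (rho + pi - rho * pi)
    return m0 * math.log(p0) + m1 * math.log(p1) + m2 * math.log(p2)


def analytic_hessian(m, pi, r):
    """Second derivatives transcribed from Appendix A (one group's terms)."""
    m0, m1, m2 = m
    d_pp = (-(2 * pi**2 - 2 * pi + 1) * m1 / (pi**2 * (pi - 1) ** 2)
            - (2 * r**2 * pi**2 - 2 * r**2 * pi + r**2 - 4 * r * pi**2 + 2 * r * pi
               + 2 * pi**2) * m2 / (pi**2 * (r + pi - r * pi) ** 2)
            - (2 * r**2 * pi**2 - 2 * r**2 * pi + r**2 - 4 * r * pi**2 + 6 * r * pi
               - 2 * r + 2 * pi**2 - 4 * pi + 2) * m0
            / ((pi - 1) ** 2 * (r * pi - pi + 1) ** 2))
    d_pr = m0 / (r * pi - pi + 1) ** 2 - m2 / (r * pi - pi - r) ** 2
    d_rr = -(m1 / (r - 1) ** 2 + pi**2 * m0 / (r * pi - pi + 1) ** 2
             + (pi - 1) ** 2 * m2 / (r + pi - r * pi) ** 2)
    return d_pp, d_pr, d_rr


def numeric_hessian(m, pi, rho, h=1e-4):
    """Central finite-difference approximation of the same three derivatives."""
    def f(a, b):
        return loglik_group(m, a, b)
    d_pp = (f(pi + h, rho) - 2 * f(pi, rho) + f(pi - h, rho)) / h**2
    d_pr = (f(pi + h, rho + h) - f(pi + h, rho - h)
            - f(pi - h, rho + h) + f(pi - h, rho - h)) / (4 * h**2)
    d_rr = (f(pi, rho + h) - 2 * f(pi, rho) + f(pi, rho - h)) / h**2
    return d_pp, d_pr, d_rr


for m, pi, rho in [((8, 2, 8), 0.5, 0.7), ((11, 2, 2), 0.21, 0.4)]:
    print(analytic_hessian(m, pi, rho))
    print(numeric_hessian(m, pi, rho))
```

Agreement to several significant digits at arbitrary interior points is a quick way to catch transcription errors in long rational expressions such as these.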

Hence, for the $j$th stratum, the expected (Fisher) information matrix is

I_{j}(π_{1j}, π_{2j}, ρ_{j}) = \begin{bmatrix} I_{11(j)} & 0 & I_{13(j)} \\ 0 & I_{22(j)} & I_{23(j)} \\ I_{13(j)} & I_{23(j)} & I_{33(j)} \end{bmatrix},

where, using $E(m_{0ij}) = m_{\cdot ij}\, p_{0ij}$, $E(m_{1ij}) = m_{\cdot ij}\, p_{1ij}$ and $E(m_{2ij}) = m_{\cdot ij}\, p_{2ij}$ (notation of Table 1),

\begin{aligned}
I_{ii(j)} &= E\left(-\frac{\partial^{2} l}{\partial π_{ij}^{2}}\right) = \frac{m_{\cdot ij}\,(-4ρ_{j}^{2}π_{ij}^{2} + 4ρ_{j}^{2}π_{ij} - ρ_{j}^{2} + 6ρ_{j}π_{ij}^{2} - 6ρ_{j}π_{ij} + 2ρ_{j} - 2π_{ij}^{2} + 2π_{ij})}{π_{ij}(1 - π_{ij})(ρ_{j} + π_{ij} - ρ_{j}π_{ij})(ρ_{j}π_{ij} - π_{ij} + 1)}, \\
I_{i3(j)} &= E\left(-\frac{\partial^{2} l}{\partial π_{ij}\,\partial ρ_{j}}\right) = \frac{m_{\cdot ij}\,ρ_{j}(2π_{ij} - 1)}{(ρ_{j} + π_{ij} - ρ_{j}π_{ij})(ρ_{j}π_{ij} - π_{ij} + 1)}, \\
I_{33(j)} &= E\left(-\frac{\partial^{2} l}{\partial ρ_{j}^{2}}\right) = \sum_{i=1}^{2} \frac{m_{\cdot ij}\,π_{ij}(ρ_{j} + 1)(1 - π_{ij})}{(1 - ρ_{j})(ρ_{j} + π_{ij} - ρ_{j}π_{ij})(ρ_{j}π_{ij} - π_{ij} + 1)},
\end{aligned}

for $i = 1, 2$.
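These expected-information entries can be cross-checked against the general multinomial identity $I = m_{\cdot ij}\sum_{l=0}^{2} (\partial p_{lij}/\partial\boldsymbol{η})(\partial p_{lij}/\partial\boldsymbol{η})^{\top}/p_{lij}$ with $\boldsymbol{η} = (π_{ij}, ρ_{j})$. The Python sketch below does this per subject ($m_{\cdot ij} = 1$) for one group. It assumes the ρ-model cell probabilities $p_{0ij} = (1 - π_{ij})(1 - π_{ij} + ρ_{j}π_{ij})$, $p_{1ij} = 2π_{ij}(1 - π_{ij})(1 - ρ_{j})$ and $p_{2ij} = π_{ij}(ρ_{j} + π_{ij} - ρ_{j}π_{ij})$, whose derivatives are written out by hand; all function names are illustrative.

```python
def cell_probs_and_grads(pi, rho):
    """Assumed rho-model probabilities (p0, p1, p2) and their partial derivatives."""
    a = rho * pi - pi + 1
    b = rho + pi - rho * pi
    probs = ((1 - pi) * a, 2 * pi * (1 - pi) * (1 - rho), pi * b)
    d_pi = (-a - (1 - rho) * (1 - pi), 2 * (1 - rho) * (1 - 2 * pi), b + (1 - rho) * pi)
    d_rho = (pi * (1 - pi), -2 * pi * (1 - pi), pi * (1 - pi))
    return probs, d_pi, d_rho


def info_multinomial(pi, rho):
    """Per-subject information: sum over cells of (dp)(dp)^T / p."""
    probs, d_pi, d_rho = cell_probs_and_grads(pi, rho)
    i_pp = sum(u * u / p for u, p in zip(d_pi, probs))
    i_pr = sum(u * v / p for u, v, p in zip(d_pi, d_rho, probs))
    i_rr = sum(v * v / p for v, p in zip(d_rho, probs))
    return i_pp, i_pr, i_rr


def info_appendix(pi, rho):
    """I_ii(j), I_i3(j) and one summand of I_33(j) from Appendix A, with m_.ij = 1."""
    a = rho * pi - pi + 1
    b = rho + pi - rho * pi
    num = (-4 * rho**2 * pi**2 + 4 * rho**2 * pi - rho**2 + 6 * rho * pi**2
           - 6 * rho * pi + 2 * rho - 2 * pi**2 + 2 * pi)
    return (num / (pi * (1 - pi) * b * a),
            rho * (2 * pi - 1) / (b * a),
            pi * (rho + 1) * (1 - pi) / ((1 - rho) * b * a))


for pi, rho in [(0.2, 0.4), (0.3, 0.6), (0.8, 0.9)]:
    print(info_multinomial(pi, rho))
    print(info_appendix(pi, rho))
```

With $ρ_{j} = 0$ the first entry reduces to $2/\{π_{ij}(1 - π_{ij})\}$, the information from two independent Bernoulli organs, which gives a convenient special-case check.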

References

1. Rosner, B. Statistical methods in ophthalmology: An adjustment for the intraclass correlation between eyes. Biometrics 1982, 38, 105–114.
2. Dallal, G.E. Paired Bernoulli Trials. Biometrics 1988, 44, 253–257.
3. Donner, A. Statistical methods in ophthalmology: An adjusted chi-square approach. Biometrics 1989, 45, 605–611.
4. Tang, M.-L.; Ling, M.-H.; Tian, G.-L. Exact and approximate unconditional confidence intervals for proportion difference in the presence of incomplete data. Stat. Med. 2009, 28, 625–641.
5. Pei, Y.; Tang, M.-L.; Wong, W.-K.; Guo, J. Confidence intervals for correlated proportion differences from paired data in a two-arm randomised clinical trial. Stat. Methods Med. Res. 2012, 21, 167–187.
6. Peng, X.; Liu, C.; Liu, S.; Ma, C.-X. Asymptotic confidence interval construction for proportion ratio based on correlated paired data. J. Biopharm. Stat. 2019, 29, 1137–1152.
7. Zhao, H.; Wang, X.; Bian, J.; Chen, S.; Li, Z. Homogeneity Test of Response Rate Functions in Bilateral Correlated Data Under Dallal’s Model. Complexity 2023, 2023, 7691732.
8. Zou, G.Y.; Donner, A. Extension of modified Poisson regression model to prospective studies with correlated binary data. Stat. Methods Med. Res. 2013, 22, 661–670.
9. Westgate, P.M. A readily available improvement over method of moments for intra-cluster correlation estimation in the context of cluster randomized trials and fitting a GEE-type marginal model for binary outcomes. Clin. Trials 2019, 16, 41–51.
10. Li, F.; Tong, G. Sample size estimation for modified Poisson analysis of cluster randomized trials with a binary outcome. Stat. Methods Med. Res. 2021, 30, 1288–1305.
11. Kahan, B.C. Accounting for centre-effects in multicentre trials with a binary outcome—When, why, and how? BMC Med. Res. Methodol. 2014, 14, 20.
12. Kahan, B.C.; Morris, T.P. Assessing potential sources of clustering in individually randomised trials. BMC Med. Res. Methodol. 2013, 13, 58.
13. Edwards, A. The measure of association in a 2 × 2 table. J. R. Stat. Soc. Ser. A Gen. 1963, 126, 109–114.
14. Tang, N.-S.; Qiu, S.-F. Homogeneity test, sample size determination and interval construction of difference of two proportions in stratified bilateral-sample designs. J. Stat. Plan. Inference 2012, 142, 1242–1251.
15. Shen, X.; Ma, C.-X.; Yuen, K.C.; Tian, G.-L. Common risk difference test and interval estimation of risk difference for stratified bilateral correlated data. Stat. Methods Med. Res. 2019, 28, 2418–2438.
16. Pei, Y.B.; Tian, G.-L.; Tang, M.-L. Testing homogeneity of proportion ratios for stratified correlated bilateral data in two-arm randomized clinical trials. Stat. Med. 2014, 33, 4370–4386.
17. Zhuang, T.; Tian, G.-L.; Ma, C.-X. Confidence intervals for proportion ratios of stratified correlated bilateral data. J. Biopharm. Stat. 2019, 29, 203–225.
18. Walter, S. Choice of effect measure for epidemiological data. J. Clin. Epidemiol. 2000, 53, 931–939.
19. Ma, C.-X.; Liu, S. Testing equality of proportions for correlated binary data in ophthalmologic studies. J. Biopharm. Stat. 2017, 27, 611–619.
20. Wilks, S. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann. Math. Stat. 1938, 9, 60–62.
21. Tang, N.-S.; Tang, M.-L.; Qiu, S.-F. Testing the equality of proportions for correlated otolaryngologic data. Comput. Stat. Data Anal. 2008, 52, 2719–2729.
22. Hua, S.; Ma, C.-X. Common odds ratio test and interval estimation for stratified bilateral and unilateral data. Stat. Methods Med. Res. 2024, 33, 1559–1576.
23. Mandel, E.M.; Bluestone, C.D.; Rockette, H.E.; Blatter, M.M.; Reisinger, K.S.; Wucher, F.P.; Harper, J. Duration of effusion after antibiotic treatment for acute otitis media: Comparison of cefaclor and amoxicillin. Pediatr. Infect. Dis. 1982, 1, 310–316.
24. Liu, X.; Ma, C.-X. Goodness-of-Fit Tests for Correlated Bilateral Data from Multiple Groups; Springer: Berlin/Heidelberg, Germany, 2020; pp. 311–327.
25. Postlethwaite, A.E.; Wong, W.K.; Clements, P.; Chatterjee, S.; Fessler, B.J.; Kang, A.; Korn, J.; Mayes, M.; Merkel, P.; Molitor, J.; et al. A multicenter, randomized, double-blind, placebo-controlled trial of oral type I collagen treatment in patients with diffuse cutaneous systemic sclerosis: I. oral type I collagen does not improve skin in all patients, but may improve skin in late-phase disease. Arthritis Rheumatol. 2008, 58, 1810–1822.
26. Tang, M.-L.; Pei, Y.; Wong, W.-K.; Li, J. Goodness-of-fit tests for correlated paired binary data. Stat. Methods Med. Res. 2012, 21, 331–345.

Figure 1. Box plots of empirical type I error rates for the proposed tests under different numbers of strata $J$ and sample sizes $n$ (likelihood ratio test: $T_{L}$; Wald-type log-linear test: $T_{W}$; score test: $T_{SC}$).

Table 1. Data structure for the $j$th ($j = 1, \dots, J$) stratum in a stratified bilateral design.


Number of Responses (l)	Group (i)		Total
Number of Responses (l)	1	2	Total
0	$m_{01 j} (p_{01 j})$	$m_{02 j} (p_{02 j})$	$m_{0 \cdot j}$
1	$m_{11 j} (p_{11 j})$	$m_{12 j} (p_{12 j})$	$m_{1 \cdot j}$
2	$m_{21 j} (p_{21 j})$	$m_{22 j} (p_{22 j})$	$m_{2 \cdot j}$
Total	$m_{\cdot 1 j}$ (1.0)	$m_{\cdot 2 j}$ (1.0)	$m_{\cdot \cdot j}$

Table 2. Parameter setups for computing empirical type I error rates and powers.

	Cases	Number of Strata
	Cases	$J = 2$	$J = 4$	$J = 8$
$ρ$	I	$(0.4, 0.4)$	$(0.4, 0.4, 0.4, 0.4)$	$(0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4)$
$ρ$	II	$(0.4, 0.6)$	$(0.4, 0.6, 0.4, 0.6)$	$(0.4, 0.6, 0.4, 0.6, 0.4, 0.6, 0.4, 0.6)$
$π_{1}$	a	$(0.2, 0.4)$	$(0.2, 0.4, 0.2, 0.4)$	$(0.2, 0.4, 0.2, 0.4, 0.2, 0.4, 0.2, 0.4)$
	b	$(0.3, 0.3)$	$(0.3, 0.3, 0.3, 0.3)$	$(0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3)$
	c	$(0.3, 0.5)$	$(0.3, 0.5, 0.3, 0.5)$	$(0.3, 0.5, 0.3, 0.5, 0.3, 0.5, 0.3, 0.5)$
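Data for configurations such as those in Table 2 can be generated stratum by stratum. The Python sketch below draws one stratum of bilateral counts. It assumes the ρ-model cell probabilities $p_{0ij} = (1 - π_{ij})(1 - π_{ij} + ρ_{j}π_{ij})$, $p_{1ij} = 2π_{ij}(1 - π_{ij})(1 - ρ_{j})$ and $p_{2ij} = π_{ij}(ρ_{j} + π_{ij} - ρ_{j}π_{ij})$, treats $m$ as the number of subjects per group in each stratum, and takes θ as the odds ratio of Group 2 relative to Group 1, so that $π_{2j} = θ_{j}π_{1j}/(1 - π_{1j} + θ_{j}π_{1j})$. These are assumptions for illustration, not the authors' simulation code, and the function names are hypothetical.

```python
import random


def cell_probs(pi, rho):
    """Assumed rho-model probabilities of 0, 1 and 2 responses per subject."""
    return ((1 - pi) * (1 - pi + rho * pi),
            2 * pi * (1 - pi) * (1 - rho),
            pi * (rho + pi - rho * pi))


def second_group_pi(pi1, theta):
    """pi_2 implied by theta = odds(pi_2) / odds(pi_1) (assumed direction)."""
    return theta * pi1 / (1 - pi1 + theta * pi1)


def simulate_stratum(m, pi1, theta, rho, rng):
    """Draw (m0, m1, m2) counts for m subjects in each of the two groups."""
    table = []
    for pi in (pi1, second_group_pi(pi1, theta)):
        responses = rng.choices((0, 1, 2), weights=cell_probs(pi, rho), k=m)
        table.append(tuple(responses.count(k) for k in (0, 1, 2)))
    return table


# Case II (rho) with setting a (pi_1), J = 2 strata, common theta = 1.5.
rng = random.Random(2026)
data = [simulate_stratum(25, p1, 1.5, r, rng)
        for p1, r in zip((0.2, 0.4), (0.4, 0.6))]
print(data)
```

Repeating such draws many times, computing $T_{L}$, $T_{W}$ and $T_{SC}$ for each replicate, and recording the rejection rate at the nominal level gives empirical sizes (common θ) or powers (θ varying across strata, as in Tables 6–8).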

Table 3. Part of the simulation results of the empirical sizes (%) for 2 strata.

$θ$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$θ$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
1	I	a	6.81	5.09	6.95	5.51	4.88	5.28	5.37	5.21	5.27
		b	7.18	5.24	6.81	5.61	5.13	5.46	5.43	5.29	5.30
		c	6.74	5.40	6.96	5.62	5.26	5.73	5.36	5.26	5.31
	II	a	6.44	4.54	6.51	6.08	5.32	5.84	5.16	5.00	5.09
		b	7.58	5.32	7.09	5.60	5.05	5.43	4.73	4.64	4.70
		c	7.32	5.70	7.34	5.73	5.18	5.62	5.20	5.11	5.14
1.5	I	a	5.11	4.78	4.96	5.04	4.95	4.99	5.16	5.08	5.09
		b	5.39	5.13	5.23	5.42	5.32	5.27	4.94	4.91	4.88
		c	5.63	5.46	5.53	4.94	4.90	4.88	5.07	5.05	5.00
	II	a	5.58	5.05	5.28	5.48	5.32	5.38	5.00	4.99	4.99
		b	5.49	5.21	5.34	5.27	5.23	5.21	5.15	5.09	5.05
		c	5.85	5.59	5.65	5.76	5.66	5.64	5.27	5.24	5.23
2	I	a	5.48	5.21	5.19	5.46	5.37	5.33	4.95	4.89	4.85
		b	5.79	5.60	5.55	5.24	5.20	5.16	5.33	5.31	5.21
		c	5.49	5.31	5.19	5.21	5.18	5.13	4.94	4.94	4.87
	II	a	5.36	5.00	5.03	5.27	5.18	5.17	5.21	5.12	5.10
		b	5.23	5.05	5.05	5.44	5.40	5.32	5.12	5.07	5.00
		c	5.37	5.19	5.15	5.25	5.21	5.08	4.92	4.83	4.73

Note: Values in bold represent empirical type I error rates exceeding 6% or falling below 4%.

Table 4. Part of the simulation results of the empirical sizes (%) for 4 strata.

$θ$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$θ$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
1	I	a	5.67	4.66	5.52	5.26	4.87	5.07	5.08	4.95	4.96
		b	6.06	5.11	5.83	5.74	5.44	5.53	5.10	4.99	5.01
		c	5.97	5.36	5.88	5.29	5.15	5.17	5.02	4.97	4.97
	II	a	5.29	4.11	5.28	5.83	5.22	5.53	5.12	4.82	5.01
		b	6.41	5.36	6.11	5.10	4.73	4.86	4.95	4.87	4.89
		c	6.04	5.24	6.09	5.63	5.43	5.57	5.28	5.20	5.22
1.5	I	a	5.56	4.85	5.08	5.46	5.06	5.25	4.98	4.83	4.85
		b	5.33	4.95	4.95	5.72	5.48	5.44	5.22	5.11	5.10
		c	5.74	5.30	5.33	5.41	5.23	5.15	4.76	4.71	4.66
	II	a	5.77	4.92	5.29	5.30	4.88	5.03	5.59	5.36	5.44
		b	5.05	4.57	4.69	5.36	5.16	5.20	5.28	5.17	5.15
		c	5.51	5.10	5.15	5.54	5.24	5.26	5.08	4.96	4.91
2	I	a	5.79	5.05	5.23	5.35	5.10	5.11	5.16	4.96	4.93
		b	5.35	4.95	4.95	4.94	4.76	4.70	5.35	5.29	5.22
		c	5.17	4.73	4.65	5.01	4.91	4.79	5.00	4.94	4.84
	II	a	5.61	4.92	5.20	5.47	5.13	5.15	5.03	4.87	4.84
		b	5.68	5.24	5.28	5.03	4.82	4.76	4.97	4.89	4.86
		c	5.39	4.94	4.92	5.33	5.18	5.12	5.04	5.00	4.86

Table 5. Part of the simulation results of the empirical sizes (%) for 8 strata.

$θ$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$θ$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
1	I	a	5.59	4.33	4.97	5.55	4.88	5.20	5.33	4.96	5.14
		b	5.99	5.09	5.42	5.56	5.10	5.26	5.04	4.78	4.87
		c	5.99	5.30	5.42	5.83	5.43	5.43	5.23	5.05	5.05
	II	a	5.72	4.21	5.05	5.53	4.71	5.06	5.17	4.88	4.99
		b	5.72	4.46	5.07	5.64	5.02	5.31	5.31	5.11	5.19
		c	5.70	4.69	5.00	5.77	5.43	5.50	5.43	5.22	5.25
1.5	I	a	6.00	4.86	5.21	5.60	5.06	5.18	5.34	5.13	5.14
		b	5.71	4.83	4.99	5.06	4.82	4.80	4.77	4.59	4.58
		c	5.71	5.00	5.07	5.22	4.93	4.87	4.96	4.83	4.76
	II	a	5.35	4.20	4.67	5.61	5.00	5.26	4.94	4.65	4.74
		b	6.32	5.22	5.57	5.35	4.85	4.96	5.63	5.41	5.41
		c	6.64	5.84	5.92	5.50	5.15	5.15	5.53	5.37	5.34
2	I	a	5.97	5.08	5.19	5.38	4.84	4.89	5.45	5.19	5.17
		b	6.48	5.65	5.60	5.58	5.23	5.08	5.24	5.11	4.90
		c	5.62	5.10	5.02	5.49	5.22	5.11	5.35	5.22	5.09
	II	a	5.73	4.58	4.95	5.43	4.84	4.92	5.52	5.24	5.20
		b	5.65	4.76	4.85	5.51	5.07	5.07	5.10	4.93	4.80
		c	5.96	5.13	5.26	5.09	4.76	4.74	5.12	4.93	4.84

Table 6. Part of the simulation results of the empirical powers for 2 strata (where $H_{1A}: θ = (1, 2)$, $H_{1B}: θ = (1, 4)$).


$H_{1}$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$H_{1}$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
$H_{1 A}$	I	a	0.158	0.155	0.159	0.256	0.255	0.257	0.456	0.456	0.457
		b	0.169	0.166	0.168	0.275	0.274	0.273	0.497	0.496	0.496
		c	0.171	0.167	0.168	0.285	0.284	0.283	0.504	0.504	0.503
	II	a	0.155	0.150	0.154	0.248	0.246	0.248	0.432	0.432	0.433
		b	0.163	0.156	0.159	0.270	0.269	0.269	0.474	0.472	0.472
		c	0.172	0.168	0.169	0.279	0.278	0.277	0.471	0.469	0.469
$H_{1 B}$	I	a	0.315	0.313	0.317	0.530	0.531	0.532	0.821	0.823	0.823
		b	0.358	0.356	0.355	0.594	0.593	0.592	0.874	0.874	0.873
		c	0.337	0.333	0.333	0.591	0.589	0.588	0.869	0.868	0.868
	II	a	0.294	0.290	0.294	0.507	0.507	0.507	0.798	0.799	0.798
		b	0.342	0.337	0.337	0.567	0.565	0.564	0.847	0.846	0.845
		c	0.331	0.325	0.326	0.565	0.562	0.562	0.840	0.839	0.838

Table 7. Part of the simulation results of the empirical powers for 4 strata (where $H_{1A}: θ = (1, 1.5, 1.5, 2)$, $H_{1B}: θ = (1, 2, 3, 4)$).


$H_{1}$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$H_{1}$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
$H_{1 A}$	I	a	0.123	0.108	0.115	0.200	0.193	0.196	0.363	0.360	0.361
		b	0.136	0.127	0.128	0.215	0.210	0.210	0.381	0.378	0.378
		c	0.129	0.123	0.122	0.218	0.212	0.212	0.395	0.392	0.391
	II	a	0.122	0.109	0.116	0.190	0.183	0.186	0.331	0.328	0.329
		b	0.124	0.115	0.117	0.203	0.197	0.198	0.357	0.352	0.353
		c	0.124	0.114	0.116	0.200	0.195	0.196	0.364	0.361	0.361
$H_{1 B}$	I	a	0.340	0.327	0.338	0.607	0.604	0.607	0.906	0.907	0.907
		b	0.390	0.380	0.380	0.691	0.690	0.688	0.948	0.948	0.948
		c	0.383	0.369	0.373	0.674	0.669	0.668	0.942	0.942	0.941
	II	a	0.324	0.305	0.318	0.591	0.585	0.591	0.886	0.886	0.887
		b	0.380	0.370	0.372	0.660	0.656	0.655	0.940	0.939	0.939
		c	0.365	0.349	0.354	0.646	0.640	0.640	0.932	0.931	0.931

Table 8. Part of the simulation results of the empirical powers for 8 strata (where $H_{1A}: θ = (1, 1.5, 1.5, 2, 1, 1.5, 1.5, 2)$, $H_{1B}: θ = (1, 2, 3, 4, 1, 2, 3, 4)$).


$H_{1}$	$ρ$	$π_{1}$	$m = 25$			$m = 50$			$m = 100$
$H_{1}$	$ρ$	$π_{1}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$	$T_{L}$	$T_{W}$	$T_{SC}$
$H_{1 A}$	I	a	0.158	0.137	0.145	0.270	0.257	0.262	0.512	0.508	0.509
		b	0.152	0.137	0.139	0.277	0.268	0.268	0.532	0.526	0.526
		c	0.161	0.149	0.149	0.285	0.277	0.277	0.550	0.546	0.545
	II	a	0.137	0.116	0.126	0.249	0.237	0.241	0.478	0.471	0.474
		b	0.160	0.138	0.145	0.264	0.250	0.252	0.512	0.506	0.507
		c	0.158	0.143	0.146	0.269	0.260	0.262	0.508	0.501	0.501
$H_{1 B}$	I	a	0.476	0.451	0.467	0.814	0.810	0.813	0.988	0.989	0.989
		b	0.561	0.545	0.546	0.874	0.873	0.871	0.997	0.996	0.996
		c	0.535	0.513	0.518	0.862	0.857	0.857	0.996	0.996	0.996
	II	a	0.441	0.411	0.429	0.788	0.782	0.787	0.985	0.985	0.985
		b	0.532	0.514	0.518	0.861	0.858	0.858	0.996	0.996	0.996
		c	0.511	0.486	0.493	0.850	0.844	0.844	0.994	0.994	0.994

Table 9. Number of OME-free ears after treatment across different strata (Group 1: Cefaclor; Group 2: Amoxicillin).

Age Groups	Age < 2 yrs		Age 2–5 yrs		Age ≥ 6 yrs
Number of Responses	1	2	1	2	1	2
0	8	11	6	3	0	1
1	2	2	6	1	1	0
2	8	2	10	5	3	6
Total	18	15	22	9	4	7

Table 10. MLEs of parameters for the acute otitis media study.

Age Groups	Unconstrained MLE			Constrained MLE
Age Groups	$\hat{ρ}$	${\hat{π}}_{1}$	$\hat{θ}$	${\hat{ρ}}_{H_{0}}$	${\hat{π}}_{1 H_{0}}$	${\hat{θ}}_{H_{0}}$
Age < 2 yrs	0.711	0.500	0.265	0.731	0.364	0.740
Age 2–5 yrs	0.531	0.588	1.145	0.532	0.597	0.740
Age ≥ 6 yrs	0.615	0.834	1.516	0.614	0.864	0.740

Table 11. Statistic values and p-values of different test statistics for the acute otitis media study.

	$T_{L}$	$T_{W}$	$T_{SC}$
Statistic	3.002	2.444	2.362
p-value	0.223	0.294	0.307

Table 12. Number of scleroderma patients with hand MRSS decreased across different strata (Group 1: Collagen; Group 2: Placebo).

Disease Duration	Early-Phase		Late-Phase
Number of Responses	1	2	1	2
0	20	23	9	22
1	2	3	3	2
2	5	4	3	2
Total	27	30	15	26

Table 13. MLEs of parameters for the diffuse scleroderma trial.

Disease Duration	Unconstrained MLE			Constrained MLE
Disease Duration	$\hat{ρ}$	${\hat{π}}_{1}$	$\hat{θ}$	${\hat{ρ}}_{H_{0}}$	${\hat{π}}_{1 H_{0}}$	${\hat{θ}}_{H_{0}}$
Early phase	0.727	0.218	0.833	0.727	0.227	0.639
Late phase	0.569	0.303	0.292	0.583	0.210	0.639

Table 14. Statistic values and p-values of different test statistics for the diffuse scleroderma trial.

	$T_{L}$	$T_{W}$	$T_{SC}$
Statistic	1.521	1.236	1.155
p-value	0.218	0.266	0.283
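The p-values in Tables 11 and 14 are consistent with referring each statistic to a chi-square distribution with $J - 1$ degrees of freedom (2 for the three age strata, 1 for the two disease-duration strata), matching the $J - 1$ equality constraints $θ_{1} = \cdots = θ_{J}$ under the null hypothesis. The dependency-free Python sketch below computes the chi-square upper-tail probability for small integer degrees of freedom using the standard closed forms for even and odd degrees of freedom; the function name `chi2_sf` is illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r-1) / (1 * 3 * ... * (2r - 1))
    term, total = 1.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


# Statistics (T_L, T_W, T_SC) from Tables 11 and 14, with df = J - 1.
for label, stats, df in [("Table 11", (3.002, 2.444, 2.362), 2),
                         ("Table 14", (1.521, 1.236, 1.155), 1)]:
    print(label, [round(chi2_sf(t, df), 3) for t in stats])
```

The computed values agree with the reported p-values to within about 0.001; the small residual differences reflect rounding of the reported statistics.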

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shen, X.; Zhang, X.; Ma, C.-X. Testing Homogeneity of Odds Ratio for Stratified Bilateral Correlated Data. Axioms 2026, 15, 155. https://doi.org/10.3390/axioms15020155


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

