Exact Inference for Random Effects Meta-Analyses for Small, Sparse Data
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The article discusses the development of a new exact inference method, "XRRmeta," aimed at meta-analyses with random effects, particularly useful for small and sparse datasets, common in drug safety studies. This work is highly relevant because these conditions (rare events, few studies, and heterogeneity) make it difficult to apply traditional methods, which tend to require larger volumes of data to achieve statistical precision. Thus, XRRmeta presents itself as a solution for analyses in scenarios where classical methods fail to adequately control type I error, ensuring robust confidence intervals.
The main innovation of the work is the creation of XRRmeta, a method that offers exact confidence intervals and guaranteed coverage, even in scenarios with rare events and high variability among studies. Unlike conventional methods, which rely on asymptotic assumptions and often exclude studies without events, XRRmeta uses conditional inference that justifies the exclusion of these studies without compromising the validity of the conclusions. Furthermore, the method is implemented in an open-source package in R, facilitating its practical application.
The described methodology is well aligned with the reported results: the article provides a detailed explanation of how XRRmeta works, its theoretical foundations, and the method's performance in different simulated scenarios, as well as in real-case studies (rosiglitazone and face masks). The results confirm that XRRmeta effectively controls type I error without being overly conservative, demonstrating the method's suitability for the proposed objectives.
The article includes a range of relevant references, particularly highlighting recent articles that address meta-analysis in rare events and methodologies for controlling heterogeneity. Some references are older, such as widely accepted norms and techniques in the field, but there is a balanced use of current literature that emphasizes the study's relevance and current challenges in meta-analysis.
The text structure is coherent and organized. The article follows a logical order: it introduces the problem, presents the methodology and theories behind XRRmeta, followed by an analysis of the results obtained from simulations and real data, and concludes with discussions and practical implications. Each section provides clear information that guides the reader to understand the results and the importance of the proposed method.
Conclusion
The conclusion is correct and well-founded. It reinforces the importance of XRRmeta as a robust and exact alternative for meta-analyses with rare events and high heterogeneity. Moreover, the article emphasizes the limitations of XRRmeta and suggests that it is particularly useful when controlling type I error is a priority, remaining conservative without being excessively restrictive.
The references are compatible with the scope of the work, including studies on exact inference, methodologies for analyzing rare events, and literature reviews on meta-analysis. The authors also cite important standards, such as the Cochrane Handbook, which add credibility to the application of the method in clinical intervention reviews and reinforce the article's alignment with established practices in the field.
Comments on the Quality of English Language
Author Response
Thank you very much for your comments and for your careful review of our paper. We appreciate the encouragement and have made revisions throughout the text to further improve the clarity of the main text. All changes to the manuscript are marked in blue so that they can be easily identified. We have paid special attention to editing the grammar and flow of the manuscript to improve the quality of the language.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper needs a major revision based on the following section-by-section comments.
Title:
1. Minor Typos: The title contains an extra space in "Meta-Analyses with Small, Sparse Data."
2. Clarity of Purpose: The title could specify the methodological contribution ("Exact Inference" for "random effects meta-analysis" under conditions of "small and sparse data") for better precision.
Abstract:
1. Line Numbering in Text: The abstract contains line numbers (e.g., "1 statistical inference," "2 pharmaceutical products") that interfere with readability. These should be removed.
2. Overuse of Jargon: Terms like "exact inference," "random effects framework," and "nominal level" may benefit from simplified explanations, especially for broader accessibility.
3. Missing Citation Formatting: Specific references are cited in the introduction but appear without author-year citation formatting, making it harder to trace sources.
4. Unclear Methodology Naming: The term “XRRmeta” is introduced as the name for the method without context on why it was named as such, which could be clarified for better coherence.
Introduction:
1. Inconsistent Citation Style: Citations like "[1–3]" are used without full names or context, which can confuse readers about the sources referenced.
2. Line Numbers Throughout: Line numbers (e.g., "18," "20," etc.) are embedded in the text, disrupting the flow.
3. Example Clarity: While the introduction provides an example (rosiglitazone), additional context around the selection of this example would improve relevance.
4. Ambiguous Transition to New Method: The transition to introducing the new "XRRmeta" method could be clearer, as it currently feels abrupt without connecting why current methods are insufficient.
5. Redundant Phrasing: Some sentences reiterate points already made, such as the rarity of events affecting analysis reliability. Removing redundancy could make the introduction more concise.
Section 2.
2. Problem Setup
1. Incomplete Notation Explanation: The setup mentions variables like \( Y_{ij} \), \( N_{ij} \), and \( K_{\text{tot}} \) without immediately clarifying each symbol’s meaning. While later explained, this could cause initial confusion.
2. Equation Alignment: Some equations, such as \( Y_{ij} \sim \text{Poisson}(N_{ij}\lambda_{ij}) \), could be aligned or displayed more clearly, especially in settings with complex subscripted terms.
3. Abbreviations without Definitions: Terms like "DZ" (double-zero) appear without an initial definition, making it difficult to follow for first-time readers.
2.1 Notations and Assumptions
1. Assumption Clarity: The assumption that \( Y_{ij} \) follows a Poisson distribution with rate parameter \( \lambda_{ij} \) could benefit from further explanation of why the log-linear model was chosen for \( \lambda_{ij} \).
2. Binary Treatment Indicator: \( X_{ij} = I(j = 1) \) is introduced as a binary treatment indicator, but the purpose and implications of this variable could be clarified.
3. Redundant Language: The phrase "without loss of generality" is used, which can be omitted without affecting the meaning since it does not add significant value in this context.
4. Typographical Errors in Notation: There is inconsistent spacing in notations like "Poisson(Nijλij)" and "ind∼," which might indicate formatting issues.
5. Non-Standard Symbols for Events: \( Y_{i\cdot} = Y_{i1} + Y_{i2} \) is used without explicitly defining the dot notation, which might be confusing for some readers.
2.2 Parameter of Interest
1. Ambiguous Explanation of Treatment Contrast: The parameter \( \pi_i \) as "the magnitude of the event rate in the treated group relative to the cumulative event rate" could be made clearer by specifying how it serves in the model and its practical significance.
2. Parameter Interpretation: The interpretation of \( \pi_i > 0.5 \) could include more detail on what it implies for the treated vs. control groups.
3. Inconsistent Subscript and Formatting: Notations like \( \alpha_0 \) and \( \beta_0 \) are defined but may benefit from standardized subscript formatting for better readability.
4. Bimodal Behavior Explanation: When explaining the behavior of the \( \pi_i \sim \text{Beta}(\alpha_0, \beta_0) \) distribution, more background on the context of “bimodal prior distributions” would clarify why this behavior is an issue.
5. Typographical Errors: Minor formatting issues, such as "Yij∼" (missing space), occur in several equations.
Remark 1
1. Lack of Transition: The transition from discussing \( 2 \times 2 \) tables to incidence rates is abrupt, without a linking sentence that explains why this additional context is being introduced.
2. Formatting Issues in Symbol Representation: Phrases like "{(Yij, Tij)" and "P(Yij | Y · ) ∼" have inconsistent spacing and formatting errors.
3. Terminology Clarity: Phrasing like “analogous assumptions” lacks clarity on what assumptions specifically apply to incidence rate data in this context.
4. Structural Flow: Ending Remark 1 with a general statement without tying it back to the main content disrupts the section's flow.
Section 3
3.1 Exact Inference Procedure
1. Terminology Consistency: The use of “CI” for confidence interval should be clarified at first use for consistency.
2. Unclear Hypothesis Definition: The null hypothesis H_0 : \mu_0 = \mu could be better explained to indicate why this specific hypothesis is relevant for the exact inference method.
3. Ambiguity in Probability Notation: The notation P\{ T(\mu; D_{\mu,\nu}) \geq T(\mu; D_0) | D_0 \} may be confusing for some readers as it introduces complex probabilistic concepts without much context.
4. Unexplained Acronyms: The acronym "MC" is used for Monte Carlo but could be explicitly defined upon first use.
5. Complex Language in Step Descriptions: Steps 1a and 1b in the Monte Carlo simulation explanation contain dense technical language (e.g., "calculate p(\mu, \nu; D_0) with M^{-1}..."), which may benefit from simplification or brief clarifications.
Figure 1
1. Incomplete Figure Label: Figure 1 lacks a clear title or description of what it represents. Adding a brief caption explaining the two steps in XRRmeta would be helpful.
2. Inconsistent Formatting in Steps: Formatting in the two-step procedure varies, with inconsistent use of punctuation and font size, which may disrupt readability.
3. Lack of Connection with Main Text: Figure 1 could be better referenced within the main text to clarify how the figure ties into the computational steps in XRRmeta.
4. Typographical Errors: Minor formatting issues like “grid search” and “projection” are inconsistently capitalized, which affects professionalism.
3.2 Test Statistic
1. Inconsistent Notation: Notations for estimators (e.g., \( \hat{\mu} \) vs. \( \mu \)) lack clarity, with inconsistent subscripts and accents that could make it hard for readers to follow.
2. Unclear Assumptions on Moments Estimators: The method of moments estimators used are introduced without clearly stating why they were chosen over other estimators, such as maximum likelihood estimators (MLE).
3. Abrupt Introduction of Terms: Terms like "Wald statistic" and "inverse variance estimator" appear suddenly and could benefit from brief definitions or explanations.
4. Confusing Test Statistic Notation: In the expression for the Wald statistic, \( T(\mu; D_0) = (\hat{\mu} - \mu)^2 / \widehat{\text{Var}}(\hat{\mu}) \), the notation \( \widehat{\text{Var}}(\hat{\mu}) \) could be misinterpreted without clear contextual explanation.
3.2.1 Balanced Design
1. Unclear Explanation of "Balanced": The term “balanced design” could be explained more explicitly, especially regarding why equal sample sizes are assumed for treatment and control groups.
2. Equation Formatting: Equations for the first two moments E(Y_{i1} / Y_{i\cdot}) = \mu_0 and E((Y_{i1} / Y_{i\cdot})^2) could be better formatted or numbered for ease of reference.
3. Complexity in Moment Estimation: The explanation of moment estimators using continuity correction may be overly complex. Simplifying or breaking it down into smaller steps would improve comprehension.
4. Typographical Inconsistencies: The estimator symbols, such as \( \tilde{Y}_{i1} \) and \( \tilde{Y}_{i\cdot} \), are inconsistently formatted, making it difficult to track variables throughout the section.
5. Redundant Information: There is some redundancy in describing the continuity correction, as it is already stated in the previous section.
3.2.2 Unbalanced Design
1. Inconsistent Notation: The section introduces unbalanced design adjustments without consistent notation, especially with terms like \( Y_{i1} \), \( Y_{i\cdot} \), and \( \tilde{Y}_{i\cdot} \).
2. Abrupt Explanation of Resampling: The resampling procedure is introduced without sufficient context, making it hard to understand the rationale behind resampling as it applies to mimicking a balanced design.
3. Terminology Clarification: Terms like "hypergeometric distribution" and “enumerated data” could be better explained to ensure clarity for readers not familiar with these concepts.
4. Formatting of Example Data: The example data given (e.g., \( D^*_1 = \{(Y^*_{11l}, N_{12}), (Y_{12}, N_{12}) \mid l = \ldots \} \)) is densely presented, and adding explanations for each variable would improve readability.
5. Typographical Errors in Notation: Inconsistent spacing in terms like Y_{i1} / Y_{i\cdot} and superscripts (e.g., "int") may hinder clarity.
3.3 Computational Details
1. Ambiguous Complexity Description: The section references computational complexity reductions without detailing how or why certain operations achieve these reductions.
2. Unclear Explanation of Stochastic Dominance: The explanation for how stochastic dominance aids in reducing execution time could be expanded for clarity.
3. Typographical Errors in Probability Notation: Probability notation, such as P(T(\mu; D_{\mu, \nu}) > t) , could be consistently formatted to avoid confusion.
4. Ambiguous Instructions on Grid Search Parameters: The description of grid size s and Monte Carlo repetitions M lacks specific examples to clarify recommended values and their effects.
5. Inconsistent Formatting: Terms like “correction step” and “boundary of the parameter space” are inconsistently capitalized, and inconsistent spacing around symbols disrupts the flow.
6. Redundant Terminology: The mention of “reduced parameter space” is repeated multiple times with slightly different phrasing, which could be streamlined for brevity.
Section 4.
4.1 Simulation Studies
1. Clarity on Simulation Parameters: The description of simulation parameters is dense, and specific explanations for why parameters like \( r_0 \), \( K_{\text{tot}} \), and Beta distribution settings were chosen could help readers understand the relevance of each choice.
2. Inconsistent Notation of Parameters: Symbols, such as \( K_{\text{tot}} \) and \( K \), are used without consistent definitions, which can confuse the reader regarding their respective roles.
3. Ambiguous Reporting of Type I Error: Terms like "Type I error inflation" and "conservativeness" could be better explained to clarify their impact on interpreting statistical significance in small sample settings.
4. Typographical Issues: Minor typographical inconsistencies, such as the inconsistent use of spaces in equations (e.g., "MH, MH_cc") and around parameters (e.g., "r0=0.01"), disrupt readability.
5. Repetitive Language: Phrasing is repetitive regarding the limitations of classical methods in high heterogeneity settings, which could be streamlined to maintain focus.
6. Visual Reference to Figures: The section references Figure 2 but lacks an introductory statement on what the figure represents, which could enhance readability.
7. Missing Statistical Test Details: While several methods are compared, the section does not explain the choice of these methods or the criteria for comparison (e.g., coverage levels, efficiency).
4.2 Real Data Examples
1. Lack of Rationale for Chosen Studies: The section does not provide a clear rationale for choosing the rosiglitazone and face mask studies as case studies, which could help contextualize their relevance.
2. Complexity in Data Presentation: Presenting results from several statistical methods at once, without clearly organizing each method’s purpose and differences, makes the data hard to interpret.
3. Missing Connection to Simulation Results: While the simulations aimed to demonstrate method effectiveness, the lack of connection between simulation outcomes and real data applications limits the relevance of the findings.
4. Formatting of Method Names: Method names, such as "MH-CC" and "Peto-R," are presented in shorthand without explanation, which may confuse readers unfamiliar with these abbreviations.
4.2.1 Rosiglitazone Study
1. Inconsistent Reference to Previous Studies: References to the rosiglitazone study’s initial findings could benefit from expanded citations for improved traceability of the original results.
2. Unclear Statistical Results Comparison: When comparing the results from XRRmeta with other methods, the differences in confidence intervals (CIs) and p-values could be better explained to make it clear why XRRmeta is preferable.
3. Typographical Errors in CIs: In Table 2, inconsistent formatting of CIs, such as "[1.03, 1.98]" and "[0.91,1.67]," creates a disjointed look.
4. Unexplained Acronyms: Terms like "CVD" and "DZ" appear without a prior explanation, which can lead to confusion for readers not familiar with the context.
5. Complexity in Method Justification: Statements like "our procedure is justified in its exclusion of DZ studies" could be clarified, as it’s not immediately clear why XRRmeta excludes these studies.
4.2.2 Face Mask Study
1. Abrupt Transition Between Methods: The comparison of results between methods is presented in a rushed manner, making it difficult for readers to understand each method’s unique contribution to the analysis.
2. Incomplete Explanation of Methodological Significance: The significance of the face mask study's results in demonstrating XRRmeta’s advantages could be expanded to better explain why XRRmeta is suitable for similar studies with rare events.
3. Typographical Consistency: In Table 3, abbreviations such as "MH-CC" and "Peto-R" are inconsistently spaced, which affects professionalism.
4. Incomplete Data Context: Background on the transmission rates and settings for SARS-CoV-2 could help readers understand the importance of this meta-analysis and its conclusions.
5. Repetitive Language: There is some redundancy in describing the protective effects of face masks, which could be simplified to improve focus on the statistical results.
5. Discussion
1. Missing Summary of Key Findings: The discussion lacks a concise summary of key findings from the simulations and real data examples, which would help reinforce the relevance of the XRRmeta method.
2. Unclear Implications for Practice: Statements such as "XRRmeta was the only method that uniformly maintained type I error" could benefit from a discussion on the practical implications for researchers conducting meta-analyses on rare events.
3. Redundant Mention of Cochrane Handbook: The reference to the Cochrane Handbook appears multiple times with slight variations, which could be streamlined for brevity.
4. Typographical Errors in Citations: Inconsistent citation formatting (e.g., “[9,34]” without proper spacing) disrupts readability.
5. Abrupt End to Discussion: The discussion concludes without suggestions for future research directions, limiting the impact of the findings. Including proposed improvements or further validation studies would add value.
6. Technical Language: Statements like “the primary complication in implementing XRRmeta is that the cumulative distribution function of \( T(\mu;D_{\mu, \nu}) \)” could be simplified or clarified for broader accessibility.
Comments on the Quality of English Language
The quality of English in the manuscript is generally understandable but could benefit from improvements to enhance clarity and flow. Here are specific suggestions:
1. Complexity of Language: Technical jargon is often introduced without adequate context, which can make comprehension challenging, especially for a broader audience. Simplifying explanations where possible and defining complex terms upon first use would help.
2. Inconsistent Terminology and Notation: Some terms, such as abbreviations for statistical methods (e.g., "MH-CC" and "Peto-R"), are inconsistently formatted. Standardizing terminology and maintaining uniform notation throughout would improve readability.
3. Typographical Errors and Formatting Issues: There are minor typographical inconsistencies, such as spacing around equations, inconsistent capitalization in headings, and formatting of confidence intervals. Correcting these errors would enhance the paper’s professional appearance.
4. Redundant Phrasing: Certain explanations are repeated, especially when discussing the limitations of existing methods. Streamlining such content would allow for a more concise and engaging narrative.
5. Improving Flow: Transitions between sections, especially from results to discussion, could be made smoother by adding linking statements. This would make the paper’s structure easier to follow.
Author Response
The paper needs a major revision based on the following section-by-section comments.
Thank you very much for the detailed comments and your extremely thorough review of our manuscript. We respond to each of your comments below and have made all changes to the text in blue. Your comments have substantially improved the article and we appreciate your time.
Title:
- Minor Typos: The title contains an extra space in "Meta-Analyses with Small, Sparse Data."
Thank you for catching this formatting issue. It has been updated in the revised manuscript.
- Clarity of Purpose: The title could specify the methodological contribution ("Exact Inference" for "random effects meta-analysis" under conditions of "small and sparse data") for better precision.
Following your suggestion, we have updated the title to “Exact Inference for Random Effects Meta-Analyses for Small, Sparse Data” to better indicate the purpose of the article.
Abstract:
- Line Numbering in Text: The abstract contains line numbers (e.g., "1 statistical inference," "2 pharmaceutical products") that interfere with readability. These should be removed.
The line numbers are unfortunately part of the STATS journal template. They will be removed prior to publication.
- Overuse of Jargon: Terms like "exact inference," "random effects framework," and "nominal level" may benefit from simplified explanations, especially for broader accessibility.
We have added a sentence to clarify the benefits of our proposal in the abstract.
- Missing Citation Formatting: Specific references are cited in the introduction but appear without author-year citation formatting, making it harder to trace sources.
Unfortunately we are utilizing the STATS journal template which uses numbering for references.
- Unclear Methodology Naming: The term “XRRmeta” is introduced as the name for the method without context on why it was named as such, which could be clarified for better coherence.
We have updated the abstract to underline relevant letters that correspond to the name XRRmeta to improve clarity.
Introduction:
- Inconsistent Citation Style: Citations like "[1–3]" are used without full names or context, which can confuse readers about the sources referenced.
Unfortunately we are utilizing the STATS journal template which uses numbering for references.
- Line Numbers Throughout: Line numbers (e.g., "18," "20," etc.) are embedded in the text, disrupting the flow.
The line numbers are unfortunately part of the STATS journal template. They will be removed prior to publication.
- Example Clarity: While the introduction provides an example (rosiglitazone), additional context around the selection of this example would improve relevance.
We have added background information on the impact of the initial rosiglitazone example, including the initial findings showing an increase in the risk of myocardial infarction (odds ratio, 1.43; 95% confidence interval [CI], 1.03 to 1.98) and a borderline-significant finding for death from cardiovascular causes (odds ratio, 1.64; 95% CI, 0.98 to 2.74). This finding set off significant debate over the drug’s safety, influencing regulatory actions worldwide, including label warnings by agencies like the Food and Drug Administration.
- Ambiguous Transition to New Method: The transition to introducing the new "XRRmeta" method could be clearer, as it currently feels abrupt without connecting why current methods are insufficient.
We have revised the transition sentence to clarify that existing methods cannot handle the small, sparse data scenario, which is the focus of XRRmeta.
- Redundant Phrasing: Some sentences reiterate points already made, such as the rarity of events affecting analysis reliability. Removing redundancy could make the introduction more concise.
We appreciate the suggestion and have made every effort to remove repeated text.
Section 2.
- Problem Setup
- Incomplete Notation Explanation: The setup mentions variables like \( Y_{ij} \), \( N_{ij} \), and \( K_{\text{tot}} \) without immediately clarifying each symbol’s meaning. While later explained, this could cause initial confusion.
Thank you, we have reorganized the notation section to clarify each notation when it is introduced.
- Equation Alignment: Some equations, such as \( Y_{ij} \sim \text{Poisson}(N_{ij}\lambda_{ij}) \), could be aligned or displayed more clearly, especially in settings with complex subscripted terms.
We have realigned all equations to be displayed more clearly.
- Abbreviations without Definitions: Terms like "DZ" (double-zero) appear without an initial definition, making it difficult to follow for first-time readers.
We have made every effort to define all abbreviations, including double zero (DZ), which is first introduced in the introduction section.
2.1 Notations and Assumptions
- Assumption Clarity: The assumption that \( Y_{ij} \) follows a Poisson distribution with rate parameter \( \lambda_{ij} \) could benefit from further explanation of why the log-linear model was chosen for \( \lambda_{ij} \).
We have clarified that this model is chosen as the data are counts and that it also accommodates between-study heterogeneity. We have also added reference to the original formulation of this model in the text.
- Binary Treatment Indicator: \( X_{ij} = I(j = 1) \) is introduced as a binary treatment indicator, but the purpose and implications of this variable could be clarified.
We have rephrased our explanation of X_{ij} to indicate that it is a binary indicator for assignment to the treated arm.
- Redundant Language: The phrase "without loss of generality" is used, which can be omitted without affecting the meaning since it does not add significant value in this context.
We have removed the phrase “without loss of generality.”
- Typographical Errors in Notation: There is inconsistent spacing in notations like "Poisson(Nijλij)" and "ind∼," which might indicate formatting issues.
We have fixed the formatting and explanation of this equation to enhance readability.
- Non-Standard Symbols for Events: \( Y_{i\cdot} = Y_{i1} + Y_{i2} \) is used without explicitly defining the dot notation, which might be confusing for some readers.
We have introduced the meaning of \( Y_{i\cdot} \) before introducing the notation to enhance clarity. We have also clarified that the dot notation indicates that the summation is taken over the arm index within each study.
2.2 Parameter of Interest
- Ambiguous Explanation of Treatment Contrast: The parameter \( \pi_i \) as "the magnitude of the event rate in the treated group relative to the cumulative event rate" could be made clearer by specifying how it serves in the model and its practical significance.
We have clarified that \( \pi_i \) provides a measure of the treatment effect, as it conveys the proportion of events that occur in the treated arm relative to the overall event rate across both arms.
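For concreteness, and consistent with this description (a sketch in our notation; the exact display in the manuscript may differ slightly), the contrast can be written in terms of the arm-specific event rates as
\[
\pi_i = \frac{\lambda_{i1}}{\lambda_{i1} + \lambda_{i2}},
\]
so that values of \( \pi_i \) above 0.5 correspond to a higher event rate in the treated arm than in the control arm.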
- Parameter Interpretation: The interpretation of \( \pi_i > 0.5 \) could include more detail on what it implies for the treated vs. control groups.
We have clarified that \( \pi_i > 0.5 \) indicates a harmful treatment effect when the events of interest are adverse.
- Inconsistent Subscript and Formatting: Notations like \( \alpha_0 \) and \( \beta_0 \) are defined but may benefit from standardized subscript formatting for better readability.
The 0 subscript indicates the true values of the parameters, as \( \alpha \) and \( \beta \) are later varied in the XRRmeta procedure. We have made note of this in the text.
- Bimodal Behavior Explanation: When explaining the behavior of the \( \pi_i \sim \text{Beta}(\alpha_0, \beta_0) \) distribution, more background on the context of “bimodal prior distributions” would clarify why this behavior is an issue.
We have clarified that this assumption is necessary to ensure identifiability of the model parameter.
- Typographical Errors: Minor formatting issues, such as "Yij∼" (missing space), occur in several equations.
We have paid special attention to make sure all equation formatting is correct.
Remark 1
- Lack of Transition: The transition from discussing \( 2 \times 2 \) tables to incidence rates is abrupt, without a linking sentence that explains why this additional context is being introduced.
We have removed this remark to improve the flow of the article.
- Formatting Issues in Symbol Representation: Phrases like "{(Yij, Tij)" and "P(Yij | Y · ) ∼" have inconsistent spacing and formatting errors.
These equations have been removed.
- Terminology Clarity: Phrasing like “analogous assumptions” lacks clarity on what assumptions specifically apply to incidence rate data in this context.
We have removed this remark to improve the flow of the article.
- Structural Flow: Ending Remark 1 with a general statement without tying it back to the main content disrupts the section's flow.
We have removed this remark to improve the flow of the article.
Section 3
3.1 Exact Inference Procedure
- Terminology Consistency: The use of “CI” for confidence interval should be clarified at first use for consistency.
We now introduce the shorthand of CI for “confidence interval” in the introduction and use it throughout the text.
- Unclear Hypothesis Definition: The null hypothesis H_0 : \mu_0 = \mu could be better explained to indicate why this specific hypothesis is relevant for the exact inference method.
We have added that the exact CI inverts the null hypothesis test prior to introducing the hypothesis to clarify its purpose.
- Ambiguity in Probability Notation: The notation P\{ T(\mu; D_{\mu,\nu}) \geq T(\mu; D_0) | D_0 \} may be confusing for some readers as it introduces complex probabilistic concepts without much context.
We have rewritten this section of the text to first introduce the p-value function and then introduce the profile p-value so that it is clear to readers what the purpose of the notation is. The p-value function corresponds to the familiar definition of a p-value being the probability of observing a test statistic as or more extreme than the observed value for a given null value of \mu and \nu.
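In symbols (with the MC index \( m \) added here for illustration), the Monte Carlo estimate of the p-value function in Step 1 is
\[
p(\mu, \nu; D_0) \approx \frac{1}{M} \sum_{m=1}^{M} I\left\{ T\left(\mu; D^{(m)}_{\mu,\nu}\right) \geq T(\mu; D_0) \right\},
\]
where \( D^{(m)}_{\mu,\nu} \) denotes the \( m \)th dataset generated under \( (\mu, \nu) \); the profile p-value then removes the dependence on the nuisance parameter \( \nu \), typically by maximizing the p-value function over it.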
- Unexplained Acronyms: The acronym "MC" is used for Monte Carlo but could be explicitly defined upon first use.
Monte Carlo (MC) is defined at its first use in the text, where the acronym is provided alongside it. We have ensured that MC is used in the remainder of the text.
- Complex Language in Step Descriptions: Steps 1a and 1b in the Monte Carlo simulation explanation contain dense technical language (e.g., "calculate p(\mu, \nu; D_0) with M^{-1}..."), which may benefit from simplification or brief clarifications.
We have added more clarity regarding Steps 1a and 1b of the procedure and also added that the equation for p-value corresponds with the familiar definition to improve clarity.
Figure 1
- Incomplete Figure Label: Figure 1 lacks a clear title or description of what it represents. Adding a brief caption explaining the two steps in XRRmeta would be helpful.
We have added a detailed caption for Figure 1 to explain the two steps of XRRmeta.
- Inconsistent Formatting in Steps: Formatting in the two-step procedure varies, with inconsistent use of punctuation and font size, which may disrupt readability.
We have assured that the font size is consistent throughout the text and figures using the STATS template. We have also updated the writing so punctuation is consistent.
- Lack of Connection with Main Text: Figure 1 could be better referenced within the main text to clarify how the figure ties into the computational steps in XRRmeta.
Figure 1 is referenced in line 152 when we introduce XRRmeta and again in line 157 when detailing its two substeps.
- Typographical Errors: Minor formatting issues like “grid search” and “projection” are inconsistently capitalized, which affects professionalism.
We have corrected all typographical errors.
3.2 Test Statistic
- Inconsistent Notation: Notations for estimators (e.g., \( \hat{\mu} \) vs. \( \mu \)) lack clarity, with inconsistent subscripts and accents that could make it hard for readers to follow.
Hat notation is the most common way to denote an estimator in the statistical literature. We have stayed consistent with existing conventions. We have clarified that these are the estimators of the true model parameters, \mu_0 and \nu_0.
- Unclear Assumptions on Moments Estimators: The method of moments estimators used are introduced without clearly stating why they were chosen over other estimators, such as maximum likelihood estimators (MLE).
We have clarified why the method of moments estimators were selected over maximum likelihood estimators at the beginning of the section.
- Abrupt Introduction of Terms: Terms like "Wald statistic" and "inverse variance estimator" appear suddenly and could benefit from brief definitions or explanations.
We have added more detail about the Wald test to remind readers it is one of the three most common testing procedures and also defined “inverse variance estimator” in words.
- Confusing Test Statistic Notation: In the expression for the Wald statistic, \( T(\mu; D_0) = (\hat{\mu} - \mu)^2 / \widehat{\text{Var}}(\hat{\mu}) \), the notation \( \widehat{\text{Var}}(\hat{\mu}) \) could be misinterpreted without clear contextual explanation.
We have clarified that \( \widehat{\text{Var}}(\hat{\mu}) \) is the estimated variance when we introduce the Wald test.
3.2.1 Balanced Design
- Unclear Explanation of "Balanced": The term “balanced design” could be explained more explicitly, especially regarding why equal sample sizes are assumed for treatment and control groups.
We have introduced what the balanced design is and its purpose (i.e., increasing statistical power in detecting a treatment effect) at the beginning of the section.
- Equation Formatting: Equations for the first two moments E(Y_{i1} / Y_{i\cdot}) = \mu_0 and E((Y_{i1} / Y_{i\cdot})^2) could be better formatted or numbered for ease of reference.
We have reformatted the equations for ease of reference.
- Complexity in Moment Estimation: The explanation of moment estimators using continuity correction may be overly complex. Simplifying or breaking it down into smaller steps would improve comprehension.
We have rewritten this section and broken down the moment estimation into two steps, one for the estimation of \mu_0 and another for \nu_0 to improve comprehension.
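As a brief sketch of these two steps (our illustration, not the manuscript's exact equations), matching the first moment \( E(Y_{i1}/Y_{i\cdot}) = \mu_0 \) quoted above gives
\[
\hat{\mu} = \frac{1}{K} \sum_{i=1}^{K} \frac{\tilde{Y}_{i1}}{\tilde{Y}_{i\cdot}},
\]
with \( \hat{\nu} \) then recovered by matching the empirical second moment \( K^{-1} \sum_{i=1}^{K} (\tilde{Y}_{i1}/\tilde{Y}_{i\cdot})^2 \) to its model-based expression; the continuity-corrected counts \( \tilde{Y}_{i1} \) and \( \tilde{Y}_{i\cdot} \) are used in place of the raw counts.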
- Typographical Inconsistencies: The estimator symbols, such as \( \tilde{Y}_{i1} \) and \( \tilde{Y}_{i\cdot} \), are inconsistently formatted, making it difficult to track variables throughout the section.
We have corrected this typographical inconsistency and now introduce \( \tilde{Y}_{i1} \) and \( \tilde{Y}_{i\cdot} \) separately.
- Redundant Information: There is some redundancy in describing the continuity correction, as it is already stated in the previous section.
Here we utilize a continuity correction specifically for our test statistic so it is different from what is introduced in the previous section. We have clarified that the correction is specifically for equation (3) which introduced the estimator of \nu.
3.2.2 Unbalanced Design
- Inconsistent Notation: The section introduces unbalanced design adjustments without consistent notation, especially with terms like \( Y_{i1} \), \( Y_{i\cdot} \), and \( \tilde{Y}_{i\cdot} \).
With your earlier suggestions, the notations are now clearly defined. The definitions of \( Y_{i1} \) and \( Y_{i\cdot} \) are provided in the notation section (Section 2.1) and were reorganized according to your prior comments. \( \tilde{Y}_{i\cdot} \) is also clarified in Section 3.2.1.
- Abrupt Explanation of Resampling: The resampling procedure is introduced without sufficient context, making it hard to understand the rationale behind resampling as it applies to mimicking a balanced design.
We have clarified that the goal of the resampling procedure is to avoid estimators that require iterative calculations. The resampling procedure therefore enables us to use weighted method of moment estimators that are a simple extension to the balanced design setting and to maintain a computationally efficient procedure for XRRmeta. This section has been rewritten in the text.
- Terminology Clarification: Terms like "hypergeometric distribution" and “enumerated data” could be better explained to ensure clarity for readers not familiar with these concepts.
The hypergeometric distribution is defined in line 221. We have changed “enumerated data” to “resampled data” for clarity.
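To illustrate the intuition in code (a minimal sketch of one plausible resampling scheme, with hypothetical example values; this is not necessarily the exact procedure in the manuscript): if study \( i \) has \( Y_{i1} \) events among \( N_{i1} \) treated subjects and the control arm is smaller, with size \( N_{i2} \), a pseudo-balanced treated count can be drawn from a hypergeometric distribution.

    # Hypothetical example values; not taken from the manuscript.
    Y_i1 <- 3; N_i1 <- 500   # events and sample size in the treated arm
    N_i2 <- 200              # smaller control arm size
    L <- 1000                # number of resampled datasets
    set.seed(1)
    # rhyper(nn, m, n, k) draws k subjects from m "events" and n "non-events",
    # giving a treated count as if the treated arm had size N_i2.
    Y_star <- rhyper(L, m = Y_i1, n = N_i1 - Y_i1, k = N_i2)

Each resampled pair of the form \( (Y^*_{i1l}, N_{i2}) \) can then be analyzed as if it came from a balanced design.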
- Formatting of Example Data: The example data given (e.g., \( D^*_1 = \{(Y^*_{11l}, N_{12}), (Y_{12}, N_{12}) \mid l = \ldots \} \)) is densely presented, and adding explanations for each variable would improve readability.
We have broken down the steps in the example data to improve readability.
- Typographical Errors in Notation: Inconsistent spacing in terms like Y_{i1} / Y_{i\cdot} and superscripts (e.g., "int") may hinder clarity.
We have reformatted the spacing to make sure the typographical errors are corrected at the end of Section 3.2.2. The subscripts are also consistent with Section 3.2.1 for clarity.
3.3 Computational Details
- Ambiguous Complexity Description: The section references computational complexity reductions without detailing how or why certain operations achieve these reductions.
We have rewritten the introduction to the Computational Details Section (Section 3.3) to show that our result allows us to do \( H \) operations rather than the \( H \times J \) operations originally shown in Figure 1, which significantly improves computation time.
- Unclear Explanation of Stochastic Dominance: The explanation for how stochastic dominance aids in reducing execution time could be expanded for clarity.
As demonstrated in the response to (1) above, we have rewritten this section.
- Typographical Errors in Probability Notation: Probability notation, such as P(T(\mu; D_{\mu, \nu}) > t) , could be consistently formatted to avoid confusion.
We have confirmed that this formatting is consistent with the other sections.
- Ambiguous Instructions on Grid Search Parameters: The description of grid size s and Monte Carlo repetitions M lacks specific examples to clarify recommended values and their effects.
We have clarified that both M and s affect how precise the results are for the treatment contrast. We have also suggested an M of at least 2000, so that the profile p-values in XRRmeta are computed with enough precision, and an s of 0.001, as this will accommodate most values of the treatment contrast.
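As a small illustration of these tuning parameters (the values and object names here are ours for illustration, not the package's defaults):

    M <- 2000                          # Monte Carlo repetitions per candidate value
    s <- 0.001                         # grid spacing for the treatment contrast
    mu_grid <- seq(s, 1 - s, by = s)   # candidate contrast values spanning (0, 1)
    length(mu_grid)                    # number of grid points to scan in the grid search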
- Inconsistent Formatting: Terms like “correction step” and “boundary of the parameter space” are inconsistently capitalized, and inconsistent spacing around symbols disrupts the flow.
We have corrected the inconsistent formatting in the main text and the supplement.
- Redundant Terminology: The mention of “reduced parameter space” is repeated multiple times with slightly different phrasing, which could be streamlined for brevity.
We have shortened this to parameter space for brevity.
Section 4.
4.1 Simulation Studies
- Clarity on Simulation Parameters: The description of simulation parameters is dense, and specific explanations for why parameters like \( r_0 \), \( K_{\text{tot}} \), and Beta distribution settings were chosen could help readers understand the relevance of each choice.
We have rewritten the explanation of the simulation study to improve clarity.
- Inconsistent Notation of Parameters: Symbols, such as \( K_{\text{tot}} \) and \( K \), are used without consistent definitions, which can confuse the reader regarding their respective roles.
\( K_{\text{tot}} \) is defined in the notation section (Section 2.1) and readers are reminded that it is the total number of studies in the simulation section.
- Ambiguous Reporting of Type I Error: Terms like "Type I error inflation" and "conservativeness" could be better explained to clarify their impact on interpreting statistical significance in small sample settings.
We have clarified that inflated Type I error corresponds to Type I error exceeding the nominal level of 0.05, while conservativeness corresponds to Type I error below the nominal level of 0.05. We have also (i) reemphasized that XRRmeta is guaranteed to have Type I error at or below the nominal level for all sample sizes (i.e., it is not an asymptotic procedure) and (ii) clarified that Type I error is expected to move closer to the nominal level as the sample size increases and that some of the existing comparison methods do not exhibit this property.
- Typographical Issues: Minor typographical inconsistencies, such as the inconsistent use of spaces in equations (e.g., "MH, MH_cc") and around parameters (e.g., "r0=0.01"), disrupt readability.
We have corrected the typographical issues for "MH, MH_cc". All equals signs now use math mode in LaTeX for consistency (e.g., "r0=0.01").
- Repetitive Language: Phrasing is repetitive regarding the limitations of classical methods in high heterogeneity settings, which could be streamlined to maintain focus.
We have removed some of the interpretation to streamline the main focus of the results section.
- Visual Reference to Figures: The section references Figure 2 but lacks an introductory statement on what the figure represents, which could enhance readability.
We have clarified that the figure shows power and Type I error to enhance readability.
- Missing Statistical Test Details: While several methods are compared, the section does not explain the choice of these methods or the criteria for comparison (e.g., coverage levels, efficiency).
We have added that these are the most common methods for comparisons and that Type I error and power are used to evaluate their performance relative to XRRmeta.
4.2 Real Data Examples
- Lack of Rationale for Chosen Studies: The section does not provide a clear rationale for choosing the rosiglitazone and face mask studies as case studies, which could help contextualize their relevance.
Following your previous suggestion, we clarified why the rosiglitazone study was our primary motivating dataset in the introduction (Section 1), due to its impact on regulation and the controversy of its results. We have clarified that the face mask study is chosen to represent a different setting, specifically one with a higher event rate but fewer studies, and due to the controversy over the use of face masks during the pandemic. We have also added background on the study in Section 4.2.2.
- Complexity in Data Presentation: Presenting results from several statistical methods at once, without clearly organizing each method’s purpose and differences, makes the data hard to interpret.
We selected the same methods used for comparison in the simulation for consistency and clarity of presentation. Additionally, some of these methods were used in the initial analyses of the datasets. We emphasized this point in the introduction of Section 4. These represent the most commonly used methods in the literature and include both fixed and random effects methods, which is clarified in Section 3.
- Missing Connection to Simulation Results: While the simulations aimed to demonstrate method effectiveness, the lack of connection between simulation outcomes and real data applications limits the relevance of the findings.
We have made connections to the simulation results throughout the real data analyses.
- Formatting of Method Names: Method names, such as "MH-CC" and "Peto-R," are presented in shorthand without explanation, which may confuse readers unfamiliar with these abbreviations.
All of the shorthand notation is introduced the first time the methods are introduced in the simulation section. We have ensured that the naming is consistent across Sections 3 and 4 for clarity.
4.2.1 Rosiglitazone Study
- Inconsistent Reference to Previous Studies: References to the rosiglitazone study’s initial findings could benefit from expanded citations for improved traceability of the original results.
We have included references to the original study as well as an updated analysis of the original paper.
- Unclear Statistical Results Comparison: When comparing the results from XRRmeta with other methods, the differences in confidence intervals (CIs) and p-values could be better explained to make it clear why XRRmeta is preferable.
We have clarified that XRRmeta is preferable as it is not sensitive to the large number of DZ studies, presence of the two larger studies, and/or between study heterogeneity in the CVD analysis. These are issues that can impact the validity of the comparison methods, but not XRRmeta.
- Typographical Errors in CIs: In Table 2, inconsistent formatting of CIs, such as "[1.03, 1.98]" and "[0.91,1.67]," creates a disjointed look.
We have updated the formatting of the confidence intervals in the text.
- Unexplained Acronyms: Terms like "CVD" and "DZ" appear without a prior explanation, which can lead to confusion for readers not familiar with the context.
As discussed in an earlier response, DZ is defined when it is first used. CVD is defined at the beginning of Section 4.2.1 as cardiovascular mortality.
- Complexity in Method Justification: Statements like "our procedure is justified in its exclusion of DZ studies" could be clarified, as it’s not immediately clear why XRRmeta excludes these studies.
We have reminded readers that our conditional inference argument is the justification for the removal of DZ studies. This is also clarified in Section 2 when we introduce XRRmeta.
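To make the justification concrete in one line (a sketch of the standard conditioning argument under balanced exposure; the manuscript's conditional inference argument covers the general case): under the Poisson model,
\[
Y_{i1} \mid Y_{i\cdot} \sim \text{Binomial}(Y_{i\cdot}, \pi_i),
\]
so when \( Y_{i\cdot} = 0 \) the conditional distribution is degenerate at zero for every value of \( \pi_i \), and a DZ study carries no information about the treatment contrast.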
4.2.2 Face Mask Study
- Abrupt Transition Between Methods: The comparison of results between methods is presented in a rushed manner, making it difficult for readers to understand each method’s unique contribution to the analysis.
We have rewritten this section to improve the understanding of each method’s contribution. Specifically that all methods reach similar findings, but that the random effects methods yield CIs that are longer than the CIs from the fixed effects methods, which is both expected and consistent with our simulation findings.
- Incomplete Explanation of Methodological Significance: The significance of the face mask study's results in demonstrating XRRmeta’s advantages could be expanded to better explain why XRRmeta is suitable for similar studies with rare events.
While the rosiglitazone study indicates the value of XRRmeta when there are a large number of DZ studies, a small number of larger studies, and/or between-study heterogeneity, the face mask study illustrates the utility of XRRmeta in a setting with a small number of studies. In particular, when event rates are not low, XRRmeta does not appear to be underpowered, which echoes the simulation study findings.
- Typographical Consistency: In Table 3, abbreviations such as "MH-CC" and "Peto-R" are inconsistently spaced, which affects professionalism.
We have adjusted the notation for MH-CC to be consistent across the text and improve spacing.
- Incomplete Data Context: Background on the transmission rates and settings for SARS-CoV-2 could help readers understand the importance of this meta-analysis and its conclusions.
We have added some background on the face mask meta-analysis to help readers understand its importance.
- Repetitive Language: There is some redundancy in describing the protective effects of face masks, which could be simplified to improve focus on the statistical results.
We have simplified the language around the protective effects of face masks to focus on the statistical results.
5. Discussion
- Missing Summary of Key Findings: The discussion lacks a concise summary of key findings from the simulations and real data examples, which would help reinforce the relevance of the XRRmeta method.
We have added a summary of key findings from the simulation to reinforce the relevance of the XRRmeta method.
- Unclear Implications for Practice: Statements such as "XRRmeta was the only method that uniformly maintained type I error" could benefit from a discussion on the practical implications for researchers conducting meta-analyses on rare events.
We have clarified at the end of the conclusion that the trade-off between power and Type I error control is ultimately a question of the goal of the specific meta-analysis. XRRmeta favors Type I error control to prevent false positive conclusions.
- Redundant Mention of Cochrane Handbook: The reference to the Cochrane Handbook appears multiple times with slight variations, which could be streamlined for brevity.
We have updated the reference to the Cochrane Handbook.
- Typographical Errors in Citations: Inconsistent citation formatting (e.g., “[9,34]” without proper spacing) disrupts readability.
We have updated the references.
- Abrupt End to Discussion: The discussion concludes without suggestions for future research directions, limiting the impact of the findings. Including proposed improvements or further validation studies would add value.
We have added a comment on further validation studies to improve the utility of XRRmeta in practice.
- Technical Language: Statements like “the primary complication in implementing XRRmeta is that the cumulative distribution function of \( T(\mu;D_{\mu, \nu}) \)” could be simplified or clarified for broader accessibility.
We have simplified the language in Section 3.1 when we first introduce XRRmeta.
Comments on the Quality of English Language
- Complexity of Language: Technical jargon is often introduced without adequate context, which can make comprehension challenging, especially for a broader audience. Simplifying explanations where possible and defining complex terms upon first use would help.
Thank you for your suggestions and the careful reading of our paper. We have followed all of your comments in making the paper more accessible. We have simplified explanations where possible and defined complex terms when they are first introduced.
- Inconsistent Terminology and Notation: Some terms, such as abbreviations for statistical methods (e.g., "MH-CC" and "Peto-R"), are inconsistently formatted. Standardizing terminology and maintaining uniform notation throughout would improve readability.
We have followed all your suggestions regarding consistency of formatting and have done a careful review of the paper to ensure that notation is uniform throughout the paper.
- Typographical Errors and Formatting Issues: There are minor typographical inconsistencies, such as spacing around equations, inconsistent capitalization in headings, and formatting of confidence intervals. Correcting these errors would enhance the paper’s professional appearance.
We have followed all suggestions regarding spacing around equations, inconsistent capitalization in headings, and formatting of confidence intervals.
- Redundant Phrasing: Certain explanations are repeated, especially when discussing the limitations of existing methods. Streamlining such content would allow for a more concise and engaging narrative.
We have made the discussions about existing methods more concise, particularly in the simulation study section, to make the paper more focused and engaging.
- Improving Flow: Transitions between sections, especially from results to discussion, could be made smoother by adding linking statements. This would make the paper’s structure easier to follow.
We have added linking statements where appropriate throughout the paper to improve the flow.
Reviewer 3 Report
Comments and Suggestions for Authors
Overall Comments: Gronsbell et al. propose an exact inference method for random effects meta-analysis targeting small, sparse data. Overall, the manuscript is well written. I have some comments for the authors to address.
Major Comments:
1. On Page 3, Section 2.1, why do the authors use a Poisson model with a specified rate instead of a logistic regression, where \( Y_{ij} \) is a binomial random variable out of \( N_{ij} \)? It would be more persuasive if the authors could provide some literature. Are there reasons to select Poisson regression instead of logistic regression?
2. In Section 2.2, the treatment contrast \( \pi_i \) is assumed to follow a Beta distribution with \( \alpha > 1 \) and \( \beta > 1 \). Why do the authors require \( \alpha > 1 \) and \( \beta > 1 \)? I think the parameters only need to be positive. In addition, if the truth is a bimodal effect, how should it be handled?
3. It would be better if the authors discussed the sample size requirements and power analysis related to their model.
4. In Line 247, why do the authors assume a Gamma distribution for \( \lambda \) in the simulation studies?
5. In Table 1, will the method still work for a detrimental effect instead of a protective effect?
Minor Comments:
1. For small sample sizes, how do the authors handle zero counts and near-zero counts?
2. What if the individual studies have different sizes? Is it possible to use sample size as a weight in the analysis?
Author Response
Thank you very much for the careful reading of our paper and for the encouragement. We address your comments below and have made all changes to the manuscript in blue text so they may be easily found.
Major Comments:
- In Page 3, Section 2.1, why do the authors model it as a Poisson model with a specified rate instead of a logistic regression, where $Y_{ij}$ is a binomial random variable with $N_{ij}$ trials? It would be more persuasive if the authors could provide some supporting literature. Are there reasons to select Poisson regression instead of logistic regression?
We have added references to papers that use the Poisson model for rare-events meta-analysis where we introduce the model (Section 2.1). In fact, in a recent large-scale simulation study, Poisson regression was the preferred choice for rare-events meta-analyses (see reference [19]). Poisson models offer a natural way to model count data. Additionally, the Poisson distribution approximates the binomial when the event rate is low and the study-specific sample sizes are not too small, so the two models are very similar in most settings in which XRRmeta will be applied. Lastly, the Poisson model lends itself to a random effects framework that enables us to take advantage of a conditional inference argument justifying the removal of double-zero studies. Similar to the relative risk example given in the main text, finding an appropriate random effects distribution for the odds ratio is challenging.
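For intuition, a minimal R sketch (the arm size and event probability are illustrative values of our choosing, not taken from the manuscript) compares the binomial and Poisson models at a low event rate; the near-identical probabilities illustrate why the two modeling choices behave similarly in the rare-event settings XRRmeta targets.

    # Compare binomial and Poisson event-count models for a rare event:
    # N patients in an arm with per-patient event probability p.
    N <- 500    # illustrative arm size
    p <- 0.01   # illustrative rare event probability
    y <- 0:10
    binom_probs <- dbinom(y, size = N, prob = p)
    pois_probs  <- dpois(y, lambda = N * p)
    # The two probability mass functions agree closely when p is small.
    round(cbind(events = y, binomial = binom_probs, poisson = pois_probs), 4)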
- In Section 2.2, the treatment contrast $\pi_i$ is assumed to follow a Beta distribution with $\alpha>1$ and $\beta>1$. Why do the authors require $\alpha>1$ and $\beta>1$? I think the parameters only need to be positive. In addition, if the reality/truth is a bimodal effect, how is it handled?
We require $\alpha>1$ and $\beta>1$ so that the location parameter is identifiable. We have added this to the main text (Section 2.2) so that readers are aware the parameter is not estimable if there is a bimodal effect. Practically, if there is a bimodal effect, there is likely an issue with the datasets selected for the meta-analysis in that the studies represent fundamentally different populations or interventions. Meta-analysis is not an appropriate tool in this scenario.
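As a small illustration of the shape restriction (the parameter values are ours and purely illustrative), the following R sketch contrasts a unimodal Beta density with a U-shaped one, which is the kind of bimodal random-effects shape excluded by requiring $\alpha>1$ and $\beta>1$.

    # Beta(2, 5) is unimodal; Beta(0.5, 0.5) piles mass near 0 and 1,
    # the U-shaped ("bimodal") case excluded by alpha > 1 and beta > 1.
    x <- seq(0.01, 0.99, by = 0.01)
    unimodal <- dbeta(x, shape1 = 2,   shape2 = 5)
    u_shaped <- dbeta(x, shape1 = 0.5, shape2 = 0.5)
    plot(x, unimodal, type = "l", xlab = expression(pi[i]), ylab = "density")
    lines(x, u_shaped, lty = 2)
    legend("top", legend = c("Beta(2, 5)", "Beta(0.5, 0.5)"), lty = 1:2)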
- It would be better if the authors discussed the sample size requirements and power analysis related to their model.
One of the key advantages of XRRmeta is that it is an exact inference method. It therefore provides valid inference regardless of the number of studies included in the meta-analysis. We have re-emphasized this point in the abstract to make it clear to readers. We evaluate the power of XRRmeta across various sample sizes in the simulation studies and in two real-data analyses with very different sample sizes. With respect to sample size calculations, these can be performed using simulation studies, but would require knowledge of the magnitude of the treatment effect, between-study heterogeneity, and the event rate. We comment on this as a point of future work in the discussion (Section 5).
- In Line 247, why is a Gamma distribution assumed for $\lambda$ in the simulation studies?
We assume a Gamma distribution for the Poisson rate parameter for several reasons. First, the Gamma distribution is defined on the positive real line and the Poisson rate parameter is strictly non-negative. Second, it is a flexible two-parameter distribution whose shape and rate can be varied. Third, it provides a natural way to induce between-study heterogeneity. We have rewritten the simulation study explanation for clarity (Section 4.1).
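For concreteness, a minimal R sketch of this data-generating idea (the number of studies, sample sizes, and Gamma parameters are illustrative and not those used in Section 4.1): study-specific rates drawn from a Gamma distribution feed into Poisson counts, inducing between-study heterogeneity.

    set.seed(1)
    K     <- 10                                # illustrative number of studies
    N_i   <- sample(100:1000, K)               # study-specific sample sizes
    lam_i <- rgamma(K, shape = 2, rate = 200)  # Gamma-distributed event rates
    Y_i   <- rpois(K, lambda = N_i * lam_i)    # observed event counts
    data.frame(study = 1:K, N = N_i, rate = round(lam_i, 4), events = Y_i)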
- In Table 1, will the method still work for a detrimental effect instead of a protective effect?
XRRmeta can be applied to studies with either a detrimental or protective effect. We included both the rosiglitazone study and the face masks study to illustrate the application of our procedure in two very different settings.
Minor Comments:
- For small sample sizes, how do the authors handle zero counts and nearly zero counts?
Our method is based on a conditional inference argument that justifies the removal of double-zero studies. In developing the test statistic for XRRmeta, we use a continuity correction so that $\nu_0$ can be estimated in the presence of zero-event studies. However, in contrast to other methods, the continuity correction does not impact the validity of the inference from XRRmeta, as the inference is based on the exact distribution of the test statistic. We have rewritten Section 3.2.1 to clarify this point.
- What if individual studies have different sizes? Is it possible to use the sample size as a weight in the analysis?
It is possible to consider different test statistics in XRRmeta. One option is to use the sample size as a weight, as you have suggested. Alternatively, the inverse variance estimator (i.e., weighting each study inversely to its variance for point estimation) is generally expected to provide the best performance. However, this is true only when the sample size (i.e., the total number of events) of every individual study is sufficiently large. We initially investigated the inverse variance estimator of $\mu_0$ in our numerical studies but did not observe a substantial improvement in the efficiency of XRRmeta relative to our proposed method of moments estimators. We have emphasized this point in Section 3.2.1.
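To make the two weighting options concrete, a generic R sketch (the study-level estimates, sizes, and variances are illustrative, and this is not the estimator used in XRRmeta):

    # Two generic weighting schemes for combining study-level estimates.
    theta <- c(0.20, 0.35, 0.28)      # illustrative study-level estimates
    n     <- c(150, 600, 300)         # study sample sizes
    v     <- c(0.010, 0.004, 0.006)   # illustrative variances
    weighted.mean(theta, w = n)       # sample-size weighting
    weighted.mean(theta, w = 1 / v)   # inverse-variance weighting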
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Review Report of the revised article entitled “Exact Inference for Random Effects Meta-Analyses for Small, Sparse Data”.
The article still requires further revision. A major revision is needed based on the detailed section-wise feedback provided.
Section 1. Introduction
General Writing and Clarity
- Terminology and Clarity:
- The introduction uses specialized terms like "continuity corrections" and "zero-event studies" without providing immediate context or examples, which might be challenging for readers unfamiliar with meta-analytic methods.
- Flow of Ideas:
- The transition between the practical issues in existing methods and the introduction of "XRRmeta" could be smoother. Currently, it feels abrupt when shifting from describing the challenges to naming the proposed method.
- Context for XRRmeta:
- While the article mentions that "XRRmeta" addresses limitations in existing methods, there is limited explanation of why "exact inference" is particularly advantageous for small, sparse data. Including a brief motivation could strengthen the introduction.
Specific Content Concerns
- Referencing Issues:
- Some citations (e.g., [7], [10-15]) are clustered in a way that might overwhelm readers. It might be helpful to consolidate these into key references or provide a brief summary of their findings.
- Overgeneralization:
- The claim that "practitioners often apply arbitrary continuity corrections or remove zero-event studies" could benefit from additional substantiation or referencing to avoid sounding anecdotal.
- Scope of XRRmeta:
- The introduction states that XRRmeta provides valid inference for various conditions but does not detail whether this comes with any trade-offs (e.g., computational complexity). Acknowledging this upfront could improve transparency.
Section 2. Problem Setup
1. Terminology Usage:
- The term "nuisance parameter" is introduced but not clearly defined for non-specialist readers. While it is a standard term in statistics, a brief explanation or context would help improve accessibility.
2. Mathematical Notation and Clarity:
- The formula $Y_{ij} \sim \text{Poisson}(N_{ij}\lambda_{ij})$ and its subsequent explanations are clear, but the transition to $\lambda_{ij} = \lambda_{i2} e^{X_{ij}\xi_i}$ could confuse readers unfamiliar with Poisson modeling. A brief sentence connecting these to practical applications might help.
- The use of $\text{expit}(x) = \frac{1}{1 + e^{-x}}$ is correct, but some readers might not recognize this as the sigmoid/logistic function. Mentioning this could improve clarity.
- The notation $Y_{i\cdot} = Y_{i1} + Y_{i2}$ is well-defined, but the text doesn't explain why summing across arms is crucial for eliminating the nuisance parameter $\lambda_{i2}$. Adding a sentence here would strengthen understanding.
3. Assumptions on Exclusion of Double-Zero (DZ) Studies:
- The justification for excluding DZ studies (those with no events in both arms) is presented as self-evident but could benefit from further elaboration. Why these studies provide no information on relative risk might not be immediately clear to all readers.
4. Parameter of Interest (π_i):
- The introduction of $\pi_i \equiv \text{expit}(\xi_i) = \frac{\lambda_{i1}}{\lambda_{i1} + \lambda_{i2}}$ as a treatment contrast is mathematically correct but may confuse readers due to the indirect way $\lambda_{ij}$ is parameterized. Clarifying the interpretation of $\pi_i$ (e.g., it represents the proportion of events occurring in the treated arm relative to the total event rate) would improve accessibility.
- The condition $\pi_i \sim \text{Beta}(\alpha_0, \beta_0)$ assumes unimodality and requires $\alpha_0, \beta_0 > 1$, but the implications of these restrictions are not thoroughly discussed.
5. Practical Relevance:
- The toy example given to demonstrate challenges with specifying the standard deviation for the normal random effects model (lines 129-133) is useful but underexplained. For instance:
- The choice of $\xi_i \sim N(10, 1000)$ seems arbitrary. Clarifying why this example is relevant and how it connects to real-world applications would make the example more compelling.
- The conclusion drawn from this example could be stated more explicitly to guide the reader.
6. Constraints on Parameters:
- The derivation of constraints for $\nu_0$ (e.g., $\nu_0 \leq \mu_0(1-\mu_0)\min\left(\frac{\mu_0}{1+\mu_0}, \frac{1-\mu_0}{2-\mu_0}\right)$) is mathematically dense and lacks sufficient explanation of its practical implications. Why these constraints are essential should be clarified.
7. Typographical Errors or Formatting Issues:
- Line 118: "This enables use to utilize" should be "This enables us to utilize."
- Line 122: The phrase "ensuring the unimodality of the random-effect distribution" might benefit from more precise wording, such as "ensuring that the random-effect distribution is unimodal."
Section 3. Methods
1. General Organization and Clarity:
- The structure of the section, particularly the distinction between "Exact Inference Procedure," "Test Statistic," and its subsections ("Balanced Design" and "Unbalanced Design"), could be better organized. For instance, the subsection "Test Statistic" is dense, and readers might benefit from a summary of its purpose at the beginning.
2. Mathematical Notation and Clarity:
2.1 Exact Inference Procedure:
- Equation (2): $p(\mu; D_0) = \sup_{\nu} p(\mu, \nu; D_0)$:
- The definition of the profile p-value assumes familiarity with profiling nuisance parameters. Briefly explaining the rationale for taking the supremum over $\nu$ would improve clarity.
- While the equation is mathematically correct, some readers might benefit from examples or visual aids to understand how $p(\mu; D_0)$ is calculated in practice.
- The description of the Monte Carlo (MC) simulation procedure in Steps 1a and 1b (lines 159–163) does not specify how many replications $M$ are typically needed for practical precision, although this is later discussed in computational details.
2.2 Test Statistic:
- Balanced Design:
- The introduction of $Y_{i1}/Y_{i\cdot} \sim \text{Beta-Binomial}(Y_{i\cdot}, \mu_0, \nu_0)$ assumes knowledge of Beta-Binomial distributions. A brief explanation or reference to its properties would make this more accessible.
- The continuity correction in Equation (3) (line 186) is briefly justified but might appear arbitrary to some readers. More explanation of why $Y_{i1} + 0.5$ is added could strengthen its inclusion.
- The variance estimator $\widehat{\text{Var}}(\hat{\mu})$ (line 192) involves a mix of notations ($K^{-2}$ scaling), which could confuse readers. Adding a step-by-step derivation would improve clarity.
- Unbalanced Design:
- The resampling procedure for unbalanced designs (lines 214–218) is introduced without much context. Explaining why resampling is necessary and how it ensures computational efficiency could help the reader follow.
- The description of the hypergeometric distribution for sampling probabilities ($p_{i1l}$) is correct but not intuitive. A small numerical example or diagram could clarify the approach.
- The redefinition of variables (e.g., $D_i^c$, $p_{i1l}^c$) and their use in deriving $\tilde{\mu}$ and $\tilde{\nu}$ (lines 229–237) is dense. Simplification or splitting this derivation into steps would improve readability.
2.3 Simplifications and Approximations:
- The stochastic dominance assumption (lines 247–249), while intuitively explained, lacks formal justification or references to prior work validating this property. Including a brief statement about its empirical or theoretical support would enhance credibility.
3. Computational Efficiency:
- While computational optimizations are well-discussed, the potential trade-off between speed and accuracy is not adequately addressed. For example, using fewer grid points or fewer MC iterations could affect the confidence interval's precision or robustness.
- The suggestion to use $M \geq 2000$ Monte Carlo replications for p-value estimation (line 263) is reasonable but lacks evidence or references to justify this threshold.
4. Practical Implementation and Examples:
- The lack of practical examples in this section makes the methods difficult to follow for readers without strong mathematical backgrounds. For instance, applying the XRRmeta steps to a simple dataset would make the explanation more tangible.
- Figure 1 is referenced but does not provide sufficient visual clarity on the overall process. Including illustrative examples for each step would greatly enhance comprehension.
5. Typographical and Stylistic Issues:
- Line 149: "The unconditional test eliminates $\nu_0$" might be rephrased for clarity, e.g., "The unconditional test addresses $\nu_0$ by profiling out its effect."
- Line 184: There is a missing space in "Y_{i1}/Y_{i\cdot}" in one instance.
- Line 193: "Additional to computational efficiency" could be rephrased as "In addition to computational efficiency."
- Line 214: "Exposition" could be replaced with "illustration" for better readability.
Section 5. Discussion
1. Clarity and Organization:
- The discussion section lacks clear subsections or organization to separate key points, such as the implications of findings, limitations, and future directions. Structuring the section would make it easier to follow.
- The discussion begins immediately with a summary of the method's advantages (lines 418–421), but it does not recap the key findings from the results section. A brief summary of major results would provide context for the discussion.
2. Key Points Missing or Underexplored:
- Implications for Real-World Applications:
- While the advantages of XRRmeta in controlling type I error are highlighted, the implications of these findings for practical applications in meta-analysis are not sufficiently explored. For instance:
- How does XRRmeta's ability to control type I error impact decision-making in high-stakes settings like drug safety or public health policies?
- Are there specific scenarios where XRRmeta should be preferred over traditional methods?
- Comparison with Other Methods:
- The discussion notes the limitations of other methods (e.g., inflated type I error in classical approaches, lines 430–431), but it does not fully contextualize these issues in terms of practical impact.
- A more explicit comparison of XRRmeta with competing methods in terms of strengths and weaknesses (e.g., handling heterogeneity, computational efficiency) would provide a balanced view.
3. Limitations:
- The limitations of XRRmeta are briefly mentioned (lines 444–448), but they are not sufficiently detailed. For instance:
- The computational burden of XRRmeta is described (line 443), but there is no quantitative comparison to other methods or examples to illustrate the potential trade-offs between computational cost and precision.
- The discussion acknowledges that XRRmeta is conservative and prioritizes type I error control (line 449), but the implications for power in real-world settings are not discussed in depth. For example:
- Under what conditions might XRRmeta's conservatism lead to false negatives or missed signals of treatment effects?
- The choice of test statistic (lines 440–442) is noted as a potential limitation, but no discussion is provided about whether alternative statistics were explored or how they might affect results.
4. Future Directions:
- While future directions are mentioned (lines 450–452), they lack specificity. For instance:
- What specific scenarios or datasets will the validation study involve?
- What methodology will be used to assess the trade-offs between conservatism and power in rare events meta-analyses?
- How will sample size calculations be developed and validated for XRRmeta?
- The discussion could explore future extensions of XRRmeta, such as:
- Applications to other types of rare events data or different effect measures.
- Enhancements to reduce computational complexity.
5. Interpretation of Findings:
- The statement that "XRRmeta uniformly maintained the type I error without yielding overly conservative inference" (lines 430–431) could be better supported by specific results from the simulations or real data examples. For instance:
- Referencing specific CI widths or comparisons with classical methods would make this conclusion more robust.
- The practical significance of the findings in the face mask study is underexplored. For example:
- How does XRRmeta's result influence the interpretation of face mask efficacy compared to previous methods?
6. Typographical and Stylistic Issues:
- Line 419: "Unlike classical methods, the coverage of the CI from our method is guaranteed to be at or above the nominal level" could be rephrased for better readability, e.g., "Unlike classical methods, our method guarantees CI coverage at or above the nominal level."
- Line 428: "This finding was consistent across various settings" could specify which settings or provide an example for clarity.
- Line 449: The phrase "if preserving type I error is the priority" is slightly awkward. Consider rephrasing to "when controlling type I error is prioritized."
Comments on the Quality of English Language
English Language is fine.
Author Response
Thank you for your careful reading of our manuscript. The revisions from round 2 can be found in red text for easy identification. We believe your suggestions have made for a much stronger paper, and we appreciate your time.
Section 1. Introduction
General Writing and Clarity
- Terminology and Clarity:
- The introduction uses specialized terms like "continuity corrections" and "zero-event studies" without providing immediate context or examples, which might be challenging for readers unfamiliar with meta-analytic methods.
We have added more detail to the abstract and introduction to explicitly define a continuity correction (i.e., adding 1 event to arms with zero events). We have also removed the use of “zero-event studies” to explicitly state “studies in which no events have occurred in one or both arms.”
- Flow of Ideas:
- The transition between the practical issues in existing methods and the introduction of "XRRmeta" could be smoother. Currently, it feels abrupt when shifting from describing the challenges to naming the proposed method.
We have reframed the first sentence in the last paragraph of the introduction to reiterate the gaps in current methodology and to explicitly state that XRRmeta fills this gap.
- Context for XRRmeta:
- While the article mentions that "XRRmeta" addresses limitations in existing methods, there is limited explanation of why "exact inference" is particularly advantageous for small, sparse data. Including a brief motivation could strengthen the introduction.
We have noted in the introduction that exact inference remains valid regardless of the number of studies and/or within-study sample sizes included in the analysis, emphasizing the importance of exact inference in the small, sparse data setting that is the focus of the manuscript.
Specific Content Concerns
- Referencing Issues:
- Some citations (e.g., [7], [10-15]) are clustered in a way that might overwhelm readers. It might be helpful to consolidate these into key references or provide a brief summary of their findings.
We have gone through the citations to make sure they are relevant to the given sentence and simplified wherever possible. Specifically, we have simplified (7, 10-15) in the introduction to 3 key references.
- Overgeneralization:
- The claim that "practitioners often apply arbitrary continuity corrections or remove zero-event studies" could benefit from additional substantiation or referencing to avoid sounding anecdotal.
We removed “often” from the sentence.
- Scope of XRRmeta:
- The introduction states that XRRmeta provides valid inference for various conditions but does not detail whether this comes with any trade-offs (e.g., computational complexity). Acknowledging this upfront could improve transparency.
We agree that one of the key risks of exact inference is the additional computational complexity. However, we carefully design the test statistic of XRRmeta so that it can be run on a laptop computer. We state this in the introduction so that practitioners are aware. In the conclusion, we also state the exact computational time for some of our analyses for transparency.
Section 2. Problem Setup
- Terminology Usage:
- The term "nuisance parameter" is introduced but not clearly defined for non-specialist readers. While it is a standard term in statistics, a brief explanation or context would help improve accessibility.
We have defined “nuisance parameter” to let readers know that it is a parameter “that is not of direct interest for inference.”
- Mathematical Notation and Clarity:
- The formula $Y_{ij} \sim \text{Poisson}(N_{ij}\lambda_{ij})$ and its subsequent explanations are clear, but the transition to $\lambda_{ij} = \lambda_{i2} e^{X_{ij}\xi_i}$ could confuse readers unfamiliar with Poisson modeling. A brief sentence connecting these to practical applications might help.
We have clarified that this model is utilized to allow for heterogeneity across studies, as both the baseline risk (i.e., $\lambda_{i2}$) and the log relative risk (i.e., $\xi_i$) may vary across studies, and that it is a common choice for random-effects meta-analyses of a treatment effect.
- The use of $\text{expit}(x) = \frac{1}{1 + e^{-x}}$ is correct, but some readers might not recognize this as the sigmoid/logistic function. Mentioning this could improve clarity.
We have updated the equation to explicitly define the expit function.
- The notation $Y_{i\cdot} = Y_{i1} + Y_{i2}$ is well-defined, but the text doesn't explain why summing across arms is crucial for eliminating the nuisance parameter $\lambda_{i2}$. Adding a sentence here would strengthen understanding.
We have clarified that conditioning on the sufficient statistic is what enables us to eliminate the nuisance parameter.
- Assumptions on Exclusion of Double-Zero (DZ) Studies:
- The justification for excluding DZ studies (those with no events in both arms) is presented as self-evident but could benefit from further elaboration. Why these studies provide no information on relative risk might not be immediately clear to all readers.
We have further clarified that these studies provide no information on the relative risk because the conditional distribution in equation (1) is degenerate (i.e., the binomial distribution has no trials).
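For concreteness, a sketch of the conditioning argument in LaTeX, assuming a balanced design ($N_{i1} = N_{i2}$) so that the conditional success probability reduces to $\pi_i$; the exact form of equation (1) is given in the manuscript.

    % Conditioning on the total event count Y_{i.} eliminates the baseline
    % rate; a double-zero study has Y_{i.} = 0, so the binomial below has
    % zero trials and carries no information about pi_i.
    \[
      Y_{i1} \mid Y_{i\cdot} \sim \mathrm{Binomial}\!\left(Y_{i\cdot},\, \pi_i\right),
      \qquad
      \pi_i = \frac{\lambda_{i1}}{\lambda_{i1} + \lambda_{i2}} .
    \]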
- Parameter of Interest (π_i):
- The introduction of $\pi_i \equiv \text{expit}(\xi_i) = \frac{\lambda_{i1}}{\lambda_{i1} + \lambda_{i2}}$ as a treatment contrast is mathematically correct but may confuse readers due to the indirect way $\lambda_{ij}$ is parameterized. Clarifying the interpretation of $\pi_i$ (e.g., it represents the proportion of events occurring in the treated arm relative to the total event rate) would improve accessibility.
When this parameter is introduced, we have clarified that $\pi_i$ quantifies the magnitude of the event rate in the treated group relative to the cumulative event rate across both treatment arms in the $i$th study.
- The condition $\pi_i \sim \text{Beta}(\alpha_0, \beta_0)$ assumes unimodality and requires $\alpha_0, \beta_0 > 1$, but the implications of these restrictions are not thoroughly discussed.
We note in the current version of the manuscript that the parameter of interest is not identifiable without these restrictions. Without identifiability, the parameter is not estimable from a given dataset. We have added a note regarding estimability. Additionally, we include an example at the bottom of pg. 4 that further clarifies that the model would lead to paradoxical results without this identifiability constraint.
- Practical Relevance:
- The toy example given to demonstrate challenges with specifying the standard deviation for the normal random effects model (lines 129-133) is useful but underexplained. For instance:
- The choice of $\xi_i \sim N(10, 1000)$ seems arbitrary. Clarifying why this example is relevant and how it connects to real-world applications would make the example more compelling.
- The conclusion drawn from this example could be stated more explicitly to guide the reader.
We have clarified that the purpose of this example is to illustrate that it is unclear how to naturally constrain the standard deviation of the normal distribution to prevent this paradoxical behavior (i.e., there is positive probability that $\xi_i$ could come from a $N(10, 1000)$ distribution even though it should be very negative), therefore limiting the utility of this approach in real-world analyses.
- Constraints on Parameters:
- The derivation of constraints for $\nu_0$ (e.g., $\nu_0 \leq \mu_0(1-\mu_0)\min\left(\frac{\mu_0}{1+\mu_0}, \frac{1-\mu_0}{2-\mu_0}\right)$) is mathematically dense and lacks sufficient explanation of its practical implications. Why these constraints are essential should be clarified.
These are not new constraints, but rather a consequence of our identifiability condition that $\alpha_0, \beta_0 > 1$ and the reparameterization given in equation (2). We have clarified this to make it evident to readers.
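For readers who want to see the algebra, a short sketch assuming equation (2) is the usual Beta mean/variance reparameterization (our reconstruction, consistent with the bound quoted above):

    % With mu_0 = alpha_0/(alpha_0 + beta_0) and
    % nu_0 = mu_0(1 - mu_0)/(alpha_0 + beta_0 + 1), we have
    % alpha_0 + beta_0 = mu_0(1 - mu_0)/nu_0 - 1, so
    \[
      \alpha_0 = \mu_0(\alpha_0 + \beta_0) > 1
      \;\Longleftrightarrow\;
      \nu_0 < \mu_0(1 - \mu_0)\,\frac{\mu_0}{1 + \mu_0},
      \qquad
      \beta_0 = (1 - \mu_0)(\alpha_0 + \beta_0) > 1
      \;\Longleftrightarrow\;
      \nu_0 < \mu_0(1 - \mu_0)\,\frac{1 - \mu_0}{2 - \mu_0},
    \]
    % which together give the stated bound on nu_0.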
- Typographical Errors or Formatting Issues:
- Line 118: "This enables use to utilize" should be "This enables us to utilize."
We have corrected the typo.
- Line 122: The phrase "ensuring the unimodality of the random-effect distribution" might benefit from more precise wording, such as "ensuring that the random-effect distribution is unimodal."
We have reworded this sentence.
Section 3. Methods
- General Organization and Clarity:
- The structure of the section, particularly the distinction between "Exact Inference Procedure," "Test Statistic," and its subsections ("Balanced Design" and "Unbalanced Design"), could be better organized. For instance, the subsection "Test Statistic" is dense, and readers might benefit from a summary of its purpose at the beginning.
To improve clarity, we have renamed the sections to: (1) Overview of the Exact Inference Procedure and (2) Proposed Test Statistic for the Exact Inference Procedure. We have also added a short summary under section (2) to help readers understand the upcoming subsections.
- Mathematical Notation and Clarity:
2.1 Exact Inference Procedure:
- Equation (2): $p(\mu; D_0) = \sup_{\nu} p(\mu, \nu; D_0)$:
- The definition of the profile p-value assumes familiarity with profiling nuisance parameters. Briefly explaining the rationale for taking the supremum over $\nu$ would improve clarity.
We have clarified that the profile $p$ value is the $p$ value for $\mu$ that incorporates uncertainty from $\nu$.
- While the equation is mathematically correct, some readers might benefit from examples or visual aids to understand how $p(\mu; D_0)$ is calculated in practice.
The detailed procedure for utilizing the profile p-value to obtain the confidence interval is provided in Figure 1. While we agree that some readers may be new to profile p-values, it is outside the scope of the paper to detail their calculation. We have therefore provided a key reference on profile likelihood for readers that are unfamiliar with the general concept behind profiling ([30]).
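For reference, the standard test-inversion construction behind this type of procedure (a sketch in our notation; Figure 1 gives the detailed algorithm):

    % The confidence set collects the values of mu that are not rejected at
    % level alpha when the profile p-value is used; because the profile
    % p-value dominates the p-value at the true nuisance value, coverage is
    % at or above the nominal level.
    \[
      \mathrm{CI}_{1-\alpha} = \left\{\mu : p(\mu; D_0) \ge \alpha\right\},
      \qquad
      p(\mu; D_0) = \sup_{\nu} p(\mu, \nu; D_0) \ge p(\mu, \nu_0; D_0).
    \]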
- The description of the Monte Carlo (MC) simulation procedure in Steps 1a and 1b (lines 159–163) does not specify how many replications $M$ are typically needed for practical precision, although this is later discussed in computational details.
We have left these details to their own section, as it is important to first introduce the test statistic before describing the details of the computation.
2.2 Test Statistic:
- Balanced Design:
- The introduction of $Y_{i1}/Y_{i\cdot} \sim \text{Beta-Binomial}(Y_{i\cdot}, \mu_0, \nu_0)$ assumes knowledge of Beta-Binomial distributions. A brief explanation or reference to its properties would make this more accessible.
The Beta-Binomial distribution is first introduced in Section 2.2. We explain there (line 125) that the Beta distribution is used to define the random-effects model that allows for between-study heterogeneity.
- The continuity correction in Equation (3) (line 186) is briefly justified but might appear arbitrary to some readers. More explanation of why $Y_{i1} + 0.5$ is added could strengthen its inclusion.
We selected 0.5 following historical convention and because it is a small number, so as not to introduce significant bias. We have clarified this point and provided a relevant reference.
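As a small illustration (our own, with illustrative counts; one common form of the correction adds 0.5 to the numerator and 1 to the total, and the exact form used in equation (3) is given in the manuscript):

    # A continuity-corrected per-study proportion remains defined and
    # strictly inside (0, 1) even when an arm has zero events.
    Y1 <- c(0, 1, 3, 0, 2)   # illustrative events, treated arms
    Y2 <- c(2, 4, 5, 1, 0)   # illustrative events, control arms
    round((Y1 + 0.5) / (Y1 + Y2 + 1), 3)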
- The variance estimator $\widehat{\text{Var}}(\hat{\mu})$ (line 192) involves a mix of notations ($K^{-2}$ scaling), which could confuse readers. Adding a step-by-step derivation would improve clarity.
We have clarified that this variance calculation follows from the definitions of the moments in equation (3).
- Unbalanced Design:
- The resampling procedure for unbalanced designs (lines 214–218) is introduced without much context. Explaining why resampling is necessary and how it ensures computational efficiency could help the reader follow.
The beginning of the section clarifies that we select weighted moment estimators, rather than estimators requiring iterative calculation, such as maximum likelihood estimators, as the latter would substantially increase the computation time of XRRmeta. This is the primary reason for this approach.
- The description of the hypergeometric distribution for sampling probabilities ($p_{i1l}$) is correct but not intuitive. A small numerical example or diagram could clarify the approach.
- The redefinition of variables (e.g., $D_i^c$, $p_{i1l}^c$) and their use in deriving $\tilde{\mu}$ and $\tilde{\nu}$ (lines 229–237) is dense. Simplification or splitting this derivation into steps would improve readability.
While we appreciate your suggestion, we feel that it is beyond the scope of the manuscript to devote a diagram to clarifying the use of the hypergeometric distribution, as a reader must have knowledge of this distribution a priori. We have, however, clarified the intuition for the sampling weights. Additionally, we have included a detailed numerical example in this section to help clarify the proposed procedure to readers, which we believe is more illustrative than a diagram.
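As a generic numerical illustration of hypergeometric sampling probabilities (not the manuscript's exact definition of $p_{i1l}$; all counts are illustrative):

    # Drawing 10 subjects without replacement from a study with 30 treated
    # and 70 control subjects: probability that l of the draws are treated.
    treated <- 30; control <- 70; draws <- 10
    probs <- dhyper(0:draws, m = treated, n = control, k = draws)
    round(setNames(probs, paste0("l=", 0:draws)), 3)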
2.3 Simplifications and Approximations:
- The stochastic dominance assumption (lines 247–249), while intuitively explained, lacks formal justification or references to prior work validating this property. Including a brief statement about its empirical or theoretical support would enhance credibility.
This property has not been verified theoretically, but it has been evaluated empirically in our previous manuscript ([29]). We have clarified this point and added an appropriate reference.
- Computational Efficiency:
- While computational optimizations are well-discussed, the potential trade-off between speed and accuracy is not adequately addressed. For example, using fewer grid points or fewer MC iterations could affect the confidence interval's precision or robustness.
- The suggestion to use $M \geq 2000$ Monte Carlo replications for p-value estimation (line 263) is reasonable but lacks evidence or references to justify this threshold.
We appreciate your suggestion and have added this point to the manuscript. We have justified our choice of 2000 replications as it enables us to calculate the p-value with a reasonable amount of precision. We have also utilized and evaluated this choice in our prior work ([29]) and have added the relevant reference to this section.
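For context, a back-of-the-envelope calculation (ours, not a result from the manuscript): the Monte Carlo standard error of an estimated p-value near the 0.05 threshold with $M = 2000$ replications is roughly 0.005.

    # Monte Carlo standard error of an estimated p-value near 0.05.
    M <- 2000
    p <- 0.05
    sqrt(p * (1 - p) / M)   # roughly 0.005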
- Practical Implementation and Examples:
- The lack of practical examples in this section makes the methods difficult to follow for readers without strong mathematical backgrounds. For instance, applying the XRRmeta steps to a simple dataset would make the explanation more tangible.
We appreciate your suggestion. XRRmeta is available in an R package and only requires a single function call. We have included R code in the supplement so that readers are familiar with how to implement XRRmeta on a single dataset.
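To give readers a sense of what we mean by a single function call, a sketch of the kind of invocation involved; the function and argument names below are hypothetical placeholders, and the actual interface is documented in the package and the supplementary R code.

    # Per-study 2x2 counts (illustrative data).
    dat <- data.frame(
      y1 = c(1, 0, 2),      # events, treated arm
      n1 = c(120, 90, 200), # sample size, treated arm
      y2 = c(3, 1, 5),      # events, control arm
      n2 = c(118, 95, 210)  # sample size, control arm
    )
    # Hypothetical call (placeholder name, not the package's actual API):
    # result <- xrrmeta(y1 = dat$y1, n1 = dat$n1, y2 = dat$y2, n2 = dat$n2)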
- Figure 1 is referenced but does not provide sufficient visual clarity on the overall process. Including illustrative examples for each step would greatly enhance comprehension.
While we appreciate your suggestion and have discussed it amongst ourselves, we have found that using a list to describe the proposed procedure provides more clarity. We have, however, included a visual of the $(\mu, \nu)$ parameter space so that readers have a visualization available to further clarify the overall procedure.
- Typographical and Stylistic Issues:
- Line 149: "The unconditional test eliminates $\nu_0$" might be rephrased for clarity, e.g., "The unconditional test addresses $\nu_0$ by profiling out its effect."
Following your earlier suggestion, we have updated this sentence to clarify what the profile p-value is and to provide the intuition that it addresses the uncertainty in $\nu$.
- Line 184: There is a missing space in "Y_{i1}/Y_{i\cdot}" in one instance.
We have checked line 184 and the typesetting is correct for the fraction.
- Line 193: "Additional to computational efficiency" could be rephrased as "In addition to computational efficiency."
We have corrected the typo.
- Line 214: "Exposition" could be replaced with "illustration" for better readability.
We have updated the wording.
Section 5. Discussion
- Clarity and Organization:
- The discussion section lacks clear subsections or organization to separate key points, such as the implications of findings, limitations, and future directions. Structuring the section would make it easier to follow.
While we appreciate your suggestion, we looked through some methodological papers in Stats. It seems most follow the same style, with a single section for “Discussion” or “Conclusion” (e.g., https://www.mdpi.com/2571-905X/7/3/56). This is also a common style for other statistical journals, so we have decided to keep the current format.
- The discussion begins immediately with a summary of the method's advantages (lines 418–421), but it does not recap the key findings from the results section. A brief summary of major results would provide context for the discussion.
We have restructured the discussion to begin with a discussion of the results.
- Key Points Missing or Underexplored:
- Implications for Real-World Applications:
- While the advantages of XRRmeta in controlling type I error are highlighted, the implications of these findings for practical applications in meta-analysis are not sufficiently explored. For instance:
- How does XRRmeta's ability to control type I error impact decision-making in high-stakes settings like drug safety or public health policies? Are there specific scenarios where XRRmeta should be preferred over traditional methods?
We appreciate your suggestion and have added the relevance to drug safety and public health policies in the discussion section.
- Comparison with Other Methods:
- The discussion notes the limitations of other methods (e.g., inflated type I error in classical approaches, lines 430–431), but it does not fully contextualize these issues in terms of practical impact.
We have followed your prior suggestion to include examples, such as drug safety and public health studies, to clarify the importance of preserving type I error.
- A more explicit comparison of XRRmeta with competing methods in terms of strengths and weaknesses (e.g., handling heterogeneity, computational efficiency) would provide a balanced view.
We have added this point to the discussion section.
- Limitations:
- The limitations of XRRmeta are briefly mentioned (lines 444–448), but they are not sufficiently detailed. For instance:
- The computational burden of XRRmeta is described (line 443), but there is no quantitative comparison to other methods or examples to illustrate the potential trade-offs between computational cost and precision.
We have provided the time it takes to run XRRmeta in the discussion. We have also included the time it takes for the comparison methods, which are available via the metabin function in the meta R package. With only 5 days for revision, it is not possible to provide a more in-depth analysis. We have also considered the computational complexity of exact inference in our prior work (e.g., [29]; Michael et al., Biometrics, 2019).
- The discussion acknowledges that XRRmeta is conservative and prioritizes type I error control (line 449), but the implications for power in real-world settings are not discussed in depth. For example:
- Under what conditions might XRRmeta's conservatism lead to false negatives or missed signals of treatment effects?
We have added details to the discussion as mentioned in a prior response.
- The choice of test statistic (lines 440–442) is noted as a potential limitation, but no discussion is provided about whether alternative statistics were explored or how they might affect results.
We mention other test statistics in Section 3.2.1 (i.e., a test statistic based on the inverse variance estimator in the last paragraph) and in Section 3.2.2 (i.e., a test statistic using the maximum likelihood estimator in the first paragraph).
- Future Directions:
- While future directions are mentioned (lines 450–452), they lack specificity. For instance:
- What specific scenarios or datasets will the validation study involve?
- What methodology will be used to assess the trade-offs between conservatism and power in rare events meta-analyses?
- How will sample size calculations be developed and validated for XRRmeta?
Our prior statements regarding future work were motivated by the recent work by Tsujimoto et al. ([34]). The authors conducted a large study across 885 meta-analyses in Cochrane reviews to evaluate the impact of continuity correction methods in Cochrane reviews of rare events with single-zero trials. We have clarified that we plan to conduct a validation study across a large number of rare-events meta-analyses following Tsujimoto et al. with the goals of (i) investigating the impact of XRRmeta in favoring conservatism and (ii) developing sample size calculations.
- The discussion could explore future extensions of XRRmeta, such as:
- Applications to other types of rare events data or different effect measures.
- Enhancements to reduce computational complexity.
We appreciate your suggestions and have added them to the discussion section.
- Interpretation of Findings:
- The statement that "XRRmeta uniformly maintained the type I error without yielding overly conservative inference" (lines 430–431) could be better supported by specific results from the simulations or real data examples. For instance:
- Referencing specific CI widths or comparisons with classical methods would make this conclusion more robust.
Our statement follows from Figures 3 and S1. We have clarified this in the text.
- The practical significance of the findings in the face mask study is underexplored. For example:
- How does XRRmeta's result influence the interpretation of face mask efficacy compared to previous methods?
The justification for the choice of the face mask study is provided at the beginning of Section 5. The study was chosen because the event rate is higher and due to its public health relevance. With respect to the conclusions, there is strong evidence of reduced person-to-person virus transmission with face mask use across all of the methods. These results re-emphasize the findings in our simulation studies, which illustrate that XRRmeta is not substantially underpowered relative to the comparison methods we have considered, particularly in this example with few studies.
- Typographical and Stylistic Issues:
- Line 419: "Unlike classical methods, the coverage of the CI from our method is guaranteed to be at or above the nominal level" could be rephrased for better readability, e.g., "Unlike classical methods, our method guarantees CI coverage at or above the nominal level."
This sentence has been removed as we restructured the discussion.
- Line 428: "This finding was consistent across various settings" could specify which settings or provide an example for clarity.
We have clarified that the settings are those considered in Section 4.1.
- Line 449: The phrase "if preserving type I error is the priority" is slightly awkward. Consider rephrasing to "when controlling type I error is prioritized."
We have updated the sentence.
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for inviting me to review this interesting manuscript.
Author Response
Thank you for helping us to improve the manuscript.