Article

Adaptive Bayesian Interval Estimation for Rare Binomial Events: A Variance-Blending Calibration Framework

1 Department of Mathematical Sciences, Sol Plaatje University, Kimberley 8300, South Africa
2 Department of Mathematics and Computer Science, Modern College of Business and Science, Muscat 133, Oman
3 Department of Statistics and Operations Research, University of Limpopo, Polokwane 0727, South Africa
* Authors to whom correspondence should be addressed.
Mathematics 2025, 13(24), 3988; https://doi.org/10.3390/math13243988
Submission received: 30 October 2025 / Revised: 27 November 2025 / Accepted: 11 December 2025 / Published: 14 December 2025
(This article belongs to the Special Issue Advances of Applied Probability and Statistics)

Abstract

Classical binomial interval methods often exhibit poor performance when applied to extreme conditions, such as rare-event scenarios or small-sample estimations. Recent frequentist and Bayesian approaches have improved coverage in small samples and rare events. However, they typically rely on fixed error margins that do not scale with the magnitude of the proportion. This distorts uncertainty quantification at the extremes. As an alternative method to reduce these boundary distortions, we propose a novel hybrid approach. It blends Bayesian, frequentist, and approximation-based techniques to estimate robust and adaptive intervals. The variance incorporates sampling variability, Wilson score margin of error, a tuned credible level, and a gamma regularization term that is inversely proportional to sample size. Extensive simulation studies and real-data applications demonstrate that the proposed method consistently achieves better coverage proportions at all sample sizes and proportions. It provides more conservative interval widths below a sample size of 50 and competitively narrower widths from moderate to large sample sizes, especially beyond 50, compared to the Jeffreys’ and Wilson score intervals. Geometric analysis of the tuning curves demonstrates how the blended method adaptively tunes credible levels across binomial extremes. It starts at higher values for small samples and gradually flattens into near-linear, symmetric trajectories as sample size increases. This ensures robust coverage and balanced sensitivity. Our method offers a theoretically grounded, computationally efficient, and practically robust estimation of rare-event intervals. These intervals have applications in safety-critical reliability, epidemiology, and early-phase clinical trials.

1. Introduction

Rare-event phenomena such as emerging disease outbreaks, earthquakes, and industrial defects are typically characterized by extremely low probability of occurrence. The estimation of binomial proportions for such events poses significant challenges, particularly due to the inadequacy of conventional confidence interval methods. Classical approaches, such as the Wald interval, frequently perform poorly for these events due to their reliance on asymptotic normality, which fails as the true proportion approaches the extreme boundaries of the binomial proportions [1,2]. The Clopper–Pearson interval also tends to be overly conservative, thus producing wide intervals that result in statistical inefficiency [3]. Conversely, asymptotic approximation methods such as the Wald interval may yield substantial under-coverage probabilities when the event probability is small [2].
To address the weaknesses of the traditional confidence interval estimations under rare-event conditions, many frequentist alternative approaches [4,5,6,7,8,9,10] and Bayesian alternatives [10,11] have been proposed. Although these methods have demonstrated improvements in coverage probability, especially in small samples or extreme and rare events, they are built under the assumption of a fixed error margin, which is not plausible for events with proportions at the extreme ends of the probability spectrum (0 and 1). This is because such margins fail to scale with the magnitude of the proportion being estimated, which may lead to under- or overestimated uncertainties, depending on the location of the proportion within the parameter space [12]. This results in inaccurate estimates because the intervals are either too wide to be practically useful or too narrow to be valid, depending on the sample size and the rarity of the event [13].
The incorporation of adaptive margins that scale with the magnitude of the proportion into existing classical methods has greatly improved the accuracy of both the coverage probability and the interval width of rare events [13]. Although alternative adaptive Bayesian methods—such as the adaptive Huber's ε-contamination model, which assumes that the error margin (ε) is unknown and then adaptively estimates it—have been shown to perform well in estimating probabilities on a continuous scale, they tend to underperform in small-sample or rare-event scenarios [14]. Bayesian adaptive prior methods have also been proposed and shown to perform well in rare events, but they require correct scale parameter tuning, as they may increase the credible width when incorrectly tuned or specified [15]. As an alternative, we introduce an adaptive Bayesian variance-blending calibration framework (hereafter known as the blended approach or method).
The proposed method offers several key advantages. It significantly improves coverage accuracy by integrating information from the Beta distribution via Jeffreys’ prior, the Wilson score, and a credible-level tuning parameter, as well as a gamma regularization parameter which enables the targeted nominal coverage to be maintained more efficiently across a wide range of sample sizes and true proportions. The use of the Beta distribution and logit transformations stabilizes estimates near the extremes; thus, our method is robust for small sample sizes and extreme proportions near 0 or 1, which provides more accurate and stable intervals. Additionally, by combining multiple sources of variability, the approach avoids overly conservative or overly optimistic interval widths, resulting in intervals that are reasonably narrow without sacrificing coverage. The method, therefore, strikes an optimal balance between precision and coverage. The incorporation of the gamma and the credible level parameters allows the intervals to adapt to the sample size and true proportions, which accounts for sampling uncertainty and potential overdispersion. Computationally, the proposed method is fully vectorizable and memory-efficient, thus enabling seamless integration into automated workflows and statistical software. Furthermore, its modular structure allows for rapid diagnostics and tuning, facilitating iterative refinement without computational bottlenecks.
This paper makes contributions to the statistical theory of estimation by introducing a blended confidence interval method that inherits asymptotically desirable properties (consistency, efficiency, and asymptotic normality) of frequentist and model-based inferences. Specifically, as n → ∞, the proposed interval exhibits consistency, asymptotic normality, and nominal coverage probability. The blending mechanism is theoretically grounded in logit-scale transformations and adaptive variance blending, ensuring that the method remains robust for small samples and across extreme events. Furthermore, this paper makes practical contributions in that the method is robust and adaptive to extreme and rare events, as well as a small sample size. It also yields high coverage with conservative width, thus making it more adaptable to problems in various fields and applications. The rest of the paper is outlined as follows. The theoretical framework is presented in Section 2, while the simulation design is described in Section 3. The results and discussion are then provided in Section 4, and the paper concludes with a summary in Section 5.

2. Theoretical Framework

In this section, we provide details of the traditional Bayesian interval estimation approach and then extend it to an adaptive version by incorporating our proposed margin correction and variance blending method. We also provide the methodology for the Wilson approach as well as methods for assessing the performances of the three methods. Accurate interval estimation for binomial proportions is fundamental in statistical inference. Bayesian methods offer probabilistically coherent intervals that are derived from the posterior distribution. For binomially distributed data, Bayesian updating with conjugate Beta priors yields Beta-distributed posteriors, thus enabling straightforward computation of credible intervals. However, traditional credible intervals may be too conservative or too narrow, especially for small samples with extreme proportions. To address this, we introduce a blended variance calibration framework that incorporates a tuned credible level within a Bayesian framework to enable adaptive uncertainty quantification.

2.1. Standard Bayesian Credible Interval Using the Beta Distribution and Jeffreys’ Prior

Consider a sample size $n$ (number of trials) and number of successes $k$, where $k \sim \mathrm{Binomial}(n, p)$ and $p$ is the true underlying success probability. The goal is to construct interval estimates of $p$ (via $\hat{p}$) using our proposed method, the standard Bayesian method with Jeffreys' prior, and the Wilson score interval.

2.1.1. Formulation

Assume that $p$ follows Jeffreys' prior [16], such that
$$p \mid k \sim \mathrm{Beta}\left(\tfrac{1}{2} + k,\ \tfrac{1}{2} + n - k\right).$$
Denoting $a = \tfrac{1}{2} + k$ and $b = \tfrac{1}{2} + n - k$, with credible level $1-\alpha$, the $100(1-\alpha)\%$ credible interval is defined by Equation (2) as
$$\left[\mathrm{Beta}^{-1}\!\left(\tfrac{\alpha}{2}, a, b\right),\ \mathrm{Beta}^{-1}\!\left(1 - \tfrac{\alpha}{2}, a, b\right)\right],$$
where $\mathrm{Beta}^{-1}(q, a, b)$ is the inverse CDF (percent point function) of the Beta distribution. Given the posterior mean and variance of the $\mathrm{Beta}(a, b)$ posterior,
$$E(p) = \frac{a}{a+b} \quad \text{and} \quad \mathrm{Var}(p) = \frac{ab}{(a+b)^2 (a+b+1)},$$
we define the credible width as
$$\mathrm{Beta}^{-1}\!\left(1 - \tfrac{\alpha}{2}, a, b\right) - \mathrm{Beta}^{-1}\!\left(\tfrac{\alpha}{2}, a, b\right).$$
This is a quantile-based width and, although it narrows as n → ∞, it does not scale directly with the posterior variance. Quantiles reflect cumulative distribution and not local spread. Therefore, the standard Bayesian and the Jeffreys’ interval shrink as sample size increases but do not scale proportionally with the posterior variance; hence, their quantile spreads are fixed. The analysis below explains why the standard Bayesian credible interval width does not scale proportionally with variance.
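For concreteness, the Jeffreys' interval in Equation (2) can be computed directly with the Beta quantile function. The following is a minimal R sketch (the function name and example values are ours, not from the paper):

# Equal-tailed Jeffreys' credible interval (Equation (2)).
# k = number of successes, n = number of trials, 1 - alpha = credible level.
jeffreys_interval <- function(k, n, alpha = 0.05) {
  a <- 0.5 + k
  b <- 0.5 + n - k
  c(lower = qbeta(alpha / 2, a, b),
    upper = qbeta(1 - alpha / 2, a, b))
}

# Illustrative rare-event case: 1 success in 50 trials.
jeffreys_interval(k = 1, n = 50)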

2.1.2. Non-Proportionality Scaling of the Bayesian Credible Width with Variance

Let the quantile function $Q_q(a, b)$ be implicitly defined through the Beta CDF as
$$F\!\left(Q_q(a, b)\right) = \int_{0}^{Q_q(a, b)} f(t)\, dt = q, \quad \text{where} \quad f(t) = \frac{t^{a-1}(1-t)^{b-1}}{B(a, b)}.$$
The definition in Equation (5) states that $Q_q(a, b)$ is the value at which the cumulative distribution function (CDF) equals $q$, i.e., $F(Q_q) = q$. Using the chain rule,
$$\frac{d}{dq} F(Q_q) = \frac{dF}{dt}\cdot\frac{dQ_q}{dq} = f(Q_q)\,\frac{dQ_q}{dq}.$$
However, since $F(Q_q) = q$, differentiating both sides gives $\frac{d}{dq} q = 1$, so $f(Q_q)\,\frac{dQ_q}{dq} = 1$. Therefore,
$$\frac{dQ_q}{dq} = \frac{1}{f(Q_q)}.$$
Substituting the Beta density from Equation (5) into Equation (7) yields
$$\frac{dQ_q}{dq} = \frac{1}{f(Q_q)} = \frac{B(a, b)}{Q_q^{\,a-1}\,(1 - Q_q)^{\,b-1}}.$$
Now, let $q_0$ be the central quantile and $\Delta q = \tfrac{\alpha}{2}$ for a $100(1-\alpha)\%$ interval; the lower and upper quantiles are respectively defined by $Q_l = Q(q_0 - \Delta q)$ and $Q_u = Q(q_0 + \Delta q)$. Let the interval width be denoted by $w = Q_u - Q_l$. Using a first-order Taylor expansion, it can be shown that
$$Q(q_0 \pm \Delta q) \approx Q(q_0) \pm \left.\frac{dQ_q}{dq}\right|_{q_0}\!\Delta q \quad \text{and} \quad w \approx 2\left.\frac{dQ_q}{dq}\right|_{q_0}\!\Delta q.$$
Therefore, by substituting the quantile derivative, the interval width becomes
$$w \approx \frac{2\,\Delta q\, B(a, b)}{Q(q_0)^{\,a-1}\,\bigl(1 - Q(q_0)\bigr)^{\,b-1}}.$$
From the quantile derivative in Equation (8)
$$\left.\frac{dQ_q}{dq}\right|_{q_0} \to \infty \quad \text{if } a < 1, \ \text{since } Q(q_0)^{\,a-1} \to 0 \ \text{as } Q(q_0) \to 0.$$
Similarly,
$$\left.\frac{dQ_q}{dq}\right|_{q_0} \to \infty \quad \text{if } b < 1, \ \text{since } \bigl(1 - Q(q_0)\bigr)^{\,b-1} \to 0 \ \text{as } Q(q_0) \to 1.$$
The implications are that near the boundaries (0 and 1), the quantile function becomes extremely sensitive to small changes in q, especially when a < 1 or b < 1, which leads to distorted interval widths in the tails. Therefore, width behaves nonlinearly, especially at the boundaries. This explains why the intervals stretch in the tails and the width does not scale proportionally with variance.
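The tail sensitivity of the Beta quantile function can also be examined numerically. Below is a small R sketch (shape parameters and the tail quantile chosen by us for illustration) comparing a finite-difference estimate of the quantile derivative with the analytic form 1/f(Q_q) from Equation (7):

# Numerical check of dQ_q/dq = 1 / f(Q_q) for a Beta posterior (Equation (7)).
# Illustrative shape parameters, e.g., a = 0.5 + k, b = 0.5 + n - k with k = 0, n = 9.
a <- 0.5
b <- 9.5
q <- 0.975          # a tail quantile, where the derivative is large
h <- 1e-6
numeric_deriv  <- (qbeta(q + h, a, b) - qbeta(q - h, a, b)) / (2 * h)
analytic_deriv <- 1 / dbeta(qbeta(q, a, b), a, b)
c(numeric = numeric_deriv, analytic = analytic_deriv)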

2.2. Wilson Score Interval

Given the z-score $z = \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right)$, the confidence level $1-\alpha$, and $\hat{p} = \tfrac{k}{n}$, where $k$ and $n$ are as previously defined, the Wilson score interval [2] is formulated as
$$\left[\ \frac{\hat{p} + \frac{z^2}{2n} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}},\ \ \frac{\hat{p} + \frac{z^2}{2n} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}\ \right].$$
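A direct R transcription of this closed form is given below as a sketch (the function name is ours; the truncation to [0, 1] mirrors Step 3 of Section 3.1):

# Wilson score interval (Section 2.2), written directly from the closed form.
wilson_interval <- function(k, n, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  p_hat <- k / n
  centre <- (p_hat + z^2 / (2 * n)) / (1 + z^2 / n)
  margin <- (z / (1 + z^2 / n)) * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2))
  c(lower = max(0, centre - margin), upper = min(1, centre + margin))
}

wilson_interval(k = 1, n = 50)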

2.3. Adaptive Bayesian-Logit-Scaled Variance-Blending Calibration Framework

Here we present a detailed formulation of our proposed method. We then prove its width’s proportionality to the margin of error and how the margin of error also adapts to the sample size. We further assess the asymptotic properties of the method.

2.3.1. Formulation

Using the Jeffreys’ Beta posterior prior [16], we transform the lower and the upper confidence limits by first scaling it such that
l = Beta 1 1 ϕ 2 , a , b   and   u = Beta 1 1 1 ϕ 2 , a , b
where   ϕ   is the credibility multiplier representing the coverage level, and it determines how wide the interval is based on the desired confidence level. It therefore serves as a regularization parameter, and it allows the calibration (via optimization) to shrink extreme intervals to stabilize the estimates. Now let us transform the interval on a logit scale as follows,
$$\mathrm{Logit}_L = \log\!\left(\frac{l}{1-l}\right) \quad \text{and} \quad \mathrm{Logit}_U = \log\!\left(\frac{u}{1-u}\right).$$
The center and the half-width are computed as
$$c = \frac{\mathrm{Logit}_L + \mathrm{Logit}_U}{2} \quad \text{and} \quad h = \frac{\mathrm{Logit}_U - \mathrm{Logit}_L}{2}.$$
Definition 1. 
Given the sampling variance $\mathrm{var}_{\mathrm{sample}}$, the model-based variance $\mathrm{var}_{\gamma}$, and the Wilson variance approximation $\mathrm{MOE}_{\mathrm{Wilson}}$ defined by
$$\mathrm{var}_{\mathrm{sample}} = \frac{1}{n\,\hat{p}(1-\hat{p}) + \epsilon}, \quad \mathrm{var}_{\gamma} = \frac{\theta}{n^{2}}, \quad \text{and} \quad \mathrm{MOE}_{\mathrm{Wilson}} = \left( z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^{2}}{4n^{2}}} \right)^{\!2},$$
the blended variance and the corresponding standard error are given by
$$\mathrm{var}_{\mathrm{Blend}} = \mathrm{var}_{\gamma} + \mathrm{var}_{\mathrm{sample}} + \mathrm{MOE}_{\mathrm{Wilson}} \quad \text{and} \quad SE_{\mathrm{Blend}} = \sqrt{\mathrm{var}_{\mathrm{Blend}}},$$
where $\epsilon = 1 \times 10^{-8}$ is a correction factor that prevents division by zero.
Now, substituting the blended standard error in Definition 1 into the half-width in Equation (12) yields the adaptive margin of error in Equation (13) below.
$$\delta = h \cdot \frac{SE_{\mathrm{Blend}}}{\max\!\left(SE_{\mathrm{Blend}},\ \epsilon\right)}.$$
We then back-transform the logit-scale interval to the probability scale to obtain the credible interval
$$\left[\max\!\left(0,\ \frac{1}{1 + e^{-(c - \delta)}}\right),\ \min\!\left(1,\ \frac{1}{1 + e^{-(c + \delta)}}\right)\right].$$
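Putting the pieces of Section 2.3.1 together, the blended interval can be sketched in R as below. This is a minimal illustration under our reading of Definition 1 (sampling variance 1/(n p̂(1−p̂)+ε), gamma term θ/n², squared Wilson margin), with θ = 1 as in Section 3.3 and a supplied tuned credible level ϕ; it is not the authors' reference implementation.

# Sketch of the adaptive Bayesian variance-blending interval (Section 2.3.1).
# phi = tuned credible level, theta = gamma regularization (fixed at 1 in Section 3.3).
blended_interval <- function(k, n, phi = 0.985, theta = 1, alpha = 0.05, eps = 1e-8) {
  a <- 0.5 + k
  b <- 0.5 + n - k
  p_hat <- k / n
  z <- qnorm(1 - alpha / 2)

  # Scaled Beta limits and their logit transform
  l <- qbeta((1 - phi) / 2, a, b)
  u <- qbeta(1 - (1 - phi) / 2, a, b)
  logit <- function(x) log(x / (1 - x))
  ctr <- (logit(l) + logit(u)) / 2                      # centre c
  hw  <- (logit(u) - logit(l)) / 2                      # half-width h

  # Blended variance (Definition 1) and adaptive margin (Equation (13))
  var_sample <- 1 / (n * p_hat * (1 - p_hat) + eps)
  var_gamma  <- theta / n^2
  moe_wilson <- (z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)))^2
  se_blend   <- sqrt(var_sample + var_gamma + moe_wilson)
  delta      <- hw * se_blend / max(se_blend, eps)

  # Back-transform to the probability scale (Equation (14))
  inv_logit <- function(x) 1 / (1 + exp(-x))
  c(lower = max(0, inv_logit(ctr - delta)), upper = min(1, inv_logit(ctr + delta)))
}

blended_interval(k = 1, n = 50, phi = 0.985)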
It is observed in Definition 1 that the blended variance assumes an additive structure without explicit weights. The additive structure of the blended variance is rooted in decision-theoretic principles and variance decomposition under quadratic risk. In classical risk frameworks, particularly under squared-error loss, the total risk is expressed as the sum of independent variance components, where each represents a distinct source of uncertainty [17,18]. This additive property ensures interpretability and coherence because each term contributes proportionally to the overall uncertainty without requiring complex interactions. Introducing explicit weights would necessitate estimating additional hyperparameters, which increases computational complexity and introduces potential instability, especially in small samples where parameter estimation is inherently noisy. Weighted schemes also risk overfitting specific scenarios; thus, they reduce generalizability across rare-event regimes. By contrast, the unweighted additive form provides a parsimonious solution that is theoretically justified and empirically robust. Furthermore, asymptotic analysis (see Section 2.3.3) confirms that all components decay at a rate O(1/n), which ensures that none of the sources dominate as the sample size grows. This property aligns with minimax risk principles thus emphasizing robustness and stability over aggressive shrinkage, which is critical for rare-event inference where under-coverage can have severe consequences.
The variance blending technique integrates frequentist variance decomposition, Bayesian regularization, and decision-theoretic risk aggregation [2,17,18,19]. Each component serves a unique role in addressing limitations of traditional interval estimation methods. The sampling variance captures uncertainty inherent in binomial data and is derived from the Fisher information on the logit scale. Its magnitude increases near the boundaries of the parameter space (0 or 1) and in small samples, thus reflecting the heightened variability in these situations. The gamma regularization term introduces a prior-informed penalty that stabilizes estimates when data are sparse or proportions are extreme. This term acts as a safeguard against erratic behavior by damping excessive variability. However, its influence diminishes with increasing sample size, thus preserving asymptotic efficiency. The Wilson margin penalty complements these adjustments by correcting for skewness and boundary effects which ensures conservative interval widths in low-proportion regions where classical methods often fail. Together, these components provide a balanced framework that adapts dynamically to sample size and proportion, hence mitigating distortions without sacrificing efficiency.
Regularization within the blended variance is further enhanced through credible level tuning, which optimizes the trade-off between coverage probability and interval width. This adaptive calibration ensures that intervals remain sufficiently wide to maintain nominal coverage in rare-event scenarios while avoiding unnecessary conservatism in moderate regimes. The integration of Bayesian and frequentist elements provides dual advantages. While the Bayesian priors stabilize inference near boundaries, the frequentist components maintain desirable asymptotic properties such as consistency and efficiency. Practically, this design offers robustness across diverse applications such as epidemiology, reliability engineering, and early-phase clinical trials, where rare events and small samples are common.
The credible interval constructed in Equation (14) is built on several key assumptions that ensure its theoretical validity and practical relevance. These assumptions include the following:
  • The observations are sampled from independent Bernoulli trials with a fixed number of trials n and constant success probability p.
  • Each trial is independent, with no correlation or clustering effects.
  • The Bayesian component assumes a non-informative Jeffreys’ prior to stabilize inference near boundaries.
  • Adjusting the credible level within the specified grid (e.g., 98–99.9%) achieves near-nominal frequentist coverage without introducing bias.
  • Gamma regularization scales inversely with sample size. This means the term acts as a penalty that stabilizes variance in small samples and diminishes as n grows, thus preserving asymptotic efficiency.
  • Transforming the intervals to the logit scale is assumed to improve numerical stability and symmetry near extreme proportions.
  • The sampling variance, Wilson margin penalty, and Gamma regularization are assumed to combine additively without violating consistency or efficiency.
  • For large sample sizes, the method assumes convergence to normality on the logit scale, thus ensuring consistency and efficiency as n → ∞.

2.3.2. Proportionality Scaling of the Credible Width with the Variance

Let $\Delta q = \frac{1-\phi}{2}$, with lower bound $l = Q_{\frac{1-\phi}{2}}(a, b)$ and upper bound $u = Q_{1-\frac{1-\phi}{2}}(a, b)$; then $l = Q(q_0 - \Delta q)$ and $u = Q(q_0 + \Delta q)$. The Beta quantile function $Q_q(a, b)$ is implicitly defined as
$$F\!\left(Q_q(a, b)\right) = \int_{0}^{Q_q(a, b)} f(t)\, dt = q, \quad \text{where} \quad f(t) = \frac{t^{a-1}(1-t)^{b-1}}{B(a, b)}.$$
Applying a first-order Taylor expansion to Equation (15) yields
$$Q(q_0 \pm \Delta q) \approx Q(q_0) \pm \left.\frac{dQ_q}{dq}\right|_{q_0}\!\Delta q \quad \text{and} \quad W \approx 2\left.\frac{dQ_q}{dq}\right|_{q_0}\!\Delta q.$$
The quantile derivative function is obtained as
$$\frac{dQ_q}{dq} = \frac{1}{f(Q_q)} = \frac{B(a, b)}{Q_q^{\,a-1}\,(1 - Q_q)^{\,b-1}}.$$
Therefore, after substituting $\Delta q = \frac{1-\phi}{2}$, the width in Equation (12) becomes
$$W \approx (1 - \phi)\left.\frac{dQ_q}{dq}\right|_{q_0}.$$
Equation (18) implies that the width is proportional to $1 - \phi$; thus, increasing $\phi$ reduces the interval width linearly. To further show that the width is proportional to the posterior variance, we transform the interval to the logit scale by letting
$$\ell(q) = \log\!\left(\frac{Q_q}{1 - Q_q}\right).$$
Applying a first-order Taylor expansion to $\ell(q)$ yields
$$\ell(q_0 \pm \Delta q) \approx \ell(q_0) \pm \left.\frac{d\ell}{dq}\right|_{q_0}\!\Delta q,$$
where, evaluated at $q = q_0$,
$$\frac{d\ell}{dq} = \frac{d\ell}{dQ}\cdot\frac{dQ}{dq} = \frac{1}{Q(1-Q)}\cdot\frac{dQ}{dq} = \frac{B(a, b)}{Q^{\,a}\,(1 - Q)^{\,b}}.$$
Therefore, substituting $\frac{d\ell}{dq}$ and $\Delta q = \frac{1-\phi}{2}$, the width on the logit scale becomes
$$W_{\mathrm{Logit}} \approx (1 - \phi)\,\frac{B(a, b)}{Q^{\,a}\,(1 - Q)^{\,b}}.$$
By the definition of margin of error, our adjusted margin of error on the logit scale becomes
$$\mathrm{MOE}_{\mathrm{logit}} \approx (1 - \phi)\,\frac{B(a, b)}{Q^{\,a}\,(1 - Q)^{\,b}}\sqrt{\frac{1}{n\hat{p}(1-\hat{p}) + \epsilon} + \frac{\theta}{n^{2}} + \left(z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^{2}}{4n^{2}}}\right)^{\!2}}\ ,$$
so that
$$W_{\mathrm{Logit}} \propto (1 - \phi)\sqrt{\frac{1}{n\hat{p}(1-\hat{p}) + \epsilon} + \frac{\theta}{n^{2}} + \left(z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^{2}}{4n^{2}}}\right)^{\!2}}\ .$$
From Equation (21), the logit-scaled half-width is proportional to $1 - \phi$ (the tail probability) and to the blended standard error; hence it scales proportionally with the asymptotic variance, that is, with the uncertainty.

2.3.3. Asymptotic Properties

The margins of error of the Jeffreys' prior interval (through the Bernstein–von Mises theorem), the model-based method, and the Wilson method are asymptotically normal, efficient, and consistent. Therefore, by construction, our blended method inherits these properties. In the items below, we assess these properties to establish the underlying asymptotic and other theoretical properties of the blended method.
(a) Asymptotic efficiency
From Definition 1, the blended variance is obtained as
$$\mathrm{var}_{\mathrm{blend}} = \underbrace{\frac{1}{n\hat{p}(1-\hat{p}) + \epsilon}}_{\text{sampling}} + \underbrace{\frac{\theta}{n^{2}}}_{\text{Gamma}} + \underbrace{\left(z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^{2}}{4n^{2}}}\right)^{\!2}}_{\text{Wilson}}.$$
From this variance, as $n \to \infty$, each term behaves as follows: the sampling term is $O\!\left(\tfrac{1}{n}\right)$, the Gamma term is $O\!\left(\tfrac{1}{n}\right)$, and the Wilson term is $O\!\left(\tfrac{1}{n}\right)$. Therefore, following the Hájek–Le Cam theory of local asymptotic normality and influence functions [20], $\mathrm{var}_{\mathrm{blend}} \to 0$ as $n \to \infty$. Furthermore, $\mathrm{var}_{\mathrm{blend}} \sim O\!\left(\tfrac{1}{n}\right)$ implies $SE_{\mathrm{Blend}} \sim O\!\left(\tfrac{1}{\sqrt{n}}\right)$; hence
$$\delta = h \cdot \frac{SE_{\mathrm{Blend}}}{\max\!\left(SE_{\mathrm{Blend}},\ \epsilon\right)} \sim O\!\left(\frac{1}{\sqrt{n}}\right),$$
which implies that the resulting interval width shrinks at rate $O\!\left(\tfrac{1}{\sqrt{n}}\right)$, thus confirming asymptotic efficiency.
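As an informal numerical illustration of this rate (our own check, not from the paper), one can track the blended width as n grows; under the sketch implementation given earlier, the product of the width and √n should settle to a roughly constant value:

# Rough empirical check of the O(1/sqrt(n)) width rate, using blended_interval()
# from the earlier sketch and an illustrative true proportion of about 0.02.
ns <- c(50, 200, 800, 3200)
widths <- sapply(ns, function(n) diff(blended_interval(k = round(0.02 * n), n = n)))
round(widths * sqrt(ns), 3)   # approximately stable as n grows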
(b) Asymptotic consistency
By the weak law of large numbers, the sample proportion satisfies
$$\hat{p} = \frac{k}{n} \xrightarrow{\ P\ } p.$$
Therefore, for any ε > 0 ,
$$\lim_{n \to \infty} P\!\left(\left|\hat{p} - p\right| > \varepsilon\right) = 0.$$
Furthermore, since we have shown that $\delta \to 0$ as $n \to \infty$, and $\mathrm{logit}(p) = \log\!\left(\frac{p}{1-p}\right)$ is differentiable and locally Lipschitz on (0, 1), then
$$\left(L_n, U_n\right) \to \mathrm{logit}^{-1}\!\bigl(\mathrm{logit}(p)\bigr) = p \quad \text{as} \quad c_n \to \mathrm{logit}(p),$$
where $L_n = \max\!\left(0, \frac{1}{1 + e^{-(c - \delta)}}\right)$, $U_n = \min\!\left(1, \frac{1}{1 + e^{-(c + \delta)}}\right)$, and $c_n = \frac{\mathrm{Logit}(L_n) + \mathrm{Logit}(U_n)}{2}$. Thus, the interval converges to the true population proportion as $n \to \infty$.
(c) Asymptotic normality
Let
$$Z_n = \frac{c_n - \mathrm{logit}(p)}{SE_{\mathrm{blend}}}.$$
Since $c_n \to \mathrm{logit}(p)$ in probability and $SE_{\mathrm{blend}} \to 0$, the numerator $c_n - \mathrm{logit}(p)$ behaves like the mean of i.i.d. terms (via the Beta posterior), so, by the central limit theorem, $Z_n \xrightarrow{\ d\ } N(0, 1)$. Hence,
$$c_n \;\dot{\sim}\; N\!\bigl(\mathrm{logit}(p),\ \mathrm{var}_{\mathrm{blend}}\bigr).$$
This confirms asymptotic normality in logit space. From the blended variance in Definition 1, a leading-order analysis shows that
$$\mathrm{var}_{\mathrm{blend}} \sim \frac{A}{n}, \quad \text{where} \quad A = \frac{1}{\hat{p}(1-\hat{p})} + \theta^{2} + z^{2}\hat{p}(1-\hat{p}).$$
Therefore, following from the asymptotic normality of the observed c n ,
$$c_n \;\dot{\sim}\; N\!\left(\mathrm{logit}(p),\ \frac{A}{n}\right).$$
If we let $g(x) = \frac{1}{1 + e^{-x}}$ and $g'(x) = \frac{1}{1 + e^{-x}}\left(1 - \frac{1}{1 + e^{-x}}\right)$, then by the delta method, if $X_n \xrightarrow{\ d\ } N\!\left(\mu, \frac{\sigma^{2}}{n}\right)$, then
$$g(X_n) \xrightarrow{\ d\ } N\!\left(g(\mu),\ \bigl[g'(\mu)\bigr]^{2}\frac{\sigma^{2}}{n}\right).$$
Therefore,
$$\left(L_n, U_n\right) \xrightarrow{\ d\ } N\!\left(p,\ \bigl[g'(\mathrm{logit}(p))\bigr]^{2}\frac{A}{n}\right).$$
From the results in Equation (27), the unscaled interval limits converge in distribution to a normal law centered at the true parameter, with variance shrinking at rate 1/n. This implies increasing concentration around the true parameter p as the sample size grows. This classical form of asymptotic behavior confirms the consistency of the interval bounds and validates their use as efficient estimators. Moreover, it highlights that confidence intervals become progressively narrower and more accurate in large samples, reinforcing the appropriateness of normal approximations for inference, particularly in transformed parameter spaces such as the logit scale.

3. Simulation Design

3.1. Parameters and Design

Let $k \in \{0, 1, \ldots, n\}$ denote the number of observed successes in a binomial trial of size $n \in \mathbb{N}$, with estimated proportion $\hat{p} = \frac{k}{n}$, where $p \in (0, 1)$ is the true binomial probability. For each combination of $p$ and $n$, we perform $M$ simulation replications following $B$ burn-in samples. Each replication $i \in \{1, \ldots, M + B\}$ consists of a binomial draw with Jeffreys' prior-based posterior inference and three interval construction methods.
For each $i \in \{1, \ldots, M + B\}$ and each combination of $p$ and $n$:
Step 1: Compute
$$k^{(i)} \sim \mathrm{Binomial}(n, p), \qquad \hat{p}^{(i)} = \frac{k^{(i)}}{n}.$$
Step 2: Compute the posterior shape parameters under the Jeffreys' prior $\mathrm{Beta}(0.5, 0.5)$ such that
$$a^{(i)} = 0.5 + k^{(i)}, \qquad b^{(i)} = 0.5 + n - k^{(i)}.$$
Then, construct the central $(1-\alpha) \times 100\%$ posterior interval as
$$\tilde{L}_i = Q_{\frac{\alpha}{2}}\!\left(a^{(i)}, b^{(i)}\right), \qquad \tilde{U}_i = Q_{1-\frac{\alpha}{2}}\!\left(a^{(i)}, b^{(i)}\right).$$
Step 3: Given $z = \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right)$, construct the Wilson score interval as follows.
First, the center and the margin are calculated, respectively, as
$$\tilde{c}_i = \frac{\hat{p}^{(i)} + \frac{z^{2}}{2n}}{1 + \frac{z^{2}}{n}} \quad \text{and} \quad \tilde{m}_i = \frac{z}{1 + \frac{z^{2}}{n}}\sqrt{\frac{\hat{p}^{(i)}\bigl(1 - \hat{p}^{(i)}\bigr)}{n} + \frac{z^{2}}{4n^{2}}}.$$
Then, construct the interval as
$$\left[\max\!\left(0,\ \tilde{c}_i - \tilde{m}_i\right),\ \min\!\left(1,\ \tilde{c}_i + \tilde{m}_i\right)\right].$$
Step 4: Construct the lower and upper bounds of the blended (logit margin-of-error) interval. Using the Jeffreys' posterior, compute the scaled lower and upper bounds as follows:
$$\tilde{L}_i = Q_{\frac{1-\phi}{2}}\!\left(a^{(i)}, b^{(i)}\right), \qquad \tilde{U}_i = Q_{1-\frac{1-\phi}{2}}\!\left(a^{(i)}, b^{(i)}\right).$$
Then, transform the bounds to logit scale as indicated below
$$\tilde{l}_{L,i} = \log\!\left(\frac{\tilde{L}_i}{1 - \tilde{L}_i}\right), \qquad \tilde{l}_{U,i} = \log\!\left(\frac{\tilde{U}_i}{1 - \tilde{U}_i}\right),$$
$$\tilde{c}_i = \frac{\tilde{l}_{L,i} + \tilde{l}_{U,i}}{2}, \qquad \tilde{h}_i = \frac{\tilde{l}_{U,i} - \tilde{l}_{L,i}}{2},$$
$$\tilde{\delta}_i = \tilde{h}_i \cdot \frac{SE_{\mathrm{Blend}}}{\max\!\left(SE_{\mathrm{Blend}},\ \epsilon\right)}, \qquad SE_{\mathrm{Blend}} = \sqrt{\mathrm{var}_{\mathrm{Blend}}}.$$
$$L_i = \max\!\left(0,\ \frac{1}{1 + e^{-(\tilde{c}_i - \tilde{\delta}_i)}}\right), \qquad U_i = \min\!\left(1,\ \frac{1}{1 + e^{-(\tilde{c}_i + \tilde{\delta}_i)}}\right).$$
Note that at each iteration, ϕ is tuned as indicated in Section 3.2.
Step 5: Compute the coverage probability and the credible width.
Across the M retained iterations of Steps 2 to 4, calculate the coverage probability and width as follows:
$$\text{coverage probability} = \frac{1}{M}\sum_{i=1}^{M} I\!\left(L_i \le p \le U_i\right) \quad \text{and} \quad \text{width} = \frac{1}{M}\sum_{i=1}^{M}\left(U_i - L_i\right),$$
where I   is the indicator function defined by
$$I = \begin{cases} 1 & \text{if } L_i \le p \le U_i \\ 0 & \text{otherwise}, \end{cases}$$
and L i , U i represent the corresponding lower and upper credible intervals for each of the interval estimation methods.
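Steps 1–5 can be condensed into a short Monte Carlo routine. The sketch below is ours, with illustrative replication counts, and assumes the jeffreys_interval(), wilson_interval(), and blended_interval() helpers defined in the earlier sketches:

# Monte Carlo coverage and width for one (p, n) pair (Section 3.1, Steps 1-5).
simulate_pair <- function(p, n, M = 4000, B = 1000, phi = 0.985, alpha = 0.05) {
  methods <- c("jeffreys", "wilson", "blended")
  hits  <- setNames(numeric(3), methods)
  width <- setNames(numeric(3), methods)
  for (i in seq_len(M + B)) {
    k <- rbinom(1, size = n, prob = p)                      # Step 1
    ints <- list(jeffreys = jeffreys_interval(k, n, alpha), # Step 2
                 wilson   = wilson_interval(k, n, alpha),   # Step 3
                 blended  = blended_interval(k, n, phi))    # Step 4
    if (i <= B) next                                        # discard burn-in draws
    for (m in methods) {
      hits[m]  <- hits[m]  + (ints[[m]]["lower"] <= p && p <= ints[[m]]["upper"])
      width[m] <- width[m] + diff(ints[[m]])
    }
  }
  list(coverage = hits / M, width = width / M)              # Step 5
}

simulate_pair(p = 0.001, n = 100)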

3.2. Tuning Procedure

The tuning process aims to strike a balance between achieving high coverage (at least 95%) and maintaining interval efficiency by minimizing width to ensure precise estimates without sacrificing coverage. To achieve this balance, the algorithm evaluates candidate confidence levels through simulation and then selects the credible level that meets the coverage threshold while producing the narrowest or optimal intervals. The tuning process is discussed below in detail.
Step 1: Defining the grid for the credible levels
The algorithm begins by specifying a set of candidate credible levels ($\phi$), which represent the confidence level used in the Bayesian component of the blended interval. The grid typically spans from 0.98 to 0.999 in small increments, i.e., $\phi \in \{0.98, 0.981, \ldots, 0.999\}$.
Step 2: Simulations of blended intervals for each credible level
For each candidate level ϕ in the grid, perform Monte Carlo simulation with N = 6000 trials. In each trial, generate a binomial sample of size n with success probability p ^ and then compute the blended confidence interval as discussed in Step 4 of Section 3.1. This step evaluates how well each candidate level performs under repeated sampling.
Step 3: Computation of the width for each simulation
For each candidate level ϕ , calculate the width by subtracting the lower bound from the upper bound.
Step 4: Selecting Optimal Level
Choose the level $\phi$ that satisfies
$$\phi^{*} = \underset{\phi \in \text{grid}}{\arg\min}\ \bigl\{\operatorname{median}\ \mathrm{Width}(\phi)\ :\ \mathrm{coverage}(\phi) \ge 0.95\bigr\}.$$
This ensures the interval is both accurate (coverage ≥ 95%) and as narrow as possible.
NB: After selecting ϕ , the algorithm performs a quick validation with 500 trials. If the coverage falls below 0.95 during this check, it overrides the tuned level with a fixed value of ϕ = 0.985 . Because simulations can occasionally underestimate variability, especially in edge cases such as small sample sizes or extreme probabilities, the override rule adds robustness by enforcing a fallback level of 0.985 whenever quick validation shows coverage below 95%. This safeguard ensures that the method remains reliable even under extreme or rare conditions.
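A compact R sketch of this grid search is given below, assuming blended_interval() from the earlier sketch; the evaluation details (e.g., how coverage is estimated per candidate level) are our assumptions based on the description above, not the authors' exact code:

# Grid search for the credible level phi (Section 3.2), with 0.985 fallback.
tune_phi <- function(p, n, grid = seq(0.98, 0.999, by = 0.001),
                     n_sim = 6000, n_check = 500) {
  eval_phi <- function(phi, reps) {
    ints <- replicate(reps, blended_interval(rbinom(1, n, p), n, phi))
    list(coverage  = mean(ints["lower", ] <= p & p <= ints["upper", ]),
         med_width = median(ints["upper", ] - ints["lower", ]))
  }
  res <- lapply(grid, eval_phi, reps = n_sim)
  ok  <- sapply(res, function(s) s$coverage >= 0.95)
  if (!any(ok)) return(0.985)                       # fallback if no level qualifies
  widths   <- sapply(res, function(s) s$med_width)
  phi_star <- grid[ok][which.min(widths[ok])]
  # Quick validation; override with 0.985 if coverage slips below 95%
  if (eval_phi(phi_star, n_check)$coverage < 0.95) phi_star <- 0.985
  phi_star
}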
For the tuning process, the grid $\phi \in \{0.98, 0.985, \ldots, 0.999\}$ is selected for three reasons. Firstly, the primary goal is to achieve at least 95% coverage, which corresponds to confidence levels near 0.95. Starting the grid at 0.98 allows the algorithm to explore slightly higher levels in case the blended method tends to overcover and produce intervals that are unnecessarily wide. Extending the grid up to 0.999 provides a safety margin for extreme cases, such as very small sample sizes or probabilities near 0 or 1, where higher levels may be required to maintain coverage. Secondly, using small increments (e.g., 0.001) ensures fine resolution for optimization, enabling the algorithm to identify the level that minimizes interval width while meeting the coverage requirement. Large jumps could miss the optimal trade-off between coverage and efficiency. Third, the wide range of levels makes the method adaptable across different scenarios, as coverage and width vary significantly with sample size and true probability. For example, large samples stabilize coverage near 0.95, while small samples or extreme probabilities often require higher levels to avoid under-coverage. Finally, the range from 0.98 to 0.999 strikes a balance between accuracy and computational feasibility. It is broad enough to capture variability across different situations but narrow enough to keep the grid search practical. In summary, this grid ensures flexibility, precision, and robustness in tuning the blended method for diverse data situations.

3.3. The Choice of Gamma

The gamma regularization parameter was introduced to stabilize variance estimates in small samples and near-boundary proportions. Its calibration follows an inverse relationship with sample size, ensuring that the penalty diminishes as n grows, thereby preserving asymptotic efficiency. Empirically, gamma can be tuned through grid search across simulation regimes to minimize coverage deviation from 95% while controlling interval width. However, this approach may result in an inadequate gamma value being selected because it optimizes based on empirical performance in a finite simulation set, which may not fully represent all real-world scenarios; thus, in this study, we fixed gamma at 1. The choice of this value is considered optimal because of how it interacts with the blended variance components and the asymptotic properties. The choice balances stabilization and flexibility and has theoretical justification.
The gamma acts as a penalty term to prevent instability in small samples or near-boundary proportions; hence, if gamma < 1 (too small), the regularization effect is weak and the interval width may shrink excessively in ultra-rare events, leading to under-coverage, while on the other hand, if gamma > 1 (too large), the penalty dominates, which makes the interval unnecessarily wide and conservative, thus reducing efficiency. Gamma = 1 provides an ideal scaling factor that neither inflates nor deflates the blended variance disproportionately, thus ensuring that the regularization term contributes meaningfully without overshadowing the sampling variance or Wilson margin. Furthermore, Gamma = 1 aligns with the principle of unit-information priors in Bayesian regularization, where the penalty corresponds to one “pseudo-observation,” which keeps the method asymptotically unbiased and efficient. In particular, if gamma > 1, the penalty term becomes larger than 1/n; thus, it effectively adds the influence of multiple pseudo-observations, which makes the intervals overly conservative. On the other hand, if gamma < 1, the penalty term is less than 1/n; thus, it contributes less than the equivalent of one pseudo-observation, which may fail to provide sufficient stabilization in small samples. This is specifically consistent with Bayesian decision theory, as it aligns with the principle that priors should reflect minimal but meaningful information, thus ensuring coherent inference across different sample sizes.

4. Results and Discussion

The goal of this section is to estimate the binomial proportion p using our proposed adaptive method to improve coverage and precision of p for rare and extreme events. The chosen ranges of true proportions (0.00001–0.99999) and sample sizes (5–10,000) were selected to reflect realistic conditions encountered in rare-event and extreme-probability scenarios across multiple fields. Ultra-rare probabilities such as 10−5 or smaller occur in safety-critical engineering and pharmacovigilance, where failure rates or adverse events are extremely low. Rare disease prevalence often falls between 1 in 10,000 and 1 in 1,000,000, which aligns with the lower bound of our range, while proportions near 0.99999 represent high-reliability systems or near-certain outcomes in reliability studies. The sample size range of 5 to 10,000 captures both ends of practical study designs. For example, early-phase clinical trials and rare-disease studies often enroll fewer than 50 participants, sometimes as few as 12, whereas large-scale epidemiological surveys and public health programs routinely include tens of thousands of individuals to detect low-prevalence conditions. This broad range ensures that the proposed method is evaluated under different proportions that are representative in healthcare, pharmacovigilance, behavioral sciences, and reliability engineering. Rare disease prevalence thresholds and ultra-rare definitions (≤1 per 1,000,000) are outlined in Orphanet (a European database for rare diseases and orphan drugs) and in European Medicines Agency guidelines. In reliability engineering, standards often target failure probabilities as low as 10−5 to 10−6. Similarly, survey designs from the World Health Organization and the Centers for Disease Control and Prevention typically require sample sizes exceeding 10,000 to capture low-prevalence conditions.

4.1. Simulation Results

We simulate 5000 samples, where the first 1000 are used as burn-in samples to estimate proportions of rare and extreme events using our method and then compare its performance to Wilson and Jeffreys’ prior in terms of coverage probability and precision. Detailed R code used for the simulations, along with practical applications and the datasets, are provided as Supplementary Material. Refer to the Supplementary Material section following Section 5 for additional details.
The simulation results are summarized in Table 1, Table 2, Table 3 and Table 4 and Figure 1, Figure 2 and Figure 3.

4.1.1. Coverage Consistency

Across all regimes, from ultra-rare to extreme, the intervals of our blended method consistently maintain coverage probabilities close to or exceeding the nominal 95% level. This is especially evident as sample size increases, where the blended method often matches or surpasses both the Jeffreys' and Wilson intervals. For instance, at p = 0.00001 and n = 10,000, the blended interval achieves 99.18% coverage compared to the 90.36% under-coverage of both the Jeffreys' and Wilson intervals. Even in extreme tail scenarios (e.g., p = 0.99999), the blended method maintains coverage above 95% at moderate to large n, demonstrating robustness under boundary conditions. Jeffreys' intervals tend to underperform at small n and extreme p, while Wilson intervals maintain coverage but often overcompensate with wider bounds. The results are visualized in Figure 1, where the blended method tends to consistently stay near or above the 95% coverage probability.

4.1.2. Interval Width and Efficiency

In Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 and Figure 2, it is observed that the blended intervals tend to be wider at very small sample sizes (e.g., n = 5 to 10), which is expected due to the added gamma penalty and conservative tuning of the credible level. However, these widths remain practical and shrink rapidly as the sample size increases. At all levels, the blended intervals are competitive compared to both the Jeffreys' and Wilson intervals while preserving high coverage. For instance, at p = 0.001 and n = 5, the blended interval width is 0.46856, compared to 0.38069 for Jeffreys' and 0.43544 for Wilson, while at n = 10,000, for the same proportion, the blended interval width is 0.00463, compared to 0.00390 for Jeffreys' and 0.00391 for Wilson. This improved efficiency while maintaining high coverage reflects the adaptiveness of the blended method, which balances precision and coverage through dynamic tuning and variance blending.

4.1.3. Impact of True Probability (p)

At low true probabilities (e.g., p = 0.00001 to 0.001), the blended intervals outperform the Jeffreys' and Wilson intervals in coverage with competitive width as the sample size increases. Jeffreys' occasionally fails to capture the true value with conservative width at small n due to its symmetric prior, while the Wilson interval also fails to maintain the nominal coverage but with relatively smaller width. Our method, by contrast, adapts to the rarity of the event, ensuring that the interval expands appropriately to maintain coverage without excessive conservatism. As p increases to moderate levels (e.g., 0.05), all methods perform well, but the blended interval remains conservative with the interval width. At high p (e.g., 0.99 and above), the blended method continues to excel, yielding conservative intervals, and maintains coverage even when Jeffreys' interval begins to under-cover.

4.1.4. Sample Size Sensitivity

Smaller sample sizes (n = 5, 10) exhibit more variability and wider intervals across all methods, as seen in Table 1, Table 2, Table 3 and Table 4 and Figure 2 and Figure 3. However, the blended approach maintains high coverage even for small sample sizes, which indicates robustness in small-sample data. For example, at p = 0.01 and n = 10, the blended interval achieves 99.62% coverage with a conservative width of 0.33358, thus outperforming the Jeffreys’ and Wilson intervals because of their under-coverages. As sample size increases (n ≥ 100), all methods stabilize, but the blended interval consistently maintains coverage above the nominal while maintaining conservative or narrower width. This scalability confirms that the method remains efficient across sample sizes and is particularly well-suited for large-sample studies or simulations. Notably, the blended interval avoids the erratic discreteness-induced jumps seen in classical intervals when n is small, which is one of the strengths of the blended method.

4.1.5. Sensitivity and Robustness of the Tuning Parameter on Performance

Coverage Calibration: The credible level is highly responsive to the interaction between sample size and true proportion. For ultra-rare events (e.g., p = 0.00001), the method consistently selects high credible levels (e.g., 99%) to compensate for data sparsity and ensure the interval expands sufficiently to include the true value. For instance, at p = 0.00001 and n = 5, the credible level is 0.985, yielding 100% coverage with a width of 0.49251. Similarly, for p = 0.0001 and n = 10, the level increases to 0.99 while maintaining full coverage. As the sample size increases or the true proportion becomes moderate, the tuned level decreases. At p = 0.00001 and n = 1000, the level drops to 0.98, still achieving 98.88% coverage with a much narrower width of 0.00334. For p = 0.001 and n = 5, the credible level is 0.98, and the method achieves 99.38% coverage with a width of 0.46856. This dynamic tuning allows the method to self-correct, i.e., if initial tuning results in under-coverage, the algorithm escalates the level to 98.5% or higher to restore coverage. This adaptivity underpins the method’s robustness across regimes where fixed-level intervals often fail.
Width Efficiency: Higher credible levels yield wider Bayesian bounds, which, after transformation and blending, produce more conservative intervals. The tuning algorithm balances this trade-off by selecting the credible level that still achieves the desired coverage. This ensures statistical efficiency by avoiding unnecessarily wide intervals. This behavior is evident across Table 1, Table 2, Table 3 and Table 4. For example, at p = 0.00001 and n = 1000, the credible level is 0.98, producing a narrow interval (width = 0.00334) with 98.88% coverage. At p = 0.0001 and n = 10000, the level remains at 0.98, yielding 98.02% coverage with a width of 0.0005. For rare events like p = 0.001 and n = 10, the level remains at 0.98, producing an interval width of 0.27732 and 99.34% coverage. These examples illustrate how the method maintains conservative width while preserving or exceeding the target coverage.
Boundary Sensitivity: Near the boundaries (p ≈ 0 or p ≈ 1), credible level tuning becomes more aggressive. The logit transformation amplifies small differences near the extremes, necessitating wider initial bounds. For instance, at p = 0.99999 and n = 5, the credible level is 0.985, yielding 99.98% coverage with a width of 0.49256. At p = 0.999 and n = 100, the level is also 0.98, producing 99.58% coverage with a width of 0.03467. The gamma penalty and Wilson variance blending help stabilize the interval, but the credible level remains the primary mechanism for controlling coverage.
The multi-panel plot in Figure 3 illustrates how the blended method dynamically tunes credible levels across extreme binomial regimes, balancing robustness and sensitivity. For ultra-rare and near-certain proportions (e.g., p = 0.00005 or p = 0.99999), the credible level starts conservatively high at small sample sizes around 98.5% which ensures robustness against under-coverage. These tuning curves show steep initial declines that flatten as sample size grows, forming arcs that transition from convex to nearly linear beyond n ≈ 100. This curvature reflects logit-scale transformation effects, where differences near boundaries are amplified, causing higher credible levels at low n. In contrast, mid-range proportions (e.g., p = 0.05) exhibit gentle slopes and early stabilization near 98%, indicating less aggressive tuning and geometric smoothness. The plot confirms symmetry across complementary proportions (e.g., p ≈ 0.001 vs. p ≈ 0.999), reinforcing the method’s principled adaptiveness. Collectively, the steepness at extremes, flattening at large n, and mirrored trajectories demonstrate how the method maintains high coverage while adaptively reducing credible levels from 98.5% at n = 5 to 98% at n ≥ 10,000 independent of fixed parameterization.

4.2. Practical Applications Using COVID-19 Data

To assess the performance of the blended method in comparison to the competing methods with applications to real-life data, mortality and recovery rate data from the COVID-19 outbreak were used. Data from eight countries—Western Sahara, Ghana, South Africa, Australia, Brazil, Germany, France, and the USA—were used. These data were selected to mirror rare, extreme, or small-sample applications. We focused on mortality and recovery rates because they represent opposite extremes of the proportion spectrum in COVID-19 data. Mortality rates are typically low (often <0.01), making them rare-event scenarios that challenge conventional interval estimation methods. In contrast, recovery rates often approach near certainty (>0.99), placing them at the upper extreme of the probability scale. Evaluating these endpoints allows us to demonstrate the adaptability of the proposed method across the full range of binomial proportions from ultra-rare to near-certain outcomes. The data were sourced from https://www.worldometers.info/coronavirus/ (accessed on 26 August 2025). For each country, mortality and recovery rates were estimated using the blended method and the competing methods. The results are reported in Table 5 and Table 6.
Table 5. Performance of interval estimation methods for COVID-19 mortality rates.
Country          Sample Size    True p     Jeffreys' Coverage   Wilson Coverage   Blended Coverage   Jeffreys' Width   Wilson Width   Blended Width
Western Sahara   10             0.10000    0.98733              0.92967           0.98733            0.34371           0.36878        0.40912
Ghana            171,889        0.00851    0.94733              0.94633           0.97900            0.00087           0.00087        0.00103
South Africa     4,076,463      0.02517    0.94633              0.94633           0.97700            0.00030           0.00030        0.00036
Australia        11,853,144     0.00206    0.94550              0.94500           0.97717            0.00005           0.00005        0.00006
Brazil           38,743,918     0.01836    0.94983              0.94983           0.97833            0.00008           0.00008        0.00010
Germany          38,828,995     0.00471    0.94933              0.94933           0.97800            0.00004           0.00004        0.00005
France           40,138,560     0.00418    0.95283              0.95283           0.98100            0.00004           0.00004        0.00005
USA              111,820,082    0.01091    0.95050              0.95050           0.97850            0.00004           0.00004        0.00005
Table 6. Performance of interval estimation methods for COVID-19 recovery rates.
Country          Sample Size    True p     Jeffreys' Coverage   Wilson Coverage   Blended Coverage   Jeffreys' Width   Wilson Width   Blended Width
Western Sahara   10             0.77778    0.98600              0.92683           0.98817            0.34513           0.36977        0.41099
Ghana            171,889        0.90000    0.95467              0.95317           0.98033            0.00087           0.00087        0.00103
South Africa     4,076,463      0.99148    0.95150              0.95150           0.98000            0.00038           0.00038        0.00045
Australia        11,853,144     0.95978    0.95267              0.95267           0.97750            0.00015           0.00015        0.00007
Brazil           38,743,918     0.93561    0.95067              0.95067           0.98183            0.00004           0.00004        0.00018
Germany          38,828,995     0.99529    0.95733              0.95717           0.98083            0.00004           0.00004        0.00005
France           40,138,560     0.99582    0.95067              0.95067           0.97900            0.00005           0.00005        0.00005
USA              111,820,082    0.98206    0.98600              0.92683           0.98050            0.00005           0.00005        0.00006
Across countries, the mortality rate intervals for the blended method consistently achieved better coverage, particularly in large samples and extremely low rates. For example, for Australia (true proportion = 0.00206), blended coverage reached 97.7%, exceeding the nominal 95% threshold and outperforming the Jeffreys’ and Wilson intervals. Similar patterns were observed for South Africa and Germany, where the blended intervals maintained coverage above 97%. Although the width was comparatively wider in most cases, the blended method’s prioritization of coverage over smaller width appears justified in rare events where underestimation poses substantial epidemiological risk. In small samples such as Western Sahara (n = 10), the blended intervals preserved high coverage (98.7%) while producing competitive interval width in comparison to the other methods. Jeffreys’ intervals remained stable, whereas Wilson intervals exhibited under-coverage (92.9%), indicating sensitivity to small-sample-size bias and boundary truncation.
The recovery rate estimation revealed good performance across all methods in large samples. However, blended intervals again maintained coverage above 97% for Brazil and Germany. Interval widths converged across methods in large samples (width ≤ 0.000103), yet the blended intervals retained a marginal advantage in coverage. For the USA (true proportion = 0.98206), Wilson coverage (92.683%) dropped below the nominal threshold, while the blended and Jeffreys’ intervals were above the nominal levels, which underscores the blended method’s robustness near the boundary. The blended method’s adaptive tuning emerged as a key strength. This dynamic behavior enables the method to maintain nominal coverage across heterogeneous epidemiological contexts without excessive conservatism.
In summary, the blended method is the most reliable for coverage, with an under-coverage rate of 1.6% (1 out of 63) compared to that of Jeffreys at 14.3% (9 cases) and Wilson at 12.7% (8 cases). The blended method consistently exceeded the nominal 95% level, even for rare events and boundary extremes. The demonstrated efficiency of the blended interval method in both rare-event and boundary-extreme events supports its integration into public health surveillance systems, particularly for real-time estimation of mortality and recovery rates. Its capacity to preserve coverage in sparse data environments makes it suitable for early disease outbreak detection and low-incidence monitoring, while its stability in high-proportion contexts ensures accurate reporting of recovery rates of disease outbreaks. Adoption of the blended method in reporting frameworks could enhance the interpretability and credibility of epidemiological risks, thus informing risk communication, resource allocation, and intervention prioritization. Moreover, its modular tuning architecture aligns with adaptive surveillance strategies, which allows for calibration sensitivity in evolving epidemic outbreaks.
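For illustration, the three interval constructions discussed in this paper can also be applied directly to a single observed count. The call below (ours) uses the Western Sahara mortality row of Table 5, where a true proportion of 0.1 with n = 10 corresponds to k = 1 observed death; note that the table itself reports simulated coverage and average width rather than single-interval endpoints.

# Point application of the three methods to one observed count (k = 1, n = 10),
# using the jeffreys_interval(), wilson_interval() and blended_interval() sketches.
rbind(jeffreys = jeffreys_interval(k = 1, n = 10),
      wilson   = wilson_interval(k = 1, n = 10),
      blended  = blended_interval(k = 1, n = 10, phi = 0.985))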

4.3. Computational Cost Analysis

The proposed method is theoretically bounded by $O(M \times L)$ per $(n, p)$ pair, where $M$ is the number of simulation iterations and $L$ is the number of candidate credible levels in tuning. Our configuration ($M = 1000$, $L = 20$, and 63 parameter pairs) results in approximately 1.26 million interval estimations. Despite this complexity, the observed runtime is far below the $n^{6}$ threshold. The largest configuration was completed in approximately 87.5 s (1.46 min) using about 20.15 MB of memory, while the smaller cases required only about 7.1 s (0.12 min) and 1.75 MB. These results confirm that the implementation is memory-efficient by leveraging R's optimized linear algebra routines, hence avoiding explicit $n^{6}$ scaling. Efficiency is achieved through vectorization, sparse operations, and caching, which makes the method practically robust for large-scale application.

4.4. Data-Based Tuning Rules for the Credible Level (ϕ): The Choice of Adaptive over Fixed Rule

In practice, choosing a single fixed value for ϕ   instead of using a grid or adaptive rule is not recommended because it removes the flexibility that the method needs to perform well across diverse data conditions. Real-world datasets vary widely in sample size and observed proportion, and a fixed ϕ   might work for moderate n and p ^ , but may fail for very small samples or rare events. For example, always using ϕ = 0.98 could lead to under-coverage when n = 10 and p ^ = 0.0001 or produce unnecessarily wide intervals when n = 10,000 and p ^ = 0.5 . The strength of the proposed method lies in its ability to adapt interval width to maintain coverage while minimizing unnecessary conservatism; thus, a fixed ϕ   ignores this adaptivity and makes the method behave like a standard fixed-level Bayesian interval, which defeats the purpose. Coverage performance depends on both n and p ^ , and without tuning, intervals can fail to meet the nominal 95% coverage in rare-event scenarios, which is critical in applications such as epidemiology or reliability analysis. Practically, using a fixed ϕ   could result in misleading intervals, either too narrow (under-coverage) or too wide (inefficient), which can lead to incorrect decisions in safety-critical applications like clinical trials or risk assessments.
To address this, we implement a grid search within the range [0.98, 0.999]. Preliminary simulation studies (not reported here) support the choice of this range, which is broad enough to capture variability across regimes yet narrow enough to remain computationally efficient. For practical reasons, searching across the entire (0, 1) spectrum would add unnecessary computational cost without improving results, as most intervals of that range are irrelevant for coverage optimization. Although grid search is implemented within the codes, practitioners may need to specify the range for the grid search to suit the nature of their data for better results; thus, based on the simulation results from Table 1, Table 2, Table 3 and Table 4, the following grid selection rules (as reported in Table 7) are proposed as a guide.
The algorithm starts at the lower boundary and increases incrementally (e.g., by 0.001) until a coverage ≥ 95% is achieved. This ensures robustness for rare-event scenarios and efficiency for large-sample studies without unnecessary conservatism.

5. Conclusions

This study introduces an adaptive Bayesian variance-blending calibration framework for estimating binomial proportions of rare and extreme events. The performance of the method in comparison with the classical Jeffreys’ and Wilson score intervals across different sample sizes (n = 5 to 10,000) under different true proportions was assessed through extensive simulations and real data applications. The blended method was shown to be robust and adaptive across sample sizes and true probabilities. It consistently achieves or exceeds the nominal 95% coverage, particularly in small sample events, rare events, and extreme events where traditional methods often struggle. Even at small sample sizes (n = 5), the method maintains high coverage and avoids erratic behavior typical of discrete classical intervals.
In terms of interval width, the blended method balances conservatism and efficiency through dynamic credible level tuning. While intervals are wider at small n due to conservative initialization, they shrink rapidly with increasing sample size. At moderate and large n, blended intervals are competitive in comparison to the Jeffreys’ and Wilson intervals while preserving coverage. This adaptivity is driven by a tuning mechanism that adjusts the credible level based on the observed data and empirical coverage feedback. High credible levels are selected for ultra-rare events to ensure efficiency, while lower levels are used for moderate proportions to enhance precision.
Geometric analysis of the tuning curves demonstrates how the blended method adaptively tunes credible levels across binomial extremes by starting at higher values for small samples and gradually flattening into near-linear, symmetric trajectories as sample size increases, thus ensuring robust coverage and balanced sensitivity. These features reflect the method’s ability to stabilize estimates near boundaries through aggressive credible level tuning and variance blending.
Theoretically, the blended method bridges Bayesian and frequentist paradigms by using empirical feedback to calibrate credible levels to guarantee near or above nominal coverage with narrow or conservative width. Its adaptivity and boundary sensitivity offer a competitive alternative to fixed-level intervals, particularly in rare and extreme events. Practically, the method’s ability to adaptively shrink or expand based on sample size and true proportion makes it highly suitable for applications where sample size limitations and rare or extreme event proportions—such as rare disease prevalence, safety-critical system reliability, and early-phase clinical trials—render conventional interval estimation inaccurate. Considering these findings, we recommend the use of the blended method for interval estimation of rare and extreme events, especially when sample sizes are limited or true proportions lie near the boundaries. Future work may extend this framework to multinomial or hierarchical settings, where adaptive tuning could further enhance inference under complex data structures.
Although the proposed adaptive Bayesian variance-blending framework demonstrates strong performance across rare and extreme event scenarios, several limitations remain. First, the method incurs additional computational overhead due to grid-based tuning of credible levels. Although the method is memory-efficient, its runtime can still be significant for very large datasets, which may pose challenges for real-time or large-scale applications in the absence of high computational power and memory.
Second, its performance is sensitive to the tuning of key parameters, such as the credible level. Incorrect calibration can lead to inflated interval widths or under-coverage. Furthermore, the approach assumes independence and that the trials follow a strict binomial form. Finally, while the method was validated using simulations and COVID-19 data, broader empirical testing across diverse domains, such as reliability engineering and rare disease prevalence, is needed to confirm generalizability.
Future research should address these limitations by developing automated tuning algorithms that leverage optimization and reduce computational cost. Extending the framework to handle complex data structures such as multinomial, hierarchical, and correlated models would enhance its application in many fields. In such models, incorporating mechanisms to account for overdispersion, such as beta-binomial or hierarchical Bayesian models, could also further improve robustness. Additionally, adaptive strategies for gamma calibration should be explored to allow dynamic adjustment based on sample size and event rarity, instead of fixing its value to 1. Practical directions include optimizing the method for real-time surveillance systems and creating open-source software implementations to facilitate adoption. Finally, comprehensive validation across multiple application areas will be essential to establish the method’s reliability and scalability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math13243988/s1. The code and data used in this study are included as a zipped file labelled “Supplementary file-R codes”.

Author Contributions

Conceptualization, A.A. and A.B.; methodology, A.A.; software, A.A.; validation, A.B. and D.M.; formal analysis, A.A.; resources, A.B. and D.M.; writing—original draft, A.A.; writing—review and editing, A.B. and D.M.; supervision, D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are publicly available.

Acknowledgments

The authors would like to acknowledge UCDP-Sol Plaatje, which funded the research visit to the Modern College of Business and Science that resulted in the writing of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Agresti, A.; Coull, B.A. Approximate is better than ‘exact’ for interval estimation of binomial proportions. Am. Stat. 1998, 52, 119–126.
2. Wilson, E.B. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 1927, 22, 209–212.
3. Clopper, C.J.; Pearson, E.S. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934, 26, 404–413.
4. Agresti, A.; Gottard, A. Nonconservative exact small sample inference for discrete data. Comput. Stat. Data Anal. 2007, 52, 6447–6458.
5. Brown, L.D.; Cai, T.T.; DasGupta, A. Interval estimation for a binomial proportion. Stat. Sci. 2001, 16, 101–133.
6. Fan, Z.; Liu, D.; Chen, Y.; Zhang, N. Something Out of Nothing? The Influence of Double-Zero Studies in Meta-analysis of Adverse Events in Clinical Trials. Stat. Biosci. 2024.
7. Krishnamoorthy, K.; Lee, M.; Zhang, D. Closed-form fiducial confidence intervals for some functions of independent binomial parameters with comparisons. Stat. Methods Med. Res. 2017, 26, 43–63.
8. Liu, J.; Shao, F.; Yang, J. Comparison of interval estimation for extreme event proportions based on exact, approximate and Bayesian approaches. Biostat. Epidemiol. 2025, 9.
9. Newcombe, R.G. Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat. Med. 1998, 17, 857–872.
10. Yang, G.; Liu, D.; Wang, J.; Xie, M. Meta-analysis framework for exact inferences with application to the analysis of rare events. Biometrics 2016, 72, 1378–1386.
11. Owen, M.; Burke, K. Binomial confidence intervals for rare events: Importance of defining margin of error relative to magnitude of proportion. Am. Stat. 2024, 78, 437–449.
12. Blaker, H. Confidence curves and improved exact confidence intervals for discrete distributions. Can. J. Stat. 2000, 28, 783–798.
13. Ogura, T.; Yanagimoto, T. Improvement of Bayesian credible interval for a small binomial proportion using logit transformation. Am. J. Biostat. 2018, 8, 1–8.
14. Luo, Y.; Gao, C. Adaptive robust confidence intervals. arXiv 2024, arXiv:2410.22647.
15. Lyles, R.H.; Weiss, P.; Waller, L.A. Calibrated Bayesian credible intervals for binomial proportions. J. Stat. Comput. Simul. 2019, 90, 75–89.
16. Jeffreys, H. Theory of Probability, 3rd ed.; Oxford University Press: London, UK, 1961; ISBN 9780198532019.
17. Bayarri, M.J.; Berger, J.O. The interplay of Bayesian and Frequentist analysis. Stat. Sci. 2004, 19, 58–80.
18. Young, G.A.; Smith, R.L. Essentials of Statistical Inference; Cambridge University Press: New York, NY, USA, 2005; ISBN 978-0-521-83971-6.
19. Blei, D.M.; Kucukelbir, A.; McAuliffe, J.D. Variational inference: A review for statisticians. J. Am. Stat. Assoc. 2017, 112, 859–877.
20. Le Cam, L. Asymptotic Methods in Statistical Decision Theory; Springer: New York, NY, USA, 1986.
Figure 1. Coverage sensitivity to sample size across interval methods.
Figure 2. Interval width sensitivity to sample size across interval methods.
Figure 3. Sensitivity of tuned credible levels across sample sizes and proportions.
Table 1. Performance comparison of interval estimation methods (ultra-rare-event proportions, p ≤ 0.0001).

Sample Size | Jeffreys’ Coverage | WS Coverage | Blend Coverage | Jeffreys’ Width | WS Width | Blend Width | Blend Level
p = 0.00001
5 | 0.0000 | 1.0000 | 1.0000 | 0.37928 | 0.43448 | 0.49251 | 0.985
10 | 0.0000 | 1.0000 | 1.0000 | 0.21715 | 0.27753 | 0.27627 | 0.980
50 | 0.9992 | 0.9992 | 0.9992 | 0.04878 | 0.07137 | 0.06392 | 0.980
100 | 0.9986 | 0.9986 | 0.9986 | 0.02477 | 0.03702 | 0.03258 | 0.980
1000 | 0.9888 | 0.9888 | 0.9888 | 0.00253 | 0.00385 | 0.00334 | 0.980
5000 | 0.9478 | 0.9478 | 0.9478 | 0.00052 | 0.00079 | 0.00069 | 0.980
10,000 | 0.9036 | 0.9036 | 0.9918 | 0.00027 | 0.00040 | 0.00036 | 0.980
p = 0.0001
5 | 0.9990 | 0.9990 | 0.9990 | 0.37951 | 0.43464 | 0.46742 | 0.980
10 | 0.9996 | 0.9996 | 0.9996 | 0.21721 | 0.27758 | 0.27633 | 0.980
50 | 0.9958 | 0.9958 | 0.9958 | 0.04891 | 0.07147 | 0.06406 | 0.980
100 | 0.9904 | 0.9904 | 0.9904 | 0.02493 | 0.03714 | 0.03276 | 0.980
1000 | 0.9052 | 0.9052 | 0.9950 | 0.00271 | 0.00399 | 0.00354 | 0.980
5000 | 0.9878 | 0.9124 | 0.9878 | 0.00069 | 0.00092 | 0.00088 | 0.980
10,000 | 0.9802 | 0.9172 | 0.9802 | 0.00043 | 0.00053 | 0.00054 | 0.980
Table 2. Performance comparison of interval estimation methods (rare events, 0.001 < true proportion ≤ 0.01).

Sample Size | Jeffreys’ Coverage | WS Coverage | Blend Coverage | Jeffreys’ Width | WS Width | Blend Width | Blend Level
p = 0.001
5 | 0.9938 | 0.9938 | 0.9938 | 0.38069 | 0.43544 | 0.46856 | 0.980
10 | 0.9934 | 0.9934 | 0.9934 | 0.21816 | 0.27825 | 0.27732 | 0.980
50 | 0.9496 | 0.9496 | 0.9990 | 0.05073 | 0.07289 | 0.07089 | 0.985
100 | 0.9078 | 0.9078 | 0.9946 | 0.02666 | 0.03851 | 0.03468 | 0.980
1000 | 0.9850 | 0.9242 | 0.9850 | 0.00423 | 0.00524 | 0.00531 | 0.981
5000 | 0.9292 | 0.9622 | 0.9820 | 0.00177 | 0.00188 | 0.00212 | 0.980
10,000 | 0.9462 | 0.9636 | 0.9816 | 0.00125 | 0.00128 | – | –
p = 0.01
5 | 0.9492 | 0.9492 | 0.9492 | 0.39088 | 0.44234 | 0.47847 | 0.980
10 | 0.9104 | 0.9104 | 0.9962 | 0.23121 | 0.28753 | 0.29092 | 0.980
50 | 0.9872 | 0.9056 | 0.9872 | 0.06687 | 0.08560 | 0.08489 | 0.981
100 | 0.9842 | 0.9264 | 0.9842 | 0.04222 | 0.05113 | 0.05212 | 0.980
1000 | 0.9488 | 0.9656 | 0.9756 | 0.01236 | 0.01273 | 0.01473 | 0.980
5000 | 0.9568 | 0.9470 | 0.9840 | 0.00552 | 0.00555 | 0.00656 | 0.980
10,000 | 0.9540 | 0.9540 | 0.9788 | 0.00390 | 0.00391 | 0.00463 | 0.980
Table 3. Performance comparison of interval estimation methods (normal events, 0.01 < true proportion ≤ 0.99).

Sample Size | Jeffreys’ Coverage | WS Coverage | Blend Coverage | Jeffreys’ Width | WS Width | Blend Width | Blend Level
p = 0.05
5 | 0.9780 | 0.9780 | 0.9780 | 0.43260 | 0.47068 | 0.51909 | 0.980
10 | 0.9888 | 0.9198 | 0.9888 | 0.28586 | 0.32669 | 0.34812 | 0.980
50 | 0.8922 | 0.9686 | 0.9890 | 0.11908 | 0.12887 | 0.14409 | 0.981
100 | 0.9378 | 0.9692 | 0.9836 | 0.08451 | 0.08831 | 0.10076 | 0.980
1000 | 0.9526 | 0.9526 | 0.9798 | 0.02699 | 0.02711 | 0.03204 | 0.980
5000 | 0.9496 | 0.9532 | 0.9818 | 0.01209 | 0.01210 | 0.01435 | 0.980
10,000 | 0.9612 | 0.9580 | 0.9850 | 0.00854 | 0.00854 | 0.01014 | 0.980
p = 0.99
5 | 0.9484 | 0.9484 | 0.9982 | 0.39115 | 0.44253 | 0.50385 | 0.985
10 | 0.9054 | 0.9054 | 0.9958 | 0.23202 | 0.28811 | 0.29500 | 0.981
50 | 0.9880 | 0.9142 | 0.9880 | 0.06617 | 0.08504 | 0.08411 | 0.981
100 | 0.9842 | 0.9236 | 0.9842 | 0.04191 | 0.05088 | 0.05177 | 0.980
1000 | 0.9420 | 0.9628 | 0.9770 | 0.01237 | 0.01273 | 0.01474 | 0.980
5000 | 0.9544 | 0.9486 | 0.9826 | 0.00551 | 0.00554 | 0.00654 | 0.980
10,000 | 0.9558 | 0.9558 | 0.9844 | 0.00389 | 0.00391 | 0.00462 | 0.980
Table 4. Performance comparison of interval estimation methods (extreme events, true proportion > 0.99).

Sample Size | Jeffreys’ Coverage | WS Coverage | Blend Coverage | Jeffreys’ Width | WS Width | Blend Width | Blend Level
p = 0.999
5 | 0.9950 | 0.9950 | 0.9950 | 0.38042 | 0.43525 | 0.46830 | 0.980
10 | 0.9888 | 0.9888 | 0.9888 | 0.21888 | 0.27876 | 0.27807 | 0.980
50 | 0.9506 | 0.9506 | 0.9982 | 0.05071 | 0.07287 | 0.07088 | 0.985
100 | 0.9072 | 0.9072 | 0.9958 | 0.02665 | 0.03850 | 0.03467 | 0.980
1000 | 0.9776 | 0.9120 | 0.9776 | 0.00430 | 0.00530 | 0.00533 | 0.980
5000 | 0.9290 | 0.9632 | 0.9796 | 0.00176 | 0.00187 | 0.00211 | 0.980
10,000 | 0.9434 | 0.9618 | 0.9804 | 0.00124 | 0.00128 | 0.00150 | 0.981
p = 0.9999
5 | 0.9996 | 0.9996 | 0.9996 | 0.37937 | 0.43454 | 0.46728 | 0.980
10 | 0.9988 | 0.9988 | 0.9988 | 0.21733 | 0.27766 | 0.27646 | 0.980
50 | 0.9946 | 0.9946 | 0.9946 | 0.04896 | 0.07151 | 0.06411 | 0.980
100 | 0.9896 | 0.9896 | 0.9896 | 0.02495 | 0.03716 | 0.03278 | 0.980
1000 | 0.9002 | 0.9002 | 0.9946 | 0.00272 | 0.00400 | 0.00360 | 0.981
5000 | 0.9856 | 0.9120 | 0.9856 | 0.00069 | 0.00092 | 0.00087 | 0.980
10,000 | 0.9798 | 0.9232 | 0.9798 | 0.00043 | 0.00053 | 0.00054 | 0.981
p = 0.99999
5 | 0.0000 | 0.9998 | 0.9998 | 0.37933 | 0.43451 | 0.49256 | 0.985
10 | 0.0000 | 0.9998 | 0.9998 | 0.21718 | 0.27755 | 0.27630 | 0.980
50 | 0.9994 | 0.9994 | 0.9994 | 0.04877 | 0.07137 | 0.06391 | 0.980
100 | 0.9988 | 0.9988 | 0.9988 | 0.02476 | 0.03701 | 0.03258 | 0.980
1000 | 0.9896 | 0.9896 | 0.9896 | 0.00253 | 0.00384 | 0.00333 | 0.980
5000 | 0.9520 | 0.9520 | 0.9520 | 0.00052 | 0.00078 | 0.00069 | 0.980
10,000 | 0.9026 | 0.9026 | 0.9936 | 0.00027 | 0.00040 | 0.00036 | 0.980
Table 7. Adaptive grid ranges for tuning credible level (ϕ).

Regime | Sample Size (n) | True Proportion (p) | Grid Range for ϕ
Ultra-rare | n ≤ 10 | p ≤ 0.0001 | 0.985–0.999
Ultra-rare | n ≥ 50 | p ≤ 0.0001 | 0.980–0.990
Rare | n ≤ 10 | 0.0001 < p ≤ 0.01 | 0.980–0.995
Rare | n ≥ 50 | 0.0001 < p ≤ 0.01 | 0.980–0.990
Normal | n ≥ 50 | 0.01 < p ≤ 0.99 | 0.980–0.985
Extreme tails | Any other n | p ≥ 0.99 or p ≤ 0.00001 | 0.985–0.999
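As a minimal sketch of how the regimes in Table 7 could be mapped to candidate grids, the R helper below returns the grid range to search for a given n and p. The function name phi_grid, the 0.001 step size, and the handling of sample sizes between 10 and 50 (which Table 7 leaves implicit and which here fall to the extreme-tails range) are our assumptions for illustration; the ranges themselves are those of the table.

## Illustrative mapping of (n, p) regimes to candidate credible-level grids (Table 7).
phi_grid <- function(n, p, by = 0.001) {
  if (p <= 0.0001 && n <= 10) {            # ultra-rare, small samples
    rng <- c(0.985, 0.999)
  } else if (p <= 0.0001 && n >= 50) {     # ultra-rare, moderate to large samples
    rng <- c(0.980, 0.990)
  } else if (p <= 0.01 && n <= 10) {       # rare, small samples
    rng <- c(0.980, 0.995)
  } else if (p <= 0.01 && n >= 50) {       # rare, moderate to large samples
    rng <- c(0.980, 0.990)
  } else if (p <= 0.99 && n >= 50) {       # normal regime
    rng <- c(0.980, 0.985)
  } else {                                 # extreme tails and remaining sample sizes
    rng <- c(0.985, 0.999)
  }
  seq(rng[1], rng[2], by = by)
}

phi_grid(n = 50, p = 0.001)   # candidate levels 0.980, 0.981, ..., 0.990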
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
