Article

Sequential Confidence Intervals for Comparing Two Proportions with Applications in A/B Testing

1 Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309-4486, USA
2 Department of Health Policy, Stanford University School of Medicine, Stanford, CA 94305, USA
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(1), 161; https://doi.org/10.3390/math13010161
Submission received: 11 December 2024 / Revised: 1 January 2025 / Accepted: 4 January 2025 / Published: 5 January 2025
(This article belongs to the Special Issue Sequential Sampling Methods for Statistical Inference)

Abstract:
This article addresses the use of fixed-width confidence intervals (FWCIs) for comparing two independent Bernoulli populations in A/B testing scenarios. Two sequential estimation procedures are proposed: one for estimating the difference in log probabilities of success and the other for the log odds ratio. Both methods are highly efficient, as established via theoretical analysis and Monte Carlo simulations. The practical utility of these methods is demonstrated through two real-world applications: analyzing retention rates in the mobile game Cookie Cats and evaluating the effectiveness of online advertising.

1. Introduction

A/B testing is a statistical strategy for comparing two or more variants to see which one outperforms the others. Participants in an A/B test are assigned to different groups at random, and are exposed to varying versions (A and B, say). The observed outcomes are then analyzed to determine whether any of the differences are statistically significant. A/B testing helps companies and researchers make choices about product features, marketing strategies, or user experiences based on data, leading to better results supported by real-world evidence. For example, ref. [1] provided insights into how Microsoft utilized online controlled experiments (A/B tests) to make data-driven decisions in product development, and ref. [2] used A/B testing alongside evaluation of users’ mental models to improve the user experience of a Japanese language mobile learning application. Additionally, many companies regularly share case studies and blog posts detailing how A/B testing has helped them make decisions and obtain better results.
When binary data are available, A/B testing serves as a powerful method to compare two proportions across diverse domains, including applications in mobile gaming and marketing strategies. In the context of mobile gaming, A/B testing can be employed to compare different game features or user interfaces, aiming to identify the version that leads to a higher proportion of player engagement, retention, or in-app purchases. For example, in our study, we will apply A/B testing to analyze the mobile game called Cookie Cats. In this game, players progress through various stages and encounter gates that require them to either wait a certain amount of time or make an in-app purchase to continue. These gates serve a dual purpose: they drive in-app sales and provide players with a break, increasing and extending their enjoyment of the game. The placement of these gates is crucial for maintaining player satisfaction and maximizing retention. We will examine an A/B test in Cookie Cats where we changed the first gate from level 30 to level 40. Our analysis will focus on how this modification affects player retention.
Similarly, A/B testing is highly applicable in marketing strategies, such as evaluating the effectiveness of online advertising on outcomes such as sales or user engagement. A large company with an established user base, for example, may aim to increase sales through targeted advertisements. To determine the effectiveness of these ads, we will investigate an A/B test where users are divided into two groups: a control group that does not receive the advertisements, and a test group that does. By comparing sales performance between the two groups over a specified period, we will evaluate whether exposure to advertisements leads to a significant boost in sales. This data-driven approach enables the company to make informed decisions about its advertising strategy, ensuring that resources are allocated to campaigns that produce measurable results.
In both the mobile gaming and online advertising examples, where A/B testing is utilized as a pivotal tool for refining user experiences and optimizing strategies, the subsequent step of calculating confidence intervals for the proportions under estimation becomes equally important. Once A/B testing identifies the version that yields a superior outcome, the application of confidence intervals provides a quantitative measure of the precision and reliability of those estimates.
Confidence intervals play a fundamental role in the interpretation of proportions in statistical analysis. The percentage of website users who click a link is one example of a proportion. In two-sample settings such as A/B testing, confidence intervals quantify the uncertainty about observed differences between the two group proportions. These intervals help analysts assess their estimates and judge both the statistical significance and the practical relevance of observed effects, improving the interpretation of research and experiments.
There is a vast literature on confidence intervals for one and two proportions. One of the most commonly used methods is the Wald confidence interval, which relies on the normal approximation to the binomial distribution. Despite its simplicity, the Wald interval is widely criticized for its poor performance, particularly with small sample sizes or sample proportions near 0 or 1, where it often exhibits undercoverage. Ref. [3] introduced an alternative interval, called the Wilson interval, which improves upon the Wald interval by addressing its coverage issues; it performs better, especially for small sample sizes and extreme proportions. Ref. [4] proposed the Clopper–Pearson interval, an exact interval based on inverting the equal-tail binomial test. Although it guarantees the nominal coverage probability, this interval tends to be overly conservative and can produce unnecessarily wide intervals. Ref. [5] proposed the so-called “plus four” interval, which adjusts the Wald interval by adding two successes and two failures to the observed counts, improving its performance. Ref. [6] compared seven methods for constructing two-sided confidence intervals for a single proportion, providing a detailed analysis. Ref. [7] conducted a comprehensive evaluation of various intervals focusing on their coverage probability and width, which was further supported and complemented by their subsequent work [8]. Additionally, ref. [9] obtained the smallest confidence intervals for a proportion in the sense of set inclusion. As for the difference between two independent proportions, classical confidence intervals can often be constructed by inverting the corresponding hypothesis tests. For example, the Wald interval can be derived from Goodman’s test [10], and the score interval is based on inverting the test with the standard errors evaluated under the null hypothesis of equal proportions. Ref. [11] compared eleven methods and introduced a hybrid interval using information from the respective Wilson intervals for the two proportions. Ref. [12] extended the “plus four” approach to the two-proportion case. Ref. [13] applied an Edgeworth expansion to the Studentized difference of two binomial proportions, proposing two new intervals that correct the skewness in the Edgeworth expansion. Ref. [14] put forward a “recentered” confidence interval with strong overall performance. Most recently, ref. [15] presented an optimal exact confidence interval, which showed uniformly superior performance in terms of infimum coverage probability and total interval width.
It is worth mentioning that the aforementioned confidence intervals are all constructed based on fixed-size samples. However, in many statistical inference problems, fixed-sample-size methods are infeasible when predetermined accuracy requirements, such as fixed-width confidence intervals, must be met. Sequential or multistage sampling, which adjusts the sample size dynamically according to the collected data, becomes essential for achieving the desired accuracy. These methods are especially valuable in fields requiring prompt decisions, such as clinical trials and manufacturing quality control, enabling timely and accurate estimates while minimizing resource and time expenditure. In the context of sequential confidence intervals for one and two proportions, ref. [16] proposed asymptotically optimal sequential and two-stage procedures to construct confidence intervals of fixed width and confidence level for a Bernoulli success probability p. Ref. [17] developed and compared four exact sequential methods for obtaining fixed-width confidence intervals for p. Ref. [18] considered the construction of fixed-width confidence intervals using sequential sampling and carried out a simulation study. Ref. [19] introduced a novel approach with a tandem-width confidence interval for a Bernoulli proportion. Ref. [20] proposed leveraging importance sampling to calculate confidence intervals that almost always guarantee the specified coverage. Ref. [21] optimized sampling costs while achieving prescribed interval widths, accounting for varying observation costs between distributions. Furthermore, ref. [22] analyzed the achieved coverage and explored the trade-offs between the number of observations and the number of stages needed to achieve the desired width of the confidence interval.
In this article, we explore sequential confidence intervals for comparing two proportions, with a focus on A/B testing applications. We emphasize the importance of confidence intervals in interpreting A/B testing results within mobile gaming and online advertising sectors. The article is organized as follows: Section 2 introduces fixed-width confidence intervals for the logarithm of the ratio of two proportions and presents simulated studies to validate these methods; Section 3 extends this discussion to fixed-width confidence intervals for the logarithm of the odds ratio, again supported by simulated studies; in Section 4, we apply these statistical tools to A/B testing in mobile gaming through a case study of the game Cookie Cats, highlighting the practical impact of gate placement on player retention; in Section 5, we present a second case study evaluating the effectiveness of online advertising aimed at increasing sales; finally, Section 6 concludes the article by summarizing the findings and underscoring the significance of sequential confidence intervals in facilitating data-driven decision-making via A/B testing.

2. Sequential Confidence Intervals for the Ratio of Two Proportions

Suppose we are interested in some common characteristic, referred to as success, possessed by two independent dichotomous populations, say X and Y. The success probabilities are denoted by $p_1$ and $p_2$, respectively, where $0 < p_i < 1$, $i = 1, 2$. Our goal is to compare their magnitudes and determine whether one is significantly greater than the other.
Assume that we have collected random samples $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ from X and Y, respectively, where the sample sizes $n_1$ and $n_2$ are not necessarily equal. Then the $X_i$'s are independent and identically distributed (i.i.d.) Bernoulli($p_1$) random variables, and the sample proportion $\bar{X}_{n_1} = n_1^{-1} \sum_{i=1}^{n_1} X_i$ serves as an unbiased estimator of $p_1$. However, even though $0 < p_1 < 1$, there is a positive probability that $\bar{X}_{n_1}$ equals 0 or 1 in a given sample. To avoid this issue, we adopt the “plus four” idea introduced in [5,12], leading to the following biased but consistent estimator:
$$\hat{p}_{1,n_1} = \frac{\sum_{i=1}^{n_1} X_i + 1}{n_1 + 2}.$$
This estimator can be treated as a Bayes estimator under a Uniform(0,1) prior, or as a weighted average of $\bar{X}_{n_1}$ (the sample proportion) and 1/2 (a naïve estimator of $p_1$). Notably, $\hat{p}_{1,n_1}$ is always strictly between 0 and 1. Similarly, we estimate $p_2$ by
$$\hat{p}_{2,n_2} = \frac{\sum_{j=1}^{n_2} Y_j + 1}{n_2 + 2}.$$
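As a quick illustration, the adjusted estimator can be computed as follows (a minimal sketch; the function name is ours, not from the paper):

```python
def plus_adjusted_proportion(successes, n):
    """Adjusted estimator (successes + 1) / (n + 2).

    Unlike the raw sample proportion, this is always strictly
    between 0 and 1, even when successes == 0 or successes == n.
    """
    return (successes + 1) / (n + 2)

# The raw proportion can hit the boundary; the adjusted one cannot.
print(plus_adjusted_proportion(0, 10))   # 1/12, not 0
print(plus_adjusted_proportion(10, 10))  # 11/12, not 1
```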
To compare the magnitudes of $p_1$ and $p_2$, we construct a confidence interval for the ratio $p_1/p_2$ (or a monotonic function of $p_1/p_2$) with some prescribed accuracy. As $p_1/p_2$ is always positive, we apply the log transformation, and the resulting quantity $\log(p_1/p_2)$ takes values on $(-\infty, \infty)$. According to the central limit theorem and the delta method, we find that for $i = 1, 2$,
$$\sqrt{n_i}\left(\log \hat{p}_{i,n_i} - \log p_i\right) \xrightarrow{d} N\left(0, \sigma_i^2\right),$$
as $n_i \to \infty$, where $\xrightarrow{d}$ represents convergence in distribution and $\sigma_i^2 = (1 - p_i)/p_i$. For sufficiently large $n_1$ and $n_2$, we have the following approximate normality of the difference between $\log \hat{p}_{1,n_1}$ and $\log \hat{p}_{2,n_2}$:
$$\left(\log \hat{p}_{1,n_1} - \log \hat{p}_{2,n_2}\right) - \left(\log p_1 - \log p_2\right) \mathrel{\dot\sim} N\left(0, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right),$$
where $W \mathrel{\dot\sim} F$ represents that the random variable W is approximately distributed as F. This can be used to construct a large-sample approximate confidence interval for $\log p_1 - \log p_2$ to compare $p_1$ and $p_2$. For the sake of estimation precision, we pre-specify both the confidence level $1 - \alpha \in (0, 1)$ and the interval width $2d > 0$. Such an interval is then referred to as a fixed-width confidence interval (FWCI). That is, with prefixed $\alpha$ and d, we consider the confidence interval of the form
$$I_{n_1,n_2} = \left[\log \hat{p}_{1,n_1} - \log \hat{p}_{2,n_2} \pm d\right],$$
which satisfies
$$\Pr\left(\log p_1 - \log p_2 \in I_{n_1,n_2}\right) \geq 1 - \alpha.$$
Here, $\log \hat{p}_{1,n_1} - \log \hat{p}_{2,n_2}$ serves as a point estimator for $\log p_1 - \log p_2$, and d can be interpreted as the half-width of the interval (or, the margin of error).
It is clear that the FWCI $I_{n_1,n_2}$ for $\log p_1 - \log p_2$ given by (5) is equivalent to the following interval for the ratio of the two proportions $p_1/p_2$:
$$\left[e^{-d}\,\hat{p}_{1,n_1}/\hat{p}_{2,n_2},\ \ e^{d}\,\hat{p}_{1,n_1}/\hat{p}_{2,n_2}\right],$$
which is then called a fixed-accuracy confidence interval (FACI) with $e^d\ (>1)$ being the accuracy parameter.
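The equivalence between the FWCI on the log scale and the FACI on the ratio scale is easy to verify numerically: exponentiating the FWCI endpoints recovers the FACI. A small sketch (the numbers are illustrative, not from the paper):

```python
import math

p1_hat, p2_hat, d = 0.28, 0.21, 0.1

# FWCI for log(p1) - log(p2): point estimate +/- half-width d
center = math.log(p1_hat) - math.log(p2_hat)
fwci = (center - d, center + d)

# Equivalent FACI for the ratio p1/p2: scale by e^{-d} and e^{d}
ratio = p1_hat / p2_hat
faci = (math.exp(-d) * ratio, math.exp(d) * ratio)

# The two intervals agree after exponentiation
assert math.isclose(math.exp(fwci[0]), faci[0])
assert math.isclose(math.exp(fwci[1]), faci[1])
```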
Next, we set out to determine the minimum sample sizes needed to meet the fixed-width and coverage probability requirements. Define
$$\Delta = d^2 / z^2,$$
where $z \equiv z_{\alpha/2}$ is the upper $100(\alpha/2)\%$ point of the standard normal distribution. From (6), the total required sample size $n_1 + n_2$ must satisfy
$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \leq \Delta.$$
To minimize the total sample size, applying the Cauchy–Schwarz inequality yields
$$n_1 + n_2 \geq (\sigma_1 + \sigma_2)^2 / \Delta,$$
with equality when $n_1 / n_2 = \sigma_1 / \sigma_2$. In this sense, we can specify the optimal sample sizes $n_1$, $n_2$, and n as follows:
$$n_1 = \sigma_1(\sigma_1 + \sigma_2)/\Delta, \quad n_2 = \sigma_2(\sigma_1 + \sigma_2)/\Delta, \quad \text{and} \quad n = n_1 + n_2.$$
We tacitly disregard the fact that $n_1$, $n_2$, or n may not be an integer.
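For example, the optimal sample sizes above can be computed with the Python standard library (a sketch; `optimal_sizes` is our name, and `NormalDist().inv_cdf` supplies the normal quantile $z_{\alpha/2}$):

```python
from math import sqrt
from statistics import NormalDist

def optimal_sizes(p1, p2, alpha, d):
    """Optimal (non-integer) sample sizes minimizing n1 + n2 subject to
    sigma1^2/n1 + sigma2^2/n2 <= Delta, with sigma_i^2 = (1 - p_i)/p_i."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal quantile
    delta = d**2 / z**2
    s1, s2 = sqrt((1 - p1) / p1), sqrt((1 - p2) / p2)
    n1 = s1 * (s1 + s2) / delta
    n2 = s2 * (s1 + s2) / delta
    return n1, n2, n1 + n2

n1, n2, n = optimal_sizes(0.3, 0.2, 0.05, 0.2)
print(round(n1), round(n2), round(n))
```

For these inputs the three values come out to roughly 517, 678, and 1195, matching the orders of magnitude seen in the simulation tables for moderate d.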
Since $p_1$ and $p_2$ are two unknown parameters, it is essential to estimate $\sigma_1^2$ and $\sigma_2^2$ by updating their estimators at every stage as necessary. Beginning with pilot samples $X_1, \ldots, X_{m_1}$ ($m_1 \geq 10$) from X and $Y_1, \ldots, Y_{m_2}$ ($m_2 \geq 10$) from Y, we propose the following sequential estimation procedure with the associated stopping rule given by
$$N = N_1 + N_2 = \inf\left\{ n_1 + n_2 \geq m_1 + m_2 : n_1^{-1}\hat{\sigma}_{1,n_1}^2 + n_2^{-1}\hat{\sigma}_{2,n_2}^2 \leq \Delta \right\},$$
where $n_1$ and $n_2$ indicate the numbers of observations taken from X and Y, respectively, and, for $i = 1, 2$, $\hat{\sigma}_{i,n_i}^2 = (1 - \hat{p}_{i,n_i})/\hat{p}_{i,n_i}$ with $\hat{p}_{i,n_i}$ defined in (1)–(2). By virtue of the “plus four” adjustment, $\Pr(0 < \hat{p}_{i,n_i} < 1) = 1$, so $\hat{\sigma}_{i,n_i}^2$ is well defined with probability one (w.p.1). Suppose that at some point we have gathered $n_1$ and $n_2$ observations from X and Y, respectively, but the stopping rule (11) is not satisfied, which implies that we should continue sampling. The question is from which population to take the next observation. According to the equality condition for (9), we propose the following allocation scheme:
If $n_1/n_2 > (\leq)\ \hat{\sigma}_{1,n_1}/\hat{\sigma}_{2,n_2}$, collect one additional observation from Y (X).
This sequential estimation procedure (11)–(12) is summarized in Algorithm 1 and implemented as follows. With the pilot samples, if $m_1^{-1}\hat{\sigma}_{1,m_1}^2 + m_2^{-1}\hat{\sigma}_{2,m_2}^2 \leq \Delta$ is already satisfied, we do not take any additional observations, and the final sample size is $N = m_1 + m_2$. Otherwise, we compare $m_1/m_2$ with $\hat{\sigma}_{1,m_1}/\hat{\sigma}_{2,m_2}$ and pick the next observation as per (12). After obtaining the updated $\hat{\sigma}_{1,n_1}^2$ or $\hat{\sigma}_{2,n_2}^2$, we check the boundary-crossing condition (11). This process is repeated until $n_1^{-1}\hat{\sigma}_{1,n_1}^2 + n_2^{-1}\hat{\sigma}_{2,n_2}^2 \leq \Delta$ holds for the first time. By referring to [23], we can claim that $\Pr(N_1 < \infty, N_2 < \infty \mid p_1, p_2) = 1$, which shows that the procedure stops w.p.1. Finally, with the fully accrued data $\{X_1, \ldots, X_{m_1}, \ldots, X_{N_1}; Y_1, \ldots, Y_{m_2}, \ldots, Y_{N_2}\}$, we construct the FWCI
$$I_{N_1,N_2} = \left[\log \hat{p}_{1,N_1} - \log \hat{p}_{2,N_2} \pm d\right]$$
for $\log p_1 - \log p_2$. If the interval $I_{N_1,N_2}$ contains zero, we conclude that there is no significant difference between $p_1$ and $p_2$ at a pre-specified level $\alpha$; and if $I_{N_1,N_2}$ contains only positive (negative) values, we conclude that $p_1 > (<)\ p_2$ at level $\alpha/2$.
Algorithm 1: Sequential sampling strategy (11) and allocation scheme (12)
1. Take pilot samples $X_1, \ldots, X_{m_1}$ and $Y_1, \ldots, Y_{m_2}$, where $m_1, m_2 \geq 10$;
2. Set $n_1 \leftarrow m_1$ and $n_2 \leftarrow m_2$;
3. while $n_1^{-1}\hat{\sigma}_{1,n_1}^2 + n_2^{-1}\hat{\sigma}_{2,n_2}^2 > \Delta$ do
4.   if $n_1/n_2 > \hat{\sigma}_{1,n_1}/\hat{\sigma}_{2,n_2}$ then
5.     collect one additional observation from Y;
6.     update $n_2 \leftarrow n_2 + 1$;
7.   else
8.     collect one additional observation from X;
9.     update $n_1 \leftarrow n_1 + 1$;
10.  end
11. end
12. return $N_1 = n_1$, $N_2 = n_2$, and $N = N_1 + N_2$.
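Algorithm 1 can be prototyped directly. The sketch below simulates the procedure on two synthetic Bernoulli streams (function and variable names are ours; in practice the `draw_x`/`draw_y` callables would be replaced by actual data collection):

```python
import math
import random

def plus_adjusted(successes, n):
    # Adjusted estimator (successes + 1)/(n + 2), strictly in (0, 1)
    return (successes + 1) / (n + 2)

def sequential_fwci(draw_x, draw_y, d, z, m1=20, m2=20):
    """Sequential sampling sketch for an FWCI of half-width d for
    log(p1) - log(p2), following the stopping rule and allocation
    scheme described above."""
    delta = d**2 / z**2
    x = [draw_x() for _ in range(m1)]   # pilot sample from X
    y = [draw_y() for _ in range(m2)]   # pilot sample from Y
    while True:
        p1 = plus_adjusted(sum(x), len(x))
        p2 = plus_adjusted(sum(y), len(y))
        s1_sq = (1 - p1) / p1           # estimate of sigma_1^2
        s2_sq = (1 - p2) / p2           # estimate of sigma_2^2
        if s1_sq / len(x) + s2_sq / len(y) <= delta:
            break                       # boundary crossed: stop sampling
        # Allocation scheme: compare n1/n2 with sigma1_hat/sigma2_hat
        if len(x) / len(y) > math.sqrt(s1_sq / s2_sq):
            y.append(draw_y())
        else:
            x.append(draw_x())
    center = math.log(p1) - math.log(p2)
    return len(x), len(y), (center - d, center + d)

rng = random.Random(1)
n1, n2, ci = sequential_fwci(lambda: rng.random() < 0.3,
                             lambda: rng.random() < 0.2,
                             d=0.2, z=1.96)
print(n1, n2, ci)
```

With d = 0.2 and true proportions 0.3 and 0.2, the terminal sample sizes should be in the vicinity of the optimal values of roughly 517 and 678 from (10), in line with the ratios slightly below 1 reported in Table 1.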
The sequential estimation procedure (11)–(12) enjoys the following efficiency properties as summarized in Theorem 1.
Theorem 1.
Under the sequential sampling strategy (11) and the allocation scheme (12), with $p_1$, $p_2$, and $\alpha$ fixed, as $d \to 0$, we have:
(i) $E[N_1/n_1] \to 1$, $E[N_2/n_2] \to 1$, and $E[N/n] \to 1$;
(ii) $\Pr\left(\log p_1 - \log p_2 \in I_{N_1,N_2}\right) \to 1 - \alpha$,
where $n_1$, $n_2$, and n come from (10), and $I_{N_1,N_2}$ comes from (13).
Proof. 
One can easily find that the associated stopping rule (11) and allocation scheme (12) are similar to the rule $R_1$ of [24]. Their techniques can be applied here to justify both (i) and (ii), so we omit the proof for brevity. One may refer to [25], Chapter 13 of [26], and other sources for further details. □

2.1. Simulated Studies

To investigate the performance of our proposed sequential estimation procedure (11)–(12), we have conducted an extensive set of Monte Carlo simulations. For illustrative purposes, we first present the results under the following settings: X and Y are Bernoulli populations with success probabilities $p_1 = 0.3$ and $p_2 = 0.2$, respectively; the level $\alpha$ is fixed at 0.05, so the confidence level is $1 - \alpha = 0.95$; the pilot sample sizes are both set to 20; and a wide range of d (half-width) from 0.6 to 0.1 in decrements of 0.1 is taken into account. For each configuration, we have run the simulation 10,000 times and summarize the findings in Table 1. We have recorded the three optimal sample sizes ($n_1$, $n_2$, n), the three average final sample sizes ($\bar{n}_1$, $\bar{n}_2$, $\bar{n}$) along with their standard deviations, and the three ratios ($\bar{n}_1/n_1$, $\bar{n}_2/n_2$, $\bar{n}/n$) accordingly. In the penultimate column, $\overline{cp}$ is the proportion of confidence intervals that successfully capture the parameter under estimation, which is to be compared with the confidence level. In the last column, Power refers to the proportion of confidence intervals that successfully identify $p_1 > p_2$, that is, the proportion of confidence intervals containing only positive values. Note that this “power” provides only a conservative estimate, because we are using a two-sided confidence interval to reach a one-sided conclusion.
From Table 1, we find that the three ratios $\bar{n}_1/n_1$, $\bar{n}_2/n_2$, and $\bar{n}/n$ are all slightly below 1. However, as d decreases, the three ratios get closer and closer to 1, which empirically verifies part (i) of Theorem 1. The coverage proportions $\overline{cp}$ are all around $1 - \alpha = 0.95$, verifying part (ii). We also observe that Power increases rapidly as d decreases. When d = 0.2, one is able to conclude $p_1 > p_2$ at least 97.75% of the time; and when d = 0.1, this rate increases to 100%. This indicates that our proposed sequential estimation procedure (11)–(12) can help identify which proportion is larger, when a difference between the two proportions does exist, for small d values.
In Table 1, we considered the scenario where $p_2 < p_1 < 1/2$. Since the optimal sample size $n_1$ or $n_2$ is not symmetric about 1/2, we have also carried out a set of simulations with $p_1 > p_2 > 1/2$. In particular, $p_1 = 0.8$ and $p_2 = 0.7$, and a wide range of d from 0.20 to 0.05 in decrements of 0.05 has been considered. The findings are displayed in Table 2. There is little to no difference in performance compared to that summarized in Table 1.
Finally, we investigate the performance of the proposed sequential estimation procedure (11)–(12) when the two proportions are identical. In particular, we have conducted simulations under p 1 = p 2 = 0.2 , α = 0.05 , m 1 = m 2 = 20 with d varying from 0.7 to 0.1. This time, note that Power has the same definition as the coverage probability, so we combine the last two columns c p ¯ and Power in Table 1 and Table 2, and rename it “ c p ¯ /Power”. The findings are summarized in Table 3, which again validate Theorem 1. We leave out many details for brevity.

3. Sequential Confidence Intervals for the Odds Ratio of Two Proportions

In Section 2, the log transformation resulted in an FWCI for $\log(p_1/p_2)$, the log of the ratio of the two proportions. In this section, we consider the logit transformation, which helps construct an FWCI for
$$\log\frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2},$$
the log of the odds ratio of the two proportions.
As before, we continue to use $\hat{p}_{1,n_1}$ and $\hat{p}_{2,n_2}$, defined in (1)–(2), as the point estimators of $p_1$ and $p_2$, respectively. By the central limit theorem and the delta method, for $i = 1, 2$,
$$\sqrt{n_i}\left(\log\frac{\hat{p}_{i,n_i}}{1-\hat{p}_{i,n_i}} - \log\frac{p_i}{1-p_i}\right) \xrightarrow{d} N\left(0, \delta_i^2\right),$$
as $n_i \to \infty$, where $\delta_i^2 = [p_i(1-p_i)]^{-1}$. For large enough $n_1$ and $n_2$, we then have
$$\left(\log\frac{\hat{p}_{1,n_1}}{1-\hat{p}_{1,n_1}} - \log\frac{\hat{p}_{2,n_2}}{1-\hat{p}_{2,n_2}}\right) - \left(\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}\right) \mathrel{\dot\sim} N\left(0, \frac{\delta_1^2}{n_1} + \frac{\delta_2^2}{n_2}\right),$$
which can be used to construct a large-sample approximate confidence interval for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$ to compare $p_1$ and $p_2$. With the prefixed half-width $d > 0$, we consider the FWCI given by
$$J_{n_1,n_2} = \left[\log\frac{\hat{p}_{1,n_1}}{1-\hat{p}_{1,n_1}} - \log\frac{\hat{p}_{2,n_2}}{1-\hat{p}_{2,n_2}} \pm d\right],$$
where $\log\frac{\hat{p}_{1,n_1}}{1-\hat{p}_{1,n_1}} - \log\frac{\hat{p}_{2,n_2}}{1-\hat{p}_{2,n_2}}$ serves as a point estimator for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$. It should further satisfy
$$\Pr\left(\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2} \in J_{n_1,n_2}\right) \geq 1 - \alpha,$$
where the confidence level $1 - \alpha \in (0, 1)$ is also prefixed.
Clearly, the FWCI $J_{n_1,n_2}$ for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$ given by (18) is equivalent to the following FACI for the odds ratio $\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$:
$$\left[e^{-d}\,\frac{\hat{p}_{1,n_1}/(1-\hat{p}_{1,n_1})}{\hat{p}_{2,n_2}/(1-\hat{p}_{2,n_2})},\ \ e^{d}\,\frac{\hat{p}_{1,n_1}/(1-\hat{p}_{1,n_1})}{\hat{p}_{2,n_2}/(1-\hat{p}_{2,n_2})}\right].$$
Define $\delta_1^2 = [p_1(1-p_1)]^{-1}$ and $\delta_2^2 = [p_2(1-p_2)]^{-1}$. From (19), the total required sample size $n_1 + n_2$ must satisfy
$$\frac{\delta_1^2}{n_1} + \frac{\delta_2^2}{n_2} \leq \Delta,$$
where $\Delta$ is defined in (7). Applying the Cauchy–Schwarz inequality, we obtain
$$n_1 + n_2 \geq (\delta_1 + \delta_2)^2 / \Delta,$$
with equality when $n_1/n_2 = \delta_1/\delta_2$. In the same fashion as (10), the optimal sample sizes are
$$n_1 = \delta_1(\delta_1 + \delta_2)/\Delta, \quad n_2 = \delta_2(\delta_1 + \delta_2)/\Delta, \quad \text{and} \quad n = n_1 + n_2.$$
Again, it is essential to estimate the unknown $\delta_1^2$ and $\delta_2^2$ by updating their estimators at every stage as necessary. In the spirit of (11)–(12), we propose the following sequential estimation procedure with the associated stopping rule and allocation scheme given by
$$T = N_1 + N_2 = \inf\left\{ n_1 + n_2 \geq m_1 + m_2 : n_1^{-1}\hat{\delta}_{1,n_1}^2 + n_2^{-1}\hat{\delta}_{2,n_2}^2 \leq \Delta \right\}$$
with $m_1, m_2 \geq 10$, and
if $n_1/n_2 > (\leq)\ \hat{\delta}_{1,n_1}/\hat{\delta}_{2,n_2}$, collect one additional observation from Y (X),
where $\hat{\delta}_{i,n_i}^2 = [\hat{p}_{i,n_i}(1-\hat{p}_{i,n_i})]^{-1}$, $i = 1, 2$. Since $\Pr(0 < \hat{p}_{i,n_i} < 1) = 1$, $\hat{\delta}_{i,n_i}^2$ is well defined w.p.1. The implementation of the sequential estimation procedure (23)–(24) is analogous to that of the procedure (11)–(12), and sampling terminates w.p.1. After the full data $\{X_1, \ldots, X_{m_1}, \ldots, X_{N_1}; Y_1, \ldots, Y_{m_2}, \ldots, Y_{N_2}\}$ have been collected, we construct the FWCI
$$J_{N_1,N_2} = \left[\log\frac{\hat{p}_{1,N_1}}{1-\hat{p}_{1,N_1}} - \log\frac{\hat{p}_{2,N_2}}{1-\hat{p}_{2,N_2}} \pm d\right]$$
for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$. If the interval $J_{N_1,N_2}$ contains zero, we conclude that there is no significant difference between $p_1$ and $p_2$ at level $\alpha$; and if $J_{N_1,N_2}$ contains only positive (negative) values, we conclude that $p_1 > (<)\ p_2$ at level $\alpha/2$.
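Relative to the log-ratio procedure of Section 2, only the variance estimator and the final transformation change. A minimal sketch of these pieces (function names are ours, not from the paper):

```python
import math

def logit(p):
    # Log-odds transformation, defined for p strictly in (0, 1)
    return math.log(p / (1 - p))

def delta_sq(p_hat):
    # Estimated delta^2 = [p(1 - p)]^{-1}, used in the stopping rule
    return 1 / (p_hat * (1 - p_hat))

def keep_sampling(n1, n2, p1_hat, p2_hat, delta):
    # Stopping rule: continue while the estimated variance of the
    # log odds-ratio estimator still exceeds Delta
    return delta_sq(p1_hat) / n1 + delta_sq(p2_hat) / n2 > delta

def logit_fwci(p1_hat, p2_hat, d):
    """FWCI of half-width d for the log odds ratio."""
    center = logit(p1_hat) - logit(p2_hat)
    return (center - d, center + d)

lo, hi = logit_fwci(0.28, 0.21, 0.1)
# Exponentiating the endpoints gives the FACI for the odds ratio itself
print(math.exp(lo), math.exp(hi))
```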
In the spirit of Theorem 1, we state the efficiency properties enjoyed by the sequential estimation procedure (23)–(24) in the following theorem.
Theorem 2.
Under the sequential sampling strategy (23) and the allocation scheme (24), with $p_1$, $p_2$, and $\alpha$ fixed, as $d \to 0$, we have:
(i) $E[N_1/n_1] \to 1$, $E[N_2/n_2] \to 1$, and $E[N/n] \to 1$;
(ii) $\Pr\left(\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2} \in J_{N_1,N_2}\right) \to 1 - \alpha$,
where $n_1$, $n_2$, and n come from (22), and $J_{N_1,N_2}$ comes from (25).
Proof. 
The proof is essentially the same as that of Theorem 1 and is thus omitted for brevity. □

Simulated Studies

To investigate the performance of the sequential estimation procedure (23)–(24), we have conducted an extensive set of Monte Carlo simulations in the same fashion as Section 2.1. With the confidence level $1 - \alpha = 0.95$ and pilot sample sizes $m_1 = m_2 = 20$, we have considered the following two scenarios: (i) X and Y are Bernoulli populations with success probabilities $p_1 = 0.3$ and $p_2 = 0.2$, respectively; and (ii) X and Y are Bernoulli populations with identical success probability $p_1 = p_2 = 0.2$. We exclude the case $p_1 > p_2 > 1/2$, since the optimal sample sizes $n_1$ and $n_2$ are both symmetric about 1/2. The corresponding simulated results are summarized in Table 4 and Table 5.
Comparing the simulated results implementing the sequential estimation procedures (11)–(12) under the log transformation and (23)–(24) under the logit transformation, we find little to no difference in terms of coverage probability and power performance. However, distinctions emerge regarding the sample size needed: for the same d value, the former procedure requires a smaller sample size, indicating greater efficiency; and conversely, the latter procedure presents a smaller standard deviation in sample size, suggesting lower variability and greater “robustness”. For clarity and brevity, Figure 1 illustrates a visual comparison of the terminal sample sizes obtained from implementing the sequential estimation procedures (11)–(12) and (23)–(24) under Bernoulli success probabilities p 1 = 0.3 and p 2 = 0.2 as a representative example.

4. Mobile Games A/B Testing

Now, we are in a position to revisit the mobile games A/B testing problem described in Section 1. To illustrate the application, we analyze a dataset collected from the Kaggle platform accessed on 3 March 2024 (https://www.kaggle.com/code/yufengsui/datacamp-project-mobile-games-a-b-testing/notebook), referred to as the Cookie Cats data. The dataset contains information on over 90,000 users of the mobile puzzle game Cookie Cats, developed by Tactile Entertainment. The following variables are included.
  • userid: A unique label that identifies each user.
  • version: The version of the game the user played, either with the first gate at level 30 (gate_30, version A) or at level 40 (gate_40, version B).
  • sum_gamerounds: The total number of game rounds played by the user during the first week after installation.
  • retention_1: A binary indicator of whether the user returned to play the game one day after installation (True) or not (False).
  • retention_7: A binary indicator of whether the user returned to play the game seven days after installation (True) or not (False).
Our primary focus is on the variable retention_7, which measures 7-day retention. This metric is used to determine which version of the game more successfully retains users.
The two Cookie Cats versions can be modeled as two independent Bernoulli populations, as each version was tested on a separate and non-overlapping group of players. Let $p_1$ and $p_2$ denote the 7-day retention rates of version A and version B, respectively. With fixed $\alpha = 0.05$ and $d = 0.1$, we implemented both sequential estimation procedures (11)–(12) and (23)–(24) to collect the data needed for constructing FWCIs to compare the magnitudes of $p_1$ and $p_2$. The outcomes of these comparisons determine which version is better. To initiate the process, pilot samples of size 50 were taken for each version. The summary of the analyses is displayed in Table 6, where $N_1$ and $N_2$ represent the terminal numbers of users playing version A and version B, respectively, while $\hat{p}_1$ and $\hat{p}_2$ denote the sample 7-day retention rates of version A and version B, respectively. The FWCI refers to either the interval $I_{N_1,N_2}$ for $\log p_1 - \log p_2$ as defined in (13), or the interval $J_{N_1,N_2}$ for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$ as defined in (25), accordingly.
The sequential estimation procedure (11)–(12) terminated with 384 observations from version A and 427 observations from version B. The resulting FWCI for $\log p_1 - \log p_2$ is [0.059, 0.259], indicating that version A, with the first gate at level 30, has a significantly higher 7-day retention rate.
Not surprisingly, the sequential estimation procedure (23)–(24) required larger sample sizes, terminating with 581 observations from version A and 632 observations from version B. The resulting FWCI for $\log\frac{p_1}{1-p_1} - \log\frac{p_2}{1-p_2}$ is [0.167, 0.367], indicating that version A, with the first gate at level 30, also has a significantly higher 7-day retention rate. Both FWCIs consistently support the conclusion that placing the first gate in Cookie Cats at level 30 is more appealing than placing it at level 40 in terms of the 7-day retention rate.

5. Online Advertising Effectiveness

In this section, we revisit the second A/B testing application described in Section 1: a large company with a substantial user base seeks to increase sales through advertisements. To assess the effectiveness of advertisements in boosting sales, an A/B testing experiment was conducted using a dataset collected from the Kaggle platform, accessed on 11 May 2024 (https://www.kaggle.com/datasets/farhadzeynalli/online-advertising-effectiveness-study-ab-testing/data), referred to as the Online Advertising data. The dataset contains information on users' interactions with online advertisements, including the following variables.
  • customerID: A unique identifier for each individual customer.
  • made_purchase: A binary indicator of whether the user made a purchase after viewing an advertisement (TRUE) or not (FALSE).
  • test group: Specifies whether the user is in the “ads” (advertisements) group or the “psa” (public service announcements, no ads) group.
  • days_with_most_ads: The day of the month when the user viewed the most ads.
  • peak_ad_hours: The hour of the day when the user viewed the most ads.
  • ad_count: The total number of ads viewed by each user.
In this case, we hypothesize that implementing advertisements can lead to an increase in sales, which would be supported if the purchase rate in the ads group is significantly higher than that in the psa group.
The ads and psa groups can be modeled as independent Bernoulli populations because each group consisted of distinct and non-overlapping sets of users. Let p₁ and p₂ denote the purchase rates of the ads group and the psa group, respectively. With fixed α = 0.05 and d = 0.3, we implemented both sequential estimation procedures (11)–(12) and (23)–(24) to collect the data needed for constructing FWCIs to compare the magnitudes of p₁ and p₂, which are further used to evaluate the effectiveness of advertisements in boosting sales. To initiate the process, pilot samples of size 500 were taken for each group. The summary of the analyses is displayed in Table 7, where N₁ and N₂ represent the terminal sample sizes for the ads and psa groups, respectively, while p̂₁ and p̂₂ denote the sample purchase rates of the ads and psa groups, respectively. The FWCI refers to either the interval I_{N₁,N₂} for log p₁ − log p₂ as defined in (13), or the interval J_{N₁,N₂} for log[p₁/(1 − p₁)] − log[p₂/(1 − p₂)] as defined in (25), accordingly.
The sequential estimation procedure (11)–(12) terminated with 1424 observations from the ads group and 2121 observations from the psa group. The resulting FWCI for log p₁ − log p₂ is [0.445, 1.045], indicating that the ads group has a significantly higher purchase rate than the psa group.
Similarly, the sequential estimation procedure (23)–(24) required larger sample sizes, terminating with 1595 observations from the ads group and 2253 observations from the psa group. The resulting FWCI for log[p₁/(1 − p₁)] − log[p₂/(1 − p₂)] is [0.452, 1.052], confirming that the ads group has a significantly higher purchase rate compared to the psa group. Both FWCIs consistently demonstrate that implementing advertisements increases the purchase rate relative to public service announcements.
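To make the final step concrete, the snippet below rebuilds both fixed-width intervals from terminal counts and checks whether each excludes zero. The success counts used (98 of 1424 and 70 of 2121) are hypothetical values back-calculated from the rounded rates in Table 7, so the resulting endpoints only approximate those reported above; the function names are ours.

```python
import math


def fwci_log(x1, n1, x2, n2, d):
    """Fixed-width CI of half-width d for log p1 - log p2,
    centered at the plug-in estimate log(x1/n1) - log(x2/n2)."""
    c = math.log(x1 / n1) - math.log(x2 / n2)
    return (c - d, c + d)


def fwci_logit(x1, n1, x2, n2, d):
    """Fixed-width CI of half-width d for the log odds ratio
    log[p1/(1 - p1)] - log[p2/(1 - p2)]."""
    p1, p2 = x1 / n1, x2 / n2
    c = math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))
    return (c - d, c + d)


def excludes_zero(ci):
    """A difference is declared significant when the FWCI excludes 0."""
    lo, hi = ci
    return lo > 0 or hi < 0


# Hypothetical counts roughly consistent with the rounded rates in Table 7.
print(fwci_log(98, 1424, 70, 2121, d=0.3))    # close to [0.445, 1.045]
print(fwci_logit(98, 1424, 70, 2121, d=0.3))  # close to [0.452, 1.052]
```

Both intervals lie entirely above zero, matching the conclusion drawn from the reported FWCIs.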

6. Conclusions

Traditional A/B tests typically rely on a fixed sample size, with larger samples generally preferred to ensure statistical reliability. In contrast, sequential estimation procedures offer more flexibility by allowing data collection and analysis at multiple points throughout the process. In this paper, we presented a comprehensive study on the application of sequential confidence intervals for comparing two independent Bernoulli proportions in A/B testing, focusing on two real-world scenarios: mobile game design optimization and online advertising effectiveness. We proposed two types of fixed-width confidence intervals (FWCIs), one based on the log transformation and the other on the logit transformation, to evaluate key performance metrics: 7-day retention rates in the mobile game and purchase rates in online advertising. Both approaches efficiently detected significant differences between experimental groups while minimizing data requirements. The findings demonstrate the practical utility of these methods, successfully identifying optimal strategies for both applications: assigning the first gate of Cookie Cats at level 30 and implementing advertisements to boost sales.
As this study primarily focuses on comparing two independent proportions, the methodologies can be extended to broader contexts. For example, applications in online banking could enable financial managers to compare the proportions of users completing credit card applications, thereby refining services based on statistical evidence to improve user satisfaction and overall performance of their credit card offerings.
Future work could explore scenarios involving three or more independent Bernoulli populations, paving the way for research on ranking and selection problems, such as identifying the best-performing population. Sequential stopping rules and allocation schemes could be proposed to construct simultaneous FWCIs for all pairwise comparisons. In a parallel direction, bandit problems, which study the exploration–exploitation trade-off in sequential decision-making, have attracted considerable attention in machine learning and reinforcement learning. We could apply FWCIs to two-armed or multi-armed Bernoulli bandits, opening up new possibilities for adaptive decision-making under uncertainty. Such extensions would broaden the scope of sequential A/B testing, making it a powerful tool for handling complex decision-making scenarios across various industries.

Author Contributions

Conceptualization, methodology, J.H.; software, J.H. and I.A.; formal analysis, J.H. and L.Z.; validation, L.Z.; data curation, L.Z. and I.A.; writing, J.H., L.Z. and I.A. All authors have read and agreed to the published version of the manuscript.

Funding

The first author’s research was supported in part by the 2024 Summer URC Faculty Research Fellowship from Oakland University, Rochester, MI, USA.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We sincerely thank the three reviewers for their insightful comments and constructive suggestions, which have greatly contributed to improving the quality and clarity of our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kohavi, R.; Longbotham, R.; Sommerfield, D.; Henne, R.M. Controlled experiments on the web: Survey and practical guide. Data Min. Knowl. Discov. 2009, 18, 140–181.
2. Brata, K.C.; Brata, A.H. User experience improvement of Japanese language mobile learning application through mental model and A/B testing. Int. J. Electr. Comput. Eng. 2020, 10, 2659.
3. Wilson, E.B. Probable inference, the law of succession, and statistical inference. J. Am. Stat. Assoc. 1927, 22, 209–212.
4. Clopper, C.J.; Pearson, E.S. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 1934, 26, 404–413.
5. Agresti, A.; Coull, B.A. Approximate is better than “exact” for interval estimation of binomial proportions. Am. Stat. 1998, 52, 119–126.
6. Newcombe, R.G. Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat. Med. 1998, 17, 857–872.
7. Brown, L.D.; Cai, T.T.; DasGupta, A. Interval estimation for a binomial proportion. Stat. Sci. 2001, 16, 101–133.
8. Brown, L.D.; Cai, T.T.; DasGupta, A. Confidence intervals for a binomial proportion and asymptotic expansions. Ann. Stat. 2002, 30, 160–201.
9. Wang, W. Smallest confidence intervals for one binomial proportion. J. Stat. Plan. Inference 2006, 136, 4293–4306.
10. Goodman, L.A. Simultaneous confidence intervals for contrasts among multinomial populations. Ann. Math. Stat. 1964, 35, 716–725.
11. Newcombe, R.G. Interval estimation for the difference between independent proportions: Comparison of eleven methods. Stat. Med. 1998, 17, 873–890.
12. Agresti, A.; Caffo, B. Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Am. Stat. 2000, 54, 280–288.
13. Zhou, X.H.; Tsao, M.; Qin, G. New intervals for the difference between two independent binomial proportions. J. Stat. Plan. Inference 2004, 123, 97–115.
14. Brown, L.D.; Li, X. Confidence intervals for two sample binomial distribution. J. Stat. Plan. Inference 2005, 130, 359–375.
15. Cao, X.; Wang, W.; Xie, T. An optimal exact confidence interval for the difference of two independent binomial proportions. Stat. Methods Med. Res. 2024, ahead of print.
16. Asparouhov, T.Z. Sequential Fixed Width Confidence Intervals. Doctoral Dissertation, California Institute of Technology, Pasadena, CA, USA, 2000.
17. Frey, J. Fixed-width sequential confidence intervals for a proportion. Am. Stat. 2010, 64, 242–249.
18. Liu, W.; Zhou, S. Construction of fixed width confidence intervals for a Bernoulli success probability using sequential sampling: A simulation study. J. Stat. Comput. Simul. 2011, 81, 1483–1493.
19. Yaacoub, T.; Goldsman, D.; Mei, Y.; Moustakides, G.V. Tandem-width sequential confidence intervals for a Bernoulli proportion. Seq. Anal. 2019, 38, 163–183.
20. Shan, G. Accurate confidence intervals for proportion in studies with clustered binary outcome. Stat. Methods Med. Res. 2020, 29, 3006–3018.
21. Erazo, I.; Goldsman, D. Efficient confidence intervals for the difference of two Bernoulli distributions’ success parameters. J. Simul. 2023, 17, 76–93.
22. Erazo, I.; Goldsman, D.; Mei, Y. Cost-efficient fixed-width confidence intervals for the difference of two Bernoulli proportions. J. Simul. 2023, 18, 726–744.
23. Chow, Y.S.; Robbins, H. On the asymptotic theory of fixed-width sequential confidence intervals for the mean. Ann. Math. Stat. 1965, 36, 457–462.
24. Robbins, H.; Simons, G.; Starr, N. A sequential analogue of the Behrens–Fisher problem. Ann. Math. Stat. 1967, 38, 1384–1391.
25. Srivastava, M.S. On a sequential analogue of the Behrens–Fisher problem. J. R. Stat. Soc. Ser. B 1970, 32, 144–148.
26. Mukhopadhyay, N.; de Silva, B.M. Sequential Methods and Their Applications; CRC Press: Boca Raton, FL, USA, 2009.
Figure 1. Comparison of terminal sample sizes from Table 1 and Table 4.
Table 1. Simulated results with p₁ = 0.3, p₂ = 0.2, α = 0.05, and m₁ = m₂ = 20, implementing the sequential estimation procedure (11)–(12) under 10,000 runs.
d | n₁ | n̄₁ | s(n₁) | n̄₁/n₁ | n₂ | n̄₂ | s(n₂) | n̄₂/n₂ | n | n̄ | s(n) | n̄/n | cp̄ | Power
0.6 57.50 55.63 12.52 0.9676 75.28 72.60 17.57 0.9643 132.78 128.23 27.00 0.9657 0.9515 0.2303
0.5 82.80 80.94 15.17 0.9775 108.41 105.81 21.24 0.9761 191.20 186.75 32.75 0.9767 0.9536 0.3302
0.4 129.37 127.31 18.99 0.9841 169.39 166.59 26.45 0.9835 298.76 293.91 40.89 0.9838 0.9514 0.4883
0.3 229.99 227.92 25.31 0.9910 301.13 298.53 35.57 0.9914 531.12 526.45 54.69 0.9912 0.9522 0.7460
0.2 517.48 514.83 38.15 0.9948 677.54 674.26 53.27 0.9952 1195.02 1189.09 82.33 0.9950 0.9502 0.9775
0.1 2069.93 2066.66 76.96 0.9984 2710.17 2706.13 105.80 0.9985 4780.09 4772.79 164.08 0.9985 0.9469 1.0000
Table 2. Simulated results with p₁ = 0.8, p₂ = 0.7, α = 0.05, and m₁ = m₂ = 20, implementing the sequential estimation procedure (11)–(12) under 10,000 runs.
d | n₁ | n̄₁ | s(n₁) | n̄₁/n₁ | n₂ | n̄₂ | s(n₂) | n̄₂/n₂ | n | n̄ | s(n) | n̄/n | cp̄ | Power
0.20 55.44 55.59 14.00 1.0026 72.59 71.96 15.93 0.9913 128.04 127.55 26.89 0.9962 0.9476 0.2313
0.15 98.57 98.52 19.35 0.9995 129.06 128.53 21.43 0.9959 227.62 227.05 36.75 0.9975 0.9447 0.4016
0.10 221.78 221.46 28.26 0.9986 290.38 290.01 31.56 0.9987 512.15 511.47 53.63 0.9987 0.9473 0.7455
0.05 887.11 888.64 55.87 1.0017 1161.50 1162.35 61.93 1.0007 2048.61 2050.99 105.58 1.0012 0.9503 0.9989
Table 3. Simulated results with p₁ = p₂ = 0.2, α = 0.05, and m₁ = m₂ = 20, implementing the sequential estimation procedure (11)–(12) under 10,000 runs.
d | n₁ | n̄₁ | s(n₁) | n̄₁/n₁ | n₂ | n̄₂ | s(n₂) | n̄₂/n₂ | n | n̄ | s(n) | n̄/n | cp̄/Power
0.7 62.72 59.83 15.66 0.9540 62.72 59.91 15.53 0.9552 125.44 119.74 27.93 0.9546 0.9582
0.6 85.37 82.61 18.15 0.9677 85.37 82.49 18.14 0.9663 170.73 165.10 32.55 0.9670 0.9573
0.5 122.93 120.04 21.86 0.9765 122.93 120.00 21.82 0.9762 245.85 240.04 39.10 0.9764 0.9541
0.4 192.07 189.46 27.65 0.9864 192.07 188.96 27.29 0.9838 384.15 378.42 49.39 0.9851 0.9573
0.3 341.46 338.70 36.59 0.9919 341.46 338.45 36.60 0.9912 682.93 677.15 65.49 0.9915 0.9528
0.2 768.29 764.50 54.84 0.9951 768.29 765.13 55.06 0.9959 1536.58 1529.64 98.31 0.9955 0.9478
0.1 3073.17 3067.66 109.34 0.9982 3073.17 3068.90 110.25 0.9986 6146.34 6136.55 196.40 0.9984 0.9493
Table 4. Simulated results with p₁ = 0.3, p₂ = 0.2, α = 0.05, and m₁ = m₂ = 20, implementing the sequential estimation procedure (23)–(24) under 10,000 runs.
d | n₁ | n̄₁ | s(n₁) | n̄₁/n₁ | n₂ | n̄₂ | s(n₂) | n̄₂/n₂ | n | n̄ | s(n) | n̄/n | cp̄ | Power
0.8 61.33 61.75 5.70 1.0068 70.26 70.54 9.68 1.0040 131.59 132.29 13.95 1.0053 0.9596 0.2280
0.7 80.10 80.43 6.51 1.0041 91.77 92.07 11.12 1.0033 171.87 172.50 16.08 1.0036 0.9622 0.3067
0.6 109.03 109.37 7.63 1.0031 124.91 125.10 12.93 1.0015 233.93 234.46 18.74 1.0023 0.9588 0.3944
0.5 157.00 157.31 9.19 1.0020 179.86 179.88 15.44 1.0001 336.86 337.19 22.44 1.0010 0.9553 0.5365
0.4 245.31 245.59 11.54 1.0011 281.04 281.52 19.57 1.0017 526.35 527.11 28.40 1.0014 0.9521 0.7572
0.3 436.11 436.41 15.59 1.0007 499.62 499.97 26.18 1.0007 935.73 936.38 38.28 1.0007 0.9534 0.9398
0.2 981.24 981.00 23.00 0.9998 1124.15 1123.92 38.96 0.9998 2105.39 2104.93 56.68 0.9998 0.9512 0.9993
0.1 3924.95 3924.65 46.71 0.9999 4496.60 4496.08 78.45 0.9999 8421.55 8420.72 114.69 0.9999 0.9492 1.0000
Table 5. Simulated results with p₁ = p₂ = 0.2, α = 0.05, and m₁ = m₂ = 20, implementing the sequential estimation procedure (23)–(24) under 10,000 runs.
d | n₁ | n̄₁ | s(n₁) | n̄₁/n₁ | n₂ | n̄₂ | s(n₂) | n̄₂/n₂ | n | n̄ | s(n) | n̄/n | cp̄/Power
0.8 75.03 74.93 10.08 0.9987 75.03 74.90 10.08 0.9983 150.06 149.83 17.93 0.9985 0.9591
0.7 98.00 97.96 11.71 0.9996 98.00 97.80 11.38 0.9980 195.99 195.76 20.57 0.9988 0.9615
0.6 133.38 133.26 13.63 0.9991 133.38 133.26 13.52 0.9991 266.77 266.52 24.17 0.9991 0.9567
0.5 192.07 191.96 16.34 0.9994 192.07 192.02 16.59 0.9997 384.15 383.98 29.43 0.9996 0.9522
0.4 300.11 300.11 20.56 1.0000 300.11 299.99 20.75 0.9996 600.23 600.10 36.92 0.9998 0.9537
0.3 533.54 533.24 27.47 0.9995 533.54 533.49 27.63 0.9999 1067.07 1066.73 49.24 0.9997 0.9510
0.2 1200.46 1199.64 41.11 0.9993 1200.46 1199.51 41.07 0.9992 2400.91 2399.15 73.45 0.9993 0.9505
0.1 4801.82 4800.14 82.81 0.9996 4801.82 4800.63 82.44 0.9998 9603.65 9600.77 148.06 0.9997 0.9484
Table 6. A/B testing for Cookie Cats implementing sequential estimation procedures (11)–(12) and (23)–(24).
Procedure | N₁ | N₂ | p̂₁ | p̂₂ | FWCI
Sequential estimation procedure (11)–(12) | 384 | 427 | 0.199 | 0.170 | [0.059, 0.259]
Sequential estimation procedure (23)–(24) | 581 | 632 | 0.206 | 0.166 | [0.167, 0.367]
Table 7. A/B testing for online advertising effectiveness implementing sequential estimation procedures (11)–(12) and (23)–(24).
Procedure | N₁ | N₂ | p̂₁ | p̂₂ | FWCI
Sequential estimation procedure (11)–(12) | 1424 | 2121 | 0.069 | 0.033 | [0.445, 1.045]
Sequential estimation procedure (23)–(24) | 1595 | 2253 | 0.069 | 0.034 | [0.452, 1.052]
Hu, J.; Zheng, L.; Alanazi, I. Sequential Confidence Intervals for Comparing Two Proportions with Applications in A/B Testing. Mathematics 2025, 13, 161. https://doi.org/10.3390/math13010161