Article

On the Conjecture of Berry Regarding a Bernoulli Two-Armed Bandit

1 School of Mathematics, Shandong University, Jinan 250100, China
2 Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan 250100, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(3), 733; https://doi.org/10.3390/math11030733
Submission received: 25 December 2022 / Revised: 21 January 2023 / Accepted: 23 January 2023 / Published: 1 February 2023
(This article belongs to the Special Issue Statistical Methods in Mathematical Finance and Economics)

Abstract

In this paper, we study an independent Bernoulli two-armed bandit with unknown parameters $\rho$ and $\lambda$, where $\rho$ and $\lambda$ have a pair of prior distributions such that $dR(\rho) = C_R\,\rho^{r_0}(1-\rho)^{r_0'}\,d\mu(\rho)$ and $dL(\lambda) = C_L\,\lambda^{l_0}(1-\lambda)^{l_0'}\,d\mu(\lambda)$, with $\mu$ an arbitrary positive measure on $[0,1]$. Berry conjectured that, given a pair of prior distributions $(R, L)$ of the parameters $\rho$ and $\lambda$, the arm with prior $R$ is the current optimal choice if $r_0 + r_0' < l_0 + l_0'$ and the expectation of $\rho$ is not less than that of $\lambda$. We give an easily verifiable equivalent form of Berry's conjecture and use it to prove that Berry's conjecture holds when $R$ and $L$ are two-point distributions, as well as when $R$ and $L$ are beta distributions and the number of trials satisfies $N \le \lfloor r_0/r_0' \rfloor + 1$.

1. Introduction

The bandit problem is a well-known problem of sequential control under incomplete information. It involves sequential selections from several options, referred to as the arms of the bandit, whose payoffs are characterized by parameters that are typically unknown. The gambler must learn from past observations when deciding which arm to select next, with the aim of maximizing the total payoff.
This problem was first raised by Thompson in the study of medical trials [1] and has been applied to market pricing (see [2]), digital marketing (see [3]), search problems (see [4]) and many other sequential statistical decision problems which are characterized by the trade-off between exploration and exploitation (i.e., between long-term benefits and short-term gains). For example, gamblers may choose to make enough observations of each arm in the early stages to estimate the gain for each arm and then select the arm with the largest estimated gain in the later stages. Observations of bad arms in the early stages can reduce short-term gains, but the information they bring can enhance long-term gains. The trade-off between short-term and long-term gains to maximize total payoffs is the key to the bandit problem.
There are three main schools of early research on the bandit problem: Berry's school, which focuses on the finite-horizon setting; Gittins's school, which studies an infinite horizon with discounting; and Robbins's school, which focuses on the time-averaged infinite-horizon setting.
Here, we focus on the two-armed bandit problem proposed by Berry [5]. It is an important foundational model, and there are many variants based on it, such as the models in [6,7]. It can also be used directly in practice, such as in the study of human selection behavior [8].
Let R and L denote independent Bernoulli processes with parameters $\rho$ and $\lambda$, respectively. We call R the right arm and L the left arm. An observation on either arm is called a pull. A right pull or a left pull is made at each of $N$ stages, and the result of the pull at each stage is known before the next pull is made. In Berry's setting, the parameters $\rho$ and $\lambda$ associated with R and L are not known but are random variables. The sequences of successes and failures on the right and left arms are therefore not sequences of independent Bernoulli trials, but they are conditionally independent given the unknown parameters $\rho$ and $\lambda$. The goal of this problem is to find a strategy that maximizes the expected number of wins after $N$ pulls.
Berry used Bayesian theory to investigate this problem and assumed that the prior distributions of $\rho$ and $\lambda$ have the following general form:
$dR(\rho) = C_R\,\rho^{r_0}(1-\rho)^{r_0'}\,d\mu_R(\rho),$
$dL(\lambda) = C_L\,\lambda^{l_0}(1-\lambda)^{l_0'}\,d\mu_L(\lambda),$
where $\mu_R$ and $\mu_L$ are arbitrary positive measures on $[0,1]$ and $C_R$ and $C_L$ are normalizing constants.
Although it seems simple, this model has not been completely solved. Its optimal strategy has a closed-form expression only in a few cases; in general, it can only be calculated by a recursive formula (Equation (12) in this paper), which is expensive to compute when $N$ is large, even with a computer. Berry proposed a conjecture (Conjecture B in [5]) that arm R is the current optimal choice if $\mu_R = \mu_L$, $r_0 + r_0' < l_0 + l_0'$ and $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$. Here, $E(\rho \mid r_0, r_0')$ and $E(\lambda \mid l_0, l_0')$ are the expectations of $\rho$ and $\lambda$ with respect to the distributions R and L, respectively. As mentioned in [9], no significant progress has been made in the computation of optimal strategies for over 40 years. Confirming Berry's conjecture would avoid the use of the recursive formula in many cases and greatly speed up the computation of optimal strategies for Berry's bandit model.
The study of Berry's conjecture is an important step toward completing the theory of bandit models. Intuitively, if an arm has been observed less, then choosing it brings more long-term benefit, since the information it yields helps with later choices. Berry's conjecture says that if an arm has both higher short-term gains and higher long-term gains, then it must be optimal, which is consistent with this intuition. Although Berry's conjecture is of great importance, it is difficult to prove, and there are few relevant references. Joshi published a paper in The Annals of Statistics announcing a proof of Berry's conjecture, but later acknowledged that the proof was flawed [10]. Yue [11] studied a problem similar to the set-up of our Theorem 6, but for a two-stage bandit model, which differs significantly from the model studied in this paper.
After years of research, more and more new models and strategies have arisen. Many have turned to asymptotically optimal and suboptimal strategies. Here are a few examples. The famous Gittins index strategy introduced by Gittins and Jones [12] assigns each arm an index as a function of its current state and then activates the arm with the largest index value. This policy optimizes the infinite-horizon expected discounted costs and infinite-horizon long-run average cost. If more than one arm can change its state in every period, then the problem becomes a so-called restless problem. Whittle [13] proposed an index rule to solve the restless problem. This index is not necessarily optimal, but Weber and Weiss [14] proved that it would admit a form of asymptotic optimality as both the number of arms and the number of allocated arms in each period grow to infinity at a fixed proportion. The restless multi-armed bandit model can be used in many applications, such as clinical trials, sensor management and capacity management in healthcare (see [15,16,17,18,19,20]).
A major drawback of the Gittins index and Whittle index is that they are both difficult to calculate. The current fastest algorithm can only solve the index in cubic time [21,22]. A second drawback of the Gittins index is that the arms must have independent parameters, and the discounting scheme must be geometric. If these conditions are not met, then the Gittins index strategy is only suboptimal [23,24].
Another important strategy is the upper confidence bound (UCB) strategy. Lai and Robbins [25] laid out the theory of asymptotically optimal allocation and were the first to actually use the term upper confidence bound. Each arm is assigned a UCB for its mean reward, and the arm with the largest bound is played. The bound is not the conventional upper limit of a confidence interval and is not easy to compute. The design of the confidence bound has been successively improved in [26,27,28,29,30,31,32,33]. Among them, the kl-UCB strategy [30] and the Bayes-UCB strategy [33] are asymptotically optimal for exponential family bandit models.
The Gittins index, UCB methods and other strategies such as Thompson sampling and $\epsilon$-greedy are all suboptimal when applied to Berry's model: when the number of pulls $N$ is not very large, they can differ significantly from the optimal strategy. Therefore, it is still necessary to prove Berry's conjecture and thereby accelerate the computation of the optimal strategy.
In this paper, we prove that Berry’s conjecture is equivalent to the following statement:
Statement. 
If $\mu_R = \mu_L = \mu$, $r_0 + r_0' \le l_0 + l_0'$ and $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$, then arm R is the optimal choice.
This result reveals that Berry's conjecture is essentially a quantitative study of the relationship between exploitation and exploration: when the two arms have the same underlying measure $\mu$ and equal prior mean payoffs, the arm with fewer observations is the more worthwhile choice. Using this result, we study two specific models.
The first special case is where $\mu = \tau$ and $\tau$ is a two-point distribution placing probability $\frac{1}{2}$ on each of $\tau_1$ and $\tau_2$. In this case, we prove that our Statement holds and obtain a conclusion more complete than Berry's conjecture: for any real numbers $r_0, r_0', l_0$ and $l_0'$, the right arm is currently optimal if and only if $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$. This is consistent with the conclusion that Berry obtained in a different way in [5].
The second special case is where the initial distributions R and L are both beta distributions. A partial result is obtained in this case. Let $r_0, r_0', l_0$ and $l_0'$ be positive real numbers satisfying $r_0 + r_0' \le l_0 + l_0'$ and $\frac{r_0}{r_0'} \ge \frac{l_0}{l_0'}$. If the number of remaining pulls satisfies $N \le \lfloor r_0/r_0' \rfloor + 1$, then the current optimal choice is the right arm. Here, $\lfloor x \rfloor$ denotes the largest integer less than or equal to $x$.
This paper is organized as follows. In Section 2, the concepts and results used in this paper are given. In Section 3, the main result is obtained, which proves the equivalence between Berry’s conjecture and our Statement. In Section 4 and Section 5, we discuss two specific cases, where the initial distributions R and L are both two-point distributions or both beta distributions, respectively. Finally, Section 6 gives the conclusions and future research directions.

2. Preliminaries

A brief introduction to the notation and the structure of the problem is given below. See [5] for details.
As mentioned above, the gambler chooses between two arms, the right arm R and the left arm L, which are independent Bernoulli processes with unknown parameters $\rho$ and $\lambda$, respectively. Berry used Bayesian theory to investigate this problem. Let $I_k$ denote the pattern of information known about R and L at stage $k+1$, regarded as a pair of probability distributions of the unknown parameters $\rho$ and $\lambda$. $I_0$ denotes the initial information, consisting of the distributions of $\rho$ and $\lambda$. Specifically, $R = R(\rho)$ denotes the distribution of $\rho$, $L = L(\lambda)$ denotes the distribution of $\lambda$, and the initial information is $I_0 = (R, L)$. If the right arm is pulled, then we update the distribution $R$ according to the result using Bayes' theorem; similarly, a pull on the left arm updates $L$ to a posterior distribution.
A common goal of gamblers is to maximize their payoffs. Assuming that the utility function of their payoffs is linear, the goal of this problem is to find a strategy to maximize the expected number of wins.

2.1. The Initial Distributions

In this model, Berry considered a special form of initial distribution. We use the initial probability distributions R and L of the Bernoulli parameters ρ and λ for the right arm R and the left arm L , respectively, as follows:
$dR(\rho) = \frac{1}{v(r_0, r_0'; \mu_R)}\,\rho^{r_0}(1-\rho)^{r_0'}\,d\mu_R(\rho), \qquad (1)$
$dL(\lambda) = \frac{1}{v(l_0, l_0'; \mu_L)}\,\lambda^{l_0}(1-\lambda)^{l_0'}\,d\mu_L(\lambda), \qquad (2)$
where $\mu_R$ and $\mu_L$ are arbitrary positive measures on $[0,1]$ and $v(r_0, r_0'; \mu_R)$ and $v(l_0, l_0'; \mu_L)$ are defined by
$v(r_0, r_0'; \mu_R) = \int_0^1 \rho^{r_0}(1-\rho)^{r_0'}\,d\mu_R(\rho), \qquad v(l_0, l_0'; \mu_L) = \int_0^1 \lambda^{l_0}(1-\lambda)^{l_0'}\,d\mu_L(\lambda). \qquad (3)$
Note that $r_0, r_0'$ and $l_0, l_0'$ here are not necessarily integers but may be any real numbers for which $v(r_0, r_0'; \mu_R)$ and $v(l_0, l_0'; \mu_L)$ are finite. The set of $(r_0, r_0')$ (or $(l_0, l_0')$) making $v(r_0, r_0'; \mu_R) < \infty$ (or $v(l_0, l_0'; \mu_L) < \infty$) is called the possibility region of $\mu_R$ (or $\mu_L$). It is easy to verify that when $\delta r$ and $\delta r'$ are nonnegative real numbers, $v(r_0 + \delta r, r_0' + \delta r'; \mu_R) < \infty$ if $v(r_0, r_0'; \mu_R) < \infty$, since $\rho^{\delta r}(1-\rho)^{\delta r'} \le 1$ on $[0,1]$. Therefore, for any measure $\mu_R$ that assigns positive measure to the interior of the unit interval, the possibility region of $\mu_R$ is a quadrant of the $(r_0, r_0')$ plane. If $r_0$ and $r_0'$ are integers, then using Bayes' theorem we can regard the distribution R as derived from the measure $\mu_R$ and a number $N_R = r_0 + r_0'$ of pulls on the right arm, with $r_0$ successes and $r_0'$ failures. Similarly, if $l_0$ and $l_0'$ are integers, L is derived from the measure $\mu_L$ and a number $N_L = l_0 + l_0'$ of pulls on the left arm, with $l_0$ successes and $l_0'$ failures.
With these notations, the initial distribution $I_0 = (R, L)$ can be written as
$I_0 = (r_0, r_0', \mu_R;\ l_0, l_0', \mu_L). \qquad (4)$
The posterior distribution is $I_1 = (r_0+1, r_0', \mu_R;\ l_0, l_0', \mu_L)$ if the right arm is pulled and wins, while $I_1 = (r_0, r_0'+1, \mu_R;\ l_0, l_0', \mu_L)$ if it fails. Similarly, the posterior distribution is $I_1 = (r_0, r_0', \mu_R;\ l_0+1, l_0', \mu_L)$ if the left arm is pulled and wins, while $I_1 = (r_0, r_0', \mu_R;\ l_0, l_0'+1, \mu_L)$ if it fails.
Sometimes, we only need to track the distribution R of the right arm, and thus we write $I_0$ and $I_1$ as
$I_0 = (r_0, r_0', \mu_R;\ L) \quad \text{and} \quad I_1 = (r_0+1, r_0', \mu_R;\ L).$
Let $E(\rho \mid r_0, r_0'; \mu_R)$ and $E(\lambda \mid l_0, l_0'; \mu_L)$ denote the expectations of $\rho$ and $\lambda$ under the distributions R and L, respectively.
An important case is $\mu_R = \mu_L = \mu$. In this case, the difference between the distributions R and L is entirely determined by $r_0, r_0'$ and $l_0, l_0'$. Therefore, without causing confusion, the above notation will be shortened to $I_0 = (r_0, r_0';\ l_0, l_0')$, $E(\rho \mid r_0, r_0')$ and $E(\lambda \mid l_0, l_0')$.
In this paper, we focus on two special cases of $\mu$. The first is when $\mu$ is a two-point distribution $\tau$ placing probability $\frac{1}{2}$ on each of $\tau_1$ and $\tau_2$, where $\tau_1 < \tau_2$ and not both $\tau_1 = 0$ and $\tau_2 = 1$. The distributions R and L are then also two-point distributions, with
$R(\tau_1) = \frac{\tau_1^{r_0}(1-\tau_1)^{r_0'}}{\tau_1^{r_0}(1-\tau_1)^{r_0'} + \tau_2^{r_0}(1-\tau_2)^{r_0'}}, \qquad R(\tau_2) = \frac{\tau_2^{r_0}(1-\tau_2)^{r_0'}}{\tau_1^{r_0}(1-\tau_1)^{r_0'} + \tau_2^{r_0}(1-\tau_2)^{r_0'}}, \qquad (5)$
$L(\tau_1) = \frac{\tau_1^{l_0}(1-\tau_1)^{l_0'}}{\tau_1^{l_0}(1-\tau_1)^{l_0'} + \tau_2^{l_0}(1-\tau_2)^{l_0'}}, \qquad L(\tau_2) = \frac{\tau_2^{l_0}(1-\tau_2)^{l_0'}}{\tau_1^{l_0}(1-\tau_1)^{l_0'} + \tau_2^{l_0}(1-\tau_2)^{l_0'}}. \qquad (6)$
The possibility region of $\tau$ depends on $\tau_1$ and $\tau_2$. If $0 < \tau_1 < \tau_2 < 1$, then the possibility region is the whole $(r_0, r_0')$ plane. If $\tau_1 = 0$ and $\tau_2 < 1$, then the possibility region is $r_0 \ge 0$ and $r_0' \in \mathbb{R}$. If $\tau_1 > 0$ and $\tau_2 = 1$, then the possibility region is $r_0 \in \mathbb{R}$ and $r_0' \ge 0$. The corresponding expectations are
$E(\rho \mid r_0, r_0'; \tau) = \frac{\tau_1^{r_0+1}(1-\tau_1)^{r_0'} + \tau_2^{r_0+1}(1-\tau_2)^{r_0'}}{\tau_1^{r_0}(1-\tau_1)^{r_0'} + \tau_2^{r_0}(1-\tau_2)^{r_0'}}, \qquad (7)$
$E(\lambda \mid l_0, l_0'; \tau) = \frac{\tau_1^{l_0+1}(1-\tau_1)^{l_0'} + \tau_2^{l_0+1}(1-\tau_2)^{l_0'}}{\tau_1^{l_0}(1-\tau_1)^{l_0'} + \tau_2^{l_0}(1-\tau_2)^{l_0'}}. \qquad (8)$
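As a small illustration (ours, not from the paper), the posterior masses and posterior mean in Equations (5)–(8) can be computed directly; the function name and the assumption $0 < \tau_1 < \tau_2 < 1$ are our own choices.

```python
def two_point_posterior(r, rp, tau1, tau2):
    """Posterior masses R(tau1), R(tau2) and posterior mean E(rho | r, r'; tau)
    for the symmetric two-point prior; assumes 0 < tau1 < tau2 < 1, so the
    exponents (r, rp) may be any real pair."""
    w1 = tau1 ** r * (1.0 - tau1) ** rp  # unnormalized posterior weight at tau1
    w2 = tau2 ** r * (1.0 - tau2) ** rp  # unnormalized posterior weight at tau2
    total = w1 + w2                      # the equal prior masses 1/2 cancel
    return w1 / total, w2 / total, (tau1 * w1 + tau2 * w2) / total

# Example: r0 = 2 successes and r0' = 1 failure on the right arm.
mass1, mass2, mean = two_point_posterior(2, 1, 0.3, 0.6)
```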
In the second case, let $\mu = \beta$, where
$\beta(A) = \int_A x^{-1}(1-x)^{-1}\,dx \quad \text{for any } A \subseteq [0,1], \qquad (9)$
and the numbers of successes and failures $r_0, r_0', l_0, l_0'$ satisfy
$v(r_0, r_0'; \beta) = \int_0^1 \rho^{r_0-1}(1-\rho)^{r_0'-1}\,d\rho < \infty, \qquad v(l_0, l_0'; \beta) = \int_0^1 \lambda^{l_0-1}(1-\lambda)^{l_0'-1}\,d\lambda < \infty,$
which is equivalent to $r_0, r_0', l_0$ and $l_0'$ being positive numbers. Then, the distributions R and L are both beta distributions, with
$dR(\rho) = \frac{1}{v(r_0, r_0'; \beta)}\,\rho^{r_0-1}(1-\rho)^{r_0'-1}\,d\rho,$
$dL(\lambda) = \frac{1}{v(l_0, l_0'; \beta)}\,\lambda^{l_0-1}(1-\lambda)^{l_0'-1}\,d\lambda.$
The corresponding expectations are
$E(\rho \mid r_0, r_0'; \beta) = \frac{r_0}{N_R} = \frac{r_0}{r_0 + r_0'}, \qquad E(\lambda \mid l_0, l_0'; \beta) = \frac{l_0}{N_L} = \frac{l_0}{l_0 + l_0'}.$
Using the Bayesian formula, we know that all posterior distributions are also beta distributions.

2.2. The Function Δ

This problem has a dynamic programming property. In each selection, the gambler always needs to choose the arm that will lead to the greatest subsequent gain based on the current information.
Let $W_{N-k}^R(I_k)$ (respectively, $W_{N-k}^L(I_k)$) denote the worth of the pattern $I_k$ with $N-k$ pulls remaining when the right (left) arm is pulled at stage $k+1$ and an optimal procedure is followed thereafter. Let $W_N(I_0)$ be the worth of $I_0$ when an optimal procedure is followed. Then,
$W_N(I_0) = \max\{ W_N^R(I_0),\ W_N^L(I_0) \} \quad \text{for all } N \ge 1 \text{ and } I_0. \qquad (10)$
Using this dynamic programming property, for all $N \ge 1$ and $I_0 = (r_0, r_0', \mu_R;\ l_0, l_0', \mu_L)$, we have
$W_N^R(I_0) = E(\rho \mid r_0, r_0'; \mu_R) + E(\rho \mid r_0, r_0'; \mu_R)\,W_{N-1}(r_0+1, r_0', \mu_R;\ l_0, l_0', \mu_L) + (1 - E(\rho \mid r_0, r_0'; \mu_R))\,W_{N-1}(r_0, r_0'+1, \mu_R;\ l_0, l_0', \mu_L),$
$W_N^L(I_0) = E(\lambda \mid l_0, l_0'; \mu_L) + E(\lambda \mid l_0, l_0'; \mu_L)\,W_{N-1}(r_0, r_0', \mu_R;\ l_0+1, l_0', \mu_L) + (1 - E(\lambda \mid l_0, l_0'; \mu_L))\,W_{N-1}(r_0, r_0', \mu_R;\ l_0, l_0'+1, \mu_L). \qquad (11)$
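As an illustration only (not part of the original paper), Equations (10) and (11) can be evaluated by memoized dynamic programming. The sketch below assumes the conjugate beta case of Section 2.1, where the posterior means are $r/(r+r')$ and $l/(l+l')$; for another measure $\mu$, one would substitute the corresponding posterior means. The function name is our own.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def worth(N, r, rp, l, lp):
    """W_N(r, r', beta; l, l', beta): expected number of wins in N pulls
    under optimal play, with W_0 = 0. A sketch for the beta case only."""
    if N == 0:
        return 0.0
    e_r, e_l = r / (r + rp), l / (l + lp)  # posterior means of rho and lambda
    w_right = e_r * (1 + worth(N - 1, r + 1, rp, l, lp)) \
        + (1 - e_r) * worth(N - 1, r, rp + 1, l, lp)
    w_left = e_l * (1 + worth(N - 1, r, rp, l + 1, lp)) \
        + (1 - e_l) * worth(N - 1, r, rp, l, lp + 1)
    return max(w_right, w_left)
```

Memoization keeps the number of distinct states polynomial in $N$ (each state records how many successes and failures have been added to each arm), but the cost still grows quickly with $N$, which is why closed-form optimality criteria such as Berry's conjecture are valuable.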
Note that $W_0(I_0) = 0$ for any $I_0$. Then, for any $I_0$ and $N \ge 1$, we can define an important function:
$\Delta_N(I_0) = W_N^R(I_0) - W_N^L(I_0).$
The function $\Delta_N(I_0)$ represents the advantage of choosing R over L in the first stage: $\Delta_N(I_0) \ge 0$ means that R is optimal, and $\Delta_N(I_0) < 0$ means that L is better than R.
By a simple calculation, we obtain that the function $\Delta_N(I_0)$ can be defined recursively. Let $\Delta_N^+(I_0) = \max\{0, \Delta_N(I_0)\} \ge 0$ and $\Delta_N^-(I_0) = \min\{0, \Delta_N(I_0)\} \le 0$. Then, for any $I_0 = (r_0, r_0', \mu_R;\ l_0, l_0', \mu_L)$ and $N \ge 2$, we have
$\Delta_N(I_0) = E(\rho \mid r_0, r_0'; \mu_R)\,\Delta_{N-1}^+(r_0+1, r_0', \mu_R;\ l_0, l_0', \mu_L) + (1 - E(\rho \mid r_0, r_0'; \mu_R))\,\Delta_{N-1}^+(r_0, r_0'+1, \mu_R;\ l_0, l_0', \mu_L) + E(\lambda \mid l_0, l_0'; \mu_L)\,\Delta_{N-1}^-(r_0, r_0', \mu_R;\ l_0+1, l_0', \mu_L) + (1 - E(\lambda \mid l_0, l_0'; \mu_L))\,\Delta_{N-1}^-(r_0, r_0', \mu_R;\ l_0, l_0'+1, \mu_L). \qquad (12)$
In addition, for $N = 1$,
$\Delta_1(I_0) = E(\rho \mid r_0, r_0'; \mu_R) - E(\lambda \mid l_0, l_0'; \mu_L). \qquad (13)$
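To make the recursion concrete, here is a small sketch (ours, not from [5]) that evaluates $\Delta_N$ via Equations (12) and (13) with memoization; the posterior-mean functions are passed in as parameters, and the beta instance at the end uses $E(\rho \mid r, r') = r/(r+r')$ from Section 2.1.

```python
from functools import lru_cache

def make_delta(e_rho, e_lam):
    """Build Delta_N from Equations (12)-(13), given posterior-mean functions
    e_rho(r, rp) ~ E(rho | r, r') and e_lam(l, lp) ~ E(lambda | l, l')."""
    @lru_cache(maxsize=None)
    def delta(N, r, rp, l, lp):
        if N == 1:
            return e_rho(r, rp) - e_lam(l, lp)                      # Equation (13)
        er, el = e_rho(r, rp), e_lam(l, lp)
        return (er * max(delta(N - 1, r + 1, rp, l, lp), 0.0)       # Delta^+ after a right win
                + (1 - er) * max(delta(N - 1, r, rp + 1, l, lp), 0.0)   # Delta^+ after a right loss
                + el * min(delta(N - 1, r, rp, l + 1, lp), 0.0)         # Delta^- after a left win
                + (1 - el) * min(delta(N - 1, r, rp, l, lp + 1), 0.0))  # Delta^- after a left loss
    return delta

# Beta case: the sign of delta_beta(N, ...) identifies the optimal first pull.
delta_beta = make_delta(lambda r, rp: r / (r + rp), lambda l, lp: l / (l + lp))
```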
The following Proposition 1 is easily obtained from Equations (12) and (13).
Proposition 1
(Theorem 3.1 in [5]). For any $I_0 = (r_0, r_0', \mu_R;\ l_0, l_0', \mu_L)$ and $N \ge 1$,
$-(1 - E(\rho \mid r_0, r_0'; \mu_R)) \le \Delta_N(I_0) \le 1 - E(\lambda \mid l_0, l_0'; \mu_L).$

2.3. Berry’s Conjecture and Related Results

Obviously, when there are $N$ pulls left and the current known information is $I_0$, we can use the sign of $\Delta_N(I_0)$ to determine which arm is optimal at this stage. Therefore, identifying the sign of $\Delta_N(I_0)$ is the key to finding the optimal strategy. Unfortunately, Berry did not completely solve this problem and instead gave the following Theorem 1.
Theorem 1
(Theorem 5.1 in [5]). The following statements are true for $N \ge 1$, $I_0 = (r_0, r_0', \mu_R;\ L)$ and any $\delta r, \delta r' \ge 0$:
$\Delta_N(r_0+\delta r, r_0'+\delta r', \mu_R;\ L) \ge \Delta_N(r_0, r_0', \mu_R;\ L)$ if $E(\rho \mid r_0+\delta r+N-1, r_0'+\delta r'; \mu_R) \ge E(\rho \mid r_0+N-1, r_0'; \mu_R)$;
$\Delta_N(r_0+\delta r, r_0'+\delta r', \mu_R;\ L) \le \Delta_N(r_0, r_0', \mu_R;\ L)$ if $E(\rho \mid r_0+\delta r, r_0'+\delta r'+N-1; \mu_R) \le E(\rho \mid r_0, r_0'+N-1; \mu_R)$.
Remark 1.
Theorem 5.2 in [5] states that a strict increase in $E(\rho \mid r_0+N-1, r_0'; \mu_R)$ or a strict decrease in $E(\rho \mid r_0, r_0'+N-1; \mu_R)$ guarantees a strict increase in $\Delta_N(r_0, r_0', \mu_R;\ L)$ for all L and $N$.
Remark 2.
When considering the left arm, a conclusion similar to the above theorem can be obtained by using the fact that $\Delta_N(r_0, r_0', \mu_R;\ l_0, l_0', \mu_L) = -\Delta_N(l_0, l_0', \mu_L;\ r_0, r_0', \mu_R)$. For $N \ge 1$, $I_0 = (R;\ l_0, l_0', \mu_L)$ and any $\delta l, \delta l' \ge 0$, we have
$\Delta_N(R;\ l_0+\delta l, l_0'+\delta l', \mu_L) \le \Delta_N(R;\ l_0, l_0', \mu_L)$ if $E(\lambda \mid l_0+\delta l+N-1, l_0'+\delta l'; \mu_L) \ge E(\lambda \mid l_0+N-1, l_0'; \mu_L)$;
$\Delta_N(R;\ l_0+\delta l, l_0'+\delta l', \mu_L) \ge \Delta_N(R;\ l_0, l_0', \mu_L)$ if $E(\lambda \mid l_0+\delta l, l_0'+\delta l'+N-1; \mu_L) \le E(\lambda \mid l_0, l_0'+N-1; \mu_L)$.
Theorem 1 cannot be used in many cases because its conditions are restrictive. However, it still reveals some properties of the function $\Delta_N(I_0)$, such as
$\Delta_N(r_0+1, r_0', \mu_R;\ L) \ge \Delta_N(r_0, r_0', \mu_R;\ L) \ge \Delta_N(r_0, r_0'+1, \mu_R;\ L) \quad \text{for } N \ge 1 \qquad (14)$
and
$\Delta_N(r_0, r_0', \mu_R;\ L) \le \Delta_{N-1}^+(r_0+1, r_0', \mu_R;\ L) \quad \text{for } N \ge 2. \qquad (15)$
When R and L are conjugate with each other (i.e., μ R = μ L = μ ), there are several more refined results:
Theorem 2
(Theorem 6.4 in [5]). Provided $\mu_R = \mu_L = \mu$, if $r_0 \ge l_0$ and $r_0' \le l_0'$, then $\Delta_N(r_0, r_0';\ l_0, l_0') \ge 0$ for all $N \ge 1$.
Theorem 2 is intuitive: when the right arm has won more often and lost less often than the left arm, we believe that the right arm is better. The following Corollary 1 and Corollary 2 follow immediately from Theorem 2.
Corollary 1
(Corollary 1 in [5]). Provided $\mu_R = \mu_L = \mu$, if $N_R \ge N_L$ and $r_0' \le l_0'$, then $\Delta_N(r_0, r_0';\ l_0, l_0') \ge 0$ for all $N \ge 1$.
Corollary 2
(Corollary 2 in [5]). Provided $\mu_R = \mu_L = \mu$, if $N_R \le N_L$ and $r_0 \ge l_0$, then $\Delta_N(r_0, r_0';\ l_0, l_0') \ge 0$ for all $N \ge 1$.
Intuitively, the conclusion of Corollary 2 can still be strengthened. When $N_R \le N_L$, the right arm has less known information than the left arm, so choosing the right arm brings additional information. If, at the same time, the right arm offers a greater expected immediate payoff, then the optimal choice for the next pull should be the right arm.
With this idea in mind, Berry proposed the following conjecture:
Berry's Conjecture (Conjecture B in [5]). Let $\mu_R = \mu_L = \mu$ and $I_0 = (r_0, r_0';\ l_0, l_0')$. If $r_0 + r_0' \le l_0 + l_0'$ and $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$, then $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ for all $N \ge 1$.

3. Main Result

In this section, we prove that Berry’s conjecture is equivalent to the following statement:
Statement. 
Let $\mu_R = \mu_L = \mu$ and $I_0 = (r_0, r_0';\ l_0, l_0')$. If $r_0 + r_0' \le l_0 + l_0'$ and $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$, then $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ for all integers $N \ge 1$.
At first sight, Berry's conjecture is the stronger result, with the Statement a direct corollary of it. However, we will show below that the Statement is in fact equivalent to Berry's conjecture.
Here, we quote two results obtained by Berry regarding the partial derivatives of $E(\rho \mid r, r'; \mu_R)$ and $E(\lambda \mid l, l'; \mu_L)$; see the discussion before Equation (4.8) in [5]:
$\frac{\partial}{\partial r} E(\rho \mid r, r'; \mu_R) = \mathrm{Cov}(\rho, \log \rho) \ge 0, \qquad (16)$
$\frac{\partial}{\partial r'} E(\rho \mid r, r'; \mu_R) = \mathrm{Cov}(\rho, \log(1-\rho)) \le 0, \qquad (17)$
and
$\frac{\partial}{\partial l} E(\lambda \mid l, l'; \mu_L) = \mathrm{Cov}(\lambda, \log \lambda) \ge 0, \qquad (18)$
$\frac{\partial}{\partial l'} E(\lambda \mid l, l'; \mu_L) = \mathrm{Cov}(\lambda, \log(1-\lambda)) \le 0. \qquad (19)$
Using these results, we can derive the following Lemma 1.
Lemma 1.
Let $(r_0, r_0')$ and $(l_0, l_0')$ be interior points of the possibility region of $\mu$, and let $K \le 0$ and $\theta$ be real numbers such that $(r_0+\theta K, r_0'-\theta K)$ and $(l_0+\theta, l_0'-\theta)$ are both in the possibility region of $\mu$. Then, for any positive integer $N$, we have
$\frac{d}{d\theta}\,\Delta_N(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) \le 0. \qquad (20)$
Proof. 
Let us use mathematical induction. When $N = 1$, we have
$\Delta_1(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) = E(\rho \mid r_0+\theta K, r_0'-\theta K) - E(\lambda \mid l_0+\theta, l_0'-\theta).$
Notice that
$\frac{d}{d\theta} E(\rho \mid r_0+\theta K, r_0'-\theta K) = K \left[ \frac{\partial}{\partial r} E(\rho \mid r_0+\theta K, r_0'-\theta K) - \frac{\partial}{\partial r'} E(\rho \mid r_0+\theta K, r_0'-\theta K) \right], \qquad \frac{d}{d\theta} E(\lambda \mid l_0+\theta, l_0'-\theta) = \frac{\partial}{\partial l} E(\lambda \mid l_0+\theta, l_0'-\theta) - \frac{\partial}{\partial l'} E(\lambda \mid l_0+\theta, l_0'-\theta).$
Since $K \le 0$, these equalities together with Equations (16)–(19) yield
$\frac{d}{d\theta} E(\rho \mid r_0+\theta K, r_0'-\theta K) \le 0, \qquad (21)$
$\frac{d}{d\theta} E(\lambda \mid l_0+\theta, l_0'-\theta) \ge 0. \qquad (22)$
Therefore, we have $\frac{d}{d\theta}\,\Delta_1(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) \le 0$.
Now, let us assume that Equation (20) holds for $N$, and consider the $N+1$ case. Through Equation (12), we have
$\Delta_{N+1}(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) = E(\rho \mid r_0+\theta K, r_0'-\theta K)\,\Delta_N^+(r_0+\theta K+1, r_0'-\theta K, l_0+\theta, l_0'-\theta) + (1 - E(\rho \mid r_0+\theta K, r_0'-\theta K))\,\Delta_N^+(r_0+\theta K, r_0'-\theta K+1, l_0+\theta, l_0'-\theta) + E(\lambda \mid l_0+\theta, l_0'-\theta)\,\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta+1, l_0'-\theta) + (1 - E(\lambda \mid l_0+\theta, l_0'-\theta))\,\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta+1).$
Taking the derivative and collecting the product-rule terms, we obtain
$\frac{d}{d\theta}\,\Delta_{N+1}(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) = E(\rho \mid \cdot)\,\frac{d}{d\theta}\Delta_N^+(r_0+\theta K+1, r_0'-\theta K, l_0+\theta, l_0'-\theta) + (1 - E(\rho \mid \cdot))\,\frac{d}{d\theta}\Delta_N^+(r_0+\theta K, r_0'-\theta K+1, l_0+\theta, l_0'-\theta) + E(\lambda \mid \cdot)\,\frac{d}{d\theta}\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta+1, l_0'-\theta) + (1 - E(\lambda \mid \cdot))\,\frac{d}{d\theta}\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta+1) + \frac{d}{d\theta}E(\rho \mid r_0+\theta K, r_0'-\theta K)\,\big[\Delta_N^+(r_0+\theta K+1, r_0'-\theta K, l_0+\theta, l_0'-\theta) - \Delta_N^+(r_0+\theta K, r_0'-\theta K+1, l_0+\theta, l_0'-\theta)\big] + \frac{d}{d\theta}E(\lambda \mid l_0+\theta, l_0'-\theta)\,\big[\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta+1, l_0'-\theta) - \Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta+1)\big], \qquad (23)$
where $E(\rho \mid \cdot)$ and $E(\lambda \mid \cdot)$ abbreviate $E(\rho \mid r_0+\theta K, r_0'-\theta K)$ and $E(\lambda \mid l_0+\theta, l_0'-\theta)$.
Through Equation (14) (and its left-arm analogue), we can obtain that
$\Delta_N^+(r_0+\theta K+1, r_0'-\theta K, l_0+\theta, l_0'-\theta) \ge \Delta_N^+(r_0+\theta K, r_0'-\theta K+1, l_0+\theta, l_0'-\theta), \qquad (24)$
$\Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta+1, l_0'-\theta) \le \Delta_N^-(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta+1). \qquad (25)$
By combining Equations (21), (22), (24) and (25), we obtain that the last two summands on the right side of Equation (23) are both nonpositive.
Then, by applying the induction hypothesis for $N$ at $(r_0+1, r_0', l_0, l_0')$, $(r_0, r_0'+1, l_0, l_0')$, $(r_0, r_0', l_0+1, l_0')$ and $(r_0, r_0', l_0, l_0'+1)$ — noting that $\Delta_N^+$ and $\Delta_N^-$ inherit the sign of the derivative of $\Delta_N$, since $x \mapsto x^+$ and $x \mapsto x^-$ are nondecreasing — we know that the first four summands on the right side of Equation (23) are all nonpositive. Hence, we now have
$\frac{d}{d\theta}\,\Delta_{N+1}(r_0+\theta K, r_0'-\theta K, l_0+\theta, l_0'-\theta) \le 0.$
Thus, Equation (20) holds for any positive integer N. □
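Lemma 1 can be probed numerically in the beta case (our illustration; the choice $K = 0$ and the grid of $\theta$ values below are arbitrary), checking that $\Delta_N$ is nonincreasing along $\theta$:

```python
# Continuing the beta-case sketch from Section 2.2 (K = 0 in Lemma 1).
r0, rp0, l0, lp0 = 2.0, 3.0, 4.0, 5.0
N = 6
vals = [delta_beta(N, r0, rp0, l0 + th, lp0 - th) for th in (0.0, 0.5, 1.0, 1.5)]
assert all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))  # nonincreasing in theta
```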
Now, we can prove the equivalence of the Statement and Berry’s conjecture:
Theorem 3.
The Statement holds if and only if Berry’s conjecture holds.
Proof. 
Assume that the Statement holds. When $\mu_R = \mu_L = \mu$, we have $E(\rho \mid l_0, l_0') = E(\lambda \mid l_0, l_0')$. If $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$, then the conclusion of Berry's conjecture holds by applying the Statement directly. So consider $r_0 + r_0' \le l_0 + l_0'$ and $E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0')$; then there must be $r_0' < l_0'$ by Equations (16) and (17) (otherwise $r_0 \le l_0$, and Equations (16) and (17) would give $E(\rho \mid r_0, r_0') \le E(\rho \mid l_0, l_0') = E(\lambda \mid l_0, l_0')$, a contradiction). If $r_0 \ge l_0$, then $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ by applying Corollary 2. Thus, we only need to prove the case where $r_0 < l_0$ and $r_0' < l_0'$.
In that case, the possibility region of $\mu$ contains at least all pairs $(r, r')$ satisfying $r \ge r_0$ and $r' \ge r_0'$ (see Section 2.1). Let $\theta > 0$. By Equations (18) and (19), $E(\lambda \mid l_0+\theta, l_0'-\theta) \ge E(\lambda \mid l_0, l_0')$, and this expectation is continuous and nondecreasing in $\theta$. If $\theta = l_0' - r_0'$, then $\theta > 0$, $l_0 + \theta > r_0$ and $l_0' - \theta = r_0'$. Due to $\mu_R = \mu_L$ and $E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0')$, we have
$E(\lambda \mid l_0+\theta, l_0'-\theta) \ge E(\lambda \mid r_0, r_0') = E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0').$
Then, there exists $\theta^* \le l_0' - r_0'$ such that $E(\lambda \mid l_0+\theta^*, l_0'-\theta^*) = E(\rho \mid r_0, r_0')$. Since $(l_0+\theta^*, l_0'-\theta^*)$ is an interior point of the possibility region of $\mu$, we can consider $\Delta_N(r_0, r_0', l_0+\theta^*, l_0'-\theta^*)$. With Lemma 1 (applied with $K = 0$), we obtain
$\Delta_N(r_0, r_0', l_0+\theta^*, l_0'-\theta^*) \le \Delta_N(r_0, r_0', l_0, l_0').$
Since $r_0 + r_0' \le l_0 + l_0' = (l_0+\theta^*) + (l_0'-\theta^*)$ and $E(\lambda \mid l_0+\theta^*, l_0'-\theta^*) = E(\rho \mid r_0, r_0')$, the Statement gives $\Delta_N(r_0, r_0', l_0+\theta^*, l_0'-\theta^*) \ge 0$ for any $N \ge 1$. Therefore, the desired result $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ holds. The converse implication is immediate, since the Statement is a special case of Berry's conjecture. □
Theorem 3 simplifies the condition $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$ in Berry's conjecture to the equality $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$. Unfortunately, the Statement is still not easy to prove. In the following, we continue this discussion in two important special cases, where R and L are two-point distributions or beta distributions.

4. Two-Point Distribution Case

In this section, we consider the situation where $\mu_R = \mu_L = \tau$ and the distribution $\tau$ is a two-point distribution placing probability $\frac{1}{2}$ on each of $\tau_1$ and $\tau_2$. Without causing confusion, we omit the $\tau$ in the notation (e.g., $E(\rho \mid r_0, r_0'; \tau)$ will be written as $E(\rho \mid r_0, r_0')$). In this case, we prove that the Statement holds and, thanks to the good properties of $\Delta_N$, obtain a conclusion more complete than Berry's conjecture. This is consistent with the conclusion that Berry obtained by discussing the contours of $\Delta_N$ in the $(r_0, r_0')$ plane.
In the following discussion, let $0 < \tau_1 < \tau_2 < 1$, so that all pairs $(r, r')$ in the plane are in the possibility region.
To prove the Statement, we first find the points in the possibility region where the expected values of $\rho$ and $\lambda$ are equal, i.e., where
$E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0'). \qquad (26)$
Due to Equations (7) and (8), Equation (26) is equivalent to
$\frac{\tau_1^{r_0+1}(1-\tau_1)^{r_0'} + \tau_2^{r_0+1}(1-\tau_2)^{r_0'}}{\tau_1^{r_0}(1-\tau_1)^{r_0'} + \tau_2^{r_0}(1-\tau_2)^{r_0'}} = \frac{\tau_1^{l_0+1}(1-\tau_1)^{l_0'} + \tau_2^{l_0+1}(1-\tau_2)^{l_0'}}{\tau_1^{l_0}(1-\tau_1)^{l_0'} + \tau_2^{l_0}(1-\tau_2)^{l_0'}}, \qquad (27)$
which holds if and only if
$(\tau_2 - \tau_1)\left[ \tau_1^{l_0}(1-\tau_1)^{l_0'}\,\tau_2^{r_0}(1-\tau_2)^{r_0'} - \tau_1^{r_0}(1-\tau_1)^{r_0'}\,\tau_2^{l_0}(1-\tau_2)^{l_0'} \right] = 0. \qquad (28)$
Since $0 < \tau_1 < \tau_2 < 1$, Equation (28) is equivalent to
$\tau_2^{r_0-l_0}(1-\tau_2)^{r_0'-l_0'} = \tau_1^{r_0-l_0}(1-\tau_1)^{r_0'-l_0'}. \qquad (29)$
Taking logarithms, we obtain
$(r_0-l_0)\log\tau_2 + (r_0'-l_0')\log(1-\tau_2) = (r_0-l_0)\log\tau_1 + (r_0'-l_0')\log(1-\tau_1). \qquad (30)$
Recall that for $N_R = r_0 + r_0'$ and $N_L = l_0 + l_0'$,
$r_0' - l_0' = N_R - N_L - (r_0 - l_0).$
Using Equations (29) and (30), we obtain that the relationship between $l_0$ and $r_0$ making Equation (26) hold is
$r_0 = l_0 - C_\tau (N_R - N_L), \qquad (31)$
where $C_\tau = \dfrac{\log(1-\tau_2) - \log(1-\tau_1)}{\log\tau_2 - \log\tau_1 + \log(1-\tau_1) - \log(1-\tau_2)}$; note that $C_\tau \in (-1, 0)$.
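As a numerical sanity check of the equivalence between Equations (26) and (31) (our illustration; the values of $\tau_1$, $\tau_2$ and the state below are arbitrary), one can compute $C_\tau$ and compare the two posterior means using the two_point_posterior sketch from Section 2.1:

```python
import math

def c_tau(tau1, tau2):
    """The constant C_tau of Equation (31); it always lies in (-1, 0)."""
    num = math.log(1 - tau2) - math.log(1 - tau1)
    den = (math.log(tau2) - math.log(tau1)
           + math.log(1 - tau1) - math.log(1 - tau2))
    return num / den

tau1, tau2 = 0.3, 0.6
l0, NL, NR = 2.0, 4.0, 6.0
r0 = l0 - c_tau(tau1, tau2) * (NR - NL)      # Equation (31)
_, _, mean_r = two_point_posterior(r0, NR - r0, tau1, tau2)
_, _, mean_l = two_point_posterior(l0, NL - l0, tau1, tau2)
assert abs(mean_r - mean_l) < 1e-12          # Equation (26) holds
```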
Next, we will show that $\Delta_N$ has a strong symmetry property when $\mu_R = \mu_L = \tau$. Note that since the possibility region of $\tau$ is the whole plane, the numbers $m$ and $n$ in Theorem 4 can be arbitrary, positive or negative:
Theorem 4.
If $\mu_R = \mu_L = \tau$, then for any positive integer $N$ and any numbers $l_0$, $m$ and $n$, we have
$\Delta_N(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_N(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n) = 0. \qquad (32)$
Proof. 
We will use mathematical induction to prove this theorem.
When $N = 1$, for any $m$ and $n$, we have
$\Delta_1(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_1(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n) = E(\rho \mid l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n) - E(\lambda \mid l_0,\ N_L - l_0) + E(\rho \mid l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L)) - E(\lambda \mid l_0 + m,\ N_L - l_0 + n). \qquad (33)$
By the equivalence between Equations (26) and (31), we have
$E(\rho \mid l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L)) = E(\lambda \mid l_0,\ N_L - l_0). \qquad (34)$
Letting $\tilde{l}_0 = l_0 + m$, $\tilde{N}_R = N_R + m + n$ and $\tilde{N}_L = N_L + m + n$, and applying Equation (34) to $\tilde{l}_0$, $\tilde{N}_R$ and $\tilde{N}_L$, we obtain
$E(\rho \mid l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n) = E(\lambda \mid l_0 + m,\ N_L - l_0 + n). \qquad (35)$
Equation (33), together with Equations (34) and (35), yields
$\Delta_1(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_1(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n) = 0.$
Then, Equation (32) has been proven for the case where N = 1 .
Now, assume that Equation (32) holds for $N$. For any $N_R$, $N_L$ and any numbers $l_0$, $m$ and $n$, we need to prove that Equation (32) also holds for $N+1$.
Consider the $N+1$ case. Using the recursive Equation (12), we have
$\Delta_{N+1}(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_{N+1}(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n)$
$= E(\rho \mid l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n) \times \Delta_N^+(l_0 - C_\tau(N_R-N_L) + m + 1,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0)$
$+ (1 - E(\rho \mid l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n)) \times \Delta_N^+(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n + 1,\ l_0,\ N_L - l_0)$
$+ E(\lambda \mid l_0,\ N_L - l_0) \times \Delta_N^-(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0 + 1,\ N_L - l_0)$
$+ (1 - E(\lambda \mid l_0,\ N_L - l_0)) \times \Delta_N^-(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0 + 1)$
$+ E(\rho \mid l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L)) \times \Delta_N^+(l_0 - C_\tau(N_R-N_L) + 1,\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n)$
$+ (1 - E(\rho \mid l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L))) \times \Delta_N^+(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L) + 1,\ l_0 + m,\ N_L - l_0 + n)$
$+ E(\lambda \mid l_0 + m,\ N_L - l_0 + n) \times \Delta_N^-(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m + 1,\ N_L - l_0 + n)$
$+ (1 - E(\lambda \mid l_0 + m,\ N_L - l_0 + n)) \times \Delta_N^-(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n + 1). \qquad (36)$
For $N_R$, $N_L$, $m+1$, $n$ and $l_0$, Equation (32) implies
$\Delta_N(l_0 - C_\tau(N_R-N_L) + m + 1,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_N(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m + 1,\ N_L - l_0 + n) = 0.$
Thus, if $\Delta_N(l_0 - C_\tau(N_R-N_L) + m + 1,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) \ge 0$, then there must be $\Delta_N(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m + 1,\ N_L - l_0 + n) \le 0$. Hence, we obtain
$\Delta_N^+(l_0 - C_\tau(N_R-N_L) + m + 1,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_N^-(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m + 1,\ N_L - l_0 + n) = 0. \qquad (37)$
If instead $\Delta_N(l_0 - C_\tau(N_R-N_L) + m + 1,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) \le 0$, then $\Delta_N(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m + 1,\ N_L - l_0 + n) \ge 0$, and thus Equation (37) also holds, with both terms equal to zero.
With Equations (35) and (37), we obtain that the sum of the first and seventh summands on the right side of Equation (36) is zero.
Similarly, for $N_R + 1$, $N_L + 1$, $m - 1$, $n$ and $l_0 + 1$, Equation (32) implies
$\Delta_N(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0 + 1,\ N_L - l_0) + \Delta_N(l_0 - C_\tau(N_R-N_L) + 1,\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n) = 0.$
Together with Equation (34), we obtain that the sum of the third and fifth summands on the right side of Equation (36) is also zero.
Using similar techniques (and the corresponding sign arguments for $\Delta_N^\pm$), we find that the sum of the second and eighth summands on the right side of Equation (36) and the sum of the fourth and sixth summands on the right side of Equation (36) are both zero. Hence, we have proven that
$\Delta_{N+1}(l_0 - C_\tau(N_R-N_L) + m,\ N_R - l_0 + C_\tau(N_R-N_L) + n,\ l_0,\ N_L - l_0) + \Delta_{N+1}(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0 + m,\ N_L - l_0 + n) = 0.$
In other words, Equation (32) holds for N + 1 . The theorem is proven by induction. □
Based on Theorem 4, we can conclude that the Statement holds for $\mu_R = \mu_L = \tau$. The following corollary shows that $\Delta_N(r_0, r_0', l_0, l_0') = 0$ when $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$, which is a conclusion stronger than our Statement:
Corollary 3.
Provided $\mu_R = \mu_L = \tau$, for any positive integer $N$ and any real numbers $l_0$, $N_R$ and $N_L$, we have
$\Delta_N(l_0 - C_\tau(N_R-N_L),\ N_R - l_0 + C_\tau(N_R-N_L),\ l_0,\ N_L - l_0) = 0.$
Proof. 
The corollary can be deduced from Theorem 4 for m = n = 0 . □
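Corollary 3 can also be checked numerically by combining the sketches above (again, an illustration of ours rather than part of the proof): along the curve of Equation (31), the computed $\Delta_N$ vanishes up to floating-point error.

```python
# Reuse two_point_posterior, make_delta, and the state (tau1, tau2, r0, l0, NR, NL)
# from the preceding sketches.
mean_tau = lambda a, b: two_point_posterior(a, b, tau1, tau2)[2]
delta_tau = make_delta(mean_tau, mean_tau)

for N in (1, 2, 5, 8):
    d = delta_tau(N, r0, NR - r0, l0, NL - l0)
    assert abs(d) < 1e-9, (N, d)   # Corollary 3: Delta_N = 0 on the curve
```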
Note that $N_R \le N_L$ is not needed in Theorem 4 or Corollary 3, and we can obtain the following result, which is stronger than Berry's conjecture and consistent with Theorem 8.3 in [5].
Theorem 5.
Provided $\mu_R = \mu_L = \tau$, for any positive integer $N$ and any real numbers $r_0, r_0', l_0$ and $l_0'$, we have
$\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ if $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$; $\qquad \Delta_N(r_0, r_0', l_0, l_0') \le 0$ if $E(\rho \mid r_0, r_0') \le E(\lambda \mid l_0, l_0')$.
Proof. 
If $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$, then by the equivalence between Equations (26) and (31) there is $r_0 = l_0 - C_\tau(N_R-N_L)$ and $r_0' = N_R - l_0 + C_\tau(N_R-N_L)$, where $N_R = r_0 + r_0'$ and $N_L = l_0 + l_0'$. Therefore, by Corollary 3, we have $\Delta_N(r_0, r_0', l_0, l_0') = 0$.
Consider the case $E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0')$. If $r_0 + r_0' \le l_0 + l_0'$, then with Theorem 3 we have $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$. If $r_0 + r_0' > l_0 + l_0'$, then by the equivalence between Equations (26) and (31) there is
$E(\lambda \mid r_0 + C_\tau(N_R-N_L),\ N_L - r_0 - C_\tau(N_R-N_L)) = E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0'). \qquad (38)$
Notice that
$r_0 + C_\tau(N_R-N_L) + N_L - r_0 - C_\tau(N_R-N_L) = N_L = l_0 + l_0',$
and with Equations (18) and (19), we obtain $l_0 < r_0 + C_\tau(N_R-N_L)$ and $l_0' > N_L - r_0 - C_\tau(N_R-N_L)$. Let $\theta = r_0 + C_\tau(N_R-N_L) - l_0$. Then, we have $\theta > 0$, and Equation (38) becomes
$E(\lambda \mid l_0+\theta, l_0'-\theta) = E(\rho \mid r_0, r_0') > E(\lambda \mid l_0, l_0').$
By applying Lemma 1 with $K = 0$ and Corollary 3, there is
$\Delta_N(r_0, r_0', l_0, l_0') \ge \Delta_N(r_0, r_0', l_0+\theta, l_0'-\theta) = 0.$
Therefore, we have $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ whenever $E(\rho \mid r_0, r_0') \ge E(\lambda \mid l_0, l_0')$. The case $E(\rho \mid r_0, r_0') \le E(\lambda \mid l_0, l_0')$ can be proven by a similar method. □

5. Beta Distribution Case

In this section, we consider the case where the prior distribution $I_0 = (R, L)$ has the special structure $\mu_R = \mu_L = \beta$, where $\beta$ is defined by Equation (9). Then, R and L are both beta distributions.
Although we believe that the Statement holds in this case, only a partial result is obtained. In this case, the expectations of the parameters $\rho$ and $\lambda$ are $E(\rho \mid r_0, r_0') = \frac{r_0}{r_0+r_0'}$ and $E(\lambda \mid l_0, l_0') = \frac{l_0}{l_0+l_0'}$, which are increasing in $r_0$ and $l_0$ and decreasing in $r_0'$ and $l_0'$, respectively. Therefore, $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$ if and only if $\frac{r_0}{r_0'} = \frac{l_0}{l_0'}$.
The following theorem shows that $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ holds for $N \le \frac{l_0}{l_0'} + 1$ when $E(\rho \mid r_0, r_0') = E(\lambda \mid l_0, l_0')$:
Theorem 6.
If $\mu_R = \mu_L = \beta$, $0 < \varphi \le 1$ and $l_0, l_0' > 0$, and if for a fixed positive integer $N^*$ we have $\frac{l_0}{l_0'} \ge N^* - 1$, then for any integer $0 < N \le N^*$, there is $\Delta_N(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0$.
Proof. 
Using mathematical induction again, we prove that if $\frac{l_0}{l_0'} \ge N^* - 1$ and $0 < N \le N^*$, then the following two inequalities hold:
$\Delta_N(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0, \qquad (42)$
and for any integer $1 \le m \le N^* + 1 - N$,
$\Delta_N^+(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m, l_0') \ge 0. \qquad (43)$
Through the exact expression (13) for $\Delta_1$, it is easy to verify the above inequalities for $N = 1$:
$\Delta_1(\varphi l_0, \varphi l_0', l_0, l_0') = E(\rho \mid \varphi l_0, \varphi l_0') - E(\lambda \mid l_0, l_0') = \frac{\varphi l_0}{\varphi l_0 + \varphi l_0'} - \frac{l_0}{l_0 + l_0'} = 0.$
For any $m \ge 0$, we have
$\Delta_1(\varphi l_0 + m, \varphi l_0', l_0, l_0') = E(\rho \mid \varphi l_0 + m, \varphi l_0') - E(\lambda \mid l_0, l_0') = \frac{\varphi l_0 + m}{\varphi l_0 + m + \varphi l_0'} - \frac{l_0}{l_0 + l_0'} \ge 0,$
$\Delta_1(\varphi l_0, \varphi l_0', l_0 + m, l_0') = E(\rho \mid \varphi l_0, \varphi l_0') - E(\lambda \mid l_0 + m, l_0') = \frac{\varphi l_0}{\varphi l_0 + \varphi l_0'} - \frac{l_0 + m}{l_0 + m + l_0'} \le 0.$
Since $0 < \varphi \le 1$, we have
$\Delta_1^+(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_1^-(\varphi l_0, \varphi l_0', l_0 + m, l_0') = \Delta_1(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_1(\varphi l_0, \varphi l_0', l_0 + m, l_0') = \frac{\varphi l_0 + m}{\varphi l_0 + \varphi l_0' + m} - \frac{l_0}{l_0 + l_0'} + \frac{\varphi l_0}{\varphi l_0 + \varphi l_0'} - \frac{l_0 + m}{l_0 + l_0' + m} = \frac{\varphi l_0 + m}{\varphi(l_0 + l_0') + m} - \frac{l_0 + m}{l_0 + l_0' + m} \ge 0.$
Hence, Equations (42) and (43) hold for $N = 1$.
Now, we assume that Equations (42) and (43) hold for some $1 \le N \le N^* - 1$, and we will show that they also hold for $N + 1$. Note that when we consider $N + 1$ pulls, the condition on $m$ becomes $1 \le m \le N^* - N$.
First, it can be deduced from Equation (12) that
$\Delta_{N+1}(\varphi l_0, \varphi l_0', l_0, l_0') = E(\rho \mid \varphi l_0, \varphi l_0')\,\Delta_N^+(\varphi l_0 + 1, \varphi l_0', l_0, l_0') + (1 - E(\rho \mid \varphi l_0, \varphi l_0'))\,\Delta_N^+(\varphi l_0, \varphi l_0' + 1, l_0, l_0') + E(\lambda \mid l_0, l_0')\,\Delta_N^-(\varphi l_0, \varphi l_0', l_0 + 1, l_0') + (1 - E(\lambda \mid l_0, l_0'))\,\Delta_N^-(\varphi l_0, \varphi l_0', l_0, l_0' + 1)$
$= \frac{l_0}{l_0 + l_0'}\left[ \Delta_N^+(\varphi l_0 + 1, \varphi l_0', l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0 + 1, l_0') \right] + \frac{l_0'}{l_0 + l_0'}\left[ \Delta_N^+(\varphi l_0, \varphi l_0' + 1, l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0, l_0' + 1) \right].$
Using the assumption for $N$ with $m = 1$, we have
$\Delta_N(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0, \qquad \Delta_N^+(\varphi l_0 + 1, \varphi l_0', l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0 + 1, l_0') \ge 0.$
Since $E(\lambda \mid l_0, l_0' + 1 + N - 1) < E(\lambda \mid l_0, l_0' + N - 1)$, by using Remark 2 we can obtain
$\Delta_N(\varphi l_0, \varphi l_0', l_0, l_0' + 1) \ge \Delta_N(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0.$
Hence, $\Delta_N^-(\varphi l_0, \varphi l_0', l_0, l_0' + 1) = 0$, and consequently Equation (42) holds for $N + 1$; in other words, we have
$\Delta_{N+1}(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0.$
Next, we will prove that Equation (43) also holds for $N + 1$. For any $1 \le m \le N^* - N$, because $E(\rho \mid \varphi l_0 + m + N, \varphi l_0') > E(\rho \mid \varphi l_0 + N, \varphi l_0')$, by using Theorem 1 we obtain
$\Delta_{N+1}(\varphi l_0 + m, \varphi l_0', l_0, l_0') \ge \Delta_{N+1}(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0.$
Then, there is $\Delta_{N+1}^+(\varphi l_0 + m, \varphi l_0', l_0, l_0') = \Delta_{N+1}(\varphi l_0 + m, \varphi l_0', l_0, l_0')$.
If $\Delta_{N+1}(\varphi l_0, \varphi l_0', l_0 + m, l_0') \ge 0$, then the inequality in Equation (43) is proven. If $\Delta_{N+1}(\varphi l_0, \varphi l_0', l_0 + m, l_0') < 0$, then with Equation (12) we have
$\Delta_{N+1}^+(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_{N+1}^-(\varphi l_0, \varphi l_0', l_0 + m, l_0') = \Delta_{N+1}(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_{N+1}(\varphi l_0, \varphi l_0', l_0 + m, l_0')$
$= \frac{\varphi l_0 + m}{\varphi l_0 + \varphi l_0' + m}\,\Delta_N^+(\varphi l_0 + m + 1, \varphi l_0', l_0, l_0') + \frac{\varphi l_0'}{\varphi l_0 + \varphi l_0' + m}\,\Delta_N^+(\varphi l_0 + m, \varphi l_0' + 1, l_0, l_0')$
$+ \frac{l_0}{l_0 + l_0'}\left[ \Delta_N^-(\varphi l_0 + m, \varphi l_0', l_0 + 1, l_0') + \Delta_N^+(\varphi l_0 + 1, \varphi l_0', l_0 + m, l_0') \right]$
$+ \frac{l_0'}{l_0 + l_0'}\left[ \Delta_N^-(\varphi l_0 + m, \varphi l_0', l_0, l_0' + 1) + \Delta_N^+(\varphi l_0, \varphi l_0' + 1, l_0 + m, l_0') \right]$
$+ \frac{l_0 + m}{l_0 + l_0' + m}\,\Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m + 1, l_0') + \frac{l_0'}{l_0 + l_0' + m}\,\Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m, l_0' + 1). \qquad (44)$
Using the fact that $m \le N^* - N$, we have $m + 1 \le N^* + 1 - N$. Thus, we can apply Equation (43) for $N$ pulls with $m + 1$ and obtain
$\Delta_N^+(\varphi l_0 + m + 1, \varphi l_0', l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m + 1, l_0') \ge 0.$
Since $0 < \varphi \le 1$, the sum of the first and fifth summands on the right side of Equation (44) satisfies
$\frac{\varphi l_0 + m}{\varphi l_0 + \varphi l_0' + m}\,\Delta_N^+(\varphi l_0 + m + 1, \varphi l_0', l_0, l_0') + \frac{l_0 + m}{l_0 + l_0' + m}\,\Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m + 1, l_0') \ge \frac{l_0 + m}{l_0 + l_0' + m}\left[ \Delta_N^+(\varphi l_0 + m + 1, \varphi l_0', l_0, l_0') + \Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m + 1, l_0') \right] \ge 0.$
Since $m \ge 1 \ge \varphi$ and $E(\rho \mid \varphi l_0 + m, \varphi l_0') \ge E(\rho \mid \varphi(l_0 + 1), \varphi l_0')$, we can obtain by Theorem 1 that
$\Delta_N(\varphi l_0 + m, \varphi l_0', l_0 + 1, l_0') \ge \Delta_N(\varphi(l_0 + 1), \varphi l_0', l_0 + 1, l_0').$
Due to the fact that $\frac{l_0 + 1}{l_0'} > \frac{l_0}{l_0'} \ge N^* - 1$, we can apply the assumption for $N$ pulls and obtain $\Delta_N(\varphi(l_0 + 1), \varphi l_0', l_0 + 1, l_0') \ge 0$. Consequently, we have
$\Delta_N^-(\varphi l_0 + m, \varphi l_0', l_0 + 1, l_0') = 0.$
Therefore, the third summand on the right side of Equation (44) is nonnegative.
By using Theorem 1 twice, we obtain
$\Delta_N(\varphi l_0 + m, \varphi l_0', l_0, l_0' + 1) \ge \Delta_N(\varphi l_0, \varphi l_0', l_0, l_0') \ge 0.$
Hence, $\Delta_N^-(\varphi l_0 + m, \varphi l_0', l_0, l_0' + 1) = 0$, and the fourth summand on the right side of Equation (44) is nonnegative.
Now, we only need to consider $\Delta_N(\varphi l_0, \varphi l_0', l_0 + m, l_0' + 1)$. As we know, $m \le N^* - N \le N^* - 1 \le \frac{l_0}{l_0'}$. It follows from Remark 2 that
$\Delta_N(\varphi l_0, \varphi l_0', l_0 + m, l_0' + 1) \ge \Delta_N(\varphi l_0, \varphi l_0', l_0 + \tfrac{l_0}{l_0'}, l_0' + 1).$
Let $\tilde{l}_0 = l_0(1 + \frac{1}{l_0'})$, $\tilde{l}_0' = l_0'(1 + \frac{1}{l_0'})$ and $\tilde{\varphi} = \varphi\,\frac{l_0'}{1 + l_0'}$. Then, $0 < \tilde{\varphi} \le 1$ and $\frac{\tilde{l}_0}{\tilde{l}_0'} = \frac{l_0}{l_0'} \ge N^* - 1$ still hold. Hence, the assumption for $N$ can be applied, and we obtain
$\Delta_N(\varphi l_0, \varphi l_0', l_0 + \tfrac{l_0}{l_0'}, l_0' + 1) = \Delta_N(\tilde{\varphi}\tilde{l}_0, \tilde{\varphi}\tilde{l}_0', \tilde{l}_0, \tilde{l}_0') \ge 0.$
Then, we obtain $\Delta_N^-(\varphi l_0, \varphi l_0', l_0 + m, l_0' + 1) = 0$, so the sixth summand on the right side of Equation (44) is also nonnegative; the remaining second summand is nonnegative since $\Delta_N^+ \ge 0$. Thus, all summands on the right side of Equation (44) are nonnegative, and Equation (43) holds for $N + 1$; that is, for any $1 \le m \le N^* - N$, we have
$\Delta_{N+1}^+(\varphi l_0 + m, \varphi l_0', l_0, l_0') + \Delta_{N+1}^-(\varphi l_0, \varphi l_0', l_0 + m, l_0') \ge 0.$
Then, the theorem is proven by mathematical induction. □
The Statement is partially proven in Theorem 6, and hence we can deduce a partial result about Berry's conjecture.
Consider $I_0 = (r_0, r_0', \beta;\ l_0, l_0', \beta)$, where $r_0, r_0', l_0$ and $l_0'$ are positive real numbers satisfying $r_0 + r_0' \le l_0 + l_0'$ and $\frac{r_0}{r_0'} \ge \frac{l_0}{l_0'}$. Using the properties of the beta distribution,
$E(\rho \mid r_0, r_0') = \frac{r_0}{r_0 + r_0'} \ge \frac{l_0}{l_0 + l_0'} = E(\lambda \mid l_0, l_0').$
For $\theta \ge 0$ and $K \le 0$ such that $r_0 > -\theta K$ and $l_0' > \theta$, applying Lemma 1 gives
$\Delta_N(r_0 + \theta K, r_0' - \theta K, l_0 + \theta, l_0' - \theta) \le \Delta_N(r_0, r_0', l_0, l_0') \quad \text{for any } N.$
In order to use Theorem 6, we require
$\frac{r_0 + \theta K}{r_0' - \theta K} = \frac{l_0 + \theta}{l_0' - \theta},$
which is equivalent to $r_0 l_0' - r_0' l_0 = \theta\left[ r_0 + r_0' - K(l_0 + l_0') \right]$. To make $\theta$ as large as possible, we choose $K = 0$ and $\theta = \frac{r_0 l_0' - r_0' l_0}{r_0 + r_0'}$. Now, let
$N^* = \left\lfloor \frac{l_0 + \theta}{l_0' - \theta} \right\rfloor + 1 = \left\lfloor \frac{r_0}{r_0'} \right\rfloor + 1.$
We obtain the following result by using Theorem 6:
Theorem 7.
Let $I_0 = (r_0, r_0', \beta;\ l_0, l_0', \beta)$, where $r_0, r_0', l_0$ and $l_0'$ are positive real numbers satisfying $r_0 + r_0' \le l_0 + l_0'$ and $\frac{r_0}{r_0'} \ge \frac{l_0}{l_0'}$, and let $N^* = \lfloor \frac{r_0}{r_0'} \rfloor + 1$. Then, for $1 \le N \le N^*$, there is
$\Delta_N(r_0, r_0', l_0, l_0') \ge 0.$
Proof. 
Let $\theta = \frac{r_0 l_0' - r_0' l_0}{r_0 + r_0'}$ and $\varphi = \frac{r_0}{l_0 + \theta}$. If $r_0 \ge l_0$, then $\Delta_N(r_0, r_0', l_0, l_0') \ge 0$ by using Corollary 2.
If $r_0 < l_0$, then we know that $0 < \varphi < 1$, and using the fact that $\frac{l_0 + \theta}{l_0' - \theta} \ge N^* - 1$, we obtain from Theorem 6 that for $N \le N^*$,
$\Delta_N(r_0, r_0', l_0 + \theta, l_0' - \theta) = \Delta_N(\varphi(l_0 + \theta), \varphi(l_0' - \theta), l_0 + \theta, l_0' - \theta) \ge 0.$
It can be deduced from Lemma 1 that $\Delta_N(r_0, r_0', l_0, l_0') \ge \Delta_N(r_0, r_0', l_0 + \theta, l_0' - \theta) \ge 0$. Therefore, the proof is completed. □
Obviously $N^* > 1$, and thus Berry's conjecture always holds for $N = 1$. However, to use Theorem 7 for large values of $N$, we need $N^*$ to be large enough, which means that $\frac{r_0}{r_0'}$ must be very large. This is consistent with the intuitive impression that the greater the expectation of an arm, the greater its advantage in the choice.
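Continuing the beta-case sketch from Section 2.2, the bound of Theorem 7 can be exercised numerically (our illustration; the state below is an arbitrary admissible choice):

```python
import math

# r0 + r0' <= l0 + l0' and r0/r0' >= l0/l0', as Theorem 7 requires.
r0, rp0, l0, lp0 = 3.0, 1.0, 3.0, 1.5
N_star = math.floor(r0 / rp0) + 1            # N* = 4 here

for N in range(1, N_star + 1):
    d = delta_beta(N, r0, rp0, l0, lp0)      # delta_beta from Section 2.2
    assert d >= -1e-12, (N, d)               # Theorem 7: Delta_N >= 0 for N <= N*
```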

6. Conclusions

As we mentioned in the introduction, the bandit model addresses the trade-off between exploration and exploitation, and Berry's conjecture is an intuitive conjecture about this trade-off. In this paper, we reveal the essence of Berry's conjecture by proving its equivalence with our Statement. The Statement is easier to verify than Berry's conjecture and thus provides a new route toward proving it. We also proved that Berry's conjecture holds in two specific models, which can speed up the computation of optimal strategies. We believe that Berry's conjecture is correct for most positive measures $\mu$. In the future, we will apply our Statement to prove Berry's conjecture for other specific models.

Author Contributions

Conceptualization, P.W. and J.Z.; methodology, P.W. and J.Z.; validation, J.Z.; formal analysis, P.W. and J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, P.W. and J.Z.; supervision, P.W.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (No. 2018YFA0703900) and the Natural Science Foundation of Shandong Province (Nos. ZR2021MA098 and ZR2019ZD41).

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current (theoretical) study.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Thompson, W.R. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika 1933, 25, 285.
  2. Rothschild, M. A two-armed bandit theory of market pricing. J. Econ. Theory 1974, 9, 185–202.
  3. Liberali, G.B.; Hauser, J.R.; Urban, G.L. Morphing Theory and Applications. In International Series in Operations Research & Management Science; Springer International Publishing: Berlin/Heidelberg, Germany, 2017; pp. 531–562.
  4. Aggarwal, C.C. Recommender Systems; Springer International Publishing: Berlin/Heidelberg, Germany, 2016.
  5. Berry, D.A. A Bernoulli Two-armed Bandit. Ann. Math. Stat. 1972, 43, 871–897.
  6. Berry, D.A.; Chen, R.W.; Zame, A.; Heath, D.C.; Shepp, L.A. Bandit problems with infinitely many arms. Ann. Stat. 1997, 25, 2103–2116.
  7. Lin, C.T.; Shiau, C.J. Some Optimal Strategies for Bandit Problems with Beta Prior Distributions. Ann. Inst. Stat. Math. 2000, 52, 397–405.
  8. Steyvers, M.; Lee, M.D.; Wagenmakers, E.J. A Bayesian analysis of human decision-making on bandit problems. J. Math. Psychol. 2009, 53, 168–179.
  9. Jacko, P. The Finite-Horizon Two-Armed Bandit Problem with Binary Responses: A Multidisciplinary Survey of the History, State of the Art, and Myths; Working Paper 3; Lancaster University Management School: Lancaster, UK, 2019.
  10. Joshi, V.M. A Conjecture of Berry Regarding A Bernoulli Two-Armed Bandit. Ann. Stat. 1975, 3, 189–202; Correction in Ann. Stat. 1985, 13, 1249.
  11. Yue, J.C. Generalized two-stage bandit problem. Commun. Stat.-Theory Methods 1999, 28, 2261–2276.
  12. Gittins, J.; Jones, D. A Dynamic Allocation Index for the Sequential Design of Experiments. In Progress in Statistics; Gani, J., Ed.; North-Holland: Amsterdam, The Netherlands, 1974; pp. 241–266.
  13. Whittle, P. Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 1988, 25, 287–298.
  14. Weber, R.R.; Weiss, G. On an index policy for restless bandits. J. Appl. Probab. 1990, 27, 637–648.
  15. Ahmad, S.H.A.; Liu, M.; Javidi, T.; Zhao, Q.; Krishnamachari, B. Optimality of Myopic Sensing in Multichannel Opportunistic Access. IEEE Trans. Inf. Theory 2009, 55, 4040–4050.
  16. Deo, S.; Iravani, S.; Jiang, T.; Smilowitz, K.; Samuelson, S. Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model. Oper. Res. 2013, 61, 1277–1294.
  17. Gittins, J.; Glazebrook, K.; Weber, R. Multi-Armed Bandit Allocation Indices; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2011.
  18. Lee, E.; Lavieri, M.S.; Volk, M. Optimal Screening for Hepatocellular Carcinoma: A Restless Bandit Model. Manuf. Serv. Oper. Manag. 2019, 21, 198–212.
  19. Mahajan, A.; Teneketzis, D. Multi-Armed Bandit Problems. In Foundations and Applications of Sensor Management; Springer US: New York, NY, USA, 2008; pp. 121–151.
  20. Washburn, R.B. Application of Multi-Armed Bandits to Sensor Management. In Foundations and Applications of Sensor Management; Springer US: New York, NY, USA, 2008; pp. 153–175.
  21. Gast, N.; Gaujal, B.; Khun, K. Computing Whittle (and Gittins) Index in Subcubic Time. arXiv 2022, arXiv:2203.05207.
  22. Niño-Mora, J. A (2/3)n³ Fast-Pivoting Algorithm for the Gittins Index and Optimal Stopping of a Markov Chain. INFORMS J. Comput. 2007, 19, 596–606.
  23. Berry, D.A.; Fristedt, B. Bandit Problems: Sequential Allocation of Experiments; Springer: Dordrecht, The Netherlands, 1985.
  24. Gittins, J.; Wang, Y.G. The Learning Component of Dynamic Allocation Indices. Ann. Stat. 1992, 20, 1625–1636.
  25. Lai, T.; Robbins, H. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 1985, 6, 4–22.
  26. Agrawal, R. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem. Adv. Appl. Probab. 1995, 27, 1054–1078.
  27. Audibert, J.Y.; Bubeck, S. Regret Bounds and Minimax Policies under Partial Monitoring. J. Mach. Learn. Res. 2010, 11, 2785–2836.
  28. Audibert, J.Y.; Munos, R.; Szepesvári, C. Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 2009, 410, 1876–1902.
  29. Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time Analysis of the Multiarmed Bandit Problem. Mach. Learn. 2002, 47, 235–256.
  30. Cappé, O.; Garivier, A.; Maillard, O.A.; Munos, R.; Stoltz, G. Kullback–Leibler upper confidence bounds for optimal sequential allocation. Ann. Stat. 2013, 41, 1516–1541.
  31. Honda, J.; Takemura, A. An Asymptotically Optimal Bandit Algorithm for Bounded Support Models. In Proceedings of COLT 2010, Haifa, Israel, 27–29 June 2010; pp. 67–79.
  32. Lai, T.L. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. Ann. Stat. 1987, 15, 1091–1114.
  33. Kaufmann, E. On Bayesian index policies for sequential resource allocation. Ann. Stat. 2018, 46, 842–865.