Asymptotic Duration for Optimal Multiple Stopping Problems

Abstract: We study the asymptotic duration of optimal stopping problems involving a sequence of independent random variables that are drawn from a known continuous distribution. These variables are observed sequentially, no recall of previous observations is permitted, and the objective is to form an optimal strategy that maximises the expected reward. In previous work, we presented a methodology, borrowing techniques from applied mathematics, for obtaining asymptotic expressions for the expected duration of the optimal stopping time when one stop is permitted. In this study, we generalise further to the case where more than one stop is permitted, with an updated objective function of maximising the expected sum of the variables chosen. We formulate a complete generalisation for an exponential family as well as for the uniform distribution by utilising an inductive approach in the formulation of the stopping rule. Explicit examples are given for common probability distributions, together with simulations that verify the asymptotic results.


Introduction
Optimal stopping problems are formulated in terms of observing random variables and determining the stopping time(s) in order to maximise an objective function (which can be thought of as some reward). The discrete single-stopping problem involves observing a sequence of random variables y_1, y_2, …, y_N and making the choice to stop on a particular observation y_m, where 1 ≤ m ≤ N, based only on the variables that have been previously observed. After stopping, the "player" receives some pay-off, which is a function of the variables observed y_1, …, y_m. In [1], we used the pay-off y_m (i.e., the value of the variable stopped on). To extend this to the multiple optimal stopping problem, we stop on k > 1 variables and, after stopping on variables y_{m_1}, y_{m_2}, …, y_{m_k} at times m_1, m_2, …, m_k, we receive a gain which is a function of the variables observed. This problem is a subset of a general class of optimal stopping problems that all aim to find a sequential procedure maximising the expected reward (see Section 13.4 of [2] for a more extensive discussion of this class of problem). The secretary problem is arguably the best known (see [3,4]), and it has a wide range of variations (see [5,6]), but there is also a rich literature of other examples (see [7]). By the 'duration' (sometimes referred to as 'time') of the stopping problem, we refer to how many observations the statistician observes before optimally stopping. Knowing how long one would wait, on average, for potentially large sequences of observations is also useful to understand (see [8]), which highlights the need for asymptotic analysis to address this question.
Less focus has been placed on understanding the asymptotic behaviour of the stopping duration, with most pre-existing results focusing on secretary-type, so-called "no-information", problems, where the distribution of the observations is unknown. The asymptotic expectation and variance of the stopping time for the secretary problem were studied in [9]. Similar asymptotic analyses for other variants of no-information problems can be found in [10-13], where the techniques and formulations used differ depending on the particular variation or structure of the problem.
There is substantially less literature describing the asymptotic behaviour for "full-information" problems, in which the distribution of the variables is known a priori. A smaller subset still addresses the multiple stopping problem, which we focus on in this study.
Gilbert and Mosteller [11] studied the optimal stopping strategy for the full-information problem in which the objective is to maximise the probability of attaining the best observation, known as the full-information best-choice problem [14]. The optimal rule was shown to be a threshold strategy wherein the player stops on y_m if it is the best observation so far and its value exceeds a threshold that depends on m. The asymptotic behaviour of this rule was also derived.
In the full-information case, the pay-off can instead be expressed in terms of the actual values of the variables stopped on. A special case of this is the uniform game (see Section 5a of [11]), which is closely related to Cayley's problem (see [3,15]). In [11], the authors derived an asymptotic expression for the expected reward of a sequence of n independent and identically distributed (iid) random variables having the standard uniform distribution (see also [15]). In [9], Mazalov and Peshkov found the asymptotic behaviour of the expected value and variance of the stopping time to be N/3 and N²/18, respectively, when the variables are from the uniform distribution. However, the techniques applied were specific to the structure of the distribution, and it is therefore difficult to extend or generalise them to other distribution functions. In [16,17], using extreme value theory, Kennedy and Kertz proved limit theorems for threshold-stopped random variables and derived the asymptotic distribution of the reward sequence of the optimally stopped (iid) random variables. The asymptotic pay-off for the multiple case is briefly analysed in Section 5c of [11].
In [1], we outlined a novel approach, based on a general asymptotic technique, for calculating the asymptotic behaviour of the pay-off, as well as E(τ_N) and Var(τ_N) in the single-stopping case, where τ_N is the single-stopping time, as N → ∞, for general classes of probability distributions in the full-information problem in which we wish to maximise the expected reward y_m. The techniques of our previous paper, which are extended in this study, employ the asymptotic analysis of difference and differential equations in order to establish and solve asymptotic differential equations for the quantities of interest. Differential equations were also used in [18,19]. In this study, we extend some of our results in [1] to the multiple stopping case through inductive arguments and verify these results with simulations. For simplicity, we only analyse continuous distributions and reserve the notation f(x), F(x), and h(x) = 1 − F(x) for the continuous probability density, cumulative distribution, and "survivor" functions, respectively. We use the notation f(x) ∼ g(x) to denote the asymptotic relation lim_{x→∞} f(x)/g(x) = 1.

Formulation of the Multiple Optimal Stopping Problem
As in the single-stopping problem, we sequentially observe the sequence y_1, y_2, …, y_N of independent, identically distributed (iid) random variables from a known distribution, but we must now decide which k of these variables to stop on. After k ≥ 2 stops at times m_1 < m_2 < … < m_k, the player receives the gain y_{m_1} + y_{m_2} + … + y_{m_k}. The random variable y_m can be interpreted as the value of some asset, such as a house, at time m. The problem of selling k identical objects in a finite time (or horizon) N, with one offer per time period and no recall of previous offers, is analogous to the multiple optimal stopping problem described. If we stop at m_1 after the observations (y_1, y_2, …, y_{m_1}), then we proceed to observe another sequence y_{m_1+1}, y_{m_1+2}, …, y_N (whose length depends on m_1) and must solve the new optimal stopping problem on this sequence. From [20], we have the following theorem.
Theorem 1. Let y_1, y_2, …, y_N be a sequence of independent random variables with known cumulative distribution functions (cdfs) F_1, F_2, …, F_N. Let v_{L,l} be the value, which is the optimal expected reward, of a game with l, l ≤ k, stops and L, L ≤ N, steps. If E(y_1), E(y_2), …, E(y_N) exist, then the value v of the 'game' is v_{N,k}, where the values satisfy the recurrence

v_{L,l} = E[max(y_{N−L+1} + v_{L−1,l−1}, v_{L−1,l})] for l < L,

with v_{L,0} = 0 and v_{l,l} = E(y_{N−l+1}) + … + E(y_N). We put the rule that stops, when l stops remain at observation m, as soon as y_m ≥ v_{N−m,l} − v_{N−m,l−1}; this is the optimal stopping rule.
If we put v̄_{i,j} = v_{N−j,k−i+1} − v_{N−j,k−i}, for i = 1, …, k and j = 1, …, N, then we may notice that the stopping rules now take the form: stop for the ith time at the first observation j > τ*_{i−1} for which y_j ≥ v̄_{i,j}. We may therefore interpret v̄_{i,j} as the appropriate threshold value that needs to be satisfied to stop for the ith occasion at the jth term in the sequence of N observations. We can then define w_{i,j} = P(y < v̄_{i,j}), which can be interpreted as the probability that, in the above situation, we do not stop.
We derived the equations for v_{n,1} in the original paper (for details, see [1]):

v_{n+1,1} = v_{n,1} + ∫_{v_{n,1}}^∞ h(y) dy,

where v_{0,1} = 0. For v_{n,2}, we have that

v_{n+1,2} = v_{n,2} + ∫_{v_{n,2} − v_{n,1}}^∞ h(y) dy,

which can then be used to numerically determine the values of the 'game'.
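For concreteness, these recurrences can be iterated numerically. The following sketch (Python; we assume the standard uniform distribution on [0, 1], for which ∫_t^1 h(y) dy = (1 − t)²/2, and the function name is ours) computes v_{n,1} and v_{n,2}:

```python
def uniform_values(n_max):
    """Iterate the value recurrences for the standard uniform distribution.

    With h(y) = 1 - y on [0, 1]:
        v_{n+1,1} = v_{n,1} + (1 - v_{n,1})**2 / 2
        v_{n+1,2} = v_{n,2} + (1 - (v_{n,2} - v_{n,1}))**2 / 2
    starting from v_{0,1} = v_{0,2} = 0.
    """
    v1, v2 = [0.0], [0.0]
    for n in range(n_max):
        v1.append(v1[n] + (1.0 - v1[n]) ** 2 / 2.0)
        v2.append(v2[n] + (1.0 - (v2[n] - v1[n])) ** 2 / 2.0)
    return v1, v2
```

Iterating to n = 6 reproduces values such as v_{1,1} = 0.5, v_{3,1} = 0.6953125, and v_{3,2} = 1.1953125.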
For the example of N = 6, we may produce the values of v_{n,1} and v_{n,2} for each value of n from n = 1 to n = 6, as displayed in Table 1. For example, consider the following sequence of simulated variables: 0.1081, 0.6987, 0.1483, 0.4123, 0.8968, 0.7242.
For the first stop, we stop on y_2 = 0.6987, since y_2 is the first variable to satisfy y_{m_1} ≥ v_{N−m_1,2} − v_{N−m_1,1}, and for the second, we stop on y_5 = 0.8968, as this is the first subsequent variable for which y_{m_2} ≥ v_{N−m_2,1}. This particular example would have resulted in the reward 0.6987 + 0.8968 = 1.5955.
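This play-through can be reproduced in code. The sketch below (Python; standard uniform on [0, 1], with our own helper names, and thresholds generated by the value recurrences) applies the two-stop rule to the simulated sequence:

```python
def uniform_thresholds(n_max):
    # v_{n,1} and v_{n,2} for the standard uniform distribution, via
    # v_{n+1,k} = v_{n,k} + (1 - (v_{n,k} - v_{n,k-1}))**2 / 2 with v_{n,0} = 0.
    v1, v2 = [0.0], [0.0]
    for n in range(n_max):
        v1.append(v1[n] + (1.0 - v1[n]) ** 2 / 2.0)
        v2.append(v2[n] + (1.0 - (v2[n] - v1[n])) ** 2 / 2.0)
    return v1, v2

def play_two_stops(ys):
    """Apply the two-stop threshold rule to the sequence ys; return the reward."""
    N = len(ys)
    v1, v2 = uniform_thresholds(N)
    reward, stops = 0.0, 2
    for m, y in enumerate(ys, start=1):
        remaining = N - m  # observations left after this one
        # Threshold: v_{n,2} - v_{n,1} while two stops remain, v_{n,1} for the last.
        threshold = (v2[remaining] - v1[remaining]) if stops == 2 else v1[remaining]
        if y >= threshold:
            reward += y
            stops -= 1
            if stops == 0:
                break
    return reward

seq = [0.1081, 0.6987, 0.1483, 0.4123, 0.8968, 0.7242]
```

Running `play_two_stops(seq)` stops on y_2 and y_5 and returns the reward 1.5955, matching the worked example.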

Computing the v_{n,k} Behaviour
We establish a recurrence result similar to that obtained for the single-stopping case, noting that the relation for v_{n,k} will now be a second-order relation. For a convenient evaluation of the expectation, we use the fact that, for any continuous integrable random variable X with cdf F(x) and survivor function h(x) = 1 − F(x), the expectation of the positive part (X − t)⁺ = max(X − t, 0) can be given by

E[(X − t)⁺] = ∫_t^∞ h(x) dx. (3)

Theorem 2. Let Y be an integrable random variable whose expectation exists, and which is drawn from a continuous probability distribution function (pdf) f(y) with survivor function h(y) = 1 − F(y). The value of a sequence with n + 1 steps and k stops remaining is given by

v_{n+1,k} = v_{n,k} + ∫_{v_{n,k} − v_{n,k−1}}^∞ h(y) dy.

Proof. For ease of notation, we let t_n = v_{n,k} − v_{n,k−1}. Then, by definition, we have

v_{n+1,k} = E[max(Y + v_{n,k−1}, v_{n,k})] = v_{n,k−1} + t_n + E[(Y − t_n)⁺],

where the last expectation, using (3), is given by ∫_{t_n}^∞ h(y) dy. Substituting this result, noting that the v_{n,k−1} terms cancel since v_{n,k−1} + t_n = v_{n,k}, we obtain the required result.
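The identity in Theorem 2 can be sanity-checked numerically by comparing the defining expectation with the integral form. The sketch below (Python; we choose the standard exponential distribution, h(y) = e^{−y}, for which ∫_t^∞ h(y) dy = e^{−t}, and the quadrature settings are ours) performs this check for k = 1 and k = 2:

```python
import math

def e_max(a, b, ymax=40.0, steps=50000):
    # E[max(Y + a, b)] for Y ~ Exp(1), via midpoint quadrature on [0, ymax];
    # the truncated tail contributes O(exp(-ymax)), negligible here.
    dy = ymax / steps
    return sum(max((i + 0.5) * dy + a, b) * math.exp(-(i + 0.5) * dy) * dy
               for i in range(steps))

v1, v2 = [0.0], [0.0]
for n in range(8):
    # Direct dynamic-programming step versus the Theorem 2 form.
    assert abs(e_max(0.0, v1[n]) - (v1[n] + math.exp(-v1[n]))) < 1e-4
    assert abs(e_max(v1[n], v2[n]) - (v2[n] + math.exp(-(v2[n] - v1[n])))) < 1e-4
    v1.append(v1[n] + math.exp(-v1[n]))
    v2.append(v2[n] + math.exp(-(v2[n] - v1[n])))
```

The quadrature and the closed form agree to the tolerance of the integration error, as Theorem 2 predicts.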
We note that if f(y) has bounded support in the positive direction, such that f(y) = 0 for y > y_max, it follows from the above that

v_{n+1,k} − v_{n,k} = ∫_{v_{n,k} − v_{n,k−1}}^{y_max} h(y) dy.

By the controlling factor method, we also have that v_{n+1,k} − v_{n,k} ∼ (v_{n,k})′, which can be combined with the previous integral to establish the following:

(v_{n,k})′ ∼ ∫_{v_{n,k} − v_{n,k−1}}^∞ h(y) dy.

This may be differentiated on both sides to obtain the asymptotic relation for the second derivative:

(v_{n,k})″ ∼ −h(v_{n,k} − v_{n,k−1}) [(v_{n,k})′ − (v_{n,k−1})′].

This expression can now be rearranged for h(v_{n,k} − v_{n,k−1}) to give

h(v_{n,k} − v_{n,k−1}) ∼ −(v_{n,k})″ / [(v_{n,k})′ − (v_{n,k−1})′].

Depending on the asymptotic nature of v_{n,k}, v_{n,k−1}, and their derivatives, this result can be used through direct substitution. In other scenarios, the derivative expressions may not yield useful expressions, and the behaviour of h(v_{n,k} − v_{n,k−1}) can be analysed directly without this result. We provide variations of this in the subsequent example calculations.

Example Calculations
We illustrate the application of these ideas to some common distributions, namely the uniform and exponential distributions. However, the differential equations that arise from the multiple stopping problem are much harder to solve; some of them have no closed-form solution. For the uniform distribution, we have h(y) = (b − y)/(b − a). Defining v̄ := v_{n,2} − v_{n,1} and rearranging the asymptotic relation for (v_{n,2})′, we obtain

(v̄)′ + (v_{n,1})′ ∼ (b − v̄)² / (2(b − a)).

From [1], we have the asymptotic relation v_{n,1} ∼ b − 2(b − a)/n, so that (v_{n,1})′ ∼ 2(b − a)/n², which can be directly substituted into the above equation:

(v̄)′ + 2(b − a)/n² ∼ (b − v̄)² / (2(b − a)).

We can solve this formal differential equation to obtain

b − v̄ ∼ (1 + √5)(b − a)/n,

where a constant of integration c arises in the solution but may be dropped, since it is part of a subdominant term. We now replace v̄ with v_{n,2} − v_{n,1}, substitute our asymptotic relation for v_{n,1}, and rearrange for v_{n,2} to obtain

v_{n,2} ∼ 2b − (3 + √5)(b − a)/n.

We may notice, in general, that whenever v̄ = v_{n,k+1} − v_{n,k} satisfies the relation

(v̄)′ + ∆(b − a)/n² ∼ (b − v̄)² / (2(b − a)),

where ∆ is some positive constant, we obtain the asymptotic relation

b − v̄ ∼ (1 + √(1 + 2∆))(b − a)/n.

This can be used to generalise the asymptotic behaviour of v_{n,k} for the uniform distribution.
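This asymptotic behaviour can be checked against the exact recurrences. The sketch below (Python; standard uniform on [0, 1], so a = 0 and b = 1, function name ours) iterates to large n and compares n(2 − v_{n,2}) with 3 + √5 ≈ 5.236:

```python
import math

def uniform_v(n_max):
    # Exact value recurrences for the standard uniform distribution:
    # v_{n+1,k} = v_{n,k} + (1 - (v_{n,k} - v_{n,k-1}))**2 / 2, with v_{n,0} = 0.
    v1 = v2 = 0.0
    for _ in range(n_max):
        v1, v2 = v1 + (1.0 - v1) ** 2 / 2.0, v2 + (1.0 - (v2 - v1)) ** 2 / 2.0
    return v1, v2

n = 20000
v1, v2 = uniform_v(n)
# Predicted: v_{n,1} ~ 1 - 2/n and v_{n,2} ~ 2 - (3 + sqrt(5))/n.
```

At n = 20000, the scaled errors n(1 − v_{n,1}) and n(2 − v_{n,2}) are within about one percent of 2 and 3 + √5, respectively.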
Theorem 3. Consider X_1, X_2, …, X_N to be independent identically distributed uniform variables on [a, b], b > a. The reward sequence v_{n,k} follows the asymptotic relation

v_{n,k} ∼ kb − ∆_k(b − a)/n,

where ∆_0 = 0 and ∆_{k+1} = ∆_k + 1 + √(1 + 2∆_k).

Proof. We have shown this to be true for k = 1 and k = 2. Now, assume that v_{n,k} ∼ kb − ∆_k(b − a)/n, which, from (10) and (11), has the asymptotic solution

v_{n,k+1} − v_{n,k} ∼ b − (1 + √(1 + 2∆_k))(b − a)/n.

To conclude the proof, we rearrange for v_{n,k+1} to obtain

v_{n,k+1} ∼ (k + 1)b − (∆_k + 1 + √(1 + 2∆_k))(b − a)/n = (k + 1)b − ∆_{k+1}(b − a)/n.

We noted in [1] that the behaviour of v_{n,1} was identical for the family of distributions with exponential tails. The next example seeks to unify such distributions, which corresponded to α = 1, in the multiple stopping scenario. We obtain an ordinary differential equation whose dominant balance is

(v_{n,2})′ ∼ γ e^{−(v_{n,2} − v_{n,1})/β},

where the remaining term, which involves the upper incomplete gamma function Γ, is subdominant in the asymptotic differential equation. The solution to the differential equation is thus approximated by v_{n,2} ∼ β log(n²). We have, from [1], v_{n,1} ∼ β log(n), and so the general case for k > 2 may be presented by mathematical induction.
Theorem 4. Let X_1, X_2, …, X_N be random variables from a distribution f(y) that, for sufficiently large y, satisfies

|h(y) − γe^{−y/β}| < e^{−y/β} y^{−∆}

for positive ∆, and where β and γ are positive constants. Assume each of the terms in the sequence of reward values v_{n,1}, …, v_{n,k} increases without bound. Then, the asymptotic behaviour of v_{n,k} is given by v_{n,k} ∼ β log(n^k).

Proof. From (15), we have that the behaviour of v_{n,k} is related by

(v_{n,k})′ ∼ γ e^{−(v_{n,k} − v_{n,k−1})/β}.

We verified the claim for k = 1 in [1]. We now assume that v_{n,k} ∼ β log(n^k) as n → ∞ and use this to prove the same for v_{n,k+1}. We write the asymptotic differential equation (v_{n,k+1})′ ∼ γ e^{−(v_{n,k+1} − v_{n,k})/β} and substitute our assumed asymptotic for v_{n,k} to obtain

(v_{n,k+1})′ ∼ γ n^k e^{−v_{n,k+1}/β},

which is a separable differential equation with solution v_{n,k+1} ∼ β log(n^{k+1}), as required.
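Theorem 4 can likewise be verified numerically. The sketch below (Python; we take the standard exponential distribution, β = γ = 1, and our own function name) iterates the recurrence v_{n+1,k} = v_{n,k} + e^{−(v_{n,k} − v_{n,k−1})} and checks that v_{n,k} grows like k log n:

```python
import math

def exponential_v(n_max, k_max=3):
    # v[j] holds v_{n,j} with v_{n,0} = 0; recurrence from Theorem 2 for h(y) = exp(-y):
    # v_{n+1,j} = v_{n,j} + exp(-(v_{n,j} - v_{n,j-1})).
    v = [0.0] * (k_max + 1)
    for _ in range(n_max):
        v = [0.0] + [v[j] + math.exp(-(v[j] - v[j - 1])) for j in range(1, k_max + 1)]
    return v

n = 200000
v = exponential_v(n)
# Theorem 4 predicts v_{n,k} ~ beta * log(n^k) = k * log(n) here.
```

For n = 200000 the ratios v_{n,k} / (k log n) are close to 1 for k = 1, 2, 3, with the deviation shrinking like an additive constant over log n.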

Calculating the Optimal Expectation
In this section, we continue the ideas from the single-stopping calculation to calculate the expectations of the multiple stopping rules. We first find the asymptotic expression for E(τ*_1), the expectation of the first stopping time. We can then find the remaining expectations in an inductive fashion under certain conditions. We now extend some of the previous notation to reflect the multiple reward sequences, as well as the k stopping variables. Let τ*_1, τ*_2, …, τ*_k denote the 1st, 2nd, …, kth stopping times, respectively, and let w_{i,j} = P(y < v̄_{i,j}), where v̄_{i,j} = v_{N−j,k−i+1} − v_{N−j,k−i}, i = 1, …, k, j = 1, …, N.
5. An Asymptotic Equation for E(τ*_1)

By recalling the notation from the beginning of this section, we obtain E(τ*_1) through

E(τ*_1) = Σ_{m=1}^{N} P(τ*_1 ≥ m) = Σ_{m=1}^{N} ∏_{j=1}^{m−1} w_{1,j}.

We split the summation for E(τ*_1) at a value k*, where 0 ≪ k* ≪ N, and apply the fact that 0 < w_{1,N−1−j} < 1 to obtain a bound for the first summation term. For the second summation term, as k* is large in the limit as N → ∞, we may use the asymptotic approximations for v_{n,k} obtained in the previous section.
For many distributions, this can be simplified by using the large-n asymptotics for v_{n,k} and its derivatives, or by obtaining an asymptotic expression for h(v_{n,k} − v_{n,k−1}) through other means. In the case where h(v̄_{1,N−1−j}) ∼ λ/j, we have from [1] that E(τ*_1) ∼ N/(1 + λ).

Example 4 (Uniform Distribution for k Stops).
For simplicity, we first consider the double stopping problem (k = 2). From Example 2, we have that v_{n,2} − v_{n,1} ∼ b − (1 + √5)(b − a)/n, and thus h(v_{n,2} − v_{n,1}) ∼ (1 + √5)/n. We then have that

E(τ*_1) ∼ N/(2 + √5).

For the general result with k > 2 stops for the uniform distribution, we apply the asymptotic behaviour of h(v_{n,k} − v_{n,k−1}) to obtain

E(τ*_1) ∼ N/(2 + √(1 + 2∆_{k−1})).

For k = 1, conveniently, ∆_0 = 0, and this retrieves N/3. For k = 2, we have that ∆_1 = 2, and so this retrieves N/(2 + √5).
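This prediction can be checked exactly (without Monte Carlo simulation), since E(τ*_1) = Σ_{m=1}^N P(τ*_1 ≥ m), and P(τ*_1 ≥ m) is the product of the non-stopping probabilities up to step m − 1. The sketch below (Python; standard uniform, k = 2, helper names ours) computes E(τ*_1) from the exact thresholds and compares E(τ*_1)/N with 1/(2 + √5) ≈ 0.2361:

```python
def uniform_thresholds(n_max):
    # v_{n,1} and v_{n,2} for the standard uniform distribution on [0, 1].
    v1, v2 = [0.0], [0.0]
    for n in range(n_max):
        v1.append(v1[n] + (1.0 - v1[n]) ** 2 / 2.0)
        v2.append(v2[n] + (1.0 - (v2[n] - v1[n])) ** 2 / 2.0)
    return v1, v2

def expected_first_stop(N):
    """E(tau*_1) for k = 2 stops, via the tail-sum formula."""
    v1, v2 = uniform_thresholds(N)
    expectation, survive = 0.0, 1.0  # survive = P(tau*_1 >= m)
    for m in range(1, N + 1):
        expectation += survive
        # At step m, the first-stop threshold is v_{N-m,2} - v_{N-m,1};
        # for the standard uniform, P(no stop) equals the threshold itself.
        survive *= v2[N - m] - v1[N - m]
    return expectation
```

For N = 4000, `expected_first_stop(N) / N` lies within a few percent of 1/(2 + √5), consistent with the asymptotic above.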
Example 5 (Distributions with an Exponential Tail (k Stops)). We once again consider a probability distribution f(y) with a survival function that, for sufficiently large y, satisfies

|h(y) − γe^{−y/β}| < e^{−y/β} y^{−∆},

where the additional conditions are as described in Example 3.
We found that the sequences v_{n,k} satisfy the asymptotic relation v_{n,k} ∼ β log(n^k), and so v_{n,k} − v_{n,k−1} ∼ β log(n). From this, we obtain the asymptotic relations for the derivatives, (v_{n,k})′ ∼ kβ/n, so that h(v_{n,k} − v_{n,k−1}) ∼ k/n. Hence, we obtain

E(τ*_1) ∼ N/(k + 1).

Provided that E(τ*_1) is asymptotically of this form, we may make use of the linearity of expectation to obtain convenient conditional formulae that are not as complicated as those encountered in the previous section.
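The analogous exact computation for the exponential-tail case (here the standard exponential distribution, β = γ = 1, k = 2, with our own helper name) checks the predicted limit E(τ*_1)/N → 1/(k + 1) = 1/3:

```python
import math

def exp_expected_first_stop(N, k=2):
    # Thresholds via v_{n+1,j} = v_{n,j} + exp(-(v_{n,j} - v_{n,j-1})), v_{n,0} = 0.
    v = [[0.0] * (k + 1)]
    for n in range(N):
        prev = v[-1]
        v.append([0.0] + [prev[j] + math.exp(-(prev[j] - prev[j - 1]))
                          for j in range(1, k + 1)])
    expectation, survive = 0.0, 1.0
    for m in range(1, N + 1):
        expectation += survive  # adds P(tau*_1 >= m)
        t = v[N - m][k] - v[N - m][k - 1]  # first-stop threshold
        survive *= 1.0 - math.exp(-t)      # P(Y < t) for Exp(1)
    return expectation
```

For N = 4000, the ratio `exp_expected_first_stop(N) / N` is close to 1/3, as predicted for k = 2 stops.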
6. An Inductive Approach for E(τ*_j), j > 1

Due to the independence of observations, it makes physical sense to view the expectation of the (j + 1)th stopping time as a function of only the previous stopping time. We therefore investigate the properties of the (j + 1)th stopping time when conditioned on the jth. We would expect to only need to add the additional expected number of observations required to stop one more time in a 'reduced' optimal stopping problem. We now introduce the notation τ*_{j,N,k} to allow for more flexible interactions between stopping times.
Definition 1. Let τ*_{j,N,k} denote the jth stopping time (out of k) in the optimal stopping problem with N observations.
Here, τ*_{j,N,k} denotes the τ*_j used in the previous section, and τ*_{1,N,1} corresponds to the τ* in the single-stopping problem.

Theorem 5. Let X_1, X_2, …, X_N be independent and identically distributed random variables for which the expectations of the optimal stopping times {τ*_{j,N,k}} exist. Suppose further that the first stopping time out of k stops has asymptotic expectation E(τ*_{1,N,k}) ∼ Nλ_{1,k} for some constant λ_{1,k}. Then, the following relation holds:

E(τ*_{j+1,N,k}) ∼ E(τ*_{j,N,k}) + λ_{1,k−j}(N − E(τ*_{j,N,k})),

where λ_{1,k−j} is some other positive constant.

Conclusions
In this paper, we have derived asymptotics of multiple optimal stopping times for sequences of independent identically distributed continuous random variables by extending the pre-existing methodology that we developed for the single-stopping case. It is anticipated that a similar class of results can be established for other classes of distributions, although it is not clear whether the resulting differential equations are easily solvable.

Asymptotic calculations were performed for a number of probability distributions, on both bounded and unbounded domains. The asymptotic properties for k ≥ 2 were then obtained inductively. Numerical simulations were subsequently performed to calculate the expectation of the optimal stopping rule for values of N ranging from 10 to 1000. In each case, the simulated results tended towards the asymptotic prediction in the large-N limit, validating the asymptotic and inductive approach.

Example 2. The uniform distribution is given by f(y) = 1/(b − a) on y ∈ [a, b], where b > a.

Example 3. A continuous probability density function f(y) is given with a survival function that, for sufficiently large y, satisfies

|h(y) − γe^{−y/β}| < e^{−y/β} y^{−∆} (13)

for positive ∆, and where β and γ are positive constants. Assume each of the terms in the sequence of reward values v_{n,1}, …, v_{n,k} increases without bound.

Table 1. Values for the standard uniform distribution.

n        1        2        3        4        5        6
v_{n,1}  0.5000   0.6250   0.6953   0.7417   0.7751   0.8004
v_{n,2}  0.5000   1.0000   1.1953   1.3203   1.4091   1.4761