Abstract
We consider the following generalization of the classical coupon collector problem. We assume that, in addition to the set of standard coupons, there is one more coupon that acts as a reset button and removes all coupons collected so far. For the case where standard coupons have unequal probabilities of being drawn, we obtain the distribution of the waiting time until the end of the collection process. For the case where standard coupons have equal probabilities, we derive a simple formula for the expected waiting time in terms of the beta function, and discuss the asymptotic properties of this expected waiting time as the number of standard coupons tends to infinity.
Keywords:
coupon collector problem; reset coupon; expected waiting time; Markov chain; beta function
MSC:
60C05; 60J10
1. Introduction
In the classical coupon collector problem (CCP), the goal is to analyze the waiting time (in days) until a collector who buys one coupon per day completes a full collection of n distinct coupons from the set $\{1, 2, \dots, n\}$, assuming that each coupon is drawn with probability $1/n$. Similarly, the coupon collector problem with unequal probabilities (CCPU) refers to the situation where coupon $i$ is drawn with probability $p_i$, $i = 1, \dots, n$, and $\sum_{i=1}^{n} p_i = 1$.
The coupon collector problem can be formulated in different ways. In its simplest form, it appears in the well-known book [1], but it can also be formulated and treated as a special kind of urn model (see, for example, [2,3]) or in the context of formal languages [4]. In addition, several results on waiting time problems have been obtained using Markov chain techniques (see, for example, [5,6]). The coupon collector problem and its generalizations have also led to various asymptotic results (see, for example, [7,8,9]).
We consider the following generalization of the coupon collector problem, which, to our knowledge, has not been considered before. We assume that there is a special coupon, which we call the reset coupon, that does not belong to $\{1, 2, \dots, n\}$ and acts as a reset button, in the sense that the set of coupons collected up to time (day) $t$ becomes empty when the reset coupon is drawn on day $t+1$. After that, the collection may start again from the beginning (or the collector may stop). Therefore, we work with an augmented set of available coupons,
$$\{1, 2, \dots, n, \otimes\},$$
where ⊗ denotes the reset coupon.
We call this version of the problem the coupon collector problem with reset button and refer to it as CCPRB in the rest of the text.
We assume sampling with replacement, with the reset coupon drawn with probability $q$, $0 \le q < 1$, and the standard coupons (from the set $\{1, 2, \dots, n\}$) drawn with probabilities that sum up to $1 - q$. Obviously, the problem reduces to CCP or CCPU when $q = 0$.
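To make the model concrete, the following minimal Monte Carlo sketch simulates the CCPRB in the setting of Case 2 of Section 3 (the collector keeps collecting after a reset). It is our own illustration, not code from this study: the function name `simulate_ccprb` is hypothetical, the list `p` holds the probabilities of the standard coupons, and the reset coupon receives the remaining mass $1 - \sum_i p_i$.

```python
import random
from itertools import accumulate


def simulate_ccprb(p, rng=random):
    """Simulate one CCPRB run and return the number of draws (days) needed.

    p[i] is the probability of standard coupon i; the reset coupon receives
    the remaining mass 1 - sum(p). The collector keeps collecting after resets.
    """
    n = len(p)
    q = max(0.0, 1.0 - sum(p))               # probability of the reset coupon
    outcomes = list(range(n + 1))            # outcome n stands for the reset coupon
    cum_weights = list(accumulate(list(p) + [q]))
    held = set()                             # standard coupons currently collected
    days = 0
    while len(held) < n:
        days += 1
        coupon = rng.choices(outcomes, cum_weights=cum_weights)[0]
        if coupon == n:
            held.clear()                     # reset: the collection becomes empty
        else:
            held.add(coupon)
    return days


if __name__ == "__main__":
    rng = random.Random(2024)
    n, runs = 5, 5_000
    for q in (0.0, 0.1, 0.3):
        p = [(1 - q) / n] * n                # equal probabilities for standard coupons
        mean = sum(simulate_ccprb(p, rng) for _ in range(runs)) / runs
        print(f"q = {q:.1f}: estimated mean waiting time {mean:.2f}")
```

With `p = [1/n] * n` (no reset mass) the estimate should be close to the classical value $n\sum_{j=1}^{n} 1/j$, and increasing the reset mass should increase the estimate, in line with the discussion in Section 4.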
The CCPRB belongs to the family of generalizations of the CCP that introduce additional special-purpose coupons into the coupon set. Other generalizations of this type are analyzed in [5,10,11,12,13].
A variant of the coupon collector problem in which each coupon can serve several purposes (so-called goals) is considered in [13]. In this case, the experiment ends when the sum of the numbers of goals reaches a given limit.
Another generalization of the classical coupon collector problem is proposed in [11], where drawing an additional coupon (the so-called bonus coupon) gives the collector one more coupon.
In [12], the author considers the case where an additional coupon (the so-called penalty coupon) interferes with the collection of standard coupons, in the sense that the collection process ends when the absolute difference between the number of collected standard coupons and the number of collected penalty coupons equals the total number of standard coupons.
In this work, we use results on the coupon collector problem with a null coupon, as considered in [5,10]. This is the situation where the probabilities of the standard coupons sum up to $1 - p_0$ or, equivalently, where there is a null coupon that can be drawn with probability $p_0$ but does not belong to any collection. This variant of the problem reduces to CCPU when $p_0 = 0$.
It is well known that the coupon collector problem has various applications in engineering (see, for example, [14]). In particular, the CCPU has recently been used in biology to model parasitism [15], in telecommunications to model the transmission of information in computer networks [16], and in the analysis of Internet security problems [5,10,17].
The paper is organized as follows. In Section 2, we derive the general properties of the waiting time for a full collection of standard coupons in the CCPRB with unequal probabilities. In Section 3, we obtain the expected waiting time for a full collection in the case of equal probabilities and relate it to the beta function. In Section 4, we provide some numerical examples. In Section 5, we discuss the asymptotic properties of the expected waiting time for a full collection in the case of equal probabilities, for different values of the reset probability, as n tends to infinity, and give some specific examples. The conclusions are given in Section 6.
2. Distribution of the Waiting Time for a Full Collection in the General Case
Here, we derive the distribution of the waiting time until a full collection of standard coupons is sampled in the case of unequal probabilities, where coupon $i$ is drawn with probability $p_i$, such that $\sum_{i=1}^{n} p_i = 1 - q$, and $q$ is the probability of the reset coupon.
Let $T_n$ denote the waiting time until a full collection of standard coupons is collected in CCPRB, and let $\tilde{T}_n$ denote the corresponding waiting time in the coupon collector problem with unequal probabilities and no additional coupons. We will also use the notation and .
The distribution of this waiting time in the CCPU is a well-known result (see, for example, [5], Theorem 1, p. 409).
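For small n, this distribution can be evaluated directly. The sketch below assumes probabilities $p_1, \dots, p_n$ summing to 1 and uses the standard inclusion-exclusion identity $P(T > k) = \sum_{\emptyset \neq S \subseteq \{1,\dots,n\}} (-1)^{|S|+1} \bigl(1 - \sum_{i \in S} p_i\bigr)^k$, where $T$ is the CCPU waiting time and $S$ ranges over the sets of coupons still missing after $k$ draws; we assume it is equivalent to the form cited from [5]. The sum has $2^n - 1$ terms, which is why such formulas become expensive for large n. The name `ccpu_tail` is hypothetical.

```python
from itertools import combinations
from math import fsum


def ccpu_tail(p, k):
    """Return P(T > k) for the CCPU with coupon probabilities p (summing to 1).

    Inclusion-exclusion over the nonempty sets S of coupons still missing
    after k draws: P(T > k) = sum_S (-1)^(|S|+1) * (1 - P_S)^k.
    """
    n = len(p)
    terms = []
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for missing in combinations(p, size):
            # Probability that no coupon from `missing` appears in one draw.
            base = max(0.0, 1.0 - fsum(missing))
            terms.append(sign * base ** k)
    return fsum(terms)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    # The expected waiting time is the sum of the tail probabilities over k >= 0.
    print(fsum(ccpu_tail(p, k) for k in range(500)))
```

The `__main__` part truncates the series $E[T] = \sum_{k \ge 0} P(T > k)$; the neglected tail decays geometrically.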
The corresponding result for the waiting time in CCPRB is given in the next theorem. In the rest of the text, we will use the abbreviation for
Theorem 1.
For every , for the waiting time , the following relations hold:
- 1.
- 2.
Proof.
- 1.
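- Each sequence of draws up to time k in which the collection is not completed can be written in the form $B_1 R B_2 R \cdots R B_{m+1}$, where m is the number of resets, R represents a single reset coupon, and $B_i$, $i = 1, \dots, m+1$, represent blocks of standard coupons, such that the length of block $B_i$ is $k_i \ge 0$, $i = 1, \dots, m+1$, and none of the blocks consists of a full collection of standard coupons. We define the events as follows: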
- , : m reset coupons appeared up to time t;
- , : block of the length k does not consist of a full collection of standard coupons.
Therefore,
On the other hand, we have
since all the blocks consist of incomplete collections of standard coupons, their lengths sum up to the total number of standard coupons drawn, and the events concerning different blocks are mutually independent for any combination of block lengths. This completes the proof of the statement.
- 2.
- This is a simple modification of the first part of the theorem. Each realization of the experiment has the form $B_1 R B_2 R \cdots R B_{m+1}$, where none of the blocks $B_1, \dots, B_m$ contains a full collection of standard coupons, and the last block $B_{m+1}$ contains the full collection of standard coupons, completed at its final draw. Using the fact that
we complete the proof of the theorem.
□
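The block decomposition used in the proof also suggests a simple way to evaluate the tail probabilities numerically. The sketch below is our own reformulation of the same renewal idea, not a transcription of Theorem 1: it conditions on the day $t$ of the last reset. Writing $q$ for the reset probability, $f(j)$ for the probability that a block of $j$ standard draws (with normalized probabilities $p_i/(1-q)$) is incomplete, and $u(k) = P(T > k)$, it uses $u(k) = (1-q)^k f(k) + q \sum_{t=1}^{k} u(t-1)\,(1-q)^{k-t} f(k-t)$. The function names are hypothetical.

```python
from itertools import combinations
from math import fsum


def block_incomplete(p_block, j):
    """P(a block of j i.i.d. draws from p_block misses at least one coupon)."""
    n = len(p_block)
    terms = []
    for size in range(1, n + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for missing in combinations(p_block, size):
            terms.append(sign * max(0.0, 1.0 - fsum(missing)) ** j)
    return fsum(terms)


def ccprb_tail(p, kmax):
    """Return [P(T > k) for k = 0..kmax] for the CCPRB (collector never stops)."""
    q = max(0.0, 1.0 - fsum(p))        # reset probability
    s = 1.0 - q                        # probability that a draw is a standard coupon
    p_block = [pi / s for pi in p]     # standard-coupon probabilities given no reset
    # g[j]: j consecutive standard draws that still miss some coupon.
    g = [s ** j * block_incomplete(p_block, j) for j in range(kmax + 1)]
    u = [0.0] * (kmax + 1)
    for k in range(kmax + 1):
        total = g[k]                   # no reset among the first k draws
        for t in range(1, k + 1):      # or the last reset happens on day t
            total += u[t - 1] * q * g[k - t]
        u[k] = total
    return u


if __name__ == "__main__":
    u = ccprb_tail([0.25, 0.25, 0.25], 400)  # three standard coupons, q = 0.25
    print(u[:6])
    print(sum(u))  # truncated sum of P(T > k), approximating E[T]
```

The recursion costs $O(k_{\max}^2)$ operations on top of the $2^n$ inclusion-exclusion terms per block length, so it is only meant for small n.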
Remark 1.
The sequence of draws in the CCPRB can be seen as a renewal process, since the coupon collection starts over after each reset. More precisely, the events of the type , defined in (5), can be regarded as recurrent events, in the sense of Definition 1, p. 308 in [1].
Example 1.
If , and , the probability that the full collection of standard coupons has not been drawn by the time (day) is equal to
where
Therefore,
Remark 2.
Remark 3.
It is well known that, for large n, the computation of probabilities associated with the coupon collector problem with unequal probabilities, such as (2), is computationally intensive and requires some form of approximation or bounds. This is even more pronounced in the case of CCPRB. However, upper and lower bounds for the probability (3) can be obtained directly by inserting the corresponding upper and lower bounds for the probability (2) into (3). For a detailed discussion of this topic and a comprehensive list of upper and lower bounds on the probability (2), see [17].
Observing that
we obtain an additional, simple lower bound for the probability :
Remark 4.
Consider the waiting time until c out of the n standard coupons, $1 \le c \le n$, have been collected in CCPRB. For this waiting time, results analogous to Theorem 1 and its consequences can be derived using the same technique.
3. Expected Waiting Time for a Full Collection in the Case of Equal Probabilities
The expected waiting time for a full collection, or a subcollection, can be obtained from (10). However, if we assume that all the standard coupons have an equal probability of being drawn, the expected waiting time for a full collection has a simpler form, which is conveniently derived using the Markov chain technique.
Let $X_t$ be the number of distinct standard coupons in the collection after t units of time (days). Then $X_t$, $t = 0, 1, 2, \dots$, is a Markov chain on the state space $\{0, 1, \dots, n\}$.
Depending on how we define the end of the collection process, we can distinguish between two characteristic cases.
3.1. Case 1: The Collector Gives up Collecting after the First Reset
In the first case, the collector starts with a certain number of coupons and buys coupons until either the collection is complete or the first reset occurs.
The transition probability matrix is
and
is the waiting time until absorption, starting from the state k.
3.2. Case 2: The Collector Keeps Collecting after the Reset
In this case, the collector buys coupons until he completes his collection, regardless of how many resets occur in the meantime.
The transition probability matrix is
and
is the waiting time until absorption, starting from the state k.
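Both chains are small enough to be set up and solved directly. The sketch below, our own illustration with hypothetical function names, writes $q$ for the reset probability, gives each standard coupon probability $(1-q)/n$, builds the block $Q$ of transitions among the states $0, \dots, n-1$, and solves $(I - Q)\,m = \mathbf{1}$ by first-step analysis, where $m_k$ is the expected time to absorption from state $k$. In Case 1 we assume that the first reset sends the chain to an extra absorbing state (also from state 0); matrix (14) may encode this differently.

```python
import numpy as np


def transient_matrix(n, q, keep_collecting):
    """Transient block Q of the chain on states 0..n-1 (number of distinct coupons).

    Standard coupons have probability (1 - q) / n each and the reset coupon has
    probability q. If keep_collecting is False (Case 1), a reset ends the
    process, so its probability mass leaves the transient states.
    """
    Q = np.zeros((n, n))
    for k in range(n):
        Q[k, k] += (1 - q) * k / n                # a coupon that is already held
        if k + 1 < n:
            # A new coupon; reaching state n (full collection) is absorption,
            # so that transition is left out of Q.
            Q[k, k + 1] += (1 - q) * (n - k) / n
        if keep_collecting:
            Q[k, 0] += q                          # Case 2: reset back to state 0
    return Q


def expected_absorption_times(n, q, keep_collecting):
    """Solve (I - Q) m = 1 (first-step analysis); m[k] starts from state k."""
    Q = transient_matrix(n, q, keep_collecting)
    return np.linalg.solve(np.eye(n) - Q, np.ones(n))


if __name__ == "__main__":
    n, q = 10, 0.05
    print("Case 1:", expected_absorption_times(n, q, keep_collecting=False))
    print("Case 2:", expected_absorption_times(n, q, keep_collecting=True))
```

With `q = 0` both cases reduce to the classical CCP, so `m[0]` should equal $n\sum_{j=1}^{n} 1/j$.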
The expected waiting times in Cases 1 and 2 are obtained in the next theorem.
Theorem 2.
- 1.
- For the expected waiting time in Case 1, Section 3.1, the following relations hold:
where $\Gamma$ denotes the gamma function.
- 2.
- For the expected waiting time in Case 2, Section 3.2, the following relations hold:
Proof.
- 1.
- Applying first-step analysis to the Markov chain with the transition probability matrix (14), we obtain that
Next, we have
Now, we can rewrite (21) as
which completes the proof of the statement.
- 2.
- Applying first-step analysis to the Markov chain with the transition probability matrix (16), we conclude that
and
□
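For Case 2 started from an empty collection, the first-step equations can also be telescoped directly. The following is our own sketch, which need not coincide with the route taken in the proof above. We write $q$ for the reset probability, $a_k = (1-q)(n-k)/n$ for the probability of drawing a new standard coupon in state $k$, $E_k$ for the expected time to move from state $k$ to state $k+1$ for the first time, and $S_k = E_0 + \cdots + E_{k-1}$, so that $S_n$ is the expected waiting time for a full collection.

```latex
\begin{align*}
E_k &= 1 + q\,S_{k+1} + \bigl(1 - q - a_k\bigr)E_k
      && \text{(reset, or a repeated standard coupon)}\\
\Longrightarrow\quad a_k E_k &= 1 + q\,S_k
      && \text{(using } S_{k+1} = S_k + E_k\text{)}\\
\Longrightarrow\quad 1 + q\,S_{k+1} &= \bigl(1 + q\,S_k\bigr)\Bigl(1 + \frac{q}{a_k}\Bigr),
      && S_0 = 0,\\
1 + q\,S_n &= \prod_{k=0}^{n-1}\Bigl(1 + \frac{r}{n-k}\Bigr)
            = \prod_{j=1}^{n}\frac{j + r}{j}
            = \frac{\Gamma(n+1+r)}{\Gamma(r+1)\,\Gamma(n+1)},
      && r = \frac{nq}{1-q}.
\end{align*}
```

Hence, for $0 < q < 1$, $S_n = \frac{1}{q}\Bigl(\frac{\Gamma(n+1+r)}{\Gamma(r+1)\,\Gamma(n+1)} - 1\Bigr) = \frac{1}{q}\Bigl(\frac{1}{r\,B(n+1, r)} - 1\Bigr)$, where $B$ is the beta function and we used $\Gamma(r+1) = r\,\Gamma(r)$. As sanity checks, $q \to 0$ gives $S_n \to n\sum_{j=1}^{n} 1/j$ (the classical CCP), and $q = 1/(n+1)$, that is $r = 1$, gives $S_n = n(n+1)$.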
Example 2.
The expected waiting times and are the most general, in the sense that the collector has to wait for the almost full or full collection. Next, we will provide simplified expressions for the waiting times and for the case . For that purpose, we need the next lemma.
Lemma 1.
Let $(x)_j$ denote the falling factorial: $(x)_j = x(x-1)\cdots(x-j+1)$ for $j \ge 1$, and $(x)_0 = 1$.
- 1.
- For any , the following equality holds:
- 2.
- For any , , , the following equality holds:
Proof.
- 1.
- 2.
- Using (33), we obtain
which completes the proof of the lemma.
□
Theorem 3.
- 1.
- If , for the expected waiting time , the following equality holds:
where $B(\cdot, \cdot)$ denotes the beta function.
- 2.
- If , for the expected waiting time , the following equality holds:
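Expressions involving gamma and beta functions are best evaluated on the log scale. The sketch below assumes, based on our own first-step calculation rather than a transcription of Theorem 3, that the Case 2 expectation from an empty collection equals $\frac{1}{q}\bigl(\frac{1}{r\,B(n+1, r)} - 1\bigr)$ with $q$ the reset probability and $r = nq/(1-q)$. It compares this form with a direct iteration of $a_k E_k = 1 + q S_k$, $S_{k+1} = S_k + E_k$, $S_0 = 0$, where $a_k = (1-q)(n-k)/n$, $E_k$ is the expected time to move from $k$ to $k+1$ distinct coupons, and $S_n$ is the expected waiting time. Both function names are hypothetical.

```python
from math import expm1, lgamma


def expected_time_closed_form(n, q):
    """(1/q) * (Gamma(n+1+r) / (Gamma(1+r) * Gamma(n+1)) - 1), r = n*q/(1-q).

    Assumes 0 < q < 1, equal probabilities (1-q)/n for the standard coupons,
    an empty initial collection, and a collector who continues after resets.
    """
    r = n * q / (1 - q)
    log_ratio = lgamma(n + 1 + r) - lgamma(1 + r) - lgamma(n + 1)  # = -log(r * B(n+1, r))
    return expm1(log_ratio) / q


def expected_time_recursion(n, q):
    """Iterate a_k * E_k = 1 + q * S_k and S_{k+1} = S_k + E_k from S_0 = 0."""
    s = 0.0
    for k in range(n):
        a_k = (1 - q) * (n - k) / n   # probability of drawing a new standard coupon
        s += (1 + q * s) / a_k
    return s


if __name__ == "__main__":
    for n, q in [(10, 0.01), (10, 0.1), (50, 0.05)]:
        print(n, q, expected_time_closed_form(n, q), expected_time_recursion(n, q))
```

`math.expm1` keeps the result accurate when the gamma ratio is close to 1, that is, for small q.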
4. Numerical Examples
In this section, we provide numerical examples for the CCPRB with equal probabilities, as analyzed in Section 3. We assume that the set of available coupons is $\{1, 2, \dots, n, \otimes\}$. We consider different values of the reset probability and calculate the expected waiting time for this case using formula (27). The results are shown in Table 1. All calculations were performed in R, using RStudio version 2023.03.0+386.
Table 1.
Expected waiting time .
Next, we show how the expected waiting time depends on the reset probability for different values of n.
Note that the behavior of the expected waiting time, depicted in Figure 1, is consistent with our intuition about CCPRB: the expected waiting time increases with n (since having more standard coupons to collect extends the waiting time) and with the reset probability (since resets remove the coupons already collected and therefore extend the waiting time).
Figure 1.
Expected waiting time as a function of the reset probability for different values of n.
In some of the cases considered, the expected waiting time also appears to grow exponentially; we discuss this in more detail in the next section.
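The monotonicity described above can be checked numerically. The sketch below evaluates, on an arbitrary grid of collection sizes n and reset probabilities q, the expression $\frac{1}{q}\bigl(\frac{\Gamma(n+1+r)}{\Gamma(1+r)\Gamma(n+1)} - 1\bigr)$ with $r = nq/(1-q)$. This is our own closed form for the Case 2 expectation from an empty collection, and we assume it agrees with formula (27); the grid is not the one used for Table 1 or Figure 1, and the function name is hypothetical.

```python
from math import expm1, lgamma


def expected_time(n, q):
    """Assumed Case 2 expectation from an empty collection, for 0 < q < 1."""
    r = n * q / (1 - q)
    return expm1(lgamma(n + 1 + r) - lgamma(1 + r) - lgamma(n + 1)) / q


ns = [5, 10, 20, 40]          # arbitrary grid of collection sizes
qs = [0.01, 0.05, 0.1, 0.2]   # arbitrary grid of reset probabilities
table = {(n, q): expected_time(n, q) for n in ns for q in qs}

print("n \\ q" + "".join(f"{q:>12}" for q in qs))
for n in ns:
    print(f"{n:>5}" + "".join(f"{table[n, q]:>12.4g}" for q in qs))
```

Along every row and column of the printed table the values increase, and for the larger reset probabilities they increase very quickly in n, which is the behavior discussed in the next section.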
5. Asymptotic Properties of the Expected Waiting Times
Here, we analyze the expected waiting times until the end of the collection process as the number of standard coupons n tends to infinity, for different values of the reset probability. We distinguish between the case where the reset probability is fixed and the case where it depends on n.
For a fixed reset probability, we can apply Stirling's approximation to the corresponding term in Theorem 3 and obtain the asymptotic estimate for the expected waiting time as $n \to \infty$, stated in the next proposition.
Proposition 1.
For any fixed reset probability in $(0, 1)$, the following asymptotic relation holds as $n \to \infty$:
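As a numerical illustration of the exponential growth for a fixed reset probability $q$ (our own calculation, not necessarily stated in the same form as Proposition 1), assume that the Case 2 expectation from an empty collection is $\frac{1}{q}\bigl(\frac{\Gamma(n+1+r)}{\Gamma(1+r)\Gamma(n+1)} - 1\bigr)$ with $r = nq/(1-q)$. Stirling's formula then gives approximately $\frac{1}{q\sqrt{2\pi n q}}\bigl((1-q)\,q^{q/(1-q)}\bigr)^{-n}$, whose base $(1-q)\,q^{q/(1-q)}$ is smaller than 1 for $0 < q < 1$. The sketch compares the two on the log scale; the function names are hypothetical.

```python
from math import lgamma, log, pi


def log_expected_time(n, q):
    """log of Gamma(n+1+r) / (q * Gamma(1+r) * Gamma(n+1)), r = n*q/(1-q).

    This is the log of the assumed Case 2 form with its '- 1' dropped, which is
    negligible once the gamma ratio is astronomically large.
    """
    r = n * q / (1 - q)
    return lgamma(n + 1 + r) - lgamma(1 + r) - lgamma(n + 1) - log(q)


def log_stirling_estimate(n, q):
    """log of ((1 - q) * q**(q/(1-q)))**(-n) / (q * sqrt(2*pi*n*q))."""
    log_growth = -log(1 - q) - (q / (1 - q)) * log(q)  # log of the per-coupon factor
    return n * log_growth - log(q) - 0.5 * log(2 * pi * n * q)


for q in (0.1, 0.3, 0.5):
    for n in (100, 1000, 10000):
        gap = log_expected_time(n, q) - log_stirling_estimate(n, q)
        print(f"q = {q}, n = {n}: log-scale gap {gap:.2e}")
```

The gap shrinks roughly like $1/n$, as expected from the next term of Stirling's series. For $q = 1/2$ the per-coupon factor is exactly 4.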
Remark 5.
For the case when the ratio is bounded (which means that the collection process is not interrupted too often by resets), we have the following asymptotic result.
Proposition 2.
For , the following asymptotic relation holds, :
Proof.
The statement follows from Theorem 3 and the relation
(see, for example, [19], page 257). □
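We assume that the relation cited from [19] is the standard one, $\Gamma(n+a)/\Gamma(n+b) \sim n^{a-b}$ as $n \to \infty$ for fixed $a$ and $b$. It can be checked numerically: the sketch below takes $a = 1 + r$ and $b = 1$ for a few fixed, arbitrarily chosen values of $r$ and prints the logarithm of $\Gamma(n+1+r)/(\Gamma(n+1)\,n^{r})$, which should tend to 0; the leading correction behaves like $r(r+1)/(2n)$. The function name is hypothetical.

```python
from math import lgamma, log


def log_gamma_ratio_error(n, r):
    """log( Gamma(n + 1 + r) / (Gamma(n + 1) * n**r) ); tends to 0 as n grows, r fixed."""
    return lgamma(n + 1 + r) - lgamma(n + 1) - r * log(n)


for r in (0.5, 1.0, 3.0):
    print(r, [round(log_gamma_ratio_error(n, r), 6) for n in (10, 100, 1000, 10000)])
```

For $r = 1$ the ratio is exactly $(n+1)/n$, which gives a convenient check of the implementation.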
Next, we analyze some particular cases of the problem.
Example 3.
The case where the reset probability equals $1/(n+1)$ corresponds to the situation where the reset coupon has the same probability of being drawn as any other coupon. Using Theorem 3, we obtain the expressions
and
This case is “exactly solvable”, in the sense that, for a given value $t_0$, we can easily determine the values of n for which the expected waiting time is less than or equal to $t_0$. Precisely, from the inequality
and the fact that , we obtain that
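For the reset probability $1/(n+1)$, our own first-step calculation for Case 2 (the collector keeps collecting, starting from an empty collection) gives the expected waiting time $n(n+1)$. Assuming this is the Case 2 expression above, the condition that it not exceed a budget $t_0$ is $(2n+1)^2 \le 4t_0 + 1$. The sketch finds the largest such n with an exact integer square root, which avoids floating-point rounding, and cross-checks $n(n+1)$ against a direct iteration of the first-step recursion; the function names and the integer budget are hypothetical.

```python
from math import isqrt


def max_collection_size(t0):
    """Largest n >= 0 with n * (n + 1) <= t0, i.e. (2n + 1)^2 <= 4 * t0 + 1 (t0 an int)."""
    return (isqrt(4 * t0 + 1) - 1) // 2


def expected_time_recursion(n):
    """Case 2 with q = 1/(n+1): iterate a_k * E_k = 1 + q * S_k from S_0 = 0."""
    q = 1 / (n + 1)
    s = 0.0
    for k in range(n):
        a_k = (1 - q) * (n - k) / n  # probability of a new standard coupon in state k
        s += (1 + q * s) / a_k
    return s


for t0 in (10, 30, 1000):
    n = max_collection_size(t0)
    print(t0, n, n * (n + 1), round(expected_time_recursion(n), 6))
```

A real-valued budget can be handled by passing its integer part, since $n(n+1)$ is always an integer.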
Example 4.
The case corresponds to the situation where the ratio is equal to n. Using Theorem 3, we obtain the expressions
and
6. Conclusions
In this paper, we considered a generalization of the coupon collector problem in which an additional coupon, drawn with probability $q$, resets the collection process.
We distinguished between two cases: standard coupons drawn with arbitrary probabilities that sum up to one minus the reset probability (which generalizes CCPU), and standard coupons drawn with equal probabilities (which generalizes CCP). The object of interest was the waiting time until the end of the coupon collection process.
For the case of unequal probabilities, we specified the distribution of this waiting time in a very general way, in terms of the corresponding distribution in CCPU.
For the case of equal probabilities, applying first-step analysis to appropriately constructed Markov chains, we obtained the expected waiting time in both possible cases (whether or not the collection process continues after a reset) and derived its simple form in terms of the beta function. We also determined the asymptotic behavior of the expected waiting time as the size of the collection tends to infinity, for a reset probability that is either fixed or depends on n.
We have also discussed some characteristic examples of this problem.
The possible applications (or interpretations) of this work relate to the detection of distributed denial-of-service (DDoS) attacks, which are explained, for example, in [5]. The authors of [5] conclude that, since the defense against these attacks relies on continuously monitoring Internet traffic, the occurrence of standard coupons in the coupon collector problem corresponds to tracking c (out of n) recent high-traffic flows, where the portion size c is determined by the server capacity, and the null coupon probability $p_0$ corresponds to the flows with small probabilities, which sum up to $p_0$. In this context, the appearance of the reset coupon at a given moment can be interpreted as a system crash, which leads to the loss of information about the high-traffic flows monitored up to that moment. Knowing how often such crashes occur is important for maintaining proper protection.
Author Contributions
Conceptualization, J.J.; methodology, J.J. and B.T.; software, B.T.; validation, B.T.; formal analysis, J.J. and B.T.; writing—original draft preparation, J.J. and B.T.; supervision, J.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Ministry of Science, Technological Development and Innovation of the Republic of Serbia: Grant No. 451-03-47/2023-01/200104.
Data Availability Statement
The data are contained within the article.
Acknowledgments
The authors would like to thank the reviewers for carefully checking the manuscript and for providing extremely useful comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Feller, W. An Introduction to Probability Theory and Its Applications, 3rd ed.; John Wiley and Sons Inc.: New York, NY, USA, 1968; Volume 1.
- Samuel-Cahn, E. Asymptotic distributions for occupancy and waiting time problems with positive probability of falling through the cells. Ann. Probab. 1974, 2, 515–521.
- Mahmoud, H. Pólya Urn Models; Chapman & Hall/CRC: Boca Raton, FL, USA, 2009.
- Flajolet, P.; Gardy, D.; Thimonier, L. Birthday paradox, coupon collectors, caching algorithms and self-organizing search. Discret. Appl. Math. 1992, 39, 207–229.
- Anceaume, E.; Busnel, Y.; Sericola, B. New results on a generalized coupon collector problem using Markov chains. J. Appl. Probab. 2015, 52, 405–418.
- Kim, B.; Kim, J. Sooner waiting time problems in a sequence of multi-state trials with random rewards. Statist. Probab. Lett. 2019, 153, 171–179.
- Baum, L.E.; Billingsley, P. Asymptotic distributions for the coupon collector’s problem. Ann. Math. Statist. 1965, 36, 1835–1839.
- Mladenović, P. Limit distributions for the problem of collecting pairs. Bernoulli 2008, 14, 419–439.
- Neal, P. The generalized coupon collector problem. J. Appl. Probab. 2008, 45, 621–629.
- Anceaume, E.; Busnel, Y.; Schulte-Geers, E.; Sericola, B. Optimization results for a generalized coupon collector problem. J. Appl. Probab. 2016, 53, 622–629.
- Nakata, T.; Kubo, I. A coupon collector’s problem with bonuses. In Proceedings of the 4th Colloquium on Mathematics and Computer Science, Nancy, France, 18–22 September 2006; pp. 215–224.
- Todić, B. Coupon collector problem with penalty coupon. Mat. Vesn. 2023; in press.
- Wild, M.; Janson, S.; Wagner, S.; Laurie, D. Coupon collecting and transversals of hypergraphs. Discrete Math. Theor. Comput. Sci. 2013, 15, 259–270.
- Boneh, A.; Hofri, M. The coupon-collector problem revisited—A survey of engineering problems and computational methods. Comm. Statist. Stoch. Models 1997, 13, 39–66.
- Zoroa, N.; Lesigne, E.; Fernández-Sáez, M.J.; Zoroa, P.; Casas, J. The coupon collector urn model with unequal probabilities in ecology and evolution. J. R. Soc. Interface 2016, 14, 20160643.
- Comets, F.; Delarue, F.; Schott, R. Information transmission under random emission constraints. Combin. Probab. Comput. 2014, 23, 973–1009.
- Shioda, S. Some upper and lower bounds on the coupon collector problem. J. Comput. Appl. Math. 2007, 200, 154–167.
- Taylor, H.M.; Karlin, S. An Introduction to Stochastic Modeling, 3rd ed.; Academic Press: San Diego, CA, USA, 1998.
- Abramowitz, M.; Stegun, I.A. (Eds.) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; Dover Publications Inc.: New York, NY, USA, 1965.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
