Abstract
We study the distribution theory of some statistics related to records from a sequence of geometrically distributed random variables. The main novelty in this study is the introduction of q-calculus techniques. We obtain representations for the probability functions of some variables of interest, such as, for instance, record times, number of records, inter-record times, record indicators, etc., in terms of multiple q-integrals. It is remarkable to note that the expressions thus obtained are q-analogues of the corresponding well-known results in the absolutely continuous case. We also explore a duality between the results in the case of weak and ordinary records.
Keywords:
weak and ordinary records; geometric distributions; Jackson integral; multiple q-integral; q-Stirling numbers MSC:
05A30; 60E05; 62E15
1. Introduction
The origins of q-calculus date back to the works of [1] in connection to infinite series in his famous book Introductio in Analysin Infinitorum. In the last few decades, there has been a considerable interest in the topic due to its numerous applications in different areas of mathematics and physics, such as number theory, special functions, orthogonal polynomials, operator theory, combinatorics, probability, quantum mechanics, relativity, string theory, etc. An extensive treatment on q-calculus, its history and applications can be found in [2].
In this paper, we explore a connection between q-calculus and the distribution theory of records from sequences of random variables geometrically distributed.
Ref. [3] started the probabilistic study of record values and gave many of their basic distributional properties. Let us recall some definitions and results. Let be a sequence of independent and identically distributed (iid) random variables with the common distribution function F. The sequence of (upper) ordinary record times, , is defined as , with . The m-th (ordinary) record of the sequence is the random variable , . For , the record indicators are the binary random variables
and with probability one. In the case in which F is continuous, the basic distribution theory of the statistics above defined is very well-known; see, for instance, chapter 2 of [4]. In the continuous case, the distributions of the record indicators, the record times and the inter-record times do not depend on the parent distribution, and we have results like the following (for further references, see [5]):
- The record indicators are independent Bernoulli random variables with , . Then
- The joint probability mass function (pmf) of the first m (non-trivial) record times isfor integers ; alsowith being the Stirling numbers of the first kind defined by the equality
- The pmf of the inter-record times, , satisfies (see [6])so then, for ,
- The number of records in a sequence of length n, denoted , has pmfand its expected value is .
If the parent distribution, F, is discrete, the previous results are no longer true, and in general, the distribution of record times, inter-record times, record indicators and number of records depend on F. Thus, the distribution theory for records from discrete parents is more complicated than in the continuous case. In fact, there exist in the literature two definitions of records, ordinary and weak records, which are equivalent in the case of continuous parents, but not so in the discrete case. The weak record time sequence is defined as , and . Then, the weak record of the sequence (or from the parent distribution F) is the random variable , . Note that the difference with ordinary records is that ties with the previous records are also considered as records. Obviously, for continuous distributions, weak records coincide (almost surely) with ordinary records. In the case of discrete distributions, ordinary and weak records lead to different definitions.
We say that a random variable, X, follows a geometric distribution with parameter if its pmf is for and . We denote .
The number of records in geometrically distributed sequences has been studied in connection with a data structure called skiplist; see [7,8] or [9]. Related studies can be found in [10], which established the asymptotic normality of the number of records from geometric parents and also studied large deviations, local limit theorems and approximations. Some of these results have been extended to other discrete parents, so [11] established a CLT for the number of records from discrete distributions; rates of growth for the number of records and laws of large numbers can be found in [12,13].
Most of the results above deal with limit properties, and not very much attention has been paid to the exact distribution of statistics related to the number of records. Our main contribution in this paper is the use of techniques of q-calculus for the study of the distribution theory of records from geometric parents.
In Section 2, we present the Jackson integral or q-integral and the extension of this concept to multiple q-integration in a convex polytope; see [14]. We also present in this section some useful lemmas for the calculation of multiple q-integrals (proofs are deferred to the Appendix A) and the non-central q-Stirling numbers; see [15]. No previous knowledge of q-calculus is necessary to understand this section. In Section 3, we study some statistics related to the sequence of weak records from a sequence of geometrically distributed random variables. Theorem 1 is a central result in this section. It gives a multiple q-integral representation of the joint probability function of the (weak) record times. As a consequence of this result, we show that the distribution of record times and that of the number of records in a sample of size n can be written in terms of the central and non-central q-Stirling numbers. We also study the distribution of inter-(weak) record times and we prove its asymptotic log-normal behavior. In Section 4, it becomes clear that the results obtained for a geometric parent, Ge, are q-analogues of the corresponding results in the case of an absolutely continuous distribution, so we can say that the absolutely continuous case is the limit case of geometric parents when . Finally, in Section 5, we consider the case of ordinary records. We show that there is the following duality: the formulas in the case of ordinary records can be obtained from the corresponding ones in the case of weak records by replacing with .
Throughout this paper, we will use the usual conventions with sums and products: for , and , and more generally, the summation (product) over an empty set of indexes is equal to zero (one). Additionally, .
2. The Jackson Integral and -Analogues
In the following, we present some basic concepts of q-calculus. The interested reader can find more details in [16], or, for a deeper treatment of the subject, see [2]. Regardless, this section contains all the results of q-calculus that we need in this paper.
The most basic concept in q-calculus is perhaps that of the q-number. Let . The q-numbers are defined as
for . For a positive integer, m, its q-factorial is and . Throughout this paper, . Thus, for instance, for , we have .
Given a function f defined on the non-negative semi-axis of the real line, and , the Jackson integral or q-integral of f in the interval is defined as
provided that the series in the right hand side (rhs) of (5) is absolutely convergent.
The Jackson integral has some resemblances to the Riemann integral; for instance, for ,
Among the differences with respect to the ordinary Riemann integrals, we mention that the formula of the change of variables is no longer valid, so, for natural ,
The definition of a q-integral can be extended to a multiple q-integral in a convex polytope; see [14]. Let us denote as the set of permutations of . Let . For a function of m variables, and a convex polytope , the q-integral of f over C with respect to the order of integration is defined as
where , is the set of real numbers depending on the values of with
and (implicitly, hereafter we assume the absolute convergence of the involved series).
It follows from the definition that when the polytope C is a hyper-rectangle, i.e., , the ordering is irrelevant; that is to say, we can change the order of integration, and so, for any ,
for f, a function in m variables, such that the series in the rhs of (8) is absolutely convergent. Thus, for example,
for any and , .
The next Lemmas are useful in connection with iterated and multiple Jackson integrals.
Lemma 1.
Let f be a real function of m variables such that the series appearing below are absolutely convergent. Then, for any ,
- (a)
- .
- (b)
In general, when the polytope of integration is not a hyper-rectangle, the order of integration is important. We illustrate this fact with the following example:
but,
Throughout this paper we will use the reverse ordering, . For , we define the polytopes
and from the definition of a multiple q-integral given in (7),
Lemma 2.
Let f be a real function of m variables such that the series appearing below are absolutely convergent. Then,
- (a)
- (b)
Comparing the results in Lemmas 1 and 2, we obtain
and using these results in combination with (6), it can be confirmed that for and , real numbers such that , for ,
and in particular, for ,
and for and ,
Given a (formal) power series , we denote , . For non-negative integers, n and r, the non-central q-Stirling numbers of the first kind (see [15]) are defined as
and by convention if or or . The case corresponds to the q-Stirling numbers of the first kind; see [2], Ch. 5. We denote . The non-central q-Stirling numbers satisfy the recurrence
Some useful relationships for our purposes are the following:
By equating coefficients of in both sides of
we obtain
Lemma 3.
For the integers , and ,
- (a)
- .
- (b)
- , where .
Lemma 4.
For and f, a non-negative function defined in the non-negative real numbers,
- (a)
- (b)
Many variants of Lemma 4 (all of them with similar proofs) are possible; for instance, for a non-negative function , ,
Note that results like (25) or those in Lemma 4 justify the formal interchange of the summation and integration symbols.
Suppose that is an expression involving the parameter q. Then, it is said that is a q-analogue of A if when . q-analogues appear in different areas of mathematics and physics, such as, for instance, in quantum groups, string theory, fractals, dynamical systems, elliptic curves, etc. It is clear that q-numbers are q-analogues of the corresponding real numbers. Expression (6) is a q-analogue of the well-known formula for Riemann integrals , . Similarly, (9), (14)–(17) are q-analogues of the corresponding expressions obtained by substituting q-integrals for Riemann integrals and q-numbers for real numbers. Then, the following lemma is immediate.
Lemma 5.
If is a polynomial with real coefficients in the variables , then the multiple q-integral
is a q-analogue of the multiple Riemann integral
Finally, we show a connection between multiple q-integrals and expected values involving geometric samples. Let , be independent random variables with geometric distributions of the parameter (i.e., ). Let . Let be a function such that and thus, from (8) we get
3. Weak Records from Geometric Parent
Let be a sequence of iid random variables with a common distribution Ge with and . The index n in the process can be regarded as discrete time. A trajectory of the process is a sequence of non-negative integers. A trajectory up to time n is a finite sequence of length n of non-negative integers. We will denote a finite trajectory of length by a sequence where , are non-negative integers. Consecutive equal integers in a trajectory are denoted with exponents; for instance, is the sequence of length t. A sequence of length 1 formed with a non-negative integer less than k is denoted by . Similar notations are used in the cases ≤, >, ≥, etc. As an example of these notations, consider the trajectory . Whenever it is convenient, this trajectory could be denoted as: or , or ,…etc.
Theorem 1.
For ,
- (a)
- For , the joint pmf of from admits the following q-integral representation:
- (b)
- The joint probability generating function of isfor , .
Proof.
(a) For the integers , , the event corresponds to finite trajectories of length of the process , which are of the form:
with . The probability associated with the trajectory given in (28) is . Then, using Lemma 2(a),
(b) Firstly, observe that for the integers , and ,
For and , , . Using Theorem 1(a) and (29), we have
where the interchange of the summation and integral symbols can be justified with a similar argument to the one in the proof of Lemma 4. □
Theorem 2.
For and ,
- (a)
- (b)
Proof.
- (a)
- From Theorem 1(b), the probability-generating function of isIf , then . Therefore, if , then we have that , and the following expansion holds:On the other hand, it is not difficult to check thatfor , and .
- (b)
- For , using Theorem 2(a),
□
Some other statistics related to weak record indicators can be expressed in terms of q-integrals and q-numbers as shown below.
Theorem 3.
For ,
- (a)
- .
- (b)
- .
- (c)
- .
Proof.
(a) For , firstly observe that
Then,
The event consists of all the trajectories of the process with a weak record at position n, i.e., trajectories of the form ; then, using the definition of the Jackson integral (see (5) and (33))
(b) The trajectories corresponding to the event are of the form , with . Then, using Lemma 2 and (16),
(c) The trajectories corresponding to the event are of the form , with . Then
(see the proof of (a) for the last step). □
The weak record indicators from geometric parents, as opposed to the continuous case, are not independentl for instance,
The number of weak records in a sample of size n is (note that the 0-th record, , is counted).
Theorem 4.
For ,
- (a)
- .
- (b)
- .
- (c)
Proof.
- (a)
- Observe that if , . Then, the result is immediate using Theorem 2(b).
- (b)
- For , using (a) and the properties (22) and (23)For , note that if in a sample of size n, there is only one weak record. Then, it is in the first position, so, using Theorem 3(c),where in the last equality we have used the property .
- (c)
- In the proof of Theorem 3(a), we obtained the integral representationClearly, this formula is also valid for . Therefore,
□
The inter-(weak)-record times are defined as , .
Theorem 5.
For ,
- (a)
- The joint pmf of the inter-record times from a Ge(p) parent isfor , .
- (b)
- The pmf of the m-th inter-weak record is
- (c)
Proof.
(a) It is straightforward from Theorem 1(a).
The next result proves the asymptotic log-normality of the inter-(weak) record times from geometric parents.
Theorem 6.
The inter-weak record times from a parent, , satisfy
Proof.
Let , , be independent random variables. The CLT yields,
For , let us define . From Theorem 5(c) and (26),
for .
Let us define the functions . These functions satisfy the following properties:
- (a)
- (b)
- For a fixed value of , the function is non-decreasing.
Then, for ,
where .
Let be the cumulative distribution function of a standard normal random variable. If we show that
then we can conclude
and this proves (37).
In order to prove (38), for , we have
where
and this quantity can be made arbitrarily small choosing an yet small enough. □
A by-product of Theorem 6 is the following weak law of large numbers:
When , , and this is the limit in probability for as , where denotes the waiting-time between the -th and the m-th upper record in a sequence of iid random variables with a continuous distribution; see [6]. Then, (39) can be seen as a q-analogue of Theorem 1 in [6]. Similarly, our Theorem 6 is a q-analogue of Theorem 2 in [6].
4. Weak Records from Geometric Parents Are -Analogues of Records from Absolutely Continuous Parents
In this section, we show that the distributions and statistics related to weak records from a geometric parent, Ge are q-analogues () of the corresponding distributions and statistics related to ordinary records from absolutely continuous parents.
Theorem 7.
Proof.
Without loss of generality, suppose that are the record times of a sequence of iid random variables uniformly distributed in the interval . For integers , the event corresponds to trajectories of length of the form
with . Then,
where the last step is the result of the change in variables , .
The addition of q-analogues is a q-analogue of the corresponding sum. As an application of this property, as a corollary of Theorem 7, we have that the pmf of (given in Theorem 2) is a q-analogue of (3). Clearly, the pmf of inter-weak-record times from geometric parents (see (35)) is a q-analogue of (4). Similarly, as , it can be checked that
so all of these quantities in the case of a geometric parent are q-analogues of the corresponding quantities in the absolutely continuous case.
5. Ordinary Records from Geometric Parents
Following similar arguments as in the previous sections, it is possible to derive results for ordinary records from a Ge parent, . As before, let and . We quote without proof the following results.
Joint distribution of (ordinary) record times:
for and . The random vector has a probability generating function
Probability function of :
for and .
Joint distribution of inter-record times:
for , .
Probability function of inter-record times:
for , .
Record indicators:
Number of ordinary records:
Comparing these results with those in Section 3, we observe that the formulas for ordinary records are obtained by changing q by in the formulas for weak records. Additionally, as in the case of weak records, the above quantities are Q-analogues of the corresponding ones in the absolutely continuous case.
6. Conclusions
In the case of a geometric parent, we have obtained representations of the joint distribution of weak and ordinary record times and inter-record times in terms of Jackson integrals. These representations are useful for obtaining the marginal distribution of record times, inter-record times, record indicators, number of records, etc. In particular, the marginal distribution of record times can be expressed in terms of non-central q-Stirling numbers. Additionally, we show that statistics related to weak records from geometric parents, Ge, behave as q-analogues () of the corresponding statistics in the case of absolutely continuous parents. There is also a duality between weak and ordinary records from geometric parents: it is possible to obtain results from ordinary records by changing q by in the corresponding formulas or results for weak records.
Author Contributions
These authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding
Grant PID2020-114594GB-C21 funded by MCIN/AEI 10.13039/50110001033.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Proof of Lemma 1.
(a) By changing the summation variables, , , …, , then
where the last equality is justified by (8).
(b) By using the change of variables , , …, , we obtain
□
Proof of Lemma 2.
(a) If , the result is immediate by using the definition of Jackson integral given in (5). Thus, let us suppose in the following that . Let us define
First of all, observe that by construction, coincides with the left-hand side of the statement in Lemma 2. On the other hand, the definition of the Jackson integral yields
and by (back) recurrence, it is not difficult to show that
for . Finally, from (A1) and (A2),
and the result follows from (10).
(b) The proof is similar to the one in , so we omit it. □
Proof of Lemma 3.
Proof of Lemma 4.
For simplicity, we only give the proof in the case that ; for larger values the proof is similar but tedious. Thus, we have
□
References
- Euler, L. Introductio in Analysin Infinitorum. Tomus Primus; Sociedad Andaluza de Educación Matemática “Thales”, Seville; Real Sociedad Matemática Española: Madrid, Spain, 2000. [Google Scholar]
- Ernst, T. A Comprehensive Treatment of q-Calculus; Birkhäuser/Springer Basel AG: Basel, Switzerland, 2012. [Google Scholar]
- Chandler, K.N. The distribution and frequency of record values. J. R. Stat. Soc. Ser. B 1952, 14, 220–228. [Google Scholar] [CrossRef]
- Arnold, B.C.; Balakrishnan, N.; Nagaraja, H.N. Records; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1998. [Google Scholar]
- Nevzorov, V.B. Records: Mathematical Theory; Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 2001; Volume 194. [Google Scholar]
- Neuts, M.F. Waitingtimes between record observations. J. Appl. Probab. 1967, 4, 206–208. [Google Scholar] [CrossRef]
- Kirschenhofer, P.; Prodinger, H. The path length of random skip lists. Acta Inf. 1994, 31, 775–792. [Google Scholar] [CrossRef][Green Version]
- Kirschenhofer, P.; Martínez, C.; Prodinger, H. Analysis of an optimized search algorithm for skip lists. Theor. Comput. Sci. 1995, 144, 199–220. [Google Scholar] [CrossRef][Green Version]
- Prodinger, H. Combinatorics of geometrically distributed random variables: Left-to-right maxima. Discret. Math. 1996, 153, 253–270. [Google Scholar] [CrossRef]
- Bai, Z.D.; Hwang, H.K.; Liang, W.Q. Normal approximations of the number of records in geometrically distributed random variables. Random Struct. Algorithms 1998, 13, 319–334. [Google Scholar] [CrossRef]
- Gouet, R.; López, F.; Sanz, G. Central limit theorems for the number of records in discrete models. Adv. Appl. Probab. 2005, 37, 781–800. [Google Scholar] [CrossRef]
- Gouet, R.; López, F.; San Miguel, M. A martingale approach to strong convergence of the number of records. Adv. Appl. Probab. 2001, 33, 864–873. [Google Scholar] [CrossRef]
- Key, E. On the number of records in an iid discrete sequence. J. Theor. Probab. 2005, 18, 99–107. [Google Scholar] [CrossRef]
- Kim, J.; Stanton, D. On q-integrals over order polytopes. Adv. Math. 2017, 308, 1269–1317. [Google Scholar] [CrossRef]
- Charalambides, C. Non-central generalized q-factorial coefficients and q-Stirling numbers. Discret. Math. 2004, 275, 67–85. [Google Scholar] [CrossRef]
- Kac, V.; Cheung, P. Quantum Calculus; Universitext; Springer: New York, NY, USA, 2002. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).