Near-Record Values in Discrete Random Sequences

Abstract: Given a sequence (X n ) of random variables, X n is said to be a near-record if X n ∈ (M n−1 − a, M n−1 ], where M n = max{X 1 , . . . , X n } and a > 0 is a parameter. We investigate the point process η on [0, ∞) of near-record values from an integer-valued, independent and identically distributed sequence, showing that it is a Bernoulli cluster process. We derive the probability generating functional of η and formulas for the expectation, variance and covariance of the counting variables η(A), A ⊂ [0, ∞). We also derive the strong convergence and asymptotic normality of η([0, n]), as n → ∞, under mild regularity conditions on the distribution of the observations. For heavy-tailed distributions with square-summable hazard rates, we prove that η([0, n]) grows to a finite random limit and compute its probability generating function. We present examples of the application of our results to particular distributions, covering a wide range of behaviours in terms of their right tails.


Introduction
Outstanding achievements and world records in athletics events such as the 100 m sprint always make headlines and arouse widespread admiration. Similarly, considerable media attention and public concern are attached to record figures (often bad) relating to the economy, the weather or healthcare systems. Crucial social questions arise when we are faced with a steady flow of records, which are presented as ominous signs of dramatic underlying phenomena. It is therefore unsurprising that the term "record" has become such a constant in our modern everyday life and in a wide range of specialist domains. The probabilistic theory and statistical analysis of record breaking data can be helpful in assessing the seriousness of these issues.
The mathematical theory of records is well developed, especially for data generated by independent and identically distributed (i.i.d.) random variables (r.v.) with a continuous underlying distribution function. As is well known, in this setting, one can only expect about log n record values among n observations, which means that records are rare. The reader interested in the theory of records can consult the monographs [1][2][3]. For statistical inference from record data, see [4].
Concepts of "quasi-records" emerged as natural extensions of records and have proven to be worthwhile from a mathematical as well as an applied perspective. The general idea of values close to records was translated into a variety of definitions that were theoretically analysed and applied in widely different contexts. Near-records were introduced in [5] for applications in finance, and their properties were analysed in [6,7]. In addition, the related concept of the δ-record was introduced in [8] and later studied in [9][10][11][12]. These objects have practical applications in the case of negative δ, since δ-records are more numerous than records. So, by considering samples of δ-records for statistical inference, we address the problem of the scarcity of records while keeping the extremal nature of the data. Indeed, it has been shown in [9] and in references therein that inferences based on δ-records outperform those based on records only.
The main objects of interest in this work are near-records. An observation is a nearrecord if it is not a record but is at a distance of less than a > 0 units from the last record; that is, it falls short of being a record by less than a units. While the number of records in an i.i.d. sequence of continuous r.v. grows with the logarithm of the number of observations, the number of near-records grows at speeds depending on the distribution of the observations. In fact, for heavy-tailed distributions, there are fewer near-records than records (in extreme cases, only a finite number of near-records can be observed along the whole sequence), while for light-tailed distributions, near-records outnumber records; see [13] for details.
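The defining inequality M n−1 − a < X n ≤ M n−1 is straightforward to check on a sample path. The following minimal Python sketch (the function name is ours) classifies the observations of a fixed integer sequence into records and near-records:

```python
import math

def records_and_near_records(xs, a):
    """Classify observations: X_n is a record if X_n > M_{n-1}, and a
    near-record if M_{n-1} - a < X_n <= M_{n-1}, where M_n is the running
    maximum (the first observation is a record by convention)."""
    records, near_records = [], []
    m = -math.inf  # running maximum M_{n-1}
    for x in xs:
        if x > m:
            records.append(x)
            m = x
        elif m - a < x <= m:
            near_records.append(x)
    return records, near_records

# With a = 2: 4 falls in (3, 5], while 6 and the repeated 7 fall in (5, 7].
# Note that a near-record may equal the current record, but never exceeds it.
recs, nears = records_and_near_records([2, 5, 4, 7, 6, 7, 9], a=2)
print(recs)   # [2, 5, 7, 9]
print(nears)  # [4, 6, 7]
```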
Another interesting aspect of near-records is related to their values. It is well known that the record values of an i.i.d. sequence behave as the so-called Shorrock process [14], which is a mixture of a non-homogeneous Poisson process and a Bernoulli process. In the particular case of non-negative integer-valued r.v., k is a record value with probability P(X 1 = k | X 1 ≥ k), and the events {k is a record value}, k ∈ Z + , are independent.
In this paper, we focus on the process of near-record values for i.i.d. sequences of r.v., taking non-negative integer values. The case of continuous r.v., analysed in [13], showed that near-record values follow a Poisson cluster point process, where records are the centres of the clusters and near-records are the points in each cluster. The main characteristics of this process, including its asymptotic behaviour, were derived from the properties of the Poisson cluster process, which has been thoroughly studied in the literature. In the discrete setting of the present paper, we prove that near-record values also behave as a cluster process, with centres following a Bernoulli process. We fully characterise the process by giving an expression for its probability generating functional. In particular, we find the exact distribution of the number of times that k is a near-record, which turns out to be a mixture of a point mass at 0 and a geometric distribution. We also characterise the distribution of the total number of near-records for heavy-tailed distributions. Moreover, we study the limiting behaviour of the number of near-records with values less than n, as n goes to infinity, by giving laws of large numbers and central limit theorems. Rather than relying on properties of cluster processes, as done in [13], here, we use a more direct approach that consists of approximating the sequences under study by a sum of independent r.v. We give several examples of applications of our results to particular families, ranging from heavy-tailed distributions, with a finite number of near-records, to light-tailed ones, such as the Poisson distribution.
The paper is organised as follows: notations and first definitions are presented in Section 2. The process of near-record values is studied in Section 3, while in Section 4, we consider the eventual finiteness of the total number of near-records, followed by asymptotic results in Section 5. Finally, illustrative examples are shown in Section 6 and Appendix A is devoted to technical results.

Notation and Preliminary Definitions
The sets of real and positive real numbers are denoted by R and R + , respectively. The sets of positive and non-negative integers are denoted by N and Z + , respectively. Sequences in R are indexed by N and are written in lower-case letters, between parentheses, such as (x n ), (y k ), etc. All r.v. are assumed to be defined on a common probability space (Ω, F , P). The indicator r.v. of an event B ∈ F , taking the value 1 on B and 0 otherwise, is denoted by 1 B . The indicator function of A ⊆ R, equal to 1 on A and 0 on R \ A, is denoted by 1 A .
When referring to a geometric r.v. or distribution throughout the paper, we assume 0 as the starting value. The probability generating function (p.g.f.) of an r.v. X, taking values in Z + , is defined as ϕ X (t) = E(t X ) = ∑ ∞ k=0 t k P(X = k), for all t ∈ R, such that the series is absolutely convergent.
Sequences of r.v. are also indexed by N and are written in upper-case letters, such as (X n ), (Y k ), etc. The convergence of deterministic sequences to a limit L is denoted by x n → L or lim x n = L, and it is implicitly understood as n → ∞, unless otherwise stated. The notation x n ∼ y n stands for x n /y n → 1. The same notation applies to random sequences, where the mode of convergence (almost sure, a.s. −→, or in distribution, D −→) is indicated over the arrow. The σ-algebra of Borel subsets of R + is denoted by B + .

Definition 1. Let (X n ) be a sequence of r.v. and let a be a positive parameter. Then, for n ∈ N,
1. X n is a record if X n > M n−1 , and
2. X n is a near-record if M n−1 − a < X n ≤ M n−1 .
From the above definitions, it is clear that a near-record is not a record, but it can take the value of the current record. Other random sequences of interest related to records are record times (L n ), defined as L n = min{k ∈ N | k > L n−1 , X k > M k−1 }, for n ≥ 1, with L 0 = 0, and record values (R n ), given by R n = X L n = M L n , for n ≥ 1. Additionally, we consider the set of record values as a point process on R, which can be described by the random counting measure ξ, defined by ξ(A) = card{n ∈ N | R n ∈ A}, A ∈ B + . We also define I n := ξ({n}), the indicator of the event that a record takes the value n.
Observe that record times L n are the jump times of the sequence of partial maxima and that record values R n are the (strictly increasing) subsequence of partial maxima (M n ), sampled at those jump times. However, without further probabilistic assumptions on (X n ), it may happen that L n = ∞, from some value of n on, which is equivalent to the existence of a final record. Furthermore, we have to ensure that the counting measure ξ is boundedly finite in the sense of being finite on bounded sets.
Similarly, the sequence (L a n ) of near-record times is defined by L a n = min{k ∈ N | k > L a n−1 , M k−1 − a < X k ≤ M k−1 }, for n ≥ 1, with L a 0 = 0, and near-record values (R a n ) are given by R a n = X L a n , for n ≥ 1. We define the counting measure of near-record values by η(A) = card{n ∈ N | R a n ∈ A}, A ∈ B + , and define the related r.v. η(n) = η([0, n]) and η n = η({n}), for n ∈ Z + . As for records, assumptions are needed in order to ensure that near-record times and values are well defined. Additionally, in order to characterise η as a cluster point process, we consider a classification of near-records in terms of their proximity to records.

Definition 2.
(a) For m, n ∈ N, the n-th near-record value R a n is said to be associated to the m-th record value R m if L m < L a n < L m+1 . (b) For m ∈ N, the point process η(· | R m ) of near-record values associated to R m is defined by the random counting measure η(A | R m ) = card{n ∈ N | R a n ∈ A, L m < L a n < L m+1 }, A ∈ B + .
We state here the probabilistic assumptions regarding (X n ), which hold throughout the paper. We assume that (X n ) is a sequence of i.i.d. r.v., taking non-negative integer values, with p k := P(X 1 = k), y k := P(X 1 > k), k ∈ Z + . For convenience, we define p k = 0 and y k = 1 for k < 0.
In addition, let r k = p k /y k−1 , k ∈ Z + , be the hazard or failure rates, and let q k = y k /y k−a , k ∈ Z + . Note that y k = ∏ k i=0 (1 − r i ), k ∈ Z + . In order to ensure that no final record exists, and thus that all record times are well defined, we assume that y k > 0, for all k ∈ Z + . This, in particular, implies r k < 1, for all k ∈ Z + . In addition, to avoid unnecessary complications, we assume a ∈ N.

According to Definition 2, there are no near-records associated to R 1 ; there is one near-record (with value 3) associated to R 2 , one near-record (with value 6) associated to R 3 , one near-record (with value 7) associated to R 4 and two near-records (with values 6 and 7) associated to R 5 . Note also that, as X 17 = R 6 = 12 and a = 3, there will be no near-records with value smaller than 10 after observation 17. Thus, η([0, 9]), the number of near-records with value in the interval [0, 9], is equal to 5.
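The identity y k = ∏ k i=0 (1 − r i ) can be checked numerically. The sketch below (function name ours) uses the standard discrete hazard r k = p k /y k−1 , which is the convention consistent with that identity:

```python
def hazards(pmf, K):
    """Hazard rates r_k = p_k / y_{k-1}, where y_k = P(X > k) and
    y_{-1} = 1; pmf is the p.m.f. of a Z_+-valued r.v., given up to index K."""
    y_prev = 1.0          # y_{-1} = P(X > -1) = 1
    r = []
    for k in range(K + 1):
        r.append(pmf[k] / y_prev)
        y_prev -= pmf[k]  # y_k = y_{k-1} - p_k
    return r

# The geometric p.m.f. p_k = p(1-p)^k has constant hazard r_k = p, and the
# identity y_k = prod_{i<=k} (1 - r_i) reduces to y_k = (1-p)^{k+1}.
p = 0.3
pmf = [p * (1 - p) ** k for k in range(60)]
r = hazards(pmf, 40)
assert all(abs(rk - p) < 1e-9 for rk in r)
yk = 1.0
for i in range(11):
    yk *= 1 - r[i]
print(abs(yk - (1 - p) ** 11) < 1e-9)  # True
```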

The Point Process of Near-Record Values
We recall that a point process N on R + can be seen as a random measure, with a probability generating functional (p.g.fl.) defined by G N [h] = E(exp(∫ log h(x) N(dx))), under appropriate conventions regarding the logarithm of 0, where h : R + → [0, 1] is a measurable function, equal to 1 outside some bounded subset of R + (such functions are referred to as "suitable"). Alternative formulas for the p.g.fl., in the form of a product-integral or a product over the atoms of N, are given in (6); for a point process with atoms in Z + , the product form reads G N [h] = E(∏ k∈Z + h(k) N({k}) ).

In this section, we show that the near-record process η is a discrete cluster process.
The process η can be seen as a superposition of a denumerable family of point processes which, by Proposition 1 (c) below, are conditionally independent. Moreover, since the r.v. X n take values in Z + , we have, for every bounded A, η(A) ≤ L K+a , where K ∈ N is an upper bound of A; hence, the process η is boundedly finite.
We characterise η by means of its p.g.fl. and compute its first moments and other quantities of interest. To that end, we first present some useful results about records and near-records.

Lemma 1. (a) The point process ξ of record values has its atoms in Z + , and the r.v. I n are independent Bernoulli, with E(I n ) = r n , n ∈ Z + . (b) For any suitable function h,

Proof. For a proof of (a), see, for instance, Theorem 16.1 in [3]. To prove (b), from (a) and the second formula in (6), noting that h = 0 outside a bounded set and using the convention 0 0 = 1, we obtain

Proposition 1. (a) Conditionally on R m , the number S m of near-records associated to R m is geometrically distributed, with parameter q R m . (b) Let L a n 1 < · · · < L a n S m be the near-record times associated to R m . Then, conditionally on R m , S m , the near-record values Y m,j := X L a n j are i.i.d., with probabilities π(k, i) := P(X 1 = k | i − a < X 1 ≤ i) = p k /(y i−a − y i ), for i − a < k ≤ i. Moreover, conditionally on R m , S m , the r.v. N m,k := card{j ≤ S m | Y m,j = k} are jointly multinomial, with parameters S m and π(k, R m ). (c) Conditionally on the sequence of record values (R m ), the point processes η(· | R m ), m ∈ N, are independent.

Proof. (a) Note that the r.v. X n , n > L m , are independent and identically distributed as X 1 . Among them, only the observations greater than R m − a are relevant: each one is either a near-record (if not greater than R m ) or a new record. Lastly, S m is the number of relevant observations up to (but not including) the first one greater than R m . Hence, conditionally on R m , S m is geometrically distributed, as stated. (b) The near-record values Y m,j are precisely the relevant observations before the next record. So, conditionally on R m , S m , they are i.i.d. with probabilities π(k, R m ). In addition, from the arguments above, it is clear that the N m,k are (conditionally) multinomial.
We compute below the p.g.fl. of the point process η(· | R m ), which is obtained from (6) by taking the conditional expectation. That is, for a suitable function h, we define in (11) the quantities α i (h) and α i (A) := α i (1 A ), for A ∈ B + . Additionally, let α i (n) = α i ([0, n]), n ∈ Z + .

Proposition 2. For a suitable function h,
Proof. Suppose R m = i, for some m ∈ N. From (10) and (b) of Proposition 1, we get

where the third and fourth equalities follow from the expressions of the p.g.f. of the multinomial and geometric distributions, respectively.

Definition 3 (Definition 6.3.I in [15]). A (boundedly finite) point process N is a cluster point process on R + , with centre process N c on R + and component processes the family of point processes {N(· | y) : y ∈ R + } if, for every bounded A ∈ B + ,

Definition 4. For i ∈ Z + , let ζ i be the point process with p.g.fl. given by

where h is a suitable function and α i (h) is defined in (11).
Theorem 1. (a) The point process η of near-records is a cluster process on Z + , with centre process ξ and independent component processes ζ i , i ∈ Z + . In particular, taking h(k) = t 1 A (k) , t ∈ [0, 1] and A ∈ B + bounded, we obtain the p.g.f. of η(A), given in (15). (c) For every bounded A, B ∈ B + ,

Proof. (a) Observe that

So, according to Definition 3, η is a cluster point process, as asserted. The independence of the component processes follows from (c) in Proposition 1, because η(A | R m ) is F m -measurable, for any m ∈ N.
For ϕ η(A) (t), we replace h by t 1 A in (14) and get (15). (b) Write N m,A := ∑ k∈A N m,k , and recall that N m,k is binomial, conditional on R m , S m , with parameters S m , π(k, R m ), and that S m is geometric, conditional on R m , with parameter q R m . Moreover, N m,A is binomial, conditional on R m , S m , with parameters S m and π(A, i) := ∑ k∈A π(k, i). The expectation follows, and it is finite since α i (A) > 0 only for a finite set of i values.
(c2) From the computations above, it is clear that Hence, the variance of the conditional expectation is We compute next the expectation of the conditional variance, namely E(Var(η(A) | R)). Observe that, because of the conditional independence of the η(A | R n ), n ∈ N, we have Collecting terms from the expressions above and using the formula Var(η(A)) = Var(E(η(A) | R)) + E(Var(η(A) | R)), we obtain (c3) The covariance Cov(η(A), η(B)), when A ∩ B = ∅, follows immediately from the formula for the variance, noting that η(A ∪ B) = η(A) + η(B).

Corollary 1.
For N ∈ Z + , the r.v. η N (the number of near-records taking the value N) is distributed as a mixture of a point mass at 0, with probability (1 − c)/(1 + d), and a geometric distribution, with a success probability equal to 1/(1 + d). That is,

where c = r N and d = r N /q N+a−1 . Moreover,

Proof. After simple computations, we obtain the p.g.f., which yields the probability mass function (p.m.f.) (20). The formulas in (21) follow from standard computations. In fact, the r.v. η N and η M are independent if |M − N| ≥ a, due to the independence of the r.v. I n . In other words, the r.v. η N are (a − 1)-dependent.

Finiteness of the Number of Near-Records
Theorem 2. If ∑ ∞ i=0 r 2 i < ∞, then η(R + ) < ∞ a.s.; that is, the number of near-records in the whole sequence (X n ) is finite a.s. Moreover, η(R + ) has finite expectation and p.g.f., given by (23) and (24), respectively, with α i (R + ) = y i−a /y i − 1.

Proof. From Proposition 1, we have
Taking the expectation above, we obtain

where the final term in the display above follows from the Cauchy-Schwarz inequality. Therefore, by the Borel-Cantelli lemma, P(S m > 0 i.o.) = 0, which yields the result. In order to compute the p.g.f. of η(R + ), we observe that η(n) a.s. −→ η(R + ), and so, by the monotone convergence theorem, ϕ η(n) (t) → ϕ η(R + ) (t), for t ∈ [0, 1]. Furthermore, from (15), we have

The interchange of the limit and product above is justified by the monotone convergence theorem, after taking logarithms, since the sequence inside the product decreases with n.
Finally, (23) is obtained, for example, from the derivative of ϕ η(R + ) (t) at t = 1 − , or as the limit of E(η(n)). Finiteness follows from the bound 1 − q i ≤ ∑ i j=i−a+1 r j , used in (25), which implies q i → 1. Indeed, for sufficiently large i, we have q i ≥ 1/2 and

The conclusion α i (R + ) < ∞ is obtained after arguing as in (25).
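The bound 1 − q i ≤ ∑ i j=i−a+1 r j used above can be verified directly. The short derivation below assumes the representation q i = y i /y i−a = ∏ i j=i−a+1 (1 − r j ), which is consistent with the identity y k = ∏ k i=0 (1 − r i ) of Section 2:

```latex
1 - q_i \;=\; 1 - \prod_{j=i-a+1}^{i} (1 - r_j)
       \;\le\; \sum_{j=i-a+1}^{i} r_j ,
```

where the inequality is the elementary bound 1 − ∏ j (1 − u j ) ≤ ∑ j u j , valid for u j ∈ [0, 1], applied to u j = r j .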

Asymptotic Behaviour
We now focus on the asymptotic behaviour of η(n). From Theorem 2, we know that if ∑ ∞ i=0 r 2 i < ∞, then lim n→∞ η(n) is finite a.s. In this section, we obtain laws of large numbers and a central limit theorem for η(n) under the assumption ∑ ∞ i=0 r 2 i = ∞.

Lemma 2. The random variables Z i , i ∈ Z + , defined in (27), are independent, with p.g.f. given by

ϕ Z i (t) = 1 − r i + r i q i /(1 − (1 − q i )t), t ∈ [0, 1]. (28)
Proof. For simplicity, we prove only pairwise independence; the argument extends to the general case, although the details are somewhat laborious. We compute the joint p.g.f. of Z i , Z j as follows: for s, t ∈ [0, 1],

From the formula above, we get ϕ Z i ,Z j (s, 1) = ϕ Z i (s), as in (28).

Remark 2. Note that Z i in (27) is the number of near-records associated to i if i is a record value, and is equal to 0 otherwise. Indeed, (28) shows that Z i is distributed as a mixture of a point mass at 0 and a geometric random variable of parameter q i , with respective weights 1 − r i and r i .

Our interest in the variable Z i arises from the following inequalities, which are easily verified: The strategy of the proof is to establish the desired asymptotic results for the sum of the Z i , which are then transferred to η. For that purpose, we assume some minimal conditions on the hazard rates r n , besides ∑ ∞ i=0 r 2 i = ∞. The following proposition gathers some useful facts about the variable Z i .
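The mixture description of Z i (a point mass at 0 with weight 1 − r i , and a geometric of parameter q i with weight r i ) can be checked by simulation. The sketch below (names ours) compares the empirical mean with E(Z) = r(1 − q)/q, which follows directly from the mixture form:

```python
import random

random.seed(12345)

def sample_Z(r, q):
    """Sample from the mixture: with probability 1 - r the value is 0;
    with probability r it is geometric(q), starting at 0."""
    if random.random() >= r:
        return 0
    z = 0
    while random.random() >= q:  # count failures before the first success
        z += 1
    return z

# Monte Carlo check of the mean, with r and q playing the roles of r_i, q_i.
r, q, n = 0.5, 0.4, 200_000
mean = sum(sample_Z(r, q) for _ in range(n)) / n
assert abs(mean - r * (1 - q) / q) < 0.03  # E(Z) = 0.75 here
```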
Proposition 3. If either (i) ∑ ∞ i=0 r 2 i = ∞ and lim sup r n < 1, or (ii) lim r n = 1 and lim (1 − r n )/(1 − r n−1 ) = 1 hold, then

Proof. (a) The (factorial) moments of Z i are computed by differentiating the p.g.f. in (28).
For the variance, we obtain from (a) and the divergence of expectations that (c) Suppose that (i) holds. Then, lim inf q n > 0, which implies that expectations E(Z n ) and variances Var(Z n ) are bounded above. Hence, from (b), the limits in (c1) hold.

Proof. By (29) and (c1) in Proposition 3, the result follows if we show that
By the strong law of large numbers for sequences of i.i.d. r.v., (32) follows if we prove that

Suppose first that (i) holds. Note that Var(Z i ) ≤ 2E(Z i )/q i , i ∈ Z + , and also that lim inf n q n > 0. Hence, there exists a positive constant γ such that q n > γ, for n ∈ Z + . So,

where the convergence of the right-hand side of (34) follows from the Abel-Dini theorem (Theorem A1).
On the other hand, if condition (ii) holds, then it is easy to see that q n → 0 and lim n n β (1 − q n /q n−1 ) = 0. In addition, q n E(Z n ) → 1 and q 2 n Var(Z n ) → 1. Therefore, (33) is equivalent to

which follows from Proposition A1, thus proving the stated result.
Proof. First, we prove asymptotic normality for ∑ n i=1 Z i and then transfer the result to η(n). To that end, we show that the following Lyapunov condition holds: where Note first that Var(Z i ) ≥ E(Z i ). Moreover, if (i) holds, then lim inf q n > 0, and from (37) above, we obtain E(|Z i − E(Z i )| 3 ) ≤ KE(Z i ), i ∈ N, where K > 0 is a generic constant. So, the sequence in (36) is bounded above by K(∑ n i=1 E(Z i )) −1/2 , which tends to 0, because of Proposition 3 (b).
From the inequalities above, (41) is a direct consequence of (c1) in Proposition 3.

Examples
In this section, we present some examples of application of our results to particular distributions. For each distribution, we consider the r.v. η N analysed in Corollary 1. In particular, we give formulas for E(η N ), Var(η N ) and the correlation ρ(η N , η N+1 ). We also study the asymptotic behaviour of η(n).
The distributions that we consider in this section are very different in terms of their right tails. Example 2 is devoted to a heavy-tailed distribution, similar to the Zeta distribution (see Example 3.1 in [16]), which is the discrete counterpart of the Pareto distribution. Example 3 deals with the geometric distribution, which has an exponential-like tail, while Example 4 is about the Poisson distribution, which is light-tailed.
Example 2 (Heavy-tailed distribution). Let p k = (k(k + 1)) −1 , hence y k = (k + 1) −1 and r k = (k + 1) −1 , for k ∈ N. Then, from (21), we have

Regarding the asymptotic behaviour of η(n), we observe that ∑ ∞ k=0 r 2 k < ∞ and so, from Theorem 2, η(n) → η(R + ) < ∞ a.s. We now compute the main characteristics of this r.v. For the expectation, note that α i (R + ) = i, for i < a, and α i (R + ) = a/(i − a + 1), for i ≥ a. So, from (23), we obtain

It is interesting to see that the expected total number of near-records is equal to the near-record parameter a. For the variance, we use (19) to obtain

which, after some algebra (see Appendix B), yields

In addition, the p.g.f. of η(R + ) is easily computed from (24) as

The p.m.f. of η(R + ) can be obtained from (44). For instance, taking t = 0, we get

Example 3 (Geometric distribution). The geometric distribution has p k = p(1 − p) k , y k = (1 − p) k+1 , r k = p, for k ∈ Z + , with p ∈ (0, 1). From (21), it is easy to see that

Observe that none of the quantities above depend on N. The p.m.f. of η N is given by (20), with c = p and d = p/(1 − p) a .
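The closed forms of Example 2 are easy to verify numerically. The sketch below checks the piecewise values of α i (R + ) = y i−a /y i − 1, and also that ∑ i r i α i (R + ) approaches the parameter a, a plausible reading of the elided formula (23) that is consistent with the stated value E(η(R + )) = a (the function names and that reading of (23) are our assumptions):

```python
a = 3  # near-record parameter

def y(k):
    """y_k = P(X > k) = 1/(k+1) in Example 2, with y_k = 1 for k < 0."""
    return 1.0 if k < 0 else 1.0 / (k + 1)

def alpha(i):
    """alpha_i(R_+) = y_{i-a}/y_i - 1, as in Theorem 2."""
    return y(i - a) / y(i) - 1.0

# Piecewise values from the example: i for i < a, and a/(i-a+1) for i >= a.
assert all(abs(alpha(i) - i) < 1e-12 for i in range(a))
assert all(abs(alpha(i) - a / (i - a + 1)) < 1e-12 for i in range(a, 200))

# Assuming (23) reads E(eta(R_+)) = sum_i r_i alpha_i(R_+): with r_i = 1/(i+1),
# the partial sums converge to a (the tail decays like a/i^2).
total = sum(alpha(i) / (i + 1) for i in range(2_000_000))
assert abs(total - a) < 1e-4
```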
Since r k = p, k ∈ Z + , hypothesis (i) of Theorems 3 and 4 holds, and so we have a strong law of large numbers and a central limit theorem for η(n). For Theorem 3, note that η(n) = ∑ n N=0 η N and, therefore,

For the central limit theorem, as shown in the proof of Theorem 4, we can replace Var(η(n)) by ∑ n i=0 Var(Z i ) in the denominator of (35). Since q i = (1 − p) a , for i ≥ a, from Proposition 3 (a), we get

Therefore,

Example 4 (Poisson distribution). The Poisson distribution has p k = e −λ λ k /k!, for k ∈ Z + . Although there is no closed form for y k and r k , the following bounds, taken from [17], are useful:

Explicit expressions for the quantities in (21) can be written out, but they shed little light on their dependence on λ and N. Instead, we analyse their asymptotic behaviour for large N. By (46), r k → 1, so y k /y k−1 → 0. Therefore, ∑ N+a−1 i=N r i /y i ∼ 1/y N+a−1 and

and ρ(η N , η N+1 ) ∼ λ/N. Hence, as in Example 2, the correlation coefficient between η N and η N+1 converges to 0 as N → ∞.
For the asymptotic behaviour of η(n), note that (46) guarantees that lim n r n = 1 and, moreover, |r n − r n−1 | ≤ C/n 2 and 1 − r n ≥ D/n, for all large enough values of n and some positive constants C and D. Hence, condition (ii) in Theorem 3 holds, with β = 3/4, and so does condition (ii) in Theorem 4. To apply Theorem 3, note that, since E(η N ) ∼ (N/λ) a , we have E(η(n)) = ∑ n N=0 E(η N ) ∼ n a+1 /(λ a (a + 1)) and η(n)/n a+1 a.s. −→ 1/(λ a (a + 1)).
For the central limit theorem, as in Example 3, the scaling sequence can be taken as (∑ n i=0 Var(Z i )) 1/2 . Since

In addition, from the proof of Theorem 4, the centring sequence in the central limit theorem can be chosen as ∑ n i=0 E(Z i ), which in turn can be replaced by ∑ n i=a q −1 i . Indeed, for i ≥ a,

Thus, by (46), ∑ n i=0 e i ≤ n + Cn a , for a given constant C > 0, which implies ∑ n i=0 e i /(∑ n i=0 Var(Z i )) 1/2 → 0.
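The Poisson hazard behaviour invoked above (r k increasing to 1, with 1 − r k of order λ/k) can be illustrated numerically. One caveat: computing y k by repeated subtraction from 1 loses all precision once the tail falls below machine epsilon, so the sketch below (function names ours) sums the tail forwards instead:

```python
import math

def poisson_tail(lam, k, terms=300):
    """y_k = P(X > k) for Poisson(lam), summed forwards from p_{k+1}
    to avoid the catastrophic cancellation of 1 - sum_{j<=k} p_j."""
    s = 0.0
    pj = math.exp(-lam) * lam ** (k + 1) / math.factorial(k + 1)
    for j in range(k + 1, k + 1 + terms):
        s += pj
        pj *= lam / (j + 1)  # recurrence p_{j+1} = p_j * lam/(j+1)
    return s

def poisson_hazard(lam, k):
    """r_k = p_k / y_{k-1} = p_k / (p_k + y_k)."""
    pk = math.exp(-lam) * lam ** k / math.factorial(k)
    return pk / (pk + poisson_tail(lam, k))

lam = 2.0
r10, r30, r50 = (poisson_hazard(lam, k) for k in (10, 30, 50))
assert r10 < r30 < r50 < 1.0             # hazard increases towards 1
assert 0.9 < (1 - r50) * 51 / lam < 1.1  # consistent with 1 - r_k of order lam/k
```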

Remark 3.
In Examples 3 and 4 above, we observe that the normalising sequences in the law of large numbers and central limit theorem depend on the right-tail behaviour of the parent distribution of the observations. This is also the case for the speed of convergence of η(n) to the normal distribution in the central limit theorem, as shown in Figure 2. Convergence is very fast for the geometric distribution, while it is much slower in the Poisson distribution (the distribution of η(30) in the geometric distribution is closer to the normal than the distribution of η(100) in the Poisson).

Conclusions and Future Work
In this paper, we have studied the point process of near-record values from a sequence of independent and identically distributed discrete random variables. Near-records arise as a natural complement of records, with applications in statistical inference.
We have shown that this process is a Bernoulli cluster process and obtained its probability generating functional, as well as formulas for the expectation, variance, covariance and probability generating functions for related counting processes.
We have given a condition for the finiteness of the total number of near-records along the whole sequence of observations. This condition is stated in terms of the convergence of the series of squared hazard rates. In addition, the explicit expression of its probability generating function is obtained.
In the case where the total number of near-records is not finite, strong convergence and central limit theorems for the number of near-record values in growing intervals are derived under mild regularity conditions. Finally, we have presented examples of the application of our results to particular families, which show that the asymptotics of near-record values depend critically on the right-tail behaviour of the parent distribution.
Some interesting questions remain open, such as a more detailed analysis of the sequence η(n), including the law of the iterated logarithm and large deviations, or departures from the i.i.d. hypothesis (e.g., linear trend model). They will be addressed in future work.