Statistical Inference for the Weibull Distribution Based on δ-Record Data

We consider the maximum likelihood and Bayesian estimation of parameters and prediction of future records of the Weibull distribution from δ-record data, which consists of records and near-records. We discuss existence, consistency and numerical computation of estimators and predictors. The performance of the proposed methodology is assessed by Monte Carlo simulations and the analysis of monthly rainfall series. Our conclusion is that inferences for the Weibull model, based on δ-record data, clearly improve inferences based solely on records. This methodology can be recommended, more so as near-records can be collected along with records, keeping essentially the same experimental design.


Introduction
Informally, a record is an extraordinary value of a variable, which surpasses all of its kind. Records are very popular in fields such as sports, climatology, finance or insurance. In mathematical terms, given a sequence of real-valued observations X_1, X_2, …, we define X_1 as the first record, by convention, and we say that X_n is an upper record (or simply a record) if X_n > M_{n−1} holds, where M_{n−1} = max{X_1, …, X_{n−1}}, for n ≥ 2. Records have been extensively studied in Extreme Value Theory and their probabilistic properties, mainly under the assumption of independent and identically distributed observations with continuous underlying distribution, are well known. This classical setting has significant symmetry which greatly simplifies calculations, because it implies that all orderings of the observations are equally likely. On the other hand, departures from symmetry, such as the existence of a trend in the sequence of observations, bring about technical complexities which require more sophisticated mathematical tools. For general information on record theory, see Reference [1] or Reference [2].
In parallel, statistical inference for record data has developed considerably, impelled by the availability of many data sets of records and also because, in contexts such as destructive stress-testing, efficient sampling schemes (in terms of the number of broken units) yield record series. There is a vast literature on inference for record data and the interested reader can consult, for example, References [3][4][5].
A serious problem with record data is its relative scarcity, since a sequence of n iid observations has only about log n records. So extra data may be needed and a reasonable option is near-record data, presented in Section 3.5. Section 4 is devoted to Bayesian inference. We compute Bayes estimators and highest posterior density (HPD) intervals of the parameters, using two different priors in the case of both parameters unknown. Then we consider Bayesian prediction of future records. Results from simulations and real data are shown in Sections 4.5 and 4.6. In Section 5 we present our conclusions.
We end this introduction with some comments about the novelty of results presented in this paper. We remark first that, while the expressions for the likelihood of the sample of δ-records and for the MLE of the parameters were first obtained in Reference [10], the remaining results are new, in the context of continuous distributions. These novel results include strong consistency of the MLE of the scale parameter (under known shape parameter); the development of a frequentist strategy for the prediction of future records and the proof of existence of predictors. In the Bayesian framework we develop the estimation of parameters, under a variety of choices of prior distributions, also complemented with a consistency result and, finally, we propose and analyze a method for predicting new records.

Preliminaries
Let X_n, n ≥ 1, be a sequence of independent and identically distributed (iid) random variables, with common distribution function F_θ and density f_θ, where θ is a parameter. Let M_n = max{X_1, …, X_n}, n ≥ 1, be the sequence of partial maxima.

Definition 1.
Let X_1 be a record by convention and, for n ≥ 2, let X_n be an (upper) record if X_n > M_{n−1}. The indexes L(n) corresponding to record observations are called record times. That is, L(1) = 1 and, for n ≥ 2, L(n) = min{m > L(n − 1) : X_m > M_{m−1}}. Records (or record values) R_n are defined by R_n = X_{L(n)}, n ≥ 1.

Definition 2.
Let δ be a fixed real parameter. Let X_1 be a δ-record by convention and, for n ≥ 2, let X_n be a δ-record if X_n > M_{n−1} + δ.
The sequences of δ-record times and δ-records are defined analogously to those of records. Note that if δ = 0, δ-records are just records. If δ > 0, δ-records are a subsequence of records and, on the contrary, if δ < 0, records are a subsequence of δ-records. So, the only statistically relevant situation is δ < 0, since δ-records then contain all records plus the so-called near-records (in the sense of Reference [7]). Given δ < 0, X_n is a δ-near-record (or simply near-record) if X_n ∈ (M_{n−1} + δ, M_{n−1}]. In other words, near-records are close to being records but are not records. It is also clear that near-records are not symmetrically clustered around records. In the rest of the paper we assume δ ≤ 0.

Definition 3.
(i) A near-record X_n is said to be associated to the m-th record R_m if L(m) < n < L(m + 1).
(ii) The number of near-records associated to R_m is denoted by S_m.
(iii) If S_m > 0, the vector of near-records associated to R_m is denoted by (Y_{m,1}, …, Y_{m,S_m}).
When referring to the sample as a random object we use bold upper-case letters, T = (R, S, Y); otherwise we write t = (r_n, s_n, y_n). Note that T contains a fixed number n of records (R), plus the counts (S) and respective values (Y) of all near-records associated to each record. So, T has random length, depending on the (random) numbers S_i of near-records associated to each record R_i. In turn, the distribution of S_i depends in a nontrivial way on δ and is also affected by the tail behavior of F_θ. For example, the number of near-records obviously increases with the absolute value of δ. On the other hand, if F_θ is heavy-tailed, there will tend to be fewer near-records. Laws of large numbers and central limit theorems for the number of near-records can be found in References [28,29]. The normalizing sequences of the asymptotic results in Example 4 of Reference [10] may serve as proxies for the expected number of near-records in the Weibull distribution.
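As an illustration of the structure of the sample t, the following Python sketch (our own, not from the paper; the function name delta_record_sample is hypothetical) extracts the records, the near-record counts and the near-record values from a simulated iid Weibull sequence:

```python
import numpy as np

def delta_record_sample(x, delta):
    """Split a sequence into records r, near-record counts s and
    near-record values y (delta <= 0 assumed)."""
    r, s, y = [x[0]], [0], [[]]
    m = x[0]                       # running maximum M_{n-1}
    for v in x[1:]:
        if v > m:                  # record
            r.append(v)
            s.append(0)
            y.append([])
            m = v
        elif v > m + delta:        # near-record in (M_{n-1} + delta, M_{n-1}]
            s[-1] += 1
            y[-1].append(v)
    return r, s, y

rng = np.random.default_rng(0)
x = rng.weibull(2.0, size=200)     # Weibull observations, shape beta = 2
r, s, y = delta_record_sample(x.tolist(), delta=-0.5)
```

Note that, as discussed in Remark 1 below, near-records associated to the last record may be incomplete if sampling stops before the next record is observed.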

Remark 1.
Observe that in (1) the sample is assumed to contain all near-records associated to the last record R_n. To ensure that all near-records are present in the sample, the value of R_{n+1} must be observed. As commented in Section 4.2.1 of Reference [10], the data may not contain all near-records associated to R_n, since R_{n+1} is not observed. So, it is not known whether some near-records associated to R_n are missing and, in such a situation, the likelihood has to be modified accordingly. It is easy to see that this amounts to substituting F_θ(r_n) for F_θ(r_n + δ) in (1), which yields the modified likelihood L̃ of a possibly incomplete sample. In the rest of the paper, except in the analysis of real data, we work with L (thereby assuming that all near-records associated to R_n have been collected). Results for L can be adapted to L̃ with only minor modifications.

Density of Future Records
The prediction of a future record R_m, m > n, is based on its conditional density given T, as presented below.

Proposition 2.
The density of the m-th record R_m, conditional on T, is given by

f_{R_m}(z | t, θ) = [Λ_θ(z) − Λ_θ(r_n)]^{m−n−1} / (m − n − 1)! · f_θ(z) / F̄_θ(r_n),

for m > n and z ≥ r_n, where F̄_θ = 1 − F_θ and Λ_θ(z) = − log F̄_θ(z).
Proof. Conditionally on R n , R m is independent of T (see Proposition 1 in Reference [10]). Therefore, the density of R m given T is the same as the density of R m given R n , which proves the result.
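Proposition 2 also gives a direct way to simulate R_m given R_n: for a continuous distribution, Λ_θ(R_m) − Λ_θ(r_n) is distributed as a sum of m − n unit exponentials, that is, a Gamma(m − n) variable. A hedged Python sketch for the Weibull model with scale λ and shape β, where Λ(z) = (z/λ)^β (the function name is ours):

```python
import numpy as np

def sample_future_record(r_n, m_minus_n, lam, beta, size, rng):
    """Draw R_m given R_n = r_n for the Weibull(lam, beta) model:
    Lambda(R_m) - Lambda(r_n) ~ Gamma(m - n, 1), Lambda(z) = (z/lam)**beta."""
    lam_rn = (r_n / lam) ** beta
    incr = rng.gamma(shape=m_minus_n, scale=1.0, size=size)
    return lam * (lam_rn + incr) ** (1.0 / beta)

rng = np.random.default_rng(1)
draws = sample_future_record(r_n=1.2, m_minus_n=2, lam=1.0, beta=2.0,
                             size=100_000, rng=rng)
```

For β = 1 this reduces to adding m − n exponential inter-record gaps with mean λ to r_n.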

Weibull Distribution
We present here the likelihoods corresponding to the Weibull distribution, with two parametrizations: P_1 for classical (frequentist) inference and P_2 for Bayesian analysis. We find it convenient to work with these two parametrizations mainly because they are found in the literature, associated respectively with classical and Bayesian inference. See, for example, the Weibull Distribution Handbook [27], where the author argues that P_2 separates the parameters and has the advantage over P_1 of simplifying the algebra in Bayesian manipulations.

Definition 4.
Let the parametrizations P_1, P_2 of the Weibull distribution be defined respectively by

F_{λ,β}(x) = 1 − exp{−(x/λ)^β}  (P_1)  and  F_{α,β}(x) = 1 − exp{−α x^β}  (P_2),

for x ≥ 0 and λ, α, β > 0.
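As a quick numerical check, the two parametrizations are linked by α = λ^{−β} (the relation used later in the consistency proofs); a minimal sketch:

```python
import numpy as np

def cdf_p1(x, lam, beta):
    # P1: scale/shape parametrization
    return 1.0 - np.exp(-(x / lam) ** beta)

def cdf_p2(x, alpha, beta):
    # P2: rate-type parametrization, convenient in Bayesian updates
    return 1.0 - np.exp(-alpha * x ** beta)

lam, beta = 2.0, 1.5
alpha = lam ** (-beta)            # alpha = lam^{-beta} links P1 and P2
x = np.linspace(0.0, 5.0, 101)
assert np.allclose(cdf_p1(x, lam, beta), cdf_p2(x, alpha, beta))
```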

Maximum Likelihood Analysis
This section is devoted to the study of the MLE of the parameters and the MLP of future records. We begin with the MLE of the parameters λ, β, related to P_1 and, as commented before, we consider two cases: β known and β unknown. Throughout this section λ is assumed unknown.

Remark 2.
Notice that if β is known, some inferences for λ can be reduced to the exponential distribution since, if X is Weibull distributed, then X^β is exponentially distributed. However, this is not so for δ-records because, for β ≠ 1, the β-th power of δ-records sampled from X is not distributed as the δ-records sampled from X^β. The reason is that δ-record extraction and the power transform of the data are not symmetrical actions, in the sense that they do not commute. Therefore, the consistency result of Theorem 1, for β ≠ 1, does not follow from the corresponding result for the exponential model (β = 1).
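The non-commutativity can be seen with a two-point numerical example (values chosen by us for illustration): with δ = −0.5 and β = 2, the second observation is a δ-record of the original sequence, but its square is not a δ-record of the squared sequence.

```python
def delta_record_flags(x, delta):
    """Flag delta-records: x[0] by convention, then x[n] > max-so-far + delta."""
    flags, m = [True], x[0]
    for v in x[1:]:
        flags.append(v > m + delta)
        m = max(m, v)
    return flags

x = [1.2, 0.8]                                   # illustrative values
flags_x = delta_record_flags(x, -0.5)            # 0.8 > 1.2 - 0.5: delta-record
flags_x2 = delta_record_flags([v ** 2 for v in x], -0.5)
# [True, True] vs [True, False]: 0.64 is not above 1.44 - 0.5
```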
The existence of the MLE of λ, β is established in Proposition 3. For ease of notation, we omit their dependence on t. The set of solutions of a maximization problem is denoted argmax.

Remark 3.
Note that λ̂(β) depends on all δ-records (records and near-records) in the sample t. If, by chance, no near-records are observed (s_i = 0, for i = 1, …, n), the estimator still involves all the record values r_1, …, r_n, in contrast with the case δ = 0, where it depends only on the last record r_n.
The numerical computation of the MLE is straightforward. If β is known, the explicit formula for the MLE of λ is given in (7). If β is unknown, the numerical maximization of h(β) over [L, U] must be carried out to find β̂, as explained in the proof of Proposition 3. Then λ̂(β̂) and β̂ are the MLE of λ and β, respectively.
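As an illustration of this two-step computation, the following sketch implements the analogous procedure in the record-only case (δ = 0), where the profile estimate λ̂(β) = r_n n^{−1/β} is available in closed form from the classical record likelihood; the function name and the search bounds for β are our choices, not the paper's [L, U]:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def weibull_record_mle(r):
    """MLE of (lam, beta) from record values only (delta = 0), using the
    profile lam_hat(beta) = r_n * n**(-1/beta) in the record likelihood."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    sum_log = np.log(r).sum()

    def neg_profile(beta):
        lam = r[-1] * n ** (-1.0 / beta)
        # record log-likelihood with lam_hat(beta) plugged in
        return -(n * np.log(beta) - n * beta * np.log(lam)
                 + (beta - 1.0) * sum_log - n)

    res = minimize_scalar(neg_profile, bounds=(1e-3, 50.0), method="bounded")
    beta_hat = res.x
    return r[-1] * n ** (-1.0 / beta_hat), beta_hat
```

With δ < 0 the profile is no longer this simple, but the same one-dimensional search over β applies.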

Strong Consistency
We state below the strong consistency of λ̂(β), the MLE of λ when β is known. The proof of this result is split into several technical lemmas, presented in Section 6.1. Almost sure convergence as n → ∞ is denoted →a.s.

Remark 4.
If β is unknown, the question of the consistency of λ̂ remains open. Nevertheless, we have run Monte Carlo simulations which suggest that consistency also holds in this case. In the left panel of Figure 1, values of the EMSE of λ̂ are plotted versus n. It can be seen that λ̂ appears to be consistent for unknown β (in the mean square sense). We also observe that convergence in the case of known β is much faster (lower curves). Moreover, in both situations δ-records yield a smaller EMSE than records. In the right panel of Figure 1 we can see a steep descent of the EMSE of β̂, suggesting consistency of this estimator as well.

Maximum Likelihood Prediction of Future Records
The MLP of future records, as defined in Reference [30], consists in maximizing the so-called predictive likelihood function. The following definitions follow that idea, adapted to the sample t of δ-records.

Definition 5.
Let the predictive likelihood of R_m and θ be defined by L_P(z, t | θ) = f_{R_m}(z | t, θ) L(t | θ). In the case of the Weibull distribution, using parametrization P_1, an explicit expression for L_{P_1}(z, t | λ, β) is obtained, for z ≥ r_n.

Definition 6.
The MLP of R_m, m > n, is defined by R̂_m = ẑ, where (ẑ, θ̂) ∈ argmax_{z,θ} L_P(z, t | θ). In the case of the Weibull distribution, parametrization P_1 again yields an explicit expression.

Remark 5.
The estimators λ̂, β̂ in Definition 6 are the so-called predictive maximum likelihood estimators of λ, β, according to Reference [7]. Their properties are not investigated in this paper.
The existence of MLP of future records in the Weibull distribution is established in Proposition 4 and the corresponding proof is presented in Section 6.2.

Remark 6.
According to Proposition 4, if β is known, there is an explicit formula for the MLP of R_m, equal to z̃(β) in (9). On the other hand, if β is unknown, z̃(β) and λ̃(β) are plugged into L_{P_1} and a maximization problem in one real variable must be solved numerically. This is straightforward since, as shown in Section 6.2, there is a compact interval containing a solution β̃. Finally, it suffices to replace β by β̃ in (9) to find the MLP.

Simulation Study
To assess the behavior of the MLE and MLP, we carry out Monte Carlo simulations. For several values of λ, β and for δ = 0, −0.5, we generate 10^4 samples of n = 5 records and their near-records.
The MLE of the Weibull parameters using δ-records were first studied in Section 4.1.1 of Reference [10]. A comparison of the accuracy of estimators, for n = 10, λ = 1, β = 2 and δ = 0, −0.5 can be found in that paper. Throughout this paper we work in a different scenario, namely n = 5, with several values of λ, β and δ = 0, −0.5.
The estimated mean square errors (EMSE) of the MLE of λ, β are computed as averages of the squared deviations of the MLE from the true values of the parameters. Results in Table 1 show that, for known β, the EMSE of λ̂(β) is much lower for δ < 0 than for δ = 0 (records only). For unknown β, the observed improvement is greater for β̂ than for λ̂. Regarding bias, it is known that the MLE of λ and β from record data are biased; see Reference [23]. In the case of δ-records, our simulations show a small positive bias in the estimates of both parameters.

Table 1. Estimated mean square errors (EMSE) of the maximum likelihood estimators (MLE) λ̂, β̂.

For the MLP of future records we proceed as above in terms of the number of simulated samples, the number n of records and the values of λ, β and δ. The record to be predicted is R_7 (that is, m = n + 2), which is the first interesting case for m > 5, since R̂_6 = r_5, as commented in Remark 6. For each run we simulate R_7 and compute the EMSE as the average of the squared deviations of R̂_7 from the simulated value of R_7. Results show that predictions based on records or on δ-records tend to underestimate R_7. For example, given λ = 0.25 and β = 1, the mean of R_7 is 1.75; for δ = 0, the mean of R̂_7 is 1.39 while, for δ = −0.5, the mean of R̂_7 is 1.47. The values of the EMSE are displayed in Table 2. Although, in absolute terms, the advantage of δ-records appears to be greater in estimation than in prediction, it should be borne in mind that there exists a positive lower bound on the EMSE of any predictor of R_m based on past information, even if the parameters were known. This is because the optimal mean square predictor, based on past information, is the conditional expectation with respect to the past. Indeed, let F_n be the σ-algebra generated by {X_k ; k ≤ L(n + 1) − 1} and let R*_m be a predictor based on the available information before record R_{n+1} (that is, F_n-measurable); then

E[(R*_m − R_m)²] ≥ E[Var(R_m | F_n)].

In particular, for the Weibull model with β = 1, R_n is distributed as the sum of n iid exponential random variables with mean λ, hence

E[(R*_m − R_m)²] ≥ (m − n)λ².  (11)

For example, if m = 7, n = 5 and λ = 0.5, the bound in (11) is equal to 0.5. Therefore, when comparing the EMSE 0.884 (δ = 0) versus 0.793 (δ = −0.5), in the penultimate line of Table 2, the bound should be taken into account. A fair estimate of the gain is obtained by subtracting 0.5 from each quantity and computing (0.384 − 0.293)/0.384, which yields a 23.7% reduction, instead of just 10.3% without such correction. An overall conclusion from Tables 1 and 2 is that estimators and predictors based on δ-records outperform those based on records only.
This is coherent with the fact that there is more information in δ-records than in records and also with results in Reference [15], where this property was also observed in the case of the geometric distribution.
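The bound-corrected comparison of EMSE values can be packaged as a small helper (ours; it uses the bound (m − n)λ², valid for β = 1, which gives 0.5 in the example above):

```python
def corrected_gain(emse_rec, emse_drec, m, n, lam):
    """Relative EMSE reduction after subtracting the prediction lower
    bound (m - n) * lam**2 (Weibull model with beta = 1)."""
    bound = (m - n) * lam ** 2
    return ((emse_rec - bound) - (emse_drec - bound)) / (emse_rec - bound)

# Figures from Table 2 (m = 7, n = 5, lam = 0.5, bound = 0.5):
gain = corrected_gain(0.884, 0.793, m=7, n=5, lam=0.5)  # about 0.237
```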

Real Data
We compute estimations and predictions for the rainfall data mentioned in the introduction. The data consist of cumulative rainfall (measured in millimeters) from September to November, recorded at the Castellote weather station in Spain, between 1927 and 2000. The complete sample of 74 values is well fitted by a Weibull distribution, as can be seen in Table 3 and Figure 1 of Reference [10]. We have also run the Ljung–Box test to check for autocorrelation, yielding a p-value of 0.621. There are n = 5 records in the sample and, for δ = −25, −50 and −75, the number of δ-records is 6, 9 and 18, respectively. For completeness we show the δ-record data in Table 3. Since in real-data applications, unlike in simulations, we do not know the values of the parameters, it is not clear how to assess the impact of δ-records. This is also the case in predicting records, because no new record was observed between the years 2000 and 2018. So, in order to measure the improvement of estimations and predictions due to δ-records, we compare our results with those obtained from the complete sample of 74 values, taken as benchmarks. Note that the complete sample can be seen as the set of δ-records with δ = −∞.
The MLE of λ and β were reported in Reference [10], showing that δ-records give estimates closer to the benchmark than the estimates from records only. For completeness we present those results here, in the second and third columns of Table 4. Maximum likelihood prediction was not considered in Reference [10]. The results in Table 4 show that the prediction with δ-records is closer to the estimation using the complete sample, although the gain is not as clear as in the case of estimation. Recall, however, that the predicted values of future records using the whole sample are not the real values (since they have not been observed yet). Conclusions on the advantage of using δ-records over records must be drawn from the Monte Carlo simulations in Section 3.4, rather than from this particular dataset. The reader interested in applications to real data can also see Section 4.2.2 of Reference [10], where the theory is adapted to lower records and lower δ-records. This is readily done by taking advantage of the symmetry between the definitions of upper and lower records.

Bayesian Analysis
We develop the estimation of parameters and the prediction of future records in the Bayesian framework. We use parametrization P_2 because it is frequently found in the literature on Bayesian analysis of the Weibull model; see Section 14.2 of Reference [27]. As in Section 3, we analyze the cases of β known and unknown, while in this section α is assumed unknown. In all expressions below, α, β are positive. Let the Gamma(µ, p) density, with parameters µ, p > 0, be denoted

γ(x | µ, p) = µ^p x^{p−1} e^{−µx} / Γ(p), x > 0.

If β is known, we assume that α has prior π(α) = γ(α | a, b), which is easily checked to be conjugate. In the case of β unknown, there seems to be no tractable conjugate family for α, β and, among several alternatives found in the literature, we decided to follow Kundu [31] and Soland [32]. In Kundu's approach, α and β have independent gamma distributions; in Soland's approach, β is discrete, taking values in a finite set, and α conditional on β is gamma distributed. Other options, not considered here, are found in References [33,34], where α, as well as β conditional on α, have gamma distributions. Definitions of Kundu's and Soland's priors are given below.
If β is unknown, using Kundu's prior we obtain the posteriors (12), with common normalizing constant K_1 (numerically computed), given by (13). For Soland's prior we have, from (5) and Definition 7, the posteriors (14), with common normalizing constant K_2 given by (15).

Estimation of Hyperparameters in Soland's Prior
A practical challenge with Soland's prior is the choice of the hyperparameters a_i, b_i. We propose to estimate them using the empirical-Bayes-type method described below, inspired by Reference [4].
For j = 1, …, n and i = 1, …, k, consider the expectations

E(F̄_{α,β_i}(r_j)) = (1 + r_j^{β_i}/a_i)^{−b_i}  (16)

and

E(F̄_{α,β}(R_j) | α, β) = 2^{−j}.  (17)

Note that the value 2^{−j} in (17) follows from F̄_{α,β}(R_j) being distributed as e^{−∑_{l=1}^{j} ξ_l}, where ξ_1, …, ξ_j are iid exponential with parameter one, so E(F̄_{α,β}(R_j) | α, β) = E(e^{−∑_{l=1}^{j} ξ_l}) = 2^{−j}; see Reference [2]. Note also that (16) depends on r_j, the actual j-th record value of the sample, while (17) depends on the random variable R_j. Then a_i, b_i can be estimated by choosing, if possible (as described in Lemma 1 below), two suitable records r_j, r_l, j < l, and solving for a_i, b_i in the equations

(1 + r_j^{β_i}/a_i)^{−b_i} = 2^{−j},  (1 + r_l^{β_i}/a_i)^{−b_i} = 2^{−l}.  (18)

Lemma 1. If there exist records r_j, r_l, j < l, such that l r_j^{β_i} < j r_l^{β_i}, then the system (18) has a solution a_i, b_i > 0.

Proof. Let x = 1/a_i and y = b_i; then (18) is equivalent to

y = j log 2 / log(1 + x r_j^{β_i}) = l log 2 / log(1 + x r_l^{β_i}).  (19)

Solving for y in (19) and equating, we have g_j(x) := (1 + x r_j^{β_i})^{1/j} = (1 + x r_l^{β_i})^{1/l} =: g_l(x). Furthermore, observe that g_j(0) = g_l(0) = 1 and that g_j(x)/g_l(x) → ∞ as x → ∞. Hence, if the derivatives satisfy g_j′(0) < g_l′(0) or, equivalently, if (r_j/r_l)^{β_i} < j/l, there exists x* > 0 such that g_j(x*) = g_l(x*). Finally, x is replaced by x* in either expression of (19) to solve for y.
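The proof of Lemma 1 is constructive and translates directly into a root-finding routine. A sketch (our code, with the rate–shape reading of Gamma(a_i, b_i)): solve g_j(x) = g_l(x) by Brent's method, then recover a_i = 1/x* and b_i = log 2 / log g_j(x*).

```python
import numpy as np
from scipy.optimize import brentq

def solve_hyperparameters(r_j, j, r_l, l, beta_i):
    """Solve the moment equations for (a_i, b_i), following the
    construction in the proof of Lemma 1."""
    if l * r_j ** beta_i >= j * r_l ** beta_i:
        raise ValueError("condition l * r_j**beta_i < j * r_l**beta_i fails")

    def h(x):  # g_j(x) - g_l(x)
        return ((1.0 + x * r_j ** beta_i) ** (1.0 / j)
                - (1.0 + x * r_l ** beta_i) ** (1.0 / l))

    x_star = brentq(h, 1e-9, 1e6)   # g_j < g_l near 0, g_j > g_l for large x
    a_i = 1.0 / x_star
    b_i = np.log(2.0) / np.log((1.0 + x_star * r_j ** beta_i) ** (1.0 / j))
    return a_i, b_i
```

For instance, r_j = 1, j = 1, r_l = 2, l = 2 and β_i = 2 give x* = 2, hence a_i = 0.5 and b_i = log 2 / log 3.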

Bayes Estimators of Parameters α, β
Bayes estimators are defined under quadratic loss. In the case of known β, the estimator of α is denoted α̂_B(β) and follows at once from π(α | t) = γ(α | a + G(β), b + N). So,

α̂_B(β) = E(α | t) = (b + N)/(a + G(β)).  (20)

Credible intervals for α are readily obtained from π(α | t) as well. Additionally, the following consistency result for α̂_B(β), which follows from Theorem 1, may be of interest. See Reference [35] for a discussion of Bayesian consistency.
In the case of Soland's prior, the Bayes estimators α̂_B, β̂_B are also readily computed from (14), with the normalizing constant K_2 defined in (15). It should be noted that, in simulations and in the analysis of real data using Soland's prior, we prefer to estimate β by β̂_MP ∈ argmax_{1≤i≤k} π_2(β_i | t), in order to stay within the set of possible β values.

Prediction of Future Records
The Bayesian prediction of future records is based either on f_{R_m}(z | β, t), if β is known, or on f_{R_m}(z | t), if β is unknown. In all densities below, the parameters are positive and z ≥ r_n. From (3), using parametrization P_2, we obtain the conditional density f_{R_m}(z | α, β, t). First, if β is known, we use the posterior π(α | t) to compute f_{R_m}(z | β, t) = ∫ f_{R_m}(z | α, β, t) π(α | t) dα. Then the Bayes predictor of R_m, given by R̂_m(β) = E(R_m | β, t) = ∫_{r_n}^{∞} z f_{R_m}(z | β, t) dz, is numerically computed. If β is unknown, we use π_j(α, β | t) (j = 1 for Kundu, j = 2 for Soland) to compute f_{R_m}(z | t), and the Bayes predictor is given by R̂_m = E(R_m | t) = ∫_{r_n}^{∞} z f_{R_m}(z | t) dz. Bayesian prediction intervals are also readily obtained.
In the case of Kundu's prior, an explicit expression for f_{R_m}(z | t) follows from (12) and (24); in the case of Soland's prior, the corresponding expression follows from (14) and (24).
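When closed-form integration is inconvenient, R̂_m(β) can be approximated by Monte Carlo: draw α from its posterior and then R_m given α through gamma increments, as in Section 3. A sketch under the assumption of a gamma posterior for α (the posterior rate and shape arguments are placeholders, in the rate–shape order used above):

```python
import numpy as np

def bayes_predict_record(r_n, m_minus_n, beta, post_rate, post_shape,
                         n_draws=100_000, seed=0):
    """Monte Carlo approximation of E(R_m | beta, t), assuming a
    Gamma(post_rate, post_shape) posterior for alpha."""
    rng = np.random.default_rng(seed)
    alpha = rng.gamma(shape=post_shape, scale=1.0 / post_rate, size=n_draws)
    incr = rng.gamma(shape=m_minus_n, scale=1.0, size=n_draws)
    # alpha * R_m**beta = alpha * r_n**beta + Gamma(m - n, 1)
    r_m = ((alpha * r_n ** beta + incr) / alpha) ** (1.0 / beta)
    return r_m.mean()
```

Replacing the mean by quantiles of the simulated r_m yields approximate prediction intervals.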

Simulation Study
We assess here the performance of estimators, credible intervals and predictors. To that end, we simulate samples of δ-records, with n = 5 records. For each sample we compute estimators or predictors for records only (δ = 0) and for δ-records (δ = −0.5).

Known β
In Table 5 we show results for the Bayes estimator α̂_B(β), defined in (20), and 95% HPD intervals for α. For several values of a, b and β, we simulate 10^4 independent observations of α from the Gamma(a, b) distribution. Then, for each α, we simulate a random sample of 5 records and their associated near-records and compute α̂_B(β). The EMSE is computed as the average of the squared differences (α̂_B(β) − α)² over the 10^4 simulation runs.

Table 5. EMSE of the Bayes estimator α̂_B(β); length (LHPD) and coverage (%CHPD) of the highest posterior density (HPD) interval of parameter α, with known β.

We report in Table 5 the mean coverage and average length of the 10^4 HPD intervals. Regarding coverage, since we sample α from its prior distribution, approximately 95% of the intervals (both for records and δ-records) contain the value of the parameter used in the simulation. Since this happens for all the HPD intervals we construct in this section, we do not include the coverage in the remaining tables. In all cases we observe that α̂_B(β) and the HPD intervals based on δ-records compare quite favorably with their counterparts based only on records (δ = 0), in terms of smaller EMSE and interval length.
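For a unimodal posterior such as the gamma, the HPD interval can be computed by minimizing the interval length at fixed posterior mass. A sketch (ours; rate–shape convention as above):

```python
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

def hpd_gamma(shape, rate, mass=0.95):
    """Shortest interval of posterior mass `mass` for a gamma posterior:
    minimize ppf(p + mass) - ppf(p) over the lower-tail probability p."""
    dist = gamma(a=shape, scale=1.0 / rate)
    res = minimize_scalar(lambda p: dist.ppf(p + mass) - dist.ppf(p),
                          bounds=(1e-9, 1.0 - mass - 1e-9), method="bounded")
    return dist.ppf(res.x), dist.ppf(res.x + mass)
```

By construction the interval is never longer than the equal-tailed credible interval with the same mass.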
Additionally, we analyze the frequentist coverage for particular values of the parameter. To do so, we take β = 1.5 and fix a value of α in a grid from 0.2 to 2; we then simulate 200 samples of records (and δ-records), compute the corresponding HPD intervals for α using a Gamma(4, 4) prior and check whether they contain the value of α. We repeat this for each value of α. Figure 2 (left) shows the coverage for α ∈ [0.2, 2]. We observe that δ-records provide intervals with frequentist coverage closer to 95% than records. The right plot in Figure 2 shows the average length of the intervals as a function of α.

In the assessment of Bayes predictors of future records (known β), we consider different gamma priors for α and simulate 10^4 values of α. For each simulated α we generate a sample of δ-records and the values of R_6 and R_7. Then we compute the EMSE of R̂_6 and R̂_7 as the averages of the squared deviations (R̂_6 − R_6)² and (R̂_7 − R_7)² over the 10^4 simulation runs. We also compute the average length of the estimated HPD intervals. The coverage of the intervals, defined as the proportion of runs where the simulated record lies in the interval, is included as well. As in the case of estimation, since we sample α from the prior distribution, approximately 95% of the intervals contain the corresponding record, so we do not include the coverage in the remaining tables for prediction.
Results are displayed in Table 6, where it is apparent that predictors are more accurate with δ-records. While Table 5 shows a significant improvement in the estimation of α with the use of δ-records, this improvement is less visible when forecasting future records. Nevertheless, as in the case of the MLP, a fair comparison between the EMSE of predictions should take into account the lower bounds commented there. For instance, if β = 1 and α is fixed, the bound is (m − n)/α². Therefore, when α has a prior Gamma(a, b), with b > 2, the lower bound can be computed as

(m − n) E(α^{−2}) = (m − n) a² / ((b − 1)(b − 2)).

In the particular case a = b = 4, m = 7, n = 5, the bound is 16/3. Then the gain from the use of δ-records, once the lower bound is subtracted, is (6.292 − 6.095)/(6.292 − 16/3) = 20.5%. That is, while the absolute gain of 0.2 may not seem relevant relative to 6.292, it is so when the lower bound on the EMSE is taken into account.

Table 6. EMSE of the Bayes predictors R̂_6, R̂_7; length (LHPD) and coverage (%CHPD) of HPD intervals for R_6, R_7, with known β.

Unknown α, β
We begin with Kundu's prior π_1, described in Definition 7. For fixed a, b, c, d we simulate 10^3 pairs (α, β) from π_1(α, β) and, for each (α, β), we generate a sample of 5 records and their near-records. Once the sample is observed, we numerically compute K_1 in (13) to obtain an approximation of π_1(α, β | t). We then compute the Bayes estimators α̂_B, β̂_B in (21) and the HPD intervals from π_1(α, β | t). The EMSE are obtained as averages of the squared deviations (α̂_B − α)² and (β̂_B − β)² over the 10^3 simulation runs. As in the case of known β, we observe in Table 7 that δ-records have a positive effect, both on the accuracy of the estimates and on the length of the HPD intervals.

Table 7. EMSE of the Bayes estimators α̂_B, β̂_B and length (LHPD) of HPD intervals, using Kundu's prior.

For Soland's prior π_2 we choose the values β_k ∈ {0.5, 0.75, 1, 1.25, 1.5} for β, with different prior probabilities, and two different gamma distributions for α given β = β_k. The HPD intervals for β are not computed since β is discrete and takes only five different values. Results are shown in Table 8.

Simulation results for the Bayes predictors of future records (α, β unknown), using Kundu's and Soland's priors, are presented in Tables 9 and 10, respectively. As before, we proceed by first simulating the parameter values from the prior distributions and then the sample of δ-records and the values of future records. There is, however, a practical difficulty when computing the EMSE of predictors, since a few huge records, which do appear in the simulations, completely dominate the EMSE, which is just an average of squared deviations. Suppose, for example, that we use Kundu's prior and that the values α = 0.5, β = 0.1 have been obtained from π_1(α, β). Then the corresponding Weibull distribution has expectation 3.7 × 10^9, the value of R_m will likely be very large and so a huge value of (R̂_m − R_m)² will be observed.
These "outliers" make the EMSE, computed over all simulation runs, a useless measure of performance. To avoid this problem, we compute the 5% trimmed mean of (R̂_m − R_m)² over the simulations. That is, we eliminate the 2.5% smallest and largest values of (R̂_m − R_m)² and average the remaining ones. The results in Tables 9 and 10 show that δ-records have a positive impact on the prediction of future records, as observed in the case of known β.

Table 9. Trimmed EMSE (TEMSE) of Bayes predictors and lengths (LHPD) of HPD intervals for R_6, R_7, using Kundu's prior π_1.
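The 5% trimmed EMSE can be computed directly with scipy's trim_mean, which drops the requested proportion from each tail before averaging:

```python
import numpy as np
from scipy.stats import trim_mean

def trimmed_emse(predicted, simulated, proportion=0.025):
    """5% trimmed mean of squared prediction errors: drops the 2.5%
    smallest and 2.5% largest values of (R_hat - R)**2 before averaging."""
    sq_err = (np.asarray(predicted) - np.asarray(simulated)) ** 2
    return trim_mean(sq_err, proportiontocut=proportion)
```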

Real Data
Given that we do not have actual prior knowledge of the parameters, we decided to consider values around 2 for β and around 1/2 for α, with an illustrative meaning only. In Kundu's prior π_1 we take a = 2, b = 1, c = 1, d = 2. For Soland's prior π_2 we consider β uniformly distributed on {1.5, 1.75, 2, 2.25, 2.5} and, in order to determine the values of (a_i, b_i), we apply the method described before Lemma 1, when possible. Recall that, in order to apply the method to β_i, there must exist a pair of records r_j < r_l such that l r_j^{β_i} < j r_l^{β_i}. For β_1 = 1.5 and β_2 = 1.75 no such pair of records exists while, for β_3 = 2, β_4 = 2.25 and β_5 = 2.5, the method can be applied and yields the following hyperparameter pairs (after rounding up to the nearest integer): (40, 20), (18, 8) and (13, 5) for β_3, β_4 and β_5, respectively. For β_1 and β_2, where the method fails, we take the values of the closest β_i, that is, (40, 20). For numerical convenience, we analyze the rainfall data in decimeters instead of millimeters, so the values of δ and r_i are divided by 100.
The results are shown in Tables 11 and 12. In both tables we observe that the estimates of the parameters using δ-records, with δ = −0.75, outperform those based on records only, because they are closer to results using the complete sample. As in maximum likelihood prediction, the gain in prediction of future records using δ-records versus records is not clear; while there is some improvement using Soland's prior, this is not the case for Kundu's prior. This can be due to the surprising closeness of Kundu's prediction using records to the prediction using the whole sample, which we believe happens by chance in this particular instance.

Final Comments
Practitioners interested in statistical inference for the Weibull model, based on record data, should consider the possibility of using δ-records. We have presented in this paper the mathematical bases of the methodology, together with some theoretical results, such as consistency. We have also established the existence of estimators and predictors and discussed their numerical implementation, so that researchers interested in applications can readily test the method on their own data. As commented above, we emphasize that δ-records can be collected, in many cases, by slightly modifying the experimental setup for records, so that the additional cost of the extra data is kept low. The conclusion to be drawn from the simulations and the application to the rainfall data is that δ-records improve inferences in the Weibull model. The impact is more pronounced for the MLE than in the Bayesian framework, possibly because of a strong influence of the priors. We believe that further investigation would be welcome for fine-tuning this novel methodology. For example, it would be useful to have guidelines for the choice of δ and to explore the possibility of letting δ vary during data collection. Also, in the theoretical analysis of the model, it would be of interest to extend the consistency of Theorem 1 to the case of unknown shape parameter and to study asymptotic distributions; see Remark 4. These and other related topics will be considered in forthcoming papers.

Consistency of the MLE λ̂(β)
The proof of Theorem 1 follows from Lemmas 2-7, which concern the asymptotic behavior of δ-records and may be of independent interest. Strong convergence, as the appropriate index (usually n) tends to infinity, is denoted by →a.s.; inequalities between random variables are understood in the almost sure sense. Recall that R_m denotes the m-th record, while S_m and (Y_{m,1}, . . . , Y_{m,S_m}) denote, respectively, the number and the vector of near-records associated with R_m.
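To make this data structure concrete, here is a minimal sketch (the function name and return format are ours, not the paper's) of how records and their associated near-records are read off a raw sequence when δ < 0:

```python
def delta_record_data(xs, delta):
    """Split a sequence into records R_1, R_2, ... and, for each record,
    the list of near-records observed before the next record arrives.
    A near-record is an observation X_n with M_{n-1} + delta < X_n <= M_{n-1},
    where M_{n-1} is the running maximum (here delta < 0)."""
    records, near = [], []
    m = float("-inf")
    for x in xs:
        if x > m:                 # record (X_1 is a record by convention)
            records.append(x)
            near.append([])       # start the near-record list for this record
            m = x
        elif x > m + delta:       # near-record associated with current record
            near[-1].append(x)
    return records, near
```

For instance, `delta_record_data([1.0, 0.5, 0.95, 2.0, 1.6], -0.5)` returns `([1.0, 2.0], [[0.95], [1.6]])`: the observation 0.95 is a near-record of the first record and 1.6 of the second.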
Proof. It suffices to show that S_n/φ_{n,k} →a.s. ∞ for every k ≥ 1, where φ_{n,k} = (n − 1) · · · (n − k) is the k-th falling factorial of n − 1. To that end we apply the Borel-Cantelli lemma to show that P(S_n < φ_{n,k} M i.o.) = 0 for every integer M > 0 (i.o. stands for infinitely often). Observe that S_n, conditional on R_n, is geometrically distributed (starting at 0) with parameter p_n = e^{−α(R_n^β − ((R_n+δ)_+)^β)}, where α = λ^{−β}; that is, P(S_n = k | R_n) = (1 − p_n)^k p_n, k ≥ 0; see Reference [10]. From (26) we have p_n ≤ e^{−α|δ|β((R_n+δ)_+)^{β−1}}, and so P(S_n < φ_{n,k}M | R_n) ≤ φ_{n,k} M p_n. Now, since R_n^β has the Gamma(α, n) distribution (see Reference [2]), taking expectations and recalling that the k-th factorial moment of a Poisson random variable with parameter µ is µ^k, we obtain a summable bound for P(S_n < φ_{n,k}M), and the conclusion follows.
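As a side illustration (not part of the proof), the geometric law of S_n given R_n can be checked by simulation. The sketch below assumes the Weibull survival function e^{−αx^β} with α = λ^{−β}; the parameter values and the function name are arbitrary choices of ours:

```python
import math
import random

def count_near_records(r, delta, lam, beta, rng):
    """Given the current record value r, draw Weibull(lam, beta) observations
    until one exceeds r; return how many fell in (r + delta, r] on the way."""
    s = 0
    while True:
        x = rng.weibullvariate(lam, beta)   # scale lam, shape beta
        if x > r:
            return s
        if x > r + delta:
            s += 1

rng = random.Random(1)
lam, beta, delta, r = 1.0, 2.0, -0.3, 1.2
alpha = lam ** (-beta)
# Conditional record probability: p = S(r) / S(r + delta)
p = math.exp(-alpha * (r ** beta - max(r + delta, 0.0) ** beta))
samples = [count_near_records(r, delta, lam, beta, rng) for _ in range(20000)]
mean_s = sum(samples) / len(samples)
# A Geometric(p) variable starting at 0 has mean (1 - p)/p
print(mean_s, (1 - p) / p)   # the two values should nearly agree
```

The empirical mean of the near-record counts matches the geometric mean (1 − p)/p, here about 0.88.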
Proof. From Proposition 1 in Reference [10] we know that, conditional on R_n and S_n, with S_n > 0, the random variables Y_{n,1}, . . . , Y_{n,S_n} are iid with a common density function. Hence, letting F be the σ-algebra generated by the sequences (R_n) and (S_n), we obtain their conditional means; centering accordingly yields variables Z_{n,j} with E(Z_{n,j} | F) = 0, and we let V_n = ∑_{j=1}^{S_n} Z_{n,j}, n ≥ 1. Then, by Chebyshev's inequality, P(|V_n| > ε S_n | F) ≤ σ_n²/(ε² S_n) for all ε > 0, where σ_n² is the conditional variance of the Z_{n,j}, which is bounded since |Z_{n,j}| ≤ |δ|. Furthermore, since the sequence (R_n^β) is distributed as the ordered points of a homogeneous Poisson process with rate λ^{−β} (see Reference [2]), the strong law of large numbers implies R_n^β/n →a.s. λ^β; in particular R_n →a.s. ∞ and, since β > 1, we get K_n →a.s. 0. The conclusion now follows from the above convergence results and the identity U_n/S_n = V_n/S_n + λ^β − K_n.
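The Poisson-process representation invoked above lends itself to a quick numerical check; this is a minimal sketch with arbitrary parameter values, not part of the proof:

```python
import random

rng = random.Random(7)
lam, beta, n = 2.0, 1.5, 5000
alpha = lam ** (-beta)
# R_n^beta is distributed as the n-th point of a homogeneous Poisson
# process of rate alpha = lam**(-beta), i.e. a sum of n iid Exp(alpha).
r_n_beta = sum(rng.expovariate(alpha) for _ in range(n))
print(r_n_beta / n)   # close to lam**beta = 1/alpha (about 2.83 here)
```

By the strong law of large numbers the printed ratio approaches λ^β as n grows, which is the convergence R_n^β/n →a.s. λ^β used in the proof.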
Proof. As stated in Lemma 2, conditionally on (R_n), the random variables S_i are independent and geometrically distributed with parameter p_i = e^{−α(R_i^β − ((R_i+δ)_+)^β)}. Let G be the σ-algebra generated by (R_n) and S̄_i = S_i − E(S_i | G). Furthermore, observe that p_i ≥ e^{δα} and E(S̄_i^4 | G) ≤ 19 p_i^{−4}. So, the conditional fourth moment of S̄_i is bounded above by 19e^{−4αδ} and the following (conditional) strong law of large numbers holds: P(n^{−1} ∑_{i=1}^n S̄_i → 0 | G) = 1. Then, taking expectation and observing that (1 − p_n)/p_n = E(S_n | G) = e^{α(R_n^β − ((R_n+δ)_+)^β)} − 1 →a.s. ∞, the conclusion follows.
Proof. Let F, Z_{i,j}, V_i and α be defined as in the proof of Lemma 3. Note that, conditionally on F, the Z_{i,j} are independent with mean 0, and also that |Z_{i,j}| ≤ |δ|. It follows that E(V_i² | F) ≤ M S_i and E(V_i^4 | F) ≤ M S_i², for some non-random constant M > 0. Consequently, since the V_i are also conditionally independent and centered, we obtain a fourth-moment bound for their partial sums. Then, using (26) and (29) together with Chebyshev's inequality, the required convergence follows from Lemma 4 and R_n − (R_n + δ)_+ →a.s. |δ|.
Proof. Let N_n and D_n be, respectively, the number of records and of near-records among the first n observations; then it is clear that S_1 + · · · + S_{N_n−1} ≤ D_n ≤ S_1 + · · · + S_{N_n}. From Proposition 5.1 of Reference [28] we have D_n/log n →a.s. 0 and, noting that N_{L(n)} = n, where L(n) denotes the n-th record time, the conclusion follows, using also L(n) →a.s. ∞ and the well-known result N_n/log n →a.s. 1.
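The classical growth rate N_n/log n → 1 can likewise be illustrated by simulation (a sketch with arbitrary sample size and seed; for a continuous distribution the record count is distribution-free, so uniform draws suffice):

```python
import math
import random

rng = random.Random(3)

def record_count(n):
    """Number of upper records among n iid Uniform(0,1) observations."""
    c, m = 0, float("-inf")
    for _ in range(n):
        x = rng.random()
        if x > m:
            c, m = c + 1, x
    return c

n, reps = 100_000, 50
ratios = [record_count(n) / math.log(n) for _ in range(reps)]
# Average over replications; E[N_n] = H_n ~ log n + gamma, so the mean
# ratio is about (log n + 0.577)/log n, i.e. roughly 1.05 at this n.
print(sum(ratios) / reps)
```

Averaging over replications is needed because N_n itself has standard deviation of order (log n)^{1/2}, so single-run ratios are noisy.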

Proof of Proposition 4.
(i) For z > r_n, let l_1(z, λ, β) = log L_{P_1}(z, t | λ, β). Solving ∂l_1/∂λ = 0 for λ, we obtain the stationary point λ̂(β); substituting it back, the problem max_{z > r_n} l̄_1(z) reduces, via the change of variable x = z^β − r_n^β, to max_{x > 0} l̄_1(x). We now study the sign of the derivative ∂l̄_1/∂x, which is equal to the sign of the polynomial q(x) := ax² + bx + c, with coefficients defined in (10). We observe, since a < 0 and ac ≤ 0, that q is concave on R and has real roots, which we denote by ν ≤ ρ. On the other hand, if m > n + 1, then c > 0 and ac < 0, which clearly implies ρ > 0. We summarize the above in terms of (31), denoting by z̃(β) an element of argmax_z l̄_1(z): • If m = n + 1, then ρ = 0, ∂l̄_1/∂x is negative on (0, +∞) and so l̄_1(x) has no maximum on (0, +∞). In this case we do, however, define z̃(β) = r_n, that is, z̃(β)^β = ρ + r_n^β.
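The root-selection step above can be sketched in code. The coefficients a, b, c are those defined in Equation (10), which is not reproduced in this section, so the function below (with placeholder arguments) is only an illustration of how the maximizer would be computed from them:

```python
import math

def z_tilde_beta(a, b, c, r_n_beta):
    """Sketch of the argmax selection in Proposition 4(i).

    q(x) = a*x**2 + b*x + c with a < 0 and a*c <= 0, so q is concave with
    real roots; rho is the larger root.  The maximizer satisfies
    z~(beta)**beta = rho + r_n**beta; when rho = 0 (case m = n + 1) this
    falls back to the boundary value z~ = r_n."""
    assert a < 0 and a * c <= 0
    disc = b * b - 4 * a * c                 # >= b**2 >= 0 since a*c <= 0
    rho = (-b - math.sqrt(disc)) / (2 * a)   # larger root, because 2*a < 0
    return max(rho, 0.0) + r_n_beta
```

For example, `z_tilde_beta(-1.0, 0.0, 4.0, 1.0)` gives 3.0 (roots ±2, so ρ = 2); z̃ itself is then recovered as (ρ + r_n^β)^{1/β}.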
Author Contributions: All the authors contributed significantly to the present paper. All authors have read and agreed to the published version of the manuscript.