Analyzing the Gaver—Lewis Pareto Process under an Extremal Perspective

Marta Ferreira 1,2,3,∗ and Helena Ferreira 4 1 Centro de Matemática da Universidade do Minho, Campus de Gualtar 4710-057 Braga, Portugal 2 Centro de Matemática Computacional e Estocástica, Departamento de Matemática-Instituto Superior Técnico Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal 3 Centro de Estatística e Aplicações, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal 4 Universidade da Beira Interior, Centro de Matemática e Aplicações (CMA-UBI), Avenida Marquês d’Avila e Bolama, Covilhã 6200-001, Portugal; helena.ferreira@ubi.pt * Correspondence: msferreira@math.uminho.pt


Introduction
Increased exposure to catastrophic losses and the complexity of financial instruments require sophisticated risk assessment tools in areas such as (re) insurance, banking, finance, among others.Extreme value theory plays an important methodological role in risk management by providing appropriate instruments to deal with values as large as or even higher than those ever observed.These techniques include heavy-tailed models and measures to evaluate tail dependence, namely to infer to what extent the occurrence of a risk value in some variable influences an analogous occurrence in another variable.
Linear ARMA (autoregressive moving average) with heavy-tailed noise may be suitable to model time series presenting peaks of observations.However, in place of a summation, a maximum operator is more propitious to derive extremal properties.Max-autoregressive and moving maximum models were developed within this spirit, such as MARMA (max-autoregressive moving average) introduced in Davis and Resnick (1989) and M4 (multivariate maxima of moving maxima) processes presented in Smith and Weissman (1996).The Pareto model, which is closed under geometric multiplication and minimization, also motivated the first-order Pareto processes presented in Arnold (2001).Further analysis may be found in Ferreira (2016) and references therein.
A random variable (rv) is modeled by Pareto(σ, α) if it has distribution function (df) This model is a particular case of Pareto-type tail models, the class of regular varying tail distributions, that is, where L(x) is a slowly varying function (i.e., L(x) is a real function of positive real values satisfying L(tx)/L(x) → 1, as x → ∞, ∀t > 0).Parameter 1/α is usually denoted as the tail index of the Pareto rv.Consider {X n } n≥1 a Gaver-Lewis Pareto (GLP) process, presented in Arnold (2001).More precisely, for each positive integer n, we have where { n } n≥1 is an independent and identically distributed (iid) sequence with common df Pareto(σ, α) given in Equation ( 1) and independent of the Bernoulli(p) iid sequence {U n } n≥1 with 0 < p < 1, and i independent of X j ∀i > j.If X 0 ∼Pareto(σ, α), then {X n } n≥1 is stationary also with marginal df Pareto(σ, α).The GLP process corresponds to the exponentiated version of the Gaver-Lewis process introduced in Gaver and Lewis (1980), and hence its name.Simulated samples from the GLP process with marginals Pareto(1, 1) and p = 0.25, 0.5, 0.75 are plotted in Figure 1.This is a model within the heavy-tailed class where mean values (of different orders) may not exist.For instance, in this case, the mean exists only if α > 1 and the variance/covariance exists whenever α > 2. In Arnold (2001), the autocorrelation was derived as for α > 2.Moreover, in heavy-tailed models, the extremal observations are important, and a dependence analysis based on central measures like the most common autocorrelation may be misleading if the dependence in the tails presents a different structure from the remaining.
Here, we focus on the extremal behavior of the GLP process, namely the tail dependence structure.Despite being a heavy-tailed model, it is practically unknown in modeling extreme values.We shall see that it has interesting properties, such as asymptotic tail independence, i.e., the probability that one observation exceeds an increasing large value given that the previous one has already exceeded it, approaches zero.The rate of the convergence, usually denoted coefficient of asymptotic tail independence η (Ledford and Tawn (1996); Wadsworth and Tawn (2012)) captures a residual tail dependence, revealing a kind of "penultimate" clustering, i.e., an aggregation of not so high values.This is a not so fortuitous behavior in real applications and can be observed in the well-known Gaussian processes (see Bortot and Tawn (1998), Ramos and Ledford (2013), and references therein).In practice, ignoring this phenomena will result in misleading inference (see, e.g., Poon et al. (2003)).However, not all series associated with environmental, social or economic phenomena are susceptible to Gaussian modeling, especially when they have heavy tails.The most common extremal models MARMA and M4, as well as, the Yeh-Arnold-Robertson Pareto (III) and the Lawrence-Lewis Pareto processes (see, respectively, Ferreira (2012) and Ferreira (2016) and references therein) are not tail independent and new processes have been appearing (Heffernan et al. (2007); Ferreira and Canto e Castro (2010); Ferreira and Ferreira (2014) and Ferreira and Ferreira (2015) ).The GLP is an additional contribution within this class.Coefficient η will also be extended to observations that are lag-m apart, providing an alternative to the autocorrelation function (acf).
This paper is organized as follows.The tail dependence measures and conditions to be analyzed are detailed in Section 2 and applied to the GLP process in Section 3. The tail characterization provides us with methods to estimate the autoregressive parameter p, which will be compared through simulation in Section 4.An illustration with a real dataset is addressed in Section 5.

Tail Dependence
(5) (Leadbetter et al. (1983)).The extremal index is a measure for the clustering propensity, being interpreted as the arithmetic inverse of the mean number of exceedances of an increasing threshold per independent cluster.The null case is often ignored, corresponding to a degenerate limiting distribution for the maximum.The value θ = 1 is associated to iid sequences but not only these.Below, we shall see that it is a form of asymptotic independence of extremes.Some dependence conditions allow us to derive the extremal index through the joint distribution of s consecutive terms of {X n } n≥1 .
The long-range condition D(u n ) of Leadbetter (1974) states that α n,l n → 0, as n → ∞, for some sequence l n = o(n), where Observe that D(u n ) is a milder condition than the usual mixing, such as strong-mixing.Under condition D(u n ), we say that the local dependence condition D (s) (u n ) of Chernick et al. (1991) holds for {X n } n≥1 , if for some {k n } n≥1 satisfying Equation ( 7), we have If D (s) (u n ) holds, the extremal index exists and can be computed through (Chernick et al. (1991)) Observe that, under D This corresponds to condition D (u n ) of Leadbetter et al. (1983) whenever s = 1, which locally restricts the occurrence of clusters of exceedances and thus leads to θ = 1.Condition D (u n ) of Leadbetter and Nandagopalan (1989) is obtained with s = 2 and locally restricts upcrossing clustering.
Observe that, under D (u n ), we can write and thus state meaning that consecutive observations are tail dependent.On the other hand, under a unit extremal index, we have and thus consecutive observations are asymptotically tail independent.This feature has been observed in some real data and theoretical examples (Ledford and Tawn (1996); Bortot and Tawn (1998); Ramos and Ledford (2013)).Ledford and Tawn (1996) introduced the asymptotic tail independence coefficient, η, in order to measure the rate of convergence of P(X 2 > F −1 (1 − t)|X 1 > F −1 (1 − t)) towards 0, where F −1 is the quantile function, capturing a kind of pre-asymptotic dependence.More precisely, the asymptotic tail independence coefficient, η ∈ [0, 1], exists whenever it holds Thus, under Equation ( 9), we can state, Observe that, if η = 1 and L(u n ) → 0, we have θ < 1 and thus an effect of clustering of high values.Under a unit extremal index, the coefficient η < 1 measures the rate of convergence of θ towards 1, capturing a kind of pre-asymptotic clustering, despite a resembling of the process to an iid sequence at increasingly high thresholds.
Analogous with the acf, we extend the coefficient η and state the tail dependence within random pairs that are lag-m apart, (X i , X i+m ), i ≥ 1, through the coefficient η m , i.e., where η 1 ≡ η.
The tail dependent class has been greatly enhanced within the methodology of extreme values.However, this approach results in the overestimation of extremal dependence if the series is actually asymptotically independent.An illustration with financial data may be seen in Poon et al. (2003).The most recent literature has been addressing this issue, namely, with the introduction of new models comprising asymptotic tail independence (Heffernan et al. (2007); Ferreira and Canto e Castro (2010); Ferreira andFerreira (2014, 2015).In the next section, we will show that the GLP belongs to this latter class of models.

The Tail Dependence of the Gaver-Lewis Process
In the following, and without loss of generality, we will take σ = 1."Mixing"conditions roughly state that two rvs become increasingly independent as they get more apart in time.One of its forms is the β-mixing condition, defined by with F (.) denoting the σ−field generated by the indicated random variables (Bradley (2005)).
In the following, consider notation Observe that First, we show that GLP is regenerative, that is, it has a regeneration set, i.e., a recurrent set R such that, for some m ∈ N, a distribution ϕ and κ ∈ (0, 1), we have for all Borel set B over R. If, for any regeneration set R and any Borel set B over R, we have for some m ∈ N and κ 1 , κ 2 ∈ (0, 1), then the process is said to be aperiodic (Asmussen (1987)).Consider R =]1, r[ (and thus recurrent since it is in the state space ]1, ∞[ of the process) and B a Borel set over R. Let x ∈ R, S = [r, r 1/(1−p) ] and V ∼Pareto(r 1−p , α).For all x ∈ R, we have and thus regeneration holds by considering m = 1, ϕ(B) = P(V ∈ B|V ∈ S) and κ = P(V ∈ S)p.Observe that S is also regenerative since it is recurrent (S ⊂]1, ∞[) and, ∀z ∈ S, z 1−p ≤ r < y, for any y ∈ S, and thus Q(z, B) ≥ κ ϕ(B), with ϕ(B) and κ as above.Now, we have Therefore, the aperiodicity condition in Equation ( 10) is satisfied if we take κ 1 = κpP(V ∈ S) and κ 2 = κ.
Note that condition D(u n ) given in Equation ( 6) is weaker than β-mixing and thus holds for GLP by the previous result.
Proposition 2. The GLP process satisfies condition D (u n ) for sequences {k n } n≥1 satisfying Equation ( 7) and such that (2 − p) n/k n /n p → 0, as n → ∞.
This result reveals that high observations of the GLP process behave similar to an iid scenario.However, there is a weak dependence that may be evaluated through the Ledford and Tawn coefficient η in Equation ( 9).Moreover, we will see that it relates with parameter p of the process.
Proof.Consider a t = F −1 X (1 − t) and take t ↓ 0. Observe that The fluctuation probability in the GLP process, given by is a simple measure that will be useful in the following.
Corollary 2. The GLP process verifies the following equalities: This result states a characterizing feature that can be helpful in model specification.Moreover, in order to satisfy 0 < p < 1, we must have 0 < f < 1/2 and 1/2 < η < 1.
Another interesting property for model identification is based on the lag-m coefficient η m , analogous with the acf for linear models.The plots in Figure 2 exhibit a power decay as the acf of AR(1) processes.Observe also that the smaller the value of p, the higher we must choose the lag-m in order to have "almost" independent observations, i.e., η m ≈ 1/2.
Proof.Consider a t = F −1 X (1 − t).The product of powered Pareto rvs is still Pareto-type tail distributed (see, e.g., Arnold (2001)) and thus, applying Equation ( 2) and the theorem of the dominated convergence, we have successively, as t ↓ 0, where L(1/t) and L * (1/t) are slowly varying functions.

Estimation
Relations (i) and (ii) stated in Corollary 2 will provide us with estimators for the autoregressive parameter p.More precisely, from (i), we have with f corresponding to the empirical counterpart of f , From the iid property of the generating process ε t (and with X 0 as specified for stationarity), we have ergodicity and thus consistence of the proposed estimators.In addition, f corresponds to the mean of Bernoulli trials with Markov dependence.From Klotz (1973), we have that √ n( f − f ) converges in distribution to a centered Gaussian model, and thus √ n( p (F) − p) by the Delta Method, as n → ∞.For more details, see Ferreira (2012).
From (ii), the estimation of p is based on the estimation of η through as long as η (H) > 1/2.Observe that η corresponds to the tail index of The most common method developed in literature is the Hill estimator (Hill (1975))-thus the superscript "H".More precisely, we have where T n−k+1:n , . . ., T n:n are the k larger order statistics of T that exceed u.It is usual to consider u ∈ [T n−k:n , T n−k+1:n [ and plot η (H) as a function of k.In Figure 3, we can see the Hill trajectories of η for the respective GLP models considered in Figure 1.The paths are quite stable around the true value of η for a large range of values of k.Indeed, variable T corresponds to the minimum of unit Pareto rvs, where the Hill estimator behaves particularly well.Consistency and asymptotic normality of the Hill estimator η (H) can be seen in Draisma et al. (2004).Expecting to observe time series data behaving exactly as the GLP functional Equation ( 3) is not realistic.At best, we might observe perturbed versions of the GLP process, for instance "noisy" processes of the form X (δ) n = X n + δZ n , n ≥ 1, where {Z n } n≥1 is an iid sequence of standard Gaussian rvs and δ > 0. Thus, the simulations cover the GLP and "noisy" GLP sample paths for δ = 0, 0.1, 1.We consider 1000 replicas of sizes n = 100, 1000, 5000 for p = 0.25, 0.5, 0.75, and marginals Pareto(1, 1).The computed estimates of the root mean squared error (rmse) and absolute bias (abias) are reported in Table 1, where the estimator p (H) is based on thresholds u corresponding to the sample minimum (q 0 ), the median (q 50 ) and the percentile 80 (q 80 ).We also register the number of fails resulting, respectively, from f ≥ 1/2 and η (H) ≤ 1/2.Not surprisingly, they are more associated to small sample sizes, where the case p = 0.75 seems particularly sensitive.Indeed, the results tend to be slightly worse under large p = 0.75, where the process approximates to independence.In practice, the difficulty in deciding between tail dependence (η = 1) and asymptotic independence (η < 1) is well known.For a survey on this topic, see, e.g., Poon et al. (2003) and Beirlant et al. (2004).The results get better as the sample sizes increase.We observe that estimator p (F) is the best for the GLP process but not so robust for "noisy" GLP.In what concerns estimator p (H) , it seems to present an overall better performance under u = q 0 .
The estimation of the tail index parameter α may be conducted through the Hill estimator α (H)  (Hill (1975)), which is consistent and asymptotically normal under strong mixing conditions (see Rootzn et al. (1990)).
Table 1.Simulation results of the root mean squared error (RMSE) and of the absolute bias (abias).The last three columns correspond to the number of fails (nf) in each case.0.0447 0.0447 0.0548 0.0037 0.0026 0.0367 0 0 0

Application
Insurance loss data is typically well modeled by heavy-tailed processes.We consider the daily closing values for the Danish fire losses registered from January 1980 to December 1990, plotted in Figure 4 (left).Observe the high values that appear suddenly, similar to the GLP simulated sample paths (see Figure 1), as well as the close linearity of the Pareto quantile-quantile plot (right panel of Figure 4).In Figure 5 (left), the almost plane region of Hill's sample path led us to the estimate α (H) ≈ 1.4.Thus, we cannot assume the existence of the acf and should avoid an analysis based on this tool.We conduct the estimate of the GLP parameter p through p (F) and p (H) .More precisely, based on Equation ( 14), we obtain p (F) = 0.9945 and, using Equation (15), we derive p (H) q 0 = 0.9610, p (H) q 50 = 0.9715 and p (H) q 80 = 0.9996, where the quantiles q 0 , q 50 and q 80 were considered according to the simulation study (see also the sample path estimates of Hill in the right panel of Figure 5).Formula (12) of the lag-m coefficient η m of Proposition 4 is a similar tool to the role of the acf in identifying linear models.Table 2 presents the estimates of η m , for m = 1, 2, 3, obtained from estimator η (H) m , which consists of the Hill estimator, respectively applied to lag-m apart random pairs (X i , X i+m ), as well as, estimates of η m ( p (F) ) and η m ( p (H) ) derived from Equation (12) by replacing p, respectively, by p (F) and p (H) .The closeness between the two type of estimates, η The GLP process thus seems to have potential in the modeling of this type of data.More tools regarding goodness-of-fit analysis will be addressed in a future work.

m
and η m ( p(•) ), shows a further contribution in favor of the model.

Table 2 .
Danish fire losses: estimates of the lag-m coefficient η m .