Abstract
In this paper, we provide a central limit theorem for the finite-dimensional marginal distributions of empirical processes whose index set is a family of cluster functionals evaluated on blocks of values of a stationary random field. The applicability of the result depends mainly on the usual Lindeberg condition and on a sequence that summarizes the dependence between the blocks of random field values. As an application, we use this result to show that the proposed iso-extremogram estimator is asymptotically Gaussian.
MSC:
60G60; 60F05; 60G70
1. Introduction
Recent developments in massive data processing invite us to think differently about certain problems in statistics. In particular, it is of interest to construct statistics as functions of data blocks and to study their inference. Moreover, in some applications only a small part of the data is relevant to the estimates, and this relevant part is hidden within a large mass of “raw data”. We refer the reader to Davis and Mikosch [] for examples in extremes and to Long and De Sousa [] for examples in astronomy. This leads us to consider clusters of data deemed “relevant” (or of extremal type, within the framework of extreme value theory), where two relevant values are said to belong to different clusters if they belong to different blocks. Moreover, these relevant values lie in the cores of blocks, where the core of a block B is defined as the smallest sub-block of B that contains all relevant values of B, if any exist.
In this context, we consider functionals that act on these clusters of relevant values, and we develop lemmas that simplify the essential step in establishing a Lindeberg central limit theorem (CLT) for these “cluster functionals” on stationary random fields, inspired by the definitions of Drees and Rootzén [] and the approach of Bardet et al. [] and Gómez-García [].
The mathematical background is as follows. Let , and denote , and , where . Let be a valued stationary random field and let be the corresponding normalized random observations from the random field X, defined by for some measurable functions , such that
where G is a non-degenerate distribution and is the so-called relevance set. Here, denotes the usual indicator function of a subset A and the tendency means that for all . In particular, the convergence (1) is fulfilled if the random vector is regularly varying. For more details on regularly varying vectors, one can refer to Resnick [,].
For each , let be an integer value such that and . We define the blocks (or simply blocks) of by
where . Thus, we have complete blocks , and no more than incomplete ones which we ignore because we consider large enough. Moreover, as usual, denotes the Cartesian product and, by stationarity, we denote as a generic block of .
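The block construction described above can be illustrated with a short sketch. It assumes the field is observed on a rectangular grid and stored as a NumPy array, and that blocks are indexed from zero; the function name `split_into_blocks` and the toy field are hypothetical and only serve the illustration. Incomplete blocks at the boundary are dropped, as in the text.

```python
import numpy as np

def split_into_blocks(field, r):
    """Split a d-dimensional array into complete blocks of shape r.

    Incomplete blocks at the upper boundaries are ignored. Returns a dict
    mapping a 0-based block index (a tuple) to the corresponding sub-array.
    """
    field = np.asarray(field)
    r = tuple(r)
    if len(r) != field.ndim:
        raise ValueError("r must have one entry per dimension of the field")
    # Number of complete blocks along each axis.
    m = tuple(n_i // r_i for n_i, r_i in zip(field.shape, r))
    blocks = {}
    for j in np.ndindex(*m):
        slices = tuple(slice(j_i * r_i, (j_i + 1) * r_i)
                       for j_i, r_i in zip(j, r))
        blocks[j] = field[slices]
    return blocks

# Toy 7 x 5 field split into blocks of shape 3 x 2 (hypothetical sizes).
field = np.arange(7 * 5).reshape(7, 5)
blocks = split_into_blocks(field, r=(3, 2))
```

In this toy case two complete blocks fit along each axis, so the last row and the last column of the field are left out.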
We now formally define the core of a block, a cluster functional, and the empirical process of cluster functionals. These definitions generalize those of Yun [], Segers [] and Drees and Rootzén [] to blocks.
Let be a block. The core of the block y with respect to the relevance set A is defined as
where, for each , and with
Let be a measurable subspace of for some such that and let be the set of valued blocks (or arrays) of size , with . Consider now the set
which is equipped with the σ-field induced by the Borel σ-fields on , for . A cluster functional is a measurable map such that
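As a rough illustration of the core of a block and of a cluster functional, the sketch below follows the verbal description given in the Introduction: the core is the smallest sub-block containing all relevant values. It assumes a scalar-valued block, a hypothetical relevance set A = (1, ∞), and, in the spirit of Drees and Rootzén, that a cluster functional depends on a block only through its core and vanishes on blocks without relevant values. Returning `None` for a block without relevant values is a convention of this sketch, not the paper's.

```python
import numpy as np

def block_core(block, is_relevant):
    """Smallest axis-aligned sub-block containing every relevant value.

    Returns None when the block has no relevant value (a convention chosen
    for this sketch only).
    """
    block = np.asarray(block)
    mask = is_relevant(block)
    if not mask.any():
        return None
    idx = np.argwhere(mask)  # coordinates of the relevant values
    lo, hi = idx.min(axis=0), idx.max(axis=0) + 1
    return block[tuple(slice(a, b) for a, b in zip(lo, hi))]

def exceedance_count(block, is_relevant):
    """Example functional: number of relevant values in the block."""
    if block is None:
        return 0
    return int(is_relevant(np.asarray(block)).sum())

def is_relevant(y):
    # Hypothetical relevance set A = (1, inf).
    return y > 1.0

y = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 5.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 7.0],
              [0.0, 0.0, 0.0, 0.0]])
core = block_core(y, is_relevant)
same_value = exceedance_count(y, is_relevant) == exceedance_count(core, is_relevant)
```

The counting functional takes the same value on the block and on its core, which is the invariance assumed here for a cluster functional.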
Let be a class of cluster functionals and let be the family of blocks of size defined in (2). The empirical process of cluster functionals in , is the process defined by
where and with denoting the relevance set.
Under the Lindeberg condition and the convergence to zero of a sequence that summarizes the dependence between the blocks of values of the random field, we prove that the finite-dimensional marginal distributions (fidis) of the empirical process (4) converge to those of a Gaussian process. The proof essentially follows the “Lindeberg method” for a CLT for stationary time series, as in Bardet et al. [], adapted here to stationary random fields.
Bardet et al. [] gave a Lindeberg CLT for time series, and Gómez-García [] used this approach to obtain a Lindeberg CLT for cluster functionals on time series, whose convergence depends mainly on the Lindeberg condition and on the convergence to zero of a sequence that summarizes the dependence. Moreover, Gómez-García [] simplified this sequence by using the weak-dependence coefficients of Doukhan and Louhichi []. This yielded results that are partially more general than those of Drees and Rootzén [], which are established under mixing. Note that the family of weakly dependent processes of Doukhan and Louhichi [] is more general than the family of mixing processes; see Andrews [].
In the context of random fields, the approach is less straightforward. We must first generalize the results of Bardet et al. [] to the framework of random fields; we could then simplify the dependence term by imposing short-range dependence conditions on the random field X, such as suitable decay rates for the weak-dependence coefficients of Doukhan and Louhichi []. In this work, we concentrate on the first part, and we introduce a measure (and its estimator) that motivates this generalization: the iso-extremogram, which can be viewed as a correlogram for extreme values of space–time processes.
The rest of the paper consists of three complementary sections. In Section 2, we provide useful lemmas in order to establish the CLT for the fidis of the cluster functionals empirical process (4). Then, in Section 3, we introduce the iso-extremogram and we use the CLT of Section 2 in order to show that, under appropriate additional conditions, the iso-extremogram estimator has an asymptotically Gaussian behavior. Section 4 is dedicated to the conclusions and perspectives of this approach.
2. Results
In this section, we provide lemmas that notably simplify the essential step in establishing a CLT for the fidis of the empirical process defined in (4). The proofs rely on the same techniques that Bardet et al. [] used to prove their dependent and independent Lindeberg lemmas, generalized here to random fields.
In order to establish the CLT, firstly, consider the following basic assumption:
- (Bas)
- The vector is such that for each . In addition, denoting , we have and , as .
Secondly, consider the following essential convergence assumptions:
- (Lin)
- , , ;
- (Cov)
- , .
Consider now the random blocks , with defined in (2). For each tuple of cluster functionals and each , we define the following random vector:
Without loss of generality and in order to simplify the writing, we consider in the rest of this section.
Let be a sequence of zero mean independent -valued random variables, independent of the sequence , such that , for all . Denote by the set of bounded functions with bounded and continuous partial derivatives up to order 3 and, for and , define
The following assumption allows us to state the Lindeberg lemmas, under independence and under dependence, in a useful and simplified form.
- (Lin’)
- There exists such that, for any , we have for all and every tuple of cluster functionals .
Moreover, denote
Lemma 1
(Lindeberg under independence). Suppose that the blocks are independent and that the random variables defined in (5) satisfy Assumption (Lin’). Then, for all , we have
Proof.
First, notice that
where
Furthermore, we adopt the convention , if either or .
We now follow part of the proof of Lemma 1 in Bardet et al. [].
Let . From Taylor’s formula, there exist vectors such that
where, for , stands for the value of the symmetric linear form from of at v. Moreover, denote
Thus, for , there exist some suitable vectors such that
by using the Taylor approximation of order 2, and
by using the Taylor approximation of order 3.
Thus, satisfies
where (8) follows from the inequality , with and .
Substituting and for and in the preceding inequality (8) and taking expectations, we obtain a bound for . Indeed, we have
because is independent of and , and because and for all .
On the other hand, using Jensen’s inequality, we derive , and because is a Gaussian random variable with the same covariance as .
Therefore,
and
In addition, for ,
else
The inequalities (9)–(12) allow us to simplify the terms between parentheses in the last inequality in (8). Recall that for all and . Therefore, we obtain
because, for all , .
As a consequence, from Assumption (Lin’), we obtain . This completes the proof of Lemma 1. □
Remark 1.
By taking and suitably using the second inequality of (8) in the proof of Lemma 1, the classical Lindeberg conditions can be used:
where
Moreover, these classical Lindeberg conditions imply the conditions from Lemma 1. Indeed, we have
for and .
The proof of this remark for general independent random vectors is given in (Bardet et al. [], p. 165).
Remark 2.
Observe that Assumptions (Lin) and (Cov) imply that and that , respectively. Therefore, if the blocks are independent and if Assumptions (Lin) and (Cov) hold, then from Lemma 1 and Remark 1, the fidis of the empirical process of cluster functionals converge to the fidis of a Gaussian process with covariance function c.
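The independent case of Remark 2 can be checked numerically in a very simple setting. The sketch below makes several assumptions that are not taken from the text: an i.i.d. standard Pareto field on a two-dimensional grid, the relevance set of exceedances over a fixed threshold u, the exceedance count as the single cluster functional, and the Drees and Rootzén type normalization by the square root of the number of sites times v = P(X > u). All names and parameter values are hypothetical. For i.i.d. data the block counts are binomial, so the variance of the normalized sum equals 1 − v exactly, which the simulation should reproduce approximately.

```python
import numpy as np

def simulate_block_clt(n=(60, 60), r=(6, 6), v=0.05, n_rep=2000, seed=0):
    """Simulate the normalized, centered sum of block exceedance counts.

    Assumes an i.i.d. standard Pareto field, so blocks are independent and
    P(X > u) = 1 / u for u >= 1.
    """
    rng = np.random.default_rng(seed)
    m = (n[0] // r[0], n[1] // r[1])          # complete blocks per axis
    n_used = m[0] * r[0] * m[1] * r[1]        # sites covered by complete blocks
    u = 1.0 / v                               # threshold with P(X > u) = v
    z = np.empty(n_rep)
    for k in range(n_rep):
        x = 1.0 / (1.0 - rng.random(n))       # standard Pareto draws
        exceed = (x > u)[: m[0] * r[0], : m[1] * r[1]]
        # Exceedance count in each block: the cluster functional used here.
        counts = exceed.reshape(m[0], r[0], m[1], r[1]).sum(axis=(1, 3))
        expected = r[0] * r[1] * v            # exact mean of a block count
        z[k] = (counts - expected).sum() / np.sqrt(n_used * v)
    return z

z = simulate_block_clt()
sample_mean, sample_var = z.mean(), z.var()
```

A histogram or normal quantile plot of `z` can then be compared with a centered Gaussian law of variance 1 − v; this is only a sanity check of the normalization under independence, not a verification of the dependent case.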
For the dependent case, we need some additional notation:
Let , for all . We set for any and any . For each , , and , we define
Lemma 2
(Dependent Lindeberg lemma). Suppose that the random variables defined in (5) satisfy Assumption (Lin’). Consider the special case of complex exponential functions with . Then, for each and each tuple of cluster functionals, the following inequality holds:
Proof.
Consider an array of independent random variables satisfying Assumption (Lin’) and such that is independent of and . Moreover, assume that has the same distribution as for .
Then, using the same decomposition (7) in the proof of the previous lemma, one can also write
Then, from the previous lemma, the second term of the right-hand side (RHS) of the inequality (14) is bounded by
For the first term of the RHS of the inequality (14), first notice that, for a valued random vector X independent from , we have
because , where is the covariance matrix of the vector , for . For or , recall that . In this case, we also set .
Thus,
Therefore,
This completes the proof of Lemma 2. □
The previous lemma, together with Remark 1, implies the following theorem.
Theorem 1
(CLT for cluster functionals on random fields). Suppose that the basic Assumption (Bas) holds and that Assumptions (Lin) and (Cov) are satisfied. Then, if for each , converges to zero as , for all and every tuple of cluster functionals, the fidis of the empirical process of cluster functionals converge to the fidis of a Gaussian process with covariance function c defined in (Cov).
Proof.
The assumptions (Lin) and (Cov) imply that, as , and , respectively. Therefore, taking into account Remark 1, we obtain from Lemma 2 that, for each ,
for all , with , because by hypothesis, for all and all .
Notice that
and that , where , with .
Using the triangle inequality, we deduce that
and therefore . The proof of Theorem 1 is complete. □
Remark 3.
The previous theorem can be formulated for as follows. Define , for , with the convention . Moreover, , for , and if i, j or k is zero. Then, if Assumptions (Bas), (Lin), (Cov) are satisfied (for ), and if for each ,
converges to zero as for all and every tuple of cluster functionals, with
the fidis of the empirical process of cluster functionals converge to the fidis of a Gaussian process with covariance function c.
Remark 4.
We have mentioned earlier that means for each . However, the limits of the sequences indexed with , as , could be reformulated in terms of the limits of such sequences as “ along a monotone path on the lattice ”, i.e., along for some strictly increasing continuous functions , with , such that as , for .
Suppose that from each block we extract a sub-block and that the remaining parts of the blocks do not influence the process . In particular, this last statement is fulfilled if
and , where . This assumption would allow us to consider (or ) as a function of the blocks (separated by ) instead of the blocks , and thus to bound them using either the strong mixing coefficient of Rosenblatt [] or the weak-dependence coefficients of Doukhan and Louhichi [] for stationary random fields. Such bounds are developed in Gómez-García [] for the case of weakly dependent time series. However, we do not develop them in the random field context, as this is not the aim of this work. This topic will be addressed in a forthcoming applied statistics paper with numerical simulations.
3. Asymptotic Behavior of the Extremogram for Space–Time Processes
In this section, we propose a measure (in two versions) of serial dependence in space and time of the extreme values of space–time processes. We provide an estimator for this measure and use Theorem 1 to establish an asymptotic result. This work is inspired by the extremogram for time series defined in Davis and Mikosch [].
Let be a valued space–time process, which is stationary in both space and time. We define the extremogram of X for two sets A and B both bounded away from zero by
with , provided that the limit exists.
In estimating the extremogram, the limit on x in (16) is replaced by a high quantile of the process. Defining as the quantile of the stationary distribution of or related quantity, with , as , one can redefine (16) by
with .
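To give a feel for what an extremogram with a quantile threshold measures, the sketch below computes a simple pairwise empirical version in the spirit of Davis and Mikosch. It is not the block-center estimator studied later in this section. It assumes a scalar process observed on a grid with one spatial and one time dimension, takes A = B to be the set of exceedances over an empirical quantile, and uses hypothetical names (`empirical_extremogram`, `q`) throughout.

```python
import numpy as np

def empirical_extremogram(x, s, t, q=0.95):
    """Share of exceedances at (i, k) that are followed by one at (i+s, k+t).

    x is a 2D array indexed by (space, time); s may be negative, t must be
    non-negative. The threshold is the empirical q-quantile of x.
    """
    x = np.asarray(x, dtype=float)
    exceed = x > np.quantile(x, q)
    n_s, n_t = exceed.shape
    if not (0 <= t < n_t and abs(s) < n_s):
        raise ValueError("lag outside the observation window")
    # Align each base site (i, k) with its lagged site (i + s, k + t).
    if s >= 0:
        base, lagged = exceed[: n_s - s, : n_t - t], exceed[s:, t:]
    else:
        base, lagged = exceed[-s:, : n_t - t], exceed[: n_s + s, t:]
    denom = base.sum()
    return np.nan if denom == 0 else (base & lagged).sum() / denom

rng = np.random.default_rng(1)
x = 1.0 / (1.0 - rng.random((200, 200)))   # i.i.d. standard Pareto field
y = np.maximum(x[:-1, :], x[1:, :])        # moving maximum over neighbouring sites

rho_iid = empirical_extremogram(x, 1, 1)   # no extremal dependence
rho_dep = empirical_extremogram(y, 1, 0)   # neighbouring sites share one term
```

For the independent field the estimate stays close to the exceedance probability 1 − q, whereas for the moving-maximum field neighbouring sites share a large value roughly half of the time, so the estimate at spatial lag 1 is much larger.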
The choice of such a sequence of quantiles is not arbitrary. The main condition to guarantee the existence of the limit (17) for any two sets A and B bounded away from zero, is that it must satisfy the following convergence
for all , , , where
is a collection of Radon measures on the Borel σ-field , not all of them being the null measure, with . In this case, we have
provided that .
Remark 5.
The condition (18) is particularly satisfied if the space–time process X is regularly varying. For details and examples of regularly varying space–time processes and time series, see Davis and Mikosch [] and Basrak and Segers [], respectively.
Note that the extremogram (17) is a function of two lags: a spatial-lag and a non-negative time-lag . Because the spatial lag ranges over many values, the estimates of such an extremogram are very difficult to analyze in practice, and their computation is slow. To obtain a simpler interpretation and to simplify the calculations, we assume that the space–time process X satisfies the following “isotropy” condition:
- (I)
- For each pair of non-negative integers and ,
where with and .
Under this condition, the extremogram (17) can be redefined using only two non-negative integer lags: a spatial-lag and a time-lag . Indeed, under Condition (I), we define the iso-extremogram of X for two sets A and B both bounded away from zero by
where is the first element of the canonical basis of .
We now propose an estimator for the iso-extremogram. For this, without loss of generality, consider because the case can be treated in the same way.
Let be the observations from a -valued space–time process X, stationary in both space and time, and which satisfies Condition (I). Let us set . The sample iso-extremogram based on the observations is given by
for , and , where
denotes the “center” of the block , for . Moreover, with and denotes the cardinality of the set E. We recall that and , for .
Defining the cluster functional
for , such that
with (the “center” of the block ), we can rewrite the estimator (20) as
where
We can therefore write (22) in terms of empirical processes of cluster functionals (4) and use the Lindeberg CLT for cluster functionals on random fields (Theorem 1), together with suitable conditions on the joint distributions, to prove the convergence in distribution of the iso-extremogram estimator.
To this end, we first fix some notation: the normalized random variables are defined here by , where and ; and the random blocks as in (2). We define and as the family of cluster functionals defined in (21). Moreover, for the set A, bounded away from zero, let .
Secondly, consider the following conditions:
- (Cov’)
- For each ,
- (C)
- ,
where is the set of the “centers” of the blocks .
Proposition 1
(CLT for the iso-extremogram estimator). Assume that the following conditions hold for the -valued space–time process
1. The process X is stationary in both space and time and satisfies Condition (I).
2. The sequence is such that holds. Moreover, and , where , , and , for .
3. Conditions (Cov’) and (C) hold, and the Lindeberg condition (Lin) is satisfied for the normalized variables together with the family of cluster functionals . Moreover, for each , the coefficient defined in (15) converges to zero as , for every tuple of cluster functionals and all . The same assumption holds for the family , which contains a single functional.
Then, for each ,
where and is the covariance matrix, defined by the coefficients
with .
Proof.
Consider the expression (22) of the iso-extremogram estimator. Then, for , we obtain that
where denotes the empirical process of cluster functionals (4). Furthermore, here and .
Now, notice that Chebyshev’s inequality applied to the random variables R and implies that they converge to zero in probability as . Similarly, applying Chebyshev’s inequality together with the condition , we prove that , as . This last condition () also guarantees that . Again, Chebyshev’s inequality applied to the random variable , followed by Condition (C) and , implies that it converges to zero in probability as . Thus,
From Theorem 1, Condition 3 implies that converges to a centered Gaussian random vector with covariance matrix
for each . Using the same argument, we prove that converges to a centered Gaussian variable with variance .
Finally, considering the existence of in (Cov’), we obtain the desired result. □
4. Conclusions and Perspectives
We have proved Lindeberg lemmas for cluster functionals on stationary random fields. This allowed us to obtain a CLT for the finite-dimensional marginal distributions of the empirical process (4) of cluster functionals of stationary random fields under the classical Lindeberg condition and the convergence to zero of a sequence that summarizes the dependence between the blocks of values of the random field. Moreover, we have introduced a new spatio–temporal measure of serial extremal dependence: the iso-extremogram, a type of correlogram for extreme values of space–time processes. Under precise conditions, we have proved that the iso-extremogram estimator is asymptotically Gaussian.
In all our results, the sequence converges to zero if the random field satisfies short-range dependence conditions, either mixing or weak-dependence conditions. We do not specify such conditions here, because this is not the aim of this paper; they will be presented in a forthcoming applied statistics article that includes numerical simulations. For a general idea of how to simplify the coefficient using weak-dependence coefficients, the reader is referred to Gómez-García [], which deals with the time series framework.
Author Contributions
Conceptualization, J.G.G.-G.; methodology, J.G.G.-G. and C.C.; investigation, J.G.G.-G.; writing—original draft preparation, J.G.G.-G.; writing—review and editing, J.G.G.-G. and C.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
We would like to thank the two referees and an Associate Editor for their constructive comments.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Davis, R.A.; Mikosch, T. Extreme value theory for space-time processes with heavy-tailed distributions. Stoch. Process. Their Appl. 2008, 118, 560–584.
- Long, J.P.; De Sousa, R.S. Wiley StatsRef: Statistics Reference Online. In Statistical Methods in Astronomy; American Cancer Society: Atlanta, GA, USA, 2018; pp. 1–11.
- Drees, H.; Rootzén, H. Limit theorems for empirical processes of cluster functionals. Ann. Stat. 2010, 38, 2145–2186.
- Bardet, J.; Doukhan, P.; Lang, G.; Ragache, N. Dependent Lindeberg Central Limit Theorem and Some Applications. ESAIM Probab. Stat. 2007, 12, 154–172.
- Gómez-García, J. Dependent Lindeberg central limit theorem for the fidis of empirical processes of cluster functionals. Statistics 2018, 52, 955–979.
- Resnick, S. Point processes, regular variation and weak convergence. Adv. Appl. Probab. 1986, 18, 66–138.
- Resnick, S. Extreme Values, Regular Variation, and Point Processes; Springer: Berlin, Germany, 1987.
- Yun, S. The distributions of cluster functionals of extreme events in a dth-order Markov chain. J. Appl. Probab. 2000, 37, 29–44.
- Segers, J. Functionals of clusters of extremes. Adv. Appl. Probab. 2003, 35, 1028–1045.
- Doukhan, P.; Louhichi, S. A new weak dependence condition and applications to moment inequalities. Stoch. Process. Their Appl. 1999, 84, 313–342.
- Andrews, D.K.W. Non strong mixing autoregressive processes. J. Appl. Probab. 1984, 21, 930–934.
- Rosenblatt, M. A central limit theorem and a strong mixing condition. Proc. Natl. Acad. Sci. USA 1956, 42, 43–47.
- Davis, R.A.; Mikosch, T. The extremogram: A correlogram for extreme events. Bernoulli 2009, 15, 977–1009.
- Basrak, B.; Segers, J. Regularly varying multivariate time series. Stoch. Process. Their Appl. 2009, 119, 1055–1080.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).