Article

On Robustness for Spatio-Temporal Data

by
Alfonso García-Pérez
Departamento de Estadística, I.O. y C.N., Universidad Nacional de Educación a Distancia (UNED), Senda del Rey 9, 28040 Madrid, Spain
Mathematics 2022, 10(10), 1785; https://doi.org/10.3390/math10101785
Submission received: 19 April 2022 / Revised: 13 May 2022 / Accepted: 18 May 2022 / Published: 23 May 2022
(This article belongs to the Special Issue Probability Theory and Stochastic Modeling with Applications)

Abstract

The spatio-temporal variogram is a key element of spatio-temporal prediction through kriging, especially in fields such as environmental sustainability or climate change, where spatio-temporal data analysis is based on this concept. However, the traditional spatio-temporal variogram estimator, which is commonly employed for these purposes, is extremely sensitive to outliers. In this paper, we approach this problem in two ways. First, we introduce new robust spatio-temporal variogram estimators, defined as M-estimators of a transformation of the original data. Second, we compare the classical estimate with a robust one, which allows us to identify spatio-temporal outliers. To accomplish this, we use a multivariate scale-contaminated normal model to obtain accurate approximations for the sampling distributions of these new estimators. In addition, we define and study a new class of M-estimators. We also include real-world applications in which we determine whether there are significant differences in the spatio-temporal variogram between two temporal lags and, if there are not, whether the number of lags considered in the spatio-temporal analysis can be reduced.

1. Introduction

There exist several approaches for the treatment of spatio-temporal data. The most common approach is to assume that the data are a partial realization of a spatio-temporal random field $Z(\mathbf{s}, t)$, $(\mathbf{s}, t) \in D \times T$ (see, e.g., [1,2]). In this superpopulation model ([3], p. 8), we also assume that $D$ is a fixed subset of $\mathbb{R}^d$, $d \geq 1$, and $T \subseteq \mathbb{R}$. That is, we assume that a random variable $Z$, such as precipitation, temperature or atmospheric pollutant concentrations, is observed at some known fixed locations $\mathbf{s}$ and at different time moments $t$. We work within a geostatistical framework, in which the spatial observations are expected to be correlated and the correlation decreases as the distance between locations increases.
We can conduct exploratory data analysis with spatio-temporal data, mainly through their visualization. However, it is more informative to model the random field, which allows for inference on the model parameters and for closed-form expressions (see [4]). As the data are usually assumed to come from a joint Gaussian (i.e., normal) distribution, we are interested in estimating its parameters, that is, summaries of the first- and second-order characteristics. To make this feasible, we suppose that $Z(\mathbf{s}, t)$ is intrinsically stationary in space and time. That is, its increments in space and time have zero mean (possibly after a temporal trend has been removed) and a variance that depends only on displacements in space and differences in time. Under these assumptions, the parameter of interest is the spatio-temporal variogram of $Z$, defined as
$$2\gamma_z(\mathbf{h};\tau) = \mathrm{var}\big(Z(\mathbf{s}+\mathbf{h}; t+\tau) - Z(\mathbf{s}; t)\big),$$
where $\mathrm{var}$ denotes the variance, $\mathbf{h}$ is a spatial lag and $\tau$ is a temporal lag.
We also assume that $Z$ is spatially isotropic; that is, the variogram depends on the spatial lag $\mathbf{h}$ only through its Euclidean norm $\|\mathbf{h}\|$.
Furthermore, one of the most important problems in geostatistics is kriging prediction at new locations, for which the spatio-temporal variogram is required. Hence, the spatio-temporal variogram is the crucial parameter in geostatistics. However, the traditional spatio-temporal variogram estimator, which is commonly employed for these purposes, is extremely sensitive to outliers. Moreover, in a wide range of fields, such as geology, the environment, sustainability or climate change, detecting atypical observations is of special interest.
Considering these aims, we first define new robust estimators of the spatio-temporal variogram. Then, we obtain very accurate approximations for the sampling distributions of these new estimators. Finally, we use these approximations to identify spatio-temporal outliers.
The spatio-temporal variogram of Z can also be written as
$$2\gamma_z(\mathbf{h};\tau) = E\big[(Z(\mathbf{s}+\mathbf{h}; t+\tau) - Z(\mathbf{s}; t))^2\big],$$
where $E$ denotes mathematical expectation. The two expressions coincide because the increments have zero mean.
To analyze $Z$, we consider observations of the random field $Z(\mathbf{s}, t)$ at spatial locations $\{\mathbf{s}_i : i = 1, \ldots, m\}$ and times $\{t_j : j = 1, \ldots, T\}$, where $n = m \cdot T$ is the sample size.
In this situation, the spatio-temporal variogram is estimated using the classical method-of-moments estimator, also called the empirical spatio-temporal variogram (see [3,5,6]),
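$$2\hat{\gamma}_z(\mathbf{h};\tau) = \frac{1}{|N_s(\mathbf{h})|}\,\frac{1}{|N_t(\tau)|} \sum_{(\mathbf{s}_i,\mathbf{s}_k) \in N_s(\mathbf{h})}\ \sum_{(t_j,t_l) \in N_t(\tau)} \big(Z(\mathbf{s}_i; t_j) - Z(\mathbf{s}_k; t_l)\big)^2,$$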
where $N_s(\mathbf{h})$ denotes the set of all pairs of spatial locations with spatial lag $\mathbf{h}$, and $N_t(\tau)$ denotes the set of all pairs of time points with time lag $\tau$. Furthermore, $|N(\cdot)|$ denotes the number of elements in the set $N(\cdot)$.
Let $n(\mathbf{h},\tau) = |N_s(\mathbf{h})| \cdot |N_t(\tau)|$ denote the sample size used by the estimator $2\hat{\gamma}_z(\mathbf{h};\tau)$, that is, the number of pairs with spatio-temporal lag $(\mathbf{h},\tau)$. This estimator is then a sample mean of $n(\mathbf{h},\tau)$ terms and is, hence, sensitive to outliers in those terms.
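The following is a minimal sketch of this estimator. It relies on illustrative assumptions of our own rather than the paper's: the $m$ locations are equally spaced on a transect, the times are equally spaced, the data sit in a nested list `z[i][j]` holding $Z(\mathbf{s}_i; t_j)$, and the lags are integer numbers of grid steps (`k` for $\mathbf{h}$, `l` for $\tau$). The function name is hypothetical; the paper's own computations use the R functions provided in its Supplementary Material.

```python
def empirical_st_variogram(z, k, l):
    """Method-of-moments estimate of 2*gamma_z(h; tau) on a regular grid.

    Hypothetical layout: z[i][j] = Z(s_i; t_j), with equally spaced locations
    on a transect and equally spaced times; h = k grid steps, tau = l steps.
    """
    m, n_times = len(z), len(z[0])
    if not (0 <= k < m and 0 <= l < n_times):
        raise ValueError("lags must be smaller than the grid dimensions")
    # Squared increments over all pairs with spatio-temporal lag (h, tau)
    squares = [
        (z[i + k][j + l] - z[i][j]) ** 2
        for i in range(m - k)
        for j in range(n_times - l)
    ]
    # Sample mean of n(h, tau) = (m - k) * (n_times - l) terms
    return sum(squares) / len(squares)


# Toy field Z(s_i; t_j) = i + 2j: every lag-(1, 1) increment equals 3
field = [[i + 2 * j for j in range(5)] for i in range(4)]
print(empirical_st_variogram(field, 1, 1))  # 9.0
```

Because the result is a plain sample mean of squared increments, one gross error in `z` enters it quadratically. This is the sensitivity to outliers noted above.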
In [7], robust estimators of the spatial variogram and accurate approximations for their distributions were obtained. In [8], these results were extended to the multivariate case, with robust estimators for the cross-variogram. In the first part of this paper, we extend these results by introducing a temporal component into the problem. We do this by defining new robust M-estimators of the spatio-temporal variogram and obtaining accurate approximations for their distributions, as well as for that of the classical estimator $2\hat{\gamma}_z(\mathbf{h};\tau)$. In the last part of this paper, we propose a method for identifying spatio-temporal outliers and also obtain some properties of a new class of M-estimators.
The remainder of this paper is organized as follows: A spatio-temporal variogram M-estimator is proposed in Section 2, and an approximation to its distribution is obtained at the end of Section 3.2. The problem of independence of the transformed observations is addressed in Section 4. These results are applied in Section 5 to the empirical spatio-temporal variogram estimator. In Section 6, we introduce Huber’s spatio-temporal variogram estimator and obtain an approximation to its distribution. An example is developed in Section 7. The question of whether some temporal lags can be dropped in the analysis is considered in Section 8. The problem of identifying spatio-temporal outliers is addressed in Section 9, where a new class of M-estimators is defined. The conclusions of the paper are presented in Section 10.

2. M-Estimators of the Spatio-Temporal Variogram

2.1. Underlying Model for Z

The common model assumption for spatio-temporal data Z is a normal distribution. Nevertheless, this is a very strong assumption as, although most of the data will come from this model, it is very likely that some will not. For this reason, it is more realistic to assume a scale-contaminated normal distribution for the model (see, e.g., [9], p. 2):
$$(1-\epsilon)\,N(\mu,\sigma^2) + \epsilon\,N(\mu, g^2\sigma^2),$$
where $\epsilon \in (0,1)$ and $g > 1$. Here, $\epsilon$ represents the proportion of outliers in the sample and $g$ is the factor by which their standard deviation is inflated. For $\epsilon = 0$ or $g = 1$, this model is the normal distribution. If $\epsilon > 0$ and $g > 1$, it behaves as $N(\mu,\sigma^2)$ in the central part but has heavier tails. In this way, we consider that the model for $Z$ belongs to the class of scale-contamination neighborhoods of the normal distribution, $\mathcal{P}_\epsilon(N) = \{F_\epsilon \mid F_\epsilon = (1-\epsilon) N(\mu,\sigma^2) + \epsilon N(\mu, g^2\sigma^2)\}$. This is one of the usual model classes considered in robustness studies ([9], p. 12, [10,11] or [12], p. 870).
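For concreteness, the sketch below draws observations from this scale-contaminated normal model using Python's standard `random` module. It is an illustration only, and the function name is hypothetical. Each observation comes from the inflated component with probability $\epsilon$, so its variance is $\sigma^2(1-\epsilon+\epsilon g^2)$, which gives a simple sanity check.

```python
import random


def sample_scale_contaminated_normal(n, mu, sigma, eps, g, rng=None):
    """Draw n values from (1 - eps) N(mu, sigma^2) + eps N(mu, g^2 sigma^2)."""
    if not (0.0 <= eps < 1.0 and g >= 1.0 and sigma > 0.0):
        raise ValueError("need 0 <= eps < 1, g >= 1 and sigma > 0")
    if rng is None:
        rng = random.Random()
    draws = []
    for _ in range(n):
        # With probability eps the observation is an outlier with inflated scale
        scale = g * sigma if rng.random() < eps else sigma
        draws.append(rng.gauss(mu, scale))
    return draws


xs = sample_scale_contaminated_normal(200_000, 0.0, 1.0, 0.1, 3.0, random.Random(2022))
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(round(var, 2))  # close to 1 - 0.1 + 0.1 * 9 = 1.8
```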
Although the main role in the question of the underlying model is played by the marginal distributions of Z, in order to complete the mathematical framework, we shall assume that these marginal distributions are obtained from the multivariate scale-contaminated normal distribution (see, e.g., [13], pp. 2, 220).

2.2. M-Estimators of the Spatio-Temporal Variogram

Let us consider the transformation
$$X_{ij} = \big(Z(\mathbf{s}_i+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_i; t_j)\big)^2, \quad \forall\, \mathbf{s}_i, t_j. \qquad (1)$$
These new variables will sometimes be shortened to $X_u$, $u = 1, \ldots, n$, and regarded as a sample of a new variable $X = (Z(\mathbf{s}+\mathbf{h}; t+\tau) - Z(\mathbf{s}; t))^2$ defined from the lags of $Z$ in space and time. As the parameter of interest is now $2\gamma_z(\mathbf{h};\tau) = E[X]$, the problem of estimating the spatio-temporal variogram described in the previous section becomes the problem of estimating the expectation of the random variable $X$, obtained from the original $Z$ through this transformation.
This framework is especially suitable and useful in situations related to spatial or temporal data, where the initially dependent observations are separated by a spatial and/or temporal lag and where direct robust estimators, if they exist, are difficult to apply. Considering this mean (the spatio-temporal variogram) as a functional T of the underlying distribution F,
$$T(F) = \int x\, dF(x),$$
where F is the cumulative distribution function of X, and its classical method-of-moments estimator is the sample mean
$$T(F^*_{n(\mathbf{h},\tau)}) = \int x\, dF^*_{n(\mathbf{h},\tau)}(x) = \frac{1}{n(\mathbf{h},\tau)} \sum_{u=1}^{n(\mathbf{h},\tau)} X_u$$
of the transformed variables $X_u$, where $F^*_{n(\mathbf{h},\tau)}$ is the empirical cumulative distribution function. This approach, that is, expressing estimators as functionals of the empirical distribution function, is common and useful in robustness studies ([9,14]).
An important question here is how to choose the transformation (1) so that the new variables $X_u$ averaged in the sample mean are independent. We shall deal with this problem later. If this independence is achieved, robust estimators for the parameter $T(F)$ are easy to obtain with M-estimators and $\alpha$-trimmed means of the transformed variables $X_u$. With respect to the former, we can define a spatio-temporal M-estimator ([11]) $T_n$ for the parameter $T(F)$ (the spatio-temporal variogram), based on the transformed observations $X_u$, as a solution to the equation
$$\sum_{u=1}^{n} \psi(X_u, T_n) = 0, \qquad (2)$$
assuming that $\psi(x,\theta)$ is monotonically decreasing in $\theta$ for all $x$. In fact, as $T_n$ is an estimator for a location problem, $\psi(x,\theta)$ is of the form $\psi(x-\theta)$, with $\psi(v)$ monotonically increasing in $v$. The local robustness of these M-estimators can then be controlled by choosing bounded score functions $\psi$ (see, e.g., [9,15] for background on robust methods and standard M-estimators).
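Because $\psi(x - \theta)$ is monotone in $\theta$, equation (2) can be solved by bisection. The sketch below shows one simple way of doing so and is not the paper's implementation. The function names are hypothetical. The sketch assumes that $\psi$ is nondecreasing with $\psi(0) = 0$, so that the root lies between the smallest and the largest observation.

```python
def huber_psi(v, b=1.345):
    """Huber score psi_b(v) = min(b, max(v, -b))."""
    return min(b, max(v, -b))


def location_m_estimate(xs, psi, iters=200):
    """Solve sum_u psi(x_u - theta) = 0 for theta by bisection.

    Assumes psi is nondecreasing with psi(0) = 0: the left-hand side is then
    nonincreasing in theta, >= 0 at min(xs) and <= 0 at max(xs).
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in xs) > 0:
            lo = mid  # equation still positive: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [1.0, 2.0, 3.0, 10.0]
print(location_m_estimate(data, lambda v: v))  # sample mean, 4.0 up to rounding
print(location_m_estimate(data, huber_psi))    # Huber estimate, about 2.5
```

With the outlying value 10 in `data`, the mean is pulled up to 4.0, whereas the bounded Huber score keeps the estimate at 2.5.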
Hence, the idea that we propose in the paper is as follows. Instead of constructing an unusual estimator for a nonstandard parameter of the distribution of the initial variable $Z$, we transform the original (and usually dependent) observations $Z_u$ into new data $X_u$ (independent under some conditions). In this way, the parameter of interest becomes a natural parameter of the new variable (its mean), for which a manageable estimator (the sample mean) is available, and standard robustification techniques can then be applied. The comparison between the traditional estimator (the empirical spatio-temporal variogram) and one of the robust M-estimators introduced here, both based on the observations $X_u$, is then the well-known comparison between the sample mean and a robust M-estimator (see, e.g., [9,14]).
This idea was first successfully applied in [16] and has also been used in [7,8]. Furthermore, in [17], it was applied to the periodogram ordinates in the context of time series.

2.3. Distribution of Variables X u

An important problem is to determine the distribution of this new variable X, from the original normal (or contaminated normal) distribution of Z, in order to later obtain the distribution of the robust estimators based on X.
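Consider a scale-contaminated normal model for the original observations $Z$. Under the central normal component, the increment $Z(\mathbf{s}_i+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_i; t_j)$ follows a normal distribution with zero mean and variance $2\gamma_z(\mathbf{h};\tau)$. Under the contaminating component, it follows a normal distribution with zero mean and variance $g^2\, 2\gamma_z(\mathbf{h};\tau)$. Hence, for each $\mathbf{s}_i, t_j$, the distribution of the transformed variables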
$$X_{ij} = \big(Z(\mathbf{s}_i+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_i; t_j)\big)^2$$
is the mixture
$$F = (1-\epsilon)\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1 + \epsilon\, g^2\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1 = (1-\epsilon)\,G + \epsilon\,H,$$
where $G = 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$ and $H = g^2\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$, with $\chi^2_1$ denoting a chi-square distribution with one degree of freedom. This follows a development similar to that in [7], Section 2.1.
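For $W \sim N(0,1)$, $c > 0$ and $x > 0$, we have $P(cW^2 \le x) = \operatorname{erf}\big(\sqrt{x/(2c)}\big)$. The mixture $F$ above therefore has a closed-form distribution function. The sketch below evaluates it with the standard library and compares it with simulated squared increments. The function names are hypothetical, and `two_gamma` stands for $2\gamma_z(\mathbf{h};\tau)$.

```python
import math
import random


def scaled_chi2_1_cdf(x, scale):
    """CDF of scale * chi^2_1: P(scale * W**2 <= x) with W ~ N(0, 1)."""
    if x <= 0.0:
        return 0.0
    return math.erf(math.sqrt(x / (2.0 * scale)))


def transformed_mixture_cdf(x, two_gamma, eps, g):
    """CDF of F = (1 - eps) G + eps H for the squared increments X.

    G = two_gamma * chi^2_1 and H = g^2 * two_gamma * chi^2_1.
    """
    return ((1.0 - eps) * scaled_chi2_1_cdf(x, two_gamma)
            + eps * scaled_chi2_1_cdf(x, g * g * two_gamma))


def simulate_increments_squared(n, two_gamma, eps, g, rng):
    """Squared N(0, two_gamma) increments; sd inflated by g with probability eps."""
    sd = math.sqrt(two_gamma)
    return [rng.gauss(0.0, g * sd if rng.random() < eps else sd) ** 2
            for _ in range(n)]


rng = random.Random(11)
xs = simulate_increments_squared(100_000, 1.4, 0.2, 2.0, rng)
empirical = sum(x <= 1.0 for x in xs) / len(xs)
print(empirical, transformed_mixture_cdf(1.0, 1.4, 0.2, 2.0))  # should be close
```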

3. Approximation to the Distribution of M-Estimators of the Spatio-Temporal Variogram

The distribution of these new robust M-estimators $T_n$, defined by (2), depends on the distribution of the new variables $X_u$ obtained after the transformation. We obtain an approximation to the distribution of the robust estimators $T_n(X_1, \ldots, X_n)$ in two steps. In the first step, we consider a von Mises expansion (VOM) of the tail probability functional, which depends on another functional. In the second step, we obtain a saddlepoint approximation (SAD) for that functional. The independence of the $X_u$ is required from now on.

3.1. Von Mises Approximation

If $T_n(X_1, \ldots, X_n)$ is an estimator with associated functional $T$, and $F$ is the underlying model distribution of the observations $X_u$, we usually cannot express $T(F)$ explicitly. However, we can use a linearization based on the von Mises expansion [18] at $G$ (called the pivotal distribution), as follows:
$$T(F) = T(G) + \int IF(x; T, G)\, dF(x) + O(\|F - G\|^2),$$
where $IF(\cdot\,; T, G)$ is the Hampel Influence Function, that is, the Gâteaux derivative of $T$ at $G$ in the direction of $\Delta_x - G$, where $\Delta_x$ is the Dirac measure at $x$ (see [15,19,20]).
If we consider $T$ as the tail probability functional, $T(F) = P_{X_i \sim F}\{T_n > a\}$, the Hampel Influence Function becomes the Tail Area Influence Function, TAIF ([21]), and the previous von Mises expansion becomes
$$P_F\{T_n > a\} = P_G\{T_n > a\} + \int \mathrm{TAIF}(x; a; T_n, G)\, dF(x) + O(\|F - G\|^2),$$
from which we define the von Mises approximation (VOM)
$$P_F\{T_n > a\} \simeq P_G\{T_n > a\} + \int \mathrm{TAIF}(x; a; T_n, G)\, dF(x), \qquad (3)$$
which will be accurate if the distributions $F$ and $G$ are close. In this case, we can use this approximation to compute the distribution of $T_n$ under the underlying model $F$ using a model $G$ in the class $\mathcal{P}_\epsilon(N)$.
In particular, if $F$ is the mixture $F = (1-\epsilon)G + \epsilon H$, the von Mises approximation becomes
$$P_F\{T_n > a\} \simeq P_G\{T_n > a\} + \epsilon \int \mathrm{TAIF}(x; a; T_n, G)\, dH(x), \qquad (4)$$
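because any influence function integrates to zero under the model at which it is computed. Hence, $\int \mathrm{TAIF}(x; a; T_n, G)\, dG(x) = 0$ and
$$\int \mathrm{TAIF}(x; a; T_n, G)\, dF(x) = (1-\epsilon)\int \mathrm{TAIF}(x; a; T_n, G)\, dG(x) + \epsilon\int \mathrm{TAIF}(x; a; T_n, G)\, dH(x) = (1-\epsilon)\cdot 0 + \epsilon\int \mathrm{TAIF}(x; a; T_n, G)\, dH(x).$$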

3.2. Saddlepoint Approximation of the TAIF

The von Mises approximations (3) and (4) depend on the TAIF, which is the influence function of the tail probability functional. Daniels ([22], p. 94), using the Lugannani and Rice formula ([23]), gave the following saddlepoint approximation (SAD) for the tail probability of an M-estimator $T_n(X_1, \ldots, X_n)$ with score function $\psi$, assuming that $G$ is the underlying model for the $X_u$:
$$P_G\{T_n > a\} = 1 - \Phi(s) + \phi(s)\left(\frac{1}{r} - \frac{1}{s}\right) + O(n^{-3/2}), \qquad (5)$$
where Φ and ϕ are the cumulative and density functions of the standard normal distribution, and s and r are the functionals
$$s = \operatorname{sgn}(z_0)\sqrt{-2nK(z_0, a)}, \qquad r_1 = z_0\sqrt{K''(z_0, a)}, \qquad r = \sqrt{n}\, r_1,$$
where
$$K(\lambda, a) = \log \int e^{\lambda\psi(y,a)}\, dG(y)$$
is the cumulant generating function of $\psi(Y, a)$ when $Y \sim G$. The functions $K''(\lambda, a)$ and $K'(\lambda, a)$ are the second and first partial derivatives of $K(\lambda, a)$ with respect to its first argument $\lambda$, respectively. Finally, $z_0$ is the saddlepoint, that is, the functional solution of the saddlepoint equation $K'(z_0, a) = 0$, which is equivalent to
$$\int e^{z_0\psi(y,a)}\,\psi(y,a)\, dG(y) = 0.$$
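The quantities in (5) can be computed numerically for a given score function. The sketch below is an illustration under our own assumptions and is not the author's R code:

- $G$ is taken to be $c\,\chi^2_1$, as in Section 2.3.
- Integrals under $G$ use Simpson's rule after writing $Y = cW^2$ with $W \sim N(0,1)$.
- The saddlepoint is found by bisection on $K'$ within a user-supplied bracket.
- $s$ takes the sign of $z_0$, the usual convention for this formula.

All function names are hypothetical. The usage line checks the sample mean ($\psi(y,a) = y - a$) against the exact $\chi^2$ tail.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def chi2_1_tilted_moments(lam, psi, a, scale, lim=12.0, steps=4000):
    """(M0, M1, M2), Mk = E[exp(lam * psi(Y, a)) * psi(Y, a)**k], Y ~ scale * chi^2_1.

    Writes Y = scale * W**2, W ~ N(0, 1), and applies Simpson's rule in w on
    [-lim, lim]; this sidesteps the y**(-1/2) singularity of the chi^2_1 density.
    """
    h = 2.0 * lim / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps + 1):
        w = -lim + k * h
        wt = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        p = psi(scale * w * w, a)
        e = wt * math.exp(lam * p - 0.5 * w * w)  # exponential tilt times N(0,1) kernel
        m0 += e
        m1 += e * p
        m2 += e * p * p
    c = h / (3.0 * SQRT_2PI)
    return m0 * c, m1 * c, m2 * c


def lugannani_rice_tail(psi, a, scale, n, lam_lo, lam_hi, iters=60):
    """Saddlepoint approximation (5) to P_G{T_n > a}, returned together with z0.

    Assumes [lam_lo, lam_hi] brackets the root z0 of K'(lam, a) and that a is
    not the value T(G) itself (the formula is singular when z0 = 0).
    """
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        # K' = M1 / M0 is increasing in lam (K is convex), so bisect on M1
        if chi2_1_tilted_moments(mid, psi, a, scale)[1] > 0:
            lam_hi = mid
        else:
            lam_lo = mid
    z0 = 0.5 * (lam_lo + lam_hi)
    m0, m1, m2 = chi2_1_tilted_moments(z0, psi, a, scale)
    k0 = math.log(m0)                          # K(z0, a), which is <= 0
    k2 = m2 / m0 - (m1 / m0) ** 2              # K''(z0, a)
    s = math.copysign(math.sqrt(max(-2.0 * n * k0, 0.0)), z0)
    r = math.sqrt(n) * z0 * math.sqrt(k2)      # r = sqrt(n) * r1
    phi_s = math.exp(-0.5 * s * s) / SQRT_2PI
    tail = 0.5 * math.erfc(s / math.sqrt(2.0)) + phi_s * (1.0 / r - 1.0 / s)
    return tail, z0


# Sample mean of n = 10 draws from G = chi^2_1 with a = 2 (closed form z0 = 0.25);
# for psi(y, a) = y - a the bracket must stay below 1 / (2 * scale).
tail, z0 = lugannani_rice_tail(lambda y, a: y - a, 2.0, 1.0, 10, -5.0, 0.499)
print(z0, tail)
```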
In approximation (5), we replace the model $G$ by the contaminated model $G_{\epsilon;x} = (1-\epsilon)G + \epsilon\Delta_x$ in all of the functionals involved and take the derivative at $\epsilon = 0$. This gives a saddlepoint approximation of $\mathrm{TAIF}(x; a; T_n, G)$ (for details, see [24], pp. 402–404, [25], p. 77 or [9], p. 314):
$$\mathrm{TAIF}(x; a; T_n, G) = \frac{\phi(s)}{r_1}\, n^{1/2}\left(\frac{e^{z_0\psi(x,a)}}{\int e^{z_0\psi(y,a)}\, dG(y)} - 1\right) + O(n^{-1/2}). \qquad (6)$$
Substituting the SAD approximation (6) into the VOM approximation (3), we obtain the VOM + SAD approximation for the distribution of an M-estimator $T_n(X_1, \ldots, X_n)$ with score function $\psi$ at the model $F$, which is on the order of $O(n^{-1/2})$:
$$P_F\{T_n > a\} \simeq P_G\{T_n > a\} + \frac{\phi(s)}{r_1}\sqrt{n}\left(\frac{\int e^{z_0\psi(x,a)}\, dF(x)}{\int e^{z_0\psi(y,a)}\, dG(y)} - 1\right). \qquad (7)$$
In the particular case in which the transformed observations $X_u$ follow a mixture model $F = (1-\epsilon)G + \epsilon H$, the VOM + SAD approximation is
$$P_F\{T_n > a\} \simeq P_G\{T_n > a\} + \epsilon\,\frac{\phi(s)}{r_1}\sqrt{n}\left(\frac{\int e^{z_0\psi(x,a)}\, dH(x)}{\int e^{z_0\psi(y,a)}\, dG(y)} - 1\right). \qquad (8)$$
Remark 1.
If the sample size is large and $T_n$ is asymptotically normal under $F$, we can approximate its distribution using the Central Limit Theorem, obtaining
$$P_F\{T_n > a\} = P_F\left\{\frac{T_n - E[T_n]}{\sigma_{T_n}} > \frac{a - E[T_n]}{\sigma_{T_n}}\right\} \simeq 1 - \Phi\left(\frac{a - E[T_n]}{\sigma_{T_n}}\right).$$
Alternatively, if $T_n$ is only asymptotically normal under $G$, we can use this normal approximation for the leading term $P_G\{T_n > a\}$ of (7) and (8).
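For the classical estimator under the mixture $F$ of Section 2.3, the moments needed in Remark 1 are available in closed form. Write $c = 2\gamma_z(\mathbf{h};\tau)$. Since $E[W^4] = 3$ for $W \sim N(0,1)$, we have $E[X] = c(1-\epsilon+\epsilon g^2)$ and $E[X^2] = 3c^2(1-\epsilon+\epsilon g^4)$. The sketch below turns these moments into the normal approximation; it uses a hypothetical function name and assumes independent $X_u$. For small $n(\mathbf{h},\tau)$, this approximation ignores the marked skewness of the $\chi^2$-type distribution of the estimator.

```python
import math


def clt_tail_classical(a, n, two_gamma, eps, g):
    """Normal (CLT) approximation to P_F{2 gamma_hat_z(h; tau) > a}.

    Assumes n independent X_u ~ F = (1 - eps) G + eps H, with
    G = two_gamma * chi^2_1 and H = g^2 * two_gamma * chi^2_1.
    """
    mean = two_gamma * (1.0 - eps + eps * g ** 2)               # E[X]
    second = 3.0 * two_gamma ** 2 * (1.0 - eps + eps * g ** 4)  # E[X^2]
    sd = math.sqrt((second - mean ** 2) / n)                    # sd of the sample mean
    z = (a - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))                  # 1 - Phi(z)


print(clt_tail_classical(2.0, 50, 1.4, 0.01, 1.1))
```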
Remark 2.
Approximations (7) and (8) are valid for any M-estimator with score function $\psi$ based on $X_u$ data, that is, any solution of (2). For spatio-temporal data, these $X_u$ are transformations of the initial observations $Z_i$. Their distributions differ from those of the $Y$'s used in [7] for the estimation of the spatial variogram, and also from those of the variables used in [8] for the estimation of the cross-variogram.
Besides the observations, the score functions also differ. Here, for the spatio-temporal problem, $\psi$ includes the temporal dimension, which was not considered in the other two papers mentioned. The main difference, however, is that in [7] we obtained M-estimators for the spatial variogram, whereas here we obtain them for the spatio-temporal variogram. If the temporal dimension is removed (see Section 8), both estimators agree. Hence, the estimators obtained here generalize those obtained in [7] for the variogram without a temporal dimension, as they should.
This remark is clearly illustrated by the example in Section 7. There, we obtain seven different spatial variogram estimators, one at each of the seven temporal lags considered (see Figure 6 for the classical and Figure 7 for the robust estimators). All of them are obtained from a single classical (Figure 4) or robust (Figure 5) three-dimensional spatio-temporal variogram estimator.

4. Independence of the Transformed Variables X u

As the locations $\mathbf{s}_i$ are fixed in advance, they can be considered as equally spaced on a transect, as in [3], p. 32. Hence, we can match two contiguous $\mathbf{s}_i$ (for which the dependence between the corresponding $Z$ values is expected to be strongest), such that $\mathbf{s}_i + \mathbf{h} = \mathbf{s}_{i+1}$. Under these conditions, the same arguments as in [7], Section 2, apply at each time $t_j$ and time lag $\tau$. They show that the correlation between the increments $Z(\mathbf{s}_i+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_i; t_j)$ and $Z(\mathbf{s}_k+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_k; t_j)$, which define $X_{ij}$ and $X_{kj}$, is 0 if a linear semivariogram model can be accepted for all of the initial $Z_u$ variables.
Moreover, we follow the ideas provided in [8] for the cross-variogram and assume that all time moments are equally spaced. If we can also accept a linear cross-variogram for each pair $(Z_i, Z_k)$ at any pair of time moments, then $X_{ij}$ and $X_{kl}$ will also be independent. These variables are defined from the increments $Z(\mathbf{s}_i+\mathbf{h}; t_j+\tau) - Z(\mathbf{s}_i; t_j)$ and $Z(\mathbf{s}_k+\mathbf{h}; t_l+\tau) - Z(\mathbf{s}_k; t_l)$. It follows that all of the $X_u$, $u = 1, \ldots, n$, are independent, assuming that the vector $\mathbf{Z}$ of the observations follows a multivariate normal (or multivariate scale-contaminated normal) distribution.
Hence, to obtain the independence of the $X_u$, we must check that a linear semivariogram can be accepted for the $Z_u$ and a linear cross-variogram for each pair $(Z_i, Z_k)$.
This can easily be checked visually with R and formally with the global test proposed in [7], Section 10.1. Furthermore, these linearity requirements should not be a serious problem, as we can move the spatial lag $\mathbf{h}$ and/or the time lag $\tau$ until linearized versions ([7], Section 9) of the variograms and cross-variograms can be accepted.

5. VOM + SAD Approximation of the Distribution of the Empirical Spatio-Temporal Estimator

As the classical method-of-moments estimator
$$2\hat{\gamma}_z(\mathbf{h};\tau) = \frac{1}{n(\mathbf{h},\tau)} \sum_{u=1}^{n(\mathbf{h},\tau)} X_u$$
is an M-estimator, namely, the solution of the equation
$$\sum_{u=1}^{n(\mathbf{h},\tau)} \psi(X_u, T_n) = 0$$
with the score function $\psi(v) = v$, we can use the results of Section 3 to obtain a VOM + SAD approximation for its distribution.
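Consider the unrealistic case of no contamination, namely, $Z \sim N(\mu,\sigma^2)$ and, therefore, $X_u \sim 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$. In this case, $n(\mathbf{h},\tau)\, 2\hat{\gamma}_z(\mathbf{h};\tau) / 2\gamma_z(\mathbf{h};\tau)$ follows exactly a $\chi^2$ distribution with $n(\mathbf{h},\tau)$ degrees of freedom, so that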
$$P\{2\hat{\gamma}_z(\mathbf{h};\tau) > a\} = P\left\{\chi^2_{n(\mathbf{h},\tau)} > \frac{a\, n(\mathbf{h},\tau)}{2\gamma_z(\mathbf{h};\tau)}\right\}.$$
Hence, using $G = 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$ as the pivotal distribution, the von Mises approximation (8) becomes
$$P_F\{2\hat{\gamma}_z(\mathbf{h};\tau) > a\} \simeq P\left\{\chi^2_{n(\mathbf{h},\tau)} > \frac{a\, n(\mathbf{h},\tau)}{2\gamma_z(\mathbf{h};\tau)}\right\} + \epsilon\,\frac{\phi(s)}{r_1}\sqrt{n(\mathbf{h},\tau)}\left(\frac{\int e^{z_0\psi(x,a)}\, dH(x)}{\int e^{z_0\psi(y,a)}\, dG(y)} - 1\right), \qquad (9)$$
considering a scale-contaminated normal distribution for the original observations $Z$, i.e., the following model for the $X_u$:
$$F = (1-\epsilon)\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1 + \epsilon\, g^2\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1 = (1-\epsilon)\,G + \epsilon\,H,$$
where $G = 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$ and $H = g^2\, 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$. That is, $G$ is a gamma distribution with shape $1/2$ and rate $1/(4\gamma_z(\mathbf{h};\tau))$, and $H$ is a gamma distribution with shape $1/2$ and rate $1/(4g^2\gamma_z(\mathbf{h};\tau))$.
In (9), the saddlepoint is
$$z_0 = \frac{1}{4\gamma_z(\mathbf{h};\tau)} - \frac{1}{2a},$$
and approximation (9) becomes
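$$P_F\{2\hat{\gamma}_z(\mathbf{h};\tau) > a\} \simeq P\left\{\chi^2_{n(\mathbf{h},\tau)} > \frac{a\, n(\mathbf{h},\tau)}{2\gamma_z(\mathbf{h};\tau)}\right\} + \epsilon\,\frac{\sqrt{n(\mathbf{h},\tau)}\; 2\gamma_z(\mathbf{h};\tau)}{\sqrt{\pi}\,\big(a - 2\gamma_z(\mathbf{h};\tau)\big)}\exp\left\{-\frac{n(\mathbf{h},\tau)}{2}\left(\frac{a}{2\gamma_z(\mathbf{h};\tau)} - 1 - \log\frac{a}{2\gamma_z(\mathbf{h};\tau)}\right)\right\}\left(\sqrt{\frac{2\gamma_z(\mathbf{h};\tau)}{a - a g^2 + 2 g^2\gamma_z(\mathbf{h};\tau)}} - 1\right). \qquad (10)$$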
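The explicit terms in (10) follow from a short derivation, sketched here as a check rather than reproduced from the paper. Take $\psi(y,a) = y - a$ and $G = c\,\chi^2_1$, and use the moment generating function $(1-2c\lambda)^{-1/2}$, $\lambda < 1/(2c)$, of $c\,\chi^2_1$. The shorthands $c = 2\gamma_z(\mathbf{h};\tau)$ and $n = n(\mathbf{h},\tau)$ are used only in this derivation.

$$
\begin{aligned}
K(\lambda,a) &= -\lambda a - \tfrac{1}{2}\log(1 - 2c\lambda), \qquad
K'(\lambda,a) = -a + \frac{c}{1 - 2c\lambda} = 0 \;\Longrightarrow\; z_0 = \frac{1}{2c} - \frac{1}{2a},\\
K(z_0,a) &= \tfrac{1}{2}\Big(1 - \frac{a}{c} + \log\frac{a}{c}\Big), \qquad
K''(z_0,a) = \frac{2c^2}{(1-2cz_0)^2} = 2a^2,\\
s^2 &= -2nK(z_0,a) = n\Big(\frac{a}{c} - 1 - \log\frac{a}{c}\Big), \qquad
r_1 = z_0\sqrt{2}\,a = \frac{a - c}{\sqrt{2}\,c},\\
\frac{\phi(s)}{r_1}\sqrt{n} &= \frac{\sqrt{n}\,c}{\sqrt{\pi}\,(a - c)}
\exp\Big\{-\frac{n}{2}\Big(\frac{a}{c} - 1 - \log\frac{a}{c}\Big)\Big\},\\
\frac{\int e^{z_0(x-a)}\,dH(x)}{\int e^{z_0(y-a)}\,dG(y)} &=
\frac{(1 - 2g^2 c z_0)^{-1/2}}{(1 - 2c z_0)^{-1/2}} = \sqrt{\frac{c}{a - a g^2 + g^2 c}}.
\end{aligned}
$$

The last ratio is finite only if $1 - 2g^2cz_0 > 0$, that is, if $a(g^2-1) < g^2c$.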
This approximation has the same accuracy as the VOM + SAD approximation obtained in [7] for Matheron’s estimator because, in fact, the classical spatio-temporal estimator is a generalization of Matheron’s estimator. For this reason, the lack of robustness of Matheron’s estimator is also inherited in the empirical spatio-temporal estimator.
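Approximation (10) needs only a $\chi^2$ tail and elementary functions. The sketch below is our own illustration, and its names are hypothetical. `chi2_sf` computes the $\chi^2$ upper tail for integer degrees of freedom using only the standard library; a statistics package would normally be used instead. `two_gamma` and `n` stand for $2\gamma_z(\mathbf{h};\tau)$ and $n(\mathbf{h},\tau)$. As written, (10) is singular at $a = 2\gamma_z(\mathbf{h};\tau)$. Its contamination term also requires $a(g^2-1) < g^2\,2\gamma_z(\mathbf{h};\tau)$ for the integral under $H$ to be finite.

```python
import math


def chi2_sf(x, df):
    """P(chi^2_df > x) for a positive integer df, using only the stdlib.

    Builds Q(df/2, x/2) from Q(1/2, u) = erfc(sqrt(u)) or Q(1, u) = exp(-u)
    with the recurrence Q(s + 1, u) = Q(s, u) + u**s * exp(-u) / Gamma(s + 1).
    """
    if x <= 0.0:
        return 1.0
    u = x / 2.0
    s, q = (0.5, math.erfc(math.sqrt(u))) if df % 2 else (1.0, math.exp(-u))
    while s < df / 2.0:
        q += math.exp(s * math.log(u) - u - math.lgamma(s + 1.0))
        s += 1.0
    return q


def vom_sad_classical_tail(a, n, two_gamma, eps, g):
    """VOM + SAD approximation (10) to P_F{2 gamma_hat_z(h; tau) > a}."""
    c = two_gamma
    leading = chi2_sf(a * n / c, n)          # exact tail under the pivotal G
    if eps == 0.0 or g == 1.0:
        return leading                       # the second term of (10) vanishes
    if a == c or a * (g * g - 1.0) >= g * g * c:
        raise ValueError("(10) is undefined for this value of a")
    ratio = a / c
    factor = (math.sqrt(n) * c / (math.sqrt(math.pi) * (a - c))
              * math.exp(-0.5 * n * (ratio - 1.0 - math.log(ratio))))
    mgf_ratio = math.sqrt(c / (a - a * g * g + g * g * c))
    return leading + eps * factor * (mgf_ratio - 1.0)


# Setting of Table 1: n(h, tau) = 3, g = 1.1, 2 gamma = 1.4, eps = 0.01
for a in (2.0, 3.0, 4.0):
    print(a, vom_sad_classical_tail(a, 3, 1.4, 0.01, 1.1))
```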

Accuracy of the Approximation

Note that, if $\epsilon = 0$ or $g = 1$, the second term on the right-hand side of approximation (10) vanishes. Moreover, the accuracy of this approximation can be assessed with a simulation, as explained in the Supplementary Material.
Table 1 shows the quality of approximation (10), assessed with this simulation, for several values of $a$. The setting uses a sample size as small as $n(\mathbf{h},\tau) = 3$, $g = 1.1$ (i.e., a 10% contamination in scale), $2\gamma_z(\mathbf{h};\tau) = 1.4$ and $\epsilon = 0.01$. The exact values were obtained with a simulation of 100,000 samples.
This VOM + SAD approximation is shown in Figure 1 as the dotted line, while the solid line shows the exact distribution.
In Figure 2, we plot the VOM + SAD approximation for different contaminations: $\epsilon = 0.01$, $\epsilon = 0.05$, $\epsilon = 0.1$ and $\epsilon = 0.2$. As the contamination percentage (i.e., the value of $\epsilon$) increases, the p-values and critical values are greatly affected. This graphically shows the lack of robustness of the classical spatio-temporal estimator.
The details of this and other computations, as well as the R functions ([26]) used in the paper, are available on the website https://www2.uned.es/pea-metodos-estadisticos-aplicados/spa-temp-variogram.htm as Supplementary Material (accessed on 18 April 2022).
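A Monte Carlo check of the kind summarized in Table 1 can be sketched as follows. This is not the simulation code of the Supplementary Material; the function name and seed handling are our own. Each replicate draws $n(\mathbf{h},\tau)$ independent squared increments from $F$ and records whether their mean exceeds $a$.

```python
import math
import random


def simulate_classical_tail(a, n, two_gamma, eps, g, n_sim=100_000, seed=None):
    """Monte Carlo estimate of P_F{2 gamma_hat_z(h; tau) > a}.

    Each replicate averages n independent X_u = D_u**2, where D_u is
    N(0, two_gamma) with probability 1 - eps and N(0, g^2 two_gamma) otherwise.
    """
    rng = random.Random(seed)
    sd = math.sqrt(two_gamma)
    hits = 0
    for _ in range(n_sim):
        total = 0.0
        for _ in range(n):
            scale = g * sd if rng.random() < eps else sd
            total += rng.gauss(0.0, scale) ** 2
        hits += total / n > a        # True counts as 1
    return hits / n_sim


# Setting of Table 1: n(h, tau) = 3, g = 1.1, 2 gamma = 1.4, eps = 0.01
print(simulate_classical_tail(3.0, 3, 1.4, 0.01, 1.1, seed=1))
```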

6. Huber’s Spatio-Temporal Variogram Estimator

We define Huber's spatio-temporal variogram estimator $2\hat{\gamma}_H(\mathbf{h};\tau)$ as the M-estimator obtained from Equation (2) using, as the score function $\psi$, the Huber function $\psi_b(u) = \min\{b, \max\{u, -b\}\}$, where $b$ is the tuning constant.
This estimator generalizes the Huber estimator of the spatial variogram defined in [7]. Here, the score function $\psi$ acts on transformed observations that incorporate the time component. For each fixed temporal lag, it therefore yields a spatial variogram estimator at that lag.
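For a regular grid, Huber's estimator can be computed by solving (2) with $\psi(x,\theta) = \psi_b(x - \theta)$ over the transformed data $X_u$. The regular grid is an assumption made here for illustration, with the hypothetical layout `z[i][j]` $= Z(\mathbf{s}_i; t_j)$ and integer lags `k`, `l`. As in the definition above, $b$ is applied on the raw scale of the $X_u$, with no preliminary standardization. The function name is hypothetical.

```python
def huber_st_variogram(z, k, l, b=1.345, iters=200):
    """Huber M-estimate of 2*gamma_z(h; tau) on a regular transect/time grid.

    Hypothetical layout: z[i][j] = Z(s_i; t_j); h = k grid steps, tau = l steps.
    Solves sum_u psi_b(X_u - T) = 0 by bisection on [min X_u, max X_u].
    """
    xs = [(z[i + k][j + l] - z[i][j]) ** 2
          for i in range(len(z) - k) for j in range(len(z[0]) - l)]

    def score(t):
        # Nonincreasing in t, because psi_b is nondecreasing
        return sum(min(b, max(x - t, -b)) for x in xs)

    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Field Z(s_i; t_j) = i + 2j with one gross error at (i, j) = (2, 3)
field = [[i + 2 * j for j in range(6)] for i in range(5)]
field[2][3] = 1000
print(huber_st_variogram(field, 1, 1))  # about 9.15: the two huge X_u are clipped
```

For comparison, the classical estimator on the same 20 transformed values is their mean, $(18 \cdot 9 + 995^2 + 989^2)/20 = 98{,}415.4$.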
In the approximation proposed for the tail probability of Huber’s spatio-temporal variogram estimator, we approximate the leading term using the Lugannani and Rice formula, [23], given in (5), and the second term using the integral of the saddlepoint approximation of the TAIF obtained in Section 3.2, assuming again a scale-contaminated normal model. The VOM + SAD approximation obtained in this way is
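$$P_{X_i \sim F}\{2\hat{\gamma}_H(\mathbf{h};\tau) > a\} \simeq 1 - \Phi(s) + \phi(s)\left(\frac{1}{r} - \frac{1}{s}\right) + \epsilon\,\frac{\phi(s)}{r_1}\sqrt{n(\mathbf{h},\tau)}\left(\frac{\int e^{z_0\psi_b(x-a)}\, dH(x)}{\int e^{z_0\psi_b(y-a)}\, dG(y)} - 1\right),$$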
where the saddlepoint z 0 is obtained from the saddlepoint equation
$$\int e^{z_0\psi_b(y-a)}\,\psi_b(y-a)\, dG(y) = 0.$$
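The Huber approximation can be evaluated numerically. The sketch below is our illustration only and is not the paper's R code. It makes the following choices:

- $G = 2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$ and $H = g^2\,2\gamma_z(\mathbf{h};\tau)\,\chi^2_1$, as in Section 2.3.
- The integrals use Simpson's rule after writing $Y = cW^2$.
- $z_0$ is found by bisection within a bracket we chose, $[-50, 50]$, which is adequate for moderate $b$.
- $s$ takes the sign of $z_0$.

A small Monte Carlo helper uses the fact that, apart from ties, $T_n > a$ exactly when $\sum_u \psi_b(X_u - a) > 0$. Names are hypothetical. The value of $a$ should not be too close to the Huber functional at $G$, where the formula is singular.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def _huber_moments(lam, a, b, scale, lim=12.0, steps=2000):
    """[M0, M1, M2], Mk = E[exp(lam * p) * p**k], p = psi_b(Y - a), Y ~ scale * chi^2_1.

    Uses Y = scale * W**2 with W ~ N(0, 1) and Simpson's rule in w.
    """
    h = 2.0 * lim / steps
    m = [0.0, 0.0, 0.0]
    for k in range(steps + 1):
        w = -lim + k * h
        wt = 1.0 if k in (0, steps) else (4.0 if k % 2 else 2.0)
        p = min(b, max(scale * w * w - a, -b))   # Huber score psi_b(y - a)
        e = wt * math.exp(lam * p - 0.5 * w * w)
        m[0] += e
        m[1] += e * p
        m[2] += e * p * p
    return [v * h / (3.0 * SQRT_2PI) for v in m]


def vom_sad_huber_tail(a, n, two_gamma, eps, g, b=1.345, bracket=(-50.0, 50.0)):
    """VOM + SAD approximation to P_F{2 gamma_hat_H(h; tau) > a}, n = n(h, tau)."""
    lo, hi = bracket
    for _ in range(60):                          # saddlepoint equation by bisection
        mid = 0.5 * (lo + hi)
        if _huber_moments(mid, a, b, two_gamma)[1] > 0:
            hi = mid
        else:
            lo = mid
    z0 = 0.5 * (lo + hi)
    m0, m1, m2 = _huber_moments(z0, a, b, two_gamma)
    r1 = z0 * math.sqrt(m2 / m0 - (m1 / m0) ** 2)   # z0 * sqrt(K''(z0, a))
    r = math.sqrt(n) * r1
    s = math.copysign(math.sqrt(max(-2.0 * n * math.log(m0), 0.0)), z0)
    phi_s = math.exp(-0.5 * s * s) / SQRT_2PI
    leading = 0.5 * math.erfc(s / math.sqrt(2.0)) + phi_s * (1.0 / r - 1.0 / s)
    # Second term: integral under H over integral under G, minus 1
    h0 = _huber_moments(z0, a, b, g * g * two_gamma)[0]
    return leading + eps * phi_s * math.sqrt(n) / r1 * (h0 / m0 - 1.0)


def simulate_huber_tail(a, n, two_gamma, eps, g, b=1.345, n_sim=20_000, seed=None):
    """Monte Carlo check: apart from ties, T_n > a iff sum_u psi_b(X_u - a) > 0."""
    rng = random.Random(seed)
    sd = math.sqrt(two_gamma)
    hits = 0
    for _ in range(n_sim):
        xs = [rng.gauss(0.0, g * sd if rng.random() < eps else sd) ** 2
              for _ in range(n)]
        hits += sum(min(b, max(x - a, -b)) for x in xs) > 0
    return hits / n_sim


print(vom_sad_huber_tail(1.3, 20, 1.4, 0.05, 1.5),
      simulate_huber_tail(1.3, 20, 1.4, 0.05, 1.5, seed=1))
```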
Some applications of this estimator are given in the following example.

7. Example

For this example, we obtain the Huber spatio-temporal variogram estimator for the NOAA data set. This data set was introduced in [5] and refers to the daily weather data obtained by the US National Oceanic and Atmospheric Administration (NOAA) National Climatic Data Center.
In this data set, we considered the variable Tmax—the daily maximum temperature in degrees Fahrenheit. The classical spatio-temporal semivariogram for this variable is shown in Figure 2.17 of [5], p. 39. In Figure 3, we show the Huber spatio-temporal semivariogram estimator defined in this paper, considering the tuning constant b = 1.345 .
Three-dimensional representations of these classical and robust Huber’s spatio-temporal semivariogram estimators are shown, respectively, in Figure 4 and Figure 5.
Details of these computations are provided in the Supplementary Material.

8. Significant Time Dimension

Figure 6 and Figure 7 show differences with respect to the selected temporal lags for the classical and robust semivariogram estimators, respectively. These estimators are obtained from the three-dimensional spatio-temporal variogram estimators by fixing the lags. The differences become smaller as the time lag increases. If there were no significant differences between two of these semivariograms, we could group these two lags into one, thus reducing the number of time lags considered.
Let us denote by $\gamma_z(\mathbf{h},\tau_0)$ and $\gamma_z(\mathbf{h},\tau)$ the semivariograms at lags $\tau_0$ and $\tau$ for a fixed spatial lag $\mathbf{h}$, with corresponding distributions $F_{\tau_0}$ and $F_\tau$. If we use approximation (7) with $F = F_\tau$ and $G = F_{\tau_0}$, the VOM + SAD approximation of the distribution of the classical spatio-temporal estimator $T_n = 2\hat{\gamma}_z(\mathbf{h},\tau)$ at lag $\tau$ is
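$$
\begin{aligned}
P_{F_\tau}\{T_n > a\} &\simeq P_{F_{\tau_0}}\{T_n > a\} + \frac{\phi(s)}{r_1}\sqrt{n(\mathbf{h},\tau_0)}\left(\frac{\int e^{z_0\psi(x,a)}\, dF_\tau(x)}{\int e^{z_0\psi(y,a)}\, dF_{\tau_0}(y)} - 1\right)\\
&= P\left\{\chi^2_{n(\mathbf{h},\tau_0)} > \frac{a\, n(\mathbf{h},\tau_0)}{2\gamma_z(\mathbf{h};\tau_0)}\right\} + \frac{\sqrt{n(\mathbf{h},\tau_0)}\; 2\gamma_z(\mathbf{h};\tau_0)}{\sqrt{\pi}\,\big(a - 2\gamma_z(\mathbf{h};\tau_0)\big)}\exp\left\{-\frac{n(\mathbf{h},\tau_0)}{2}\left(\frac{a}{2\gamma_z(\mathbf{h};\tau_0)} - 1 - \log\frac{a}{2\gamma_z(\mathbf{h};\tau_0)}\right)\right\}\\
&\qquad \times \left(\exp\left\{a\left(\frac{1}{4\gamma_z(\mathbf{h};\tau_0)} - \frac{1}{4\gamma_z(\mathbf{h};\tau)}\right)\right\}\sqrt{\frac{2\gamma_z(\mathbf{h};\tau)}{2\gamma_z(\mathbf{h};\tau_0)}} - 1\right),
\end{aligned}
$$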
where $n(\mathbf{h},\tau_0)$ is the sample size used by $T_n$ at spatial lag $\mathbf{h}$ and temporal lag $\tau_0$.
In a general testing problem, we test the null hypothesis $\theta = \theta_0$ against the alternative $\theta > \theta_0$ using a test statistic $S_n$. We compute the tail probability $P_{\theta_0}\{S_n > s_n\}$, where $s_n$ is the observed value of $S_n$, and reject (accept) the null hypothesis if this probability is small (large). In the same way, for a fixed spatial lag $\mathbf{h}$, we can test the null hypothesis of no significant change between two temporal lags $\tau_0$ and $\tau$, that is, $H_0: \gamma_z(\mathbf{h},\tau) = \gamma_z(\mathbf{h},\tau_0)$ against $H_1: \gamma_z(\mathbf{h},\tau) > \gamma_z(\mathbf{h},\tau_0)$, by computing the tail probability
$$P_{2\gamma_z(\mathbf{h},\tau_0)}\left\{2\hat{\gamma}_z(\mathbf{h},\tau) > 2\hat{\gamma}_z(\mathbf{h},\tau)_{\mathrm{obs}}\right\}.$$
A small value of this probability discredits the null hypothesis and leads us to reject it. We then conclude that there is a significant difference between the semivariograms at lags $\tau_0$ and $\tau$, which suggests that the (classical or robust) estimators must be computed separately at these two lags. On the other hand, if we accept the null hypothesis, we group these two lags, thus considering one lag fewer.
For instance, in the previous example, consider a spatial lag $\mathbf{h}$ between 240 and 320. The previous probability comparing the starting moment with the first time lag, and that comparing the first with the second temporal lag, are both equal to 0. This suggests highly significant differences between these two pairs of lags (as can be appreciated in Figure 6 and Figure 7).
On the other hand, for the spatial lag $240 < \|\mathbf{h}\| < 320$, the probability between time lags four and five is 0.9427521, and that between the last two is 0.9737844. These values lead us to accept the null hypothesis and suggest that all of these observations can be considered as a single group when computing the spatio-temporal variogram estimator.

9. Identification of Spatio-Temporal Outliers

The second objective of this work is to identify spatio-temporal outliers. For this purpose, we calculate the VOM + SAD approximation of the distribution of the Difference M-estimator. This is an M-estimator that is essentially the difference between the classical method-of-moments estimator and the Huber estimator defined in Section 6. We chose this pair of estimators because Huber's estimator minimizes the maximum asymptotic variance over the contamination neighborhoods of the normal distribution, which is the class of models considered in the paper. Moreover, the mean is an extreme particular case of it, obtained as $b \to \infty$ (i.e., they are nested estimators). When this difference is significant at some pair of lags, we label this pair of lags as spatio-temporal outliers.
This M-estimator is completely defined in (12) below. However, we can already note that the Difference M-estimator belongs to the class defined in the next subsection.

9.1. Average M-Estimators

M-estimators ([11]) are probably the most widely used robust estimators. Nevertheless, they are somewhat awkward to handle, as they are defined implicitly, as solutions of an equation; in particular, the spatio-temporal M-estimator is a solution of Equation (2). Next, we define a new class of M-estimators, which this paper considers only for location estimation.
Definition 1.
If $T_n$ is an M-estimator with score function $\psi$ and, thus, with M-functional $T(F)$ defined by
$$\int \psi(x, T(F))\, dF(x) = 0,$$
the Average M-estimator associated with $T_n$ is defined as
$$T_n^a = \frac{1}{n}\sum_{i=1}^{n}\psi(X_i)$$
with the associated functional
$$T^a(F) = \int \psi(x)\, dF(x).$$
The Average M-estimator associated with the mean is exactly the mean, and $\sum_{i=1}^{n}\psi_b(X_i)/n$ is the one associated with the Huber estimator, $\psi_b$ being the Huber score function considered in Section 6.
An Average M-estimator is an M-estimator with score function $\psi(x, \theta) = \psi(x) - \theta$, because it is a solution of
$$\sum_{i=1}^{n} \psi(x_i, \theta) = 0;$$
that is,
$$\sum_{i=1}^{n} \psi(x_i) - n\theta = 0$$
or
$$T_n^a = \sum_{i=1}^{n} \psi(x_i)/n.$$
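The derivation above can be checked numerically. The following sketch uses hypothetical function names. It solves $\sum_i [\psi(x_i) - \theta] = 0$ by bisection for the Huber score and compares the root with the explicit average. Bisection is used here only to mimic a generic M-estimation solver.

```python
def huber_psi(u, b=1.345):
    """Huber score psi_b(u) = min{b, max{u, -b}}."""
    return min(b, max(u, -b))


def solve_average_m(xs, psi, tol=1e-12, max_iter=200):
    """Solve sum_i [psi(x_i) - theta] = 0 for theta by bisection.

    The left-hand side decreases strictly in theta (slope -n), so the
    root lies between the smallest and largest value of psi(x_i).
    """
    vals = [psi(x) for x in xs]
    lo, hi = min(vals), max(vals)
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if sum(v - mid for v in vals) > 0:
            lo = mid  # residual still positive: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For xs = [0.5, -1.0, 2.0, 6.0], the root agrees with `sum(huber_psi(x) for x in xs) / len(xs)`, which is 0.5475, up to the bisection tolerance.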
We summarize some of the main properties of this class of M-estimators in the following proposition.
Proposition 1.
(a) The Influence Function of a linear combination of estimators is the linear combination of their Influence Functions:
If $T = \sum_{j=1}^{q} w_j T_j$ is a linear combination of $q$ estimators with Influence Functions $IF_j$, the Hampel Influence Function of $T$ is $\sum_{j=1}^{q} w_j IF_j$.
(b) The linear combination of Average M-estimators is an M-estimator:
If $T = \sum_{j=1}^{q} w_j T_j^a$ is a linear combination of $q$ Average M-estimators with score functions $\psi_j(x_i) - \theta$, then $T$ is an M-estimator with score function
$$\psi(x_i, \theta) = \sum_{j=1}^{q} w_j \psi_j(x_i) - \theta.$$
(c) The Hampel Influence Function of an Average M-functional $T^a(F)$ is
$$IF(x; T^a, F) = \psi(x) - T^a(F).$$
(d) The robustness properties of an Average M-estimator are the same as those of the M-estimator from which it is defined.
(e) Any Average M-functional is a linear functional. It is weakly continuous on the class of probability distributions on the Borel σ-algebra if $\psi$ is bounded and continuous.
(f) The asymptotic distribution of an Average M-estimator is normal. Its mean is the associated functional $T^a(F)$, and its asymptotic variance is
$$\int \left(\psi(x) - T^a(F)\right)^2 dF(x) \,/\, n.$$
Proof. 
The proof of (a) follows directly from the linearity of limits (or derivatives), because the Hampel Influence Function is defined as a limit.
To prove (b), we set up the equation
$$\sum_{i=1}^{n} \psi(X_i, T) = 0;$$
that is,
$$\sum_{i=1}^{n} \sum_{j=1}^{q} w_j \psi_j(x_i) - nT = 0$$
or
$$T = \sum_{j=1}^{q} w_j \sum_{i=1}^{n} \psi_j(x_i)/n = \sum_{j=1}^{q} w_j T_j^a.$$
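(c) Consider an Average M-functional
$$T^a(F) = \int \psi(y)\, dF(y).$$
Its Hampel Influence Function is obtained by first contaminating the distribution as $F_\epsilon = (1 - \epsilon)F + \epsilon\,\delta_x$, where $\delta_x$ is the point mass at $x$. This gives
$$T^a(F_\epsilon) = (1 - \epsilon) \int \psi(y)\, dF(y) + \epsilon\, \psi(x).$$
Taking the derivative with respect to $\epsilon$ at $\epsilon = 0$ then yields
$$IF(x; T^a, F) = \psi(x) - T^a(F).$$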
(d) The infinitesimal robustness properties of an estimator are based on its Influence Function. These include the gross-error sensitivity (B-robustness), the local-shift sensitivity and the rejection point. In the case of M-estimators, the Influence Function depends on the behavior of the score function. As obtained in (11), the Influence Function of an Average M-estimator is the score function $\psi$ (shifted by $T^a(F)$) of the M-estimator from which it is defined. Hence, the robustness properties of both estimators are the same.
The same holds for global reliability (breakdown point) and for qualitative robustness, i.e., weak continuity, as stated in (e). This follows from Lemma 2.1 in [9], p. 24.
The proof of (f) follows from the Central Limit Theorem, since the asymptotic variance of an M-estimator is the expected value of the square of its Influence Function ([9], p. 47). □
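The following Python sketch checks parts (b), (c) and (f) of Proposition 1 numerically for the identity and Huber scores. It is an illustration under stated assumptions, not part of the paper. F is taken to be the empirical distribution of a small sample for (b) and (c), and the standard normal for (f). All function names are hypothetical. For (f), $T^a(F) = 0$ because $\psi_b$ is odd and the normal distribution is symmetric. The variance integral therefore reduces to $E[\psi_b(X)^2] = 2\Phi(b) - 1 - 2b\phi(b) + 2b^2(1 - \Phi(b))$.

```python
import math
import random


def huber_psi(u, b=1.345):
    """Huber score psi_b(u) = min{b, max{u, -b}}."""
    return min(b, max(u, -b))


def identity(u):
    """Score psi(x) = x; its Average M-estimator is the sample mean."""
    return u


def average_m(xs, psi):
    """Average M-estimator T_n^a = (1/n) * sum_i psi(x_i)."""
    return sum(psi(x) for x in xs) / len(xs)


def combined_score_sum(xs, psis, weights, theta):
    """(b): sum_i psi(x_i, theta) with psi(x, theta) = sum_j w_j psi_j(x) - theta."""
    return sum(sum(w * p(x) for p, w in zip(psis, weights)) - theta for x in xs)


def finite_difference_if(xs, psi, x, eps=1e-6):
    """(c): [T^a(F_eps) - T^a(F)] / eps, with F the empirical distribution
    of xs and F_eps = (1 - eps) F + eps * delta_x."""
    n = len(xs)
    base = average_m(xs, psi)
    contaminated = sum((1 - eps) / n * psi(y) for y in xs) + eps * psi(x)
    return (contaminated - base) / eps


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def huber_average_variance(b=1.345):
    """(f): integral of (psi_b(x) - T^a(F))^2 dF(x) for F standard normal,
    where T^a(F) = 0 by symmetry."""
    inner = 2.0 * normal_cdf(b) - 1.0 - 2.0 * b * normal_pdf(b)
    outer = 2.0 * b * b * (1.0 - normal_cdf(b))
    return inner + outer


def simulated_n_var(n=50, reps=4000, b=1.345, seed=1):
    """(f): Monte Carlo estimate of n * Var(T_n^a) under standard normal data."""
    rng = random.Random(seed)
    est = [average_m([rng.gauss(0.0, 1.0) for _ in range(n)],
                     lambda u: huber_psi(u, b)) for _ in range(reps)]
    m = sum(est) / reps
    return n * sum((t - m) ** 2 for t in est) / (reps - 1)
```

For b = 1.345, the closed form gives approximately 0.710. The Monte Carlo value of $n \cdot \mathrm{Var}(T_n^a)$ should agree with it up to simulation error.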

9.2. Identification of Spatio-Temporal Outliers

The classical method-of-moments estimator $2\hat{\gamma}_z(h;\tau)$ is the M-estimator associated with the score function $\psi(x) = x$. The Huber spatio-temporal variogram estimator $2\hat{\gamma}_H(h;\tau)$ is the M-estimator associated with the score function $\psi_b(u) = \min\{b, \max\{u, -b\}\}$, where $b$ is the tuning constant. To evaluate the effect of contamination, we therefore define the Difference M-estimator as a solution $T_n^{di}$ of the equation
$$\sum_{u=1}^{n} \psi_{di}(X_u, T_n^{di}) = 0, \tag{12}$$
where the score function in (12) is defined as $\psi_{di} = \psi - \psi_b$ and is plotted in Figure 8.
The Difference M-estimator is fully defined by (12) as a general M-estimator. It can also be considered as the difference between the Average M-estimators associated with the mean and with the Huber estimator.
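Below is a minimal Python sketch of the Difference score $\psi_{di} = \psi - \psi_b$ and of the resulting estimator, written as the difference of the two Average M-estimators. It treats the $X_u$ as already transformed observations supplied by the user. The names are hypothetical.

```python
from statistics import fmean


def huber_psi(u, b):
    """Huber score psi_b(u) = min{b, max{u, -b}}."""
    return min(b, max(u, -b))


def diff_psi(u, b):
    """Difference score psi_di(u) = u - psi_b(u).

    It vanishes on [-b, b], equals u - b for u > b and u + b for u < -b.
    """
    return u - huber_psi(u, b)


def difference_m_estimate(xs, b):
    """Average of psi_di(X_u): the sample mean minus the Huber Average M-estimate."""
    return fmean([diff_psi(x, b) for x in xs])
```

Take xs = [0.5, -1.0, 2.0, 6.0] and b = 1.345. Only the two values outside [-b, b] contribute, and the estimate is 1.875 - 0.5475 = 1.3275.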
As it is an M-estimator, we can use the VOM + SAD approximation (8) obtained above for its tail probability $P_F\{T_n^{di} > a\}$, with $\psi_{di}$ as the score function.
Both $2\hat{\gamma}_z(h;\tau)$ and $2\hat{\gamma}_H(h;\tau)$ are sums of squares, and the latter is less sensitive to large values than the former. Spatio-temporal outliers therefore show up as large positive values of the Difference M-estimator. Let $t_n^{di}$ be the observed value of the Difference M-estimator for a pair of lags $(h, \tau)$. If the probability $P_F\{T_n^{di} > t_n^{di}\}$ is sufficiently small, we conclude that $(h, \tau)$ is a spatio-temporal outlier.
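The decision rule can be sketched as follows. The sketch assumes that the tail probabilities $P_F\{T_n^{di} > t_n^{di}\}$ have already been computed for each lag pair, for instance with the VOM + SAD approximation (8), which is not reproduced here. The two thresholds that separate highly significant from doubtful lags are hypothetical defaults, since this section does not state them. The function name is also hypothetical.

```python
def flag_lag_outliers(tail_probs, alpha_high=0.01, alpha_doubt=0.05):
    """Classify lag pairs (h, tau) by their tail probability.

    tail_probs maps (h, tau) to P_F{T_n^di > t_n^di}. Pairs below
    alpha_high are returned as highly significant, and pairs in
    [alpha_high, alpha_doubt) as doubtful. Both thresholds are
    hypothetical defaults, not values taken from the paper.
    """
    if not 0 < alpha_high < alpha_doubt < 1:
        raise ValueError("need 0 < alpha_high < alpha_doubt < 1")
    high = sorted(k for k, p in tail_probs.items() if p < alpha_high)
    doubtful = sorted(
        k for k, p in tail_probs.items() if alpha_high <= p < alpha_doubt
    )
    return high, doubtful
```

Keeping the thresholds as parameters lets the analyst choose how strict the "doubtful" band is, which mirrors the red and blue categories used in Figure 9.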
Example 1.
Continuing with the example of Section 7, some of the differences between the classical and the Huber spatio-temporal estimators are small (e.g., 0.0000 and 0.0299), while others are large (e.g., 6.4689 and 6.6959). Using the approximation of the distribution of the Difference M-estimator, we obtain a table of tail probabilities (i.e., p-values for the test of significant differences). This table allows the detection of spatio-temporal outliers.
The full table for the 91 pairs of lags considered in this paper is provided in the Supplementary Material. All 91 lag pairs are shown in Figure 9, together with the highly significant spatio-temporal outliers (in red) and the doubtfully significant ones (in blue).
From the figure, if we discard the doubtful outliers (in blue), we can draw the following conclusions. Some of the spatio-temporal lag outliers are essentially spatial outliers only (at $h = 40$ and $h = 200$, from the second temporal lag). Two of them are essentially temporal outliers only ($\tau = 2$ and $\tau = 6$, for the distance lags from $h = 40$ to $h = 200$, and possibly $h = 280$). The truly spatio-temporal outliers, in both components, are the intersection lags $(h, \tau) = (40, 2), (40, 6), (200, 2), (200, 6)$.
We note that these spatio-temporal outliers are lag outliers, not outlying observation coordinates. That is, they are outliers with respect to the variogram, whose observations are not the original $Z_i$ but the transformed $X_i$. Nevertheless, they should be checked before kriging.

10. Conclusions

In this paper, we proposed several robust estimators of the spatio-temporal variogram and obtained accurate approximations for their distributions. These approximations are based on a von Mises expansion of the tail probability functional plus a saddlepoint approximation of the Tail Area Influence Function involved in that expansion. One advantage of these approximations is that they have a closed form. This allows easy interpretation of the elements involved, such as the sample size, contamination fraction, score function, and temporal and spatial lags.
These approximations are computed under a scale-contaminated normal model for the observations. A key step in obtaining them is the transformation of the original variables into new independent variables. With these approximations, we can check, for instance, whether pooling all the observations without temporal distinctions is valid. Alternatively, they may show that the estimators must be computed separately for significantly different times. In the second part of the paper, we also used these approximations to identify spatio-temporal outliers, defining a new class of M-estimators in the process.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math10101785/s1.

Funding

This work was partially supported by grant PGC2018-095194-B-I00 from the Ministerio de Ciencia, Innovación y Universidades (Spain).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The author is very grateful to the referees for their kind and professional remarks.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Christakos, G. Spatiotemporal Random Fields: Theory and Applications, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2017.
2. Hristopulos, D.T. Random Fields for Spatial Data Modeling: A Primer for Scientists and Engineers; Springer Nature: Berlin, Germany, 2020.
3. Cressie, N.A.C. Statistics for Spatial Data; John Wiley & Sons: New York, NY, USA, 1993.
4. Chilès, J.P.; Delfiner, P. Geostatistics: Modeling Spatial Uncertainty, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2012.
5. Wikle, C.K.; Zammit-Mangion, A.; Cressie, N. Spatio-Temporal Statistics with R; Chapman & Hall/CRC: London, UK, 2019.
6. Varouchakis, E.A.; Hristopulos, D.T. Comparison of spatiotemporal variogram functions based on a sparse dataset of groundwater level variations. Spat. Stat. 2019, 34, 1–18.
7. García-Pérez, A. Saddlepoint approximations for the distribution of some robust estimators of the variogram. Metrika 2020, 83, 69–91.
8. García-Pérez, A. New robust cross-variogram estimators and approximations for their distributions based on saddlepoint techniques. Mathematics 2021, 9, 762.
9. Huber, P.J.; Ronchetti, E.M. Robust Statistics, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2009.
10. Tukey, J.W. A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, Stanford Studies in Mathematics and Statistics; Olkin, I., Ed.; Stanford University Press: Redwood City, CA, USA, 1960; Chapter 39; pp. 448–485.
11. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73–101.
12. Ebner, B.; Henze, N. Tests for multivariate normality – A critical review with emphasis on weighted L2-statistics. Test 2020, 29, 845–892.
13. Kotz, S.; Balakrishnan, N.; Johnson, N.L. Continuous Multivariate Distributions. Volume 1: Models and Applications, 2nd ed.; John Wiley & Sons: New York, NY, USA, 2000.
14. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: New York, NY, USA, 1986.
15. Ronchetti, E. Accurate and robust inference. Econom. Stat. 2020, 14, 74–88.
16. Cressie, N.; Hawkins, D.M. Robust estimation of the variogram: I. Math. Geol. 1980, 12, 115–125.
17. La Vecchia, D.; Ronchetti, E. Saddlepoint approximations for short and long memory time series: A frequency domain approach. J. Econom. 2019, 213, 578–592.
18. Von Mises, R. On the asymptotic distribution of differentiable statistical functions. Ann. Math. Stat. 1947, 18, 309–348.
19. Withers, C.S. Expansions for the distribution and quantiles of a regular functional of the empirical distribution with applications to nonparametric confidence intervals. Ann. Stat. 1983, 11, 577–587.
20. Serfling, R.J. Approximation Theorems of Mathematical Statistics; John Wiley & Sons: New York, NY, USA, 1980.
21. Field, C.A.; Ronchetti, E. A tail area influence function and its application to testing. Sequential Anal. 1985, 4, 19–41.
22. Daniels, H.E. Saddlepoint approximations for estimating equations. Biometrika 1983, 70, 89–96.
23. Lugannani, R.; Rice, S. Saddle point approximation for the distribution of the sum of independent random variables. Adv. Appl. Probab. 1980, 12, 475–490.
24. García-Pérez, A. Von Mises approximation of the critical value of a test. Test 2003, 12, 385–411.
25. Jensen, J.L. Saddlepoint Approximations; Clarendon Press: Oxford, UK, 1995.
26. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: http://www.R-project.org (accessed on 18 April 2022).
Figure 1. Exact and approximate tail probabilities for the empirical spatio-temporal estimator with $n(h,\tau) = 3$.
Figure 2. Exact and approximate tail probabilities of the empirical spatio-temporal estimator with contamination $\epsilon = 0.01$, $\epsilon = 0.05$, $\epsilon = 0.1$ and $\epsilon = 0.2$, and sample size $n(h,\tau) = 3$.
Figure 3. Huber’s spatio-temporal semivariogram estimator (with tuning constant equal to 1.345) of daily Tmax from the NOAA data set for July 2003, computed using the estimator introduced in Section 6.
Figure 4. Three-dimensional picture of the classical spatio-temporal semivariogram estimator of the daily Tmax from the NOAA data for July 2003.
Figure 5. Three-dimensional picture of the Huber spatio-temporal semivariogram estimator (with a tuning constant equal to 1.345) of the daily Tmax from the NOAA data for July 2003 computed using the estimator introduced in Section 6.
Figure 6. Classical semivariograms of the daily Tmax from the NOAA data with respect to the seven time lags considered.
Figure 7. Huber’s semivariograms (with tuning constant equal to 1.345) of the daily Tmax from the NOAA data with respect to the seven time lags considered.
Figure 8. Score function defining the Difference M-estimator with tuning constant b.
Figure 9. Highly significant (in red) and doubtfully significant (in blue) spatio-temporal atypical lags of the daily Tmax from the NOAA data.
Table 1. Tail probabilities for several values of $a$ and sample size $n(h,\tau) = 3$.

| $a$ | Exact | Approximation |
|-----|---------|---------------|
| 2.5 | 0.14714 | 0.148299 |
| 3.0 | 0.09308 | 0.093233 |
| 3.5 | 0.05577 | 0.058124 |
| 4.0 | 0.03548 | 0.036006 |
| 4.5 | 0.02089 | 0.022196 |
| 5.0 | 0.01313 | 0.013633 |

Share and Cite

MDPI and ACS Style

García-Pérez, A. On Robustness for Spatio-Temporal Data. Mathematics 2022, 10, 1785. https://doi.org/10.3390/math10101785


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
