1. Introduction and Notation
Spatial dependence is described by a variogram in the univariate case. If there is another variable correlated with the variable of interest and we want to use its spatial information, we have to use a cross-variogram, thus extending the univariate analysis to the multivariate case.
Formally, let $\{Z(s) = (Z_1(s), \dots, Z_p(s))^T,\ s \in D\}$ be an isotropic second-order stationary multivariate spatial process, with $D$ being a fixed subset of $\mathbb{R}^d$, assuming that each component $Z_i(s)$, $i = 1, \dots, p$, has constant expectation $\mu_i$ and variance $\sigma_i^2$, i.e., they do not depend on the location $s$. We also assume that the covariance between two observations depends only on the distance that separates them and not on the spatial locations.
In addition, we admit that each component possesses a variogram
$$2\gamma_i(h) = V\big(Z_i(s+h) - Z_i(s)\big),$$
where $V$ is the variance.
We measure the statistical association between the random components of $Z$ with the correlation coefficients and the spatial dependence in each component with the variograms. To capture the association both within components of $Z$ and across $s$, the cross-variogram is defined as ([1], p. 67, or [2], p. 229)
$$2\gamma_{ij}(h) = Cov\big(Z_i(s+h) - Z_i(s),\, Z_j(s+h) - Z_j(s)\big) = E\big[(Z_i(s+h) - Z_i(s))\,(Z_j(s+h) - Z_j(s))\big],$$
where $Cov$ means covariance and $E$ means mathematical expectation.
This definition is for collocated data, i.e., assuming that each location (site) has all variables measured, a situation that we assume throughout the paper.
The results refer to the pair $(Z_i, Z_j)$, i.e., to a generic pair of components of the vector $Z$.
Let us also assume that we have a sample of $Z$ at $m$ locations $s_1, \dots, s_m$, obtaining $m$ ($p$-dimensional) observations $Z(s_1), \dots, Z(s_m)$. Hence, the data matrix is an $m \times p$ matrix where the $(k, i)$th element is the observation of component $Z_i$ at location $s_k$.
The aims of this paper are the definition of new cross-variogram estimators that are robust against outliers and the derivation of their sample distributions.
Until now, only two robust estimators had been defined, both in [3]. That author used the covariance estimation approach, obtaining two estimators that are somewhat unnatural and difficult to apply. Here, we follow the location estimation approach, extending to the multivariate case the idea first considered in [4] and followed in [5].
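To do this, we start with the classical (non-robust) method-of-moments estimator, defined as
$$2\hat\gamma_{ij}(h) = \frac{1}{|N(h)|} \sum_{N(h)} \big(Z_i(s_k) - Z_i(s_l)\big)\big(Z_j(s_k) - Z_j(s_l)\big),$$
with the sample size being $n = |N(h)|$ and where $|N(h)|$ is the cardinality of $N(h) = \{(s_k, s_l) : s_k - s_l = h\}$.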
It is usually assumed that spatial data follow a normal distribution, but this is unrealistic because, in practice, they are contaminated by occasional outliers. For this reason, we assume in the paper a model close to the normal, i.e., a normal-like model in the central region but with heavier tails than the normal, namely, a multivariate-scale-contaminated normal distribution with joint probability density function (pdf):
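$$f(z) = (1-\epsilon)\,\phi_p(z; \mu, \Sigma) + \epsilon\,\phi_p(z; \mu, g^2\Sigma),$$
where $\phi_p(\cdot\,; \mu, \Sigma)$ denotes the pdf of a $p$-variate normal random vector with mean vector $\mu$ and covariance matrix $\Sigma$, a matrix with values $\sigma_i^2$ in its diagonal, $i = 1, \dots, p$.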
In this framework, $\epsilon$ represents the small proportion of outliers in the sample (e.g., the proportion of extreme weather events affecting all locations) and $g$ represents the extent of the contamination. If $\epsilon = 0$ or $g = 1$, this model reduces to the multivariate normal distribution and, if $\epsilon > 0$ and $g > 1$, it resembles the normal in the central part but with heavier tails.
This is the usual way in which robust statistics handles the nonnormality of the data: establishing a neighborhood of the standard model distribution, the contamination neighborhood, inside which the underlying model is located (e.g., [8], p. 12, or [9], p. 870).
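The scale-contaminated model can be illustrated with a small sampler. The following is only a sketch under stated assumptions: it treats the bivariate case with zero means, it ignores spatial dependence between draws, and the function names `sample_scale_contaminated` and `marginal_variance` are hypothetical, not taken from the paper or its Supplementary Materials.

```python
import math
import random


def sample_scale_contaminated(m, eps, g, sigma1, sigma2, rho, seed=0):
    """Draw m pairs from (1 - eps) * N_2(0, Sigma) + eps * N_2(0, g^2 * Sigma).

    Sigma has variances sigma1^2 and sigma2^2 and correlation rho.
    Spatial dependence between draws is NOT modelled (assumption of this sketch).
    """
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(m):
        # With probability eps the whole vector comes from the inflated-scale component.
        scale = g if rng.random() < eps else 1.0
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        # Cholesky construction: (u, rho * u + root * v) has correlation rho.
        z1 = scale * sigma1 * u
        z2 = scale * sigma2 * (rho * u + root * v)
        pairs.append((z1, z2))
    return pairs


def marginal_variance(eps, g, sigma):
    """Variance of one component under the mixture: (1 - eps + eps * g^2) * sigma^2."""
    return (1.0 - eps + eps * g * g) * sigma * sigma
```

With `eps = 0` or `g = 1` the sampler returns plain bivariate normal draws, which mirrors the reduction to the normal model described above.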
From this joint distribution, the marginal distributions of the $Z_i(s)$ are the univariate scale contaminated normal models:
$$(1-\epsilon)\,N(\mu_i, \sigma_i^2) + \epsilon\,N(\mu_i, g^2\sigma_i^2).$$
The paper is organized as follows. In Section 2, we consider a consecutive pair of transformations of the initial observations to avoid their dependence. With these, we can use standard techniques for independent and identically distributed (iid) random variables. We also obtain in that section the distribution of these new variables. Here, there is a remarkable difference with respect to [5]: there, the transformed variables were the square of standard normal variables, i.e., $\chi^2_1$ distributed random variables, but here, we have the product of two different normal variables. In Section 3, cross-variogram M-estimators based on the new variables are defined. The von Mises plus saddlepoint (VOM+SAD) approximations for their distributions are also obtained, approximations that are applied to the classical method-of-moments estimator in Section 4. This is the first time that a closed-form approximation of its distribution is obtained. Simulations of the accuracy of the approximation and of the lack of robustness of its distribution are included. In Section 5, we define the α-trimmed cross-variogram estimator and we obtain the VOM+SAD approximation for its distribution. We do the same for Huber’s cross-variogram estimator in Section 6. We include there a simulation study to compare the robustness of the three estimators as we increase the degree of contamination. Section 7 is devoted to analyzing the dependence of the transformed variables on the linearized cross-variogram models, and it closes with two examples of real data. Finally, in Section 8, we give some conclusions, ending the paper with an Appendix, which contains the technical details.
2. Preliminary Transformation
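The usual dependence between spatial observations does not allow for the use of techniques for iid variables. Nevertheless, it is possible to circumvent this restriction by transforming the initial observations $Z_i(s)$.
Namely, let us define the gap or lag variable
$$Y_i(s) = Z_i(s+h) - Z_i(s).$$
The cross-variogram is now
$$2\gamma_{ij}(h) = E[Y_i(s)\,Y_j(s)],$$
the mean of the product, and its classical estimator, the method-of-moments estimator,
$$2\hat\gamma_{ij}(h) = \frac{1}{n} \sum_{k=1}^{n} Y_i(s_k)\,Y_j(s_k),$$
is the sample mean of the variables $X_k = Y_i(s_k)\,Y_j(s_k)$, $k = 1, \dots, n$, and is therefore non-robust.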
This is the reason why we say that we use the location estimation approach: the parameter is the mean, and the classical estimator is the sample mean. In this manner, instead of considering an unwieldy estimator for an unnatural parameter of the initial distribution, we propose to transform the original (and usually dependent) observations into new data (independent under some conditions), obtaining a natural parameter of the new variable (its mean) for which a manageable estimator (the sample mean) is available. Then, standard robustification techniques can be applied.
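The two transformations (lag differences, then products) and the resulting sample mean can be sketched as follows. This is a minimal illustration assuming collocated observations on a regular transect, with the lag `h` given as an integer number of grid steps; the names `lag_products` and `cross_variogram_mom` are hypothetical.

```python
def lag_products(zi, zj, h):
    """Products of the lag-h increments of two collocated variables on a regular transect."""
    if len(zi) != len(zj):
        raise ValueError("both variables must be observed at the same sites")
    if not 0 < h < len(zi):
        raise ValueError("h must be between 1 and len(zi) - 1")
    # Increment of each variable between site k and site k + h, multiplied together.
    return [(zi[k + h] - zi[k]) * (zj[k + h] - zj[k]) for k in range(len(zi) - h)]


def cross_variogram_mom(zi, zj, h):
    """Classical method-of-moments estimate: the sample mean of the lag products."""
    x = lag_products(zi, zj, h)
    return sum(x) / len(x)
```

Because the estimate is a plain sample mean of the products, a single extreme product can move it arbitrarily far, which is the non-robustness that the later sections address.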
This idea has been successfully applied, first, in [
4] and in [
An important problem is determining the distribution of this new variable from the original normal (or contaminated normal) distribution of $Z$, in order to later obtain the distribution of the robust estimators, where the new variable is now the product of two different normal variables.
2.1. Correlation between the Transformed Variables
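First, let us define two new functions that are natural extensions of the similar ones associated with the variogram.
Let us call cross-covariogram between $Z_i$ and $Z_j$ the function (provided it is well defined)
$$C_{ij}(a, b) = Cov\big(Z_i(a), Z_j(b)\big),$$
that will be equal to
$$C_{ij}(a, b) = E[Z_i(a)\,Z_j(b)] - \mu_i\,\mu_j.$$
Here, $a$ will be $t$ or $t + h$ and $b$ will be $s$ or $s + h$, and thus, $E[Z_i(t)] = E[Z_i(t+h)] = \mu_i$ and $E[Z_j(s)] = E[Z_j(s+h)] = \mu_j$, where the equality between the expectations is obtained because of the intrinsic stationarity of the components of $Z$.
Analogously, we assume the equality of the variances in locations that are separated by a lag $h$, $V(Z_i(t)) = V(Z_i(t+h)) = \sigma_i^2$, and $V(Z_j(s)) = V(Z_j(s+h)) = \sigma_j^2$.
Let us also define the cross-correlogram as
$$\rho_{ij}(a, b) = \frac{C_{ij}(a, b)}{\sigma_i\,\sigma_j}.$$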
Now, the covariance between
will be (see the
Appendix A for details)
Thus, the correlation between
will be zero if
Because locations are fixed in advance (for instance, they could be sample stations), we assume that they are equally spaced on a transect as, for instance, in Figure 2.1 of [1], i.e., they are data on a regular grid. Hence, we can match two contiguous transformed variables (for which the dependence is supposed to be the strongest), so that $t + h = s$.
Now, the previous condition of correlation equal to zero is obtained if
or, in terms of the
cross-covariogram, when
On the other hand, with a little algebra, the cross-variogram can be expressed as (see Appendix A for details)
and then, it will be
Replacing these values of
in (
2), we obtain
i.e., the correlation between
will be 0 when
i.e., if a linear cross-variogram can be accepted as model (because, theoretically, the nugget is 0).
Remark 1. The increments and have as joint cumulative distribution function, if they are uncorrelated, Hence, if
are uncorrelated, with probability
, they are independent under model
and, with probability
, they are independent under model
, being a mixture of independent variables. For this reason, these variables are considered in the paper as independent if they are uncorrelated, following the idea of [
2.2. Independence of the Observations
The method-of-moments estimator $2\hat\gamma_{ij}(h)$ was expressed as the sample mean of the variables $X_k$. Considering only two of them, $X_t = Y_i(t)\,Y_j(t)$ and $X_s = Y_i(s)\,Y_j(s)$, if we can accept a linear variogram for the variable $Z_i$ and a linear variogram for the variable $Z_j$, it was proved in [5] that $Y_i(t)$ will be independent of $Y_i(s)$ and that $Y_j(t)$ will be independent of $Y_j(s)$.
If, additionally, we can accept a linear cross-variogram for the couple $(Z_i, Z_j)$, the variables $Y_i(t)$ and $Y_j(s)$, and $Y_j(t)$ and $Y_i(s)$, will be independent.
In conclusion, if we can accept a linear variogram for the variable $Z_i$, a linear variogram for the variable $Z_j$, and a linear cross-variogram for this pair, the variables $X_k$, $k = 1, \dots, n$, can be considered independent, a situation that we assume in the paper and to which we shall return later.
2.3. Distribution of the Transformed Variables
Therefore, the initial observations $Z_i(s)$, normal or contaminated normal distributed, are transformed into the lag variables $Y_i(s)$ and, finally, into their product $X$. The reason for this transformation is to express the classical estimator as a sample mean of independent variables (if linear variograms and a linear cross-variogram can be accepted). This gives a simple mathematical expression for the estimator, which is very useful both for defining new robust location-type estimators and for determining their sample distributions.
The problem is that, although the $Z_i(s)$ are initially contaminated normal variables, after the two transformations we no longer have normality in $X$. In what follows, we obtain their distributions.
Proposition 1. (a) If $Z_i(s) \sim N(\mu_i, \sigma_i^2)$, then $Y_i(s) \sim N(0, 2\gamma_i(h))$.
(b) If $Z_i(s) \sim (1-\epsilon)\,N(\mu_i, \sigma_i^2) + \epsilon\,N(\mu_i, g^2\sigma_i^2)$, then $Y_i(s) \sim (1-\epsilon)\,N(0, 2\gamma_i(h)) + \epsilon\,N(0, 2 g^2 \gamma_i(h))$.
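To obtain the distribution of $X$, we use two results from Nadarajah and Pogány (2016).
Proposition 2. ([10], p. 202, Theorems 2.1 and 2.2) (a) Let $(U, V)$ denote a bivariate normal random vector with zero means, unit variances, and correlation coefficient $\rho$. Then, the pdf of $X = UV$ is
$$f(x) = \frac{1}{\pi\sqrt{1-\rho^2}} \exp\!\left(\frac{\rho\,x}{1-\rho^2}\right) K_0\!\left(\frac{|x|}{1-\rho^2}\right), \qquad (3)$$
where $K_0$ is the modified Bessel function of the second kind of order zero.
(b) If $X_1, \dots, X_n$ is a random sample of $X$, the pdf of their sample mean $\bar X$ is
$$f_{\bar X}(x) = \frac{n^{(n+1)/2}\, 2^{(1-n)/2}\, |x|^{(n-1)/2}}{\sqrt{\pi (1-\rho^2)}\;\Gamma(n/2)} \exp\!\left(\frac{\beta_1 - \beta_2}{2}\,x\right) K_{(n-1)/2}\!\left(\frac{\beta_1 + \beta_2}{2}\,|x|\right), \qquad (4)$$
where $\beta_1 = n/(1-\rho)$, $\beta_2 = n/(1+\rho)$, and $K_b$ is the modified Bessel function of the second kind of order $b$.
Thus, if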
is a bivariate scale contaminated normal variable with distribution
the variable
will be a bivariate scale contaminated normal variable with distribution
where, in
, the two elements of the diagonal are
and the correlation coefficient between
equal to the correlation coefficient between
, usually shortened as
in the rest of the paper. Hence, it will be
The distribution of
is the cumulative distribution function for which the pdf is given by (
3). The last equality, (
5), is used as a notation.
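As a numerical sanity check of the product density used in Proposition 2(a), the sketch below evaluates the closed form $\frac{1}{\pi\sqrt{1-\rho^2}}\exp\{\rho x/(1-\rho^2)\}\,K_0(|x|/(1-\rho^2))$ and integrates it on a grid. It is an illustration under assumptions: the closed form is the one stated by Nadarajah and Pogány for standard normal components, $K_0$ is computed from its integral representation $\int_0^\infty e^{-x\cosh t}\,dt$ with a trapezoid rule instead of a special-function library, and all function names and grid settings are hypothetical choices.

```python
import math


def bessel_k0(x, upper=12.0, steps=300):
    """K_0(x) for x > 0 via int_0^inf exp(-x * cosh(t)) dt, truncated at `upper` (trapezoid rule)."""
    dt = upper / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(upper)))
    for i in range(1, steps):
        total += math.exp(-x * math.cosh(i * dt))
    return total * dt


def product_pdf(x, rho):
    """Density at x != 0 of the product of two standard normals with correlation rho."""
    c = 1.0 - rho * rho
    return math.exp(rho * x / c) * bessel_k0(abs(x) / c) / (math.pi * math.sqrt(c))


def mass_and_mean(rho, lo=-30.0, hi=30.0, step=0.02):
    """Midpoint-rule approximations of the total mass and the mean of the density."""
    n = round((hi - lo) / step)
    mass = 0.0
    mean = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * step  # midpoints never hit the logarithmic singularity at 0
        f = product_pdf(x, rho)
        mass += f * step
        mean += x * f * step
    return mass, mean
```

The total mass should be close to 1 and the mean close to $\rho$; the midpoint rule loses a little mass near the logarithmic singularity at the origin, so only moderate accuracy should be expected from this grid.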
3. Cross-Variogram M-Estimators
Because the method-of-moments estimator is the sample mean of the transformed variables, it can be robustified in the same way as a sample mean. Here, however, the model distribution of the observations is somewhat peculiar, which makes the computations more elaborate.
Firstly, we define a large class of cross-variogram estimators for which their robustness can be controlled. We call cross-variogram M-estimators, with score function $\psi$, the solution $T_n$ of the equation:
$$\sum_{k=1}^{n} \psi(X_k, T_n) = 0, \qquad (6)$$
where $X_k$ are the variables previously considered and we assume that $\psi(x, t)$ is monotonically decreasing in $t$ for all $x$. In fact, $T_n$ is an estimator for a location problem, with $\psi$ being of the form $\psi(x, t) = \varphi(x - t)$, with $\varphi(u)$ monotonically increasing in
u, [
We can control the robustness of the cross-variogram M-estimators by choosing a bounded score function. Other robustness properties, such as the breakdown point, can also be applied to this class of estimators.
3.1. Von Mises Approximation for their Distributions
If $T_n$ is an estimator, where $F$ is the underlying model distribution of the observations, the tail probability $P_F\{T_n > t\}$ can be expressed at another model
G using the von Mises expansion as [
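$$P_F\{T_n > t\} = P_G\{T_n > t\} + \int TAIF(x; t; T_n, G)\,dF(x) + Rem,$$
where $TAIF$ is Hampel’s influence function of the tail probability functional, called the tail area influence function (TAIF) [15] and defined as
$$TAIF(x; t; T_n, G) = \frac{\partial}{\partial \epsilon}\, P_{G_{\epsilon, x}}\{T_n > t\}\Big|_{\epsilon = 0}$$
for all $x \in \mathbb{R}$ where the right-hand side exists.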
This influence function is calculated by replacing the underlying model $G$ with the contaminated model $G_{\epsilon, x} = (1-\epsilon)\,G + \epsilon\,\delta_x$ before computing the first derivative at $\epsilon = 0$, with $\delta_x$ being the distribution that assigns mass 1 at $x$.
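If distributions $F$ and $G$ are close enough, we can use the von Mises approximation (VOM)
$$P_F\{T_n > t\} \simeq P_G\{T_n > t\} + \int TAIF(x; t; T_n, G)\,dF(x) \qquad (7)$$
to compute the distribution of $T_n$ under the underlying model $F$ using model $G$.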
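In particular, if $F$ is a mixture $F = (1-\epsilon)\,G + \epsilon\,H$, the von Mises expansion is
$$P_F\{T_n > t\} = P_G\{T_n > t\} + \epsilon \int TAIF(x; t; T_n, G)\,dH(x) + Rem,$$
because the integral of the $TAIF$ with respect to $G$ is zero. The von Mises approximation (7) will be then
$$P_F\{T_n > t\} \simeq P_G\{T_n > t\} + \epsilon \int TAIF(x; t; T_n, G)\,dH(x). \qquad (8)$$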
Distribution $G$ plays an important role in the VOM approximation because we can choose it such that we know the tail probability of the leading term, $P_G\{T_n > t\}$. Distribution $G$ is called the pivotal distribution; observe that the $TAIF$ is also computed for this pivotal distribution.
3.2. Saddlepoint Approximation of the TAIF
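In order to use the von Mises approximation (8) for location M-estimators, we compute a saddlepoint approximation (SAD) of the $TAIF$, using Lugannani and Rice’s formula [16] ([17], p. 77, or better, [8], p. 314). We use the approximation given in [11] for M-estimators and, following the same computations as in [18], pp. 402–404, we have that
$$TAIF(x; t; T_n, G) \simeq \frac{\phi(s)}{r_1}\, n^{1/2} \left( \frac{e^{z_0 \psi(x, t)}}{\int e^{z_0 \psi(y, t)}\,dG(y)} - 1 \right), \qquad (9)$$
where $\phi$ is the density function of the standard normal distribution, and $s$ and $r_1$ are the functionals
$$s = \sqrt{-2\,n\,K(z_0, t)}, \qquad r_1 = z_0 \sqrt{K''(z_0, t)},$$
with $K(\lambda, t) = \log \int e^{\lambda \psi(y, t)}\,dG(y)$ being the cumulant generating function of distribution $G$; $K''(\lambda, t)$ being the second partial derivative of $K(\lambda, t)$ with respect to the first variable $\lambda$; and $z_0$ being the saddlepoint, i.e., the solution of the saddlepoint equation
$$\int e^{z_0 \psi(y, t)}\,\psi(y, t)\,dG(y) = 0.$$
Replacing the SAD approximation (9) in the VOM approximation (8), we obtain the VOM+SAD approximation for the distribution of the M-estimator $T_n$, assuming that $F = (1-\epsilon)\,G + \epsilon\,H$,
$$P_F\{T_n > t\} \simeq P_G\{T_n > t\} + \epsilon\, \frac{\phi(s)}{r_1}\, n^{1/2} \left( \frac{\int e^{z_0 \psi(x, t)}\,dH(x)}{\int e^{z_0 \psi(y, t)}\,dG(y)} - 1 \right), \qquad (10)$$
which is the approximation that we use in what follows and where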
G and
H are the distributions that appear in (
The VOM+SAD approximation will be accurate if distributions $F$ and $G$ are close. Nevertheless, if this is not the case, we can use an iterative procedure, as in [21], considering intermediate distributions between $F$ and $G$.
4. Sample Distribution of the Method-of-Moments Estimator
Not all cross-variogram M-estimators are robust. For instance, the classical method-of-moments estimator is not robust because its score function is not bounded. Nevertheless, we compute its VOM+SAD approximation, both to show its lack of robustness and because its distribution will be useful in determining the distribution of some robust versions of it.
Because the method-of-moments estimator is an M-estimator with score function $\psi(x, t) = x - t$, we can use approximation (10). Its leading term is computed with respect to distribution $G$, where $G$ is the cumulative distribution function for which the pdf is the one given by (3) in Proposition 2.
Thus, the leading term in (
10) is
and where
is the pdf given by (
4) because, now, the previous tail probability is the tail probability of the sample mean of the product of two standard normal distributions.
4.1. Performance of the Theoretical Results with Simulations
We can see how accurate the VOM+SAD approximation is for the method-of-moments estimator with a simulation study, considering a sample size as small as . We considered a bivariate normal distribution with mean vector and covariance matrix such that and are the marginal variances and the covariance for . We consider four different situations: no contamination, contamination , contamination , and contamination .
Under these conditions, we obtain Figure 1, in which we see that the approximations are very good, especially in the tails, which are the areas of interest for tests and confidence intervals.
If we compute from this table the relative errors of the approximation, in %, defined as usual (see, for instance, [
22]) as
we obtain
Table 2, showing extremely low relative errors in the approximations. This is one of the advantages of saddlepoint approximations, [
4.2. Robustness of the Method-of-Moments Estimator
We can observe the lack of robustness of the distribution of the method-of-moments estimator in
Figure 2 as we increase
Remark 2. The sample size considered in each estimation depends on the value of the lag h, which is fixed in advance. If h is small, the number of lags will be large and the sample size will be small. The VOM+SAD approximations obtained in the paper are very accurate, even in this case.
Nevertheless, if h is large, the number of lags will be small and the sample size will be large. In this case, it is easier to compute the leading term using the central limit theorem because each transformed variable is the product of two standard normal variables with correlation coefficient $\rho$
. The characteristic function of this product is (expression (4) in [
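and then, the mean of this product variable is $\rho$ and the second moment about the origin is $1 + 2\rho^2$. Hence, the variance will be $1 + \rho^2$ and the leading term for the sample mean $\bar X$ of $n$ such products can be computed, if $n$ is large, as
$$P_G\{\bar X > t\} \simeq 1 - \Phi\!\left( \frac{\sqrt{n}\,(t - \rho)}{\sqrt{1 + \rho^2}} \right),$$
where $\Phi$ is the standard normal cumulative distribution function.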
Since, if $\epsilon = 0$ or $g = 1$, the scale contaminated normal distribution is just a normal distribution, this last expression is an approximation for the distribution of the classical method-of-moments estimator under the usual underlying normal distribution model.
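The moments behind the central limit approximation in Remark 2 can be checked by simulation. The sketch below assumes standardized components (unit variances) with correlation `rho`, so that the product has mean $\rho$ and variance $1 + \rho^2$; `simulate_products` and `clt_tail` are hypothetical names, and the simulation is only an illustration, not the study reported in the paper.

```python
import math
import random


def simulate_products(n, rho, seed=0):
    """Simulate n products of two standard normal variables with correlation rho."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - rho * rho)
    products = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        products.append(u * (rho * u + root * v))
    return products


def clt_tail(t, n, rho):
    """Normal approximation to P(mean of n products > t), using mean rho and variance 1 + rho^2."""
    z = math.sqrt(n) * (t - rho) / math.sqrt(1.0 + rho * rho)
    # Upper tail of the standard normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Comparing the empirical mean and variance of `simulate_products(...)` with $\rho$ and $1 + \rho^2$ gives a quick check of the two moments before the tail approximation is used.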
5. α-Trimmed Cross-Variogram Estimator
Another robust estimator for the cross-variogram, which is not an M-estimator, can be obtained by trimming the observations as follows:
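Considering the initial pair of variables $(Z_i, Z_j)$, and transforming them to the couple $(Y_i, Y_j)$ and finally to the product $X = Y_i Y_j$, if we trim the $100\alpha\%$ of the smallest and the $100\alpha\%$ of the largest ordered data $X_{(1)} \le \dots \le X_{(n)}$, the (symmetrically) sample α-trimmed cross-variogram estimator is defined as
$$2\hat\gamma_{ij,\alpha}(h) = \frac{1}{n - 2[n\alpha]} \sum_{k=[n\alpha]+1}^{n-[n\alpha]} X_{(k)},$$
where $[\cdot]$ stands for the integer part.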
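The symmetric trimming just described amounts to a trimmed mean of the lag products. A minimal sketch, assuming the products have already been computed and that `alpha` lies in [0, 0.5); the function name `trimmed_cross_variogram` is hypothetical.

```python
import math


def trimmed_cross_variogram(products, alpha):
    """Symmetric alpha-trimmed mean of the lag products.

    Drops the [n * alpha] smallest and the [n * alpha] largest values,
    where [.] is the integer part, and averages the rest.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    x = sorted(products)
    n = len(x)
    r = math.floor(n * alpha)  # number of observations trimmed at each end
    kept = x[r:n - r]
    return sum(kept) / len(kept)
```

With `alpha = 0` the function returns the classical sample mean, so the same code can be used to compare the trimmed and untrimmed estimates on the same products.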
To obtain an approximation for its sample distribution, we use an accurate VOM+SAD approximation obtained in [
21]. From Corollary 1 therein, we can approximate the small sample distribution of the sample
-trimmed cross-variogram
when the observations
come from
, with
k iterations (
k large), by the VOM+SAD approximation to the distribution of the method-of-moments-estimator
, obtained in the previous section, as
In the bottom row of
Figure 3, we plot the tail probability of the
-trimmed cross-variogram estimator
with no contamination (
) and with two percentages of contamination:
, with the sample size being
We observe in this figure that, as we increase the contamination percentage, i.e., as we increase $\epsilon$, the tail probabilities obtained with the trimmed cross-variogram estimators are affected, but less than those obtained with the classical method-of-moments estimator. We see this by comparing the first row of figures (non-trimmed cross-variogram estimators) with the second row of figures (trimmed cross-variogram estimators).
6. Huber’s Cross-Variogram Estimator
If the score function $\psi$ used to obtain the M-estimator in Equation (6) is Huber’s function $\psi_b(u) = \min\{b, \max\{u, -b\}\}$, the M-estimator obtained is called Huber’s cross-variogram estimator. Since its score function is bounded, this estimator will be robust.
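A Huber-type location estimate of the lag products can be sketched as below. This is an assumption-laden illustration rather than the paper's implementation: the scale is fixed at the normalized median absolute deviation (MAD), the estimating equation is solved by a simple winsorizing fixed-point iteration, and the names `huber_psi` and `huber_location` as well as the default tuning constant `b = 1.5` are choices made for this sketch.

```python
import statistics


def huber_psi(u, b):
    """Huber's score function: u clipped to the interval [-b, b]."""
    return max(-b, min(b, u))


def huber_location(x, b=1.5, tol=1e-6, max_iter=500):
    """Huber M-estimate of location with the scale fixed at the normalized MAD."""
    mu = statistics.median(x)
    s = 1.4826 * statistics.median([abs(v - mu) for v in x])
    if s == 0:
        raise ValueError("MAD scale is zero; the estimate is not defined in this sketch")
    for _ in range(max_iter):
        # Fixed-point step: at convergence the mean of psi((x - mu) / s) is zero.
        new_mu = mu + s * sum(huber_psi((v - mu) / s, b) for v in x) / len(x)
        if abs(new_mu - mu) < tol * s:
            return new_mu
        mu = new_mu
    return mu
```

Applied to the lag products, the returned value plays the role of a robust cross-variogram estimate at the chosen lag: a single extreme product is clipped at `b` scale units instead of entering the average at full size.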
An approximation for its distribution can be obtained from (
10). Nevertheless, the leading term
is not easy to compute. For this reason, in this case, we use the Lugannani and Rice formula to approximate this leading term, the VOM+SAD approximation for the distribution of the Huber’s cross-variogram estimator being the following:
where the saddlepoint
is such that
G and
H being the distributions that appear in (
5), and where all the functionals
, and
s are computed with respect to model
This approximation may seem complicated but it is easy to compute using the
huber function of the MASS library, [
Example 1. In order to analyze the behaviour of the robust estimators defined in the paper, we carry out a simulation study in which we compare the α-trimmed and Huber’s variogram estimators with the classical method-of-moments estimator.
The study consists of a simulation of two spatial and statistical correlated variables
, both with a normal distribution, in different situations, with some of them considered, for instance, in [
- (A)
No contamination, ;
- (B)
- (C)
- (D)
- (E)
- (F)
- (G)
The details of the simulations are in the Supplementary Materials. In these simulations, we observe less sensitivity in the robust estimators than in the classical one as we increase the contamination in the model. This can be seen in Figure 4 and Figure 5. In the first one, we observe that the classical variogram model can be accepted for the three estimations in case (A), where there is no contamination. Nevertheless, as we increase the contamination (Figure 5), this variogram model no longer represents the classical variogram estimations; it represents the 0.1-trimmed variogram estimations only in some cases, and it can be accepted for Huber’s variogram estimations except, perhaps, in the last case, where it is doubtful.
7. Linearized Version of the Cross-Variogram Model
We saw at the end of Section 2.2 that, if linear models can be accepted as variograms and cross-variograms, the transformed variables can be considered independent. These linearized versions of the model (classical and robust) were introduced in Section 9 of [5] and can be applied to model the cross-variogram. They essentially consist of replacing the increasing part of the traditional variogram, or cross-variogram, with the regression line before the range and with the sill (or the robust sample mean in the robust linearized version) after the range.
Additionally, the test defined in Section 10.1 of [
5] can be used to check if these models can be accepted, using saddlepoint approximations for the robust (and classical) estimators of the variograms and cross-variograms.
Namely, we test the null hypothesis of a particular variogram or cross-variogram model
from which we obtain the
theoretical variogram values
(or cross-variogram values
)) using as test statistic
assuming that we consider
K lags.
If we unify both as
the cumulative distribution function of
is (see [
probabilities that are computed with the VOM+SAD approximations.
We remark that the number K of lags (and hence the value of ) can be modified to obtain the desired linearity.
Example 2. Let us consider the prediction data included in the jura data set from Pierre Goovaerts’ book, which contains geolocated information on several variables. This data set is called prediction.dat in the R library gstat.
Two correlated variables, with a distribution similar to a scale contaminated normal model, are ln(Pb) (natural logarithm of Lead) and Ni (Nickel).
The values of the classical method-of-moments estimator, the 0.1-trimmed cross-variogram estimator, and the Huber’s cross-variogram estimator (with tuning constant
) are easily obtained for these variables, as can be seen in the
Supplementary Materials. The lag distance chosen was
. These values are shown in
Figure 6.
To use their distributions, obtained in the paper, it is necessary to check whether we can accept linear variograms for these two variables and a linear cross-variogram for the pair, as was pointed out in
Section 2.2. If this is the case, the variables
can be considered independent.
Assuming as underlying model, a scale contaminated normal with
, the linearized versions of the variograms for the logarithm of
Lead are shown in
Figure 7. The linearized versions of the variograms for
Nickel are shown in
Figure 8.
Finally, the linearized versions for the cross-variograms models are shown in
Figure 9.
From a visual point of view, all these linearized versions can be accepted using the test considered in
Section 7. The values of the test statistics
and the
p-values are given in
Table 3 (see the
Supplementary Materials). Thus, the independence of the
can be accepted.
We conclude the paper with a real-data example in which we observe how robust cross-variogram estimations provide models less sensitive to outliers, which will lead us to a more robust cokriging.
Example 3. Let us consider the geolocated pollution data, included in the Supplementary Materials, that are the 2017 average concentrations of four air pollutants in the Community of Madrid (Spain): nitrogen monoxide (NO), nitrogen dioxide (NO2), suspended particles with a size less than 10 microns (PM10), and ozone (O3). These data are obtained from 22 monitoring stations [24,25,26]. Two of these 4 variables are strongly correlated and have a distribution similar to a scale contaminated normal model; they are NO and NO2.
The variogram and cross-variogram matrix of the classical variogram and cross-variogram estimators, along with the classical least-squares model fit (a Matérn model in this case), is shown in
Figure 10.
The values of the classical method-of-moments estimator, the 0.1-trimmed cross-variogram estimator, and the Huber’s cross-variogram estimator (with tuning constant
) for these variables are obtained in the
Supplementary Materials. These values are shown in
Figure 11, along with the linearized cross-variogram models.
We observe that, at the first lag, the three estimations agree. At the others, we can see the moderating effect of the 0.1-trimmed and Huber’s cross-variogram estimators.
The linearized versions of the variograms and cross-variogram can be accepted and, therefore, so can the independence of the transformed variables.
Moreover, we can see the influence of the outliers on the estimation of the (linearized) cross-variogram in
Figure 11 and, therefore, on the cokriging obtained with classical cross-variogram models. Thus, the use of robust estimators of the cross-variogram is more reasonable in order to obtain a robust cokriging.