New Robust Cross-Variogram Estimators and Approximations of Their Distributions Based on Saddlepoint Techniques

: Let Z ( s ) = ( Z 1 ( s ) ,..., Z p ( s )) t be an isotropic second-order stationary multivariate spatial process. We measure the statistical association between the p random components of Z with the correlation coefﬁcients and measure the spatial dependence with variograms. If two of the Z components are correlated, the spatial information provided by one of them can improve the information of the other. To capture this association, both within components of Z ( s ) and across s , we use a cross-variogram. Only two robust cross-variogram estimators have been proposed in the literature, both by Lark, and their sample distributions were not obtained. In this paper, we propose new robust cross-variogram estimators, following the location estimation method instead of the scale estimation one considered by Lark, thus extending the results obtained by García-Pérez to the multivariate case. We also obtain accurate approximations for their sample distributions using saddlepoint techniques and assuming a multivariate-scale contaminated normal model. The question of the independence of the transformed variables to avoid the usual dependence of spatial observations is also considered in the paper, linking it with the acceptance of linear variograms and cross-variograms.


Introduction and Notation
Spatial dependence is described by a variogram in the univariate case. If there is another variable correlated with the variable of interest and we want to use its spatial information, we have to use a cross-variogram, thus extending the univariate analysis to the multivariate case.
Formally, let Z(s) = (Z 1 (s), ..., Z p (s)) t , s ∈ D be an isotropic second-order stationary multivariate spatial process, with D being a fixed subset of R d , assuming that each component Z i , i = 1, ..., p, has an expectation and variance constant, i.e., they do not depend on the location s. We also assume that the covariance between two observations depends only on the distance that separates them and not on the spatial locations.
In addition, we admit that each component possesses a variogram where var is the variance. We measure the statistical association between the random components of Z with the correlation coefficients and the spatial dependence in each component with the variograms. To capture the association both within components of Z(s) and across s, the cross-variogram is defined as ([1], p. 67, or [2], p. 229) 2γ ij (h) = cov Z i (s + h) − Z i (s), Z j (s + h) − Z j (s) = E (Z i (s + h) − Z i (s)) · (Z j (s + h) − Z j (s)) with the sample size being n = N h and where the cardinality of N(h) = {(s l 1 , s l 2 ) : s l 1 − s l 2 = h}.
It is usually assumed that spatial data follow a normal distribution, but this is unrealistic because, in practice, they are contaminated by occasional outliers. For this reason, we assume in the paper a model close to the normal, i.e., a normal-like model in the central region but with heavier tails than the normal, namely, a multivariate-scale-contaminated normal distribution with joint probability density function (pdf): f M (z) = f M (z 1 , ..., z p ) = (1 − ) f N (z; µ, Σ) + f N (z; µ, g 2 Σ) (1) where ∈ (0, 1); g > 1; f N (z; µ, Σ) denotes the pdf of a p-variate normal random vector with mean vector µ = (µ 1 , ..., µ p ) and covariance matrix Σ, a matrix with values σ 2 i in its diagonal, i = 1, ..., p.
In this framework, represents the small proportion of outliers in the sample (e.g., the proportion of extreme weather events affecting Z at all locations) and g represents the extent of the contamination. If = 0 or g = 1, this model reduces to the multivariate normal distribution and, if > 0 and g > 1, it resembles the normal in the central part but with heavier tails. This is the usual way in which robust statistics handles the nonnormality of the data: establishing a neighborhood of the standard model distribution, the contamination neighborhood, inside which the underlying model is located (e.g., [6][7][8], p. 12, or [9], p. 870).
From this joint distribution, the marginal distributions of the Z i are the univariate scale contaminated normal models: The paper is organized as follows. In Section 2, we consider a consecutive pair of transformations of the initial observations to avoid their dependence. With these, we can use standard techniques for independent and identically distributed (iid) random variables. We also obtain in that section the distribution of these new variables. Here, we have a remarkable difference with respect to the paper by [5]: there, the transformed variables were the square of standard normal variables, i.e., χ 2 distributed random variables, but here, we have the product of two different normal variables.
In Section 3, cross-variogram M-estimators based on the new variables are defined. The von Mises plus saddlepoint (VOM+SAD) approximations for their distributions are also obtained, approximations that are applied to the classical method-of-moment estimator in Section 4. This is the first time that a closed form approximation of its distribution is obtained. Simulations of approximation accuracy and lack of robustness of its distribution are included.
In Section 5, we define the α-trimmed cross-variogram estimator and we obtain the VOM+SAD approximation for its distribution. We do the same for the Huber's crossvariogram estimator in Section 6. We include here a simulation study to compare the robustness of the three estimators as we increase the degree of contamination. Section 7 is devoted to analyzing the dependence of the transformed variables on the linearized cross-variogram models. We conclude the paper with two examples of real data.
Finally, in Section 8, we give some conclusions, ending the paper with an Appendix, which contains the technical details obtained in the paper.

Preliminary Transformation
The usual dependence between spatial observations Z does not allow for the use of techniques for iid variables. Nevertheless, it is possible to skip this restriction by transforming the initial observations Z.
Namely, let us define the gap or lag variable W i s as The cross-variogram is now the mean of the product, and its classical estimator, the method-of-moments estimator, The sample mean of the variables X l = W i s l · W j s l , l = 1, ..., n, is non-robust then. This is the reason why we say that we use the location estimation way: the parameter is the mean, and the classical estimator is the sample mean. In this manner, instead of considering a weird estimator for a strange parameter of the initial distribution, we propose to transform the original (and usually dependent) observations Z l into new data X l (independent under some conditions) obtaining a natural parameter of the new variable (its mean) for which a manageable estimator (the sample mean) should be feasible. Then, standard techniques of robustification can be applied. This idea has been successfully applied, first, in [4] and in [5]. An important problem is determining the distribution of this new variable X l from the original normal (or contaminated normal) distribution of Z l to later obtain the distribution of the robust estimators obtained, where X l is now the product of two different normal variables.

Correlation between W i t and W j s
First, let us define two new functions that are natural extensions of the similar ones associated with the variogram. Let us call cross-covariogram between Z i and Z j to the function (provided it is well defined) Here, a will be t or t + h and b will be s or s + h, and thus, where the equality between the expectations is obtained because of the intrinsic stationary property of the components of Z.
Analogously, we assume the equality of the variances in locations that are distanced by a lag h, . Let us also define the cross-correlogram as Now, the covariance between W i t and W j s will be (see the Appendix A for details) Thus, the correlation between W i t and W j s will be zero if Because locations are fixed in advance (for instance, they could be sample stations) we assume that they are equally spaced on a transect, for instance, in Figure 2.1 of [1], i.e., they are data on a regular grid. Hence, we can match two contiguous Z i (for which the dependence is supposed to be the strongest), so that it is t + h = s. Now, the previous condition of correlation equal to zero is obtained if or, in terms of the cross-covariogram, when On the other hand, with a little of algebra, the cross-variogram can be expressed as (see Appendix A for details) and then, it will be CC ij (2h) = CC ij (0) − γ ij (2h).
Replacing these values of CC ij (h) and CC ij (2h) in (2), we obtain i.e., the correlation between W i t and W j s will be 0 when i.e., if a linear cross-variogram can be accepted as model (because, theoretically, the nugget is 0).

Remark 1.
The increments W i t and W j s have as joint cumulative distribution function, if they are uncorrelated, Hence, if W i t and W j s are uncorrelated, with probability 1 − , they are independent under model f N 1 and, with probability , they are independent under model f N 2 , being a mixture of independent variables. For this reason, these variables are considered in the paper as independent if they are uncorrelated, following the idea of [4].

Independence of the Observations X s
The method-of-moments estimator 2 γ ij (h) was expressed as the sample mean of the variables X l = W i s l · W j s l , l = 1, ..., n. Considering only two of them, X 1 = W i s 1 · W j s 1 and X 2 = W i s 2 · W j s 2 , if we can accept a linear variogram for the variable Z i and a linear variogram for the variable Z j , it was proved in [5] that W i s 1 will be independent of W i s 2 and that W j s 1 will be independent of W j s 2 , l = 1, ..., n. If, additionally, we can accept a linear cross-variogram for the couple (Z i , Z j ), the variables W i s 1 and W j s 2 , and W j s 1 and W i s 2 will be independent. As a conclusion, if we could accept a linear variogram for the variable Z i , a linear variogram for the variable Z j , and a linear cross-variogram for this pair, the variables X l = W i s l · W j s l , l = 1, ..., n, could be considered independent, a situation that we assume in the paper and to which we shall return later.

Distribution of the Transformed Variables
Therefore, the initial observations Z i , Z j , normal or contaminated normal distributed, are transformed into the lag variables W i s , W j s and, finally, into their product X s = W i s · W j s . The reason for this transformation is to express the classical estimator as a sample mean of independent variables (if linear variograms and cross-variogram can be accepted), obtaining a nice mathematical expression for the estimator, very useful in the definition of new robust estimators of of location and in the determination of its sample distribution, thanks to this location estimation way.
The problem is that, although, initially, the Z i are contaminated normal variables, after two transformations, we do not have normality in X s . In what follows, we obtain their distributions.
To obtain the distribution of 2 γ ij (h) we use two results from Nadarajah and Pongány (2016).
equal to the correlation coefficient between W i s / 2γ ii (h) and W j s / 2γ jj (h), usually shortened as ρ in the rest of the paper. Hence, it will be where P X is the cumulative distribution function for which the pdf is given by (3). The last equality, (5), is used as a notation.

Cross-Variogram M-Estimators
Because the method-of-moments estimator is the sample mean of the transformed variables X s , this estimator is robustified as it is the sample mean, but here, the model distribution of the observations is somewhat peculiar, with the computations being more elaborated.
Firstly, we define a large class of cross-variogram estimators for which their robustness can be controlled. We call cross-variogram M-estimators, with score function ψ : X × Θ −→ I R, to the solution of the equation: where X s are the variables previously considered and we assume that ψ(x, θ) is monotonically decreasing in θ for all x. In fact, T n is an estimator for a location problem, with ψ(x, θ) being of the form ψ(x − θ), with ψ(u) monotonically increasing in u, [11]. We can control the robustness of the cross-variogram M-estimators, choosing a bounded score function. Other robustness properties, such us the breakdown point, can also be applied to this class of estimators.

Von Mises Approximation for their Distributions
If T n (X 1 , ..., X n ) is an estimator where F is the underlying model distribution of the observations, the tail probability P F {T n > t} can be expressed at another model G using the von Mises expansion as [12][13][14]: where TAIF(x; t; T n , G) is Hampel's influence function of the tail probability functional, called tail area influence function [15] and defined as for all x ∈ R where the right-hand side exists. This influence function is calculated by changing the underlying model G using a contaminated model (1 − )G + δ x before computing the first derivative at = 0, with δ x being the distribution that assigns mass 1 at x.
If distributions F and G are close enough, we can use the von Mises approximation (VOM) to compute the distribution of T n under the underlying model F using model G.
The von Mises approximation (7) will be then Distribution G plays an important role in the VOM approximation because we can choose it such that we know the tail probability of the leading term, P G {T n > t}. Distribution G is called the pivotal distribution, and let us observe that TAIF is also computed for this pivotal distribution.

Saddlepoint Approximation of the TAIF
In order to use von Mises approximation (8) for location M-estimators, we compute a saddlepoint approximation (SAD) of the TAIF(x; t; T n , G) , using Lugannani and Rice's formula, [16] ( [17], p. 77, or better, [8], p. 314). We use the approximation given in [11] for M-estimators and, following the same computations as that in [18], pp. 402-404, we have that where φ is the density function of the standard normal distribution, and s and r 1 are the functionals s = −2nK(z 0 , t) being the cumulant generating function of distribution G; K (λ, t) being the second partial derivative of K(λ, t) with respect to the first variable λ; and z 0 being the saddlepoint, i.e., the solution of the saddlepoint equation Replacing the SAD approximation (9) in the VOM approximation (8), we obtain the VOM+SAD approximation for the distribution of the M-estimator T n (X 1 , ..., X n ), assuming that X i ≡ F = (1 − )G + H, which is the approximation that we use in what follows and where G and H are the distributions that appear in (5).
The VOM+SAD approximation will be accurate if distributions F and G are close. Nevertheless, if this is not the case, we can use an iterative procedure, as in [19][20][21], considering intermediate distributions between F and G.

Sample Distribution of the Method-of-Moments Estimator
Not all the cross-variogram M-estimators are robust. For instance, the classical methodof-moment estimator 2 γ ij (h) is not robust because its score function ψ(u) = u is not bounded. Nevertheless, we compute its VOM+SAD approximation to show its lack of robustness next and because its distribution will be useful in the determination of the distribution of some robust versions of it.
Due to 2 γ ij (h) being an M-estimator with score function ψ(x − θ) = x − t, we can use approximation (10). Its leading term is computed with respect to distribution where P X is the cumulative distribution function for which the pdf is p X , given by (3) in Proposition 2.
Thus, the leading term in (10) is where p X is the pdf given by (4) because, now, the previous tail probability is the tail probability of the sample mean of the product of two standard normal distributions. The rest of the elements in approximation (10)

Performance of the Theoretical Results with Simulations
We can see how accurate the VOM+SAD approximation is for the method-of-moments estimator with a simulation study, considering a sample size as small as n = 3. We considered a bivariate normal distribution with mean vector (0, 0) and covariance matrix such that 0.5 2 and 0.7 2 are the marginal variances and 0.3 the covariance for (W i s , W j s ). We consider four different situations: no contamination, contamination = 0.05, contamination = 0.1, and contamination = 0.2.
Under these conditions, we obtain Figure 1 in which we appreciate that the approximations are very good, especially in the tails, which are the areas of interest for tests and confidence intervals. We include in Table 1

Robustness of the Method-of-Moments-Estimator
We can observe the lack of robustness of the distribution of the method-of-momentsestimator in Figure 2 as we increase or g.
The programs in R, used to obtain this figure, are in the Supplementary Materials.

Remark 2.
The sample size N h , considered in each estimation, depends on the value of the lag h, that is fixed in advance. If h is small, the number of lags will be large and N h will be small. The VOM+SAD approximations obtained in the paper are very accurate, even in this case. Nevertheless, if h is large, the number of lags will be small and the sample size N h will be large. In this case, it is easier to compute the leading term as is the product of two standard normal variables with correlation coefficient ρ. The characteristic function of this product is (expression (4) in [10]) and then, the mean of this product variable X s is ϕ (0)/i = ρ and the second moment about the origin is ϕ (0)/i 2 = 1 + 2ρ 2 . Hence, the variance will be 1 + ρ 2 and the leading term can be computed if N h is large, as Since, if = 0 or g = 1, the scale contaminated normal distribution is just a normal distribution, this last expression is an approximation for the distribution of the classical method-of-moments estimator under the usual underlying normal distribution model.

α-Trimmed Cross-Variogram Estimator
Another robust estimator for the cross-variogram, which is not an M-estimator, can be obtained by trimming the X s observations as follows: Considering the initial pair of variables Z i and Z j , and transforming them to the s , if we trim the 100 · α% of the smallest and the 100 · α% of the largest ordered data X (i) , the (symmetrically) sample α-trimmed cross-variogram estimator is defined as stands for the integer part.
To obtain an approximation for its sample distribution, we use an accurate VOM+SAD approximation obtained in [21]. From Corollary 1 therein, we can approximate the small sample distribution of the sample α-trimmed cross-variogram 2 γ ij α (h) when the observations X i come from F = (1 − )G + H, with k iterations (k large), by the VOM+SAD approximation to the distribution of the method-of-moments-estimator 2 γ ij (h), obtained in the previous section, as In the bottom row of Figure 3, we plot the tail probability of the 0.2-trimmed crossvariogram estimator 2 γ ij α (h) with no contamination ( = 0) and with two percentages of contamination: = 0.15 and = 0.3, with the sample size being N h = 10. We observe in this figure that, as we increase the contamination percentage, i.e., as we increase , the tail probabilities obtained with the trimmed cross-variogram estimators are affected but by less than those obtained with the classical method-of-moments estimator. We see this by comparing the first row of figures (non-trimmed cross-variogram estimators) with the second row of figures (trimmed cross-variogram estimators).

Huber's Cross-Variogram Estimator
If the ψ function, ψ(x, t) = ψ(x − t), used to obtain the M-estimator in Equation (6) is the Huber's function ψ b (u) = min{b, max{u, −b}}, the M-estimator obtained is called the Huber's cross-variogram estimator, 2 γ ij H (h). Since its score function is bounded, this estimator will be robust.
An approximation for its distribution can be obtained from (10). Nevertheless, the leading term P G 2 γ ij H (h) > t is not easy to compute. For this reason, in this case, we use the Lugannani and Rice formula to approximate this leading term, the VOM+SAD approximation for the distribution of the Huber's cross-variogram estimator being the following: where the saddlepoint z 0 is such that with G and H being the distributions that appear in (5), and where all the functionals r, r 1 , and s are computed with respect to model G. This approximation may seem complicated but it is easy to compute using the huber function of the MASS library, [23]. Example 1. In order to analyze the behaviour of the robust estimators defined in the paper, we compare them with the classical method-of-moments estimator, carrying out a simulation study in which we compare the 0.1-trimmed and Huber's variogram estimators with the classical one.
The study consists of a simulation of two spatial and statistical correlated variables Z 1 and Z 2 , both with a normal distribution, in different situations, with some of them considered, for instance, in [4]: The details of the simulations are in the Supplementary Materials. In these simulations, we observe less sensitivity in the robust estimators than in the classical one, as we increase the contamination in the model. We appreciate this in Figures 4 and 5. In the first one, we observe that the classical variogram model can be accepted for the three estimations in case (A), where there is no contamination. Nevertheless, as we increase the contamination ( Figure 5), this variogram model does not represent the classic variogram estimations; only in some cases it represents the 0.1-trimmed variogram estimations, and it can be accepted when we consider Huber's variogram estimations except, perhaps, in the last case, where it is doubtful.

Linearized Version of the Cross-Variogram Model
We saw at the end of Section 2.2 that, if linear models can be accepted as variograms and cross-variograms, the variables X s can be considered independent. These linearized versions of the model (classical and robust) were introduced in Section 9 of [5] and can be applied to model the cross-variogram. They essentially consists of replacing, before the range, the increasing part of the traditional variogram, or cross-variogram, using the regression line and, after the range, using the sill (or the robust sample mean in the robust linearized version).
Additionally, the test defined in Section 10.1 of [5] can be used to check if these models can be accepted, using saddlepoint approximations for the robust (and classical) estimators of the variograms and cross-variograms.
Namely, we test the null hypothesis of a particular variogram or cross-variogram model M 0 (h) from which we obtain the theoretical variogram values 2γ ii (h) (or crossvariogram values 2γ ij (h))) using as test statistic assuming that we consider K lags. If we unify both as the cumulative distribution function of S n is (see [5]) probabilities that are computed with the VOM+SAD approximations.
We remark that the number K of lags (and hence the value of h) can be modified to obtain the desired linearity.

Example 2.
Let us consider prediction data, included in the jura data set from Pierre Goovaerts' book that contains geolocated information of several variables. This data set is called prediction.dat in the R library, gstat.
Two correlated variables, with a distribution similar to a scale contaminated normal model, are ln(Pb) (natural logarithm of Lead) and Ni (Nickel).
The values of the classical method-of-moments estimator, the 0.1-trimmed crossvariogram estimator, and the Huber's cross-variogram estimator (with tuning constant b = 1.5) are easily obtained for these variables, as can be seen in the Supplementary Materials. The lag distant chosen was h = 0.2. These values are shown in Figure 6.
To use their distributions, obtained in the paper, it is necessary to check if we can accept linear variograms for these two variables and a linear cross-variogram for the pair, as it was pointed out in Section 2.2. If this is the case, the variables X s , s = 1, ..., n can be considered independent.  Assuming as underlying model, a scale contaminated normal with = 0.01 and g = 1.1, the linearized versions of the variograms for the logarithm of Lead are shown in Figure 7. The linearized versions of the variograms for Nickel are shown in Figure 8.
Finally, the linearized versions for the cross-variograms models are shown in Figure 9.    From a visual point of view, all these linearized versions can be accepted using the test considered in Section 7. The values of the test statistics S n = sup h 2 γ(h) − 2γ(h) and the p-values are given in Table 3 (see the Supplementary Materials). Thus, the independence of the X s can be accepted. We conclude the paper with a real-data example in which we observe how robust cross-variogram estimations provide models less sensitive to outliers, which will lead us to a more robust cokriging.  [24][25][26].
Two of these 4 variables are strongly correlated and have a distribution similar to a scale contaminated normal model; they are NO and NO 2 .
The variogram-crossvariogram matrix of the classical variogram and cross-variogram estimators along with classical least squares model (Mather's model in this case) are shown in Figure 10.
The values of the classical method-of-moments estimator, the 0.1-trimmed crossvariogram estimator, and the Huber's cross-variogram estimator (with tuning constant b = 1.5) for these variables are obtained in the Supplementary Materials. These values are shown in Figure 11, along with the linearized cross-variogram models.
We observe that, at first lag, the three estimations agree. In the others, we can see the soft effect of the 0.1-trimmed cross-variogram and Huber's cross-variogram estimators.
The linearized versions of the variograms and cross-variogram can be accepted, and therefore, the independence of the transformed variables X s , s = 1, ..., n.
Moreover, we appreciate the influence of the outliers in the estimation of the (linearized) cross-variogram in Figure 11 and, therefore, on the cokringing obtained with classical cross-variogram models. Thus, the use of robust estimators of the cross-varogram will be more reasonable in order to obtain a robust cokriging.

Conclusions
In this paper, we introduced new robust cross-variogram estimators and we obtained saddlepoint approximations for their distributions when the underlying model is a scalecontaminated normal distribution. We also obtained an approximation for the distribution of the method-of-moments estimator.
These approximations are especially useful when the sample size is small, a situation that we have when the size of the lag h is small.
We also proposed a suitable transformation of the initial observations to avoid the traditional dependence of the spatial observations. We see that is that linear variograms and a linear cross-variogram can be accepted as models to obtain this. −E (Z i (s) · Z j (s + h) + E (Z i (s) · Z j (s) Proof of Proposition 1. If Z i is a variable with normal distribution, Z i ≡ N(µ i , σ 2 i ), where X ≡ H stands for "X is distributed as H"; then, it is W i s = (Z i (s + h) − Z i (s)) ≡ N(0, 2γ ii (h)) because of the intrinsic stationary property of Z i .
If Z i has a distribution (1 − )N(µ i , σ 2 i ) + N(µ i , g 2 σ 2 i ) = (1 − )N 1 + N 2 , the cumulative distribution function of W i s will be where Φ is the cumulative distribution function of the standard normal distribution.