Testing Goodness-of-Fit of Parametric Spatial Trends

The aim of this work is to propose and analyze the behavior of a test statistic to assess a parametric trend surface, that is, a regression model with spatially correlated errors. The asymptotic behavior under the null hypothesis, as well as the asymptotic power of the test under local alternatives, will be analyzed. Finite sample performance of the test is addressed by simulation, introducing a bootstrap calibration procedure.


Introduction
Consider a spatial stochastic process, which consists of a collection of random variables indexed on a certain domain of $\mathbb{R}^2$, with a well-defined joint distribution. In this framework, the observed data usually exhibit an important feature: close observations tend to be more similar than those which are far apart. Therefore, such observations cannot be treated as independent and the dependence structure should be taken into account in any descriptive or inferential procedure. In particular, from the perspective of spatial regression models (a trend surface plus an error term), the dependence structure should be considered and properly introduced into the model.
A common task in statistics is to determine whether a parametric model is an appropriate representation of a dataset. Under the assumption of independent errors, some authors have developed goodness-of-fit tests for parametric models that rely on a smooth alternative estimated by a nonparametric regression method, such as [1] or [2].
A new proposal for testing a parametric trend surface is given in this paper. The proposed test compares a smooth version of a parametric fit with a nonparametric estimator of the trend (specifically, the multivariate local linear estimator), and the discrepancy between the two is measured by a distance.

Statistical Model
Let $\{Z(s), s \in D\}$ be a spatial random process, that is, a collection of random variables indexed in a domain $D \subset \mathbb{R}^2$ with a well-defined joint distribution. Consider $n$ locations $\{s_1, \ldots, s_n\}$ in the region $D$ generated from a density $f$. The random variables corresponding to those locations are denoted by $\{Z(s_1), \ldots, Z(s_n)\}$. Assume the model

$$Z(s_i) = m(s_i) + \varepsilon(s_i), \quad i = 1, \ldots, n, \qquad (1)$$

where $m$ is an unknown smooth regression function which is supposed to be twice continuously differentiable. The $\varepsilon(s_i)$ are unobserved random variables with

$$E[\varepsilon(s_i)] = 0, \quad \mathrm{Cov}(\varepsilon(s_i), \varepsilon(s_j)) = \sigma^2 \rho_n(s_i - s_j),$$

where $\sigma^2 < \infty$ and $\rho_n$ is a continuous correlation function satisfying $\rho_n(0) = 1$, $\rho_n(s) = \rho_n(-s)$ and $|\rho_n(s)| \le 1$ for all $s$. The goal of this work is to test whether the trend function belongs to a parametric family:

$$H_0: m \in \{m_\beta, \beta \in B\} \quad \text{vs.} \quad H_a: m \notin \{m_\beta, \beta \in B\}, \qquad (2)$$

with $B \subset \mathbb{R}^p$ a compact set. A usual approach is to compare a smooth version of a parametric fit with a nonparametric estimator of $m(s)$, and to reject $H_0$ if the distance between the two fits exceeds a critical value.

Test Statistic
A suitable test statistic for the testing problem (2) is a weighted $L_2$-distance between the nonparametric and parametric fits, as in [2], in which the squared difference between the two fits is integrated against a weight function $w$. A full definition of the elements of the test statistic $T_n$ can be found in Appendix A. The critical values are calibrated with a bootstrap procedure; see Appendix B.
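As an illustration of what a weighted $L_2$-distance between two fitted surfaces involves, the following minimal sketch approximates the integral of the weighted squared difference by a midpoint rule. It assumes that the domain is the unit square and that both fits are available as functions of an array of locations; the names `weighted_l2_distance`, `fit_a`, `fit_b` and `n_side` are hypothetical, and any normalizing factor in $T_n$ is left out.

```python
import numpy as np

def weighted_l2_distance(fit_a, fit_b, weight, n_side=50):
    """Midpoint-rule approximation, over the unit square, of the
    integral of (fit_a(s) - fit_b(s))^2 * weight(s)."""
    axis = (np.arange(n_side) + 0.5) / n_side
    s1, s2 = np.meshgrid(axis, axis, indexing="ij")
    grid = np.column_stack([s1.ravel(), s2.ravel()])
    sq_diff = (fit_a(grid) - fit_b(grid)) ** 2
    # The mean equals the sum times the cell area 1 / n_side^2.
    return float(np.mean(sq_diff * weight(grid)))

# Toy check: the integral of s1^2 over the unit square is 1/3.
distance = weighted_l2_distance(
    fit_a=lambda g: g[:, 0],
    fit_b=lambda g: np.zeros(len(g)),
    weight=lambda g: np.ones(len(g)),
)
```

With $w(s) = 1$ the weight function plays no role; a non-constant weight could be used, for instance, to downweight regions near the boundary of the domain.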

Simulations
In this section, a simulation study showing the performance of the bootstrap procedure is presented. For this purpose, 500 samples of size $n = 400$ are generated from an isotropic spatial process observed at regularly spaced locations $\{s_1, \ldots, s_n\}$ in the unit square, where $s_i = (s_{i1}, s_{i2})$, $i = 1, \ldots, n$. The trend of the generating model depends on a constant $c$, with $c = 0$ corresponding to the null hypothesis of a linear trend. The random errors $\varepsilon(s_i)$ are normally distributed with zero mean and exponential covariance function $\mathrm{Cov}(\varepsilon(s_i), \varepsilon(s_j)) = \sigma^2 \exp(-\|s_i - s_j\|/a_e)$, with $\sigma = 0.4$ or $\sigma = 0.8$. Three values of the parameter $a_e$ are considered: $a_e = 0.4, 0.6, 0.8$. The bootstrap procedure uses $B = 500$ replicates for each sample. The weight function is $w(s) = 1$. For simplicity, the bandwidth matrix is taken as $H = \mathrm{diag}(h, h)$, with bandwidth values $h = 0.10, 0.15, 0.20$.
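The error structure of this design can be reproduced with a few lines of code. The sketch below draws one vector of zero-mean Gaussian errors with exponential covariance on a regular $20 \times 20$ grid ($n = 400$) through a Cholesky factorization. The exact placement of the grid inside the unit square (cell centers here), the seed, and the function names are assumptions made for illustration; the trend component of the generating model is not included.

```python
import numpy as np

def regular_grid(n_side):
    """Regularly spaced locations in the unit square, one row per location."""
    axis = (np.arange(n_side) + 0.5) / n_side
    s1, s2 = np.meshgrid(axis, axis, indexing="ij")
    return np.column_stack([s1.ravel(), s2.ravel()])

def exponential_covariance(locations, sigma, a_e):
    """Matrix with entries sigma^2 * exp(-||s_i - s_j|| / a_e)."""
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return sigma ** 2 * np.exp(-dist / a_e)

def simulate_errors(cov, rng):
    """Draw one zero-mean Gaussian vector with covariance `cov`."""
    chol = np.linalg.cholesky(cov)          # cov = chol @ chol.T
    return chol @ rng.standard_normal(cov.shape[0])

rng = np.random.default_rng(0)
locations = regular_grid(20)                # n = 400
cov = exponential_covariance(locations, sigma=0.4, a_e=0.6)
errors = simulate_errors(cov, rng)
```

Repeating the last line 500 times, and adding the trend, would give the Monte Carlo samples of one scenario of the study.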
In Table 1, the simulated rejection probabilities of $T_n$ over the 500 trials are presented for the significance level $\alpha = 0.05$. When $c = 0$ (under the null hypothesis of linearity of the trend), the proportion of rejections is similar to the nominal level, although it depends on the value of the bandwidth $h$. When $c = 5$ or $c = 10$, the test has high power: the proportion of rejections is close to one in most cases. Again, this proportion depends on the value of the bandwidth.

Conflicts of Interest:
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
The trend surface can be estimated with a parametric or a nonparametric approach. In the parametric context, an iterative estimation procedure can be used. Denoting $Z = (Z(s_1), \ldots, Z(s_n))^\top$ and $m_\beta = (m_\beta(s_1), \ldots, m_\beta(s_n))^\top$, under $H_0$ the steps of the procedure are:

(1) Based on the sample, estimate the trend parameter $\beta$ using the ordinary least squares estimator, ignoring the dependence structure of the errors:

$$\tilde{\beta} = \arg\min_{\beta \in B} (Z - m_\beta)^\top (Z - m_\beta).$$

(2) Estimate the variance-covariance matrix of the errors $\Sigma$ using the residuals $\tilde{\varepsilon}(s_i) = Z(s_i) - m_{\tilde{\beta}}(s_i)$, $i = 1, \ldots, n$, obtained from the estimator of the trend in Step (1). Note that the entries of $\Sigma$ are

$$\Sigma(i, j) = \mathrm{Cov}(\varepsilon(s_i), \varepsilon(s_j)).$$

(3) Estimate the trend parameter $\beta$ using the weighted least squares estimator, taking the dependence structure of the errors into account:

$$\hat{\beta} = \arg\min_{\beta \in B} (Z - m_\beta)^\top \hat{\Sigma}^{-1} (Z - m_\beta).$$

Therefore, the parametric trend estimator considered is $m_{\hat{\beta}}$. Note that an estimate of $\Sigma$ can be obtained from the residuals $\tilde{\varepsilon}(s_i)$, $i = 1, \ldots, n$, as follows:

$$\hat{\Sigma}(i, j) = \hat{\sigma}^2 - \gamma_{\hat{\theta}_{LS}}(s_i - s_j),$$

where $\gamma_{\hat{\theta}_{LS}}$ is the parametric least squares estimator of the variogram and $\hat{\sigma}^2$ is an estimator of the variance. The latter can be obtained using a least squares procedure.
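The three steps above can be sketched in code for the special case of a linear trend, where both least squares problems have closed-form solutions. This is only an illustration under stated assumptions: the linear family $m_\beta(s) = \beta_0 + \beta_1 s_1 + \beta_2 s_2$, the toy coefficients, and the function names are hypothetical, and Step (2) is replaced by a simple stand-in (`plug_in_cov`, an exponential covariance with a fixed parameter) rather than the variogram-based least squares estimator described in the text.

```python
import numpy as np

def design_matrix(locations):
    """Assumed parametric family: m_beta(s) = b0 + b1*s1 + b2*s2."""
    return np.column_stack([np.ones(len(locations)), locations])

def ols(X, z):
    """Step (1): least squares, ignoring the dependence of the errors."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def gls(X, z, cov):
    """Step (3): weighted least squares with an estimate of Sigma."""
    cov_inv_X = np.linalg.solve(cov, X)
    cov_inv_z = np.linalg.solve(cov, z)
    return np.linalg.solve(X.T @ cov_inv_X, X.T @ cov_inv_z)

def plug_in_cov(locations, residuals, a_e=0.6):
    """Hypothetical stand-in for Step (2): exponential covariance with the
    variance taken from the residuals and the parameter a_e held fixed."""
    diff = locations[:, None, :] - locations[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    return residuals.var() * np.exp(-dist / a_e)

def iterative_fit(locations, z, estimate_cov=plug_in_cov):
    X = design_matrix(locations)
    beta_ols = ols(X, z)                              # Step (1)
    residuals = z - X @ beta_ols
    cov_hat = estimate_cov(locations, residuals)      # Step (2)
    return gls(X, z, cov_hat)                         # Step (3)

# Toy data on a 10 x 10 grid with arbitrary coefficients and small noise.
rng = np.random.default_rng(1)
axis = np.linspace(0.05, 0.95, 10)
s1, s2 = np.meshgrid(axis, axis, indexing="ij")
locations = np.column_stack([s1.ravel(), s2.ravel()])
beta_true = np.array([2.0, 1.0, 1.0])
z = design_matrix(locations) @ beta_true + 0.001 * rng.standard_normal(100)
beta_hat = iterative_fit(locations, z)
```

Passing the covariance estimator as an argument keeps the three steps separate, so the stand-in can be swapped for a variogram-based estimator without touching Steps (1) and (3).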
From a nonparametric point of view, model (1) has been studied by several authors, and kernel-based methods are among the approaches used for this task. Here, the trend is estimated using the multivariate local linear estimator; see [3]. In the spatial framework, the local linear estimator of $m(s)$ at a location $s$ can be written explicitly as

$$\hat{m}^{LL}_{H}(s) = e_1^\top (X_s^\top W_s X_s)^{-1} X_s^\top W_s Z,$$

where $e_1 = (1, 0, 0)^\top$, $X_s$ is an $n \times 3$ matrix whose $i$-th row equals $(1, (s_i - s)^\top)$, $i = 1, \ldots, n$, $W_s = \mathrm{diag}\{K_H(s_1 - s), \ldots, K_H(s_n - s)\}$, and $K_H(s) = |H|^{-1} K(H^{-1} s)$ is used to assign weights. $H$ is a $2 \times 2$ symmetric, positive definite matrix depending on the sample size $n$, and $K$ is a multivariate kernel function. Given $s$, the bandwidth $H$ controls the shape and the size of the local neighborhood used to estimate $m$. Taking these estimators into account, the proposed test statistic $T_n$ is a weighted $L_2$-distance based on $\int_D (\hat{m}^{LL}_{H}(s) - \hat{m}^{LL}_{H,\hat{\beta}}(s))^2 w(s)\,ds$, where $w$ is a weight function and $\hat{m}^{LL}_{H,\hat{\beta}}$ is a smooth version of the parametric estimator $m_{\hat{\beta}}$, namely

$$\hat{m}^{LL}_{H,\hat{\beta}}(s) = e_1^\top (X_s^\top W_s X_s)^{-1} X_s^\top W_s m_{\hat{\beta}},$$

where $m_{\hat{\beta}} = (m_{\hat{\beta}}(s_1), \ldots, m_{\hat{\beta}}(s_n))^\top$.
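A minimal sketch of the local linear estimator at a single location follows, using the matrices $X_s$ and $W_s$ defined above. The kernel $K$ is not specified in the text, so a bivariate standard Gaussian kernel is assumed here; the function names and the toy data are likewise hypothetical. Applying the same function to the vector of parametric fitted values instead of the observations gives the smooth version of the parametric fit.

```python
import numpy as np

def gaussian_kernel(u):
    """Bivariate standard Gaussian kernel at the rows of u (assumed choice of K)."""
    return np.exp(-0.5 * (u ** 2).sum(axis=1)) / (2.0 * np.pi)

def local_linear(s, locations, values, H):
    """Local linear fit at s: e_1' (X_s' W_s X_s)^{-1} X_s' W_s values."""
    centered = locations - s                          # rows s_i - s
    X_s = np.column_stack([np.ones(len(locations)), centered])
    # K_H(u) = |H|^{-1} K(H^{-1} u)
    u = np.linalg.solve(H, centered.T).T
    weights = gaussian_kernel(u) / np.linalg.det(H)
    XtW = X_s.T * weights                             # X_s' W_s without building diag(W_s)
    coef = np.linalg.solve(XtW @ X_s, XtW @ values)
    return coef[0]                                    # e_1 selects the fitted value at s

# Toy check on a 20 x 20 grid: a linear surface is reproduced exactly.
axis = (np.arange(20) + 0.5) / 20
s1, s2 = np.meshgrid(axis, axis, indexing="ij")
locations = np.column_stack([s1.ravel(), s2.ravel()])
values = 2.0 + locations[:, 0] + locations[:, 1]
H = np.diag([0.15, 0.15])
fit = local_linear(np.array([0.3, 0.7]), locations, values, H)
```

The toy check reflects a known property of local linear smoothing: when the data lie exactly on a plane, the weighted least squares fit recovers that plane whatever the bandwidth, so `fit` equals $2 + 0.3 + 0.7$ up to rounding error.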

Appendix B
Once a suitable test statistic is available, a crucial task is the calibration of critical values for a given level $\alpha$, namely $t_\alpha$. Usually, the critical value $t_\alpha$ such that $P_{H_0}(T_n \ge t_\alpha) = \alpha$ can be estimated by means of the asymptotic distribution of the statistic. The use of asymptotic theory to calibrate the test poses some problems, such as the need to estimate some nuisance functions and a slow convergence rate to the limit distribution. Under these circumstances, calibration can be done by means of resampling procedures, such as the bootstrap; see [4].
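In resampling-based calibration, the unknown critical value is replaced by an empirical quantile of statistics recomputed on resampled data. The sketch below shows only this last step, assuming that the bootstrap statistics have already been computed and stored in an array; the names `boot_stats` and `bootstrap_test` and the toy numbers are hypothetical, and the way the bootstrap samples are generated is not covered.

```python
import numpy as np

def bootstrap_test(observed_stat, boot_stats, alpha=0.05):
    """Reject when the observed statistic exceeds the empirical
    (1 - alpha) quantile of the bootstrap statistics."""
    critical_value = float(np.quantile(boot_stats, 1.0 - alpha))
    return bool(observed_stat > critical_value), critical_value

# Toy bootstrap statistics 1, 2, ..., 100 and an observed value of 97.
boot_stats = np.arange(1.0, 101.0)
reject, critical_value = bootstrap_test(97.0, boot_stats, alpha=0.05)
```

Averaging the rejection indicator over many simulated samples gives an estimate of the rejection probability of the test at level $\alpha$.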
The procedure consists in generating a bootstrap sample $\{Z^*(s_i), i = 1, \ldots, n\}$ and then computing a bootstrap statistic $T_n^*$, defined like $T_n$, from the squared deviation between the smooth version of the parametric fit $\hat{m}^{LL}_{\hat{\beta}^*}$ and the nonparametric fit $\hat{m}^{*LL}$. Once the bootstrap statistic is computed, the distribution of $T_n^*$ can be approximated by Monte Carlo. From this Monte Carlo approximation, the $(1 - \alpha)$ quantile $t^*_\alpha$ is obtained and the parametric hypothesis is rejected if $T_n > t^*_\alpha$. The specific steps for the algorithm used in this work are the following: