Testing Goodness-of-Fit of Parametric Spatial Trends †

Extended Abstract (Open Access)

Andrea Meilán-Vila ¹,*, Jean Opsomer ², Mario Francisco-Fernández ¹ and Rosa M. Crujeiras ³

¹ Departamento de Matemáticas, Universidade da Coruña, 15071 A Coruña, Spain

² Westat Inc., Rockville, MD 20850, USA

³ Departamento de Estadística, Análisis Matemático y Optimización, Universidade de Santiago de Compostela, 15782 Santiago de Compostela, Spain

* Author to whom correspondence should be addressed.

† Presented at the XoveTIC Congress, A Coruña, Spain, 27–28 September 2018.

Proceedings 2018, 2(18), 1185; https://doi.org/10.3390/proceedings2181185

Published: 17 September 2018

(This article belongs to the Proceedings of XoveTIC Congress 2018)


Abstract

The aim of this work is to propose and analyze the behavior of a test statistic to assess a parametric trend surface, that is, a regression model with spatially correlated errors. The asymptotic behavior under the null hypothesis, as well as the asymptotic power of the test under local alternatives will be analyzed. Finite sample performance of the test is addressed by simulation, introducing a bootstrap calibration procedure.

Keywords: model checking; spatial trend; local linear regression; least squares; bootstrap

1. Introduction

Consider a spatial stochastic process, which consists of a collection of random variables indexed on a certain domain of $\mathbb{R}^{2}$, with a well-defined joint distribution. In this framework, the observed data usually exhibit an important feature: close observations tend to be more similar than those which are far apart. Therefore, such observations cannot be treated as independent and the dependence structure should be taken into account in any descriptive or inferential procedure. In particular, from the perspective of spatial regression models (a trend surface plus an error term), the dependence structure should be considered and properly introduced into the model.

A common task in statistics is to determine whether a parametric model is an appropriate representation of a dataset. Under the assumption of independent errors, some authors have developed goodness-of-fit tests for parametric models that rely on a smooth alternative estimated by a nonparametric regression method, such as those in [1] or [2].

A new proposal for testing a parametric trend surface is given in this paper. The proposed test compares, in terms of a distance, a smooth version of a parametric fit with a nonparametric estimator of the trend (specifically, the multivariate local linear estimator).

2. Statistical Model

Let $\{Z(s), s \in D\}$ be a random spatial process consisting of a collection of random variables indexed in a domain $D \subset \mathbb{R}^{2}$ with a well-defined joint distribution. Consider $n$ locations $\{s_{1}, \dots, s_{n}\}$ in the region $D$, generated from a density $f$. The set of random variables corresponding to those locations is denoted by $\{Z(s_{1}), \dots, Z(s_{n})\}$. Assume the model

$$Z(s_{i}) = m(s_{i}) + ε(s_{i}), \quad i = 1, \dots, n, \tag{1}$$

where $m$ is an unknown smooth regression function, assumed to be twice continuously differentiable. The $ε(s_{i})$ are unobserved random variables with

$$\mathrm{E}[ε(s_{i})] = 0, \quad \mathrm{Cov}(ε(s_{i}), ε(s_{j})) = σ^{2} ρ_{n}(s_{i} - s_{j}), \quad i, j = 1, \dots, n,$$

where $σ^{2} < \infty$ and $ρ_{n}$ is a continuous correlation function satisfying $ρ_{n}(0) = 1$, $ρ_{n}(s) = ρ_{n}(-s)$ and $|ρ_{n}(s)| \leq 1$, $\forall s$. The goal of this work is to test whether the trend function belongs to a parametric family:

$$H_{0}: m \in M_{β} = \{m_{β}, β \in B\}, \quad \text{vs.} \quad H_{a}: m \notin M_{β}, \tag{2}$$

with $B \subset \mathbb{R}^{p}$ a compact set. A common approach is to compare a smooth version of a parametric fit with a nonparametric estimator of $m(s)$, and then to reject $H_{0}$ if the distance between both fits exceeds a critical value.

3. Test Statistic

A suitable test statistic for the testing problem (2) can be computed as a weighted $L_{2}$-distance between the nonparametric and parametric fits, as in [2]:

$$T_{n} = n |H|^{1/2} \int_{D} \left(\hat{m}_{H}^{LL}(s) - \hat{m}_{H, \hat{β}}^{LL}(s)\right)^{2} w(s) \, ds, \tag{3}$$

where $w$ is a weight function. A full definition of the elements of the test statistic $T_{n}$ can be found in Appendix A. The critical values are calibrated with a bootstrap procedure, described in Appendix B.

4. Simulations

In this section, a simulation study showing the performance of the bootstrap procedure is presented. For this purpose, 500 samples of size $n = 400$ are generated from an isotropic spatial process observed at regularly spaced locations $\{s_{1}, \dots, s_{n}\}$ in the unit square, where $s_{i} = (s_{i1}, s_{i2})$, $i = 1, \dots, n$:

$$Z(s_{i}) = 2 + s_{i1} + s_{i2} + c s_{i1}^{3} + ε(s_{i}), \quad 1 \leq i \leq n. \tag{4}$$

The random errors $ε(s_{i})$ are normally distributed with zero mean and exponential covariance function

$$\mathrm{Cov}(ε(s_{i}), ε(s_{j})) = σ^{2} \exp(-\| s_{i} - s_{j} \| / a_{e}),$$

with $σ = 0.4$ and $σ = 0.8$. Different values of the parameter $a_{e}$ are considered: $a_{e} = 0.4, 0.6, 0.8$. The bootstrap procedure was performed using $B = 500$ replicates for each sample. The weight function was taken as $w(s) = 1$. For simplicity, the bandwidth matrix was taken to be $H = \mathrm{diag}(h, h)$, and different bandwidth values were chosen: $h = 0.10, 0.15, 0.20$.
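As an illustration of this design, the following minimal sketch draws one sample from model (4). It assumes a regular 20 × 20 grid that includes the boundary of the unit square (the text only says "regularly spaced") and generates the Gaussian errors through a Cholesky factor of the exponential covariance matrix; the function name `simulate_sample` and its arguments are hypothetical.

```python
import numpy as np

def simulate_sample(n_side=20, c=0.0, sigma=0.4, a_e=0.4, rng=None):
    """Draw one realization of model (4) on an n_side x n_side grid."""
    rng = np.random.default_rng() if rng is None else rng
    # Regularly spaced locations in the unit square (grid layout is an assumption).
    g = np.linspace(0.0, 1.0, n_side)
    s1, s2 = np.meshgrid(g, g, indexing="ij")
    S = np.column_stack([s1.ravel(), s2.ravel()])            # shape (n, 2)
    # Exponential covariance: sigma^2 * exp(-||s_i - s_j|| / a_e).
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    Sigma = sigma**2 * np.exp(-dist / a_e)
    # Correlated Gaussian errors: eps = L u with Sigma = L L' and u ~ N(0, I).
    L = np.linalg.cholesky(Sigma)
    eps = L @ rng.standard_normal(S.shape[0])
    trend = 2.0 + S[:, 0] + S[:, 1] + c * S[:, 0] ** 3
    return S, trend + eps, Sigma

# One sample under the alternative c = 5 (the seed is fixed only for reproducibility).
S, Z, Sigma = simulate_sample(c=5.0, rng=np.random.default_rng(0))
```

Setting `c=0.0` gives a sample under the null hypothesis of a linear trend.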

Table 1 presents the simulated rejection probabilities obtained for $T_{n}$ at the significance level $α = 0.05$ over the 500 trials. When $c$ is equal to zero (under the null hypothesis of linearity of the trend), the proportion of rejections is similar to the nominal significance level, although it depends on the value of the bandwidth $h$. When $c$ is equal to 5 or 10, the test shows high power: the proportion of rejections lies between roughly 0.80 and 0.95 across the settings in Table 1. Again, this proportion depends on the value of the bandwidth.
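To give a rough sense of what "similar to the significance level" means with 500 trials, the sketch below computes the Monte Carlo standard error of a rejection proportion and a normal-approximation band around $α = 0.05$. This band is an added reading aid and not part of the original study; the null-case proportions are copied from Table 1.

```python
import math

alpha, n_trials = 0.05, 500
# Standard error of an estimated proportion when the true level is alpha.
se = math.sqrt(alpha * (1 - alpha) / n_trials)
lower, upper = alpha - 1.96 * se, alpha + 1.96 * se

# Rejection proportions for c = 0, copied from Table 1.
null_rates = [0.052, 0.047, 0.042, 0.054, 0.042, 0.034, 0.068, 0.048, 0.038]
inside = [lower <= r <= upper for r in null_rates]
print(f"se = {se:.4f}, band = [{lower:.3f}, {upper:.3f}], all inside: {all(inside)}")
```

Under this approximation, all nine null-case proportions fall inside the band, which is consistent with the reported agreement with the nominal level.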

Funding

This research has received financial support from the Xunta de Galicia and the European Union (European Social Fund-ESF). This research has been partially supported by MINECO grants MTM2014-52876-R, MTM2016-76969-P and MTM2017-82724-R and by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2016-015 and Centro Singular de Investigación de Galicia ED431G/01), all of them through the ERDF.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

The trend surface can be estimated using either a parametric or a nonparametric approach. In the parametric context, an iterative estimation procedure can be used. Denoting $Z = (Z(s_{1}), \dots, Z(s_{n}))'$ and $m_{β} = (m_{β}(s_{1}), \dots, m_{β}(s_{n}))'$, under $H_{0}$ the steps of the procedure are:

(1) Based on the sample, estimate the trend parameter $β$ using the ordinary least squares estimator, ignoring the dependence structure of the errors:

$$\tilde{β} = \arg\min_{β} (Z - m_{β})' (Z - m_{β}).$$

(2) Estimate the variance-covariance matrix of the errors $Σ$ using the residuals $\tilde{ε}(s_{i}) = Z(s_{i}) - m_{\tilde{β}}(s_{i})$, $i = 1, \dots, n$, obtained from the estimator of the trend from Step (1). Note that the entries of $Σ$ are

$$Σ(i, j) = C_{θ}(s_{i} - s_{j}), \quad i, j = 1, \dots, n,$$

where $C_{θ}(s_{i} - s_{j}) = σ^{2} - γ_{θ}(s_{i} - s_{j})$ and $\{2 γ_{θ}(u) : θ \in Θ \subset \mathbb{R}^{q}\}$ is a valid parametric family used to estimate the variogram function.

(3) Estimate the trend parameter $β$ using the weighted least squares estimator, taking the dependence structure of the errors into account:

$$\hat{β} = \arg\min_{β} (Z - m_{β})' \tilde{Σ}^{-1} (Z - m_{β}).$$

Therefore, the parametric trend estimator considered is $m_{\hat{β}}$. Note that an estimate of $Σ$ can be obtained from the residuals $\tilde{ε}(s_{i})$, $i = 1, \dots, n$, as follows:

$$\tilde{Σ}(i, j) = C_{\tilde{θ}_{LS}}(s_{i} - s_{j}) = \tilde{σ}^{2} - γ_{\tilde{θ}_{LS}}(s_{i} - s_{j}), \quad i, j = 1, \dots, n,$$

where $γ_{\tilde{θ}_{LS}}$ is the parametric least squares estimator of the variogram and $\tilde{σ}^{2}$ is an estimator of the variance. The latter can also be obtained using a least squares procedure.
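Steps (1) to (3) can be sketched in a few lines for the special case of a linear trend $m_{β}(s) = β_{0} + β_{1} s_{1} + β_{2} s_{2}$, where both minimizations have closed forms. The sketch makes several assumptions that the text does not specify: an exponential semivariogram family $γ_{θ}(u) = σ^{2}(1 - \exp(-\|u\|/a))$, a binned empirical semivariogram of the residuals, and a coarse grid search over $a$ as the least squares fit. The function name and its arguments are hypothetical.

```python
import numpy as np

def fit_linear_trend(S, Z, n_bins=15, max_dist=0.5):
    """OLS fit, least squares variogram fit, then weighted (generalized) LS refit."""
    n = len(Z)
    X = np.column_stack([np.ones(n), S])                  # design of the linear trend
    # Step (1): ordinary least squares, ignoring the dependence.
    beta_ols, *_ = np.linalg.lstsq(X, Z, rcond=None)
    res = Z - X @ beta_ols
    # Step (2): empirical semivariogram of the residuals, averaged in distance bins.
    dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    iu = np.triu_indices(n, k=1)
    d = dist[iu]
    sq = 0.5 * (res[iu[0]] - res[iu[1]]) ** 2
    edges = np.linspace(0.0, max_dist, n_bins + 1)
    idx = np.digitize(d, edges) - 1
    keep = [b for b in range(n_bins) if np.any(idx == b)]
    d_bar = np.array([d[idx == b].mean() for b in keep])
    g_hat = np.array([sq[idx == b].mean() for b in keep])
    # Least squares fit of sigma^2 * (1 - exp(-d / a)); sigma^2 is closed-form given a.
    best = None
    for a in np.linspace(0.05, 2.0, 40):
        basis = 1.0 - np.exp(-d_bar / a)
        sigma2 = max(basis @ g_hat / (basis @ basis), 1e-8)
        sse = np.sum((g_hat - sigma2 * basis) ** 2)
        if best is None or sse < best[0]:
            best = (sse, sigma2, a)
    _, sigma2, a = best
    Sigma = sigma2 * np.exp(-dist / a)                    # C = sigma^2 - gamma
    # Step (3): minimize (Z - X b)' Sigma^{-1} (Z - X b).
    Si_X = np.linalg.solve(Sigma, X)
    beta_gls = np.linalg.solve(X.T @ Si_X, Si_X.T @ Z)
    return beta_ols, beta_gls, Sigma

# Illustrative data: linear surface plus independent noise on a 10 x 10 grid.
rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 10)
S = np.array([(u, v) for u in g for v in g])
Z = 2.0 + S[:, 0] + S[:, 1] + 0.4 * rng.standard_normal(len(S))
beta_ols, beta_gls, Sigma_tilde = fit_linear_trend(S, Z)
```

For a nonlinear parametric family $m_{β}$, the two closed-form solves would have to be replaced by a numerical optimizer.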

From a nonparametric point of view, model (1) has been studied by several authors, often using kernel-based methods. Here, the trend is estimated using the multivariate local linear estimator; see [3]. In the spatial framework, the local linear estimator for $m(s)$ at a location $s$ can be explicitly written as

$$\hat{m}_{H}^{LL}(s) = e_{1}' (X_{s}' W_{s} X_{s})^{-1} X_{s}' W_{s} Z,$$

where $e_{1} = (1, 0, 0)'$, $X_{s}$ is an $n \times 3$ matrix whose $i$-th row equals $(1, (s_{i} - s)')$, $i = 1, \dots, n$, and $W_{s} = \mathrm{diag}\{K_{H}(s_{1} - s), \dots, K_{H}(s_{n} - s)\}$, where $K_{H}(s) = |H|^{-1} K(H^{-1} s)$ is used to assign weights. $H$ is a $2 \times 2$ symmetric, positive definite matrix depending on the sample size $n$, and $K$ is a multivariate kernel function. Given $s$, the bandwidth $H$ controls the shape and the size of the local neighborhood used to estimate $m$.
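The closed-form expression above translates almost directly into code. The following minimal sketch assumes $H = \mathrm{diag}(h, h)$ and a standard bivariate Gaussian kernel; the kernel choice and the function name `local_linear` are assumptions, since the text leaves $K$ unspecified.

```python
import numpy as np

def local_linear(s, S, Z, h):
    """Local linear estimate of m at location s, with H = diag(h, h)."""
    U = S - s                                       # rows are (s_i - s)'
    X = np.column_stack([np.ones(len(S)), U])       # n x 3 matrix X_s
    # K_H(u) = |H|^{-1} K(H^{-1} u) with a Gaussian K (assumed kernel).
    k = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1)) / (2.0 * np.pi * h**2)
    XtW = X.T * k                                   # X_s' W_s without forming W_s
    coef = np.linalg.solve(XtW @ X, XtW @ Z)
    return coef[0]                                  # e_1' selects the local intercept

# Noiseless linear surface on a 20 x 20 grid, used only to illustrate the call.
g = np.linspace(0.0, 1.0, 20)
S = np.array([(u, v) for u in g for v in g])
Z = 2.0 + S[:, 0] + S[:, 1]
m_hat = local_linear(np.array([0.3, 0.7]), S, Z, h=0.15)
```

A useful sanity check is that a local linear fit reproduces a linear surface exactly, so `m_hat` here should match $2 + 0.3 + 0.7$ up to rounding error.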

Taking these estimators into account, the proposed test statistic is

$$T_{n} = n |H|^{1/2} \int_{D} \left(\hat{m}_{H}^{LL}(s) - \hat{m}_{H, \hat{β}}^{LL}(s)\right)^{2} w(s) \, ds,$$

where $w$ is a weight function and $\hat{m}_{H, \hat{β}}^{LL}$ is a smooth version of the parametric estimator $m_{\hat{β}}$, defined by

$$\hat{m}_{H, \hat{β}}^{LL}(s) = e_{1}' (X_{s}' W_{s} X_{s})^{-1} X_{s}' W_{s} m_{\hat{β}},$$

where $m_{\hat{β}} = (m_{\hat{β}}(s_{1}), \dots, m_{\hat{β}}(s_{n}))'$.
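Because both fits apply the same linear smoother, their difference equals the smoother applied to $Z - m_{\hat{β}}$, which makes $T_{n}$ straightforward to approximate numerically. The sketch below assumes $D = [0, 1]^{2}$, $w(s) = 1$, $H = \mathrm{diag}(h, h)$ (so $|H|^{1/2} = h$), a Gaussian kernel, and a midpoint Riemann sum for the integral; the function names and the grid size are hypothetical choices.

```python
import numpy as np

def smoother_rows(grid, S, h):
    """Rows e_1' (X_s' W_s X_s)^{-1} X_s' W_s, one for each location s in grid."""
    rows = np.empty((len(grid), len(S)))
    for j, s in enumerate(grid):
        U = S - s
        X = np.column_stack([np.ones(len(S)), U])
        k = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1)) / (2.0 * np.pi * h**2)
        XtW = X.T * k
        rows[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return rows

def t_n_statistic(S, Z, m_beta, h, n_grid=25):
    """Approximate T_n on the unit square with w(s) = 1 and H = diag(h, h)."""
    # Midpoints of an n_grid x n_grid partition of the unit square.
    g = (np.arange(n_grid) + 0.5) / n_grid
    g1, g2 = np.meshgrid(g, g, indexing="ij")
    grid = np.column_stack([g1.ravel(), g2.ravel()])
    A = smoother_rows(grid, S, h)
    diff = A @ (Z - m_beta)          # nonparametric fit minus smoothed parametric fit
    cell_area = 1.0 / n_grid**2
    return len(Z) * h * np.sum(diff**2) * cell_area

# Illustration on a 10 x 10 grid with a fitted trend that is off by a constant.
g = np.linspace(0.0, 1.0, 10)
S = np.array([(u, v) for u in g for v in g])
m_beta = 2.0 + S[:, 0] + S[:, 1]
T_shift = t_n_statistic(S, m_beta + 1.0, m_beta, h=0.2)
```

Since the local linear smoother reproduces constants, a unit shift gives a squared difference of one everywhere, so `T_shift` should be close to $n h = 20$; this gives a quick correctness check for an implementation.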

Appendix B

Once a suitable test statistic is available, a crucial task is the calibration of the critical value $t_{α}$ for a given level $α$. Usually, the critical value $t_{α}$ such that $P_{H_{0}}(T_{n} \geq t_{α}) = α$ can be estimated by means of the asymptotic distribution. However, using asymptotic theory to calibrate the test poses some problems, such as the need to estimate nuisance functions and a slow rate of convergence to the limit distribution. Under these circumstances, calibration can be done by means of resampling procedures, such as the bootstrap; see [4].

The procedure consists of generating a bootstrap sample $\{Z^{*}(s_{i}), i = 1, \dots, n\}$ and then computing a bootstrap statistic $T_{n}^{*}$, analogous to $T_{n}$, from the squared deviation between the smooth version of the parametric fit $\hat{m}_{\hat{β}^{*}}^{LL}$ and the nonparametric fit $\hat{m}^{*LL}$. Once the bootstrap statistic is computed, the distribution of $T_{n}^{*}$ can be approximated by Monte Carlo. From this Monte Carlo approximation, the $(1 - α)$ quantile $t_{α}^{*}$ is obtained, and the parametric hypothesis is rejected if $T_{n} > t_{α}^{*}$. The specific steps of the algorithm used in this work are the following:

(1) Obtain the parametric trend estimator $\hat{β}$.
(2) Estimate the covariance matrix of the errors, $\hat{Σ}$, based on the residuals $\hat{ε} = (\hat{ε}(s_{1}), \dots, \hat{ε}(s_{n}))'$, where $\hat{ε}(s_{i}) = Z(s_{i}) - m_{\hat{β}}(s_{i})$, $i = 1, \dots, n$, and find the matrix $L$ such that $\hat{Σ} = L L'$, using the Cholesky decomposition.
(3) Compute the independent residuals $e = (e(s_{1}), \dots, e(s_{n}))'$, given by $e = L^{-1} \hat{ε}$.
(4) Center these independent variables and draw from them an independent bootstrap sample of size $n$, denoted by $e^{*} = (e^{*}(s_{1}), \dots, e^{*}(s_{n}))'$.
(5) Finally, the bootstrap errors $ε^{*} = (ε^{*}(s_{1}), \dots, ε^{*}(s_{n}))'$ are given by $ε^{*} = L e^{*}$, and the bootstrap sample is $Z^{*}(s_{i}) = m_{\hat{β}}(s_{i}) + ε^{*}(s_{i})$, $i = 1, \dots, n$.
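The resampling steps above, together with the Monte Carlo approximation of the critical value, can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact implementation used in the study: the parametric fit is a plain least squares projection onto a linear trend rather than the iterative estimator of Appendix A, $\hat{Σ}$ is treated as a known exponential covariance matrix, the kernel is Gaussian, and the integral in the statistic is replaced by an average over the sample locations. All function names are hypothetical.

```python
import numpy as np

def smoother_matrix(S, h):
    """Local linear smoother at the sample locations (Gaussian kernel, H = diag(h, h))."""
    n = len(S)
    A = np.empty((n, n))
    for j, s in enumerate(S):
        U = S - s
        X = np.column_stack([np.ones(n), U])
        k = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1))   # constant factor cancels
        XtW = X.T * k
        A[j] = np.linalg.solve(XtW @ X, XtW)[0]
    return A

def bootstrap_sample(fitted, res, L, rng):
    """Steps (3) to (5): whiten, center, resample and recolor the residuals."""
    e = np.linalg.solve(L, res)                  # e = L^{-1} eps_hat
    e = e - e.mean()                             # centering
    e_star = rng.choice(e, size=e.size, replace=True)
    return fitted + L @ e_star                   # Z* = m_beta_hat + L e*

def bootstrap_test(Z, fit, stat, Sigma_hat, B=200, alpha=0.05, rng=None):
    """Return the observed statistic, the bootstrap critical value and the decision."""
    rng = np.random.default_rng() if rng is None else rng
    fitted = fit(Z)                              # step (1)
    t_obs = stat(Z, fitted)
    L = np.linalg.cholesky(Sigma_hat)            # step (2): Sigma_hat = L L'
    res = Z - fitted
    t_star = np.empty(B)
    for b in range(B):
        Z_star = bootstrap_sample(fitted, res, L, rng)
        t_star[b] = stat(Z_star, fit(Z_star))    # refit on each bootstrap sample
    t_crit = np.quantile(t_star, 1.0 - alpha)
    return t_obs, t_crit, t_obs > t_crit

# Illustrative run on a 10 x 10 grid under the alternative c = 5.
rng = np.random.default_rng(1)
g = np.linspace(0.0, 1.0, 10)
S = np.array([(u, v) for u in g for v in g])
n, h = len(S), 0.2
dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
Sigma = 0.4**2 * np.exp(-dist / 0.4)             # covariance treated as known here
X = np.column_stack([np.ones(n), S])
P = X @ np.linalg.pinv(X)                        # least squares projection (linear trend)
A = smoother_matrix(S, h)
fit = lambda z: P @ z
stat = lambda z, fitted: n * h * np.mean((A @ (z - fitted)) ** 2)
Z = 2.0 + S[:, 0] + S[:, 1] + 5.0 * S[:, 0] ** 3 \
    + np.linalg.cholesky(Sigma) @ rng.standard_normal(n)
t_obs, t_crit, reject = bootstrap_test(Z, fit, stat, Sigma, rng=rng)
```

Passing `fit` and `stat` as callables keeps the resampling logic separate from the choice of trend estimator and statistic, so the iterative estimator and an integral-based $T_{n}$ could be plugged in without changing `bootstrap_test`.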

References

1. Härdle, W.; Mammen, E. Comparing nonparametric versus parametric regression fits. Ann. Stat. 1993, 21, 1926–1947.
2. Alcalá, J.; Cristóbal, J.; González-Manteiga, W. Goodness-of-fit test for linear models based on local polynomials. Stat. Probab. Lett. 1999, 42, 39–46.
3. Fan, J.; Gijbels, I. Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability; CRC Press: Boca Raton, FL, USA, 1996; Volume 66.
4. Francisco-Fernández, M.; Jurado-Expósito, M.; Opsomer, J.; López-Granados, F. A nonparametric analysis of the spatial distribution of Convolvulus arvensis in wheat-sunflower rotations. Environmetrics 2006, 17, 849–860.

Table 1. Proportion of rejections of the null hypothesis.

$σ$	$a_{e}$	$c$	$h = 0.10$	$h = 0.15$	$h = 0.20$
0.4	0.4	0	0.052	0.047	0.042
0.4	0.4	5	0.897	0.932	0.911
0.4	0.4	10	0.905	0.948	0.923
0.4	0.6	0	0.054	0.042	0.034
0.4	0.6	5	0.856	0.901	0.898
0.4	0.6	10	0.894	0.926	0.918
0.8	0.8	0	0.068	0.048	0.038
0.8	0.8	5	0.808	0.798	0.806
0.8	0.8	10	0.845	0.803	0.816


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
