Spatial Warped Gaussian Processes: Estimation and Efficient Field Reconstruction

A class of models for non-Gaussian spatial random fields is explored for spatial field reconstruction in environmental and sensor network monitoring. The family of models explored utilises a class of transformation functions known as Tukey g-and-h transformations to create a family of warped spatial Gaussian process models which can support various desirable features such as flexible marginal distributions, which can be skewed, leptokurtic and/or heavy-tailed. The resulting model is widely applicable in a range of spatial field reconstruction applications. To utilise the model in applications in practice, it is important to carefully characterise the statistical properties of the Tukey g-and-h random fields. In this work, we study both the properties of the resulting warped Gaussian processes as well as using the characterising statistical properties of the warped processes to obtain flexible spatial field reconstructions. In this regard we derive five different estimators for various important quantities often considered in spatial field reconstruction problems. These include the multi-point Minimum Mean Squared Error (MMSE) estimators, the multi-point Maximum A-Posteriori (MAP) estimators, an efficient class of multi-point linear estimators based on the Spatial-Best Linear Unbiased (S-BLUE) estimators, and two multi-point threshold exceedance based estimators, namely the Spatial Regional and Level Exceedance estimators. Simulation results and real data examples show the benefits of using the Tukey g-and-h transformation as opposed to standard Gaussian spatial random fields in a real data application for environmental monitoring.


Introduction
Spatial monitoring of physical phenomena has been at the center of many different spatial statistics and signal processing innovations in recent decades. Specific applications in which environmental processes are monitored and spatial field reconstruction is performed include environmental monitoring [1] and weather monitoring [2][3][4], natural phenomena such as temperature [5], precipitation [6], wind speed [7][8][9][10], pollution [11] and signal processing [12,13] to name a few.
In this work, we explore warping maps of Gaussian process models such as those developed in [7] for spatial field reconstruction. It is most commonly the case that spatial field modelling is undertaken with a variation of a Gaussian Process (GP) model due to its direct interpretation and mathematical tractability [14]. GP models are readily characterized by a mean structure and a valid covariance function [14]. Such models have been applied in many different domains for spatial and temporal modelling: [15][16][17] and the classic texts on such Gaussian process models for spatial statistics [18,19]. The warping extensions

Contributions and Outline
The main contributions developed in this work are twofold: formulating a class of spatial warped GP models based on Tukey g-and-h warping functions. This involves deriving important key properties of the Tukey g-and-h transformation and resulting family of spatial warped GP (Sections 2 and 3); and using those results to develop new estimation algorithms for efficient spatial field reconstructions (Section 4). Our contributions may be summarised as follows: (1) In Section 2, we derive mathematical properties of the proposed warped GP models warping function known as the Tukey g-and-h transformation function. This is actually a class of warping functions with sub-classes related to skewness and kurtosis transformations. Since this transformation is not widely known, it is valuable for practitioners to see the basic properties of this class of functions in order to better understand the resulting properties of the warped GP that arise when using this transform. In this regard we explore the transformation and its limiting behaviour (Proposition 2), the existence of the inverse transform (Proposition 1), and its derivatives (Lemma 1). This set of results is instrumental for the analysis of the properties of the Tukey g-and-h process and for the construction of spatial field approximations that form the key application in this paper. (2) In Section 3, we derive important statistical properties of the warped GP process which include properties of the finite dimensional distributions such as raw and central moments, the multivariate moments (Corollary 1), the population mean and variance (Corollaries 3-4), its higher order multivariate cross moments of any order (Proposition 3), and the index of regular variation (Lemma 2), to determine conditions in terms of the parameters of the g-and-h transform for the existence of all higher order mixed moments. In particular, the cross moments are directly relevant to the construction of efficient spatial field reconstructions for multiple locations in space that are performed in the application section. The tail behaviour of the finite dimensional distributions is captured by the tail index analysis and is relevant to understand models with excess kurtosis relative to a GP model, which is induced by our class of warping function. (3) In Section 4, we derive the spatial field reconstruction estimators of the Tukey gand-h random field under five different metrics, most of which are based on the predictive posterior density (Lemma 4). (4) In Section 5, a detailed set of numerical experiments are provided. This includes both real data cases studies as well as synthetic case studies which were developed to demonstrate key properties of the spatial field reconstruction methods and their accuracy and performance relative to a baseline GP model. In regards to the real data case studies, a sequence of experiments was developed which was based on data obtained from the National Climatic Data Center (NCDC) which included spatial observations of precipitation obtained from ground monitoring stations in the regions of interest. In particular, we took one month of hourly precipitation measurements to fit our warped GP model to obtain a spatial field reconstruction which was compared to a baseline GP model with no warping in order to demonstrate the benefits of a spatial warping framework.

Constructing a Flexible Warping Function for Spatial Gaussian Processes
Many families of warping function can be developed for extending the class of GP spatial process models. In this work, we particularly focus on the study of the family of Tukey g-and-h warping functions. When developing the new warped GP class of flexible models, the following list of desirable properties was considered desirable for the resulting warped GP model to posses (see [22]): (1) Parsimonious and finite number of well interpreted parameters; (2) Well-separated roles for skewness, kurtosis and the tail behaviour parameters; (3) Accurate and computationally efficient parameter estimation and tractability; (4) Derivable and interpretable stochastic properties; (5) Flexible structures in the marginal, conditional and joint density shapes; (6) Good inferential properties; (7) Exact and computationally efficient data generating mechanisms.
Some of these aforementioned properties have been studied before in the context of the class of warping functions we consider, namely the Tukey g-and-h transformed random GP fields, see for example [31][32][33][34][35][36] and the textbook overview in [37].
In this paper, we re-explore these properties and extend them for the purposes of spatial field reconstruction. Therefore, in this section, we first define the fundamental building blocks for deriving the non-Gaussian warped GP fields. As such, we define the basic set of non-linear transformations of univariate and multivariate Gaussian random variables and the resulting Tukey g-and-h univariate and multivariate distributions. We also derive some fundamental properties such as the existence of the inverse transform as well as the derivatives of the distribution that will be important for future results.
We note that, as a default, we will consider the base transformed random variable W to be Gaussian with µ = 0 and σ = 1 unless otherwise stated. Furthermore, in general the transformation F (W; η) can have many forms, see discussions in [32]. In this manuscript we concentrate on two specific types of transformation directly related to inducing skewness in the warped GP model or inducing kurtosis in the warped GP model. Note that standard GP models can have flexible covariance functions (second order flexibility) but there is no flexibility in the higher order structure relating to third comoments (co-skewness) and fourth co-moments (co-kurtosis) that can often be observed in real spatial field observation data. Therefore, the use of the Tukey sub-families of warping functions that specifically produce parameterization of such structure in the warped GP model is of practical relevance in flexible reconstruction methods for spatial fields.
These set of Tukey transformations modify the tail behaviour and the skewness relative to the base distribution, in this case the GP model which has finite dimensional Gaussian distributions point-wise in space. As such, the warped GP model will be able to produce a class of processes which, point-wise in space, will have distributions that are less symmetric around the mode. In particular they can accommodate left or right skewness, depending on the sign of g in the warping transform G. Furthermore, they can be heavier tailed than a Gaussian probability distribution, via leptokurtic transform H, and the strength of kurtosis relative to the Gaussian at each spatial point is controlled by parameter h. The Tukey g-and-h transformation is known to encompass a large range of distributions and this is further explained in [31,32]. This is one of the most desirable properties of this transformation.
Before proceeding in using this family of warping functions to define a warped GP spatial process, it will be useful to provide a few basic properties of this class of transform, that whilst are simple to mathematically verify, are also highly useful in developing further the specification of the properties of the resulting warped processes.
When working with warped GPs it is valuable to be able to evaluate the transformed finite dimensional density at a single location in space point-wise when performing tasks such as hyper-parameter estimation via the likelihood and also for spatial field reconstruction. Therefore, it is of relevance to ask if the warping function is invertible, as such a property leads one to obtaining a tractable expression for the finite dimensional distributions. We show in Proposition 1 the conditions under which the inverse of the warping function exists. Consequently, we can be sure that should we satisfy these conditions for the warping function, then in the case of the Tukey g-and-h transformation, it does accommodate a closed form distribution representation which will be tractable up to performing a simple univariate root search, as we will see later in Proposition 5.
Proposition 1 (Inverse of the Tukey g-and-h transformation). The inverse of the Tukey g-and-h transformation T gh W; ζ gh exists if g ∈ R − {0} and h ∈ R + Proof. See Appendix A.

Remark 1.
The result in Proposition 1 affirms the existence of the inverse of this transformation but does not provide an analytical expression for the inverse. In practice, one needs to find the inverse using simple univariate root search computational methods in order to facilitate evaluation of the warped distributions or densities point-wise. The existence of the inverse is still critical to verify for the warping transform of the GP, as it will be used to facilitate expression of the spatial processes finite dimensional distributions, which will in turn be used for specification of the likelihood for use in calibration and spatial field estimation and reconstruction.
It will also be useful to provide simple characterisations of the derivatives of the warping functions up to third order, as these will be used numerous times in the warped GP spatial field reconstruction estimation frameworks to capture the moments, mode and curvature approximations (covariance) point-wise. The derivatives of the Tukey g-and-h transform are given in Lemma 1 by simple algebra.
Tukey Warped Spatial Gaussian Process One may now proceed to developing a characterisation of a warped GP spatial process based around the Tukey family of skewness and kurtosis transformations. We begin by providing a definition of the GP that will set the notation we will use to specify the spatial trend and spatial covariance operator for the input GP spatial model. This GP model will serve as our base distribution (process) to generate non-Gaussian fields, i.e., warped GP processes. Then we will show how one can construct the warped GP model based on the Tukey g-and-h random field transform.

Definition 3.
Gaussian Process (GP) [14]: Denote by W (x) : X → R a stochastic process parameterised by {x} ∈ X , where X ⊆ R d . Then, the random function W (x) is a Gaussian process if all its finite dimensional distributions are Gaussian, where for any m ∈ N, the random vector (W(x 1 ), W(x 2 ), . . . , W(x m )) is jointly normally distributed [14].
We can therefore interpret a GP as formally defined by the following class of random functions: At each point the function valued process is characterised by the mean function µ(·; θ), parameterised by θ, and the spatial dependence between any two points is given by the covariance function (Mercer kernel) C(·, ·; Ψ), parameterised by Ψ, see detailed discussion in [14].
It will be useful to make the following notational definitions for the prior mean vector, cross correlation vector and auto-correlation matrix, respectively: with X + (R m ) is the manifold of symmetric positive definite matrices. C x i , x j is the covariance function.

Properties of the Spatial Tukey Warped Gaussian Process
Having developed a basic definition of the warped GP model based on a Tukey family of warping functions, we can now study the basic properties of this spatial process by first exploring its finite dimensional distributions. Such basic finite dimensional multi-point characterisations are useful for preparing spatial field reconstruction estimators for these warped GP models. This starts with the model for a single point in space (univariate marginal warped GP model in Section 3.1), followed by specification of the properties of multiple-points in space (multivariate finite dimensional marginal warped GP model in Section 3.3).

Spatial Tukey Warped Gaussian Characterisation at a Single Point in Space
At any point in space x ∈ X ⊆ R 2 one may consider the properties of the single location warped GP model and its properties. In this case, at one location we may consider as input a standard GP with zero spatial mean and spatial covariance function k giving a Gaussian random variable W(x) ∼ N (µ, C(x, x)). Then the baseline reference Tukey g-and-h warped random variable is obtained by setting ζ gh := [m 0 , m 1 , µ, C, g, h, θ = 1] where g ∈ R − {0} and h ∈ R + to give: In the next proposition, we demonstrate that the warped Gaussian model will include the classical Gaussian model as a limiting case. This result is most readily proven for the setting of a single point in space, where we show that Y(x) has a Gaussian law under some special specification of the warping function or its parameters.

Remark 2.
The parameter g ∈ R modifies the skewness relative the base distribution, in this case the Gaussian. A positive value of g indicates a positive skew in the transformed distribution and a negative value of g indicates a negative skew. The parameter h indicates a transformation in the kurtosis relative to the base distribution and to guarantee existence of moments, one may consider a restricted range such as 0 ≤ h ≤ 1.
We can also easily characterise the tail behaviour of the propose g-and-h warping transform to demonstrate how the h parameter in the H transform is able to produce leptokurtic behaviour in the resulting warped Gaussian density. One way to understand this is to demonstrate that the resulting warped distribution function has a property of regular variation, in other words it can be expressed as a power law tail decay and a slowly varying function.
Therefore, we briefly study regular variation of a distribution of the Tukey g-and-h distribution, which is a measure of the heaviness of tails of a distribution. Later, we connect this to the existence of the n-th moment. The index of the regular variation for the Tukey gand-h distribution has been studied in [32,39], we summarise this result below for a generic base Gaussian distribution with W ∼ N (µ, C(x, x)). It is important to understand under what conditions one has finite first and second order moments for a model with a specific kurtosis transform if one is to develop classes of linear spatial field reconstruction based on first and second moment local approximations. An advantage of the Tukey g-and-h warping function is that one can set conditions of the warping functions through simple parameter space restrictions to ensure existence of appropriate moments for use in spatial field reconstructions. The following result in Lemma 2 is meaningful to accommodate this analysis.

Lemma 2.
Consider a location x ∈ X and input GP random variable W(x) ∼ N (µ, C(x, x)) then the Tukey g-and-h random variable Y(x) := T gh W(x); ζ gh with parameters ζ gh : Proof. See Appendix C.

Spatial Tukey Warped Gaussian Characterisation at Multiple Points in Space
One then readily extends the model development from the univariate Tukey g-and-h random variable, that characterises the warped GP model at a single location x ∈ X , to a multiple-location specification at points x 1 , . . . , x m ∈ X which requires specification of a multivariate Tukey g-and-h random vector Y = [Y(x 1 ), . . . , Y(x m )]. Such extensions have for instance been specified in constructions in [4,31,35] where the single point to multiple point extension is obtained by applying an element-wise Tukey g-and-h transformation to a Multivariate Normal distribution, W 1:m ∼ MV N (w; µ, Σ) with mean vector µ and covariance matrix Σ such that Σ ij = C(x i , x j ) based on the GP covariance kernel k at two points in space x i , x j . This gives for the j-th location x j the transformed random variable In understanding the basic properties of the resulting multivariate warped random vector for the Tukey g-and-h model at multiple spatial locations, we can study the exponential moments (Lemma 3) which will be instrumental in deriving many of the properties of the warped GP models spatial field reconstruction estimators. Although the Tukey g-and-h multivariate random vector and the Tukey g-and-h processes have been defined previously in the aforementioned works. They have not been carefully studied from the perspective of deriving closed form results for their higher order cross moments or spatial moments of any order, including cross covariance, cross skewness and cross kurtosis. In this section, our goal is therefore to derive certain desirable properties which help to characterize the Tukey g-and-h processes further than just their definition.
To achieve these spatial cross moment characterisations it will be useful to first state a result for the quadratic exponential moments of a multivariate normal random vector. This form of moment structure will arise naturally when expressing the cross moments of the class of warped GP model developed under Tukey transforms. Furthermore, these closed form results will be useful for developing computationally efficient and interpretable spatial field reconstructions based on a spatial Tukey warped GP model class.

Lemma 3 (Exponential Moments of a Gaussian Random Vector).
Consider a m-variate random vector W : . . , m} and u ∈ R m , then the quadratic exponential moments of W are given by: where, Proof. See Appendix D.
This result can now be developed into a core component, to characterise the raw cross moments and cross central moments, for the warped GP model. We again consider m-points in space in a vector x 1:m := [x 1 , . . . , x m ]. Let us first describe the generic equation for an m-variate raw cross moment of a spatial Tukey warped GP model.

Consider the m-point spatial Tukey warped GP model with random vector
Assume w.l.o.g. a standardised transform with location and scale parameters given by m 0 = 0, m 1 = 1 pointwise and ζ gh := [0, 1, g, h] selected such that existence of finite cross moments of orders (n 1 , . . . , n m ) for n i ∈ N holds. Then the raw spatial m-point cross moments of orders (n 1 , . . . , n m ) are given by: where and This general result is then instrumental in producing practical examples of raw mlocation cross moments of the spatial Tukey warped GP model of orders (n 1 , . . . , n m ) denoted byμ (n 1 ,...,n m ) gh as well as the centralised cross moments denoted by µ (n 1 ,...,n m ) gh , which are obtained by this same identity coupled with a binomial sum expansion to accommodate the centering transform at each of the locations x 1 to x m . This result generalises the previous results in the literature for moments of a Tukey g-and-h process which previously only explored pairwise (m = 2 points) second and first order moments, where as this result obviously provides generalised m ≥ 1 point cross moment expressions of any orders. This is a significant extension in characterising this family of warped GP models, which no-longer have sufficient statistics (functions) given solely by mean and covariance functions.

Practical cross Moment Special Cases for Spatial Field Reconstruction of Tukey Warped GP Models
The result in Proposition 3, whilst very general, can be used to readily express a few core identities of relevance to spatial field reconstruction estimators at multiple locations in space, when one assumes to have observations consistent with a class of spatial Tukey warped GP models.
In particular, we will state the moments and conditions for their existence in terms of the Tukey h parameter in the warping function H as well as the two-point cross moments.

Corollary 1 (Single Location Central Moments of Spatial Tukey Warped Gaussian Process).
The n i -th central moment at location x i is given by: As stated in Proposition 4 below, the results that allow one to connect the idea of regular variation to the existence of moments of a certain power are provided. One should then understand that as the tails of the distribution become heavier (as measured by the index of regular variation), the integrals involved in the calculation of moments may cease to converge to a finite value. As such, one can formally state the relationship between the moment existence and the regular variation index as follows in Proposition 4.

Proposition 4 (Existence of Moments of Single Location Spatial Tukey Warped GP).
For the n i -th moment to exist in Corollary 1 we have to ensure that the following inequality holds: Proof. The existence proof is based on the fact that the for the n-th moment to exist for a random variable, the index of regular variation of the random variable must be strictly greater than n. Thus, from Lemma 2, we get: where σ 2 = C(x i , x i ), for the Tukey g-and-h process at the location x i .
Next we can express, as a special case for two points x i , x j in space (m = 2), the cross moments of order (n i , n j ) as follows.

Corollary 2 (Two Location Cross Central Moments of Spatial Tukey Warped Gaussian Process).
The n i -th and the n j -th order cross moment between two locations x i and x j is given by: µ (0,...,0,n i ,0,...,0,n j ,0,...,0) gh where: Proof. This is direct application for points x i , x j of Proposition 3.
We note that with these results one may simply recover the mean at location x i with n i = 1 giving: and the variance at location x i with n i = 2 as follows: It will also be practically useful for the spatial field reconstructions to state the cross spatial covariance between the spatial field at new location x * and existing observed points x 1:N . These cross spatial covariance results will be critical for derivation of the multiple point Spatial-Best Linear Unbiased Estimator class we term the (S-BLUE) spatial field reconstruction. Consider the joint description of a location x * and the locations of the observations Y 1:N by the set {x 1 , x 2 , . . . , x N , x * }. We also define κ as the covariance at any two locations x i and x j . Then we obtain the following quantities using the aforementioned derivations applied to obtain these cross moments: Cov(Y * , Y 1:N ) := κ Having derived these properties of the Tukey g-and-h random field we are now ready to derive the results for various field reconstruction metrics.

Field Reconstruction for Spatial Tukey Warped Gaussian Processes
In many real world settings, for spatial field modelling and analysis, the spatial fields finite dimensional distributions or process may not be used explicitly in modelling and instead a selection of relevant point estimators will be used to characterise a statistical summary of the warped Gaussian process, in our case the Tukey g-and-h warped spatial processes. In this section, we propose estimators and show how to solve, for a variety of practically relevant spatial field point estimators, often called in spatial statistics "spatial field reconstruction" or "spatial field approximation/estimation" [7,8]. In our case, we tailored the spatial field reconstructions to three classes of estimator with varying efficacy and computational cost in construction.
We have formulated this spatial field reconstruction context in terms of classical risk minimization through a loss function formulating the class of estimator which is resolved with respect to the spatial process, namely the warped GP model. We are assuming the warped GP model is calibrated and here the exercise is not in calibration of this models hyper parameters for the spatial covariance kernel and warping skew and kurtosis parameters, namely g and h, respectively, but rather in regards to approximating efficiently point estimators for the spatial field at any collection of target locations.
In this regard we derive five different estimators for various important quantities often considered in spatial field reconstruction problems. The multi-point Minimum Mean Squared Error (MMSE) estimators, the multiple point Maximum A-Posteriori (MAP) estimators, an efficient class of multiple-point linear estimators based on the Spatial-Best Linear Unbiased (S-BLUE) estimators, and two multi-point threshold exceedance based estimators, namely the Spatial Regional and Level Exceedance estimators.
Throughout the remainder of the manuscript, in order to simplify the notation, we will denote process Y(x i ) at location x i simply by Y i . To perform spatial field reconstruction, under the assumption that the data generating process for the spatial observations is a spatial Tukey warped GP, one may approach the problem under a Bayesian paradigm. One may then develop field reconstruction estimators, at an arbitrary location x * , at which one has not observed the process, based on the predictive posterior density at any location in space, x * ∈ X , denoted p(Y * |Y 1:N ). Here we denote Y 1:N as the measurements obtained from the sensors that observe the process at N locations x 1:N ∈ X . This approach is widely used and is the standard in many applications in sensor networks [1][2][3][4][6][7][8][9][10] To proceed with development of the conditional probability of prediction of the distribution of the spatial field, at unobserved location x * , we need to specify the warped spatial GP's density, given in Lemma 4. This result shows how to predict the spatial field at the new unobserved location x * conditional on the observations of the warped GP spatial field Y 1:N at observed locations x 1:N . Due to the structure of the warping spatial process developed, we are able to analytically express the first two moments for the predictive condition mean and covariance in close form. This is critical for efficient development of point estimators for the spatial field reconstructions that will be developed in following sections.

Proof.
A derivation of this conditional predictive density for a general g-and-h density with Guassian input process is given in [31].
Furthermore, we may derive based on this predictive density the first two predictive conditional moments which we state in the following corollaries describing the posterior mean and posterior variance of the Tukey g-h process at some arbitrary location x * . There are obtained explicitly using results derived in Proposition 3 and the distribution characteristics specified in Section 3.

Spatial Field Reconstruction Risk Functionals and Loss Functions
In developing estimators formally for the spatial field reconstruction we introduce a set of different loss functions. These will characterise the risk functionals to be minimized with respect to the warped GP model in order to represent the spatial field reconstructions of different estimator types. We have developed five different spatial field reconstruction estimators that we formalise below based on the conditional predictive posterior distribution.
Based on the predictive posterior density, various point estimators, like the MMSE and the MAP estimators can be derived. These estimators provide a pointwise estimator of the intensity of the spatial filed, Y * at location x * . This enables us to reconstruct the whole spatial field by evaluating Y * on a fine grid of points.
We first define a set of objective functions that covers different aspects of spatial regression and hence we calculate different estimators, each one having its special characteristic. We then derive explicitly each of the estimators in closed form expressions.

(1) Minimum Mean Squared Estimator (MMSE) estimator
The MMSE estimator minimises the MSE by utilising a squared loss function: Then, the MMSE spatial field reconstruction estimator is given by The conditional expectation is the estimate that minimises the conditional MMSE and the minimum value is given by the conditional variance shown below: (2) Maximum A-Posteriori (MAP) estimator: The MAP estimator is the mode of the predictive posterior density and utilizes the 0/1 loss function: Then, the MAP spatial field reconstruction estimator is given by (3) Spatial-Best Linear Unbiased Estimator (S-BLUE): The S-BLUE utilises the same loss function as the MMSE estimator, but is restricted to the linear family of estimators, given by: where a * ∈ R and B T * ∈ R 1×N (4) Spatial Regional and Level Exceedance estimators: There are a few ways of characterizing the spatial exceedance of these processes, either through the region or the location of the spatial exceedance given a user defined threshold and the tolerance. The other characteristic is the level at which the random process exceeds the given threshold at a given location. We define these characterizations mathematically as follows: (a) Spatial-Regional Right Tail Exceedance (S-RTE): (b) Spatial-Regional Left Tail Exceedance (S-LTE): where T is the pre-defined threshold and α is the user defined tolerance.
Here we are interested in the locations at which the above exceedance is observed. The function D x (x * ; Y 1:N , x 1:N , T, α) : X → {0, 1}, i.e., it is a binary map in space indicating the locations x * at which the process exceeds a predefined threshold with a specified probability α. These locations may form points, lines or surfaces in X ⊆ R 2 . It is useful to define exceedances on both tails due to the presence of skewness and kurtosis which affects the tail behaviour asymmetrically. (c) Spatial-Level Exceedance (S-LE): Here we are interested to find the minimum quantile level α at which the process exceeds the given threshold T at a given location x * . The function D α (x * ; Y 1:N , x 1:N , T) : X → (0, 1), i.e., Now that we have specified various metrics by which the field reconstruction can be carried out, we detail each of the estimators.

Spatial Field Reconstruction Estimator Derivations
All the aforementioned objective functions except for the S-BLUE require the knowledge of the predictive posterior distribution conditioned on the observations and the sensor locations, given in Lemma 4. We begin by deriving the classical predictive spatial field estimator based on the minimum mean square error for the warped spatial GP.

Lemma 5 (MMSE spatial field reconstruction estimator [31]).
The MMSE spatial field reconstruction function is given by the mean of the predictive posterior which is of the form similar to the one described in Corollary 3 and is given by: The Mean Squared Error (MSE) is given by the variance of the predictive posterior which is of the form similar to the one described in Corollary 4 and is given by: Remark 3. The expression for the uncertainty associated with the MMSE estimate is shown in Equation (40). We can see that the expression depends on the predictive posterior density parameter µ p * , which in turn depends on the observations Y 1:N as seen from Lemma 4. This is significantly different from the GP, where the uncertainty does no depend on the observations and only depends on the covariances between the locations at which the data were observed. This influences applications that involve experimental design problems like sensor placement, sensor selection, etc, in that just knowing the information of the locations alone is no longer sufficient for the Tukey g-h process and repeated experiments may need to be performed for evaluating the characteristics of just one location set.

Remark 4.
Note that the existence of the MMSE at a location x * is given by, h < 1 nσ 2 p * . We also know that σ 2 p * ≤ k(x * , x * ). This means that for a given value of the parameter h, there may arise a case when the Tukey g-and-h population mean at a location does not exist but the conditional mean can exist. This is because as we gather observations around the location of interest, we reduce σ 2 p * and hence the effect of the kurtosis parameter on the conditional MMSE.

Proposition 5 (MAP spatial field reconstruction estimator:).
The spatial MAP estimator is obtained from the solution to the following optimisation problem: where w o * is the solution to the following optimsation problem.
Proof. See Appendix F.

Remark 5.
The optimisation problem in specified in Proposition 5, to obtain the MAP estimate of the Tukey g-and-h process at a location x * does not admit analytical solutions and hence has to be dealt with numerically. To this end, one can solve this one-dimensional optimisation through standard techniques such as Newton methods adopting a simple gradient descent (since we can obtain the derivatives of this expression at all values of w * ).

Proposition 6 (MAP spatial variance:).
The variance of the spatial MAP estimator is given by the hessian of the posterior GH distribution at a location x * , evaluated at the optimum value w o * obtained from Proposition 5. It is represented as follows: where w o * is the mode of the GH distribution at x * and f W (.) is the Gaussian pdf with mean and variance as derived in Lemma 4.

Proof. See Appendix G.
We now state a few corollaries for finding the solution to the optimisation problem posed for finding the MAP estimate at any arbitrary location. These are the cases when the parameters g → 0 and h → 0.

Proposition 7 (S-BLUE spatial field reconstruction estimator:).
The Spatial-Best Linear Unbiased Estimator (S-BLUE) at any location x * is given by: where a * = µ p * and B = Cov The exceedance problem of the Tukey g-and-h process, as described in the objective functions, given by Equations (36) and (37) are reduced to the problem of obtaining the multivariate quantile level sets of the process at d locations. First we look at univariate description of the quantile function at just one location. Then we will see that the existence of a tractable quantile Tukey g-and-h function enables us to reduce the exceedance problems to the description of these quantile function along with the user given parameters.
To proceed with these results it will be useful to state the following Lemma 6 that provides expressions for the marginal and conditional quantile functions of the Tukey g-and-h process.

Lemma 6 (Marginal and Conditional Quantile functions of the Tukey g-and-h process).
(1) Marginal Quantile Function: The univariate marginal quantile function of the Tukey g-and-h process at any arbitrary location x k is described as follows: where q k (α) = a k + C(x k , x k )Φ −1 (α), and Φ −1 (α) denotes the inverse cdf of the standard normal distribution. (2) Conditional Quantile Function: The univariate conditional quantile function of the Tukey g-and-h process at any arbitrary location x * is described as follows: is the inverse cdf of the standard normal distribution.

Proposition 8 (Spatial region of right tail exceedance estimator).
The region of right tail exceedance described in Equation (36) is reduced to finding the quantile level sets at the user given tolerance to see if it exceeds the given threshold. This is a binary estimate at location x * , that is characterized by: where q * (α) = µ p * + σ p * Φ −1 (α) is the quantile as seen from Lemma 6. The region is created by accumulating all the locations that produce 1 in the binary estimate.

Remark 6.
We note that derivation of the left tail exceedance spatial field reconstruction estimator follows analogously the result obtained in Proposition 8 with a reversal of the inequality and a quantile level (1 − α) ∈ [0, 1].
Proposition 9 (Spatial level of exceedance estimator). The region of exceedance described in Equation (38) is reduced to finding the quantile level sets at the user given tolerance to see if it exceeds the given threshold. This is a binary estimate at location x * , that is characterized by: where w * (α) = µ p * + σ p * Φ −1 (α)) as seen in Lemma 6.
Note that in Propositions 8 and 9, we used x * to denote the unobserved spatial field location. In these cases the quantile function is essentially the conditional quantile function from the conditional predictive posterior distribution obtained in Lemma 4.

Experimental Results: Warped Gaussian Process Spatial Field Reconstructions
In this section, we look at results concerning spatial field reconstruction based on a warped GP model of the Tukey g-and-h family of spatial processes. We consider both central estimators of spatial field as well as exceedance spatial maps. We compare the warped GP reconstructions of the Tukey g-and-h processes to those one would obtain had one miss-specified the model and applied a classical standard GP field reconstruction. It is of interest to assess the ability of the Tukey g-and-h process to detect attributes of spatial skewness and kurtosis, compared to the GP in a range of different settings. We consider both synthetic as well as real data studies to illustrate the Tukey g-and-h model developed.

Simulation Setup
In these case studies we model the physical phenomenon or spatial process that is partially observed by a Tukey transformation of an underlying GP over a spatial grid where x ∈ [−5, 5] and y ∈ [−5, 5]. We work with the well known squared exponential kernel, with a known unit variance and known length-scale. We also subtract the mean to obtain a zero mean GP. Furthermore, the parameters g, h are assumed to be known and are set to different values within the allowable range, to test various cases. This allows us to demonstrate, on a synthetic case study, the importance of developing spatial field reconstructions that accommodate warped spatial GP structures for spatial skewness and kurtosis by comparing the loss in accuracy if one applies just a standard GP model. Moreover, we have N sensors with known locations, spread uniformly over the spatial grid region of interest. We vary N to study the effect of the number of sensors on different regression estimators. To this end, we use the normalised spatial mean squared error (N-MSE) to characterize the performance of the different estimators and we define it as follows: where J is the number of spatial locations to regress and T is the number of trials of the Tukey g-and-h process simulation. In addition, maxY = max i,j x (ij) is the maximum value of the simulated process across all locations and trials and similarly minY = min i,j x (ij) is the minimum value of the simulated process.
We outline the following test scenarios studied in this setting.
(1) Scenario 1: We compare the performance of the different estimators such as, MMSE (Equation (39)), S-MAP (Equation (41)), S-BLUE (Equation (35)) to the Gaussian Process estimate in various scenarios such as, small g and h with low spatial correlation, high spatial correlation and other such combinations of g, h and l.
Scenario 2: We characterize the exceedance estimation ability of the Tukey g-and-h process as shown in Proposition 8 (g,h), through spatial ROC curves compared to a Gaussian process.
Scenario 3: We adopt real data case studies for the US and we apply our estimators on the noisy observed US precipitation data sets [40]. We consider a spatial field reconstruction that involves latitudes between 34 and 43 degrees and longitudes between −110 and −100 degrees and we compare the different spatial field reconstruction estimators and their performance on real data versus a classical Gaussian process model.
Before we look at the simulation results, we show the actual differences in the distribution of the Tukey g-and-h finite dimensional distribution, with the parameters that are used in the simulation case studies, in order to compare to the Gaussian distribution, see Figure 1. This allows us to demonstrate clearly that at each spatial field location one would expect that skewness and kurtosis will be meaningful to consider in constructing the spatial field reconstruction estimators.

Scenario 1: Spatial Field Reconstruction on Synthetic Data
In this section, we present the estimation performance of the Spatial-BLUE estimator, GP estimator, Spatial-MAP estimator and the Spatial MMSE estimator. The results here are shown for a synthetic data set constructed with given parameter values for g, h and l, using a squared exponential kernel. The study utilised 50 realisations of the Tukey g-and-h random field and the average spatial Mean Squared Error reported below is the average across the 50 realisations and all the locations in space where the field was reconstructed.
To study the variation across the number of sensors, we used N = {9, 16,25,36,49, 64, 81} and spread it uniformly across the field.
As we can see in the first top sub-figure in Figure 2, this is the situation in which the true spatial field process is very close to Gaussian as the g ≈ h ≈ 0 settings was used and therefore with both l values only two curves are seen for the different spatial field reconstructions and they are basically almost indistinguishable. The difference reflects the very slight skew and kurtosis still present in the target warped spatial Gaussian process since g and h were not set exactly to 0. Basically in this scenario we see that the curve associated with the S-BLUE and the S-MAP coincide, as is the case with the S-MMSE and the GP curves. Furthermore all the estimators perform better as the number of sensors increase. Overall the difference in the reconstruction accuracy is negligible between the four estimators as is expected since the values of g and h are very close to 0 as is to be expected. This is simply a case study to illustrate that our spatial field reconstructions for the warped spatial GP model can recover a classical GP reconstruction if the target spatial field truly has this characterisation.
In the bottom subplot of Figure 2, we see that with some significant increase in g and h values that adds to the warped spatial GP skewness and kurtosis, then in this case the standard GP spatial field reconstructions degrade in accuracy. The classical GP estimate is unable to account for the significant non-linearities in the transform that induce the skewness and kurtosis in the spatial field, while the other estimators that we have derived for spatial field reconstruction specifically in the setting of warped spatial GP significantly outperform the classical GP reconstructions of comparative loss functional type, especially as the number of sensors increase. We also observe that the linearisation of the spatial field reconstruction provides an efficient estimator computationally compared to the MMSE estimator but the cost of this computational efficiency is a loss in accuracy due to the attempt to linearise locally the highly non-linear spatial field warping transforms. We see that it becomes particularly effective to use an efficient linear S-BLUE estimator in settings in which observation coverage is sparse and sensors are placed sparsely in the spatial region of interest. In these cases we show that the S-BLUE estimator performs comparatively with the more computationally expensive MMSE estimators.

Scenario 2: Spatial Exceedance on Synthetic Data
In the exceedance examples, we characterize the spatial region of exceedance estimator (Proposition 8) and its left tail equivalent, and the region of exceedance (Proposition 9). We treat these problems as a detection problem, since we obtain a binary estimate at each location. We characterize such performance via plots of the Receiver Operating Characteristic (ROC) curves associated with different values of N. In these illustrations we use α = 0.1 and the threshold λ is varied from the minimum value of the realisations of the Tukey g-and-h field to the maximum value of the realisations. Once the predictive distribution is obtained at the target locations, we calculate the binary estimate of exceedance at each target location and finally to obtain the required probabilities of detection and false alarms, we collate the proportion of false alarms and correct detection at all locations and get the probabilities of false alarms and detection.
From the ROC curves in Figure 3 we see that the Tukey g-and-h distribution in different cases outperforms the Gaussian process in all the scenarios created, with closest performance, in Figure 3 subplot a, corresponding to cases when the generated synthetic data is from settings of g and h parameters in which the data generating mechanism is closest to no skewness or kurtosis effects. The case of a distinct model with regard to a Gaussian model, where skewness and kurtosis is non-negligible, is then shown in Figure 3 subplot b. Now Figure 4 contains the ROC curve corresponding to g = 1.2, h = 0.001 and l = 1. In subfigure a of Figure 4 one can see a scenario case that is close again to the Gaussian setting, with the difference between the case in Figure 3 subplot a being purely the scale parameter of the covariance kernel function, l. Therefore, as expected we see very close performance in the ROC curve in this setting when comparing the Gaussian and Tukey g-and-h models. In Figure 4 subplot b, we see the case with high positive skew and almost zero kurtosis. We notice that the false alarm rate for the quantile exceedance is extremely low and specifically we defined the exceedance as the right tail exceedance. This may be due to the fact that the estimation of the Tukey g-and-h posterior, at all the target locations, depend on the data from the sensors which tend to be considerably positively skewed (g = 1.2), without heavy tails. Since, the GP estimation involves the actual observations (without any transformation), the predictive distribution tends to have a mean with a high positive value and since there is the asymmetry and the lack of heavy tails, the GP also exhibits a reasonably low false alarm rate for the right tail exceedance that is comparable to the Tukey g-and-h process. However, for the same case if we calculate the left tail exceedance as characterized in Proposition 9, we notice that these ROC curves exhibit a higher range of false alarms and detection rate and we obtain the following figure.
In addition, we observed that when we considered the case g = −1.2, h = 0.001, l = 1 which is negatively skewed with 0 kurtosis, we noticed that the above trend is reversed, i.e., the right tail exceedance exhibited a larger false alarm rate and the left tail exceedance exhibited a very small rate of false alarms.

Scenario 3: Spatial Field Reconstruction on Real Spatial Sensor Data for Hourly Precipitation
In this section, we show results of spatial field reconstruction on the US precipitation dataset, available at https://www.ngdc.noaa.gov (accessed on 1 January 2020). This shows the hourly precipitation in digital data set DSI-3240, archived at the National Climatic Data Center (NCDC). It is obtained from US National Weather Service (NWS), Federal Aviation Administration (FAA), and cooperative observer stations in the United States of America, Puerto Rico, the US Virgin Islands, and various Pacific Islands. We extract data for one month (November 1994) from the sensors within the area shown in the map in Figure 5.
Here, to evaluate the performance of our estimators, we estimate the parameters from the US precipitation dataset corresponding to the data from the above locations. For both the Tukey g-and-h and the GP we use an underlying GP with zero mean and a squared exponential kernel where the length-scale and the variance are estimated appropriately. In addition, the Tukey g-and-h requires estimation of two additional parameters namely the skew parameter g and the kurtosis parameter h. After a maximum likelihood estimation in both cases, we obtain the estimates of the Tukey g-and-h to be l gh = 0.2, σ 2 gh = 1, g = 0.001 and h = 0.2 and the estimates for the GP are l gp = 0.1 and σ 2 gp = 2.5. Additionally, we define a normalised version of the median of absolute deviations for the comparison of our estimates on the real data. It is defined as follows: where J is the number of spatial locations to regress and T is the number of trials of the Tukey g-and-h process simulation. In addition, maxY = max i,j x (ij) is the maximum value of the simulated process across all locations and trials and similarly minY = min i,j x (ij) is the minimum value of the simulated process. In Figures 6 and 7 we show results for a real data spatial field reconstructions for the S-Blue, GP, S-MAP and S-MMSE results. In Figure 7, we see that the ratio in the majority (80%) of the test locations is less than or equal to 1, which means that the S-BLUE estimator performs better than the GP at those locations. The light blue markers indicated the locations where this ratio is between 1 and 5, which is around 11% of the test data. Similarly the dark blue markers and the black markers represent the small percentage of the locations where the GP performs better than the S-BLUE. We notice that these locations are generally concentrated in the corners of the grid, which might explain the presence of a few outliers as the main factor that affects the estimates is the covariance between the test location and the sensor locations. We can see from Figure 6, any estimator of the Tukey g-and-h process performs much better than the GP. Particularly in both cases the S-BLUE estimate performs the best in this case and this maybe because in the real data we do not have multiple realisations of the data, thus to get a good estimate we should exploit the spatial information well and the S-BLUE estimate does that precisely, as it requires the spatial Tukey g-and-h covariance, which enables it to produce better estimates and these estimates improve as the number of sensor locations having data increase. Moreover, we notice that the N-MSE of the GP increases as the number of sensors increases and this shows that we are fitting the data with a model that is biased.

Conclusions
We study in detail various aspects regarding Tukey g-and-h processes from its construction and derivation of n-variate cross moments, which helped us develop three different estimators for spatial regression, the MMSE, S-BLUE and the S-MAP. We discuss their derivation through the predictive posterior distribution. In addition, we define the right tail and the left tail exceedance estimators using the quantile level sets of these processes. Finally, we test the performance of our estimators on a simulated data setting and the US precipitation dataset. We find that although the Gaussian process is a useful model, we show that it suffers when the data exhibits heavy-tailedness and/or skewness, where the Tukey g-and-h process is better suited and is also practically applicable. The flexibility of the construction and the estimation of the Tukey g-and-h process can enable us to incorporate even more complex spatial and temporal patterns through different kernels much like the Gaussian process whilst enabling a way to account for non-Gaussian behaviour which arises in many real world applications.  Data Availability Statement: All real data utilised is available at the cited source location.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Proposition 1
To prove that the inverse of the transformation T gh W; ζ gh exists we prove that it is a bijection.
Firstly, we notice that the transformation is continuous in R, since it is essentially the difference of two exponential functions. The Intermediate Value Theorem states that if a function f is continuous in an interval [a, b] and min( f (a), f (b)) < L < max( f (a), f (b)), then there exists c ∈ [a, b] such that f (c) = L. We apply a version of this theorem to a function f that is continuous in an open interval(a, b). Now T gh W; ζ gh : R → R, where this function is continuous in (−∞, ∞). Let L 1 := lim W→−∞ T gh W; ζ gh = −∞ and L 2 := lim W→∞ T gh W; ζ gh = ∞. Since L 1 < L < L 2 for any arbitrary L ∈ (−∞, ∞) and since T gh W; ζ gh is continuous in this interval then there exists a c ∈ (−∞, ∞) such that T gh c; ζ gh = L. Hence this function is surjective.
To prove that it is injective, we use the derivative information presented in Lemma 1. The first derivative of the Tukey g-and-h transform is given by: T gh W; ζ gh = hWT gh W; ζ gh + exp gW + hw 2 2 .
We know that the kurtosis parameter is always non-negative and hence we need to find the sign of WT gh W; ζ gh , since exp gW + hW 2 2 is always positive. From the above expression to determine the sign of M, we just have to consider the sign of Wg(exp(gW) − 1). We know that when Wg > 0, exp(gW) > 1 and when Wg < 0, exp(gW) < 1, and therefore M > 0. Hence the derivative is positive and this means that the transform T gh W; ζ gh is strictly increasing and as a result it is injective. Since it is surjective and injective, we show that this Tukey g-and-h transformation has an inverse.

Appendix B. Proof of Proposition 2
We have the random variable Y := T gh W; ζ gh , where ζ gh := [m 0 , m 1 , µ, σ 2 , g, h, θ = 1]. Then we have that P(Y ≤ y) = P T gh W; ζ gh ≤ y . (A2) The limit of the function T gh W; ζ gh as g → 0 and h → 0 is given by: . Now the functions are separable in g and h, and the limit exists and can be evaluated separately if and only if the limits in g and h exist independently. We know that: Both the limits exist independently in g and h. Hence, L = m 0 + m 1 L 1 L 2 = m 0 + m 1 W. Furthermore, we know that W ∼ N µ, σ 2 . Then L ∼ N m 0 + µ, m 2 1 σ 2 Finally, As a corollary, the special case when, m 0 = 0 and m 1 = 1, we get Y d = W

Appendix C. Proof of Proposition 2
The index of regular variation for the survival function F Y (y) associated with random variable Y is λ where: lim y→∞ y f (y) 1 For a generic transformation T gh W; ζ gh with parameters ζ gh := [m 0 = 0, m 1 = 1, µ, σ 2 , g, h, θ = 1], let f W (.) be the Gaussian pdf of the underlying distribution and F W (.) be the Gaussian cdf and let u = T −1 gh (.). Then, to find the limit we do as follows: u(1−F W (u)) and L 2 = The last line can be obtained from Lemma 3, using the following substitution u (i) = [g(n 1 − i 1 ), g(n 2 − i 2 ), . . . , g(n m − i m )] T (A13) Which gives us the following result: µ (n 1 ,...,n m ) gh