Use of Correlated Data for Nonparametric Prediction of a Spatial Target Variable

: The kriging methodology can be applied to predict the value of a spatial variable at an unsampled location, from the available spatial data. Furthermore, additional information from secondary variables, correlated with the target one, can be included in the resulting predictor by using the cokriging techniques. The latter procedures require a previous speciﬁcation of the multivariate dependence structure, difﬁcult to characterize in practice in an appropriate way. To simplify this task, the current work introduces a nonparametric kernel approach for prediction, which satisﬁes good properties, such as asymptotic unbiasedness or the convergence to zero of the mean squared prediction error. The selection of the bandwidth parameters involved is also addressed, as well as the estimation of the remaining unknown terms in the kernel predictor. The performance of the new methodology is illustrated through numerical studies with simulated data, carried out in different scenarios. In addition, the proposed nonparametric approach is applied to predict the concentrations of a pollutant that represents a risk to human health, the cadmium, in the ﬂoodplain of the Meuse river (Netherlands), by incorporating the lead level as an auxiliary variable. methodology, P.G.-S.; software, T.R.C.-Y.; validation, P.G.-S. and T.R.C.-Y.; formal analysis, P.G.-S. and T.R.C.-Y.; investigation, P.G.-S. and T.R.C.-Y.; resources, P.G.-S. and T.R.C.-Y.; data curation, T.R.C.-Y.; writing—original draft preparation, P.G.-S. and T.R.C.-Y.; writing—review and editing, P.G.-S. and T.R.C.-Y.; visualization, P.G.-S. and T.R.C.-Y.; supervision, P.G.-S.; project administration, P.G.-S.;


Introduction
A major challenge for the World Health Organization (WHO) is to protect human health. With this aim, the WHO promotes a variety of strategies and plans, such as those addressed to provide scientific information about the harmful effects of certain chemicals on the population and recommendations for their management. Cadmium (Cd) is one of the elements included in the group of pollutants under control, whose toxicity can provoke renal or respiratory dysfunctions to humans and affect the skeleton system, as well as it is linked to cancer [1].
The presence of Cd in nature is scarce, mainly associated with zinc and lead. The wide industrial use of this metal releases Cd to the air, which may cause pollution in air, water and soil [2], even on areas far from the sources of emission, by the atmosphere or the river transport. This contaminant is absorbed by certain plants and aquatic organisms, so it can easily enter the food chain. Indeed, some research has been developed to predict the Cd concentrations in crops by taking into account the Cd content in soil, as well as other factors [3]. Thus, periodic assessments to check the soil levels of this variable would be advisable, since they would allow the authorities to detect those zones at risk and to adopt the necessary strategies to alleviate it. This task can be accomplished by collecting data of the target pollutant at some spatial locations in the corresponding domain and then applying appropriate tools to predict its values at a number of unsampled sites, so as to map the concentrations achieved in the whole region.
In geostatistics, prediction of one variable at an unsampled location is usually obtained by applying the kriging methodology on the data collected for the target variable [4,5]. Furthermore, when secondary variables have also been measured at a number of spatial locations and they present a significant correlation with the main variable, the whole data set can be used to improve the results. This is the key idea of the cokriging procedures [6], whose application may vary depending on whether or not the variables share the set of sampled sites, leading to the isotopic or heterotopic cases, respectively.
To solve a cokriging equation system, estimation of the multivariate dependence structure is needed, which is a complex task, since it demands the use of valid functions that must satisfy some relation constraints between each pair of variables. The linear coregionalization model (LCM) was one of the first proposals suggested to simplify the characterization of the multivariate dependence [7]. This approach is based on the idea that the required functions can be derived as a linear combination of a number of theoretical valid models [8]. Then, an appropriate selection of parameters and models is needed, which can be obtained in an iterative way, as developed in [9]. However, the LCM mechanism becomes limited for a large number of secondary variables, thus prompting alternative methods, such as the one presented in [10], derived from Bochner's theorem. Other approaches have been designed to provide more flexible models, based on considering moving averages [11] or multivariate Matérn functions [12]. The work presented in [13] offers a widespread review of procedures for multivariate dependence estimation, including the aforementioned proposals and their extension to the nonstationary setting.
The use of an auxiliary variable can be advisable, for instance, when it has been observed at more sites than the target one and both variables exhibit a strong dependence. Even, sometimes, the measurement of the secondary information is less time consuming and/or expensive than the main one. Thus, the incorporation of the resulting data can be advantageous and it provides a larger sample than the one derived from the principal variable, on which the cokriging approaches can be applied. This methodology has been widely used to solve a variety of problems, specially to assess the contamination level by any pollutant, whose concentration can be influenced by the presence of other variables (chemical elements, atmospheric variables, etc.). This way of proceeding has proved to result in an efficiency gain to determine the air pollution levels [14,15], the groundwater quality parameters [16,17] or the soil concentrations of different contaminants [18,19], among other examples. In addition, some research has been developed to extend the cokriging techniques for its application to functional random fields, taken as main or secondary variables [20,21], although the current work is restricted to the scalar case.
Next, a scheme of the universal cokriging approach is presented. Let {Z(s) = (Z 1 (s), ..., Z p (s))/s ∈ D ⊂ I R d } be a p-variate random process, where s represents a spatial location in the observation region D. The more general case (heterotopy) is considered here, so that data from the different variables are not necessarily collected at the same sampling sites, leading us to assume that Z i is observed on the set S i = {s i1 , ..., s in i } of n i locations, for n i > 0 and i = 1, 2, ..., p.
the whole set of data. Since only one incomplete realization of the multivariate process is available, it is necessary to assume some kind of stationarity on Z for developing inference on this process.
Suppose that Z i can be modeled as for i = 1, 2, ..., p, where µ i (·) is the deterministic trend and Y i (·) is a second-order stationary random process with zero mean. Then, it follows that: , for all s ∈ D and i = 1, 2, ..., p.
Functions C ii are referred to as cross-covariograms or cross-covariances.
If we assumed instead intrinsic stationarity from the random processes Y i , condition (ii) would be replaced by: , for all s, s ∈ D and i, i = 1, 2, ..., p.
The cross-semivariograms γ ii are symmetric functions, although this property does not necessarily follow for the cross-covariograms, when i = i . Under second-order stationarity, the relationship between γ ii and C ii is given by for t ∈ I R d . Without loss of generality, we take Z 1 as the main variable, while the remaining ones are considered as secondary variables. Then, a kriging predictor for Z 1 at an unobserved location s can be constructed as followsẐ where the parameters λ ij i would be obtained by minimizing subject to the unbiasedness condition, namely, For equality (4) to hold, it is sufficient to require that ∑ 1,j 1 λ 1j 1 = 1 and ∑ i,j i λ ij i = 0, for all i = 1. This would lead to the following objective function with l i denoting the Lagrange multipliers. By deriving function (5) with respect to each λ ij i and l i , for each i and j i , as well as equaling the result to zero, an equation system is obtained, whose solution provides the values of the unknown parameters λ ij i . They would be expressed in terms of the cross-covariograms C ii or cross-semivariograms γ ii , difficult to be characterized in practice [6]. Indeed, the number of functions to be estimated amounts to p 2 +p 2 and they must satisfy the positive definiteness property or the conditionally negative definiteness property, respectively, to guarantee the existence of a solution of the cokriging system. Besides, the different functions cannot be derived in an independent way, due to the underlying dependence among them.
Furthermore, the accuracy of the cokriging methodology is conditioned by different factors. A remarkable feature is related to the selection of models needed to estimate the dependence structure and the limitations of the cross-validation techniques to check the adequateness of the resulting functions [22]. Also, a low correlation level between the target and the secondary variables may discourage from applying cokriging on them [23] over univariate kriging, due to the increase in the uncertainty when the dimension augments. On the other hand, the quality of the data may have an effect on the results, which is particularly negative for samples of small sizes or unevenly distributed [24].
To avoid the aforementioned problems, the current research provides a nonparametric alternative for prediction in this setting, derived by extending the univariate approach without covariables that was proposed in [25]. The current work is organized as follows. Section 2 introduces the main hypotheses to be required. The proposed nonparametric predictor is described in Section 3, as well as its properties, whose proofs are outlined in Appendix A and B. Section 4 presents the results of the numerical studies with simulated data, carried out to illustrate the performance of the new methodology. Section 5 shows the application of our proposal to map the cadmium concentrations on the floodplain soil of the Meuse river (Netherlands), by incorporating data of the lead content, due to the correlation between both variables.

Hypotheses
To derive our proposal, the following conditions are required: < +∞, for all s ∈ D and i = 1, ..., p.

Hypothesis 4 (H4)
. µ 1 is twice continuously differentiable in a neighborhood of s, for all s ∈ D.

Hypothesis 6 (H6).
A random design is assumed for the spatial locations, which follow a density function f on D that is bounded and twice continuously differentiable in a neighborhood of s, for all s ∈ D.
Hypothesis 7 (H7). K d is a d-variate, compactly supported and symmetric kernel function.

Hypothesis 8 (H8).
L is a univariate, compactly supported and symmetric kernel function.

Nonparametric Predictor
In the univariate setting, the nonparametric prediction of variable Z 1 at an unsampled location s can be addressed by only considering the data collected for the target variable, together with the application of the methodology developed in [25], yieldinĝ where h represents a bandwidth parameter. Based on this idea, a new nonparametric predictor can be designed so as to incorporate the whole observed values, provided either by the main variable (Z 1 ) or the correlated secondary ones, through a weighted average, in the following manner with where h and h i represent bandwidth parameters and α ii equals -1, 0 or 1, depending on whether C ii (0) is negative, zero or positive, respectively. Each weight p ij i (s) has been defined to account for the correlation between Z 1 (s) and Z i (s ij i ). To avoid the border effects in predictor (6), boundary kernels could be used instead of symmetric ones. Predictor (6) satisfies good properties, under appropriate hypotheses, which include conditions (H1)-(H8), as well as the convergence to 0 of h i and h, as n i and n = ∑ i n i diverge to +∞, respectively, with n i denoting the number of observations collected for Z i , for i = 1, ..., p. On the one hand, the nonparametric predictor is asymptotically unbiased, so that On the other, its mean squared prediction error (MSPE) converges to zero, namely, The proofs of these properties are outlined in Appendix A and B. Also, a proposal for approximation of M(s) in practice is included in Appendix B, given by (A13) and defined in terms of the cross-covariances C 1i . It is important to notice that both properties, (7) and (8), are checked under the assumption of a random design for the spatial sampling locations, although they also follow for predictor (6) when deterministic designs are considered instead.
Next, we deal with the selection of the bandwidth parameters needed for implementation of predictor (6). Starting with h i , write a i for some sequence of positive values converging to 1, as n i tends to +∞, with a i < 1. Suppose that the correlation between Z 1 (s) and Z i (s) is considered significant if Corr[Z 1 (s), Z i (s)] 2 > a i . Seeking to meet this requirement for the data used to compute (6), h i can be taken as where b stands for the support of L.
Regarding the bandwidth h, global or local selectors can be considered. An option of the former group may be derived from the traditional cross-validation method [26], adapted to this setting. To proceed in this way, for each j 1 and a given h, we would leave the observation Z 1 (s 1j 1 ) out and use the remaining data to predict Z 1 at s 1j 1 , through (6). Denote byẐ −(j 1 ) 1h (s 1j 1 ) the resulting value and repeat this scheme for different choices of h ∈ H, where H represents an appropriate set of positive values. A global bandwidth h can be obtained as More generally, the set of sampling locations S 1 can be divided into two disjoint subsets, S 1t and S 1v , to be employed for training and validation, respectively. By omitting the data Z 1 (s 1j 1 ), with s 1j 1 ∈ S 1v , we can predict them from the values of the process at the training set S 1t and some h, 1h (s 1j 1 ) for the predictions attained and obtain a global selector h as follows with |A| denoting the cardinal of the set A. If a local bandwidth h = h(s) is preferred instead, a simple idea to derive it can be based on applying the balloon approach [27]. With this aim, bear in mind that the kernel density K d is compactly supported, so that K d (y) = 0, for all y ∈ I R d such that y ≥ c d and some c d > 0. Then, h can be taken to equal the m-th percentile (for instance, m equal to 10% or 20%) of the values c −1 d s − s ij i , for each s ∈ D. More complex alternatives for acquiring local bandwidths can be derived from the bootstrap techniques, as the one introduced in [25] for univariate prediction without covariables.
In addition to the bandwidth parameters, implementation of predictor (6) also requires estimation of other unknown terms, dependent on the trend and the covariogram functions. These issues can be addressed in a variety of ways, according to the conditions satisfied by the underlying random processes. Furthermore, parametric or nonparametric approaches can be selected to derive the necessary estimates.
Starting with the trend function µ i (·), the sample mean could be applied, under the assumption of constant trend,μ However, when the constant condition fails for the trend, this function can be approximated through a parametric model (e.g. a polynomial function), although misspecification of the selected model is an important problem that may arise with the parametric procedures and it could affect the accuracy of the results derived from it. Nonparametric alternatives for trend estimation include, for instance, the Nadaraya-Watson approach where g i is a bandwidth parameter, or the local linear trend [28]. The conditions required in the current work lead us to assume constant variance from each process Z i . Thus, a natural estimator of σ 2 i is provided by the sample variancê In addition, we could apply a kernel-based approach for approximation of the variance, as the one given byσ with g i representing a bandwidth parameter andμ i denoting an estimator of the trend µ i . Estimation of function C 1i can be addressed through a parametric fit, which would be also strongly dependent on an adequate selection of the parametric model. A nonparametric mechanism for the latter issue is provided in [13], based on the method of moments, which can be written as followŝ A smoothed version of the cross-covariance C 1i can be derived from the kernel method, proposed in [29], asĈ for some bandwidth parameter g i and estimatorμ i of µ i .

Remark 1.
The new prediction methodology must perform adequately for Gaussian data, similarly to what might be expected for the cokriging predictor (3). However, the latter one demands an appropriate characterization of the whole cross-covariograms C ii (or cross-semivariograms), which should additionally satisfy a number of restrictions so as to be valid for prediction, due to the underlying dependence among them, unlike what is needed to develop our proposal. Indeed, predictor (6) and its approximated error (A13) only involve terms dependent on the functions C 1i , whose estimators are not required to accomplish the aforementioned conditions, thus simplifying the practical implementation of the nonparametric approach.

Remark 2.
For construction of a prediction map, implementation of (6) is not so computationally demanding for each site as the cokriging predictor, since the terms K d s−s ij i h are the unique ones that involve the target location.

Remark 3.
Under isotopy, the new approach for prediction can be adapted in a simple way. With this aim, suppose that Z i is observed on the set S i = S = {s j } n j=1 of n locations, for all i = 1, ..., p. The kernel-type predictor of Z 1 at location s would then be given bŷ with h and h i denoting the bandwidth parameters, for i = 1, . . . , p.

Numerical Studies
This section presents the results of the numerical studies with simulated data, developed to illustrate the behavior of the proposed nonparametric predictor (np). For the sake of comparison, the cokriging (cokr) and the kriging (kr) approaches were also implemented.
The samples used in the analyses were generated from bivariate random processes, modeled as given in (1), for D = [0, 1] × [0, 1] and p = 2. To derive correlated variables Z 1 and Z 2 , we took Y 1 and Y 2 to follow a linear model of coregionalization [9], so that where V i represent univariate and independent stationary processes, with zero mean and unit variance, and vectors M i = (m 1i , m 2i ) t , for some real values m ij and i, j = 1, 2. Equivalently, the bivariate random process was obtained as with function µ i denoting the trend of Z i and M = (m ij ) being a 2 × 2 matrix, whose i-th column equals M i , for i = 1, 2.
Various semivariogram models γ i were considered to characterize the dependence structure of V i , with different values for the nugget effect τ 2 i and the range (or the asymptotic range) φ i , which are specified in each analysis. Under this configuration, one has the following cross-covariograms for process Z: The cross-semivariograms are given by: The correlation function between Z 1 and Z 2 amounts to and, therefore: We focused on prediction with heterotopic data. Thus, the sampling locations for Z 1 and Z 2 were respectively selected as random sites s 1j 1 drawn from the uniform distribution on D, for j 1 = 1, ..., n 1 , and equispaced sites s 2j 2 within D, for j 2 = 1, ..., n 2 . In addition, we took a fixed regular grid of 10 × 10 locations s 0k in D, with k = 1, ..., 100, for the prediction points. Then, for each model, 500 samples were generated from Z 1 at locations s 1j 1 and s 0k , as well as from Z 2 at sites s 2j 2 , for different sizes n 1 and n 2 . To assess the results derived for each prediction method, the corresponding averaged mean squared prediction error (AMSPE) was approximated as follows: 1 (s 0k ) denotes the value assigned to Z 1 (s 0k ) in the m-sample andẐ (m) 1 (s 0k ) stands for the prediction of Z 1 (s 0k ) obtained through the m-sample, for m = 1, ..., 500.
Implementation of the three predictors was developed by using different packages of the R library. The kriging and the cokriging predictors were computed with the gstat [30] and the RandomFields [31] packages. The npsp package [32] was employed to obtain predictor (6), combined with the locpol package [33], to yield the nonparametric (local-linear) trend estimates. Regarding the bandwidths needed to derive (6), h i was chosen as given in (9) and selector h was attained through (10), by considering S 1t as the set of sampling locations for Z 1 and S 1v as the set of prediction sites.
For the first analysis, constant trends µ 1 and µ 2 were taken, as well as Gaussian random fields V 1 and V 2 , with zero mean, unit variance and isotropic semivariograms following a pure nugget model and a spherical model, respectively, where τ 2 1 = 1, τ 2 2 = 0 and φ 2 = 0.5. We checked the performance of the nonparametric predictor for distinct choices of vectors M 1 and M 2 , given by 2) t and M 2 = ( √ 0.9, √ 0.1) t , so that ρ 12 (0) approximately equals 0.99, 0.74 and 0.57, respectively. For implementation of the kriging and cokriging, constant trends were assumed and the estimation of the dependence structure was addressed parametrically in both cases, through the theoretical models. Table 1 summarizes the values achieved for the AMSPE with different sample sizes, where the values of ρ 12 (0) have been included in the table to allow the identification of the models. However, it is important to notice that the relationship among the values of ρ 12 (0) for the different models does not necessarily hold for the cross-correlation at other lags. Table 1. AMSPE values obtained for several sample sizes n 1 and n 2 and correlations ρ 12 , for data drawn from random fields Z 1 and Z 2 with constant trends, derived from Gaussian processes V 1 and V 2 with nugget and spherical semivariograms, respectively. For implementation of the kriging and cokriging approaches, the dependence structures of V 1 and V 2 were estimated through the theoretical models. The kriging technique does not incorporate the information provided by the secondary variable, giving rise to the largest errors, as shown in Table 1. The best performance is exhibited by the cokriging predictor, for all sample sizes and correlation functions. An intermediate behavior is observed for the nonparametric approach, with values close to the cokriging predictions, specially for greater sizes. Another remarkable feature is the reduction in the errors achieved for the same size n 1 , with the cokriging method and the new predictor, under an increment of the secondary sample size. Anyway, the results obtained with the proposed methodology are appealing, if we bear in mind the advantages given to the kriging and cokriging procedures. Indeed, the latter methods are optimal under gaussianity and their application was accomplished by a correct specification of the trend and the semivariogram functions, parametrically estimated through the theoretical models.
A deeper analysis was carried out for the bivariate data generated with vectors M 1 = ( √ 0.1, √ 0.2) t and M 2 = ( √ 0.9, √ 0.8) t that provide the highest cross-correlation at lag 0, namely, ρ 12 (0) = 0.99. Table 2 and Figure 1 display the results achieved for the AMSPE, with several sizes n 1 and n 2 , where a similar pattern to the one previously described is shown. Indeed, those methods that make use of the auxiliary data (cokriging and nonparametric predictors) yield the smallest errors and they decrease for larger sizes n 1 and n 2 , with a slight outperformance of the cokriging approach over the nonparametric one. This fact may again be due to the favorable scenario designed for the cokriging technique, although the difference between both procedures tends to diminish as n 1 and n 2 increase. Table 2. AMSPE values obtained for several sample sizes n 1 and n 2 and correlation ρ 12 (0) = 0.99, for data drawn from random fields Z 1 and Z 2 with constant trends, derived from Gaussian processes V 1 and V 2 with nugget and spherical semivariograms, respectively. For implementation of the kriging and cokriging approaches, the dependence structures of V 1 and V 2 were estimated through the theoretical models. Thus, the first study confirms some benefits of using the cokriging technique, when Gaussian processes are considered, with a correct specification of the estimated trend and spatial dependence. Then, a further step in our research is addressed to analyze the development of the new methodology for prediction, under deviations from the aforementioned features.
We started by checking the effect of misspecification in the parametric estimation of the semivariogram functions. For the latter purpose, in the second study, bivariate data were drawn from independent Gaussian processes V 1 and V 2 , with isotropic exponential (Exp) and Matérn (Mat) semivariograms, respectively. The Matérn model involves an additional parameter ν and the particular choice of ν = 0.5 provides the exponential model. To generate the samples in our analyses, we considered constant trends µ 1 and µ 2 , as well as the semivariogram parameters τ 2 i = 0 and φ i = 0.5, for i = 1, 2, together with ν = 2 for the Matérn function. The cross-semivariograms γ ii were obtained as given in (12), through several combinations of semivariogram models selected for V 1 and V 2 , which include the theoretical ones (Exp-Mat), as well as misspecified alternatives obtained by considering the spherical (Sph) family for V 1 or the Gaussian (Gau) family for V 2 . Figure 2 depicts the form of the cross-semivariograms employed in the current study, so that the functions derived under a correct specification of the theoretical model are compared with their counterparts, achieved with a partially misspecified model (Figure 2(a)) and a totally misspecified one (Figure 2(b)) . These plots show that the wrong selection of the parametric families leads to inaccurate estimates of the semivariograms at small distances, with the largest deviations observed for the totally misspecified scenario, as expected, for which the resulting nuggets clearly differ from the corresponding values attained through the theoretical model. These variations of the semivariograms estimates at lags close to zero should have a noticeable effect on prediction, which was checked in the analysis described below.  Table 3 presents the results of the second numerical study, addressed to compare the performance of the three prediction approaches, under different specifications of the dependence structure. As expected, the use of the theoretical models (Exp-Mat) gives rise to the best results, with the smallest errors attained through the cokriging and the nonparametric methods, showing some distance to the corresponding values provided by the kriging predictor. On the other hand, the misspecification does not have an effect on the new methodology for prediction, unlike what happens to the other procedures. Indeed, the wrong selection of one of the semivariogram models (Exp-Gau) leads to an augment in the errors derived through the kriging and cokriging techniques. This trend is even more evident when using a totally misspecified model (Sph-Gau), which results in an additional increment of the AMSPE values obtained. Table 3. AMSPE values obtained for several sample sizes n 1 and n 2 , for data drawn from random fields Z 1 and Z 2 with constant trends, derived from Gaussian processes V 1 and V 2 with exponential and Matérn semivariograms, respectively. For implementation of the kriging and cokriging approaches, the dependence structures of V 1 and V 2 were estimated through the theoretical models (Exp-Mat), together with misspecified alternatives either for the second process (Exp-Gau) or for both processes (Sph-Gau).

Exp-Mat
Exp-Gau Sph-Gau In the previous studies, the trends µ 1 and µ 2 were taken as constant values, so the third analysis is focused on exploring the performance of the three methods considered for prediction under non-constant trend. With this aim, data were generated from model (11), with the same characteristics as in the second study, except that now linear trends were selected, given by µ i (s) = 1 + (−1) i+1 (s 1 + s 2 ), for s = (s 1 , s 2 ) and i = 1, 2. Table 4 shows the errors achieved with the proposed nonparametric predictor and the cokriging approach. In the former case, the trend was nonparametrically estimated, whereas for the cokriging predictor, the approximation of the trend was developed through a constant and a linear models. Table 4. AMSPE values obtained for several sample sizes n 1 and n 2 , for data drawn from random fields Z 1 and Z 2 with linear trends, derived from Gaussian processes V 1 and V 2 with exponential and Matérn semivariograms, respectively. For implementation of the cokriging approach, the dependence structures of V 1 and V 2 were estimated through the theoretical models and the trend estimation was addressed either by considering a constant model (cm) or a linear model (lm). The results in Table 4 reflect that the misspecification of the trend models yields the largest errors, even though the cokriging approach was implemented with the right models for the semivariograms. On the other hand, under a correct characterization of the trend, the cokriging and the nonparametric predictors have a similar performance, with a slight superiority of the former one. In this respect, recall the advantages given to the cokriging prediction in this scenario, since this technique is optimal under gaussianity from the random processes, together with the fact that the dependence structures and trends involved in this approach were correctly specified.
In the final numerical study, our aim was to analyze the effects that the deviations from gaussianity might have on the results achieved with the three predictors considered in the current research. Thus, to simulate a non-Gaussian bivariate process, modeled as in (11), we drew non-Gaussian processes V 1 and V 2 , by taking them as V i (s) = W i (s) 2 , for some Gaussian processes W i , with zero mean and unit variance, for i = 1, 2. This way of proceeding led to non-Gaussian but stationary random fields V i , with mean 1, variance 2 and semivariogram γ i (t) = 2γ W i (t)(2 − γ W i (t)), where γ W i stands for the semivariogram of W i . In particular, for data generation in the current analysis, W 1 and W 2 were selected to have the same characteristics as V i in the second study, so their semivariograms followed the isotropic exponential and Matérn models, respectively, with parameters ν = 2, τ 2 i = 0 and φ i = 0.5, for i = 1, 2. The results are summarized in Table 5. Table 5. AMSPE values obtained for several sample sizes n 1 and n 2 , for data drawn form random fields Z 1 and Z 2 with constant trends, derived from V 1 = W 2 1 and V 2 = W 2 2 , where W 1 and W 2 are zero mean Gaussian processes, with exponential and Matérn semivariograms, respectively. For implementation of the cokriging approach, the dependence structures of V 1 and V 2 were estimated through the exponential and Matérn models, respectively. The errors displayed in Table 5 give account of the best performance of the nonparametric predictor over the kriging and cokriging techniques, when the underlying distribution of the random processes V 1 and V 2 departs from gaussianity. Indeed, several problems arise in this scenario with the application of the kriging and cokriging methods. On the one hand, these approaches are no longer optimal for non-Gaussian processes and, on the other, their behavior is highly dependent on the use of proper specifications of the trend and the semivariogram functions.

Assessment of Cd Concentrations in the Floodplain of the Meuse River
Now, we describe the application of the proposed methodology to assess the contamination by Cd in the geographical area associated to the Meuse data set. These data, presented in [34], contain measurements of different pollutants (zinc, cadmium, lead, copper or organic matter, among others), taken on alluvial soils in the floodplain of the Meuse river, close to Stein (Netherlands). A map of the zone is displayed in Figure 3, designed with the RgoogleMaps package [35] of the R library, which shows the 155 sampling locations considered for data collection. The aforementioned sample has been studied with different aims and methods, which include the application of the cokriging techniques. An example can be found in [36], where cokriging was employed with lead (Pb) as target variable and organic matter or zinc as separate auxiliary variables. In the current research, we focused our attention on Cd, since it represents a risk to human health, with the aim of mapping the concentrations of this pollutant on the whole area of interest. This task was accomplished by using the collected Cd and Pb values, due to the association between both elements, whose correlation coefficient with the available data amounted to 0.7989.
A preliminary analysis of the Cd and Pb data depicted asymmetry, thus they were log-transformed to continue the study, where we applied the universal cokriging and the proposed nonparametric tool. The former approach was implemented by fitting a linear model of coregionalization with isotropic exponential semivariograms.
For the choice of bandwidth h in (6), as given in (10), we randomly divided the set of sampling sites into two disjoint subsets, one for training (with 132 sites) and the other for validation (with 23 sites), marked on Figure 3 as points and crosses, respectively. To proceed in this way, we first selected the spatial locations for the training set from the whole set of sampling sites and the remaining locations formed the validation set. In addition, aiming to solely use the value of one variable at each site, the training set was again split into two separate groups, so that we just considered the values of Cd collected at one of them (with 52 sites) and those of Pb at the other (with 80 sites), represented by red and green points on Figure 3, respectively. The resulting training data set was used to compute predictor (6) with different bandwidths and then we selected the one that best approximated the log-values of the Cd concentrations at the validation set. The bandwidth h glo achieved was employed to proceed with the new methodology.
Finally, the predictions derived with the data in the log-scale and both procedures were back-transformed to the original scale. The results attained for the Cd concentrations (in mg/kg soil), on the geographical area associated to the Meuse data set, are represented in Figure 4. To measure the accuracy of each mechanism, the AMSPE values were computed at the validation set and they amounted to 2.2737 and 0.8189 for the cokriging and the nonparametric approaches, respectively, thus giving advantage to the latter one. The maps displayed in Figure 4 give evidence of the differences between the two prediction methods considered in the current study, although they also exhibit some common patterns. On the one hand, the area with the smallest levels of Cd, identified with the dark blue color, is located in each case on the central part of the region, towards the eastern border, being of larger size the one achieved with the universal cokriging. Then, as we move away from each of the aforementioned areas, the predicted Cd content increases and the highest values are mainly concentrated on the western border of each domain and its vicinity. Nevertheless, the resulting hot spots range in different magnitudes for the two prediction approaches, so that the universal cokriging provides the largest Cd accumulations, even surpassing the green color scale and reaching values above 3 mg/kg soil, unlike what happens to the non-parametric method.
In summary, both ways of proceeding allow the identification of those zones that pose a threat to the population health for their pollutant content. However, the smaller error (AMSPE) achieved by the nonparameric procedure and its good theoretical properties present this predictor as a better candidate for assessment of the points at risk and the consequent decision-making.

Conclusions
The current work deals with the construction of a prediction map that shows the concentrations of a pollutant, the cadmium, in the whole region of interest. This issue may be addressed by using the available data of the target variable (Cd) and also incorporating information of a secondary correlated one (Pb), collected at various spatial sites. In geostatistics, problems of this kind are typically solved through the kriging or the cokriging techniques, depending on whether data of one or more variables are employed, respectively. These methods are optimal for Gaussian processes, although their practical implementation requires an appropriate characterization of the underlying dependence structure. This is not an easy task to be accomplished, specially when cokriging is applied, due to the number of functions to be approximated and the properties that the resulting estimates must satisfy. Thus, our research introduces an alternative kernel-type approach for spatial prediction with covariables, which was designed to overcome the previous drawbacks. Indeed, the new methodology just involves p cross-covariances (or cross-semivariograms), for a p-variate random process, instead of the p 2 +p 2 functions of more complex estimation that are needed for cokriging. To characterize the aforementioned p functions, as well as other unknown terms in our proposal, such as those dependent on the trends, we recommend proceeding through nonparametric procedures, to avoid the misspecification problem present in the parametric estimation. The choice of bandwidths, required to implement the kernel predictor, is also addressed in the current work. In particular, we propose a cross-validation method to derive a global bandwidth h, based on considering random training and validation sets. However, further research should be developed to analyze alternative mechanisms for the selection of sites to be included in each set and, even, to provide additional approaches for bandwidth selection, which could be derived from the application of the bootstrap methodology. The numerical studies carried out in the current study give account of the sensitivity of the kriging and the cokriging techniques to the proper specification of the trend and the cross-semivariograms, unlike what happens to the nonparametric predictor. In addition, deviations from gaussianity in the data are less noticeable when prediction is developed with the new methodology than with the other approaches. Then, the use of the nonparametric proposal is recommended over the cokriging to obtain more accurate predictions of the Cd concentrations, with Pb as auxiliary variable, so as to detect those areas with highest accumulation of Cd, which is crucial for health protection.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Property (7)
In this appendix, we check that relation (7) holds or, equivalently, thatẐ 1 (s) is asymptotically unbiased. With this aim, bear in mind that It follows that By making the change of variable y = s−u h in the previous integrals, one has that with supp(K d ) denoting the support of K d . Then, we can form the Taylor expansions for µ 1 (s − hy) and f (s − hy) about s, to obtain that Relation (A1) and the orders achieved in (A3) and (A4), as well as the application of the Slutsky's theorem, lead us to B(s) = O h 2 . Therefore, predictor (6) is asymptotically unbiased.

Appendix B. Proof of Property (8)
This appendix outlines the proof of relation (8), which establishes the convergence to zero of the mean squared prediction error.
To start, one has that by using in the latter equality the orders established in (A3) and (A4), together with Later, we will prove that Thus, by considering (A4)-(A6), as well as applying Slutsky's theorem, we obtain that the mean squared prediction error ofẐ 1 (s) tends to zero, since M(s) = o (1).
Then, it only remains to state the validity of (A6). This relation follows straightforwardly if differs from 0, for k = 1, 2, we will see that the order established in (A6) holds, due to the properties indicated below, whose demonstration will be provided later, α 1i k C 1i k (0) = σ 1 σ i k + O h i k , for k = 1, 2, (A7) Indeed, the use of (A7) and (A8) leads us to A 5i 1 j i 1 i 2 j i 2 (s) = w i 1 j i 1 (s)w i 2 j i 2 (s) which yields relation (A6). Now, to complete this proof, we check that properties (A7) and (A8) follow, when L C 1i k (0) 2 −σ 2 1 σ 2 i k h i k = 0, for k = 1, 2. With this aim, bear in mind that L is compactly supported and that the previous terms differ from zero, to obtain that C 1i k (0) 2 − σ 2 1 σ 2 i k ≤ h i k supp(L), for k = 1, 2, with supp(L) denoting the support of function L. The latter inequality directly leads to the relation established in (A7).
On the other hand, for t ∈ I R d , one has Cov Z i 1 (s), Z 1 (s − t) + Cov Z i 1 (s), Z i 2 (s − t) − C 1i 2 (0) We first deal with (I), to achieve that (I) = C 1i 2 (0)C i 1 1 (t) where relation (A7) has been used to obtain the last term. Secondly, the application of Hölder's inequality on (II) yields where the penultimate term was derived from (A7). Then, combination of (A10)-(A12) allows us to conclude the proof of property (A8). The final comment in this appendix is related to the practical implementation of the mean squared prediction error M(s), for a given data set. By taking into account (A5), as well as (A2) and (A9), we propose approximating M(s) througĥ