Cokriging Prediction Using as Secondary Variable a Functional Random Field with Application in Environmental Pollution

Giraldo, Ramón; Herrera, Luis; Leiva, Víctor

doi:10.3390/math8081305

by Ramón Giraldo ¹, Luis Herrera ¹ and Víctor Leiva ²,*

¹ Department of Statistics, Universidad Nacional de Colombia, Bogotá 111321, Colombia

² School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile

* Author to whom correspondence should be addressed.

Mathematics 2020, 8(8), 1305; https://doi.org/10.3390/math8081305

Submission received: 16 July 2020 / Revised: 3 August 2020 / Accepted: 4 August 2020 / Published: 6 August 2020

(This article belongs to the Special Issue Statistical Simulation and Computation)

Abstract

Cokriging is a geostatistical technique used for spatial prediction when realizations of a random field are available. If a secondary variable is cross-correlated with the primary variable, both variables may be employed for prediction by means of cokriging. In this work, we propose a predictive model based on cokriging when the secondary variable is functional. As in ordinary cokriging, a co-regionalized linear model is needed in order to estimate the corresponding auto-correlations and cross-correlations. The proposed model is used to predict environmental pollution by particulate matter, considering wind speed curves as a functional secondary variable.

Keywords: functional data analysis; functional random fields; geostatistics; kriging; R software

1. Introduction and Bibliographical Review

Geostatistics is a branch of statistics used for performing spatial prediction when realizations of univariate or multivariate random fields are available [1]. Kriging, cokriging, and multivariate kriging are some techniques that are used for prediction in geostatistics (also known as spatial statistics) [2]. These techniques are generalized forms of univariate and multivariate linear regression models based on georeferenced data, which often have spatial dependence. Those interested in an overview about these techniques, and spatial statistics in general, are referred to [1,2,3,4].

Geostatistics is used in many different areas, such as ecology, geology, meteorology, and mining. In some applications in these areas, multiple variables are measured at each location of the region of interest. For example, in environmental studies, data on several air pollutants, such as carbon monoxide (CO), ozone (O3), and particulate matter (PM), are often collected at each monitoring station. Air quality can be evaluated in terms of concentrations of PM with diameter smaller than 10 μm (PM10) [5,6,7,8,9,10]. These pollutants may be correlated, in which case using all observed variables may improve the prediction [11].

Cokriging is a generalization of the kriging technique, developed in order to deal with multivariate spatial interpolation (prediction) [12,13]. The cokriging technique has been widely studied and adapted to different practical scenarios. It takes advantage of the covariance between two or more realizations of cross-correlated random fields. Some variations of cokriging are ordinary cokriging [14], universal cokriging [15], collocated cokriging [16], and indicator cokriging [17]. In the cokriging technique, information on one or more secondary variables is used to reduce the prediction variance of the primary variable. It has been shown that cokriging is better than kriging in terms of the costs and precision of the predictions when the random fields under study are cross-correlated [18].

Nowadays, the geostatistical prediction techniques (kriging and cokriging) have been extended to deal with realizations of functional random fields [19]. This is a relatively new research area, called functional geostatistics [20,21].

The first attempt to apply geostatistical interpolation techniques to the prediction of curves at unvisited sites was the pioneering work published in [22]. Later, several kriging predictors for stationary [23,24,25] and non-stationary [26,27,28] functional random fields were proposed. In such proposals, a spatial prediction of a functional variable was performed. In the case of stationarity, only the realization of the underlying functional random field is considered. However, in the non-stationary case, scalar covariates (for example, altitude, latitude, or longitude) or functional covariates can be used to estimate the trend. In the methodologies proposed in [20,23] for predicting functional variables, a preliminary step using basis functions is considered. After expanding in terms of basis functions, geostatistical prediction of functional variables becomes an ordinary cokriging prediction, which consists of cokriging on the estimated coefficients.

Note that cokriging can be used to predict either a scalar variable (using data of a multivariate random field) or a functional variable (when data of a functional random field are studied and the curves are represented by basis functions). In many areas, and particularly in environmental sciences, it is common to obtain spatial data of scalar and functional variables simultaneously. For instance, data of annual precipitation and average monthly temperature profiles, collected at 35 weather stations in Canada, were analyzed in [29]. Thus, when data of scalar and functional variables for several sites of a region are available, it could be of interest to perform spatial prediction by using all available data. To the best of our knowledge, no studies on cokriging prediction using a functional random field as secondary variable have been considered in the literature to date.

From the perspective of the empirical application of our work, note that the relation between PM10 and wind speed (WS) has been widely studied. For example, a forecasting model was formulated in [30] for the daily average PM10 using as input temperature, cloud cover, wind direction, and WS. Similarly, a model was proposed in [31], where temperature, relative humidity, and WS were used as covariates. As a general rule, PM10 concentrations decrease as WS increases, but this is not always the case. For example, a direct relationship between these variables was found in the Kathmandu valley [32]. All of the above-mentioned studies evaluated the relationship between PM10 and WS by using scalar data of both variables. Thus, to the best of our knowledge, no empirical studies relating PM10 and WS curves (functional data) by means of a spatial or co-regionalized linear model (CLM) have been considered until now. An overview of the CLM and some algorithms to estimate co-regionalization matrices is given in [33,34].

Therefore, the objectives of this paper are twofold. The main objective is to propose a cokriging predictor of a scalar primary variable considering, as secondary variable, the realization of a functional random field. The secondary objective is to apply our methodology for relating PM10 and WS curves by using a CLM. The proposed cokriging predictor has a similar expression to the ordinary cokriging predictor employed in multivariate geostatistics. However, the secondary variable is now given by the coefficients obtained after representing the functional data in terms of basis functions. As in the ordinary scenario, a CLM is utilized to estimate the auto-correlation and cross-correlation functions required for estimating the corresponding parameters. The empirical application is based on real data to predict PM10 using a functional WS in Bogotá city, Colombia, one of the places with the worst air quality around the world [35].

The paper is organized as follows. Section 2 gives an overview of cokriging prediction and introduces the cokriging technique based on scalar and functional variables. In Section 3, we use the proposed methodology to perform spatial prediction of the primary variable PM10 using curves of WS as secondary variable. The article ends with Section 4, which provides conclusions, discussion, and suggestions for further research. Some detailed mathematical expressions are presented in Appendix A.

2. Cokriging Using as Secondary Variable a Functional Random Field

In this section, we provide background of the ordinary cokriging predictor [36,37]. Subsequently, we outline the procedure to perform cokriging prediction based on a functional secondary variable. An expression for the predictor is deduced as a natural extension of the ordinary cokriging predictor when secondary variables are considered. Hence, we show how using basis functions to expand the functional variable allows us to come back to an ordinary scenario where the estimation procedure is known.

2.1. Ordinary Cokriging Predictor

A generalization of kriging prediction can be obtained when, instead of one stochastic process, we consider realizations of $m$ random fields. Let $(X_{1}(s), \dots, X_{m}(s))$ be a multivariate spatial vector of $m$ random fields on a region $D \subset \mathbb{R}^{d}$. Here, we assume that the $m$ processes are stationary, which means that the mean vector is assumed to be constant for all $s \in D$, and the covariance and variogram functions only depend on the distance vector and not on the location $s$.

Let $X_{1}(s)$ be the primary variable and $X_{j}(s)$, for $j = 2, \dots, m$, the secondary variables. The cokriging predictor of $X_{1}(s)$ at the location $s_{0}$ is given by [36]

$$\hat{X}_{1}(s_{0}) = \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} α_{i2} X_{2}(s_{i}) + \dots + \sum_{i=1}^{n} α_{im} X_{m}(s_{i}) = \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} \sum_{j=2}^{m} α_{ij} X_{j}(s_{i}). \tag{1}$$

Observe that the predictor defined in (1) is unbiased if $\sum_{i=1}^{n} λ_{i} = 1$ and $\sum_{i=1}^{n} α_{ij} = 0$, for $j = 2, \dots, m$.

We use the following notations:

(i): $2 γ_{lq}(s_{i}, s_{j}) = \mathrm{Var}(X_{l}(s_{i}) - X_{q}(s_{j}))$, for $l, q = 1, \dots, m$.
(ii): $γ_{lq}^{⊤} = (γ_{lq}(s_{1}, s_{0}), \dots, γ_{lq}(s_{n}, s_{0}))$.
(iii): $Γ_{lq} = \begin{pmatrix} γ_{lq}(s_{1}, s_{1}) & \dots & γ_{lq}(s_{1}, s_{n}) \\ ⋮ & ⋱ & ⋮ \\ γ_{lq}(s_{n}, s_{1}) & \dots & γ_{lq}(s_{n}, s_{n}) \end{pmatrix}.$

By using the method of Lagrange multipliers, the cokriging system of equations must be solved in order to minimize the mean squared prediction error subject to the unbiasedness constraints [36]; see its matrix representation in Appendix A. As usual in multivariate geostatistics, the variograms and cross-variograms needed to solve the system given in (A1) of Appendix A are obtained by fitting a CLM.
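As a rough illustration of how the system (A1) can be assembled and solved, the following sketch treats the smallest case of one primary and one secondary variable. It is not the authors' code: the site coordinates, the co-regionalization matrices `B_nug` and `B_gau`, and the range `a` are made-up values, and the sketch assumes a CLM with a nugget plus a Gaussian structure whose direct and cross variograms vanish at zero lag.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def clm_variogram(l, q, p1, p2, B_nug, B_gau, a):
    """Assumed CLM: nugget plus Gaussian structure, zero at zero lag."""
    h = math.dist(p1, p2)
    if h == 0.0:
        return 0.0
    return B_nug[l][q] + B_gau[l][q] * (1.0 - math.exp(-(h / a) ** 2))

def cokriging_weights(sites, s0, B_nug, B_gau, a):
    """Return (lambda, alpha) for one primary and one secondary variable."""
    n = len(sites)
    size = 2 * n + 2  # 2n weights plus two Lagrange multipliers
    A = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for l in range(2):
        for i in range(n):
            for q in range(2):
                for j in range(n):
                    A[l * n + i][q * n + j] = clm_variogram(
                        l, q, sites[i], sites[j], B_nug, B_gau, a)
            # Right-hand side: variogram between datum i of variable l
            # and the primary variable at the prediction site s0.
            b[l * n + i] = clm_variogram(l, 0, sites[i], s0, B_nug, B_gau, a)
            # Lagrange column and unbiasedness row for variable l.
            A[l * n + i][2 * n + l] = 1.0
            A[2 * n + l][l * n + i] = 1.0
    b[2 * n] = 1.0  # primary weights sum to 1; secondary weights sum to 0
    x = solve(A, b)
    return x[:n], x[n:2 * n]

# Hypothetical inputs, chosen only to make the example run.
sites = [(0.0, 0.0), (4.0, 1.0), (1.0, 5.0), (6.0, 6.0)]
B_nug = [[1.0, 0.2], [0.2, 0.5]]
B_gau = [[9.0, 3.0], [3.0, 4.0]]
lam, alpha = cokriging_weights(sites, (3.0, 3.0), B_nug, B_gau, a=8.0)
print(sum(lam), sum(alpha))
```

The two printed sums should be close to 1 and 0, which is a quick check that the unbiasedness constraints of the predictor (1) are satisfied by the computed weights.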

2.2. Cokriging Predictor Using Functional Secondary Variables

Now, suppose that the number of secondary variables is very large. In that case, they can be replaced by a functional variable, so that the cokriging predictor uses the realization of a functional random field as secondary information. Following the idea used in [29] to introduce the functional linear model for a scalar response, suppose in (1) that $m \to \infty$. Thus, the parameters $α_{ij}$ defined in (1) can be replaced by a function $α_{i}(t)$, whereas the secondary variables $X_{2}(s_{i}), \dots, X_{m}(s_{i})$ may be replaced by a functional variable $X_{s_{i}}(t)$. Therefore, the cokriging predictor of $X_{1}(s)$ at the location $s_{0}$ now takes the form

$$\hat{X}_{1}(s_{0}) = \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} \int_{T} α_{i}(t) X_{s_{i}}(t)\, dt, \tag{2}$$

where $\hat{X}_{1}(s_{0})$ is the prediction of the primary variable at an unsampled site $s_{0}$, $λ_{i}$ provides the effect of the scalar variable $X_{1}(s_{i})$ on the prediction, $X_{s_{i}}(t)$ is a functional variable at the site $s_{i}$, and $α_{i}(t)$ is a functional parameter, which gives the weight of $X_{s_{i}}(t)$ on the prediction, for $i = 1, \dots, n$.

We employ an approach based on basis functions of B-spline or Fourier type in order to perform the parameter estimation [29]. We expand the functional variables and parameters, respectively, by

$$X_{s_{i}}(t) = \sum_{j=1}^{k} a_{ij} ϕ_{j}(t) = a_{i}^{⊤} ϕ(t), \tag{3}$$

$$α_{i}(t) = \sum_{j=1}^{k} α_{ij} ϕ_{j}(t) = α_{i}^{⊤} ϕ(t), \tag{4}$$

where $ϕ(t)^{⊤} = (ϕ_{1}(t), \dots, ϕ_{k}(t))$ is the vector of basis functions and $α_{i}^{⊤}, a_{i}^{⊤}$ are vectors of estimated coefficients, which are obtained by least squares [29]. By using the expansions of the functional variables and parameters defined in (3) and (4), respectively, the predictor stated in (2) is now given by

$$\begin{aligned} \hat{X}_{1}(s_{0}) &= \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} α_{i}^{⊤} \left(\int_{T} ϕ(t) ϕ^{⊤}(t)\, dt\right) a_{i} \\ &= \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} α_{i}^{⊤} W a_{i} = \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} α_{i}^{⊤} a_{i}^{*} \\ &= \sum_{i=1}^{n} λ_{i} X_{1}(s_{i}) + \sum_{i=1}^{n} \sum_{j=1}^{k} α_{ij} a_{ij}^{*}, \end{aligned} \tag{5}$$

where $W = \int_{T} ϕ(t) ϕ^{⊤}(t)\, dt$ is the $k \times k$ matrix of inner products of the basis functions, $α_{ij}$ are defined in (4), and

$$a_{i}^{*} = (a_{i1}^{*}, \dots, a_{ij}^{*}, \dots, a_{ik}^{*})^{⊤}. \tag{6}$$

Details of the vector $a_{i}^{*}$ are provided in (A2) of Appendix A.
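The step from (2) to (5) may be easier to follow when the integral term is written out. The lines below only substitute the expansions (3) and (4) into that term and use the fact that $a_{i}^{⊤} ϕ(t) = ϕ^{⊤}(t) a_{i}$ is a scalar; $W$ denotes the matrix of basis inner products that appears in (A2).

```latex
\begin{aligned}
\int_{T} α_{i}(t)\, X_{s_{i}}(t)\, dt
  &= \int_{T} \big(α_{i}^{⊤} ϕ(t)\big)\big(ϕ^{⊤}(t)\, a_{i}\big)\, dt \\
  &= α_{i}^{⊤} \Big(\int_{T} ϕ(t)\, ϕ^{⊤}(t)\, dt\Big) a_{i}
   = α_{i}^{⊤} W a_{i}
   = α_{i}^{⊤} a_{i}^{*}.
\end{aligned}
```

Because $α_{i}$ and $a_{i}$ do not depend on $t$, they can be taken outside the integral, which leaves only the basis functions inside.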

Remark 1. A particular case of the predictor given in (5) is obtained when $a_{ij}^{*} = a_{ij}$. This occurs when orthonormal basis functions, such as the Fourier basis, are used. The predictor proposed in (5) is similar to the ordinary cokriging predictor stated in (1), but considers coefficients of basis functions instead of variables. Once the coefficients $a_{ij}^{*}$ defined in (6) are calculated, we have a cokriging prediction as in the ordinary case and then the theory described above may be used. A detailed review of multivariate geostatistics, and particularly of the cokriging technique, is presented in [3].
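The particular case in Remark 1 can be checked numerically. The sketch below approximates the matrix of inner products in (A2) for an orthonormal Fourier basis with a midpoint rule and applies it to a coefficient vector. The normalization of the basis, the period of 24 time units, and the coefficient values are assumptions made for illustration only.

```python
import math

def fourier_basis(t, k, period):
    """Evaluate k orthonormal Fourier basis functions at time t (k odd)."""
    omega = 2.0 * math.pi / period
    values = [1.0 / math.sqrt(period)]
    scale = math.sqrt(2.0 / period)
    for r in range(1, (k - 1) // 2 + 1):
        values.append(scale * math.sin(r * omega * t))
        values.append(scale * math.cos(r * omega * t))
    return values

def gram_matrix(k, period, n_grid=2000):
    """Approximate W[j][l], the integral of phi_j * phi_l over one period."""
    dt = period / n_grid
    W = [[0.0] * k for _ in range(k)]
    for g in range(n_grid):
        phi = fourier_basis((g + 0.5) * dt, k, period)  # midpoint rule
        for j in range(k):
            for l in range(k):
                W[j][l] += phi[j] * phi[l] * dt
    return W

k, period = 7, 24.0
W = gram_matrix(k, period)
a_i = [3.1, -0.4, 0.9, 0.2, -0.1, 0.05, 0.3]  # made-up coefficients
a_star = [sum(W[j][l] * a_i[l] for l in range(k)) for j in range(k)]
print(max(abs(s - a) for s, a in zip(a_star, a_i)))
```

Under these assumptions `W` is numerically the identity matrix, so `a_star` coincides with `a_i` up to rounding. For a basis that is not orthonormal, such as B-splines, `W` would differ from the identity and the multiplication in (A2) would change the coefficients.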

3. Real Data Analysis

In this section, we predict PM10 in Bogotá with our methodology. First, we define the problem upon study and then the cokriging prediction for PM10 is performed using WS curves.

3.1. Definition of the Problem upon Study

Air pollution can be assessed by means of concentrations of PM10 (in $μ\mathrm{g}/\mathrm{m}^{3}$) [6,8]. The inhalation of PM is known to lead to serious health problems [5,10,30]. These particles are small enough to penetrate the human respiratory tract and, for this reason, they are potential disrupters of the normal functioning of the organism [31]. A model capable of predicting the maximum of PM10 in unmonitored areas of a city can therefore help environmental institutions to alert, when necessary, the population at risk. We use the methodology proposed in this work to perform spatial prediction of maximum PM10 in Bogotá city based on data of this variable and WS data.

The data under analysis were collected at ten stations of the air quality monitoring network in Bogotá city; see Figure 1. For each station, a maximum PM10 value is obtained; see Table 1. The maximum values were calculated from data that were collected hourly between 26 January 2011 (12:00 p.m.) and 3 February 2011 (2:00 p.m.). The bubble map of Figure 1 shows that the highest PM10 values are recorded in the south-western part of the city (Kennedy and Carvajal stations). This is an expected result, because such stations are located in industrial zones [35]. The bubble map also indicates that stations in the central and north-east zones of the city (such as IDRD, Ferias, Usaquen) have the lowest values, possibly because they are surrounded by wetlands and urban woodlands [35].

As mentioned in the introduction of this paper, the relation between WS and PM10 has been widely analyzed, but all of the mentioned studies evaluated the relationship using scalar data of both variables. In the present application, we consider a different approach, where the temporal variation of WS (a curve considered as functional data) is used for predicting a scalar variable (corresponding to the maximum PM10 in a period of eight days).

The data of WS (in m/s) at each station were collected every two hours between 26 January 2011 (10:00 a.m.) and 3 February 2011 (12:00 p.m.). In total, we have a WS data set of 98 observations at each monitoring station, displayed in Figure 2. Two aspects can be highlighted from the temporal pattern of the WS data. First, the time series have their highest peaks at 12:00 p.m. Second, three stations have considerably higher values than the others (curves in red, black, and dark blue in Figure 2). These values correspond to the Fontibon, Kennedy, and Puente Aranda stations; see the bubble map of Figure 1.

From Table 1, note that two of the highest PM10 values are obtained at stations that have high WS peaks (Kennedy and Puente Aranda). As mentioned, although it is not usual, some studies have reported direct correlation between the PM10 and WS variables.

3.2. Cokriging Prediction of PM10 Using WS Curves

There are several options for smoothing time series by using basis functions. Frequently, B-splines and Fourier bases are considered [29]. The second option is applied when a seasonal variation is present [29]. Given that the time series show a periodic pattern in Figure 2, we consider a Fourier basis of dimension $k = 7$ (estimated by cross-validation [29]) for smoothing the data. Table 1 provides the estimated coefficients of this basis. The obtained WS curves are shown in Figure 3, where the pattern described above for the original data is observed more clearly. There are three stations with higher WS magnitudes than the remaining ones (red, black, and blue lines).
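A minimal sketch of this smoothing step is given below for a single synthetic series; it is not the authors' R code and does not use the Bogotá data. It assumes 98 equally spaced observations (one every two hours) and a basis period equal to the number of observations times the sampling step. Under that assumption the columns of the design matrix are orthogonal, so the least-squares coefficients reduce to weighted sums and no linear system has to be solved.

```python
import math

def fourier_basis(t, k, period):
    """Evaluate k orthonormal Fourier basis functions at time t (k odd)."""
    omega = 2.0 * math.pi / period
    values = [1.0 / math.sqrt(period)]
    scale = math.sqrt(2.0 / period)
    for r in range(1, (k - 1) // 2 + 1):
        values.append(scale * math.sin(r * omega * t))
        values.append(scale * math.cos(r * omega * t))
    return values

def fit_fourier(y, step, k):
    """Least-squares Fourier coefficients for equally spaced data.

    Assumes the basis period is len(y) * step, which makes the design
    matrix columns orthogonal, so each coefficient is a weighted sum.
    """
    period = len(y) * step
    coef = [0.0] * k
    for i, yi in enumerate(y):
        phi = fourier_basis(i * step, k, period)
        for j in range(k):
            coef[j] += step * yi * phi[j]
    return coef, period

def smooth(t, coef, period):
    """Evaluate the fitted curve at time t."""
    phi = fourier_basis(t, len(coef), period)
    return sum(c * p for c, p in zip(coef, phi))

# Synthetic wind-speed-like series (hypothetical values, in m/s).
step, n_obs = 2.0, 98
omega = 2.0 * math.pi / (n_obs * step)
y = [2.0 + 0.8 * math.sin(omega * i * step)
     + 0.3 * math.cos(3 * omega * i * step) for i in range(n_obs)]

coef, period = fit_fourier(y, step, k=7)
fitted = [smooth(i * step, coef, period) for i in range(n_obs)]
print(max(abs(f - v) for f, v in zip(fitted, y)))
```

Since the synthetic series lies in the span of the seven basis functions, the fitted curve should reproduce it up to rounding. With real, noisy data, or with observation times that do not match the assumed period, a general least-squares solver would be needed instead of the weighted sums.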

In summary, for each monitoring station, we have a PM10 value and seven coefficients (resulting from fitting a Fourier basis to the WS data) that summarize the information given by the curves. The functional data are now represented by the seven coefficients. As mentioned, a complex spatial problem with a functional covariate thus becomes a classical problem with scalar covariates, which can be solved by means of an ordinary cokriging technique.

First, a CLM was fitted based on the data reported in Table 1. A Gaussian model [14] was considered for describing all simple and cross-variograms. The R software was used for the calculations [38], and an R package named gstat [39] was used to estimate the corresponding parameters. The data and R codes used in this empirical application are available at [40]. The estimated range of the simple and cross-variograms was 8 km (about one-third of the maximum distance between monitoring sites in Figure 1).
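For readers unfamiliar with the Gaussian variogram model, a small sketch of one common parameterization, $γ(h) = c_{0} + c\,(1 - e^{-(h/a)^{2}})$ for $h > 0$, follows. Whether the 8 km reported above refers to the range parameter $a$ or to the practical range is not stated in the text, so the value `a = 8.0` and the sill used here are assumptions for illustration.

```python
import math

def gaussian_variogram(h, partial_sill, range_par, nugget=0.0):
    """Gaussian variogram: nugget + partial_sill * (1 - exp(-(h/a)^2)), 0 at h = 0."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-(h / range_par) ** 2))

a = 8.0          # assumed range parameter, in km
sill = 1000.0    # made-up partial sill
for h in (0.0, 2.0, 4.0, 8.0, math.sqrt(3.0) * a):
    print(round(h, 2), round(gaussian_variogram(h, sill, a), 1))

# Fraction of the sill reached at the practical range sqrt(3) * a.
ratio = gaussian_variogram(math.sqrt(3.0) * a, sill, a) / sill
```

With this parameterization the variogram reaches about 95% of the sill at a distance of $\sqrt{3}\,a$, which is why a range parameter and a practical range can differ noticeably for the Gaussian model.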

Second, the estimated CLM was used to derive the matrices $Γ_{kk}$, for $k = 1, \dots, m$, defined in (A1) of Appendix A, which contain the spatial dependence. Subsequently, the cokriging coefficients, namely $λ_{i}$ and $α_{ij}$, for $i = 1, \dots, n$ and $j = 1, \dots, k$, defined in (5), were obtained. A regular grid of 10,000 points covering the study area was established, and the cokriging prediction was performed at each of these points. A map of the predictions obtained is displayed in Figure 4. In addition, a prediction variance was estimated at each point; the results are shown graphically in Figure 5.
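The last step can be sketched as follows: build a regular grid of 100 by 100 points and evaluate the predictor (5) from a set of weights. The bounding box, the PM10 values, the coefficients `a_star`, and in particular the weights `lam` and `alpha` are made-up numbers; the weights merely satisfy the unbiasedness constraints and are not a solution of the system (A1), which would have to be solved anew for every grid point.

```python
def regular_grid(xmin, xmax, ymin, ymax, nx=100, ny=100):
    """Return nx * ny points on a regular grid covering the bounding box."""
    xs = [xmin + (xmax - xmin) * i / (nx - 1) for i in range(nx)]
    ys = [ymin + (ymax - ymin) * j / (ny - 1) for j in range(ny)]
    return [(x, y) for y in ys for x in xs]

def predict(lam, alpha, x1, a_star):
    """Predictor (5): sum_i lam_i * x1_i + sum_i sum_j alpha_ij * a*_ij."""
    primary = sum(l * x for l, x in zip(lam, x1))
    secondary = sum(al * a
                    for row_al, row_a in zip(alpha, a_star)
                    for al, a in zip(row_al, row_a))
    return primary + secondary

# Hypothetical study area in km; 100 x 100 = 10,000 prediction points.
grid = regular_grid(0.0, 20.0, 0.0, 25.0)

# Toy example with n = 3 stations and k = 2 basis coefficients.
x1 = [120.0, 95.0, 150.0]                          # made-up PM10 maxima
a_star = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]      # made-up coefficients
lam = [0.5, 0.3, 0.2]                              # sums to 1
alpha = [[0.1, -0.2], [-0.3, 0.1], [0.2, 0.1]]     # each column sums to 0
prediction = predict(lam, alpha, x1, a_star)
print(len(grid), prediction)
```

In a full implementation the call to `predict` would sit inside a loop over `grid`, with `lam` and `alpha` recomputed from (A1) at each point together with the corresponding prediction variance.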

4. Concluding Remarks, Discussion, and Future Research

In this section, we provide conclusions of the technical aspects developed in this work and a discussion of the empirical study analyzed. In addition, some ideas on further work are mentioned.

4.1. Conclusions and Discussion

In this work, we have proposed a cokriging predictor using a functional secondary variable. We have used a co-regionalized linear model in order to estimate auto-correlations and cross-correlations.

From the application, the results were coherent with other studies that have described the behavior of PM10 in Bogotá city, one of the places with the worst air quality around the world, mainly due to the suspended material generated by mobile sources and the industrial sector [35]. Several aspects of PM10 behavior in the area under study can be mentioned. We started the analysis by assuming that the underlying stochastic process is stationary and, thus, applied an ordinary cokriging predictor. However, some local variations were evident (as described above with the bubble map of Figure 1). In general, PM10 concentrations are higher towards the south-west of the city and tend to decrease towards the center and north-east, as visualized in Figure 4. The highest concentrations were obtained in the industrial area of the city (darker gray), with values above 200.3 $μ\mathrm{g}/\mathrm{m}^{3}$. From the central area of the city to the north (intermediate shades of gray), the values of PM10 fluctuate between 110 $μ\mathrm{g}/\mathrm{m}^{3}$ and 200.3 $μ\mathrm{g}/\mathrm{m}^{3}$. The lowest concentrations were found in the eastern zone (the area closest to the city's hills). According to the map presented in Figure 5, there are areas with high prediction uncertainty (darker gray). These correspond to areas without monitoring stations; see Figure 1. As mentioned, the areas near the hills have the lowest values of PM10, which seems reasonable. However, the uncertainty of the prediction there is high. Therefore, any practical interpretation must be made carefully. The same comment applies to other areas far away from the sampling stations.

In summary, this paper reported the following findings:

(i): A cokriging predictor considering a functional secondary variable was proposed.
(ii): After smoothing by using basis functions, an ordinary cokriging was defined by means of several secondary variables (as many as basis functions are used for smoothing the data).
(iii): Cokriging was considered to be a better option than kriging, because including one or more secondary variables in the prediction process reduces uncertainty.
(iv): It was shown how to use the proposed methodology when there are many measurements of a secondary variable over time.
(v): An illustration with a real data set was considered to predict PM10 values in Bogotá city by using a cokriging predictor with wind speed as functional secondary variable.

The results of this work can be taken as a contribution to the explanation of the spatial variation of pollution within the city. Thus, our study can be a knowledge addition to the tool-kit of diverse practitioners, including environmental engineers, applied statisticians, and data scientists.

4.2. Further Work

Some themes for future research, which arose from the present investigation, are the following:

(i): A cokriging predictor with functional variables can be studied upon non-stationarity [21].
(ii): Extensions to the multivariate case are also of practical relevance [41].
(iii): Incorporation of temporal and quantile regression structures in the modeling, as well as errors-in-variables and PLS regression, is also of interest [42,43,44,45].
(iv): The derivation of diagnostic techniques to detect potential influential cases is needed, as such techniques are an important tool in all statistical modeling [44,46,47].
(v): The applications of the new methodology proposed in this investigation can be of interest in diverse areas where functional data analysis is considered [29].
(vi): It is of interest to study the asymptotic behavior and performance of maximum likelihood estimators in spatial models [41,48].
(vii): Autoregressive model-based fuzzy clustering can be used for detecting information redundancy in air pollution monitoring networks [49].
(viii): Time series clustering by a robust autoregressive metric can be applied to the study of air pollution [50], as can other robust estimation methods when outliers are present in the data set [51].

Therefore, the methodology proposed in this investigation poses new challenges and opens the door to exploring other theoretical and numerical issues. Research on these and other issues is in progress and the findings will be reported in future articles.

Author Contributions

Data curation, L.H. and R.G.; formal analysis, L.H., R.G. and V.L.; investigation, L.H., R.G. and V.L.; methodology, L.H., R.G. and V.L.; writing—original draft, R.G. and V.L.; writing—review and editing, V.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported partially by project grant “Fondecyt 1200525” (V. Leiva) from the National Agency for Research and Development (ANID) of the Chilean government.

Acknowledgments

The authors would also like to thank the Editor and Reviewers for their constructive comments, which led to an improved presentation of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this appendix, we provide some mathematical details of the cokriging system of equations, and of the coefficients of basis functions.

Appendix A.1. Cokriging System of Equations

The matrix representation of the cokriging system of equations to minimize the mean squared prediction error, subject to the unbiasedness constraints, by using the method of Lagrange multipliers, is given by

$$\begin{pmatrix} Γ_{11} & \dots & Γ_{1k} & \dots & Γ_{1m} & 1 & \dots & 0 & \dots & 0 \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ & ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ Γ_{k1} & \dots & Γ_{kk} & \dots & Γ_{km} & 0 & \dots & 1 & \dots & 0 \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ & ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ Γ_{m1} & \dots & Γ_{mk} & \dots & Γ_{mm} & 0 & \dots & 0 & \dots & 1 \\ 1^{⊤} & \dots & 0^{⊤} & \dots & 0^{⊤} & 0 & \dots & 0 & \dots & 0 \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ & ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ 0^{⊤} & \dots & 1^{⊤} & \dots & 0^{⊤} & 0 & \dots & 0 & \dots & 0 \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ & ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ 0^{⊤} & \dots & 0^{⊤} & \dots & 1^{⊤} & 0 & \dots & 0 & \dots & 0 \end{pmatrix} \begin{pmatrix} λ \\ ⋮ \\ α_{k} \\ ⋮ \\ α_{m} \\ δ_{1} \\ ⋮ \\ δ_{k} \\ ⋮ \\ δ_{m} \end{pmatrix} = \begin{pmatrix} γ_{1k} \\ ⋮ \\ γ_{kk} \\ ⋮ \\ γ_{mk} \\ 0 \\ ⋮ \\ 1 \\ ⋮ \\ 0 \end{pmatrix}. \tag{A1}$$

Appendix A.2. Coefficients of Basis Functions

The coefficients of basis functions can be obtained from

$$\begin{aligned} a_{i}^{*} = W a_{i} &= \begin{pmatrix} \int_{T} ϕ_{1}(t) ϕ_{1}(t)\, dt & \dots & \int_{T} ϕ_{1}(t) ϕ_{j}(t)\, dt & \dots & \int_{T} ϕ_{1}(t) ϕ_{k}(t)\, dt \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ \int_{T} ϕ_{j}(t) ϕ_{1}(t)\, dt & \dots & \int_{T} ϕ_{j}(t) ϕ_{j}(t)\, dt & \dots & \int_{T} ϕ_{j}(t) ϕ_{k}(t)\, dt \\ ⋮ & ⋱ & ⋮ & ⋱ & ⋮ \\ \int_{T} ϕ_{k}(t) ϕ_{1}(t)\, dt & \dots & \int_{T} ϕ_{k}(t) ϕ_{j}(t)\, dt & \dots & \int_{T} ϕ_{k}(t) ϕ_{k}(t)\, dt \end{pmatrix} \begin{pmatrix} a_{i1} \\ ⋮ \\ a_{ij} \\ ⋮ \\ a_{ik} \end{pmatrix} \\ &= \begin{pmatrix} \sum_{l=1}^{k} a_{il} \int_{T} ϕ_{1}(t) ϕ_{l}(t)\, dt \\ ⋮ \\ \sum_{l=1}^{k} a_{il} \int_{T} ϕ_{j}(t) ϕ_{l}(t)\, dt \\ ⋮ \\ \sum_{l=1}^{k} a_{il} \int_{T} ϕ_{k}(t) ϕ_{l}(t)\, dt \end{pmatrix} = \begin{pmatrix} a_{i1}^{*} \\ ⋮ \\ a_{ij}^{*} \\ ⋮ \\ a_{ik}^{*} \end{pmatrix}. \end{aligned} \tag{A2}$$

References

Diggle, P.; Ribeiro, P. Model-Based Geoestatistics; Springer: New York, NY, USA, 2007. [Google Scholar]
Cressie, N. Statistics for Spatial Data; Wiley: New York, NY, USA, 1993. [Google Scholar]
Ver Hoef, J.; Barry, R. Constructing and fitting models for cokriging and multivariable spatial prediction. J. Stat. Plan. Inference 1998, 69, 275–294. [Google Scholar] [CrossRef]
Chiles, J.; Delfiner, P. Geostatistics: Modeling Spatial Uncertainty; Wiley: New York, NY, USA, 1999. [Google Scholar]
Marchant, C.; Leiva, V.; Cavieres, M.F.; Sanhueza, A. Air contaminant statistical distributions with application to PM10 in Santiago, Chile. Rev. Environ. Contam. Toxicol. 2013, 223, 1–31. [Google Scholar]
Cappelli, C.; D’Urso, P.; De Giovanni, L.; Massari, R. Regime change analysis of interval-valued time series with an application to PM10. Chemom. Intell. Lab. Syst. 2015, 146, 337–346. [Google Scholar] [CrossRef]
Leiva, V.; Marchant, C.; Ruggeri, F.; Saulo, H. Monitoring urban environmental pollution by bivariate control charts: New methodology and case study in Santiago, Chile. Environmetrics 2015, 30, e2551. [Google Scholar]
D’Urso, P.; Cappelli, C.; De Giovanni, L.; Massari, R. Autoregressive metric-based trimmed fuzzy clustering with an application to PM10 time series. Chemom. Intell. Lab. Syst. 2017, 161, 15–26. [Google Scholar] [CrossRef]
Marchant, C.; Leiva, V.; Christakos, G.; Cavieres, M.F. A criterion for environmental assessment using Birnbaum-Saunders attribute control charts. Environmetrics 2019, 26, 463–476. [Google Scholar]
Cavieres, M.F.; Leiva, V.; Marchant, C.; Rojas, F. A methodology for data-driven decision making in the monitoring of particulate matter environmental contamination in Santiago of Chile. Rev. Environ. Contam. Toxicol. 2020. [Google Scholar] [CrossRef]
Leiva, V.; Saulo, H.; Souza, R.; Aykroyd, R.G.; Vila, R. A new BISARMA time series model for forecasting mortality using weather and particulate matter data. J. Forecast. 2020. [Google Scholar] [CrossRef]
Le, N.; Zidek, J. Statistical Analysis of Environmental Space-Time Processes; Springer: New York, NY, USA, 2006. [Google Scholar]
Garcia-Papani, F.; Leiva, V.; Ruggeri, F.; Uribe-Opazo, M.A. Kriging with external drift in a Birnbaum-Saunders geostatistical model. Stoch. Environ. Res. Risk Assess. 2018, 32, 1517–1530. [Google Scholar] [CrossRef]
Wackernagel, H. Cokriging versus kriging in regionalized multivariate data analysis. Geoderma 1994, 62, 83–92. [Google Scholar] [CrossRef]
Helterbrand, J.; Cressie, N. Universal cokriging under intrinsic coregionalization. Math. Geol. 1994, 26, 205–226. [Google Scholar] [CrossRef]
Rivoirard, J. Which models for collocated cokriging? Math. Geol. 2001, 33, 117–131. [Google Scholar] [CrossRef]
Pardo-Igúzquiza, E.; Dowd, P. Multiple indicator cokriging with application to optimal sampling for environmental monitoring. Comput. Geosci. 2005, 31, 1–13. [Google Scholar] [CrossRef]
Isaaks, E.; Srivastava, M. Applied Geostatistics; Oxford University Press: New York, NY, USA, 1989. [Google Scholar]
Delicado, P.; Giraldo, R.; Comas, C.; Mateu, J. Geostatistics for spatial functional data: Some recent contributions. Environmetrics 2010, 21, 224–239. [Google Scholar] [CrossRef]
Giraldo, R. Geostatistics for Functional Data. Ph.D. Thesis, Universitat Politècnica de Catalunya, Barcelona, Spain, 2009. [Google Scholar]
Martinez, S.; Giraldo, R.; Leiva, V. Birnbaum-Saunders functional regression models for spatial data. Stoch. Environ. Res. Risk Assess. 2019, 33, 1765–1780. [Google Scholar] [CrossRef]
Goulard, M.; Voltz, M. Geostatistical interpolation of curves: A case study in soil science. In Geostatistics Tróia’92; Soares, A., Ed.; Kluwer Academc Press: Dordrecht, The Netherlands, 1993; Volume 2, pp. 805–816. [Google Scholar]
Nerini, D.; Monestiez, P.; Manté, C. Cokriging for spatial functional data. J. Multivar. Anal. 2010, 101, 409–418.
Giraldo, R.; Delicado, P.; Mateu, J. Ordinary kriging for function-valued spatial data. Environ. Ecol. Stat. 2011, 18, 411–426.
Menafoglio, A.; Petris, G. Kriging for Hilbert-space valued random fields. The operational point of view. J. Multivar. Anal. 2016, 146, 84–94.
Caballero, W.; Giraldo, R.; Mateu, J. A universal kriging approach for spatial functional data. Stoch. Environ. Res. Risk Assess. 2013, 27, 1553–1563.
Ignaccolo, R.; Mateu, J.; Giraldo, R. Kriging with external drift for functional data for air quality monitoring. Stoch. Environ. Res. Risk Assess. 2014, 28, 1171–1186.
Reyes, A.; Giraldo, R.; Mateu, J. Residual kriging for functional prediction of salinity curves. Commun. Stat. Theory Methods 2015, 44, 798–809.
Ramsay, J.; Silverman, B. Functional Data Analysis; Springer: New York, NY, USA, 2005.
Hooyberghs, J.; Mensink, C.; Dumont, G.; Fierens, F.; Brasseur, O. A neural network forecast for daily average PM10 concentrations in Belgium. Atmos. Environ. 2005, 39, 3279–3289.
Pérez, P.; Reyes, J. Prediction of maximum of 24-h average of PM10 concentrations 30 h in advance in Santiago, Chile. Atmos. Environ. 2002, 36, 4555–4561.
Giri, D.; Krishna-Murthy, V.; Adhikary, P. The influence of meteorological conditions on PM10 concentrations in Kathmandu valley. Int. J. Environ. Res. 2008, 2, 49–60.
Emery, X. Iterative algorithms for fitting a linear model of coregionalization. Comput. Geosci. 2010, 36, 1150–1160.
Giraldo, R. Propuesta de un indicador como variable auxiliar en el análisis cokriging. Rev. Colomb. Estadística 2001, 24, 1–12.
Rodríguez-Camargo, L.; Sierra-Parada, R.; Blanco-Becerra, L. Análisis espacial de las concentraciones de PM2.5 en Bogotá según los valores de las guías de la calidad del aire de la Organización Mundial de la Salud para enfermedades cardiopulmonares, 2014–2015. Biomédica 2020, 40, 137–152.
Myers, D. Matrix formulation of cokriging. Math. Geol. 1982, 14, 249–257.
Bogaert, P. Comparison of kriging techniques in a space-time context. Math. Geol. 1996, 28, 73–86.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018.
Pebesma, E. Multivariable geostatistics in S: The gstat package. Comput. Geosci. 2004, 30, 683–691.
R Code. Available online: https://sites.google.com/a/unal.edu.co/ramon-giraldo-webpage/r-code (accessed on 20 July 2020).
Sánchez, L.; Leiva, V.; Galea, M.; Saulo, H. Birnbaum-Saunders quantile regression models with application to spatial data. Mathematics 2020, 8, 1000.
Huerta, M.; Leiva, V.; Rodriguez, M.; Villegas, D. On a partial least squares regression model for asymmetric data with a chemical application in mining. Chemom. Intell. Lab. Syst. 2019, 190, 55–68.
Saulo, H.; Leão, J.; Leiva, V.; Aykroyd, R.G. Birnbaum-Saunders autoregressive conditional duration models applied to high-frequency financial data. Stat. Pap. 2019, 60, 1605–1629.
Carrasco, J.M.F.; Figueroa-Zuniga, J.I.; Leiva, V.; Riquelme, M.; Aykroyd, R.G. An errors-in-variables model based on the Birnbaum-Saunders and its diagnostics with an application to earthquake data. Stoch. Environ. Res. Risk Assess. 2020, 34, 369–380.
Sánchez, L.; Leiva, V.; Galea, M.; Saulo, H. Birnbaum-Saunders quantile regression and its diagnostics with application to economic data. Appl. Stoch. Model. Bus. Ind. 2020.
Garcia-Papani, F.; Leiva, V.; Uribe-Opazo, M.A.; Aykroyd, R.G. Birnbaum-Saunders spatial regression models: Diagnostics and application to chemical data. Chemom. Intell. Lab. Syst. 2018, 177, 114–128.
Liu, Y.; Mao, G.; Leiva, V.; Liu, S.; Tapia, A. Diagnostic analytics for an autoregressive model under the skew-normal distribution. Mathematics 2020, 8, 693.
Genton, M.G.; Zhang, H. Identifiability problems in some non-Gaussian spatial random fields. Chilean J. Stat. 2012, 3, 171–179.
D’Urso, P.; Di Lallo, D.; Maharaj, E.A. Autoregressive model-based fuzzy clustering and its application for detecting information redundancy in air pollution monitoring networks. Soft Comput. 2013, 17, 83–131.
D’Urso, P.; De Giovanni, L.; Massari, R. Time series clustering by a robust autoregressive metric with application to air pollution. Chemom. Intell. Lab. Syst. 2015, 141, 107–124.
Velasco, H.; Laniado, H.; Toro, M.; Leiva, V.; Lio, Y. Robust three-step regression based on comedian and its performance in cell-wise and case-wise outliers. Mathematics 2020, 8, 1259.

Figure 1. Spatial location of ten air quality monitoring stations in the Bogotá area. Circle sizes are proportional to the maximum PM10 value at each station (computed from data collected hourly from 26 January 2011 at 12:00 p.m. to 3 February 2011 at 2:00 p.m.).

Figure 2. WS values (in m/s) at ten air quality monitoring stations in the Bogotá area (data at each monitoring station were collected every two hours from 26 January 2011 at 10:00 a.m. to 3 February 2011 at 12:00 p.m.). In total, WS data were collected in 98 time periods (time period 1 corresponds to 10:00 a.m. on 26 January 2011 and time period 98 to 12:00 p.m. on 3 February 2011).

Figure 3. WS curves obtained by smoothing the data set of each station using a Fourier basis (of dimension $k = 7$). Time period 1 corresponds to 10:00 a.m. on 26 January 2011 and time period 98 to 12:00 p.m. on 3 February 2011.

Figure 4. Map of PM10 predictions in the Bogotá area obtained with cokriging and a secondary variable corresponding to WS curves.

Figure 5. Map of PM10 prediction variances with the highest magnitudes (dark grey) corresponding to zones distant from the sampling sites.

Table 1. PM10 values and coefficients $a_{i j}$, for $i = 1, \dots, n$ and $j = 1, \dots, 7$, of Fourier basis functions fitted to WS data (collected every two hours between 26 January 2011 and 3 February 2011 at each of ten environmental monitoring stations in Bogotá, Colombia).

Station ID	Monitoring Station	PM10	$a_{i 1}$	$a_{i 2}$	$a_{i 3}$	$a_{i 4}$	$a_{i 5}$	$a_{i 6}$	$a_{i 7}$
1	Carvajal	242	4.35	2.31	1.87	1.40	-0.40	-0.10	-0.67
2	Fontibon	174	11.24	3.97	2.12	2.17	-1.05	-0.37	-0.83
3	Guaymaral	179	3.49	1.67	1.26	0.85	-0.48	-0.51	-0.50
4	Kennedy	265	9.76	3.52	1.72	1.54	-0.55	-0.05	-0.60
5	Las Ferias	116	6.90	2.17	1.37	0.91	-0.69	-0.26	-0.56
6	Simón Bolivar	130	5.77	2.13	1.52	0.88	-0.27	-0.48	-0.24
7	Puente Aranda	237	9.80	3.00	1.49	1.77	-0.37	-0.48	-0.63
8	Suba	157	6.41	0.76	1.63	0.51	-0.13	-0.83	-0.23
9	Tunal	162	4.64	1.94	1.27	1.07	-0.33	-0.01	-0.35
10	Usaquen	118	5.39	1.12	-0.22	0.19	-0.07	-0.17	-0.59
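
As an illustration of how a row of Table 1 determines a smoothed WS curve, the following Python sketch evaluates a seven-term Fourier expansion from the coefficients of station 1 (Carvajal). Several choices here are assumptions that this part of the article does not state: the period is taken as the 98 time periods of the observation window, the basis functions are ordered as constant, sine, cosine, sine, cosine, and so on, and they are scaled to be orthonormal over one period. The function names are hypothetical. The printed values are therefore only indicative and may differ from the curves in Figure 3 if the original fit used another convention.

```python
import math

# Coefficients a_{i1}, ..., a_{i7} for station 1 (Carvajal), copied from Table 1.
CARVAJAL = [4.35, 2.31, 1.87, 1.40, -0.40, -0.10, -0.67]


def fourier_basis(t, period, k=7):
    """Evaluate k Fourier basis functions at time t.

    Assumed ordering: constant, sin(wt), cos(wt), sin(2wt), cos(2wt), ...
    Assumed scaling: 1/sqrt(period) for the constant term and
    1/sqrt(period/2) for the sine and cosine terms (orthonormal basis).
    """
    omega = 2.0 * math.pi / period
    scale = math.sqrt(period / 2.0)
    values = [1.0 / math.sqrt(period)]
    harmonic = 1
    while len(values) < k:
        values.append(math.sin(harmonic * omega * t) / scale)
        if len(values) < k:
            values.append(math.cos(harmonic * omega * t) / scale)
        harmonic += 1
    return values


def evaluate_curve(coefs, t, period=98.0):
    """Return sum_j a_j * phi_j(t) for one station (period is an assumption)."""
    basis = fourier_basis(t, period, k=len(coefs))
    return sum(a * b for a, b in zip(coefs, basis))


def mean_level(coefs, period=98.0):
    """Average of the curve over one period; only the constant term contributes."""
    return coefs[0] / math.sqrt(period)


if __name__ == "__main__":
    for t in (0.0, 24.5, 49.0, 73.5):
        print(t, round(evaluate_curve(CARVAJAL, t), 3))
```

Under these assumptions the first coefficient controls the average level of the curve, while the remaining six describe oscillations around it, which is why stations with similar $a_{i 1}$ can still show different WS profiles.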

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
