Abstract
This article presents an overview of methodologies for spatial prediction of functional data, focusing on both stationary and non-stationary conditions. A significant aspect of the functional random fields analysis is evaluating stationarity to characterize the stability of statistical properties across the spatial domain. The article explores methodologies from the literature, providing insights into the challenges and advancements in functional geostatistics. This work is relevant from theoretical and practical perspectives, offering an integrated view of methodologies tailored to the specific stationarity conditions of the functional processes under study. The practical implications of our work span across fields like environmental monitoring, geosciences, and biomedical research. This overview encourages advancements in functional geostatistics, paving the way for the development of innovative techniques for analyzing and predicting spatially correlated functional data. It lays the groundwork for future research, enhancing our understanding of spatial statistics and its applications.
Keywords:
functional data; geostatistics; kriging; non-stationarity; spatial prediction; stationarity
MSC:
62H11; 60G10
1. Introduction
Functional data analysis (FDA) plays a critical role in various fields, as it involves data depicted as curves, surfaces, or high-dimensional objects. The broad applicability of FDA across diverse domains, and its capacity to discern complex data patterns, was illustrated in [1]. Geostatistics, as detailed in [2] and foundational works like [3], provides a comprehensive framework for modeling the spatial dependence structure of data [4], and it facilitates estimation, prediction, and uncertainty quantification for this type of data. The core concepts of geostatistics were outlined in [5], while geostatistical functional data analysis (GFDA) has its roots in early studies that aimed to combine geostatistics with FDA [6,7,8,9]. It was further formalized and refined in [10], highlighting its application in merging the spatial data domain with geostatistical techniques and FDA.
In light of the emphasis on geostatistics, it is essential to acknowledge the foundational works that have shaped the understanding of spatio-temporal geostatistical frameworks [11,12,13]. This understanding is evident when we consider the exploration of the physical geometry concept, which is pivotal for spatio-temporal geostatistical hydrology [14]. There are also advanced perspectives on environmental health modeling [15]. These insights have played a crucial role in refining geostatistical methodologies, especially when analyzing environmental variables [16]. Significant contributions include the integration of innovative approaches, such as the analysis of neural network residual data in unique environmental contexts [17]. Moreover, the recent fusion of quantitative analyses with spatio-temporal considerations highlights the evolving nature of geostatistical studies [18].
Practical implications of the GFDA span various sectors such as climate science [19], agriculture [20,21], oceanology [22], environmental monitoring [23], geology [24], epidemiology [25], and pollution [26], among others. Through the GFDA, we seek to enhance our comprehension of spatial patterns and relationships, leading to informed decision-making and increased understanding of complex spatial phenomena. The GFDA aligns with the kriging methods used for spatial modeling as outlined in [27,28], by extending traditional geostatistical methods, primarily designed for univariate and multivariate data, to accommodate functional observations. In [29], a wavelet regression approach stemming from FDA and based on spatial correlation was proposed. In [30], a conditional structure based on stationary functional processes [31] employing a random sampling design was presented, whereas in [32], the robustness for spatio-temporal data was discussed.
As the geostatistical field continues to evolve, recent advancements have addressed nuanced statistical aspects and their real-world applications. The intricacies of spatial autocorrelation, particularly concerning unorthodox random variables, have been explored, emphasizing their broader implications [33]. In the era of expansive datasets, the need for robust estimation techniques has become paramount, leading to the development of methods tailored for reduced rank models to accommodate large spatial data [34]. This extends into practical applications, as showcased by the integration of the spatial eigenvector methodology in regularized regression, enhancing predictive precision in domains such as property valuation [35]. The contemporary relevance of geostatistical tools is further underscored by their application in global challenges, such as the COVID-19 pandemic, where both Bayesian and nonparametric geostatistical models have been employed for data analysis [36].
The aim of this article is to provide a comprehensive overview of the techniques used for spatial prediction, via kriging and cokriging, of spatially correlated functional data. We extend the analysis presented in [10] by exploring the breadth and depth of applicability of these techniques across numerous disciplines and their potential in enhancing the understanding of spatial patterns and relationships. Therefore, our primary objective is to deepen the GFDA framework, with a special emphasis on its synergy with kriging techniques, building on and expanding the foundational study presented in [10]. In pursuit of this objective, we present a detailed review of various methods for kriging and cokriging prediction based on stationary and non-stationary functional random fields.
After this introduction, in Section 2, we expand on the investigation presented in [5], discussing the methodologies for stationary functional data, including ordinary kriging and continuous time-varying kriging. We also delve into cokriging prediction based on observations of stationary functional random fields and the functional kriging total model. In Section 3, we shift our focus to the prediction of non-stationary functional random fields. Here, we specifically discuss universal kriging and residual kriging with external drift, areas that remain less explored in the literature.
Our comprehensive overview of interpolative and co-interpolative techniques for geospatial estimation of functional data represents a significant contribution to the advancement of GFDA. By enhancing the integration of the FDA and geostatistics, we hope to open new avenues for research and applications across a variety of fields. The methodologies presented here may provide a better understanding of spatial processes and patterns, leading to improved decision-making and management of spatially referenced functional observations [27,28].
This article unfolds under the following structure. Section 2 is dedicated to a rigorous exploration of kriging and cokriging predictions tailored for stationary functional random fields. We delve into the foundational methodologies and elucidate their broader implications. In Section 3, we pivot our attention towards kriging predictions designed for non-stationary functional random fields. Concluding our discussion, Section 4 serves as a platform for reflective synthesis of our findings, leading us to propose recommendations that might guide future research in the domain of GFDA.
2. Kriging and Cokriging Prediction for Stationary Functional Random Fields
In this section, we provide background on the GFDA under the assumption of stationarity [27], with a detailed exploration of how kriging and cokriging can be applied to stationary functional random fields. By examining several variations of the GFDA, the section shows how this analysis can adapt to different circumstances and requirements, covering both the practical implementation of the GFDA and the theoretical considerations that drive its use.
2.1. Context
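Consider the stochastic functional process $\{\chi_s(t): s \in D, t \in T\}$, where s belongs to a domain $D \subseteq \mathbb{R}^d$ (commonly, $d = 2$). This process is indexed by time t within T. For each specific s in D, $\chi_s$ acts as a functional variable. Observations are made at n locations within D, represented as $\chi_{s_1}(t), \ldots, \chi_{s_n}(t)$. The goal is to estimate the functional variable $\chi_{s_0}(t)$ at an unsampled location $s_0 \in D$.
For every $t \in T$, we operate under the assumption of a second-order stationary and isotropic process, ensuring consistent statistical properties throughout the domain. Specifically, for the process $\chi_s(t)$, its expected value is $\mathrm{E}(\chi_s(t)) = m(t)$, for all $s \in D$ and $t \in T$; its variance is $\mathrm{Var}(\chi_s(t)) = \sigma^2(t)$; and its covariance is $\mathrm{Cov}(\chi_{s_i}(t), \chi_{s_j}(t)) = C(h; t)$, which is a function of the distance $h = \|s_i - s_j\|$, for any locations $s_i, s_j \in D$. Then, its semi-variogram is defined by $\frac{1}{2}\mathrm{Var}(\chi_{s_i}(t) - \chi_{s_j}(t))$ and denoted by $\gamma(h; t)$.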
2.2. Standard Kriging for Spatially Correlated Functional Data
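The kriging estimator for $\chi_{s_0}(t)$ is given in vectorial notation as [27]
$$\hat{\chi}_{s_0}(t) = \sum_{i=1}^{n} \lambda_i \chi_{s_i}(t) = \lambda \cdot \chi(t), \quad (1)$$
where $\lambda = (\lambda_1, \ldots, \lambda_n)^\top$ represents the coefficient vector and $\chi(t) = (\chi_{s_1}(t), \ldots, \chi_{s_n}(t))^\top$ denotes the vector of observed values. The symbol “·” used in (1) indicates the scalar product between the vectors $\lambda$ and $\chi(t)$. Optimal values for $\lambda$ as defined in (1) are determined by solving the subsequent system presented as
$$\begin{pmatrix} \int_T \gamma(\|s_1 - s_1\|; t)\,dt & \cdots & \int_T \gamma(\|s_1 - s_n\|; t)\,dt & 1 \\ \vdots & \ddots & \vdots & \vdots \\ \int_T \gamma(\|s_n - s_1\|; t)\,dt & \cdots & \int_T \gamma(\|s_n - s_n\|; t)\,dt & 1 \\ 1 & \cdots & 1 & 0 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ -\mu \end{pmatrix} = \begin{pmatrix} \int_T \gamma(\|s_0 - s_1\|; t)\,dt \\ \vdots \\ \int_T \gamma(\|s_0 - s_n\|; t)\,dt \\ 1 \end{pmatrix}, \quad (2)$$
where $\mu$ is the Lagrange multiplier corresponding to the unbiasedness constraint.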
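As an illustration of how a bordered kriging system of this kind can be solved numerically, the following is a minimal sketch in plain Python. It is not the implementation of [27]: the exponential form of `trace_variogram`, its parameters `sill` and `scale`, the helper names, and the toy sites and curves are all assumptions made for the example. The curves are represented by their values on a common time grid.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def trace_variogram(h, sill=1.0, scale=2.0):
    """Hypothetical exponential model standing in for a fitted trace-semi-variogram."""
    return sill * (1.0 - math.exp(-h / scale))

def ok_weights(sites, s0, gamma=trace_variogram):
    """Return the kriging weights and the Lagrange multiplier for target site s0."""
    n = len(sites)
    a = [[gamma(math.dist(si, sj)) for sj in sites] + [1.0] for si in sites]
    a.append([1.0] * n + [0.0])          # unbiasedness row: weights sum to one
    b = [gamma(math.dist(s0, si)) for si in sites] + [1.0]
    sol = solve(a, b)
    return sol[:n], -sol[n]              # the last unknown of the system is -mu

def predict_curve(curves, weights):
    """Weighted sum of the observed curves, evaluated on the common grid."""
    return [sum(w * c[k] for w, c in zip(weights, curves))
            for k in range(len(curves[0]))]

# Toy data (assumed): four sites on a unit square, curves sampled on a grid of T = [0, 1].
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
grid = [k / 10 for k in range(11)]
curves = [[math.sin(2 * math.pi * t) + x + y for t in grid] for x, y in sites]

weights, mu = ok_weights(sites, (0.5, 0.5))
pred = predict_curve(curves, weights)
```

Because the weights do not depend on t, a single linear solve serves the whole curve. In this symmetric toy layout the four weights coincide, and predicting at a sampled site returns that site's own curve.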
The integral representation stated in (2), $\gamma(h) = \int_T \gamma(h; t)\,dt$ with $h = \|s_i - s_j\|$, is known as the trace-semi-variogram [27]. For the existence of this integral, certain conditions are vital. Specifically, T should be bounded, and the corresponding stochastic functional process must be compactly-supported. Holding these conditions ensures that the trace-semi-variogram provides an accurate representation of the spatial dependence structure.
Given the pairwise difference $\chi_{s_i}(t) - \chi_{s_j}(t)$ defined over the time domain T, and the set expressed as $N(h) = \{(s_i, s_j): \|s_i - s_j\| = h\}$, the function related to the semi-variogram can be formulated as
$$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{(s_i, s_j) \in N(h)} \int_T \left(\chi_{s_i}(t) - \chi_{s_j}(t)\right)^2 dt,$$
where $|N(h)|$ represents the total number of distinct pairs in $N(h)$. This formulation of the semi-variogram emphasizes the significance of spatial relationships within the data. By integrating over the time domain and considering pairwise differences, we gain a more comprehensive insight into the spatial dependencies inherent in the dataset.
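The empirical estimator just described, which averages integrated squared differences between pairs of curves, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure of [27]: the integral over T is approximated by the trapezoidal rule on a common grid, and pairs are grouped into distance bins of width `bin_width` (a hypothetical tuning parameter) because irregularly spaced sites rarely share exactly the same separation.

```python
import math
from collections import defaultdict

def l2_sq_distance(f, g, grid):
    """Trapezoidal approximation of the integral of (f - g)^2 over the grid."""
    d = [(a - b) ** 2 for a, b in zip(f, g)]
    return sum(0.5 * (d[k] + d[k + 1]) * (grid[k + 1] - grid[k])
               for k in range(len(grid) - 1))

def empirical_trace_variogram(sites, curves, grid, bin_width):
    """Map each distance-bin midpoint to half the mean squared L2 distance of its pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(sites)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(sites[i], sites[j])
            b = int(h // bin_width)          # index of the distance bin
            sums[b] += l2_sq_distance(curves[i], curves[j], grid)
            counts[b] += 1
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b]) for b in sorted(counts)}

# Toy data (assumed): three collinear sites with constant curves 0, 1, and 2 on T = [0, 1].
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
grid = [0.0, 0.5, 1.0]
curves = [[0.0] * 3, [1.0] * 3, [2.0] * 3]
gamma_hat = empirical_trace_variogram(sites, curves, grid, bin_width=1.0)
```

The resulting values would then typically be smoothed by fitting a parametric variogram model before being used in the kriging system.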
2.3. Continuous Time-Varying Kriging for Functional Data
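We provide an overview of the theoretical framework based on [28], considering functional parameters depicted as $\lambda_i(t)$, where i spans the set from 1 to n. This leads to
$$\hat{\chi}_{s_0}(t) = \sum_{i=1}^{n} \lambda_i(t)\,\chi_{s_i}(t) = \lambda(t)^\top \chi(t), \quad (3)$$
with $\lambda(t) = (\lambda_1(t), \ldots, \lambda_n(t))^\top$ being the time-dependent coefficients and the vector of observed values being $\chi(t) = (\chi_{s_1}(t), \ldots, \chi_{s_n}(t))^\top$.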
Thus, the time-varying nature of the coefficient vector arises from the functional character of the data. As the observed data vary over the time domain T, the weight or importance assigned to each observation must be flexible, adjusting over time. This variation captures the continuous shifts in data behavior across the domain.
In contrast, traditional ordinary kriging primarily deals with scalar spatial data, where each spatial location s has a static value. Our approach, focused on functional data, captures intrinsic variability over time, distinguishing it from traditional methods.
The functional weights, namely, change over T, denoting a dynamic weighting approach different from the traditional method where weights remain constant. The concept of minimizing the integrand term-by-term might seem similar to minimizing the overall integral, but this is not always the case in FDA. Given potential interdependencies of functions over T, our approach explicitly addresses the inherent integrated characteristics of functional data, rather than analyzing each temporal instance in isolation.
Having established the nature of functional weights, we next consider the problem of determining these weights mathematically. The functional parameters mentioned in (3) are obtained by solving the minimization problem formulated as
Suppose that we can represent each observed function with a combination of K basis functions, , for , as suggested in [1]. This representation can be mathematically articulated as
The assumptions concerning stationarity are:
- (i) The expected value of is consistent across locations. In more specific terms, we have for every location j, where is a location-independent constant vector.
- (ii) The covariance matrix depends solely on the relative distance or difference between locations i and j, rather than their absolute positions.
To ensure stationarity within our framework, the linear model of co-regionalization is subject to certain restrictions. Specifically, this model assumes that the covariances between different components of the multivariate field depend on the spatial difference only through a set of shared univariate functions.
Given the premise that , for each j within the set , represents spatially dependent stochastic functions, the assumption is that matrix , constituted by coefficients denoted , is characterized as
Note that constitutes a K-variate random field [37] of mean and matrix of covariance defined as
where . Here, are assumed to be samples from the spatial random field , for l in the set . We employ the framework of a co-regionalization linear model, as detailed in [37,38], for the estimation of the matrix presented in (5).
When stating unbiasedness for making a prediction using (3), each parameter is represented as a linear combination of the basis functions with coefficients . Mathematically, we have that
By using the formulas defined in (4) and (6), the predictor stated in (3) is now formulated as
The constraint of unbiasedness, expanded using the constant equal to one, is stated as
In this context, is a coefficient vector ensuring that the unbiasedness constraint is satisfied for all values of t by making sure the linear combination of the basis functions yields a constant value equal to one. This constraint may be presented by
which is analogous to or to
While constructing the dispersion of the objective function, it is pertinent to note that the coefficient vector corresponds to the functional expansion of , where . Thus, we have that
From the expression given in (7), for , with in the set , we get
For , the expression given in (8) denotes a matrix associated with , wherein the diagonal components correspond to the variances of the components .
Consider the definitions, for , stated as
where the matrix is determined through , which stems from the established linear model of co-regionalization.
Taking into account K Lagrange constraints represented as , the objective function to be optimized is articulated as
The unit matrix given in (11) has rank K. When optimizing the formulation stated in (10) in terms of , one obtains , which implies
The plug-in estimate for the integrated prediction variance is represented as
The consolidated predictive variance offers insight into the variability associated with forecasting a complete curve.
2.4. Cokriging Prediction Based on Observations of Stationary Functional Random Fields
This subsection is based on [23]. Let be a multivariate random field, where and . Assume we have an observation and that
Let denote the mean of variable j, which is considered to be constant across all , and let represent the random error. The cokriging predictor of variable j at location is expressed as
Now, let us shift our attention to a stochastic process with functional outputs, defined as . With the available observations , we estimate the value of the transformation at the specific pair .
The challenge in this context is to adapt the traditional cokriging predictor for these curves (functional data). This adaptation involves utilizing functional analogs in place of the usual parameters and shifting from random variables to a series of functional variables, like , with i spanning from 1 to n. Details of the adaptation are elaborated in the expressions stated in (12) and (13) and presented next.
- Parameters
- Variables
Hence, within the realm of functional data, the cokriging predictor for is depicted by
For each , the functional coefficients given in (14) adhere to traditional geostatistical requirements like unbiasedness and minimal prediction variance.
In addressing the complexities of our functional data, we adopt a strategy rooted in basis functions. This involves representing both variables and coefficients through expansions, for , given by
where K denotes the total number of basis functions. The choice of an optimal K is data-dependent and can be determined using cross-validation techniques. By utilizing the expansions given by (15) and (16), the predictor stated in (14) undergoes a transformation, yielding
The integral product is defined as
For commonly adopted orthonormal bases, like the Fourier series, the Gram matrix stated in (18) is the identity matrix. Conversely, for bases like B-splines, the determination of the Gram matrix requires numerical integration. Assuming stationary behavior in the random functions, we deduce that
This establishes a K-variate random field with its mean and covariance matrix respectively expressed as
and
where depicts the covariance between and . Consequently, employing the formula given in (19), we derive
Hence, the mean predictor presented in (17) is characterized as
Moreover, the mean unobserved function at (location) and v (time) is formulated by
Therefore, the suggested predictor holds with the unbiasedness condition if
While Gram matrices are generally positive semi-definite, ensuring their strict positive definiteness can be challenging. However, in the context of our specific Gram matrix defined in (18), its positive definiteness is guaranteed. This arises from the fact that the functions , for , constitute a basis set and are linearly independent. As a direct consequence, the inverse matrix exists, ensuring the well-defined nature of the expression presented in (21).
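To make the role of the Gram matrix concrete, the sketch below approximates its entries (the integrals of products of basis functions) by the trapezoidal rule and checks positive definiteness through a Cholesky factorization. It is an illustration under assumptions: the function names are hypothetical, and a monomial basis is used as a simple stand-in for a non-orthonormal basis such as B-splines, which are not implemented here.

```python
import math

def gram_matrix(basis, a, b, n_steps=2000):
    """Approximate W[i][j] = integral over [a, b] of B_i(t) * B_j(t) dt (trapezoidal rule)."""
    step = (b - a) / n_steps
    ts = [a + k * step for k in range(n_steps + 1)]
    vals = [[f(t) for t in ts] for f in basis]
    K = len(basis)
    W = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i, K):
            prod = [u * v for u, v in zip(vals[i], vals[j])]
            W[i][j] = W[j][i] = step * (sum(prod) - 0.5 * (prod[0] + prod[-1]))
    return W

def cholesky(W):
    """Return lower-triangular L with L L^T = W; raise ValueError if W is not positive definite."""
    K = len(W)
    L = [[0.0] * K for _ in range(K)]
    for i in range(K):
        for j in range(i + 1):
            s = W[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

# Orthonormal Fourier basis on [0, 1]: the Gram matrix should be close to the identity.
fourier = [lambda t: 1.0,
           lambda t: math.sqrt(2) * math.sin(2 * math.pi * t),
           lambda t: math.sqrt(2) * math.cos(2 * math.pi * t)]
# Monomials: linearly independent but not orthonormal, so W is not the identity.
monomials = [lambda t: 1.0, lambda t: t, lambda t: t * t]

W_fourier = gram_matrix(fourier, 0.0, 1.0)
W_mono = gram_matrix(monomials, 0.0, 1.0)
L_mono = cholesky(W_mono)   # succeeds because the basis functions are linearly independent
```

A successful factorization confirms that the matrix is invertible, which is what the unbiasedness condition relies on.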
To identify the optimal linear unbiased estimator, the functional parameters within our proposed predictor are obtained by addressing the optimization problem stated as
subject to
Obtaining the variance from the objective function formulated in (22), we find that
In this context, and denote the variances of and , respectively. Similarly, signifies the covariance between and , while represents the covariance of with . These quantities can be computed from given in (20) once it has been determined. To estimate these matrices, multivariate geostatistics [37], and specifically a linear model of co-regionalization [37], can be employed. It should be noted that, due to the stationarity of the random functions in , and are identical. Furthermore, is solely dependent on the distances and not on their locations. As a result, they may be directly obtained using . Defining
the expression given in (23) may be presented as
From the expression formulated in (24) and assuming the Lagrange multipliers , the optimization problem formulated in (22) may be stated as
Considering , we have
with
Solving the equation given in (25) for , we find which implies that and then .
For the prediction variance, we use the notation established as
with its plug-in estimate being given by
where the matrix is derived from an estimated value of via the fitted linear model of co-regionalization.
2.5. Holistic Functional Kriging Approach
Next, we synthesize the findings presented in [39]. The predictor is delineated by
This formulation extends the predictor introduced in (14). The functional parameter gauges the effect of i-th location at v on an unsampled function at location t. The formulation is in harmony with the linear framework designed for responses with functional structure, often denoted as the holistic model detailed in [1].
In the realm of geostatistical analyses, we present two distinct cokriging methodologies: the traditional multivariate cokriging and the more nuanced functional cokriging based on curves. The pivotal distinction between these methodologies hinges upon their treatment of parameters. In the traditional multivariate approach, each spatial location is associated with a set of fixed cokriging coefficients, representing the influence of each observed variable on the prediction. Conversely, in the functional cokriging model, these fixed coefficients evolve into functions, varying across a specified domain t. This introduces an element of flexibility, enabling the model to capture more intricate spatial relationships. While the former model operates directly upon observed variables, the latter delves into a functional setting, emphasizing the impact of each observed function over a domain. The functional cokriging provides an enhanced analytical perspective, aligning closely with the model tailored to functional data. This pivotal distinction between functional cokriging and other methods merits heightened attention to deepen the understanding of the model structures and implications.
To delineate functional variables, we employ the basis functions denoted as . These are instrumental for expanding our variables and parameters into known terms. Mathematically, the functional variable and the functional parameter can be articulated using these basis functions as
The coefficients in allow us to model interactions between various basis functions, pivotal to capturing the data intricacy. The matrix is structured as
Now, considering the prediction, namely, which estimates the expected function value at point t, we can express it as
where
The matrix encapsulates the inner products of the basis functions, playing a pivotal role in merging the basis functions and the coefficients in the model. In the case of an orthonormal basis, such as the Fourier basis, the matrix reduces to an identity matrix. For alternative basis functions, like B-splines, it becomes necessary to compute this matrix, potentially through numerical integration methods.
Leveraging the vec operator, which vectorizes a matrix by stacking its columns into a single column vector, the formulation presented in (28) can be articulated as
with
Therefore, the predictor stated in (29) can be also expressed as
Next, unbiasedness is addressed. Assume that the coefficients, as delineated in the formula presented in (27), for and , are structured in matrix form as
We can consider as emanating from a K multivariate random field. In this context, the expected value of is for all j in the set . The covariance matrix is then presented as
where . If is a vector with elements , its expected value is defined as
Thus, the expected function value at an unsampled location is defined as
By obtaining the expectation of the formula detailed in (29), we obtain
Note that is the covariance matrix of observation i in the estimation process. From the expressions given in (32) and (33), the predictor presented in (26) is unbiased if and only if
and
Assuming that both and are full-rank matrices, such a condition is analogous to or, by introducing the vec operation, which linearly maps a matrix into a column vector by stacking its columns, the above can be equivalently stated as
Next, minimum variance and parameter estimation are discussed. Given the expressions presented in (27) and (30), the functional parameters within the proposed predictor are estimated by solving the optimization problem established as
subject to
By expanding the variance established in (34), we obtain
In the expression provided in (35), for where , the covariance matrix between vectors and is defined as
Moreover, it is noteworthy that the variance-covariance matrix of vector can be directly obtained as a special case of the above matrix by setting .
Observe that , for all i. This is due to the stationarity assumption, which implies that the variance does not vary with location. Similarly, the covariance is symmetric and does not depend on the order of i and j. If presented in (31) has been previously estimated, then and are known.
Now, we can utilize multivariate geostatistics [37]. Specifically, we employ a linear co-regionalization framework to estimate these matrices. From (23) and defining
where ⊗ is the Kronecker product of matrices, the objective function presented in (34) can be rewritten as
Then, we define . By substituting into the expression stated in (36), it simplifies to the expression given by
where
By considering the formula presented in (37), we obtain that
which implies that and then .
Empirically, we begin by evaluating a linear co-regionalization scheme for the multivariate random field . This provides an estimate for the matrix outlined in (31), leading directly to estimators for the matrices and . After obtaining and as stated in (38), we can use these matrices in conjunction with the formula given in (39) to estimate . This, in turn, allows us to determine the functional parameters as described in (28). Alternatively, the cumulative prediction variance, denoted as , can be directly estimated using the expression formulated as . This estimation is expressed as , with the matrix derived using , which, in turn, can be sourced from a co-regionalization linear model. The cumulative prediction variance, namely, serves as a holistic measure of the uncertainty associated with approximating the entire curve. Utilizing the estimated parameters and drawing from the equation outlined in (23), we can derive a variance function for individual prediction points.
3. Kriging Prediction for Non-Stationary Functional Random Fields
Non-stationarity in functional data poses unique challenges in geostatistical analysis. To address this type of non-stationarity, various kriging methods have been proposed and refined over the years. In this section, we focus on two core methodologies: universal kriging and residual kriging for functional data [40,41]. By understanding the underpinnings of these approaches, we aim to provide researchers and practitioners with a robust toolkit to address the intricacies posed by non-stationary functional random fields.
3.1. Formulation for Kriging of Geospatial Functional Observations
Let us begin by defining the formulation for kriging of geospatial functional data. Consider the functional vector , in which every component is a data point of a functional stochastic variable within a given region , where d is typically either 2 or 3. Let us denote , , and , where and .
Assuming a linear geospatial trend, we can write
where is the functional intercept, while and are functional coefficients corresponding to the spatial coordinates and , respectively, for every and . Additionally, and . The functional kriging estimator for is then given by
The elements of the weight vector are chosen so that the resulting estimator is both unbiased and minimizes its variance. The estimator stated in (40) remains unbiased under the conditions presented as
To determine the values of , for , we consider the formulation stated as
subject to , where represents the mean squared error (MSE).
In the stationary scenario, the variance is expressed as
Consider the formula given by
where denotes the Lagrange multipliers associated with the unbiasedness constraint. The optimal values of are achieved when the partial derivatives and vanish.
Taking derivatives with respect to , we obtain
Equating to zero the above expression, we have
Analogously, taking partial derivatives with respect to and equating them to zero, we reach the unbiasedness established as
Given that the matrix presented on the left-hand side of the expression is positive definite, we can conclude that
From the formulation stated in (43), we arrive at
Then, we get
Thus, the predictor given in (40) and the MSE are obtained as
The value of the prediction variance is consequently reached as
To solve the matrix equation system presented in (43), it is necessary to estimate the spatial autocorrelation matrix with .
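The sketch below illustrates one way a universal kriging system with a linear drift in the coordinates can be assembled and solved once a spatial covariance model is available. It is a minimal sketch under assumptions, not the estimator of [40,41]: the system is written in covariance form, the exponential `trace_cov` model and its parameters are hypothetical, and the sites and curves are toy data.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def trace_cov(h, sill=1.0, scale=2.0):
    """Hypothetical exponential covariance model standing in for an estimated one."""
    return sill * math.exp(-h / scale)

def uk_weights(sites, s0, cov=trace_cov):
    """Universal kriging weights under a drift that is linear in the coordinates (x, y)."""
    n, p = len(sites), 3
    drift = [[1.0, x, y] for x, y in sites]              # design matrix rows
    a = [[cov(math.dist(si, sj)) for sj in sites] + drift[i]
         for i, si in enumerate(sites)]
    for k in range(p):                                    # unbiasedness constraints
        a.append([drift[i][k] for i in range(n)] + [0.0] * p)
    b = [cov(math.dist(s0, si)) for si in sites] + [1.0, s0[0], s0[1]]
    return solve(a, b)[:n]                                # drop the Lagrange multipliers

# Toy data (assumed): curves that follow an exact linear trend with functional coefficients.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
s0 = (0.3, 0.7)
grid = [k / 10 for k in range(11)]

def trend(x, y, t):
    return math.sin(2 * math.pi * t) + t * x + (1.0 - t) * y

curves = [[trend(x, y, t) for t in grid] for x, y in sites]
w = uk_weights(sites, s0)
pred = [sum(wi * c[k] for wi, c in zip(w, curves)) for k in range(len(grid))]
truth = [trend(s0[0], s0[1], t) for t in grid]
```

The three constraint rows force the weights to reproduce the constant and both coordinates at the target site, so any curve field that is exactly linear in the coordinates is predicted without error, whatever covariance model is plugged in.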
3.2. Residual Kriging and External Drift for Spatially Correlated Functional Data
Next, we present a review of residual kriging for functional data [40]. Assume that the residuals of a linear spatial trend are given by
where are the spatial coordinates.
While multiple regression can serve as an alternative, various models are available for detrending the mean of the functional process.
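The detrending step can be sketched as follows, assuming curves observed on a common time grid and a trend that is linear in the two spatial coordinates. The pointwise ordinary least squares fit used here is an assumption made for illustration (the functional regression in [40] may be estimated differently, for example through basis expansions), and the helper names and toy data are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def detrend(sites, curves):
    """Fit intercept and coordinate coefficients at each grid point; return them with residual curves."""
    X = [[1.0, x, y] for x, y in sites]
    n, p, m = len(sites), 3, len(curves[0])
    xtx = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    coefs = []                                  # coefs[k] holds the three coefficients at grid point k
    for k in range(m):
        xty = [sum(X[i][r] * curves[i][k] for i in range(n)) for r in range(p)]
        coefs.append(solve(xtx, xty))           # normal equations at grid point k
    residuals = [[curves[i][k] - sum(X[i][r] * coefs[k][r] for r in range(p))
                  for k in range(m)] for i in range(n)]
    return coefs, residuals

# Toy data (assumed): a linear trend plus a term that is not linear in the coordinates.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
grid = [k / 10 for k in range(11)]
curves = [[math.sin(2 * math.pi * t) + 2.0 * x - y + 0.5 * x * y * math.cos(2 * math.pi * t)
           for t in grid] for x, y in sites]
coefs, res = detrend(sites, curves)
```

The residual curves in `res` are what would then be passed to a stationary predictor such as those of Section 2, with the fitted trend added back at the target location.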
Detrending plays a pivotal role in spatial analysis, particularly in identifying structures in the residuals. However, detrending may introduce bias in the experimental covariances or variograms of residuals [3]. It is essential for readers to tackle this step carefully, ensuring the advantages of detrending surpass potential drawbacks. Though this formulation highlights the 2D spatial coordinates and , it is mainly for illustrative purposes. The methodology is adaptable beyond 2D and can extend to higher-dimensional spaces, requiring additional spatial coordinates and associated drift terms. Building on the residuals outlined in (44), the estimation of a residual curve at unsampled locations can be achieved utilizing our discussion of Section 2. Specifically, a residual function can be predicted using
When predicting the functional variable at a non-sampled site , we get
where is defined in (45), (46), or (47).
Another approach involves using explanatory variables distinct from the spatial coordinates to estimate the spatial trend. In that case, consider the model formulated as
In this model, acts as a functional intercept, is the value of a scalar explanatory variable, and is the value of a functional explanatory variable at location . The terms and represent their respective regression parameters. The drift, as detailed in (48), incorporates both scalar and functional explanatory variables. Moreover, the parameters and are in functional form, allowing estimation of the nonlinear effects in the explanatory variables. Thus, the resulting residuals can be expressed as
Using the relations described in (45), (46), or (47), a prediction for the error at a non-sampled location can be derived. Then, the prediction of the functional variable is given as
3.3. R Software and Packages of Spatial Statistics
In the geostatistical framework, particularly when addressing kriging and cokriging for stationary and non-stationary functional random fields, the choice of computational tools matters. The R software, an open-source environment for statistical computing, is central in this domain, primarily due to its rich and growing ecosystem of packages dedicated to spatial statistics [42].
Among the myriad of packages available, sp serves as a linchpin, offering classes and methods dedicated to spatial data. This package essentially acts as the bedrock upon which many subsequent spatial packages have been developed [43]. The gstat package enriches this foundation, extending capabilities specifically for geostatistical analysis. Its offerings, especially concerning variogram estimation and various kriging methods, have been elucidated earlier in this section and in Section 2. These methods are deeply rooted in seminal works, particularly those presented in [23,27,28,39], among other pioneers in the field [40,41]. Importantly, gstat acts as a bridge, allowing the seamless translation of theoretical underpinnings into tangible computational solutions [44].
For those at the intersection of FDA and spatial statistics, the fda package is highly relevant. It provides tools for FDA, complementing our discussion on functionally-coherent spatial data and continuous time-varying kriging [45]. Beyond these foundational tools, spdep addresses the need for complex spatial regression, encapsulating both linear and nonlinear models. With capabilities such as defining spatial weights and analyzing spatial autocorrelation, it proves indispensable, especially when navigating the intricacies of multifaceted spatial structures [46]. The recent expansion in R geostatistical tools further underscores its stature. For instance, the EpiGeostats package enhances visualization capacities, making interpretation of geostatistical disease risk maps intuitive [47].
Additionally, the GeoSim package focuses on pluri-Gaussian simulation and on co-simulation between categorical and continuous variables [48]. For researchers in soil science, the work in [49] provides guidance on combining linear mixed models with geostatistics. The covatest package simplifies the selection of a class of space-time covariance functions, a task central to many geostatistical analyses [50]. For those working with compositional data, the work presented in [51] serves as a comprehensive guide to geostatistical methods in R.
Given the complexity of kriging prediction for stationary and non-stationary functional random fields, a multifaceted approach is needed. This entails combining methods from the aforementioned packages and writing custom scripts that address specific challenges. This adaptability illustrates the versatility of the R environment in tackling advanced spatial statistical problems. For readers wishing to explore further, package vignettes are a useful starting point, and the available tutorials and applications show the range of possibilities that R offers for spatial data analysis [52].
Another noteworthy contribution to the R spatial statistics toolkit is a comparative case study on R packages designed for analyzing areal data. This study offers insights into the use cases and advantages of various spatial packages, assisting practitioners in making informed decisions [53]. For mixed-effects models with spatial components, the sdmTMB package stands out. It provides fast and flexible tools for generalized linear mixed-effects models with random fields that capture both spatial and spatiotemporal variation, which makes it well suited to detailed spatial analyses [54].
4. Discussion and Conclusions
The analysis of spatially correlated functional data has gained prominence in contemporary data analysis, notably in the big data era. Functional geostatistics therefore plays a central role in understanding and modeling complex spatial processes. A key factor in this analysis is the assessment of stationarity. This property, which describes the stability of statistical characteristics over the spatial domain, guides the choice of a suitable kriging method for functional data. Table 1 summarizes the main results and contributions of our work.
Table 1. Summary of main results and contributions.
The present article offered a detailed review of methodologies for the spatial prediction of functional geostatistical data, covering both stationary and non-stationary scenarios. By examining proposals from the literature, this study highlighted the main challenges and advances in this area. The relevance of our work is both theoretical and applied.
The practical implications of our work are broad, reaching domains such as environmental monitoring, geosciences, and biomedical research. Accurate modeling of spatial dependence and functional variability helps experts make informed decisions in these areas, particularly in environmental management, resource allocation, and risk assessment. Moreover, this article encourages further advances in functional geostatistics by highlighting the difficulties that arise under different stationarity conditions. This calls for the development of new techniques to analyze and predict spatially correlated functional data, a contribution that benefits not only functional geostatistics but also spatial statistics at large.
In summary, the contributions of this article strengthen the foundations for analyzing, modeling, and predicting spatially dependent functional data. Taking different stationarity conditions into account improves our ability to describe the complex spatial patterns and functional features typical of real-world datasets. Using these methodologies can improve prediction accuracy and thereby support better decision-making. This benefits areas such as environmental management and resource allocation, and it opens avenues for further research in spatial statistics and its applications.
For spatial data with non-symmetric distributions, several investigations have been undertaken [55,56,57,58], as well as for quantile regression with spatial data [59] and diagnostic analysis in regression models [21,60]. However, much remains to be explored in this field, even for traditional spatial models [61].
Author Contributions
Conceptualization, R.G. and V.L.; methodology, R.G., V.L. and C.C.; formal analysis, R.G., V.L. and C.C.; investigation, R.G.; writing—original draft preparation, R.G.; writing—review and editing, V.L. and C.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partially supported by FONDECYT, grant number 1200525 (V.L.), from the National Agency for Research and Development (ANID) of the Chilean government under the Ministry of Science, Technology, Knowledge, and Innovation; and by Portuguese funds through the CMAT—Research Centre of Mathematics of University of Minho—within projects UIDB/00013/2020 and UIDP/00013/2020 (C.C.).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Acknowledgments
The authors thank the Editors and Reviewers for their constructive comments, which helped improve the presentation of the manuscript.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Ramsay, J.; Silverman, B. Functional Data Analysis; Springer: New York, NY, USA, 2005.
- Christakos, G. Modern Spatiotemporal Geostatistics; Oxford University Press: Oxford, UK, 2000.
- Chilès, J.P.; Delfiner, P. Geostatistics: Modeling Spatial Uncertainty; Wiley: Hoboken, NJ, USA, 2009.
- Ripley, B.D. Spatial Statistics; Wiley: New York, NY, USA, 2005.
- Cressie, N. Statistics for Spatial Data; Wiley: New York, NY, USA, 2015.
- Goulard, M.; Voltz, M. Geostatistical interpolation of curves: A case study in soil science. In Geostatistics Tróia’92; Springer: Dordrecht, The Netherlands, 1993; Volume 1, pp. 805–816.
- Ignaccolo, R.; Mateu, J.; Giraldo, R. Kriging with external drift for functional data for air quality monitoring. Stoch. Environ. Res. Risk Assess. 2014, 28, 1171–1186.
- Menafoglio, A.; Secchi, P.; Dalla Rosa, M. A universal kriging predictor for spatially dependent functional data of a Hilbert space. Electron. J. Stat. 2013, 7, 2209–2240.
- Menafoglio, A.; Grujic, O.; Caers, J. Universal kriging of functional data: Trace-variography vs cross-variography? Application to gas forecasting in unconventional shales. Spat. Stat. 2016, 15, 39–55.
- Mateu, J.; Giraldo, R. (Eds.) Geostatistical Functional Data Analysis; Wiley: Hoboken, NJ, USA, 2022.
- Christakos, G. Modern Spatiotemporal Geostatistics; Oxford University Press: New York, NY, USA, 2000.
- Christakos, G.; Olea, R.A.; Serre, M.L.; Yu, H.-L.; Wang, L.-L. Interdisciplinary Public Health Reasoning and Epidemic Modelling: The Case of Black Death; Springer: New York, NY, USA, 2002.
- Christakos, G.; Hristopulos, D.T.; Bogaert, P. Spatiotemporal Environmental Health Modelling: A Tractatus Stochasticus. Stoch. Environ. Res. Risk Assess. 2000, 14, 245–262.
- Christakos, G. Physical Geography, Geosystems and Spatiotemporal Geostatistics; Elsevier: Amsterdam, The Netherlands, 2012.
- Christakos, G. Environmental Health Modelling: An Introductory Manual; Springer: Cham, Switzerland, 2013.
- Christakos, G. Analysis of Environmental Data using Neural Networks. Environ. Sci. Technol. 1970, 4, 110–117.
- Christakos, G.; Serre, M.L.; Demyanov, V. Neural Network Residual Analysis of the Spatial Estimation of Radioactivity. Stoch. Environ. Res. Risk Assess. 2001, 15, 209–231.
- Wu, J.; He, X.; Christakos, G. Quantitative Integration of Spatiotemporal Data Sources in Modern Geosciences. Adv. Geosci. 2021, 56, 183–201.
- Strandberg, J.; de Luna, S.; Mateu, J. A Comparison of Spatiotemporal and Functional Kriging Approaches; Springer: New York, NY, USA, 2022.
- Cortes, D.L.; Camacho-Tamayo, J.H.; Giraldo, R. Spatial prediction of soil infiltration using functional geostatistics. AUC Geographica 2018, 53, 149–155.
- Garcia-Papani, F.; Uribe-Opazo, M.A.; Leiva, V.; Aykroyd, R.G. Birnbaum-Saunders spatial modelling and diagnostics applied to agricultural engineering data. Stoch. Environ. Res. Risk Assess. 2017, 31, 105–124.
- Dabo-Niang, S.; Ferraty, F.; Monestiez, P.; Nerini, D. A cokriging method for spatial functional data with applications in oceanology. In Functional and Operatorial Statistics; Physica-Verlag: Heidelberg, Germany, 2008; pp. 237–242.
- Giraldo, R.; Herrera, L.; Leiva, V. Cokriging prediction using as a secondary variable a functional random field with application in environmental pollution. Mathematics 2020, 8, 1305.
- Azevedo, L. Model reduction in geostatistical seismic inversion with functional data analysis in seismic inversion. Geophysics 2022, 87, M1–M11.
- Dabo-Niang, S.; Ternynck, C.; Thiam, B.; Yao, A.F. Nonparametric statistical analysis of spatially distributed functional data. In Geostatistical Functional Data Analysis; Mateu, J., Giraldo, R., Eds.; Springer: New York, NY, USA, 2022; pp. 175–210.
- Montero, J.M.; Fernández-Avilés, G. Functional kriging prediction of pollution series: The geostatistical alternative for spatially-fixed data. Estud. Econ. Apl. 2015, 33, 145–174.
- Giraldo, R.; Delicado, P.; Mateu, J. Ordinary kriging for function-valued spatial data. Environ. Ecol. Stat. 2011, 18, 411–426.
- Giraldo, R.; Delicado, P.; Mateu, J. Continuous time-varying kriging for spatial prediction of functional data: An environmental application. J. Agric. Biol. Environ. Stat. 2010, 15, 66–82.
- Fernández-Pascual, R.M.; Espejo, R.; Ruiz-Medina, M.D. Moment and Bayesian wavelet regression from spatially correlated functional data. Stoch. Environ. Res. Risk Assess. 2016, 30, 523–557.
- Bouzebda, S.; Soukarieh, I. Non-parametric conditional U-processes for locally stationary functional random fields under stochastic sampling design. Mathematics 2023, 11, 16.
- Adler, R.J.; Taylor, J.E. Random Fields and Geometry; Springer: New York, NY, USA, 2007.
- García-Pérez, A. On robustness for spatio-temporal data. Mathematics 2023, 10, 1785.
- Griffith, D.A. Spatial autocorrelation and unorthodox random variables: The uniform distribution. Chil. J. Stat. 2022, 13, 133–153.
- Jelsema, C.M.; Paul, R.; McKean, J.W. Robust estimation of reduced rank models to large spatial datasets. REVSTAT Stat. J. 2020, 18, 203–221.
- McCord, M.; Lo, D.; Davis, P.; McCord, J.; Hermans, L.; Bidanset, P. Applying the geostatistical eigenvector spatial filter approach into regularized regression for improving prediction accuracy for mass appraisal. Appl. Sci. 2022, 12, 10660.
- Alvo, M.; Mu, J. COVID-19 data analysis using Bayesian models and nonparametric geostatistical models. Mathematics 2023, 11, 1359.
- Wackernagel, H. Cokriging versus kriging in regionalized multivariate data analysis. Geoderma 1994, 62, 83–92.
- Nerini, D.; Monestiez, P.; Manté, C. Cokriging for spatial functional data. J. Multivar. Anal. 2010, 101, 409–418.
- Giraldo, R.; Delicado, P.; Mateu, J. Spatial prediction of a scalar variable based on data of a functional random field. Comun. Estadística 2017, 10, 315–344.
- Franco-Viloria, M.; Ignaccolo, R. Universal, Residual and External Drift Functional Kriging. In Geostatistical Functional Data Analysis; Mateu, J., Giraldo, R., Eds.; Springer: New York, NY, USA, 2015; pp. 55–72.
- Caballero, W.; Giraldo, R.; Mateu, J. A universal kriging approach for spatial functional data. Stoch. Environ. Res. Risk Assess. 2013, 27, 1553–1563.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 2 August 2023).
- Pebesma, E.J.; Bivand, R.S. Classes and methods for spatial data in R. R News 2005, 5, 9–13.
- Pebesma, E.J. Multivariable geostatistics in S: The gstat R package. Comput. Geosci. 2004, 30, 683–691.
- Ramsay, J.O.; Hooker, G.; Graves, S. Functional Data Analysis with R and MATLAB; Springer: New York, NY, USA, 2009.
- Bivand, R.S.; Pebesma, E.; Gómez-Rubio, V. Applied Spatial Data Analysis with R; Springer: New York, NY, USA, 2013.
- Ribeiro, M.; Azevedo, L.; Pereira, M.J. EpiGeostats: An R Package to Facilitate Visualization of Geostatistical Disease Risk Maps. Math. Geosci. 2023, 1–17.
- Valakas, G.; Modis, K. GeoSim: An R-package for pluri-Gaussian simulation and co-simulation between categorical and continuous variables. Appl. Comput. Geosci. 2023, 19, 100130.
- Slaets, J.I.; Boeddinghaus, R.S.; Piepho, H.P. Linear mixed models and geostatistics for designed experiments in soil science: Two entirely different methods or two sides of the same coin? Eur. J. Soil Sci. 2021, 72, 47–68.
- Cappello, C.; De Iaco, S.; Posa, D. covatest: An R package for selecting a class of space-time covariance functions. J. Stat. Softw. 2020, 94, 1–42.
- Tolosana-Delgado, R.; Mueller, U. Geostatistics for Compositional Data with R; Springer: New York, NY, USA, 2021.
- Bivand, R.S.; Pebesma, E.; Gómez-Rubio, V. Applied Spatial Data Analysis with R; Springer: New York, NY, USA, 2008.
- Bivand, R. R packages for analyzing spatial data: A comparative case study with areal data. Geogr. Anal. 2022, 54, 488–518.
- Anderson, S.C.; Ward, E.J.; English, P.A.; Barnett, L.A. sdmTMB: An R package for fast, flexible, and user-friendly generalized linear mixed effects models with spatial and spatiotemporal random fields. bioRxiv 2022.
- Martinez, S.; Giraldo, R.; Leiva, V. Birnbaum-Saunders functional regression models for spatial data. Stoch. Environ. Res. Risk Assess. 2019, 33, 1765–1780.
- Garcia-Papani, F.; Leiva, V.; Ruggeri, F.; Uribe-Opazo, M.A. Kriging with external drift in a Birnbaum-Saunders geostatistical model. Stoch. Environ. Res. Risk Assess. 2018, 32, 1517–1530.
- Sanchez, L.; Leiva, V.; Galea, M.; Saulo, H. Birnbaum-Saunders quantile regression models with application to spatial data. Mathematics 2020, 8, 1000.
- Kotz, S.; Leiva, V.; Sanhueza, A. Two new mixture models related to the inverse Gaussian distribution. Methodol. Comput. Appl. Probab. 2010, 12, 199–212.
- Leiva, V.; Sanchez, L.; Galea, M.; Saulo, H. Global and local diagnostic analytics for a geostatistical model based on a new approach to quantile regression. Stoch. Environ. Res. Risk Assess. 2020, 34, 1457–1471.
- Leiva, V.; Rojas, E.; Galea, M.; Sanhueza, A. Diagnostics in Birnbaum-Saunders accelerated life models with an application to fatigue data. Appl. Stoch. Models Bus. Ind. 2014, 30, 114–128.
- Giraldo, R.; Leiva, V.; Christakos, G. Leverage and Cook distance in regression with geostatistical data: Methodology, simulation, and applications related to geographical information. Int. J. Geogr. Inf. Sci. 2021, 37, 607–633.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).