Fully Bayesian Prediction Algorithms for Mobile Robotic Sensors under Uncertain Localization Using Gaussian Markov Random Fields

In this paper, we present algorithms for predicting a spatio-temporal random field measured by mobile robotic sensors under uncertainties in localization and measurements. The spatio-temporal field of interest is modeled by a sum of a time-varying mean function and a Gaussian Markov random field (GMRF) with unknown hyperparameters. We first derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account observations, uncertain hyperparameters, measurement noise, and uncertain localization in a fully Bayesian point of view. We show that the exact solution for uncertain localization is not scalable as the number of observations increases. To cope with this exponentially increasing complexity and to be usable for mobile sensor networks with limited resources, we propose a scalable approximation with a controllable trade-off between approximation error and complexity to the exact solution. The effectiveness of the proposed algorithms is demonstrated by simulation and experimental results.


Introduction
In recent years, there has been an increasing exploitation of mobile robotic sensors in environmental monitoring [1,2]. Gaussian processes defined by mean and covariance functions over a continuum space have been frequently used for mobile sensor networks to statistically model physical phenomena such as harmful algal blooms, pH, and temperature, e.g., [3][4][5]. The significant computational complexity in Gaussian process regression due to the growing number of observations has been tackled in different ways. Xu et al. [4] analyzed the conditions under which near-optimal prediction can be achieved, using truncated observations when the covariance function is known a priori. Xu and Choi [6] developed a new efficient and scalable inference algorithm for a class of static Gaussian processes that builds on a Gaussian Markov random field (GMRF) is developed for known hyperparameters. In terms of the computational cost reduction, Ref. [7,8] showed that Gaussian processes can be formulated as infinite-dimensional Kalman filtering and such approach can scale down computational complexity. Ref. [9] combined Gaussian process and Kalman filter for efficient computation. On the other hand, unknown hyperparameters in the covariance function can be estimated by a maximum likelihood (ML) estimator or a maximum a posteriori (MAP) estimator and then can be used for the prediction [10]. However, the point estimate itself needs to be identified using a certain amount of measurements and it does not fully incorporate the uncertainty in the estimated hyperparameters into the prediction in a Bayesian perspective.
The advantage of a fully Bayesian approach is the capability of incorporating various uncertainties in the model parameters and measurement processes in the prediction [2]. However, the solution often requires an approximation technique such as Markov Chain Monte Carlo (MCMC), Laplace approximation, or variational Bayes methods, which still requires a high level of computational complexity [2]. Xu et al. [11] designed a sequential Bayesian prediction algorithm and its distributed version for Gaussian processes to deal with uncertain bandwidths. Xu et al. [12] presented sequential fully Bayesian prediction algorithms for a static GMRF with unknown hyperparameters.
Inexpensive wireless/mobile sensor networks [13] are widespread at the cost of precision in localization. Due to their growing usage, there are many practical opportunities where continuously sampled measurements need to be fused for sensors with localization uncertainty [14]. Secure indoor localization has been proposed based on extracting trusted fingerprint [15]. Theoretically-correct yet efficient (or scalable) inference algorithms need to be developed to meet such demands.
Related works involving uncertain localization in our context are as follows. Gaussian process regression has been used in building maps and localization in many practical applications. Brooks et al. [16] utilized Gaussian process regression to model geo-referenced sensor measurements (obtained from a camera) in a supervised learning manner. Kemppainen et al. [17] used Gaussian process regression to implement simultaneous localization and mapping (SLAM) using a magnetic field. O'Callaghan et al. [18] investigated the problem of using laser range-finder data to probabilistically classify a robot's environment. McHutchon and Rasmussen [19] presented a Gaussian process for training on input points corrupted by independent and identically distributed (i.i.d.) Gaussian noise. Do et al. [20] conducted visual feature selection via Gaussian process regression for position estimation using an omnidirectional camera. The works in [19,20] assume that all hyperparameters are trained offline a priori. Jadaliha et al. [21] and Choi et al. [22] formulated and solved the problem of Gaussian process regression with uncertain localization and known hyperparameters for both centralized and distributed fashions, respectively. A key limitation of such approaches with fixed hyperparameters arises from the fact that, after the initial training phase, learning is discontinued. If the environment changes, it is desirable that the localization algorithm adapts to the changes on the fly. A fully Bayesian approach that treats hyperparameters as random variables can address this issue with increased computational complexity.
The novelty of our work in contrast to ones in [11,12] is to fully consider the uncertainty on the sampling positions along with other uncertainties such as hyperparameters, observation noise, etc., in a fully Bayesian manner. The fully Bayesian field SLAM presented in [23] is quite similar to our current work in this paper. However, it is limited to a static random field while ours can deal with the time-varying random field. To the best of our knowledge, fully Bayesian prediction algorithms for spatio-temporal random fields that can take into account uncertain localization are scant to date. Hence, this paper aims to develop such inference algorithms for robotic sensor networks in practical situations by building on spatio-temporal models developed by Lynch et al. [1] and Xu et al. [2]. With continuous improvement in computation power in embedded systems, it is very important to prepare theoretically-correct, and flexible fully Bayesian approach to cope with such practical problems.
The contributions of the paper are as follows. Firstly, we model a physical spatio-temporal random field as a GMRF with uncertain hyperparameters and formulate the prediction problems with and without localization uncertainty. Next, we derive the exact Bayesian solution to the problem of computing the predictive inference of the random field, taking into account uncertain hyperparameters, measurement noise, and uncertain localization in a fully Bayesian point of view. We show that the exact solution for uncertain localization is not scalable as the number of observations increases. To cope with this increasing complexity, we propose a scalable approximation with a controllable trade-off between approximation error and complexity to the exact solution. The effectiveness of the proposed algorithms is demonstrated by experimental results in both static and dynamical environments.
The paper is organized as follows. In Section 2, we explain how a GMRF can be viewed as a sparse and discretized version of a Gaussian process. In Sections 3 and 4, we introduce a spatio-temporal field model based on a GMRF and the mobile sensor network. In Section 5.1, we present a fully Bayesian inference approach to estimate the spatio-temporal field. The Bayesian prediction algorithm is extended for uncertain sampling positions in Section 5.2. Finally, we evaluate our approach on a real experimental setup in Section 6.
Standard notation is used. Let R and Z >0 denote, respectively, the sets of real and positive integer numbers. The operator of expectation is denoted by E. A random vector x, which has a multivariate normal distribution of mean vector µ and covariance matrix Σ, is denoted by x ∼ N (µ, Σ). For given G = {c, d} and H = {1, 2}, the multiplication between two sets is defined as (2, c), (2, d)}. Other notation will be explained in due course.

From Gaussian Processes to Gaussian Markov Random Fields
There are efforts to fit a computationally efficient GMRF on a discrete lattice to a Gaussian random field on a continuum space [12,24,25]. It has been demonstrated that GMRFs with small neighborhoods can approximate Gaussian fields surprisingly well [24]. This approximated GMRF and its regression are efficient and very attractive [26] as compared to the standard Gaussian process and its regression. Fast kriging of large data sets by using a GMRF as an approximation of a Gaussian random field has been proposed in [25].
We now briefly review a GMRF as a discretized Gaussian process on a lattice. Consider a zero-mean Gaussian process: z(q) ∼ GP (0, Σ(q, q )), where Σ(·, ·) is the covariance function defined in a continuum space S c . We discretize the compact domain S c := [0 x max ] × [0 y max ] into n spatial sites S := {s [1] , · · · , s [n] } ⊂ R 2 , where n = hx max × hy max . h will be chosen such that n ∈ Z >0 . Note that n → ∞ as h → ∞. The collection of realized values of the random field in S is denoted by z := (z [1] The prior distribution of z is given by z ∼ N (0, Σ 0 ), and so we have where Σ 0 ∈ R n×n is the covariance matrix. The i, j-th element of Σ 0 is defined as Σ ). The prior distribution of z can be written by a precision matrix Q 0 = Σ −1 0 , i.e., z ∼ N (0, Q −1 0 ). This can be viewed as a discretized version of the Gaussian process (or a GMRF) with a precision matrix Q 0 on S. Note that Q 0 of this GMRF is not sparse. However, a sparse version of Q 0 , i.e.,Q 0 with a local neighborhood that can represent the original Gaussian process can be found, for example, makingQ 0 close to Q 0 in some norm [24,25]. This approximate GMRF will be computationally efficient due to the sparsity ofQ 0 . For our main problems, we will use a GMRF with a sparse precision matrix that represents a Gaussian process precisely.
We assume that we take N noisy measurements y = (y [1] , · · · , y [N] ) T ∈ R N from corresponding sampling locations q c = (q The measurement model is given by is the measurement noise and is assumed to be independent and identically distributed (i.i.d.).
Using Gaussian process regression, the posterior distribution for z ∈ R n is given by The predictive mean µ ∈ R n and covariance matrix Σ ∈ R n×n can by obtained by µ = K T C −1 y, Σ = Σ 0 − K T C −1 K, where the covariance matrices are defined as K := Cov(y, z) ∈ R N×n , C := Cov(y, y) ∈ R N×N , and Σ 0 := Cov(z, z) ∈ R n×n .
The pose of a robot can be estimated by fusing different sensory information producing its estimate and estimation error statistics [27,28]. Throughout the paper, we assume that the positions of mobile robotic sensors and their uncertainties are estimated by a standard technique. Having had the aforementioned assumption, from a localization algorithm, the prior distribution for sampling location c ), possibly with a compact support in S c . Then, the predictive distribution of z given the measured locationsq c = (q where π(z|q, y) can be obtained by Equation (3). However, the predictive distribution in Equation (4) does not have a closed-form solution and needs to be computed either by MCMC methods or approximation techniques [29]. Now we consider a discretized version of the Gaussian process, i.e., (GMRF) with a precision matrix Q 0 on S. Since the sampling points of Gaussian process regression are not necessarily on a finite compact domain S, we use the nearest grid point of a given sampling point q c in S c q [i] = arg min q∈S q [i] c − q . The sampling positions for the GMRF are then exactly on the lattice, i.e., q [i] ∈ S. The posterior distribution of z ∈ R n on S given by measurements in y ∈ R N and sampling positions in q = (q [1]T , · · · , q [N]T ) T ∈ S N is then obtained by where Q = Q 0 + HP −1 H T , b = HP −1 y, with P = σ 2 I ∈ R N×N and H ∈ R n×N defined as The proof of this posterior distribution of z in Equation (5) is very similar to that of Proposition 6.1 in Chapter 6 of [2], which was derived by using the Woodbury matrix identity.
We consider again localization uncertainty for this GMRF. Let the measured noisy locationq [i] be the nearest grid point of the measured noisy sampling pointq c of the Gaussian process. Now we obtain a set of discretized probabilities in S induced by the continuous prior distribution defined in S c . The discrete prior distribution for the sampling location q [i] is given by where π(s|q [i] ) is the continuous prior as in Gaussian process regression and V j is the Voronoi cell of the j-th grid point s [j] given by The predictive distribution of z given y andq is thus given by where π(z|q, y) can be obtained by Equation (5) and the summation is over all possible locations in S. Figure 1 shows two examples of using this approximation approach with h 1 > h 2 to convert a continuous space to a discrete one. When h → ∞,q →q c and the standard Gaussian process regression in a continuum space shall be recovered from the prediction using the GMRF in a discretized space.

Mobile Sensor Networks
Suppose that the sampling time t ∈ Z >0 is discrete. Let z t := (z t [1] , · · · , z t [n] ) T ∈ R n be the corresponding values of the scalar field at n special sites and time t.
Consider N spatially distributed mobile sensing agents indexed by j ∈ J := {1, · · · , N} sampling at time t ∈ Z >0 . At time t, agent j takes a noise corrupted measurement at its current location where the measurement errors { [j] t } are assumed to be i.i.d. The measurement noise level σ 2 > 0 is assumed to be known. We denote all agents' locations at time t by q t = q t [1]T , · · · , q t [N]T T ∈ S N and the observations made by all agents at time t by y t = y [1] t , · · · , y [N] t T ∈ R N . Furthermore, we denote the collection of agents' locations and the collective observations from time 1 to t by q 1:t = q 1 T , · · · , q t T T ∈ S Nt and y 1:t = (y 1 , · · · , y t ) T ∈ R Nt , respectively. In addition, let us define z t = (z t [1] , · · · , z t [n] ) T ∈ R n on S, and t = ( t ) T ∈ R N . We then have the following notation.
where H τ ∈ R n×N is defined by

Spatio-Temporal Field Model
The value of the scalar field at s t is modeled by a sum of a time-varying mean function and a GMRF z t Here the mean function λ ) T ∈ R p is a known regression function and β t = (β [1] t , · · · , β [p] t ) T ∈ R p is an unknown vector of regression coefficients. The time evolution of β t ∈ R p is modeled by a linear time-invariant system: where ω t ∼ N (0, W), β 0 ∼ N µ β 0 , Σ β 0 , and A t and B t are known system parameters or can be found as discussed in [30].
In addition, we consider a zero-mean GMRF [31] η t = η [1] t , · · · , η [n] t T ∈ R n whose covariance matrix is given by where t, and k are time indices, and δ(·) is the Kronecker delta defined by and the inverse covariance matrix (or precision matrix) Q θ ∈ R n×n is a function of the hyperparameter vector θ.
There are different parameterizations of the GMRF (i.e., the precision matrix Q θ ) [31]. Our Bayesian approach does not depend on the choice of the parameterization for the precision matrix. However, for a concrete and useful exposition, we describe a specific parameterization used in this paper. The precision matrix is parameterized with the full conditionals as follows. Let η be a GMRF on a regular two-dimensional lattice. The associated Gaussian full conditional mean is where Q [ij] θ is the i-th row and j-th column element of κ −1 Q θ . Here, η is the collection of η t values everywhere except s [i] . The hyperparameter vector is defined as θ is 4 + a 2 as denoted at the center node of the graph. That of Q θ is zero if j is not one of the twelve closest neighbors of i (or twelve neighbors whose 1-norm distance to the i-th location is less than or equal to 2), as shown in [32]. The equation in Equation (17) states that the conditional expectation of η [i] t given the value of η t everywhere else (i.e., η ) can be determined just by knowing the value of η t on the twelve closest neighbors (see more details in [32]). The resulting GMRF accurately represents a Gaussian random field with the Matérn covariance function as shown in [32] where K ρ (·) is a modified Bessel function [33], with order ρ = 1, a bandwidth = 1/h α 2 , and vertical scale σ 2 f = 1/4πακ. The hyperparameter α > 0 guarantees the positive definiteness of the precision matrix Q θ . In the case where α = 0, the resulting GMRF is a second-order polynomial intrinsic GMRF [31,34].
In other words, z t |β t , θ ∼ GP (F s β t , Σ θ ) ∈ R n is a non-zero mean Gaussian process. Here, the covariance matrix Σ θ is defined as inverse of the precision matrix (i.e., Σ θ = Q −1 θ ). Note that the precision matrix is a positive definite matrix and invertible, and Σ θ is the i, j-th element of the covariance matrix.
For simplicity, let us define B t = {β t , q t , y t , θ}. Using Gaussian process regression, the posterior distribution for z t |B t ∈ R n is given by The basic idea behind the model introduced in Equations (12), (14), and (15) stems from the space-time Kalman filter model proposed in [35]. The advantage of this spatio-temporal model with known hyperparameters is to make inferences in a recursive manner as the number of observations increases.
In this paper, however, uncertainties in the precision matrix and sampling positions are considered in a fully Bayesian manner. In contrast to [35,36], the GMRF with a sparse precision matrix is used to increase the computational efficiency.

Uncertain Hyperparameters and Exact Localization
In this section, we consider the problem of predicting a spatio-temporal random field, using successive noisy measurements sampled by a mobile sensor network. For a known covariance function, the prediction can be shown to be recursive [37] based on Gaussian process regression. The uncertainty in θ in a GMRF has been considered and its sequential prediction algorithms are derived in [12]. However, only the static field has been considered, i.e., µ t = µ 0 . In this section, we use a Bayesian approach to make predictive inferences of the spatio-temporal random field z t ∈ R n for the case with uncertain hyperparameters and the exact localization. To this end, we use the following Assumptions 1-5. Assumption 1. The spatio-temporal random field is generated by Equations (12), (14), and (15); Assumption 2. The precision matrix Q θ is a given function of an uncertain hyperparameter vector θ; Assumption 3. The noisy measurements {y t }, as in Equation (10) A.1 and A.2 stem from the discretization of a Gaussian process as we described in Section 2. From the model in A.1, the zero-mean GMRF represents a spatial structure by assuming that the difference between the parametric mean function and the dynamical environmental process is governed by a relatively large time scale. A.3 is a standard assumption over the measured observations [36]. A.4 will be relaxed to A.6 in Section 5.2 to deal with localization uncertainty. A.5 is from the discretization of the hyperparameter vector to replace an integration with a summation over possible hyperparameters.
We formulate the first problem as follows.

Problem 1.
Consider the Assumptions 1-5. Our problem is to find the predictive distribution, mean, and variance of x t conditional on D 1:t .
We then summarize the intermediate steps to obtain the solution to Problem 1. In what follows, our results are presented in terms of lemmas and theorems. We provide proofs when they are not straightforward.

Lemma 1.
Under Assumptions 1 and 2, the predictive distribution of x t conditional on the hyperparameter vector θ and the measurements D 1:t−1 is Gaussian with the following mean and precision matrix: where µ β t |θ,D 1:t−1 = A t µ β t−1 |θ,D 1:t−1 denotes the expectation of β t conditional on θ and D 1:t−1 and Σ β t |θ,D 1:t−1 = A t Σ β t−1 |θ,D 1:t−1 A T t + B t WB T t denotes the associated estimation error covariance matrix.
Proof of Lemma 1. It can be shown by the update using the prior precision matrix and the previous iteration as in [12] (see more details in Section 3 of [12]).
For a given hyperparameter vector θ, Equation (21) provides the optimal prediction of the spatio-temporal field in time t using data up to time t − 1.
The following lemma is used to compute the posterior distribution of θ, recursively.
The following theorem explicitly illustrates how the results of Lemmas 2 and 3 lead to the predictive statistics of x t under Assumptions 1-5, which will be the solution to Problem 1.

Theorem 1.
Under Assumption 5, the predictive distribution of x t |D 1:t is given by where π(θ|D 1:t ) and π(x t |θ, D 1:t ) are given by Lemmas 2 and 3, respectively. The predictive mean and variance follow as Proof of Theorem 1: The predictive mean and variance is obtained by marginalizing over the conditional distribution of θ given D 1:t . The marginal mean and variance are E Y (Y) = E Y (E X (Y|X)) and Var Y (Y) = E X (Var(Y|X)) + Var X (E(Y|X)), where E X and Var X denote the expectation and the variance with respect to the random variable X. Having Y := x t |D 1:t and X := θ|D 1:t completes the proof of Theorem 1.
The optimal prediction of the spatio-temporal field x t |θ, D 1:t−1 using predictive statistics of x t−1 |θ, D 1:t−1 is provided by Lemma 1. Lemma 3 provides the optimal estimator of x t |θ, D 1:t , using predictive statistics of x t |θ, D 1:t−1 which is given by Lemma 1. Using Lemmas 1 and 3 sequentially, we can update predictive statistics of x t |θ, D 1:t for known hyperparameters. Lemma 2 gives us the posterior distribution of θ based on the measured data. Finally, Theorem 1 provides the optimal Bayesian prediction of the spatio-temporal random field with a time varying mean function and uncertain hyperparameters by marginalizing θ over the conditional distribution of θ|D 1:t .
The proposed solution to the formulated problem is summarized by Algorithm 1.

Uncertain Hyperparameters and Localization
In the previous section, we assumed that the localization data q 1:t is exactly known. However, in practice, positions of sensor networks cannot be measured without noise. Instead, for example, there could be several probable possibilities inferred from the measured position. In this section, the proposed method in the previous section will be extended for the uncertain localization data.
In order to take into account the uncertainty in the sampling positions, we replace Assumption 4 with the following Assumption 6. Assumption 6. The prior distribution π(q t ) is discrete with a support Ω(t) = {q (k) t |k ∈ I(t)}, which is given at time t along with the corresponding measurement y t . Here, I(t) = {1, · · · , γ(t)} denotes the index in the support and γ(t) is the number of the probable possibilities for q t .
A straightforward consequence of Assumption 6 is that the prior distribution π(q k:r ) is discrete with a support Ω(k : r) := ∏ r g=k Ω(g). In addition, I(k : r) := ∏ r g=k I(g) denotes the index in the support Ω(k : r), and γ(k : r) := ∏ r g=k γ(g) is the number of the probable possibilities for q k:r . Now we state the problem as follows.

Problem 2.
Consider Assumptions 1-3, 5, and 6. Our problem is to find the predictive distribution, mean and variance of x t conditional on the prior P 0 and the measurements y 1:t .
For the sake of conciseness, let us define R r:k := {P r−1 , y r:k }. We then have R r:k ⊂ D r:k , where we recall that D r:k := {P r−1 , q r:k , y r:k }.
To solve Problem 2, we first look for a way to compute the posterior distribution of q t as summarized in the following lemma.   1:t−1 ) given by Equation (22), where n ∈ I(1 : t − 1), k ∈ I(t), and D We now give the exact solution to Problem 2 as follows. 1:t , y 1:t }. Under Assumption 6, the predictive distribution of x t |R 1:t can be obtained as follows:

Theorem 2. Consider the predictive distribution π(x t |D
Consequently, the predictive mean and variance are given by the formulas:

Proof of Theorem 2:
The proof is similar to that of Theorem 1. Hence, the predictive mean and variance are obtained by marginalizing over the conditional distribution of q 1:t . The marginal mean and variance are E where E X and Var X denote the expectation and the variance with respect to the random variable X.
The complexity of the proposed algorithm in Theorem 2 is proportional to the number of possibilities for q 1:t . The result of Lemma 4 enables us to compute these probable possibilities recursively. However, the number of the non-zero probable combinations grows exponentially by a power of time index t.
In what follows, we propose an approximation, with a controllable trade-off between approximation error and complexity, to the exact solution given in Theorem 2 by including an option of ignoring uncertainties on past position data. This approximation will include a case of the exact solution with the maximal and original complexity. The idea is based on the fact that the estimation of x t is more susceptible to the uncertainties in recently sampled positions as compared to old ones. To formulate our idea clearly, we present first the following results.

Lemma 5.
Using prior distribution of x t−m and measured data y t−m+1:t , where m ∈ Z >0 , the posterior distribution of q t−m+1:t can be obtained recursively via where j ∈ I(t − m + 1 : t − 1), and k ∈ I(t).
computed by Theorem 1. Under Assumption 6, the predictive statistics of x t |R t−m+1:t are as follows: To implement approximations to the predictive statistics of x t |R 1:t which are given by Theorem 2, we consider the following conditions.
C.2 For 1 m ≤ t, P t can be approximated by Under conditions C.1 and C.2, it is natural for us to propose the following approximations: In Theorem 3, the predictive statistics of x t |D (h) t−m+1:t are obtained from Algorithm 1, which is given in Section 5.1. The only difference is that we start from time t − m + 1 instead of time 1 with P t−m instead of P 0 . Note that, without condition C.2, we cannot use Algorithm 1 to calculate the statistics of x t |D (h) t−m+1:t . The proposed approximation for the case of uncertain localization in Equation (35) is quite different from a mere truncation of old data in the sense that past measurements still affect the current estimation through the approximately updated prior information using Equation (34). Note that we update the prior information from P t−m to P t with the cumulative data collected from time t − m + 1 up to time t, which is different from only using truncated observations. The proposed approximation for the formulated problem is summarized by Algorithm 2.
To further understand the nature of the proposed approximation, consider the following two extreme special cases.

Corollary 1.
As a special case of Theorem 3 for m = 1, the posterior distribution of q t can be obtained via

Complexity of Algorithms
In this section, we discuss complexity aspects of the proposed algorithms. For a fixed number of the radial basis functions (i.e., p) and a fixed number of the special sites (i.e., n), the computational complexity of Algorithm 1 is dominated by Equation (24). The complexity of Algorithm 1 in each time step is O(LN 2 ), where L is the number of possible hyperparameter vectors and N is the number of agents. The complexity of Algorithm 2 in time t is O (γ(t − m + 1 : t)) times the complexity of Algorithm 1 for m time steps. Hence, the complexity of Algorithm 2 in time t is O γ(t − m + 1 : t)LN 2 M . The numbers of special sites and radial basis functions affect the complexity of Algorithm 1 as well. The complexity of the three cases is O(n 3 ) with respect to the number of special sites due to the matrix inversion. Thus, if we use finer grids, we increase the number of special sites, i.e., n, and the complexity increase cubical. For a fixed set of L, n and N, the complexity of Algorithm 1 with respect to p is O(p 3 ).

Experimental Results
Similarly to [38], a vision-based robotic sensor was built to validate the proof of concept. The wireless mobile robot is equipped with two motorized wheels, a micro-controller, and a 360-degree omnidirectional camera, a motion sensor, a wireless receiver, and a transmitter. The omnidirectional camera is homemade from a cheap Wi-Fi remote CCD camera (Ai-Ball R , Trek 2000 International, Singapore) and a globe mirror. The vision images of the 360-degree environment around the robot produced by the omnidirectional camera are streamed via 802.11 b/g Wi-Fi interface to a laptop for image processing (see Figure 2).
In this section, we apply the proposed prediction algorithms to real experimental data. Figure 2 shows our experimental setup in which a redness intensity field is sampled by the captured images from the CCD camera on top of the mobile robot. The CCD camera captures 360-degree images. The redness intensity is computed by simply averaging the red component of the RGB picture. Noisy visual measurements are sampled at random sampling positions by our robot. The position of the robot has been measured by an image processing software which is built by the authors and the true sampling positions are obtained manually by inspection.
The experimental objective is to predict the redness intensity field over a spatial space or a spatio-temporal space using the scalar field model proposed in Section 4. Note that each image has a lot of information. However, we only used the redness out of each image as a scalar value of interest. In the future, we plan to extend this experiment for multivariate random fields for multiple features from each image. Here we are considering two different scenarios. First, we consider a static field and then a time-varying field with a moving person in the surveillance region.

Spatial Field in the Static Experiment
In this study, the spatial sites in S are considered to be 10 × 26 grid points, i.e., n = 260. The grid points are shown in Figure 2 with aqua-blue dots. The mean function µ t consists of only one radial basis function that keeps moving average of the field. Note that this basis function can model the changes in the brightness of the images caused by the slow changes of the environment lights. The center of the radial basis function is (5,13), and its bandwidth σ 1 is ∞. The prior distribution of the hyperparameter vector θ is chosen to be discrete with a support Θ = {25, 50, 100, 200, 400} × {0.1, 0.2, 0.4, 0.8, 1.6} and the associated uniform probabilities. The measurement noise variance σ = 0.01 is estimated.
To demonstrate the usefulness of our model in Equations (12), (14), and (15), and our prediction algorithm, we simulate eighty mobile robots with a single mobile robot that measures spatially distributed eighty samples, i.e., n = 80, where each of nine sampling positions is uncertain with four possibilities. Thus, there is 4 9 possible combinations for this set of sampling positions. After two sets of observations, the resulting posterior probabilities of the hyperparameters for Case 1 are shown in Figure 3. Figure 3 shows that the estimated hyperparameters converge to κ = 100 and α = 0.8, which is equivalent to = 1.58 and σ f = 0.0315. We consider three cases: Case 1 using Algorithm 1 with exact sampling positions, Case 2 using Algorithm 1 naively to measured sample positions including noisy locations, and Case 3 using by applying Algorithm 2 with m = 1. The prediction and prediction error variance are computed for Cases 1, 2, and 3. The prediction and the prediction error variance using true sampling positions (Case 1), are shown in Figure 4a,d, respectively. Red and blue colors represent the highest and the lowest values, respectively. Figure 4b,e show the resulting fields, by applying Algorithm 1 naively to measured sample positions including noisy locations (Case 2). Figure 4c,f show the results by applying Algorithm 2 with m = 1 (Case 3). Blue trails shown in Figure 4d-f represent the low predicted error variances due to sampling. The results confirm that the quality of the prediction in Case 3 are not very compromised as compared to Case 1 and demonstrate the capability of our proposed algorithm to deal with uncertain sampling positions. Figure 5 shows the controllable trade-off between approximation error and complexity for another simulated example. The complexity of Algorithm 2 increases exponentially with respect to m. The predicted field is compared between Case 1, which is the best prediction quality expected, and Case 3. Clearly, by increasing m, the mean square difference between Case 1 and Case 3 decreases. However, this mild improvement results in the exponentially increasing computational load, which guides us to use m = 1 for practical cases. Figure 6 shows the effect of an increasing number of observations with uncertain sampling positions on the three cases from the same simulated example. Here we assume that we have seven observations with known true sampling positions plus a few observations with uncertain sampling positions. Clearly, the root mean square (RMS) error decreases in Case 1 and Case 3 by adding new observations with true and uncertain sampling positions, respectively. On the other hand, as shown in Figure 6, adding new observations with noisy sampling positions could increase RMS error for Case 2. Figure 6 also shows the efficacy of our proposed algorithm, which can be compared to the previous work [23]. The previous work in [23] was limited to a static random field and achieved 3.63 RMS error with a bandwidth of 4.47. Note that our proposed method (Case 3) considers the temporal random field and achieved averaged RMS error less than 1.2 with a bandwidth of 1.58. For a fair comparison against [23], we normalize the input space of the GMRF by the bandwidth and compare resulting values in terms of RMS over bandwidth. Our current work achieved 0.759 RMS/bandwidth outperforming the previous work in [23] with 0.812 RMS/bandwidth.   The approximation order m The mean square di rence Figure 5. The mean square difference between Case 1 and Case 3 is shown for the different approximation orders m = 1, · · · , 5. On each box, the central mark is the median, the upper and lower edges of the box are the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points.

Spatio-Temporal Field in the Dynamic Experiment
In this scenario, the spatial sites are the same as in the previous experiment. The mean function µ t consists of twenty nine radial basis functions. The centers of radial basis functions are {(5, 13)} ∪ {1, 4, 7, 10} × {1, 5, 9, 13, 17, 21, 25}. The time evolution of β t is modeled by Equation (14), where the state matrix A t and the input matrix B t are given by I and 0.05I, respectively. The first radial basis function has an infinity bandwidth (i.e., σ 1 = ∞) to represent the average of the field, and the others have a bandwidth that is equal to σ j = 4. A person moves to a spot and stays still while a robot is collecting observations, the process of which is then repeated in order to efficiently simulate multiple robots. The robot collects 57, 40, 31, and 45 observations corresponding to times t = 1, 2, 3, and 4, respectively. The first column (a1-4) in Figure 7 shows the positions of moving robot and person in the domain. As mentioned, the position of the robot has been measured with a fixed camera and an image processing software. Sometimes, the robot is not visible by the positioning system since the moving person blocks the robot. The black areas in the second column (b1-4) of Figure 7 show the blind spots of the positioning system. Note that other positioning systems like GPS also have blind spots such as GPS denied areas. Therefore, when the robot moves to a blind spot the position cannot be determined precisely. In this case, we assign different probabilities on multiple sampling positions.
True sampling positions in each time step are shown with blue dots in the second column (b1-4) of Figure 7 while just blue dots in the white area are measured through the positioning system and the positions in the blind spots are recorded manually for a comparison purpose.
The prediction results of Case 1 and Case 3 are compared with a trivial method of prediction defined as follows.
Case 4: The fifth column (e1-4) of Figure 7 shows the resulting fields, by applying Algorithm 1 on just observed sampling positions. Here, all the observations whose sampling positions are uncertain are discarded. In particular, 12, 3, 2, and 4 observations have been discarded for time t = 1, 2, 3 and 4, respectively.
The predicted field using true sampling positions (Case 1) and uncertain sampling positions (Case 3) are shown in the third (c1-4) and forth (d1-4) columns of Figure 7, respectively. Since in the fully automated experiment true sampling positions in the blind spot are not available, we used the proposed algorithm in this paper to deal with uncertain sampling positions in the blind spot areas. The predicted field simply by discarding uncertain sampling positions is shown in the fifth column (e1-4) of Figure 7. The results obtained for Case 3 with m = 1, is not compromised considering the result for Case 1. Thus, the experimental result demonstrates the effectiveness of the proposed algorithm.

Conclusions
We have tackled a problem of predicting a spatio-temporal field using successive noisy scalar measurements obtained by mobile robotic sensors, some of which have uncertain localization. We developed the spatio-temporal field of interest using a GMRF and designed sequential prediction algorithms for computing the exact and approximated predictive inference from a Bayesian point of view. The most important contribution is that the computation times for Algorithms 1 and 2 with a finite m at each time step do not grow as the number of measurements increases. Two different static and time-varying experimental results along with a comparison study using simulation results provide a solid proof of concept on the proposed scheme. The proposed algorithm will be useful for robotics applications such as environmental monitoring by autonomous aquatic robots and drones. Future work is to apply our algorithms to spatio-temporal sensory information fusion for autonomous driving. In particular, we plan to predict a spatio-temporal scalar field of a risk measure. The self-driving vehicle will be designed to perform path-planning taking into account such predicted risk measures over space and time for better safety.

Conflicts of Interest:
The authors declare no conflict of interest.