Spatio-Temporal Dynamic Fields Estimating and Modeling of Missing Points in Data Sets Using a Flexible State-Space Model

: Modelling and estimating spatio-temporal dynamic ﬁeld are common challenges in much applied research. Most existing spatio-temporal interpolation methods require massive prior cal-culations and consistent observational data, resulting in low interpolation efﬁciency. This paper presents a ﬂexible state-space model for iteratively ﬁtting time-series from random missing points in data sets, namely Flexible Universal Kriging state-space model(FUKSS). In this work, a recursive method similar to Kalman ﬁlter is used to estimate the time-series, avoiding the problem of increasing data caused by Kriging space-time extension. Based on the statistical characteristics of Kriging, this method introduces a spatial selection matrix to make the different observation data and state vectors identical at different times, which solves the problem of missing data and reduces the calculation complexity. In addition, a dynamic linear autoregressive model is introduced to solve the problem that the universal Kriging state-space model cannot predict. We have demonstrated the superiority of our method by comparing it with different methods through experiments, and veriﬁed the effectiveness of this method through practical cases. Mixed-Effects model to calculate massive spatiotemporal data.


Introduction
Spatio-temporal dynamic data is a series of data with time and space dimensions, obtained by sensors, GPS, and other devices [1,2]. It is widely present in many domains, including ecology, meteorology, economics, traffic, and so on. Often, we need to use spatiotemporal data to describe or predict the rules of occurrence and development, further guiding our decision making. However, due to the inherent characteristics of spatiotemporal data in the three aspects of time, space, and attribute, it features the complexity of multi-dimensional and spatio-temporal dynamic correlation. Therefore, we require a flexible, interpretable, and accurate model to fit the spatio-temporal data and obtain a latent dynamic model [3].
Process spatial-temporal dynamic data consists of two stages: A sensor data acquisition stage and a data processing stage, as demonstrated in Figure 1. In the data acquisition stage, due to the limitations of current technology and cost, missing data [4,5] may occur in each round of data collection, and its distribution may differ. As shown in the second subfigure, black indicates effective data collection and white indicates missing data. Among them, the effective collected data are mainly composed of real data and random systematic error, which are represented by blue and red, respectively. At the data processing stage, with these incomplete sensor data, we expect to achieve interpolation estimation for generic positions, estimation for future moments, and filtering approximation for real data.
The characteristics of the above observed data have been extensively studied and various methods have been explored. Researchers have used existing sensor observation data to fill in the missing data by interpolation, including linear, spline [6], and Lagrange interpolations [7]. However, this strategy may lead to some deviations, especially the lack of consideration of spatio-temporal correlation uncertainties in describing local variations of dynamic fields. Similarly, data errors can be eliminated by filtering or statistical analysis, but this approach is mostly biased towards temporal or spatial dimensions. Various commonly used methods have been applied to model spatio-temporal field, including the Bayesian model [8], State Space model [9], and Kalman filter [10]. In practical application, these spatio-temporal dynamic models often require sufficient or consistent available sensor observations to achieve a balance between estimation accuracy and computational burden. Therefore, developing an effective and flexible spatio-temporal dynamic field modeling method from sensor observation data is essential. In this paper, we propose the FUKSS model, approximating the stochastic equation of state representation. This model retains the characteristics of universal Kriging, and uses best linear unbiased estimation (BLUE) [11] to reconstruct the estimation function. By the recursion and calculation of the state equation, we obtain the recursive variables and the calculation process, similar to the Kalman filter error correction. In this way, we make use of universal Kriging analysis of spatial correlation, while maintaining the Kriging calculation burden for each function in the sequence, in order to achieve the error recursive correction of inconsistent data and the storage of consistent intermediate variables. In essence, the algorithm is more biased to the Kalman-Kriging filtering model estimation in the spatial domain; that is, the weight of the current data in the current spatial domain estimation is increased, to some extent, in the calculation process. The result is similar to the extension of the unbiased vector Kalman filter proposed by Kitanidis [12] to the Kriging space domain.
In FUKSS, the problems to be solved and main contributions can be summarized as follows: • The estimation of general position is realized by the statistical characteristics of Kriging [13,14]. At the same time, the intrinsic random function (IRF) [15,16] is used to replace the variation function which requires prior statistics in the Kriging process, such that the whole model does not need prior information in the calculation process.

•
In the process of model calculation, a constant size state vector is maintained (the size being the number of sensors, N, in the sensor network) and the available sampling data n t at time t is obtained through the data selection matrix W t for correction and calculation. This method solves the problem of data inconsistency caused by different sampling data at different times. At the same time, for a constant extra space storage required (O(N 2 )) and computational complexity (O(n 3 t )) in the cyclic calculation process, the computational complexity can be further reduced by setting W t .
• Through the derivation process, similar to that for a Kalman filter, the model realizes a similar function to Kalman smoothing [17], such that the estimated data can eliminate the random systematic error, to a certain extent. Exponential smoothing prediction [18,19] is introduced, in order to improve the prediction ability of the model.
The remainder of this article is organized as follows: Section 2 provides related background information. Section 3 describes the FUKSS model in detail. For this model, in Section 3.1, we propose the state equation and derive the recursive equation. A specific formulation of the state-space model is described in Section 3.1.2, while the detailed calculation process is described in Section 3.1.3. In Section 3.2, we describe the selection of various functions for our model in detail. In Section 3.3, we introduce cubic exponential smoothing prediction to realize the multi-step prediction by the model. Based on this model, we conclude the paper with some numerical examples in Sections 4.1 and 4.2, an application of the algorithm in Section 4.3, and a discussion of the method in Section 5. Section 6 presents the conclusion of our work.
Notations: Matrices are in upper case bold, and column vectors are in lower case bold. The notation (.) T is the transpose operator, Z t (x) is the estimate of Z t (x). The special cases to be noted are: the regional variable functions Z and Y based on the stochastic process. Z t (x) and Y t (x) represent an implementation of a random process at time t and position x. In this case, bold capital Z t and Y t represents positions (x 1 , x 2 , . . .) at time t, which represents the n * 1 vector. Sampling data lowercase x represents the position information of a point, and a capital X represents the data set composed of n sampling points. An identity matrix of size N * N is denoted by I. x a − x b 2 represents the second order universal number between x a and x b , namely the Euclidean distance. Specific variable settings in the manuscript can be found in Appendix C.

Related Work
Estimation of arbitrary positions in space and time and filtering of observed data are also being encouraged in other fields. In these application fields, estimation methods are required as the main means to recover unknown information and solve the problems of missing data, sensor layout optimization, sensor system error and so on. It also provides support for data summary and spatial correlation study of observed data. In practice, estimation methods are needed to process spatio-temporal dynamic data, potentially in stream scenarios. Two common studies are mainly found in the literature as follows:

Kalman Filter
Kalman filter is widely used in spatio-temporal dynamic modeling. In recent years, Kriging has elicited considerable attention in describing spatial correlation, and has been widely used to compensate for spatial correlation of Kalman filter, namely Kalman-Kriging filtering model(KKF).
In KKF, Kriging interpolation is used to construct a spatial field, in order to describe the spatial correlation, and Kalman filtering is used to describe the temporal correlation [20][21][22]. Although Kriging [23][24][25] has been widely used to consider various fields of spatial correlation, in KKF, the time dimension as an additional dimension, brings non-stationary changes, that greatly increase the computational complexity of the linear equations. Considering the large computation problem, the KKF model is suitable for the interpolation of spatial sparse stations. Therefore, many researchers try to model covariance functions to reduce computational complexity, and several effective methods have been obtained, mainly including sparse matrix algorithm, rank reduction technology, and Gaussian random field. For example, Wike [26] proposed a space-time Kalman filter which has the dimension reduction function and a temporally dynamic and spatially descriptive statistical model. Cressie et al. [27][28][29] proposed a fixed-rank spatiotemporal Kalman, Spatio-Temporal Random Effects, or Spatio-temporal Mixed-Effects model to calculate massive spatiotemporal data.
As mentioned above, the KKF model has the advantages of Kriging and can also have high interpolation accuracy for sparse sites or irregular sampling distribution. However, the current calculation process of the KKF model [30] requires a lot of time to try different initial parameters, in order to obtain the optimal accuracy, which limits the usefulness and reliability of the model.
In addition, the aforementioned method requires sufficient available sensor observations to achieve a balance between the accuracy of the field estimate and the computational burden. This is mainly because the state vector and the observation vector need to be consistent in the time dimension in the KKF process. However, one of the challenges in modeling is that limited and time-varying distributions of sensor observations are available (sampling data at different moments are distributed differently, as shown in the data acquisition in Figure 1).

State-Space Model
The State-space model is also a rapid and flexible generalized model for handling a wide range of spatio-temporal dynamic problems. In the state-space, a state transition or state correlation model is constructed using stochastic theory, and then the transition or correlation model is used to estimate the value of the unobserved spatio-temporal position.
The state-space model based on Kriging can effectively use the characteristics of both. As often happens in geostatistics, the Kriging state-space model also constructs the random field into time-dependent trend terms and residual fields, where the trend term is used to describe large-scale variation characteristics and the residual field is used to compensate for local variation. The universal Kriging state-space model [3] has been used for the spatial interpolation of medical images, and the error is corrected according to a Kalman filter of time observations. The results of this model provide an optimal predictor for this process decomposition and provide a basis for extending recursive filters to the spatial domain of Kriging theory. Cressie [31] constructs Kriging state-space model using six-parameter quadric in polynomial trend surface analysis. Martin [32] construct the Kriging state model of the power trend using Euclidean distance, thus realizing the estimation of arbitrary point sources at different times. However, this model does not take further advantage of the characteristics of the universal Kriging method, which also requires the consistency of the observed data. At the same time, due to the limitations of its algorithm, it cannot carry out prediction in the absence of state inputs from the system. However, compared with the KKF algorithm, the universal Kriging state-space model requires less prior knowledge.

Problems and Solutions
Missing data and noisy data are problems encountered repeatedly in data acquisition and analysis. Especially for the analysis of spatio-temporal dynamic data, we also need to consider the characteristics of time, space and space-time. Here, periodic sensor temperature data obtained from grain storage is taken as an example to illustrate the data challenge. Due to cost and mechanized operation constraints, a large granary has only limited sensor observations (the distance between two adjacent horizontal sensors can be up to 5 m). Sensor observations may not be properly collected into the database due to unexpected reasons, such as sensor damage, communication failures, and data read/write errors. In addition, the data obtained by the sensor not only has the detection error of the sensor, but also has the systematic error caused by its principle, i.e., the temperature obtained by the sensor is the temperature in the pore inside the grains instead of the actual temperature of the grains. At the same time, this kind of data have obvious temporal and spatial relationships, i.e., its temperature changes are continuous in time and correlated in space.
In general, there are three problems related to the automatic processing of collected data: (1) The inconsistency of periodically collected data; (2) Possible deviation between the observed value and the real value; and (3) problems related to the temporal collected data and historical data. For the above problems, the existing models require consistent observations in the spatial and temporal domains to estimate dynamic fields. Comparatively, Kalman filter needs more data for pre-processing to obtain the initial parameters of the model, so it has a better prediction structure than the state-space model. In order to simplify the calculation, the proposed model is based on the state-space model, so it does not need additional data for pre-processing.
The FUKSS proposed in this paper provides a cyclic estimation method for spatiotemporal dynamic fields. The data selection matrix W t is used to realize the correspondence between the state matrix and the observation vector, so that the observation data of different sizes at different times can be directly used in the circular calculation process without additional operation. At the same time, we use the universal Kriging feature to estimate the arbitrary position and eliminate the system random error in a way similar to Kalman filter. Finally, exponential smoothing prediction is introduced to obtain quasi-observed value to simulate state input, so as to improve the prediction ability of the model.

The FUKSS Model
The FUKSS model is based on a universal Kriging state-space model and introduces the observed data Y t with length n t of different quantities and distributions into the cyclic calculation process. In the calculation process, we maintain a constant state vector Z t with length N and make corrections through different observation vectors. The FUKSS model is described by the following state and measurement equations: where Z t = [Z t (x 1 ), . . . , Z t (x n )] T are the state values at N sensor nodes positions, and v t is a vector of observation noise (n t * 1), as determined by the measurement error of the sensor. We can determine and it is uncorrelated with all past noise and data. F T β t is the expected value, where T . Among them f (x) denoting the r known drift functions, which often used in universal Kriging [33,34] estimation, and β t is unknown drift coefficients vector (r * 1). ε t = [ε t (x 1 ), . . . , ε t (x n )], ε t (x) denotes the spatially correlated errors, with zero mean and known covariance function: In the calculation process, we introduce the selection matrix W t with size n t * N and rank n t . For a sensor network with N sensor nodes, all sensors are numbered to obtain a vector for facilitate calculation. Therefore, the T rounds data collection will form a measurement matrix, as shown in Figure 1. Here, each row of W t has only one 1 and the rest of the entries are 0, where W t (i, j) = 1 representing the i th data Y t (x i ) of Y t in the coordinate of measurement matrix is j at time t. The expression is: The essence of the FUKSS model is to maintain a fixed state variable Z t , and use a different number of observation variables Y t to calibrate and estimate Z t (x), in order to realize estimation of the whole process. In this model, the current state, Z t (x), is determined by the previous state Z t−1 (x) and the stochastic model f T (x)β t + ε t (x) used in universal Kriging. Therefore, we can obtain the estimated for a generic spatial site x, as follows:

Simplified Calculation
When working with this model, although the state equation is effective, it is difficult to further calculate the Kalman filtering process. Therefore, we expand the state equation and carry out some related simplifications: Using these definitions, Z t can be written as By definition, we know that the trend term F T m t provides the mean, and the first component c t−1 = c t − ε t is a latent variable representing the error. Therefore, we can obtain the same expression as the spatial universal Kriging method: In contrast, the effective values we use for the universal Kriging estimate can be expressed as: It is inevitable that we consider the change of the state equation at different times. Our model is based on the state of the previous value and introduces a universal Kriging estimate; the specific process is shown in Figure 2. As can be seen from the figure, our model is mainly divided into two parts: One is the latent variable c t representing the errors, and the other is the trend term F T m t . In the model, we use the current observation data to simulate the trend term, and use the method similar to Kalman filter derivation to carry out correction and progressive estimation of latent variable. Therefore, the essence of this model is to update and cyclically estimate the error after removing the known trend term.
For the sake of consistency, we assume that neither trend term F nor Systematic error distribution R t changes with the time t. For spatially correlated errors ε t (x) at different times, we assume that: Finally, as an initial condition, we assume that Z 0 is a zero-mean process with a covariance function aE[ε 0 ε T 0 ], in order to keep c t consistent over time. Where a is a known parameter. Thus, we can extrapolate that Z 0 (x) is not correlated with ε t (x) for all t from the hypothesis.The next part of this article details the derivation and lists the required leading parameters.

Recursive Estimate
Given the observed data Y t and the latent variable c t−1 at time t, we estimate Z t (x) to obtain Y t (x). In this case, we use BLUE, which is widely used in universal Kriging, with expression by Equation (11): Unbiasedness implies that the estimate must satisfy The best estimate Z t (x), then minimizes the mean squared error Squaring the argument and applying the expectation yields where In this result, we know that v t is white noise, such that its expectation with the other values is 0. The terms q T t (x)W F T m t and f T (x)m t can be removed from the expression by Equation (14). Thus, the question now is how to compute q t (x). The q t (x) is derived by minimizing the variance of the prediction error, subject to FW T t q t (x) = f (x), such that the prediction is unbiased. The constrained minimizing the variance problem can be solved by the Lagrange multiplier method: where the Lagrange multipliers λ are unknown and must be estimated. Setting the partial derivatives of this criterion with respect to g t (x) and λ equal to zero, we obtain the following equations: . (24) Therefore, we can solve for q t (x), yielding the desired minimizing weight vector where As all K and R t are positive definite, and this implies that the inverse exists. P t is positive semi-definite covariance matrices, and L t is also positive definite. Then, we can prove that FW T t L t W t F T is full rank and has rank r because we have assumed F has rank r. This implies that M t exists.
By the generalized least squares method we can obtain the unknown drift coefficients vector m t in Equation (10): Thus, we can obtain the estimates of c t and c t (x) by the following equations: Before estimating these unknown terms, we can obtain a recursive estimate, c t , as follows: Then, we evaluate P t , as in Appendix A: To achieve the unknown quantity p t−1 (x), we discuss the relationship between p t−1 (x) and k(x) further in Appendix B, and obtain: We yield the solution between g t (x) and G t : The solution in Equation (33) we can also be written as: Therefore, Finally, we use the equation in Equation (5) to write:

Recursive Estimation and Parameters Settings
Our algorithm has a lot of prior variables, as shown in the previous derivation; mainly K, k(x), P 0 , R t , and F. First, the parameter selection of R t is mainly based on the measurement error of the observation itself. Therefore, the observed quantities are independent of each other. Therefore, for R t , we need the corresponding probability distribution to obtain the variance.
Second, we can find the value of P 0 = TK using the length of the sample data, or we can just specify the value of P 0 = 0 or a smaller value, when T is large.
Third, we must specify drift functions to obtain f (x) and F, mainly to reflect spatial non-stationarity. However, as the specific order of the spatial trend distribution cannot be obtained, the lower order is generally chosen to reflect the possible spatial trend characteristics, and the experiment proves that this choice also obtains good results. In fact, this phenomenon also conforms to the characteristics of universal Kriging estimation. In the process of universal Kriging estimation, it is necessary to calculate the variogram or covariance function of the residual error by removing the drift term. Compared with the calculation without drift term, the result can be made more accurate by removing the trend term of lower order.
Finally, it is difficult to choose the general k(x a , x b ) in Universal Kriging theory, to determine K and k(x). To simplify the calculation, we chose intrinsic random functions (IRF), which are a more generalized covariance function. In this paper, we propose two generalized covariance functions. Matheron [15] proposed an IRF model of order k, with expression k( x a − x b 2 ) = x a − x b 2k+1 2 . Watson [16] proposed an IRF function similar to a spline, with expression k( x a − x b 2 ) = x a − x b 2 2 log x a − x b 2 . Therefore, the form of k(x a , x b ) in this paper mainly refers to these two forms, and good results were obtained in the experimental verification. The specific steps are provided in Table 1. As shown in table, this FUKSS model maintains a fixed-size state variable c t and transfer matrix P t . The complexity of calculation at time t is only related to the quantity of valid sampling data Y t . Initialize: t = 1, P 0 = αK, c 0 = 0(n * 1).
Calculate the matrices: Updated parameters :

Dynamic Linear Autoregressive Model Specification
The prediction ability discussed in this paper is mainly for multi-step prediction ability. In this experiment, the time interval between the time-series data is equal, and the current data are used for estimation and revised in our calculations. Therefore, this method is not suitable for prediction, due to the lack of a specific model to exclude the prediction of future functions. Compared with spatial interpolation, our model can eliminate the observation error, to a certain extent, with an operation similar to that of a Kalman filter. To solve this problem, we introduce the dynamic linear autoregressive model to solve the prediction problem. During prediction, the predicted value is closer to the current value, as illustrated in the subsequent tests. A simpler alternative model, based on the dynamic linear model, is the Holt-Winters model. This model has three main components: A smoothing sequence (S t ), a trend sequence (B t ), and a periodic smoothing sequence (C t ). We can combine these components together with several methods. To make the model more general, we use an additive model, which integrates the components together to obtain the full model.
where L is the length of periodic sequence. Inside the model, we use the cubic exponential smoothing method to yield the components. ζ, τ, and γ are the parameters used, which have values between 0 and 1. The equations are given below.
The initial data have little impact on the whole model, so we choose S 0 = Y 0 (X), We introduce exponential smoothing prediction to provide additional quasi-observable values for multi-step prediction. The detailed steps are shown in Table 2. Exponential smoothing prediction is introduced to enhance FUKSS multi-step prediction. In the prediction process, we first use our proposed model to re-estimate the observed values, in order to improve the accuracy of exponential smoothing prediction, which can realize data cleaning (e.g., denoising) to a certain extent. For the prediction stage, we use the updated parameters of the iteration to re-estimate the quasi-observed values, which makes the actual predicted value more in line with the spatial distribution law set by the model. However, this prediction error also increases with the increase of the prediction step size. At the same time, due to the limitation of the dynamic linear model, it can predict and describe the time-series with trend and periodic change more accurately. Initialize: t = 1, P 0 = αK, c 0 = 0(N * 1). Given: K, k(x), α, R t , F,W t ,T pre , L, ζ, τ, and γ For t = 2, . . . , T Loop calculate the matrices: L t , M t , G t , P t , c t Calculate the desired estimate using: According to the order calculate the corresponding parameters: L t , M t , G t , P t , c t , S t , B t , C t . End for

Experiments
In this section, we present experimental comparisons of different types and dimensions. First, we conducted experiments on a thermodynamic fitting data set, in order to demonstrate some of the model's characteristics, including accuracy, continuous estimates of different sampled data, and prediction ability. We also apply the FUKSS model to real data.

Experiments on Thermodynamic Model
There have been many studies on thermal spreading models in the field of spatiotemporal analysis and prediction [35]. In the aspect of spatio-temporal correlation, they have very obvious characteristic. At the same time, their complexity is manageable. A 1D simple thermodynamic model is: where T is the temperature field, including time t and position x, and a is the thermal conductivity coefficient.Using the finite difference method, it can be discretized as The time of thermodynamic simulation was 200 time steps, and 41 discrete points were selected for calculation to construct the data set. The initial temperature distribution was 25 − 40 * (x − 0.5) 2 , which is a second-order distribution, and the boundary conditions were fixed. The result is shown in Figure 3a. White noise with a standard deviation of σ = 0.5 was also added to the simulated data, the results are shown in Figure 3b. We evenly chose 10 points in the interval at which to observe the functions. The sampling model is shown in Figure 3c.
We applied the FUKSS with model parameters k(x a , x] T , R t = 0.5 2 I, and α = 5. In the actual work process, we often cannot determine the trend term order of the distribution, and generally choose a low possible order. Therefore, we chose first-order f (x) in this experiment, and proved that this choice also achieved good results. For the parameter R t , we chose the observation error as a Gaussian distribution with standard deviation of 0 and variance of 0.5 2 . The selection of other parameters is detailed in Section 3.2 and the discussion.
The resulting function estimates are depicted in Figure 3f. The same sampled data were used in linear and Kriging. Note, that we adopted the same covariance parameter as FUKSS, as the fitting of the covariance function could not accurately describe the actual situation due to the small amount of sampling data for Kriging interpolation. Compared with the experiment, we found that the three methods could fit the actual situation well, but the proposed method had a smaller error. At the same time, it can be seen, from Figure 3f, that the Kalman filtering method could effectively eliminate the observation noise. We further compared different sampled data at different times, and the results are shown in Figure 4. Different sampling methods had an obvious influence on the spatial interpolation but, through iteration, our model obtained better results, compared with the other spatial interpolation models. At the same time, comparing Figures 3f,i and 4f,i, under the condition of the same number of observable sensors, multiple sampling methods can improve the fitting accuracy of the model.
In Figure 5, we took out 40 time-steps equally from 200 time-steps, in order to simplify the calculation process. The first 30 time-steps were trained and the last 10 time-steps were used as prediction test. We applied the KUKF Prediction algorithm with cubic exponential smoothing model parameters ζ = 0.8, τ = 0.3, γ = 0.2, and L = 0. For Figure 5e, we chose this model instead of KKF, mainly as the Kalman filter cannot be corrected when making multi-step prediction, which will increase the error and does not conform to the usage of the Kalman filter itself. As for Figure 5f, we used FUKSS to make the data similar to Kalman smoothing, improving the prediction result of the dynamic linear model. This method optimizes the Kalman filter correction problem in multi-step prediction, to a certain extent. However, at the same time, the model is required to have a more suitable initial parameter setting, which makes the model establishment more difficult.

2D Function Simulation
For the external variation conditions, we also carried out relevant experimental verification. Figure 6a shows a two-dimensional thermodynamic conductivity diagram, the bottom of which has a fixed temperature value, where the external temperature varies with time, and different thermal conductivity coefficients in the X and Y directions are used. We imitate the basic coefficient of grain storage, where the length was 24 m and the height was 7 m. The thermal conductivity referred to the actual thermal conductivity of wheat in the actual storage. The initial condition was set as uniform temperature, and we conducted 360 simulations with days as the unit of time. The external conditions were fitted into the annual weather changes using a second-order function. Sampling points are set using a grid sampling method with six points in the X direction and five points in the Y direction. In addition, we randomly selected 90% of the data at all sampling points for each time period. We resampled the time-series on a seven-day basis, in order to obtain 50 time-series with sampling intervals of weeks. Taking the first 40 as training, we predicted the temperature of the 45th. We added noise to the sampled data, using is a random distribution with a standard deviation of 0.5.The parameters selected for fitting were as follows: We chose R t = 0.5 2 I and a = 40. Figure 6b was obtained using our model. Figure 6c shows radial basis interpolation [36], where the radial basis function is: As can be seen from Figure 6, the changes of external conditions were described more accurately using the proposed model. In particular, at t = 23, a small bull's-eye region appears obviously in the radial basis function fit, mainly caused by the random error of sampling. In our model, there was obvious consistency between the before and after, and the timing prediction with five steps was also described more accurately.
The grain storage process is mainly affected by external weather conditions, which mainly present periodic changes. Therefore, we used the dynamic linear regression method to strengthen the multi-step prediction and obtain relatively more realistic results. At the same time, the spontaneous heat conduction inside the grain pile forms the distribution pattern of hot and cold zone, for which the Universal Kriging method can provide a more accurate description. Therefore, the prediction model can provide more accurate description and prediction for the grain storage process. In contrast, this linear regression model conducts multi-step prediction mainly for spatio-temporal data with periodic changes in the boundary conditions.
To further verify the reliability of our model, we used signal data to carry out fitting, in which changes in the temporal characteristics are more common. First of all, we used a sinc function to generate a time signal, and added spherical information on this basis, such that the simulation data contained spherical coordinate information and sinc function time information. As shown in Figure 7a, the height of the sinc function increased over time. We intercepted data from 25 equal time intervals, each of which was sampled on an equally spaced grid. Moreover, we added white noise to it, where the standard deviation was 0.1 times the maximum signal value of the whole process. The first 20 intervals were used for training and the last 5 were used for prediction. To estimate the original function from the noisy data, we applied the Kriging filter with data covariance function and the drift functions (according to Equation (47)). In this case, we chose R t = σ 2 I and a = 20. The results are shown in Figure 7b. For further comparison, cubic spline interpolation was selected for interpolation analysis of the sampled data, for which the results are shown in Figure 7c. Cubic spline interpolation preserves the advantages of piecewise interpolation polynomials and improves the smoothness of interpolation functions. However, as the sampling data contained noise, there were large fluctuation when cubic spline interpolation is used. Comparing (b) and (c) in Figure 7, both methods can preserve the characteristics of the original simulation to a certain extent, but the proposed model had smoother results, compared with cubic spline interpolation [37].

Application to Real Data
The Pacific Sea Temperature (PST) data set, which can be obtained from the Climate Data Library at Columbia University (http://iridl.ldeo.columbia.edu/SOURCES/.CAC/) [38], has 2520 sample temperature data points per month on the Pacific from January 1970 through March 2003. The sampling mode is grid (84 * 30), with a longitude and latitude interval of 2. In the PST data, the first 120 time-series were used for training, and the observation error was set as 0.5. The data of the subsequent 24 months were then predicted and described.The forecast period was 24 months, from January 1980 to December 1982.
To estimate the PST, we first used data normalization to make the coordinates in the same grade interval, and each time was resampled on a regular 21 * 10 grid. We applied the Kriging filter with the data covariance function in Equation (47), drift functions as f (x) = 1, x, y, xy, x 2 , y 2 T , and cubic exponential smoothing model parameters ζ = 0.45, τ = 0.2, γ = 0.95, and L = 12. The results are shown in Figure 8, while Figure 9 shows a comparison of the temperature of a single coordinate over time. As can be seen from the two figures, our method achieved good results, could effectively predict the temperature value of unknown points in the training data, and can also effectively show the trend of change in the prediction process. It can be seen, from the experiment, that the training deviation was relatively small, mainly due to the accuracy of universal Kriging estimation and Kalman filter using the correction of the current observations. At this point, we introduce the dynamic linear model to introduce additional quasi-observable values. Experimentally, it was found that the introduction of such quasi-observable values in the multi-step prediction led to better results. The possible reasons for this are as follows: (1) Kalman filtering cleans and smooths the data in the time-series, by removing the possible noise or outliers, such that better results can be obtained in dynamic linear prediction of the smoothed data; and (2) during the simulation interpolation, universal Kriging also fitted the data with features and correlations, in order to make the data more consistent with the actual situation. Although we solved the multi-step prediction, to a certain extent, the error of the additional quasi-observable values obtained by the dynamic linear model increases as the step size of the prediction increased. At the same time, this quasi-observable value error is also transmitted to the gain matrix, thereby increasing the error. In our choice of May 1981, which is a five-step forecast relative to the time interval, we can clearly see the variation of this error. The temperature of the Pacific Ocean changes on an annual basis. Although the range of change in this cycle is not large, it is also shown in the single point chart. Generally, the annual period of change is about 5 • C, but that in some regions can also reach a range of more than 10 • C. This is something that we have not fully solved yet as, over time, these errors accumulate and the forecast starts to drift. However, we still obtain high accuracy in multi-step short-term forecasting.
The considered grain warehouse temperature test system was a generally distributed system. The temperature sensor is stored in a cable, and fixed on the wall at one end. In this way, a rather warehouse impression of the locations of temperature sensors could be obtained. In this data, on the horizontal plane, the vertical and horizontal spacing of temperature measuring points were 4.3m and 4.6m, respectively, with a total of six rows and 11 columns, a height distance of 6.7m, and a total of five layers. The sensor arrangement is shown in Figure 10a. The data, mainly from October 2016, showed that it cooled during the November period, until June 2017, the data were mainly temperature data, and the collection time was once a week. To estimate grain warehouse temperature, we used data from the first 10 weeks for training and predicted data from the 14th week. From week 5, a small amount of sensor information was incorrectly collected and the value is null. In weeks 7 and 8, information collection of the entire temperature measurement cable was lost.
We applied the Kriging filter with the data covariance function in Equation (47), drift functions as f (x) = [ 1, x, y, z] T , and cubic exponential smoothing model parameters ζ = 0.8, τ = 0.5, γ = 0.3, and L = 0. In this case, we chose σ = 0.5 and a = 10. The results are shown in Figure 10. The granary temperature monitoring system is a three-dimensional storage system. As such, this analysis was carried out in 3D, as showing in Figure 10b-d, with an expanded view of the outer surface. Unsurprisingly, the change was not very dramatic, given the huge size of the silo and the fact that bulk grains are poor conductors of heat. However, for the temperature forecast from week 11 onwards, we obtain better results, as shown in Figure 11. The reason for this lies in the particularity of grain storage. Grain storage is mainly affected by external conditions and shows periodic changes, making it a suitable scenario for the dynamic linear regression model. Figure 11. Prediction against sampled points for all real points.

Discussion
The FUKSS model presented in this paper is a step-by-step method which can be used to reconstruct the time-series of spatial functions from scattered data. As our model is based on the universal Kriging state-space model, the calculation process is based on Kriging's spatial statistics and the derivation is similar to that of a Kalman filter. In terms of space, we do not need to have strict distribution requirements for the sampled data. Using the statistical characteristics of Kriging, partial missing or new data have little impact on the estimate. In terms of time-series prediction, this model has the advantages of small memory requirement (constant extra space storage required O(N 2 )), acceptable computational speed (constant computational complexity O(n 3 t )), and being suitable for dynamic problems. The keys to deducing the recursive filter is to use linear unbiased estimation, establishing the optimality of the recursive formulation. This method uses not only historical data but also the spatial relationship between the historical data, which optimizes the results. To further solve the problem that it is impossible to make long-term predictions, we introduce the dynamic linear regression model to carry out secondary processing on the estimated value of each step, such that the spatial information can still be retained, to some extent, in the long-term prediction. However, there are still some limitations to our model.
As FUKSS is based on the universal Kriging state-space model, our algorithm can be simplified to a similar method of universal Kriging interpolation and Kalman filtering in special cases. If the sensor accuracy is not taken into account-namely, if the measurement data is unbiased-we can simplify it, to a certain extent, to universal Kriging interpolation fitting in this time period. If we do not consider the drift term in Equation (1), we can simplify it to space-time Kalman filtering [39]. Therefore, the parameters of universal Kriging and Kalman filtering are used simultaneously in our model, such that the model has the advantages and disadvantages of both.
Our algorithm has a lot of prior variables; for example, we have to specify K, k(x), α, R t , and F. The selection method for these parameters (except for a) is similar to that of universal Kriging, which has been described in detail in the literature [31]. In general, we recommend using a polynomial drift function for F, as discussed in the literature [40]. For K and k(x), we cannot calculate them statistically as with universal Kriging, or fit it in any other way, as they would take a lot of computational effort to determine. To simplify the calculation, we chose IRF, which are a more generalized covariance function [15,16] . If enough data points are available, the selection of the covariance function is not critical. Finally, the noise covariance is mainly determined by the parameters of the sensor which has an effect similar to the smoothing parameters of the smooth spline.
Compared with KKF, the advantage of our method is that it considers the development law of dynamic system through the state space model, so it does not need to spend a lot of prior calculation to obtain the initial parameters. However, in the calculation process of KKF, the parameters are estimated from all historical data, which is more universal than our algorithm. From the results, our model focuses more on temporal space, while KKF focuses more on time. To estimate past values of the function, our method is similar to estimation by Kalman smoothing, which can obtain a relatively good result. As the trend term F T m t in our model is determined by the current sampling data Y t , our model is more unsuitable for time prediction than KKF. Therefore, we introduced a dynamic linear regression into our algorithm, in order to supplement the function of time prediction. At the same time, due to the limitations of dynamic linear models, our multi-step prediction may be more suitable for spatio-temporal data with periodic and trend changes. Comparing the simulation and actual data, good prediction results were observed.
Some limitations of the FUKSS and its modified, as presented, is the assumption regarding the initial parameters. First, we assume that c 0 (x) is a zero-mean process with a covariance function E[c 0 (x a )c 0 (x b )] = αk(x a , x b ). Of course, the known non-zero mean can be obtained from all the data vectors, but this requires a lot of extra work for the general covariance function. We chose [26], as it proposes two most likely cases, which do not add significant complexity to the description of the algorithm. Those two cases are as follows: (1) c 0 (x) is known, such that α = 0; and (2) c 0 (x) is determined by the definition, in which case it can be simplified to the length of the running time. In practical application, we prefer the second case, which is simulated as shown in Figure 4. On the other hand, we prefer to take a smaller value than zero if enough data points are available. Second, our algorithm takes advantage of the characteristics of Kriging space statistics and does not require fixed observation points. For the additional sample data, the computation requirement of our algorithm will be greatly increased. Finally, as for selecting parameters for dynamic linear regression, we need prior knowledge to obtain better results, which can lead to extra workload. Due to the characteristics of the cubic dynamic linear regression model and the certain smoothing of the historical data by our FUKSS model, rough parameter selection can be used to obtain relatively good results.

Conclusions
In this work, we proposed a spatio-temporal dynamic field estimation algorithm with a bias toward spatial optimization, based on universal Kriging state-space model. This model can describe the spatio-temporal characteristics of spatio-temporal data flexibly and efficiently, and yield its interpolation or prediction accurately. The proposed method uses spatial statistical information to reduce the requirement of sensor sampling consistency. We introduced dynamic linear regression into our algorithm, in order to supplement the function of time prediction. The salient features of this method include handling the spatial covariance matrices and tackling measurement noise. Numerical comparison indicated the feasibility and accuracy of the proposed method. Future research will consider how to set up self-organizing networks in the deployed sensors using temporal and spatial correlation, such that the service life of the sensors can be extended.
Author Contributions: Z.S. conceived the study and contributed to the design of the study, modeling, data interpretation, drafted and revised the manuscript. X.Z. contributed to the development of algorithms, data evaluation, and data analysis. Both authors reviewed the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
For P t , we can write it as: (A1) From the orthogonality principle, the linear minimum variance estimate c t = ∑ t i=1 G i Y i is the orthogonal projection of c t onto Y. We can obtain orthogonality by: Therefore, the second term of the equation can be reduced to E[(c t − c t )(G t Y t ) T ] = 0. Thus, we have: where we use and v t is uncorrelated with c t−1 , c t−1 , ε t . Furthermore, using the advantage of the unrelated properties between ε t and c t−1 − c t−1 , we find

Appendix B
In order to derive an expression for p t (x) = E[(c t (x) − c t (x))(c t − c t )], we first observe that the linear minimum variance estimate c t (x) = ∑ t i=1 g T i (x) Y i is the orthogonal projection of c t onto Y, by the orthogonality principle. Therefore, To simplify c t = ∑ t i=1 G i Y i , we can write it in terms of another equivalent BLUE We also have that A i W i F T = 0, i = 1, 2, . . . , t. Therefore, we obtain: Next, we replace k(x) by KK −1 k(x)and use P t = [p t (x 1 ), . . . , p t (x n )], K = [k(x 1 ), . . . , k(x n )] to obtain: Appendix C Table A1. Nomenclature.

N
The ideal number of sensors. n t Number of effective observed sensors at time t. x Space position. X Set of interested sites in quantity N. Z t (x) State variables at location x and time t. Y t (x) Observations at location x and time t. Z t The state vector at time t for N sensors. Y t Effective observed vector at time t for n t sensors.
The selection matrix is used to represent the conversion relationship between ideal sensors and effective observation sensors, and its matrix size is n t * N and its rank is n t .
Known drift functions at location x with size r * 1.
Known drift functions at N ideal sensors location with size r * N. β t = [β t1 . . . β tr ] T Unknown drift coefficients vector at time t with size r * 1.
ε t (x) The spatially correlated errors with zero mean and known covariance function E[ε t (x a )ε t (x b )] = k(x a , x b ). ε t Vector of ε t (x) for ideal sensors at time t with length N. k(x a , x b ) Kernel function in terms of locations x a and x b . K Covariance matrix of ε t (x) with size N * N. c t (x) a latent variable at location x and time t for simple calculation. c t The latent state vector at time t for N sensors. c t (x), c t Estimate of c t (x) and c t , respectively.
Y t A simple representation of the computational process, and its expression is Y t = Y t − W t c t−1 .

v t (x)
System random error at location x and time t, generally regarded as white noise, is independent of time and position.
v t Vector of v t (x) for effective observed vector at time t with length n t .
R t Covariance matrix of v t with size n t * n t , which is diagonal matrix. P t Covariance matrix of c t − c t with size N * N. p t (x) Covariance vector between c t − c t andc t (x) − c t (x) with length N.

k(x)
Covariance vector between ε t and ε t (x) with length N.
q t (x) The weight value when using BLUE estimation for location x and time t. ζ, τ, γ,L Parameters of the Holt-Winters model. S t , B t , C t−L+1 Intermediate variable vector of the Holt-Winters model. L t , M t , m t , G t , g t Simplified variables in calculation process.