A Two-Step Method for Missing Spatio-Temporal Data Reconstruction

Missing data reconstruction is a critical step in the analysis and mining of spatio-temporal data; however, few studies comprehensively consider missing data patterns, sample selection and spatio-temporal relationships. As a result, traditional methods often fail to obtain satisfactory accuracy or address high levels of complexity. To combat these problems, this study developed an effective two-step method for spatio-temporal missing data reconstruction (ST-2SMR). This approach includes a coarse-grained interpolation method for considering missing patterns, which can successfully eliminate the influence of continuous missing data on the overall results. Based on the results of coarse-grained interpolation, a dynamic sliding window selection algorithm was implemented to determine the most relevant sample data for fine-grained interpolation, considering both spatial and temporal heterogeneity. Finally, spatio-temporal interpolation results were integrated by using a neural network model. We validated our approach using Beijing air quality data and found that the proposed method outperforms existing solutions in term of estimation accuracy and reconstruction rate.


Introduction
Following both the rapid development and popularization of geographic information and the enhancement of data collection, data with temporal and spatial attributes are quickly accumulated and form large numbers of spatio-temporal datasets [1]; however, missing data are extremely common; for example, missing data on air quality monitoring sensor readings, missing data on floating car track points or the absence of mobile phone signaling records.If these gaps in data cannot be accurately filled, subsequent analysis and modeling of the data can lead to inaccurate results and unreasonable inference [2].Simply deleting records containing missing data would lead to significant loss of original information and would be a waste of data resources [3]; therefore, methods to accurately and efficiently interpolate missing data are urgently needed.
Spatial interpolation methods mainly use spatial correlation among data to interpolate missing data.Traditional methods (e.g., inverse distance weighting (IDW)) simply assume that the data distribution obeys the first law of geography, namely the closer data are in spatial distribution, the greater the contribution they make to missing data interpolation.During interpolation, each site is assumed to be independent of the others, and weights are calculated by computing the distance between missing data and the surrounding site [11,12].Most approaches use kriging, a linear regression method that utilizes minimum mean square error and does not treat each site independently.The method assumes consistent sample mean and variance to meet second order stationarity and assumes that the covariance between any two spatio-temporal points is only associated with the distance (i.e., the absolute position of time and space is irrelevant) [13,14].However, due to the existence of spatial and temporal heterogeneity, the data distribution can show uneven characteristics and relationships according to different regions [15]; therefore, the accuracy of interpolation results obtained by existing methods remains unsatisfactory if data are not homogeneously distributed.To solve this problem [16] considered spatial autocorrelation and heterogeneity in a study area and proposed a point estimation model of biased hospital-based area disease estimation (P-BSHADE).The P-BSHADE method calculates the covariance and correlation coefficient of historical observational data and uses the expectation between surrounding stations and interpolation sites to get an optimal linear unbiased estimator.However, in cases of continuous missing data, the method leads to a singular value of the missing data matrix, which results in large interpolation error.At the same time, this method does not consider the heterogeneity of the time dimension [2].
Time series prediction methods typically use historical data for a given location to build a prediction framework for predicting the values of missing data points at the same site.The autoregressive integrated moving-average (ARIMA) model [17] and simple exponential smoothing (SES) [18] are two representative examples of this approach.However, this approach fails to address two major problems.First, many prediction models do not fully utilize the essential characteristics of spatio-temporal data, which can degrade performance; second, if a consecutive series of data is all lost, prediction methods often cannot achieve complete reconstruction [19].
Given that single dimension interpolation methods only consider spatial or temporal dimensions, achieving satisfactory interpolation results is challenging.In recent years, a number of studies have extended single dimension interpolation methods to consider both space and time; for example, spatio-temporal probabilistic principal component regression (ST-PCR), spatio-temporal IDW (ST-IDW), spatio-temporal kriging (ST-kriging) and the spatio-temporal heterogeneous covariance method (ST-HC) [2,3,7,9,10,[20][21][22].ST-PCR [9] is a statistical learning-based method, which takes advantage of the statistical feature of observed data.However, it often needs a strong hypothesis over the data.ST-IDW [23] defined a three-dimensional space-time distance, which then applied IDW to estimate missing values; however, due to the existing problems with the IDW method, application of ST-IDW remains limited and fails to achieve unbiased estimation.The ST-kriging method [14] estimates the interpolation function by adopting a spatio-temporal covariance function, but it does not take into account the effects of spatial and temporal heterogeneity on interpolation results.To address this issue, the ST-HC method, which is an extension of P-BSHADE, estimates missing data by considering temporal and spatial heterogeneity.Missing datasets are firstly partitioned into homogenous spatial regions, and then, the most correlated spatial sampling and time sampling series are selected for the partition of missing data.According to the P-BSHADE algorithm, both spatial and temporal contribution weights are calculated to obtain the best linear unbiased estimates in spatial and temporal dimensions.Finally, using the correlation coefficient to determine the spatial and temporal weights, estimated values in spatial and temporal dimensions are integrated to obtain overall estimated values of missing data [2].However, this method requires the whole dataset to participate in computation, which leads to high computational complexity and a large volume of redundant data.For example, when the time span of the dataset is large, including the n dimensional space sequence, both n 2 covariance and the correlation coefficient need to be calculated for each missing data point.In addition, when data are missing continuously, interpolation accuracy is low, and it may even be impossible to obtain a final interpolation result.
Missing data reconstruction methods are further challenged by the variation in patterns of missing data [24,25]; for example, missing data completely at random, non-random missing data [26], or whole blocks of missing data [27] (Figure 1).The work in [10] formulates the spatio-temporal sensory data as a high-dimensional tensor, using the tensor completion method to recover missing values.However, when a whole temporal or spatial dimension of the data is missing, this method may fail.The same problem exists for other spatio-temporal interpolation methods (e.g., IDW, kriging, P-BSHADE, ST-IDW, ST-kriging, ST-HC).Furthermore, most existing methods assume a linear relationship between spatial and temporal data (e.g., ST-HC, ST-IDW, ST-kriging); however, the relationship between spatial and temporal data may be linear or nonlinear.To address these issues, in this study, we developed a new two-step method to reconstruct missing spatio-temporal data (ST-2SMR).or whole blocks of missing data [27] (Figure 1).The work in [10] formulates the spatio-temporal sensory data as a high-dimensional tensor, using the tensor completion method to recover missing values.However, when a whole temporal or spatial dimension of the data is missing, this method may fail.The same problem exists for other spatio-temporal interpolation methods (e.g., IDW, kriging, P-BSHADE, ST-IDW, ST-kriging, ST-HC).Furthermore, most existing methods assume a linear relationship between spatial and temporal data (e.g., ST-HC, ST-IDW, ST-kriging); however, the relationship between spatial and temporal data may be linear or nonlinear.To address these issues, in this study, we developed a new two-step method to reconstruct missing spatio-temporal data (ST-2SMR).

Problem Definitions
Definition 1. Suppose that ( , ) represents a spatio-temporal missing dataset, where S and T are the spatial and temporal dimension, respectively; = , , ⋯ , , = , , ⋯ , , m is the total number of spatial objects and n is the total number of timestamps.An entry = ( , ) refers to the value of the -th spatial object at the -th timestamp (1 ≤ ≤ , 1 ≤ ≤ ).The definition of the size of the sliding window is exists, missing at the j-th timestamp.Similarly, if there exist { ≠ ∅|∀1 ≤ ≤ }, then is a complete spatial sequence.If there exist { ≠ ∅|∃1 ≤ ≤ }, it means that exists, missing at the -th spatial object.
Definition 2. Suppose that = ( , ) is the estimated value of = ( , ), then the problem can be defined as: where MSE is the minimum mean square error and E is the statistical expectation.Meet ( ) = to ensure that the process of interpolation results in an unbiased estimate.

Method Framework
The method developed in this study (ST-2SMR) takes into account the problems of existing missing data interpolation methods and was constructed using 5 main steps (Figure 2): the partition of dataset into training and testing parts, coarse-grained interpolation, fine-grained interpolation, integration of temporal and spatial interpolation results and model performance evaluation.

Problem Definitions
Definition 1. Suppose that y(S, T) represents a spatio-temporal missing dataset, where S and T are the spatial and temporal dimension, respectively; S = {s 1, s 2 , • • • , s m }, T = {t 1, t 2 , • • • , t n }, m is the total number of spatial objects and n is the total number of timestamps.An entry V ij = y s i , t j refers to the value of the i-th spatial object at the j-th timestamp (1 2 ) ≤ j ≤ (j + w−1 2 )} , then s i exists, missing at the j-th timestamp.Similarly, if there exist {V ij = ∅ ∀1 ≤ i ≤ m} , then t j is a complete spatial sequence.If there exist {V ij = ∅ ∃1 ≤ i ≤ m}, it means that t j exists, missing at the i-th spatial object.Definition 2. Suppose that ẑi = ŷ(s i , t i ) is the estimated value of z i = y(s i , t i ), then the problem can be defined as: where MSE is the minimum mean square error and E is the statistical expectation.Meet E( ẑi ) = z i to ensure that the process of interpolation results in an unbiased estimate.

Method Framework
The method developed in this study (ST-2SMR) takes into account the problems of existing missing data interpolation methods and was constructed using 5 main steps (Figure 2): the partition of dataset into training and testing parts, coarse-grained interpolation, fine-grained interpolation, integration of temporal and spatial interpolation results and model performance evaluation.

1.
First, because the ST-2SMR method uses a neural network model to combine the interpolation results in spatial and temporal dimensions, the space-time missing dataset was divided, with 80% selected for parameter training and 20% used as a test dataset to evaluate model performance, to avoid overfitting and to improve generalization ability.

2.
Second, to avoid the influence of continuous missing data on fine-grained interpolation, coarse-grained interpolation for missing datasets was applied.

3.
Based on the results of coarse-grained interpolation, spatial and temporal heterogeneity was used to perform fine-grained interpolation.During this process, it was necessary to calculate the correlation coefficient and covariance between time and space sequences to fit the parameters.
If whole time series or spatial sequences are involved in interpolation, redundant sample data increase computational complexity; therefore, to improve the accuracy and speed of interpolation, a suitable sliding window was introduced to ensure the strongest correlation between sample data and missing data.Next, a heterogeneous covariance function was constructed for the space-time dimension.The best unbiased estimate of missing data can then be obtained by maximizing the objective function.4.
After interpolation of temporal and spatial dimensions, estimated values without missing data were chosen as training samples for the neural network, which is used to mine nonlinear relationships in spatio-temporal data.Estimated values of missing data were then obtained by inputting the spatio-temporal interpolation results of missing data into the trained neural network model.

5.
Finally, the performance of the model was evaluated using the test dataset.

Coarse-Grained Interpolation
Using existing spatial and temporal interpolation methods, it is difficult to obtain accurate interpolation estimates where there is a lack of sample points around missing points (Figure 1b,c).In some cases, it may even be impossible to obtain interpolation estimates (Figure 1d).We overcame this issue by introducing coarse-grained interpolation before fine-grained interpolation in order to eliminate the influence of continuous missing data.Using this approach, the ST-2SMR method is able to locate missing patterns in any combination, which ensures that the method is suitable for any serious loss situation.
In the spatial dimension, we used a classical multivariate statistical model (IDW) to interpolate missing data.IDW adopts the observed data of adjacent space points to estimate unknown data [11,28].When the distance of the adjacent space points is closer to the point of interpolation, the spatial contribution value is larger.Interpolation estimates for missing data v_spatial 0 are calculated as follows: where d i denotes the distance between interpolation points and observation points and α is the decay weight rate, where a larger α denotes a faster decay by the distance.
In the time dimension, an exponential moving average model (SES) was used to estimate missing data [18].SES assumes that there is strong temporal correlation between data, and the closer the sampling point is to the missing point, the bigger the weight that it is given.Traditional SES uses only the sampling data that are before the interpolation time point; however, when the time span is large, this can lead to too many irrelevant data involved in the calculation, which reduces interpolation accuracy.Here, we extended it to set a sliding window wc, which takes only before the wc time slice of the missing data and the last wc time slice of the missing data as the sampling point for the interpolation calculation.The model can be expressed as: where v_temporal 0 is the estimated value for missing data, th j is a time interval between the sampling data and the missing data and β is a smoothing parameter with a range of (0, 1).Assuming that V 4,6 are the data to be interpolated, we first selected all no missing data at t 6 , using IDW interpolation estimation (Figure 3).In the temporal dimension, we set the sliding window to be wc = 4 and selected {V 4,2 , V 4,3 , V 4,5 , V 4,7 , V 4,8 , V 4,10 } as the sampling data for interpolation.If both the IDW and SES methods can obtain an estimated value, the mean of the two is taken as the interpolation result of V 4,6 .If SES or IDW fail to get an interpolation result, the other estimated value is taken as the estimated value.If both SES and IDW fail to obtain an interpolation result, then V 4,6 is still missing, and the fine-grained interpolation algorithm is needed to estimate its value.The pseudocode of this process is shown in Algorithm 1.

Input: Original Missing Matrix V m×n
Temporal Threshold wc Parameter of IDW α Parameter of SES β Output: Coarse-Grained Imputing Matrix C_M m×n C_M m×n ← Initialization(V m×n )

Sliding Window
Before fine-grained interpolation, the ST-2SMR model needs to set up a dynamic sliding window to determine sample data.Owing to the strong temporal dependence of spatial and temporal data, selecting different numbers of data for interpolation estimation can lead to different results.If the window is set too small, sample data cannot fully reflect spatio-temporal relationships; if the window is too large (i.e., too many data are used), significant redundant data are introduced to the calculation, and the computational complexity increases.
Considering the fact that spatio-temporal data from a short period of time remain within the approximate correlation, take the average correlation of the missing data sequence and their adjacent spatial sequences to determine the optimal sample data using the expressions: where n is the timestamp for missing data, j are the first j moments of missing data, i are the last i moments of missing data, Corr t n , t j is the correlation coefficient between the spatial sequence of missing data and the first j spatial sequences, Corr(t n , t i ) is the correlation coefficient between the spatial sequence of missing data and the last i spatial sequences, w_begin is the beginning of the window and w_end is the end of the window.w_begin and w_end are determined heuristically and are initially set to n − 1 and n + 1. Corr t n , t j and Corr(t n , t i ) are first calculated, then w_begin moves forward, and w_end moves backwards until the mean correlation coefficient reaches the maximum.Assuming that V 4,6 is the missing datum to be interpolated, t 6 is the spatial sequence of missing data, and t 2 ~t10 is the sliding window (Figure 4).The pseudocode of this process is shown in Algorithm 2.

Input: Missing Spatial Series t n Output: Start of Window w_begin
End of Window w_end  Algorithm 2: Selected optimal window (SOM).

Fine-Grained Spatial Dimension Interpolation
In the spatial dimension, we first selected a sliding window based on the optimal window selection algorithm (SOM), whose size is ws.Assuming that V 4,6 is the missing data to be interpolated, the start and end positions of the selected window are centered on the V 4,6 in the first 4 columns and the last 4 columns of Figure 5.In this window, we chose the ms time series with the largest correlation of missing data.In detail, we adopted a pair-wise approach for calculating the correlation between the time series of missing data and its spatial neighbor sampling data, and then, ms spatial sampling data were selected to calculate the estimated value using the expression: ŷ0 = ms ∑ i=1 w i y i (7) where y i denotes the i-th spatial neighbor sampling data of missing data and w i denotes the corresponding contribution weight of y i .As shown in Figure 5, suppose ms = 3, if {s 2 , s 6 , s 8 } is the most relevant time series with missing data, then we take {V 2,6 , V 6,6 , V 8,6 } as the sampled data for interpolation.In order to ensure ŷ0 is an unbiased estimator of missing data, the following conditions must be satisfied: where E(.) represents the statistical expectation.Considering the heterogeneity of the spatial dimension, we introduced the parameter b i to calculate the ratio between the statistical expectation of ŷ0 and y 0 to characterize spatial heterogeneity.
Combining Formulae ( 7)-( 9), the constraint condition of w i is as follows: In order to obtain the parameter w i , the objective function was constructed to minimize the variance between the missing and true data.
Among them, Formula (11) can be resolved as follows: where C represents the covariance between different spatial points.Considering Formula (10), Formulae ( 11) and ( 12) can be written as: Then, the problem of solving the parameter w i was transformed into a standard Lagrange constrained optimization problem, and Formula (13) was re-written as: where µ is a Lagrange multiplier.The partial derivatives of σ 2 ŷ0 produces the equations: Formula ( 15) can be written in the matrix form: In order to obtain the parameter w i in Formula ( 16), we first calculated the covariance matrix C_S between the most relevant temporal series and the ratio of statistical expectation b i and covariance C i between the most relevant temporal series and the missing temporal series (Lines 6-9 of Algorithm 3).Then, we joined these values into a matrix like Formula ( 16) and solved this matrix to get parameter w i (Lines 10-12 of Algorithm 3).Finally, through Formula (7), we obtained estimated values of missing data.As shown in Figure 5, parameter w i of V 4,6 can be calculated by: (1) obtaining the covariance of {s 2 , s 6 , s 8 } and the covariance C(s 2 , s 4 ), C(s 6 , s 4 ) and C(s 8 , s 4 ); (2) calculating the ratio of statistical expectation b 1 = E(s 2 )/E(s 4 ), b 2 = E(s 6 )/E(s 4 ) and b 3 = E(s 8 )/E(s 4 ); (3)

Fine-Grained Temporal Dimension Interpolation
In the temporal dimension, we used the SOM algorithm to select an optimal window as the data matrix for fine-grained temporal dimension interpolation.In this sliding window, the nt sample data with the largest correlation of missing data were chosen.The estimated value of missing data t0 was calculated as follows: where t j denotes the j-th temporal neighbor sampling data of missing data and ϕ j denotes the corresponding contribution weight of t j .Similar to Formulae ( 12)-( 14), to ensure t0 is an unbiased estimator of missing data and to calculate the weight ϕ j , the following conditions must be satisfied: where v is a Lagrange multiplier, t 0 is the true value of missing data and a j is the ratio of statistical expectation between the most relevant spatial series and the spatial series of missing data.The partial derivatives of σ 2 t0 can be written as: Formula ( 19) can be written in the form of a matrix: As shown in Figure 6, in order to estimate the value of missing data V 4,6 , we first adopted the SOM algorithm to select t 2 − t 10 as the sliding window.In this window, we choose the nt spatial series with the largest correlation of missing data.For example, when nt = 4, the spatial series would be {t 2 , t 4 , t 8 , t 9 }, and we would take {V 4,2 , V 4,4 , V 4,8 , V 4,9 } as the sampled data for interpolation (Lines 3-5 of Algorithm 4).The covariance matrix C_T was calculated from the most relevant spatial series and the ratio of statistical expectation a j and covariance C j by using the most relevant spatial series and the missing spatial series (Lines 6-9 of Algorithm 4).Then, we joined these values into matrix Formula (20) and solved this matrix to get parameter ϕ j (Lines 10-12 of Algorithm 4).Finally, through Formula ( 17), interpolation result V 4,6 was calculated as ISPRS Int.J. Geo-Inf.2017, 6, 187 13 of 26 be { , , , } , and we would take { , , , , , , , } as the sampled data for interpolation (Lines 3-5 of Algorithm 4).The covariance matrix _ was calculated from the most relevant spatial series and the ratio of statistical expectation and covariance by using the most relevant spatial series and the missing spatial series (Lines 6-9 of Algorithm 4).Then, we joined these values into matrix Formula (20) and solved this matrix to get parameter (Lines 10-12 of Algorithm 4).Finally, through Formula ( 17), interpolation result , was calculated as × , + × , + × , + × , .

Spatio-Temporal Integration
After obtaining interpolation results for the time and space dimensions, the BP (back propagation) neural network was trained to integrate spatial and temporal interpolation results to obtain final missing data estimation values [5].The BP neural network can be regarded as a nonlinear function.When the number of input nodes is n, the output node is m, and the BP neural network expresses the mapping function from n independent variables to m dependent variables [29].
Appropriate samples are needed for training and testing neural networks.In this study, we first detect no missing data in the fine-grained temporal matrix F_T m×n , fine-grained spatial matrix F_S m×n and coarse-grained matrix C_M m×n to construct the samples (Lines 1-7 of Algorithm 5).Then, the samples were divided into three parts: 80% as a training dataset, 10% as a test dataset and 10% as a cross-validation dataset to control early stopping (Lines 8-10 of Algorithm 5).Next, the error back-propagation algorithm was used to train the neural network model (Figure 7).Assuming that the input variables of the neural network model are X = {F_S, F_T}, the connection weights between the input layer and the hidden layer are γ ij and the hidden layer bias value is bias1, then the output of the hidden layer is: where l is the number of hidden layer nodes and f is the activation function.Activation function f has many forms, from which we selected the sigmoid, which can be expressed as: After completing the neural network model, we were then able to predict the missing data.We first detected missing data in the coarse-grained matrix _ × .Then, the results of fine-grained spatial interpolation (FGSI) and fine-grained temporal interpolation (FGTI) algorithms were input to the neural network to calculate the estimated values (Lines 12-18 of Algorithm 5).Meanwhile, the results of FGSI and FGTI in the testing dataset were input to the neural network to evaluate the performance of the model (Lines 19-25 of Algorithm 5).The interpolation estimate of missing data ST was calculated as follows: where γ j1 is the connection weight between the hidden layer and the output layer and bias2 is the bias of the output layer.The weights and bias of Formula ( 23) can be calculated from: where η is the learning rate and e is the prediction error of the neural network (i.e., the difference between the predicted and expected outputs).The training process of the neural network was completed when the algorithm reached the set of training objectives (i.e., the number of iterations and the minimum error).
After completing the neural network model, we were then able to predict the missing data.We first detected missing data in the coarse-grained matrix C_M m×n .Then, the results of fine-grained spatial interpolation (FGSI) and fine-grained temporal interpolation (FGTI) algorithms were input to the neural network to calculate the estimated values (Lines 12-18 of Algorithm 5).Meanwhile, the results of FGSI and FGTI in the testing dataset were input to the neural network to evaluate the performance of the model (Lines 19-25 of Algorithm 5).

Datasets
We evaluated our model based on real air quality datasets from Beijing, which were collected between 5 January 2014 and 30 April 2015.Data included PM 2.5 , CO, SO 3 , O 3 , NO 2 and other attributes, each of which was collected at 36 air quality monitoring stations at hourly intervals, as depicted in Figure 8.The dataset contained a total of 8759 records [30] (Table 1).The different air quality variables have different degrees of missing data (Table 1).Among them, only PM2.5 contains a complete case.For the other attributes, we used combination analysis to explore the patterns of missing spatio-temporal missing data (Figure 9).Using the PM2.5 dataset as an example, we set = 48 as the time sliding window, which took 48 h of data to explore the missing pattern.The missing numbers of st19, st27, st35 and st36 at the same time were eight (i.e., the missing pattern in Figure 1c).Data for st19, st27, st36 were completely missing The different air quality variables have different degrees of missing data (Table 1).Among them, only PM 2.5 contains a complete case.For the other attributes, we used combination analysis to explore the patterns of missing spatio-temporal missing data (Figure 9).The different air quality variables have different degrees of missing data (Table 1).Among them, only PM2.5 contains a complete case.For the other attributes, we used combination analysis to explore the patterns of missing spatio-temporal missing data (Figure 9).Using the PM2.5 dataset as an example, we set = 48 as the time sliding window, which took 48 h of data to explore the missing pattern.The missing numbers of st19, st27, st35 and st36 at the same time were eight (i.e., the missing pattern in Figure 1c).Data for st19, st27, st36 were completely missing Using the PM 2.5 dataset as an example, we set wt = 48 as the time sliding window, which took 48 h of data to explore the missing pattern.The missing numbers of st 19 , st 27 , st 35 and st 36 at the same time were eight (i.e., the missing pattern in Figure 1c).Data for st 19 , st 27 , st 36 were completely missing (Figure 1d).Data for {st 18 , st 19 }, {st 35 , st 36 } showed random block loss (Figure 1b).Finally, a large number of patterns showed random missing data (Figure 1a).These patterns show that if the interpolation process was performed directly on the original dataset (i.e., without coarse-grained interpolation to eliminate the effect of successive missing data), it would be difficult to obtain accurate evaluation.

Evaluation Criteria
In order to evaluate the ST-2SMR method, we compared three existing methods (ST-kriging [31], P-BSHADE [16,32] and ST-HC [2,33]), each constrained in three different ways (i.e., increasing coarse-grained interpolation, increasing the sliding window or both; Table 2).We adopted mean absolute error (MAE), mean relative error (MRE) and the ratio of construction (RC) as the evaluation criteria to verify the performances of the proposed method.

Experimental Results
The proposed method was implemented in MATLAB 2016b.The PM2.5 dataset was selected to validate the proposed method.Through the comparative analysis of different experiments, the relevant parameters were set as follows: α = 4, β = 0.85, wc = 14, ms = 10, nt = 10, η = 0.01.

Overall Results
Among the first group of experiments (i.e., those using unaltered spatio-temporal interpolation methods), the ST-HC and ST-2SMR have the same reconstruction ratio, but different accuracy.The ST-2SMR method had the highest accuracy, reflecting in the effect of the nonlinear combination on the integration of spatial and temporal results (Table 3).The ST-HC and ST-2SMR have the same reconstruction ratio because they adopt the same spatio-temporal interpolation algorithm; however, the reconstruction rate was the lowest, which is owed to the introduction of heterogeneity in the time dimension.When we calculate the correlation coefficient and covariance between the spatial sequence of the missing data and the time slice, these sequences may seriously be missing.If the data sequence in the calculation is completely missing or using the pare-wise method makes the data sequence be completely missing, which may lead to the covariance matrix still having missing data, therefore we cannot get the final estimates, and this results in lower rate reconstruction.
In the second group of experiments (i.e., those with the original algorithms, but with the sliding window increased), interpolation precision was improved because the increased sliding window ensured that the sample data have the strongest correlation with missing data.In the third group of experiments (i.e., algorithms with coarse-grained interpolation added before the original interpolation method; here, we use IDW + SES as the coarse-grained interpolation method), the interpolation accuracy was also improved, and the complete reconstruction results were obtained because of the influence of continuous missing data has been eliminated.This result also validates the point of view in [34]: the accuracy and reliability of spatio-temporal interpolation methods depend on the pattern of missing data.However, interpolation accuracy improved the most when both constraints were applied at the same time, as demonstrated by the significant improvements in MAE and MSE (Table 3).

Effect of Coarse-Grained Interpolation
Through coarse-grained interpolation, continuous missing data were eliminated, significantly improving accuracy.Our experiments demonstrated that, regardless of the coarse-grained interpolation method chosen, interpolation accuracy improves when coarse-grained interpolation is first used (Table 4; here, we compare with the accuracy of those using unaltered spatio-temporal interpolation methods in Table 3).Among the methods tested, ST-2SMR showed the most significant improvement in accuracy, reflecting the nonlinear integration of spatial and temporal interpolation results, an approach that is particularly suitable for describing complex relationships between spatial and temporal data.However, overall, the accuracy of the IDW + SES method was found to be the best, so this method was chosen for coarse-grained interpolation in subsequent experiments.

Effect of the Coarse-Grained Missing Data Rate
According to Algorithm 1, the results of coarse-grained interpolation are mainly affected by the decay rate of weight α, smoothing parameter β and by time threshold wc.According to the experimental results of [27], performed using the same PM2.5 dataset, IDW and SES achieve a minimum MAE value when α = 4 and β = 0.85.For the time threshold, different values exert a significant influence on the coarse-grained interpolation results.In this study, we determined wc heuristically, with the value initially set to one (i.e., the center of missing data, taking the first hour and last hour as the sample data).With the increase in wc, the reconstruction rate of the missing data increased until complete reconstruction was achieved (Figure 10).We found that MAE was smallest when wc = 1 and stable when wc > 3.At wc > 3, the contribution weight of the sample data for missing data was nearly equal to zero, resulting in a small effect on the interpolation results.Furthermore, the termination condition of wc was set to ensure no continuous deletion of the whole space sequence and time series after coarse-grained interpolation, so as to obtain a complete interpolation result in the fine-grained interpolation.When wc < 14, the dataset from the coarse-grained interpolation still exhibited complete missing data in both the time and space sequences (Figure 11); therefore, fine-grained interpolation was required to achieve reconstruction.When wc = 14, coarse-grained interpolation eliminated the influence of successive missing data, so the reconstruction rate was 100%.ISPRS Int.J. Geo-Inf.2017, 6, 187 20 of 26 space sequence and time series after coarse-grained interpolation, so as to obtain a complete interpolation result in the fine-grained interpolation.When < 14, the dataset from the coarse-grained interpolation still exhibited complete missing data in both the time and space sequences (Figure 11); therefore, fine-grained interpolation was required to achieve reconstruction.When = 14, coarse-grained interpolation eliminated the influence of successive missing data, so the reconstruction rate was 100%.

Effect Sample Point Number
During fine-grained interpolation, we were required to select spatial neighbors and temporal neighbors for missing data.The number of sample points impacted the assessment results.Too few sample data points fail to reflect the correlation of spatial and temporal data, while excessive sample data increase computational complexity and also reduce the accuracy of assessment because of redundant data.According to the experimental results of [16], when the number of sampling points is set to between five and 15, the interpolation results are perfect; therefore, we set three sets of adjacent point selection patterns (5, 10 and 15) in both the spatial and temporal dimensions (i.e., a total of nine sets of experiments) and performed experiments to determine the most suitable number of samples.When = 10 and = 10, the ST-2SMR method achieved its best performance (Table 5); however, the results show that the number of neighbor points had no effect on the reconstruction rate.This result reflects the elimination of continuous missing data during space sequence and time series after coarse-grained interpolation, so as to obtain a complete interpolation result in the fine-grained interpolation.When < 14, the dataset from the coarse-grained interpolation still exhibited complete missing data in both the time and space sequences (Figure 11); therefore, fine-grained interpolation was required to achieve reconstruction.When = 14, coarse-grained interpolation eliminated the influence of successive missing data, so the reconstruction rate was 100%.

Effect Sample Point Number
During fine-grained interpolation, we were required to select spatial neighbors and temporal neighbors for missing data.The number of sample points impacted the assessment results.Too few sample data points fail to reflect the correlation of spatial and temporal data, while excessive sample data increase computational complexity and also reduce the accuracy of assessment because of redundant data.According to the experimental results of [16], when the number of sampling points is set to between five and 15, the interpolation results are perfect; therefore, we set three sets of adjacent point selection patterns (5, 10 and 15) in both the spatial and temporal dimensions (i.e., a total of nine sets of experiments) and performed experiments to determine the most suitable number of samples.When = 10 and = 10, the ST-2SMR method achieved its best performance (Table 5); however, the results show that the number of neighbor points had no effect on the reconstruction rate.This result reflects the elimination of continuous missing data during

Effect Sample Point Number
During fine-grained interpolation, we were required to select ms spatial neighbors and nt temporal neighbors for missing data.The number of sample points impacted the assessment results.Too few sample data points fail to reflect the correlation of spatial and temporal data, while excessive sample data increase computational complexity and also reduce the accuracy of assessment because of redundant data.According to the experimental results of [16], when the number of sampling points is set to between five and 15, the interpolation results are perfect; therefore, we set three sets of adjacent point selection patterns (5, 10 and 15) in both the spatial and temporal dimensions (i.e., a total of nine sets of experiments) and performed experiments to determine the most suitable number of samples.When ms = 10 and nt = 10, the ST-2SMR method achieved its best performance (Table 5); however, the results show that the number of neighbor points had no effect on the reconstruction rate.This result reflects the elimination of continuous missing data during coarse-grained interpolation, which meant that all missing data had access to observed data for interpolation, allowing the whole dataset to obtain corresponding estimates.Our experimental results show that the SOM algorithm can dynamically select the size of each window through the interaction information for time and space (Figure 12).As a result, interpolation accuracy was greatly improved.
ISPRS Int.J. Geo-Inf.2017, 6, 187 21 of 26 coarse-grained interpolation, which meant that all missing data had access to observed data for interpolation, allowing the whole dataset to obtain corresponding estimates.Our experimental results show that the SOM algorithm can dynamically select the size of each window through the interaction information for time and space (Figure 12).As a result, interpolation accuracy was greatly improved.

Performance of Two-and Three-Step Interpolation
In order to explore the convergence of the two-step interpolation, we further introduced a third-step interpolation (Table 6).The two-step interpolation used IDW + SES for coarse interpolation, while the three-step interpolation was based on the results of two-step interpolation using ST-kriging, P-BSHADE and ST-HC.The results demonstrate that three-step interpolation slightly improved accuracy, but the change was marginal.In addition, regardless of the method used for the third interpolation, the overall results tended to be stable; therefore, we concluded that the additional computational complexity of the three-step method was not justified by the minor improvement in performance.

Performance of Two-and Three-Step Interpolation
In order to explore the convergence of the two-step interpolation, we further introduced a third-step interpolation (Table 6).The two-step interpolation used IDW + SES for coarse interpolation, while the three-step interpolation was based on the results of two-step interpolation using ST-kriging, P-BSHADE and ST-HC.The results demonstrate that three-step interpolation slightly improved accuracy, but the change was marginal.In addition, regardless of the method used for the third interpolation, the overall results tended to be stable; therefore, we concluded that the additional computational complexity of the three-step method was not justified by the minor improvement in performance.To verify the universality of the proposed method, the approach was tested using the NO 2 , CO, SO 3 and O 3 datasets (Figure 13).The results confirmed that the proposed method is superior to the other three methods in terms of accuracy.We found that only our new method can guarantee a complete reconstruction result and is able to maintain consistent stability across different datasets.For example, the P-BSHADE method performed better on the SO 3 dataset, but worse for other datasets.The ST-HC method performed better on the NO 2 dataset, but worse on the other datasets.This variable performance reflects the fact that different datasets have completely different missing data patterns, from which existing methods directly interpolate results (i.e., they do not eliminate the influence of missing patterns before interpolation).To verify the universality of the proposed method, the approach was tested using the NO2, CO, SO3 and O3 datasets (Figure 13).The results confirmed that the proposed method is superior to the other three methods in terms of accuracy.We found that only our new method can guarantee a complete reconstruction result and is able to maintain consistent stability across different datasets.
For example, the P-BSHADE method performed better on the SO3 dataset, but worse for other datasets.The ST-HC method performed better on the NO2 dataset, but worse on the other datasets.This variable performance reflects the fact that different datasets have completely different missing data patterns, from which existing methods directly interpolate results (i.e., they do not eliminate the influence of missing patterns before interpolation).

Evaluation of Computational Efficiency
The computational efficiency is also an important factor worth evaluation for missing data reconstruction.We conducted a comparison of the computational efficiency on a 3.4-GHz Intel i7 CPU, with a 64-bit operating system, and 16.0 GM RAM.The CPU time costs of each interpolation method in the forecasting stages (i.e., we select 10% of samples as a test dataset) are shown in Figure 14.The computational efficiency of all of the four methods has no distinct difference.Obviously, ST-kriging is the fastest one among the interpolation methods because it does not take into account the effects of spatial and temporal heterogeneity on interpolation results.ST-HC and ST-2SMR consume only a little more time than ST-kriging and P-BSHADE because they are the extensions of P-BSHADE and consider the temporal and spatial heterogeneity.In addition, the linear combination of ST-HC to integrate spatio-temporal interpolation results almost spends the same time as nonlinear ways using a trained neural network, so the time complexity of ST-HC and ST-2SMR is nearly the same.However, the ST-2SMR method makes a significant trade-off between the efficiency and accuracy because the MSEs resulting from other methods are far larger than that of ST-2SMR (see Table 3).Therefore, when both efficiency and accuracy are considered, the proposed ST-2SMR outperforms the other methods.

Evaluation of Computational Efficiency
The computational efficiency is also an important factor worth evaluation for missing data reconstruction.We conducted a comparison of the computational efficiency on a 3.4-GHz Intel i7 CPU, with a 64-bit operating system, and 16.0 GM RAM.The CPU time costs of each interpolation method in the forecasting stages (i.e., we select 10% of samples as a test dataset) are shown in Figure 14.The computational efficiency of all of the four methods has no distinct difference.Obviously, ST-kriging is the fastest one among the interpolation methods because it does not take into account the effects of spatial and temporal heterogeneity on interpolation results.ST-HC and ST-2SMR consume only a little more time than ST-kriging and P-BSHADE because they are the extensions of P-BSHADE and consider the temporal and spatial heterogeneity.In addition, the linear combination of ST-HC to integrate spatio-temporal interpolation results almost spends the same time as nonlinear ways using a trained neural network, so the time complexity of ST-HC and ST-2SMR is nearly the same.However, the ST-2SMR method makes a significant trade-off between the efficiency and accuracy because the MSEs resulting from other methods are far larger than that of ST-2SMR (see Table 3).Therefore, when both efficiency and accuracy are considered, the proposed ST-2SMR outperforms the other methods.

Summary
Given the problem with existing missing data interpolation methods, this study developed a novel method called ST-2SMR.In the ST-2SMR method, missing data patterns in spatio-temporal datasets are first identified, and information on temporal and spatial dimensions is integrated to obtain a partial reconstruction.Using the output of this partial reconstruction, spatial and temporal heterogeneity is taken into account and a sliding window is set to both remove redundant sample data (i.e., to reduce computational complexity) and to ensure that the strongest correlation with missing data is selected (i.e., the most suitable data are chosen to improve the accuracy of the analysis).Finally, spatio-temporal interpolation results are integrated through a neural network model.We evaluated ST-2SMR using a real and open air quality dataset collected from Beijing.It is argued that the proposed method performs better than other existing methods.Providing the characteristics of the black box neural network models, the best way to integrate the results of spatio-temporal interpolation and to depict the nonlinear relationships between space and time requires further consideration.

Summary
Given the problem with existing missing data interpolation methods, this study developed a novel method called ST-2SMR.In the ST-2SMR method, missing data patterns in spatio-temporal datasets are first identified, and information on temporal and spatial dimensions is integrated to obtain a partial reconstruction.Using the output of this partial reconstruction, spatial and temporal heterogeneity is taken into account and a sliding window is set to both remove redundant sample data (i.e., to reduce computational complexity) and to ensure that the strongest correlation with missing data is selected (i.e., the most suitable data are chosen to improve the accuracy of the analysis).Finally, spatio-temporal interpolation results are integrated through a neural network model.We evaluated ST-2SMR using a real and open air quality dataset collected from Beijing.It is argued that the proposed method performs better than other existing methods.Providing the characteristics of the black box neural network models, the best way to integrate the results of spatio-temporal interpolation and to depict the nonlinear relationships between space and time requires further consideration.

Figure 1 .
Figure 1.Patterns of missing spatio-temporal data.Black squares represent missing data.

Figure 1 .
Figure 1.Patterns of missing spatio-temporal data.Black squares represent missing data.

Figure 2 .
Figure 2. Framework of model development.

Figure 2 .
Figure 2. Framework of model development.

Figure 5 .
Figure 5. Fine-grained interpolation in the spatial dimension.

Figure 6 .
Figure 6.Fine-grained interpolation in the temporal dimension.

Figure 9 .
Figure 9. Pattern of missing spatio-temporal PM2.5 data.Red squares represent missing data, and values on the right ordinate are the number of missing data combinations.

Figure 9 .
Figure 9. Pattern of missing spatio-temporal PM2.5 data.Red squares represent missing data, and values on the right ordinate are the number of missing data combinations.

Figure 9 .
Figure 9. Pattern of missing spatio-temporal PM 2.5 data.Red squares represent missing data, and values on the right ordinate are the number of missing data combinations.

Figure 10 .
Figure 10.Impact of the temporal threshold on coarse-grained interpolation.

Figure 11 .
Figure 11.Impact of the temporal threshold on ST-2SMR.

Figure 10 .
Figure 10.Impact of the temporal threshold on coarse-grained interpolation.

Figure 10 .
Figure 10.Impact of the temporal threshold on coarse-grained interpolation.

Figure 11 .
Figure 11.Impact of the temporal threshold on ST-2SMR.

Figure 11 .
Figure 11.Impact of the temporal threshold on ST-2SMR.

Figure 12 .
Figure 12.Influence of the sliding window on the interpolation results.The static window refers to the selection of fixed size sliding window (i.e., the center of missing data, taking the first 24 h and last 24 h as the sample data for interpolation).

Figure 12 .
Figure 12.Influence of the sliding window on the interpolation results.The static window refers to the selection of fixed size sliding window (i.e., the center of missing data, taking the first 24 h and last 24 h as the sample data for interpolation).

Figure 13 .
Figure 13.Performance of the ST-2SMR using different spatio-temporal datasets.(a) The result of the experiment in NO 2 datasets; (b) The result of the experiment in CO datasets; (c) The result of the experiment in SO 3 datasets; (d) The result of the experiment in O 3 datasets .

Figure 14 .
Figure 14.Comparison of the computational time costs for different interpolation methods.

Figure 14 .
Figure 14.Comparison of the computational time costs for different interpolation methods.

Algorithm 5 :
Combining spatial and temporal.

Table 2 .
Combination of different methods.P-BSHADE, point estimation model of biased hospital-based area disease estimation; ST, spatio-temporal; HC, heterogeneous covariance.

Table 3 .
Performance comparison of different methods.RC, ratio of construction; ST-2SMR, spatio-temporal missing data reconstruction.

Table 4 .
Performance of different coarse-grained interpolation methods 1 .

Table 5 .
Influence sample point number on interpolation results.

Table 5 .
Influence sample point number on interpolation results.

Table 6 .
Two-and three-step interpolation results.

Table 6 .
Two-and three-step interpolation results.