A Two-Step Method for Missing Spatio-Temporal Data Reconstruction

Shifen Cheng; Feng Lu

doi:10.3390/ijgi6070187

¹ State Key Lab of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China

² Fujian Collaborative Innovation Center for Big Data Applications in Governments, Fuzhou 350003, China

³ Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China

* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2017, 6(7), 187; https://doi.org/10.3390/ijgi6070187


Abstract

Missing data reconstruction is a critical step in the analysis and mining of spatio-temporal data; however, few studies comprehensively consider missing data patterns, sample selection and spatio-temporal relationships. As a result, traditional methods often either fail to achieve satisfactory accuracy or incur high computational complexity. To address these problems, this study developed an effective two-step method for spatio-temporal missing data reconstruction (ST-2SMR). The approach includes a coarse-grained interpolation step that accounts for missing patterns and thereby eliminates the influence of continuous missing data on the overall results. Based on the coarse-grained results, a dynamic sliding window selection algorithm was implemented to determine the most relevant sample data for fine-grained interpolation, considering both spatial and temporal heterogeneity. Finally, the spatial and temporal interpolation results were integrated using a neural network model. We validated our approach using Beijing air quality data and found that the proposed method outperforms existing solutions in terms of estimation accuracy and reconstruction rate.

Keywords:

spatio-temporal interpolation; spatio-temporal heterogeneity; dynamic sliding window; neural network

1. Introduction

Following the rapid development and popularization of geographic information technologies and improvements in data collection, data with temporal and spatial attributes are accumulating quickly and forming large spatio-temporal datasets [1]. However, missing data are extremely common in such datasets, for example, missing air quality sensor readings, missing floating car trajectory points or missing mobile phone signaling records. If these gaps cannot be filled accurately, subsequent analysis and modeling can produce inaccurate results and unreliable inferences [2]. Simply deleting records that contain missing data would discard a significant amount of original information and waste data resources [3]; therefore, methods that interpolate missing data accurately and efficiently are urgently needed.

In past decades, a large number of interpolation methods has been proposed to solve the problem of spatio-temporal missing data [4,5,6,7,8,9,10]. These methods can be roughly divided into three categories: spatial interpolation, temporal interpolation and spatio-temporal interpolation.

Spatial interpolation methods mainly use the spatial correlation among data to interpolate missing values. Traditional methods such as inverse distance weighting (IDW) simply assume that the data obey the first law of geography, namely, that the closer two locations are in space, the greater the contribution of one to the interpolation of the other. During interpolation, each site is assumed to be independent of the others, and weights are calculated from the distances between the missing data point and the surrounding sites [11,12]. Most approaches, however, use kriging, a linear regression method based on minimizing the mean square error that, unlike IDW, does not treat each site independently. Kriging assumes a constant sample mean and variance (second-order stationarity) and assumes that the covariance between any two spatio-temporal points depends only on their separation distance (i.e., the absolute positions in time and space are irrelevant) [13,14]. However, because of spatial and temporal heterogeneity, the data distribution can show uneven characteristics and relationships across regions [15]; therefore, the accuracy of existing methods remains unsatisfactory when data are not homogeneously distributed. To solve this problem, the authors of [16] considered spatial autocorrelation and heterogeneity in the study area and proposed a point estimation model of biased hospital-based area disease estimation (P-BSHADE). P-BSHADE calculates the covariances and correlation coefficients of historical observational data and uses the ratios of expectations between surrounding stations and interpolation sites to obtain an optimal linear unbiased estimator. However, when data are missing continuously, the missing data matrix can become singular, which results in large interpolation errors. In addition, this method does not consider heterogeneity in the time dimension [2].

Time series prediction methods typically use the historical data of a given location to build a framework for predicting missing values at the same site. The autoregressive integrated moving average (ARIMA) model [17] and simple exponential smoothing (SES) [18] are two representative examples of this approach. However, this approach has two major problems. First, many prediction models do not fully exploit the essential characteristics of spatio-temporal data, which can degrade performance; second, if a long consecutive run of values is missing, prediction methods often cannot reconstruct it completely [19].

Because single-dimension interpolation methods consider only the spatial or the temporal dimension, achieving satisfactory interpolation results with them is challenging. In recent years, a number of studies have extended single-dimension methods to consider both space and time, for example, spatio-temporal probabilistic principal component regression (ST-PCR), spatio-temporal IDW (ST-IDW), spatio-temporal kriging (ST-kriging) and the spatio-temporal heterogeneous covariance method (ST-HC) [2,3,7,9,10,20,21,22]. ST-PCR [9] is a statistical learning-based method that exploits the statistical features of the observed data; however, it often requires strong assumptions about the data. ST-IDW [23] defines a three-dimensional space-time distance and then applies IDW to estimate missing values; however, because it inherits the problems of IDW, its applicability remains limited and it cannot achieve unbiased estimation. The ST-kriging method [14] estimates the interpolation function using a spatio-temporal covariance function, but it does not take into account the effects of spatial and temporal heterogeneity on the interpolation results. To address this issue, the ST-HC method, an extension of P-BSHADE, estimates missing data by considering both temporal and spatial heterogeneity. The dataset is first partitioned into homogeneous spatial regions; then, for each partition containing missing data, the most correlated spatial and temporal sampling series are selected. Following the P-BSHADE algorithm, spatial and temporal contribution weights are calculated to obtain the best linear unbiased estimates in the spatial and temporal dimensions. Finally, the spatial and temporal estimates are combined, with weights determined by correlation coefficients, to obtain the overall estimate of the missing data [2].

However, this method requires the whole dataset to participate in the computation, which leads to high computational complexity and a large volume of redundant data. For example, for a dataset with n spatial sequences, n² covariances and correlation coefficients must be calculated for each missing data point, which becomes costly when the time span of the dataset is large. In addition, when data are missing continuously, interpolation accuracy is low, and it may even be impossible to obtain a final interpolation result.

Missing data reconstruction methods are further challenged by the variation in patterns of missing data [24,25]; for example, missing data completely at random, non-random missing data [26], or whole blocks of missing data [27] (Figure 1). The work in [10] formulates the spatio-temporal sensory data as a high-dimensional tensor, using the tensor completion method to recover missing values. However, when a whole temporal or spatial dimension of the data is missing, this method may fail. The same problem exists for other spatio-temporal interpolation methods (e.g., IDW, kriging, P-BSHADE, ST-IDW, ST-kriging, ST-HC). Furthermore, most existing methods assume a linear relationship between spatial and temporal data (e.g., ST-HC, ST-IDW, ST-kriging); however, the relationship between spatial and temporal data may be linear or nonlinear. To address these issues, in this study, we developed a new two-step method to reconstruct missing spatio-temporal data (ST-2SMR).

Figure 1. Patterns of missing spatio-temporal data. Black squares represent missing data.

2. Materials and Methods

2.1. Problem Definitions

Definition 1.

Suppose that $y(S, T)$ represents a spatio-temporal dataset with missing values, where $S$ and $T$ are the spatial and temporal dimensions, respectively; $S = \{s_1, s_2, \dots, s_m\}$, $T = \{t_1, t_2, \dots, t_n\}$, $m$ is the total number of spatial objects and $n$ is the total number of timestamps. An entry $V_{ij} = y(s_i, t_j)$ refers to the value of the $i$-th spatial object at the $j$-th timestamp ($1 \leq i \leq m$, $1 \leq j \leq n$). The size of the sliding window is defined as $w$ ($1 \leq w \leq n$). If $V_{ik} \neq \emptyset$ for all $k$ with $j - \frac{w-1}{2} \leq k \leq j + \frac{w-1}{2}$, then $s_i$ is a complete temporal sequence within the window centered on $t_j$. If $V_{ik} = \emptyset$ for some $k$ in this range, then $s_i$ has missing data in the window around the $j$-th timestamp. Similarly, if $V_{ij} \neq \emptyset$ for all $1 \leq i \leq m$, then $t_j$ is a complete spatial sequence. If $V_{ij} = \emptyset$ for some $1 \leq i \leq m$, then $t_j$ has missing data at the $i$-th spatial object.
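The completeness conditions of Definition 1 can be checked directly on a data matrix. The following minimal Python sketch assumes that the dataset is stored as a NumPy array in which NaN plays the role of $\emptyset$, that indices are 0-based, that $w$ is odd, and that the window is truncated at the edges of the series; the function names are hypothetical and not part of ST-2SMR.

```python
import numpy as np

def is_complete_temporal(V, i, j, w):
    """True if object i has no missing value in the size-w window centered on timestamp j."""
    half = (w - 1) // 2
    lo, hi = max(0, j - half), min(V.shape[1], j + half + 1)  # truncate at the series edges
    return not np.isnan(V[i, lo:hi]).any()

def is_complete_spatial(V, j):
    """True if timestamp j has a value for every spatial object."""
    return not np.isnan(V[:, j]).any()

V = np.array([[1.0, 2.0, np.nan, 4.0, 5.0],
              [1.5, 2.5, 3.5, 4.5, 5.5],
              [0.5, np.nan, 2.0, 3.0, 4.0]])
print(is_complete_temporal(V, 1, 2, 3))   # True: row 1 has no gaps
print(is_complete_temporal(V, 0, 3, 3))   # False: V[0, 2] is missing
print(is_complete_spatial(V, 0))          # True
print(is_complete_spatial(V, 1))          # False: V[2, 1] is missing
```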

Definition 2.

Suppose that $\hat{z}_i = \hat{y}(s_i, t_i)$ is the estimated value of $z_i = y(s_i, t_i)$; then the problem can be defined as:

$$\min \mathrm{MSE} = E\left[(\hat{z}_i - z_i)^2\right] \quad \text{s.t.} \quad E(\hat{z}_i) = z_i \qquad (1)$$

where MSE is the mean square error and $E$ is the statistical expectation. The constraint $E(\hat{z}_i) = z_i$ ensures that the interpolation yields an unbiased estimate.

2.2. Method Framework

The method developed in this study (ST-2SMR) addresses the problems of existing missing data interpolation methods and consists of five main steps (Figure 2): partitioning the dataset into training and test parts, coarse-grained interpolation, fine-grained interpolation, integration of the temporal and spatial interpolation results, and model performance evaluation.

Figure 2. Framework of model development.

First, because ST-2SMR uses a neural network model to combine the spatial and temporal interpolation results, the spatio-temporal missing dataset was divided into two parts, with 80% used for parameter training and 20% used as a test dataset to evaluate model performance; this split helps to avoid overfitting and to improve generalization ability.
Second, to avoid the influence of continuous missing data on fine-grained interpolation, coarse-grained interpolation was applied to the missing dataset.
Third, based on the coarse-grained results, fine-grained interpolation was performed, taking spatial and temporal heterogeneity into account. This step requires the correlation coefficients and covariances between temporal and spatial sequences to fit the parameters. If whole time series or spatial sequences are involved in the interpolation, redundant sample data increase the computational complexity; therefore, to improve both the accuracy and the speed of interpolation, a suitable sliding window was introduced so that only the sample data most strongly correlated with the missing data are used. A heterogeneous covariance function was then constructed for the space and time dimensions, and the best linear unbiased estimate of the missing data was obtained by minimizing the estimation variance (the objective function).
Fourth, after interpolation in the temporal and spatial dimensions, the estimates at positions where no data were missing were chosen as training samples for a neural network, which is used to mine the nonlinear relationships in the spatio-temporal data. Estimates for the missing data were then obtained by feeding their spatial and temporal interpolation results into the trained neural network model.
Finally, the performance of the model was evaluated using the test dataset.
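The text does not state how the 20% test portion is drawn. One common approach, assumed in the sketch below, is to hide a random 20% of the observed entries so that the reconstructed values can later be compared with the hidden true values; `split_observed` is a hypothetical helper, not the authors' code.

```python
import numpy as np

def split_observed(V, test_frac=0.2, seed=0):
    """Hide a random fraction of the observed entries of V for testing.

    Returns (V_train, test_mask): V_train is a copy of V in which the test
    entries are set to NaN, and test_mask marks those hidden entries.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(~np.isnan(V))              # observed positions only
    n_test = int(round(test_frac * rows.size))
    pick = rng.choice(rows.size, size=n_test, replace=False)
    test_mask = np.zeros(V.shape, dtype=bool)
    test_mask[rows[pick], cols[pick]] = True
    V_train = V.copy()
    V_train[test_mask] = np.nan
    return V_train, test_mask

V = np.arange(20, dtype=float).reshape(4, 5)
V[0, 0] = np.nan                                        # one genuinely missing entry
V_train, test_mask = split_observed(V)
print(test_mask.sum(), np.isnan(V_train).sum())        # 4 5
```

Only observed entries can be hidden, because the true values of genuinely missing entries are unknown and could not be used for evaluation.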

3. Detailed Design of ST-2SMR

3.1. Coarse-Grained Interpolation

With existing spatial and temporal interpolation methods, it is difficult to obtain accurate estimates when there are few sample points around the missing points (Figure 1b,c), and in some cases, it may even be impossible to obtain an estimate at all (Figure 1d). We overcame this issue by introducing coarse-grained interpolation before fine-grained interpolation to eliminate the influence of continuous missing data. With this step, ST-2SMR can handle missing patterns in any combination, which makes it suitable even for severe data loss.

In the spatial dimension, we used a classical spatial interpolation method, IDW, to interpolate missing data. IDW estimates unknown values from the observed data at adjacent spatial points [11,28]: the closer an adjacent point is to the interpolation point, the larger its contribution. The interpolation estimate $\hat{v}_{spatial,0}$ of a missing value is calculated as follows:

$$\hat{v}_{spatial,0} = \sum_{i=1}^{m} \chi_i \, v_{spatial,i} \qquad (2)$$

$$\chi_i = \frac{d_i^{-\alpha}}{\sum_{k=1}^{m} d_k^{-\alpha}} \qquad (3)$$

where $d_i$ denotes the distance between the interpolation point and the $i$-th observation point, and $\alpha$ is the decay rate of the weights: a larger $\alpha$ makes the weights decay faster with distance.
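Formulae (2) and (3) translate into a few lines of NumPy. In this sketch, the distances are assumed to be precomputed and strictly positive, NaN marks an unobserved neighbor, and NaN is returned when no neighbor is observed, mirroring the case in Algorithm 1 where IDW fails to produce an estimate; `idw` is a hypothetical name.

```python
import numpy as np

def idw(values, distances, alpha=2.0):
    """Inverse distance weighting estimate, following Formulae (2) and (3).

    values    : values at the neighboring sites (NaN = not observed)
    distances : distances d_i from the target point to those sites (all > 0)
    alpha     : decay exponent; a larger alpha gives faster decay with distance
    """
    values = np.asarray(values, dtype=float)
    d = np.asarray(distances, dtype=float)
    ok = ~np.isnan(values)                     # use observed neighbors only
    if not ok.any():
        return np.nan                          # no sample available: still missing
    weights = d[ok] ** -alpha
    weights /= weights.sum()                   # chi_i, Formula (3)
    return float(np.dot(weights, values[ok]))  # Formula (2)

print(idw([10.0, 20.0], [1.0, 1.0]))           # 15.0
print(idw([10.0, 20.0, np.nan], [1.0, 2.0, 0.5], alpha=1.0))
```

In the second call the third neighbor is ignored because it is unobserved, and the nearer of the remaining two receives twice the weight of the farther one.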

In the time dimension, simple exponential smoothing (SES), an exponentially weighted moving average model, was used to estimate missing data [18]. SES assumes strong temporal correlation in the data: the closer a sampling point is in time to the missing point, the larger its weight. Traditional SES uses only the sampling data before the interpolation time point; however, when the time span is large, too many irrelevant data are involved in the calculation, which reduces interpolation accuracy. Here, we extended SES with a sliding window $wc$: only the $wc$ time slices before and the $wc$ time slices after the missing value are used as sampling points for the interpolation. The model can be expressed as:

$$\hat{v}_{temporal,0} = \frac{\sum_{j=1}^{wc} v_{temporal,j} \, \beta (1-\beta)^{th_j - 1}}{\sum_{j=1}^{wc} \beta (1-\beta)^{th_j - 1}} \qquad (4)$$

where $\hat{v}_{temporal,0}$ is the estimated value of the missing data, $th_j$ is the time interval between the $j$-th sampling point and the missing data point, and $\beta \in (0, 1)$ is the smoothing parameter.
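A sketch of the windowed SES estimate of Formula (4) for a single series follows. Following the text, it uses the observed values within $wc$ time slices before and after the target; it assumes that $th_j$ is the absolute gap in time slices and that missing values are stored as NaN. The function name and the default $\beta = 0.5$ are illustrative assumptions.

```python
import numpy as np

def windowed_ses(series, t0, wc, beta=0.5):
    """Estimate series[t0] with exponentially decaying weights, as in Formula (4).

    Only observed values within wc time slices on either side of t0 are used;
    the weight of a sample at gap th = |t - t0| is beta * (1 - beta) ** (th - 1).
    """
    lo, hi = max(0, t0 - wc), min(len(series), t0 + wc + 1)
    num = den = 0.0
    for t in range(lo, hi):
        v = series[t]
        if t == t0 or np.isnan(v):
            continue                               # skip the target and other gaps
        weight = beta * (1 - beta) ** (abs(t - t0) - 1)
        num += weight * v
        den += weight
    return num / den if den > 0 else np.nan        # NaN: no sample in the window

print(windowed_ses(np.array([4.0, np.nan, np.nan, 10.0]), t0=1, wc=2))  # 6.0
```

In the example, the neighbor one step away receives weight 0.5 and the neighbor two steps away receives 0.25, so the estimate is pulled toward the nearer value.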

Suppose that $V_{4,6}$ is the value to be interpolated (Figure 3). In the spatial dimension, we first selected all non-missing data at $t_6$ for IDW estimation. In the temporal dimension, we set the sliding window to $wc = 4$ and selected $\{V_{4,2}, V_{4,3}, V_{4,5}, V_{4,7}, V_{4,8}, V_{4,10}\}$ as the sampling data for interpolation. If both the IDW and SES methods produce an estimate, their mean is taken as the interpolation result for $V_{4,6}$. If only one of them produces an estimate, that value is used. If neither IDW nor SES produces an estimate, $V_{4,6}$ remains missing, and its value must be estimated by the fine-grained interpolation algorithm. The pseudocode of this process is shown in Algorithm 1.

Algorithm 1: Coarse-grained interpolation.

Input: original missing matrix $V_{m \times n}$; temporal threshold $wc$; IDW parameter $\alpha$; SES parameter $\beta$
Output: coarse-grained imputed matrix $C\_M_{m \times n}$

1 $C\_M_{m \times n} \leftarrow \mathrm{Initialization}(V_{m \times n})$
2 For $i = 1$ to $m$
3   For $j = 1$ to $n$
4     If $V_{ij}$ is a missing value then
5       $v_{c\_spatial} \leftarrow 0$
6       $v_{c\_temporal} \leftarrow 0$
7       $v_{c\_spatial} \leftarrow \mathrm{IDW}(V_{ij}, \alpha)$
8       $v_{c\_temporal} \leftarrow \mathrm{SES}(V_{ij}, \beta, wc)$
9       If neither $v_{c\_spatial}$ nor $v_{c\_temporal}$ is a missing value then
10        $C\_M_{ij} \leftarrow (v_{c\_spatial} + v_{c\_temporal}) / 2$
11      Else if $v_{c\_spatial}$ is not a missing value then
12        $C\_M_{ij} \leftarrow v_{c\_spatial}$
13      Else if $v_{c\_temporal}$ is not a missing value then
14        $C\_M_{ij} \leftarrow v_{c\_temporal}$
15      Else
16        $C\_M_{ij} \leftarrow \emptyset$
17  End for
18 End for

Figure 3. Coarse-grained interpolation method.
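The fallback logic of Algorithm 1 can be sketched as follows. To keep the example short, the spatial and temporal estimators are passed in as functions; in ST-2SMR they would be IDW and the windowed SES, whereas the toy estimators below (plain means of the other stations or of the other timestamps) are hypothetical stand-ins used only to exercise every branch. Both estimators read the original matrix `V`, as in Lines 7 and 8 of Algorithm 1.

```python
import numpy as np

def coarse_grained(V, spatial_est, temporal_est):
    """Sketch of Algorithm 1: fill each missing V[i, j] from a spatial and a temporal estimate.

    spatial_est(V, i, j) and temporal_est(V, i, j) are caller-supplied functions
    that return NaN when they cannot produce an estimate.
    """
    CM = V.copy()
    for i, j in zip(*np.nonzero(np.isnan(V))):
        vs, vt = spatial_est(V, i, j), temporal_est(V, i, j)
        if not np.isnan(vs) and not np.isnan(vt):
            CM[i, j] = (vs + vt) / 2      # both available: take the mean
        elif not np.isnan(vs):
            CM[i, j] = vs                 # only the spatial estimate exists
        elif not np.isnan(vt):
            CM[i, j] = vt                 # only the temporal estimate exists
        # otherwise CM[i, j] stays NaN and is left to fine-grained interpolation
    return CM

# Hypothetical toy estimators, for illustration only.
def mean_of_others(values, k):
    rest = np.delete(values, k)
    rest = rest[~np.isnan(rest)]
    return rest.mean() if rest.size else np.nan

def spatial_toy(V, i, j):
    return mean_of_others(V[:, j], i)     # other objects at the same timestamp

def temporal_toy(V, i, j):
    return mean_of_others(V[i, :], j)     # other timestamps of the same object

V = np.array([[1.0, np.nan, 3.0, np.nan],
              [2.0, 4.0, 6.0, np.nan],
              [np.nan, np.nan, np.nan, np.nan]])
CM = coarse_grained(V, spatial_toy, temporal_toy)
print(CM)
```

Cell `(2, 3)` has no observed neighbor in its row or column, so it remains NaN, which is exactly the situation that the fine-grained step must handle.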

3.2. Fine-Grained Interpolation

3.2.1. Sliding Window

Before fine-grained interpolation, ST-2SMR sets up a dynamic sliding window to determine the sample data. Because spatio-temporal data exhibit strong temporal dependence, selecting different amounts of data for interpolation can lead to different results. If the window is too small, the sample data cannot fully reflect the spatio-temporal relationships; if it is too large (i.e., too many data are used), a large amount of redundant data is introduced into the calculation, and the computational complexity increases.

Because spatio-temporal data within a short period of time tend to remain strongly correlated, we use the average correlation between the spatial sequence containing the missing data and its adjacent spatial sequences to determine the optimal sample data:

$$R_{begin} = \frac{1}{n - w_{begin}} \sum_{j = w_{begin}}^{n-1} \mathrm{Corr}(t_n, t_j), \qquad \text{objective: } \min w_{begin} \quad \text{subject to: } R_{begin} = \max(R_{begin}) \qquad (5)$$

$$R_{end} = \frac{1}{w_{end} - n} \sum_{i = n+1}^{w_{end}} \mathrm{Corr}(t_n, t_i), \qquad \text{objective: } \min w_{end} \quad \text{subject to: } R_{end} = \max(R_{end}) \qquad (6)$$

where, in this subsection, $n$ denotes the timestamp of the missing data (rather than the total number of timestamps); $j$ and $i$ index the timestamps before and after $t_n$, respectively; $\mathrm{Corr}(t_n, t_j)$ is the correlation coefficient between the spatial sequence containing the missing data and the spatial sequence at an earlier timestamp $t_j$; $\mathrm{Corr}(t_n, t_i)$ is the correlation coefficient between the spatial sequence containing the missing data and the spatial sequence at a later timestamp $t_i$; and $w_{begin}$ and $w_{end}$ are the beginning and end of the window. $w_{begin}$ and $w_{end}$ are determined heuristically and are initially set to $n - 1$ and $n + 1$, respectively. $\mathrm{Corr}(t_n, t_j)$ and $\mathrm{Corr}(t_n, t_i)$ are first calculated; then, $w_{begin}$ moves toward earlier timestamps and $w_{end}$ moves toward later timestamps until the mean correlation coefficient reaches its maximum. For example, if $V_{4,6}$ is the missing value to be interpolated, $t_6$ is the spatial sequence containing the missing data, and $t_2$–$t_{10}$ is the sliding window (Figure 4). The pseudocode of this process is shown in Algorithm 2.

Algorithm 2: Selection of the optimal window (SOM).

Input: spatial series $t_n$ containing the missing data
Output: start of window $w_{begin}$; end of window $w_{end}$

1 $R_{begin} \leftarrow 0$
2 $R_{end} \leftarrow 0$
3 For $j = n - 1$ down to 1
4   $R_{last} \leftarrow R_{begin}$
5   $R_{begin} \leftarrow ((n - j - 1) R_{begin} + \mathrm{Corr}(t_n, t_j)) / (n - j)$
6   If $R_{begin} < R_{last}$ then
7     Break
8   End if
9 End for
10 For $i = n + 1$ to the last timestamp
11   $R_{last} \leftarrow R_{end}$
12   $R_{end} \leftarrow ((i - n - 1) R_{end} + \mathrm{Corr}(t_n, t_i)) / (i - n)$
13   If $R_{end} < R_{last}$ then
14     Break
15   End if
16 End for
17 $w_{begin} \leftarrow j + 1$ if the first loop ended at Break, otherwise 1
18 $w_{end} \leftarrow i - 1$ if the second loop ended at Break, otherwise the last timestamp

Figure 4. Sliding window selection.
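The greedy search of Algorithm 2 is sketched below in Python. It assumes that $\mathrm{Corr}(t_n, t_j)$ is the Pearson correlation between two time slices computed across all spatial objects of a gap-free (for example, coarse-grained) matrix whose columns are not constant, and it stops growing each side of the window as soon as the running mean correlation decreases. The function name and the synthetic data are illustrative.

```python
import numpy as np

def select_window(X, n):
    """Sketch of Algorithm 2 (SOM) for the missing time slice n (0-based column index).

    X has rows = spatial objects and columns = time slices, with no NaN.
    Returns the inclusive window (w_begin, w_end).
    """
    T = X.shape[1]
    corr = np.corrcoef(X, rowvar=False)[n]   # correlation of column n with every column

    w_begin, mean, k = n, 0.0, 0
    for j in range(n - 1, -1, -1):           # grow the window toward earlier slices
        k += 1
        new_mean = ((k - 1) * mean + corr[j]) / k
        if new_mean < mean:                  # mean correlation starts to drop: stop
            break
        mean, w_begin = new_mean, j

    w_end, mean, k = n, 0.0, 0
    for i in range(n + 1, T):                # grow the window toward later slices
        k += 1
        new_mean = ((k - 1) * mean + corr[i]) / k
        if new_mean < mean:
            break
        mean, w_end = new_mean, i
    return w_begin, w_end

# Columns 2-4 share a common pattern; columns 0, 1, 5 and 6 are unrelated noise.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
X = rng.normal(size=(50, 7))
X[:, 2:5] = base[:, None] + 0.01 * rng.normal(size=(50, 3))
print(select_window(X, 3))   # (2, 4)
```

The running mean is updated incrementally, so each candidate slice costs only one correlation lookup once the correlation row has been computed.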

3.2.2. Fine-Grained Spatial Dimension Interpolation

In the spatial dimension, we first used the optimal window selection algorithm (SOM) to select a sliding window of size $ws$. If $V_{4,6}$ is the missing value to be interpolated, the selected window in Figure 5 spans the four columns before and the four columns after $V_{4,6}$. Within this window, we chose the $ms$ time series most strongly correlated with the time series containing the missing data. Specifically, we calculated pairwise correlations between the time series containing the missing data and those of its spatial neighbors, and then selected $ms$ spatial samples to calculate the estimate:

$$\hat{y}_0 = \sum_{i=1}^{ms} w_i y_i \qquad (7)$$

where $y_i$ denotes the $i$-th spatial neighbor sample of the missing data and $w_i$ denotes the corresponding contribution weight of $y_i$. As shown in Figure 5, suppose that $ms = 3$; if $\{s_2, s_6, s_8\}$ are the time series most correlated with the one containing the missing data, then $\{V_{2,6}, V_{6,6}, V_{8,6}\}$ are taken as the sample data for interpolation.

Figure 5. Fine-grained interpolation in the spatial dimension.

To ensure that $\hat{y}_0$ is an unbiased estimator of the missing data, the following condition must be satisfied:

$$E(\hat{y}_0) = E(y_0) \qquad (8)$$

where $E(\cdot)$ represents the statistical expectation. To characterize spatial heterogeneity, we introduced the parameter $b_i$, the ratio between the statistical expectations of the neighbor sample $y_i$ and the missing value $y_0$:

$$b_i = E(y_i) / E(y_0) \qquad (9)$$

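Combining Formulae (7)–(9) gives $E(\hat{y}_0) = \sum_{i=1}^{ms} w_i E(y_i) = E(y_0) \sum_{i=1}^{ms} w_i b_i$, so the unbiasedness condition holds when the weights $w_i$ satisfy the constraint:

$$\sum_{i=1}^{ms} w_i b_i = 1 \qquad (10)$$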

To obtain the parameters $w_i$, we constructed an objective function that minimizes the estimation variance, i.e., the expected squared difference between the estimate and the true value:

$$\min_{w} \left[ \sigma^2_{\hat{y}_0} = E(\hat{y}_0 - y_0)^2 \right] \qquad (11)$$

Formula (11) can be expanded as follows:

$$\sigma^2_{\hat{y}_0} = C(\hat{y}_0, \hat{y}_0) + C(y_0, y_0) - 2C(\hat{y}_0, y_0) = \sigma^2_{y_0} + \sum_{i=1}^{ms} \sum_{j=1}^{ms} w_i w_j C(y_i, y_j) - 2 \sum_{i=1}^{ms} w_i C(y_i, y_0) \qquad (12)$$

where $C$ represents the covariance between different spatial points. Considering Formula (10), Formulae (11) and (12) can be written as:

$$\arg\min_{w} \sigma^2_{\hat{y}_0} = \arg\min_{w} E(\hat{y}_0 - y_0)^2 \quad \text{s.t.} \quad \sum_{i=1}^{ms} w_i b_i = 1 \qquad (13)$$

The problem of solving for the parameters $w_i$ was thus transformed into a standard constrained optimization problem with a Lagrange multiplier, and Formula (13) was rewritten as:

$$\sigma^2_{\hat{y}_0} = \sigma^2_{y_0} + \sum_{i=1}^{ms} \sum_{j=1}^{ms} w_i w_j C(y_i, y_j) - 2 \sum_{i=1}^{ms} w_i C(y_i, y_0) + 2\mu \left( \sum_{i=1}^{ms} w_i b_i - 1 \right) \qquad (14)$$

where $\mu$ is a Lagrange multiplier. Setting the partial derivatives of $\sigma^2_{\hat{y}_0}$ with respect to each $w_i$ to zero produces the equations:

$$\frac{\partial \sigma^2_{\hat{y}_0}}{\partial w_i} = 2 \sum_{j=1}^{ms} w_j C(y_i, y_j) - 2C(y_i, y_0) + 2\mu b_i = 0 \;\Rightarrow\; \sum_{j=1}^{ms} w_j C(y_i, y_j) + \mu b_i = C(y_i, y_0), \quad i = 1, \dots, ms \qquad (15)$$

Together with the constraint of Formula (10), Formula (15) can be written in matrix form:

$$\begin{bmatrix} C(y_1, y_1) & \cdots & C(y_1, y_{ms}) & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ C(y_{ms}, y_1) & \cdots & C(y_{ms}, y_{ms}) & b_{ms} \\ b_1 & \cdots & b_{ms} & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_{ms} \\ \mu \end{bmatrix} = \begin{bmatrix} C(y_1, y_0) \\ \vdots \\ C(y_{ms}, y_0) \\ 1 \end{bmatrix} \qquad (16)$$
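Numerically, Formula (16) is a small bordered linear system. The sketch below assembles and solves it with `numpy.linalg.solve` rather than forming an explicit inverse, and checks that the resulting weights satisfy the unbiasedness constraint of Formula (10). The covariance values are made up for illustration, and `unbiased_weights` is a hypothetical name; the same routine applies to Formula (20) with $C(t_j, t_g)$, $a_j$, $\varphi_j$ and $v$ in place of $C(y_i, y_j)$, $b_i$, $w_i$ and $\mu$.

```python
import numpy as np

def unbiased_weights(C, c0, b):
    """Solve the bordered system of Formula (16) for the weights w and the multiplier mu.

    C  : (k, k) covariance matrix among the selected neighbor series
    c0 : (k,) covariances between each neighbor series and the target series
    b  : (k,) expectation ratios b_i = E(y_i) / E(y_0)
    """
    k = len(b)
    A = np.zeros((k + 1, k + 1))
    A[:k, :k] = C          # covariance block
    A[:k, k] = b           # last column: b_i
    A[k, :k] = b           # last row: constraint sum(w_i * b_i) = 1
    rhs = np.append(c0, 1.0)
    sol = np.linalg.solve(A, rhs)
    return sol[:k], sol[k]

C = np.array([[2.0, 0.5], [0.5, 1.0]])
c0 = np.array([1.0, 0.8])
b = np.array([1.1, 0.9])
w, mu = unbiased_weights(C, c0, b)
print(w, mu, np.dot(w, b))   # the last value equals 1 up to rounding
```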

To obtain the parameters $w_i$ in Formula (16), we first calculated the covariance matrix $C_S$ among the most relevant time series, together with the expectation ratios $b_i$ and the covariances $C_i$ between each of these series and the time series containing the missing data (Lines 6–9 of Algorithm 3). We then assembled these values into the matrix system of Formula (16) and solved it for $w_i$ (Lines 10–12 of Algorithm 3). Finally, the estimate of the missing data was obtained through Formula (7). As shown in Figure 5, the weights $w_i$ for $V_{4,6}$ are calculated by: (1) obtaining the covariance matrix of $\{s_2, s_6, s_8\}$ and the covariances $C(s_2, s_4)$, $C(s_6, s_4)$ and $C(s_8, s_4)$; (2) calculating the expectation ratios $b_1 = E(s_2)/E(s_4)$, $b_2 = E(s_6)/E(s_4)$ and $b_3 = E(s_8)/E(s_4)$; (3) obtaining the weights $w_1$, $w_2$ and $w_3$ by solving the matrix system; and finally, (4) interpolating $V_{4,6}$ as $w_1 \times V_{2,6} + w_2 \times V_{6,6} + w_3 \times V_{8,6}$.

Algorithm 3: Fine-grained spatial interpolation.

Input: coarse-grained matrix $C\_M_{m \times n}$; number of spatial neighbors $ms$
Output: fine-grained spatial matrix $F\_S_{m \times n}$

1 For $i = 1$ to $m$
2   For $j = 1$ to $n$
3     $ws \leftarrow \mathrm{SOM}(C\_M_{m \times n}, C\_M_{ij})$
4     $Rss \leftarrow \mathrm{Corrcoef}(M\_W_{m \times ws}, \mathrm{column})$
5     $S\_Correlate \leftarrow \mathrm{MaxMsCorrelate}(Rss, S\_Missing_i, ms)$
6     $C_S \leftarrow \mathrm{Cov}(S\_Correlate)$
7     For each $S_k \in S\_Correlate$
8       $C_k \leftarrow \mathrm{Cov}(S_k, S\_Missing_i)$
9       $b_k \leftarrow \mathrm{Mean}(S_k) / \mathrm{Mean}(S\_Missing_i)$
10    $C\_Matrix\_Left_{(ms+1) \times (ms+1)} \leftarrow \mathrm{Combine}(C_S, b)$
11    $C\_Matrix\_Right_{(ms+1) \times 1} \leftarrow \mathrm{Combine}(C, 1)$
12    $w \leftarrow C\_Matrix\_Left_{(ms+1) \times (ms+1)}^{-1} \, C\_Matrix\_Right_{(ms+1) \times 1}$
13    $F\_S_{ij} \leftarrow \mathrm{DotProduct}(S\_Correlate_{\cdot j}, w_{1:ms})$
14  End for
15 End for
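For a single missing cell, Lines 3–13 of Algorithm 3 can be sketched as below. Assumptions: the coarse-grained matrix has no remaining NaN, the inclusive window bounds come from a SOM-like step, correlations and covariances between spatial objects are computed over the window columns, and the synthetic data are for illustration only; `fine_spatial` is a hypothetical name.

```python
import numpy as np

def fine_spatial(X, i, j, w_begin, w_end, ms):
    """Sketch of Algorithm 3 for one cell: estimate X[i, j] from its ms most correlated series.

    X has rows = spatial objects and columns = time slices (no NaN);
    [w_begin, w_end] is the inclusive window chosen by SOM.
    """
    W = X[:, w_begin:w_end + 1]                     # window sub-matrix (M_W)
    corr = np.corrcoef(W)[i]                        # row i against every row over the window
    corr[i] = -np.inf                               # never select the target series itself
    corr[np.isnan(corr)] = -np.inf                  # constant series have undefined correlation
    nbrs = np.argsort(corr)[::-1][:ms]              # ms most correlated series (S_Correlate)
    cov = np.cov(W)                                 # row-by-row covariances over the window
    C_S = cov[np.ix_(nbrs, nbrs)]
    c0 = cov[nbrs, i]                               # C_k = Cov(S_k, S_Missing_i)
    b = W[nbrs].mean(axis=1) / W[i].mean()          # b_k = Mean(S_k) / Mean(S_Missing_i)
    k = len(nbrs)
    A = np.zeros((k + 1, k + 1))                    # bordered matrix of Formula (16)
    A[:k, :k] = C_S
    A[:k, k] = b
    A[k, :k] = b
    w = np.linalg.solve(A, np.append(c0, 1.0))[:k]  # drop the multiplier mu
    return float(np.dot(w, X[nbrs, j])), nbrs, w    # Formula (7)

rng = np.random.default_rng(1)
base = rng.normal(size=12)
X = np.vstack([base + 5 + 0.1 * rng.normal(size=12),
               2 * base + 8 + 0.1 * rng.normal(size=12),
               rng.normal(size=12) + 3,
               base + 6 + 0.1 * rng.normal(size=12)])
est, nbrs, w = fine_spatial(X, i=3, j=5, w_begin=2, w_end=9, ms=2)
print(est, X[3, 5], nbrs)
```

Rows 0 and 1 follow the same pattern as the target row 3 at different levels, so they are selected as neighbors, and the expectation ratios $b_k$ absorb their different means.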

3.2.3. Fine-Grained Temporal Dimension Interpolation

In the temporal dimension, we used the SOM algorithm to select an optimal window as the data matrix for fine-grained temporal interpolation. Within this window, the $nt$ samples most strongly correlated with the missing data were chosen. The estimate $\hat{t}_0$ of the missing data was calculated as follows:

$$\hat{t}_0 = \sum_{j=1}^{nt} \varphi_j t_j \qquad (17)$$

where $t_j$ denotes the $j$-th temporal neighbor sample of the missing data and $\varphi_j$ denotes the corresponding contribution weight of $t_j$. Analogous to Formulae (12)–(14), to ensure that $\hat{t}_0$ is an unbiased estimator of the missing data and to calculate the weights $\varphi_j$, the following Lagrangian is minimized:

$$\sigma^2_{\hat{t}_0} = \sigma^2_{t_0} + \sum_{j=1}^{nt} \sum_{g=1}^{nt} \varphi_j \varphi_g C(t_j, t_g) - 2 \sum_{j=1}^{nt} \varphi_j C(t_j, t_0) + 2v \left( \sum_{j=1}^{nt} \varphi_j a_j - 1 \right) \qquad (18)$$

where $v$ is a Lagrange multiplier, $t_0$ is the true value of the missing data, and $a_j = E(t_j)/E(t_0)$ is the ratio of the statistical expectations of the $j$-th most relevant spatial series and the spatial series containing the missing data (the temporal counterpart of $b_i$). Setting the partial derivatives of $\sigma^2_{\hat{t}_0}$ to zero gives:

$$\frac{\partial \sigma^2_{\hat{t}_0}}{\partial \varphi_i} = 0 \;\Rightarrow\; \sum_{j=1}^{nt} \varphi_j C(t_i, t_j) + v a_i = C(t_i, t_0), \quad i = 1, \dots, nt \qquad (19)$$

Together with the constraint $\sum_{j=1}^{nt} \varphi_j a_j = 1$, Formula (19) can be written in matrix form:

$$\begin{bmatrix} C(t_1, t_1) & \cdots & C(t_1, t_{nt}) & a_1 \\ \vdots & \ddots & \vdots & \vdots \\ C(t_{nt}, t_1) & \cdots & C(t_{nt}, t_{nt}) & a_{nt} \\ a_1 & \cdots & a_{nt} & 0 \end{bmatrix} \begin{bmatrix} \varphi_1 \\ \vdots \\ \varphi_{nt} \\ v \end{bmatrix} = \begin{bmatrix} C(t_1, t_0) \\ \vdots \\ C(t_{nt}, t_0) \\ 1 \end{bmatrix} \qquad (20)$$

As shown in Figure 6, to estimate the missing value $V_{4,6}$, we first used the SOM algorithm to select $t_2$–$t_{10}$ as the sliding window. Within this window, we chose the $nt$ spatial series most strongly correlated with the spatial series containing the missing data. For example, when $nt = 4$, the selected spatial series would be $\{t_2, t_4, t_8, t_9\}$, and we would take $\{V_{4,2}, V_{4,4}, V_{4,8}, V_{4,9}\}$ as the sample data for interpolation (Lines 3–5 of Algorithm 4). The covariance matrix $C_T$ was calculated from the most relevant spatial series, and the expectation ratios $a_j$ and covariances $C_j$ were calculated between each of these series and the spatial series containing the missing data (Lines 6–9 of Algorithm 4). We then assembled these values into the matrix system of Formula (20) and solved it for $\varphi_j$ (Lines 10–12 of Algorithm 4). Finally, through Formula (17), the interpolation result for $V_{4,6}$ was calculated as $\varphi_1 \times V_{4,2} + \varphi_2 \times V_{4,4} + \varphi_3 \times V_{4,8} + \varphi_4 \times V_{4,9}$.

Algorithm 4: Fine-grained temporal interpolation.

Input: coarse-grained matrix $C\_M_{m \times n}$; number of temporal neighbors $nt$
Output: fine-grained temporal matrix $F\_T_{m \times n}$

1 For $i = 1$ to $m$
2   For $j = 1$ to $n$
3     $wt \leftarrow \mathrm{SOM}(C\_M_{m \times n}, C\_M_{ij})$
4     $Rtt \leftarrow \mathrm{Corrcoef}(M\_W_{m \times wt}, \mathrm{row})$
5     $T\_Correlate \leftarrow \mathrm{MaxNtCorrelate}(Rtt, T\_Missing_j, nt)$
6     $C_T \leftarrow \mathrm{Cov}(T\_Correlate)$
7     For each $T_k \in T\_Correlate$
8       $C_k \leftarrow \mathrm{Cov}(T_k, T\_Missing_j)$
9       $a_k \leftarrow \mathrm{Mean}(T_k) / \mathrm{Mean}(T\_Missing_j)$
10    $C\_Matrix\_Left_{(nt+1) \times (nt+1)} \leftarrow \mathrm{Combine}(C_T, a)$
11    $C\_Matrix\_Right_{(nt+1) \times 1} \leftarrow \mathrm{Combine}(C, 1)$
12    $\varphi \leftarrow C\_Matrix\_Left_{(nt+1) \times (nt+1)}^{-1} \, C\_Matrix\_Right_{(nt+1) \times 1}$
13    $F\_T_{ij} \leftarrow \mathrm{DotProduct}(T\_Correlate_{i \cdot}, \varphi_{1:nt})$
14  End for
15 End for

Figure 6. Fine-grained interpolation in the temporal dimension.
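The temporal counterpart, Lines 3–13 of Algorithm 4, differs from the spatial case in that time slices (columns) are correlated across spatial objects and the sample values come from the row of the missing cell. The sketch below assumes a gap-free coarse-grained matrix, an inclusive window $[w_{begin}, w_{end}]$, and at least $nt + 1$ spatial objects so that the covariance matrix $C_T$ is nonsingular; the names and the synthetic data are hypothetical.

```python
import numpy as np

def fine_temporal(X, i, j, w_begin, w_end, nt):
    """Sketch of Algorithm 4 for one cell: estimate X[i, j] from its nt most correlated time slices.

    Correlations and covariances between time slices (columns) are computed
    across all spatial objects (rows) inside the inclusive window [w_begin, w_end].
    """
    W = X[:, w_begin:w_end + 1]                     # window sub-matrix (M_W)
    jj = j - w_begin                                # position of t_j inside the window
    corr = np.corrcoef(W, rowvar=False)[jj]         # column t_j against every column
    corr[jj] = -np.inf                              # never select t_j itself
    corr[np.isnan(corr)] = -np.inf                  # constant slices have undefined correlation
    cols = np.argsort(corr)[::-1][:nt]              # nt most correlated slices (T_Correlate)
    cov = np.cov(W, rowvar=False)                   # column-by-column covariances
    C_T = cov[np.ix_(cols, cols)]
    c0 = cov[cols, jj]                              # C_k = Cov(T_k, T_Missing_j)
    a = W[:, cols].mean(axis=0) / W[:, jj].mean()   # a_k = Mean(T_k) / Mean(T_Missing_j)
    k = len(cols)
    A = np.zeros((k + 1, k + 1))                    # bordered matrix of Formula (20)
    A[:k, :k] = C_T
    A[:k, k] = a
    A[k, :k] = a
    phi = np.linalg.solve(A, np.append(c0, 1.0))[:k]             # drop the multiplier v
    return float(np.dot(phi, W[i, cols])), cols + w_begin, phi   # Formula (17)

rng = np.random.default_rng(2)
level = rng.normal(10.0, 3.0, size=20)              # station-specific levels (heterogeneity)
X = level[:, None] + rng.normal(scale=0.5, size=(20, 7))
est, cols, phi = fine_temporal(X, i=4, j=3, w_begin=1, w_end=5, nt=3)
print(est, X[4, 3], cols)
```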

3.3. Spatio-Temporal Integration

After the interpolation results for the temporal and spatial dimensions were obtained, a BP (back propagation) neural network was trained to integrate them into the final estimates of the missing data [5]. A BP neural network can be regarded as a nonlinear function: with $n$ input nodes and $m$ output nodes, it expresses a mapping from $n$ independent variables to $m$ dependent variables [29].

Appropriate samples are needed to train and test the neural network. In this study, we first identified the positions at which none of the fine-grained temporal matrix $F\_T_{m \times n}$, the fine-grained spatial matrix $F\_S_{m \times n}$ and the coarse-grained matrix $C\_M_{m \times n}$ is missing, and used them to construct the samples (Lines 1–7 of Algorithm 5). The samples were then divided into three parts: 80% as a training dataset, 10% as a test dataset and 10% as a cross-validation dataset used to control early stopping (Lines 8–10 of Algorithm 5). Next, the error back-propagation algorithm was used to train the neural network model (Figure 7). Assuming that the input variables of the network are $X = \{F\_S, F\_T\}$, the connection weights between the input layer and the hidden layer are $\gamma_{ij}$, and the hidden layer bias is $bias_1$, the output of the hidden layer is:

$$H_j = f\left( \sum_{i=1}^{n} \gamma_{ij} x_i - bias_1 \right), \quad j = 1, 2, \dots, l \qquad (21)$$

where $l$ is the number of hidden layer nodes and $f$ is the activation function. Of the many possible forms of $f$, we selected the sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (22)$$

Figure 7. Neural network training.

The interpolation estimate ST of the missing value was calculated as follows:

ST = \sum_{j=1}^{l} H_{j} \gamma_{j1} - bias_{2}

(23)

where γ_{j1} is the connection weight between the hidden layer and the output layer and bias_2 is the bias of the output layer. During training, the weights and biases in Equations (21) and (23) are updated as follows:

\gamma_{ij} = \gamma_{ij} + \eta H_{j} (1 - H_{j}) x_{i} \gamma_{j1} e, \quad i = 1, 2, \dots, n; \; j = 1, 2, \dots, l
\gamma_{j1} = \gamma_{j1} + \eta H_{j} e, \quad j = 1, 2, \dots, l

(24)

bias_{1} = bias_{1} - \eta H_{j} (1 - H_{j}) \gamma_{j1} e, \quad j = 1, 2, \dots, l
bias_{2} = bias_{2} - \eta e

(25)

where η is the learning rate and e = Y − ST is the prediction error of the neural network, i.e., the expected (observed) output Y minus the predicted output ST. Training terminated when the preset training objectives (the maximum number of iterations and the minimum error) were reached.
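Equations (21)–(25) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' MATLAB implementation: the class name `TinyBP`, the uniform weight initialization, one bias value per hidden node, per-sample (online) updates and the learning rate in the toy run are all assumptions. The bias updates use the signs that gradient descent on e²/2 yields for the forward model of Equations (21) and (23), with e = Y − ST.

```python
import math
import random
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Eq. (22): logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyBP:
    """Hypothetical sketch: one hidden layer, one linear output node (Eqs. (21)-(25)).

    gamma_in[i][j] ~ γ_ij, gamma_out[j] ~ γ_j1, bias1[j] ~ bias_1 for hidden
    node j (one value per node is an assumption), bias2 ~ bias_2.
    """

    def __init__(self, n_in: int, n_hidden: int, eta: float = 0.01, seed: int = 0):
        rng = random.Random(seed)
        self.eta = eta
        self.gamma_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                         for _ in range(n_in)]
        self.gamma_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.bias1 = [0.0] * n_hidden
        self.bias2 = 0.0

    def hidden(self, x: List[float]) -> List[float]:
        # Eq. (21): H_j = f(sum_i γ_ij x_i - bias_1)
        return [sigmoid(sum(self.gamma_in[i][j] * xi for i, xi in enumerate(x)) - self.bias1[j])
                for j in range(len(self.gamma_out))]

    def predict(self, x: List[float]) -> float:
        # Eq. (23): ST = sum_j H_j γ_j1 - bias_2 (linear output node)
        h = self.hidden(x)
        return sum(hj * gj for hj, gj in zip(h, self.gamma_out)) - self.bias2

    def update(self, x: List[float], y: float) -> float:
        """One online gradient step on e^2 / 2 with e = y - ST; returns e."""
        h = self.hidden(x)
        e = y - (sum(hj * gj for hj, gj in zip(h, self.gamma_out)) - self.bias2)
        for j, hj in enumerate(h):
            delta = hj * (1.0 - hj) * self.gamma_out[j] * e  # uses γ_j1 before its update
            for i, xi in enumerate(x):
                self.gamma_in[i][j] += self.eta * delta * xi   # Eq. (24), input weights
            self.bias1[j] -= self.eta * delta                  # Eq. (25), hidden bias
            self.gamma_out[j] += self.eta * hj * e             # Eq. (24), output weights
        self.bias2 -= self.eta * e                             # Eq. (25), output bias
        return e


def mse(net: TinyBP, data: List[Tuple[List[float], float]]) -> float:
    return sum((y - net.predict(x)) ** 2 for x, y in data) / len(data)


# Toy usage: the two inputs play the role of (F_S, F_T); the target is synthetic.
rng = random.Random(1)
data = [([a, b], 0.5 * a + 0.5 * b)
        for a, b in ((rng.random(), rng.random()) for _ in range(200))]
net = TinyBP(n_in=2, n_hidden=5, eta=0.1)
before = mse(net, data)
for _ in range(50):
    for x, y in data:
        net.update(x, y)
after = mse(net, data)
```

Because the output node is linear, the output-layer updates are ordinary least-squares gradient steps; the sigmoid derivative H_j(1 − H_j) enters only through the hidden layer.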

After training the neural network model, we used it to estimate the missing data. We first located the missing entries in the coarse-grained matrix C_M_{m×n}. The corresponding results of the fine-grained spatial interpolation (FGSI) and fine-grained temporal interpolation (FGTI) algorithms were then fed into the neural network to compute the estimated values (Lines 12–18 of Algorithm 5). In addition, the FGSI and FGTI results for the test dataset were fed into the neural network to evaluate the performance of the model (Lines 19–25 of Algorithm 5).

Algorithm 5: Combining spatial and temporal interpolation results.

Input: Fine-grained spatial matrix F_S_{m×n}; fine-grained temporal matrix F_T_{m×n}; coarse-grained matrix C_M_{m×n}; number of spatial neighbors ms; number of temporal neighbors nt
Output: Test estimated matrix ST_{m×n}; missing estimated matrix M_ST_{m×n}
1 For i = 1 to m
2     For j = 1 to n
3         If F_S_{ij}, F_T_{ij} and C_M_{ij} are not missing values then
4             Sample ← Sample(F_T_{ij}, F_S_{ij}, C_M_{ij})
5         End if
6     End for
7 End for
8 Training_Spl ← Divide(Sample, 0.8)
9 Testing_Spl ← Divide(Sample, 0.1)
10 CrossValidation_Spl ← Divide(Sample, 0.1)
11 Net ← Train(Training_Spl)    ∇ Neural network training
12 For i = 1 to m
13     For j = 1 to n
14         If C_M_{ij} is a missing value then
15             M_ST_{ij} ← Sim(Net, F_T_{ij}, F_S_{ij})
16         End if
17     End for
18 End for
19 For i = 1 to m
20     For j = 1 to n
21         If {F_S_{ij}, F_T_{ij}} ∈ Testing_Spl then
22             ST_{ij} ← Sim(Net, F_T_{ij}, F_S_{ij})
23         End if
24     End for
25 End for
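The data handling in Algorithm 5 (sample construction, the 80/10/10 split and the estimation loop) can be sketched in Python as below. This is an illustration under stated assumptions: matrices are nested lists with `math.nan` marking a missing value, the split is a seeded random shuffle, and all function names are hypothetical. The trained network is replaced by a caller-supplied function `sim`, so the sketch does not reproduce `Train` or `Sim`.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]  # m x n, with math.nan marking a missing value


def build_samples(fs: Matrix, ft: Matrix, cm: Matrix) -> List[Tuple[float, float, float]]:
    """Lines 1-7: keep (F_T, F_S, C_M) triples where all three values are observed."""
    samples = []
    for i in range(len(cm)):
        for j in range(len(cm[0])):
            triple = (ft[i][j], fs[i][j], cm[i][j])
            if not any(math.isnan(v) for v in triple):
                samples.append(triple)
    return samples


def divide(samples: list, seed: int = 0) -> Tuple[list, list, list]:
    """Lines 8-10: shuffle, then split 80% / 10% / 10% (train / test / validation)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.8 * len(shuffled))
    n_test = int(0.1 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def fill_missing(fs: Matrix, ft: Matrix, cm: Matrix,
                 sim: Callable[[float, float], float]) -> Matrix:
    """Lines 12-18: estimate every cell that is missing in C_M via sim(F_T, F_S)."""
    out = [[math.nan] * len(row) for row in cm]
    for i, row in enumerate(cm):
        for j, value in enumerate(row):
            if math.isnan(value):
                out[i][j] = sim(ft[i][j], fs[i][j])
    return out


nan = math.nan
fs = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
ft = [[1.2, 2.2, 3.2], [4.2, 5.2, 6.2]]
cm = [[1.1, nan, 3.1], [4.1, 5.1, nan]]
samples = build_samples(fs, ft, cm)  # the four fully observed cells
train, test, val = divide(samples)
# Hypothetical stand-in for Sim(Net, F_T, F_S): a plain average, not a trained network.
filled = fill_missing(fs, ft, cm, lambda t, s: (t + s) / 2)
```

The test-set loop (Lines 19–25) follows the same pattern, calling `sim` on the cells that fall in the test split instead of the cells missing in C_M.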

4. Results

4.1. Datasets

We evaluated our model using a real air quality dataset from Beijing, collected between 1 May 2014 and 30 April 2015. The data include PM_2.5, CO, SO₂, O₃, NO₂ and other attributes, each recorded hourly at 36 air quality monitoring stations (Figure 8). The dataset contains a total of 8759 hourly records [30] (Table 1).

Figure 8. Air quality monitoring stations.

Table 1. Experimental dataset.

The air quality variables have different degrees of missing data (Table 1), and only PM_2.5 has any complete cases. We used combination analysis to explore the patterns of spatio-temporal missing data (Figure 9).

Figure 9. Pattern of missing spatio-temporal PM_2.5 data. Red squares represent missing data, and values on the right ordinate are the number of missing data combinations.

Using the PM_2.5 dataset as an example, we set a time sliding window of w_t = 48, i.e., we examined 48 h of data at a time to explore the missing patterns. Stations st₁₉, st₂₇, st₃₅ and st₃₆ were missing simultaneously in eight cases (i.e., the missing pattern in Figure 1c). Data for st₁₉, st₂₇ and st₃₆ were completely missing (Figure 1d). Data for {st₁₈, st₁₉} and {st₃₅, st₃₆} showed random block loss (Figure 1b). Finally, a large number of patterns showed randomly missing data (Figure 1a). These patterns show that if interpolation were performed directly on the original dataset (i.e., without coarse-grained interpolation to eliminate the effect of successive missing data), it would be difficult to obtain accurate estimates.
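The combination analysis behind Figure 9 amounts to counting which stations are missing together within each w_t-hour window. A minimal sketch, assuming a boolean missing-value mask with hours as rows and stations as columns; the function names and the toy mask are illustrative, not the authors' code or data:

```python
from collections import Counter
from typing import List, Tuple

Mask = List[List[bool]]  # mask[t][s] is True when station s is missing at hour t


def missing_patterns(mask: Mask, start: int, wt: int = 48) -> Counter:
    """Count which station combinations are missing together in hours [start, start + wt)."""
    counts: Counter = Counter()
    for row in mask[start:start + wt]:
        pattern: Tuple[int, ...] = tuple(s for s, missing in enumerate(row) if missing)
        if pattern:  # skip hours in which every station is observed
            counts[pattern] += 1
    return counts


def fully_missing(mask: Mask, start: int, wt: int = 48) -> List[int]:
    """Stations with no observation at all inside the window (cf. Figure 1d)."""
    window = mask[start:start + wt]
    return [s for s in range(len(window[0])) if all(row[s] for row in window)]


# Toy window: 6 hours x 4 stations (0-based station indices).
T, F = True, False
mask = [
    [F, F, F, F],
    [T, F, F, T],
    [T, F, F, T],
    [F, T, F, F],
    [T, F, F, T],
    [F, F, F, F],
]
patterns = missing_patterns(mask, start=0, wt=6)  # Counter({(0, 3): 3, (1,): 1})
gone = fully_missing(mask, start=0, wt=6)         # []
```

Sliding `start` forward over the full record and summing the counters gives pattern frequencies of the kind plotted on the right ordinate of Figure 9.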

4.2. Evaluation Criteria

To evaluate the ST-2SMR method, we compared it with three existing methods (ST-kriging [31], P-BSHADE [16,32] and ST-HC [2,33]), each extended in three ways (adding coarse-grained interpolation, adding the sliding window, or both; Table 2). We adopted the mean absolute error (MAE), the mean relative error (MRE) and the ratio of reconstruction (RC) as the evaluation criteria.
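The three criteria can be computed as in the sketch below. The section does not give their formulas, so the definitions here are assumptions: MAE is the mean absolute difference between estimates and held-out observations, MRE is taken as the total absolute error divided by the total absolute observed value, and RC as the share of held-out cells that receive an estimate (`math.nan` marks a cell that could not be reconstructed). The toy values are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _pairs(est: Sequence[float], obs: Sequence[float]) -> List[Tuple[float, float]]:
    """(estimate, observation) pairs for cells that actually received an estimate."""
    return [(e, o) for e, o in zip(est, obs) if not math.isnan(e)]


def mae(est: Sequence[float], obs: Sequence[float]) -> float:
    p = _pairs(est, obs)
    return sum(abs(e - o) for e, o in p) / len(p)


def mre(est: Sequence[float], obs: Sequence[float]) -> float:
    # Assumed definition: total absolute error / total absolute observed value.
    p = _pairs(est, obs)
    return sum(abs(e - o) for e, o in p) / sum(abs(o) for _, o in p)


def rc(est: Sequence[float]) -> float:
    # Share of held-out cells for which a (non-nan) estimate was produced.
    return sum(not math.isnan(e) for e in est) / len(est)


nan = math.nan
obs = [12.0, 20.0, 30.0, 40.0]  # held-out true values
est = [10.0, 22.0, nan, 38.0]   # the third cell could not be reconstructed
print(mae(est, obs), mre(est, obs), rc(est))  # 2.0, about 0.0833, 0.75
```

Computing MAE and MRE only over reconstructed cells keeps accuracy and RC separate, which is how Table 3 reports them side by side.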

Table 2. Combination of different methods. P-BSHADE, point estimation model of biased hospital-based area disease estimation; ST, spatio-temporal; HC, heterogeneous covariance.

4.3. Experimental Results

The proposed method was implemented in MATLAB 2016b, and the PM2.5 dataset was used to validate it. Based on a comparative analysis of different experiments, the relevant parameters were set as follows: α = 4, β = 0.85, w_c = 14, ms = 10, nt = 10 and η = 0.01.

4.3.1. Overall Results

In the first group of experiments (i.e., those using the unaltered spatio-temporal interpolation methods), ST-2SMR achieved the highest accuracy, reflecting the benefit of combining the spatial and temporal results nonlinearly (Table 3). ST-HC and ST-2SMR had the same reconstruction ratio because they adopt the same spatio-temporal interpolation algorithm; however, this ratio was the lowest of all methods (65.94%), owing to the introduction of heterogeneity in the time dimension. When the correlation coefficients and covariances between the spatial sequence of the missing value and the time slice are calculated, these sequences may themselves contain many missing values. If a sequence is completely missing, or becomes completely missing under pairwise deletion, the covariance matrix still contains missing entries, so no final estimate can be obtained; this lowers the reconstruction ratio.
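The failure mode described above can be illustrated with pairwise deletion. In this minimal sketch (an illustration, not the ST-HC code), the covariance of two sequences is computed only from time steps at which both are observed; when the sequences never overlap, the entry is undefined (`nan`), so a covariance-based weight system for the missing value cannot be solved.

```python
import math
from typing import Sequence


def pairwise_cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance under pairwise deletion (nan marks a missing value).

    Only positions observed in both series are used. With fewer than two such
    positions the covariance is undefined and nan is returned.
    """
    pairs = [(x, y) for x, y in zip(a, b) if not (math.isnan(x) or math.isnan(y))]
    if len(pairs) < 2:
        return math.nan
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    return sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)


nan = math.nan
# Two sequences that are each partly observed but never observed at the same time.
no_overlap = pairwise_cov([1.0, 2.0, nan, nan], [nan, nan, 3.0, 4.0])  # nan
overlap = pairwise_cov([1.0, 2.0, 3.0, nan], [2.0, 4.0, 6.0, 8.0])     # 2.0
```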

Table 3. Performance comparison of different methods. RC, ratio of reconstruction; ST-2SMR, two-step method for spatio-temporal missing data reconstruction.

In the second group of experiments (i.e., the original algorithms with the sliding window added), interpolation precision improved because the sliding window ensured that the selected sample data were those most strongly correlated with the missing data. In the third group (i.e., coarse-grained interpolation applied before the original interpolation method; here, IDW + SES was used as the coarse-grained interpolation method), accuracy also improved, and complete reconstruction was achieved because the influence of continuous missing data had been eliminated. This result also supports the view in [34] that the accuracy and reliability of spatio-temporal interpolation methods depend on the pattern of missing data. Accuracy improved most when both constraints were applied together, as shown by the lowest MAE values in Table 3 (relative to coarse-grained interpolation alone, the MRE also decreased for all methods except ST-2SMR).

4.3.2. Effect of Coarse-Grained Interpolation

Coarse-grained interpolation eliminated continuous missing data and thereby significantly improved accuracy. Our experiments show that, regardless of the coarse-grained interpolation method chosen, interpolation accuracy improves when coarse-grained interpolation is applied first (Table 4, compared with the unaltered spatio-temporal interpolation methods in Table 3). Among the methods tested, ST-2SMR achieved the highest accuracy, reflecting the nonlinear integration of the spatial and temporal interpolation results, an approach that is particularly suitable for describing complex relationships between spatial and temporal data. Overall, IDW + SES gave the best accuracy, so it was chosen for coarse-grained interpolation in subsequent experiments.

Table 4. Performance of different coarse-grained interpolation methods ¹. SES, simple exponential smoothing.

4.3.3. Effect of the Coarse-Grained Missing Data Rate

According to Algorithm 1, the results of coarse-grained interpolation depend mainly on the weight decay rate α, the smoothing parameter β and the time threshold w_c. According to the experimental results of [27], obtained on the same PM2.5 dataset, IDW and SES achieve their minimum MAE when α = 4 and β = 0.85. The time threshold, in contrast, has a significant influence on the coarse-grained interpolation results. In this study, we determined w_c heuristically, starting with w_c = 1 (i.e., centered on the missing value, using the preceding hour and the following hour as sample data). As w_c increased, the reconstruction rate of the missing data increased until complete reconstruction was achieved (Figure 10). MAE was smallest when w_c = 1 and remained stable when w_c > 3, because beyond three hours the contribution weights of the additional sample data were nearly zero and had little effect on the interpolation results. The termination condition for w_c was chosen to ensure that, after coarse-grained interpolation, no entire spatial sequence or time series remained continuously missing, so that fine-grained interpolation could produce a complete result. When w_c < 14, the output of coarse-grained interpolation still contained completely missing time and space sequences (Figure 11), so fine-grained interpolation could not reconstruct every missing value. When w_c = 14, coarse-grained interpolation eliminated the influence of successive missing data, and the reconstruction rate reached 100%.
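The role of α, β and w_c can be illustrated with the two component estimators. Algorithm 1 is not shown in this section, so the following is a sketch under assumptions: IDW weights proportional to d^(−α), SES weights β(1 − β)^(Δt − 1) for an observation Δt hours from the gap, and SES samples taken from up to w_c hours on each side. How Algorithm 1 combines the IDW and SES estimates is not reproduced. With β = 0.85 these weights fall to about 0.019 at Δt = 3 and 0.003 at Δt = 4, in line with the near-zero contribution observed for w_c > 3.

```python
import math
from typing import List, Sequence, Tuple


def ses_weight(dt: int, beta: float = 0.85) -> float:
    """Assumed SES weight for an observation dt >= 1 hours away from the gap."""
    return beta * (1.0 - beta) ** (dt - 1)


def ses_estimate(series: Sequence[float], t: int, wc: int, beta: float = 0.85) -> float:
    """Two-sided SES estimate of series[t] from observations within +/- wc hours.

    Returns nan when the window holds no observation, i.e. the value is not
    reconstructed at this threshold.
    """
    num = den = 0.0
    for dt in range(1, wc + 1):
        for k in (t - dt, t + dt):
            if 0 <= k < len(series) and not math.isnan(series[k]):
                w = ses_weight(dt, beta)
                num += w * series[k]
                den += w
    return num / den if den > 0 else math.nan


def idw_estimate(neighbors: List[Tuple[float, float]], alpha: float = 4.0) -> float:
    """Assumed IDW estimate from (distance > 0, value) pairs with weights d ** -alpha."""
    weights = [(d ** -alpha, v) for d, v in neighbors]
    return sum(w * v for w, v in weights) / sum(w for w, _ in weights)


print([ses_weight(dt) for dt in range(1, 5)])  # about 0.85, 0.1275, 0.0191, 0.0029
gap = [10.0, math.nan, math.nan, math.nan, 20.0]
print(ses_estimate(gap, t=2, wc=1))  # nan: nothing observed within 1 h
print(ses_estimate(gap, t=2, wc=2))  # about 15.0: both sides observed 2 h away
```

The toy gap shows why a larger w_c raises the reconstruction rate: a value with no observation inside the window simply receives no estimate.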

Figure 10. Impact of the temporal threshold on coarse-grained interpolation.

Figure 11. Impact of the temporal threshold on ST-2SMR.

4.3.4. Effect of Sample Point Number

During fine-grained interpolation, ms spatial neighbors and nt temporal neighbors must be selected for each missing value, and the number of sample points affects the results. Too few sample points fail to capture the spatial and temporal correlations, while too many increase computational complexity and also reduce accuracy because of redundant data. According to the experimental results of [16], interpolation performs best when the number of sampling points is between five and 15; we therefore tested 5, 10 and 15 neighbors in both the spatial and the temporal dimension (nine combinations in total) to determine the most suitable number of samples. ST-2SMR achieved its best performance with ms = 10 and nt = 10 (Table 5), whereas the number of neighbor points had no effect on the reconstruction rate, which was 100% in every case. This reflects the elimination of continuous missing data during coarse-grained interpolation: every missing value had access to observed data for interpolation, so estimates could be obtained for the whole dataset.
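The selection of (ms, nt) is a small grid search. In the sketch below, a full experiment would obtain each entry by running ST-2SMR with that setting; here the published Table 5 values are plugged in instead, and `best_setting` is a hypothetical helper.

```python
from itertools import product

# (MAE, MRE) of ST-2SMR from Table 5, keyed by (ms, nt).
TABLE5 = {
    (5, 5): (7.3300, 0.0625), (5, 10): (7.3276, 0.0635), (5, 15): (7.4736, 0.0643),
    (10, 5): (7.2787, 0.0630), (10, 10): (7.2285, 0.0623), (10, 15): (7.2761, 0.0630),
    (15, 5): (7.2952, 0.0631), (15, 10): (7.2892, 0.0631), (15, 15): (7.3332, 0.0650),
}


def best_setting(results: dict, metric: int = 0) -> tuple:
    """Return the (ms, nt) pair with the smallest metric (0 = MAE, 1 = MRE)."""
    return min(results, key=lambda k: results[k][metric])


grid = list(product((5, 10, 15), repeat=2))  # the nine (ms, nt) combinations
assert set(grid) == set(TABLE5)
print(best_setting(TABLE5, 0), best_setting(TABLE5, 1))  # (10, 10) (10, 10)
```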

Table 5. Influence of the number of sample points on interpolation results.

4.3.5. Effect of Sliding Window

Our experimental results show that the SOM algorithm can dynamically select the size of each window using the interaction information between time and space (Figure 12). Compared with a static window, this greatly improved interpolation accuracy.

Figure 12. Influence of the sliding window on the interpolation results. The static window refers to a fixed-size sliding window (i.e., centered on the missing value, using the preceding 24 h and the following 24 h as sample data for interpolation).

4.3.6. Performance of Two- and Three-Step Interpolation

To examine whether the two-step interpolation converges, we further introduced a third interpolation step (Table 6). The two-step interpolation used IDW + SES for coarse-grained interpolation, while the three-step interpolation applied ST-kriging, P-BSHADE or ST-HC to the results of the two-step interpolation. The third step reduced MAE only marginally, and regardless of the method used for it, the overall results remained stable. We therefore concluded that the additional computational complexity of the three-step method is not justified by the minor improvement in performance.

Table 6. Two- and three-step interpolation results.

4.3.7. Performance Comparison for Different Datasets

To verify the generality of the proposed method, we also tested it on the NO₂, CO, SO₂ and O₃ datasets (Figure 13). The results confirmed that the proposed method is more accurate than the other three methods. Only the new method guaranteed complete reconstruction and maintained consistent performance across the different datasets. For example, P-BSHADE performed well on the SO₂ dataset but worse on the others, and ST-HC performed well on the NO₂ dataset but worse on the others. This variability arises because different datasets have very different missing data patterns, and the existing methods interpolate directly from the raw data without first eliminating the influence of these patterns.

Figure 13. Performance of ST-2SMR on different spatio-temporal datasets: (a) NO₂ dataset; (b) CO dataset; (c) SO₂ dataset; (d) O₃ dataset.

4.3.8. Evaluation of Computational Efficiency

Computational efficiency is also an important consideration for missing data reconstruction. We compared the computational efficiency of the methods on a 3.4-GHz Intel i7 CPU with a 64-bit operating system and 16.0 GB of RAM. Figure 14 shows the CPU time of each interpolation method in the prediction stage (using 10% of the samples as a test dataset). The four methods do not differ markedly in computational cost. ST-kriging is the fastest because it does not account for the effects of spatial and temporal heterogeneity on the interpolation results. ST-HC and ST-2SMR take only slightly more time than ST-kriging and P-BSHADE; as extensions of P-BSHADE, they additionally consider temporal and spatial heterogeneity. Moreover, the linear combination that ST-HC uses to integrate the spatio-temporal interpolation results takes almost the same time as the nonlinear combination performed by a trained neural network, so the time costs of ST-HC and ST-2SMR are nearly identical. ST-2SMR therefore achieves a favorable trade-off between efficiency and accuracy, because the MREs of the other methods are considerably larger than that of ST-2SMR (Table 3). When both efficiency and accuracy are considered, the proposed ST-2SMR outperforms the other methods.

Figure 14. Comparison of the computational time costs for different interpolation methods.

5. Summary

To address the shortcomings of existing missing data interpolation methods, this study developed a novel method called ST-2SMR. In ST-2SMR, missing data patterns in spatio-temporal datasets are first identified, and information from the temporal and spatial dimensions is integrated to obtain a partial reconstruction. Building on this partial reconstruction, spatial and temporal heterogeneity are taken into account, and a sliding window is set both to remove redundant sample data (i.e., to reduce computational complexity) and to select the data most strongly correlated with the missing data (i.e., to improve accuracy). Finally, the spatio-temporal interpolation results are integrated through a neural network model. We evaluated ST-2SMR using a real, open air quality dataset from Beijing, and the results show that the proposed method outperforms existing methods. Given the black-box nature of neural network models, the best way to integrate spatio-temporal interpolation results and to describe the nonlinear relationships between space and time requires further investigation.

Acknowledgments

This research is supported by the State Key Research Development Program of China (Grant No. 2016YFB0502104), the National Natural Science Foundation of China (41631177), and the Key Research Program of the Chinese Academy of Sciences (Grant No. ZDRW-ZS-2016-6-3). Their support is gratefully acknowledged. We also thank the anonymous referees for their helpful comments and suggestions.

Author Contributions

Shifen Cheng and Feng Lu conceived the idea for the research and wrote the paper; Shifen Cheng implemented the ST-2SMR model and carried out the experimental validation. Feng Lu interpreted the results and provided important comments and suggestions for this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wang, J.; Ge, Y.; Li, L. Spatiotemporal data analysis in geography. Acta Geogr. Sin. 2014, 69, 1326–1345.
2. Deng, M.; Fan, Z.; Liu, Q. A Hybrid Method for Interpolating Missing Data in Heterogeneous Spatio-Temporal Datasets. ISPRS Int. J. Geo-Inf. 2016, 5, 13.
3. Gao, Z.; Cheng, W.; Qiu, X. A Missing Sensor Data Estimation Algorithm Based on Temporal and Spatial Correlation. Int. J. Distrib. Sens. Netw. 2015, 2015, 1–10.
4. Galán, C.O.; Lasheras, F.S.; Juez, F.J.D.C.; Sánchez, A.B. Missing data imputation of questionnaires by means of genetic algorithms with different fitness functions. J. Comput. Appl. Math. 2017, 311, 704–717.
5. Durán-Rosal, A.M.; Hervás-Martínez, C.; Tallón-Ballesteros, A.J. Massive missing data reconstruction in ocean buoys with evolutionary product unit neural networks. Ocean Eng. 2016, 117, 292–301.
6. Tak, S.; Woo, S.; Yeo, H. Data-Driven Imputation Method for Traffic Data in Sectional Units of Road Links. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1762–1771.
7. Tonini, F.; Dillon, W.W.; Money, E.S. Spatio-temporal reconstruction of missing forest microclimate measurements. Agric. For. Meteorol. 2016, 218–219, 1–10.
8. Londhe, S.; Dixit, P.; Shah, S. Infilling of missing daily rainfall records using artificial neural network. ISH J. Hydraul. Eng. 2015, 21, 255–264.
9. Tipton, J.; Hooten, M.; Goring, S. Reconstruction of spatio-temporal temperature from sparse historical records using robust probabilistic principal component regression. Adv. Stat. Clim. Meteorol. Oceanogr. 2017, 3, 1–16.
10. Ruan, W.; Xu, P.; Sheng, Q.Z. Recovering Missing Values from Corrupted Spatio-Temporal Sensory Data via Robust Low-Rank Tensor Completion. In International Conference on Database Systems for Advanced Applications; Springer: Cham, Switzerland, 2017.
11. Lu, G.Y.; Wong, D.W. An adaptive inverse-distance weighting spatial interpolation technique. Comput. Geosci. 2008, 34, 1044–1055.
12. Bartier, P.M.; Keller, C.P. Multivariate interpolation to incorporate thematic surface data using inverse distance weighting (IDW). Comput. Geosci. 1996, 22, 795–799.
13. Pesquer, L.; Cortés, A.; Pons, X. Parallel ordinary kriging interpolation incorporating automatic variogram fitting. Comput. Geosci. 2011, 37, 464–473.
14. Bhattacharjee, S.; Mitra, P.; Ghosh, S.K. Spatial Interpolation to Predict Missing Attributes in GIS Using Semantic Kriging. IEEE Trans. Geosci. Remote Sens. 2014, 52, 4771–4780.
15. Dutilleul, P. Spatio-Temporal Heterogeneity: Concepts and Analyses; Cambridge University Press: Cambridge, UK, 2011.
16. Xu, C.; Wang, J.; Hu, M. Interpolation of Missing Temperature Data at Meteorological Stations Using P-BSHADE. J. Clim. 2013, 26, 7452–7463.
17. Yozgatligil, C.; Aslan, S.; Iyigun, C. Comparison of missing value imputation methods in time series: The case of Turkish meteorological data. Theor. Appl. Climatol. 2013, 112, 143–167.
18. Gardner, E.S., Jr. Exponential smoothing: The state of the art–Part II. Int. J. Forecast. 2006, 22, 637–666.
19. Li, Y.; Li, Z. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transp. Res. Part C Emerg. Technol. 2013, 34, 108–120.
20. Ran, B.; Tan, H.; Wu, Y. Tensor based missing traffic data completion with spatial-temporal correlation. Phys. A Stat. Mech. Appl. 2016, 446, 54–63.
21. Qi, H.; Liu, M.; Wang, D. Spatial-Temporal Congestion Identification Based on Time Series Similarity Considering Missing Data. PLoS ONE 2016, 11, e162043.
22. Holland, R.C.; Jones, G.; Benschop, J. Spatio-temporal modelling of disease incidence with missing covariate values. Epidemiol. Infect. 2015, 143, 1777–1788.
23. Reynolds, K.M.; Madden, L.V. Analysis of epidemics using spatio-temporal autocorrelation. Phytopathology 1988, 78, 240–246.
24. Li, D.; Deogun, J.; Spaulding, W. Towards Missing Data Imputation: A Study of Fuzzy K-means Clustering Method. Lect. Notes Comput. Sci. 2004, 3066, 573–579.
25. Qu, L.; Li, L.; Zhang, Y. PPCA-based missing data imputation for traffic flow volume: A systematical approach. IEEE Trans. Intell. Transp. Syst. 2009, 10, 512–522.
26. Kong, L.; Xia, M.; Liu, X.Y. Data Loss and Reconstruction in Sensor Networks. IEEE Infocom 2013, 25, 1654–1662.
27. Yi, X.; Zheng, Y.; Zhang, J. ST-MVL: Filling Missing Values in Geo-Sensory Time Series Data. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016; pp. 2704–2710.
28. Karydas, C.G.; Gitas, I.Z.; Koutsogiannaki, E. Evaluation of spatial interpolation techniques for mapping agricultural topsoil properties in Crete. EARSeL eProceedings 2009, 8, 26–39.
29. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
30. Zheng, Y.; Yi, X.; Li, M. Forecasting fine-grained air quality based on big data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015.
31. Cesare, L.D.; Myers, D.E.; Posa, D. Estimating and modeling space-time correlation structures. Stat. Probab. Lett. 2001, 51, 9–14.
32. Wang, J.F.; Li, X.H.; Christakos, G. Geographical Detectors-Based Health Risk Assessment and its Application in the Neural Tube Defects Study of the Heshun Region, China. Int. J. Geogr. Inf. Sci. 2010, 24, 107–127.
33. Fan, Z.; Gong, G.; Liu, B. A Space-time Interpolation Method of Missing Data Based on Spatio-temporal Heterogeneity. Acta Geod. Cartogr. Sin. 2016, 45, 458–465.
34. Kondrashov, D.; Ghil, M. Spatio-temporal filling of missing points in geophysical data sets. Nonlinear Proc. Geophys. 2006, 13, 151–159.


Table 1. Experimental dataset.

Attribute	Missing Ratio	Complete Cases	Number of Missing Values
PM2.5	13.25%	29.11%	41,771
CO	15.10%	0.00%	47,604
SO₂	15.24%	0.00%	48,041
O₃	15.43%	0.00%	48,667
NO₂	16.01%	0.00%	50,470

Table 2. Combination of different methods. P-BSHADE, point estimation model of biased hospital-based area disease estimation; ST, spatio-temporal; HC, heterogeneous covariance.

Method	Coarse-Grained	Window	Coarse-Grained + Window
ST-kriging	ST-kriging-C	ST-kriging-W	ST-kriging-C-W
P-BSHADE	P-BSHADE-C	P-BSHADE-W	P-BSHADE-C-W
ST-HC	ST-HC-C	ST-HC-W	ST-HC-C-W

Table 3. Performance comparison of different methods. RC, ratio of reconstruction; ST-2SMR, two-step method for spatio-temporal missing data reconstruction.

Condition	Method	MAE	MRE	RC
None	ST-Kriging	18.3796	0.2211	96.20%
	P-BSHADE	18.2085	0.2190	96.20%
	ST-HC	26.4273	0.3066	65.94%
	ST-2SMR	15.5247	0.1319	65.94%
Window	ST-Kriging-W	14.3281	0.1724	99.60%
	P-BSHADE-W	14.6172	0.1758	99.29%
	ST-HC-W	11.2211	0.1349	93.69%
	ST-2SMR	9.5920	0.0822	93.69%
Coarse-Grained	ST-Kriging-C	13.1726	0.1585	100%
	P-BSHADE-C	12.9178	0.1554	100%
	ST-HC-C	8.7650	0.1054	100%
	ST-2SMR	7.4292	0.0470	100%
Coarse-Grained + Window	ST-Kriging-C-W	12.9717	0.1560	100%
	P-BSHADE-C-W	12.6669	0.1524	100%
	ST-HC-C-W	7.9196	0.0953	100%
	ST-2SMR	7.2285	0.0623	100%

Table 4. Performance of different coarse-grained interpolation methods ¹. SES, simple exponential smoothing.

Fine-Grained Method	IDW + SES MAE	IDW + SES MRE	ST-Kriging MAE	ST-Kriging MRE	P-BSHADE MAE	P-BSHADE MRE
ST-kriging	13.1726	0.1585	13.6027	0.1636	13.6010	0.1636
P-BSHADE	12.9178	0.1554	13.5883	0.1635	13.6081	0.1637
ST-HC	8.7650	0.1054	9.0637	0.1089	9.1601	0.1101
ST-2SMR	7.4292	0.0470	7.4826	0.0475	7.6002	0.0484

¹ Columns indicate the coarse-grained interpolation method; rows indicate the fine-grained interpolation method.

Table 5. Influence of the number of sample points on interpolation results.

Spatial Neighbors (ms)	Temporal Neighbors (nt)	MAE	MRE	RC
5	5	7.3300	0.0625	100%
5	10	7.3276	0.0635	100%
5	15	7.4736	0.0643	100%
10	5	7.2787	0.0630	100%
10	10	7.2285	0.0623	100%
10	15	7.2761	0.0630	100%
15	5	7.2952	0.0631	100%
15	10	7.2892	0.0631	100%
15	15	7.3332	0.0650	100%

Table 6. Two- and three-step interpolation results.

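Method	Three-Step ST-Kriging MAE	Three-Step ST-Kriging MRE	Three-Step P-BSHADE MAE	Three-Step P-BSHADE MRE	Three-Step ST-HC MAE	Three-Step ST-HC MRE	Two-Step IDW + SES MAE	Two-Step IDW + SES MRE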
ST-Kriging	13.0498	0.1570	13.0235	0.1567	13.0741	0.1573	13.1726	0.1585
P-BSHADE	12.7898	0.1539	12.7964	0.1539	12.8304	0.1543	12.9178	0.1554
ST-HC	8.7075	0.1048	8.7206	0.1049	8.6618	0.1042	8.7650	0.1054
ST-2SMR	7.4143	0.0651	7.4215	0.0652	7.4080	0.0641	7.4292	0.0470

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
