A Hybrid Method for Interpolating Missing Data in Heterogeneous Spatio-Temporal Datasets

Min Deng 1, Zide Fan 1, Qiliang Liu 1,* and Jianya Gong 2 1 Department of Geo-Informatics, Central South University, Changsha 410083, China; dengmin208@tom.com (M.D.); fanzide@msn.com (Z.F.) 2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China; gongjy@whu.edu.cn * Correspondence: qiliang.liu@csu.edu.cn; Tel.: +86-137-8611-5024; Fax: +86-731-8883-6783


Introduction
Evolving patterns of geographical phenomena are usually modeled as spatio-temporal processes depicted by space-time data.Nearly all instrumental space-time data are influenced by missing data [1,2].Before data analysis, missing data must be well handled.To treat missing data, two common approaches are available.One is to exclude periods with missing values from data analysis, and the other is to ignore the missing data based on the tacit assumption that the data represent one continuous series [3,4].However, these approaches may disregard useful information and bias the analysis results [2].To overcome these limitations, a number of interpolation methods have been proposed to estimate missing observations in space-time data.Most of these methods assume that the interpolation of space-time data can be reducible to a sequence of spatial interpolations [5].However, applying spatial interpolation methods to space-time data usually leads to the loss of valuable information in the temporal dimension [6].On that account, space-time interpolation methods that consider both spatial and temporal dimensions have been paid more attention and have been widely used in geoscience [7][8][9].
Currently, a series of space-time interpolation methods have been developed based on spatial interpolation methods-i.e., space-time inverse distance weighting methods, space-time kriging methods, covariance function plays a key role in spatio-temporal kriging methods.There are two kinds of spatio-temporal covariance functions: separable and non-separable models [25,29].In the separable model, the spatio-temporal covariance function is treated as either a sum or product of separate spatial and temporal covariance functions [30].In the non-separable model, the spatio-temporal covariance function is treated as a non-linear, multiplicative version of the spatial and temporal covariance functions [6,10,[31][32][33].However, spatio-temporal kriging methods assume that a space-time process has a constant mean and variance (i.e., second order stationarity) in space and time [13,15].To overcome this limitation, a point estimation model of biased hospitals-based area disease estimation, named P-Bshade, is developed [5,34].P-Bshade directly calculates the covariance of the historical observation data to determine the weighting coefficients of surrounding observation stations.The P-Bshade method is mainly designed for spatial interpolation; thus, the temporal dimension is not considered to estimate the missing data.

A Critical Analysis of Existing Work and Our Strategy
Based on the above analysis, the performance of existing methods can be summarized as follows: (a) Currently, missing data in a spatio-temporal dataset are mainly estimated by using spatial interpolation methods-e.g., spatial regression models, SRT, IDW, kriging, and P-Bshade.The neglect of time dimension will lead to the loss of valuable information in the estimation of missing data; and (b) Although a few spatio-temporal interpolation methods are currently available for estimating missing data-e.g., STIDW and spatio-temporal kriging-the heterogeneity (i.e., second-order non-stationarity) of spatio-temporal data should be further considered [35].
ISPRS Int.J. Geo-Inf.2016, 5, 13 3 of 14 covariance function plays a key role in spatio-temporal kriging methods.There are two kinds of spatio-temporal covariance functions: separable and non-separable models [25,29].In the separable model, the spatio-temporal covariance function is treated as either a sum or product of separate spatial and temporal covariance functions [30].In the non-separable model, the spatio-temporal covariance function is treated as a non-linear, multiplicative version of the spatial and temporal covariance functions [6,10,[31][32][33].However, spatio-temporal kriging methods assume that a space-time process has a constant mean and variance (i.e., second order stationarity) in space and time [13,15].To overcome this limitation, a point estimation model of biased hospitals-based area disease estimation, named P-Bshade, is developed [5,34].P-Bshade directly calculates the covariance of the historical observation data to determine the weighting coefficients of surrounding observation stations.The P-Bshade method is mainly designed for spatial interpolation; thus, the temporal dimension is not considered to estimate the missing data.

A Critical Analysis of Existing Work and Our Strategy
Based on the above analysis, the performance of existing methods can be summarized as follows: (a) Currently, missing data in a spatio-temporal dataset are mainly estimated by using spatial interpolation methods-e.g., spatial regression models, SRT, IDW, kriging, and P-Bshade.The neglect of time dimension will lead to the loss of valuable information in the estimation of missing data; and (b) Although a few spatio-temporal interpolation methods are currently available for estimating missing data-e.g., STIDW and spatio-temporal kriging-the heterogeneity (i.e., second-order non-stationarity) of spatio-temporal data should be further considered [35].

Missing data checked and tagged
Space-time data

Spatio-temporal integration
Cross validation  On that account, a new strategy should be developed.In this study, we assume that both spatial and temporal distributions of the spatio-temporal data are non-homogeneous (i.e., second-order non-stationarity).Motivated by the separable spatio-temporal covariance function, the space-time variable of interest is treated as a sum of independent spatial and temporal non-stationarity components.Heterogeneous covariance functions are then constructed for both spatial and temporal dimensions, and objective functions are maximized to obtain the best linear unbiased estimates in spatial and temporal dimensions.Finally, spatial and temporal correlations are considered to combine the interpolation results in spatial and temporal dimensions to estimate the missing data.The performance of the space-time interpolation operation will be evaluated by using cross-validation.In Figure 1, the strategy proposed in this study is shown.In the following, the spatio-temporal interpolation method will be represented.

Hybrid Interpolation Method for Heterogeneous Spatio-Temporal Data
Based on the strategy introduced in Section 2.2, spatial and temporal interpolations will be performed.A weighted sum of observations is used to obtain unbiased and minimal error variance estimates of missing data.To calculate the weights, heterogeneous covariance functions are constructed for spatial and temporal dimensions.

Heterogeneous Covariance Functions for Handling Space-Time Heterogeneity
As elaborated in Figure 1, we first check the space-time data to tag the unsampled space-time position.Then, a hierarchical clustering method-REDCAP (regionalization with dynamically constrained agglomerative clustering and partitioning) was employed to partition the study area into homogenous spatial regions based on the average observations at each station [36,37].The REDCAP method consists of two steps: spatial contiguity constrained hierarchical clustering and spatially-contiguous tree partitioning.In the first step, a hierarchical clustering method is first used to construct a spatially contiguous tree.In this study, average linkage is selected as the hierarchical clustering method.The spatially-contiguous tree is then partitioned into a number of sub-trees by minimizing an objective function.The objective function is defined as the total heterogeneity of all regions, represented as: where k is the number of regions, n i is the number of objects in ith region, x ij is the attribute value of the jth object in ith region, and x i is the mean attribute value of the ith region.
After the homogenous sub-regions are obtained, the missing values are interpolated in the temporal dimension.The m temporal neighbors of the missing value will be generated at first.As shown in Figure 2, let t 0 be a missing value in region k i at the time layer T 0 , m most correlated time layers of T 0 are determined based on region k i .In detail, the pair-wise objects for calculating the correlation between time layer T 0 and T j is identified as observed records in k i with same location.Then, at each correlated time layer of T 0 , the observed record with the same location of t 0 is identified as a temporal neighbor of t 0 .For example, in Figure 2, if {T 1 , T 2 , . . ., T m } are the m most correlated time layers of T 0 , the m temporal neighbors of t 0 are formed by {t 1 , t 2 , . . ., t m }.In the temporal dimension, the estimated value ( t0 ) of missing record t 0 is calculated according to Equation (2): where t j denotes the jth temporal neighbor of missing record t 0 in temporal dimension, and ϕ j denotes the corresponding contribution weight of t j .The missing records can be calculated by the other records of missing observation stations using Equation (2).To ensure that t0 is the unbiased estimate for the missing records, the following relationship should be satisfied: where t 0 represents the real value of missing value, E(t 0 ) represents the statistical expectation of t 0 in temporal dimension.Substituting Equation (2) into Equation (3), Equation (4) can be rewritten as: In view of temporal heterogeneity, we introduce a parameter of ratio a j calculated according to Equation (5).The parameter a j is used to represent the heterogeneity between two time layers: In Equation (5), if E(t j ) = 0 or E(t 0 ) = 0, a small positive constant (e.g., 0.0001) is added to E(t j ) or E(t 0 ).Combining Equations ( 4) and ( 5), the unbiased estimates of ϕ j can be determined by minimizing the variance E `t 0 ´t0 ˘2according to Equation (6).
With the restrictions of Equation ( 6), the minimized estimation error variance in temporal dimension can be represented as follows: where υ is a Lagrange multiplier.Furthermore, we can calculate the weight ϕ j based on Equation ( 7) and obtain the estimates of unsampled data based on Equation (2).
ISPRS Int.J. Geo-Inf.2016, 5, 13 5 of 14 where t0 represents the real value of missing value, E(t0) represents the statistical expectation of t0 in temporal dimension.Substituting Equation (2) into Equation (3), Equation ( 4) can be rewritten as: In view of temporal heterogeneity, we introduce a parameter of ratio aj calculated according to Equation (5).The parameter aj is used to represent the heterogeneity between two time layers: In Equation ( 5), if E(tj) = 0 or E(t0) = 0, a small positive constant (e.g., 0.0001) is added to E(tj) or E(t0).Combining Equations ( 4) and ( 5), the unbiased estimates of φj can be determined by minimizing the variance ̂ according to Equation ( 6).
With the restrictions of Equation ( 6), the minimized estimation error variance in temporal dimension can be represented as follows: where υ is a Lagrange multiplier.Furthermore, we can calculate the weight φj based on Equation ( 7) and obtain the estimates of unsampled data based on Equation (2).After the interpolation in the temporal dimension is finished, the interpolation in the spatial dimension will be implemented similar to that in the temporal dimension.In each homogenous sub-region, the largest correlated n stations are selected to interpolate the missing value in a station, After the interpolation in the temporal dimension is finished, the interpolation in the spatial dimension will be implemented similar to that in the temporal dimension.In each homogenous sub-region, the largest correlated n stations are selected to interpolate the missing value in a station, and the correlation between two stations is calculated based on the observed time series.The estimated value of missing value ŷ0 is calculated according to Equation ( 8 where y i denotes the observed record in station i at the same time layer of the missing value y 0 , w i denotes the corresponding contribution weight of y i , and ŷ0 is defined as an unbiased estimate for the missing value y 0 .Similarly to Equations ( 6) and ( 7), the minimized estimation error variance in spatial dimension can be represented as follows: where µ is a Lagrange multiplier.Furthermore, we can calculate the weight w i based on Equation ( 9) and obtain the estimates of the missing values based on Equation (8).

Estimating Spatio-Temporal Missing Data by Combining Both Spatial and Temporal Information
To calculate the missing value t 0 in region k i at time layer T 0 in temporal dimension, m most correlated time layers and m temporal neighbors are first generated based on the method introduced in Section 3.1.In Equation (7), minimizing σ 2 t0 with respect to weights ϕ j (j = 1, 2 , ..., m) and taking the partial derivative with respect to ϕ j , Equation ( 7) can be expanded into a matrix equation as follows: where υ is also the Lagrange multiplier, C ´tj , t j 1 ¯is the covariance between the jth time layer and the j'th time layer calculated based on the records in region k i , and pair-wise objects for calculating C ´tj , t j 1 ¯are identified as records in k i with the same location.C `tj , t 0 ˘is the covariance between the jth time layer and time layer T 0 with missing values.The calculation of C `tj , t 0 ˘is similar to that of C ´tj , t j 1 ¯; however, only observed records are involved to calculated the covariance.a j denotes a ratio between time layer T j and time layer T 0 with missing values (calculated by Equation ( 5)).Then, to calculate the missing value y 0 in the region k i at time layer T 0 in the spatial dimension, the largest correlated n stations are first selected based on the method introduced in Section 3.1.As shown in Figure 3, the red star is the station with the missing value y 0 .The green stars represent the largest correlated n stations of the station with the missing value (n = 5).The red dots represent the temporal neighbors of the missing value, and the green dots represent other observed records.In Equation ( 9), minimizing σ 2 ŷ0 with respect to weights w i (i = 1, 2, ..., n) and taking the partial derivative with respect to w i , Equation ( 9) can be expanded into a matrix equation as Equation ( 11): where µ denotes the Lagrange multiplier, C py i , y i 1 q on the left-hand side of Equation (11) is the covariance between the station i and the station i', calculated based on the time series of these two stations.C py i , y 0 q on the right-hand side of Equation (11) is the covariance between the station i and station with missing value, calculated based on the observed time series of these two stations.b i denotes a ratio between station i and the station with the missing value, calculated by E(y i )/E(y 0 ), where E(y i ) represents mean value of the time series at station i.The matrix consisting of the contribution weights w i can be calculated by Equation (11).
Finally, the estimated values in the spatial and temporal dimensions should be integrated to obtain the overall estimated value Y ij of the missing value.There are mainly two kinds of space-time geostatistical models, i.e., the separable model and non-separable model [29].The separable model is easy to implement; however, the space-time interaction may be not well considered.Although the non-separable model is able to consider the space-time interaction, in theory, the construction of the non-separable model for the non-stationarity space-time variable is very difficult [30].In this study, spatial and temporal dimensions are both considered to calculate the interpolation results in spatial or temporal dimensions, e.g., the solution of Equations ( 10) and (11).Thus, we think that, to some degree, space-time interaction is considered in spatial and temporal dimension interpolation.Therefore, the overall estimated value Y ij is defined as a weighted sum of estimated values in spatial and temporal dimensions, represented as follows: where i is the station number, j is the time series number, A is the weight in spatial dimension, and B is that in temporal dimension (A + B = 1).In this study, the weights in spatial and temporal dimensions are calculated according to the correlation coefficient, represented as follows: where n represents the number of spatial neighbors and m represents the number of temporal neighbors.R(y i , y 0 ) represents the correlation between the missing value and its spatial neighbors, measured by the correlation coefficient between the observed time series of station i and that of the station with missing value y 0 .R(t j , t 0 ) represents the correlation between the missing value and its temporal neighbors, measured by the correlation coefficient between time layer T 0 and T j calculated based on region k i (t j Pk i , t 0 Pk i ).From Equation ( 13), it can be found that if the missing value is more correlated with its spatial neighbors (temporal neighbors); thus, the weight in the spatial dimension (temporal dimension) will be heavy.After the weights in the spatial and temporal dimensions are calculated by solving Equation ( 13), the final estimation results of missing data can be obtained by Equation (12).
ISPRS Int.J. Geo-Inf.2016, 5, 13 7 of 14 Finally, the estimated values in the spatial and temporal dimensions should be integrated to obtain the overall estimated value Yij of the missing value.There are mainly two kinds of space-time geostatistical models, i.e., the separable model and non-separable model [29].The separable model is easy to implement; however, the space-time interaction may be not well considered.Although the non-separable model is able to consider the space-time interaction, in theory, the construction of the non-separable model for the non-stationarity space-time variable is very difficult [30].In this study, spatial and temporal dimensions are both considered to calculate the interpolation results in spatial or temporal dimensions, e.g., the solution of Equations ( 10) and (11).Thus, we think that, to some degree, space-time interaction is considered in spatial and temporal dimension interpolation.Therefore, the overall estimated value Yij is defined as a weighted sum of estimated values in spatial and temporal dimensions, represented as follows: where i is the station number, j is the time series number, A is the weight in spatial dimension, and B is that in temporal dimension (A + B = 1).In this study, the weights in spatial and temporal dimensions are calculated according to the correlation coefficient, represented as follows: where n represents the number of spatial neighbors and m represents the number of temporal neighbors.R(yi, y0) represents the correlation between the missing value and its spatial neighbors, measured by the correlation coefficient between the observed time series of station i and that of the station with missing value y0.R(tj, t0) represents the correlation between the missing value and its temporal neighbors, measured by the correlation coefficient between time layer T0 and Tj calculated based on region ki (tj∈ki, t0 ∈ki).From Equation ( 13), it can be found that if the missing value is more correlated with its spatial neighbors (temporal neighbors); thus, the weight in the spatial dimension (temporal dimension) will be heavy.After the weights in the spatial and temporal dimensions are calculated by solving Equation ( 13), the final estimation results of missing data can be obtained by Equation (12).

Experiments and Results Analysis
The proposed interpolation method is implemented in MATLAB 2014b.The average annual temperature and precipitation data from 554 meteorological stations during the period from 1984 to 2009 is selected to validate the proposed interpolation method.These experimental data are provided by the China National Meteorological Information Center (CNMIC).In these datasets, some temperature and precipitation records are missing.In Figure 4, the numbers of missing records in different years are shown.One can find that the number of missing data decreased sharply around 1995, gradually reaching a stable level around that year.The maximal number of missing records is 30 (in the year 1984), and the minimal number of missing records is eight (in the year 2001 and 2002).

Experiments and Results Analysis
The proposed interpolation method is implemented in MATLAB 2014b.The average annual temperature and precipitation data from 554 meteorological stations during the period from 1984 to 2009 is selected to validate the proposed interpolation method.These experimental data are provided by the China National Meteorological Information Center (CNMIC).In these datasets, some temperature and precipitation records are missing.In Figure 4, the numbers of missing records in different years are shown.One can find that the number of missing data decreased sharply around 1995, gradually reaching a stable level around that year.The maximal number of missing records is 30 (in the year 1984), and the minimal number of missing records is eight (in the year 2001 and 2002).As illustrated in Figure 5, the whole study area including 554 observation stations are partitioned into different homogenous sub-regions.Dots of identical color belong to the same sub-region.The number of sub-regions is determined by REDCAP, which is set as 12 for temperature observations and eight for precipitation observations.The number of clusters is determined based on the prior knowledge (e.g., climate zones in China) and the clustering validity index (L-method).The number of neighboring stations is determined based on the existing study and clustering results.Based on experiments, Xu et al. [5] suggested that the number of neighboring stations can be set from 5-15.It is also known that stations are similar to one another in the same cluster and are dissimilar to stations in different stations.Thus, the number of neighboring stations cannot be larger than the size of the smallest cluster.On the basis of these principles, the number of spatial neighbors n is set to 10. Hubbard et al. [20,21] suggested that the number of temporal neighbors can be set to about half of the research period.In this study, the research period is 26, therefore the number of temporal neighbors m is also set to 10.
Particularly, the interpolation results of other three widely used methods (spatio-temporal kriging with product-sum covariance method [6], denoted as STKriging; spatio-temporal inverse distance weighting, denoted as STIDW; and point estimation model of biased hospital-based area disease estimation, denoted as P-Bshade), and this proposed method, i.e., space-time heterogeneous covariance method (denoted as STHC) are compared to evaluate the accuracy.To implement the STKriging, a data pre-process operation is first implemented to guarantee second-order stationarity.In the temporal dimension, no obvious trend or periodicity is found.In the spatial dimension, the As illustrated in Figure 5, the whole study area including 554 observation stations are partitioned into different homogenous sub-regions.Dots of identical color belong to the same sub-region.The number of sub-regions is determined by REDCAP, which is set as 12 for temperature observations and eight for precipitation observations.The number of clusters is determined based on the prior knowledge (e.g., climate zones in China) and the clustering validity index (L-method).The number of neighboring stations is determined based on the existing study and clustering results.Based on experiments, Xu et al. [5] suggested that the number of neighboring stations can be set from 5-15.It is also known that stations are similar to one another in the same cluster and are dissimilar to stations in different stations.Thus, the number of neighboring stations cannot be larger than the size of the smallest cluster.On the basis of these principles, the number of spatial neighbors n is set to 10. Hubbard et al. [20,21] suggested that the number of temporal neighbors can be set to about half of the research period.In this study, the research period is 26, therefore the number of temporal neighbors m is also set to 10.
Particularly, the interpolation results of other three widely used methods (spatio-temporal kriging with product-sum covariance method [6], denoted as STKriging; spatio-temporal inverse distance weighting, denoted as STIDW; and point estimation model of biased hospital-based area disease estimation, denoted as P-Bshade), and this proposed method, i.e., space-time heterogeneous covariance method (denoted as STHC) are compared to evaluate the accuracy.To implement the STKriging, a data pre-process operation is first implemented to guarantee second-order stationarity.In the temporal dimension, no obvious trend or periodicity is found.In the spatial dimension, the trend is estimated in each spatial location by using a moving window, and a local trend estimation procedure with an optimum window size proposed by Pelletier, et al. [38] is employed to estimate the trend.In each window, the polynomial of order one is used to model the trend.In addition, STKriging and STIDW are also performed in each sub-region obtained by the clustering method, denoted as STKriging-Partition and STIDW-Partition.To assess the performance of different interpolation methods, annual station records in China from 1984 to 2009 were estimated by leave-one-out cross-validation [39].Each observed record is first removed, and then the estimated value is compared with the true observed value.Three indicators, i.e., mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (r 2 ), are calculated to evaluate the accuracy of the interpolation results.The statistical results are shown in Table 1.
ISPRS Int.J. Geo-Inf.2016, 5, 13 9 of 14 trend is estimated in each spatial location by using a moving window, and a local trend estimation procedure with an optimum window size proposed by Pelletier, et al. [38] is employed to estimate the trend.In each window, the polynomial of order one is used to model the trend.In addition, STKriging and STIDW are also performed in each sub-region obtained by the clustering method, denoted as STKriging-Partition and STIDW-Partition.To assess the performance of different interpolation methods, annual station records in China from 1984 to 2009 were estimated by leave-one-out cross-validation [39].Each observed record is first removed, and then the estimated value is compared with the true observed value.Three indicators, i.e., mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (r 2 ), are calculated to evaluate the accuracy of the interpolation results.The statistical results are shown in Table 1.
(a) (b) As listed in Table 1, it can be found that the accuracy of our method is obviously higher than that of the other methods.For temperature and precipitation data, the MAE and RMSE values of our method are significantly lower than those of STKriging, STIDW, STKriging-Partition, and STIDW-Partition methods, showing great improvement in interpolation accuracies.The interpolation accuracy of our method is also higher than that of P-Bshade method, even though the improvement is less remarkable.The values of MAE and RMSE per year are shown in Figure 6.For the temperature data, the MAE error of the proposed method is significantly lower than other methods over the years, as well as RMSE.For the precipitation data, it also can be found that the proposed method has the least MAE and RMSE error.However, two serious errors appeared in the years 1985 and 2000.In Figure 7, scatterplots between observed values and the estimated values for each method are shown.The horizontal axis denotes the estimated value, and the vertical axis denotes the observed value.The blue dots depict the scatter of estimated values and observed values, and the red line represents the line y = x.If the estimated values are similar to the observed values, the blue dots As listed in Table 1, it can be found that the accuracy of our method is obviously higher than that of the other methods.For temperature and precipitation data, the MAE and values of our method are significantly lower than those of STKriging, STIDW, STKriging-Partition, and STIDW-Partition methods, showing great improvement in interpolation accuracies.The interpolation accuracy of our method is also higher than that of P-Bshade method, even though the improvement is less remarkable.The values of MAE and RMSE per year are shown in Figure 6.For the temperature data, the MAE error of the proposed method is significantly lower than other methods over the years, as well as RMSE.For the precipitation data, it also can be found that the proposed method has the least MAE and RMSE error.However, two serious errors appeared in the years 1985 and 2000.In Figure 7, scatterplots between observed values and the estimated values for each method are shown.The horizontal axis denotes the estimated value, and the vertical axis denotes the observed value.The blue dots depict the scatter of estimated values and observed values, and the red line represents the line y = x.If the estimated values are similar to the observed values, the blue dots should be close to the red line (y = x).It can be seen that, compared with other five methods, the estimated value by our method is much closer to its observed value-i.e., closer to the red reference line.For all the six methods, the interpolation results of the temperature data are all better than the results of the precipitation data.
The average interpolation error at each station is plotted in Figure 8.As illustrated in Figure 8a, the error distribution of temperature data is relatively uniform, where Southwest and Northwest China are significantly higher than the eastern region, mainly because the observatory stations in Western China are scarcer.In contrast to the precipitation data, one can see that the errors in Southern China are generally higher than those in Northern China, mainly due to more plentiful rainfall in the south than in the north, which results in increasingly abnormal situations.In Table 1, the residual autocorrelation measured by the Moran's I index is calculated for each method.It can be found that the proposed method has the least residual autocorrelation.Further, the Quantile-Quantile plot is performed to investigate the normality of the residuals obtained by the proposed method.Based on the results shown in Figure 9, one can find that the plots are close to linear.The kurtosis and skewness are also calculated for the residuals.It can be seen that kurtosis is close to three and skewness is close to zero; thus, it can be concluded that the distribution of the residuals is close to the normal distribution.
should be close to the red line (y = x).It can be seen that, compared with other five methods, the estimated value by our method is much closer to its observed value-i.e., closer to the red reference line.For all the six methods, the interpolation results of the temperature data are all better than the results of the precipitation data.
The average interpolation error at each station is plotted in Figure 8.As illustrated in Figure 8a, the error distribution of temperature data is relatively uniform, where Southwest and Northwest China are significantly higher than the eastern region, mainly because the observatory stations in Western China are scarcer.In contrast to the precipitation data, one can see that the errors in Southern China are generally higher than those in Northern China, mainly due to more plentiful rainfall in the south than in the north, which results in increasingly abnormal situations.In Table 1, the residual autocorrelation measured by the Moran's I index is calculated for each method.It can be found that the proposed method has the least residual autocorrelation.Further, the Quantile-Quantile plot is performed to investigate the normality of the residuals obtained by the proposed method.Based on the results shown in Figure 9, one can find that the plots are close to linear.The kurtosis and skewness are also calculated for the residuals.It can be seen that kurtosis is close to three and skewness is close to zero; thus, it can be concluded that the distribution of the residuals is close to the normal distribution.

Conclusions
This paper develops a space-time missing data interpolation method based on a heterogeneous spatio-temporal covariance model.Spatial and temporal heterogeneity are first considered in the construction of the covariance model, and the best linear unbiased estimates in spatial and temporal dimensions are obtained.According to the spatio-temporal correlation coefficient, spatial and temporal interpolation results are then integrated to estimate the missing values of the unsampled stations.Experiments and comparisons were performed by using the average annual temperature and precipitation data in China over the past 26 years.The experimental results show that the proposed method achieves higher accuracy than other classic methods.
It should be noted that the space-time interaction may not be fully considered by the proposed method.Although the interpolation results obtained by the proposed method are more accurate than those of existing methods, a space-time coupling model that can fully consider space-time interaction should be developed in the future.

Conclusions
This paper develops a space-time missing data interpolation method based on a heterogeneous spatio-temporal covariance model.Spatial and temporal heterogeneity are first considered in the construction of the covariance model, and the best linear unbiased estimates in spatial and temporal dimensions are obtained.According to the spatio-temporal correlation coefficient, spatial and temporal interpolation results are then integrated to estimate the missing values of the unsampled stations.Experiments and comparisons were performed by using the average annual temperature and precipitation data in China over the past 26 years.The experimental results show that the proposed method achieves higher accuracy than other classic methods.
It should be noted that the space-time interaction may not be fully considered by the proposed method.Although the interpolation results obtained by the proposed method are more accurate than those of existing methods, a space-time coupling model that can fully consider space-time interaction should be developed in the future.Program of China (2013AA122301)", "The Hunan Funds for Excellent Doctoral Dissertation (CX2014B050)", "The Central South University Funds for Excellent Doctoral Dissertation (2015zzts067)" Grants, and "The State Key Laboratory of Resources and Environmental Information System".

Conclusions
This paper develops a space-time missing data interpolation method based on a heterogeneous spatio-temporal covariance model.Spatial and temporal heterogeneity are first considered in the construction of the covariance model, and the best linear unbiased estimates in spatial and temporal dimensions are obtained.According to the spatio-temporal correlation coefficient, spatial and temporal interpolation results are then integrated to estimate the missing values of the unsampled stations.Experiments and comparisons were performed by using the average annual temperature and precipitation data in China over the past 26 years.The experimental results show that the proposed method achieves higher accuracy than other classic methods.
It should be noted that the space-time interaction may not be fully considered by the proposed method.Although the interpolation results obtained by the proposed method are more accurate than those of existing methods, a space-time coupling model that can fully consider space-time interaction should be developed in the future.

Figure 1 .
Figure 1.The flowchart of this method.Figure 1.The flowchart of this method.

Figure 1 .
Figure 1.The flowchart of this method.Figure 1.The flowchart of this method.

Figure 2 .
Figure 2. Interpolation in temporal dimension (blue dots represent observed records; red dotted lines are used to represent the spatial relationships).

Figure 2 .
Figure 2. Interpolation in temporal dimension (blue dots represent observed records; red dotted lines are used to represent the spatial relationships).

Figure 3 .
Figure 3. Interpolation in the spatial dimension.Figure 3. Interpolation in the spatial dimension.

Figure 3 .
Figure 3. Interpolation in the spatial dimension.Figure 3. Interpolation in the spatial dimension.

Figure 4 .
Figure 4.The number of missing records from 1984 to 2009.

Figure 4 .
Figure 4.The number of missing records from 1984 to 2009.

Figure 7 .
Figure 7. Scatterplot between observed value and the estimated value; (a) temperature and (b) precipitation datasets.Figure 7. Scatterplot between observed value and the estimated value; (a) temperature and (b) precipitation datasets.

Figure 7 .
Figure 7. Scatterplot between observed value and the estimated value; (a) temperature and (b) precipitation datasets.Figure 7. Scatterplot between observed value and the estimated value; (a) temperature and (b) precipitation datasets.

Table 1 .
Experimental results of different interpolation methods.

Table 1 .
Experimental results of different interpolation methods.