Interpolation in Time Series: An Introductive Overview of Existing Methods, Their Performance Criteria and Uncertainty Assessment

: A thorough review has been performed on interpolation methods to ﬁll gaps in time-series, efﬁciency criteria, and uncertainty quantiﬁcations. On one hand, there are numerous available methods: interpolation, regression, autoregressive, machine learning methods, etc. On the other hand, there are many methods and criteria to estimate efﬁciencies of these methods, but uncertainties on the interpolated values are rarely calculated. Furthermore, while they are estimated according to standard methods, the prediction uncertainty is not taken into account: a discussion is thus presented on the uncertainty estimation of interpolated/extrapolated data. Finally, some suggestions for further research and a new method are proposed.


Introduction
For numerous purposes, different time series are recorded and analyzed to understand phenomena or/and behaviors of variables, to try to predict future values, etc.Unfortunately, and for several reasons, there are gaps in data, irregular time steps of recordings, or removed data points that often need to be filled for data analysis, calibration of models, or for data with a regular time step.Generally in practice, incomplete series are the rule [1].In order to fill gaps in time series, numerous methods are available, and the choice of the method(s) to apply is not easy for non-mathematicians among data users.This paper presents a general overview, as a general and non-detailed introduction to such methods, especially written for readers without an introduction on interpolation methods.
First to all, a distinction should be clearly stated: the difference between interpolation and extrapolation.A first approach can be defined as the following.To fill gaps and, respectively, to predict future values, often require similar methods [2], named as interpolation and extrapolation methods respectively.A second approach distinguishes between both cases, while comparing the range of existing and missing data.If there is a model to fill the gaps in a time series, if predictions are similar in content, and there is variability to observations, model errors should be equivalent for the existing and the missing data set (interpolation).Otherwise an extrapolation (i.e., predictions are outside of the observation range) has been performed, and expected errors would be greater for the predictions.An algorithm based on the data depth theory has been coded to make the distinction between interpolation and extrapolation [3].Differences between existing and missing data in those methods could cause significant errors [4].This paper deals with the interpolation method, in the sense Water 2017, 9, 796 2 of 20 of a first approach without consideration of the potential differences between the ranges of existing and missing data.
Secondly, there are many different kinds of data.Economical, financial, chemical, physical and environmental variables do not have the same characteristics: they can range from random to polycyclic processes.This wide range of characteristics requires a wide range of methods to fill gaps, and poses problems to researchers and practitioners when they choose one of the available methods.One thing is clear: a straightforward calculation of prediction uncertainty does not yield satisfactory results.
Finally, an increasing proportion of data contains values and their uncertainties (standard uncertainties, 95% confidence intervals, etc.), and requires specific methods to properly assess the uncertainties of the interpolated data.Two normalized methods are recommended and presented in a fourth section to assess uncertainties: the law of propagation of uncertainties [5] and Monte-Carlo simulations [6].The estimation of prediction uncertainties can be easily performed with combined procedures.Assessment of the uncertainties of interpolated data is reported only seldomly: a brief review of the literature is presented in Section 4. Two articles are already mentioned here, due to the relevance of their data and ideas: [7], where the application of a Markovian arrival process to interpolate daily rainfall amount applied on a wide range of existing distributions is presented, and [8], where an overview of methods and questions/consequences related to the evaluation of uncertainties in missing data is presented.
According to [1], a useful interpolation technique should meet four criteria: (i) not a lot of data is required to fill missing values; (ii) estimation of parameters of the model and missing values are permitted at the same time; (iii) computation of large series must be efficient and fast, and (iv) the technique should be applicable to stationary and non-stationary time series.The selected method should also be accurate and robust.Prior to the application of interpolation methods, [9] mentioned two preliminary steps to apply prior to the determination of a model: (i) to separate the signal (trends of interest, i.e., relevant and dependent on subsequent use of the data) from the noise (variability without interest) as in [10], or (ii) to understand the past or existing data to better forecast the future, or to fill gaps in missing data.Despite the fact that smoothing algorithms, which may be used to distinguish signals while removing their noise, and interpolation methods are sometimes similar, this article does not deal with smoothing methods.
The following example aims at presenting pending scientific questions on this topic.Suppose rainfall intensity is measured at an hourly basis.The value at 1 p.m. is equal to 10 mm/h with a standard uncertainty of ±1 mm/h at 1 p.m.The instrument produced an error signal at 2 p.m. Finally, at 3 p.m., the instrument again logged 10 ± 1 mm/h.Under the assumption of randomness of the series (covariances and correlation coefficients are equal to 0, direct calculation by law of propagation of uncertainties (an average between 1 p.m. and 3 p.m. values) of the missing value at 2 p.m. yields 10 ± 0.7 mm/h.The standard uncertainty is marked off by calculations that are applied with the minimal (equal to −1) and the maximal (equal to +1) correlation coefficient values.These calculations lead to a standard uncertainty range between ± 0.0 and ± 1 mm/h, which is always smaller or equal to the measurement uncertainties.The smaller uncertainty value, due to the averaging process, reflects the uncertainty of a 2 h averaging period, whereas we are interested in a 1 h averaging period around 2 p.m., similar to the measurement points at 1 p.m. and 3 p.m.It is concluded that the prediction uncertainty cannot be estimated from the law of propagation of uncertainties [5].
This review paper is structured as follows.Numerous methods (see Section 2) have been published about filling gaps in data.This paper gives an accessible overview of those methods in Section 2. In spite of the recommended comparisons [11] between existing methods, only a few studies have compared methods (e.g., [11][12][13]); these are detailed in the second section.In the third section of this article, criteria applied and published in past studies for comparison of methods are given.Due to the numerous methods and criteria available, a comparison between different studies with numerical values is often impossible.To improve the readability of the present article, a common notation has been applied to present methods: x i are observed data, X i are the interpolated value(s) of missing data, Water 2017, 9, 796 3 of 20 i is an index, and ε are the residuals.Each of them apply only for univariate time-series.This paper does not deal with multivariate time series.

Interpolation Methods
Existing methods are detailed in the following Sections: Section 2.1 for deterministic methods, and Section 2.2 for stochastic methods.This distinction, based on the existence or non-existence of residuals (differences between predictions at known locations and observations) in the interpolation function, also deals with uncertainty.

Deterministic Methods
This first group includes methods that perform basic interpolation between the last known point before the gap (x B ), and the first one known after the gap (x A ).

Nearest-Neighbor Interpolation
The nearest neighbor or proximal interpolation, introduced by [14], is the easiest method [13,15]: the value of the closest known (existing) neighbor is assigned to the missing value (such as Equation ( 1)).Interpolated data by this method are discontinuous [16].This method often yields the worst results [13,16], and should not be confused with the k-nearest-neighbors methods.
where a and b are the indexes of x A and x B .

Polynomial Interpolation
Another basic method is linear interpolation [16].This method searches for a straight line that passes through the end points x A and x B .Various equivalent equations exist for this method, such as (2a) given in [16], and (2b) given in [17].
where α is the interpolation factor and varies from 0 to 1.This method has been considered to be useful and easy to use [18].By construction and according to [9], interpolated data are bound between x A and x B , and true values are, in average, underestimated: this affirmation is strongly dependent on the distribution of data (on which side the distribution is tailed-left or right) and should be verified for each data set.[16] demonstrated that this method is efficient, and most of the time it is better than non-linear interpolations for predicting missing values in environmental phenomena with constant rates.The linear interpolation could be extended to higher degrees (Equation (3)).Polynomial, piecewise polynomials, Lagrange form polynomials [9], cubic interpolations such as Cubic Hermite [16] or spline interpolations are different terms for the same technique.Data are exactly fitted with a local polynomial function.These methods differ with the degrees of the polynomial and the continuity of derivatives function.For polynomial or piecewise polynomial interpolation, known and interpolated data are continuous.For the cubic Hermite interpolation, the first derivative function is continuous [16].For spline techniques, the second derivative function is also continuous [16].
Water 2017, 9, 796 4 of 20 where N MAX is the maximum degree of the polynomial function, c ij are coefficients and x i are existing data.As an interpolation method, the amount of data used to estimate the coefficients is equal to K, in order to force the model to match the original records [9]: this is too strong a constraint for when data are uncertain.Sometimes these methods interpolate outside of the observed range of data, contrary to nearest-neighbor and linear interpolations [9], or with spurious oscillations due to too high a degree of polynomials.Bates et al. [19] proposed to penalize the second derivative function of the polynomial to avoid oscillations.Many types of splines are very popular for interpolation.B-splines and their derivative functions could be summed and weighted in MOMS (Maximal Order and Minimal Support) functions [10].

Methods Based on Distance Weighting
These methods are quite similar to the autoregressive methods, and use nearest values weighted by a p power of the distance d i to the interpolated point (see Equation ( 4)).
where n is the number of points used for the interpolation.These methods can be named inverse distance weighting (if p = 1) of inverse squared distance weighting (if p = 2).Equation (4) shows that the methods are not very different from a moving average.In the review by [20], Taylor's adjustments of these methods for specific applications (e.g., in [21]) have been proposed for detrended inverse distance weighting, [22] for angular distance weighting, etc.).

Methods Based on Fourier's Theory: Use of Signal Analysis Methods
In some studies, interpolation methods based on the Fourier's Transform (FT) are presented.Basically, a Fourier's transform converts a signal from time or spatial domain to frequency domain: the signal becomes a sum of complex magnitudes.Fourier was the first researcher who used function superpositions to describe physical phenomena: this method assumes that a signal is a sum of different periodic signals (with different periods and magnitudes).If the signal contains aperiodic variations, the FT needs a lot of frequencies.There are many kinds of partial transforms for phase alone, magnitude alone, or signed magnitude [23].Authors have pointed out the advantages of FT.Chen et al. [24] considered that FT is a good technique, especially for autocorrelation and power spectrum simulations.Musial et al. [9] qualified this method as suitable and efficient for predicting future values, especially if the behavior of the missing values was similar to the existing values.However, some disadvantages have been listed: FT requires Gaussian distribution as autoregressive methods [24], and a similar behavior between observations and predictions [9].FT is only suitable for stationary data, and without aperiodic extreme behaviors [25].
To avoid a high number of frequencies for aperiodic signals, [9] have used different basic functions as Legendre or Tchebicheff polynomials, and Empirical Orthogonal Functions (EOFs), such as an extension of Principal Component Analysis (PCA) for time-series, also called Spectral PCA (SPCA) by [26].In this last study, methods have been presented: data pre-processing and filtering of spectra.According to [26], SPCA is more efficient than a basic PCA, due to the fact that the power spectrum is invariant to delays and phase shifts, and that SPCA is less sensitive to the range of missing data (by comparison the range of existing ones) than PCA.Plazas-Nossa and Torres [27] have compared two methods: one based on Discrete FT (DFT), and one mixed with PCA (PCA/DFT).According to the NRMSD criteria (Table 1), efficiencies of these methods are strongly dependent on data and the time of extrapolation.No generalizable conclusions have been found.
Musial et al. [9] have compared two methods based on FT.The first one is called the Long-Scargle method and retrieves the periodogram of time-series with non-constant time steps.This method could be used with two moving windows: the Hamming or the Kaiser-Bessel [11].The second window appears to be less sensitive to periodic gaps, and offers better estimations close to the start or the end of the series than the first one.The second one is Singular Spectrum Analysis (SSA) presented by [28] based on EOFs, and where functions are chosen by the data itself [9].The time series is a sum of trends, oscillations, and noise functions.This method possesses a good theoretical background, and is suitable for a large variety of series (with non-linear trend and non-harmonic oscillations).However, SSA requires a lot of computational effort, and shows a poor performance for long and continuous gaps.
Bienkiewicz et al. [25] have reminded that the Short-Time FT (STFT)-a FT with a fixed size and moving window-is suitable for non-stationary data, but strongly limited by fixed size window and frequency resolution.
The Wavelet Transforms (WTs) are a generalization of the STFT: the size of the window varies with the frequency [25].Two types of WT exist: the Continuous (CWT, Equation ( 5)) and the Discrete (DWT).
where ϕ is a localized and oscillatory function called a wavelet [29], and C D and C T are the wavelet coefficients of dilatation and translation [30].These coefficients can be used to understand phenomena.A lot of wavelet functions are available.WT seems to be useful for data with sudden changes [25].
Each of these transforms based on Fourier's theory (FT, DFT, CFT, STFT, DWT, CWT) can be reversed to assess X(t) and fill in gaps: see e.g., [29] for CWT or [27] for DFT.
A final method has been found in the literature, called Fractal Interpolation [31].

Other Deterministic Interpolation Method
Polynomial interpolations could be extended to any other kind of function if there is a model with a theoretical (e.g., chemical, physical) process description: to predict a discharge with a water level, or to predict a tension with a current, etc.All of those models need to be defined by a user with expertise in the concerned topic.Logistic regressions can be used to predict the missing values of a binary variable [13].

Regression Methods
Regression methods differ from real interpolation methods in several aspects.The number of data used to calibrate parameters of the regression function is higher than the number of parameters (risk of overfitting).Splines become smoothing splines and the function does not fit each observation as exponential functions (Equation ( 6)- [16]).
where x 0 is the initial value, k is the rate of change, and E(i) represents the stochastic part.Gnauck [16] designated a part of these methods (and Auto-Regressive methods-Section 2.2.2) as approximation methods: residuals occur and may be a source of uncertainties in interpolated or predicted values.Apart from the nearest-neighbor method and a reformulation of linear interpolations (Equation (2) to Equation (3) with N MAX = 1), both methods presented in Section 2.1 are specific cases of regression methods.
An important advantage of these methods is that no assumptions need to be formulated about the nature of the phenomena [9].Several authors have pointed out two disadvantages: (i) polynomial interpolations, polynomial or logistic regressions, and splines are tedious [24], inefficient and inapplicable methods [32], and (ii) management of oscillations.Chen et al. [24] pointed out that these methods could miss some variations if a phenomenon is shorter than the size of the gap.N MAX can be chosen according to variables and gaps: [16] preferred non-linear interpolations if gaps lasted more than 28 days for water quality time series.The trade-off between significant and noise variability requires the expertise of the physical processes that underline the data regarding the missing data to be interpolated, and probably to check a posteriori the predictions (interpolated values).

AutoRegressive Methods
These methods are statistical parametric models [13,33] based on a Markovian process.Schlegel et al. [16] explained the basic idea of those methods: a stochastic signal X(t) can be extracted from white noise (ε(t), with an average equal to 0) through smoothing processes.
where p is the order of the autoregressive part, and ϕ j are coefficients.AR(1) is the Markov process.
According to [20], the method combining a state-space approach and the Kalman filter [33] is similar to AR(1).Moving Average models (MA(q), see Equation ( 8)) have been considered to be as efficient as linear interpolation for filling gaps [16].
where q is the order of the moving average part and θ k are coefficients.q could be extended to infinity: MA(∞), as presented in [34].ARMA(p,q) models are combinations [35] of the two last models (Equation ( 9)), sometimes called general or mixed ARMA processes [36].This model can be used only for stationary series [35], and is efficient if the series has Gaussian fluctuations [24].
Other equivalent equations are available to describe ARMA(p,q) models: e.g., in [37].Ref. [33,35] presents a good introduction to these techniques.Box and Jenkins [33] gave two recommendations: if q < p, the observational errors increase with p, and if q < p − 1, modeling observational errors directly may yield a more parsimonious representation [35].The introduction of [12] was a good update until the late 1990s.
For interpolation through ARMA(p,q) methods, [12] advised the following four steps: (i) application of the Kalman filter [38]; (ii) skipping the missing values; (iii) estimation of parameters (ϕ j and θ k ), and (iv) application of a smoothing algorithm (as Fixed Point Smoothing) to fill gaps [39].Three methods are available for estimating parameters at the third step.The most popular is the Maximum Likelihood Estimator (MLE), and this has been used by [35].A second option is the use of the linear least squares interpolator to estimate parameters, as in [2].Despite the fact that estimation of parameters is equivalent through both methods [36], Ref. [40] considered that this method (linear least square interpolator) to be conceptually and statistically superior to the MLE.Ref. [1,37] extended the use of the Least Absolute Deviation (LAD) to estimate parameters as a third option.A fourth method developed by [41], called Minimum Mean Absolute Error Linear Interpolator (MMAELI) seemed to be less sensitive to outliers or non-usual values, according to [20].
ARMA models have been used on residuals (after a linear regression) for interpolation by [42]: this model, called the Additive Mixed Model is, according to [20], similar to the structural time-series model presented in [43].Ref. [16] has mentioned ARMAX(p,q) by adding a measurable input signal (called X, external and used as control signal) to ARMA(p,q) model.There is a last method, based on ARMA(p,q) which is available for non-stationary series: ARIMA(p,d,q) models (Auto-Regressive Integrated Moving Average, Equation ( 10)).Algorithms are similar than for ARMA(p,q) models.According to [12], the attention of users should be focused on starting conditions of the Kalman filter [38] and a proper definition of the likelihood.
where d is the order of the integrated part.The ARIMA (0,1,1) is also called the simple smoothing exponential [4], or the Exponentially Weighted Moving Average (EWMA, in [44]), and is presented in Equations ( 11a) and (11b).
where α is the smoothing factor and varies from 0 to 1. Gomez et al. [12] proposed three approaches and compared with the RMSE criteria (see Table 1) to estimate missing values: one based on the SKipping approach (SK), one based on the Additive Outlier approach without determinant correction (AON), and a last approach with determinant correction (AOC).Comparisons have been done for pure AR, pure MA, and mixed models.The approaches yielded the same results if there was less than 5% missing data.When more data were missing (20%), the SK approach was clearly faster than the others, but the AOC sometimes gave very precise estimations compared to values given by SK or AON approaches.Differences are explained by convergence properties of reverse filters, and are not negligible for mixed models.Alonso and Sipols [39] demonstrated that a sieve bootstrap procedure is better than methods proposed by [12] for estimating missing values according to the coverage criteria (see Table 1).
where α is the smoothing factor, β is the trend smoothing factor, and γ is the seasonal change-smoothing factor.Each of them vary from 0 to 1.

Machine Learning Methods
Artificial Neural Networks (ANN) ANN have been applied since late 1990s for interpolation purposes as a promising method [24] to model and predict missing data.This method is able to approximate universal functions [46].Neurons are disposed in many layers: an input layer where input variables are added (here existing records), an output layer (here the estimation of the missing values), and between them, one or a few hidden layers.Neurons, also called nodes or 'perceptrons', are connected with weighted links and contain basic functions from vectors (input of the neuron) to scalar (output).This is a data-driven method, and a number of hypotheses have to be set a priori for the number of hidden layers and underlying models.
ANNs have precious advantages.They are able to learn underlying behaviors under widely varying circumstances [24] such as non-linear or complex relationships, and to work with noisy data [47].Of course there are some disadvantages as well.ANN can suffer from over-fitting and under-fitting, mainly due to the number of hidden layers [4].To avoid over-fitting, two solutions can be applied: (i) application of a smoothing method to reduce the noise of predictions, or (ii) application of an Adaptative Neural Network, as presented by [4].There is no theoretical background to choose the optimal shape of the model, special skills are needed, and this requires a lot of computation time [24].Training is necessary to calibrate the networks with observations [4]: parameters of the pattern (connection weights and nodes biases) can iteratively change to decrease prediction errors [48].Data for the learning process should be similar to the missing data, in order to provide accuracy and robust predictions.Some studies have been published to improve the efficiency of training in ANN [3].Chen et al. [24] used ANN for churn predictions.
The structure or shape of the network is strongly correlated to data characteristics [4].Zhang and Qi [49] suggested that a single hidden layer, where neurons are connected with acyclic links, is the most used shape of network for modeling and forecasting in time series.Many hidden layers are required to catch non-linear relationships between variables [4].
where α j and β ij are parameters of the model, i l is the number of neurons of the input layer, r is the number of hidden neurons, and g is the hidden-layer transfer function from hidden layer.The logistic function is often used.
To avoid over-fitting problems, [4] developed the ADNN method, based on ANN with adaptive metrics input (a solution to take local trends and amplitude in considerations and based on the Adaptive K-nearest neighbors method-AKN) and an admixture mechanism for output data.They demonstrated that this method gives better results than usual techniques (ANN, AKN, AR, especially for chaotic and real time series).

Kernel Methods
Kernel methods have been used since the 1990s: Support Vector Machines (SVMs), kernel Fisher discriminant, and kernel principal component analysis [32].SVMs generally offer better performances than ANN methods, are adapted to high dimensional data, and have been used for interpolation purposes despite the required computation time.SVM builds a hyperplane (Equation ( 16)) in a high dimensional data set.
Water 2017, 9, 796 9 of 20 where f is the hyperplane, φ is a non-linear function from a high to a higher dimensional area, and W is the weighted vector.A kernel function is needed to estimate coefficients in the model.Kernel functions perform pairwise comparison: linear and algebraic functions, polynomials, and trigonometric polynomials functions can be used [32], but Gaussian (also called Radial Basis) functions are the most popular [18].The accuracy of prediction with SVM is strongly dependent on the choice of the kernel function and its parameters.Tripathi and Govindajaru [32] underlined that there is no general consensus to calibrate parameters of the kernel function.More details about basics of SVM are presented in [50,51].Multiple Kernel SVMs (MK-SVMs) use a combination of kernel functions [52] to find an optimal kernel [18].Two kinds of algorithms can solve this formulation: (i) in one step that contains classic reformulations, and (ii) in two steps designed to decrease computational times [18].
Based on those techniques, [18] have implemented a Hierarchical-MK-SVM (H-MK-SVM) divided in three steps to learn coefficients for static or longitudinal behavioral data.According to the result of this study, H-MK-SVM is more efficient than most existing methods for churn prediction.
Relevance Vectors Machines (RVMs) are based on SVM and Bayesian theory [53].RVMs have the same advantages and disadvantages as SVM and are able to predict uncertainty.Tripathi and Govindajaru [32] provided a four-step method based on RVMs to estimate the parameters of a kernel according to prediction risk (based on Leave-one-out cross validation).This method showed promising results for hydrologic data sets.

Tree Approaches
Decision trees have been used for decades to solve classification problems in the various sciences.Only a few studies have reported [18] or used [54] decision trees to fill gap(s): for churn prediction or for air quality data.Decision trees are used for too short a time to have a step back.Amato et al. [54] have demonstrated that the advantage of this method (by comparison to mean value and polynomial interpolation method) without detailing how many variables or the number of used acquired samples have been used for prediction.Many trees could be used through the random forest method.

Meta-Algorithms of Machine Learning Methods
Several meta-algorithms can also be used to fill gaps in data: boosting, adaboosting, bagging, etc. [18].These meta-algorithms are used depending on the problem (classification, regression, etc.), to improve the performance (accuracy and robustness) of methods presented in this section.

Methods Based on Data Dynamics
In this section, methods based on the behavior of observations are briefly presented.

k-Nearest Neighbors (kNN)
The kNN method [55] is probably the simplest to understand of these methods, is well adapted to pseudo-periodic signals, and should not be confused with the nearest neighbor approach.This method is based on the understanding of the evolution of the k-nearest neighbors.In observations, N windows of size T are compared to the windows of interest (where data are missing) with different metrics.Local and global methods should be distinguished [55]: the global one takes the whole data into account: the global one is less sensitive to new behaviors.According to [55], the value of T is often chosen by sight (according to the period of the signal), and many metrics (cityblock, Euclidean, Chebychev, etc.) are available to choose the k-nearest neighbors.Kulesh et al. [55] presented an adaptation of the nearest neighbors based on flexible metrics that reduce problems of amplitude changing and trend determination.Missing values are estimated, with a function in which selected nearest-neighbors are the variables.The function could be a weighted average of the k nearest neighbors for any integer k > 1.In that case, this method is very similar to MA methods, especially EWMA.

Box-Jenkins Models
Brubacher and Tunnicliffe Wilson [2] have reported works on Box-Jenkins models.These models are mainly based on autoregressive methods (see Section 2.2.2) while including some seasonality.That is why they are considered to be useful for polycyclic data (with various frequencies i.e., in predicting data related to human behavior-like electricity or tap water demand-or natural data presenting some strong seasonal patterns-like water level in river or in water table).

Method of Moment Fitting
Carrizosa et al. [13] have proposed a method based on the preservation of moments and autocorrelation between observations and predictions: this method requires a smooth non-convex optimization problem to be solved.Authors have compared this method to other existing ones.According to absolute percent error, absolute and quadratic differences, the method outperformed linear, nearest neighbor, cubic interpolation, and spline methods for a single gap of 30% of the time series.
Other Methods Based on Data Dynamic Chen and Andrew [56] listed a serial of other available methods, based on data behavior and applied to economical interpolations: Denton additive, Denton proportional, Growth rate preservation, and first and second difference smoothing.More details are presented in [57].Another method, considered as a refinement of inverse distance weighting method by [20], is the natural neighbor interpolation [58]: this method requires additional data (such as topology and by extension, other univariate series) to interpolate the missing value(s).

Methods Based on Kriging
Despite the fact that kriging methods are mostly used for multi-dimensional (i.e., spatial) data interpolation, they can be used for time series.Kriging groups a wide range of geostatistical methods, often used for spatial interpolation and for multivariate [59] or sometimes univariate series [60].A detailed review of these methods would require another review paper, but a brief introduction and list of the most relevant studies is given hereafter.As a common factor for all these methods, the interpolated value is estimated by a linear weighted combination of near values where weights and number of used values are dependent on the correlation between existing values [61].Combinations of kriging methods with Sequential Gaussian Simulation (SGS, [62]) or Sequential Indicator Simulation (SIS, [63]) allows estimation of uncertainty, and the distribution of the predictions.
where i is the location of existing values (single or multiple), m is the known stationary average, R(i) is the stochastic portion at i location(s) with a zero mean, constant variance, and a non-constant covariance (that varies with a distance between the i location(s) and the location of observations used for the calculation), n is the number of existing values used for the calculation, j is the location of the interpolated value, and λ i (j) are the weights of the existing value(s) (at location i) to estimate the missing value(s) at the location j.
Other versions of kriging have been developed and published: Ordinary Kriging (OK) where the sum of the weights is equal to 1, and m is constant and unknown, Kriging with a Trend (KT) model [61], also called kriging in a presence of trend or drift [58] where m (Equation (17a)) is not constant, block kriging, which can be used to estimate a value for a defined block (in term of distance, duration, e.g., an average daily water table).Some other kriging methods allow the estimation of the conditional cumulative distribution function of X i , such as indicator kriging, disjunctive kriging [65], and Multi-Gaussian kriging.

Other Stochastic Methods
A collection of other methods has been cited and listed in [18,76]: methods based on classification (rule based classifier), statistical methods (logistic regression, Bayesian Networks, hazard proportional model also Cox's regression) or a combination (Naïve Bayes).Two methods based on stochastic processes have been used for interpolation purposes: Spartan random processes [77] and Markovian arrival processes [7].Transfer function-noise (TFN) modeling is another interpolation method initially developed by [33], e.g., with application in [78].A continuous-time transfer function noise model (PIRFICT) is described and has been used by [79] to characterize the dynamics of slow processes.

Criteria Used to Compare Interpolation
A brief overview of applied criteria is given in Table 1.Most of them use variance between observed n values (x i ) and predicted n values (X i ).Their averages (x or X), and standard deviation σ(x i ) of observed values have been also been used.Equations in Table 1 have been extracted from articles and, sometimes, from personal conversations with authors.More than 30 criteria have been used in previous studies.Sometimes the same equation has different names in the studies.In [1], the applied criterion (MSE) has not been explicitly given.
Table 1.Criteria used to evaluate performance of interpolation methods found in literature.

Equations Names References
Standard errors [16] Prediction risk-leave one out MSE (R LOO ) [32] Mean Square Error (MSE) [32,39] nc [1,81] 1 Root Mean Squares (RMS) [24] Root Mean Square Errors of Prediction (RMSEP) [54] Mean Squares Error (NMSE) [4] 1 Root Mean Squares Deviations (RMSD) [9] Root Mean Squares Errors (RMSE) [12,77,80,83] 100 Normalized Root Mean Square Deviation [27] 1 Root Mean Square Standardized error (RMSS) [80] 100 Various equations 95% confidence interval [39] TN + TP TN + FP + FN + TP Percentage of Correctly Classified observations (PCC) [18] TP TP + FN Percentage of correctly classified observations in the positive class (Sensitivity) [18] FP FP + TN Percentage of correctly classified observations in the negative class (Specificity) [18] TP FP + TP FN + TP TN + FP + FN + TP Top 10% lift (Lift) [18] 2 TP TP + FP TP TP + FN TP TP + FP + TP TP + FN H-measure (H) [18] No equation Time of computation [10,12,18,84] Notes: 1 Where l is the lag (gap) size, e i is the residual between observations and interpolated values (i.e., x i − X i ). 2 Where p is the last 20% of the data rounded to the nearest multiple of n 2 and h is the number of periods. 3Where M T is the moment or autocorrelation of the interpolated series, and M R is the moment or autocorrelation in the existing data. 4Where c is the number of classes, f O is the observed frequency, and f E is the expected frequency.
Water 2017, 9, 796 13 of 20 Gnauck [16] then considered that r 2 criteria cannot be used, due to non-normally distributed data and non-linear effects in some phenomena, such as water quality processes.Some criteria could be used only for a specific method, such as AUC or MP, as proposed by [76], or APE [13].Chen et al. [18] used criteria based on the confusion matrix for binary classification [76] where TP are the True Positive, FP the False Positive, FN the False Negative, and TN the True Negative values.Those criteria/methods can only be applied if variables have been divided in classes.Alonso and Sipols [39] introduced criteria about coverage interval of interpolated values: LL for Lower Limit, and UL for Upper Limit of the 95% confidence interval.These values can be calculated with various equations (by the law of propagation of uncertainties) or with numerical methods.This is why no equation has been given for those values.The time of computation has been used in some studies.This is a useful measure for future users, but the characteristics of the computer are not always given: relative comparisons between different studies becomes more complicated with regard to this criterion.Any trade-off between quality and the cost of methods needs to be performed by each user [10].

Evaluation of Uncertainties
Among all the existing methods, there are normalized methods to assess uncertainties that are most likely used by practitioners and end-users.These standards are briefly presented in this section.

Law of Propagation of Uncertainties
The law of propagation of uncertainties [5] is an analytical method: standard uncertainty (u(y)) of the output y of a model f calculated is with N variables; here, the observed data, (x i ) is calculated through Equation (18).
Confidence intervals can be estimated with enlargement factors.This method requires that distributions of uncertain parameters and variables are known and symmetrical, and that an explicit and differentiable model f exists.This method always leads to underestimation of the prediction uncertainty due to the averaging effect, as illustrated in the example in the introduction.

Monte-Carlo Simulation
Monte-Carlo simulations can be used for uncertainty estimation [6]: the probability distributions of parameters and variables (generated with the respect of covariance between them) are propagated through N MC simulations (N MC is the number of Mont-Carlo simulations), to ascertain the probability distribution of the output y.This technique requires long computation times, but it can be used without an explicit model (as f in Section 4.1) and for every kind of distribution (symmetrical, asymmetrical, etc.).Any statistical property of y can be computed from the Monte Carlo realization of y.For example, the 95% confidence interval of y is calculated in four steps: (i) calculation of N MC values of y; (ii) ranking of values; (iii) calculation of the width of multiple 95% confidence intervals from the cumulative distribution function [F −1 y (0); F −1 y (0.95)] to [F −1 y (0.05); F −1 y (1)], and finally (iv) the smallest interval is retained as the final 95% confidence interval [6].

Discussion: From Literature Outcomes to a New Method
As this literature review has pointed out, there are numerous available methods to interpolate time series (Tables 5.1, 5.2 and 5.3 presented in [20] and other references in this study), and numerous criteria to assess the efficiencies of those methods.A second conclusion from the review is that uncertainties of the interpolated variables are generally not integrated in the interpolation methods, despite the fact that most of the methods can assess the uncertainty.End users of basic interpolation The last term in the Equation ( 20) is the bias term, which vanishes when α + β = 1.Minimizing the MSE for α and β results in optimal values for the latter parameters.Assuming that α + β = 1 and imposing the following condition (Equation ( 21)): Then, α and β can be derived (Equation ( 22)): Which leads to (Equation ( 23)): A simple case is obtained when the interpolation is done exactly halfway between the two adjacent samples, in this case, ρ (t i−1 , τ) = ρ (t i , τ), resulting in (Equation ( 24)): Assuming that the measuring error σ m is independent from the process monitored (i.e., the sensor has a measuring error that is not depending on the measuring scale), the total uncertainty at the interpolated point is (Equation ( 25)): It can be seen that in a process with an autocorrelation function, |ρ(t)| = 1, the error in the interpolation is equal to the measurement error.For every process with an autocorrelation −1 < ρ(t) < 1 the prediction uncertainty is larger than the measuring error.
The reasoning outlined (incorporating the process variability as well as the measuring uncertainty when interpolating) here can also be applied to more complicated interpolation techniques, as described in the literature review section of this article.
Future research will focus on the differences between several interpolations techniques in terms of prediction uncertainty, taking into account the characteristics of the (physical) process involved.
Figure 1 shows the uncertainties of interpolated values asset by: (i) the law of propagation of uncertainties (top left); (ii) the Monte-Carlo method (top right); (iii) the method proposed in [17] (bottom left), and (iv) the method proposed above (bottom right).The rain time series recorded in Rotterdam (The Netherlands) has been used for this comparison.On the top left, the law of propagation of uncertainties gave uncertainties lower than, and respectively equal to standard observation uncertainties (0.01 mm/h) under the hypothesis that data are fully negatively (ρ(t) = −1) and positively (ρ(t) = 1) correlated, respectively.Any additional calculation to estimate partial correlation in the time series will lead to estimations between these two dashed dot lines.The application of this first normalized method always leads to an underestimation of uncertainties, despite calculations of partial autocorrelation in the time series.On the top right, Monte Carlo method results have been plotted with a correlation coefficient of 0.051 (corresponding to the partial correlation of the time series, with a lag of 29 time steps-the lag between the last and the next values known around the gap): the resulting curve is in the area delimited by the law of propagation of uncertainties.The method proposed by [17] gave standard uncertainties (bottom left) mostly higher than the observation standard uncertainties except at the boundaries: standard uncertainties are lower here.The proposed method (bottom right) seems to give more logical estimations of standard uncertainties, with continuity at the gap boundaries, and the highest value in the middle of the interpolated values (the farthest position from the known data).
results have been plotted with a correlation coefficient of 0.051 (corresponding to the partial correlation of the time series, with a lag of 29 time steps-the lag between the last and the next values known around the gap): the resulting curve is in the area delimited by the law of propagation of uncertainties.The method proposed by [17] gave standard uncertainties (bottom left) mostly higher than the observation standard uncertainties except at the boundaries: standard uncertainties are lower here.The proposed method (bottom right) seems to give more logical estimations of standard uncertainties, with continuity at the gap boundaries, and the highest value in the middle of the interpolated values (the farthest position from the known data).

Conclusions
There are numerous methods and criteria for assessing the quality of interpolation methods.In the literature, many redundancies, discrepancies, or subtleties have been found: different names for

Conclusions
There are numerous methods and criteria for assessing the quality of interpolation methods.In the literature, many redundancies, discrepancies, or subtleties have been found: different names for the same method or criteria, different equations for the same criteria, etc. Future research should be very explicit and detailed, in order to avoid potential misunderstanding due to lexical discrepancies.No comprehensive comparative studies have been published so far: this lack of exhaustive feedback might be problematic for researchers, engineers, and practitioners who need to decide upon choosing interpolation methods for their purposes and data.To the authors' knowledge, no comparative study published so far has dealt with methods to quantify prediction uncertainties.This can explain why prediction uncertainties are, in practice, only rarely calculated.The combination of the easiest interpolation methods and uncertainty calculation standards leads to mistakes in uncertainty assessments (as demonstrated in the discussion part), and methods that perform both interpolation and uncertainty calculation have not been exhaustively compared.
According to these conclusions, future work should focus on those topics to fill in the gaps in literature, and to give the tools for researchers to decide between the many available methods.In this respect, two kinds of studies could be useful: (i) exhaustive and comparative studies with a special attention for lexical issues, to standardize names of methods and criteria (used as a new reference), and (ii) development of new methods to assess prediction uncertainties.

Figure 1 .
Figure 1.Different methods to asset prediction uncertainties.

Figure 1 .
Figure 1.Different methods to asset prediction uncertainties.