Particle Filtering: A Priori Estimation of Observational Errors of a State-Space Model with Linear Observation Equation

: Observational errors of Particle Filtering are studied over the case of a state-space model with a linear observation equation. In this study, the observational errors are estimated prior to the upcoming observations. This action is added to the basic algorithm of the ﬁlter as a new step for the acquisition of the state estimations. This intervention is useful in the presence of missing data problems mainly, as well as sample tracking for impoverishment issues. It applies theory of Homogeneous and Non-Homogeneous closed Markov Systems to the study of particle distribution over the state domain and, thus, lays the foundations for the employment of stochastic control against impoverishment. A simulating example is quoted to demonstrate the effectiveness of the proposed method in comparison with existing ones, showing that the proposed method is able to combine satisfactory precision of results with a low computational cost and provide an example to achieve impoverishment prediction and tracking.


Introduction
Particle Filter (PF) methodology deals with the estimation of latent variables of stochastic processes taking into consideration noisy observations generated by the latent variables [1]. This technique mainly consists of Monte-Carlo (MC) simulation [2] of the hidden variables and the weight assignment to the realizations of the random trials during simulation, the particles. This procedure is repeated sequentially, at every time step of a stochastic process. The involvement of sequential MC simulation in the method is accompanied by a heavy computational cost. However, the nature of the MC simulation makes the PF estimation methodology suitable for a wide variety of state-space models, including non-linear models with non-Gaussian noise. The weights are defined according to observations, which are received at every time step. The weight assignment step constitutes an evaluation process of the existing particles, which are created at the simulation step.
As weight assignment according to an observation dataset is a substantial part of PF, missing observations hinder the function of the filter. Wang et al. [3] wrote a review concerning PF on target tracking, wherein they mentioned cases of missing data and measurement uncertainties within multi-target tracking, as well as methods that deal with this problem (see, e.g., [4]). Techniques that face the problem of missing data focus mainly on substitution of the missing data. In recent decades, Expectation-Maximization algorithm [5] and Markov-Chain Monte Carlo methods [6] became popular for handling missing data problems. These algorithms have been constructed independently of PF. Gopaluni [7] proposed a combination of Expectation-Maximization algorithm with PF for parameter estimation with missing data. Housfater et al. [8] devised Multiple Imputations Particle Filter (MIPF), wherein missing data are substituted by multiple imputations from a proposal distribution and these imputations are evaluated with an additional weight assignment according to their proposal distribution. Xu et al. [9] involved uncertainty on data availability in the observations with the form of additional random variables in the subject state-space model. All the aforementioned approaches are powerful, although computationally costly.
This paper focuses on state-space models with linear observation equations and provides an estimation of the errors of missing observations (in cases of missing data), aiming at the approximation of weights, under a Missing At Random (MAR) assumption [10]. Linearity in an observation equation permits sequential replacements of missing values with equal quantities of known distributions. Although this method is applicable to a smaller set of models than the former ones, it is much faster as it leads to a single imputation process. A simulating example is provided for the comparison of the suggested method with existing techniques for the advantages of the proposed algorithm to be highlighted. The contribution of the a priori estimation step to the study of impoverishment phenomena is also exhibited through Markov System (MS) framework (see, e.g., [11]). The substitution of future weights renders the estimation of future distribution of particles in the state domain feasible. The significance of this initiative lays on the possible estimation of the sample condition concerning impoverishment, in future steps, based on the suggested theory. Such a practice permits the coordinated application of stochastic control [12] instead of the mostly empirical approaches that been proposed so far [13].
The present article is based on the work of Lykou and Tsaklidis [14]. Further mathematical propositions are formed by the sparse remarks exhibited in [14], and the incorporation procedure of MS-theory in the study of particle distribution is explained in detail. The presentation of the initial results of the simulation example is reformulated for the example to be more easily understandable, as well as a new application of MS-theory for the quantitative prior estimation of the particle distribution one time step forward is added to the initial example. In Section 2, PF algorithm is presented analytically. In Section 3, the new weight estimation step is introduced and its connection with the study of degeneracy and impoverishment is explained. In Section 4, a simulating example is provided, where the results of the current method are compared with those of MIPF and the results of the basic PF algorithm in the case when all data are available. An example for the estimation of the particle distribution one step ahead after the current is also presented in the direction of impoverishment tracking and prediction. In Section 5, the discussion and concluding remarks are quoted.

Particle Filter Algorithm
Let {x t } t∈N be a stochastic process described by m-dimensional latent vectors, x t ∈ R m , and {y t } t∈N be the k-dimensional process of noisy observations, y t ∈ R k . The states and observations are inter-related according to the state-space model, In the system of Equations (1) and (2), f is a known deterministic function of x t , v t stands for the process noise, and u t symbolizes the observation noise. Each sequence {v t } t∈N and {u t } t∈N consists of independent and identically distributed (iid) random vectors, while c ∈ R k is a constant vector and A ∈ R k×m is a constant matrix.
PF methodology employs Bayesian inference for state estimation. The Bayesian approach aims at the construction of the posterior probability distribution function p(x t |y 1:t ), where y 1:t = (y 1 , y 2 , ..., y t ), resorting to the recursive equations, These equations are analytically solvable in cases of linear state-space models with Gaussian noises. However, for more general models, analytical solutions are usually infeasible. For this reason, PF can be applied by utilizing MC simulation and integration to represent the state posterior probability distribution function, p(x t |y 1:t ), with the help of a set of N ∈ N particles x i t , i = 1, 2, ..., N, with corresponding weights w i t , i = 1, 2, ..., N. Then, p(x t |y 1:t ) can be approximated by the discrete mass probability distribution of the where δ is the Dirac delta function and weights are normalized, so that ∑ i w i k = 1. As p(x t |y 1:t ) is usually unknown; this MC simulation is based on importance sampling, namely a known probability (importance) density q(x t |y 1:t ) is chosen in order for the set of particles to be produced. Then, the state posterior distribution is approximated bŷ are the normalized particle weights for i = 1, 2, ..., N.
As PF is applied successively for several time steps, it happens that increasing weights are assigned to the most probable particles, while the weights of the other particles become negligible progressively. Thus, only a very small proportion of particles is finally used for the state estimation. This phenomenon is known as PF degeneracy. In order to face this problem, a resampling with replacement step according to the calculated weights has been incorporated into the initial algorithm, resulting in the Sampling Importance Resampling (SIR) algorithm. Nevertheless, sequential resampling leads the particles to take values from a very small domain and exclude many other less probable values. This problem is called impoverishment. A criterion over the weight variability has been introduced for a decision to be made at every time step, whether existing particles should be resampled or not, to reach the middle ground between degeneracy and impoverishment. In this criterion, the Effective Sample Size measure of degeneracy, defined as is involved (see, e.g., [15], pp. [35][36]. As this quantity cannot be calculated directly, it can be estimated asN The decision on resampling is positive wheneverN e f f (t) < N T , where N T = c 1 N, c 1 ∈ R is a fixed threshold. A usually selected value for N T is 75%N. Establishing a criterion for resampling slows down sample impoverishment of the sample but does not prevent it.
Algorithm 1 summarizes PF steps. The sampling part corresponds to the prior (prediction) step of Bayesian inference, while weight assignment and possible resampling constitute the posterior (update) step.

Algorithm 1 SIR Particle Filter
. end for Resampling

The Missing Data Case-Estimation of Weights
We now proceed to the addition of a new step to Algorithm 1 for the case of missing data. For that purpose, some new definitions need be quoted. As the missing data mechanism is usually unknown, its possible dependence on the missing data themselves could introduce bias to the statistical inference. For this reason, a Missing at Random (MAR) assumption is adopted: let a random indicator variable R t,j , j = 1, ..., k, indicate whether the jth component of the tth observation is available or not. That is, Additionally, sets Z t and W t are defined as the collections of missing and available components of observations y t , respectively, for every time step t ∈ N.
According to the MAR assumption, the missing data mechanism does not depend on the missing observations, given the available ones: be the set of particles produced for the posterior estimation of latent vector x t−1 , while whole observation y t is missing. In addition, let u i t , i = 1 . . . N, be the observational errors for corresponding candidate particlesx i t , so that according to (2). Then, the conditional distribution of every observational error u i t on the particle wherex t−1 is a point estimation of x t−1 and v i t represents the process noise for the generation of x i t .
Proof. Let x i t−1 be a particle for the posterior estimation of the hidden state x t−1 of the state-space model (1)- (2). Then, according to Algorithm 1 and Equation (1), the ith prior estimation of the hidden state x t is produced by equatioñ According to Equation (2), the observational error of the particle is calculated as If the (whole) observation y t is unavailable, sequential replacements of u i t and y t from Equations (6) and (2), respectively, contribute to the creation of the formula, As observation y t is considered missing, particlesx i t cannot be evaluated. Thus, both x t andx i t are replaced according to Equations (1) and (5), The hidden state x t−1 is unknown, but its posterior distribution is available, so that a point estimation of it,x t−1 , can be calculated. Then, Therefore, since the quantity ) is a constant at time t, the distribution of u i t can be approximated as Remark 1. Given that the distributions of the random vectors v t , v i t , and u t are known, the distribution of Av t − Av i t + u t is also known, as it is the convolution of linear functions of the initial components v t , v i t , and u t . Calculation of such convolutions is not always an easy task, as analytical solutions may not be feasible, leading to numerical approximation options ( [16]). However, given that each noise process consists of iid vectors and matrix A is constant, the distribution of this sum needs to be calculated only once.

Remark 2.
The replacement ofx i t can be avoided, if MC simulation has been implemented at this time point.
The weight assigned tox i t depends on u i t , according to Equations (3) and (6), because Then, as the two variables (w i t and u i t ) are closely associated, knowledge of the distribution of u i t leads to the derivation of the distribution ofw i t . Even if the distribution ofw i t may not be exactly calculated, in cases wherew i t are complicated functions of u i t , knowledge on the distribution of u i t will suffice to provide information on the weight distribution. Thereby, calculation of p(u i t ∈ D), D ⊂ R k , as it is suggested in Remark 1, is of interest for the concomitant estimation of weights.

Proposition 2.
If the conditions of Proposition 1 hold, while y t is partially observed, the conditional distribution of every observational error u i t on particle set {x i t−1 } N i=1 and collection W t of available components of y t is approximated as Proof. Estimation of u i t implies the estimation of y t , according to Equation (6). If observation y t is partially available, its available components, say W t collection, can be placed into the above equations. Thus, some components of the observational error will also be available, while the rest of the components, say Z t collection, possibly dependent on the available ones, is estimated in the same rationale. In this case, (4) takes the form In any case, missing (parts of) observational errors u i t along with their weights can be substituted by single values, as expected values or modes. Consequently, the initial PF algorithm undergoes a slight change, as presented in Algorithm 2. Further to Remark 2, the substitution of observational errors u i t is implemented after the Importance Sampling step in Algorithm 2 for the sake of simplicity.

Connection to Markov Systems and Contribution to the Study over Impoverishment
Impoverishment over the particle samples can be studied in connection with certain Markov models, the Homogeneous or Non-homogeneous Markov Systems (denoted as HMSs or NHMSs, respectively) , which have their roots in [17]. With the consideration of a grid of d ∈ N cells over the state domain, problems of impoverishment reduce to a problem concerning the derivation of the distribution of the particle population over the grid cells. Term "grid" denotes here a single partition over the whole state domain. The cells of this grid represent the states of the MS. At every time step, a particle moves from cell i (i = 1, . . . , d) to cell j (j = 1, . . . , d) with (time-dependent) transition probability p ij,h (t), (h = 1, . . . , N) in the general case. However, MS consideration is based on the hypothesis that population members which are situated in the same state move to any cell at the next time step according to a joint transition probability. Thus, for the introduction of MS-theory in the study of particle distribution over the grid, probabilities p ij,h (t) are approximated by single quantities p ij (t) for all particles in cell i at time point t. The fact that PF is applied to dynamical systems entails that different areas of the state domain become of particular interest at different time steps. Therefore, it is preferable for the grid lines not to remain constant over time. A simple time-varying grid is constructed within the simulating example in Section 4, while a more complex structure is provided in [18]. In the simple case that the PF algorithm comprises constant parameters and excludes the resampling step, the corresponding MS can be considered homogeneous, as the particles only move according to a state equation with constant approximating transition probability values. Resampling causes the redistribution of particles over the grid. The probability vectors for this redistribution are defined by the observational errors at every time step. Thus, steps of changing probability vectors are introduced in the MS rendering this MS non-homogeneous. Moreover, the results over the grid of both production of weighted particles on the basis of system (1)- (2) and resampling, at the end of every time step, derive the results of sums of multinomial trials with varying probability vectors (see also [19], p. 28); this procedure corresponds to the transitions of population members between the state of a MS. In general, the sums of multinomial trials can be considered to follow generalized multinomial distribution [20] or, more generally, Poisson Binomial distribution (which has its roots in [21], §14 ). As the number of particles remains constant, according to Algorithm 1, the MS is considered to be closed. The difficulty in making predictions on the MS lies in the fact that observational errors are not a priori acquired during the filtering procedure.

Algorithm 2 SIR Particle Filter for missing data with observational error estimation
, Produce observational error estimationsû i t for the missing components Z t and calculate importance weights Resampling In this study, observational errors are substituted by single values for one time step, so that weights can be estimated one step ahead. The set of weights configures the probability vectors of the resampling step. Thus, the distribution of particles over the grid cells can be approximated during the upcoming step of resampling and new Importance Sampling. Thus, the distribution of the particle population can be estimated for the next step on the basis of the estimated weights for the dispersion of the future particles over the grid to be assessed and impoverishment phenomena to be predicted. Such a practice paves the way to the involvement of stochastic control theory [12] (leading to control of asymptotic variability [22]) into the matter of the avoidance of impoverishment.

Contribution to the Missing Data Case
A simulation example is presented in this section to emphasize the benefits of the proposed algorithm. The proposed method is compared with the typical PF algorithm, when the complete dataset is available, and the multiple imputation particle filter (MIPF) for n = 5 imputations [8]. The reduction step proposed in [23] is incorporated in the initial MIPF algorithm for the best possible results to be achieved. The data simulation is based on the state-space model of Equations (1) and (2), with two-dimensional vectors where 0.03 and N symbolizes Gaussian distribution. Let x 0 = 1 0.5 be considered known. Next, concerning missing data, we let R t,j ∼ Bernoulli(0.15), j = 1, 2, that is the data are missing completely at random. N = 100 particles have been used for every filter. The distributions of noises are also considered known. The weighted mean is used as a point estimator of a hidden state and missing observational errors are substituted by their expected values. All the filters have been repeated for 100 times and their performance concerning their precision and consumed time has been recorded. (The code was written in R project [24]. Packages mvnorm [25], with its corresponding reference book [26], ggplot [27], and ggforce [28] were also used. Simulations were performed on an AMD A8-7600 3.10 GHz processor with 8 GB of RAM.) The results of the three methods are shown in Table 1. In the first two columns, the means over the simulations of Root-Mean-Square Errors (RMSE) of the estimators (weighted means) for each component of the hidden states are presented. The mean of the two aforementioned columns is also calculated, as well as the mean time consumed in each approach. In the table, it is shown that the weight estimation with the suggested method outperforms MIPF concerning both precision and time elapsed. The precision of the suggested method supersedes that of MIPF slightly, while the mean required computational time is about 50% less than the corresponding mean time required for MIPF. The proposed method is also compared with the results of the standard PF algorithm, for which all observations are available, and it seems that, even if the precision is inevitably reduced in the case of missing data, the computational time remains nearly the same. The small differentiation in the mean elapsed time is probably connected with the resampling decision. That is, in this example, the precision of the suggested method slightly supersedes its competitor, while its computational cost is much lower than the cost of its competitor, reaching the levels of the basic filter (which is practically infeasible in the missing data case). In Figures 1 and 2, the performances of the proposed method and MIPF are depicted for the two components of the state process, respectively, for one iteration of each filter. The estimators (weighted means) of the two approaches are close to each other, tracking the hidden vector satisfactorily. Therefore, in this example, the suggested method appears to provide the best option between the available ones in the missing data case.

Contribution to Impoverishment Prediction
As far as estimation of particle distribution one step ahead is concerned, an application for the transition of the particles from time point t = 9 to t = 10 is presented during one implementation of the suggested PF with single imputation for missing values on the available dataset. In the time interval (0,10], only the first component of observation y 6 is unavailable. In the end of time step t = 9, resampling is implemented and the histograms of the particle sets are exhibited in Figure 3. The sample mean of the particles is m = [m 1 = 0.947 m 2 = 1.07] T and the standard deviations of the corresponding components are s 1 = 0.126 and s 2 = 0.127. According to Equation (4) and the given parameters of the problem, the random factor needed to be estimated for every particle at the next time step is where σ z = 0.36. Thus, in both dimensions, the following partitions are considered, so that a grid of nine cells is configured over the two dimensions. The frequency table (Table 2) exhibits the particle distribution over the grid. (c) Joint histogram of the particle sample (i = 1, . . . , N) for the posterior estimation of both components of the whole hidden vector x 9 . Red lines delimit the suggested grid cells.The red diamond stands for the hidden state. The selected time period was chosen because there is a considerable number of preceding steps that permits a relatively good adaptation of the particle samples over the hidden states and the samples have not yet collapsed to a tiny neighborhood around a single point (utter impoverishment). This argument is evinced in Figure 3c and Table 2, where the distribution of the particles is presented in connection with the hidden state and the suggested grid. The produced particles are both close to the hidden state, as most of them are less than one standard deviation σ z from it, and sparse enough for the existence of particles outside the central cell of the grid. Thus, the condition of the sample during these time points configures a typical example of filter implementation before its collapse. Such time points may be suitable starting points for the introduction of control (which exceeds the limits of this study) as the subject sample is in a good condition concerning both impoverishment and accuracy over hidden variable estimation. Table 2. Frequency table of the particle distribution over the suggested grid at the end of t = 9. The movement of all the particles according to the deterministic part of Equation (9) results in the frequency table in Table 3, where it is shown that all the new particles belong to the central cell. Even though the particles are identically distributed, with the addition of the process noise to the particles, the probabilities for particles to move from the central cell to random ones defer from particle to particle, as the particles have different distances form the grid lines initially. This fact is in contrast to the theoretical background of MS, according to which population members have a common transition probability matrix P to move during a time step. For this reason, the probabilities of particles to move to a cell with the addition of the random noise are approximated by the probabilities of the point estimation x 10 (−) to move to a random cell with the addition of noise. These probabilities (rounded values) are provided in Table 4 Table 5. Concerning the expected posterior distribution of the particles, the expected observational errors are zero, so that particle weights are expected to remain the same. Thus, no further change is expected in their distribution in cells even if resampling is decided to take place, as all weights are equal after resampling in the previous time step.

Remark 3.
The expected values of observational errors are zero. Nevertheless, the prior estimation of their distribution according to relation (4) and model parameters, where the variances of the errors are presented, evinces the increased uncertainty for them, as σ 2 u = 0.03 while σ 2 z = 0.13. Table 3. Frequency table of the particle distribution over the suggested grid at t = 10 when the particles x i 9 , i = 1, . . . , 100, move only according to the deterministic part of Equation (9).  The results of the implementation of PF at time step t = 10 are also exhibited. Resampling has taken place at this time step as well. The joint histogram of the posterior sample over both dimensions (Figure 4) indicates that the majority of the particles do not belong to the central cell. This fact is reasonable, as the length of the sides of the central cell equals only one standard deviation σ z , so that prior probabilities for the particles to be placed outside of the central cell at this time point are considerably big according to the Empirical Rule (68-95-99.7) for normal distribution. For the consolidation of these results towards this rule, the orange squares are drawn in Figure 4 for the corresponding areas of the rule to be defined for each separate dimension, while the orange circles are the corresponding standard deviation circles (and not ellipses, generally, as the two components have the same variance σ 2 z = 0.13) of the whole vector. Thus, questions on the suitability of the proposed grid structure are raised for future study. Nevertheless, it should be mentioned that a grid with a central cell of double side length would have classified all particles to the central cell during time step t = 9, rendering further study on the issue meaningless. Additionally, a new grid of nine cells is also constructed around the mean of this posterior particle set, the central cell of which also has length σ z . The distribution of the particles in the new grid is quoted in Table 6. In comparison with Table 2, it seems that the number of particles in the central cell is increased in Table 6.   Table 3. The red diamond stands for the hidden state. The red star stands for the observation at this time point. The sides of the two squares are correspondingly one and two standard deviations σ z from the center of the diagram. The circles are inscribed in the corresponding squares.

Remark 4.
In the present example, the transitions of the particles according to the deterministic function led all particles to a single cell (Table 3), so that the result of the addition of process noise was handled as a result of a multinomial trial. In the case that the deterministic function leads the particles to more than one cell, then it is suggested that different means be found for each cell as well as corresponding transition probabilities, so that the final result can be considered the sum of results of multinomial trials for the transitions to every cell.

Discussion and Conclusions
In this study, single substitution (in contrast to MIPF) of observational errors is proposed for missing data cases, when PF is implemented and MAR assumption is adopted. This method is a single imputation procedure. Acuña et al. [23] argued against single imputation, as it is rather simplistic and it cannot attribute to a single value the distributional characteristics that can be approached and described by a sample of multiple imputation. Nevertheless, the primary target of the proposed technique is the minimization of the computational cost that is added to the initial PF algorithm, in the case of missing data. For this purpose, interventions in the PF algorithm are slight. Moreover, in the provided simulation, the suggested method outperforms the multiple imputation approach even for a considerable number of imputations, whereas Acuña et al. [23] noticed that MIPF with n = 3, 4, 5 imputations produces very satisfactory results, according to the approximation of multiple imputation estimator of efficiency provided by Rubin [29]. As a result, in this example, estimation of observational errors performs better with respect to both the computational time it requires and the precision it achieves. Besides, knowledge on the distribution of the observational errors contributes to the quantification possibility of the uncertainty over the point estimations. Thus, the suggested method takes advantage of the low computational cost of the single imputation option, while the study of more general distributional characteristics of the observational noise can also be taken into account at the same time (see Remark 3).
The contribution of such a method in order to cope with impoverishment problems is also worth mentioning. This method permits the estimation of observational errors and their corresponding weights one time step forward. The evolution of weight distribution has not been a priori estimated for multiple time steps yet, to the best of the authors' knowledge, but this is feasible at least for one step ahead. As the weights of the next step can be estimated, the probabilities that a particle will be chosen at the resampling step can also be estimated. As explained in Section 3.1, the assessment of weight distribution for the forthcoming time steps could be very interesting, as far as it is connected with impoverishment issues. Concerning future perspectives over this issue, the study of impoverishment problems can be implemented with the use of input control [12], in order for the impoverishment to be controlled; laws of large numbers [30], as MC approximation employs large samples; state capacity restrictions [31]; for the existence of a population limit at every grid cell; literature on the evolution of attainable structures [32]; the evolution of the distribution of particles [33] or of the corresponding moments [34] in the direction of HMSs and NHMSs; and for the estimation of the future behaviour of the sample, possibly reaching continuous-time models [35]. Research on automatic optimal control [36] could be combined with the suggested methodology, possibly leading to interesting joint applications of PF [37] along with artificial intelligence [38]. The performance of the method could also be tested when data are missing for longer time periods [39], while more sophisticated grid structures could also be examined [18]. Correspondingly, in a broader sense, the main idea of the proposed method could be implemented in the errorsin-variables signal processing for missing data cases [40], or it could be involved in more complex models that require MC simulation for the prior estimation of variables [41].  Acknowledgments: The authors would like to express their gratitude to Panagiotis-Christos Vassiliou and Andreas C. Georgiou for the opportunity to submit to this Special Issue for possible publication. The constructive comments and the valuable suggestions of the three anonymous reviewers as well as the editorial assistance of Bella Chen are highly appreciated.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: