This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In 1960, Rudolf E. Kalman created what is known as the Kalman filter, which is a way to estimate unknown variables from noisy measurements. The algorithm follows the logic that if the previous state of the system is known, it could be used as the prior information for estimating the current state.

In 1960, Rudolf E. Kalman demonstrated an ingenious way to estimate unknown variables from noisy measurements [ ].

Essentially, the algorithm follows the logic that if the previous state of the system is known, it could be used as the prior information for estimating the current state.

Bayesian inference is specifically designed to accommodate the problem of updating what we believe about the world based on partial or uncertain information. It is often remarked that the Kalman filter is a special case of Bayesian inference [ ].

However, Bayes' rule does not assign probabilities; it is only a rule for manipulating them. The MaxEnt method [ ] was designed to assign them.

In this paper, we will show several things: first, the derivation of the general Bayesian filter as used for the class of problems that the Kalman filter is intended for,

Here, we will build the Bayesian filter. We start with Bayes' rule,

P(x_k | Z_k) = P(Z_k | x_k) P(x_k) / P(Z_k),

where x_k is the state at discrete step k and Z_k = {z_k, …, z_1} is the set of all measurements. We then separate the current measurement, z_k, from the past measurements, Z_{k−1}, where Z_{k−1} = {z_{k−1}, …, z_1}, which would give us,

P(x_k | z_k, Z_{k−1}) = P(z_k | x_k, Z_{k−1}) P(x_k | Z_{k−1}) / P(z_k | Z_{k−1}).

At this point, we come to our first key assumption: the current measurement z_k depends only on the current state x_k and not on the past measurements Z_{k−1}, or P(z_k | x_k, Z_{k−1}) = P(z_k | x_k). The density needed to determine the probability of x_k given the past measurements, P(x_k | Z_{k−1}), is not known. However, it can be seen as a marginal,

P(x_k | Z_{k−1}) = ∫ P(x_k | x_{k−1}, Z_{k−1}) P(x_{k−1} | Z_{k−1}) dx_{k−1},

where x_{k−1} is the previous state and P(x_{k−1} | Z_{k−1}) is the previous posterior.
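As a numeric illustration of this marginalization, the following sketch propagates a previous posterior through a transition density on a grid; the drift, variances and grid limits are illustrative assumptions, not values from the text.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)   # state grid
dx = x[1] - x[0]

def gauss(u, mean, var):
    # Gaussian density evaluated elementwise
    return np.exp(-(u - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Previous posterior P(x_{k-1} | Z_{k-1}); mean and variance are assumed.
post_prev = gauss(x, 1.0, 0.5)

# Transition density P(x_k | x_{k-1}): an assumed drift of +0.3 with noise.
trans = gauss(x[:, None], x[None, :] + 0.3, 0.2)   # rows: x_k, cols: x_{k-1}

# Marginal: P(x_k | Z_{k-1}) = integral of P(x_k | x_{k-1}) P(x_{k-1} | Z_{k-1}) dx_{k-1}
prior_k = trans @ post_prev * dx

print(prior_k.sum() * dx)   # ≈ 1: the result is again a proper density
```

The resulting prior is a Gaussian whose mean is shifted by the drift and whose variance is the sum of the two variances, as the analytic marginal predicts.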

The second key assumption of the Kalman filter is that we do not need the past measurements, Z_{k−1}, when trying to determine our belief about the current state given the previous one; that is, the transition density P(x_k | x_{k−1}, Z_{k−1}) = P(x_k | x_{k−1}) is known.

Now, we will include the main Kalman assumptions above, first that all noise is Gaussian and linearly additive. Therefore, we will use Gaussians for our density distributions,

P(z_k | x_k) ∝ exp(−(z_k − μ_z)² / (2σ_z²)) for the likelihood and P(x_k | x_{k−1}) ∝ exp(−(x_k − μ_x)² / (2σ_x²)) for the transition density P(x_k | x_{k−1}).

The next question is deciding on the value of the means, since we are not inferring those, as can be seen from the posterior. For this, we have to look at the “forward” problems for each density function. For the prior, the forward problem is x_k = F_{k,k−1} x_{k−1} + w_{k−1}, where F_{k,k−1} is called the “transition matrix”, and for the likelihood, it is z_k = H_k x_k + v_k, where w_{k−1} and v_k are the zero-mean additive noise terms.
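The two forward problems can be simulated directly; below is a minimal sketch in which the transition and measurement “matrices” are scalars and all numeric values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar forward models: x_k = F x_{k-1} + w_{k-1},  z_k = H x_k + v_k.
F, H = 1.0, 1.0          # transition and measurement coefficients (assumed)
q, r = 0.05, 0.4         # process and measurement noise variances (assumed)

x, xs, zs = 0.0, [], []
for _ in range(20):
    x = F * x + rng.normal(0.0, np.sqrt(q))   # transition with additive Gaussian noise
    z = H * x + rng.normal(0.0, np.sqrt(r))   # noisy measurement of the state
    xs.append(x)
    zs.append(z)

print(len(xs), len(zs))
```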

For the likelihood we have,

P(z_k | x_k) ∝ exp(−(z_k − H_k x_k)² / (2σ_z²)).

Note, this is similar to least squares (which itself is a special case of Bayes). There is one more obvious question to be answered: while this may be a solution for the density function with regard to x_k, what value should be used for x_{k−1}? We need a single number. The answer depends on what is considered the “best guess” or point estimate for x_{k−1}. There are many choices, such as the mean, median or mode. However, since we are dealing with a symmetric solution, they are one and the same. Therefore, the easiest point estimate to use is the mode, x̂_{k−1}, which is the mode from the previous step.

To show its processing workflow, we show a very simple example. We wish to know our one-dimensional position x_k, which evolves as x_k = x_{k−1} + v_0 Δt_k, where v_0 is a known velocity constant and Δt_k is the discretization interval.
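This simple example can be sketched as a scalar predict–update loop; v_0, Δt, the noise variances, the number of steps and the deterministic simulated truth are all illustrative assumptions.

```python
import numpy as np

# 1-D position with known constant velocity: x_k = x_{k-1} + v0 * dt.
v0, dt = 2.0, 0.1
q, r = 0.01, 0.25        # process and measurement noise variances (assumed)

rng = np.random.default_rng(1)
x_true = 0.0             # simulated truth (deterministic here; the filter
x_hat, p = 0.0, 1.0      # still allows a small q); initial mode and variance

for k in range(100):
    x_true += v0 * dt
    z = x_true + rng.normal(0.0, np.sqrt(r))   # noisy position measurement
    # Predict: propagate the previous mode through the transition model.
    x_pred = x_hat + v0 * dt
    p_pred = p + q
    # Update: weight prediction and measurement by their uncertainties.
    K = p_pred / (p_pred + r)
    x_hat = x_pred + K * (z - x_pred)
    p = (1.0 - K) * p_pred

print(x_true, x_hat, p)
```

After many steps the estimate tracks the true position and the posterior variance settles well below the measurement variance.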

One last question that needs to be addressed is what is the value of the uncertainty σ_{k−1}? The last assumption of the Kalman filter is that the MAP estimate of the previous posterior, together with its variance, carries forward all of the information from the past.

These solutions can be manipulated and written in the other following form, as well,

x̂_k = x̄_k + K_k (z_k − H_k x̄_k)  and  σ_k² = (1 − K_k H_k) σ̄_k²,

where x̄_k and σ̄_k² are the predicted mean and variance and K_k is the well-known Kalman gain.
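The equivalence of the two algebraic forms of the update can be checked numerically; all values below are illustrative assumptions.

```python
# Scalar update written two ways; both must agree.
x_pred, p_pred = 1.0, 0.5     # predicted mean and variance (assumed)
z, r = 1.6, 0.2               # measurement and its variance (assumed)

# Product-of-Gaussians (precision-weighted) form:
x_a = (r * x_pred + p_pred * z) / (p_pred + r)
p_a = p_pred * r / (p_pred + r)

# Kalman-gain form:
K = p_pred / (p_pred + r)
x_b = x_pred + K * (z - x_pred)
p_b = (1.0 - K) * p_pred

print(x_a - x_b, p_a - p_b)   # both differences are ~0
```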

First, we present a review of maximum relative entropy. For a more detailed discussion and several examples, please see [ ]. The method updates a prior distribution, P_old(θ), to a new distribution, P_new(θ), by maximizing the relative entropy,

S[P_new, P_old] = −∫ P_new(θ) log ( P_new(θ) / P_old(θ) ) dθ.

The new information is the set of constraints that the updated distribution must satisfy.

We proceed by maximizing S[P_new, P_old] given the constraints. The calculus of variations is used to do this by varying the distribution, i.e., setting δS = 0.

First, we set up the variational form with the Lagrange multipliers,

δ { S[P_new, P_old] + α [ ∫ P_new(θ) dθ − 1 ] + β [ ∫ P_new(θ) f(θ) dθ − F ] } = 0.

In order to determine the Lagrange multipliers, we substitute our solution,

P_new(θ) = e^{(−1+α)} e^{βf(θ)} P_old(θ),

back into the constraints. The Lagrange multiplier α follows from normalization, e^{(1−α)} = ∫ e^{βf(θ)} P_old(θ) dθ, so that,

P_new(θ) = P_old(θ) e^{βf(θ)} / ∫ e^{βf(θ)} P_old(θ) dθ,

where the remaining multiplier β is fixed by the constraint on f(θ).
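The tilted-distribution solution can be illustrated numerically by choosing β (via bisection) so that the expectation constraint holds; the standard-normal prior, the constraint f(θ) = θ and the target value are assumptions for illustration.

```python
import numpy as np

theta = np.linspace(-8.0, 8.0, 2001)
dth = theta[1] - theta[0]
p_old = np.exp(-theta**2 / 2) / np.sqrt(2 * np.pi)   # standard normal prior
f = theta                                            # constrain the mean
F_target = 1.0                                       # required <f> (assumed)

def mean_f(beta):
    # Expectation of f under the tilted density P_old * exp(beta * f), normalized.
    w = p_old * np.exp(beta * f)
    p = w / (w.sum() * dth)
    return (f * p).sum() * dth

# mean_f is monotone in beta, so bisection finds the multiplier.
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if mean_f(mid) < F_target:
        lo = mid
    else:
        hi = mid
beta = 0.5 * (lo + hi)
print(beta)   # for a unit-variance Gaussian prior, beta ≈ F_target
```

For a Gaussian prior, tilting by e^{βθ} simply shifts the mean by β, so the numeric multiplier lands on the analytic value.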

There are works where entropy maximization has been used in Kalman filtering [ ].

This example consists of analyzing a linear system composed of two equations that represent linear motion with constant acceleration a_k, with state variables for position, x_k, and velocity, v_k.

Here, we will derive the so-called “prediction step”, which will be the posterior of the following optimization criterion or entropy, which has the form,

S = −∫ P(x_k, v_k, a_k) log ( P(x_k, v_k, a_k) / P_prior,k(x_k, v_k, a_k) ) dx_k dv_k da_k.

All constraints come from the same Kalman filter assumptions. We derive the first constraint from the transition equations for x_k, v_k and a_k,

x_k = x̂_{k−1} + v̂_{k−1} Δt + (1/2) â_{k−1} Δt² + ε_{x,k−1},
v_k = v̂_{k−1} + â_{k−1} Δt + ε_{v,k−1},
a_k = â_{k−1} + ε_{a,k−1},

integrating over the noise measure dε_{x,k−1} dε_{v,k−1} dε_{a,k−1}, where x̂_{k−1}, v̂_{k−1} and â_{k−1} are estimates of our variables from the previous discretization interval and ε_{x,k−1}, ε_{v,k−1} and ε_{a,k−1} are multivariate normal distribution additive noise variables, which have means of zero. Frequently, the joint prior distribution of the noise variables is defined by four main assumptions:

The means of all noise variables are zero;

The joint distribution function is a multivariate normal distribution;

The covariance matrix is not only valid for the previous posterior distribution discretization interval, but also for the current one, P_k(ε_{x,k−1}, ε_{v,k−1}, ε_{a,k−1}). In other words, it is implied that in our specific case, we have the following equality for the integration measure,

dΩ_{k−1} = dx_{k−1} dv_{k−1} dε_{x,k−1} dε_{v,k−1} dε_{a,k−1},

The last, but not least, assumption in Kalman filtering is that our noise variables are independent of our main state variables, x_k, v_k and a_k.
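Under the assumptions above, the prediction step can be sketched in matrix form for the (position, velocity, acceleration) state; the time step, covariances and initial estimates below are illustrative assumptions.

```python
import numpy as np

dt = 0.1
# Constant-acceleration transition matrix for the state (x, v, a).
F = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])
Q = np.diag([1e-4, 1e-3, 1e-2])          # zero-mean noise covariance (assumed)

x_prev = np.array([0.0, 1.0, 0.5])       # previous estimates (x, v, a) (assumed)
P_prev = np.eye(3) * 0.1                 # previous posterior covariance (assumed)

x_pred = F @ x_prev                      # propagate the point estimate
P_pred = F @ P_prev @ F.T + Q            # propagate the uncertainty

print(x_pred)
```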

Similarly, we can construct two other MrE constraints, based on the Kalman filter assumptions, for the prior P_prior,k(x_k, v_k, a_k).

To be clear, this “posterior” is the result of the prediction step only; it will serve as the prior for the measurement update that follows.

Our focus here is the traditional updating step of the Kalman filter and its reproduction by MrE. The measurement distribution needed would be obtained in a similar manner as in the predictive step, using constraints built from the likelihood, P_likelihood,k(z_k | x_k, v_k, a_k), the constant-acceleration relation a_k = a_{k−1}, and the identification of the updated result with the next prior, P_prior,k+1(x_{k+1}, v_{k+1}).

We will now present the Kalman filter solution, which has the same closed form as in the previous subsection. First, we need to construct our problem in matrix form.

The mathematical model of our state space system, as in the previous sections, is x_k = F_{k,k−1} x_{k−1} + w_{k−1}, with measurements z_k = H_k x_k + v_k.

Sometimes, there are discussions [ ] about which form of the covariance update should be used: Joseph’s stabilized version or the original one.

Summarizing, if our state space system and its corresponding transition matrix are of such a size and/or sparsity that we can get its inverse matrix analytically in its reduced solution without numerical iterations, then selecting which version (Joseph’s stabilized or original) does not matter, because the closed form and the numeric answer would be the same. From a practical point of view, the maximum relative entropy method might be particularly useful for loosely coupled systems (not necessarily small), because its complexity (the total number of Lagrange multipliers) is equal to the total number of transition equations, and it has no explicit difficulty in calculating inverse matrices, because of the variational techniques used.
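The algebraic agreement of Joseph’s stabilized covariance update with the original form, when the gain is the optimal one, can be checked numerically; the matrices below are illustrative assumptions.

```python
import numpy as np

P_pred = np.array([[0.5, 0.1],
                   [0.1, 0.3]])          # predicted covariance (assumed)
H = np.array([[1.0, 0.0]])               # measurement matrix (assumed)
R = np.array([[0.2]])                    # measurement noise covariance (assumed)

S = H @ P_pred @ H.T + R                 # innovation covariance
K = P_pred @ H.T @ np.linalg.inv(S)      # optimal Kalman gain

I = np.eye(2)
P_orig = (I - K @ H) @ P_pred                                   # original form
P_joseph = (I - K @ H) @ P_pred @ (I - K @ H).T + K @ R @ K.T   # Joseph form

print(np.abs(P_orig - P_joseph).max())   # ~0 up to floating-point error
```

With the optimal gain the two forms are algebraically identical; Joseph’s form is preferred numerically because it preserves symmetry and positive definiteness under rounding.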

The fact is that both versions, Joseph’s stabilized form and the original, yield the same closed-form solution under these conditions.

The original Kalman filter has an assumption that the relationships between the state space system’s variables are linear. This assumption allows it to be expressed in a matrix form. Therefore, by definition, the Kalman filter is a linear filter, and nonlinear relationships have no explicit representation in the transition matrix.

In this section, we construct a general transformation of variables. While this can be found in undergraduate texts and advanced literature on Kalman filtering [ ], we include it here for completeness.

The cumulative distribution function (cdf) of the transformed variable Y = g(X) is F_Y(y) = F_X(g^{−1}(y)) when g is monotonically increasing.

Assume we are measuring the distance traveled by a robot in meters (random variable X), but wish to express our distribution in feet (Y = g(X)); the original variable is then recovered as X = g^{−1}(Y).

Current one-to-one assumptions are still more general than the original Kalman filter assumptions, because we allow not only a linear equation system of variables, but also a system of any continuously increasing or decreasing functions. Then, the definition, or the meaning, of the transformed density covers both the increasing and the decreasing case,

f_Y(y) = f_X(g^{−1}(y)) |d g^{−1}(y)/dy|.
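The change-of-variables rule can be checked on the meters-to-feet example; the Gaussian parameters of the distance distribution are assumed for illustration.

```python
import numpy as np

M2FT = 3.280839895                      # feet per meter (a monotone increasing g)

def f_x(x):
    # Assumed density of the distance in meters: N(10, 1).
    return np.exp(-(x - 10.0)**2 / 2.0) / np.sqrt(2 * np.pi)

def f_y(y):
    # Transformed density in feet: f_Y(y) = f_X(g^{-1}(y)) |d g^{-1}(y)/dy|,
    # with g^{-1}(y) = y / M2FT, so the Jacobian factor is 1 / M2FT.
    return f_x(y / M2FT) * (1.0 / M2FT)

# The transformed density still integrates to one.
y = np.linspace(0.0, 70.0, 7001)
print(f_y(y).sum() * (y[1] - y[0]))
```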

We begin the generalization to a multivariate, nonlinear filter. A system of transformations in the measurement space is,

y_i = g_i(x_1, x_2, …, x_n),  i = 1, …, n,

with the state evolution written as x_{k+1} = g(x_k).

In this subsection, we will revisit the Kalman filter example from the previous section. Our state space system is written through a transformation function, and the inverse, g^{−1}(⋯), of the transformation is used to recover the original variables. The prior, P_{k−1}(⋯), is constructed exactly the same as in the previous subsection by eliminating the random noise variables. In other words, it is a multivariate normal distribution with a covariance matrix of,

Kalman demonstrated an ingenious way to estimate unknown variables from noisy measurements, in part by making various assumptions. In this paper, we derived the Bayesian filter and then showed that, by applying the Kalman assumptions, we arrive at a solution that is consistent with the original Kalman filter. For pedagogical purposes, we explicitly showed that the “transition” or “predictive” step is the prior information and the “measurement” or “updating” step is the likelihood of Bayes’ rule. Further, we showed that the well-known Kalman gain is tied to the new uncertainty associated with the posterior distribution.

Recently, a paper [

By applying and manipulating pure probabilistic definitions and techniques used in signal analysis theory, we derived a general, nonlinear filter by constraining the variables of interest to the form of continuous, monotonically increasing or decreasing functions, not necessarily a linear set of functions as in the original Kalman filter. Thus, we can include more information and extend approximation approaches, such as the extended Kalman filter and unscented Kalman filter techniques and other hybrid variants.

In the end, we derived general distributions using MrE for use in Bayes’ Theorem for the same purposes as the original Kalman filter and all of its offshoots. However, MrE can do even more. An important future work will be to include additional information in the form of constraints that Bayes’ rule alone cannot process.

We would like to acknowledge valuable discussions with Julian Center and Peter D. Joseph.

The authors declare no conflict of interest.