
Looking Backward and Looking Forward

Center for Operations Research and Econometrics (CORE), Universite Catholique de Louvain, Voie du Roman Pays 34, B-1348 Louvain-la-Neuve, Belgium
Institut de Statistique, Biostatistique et Sciences Actuarielles (ISBA) and Center for Operations Research and Econometrics (CORE), Universite Catholique de Louvain, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium
Author to whom correspondence should be addressed.
Econometrics 2019, 7(2), 27;
Received: 31 July 2018 / Revised: 18 April 2019 / Accepted: 5 June 2019 / Published: 14 June 2019
(This article belongs to the Special Issue Filtering)


Filtering has had a profound impact as a device of perceiving information and deriving agent expectations in dynamic economic models. For an abstract economic system, this paper shows that the foundation of applying the filtering method corresponds to the existence of a conditional expectation as an equilibrium process. Agent-based rational behavior of looking backward and looking forward is generalized to a conditional expectation process where the economic system is approximated by a class of models, which can be represented and estimated without information loss. The proposed framework elucidates the range of applications of a general filtering device and is not limited to a particular model class such as rational expectations.
Keywords: perception; filter; rational expectations; estimation

1. Perception as a Filter

Many people’s aspirations and desires imply forward-looking decisions. Making a forward-looking decision requires the construction of an expectation based on the information that is backward induced. It can be characterized by a process of making conditional expectations. Such a process allows us to construct our subjective beliefs rather than perceiving the world as mere presentation. Instead, we perceive the world as an object of perception in which our own experience and knowledge are integrated. We become unified with that perception. Thus formulating an expectation of an agent cannot separate the perceiver from the perception. From the market’s perspective, an agent no longer views himself as an individual, but rather becomes a “cognitive subject” of a time-invariant perception where the laws of the economy are revealed. The process of forming conditional expectations is the practical consequence of this identity as it attempts to represent our immersion with the world, and these attempts constitute the essential laws of the world. We refer to this process as a filter.
The equilibrium in rational expectation models is based on the assumption that the agents in the model are confident in their perceptions. As a consequence, the agents trust that the optimal actions following their expectations will give them maximum utilities or profits. Sequential decisions are made under expectations conditioned on past information. Such decision processes induce a representation$^{1}$ of the conditional probability distribution of the economic dynamics: the law of motion perceived by agents will sequentially influence the law of motion that agents actually face. These perceptions can be viewed as filters as they are often characterized by recursive projection schemes (Simon 1959). There is, however, a dichotomy in the understanding of filters in economic theory and econometrics.
In economic theory, the filtering method (perception) is an active process involving the agent’s attention to a small part of the whole dynamic system and excluding almost everything out of the scope of their attention. In econometrics, the filtering method is treated as a passive process selecting some statistically relevant information of a given dynamical model. Both aspects of the filtering method are mainly treated by Sargent (1987) and Hamilton (1994), respectively.$^{2}$ Some other, similar types of perceptions have been discussed in Marcet and Sargent (1989a, 1989b); Hansen and Sargent (2007); and Hansen et al. (2010). Rather than considering a specific economic or econometric model, this paper characterizes general perceptions that are concealed in abstract models where both the active and passive arguments can be integrated. Hansen (2007) examines the inference of rational expectation models from two separate perspectives. The way he reduces the gap between these two perspectives is to enlarge the models used by the economic agents. We extend the scope to a more general class of models for the economic agents under which the integration becomes more natural. The filtering method provides formal representation-estimation processes for practical situations. The relevance of these processes in our setting is that they are not merely statistical techniques, but actual dynamic mechanisms used in expectations and perceptions. The equations and assumptions that appear in the estimation procedure correspond to the perception of economic agents. This is in the spirit of Klein (1950): “The purpose in building econometric models is to describe the way in which the system actually operates. [...] The construction of such a system is a task in which economic theory and statistical method combine.”
In our context, building econometric models is related to the proper specification of filters. Considering filters as a perception device in an abstract economy induces a large class of econometric models. The remaining econometric task is to reduce the abstract representation to a feasible form for estimation. On the other hand, as the complexity of the environment increases, agents learn more and more about the mechanisms and processes that are used to relate themselves to that environment and to achieve their goals. The availability of general implementable techniques in econometrics will elucidate inaccessible places for the abstract models. As the comment in Simon (1959) on modeling human expectation says, “it is one thing to have a set of differential equations, and another thing to have their solutions.” Economic theory predominates in the definition of the representation describing a certain type of economic dynamics, while econometric methods are associated with the determination of the agent’s way of estimation.
The need for reconciling economic theory and econometrics may not be obvious in linear or linearizable structural models when the laws of motion are specified on either theoretical or empirical grounds, and hence either side of the coin will be sufficient to justify the model. However, if we start with an abstract economy, the limitation of modeling tools in this complex environment will force us to integrate all available factors. The generalization of the filtering mechanism will give a fundamental interpretation of agent’s perceptions in complex situations. All subsequent statistical procedures, such as estimation, inference and forecasting, will more or less depend on the way this generalization has been formulated. Our contribution will be to make this generalization available.
Early attempts in this direction were made in the framework of rational expectation models. The expected utility or profit for each agent depends on the assumption of the agent’s perception mechanism. Evans and Ramey (1992) show how agents adjust their long-term expectations under different perception rules or predictions. The standard method assumes that the agent’s prediction uses a single, presumably correct law of motion. But if perceptions of agents differ, then the corresponding predictions are incompatible with each other. A sequence of works by Hansen and Sargent, covered by the monograph Hansen and Sargent (2007), introduces the concern of robustness of agent’s expectations. The main idea is that agent’s decisions contain their prior worries about a possible mis-specification of the model. These multiple priors generalize the perception or the law of motion contained in the agent’s mind. The associated perception mechanism for the robust decision agents is also a filter, called robust filter, and is a dual of the linear-quadratic regulator problem of utilities. Given the robustness concerns, the agent makes the expectation based on a class of models whose information is presumably not far from the underlying model in a certain metric. Therefore, the robust filter generalizes the mechanism used in the single law case.
Our motivation is related to the robust filters of Hansen and Sargent (2007) and Hansen (2007), but the relation is more one of spirit rather than of a precise form. In terms of the robustness framework, our objective of generalizing filtering mechanisms attempts to analyze how large the class of alternative models could be while remaining consistent with some general filter. The class used in Hansen and Sargent (2007) is restricted by a risk threshold and is equivalent to a class of partially specified processes. If we enlarge such a class to an abstract economic system, is the filtering type mechanism still an optimal choice for agents in this system? We ask this question because rational expectation models are merely an approximation of the real world. Exploring the applicable range of a general filtering device will demonstrate its usefulness essentially irrespective of the particular form of the economic model.
The paper aims at making the previous assertions rigorous. The mathematical tools we use are borrowed from stochastic analysis and stochastic control. For expositional purposes we delay the formal results of the paper to Section 4. Section 2 introduces the economic model as a general probability space and anticipates the main result on the existence of filters. In Section 3 we define three claims under which the economic system is supposed to operate and which will allow us to obtain an explicit representation of the filter. Section 5 elaborates on the claims in asset pricing, on a framework with stochastic volatility, and on the link of the general filtering results with those in a linear framework. Section 6 summarizes the main findings. All proofs are relegated to Appendix A.

2. The Model

2.1. An Abstract Economy

The economic system in this paper is driven by components whose evolution is modeled via stochastic processes. We stack these components into a state vector and denote it as $X_t = \{X_{i,t},\ t > 0,\ i = 1, \dots, I_X\}$. When we state specifically $t \in \mathbb{Z}^{+}$, $X_t$ is a discrete time process; otherwise $X_t$ is assumed to be a continuous time process. Throughout the paper, $x$ refers to either a deterministic variable or a realization of $X_t$. The state vector $X_t$ may consist of unobservable features such as private information, utilities or underlying prices.
The underlying abstract economic model in our context is a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where we define $X$ together with a filtration $(\mathcal{F}_t)_{t \ge 0}$. The filtration $\mathcal{F}_t$ is right continuous, $\mathcal{F}_t = \bigcap_{\epsilon > 0} \mathcal{F}_{t+\epsilon}$, and $\mathcal{F} = \lim_{t \to \infty} \mathcal{F}_t$. In $\mathcal{F}$, there is a $\mathbb{P}$-null set$^{3}$ contained in $\mathcal{F}_0$ and consequently in all $\mathcal{F}_t$.
The values of the states of $X_t$ form a measurable space $(\mathbb{S}, \mathcal{S})$. The state space $\mathbb{S}$ is a compact metric space and is associated with a Borel $\sigma$-algebra $\mathcal{S} = \mathcal{B}(\mathbb{S})$. We assume $X_t$ to be measurable and the measurable mapping is:
$$X_t(\omega) : \big([0, \infty) \times \Omega,\ \mathcal{B}([0, \infty)) \otimes \mathcal{F}\big) \to (\mathbb{S}, \mathcal{S}),$$
where $\otimes$ denotes the product operator for $\sigma$-fields.
While the essential features of economic dynamics are assumed to be captured by the state variables $X_t$, the observable economic variables and public information, in general, are not. Since the observable and public information is the major resource for agents to make their expectations about how the economic states $X_t$ change, we specify it by another process $Y_t$. Let $Y_t$ include those observable variables that are related with $X_t$, and assume that the dimension of $Y_t$ is not larger than that of $X_t$. The observable information set is $\mathcal{Y} := \bigvee_{t \in \mathbb{R}^{+}} \mathcal{Y}_t$, the filtration generated by the observable process $Y_t$ such that
$$\mathcal{Y}_t := \sigma(Y_s,\ s \in [0, t]) \vee \mathcal{N}$$
with $t \ge 0$, where $\mathcal{N}$ is the collection of all $\mathbb{P}$-null sets of our economic model $(\Omega, \mathcal{F}, \mathbb{P})$.$^{4}$ The available information $\mathcal{Y}_t$ is induced by observations up to time $t$ and thus it will be used for making inference about $X_t$. Since $\mathcal{F}_t$ is right continuous, to make $Y_t$ compatible with $X_t$, we assume that the filtration $\mathcal{Y}_t$ is also right continuous.
Agents will construct their perceptions of states $X_t$ based on the information in $\mathcal{Y}_t$. It means that agents are able to construct $\pi_t$, the conditional distribution of $X_t$, given $\mathcal{Y}_t$. The conditional expectations characterize the perceptions or filters of $X_t$. For any $t$, the conditional distribution is a stochastic process $(\omega, t) \mapsto \pi_t(\omega)$ such that
$$\pi_t\big(X_t(\omega) \in A\big) = \mathbb{P}\big[X_t \in A \mid \mathcal{Y}_t\big](\omega), \quad A \in \mathcal{S}.$$
For simplicity, we will write $\pi_t(\omega)$ as $\pi_t$.
Since the perceptions are processes, any valuation of X t will involve an expectation w.r.t. the conditional distribution process. The definition of conditional expectation of X t is restricted to an equivalence class of Y t -measurable X such that:
P [ X t B ] = P [ X B ] , X , B Y t .
Then the expectation of a function $\varphi(X_t)$ can be expressed as $\int \varphi(x)\,\pi_t(dx)$. The conditional expectation of $\varphi$ is ultimately what is desired from filtering, but the methods for obtaining the conditional distribution process are quite involved. If this integral is well-defined for a class of functions $\varphi$, then we call them choice functions $\varphi$.$^{5}$
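As a minimal illustration of the integral of a choice function against the conditional distribution, the sketch below replaces the compact state space by a finite set, so that the integral becomes a weighted sum. The state values, the weights and the function name `filtered_expectation` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

# Hypothetical finite state space standing in for the compact state space.
states = np.array([-1.0, 0.0, 1.0, 2.0])

# Hypothetical conditional distribution pi_t over the states (sums to one).
pi_t = np.array([0.1, 0.4, 0.3, 0.2])


def filtered_expectation(phi, states, pi_t):
    """Integral of phi against pi_t, which is a finite weighted sum here."""
    return float(np.dot(phi(states), pi_t))


# A bounded choice function on the finite state space (assumed for the demo).
phi = lambda x: x ** 2

value = filtered_expectation(phi, states, pi_t)
print(value)
```

On a finite state space every function is bounded, so the integral is always well-defined; the difficulty discussed in the text only arises when the conditional distribution itself may fail to be regular.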

2.2. Existence of Filters

In the abstract economy $(\Omega, \mathcal{F}, \mathbb{P})$, we consider perception equivalently as a filter. But perception as a common human behavior should always exist on either individual or aggregate levels. Will filters always exist in this abstract economy? Due to the $\mathbb{P}$-null set $\mathcal{N} \subset \mathcal{Y}_t$, $\pi_t$ may in fact not be well defined for all $\omega \in \Omega$ but only for $\omega$ outside the $\mathbb{P}$-null set. Thus, the question of existence of $\pi_t$ is equivalent to the question under which circumstances one can gain sufficient control over all $\mathbb{P}$-null sets $\mathcal{N}$ such that the expectation $\int \varphi(x)\,\pi_t(dx)$ is well-defined for choice functions $\varphi$. In other words, the filters exist in the economy when perceptions induce well-defined expectations.
The theorem of the existence of filters in our abstract economy will be given in Section 4, Theorem 1. Here, we discuss the consequences of this theorem without presenting too many technical details. Suppose a process ${}^{o}\varphi(\cdot)$ can be thought of as the $\mathcal{Y}_t$-measurable representation of the choice function $\varphi(\cdot)$. The theorem states that given some regularity conditions, for any choice function $\varphi$, the expectation of $\varphi(X_t)$ w.r.t. the filter $\pi_t$ exists and is equal to the process ${}^{o}\varphi(X_t)$ under the $\mathbb{P}$ measure. In other words, agents’ perceptions of $\varphi(\cdot)$ coincide with the observable information. Therefore, the existence of ${}^{o}\varphi(\cdot)$ induces the existence of $\pi_t(\cdot)$ for the choice function $\varphi(\cdot)$ and vice versa.
Note that although relations between $X_t$ and $Y_t$ exist, expectations conditional on $\mathcal{Y}_t$ do not necessarily coincide with those conditional on $\mathcal{F}_t$. In particular, the $\mathbb{P}$-null information originates in $\mathcal{F}_t$, but in $\mathcal{Y}_t$ this information contains unpredictable events that may happen. Once agents observe these unpredictable events, their perceptions will be influenced. We will emphasize this point in the following subsection.

2.3. The Importance of the $\mathbb{P}$-Null Set

Although it complicates the set-up, the $\mathbb{P}$-null set is a crucial feature in $(\Omega, \mathcal{F}, \mathbb{P})$. Apart from its mathematical characteristics, it is meaningful in economic problems and affects our way of evaluating a model using empirical data.
The role of the $\mathbb{P}$-null set in defining a conditional probability was first illustrated by Kolmogorov in his famous Borel–Kolmogorov paradox. The paradox shows that the conditional probability is not uniquely defined with respect to a null set; see Kolmogorov (1956, chp. 5) and Bain and Crisan (2008, chp. 2). From an economic perspective, one can think of the $\mathbb{P}$-null set on $\mathcal{F}$ and $\mathcal{Y}$ as those unexpected events which have been included in the underlying economic mechanism $\mathcal{F}$ and in the agent’s observable information set $\mathcal{Y}$.
Although most events in the $\mathbb{P}$-null set of $\mathcal{F}$ correspond to events in the $\mathbb{P}$-null set of $\mathcal{Y}$, the two null sets are not equivalent. To see the subtle difference, let us assume that the $\mathbb{P}$-null events in $\mathcal{Y}$ result from aggregating countable $\mathbb{P}$-null events in $\mathcal{F}$. The aggregation leads to uncountable events which are too “complex” to be embedded in the underlying model, the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The model $(\Omega, \mathcal{F}, \mathbb{P})$ attributes zero measure to any countable event set that is beyond its explanatory power, but for uncountable event sets the model cannot even affirm their existence.
We illustrate the economic meaning of some $\mathbb{P}$-null sets on $\mathcal{Y}$ by a concept which we call overflow. The effect of this overflow is related to the regularization of the $\mathbb{P}$-null set on $\mathcal{F}$, which is a result of Theorem 1.
To give an example of overflow, consider economic bubbles. There is a long debate whether or not economic bubbles exist. Rather than joining the debate, our intention here is to use bubbles as an example to illustrate overflow characteristics. Suppose some individual gamblers have complex trading strategies, and their gains are publicly observable. These speculative trades, therefore, are included in the information set $\mathcal{Y}_t$ at the agents’ disposal. However, the strategies behind these trades may not be fathomable by the public and are conducted in manifold ways, such as forbidden disclosures (private information), special technical equipment (e.g., high-frequency trading), or even improper policies (lobbying). Any economic model that wants to cover some or all of these specific features will make its complexity explode. This limitation is recognized by the public, and hence it is reasonable for the public to believe that the underlying economic model $(\Omega, \mathcal{F}, \mathbb{P})$ will set zero measure on each of these strategies and the associated actions because they are unexplained by the model. In other words, each action of the trading strategy is in the $\mathbb{P}$-null set on $(\Omega, \mathcal{F}, \mathbb{P})$.
The economic bubble can be considered as an aggregated effect of these trading strategies. Since there are numerous speculations happening every minute, it is natural to think that their aggregation is uncountable. Later, we will show that an uncountable collection of null sets is not necessarily incorporated in the $\mathbb{P}$-null set of $\mathcal{F}$. This means that the aggregated effect, the bubble, may have a positive probability to occur, namely to appear in $\mathcal{Y}$.
To formalize the previous argument, let $A_1, A_2, \ldots \in \mathcal{S}$ be a sequence of pairwise disjoint sets. In order to ensure that $\pi_t$ is a regular conditional distribution, the $\sigma$-additivity condition needs to be satisfied:
$$\pi_t\Big(\bigcup_{i} A_i\Big) = \sum_{i=1}^{\infty} \pi_t(A_i)$$
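for every $\omega \in \Omega \setminus \mathcal{N}(A_i, i \ge 1)$, where $\mathcal{N}(A_i, i \ge 1)$ is the $\mathbb{P}$-null set for the disjoint set $A_i$ for any $i \ge 1$. Let the collection of these null sets be $\mathcal{N}_0$. Note that the power set of all null sets is $2^{\mathcal{N}}$ which is uncountable. This means that $\mathcal{N}_0$ is uncountable. We know that $\pi_t$ satisfies the $\sigma$-additivity condition only if $\omega \notin \mathcal{N}(A_i)$ for any $i \ge 1$ but not $\omega \notin \mathcal{N}_0$. Therefore, some event in $\mathcal{N}_0 \setminus \bigcup_{i=1}^{\infty} \mathcal{N}(A_i)$ is not in the null sets for $\pi_t$ and has positive probability to occur:
$$\mathcal{Y} \cap \Big\{ \mathcal{N}_0 \setminus \bigcup_{i=1}^{\infty} \mathcal{N}(A_i) \Big\} \neq \emptyset.$$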
In fact, the set $\mathcal{N}_0$ need not even be measurable because it is defined in terms of an uncountable union.$^{6}$ Then $\pi_t$ cannot be a probability measure. The purpose of Theorem 1 is to regularize this problem so that the projected $\pi_t$ is on a countable subspace. This regularization implicitly forces $\pi_t$ to ignore those collections of countable $\mathbb{P}$-null sets on $\mathcal{F}$. As a consequence, the abstract economic model might not be a “proper” model for all events, but one that approximates a complex reality.

3. A Feasible Econom(etr)ic Model

As shown in the previous section, the $\mathbb{P}$-null set on $\mathcal{F}$ may induce the arbitrariness of $\pi_t(\omega)$ on $\mathcal{Y}$. For the $\mathbb{P}$-null set on $\mathcal{Y}$, individuals may have arbitrary beliefs about the event sets, because they cannot figure out any “law” on the set. The arbitrariness allows us to modify $\mathcal{Y}_t$-adapted processes by changing the values of these processes on the $\mathbb{P}$-null set, which corresponds to a change of measure. Then the new process should still be $\mathcal{Y}_t$-adapted. It accommodates the complexity of the real world but it induces a class of arbitrary filters $\pi_t$. Due to the arbitrariness, the conditional distribution process $\pi_t(\omega)$ exists even though some observable event sets in the economy are not explained by the underlying model. If the model needs a regular solution, it should be disencumbered of these irregularities. In this section, we look for a feasible model that will regularize the expected process ${}^{o}\varphi(X)$ and exploit a specific representation of it.
With three additional claims, one can obtain an explicit solution rather than an abstract process of ${}^{o}\varphi(X)$. These claims are the following: First, the martingale fairness claim regularizes a class of probabilities that are not uniquely defined on the $\mathbb{P}$-null set on $\mathcal{F}$. Second, the invariance fairness claim induces a specification of $X$ that is embedded in the general model $(\Omega, \mathcal{F}, \mathbb{P})$. Finally, the independent complement claim specifies the motions of the observable process $Y$. The first and second claims basically consider the same issue of finding a feasible sub-class of models of the underlying economy $(\Omega, \mathcal{F}, \mathbb{P})$, but the development of the invariance fairness claim depends on the martingale fairness claim. With the specification of the law of $X_t$, the last claim induces a feasible representation of ${}^{o}\varphi(X_t)$ based on the observable process $Y$.

3.1. Fairness Existence

The following claim introduces a “stochastic constant” upon which we can build our model:
Claim 1
(Martingale Fairness, MF$^{7}$). A probability measure $\mathbb{Q}$ on $(\Omega, \mathcal{F})$ is absolutely continuous with respect to $\mathbb{P}$, such that $\mathbb{Q} \ll \mathbb{P}$. The information of state $X_t$ at any time $t$ is “fair” for all agents under $\mathbb{Q}$ and the information is memoryless, i.e., the process $X_t$ is Markovian.$^{8}$
Fairness means the martingale property of $X$:
$$\mathbb{E}^{\mathbb{Q}}[X_t \mid \mathcal{F}_s,\ s \le t] = X_s \quad \text{and} \quad \mathbb{E}^{\mathbb{Q}}[X_t - X_s \mid \mathcal{F}_s,\ s \le t] = 0.$$
The martingale model $(\Omega, \mathcal{F}, \mathbb{Q})$ is treated as a ghost model since fairness may never happen in reality. However, if one accepts the existence of this martingale model, it will guide us to a feasible base-line model and help us to solve the original problem. If there is a $\mathbb{P}$-martingale process $Z$ on $(\Omega, \mathcal{F})$, then any $\mathbb{Q}$-martingale process $X$ implies a $\mathbb{P}$-martingale process $ZX$, due to the absolute continuity of $\mathbb{Q}$ and $\mathbb{P}$. It is obvious that if a process can be regularized on either measure, then it can also be regularized on the other one.
The Markovian structure of $X$ means that, given $X_t$, the filtration $\mathcal{F}_s$ is independent of the $\mathcal{F}$-adapted $X_u$ if $s < t < u$. For arbitrary times $t < u$, the Markovian structure implies a transition kernel $Q_{u-t}(X_u \mid X_t)$. The Chapman-Kolmogorov equation of the transition kernel is also available such that
$$Q_{u-s}(X_u \mid X_s) = \int Q_{u-t}(X_u \mid X_t)\, Q_{t-s}(dX_t \mid X_s)$$
which can be simply stated as $Q_{\tau + \tau'}(\cdot \mid \cdot) = Q_{\tau'} Q_{\tau}$ for $\tau = t - s$, $\tau' = u - t$. The existence of the kernel $Q_{\tau}(\cdot \mid \cdot)$ is a direct result of the Kolmogorov existence theorem (Kallenberg 2002, Theorem 7.4). It is obvious that the transition kernel $Q_{\tau}(\cdot \mid \cdot)$ is a regular conditional probability.
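The Chapman-Kolmogorov relation can be checked numerically in the simplest possible setting. The sketch below assumes a time-homogeneous Markov chain on three states in discrete time, so that the kernel over several periods is a matrix power; the transition matrix `Q1` and the helper `kernel` are hypothetical and serve only as an illustration of the semigroup property, not as the continuous-state kernel of the paper.

```python
import numpy as np

# Hypothetical one-step transition matrix of a three-state Markov chain.
# Row i holds the conditional distribution of the next state given state i.
Q1 = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.70, 0.20],
    [0.05, 0.25, 0.70],
])


def kernel(steps):
    """Transition kernel over `steps` periods (matrix power of Q1)."""
    return np.linalg.matrix_power(Q1, steps)


tau, tau_prime = 2, 3  # tau = t - s and tau_prime = u - t

lhs = kernel(tau + tau_prime)          # kernel from time s to time u
rhs = kernel(tau) @ kernel(tau_prime)  # sum over the intermediate state X_t

ck_holds = bool(np.allclose(lhs, rhs))
rows_sum_to_one = bool(np.allclose(lhs.sum(axis=1), 1.0))
print(ck_holds, rows_sum_to_one)
```

The matrix product plays the role of the integral over the intermediate state, and each row of the composed kernel remains a probability distribution.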
With the MF claim, in Section 4 Corollary 1, we give a gain-loss (master) equation to describe the dynamics of $X_t$:
$$\frac{\partial}{\partial \tau} Q_{\tau}(X_u \mid X_s) = \int W(X_u \mid X_t)\, Q_{\tau}(dX_t \mid X_s) - \int W(dX_t \mid X_u)\, Q_{\tau}(X_u \mid X_s)$$
where the function $W(X_u \mid X_t)$ is the time derivative of the transition probability at $\tau = 0$, called transition probability per unit time. This equation describes the complete transition pattern of $X$ by showing the variation of the corresponding transition kernel. If $\frac{\partial}{\partial \tau} Q_{\tau}(X_u \mid X_s)$ is set to zero, the evolution of $X$ attains a balance. The equation merely states the fact that the sum of all transitions per unit time into any state $X_t$ must be balanced by the sum of all transitions from $X_t$ into other states. Gain balances loss; in other words, we have a steady state.$^{9}$
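The balance of gain and loss can be made concrete on a finite state space, where the integrals of the master equation reduce to sums. The sketch below is an illustration under that assumption only: the rate matrix `W`, the helper `gain_loss` and the least-squares solve are hypothetical choices, not taken from the paper.

```python
import numpy as np

# Hypothetical transition rates per unit time: W[i, j] is the rate of
# moving from state j into state i (diagonal entries are unused).
W = np.array([
    [0.0, 1.0, 0.5],
    [2.0, 0.0, 1.5],
    [0.5, 1.0, 0.0],
])


def gain_loss(p, W):
    """Right-hand side of the master equation for a distribution p."""
    gain = W @ p              # sum_j W[i, j] * p[j], flows into state i
    loss = W.sum(axis=0) * p  # p[i] * sum_j W[j, i], flows out of state i
    return gain - loss


# Gain minus loss written as a single linear map acting on p.
G = W - np.diag(W.sum(axis=0))

# Steady state: solve G p = 0 together with sum(p) = 1 by least squares.
A = np.vstack([G, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
p_star, *_ = np.linalg.lstsq(A, b, rcond=None)

residual = gain_loss(p_star, W)
print(p_star, residual)
```

At the computed distribution the time derivative vanishes, which is the finite-state analogue of the steady state described above.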

3.2. Invariance Behaviors

With the martingale fairness claim, we have seen that the Markovian model gives us an equation to measure the variation of state transitions of the underlying economy. The equation is valid at any time-point and in any state, but the equation provides no clue about $W(\cdot \mid \cdot)$, the transition probability per unit of time. Now an idea is to extract some information about the statistics of $W(\cdot \mid \cdot)$, in particular first and second moments. This type of information should be able to generate a class of sub-models that mimic the behavior of the original model of $X$. We need to find out under which conditions the sub-model is equivalent to the original one, in which case no loss of information occurs when representing $(\Omega, \mathcal{F}, \mathbb{P})$ by the ghost model $(\Omega, \mathcal{F}, \mathbb{Q})$.
Let $f(\cdot, \cdot)$ be a function satisfying the maximum principle up to second order, which means that for a compact subset of states $B \subset \mathbb{S}$, at time $t$, the maximum of $f(t, x)$ in $x \in B$ is found on the boundary of $B$, $\partial B$. The simplest example of $f$ is a function in the linear functional class such that for fixed $t$, $x < x' \in B$ implies $f(t, x) < f(t, x')$ (or $>$), $\nabla_x f(t, x) \ge 0$ (or $\le$) and $\Delta_x f(t, x) = 0$ on $B \subset \mathbb{R}$. The extremum of $f(t, \cdot)$ always exists on the boundary of the domain. Here $\Delta_x$ and $\nabla_x$ denote the Laplace and gradient operators on $x$, respectively.
Think of $f(\cdot, \cdot)$ as a time-dependent utility or value function. The requirement of $f(\cdot, \cdot)$ being maximal up to second order means that $\Delta_x f(t, x)$ is proportional to $\frac{\partial f}{\partial t}(t, x)$ so that one can set up their relation by some equation, for example
$$\frac{\partial f}{\partial t}(t, x) = \frac{1}{2} \Delta_x f(t, x)$$
which would imply that $X_t$ follows a Wiener process. Thus, the maximum principle pins down a specific evolution class for $X_t$. We have the following claim to incorporate this idea.
Claim 2
(Invariance Fairness, IF). If claim MF is true, then for any $f(t, x)$ satisfying the maximum principle up to second order, there exists a martingale measure such that $f(t, x)$ will preserve the fairness on this measure. The law of $X_t$ will also satisfy the maximum principle.
Theorem 2 in Section 4 will show that the IF claim is another way of specifying Itô’s diffusion problem.$^{10}$ To the best of our knowledge, this is the first time that the problem is motivated on the basis of the maximum principle. Understanding the connection between this economic claim and econometric models will help us to assess the potentials of modeling. That is, before doing estimation, testing, or prediction, it is essential to realize how far the model can reach in principle.
The diffusion structure induces a Wiener process specification for $W(\cdot \mid \cdot)$. The first and second moments of the process are given by
$$\int x\, W(X_t \mid dx), \qquad \int x^2\, W(X_t \mid dx),$$
where $W(\cdot)$ is the transition probability per unit time under the Wiener law. This is a diffusion martingale type model. Given the whole transition contents of $X$, our attention is only restricted to those transitions that will maintain the maximum principle up to second order. The reason is that only the transitions satisfying invariance fairness can be revealed and identified in standard econom(etr)ic models. It does not mean that the unqualified transitions do not exist. On the contrary, many transitions in the system have high order features such as complex trading strategies in pricing, multiple correlated options, etc. What we can state, however, is that those transition features are too complex to be embedded in a diffusion model.$^{11}$ Therefore, those higher order transition laws of $X$ will be assigned to the $\mathbb{P}$-null set in $\mathcal{F}$.
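The link between the second-order equation of this subsection and the Wiener law can be checked numerically. The sketch below assumes a one-dimensional standard Wiener process started at zero, whose transition density is Gaussian with variance equal to the elapsed time. It verifies by finite differences that this density satisfies the equation with the factor one half, and computes its first and second moments by quadrature. The step sizes and grids are arbitrary illustrative choices.

```python
import numpy as np


def wiener_density(t, x):
    """Transition density of a standard Wiener process started at zero."""
    return np.exp(-x ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)


t, dt, dx = 1.0, 1e-4, 1e-3
x = np.linspace(-3.0, 3.0, 13)

# Central finite differences for the time derivative and the Laplacian.
df_dt = (wiener_density(t + dt, x) - wiener_density(t - dt, x)) / (2.0 * dt)
lap = (wiener_density(t, x + dx) - 2.0 * wiener_density(t, x)
       + wiener_density(t, x - dx)) / dx ** 2

# Largest deviation from df/dt = (1/2) * Laplacian over the test points.
max_gap = float(np.max(np.abs(df_dt - 0.5 * lap)))

# First and second moments over a horizon t, by a simple Riemann sum.
grid = np.linspace(-10.0, 10.0, 20001)
h = grid[1] - grid[0]
first_moment = float(np.sum(grid * wiener_density(t, grid)) * h)
second_moment = float(np.sum(grid ** 2 * wiener_density(t, grid)) * h)
print(max_gap, first_moment, second_moment)
```

The first moment vanishes and the second moment equals the horizon, which are the two statistics that the diffusion specification retains.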

3.3. Indifferent Projection

The last component we have not yet exploited is the observable process $Y$. In the economy, the process $Y$ reflects the law of $X$, so the topological structure of $Y$ should contain as much information as $X$. Since the IF claim is nothing but pinning the space of $X$ onto the Wiener space $L^2(W)$, an $L^2$ space with Wiener measure, it is natural to assume that $Y$ can be represented in a similar space.
Given any map $h$ in $L^2$ and $Y = h(X_t)$, if $Y$ can maintain all the information of the martingale diffusion process of $X$, then we say that $Y$ shares an isometry property with $X$. Apart from the information maintained under the isometry property, note that some information in $\mathcal{Y}$, such as the collection of $\mathbb{P}$-null sets in $\mathcal{F}$, is not contained in $X$ but affects the outcome of $Y$. We use measurement errors to represent this information. The following claim specifies the law of $Y$.
Claim 3
(Independent Complement, IC). Let $h(\cdot)$ be a map in $L^2$ which satisfies the maximum principle up to second order as in the IF claim. Suppose the observable process $Y$ is contaminated by an additive generalized Wiener noise $W$, where the noise process $W_t$ is generated by the information set $\mathcal{F}_t$ but is independent of $h(X_t)$.
The information set of $W$ is generated by $\mathcal{F} \setminus \sigma(X)$ where $\sigma(X)$ is the $\sigma$-algebra generated by those $X$ satisfying the MF and IF claims. In practice, the Wiener process is also modeled independently$^{12}$ of $X_t$. Thus $\mathcal{Y}_t$ is a larger filtration than $\mathcal{F}_t$, i.e.,
$$\mathcal{Y}_t = \sigma(X_s, W_s,\ s \in [0, t]) \vee \mathcal{N},$$
since it allows for the measurability of the noise process. The process $Y$ satisfying the IC claim is given as follows:
$$Y_t = \int_0^t h(X_s)\, ds + W_t, \quad t \ge 0.$$
Note that this specification restricts the process $Y$ to $L^2(\mathcal{Y}_t)$ because
$$\mathbb{E}\Big[\int h(X_s)^2\, ds\Big] < \infty, \quad \text{and} \quad W_t \in L^2(\mathcal{Y}_t).$$
Theorem 3 in Section 4 implies that for a class of these models, there will be a concrete way of specifying the conditional expectation process of this class. This theorem is an important step to derive a specific form for the filter. The representation of $X_t$, as a result of the IF claim, induces a feasible conditional expectation for $\varphi(X)$, while the IC claim allows us to attain the expectation of $\varphi(X)$ conditional on the information generated by $Y_t$.
The IC claim has a similar role to the MF claim. The MF claim ensures the existence of a martingale problem for the state process. The IC claim does the same but for the observable process. The aim is to make the state process $X_t$, the diffusion generator in IF and the observable process $Y_t$ comparable.
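The observation model of the IC claim, a time integral of the sensor map plus an independent Wiener noise, can be simulated with an Euler scheme. The sketch below is only an illustration under stated assumptions: the state is taken to be a standard Wiener path, the sensor map is the bounded function `tanh`, and the horizon, step count and seed are arbitrary. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

T, n = 1.0, 1000
dt = T / n

# Hypothetical state: a standard Wiener path started at zero.
dB = rng.normal(0.0, np.sqrt(dt), size=n)
X = np.concatenate([[0.0], np.cumsum(dB)])

# Hypothetical bounded sensor map, so the integral of h(X_s)^2 is finite.
h = np.tanh

# Observation noise increments, drawn independently of the state path.
dW = rng.normal(0.0, np.sqrt(dt), size=n)

# Euler scheme: Y_{k+1} = Y_k + h(X_k) * dt + (W_{k+1} - W_k).
dY = h(X[:-1]) * dt + dW
Y = np.concatenate([[0.0], np.cumsum(dY)])

# Riemann sum approximating the integral of h(X_s)^2 over [0, T].
energy = float(np.sum(h(X[:-1]) ** 2) * dt)
print(Y.shape, energy)
```

Because the sensor map is bounded by one in absolute value, the simulated integral of its square cannot exceed the horizon, which mirrors the square-integrability restriction in the text.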

3.4. An Explicit Representation

The previous claims serve to obtain a representation of the conditional distribution $\pi_t$ for our class of models. Once a representation of $\pi_t$ is available, each model in this class will correspond to a specification of this representation. The data contained in the information $\mathcal{Y}$ will be useful for estimating the parameters of this specification. A specified representation plays a role as a predictor for the corresponding model and observable information.
The filtering problem is, essentially, to determine the conditional distribution $\pi_t$ of $X(\omega)$ at time $t$ given the information accumulated from observing $Y$ in the interval $[0, t]$. Given all three necessary claims, we show in Theorem 4 that for any bounded continuous choice function $\varphi \in C_b(\mathbb{S})$, we can compute the conditional expectation of $\varphi$
$$\pi_t(\varphi) := \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t],$$
via an equation called the Kushner-Stratonovich-Pardoux equation.
Many dynamical estimates consist of computing the conditional distribution of a target process given a partially observed history. As the explicit solution of $\pi_t(\varphi)$ gives the end time marginals of the preceding conditional distributions defined for any bounded $\varphi$, this explicit representation of $\pi_t(\varphi)$ provides a concrete basis of nonlinear estimation problems. This point of view is also at the heart of the Bayesian methodology, where the conditional distribution is the posterior and the path distribution of the states is the prior.
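To make the prior-to-posterior reading concrete, the sketch below runs a discretized Bayes recursion for a hidden two-state Markov chain observed through increments of the form sensor value times time step plus Wiener noise. It is a stand-in on a finite state space, not the Kushner-Stratonovich-Pardoux equation itself. The state values, switching intensity, sensor values, horizon and seed are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen only for reproducibility

T, n = 5.0, 5000
dt = T / n
states = np.array([-1.0, 1.0])  # hypothetical two-point state space
rate = 0.5                      # hypothetical switching intensity
h = 2.0 * states                # hypothetical sensor values h(x)

# One-step transition matrix of the hidden chain (first order in dt).
P = np.array([[1.0 - rate * dt, rate * dt],
              [rate * dt, 1.0 - rate * dt]])

# Simulate the hidden state and the increments dY = h(X) dt + dW.
idx = np.zeros(n, dtype=int)
for k in range(1, n):
    idx[k] = idx[k - 1] if rng.random() > rate * dt else 1 - idx[k - 1]
dY = h[idx] * dt + rng.normal(0.0, np.sqrt(dt), size=n)

# Recursive filter: reweight by the likelihood, normalize, then predict.
pi = np.array([0.5, 0.5])       # prior over the two states
pi_path = np.empty((n, 2))
for k in range(n):
    pi = pi * np.exp(h * dY[k] - 0.5 * h ** 2 * dt)  # likelihood of dY[k]
    pi = pi / pi.sum()                               # posterior at step k
    pi_path[k] = pi
    pi = pi @ P                                      # prior for step k + 1

phi = lambda x: x               # choice function (assumed for the demo)
pi_phi = pi_path @ phi(states)  # estimate of E[phi(X_t) | Y_t] along the path
print(pi_phi[-1])
```

Each pass through the loop turns the current prior into a posterior using one new observation increment and then propagates it through the state dynamics, which is the recursive projection idea that the filter formalizes.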

4. Main Results

The state $X_t$ is $\mathcal{F}_t$-adapted, while the constructed conditional expectation is evaluated by $\pi_t\big(X_t(\omega) \in A\big) = P\big(X_t \in A \mid \mathcal{Y}_t\big)(\omega)$, a $\mathcal{Y}_t$-adapted process. Thus $E[\varphi(X_t) \mid \mathcal{Y}_t] = \int \varphi(x)\,\pi_t(dx)$ may not be well-defined. Let ${}^{o}\varphi(X_t)$ be the counterpart of $\varphi(X_t)$ obtained by projecting onto the smallest $\sigma$-algebra on $\big([0,\infty) \times \Omega,\ \mathcal{B}([0,\infty)) \otimes \mathcal{F}\big)$ such that ${}^{o}\varphi(X_t)$ is $\mathcal{Y}_t$-adapted and measurable. Our first result shows that, for any choice function $\varphi \in B(\mathbb{S})$, the conditional expectation of $\varphi(X_t)$ coincides with ${}^{o}\varphi(X_t)$ almost surely. This result implies the existence of filters in our abstract economy.
With an enlarged $\sigma$-algebra, a representative process can be defined for $\varphi(X_t)$ even if $\varphi(X_t)$ is not $\mathcal{Y}_t$-adapted (Rogers and Williams 2000, Theorem 7.1). This result is known as the projection theorem and is used in the proof of Theorem 1. It states that if a process $X$ is measurable and bounded, then there is an optional process ${}^{o}X$ such that, for every stopping time $T$,
$${}^{o}X_T\, I_{\{T<\infty\}} = E\big[X_T\, I_{\{T<\infty\}} \mid \mathcal{Y}_T\big], \tag{3}$$
where ${}^{o}X$ is the projection of $X$ onto $\mathcal{Y}$ and $I_{\{A\}}$ is the indicator function of a set $A$. No restriction is imposed on the stopping time $T$. The notion of optional processes is due to Meyer (1976); see also (Doob 1983, pp. 388–98) and Brémaud and Yor (1978). Projecting an $\mathcal{F}$-measurable element onto $\mathcal{Y}$ is similar to formulating a filter of $X$ given the observable information in $\mathcal{Y}$. We apply this result to show the existence of a filter in the abstract economy.
Theorem 1.
Let $\mathcal{P}(\mathbb{S})$ denote the space of all probability measures on $\mathbb{S}$. For a compact set $\mathbb{S}$ and its Borel $\sigma$-algebra $\mathcal{S}$, there is a $\mathcal{P}(\mathbb{S})$-valued conditional distribution process $\pi_t$ such that for any bounded $\mathcal{S}$-measurable function $\varphi \in B(\mathbb{S})$,
$$P\left(\int_{\mathbb{S}} \varphi(x)\,\pi_t(dx) = {}^{o}\varphi(X_t),\ \forall t\right) = 1.$$
This distribution process is an equilibrium process for the economy whose underlying states are in a probability space $(\Omega, \mathcal{F}, P)$ and whose observable information is contained in $\mathcal{Y}$.
Theorem 1 implies the existence of $\pi_t$ for the abstract model given in Section 2. Its importance is that a perception of any bounded choice function always exists in this abstract economy, although there could be multiple ways of forming the perception because of the incompatibility between the underlying and observable layers of the economy.
Following the model in Section 2, we specify a uniquely representable law of perception by using the claims in Section 3. First, the martingale fairness (MF) claim in Section 3.1 regularizes a class of probabilities of these processes that are not uniquely defined on the $P$-null sets of $\mathcal{F}$, so that the analysis of the model can rely on one ghost model $(\Omega, \mathcal{F}, Q)$. The claim also reveals that the evolution of the state $X$ is completely captured by the transition kernel $Q_\tau(\cdot \mid \cdot)$, whose variation describes the variation of the evolution pattern of $X$. Thus, the MF claim extracts important characteristics of the underlying dynamics.
Corollary 1.
The martingale model $(\Omega, \mathcal{F}, Q)$ implies a gain-loss equation for the system such that:
$$\frac{\partial}{\partial \tau} Q_\tau(X_u \mid X_s) = \int W(X_u \mid X_t)\, Q_\tau(dX_t \mid X_s) - \int W(dX_t \mid X_u)\, Q_\tau(X_u \mid X_s).$$
The first term is the gain of state $X_u$ due to transitions from other states $X_t$, and the second term is the loss due to transitions from $X_u$ into other states.
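To make the gain-loss structure concrete, the following sketch integrates the equation for a finite state space with an explicit Euler scheme. The two-state rate matrix and the step size are hypothetical choices made only for illustration; `W[u][t]` stands for the transition rate from state `t` into state `u`.

```python
def master_equation_step(p, W, dtau):
    """Euler step of dp_u/dtau = sum_t W[u][t] p_t - sum_t W[t][u] p_u.

    W[u][t] is the (hypothetical) transition rate from state t into state u.
    """
    n = len(p)
    new_p = []
    for u in range(n):
        gain = sum(W[u][t] * p[t] for t in range(n) if t != u)
        loss = sum(W[t][u] for t in range(n) if t != u) * p[u]
        new_p.append(p[u] + dtau * (gain - loss))
    return new_p

# Two-state example: rate 2.0 from state 0 to 1, rate 1.0 from state 1 to 0.
W = [[0.0, 1.0],
     [2.0, 0.0]]
p = [1.0, 0.0]                      # start in state 0 with certainty
for _ in range(10_000):
    p = master_equation_step(p, W, dtau=0.001)
```

Gains and losses cancel in the aggregate, so total probability is conserved at every step, and the distribution settles where the flow from state 0 to state 1 balances the reverse flow.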
The equation in Corollary 1 needs further specification because $W(X_u \mid X_t)$ is unspecified. With the invariance fairness (IF) claim in Section 3.2, we can obtain a specification within the Itô diffusion problem. The following theorem gives the equivalence.
Theorem 2.
For $X_t \in (\mathbb{S}, \mathcal{S})$ and $f(t, \cdot) \in C_b$, the following are equivalent:
(i) If claim IF is true, any $f(t, X_t)$ in $C_b([0,\infty), \mathbb{S})$ has an approximating model that relies on the information contained in the first two moments of the process $f(t, X_t)$.
(ii) The function $f(t, X_t)$ is an Itô diffusion process with drift and diffusion terms $(a, b) = (a(X_t), b(X_t))$.
For a diffusion-type process, the first- and second-order moments describe the full dynamics. Thus we can specify $W$.
Corollary 2.
If $P \in \mathcal{M}(\mathfrak{P}(\mathbb{S}))$, then
$$\left( f(X_t) - f(X_0) - \int_0^t (A f)(X_s)\, ds,\ \mathcal{PB}_t,\ P \right)$$
is a martingale, where $A := a(\cdot)\nabla_x + \frac{1}{2} b(\cdot)\Delta_x$. In addition, if $a(\cdot)$ and $b(\cdot)$ are bounded and continuous, the weak solution of the diffusion problem $(a, b)$ is unique. Then
$$a(X_t) = \int x\, W(X_t \mid dx), \qquad b(X_t) = \int x^2\, W(X_t \mid dx),$$
where the transition probability per unit time $W(\cdot)$ has the Wiener law.
By the properties of characteristic functions, the martingale in Corollary 2, together with the initial condition, captures all the information in $W$, namely its first- and second-order moments. This implies that $A(\cdot)$ captures the first two moments of the process $f(X_t)$ on the economic model $(\Omega, \mathcal{F}, P)$. Therefore the process $f(X_t)$ is a diffusion-type process on the Wiener path.
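The martingale property in Corollary 2 can be checked numerically in the simplest case. The sketch below is a Monte Carlo illustration under assumed choices: the state is a standard Wiener process (drift 0, diffusion coefficient 1) and the test function is the square, so the generator applied to it equals 1 everywhere. The path count, step count and seed are arbitrary.

```python
import math
import random

def martingale_residual(t=1.0, n_steps=50, n_paths=20_000, seed=0):
    """Monte Carlo mean of M_t = f(X_t) - f(X_0) - int_0^t (A f)(X_s) ds.

    Illustrative choice: X is a standard Wiener process (a = 0, b = 1) and
    f(x) = x**2, so (A f)(x) = 0.5 * b * f''(x) = 1 and M_t = X_t**2 - t.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, integral = 0.0, 0.0
        for _ in range(n_steps):
            integral += 1.0 * dt                 # (A f)(X_s) ds with A f = 1
            x += rng.gauss(0.0, math.sqrt(dt))   # Wiener increment
        total += x * x - 0.0 - integral          # f(X_t) - f(X_0) - integral
    return total / n_paths

residual = martingale_residual()
```

Since a martingale started at zero has zero mean, the sample average should be close to zero up to Monte Carlo error.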
In fact, the IF claim does nothing but pin the problem onto the Wiener space $L^2(\mathcal{W})$, an $L^2$ space with Wiener measure. The martingale representation theorem says that any continuous martingale, i.e.,
$$\left( M_{f,t} := f(X_t) - f(X_0) - \int_0^t (A f)(X_s)\, ds,\ \mathcal{F}_t,\ P \right)$$
generated by $W$, can be written as
$$M_f = E[M_f] + \int_0^T h_s\, dW_s$$
with a predictable process $h_s$, i.e., each $h_s$ is $\mathcal{Y}_t$-measurable for $t < s$. Without loss of generality, we consider the case $E[M_f] = 0$. The functional space of $h_s$ is
$$L_T^2 := \left\{ h_s : h_s \text{ is } \mathcal{F}_t\text{-predictable and } E \int_0^T h_s^2\, ds < \infty \right\}.$$
The stochastic integral of $h$ is a map $J : L_T^2 \to L^2(\mathcal{F}_T)$ such that
$$J(h) = \int_0^T h_s\, dW_s.$$
This map is an isometry as a consequence of the Itô isometry theorem. The image of the Hilbert space $L_T^2$ under $J$ is complete. Therefore, the martingale $M_f$ and the stochastic integral $J(h) \in L^2(\mathcal{F}_T)$ are isometric.
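The isometry can be illustrated with a small Monte Carlo experiment. The sketch assumes a deterministic integrand equal to the time variable on the unit interval, approximates the stochastic integral by left-endpoint (Itô) sums, and compares its second moment with the time integral of the squared integrand. All numerical settings are illustrative.

```python
import math
import random

def ito_isometry_check(n_steps=100, n_paths=10_000, seed=1):
    """Compare E[(int_0^1 h_s dW_s)^2] with int_0^1 h_s^2 ds for h_s = s."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    grid = [k * dt for k in range(n_steps)]      # left endpoints (Ito sums)
    second_moment = 0.0
    for _ in range(n_paths):
        integral = sum(s * rng.gauss(0.0, math.sqrt(dt)) for s in grid)
        second_moment += integral ** 2
    lhs = second_moment / n_paths                # sample E[J(h)^2]
    rhs = sum(s * s * dt for s in grid)          # Riemann sum of h^2
    return lhs, rhs

lhs, rhs = ito_isometry_check()
```

Both quantities approximate one third, the exact value of the time integral for this integrand.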
What we emphasize here is that the IF claim carries us to an $L^2$ space where the classical projection techniques are available.
Theorem 3.
If the MF and IF claims hold, then the IC claim in Section 3.3 implies the representation (1) for the observable process $Y_t$. Suppose $E \exp\left( \frac{1}{2} \int h(X_s)^2\, ds \right) < \infty$; then the following statements are true:
(i) There exists a measure $\tilde{P}$ such that
$$\left. \frac{d\tilde{P}}{dP} \right|_{\mathcal{F}_t} = \exp\left( -\int_0^t h(X_s)\, dW_s - \frac{1}{2} \int_0^t h(X_s)^2\, ds \right),$$
and, under the measure $\tilde{P}$, $Y$ is independent of $X$. In addition, the law of $X$ is the same under $\tilde{P}$ and under $P$.
(ii) For any $\mathcal{F}_t$-measurable random variable $\varphi(X)$,
$$\tilde{E}\big[\varphi(X) \mid \mathcal{Y}_t\big] = \tilde{E}\big[\varphi(X) \mid \mathcal{Y}\big]$$
where $\mathcal{Y} = \bigvee_{t \in \mathbb{R}_+} \mathcal{Y}_t$ and $\mathcal{Y}_t = \sigma(Y_s,\ s \in [0,t]) \vee \mathcal{N}$.
The time-invariant $\sigma$-algebra in Theorem 3 enables us to use techniques based on Kolmogorov’s conditional expectation, which would not be applicable if the conditioning set were time dependent, such as $\mathcal{Y}_t$.
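Part (i) can be illustrated in a special case where everything is explicit. The sketch below assumes a constant observation drift `c`, so that the observation at time `T` is `c*T` plus a Wiener variable and the density has the exponential form of a Girsanov change of measure. Reweighting by the density should give the observation the first two moments of a Wiener process. The constants and sample size are illustrative assumptions.

```python
import math
import random

def reweighted_moments(c=0.5, T=1.0, n_paths=50_000, seed=2):
    """Mean and second moment of Y_T under the reweighted measure.

    Illustrative special case: h(x) = c is constant, so Y_T = c*T + W_T and
    the density is Z_T = exp(-c*W_T - 0.5*c**2*T).
    """
    rng = random.Random(seed)
    mean, second = 0.0, 0.0
    for _ in range(n_paths):
        w = rng.gauss(0.0, math.sqrt(T))
        y = c * T + w
        z = math.exp(-c * w - 0.5 * c * c * T)   # change-of-measure weight
        mean += z * y
        second += z * y * y
    return mean / n_paths, second / n_paths

mean_y, second_y = reweighted_moments()
```

Under the original measure the observation has mean `c*T`; after reweighting, the drift is removed and the variance equals the elapsed time.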
With the existing results, we summarize the model specification in Section 3 as the following pair $(X, Y)$. $X$ is a solution of the martingale problem for $(A; \pi_0)$; in other words, assume that the distribution of $X_0$ is $\pi_0$ and that the process $M_f = \{M_{f,t},\ t \geq 0\}$, where
$$M_{f,t} = f(X_t) - f(X_0) - \int_0^t A f(X_s)\, ds, \qquad t \geq 0,$$
is an $\mathcal{F}_t$-adapted martingale for any $f \in C_b$, and $(A f)(\cdot)$ corresponds to $(a(\cdot), b(\cdot))$ of a diffusion process. $Y$ satisfies the evolution equation
$$Y_t = \int_0^t h(X_s)\, ds + W_t, \qquad t \geq 0,$$
with null initial condition.
Our aim is then to connect the martingale problem in (4) with a diffusion-type representation. Theorem 2 tells us that when the process is on the Wiener path, the solution of a martingale problem associated with the second-order differential operator is the solution of the diffusion process. Theorem 3 tells us that $Y$ is on the Wiener path under $\tilde{P}$.
Finally, we reach a specific representation of the perceptions in the abstract economy.
Theorem 4.
(Kushner-Stratonovich-Pardoux, KSP) For any $\varphi \in C_b(\mathbb{S})$, Proposition A1 implies
$$\pi_t(\varphi) = \pi_0(\varphi) + \int_0^t \pi_s(A\varphi)\, ds + \int_0^t \big( \pi_s(\varphi h) - \pi_s(\varphi)\, \pi_s(h) \big) \big( dY_s - \pi_s(h)\, ds \big), \tag{5}$$
where $\pi_t$ is the equilibrium density process for our general filtering setting. The conditional expectation $\pi_t(\varphi) = E[\varphi(X_t) \mid \mathcal{Y}_t]$ evolves according to (5).
Equation (5) is called the KSP equation; it has recently been applied to solve nonlinear filtering and smoothing problems in applied mathematics, see (Bensoussan 2004). One can think of the KSP representation as characterizing an equilibrium conditional expectation over any $\varphi \in C_b(\mathbb{S})$. It is a stochastic PDE problem and has a unique solution.13
Theorem 4 is a rather general result. Although solving the KSP problem can be reduced to solving a parabolic PDE problem, finding a closed-form expression for the distribution functions of (5) can be very demanding, except in the case of a linear model with Gaussian disturbances and initial conditions.
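When the hidden state takes finitely many values, the filtering equation reduces to a finite system of stochastic differential equations (often called the Wonham filter), which can be integrated numerically. The sketch below is a plain Euler discretization for a hypothetical two-state chain. The jump rates `Q`, the observation drifts `h`, the step size and the clipping safeguard are assumptions made for illustration and are not part of the theorem.

```python
import math
import random

def wonham_filter(Q, h, dt, n_steps, seed=3):
    """Euler scheme for the filtering equation with a two-state hidden chain.

    Q[i][j] is the jump rate from state i to j (i != j), h[i] the observation
    drift in state i. For phi = indicator of state i the equation reads
    d pi_i = sum_j pi_j Q[j][i] dt + pi_i (h_i - pi(h)) (dY - pi(h) dt).
    """
    rng = random.Random(seed)
    state, pi = 0, [0.5, 0.5]
    path = []
    for _ in range(n_steps):
        # Simulate the hidden chain and the observation increment dY.
        if rng.random() < Q[state][1 - state] * dt:
            state = 1 - state
        dy = h[state] * dt + rng.gauss(0.0, math.sqrt(dt))
        # Filter update: generator term plus gain times innovation.
        pi_h = pi[0] * h[0] + pi[1] * h[1]
        innovation = dy - pi_h * dt
        drift = [pi[1] * Q[1][0] - pi[0] * Q[0][1],
                 pi[0] * Q[0][1] - pi[1] * Q[1][0]]
        pi = [pi[i] + drift[i] * dt + pi[i] * (h[i] - pi_h) * innovation
              for i in range(2)]
        # Numerical safeguard (not part of the equation): clip and renormalise.
        pi = [min(max(p, 1e-12), 1.0) for p in pi]
        total = pi[0] + pi[1]
        pi = [p / total for p in pi]
        path.append((state, pi[1]))
    return path

path = wonham_filter(Q=[[0.0, 0.5], [0.5, 0.0]], h=[-2.0, 2.0],
                     dt=0.001, n_steps=5_000)
```

Each entry of `path` pairs the simulated hidden state with the filtered probability of state 1, so the two can be compared along the sample path.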

5. Remarks

We now comment on the preceding claims, on the modeling procedure within the general framework, and on the relation between the general filtering results and the linear ones.

5.1. Claims in Asset Pricing Models

The three claims of Section 3 have counterparts in asset pricing models. For illustration purposes, we consider only a simple situation where $X$ and $Y$ are measurable and observable.
Let $X$ be the price of some security contingent on an underlying asset $S$. Suppose that the price at time $t$ is a random variable $X_t = \int_0^t H_s\, dS_s$, where the integral is the Itô integral and $H_t$ is predictable, i.e., each $H_t$ is $\mathcal{F}_s$-measurable for $s < t$. The “fairness” in MF says that any $X$ constructed in this way has zero expected pay-off for some discounted price process under a probability measure $Q$ such that $E^{Q}\big[ \exp\big( -\int_t^T r_s\, ds \big) (X_T - X_t) \mid \mathcal{F}_t \big] = 0$, where $r_t$ is the risk-free short rate process. If this holds, then by the fundamental theorem of asset pricing, the securities market admits no arbitrage. $Q$ is called the equivalent martingale measure.
Markov uncertainty is often assumed for diffusion processes. The IF claim pins down a specific transition of the security process as a diffusion process. This implies that the process $X$ is also a diffusion process. Suppose that $dX_t = \mu_t\, dt + \sigma_t\, dW_t$, where $(\mu_t, \sigma_t)$ characterize the instantaneous drift and volatility, respectively, of this security. By Girsanov’s theorem, there is a risk-free measure $\tilde{P}$ for the function $\varphi(X_t)$ satisfying the maximum principle up to second order. In particular, if the market is complete, then for any diffusion process $Y$ one can obtain $Y$ by some self-financing strategy $h(\cdot)$ such that $Y_t = h(X_t)$.
All consequences induced by these three claims, e.g., no-arbitrage, diffusion path, and risk-free measures, are familiar to economists and should be acceptable for most dynamic models in econom(etr)ics.

5.2. Stochastic Volatility

Consider an equity model with stochastic volatility as in Elliott and Swishchuk (2007),
$$dS_t = \mu S_t\, dt + z(X_t) S_t\, dW_t$$
where $S_t$ is the price of a stock, the function $z(\cdot)$ is known, and $X_t$ is a hidden-state Markov process. An example is the Heston model, where $z(x) = \sqrt{x}$ and
$$dX_t = \kappa (m - X_t)\, dt + \gamma \sqrt{X_t}\, dB_t.$$
In general, if we observe a continuum of prices, then $z(X_t)$ is measurable with respect to the filtration generated by $\{S_\tau : \tau \leq t\}$. Let the observable process be $Y_t = \log S_t$, and notice that14
$$dY_t = \left( \mu - \frac{1}{2} z^2(X_t) \right) dt + z(X_t)\, dW_t.$$
The noise $W_t$ of $Y_t$ is scaled by the diffusion coefficient function $z(X_t)$. The corresponding $\tilde{P}$ for $Y$ is
$$\left. \frac{d\tilde{P}}{dP} \right|_{\mathcal{F}_t} = \exp\left( -\int_0^t \frac{\mu - z^2(X_s)/2}{z(X_s)}\, dW_s - \frac{1}{2} \int_0^t \left( \frac{\mu - z^2(X_s)/2}{z(X_s)} \right)^2 ds \right),$$
so that under $\tilde{P}$ the process is independent of $X_t$.
We can discretize the model. For a fixed $t > 0$, let $(t_k)_k$ be a partition of $[0, t]$; then the quadratic variation of $Y$ is the cumulative variance
$$[Y]_t = \lim_{\sup_k (t_{k+1} - t_k) \to 0} \sum_k (\Delta Y_{t_k})^2 = \int_0^t z^2(X_\tau)\, d\tau$$
where $\int_0^t z^2(X_\tau)\, d\tau$ is $\mathcal{Y}_t$-measurable. If $z(X_t)$ is a continuous process, we have
$$\frac{d}{dt} [Y]_t = z^2(X_t)$$
which is also $\mathcal{Y}_t$-measurable. Thus, if $z(X_t)$ is a continuous process, the volatility is observable for almost every $t$, and $X_t$ is observable if $z^{-1}$ exists.
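The identification of integrated variance from squared increments can be seen in a simple simulation. In the sketch below, a deterministic volatility path stands in for the volatility of the hidden state purely for illustration; the drift `mu`, the horizon, the number of steps and the seed are likewise assumed values.

```python
import math
import random

def realized_variance(z, mu=0.05, t=1.0, n_steps=20_000, seed=4):
    """Sum of squared increments of Y with dY = (mu - z^2/2) dt + z dW.

    `z` maps time to volatility; a deterministic path stands in for z(X_t)
    purely for illustration.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    rv, integrated = 0.0, 0.0
    for k in range(n_steps):
        vol = z(k * dt)
        dy = (mu - 0.5 * vol ** 2) * dt + vol * rng.gauss(0.0, math.sqrt(dt))
        rv += dy * dy                    # realized (quadratic) variation
        integrated += vol ** 2 * dt      # Riemann sum of z^2
    return rv, integrated

rv, integrated = realized_variance(
    lambda s: 0.2 + 0.1 * math.sin(2.0 * math.pi * s)
)
```

On a fine partition the realized variance computed from observed increments alone is close to the integrated squared volatility, which is the sense in which volatility is observable.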
With the Markov structure for $X_t$, one can price derivatives on $S_t$. The Black-Scholes price of a European call option $C_{BS}(t, S_t; T, K, Z_{[t,T]})$ in the presence of Markovian volatility is
$$E\big[ C_{BS}(t, S_t; T, K, Z_{[t,T]}) \mid \mathcal{Y}_t \big] = \int C_{BS}(t, S_t; T, K, x)\, \pi_t(dx),$$
where $Z_{[t,T]} = \frac{1}{T - t} \int_t^T z^2(X_s)\, ds$ and $E[\cdot]$ is taken with respect to the market’s pricing measure. Given $X_t$ and the parameters of its dynamics under the market measure, we can compute the expected return of the call option by taking a filtering expectation.
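As a rough sketch of this filtering expectation, the code below averages the Black-Scholes call price over a discrete distribution of volatility levels. The support `vols` (expressed as volatilities, that is, square roots of average variance), the weights `probs`, and the constant short rate `r` are hypothetical inputs standing in for the filter distribution and market data.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s, k, tau, sigma, r=0.0):
    """Black-Scholes call price; `r` is an assumed constant short rate."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)

def filtered_call_price(s, k, tau, vols, probs):
    """Average the Black-Scholes price over a discrete filter distribution."""
    return sum(p * bs_call(s, k, tau, v) for v, p in zip(vols, probs))

vols = [0.1, 0.2, 0.3]       # hypothetical support of the volatility level
probs = [0.2, 0.5, 0.3]      # hypothetical filter weights pi_t
price = filtered_call_price(100.0, 100.0, 1.0, vols, probs)
```

Because the call price is increasing in volatility, the filtered price lies strictly between the prices at the lowest and highest volatility in the support.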

5.3. Kalman Filter

If $h(X_t)$ and $f(X_t)$ can be linearized at every time $t$ as $H_t X_t + h_t$ and $F_t X_t + f_t$, with matrices $H_t$, $F_t$ and vectors $h_t$, $f_t$, such that
$$X_t = X_0 + \int_0^t (F_s X_s + f_s)\, ds + \int_0^t \sigma_s\, dV_s, \tag{6}$$
$$Y_t = \int_0^t (H_s X_s + h_s)\, ds + W_t, \tag{7}$$
then KSP with test functions $\varphi = x_i$ and $\varphi = x_i x_j$ gives the standard Kalman filter, also called the Kalman-Bucy filter, as follows. Let $\hat{x}$ be the conditional mean of $X$ such that
$$\hat{x}_{i,t} = E[X_{i,t} \mid \mathcal{Y}_t]$$
and $R$ be the conditional covariance such that
$$R_t^{ij} = E[X_{i,t} X_{j,t} \mid \mathcal{Y}_t] - E[X_{i,t} \mid \mathcal{Y}_t]\, E[X_{j,t} \mid \mathcal{Y}_t].$$
If (6) and (7) are acceptable localizations for (4), then $\hat{x}_t$ satisfies the following SDE
$$d\hat{x}_t = (F_t \hat{x}_t + f_t)\, dt + R_t H_t^{\top} \big( dY_t - (H_t \hat{x}_t + h_t)\, dt \big), \tag{8}$$
where we substitute $f(\hat{x}_t) = F_t \hat{x}_t + f_t$ and $h(\hat{x}_t) = H_t \hat{x}_t + h_t$ into the KSP Equation (5), and where the covariance term $R_t$ satisfies the deterministic Riccati equation15
$$\frac{dR_t}{dt} = \sigma_t \sigma_t^{\top} + F_t R_t + R_t F_t^{\top} - R_t H_t^{\top} H_t R_t. \tag{9}$$
Equations (8) and (9) together give the Kalman-Bucy filter scheme. One can see that this scheme is a special case of the KSP equation. While we feature the Kalman filter in this paper, there are other well-known filtering methods, including particle filters and the Zakai equation, that are related to the KSP problem.
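The Kalman-Bucy scheme is easy to simulate in the scalar case. The following sketch uses an Euler discretization with constant coefficients and zero intercepts; the parameter values, step size and seed are illustrative assumptions. It also compares the Riccati solution with the positive root of the stationary Riccati equation.

```python
import math
import random

def kalman_bucy_scalar(F, H, sigma, x0, r0, dt, n_steps, seed=5):
    """Euler discretisation of the scalar Kalman-Bucy filter (f = h = 0).

    Signal:      dX = F X dt + sigma dV
    Observation: dY = H X dt + dW
    Filter:      dx_hat = F x_hat dt + R H (dY - H x_hat dt)
    Riccati:     dR/dt = sigma^2 + 2 F R - H^2 R^2
    """
    rng = random.Random(seed)
    x = x0 + math.sqrt(r0) * rng.gauss(0.0, 1.0)   # true (hidden) state
    x_hat, R = x0, r0                              # filter mean and variance
    for _ in range(n_steps):
        dv = rng.gauss(0.0, math.sqrt(dt))
        dw = rng.gauss(0.0, math.sqrt(dt))
        dy = H * x * dt + dw
        x += F * x * dt + sigma * dv
        x_hat += F * x_hat * dt + R * H * (dy - H * x_hat * dt)
        R += (sigma ** 2 + 2.0 * F * R - H ** 2 * R ** 2) * dt
    return x, x_hat, R

F, H, sigma = -1.0, 2.0, 0.5
x, x_hat, R = kalman_bucy_scalar(F, H, sigma, x0=0.0, r0=1.0,
                                 dt=0.001, n_steps=10_000)
# Positive root of sigma^2 + 2 F R - H^2 R^2 = 0.
R_steady = (F + math.sqrt(F ** 2 + H ** 2 * sigma ** 2)) / H ** 2
```

The conditional variance does not depend on the observed path, which is why the Riccati equation can be solved offline and converges to its stationary value regardless of the data.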

6. Conclusions

We have started with the fact that filtering is an intrinsic element of economic phenomena. For a general abstract economy, we provide a result on the existence of filtering mechanisms. We emphasize a subtlety due to null sets that may lead to peculiar events with positive probability after aggregation even though on an individual level such events have zero probability. This feature turns out to be crucial for the understanding and interpretation of the economic model. It also has to be regularized in the derivation of the existence result.
By introducing three natural claims, we established a representation of the conditional distribution process and, hence, of the filtering device. The general representation is nonlinear and subject to estimation using statistical methods. We have outlined the realm of economic models for which this representation is applicable. The implication of our findings for the way economic theory and econometrics interact in general has yet to be discovered.

Author Contributions

Z.G. and C.M.H. have contributed equally to all parts of the manuscript.


Funding

This research received no external funding.


Acknowledgments

The authors would like to express their gratitude to Ken Judd and Peter C. B. Phillips for useful discussions at the early stage of this manuscript, and to the participants in the seminars at the University of Amsterdam. All remaining errors are ours.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Main Theorems

Appendix A.1.1. Proof of Theorem 1

The proof consists of four steps: (1) construct a countable vector space $\mathcal{U}$ on $\mathbb{S}$; (2) define a non-negative process that corresponds to the elements of $\mathcal{U}$; (3) extend $\mathcal{U}$ to the space of continuous bounded functions, $C_b(\mathbb{S})$, check that the definition of the process is still valid, and find a representation of $\pi_t$; (4) extend $C_b(\mathbb{S})$ to $B(\mathbb{S})$ and check that all properties are still valid.
Step 1. Since $\mathbb{S}$ is compact, $C_b(\mathbb{S})$ is separable, so it contains a countable family whose linear span is dense. Let $\{\varphi_i\}_{i=1}^{\infty}$ be the set of basis functions in the linear span; thus every $\varphi_i$ is bounded and continuous. Let $\mathcal{U}$ be the countable vector space generated by finite linear combinations of $\{\varphi_1, \dots, \varphi_n\}$ with rational coefficients such that
$$\mathcal{U} := \left\{ \varphi = \sum_{i=1}^{n} \alpha_i \varphi_i,\ \alpha_i \text{ is rational for all } i \right\}.$$
These $\varphi_i$ are still linearly independent for any $i \in \mathbb{Z}_+$. Set $\varphi_1 = 1$.
Step 2. For any $t$, $\varphi_n(X_t)$ is another $\mathcal{F}_t$-adapted process16. Equation (3) implies that a $\mathcal{Y}_t$-adapted optional process $g_n^t$ exists for $\varphi_n(X_t)$. Thus a sequence $\{g_i^t\}_{i=1}^{n}$ corresponds to $\{\varphi_i(X_t)\}_{i=1}^{n}$. For some $N \in \mathbb{Z}$, linear independence implies that a function $\varphi \in \mathcal{U}$ is uniquely represented by $\sum_{i=1}^{N} \alpha_i \varphi_i$ and, furthermore, that the $\mathcal{Y}_t$-adapted process $g^t$ corresponding to $\varphi$ is linearly and uniquely represented by $\sum_{i=1}^{N} \alpha_i g_i^t$. We can define the linear functional
$$\Lambda_t^{\omega}(\varphi) = g^t(\omega), \qquad \forall t.$$
Because the conditional distribution is a non-negative process, we need to construct a non-negative analog of $\Lambda_t^{\omega}$. Define the subset
$$\mathcal{U}^+ := \{ u \in \mathcal{U},\ u \geq 0 \}$$
which is countable. For $u \in \mathcal{U}^+$ and fixed $t$, we define the null set for $u$ as
$$\mathcal{N}(u) := \left\{ \omega \in \Omega : \Lambda_t^{\omega}(u) < 0 \right\}.$$
Since $u \geq 0$, in order to show that $\mathcal{N}(u)$ is a $P$-null set, we need to show that $\Lambda_t^{\omega}(u) \geq 0$ almost surely. If $u(\omega) \geq 0$ almost surely, then by Equation (3) the optional process is non-negative on $\mathcal{Y}_T$ and hence $\mathcal{N}(u)$ is a $P$-null set. The union of $\mathcal{N}(u)$ over $\mathcal{U}^+$,
$$\mathcal{N} := \bigcup_{u \in \mathcal{U}^+} \mathcal{N}(u)$$
is a countable union of $P$-null sets and hence itself $P$-null. A new process $\bar{\Lambda}_t^{\omega}$ is defined as
$$\bar{\Lambda}_t^{\omega}(\varphi) := \begin{cases} \Lambda_t^{\omega}(\varphi), & \omega \notin \mathcal{N}, \\ 0, & \omega \in \mathcal{N}. \end{cases}$$
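Step 3. In order to extend the definition of $\bar{\Lambda}_t^{\omega}(\varphi)$ to $\varphi$ outside $\mathcal{U}$, we first need to check that $\bar{\Lambda}_t^{\omega}$ is bounded. It is obvious that $\bar{\Lambda}_t^{\omega}(1) = 1$. For $\varphi \in \mathcal{U}$, the uniform norm has the property that $|\varphi| \leq \|\varphi\|_{\infty} 1$. Then $\|\varphi\|_{\infty} 1 \pm \varphi \geq 0$, and from step 2 we know
$$\bar{\Lambda}_t^{\omega}\big( \|\varphi\|_{\infty} 1 \pm \varphi \big) \geq 0 \quad \Longrightarrow \quad \|\varphi\|_{\infty} \pm \bar{\Lambda}_t^{\omega}(\varphi) \geq 0$$
where the second inequality follows from the linearity of $\bar{\Lambda}_t^{\omega}$ and $\bar{\Lambda}_t^{\omega}(1) = 1$. It implies
$$\sup_t \big| \bar{\Lambda}_t^{\omega}(\varphi) \big| \leq \|\varphi\|_{\infty},$$
so that $\bar{\Lambda}_t^{\omega}$ is bounded.
Take any $\varphi \in C_b(\mathbb{S})$. Since $\mathcal{U}$ is dense in $C_b(\mathbb{S})$, there exists a sequence $\varphi_k \in \mathcal{U}$ such that $\varphi_k \to \varphi$. We can define
$$\tilde{\Lambda}_t^{\omega}(\varphi) := \begin{cases} \bar{\Lambda}_t^{\omega}(\varphi), & \varphi \in \mathcal{U}, \\ \lim_{k \to \infty} \Lambda_t^{\omega}(\varphi_k), & \varphi \in C_b(\mathbb{S}) \setminus \mathcal{U} \end{cases}$$
over $C_b(\mathbb{S})$. For boundedness, we only need to check the case $\varphi \in C_b(\mathbb{S}) \setminus \mathcal{U}$. Note that for any two sequences $\varphi_k$ and $\varphi_j$, if $\varphi_k \to \varphi$ and $\varphi_j \to \varphi$, we have
$$\sup \big| \tilde{\Lambda}_t^{\omega}(\varphi_k) - \tilde{\Lambda}_t^{\omega}(\varphi_j) \big| \leq \|\varphi_k - \varphi\|_{\infty} + \|\varphi - \varphi_j\|_{\infty}$$
by the boundedness result in $\mathcal{U}$ and the triangle inequality. Thus, $\tilde{\Lambda}_t^{\omega}(\varphi)$ is bounded.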
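We also need to ensure that the optional process of $\tilde{\Lambda}_t^{\omega}(\varphi)$ is well-defined on $C_b(\mathbb{S})$. For $\varphi_k$ in $\mathcal{U}$, we have a $\mathcal{Y}_t$-adapted process $\tilde{\Lambda}_t^{\omega}(\varphi_k)$ for $\varphi_k(X_t)$, and
$$\begin{aligned} E\big[ \tilde{\Lambda}_T^{\omega}(\varphi)\, I_{\{T<\infty\}} \big] &= \lim_{k \to \infty} E\big[ \tilde{\Lambda}_T^{\omega}(\varphi_k)\, I_{\{T<\infty\}} \big] \\ &= \lim_{k \to \infty} E\big[ \varphi_k(X_T)\, I_{\{T<\infty\}} \big] \\ &= E\big[ \varphi(X_T)\, I_{\{T<\infty\}} \big]. \end{aligned}$$
The last equality follows from the dominated convergence theorem for bounded sequences.
Since $\mathbb{S}$ is compact, the Riesz representation theorem implies the existence of $\pi_t^{\omega}$ such that
$$\tilde{\Lambda}_t^{\omega}(\varphi) = \int_{\mathbb{S}} \varphi(x)\, \pi_t^{\omega}(dx) = \langle \pi_t^{\omega}, \varphi \rangle = \pi_t^{\omega} \varphi, \qquad \text{for all } t,$$
for any bounded and well-defined inner product.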
Step 4. The last step is to extend the definition of $\pi_t^{\omega} \varphi$ to incorporate $\varphi \in B(\mathbb{S})$. Let $\bar{B}(\mathbb{S})$ be the subset of $B(\mathbb{S})$ on which $\pi_t^{\omega} \varphi$ is a $\mathcal{Y}_t$-adapted optional process of $\varphi(X_t)$. It is obvious that $C_b(\mathbb{S}) \subset \bar{B}(\mathbb{S})$. Note that the Borel $\sigma$-algebra generated by $B(\mathbb{S})$ is $\mathcal{B}(\mathbb{S})$. By the completeness of $C_b(\mathbb{S})$, we can construct a sequence of subsets $\{\bar{B}_i(\mathbb{S})\}_i$ such that
$$\bar{B}_1(\mathbb{S}) \subset \bar{B}_2(\mathbb{S}) \subset \cdots.$$
Compactness of $\mathbb{S}$ implies that $\mathcal{B}(\mathbb{S})$ is closed under finite intersections. From the construction in step 1, we know that the constant function is included in every $\bar{B}_i(\mathbb{S})$. The monotone class theorem implies $\bigcup_i \bar{B}_i(\mathbb{S}) \supseteq B(\mathbb{S})$, since any monotone non-negative increasing sequence $\{\bar{B}_i(\mathbb{S})\}_i$, with the indicator function of every set in $\mathcal{S}$, contains the $\sigma$-algebra $\mathcal{B}(\mathbb{S})$, which is closed under finite intersections. Thus $\bar{B}(\mathbb{S})$ contains every bounded $\mathcal{S}$-measurable function on $\mathbb{S}$. As $\bar{B}(\mathbb{S})$ is a subset of $B(\mathbb{S})$, we conclude $\bar{B}(\mathbb{S}) = B(\mathbb{S})$. □

Appendix A.1.2. Proof of Theorem 2

From (ii) to (i), the proof follows directly from Itô’s calculus.
From (i) to (ii), the proof consists of the following four steps: (1) show that the maximum principle on smooth functions is equivalent to the law of Wiener processes; (2) show that the invariance of the law is preserved on the Wiener path; (3) set up the approximation on the Wiener path by showing that the martingale fairness is preserved; and (4) extend the result to the model $(\Omega, \mathcal{F}, P)$.
Step 1. The maximum principle is simply the first- and second-derivative conditions of calculus. If a function $f : \mathbb{S} \to \mathbb{R}$ attains its maximum at a point $x \in \mathbb{S}$, then $\nabla_x f(x) = 0$ and $\Delta_x f(x) \leq 0$. Furthermore, if $f$ is a time-dependent function $f : [0, T) \times \mathbb{S} \to \mathbb{R}$ on a certain time interval $[0, t]$, and $f$ attains its maximum at $x$ at time $t$, then $\partial f(t, x) / \partial t \geq 0$ with $\nabla_x f(t, x) = 0$ and $\Delta_x f(t, x) \leq 0$. The inequality $\partial f(t, x) / \partial t \geq 0$ expresses the uncertainty of the future: $\partial f(\cdot, x) / \partial t$ could either strictly increase along $t$ or attain its optimum at $t$. Since the maximum principle is preserved up to the second order, we have the heat equation
$$a(x) \frac{\partial f}{\partial t}(t, x) + \frac{b(x)}{2} \Delta_x f(t, x) = 0. \tag{A1}$$
Without loss of generality, in steps 1 to 3 we only consider the standard case with the diffusion factors $a(x) = b(x) = 1$, but (A1) holds for any real vectors $a(x)$ and $b(x)$. The solution of (A1) is the well-known Wiener process.
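Step 2. In order to formalize the concept of the Wiener path, we need to introduce the path space. Suppose that a series of realizations $\{x_{t_i}\}_{t_i \leq t_N}$ corresponds to $t$ via $x_{t_i} = \psi(t_i)$ for $t_i \leq t_N$. Then $\psi : [0, \infty) \to \mathbb{S}$ is a continuous path with image in the complete separable space $\mathbb{S}$. The path space $\mathfrak{P}(\mathbb{S}) = C([0, \infty), \mathbb{S})$ is the space of continuous paths $\psi$. The $\sigma$-algebra $\mathcal{PB}$ is
$$\mathcal{PB}_s := \sigma\big( \psi(t) : t \in [0, s] \big), \qquad s \in [0, \infty),$$
generated by $\psi \in \mathfrak{P}(\mathbb{S}) \mapsto \psi(t) \in \mathbb{S}$. The measure $\mathcal{W}$ on $\mathfrak{P}(\mathbb{S})$ is called the Wiener measure if, for a sequence $\{\psi(t_i)\}_{t_i \leq t_N} = \{x_{t_i}\}_{t_i \leq t_N}$:
$$\mathcal{W}\big( \psi : x_{t_1} \in A_{t_1}, \dots, x_{t_N} \in A_{t_N} \big) = \int_{A_{t_1}} \cdots \int_{A_{t_N}} \frac{1}{\sqrt{2\pi (t_1 - t_0)}} e^{-\frac{(x_{t_1} - x_{t_0})^2}{2 (t_1 - t_0)}} \cdots \frac{1}{\sqrt{2\pi (t_N - t_{N-1})}} e^{-\frac{(x_{t_N} - x_{t_{N-1}})^2}{2 (t_N - t_{N-1})}}\, dx_{t_1} \cdots dx_{t_N}.$$
The measure is tight in the sense that, if $|t - s| < \epsilon$,
$$\lim_{\epsilon \to 0} \sup_{\psi \in \mathfrak{P}(\mathbb{S})} \sup_{0 \leq s \leq t \leq T} \rho\big( \psi(t), \psi(s) \big) = 0$$
for any metric $\rho(\cdot, \cdot)$. This is the Ascoli-Arzelà criterion for compact subsets.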
We need to show that the invariance property of $\mathcal{W}$ is a restatement of the independent and identical increment property.
Identical: Note that a function $f$ over $\psi$ does not change the expression except that $\psi(t)$ is replaced by $f(\psi(t))$. By Lemma 3.4.3 and Theorem 3.4.16 (Kolmogorov’s Criterion) of Stroock (2000), we have that for a subset $\mu$ of all tight measures $\mathcal{M}(\mathfrak{P}(\mathbb{S}))$ and $\psi \in \mathfrak{P}(\mathbb{R})$:
$$\sup_{\mu \in \mathcal{M}(\mathfrak{P}(\mathbb{S}))} E^{\mu} |\psi(t) - \psi(s)|^{r} \leq C_T |t - s|^{1 + \alpha},$$
where $C_T < \infty$ is a constant, $\alpha > 0$ and $r \geq 1$. Then we have
$$\lim_{t \to s} \sup_{\psi \in \mathfrak{P}(\mathbb{S})} \frac{(\psi(t) - \psi(s))^2}{(t - s)} = \lim_{t \to s} \sup_{\psi \in \mathfrak{P}(\mathbb{S})} \left( \frac{\psi(t) - \psi(s)}{t - s} \right)^2 (t - s) \to 0.$$
This means that the increments are controlled by the length of the time interval. When the interval is extremely small, all increments are essentially treated the same. So the smooth function $f$ does not matter for the law of $\mathcal{W}$.
Independent: For $\psi, \varpi \in \mathfrak{P}(\mathbb{R})$, let $\varpi(t) = \psi(t + s) - \psi(s)$. By the definition of the Wiener measure, $\psi(s)$ and $\varpi(t)$ are associated with $\mathcal{W}$ on the time paths $[0, s]$ and $[0, t]$, respectively. Clearly, they are independent.
Step 3. The reason for seeking a martingale representation is to find a “stochastic constant”. In the deterministic case, suppose we define an integral curve of $\psi(\cdot)$ on a smooth vector field $a$ on $\mathbb{R}$, starting at $x \in \mathbb{R}$. Then the path $\psi$ with $\psi(0) = x$ has the property that
$$f(\psi(t)) - \int_0^t \langle a, \nabla_x f \rangle (\psi(\tau))\, d\tau$$
is a constant17 for any $f \in C^{\infty}$. If there is a stochastic analog, then we can use this stochastic constant to establish our approximating model. The aim is to maintain a stable “error”.
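Recall the path space $\mathfrak{P}(\mathbb{S})$ and its $\sigma$-algebra $\mathcal{PB}$. For an incremental element $\psi(t) - \psi(s)$ on $\mathfrak{P}(\mathbb{R})$, the Fourier transform is:
$$E^{\mathcal{W}}\left[ e^{i \xi (\psi(t) - \psi(s))} \mid \mathcal{PB}_s \right] = \int e^{i \xi x} \frac{1}{\sqrt{2\pi (t - s)}} e^{-x^2 / 2 (t - s)}\, dx = e^{-\frac{|\xi|^2}{2} (t - s)}$$
where $x = \varpi(t - s) = \psi(t) - \psi(s)$. What we want to obtain is a martingale and a “constant” under $\mathcal{W}$. From the above equation, it is easy to see that we can obtain both simultaneously if we shift the element $\exp(i \xi \psi(t))$ by a Gaussian factor $\exp(|\xi|^2 t / 2)$:
$$\begin{aligned} E^{\mathcal{W}}\left[ e^{i \xi \psi(t)} e^{\frac{1}{2} |\xi|^2 t} \mid \mathcal{PB}_s \right] &= e^{\frac{1}{2} |\xi|^2 t}\, E^{\mathcal{W}}\left[ e^{i \xi (\psi(t) - \psi(s) + \psi(s))} \mid \mathcal{PB}_s \right] \\ &= e^{\frac{1}{2} |\xi|^2 t} e^{-\frac{|\xi|^2}{2} (t - s)}\, E^{\mathcal{W}}\left[ e^{i \xi \psi(s)} \mid \mathcal{PB}_s \right] \\ &= E^{\mathcal{W}}\left[ e^{i \xi \psi(s)} e^{\frac{1}{2} |\xi|^2 s} \mid \mathcal{PB}_s \right] = 1. \end{aligned}$$
Let a triplet denote this martingale on the Wiener path $\mathcal{W}$:
$$\left( \exp\left( i \xi \psi(t) + \frac{1}{2} |\xi|^2 t \right),\ \mathcal{PB}_t,\ \mathcal{W} \right). \tag{A2}$$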
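A quick Monte Carlo experiment illustrates why the Gaussian factor produces a "constant". The sketch assumes a Wiener path started at zero and estimates the unconditional mean of the shifted exponential at a fixed time; the frequency `xi`, the time `t`, the sample size and the seed are arbitrary illustrative choices.

```python
import cmath
import math
import random

def exponential_martingale_mean(xi=1.0, t=1.0, n_paths=50_000, seed=6):
    """Monte Carlo estimate of E[exp(i*xi*psi(t) + 0.5*xi**2*t)], psi(0) = 0."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_paths):
        psi_t = rng.gauss(0.0, math.sqrt(t))     # Wiener path at time t
        total += cmath.exp(1j * xi * psi_t + 0.5 * xi ** 2 * t)
    return total / n_paths

estimate = exponential_martingale_mean()
```

The characteristic function of the Wiener increment decays at exactly the rate that the Gaussian factor grows, so the estimate stays near one for any choice of `t`.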
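We define the Fourier transform of $f$ by $\mathscr{F} f(\xi) = \int f(x) e^{-i \xi x}\, dx$, and the inverse Fourier transform by $\mathscr{F}^{-1} f(\xi) = \int f(x) e^{i \xi x}\, dx$.
As in the deterministic case, the ideal representation of $f(t, \psi(t))$ on $\mathcal{W}$ is the path integral:
$$\int_0^t \left( \nabla_x f + \frac{1}{2} \Delta_x f \right) (\tau, \psi(\tau))\, d\tau.$$
We need to check whether the approximation error is a “constant” in the stochastic sense. Note that
$$f(t, x) = (2\pi)^{-1} \iint e^{i (\xi t + \xi x)} (\mathscr{F}^{-1} f)\, d\xi\, d\eta.$$
By the property $\mathscr{F}^{-1}(\nabla_x)(\cdot) = i \xi\, \mathscr{F}^{-1}(\cdot)$, we have
$$\mathscr{F}^{-1}\left( \nabla_x f + \frac{1}{2} \Delta_x f \right) = \left( i \xi - \frac{1}{2} |\xi|^2 \right) (\mathscr{F}^{-1} f).$$
The approximation error is
$$f(t, \psi(t)) - \int_0^t \left( \nabla_x f + \frac{1}{2} \Delta_x f \right) (\tau, \psi(\tau))\, d\tau = (2\pi)^{-1} \iint \underbrace{\left[ e^{i (\xi t + \xi \psi(t))} - \int_0^t e^{i (\xi \tau + \xi \psi(\tau))} \left( i \xi - \frac{1}{2} |\xi|^2 \right) d\tau \right]}_{M_\xi(t)} (\mathscr{F}^{-1} f)\, d\xi\, d\eta.$$
The Fourier term $\mathscr{F}^{-1} f$ is bounded and irrelevant for $\mathcal{W}$. If $M_\xi(t)$ is a martingale under $\mathcal{W}$, then the error is a stochastic constant. Rewrite $M_\xi(t)$ as:
$$M_\xi(t) = e^{i \xi t} e^{i \xi x} - \int_0^t e^{i \xi \psi(\tau)} e^{i \xi \tau}\, d\left( \left( i \xi - \frac{1}{2} |\xi|^2 \right) \tau \right).$$
The second term can be written as
$$\int_0^t e^{i \xi \psi(\tau) + \frac{1}{2} |\xi|^2 \tau}\, d\left( e^{i \xi \tau} \cdot e^{-\frac{1}{2} |\xi|^2 \tau} \right)$$
and the first term can be written as $e^{i \xi t - \frac{1}{2} |\xi|^2 t}\, e^{i \xi \psi(t) + \frac{1}{2} |\xi|^2 t}$. Fubini’s lemma together with (A2) implies that
$$E^{\mathcal{W}}\left[ M_\xi(t) \mid \mathcal{PB}_s \right] = 1 \cdot E^{\mathcal{W}}\left[ e^{i \xi t - \frac{1}{2} |\xi|^2 t} - \int_0^t d\left( e^{i \xi \tau - \frac{1}{2} |\xi|^2 \tau} \right) \,\Big|\, \mathcal{PB}_s \right] = 1.$$
Thus $\left( f(t, \psi) - \int_0^t \left( \nabla_x f + \frac{1}{2} \Delta_x f \right) (\tau, \psi)\, d\tau,\ \mathcal{PB}_t,\ \mathcal{W} \right)$ is a martingale.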
Now we consider the general case in (A1). If the state moves with velocity $a(X_t)$, the path derivative becomes $a(\cdot) \nabla f$. Moreover, the Laplace operator $\Delta$ in the heat Equation (A1) may be associated with a volatility coefficient $b(\cdot)$. Then the approximating model is given by
$$\int_0^t \left( a(X_s) \nabla_x f + \frac{1}{2} b(X_s) \Delta_x f \right) ds,$$
which is the integral of the Feller generator $A$ applied to $f$:
$$A := a(\cdot) \nabla_x + \frac{1}{2} b(\cdot) \Delta_x.$$
The generator is a dual representation of a diffusion process $(a, b)$ such that
$$dX_t = a(X_t)\, dt + \sigma(X_t)\, dV_t$$
where $b(X_t) = \sigma(X_t)^{\top} \sigma(X_t)$ and $V_t$ is a Wiener process.
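The correspondence between the pair of coefficients and a diffusion path can be sketched with an Euler-Maruyama simulation. The example below assumes a mean-reverting drift and a constant volatility (an Ornstein-Uhlenbeck process) so that the simulated mean can be compared with its known closed form; the parameter values, step count and seed are illustrative.

```python
import math
import random

def euler_maruyama(a, sigma, x0, t, n_steps, rng):
    """Simulate one path of dX = a(X) dt + sigma(X) dV up to time t."""
    dt = t / n_steps
    x = x0
    for _ in range(n_steps):
        x += a(x) * dt + sigma(x) * rng.gauss(0.0, math.sqrt(dt))
    return x

kappa, m, gamma = 1.5, 0.2, 0.3          # hypothetical coefficients
rng = random.Random(7)
samples = [
    euler_maruyama(lambda x: kappa * (m - x), lambda x: gamma,
                   x0=1.0, t=1.0, n_steps=100, rng=rng)
    for _ in range(5_000)
]
sample_mean = sum(samples) / len(samples)
# Closed-form mean of the Ornstein-Uhlenbeck process at time 1.
exact_mean = m + (1.0 - m) * math.exp(-kappa * 1.0)
```

The drift coefficient alone determines the evolution of the mean, while the diffusion coefficient governs the spread of the simulated terminal values around it.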
Step 4. Since the martingale with initial condition $\mathcal{W}(\psi(0) = x) = 1$ completely characterizes $\mathcal{W}$, the above result can be extended to any $P$ by the Principle of Accompanying Laws and Donsker’s Invariance Principle (Theorem 3.1.14 and 3.4.20, Stroock 2000) if and only if $P$ belongs to the family of all tight measures, $\mathcal{M}(\mathfrak{P}(\mathbb{S}))$. In our setup, $\mathbb{S}$ is a compact metric space, so the collection of $P(\cdot)$ over $\mathbb{S}$ is tight. The Principle of Accompanying Laws says that if a sequence is in a complete separable space with tight measure, the law of this sequence converges weakly. Donsker’s Invariance Principle says that for independent increment processes, the limiting law is the law of the Wiener process. Therefore, $P \in \mathcal{M}(\mathfrak{P}(\mathbb{S}))$ and
$$\left( f(X_t) - f(X_0) - \int_0^t (A f)(X_s)\, ds,\ \mathcal{PB}_t,\ P \right)$$
is a martingale.
The IF claim says that a martingale exists for $f(t, X_t)$ on $(\Omega, \mathcal{F}, P)$. The maximum principle restricts the process to be $\mathcal{PB}_t$-adapted; thus $\mathcal{F} \subseteq \mathcal{PB}$ and the result holds on $(\mathbb{S}, \mathcal{S})$ with the probability space $(\Omega, \mathcal{F}, P)$. □

Appendix A.1.3. Proof of Theorem 3

(i) Part of the proof follows Propositions 3.13 and 3.15 in Bain and Crisan (2008). The boundedness condition
$$E \exp\left( \frac{1}{2} \int h(X_s)^2\, ds \right) < \infty,$$
is called Novikov’s condition. Under this condition, Girsanov’s theorem implies that $Z_t$ defined as
$$\left. \frac{d\tilde{P}}{dP} \right|_{\mathcal{F}_t} = Z_t := \exp\left( -\int_0^t h(X_s)\, dW_s - \frac{1}{2} \int_0^t h(X_s)^2\, ds \right)$$
is an $\mathcal{F}_t$-adapted martingale. The martingale representation theorem implies that
$$W_t + \left\langle \int_0^t h(X_s)\, dW_s,\ W_t \right\rangle_t = W_t + \int_0^t h(X_s)\, ds = Y_t,$$
where $\langle \cdot, \cdot \rangle_t$ is the quadratic variation such that $\langle W_t, W_t \rangle_t = t$. Thus, for $d\tilde{P} = Z_t\, dP$, $Y_t$ is a Wiener process with respect to $\tilde{P}$:
$$\begin{aligned} E\left[ e^{W_t + \int_0^t h(X_s)\, ds}\, e^{-\int_0^t h(X_s)\, dW_s - \frac{1}{2} \int_0^t h(X_s)^2\, ds} \right] &= E\left[ e^{\int_0^t (1 - h(X_s))\, dW_s + \int_0^t \left( h(X_s) - \frac{1}{2} h^2(X_s) \right) ds} \right] \\ &= e^{t/2}\, E\left[ e^{\int_0^t (1 - h(X_s))\, dW_s - \frac{1}{2} \int_0^t (1 - h(X_s))^2\, ds} \right] = e^{t/2}. \end{aligned}$$
The last line is the result of (A2).
The law of the pair process $(X, Y)$ can be written as
$$(X_t, Y_t) = (X_t, W_t) + \left( 0, \int_0^t h(X_s)\, ds \right).$$
Thus, on an arbitrary time interval $[0, t]$, under the $\tilde{P}$-law, the law of $(X_t, W_t)$ is absolutely continuous with respect to the law of the pair process $(X_t, Y_t)$. For any bounded measurable function $\varphi$ defined on the product path space of $(X, Y)$, we have
$$\tilde{E}\big[ \varphi(X_t, Y_t) \big] = E\big[ \varphi(X_t, Y_t) Z_t \big] = E\big[ \varphi(X_t, W_t) \big].$$
Therefore, $X$ and $Y$ are independent under $\tilde{P}$ since $X$ and $W$ are independent.
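(ii) Under the probability measure $\tilde{P}$, the law of the process $Y$ is completely specified as an $\mathcal{F}_t$-adapted Wiener process with independent increments. Hence, define the $\sigma$-algebra $\mathcal{Y}^{t} = \sigma(Y_{t+u} - Y_t)$ for any $u \geq 0$. Note that $\mathcal{Y}_t$ and $\mathcal{Y}^{t}$ are independent. By the properties of conditional expectation,
$$\tilde{E}\big[ \varphi(X_t) \mid \mathcal{Y}_t \big] = \tilde{E}\big[ \varphi(X_t) \mid \sigma(\mathcal{Y}_t, \mathcal{Y}^{t}) \big].$$
Since $\mathcal{Y}^{t}$ includes all the incremental information after time $t$,
$$\sigma(\mathcal{Y}_t, \mathcal{Y}^{t}) = \mathcal{Y}_t \vee \mathcal{Y}^{t} = \mathcal{Y},$$
and $\mathcal{Y}$ is a time-invariant $\sigma$-algebra. □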

Appendix A.1.4. Proof of Theorem 4

The proof follows the results given in Rozovskii (1991). First we give the Zakai equation, and then we show that KSP is a normalized Zakai equation.
Proposition A1.
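(Zakai Equation) If $\tilde{E}[\varphi(X_t) Z_t \mid \mathcal{Y}_t]$ is bounded under $\tilde{P}$, where $Z_t$ here denotes the reciprocal of the density in Theorem 3,
$$Z_t = \exp\left( \int_0^t h(X_s)\, dW_s + \frac{1}{2} \int_0^t h(X_s)^2\, ds \right),$$
then for any $\varphi \in C_b(\mathbb{S})$ the process $\rho_t(\varphi) := \tilde{E}[\varphi(X_t) Z_t \mid \mathcal{Y}_t]$ follows
$$\rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \rho_s(A\varphi)\, ds + \int_0^t \rho_s(\varphi h)\, dY_s \tag{A3}$$
$\tilde{P}$-almost surely.
Substituting $dW_s = dY_s - h(X_s)\, ds$, we obtain
$$Z_t = \exp\left( \int_0^t h(X_s)\, dY_s - \frac{1}{2} \int_0^t h(X_s)^2\, ds \right),$$
which is a $\tilde{P}$-martingale since $Y_t$ is a Wiener process under $\tilde{P}$. By Girsanov’s theorem,
$$Z_t = 1 + \int_0^t Z_s h(X_s)\, dY_s.$$
Because $\rho_t(\varphi)$ is bounded, Fubini’s theorem and Itô’s lemma imply
$$d\rho_t(\varphi) = d\tilde{E}[\varphi(X_t) Z_t \mid \mathcal{Y}_t] = \tilde{E}[A\varphi(X_t) Z_t \mid \mathcal{Y}_t]\, dt + \tilde{E}[\varphi(X_t) h(X_t) Z_t \mid \mathcal{Y}_t]\, dY_t.$$
Taking the integral, we have the result. □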
We now turn to the proof of Theorem 4.
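Since $Y$ is a Wiener process under the new measure, $\pi$ has a representation in terms of $\rho$ by Bayes’ rule such that
$$\pi_t(\varphi) = \frac{\rho_t(\varphi)}{\tilde{E}[Z_t \mid \mathcal{Y}_t]} = \frac{\rho_t(\varphi)}{\exp\left( \int_0^t \pi_s(h)\, dY_s - \frac{1}{2} \int_0^t [\pi_s(h)]^2\, ds \right)}. \tag{A4}$$
Since $\rho_t(\cdot)$ satisfies a linear evolution equation, we expect this to lead to an evolution equation for $\pi$. From Equation (A4) and Itô’s lemma, we have
$$d\left( \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]} \right) = \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]} \Big( -\pi_t(h)\, dY_t + [\pi_t(h)]^2\, dt \Big). \tag{A5}$$
Recall that
$$\pi_t(\varphi) = \rho_t(\varphi) \cdot \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]}.$$
Integration by parts (the Itô product rule) implies
$$d\left( \rho_t(\varphi) \cdot \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]} \right) = \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]}\, d\rho_t(\varphi) + \rho_t(\varphi)\, d\left( \frac{1}{\tilde{E}[Z_t \mid \mathcal{Y}_t]} \right) + d\left\langle \rho(\varphi), \frac{1}{\tilde{E}[Z \mid \mathcal{Y}]} \right\rangle_t.$$
Substituting Equation (A3) for $\rho_t(\varphi)$ and (A5) for $d(1 / \tilde{E}[Z_t \mid \mathcal{Y}_t])$, and noting that the cross-variation term equals $-\pi_t(\varphi h)\, \pi_t(h)\, dt$, the result follows. □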

Appendix A.2. Proof of Other Results

Appendix A.2.1. Proof of Corollary 1

Sketch of the proof. Take the transition probability $Q_{\tau'}$ and expand it with respect to $\tau'$ at zero by Taylor's expansion:
$$Q_{\tau'}(X_u \mid X_s) = \delta(X_u - X_s) + \tau' W(X_u \mid X_s) + o(\tau'),$$
where $\delta(\cdot)$ is the delta function (footnote 18). The function $W(X_u \mid X_s)$ is the time derivative of the transition probability at $\tau' = 0$, called the transition probability per unit time. This expression must satisfy the normalization property; in other words, the integral over $X_u$ must equal one. For this purpose, the above form can be corrected to
$$Q_{\tau'}(X_u \mid X_s) = (1 - \alpha_0 \tau')\,\delta(X_u - X_s) + \tau' W(X_u \mid X_s) + o(\tau'),$$
where $\alpha_0(X_s) = \int W(dX_u \mid X_s)$. Substituting the expansion into the Chapman-Kolmogorov equation gives
$$Q_{\tau + \tau'}(X_u \mid X_s) = (1 - \alpha_0 \tau')\,Q_\tau(X_u \mid X_s) + \tau' \int W(X_u \mid X_t)\,Q_\tau(dX_t \mid X_s).$$
Subtracting $Q_\tau(X_u \mid X_s)$ from both sides, dividing by $\tau'$, substituting the definition of $\alpha_0$ (here evaluated at $X_u$), and letting $\tau'$ go to zero give the following result:
$$\frac{\partial}{\partial \tau} Q_\tau(X_u \mid X_s) = \int W(X_u \mid X_t)\,Q_\tau(dX_t \mid X_s) - \int W(dX_t \mid X_u)\,Q_\tau(X_u \mid X_s).$$
This derivation is described in Van Kampen (2007, chap. 5). □
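The master equation above can be checked numerically on a finite state space, where the integrals become sums. The following is a minimal sketch under that assumption; the two-state rates `a` and `b`, the Euler step size, and the function names are hypothetical choices for illustration. The loss term plays the role of $\alpha_0$: it collects the total rate of leaving the current state.

```python
import math


def master_equation_step(p, W, dt):
    """One Euler step of dp_j/dtau = sum_i [W[j][i] * p_i - W[i][j] * p_j]."""
    n = len(p)
    new_p = []
    for j in range(n):
        # Probability flowing into state j from every other state i.
        gain = sum(W[j][i] * p[i] for i in range(n) if i != j)
        # Probability leaving state j; the summed rate is the discrete alpha_0(j).
        loss = sum(W[i][j] for i in range(n) if i != j) * p[j]
        new_p.append(p[j] + dt * (gain - loss))
    return new_p


def solve_master_equation(p0, W, tau, dt=1e-3):
    """Integrate the master equation from 0 to tau with fixed Euler steps."""
    p = list(p0)
    for _ in range(int(round(tau / dt))):
        p = master_equation_step(p, W, dt)
    return p


# Hypothetical two-state example: W[j][i] is the rate of jumping from i to j.
a, b = 2.0, 1.0                      # rate 0 -> 1 and rate 1 -> 0
W = [[0.0, b],
     [a, 0.0]]
p = solve_master_equation([1.0, 0.0], W, tau=1.0)

# Closed-form solution for the two-state chain started in state 0.
exact_p1 = a / (a + b) * (1.0 - math.exp(-(a + b) * 1.0))
```

Because gains and losses cancel when summed over all states, the scheme conserves total probability, which mirrors the normalization requirement used in the proof.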

Appendix A.2.2. Proof of Corollary 2

For the first part, the IF claim says that a martingale exists for $f(t, X_t)$ on $(\Omega, \mathcal{F}, P)$. The maximum principle restricts the process to be $\mathcal{PB}_t$-adapted, thus $\mathcal{F} \subset \mathcal{PB}$ and the result holds on $(S, \mathcal{S})$ with the probability space $(\Omega, \mathcal{F}, P)$. The second part is a standard result of diffusion processes. □


References

  1. Bain, Alan, and Dan Crisan. 2008. Fundamentals of Stochastic Filtering. In Stochastic Modelling and Applied Probability. New York: Springer Press.
  2. Bensoussan, Alain. 2004. Stochastic Control of Partially Observable Systems. Cambridge: Cambridge University Press.
  3. Brémaud, Pierre, and Marc Yor. 1978. Changes of filtrations and of probability measures. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 45: 269–95.
  4. Doob, Joseph L. 1983. Classical Potential Theory and Its Probabilistic Counterpart. Berlin: Springer Press.
  5. Elliott, Robert J., and Anatoliy V. Swishchuk. 2007. Pricing Options and Variance Swaps in Markov-Modulated Brownian Markets. International Series in Operations Research and Management Science; Boston: Springer Press.
  6. Evans, George W., and Garey Ramey. 1992. Expectation calculation and macroeconomic dynamics. American Economic Review 82: 207–24.
  7. Fujisaki, Masatoshi, G. Kallianpur, and Hiroshi Kunita. 1972. Stochastic differential equations for the nonlinear filtering problem. Osaka Journal of Mathematics 9: 19–40.
  8. Hamilton, James. 1994. Time Series Analysis. Princeton: Princeton University Press.
  9. Hansen, Lars Peter, Yacine Aït-Sahalia, and José A. Scheinkman. 2009. Operator methods for continuous-time Markov processes. In Handbook of Financial Econometrics. Oxford: Elsevier, pp. 31–42.
  10. Hansen, Lars Peter, and Thomas J. Sargent. 2007. Robustness. Princeton: Princeton University Press.
  11. Hansen, Lars Peter. 2007. Beliefs, doubts and learning: Valuing macroeconomic risk. The American Economic Review 97: 1–30.
  12. Hansen, Lars Peter, Nick Polson, and Thomas J. Sargent. 2010. Nonlinear Filtering and Robust Learning. Paper presented at Invited Lecture, ASSA Winter Meetings, Atlanta, GA, USA, October 20.
  13. Kallenberg, Olav. 2002. Foundations of Modern Probability. New York: Springer Press.
  14. Klein, Lawrence. 1950. Model Building—General Principles. Cowles Monograph No 11. New York: Wiley, pp. 1–13.
  15. Kolmogorov, Andrey. 1956. Foundations of the Theory of Probability, 2nd ed. Chelsea: Courier Dover Publications.
  16. Koopmans, Tjalling C., H. Rubin, and R.B. Leipnik. 1950. Measuring the equation systems of dynamic economics. In Statistical Inference in Dynamic Economic Models. New York: Wiley, pp. 53–238.
  17. Marcet, Albert, and Thomas J. Sargent. 1989a. Convergence of least squares learning in environments with hidden state variables and private information. Journal of Political Economy 97: 1306–22.
  18. Marcet, Albert, and Thomas J. Sargent. 1989b. Convergence of least squares learning in self-referential linear stochastic models. Journal of Economic Theory 48: 337–68.
  19. Meyer, Paul-André. 1976. Un Cours sur les Intégrales Stochastiques. Séminaire Probab. X, Lecture Notes in Mathematics 511. Berlin: Springer Press.
  20. Pollock, D. Stephen G. 2018. Filters, waves, and spectra. Econometrics 6: 35.
  21. Rogers, L. Chris G., and David Williams. 2000. Diffusions, Markov Processes and Martingales: Volume 2. Cambridge: Cambridge University Press.
  22. Rozovskii, Boris L. 1991. A simple proof of uniqueness for Kushner and Zakai equations. In Stochastic Analysis. Oxford: Elsevier, pp. 449–58.
  23. Sargent, Thomas. 1987. Macroeconomic Theory, 2nd ed. Bingley: Emerald Group Publishing Limited.
  24. Simon, Herbert. 1959. Theories of decision-making in economics and behavioral science. American Economic Review 49: 253–83.
  25. Stroock, Daniel W. 2000. Probability Theory: An Analytic View. Cambridge: Cambridge University Press.
  26. Van Kampen, Nico. 2007. Stochastic Processes in Physics and Chemistry, 3rd ed. Oxford: Elsevier.

Notes

  1. The representation is defined in Koopmans et al. (1950) as “a way of writing the system”. In general, the representation is a way of presenting the law of motion of this system.
  2. See also Pollock (2018) for a recent treatment of filtering methods in the frequency domain, and Fujisaki et al. (1972), who study the nonlinear filtering problem with stochastic differential equations.
  3. A set $A$ is called a $P$-null set if $A$ is measurable on $(\Omega, \mathcal{F}, P)$ and $P(A) = 0$.
  4. The notation $\mathcal{A} \vee \mathcal{B}$ means that the set is generated by $\mathcal{A}$ and $\mathcal{B}$.
  5. In distribution theory this function is usually called a test function.
  6. This statement follows from the axiom of choice, which allows for the construction of non-measurable sets, i.e., collections of events that do not have a measure in the ordinary sense, and whose construction requires an uncountable number of events.
  7. The problem could be extended to a semi-martingale problem by using a No Free Lunch claim (the Kreps-Yan Theorem). But then $X$ in general cannot provide any explicit solution for the conditional probability $\pi_t$.
  8. Formally, $E[f(X_T) \mid \mathcal{F}_t] = E[f(X_T) \mid X_t]$ for any $f(\cdot) \in B(S)$.
  9. When the process is assumed to be homogeneous in time, the family of $Q(\cdot \mid \cdot)$ is a semigroup of transition kernels and has been extensively studied in recent work on operator methods; see, e.g., Hansen et al. (2009).
  10. Mathematically, this claim intends to reduce a stochastic problem to a partial differential equation (PDE) problem so that it is possible for economists to construct and solve a specific analytic problem.
  11. One can define a more complicated model to incorporate these effects, but the cost is the use of higher-order stochastic calculus. In fact, later we will see that the diffusion problem already induces an almost infeasible representation for the conditional density. At this stage, the complexity level of the problems that depart from the diffusion ones still needs to be elaborated.
  12. The dependence between $W$ and $X$ is difficult to eliminate in economics and may cause an endogeneity problem. But, technically speaking, this issue often arises from using an overly simple function $h(\cdot)$. Since $h(\cdot)$ can be highly non-linear, i.e., containing all endogenous effects, it is reasonable to ignore this issue here.
  13. See Chapter 4.8 of Bensoussan (2004) for details about the SPDE problem.
  14. By Itô calculus, $dY_t = d\log S_t = (1/S_t)\,dS_t - (1/(2S_t^2))\,(dS_t)^2$. As $dS_t/S_t = \mu\,dt + z(X_t)\,dW_t$ and $(dS_t)^2/S_t^2 = z^2(X_t)\,dt$, we have the expression.
  15. A detailed proof is given in Theorem 4.4.1 of Bensoussan (2004).
  16. This statement skips one intermediate step which requires $X_t$ to be progressively measurable (see Stroock (2000), Remark 7.1.1 and Lemma 7.1.2).
  17. The constant is the initial value $\psi(0) = x$ from the following ODE problem:
     $$\frac{\partial f(\psi(t))}{\partial t} = \left\langle a, \nabla f(\psi(t)) \right\rangle.$$
  18. Loosely speaking, the delta function is a smooth indicator function such that the derivative of $\delta(\cdot)$ exists in the weak sense. Technical differences aside, one can regard the two as identical.