Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain, Voie du Roman Pays 34, B-1348 Louvain-la-Neuve, Belgium
Institut de Statistique, Biostatistique et Sciences Actuarielles (ISBA) and Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain, Voie du Roman Pays 20, B-1348 Louvain-la-Neuve, Belgium
Author to whom correspondence should be addressed.
Filtering has had a profound impact as a device of perceiving information and deriving agent expectations in dynamic economic models. For an abstract economic system, this paper shows that the foundation of applying the filtering method corresponds to the existence of a conditional expectation as an equilibrium process. Agent-based rational behavior of looking backward and looking forward is generalized to a conditional expectation process where the economic system is approximated by a class of models, which can be represented and estimated without information loss. The proposed framework elucidates the range of applications of a general filtering device and is not limited to a particular model class such as rational expectations.
1. Introduction
Many people’s aspirations and desires imply forward-looking decisions. Making a forward-looking decision requires the construction of an expectation based on the information that is backward induced. It can be characterized by a process of making conditional expectations. Such a process allows us to construct our subjective beliefs rather than perceiving the world as mere presentation. Instead, we perceive the world as an object of perception in which our own experience and knowledge are integrated. We become unified with that perception. Thus formulating an expectation of an agent cannot separate the perceiver from the perception. From the market’s perspective, an agent no longer views himself as an individual, but rather becomes a “cognitive subject” of a time-invariant perception where the laws of the economy are revealed. The process of forming conditional expectations is the practical consequence of this identity as it attempts to represent our immersion with the world, and these attempts constitute the essential laws of the world. We refer to this process as a filter.
The equilibrium in rational expectation models is based on the assumption that the agents in the model are confident with their perceptions. As a consequence, the agents trust that the optimal actions following their expectations will give them maximum utilities or profits. Sequential decisions are made under expectations conditioning on past information. Such decision processes induce a representation1 of the conditional probability distribution of the economic dynamics: the law of motion perceived by agents will sequentially influence the law of motion that agents actually face. These perceptions can be viewed as filters as they are often characterized by recursive projection schemes (Simon 1959). There is, however, a dichotomy in the understanding of filters in economic theory and econometrics.
In economic theory, the filtering method (perception) is an active process in which the agent attends to a small part of the whole dynamic system and excludes almost everything outside the scope of that attention. In econometrics, the filtering method is treated as a passive process that selects statistically relevant information from a given dynamical model. The two aspects of the filtering method are mainly treated by Sargent (1987) and Hamilton (1994), respectively.2 Some other, similar types of perceptions have been discussed in Marcet and Sargent (1989a, 1989b); Hansen and Sargent (2007); and Hansen et al. (2010). Rather than considering a specific economic or econometric model, this paper characterizes general perceptions that are concealed in abstract models where both the active and passive arguments can be integrated. Hansen (2007) examines the inference of rational expectation models from two separate perspectives. The way he reduces the gap between these two perspectives is to enlarge the models used by the economic agents. We extend the scope to a more general class of models for the economic agents under which the integration becomes more natural. The filtering method provides formal representation-estimation processes for practical situations. The relevance of these processes in our setting is that they are not merely statistical techniques, but actual dynamic mechanisms used in expectations and perceptions. The equations and assumptions that appear in the estimation procedure correspond to the perception of economic agents. This is in the spirit of Klein (1950): “The purpose in building econometric models is to describe the way in which the system actually operates. [...] The construction of such a system is a task in which economic theory and statistical method combine.”
In our context, building econometric models is related to the proper specification of filters. Considering filters as a perception device in an abstract economy induces a large class of econometric models. The remaining econometric task is to reduce the abstract representation to a feasible form for estimation. On the other hand, as the complexity of the environment increases, agents learn more and more about the mechanisms and processes that are used to relate themselves to that environment and to achieve their goals. The availability of general implementable techniques in econometrics will elucidate inaccessible places for the abstract models. As the comment in Simon (1959) on modeling human expectation says, “it is one thing to have a set of differential equations, and another thing to have their solutions.” Economic theory predominates in the definition of the representation describing a certain type of economic dynamics, while econometric methods are associated with the determination of the agent’s way of estimation.
The need for reconciling economic theory and econometrics may not be obvious in linear or linearizable structural models when the laws of motion are specified on either theoretical or empirical grounds, and hence either side of the coin will be sufficient to justify the model. However, if we start with an abstract economy, the limitation of modeling tools in this complex environment will force us to integrate all available factors. The generalization of the filtering mechanism will give a fundamental interpretation of agent’s perceptions in complex situations. All subsequent statistical procedures, such as estimation, inference and forecasting, will more or less depend on the way this generalization has been formulated. Our contribution will be to make this generalization available.
Early attempts in this direction were made in the framework of rational expectation models. The expected utility or profit for each agent depends on the assumption of the agent’s perception mechanism. Evans and Ramey (1992) show how agents adjust their long-term expectations under different perception rules or predictions. The standard method assumes that the agent’s prediction uses a single, presumably correct law of motion. But if perceptions of agents differ, then the corresponding predictions are incompatible with each other. A sequence of works by Hansen and Sargent, covered by the monograph Hansen and Sargent (2007), introduces the concern of robustness of agent’s expectations. The main idea is that agent’s decisions contain their prior worries about a possible mis-specification of the model. These multiple priors generalize the perception or the law of motion contained in the agent’s mind. The associated perception mechanism for the robust decision agents is also a filter, called robust filter, and is a dual of the linear-quadratic regulator problem of utilities. Given the robustness concerns, the agent makes the expectation based on a class of models whose information is presumably not far from the underlying model in a certain metric. Therefore, the robust filter generalizes the mechanism used in the single law case.
Our motivation is related to the robust filters of Hansen and Sargent (2007) and Hansen (2007), but the relation is more one of spirit rather than of a precise form. In terms of the robustness framework, our objective of generalizing filtering mechanisms attempts to analyze how large the class of alternative models could be while remaining consistent with some general filter. The class used in Hansen and Sargent (2007) is restricted by a risk threshold and is equivalent to a class of partially specified processes. If we enlarge such a class to an abstract economic system, is the filtering type mechanism still an optimal choice for agents in this system? We ask this question because rational expectation models are merely an approximation of the real world. Exploring the applicable range of a general filtering device will demonstrate its usefulness essentially irrespective of the particular form of the economic model.
The paper aims at making the previous assertions rigorous. The mathematical tools we use are borrowed from stochastic analysis and stochastic control. For expositional purposes we delay the formal results of the paper to Section 4. Section 2 introduces the economic model as a general probability space and anticipates the main result on the existence of filters. In Section 3 we define three claims under which the economic system is supposed to operate and which will allow us to obtain an explicit representation of the filter. Section 5 elaborates on the claims in asset pricing, on a framework with stochastic volatility, and on the link of the general filtering results with those in a linear framework. Section 6 summarizes the main findings. All proofs are relegated to Appendix A.
2. The Model
2.1. An Abstract Economy
The economic system in this paper is driven by components whose evolution is modeled via stochastic processes. We stack these components into a state vector and denote it as $X_t$. When the time index $t$ is explicitly stated to be discrete, $X_t$ is a discrete time process; otherwise $X_t$ is assumed to be a continuous time process. Throughout the paper, x refers to either a deterministic variable or a realization of $X_t$. The state vector may consist of unobservable features such as private information, utilities or underlying prices.
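The underlying abstract economic model in our context is a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, where we define X together with a filtration $\{\mathcal{F}_t\}_{t \ge 0}$. The filtration is right continuous, $\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s$, and $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t$. In $\mathcal{F}$, there is a $\mathbb{P}$-null set3 contained in $\mathcal{F}_0$ and consequently in all $\mathcal{F}_t$.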
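The values of the states of $X_t$ form a measurable space $(\mathbb{S}, \mathcal{B}(\mathbb{S}))$. The state space $\mathbb{S}$ is a compact metric space and is associated with a Borel σ-algebra $\mathcal{B}(\mathbb{S})$. We assume $X$ to be measurable and the measurable mapping is:
$$(t, \omega) \mapsto X_t(\omega) : \big([0, \infty) \times \Omega,\ \mathcal{B}([0, \infty)) \otimes \mathcal{F}\big) \to \big(\mathbb{S}, \mathcal{B}(\mathbb{S})\big),$$
where ⊗ denotes the product operator for σ-fields.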
While the essential features of economic dynamics are assumed to be captured by the state variables $X_t$, the observable economic variables and public information, in general, are not. Since the observable and public information is the major resource for agents to make their expectations about how the economic states change, we specify it by another process $Y_t$. Let $Y_t$ include those observable variables that are related to $X_t$, and assume that the dimension of $Y_t$ is not larger than that of $X_t$. The observable information set is $\mathcal{Y}_t$, the filtration generated by the observable process such that
$$\mathcal{Y}_t = \sigma(Y_s,\ s \in [0, t]) \vee \mathcal{N},$$
where $\mathcal{N}$ is the collection of all $\mathbb{P}$-null sets of our economic model $(\Omega, \mathcal{F}, \mathbb{P})$.4 The available information $\mathcal{Y}_t$ is induced by observations up to time t and thus it will be used for making inference about $X_t$. Since $\mathcal{F}_t$ is right continuous, to make $\mathcal{Y}_t$ compatible with $\mathcal{F}_t$, we assume that the filtration $\mathcal{Y}_t$ is also right continuous.
Agents will construct their perceptions of states based on the information in $\mathcal{Y}_t$. It means that agents are able to construct $\pi_t$, the conditional distribution of $X_t$, given $\mathcal{Y}_t$. The conditional expectations characterize the perceptions or filters of $X_t$. For any t, the conditional distribution $\pi_t$ is a stochastic process such that
$$\pi_t(A) = \mathbb{P}(X_t \in A \mid \mathcal{Y}_t), \quad A \in \mathcal{B}(\mathbb{S}).$$
For simplicity, we will write as in short.
Since the perceptions are processes, any valuation of $X_t$ will involve an expectation w.r.t. the conditional distribution process. The definition of conditional expectation of $X_t$ is restricted to an equivalence class of $\mathcal{Y}_t$-measurable random variables $\xi$ such that:
$$\int_A \xi \, d\mathbb{P} = \int_A X_t \, d\mathbb{P} \quad \text{for all } A \in \mathcal{Y}_t.$$
Then the expectation of a function $\varphi(X_t)$ can be expressed as $\pi_t(\varphi) = \int_{\mathbb{S}} \varphi(x)\, \pi_t(dx)$. The conditional expectation of $\varphi(X_t)$ is ultimately what is desired from filtering, but the methods for obtaining the conditional distribution process are quite involved. If this integral is well-defined for a class of functions $\varphi$, then we call them choice functions.5
2.2. Existence of Filters
In the abstract economy $(\Omega, \mathcal{F}, \mathbb{P})$, we consider perception equivalently as a filter. But perception as a common human behavior should always exist on either individual or aggregate levels. Will filters always exist in this abstract economy? Due to the $\mathbb{P}$-null set, the conditional distribution $\pi_t$ may in fact not be well defined for all $\omega$ but only for $\omega$ outside the $\mathbb{P}$-null set. Thus, the question of existence of $\pi_t$ is equivalent to the question under which circumstances one can gain sufficient control over all $\mathbb{P}$-null sets such that the expectation $\pi_t(\varphi)$ is well-defined for choice functions $\varphi$. In other words, the filters exist in the economy when perceptions induce well-defined expectations.
The theorem of the existence of filters in our abstract economy will be given in Section 4, Theorem 1. Here, we discuss the consequences of this theorem without presenting too many technical details. Suppose a process can be thought of as the -measurable representation of the choice function . The theorem states that given some regularity conditions, for any choice function , the expectation of w.r.t. the filter exists and is equal to the process under the measure. In other words, agent’s perceptions of coincide with the observable information. Therefore, the existence of induces the existence of for the choice function and vice versa.
Note that although relations between $\mathcal{F}_t$ and $\mathcal{Y}_t$ exist, expectations conditional on $\mathcal{F}_t$ do not necessarily coincide with those conditional on $\mathcal{Y}_t$. In particular, the $\mathbb{P}$-null information originates in $\mathcal{F}_t$, but in $\mathcal{Y}_t$ this information contains unpredictable events that may happen. Once agents observe these unpredictable events, their perceptions will be influenced. We will emphasize this point in the following subsection.
2.3. The Importance of the $\mathbb{P}$-Null Set
Although it complicates the set-up, the $\mathbb{P}$-null set is a crucial feature in $(\Omega, \mathcal{F}, \mathbb{P})$. Apart from its mathematical characteristics, it is meaningful in economic problems and affects our way of evaluating a model using empirical data.
The role of the $\mathbb{P}$-null set in defining a conditional probability was first illustrated by Kolmogorov in his famous Borel–Kolmogorov paradox. The paradox shows that the conditional probability is not uniquely defined with respect to a null set; see Kolmogorov (1956, chp. 5) and Bain and Crisan (2008, chp. 2). From an economic perspective, one can think of the $\mathbb{P}$-null sets on $\mathcal{F}_t$ and $\mathcal{Y}_t$ as those unexpected events which have been included in the underlying economic mechanism and in the agent’s observable information set $\mathcal{Y}_t$.
Although most events in the -null set of correspond to events in the -null set of , the two null sets are not equivalent. To see the subtle difference, let us assume that the -null events in result from aggregating countable -null events in . The aggregation leads to uncountable events which are too “complex” to be embedded in the underlying model, the probability space . The model attributes zero-measures for any countable event sets that are beyond its explanatory power, but for uncountable event sets the model cannot even affirm their existence.
We illustrate the economic meaning of some -null sets on by a concept which we call overflow. The effect of this overflow is related to the regularization of the -null set on , which is a result of Theorem 1.
To give an example of overflow, consider economic bubbles. There is a long debate whether or not economic bubbles exist. Rather than joining the debate, our intention here is to use bubbles as an example to illustrate overflow characteristics. Suppose some individual gamblers have complex trading strategies, and their gains are publicly observable. These speculative trades, therefore, are included in the information set at the agents’ disposal. However, the strategies behind these trades may not be fathomable by the public and are conducted in manifold ways, such as forbidden disclosures (private information), special technical equipment (e.g., high-frequency trading), or even improper policies (lobbying). Any economic model that wants to cover some or all of these specific features will make its complexity explode. This limitation is recognized by the public, and hence it is reasonable for the public to believe that the underlying economic model will set zero measure on each of these strategies and the associated actions because they are unexplained by the model. In other words, each action of the trading strategy is in the $\mathbb{P}$-null set on $\mathcal{F}_t$.
The economic bubble can be considered as an aggregated effect of these trading strategies. Since numerous speculations happen every minute, it is natural to think that their aggregation is uncountable. Later, we will show that an uncountable collection of null sets is not necessarily incorporated in the $\mathbb{P}$-null set of $\mathcal{Y}_t$. This means that the aggregated effect, the bubble, may have a positive probability to occur, namely to appear in $\mathcal{Y}_t$.
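To formalize the previous argument, let $A_1$, $A_2$, … be a sequence of pairwise disjoint sets. In order to ensure that the conditional distribution $\pi_t$ is a regular conditional distribution, the σ-additivity condition needs to be satisfied:
$$\pi_t\Big(\bigcup_{n} A_n\Big) = \sum_{n} \pi_t(A_n)$$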
for every , where is the -null set for the disjoint set for any . Let the collection of these null sets be . Note that the power set of all null sets is which is uncountable. This means that is uncountable. We know that satisfies the -additivity condition only if for any but not . Therefore, some event in is not in the null sets for and has positive probability to occur:
In fact, the set need not even be measurable because it is defined in terms of an uncountable union.6 Then cannot be a probability measure. The purpose of Theorem 1 is to regularize this problem so that the projected is on a countable subspace. This regularization implicitly forces to ignore those collections of countable -null sets on . As a consequence, the abstract economic model might not be a “proper” model for all events, but one that approximates a complex reality.
3. A Feasible Econom(etr)ic Model
As shown in the previous section, the -null set on may induce the arbitrariness of on . For the -null set on , individuals may have arbitrary beliefs about the event sets, because they cannot figure out any “law” on the set. The arbitrariness allows us to modify -adapted processes by changing the values of these processes on the -null set, which corresponds to a change of measure. Then the new process should still be -adapted. It accommodates the complexity of the real world but it induces a class of arbitrary filters . Due to the arbitrariness, the conditional distribution process exists even though some observable event sets in the economy are not explained by the underlying model. If the model needs a regular solution, it should be disencumbered of these irregularities. In this section, we look for a feasible model that will regularize the expected process and exploit a specific representation of it.
With three additional claims, one can obtain an explicit solution rather than an abstract process of . These claims are the following: First, the martingale fairness claim regularizes a class of probabilities that are not uniquely defined on the -null set on . Second, the invariance fairness claim induces a specification of X that is embedded in the general model . Finally, the independent complement claim specifies the motions of the observable process Y. The first and second claims basically consider the same issue of finding a feasible sub-class models of the underlying economy , but the development of the invariance fairness claim depends on the martingale fairness claim. With the specification of the law of , the last claim induces a feasible representation of based on the observable process Y.
3.1. Fairness Existence
The following claim introduces a “stochastic constant” upon which we can build our model:
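(Martingale Fairness, MF7). A probability measure $\mathbb{Q}$ on $(\Omega, \mathcal{F})$ is absolutely continuous with respect to $\mathbb{P}$, that is, $\mathbb{Q} \ll \mathbb{P}$. The information of state $X_t$ at any time t is “fair” for all agents under $\mathbb{Q}$ and the information is memoryless, i.e., the process $X_t$ is Markovian.8
Fairness means the martingale property of X:
$$\mathbb{E}^{\mathbb{Q}}[X_t \mid \mathcal{F}_s] = X_s, \quad s \le t.$$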
The martingale model is treated as a ghost model since fairness may never happen in reality. However, if one accepts the existence of this martingale model, it will guide us to a feasible base-line model and help us to solve the original problem. If there is a -martingale process Z on , then any -martingale process X implies a -martingale process , due to the absolute continuity of and . It is obvious that if a process can be regularized on either measure, then it can also be regularized on the other one.
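The Markovian structure of X means that, given the current state, the future evolution of X is independent of the past information in the filtration. For arbitrary times $s \le t$, the Markovian structure implies a transition kernel $P_{s,t}(x, A)$, the probability of moving from state $x$ at time $s$ into a set $A$ at time $t$. The Chapman-Kolmogorov equation of the transition kernel is also available such that
$$P_{s,u}(x, A) = \int_{\mathbb{S}} P_{t,u}(y, A)\, P_{s,t}(x, dy),$$
where the integral runs over the state space $\mathbb{S}$, which can be simply stated as $P_{s,u} = P_{s,t} P_{t,u}$ for $s \le t \le u$. The existence of the kernel is a direct result of the Kolmogorov existence theorem (Kallenberg 2002, Theorem 7.4). It is obvious that the transition kernel is a regular conditional probability.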
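As a rough numerical illustration of the Chapman-Kolmogorov property, the following sketch uses a finite-state, time-homogeneous chain as a stand-in for the general kernel. The three-state transition matrix and all function names are hypothetical and chosen only for illustration; they do not come from the paper.

```python
# Hypothetical 3-state chain; the numbers are illustrative, not from the paper.
P = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
]

def compose(K1, K2):
    """Kernel composition: sum over the intermediate state y of K1[x][y] * K2[y][z]."""
    n = len(K1)
    return [[sum(K1[x][y] * K2[y][z] for y in range(n)) for z in range(n)]
            for x in range(n)]

def n_step(P, n):
    """n-step kernel obtained by composing the one-step kernel n times (n >= 1)."""
    K = P
    for _ in range(n - 1):
        K = compose(K, P)
    return K

def max_gap(A, B):
    """Largest entrywise absolute difference between two kernels."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Chapman-Kolmogorov: moving 5 steps equals moving 2 steps and then 3 steps.
ck_gap = max_gap(n_step(P, 5), compose(n_step(P, 2), n_step(P, 3)))

# Each row of a composed kernel is still a probability distribution.
row_sums = [sum(row) for row in n_step(P, 5)]
```

In this finite setting the integral over intermediate states reduces to a matrix product, so the check holds up to floating-point rounding.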
With the MF claim, in Section 4 Corollary 1, we give a gain-loss (master) equation to describe the dynamics of the transition probability $p_t(x)$ of X:
$$\frac{\partial p_t(x)}{\partial t} = \int_{\mathbb{S}} \big[ w(x \mid x')\, p_t(x') - w(x' \mid x)\, p_t(x) \big]\, dx',$$
where the function $w(x \mid x')$ is the time derivative of the transition probability at zero time lag, called transition probability per unit time. This equation describes the complete transition pattern of X by showing the variation of the corresponding transition kernel. If $\partial p_t(x) / \partial t$ is set to zero, the evolution of X attains a balance. The equation merely states the fact that the sum of all transitions per unit time into any state $x$ must be balanced by the sum of all transitions from $x$ into other states $x'$. Gain balances loss; in other words, we have a steady state.9
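The gain-loss balance can be sketched numerically on a finite state space. The example below is only an illustration under stated assumptions: the rate table `w`, the forward-Euler scheme, and the step sizes are all hypothetical choices, not taken from the paper. It integrates the master equation until gain and loss offset each other in every state.

```python
# Hypothetical transition rates: w[x][y] is the probability per unit time
# of a jump from state x to state y (diagonal entries are unused).
w = [
    [0.0, 0.3, 0.1],
    [0.2, 0.0, 0.4],
    [0.5, 0.1, 0.0],
]

def gain_minus_loss(p, w):
    """Right-hand side of the master equation for each state x."""
    n = len(p)
    out = []
    for x in range(n):
        gain = sum(w[y][x] * p[y] for y in range(n) if y != x)  # inflow into x
        loss = sum(w[x][y] * p[x] for y in range(n) if y != x)  # outflow from x
        out.append(gain - loss)
    return out

def integrate(p0, w, dt=0.01, steps=20000):
    """Forward-Euler integration of dp/dt = gain - loss."""
    p = list(p0)
    for _ in range(steps):
        dp = gain_minus_loss(p, w)
        p = [pi + dt * di for pi, di in zip(p, dp)]
    return p

# Start with all mass in state 0 and run until the distribution settles.
p_inf = integrate([1.0, 0.0, 0.0], w)

# At the steady state the time derivative vanishes: gain balances loss.
residual = max(abs(d) for d in gain_minus_loss(p_inf, w))
```

The total probability is preserved along the way because every unit of outflow from one state is an inflow into another.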
3.2. Invariance Behaviors
With the martingale fairness claim, we have seen that the Markovian model gives us an equation to measure the variation of state transitions of the underlying economy. The equation is valid at any time-point and in any state, but the equation provides no clue about , the transition probability per unit of time. Now an idea is to extract some information about the statistics of , in particular first and second moments. This type of information should be able to generate a class of sub-models that mimic the behavior of the original model of X. We need to find out under which conditions the sub-model is equivalent to the original one, in which case no loss of information occurs when representing by the ghost model .
Let be a function satisfying the maximum principle up to second order, which means that for a compact subset of states , at time t, the maximum of in is found on the boundary of B, . The simplest example of f is a function in the linear functional class such that for fixed t, implies (or >), (or ≤) and on . The extremum of always exists on the boundary of the domain. Here and denote the Laplace and gradient operators on x, respectively.
Think of $f(t, x)$ as a time-dependent utility or value function. The requirement of being maximal up to second order means that $\partial f / \partial t$ is proportional to $\Delta f$ so that one can set up their relation by some equation, for example
$$\frac{\partial f}{\partial t} = \frac{1}{2} \Delta f,$$
which would imply that $X_t$ follows a Wiener process. Thus, the maximum principle pins down a specific evolution class for $X_t$. We have the following claim to incorporate this idea.
(Invariance Fairness, IF). If claim MF is true, then for any $f$ satisfying the maximum principle up to second order, there exists a martingale measure such that $f(t, X_t)$ will preserve the fairness on this measure. The law of $X_t$ will also satisfy the maximum principle.
Theorem 2 in Section 4 will show that the IF claim is another way of specifying Itô’s diffusion problem.10 To the best of our knowledge, this is the first time that the problem is motivated on the basis of the maximum principle. Understanding the connection between this economic claim and econometric models will help us to assess the potentials of modeling. That is, before doing estimation, testing, or prediction, it is essential to realize how far the model can reach in principle.
The diffusion structure induces a Wiener process specification for . The first and second moments of the process are given by
where is the transition probability per unit time under the Wiener law. This is a diffusion martingale type model. Given the whole transition contents of X, our attention is only restricted to those transitions that will maintain the maximum principle up to second order. The reason is that only the transitions satisfying invariance fairness can be revealed and identified in standard econom(etr)ic models. It does not mean that the unqualified transitions do not exist. Conversely, many transitions in the system have high order features such as complex trading strategies in pricing, multiple correlated options, etc. What we can state however is that those transition features are too complex to be embedded in a diffusion model.11 Therefore, those higher order transition laws of X will be assigned to the -null set in .
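To make the role of the first two moments concrete, here is a minimal simulation sketch. It assumes a scalar diffusion with constant drift `mu` and volatility `sigma`, discretized by an Euler-Maruyama step; these parameter names and values are illustrative assumptions, not the paper's specification. The sample moments of the increments, scaled by the step length, should recover the drift and the squared volatility.

```python
import math
import random

def increment_moments(mu, sigma, dt, n_samples, seed=0):
    """Sample first and second moments of diffusion increments, per unit time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    s1 = 0.0
    s2 = 0.0
    for _ in range(n_samples):
        # Euler-Maruyama increment: drift part plus Wiener part.
        dx = mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s1 += dx
        s2 += dx * dx
    return s1 / (n_samples * dt), s2 / (n_samples * dt)

# Driftless (martingale) case with hypothetical volatility 0.8.
drift_hat, diffusion_hat = increment_moments(mu=0.0, sigma=0.8, dt=0.01,
                                             n_samples=200000)
```

With zero drift the first moment estimate should be close to zero and the second close to `sigma ** 2`, up to Monte Carlo error.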
3.3. Indifferent Projection
The last component we have not yet exploited is the observable process Y. In the economy, the process Y reflects the law of X, so the topological structure of Y should contain as much information as X. Since the IF claim is nothing but pinning the space of X onto the Wiener space, an $L^2$ space with Wiener measure, it is natural to assume that Y can be represented in a similar space.
Given any map h in and , if Y can maintain all the information of the martingale diffusion process of X, then we say that Y shares an isometry property with X. Except for the information maintained under the isometry property, note that some information in , such as the collection of -null sets in , is not contained in X but affects the outcome of Y. We use measurement errors to represent this information. The following claim is to specify the law of Y.
(Independent Complement, IC). Let h be a map in $L^2$ which satisfies the maximum principle up to second order as in the IF claim. Suppose the observable process Y is contaminated by an additive generalized Wiener noise W, where the noise process W is generated by the information set $\mathcal{F}_t$ but is independent of X.
The information set of W is generated by where is the -algebra generated by those X satisfying MF and IF claims. In practice, the Wiener process is also modeled independently12 of . Thus is a larger filtration than , i.e.,
since it allows for the measurability of the noise process. The process Y satisfying the IC claim is given as follows:
$$Y_t = Y_0 + \int_0^t h(X_s)\, ds + W_t. \qquad (1)$$
Note that this specification is to restrict the process Y in because
Theorem 3 in Section 4 implies that for a class of these models, there will be a concrete way of specifying the conditional expectation process of this class. This theorem is an important step to derive a specific form for the filter. The representation of $X_t$, as a result of the IF claim, induces a feasible conditional expectation for $X_t$, while the IC claim allows us to attain the expectation of $X_t$ conditioning on the information generated by $Y_t$.
The IC claim has a role similar to that of the MF claim. The MF claim ensures the existence of a martingale problem for the state process. The IC claim does the same but for the observable process. The aim is to make the state process $X_t$, the diffusion generator in IF and the observable process $Y_t$ comparable.
3.4. An Explicit Representation
The previous claims are to obtain a representation of the conditional distribution $\pi_t$ for our class of models. Once a representation of $\pi_t$ is available, each model in this class will correspond to a specification of this representation. The data contained in the information $\mathcal{Y}_t$ will be useful for estimating the parameters of this specification. A specified representation plays a role as a predictor for the corresponding model and observable information.
The filtering problem is, essentially, to determine the conditional distribution $\pi_t$ of $X_t$ at time t given the information accumulated from observing Y in the interval $[0, t]$. Given all three necessary claims, we show in Theorem 4 that for any bounded continuous choice function $\varphi$, we can compute the conditional expectation of $\varphi(X_t)$,
$$\pi_t(\varphi) = \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t],$$
via an equation called the Kushner-Stratonovich-Pardoux equation.
Many dynamical estimates consist of computing the conditional distribution of a target process given a partially observed history. As the explicit solution of $\pi_t(\varphi)$ gives the end time marginals of the preceding conditional distributions defined for any bounded $\varphi$, this explicit representation of $\pi_t$ provides a concrete basis for nonlinear estimation problems. This point of view is also at the heart of the Bayesian methodology, where the conditional distribution is the posterior and the path distribution of the states is the prior.
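The prior-to-posterior reading of the filter can be illustrated with a discrete-time, finite-state analogue. This is a deliberate simplification and not the continuous-time Kushner-Stratonovich-Pardoux equation: the two-state chain, the observation map `h`, the Gaussian noise level and the observation sequence below are all hypothetical values chosen for the sketch.

```python
import math

def gaussian_pdf(y, mean, sd):
    """Density of a normal distribution, used as the observation likelihood."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def filter_step(prior, P, h, y, noise_sd):
    """One predict-update step of a finite-state Bayes filter."""
    n = len(prior)
    # Predict: push the current belief through the transition kernel.
    predicted = [sum(prior[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Update: reweight by the likelihood of the observation y = h(x) + noise.
    weighted = [predicted[j] * gaussian_pdf(y, h[j], noise_sd) for j in range(n)]
    total = sum(weighted)
    return [wj / total for wj in weighted]

def conditional_expectation(belief, phi, states):
    """Expectation of a choice function phi under the current belief."""
    return sum(b * phi(x) for b, x in zip(belief, states))

states = [0.0, 1.0]                   # hypothetical hidden state values
P = [[0.95, 0.05], [0.10, 0.90]]      # hypothetical transition kernel
h = [0.0, 1.0]                        # observation map evaluated at each state
observations = [0.1, 0.9, 1.2, 0.8, 1.1]

belief = [0.5, 0.5]                   # uninformative initial perception
for y in observations:
    belief = filter_step(belief, P, h, y, noise_sd=0.5)

estimate = conditional_expectation(belief, lambda x: x, states)
```

Each pass through the loop treats the previous belief as the prior and returns the posterior after one more observation, so the final `belief` plays the role of the conditional distribution given the whole observed history.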
4. Main Results
The state is -adapted, while the constructed conditional expectation is evaluated by , an -adapted process. Thus may not be well-defined. Let be a counterpart of that is projected on the smallest -algebra on such that is -adapted and measurable. Our first result is to show that for any choice function , the conditional expectation of is equivalent to in probability. This result implies the existence of filters in our abstract economy.
With an enlarged σ-algebra, a representative process will be defined for X even if X is not $\mathcal{Y}_t$-adapted (Rogers and Williams 2000, Theorem 7.1). This theorem is called the projection theorem and will be used in the proof of Theorem 1. The projection theorem says that if a process X is measurable and bounded, then for every stopping time T, there is a representation (optional process) ${}^{o}X$ such that
$${}^{o}X_T \, 1_{\{T < \infty\}} = \mathbb{E}\big[ X_T \, 1_{\{T < \infty\}} \mid \mathcal{Y}_T \big]$$
as a projection of X onto $\mathcal{Y}_T$, where $1_A$ is an indicator function for a set A. Here no restriction is imposed on the stopping time T. The notion of optional processes is due to Meyer (1976), see also (Doob 1983, pp. 388–98) and Brémaud and Yor (1978). The idea of projecting an $\mathcal{F}$-measurable element onto $\mathcal{Y}_t$ is similar to formulating a filter of X given the observable information in $\mathcal{Y}_t$. We apply this result to show the existence of a filter in the abstract economy.
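Theorem 1. Let $\mathcal{P}(\mathbb{S})$ denote the space of all probability measures on $\mathbb{S}$. For a compact set $\mathbb{S}$ and its Borel σ-algebra $\mathcal{B}(\mathbb{S})$, there is a $\mathcal{P}(\mathbb{S})$-valued conditional distribution process $\pi_t$ such that for any bounded $\mathcal{B}(\mathbb{S})$-measurable function $\varphi$,
$$\pi_t(\varphi) = \mathbb{E}[\varphi(X_t) \mid \mathcal{Y}_t] \quad \mathbb{P}\text{-a.s.}$$
This distribution process is an equilibrium process for the economy whose underlying states are in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and whose observable information is contained in $\mathcal{Y}_t$.
Theorem 1 implies the existence of $\pi_t$ for the abstract model $(\Omega, \mathcal{F}, \mathbb{P})$ given in Section 2. The importance is that a perception of any bounded choice function always exists in this abstract economy although there could be multiple ways of forming the perception due to the incompatibility between underlying and observable layers of the economy.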
Following the model in Section 2, we will specify a uniquely representable law of perception by using the claims in Section 3. First, the martingale fairness (MF) claim in Section 3.1 regularizes a class of probabilities of these processes that are not uniquely defined on the -null set on so that the analysis of the model can rely on one ghost model . The claim also discloses that the evolution of the state X is completely captured by the transition kernel , whose variation describes the variation of the evolution pattern of X. Thus, the MF claim extracts important characteristics of the underlying dynamics.
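Corollary 1. The martingale model $(\Omega, \mathcal{F}, \mathbb{Q})$ implies a gain-loss equation for the system such that:
$$\frac{\partial p_t(x)}{\partial t} = \int_{\mathbb{S}} \big[ w(x \mid x')\, p_t(x') - w(x' \mid x)\, p_t(x) \big]\, dx'.$$
The first term is the gain of state $x$ due to transitions from other states $x'$ and the second term is the loss due to transitions from $x$ into other states $x'$.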
The equation in Corollary 1 needs a further specification because of the unspecified . With the invariance fairness (IF) claim in Section 3.2 we can obtain a specification within the Itô diffusion problem. The following theorem gives the equivalence.
Theorem 2. For and , the following are equivalent:
(i) If claim IF is true, any in has an approximating model that relies on the information contained in the first two moments of the process.
(ii) The function is an Itô diffusion process with drift and diffusion terms , .
For a diffusion-type process, the first- and second-order moments describe the full dynamics. Thus we can specify .
Corollary 2. is a martingale, where In addition, if and are bounded and continuous, the weak solution of the diffusion problem is unique. Then
where the transition probability per unit time has the Wiener law.
By the property of characteristic functions, the martingale in Corollary 2 together with the initial condition captures all the information, namely the first- and second-order moments, of . This implies that captures the information in the first two moments of the process on the economic model . Therefore the process is a diffusion-type process on the Wiener path.
In fact, the IF claim does nothing more than pin the problem onto the Wiener space , an L² space with Wiener measure. The martingale representation theorem says that any continuous martingale, i.e.,
generated by , can be written as
with a predictable process i.e., each is -measurable for . Without loss of generality, we consider the case . The functional space of is
The stochastic integral of h is a map such that
This map is an isometry as a consequence of the Itô isometry theorem. The image of J of the Hilbert space is complete. Therefore, the martingale and the stochastic integral are isometric.
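The isometry can be checked numerically. The following Python sketch is an assumption-laden illustration, not part of the proof: it takes the hypothetical integrand h_t = W_t on the unit interval, approximates the stochastic integral by left-endpoint sums (so that the integrand is predictable), and compares Monte Carlo estimates of the two sides of the Itô isometry.

```python
import math
import random


def ito_isometry_check(n_paths=5000, n_steps=50, horizon=1.0, seed=0):
    """Estimate E[(int h dW)^2] and E[int h^2 dt] for the integrand h_t = W_t."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    lhs = 0.0  # accumulates the squared stochastic integral
    rhs = 0.0  # accumulates the time integral of h^2
    for _path in range(n_paths):
        w = 0.0
        integral = 0.0
        energy = 0.0
        for _step in range(n_steps):
            dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
            integral += w * dw   # integrand taken at the left endpoint
            energy += w * w * dt
            w += dw
        lhs += integral ** 2
        rhs += energy
    return lhs / n_paths, rhs / n_paths


lhs, rhs = ito_isometry_check()
```

For this discretization both expectations equal T²(1 − 1/n)/2, which is 0.49 with the default settings, so the two estimates should agree up to sampling error.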
What we emphasize here is that the IF claim carries us to an L² space where the classical projection techniques are available.
Theorem 3. If the MF and IF claims hold, then the IC claim in Section 3.3 implies the representation (1) for the observable process. Suppose then the following statements are true:
(i) There exists a measure such that
and, under measure , Y is independent of X. In addition, the motions of X under and under are the same.
(ii) For any -measurable random variable ,
The time-invariant σ-algebra in Theorem 3 enables us to use techniques based on Kolmogorov’s conditional expectation, which would not be applicable if the conditioning set were time-dependent, such as .
With the existing results, we summarize the model specification in Section 3 as the following pair : X is a solution of the martingale problem for ; in other words, assume that the distribution of is and that the process , where
is an -adapted martingale for any and corresponds to of a diffusion process. Y satisfies the evolution equation
with null initial condition.
Our aim is then to connect the martingale problem in (4) with a diffusion-type representation. Theorem 2 tells us that when the process is on the Wiener path, the solution of a martingale problem associated with the second-order differential operator is the solution of the diffusion process. Theorem 3 tells us that Y is on the Wiener path under .
Finally, we reach a specific representation of the perceptions in the abstract economy.
Theorem 4 (Kushner-Stratonovich-Pardoux, KSP). For any , Proposition A1 implies
where is the equilibrium density process for our general filtering setting. The conditional expectation varies according to (5).
Equation (5) is called the KSP equation. It has recently been applied to solve nonlinear filtering and smoothing problems in applied mathematics; see Bensoussan (2004). One can think of the KSP representation as characterizing an equilibrium conditional expectation over any . It is a stochastic PDE problem and has a unique solution.13
Theorem 4 is a rather general result. Although solving the KSP problem can be transferred to solving a parabolic PDE problem, except for the case of a linear model and Gaussian disturbances and initial conditions, finding a closed form expression for the distribution functions of (5) can be very demanding.
We give respective remarks regarding the previous claims, the modeling procedure within the general framework, and the relation between general filtering results and the linear ones.
5.1. Claims in Asset Pricing Models
The three claims of the previous section have their counterparts in asset pricing models. For illustration purposes, we only consider a simple situation where X and Y are measurable and observable.
Let X be the price for some security contingent on an underlying asset S. Suppose that the price at time t is a random variable , where the integral is the Itô integral and is predictable, i.e., each is -measurable for . The “fairness” in MF says that any X constructed in this way will have zero expected pay-off for some discounted price process under a probability measure such that , where is the risk-free short rate process. If this happens, by the fundamental theorem of asset pricing, the securities market admits no arbitrage. is called the equivalent martingale measure.
Markov uncertainty is often assumed for diffusion processes. The claim IF pins down a specific transition of the security process as a diffusion process. This implies that the process X is also a diffusion process. Suppose that where characterize the instantaneous drift and volatility, respectively, of this security. By Girsanov’s theorem, there is a risk-free measure for the function satisfying the maximum principle up to second order. In particular, if the market is complete, then for any diffusion process Y one can obtain Y by some self-financing strategy such that .
All consequences induced by these three claims, e.g., no-arbitrage, diffusion path, and risk-free measures, are familiar to economists and should be acceptable for most dynamic models in econom(etr)ics.
where is the price of a stock, the function is known and is a hidden state Markov process. An example is the Heston model, where and
In general, if we observe a continuum of prices, then is measurable with respect to the filtration generated by . Let the observable process , and notice that14
The noise of contains a diffusion coefficient function . The suitable corresponding for Y is
so that under the process is independent of .
We can discretize the model. For a fixed , let be a partition of , then the quadratic variation of Y is the cumulative variance
where is -measurable. If is a continuous process, we have
that is also -measurable. Thus if is a continuous process, the volatility is observable for almost every t, and is observable if exists.
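The observability of volatility through quadratic variation can be illustrated by simulation. The Python sketch below is a simplification under stated assumptions: the volatility path `sigma(t)` is a hypothetical deterministic function (rather than a hidden Markov process), the drift value is arbitrary, and the partition is equally spaced. It compares the sum of squared increments of Y with the integrated variance.

```python
import math
import random


def sigma(t):
    """Hypothetical deterministic volatility path, used only for illustration."""
    return 0.2 + 0.1 * t


def realized_variance(n_steps=20000, horizon=1.0, drift=0.05, seed=1):
    """Sum of squared increments of Y over an equally spaced partition."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    rv = 0.0
    for k in range(n_steps):
        t = k * dt
        dy = drift * dt + sigma(t) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        rv += dy * dy
    return rv


# Closed form of the integral of sigma(t)^2 over [0, 1] for the path above.
integrated_variance = 0.2 ** 2 + 0.2 * 0.1 + 0.1 ** 2 / 3.0
rv = realized_variance()
```

As the partition is refined, the drift contributes terms of order dt² only, so the realized variance approaches the integrated variance, which is why the volatility can be treated as observable from a continuous record of Y.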
With the Markov structure for , one can price derivatives on . The Black-Scholes price of a European call option in the presence of Markovian volatility is
where , and is w.r.t. the market’s pricing measure. Given and the parameters of its dynamics under the market measure, we can compute the expected return of the call option by taking a filtering expectation.
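A minimal Python sketch of this pricing step is given below. It relies on an assumption that is not stated in the text above: the volatility process is taken to be independent of the shocks driving the stock price, so that the option price can be written as an average of Black-Scholes prices over scenarios of the time-averaged variance. The scenario values and weights are hypothetical placeholders for the filtered distribution of volatility.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call with constant volatility sigma."""
    vol = sigma * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / vol
    d2 = d1 - vol
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def mixed_call_price(spot, strike, rate, maturity, avg_variances, weights):
    """Average Black-Scholes prices over scenarios of time-averaged variance.

    avg_variances and weights are hypothetical stand-ins for the distribution
    of the averaged variance under the pricing measure.
    """
    return sum(
        w * black_scholes_call(spot, strike, rate, math.sqrt(v), maturity)
        for v, w in zip(avg_variances, weights)
    )


# Two equally likely volatility regimes: 10% and 30% (illustrative values).
price = mixed_call_price(100.0, 100.0, 0.05, 1.0, [0.01, 0.09], [0.5, 0.5])
```

With a single scenario the function reduces to the ordinary Black-Scholes price; with several scenarios the weights would come from the filtering distribution of the hidden volatility state.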
5.3. Kalman Filter
If and at every time t can be linearized as matrices (vectors) and such that
then KSP with test functions and will give us the standard Kalman filter, also called the Kalman-Bucy filter, as follows. Let be the conditional mean of X such that
and R be the conditional covariance such that
If (6) and (7) are acceptable localizations for (4), then the solution of satisfies the following SDE
where we substitute and into KSP Equation (5) and where the covariance term satisfies the deterministic Riccati equation15
Equations (8) and (9) together give the Kalman-Bucy filter scheme. One can see that this scheme is a special case of the KSP equation. While we feature the Kalman filter in this paper, there are other well-known filtering methods, including particle filters and the Zakai equation, that are related to the KSP problem.
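The scheme can be sketched in Python for a scalar model. The block below is an illustration under explicit assumptions, not the paper's specification: the hidden state follows dX = aX dt + b dW, the observation follows dY = cX dt + dV with unit-variance noise, the coefficients `a`, `b`, `c` are hypothetical stand-ins for the linearized matrices, and both equations are discretized with an Euler step.

```python
import math
import random


def kalman_bucy(a, b, c, m0, R0, dt, n_steps, seed=0):
    """Euler discretization of a scalar Kalman-Bucy filter (illustrative)."""
    rng = random.Random(seed)
    x = m0 + math.sqrt(R0) * rng.gauss(0.0, 1.0)  # hidden initial state
    m, R = m0, R0                                  # conditional mean and variance
    path = []
    for _ in range(n_steps):
        dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dV = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        dY = c * x * dt + dV                       # observation increment
        x += a * x * dt + b * dW                   # hidden state evolves
        m += a * m * dt + R * c * (dY - c * m * dt)      # mean driven by the innovation
        R += (2.0 * a * R + b * b - (c * R) ** 2) * dt   # deterministic Riccati equation
        path.append((x, m, R))
    return path


path = kalman_bucy(a=-1.0, b=1.0, c=1.0, m0=0.0, R0=1.0, dt=0.001, n_steps=10000)
final_x, final_m, final_R = path[-1]
```

The variance recursion does not depend on the observations, so it can be computed in advance; for these coefficients it settles at the positive root of R² + 2R − 1 = 0, that is, √2 − 1.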
We have started with the fact that filtering is an intrinsic element of economic phenomena. For a general abstract economy, we provide a result on the existence of filtering mechanisms. We emphasize a subtlety due to null sets that may lead to peculiar events with positive probability after aggregation even though on an individual level such events have zero probability. This feature turns out to be crucial for the understanding and interpretation of the economic model. It also has to be regularized in the derivation of the existence result.
By introducing three natural claims, we established a representation of the conditional distribution process and, hence, of the filtering device. The general representation is nonlinear and subject to estimation using statistical methods. We have outlined the realm of economic models for which this representation is applicable. The implication of our findings for the way economic theory and econometrics interact in general has yet to be discovered.
Z.G. and C.M.H. have contributed equally to all parts of the manuscript.
This research received no external funding.
The authors would like to express their gratitude to Ken Judd and Peter C. B. Phillips for useful discussions at the early stage of this manuscript, and the participants in the seminars at the University of Amsterdam. All remaining errors are ours.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A.1. Proof of Main Theorems
Appendix A.1.1. Proof of Theorem 1
The proof includes four steps: 1. construct a countable vector space on , 2. define a non-negative process that corresponds to the elements in , 3. extend to the space of continuous bounded functions, , check that the definition of the process is still valid, and find a representation of , 4. extend to and check that all properties are still valid.
Step 1. For , compact induces that is dense and that a linear span exists. Let be the set of basis functions in the linear span and thus any is bounded continuous. Let be a countable vector space generated by finite linear combinations of with rational coefficients such that
These s are still linearly independent for any . Set .
Step 2. For any t, is another -adapted process16. Equation (3) implies that a -adapted optional process exists for . Thus a sequence corresponds to . For some , linear independence implies that a function is uniquely represented by and, furthermore, that a -adapted process , corresponding to , is linearly and uniquely represented by . We can define the linear functional
Because the conditional distribution is a non-negative process, we need to construct a non-negative analog of . Define a subspace
that is countable. For and fixed t, we define the null set for u such that
Since , in order to show that is a -null set, we need to show almost surely. If almost surely, then by Equation (3) the optional process would be non-negative on and hence is a -null set. The union of over ,
is a countable union. A new process is defined as
Step 3. In order to extend the definition of to outside , we first need to check that is bounded. It is obvious that . Since , the uniform norm has the property that . Then , from step 2, we know
where the second inequality comes from the linearity of and . It implies
so that is bounded.
Let any . Since is dense in , there exists a sequence such that . We can define
over . For boundedness, we only need to check the case . Note that for any two sequences and , if and , we will have
by the boundedness result in and the triangle inequality. Thus, is bounded.
We also need to ensure that the optional process of is well-defined on . For in , we have a -adapted process for , and
The last equation is implied by the dominated convergence theorem for bounded sequences.
Since is compact, the Riesz representation theorem implies the existence of ,
for any bounded and well-defined inner product.
Step 4. The last step is to extend the definition of to incorporate . Let be a subset of such that is a -adapted optional process of on . It is obvious that . Note that the Borel -algebra generated by is . By the completeness of , we can construct a sequence of subsets such that
Compactness of implies that is closed under finite intersections. From the construction in step 1, we know that the constant function is included in every . The monotone class theorem implies , since any monotone non-negative increasing sequence , with indicator function of every set in , contains the -algebra which is closed under finite intersections. Thus contains every bounded -measurable function of . As is a subset of , we conclude . □
Appendix A.1.2. Proof of Theorem 2
From (ii) to (i), the proof is a direct application of Itô calculus.
From (i) to (ii), the proof consists of the following four steps: 1. show that the maximum principle on smooth functions is equivalent to the law of Wiener processes, 2. show that the invariance of the law is preserved on the Wiener path, 3. set up the approximation on the Wiener path by showing that the martingale fairness is preserved, and 4. extend the result to the model .
Step 1. The definition of the maximum principle is simply the first and second derivative conditions in calculus. If a function attains its maximum at point , then and . Furthermore, if f is a time-dependent function such that at a certain time interval , and f attains its maximum at x when time is t, then with and . The inequality expresses the uncertainty of the future such that could either strictly increase along t or attain its optimum at t. Since the maximum principle is preserved up to the second order, we have the heat equation
Without loss of generality, in steps 1 to 3 we only consider the standard case with the diffusion factor , but (A1) holds for any real vectors and . The fundamental solution of (A1) is the transition density of the well-known Wiener process.
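The link between the heat equation and the Wiener process can be verified numerically. The Python sketch below assumes the normalization ∂p/∂t = ½ ∂²p/∂x² in one dimension (the exact form of (A1) may differ by a constant factor) and checks by central finite differences that the Gaussian transition density of a standard Wiener process satisfies it.

```python
import math


def gaussian_kernel(t, x):
    """Transition density of a standard Wiener process started at zero."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def heat_residual(t, x, eps=1e-3):
    """Finite-difference value of dp/dt - 0.5 * d2p/dx2 at the point (t, x)."""
    dp_dt = (gaussian_kernel(t + eps, x) - gaussian_kernel(t - eps, x)) / (2.0 * eps)
    d2p_dx2 = (
        gaussian_kernel(t, x + eps)
        - 2.0 * gaussian_kernel(t, x)
        + gaussian_kernel(t, x - eps)
    ) / eps ** 2
    return dp_dt - 0.5 * d2p_dx2


residuals = [heat_residual(t, x) for t in (0.5, 1.0, 2.0) for x in (-1.0, 0.0, 0.7)]
```

The residuals are zero up to the truncation error of the finite differences, which is of order eps².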
Step 2. In order to formalize the concept of the Wiener path, we need to introduce the path space. Suppose that a series of realizations corresponds to t via for . Then is a continuous path with the image on the complete separable space . A path space is a continuous function space of paths . The σ-algebra is
generated by . The measure for is called the Wiener measure such that for a sequence :
The measure is tight in the sense that, if ,
for any metric . This is the Ascoli-Arzela criterion for compact subsets.
We need to show that the invariance property of is a restatement of the independent identical increment property.
Identical: Note that a function f over will not change the expression except that is replaced by . By Lemma 3.4.3 and Theorem 3.4.16 (Kolmogorov’s Criterion) of Stroock (2000), we have that for a subset of all tight measures and :
where is a constant, and . Then we have
This means that the increments are controlled by the length of the time interval. When the interval is extremely small, all increments are essentially treated the same. So the smooth function f does not matter for the law of .
Independent: For , , let . By the definition of the Wiener measure, both and associate with on the time path and respectively. Clearly, they are independent.
Step 3. We look for a martingale representation because it plays the role of a “stochastic constant”. In the deterministic case, suppose we define an integral curve of on a smooth vector field a on , starting at . Then the path with has the property that
is a constant17 for any . If there is a stochastic analog, then we can use this stochastic constant to establish our approximating model. The aim is to maintain a stable “error”.
Recall the path space and its -algebra . For an incremental element on , the Fourier transform is:
where . What we want to obtain is a martingale and a “constant” under . From the above equation, it is easy to see that we can obtain both of them simultaneously if we shift the element by a Gaussian factor :
Let a triplet denote this martingale on the Wiener path :
We define the Fourier transform of f by , and the inverse Fourier transform is .
As in the deterministic case, the ideal representation of on is the path integral:
We need to check whether the approximation error is a “constant” in the stochastic sense. Note that
By the property , we have
The approximating error is
The Fourier term is bounded and irrelevant for . If is a martingale in , then the error will be a stochastic constant. Rewrite as:
The second term can be written as
and the first term can be written as . Fubini’s Lemma together with (A2) implies that
Thus is a martingale.
Now we consider the general case in (A1). If the state moves with velocity , the path derivative becomes . Moreover, the Laplace operator Δ in the heat Equation (A1) may be associated with a volatility coefficient . Then the approximating model is given by
which is the integral of the Feller generator on f:
The generator is a dual representation of a diffusion process such that
where and is a Wiener process.
Step 4. Since the martingale with initial condition completely characterizes , the above result can be extended to any by the Principle of Accompanying Laws and Donsker’s Invariance Principle (Theorem 3.1.14 and 3.4.20, Stroock 2000) if and only if belongs to the family of all tight measures, . In our setup, is a compact metric space so the collection of over is tight. The Principle of Accompanying Laws says that if a sequence is in a complete separable space with tight measure, the law of this sequence will weakly converge. Donsker’s Invariance Principle says that for independent increment processes, the convergent law is the law of the Wiener process. Therefore, and
is a martingale.
The IF claim says that a martingale exists for on . The maximum principle restricts the process to be -adapted, thus and the result holds on with the probability space . □
Appendix A.1.3. Proof of Theorem 3
(i) Part of the proof follows by Propositions 3.13 and 3.15 in Bain and Crisan (2008). The boundedness condition
is called Novikov’s condition. By this condition, Girsanov’s theorem implies that defined as
is an -adapted martingale. The martingale representation theorem implies that
where is the quadratic variation such that . Thus, for , is a Wiener process with respect to :
Thus, on an arbitrary time interval , under the -law, the law of is absolutely continuous with respect to the law of the pair process . For any bounded measurable function defined on the product path space of , we have
Therefore, X and Y are independent under since X and W are independent.
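The change of measure can be illustrated with a Monte Carlo check in Python. The sketch is a simplified special case under assumptions that are not in the proof: the observation drift is a constant `h`, so that Y_T = hT + W_T, and the density of the new measure is taken to be Z_T = exp(−hW_T − h²T/2) (the sign convention is an assumption). Under the new measure Y should behave like a Wiener process, so the Z-weighted mean of Y_T should vanish and its weighted second moment should equal T.

```python
import math
import random


def girsanov_check(h=0.5, horizon=1.0, n_samples=50000, seed=2):
    """Return Monte Carlo estimates of E[Z], E[Z * Y_T] and E[Z * Y_T^2]."""
    rng = random.Random(seed)
    sum_z = 0.0
    sum_zy = 0.0
    sum_zy2 = 0.0
    for _ in range(n_samples):
        w = math.sqrt(horizon) * rng.gauss(0.0, 1.0)   # W_T under the original measure
        y = h * horizon + w                             # Y_T = h T + W_T
        z = math.exp(-h * w - 0.5 * h * h * horizon)    # density of the new measure
        sum_z += z
        sum_zy += z * y
        sum_zy2 += z * y * y
    return sum_z / n_samples, sum_zy / n_samples, sum_zy2 / n_samples


mean_z, mean_zy, mean_zy2 = girsanov_check()
```

The first estimate checks that the density has unit mean (the martingale property guaranteed by Novikov’s condition), while the other two check the first two moments of Y_T under the new measure.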
(ii) Under the probability measure , the law of the process Y is completely specified as an -adapted Wiener process with independent increments of Y. Hence, the -algebra is for any . Note that and are independent. By the conditional expectation property,
Since includes all the incremental information after time t,
and is a time invariant -algebra. □
Appendix A.1.4. Proof of Theorem 4
The proof follows the results given in Rozovskii (1991). First we give the Zakai equation, and then we show that KSP is a normalized Zakai equation.
Proposition A1 (Zakai Equation). If is bounded under , where
then for any the process follows
Note that if is a -martingale, then
since is a Wiener process under . By Girsanov’s theorem
Because is bounded, Fubini’s theorem and Itô’s lemma imply
Taking the integral, we have the result. □
We now turn to the proof of Theorem 4.
If a new measure is constructed under a Wiener process Y, then has a representation in terms of by Bayes’ rule such that
Since satisfies a linear evolution equation, we expect this will lead to an evolution equation for . From Equation (A4), we have
which is equivalent to
Note that integration by parts implies
Substituting Equation (A3) for and (A5) for , the result follows. □
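The relation between the two equations, a linear recursion for the unnormalized weights followed by a Bayes normalization, can be sketched in Python for a hidden state with finitely many values. Everything specific in the block is an assumption made for illustration: the two-state generator `gen`, the observation drifts `h`, the step size, and the splitting scheme (a likelihood correction followed by an Euler prediction step), which is a common discretization and not the paper's construction.

```python
import math
import random


def zakai_step(rho, gen, h, dy, dt):
    """One splitting step for the unnormalized weights of a finite-state signal."""
    n = len(rho)
    # Correction: reweight each state by its likelihood increment.
    weighted = [rho[i] * math.exp(h[i] * dy - 0.5 * h[i] ** 2 * dt) for i in range(n)]
    # Prediction: Euler step of the forward equation with generator gen.
    return [weighted[i] + dt * sum(gen[j][i] * weighted[j] for j in range(n))
            for i in range(n)]


def normalize(rho):
    total = sum(rho)
    return [r / total for r in rho]


def run_filters(n_steps=500, dt=0.01, seed=3):
    rng = random.Random(seed)
    gen = [[-0.5, 0.5], [0.3, -0.3]]  # hypothetical generator, rows sum to zero
    h = [0.0, 2.0]                    # hypothetical observation drift per state
    state = 0
    rho = [0.5, 0.5]                  # unnormalized weights, never rescaled
    pi = [0.5, 0.5]                   # probabilities, renormalized every step
    for _ in range(n_steps):
        dy = h[state] * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        rho = zakai_step(rho, gen, h, dy, dt)
        pi = normalize(zakai_step(pi, gen, h, dy, dt))
        if rng.random() < -gen[state][state] * dt:
            state = 1 - state         # hidden state switches
    return normalize(rho), pi


pi_from_zakai, pi_direct = run_filters()
```

Because the unnormalized recursion is linear in the weights, normalizing once at the end gives the same conditional probabilities as normalizing after every step, which is the discrete counterpart of obtaining KSP from the Zakai equation by Bayes’ rule.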
Appendix A.2. Proof of Other Results
Appendix A.2.1. Proof of Corollary 1
Sketch of the proof. Take the transition probability and expand it w.r.t. at zero by Taylor’s expansion:
where is the delta function18. The function is the time derivative of the transition probability at , called transition probability per unit time. This expression must satisfy the normalization property, in other words, the integral over must equal one. For this purpose, the above form can be corrected to:
where . Substituting the expanded form into the Chapman-Kolmogorov equation
then dividing the equation by , substituting and letting go to zero gives the following result
Appendix A.2.2. Proof of Corollary 2
For the first part, the IF claim says that a martingale exists for on . The maximum principle restricts the process to be -adapted, thus and the result holds on with the probability space . The second part is a standard result of diffusion processes. □
Bain, Alan, and Dan Crisan. 2008. Fundamentals of Stochastic Filtering. In Stochastic Modelling and Applied Probability. New York: Springer Press.
Bensoussan, Alain. 2004. Stochastic Control of Partially Observable Systems. Cambridge: Cambridge University Press.
Brémaud, Pierre, and Marc Yor. 1978. Changes of filtrations and of probability measures. Zeitschrift fur Wahrscheinlichkeitstheorie und verwandte Gebiete 45: 269–95.
Doob, Joseph L. 1983. Classical Potential Theory and Its Probabilistic Counterpart. Berlin: Springer Press.
Elliott, Robert J., and Anatoliy V. Swishchuk. 2007. Pricing Options and Variance Swaps in Markov-Modulated Brownian Markets. International Series in Operations Research and Management Science; Boston: Springer Press.
Evans, George W., and Garey Ramey. 1992. Expectation calculation and macroeconomic dynamics. American Economic Review 82: 207–24.
Fujisaki, Masatoshi, Kallianpur G., and Hiroshi Kunita. 1972. Stochastic differential equations for the nonlinear filtering problem. Osaka Journal of Mathematics 9: 19–40.
Hamilton, James. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hansen, Lars Peter, Yacine Aït-Sahalia, and José A. Scheinkman. 2009. Operator methods for continuous-time markov processes. In Handbook of Financial Econometrics. Oxford: Elsevier, pp. 31–42.
Hansen, Lars Peter, and Thomas J. Sargent. 2007. Robustness. Princeton: Princeton University Press.
Hansen, Lars Peter. 2007. Beliefs, doubts and learning: Valuing macroeconomic risk. The American Economic Review 97: 1–30.
Hansen, Lars Peter, Nick Polson, and Thomas J. Sargent. 2010. Nonlinear Filtering and Robust Learning. Paper presented at Invited Lecture, ASSA Winter Meetings, Atlanta, GA, USA, October 20.
Kallenberg, Olav. 2002. Foundations of Modern Probability. New York: Springer Press.
Klein, Lawrence. 1950. Model Building: General Principles. Cowles Monograph No 11. New York: Wiley, pp. 1–13.
Kolmogorov, Andrey. 1956. Foundations of the Theory of Probability, 2nd ed. Chelsea: Courier Dover Publications.
Koopmans, Tjalling C., H. Rubin, and R.B. Leipnik. 1950. Measuring the equation systems of dynamic economics. In Statistical Inference in Dynamic Economic Models. New York: Wiley, pp. 53–238.
Marcet, Albert, and Thomas J. Sargent. 1989a. Convergence of least squares learning in environments with hidden state variables and private information. Journal of Political Economy 97: 1306–22.
Marcet, Albert, and Thomas J. Sargent. 1989b. Convergence of least squares learning in self-referential linear stochastic models. Journal of Economic Theory 48: 337–68.
Meyer, Paul-André. 1976. Un Cours sur les Intégrales Stochastiques. Séminaire Probab. X, Lecture Notes in Mathematics 511. Berlin: Springer Press.
Simon, Herbert. 1959. Theories of decision-making in economics and behavioral science. American Economic Review 49: 253–83.
Stroock, Daniel W. 2000. Probability Theory: An Analytic View. Cambridge: Cambridge University Press.
Van Kampen, Nico. 2007. Stochastic Processes in Physics and Chemistry, 3rd ed. Oxford: Elsevier.
The representation is defined in Koopmans et al. (1950) as “a way of writing the system”. In general, the representation is a way of presenting the law of motion of this system.
See also Pollock (2018) for a recent treatment of filtering methods in the frequency domain, and Fujisaki et al. (1972) who study the nonlinear filtering problem with stochastic differential equations.
A set A is called -null set if A is measurable on and .
The notation means that the set is generated by A and B.
In distribution theory this function is usually called test function.
This statement follows from the axiom of choice, which allows for the construction of non-measurable sets, i.e., collections of events that do not have a measure in the ordinary sense, and whose construction requires an uncountable number of events.
The problem could be extended to a semi-martingale problem by using a No Free Lunch claim (the Kreps-Yan Theorem). But then X in general cannot provide any explicit solution for the conditional probability .
Formally, for any .
When the process is assumed to be homogeneous in time, the family of is a semigroup of transition kernels and has been extensively studied in recent works of operator methods, see e.g., Hansen et al. (2009).
Mathematically, this claim intends to squeeze a stochastic problem to a partial differential equation (PDE) problem so that it is possible for economists to construct and solve a specific analytic problem.
One can define a more complicated model to incorporate these effects, but the cost is to use higher order stochastic calculus. In fact, later we will see that the diffusion problem already induces an almost infeasible representation for the conditional density. At this stage, the complexity level of the problems that depart from the diffusion ones still needs to be elaborated.
The dependence between W and X is difficult to eliminate in economics and may cause an endogeneity problem. But, technically speaking, this issue often arises by using a too simple function . Since can be highly non-linear, i.e., containing all endogenous effects, it is reasonable to ignore this issue here.