Abstract
A general framework for inference in dynamical systems is described, based on the language of Bayesian probability theory and making use of the maximum entropy principle. Taking the concept of a path as fundamental, we show that the continuity equation and Cauchy's equation of fluid dynamics arise naturally, while system-specific information can be incorporated using the maximum caliber (or maximum path entropy) principle.
1. Introduction
Dynamical system models are widely used to describe complex physical systems (e.g., the weather), as well as social and economic systems (e.g., the stock market). These systems are usually subject to high levels of uncertainty, in their initial conditions, in their interactions with their environment, or both. From the point of view of constructing predictive models, the optimal description of the time-dependent state of such a system given external constraints is a challenge with promising applications in both fundamental and applied science. This is, of course, an inference problem in which we must choose the most plausible solution out of the (possibly infinitely many) alternatives compatible with the information we have about the system.
Given all this, it seems that a unified framework for performing inference on dynamical systems may open new possibilities in several areas, including non-equilibrium statistical mechanics and thermodynamics, hydrodynamics (including magnetohydrodynamics), and classical mechanics under stochastic forces, among other possible fields of application. Of course, this vision of inference applied to dynamical systems is not new—the clearest exposition of the ideas that we aim to extend here was given by E. T. Jaynes [1,2], followed by several others [3,4,5].
In this work, we present some elements of a general framework for inference in dynamical systems, written in the language of Bayesian probability. The first element is the master equation, which is shown to be a direct consequence of the laws of probability. Next, we develop the treatment of inference over paths, from which we obtain the continuity equation and Cauchy's equation for fluid dynamics, and discuss their range of applicability. Finally, we close with some concluding remarks.
2. Why Bayesian Inference?
Unlike the standard (“frequentist”) interpretation of probability theory, in which probabilities are frequencies of occurrence of repeatable events, Bayesian probability can be understood as the natural extension of classical logic in the case of uncertainty [6,7]. Bayesian probability deals with unknown quantities rather than identical repetitions/copies of an event or system, and is able to include prior information when needed.
The conceptual framework of Bayesian probability provides an elegant language to describe dynamical systems under uncertainty. A straightforward advantage of the Bayesian framework is that one does not need to assume an ensemble of “many identical copies of a system”. A single system with uncertain initial conditions and/or forces is sufficient to construct a theory. The probability of finding this particular system in a given state at a given time would not be a frequency, but rather a degree of plausibility conditioned on the known information. In fact, we can lift even the common assumption of “many degrees of freedom”. The motion of a single particle could be used to construct an internally consistent theory, where the equations of motion for time-dependent probability densities are similar to those of hydrodynamics. We will describe in detail both features of the Bayesian formulation in the following sections.
A brief overview of the Bayesian notation used in this work follows. We will take $P(A|I)$ as the probability of a particular proposition $A$ being true given knowledge $I$. On the other hand, $\langle G \rangle_I$ will denote the expected value of an arbitrary quantity $G$ given knowledge $I$, obtained by weighting $G$ over every possible state $x$ of the system with its probability $P(x|I)$.
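For a discrete set of states (an integral over $x$ replaces the sum in the continuous case), this expectation value reads

$$\langle G \rangle_I = \sum_{x} G(x)\, P(x|I).$$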
3. Dynamical Evolution of Probabilities
Consider a discrete-time system that can transit between $n$ possible states at different times. If we denote by $x_t$ the state of the system at time $t$, the joint probability of being in state $a$ at time $t$ and in state $b$ at time $t+\Delta t$ is given by the product rule, and summing it over the state $a$ relates the probability of each state at time $t+\Delta t$ to the probabilities at time $t$. This relation is a discrete-time form of the celebrated “master equation” [8,9,10], involving time-dependent transition probabilities. The case where the transition probability is independent of $t$ (i.e., a function only of the initial state $x'$ and the final state $x$) is more commonly known as the master equation in the literature. In the continuous-time limit $\Delta t \to 0$, we can write this equation in differential form, with the instantaneous density $P(x,t|I) := P(x_t = x|I)$ and a (continuous-time) transition rate $w(x' \to x; t)$ defined from the short-time behavior of the transition probability; a reconstruction of these expressions is sketched below.
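As a guide, the discrete-time relation and its continuous-time limit can be reconstructed as follows (we write $P(x,t|I)$ for the instantaneous density and $w(x'\to x;t)$ for the transition rate; the precise symbols and the gain-loss form of the limit are our assumptions):

$$P(b, t+\Delta t\,|\,I) = \sum_{a} P(b, t+\Delta t\,|\,a, t, I)\, P(a, t\,|\,I),$$

$$\frac{\partial}{\partial t} P(x,t\,|\,I) = \sum_{x' \neq x} \Big[ w(x'\to x;t)\, P(x',t\,|\,I) - w(x\to x';t)\, P(x,t\,|\,I) \Big],$$

$$w(x'\to x;t) = \lim_{\Delta t \to 0} \frac{P(x, t+\Delta t\,|\,x', t, I)}{\Delta t}, \qquad x' \neq x.$$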
In this sense, the master equation written above is a direct consequence of the laws of probability, and its validity is universal whenever we have transitions between states. Please note that there is no requirement for the system to be Markovian: regardless of the form of the joint probability over the full history of states, there is always a marginal model for the state at a pair of times, which yields a well-defined transition probability (see the sketch below). It is for this transition probability that the discrete-time master equation holds. In general, the transition rate will be time-dependent, because it captures the dependence on the previous history of the system up to $t$. It follows that all probabilities of time-dependent quantities must evolve in time according to the master equation (in its discrete- or continuous-time form) for some (possibly time-dependent) transition probability or rate. The continuous-time master equation is more general than the continuity equation, as it includes the case where some quantities can be created or destroyed during a process. However, time evolution under global and local conservation laws is a fundamental case that can also be readily obtained from the Bayesian formalism, as we will see in the following sections. As is well known, the continuous-time master equation can be approximated in the limit of infinitesimally small transitions to obtain the Fokker–Planck equation [9,11], but in the next section we start from the existence of continuous paths as a postulate.
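To make the non-Markovian statement concrete, a minimal sketch (with the joint probability of the full set of times written as $P(x_{t_1}, \ldots, x_{t_N}|I)$; the notation is our choice) is:

$$P(a, t;\, b, t+\Delta t \,|\, I) \;=\; \sum_{\{x_{t'}\,:\,\, t' \neq t,\, t+\Delta t\}} P(x_{t_1}, x_{t_2}, \ldots, x_{t_N} \,|\, I),$$

$$P(b, t+\Delta t \,|\, a, t, I) \;=\; \frac{P(a, t;\, b, t+\Delta t \,|\, I)}{P(a, t \,|\, I)},$$

so a well-defined (generally history-dependent, and hence time-dependent) transition probability always exists, whether or not the underlying process is Markovian.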
4. Fluid Theories in a Bayesian Formulation
We will now consider a dynamical system that follows a path $\vec{X}(t)$ in time, belonging to a space $\mathcal{X}$ of all paths consistent with given boundary conditions. The path is not completely known, and we only have access to partial information denoted by $I$.
In this setting, Bayesian theory defines a functional $P[\vec{X}()|I]$ that is the probability density of the path $\vec{X}(t)$ being the “true path” under the known information. For any arbitrary functional of the path, we can then write its expected value as a path integral over $\mathcal{X}$. On the other hand, the expected value of any instantaneous quantity $f(\vec{x})$ is obtained by weighting it with the instantaneous probability density $\rho(\vec{x},t)$ at time $t$. By using the quantity $\delta(\vec{x}-\vec{X}(t))$, we see that the probability density itself has a path integral representation; these expressions are sketched below.
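A minimal reconstruction of these expressions, in the notation just introduced, is:

$$\langle F \rangle = \int_{\mathcal{X}} \mathcal{D}\vec{X}()\; P[\vec{X}()\,|\,I]\; F[\vec{X}()], \qquad \langle f \rangle_t = \int d\vec{x}\;\rho(\vec{x},t)\, f(\vec{x}), \qquad \rho(\vec{x},t) = \big\langle \delta(\vec{x}-\vec{X}(t)) \big\rangle.$$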
By differentiating this path integral representation with respect to time, we obtain the continuity equation for the instantaneous probability density [12], in which $\vec{v}(\vec{x},t)$ is the velocity field that describes the flow of probability; both are sketched below.
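Written out, with the velocity field defined through the same delta-function representation (a reconstruction consistent with Reference [12]), these read:

$$\frac{\partial}{\partial t}\rho(\vec{x},t) + \nabla\cdot\big(\rho(\vec{x},t)\,\vec{v}(\vec{x},t)\big) = 0, \qquad \vec{v}(\vec{x},t) = \frac{\big\langle \dot{\vec{X}}(t)\,\delta(\vec{x}-\vec{X}(t)) \big\rangle}{\rho(\vec{x},t)}.$$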
This equation describes the global and local conservation of the probability of finding the system in a given state at a time $t$, and is guaranteed to hold for any system moving continuously in time through paths $\vec{X}(t)$. In the same way, it is possible to derive a dynamical equation for the velocity field itself, by differentiating it with respect to time. In the resulting expression two new fields appear: the acceleration field $\vec{a}(\vec{x},t)$ and the velocity covariance matrix $\Sigma(\vec{x},t)$, defined in the sketch below.
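Plausible definitions of these fields, constructed by analogy with the velocity field above (our reconstruction; the conventions of the original derivation may differ), are:

$$\vec{a}(\vec{x},t) = \frac{\big\langle \ddot{\vec{X}}(t)\,\delta(\vec{x} - \vec{X}(t)) \big\rangle}{\rho(\vec{x},t)}, \qquad \Sigma_{ij}(\vec{x},t) = \frac{\big\langle \dot{X}_i(t)\,\dot{X}_j(t)\,\delta(\vec{x} - \vec{X}(t)) \big\rangle}{\rho(\vec{x},t)} - v_i(\vec{x},t)\,v_j(\vec{x},t).$$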
By using the continuity equation to rewrite the left-hand side of this dynamical equation, dividing both sides by $\rho$, and cancelling the terms common to both sides, we arrive, after rearranging the derivatives of $\vec{v}$ on the left-hand side, at the Cauchy momentum equation, with $D/Dt = \partial/\partial t + \vec{v}\cdot\nabla$ the convective derivative and a stress tensor determined by the velocity covariance matrix; a reconstruction of the result is sketched below.
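One way to reconstruct the outcome of this calculation, consistent with the definitions sketched above (the grouping of the covariance term into a stress tensor is our reading), is:

$$\rho\left(\frac{\partial\vec{v}}{\partial t} + (\vec{v}\cdot\nabla)\vec{v}\right) = \rho\,\vec{a} - \nabla\cdot\big(\rho\,\Sigma\big),$$

which has the structure of the Cauchy momentum equation $\rho\, D\vec{v}/Dt = \vec{f} + \nabla\cdot\sigma$, with body force density $\vec{f} = \rho\,\vec{a}$ and stress tensor $\sigma = -\rho\,\Sigma$.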
The continuity equation and the Cauchy momentum equation form a closed, coupled system of equations for $\rho$ and $\vec{v}$, needing as their only external input the velocity covariance matrix $\Sigma$. These equations are then built-in features of inference over paths. In a Bayesian approach, they are valid for any system that moves continuously in time. The Cauchy momentum equation includes, most notably, the Navier–Stokes equation as a particular case [13].
5. Including Particular Knowledge into Our Models
At this point, we have developed a generic framework where no particular details about a system have been included. Clearly, all those details have to be contained in the path probability functional $P[\vec{X}()|I]$, or rather, in the covariance matrix $\Sigma$, which can be derived from it. The question remains of how to incorporate these details in the most unbiased manner. In principle, we could start from the null assumption of equiprobable paths, $P[\vec{X}()|I_0] = \text{constant}$, and add new information $I$ later on, by updating our probability functional to a new $P[\vec{X}()|I_1]$, where $I_1$ includes $I$ together with $I_0$. There are essentially two equivalent methods to achieve this, and depending on the actual form of $I$, one of them may be more directly applicable than the other.
- (1) Bayes’ theorem: the posterior distribution is given in terms of the prior by Bayes’ rule (see the sketch after this list). This method is most useful when $I$ comprises statements about the states (e.g., boundary conditions).
- (2) Principle of maximum entropy: the posterior distribution is the one that maximizes the relative entropy with respect to the prior distribution (see the sketch after this list). This method is most useful when $I$ consists of constraints on the final model $P[\vec{X}()|I_1]$, usually expressed as fixed expected values.
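Both update rules can be sketched in the path notation of Section 4 (how the new information $I$ enters the likelihood in the first line is our assumption):

$$P[\vec{X}()|I_1] \;=\; P[\vec{X}()|I_0]\;\frac{P(I\,|\,\vec{X}(), I_0)}{P(I\,|\,I_0)},$$

$$P[\vec{X}()|I_1] \;=\; \arg\max_{p}\left\{ -\int_{\mathcal{X}} \mathcal{D}\vec{X}()\;\, p[\vec{X}()]\,\ln\frac{p[\vec{X}()]}{P[\vec{X}()|I_0]} \right\} \quad \text{subject to the constraints contained in } I.$$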
In Reference [2], Jaynes assumes the continuity equation from the start and derives the flux from symmetry considerations, the central limit theorem, and Bayes’ theorem. In our classification, this corresponds to method (1). In Reference [5], Gull recovers Brownian motion by essentially performing discrete-time maximum caliber inference under constraints on location and particle speed, hence corresponding to an application of method (2).
6. The Maximum Caliber Principle
The function $p$ that is closest to our prior probability $q$ and is consistent with the constraints is the one that maximizes the relative entropy $S[p|q]$ [14,15] among the set of functions $p$ that are compatible with $I$ (see the sketch below). The negative of this relative entropy, known as the Kullback–Leibler divergence, is commonly used to measure the “informational distance” from $q$ to $p$. It is important to note that this is a rule of inference and not a physical principle, and therefore it is not restricted by the meaning assigned to the states $x$, as long as we can write (Bayesian) probabilities over them.
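For concreteness, a sketch of the relative entropy being maximized (written here for continuous states $x$; a sum replaces the integral in the discrete case) is:

$$S[p\,|\,q] = -\int dx\; p(x)\,\ln\frac{p(x)}{q(x)}.$$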
For the general case of $m$ constraints of the form $\langle f_k(x)\rangle = F_k$ (with $k = 1, \ldots, m$), the maximum entropy solution starting from the prior $q$ is obtained through the use of $m$ Lagrange multipliers $\lambda_k$ (one for each constraint), together with a partition function $Z$. This is compatible with Bayesian updating, as this posterior distribution is proportional to the prior. The Lagrange multipliers are the solutions of the constraint equations written in terms of $Z$ (see the sketch below).
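A reconstruction of the resulting maximum entropy solution, with $\lambda_k$ the Lagrange multipliers and $Z$ the partition function, is the standard canonical form:

$$p(x) = \frac{q(x)}{Z(\lambda_1,\ldots,\lambda_m)}\,\exp\Big(-\sum_{k=1}^{m}\lambda_k f_k(x)\Big), \qquad Z(\lambda_1,\ldots,\lambda_m) = \int dx\; q(x)\,\exp\Big(-\sum_{k=1}^{m}\lambda_k f_k(x)\Big),$$

$$-\frac{\partial}{\partial \lambda_k}\ln Z = F_k, \qquad k = 1,\ldots,m,$$

where $F_k$ denotes the imposed value of the $k$-th constraint $\langle f_k(x)\rangle = F_k$.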
In exactly the same way, the path (relative) entropy (sometimes known as the caliber) is defined as a path integral [1,16,17,18,19,20,21,22] involving the prior path probability. The use of this generalization is justified by the fact that we can write any path in terms of a complete orthonormal basis of functions of time, and then there is a one-to-one correspondence between every path and its coordinates in that basis. Inference over paths then becomes completely equivalent to inference over the expansion coefficients, which form a system with $N$ degrees of freedom (see the sketch below).
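Concretely, a sketch of the caliber and of the basis expansion (the symbols $Q[\vec{X}()]$ for the prior path probability, $\phi_n(t)$ for the basis functions, and $\vec{c}_n$ for the coefficients are our choices) is:

$$S = -\int_{\mathcal{X}} \mathcal{D}\vec{X}()\;\, P[\vec{X}()\,|\,I]\,\ln\frac{P[\vec{X}()\,|\,I]}{Q[\vec{X}()]}, \qquad \vec{X}(t) = \sum_{n} \vec{c}_n\,\phi_n(t).$$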
In summary, for the general maximum caliber inference problem we have $m$ constraints on expected values of functionals of the path, from which a probability functional of exponential form is obtained. Any such maximum caliber solution can be cast in a “canonical” form, in terms of a functional $\mathcal{A}[\vec{X}()]$, analogous to the Hamilton action of a classical system, and a constant $\eta$ with the same physical units as $\mathcal{A}$. By simple inspection of this canonical form, it is straightforward to see that the most probable path is the one with minimum action (see the sketch below).
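A sketch of the canonical form and of the minimum-action property (with $\mathcal{A}$ and $\eta$ as described above, and $Z$ a normalization constant) is:

$$P[\vec{X}()\,|\,I] = \frac{1}{Z}\exp\left(-\frac{\mathcal{A}[\vec{X}()]}{\eta}\right), \qquad \vec{X}_{\mathrm{most\ probable}}() = \arg\min_{\vec{X}()}\;\mathcal{A}[\vec{X}()].$$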
7. An Illustration: Newtonian Mechanics of Charged Particles
As an example of the application of this formalism, consider a “particle” with known square speed $\langle \dot{\vec{X}}(t)^2\rangle$, known instantaneous probability density $\rho(\vec{x},t)$, and known velocity field $\vec{v}(\vec{x},t)$ for all times $t$. The corresponding constraints are then fixed expected values for these three quantities at every time (see the sketch below).
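In the path notation above, these constraints can be reconstructed (for every time $t$ and, in the last two, every point $\vec{x}$; the symbol $s(t)$ for the known value of the squared speed is hypothetical) as:

$$\big\langle \dot{\vec{X}}(t)^2 \big\rangle = s(t), \qquad \big\langle \delta(\vec{x}-\vec{X}(t)) \big\rangle = \rho(\vec{x},t), \qquad \big\langle \dot{\vec{X}}(t)\,\delta(\vec{x}-\vec{X}(t)) \big\rangle = \rho(\vec{x},t)\,\vec{v}(\vec{x},t).$$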
The resulting maximum caliber solution is of the canonical form, with a Hamilton action given by the time integral of a Lagrangian containing one Lagrange multiplier (function) for each of the three constraints. This Lagrangian can be cast into a more familiar form by simply renaming the Lagrange multipliers and integrating out the delta functions [18]. Interestingly, it is none other than the Lagrangian for a particle with time-dependent “mass” $m(t)$ in external “electromagnetic potentials” $\phi(\vec{x},t)$ and $\vec{A}(\vec{x},t)$. The most probable path under these constraints is determined by the solution of the Euler–Lagrange equation, which reduces to Newton’s second law under a “Lorentz force” (see the sketch below),
as shown in the Appendix. In particular, it is important to note that it is the constraint on the squared speed that adds the mass to the model, the constraint on the probability density that adds the scalar potential to the model, and finally the constraint on the local velocity field that adds the vector potential to the model. Nowhere in the derivation of this Lagrangian have we assumed the existence of charges, electromagnetic fields, or the Lorentz force. The structure that is revealed is the most unbiased one under the constraints given above, that is, with approximate knowledge of the particle’s location (given by $\rho$) and velocity “field lines” (given by $\vec{v}$). This model could equally be used for people in a busy street crossing, or vehicles in a city.
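A sketch of the familiar form of the Lagrangian and of the resulting equation of motion, following Reference [18] (the unit charge and the exact identification of the renamed multipliers with $m(t)$, $\phi$, and $\vec{A}$ are our assumptions), is:

$$\mathcal{L}(\vec{x},\dot{\vec{x}},t) = \frac{1}{2}m(t)\,\dot{\vec{x}}^{\,2} + \dot{\vec{x}}\cdot\vec{A}(\vec{x},t) - \phi(\vec{x},t),$$

$$\frac{d}{dt}\big(m(t)\,\dot{\vec{x}}\big) = -\nabla\phi - \frac{\partial\vec{A}}{\partial t} + \dot{\vec{x}}\times\big(\nabla\times\vec{A}\big),$$

that is, Newton’s second law under a Lorentz-type force with fields $\vec{E} = -\nabla\phi - \partial\vec{A}/\partial t$ and $\vec{B} = \nabla\times\vec{A}$.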
8. Concluding Remarks
We have shown that it is possible to construct a fluid theory from Bayesian inference of an abstract system with N degrees of freedom moving along paths $\vec{X}(t)$, and that this theory automatically includes the continuity equation and the Cauchy momentum equation as built-in features. Moreover, through the use of the Maximum Caliber principle, it is possible to formulate the dynamics of such an abstract system in terms of an action that is minimal for the most probable path, resembling the well-known structures of Lagrangian and Hamiltonian mechanics.
When the square speed, instantaneous probability density, and velocity field are entered into our model as constraints, a Lagrangian of a “particle” under external fields emerges naturally. This “particle” moves on average according to Newton’s law of motion under the Lorentz force, with scalar and vector potentials determined by the known information about the location and velocity lines. In this formulation, the only ingredients that we could call physical were the existence of an N-dimensional “particle” moving continuously along (unknown) paths. In this application, we see that position and velocity are the only intrinsic (real or ontological) properties of the particle at a given time t. On the other hand, the time-dependent mass $m(t)$ and the fields $\phi$ and $\vec{A}$ are emergent parameters (in fact, Lagrange multipliers) needed to impose the constraints encoding the known information used to construct the model.
Author Contributions
All authors of this paper contributed to the work. S.D. conceived the structure of the paper while D.G. and G.G. provided analysis of previous results and discussions.
Funding
This research was funded by FONDECYT grant 1171127.
Acknowledgments
S.D. and G.G. gratefully acknowledge funding from FONDECYT 1171127.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Derivation of the Lorentz Force from the Lagrangian of a Particle in an Electromagnetic Field
Replacing the total time derivative of the vector field $\vec{A}$ into the Euler–Lagrange equation for the Lagrangian of Section 7, and rearranging the gradient terms, one obtains Newton’s second law with the Lorentz force, as sketched below.
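A minimal reconstruction of this standard calculation (for the Lagrangian of Section 7, with unit charge) is:

$$\frac{d}{dt}\frac{\partial \mathcal{L}}{\partial \dot{\vec{x}}} = \frac{d}{dt}\big(m(t)\,\dot{\vec{x}} + \vec{A}\big) = \frac{d}{dt}\big(m(t)\,\dot{\vec{x}}\big) + \frac{\partial\vec{A}}{\partial t} + (\dot{\vec{x}}\cdot\nabla)\vec{A}, \qquad \frac{\partial \mathcal{L}}{\partial \vec{x}} = \nabla(\dot{\vec{x}}\cdot\vec{A}) - \nabla\phi,$$

so that the Euler–Lagrange equation $\frac{d}{dt}\frac{\partial\mathcal{L}}{\partial\dot{\vec{x}}} = \frac{\partial\mathcal{L}}{\partial\vec{x}}$, together with the identity $\nabla(\dot{\vec{x}}\cdot\vec{A}) - (\dot{\vec{x}}\cdot\nabla)\vec{A} = \dot{\vec{x}}\times(\nabla\times\vec{A})$, becomes

$$\frac{d}{dt}\big(m(t)\,\dot{\vec{x}}\big) = -\nabla\phi - \frac{\partial\vec{A}}{\partial t} + \dot{\vec{x}}\times\big(\nabla\times\vec{A}\big).$$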
References
- Jaynes, E.T. The Minimum Entropy Production Principle. Ann. Rev. Phys. Chem. 1980, 31, 579–601.
- Jaynes, E.T. Clearing up Mysteries—The Original Goal. In Maximum Entropy and Bayesian Methods: Cambridge, England, 1988; Springer: Dordrecht, The Netherlands, 1989; pp. 1–27.
- Dewar, R. Information theory explanation of the fluctuation theorem, maximum entropy production and self-organized criticality in non-equilibrium stationary states. J. Phys. A Math. Gen. 2003, 36, 631–641.
- Grandy, W.T. Entropy and the Time Evolution of Macroscopic Systems; Oxford Science Publications: New York, NY, USA, 2008.
- Gull, S. Some Misconceptions about Entropy. Available online: http://www.mrao.cam.ac.uk/~steve/maxent2009/images/miscon.pdf (accessed on 11 September 2018).
- Cox, R.T. Probability, frequency and reasonable expectation. Am. J. Phys. 1946, 14, 1–13.
- Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
- Risken, H. The Fokker-Planck Equation: Methods of Solution and Applications; Springer: Berlin, Germany, 1996.
- van Kampen, N.G. Stochastic Processes in Physics and Chemistry; North Holland: Amsterdam, The Netherlands, 2007.
- Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001.
- van Kampen, N.G. The expansion of the master equation. Adv. Chem. Phys. 1976, 34, 245–309.
- González, D.; Díaz, D.; Davis, S. Continuity equation for probability as a requirement of inference over paths. Eur. Phys. J. B 2016, 89, 214.
- Lamb, H. Hydrodynamics; Dover Books on Physics; Dover Publications: New York, NY, USA, 1945.
- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
- Caticha, A.; Giffin, A. Updating Probabilities. AIP Conf. Proc. 2006, 872, 31.
- Pressé, S.; Ghosh, K.; Lee, J.; Dill, K.A. Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013, 85, 1115–1141.
- González, D.; Davis, S.; Gutiérrez, G. Newtonian mechanics from the principle of Maximum Caliber. Found. Phys. 2014, 44, 923.
- Davis, S.; González, D. Hamiltonian Formalism and Path Entropy Maximization. J. Phys. A Math. Theor. 2015, 48, 425003.
- Hazoglou, M.J.; Walther, V.; Dixit, P.D.; Dill, K.A. Maximum caliber is a general variational principle for nonequilibrium statistical mechanics. J. Chem. Phys. 2015, 143, 051104.
- Wan, H.; Zhou, G.; Voelz, V.A. A maximum-caliber approach to predicting perturbed folding kinetics due to mutations. J. Chem. Theory Comput. 2016, 12, 5768–5776.
- Cafaro, C.; Ali, S.A. Maximum caliber inference and the stochastic Ising model. Phys. Rev. E 2016, 94, 052145.
- Dixit, P.; Wagoner, J.; Weistuch, C.; Pressé, S.; Ghosh, K.; Dill, K.A. Perspective: Maximum caliber is a general variational principle for dynamical systems. J. Chem. Phys. 2018, 148, 010901.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).